NOTE: THIS IS A DRAFT COPY MISSING IMAGES, DIAGRAMS, AND OTHER MEDIA SUPPORT
This is to talk about random ID generation. Take notice, this is random ID generation for Drupal as a whole, NOT just random node ID generation. Tackling just the nids, and not ID generation as a whole, is futile and ridiculous. Please note that this is being posted from Windows Live Writer (which I love blogging on), but it lacks some of the more advanced tools needed to make this entry as nicely formatted as I would like for a presentation-level paper for discussion.
With that disclaimer out of the way, let us move on to the problem set. I was working on a project that was doing a very high rate of node insertion (120-250 full nodes per second; this is NOT the number of queries running per second, which one can assume was on the order of 600-1000/sec). The first problem was speed. I knew what the system "should" be doing, but it was annoyingly slow and cumbersome. The problem came down to the locks on the sequence table. They were piling up very fast. I could have used a serial field (auto_increment), but this in turn makes some logic much harder, and with the way PHP handles DB connections, there is the chance of using a "dirty" connection and getting a bad ID read. Further, getting the last inserted ID (from a programmatic standpoint) is different across DB platforms. Long story short, I needed a reliable random ID generator. This system had to adhere to the following:
The Plan
So, the first 2 are easy, right? And it would have been even easier if I just had to "handle" collisions; but it gets tricky because I had to avoid them: they could not happen. So, how do I do this? Well, first let me be honest: #3 came about because, after implementing 1 and 2, I ran into system deadlocks (for other reasons NOT attributed to this ID implementation) because of the sheer speed, and then into more collisions than I should have been handling. It should also be noted that, due to PHP's architecture, it is impossible to have a 0% collision rate without going full circle and running right back into the problems discussed with the current sequential ID implementation. Even though it's "random", the speed of the application was creating frequent collisions in the number space because I was basically rolling the dice many, many times, and thus collisions were inevitable. I wasn't exactly just inserting a node every hour or so. Anyway, instead of going into the details of the iterations I went through before coming to a final solution, let's just get to it.
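To put a rough number on the "rolling the dice many, many times" point, here is a minimal sketch using the standard birthday-bound approximation. It assumes IDs are drawn uniformly and independently, and the 31-bit ID space and one-hour window are illustrative assumptions of mine, not figures from the project. Only the 250 nodes/sec rate comes from the numbers above.

```python
import math


def collision_probability(num_ids: int, space_size: int) -> float:
    """Approximate chance that at least two of num_ids uniformly random
    draws from space_size possible values coincide (birthday bound)."""
    if num_ids < 2:
        return 0.0
    exponent = -num_ids * (num_ids - 1) / (2.0 * space_size)
    # -expm1(x) computes 1 - e^x without losing precision for tiny x.
    return -math.expm1(exponent)


if __name__ == "__main__":
    # Assumed scenario: 250 inserts/sec sustained for one hour,
    # drawing from a 31-bit ID space (hypothetical size).
    inserts = 250 * 3600
    space = 2 ** 31
    print(f"{collision_probability(inserts, space):.6f}")
```

Under those assumptions the number of draws is far past the square root of the space size, so at least one collision is all but guaranteed. That is why collisions have to be avoided up front rather than only handled after the fact.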
The Solution
The solution is a 2-pronged, legacy approach. I thought of it this way:
There are 3 elements in play:
The Working ID Stack/Table
As stated before, without going into details, PHP's architecture and its lack of persistent data objects make it impossible to reduce the likelihood of a collision to 0%. It is possible, though, to reduce it to near 0%. This is why we need the working ID table. It serves as a buffer of the IDs we are currently working with. At an arbitrarily high rate of inserts, it is still quite possible for an ID to be issued and then for that same ID to be issued to another incoming thread: between the time ID1 is issued to T1 (thread 1) and the time T1 actually inserts its information into the table, T2 could also be issued ID1, and a collision would then happen for T2, assuming that T1 finishes before T2. This table is flushed at intervals of its buffer contents.
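The reservation idea can be sketched roughly as follows. This is an in-memory Python stand-in, not the actual Drupal/PHP implementation: the class and method names are hypothetical, the lock stands in for whatever atomic, unique-keyed insert the real working table would rely on, and the flush rule (drop finished entries once the buffer reaches a threshold) is my assumption about what flushing "at intervals" could look like.

```python
import random
import threading


class WorkingIdTable:
    """In-memory stand-in for the working ID table (all names hypothetical)."""

    def __init__(self, id_space, flush_threshold=1000, max_attempts=100, seed=None):
        self.id_space = id_space
        self.flush_threshold = flush_threshold
        self.max_attempts = max_attempts
        self.rng = random.Random(seed)
        self.committed = set()  # stands in for IDs already stored in the real tables
        self.working = {}       # issued ID -> True once its insert has finished
        self.lock = threading.Lock()  # stands in for an atomic, unique-keyed insert

    def reserve(self):
        """Issue a random ID that is neither stored nor currently in flight."""
        with self.lock:
            for _ in range(self.max_attempts):
                candidate = self.rng.randrange(1, self.id_space + 1)
                if candidate not in self.committed and candidate not in self.working:
                    self.working[candidate] = False
                    return candidate
        raise RuntimeError("no free ID found; the ID space may be too small")

    def commit(self, issued_id):
        """Record that the caller finished inserting the row for issued_id."""
        with self.lock:
            if issued_id not in self.working:
                raise KeyError(issued_id)
            self.committed.add(issued_id)
            self.working[issued_id] = True
            if len(self.working) >= self.flush_threshold:
                self._flush()

    def _flush(self):
        """Drop finished entries; IDs still in flight stay reserved."""
        self.working = {i: done for i, done in self.working.items() if not done}


if __name__ == "__main__":
    table = WorkingIdTable(id_space=1_000_000)
    t1_id = table.reserve()  # T1 is issued an ID
    t2_id = table.reserve()  # T2 cannot be issued the same ID while T1 is in flight
    table.commit(t1_id)
    table.commit(t2_id)
    print(t1_id != t2_id)
```

The point of the sketch is only the ordering: an ID is recorded as "in flight" at the moment it is issued, so the window between issuing ID1 to T1 and T1's actual insert is covered by the working table.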
Conclusion
So, that, in a nutshell, is how I implemented random ID generation.
Comments
How does removing the sequence table
impact this?
benjamin, Agaric Design Collective
Not sure
Not sure. Removed them all, but that's very interesting. They probably now have that in their logic. It's pretty easy to read the tag and do the operation to validate... I'll have to come up with something else.
How did the spambots get through here?
Looking at all the spam above, I wonder how the bots got around the challenge?
Bjorn Solstad