NOTE: THIS IS A DRAFT COPY MISSING IMAGES, DIAGRAMS, AND OTHER MEDIA SUPPORT
This is to talk about random ID generation. Take notice, this is random ID generation for Drupal as a whole, NOT just random node ID generation. Tackling just the nid's, and not id generation as a whole is futile and ridiculous. Please note that this is being posted from my Windows Live Writer (which I love blogging on), but lacks some of the more advanced tools to make this entry a nicely formatted as I would like for a presentation level paper for discussion.
With that disclaimer out of the say, let us move on to the problem set. I was working on a project that was doing a very high insertion of nodes (from 120-250 full nodes per-second, this is NOT the number of quires that were running per/sec, which one can assume was on the order of 600-1000/sec). The first problem was speed. I knew what the system "should" be doing, but it was annoyingly slow and cumbersome. The problem came down the the locks on the sequence table. They were piling up very fast. I could have used a serial field (auto_increment), but this is turn makes some logic much harder, and the way the PHP handles DB connections, there is the chance of using a "dirty" connection, and getting a bad id read. Further, getting the last inserted id (from a programmatic stand point) is different across db platforms. Long story short, I needed a reliable random-id number generator. This system had to adhere to the following:
The Plan
So, the first 2 are easy right? And then it would of been even easier if I just had to "handle" collisions; but, it gets tricky because I had to avoid them, they could not happen. So, how do I do this? Well, first let me be honest, #3 came about because after implementing 1 and 2, I rand into system dead-locks (for other seasons NOT attributed to this ID implementation) because of the sheer speed, but then more collisions that I should be handling. It should also be noted that due to PHP's architecture, it is impossible to have a 0% collision rate w/o going back full circle and running right back into the problems we had discussed with the current sequential ID implementation. Even though it's "random", the speed of the application was creating frequent collisions in the number space because, in theory, I was basically rolling the dice many many times, and thus it was inevitable. I wasn't exactly just inserting a node every hour or so. Anyway, instead of going into details of the iterations I had to go through before coming to a final solution, lets just get to it.
The Solution
The solution is a 2-pronged, legacy approach. I though of it this way:
There are 3 elements in play:
The Working ID Stack/Table
As stated before, w/o going into details, it is impossible due to PHP's arch and non-use of persistent data objects to reduce the likely hook of a collision to 0%. It is possible though to reduce the possibility to near 0%. This why we need the working ID table. This serves as a buffer to the ID's we are currently working with. This is because, at an arbitrarily high rate of inserts, it is still very feasibly possible for an ID to be issued, and then that same ID also be issued for another incoming thread because in between the time the that ID1 is issued for T1 (thread 1), and T1 actually inserts it's information in the table, T2 could be issued ID1, and then thus a collision would happen for T2, assuming that T1 finishes before T2. This table is flushed at intervals of its buffer contents.
Conclusion
So, that, in a nutshell, is how I implemented random ID generation.
Comments
Sayısal Loto
As stated before, w/o going into details, it is impossible due to PHP's arch and non-use of persistent data objects to reduce the likely hook of a collision to 0%. It is possible though to reduce the possibility to near 0%. This why we need the working ID table. This serves as a buffer to the ID's we are currently working with. This is because, at an arbitrarily high rate of inserts, it is still very feasibly possible for an ID to be issued, and then that same ID also be issued for another incoming thread because in between the time the that ID1 is issued for T1 (thread 1), and T1 actually inserts it's information in the table, T2 could be issued ID1, and then thus a collision
Is there any way to remove
Is there any way to remove them? I am strugling into this issue.
Please Help
Marios
--------
http://emailextractor14.com
www.pokerbot-smart.com
How does removing the sequence table
impact this?
benjamin, Agaric Design Collective
I dont think is even impact
I dont think is even impact it.
Maxon - poker bot , email extractor
Not sure
Not sure. Removed them all, but that's very interesting. They probabley now have that in their logic. It's pretty easy to read the tag and do the operation to validate....I'll have to come up with something else.
Post new comment