Implementing Random ID Generation

NOTE: THIS IS A DRAFT COPY MISSING IMAGES, DIAGRAMS, AND OTHER MEDIA SUPPORT

This is to talk about random ID generation.  Take notice, this is random ID generation for Drupal as a whole, NOT just random node ID generation. Tackling just the nid's, and not id generation as a whole is futile and ridiculous. Please note that this is being posted from my Windows Live Writer (which I love blogging on), but lacks some of the more advanced tools to make this entry a nicely formatted as I would like for a presentation level paper for discussion.

With that disclaimer out of the say, let us move on to the problem set. I was working on a project that was doing a very high insertion of nodes (from 120-250 full nodes per-second, this is NOT the number of quires that were running per/sec, which one can assume was on the order of 600-1000/sec). The first problem was speed. I knew what the system "should" be doing, but it was annoyingly slow and cumbersome. The problem came down the the locks on the sequence table. They were piling up very fast. I could have used a serial field (auto_increment), but this is turn makes some logic much harder, and the way the PHP handles DB connections, there is the chance of using a "dirty" connection, and getting a bad id read. Further, getting the last inserted id (from a programmatic stand point) is different across db platforms. Long story short, I needed a reliable random-id number generator. This system had to adhere to the following:

  1. Reduce the number of locks on the table
  2. issue random, non-sequential numbers
  3. A collision of 2 numbers had to be avoided 110%. (Read: Not handles but avoided; meaning, it could not happen...ever)

The Plan

So, the first 2 are easy right? And then it would of been even easier if I just had to "handle" collisions; but, it gets tricky because I had to avoid them, they could not happen. So, how do I do this? Well, first let me be honest, #3 came about because after implementing 1 and 2, I rand into system dead-locks (for other seasons NOT attributed to this ID implementation) because of the sheer speed, but then more collisions that I should be handling. It should also be noted that due to PHP's architecture, it is impossible to have a 0% collision rate w/o going back full circle and running right back into the problems we had discussed with the current sequential ID implementation. Even though it's "random", the speed of the application was creating frequent collisions in the number space because, in theory, I was basically rolling the dice many many times, and thus it was inevitable. I wasn't exactly just inserting a node every hour or so. Anyway, instead of going into details of the iterations I had to go through before coming to a final solution, lets just get to it.

 

The Solution

The solution is a 2-pronged, legacy approach. I though of it this way:

There are 3 elements in play:

  1. Sequence table
  2. ID/Sequence string
  3. "working ID stack/table" (more on this after the following bullets)
  • Overcoming Collisions
    • Drupal's naming convention for ID's is {table_name}_key_name, thus if someone followed this, i would use this to check the {table_name} and then the "key_name" to check to an ID in the number pool for that number domain (table). [This means I do a SELECT 1 FROM {table_name} WHERE key_name = $rand_id; also note that there is also a NOLOCK attribute on the implementation of this query, but it is ommitted here for sake of discussion]
    • Create a function that updates a variable map that double checks that the convention is followed (see if the table name and the key exists), and mark accordingly. This map is only re-run during DDL statements (module updates, enabledment, etc.) or arbitrarily by the admin.
    • When an ID is requested, see if the ID is in the mapping as "approved", and follow the decision tree in fig. 2.1
    • If the ID is not approved, issue a legacy sequence (basically use db_next_id as is).
  • ID Generation decision tree
    • Generate a random ID
    • Check to see if that ID exists in the current table using a NO LOCK attribute to read the table for speed
    • Check to see if that ID exists in the "working stack"
    • If ID collision, regenerate ID and repeat (this is rare, but happens)
    • Add the "working ID" to the stack

The Working ID Stack/Table

As stated before, w/o going into details, it is impossible due to PHP's arch and non-use of persistent data objects to reduce the likely hook of a collision to 0%. It is possible though to reduce the possibility to near 0%. This why we need the working ID table. This serves as a buffer to the ID's we are currently working with. This is because, at an arbitrarily high rate of inserts, it is still very feasibly possible for an ID to be issued, and then that same ID also be issued for another incoming thread because in between the time the that ID1 is issued for T1 (thread 1), and T1 actually inserts it's information in the table, T2 could be issued ID1, and then thus a collision would happen for T2, assuming that T1 finishes before T2. This table is flushed at intervals of its buffer contents.

 Conclusion

So, that, in a nutshell, is how I implemented random ID generation.

Comments

Sayısal Loto

As stated before, w/o going into details, it is impossible due to PHP's arch and non-use of persistent data objects to reduce the likely hook of a collision to 0%. It is possible though to reduce the possibility to near 0%. This why we need the working ID table. This serves as a buffer to the ID's we are currently working with. This is because, at an arbitrarily high rate of inserts, it is still very feasibly possible for an ID to be issued, and then that same ID also be issued for another incoming thread because in between the time the that ID1 is issued for T1 (thread 1), and T1 actually inserts it's information in the table, T2 could be issued ID1, and then thus a collision

Is there any way to remove

Is there any way to remove them? I am strugling into this issue.

Please Help

Marios
--------
http://emailextractor14.com
www.pokerbot-smart.com

How does removing the sequence table

impact this?

benjamin, Agaric Design Collective

I dont think is even impact

I dont think is even impact it.

Maxon - poker bot , email extractor

Not sure

Not sure. Removed them all, but that's very interesting. They probabley now have that in their logic. It's pretty easy to read the tag and do the operation to validate....I'll have to come up with something else.

Post new comment

The content of this field is kept private and will not be shown publicly.
  • Allowed HTML tags: <a> <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd>
  • Lines and paragraphs break automatically.
  • Web page addresses and e-mail addresses turn into links automatically.

More information about formatting options

CAPTCHA
This question is for testing whether you are a human visitor and to prevent automated spam submissions.