New Jersey Drivers License Restriction Codes
Driver's license numbers in New Jersey aren't random. They follow the format Affff lllii mmyye, where A is the first letter of the person's last name, ffff is some mapping of the remaining letters of the last name to a four-digit number, lll is a mapping of the full first name to a three-digit number, and ii is a code representing the middle initial (according to the table below):
The number corresponding to the initial is 10*column number + row number. mm corresponds to the month born, and yy to the year born. e is the eye color (a value 1-8 corresponding to BRO, BLU, GRY, GRN, BLK, etc.).
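To make the field layout concrete, here is a minimal Python sketch that only splits a number of the form Affff lllii mmyye into its fields. The names `LicenseFields` and `parse_license` are made up for illustration, and the sketch does not decode the name, middle-initial, or eye-color codes, since those mappings are exactly what is unknown here.

```python
import re
from typing import NamedTuple


class LicenseFields(NamedTuple):
    """Raw fields of the Affff lllii mmyye layout (hypothetical helper type)."""
    last_initial: str     # A
    last_name_code: str   # ffff
    first_name_code: str  # lll
    middle_code: str      # ii
    month_field: str      # mm
    year_field: str       # yy
    eye_code: str         # e


# One letter followed by 4 + 3 + 2 + 2 + 2 + 1 digits.
_PATTERN = re.compile(r"([A-Z])(\d{4})(\d{3})(\d{2})(\d{2})(\d{2})(\d)")


def parse_license(number: str) -> LicenseFields:
    """Split a license number into fields, ignoring spaces and hyphens."""
    compact = re.sub(r"[\s-]", "", number.upper())
    match = _PATTERN.fullmatch(compact)
    if match is None:
        raise ValueError(f"not in Affff lllii mmyye form: {number!r}")
    return LicenseFields(*match.groups())
```

For example, `parse_license("D4047 16371 06752")` yields last name code 4047, first name code 163 and middle code 71; the last five digits in that example are invented purely to fill out the format.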
The only thing I don't understand is how the names are mapped to the integer values. I only have 5 examples for the last name mappings (ignoring the first letter because it doesn't play into the mapping):
For first names, I only have four:
Does anyone have any ideas how the implementation is done, or even a general mapping function that will hash a string of at most 25 characters to a four-digit or three-digit number while maintaining lexicographical order (<=, not <)?
Things I've Tried
Convert each letter to a number 1-26. Then, taking only the first four numbers, create the number by the rule 26^3 * first number + 26^2 * second number + 26 * third + fourth. Then, divide this number by 26^4 + 26^3 + 26^2 + 26, and multiply by 10000 to map the decimal into 0-9999. This produces the following mappings:
Get a list of the top 10,000 most common surnames. Order by the second letter, and then check the index. This produces the following mappings:
Each letter subdivides the 10,000. The first number (according to 1-26) cuts it into one of 26 pieces. The second cuts the piece into one of 26, and so on and so forth. This produces the following mappings:
Convert each of the first four letters to 1-26. Concatenate all of them, multiply the resulting number by 10,000, and divide by 26262626. This produces the following mappings:
Do the above with 0-25, divide by 25252525. This produces the following mappings:
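For anyone who wants to reproduce the arithmetic attempts above (everything except the surname-list lookup), here is a rough Python sketch. The function names are made up, and three details are my assumptions rather than part of the descriptions: names shorter than four letters are padded with 0, each letter is written as two digits before concatenating, and results are truncated rather than rounded.

```python
from fractions import Fraction


def letter_values(name, start):
    """Map letters to start..start+25 (a -> start), skipping non-letters."""
    return [ord(c) - ord("a") + start for c in name.lower() if "a" <= c <= "z"]


def first_four(name, start):
    """First four letter values, padded with 0 (padding is an assumption)."""
    values = letter_values(name, start)[:4]
    return values + [0] * (4 - len(values))


def positional_scaled(name):
    """26^3*a + 26^2*b + 26*c + d, scaled into 0-10000 (letters as 1-26)."""
    a, b, c, d = first_four(name, 1)
    n = 26**3 * a + 26**2 * b + 26 * c + d
    return n * 10000 // (26**4 + 26**3 + 26**2 + 26)


def subdivide(name):
    """Each letter picks one of 26 equal slices of the current interval."""
    low, width = Fraction(0), Fraction(10000)
    for index in letter_values(name, 0):  # slice index 0-25
        width /= 26
        low += index * width
    return int(low)


def concatenated(name, start):
    """Concatenate two-digit letter values, then scale into 0-10000."""
    digits = "".join(f"{v:02d}" for v in first_four(name, start))
    top = 26262626 if start == 1 else 25252525
    return int(digits) * 10000 // top
```

All of these preserve lexicographic order on the letters they look at, which is why they seem plausible at first; the catch is that they spread names evenly across the range, which may not match how the real codes are distributed.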
Additional Samples
While I believe all of the above samples are correct, I tried to track down more authentic sample data points. Ones that I can guarantee are below:
Last Names
First Names
4 Answers
This is not yet a complete answer, but perhaps what I've found can be combined with other information to come up with the complete solution.
First name encoding
If we assume a linear encoding, then we have everything needed to figure this out based on your four samples. If we consider letter values as a=0, b=1, ... regardless of whether they're uppercase or lowercase, your four samples can be turned into four linear equations:
Since we have four equations and four unknowns, it's easily solved using simple but tedious algebra or in matrix form using Gaussian elimination. (Sorry for the ugly looking math, but unlike other StackExchange sites apparently ReverseEngineering doesn't support MathML, which is unfortunate.)
If you do so, you get the following values:
All very neat and accurate, but there's a problem, which is that any four samples would result in some answer. The question is whether it works for all possible names, and unfortunately, the answer is no.
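The four-equations, four-unknowns step can be sketched in Python with exact fractions. Everything in the example data is synthetic: the four "names" and their codes are invented, generated from made-up weights (20, 5, 2, 1) so the result can be checked, and the choice of one weight per each of the first four letters is my assumption about the form of the equations. The helper names are hypothetical too.

```python
from fractions import Fraction


def letter_values(name, count=4):
    """First `count` letters as a=0, b=1, ..., regardless of case."""
    return [ord(c) - ord("a") for c in name.lower()[:count]]


def solve(rows, codes):
    """Gauss-Jordan elimination with exact fractions.

    Raises StopIteration if the system is singular (no usable pivot).
    """
    n = len(codes)
    m = [[Fraction(x) for x in row] + [Fraction(code)]
         for row, code in zip(rows, codes)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        lead = m[col][col]
        m[col] = [x / lead for x in m[col]]
        # Clear this column from every other row.
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n] for row in m]


# Synthetic samples only: NOT real license data.
samples = {"abcd": 12, "bade": 30, "cabe": 46, "deaf": 85}
weights = solve([letter_values(name) for name in samples],
                list(samples.values()))
```

Here `weights` comes back as exactly 20, 5, 2 and 1. That is the point made above: four independent samples always produce some exact answer, so a clean fit on four names proves nothing until it is checked against further names.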
Further samples
I did some searching on the internet and found a few more samples. Here's an image of a Russian spy's New Jersey license and here is a Police guide (see page 60). This pamphlet from the NJ MVC encodes 'Dennis J. Driver' as D4047-16371.
If we try the first name equation above on these new samples, they fail, so it's not quite right. The result suggests that the weighting is not quite so simple. When searching, I also found that both Ontario and Québec licenses appear to use the same first and last name encodings. So for example, this temporary Ontario permit verifies that 'Dennis' is encoded as 163 in Ontario as well as in New Jersey.
When I run a linear regression on all of the first name values vs. the first letter l (encoded as a=0, b=1, ...) I get the equation 32.42*l+52.55 with an R^2 value of 0.986, which shows this to be highly linear.
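As a quick illustration of how that fitted line can be used as a sanity check, here is a small Python sketch. The slope and intercept are the regression values quoted above; the function name is made up, and the line is only a rough predictor based on the first letter alone.

```python
SLOPE = 32.42      # regression slope quoted above
INTERCEPT = 52.55  # regression intercept quoted above


def predicted_first_name_code(first_name):
    """Rough estimate of the first-name code from the first letter (a=0)."""
    letter = ord(first_name[0].lower()) - ord("a")
    return SLOPE * letter + INTERCEPT
```

For 'Dennis' this gives roughly 149.8, against the 163 seen in the samples: close enough to support a mostly linear trend on the first letter, but the remaining letters clearly matter as well.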
Last name experiment
I tried a very simple experiment with the last name encoding, using a method not mentioned in your list of things you have tried: simply treat each character as a base-26 digit. Using the 4 characters following the first, the encodings for 'Baab' and 'Jackson' are correctly obtained, but no others matched.
Other encoding schemes
I did some searching for existing encoding schemes. Soundex was both easily found and easily discounted, but there are many variations to it and it's possible that some expanded variation was used. I was not able to locate a Soundex variant that produced these particular values, but I learned some interesting things along the way.
First, perhaps not surprisingly, there has long been a need to try to match up names in a database using some kind of encoding. Generically, the problem is called record-linking and is typically thought of as matching a possibly misspelled name to a subset of possible matches in a database. Soundex has been used for this purpose, but found to be somewhat lacking in effectiveness.
Other schemes I have located, or at least located references to, include:
- Cutter-Sanborn Four-Figure used to encode author names for libraries
This stringmetric project has what appears to be a nice collection of algorithm implementations with links to the original describing papers, but I haven't tried all of these.
Perhaps if someone does, they can report back here.
In case you're still trying to figure this out, I've made some progress with assistance from u/jccool5000 on reddit (post), who has a collection of over 900 samples, mostly from Ontario. AFAIK, Ontario and NJ share the same encoding; about Quebec I'm not so sure. I did some data manipulation to figure this out.
Starting with the numbers of the last name, the first of the 4 digits corresponds to the second letter of the last name, since the first letter is already coded directly as the first character of the license number.
The remaining three digits, from 000-999, code the second letter of the last name as well. However, each value of the first digit has its own 000-999 range. That is to say:
- Hypothetical last name XA is X0001
- Hypothetical last name XAZZZ is X0999, or something close to 999.
- Hypothetical last name XB is X1001
- Hypothetical last name XDZZZ is X1999, or something close to 999.
- Hypothetical last name XE is X2001
- Hypothetical last name XEZZZ is X2999, or something close to 999.
You can refer to the above table to see when the 999 will reset back to 000. This is just the pattern I've found so far. I don't know how the numbers are distributed to the names.
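The block structure in the hypothetical examples above can be sketched in Python as follows. The lookup table is deliberately incomplete: it contains only the letters that can be read off those examples (A in block 0, B through D in block 1, E in block 2, with C inferred from lying between B and D), and the function name is made up. How the three digits within a block are assigned is still unknown, so the sketch only returns the range a name should fall in.

```python
# Leading digit by second letter of the last name. Only the letters from
# the hypothetical examples are filled in; C is an inference, and the
# rest of the alphabet is intentionally left out.
KNOWN_LEADING_DIGIT = {"A": 0, "B": 1, "C": 1, "D": 1, "E": 2}


def last_name_code_range(last_name):
    """Return (lowest, highest) possible four-digit code, or None if unknown."""
    if len(last_name) < 2:
        return None
    digit = KNOWN_LEADING_DIGIT.get(last_name[1].upper())
    if digit is None:
        return None
    return digit * 1000, digit * 1000 + 999
```

With this, `last_name_code_range("XDZZZ")` gives (1000, 1999), matching the X1999 upper end in the example, while any second letter outside A-E returns None rather than a guess.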
The first name code is a lot simpler, but it's also not evenly distributed. The difference is that the first name code only goes from 000 (Aaron) to probably 799 (796 for Zoe). By not evenly distributed, I mean that names starting with A range from 000 to 071, where 071 also holds some names that start with BA. Meanwhile, names that begin with Y are confined to a small range, no lower than 785 and no higher than 792.
Many states use something called SoundEx to generate license numbers (sometimes you even see SoundEx on government forms and/or computer screens when they ask for drivers license numbers.)
The Soundex system was designed to phonetically map names that sound similar to close values, even though they might be spelled wildly differently (e.g. Pheiffer vs. Fifer).
See also things like Metaphone. Also, they may not use soundex directly.
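For reference, here is a short Python sketch of classic American Soundex, so its output can be compared against real license numbers. This is the standard textbook algorithm, not something confirmed to be what New Jersey uses.

```python
import string

# Standard Soundex consonant groups; vowels, Y, H and W get no digit.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Classic Soundex: first letter plus three digits, zero-padded."""
    letters = [c for c in name.upper() if c in string.ascii_uppercase]
    if not letters:
        return ""
    result = letters[0]
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        digit = _CODES.get(ch, "")
        if digit and digit != previous:
            result += digit
        previous = digit  # vowels reset this, so repeats across vowels count
    return (result + "000")[:4]
```

Note that classic Soundex keeps the first letter, so Pheiffer and Fifer come out as P160 and F160: the same digits but different leading letters. It also produces only three digits from a small set of consonant classes, which does not obviously fit the four-digit, order-preserving codes seen in the samples.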
I don't see this above, but male or female is coded in as well. In the last five digits, the first 2 are the month of birth. Males are 01-12. For females, 50 is added, so they run from 51 (January) to 62 (December).
Also, my name is Alexandra, which is also 019, as is your example of Alexander. The absence of a middle name is reflected as 00.
- I know a friend with middle name Alexandra who has 61 = (ii)
- another, with middle name Serafina, has 82 = (ii)
- another, with middle name Dorothy, has 64 = (ii)
I would suggest collecting more name samples to compare.
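The month rule described here is simple enough to sketch in Python. The function name is made up, and the sketch assumes the only valid values are 01-12 and 51-62, as stated above.

```python
def decode_month_field(mm):
    """Split the two-digit month field into (month, sex).

    Assumes 1-12 means male and 51-62 means female (month + 50).
    """
    if 1 <= mm <= 12:
        return mm, "M"
    if 51 <= mm <= 62:
        return mm - 50, "F"
    raise ValueError(f"unexpected month field: {mm}")
```

So a month field of 51 decodes to January for a female holder and 12 to December for a male holder; anything else is rejected rather than guessed at.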