Some thoughts on "A UID Numbering Scheme", a paper published by the Unique Identification Authority of India.

  1. Process for De-duplication (page 5)
    Since biometric information contains no ordering and hence cannot be indexed like text based information, when a resident applies for a UID with his/her fingerprints, iris and photo of face, these biometrics have to be compared against the entire UID database (existing residents with UIDs) to ensure that this new applicant is indeed unique and has not already been allotted a UID (even under a different name, address etc). This 1:N biometric comparison (N=size of the UID database) is the most compute-intensive operation of the UID server system.
    It sounds as if every search would take O(N) time. Why can't the data be treated as binary numbers that can be ordered and indexed? That would bring the search time down to O(log N), and possibly even lower if good algorithms are used. The difference is huge when N is as large as 1.2 billion.
  2. Memorization of UID (page 6)
    This section is about how long the UID string should be. In short, the string has to be as short as possible while still meeting the density requirement, and it should contain only digits, no letters. It is important to keep the UID simple and short to help residents remember their number.
    Firstly, the use of the Hindu-Arabic numeral system (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) is suggested, since these numerals are recognized and used by the largest subset of people in the country. Secondly, we suggest the use of 12 digits (11 + 1 checksum), since 11 digits give us a 100 billion number space, which in turn can provide a low density of used numbers.
    A 12-digit UID doesn't sound easy to memorize. However, this length is necessary to provide 80 billion unique UIDs.
  3. UID static PIN and dynamic PIN (page 7)
    In order to authenticate (ascertain that the resident is who s/he claims to be), a resident needs to provide his/her UID number as well as, say, a biometric marker such as a fingerprint.
    Using biometrics while issuing UIDs may be fine. But using biometrics for other important transactions might put the user at risk, e.g. the news story "Malaysia car thieves steal finger".
  4. Principles and Requirements (page 11)
    Number Generation: The numbers are generated in a random, non-repeating sequence. There are several approaches to doing this in the computer science literature. The algorithm and any "seed" chosen to generate IDs should not be made public and should be considered a national secret.
    This violates Shannon's maxim, "The enemy knows the system". Why can't the system rely on the secrecy of the seed only?
  5. The Checksum (page 12)
    There is one scheme that meets our requirements: the Verhoeff Scheme. This scheme is relatively complex, and in the days before ubiquitous computing, there was a tendency to avoid it in favor of simpler schemes. In this day and age however, and at the scale of the UID, precision must be the goal. The Verhoeff scheme catches all single errors and all adjacent transpositions. It also catches >95% of twin errors and >94% of jump transpositions.
    More information on the scheme can be found here: and
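To make the O(N) versus O(log N) question from point 1 concrete, here is a minimal sketch that assumes each enrolment record could be reduced to an exact integer key. That assumption is precisely what the paper says biometrics do not allow, so this only illustrates the cost difference being asked about, not a working de-duplication method. The function names are hypothetical.

```python
import bisect
import math


def linear_search(keys, target):
    """Compare against every key in turn: the 1:N scan, O(N) in the worst case."""
    for i, k in enumerate(keys):
        if k == target:
            return i
    return -1


def indexed_search(sorted_keys, target):
    """Binary search over keys kept in sorted order: O(log N) comparisons."""
    i = bisect.bisect_left(sorted_keys, target)
    if i < len(sorted_keys) and sorted_keys[i] == target:
        return i
    return -1


def worst_case_comparisons(n):
    """Worst-case comparison counts (linear scan, binary search) for n keys."""
    return n, math.ceil(math.log2(n + 1))


if __name__ == "__main__":
    # Hypothetical records already reduced to integer keys (the key assumption).
    keys = sorted([0b1011, 0b0110, 0b1110, 0b0001])
    print(indexed_search(keys, 0b0110))
    print(worst_case_comparisons(1_200_000_000))
```

For N = 1.2 billion the binary search needs at most 31 comparisons against up to 1.2 billion for the scan, which is the "huge difference" point 1 refers to.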
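The density argument in point 2 is simple arithmetic: 11 digits give 10^11 = 100 billion possible numbers, and if we assume about 1.2 billion residents (the N used in point 1), only a small fraction of that space is ever allotted. A quick sketch, with hypothetical helper names:

```python
def number_space(digits: int) -> int:
    """Count of distinct values a string of `digits` decimal digits can hold."""
    return 10 ** digits


def density(digits: int, residents: int) -> float:
    """Fraction of the number space that would be allotted to residents."""
    return residents / number_space(digits)


if __name__ == "__main__":
    space = number_space(11)              # the paper's 11 payload digits
    used = density(11, 1_200_000_000)     # assumed population of 1.2 billion
    print(f"{space:,} numbers, {used:.1%} in use")
    # prints: 100,000,000,000 numbers, 1.2% in use
```

Assuming allotted numbers are spread uniformly, this density is also roughly the chance that a randomly typed 11-digit number belongs to someone, before the check digit is even considered.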
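For point 5, here is a minimal sketch of the Verhoeff scheme using the commonly published tables (the multiplication table of the dihedral group D5, a position-dependent permutation table and an inverse table). The paper does not show its implementation, so the function names and string-based interface are my own assumptions.

```python
# Verhoeff check digit based on the dihedral group D5 (standard published tables).

# Multiplication table of D5.
D = (
    (0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
    (1, 2, 3, 4, 0, 6, 7, 8, 9, 5),
    (2, 3, 4, 0, 1, 7, 8, 9, 5, 6),
    (3, 4, 0, 1, 2, 8, 9, 5, 6, 7),
    (4, 0, 1, 2, 3, 9, 5, 6, 7, 8),
    (5, 9, 8, 7, 6, 0, 4, 3, 2, 1),
    (6, 5, 9, 8, 7, 1, 0, 4, 3, 2),
    (7, 6, 5, 9, 8, 2, 1, 0, 4, 3),
    (8, 7, 6, 5, 9, 3, 2, 1, 0, 4),
    (9, 8, 7, 6, 5, 4, 3, 2, 1, 0),
)

# Permutation applied to a digit at position i (counted from the right), row i % 8.
P = (
    (0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
    (1, 5, 7, 6, 2, 8, 3, 0, 9, 4),
    (5, 8, 0, 3, 7, 9, 6, 1, 4, 2),
    (8, 9, 1, 6, 0, 4, 3, 5, 2, 7),
    (9, 4, 5, 3, 1, 2, 6, 8, 7, 0),
    (4, 2, 8, 6, 5, 7, 3, 9, 0, 1),
    (2, 7, 9, 3, 8, 0, 6, 4, 1, 5),
    (7, 0, 4, 6, 9, 1, 3, 2, 5, 8),
)

# Inverse of each element of D5.
INV = (0, 4, 3, 2, 1, 5, 6, 7, 8, 9)


def verhoeff_check_digit(number: str) -> str:
    """Return the check digit to append to a string of decimal digits."""
    c = 0
    # The check digit will sit at position 0, so payload digits start at 1.
    for i, ch in enumerate(reversed(number)):
        c = D[c][P[(i + 1) % 8][int(ch)]]
    return str(INV[c])


def verhoeff_is_valid(number: str) -> bool:
    """True if the last digit of `number` is a correct Verhoeff check digit."""
    c = 0
    for i, ch in enumerate(reversed(number)):
        c = D[c][P[i % 8][int(ch)]]
    return c == 0


if __name__ == "__main__":
    payload = "12345678901"               # hypothetical 11-digit payload
    uid = payload + verhoeff_check_digit(payload)
    print(uid, verhoeff_is_valid(uid))
```

The textbook example is `verhoeff_check_digit("236")`, which gives `"3"`; swapping two adjacent digits of `"2363"` (for instance `"2336"`) fails validation, as the quoted passage says all adjacent transpositions should.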
Update: I mailed this link to Nandan Nilekani and he responded by asking me to send it to him as a document, which I eventually did.


Susam Pal said:

No, I haven't. Probably I should. I am curious about points 1, 3 and 4.

Utkarshraj Atmaram said:

Nice observations! Have you mailed these to nandan.nilekani _at_ nic _dot_ in?
