Generating fake phone numbers that are valid based on libphonenumber's rules - faker

I'd like to use libphonenumber to validate phone numbers, but a number of our integration tests use Faker to generate test data, and the numbers it generates are not always valid under the full validation rules defined in libphonenumber.
Is there any way we can use libphonenumber to generate a random set of valid phone numbers? I'm aware of getExampleNumber but this isn't suitable as it only returns a single number of a given type.

Related

Meteor Id collision proof shortening

Meteor uses its internal Random package to generate Mongo-Ids for the documents, where the used set of characters is defined as:
var UNMISTAKABLE_CHARS = "23456789ABCDEFGHJKLMNPQRSTWXYZabcdefghijkmnopqrstuvwxyz";
The method description for Random.id also states:
Return a unique identifier, such as "Jjwjg6gouWLXhMGKW", that is likely to be unique in the whole world.
which is defined for the default length of an Id (17 chars; each one of UNMISTAKABLE_CHARS).
Now I would like to use only the first N characters of the Id to shorten my URLs (which include the Ids to dynamically load pages that require a specific document, which is determined by the Id).
So if my original Id is
`v5sw59HEdX9KM5KQE`
I would like to use for example (consider a totally random-picked N=5 here):
{
_id:"v5sw59HEdX9KM5KQE",
short: "v5sw5"
}
as document schema and fetch the respective document by this Id using { short } as query in my Mongo.Collection.
Now my question is how many characters are sufficient to prevent collisions if between 5000 and 10000 documents (and thus Ids) are to be considered.
Note: I have some tools on entropy calculation and all these values (character set, length of the original Ids, number of documents) in front of me but I don't know how to wire this all up to safely calculate N.
If I understand correctly, besides the normal 17-character id generated for your documents' _id, you would like a shorter id so that URLs look less scary when they contain that id.
In your example you truncate the id, hence creating an explicit association between your shorter id and the original document id.
This sounds like Git's shortened commit hashes: How does Git(Hub) handle possible collisions from short SHAs?
You could follow a similar path, i.e. first determine an initial default length that is reasonable to avoid probable collisions (as explained in Peter O.'s answer), but explicitly check for uniqueness server-side and increase the length of any new shortened version in case of collision, until it becomes unique again.
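As a rough illustration of that "check and lengthen" idea, here is a minimal Python sketch. The function name `assign_short_id` and the in-memory set `taken` are hypothetical stand-ins; in a real application the uniqueness check would be a query against the `short` field of the collection, done server-side.

```python
from typing import Set


def assign_short_id(full_id: str, taken: Set[str], start_length: int = 5) -> str:
    """Return the shortest unused prefix of full_id, starting at start_length.

    `taken` stands in for the set of short ids already stored in the
    collection (an assumption made for this sketch).
    """
    for length in range(start_length, len(full_id) + 1):
        candidate = full_id[:length]
        if candidate not in taken:
            taken.add(candidate)  # reserve it so later calls see it
            return candidate
    raise ValueError("every prefix of this id is already in use")


taken_ids: Set[str] = set()
print(assign_short_id("v5sw59HEdX9KM5KQE", taken_ids))
print(assign_short_id("v5sw5ZZZZZZZZZZZZ", taken_ids))  # same first 5 chars, gets 6
```

Because lookups use the stored `short` value exactly, a longer short id for a later document does not disturb the earlier, shorter one.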
Generating identifiers at random already runs the risk, at least in theory, of generating a duplicate identifier. For the default length of MongoIDs (assuming there are 55^17 of them), the chance of having a duplicate MongoID reaches 50% after generating almost 731156 billion random MongoIDs (see "Birthday problem"), so the chance of a duplicate is negligible in practice for most applications.
Shortening a random identifier will make the collision problem even worse. In this case, for an ID length of 5 characters (resulting in 55^5 or 503284375 different IDs), the chance of having a duplicate ID reaches 50% after generating only about 26415 random IDs.
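To wire these numbers together for other lengths, the usual birthday-problem approximation can be sketched in Python as follows. It assumes each character is drawn uniformly and independently from the 55 characters of UNMISTAKABLE_CHARS; the function names are made up for this example.

```python
import math

ALPHABET_SIZE = 55  # number of characters in UNMISTAKABLE_CHARS


def collision_probability(num_ids: int, id_length: int,
                          alphabet: int = ALPHABET_SIZE) -> float:
    """Approximate chance that at least two of num_ids random ids coincide."""
    space = alphabet ** id_length
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2d))
    return -math.expm1(-num_ids * (num_ids - 1) / (2 * space))


def ids_for_probability(p: float, id_length: int,
                        alphabet: int = ALPHABET_SIZE) -> float:
    """Approximate number of random ids at which the collision chance reaches p."""
    space = alphabet ** id_length
    return math.sqrt(2 * space * math.log(1 / (1 - p)))


def min_length(num_ids: int, max_probability: float,
               alphabet: int = ALPHABET_SIZE) -> int:
    """Smallest id length whose collision chance stays at or below max_probability."""
    length = 1
    while collision_probability(num_ids, length, alphabet) > max_probability:
        length += 1
    return length


for n in (5000, 10000):
    print(n, "documents, N=5:", collision_probability(n, 5))
print("length for 10000 documents at a one-in-a-million risk:", min_length(10000, 1e-6))
```

What counts as a tolerable risk is a judgment call; the threshold passed to `min_length` is only an example value.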
Since it appears that you can't control how MongoIDs are generated as easily as you can control how shortened "unique IDs" are generated, one thing you can do is the following:
Create a document that assigns each MongoID to a uniquely assigned number (such as a monotonically increasing counter).
To make the numbers assigned this way "look random", feed each number to a so-called "full-period" linear congruential generator to get a unique but "randomized" number within the generator's period.
The numbers (encoded similarly to MongoID strings) can then serve as short identifiers for your purposes.
But consider whether you really need the short identifiers created this way to be unpredictable. Short identifiers, and the output of a linear congruential generator in particular, hardly achieve that unpredictability goal.
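The counter-plus-generator steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the multiplier and increment are arbitrary values chosen to satisfy the Hull-Dobell full-period conditions for a modulus that is a power of 55, the previous generator state plays the role of the stored counter, and none of the names come from Meteor.

```python
import math

ALPHABET = "23456789ABCDEFGHJKLMNPQRSTWXYZabcdefghijkmnopqrstuvwxyz"
BASE = len(ALPHABET)        # 55 = 5 * 11

# Hypothetical constants. For a modulus 55**k the generator has full period when
# (MULTIPLIER - 1) is divisible by 5 and 11 and INCREMENT is coprime to 55.
MULTIPLIER = 55 * 1234 + 1
INCREMENT = 7919
assert (MULTIPLIER - 1) % BASE == 0 and math.gcd(INCREMENT, BASE) == 1


def next_state(state: int, modulus: int) -> int:
    """One step of the linear congruential generator."""
    return (MULTIPLIER * state + INCREMENT) % modulus


def encode(value: int, length: int) -> str:
    """Write value as a fixed-length string over ALPHABET."""
    chars = []
    for _ in range(length):
        value, digit = divmod(value, BASE)
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))


def short_ids(count: int, length: int = 5, seed: int = 0):
    """Yield `count` short ids; no repeats until BASE**length ids were issued."""
    modulus = BASE ** length
    state = seed % modulus
    for _ in range(count):
        state = next_state(state, modulus)
        yield encode(state, length)


print(list(short_ids(5)))
```

In practice the last state would be persisted (for example in its own document) so that the sequence continues across restarts. Anyone who knows the constants can compute the next id from the current one, so the sequence only looks random.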
If you wish to go the route of using shortened MongoIDs, see "Birthday problem" for formulas you can use to estimate how many random numbers it takes for the risk of collision to remain tolerable.
For further information on how Meteor generates MongoIDs, see also this question; one of its answers includes a way you can have MongoDB generate MongoIDs on the server rather than have Meteor do so on the client. It appears, too, that Meteor doesn't check the MongoIDs it generates for uniqueness before inserting them into a document.
I would argue that if you want to avoid collisions on a small collection then you don't want to use random ids, but either go with fully deterministic IDs or at least reduce the randomness to something more controlled. Along those lines, another option for you to consider would be to use MONGO for idGeneration in your collection. Those IDs are generated following a known recipe. Accordingly you could take characters 1-4 and 12 of that ID and would get a guarantee for no hash collisions as long as no more than N documents are created in the same second, where N is the number of characters used in MongoIDs (which I don't know off hand).

symfony validator error name constant

I was wondering what is the usage of these codes that are in each validator, i.e. in https://github.com/symfony/symfony/blob/master/src/Symfony/Component/Validator/Constraints/NotBlank.php#L24
class NotBlank extends Constraint
{
const IS_BLANK_ERROR = 'c1051bb4-d103-4f74-8988-acbcafc7fdc3';
I could not find any documentation about it, neither in http://symfony.com/doc/master/validation/custom_constraint.html: what algorithm is used to generate them?
It seems to be a UUID. From Wikipedia:
A universally unique identifier (UUID) is a 128-bit number used to
identify information in computer systems. The term globally unique
identifier (GUID) is also used.
When generated according to the standard methods, UUIDs are for
practical purposes unique, without depending for their uniqueness on a
central registration authority or coordination between the parties
generating them, unlike most other numbering schemes. While the
probability that a UUID will be duplicated is not zero, it is close
enough to zero to be negligible.
In PHP you can generate it using UUID PECL package or using a library like this one.

How to generate unique DICOM UID?

I am working on DICOM gated (PET) data.
I would like to artificially create a DICOM image series which includes gated data. I am inquiring on the increment values of SOPInstanceUID which labels each image slice in each phase or gate.
These have different values for each slice in a gate and are incremented between gates but I can't find out the logic to how this value is chosen.
Is there a reference to where and how these values are written?
Multiple algorithms to generate DICOM UID are explained in this answer with their drawbacks.
As per the DICOM specifications, all UIDs, including the SOPInstanceUID in question, must be unique. This holds regardless of what data (gated PET data or other) you are working on.
Following is from specifications:
2017a Part 5 - Data Structures and Encoding (9 Unique Identifiers (UIDs))
Unique Identifiers (UIDs) provide the capability to uniquely identify a wide variety of items. They guarantee uniqueness across multiple countries, sites, vendors and equipment. Different classes of objects, instance of objects and information entities can be distinguished from one another across the DICOM universe of discourse irrespective of any semantic context.
UID consists of two parts:
Organization root:
This part of the UID ensures uniqueness across organizations. There are service providers who offer a root for free. Medical Connections is the one I am aware of. You can contact them to get one for free.
Suffix:
Further, you should generate suffix in such a way that it guarantees uniqueness inside your organization.
Following are the general rules for DICOM UID:
Total length must be <= 64 characters, including the stops
Must contain only digits 0-9 and full stops
Each numeric "component" (between stops) must be a valid and unambiguous integer number, and so must not have a leading zero (unless the whole component is zero)
Must be guaranteed to be unique - this means:
It must be derived from a proper official root under your sole control.
It must not be created by appending digits (however special you consider the combination!) to someone else's UID.
In particular, series UIDs for secondary capture images, KIN objects etc. must not be created as derivatives of the Study UID (unless you own that root!)
Related to the above, there is no expectation or requirement that the Study UID, Series UID and Instance UID for images should be derived from the same root (though in practice, Series UID and Instance UID normally are, as both must be generated internally by the equipment which generates the images)
Date and Time are useful for generating UIDs, but only if:
Each machine has a unique root (normally your company UID root + a machine specific suffix such as a serial number)
If it is possible for UIDs to be generated at > 1 per second, then a sequential counter should also be used
if on a multi-threaded machine, then the thread ID or a properly interlocked counter are needed to prevent 2 applications or 2 threads in the same application from generating identical UIDs simultaneously.
Do not use the time as a component on its own, because it is too easy to end up with a leading zero: e.g. in 20060724.093017 the time component starts with 0, so use 20060724093017 instead
Same can be found in specifications.
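The purely syntactic rules above (length, allowed characters, no leading zeros) are easy to check mechanically. The following Python sketch does only that; the function name is made up for this example, and it cannot tell you whether a UID is actually unique or derived from a root you own.

```python
import re

# One or more numeric components separated by single full stops; each
# component is either "0" or a number without a leading zero.
_UID_PATTERN = re.compile(r"(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*")


def is_syntactically_valid_uid(uid: str) -> bool:
    """Check the length and character rules for a DICOM UID (syntax only)."""
    return len(uid) <= 64 and _UID_PATTERN.fullmatch(uid) is not None


for candidate in ("1.2.840.10008.1.2", "20060724.093017", "20060724093017"):
    print(candidate, is_syntactically_valid_uid(candidate))
```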
Following example is from DICOM Specifications to generate UID. Please note that this is Informative section.
2017a Part 5 - Data Structures and Encoding (B Creating a Privately Defined Unique Identifier (Informative))
B.1 Organizationally Derived UID:
The following example presents a particular choice made by a specific
organization in defining its suffix to guarantee uniqueness of a SOP
Instance UID.
"1.2.840.xxxxx.3.152.235.2.12.187636473"
In this example, the root is:
1 Identifies ISO
2 Identifies ANSI Member Body
840 Country code of a specific Member Body (U.S. for ANSI)
xxxxx Identifies a specific Organization.(assigned by ANSI)
In this example the first two components of the suffix relate to the
identification of the device:
3 Manufacturer defined device type
152 Manufacturer defined serial number
The remaining four components of the suffix relate to the
identification of the image:
235 Study number
2 Series number
12 Image number
187636473 Encoded date and time stamp of image acquisition
In this example, the organization has chosen these components to
guarantee uniqueness. Other organizations may choose an entirely
different series of components to uniquely identify its images. For
example it may have been perfectly valid to omit the Study Number,
Series Number and Image Number if the time stamp had a sufficient
precision to ensure that no two images might have the same date and
time stamp. Because of the flexibility allowed by the DICOM Standard
in creating organizationally derived UIDs, implementations should not
depend on any assumed structure of UIDs and should not attempt to
parse UIDs to extract the semantics of some of its components.
There is one more way mentioned in specifications
2017a Part 5 - Data Structures and Encoding (B Creating a Privately Defined Unique Identifier (Informative))
B.2 UUID Derived UID:
UID may be constructed from the root "2.25." followed by a decimal representation of a Universally Unique Identifier (UUID). That decimal representation treats the 128 bit UUID as an integer, and may thus be up to 39 digits long (leading zeros must be suppressed).
A UUID derived UID may be appropriate for dynamically created UIDs, such as SOP Instance UIDs, but is usually not appropriate for UIDs determined during application software design, such as private SOP Class or Transfer Syntax UIDs, or Implementation Class UIDs.
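A UUID derived UID as described in B.2 can be sketched in a few lines of Python with the standard `uuid` module. The function name is hypothetical; using a random (version 4) UUID is a choice made for this example, not something the quoted text prescribes.

```python
import uuid
from typing import Optional


def uuid_derived_uid(value: Optional[uuid.UUID] = None) -> str:
    """Build a DICOM UID from the root "2.25." and a UUID written in decimal."""
    if value is None:
        value = uuid.uuid4()  # random UUID; an assumption of this sketch
    # str() of an int never has leading zeros, as B.2 requires.
    return "2.25." + str(value.int)


print(uuid_derived_uid())
```

Since the decimal form has at most 39 digits, the result is at most 44 characters long and stays within the 64-character limit.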

Can One Time Passwords be used as identifiers?

If I have a bunch of mixed OTPs and I know all of their generation seeds (the OTP URIs), can I group them by source URI?
I have a use case where I need the system to be 100% blind to the relationships in the data it's passing around.
For example: if users enter OTPs from their smartphones instead of their logins, it should become very difficult to identify the entries made by one user. When data is exported from the system, is it possible for someone who has the OTP seeds to re-establish each entry's ownership?
That's possible, but expensive: you will need to generate codes for all the seeds you have and then check whether any of them match.
Also, different seeds can produce the same code at some moment. To avoid this problem you can ask the user for several consecutive codes, which significantly decreases the probability of codes matching purely by chance.
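A minimal Python sketch of that brute-force matching is shown below. It assumes time-based codes (TOTP as in RFC 6238, with HMAC-SHA1, 30-second steps and 6 digits) and that the time of each entry is known; the function `owners_of` and the `seeds` dictionary are hypothetical. Secrets in an OTP URI are Base32 encoded, so they would have to be decoded to raw bytes first (for example with `base64.b32decode`).

```python
import hashlib
import hmac
import struct
from typing import Dict, List


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """Counter-based OTP (RFC 4226) using HMAC-SHA1."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation
    number = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(number % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Time-based OTP (RFC 6238): HOTP over the number of elapsed time steps."""
    return hotp(secret, int(unix_time) // step, digits)


def owners_of(code: str, unix_time: int, seeds: Dict[str, bytes],
              window: int = 1, step: int = 30) -> List[str]:
    """Return every seed name whose code matches within +/- window time steps."""
    counter = int(unix_time) // step
    return [name for name, secret in seeds.items()
            if any(hotp(secret, counter + delta, len(code)) == code
                   for delta in range(-window, window + 1))]


seeds = {"alice": b"12345678901234567890", "bob": b"a-different-secret"}
code = totp(seeds["alice"], 1_700_000_000)
print(owners_of(code, 1_700_000_000, seeds))
```

The cost grows with the number of seeds times the number of time steps tried per entry, and a result list with more than one name is exactly the accidental-match case that asking for several consecutive codes is meant to rule out.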

ASP.NET: Generating Order ID?

I am getting ready to launch a website I designed in ASP.NET.
The problem is, I don't want my customers to have a super low order ID (example: #00000001).
How would I generate a unique (and random) order ID, so the customer would get an order number like K20434034?
Set your Identity Seed for your OrderId to a large number. Then when you present an order number to the user, you could have a constant that you prepend to the order id (like all orders start with K), or you could generate a random character string and store that on the order record as well.
There are multiple options from both the business tier and database:
Consider
a random number has a chance of collision
it is probably best not to expose an internal ID, especially a sequential one
a long value will annoy users if they ever have to type or speak it
Options
Generate a cryptographically random number (an Int64 generated with RNGCryptoServiceProvider is unpredictable and has a very low chance of collision)
start an auto-incremented column at some arbitrary number other than zero
use UNIQUEIDENTIFIER (or System.Guid) and base 62 encode the bytes
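To make the first option concrete, here is a small language-neutral sketch written in Python rather than C#. The function `new_order_id` and the in-memory set are hypothetical; in a real site the uniqueness check would be a unique constraint or lookup in the orders table, and the C# equivalent would draw its random number from a cryptographic generator instead of Python's `secrets` module.

```python
import secrets
from typing import Set


def new_order_id(existing: Set[str], prefix: str = "K", digits: int = 8) -> str:
    """Return a random order id such as K20434034 that is not in `existing`."""
    lowest = 10 ** (digits - 1)  # avoid a leading zero in the numeric part
    while True:
        number = lowest + secrets.randbelow(9 * lowest)
        candidate = prefix + str(number)
        if candidate not in existing:  # retry on the rare collision
            existing.add(candidate)
            return candidate


issued: Set[str] = set()
print(new_order_id(issued))
```

With 8 digits there are 90 million possible values, so retries are rare for a small shop but the check is still needed.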
I suggest you just start the identity seed at some higher number if all you care about is that they don't think the number is low. The problem with random is that there is always the chance for collisions, and it gets more and more expensive to check for duplicates as the number of existing order IDs piles up.
Make the column's data type UNIQUEIDENTIFIER. This data type will give you an ID in the format shown below. Hope this fulfills the need.
B85E62C3-DC56-40C0-852A-49F759AC68FB.
