I want to generate a unique random number sequence in Qt. Using QDateTime::currentDateTime().toTime_t() as the seed value, will qrand() generate unique random numbers?
No. qrand can only generate as many unique numbers as fit into an integer, so -- whatever the implementation -- you cannot count on uniqueness.
Also, knowing that a different seed creates a different random integer would yield a level of predictability that effectively makes qrand not random anymore.
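The first point is the pigeonhole principle, and it can be sketched in a few lines. This is an illustration in Python rather than Qt; RANGE_SIZE is a hypothetical output range chosen for the example, not a Qt constant.

```python
import random

# Assumption: the generator can return only RANGE_SIZE distinct values.
# 32768 is a made-up figure for this sketch, not taken from Qt.
RANGE_SIZE = 32768

def has_duplicate(values):
    """Return True if any value occurs more than once."""
    return len(set(values)) < len(values)

rng = random.Random(2024)
# Drawing one more value than the range holds forces at least one repeat.
draws = [rng.randrange(RANGE_SIZE) for _ in range(RANGE_SIZE + 1)]
print(has_duplicate(draws))
```

In practice repeats show up far earlier than that, but this bound holds for any generator regardless of its quality.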
Edit: I swear I'm not trying to make fun of you by posting a cartoon; I think this is a quite good explanation of the problem:
(source: dilbert.com)
Depending on how you store your session ids, you can generate a (mostly) guaranteed unique identifier by using a UUID. See the documentation for QUuid. Also be aware of this (bold added):
You can also use createUuid(). UUIDs generated by createUuid() are of the random type. Their QUuid::Version bits are set to QUuid::Random, and their QUuid::Variant bits are set to QUuid::DCE. The rest of the UUID is composed of random numbers. Theoretically, this means there is a small chance that a UUID generated by createUuid() will not be unique. But it is a very small chance.
I can vouch for the fact that those generated UUIDs won't necessarily be unique, so if you do need them to be unique, look into libuuid or something similar.
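For readers who want to see what a random-type UUID looks like outside Qt, here is a minimal sketch using Python's standard uuid module. It is an analogue of QUuid::createUuid(), not the Qt API itself, and the same caveat applies: the value is random, so uniqueness is very likely but not guaranteed.

```python
import uuid

# uuid4() builds a UUID from random bits, with the version and variant
# fields fixed, similar in spirit to the random type described above.
session_id = uuid.uuid4()

print(session_id)            # canonical 36-character text form
print(session_id.version)    # version field of a random UUID
print(session_id.variant)    # variant field
```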
According to the Qt documentation, qrand() is just a thread-safe version of the standard rand(). Based on that description, I wouldn't assume the method used is any more secure than, or superior to, that of rand().
I think you need to use different terminology than 'unique' random numbers (no Pseudo-Random Number Generator will produce a unique stream, as input X will always produce output Y). What's the actual situation?
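The "input X will always produce output Y" point can be checked directly. The sketch below uses Python's random module as a stand-in for qrand(); the helper name first_values is made up for the example.

```python
import random

def first_values(seed, count=5):
    """Return the first `count` outputs of a PRNG seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.randrange(2**31) for _ in range(count)]

# The same seed reproduces the same stream every time.
print(first_values(1234) == first_values(1234))
# A different seed gives a different stream, but each stream is still
# fully determined by its seed.
print(first_values(1234) == first_values(1235))
```

So a time-based seed only changes which deterministic stream you get; it says nothing about values being unique within or across streams.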
For the purpose of reproducibility, one has to choose a seed. In R, we can use set.seed().
My question is, when the seed is not set explicitly, how does the computer choose the seed?
Why is there no default seed?
A pseudo-random number generator (PRNG) needs a start value, which you can set with set.seed(). If none is given, it generally takes computer-based information. This could be the time, the CPU temperature, or something similar. If you want a more random start value, it is possible to use physical sources, like white noise or nuclear decay, but you generally need an external information source for this kind of randomness.
The documentation mentions R uses current time and the process ID:
Initially, there is no seed; a new one is created from the current time and the process ID when one is required. Hence different sessions will give different simulation results, by default. However, the seed might be restored from a previous session if a previously saved workspace is restored.
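As a rough illustration of the idea in the quoted documentation, the following Python sketch derives a seed from the clock and the process ID. The mixing formula and the names make_default_seed and sample are assumptions made for this example; this is not how R actually combines the two values.

```python
import os
import random
import time

def make_default_seed():
    """Derive a 32-bit seed from the clock and the process ID.

    The XOR/shift mixing is illustrative only, not R's algorithm.
    """
    return (time.time_ns() ^ (os.getpid() << 16)) & 0xFFFFFFFF

def sample(seed, count=3):
    """Draw `count` uniform numbers from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(count)]

# Explicit seed: reproducible, like calling set.seed(42) in R.
print(sample(42) == sample(42))
# Derived seed: usually differs between runs and between processes.
print(sample(make_default_seed()))
```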
A default seed is a bad idea, since a random generator would then always produce the same sequence of numbers by default. If you always use the same seed, the output is no longer randomized, because the same numbers come out every time. You would just be providing a fixed data sample, which is not the intended output of a PRNG. You could of course turn the default seed off (if there were one), but the primary purpose is to generate a completely random set of data, not a fixed one.
For statistical approaches it matters for validation and verification reasons, but it's getting more important when you get to cryptography. In this field a good PRNG is mandatory.
I need to implement a coupon-code feature. Because of the number of codes required and some other constraints, I can't store them in a database. In addition, the displayed codes need to be short (around 10 characters).
My original idea was to use a cryptographic function to create codes by encrypting an ongoing counter, but I'm at a loss as to what method to use.
Because of the counter I would be encoding only a couple of bytes, and I am aware that many algorithms are not secure when used with very short messages.
Is my approach a good idea?
What algorithm could I use?
I'm not sure if this is what you're after, and as per my comment, you have no real guarantee of security, but one possible answer could be to seed a PRNG with some number and give out the first x numbers as codes. As long as x is much smaller than the total possible number of outcomes, the chance of repetition is small, and codes could be validated by re-generating the sequence (you may want to hash parts of it for speed purposes).
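A minimal sketch of that scheme in Python follows. The names generate_codes and is_valid, the seed value, and the use of Python's random module are all assumptions for illustration; that module is not cryptographically secure, so this carries the same "no real guarantee of security" caveat.

```python
import random
import string

ALPHABET = string.ascii_letters + string.digits  # 62 characters

def generate_codes(seed, count, length=10):
    """Regenerate the first `count` codes produced from `seed`."""
    rng = random.Random(seed)
    return ["".join(rng.choice(ALPHABET) for _ in range(length))
            for _ in range(count)]

def is_valid(code, seed, count, length=10):
    """Validate a code by re-generating the sequence and looking it up."""
    return code in set(generate_codes(seed, count, length))

SECRET_SEED = 42  # hypothetical; a real deployment would keep this private
codes = generate_codes(SECRET_SEED, 1000)
print(codes[0], is_valid(codes[0], SECRET_SEED, 1000))
```

Validation here costs one full regeneration per lookup, which is why caching or hashing parts of the sequence may be worth it.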
If you use base 62 ([a-z A-Z 0-9]) with 10 characters, there are over 839 quadrillion possible outcomes. If you were to give everyone on the planet a unique code, you would have used roughly 0.0000009% of your addressable space.
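The arithmetic behind those figures can be reproduced as follows. The world population of 7.5 billion is an assumed round number for this sketch.

```python
ALPHABET_SIZE = 62                   # a-z, A-Z, 0-9
CODE_LENGTH = 10
ASSUMED_POPULATION = 7_500_000_000   # assumption, not from the answer

total_codes = ALPHABET_SIZE ** CODE_LENGTH
percent_used = 100 * ASSUMED_POPULATION / total_codes

print(f"{total_codes:,} possible codes")
print(f"{percent_used:.7f}% of the space used")
```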
I've been looking into hash tables, where some data is hashed and that is used for a bucket index.
Some libraries use the modulo of the hash with the bucket size, and others use a bit-mask, where only the bits selected by the bucket mask are used (ensuring the range is not exceeded).
bitmask:
index = h->hash_func(key) & h->hash_mask;
modulo:
index = h->hash_func(key) % h->bucket_tot;
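To make the relationship between the two forms concrete, here is a small Python sketch (not C, and not taken from any of the libraries mentioned). It assumes the mask equals the bucket count minus one and that the bucket count is a power of two; under that assumption both forms give the same index for non-negative hashes.

```python
def index_by_mask(hash_value, bucket_count):
    """Bucket index via bit-mask; bucket_count must be a power of two."""
    if bucket_count <= 0 or bucket_count & (bucket_count - 1):
        raise ValueError("bit-masking needs a power-of-two bucket count")
    return hash_value & (bucket_count - 1)

def index_by_modulo(hash_value, bucket_count):
    """Bucket index via modulo; works for any positive bucket count."""
    return hash_value % bucket_count

# Same result for every non-negative hash when the size is 2**k.
print(all(index_by_mask(h, 16) == index_by_modulo(h, 16)
          for h in range(1000)))
# Modulo also handles sizes that are not powers of two.
print(index_by_modulo(100, 7))
```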
While there are obvious differences between the two (bucket size constraints with bit-masks, the need for hashing to give a good distribution on the lower bits, the speed of modulo, etc.), are there strong reasons to choose one over the other?
(I'll probably try & benchmark for my own use-case, but I'm curious what's already known on the matter.)
Note, this is simply for key:value store, (dictionary/hash/associative-array) and not security related.
Example of a dynamic resizing, chaining hash table implementation using bit-mask:
https://github.com/amadvance/tommyds/blob/master/tommyds/tommyhashdyn.c
https://github.com/GNOME/glib/blob/master/glib/ghash.c
Example using modulo:
https://www.daniweb.com/software-development/c/threads/104887/sucinct-example-of-hash-table-w-chaining
You mentioned a "bucket" index, so I assume you mean hash tables with separate chaining as collision resolution. In this case there are no reasons for choosing modulo or a bit mask that are "stronger" than the ones you mentioned (which, BTW, are not as obvious as you said).
In some languages, most notably Java and other JVM-based ones, an array index is a signed 32-bit integer, so the maximum table size with a bit mask is 2^30. That could be insufficient, and it is a strong reason to use a non-power-of-two table size with modulo, which lets you get very close to 2^31-1 (the largest signed 32-bit integer). But since you used C++ syntax, this shouldn't be a concern for you.
Also, if you meant not only separate chaining, some open addressing collision resolution algorithms require table size to meet certain conditions, for example, if you implement double hashing, table size should be prime. In this case you obviously should use only modulo to obtain the initial index in the table.
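The reason a prime size matters for double hashing is that the probe step must be able to reach every slot. A short Python sketch, with a hypothetical helper probe_sequence and hand-picked start and step values standing in for the two hash functions:

```python
def probe_sequence(start, step, table_size):
    """Slots visited by double hashing: start, start+step, ... (mod size)."""
    return [(start + i * step) % table_size for i in range(table_size)]

# Prime size: any step from 1 to size-1 reaches every slot.
print(sorted(set(probe_sequence(3, 4, 7))))
# Power-of-two size with an even step: only a few slots are ever probed.
print(sorted(set(probe_sequence(3, 4, 8))))
```

With a size of 8 and a step of 4 the probe bounces between two slots, so an insert can fail even though the table is mostly empty.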
It isn't always just about performance either; sometimes it's about the domain of your problem. You may, for example, have a hash function that returns negative numbers as well. With modulo you have to write special cases to handle them; not so with a bitmask.
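A sketch of the negative-number issue, written in Python. Python's own % operator already returns a non-negative result for a positive divisor, so the helper c_style_remainder (a name invented here) emulates the C behaviour, where the remainder takes the sign of the dividend.

```python
def c_style_remainder(a, n):
    """Emulate C's `a % n`: the result has the sign of the dividend."""
    r = abs(a) % abs(n)
    return -r if a < 0 else r

def safe_modulo_index(hash_value, bucket_count):
    """The special case modulo needs: shift negatives back into range."""
    return (c_style_remainder(hash_value, bucket_count) + bucket_count) % bucket_count

negative_hash = -7
print(c_style_remainder(negative_hash, 8))   # negative, unusable as an index
print(safe_modulo_index(negative_hash, 8))   # corrected index
print(negative_hash & (8 - 1))               # mask needs no special case
```

On two's complement integers the mask simply keeps the low bits, so the result is always within range.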
I have an application whereby users have their own IDs.
The IDs are unique.
The IDs are GUIDs, so they include letters and numbers.
I want a formula whereby, if I have both IDs, I can find their combined GUID, regardless of which order I use them in.
These GUIDs are 16 digits long; for the example below I will pretend they are 4.
user A: x43y
user B: f29a
If I use formula X which takes two arguments: X(a,b) I want the produced code to give the same result regardless whether a = UserA or UserB's GUID.
I do not require a method to find either user's ID, given the other one, from this formula, i.e. it is a one-way method.
Thank you for any answers or direction
So I'll turn my comment into an answer. Then this question can get answered, the answer accepted (if it is good enough) and we can all move on.
Sort the GUIDs lexicographically and append the second to the first. The result is unique, and has all the other characteristics you've asked for.
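A minimal Python sketch of the sort-and-append approach, using the shortened IDs from the question. The function name combine_ids is made up for the example.

```python
def combine_ids(id_a, id_b):
    """Order-independent combination: sort, then append second to first."""
    first, second = sorted((id_a, id_b))
    return first + second

user_a = "x43y"
user_b = "f29a"
print(combine_ids(user_a, user_b))
print(combine_ids(user_b, user_a))  # same result in either order
```

Because both inputs have a fixed length, the combined string can be split back into the two IDs, which is why distinct pairs never produce the same result.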
Can you compress it (I know you wrote shorten, but bear with me) down to 16 characters? No, you can't; not, that is, if you want to be able to decompress it again and recover the original bits. (You've written that you don't need to be able to recover the original GUIDs, so skip the next paragraph if you want to.)
A GUID is, essentially, a random sequence of 128 bits. Random sequences can't, by definition, be compressed. If a sequence of 128 bits is compressible it can't be random, because there would have to be some algorithm for inflating the compressed version back to 128 bits. I know that since GUIDs are generated algorithmically they're not truly random. However, in practice there is almost no point in regarding them as anything other than truly random; I certainly don't think you should waste your time trying to compress them.
Given that the total population of possible GUIDs is large, you might be satisfied by a method which takes the first half of each individual GUID and assembles a pseudo-GUID from them. Depending on how many GUIDs your system is likely to be working with, and your appetite for risk, this might satisfy your practical needs.
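The half-and-half idea could be sketched like this in Python. Sorting first keeps the result order-independent, which is an assumption added here to match the question's requirement; the function name short_combined_id is hypothetical. The second pair shows the risk being accepted: different inputs can collide.

```python
def short_combined_id(id_a, id_b):
    """Pseudo-GUID from the first half of each ID, order-independent."""
    first, second = sorted((id_a, id_b))
    half = len(first) // 2
    return first[:half] + second[:half]

print(short_combined_id("x43y", "f29a"))
print(short_combined_id("f29a", "x43y"))   # same result in either order
# Collision: a different pair that shares the same leading halves.
print(short_combined_id("f2zz", "x4qq"))
```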
I am interested in a modification of the usual idea of random number generators. Typical generators produce long sequences of reasonably independent, uniformly distributed numbers from some space, and are intended to be used repeatedly with one seed.
However, for my purpose, I want a way of generating a "random number" from another number (actually from a point on a grid of integers) in a way that is "independent," in the sense that knowing the outputs for nearby points doesn't help you predict the value at your point.
In practice, using traditional random number generators works reasonably well, but I'd be interested in any work that was actually done for this purpose.
It sounds like you are looking for a cryptographic hash function.
The ideal cryptographic hash function has four main properties:
it is easy to compute the hash value for any given message
it is infeasible to generate a message that has a given hash
it is infeasible to modify a message without changing the hash
it is infeasible to find two different messages with the same hash
Some commonly used hash functions are SHA-1 and SHA-512. One called MD5 is still being used even though it has been shown to be insecure.
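Applied to the grid question, one way to use a hash function is to hash the coordinates and turn part of the digest into a number. The Python sketch below uses SHA-512 from the standard hashlib module; the function name grid_value, the key parameter, and the coordinate encoding are assumptions made for this example.

```python
import hashlib
import struct

def grid_value(x, y, key=b"example-key"):
    """Map an integer grid point to a repeatable value in [0, 1).

    `key` plays the role of a seed: a different key gives a different field.
    """
    data = key + struct.pack("<qq", x, y)       # two signed 64-bit integers
    digest = hashlib.sha512(data).digest()
    top_bits = int.from_bytes(digest[:8], "big") >> 11   # keep 53 bits
    return top_bits / 2**53

# The same point always gives the same value; neighbours look unrelated.
for point in [(3, 4), (3, 5), (4, 4)]:
    print(point, grid_value(*point))
```

Unlike a sequential generator, each value depends only on its own coordinates, so points can be evaluated in any order.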