Is it OK to use an encrypted/hashed e-commerce customer email as the Google Analytics User ID?
I found different privacy policy sections about the use of PII in Google Analytics. For example, here it says it is OK to use the encrypted, hashed form of the data. But here, in the caution section, it says we are not allowed to use PII. I will be using the Measurement Protocol and GTM to send the data to Google Analytics.
If I use a proper level of encryption + hashing, will it be OK to use the customer email address (in hashed, encrypted form) as the User ID in Google Analytics?
Regards,
Lina
Yes, it is OK to use SHA-256-hashed PII as you pointed out: hashing destroys the original data, so it is no longer PII. Cryptographic hash functions such as SHA-256 are one-way functions, so you can't work out the input from the output. (FYI, you can brute-force inputs until one matches a given output, especially with weaker algorithms such as MD5, which is how hashed passwords get guessed to break into a system. For the purpose of hiding PII, though, hashing still does its job: you simply cannot know with certainty what the original PII was, so mission accomplished as far as protecting PII.)
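As a minimal sketch of the hashing step described above, assuming Python on the server side: the function name `email_to_user_id`, the normalisation step and the use of a salt are illustrative assumptions, not something the question or the answer prescribes.

```python
import hashlib


def email_to_user_id(email: str, salt: str) -> str:
    """Return a hex SHA-256 digest to use as a pseudonymous User ID."""
    # Assumption: normalise so " Lina@Example.com " and "lina@example.com" match.
    normalized = email.strip().lower()
    return hashlib.sha256((salt + normalized).encode("utf-8")).hexdigest()


# Hypothetical values, for illustration only.
user_id = email_to_user_id("lina@example.com", salt="replace-with-a-secret-salt")
print(len(user_id))  # a SHA-256 hex digest is always 64 characters long
```

The same email always maps to the same ID as long as the salt stays the same, so the salt has to be stored somewhere stable and kept out of the client-side code.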
The only downside of using hashing to generate a User ID is collision: SHA-256 has 2^256 possible outputs, so if you're really unlucky it's possible that different emails produce the same SHA-256 hash and thus the same User ID, in which case different users will be incorrectly identified as the same user. By the birthday bound, the chance that any two of n emails collide is roughly n^2 / 2^257, which is negligible for any realistic number of customers. To reduce the chance of collision further you could combine the hash with other attributes, eg {user_signup_timestamp}-{email_hash}, but the only way to guarantee there is no collision is to rely on a database ID for each user, as the DB will ensure each User ID is unique.
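To put a number on how unlikely a collision is, the standard birthday-bound approximation p ≈ n(n-1) / 2^(b+1) (n inputs, b-bit hash) can be evaluated directly. This is a rough sketch; the function name and the figure of one billion emails are illustrative assumptions.

```python
from fractions import Fraction


def collision_probability(num_inputs: int, hash_bits: int = 256) -> Fraction:
    """Birthday-bound approximation: p ~ n(n-1) / 2^(bits+1)."""
    return Fraction(num_inputs * (num_inputs - 1), 2 ** (hash_bits + 1))


# Hypothetical shop with one billion customer emails.
p = collision_probability(10**9)
print(float(p))  # a vanishingly small number, far below 1e-50
```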
A typical example of hashing use would be the storage of passwords or sensitive data because this form of encryption is irreversible, but if it cannot be decrypted, why store it? The only possible use (from my limited knowledge) would be to have a user enter a password, have a program hash it and then check whether the user input hash is the same as the stored hash for said user. Is that a (or the only) correct scenario? What am I missing here? If that isn't the case, then how are passwords checked for correctness, and why not just delete the data instead of one-way encrypt it?
A typical example of hashing use would be the storage of passwords
The purpose of a hash (generally) is to create a fixed-size fingerprint of an input of any size. A cryptographic hash has extra properties. The most important ones in this context: it is hard (practically impossible) to derive any information about the input from the hash, and hard to create a second input with the same hash (intentionally or not).
So there are other uses of a hash function:
anonymizing data
integrity check, that data are not changed
referencing large content
...
but if it cannot be decrypted, why store it?
Because we can check whether two pieces of content are the same without needing to know or read the content itself.
or sensitive data because this form of encryption is irreversible
No. A hash does not store the information, and hashing is not a form of encryption.
The only possible use (from my limited knowledge) would be to have a user enter a password, have a program hash it and then check whether the user input hash is the same as the stored hash for said user. Is that a (or the only) correct scenario?
Basically yes. Reality is a little more complex: for storing user credentials, the best option we know of today is a slow salted hash, such as PBKDF2, bcrypt, scrypt or Argon2.
and why not just delete the data instead of one-way encrypt it?
Because you need to compare the user's password (its hash) to check whether it is correct. Or to check that some data has not changed.
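The store-then-compare flow described above could look roughly like this with PBKDF2 from Python's standard library. It is a sketch under assumptions: the function names and the iteration count are illustrative, and a real system would pick the work factor from current guidance.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumption: illustrative work factor, tune for real use


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); both are stored, the password itself is not."""
    salt = os.urandom(16)  # fresh random salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the submitted password with the stored salt and compare."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)  # constant-time comparison


salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))    # False
```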
Scenario: I need to store a document accepted by the customer in my database. The customer needs to be sure that I don't modify it over time, and I need to be able to prove that the stored document was accepted by the customer.
Do you know of proven ways to achieve this without doubts from either side?
I think I can create a checksum from the stored data for the customer, but I need to ensure that this checksum cannot be modified by the customer. Any ideas?
PS. If you have better idea how to title this question then tell me, please.
PS. Let me know if you see better forum to ask this question, please.
In cryptography this is called data integrity.
To ensure that the data has not been changed by you or anyone else, your customer can calculate the hash of the file with a cryptographic hash function, which is designed to be collision resistant. I.e.
Hash(Original) != Hash(Modified) // equality almost impossible
In short, when you modify the document, the probability that the modified document has the same hash value as the original is negligible (the cryptographic term for practically impossible).
Your customer can use the SHA-3 hash function, which is standardized by NIST.
Don't use SHA-1, which has been broken: a practical collision was demonstrated by the SHAttered attack.
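A small sketch of the integrity check described above, using SHA3-256 from Python's `hashlib`. The document contents and the function name are made up for illustration.

```python
import hashlib


def document_digest(document: bytes) -> str:
    """Return the SHA3-256 digest of the document as a hex string."""
    return hashlib.sha3_256(document).hexdigest()


# Hypothetical document contents.
original = b"Customer accepts the terms dated 2024-01-01."
modified = b"Customer accepts the terms dated 2024-01-02."

accepted_digest = document_digest(original)  # kept by the customer
print(document_digest(original) == accepted_digest)  # True
print(document_digest(modified) == accepted_digest)  # False
```

A bare hash only helps the party who holds the digest; it does not by itself prove who accepted the document.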
If you want to go further, your customer can use HMAC, a keyed hash construction that provides both data integrity and authentication of the data.
For the second part, we can solve it by digital signatures. Your customer signs the message
Sign(hash(message))
and gives you
( Sign(hash(message)), message )
and his public key.
You can verify the signature with the customer's public key to see whether the data has been changed. Digital signatures give us non-repudiation.
This part actually solves your two problems. Even third parties can check that the data is not modified and comes from the signer (your customer).
Note: don't use checksums that are not cryptographically secure; with most of them it is easy to modify the document in such a way that it keeps the same checksum.
I was hoping someone could help me sort something out. I've been working on a shopping cart plugin for WordPress for quite a while now. I started coding it at the end of 2008 (and it's been one of those "work on it when I have time" projects, so the going is very slow, obviously!) and got pretty far with it. Even had a few testers take me up on it and give me feedback. (Please note that this plugin is also meant to be a free download - I have no intention of making it a premium plugin.)
Anyway, in 2010, when all the PCI/DSS stuff became standard, I shelved it, because the plugin was meant to retain certain information in the database, and I was not 100% sure what qualified as "sensitive data," and I didn't want to put anything out there that might compromise anyone, and possibly come back on me.
Over the last few weeks, some colleagues and I have been having a discussion about PCI/DSS compliance, and it's sparked a re-interest in finally finishing this plugin. I'm going to remove the storage of credit card numbers and any data of that nature, but I do like the idea of storing the names and shipping addresses of people who voluntarily might want to create an account with the site that might use this plugin so if they shop there again, that kind of info is retained. Keep in mind, the data stored would be public information - the kind of thing you'd find in a phone book, or a peek in the record room of a courthouse. So nothing like storing SS#'s, medical histories or credit card numbers. Just stuff that would maybe let someone see past purchases, and retain some info to make a future checkout process a bit easier.
One of my colleagues suggested I still do something to enhance security a bit, since the name and shipping address would likely be passed to whatever payment gateway the site owner would choose to use. They suggested I use "one-way encryption." Now, I'm not a huge security freak, but I'm pretty sure this involves (one aspect anyway) stuff like MD5 hashes with salts, or the like. So this confuses me, because I wouldn't have the slightest idea of where to look to see how to use that kind of thing with my code, and/or if it will work when passing that kind of data to PayPal or Google Checkout, or Mal's, or what have you.
So I suppose this isn't an "I need code examples" kind of question, but more of a "please enlighten me, because I'm sort of a dunce" kind of question. (which, I'm sure, makes people feel much better about the fact that I'm writing a shopping cart plugin LOL)
One-way encryption is used to store information in the database that you don't need back out of the database again in its unencrypted state (hence the one-way moniker). It could, in a more general sense, be used to demonstrate that two different people (or systems) are in possession of the same piece of data. Git, for instance, uses hashes to check if files (and indeed entire directory structures) are identical.
Generally in an ecomm context hashes are used for passwords (and sometimes credit cards) because as the site owner, you don't need to retain the actual password, you just need a function to be able to determine if the password currently being sent by the user is the same as the one previously provided. So in order to authenticate a user you would pass the password provided through the hashing algorithm (MD5, SHA, etc) in order to get a 'hash'. If the hash matches the hash previously generated and stored in the database, you know the password is the same.
WordPress uses salted hashes to store its passwords. If you open up your wp_users table in the database you'll see the hashes.
Upside to this system is that if someone steals your database, they don't get the original passwords, just the hash values which the thief can't then use to log in to your users' Facebook, banking, etc sites (if your user has used the same password). Actually, they can't even use the hashes to log in to the site they were stolen from as hashing a hash produces a different hash.
The salt provides a measure of protection against dictionary attacks on the hash. There are databases available of mappings between common passwords and hash values where the hash values have been generated by regularly used one way hash functions. If, when generating the hash, you tack a salt value on to the end of your password string (eg my password becomes abc123salt), you can still do the comparison against the hash value you've previously generated and stored if you use the same salt value each time.
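The effect of a salt on such precomputed tables can be shown with a toy example. This is only a sketch: the "table" holds three common passwords, plain SHA-256 is used purely to keep the illustration short (it is too fast for real password storage), and all names are made up.

```python
import hashlib
import os

# Toy stand-in for a precomputed table: unsalted hash -> password.
COMMON_PASSWORDS = ["123456", "password", "abc123"]
lookup = {hashlib.sha256(p.encode()).hexdigest(): p for p in COMMON_PASSWORDS}


def unsalted_hash(password: str) -> str:
    return hashlib.sha256(password.encode()).hexdigest()


def salted_hash(password: str, salt: bytes) -> str:
    # The salt is stored next to the hash so the comparison can be repeated.
    return hashlib.sha256(salt + password.encode()).hexdigest()


salt = os.urandom(16)
print(lookup.get(unsalted_hash("abc123")))      # found in the table: abc123
print(lookup.get(salted_hash("abc123", salt)))  # not in the table: None
```

With a salt, the attacker has to redo the work for each salt value instead of reusing one table for every database.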
You wouldn't one way hash something like an address or phone number (or something along those lines) if you need to use it in the future again in its raw form, say to for instance pre-populate a checkout field for a logged in user.
Best practices would also involve just not storing data that you don't need again in the future, if you don't need the phone number in the future, don't store it. If you store the response transaction number from the payment gateway, you can use this for fraud investigations and leave the storage of all of the other data up to the gateway.
I'll leave it to others to discuss the relative merits of MD5 vs. SHA vs ??? hashing systems. Note that there are functions built into PHP to do the hashing.
I am debating using user-names as a means to salt passwords, instead of storing a random string along with the names. My justification is that the purpose of the salt is to prevent rainbow tables, so what makes this realistically less secure than another set of data in there?
For example,
hash( md5(johnny_381#example.com), p4ss\/\/0rD)
vs
hash( md5(some_UUID_value), p4ss\/\/0rD)
Is there a real reason I couldn't just stick with the user name and simplify things? The only thing my web searching turned up was debates about how a salt should be treated like a password, which ended without any reasoning behind it, whereas I'm under the impression the salt is just there to stop something like a Cain & Abel cracker from running against the hashes in anything less than a million years. Thinking about the processing limitations of reality, I don't believe it is a big deal if people know the hash: they still don't know the password, and they'd have to move into the supercomputer range to brute-force each individual hash.
Could someone please enlighten me here?
You'll run into problems, when the username changes (if it can be changed). There's no way you can update the hashed password, because you don't store the unsalted, unhashed password.
I don't see a problem with utilizing the username as the salt value.
A more secure way of storing passwords involves using a different salt value for each record anyway.
If you look at the aspnet_Membership table of the asp.net membership provider you'll see that they have stored the password, passwordsalt, and username fields in pretty much the same record. So, from that perspective, there's no security difference in just using the username for the salt value.
Note that some systems use a single salt value for all of the passwords, and store that in a config file. The only difference in security here is that if they gained access to a single salt value, then they can more easily build a rainbow table to crack all of the passwords at once...
But then again, if they have access to the encrypted form of the passwords, then they probably would have access to the salt value stored in the user table right along with it... Which might mean that they would have a slightly harder time of figuring out the password values.
However, at the end of the day I believe nearly all applications fail on the encryption front because they only encrypt what is ostensibly one of the least important pieces of data: the password. What should really be encrypted is nearly everything else.
After all, if I have access to your database, why would I care if the password is encrypted? I already have access to the important things...
There are obviously other considerations at play, but at the end of the day I wouldn't sweat this one too much as it's a minor issue compared to others.
If you use the username as the salt and there are many instances of your application, people may create rainbow tables for specific users like "admin" or "system", as is the case with Oracle databases, or for a whole list of common names, as was done for WPA (coWPAtty).
You'd better use a truly random salt; it is not that difficult, and it will not come back to haunt you.
This method was deemed secure enough for the working group that created HTTP digest authentication which operates with a hash of the string "username:realm:password".
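For reference, the digest-authentication value mentioned above is an MD5 hash over "username:realm:password", so the username effectively plays the role of the salt. A minimal sketch (the function name and the sample credentials are made up):

```python
import hashlib


def digest_ha1(username: str, realm: str, password: str) -> str:
    """MD5 over "username:realm:password", as used by HTTP digest auth."""
    return hashlib.md5(f"{username}:{realm}:{password}".encode("utf-8")).hexdigest()


# Same password, different usernames: the stored values differ.
alice = digest_ha1("alice", "example.com", "p4ssw0rd")
bob = digest_ha1("bob", "example.com", "p4ssw0rd")
print(alice != bob)  # True
```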
I think you would be fine seeing as this decision is secret. If someone steals your database and source code to see how you actually implemented your hashing, well what are they logging in to access at that point? The website that displays the data in the database that they've already stolen?
In this case a salt buys your user a couple of security benefits. First, if the thief has precomputed values (rainbow tables) they would have to recompute them for every single user in order to do their attack; if the thief is after a single user's password this isn't a big win.
Second, the hashes for all users will always be different even if they share the same password, so the thief wouldn't get any hash collisions for free (crack one user get 300 passwords).
These two benefits help protect your users that may use the same password at multiple sites even if the thief happens to acquire the databases of other sites.
So while a salt for password hashing is best kept secret (which in your case the exact data used for the salt would be) it does still provide benefits even if it is compromised.
Random salting prevents comparison of two independently-computed password hashes for the same username. Without it, it would be possible to test whether a person's password on one machine matched the one on another, or whether a password matched one that was used in the past, etc., without having to have the actual password. It would also greatly facilitate searching for criteria like the above even when the password is available (since one could search for the computed hash, rather than computing the hash separately for each old password hash value).
As to whether such prevention is a good thing or a bad thing, who knows.
I know this is an old question, but this is for anyone searching for a solution based on it.
If you use a derived salt (as opposed to random salt), the salt source should be strengthened by using a key derivation function like PBKDF2.
Thus if your username is "theunhandledexception" pass that through PBKDF2 for x iterations to generate a 32 bit (or whatever length salt you need) value.
Make x pseudo random (as opposed to even numbers like 1,000) and pass in a static site specific salt to the PBKDF2 and you make it highly improbable that your username salt will match any other site's username salt.
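The derived-salt idea above could be sketched as follows with `hashlib.pbkdf2_hmac`. The site salt, the iteration count and the output length (given in bytes here) are placeholder assumptions chosen for illustration.

```python
import hashlib

SITE_SALT = b"example-site-specific-salt"  # hypothetical static, site-specific value
ITERATIONS = 10_007                        # assumption: an arbitrary non-round count


def derive_salt(username: str, length: int = 16) -> bytes:
    """Derive a per-user salt of `length` bytes from the username."""
    return hashlib.pbkdf2_hmac(
        "sha256", username.encode("utf-8"), SITE_SALT, ITERATIONS, dklen=length
    )


salt = derive_salt("theunhandledexception")
print(salt.hex())
```

Because the result depends on the site salt and the iteration count, the same username on another site would yield a different salt.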
I'm using ASP.Net but my question is a little more general than that. I'm interested in reading about strategies to prevent users from fooling with their HTML form values and links in an attempt to update records that don't belong to them.
For instance, if my application dealt with used cars and had links to add/remove inventory, which included as part of the URL the userid, what can I do to intercept attempts to munge the link and put someone else's ID in there? In this limited instance I can always run a check at the server to ensure that userid XYZ actually has rights to car ABC, but I was curious what other strategies are out there to keep the clever at bay. (Doing a checksum of the page, perhaps? Not sure.)
Thanks for your input.
What you are describing is a vulnerability called "Insecure Direct Object References", and it is listed as A4 in the OWASP Top 10 for 2010.
what can I do to intercept attempts to munge the link and put someone else's ID in there?
There are a few ways that this vulnerability can be addressed. The first is to store the User's primary key in a session variable so you don't have to worry about it being manipulated by an attacker. For all future requests, especially ones that update user information like password, make sure to check this session variable.
Here is an example of the security system I am describing:
"update users set password='new_pass_hash' where user_id='"&Session("user_id")&"'";
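The same idea, sketched in Python with `sqlite3`: the user ID comes from the server-side session, never from the request. The table layout, the `session` dict and the function name are assumptions for illustration, and the parameterized query is an addition on top of the statement above.

```python
import sqlite3


def update_password(conn: sqlite3.Connection, session: dict, new_pass_hash: str) -> int:
    """Update the password of the logged-in user only; return rows changed."""
    cur = conn.execute(
        "UPDATE users SET password = ? WHERE user_id = ?",
        (new_pass_hash, session["user_id"]),  # ID taken from the session, not the URL
    )
    conn.commit()
    return cur.rowcount


# Hypothetical in-memory database with two users.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, password TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "old1"), (2, "old2")])

session = {"user_id": 1}  # set at login on the server side
changed = update_password(conn, session, "new_pass_hash")
rows = conn.execute("SELECT user_id, password FROM users ORDER BY user_id").fetchall()
print(changed, rows)  # 1 [(1, 'new_pass_hash'), (2, 'old2')]
```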
Edit:
Another approach is a Hash-based Message Authentication Code (HMAC). This approach is much less secure than using the session, as it introduces a new attack pattern (brute force) instead of avoiding the problem altogether. An HMAC allows you to see whether a message has been modified by someone who doesn't have the secret key. The HMAC value could be calculated as follows on the server side and then stored in a hidden form field.
hmac_value=hash('secret'&user_name&user_id&todays_date)
The idea is that if the user tries to change his username or user ID, then the hmac_value will no longer be valid unless the attacker can obtain the 'secret', which can be brute-forced. Again, you should avoid this security system at all costs, although sometimes you don't have a choice (you do have a choice in your example vulnerability).
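If this route is taken anyway, a sketch with Python's `hmac` module could look like the following. Note that hashing a secret concatenated with the data is not a true HMAC; the `hmac` module implements the actual construction. The key, the field separator and the function names are assumptions for illustration.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-long-random-key"  # hypothetical; keep it server-side only


def sign_fields(user_name: str, user_id: str, date: str) -> str:
    """Return the hex HMAC-SHA256 tag to place in the hidden form field."""
    # A separator keeps ("ab", "c") and ("a", "bc") from producing the same message.
    message = "\x1f".join([user_name, user_id, date]).encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def verify_fields(user_name: str, user_id: str, date: str, tag: str) -> bool:
    """Recompute the tag on the server and compare in constant time."""
    return hmac.compare_digest(sign_fields(user_name, user_id, date), tag)


tag = sign_fields("lina", "42", "2024-01-01")
print(verify_fields("lina", "42", "2024-01-01", tag))  # True
print(verify_fields("lina", "43", "2024-01-01", tag))  # False: user ID was changed
```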
You want to find out how to use a session.
Sessions on tiztag.
If you keep track of the user session you don't need to keep looking at the URL to find out who is making a request/post.