Encrypting asymmetrically only a part of a large file

I want to store a large file on a publicly accessible service (Amazon, BitTorrent, IPFS, etc.).
I want this file to be encrypted.
I know the common practice is to encrypt the file symmetrically with a complex password and then encrypt the password with the recipient's public key, but in my use case I need to deliver the key to each recipient so that, if the password leaks to the public, I know who did it.
So my idea was to encrypt the whole file with AES-CBC, then split it into chunks and encrypt only the first chunk asymmetrically.
Are there any logical mistakes in this idea? What should the minimum size of the first chunk be (in bytes or as a percentage of the whole file) so that it's safe to say there is no way to decrypt the remaining chunks without decrypting the first one?
Edit
Thanks for the answers.
I'll elaborate a little more on the use case.
I'm planning to let users put (sell) files on decentralised storage using my platform (and I have no control over the nodes; let's assume it's the global IPFS network). To comply with regulations, the files have to be encrypted and I need a way to block access to them.
Because, as stated before, I won't be able to delete the files from all the nodes, I thought of encrypting the files asymmetrically, but this requires preparing a separate copy for each recipient and would take a lot of time.
That's how I came up with the idea of encrypting only a part of the file. Moreover, this would be done by a re-encryption proxy, so the seller would only need to prepare the re-encryption key and the amount of extra data on the network would be minimal (only one shard per buyer).
Still, if the authorities approach me claiming that I'm sharing illegal content, I could tell them the file is encrypted and that the only people who downloaded it are the owners of these public keys.

Apparently some things have been misunderstood.
have a use case I need to deliver the key to each recipient so when the password leaks to public I know who did it.
Let's assume the file is encrypted with a single symmetric encryption key (the password, in your case). You may encrypt the password using each recipient's personal public key, but once the password is released, you have no means of finding out who leaked/released it.
split it to chunks and encrypt only the first chunk asymmetrically
That makes no sense (at least I could not find any reason why this would help you achieve the stated use case).
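To see why withholding the first chunk does not help, recall that CBC decryption computes each plaintext block as P_i = D_K(C_i) XOR C_{i-1}: it needs only the key and the one preceding ciphertext block. The sketch below assumes plain Python, whose standard library has no AES, so it uses a toy 4-round Feistel cipher built from HMAC-SHA256 as a stand-in block cipher (not for real use); all names are hypothetical. It suggests that whoever holds the symmetric key recovers every block after the withheld prefix except the first one, so no prefix size makes the rest safe.

```python
import hashlib
import hmac
import os

BLOCK = 16  # bytes per block, same as AES
HALF = BLOCK // 2


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def round_fn(key: bytes, i: int, half: bytes) -> bytes:
    # Toy round function (stand-in for AES internals): truncated HMAC-SHA256.
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[:HALF]


def block_encrypt(key: bytes, block: bytes) -> bytes:
    # 4-round Feistel network: an invertible keyed permutation on 16-byte blocks.
    left, right = block[:HALF], block[HALF:]
    for i in range(4):
        left, right = right, xor(left, round_fn(key, i, right))
    return left + right


def block_decrypt(key: bytes, block: bytes) -> bytes:
    # Undo the Feistel rounds in reverse order.
    left, right = block[:HALF], block[HALF:]
    for i in reversed(range(4)):
        left, right = xor(right, round_fn(key, i, left)), left
    return left + right


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> list:
    # C_i = E_K(P_i XOR C_{i-1}) with C_0 = IV; returns the list of ciphertext blocks.
    assert len(plaintext) % BLOCK == 0, "padding omitted for brevity"
    prev, blocks = iv, []
    for off in range(0, len(plaintext), BLOCK):
        prev = block_encrypt(key, xor(plaintext[off:off + BLOCK], prev))
        blocks.append(prev)
    return blocks


def cbc_decrypt(key: bytes, prev: bytes, blocks: list) -> list:
    # P_i = D_K(C_i) XOR C_{i-1}: each block needs only the key and its predecessor.
    out = []
    for c in blocks:
        out.append(xor(block_decrypt(key, c), prev))
        prev = c
    return out


key, iv = os.urandom(32), os.urandom(BLOCK)
plaintext = b"".join(f"block number {i:03d}".encode() for i in range(8))
ciphertext = cbc_encrypt(key, iv, plaintext)

withheld, published = ciphertext[:2], ciphertext[2:]  # the "first chunk" is hidden
# Someone holding the key but not the first chunk guesses the missing predecessor.
recovered = cbc_decrypt(key, bytes(BLOCK), published)
# Blocks 3 to 7 come out intact; only the first published block is garbled.
print(b"".join(recovered[1:]).decode())
```

Making the first chunk larger only moves the single garbled block further into the file; the confidentiality of everything else still rests entirely on the symmetric key.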
Note: hybrid encryption is used because asymmetric encryption (RSA) can feasibly encrypt only a limited amount of data (e.g. a symmetric encryption key).
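To put a number on "limited": with RSA-OAEP as specified in PKCS #1 v2.2 (RFC 8017), one encryption can carry at most k - 2*hLen - 2 bytes, where k is the modulus length in bytes and hLen the hash output length. A small sketch of that arithmetic, assuming a 2048-bit key and SHA-256 as example parameters (the function name is hypothetical):

```python
def oaep_max_message_bytes(modulus_bits: int, hash_bytes: int) -> int:
    """Largest plaintext (in bytes) that one RSA-OAEP encryption can hold (RFC 8017, 7.1.1)."""
    k = (modulus_bits + 7) // 8  # modulus length in bytes
    return k - 2 * hash_bytes - 2


# A 2048-bit key with SHA-256 (32-byte digests).
print(oaep_max_message_bytes(2048, 32))  # 190
```

That is enough for a 32-byte AES-256 key but nowhere near a file chunk of useful size, which is why hybrid encryption wraps only the key.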

Your problem is not solvable by the means of classic cryptography.
When we take a look at your problem, one might think your use case is, as so often in cryptography, confidentiality, but it is not.
Confidentiality in a cryptographic context means helping n parties keep a secret.
That means all of the original n parties share the common interest of keeping that secret ...
In your case, you suspect at least one of the parties of not sharing this interest ... this is where classical crypto attempts will fail to solve your problem ...
Pay TV companies learned this the hard way ... their solution seemingly is to replace the content keys faster than a group of rogue actors can share the needed keys for live decryption, and to manage access to the content keys by encrypting them with group keys, which are partitioned and distributed among all legitimate clients ... that only "works" (read: "not really, if you put in enough effort") for large dynamic content streams, not for a static file ...
Your use case sounds more like digital watermarking and fingerprinting.

Related

Symmetric AES encryption concept

I have a project for a website running on Django. One of its functions needs to store usernames/passwords for a third-party website. So it has to be symmetric encryption, as it needs to use these credentials in an automated process.
Storing credentials is never a good idea, I know, but for this case there is no other option.
My idea so far is to create a Django app that will save and use these passwords, and do nothing else. With this I can have two "webservers" that will not receive any requests from outside, but only get tasks via Redis or something similar. Therefore I can isolate them to some degree (they are the only servers with access to this extra DB, they will not handle any web requests, etc.).
First question: Does this plan sound solid or is there a major flaw?
Second question is about the encryption itself:
AES requires an encryption key for all its work; OK, that needs to be "secured" in some way. But I am more interested in the IV.
Every user can have one or more credential sets saved in the extra DB. Would it be a good idea to use some sort of hash over the user ID or something similar to generate a custom IV per user? Most of the time I see the IV simply being randomly generated, but then I would also have to store it somewhere in addition to the key.
This is where it gets a bit confusing for me. I need the key and the IV to decrypt, but I would "store" them the same way. So if one gets compromised, wouldn't the IV likely be compromised too? Would it then make any difference if I generated the IV on the fly using a known procedure? The problem then is that anyone who knows their user ID could know the IV, as the code will be open source...
In the end, I need some guidance on how to handle the key and the best unique IV per user. Thank you very much for reading this far :-)
Does this plan sound solid or is there a major flaw?
The need to store user credentials is IMHO a design flaw; at least we all appreciate that you are aware of it.
Having a separate credential service with a dedicated datastore seems to be the best you can do under the stated conditions. I don't like the option of storing user credentials, but let's skip the academic discussion and move on to practical things.
AES requires an encryption key for all its work, ok that needs to be "secured" in some way.
Yes, there's the whole problem.
to generate a per user custom IV?
The IV allows the same key to be reused for multiple encryptions, so effectively it needs to be unique for each ciphertext (if a user has multiple passwords, you need an IV for each password). Very commonly, the IV is prepended to the ciphertext, as it is needed to decrypt it.
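A minimal sketch of that storage convention, assuming the ciphertext itself comes from whatever AES library the project uses (not shown here); `new_iv`, `pack` and `unpack` are hypothetical helper names:

```python
import secrets

IV_SIZE = 16  # AES block size in bytes


def new_iv() -> bytes:
    # A fresh random IV for every single encryption (one per stored password).
    return secrets.token_bytes(IV_SIZE)


def pack(iv: bytes, ciphertext: bytes) -> bytes:
    # Store the IV in front of the ciphertext; the IV is not secret.
    return iv + ciphertext


def unpack(blob: bytes) -> tuple:
    # Split the stored value back into IV and ciphertext before decrypting.
    return blob[:IV_SIZE], blob[IV_SIZE:]
```

Because each stored value carries its own IV, a user with several credential sets automatically gets a distinct IV per password, and no separate IV store is needed next to the key.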
Would it then make any difference if I generate the IV on the fly over a known procedure?
IV doesn't need to be secret itself.
Some encryption modes require the IV to be unpredictable (e.g. CBC mode), so it's best to generate the IV randomly. Other modes, such as CTR (which uses the IV as the starting value of a counter, so any part of the data can be encrypted/decrypted on its own) or OFB, do not need an unpredictable IV, but the IV must still be unique for each encryption under the same key.

Encryption of user name

I need to encrypt user names that I receive from an external partner's SSO. This needs to be done because the user names are assigned to school children. But we still need to be able to track each individual to prevent abuse of our systems, so we have decided to encrypt the user names in our logs etc.
This way, a breach of our systems will not compromise the identity of the children.
Here's my predicament. I have very limited knowledge in this area, so I am looking for advice on which algorithm to use.
I was thinking of using an asymmetric algorithm, like PGP, and throwing away one of the keys so that we will not be able to decrypt the user names.
My questions:
Does PGP encryption always provide the same output given the same input?
Is PGP a good choice for this, or should we use another algorithm?
Does anyone have a better suggestion for achieving the same thing, i.e. anonymization of the user?
If you want a one-way function, you don't want encryption. You want hashing. The easiest thing to do is to use a hash like SHA-256. I recommend salting the username before hashing. In this case, I would probably pick a static salt like edu.myschoolname: and put that in front of the username. Then run that through SHA-256. Convert the result to Base-64 or hex encoding, and use the resulting string as the "username."
From a unix command line, this would look like:
$ echo -n "edu.myschoolname:robnapier#myschoolname.edu" | shasum -a 256
09356cf6df6aea20717a346668a1aad986966b192ff2d54244802ecc78f964e3 -
That output is unique to that input string (technically it's not "unique" but you will never find a collision, by accident or by searching). And that output is stable, in that it will always be the same for the given input. (I believe that PGP includes some randomization; if it doesn't, it should.)
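The same salt-then-hash step can be sketched in Python with the standard `hashlib` module, assuming the username is encoded as UTF-8 (the function name is hypothetical):

```python
import hashlib

SALT = "edu.myschoolname:"  # static, application-wide prefix from the example above


def mask_username(username: str) -> str:
    # Prefix the static salt, hash with SHA-256, and hex-encode the 32-byte digest.
    return hashlib.sha256((SALT + username).encode("utf-8")).hexdigest()


print(mask_username("robnapier#myschoolname.edu"))
```

For the same input string this should print the same hex digest as the `shasum -a 256` command, since `echo -n` passes exactly those bytes without a trailing newline.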
(Regarding comments below)
Cryptographic hash algorithms are extremely secure for their purposes. Non-cryptographic hash algorithms are not secure (but also aren't meant to be). There are no major attacks I know of against SHA-2 (which includes SHA-256 and SHA-512).
You're correct that your system needs to be robust against someone with access to the code. If they know what userid they're looking for, however, no system will be resistant to them discovering the masked version of that id. If you encrypt, an attacker with access to the key can just encrypt the value themselves to figure out what it is.
But if you're protecting against the reverse: preventing attackers from determining the id when they do not already know the id they're looking for, the correct solution is a cryptographic hash, specifically SHA-256 or SHA-512. Using PGP to create a one-way function is using a cryptographic primitive for something it is not built to do, and that's always a mistake. If you want a one-way function, you want a hash.
I think that PGP is a good idea, but it risks making usernames hard to memorize. Why not simply make a list of usernames composed of user + OrderedNumber, where user can be whichever word you want and OrderedNumber is a 4-5 digit number ordered by the children's birth dates? Once you have done this, you only have to keep a list where the usernames are linked with the corresponding child, and then you can encrypt this "nice to have" list with a key only you know.

Wanted: encryption scheme for copy protection purposes

I am tasked with implementing a dongle-based copy protection scheme for an application. I realize that no matter what I do, someone will crack it, but I want to at least make it a little more difficult than an if-statement checking whether a dongle is present.
My approach is to encrypt critical data that the application needs for proper execution. During runtime, the decryption key is retrieved from the dongle (our chosen model has some suitable API functions for that), the data is decrypted and the application is happy.
Of course, a determined attacker can intercept that decryption key and also get ahold of the decrypted data. That's ok. But what should be hard is to substitute their own data. So I'm looking for an encryption scheme where knowing the decryption key doesn't enable someone to encrypt their own data.
That's obviously asymmetric encryption. But for every such algorithm I found so far, the encryption (or public) key can be generated from the decryption (or private) key, which is exactly what I'm trying to avoid.
Note: simply signing the data won't help much, since (unless I'm totally misunderstanding such signatures) verifying the signature will just be another if-statement, which is easily circumvented.
So... any ideas?
The moment the private key is known to the attacker you won't have any secret information to differentiate yourself from the others.
To make it harder for the attacker: You might want to expire each pair (public key, private key) after an application specific time T and generate a new pair based on the previous pair both on the dongle and your own machine, independently. This way the attacker needs to have a constant access to the dongle to be able to encrypt his data with the new private key or to run his private_key_detection algorithm as often as T.
You probably want to run the decryption on the dongle. There are a few pieces of hardware that help with this (I just googled this one, for example). There are likely many others. Dallas Semiconductor used to have a Java button that would allow you to run code on a small dongle-like device, but I don't think they have it anymore.
Some of these allow you to execute code in the dongle. So maybe a critical function that is hard to recreate yet doesn't require high performance might work? Perhaps a license key validation algorithm.
Maybe you could include code in the dongle that has to be put into memory in order for the program to run. This would be a little harder to break, but might be hard to implement depending on what tools you are using to make your program.
You probably also want to study up on some anti-debugging subjects. I remember seeing a few publications a while back, but here is at least one. This is another layer that will make it harder to crack.
Dependency on an Internet connection may also be an option. You have to be careful here to not piss off your customers if they can't get your code to run without an Internet connection.
You can also check out FlexLM (or whatever it is called these days). It works, but it is a beast. They also try to negotiate a percentage of your company's gross profit for the license fee if I recall correctly (it's been years....I think we told them to stuff it when they asked for that.)
Good luck!
To answer my own question (somewhat): it is possible to do this with RSA, but most APIs (including OpenSSL's crypto library) need to be "tricked" into doing it. The reason you can generate the public key, given the private key, is that:
It is common practice for implementations of RSA to save p and q (those big prime numbers) in the private key data structure.
Since the public key (which consists of the modulus N and some exponent e) is public anyway, there's (usually) no point in choosing an obscure e. Thus, there are a handful of standard values that are used commonly, like 3 or 65537. So even if p and q are unknown, you might be able to "guess" the public exponent.
However, RSA is symmetrical in the sense that anything you encrypt with the public key can be decrypted with the private key and vice versa. So what I've done (I'm a monster) is to let the crypto library generate an RSA key. You can choose your own public exponent there, which will later be used to decrypt (contrary to the normal way). Then, I switch around the public and private exponent in the key data structure.
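The exponent swap can be illustrated with textbook RSA in plain Python. This is a sketch under loud assumptions (tiny toy primes, no padding, no OpenSSL), meant only to show that either exponent undoes the other and that knowing p and q lets you derive one exponent from the other; it is not the OpenSSL-based code described above.

```python
# Textbook RSA with tiny primes: illustration only (no padding, trivially breakable).
p, q = 61, 53
n = p * q                # modulus, shipped with the application
phi = (p - 1) * (q - 1)

e = 17                   # exponent chosen at key generation; here it is shipped and used to decrypt
d = pow(e, -1, phi)      # modular inverse (Python 3.8+); kept secret and used to encrypt

# Whoever holds p and q (as stored in most private-key structures) can recompute
# one exponent from the other, which is the problem described above.
assert pow(d, -1, phi) == e

data = 1234                       # the "critical data", encoded as an integer < n
protected = pow(data, d, n)       # vendor side: "encrypt" with the secret exponent
recovered = pow(protected, e, n)  # application side: "decrypt" with the shipped exponent
print(recovered == data)          # True
```

The attacker who extracts `n` and `e` from the application still needs `d` (or the factors of `n`) to produce new data that decrypts correctly; with realistic key sizes that is the factoring problem, which is exactly why the primes must not ship with the application.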
Some tips for anyone trying to do something similar with the crypto library:
In the RSA data structure, clear out everything but n and e / d, depending on whether you want to encrypt or decrypt with that particular key.
Turn off blinding with RSA_blind_off. It requires the encryption exponent even when decrypting, which is not what we want. Note that this might open you up to some attacks.
If someone needs more help, leave a comment and I'll edit this post with more information.

time-based encryption algorithm?

I've an idea in my mind but I've no idea what the magic words are to use in Google - I'm hoping to describe the idea here and maybe someone will know what I'm looking for.
Imagine you have a database. Lots of data. It's encrypted. What I'm looking for is an encryption scheme whereby, to decrypt, a variable N must at a given time hold the value M (obtained from a third party, like a hardware token), or else decryption fails.
So imagine AES - well, AES is just a single key. If you have the key, you're in. Now imagine AES modified in such a way that the algorithm itself requires an extra fact, above and beyond the key - this extra datum from an external source, and where that datum varies over time.
Does this exist? Does it have a name?
This is easy to do with the help of a trusted third party. Yeah, I know, you probably want a solution that doesn't need one, but bear with me — we'll get to that, or at least close to that.
Anyway, if you have a suitable trusted third party, this is easy: after encrypting your file with AES, you just send your AES key to the third party, ask them to encrypt it with their own key, to send the result back to you, and to publish their key at some specific time in the future. At that point (but no sooner), anyone who has the encrypted AES key can now decrypt it and use it to decrypt the file.
Of course, the third party may need a lot of key-encryption keys, each to be published at a different time. Rather than storing them all on a disk or something, an easier way is for them to generate each key-encryption key from a secret master key and the designated release time, e.g. by applying a suitable key-derivation function to them. That way, a distinct and (apparently) independent key can be generated for any desired release date or time.
In some cases, this solution might actually be practical. For example, the "trusted third party" might be a tamper-resistant hardware security module with a built-in real time clock and a secure external interface that allows keys to be encrypted for any release date, but to be decrypted only for dates that have passed.
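A minimal sketch of the trusted party's side described in the last two paragraphs, assuming per-day release granularity and using HMAC-SHA256 keyed with the master secret as the derivation function (the answer only asks for "a suitable key-derivation function"; HKDF would be a common choice). `key_encryption_key` and `release_key` are hypothetical names, and the `today` argument stands in for the HSM's trusted real-time clock:

```python
import hashlib
import hmac
from datetime import date


def key_encryption_key(master_key: bytes, release: date) -> bytes:
    # Derive a distinct 32-byte key per release date from one secret master key.
    # HMAC-SHA256 is used here as a simple stand-in for "a suitable KDF".
    label = b"release:" + release.isoformat().encode()
    return hmac.new(master_key, label, hashlib.sha256).digest()


def release_key(master_key: bytes, release: date, today: date) -> bytes:
    # Behaviour of the trusted party / HSM: only disclose keys whose date has passed.
    if release > today:
        raise PermissionError(f"key for {release} is not released yet")
    return key_encryption_key(master_key, release)
```

Nothing per-date has to be stored: any key can be re-derived on demand from the master key and the date, and encrypting for a future date is just a call to `key_encryption_key` inside the trusted device.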
However, if the trusted third party is a remote entity providing a global service, sending each AES key to them for encryption may be impractical, not to mention a potential security risk. In that case, public-key cryptography can provide a solution: instead of using symmetric encryption to encrypt the file encryption keys (which would require them either to know the file encryption key or to release the key-encryption key), the trusted third party can instead generate a public/private key pair for each release date and publish the public half of the key pair immediately, but refuse to disclose the private half until the specified release date. Anyone else holding the public key may encrypt their own keys with it, but nobody can decrypt them until the corresponding private key has been disclosed.
(Another partial solution would be to use secret sharing to split the AES key into shares and to send only one share to the third party for encryption. Like the public-key solution described above, this would avoid disclosing the AES key to the third party, but unlike the public-key solution, it would still require two-way communication between the encryptor and the trusted third party.)
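The answer does not say which secret-sharing scheme to use; the simplest possibility, assumed here, is a 2-out-of-2 XOR split (Shamir's scheme would be the general k-out-of-n option). A sketch with hypothetical names:

```python
import secrets


def split_key(key: bytes) -> tuple:
    # 2-out-of-2 XOR sharing: each share alone is uniformly random and reveals nothing.
    share_a = secrets.token_bytes(len(key))
    share_b = bytes(x ^ y for x, y in zip(key, share_a))
    return share_a, share_b


def combine(share_a: bytes, share_b: bytes) -> bytes:
    # XOR the shares back together to recover the key.
    return bytes(x ^ y for x, y in zip(share_a, share_b))


aes_key = secrets.token_bytes(32)
mine, for_third_party = split_key(aes_key)
```

In this sketch only `for_third_party` would be sent off for time-locked encryption; the third party never sees anything correlated with `aes_key`, yet the key cannot be rebuilt until its time-locked share is released.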
The obvious problem with both of the solutions above is that you (and everyone else involved) do need to trust the third party generating the keys: if the third party is dishonest or compromised by an attacker, they can easily disclose the private keys ahead of time.
There is, however, a clever method published in 2006 by Michael Rabin and Christopher Thorpe (and mentioned in this answer on crypto.SE by one of the authors) that gets at least partially around the problem. The trick is to distribute the key generation among a network of several more or less trustworthy third parties in such a way that, even if a limited number of the parties are dishonest or compromised, none of them can learn the private keys until a sufficient majority of the parties agree that it is indeed time to release them.
The Rabin & Thorpe protocol also protects against a variety of other possible attacks by compromised parties, such as attempts to prevent the disclosure of private keys at the designated time or to cause the generated private or public keys not to match. I don't claim to understand their protocol entirely, but, given that it's based on a combination of existing and well-studied cryptographic techniques, I see no reason why it shouldn't meet its stated security specifications.
Of course, the major difficulty here is that, for those security specifications to actually amount to anything useful, you do need a distributed network of key generators large enough that no single attacker can plausibly compromise a sufficient majority of them. Establishing and maintaining such a network is not a trivial exercise.
Yes, the kind of encryption you are looking for exists. It is called timed-release encryption, abbreviated TRE. Here is a paper about it: http://cs.brown.edu/~foteini/papers/MathTRE.pdf
The following is an excerpt from the abstract of the above paper:
There are nowadays various e-business applications, such as sealed-bid auctions and electronic voting, that require time-delayed decryption of encrypted data. The literature offers at least three main categories of protocols that provide such timed-release encryption (TRE).
They rely either on forcing the recipient of a message to solve some time-consuming, non-paralellizable problem before being able to decrypt, or on the use of a trusted entity responsible for providing a piece of information which is necessary for decryption.
I personally like another name, "time capsule cryptography", which was probably coined at crypto.stackexchange.com (see the question "Time Capsule cryptography?").
A quick answer is no: the key used to decrypt the data cannot change over time, unless you decrypt and re-encrypt the whole database periodically (I suppose that is not feasible).
The solution suggested by @Ilmari Karonen is the only feasible one, but it needs a trusted third party; furthermore, once the master AES key has been obtained, it remains reusable in the future: you cannot use 'one time pads' with that solution.
If you want your token to be time-based, you can use the TOTP algorithm.
TOTP can help you generate the value M for the variable N (the token) at a given time. The service requesting access to your database would attach a token generated with TOTP. When the token is validated at the access provider's end, you check whether it holds the correct value for the current time. You'll need a shared key at both ends to generate the same TOTP.
The advantage of TOTP is that the value changes over time, so a token cannot be reused once its time window has passed.
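For reference, TOTP (RFC 6238) is HOTP (RFC 4226) applied to a time-step counter. A compact standard-library sketch, assuming the RFC defaults of HMAC-SHA1 and 30-second steps (function names are hypothetical); it generates and checks tokens, it does not by itself change the key the data is encrypted under:

```python
import hashlib
import hmac
import struct
import time


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    # RFC 4226: HMAC-SHA1 over the 8-byte big-endian counter, then dynamic truncation.
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: float, step: int = 30, digits: int = 6) -> str:
    # RFC 6238: the counter is the number of `step`-second intervals since the Unix epoch.
    return hotp(secret, int(unix_time) // step, digits)


shared_key = b"12345678901234567890"  # RFC 6238 SHA-1 test secret; use a random key in practice
print(totp(shared_key, 59, digits=8))  # 94287082, the RFC 6238 test vector for T = 59
print(totp(shared_key, time.time()))   # current 6-digit code
```

The validating side runs the same function with its copy of the shared key and compares the result, usually also accepting the neighbouring time step to tolerate clock drift.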
I have implemented a similar thing for two-factor authentication.
"One-time password" could be your Google search term.
I believe what you are looking for is called Public Key Cryptography or Public Key Encryption.
Another good word to google is "asymmetric key encryption scheme".
Google that and I'm quite sure you'll find what you're looking for.
For more information, see Wikipedia's article.
An example of this is the Diffie–Hellman key exchange.
Edit (putting things into perspective)
The second key can be determined by an algorithm that uses a specific time (for example, the time at which the data is inserted) to generate the second key, which can be stored in another location.
As others have pointed out, a one-time password may be a good solution for the scenario you proposed.
There's an OTP implementation in C# that you might take a look at: https://code.google.com/p/otpnet/.
Ideally, we want a generator that depends on the time, but I don't know any algorithm that can do that today.
More generally, if Alice wants to let Bob know about something at a specific point in time, you can consider this setup:
Assume we have a public algorithm that has two parameters: a very large random seed number and the expected number of seconds the algorithm will take to find the unique solution of the problem.
Alice generates a large seed.
Alice runs it first on her computer and computes the solution to the problem. It is the key. She encrypts the message with this key and sends it to Bob along with the seed.
As soon as Bob receives the message, Bob runs the algorithm with the correct seed and finds the solution. He then decrypts the message with this key.
Three flaws exist with this approach:
Some computers can be faster than others, so the algorithm has to be made in such a way as to minimize the discrepancies between two different computers.
It requires a proof of work which may be OK in most scenarios (hello Bitcoin!).
If Bob has some delay, then it will take him more time to see this message.
However, if the algorithm is independent of the machine it runs on, and the seed is large enough, it is guaranteed that Bob will not see the content of the message before the deadline.
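The setup above is close to the time-lock puzzle of Rivest, Shamir and Wagner (1996), in which the "problem" is computing a^(2^t) mod N by t squarings in a row, which cannot be parallelized. One difference from the steps above: Alice does not need to run the slow computation herself, because knowing the factors of N lets her take a shortcut. A toy sketch, assuming tiny, publicly known primes and hypothetical names, so it shows only the mechanics, not real security:

```python
import hashlib

# Toy parameters: two Mersenne primes. A real puzzle needs a large modulus built from
# random secret primes, and t calibrated to the desired delay on realistic hardware.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
PHI = (P - 1) * (Q - 1)
BASE = 2


def solve_slow(t: int) -> int:
    # What Bob must do: t squarings in a row; each step needs the previous result.
    x = BASE
    for _ in range(t):
        x = x * x % N
    return x


def solve_with_trapdoor(t: int) -> int:
    # What Alice can do: knowing PHI, reduce the exponent 2**t modulo PHI first.
    return pow(BASE, pow(2, t, PHI), N)


def derive_key(solution: int) -> bytes:
    # Turn the puzzle solution into a 32-byte symmetric key.
    return hashlib.sha256(str(solution).encode()).digest()


t = 100_000
alice_key = derive_key(solve_with_trapdoor(t))
bob_key = derive_key(solve_slow(t))
print(alice_key == bob_key)  # True
```

Machine speed still matters (the first flaw above), but the sequential structure means extra parallel hardware does not help Bob finish sooner.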

Been advised to use same IV in AES implementation

We've had to extend our website to communicate user credentials to a supplier's website (in the query string) using AES with a 256-bit key; however, they are using a static IV when decrypting the information.
I've advised that the IV should not be static and that it is not in our standards to do that, but if they change it their end we would incur the [big] costs so we have agreed to accept this as a security risk and use the same IV (much to my extreme frustration).
What I wanted to know is, how much of a security threat is this? I need to be able to communicate this effectively to management so that they know exactly what they are agreeing to.
UPDATE: We are also using the same KEY throughout.
Thanks
Using a static IV is always a bad idea, but the exact consequences depend on the mode of operation in use. In all of them, the same plaintext will produce the same ciphertext, but there may be additional vulnerabilities: for example, in CFB mode with a static key, an attacker can extract the keystream from a known plaintext and use it to decrypt (at least the beginning of) all subsequent strings!
Using a static IV is always a bad idea. Using a static key is always a bad idea. I bet that your supplier had compiled the static key into their binaries.
Sadly, I've seen this before. Your supplier has a requirement that they implement encryption and they are attempting to implement the encryption in a manner that's as transparent as possible---or as "checkbox" as possible. That is, they aren't really using encryption to provide security, they are using it to satisfy a checkbox requirement.
My suggestion is that you see if the supplier would be willing to forsake this home-brewed encryption approach and instead run their system over SSL. Then you get the advantage of using a quality standard security protocol with known properties. It's clear from your question that neither your supplier nor you should be attempting to design a security protocol. You should, instead, use one that is free and available on every platform.
As far as I know (and I hope others will correct me if I'm wrong / the user will verify this), you lose a significant amount of security by keeping a static key and IV. The most significant effect you should notice is that when you encrypt a specific plaintext (say usernameA+passwordB), you get the same ciphertext every time.
This is great for pattern analysis by attackers, and seems like a password-equivalent that would give attackers the keys to the kingdom:
Pattern analysis: The attacker can see that the encrypted user+password combination "gobbledygook" is used every night just before the CEO leaves work. The attacker can then leverage that information into the future to remotely detect when the CEO leaves.
Password equivalent: You are passing this username+password in the URL. Why can't someone else pass exactly the same value and get the same results you do? If they can, the encrypted data is a plaintext equivalent for the purposes of gaining access, defeating the purpose of encrypting the data.
What I wanted to know is, how much of a security threat is this? I need to be able to communicate this effectively to management so that they know exactly what they are agreeing to.
A good example of reusing the same nonce is Sony vs. Geohot (with a different algorithm, though). You can see the results for Sony :) To the point: using the same IV might have mild or catastrophic consequences, depending on the AES mode of operation you use. If you use CTR mode, then everything you encrypted is as good as plaintext. In CBC mode, messages that begin with the same plaintext blocks produce the same leading ciphertext blocks, so identical messages (and identical prefixes) are visible as such.
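To make the CTR case concrete for management: with a fixed key and IV, CTR mode produces the same keystream for every message, so one known plaintext/ciphertext pair decrypts the others (up to the known pair's length) without the key. A sketch under stated assumptions: HMAC-SHA256 serves as the counter-mode keystream function in place of AES (the standard library has none), and the sample credentials and names are hypothetical.

```python
import hashlib
import hmac


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Counter-mode keystream: PRF(key, nonce || counter) for counter = 0, 1, 2, ...
    # HMAC-SHA256 stands in for AES here because the standard library has no AES.
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def ctr_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    return bytes(p ^ k for p, k in zip(plaintext, keystream(key, nonce, len(plaintext))))


KEY, STATIC_IV = b"k" * 32, b"\x00" * 8  # the static key and IV from the question

known = b"user=alice&pass=hunter2"    # a credential string the attacker knows
secret = b"user=ceo01&pass=S3cr3t!"   # another one, encrypted with the same key and IV
c_known = ctr_encrypt(KEY, STATIC_IV, known)
c_secret = ctr_encrypt(KEY, STATIC_IV, secret)

# Without the key: recover the keystream from the known pair, then apply it.
stream = bytes(c ^ p for c, p in zip(c_known, known))
leaked = bytes(c ^ s for c, s in zip(c_secret, stream))
print(leaked)  # b'user=ceo01&pass=S3cr3t!'
```

The same script also shows the "same plaintext, same ciphertext" problem: encrypting `known` again yields exactly `c_known`, which is what makes both pattern analysis and replay possible.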
