How do I reversibly (symmetrically) encrypt a filename (with or
without directory path, I'm OK w/ either) so that the result is also a
valid filename (less than 64 characters [or whatever the limit is], no
funny characters, ideally no spaces [but not a requirement], etc)?
Googling finds only filename encryption algorithms where the result is
a long string of binary characters (using MIME64, converting to
non-binary is easy, but this just makes the filename longer) and/or
non-symmetric one-way encryption schemes (eg, salted MD5, SHA1, DES,
etc). I don't want to store a table of hashes: I want to decrypt the
filename with a simple key I've memorized.
My own attempts with things like "mcrypt -b" failed too: the resulting
output (even before converting to ASCII) grows in size very rapidly as
the filename and key length increase.
Reasoning: I plan to use an "infinite backup" service (like mozy,
blazebackup, etc), but none encrypt filenames (just file
content). I'll create a directory that consists of encrypted filenames
with symlinks (or even hard links) to the real file. I'll back up only
that directory (and choose my own private key), and have
filename-encrypted and filecontent-encrypted backups.
EDIT: Petey's method worked like a charm!
# "-b 512" yields "Bits has bad value 512 (too small)"
ssh-keygen -t rsa -b 768 -f /tmp/test.rsa
echo "thisisareallylongfilenameknightswhosayniioratleastusedto" |\
openssl rsautl -inkey /tmp/test.rsa -encrypt | base64 |\
perl -0777 -pnle 's/\//-/isg;s/\n//isg'
yields a 130 character result that should always be a filename!
You could use an RSA key pair to do this. Generate an RSA key pair plus certificate, then import that into your cert store. Use the public key to encrypt your file name, then base64 encode the result. The maximum file name length for NTFS is 255 characters, so a 1024 bit RSA key should be fine; if you need shorter file names, use a 512 bit key. When you want to decrypt the file name, base64 decode the encrypted file name, then use the private key to decrypt back to the actual file name.
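To sanity-check the length claim, here is a rough Python sketch of the arithmetic. It assumes the RSA ciphertext is exactly as long as the key modulus, and it uses random bytes as a stand-in for real ciphertext (no encryption happens here). The helper names are made up for illustration. The URL-safe base64 alphabet swaps "+" and "/" for "-" and "_", so the result contains no path separator.

```python
import base64
import math
import os


def rsa_ciphertext_bytes(key_bits: int) -> int:
    """Assumption: one RSA block of ciphertext is as long as the modulus."""
    return key_bits // 8


def base64_length(n_bytes: int) -> int:
    """Padded base64 emits 4 characters for every 3 input bytes (rounded up)."""
    return 4 * math.ceil(n_bytes / 3)


def filename_safe(blob: bytes) -> str:
    """Encode with the URL-safe alphabet, which has no '/' or '+'."""
    return base64.urlsafe_b64encode(blob).decode("ascii")


def from_filename(name: str) -> bytes:
    """Reverse filename_safe()."""
    return base64.urlsafe_b64decode(name.encode("ascii"))


print(base64_length(rsa_ciphertext_bytes(1024)))  # 172
print(base64_length(rsa_ciphertext_bytes(512)))   # 88

# Random bytes standing in for a 1024 bit RSA ciphertext.
stand_in = os.urandom(rsa_ciphertext_bytes(1024))
name = filename_safe(stand_in)
assert from_filename(name) == stand_in
```

Both lengths stay under the 255 character limit mentioned above.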
Not sure if there is any freeware available to do this. If you don't want to write the program yourself, I'll do it in .Net for you (for a small fee ;).
How strong do you need your encryption to be? You could use one of the classical alphabet based cyphers, such as Vigenère, which will produce strictly alphabetical output, though it won't handle non-alphanumeric characters well if you want a valid filename as output. The result will still have "/" and "." where they were before.
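For what it's worth, a minimal Vigenère sketch in Python might look like the following. It assumes the key consists of ASCII letters only, shifts only ASCII letters, and passes everything else (digits, "/", ".") through untouched, which is the behaviour described above. This is a classical cipher and is not secure by modern standards.

```python
def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Shift ASCII letters by the key letters; leave other characters alone."""
    shifts = [ord(k) - ord("A") for k in key.upper()]
    out = []
    used = 0  # number of key letters consumed so far
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            shift = shifts[used % len(shifts)]
            if decrypt:
                shift = -shift
            out.append(chr((ord(ch) - base + shift) % 26 + base))
            used += 1
        else:
            # '/', '.', digits etc. stay where they were
            out.append(ch)
    return "".join(out)


print(vigenere("ATTACKATDAWN", "LEMON"))  # LXFOPVEFRNHR

secret = vigenere("docs/Report-2009.txt", "LEMON")
assert vigenere(secret, "LEMON", decrypt=True) == "docs/Report-2009.txt"
```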
If you are very concerned about security, it's worth noting that high entropy filenames may get more attention than just "file000001", ... "fileNNNNNNN". A separate file that maps from fileNNNN to the correct name can be encrypted separately and stored in duplicate in multiple locations. Both methods leak zero information about the original filenames. Alternatively, you could add a short header to every unencrypted file that encrypts the filename, thereby dispensing with a separate index.
What's more, the ability to do error detection of corrupted filenames is easier when the list of names is both known and in a pre-determined order.
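A rough Python sketch of the numbered-name idea, with hypothetical helper names: it builds the fileNNNNNN index in a fixed order and reports which entries are missing from a listing. Encrypting and duplicating the index itself is left out here.

```python
def build_index(real_names):
    """Map sequential names (file000001, ...) to the real names, in sorted order."""
    ordered = sorted(real_names)
    return {f"file{i:06d}": name for i, name in enumerate(ordered, start=1)}


def missing_entries(index, seen_names):
    """Return the index names that do not show up in a backup listing."""
    return sorted(set(index) - set(seen_names))


index = build_index(["taxes.pdf", "diary.txt", "photos/cat.jpg"])
# The index maps file000001 -> diary.txt, file000002 -> photos/cat.jpg, ...

# Because the names are known and ordered, a gap is easy to spot.
print(missing_entries(index, ["file000001", "file000003"]))  # ['file000002']
```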
[Posting as CW, because this is really more of a comment than an answer.]
My company is working on a project that will put card readers in the field. The readers use DUKPT TripleDES encryption, so we will need to develop software that will decrypt the card data on our servers.
I have just started to scratch the surface on this one, but I find myself stuck on a seemingly simple problem: generating the IPEK (the first step to recreating the symmetric key).
The IPEK's a 16 byte hex value created by concatenating two triple DES encrypted 8 byte hex strings.
I have tried ECB and CBC (zeros for IV) modes with and without padding, but the result of each individual encoding is always 16 bytes or more (2 or more blocks) when I need a result that's the same size as the input. In fact, throughout this process, the cyphertexts should be the same size as the plaintexts being encoded.
<cfset x = encrypt("FFFF9876543210E0",binaryEncode(binaryDecode("0123456789ABCDEFFEDCBA98765432100123456789ABCDEF", "hex"), "base64") ,"DESEDE/CBC/PKCS5Padding","hex",BinaryDecode("0000000000000000","hex"))>
Result: 3C65DEC44CC216A686B2481BECE788D197F730A72D4A8CDD
If you use the NoPadding flag, the result is:
3C65DEC44CC216A686B2481BECE788D1
I have also tried encoding the plaintext hex message as base64 (as the key is). In the example above that returns a result of:
DE5BCC68EB1B2E14CEC35EB22AF04EFC.
If you do the same, except using the NoPadding flag, it errors with "Input length not multiple of 8 bytes."
I am new to cryptography, so hopefully I'm making some kind of very basic error here. Why are the ciphertexts generated by these block cipher algorithms not the same lengths as the plaintext messages?
For a little more background, as a "work through it" exercise, I have been trying to replicate the work laid out here:
https://www.parthenonsoftware.com/blog/how-to-decrypt-magnetic-stripe-scanner-data-with-dukpt/
I'm not sure if it is related and it may not be the answer you are looking for, but I spent some time testing bug ID 3842326. When using different attributes CF is handling seed and salt differently under the hood. For example if you pass in a variable as the string to encrypt rather than a constant (hard coded string in the function call) the resultant string changes every time. That probably indicates different method signatures - in your example with one flag vs another flag you are seeing something similar.
Adobe's response is that, given that the resulting string can be decrypted in either case, this is not really a bug, more of a behavior to note. Can your resultant string be decrypted?
The problem is encrypt() expects the input to be a UTF-8 string. So you are actually encrypting the literal characters F-F-F-F-9.... rather than the value of that string when decoded as hexadecimal.
Instead, you need to decode the hex string into binary, then use the encryptBinary() function. (Note, I did not see an iv mentioned in the link, so my guess is they are using ECB mode, not CBC.) Since the function also returns binary, use binaryEncode to convert the result to a more friendly hex string.
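To illustrate the size difference (in Python rather than CF, and without doing any actual encryption), the sketch below compares the byte count of the hex string taken as text against the same string decoded as hex. The padded_length helper is hypothetical; it only models how PKCS5 padding rounds up to the next 8 byte DES block.

```python
ksn_hex = "FFFF9876543210E0"

# What encrypt() sees: the 16 literal characters 'F', 'F', 'F', 'F', '9', ...
as_text = ksn_hex.encode("utf-8")

# What was intended: the 8 bytes those hex digits represent.
as_binary = bytes.fromhex(ksn_hex)


def padded_length(n_bytes: int, block: int = 8) -> int:
    """PKCS5 padding always adds between 1 and `block` bytes."""
    return (n_bytes // block + 1) * block


print(len(as_text), padded_length(len(as_text)))      # 16 24
print(len(as_binary), padded_length(len(as_binary)))  # 8 16
```

That matches the question: 24 bytes (48 hex digits) with PKCS5Padding and 16 bytes with NoPadding when the text is encrypted, versus a single 8 byte block once the input is decoded from hex and no padding is used.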
Edit: Switching to ECB + "NoPadding" yields the desired result:
ksnInHex = "FFFF9876543210E0";
bdkInHex = "0123456789ABCDEFFEDCBA98765432100123456789ABCDEF";
ksnBytes = binaryDecode(ksnInHex, "hex");
bdkBase64 = binaryEncode(binaryDecode(bdkInHex, "hex"), "base64");
bytes = encryptBinary(ksnBytes, bdkBase64, "DESEDE/ECB/NoPadding");
leftRegister = binaryEncode(bytes, "hex");
... which produces:
6AC292FAA1315B4D
In order to do this we want to start with our original 16 byte BDK
... and XOR it with the following mask ....
Unfortunately, most of the CF math functions are limited to 32 bit integers. So you probably cannot do that next step using native CF functions alone. One option is to use java's BigInteger class. Create a large integer from the hex strings and use the xor() method to apply the mask. Finally, use the toString(radix) method to return the result as a hex string:
bdkText ="0123456789ABCDEFFEDCBA9876543210";
maskText = "C0C0C0C000000000C0C0C0C000000000";
// use radix=16 to create integers from the hex strings
bdk = createObject("java", "java.math.BigInteger").init(bdkText, 16);
mask = createObject("java", "java.math.BigInteger").init(maskText, 16);
// apply the mask and convert the result to hex (upper case)
newKeyHex = ucase( bdk.xor(mask).toString(16) );
WriteOutput("<br>newKey="& newKeyHex);
writeOutput("<br>expected=C1E385A789ABCDEF3E1C7A5876543210");
That should be enough to get you back on track. Given some of CF's limitations here, java would be a better fit IMO. If you are comfortable with it, you could write a small java class and invoke that from CF instead.
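For comparison, the same XOR step is a few lines in a language with arbitrary-precision integers. This is a Python sketch, not CF; xor_hex is a made-up helper name. It pads the result to the input width so leading zeros are not lost.

```python
def xor_hex(a_hex: str, b_hex: str) -> str:
    """XOR two equal-length hex strings and return upper-case hex of the same length."""
    if len(a_hex) != len(b_hex):
        raise ValueError("inputs must have the same number of hex digits")
    value = int(a_hex, 16) ^ int(b_hex, 16)
    # Zero-pad to the original width so leading zeros survive.
    return format(value, "0{}X".format(len(a_hex)))


bdk_text = "0123456789ABCDEFFEDCBA9876543210"
mask_text = "C0C0C0C000000000C0C0C0C000000000"

print(xor_hex(bdk_text, mask_text))  # C1E385A789ABCDEF3E1C7A5876543210
```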
Is there a native method in R to test if a file on disk is an ASCII text file, or a binary file? Similar to the file command in Linux, but a method that will work cross platform?
The file.info() function can distinguish a file from a dir, but it doesn't seem to go beyond that.
If all you care about is whether the file is ASCII or binary...
Well, first up definitions. All files are binary at some level:
is.binary <- function(file){
if(system.type() != "quantum computer"){
return(TRUE)
}else{
return(cat=alive&dead)
}
}
ASCII is just an encoding system for characters. It is therefore impossible to tell if a file is ASCII or binary, because ASCII-ness is a matter of interpretation. If I save a file and decide that binary number 01001101 is Q and 01001110 is Z then you might decode this as ASCII but you'll get the wrong message. Luckily the Americans muscled in and said "Hey, everyone use ASCII to code their text! You get 128 characters and a parity bit! Woo! Go USA!". IBM tried to tell people to use EBCDIC but nobody listened. Which was A Good Thing.
So everyone was packing ASCII-coded text into their 8-bit bytes, and using the eighth bit for parity checking. But then people stopped doing parity checking because TCP/IP handled all that, which was also A Good Thing, and the eighth bit was expected to be zero. If not, there was trouble.
Because people (read "Microsoft") started abusing the eighth bit, and making up their own encoding schemes, and so unless you knew what encoding scheme the file was using, you were stuffed. And the file very rarely told you what encoding scheme it was. And now we have Unicode and even more encoding schemes. And that is a third Good Thing. But I digress.
Nowadays when people ask if a file is binary, what they are normally asking is "Does any byte in this file have its highest bit set?". Which you can do in R by reading a raw file connection as unsigned integers and testing the highest value. Something like:
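is.binary <- function(filepath,max=1000){
  f=file(filepath,"rb",raw=TRUE)
  on.exit(close(f))  # release the connection when the function returns
  b=readBin(f,"int",max,size=1,signed=FALSE)
  # any byte value above 127 has its highest bit set
  return(max(b)>127)
}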
By default this tests at most the first 1000 bytes. I think the file command does something similar.
You may want to change the test to check for printable character codes, and whitespace, and line feed, carriage return, and other codes you might want to consider plausible in your non-binary files...
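A sketch of that stricter test, written in Python for illustration rather than R. The set of "plausible" bytes (printable ASCII plus tab, line feed and carriage return) and the 1000 byte sample size are assumptions you would tune for your own files.

```python
# Printable ASCII (0x20-0x7E) plus tab, line feed and carriage return.
TEXT_BYTES = frozenset(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def bytes_look_like_text(sample: bytes) -> bool:
    """True if every byte in the sample is in the plausible-text set."""
    return all(b in TEXT_BYTES for b in sample)


def file_looks_like_text(path: str, sample_size: int = 1000) -> bool:
    """Apply the test to the first `sample_size` bytes of a file."""
    with open(path, "rb") as fh:
        return bytes_look_like_text(fh.read(sample_size))


print(bytes_look_like_text(b"plain text\r\n"))  # True
print(bytes_look_like_text(b"PK\x03\x04"))      # False (control bytes)
```

Note that this also rejects control characters such as NUL, which the highest-bit test alone would let through.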
Well, how would you do that? I guess you can't without reading (parts or all of) the file, which is why file extensions are used to signal content type.
I looked into that years ago, and as I recall, the file(1) app actually reads the first few header bytes of a file and compares them to what is stored in a lookup table. Sounds like a good candidate for an add-on package to me.
The example section of the manual for ?raw uses this:
isASCII <- function(txt) all(charToRaw(txt) <= as.raw(127))
I'm using DES in TCL to encrypt some phrases and I want to store those encrypted phrases in some ascii files which I need to manipulate easily. Therefore, I would like the "encrypted phrase" to be constituted only of standard ascii characters (preferentially with no spaces).
I'm using something like this to encrypt:
set encrypted [ DES::des -dir encrypt -key "abcdefgh" "This_phrase" ]
I would like "encrypted" to be a standard ascii code, not something that,
as it happens, may even break my terminal if displayed.
Thank you very much.
Leandro.
You could either replace all characters that might have a special meaning (everything except a-zA-Z0-9 etc) or encode it with e.g. base64.
set encrypted [base64::encode -wrapchar {} [DES::des -dir encrypt -key abcdefgh "This_phrase"]]
You need to strip the extra layer (base64, escape sequence encoding or whatever you used to convert the binary data to ascii) if you want to decode it.
I recently wrote a zip file I/O library called zipzap, but I'm struggling with correctly decoding zip entry file names from arbitrary zip files.
Now, the PKWARE spec states:
D.1 The ZIP format has historically supported only the original IBM PC character
encoding set, commonly referred to as IBM Code Page 437...
D.2 If general purpose bit 11 is unset, the file name and comment should conform
to the original ZIP character encoding. If general purpose bit 11 is set, the
filename and comment must support The Unicode Standard, Version 4.1.0 or
greater using the character encoding form defined by the UTF-8 storage
specification...
which means that conforming zip files encode file names as CP437, unless the EFS bit is set, in which case the file names are UTF-8.
Unfortunately it seems that a lot of zip tools either don't set the EFS bit correctly (e.g. Mac CLI, GUI zip) or use some other encoding, typically the default system one (e.g. WinZip?). If you know how WinZip, 7-Zip, Info-Zip, PKZIP, Java JAR/Zip, .NET zip, dotnetzip, etc. encode file names and what they set their "version made by" field to when zipping, please tell me.
In particular, Info-Zip tries this when unzipping:
File system = MS-DOS (0) => CP437
except: version = 2.5, 2.6, 4.0 => ISO 8859-1
File system = HPFS (6) => CP437
File system = NTFS (10) and version = 5.0 => CP437
otherwise, ISO 8859-1
If I want to support inspecting or extracting from arbitrary zip files and make a reasonable attempt at the file name encoding without the EFS flag, what can I look for?
At the moment the situation is as follows:
most Windows implementations use the DOS (OEM) encoding
the Mac OS zip utility uses UTF-8, but it doesn't set the UTF-8 bit flag
*nix zip utilities silently use the system encoding
So the only way is to check whether the filename looks like valid UTF-8 (see the description of the UTF-8 encoding: for a 2-byte character, the first byte should be 110xxxxx and the second 10xxxxxx). If it is a valid UTF-8 string, use UTF-8. If not, fall back to the OEM/DOS encoding.
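A minimal Python sketch of that fallback, for names whose EFS bit is not set. The function name is made up, and CP437 is assumed as the OEM/DOS encoding (on a real system the OEM code page may differ).

```python
def decode_zip_name(raw: bytes) -> str:
    """Guess the encoding of a zip entry name that has no EFS (bit 11) flag."""
    try:
        # Strict decoding rejects byte sequences that are not well-formed UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # CP437 maps every byte value, so this fallback cannot fail.
        return raw.decode("cp437")


print(decode_zip_name("résumé.txt".encode("utf-8")))  # résumé.txt
print(decode_zip_name(b"r\x82sum\x82.txt"))           # résumé.txt (0x82 is é in CP437)
```

Pure ASCII names decode the same way under both encodings, so the guess only matters when a byte above 127 is present. A CP437 name can still happen to be valid UTF-8, in which case this guesses wrong.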
The only way to determine if the filename is encoded as UTF-8 without using the EFS flag is to check to see if the high order bit is set in one of the characters. That could possibly mean that the character is UTF-8 encoded. However, it could still be the other way as there are some characters in CP437 that have the high order bit set and aren't meant to be decoded as UTF-8.
I would stick to the PKWARE app note specification and not hack in a solution that tries to conform to every known zip application in existence.
May the encrypted string provided by PBEWithMD5AndDES and then Base64 encoded contain the CR and/or LF characters?
Base64 output consists only of printable characters. However, when it's used as a MIME content transfer encoding for email, it's split into lines (of at most 76 characters) which are separated by CR-LF.
PBEWithMD5AndDES returns binary data. PBE encryption is defined within the PKCS#5 standard, and this standard does not have a dedicated base 64 encoding scheme. So the question becomes for which system you need to Base 64 encode the binary data. Wikipedia has a nice section within the Base 64 article that explains the various forms.
You may encounter a PBE implementation that returns Base 64 output without mentioning which of the above schemes is used. In that case you need to somehow figure out which scheme is used. I would suggest searching for it, asking the community, looking at the source or, if all else fails, creating a set of tests on the output.
Fortunately you are pretty safe if you are decoding base 64 and you are ignoring all the white space. Note that some implementations are disregarding padding, so add it before decoding, if applicable.
If you perform the encoding base 64 yourself, I would strongly suggest to not output any whitespace, use only the default alphabet (with '+' and '/' signs) and always perform padding when required. After that you can always split the result and replace any non-standard character (especially the '+' and '/' signs of course), or remove the padding.
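The same distinction shows up in Python's standard base64 module, which makes for a short illustration: encodebytes() produces MIME-style output with a line feed after every 76 characters, while b64encode() produces a single line. The decode_lenient helper is a hypothetical sketch of "ignore whitespace, restore padding" for the default alphabet.

```python
import base64

data = bytes(range(100))  # stand-in for the encrypted bytes

mime_style = base64.encodebytes(data)  # wrapped with b"\n"
single_line = base64.b64encode(data)   # no line breaks


def decode_lenient(text: str) -> bytes:
    """Decode default-alphabet base64, ignoring whitespace and missing padding."""
    compact = "".join(text.split())          # drop CR, LF, spaces, tabs
    compact += "=" * (-len(compact) % 4)     # restore stripped '=' padding
    return base64.b64decode(compact)


assert b"\n" in mime_style
assert b"\n" not in single_line
assert decode_lenient(mime_style.decode("ascii")) == data
assert decode_lenient(single_line.decode("ascii").rstrip("=")) == data
```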
I was using Java with the Android SDK. I found that the command:
String s = Base64.encodeToString(enc, Base64.DEFAULT);
did line wrapping. It put LF chars into the output string.
I found that:
String s = Base64.encodeToString(enc, Base64.NO_WRAP);
did not put the LF characters into the output string.