Jasypt Encryption: Possible Characters? - encryption

How do I know exactly what characters are used for the encrypted output using jasypt? Can I force that my output does not contain certain characters or are always all ASCII characters used?
Reason I am asking is that the encrypted text is part of a file with delimiters and I would like to avoid that this delimiter is part of the encrypted text. The delimiter should also not be a hidden character, like SOH, because the file can be edited manually.

"Base64 only uses 6 bits (corresponding to 2^6 = 64 characters) to ensure encoded data is printable and humanly readable. None of the special characters available in ASCII are used. The 64 characters (hence the name Base64) are 10 digits, 26 lowercase characters, 26 uppercase characters as well as '+' and '/'."
So it looks like I can use an ASCII special character as a delimiter, as long as it is not '+', '/' or the padding character '='.
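A minimal sketch of how that conclusion could be checked, assuming the encrypted text really is standard Base64 as the quote describes. The helper name `is_safe_delimiter` is hypothetical, and random bytes stand in for real ciphertext:

```python
import base64
import os
import string

# The 64 symbols of standard Base64, plus '=' which is used for padding.
BASE64_OUTPUT_CHARS = set(string.ascii_letters + string.digits + "+/=")


def is_safe_delimiter(delimiter: str) -> bool:
    """Return True if no character of the delimiter can occur in Base64 output."""
    return not (set(delimiter) & BASE64_OUTPUT_CHARS)


# Random bytes stand in for ciphertext; the encoded form should only
# ever contain characters from BASE64_OUTPUT_CHARS.
encoded = base64.b64encode(os.urandom(300)).decode("ascii")
only_expected_chars = set(encoded) <= BASE64_OUTPUT_CHARS

print(is_safe_delimiter("|"), is_safe_delimiter("+"), only_expected_chars)
```

A visible character such as '|' or ';' passes this check, which also keeps the file editable by hand.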

Related

base64 decoded string gives weird chars

eyJpc3MiOiJodHRwczpcL1wvYXV0aC5zbmFwY2hhdC5jb21cL3NuYXBfdG9rZW5cL3Rva2VuIiwidHlwIjoiSldUIiwiZW5jIjoiQTEyOENCQy1IUzI1NiIsImFsZyI6ImRpciIsImtpZCI6InNuYXAtYWNjZXNzLXRva2VuLWExMjhjYmMtaHMyNTYuMCJ9..mpqjrn8IPdzqrQC0VhJwMA.JJN9Rrc1k_qh1Iq-jGS-1754-iI_L5mISH7mHix5WCIXx4wqkQz3z8o9nDcBRUJijioV_EMFYW9OayWGHaFR5NlG0ROKHfkJPPWSz4Y47jyZwQxKjEQDMCdPi9HcpNJM_ao6umAQj3gdfFqGK8M9e2_oYy-q6bR6UzeqFvQVLt599KLwl2yJhevgLRFBs7kLd5NG8ZsKGNhTwWs7zYPPZFutyhOmPY13zt1hJsSwek1UXRRZm8qZEEQZsmSbuSQ0sAMvyIh9uZyMCEwdMfo6pU31cnya29Pi_vHJP_TLHH0PNgddOPzpp911Yp4c1lfEY99C3dknQ5DJFtkfdaA3MAUrqKj8NAsIcrX8qPrxpVhDgZ2tqqrkgQb6EMoxEIdRGssIRdR5_jL-F8_8xfhNxIM3mv1NEPkSPIBfOsbSRbBGPecCUwmaB-yP9OmPEyUWv0ieQkGKp5B1J6cFykrMlpmmGkB7H9WIwuDNM4IPLBBBaLgGegIBdwrTU22Yv7Qn2RXKpDObPRuSghUmIvLpr_LwGZ78N4YW-G-nTw_EOjlD58UDHOuth_EcKszBeLs0_EIe9JZzykjulg3ffROHI-
This is a token. When Base64-decoded, it gives some valid output but then starts printing weird characters. Is this really only Base64, and is there even a way to tell? I stripped some of the characters out for obvious reasons.
It's not base 64 encoded, it is base 64 URL encoded. Replace the - with a + and the _ (underscore) with the / character, then pad with = characters until you have a multiple of 4 base 64 characters (not counting whitespace). Then decode and the result should be correct. If not, the base 64 URL code was probably damaged.
I presume you have stripped off the characters and replaced them by a dot, because dots should not be present in URL-safe base 64.
Of course finding and using a base64url decoder would be more efficient than the generic find/replace/append scheme mentioned here.
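The manual steps described above (swap the two characters, pad, decode) can be sketched in Python as follows. The function name `decode_base64url` and the sample bytes are made up for illustration, and the function handles a single segment without dots:

```python
import base64


def decode_base64url(segment: str) -> bytes:
    """Decode one unpadded base64url segment with the find/replace/pad scheme."""
    # Map the URL-safe characters back to the standard alphabet.
    standard = segment.replace("-", "+").replace("_", "/")
    # Pad with '=' until the length is a multiple of 4.
    standard += "=" * (-len(standard) % 4)
    return base64.b64decode(standard)


# Sample bytes chosen so that the encoded form contains both '-' and '_'.
raw = b"\xfb\xef\xbe\xff\xff\xff\x01"
segment = base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")

print(segment, decode_base64url(segment) == raw)
```

Python's own `base64.urlsafe_b64decode` does the character mapping itself, but it still expects the padding to be present.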

Why do several different hexadecimal numbers seem to be represented as the dot (".") symbol?

I noticed that the symbol . doesn't represent the same hexadecimal number when I tried to tune my YARA rules that I run on VirusTotal. When I tried to exclude the false positive-generating text string .sample., it would not get excluded because . converted from text representation was 2E in this case, meanwhile in the string, that was actually contained in the false positives, . represented 00.
I assume that when the files are matched, text is converted to hex, the hex string is then matched in a hexdump of a file and the whole hexdump is converted to text in the VT preview.
Then I noticed that there were actually more hexadecimal numbers that were represented as . in VirusTotal's text preview. For example, 0A, 99, 09 (screenshot).
I tried seeing the text representation of these hex numbers using an online converter (http://www.unit-conversion.info/texttools/hexadecimal/) and some of them were represented as � or a blank symbol (not a space symbol, as the number 20, but just a blank space).
So my questions are - why do different numbers seem to represent the same symbol? In addition, what do the "blank spaces" represent in a file's hexdump?
The 0A characters are line feed characters, as can be seen from the table in this doc, while the 2E characters are actual periods.
As per this answer on the same issue:
These are whitespace characters, and if included literally would mess up the ASCII table. That's why they (as well as the unprintable control characters below 32, and any binary values above 127, which aren't defined by ASCII and would need another character set to be interpreted correctly) are represented by .
Essentially, the '.' character is a catch-all for things which can't be shown properly in the table.
As for the online converter, it appears to generate characters up to 7F, after which ASCII, being a 7-bit code with 128 values, is no longer defined and the translator shows a � symbol. Even from 00 to 7F the translator has issues with a few hex values, including the line feed character 0A.
The ASCII table linked earlier hints at a few characters which the translator might have trouble with, such as the DEL character (7F), the bell (07), and ENQ (05).
I would expect the blank spaces to be whitespace characters; this should be possible to verify in the ASCII table.
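To illustrate the catch-all behaviour, here is a small sketch of how a hexdump text column is typically produced. It assumes the preview follows the common convention of printing bytes 20 to 7E as themselves and everything else as a dot; the function name `render_ascii_column` is hypothetical:

```python
def render_ascii_column(data: bytes) -> str:
    """Render bytes as a hexdump text column: printable ASCII as-is, the rest as '.'."""
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in data)


# A real period is 2E; the false positives contained 00 instead.
with_periods = bytes.fromhex("2e73616d706c652e")
with_nulls = bytes.fromhex("0073616d706c6500")

print(render_ascii_column(with_periods))
print(render_ascii_column(with_nulls))
```

Both inputs render as the same text even though the underlying bytes differ, which is why a rule written against the text form can miss the bytes actually present in the file.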

Simple string encryption - safety of higher ascii characters

I am trying to create a simple encryption scheme for strings. Each character of the string is given another ascii value.
It entails writing ascii characters up to 246 to a simple file on disk.
I want to find out if it is safe to write these special characters to the disk or can it cause untoward results. Thanks for your help.
Edit: I am considering algorithm similar to following:
* Convert each character of string to its integer number (hence 110 for 'n' and 122 for 'z')
* Double that number (get 220 and 244)
* Convert this to character (will get extended ascii codes)
* Save these characters to file.
Is it safe to save these extended ascii characters to disk files using usual text file writing functions?
There is only a limited set of ASCII characters. There are 95 printable characters, such as 'A' but also the space character. There are 33 non-printable control characters, such as Line Feed, Carriage Return and NUL, but also DELETE. So you cannot use 246 characters of ASCII, as there are only 128 available in total. ASCII is strictly 7 bits, giving you 2^7 = 128 possible values.
Even if you used the ISO 8859 Latin character set or the Windows-1252 character set, you would still have the unprintable control characters to deal with. Windows-1252 also has 5 undefined characters, which leaves you with 256 - 33 - 5 = 218 characters.
What you can do, of course, is save your data as bytes. Each byte has 256 possible values (usually 0 to 255 or -128 to 127). As long as you open files as binary, this poses no problem.
You can of course store as many characters in a file as you want, up to the file system or operating system limit. So I presume you didn't ask that.
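A minimal sketch of the doubling scheme from the question, stored as bytes in a file opened in binary mode as suggested above. The function names and the temporary file name are hypothetical, and the input is assumed to be pure ASCII so that every doubled value still fits in one byte:

```python
import os
import tempfile


def scramble(text: str) -> bytes:
    """Double each ASCII code; 127 * 2 = 254 still fits in a single byte."""
    data = text.encode("ascii")  # raises UnicodeEncodeError for non-ASCII input
    return bytes(b * 2 for b in data)


def unscramble(data: bytes) -> str:
    """Halve each byte to recover the original ASCII text."""
    return bytes(b // 2 for b in data).decode("ascii")


def round_trip_through_file(text: str) -> str:
    """Write the scrambled bytes in binary mode, read them back, and unscramble."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "scrambled.bin")
        with open(path, "wb") as handle:
            handle.write(scramble(text))
        with open(path, "rb") as handle:
            return unscramble(handle.read())


print(list(scramble("nz")), round_trip_through_file("nz\r\n") == "nz\r\n")
```

Because the file is opened with "wb" and "rb", no character set or newline translation is applied, so values such as 220 and 244 are written and read back unchanged.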

How to represent acute accents in ASCII?

I'm having an encoding problem related to cookies on one of my websites.
A user is entering Usuário, which has an acute accent, and that's being put in a cookie. The raw HEX for the cookie response is (for the Usuário string):
55 73 75 C3 A1 72 69 6F
When I see it in the browser, it looks like this:
...which is really messy. I need to fix this up.
Then I went to this website: http://www.rapidtables.com/convert/number/hex-to-ascii.htm and converted the HEX value to see what it would look like. And I got the same output:
Right. This means the HEX code is wrong. Then I tried to convert Usuário to ASCII to see how it should be. I used this website: http://www.asciitohex.com/ and this is the result:
To my surprise, the HEX is exactly the one that is showing up messy. Why???
And how do I represent Usuário in ASCII so I can put it in a cookie? Should I manually encode it?
PS: I'm using ASP.NET, just in case it matters.
As of 2015, the standard encoding for character data on the web is UTF-8, not ASCII. ASCII only defines 128 characters and does not include any accented characters. To add accented characters to these 128, there were many legacy solutions: codepages. Each added 128 different characters to the ASCII set, allowing 256 different characters to be represented.
The problem was that this didn't properly solve the issue: ASCII-based codepages were more or less incompatible with each other (except for the first 128 characters), and there was usually no way of programmatically knowing which codepage was in use.
One of the solutions was UTF-8, which is a way to encode the Unicode character set (containing most of the characters used around the world, and more) while trying to remain compatible with ASCII. The first 128 characters are the same in both cases, but beyond them UTF-8 characters become multi-byte: one character is encoded using a series of bytes (usually 2-3, depending on which character needs to be encoded).
The problem arises if you are using some kind of ASCII-based single-byte codepage (like ISO-8859-1), which encodes each supported character in a single byte, but your input is actually UTF-8, which encodes accented characters in multiple bytes (you can see this in your HEX example: á is encoded as C3 A1, two bytes). If you read these two bytes in a single-byte codepage (in Western Europe this is usually ISO-8859-1), each of the two bytes is shown as a separate character, so á turns into Ã¡.
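The effect can be reproduced in a few lines. This is only a sketch in Python rather than ASP.NET, with made-up variable names, showing the bytes from the question being produced as UTF-8 and then misread as ISO-8859-1:

```python
# "Usu\u00e1rio" is Usuário, written with an escape so the source stays ASCII.
text = "Usu\u00e1rio"

# Encoding as UTF-8 turns the accented letter into the two bytes C3 A1.
utf8_bytes = text.encode("utf-8")
hex_dump = utf8_bytes.hex(" ").upper()

# Reading those bytes as ISO-8859-1 maps every byte to its own character.
misread = utf8_bytes.decode("iso-8859-1")

# Undoing the wrong step recovers the original text.
repaired = misread.encode("iso-8859-1").decode("utf-8")

print(hex_dump)
print(misread)
print(repaired)
```

The hex dump matches the cookie bytes from the question, which suggests the bytes themselves are valid UTF-8 and only the decoding step uses the wrong character set.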
In the web world the default encoding is UTF-8, so your clients will usually send their requests using UTF-8. ASP.NET is Unicode aware, so it can handle these requests. However, somewhere in your code this UTF-8 is accidentally converted into ISO-8859-1, and then back into UTF-8. This might happen on various layers. Given the issue you describe, it probably happens at the cookie layer, which is sometimes problematic (here is how it worked in 2009). You should also double-check that your application uses UTF-8 everywhere else (views, database, etc.) if you want to properly support accented characters.

Creating a password regex

Right now I need to duplicate a password expression validator for a website. The password is only required to be 8-25 characters (only alphabet characters) long. I thought this was weird and had been using this regex
(?!^[0-9]*$)(?!^[a-zA-Z]*$)^([a-zA-Z0-9]{8,25})
but it has to be optional to have a capital letter, special characters and/or numbers throughout the password. I'm not particularly apt at building regexes where there are optional characters. Any help would be appreciated.
I am using asp.net's RegularExpressionValidator.
This pattern should work:
^[a-zA-Z]{8,25}$
It matches a string consisting of 8 to 25 Latin letters.
If you want to allow numbers as well, this pattern should work:
^[a-zA-Z0-9]{8,25}$
It matches a string consisting of 8 to 25 Latin letters or decimal digits.
If you want to allow special characters as well, this pattern should work:
^[a-zA-Z0-9$#!]{8,25}$
It matches a string consisting of 8 to 25 Latin letters, decimal digits, or symbols, $, # or ! (of course you can add to this set fairly easily).
Your current regex won't work because it will accept special characters from the 9th character onwards (and in fact anything after that, even a 26th character, because you don't have the end-of-string anchor).
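The missing end anchor can be demonstrated with a quick check. This sketch uses Python's re module as a stand-in for the .NET engine behind the validator, so treat it as an illustration of the patterns themselves; the sample passwords are made up:

```python
import re

# Pattern from the question: no '$' after the character class.
ORIGINAL = re.compile(r"(?!^[0-9]*$)(?!^[a-zA-Z]*$)^([a-zA-Z0-9]{8,25})")
# Anchored pattern from the answer above.
LETTERS_AND_DIGITS = re.compile(r"^[a-zA-Z0-9]{8,25}$")

candidate = "abcdefg1!!!"  # 8 valid characters followed by special characters

# The unanchored pattern finds a match in the first 8 characters.
original_accepts = ORIGINAL.search(candidate) is not None
# The anchored pattern must cover the whole string, so it rejects it.
anchored_accepts = LETTERS_AND_DIGITS.search(candidate) is not None

print(original_accepts, anchored_accepts)
```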
You probably want something like this:
^(?=.*[a-z])[A-Za-z0-9]{8,25}$
This first makes sure there is at least one lowercase letter (you mentioned that uppercase letters and digits are optional, which makes lowercase obligatory) and then allows only letters and digits.
EDIT: To allow any special characters, you can use this:
^(?=.*[a-z]).{8,25}$
My understanding of your problem is that the password's first requirement is that it has to contain lowercase alphabet characters. The option now is that it can also contain other characters. If this isn't right, let me know.
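A short sketch of how the two lookahead patterns behave, again using Python's re module as a stand-in for the .NET engine and made-up sample passwords:

```python
import re

# At least one lowercase letter, then only letters and digits.
ALNUM_WITH_LOWER = re.compile(r"^(?=.*[a-z])[A-Za-z0-9]{8,25}$")
# At least one lowercase letter, then any characters at all.
ANY_WITH_LOWER = re.compile(r"^(?=.*[a-z]).{8,25}$")

samples = ["ABCDEFGh", "ABCDEFG1", "ABC!DEF#g", "ABC!DEF#1"]

# Map each sample to (accepted by first pattern, accepted by second pattern).
results = {
    s: (bool(ALNUM_WITH_LOWER.search(s)), bool(ANY_WITH_LOWER.search(s)))
    for s in samples
}

for sample, outcome in results.items():
    print(sample, outcome)
```

The lookahead only checks that a lowercase letter exists somewhere in the string; the part after it decides which characters and lengths are allowed.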
