Arduino WiFi passphrase with \0

I'm trying to connect Arduino to a WiFi network.
const char* ssid = "ssid";
const char* password = "some_hex_chars";
// ...

void setup(void) {
  WiFi.begin(ssid, password);
  // ...
}
The problem is, I have the code 0x00 somewhere in the passphrase. Since the begin() method takes a char* argument, which is a null-terminated string, the password gets truncated.
Is there a way to work around this? Where can I find the source of the begin() method so I can modify it?
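To illustrate what I mean (a minimal sketch; the byte values are made up, not my real key):

#include <string.h>

// The 0x00 in the middle ends the C string early, so anything after it
// never reaches begin():
const char password_demo[] = { 'a', 'b', 0x00, 'c', 'd', '\0' };
// strlen(password_demo) == 2 here, not 5, and WiFi.begin(ssid, password_demo)
// would only ever see "ab".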
Edit: WRONG.
It's not a passphrase, it's a PSK with 64 hexadecimal characters, and it still doesn't want to connect.
Update:
I solved the problem. It wasn't a PSK problem, but the WiFi router's advanced settings. When
54g™ Mode is set to 54g Performance, it doesn't want to connect. After I changed it to 54g Auto, it works fine.

I know nothing about Arduino - but I do know more about 802.11 aka Wi-Fi. And so:
Don't do that. If you have a 0x00 in the middle of your passphrase, it is technically invalid, as per the IEEE 802.11 standards.
And so it will, I presume, be interpreted as the end of your passphrase (i.e. the passphrase is everything before this 0x00) by your 802.11 stack if it is correctly implemented, and you're looking at undefined behavior: at best interoperability problems, at worst you're taking a bet.
How is that?
(warning: this is going to be boring, lots of "network lawyer" stuff)
The IEEE 802.11 standard relevant to this is IEEE Std 802.11i-2004 "Amendment 6: Medium Access Control (MAC) Security Enhancements"[0], aka "WPA2".
(I won't dig down into WEP, which is clearly deprecated and of no use, nor into "basic" WPA, which was a transitional step while this WPA2 standard was being completed.)
The relevant part can be found in 802.11i, in the ASN MIB[1] (Annex D, normative): page 136 defines "dot11RSNAConfigPSKPassPhrase" as a "DisplayString". So what type of data exactly is a "DisplayString"?
RFC 1213, "Management Information Base for Network Management of TCP/IP-based internets: MIB-II", from 1991, on page 3, states that:
"A DisplayString is restricted to the NVT ASCII character set, as
defined in pages 10-11 of [6]."
OK...
This "[6]" is RFC 854, from 1983 (Wow! These IETF and IEEE design their standards seriously and really, really build upon). Are you still following me? :-) So having a look at it we learn that NVT stands for "Network Virtual Terminal", and in pointed to page 10 and 11, we found:
The NVT printer [sic! remember that's 1983] [...] can produce
representations of all 95 USASCII graphics (codes 32 through 126).
OK, ASCII codes 32 to 126. Now let's come back to IEEE 802.11i:
In Annex H (informative), "RSNA reference implementations and test vectors", there is a section "H.4 Suggested pass-phrase-to-PSK mapping" (remember that the purpose of the passphrase, mathematically massaged with the SSID, is to derive a PSK (Pre-Shared Key), which is more useful for 802.11 operation but much less user-friendly than "a damned simple passphrase that I can type with a damned keyboard"). Which, phrased the IEEE way, gives this (page 165):
The RSNA PSK consists of 256 bits, or 64 octets when represented in
hex. It is difficult for a user to correctly enter 64 hex characters.
Most users, however, are familiar with passwords and pass-phrases and
feel more comfortable entering them than entering keys. A user is more
likely to be able to enter an ASCII password or pass-phrase, even
though doing so limits the set of possible keys. This suggests that
the best that can be done is to introduce a pass-phrase to PSK
mapping.
This clause defines a pass-phrase–to–PSK mapping that is the
recommended practice for use with RSNAs.
This pass-phrase mapping was introduced to encourage users unfamiliar
with cryptographic concepts to enable the security features of their
WLAN.
...so that's what the purpose of a passphrase is. And then, on the following page 166:
Here, the following assumptions apply:
A pass-phrase is a sequence of between 8 and 63 ASCII-encoded characters. The limit of 63 comes from the desire to distinguish
between a pass-phrase and a PSK displayed as 64 hexadecimal
characters.
Each character in the pass-phrase must have an encoding in the range of 32 to 126 (decimal), inclusive. [emphasis mine]
And Voila! Indeed, "32 to 126 (decimal), inclusive".
So here we have again our passphrase as ASCII "in the range of 32 to 126 (decimal)", confirmed from IEEE to IETF and back to IEEE. We also learn that it's supposed to be between 8 and 63 bytes long, which, I would infer, implies that if it is longer than 63 bytes it will be trimmed down (and not NULL terminated, which is not a problem), and if it is shorter, it will be cut at the first character outside of the 32-126 ASCII range. Of course, the C string NULL terminator 0x00 is the most practical and sensible character to use for that, BTW.
So, a passphrase = a string consisting only of ASCII codes 32 to 126 (decimal).
Have a look at an ASCII table, and you'll see that this range starts with the space character and ends with the tilde '~'.
And 0x00 is definitely not in it.
Hence, long story short: your passphrase is technically invalid as far as the standard is concerned, and you're looking at undefined behavior.
Congratulations if you've read me this far!
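If you want to check a candidate passphrase programmatically, here is a minimal sketch of those Annex H constraints (isValidPassphrase is just a name I made up, not any standard API):

#include <string.h>

// Sketch of the 802.11i Annex H assumptions on a pass-phrase:
// 8 to 63 characters, each with an ASCII code from 32 to 126 inclusive.
bool isValidPassphrase(const char* p) {
    size_t len = strlen(p);               // an embedded 0x00 already ends the string here
    if (len < 8 || len > 63) return false;
    for (size_t i = 0; i < len; ++i) {
        unsigned char c = static_cast<unsigned char>(p[i]);
        if (c < 32 || c > 126) return false;
    }
    return true;
}

For reference, the actual mapping defined in H.4 is PSK = PBKDF2(HMAC-SHA1, passphrase, SSID, 4096 iterations, 256 bits).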
Addendum:
When it comes to networking protocols, never, ever assume that what looks like "a string" is just "a string" in whatever sense you may presuppose; always check the exact encoding and limitations.
Other example regarding Wi-Fi:
Another "string" is the SSID. Is this really a string? No. It is a raw array of 32 bytes, no ASCII, no UTF-8, Unicode, whatever, no termination, just 32 raw bytes, even if you "set it" as "foobar + NULL terminator" a whole 32 bytes will be used by the stack and go on the air (look at a wireshark trace, and doucle-click the SSID field in the dissection: 32 bytes long). So an SSID could consist of only ASCII spaces, tabs, CR, LF and a few 0x00 here and there, or only 0x00 BTW, it will be perfectly valid and managed as a full 32 bytes sequence anyway.
EDIT:
I wondered about your motivation for setting such a passphrase, and the only idea I could come up with - correct me if I'm wrong - is that your purpose was to play a neat trick to ensure that a regular user, using a regular keyboard, could never enter the passphrase. Sadly - or actually hopefully - as I explained, this cannot work because the IEEE precisely designed the passphrase data type to be 100% sure that anybody, using the most basic keyboard, could always type it. That was their motivation.
And so, what can you do?
As an alternative, you could directly use a PSK. That's 32 plain raw bytes (represented as 64 hex ASCII digits), with no typeable/printable constraints. For example, from the hostapd.conf file (of course the example PSK is represented here as "text", but it is actually raw bytes):
# WPA pre-shared keys for WPA-PSK. This can be either entered as a 256-bit
# secret in hex format (64 hex digits), wpa_psk, or as an ASCII passphrase
# (8..63 characters) that will be converted to PSK. This conversion uses SSID
# so the PSK changes when ASCII passphrase is used and the SSID is changed.
# wpa_psk (dot11RSNAConfigPSKValue)
# wpa_passphrase (dot11RSNAConfigPSKPassPhrase)
#wpa_psk=0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef
#wpa_passphrase=secret passphrase
But then of course, 1/ this may not fit your use case (deployment wise), and 2/ the Arduino Wi-Fi API may have no such capabilities.
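If your stack does let you hand it a raw PSK, turning the 64 hex digits into the 32 raw key bytes takes only a few lines (a minimal sketch; hexToPsk and hexDigit are my own names, not part of any Wi-Fi API):

#include <stdint.h>
#include <stddef.h>

static int hexDigit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Convert a 64-hex-digit PSK string into the 32 raw key bytes.
bool hexToPsk(const char* hex64, uint8_t psk[32]) {
    for (size_t i = 0; i < 32; ++i) {
        int hi = hexDigit(hex64[2 * i]);
        if (hi < 0) return false;             // too short or not a hex digit
        int lo = hexDigit(hex64[2 * i + 1]);
        if (lo < 0) return false;
        psk[i] = static_cast<uint8_t>((hi << 4) | lo);
    }
    return true;
}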
Hope this helps.
[0]:
You can download it for free here:
http://standards.ieee.org/getieee802/download/802.11i-2004.pdf
[1]:
That's IEEE jargon for "Management Information Base" in "Abstract Syntax Notation", which is a formal, hierarchical notation of every data item, with its name and type, for a given standard. You can think of it as "XML", only it's not XML, and it is used by the IETF and IEEE (RFC 2578, RFC 1213).

Related

How to build a serial (RS422 or RS232) message to communicate with Sick LMS200 via PuTTy?

I have a serial device (Sick LMS200) connected to my PC using an RS422-to-USB converter. The serial settings (baud, stop bits, etc.) on the LMS200 and my PC match and the two are communicating (verified using an application that ships with the LMS200). I need to write a custom application which communicates with the LMS.
Before I can begin building my application I need to figure out how to exchange datagrams between the PC and the LMS. To figure this out I have been trying to manually send datagrams using PuTTy. The manual for the LMS ( https://drive.google.com/open?id=0Byv4owwJZnRYVUJPMXdud0Z6Uzg) defines the datagram types and how they should be built. For example, on pg 46 of the manual it is possible to see a datagram that sends a specific instruction to the unit; it looks like this: 02 00 02 00 30 01 31 18.
However, when I use PuTTY to send the string 02 00 02 00 30 01 31 18, the LMS does not respond (which it should). I believe it does not respond because either the datagram is missing some serial header data or I am not representing the hex values correctly (I tried to represent bytes such as 00 using 0x00 and 00h but had no success). Can you please help me formulate a valid serial message using the manual? I have been at this for a very long time and I am having a really hard time understanding how to convert the information in the manual into a valid datagram.
Please let me know if I can provide any more info. Thanks in advance.
I am not representing the hex values correctly (I tried to represent bytes such as 00 using 0x00 and 00h but had no success).
The Ctrl key on terminal/PC keyboards can be used to generate ASCII control characters (i.e. the unprintable characters with byte values of 0x00 through 0x1F).
Just like the Shift key generates the shifted or uppercase character of the key (instead of its unshifted or lower-case character), the Ctrl key (with an alphabetic or a few other keys) can generate an ASCII control character.
The typical USA PC keyboard can generate an ASCII 'NUL' character by typing ctrl-@, that is, by holding down the CTRL and Shift keys and typing 2 (since the '@' character is the shifted character of the 2 key on USA PC keyboards).
In similar fashion for 'SOH' or 0x01 type ctrl-A (i.e. CTRL+A keys, the Shift is not necessary), for 'STX' or 0x02 type ctrl-B, et cetera.
For 'SUB' or 0x1A type ctrl-Z.
For 'ESC' or 0x1B type the Esc key.
For 'FS' or 0x1C type ctrl-\ (or CTRL+\).
For 'GS' or 0x1D type ctrl-] (or CTRL+]).
For 'RS' or 0x1E type ctrl-^ (or CTRL+Shift+6).
For 'US' or 0x1F type ctrl-_ (or CTRL+Shift+-).
Note that a few oft-used ASCII control codes have dedicated keys, e.g.
'HT' has the Tab key for 0x09,
'BS' has the Backspace key for 0x08,
'LF' has the Enter key (in Linux) for 0x0A, and
'ESC' has the Esc key for 0x1B.
When you don't know how to generate the ASCII control characters from the keyboard, you could fall back on creating the message in a file using a hex editor (not a text editor), and then send the file.
Actually the binary file could be the most reliable method of hand-generation of binary messages. Hand-typing of control codes could fail when a code is intercepted by the shell or application program as a special directive to it (e.g. ctrl-C or ctrl-Z to abort the program) rather than treating it as data.
Escaping the input data is one method that might be available to avoid this.
Phone modems have managed to avoid this issue when in transparent (aka data) mode, by requiring a time guard (i.e. specific idle times) to separate and differentiate commands from data.
The way to get this done is to:
(1) download HexEdit software, and create a file containing the HEX values (not decimal representations from the ASCII table, where the number 2 was being transmitted as 32);
(2) use Tera Term software to then send the file over the serial line.
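As a sketch of step (1) without a hex editor: the same binary file can be written with a few lines of C++ (the bytes are the example telegram from page 46 of the manual; the file name is arbitrary), and Tera Term can then send it over the serial line as in step (2).

#include <cstdio>

int main() {
    // The example telegram from the LMS200 manual: eight raw bytes,
    // written as binary data, not as the text "02 00 02 ...".
    const unsigned char telegram[] = { 0x02, 0x00, 0x02, 0x00, 0x30, 0x01, 0x31, 0x18 };
    FILE* f = std::fopen("telegram.bin", "wb");   // "wb": binary mode matters on Windows
    if (!f) return 1;
    std::fwrite(telegram, 1, sizeof telegram, f);
    std::fclose(f);
    return 0;
}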

Is it possible to tell which hash algorithm generated these strings?

I have pairs of email addresses and hashes; can you tell what's being used to create them?
aaaaaaa#aaaaa.com
BeRs114JrR0sBpueyEmnOWZfnLuigYTA
and
aaaaaaaaaaaaa.bbbbbbbbbbbb#cccccccccccc.com
4KoujQHr3N2wHWBLQBy%2b26t8GgVRTqSEmKduST9BqPYV6wBZF4IfebJS%2fxYVvIvR
and
r.r#a.com
819kwGAcTsMw3DndEVzu%2fA%3d%3d
First, the obvious even if you know nothing about cryptography: the percent signs are URL encoding; decoding that gives
BeRs114JrR0sBpueyEmnOWZfnLuigYTA
4KoujQHr3N2wHWBLQBy+26t8GgVRTqSEmKduST9BqPYV6wBZF4IfebJS/xYVvIvR
819kwGAcTsMw3DndEVzu/A==
And that in turn is base64. The lengths of the base64-decoded data versus the lengths of the original strings are:

plaintext length    decoded length (bytes)
17                  24
43                  48
10                  16
More samples would give more confidence, but it's fairly clear that the encoding pads the plaintext to a multiple of 8 bytes. That suggests a block cipher (it can't be a hash, since a hash would be fixed-size). The de facto standard block algorithm is AES, which uses 16-byte blocks; 24 is not a multiple of 16, so that's out. The most common block algorithm with a block size of 8 (which fits the data) is DES; 3DES or Blowfish or something even rarer is also a possibility, but DES is what I'd put my money on.
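If you want to redo that arithmetic yourself, here is a minimal sketch using the three URL-decoded samples above (base64DecodedLength is my own helper name):

#include <cstdio>
#include <cstring>

// Decoded byte length of a base64 string: 3 bytes per 4 characters,
// minus one byte per '=' padding character.
static size_t base64DecodedLength(const char* s) {
    size_t n = std::strlen(s);
    size_t pad = 0;
    while (n > 0 && s[n - 1] == '=') { --n; ++pad; }
    return ((n + pad) / 4) * 3 - pad;
}

int main() {
    const char* samples[] = {
        "BeRs114JrR0sBpueyEmnOWZfnLuigYTA",
        "4KoujQHr3N2wHWBLQBy+26t8GgVRTqSEmKduST9BqPYV6wBZF4IfebJS/xYVvIvR",
        "819kwGAcTsMw3DndEVzu/A==",
    };
    for (const char* s : samples)
        std::printf("%zu characters -> %zu bytes\n", std::strlen(s), base64DecodedLength(s));
    // Prints 32 -> 24, 64 -> 48, 24 -> 16: all multiples of 8, none a multiple of 16.
    return 0;
}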
Since it's a cipher, there must be a key somewhere. It might be in a configuration file, or hard-coded in the source code. If all you have is the binary, you should be able to locate it with the help of a debugger. With DES, you could find the key by brute force (because a key is only 56 bits and that's doable by renting a bit of CPU time on Amazon) but finding it in the program would be easier.
If you want to reproduce the algorithm, then you'll also need to figure out the mode of operation. One clue here is that the encoding is never more than 7 bytes longer than the plaintext, so there's no room for an initialization vector. If the developers who made that software did a horrible job, they might have used ECB. If they did a slightly less horrible job, they might have used CBC or (much less likely) some other mode with a constant IV. If they did a still slightly less horrible job, then the IV may be derived from some other characteristic of the account. You can refine the analysis by testing some patterns:
If the encoding of abcdefghabcdefgh#example.com (starting with two identical 8-byte blocks) starts with two identical 8-byte blocks, it's ECB.
If the encoding of abcdefgh1#example.com and abcdefgh2#example.com (differing at the 9th character) have identical first blocks, it's CBC (probably) with a constant IV.
Another thing you'll need to figure out is the padding mode. There are a few common ones. That's a bit harder to figure out as a black box except with ECB.
There are some tools online, and also some open source projects. For example:
https://code.google.com/archive/p/hash-identifier/
http://www.insidepro.com/

Is it possible to send ASCII control codes via RS232?

I would like to receive and send bytes that have special meanings in ASCII, like End of Text, End of Transmission, etc., but I am not sure if that is allowed. Can it break my communication? Sending and receiving look like reading from a file, which is why I doubt I can directly use these specific values. I use Windows.
EDIT: I have tested it and there is no problem with any character. All of the ASCII control characters can be sent via RS-232. Neither reading nor writing causes any unexpected behaviour.
RS-232 is a very binary protocol. It does not even assume 8-bit bytes, let alone ASCII. The fact that on Windows you use file functions does not matter either; those too do not assume text data, although they do assume 8-bit bytes.
RS-232 nodes do not interpret the data except in software flow control mode (XOn/XOff). You use this mode only if both parties agree and agree on the values of XOn and XOff.
The values are historically based on the ASCII DC1 and DC3 characters, but the only thing that matters is their values, 0x11 and 0x13.
If you haven't set up software flow control, all values are passed through as-is.
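If you want to convince yourself on Windows, here is a minimal Win32 sketch that writes a few control codes as plain data (COM1 and 9600 8N1 with no flow control are assumptions; error handling is trimmed):

#include <windows.h>

int main() {
    // Open the serial port (adjust the name for your setup).
    HANDLE h = CreateFileA("\\\\.\\COM1", GENERIC_READ | GENERIC_WRITE,
                           0, NULL, OPEN_EXISTING, 0, NULL);
    if (h == INVALID_HANDLE_VALUE) return 1;

    DCB dcb = {0};
    dcb.DCBlength = sizeof dcb;
    GetCommState(h, &dcb);
    dcb.BaudRate = CBR_9600;        // assumption: 9600 baud, 8 data bits,
    dcb.ByteSize = 8;               // no parity, one stop bit
    dcb.Parity   = NOPARITY;
    dcb.StopBits = ONESTOPBIT;
    SetCommState(h, &dcb);

    // STX, "hi", ETX, EOT: control codes sent as ordinary data bytes.
    const unsigned char data[] = { 0x02, 'h', 'i', 0x03, 0x04 };
    DWORD written = 0;
    WriteFile(h, data, sizeof data, &written, NULL);
    CloseHandle(h);
    return 0;
}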

Should I Use Base64 or Unicode for Storing Hashes & Salts?

I have never worked on the security side of web apps, as I am just out of college. Now, I am looking for a job and working on some websites on the side, to keep my skills sharp and gain new ones. One site I am working on is pretty much copied from the original MEAN stack from the guys that created it, but trying to understand it and do things better where I can.
To compute the hash & salt, the creators used PBKDF2. I am not interested in hearing about arguments for or against PBKDF2, as that is not what this question is about. They seem to have used buffers for everything here, which I understand is a common practice in node. What I am interested in are their reasons for using base64 for the buffer encoding, rather than simply using UTF-8, which is an option with the buffer object. Most computers nowadays can handle many of the characters in Unicode, if not all of them, but the creators could have chosen to encode the passwords in a subset of Unicode without restricting themselves to the 65 characters of base64.
By "the choice between encoding as UTF-8 or base64", I mean transforming the binary of the hash, computed from the password, into the given encoding. node.js specifies a couple ways to encode binary data into a Buffer object. From the documentation page for the Buffer class:
Pure JavaScript is Unicode friendly but not nice to binary data. When dealing with TCP
streams or the file system, it's necessary to handle octet streams. Node has several
strategies for manipulating, creating, and consuming octet streams.
Raw data is stored in instances of the Buffer class. A Buffer is similar to an array
of integers but corresponds to a raw memory allocation outside the V8 heap. A Buffer
cannot be resized.
What the Buffer class does, as I understand it, is take some binary data and calculate the value of each 8 (usually) bits. It then converts each set of bits into a character corresponding to its value in the encoding you specify. For example, if the binary data is 00101100 (8 bits), and you specify UTF-8 as the encoding, the output would be , (a comma). This is what anyone looking at the output of the buffer would see when looking at it with a text editor such as vim, as well as what a computer would "see" when "reading" them. The Buffer class has several encodings available, such as UTF-8, base64, and binary.
I think they felt that, while storing any imaginable UTF-8 character in the hash, as they would have to do, would not faze most modern computers, with their gigabytes of RAM and terabytes of space, actually showing all these characters, as they may want to do in logs, etc., would freak out users, who would have to look at weird Chinese, Greek, Bulgarian, etc. characters, as well as control characters like Ctrl or Backspace or even beeps. They would never really need to make sense of any of them, unless they were experienced users testing PBKDF2 itself, but the programmer's first duty is to not give any of his users a heart attack. Using base64 increases the overhead by about a third, which is hardly worth noting these days, and decreases the character set, which does nothing to decrease the security. After all, computers are written completely in binary. As I said before, they could have chosen a different subset of Unicode, but base64 is already standard, which makes things easier and reduces programmer work.
Am I right about the reasons why the creators of this repository chose to encode its passwords in base64, instead of all of Unicode? Is it better to stick with their example, or should I go with Unicode or a larger subset of it?
A hash value is a sequence of bytes. This is binary information. It is not a sequence of characters.
UTF-8 is an encoding for turning sequences of characters into sequences of bytes. Storing a hash value "as UTF-8" makes no sense, since it is already a sequence of bytes, and not a sequence of characters.
Unfortunately, many people have taken to the habit of considering a byte as some sort of character in disguise; it was at the basis of the C programming language and still infects some rather modern and widespread frameworks such as Python. However, only confusion and sorrow lie down that path. The usual symptoms are people wailing and whining about the dreadful "character zero" -- meaning, a byte of value 0 (a perfectly fine value for a byte) that, turned into a character, becomes the special character that serves as the end-of-string indicator in languages from the C family. This confusion can even lead to vulnerabilities (the zero implying, for the comparison function, an earlier-than-expected termination).
Once you have understood that binary is binary, the problem becomes: how are we to handle and store our hash value? In particular in JavaScript, a language that is known to be especially poor at handling binary values. The solution is an encoding that turns the bytes into characters, and not just any characters but a very small subset of well-behaved ones. This is called Base64. Base64 is a generic scheme for encoding bytes into character strings that don't include problematic characters (no zero, only ASCII printable characters, excluding all the control characters and a few others such as quotes).
Not using Base64 would imply assuming that JavaScript can manage an arbitrary sequence of bytes as if it was just "normal characters", and that is simply not true.
There is a fundamental, security-related reason to store as Base64 rather than Unicode: the hash may contain the byte value "0", used by many programming languages as an end-of-string marker.
If you store your hash as Unicode, you, another programmer, or some library code you use may treat it as a string rather than a collection of bytes, and compare using strcmp() or a similar string-comparison function. If your hash contains the byte value "0", you've effectively truncated your hash to just the portion before the "0", making attacks much easier.
Base64 encoding avoids this problem: the byte value "0" cannot occur in the encoded form of the hash, so it doesn't matter if you compare encoded hashes using memcmp() (the right way) or strcmp() (the wrong way).
This isn't just a theoretical concern, either: there have been multiple cases of code for checking digital signatures using strcmp(), greatly weakening security.
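A small illustration of that failure mode (the byte values are made up; any two hashes sharing the bytes before the first 0x00 behave the same way):

#include <stdio.h>
#include <string.h>

int main(void) {
    // Two different 8-byte "hashes" that happen to share a leading byte
    // followed by 0x00.
    unsigned char stored[8]   = { 0xa7, 0x00, 0x13, 0x52, 0x9e, 0x41, 0x07, 0xd0 };
    unsigned char computed[8] = { 0xa7, 0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };

    // strcmp() stops at the first 0x00, so everything after it is ignored
    // and it wrongly reports a match:
    printf("strcmp: %d\n", strcmp((char*)stored, (char*)computed));   // 0 ("equal")

    // memcmp() compares all 8 bytes and correctly reports a mismatch:
    printf("memcmp: %d\n", memcmp(stored, computed, sizeof stored));  // non-zero
    return 0;
}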
This is an easy answer, since there is an abundance of byte sequences which are not well-formed UTF-8 strings. The most common case is a continuation byte (0x80-0xbf) that is not preceded by the leading byte of a multibyte sequence (0xc0-0xf7); the bytes 0xf8-0xff aren't valid either.
So these byte sequences are not valid UTF-8 strings:
0x80
0x40 0xa0
0xff
0xfe
0xfa
If you want to encode arbitrary data as a string, use a scheme that allows it. Base64 is one of those schemes.
An additional point: you might think to yourself, well, I don't really care whether they're well-formed UTF-8 strings, I'm never going to use the data as a string, I just want to hand this byte sequence over to be stored for later.
The problem with that is, if you give an arbitrary byte sequence to an application expecting a UTF-8 string and it is not well-formed, the application is not obligated to make use of that byte sequence. It might reject it with an error, it might truncate the string, it might try to "fix" it.
So don't try to store arbitrary byte sequences as a UTF-8 string.
Base64 is better, but consider a websafe base64 alphabet for transport. Base64 can conflict with querystring syntax.
Another option you might consider is using hex. It's longer, but it seldom conflicts with any syntax.
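For completeness, hex encoding is a short loop (a minimal sketch; toHex is my own helper name):

#include <string>
#include <cstddef>

// Hex-encode raw hash bytes. Twice as long as the input, but the output
// is plain [0-9a-f] and never clashes with URL, JSON, or string-terminator rules.
std::string toHex(const unsigned char* data, std::size_t len) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(len * 2);
    for (std::size_t i = 0; i < len; ++i) {
        out.push_back(digits[data[i] >> 4]);
        out.push_back(digits[data[i] & 0x0f]);
    }
    return out;
}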

Find the encryption type given the original message and the encrypted message

Is it possible, or is there a tool, to find the encryption method when we have the original message and the encrypted message?
Example 1: encrypted message: ZHVoYW1lbA, original message: duhamel
Example 2: encrypted message: ZmV5, original message: fey
No.
About all you can do is use the lengths of the messages to work out whether it is a block or stream cipher (if it is a block cipher, they will be multiples of some fixed size). Even that requires some care, as you need to guess whether an IV and HMAC or similar have been used (for example, in CTR mode, the (so-called) IV is half a block).
If your examples are real, then that's not a block cipher, because the encrypted messages are too short. And I don't really understand what the encoding is - normally an encrypted message is binary, rather than characters, so is written as a hex string or similar. But your examples seem to be character strings.
So your examples are either made up or something unusual - more likely a "home-made" code than a standard algorithm used in software libraries.
[Edit:] I'm updating this answer after working on https://stackoverflow.com/questions/18560948/encrypted-string-by-unknown-method#comment27312634_18560948
The above is talking about encryption. Sometimes, however, what people are actually asking about is the encoding. That is how the bytes in the (probably encrypted, but perhaps not) message are converted into something that is displayed or sent over the internet, or whatever. This might be hex, or base 64, or something more complex like PEM. Often you can guess this, because different encodings tend to look different. Base 64 often ends with "=", for example. And sometimes, this can give you a clue about the encryption used. For example, PEM has distinctive header lines, which makes it easy to identify, and the default cipher for PEM in OpenSSL is triple DES, so if a file is PEM encoded, it's quite likely it's triple DES encrypted.
So given that, I should have included, in my original answer, the comment that encoding can also help guess the cipher type at times. And in your examples, it's odd that both encrypted strings start with "Z". But I don't know of an encoding that does that.
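One cheap sanity check for samples like these is to try the common encodings before assuming encryption at all; here is a minimal base64-decoding sketch you can point at a sample (b64Decode and b64Value are my own helpers; if the output comes out as readable text, no cipher was involved in the first place):

#include <cstdio>
#include <string>

static int b64Value(char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;                        // '=' padding or an invalid character
}

// Decode a base64 string into raw bytes (which may or may not be printable).
static std::string b64Decode(const char* s) {
    std::string out;
    int buffer = 0, bits = 0;
    for (; *s; ++s) {
        int v = b64Value(*s);
        if (v < 0) break;             // stop at padding or invalid input
        buffer = (buffer << 6) | v;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.push_back(static_cast<char>((buffer >> bits) & 0xff));
            buffer &= (1 << bits) - 1;   // keep only the leftover bits
        }
    }
    return out;
}

int main() {
    // The two samples from the question; non-printable output would show up as garbage.
    std::printf("%s\n", b64Decode("ZHVoYW1lbA").c_str());
    std::printf("%s\n", b64Decode("ZmV5").c_str());
    return 0;
}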
[see also related comments at https://stackoverflow.com/a/20217208/181772]
