Cannot get Russian Subject from Outlook using RDCOMClient - r

I am using Outlook under Windows 10 as my email client and am trying to use the RDCOMClient library to process some emails. Some of the emails are in Russian, and I am having trouble getting the Russian text out in a usable format. Right now, I am just focusing on the subject lines. When I extract a subject and print it, I get only question marks, apart from a few Latin characters. I have tried setting the encoding and using iconv, with no success, but iconv did provide a useful clue. For the reproducible example below, showing the raw bytes gives:
iconv(SUBJECT, toRaw=T)
[1] 53 74 61 63 6b 4f 76 65 72 66 6c 6f 77 54 65 73 74 4d 65 73 73 61 67 65 3a
[26] 20 3f 3f 3f 3f 3f 3f 3f 3f 20 3f 3f 3f 3f 3f 3f 3f 3f 3f
All of the 3f's at the end? That is the code for question mark. RDCOMClient is
actually returning the ??? from Outlook. It is not some encoding issue inside R.
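To illustrate why no later call to Encoding() or iconv can help once the bytes are literally 3f, here is a minimal Python sketch (Python is used only for a language-neutral look at the bytes). It assumes, purely for illustration, that the text was pushed through a code page with no Cyrillic letters (cp1252 here); the question does not confirm where the conversion actually happens.

```python
subject = "StackOverflowTestMessage: Тестовое сообщение"

# What the bytes look like when the Cyrillic text survives as UTF-8.
utf8_bytes = subject.encode("utf-8")

# What you get when the text is forced through a code page without Cyrillic.
# cp1252 is an illustrative assumption, not something taken from the question.
lossy_bytes = subject.encode("cp1252", errors="replace")

# Skip the 25 ASCII bytes of "StackOverflowTestMessage:" and show the rest.
print(utf8_bytes[25:].hex(" "))
print(lossy_bytes[25:].hex(" "))    # 20 3f 3f 3f 3f 3f 3f 3f 3f 20 3f 3f 3f 3f 3f 3f 3f 3f 3f
print(lossy_bytes.decode("cp1252"))  # StackOverflowTestMessage: ???????? ?????????
```

Every Cyrillic letter has collapsed to the same byte, so the original characters cannot be recovered from the lossy string; the fix has to happen before that conversion.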
I have looked at many RDCOMClient posts on SO, but do not see anything
that deals with this problem.
Is the RDCOMClient<->Outlook connection just broken? Or is there some way
around this?
Attempt at Reproducible Example
Since we are talking about accessing email, I don't see how to make a
really easy reproducible example, but here is a reproducible way to test this.
Of course, you have to have Outlook on Windows for this to make sense.
Send yourself an email with the subject line:
StackOverflowTestMessage: Тестовое сообщение
R code
We need to find the email first. Most of the code does that.
Then we inspect the subject.
## Load the COM client and connect to Outlook
library(RDCOMClient)
OutApp <- COMCreate("Outlook.Application")
outlookNameSpace = OutApp$GetNameSpace("MAPI")
## Find the Inbox
INBOX = outlookNameSpace$GetDefaultFolder(6)
INBOX$Name() ## Confirm
emails <- INBOX$Items
## Find the relevant email
NumEmail = emails()$Count()
MessageNumber = 0
for(i in NumEmail:1) {
    SUBJ = emails(i)$Subject()
    if(grepl("StackOverflowTestMessage", SUBJ)) {
        MessageNumber = i
        break
    }
}
## Now try to get the subject line
SUBJECT = emails(MessageNumber)$Subject()
Encoding(SUBJECT) = 'UTF-8'
SUBJECT
[1] "StackOverflowTestMessage: ???????? ?????????"
iconv(SUBJECT, toRaw=T)
[[1]]
[1] 53 74 61 63 6b 4f 76 65 72 66 6c 6f 77 54 65 73 74 4d 65 73 73 61 67 65 3a
[26] 20 3f 3f 3f 3f 3f 3f 3f 3f 20 3f 3f 3f 3f 3f 3f 3f 3f 3f

Related

Don't understand Minecraft data packets

I have been trying for hours to understand what Minecraft packets mean, but they don't seem to adhere to the protocol. I've been using Wireshark to sniff the packets, and according to the protocol they should start with 0x something, but they never do. I'm really confused right now, and any help would be greatly appreciated.
Here is some of the data that came with packets:
57 9e e9 f7 7f 3c 0b c7 b2 f0 f2 1d 8e 42 6e
9c 14 57 71 74 6b 83
54 ad d7 3a 51 60 55
Any help is really appreciated.
0x is just a prefix that tells the compiler or interpreter (in almost all languages) that the number that follows is hexadecimal; 9e would be written as 0x9e in Java. Wireshark shows the raw byte values without that prefix. There are other number prefixes too, e.g. 0b means that the following number is binary (0s and 1s).
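As a quick illustration of the point about prefixes, the following Python sketch parses the first dump line from the question and compares one byte against the same value written as a hex literal, a parsed hex string, and a binary literal.

```python
# First line of packet data quoted in the question.
dump = "57 9e e9 f7 7f 3c 0b c7 b2 f0 f2 1d 8e 42 6e"
payload = bytes.fromhex(dump)

# The second byte is 9e; with the 0x prefix it becomes a numeric literal.
print(payload[1] == 0x9e)               # True
print(int("9e", 16), 0x9e, 0b10011110)  # 158 158 158
print([hex(b) for b in payload[:3]])    # ['0x57', '0x9e', '0xe9']
```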

Why is the hex value of a period in a DNS request not 0x2E, and why does it change?

Looking at a DNS request for www.google.com in Wireshark, the hex for the name is 03 77 77 77 06 67 6f 6f 67 6c 65 03 63 6f 6d 00
I'm a little confused about why the first period is 03 (and why it's there), the second is 06, and the last is 03.
The DNS protocol layer is defined in RFC 1035. To cite from "3.1. Name space definitions":
Domain names in messages are expressed in terms of a sequence of labels.
Each label is represented as a one octet length field followed by that
number of octets. Since every domain name ends with the null label of
the root, a domain name is terminated by a length byte of zero.
Thus www.google.com is encoded in the DNS packet as:
03 77 77 77             length 3, "www"
06 67 6f 6f 67 6c 65    length 6, "google"
03 63 6f 6d             length 3, "com"
00                      length 0 (null root label, end of name)
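The label encoding quoted from RFC 1035 can be sketched in a few lines of Python. The function names are hypothetical, and the decoder deliberately ignores the compression pointers that real DNS messages may contain.

```python
def encode_dns_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    encoded = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 1 <= len(raw) <= 63:  # RFC 1035 limits a label to 63 octets
            raise ValueError(f"label length out of range: {label!r}")
        encoded.append(len(raw))     # one-octet length field
        encoded += raw               # followed by that many octets
    encoded.append(0)                # null root label terminates the name
    return bytes(encoded)


def decode_dns_name(data: bytes) -> str:
    """Decode an uncompressed name; compression pointers are not handled."""
    labels = []
    pos = 0
    while data[pos] != 0:
        length = data[pos]
        labels.append(data[pos + 1:pos + 1 + length].decode("ascii"))
        pos += 1 + length
    return ".".join(labels)


wire = encode_dns_name("www.google.com")
print(wire.hex(" "))          # 03 77 77 77 06 67 6f 6f 67 6c 65 03 63 6f 6d 00
print(decode_dns_name(wire))  # www.google.com
```

So the periods are never sent at all: each one is implied by the length byte that starts the next label.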

Recognizing IR pattern/possible CRC in code?

Short brief: I have a small toy that uses infrared signals from a remote to change colors of some lights, as well as their patterns. I've managed to capture the pattern timings, wrote a script in python to convert it to 1's and 0's, and then I have an android app that will convert the 1's and 0's to the IR timings for the blaster. I've tested all of these patterns and they make the toy respond as expected from my phone.
Rather than capture every single pattern manually, I'd like to work out the structure inside the codes, since there seems to be some sort of pattern. It looks like there's a check value at the end of each, but that's the part I can't crack. I've tried simple checksum-like schemes, as well as RevEng.
In all honesty, this is a bit over my head - my experience is mostly web-based. I don't even know for sure if there's a CRC code, but it makes sense looking at the data.
Here are a few samples - I've provided them as binary as well as translated to hex (fewer characters to stare at).
Pattern titles are Color / Animation / Transition / Speed
Light Blue / Double Flash / Fade / Normal
1001001100111011011010011100101001010010111110100011011001101000001000110100101011010111101111011110111101001011110110101011101100101001
93 3b 69 ca 52 fa 36 68 23 4a d7 bd ef 4b da bb 29
Light Blue / Solid / None / Normal
1110001100110011011011100111101111111010100111001010010100101111001010110001000011110110101001111110101110100011100001101000111110111101111011011001
e3 33 6e 7b fa 9c a5 2f 2b 10 f6 a7 eb a3 86 8f bd ed 90
Light Blue / Pulse / Straight / Slow
10110011001110110110111100101010010111101111011010101001111011111010001100000110110110011010011100101001010010111010101
b3 3b 6f 2a 5e f6 a9 ef a3 06 d9 a7 29 4b aa
Light Blue / Pulse / Straight / Normal
10110011001110110110111100101011001111101111011010101001111011111010001100000110110110011010011100101001010010111101011
b3 3b 6f 2b 3e f6 a9 ef a3 06 d9 a7 29 4b d6
Light Blue / Pulse / Fade / Fast
11100011001100110110111001111010001000101001110010100101001011111010001101100110111110000011011010001011111010111011111010011111101111110110110000001
e3 33 6e 7a 22 9c a5 2f a3 66 f8 36 8b eb be 9f bf 6c 08
Light Blue / Pulse / Fade / Normal
1110001100110011011011100111101010000000100111001010010100101111101000110110011011111000001101101000101111101011101111101010111110111111011011
e3 33 6e 7a 80 9c a5 2f a3 66 f8 36 8b eb be af bf 6c
Light Blue / Pulse / Fade / Slow
1110001100110011011011100111101111000110100111001010010100101111101000110110011011111000001101101000101111101011101111101101011110111111011011101011
e3 33 6e 7b c6 9c a5 2f a3 66 f8 36 8b eb be d7 bf 6e b0
You can see that the Pulse / Fade patterns all start the same, as well as the Pulse / Straight - which makes me believe even more that the entire IR sequence is simply a list of parameters, not necessarily pre-defined patterns.
tl;dr: Are there CRC patterns at the ends of these, and how can I figure them out?
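For anyone who wants to reproduce the hex lines from the bit strings, here is a minimal Python sketch of that translation. The function name is hypothetical, and padding a trailing partial byte on the right with zeros is an assumption that appears to match the samples above (for example, a final 1010101 is shown as aa).

```python
def bits_to_hex(bits: str) -> str:
    """Convert a string of 0s and 1s to space-separated hex bytes.

    A trailing partial byte is padded on the right with zeros (assumption).
    """
    if set(bits) - {"0", "1"}:
        raise ValueError("expected only the characters 0 and 1")
    padded = bits + "0" * (-len(bits) % 8)
    return " ".join(f"{int(padded[i:i + 8], 2):02x}"
                    for i in range(0, len(padded), 8))


# First 24 bits of the "Light Blue / Double Flash / Fade / Normal" sample.
print(bits_to_hex("100100110011101101101001"))  # 93 3b 69
# A 7-bit tail is padded with one zero bit.
print(bits_to_hex("1010101"))                   # aa
```

Because the samples have different bit lengths, the padding choice affects the last hex byte, which matters when looking for a check value at the end.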

nginx returning netstring with wrong length?

I installed nginx (nginx version: nginx/1.7.9) via macports on my mac running the latest OSX.
I configured a URI to use SCGI:
location /server {
    include /Users/ruipacheco/Projects/Assorted/nginx/conf/scgi_params;
    scgi_pass unix:/var/tmp/rpc.sock;
    #scgi_pass 127.0.0.1:9000;
}
And when I do a GET request on 127.0.0.1/server, I see the following on my SCGI server:
633:CONTENT_LENGTH0REQUEST_METHODGETREQUEST_URI/serverQUERY_STRINGCONTENT_TYPEDOCUMENT_URI/serverDOCUMENT_ROOT/opt/local/htmlSCGI1SERVER_PROTOCOLHTTP/1.1REMOTE_ADDR127.0.0.1REMOTE_PORT62088SERVER_PORT80SERVER_NAMElocalhostHTTP_HOST127.0.0.1HTTP_CONNECTIONkeep-aliveHTTP_CACHE_CONTROLmax-age=0HTTP_ACCEPTtext/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8HTTP_USER_AGENTMozilla/5.0
(Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like
Gecko) Chrome/40.0.2214.115
Safari/537.36HTTP_DNT1HTTP_ACCEPT_ENCODINGgzip, deflate,
sdchHTTP_ACCEPT_LANGUAGEen-US,en;q=0.8,End of file
The problem is that the length of the netstring, 633, does not match what I count in the output. If I understand the netstrings spec correctly, 633 should be the number of bytes between the first : and the last , in the string:
Any string of 8-bit bytes may be encoded as [len]":"[string]",". Here [string] is the string and [len] is a nonempty sequence of ASCII digits giving the length of [string] in decimal. The ASCII digits are <30> for 0, <31> for 1, and so on up through <39> for 9. Extra zeros at the front of [len] are prohibited: [len] begins with <30> exactly when [string] is empty.
For example, the string hello world! is encoded as 31 32 3a 68 65 6c 6c 6f 20 77 6f 72 6c 64 21 2c, i.e., 12:hello world!,.
So, I'm getting the wrong length. How can this be explained?
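The encoding rule quoted from the netstrings spec can be checked with a short Python sketch. The function names are hypothetical; the length prefix counts only the bytes of the string itself, not the colon or the trailing comma.

```python
def encode_netstring(payload: bytes) -> bytes:
    """Encode payload as [len]":"[string]","; len counts only the payload."""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def decode_netstring(data: bytes) -> bytes:
    """Decode a single netstring and return its payload."""
    length_text, sep, rest = data.partition(b":")
    if not sep or not length_text.isdigit():
        raise ValueError("missing or malformed length prefix")
    length = int(length_text)
    if rest[length:length + 1] != b",":
        raise ValueError("payload length does not match the prefix")
    return rest[:length]


encoded = encode_netstring(b"hello world!")
print(encoded)            # b'12:hello world!,'
print(encoded.hex(" "))   # 31 32 3a 68 65 6c 6c 6f 20 77 6f 72 6c 64 21 2c
print(decode_netstring(encoded))  # b'hello world!'
```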
As far as I can tell, your example response has the correct length.
According to the example here:
http://en.wikipedia.org/wiki/Simple_Common_Gateway_Interface
Field values are preceded and followed by a <00> byte (the ASCII NUL character, hex code 00), e.g.:
REQUEST_METHOD <00>GET<00>
Once I added the missing separator bytes to your response snippet, it quickly got back to 633 bytes, as advertised.
I suppose that somewhere in the process of passing that response to us here, some piece of software stripped the <00> bytes, which would be totally normal behaviour.
Anyway, the answer seems to be: your nginx is returning the correct length, and the <00> bytes are being stripped somewhere before you look at the response.
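A small Python sketch shows how much the <00> bytes contribute to the count. The helper name and the shortened header set are hypothetical; this is not the full request from the question.

```python
def build_scgi_headers(headers: dict[str, str]) -> bytes:
    """Concatenate names and values, each followed by a NUL byte, as SCGI does."""
    parts = []
    for name, value in headers.items():
        parts.append(name.encode("latin-1") + b"\x00"
                     + value.encode("latin-1") + b"\x00")
    return b"".join(parts)


# Hypothetical, shortened header set for illustration only.
headers = {"CONTENT_LENGTH": "0", "SCGI": "1", "REQUEST_METHOD": "GET"}
block = build_scgi_headers(headers)
visible = block.replace(b"\x00", b"")

print(len(block))                 # 43, the length the netstring prefix would carry
print(len(visible))               # 37, what you count once the NULs are gone
print(len(block) - len(visible))  # 6, two NUL bytes per header
```

With roughly twenty headers in a real request, the invisible NUL bytes easily account for a difference of several dozen bytes.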
Well,
The hexadecimal <31 32 3a 68 65 6c 6c 6f 20 77 6f 72 6c 64 21>
in ASCII is "12:hello world!" (no quotes) and the length is 12 (hello world!)
And this one, <31 32 3a 68 65 6c 6c 6f 20 77 6f 72 6c 64 21 2c>, in the example is wrong (at least it didn't match what nginx does), since the internal length is 13 and the length specified in hex is 12:
The ASCII "12:hello world!," should be "13:hello world!," and in hex <31 33 3a 68 65 6c 6c 6f 20 77 6f 72 6c 64 21 2c>
This line is the problem:
For example, the string "hello world!" is encoded as <31 32 3a 68 65
6c 6c 6f 20 77 6f 72 6c 64 21 2c>, i.e., "12:hello world!,".
OK) 12:hello world! ---> <31 *32* 3a 68 65 6c 6c 6f 20 77 6f 72 6c 64 21>
KO) 12:hello world!, ---> <31 *32* 3a 68 65 6c 6c 6f 20 77 6f 72 6c 64 21 2c>
OK) 13:hello world!, ---> <31 *33* 3a 68 65 6c 6c 6f 20 77 6f 72 6c 64 21 2c>
The hex between the asterisks is the second digit of the length.
So your understanding of this is OK; the example is bad.

Errors while opening dicom files in R

I am trying to open DICOM files in R using the following code:
library(oro.dicom)
dcmobject <- readDICOMFile(filename)
Some files open properly and I can display them. However, some files give errors of different types:
First error: For some files, I get the error:
Error in file(con, "rb") : cannot open the connection
Second error: For others, I get the following error, for example with this DICOM file: http://www.barre.nom.fr/medical/samples/files/OT-MONO2-8-hip.gz
Error in readDICOMFile(filename) : DICM != DICM
Third error: This file gives the following error: http://www.barre.nom.fr/medical/samples/files/CT-MONO2-16-chest.gz
Error in parsePixelData(fraw[(132 + dcm$data.seek + 1):fsize], hdr, endian, :
Number of bytes in PixelData not specified
Fourth error: One DICOM file gives the following error:
Error in rawToChar(fraw[129:132]) : embedded nul in string: '\0\0\b'
How can I get rid of these errors and display these images in R?
EDIT:
This sample file gives the error 'embedded nul in string...':
http://www.barre.nom.fr/medical/samples/files/CT-MONO2-12-lomb-an2.gz
> jj = readDICOMFile( "CT-MONO2-12-lomb-an2.dcm" )
Error in rawToChar(fraw[129:132]) : embedded nul in string: '3\0\020'
There are four different errors highlighted in this ticket:
1.
Error in file(con, "rb") : cannot open the connection
This is not a problem with oro.dicom; the file path and/or name has simply been mis-specified.
2.
Error in readDICOMFile(filename) : DICM != DICM
The file is not a valid DICOM file. That is, section 7.1 in Part 10 of the DICOM Standard (available at http://dicom.nema.org) specifies that there should be (a) the File Preamble of length 128 bytes and (b) the four-byte DICOM Prefix "DICM" at the beginning of a DICOM file. The file OT-MONO2-8-hip does not follow this standard. One can investigate this problem further using the debug=TRUE input parameter:
> dcm <- readDICOMFile("OT-MONO2-8-hip.dcm", debug=TRUE)
# First 128 bytes of DICOM header =
[1] 08 00 00 00 04 00 00 00 b0 00 00 00 08 00 08 00 2e 00 00 00 4f 52 49 47 49 4e 41 4c 5c 53 45
[32] 43 4f 4e 44 41 52 59 5c 4f 54 48 45 52 5c 41 52 43 5c 44 49 43 4f 4d 5c 56 41 4c 49 44 41 54
[63] 49 4f 4e 20 08 00 16 00 1a 00 00 00 31 2e 32 2e 38 34 30 2e 31 30 30 30 38 2e 35 2e 31 2e 34
[94] 2e 31 2e 31 2e 37 00 08 00 18 00 1a 00 00 00 31 2e 33 2e 34 36 2e 36 37 30 35 38 39 2e 31 37
[125] 2e 31 2e 37
Error in readDICOMFile("OT-MONO2-8-hip.dcm", debug = TRUE) : DICM != DICM
It is apparent that the first 128 bytes contain information. One can now use the parameters skipFirst128=FALSE and DICM=FALSE to start reading information from the beginning of the file
dcm <- readDICOMFile("OT-MONO2-8-hip.dcm", skipFirst128=FALSE, DICM=FALSE)
image(t(dcm$img), col=grey(0:64/64), axes=FALSE, xlab="", ylab="")
3.
Error in parsePixelData(fraw[(132 + dcm$data.seek + 1):fsize], hdr, endian, :
Number of bytes in PixelData not specified
The file CT-MONO2-16-chest.dcm is encoded using JPEG compression. The R package oro.dicom does not support compression.
4.
Error in rawToChar(fraw[129:132]) : embedded nul in string: '\0\0\b'
I have to speculate, since the file is not available for direct interrogation. This problem is related to the check for the "DICM" characters required by the DICOM standard. If the check fails, one can assume the file is not a valid DICOM file. I will look into making this error more informative in future versions of oro.dicom.
EDIT: Thank you for providing a link to the appropriate file. The file is in "ACR-NEMA 2" format. The R package oro.dicom has not been designed to read such a file. I have modified the code to improve the error tracking.
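The preamble-and-prefix check behind the DICM != DICM error can be sketched independently of oro.dicom. The following Python snippet is a minimal illustration with a hypothetical function name and synthetic byte strings, not real image files; it is not how oro.dicom itself is implemented.

```python
def has_dicom_prefix(data: bytes) -> bool:
    """Return True if the four bytes after the 128-byte preamble spell 'DICM'."""
    return len(data) >= 132 and data[128:132] == b"DICM"


# Synthetic Part 10 style file: 128-byte preamble, then the "DICM" prefix.
part10_like = bytes(128) + b"DICM" + b"\x08\x00"

# Synthetic file without a preamble: data elements start at byte 0, as in the
# debug output for OT-MONO2-8-hip (first 12 bytes copied from that output).
headerless = bytes.fromhex("08 00 00 00 04 00 00 00 b0 00 00 00") + bytes(200)

print(has_dicom_prefix(part10_like))  # True
print(has_dicom_prefix(headerless))   # False
```

A file that fails this check may still contain readable data elements from byte 0, which is the case the skipFirst128=FALSE and DICM=FALSE parameters address.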
