I want to decrypt a xored content. if you want you can download the file in here
the file extension is .bin but content looks like hex to me and not binary, i'm not sure what kind of content it's.
the content look likes bellow:
2007 0b54 180a 541d 1318 1a00 541c 0654
0a0c 0606 065a 9854 0caa 2624 3000 0c04
260c 102c b435 fcaa b2ab acbf 32b2 aeb9
34b9 a0a8 a425 b6a9 809c bcb7 a8bb 2e34
eaa7 a835 80aa 8625 b8a7 aebc 2cbb 9e9d
329c bcaf 3493 a080 a625 aab9 329c bcaf
34b1 aab6 aab3 3431 b0a8 bebf b6ad 3634
b0af 849d 329c b225 faab acba b4af 3a93
32aa a0a9 a6b3 b80a 0a
and if it's hex why each 4 character is space delimiter-ed?
i think it can't be base64, because when i try to run following code i will get error
a#ubuntu:~/Downloads$ base64 -d enigma.bin>enigma.txt
base64: invalid input
second my goal is to find the key. so I tried the xortool
a#ubuntu:~/Downloads$ xortool enigma.bin
The most probable key lengths:
3: 15.1%
6: 19.3%
9: 13.6%
12: 15.3%
15: 9.4%
18: 10.9%
20: 4.4%
24: 5.3%
30: 3.4%
36: 3.4%
Key-length can be 3*n
Most possible char is needed to guess the key!
so i tried most used character like space(20) or E T A O I N S H R D L U but i had not luck. still my guess is i got the encoding incorrect
Related
This is a bit of a conundrum for me.
In the image below tesseract package in R totally ignores the second occurrence of 1 on the fourth line, no matter what I do (meaning, it reads it as 1 instead of 11). The image here is already preprocessed - upscaled via nn, cleaned, and binarized. It's the same thing even if I just lightly preprocess the source image.
Cropping the noise on the right does not help. Changing the tessedit_pageseg_mode options can only make things worse, but does not help with this particular problem.
Where the heck did the 1 go? I need to know for the sake of my sanity.
While waiting for R to compile tesseract package, I tested the command line version:
$ tesseract --version
tesseract 4.1.1
leptonica-1.79.0 #...etc
$ tesseract ocr_test.png test
obec TREBOHOSTICE 2021
okres Strakonice, Jihocesky kraj
Poéet osob starSich 15 let 274
Poéet osob v exekuci 11
Podil osob v exekuci 4,01 %
Celkovy pocet exekuci 106
Prumérny poéet exekuci na osobu 9.6
Z toho:
podil (pocet) osob s 1 — 9 exekucemi 45% (5)
podil (pocet) osob s 10 a vice exekucemi 55% (6)
PM. 2
CLI output looks good. Might be to do with the underlying versions of leptonica installed on your system
\\
Clean compile of R tesseract package plus Linux packages:
#Linux command line
$ sudo apt install libpoppler-cpp-dev libtesseract-dev libleptonica-dev
#In R
install.packages("tesseract") # version 5.1.0
library(tesseract)
ocr(file.choose())
Output of row 4 11 looks good:
obec TREBOHOSTICE 2021
okres Strakonice, Jihocesky kraj
Poéet osob starSich 15 let 274
Poéet osob v exekuci 11
Podil osob v exekuci 401% |
Celkovy pocet exekuci 106
Prumérny poéet exekuci na osobu 9.6
Z toho: on
podil (pocet) osob s 1 — 9 exekucemi 45% (5) ;
podil (pocet) osob s 10 a vice exekucemi 55% (6) >
The problem stems from using Czech engine engine = tesseract(language = "ces") for tesseract.
I have a data frame like this
scf0001 123 4567 - 4350
scf0001 4474 4878 + 376 *
scf0002 5375 10571 c 5006
scf0003 11370 16986 c 5536
scf0003 16256 17000 - 789 *
when I attempt to read it as a table in R, the error is line 2 did not have 5 elements, so it is not recognizing the star(*), I tried to read it as string with na.string="*" but it didn't work, so I think I have to define it with colclasses=.I tried something like this colClasses=[, c(6)="character"] but it doesn't work, Error: unexpected '['
I'm trying to understand the algorithm used for compression value = 1 with the Epson ESCP2 print command, "ESC-i". I have a hex dump of a raw print file which looks, in part, like the hexdump below (note little-endian format issues).
000006a 1b ( U 05 00 08 08 08 40 0b
units; (page1=08), (vt1=08), (hz1=08), (base2=40 0b=0xb40=2880)
...
00000c0 691b 0112 6802 0101 de00
esc i 12 01 02 68 01 01 00
print color1, compress1, bits1, bytes2, lines2, data...
color1 = 0x12 = 18 = light cyan
compress1 = 1
bits1 (bits/pixel) = 0x2 = 2
bytes2 is ??? = 0x0168 = 360
lines2 is # lines to print = 0x0001 = 1
00000c9 de 1200 9a05 6959
00000d0 5999 a565 5999 6566 5996 9695 655a fd56
00000e0 1f66 9a59 6656 6566 5996 9665 9659 6666
00000f0 6559 9999 9565 6695 9965 a665 6666 6969
0000100 5566 95fe 9919 6596 5996 5696 9666 665a
0000110 5956 6669 0456 1044 0041 4110 0040 8140
0000120 9000 0d00
1b0c 1b40 5228 0008 5200 4d45
FF esc # esc ( R 00 REMOTE1
The difficulty I'm having is how to decode the data, starting at 00000c9, given 2 bits/pixel and the count of 360. It's my understanding this is some form of tiff or rle encoding, but I can't decode it in a way that makes sense. The output was produced by gutenprint plugin for GIMP.
Any help would be much appreciated.
The byte count is not a count of the bytes in the input stream; it is a count of the bytes in the input stream as expanded to an uncompressed form. So when expanded, there should be a total of 360 bytes. The input bytes are interpreted as either a count of bytes to follow, if positive, in which case the count is the byte value +1; and if negative the count is a count of the number of times the immediately following byte should be expanded, again, +1. The 0D at the end is a terminating carriage return for the line as a whole.
The input stream is only considered as a string of whole bytes, despite the fact that the individual pixel/nozzle controls are only 2 bits each. So it is not really possible to use a repeat count for something like a 3-nozzle sequence; a repeat count must always specify a full byte 4-nozzle combination.
The above example then specifies:
0xde00 => repeat 0x00 35 times
0x12 => use the next 19 bytes as is
0xfd66 => repeat 0x66 4 times
0x1f => use the next 32 bytes as is
etc.
I have a dataset that looks like this:
df_dummy = data.frame(
Company=c("0001","0002","0003","0004","0005"),
Measure=c("A","B","C","D","E"),
Num=c(10,10,10,10,10),
Den=c(20,20,20,20,20),
Rate=c(50.0,50.0,50.0,50.0,50.0)
)
df_dummy$Company <- as.character(df_dummy$Company)
df_dummy$Measure <- as.character(df_dummy$Measure)
I am using this to export to an .xpt file
write.xport(df_dummy, file = "data/tmp.xpt")
lookup.xport("data/tmp.xpt")
In SAS, I use this code to import:
libname sasfile 'PATH\data';
libname xptfile xport 'PATH\data\tmp.xpt' access=readonly;
proc copy inlib=xptfile outlib=sasfile;
run;
The table looks fine, but the rate doesn't show the decimal point.
In my actual dataset, there are a lot more rows but it's the same format essentially and if I run a lookup.xport I get this:
Variables in data set `MEASURES':
dataset name type format flength fdigits iformat iflength ifdigits label nobs
MEASURES ID character 0 0 0 0 29064
MEASURES MEASURE character 0 0 0 0 29064
MEASURES NUM numeric 0 0 0 0 29064
MEASURES DEN numeric 0 0 0 0 29064
MEASURES RATE numeric 0 0 0 0 29064
However, if I use the same SAS code to import this, I get something that looks completely off and I can't figure out what's causing it.
I cannot replicate your issue using R (3.4.1) and SAS (9.4 TS1M4) on Mac OS X with both being 64 bit versions. The 32/64 bit versions can cause issues sometimes.
I used R Studio and SAS UE, both freely available for education usage.
Full R code:
install.packages("SASxport")
library("SASxport")
df_dummy = data.frame(
Company=c("0001","0002","0003","0004","0005"),
Measure=c("A","B","C","D","E"),
Num=c(10,10,10,10,10),
Den=c(20,20,20,20,20),
Rate=c(50.0,50.0,50.0,50.0,50.0)
)
df_dummy$Company <- as.character(df_dummy$Company)
df_dummy$Measure <- as.character(df_dummy$Measure)
write.xport(df_dummy, file = "tmp.xpt")
Full SAS Code:
libname sasfile '/folders/myfolders/';
libname xptfile xport '/folders/myfolders/tmp.xpt' access=readonly;
proc copy inlib=xptfile outlib=sasfile;
run;
Your example works. Even with older version or R. Make sure your transport file had not been corrupted by transferring between machines. A transport file is binary data with fixed length 80 byte records, but much of data looks like ASCII codes.
SAS transport files follow the SAS V5 rules for names. Make sure that your member name and variable names are valid SAS names and are not longer than 8 characters. Character variables cannot be longer than 200 characters.
You can quickly look at the file using a simple data step. Especially for your small example. So if you see that the length is not exactly a multiple of 80 or you see that the header records do not start at the beginning of an 80 byte record then something has corrupted the file.
56 data _null_;
57 infile '/test/tmp.xpt' lrecl=80 recfm=f ;
58 input;
59 list;
60 run;
NOTE: The infile '/test/tmp.xpt' is:
Filename=/test/tmp.xpt,
Owner Name=xxxxx,Group Name=xxxxx,
Access Permission=-rw-r--r--,
Last Modified=29Sep2017:09:16:16,
File Size (bytes)=1680
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
1 HEADER RECORD*******LIBRARY HEADER RECORD!!!!!!!000000000000000000000000000000
2 CHAR SAS SAS SASLIB 7.00 R 3.0.2. 29SEP17:09:16:16
ZONE 54522222545222225454442232332222523232302222222222222222222222223354533333333333
NUMR 3130000031300000313C92007E000000203E0E200000000000000000000000002935017A09A16A16
3 29SEP17:09:16:16
4 HEADER RECORD*******MEMBER HEADER RECORD!!!!!!!000000000000000001600000000140
5 HEADER RECORD*******DSCRPTR HEADER RECORD!!!!!!!000000000000000000000000000000
6 CHAR SAS DF_DUMMYSASDATA 7.00 R 3.0.2. 29SEP17:09:16:16
ZONE 54522222445454455454454232332222523232302222222222222222222222223354533333333333
NUMR 3130000046F45DD9313414107E000000203E0E200000000000000000000000002935017A09A16A16
7 29SEP17:09:16:16
8 HEADER RECORD*******NAMESTR HEADER RECORD!!!!!!!000000000500000000000000000000
9 CHAR ........COMPANY ........
ZONE 00000000444544522222222222222222222222222222222222222222222222220000000022222222
NUMR 020008013FD01E900000000000000000000000000000000000000000000000000000000000000000
10 CHAR ....................................................................MEASURE
ZONE 00000000000000000000000000000000000000000000000000000000000000000000444555422222
NUMR 00000000000000000000000000000000000000000000000000000000000002000802D51352500000
11 CHAR ........ ....................
ZONE 22222222222222222222222222222222222222222222000000002222222200000000000000000000
NUMR 00000000000000000000000000000000000000000000000000000000000000000008000000000000
12 CHAR ................................................NUM
ZONE 00000000000000000000000000000000000000000000000045422222222222222222222222222222
NUMR 000000000000000000000000000000000000000001000803E5D00000000000000000000000000000
13 CHAR ........ ........................................
ZONE 22222222222222222222222200000000222222220000000100000000000000000000000000000000
NUMR 00000000000000000000000000000000000000000000000000000000000000000000000000000000
14 CHAR ............................DEN
ZONE 00000000000000000000000000004442222222222222222222222222222222222222222222222222
NUMR 000000000000000000000100080445E0000000000000000000000000000000000000000000000000
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
15 CHAR ........ ............................................................
ZONE 22220000000022222222000000010000000000000000000000000000000000000000000000000000
NUMR 00000000000000000000000000080000000000000000000000000000000000000000000000000000
16 CHAR ........RATE ........
ZONE 00000000545422222222222222222222222222222222222222222222222222220000000022222222
NUMR 01000805214500000000000000000000000000000000000000000000000000000000000000000000
17 CHAR ....... ....................................................
ZONE 00000002000000000000000000000000000000000000000000000000000022222222222222222222
NUMR 00000000000000000000000000000000000000000000000000000000000000000000000000000000
18 HEADER RECORD*******OBS HEADER RECORD!!!!!!!000000000000000000000000000000
19 CHAR 0001 A A ......B.......B2......0002 B A ......B.......B2......
ZONE 33332222422222224A000000410000004300000033332222422222224A0000004100000043000000
NUMR 00010000100000001000000024000000220000000002000020000000100000002400000022000000
20 CHAR 0003 C A ......B.......B2......0004 D A ......B.......B2......
ZONE 33332222422222224A000000410000004300000033332222422222224A0000004100000043000000
NUMR 00030000300000001000000024000000220000000004000040000000100000002400000022000000
21 CHAR 0005 E A ......B.......B2......
ZONE 33332222422222224A00000041000000430000002222222222222222222222222222222222222222
NUMR 00050000500000001000000024000000220000000000000000000000000000000000000000000000
NOTE: 21 records were read from the infile '/test/tmp.xpt'.
I'm trying to read into R a test file encoded in Code page 437. Here is the file, and here is its hex-dump:
00000000: 0b0c 0e0f 1011 1213 1415 1617 1819 1a1b ................
00000010: 1c1d 1e1f 2021 2223 2425 2627 2829 2a2b .... !"#$%&'()*+
00000020: 2c2d 2e2f 3031 3233 3435 3637 3839 3a3b ,-./0123456789:;
00000030: 3c3d 3e3f 4041 4243 4445 4647 4849 4a4b <=>?#ABCDEFGHIJK
00000040: 4c4d 4e4f 5051 5253 5455 5657 5859 5a5b LMNOPQRSTUVWXYZ[
00000050: 5c5d 5e5f 6061 6263 6465 6667 6869 6a6b \]^_`abcdefghijk
00000060: 6c6d 6e6f 7071 7273 7475 7677 7879 7a7b lmnopqrstuvwxyz{
00000070: 7c7d 7e7f ffad 9b9c 9da6 aeaa f8f1 fde6 |}~.............
00000080: faa7 afac aba8 8e8f 9280 90a5 999a e185 ................
00000090: a083 8486 9187 8a82 8889 8da1 8c8b a495 ................
000000a0: a293 94f6 97a3 9681 989f e2e9 e4e8 eae0 ................
000000b0: ebee e3e5 e7ed fc9e f9fb ecef f7f0 f3f2 ................
000000c0: a9f4 f5c4 b3da bfc0 d9c3 b4c2 c1c5 cdba ................
000000d0: d5d6 c9b8 b7bb d4d3 c8be bdbc c6c7 ccb5 ................
000000e0: b6b9 d1d2 cbcf d0ca d8d7 cedf dcdb ddde ................
000000f0: b0b1 b2fe 0a .....
The file contains 245 characters (including the final newline), but R only reads 242 of them:
> test_text <- readLines(file('437__characters.txt', encoding='437'))
Warning message:
In readLines(file("437__characters.txt", :
incomplete final line found on '437__characters.txt'
> test_text
[1] "\v\f\016\017\020\021\022\023\024\025\026\027\030\031\032\033\034\035\036\037 !\"#$%&'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\177 ¡¢£¥ª«¬°±²µ·º»¼½¿ÄÅÆÇÉÑÖÜßàáâäåæçèéêëìíîïñòóôö÷ùúûüÿƒΓΘΣΦΩαδεπστφⁿ₧∙√∞∩≈≡≤≥⌐⌠⌡─│┌┐└┘├┤┬┴┼═║╒╓╔╕╖╗╘╙╚╛╜╝╞╟╠╡╢╣╤╥╦╧╨╩╪╫╬▀▄█▌▐░▒"
> nchar(test_text)
[1] 242
You'll note that R doesn't read the final characters "▓■\n".
My best guess is that this is something to do with how R determines the length of text files, because of the following:
Even though the file is terminated with a newline (0x0a), R gives an 'incomplete final line found' warning
Adding seven or more characters to the end of the file makes it read correctly
Similarly, the file is read correctly if you remove three characters from anywhere in the file
The same issue seems to occur with reading files encoded in other DOS code pages
This question might be related: R: read.table stops when meeting specific utf-16 characters.
It appears to be something wrong with readLines(), but could very well be an issue with the file connection for text, with something amiss happening in the encoding = part. Anyway, here's a workaround: Load the file as binary, and then convert. And stay away from bad voodoo 1980s code pages.
Using readLines()
This does not capture the last \n since that delimits the unit of text input by `readLines().
test_text2 <- readLines(file("~/Downloads/437__characters.txt", raw = TRUE))
test_text3 <- stringi::stri_conv(test_text2, "IBM437", "UTF-8")
stringi::stri_length(test_text3)
## [1] 244
test_text3
## [1] "\v\f\016\017\020\021\022\023\024\025\026\027\030\031\034\033\177\035\036\037 !\"#$%&'()*+,-./
## 0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\032 ¡¢£¥ª«¬°±²μ·º
## »¼½¿ÄÅÆÇÉÑÖÜßàáâäåæçèéêëìíîïñòóôö÷ùúûüÿƒΓΘΣΦΩαδεπστφⁿ₧∙√∞∩≈≡≤≥⌐⌠⌡─│┌┐└┘├┤┬┴┼═║╒╓╔╕╖╗╘╙╚╛╜╝╞╟╠╡╢╣╤╥
## ╦╧╨╩╪╫╬▀▄█▌▐░▒▓■"
Using readBin()
Captures everything including the \n.
test_text_bin <- readBin(file("~/Downloads/437__characters.txt", "rb"),
n = 245, what = "raw")
test_text_bin_UTF8 <- stringi::stri_conv(test_text_bin, "IBM437", "UTF-8")
stringi::stri_length(test_text_bin_UTF8)
## [1] 245
test_text_bin_UTF8
## [1] "\v\f\016\017\020\021\022\023\024\025\026\027\030\031\034\033\177\035\036\037 !\"#$%&'()*+,-./
## 0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\032 ¡¢£¥ª«¬°±²μ·º
## »¼½¿ÄÅÆÇÉÑÖÜßàáâäåæçèéêëìíîïñòóôö÷ùúûüÿƒΓΘΣΦΩαδεπστφⁿ₧∙√∞∩≈≡≤≥⌐⌠⌡─│┌┐└┘├┤┬┴┼═║╒╓╔╕╖╗╘╙╚╛╜╝╞╟╠╡╢╣╤╥
## ╦╧╨╩╪╫╬▀▄█▌▐░▒▓■\n"