I'm using a MITM technique to study some apps' APIs, but I'm not able to restore the original data from this multipart request with a gzipped part.
Does anyone know how I can recover the content of this payload?
POST /logging_client_events HTTP/1.1
Accept-Language: pt-BR, en-US
Content-Type: multipart/form-data; boundary=3TtLStKljJgtMAosyN-hY6JtpuUqhC
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Length: 1129
--3TtLStKljJgtMAosyN-hY6JtpuUqhC
Content-Disposition: form-data; name="access_token"
567067343352427|f249176f09e26ce54212b472dbab8fa8
--3TtLStKljJgtMAosyN-hY6JtpuUqhC
Content-Disposition: form-data; name="format"
json
--3TtLStKljJgtMAosyN-hY6JtpuUqhC
Content-Disposition: form-data; name="cmsg"; filename="ae3ada0b-866d-4b0c-b0af-e0c66df71808_5_regular.batch.gz"
Content-Type: application/octet-stream
Content-Transfer-Encoding: binary
eRÛ®0üòG6¾GÊUhm/9Ö!#0Ð¥ù÷Ú¤Q¢VH\fvf׳ܪ×ê(÷cCu¬¤ÒTi.8µ¨uõ V2Ç(=é«m¦Ü»ÐôË¥ m¸FCç88A¥8ÊÖÄñÄ+¡Zë°6³¤Kì¾w¥ôSJ#DíqÜK"æ¡uTfeÂâÐ?4PGò$G=qZÔg ÕÌP5ËVLóÿ¾Ç.Mx^:2Ö
çfþ1¾ØÏ
®ùþ7ÖPf5²b2ôm<Ê$]ëê?Ñ¥-£kúíOye8BÀê:HDQsgPÑúZÝNL*¥eÚî®ëie»t³ÜRç©â¨u
['̹{QÎ`êøq«z¸ássðs\sýÓ
].ãÆSEùAð²³±ý¹`Îl_á¯yÊ~·j;ý3§UfJ&Û³yؾ\÷ÕøõoLv Wæã4B#óÁÏØFÒ}ù+rí°Ûv¥fïP*Xîh´BÉwêÿÞï?î
====================== UPDATE ======================
I uploaded three sample packages in this format, so anyone who knows how to solve the problem can try them:
https://gofile.io/?c=fNakzX
The content you uploaded contains a lot of question marks (ASCII '\x3f') in all three files. I am fairly sure these replaced every byte that was an unprintable character. When the original bytes were changed to question marks, that information was lost completely.
Your question does contain at least one version that is not peppered with question marks, but since that is a plain-text rendering of binary data, I am also fairly sure that some (relevant) characters are missing and/or that some characters cannot be transformed back to the original bytes correctly.
If you do not have any other version of your input, I'm afraid your task cannot be accomplished, sorry.
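For reference, once the "cmsg" part is captured losslessly (raw bytes written straight to a file by the proxy, rather than a copy-pasted text rendering), recovering its content is just a gunzip. A minimal sketch in Python, assuming the part has been saved under its original filename:
import gzip

# Read the exact bytes of the "cmsg" file part as dumped by the proxy.
with open("ae3ada0b-866d-4b0c-b0af-e0c66df71808_5_regular.batch.gz", "rb") as f:
    raw_part = f.read()

# The part is an ordinary gzip stream; the decompressed bytes are whatever the
# app actually sent (likely JSON, given the "format" field in the request).
print(gzip.decompress(raw_part).decode("utf-8", errors="replace"))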
Related
I've read about chunked Transfer-Encoding and basically got the point. However, there's something I don't quite understand that isn't covered in any of the sources I've read.
Chunked-encoded data is structured as a series of chunks, each formatted as follows:
<chunk size> (the size in hexadecimal, written as ASCII characters)
\r\n
<data>
\r\n
What I don't understand is: what if the payload itself contains a \r\n? Doesn't that interfere with the way we track where a chunk starts and ends?
You could argue that even if it does, we still have the chunk size before each chunk, so that CRLF shouldn't bother us; but then I would ask: if so, why have these CRLFs in the first place?
Please clarify.
Yes, the chunk data can include \r\n.
As to why this format was chosen: I don't know. Maybe to make it more readable when used with textual data.
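To make that concrete, here is a minimal sketch of a chunked-body decoder (the decode_chunked helper is hypothetical, not taken from any library). The parser reads exactly <chunk size> bytes of data and only then expects the terminating CRLF, so it never searches the data for \r\n; an embedded CRLF is simply copied through:
def decode_chunked(body: bytes) -> bytes:
    out = bytearray()
    pos = 0
    while True:
        # Chunk-size line: hex digits (optionally followed by extensions), then CRLF.
        eol = body.index(b"\r\n", pos)
        size = int(body[pos:eol].split(b";")[0], 16)
        pos = eol + 2
        if size == 0:                        # "0\r\n" marks the last chunk
            break
        out += body[pos:pos + size]          # take exactly `size` bytes, CRLFs and all
        pos += size
        assert body[pos:pos + 2] == b"\r\n"  # chunk terminator comes *after* the data
        pos += 2
    return bytes(out)

# A chunk whose data contains \r\n decodes fine: the size, not the CRLF, delimits it.
assert decode_chunked(b"5\r\nab\r\nc\r\n0\r\n\r\n") == b"ab\r\nc"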
I have found some text in this form:
=B0=A1=C1=CB,=C4=E3=D2=B2=C3=BB=C1=AA=CF=B5=CE=D2,=D7=EE=BD=FC=CA=C7=B2=BB=CA=C7=
=BA=DC=C3=A6=B0=A1
containing mostly sequences consisting of an equal sign followed by two hexadecimal digits.
I am told it could be converted into this Chinese sentence:
啊了你也没联系我最近是不是很忙啊
What is this =B0=A1=C1 notation, and how do I decode/convert it?
The Chinese sentence has been encoded into an 8-bit Guobiao encoding (GB2312, GBK, or GB18030; most likely GB18030, though it apparently decodes correctly as GB2312 too), and then further encoded with the 7-bit MIME quoted-printable encoding.
To decode it into a Unicode string, first undo the quoted-printable encoding, then decode the Guobiao encoding. Here’s an example using Python:
import quopri
print(quopri.decodestring("""\
=B0=A1=C1=CB,=C4=E3=D2=B2=C3=BB=C1=AA=CF=B5=CE=D2,=D7=EE=BD=FC=CA=C7=B2=BB=CA=C7=
=BA=DC=C3=A6=B0=A1\
""").decode('gb18030'))
This outputs 啊了,你也没联系我,最近是不是很忙啊 on my terminal.
The quoted-printable encoding is usually found in e-mail messages; whether it is actually in use should be determined from message headers. A message encoded in this manner should carry the header Content-Transfer-Encoding: quoted-printable. The text encoding (gb18030 in this case) should be specified in the charset parameter of the Content-Type header, but sometimes can be determined by other means.
I am getting some 55k mails each month, and I have taken up an assignment to analyse them. While an .eml file has a lot of content, I am typically interested in the email text content, as follows:
From: "SavReader" <info#savreader.com>
To: <pgmagesh#gmail.com>
Subject: Export file SavReader.com
Date: Mon, 2 Nov 2015 08:37:52 +0100
MIME-Version: 1.0
Content-Type: multipart/alternative;
boundary="----=_NextPart_000_0000_01D11549.C39BD260"
This is a multi-part message in MIME format.
------=_NextPart_000_0000_01D11549.C39BD260
Content-Type: text/plain;
charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Hello from SavReader!
The file that you submitted for export is now available for download
from SavReader - all files will be stored for 1 day from receipt of
this mail.
Download file <http://www.savreader.com/export/qlysDuv1xQ.xls>
Thanks,
Team SavReader
------=_NextPart_000_0000_01D11549.C39BD260
Content-Type: text/html
Content-Transfer-Encoding: 7bit
Hello from SavReader!<br><br>The file that you submitted for export is now available for download from SavReader - all files will be stored for 1 day from receipt of this mail.<br><br><a href=http://www.savreader.com/export/qlysDuv1xQ.xls>Download file</a><br><br>Thanks, <br><br>Team SavReader
------=_NextPart_000_0000_01D11549.C39BD260--
I am interested in extracting Subject:, From:, and the content of the mail. The body of the mail is present both as Content-Type: text/plain; charset=UTF-8 and as Content-Type: text/html; charset=UTF-8. I noticed that the body is sandwiched between a matching pair of delimiter strings, e.g. --001a113d7c1e5de339051fdaaf69 before it and --001a113d7c1e5de339051fdaaf69-- after it (the closing delimiter ends with two additional "--"). I was trying to parse the email address and the content of the mail. I have used the following code, where a is my .eml file:
pat = '([From]): ([a-zA-Z]) (([a-z0-9_\\.-]+)@([\\da-z\\.-]+)\\.([a-z\\.]{2,6}))'
d <- str_match(pattern = pat, a)
d
another option:
strsplit(gsub("(?s)^_+\\s+", "", a, perl=T) , "_+\\s*(?=From:)", perl=T)[[1]]
another option:
d <- str_extract(string=a,pattern="From:\\b[-A-Za-z0-9_.%]+\\@[-A-Za-z0-9_.%]+\\.[A-Za-z]+")
and many other options given on SO. What I want to extract is:
From: DEFHIJ <abc@xyz.in>
and the HTML content of the mail between the matching delimiter strings. Can someone help, please?
You're almost there. You need to include the angle brackets and also match the space that exists between From: and the first letter, using \\s*.
str_extract(string=a,pattern="^From:\\s*[-A-Za-z0-9_.%]+\\s*<[-A-Za-z0-9_.%]+@[-A-Za-z0-9_.%]+\\.[A-Za-z]+>")
or
str_extract(a, "^From:.*")
Imagine I have some text, like 'Feef'.
I can gzip it and the result is 24 bytes.
Is there a way to gzip it so the result would be 1024 bytes? It should be still a valid gzip stream, i.e. it would not generate the message "trailing garbage ignored" when decompressed.
How I would use it: gzip a data header to a fixed size; append the gzipped data body; later, update the header, gzip it to the same fixed size again, and overwrite it in place.
You can concatenate gzip streams and it will still be valid gzip, but they have to be proper streams. Maybe there's a way to pad gzip output?
The gzip header permits an extra field of up to 65535 bytes that can contain arbitrary data and that is ignored when decompressing. So you can change the gzip header to insert an extra field to pad out the file to the desired length. See RFC 1952 for the format description. If you don't care about the file name in the gzip header, you can make that any length, to pad to an arbitrarily large size. Or if you want more than 64K and you don't want to muck with the file name, you can append empty gzip streams to make it as long as you like.
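A minimal sketch of the extra-field approach in Python (the pad_gzip helper is hypothetical; it assumes the input has a plain 10-byte header with no optional fields set, which is what gzip.compress produces):
import gzip
import struct

def pad_gzip(data: bytes, target_size: int) -> bytes:
    # The FEXTRA machinery costs 2 bytes (XLEN) + 4 bytes (SI1, SI2, LEN)
    # on top of the padding itself.
    pad = target_size - len(data) - 6
    if pad < 0 or pad > 65535 - 4:
        raise ValueError("target size out of range for a single extra field")
    header = bytearray(data[:10])
    if header[:3] != b"\x1f\x8b\x08" or header[3] & 0x04:
        raise ValueError("expected a plain gzip header without an extra field")
    header[3] |= 0x04                              # set the FEXTRA flag
    # One subfield: arbitrary ID "PD", LEN, then `pad` ignored bytes (RFC 1952).
    subfield = b"PD" + struct.pack("<H", pad) + b"\x00" * pad
    extra = struct.pack("<H", len(subfield)) + subfield
    return bytes(header) + extra + data[10:]

padded = pad_gzip(gzip.compress(b"Feef"), 1024)
assert len(padded) == 1024
assert gzip.decompress(padded) == b"Feef"          # still a valid gzip stream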
I found this string several times on the Internet, and I wonder what it means, and where it comes from:
3i2ndDfv2rTHiSisAbouNdArYfORhtTPEefj3q2f
It's often used as the boundary definition in the HTTP Content-Type header:
Content-Type: multipart/form-data; boundary=--3i2ndDfv2rTHiSisAbouNdArYfORhtTPEefj3q2f
http://www.w3.org/Protocols/rfc1341/7_2_Multipart.html
RFC 1341, 7.2 The Multipart Content-Type:
The body must then contain one or more "body parts," each preceded by an encapsulation boundary, and the last one followed by a closing boundary.
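The boundary itself is just an arbitrary string that must not occur inside the parts; what matters is how it delimits the body. A short illustrative sketch (the field name and value are made up) showing that each part is preceded by "--" plus the boundary, and the closing delimiter adds a trailing "--":
boundary = "3i2ndDfv2rTHiSisAbouNdArYfORhtTPEefj3q2f"
body = (
    f"--{boundary}\r\n"
    'Content-Disposition: form-data; name="field1"\r\n'   # hypothetical part
    "\r\n"
    "value1\r\n"
    f"--{boundary}--\r\n"                                  # closing boundary
)
print(body)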