I have found some text in this form:
=B0=A1=C1=CB,=C4=E3=D2=B2=C3=BB=C1=AA=CF=B5=CE=D2,=D7=EE=BD=FC=CA=C7=B2=BB=CA=C7=
=BA=DC=C3=A6=B0=A1
containing mostly sequences consisting of an equal sign followed by two hexadecimal digits.
I am told it could be converted into this Chinese sentence:
啊了你也没联系我最近是不是很忙啊
What is this =B0=A1=C1 notation, and how can I decode/convert it?
The Chinese sentence has been encoded into an 8-bit Guobiao encoding (GB2312, GBK or GB18030; most likely the latter, though it apparently decodes correctly as the former too), and then further encoded into the 7-bit MIME quoted-printable encoding.
To decode it into a Unicode string, first undo the quoted-printable encoding, then decode the Guobiao encoding. Here’s an example using Python:
import quopri

# First undo the quoted-printable layer, then decode the resulting GB18030 bytes.
print(quopri.decodestring("""\
=B0=A1=C1=CB,=C4=E3=D2=B2=C3=BB=C1=AA=CF=B5=CE=D2,=D7=EE=BD=FC=CA=C7=B2=BB=CA=C7=
=BA=DC=C3=A6=B0=A1\
""").decode('gb18030'))
This outputs 啊了,你也没联系我,最近是不是很忙啊 on my terminal.
The quoted-printable encoding is usually found in e-mail messages; whether it is actually in use should be determined from message headers. A message encoded in this manner should carry the header Content-Transfer-Encoding: quoted-printable. The text encoding (gb18030 in this case) should be specified in the charset parameter of the Content-Type header, but sometimes can be determined by other means.
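For a complete message, Python's email package can undo both layers automatically, since it reads the Content-Transfer-Encoding header and the charset parameter itself. A minimal sketch, assuming the encoded text arrived as the body of a raw message (the headers below are made up for illustration):

from email import message_from_bytes
from email.policy import default

# Hypothetical raw message carrying the quoted-printable, GB18030-encoded body.
raw = (b"MIME-Version: 1.0\r\n"
       b"Content-Type: text/plain; charset=gb18030\r\n"
       b"Content-Transfer-Encoding: quoted-printable\r\n"
       b"\r\n"
       b"=B0=A1=C1=CB,=C4=E3=D2=B2=C3=BB=C1=AA=CF=B5=CE=D2,=D7=EE=BD=FC=CA=C7=B2=BB=CA=C7=\r\n"
       b"=BA=DC=C3=A6=B0=A1")

msg = message_from_bytes(raw, policy=default)
# get_content() undoes the quoted-printable encoding and decodes the declared charset.
print(msg.get_content())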
I have a JSON file with Tweet data, containing fields such as text, published date, author, ID, etc.
I used the parseTweets function from streamR, but when I view the resulting data frame, the text has not been decoded correctly.
tweets <- parseTweets("C:/Users/...file.json", simplify = FALSE, verbose = TRUE, legacy = FALSE)
View(tweets)
This is what is shown in the "text" column of the parsed object
think you’re continuing the conversation
It should say: think you're continuing the conversation
I did some searching and this seems to be an encoding issue, but I can't seem to figure it out.
Would I need to run parseTweets first and then fix the text column afterwards? Or is there a wrapper method so that the text is parsed correctly the first time I read in the JSON?
Any help is appreciated, thank you!
Here is an example JSON snippet pulled from my larger file
{"created_at":"Sun Jun 10 00:01:12 +0000 2018","id":100565760896,"id_str":"1005600896","text":"think you’re continuing the conversation","source":"Twitter for iPhone","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":403340,"id_str":"40311840","name":"Dvo","screen_name":"ImBorau","location":"Florida, USA","url":"http://Instagram.com/ ","description":"ucf | I your sarcastic quips","translator_type":"none","protected":false,"verified":false,"followers_count":43,"friends_count":166,"listed_count":0,"favourites_count":839,"statuses_count":1460,"created_at":"Wed Nov 02 01:41:45 +0000 2011","utc_offset":null,"time_zone":null,"geo_enabled":true,"lang":null,"contributors_enabled":false,"is_translator":false,"profile_background_color":"9AE4E8","profile_background_image_url":"http://abs.twimg.com/images/themes/theme16/bg.gif","profile_background_image_url_https":"https://abs.twimg.com/images/themes/theme16/bg.gif","profile_background_tile":false,"profile_link_color":"0084B4","profile_sidebar_border_color":"BDDCAD","profile_sidebar_fill_color":"DDFFCC","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http://pbs.twimg.com/profile_images/10014987138688/RYbZNdVR_normal.jpg","profile_image_url_https":"https://pbs.twimg.com/profile_images/100149871633688/RYbNdVR_normal.jpg","profile_banner_url":"https://pbs.twimg.com/profile_banners/40318340/107757914","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null,"updated":["description","name"]},"geo":null,"coordinates":null,"place":{"id":"4ec0163497","url":"https://api.twitter.com/1.1/geo/id/4ec1c9db497.json","place_type":"admin","name":"Florida","full_name":"Florida, USA","country_code":"US","country":"United States","bounding_box":{"type":"Polygon","coordinates":[[[-87.634643,24.396308],[-87.634643,31.001056],[-79.974307,31.001056],[-79.974307,24.396308]]]},"attributes":{}},"contributors":null,"is_quote_status":false,"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"urls":[],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"en","timestamp_ms":"1528588108","matching_rules":[{"tag":null,"id":484862573421,"id_str":"48486970421"}]}
I'm using a MITM technique to study some apps' APIs, but I'm not able to restore the original data from this multipart gzip request.
Does anyone know how I can recover the content of this package?
POST /logging_client_events HTTP/1.1
Accept-Language: pt-BR, en-US
Content-Type: multipart/form-data; boundary=3TtLStKljJgtMAosyN-hY6JtpuUqhC
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Length: 1129
--3TtLStKljJgtMAosyN-hY6JtpuUqhC
Content-Disposition: form-data; name="access_token"
567067343352427|f249176f09e26ce54212b472dbab8fa8
--3TtLStKljJgtMAosyN-hY6JtpuUqhC
Content-Disposition: form-data; name="format"
json
--3TtLStKljJgtMAosyN-hY6JtpuUqhC
Content-Disposition: form-data; name="cmsg"; filename="ae3ada0b-866d-4b0c-b0af-e0c66df71808_5_regular.batch.gz"
Content-Type: application/octet-stream
Content-Transfer-Encoding: binary
eRÛ®0üòG6¾GÊUhm/9Ö!#0Ð¥ù÷Ú¤Q¢VH\fvf׳ܪ×ê(÷cCu¬¤ÒTi.8µ¨uõ V2Ç(=é«m¦Ü»ÐôË¥ m¸FCç88A¥8ÊÖÄñÄ+¡Zë°6³¤Kì¾w¥ôSJ#DíqÜK"æ¡uTfeÂâÐ?4PGò$G=qZÔg ÕÌP5ËVLóÿ¾Ç.Mx^:2Ö
çfþ1¾ØÏ
®ùþ7ÖPf5²b2ôm<Ê$]ëê?Ñ¥-£kúíOye8BÀê:HDQsgPÑúZÝNL*¥eÚî®ëie»t³ÜRç©â¨u
['̹{QÎ`êøq«z¸ássðs\sýÓ
].ãÆSEùAð²³±ý¹`Îl_á¯yÊ~·j;ý3§UfJ&Û³yؾ\÷ÕøõoLv Wæã4B#óÁÏØFÒ}ù+rí°Ûv¥fïP*Xîh´BÉwêÿÞï?î
======================UPDATE===============
I uploaded three sample packages in this format, so anyone who knows how to solve the problem can try:
https://gofile.io/?c=fNakzX
The content you uploaded contains a lot of literal question marks (ASCII '\x3f') in all three files. I am pretty sure these stand in for every byte of the original data that wasn't a printable character. In replacing those bytes with question marks, the information was lost completely.
Your question at least contains a version that is not peppered with question marks, but since it is a plain-text rendering of binary data, I am also pretty sure that some (relevant) bytes are missing and/or that some of the characters cannot be transformed back to the original binary correctly.
If you do not have any other version of your input, I'm afraid your task cannot be accomplished, sorry.
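For reference, if the capture had preserved the raw bytes of the cmsg part (for example by saving that part from the proxy as a binary file instead of copying it as text), getting at the content would be straightforward, since the part is an ordinary gzip member and the request's format field suggests JSON inside. A minimal sketch, using the filename from the Content-Disposition header above:

import gzip

# Hypothetical: the cmsg part exported byte-for-byte from the proxy.
with open("ae3ada0b-866d-4b0c-b0af-e0c66df71808_5_regular.batch.gz", "rb") as f:
    compressed = f.read()

# Decompress the gzip member; the "format" field in the same request says json.
data = gzip.decompress(compressed)
print(data.decode("utf-8", errors="replace"))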
According to the HTTP specs, a header field can look like this:
Header-Name: value1,value2,value3,...
I try to parse the header values and store them as an array:
array('value1', 'value2', 'value3')
So far so good: I can just split the string wherever a comma appears.
BUT how should I handle headers like this one:
Expires: Thu, 01 Dec 1994 16:00:00 GMT
There's a comma, but it is part of the single value this header has. "Oh, that's easy," I thought, and figured out a rule: only split at a comma when there is no space before or after it. That way both examples parse correctly.
BUT then I came across a header like this:
Accept-Encoding: gzip, deflate
And now? Is this one value, array('gzip, deflate'), or two values, array('gzip', 'deflate')? To me these are two separate values, but then my rule from above no longer holds.
Is there a list of which headers are allowed to appear more than once? Then I could check against it to determine whether a comma is a value delimiter or not.
Comma concatenation can occur for any header field, even those that aren't designed for it; it's how libraries and intermediaries happen to work.
It is designed to be used for header fields that use list syntax (RFC 7230 has all the details).
Finally, you can't use generic code to tokenize, because the way the comma can occur inside values varies from field to field.
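A minimal sketch of why the splitting decision has to be made per field: naive comma splitting is correct for a list-syntax field like Accept-Encoding but mangles a field like Expires, whose single value legitimately contains a comma. The set of list-syntax field names below is illustrative, not complete, and elements that may contain quoted commas would still need a real parser.

def split_list_field(value):
    # Split a list-syntax value such as "gzip, deflate" into its elements.
    return [item.strip() for item in value.split(",") if item.strip()]

print(split_list_field("gzip, deflate"))
# ['gzip', 'deflate']  -- correct: Accept-Encoding uses list syntax

print(split_list_field("Thu, 01 Dec 1994 16:00:00 GMT"))
# ['Thu', '01 Dec 1994 16:00:00 GMT']  -- wrong: Expires holds a single date

# The decision must come from knowledge of the field, not from the comma itself.
LIST_SYNTAX_FIELDS = {"accept", "accept-encoding", "accept-language",
                      "cache-control", "connection", "vary", "via"}

def parse_field(name, value):
    if name.lower() in LIST_SYNTAX_FIELDS:
        return split_list_field(value)
    return [value]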
I found this string several times on the Internet, and I wonder what it means, and where it comes from:
3i2ndDfv2rTHiSisAbouNdArYfORhtTPEefj3q2f
It's often used as the boundary definition in the HTTP Content-Type header:
Content-Type: multipart/form-data; boundary=--3i2ndDfv2rTHiSisAbouNdArYfORhtTPEefj3q2f
http://www.w3.org/Protocols/rfc1341/7_2_Multipart.html
RFC 1341, section 7.2: The Multipart Content-Type
The body must then contain one or more "body parts," each preceded by an encapsulation boundary, and the last one followed by a closing boundary.
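Incidentally, the mixed-case middle of the string spells out "THiS is A bouNdArY fOR htTP", so it is simply an arbitrary delimiter chosen to be unlikely to occur in the payload. To make the quoted paragraph concrete, here is a minimal sketch of what a multipart/form-data body using that boundary looks like on the wire (the form fields are made up):

BOUNDARY = "3i2ndDfv2rTHiSisAbouNdArYfORhtTPEefj3q2f"

# Each body part is preceded by an encapsulation boundary ("--" + boundary),
# and the last part is followed by the closing boundary ("--" + boundary + "--").
body = (
    f"--{BOUNDARY}\r\n"
    'Content-Disposition: form-data; name="username"\r\n'
    "\r\n"
    "alice\r\n"
    f"--{BOUNDARY}\r\n"
    'Content-Disposition: form-data; name="comment"\r\n'
    "\r\n"
    "hello, world\r\n"
    f"--{BOUNDARY}--\r\n"
)

print(f"Content-Type: multipart/form-data; boundary={BOUNDARY}\r\n")
print(body)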