What does this response mean? - http

This is a server response for a video file. In Chrome's preview (image) it shows up as strange characters (I'm not sure what kind of characters they are; if someone knows, please tell me what those characters/symbols are called). The same video response in Firefox (image) is shown as base64. So, is the video transferred to the browser as a base64 string even when the content type is set to video/mp4 (image)? I noticed this when I downloaded a PDF file as well. Please explain. Thanks.

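You're looking at binary data, not text, so it doesn't show up as ASCII characters that make any sense. The video is sent over HTTP as raw bytes, not as base64; the two browsers' developer tools simply display those bytes in different ways.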
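To illustrate the point, here is a minimal Python sketch that renders the same bytes in two ways: forced into characters (roughly what a text preview does) and as base64. The sample bytes are made up for illustration and only resemble the start of an MP4 file; they are not taken from the response in the question.

```python
import base64

# Illustrative sample: a 4-byte box size, then "ftyp" and a brand, then a few
# arbitrary non-text bytes. This is an assumption, not real captured data.
body = bytes([0x00, 0x00, 0x00, 0x18]) + b"ftypmp42" + bytes([0x00, 0xFF, 0xD0, 0x9A])

# View 1: map every byte to a character, which yields mostly gibberish.
as_text = body.decode("latin-1")

# View 2: base64, a printable rendering of the very same bytes.
as_base64 = base64.b64encode(body).decode("ascii")

print(repr(as_text))
print(as_base64)

# Decoding the base64 view gives back the original bytes unchanged.
assert base64.b64decode(as_base64) == body
```

Both views describe the same payload; neither is the form in which the bytes travelled over the network.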

Related

Is it possible for a server to provide the same file to 2 people, one who understands only UTF-8 and the other only ISO-8859?

INTRO:
I'm in a situation because when uploading an inventory upload feed to Amazon, in 2021, they still don't understand UTF-8 encoding.
Here we have a file in a WordPress installation, used as the image for a visual product.
Example url : https://wordpresssite.com/uploads/Café-à-la-crème.jpg
WordPress displays it fine.
Amazon reads a bunch of gibberish, can't find the file, and gives an error.
Can we leave the file name on the source server as is and yet do something in cPanel or in the Excel file that lists this URL so that Amazon can also read it?
Is this ultimately as simple as telling Excel to encode that column differently before uploading?
Thank you in advance!
UPDATE : What I am trying now, is to export the Excel to CSV and then run it through line by line using PHP with a combination of tricks hoping to do a passable job of it. From what I see, there are many ways that "sorta" work, but nothing is sure.
UPDATE 2 : I realize that this doesn't solve my problem, because if Amazon changes the file name, changing an "é" to an "e", then it won't find the image either, so I'll have to go through all the images and find the ones with accents that I'm using.
QUESTION ABOUT PROCEDURE : I haven't been able to quite understand the way things work. I thought originally that this is about trying to get help when stuck. I have explained the problem and code isn't necessary. If I'm wrong, please tell me how it changes THIS situation? I'm using Excel, WordPress and I have to lose the UTF-8 accented characters that seem to cause Amazon's systems such grief (no judgement to Amazon, except that this resistance to UTF-8 is giving me brain shudders at the moment).
MORE INFO: If this helps, I'm writing in English but certain art products have a lot of French and some German in their names. I thought my example sufficient to illustrate what I was up against.
My problem is not how to convert the code but how to put the steps together to do what I need. It's because this whole process is not a simple iconv vs utf8_decode() in PHP that it's extra stressful. Once I get the big picture sorted, the smaller steps are written about in many places where I could find more specific details if I needed.
I'm not snarking here, but it seems that this kind of comment is just kicking someone when they are down. You are not the first to make such a suggestion over the years but again, I am curious how I could have explained any more than I have already — in a way that pertains to my actual problem.
Thanks for your response.
That URI is not properly encoded as per RFC 3986 (see also Wikipedia: percent/URL encoding). You cannot expect a server to blindly assume a requested URI to be UTF-8 encoded, but you can expect every server to support percent encoding:
https://wordpresssite.com/uploads/Caf%C3%A9-%C3%A0-la-cr%C3%A8me.jpg
In PHP this can be achieved through rawurlencode(), applied to each path segment rather than to the whole URL, because it also encodes "/" and ":"; in JavaScript it would be encodeURI().
Not sure what you want with Excel and CSV, but from what I understood it is unrelated to your actual problem.
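For a feed that is processed by a script, the same percent-encoding can be sketched in Python. This is only an illustration under assumptions: the helper name `percent_encode_url` is hypothetical, and it assumes the input URL is not already percent-encoded (otherwise the "%" signs would be encoded a second time).

```python
from urllib.parse import quote, urlsplit, urlunsplit

def percent_encode_url(url: str) -> str:
    """Percent-encode the path of a URL as UTF-8, leaving "/" separators alone."""
    parts = urlsplit(url)
    # quote() encodes non-ASCII characters as UTF-8 bytes and keeps "/" by default.
    encoded_path = quote(parts.path)
    return urlunsplit(
        (parts.scheme, parts.netloc, encoded_path, parts.query, parts.fragment)
    )

print(percent_encode_url("https://wordpresssite.com/uploads/Café-à-la-crème.jpg"))
# https://wordpresssite.com/uploads/Caf%C3%A9-%C3%A0-la-cr%C3%A8me.jpg
```

The file name on the server stays untouched; only the way the URL is written in the feed changes.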

How to detect wrong encoding declaration?

I am building an ASP.NET web service that loads other web pages and then hands them to clients.
I have been doing quite well with character encoding handling: I read the meta tag from the HTML and then use that character set to read the file.
Nevertheless, some page authors just don't understand character sets. They declare a specific encoding, e.g. "gb2312", when in fact they are using plain UTF-8. When I use gb2312 to decode the text, everything turns out a holy mess.
How can I detect whether the text is properly decoded? I loaded that page into IE, which correctly uses UTF-8 to decode the page. How does it achieve that?
If the content starts with a byte order mark (BOM), you can tell from it which encoding is used.
BOM and encoding
If you want to detect the character set, you could use the C# port of Mozilla's character set detector.
CharDetSharp
If you want to be extra sure that you are using the correct one, you could look for special characters that are not supposed to be there. A page is not very likely to include "óké". So you could look for such characters and, if you find them, try a different encoding/character set to process your file.
Actually it is really hard to make your application completely "fool-proof".
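The checks suggested above can be sketched roughly as follows, in Python rather than C# for brevity. The function name `pick_encoding` is hypothetical, and the heuristic is an assumption rather than what IE actually does: trust a BOM if present, otherwise prefer UTF-8 whenever the bytes decode as valid UTF-8 (text in legacy encodings such as GB2312 rarely does), and only then fall back to the declared encoding. UTF-32 BOMs are left out to keep the sketch short.

```python
import codecs

# Byte order marks and the Python codec name to use for each.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def pick_encoding(raw: bytes, declared: str) -> str:
    """Guess an encoding for raw page bytes, given the declared one."""
    for bom, name in BOMS:
        if raw.startswith(bom):
            return name
    try:
        # Strict decoding: any invalid byte sequence raises an error.
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return declared

page = "中文".encode("utf-8")           # really UTF-8, but declared as gb2312
print(pick_encoding(page, "gb2312"))    # utf-8
```

A statistical detector such as the Mozilla one is still the more robust choice when the content is neither valid UTF-8 nor correctly declared.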

Google search by image "image_content" format?

I'm trying to create an Application, which is able to upload an image to https://www.google.de/searchbyimage/upload. I got that working (Posting multipart/form-data via C#)
The only thing I now need to know is:
How does the browser usually send the image? In the multipart/form-data of a sniffed request I found a field called "image_content", which stores the image data.
But I don't know in which format the image is stored.
------WebKitFormBoundaryumAjUbPr6ymfh8hM
Content-Disposition: form-data; name="image_content"

------WebKitFormBoundaryumAjUbPr6ymfh8hM
Any suggestions?
The default encoding is base64. You should form a request that matches your sniffed request, except for the following:
The WebKitFormBoundaryumAj... string should have a random string appended to ensure its uniqueness
The _9j_ line should be replaced with the base64-encoded contents of the image you are uploading.
The server will automatically detect the type of file (JPG, PNG, etc) so you shouldn't need to worry about that.
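As a small check of the encoding mentioned above: a JPEG file always starts with the bytes FF D8 FF, which standard base64 renders as "/9j/" and the URL-safe base64 alphabet renders as "_9j_". So a value beginning with "_9j_" is consistent with a URL-safe base64-encoded JPEG. The helper `encode_image_field` below is hypothetical, and whether the server requires the URL-safe variant is an assumption, not something confirmed here.

```python
import base64

JPEG_MAGIC = b"\xff\xd8\xff"  # first three bytes of every JPEG file

standard = base64.b64encode(JPEG_MAGIC)          # b"/9j/"
url_safe = base64.urlsafe_b64encode(JPEG_MAGIC)  # b"_9j_"
print(standard, url_safe)

def encode_image_field(image_bytes: bytes) -> str:
    """Hypothetical helper: URL-safe base64 text for the image_content field."""
    return base64.urlsafe_b64encode(image_bytes).decode("ascii")
```

In C#, the equivalent would start from the built-in base64 conversion and replace "+" and "/" with "-" and "_".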
This is a base64-encoded image. You can actually use it in many places, such as in CSS and in JavaScript. You can basically place it anywhere a usual URI would be required. You can also encode many different things this way (typefaces used in @font-face, for example).
Most modern programming languages have built-in functionality for base64 encoding; just google for one in C# if that's what you're using.
You can read more on the usage of data-URIs here: https://developer.mozilla.org/en-US/docs/data_URIs and perhaps here: http://css-tricks.com/data-uris/

how to convert MS word Unicode 2-byte Cyrillic to CP866 1-byte Cyrillic

I am having an issue with a piece of hardware that only contains the CP866 library/code page for Cyrillic. The text that I want to display is currently in MS Word and I need to convert it to the CP866 in a text file. (I know it just keeps getting worse!)
I am aware that MS Word uses Unicode to display Cyrillic and, if I am not mistaken, it uses UTF-16. So if I try to copy it to NP++, which from what I can tell only uses UTF-8, the HEX value changes.
For example, the HEX values for 'й': UTF-16 is 0439 and UTF-8 is d0b9, but what I need is the CP866 value, HEX a9 (89 is the uppercase 'Й').
Now I wish I could use different hardware, but it is what it is. Does anyone know the best way to make this happen? Maybe a different Text Editor someone could suggest.
Thanks for the help
I think I figured it out.
Open the .doc file and go to Word Options under the main round Office button. Advanced tab -> General tab -> check "Confirm file format conversion on open". Click OK. Close that file.
Reopen the .doc file. Choose Save As and change the type to Plain Text (.txt); the File Conversion dialog should pop up. Choose Cyrillic (DOS). Click OK. A new pop-up warns that something might not display, blah blah blah... click Yes.
Close the file.
Go to the file and open it in NP++. Everything looks strange because it's now displayed using the ANSI code page... BUT the HEX values seem (I have not completely verified) to be the correct CP866. Now I can load my hardware.
I will be working on this for another day or two. I will report back if this did not work correctly.
Take a day off and come back later. It always seems to work. Hope this helps anyone else who may be experiencing similar issues.
Best!
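For anyone who would rather script the conversion than go through Word's dialog, here is a minimal Python sketch. It assumes the text has already been copied out of Word into a Python string; the names `to_cp866` and `write_cp866` are hypothetical. It also prints the hex values so the result can be checked against a CP866 table.

```python
def to_cp866(text: str) -> bytes:
    """Encode text as CP866; raises UnicodeEncodeError for unsupported characters."""
    return text.encode("cp866")

def write_cp866(path: str, text: str) -> None:
    """Write text to a file as raw CP866 bytes (binary mode, no BOM)."""
    with open(path, "wb") as handle:
        handle.write(to_cp866(text))

sample = "Йй"
print(sample.encode("utf-16-be").hex())  # 04190439
print(sample.encode("utf-8").hex())      # d099d0b9
print(to_cp866(sample).hex())            # 89a9
```

Strict encoding is deliberate here: a character the hardware's code page cannot represent causes an error instead of being silently replaced.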

How to repair unicode letters?

Someone sent me an email containing letters like this
IVIØR†€™
correct should be
IVIØR†€™
suppose to be
How do I represent them in their original Portuguese language? The text got altered after being passed through an HTTP GET request.
I probably will not be able to fix the site, but maybe I could create a repair tool to fix these badly encoded letters? Or does anyone know of a repair tool, or how to do it manually by hand? It seems like nothing is lost, just badly interpreted.
What happened here is that UTF-8 got misinterpreted as ISO-8859-1; and then other kinds of mangling (the bad ISO-8859-1 string being re-UTF-8-encoded; the non-breaking space character '\xA0' being converted to regular space '\x20') seem to have happened afterward, though those may just be a result of pasting it into Stack Overflow.
Due to the subsequent mangling, there's no really good way to completely undo it, but you can largely undo it by passing it through a not-very-strict UTF-8 interpreter. For example, if I save "IVIØR†€™" as a text-file on my computer, using Notepad, with the "ANSI" (single-byte) encoding, and then I open it in Firefox and tell it to interpret it as UTF-8 (Firefox > Web Developer > Character Encoding > Unicode (UTF-8)), then it displays "IVIØR� €™". (The "�" is because of the '\xA0' having been changed to '\x20', which broke the UTF-8 encoding.)
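The same "undo the misinterpretation" idea can be sketched in Python. This is a minimal illustration under assumptions: the function name `repair_mojibake` is hypothetical, and it assumes the text was UTF-8 misread as ISO-8859-1 exactly once. Text containing characters such as "†" or "™" was misread as Windows-1252 instead and would need that codec rather than "latin-1".

```python
def repair_mojibake(garbled: str) -> str:
    """Reverse a single UTF-8-read-as-ISO-8859-1 mix-up, tolerating damage."""
    raw = garbled.encode("latin-1")               # back to the original bytes
    return raw.decode("utf-8", errors="replace")  # lenient UTF-8 decode

original = "São Paulo, até já"
garbled = original.encode("utf-8").decode("latin-1")
print(garbled)                   # SÃ£o Paulo, atÃ© jÃ¡
print(repair_mojibake(garbled))  # São Paulo, até já

# "à" is C3 A0 in UTF-8; once the A0 (non-breaking space) has been turned into
# a regular space, the sequence is no longer valid and cannot be recovered.
damaged = "\u00c3 "
print(repr(repair_mojibake(damaged)))  # '\ufffd '
```

The second case mirrors the replacement character seen in Firefox: the lenient decoder keeps going but marks the spot where information was lost.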
They're probably not broken. It's just a difference between the encoding they were sent in, vs. the decoding you're viewing them in.
Figure out what encoding was originally used, and use the same one to decode it, and it should look like the original. In terms of writing a "fix-it" tool, you'd always need to know what encoding they were originally created in, which can be complicated depending on the source, and whether or not you have access to said information.
