There is an app I use that can read the name of a location from a QR code.
Recently, the QR code was changed so that the name of the location is no longer readable by ZXing or any other barcode reader I can find. Instead, I get a long string of numbers and letters. (The data I need comes after the '&ln=' or '&eln=' in the URL that is returned.)
The first example below is the new QR code. It returns the following URL:
https://mysejahtera.malaysia.gov.my/qrscan?lId=62419209a90dcd50091c36cb&eln=TW9qaXRvJ3MgQmVlciBCYXI=&formType=REGULAR&isExternal=false
The second one returns this:
https://mysejahtera.malaysia.gov.my/qrscan?lId=5edc745eb9e6850245c07e4b&ln=Osdin_Lighting_Enterprise
The original app can read the long location string in both the "encrypted" and the human-readable format. I want to be able to do the same. For example, the location in the first URL is "Mojito's Beer Bar." The original app reads this and displays it correctly.
My feeling is that there must be some private-key encryption that the app uses to decipher the code. However, is it possible that there is a simple reason a normal barcode/QR code reader can't get the plain, readable location text?
All I am looking for here is some pointers of where I should be looking. I have the decompiled source code from the MySejahtera app and have been digging through it without any luck. I'm happy to share this if anyone would be willing to help.
The "encoded" URL contains the value TW9qaXRvJ3MgQmVlciBCYXI= for the attribute eln. This is obviously a Base64 encoded value.
If you run the value through a Base64 decoder (e.g. https://www.base64decode.org/), the result is:
Mojito's Beer Bar
Base64 doesn't specify which text encoding the decoded bytes use, but it is most likely UTF-8.
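For example, here is a minimal sketch in Java (using the standard java.util.Base64 class, Java 8+; the class and variable names are just for illustration) that decodes the eln value:

import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class DecodeEln {
    public static void main(String[] args) {
        // The value of the eln parameter from the scanned URL
        String eln = "TW9qaXRvJ3MgQmVlciBCYXI=";
        byte[] bytes = Base64.getDecoder().decode(eln);
        // Assume the decoded bytes are UTF-8 text
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
        // Prints: Mojito's Beer Bar
    }
}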
I am getting lots of "not followed" pages in Google Webmaster Tools. I checked them, and it is because lots of URLs look like http://www.mysite.net/2013/06/burn-notice-7%C3%9702-sub-espanol-online.html
when the correct URL should be http://www.mysite.net/2013/06/burn-notice-7x02-sub-espanol-online.html
I try to post titles with many "x" characters in them, and that weird %C3%97 appears when I post, for example, a new series episode with a title like: Burn Notice 7x02 Sub Español Online. When the x is between numbers, %C3%97 appears, and that makes my posts duplicates.
So I tried to fix it by changing the database collation from latin1_swedish_ci to utf8_general_ci, but the same thing still happens. I also checked my wp-config.php, and it has define('DB_CHARSET', 'utf8');
Does somebody know a good solution to fix this situation? The database is quite big, and I suppose that if I find a solution I will need to update the old URLs.
Thanks in advance
The URL you say Google is using:
http://www.mysite.net/2013/06/burn-notice-7%C3%9702-sub-espanol-online.html
is almost the same as the URL:
http://www.mysite.net/2013/06/burn-notice-7x02-sub-espanol-online.html
as the percent-encoded characters actually represent the Unicode character 'MULTIPLICATION SIGN', i.e. it's an '×', not an 'x'. Google is just using the percent-encoded version to be safe. That means your database is probably fine, as it is emitting URLs that are valid UTF-8.
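To illustrate, a small Java sketch (Java 10+ for the Charset overload of URLDecoder.decode; the file name is taken from the question) showing that %C3%97 decodes to '×' rather than 'x':

import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class DecodeSlug {
    public static void main(String[] args) {
        String encoded = "burn-notice-7%C3%9702-sub-espanol-online.html";
        // %C3 %97 are the UTF-8 bytes of U+00D7 MULTIPLICATION SIGN
        String decoded = URLDecoder.decode(encoded, StandardCharsets.UTF_8);
        System.out.println(decoded); // burn-notice-7×02-sub-espanol-online.html
    }
}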
The problem probably lies in how you're interpreting the requested URL and trying to match it against the database. PHP should already be decoding the percent-encoded value to '×', so either:
Something is breaking the string (e.g. calling a non-multibyte-safe function like strtolower() instead of mb_strtolower()).
Your PHP code is connecting to the database in a character set other than UTF-8; check that your my.cnf file contains 'default-character-set=utf8' in the client section.
Or there's some other issue. The URL itself does appear valid, though.
I am developing an HTTP server with Netty. On some occasions, the server must answer with a 1x1 transparent pixel. So I hard-coded a GIF transparent pixel in Base64 and returned it with the following code:
String pixel_string = new String(Base64.decodeBase64("R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw=="));
HttpResponse response = new DefaultHttpResponse(HttpVersion.HTTP_1_1, HttpResponseStatus.OK);
response.setContent(ChannelBuffers.copiedBuffer(pixel_string, CharsetUtil.UTF_8));
EDIT: I also set the content type:
response.setHeader(HttpHeaders.Names.CONTENT_TYPE, "image/gif");
In Chrome, everything is fine. However, Firefox tells me that it cannot display the pixel (which is pretty bad for my app), as the pixel data is invalid.
After much investigation, I finally figured out a fix: changing the charset to ISO-8859-1.
response.setContent(ChannelBuffers.copiedBuffer(
        pixel_string, CharsetUtil.ISO_8859_1));
I don't understand why it works, which makes me think that I may run into trouble in some cases. I tried changing the Firefox preferences (to have UTF-8 as the default), but it doesn't change much.
Why does Firefox accept the ISO-8859-1 encoding, and not UTF-8? Can I change that? Would someone have a clue about the origin of the issue, and how to be sure that it will work whatever the user's settings?
Thanks
It's not Firefox that's accepting the encoding or not. It's your server.
When you do your Base64 decode you produce a string that contains some characters... but what you really produced was bytes that you're then thinking of as characters somehow. Since a Java String is a container that holds a UTF-16 string, in practice what you're doing is taking each byte, treating it as a 16-bit integer, and constructing the UTF-16 "string" made up of those code units.
But when you want to put all this on the network, you have to convert your string to bytes, and the argument to copiedBuffer says how to do that. If converting to UTF-8, any character that came from a byte with the high bit set will end up encoded as a two-byte UTF-8 sequence. On the other hand, if converting to ISO-8859-1, the conversion just drops the high byte of each UTF-16 code unit (which in your case is always zero anyway).
So the conversion to ISO-8859-1 produces the actual byte array you got out of Base64 decoding, while the conversion to UTF-8 produces... something else, which may or may not actually make any sense depending on the exact byte values.
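Here is a small sketch demonstrating this, assuming Java 8+ and the standard java.util.Base64 class instead of the Commons Codec class used in the question:

import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class CharsetRoundTrip {
    public static void main(String[] args) {
        byte[] gif = Base64.getDecoder()
                .decode("R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==");

        // ISO-8859-1 maps each byte one-to-one to a char, so encoding
        // the string back as ISO-8859-1 reproduces the original bytes.
        String s = new String(gif, StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(gif, s.getBytes(StandardCharsets.ISO_8859_1))); // true

        // Encoding the same string as UTF-8 expands every char >= 0x80
        // into a two-byte sequence, corrupting the binary GIF data.
        System.out.println(Arrays.equals(gif, s.getBytes(StandardCharsets.UTF_8))); // false
    }
}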
The copiedBuffer method you are calling is not appropriate for the type of data (binary) you are using. According to the JavaDoc of the Netty API, the one you are calling is:
Creates a new big-endian buffer whose content is the specified string
encoded in the specified charset.
Which means that your binary data is being "converted" to UTF-8 (which is meaningless). If you try to save the generated file and look at it with a hex editor, you'll probably see that it is corrupted.
Try with something like this (untested code):
static byte[] pixel_data = Base64.decodeBase64("R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==");
HttpResponse response = ...
response.setHeader(HttpHeaders.Names.CONTENT_TYPE, "image/gif");
response.setContent(ChannelBuffers.copiedBuffer(pixel_data));
I am using ASP.NET and C# in my application and MySQL as the database. I want to take input from the user in Hindi, store it in the database, and retrieve it.
When I store Hindi characters directly in the MySQL database it works fine, but when I use a textbox to input Hindi characters, it shows me ?????????.
I guess the problem is that the .aspx page is not set up to support Hindi characters. Please tell me how to achieve this.
I guess using UTF-8 encoding on your HTTP requests and responses would solve it. What are the requestEncoding and responseEncoding attributes in your Web.config file currently set to?
See more on the <globalization> tag here:
http://msdn.microsoft.com/en-us/library/hy4kkhe0(v=VS.100).aspx
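If those attributes are missing, the element would look something like this (a sketch of a typical Web.config, not your exact file):

<configuration>
  <system.web>
    <!-- Encode HTTP requests and responses as UTF-8 -->
    <globalization requestEncoding="utf-8" responseEncoding="utf-8" />
  </system.web>
</configuration>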
Try this:
// mytable has 2 fields: id (auto increment), title (nvarchar(max))
string title = "बिलाल";
// The N prefix marks the literal as Unicode (nvarchar), so the
// Devanagari characters are not silently converted to '?'.
SqlCommand cmd = new SqlCommand("insert into mytable values (N'" + title + "')", con);
con.Open();
cmd.ExecuteNonQuery();
con.Close();
Haha.. Oh, the memories (and I only had to deal with Spanish, which fits into the default latin1).
I don't know the MS side of the stack, but I assume it's the same type of solution as in Java. Namely, you should always assume UTF-8, and thus make your Content-Type HTML responses always declare UTF-8 so that browsers know to encode POST data in UTF-8. You should always inspect the encoding of HTML POSTs, just in case a browser ignored the encoding of the HTML form (someone might be using curl/wget/a custom browser). You also need to learn how, in MS-land, to convert from one encoding into UTF-8 (in Java, for reference, we just say String s = new String(bytes, encoding_name)).
Assuming that MS's stack uses UTF-16 or UTF-32 or whatever, so that UTF-8 is easy to extract, next comes the MySQL layer.
This involves two things:
1) The column encoding MUST be set to UTF-8. It's not obvious at all how to do it, and even the spelling is annoying; just google it. Roughly: "create database foo default character set UTF8", or if you're worried for some reason, do it at the table level: "create table foo (..) character set UTF8". Or, if the table is already there, take EVERY column that can hold arbitrary web-form text (possibly including login name, but not columns like enumerated varchars, as that would waste index space, even though you'd think it wouldn't): "alter table foo change name name varchar(255) character set UTF8". See the SQL sketch after this list for the exact syntax.
2) You MUST make the database connection (JDBC in Java; I don't know the MS equivalent) encode all in/out characters as UTF-8. There are two parameters I set (use-unicode and character-set=UTF-8, approximately); the exact JDBC spelling is shown in the sketch below as well.
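For reference, here is what the statements approximated above look like in real MySQL syntax (a sketch; the database, table, and column names are placeholders), followed by a JDBC connection URL with the exact Connector/J parameter names:

-- Database-level default character set
CREATE DATABASE foo DEFAULT CHARACTER SET utf8;

-- Table-level default
CREATE TABLE foo (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255)
) DEFAULT CHARACTER SET utf8;

-- Convert an existing column in place
ALTER TABLE foo MODIFY name VARCHAR(255) CHARACTER SET utf8;

-- JDBC connection URL (MySQL Connector/J):
-- jdbc:mysql://localhost:3306/foo?useUnicode=true&characterEncoding=UTF-8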
Google it all, but this should point you in the right direction.
Test the existing DB by connecting to MySQL both with character-set=UTF8 and with latin1. You'll see totally different output in your text data under each encoding. If you're lucky, you already got the data in correctly. Otherwise you'll have to regenerate ALL the data, or perform some very clever character-conversion hacks like I had to do once upon a time (painful stuff).
I've been given the task of creating an ICAL feed of conference calls for members of our organization. I created a handler in ASP.NET that loops through our database, gets the call data from the database, and creates output that appears valid to me based on what I've read of the ICAL format, and the examples I've seen/disassembled.
Outlook 2007 reads the resulting output and displays the calendar, no problem (screenshot here shows how it renders).
30 Boxes also has no problem with it. (see test here).
But when I try to load the same output into Google Calendar, I get the message "We could not parse the calendar at the url requested":
What's wrong with my output that's causing Google to reject it? You can see the temporary data I'm testing with at this URL: http://www.joshuacarmody.com/temp/icaltest.ics. This is a snapshot of the output from my .ASHX file, unaltered except the phone numbers and passcodes have been sanitized.
Edit with additional Info:
I just tried the following
Created a copy of my test file called "icaltest-1googevent.ics"
Deleted all the VEVENT data from the file
Exported one of my Google calendars to ICS
Copied one VEVENT from Google's exported data into my test file
Attempted to subscribe to icaltest-1googevent.ics in Google Calendar.
I still got an error message. So I'm guessing the issue isn't with my VEVENT data, but with something else about the file. Maybe there's something wrong with my VCALENDAR definition?
The severinghaus ICS validator seems to think there is something funny (a '?') before the BEGIN:VCALENDAR:
http://severinghaus.org/projects/icv/?url=http%3A%2F%2Fwww.joshuacarmody.com%2Ftemp%2Ficaltest.ics
In my testing, Google was a lot fussier/more rigorous/pedantic; once you get it working with the validator and Google, it should work in most places.
After lots of trial-and-error, and comparing my output with Google's, I got it working. There were a few problems with my ICS file:
Unescaped characters (I didn't know I had to escape commas!)
Inconsistent line-return characters. They didn't show up in my text editor, but I had to use .NET's String.Replace() to strip stray "\r" characters from my output before Google would recognize it.
The file was missing END:VCALENDAR. Apparently Outlook doesn't much care. Google does.
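For reference, a minimal calendar that satisfies all three points might look like this (a sketch with placeholder values; note the escaped comma in SUMMARY, the CRLF line endings required by the spec, and the closing END:VCALENDAR):

BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Example Corp//Conference Calls//EN
BEGIN:VEVENT
UID:example-12345@example.com
DTSTAMP:20130601T120000Z
DTSTART:20130601T150000Z
DTEND:20130601T160000Z
SUMMARY:Conference call\, members only
END:VEVENT
END:VCALENDAR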
I had not one, but three funny characters before the BEGIN:VCALENDAR, decimal codes 239, 187, 191.
I found them thanks to the severinghaus.org link above, thanks!
It turns out they're a prefix called the BOM (byte order mark) in UTF-8; you can read up on it here: http://en.wikipedia.org/wiki/UTF-8#Byte_order_mark
Google doesn't handle this, but after stripping these three bytes from the file and uploading it to the server, I was able to subscribe to that calendar in Google Calendar (from URL).
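In case it helps, here is a minimal Java sketch (the file name is hypothetical) that detects and strips a leading UTF-8 BOM:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

public class StripBom {
    public static void main(String[] args) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get("icaltest.ics"));
        // The UTF-8 BOM is 0xEF 0xBB 0xBF (decimal 239, 187, 191)
        if (bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF) {
            Files.write(Paths.get("icaltest.ics"),
                    Arrays.copyOfRange(bytes, 3, bytes.length));
        }
    }
}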
I hope this helps someone passing by this page in the future...
I had a similar problem until I realised, by opening the generated .ics file in Notepad++, that it wasn't actually UTF-8. I was converting my string to a byte array without using an encoder, so no matter what content headers I used, the file was never generated as UTF-8. This simple fix resolved the UTF-8 generation, and Google is now happy with my feed:
var utf8 = Encoding.UTF8;
byte[] utfBytes = utf8.GetBytes(myString);
myString = utf8.GetString(utfBytes, 0, utfBytes.Length);
In some javascript, I have:
var url = "find.aspx?" + "location=" + encodeURIComponent( address );
alert( url );
location.href = url;
where the value of address is the string "Seattle, WA".
In the alert I see
find.aspx?location=Seattle%2C%20WA
as I expect.
But on the server side, when I look at Request.Url, the relevant substring I see is
find.aspx?location=Seattle, WA
And in the Firefox url window I see
find.aspx?location=Seattle%2C WA
So I'm getting three different representations whereas I would expect that in all three places I should see what I see in the alert. My expectation is that the url I assign to location.href should show up as-is in the browser url window, and should be passed as-is to the server in Request.Url (and I would need to decode the values on the server before using them). What's happening?
Firefox converts certain encoded characters into their literal forms as a way to be friendly to users. It will also convert spaces typed into the address bar into %20 for the server.
Update: The reason Firefox doesn't display the comma unencoded is because commas are allowed in URLs, but spaces are not, so it knows that a space is going to be unambiguously interpreted, whereas the pre-encoded comma is different from a non-encoded comma to some servers. see: Can I use commas in a URL?
ASP is probably trying to help you out by auto-un-encoding the string for you.
Update: It looks like ASP.NET unencodes Request.Url for you by default, as mentioned here: QueryString malformed after URLDecode They also mention that you can use HttpRequest.Url.Query to access the un-decoded version.
The alert is the only thing not doing any "magic" for you.
For the alert, you are doing the encoding yourself. Perhaps it looks the same as on the server-side if you removed encodeURIComponent.
On the server side, ASP.NET will always show you the unencoded form. This is to make it easier to directly map to files that also have text that needed to be (un)encoded.
Note that you can replace every letter with its UTF-8 representation in URL encoding and it will still be the same URL. I.e., type the following in the browser window and it will still work: %66%69%6E%64.aspx?location=Seattle%2C%20WA. To encode only the necessary characters, use UrlEncode on the server side if you create a link yourself.
URL encoding can become fairly tricky. You ask to have it explained. To know the correct escape for a certain character, you need to know how that character looks in UTF-8. The hexadecimal values of its UTF-8 bytes then become the %XX%YY values of your letter. Sometimes it's a single %XX, but it can be up to four bytes in total (many Chinese characters take three, for instance).
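For example, a quick Java sketch (Java 10+ for the Charset overload; the class name is just for illustration) showing how multi-byte UTF-8 characters expand into multiple %XX escapes:

import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EncodeExample {
    public static void main(String[] args) {
        // Unreserved ASCII passes through unescaped
        System.out.println(URLEncoder.encode("a", StandardCharsets.UTF_8)); // a
        // U+00E9 is two bytes in UTF-8
        System.out.println(URLEncoder.encode("é", StandardCharsets.UTF_8)); // %C3%A9
        // U+4E2D is three bytes in UTF-8
        System.out.println(URLEncoder.encode("中", StandardCharsets.UTF_8)); // %E4%B8%AD
    }
}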
URL encoding works one way only. Never double-encode or double-unencode; this is prohibited by the specification. Also, because you can encode any character, it is not always possible (as you found out) to round-trip encoding/unencoding. If you unencode and re-encode, it is quite possible that the resulting string is different but syntactically the same.
In HTML, URL encoding is sometimes interspersed with HTML encoding. I.e., the ampersand is valid in a URL, but not in HTML: find.aspx?city=A&name=B becomes find.aspx?city=A&amp;name=B inside an HTML document. However, browsers are lenient and will accept wrongly HTML-encoded strings.
Finally, a note on the browser: if you type a space in a link, even inside an <a> tag, it will escape the space (or other character) for you. Likewise, it will nowadays show the odd characters (é, ï etc.) in the address bar, but when it sends them over HTTP, the browser will correctly do the encoding for you.
Update: about answering your question of needing a "definitive" reference or proof.
While I couldn't find any on the internet, I decided to look for it myself using Reflector. Going through the methods that set, for instance, the HttpRequest.QueryString, you quickly encounter the private method HttpRequest.FillInQueryStringCollection which then calls HttpValueCollection.FillfromEncodedBytes. Somewhat near the end of that method, HttpUtility.UrlDecode is called for the values. Conclusion: do not call it yourself, to prevent double decoding.
You can see this for yourself when you download Reflector and disassemble the .NET libs of System.Web.
For your example you can change this line
var url = "find.aspx?" + "location=" + encodeURIComponent( address );
to
var url = "find.aspx?" + "location=" + address;
and see the address as it is. But if the address variable contains any '&' character, your query string will be corrupted. That is why you use encodeURIComponent: to encode these characters in the URL.
On the server side, all these encoded strings are decoded back automatically. So encodeURIComponent simply ensures that the address variable (whether it contains an '&' character or not) reaches the server side correctly.