I'm attempting to consume a REST API that does not handle periods correctly.
It fails on the following URL, where the parameter value ends with a literal period:
http://api.com/endpoint?myparameter=includes%20a.
But it works fine when the period is encoded as %2E:
http://api.com/endpoint?myparameter=includes%20a%2E
Try as I might, when using requests the %2E always gets converted back into a ".".
Is there any way of avoiding this behavior?
You can use Prepared Requests to get this to work. By default, Requests runs the URL through the requote_uri(uri) function in requests.utils, which helpfully unquotes the unreserved characters for you. If you have already parsed and prepared the URL yourself, you can do the following and override the url field on the prepared request:
from requests import Session, Request
s = Session()
req = Request('GET', 'http://localhost:8008?name=kevin%2Emckinsey')
# This will use `requote_uri` to unquote unreserved characters so %2E becomes a `.`
prepped = req.prepare()
# Forcing the `url` field to be a URL we specified.
prepped.url = 'http://localhost:8008?name=kevin%2Emckinsey'
resp = s.send(prepped)
print(resp.url)
print(resp.json())
# http://localhost:8008?name=kevin%2Emckinsey
# PHP's $_SERVER['REQUEST_URI'] returns:
# {'name': '/?name=kevin%2Emckinsey'}
This seems like a dirty trick to me but AFAIK, there's no way to tell Requests not to unquote a specific character.
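To see why the %2E disappears, here is a minimal standard-library sketch of the normalization step described above: escapes that decode to an unreserved character (letters, digits, "-", ".", "_", "~") are replaced by the literal character, and every other escape is left alone. The name unquote_unreserved_sketch is made up for illustration; this is a simplified approximation, not the actual Requests code.

```python
import re
import string

# Unreserved characters per RFC 3986: letters, digits, "-", ".", "_", "~".
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def unquote_unreserved_sketch(url: str) -> str:
    """Hypothetical helper: decode only the escapes of unreserved characters."""

    def replace(match):
        char = chr(int(match.group(1), 16))
        # Keep the escape as-is unless it hides an unreserved character.
        return char if char in UNRESERVED else match.group(0)

    return re.sub(r"%([0-9A-Fa-f]{2})", replace, url)


print(unquote_unreserved_sketch("http://api.com/endpoint?myparameter=includes%20a%2E"))
# http://api.com/endpoint?myparameter=includes%20a.
```

The %20 survives because a space is not an unreserved character, while %2E is turned back into a period, which matches the behavior described in the question.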
I am trying to decode filenames sent over HTTP, but the strings the browser sends are different from what I expect.
In my test file I put the name ç.jpg.
What I need is the name %C3%A7.jpg.
But the browser is sending %C3%83%C2%A7.jpg.
It's not UTF8, UTF16 or UTF32.
As another example, I tested the file name €.jpg.
What I need is the name %E2%82%AC.jpg.
But I am receiving %C3%A2%E2%80%9A%C2%AC.jpg.
How can I convert these names to UTF-8?
Ok I played with this for about 30 minutes and I finally figured it out.
This is how the original string was encoded:
The string was in UTF-8
Some encoding mechanism thought it was CP1252, and based on that wrong assumption re-encoded it to UTF-8 again.
The resulting string was url-encoded.
To get back to a real UTF-8 string, this is what I did. (Note: I used PHP. I don't know what you are using, but it should be doable in other languages just the same.)
$input = '%C3%A2%E2%80%9A%C2%AC %C3%83%C2%A7';
$str1 = urldecode($input);
echo iconv('UTF-8', 'CP1252', $str1);
// output "€ ç"
That conversion is counterintuitive: we're converting to CP1252, yet we still end up with a UTF-8 string. This only works because an existing UTF-8 string was wrongly treated as CP1252, and that incorrect interpretation was then converted to UTF-8. So I'm just reversing this double encoding.
In other languages there might be a few more steps. This works in just 1 line with PHP because strings are bytes, not characters.
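As a rough illustration of the "few more steps" in a language where strings are characters, the same reversal might look like this in Python. The function name fix_double_encoded is hypothetical, and the sketch assumes the wrong intermediate encoding really was CP1252, as diagnosed above.

```python
from urllib.parse import unquote_to_bytes


def fix_double_encoded(percent_encoded: str) -> str:
    """Hypothetical helper: undo UTF-8 -> (misread as CP1252) -> UTF-8."""
    raw = unquote_to_bytes(percent_encoded)      # step 1: URL-decode to raw bytes
    mojibake = raw.decode("utf-8")               # step 2: the wrongly re-encoded text
    original_bytes = mojibake.encode("cp1252")   # step 3: back to the original bytes
    return original_bytes.decode("utf-8")        # step 4: read them as UTF-8


print(fix_double_encoded("%C3%A2%E2%80%9A%C2%AC %C3%83%C2%A7"))
# € ç
```

Python's cp1252 codec leaves a few byte values undefined, so some inputs may raise an encoding error here and need extra handling.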
I read a csv file for input in my jmeter test plan. I name the first variable in the row query.
I need it to encode spaces as %20, not +. Using the __urlencode() function, as in ${__urlencode(${query})}, encodes the spaces as +, the same way selecting the encode option on the parameter does.
I don't think this is something you really want, as encoding a URL is not only about spaces.
You should use encodeURIComponent() function (or its equivalent). The way of calling it in JMeter via __javaScript function will look like:
${__javaScript(encodeURIComponent("${query}"),)}
If you just need to replace spaces with %20, you can do it with the __groovy() function like:
${__groovy(vars.get('query').replaceAll(' '\, '%20'),)}
See Apache JMeter Functions - An Introduction article for more information on JMeter Functions concept.
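Outside JMeter, the difference between the two styles is easy to see in a short Python sketch. The parameter name and value below are made up for illustration; the point is only that form-style encoding turns spaces into + while plain percent-encoding turns them into %20, and both also escape characters such as &.

```python
from urllib.parse import quote, urlencode

# Hypothetical parameter, standing in for the value read from the CSV file.
params = {"query": "hello world & more"}

# Default form-style encoding: spaces become "+".
form_style = urlencode(params)

# Percent-encoding for every reserved character: spaces become "%20".
percent_style = urlencode(params, quote_via=quote)

print(form_style)     # query=hello+world+%26+more
print(percent_style)  # query=hello%20world%20%26%20more
```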
I use http://www.regexper.com a lot to view a pictorial representation of regular expressions. Ideally, I would like a way to:
send a regular expression to the site
open the site with that expression displayed
For example, let's use the regex "\\s*foo[A-Z]\\d{2,3}". I'd go to the site and paste \s*foo[A-Z]\d{2,3} (note the removal of the double backslashes), and the site returns the corresponding diagram.
I'd like to do this process from within R. Creating a wrapper function like view_regex("\\s*foo[A-Z]\\d{2,3}") and the page (http://www.regexper.com/#%5Cs*foo%5BA-Z%5D%5Cd%7B2%2C3%7D) with the visual diagram would be opened with the default browser.
I think RCurl may be appropriate, but this is new territory for me. I also see the double backslash as a problem, because http://www.regexper.com expects single backslashes and R needs double ones. I can get R to print a single backslash to the console using cat as follows, so this may be how to approach it.
x <- "\\s*foo[A-Z]\\d{2,3}"
cat(x)
\s*foo[A-Z]\d{2,3}
Try something like this:
Query <- function(searchPattern, browse = TRUE) {
  finalURL <- paste0("http://www.regexper.com/#",
                     URLencode(searchPattern))
  if (isTRUE(browse)) browseURL(finalURL)
  else finalURL
}
x <- "\\s*foo[A-Z]\\d{2,3}"
Query(x) ## Will open in the browser
Query(x, FALSE) ## Will return the URL expected
# [1] "http://www.regexper.com/#%5cs*foo[A-Z]%5cd%7b2,3%7d"
The above function simply pastes together the web URL prefix ("http://www.regexper.com/#") and the encoded form of the search pattern you want to query.
After that, there are two options:
Open the result in the browser
Just return the full encoded URL
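For comparison, the same two-option idea can be sketched in Python using only the standard library. The function name regexper_url is hypothetical, and keeping * unescaped is an assumption made so the result matches the URL shown in the question.

```python
import webbrowser
from urllib.parse import quote


def regexper_url(pattern: str, browse: bool = False) -> str:
    """Hypothetical helper: build the regexper URL and optionally open it."""
    # Percent-encode the pattern; "*" is kept literal to match the question's URL.
    url = "http://www.regexper.com/#" + quote(pattern, safe="*")
    if browse:
        webbrowser.open(url)  # opens the default browser
    return url


# A raw string avoids doubling the backslashes in the source code.
print(regexper_url(r"\s*foo[A-Z]\d{2,3}"))
# http://www.regexper.com/#%5Cs*foo%5BA-Z%5D%5Cd%7B2%2C3%7D
```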
Plone is showing the special chars from my mother language (Brazilian Portuguese) in its pages. However, when I use a spt page I created it shows escape sequences, e.g.:
Educa\xc3\xa7\xc3\xa3o
instead of
Educação
(by the way, it means Education). I'm creating a python function to replace the escape sequences with the utf chars, but I have a feeling that I'm slaving away without need.
Are you interpolating catalog search results? Those are, by necessity (the catalog cannot handle unicode) UTF-8 encoded.
Just use the .decode method on strings to turn them into unicode again:
value = value.decode('utf8')
A better way is to use the safe_unicode function: https://github.com/plone/Products.CMFPlone/blob/master/Products/CMFPlone/utils.py#L458
from Products.CMFPlone.utils import safe_unicode
value = safe_unicode(value)
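The snippets above are written for Python 2, where str holds bytes. As a rough Python 3 sketch of the same idea, the helper below decodes UTF-8 bytes and passes text through unchanged. The name to_text is made up for illustration and is not Plone's safe_unicode.

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return text, decoding bytes only when needed."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


print(to_text(b"Educa\xc3\xa7\xc3\xa3o"))  # Educação
print(to_text("Educação"))                 # Educação (already text, unchanged)
```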
What's the difference between HttpServerUtility.UrlPathEncode and HttpServerUtility.UrlEncode? And when should I choose one over the other?
UrlEncode is useful for query string names and values (so to the left or, especially, the right of each =).
In this url, foo, fooval, bar, and barval should EACH be UrlEncode'd separately:
http://www.example.com/whatever?foo=fooval&bar=barval
UrlEncode encodes everything, such as ?, &, =, and /, accented or other non-ASCII characters, etc., into %-style encoding, except space, which it encodes as a +. This is form-style encoding, and is best for something you intend to put in the querystring (or maybe between two slashes in a url) as a parameter without it getting all jiggy with the url's control characters (like &). Otherwise an unfortunately placed & or = in a user's form input or db value could break things.
EDIT: Uri.EscapeDataString is a very close match to UrlEncode, and may be preferable, though I don't know the exact differences.
UrlPathEncode is useful for the rest of the URL; it affects everything to the left of the ?.
In this url, the entire url (from http to barval) should be run through UrlPathEncode.
http://www.example.com/whatever?foo=fooval&bar=barval
UrlPathEncode does NOT encode ?, &, =, or /. It DOES, however, like UrlEncode, encode accented/non-ASCII characters with % notation, and space also becomes %20. This is useful to make sure the url is valid, since spaces and accented characters are not. It won't touch your querystring (everything to the right of ?), so you have to encode that with UrlEncode, above.
Update: as of 4.5, per MSDN reference, Microsoft recommends to only use UrlEncode. Also, the information previously listed in MSDN does not fully describe behavior of the two methods - see comments.
The difference is all in the space escaping: UrlEncode escapes spaces into a + sign, while UrlPathEncode escapes them into %20. + and %20 are only equivalent in the query string portion, per W3C, so you can't escape a whole URL using the + sign, only the query string portion. Bottom line is that UrlPathEncode is always better, imho.
You can encode a URL using the UrlEncode() method or the UrlPathEncode() method. However, the methods return different results. The UrlEncode() method converts each space character to a plus character (+). The UrlPathEncode() method converts each space character into the string "%20", which represents a space in hexadecimal notation. Use the UrlPathEncode() method when you encode the path portion of a URL in order to guarantee a consistent decoded URL, regardless of which platform or browser performs the decoding.
http://msdn.microsoft.com/en-us/library/4fkewx0t.aspx
To explain it as simply as possible:
HttpUtility.UrlPathEncode("http://www.foo.com/a b/?eggs=ham&bacon=1")
becomes
http://www.foo.com/a%20b/?eggs=ham&bacon=1
and
HttpUtility.UrlEncode("http://www.foo.com/a b/?eggs=ham&bacon=1")
becomes
http%3a%2f%2fwww.foo.com%2fa+b%2f%3feggs%3dham%26bacon%3d1
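The same contrast can be reproduced with a loose Python analogue, offered only as an illustration and not as a description of the .NET methods. Here quote with a chosen set of safe characters plays the role of path-style encoding, and quote_plus plays the role of form-style encoding; the safe set is an assumption picked to leave the URL delimiters alone.

```python
from urllib.parse import quote, quote_plus

url = "http://www.foo.com/a b/?eggs=ham&bacon=1"

# Path-style: keep the URL delimiters, turn the space into %20.
path_style = quote(url, safe=":/?&=")

# Form-style: escape the delimiters too, turn the space into +.
form_style = quote_plus(url)

print(path_style)  # http://www.foo.com/a%20b/?eggs=ham&bacon=1
print(form_style)  # http%3A%2F%2Fwww.foo.com%2Fa+b%2F%3Feggs%3Dham%26bacon%3D1
```

Python emits uppercase hex digits where the .NET output above shows lowercase; both forms are equivalent in a URL.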