Why does a percent symbol in a get request break my site? - http

I feel pretty stupid for asking this, but I'm doing a form where the user enters some input and sometimes the input is a percent symbol, say 5%. When this gets passed along as part of a GET request, like this:
http://kburke.org/project/company_x/?id=4&var1=1&ops=23255&cashflow=25000&growth=5%25&pv=100000&roe=20&profitmargin=30&roe=80&turnover=2
I get a 404 Page Not Found error. When I remove the query string pair
&growth=5%25
the page loads fine. Can someone help explain what the problem is?
Edit: I tried removing all of the Javascript from the page and the server still craps out. I also just tried running it in MAMP as
http://localhost:8888/project/company_x/?id=4&var1=1&ops=23255&cashflow=25000&growth=5%25&pv=100000&roe=20&profitmargin=30&roe=80&turnover=2
and it worked fine. I'm wondering if it's a problem with my own server. When I open Firebug to the console and run the page, I see an error very briefly and then the 404 page loads - is there a way I can pause the redirect so I can read the error message?

Check out URL ENCODING. The "%" character in a url means something special.
You encode the space character ' ' as %20 in a url.
You encode the percent character '%' as %25 in a url.
So after your url gets to the script, your argument 'growth' will equal "5%".
I tried messing around with your url and it appears that your script is crashing when it tries to parse the growth argument, and your web site is hiding that crash from you by sending you to the 404 page. I'd post your script code if you need more help.
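To make the encoding rule above concrete, here is a minimal Python sketch using the standard library's urllib.parse. The parse_growth helper is hypothetical (the asker's script was never posted); it only illustrates one way a script could accept a value such as "5%" without crashing.

```python
from urllib.parse import parse_qs, quote, unquote

# A literal percent sign must be sent as %25; the server decodes it back.
encoded = quote("5%")        # "5%25"
decoded = unquote(encoded)   # "5%"

# Parsing a query string decodes each value, so growth arrives as "5%".
params = parse_qs("cashflow=25000&growth=5%25&pv=100000")
growth_raw = params["growth"][0]


def parse_growth(raw):
    """Hypothetical helper: turn '5%' or '5' into the fraction 0.05."""
    return float(raw.strip().rstrip("%")) / 100


growth = parse_growth(growth_raw)
```

If the real script converts growth straight to a number without stripping the trailing "%", that conversion is a plausible place for the crash described above, though only the script itself can confirm it.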

Related

Scrapy empty xpath response

I'm trying to get the url of images from this url: https://www.iproperty.com.my/sale/all-residential/ .
Using Chrome extension Xpath Helper, I've identified the Xpath and used Scrapy Shell to get a response:
fetch("https://www.iproperty.com.my/sale/all-residential/")
response.xpath("//div[@class='cFwUMy']/div[@class='fUtkLG']/div[@class='slick-initialized slick-slider']/div[@class='slick-list']/div[@class='slick-track']/div[@class='slick-slide slick-active'][1]/div[@class='img-wrapper']/a/div[@class='cHKlDH']/img[@class='lazyautosizes lazyloaded']/@src")
However, it doesn't return anything.
I've also tried:
response.xpath("//div[@class='img-wrapper']/a/div[@class='cHKlDH']")
Still not working.
How do I get the url of the image from the page? I've been successful with getting the title, location, and price, but am stuck with getting the images.
EDIT1:
So weird, I tried
response.xpath("div[@class='img-wrapper']/a")
It returns the links as expected, but
response.xpath("div[@class='img-wrapper']/a/div[@class='cHKlDH']")
and
response.xpath("//div[@class='cHKlDH']")
simply refuses to return anything.
Scrapy only downloads the initial page response.
It does not execute any JavaScript the way a normal browser does.
The trick is to disable JavaScript in your browser and then check whether your desired element still exists.
On the website mentioned above, the image links are included as JSON in the initial page response, and JavaScript renders them into the page after that.
In Scrapy, you can do
re.findall(r"window\.__INITIAL_STATE__ =(.*)window\.__RENDER_APP_ERROR__", response.text, flags=re.DOTALL)
It will return you this JSON code, https://jsoneditoronline.org/?id=bbef330441b24957aeaceedcea621ba7
The listings > items key holds all the data you need, including prices and images.
Here is the complete working Python code:
https://repl.it/@UmairAyub/AdmirableHilariousSpellchecker
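As a rough, self-contained sketch of that approach (not the code behind the link above), the snippet below pulls the embedded JSON out of a page and reads the listings > items key. The sample HTML, the trailing-semicolon layout, and the "cover" field name are assumptions made for illustration; the real page may be structured differently.

```python
import json
import re

# Assumed stand-in for the page source; in Scrapy this would be response.text.
SAMPLE_HTML = """<script>
window.__INITIAL_STATE__ = {"listings": {"items": [{"title": "Condo", "cover": "https://example.com/a.jpg"}]}};
window.__RENDER_APP_ERROR__ = false;
</script>"""

STATE_PATTERN = re.compile(
    r"window\.__INITIAL_STATE__\s*=\s*(\{.*?\})\s*;\s*window\.__RENDER_APP_ERROR__",
    flags=re.DOTALL,
)


def extract_initial_state(html):
    """Return the embedded state as a dict, or None if it is not found."""
    match = STATE_PATTERN.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))


state = extract_initial_state(SAMPLE_HTML)
# "cover" is a hypothetical field name; inspect the real JSON for the actual one.
image_urls = [item["cover"] for item in state["listings"]["items"]]
```

Parsing the captured text with json.loads gives a normal dictionary, which is easier to navigate than the raw string returned by re.findall.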

ASP.NET Core URL Parameter Decoding

I have an ASP.NET Core web API and an issue with encoded URL's in query parameters.
I have a URL parameter like 'path/to/IDENTIFIER'. The IDENTIFIER part is something like 'HÄÄ/20/19'. This is URL-encoded in the frontend to build a link URL. The result is a link like
domain.com/new/stuff/path/to/H%C3%84%C3%84%2F20%2F19
Now, at some point, user gets redirected to a controller where this URL is used in a query parameter like:
param=%2Fpath%2Fto%2FH%C3%84%C3%84%2F20%2F19
I'm using request query to get the param
var param = HttpContext.Request.Query["param"].ToString();
After this the value of param is
%2Fpath%2Fto%2FHÄÄ%2F20%2F19
So the LATIN CAPITAL LETTER A WITH DIAERESIS characters are automatically decoded, while the other encoded characters are not.
The actual problem comes when I'm redirecting the user to this URL. It ends up with a referer header where it causes havoc with an error message
System.InvalidOperationException: Invalid non-ASCII or control character in header: 0x00C4
I tried to just replace all the 'Ä' characters with 'A' and the problem is fixed. This is not a real fix though. I cannot encode the whole variable (see above) as it would result in double encoding for other encoded characters.
This problem only occurs with IE11 and Edge (AFAIK) and works fine with at least Chrome.
I'm not 100% sure where the actual problem is and why this is happening so does anyone have any ideas where to start looking and how to fix this without hacking with the string.replace?
EDIT
I could fix it with something like this, but I'm not seriously doing this. Seems way too hacky.
var problemPart = param.Substring(param.LastIndexOf('/') + 1, param.Length - param.LastIndexOf('/') - 1);
var fixedPart = WebUtility.UrlDecode(problemPart);
fixedPart = WebUtility.UrlEncode(fixedPart);
param = param.Replace(problemPart, fixedPart);
EDIT 2
I think the problem is that IE11 and Edge change the encoding by adding control characters to it when the URL ends up in the referer header. The fix I added to the original post doesn't actually fix the problem but just works around it. The control character that gets added to the URL is %C2%84 (so Ä becomes %C3%84%C2%84 instead of just %C3%84)
TEMPORARY WORKAROUND
I basically used the code above to work around the issue. I iterated over the parameter value and re-encoded all the invalid characters in it. This doesn't fix the root cause, but it works around the issue and the user doesn't get any errors on the screen.
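The idea of re-encoding only the invalid characters can be sketched as follows. This is an illustration in Python, not the ASP.NET code the poster used, and the encode_non_ascii_only helper is hypothetical. It percent-encodes non-ASCII characters as UTF-8 and leaves every ASCII character, including existing %XX escapes, untouched, so nothing is double-encoded.

```python
from urllib.parse import quote


def encode_non_ascii_only(value):
    """Percent-encode non-ASCII characters; keep all ASCII as it is."""
    return "".join(ch if ord(ch) < 128 else quote(ch) for ch in value)


# The half-decoded value from the question.
param = "%2Fpath%2Fto%2FHÄÄ%2F20%2F19"
fixed = encode_non_ascii_only(param)
```

Because already-encoded sequences consist only of ASCII characters, applying the helper a second time leaves the result unchanged.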

Collation urls not followed for Google Webmaster

I got lots of not-followed pages in Google Webmaster Tools. I checked them, and it is because lots of URLs are like http://www.mysite.net/2013/06/burn-notice-7%C3%9702-sub-espanol-online.html
when the correct URL should be http://www.mysite.net/2013/06/burn-notice-7x02-sub-espanol-online.html
I'm trying to post titles with an "x" in them, and I only get that weird %C3%97 when I post, for example, a new series episode with a title like: Burn Notice 7x02 Sub Español Online. When the x is between numbers, %C3%97 appears, and that makes my posts duplicate.
So I tried to fix it by changing the database collation from latin1_swedish_ci to utf8_general_ci, but the same thing still happens. I also checked my wp-config.php and it has define('DB_CHARSET', 'utf8');
Please, does anybody know a good solution to fix this situation? The database is quite big, and I suppose that if I find a solution I will need to update the old URLs.
Thank you in advance
The URL you say Google is using:
http://www.mysite.net/2013/06/burn-notice-7%C3%9702-sub-espanol-online.html
is almost the same as the URL:
http://www.mysite.net/2013/06/burn-notice-7x02-sub-espanol-online.html
as the percent-encoded characters actually represent the Unicode character 'MULTIPLICATION SIGN', i.e. it's an '×', not an 'x'. Google is just using the percent-encoded version to be safe. That means that your database is probably fine, as it is showing URLs as valid UTF-8.
The problem probably lies in how you're interpreting the requested URL and trying to match it to the database. PHP should already be decoding the percent encoded value to '×', so either:
Something is breaking the string (e.g. calling a non-multibyte safe function like strtolower() instead of mb_strtolower()).
Your PHP code is connecting to the database in a character set other than UTF-8. Please check that your my.cnf file contains 'default-character-set=utf8' in the client section.
or there's some other issue. The URL does appear valid though.
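A quick way to confirm what %C3%97 stands for is to decode it and ask for the character's Unicode name. The sketch below uses Python's standard library purely for illustration (the site itself runs PHP) and works on the slug from the two URLs above.

```python
import unicodedata
from urllib.parse import unquote

google_slug = "burn-notice-7%C3%9702-sub-espanol-online"
expected_slug = "burn-notice-7x02-sub-espanol-online"

decoded_slug = unquote(google_slug)

# Collect the Unicode names of any non-ASCII characters in the decoded slug.
non_ascii_names = [unicodedata.name(ch) for ch in decoded_slug if ord(ch) > 127]

slugs_match = decoded_slug == expected_slug
```

Here non_ascii_names contains a single entry, MULTIPLICATION SIGN, and slugs_match is False, which is why the two URLs are treated as different pages.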

ASP.NET - Strange case of inexplicable 404 error

We are having a very strange problem on one particular web server (we do not have direct access to the web server, only FTP access).
Our ASP.NET application displays a dataset into a standard GridView. One of the columns in the GridView is a basic template column, with a link redirecting to another page - passing few parameters.
One of the parameters is EmployeeName - and the following page uses that parameter to set a label.
On this particular web server (WEBSERVER1 in this example)... the resulting link generates an error 404 (page not found)
https://WWW.WEBSERVER1.COM/Customer_011B.aspx?WeekEnding=1/21/2012&GUID=n.a.&EmployeeName=Knutson-Haushalter, Kathleen&ReportToName=Mary Jo Eayrs&Assignment_Id=123772
On another web server (WEBSERVER2 in this example)... the resulting link properly opens the page.
http://WWW.WEBSERVER2.COM/Customer_011B.aspx?WeekEnding=1/21/2012&GUID=n.a.&EmployeeName=Knutson-Haushalter, Kathleen&ReportToName=Mary Jo Eayrs&Assignment_Id=123772
(unfortunately the links above are not rendered correctly)
Yes, I am aware that WEBSERVER1 is running under SSL - but am not sure why this would make a difference.
Now, we have verified that the page Customer_011B.aspx is indeed present on WEBSERVER1.
Here comes the puzzle:
If we only remove the EmployeeName parameter, the page displays correctly. All database operations are performed correctly, etc. The only "problem" is that the EmployeeName is not reported in the target label.
In other words:
This DOES NOT work and all we get is error 404
https://WWW.WEBSERVER1.COM/Customer_011B.aspx?WeekEnding=1/21/2012&GUID=n.a.&EmployeeName=Knutson-Haushalter, Kathleen&ReportToName=Mary Jo Eayrs&Assignment_Id=123772
This DOES work and we get to the page and we retrieve all the needed data.
https://WWW.WEBSERVER1.COM/Customer_011B.aspx?WeekEnding=1/21/2012&GUID=n.a.&ReportToName=Mary Jo Eayrs&Assignment_Id=123772
Just in case you are wondering, the only parameter needed by our data access layer is that Assignment_Id number.
Also, note that I enclosed the links in double quotes... so that they would render properly...
Use UrlEncode and UrlDecode when placing the parameters in your URL. I see that you use spaces, slashes and commas. Parameters containing slashes, spaces, commas and other invalid URL characters may be cut or changed by a URL filter enabled on one of the IIS servers.
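To show what the encoded link would look like, here is a small sketch that builds the query string from the parameters in the question. It is written in Python for illustration only; in the ASP.NET page the equivalent step would be done with UrlEncode on each value.

```python
from urllib.parse import parse_qs, urlencode

params = {
    "WeekEnding": "1/21/2012",
    "GUID": "n.a.",
    "EmployeeName": "Knutson-Haushalter, Kathleen",
    "ReportToName": "Mary Jo Eayrs",
    "Assignment_Id": "123772",
}

# urlencode escapes each value: "/" becomes %2F, "," becomes %2C, " " becomes +.
query = urlencode(params)
url = "https://WWW.WEBSERVER1.COM/Customer_011B.aspx?" + query

# Decoding on the receiving side restores the original values.
round_trip = {name: values[0] for name, values in parse_qs(query).items()}
```

With the values encoded this way the URL contains no raw spaces, commas or slashes in the query string, so a URL filter has less reason to reject or alter it.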

Ampersands in URLRewriter Query Strings

I have a query string parameter value that contains an ampersand. For example, a valid value for the parameter may be:
a & b
When I generate the URL that contains the parameter, I'm using System.Web.HttpUtility.UrlEncode() to make each element URL-friendly. It's (correctly) giving me a URL like:
http://example.com/foo?bar=a+%26+b
The problem is that ASP.NET's Request object is interpreting the (encoded) ampersand as a Query String parameter delimiter, and is thus splitting my value into 2 parts (the first has "bar" as the parameter name; the second has a null name).
It appears that ASP.NET is URL-decoding the URL first and then using that when parsing the query string.
What's the best way to work around this?
UPDATE: The problem hinges on URLRewriter (a third-party plugin) and not ASP.NET itself. I've changed the title to reflect this, but I'll leave the rest of the question text as-is until I find out more about the problem.
Man,
I am in the same boat as you. I have spent hours and hours trying to figure out what the problem is, and as you said it is a bug in both, since normal links that contain weird characters or UTF-8 characters are parsed fine by ASP.NET.
I think we have to switch to MVC routing.
Update: man, you won't believe it, I have found the problem and it is so strange: it is with IIS.
Try launching your page from the Visual Studio dev server and Unicode characters will be parsed just fine, but if you launch the page from IIS 7 it will give you ???? characters.
Hope somebody will shed some light here.
I would have thought that %26 and '&' mean exactly the same thing to the web server, so it's the expected behavior. UrlEncode is for encoding URLs, not encoding query strings.
... hang on ...
Try searching for abc&def in google, you'll get:
http://www.google.com.au/search?q=abc%26def
So your query string is correct, %26 is a literal ampersand. Hmm you're right, sounds like a bug. How do you go with an & instead of the %26 ?
Interesting reading:
http://www.stylusstudio.com/xsllist/200104/post11060.html
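The difference between the correct behavior and the suspected bug can be reproduced in a few lines. The sketch below uses Python's urllib.parse as a stand-in; it does not call URLRewriter or ASP.NET, and the "decode first, then split" path is only an assumption about what the rewriter might be doing.

```python
from urllib.parse import parse_qs, unquote, urlencode

# Encoding the value turns the ampersand into %26 and the spaces into +.
query = urlencode({"bar": "a & b"})

# Correct order: split on the delimiters first, then decode each piece.
correct = parse_qs(query)

# Suspected bug: decode the whole string first, then split. The decoded
# ampersand is now mistaken for a parameter delimiter and cuts the value.
decoded_too_early = unquote(query)
buggy = parse_qs(decoded_too_early, keep_blank_values=True)
```

In the correct case bar keeps its full value "a & b". In the decode-first case bar is truncated and a second, value-less parameter appears, which matches the symptom described in the question.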
Switching to UrlRewritingNet.UrlRewrite did not help, as it apparently has the same bug. I'm thinking it might have something to do with ASP.NET after all.
I think URLRewriter has a problem with nameless parameters (null name).
I had a similar problem. When I gave my nameless parameter a (dummy) name, everything worked as expected.
