Random .png string added to URL requests

Our application server is receiving a few dozen requests per day with a malformed URL that has a random .png image reference appended to the end of the GET request.
For example, our URL (with parameters) is supposed to end with this:
&quiz_psetGuid=PSETC0A80101000000234e7960020000
And instead it ends with this when the server receives it:
&quiz_psetGuid=PSETC0A80101000000234e7960020000/cfyxqvn.png
The .png reference is not ours and we didn't put it there. Needless to say it makes it impossible to read the URL parameter.
The problem occurs with multiple user agents.
Any idea where this is coming from?

We've been seeing these requests recently, and as they also come from logged-in members, it is quite clearly some add-on (likely malware) installed on various machines. In our case there's no / added before the random string, so a valid URL such as
/hu/stamps/countries
becomes
/hu/stamps/countriesudkatuuajqi.png
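If you need to filter these requests out server-side, a detection along these lines may help (a rough sketch in Python; it assumes no legitimate URL on your site ends in a run of lowercase letters plus .png):

import re

# Signature of the junk suffix described above: lowercase letters
# plus ".png" glued onto the end of the path or query string.
JUNK_PNG = re.compile(r'[a-z]+\.png$')

def looks_like_junk(url):
    # Without a "/" separator the original value cannot be recovered,
    # so the practical option is to detect and discard the request.
    return bool(JUNK_PNG.search(url))

print(looks_like_junk('/hu/stamps/countriesudkatuuajqi.png'))  # True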

head request returns different content-type [duplicate]

I would like to try sending requests.get to this website:
requests.get('https://rent.591.com.tw')
and I always get
<Response [404]>
I know this is a common problem and have tried different ways, but still failed.
All other websites are OK.
Any suggestions?
Webservers are black boxes. They are permitted to return any valid HTTP response, based on your request, the time of day, the phase of the moon, or any other criteria they pick. If another HTTP client gets a different response, consistently, try to figure out what the differences are in the request that Python sends and the request the other client sends.
That means you need to:
Record all aspects of the working request
Record all aspects of the failing request
Try out what changes you can make to make the failing request more like the working request, and minimise those changes.
I usually point my requests to an http://httpbin.org endpoint, have it record the request, and then experiment.
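For instance, a minimal experiment looks like this (httpbin.org/headers echoes back the headers it received; the header values below are placeholders to swap for the working client's):

import requests

# See exactly what requests sends by default.
print(requests.get('http://httpbin.org/headers').json())

# Replay with headers copied from the working client, then trim them
# down one by one until the difference that matters is found.
headers = {'User-Agent': 'Mozilla/5.0', 'Accept-Language': 'en-US,en;q=0.9'}
print(requests.get('http://httpbin.org/headers', headers=headers).json())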
For requests, there are several headers that are set automatically, and many of these you would not normally expect to have to change:
Host: this must be set to the hostname you are contacting, so that the server can properly host multiple sites at the same address. requests sets this one.
Content-Length and Content-Type, for POST requests, are usually set from the arguments you pass to requests. If these don't match, alter the arguments you pass in to requests (but watch out with multipart/* requests, which use a generated boundary recorded in the Content-Type header; leave generating that to requests).
Connection: leave this to the client to manage
Cookies: these are often set on an initial GET request, or after first logging into the site. Make sure you capture cookies with a requests.Session() object and that you are logged in (credentials supplied the same way the browser did); see the sketch after this list.
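A minimal sketch of that session handling (the login URL and form field names below are invented for illustration, not taken from any site discussed here):

import requests

session = requests.Session()  # persists cookies across requests

# Hypothetical login: adapt the URL and field names to the real form.
session.post('https://example.com/login',
             data={'username': 'me', 'password': 'secret'})

# Later requests in the same session automatically carry the cookies
# the server set during login.
response = session.get('https://example.com/members-only')
print(response.status_code)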
Everything else is fair game, but if requests has set a default value, more often than not those defaults are not the issue. That said, I usually start with the User-Agent header and work my way up from there.
In this case, the site is filtering on the user agent. It looks like they are blacklisting Python; setting the User-Agent header to almost any other value already works:
>>> requests.get('https://rent.591.com.tw', headers={'User-Agent': 'Custom'})
<Response [200]>
Next, you need to take into account that requests is not a browser. requests is only an HTTP client; a browser does much, much more. A browser parses HTML for additional resources such as images, fonts, styling and scripts, loads those additional resources too, and executes scripts. Scripts can then alter what the browser displays and load additional resources. If your requests results don't match what you see in the browser, but the initial request the browser makes matches, then you'll need to figure out what other resources the browser has loaded and make additional requests with requests as needed. If all else fails, use a project like requests-html, which lets you run a URL through an actual, headless Chromium browser.
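As a sketch of that requests-html fallback (the first render() call downloads a headless Chromium; the title lookup is only an illustration):

from requests_html import HTMLSession

session = HTMLSession()
response = session.get('https://rent.591.com.tw')

# render() executes the page's JavaScript in headless Chromium, so the
# DOM afterwards reflects what a real browser would display.
response.html.render()
print(response.html.find('title', first=True).text)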
The site you are trying to contact makes an additional AJAX request to https://rent.591.com.tw/home/search/rsList?is_new_list=1&type=1&kind=0&searchtype=1&region=1; take that into account if you are trying to scrape data from this site.
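If that endpoint is the actual target, it may be simpler to request it directly, along these lines (whether a bare GET with the User-Agent fix from above is enough, or whether the endpoint also checks cookies or tokens, is something to verify):

import requests

# The AJAX search endpoint quoted above, requested directly.
url = ('https://rent.591.com.tw/home/search/rsList'
       '?is_new_list=1&type=1&kind=0&searchtype=1&region=1')
response = requests.get(url, headers={'User-Agent': 'Custom'})
print(response.status_code)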
Next, well-built sites will use security best practices such as CSRF tokens, which require you to make requests in the right order (e.g. a GET request to retrieve a form before a POST to the handler) and to handle cookies, or otherwise extract the extra information a server expects to be passed from one request to another.
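A typical token dance, sketched under the assumption of a login form with a hidden csrf_token field (the URL and field names are invented):

import requests
from bs4 import BeautifulSoup

session = requests.Session()

# Step 1: GET the form, so the server sets its cookies and issues a token.
page = session.get('https://example.com/login')
token = BeautifulSoup(page.text, 'html.parser').find(
    'input', {'name': 'csrf_token'})['value']

# Step 2: POST the form back with the token, within the same session.
session.post('https://example.com/login',
             data={'username': 'me', 'password': 'secret',
                   'csrf_token': token})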
Last but not least, if a site is blocking scripts from making requests, they are probably either trying to enforce terms of service that prohibit scraping, or they have an API they'd rather have you use. Check for either, and take into consideration that you might be blocked more effectively if you continue to scrape the site anyway.
One thing to note: I was using requests.get() to do some web scraping off links I was reading from a file. What I didn't realise was that the links had a newline character (\n) when I read each line from the file.
If you're getting multiple links from a file instead of a Python data type like a string, make sure to strip any \r or \n characters before you call requests.get("your link"). In my case, I used
with open("filepath", 'r') as file:  # 'r' to read; 'w' would truncate the file
    links = file.read().splitlines()  # splitlines() also drops the trailing \n
for link in links:
    response = requests.get(link)
In my case this was due to the fact that the website address had recently changed, and I had been given the old website address. At least this changed the status code from 404 to 500, which, I think, is progress :)

Server Log Showing Many 'Unhandled Exceptions' From URL with &hash=

I've noticed a large increase in the number of events logged daily that have &hash= in the URL. The requested URL is the same every time but the number that follows the &hash= is always different.
I have no idea what the purpose of the &hash= parameter is, so I'm unsure if these attempts are malicious or something else. Can anyone provide insight as to what is being attempted with the requested URL? I have copied in one from a recent log below.
https://www.movinglabor.com:443/moving-services/moving-labor/move-furniture/&du=https:/www.movinglabor.com/moving-services/moving-labor/move.../&hash=AFD3C9508211E3F234B4A265B3EF7E3F
I have been seeing the same thing in IIS on Windows Server 2012 R2. They were mostly HEAD requests. I did see a few other, more obvious attack attempts from the same IP address, so I'm assuming the du/hash thing is also intended to be malicious.
Here's an example of another attempt which also tries some url encoding to bypass filters:
part_id=D8DD67F9S8DF79S8D7F9D9D%5C&du=https://www.examplesite.com/page..asp%5C?part...%5C&hash=DA54E35B7D77F7137E|-|0|404_Not_Found
So you may want to look through your IIS logs to see if they are trying other things.
In the end I simply created a blocking rule for it using the URL Rewrite extension for IIS.

ASP.Net Relative Redirects and Resource Paths

We are working on the conversion of an ASP site to ASP.Net and are running into problems with redirects and resource locations. Our issues are coming from a peculiarity of our set-up. Our site can be accessed in two ways:
Directly by URL: http://www.mysite.com - in this case everything works fine
Via a proxy server with a URL like: http://www.proxy.com/mysite_proxy/proxy/
In #2, "mysite_proxy" is a mapping on proxy.com that directs the request behind the scenes to www.mysite.com, and "proxy" is a virtual sub-website that just redirects the request to the root of www.mysite.com. It is essentially meant to give us a convenient way of knowing whether a request is hitting the site through the proxy or not.
We are running into two problems with this setup:
Using Response.Redirect, either with "~" or a plain relative path (Default.aspx), generates a 302 response with a location of "/proxy/rest_of_the_path.aspx". This causes the browser to request http://www.proxy.com/proxy/rest_of_the_path.aspx, which isn't anything and doesn't even hit our server, so we couldn't do an after-the-fact rewrite.
Using "~" based URLs in our pages for links, images, style-sheets, etc. creates the same kind of path: "/proxy/path_to_resources.css." We could probably solve some of these by using relative paths for all these resources though that would be a lot of work and it would do nothing to address similar resource links generated by the framework and 3rd party components.
Ideally I want to find a global fix that will make these problems transparent to the developers working on the site. I have a few ideas at this point:
Getting rid of the proxy; it is not really needed and is there for administrative rather than technical reasons. This is the easiest option technically, and the hardest to accomplish in the real world.
Hand the problem off to the group that runs the proxy and say it is their problem to fix.
Use a Response filter to modify the raw html before it is sent to the client. I know this could fix my resource links, but I am not certain about the headers (need to test it out) and there would be a performance hit to having to parse every response looking for and re-writing urls.
All of these solutions have big negatives in my mind and I was hoping someone might have another idea. So any thoughts?
Aside: there are a lot of posts up already that deal with the reverse of this issue ("I have a relative URL, how do I make it absolute?"), but I didn't come across anything that fit the bill for the other direction.
As a fix, I'd go with a small detection routine in Global.asax:Session_Start (since I imagine the proxy doesn't actually start another application instance), set a session variable with the correct path, and use it instead of '~'.
If a different application instance is used, then use Application_Start instead of Session_Start and a static global variable instead of a session variable.

Route IIS request to parent folder resource

I would like to create an ASP.NET page in my site within a particular folder, e.g. at www.xyz.com/go/mypage.aspx. I would then like any requests routed to sub-folders of /go to actually be handled by this page, i.e. a request to www.xyz.com/go/test/123 should end up being handled by /go/mypage.aspx. (Indeed, it is important that /go/test/123 does not actually have to exist - it won't.)

Within the page I would then analyse the original path (/go/test/123), because it will contain embedded meaning. In this way I can issue any number of URLs to users, but all requests will end up at mypage.aspx. The reason I am doing this is so I can issue personalised URLs that look good but always arrive at the same page, which can then deal with the request accordingly. I do not want the sub-folders to have to exist, and ideally the users would not have to specify a particular .aspx page either, but would just enter a URL that has the necessary codes within it. In essence, I would like to replace query-string parameters with 'virtual' directory paths that don't actually exist in IIS.
An example url that I would send would be www.xyz.com/go/geneva/2010/welcome/t5RT4W - I would then extract the info of geneva, 2010, welcome and t5RT4W in mypage.aspx which will receive control even though it lives at www.xyz.com/go/mypage.aspx.

ASP.NET 404 (page not found) redirection with original parameters preserved

I'm replacing an old web application with a new one, with different structure.
I cannot change the virtual directory path for the new app, as I have users who have bookmarked different links to the old app.
Let's say I have a user who has this bookmark:
http://server/webapp/oldpage.aspx?data=somedata
My new app is going to reside in the same virtual directory, replacing the old one, but it no longer has oldpage.aspx; instead it has a different layout, but it still needs the parameter from the old URL.
So, I have set to redirect 404 errors to redirectfrombookmark.aspx, where I decide how to process the request.
The problem is that the only parameter I receive is "aspxerrorpath=/webapp/oldpage.aspx", not the "data" parameter, which I need to correctly process the request.
Any idea how I can get the full "original" url in the 404 handler?
EDIT: reading the answers, it looks like I did not make the question clear enough:
The users have bookmarked many different pages (oldpage1, oldpage2, etc.) and I should handle them equally.
The parameters for each old page are almost the same, and I need only specific ones.
I want to re-use the "old" virtual directory name for the "new" application.
The search bots, etc., are not a concern, this is internal application with dynamic content, which expires very often.
The question is: can I do this without creating a bunch of empty pages in my "new" application with the old names and a Response.Redirect in their OnLoad? I.e., can this be done using the 404 mechanism, or some event handling in Global.asax, etc.?
For the purposes of SEO, you should never redirect on a 404 error. A 404 should be a dead end, with some helpful information on how to locate the page you're looking for, such as a site map.
You should be using a 301, moved permanently. This allows the search bots to update their index without losing the page rank assigned to the original page.
See: http://www.webconfs.com/how-to-redirect-a-webpage.php on how to code this type of response.
You could look into the UrlRewritingNet component.
You should also look into using some of the events in your Global.asax file to check for errors and redirect intelligently. The Application_Error event is what you want to work with. You will have the variables from the request at that point in time (on the HttpContext object) and you can have your code work there instead of in a 404 page. If you go this route, be sure you still return a correct 404 for anything other than oldpage.aspx.
I am sorry I don't have any explicit examples or information right now, hopefully this will point you in the right direction.
POST and GET parameters are only available per request. If you already know the name of the old page (OldPage.aspx), why not just add a custom redirect in it?
