I have a website that has been experiencing null reference errors due to poorly coded logic around the user agent. Basically, there has been a slew of incoming requests that contain no user agent, which leads to null reference exceptions in the user agent tracking (it contained a call to Request.UserAgent.ToLower()). I am correcting this logic to avoid the error condition. Since I'm certain these requests are coming from specialized tools and not ordinary users, I'm also blocking empty user agents via URL rewrite rules.
I need to test both of these changes. However, I can't seem to find a user agent spoofer that will let me generate a simple GET request with NO USER AGENT. All of the tools I have tried allow a custom agent string, but they won't let that string be left empty, and I can't find any option to tell them to send no user agent at all.
So my question is, what tools are available, for a Windows-based system, that I can use to emulate a browser request with NO USER AGENT so that I can verify that my changes are working properly?
I believe that value is coming from the request headers. If so, just try Fiddler. Go to the Composer tab: by default it adds a User-Agent header to the request, but when you delete it in the Composer, it disappears from the request entirely.
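If you'd rather script the check than click through Fiddler, here is a minimal sketch using Python's requests library (the URL is a placeholder for your site). requests documents that merging a header value of None removes that header entirely, rather than sending it blank:

import requests

# Setting a header to None removes it from the merged request headers,
# so this GET goes out with no User-Agent header at all.
response = requests.get('http://yoursite.example/', headers={'User-Agent': None})

# If the URL rewrite rule works, expect your block response (e.g. 403)
# here instead of a 200.
print(response.status_code)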
I'm just getting started with JMeter, and I'm trying to understand more about user-defined cookies.
What purpose do they fulfill when you add them with hard-coded values? For instance, what happens if you define a cookie called A and give it the value B for a certain domain in your HTTP sampler?
Grateful for any information!
There could be several possible reasons:
Replaying a user session without having to log in again (i.e. session hijacking), so you can debug your test by running a single request rather than the whole sequence (open login page, log in, navigate somewhere, do something, etc.)
The value doesn't have to be hard-coded; it may come from correlation, or from a calculation in a JSR223 PreProcessor
Negative test scenarios (providing invalid cookie value to check for anticipated error)
You name it
Some web sites have JavaScript executed on page load that adds, removes, or updates HTTP cookies.
The set() method of the cookies API sets a cookie containing the specified cookie data. This method is equivalent to issuing an HTTP Set-Cookie header during a request to a given URL.
JMeter, however, doesn't execute JavaScript:
JMeter is not a browser, it works at protocol level. As far as web-services and remote services are concerned, JMeter looks like a browser (or rather, multiple browsers); however JMeter does not perform all the actions supported by browsers. In particular, JMeter does not execute the Javascript found in HTML pages.
So in JMeter you have to manipulate HTTP cookies manually, mimicking what your site's JavaScript would do; a rough sketch of the protocol-level equivalent follows.
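As an illustration of what "protocol level" means here (a Python sketch, not JMeter itself; the cookie name and value are the A/B from the question and the URL is made up), a hard-coded cookie is nothing more than a header the client attaches itself:

import requests

# Hard-code the cookie that the site's JavaScript would normally create
# on page load; on the wire it is just a Cookie header.
response = requests.get('https://example.com/', cookies={'A': 'B'})
print(response.status_code)

JMeter's HTTP Cookie Manager does the same thing for every sampler in its scope.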
I would like to send a requests.get to this website:
requests.get('https://rent.591.com.tw')
and I always get
<Response [404]>
I know this is a common problem and have tried different approaches, but I still get the same result, while all other websites work fine. Any suggestions?
Webservers are black boxes. They are permitted to return any valid HTTP response, based on your request, the time of day, the phase of the moon, or any other criteria they pick. If another HTTP client gets a different response, consistently, try to figure out what the differences are in the request that Python sends and the request the other client sends.
That means you need to:
Record all aspects of the working request
Record all aspects of the failing request
Try out what changes you can make to make the failing request more like the working request, and minimise those changes.
I usually point my requests to an http://httpbin.org endpoint, have it record the request, and then experiment.
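For example (a sketch; the /headers endpoint simply echoes back the request headers it received as JSON):

import requests

# Compare what requests sends against what a browser sends by letting
# httpbin echo the received headers back.
r = requests.get('http://httpbin.org/headers')
print(r.json()['headers'])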
For requests, there are several headers that are set automatically, and many of these you would not normally expect to have to change:
Host: this must be set to the hostname you are contacting, so that the server can properly serve multiple sites from the same address. requests sets this one for you.
Content-Length and Content-Type, for POST requests, are usually set from the arguments you pass to requests. If these don't match, alter the arguments you pass in to requests (but watch out with multipart/* requests, which use a generated boundary recorded in the Content-Type header; leave generating that to requests).
Connection: leave this to the client to manage
Cookies: these are often set on an initial GET request, or after first logging into the site. Make sure you capture cookies with a requests.Session() object and that you are logged in (supplying credentials the same way the browser did); see the sketch after this list.
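A minimal sketch of that last point (the login URL and form field names are hypothetical):

import requests

# A Session persists cookies across requests, the way a browser does.
session = requests.Session()

# Log in first so the server sets its session cookies.
session.post('https://example.com/login',
             data={'username': 'me', 'password': 'secret'})

# Subsequent requests automatically carry those cookies.
response = session.get('https://example.com/protected')
print(response.status_code)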
Everything else is fair game, but if requests has set a default value, then more often than not those defaults are not the issue. That said, I usually start with the User-Agent header and work my way up from there.
In this case the site is filtering on the user agent; it looks like they are blacklisting Python. Setting it to almost any other value already works:
>>> requests.get('https://rent.591.com.tw', headers={'User-Agent': 'Custom'})
<Response [200]>
Next, you need to take into account that requests is not a browser. requests is only an HTTP client; a browser does much, much more. A browser parses HTML for additional resources such as images, fonts, styling and scripts, loads those additional resources too, and executes scripts. Scripts can then alter what the browser displays and load additional resources. If your requests results don't match what you see in the browser, but the initial request the browser makes matches, then you'll need to figure out what other resources the browser has loaded and make additional requests with requests as needed. If all else fails, use a project like requests-html, which lets you run a URL through an actual, headless Chromium browser.
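A minimal sketch of the requests-html route (assuming the package is installed; render() fetches a headless Chromium on first use):

from requests_html import HTMLSession

session = HTMLSession()
r = session.get('https://rent.591.com.tw')

# Execute the page's JavaScript in headless Chromium, then expose the
# resulting DOM.
r.html.render()
print(r.html.html[:200])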
The site you are trying to contact makes an additional AJAX request to https://rent.591.com.tw/home/search/rsList?is_new_list=1&type=1&kind=0&searchtype=1&region=1, take that into account if you are trying to scrape data from this site.
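A sketch of querying that endpoint directly (assuming it returns JSON and, like the main page, needs only a non-Python user agent; the real site may require additional headers):

import requests

# The listing data comes from this JSON endpoint, not from the HTML page.
url = 'https://rent.591.com.tw/home/search/rsList'
params = {'is_new_list': 1, 'type': 1, 'kind': 0, 'searchtype': 1, 'region': 1}
r = requests.get(url, params=params, headers={'User-Agent': 'Custom'})
print(r.status_code)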
Next, well-built sites will use security best-practices such as CSRF tokens, which require you to make requests in the right order (e.g. a GET request to retrieve a form before a POST to the handler) and handle cookies or otherwise extract the extra information a server expects to be passed from one request to another.
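In requests terms that ordering looks roughly like this (a sketch; the URLs, field names, and the token-extraction helper are all hypothetical and site-specific):

import requests

session = requests.Session()

# GET the form first; the server sets cookies and embeds a CSRF token.
page = session.get('https://example.com/form')

# Hypothetical helper: a real script would parse the token out of
# page.text, e.g. from a hidden <input> element.
token = extract_csrf_token(page.text)

# POST back with the token plus the cookies the GET established.
response = session.post('https://example.com/form',
                        data={'csrf_token': token, 'field': 'value'})
print(response.status_code)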
Last but not least, if a site is blocking scripts from making requests, it is probably either because they are trying to enforce terms of service that prohibit scraping, or because they have an API they'd rather have you use. Check for either, and take into consideration that you might be blocked more effectively if you continue to scrape the site anyway.
One thing to note: I was using requests.get() to do some web scraping on links I was reading from a file. What I didn't realise was that the links had a newline character (\n) when I read each line from the file.
If you're getting multiple links from a file instead of from a Python data type like a string, make sure to strip any \r or \n characters before you call requests.get("your link"). In my case, I used:
with open("filepath", 'r') as file:   # 'r' to read; 'w' would truncate the file
    links = file.read().splitlines()  # splitlines() drops the trailing \n / \r\n
for link in links:
    response = requests.get(link)
In my case this was due to the fact that the website address had recently changed, and I was given the old website address. At least this changed the status code from 404 to 500, which, I think, is progress :)
So I have a webpage ("http://data.terapeak.com/verify/"), and I don't see any query-string parameters (& tags) in the URL, so I am unsure how to post data to it. I need to do this via HTTPRequest rather than a browser control. I am creating a double-threaded batch searching program. I have already successfully made this using a single browser control, but that won't allow for multi-threading, at least with my current knowledge, because even when creating a new frmBrw that already exists it requires me to set the thread apartment to single. If I set it to single, I am unable to have it send the data to the Excel sheet; I need both threads to access it. I hope this is clear... The basic question is: how can I log into this form via HTTP request?
This isn't going to be easy to answer without further details; however, I suspect you'll need to provide the variables via an HTTP POST request.
Can you successfully log in to this page in your browser? If so, run a proxy tool such as Fiddler and inspect the HTTP request it makes to the server. You should see the form variables being passed over. You then need to mimic this in code.
How to: Send Data Using the WebRequest Class
Hope this gets you started
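Whatever language you end up using, the pattern is the same: replay exactly what Fiddler captured. As a language-neutral sketch (shown in Python here; the form field names are hypothetical and must come from the Fiddler capture):

import requests

# Replay the login POST that Fiddler showed: same URL, same form fields.
session = requests.Session()   # keeps any cookies the login sets
response = session.post('http://data.terapeak.com/verify/',
                        data={'username': 'me', 'password': 'secret'})
print(response.status_code)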
Philosophically, I am accustomed to always using GET for HTTP requests that do not alter state, and POST for requests that do. However, lately I have run into some difficulties with this that have caused me to make exceptions. I was curious if there is any non-philosophical downside to using the wrong HTTP verbs, such as security concerns like cross-site attacks.
Exception #1
I wanted to trigger a download of a requested list of files dynamically packaged into an archive. However, the list of files could grow so large that, when encoded as querystring parameters in the URL, they exceeded the url length limit in Internet Explorer. To work around this, I ended up triggering the download with a POST.
Exception #2
There is a button that is always displayed, regardless of whether you are logged in or not, but it can only alter state if you are logged in. If you press it when you are not logged in, you are taken to the login page with a querystring parameter indicating the place you were intending to go next. When you log in, it redirects you there to complete your action. However, the redirect can only generate a GET, not a POST. So we have allowed GETs to alter state in this situation.
Are there any exploits or downsides to these exceptions? Do these allow any cross-site request forgery scenarios that cannot be prevented by checking the referer header?
Answer to question in subject: Yes
Exception #1: A GET request can technically have a body, so you don't have to put everything in the URL (though the semantics of a GET body are undefined, and many servers and proxies ignore or reject it).
Exception #2: Alter the form to use GET when not logged in and POST if logged in.
Using the Referer header is not recommended: there are all sorts of ways around it, and some corporate software strips it for privacy reasons.
I highly recommend a token-based approach to CSRF mitigation.
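A minimal sketch of the synchronizer-token idea (framework-agnostic Python; the session object and function names are illustrative, not from any particular library):

import secrets

def issue_csrf_token(session):
    # Store a random token in the server-side session and embed the same
    # value in the rendered form.
    token = secrets.token_urlsafe(32)
    session['csrf_token'] = token
    return token

def verify_csrf_token(session, submitted_token):
    # Constant-time comparison; reject state-changing requests whose
    # token does not match the one tied to this session.
    expected = session.get('csrf_token', '')
    return secrets.compare_digest(expected, submitted_token or '')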
I've recently enabled Digest Authentication on an intranet website/application I am creating for my company in ASP.NET.
The reason I have done so is because Windows Authentication seemed to only work for some users, and not for others. I could not figure out why nor do I know enough about IIS to try and trace the issue. After some trial and error, I found that digest authentication seemed to give me the behaviour that I wanted. That is: allow only users with a valid account on the domain to log in to the website with their credentials.
The problem now, is that Firefox (3+) seems to ask for the user to authenticate on every HTTP request sent to the server. This does not appear to occur in Internet Explorer (6+) or Chrome.
I've tried searching for solutions but I always arrive at dead ends. I'll find a discussion about the issue, and every posted solution leads to a dead link... or it's on Experts Exchange and I don't have access to view the solution.
The issue appears to be related (from what I've read) to the way the different browsers send their authentication headers vs how IIS interprets them. I'm not sure what I can do to change this though? One of the solutions I had found mentioned writing an ISAPI filter to fix this, but of course the link to the finished filter was broken and I have no idea how to go about making one myself.
I've tried messing with the NTLM and other auth related strings in about:config to try and force Firefox to trust my server but that doesn't seem to work either.
From a few other sources I've read, it appears that everything should work if I switch back to Windows Authentication, but then I'm back at square one where the authentication would work only for some users and not others.
A solution for either problem would work for me, but I have very little information for the Windows Authentication issue. If someone could guide me through tracing the problem I'd gladly post more information for it as well.
Here are the URLs I've found discussing what seems like the same problem. (Sorry I couldn't make them all links, it wouldn't let me post otherwise)
support.mozilla.com/tiki-view_forum_thread.php?locale=pt-BR&forumId=1&comments_parentId=346851
www.experts-exchange.com/Software/Internet_Email/Web_Browsers/Mozilla/Q_24427378.html
channel9.msdn.com/forums/TechOff/168006-Twin-bugs-in-IIS-IE-unfair-competitive-advantage-EDIT-SOLVED/
www.derkeiler.com/Newsgroups/microsoft.public.inetserver.iis.security/2006-03/msg00141.html
This is a known bug in Firefox. See "Advanced digest authentication works from Internet Explorer however we receive multiple authentication prompts on each GET request from Firefox"
IE 6 had the same bug. A potential workaround would be to re-enable "old" Digest in IIS 6:
http://www.microsoft.com/technet/prodtechnol/WindowsServer2003/Library/IIS/1d6e22ac-0215-4d12-81e9-c9262c91b797.mspx?mfr=true
Currently, if the server sends an opaque directive, the IE client will return this directive value as specified in the RFC. Unfortunately, for follow-on requests from the client where the nonce count is incremented (count 2 and beyond), the opaque directive value is not sent. This then fails authentication on the server and a 401 Unauthorized is returned. The IE client then requests the username and password for the new challenge and the file is retrieved.
This requires an additional round trip, and the user is prompted for credentials each time.
The RFC states that the opaque must always be sent on requests from the client.
The Digest implementation that IE6 is using is not RFC compliant (http://www.ietf.org/rfc/rfc2617.txt).
3.2.2 The Authorization Request Header
The values of the opaque and algorithm fields must be those supplied
in the WWW-Authenticate response header for the entity being
requested.
3.3 Digest Operation
A client should remember the username, password, nonce, nonce count and
opaque values associated with an authentication session to use to
construct the Authorization header in future requests within that
protection space.
Because the client is required to return the value of the opaque
directive given to it by the server for the duration of a session,
the opaque data may be used to transport authentication session state
information.
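One way to observe the behaviour described above is to drive the server with a scripted client and inspect what actually goes over the wire. A sketch using Python's requests, whose HTTPDigestAuth implements client-side Digest (the URL and credentials are placeholders): make two requests on one session and compare the Authorization headers, checking whether opaque is still present as the nonce count (nc) increments:

import requests
from requests.auth import HTTPDigestAuth

session = requests.Session()
session.auth = HTTPDigestAuth('user', 'password')   # placeholder credentials

for _ in range(2):
    r = session.get('http://intranet.example/protected')
    # The Authorization header the client actually sent, including nc
    # (nonce count) and any opaque directive.
    print(r.request.headers.get('Authorization'))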
-------- Edit addition -----
Windows Authentication seemed to only work for some users, and not for others.
How did it fail? Did you enable impersonation?