JMeter: The purpose of User-defined Cookies for requests - http

I'm just getting started with JMeter, and I'm trying to understand more about user-defined cookies.
What purpose do they serve when you add them with hard-coded values, for instance if you define a cookie called A and give it the value B for a certain domain in your HTTP sampler?
Very grateful for any information!

There could be several possible reasons:
Replaying a user session without having to re-login (i.e. session hijacking), so you can debug your test by running a single request rather than the whole sequence (open login page, log in, navigate somewhere, do something, etc.)
The value doesn't have to be hard-coded; it may come from correlation or be calculated by a JSR223 Pre-Processor
Negative test scenarios (providing an invalid cookie value to check for the anticipated error)
You name it

Some web sites have JavaScript executed on page load which adds, removes, or updates HTTP cookie(s):
The set() method of the cookies API sets a cookie containing the specified cookie data. This method is equivalent to issuing an HTTP Set-Cookie header during a request to a given URL.
JMeter, however, doesn't execute JavaScript:
JMeter is not a browser, it works at protocol level. As far as web-services and remote services are concerned, JMeter looks like a browser (or rather, multiple browsers); however JMeter does not perform all the actions supported by browsers. In particular, JMeter does not execute the Javascript found in HTML pages.
So in JMeter you can manipulate HTTP cookies manually, to reproduce whatever you expect your site's JavaScript to do.
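Not from the original answer, but as an illustration of the same idea at protocol level, here is what a hard-coded cookie A=B looks like in Python's requests library (the URL is hypothetical); in JMeter the equivalent is adding a user-defined cookie to the HTTP Cookie Manager:

import requests

# Hypothetical target; substitute the domain your HTTP sampler points at.
url = "https://example.com/some/page"

# Send the request with a hard-coded cookie A=B, exactly as if the site's
# JavaScript (or an earlier Set-Cookie response header) had created it.
response = requests.get(url, cookies={"A": "B"})
print(response.status_code)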

Related

head request returns different content-type [duplicate]

I would like to try sending requests.get to this website:
requests.get('https://rent.591.com.tw')
and I always get
<Response [404]>
I know this is a common problem and have tried different ways, but I still fail.
All other websites are fine, though.
Any suggestions?
Webservers are black boxes. They are permitted to return any valid HTTP response, based on your request, the time of day, the phase of the moon, or any other criteria they pick. If another HTTP client gets a different response, consistently, try to figure out what the differences are in the request that Python sends and the request the other client sends.
That means you need to:
Record all aspects of the working request
Record all aspects of the failing request
Try out what changes you can make to make the failing request more like the working request, and minimise those changes.
I usually point my requests to an http://httpbin.org endpoint, have it record the request, and then experiment.
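For example (a minimal sketch, assuming the public httpbin.org service is reachable), you can have httpbin echo back exactly what requests sent and diff that against what a working client sends:

import requests

# httpbin.org/anything echoes back the method, headers, args and body it received,
# which makes it easy to compare the failing request with a working one.
response = requests.get(
    "https://httpbin.org/anything",
    headers={"User-Agent": "Custom"},  # experiment with one header at a time
    params={"q": "test"},
)
print(response.json()["headers"])
print(response.json()["args"])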
For requests, there are several headers that are set automatically, and many of these you would not normally expect to have to change:
Host; this must be set to the hostname you are contacting, so that it can properly multi-host different sites. requests sets this one.
Content-Length and Content-Type, for POST requests, are usually set from the arguments you pass to requests. If these don't match, alter the arguments you pass in to requests (but watch out with multipart/* requests, which use a generated boundary recorded in the Content-Type header; leave generating that to requests).
Connection: leave this to the client to manage
Cookies: these are often set on an initial GET request, or after first logging into the site. Make sure you capture cookies with a requests.Session() object and that you are logged in (supply credentials the same way the browser does); a minimal sketch follows.
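A minimal sketch of that last point, with a hypothetical login URL and form field names, since the real ones depend on the site:

import requests

session = requests.Session()  # cookies set by the server are stored and re-sent automatically

# Hypothetical login form; use the exact URL and field names your browser submits.
session.post("https://example.com/login", data={"username": "me", "password": "secret"})

# Subsequent requests carry the session cookies, just like a logged-in browser.
response = session.get("https://example.com/protected/page")
print(response.status_code)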
Everything else is fair game but if requests has set a default value, then more often than not those defaults are not the issue. That said, I usually start with the User-Agent header and work my way up from there.
In this case, the site is filtering on the user agent. It looks like they are blacklisting Python; setting the User-Agent header to almost any other value already works:
>>> requests.get('https://rent.591.com.tw', headers={'User-Agent': 'Custom'})
<Response [200]>
Next, you need to take into account that requests is not a browser. requests is only an HTTP client; a browser does much, much more. A browser parses HTML for additional resources such as images, fonts, styling and scripts, loads those additional resources too, and executes scripts. Scripts can then alter what the browser displays and load additional resources. If your requests results don't match what you see in the browser, but the initial request the browser makes matches, then you'll need to figure out what other resources the browser has loaded and make additional requests with requests as needed. If all else fails, use a project like requests-html, which lets you run a URL through an actual, headless Chromium browser.
The site you are trying to contact makes an additional AJAX request to https://rent.591.com.tw/home/search/rsList?is_new_list=1&type=1&kind=0&searchtype=1&region=1; take that into account if you are trying to scrape data from this site.
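A rough sketch of that; the extra headers below are assumptions (sites that serve data to their own JavaScript often check for an XHR-style request and a matching Referer), not something confirmed for this site:

import requests

session = requests.Session()
session.headers["User-Agent"] = "Custom"  # avoid the blacklisted default

# Load the page first so any cookies the site sets are captured by the session.
session.get("https://rent.591.com.tw")

# Then call the AJAX endpoint the browser uses.
api = "https://rent.591.com.tw/home/search/rsList"
params = {"is_new_list": 1, "type": 1, "kind": 0, "searchtype": 1, "region": 1}
response = session.get(api, params=params, headers={
    "X-Requested-With": "XMLHttpRequest",   # assumption: common for AJAX endpoints
    "Referer": "https://rent.591.com.tw/",
})
print(response.status_code)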
Next, well-built sites will use security best practices such as CSRF tokens, which require you to make requests in the right order (e.g. a GET request to retrieve a form before a POST to the handler) and to handle cookies, or otherwise extract the extra information the server expects to be passed from one request to another.
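A sketch of that ordering, with hypothetical URLs and a hypothetical hidden-field name (real sites vary; inspect the form's HTML to find the actual token field):

import requests
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

session = requests.Session()

# 1. GET the form so the server issues a CSRF token (in a cookie and/or a hidden field).
form_page = session.get("https://example.com/login")
soup = BeautifulSoup(form_page.text, "html.parser")
token = soup.find("input", {"name": "csrf_token"})["value"]  # hypothetical field name

# 2. POST the form back, including the token the server just handed out.
response = session.post("https://example.com/login", data={
    "username": "me",
    "password": "secret",
    "csrf_token": token,
})
print(response.status_code)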
Last but not least, if a site is blocking scripts from making requests, it is probably either trying to enforce terms of service that prohibit scraping, or it has an API it would rather have you use. Check for either, and take into consideration that you might be blocked more effectively if you continue to scrape the site anyway.
One thing to note: I was using requests.get() to do some webscraping off of links I was reading from a file. What I didn't realise was that the links had a newline character (\n) when I read each line from the file.
If you're getting multiple links from a file instead of a Python data type like a string, make sure to strip any \r or \n characters before you call requests.get("your link"). In my case, I used
with open("filepath", 'w') as file:
links = file.read().splitlines()
for link in links:
response = requests.get(link)
In my case this was due to the fact that the website address had recently changed, and I had been given the old address. At least this changed the status code from 404 to 500, which, I think, is progress :)

Sending passwords through GET request

I have an app that requires users to enter database passwords. These passwords will not be saved on the server, and the server does not need to remember anything about the database after the request. It is my understanding that most web servers will log GET requests, and the browser can as well (does it do this even for fetch() requests?). I do not want to put databases at risk, but I also understand that you should not use a body in GET requests.
I am also not creating resources, so from what I also understand, I should not be using a POST request. Is there a safe way to send a GET request with the password (over HTTPS) that makes sure it is not logged on the server? This would be an app that anyone could run, so I have no idea what their server configuration might be, and I couldn't specifically disable logging on any one server.
so from what I also understand, I should not be using a POST request.
Your understanding is incorrect. POST is the appropriate verb when none of the other verbs make sense. From the spec:
The POST method requests that the target resource process the representation enclosed in the request according to the resource's own specific semantics.
Put simply, that means POST does whatever the service says it does. It isn't safe, idempotent, or cacheable so there are disadvantages to just using it for everything, but the intent is for it to be the catch-all verb.
You should not use GET because, as you mentioned, you should not include a body and URLs often get logged, which would expose your credentials.
If the client for your app is going to be a browser, you can use HTTPS-only cookies to handle the authentication flow. If you want to extend it or use it from any other type of client, you can use the Authorization HTTP header.
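For illustration (the endpoint and parameter names below are hypothetical), sending the database password in a POST body over HTTPS keeps it out of the URL, and therefore out of typical access logs; an Authorization header works the same way for non-browser clients:

import requests

# The password travels in the request body (encrypted by TLS), not in the URL,
# so it does not end up in standard access logs the way a query string would.
response = requests.post(
    "https://api.example.com/databases/check",       # hypothetical endpoint
    json={"host": "db.example.com", "password": "s3cret"},
    headers={"Authorization": "Bearer <token>"},      # alternative credential channel
    timeout=10,
)
print(response.status_code)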

Symfony2 - Check server-server request

I need to know when a request comes from a browser and when it comes from a server.
I have created an API and a listener for the onKernelRequest event; I need to know what kind of request I received in order to execute one function or another.
How can I do this on Symfony 2.7?
A "server" is an HTTP client just as a browser is. They only handle your website's response differently. So there's no way to be sure who you are talking to. You can only check for a number of indicators.
You can examine the HTTP headers in the Request object. Your best bet would probably be the User-Agent header. But a non-browser could just as well fake the user agent header of an actual browser, so you'd only detect them if they want you to. And you'd have to prepare a list of user agents that you'd consider "servers".

ASP.NET form scraping not working

I'm trying to scrape some pages on a website that uses ASPX forms. The forms involve adding details of people by updating the server (one person at a time) and then proceeding to a results page that shows information regarding the specified people. There are 5 steps to the process:
Hit the login page (the site is HTTPS) by sending a POST request with my credentials. The response will contain cookies that will be used to validate all subsequent requests.
Hit the search criteria page by sending a GET request (no parameters). The only purpose of this is to discover the __VIEWSTATE and __EVENTVALIDATION tokens in the HTML response to be used in the next step.
Update the server with a person. This involves hitting the same webpage in step 2 but using a POST request with form parameters that correspond to the form controls on the page for adding person details and their values. The form parameters will include the __VIEWSTATE and __EVENTVALIDATION tokens gained from the previous step. The server response will include a new __VIEWSTATE and __EVENTVALIDATION. This step can be repeated using the new __VIEWSTATE and __EVENTVALIDATION, or can proceed to the next step.
Signal to the server that all people have been added. This involves hitting the same page as the previous 2 steps by sending a POST request with form parameters that correspond to the form controls on the page for signalling that all people have been added. The server response will simply be 25|pageRedirect||/path/to/results.aspx|.
Hit the search results page specified in the redirect response from the previous step by sending a GET request (no parameters - cookies are enough). The server response will be the HTML that I need to scrape.
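Not from the original post, but a rough Python sketch of this flow; the hidden-field names are the standard ASP.NET ones, while the URLs, credentials, and form control names are placeholders:

import requests
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

session = requests.Session()

# Step 1 (placeholder URL and credential fields): log in so the session holds the auth cookies.
session.post("https://example.com/Login.aspx", data={"user": "me", "pass": "secret"})

def get_state(html):
    # Pull the standard ASP.NET hidden fields out of a full HTML response.
    # (If the server answers with a MicrosoftAjax delta instead, the fresh tokens
    # arrive in its pipe-delimited format and must be parsed from that.)
    soup = BeautifulSoup(html, "html.parser")
    return {
        "__VIEWSTATE": soup.find("input", {"name": "__VIEWSTATE"})["value"],
        "__EVENTVALIDATION": soup.find("input", {"name": "__EVENTVALIDATION"})["value"],
    }

search_url = "https://example.com/Search.aspx"           # placeholder
state = get_state(session.get(search_url).text)          # step 2: harvest the tokens

# Step 3 (placeholder control names): add one person, then refresh the tokens from the response.
resp = session.post(search_url, data={**state, "ctl00$FirstName": "Jane", "ctl00$AddButton": "Add"})
state = get_state(resp.text)

# Step 4: signal that all people have been added; expect something like
# 25|pageRedirect||/path/to/results.aspx| in the response body.
resp = session.post(search_url, data={**state, "ctl00$ContinueButton": "Continue"})

# Step 5: follow the redirect target taken from that response and scrape the results HTML.
results = session.get("https://example.com/path/to/results.aspx")
print(results.text[:200])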
If I follow the process manually with any browser, filling in the form controls and clicking the buttons etc. (testing with just one person), I get to the results page and the results are fine. If I do this programmatically from an application running on my machine, then ultimately the search results HTML is wrong (the page returns valid HTML, but there are no results compared with the browser version, and there are null values where there should not be).
I've run this using a Java application with Apache HttpClient handling the requests. I've also tried it using a Ruby script with Mechanize handling the requests. I've set up a proxy server using Charles to intercept and examine all 5 HTTPS requests. Using Charles, I've scrutinized the raw requests (headers and body) and made comparisons between requests made using a browser and requests made using the application(s). They are all identical (except for the VIEWSTATE / EVENTVALIDATION values and session cookie values, which I would expect to differ).
A few additional points about the programmatic attempts:
The login step returns successful data, and the cookies are valid (otherwise the subsequent requests would all fail)
Updating the server with a person (step 3) returns successful responses, in that they are the same as would be returned from interaction using a browser. I can only assume this must mean the server is updating successfully with the person added.
A custom header, X-MicrosoftAjax: Delta=true, is being added to requests in step 3 (just like the browser requests do)
I don't own or have access to the server I'm scraping
Given that my application requests are identical to the browser requests that succeed, it baffles me that the server is treating them differently somehow. I can't help but feel that this is an ASP.net issue with forms that I'm overlooking. I'd appreciate any help.
Update:
I went over the raw requests again a bit more methodically, and it turns out I was missing something in the form parameters of the requests. Unfortunately, I don't think it will be of much use to anyone else, because it would seem to be specific to this particular ASP server's logic.
The POST request that notifies the server that all people have been added (step 4) requires two form parameters specifying the county and address of the last person that was added to the search. I was including these form parameters in my request, but the values were empty strings. I figured the browser request was just snagging these values because when the user hits the Continue button on the form, those controls would have the values of the last person added. I figured they wouldn't matter and forgot about them, but I was wrong.
It's a peculiar issue that I should have caught the first time. I can't complain though, I am scraping a site after all.
Review the Charles logs again. It is possible that the search results and other content may be coming over via Ajax, and that your Java/Ruby apps are not actually making all of the requests/responses that happen with the browser. Look for any POST or GET requests in between the requests you are already duplicating. If search results are populated via JavaScript, your client app may not be able to handle this.

Logging into a webpage via HTTP Request

So I have a webpage ("http://data.terapeak.com/verify/") and I don't see any & tags in the URL, so I am unaware how to post data to it. I need to do this via HTTP request rather than a browser control. I am creating a double-threaded batch searching program. I have already successfully made this using a single browser control, but that won't allow for multi-threading, at least with my current knowledge, because even when creating a new frmBrw that already exists it requires me to set the thread apartment to single. If I set it to single, I am unable to have it send the data to the Excel sheet; I need both threads to access it. I hope this is clear... The basic question is: how can I log into this form via HTTP request?
This isn't going to be easy to answer without further details; however, I suspect you'll need to provide the variables via an HTTP POST request.
Can you successfully log in to this page in your browser? If so, run a proxy tool such as Fiddler and inspect the HTTP requests the browser makes to the server. You should see the form variables being passed over. You then need to mimic this in code.
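A minimal sketch of "mimic this in code" using Python's requests library (the WebRequest article linked below shows the same idea in .NET); the form field names are placeholders for whatever the proxy shows your browser actually posting:

import requests

session = requests.Session()

# Post the same form fields the browser sends (copy the names/values seen in Fiddler).
login = session.post("http://data.terapeak.com/verify/", data={
    "username": "me",        # placeholder field name
    "password": "secret",    # placeholder field name
})

# The session keeps any authentication cookies, so follow-up requests stay logged in.
# For multi-threading, the simplest safe pattern is one Session per thread.
result = session.get("http://data.terapeak.com/some/search/page")  # placeholder URL
print(result.status_code)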
How to: Send Data Using the WebRequest Class
Hope this gets you started
