I am trying to generate HTML reports for a simple test where I do an HTTP GET that involves a 302 redirect. The generated statistics are a bit confusing: they show 3 different HTTP requests, as below.
JMeter statistics
I am using 3000 threads and I see almost 9000 samples. I assumed there could be two HTTP requests: 1. the original one and 2. the redirect that is followed. But why am I getting 3 HTTP requests? Am I missing something?
I am using the following to run the test and generate the report:
jmeter.sh -n -t <testplan.jmx> -l <results.jtl>
jmeter.sh -g <results.jtl> -o ./analysis
The report is fairly straightforward when redirects are not involved, though.
The extra samples may stand for:
Redirects (HTTP statuses 3xx). In this case you should cross-check JMeter's behaviour against a real browser's network footprint. You can see which requests the browser sends using the browser's developer tools. If the numbers match, you should be good to go, as you are properly simulating the real browser's behaviour. If not, play with the Redirect Automatically and Follow Redirects checkboxes in the HTTP Request sampler (see the sketch after this list).
Embedded resources (images, scripts, styles, fonts, sounds, etc.). This is pretty normal, as downloading resources is what real browsers do. Just make sure to add an HTTP Cache Manager to your Test Plan, since real browsers request these items only once, and make sure that no external resources (coming from CDNs or third-party services) are in scope, as you should not include external services in the scope of your load test.
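To see why following a redirect inflates the sample count, here is a minimal sketch outside JMeter, using Python's requests library against a hypothetical URL that answers with a 302; the point is only that one logical GET turns into two HTTP exchanges once the redirect is followed:

import requests

# Hypothetical URL that responds with a 302 redirect.
url = "https://example.com/old-page"

# One HTTP exchange: the 302 itself, redirect not followed.
first = requests.get(url, allow_redirects=False)
print(first.status_code)

# Two HTTP exchanges: the 302 plus the GET to the Location target.
followed = requests.get(url)  # allow_redirects=True is the default
print(len(followed.history) + 1)  # number of requests actually sent

With 3000 threads and roughly 9000 samples, that works out to about 3 samples per thread: the original request, the redirect target, and presumably one embedded resource.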
I'm trying to fetch data from a website (https://gesetze.berlin.de/bsbe/search). Using Mozilla Firefox, I've taken a look at the network analysis. Usually I just mess around with the parameters of the POST request to see how I might influence the server's response. But when I simply re-send the request (making no changes at all), I get HTTP response 500. The server's answer states the message security_notAuthenticated.
Can anyone explain that behaviour? The request is sent from the same PC, by the same browser in the same session, and there is no login function on that website. Pictures shown below.
Picture 1 - Code 200
Picture 2 - Code 500
The response security_notAuthenticated indicates that your way of repeating the request omits some authentication-related information.
When I repeat the request using Mozilla Firefox's "Resend" or "Edit and resend" function, the Cookie header is not sent with the request. Although it appears in the editable header list when using "Edit and resend", it is missing from the request that is actually sent. I'm not sure whether this is a feature or a bug.
When using Firefox's "Use as Fetch in Console" function, the header is included automatically, and you still have the ability to change the headers and the body. The fetch API is a web standard, and some introductory material about fetch can be found on MDN.
If you want to make custom requests in the browser, fetch is a good option.
In other environments and languages you usually use some HTTP client (just search the web for "...your language... http request" or similar; you will find something).
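For example, a Python sketch of the same idea with the requests library might look like the following. The search page URL is the one from the question, but the POST endpoint, the form field names, and whether a session cookie alone is sufficient are all assumptions; copy the real values from the browser's network panel:

import requests

session = requests.Session()

# Load the search page first so the server can set its session cookies.
session.get("https://gesetze.berlin.de/bsbe/search")

# Repeat the POST within the same session; the cookies collected above
# are sent automatically. The payload is a hypothetical placeholder.
payload = {"query": "example"}
response = session.post("https://gesetze.berlin.de/bsbe/search", data=payload)
print(response.status_code)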
I've been trying to load test a WordPress site and I'm seeing many sub-responses under the main sampler response in the 'View Results Tree' listener. This probably results in a higher load time being displayed in JMeter as well. I've tried enabling/disabling the sampler's 'Retrieve All Embedded Resources' advanced setting and it has not made a difference.
I want to see only the samplers that are part of my script in 'View Results Tree'. How can I get rid of the sub-responses appearing under those samplers?
If you are recording, you have the option to skip files with a given extension in JMeter, so you can skip *.png files and they won't show up in the recorded script.
In the HTTP(S) Test Script Recorder there is a tab called Request Filtering.
When you run the JMeter script, these requests will not show up in the listener.
It might be the case that you have embedded resources retrieval enabled in the HTTP Request Defaults; if so, it affects all HTTP Request samplers, no matter what you set in the individual sampler.
The question is: why do you want to disable it? It only makes sense to disable requests to external domains (like Google, Facebook, etc.) so that you focus on your application alone.
Downloading images, scripts, fonts, styles, etc. is what real browsers do, so your script should do it as well. Just make sure to add an HTTP Cache Manager to ensure that the resources are downloaded only once, or according to Cache-Control headers.
More information: Web Testing with JMeter: How To Properly Handle Embedded Resources in HTML Responses
I would like to send a requests.get to this website:
requests.get('https://rent.591.com.tw')
and I always get
<Response [404]>
I know this is a common problem and have tried different approaches, but I still fail.
All other websites are fine, though.
Any suggestions?
Webservers are black boxes. They are permitted to return any valid HTTP response, based on your request, the time of day, the phase of the moon, or any other criteria they pick. If another HTTP client gets a different response, consistently, try to figure out what the differences are in the request that Python sends and the request the other client sends.
That means you need to:
Record all aspects of the working request
Record all aspects of the failing request
Try out what changes you can make to make the failing request more like the working request, and minimise those changes.
I usually point my requests to a http://httpbin.org endpoint, have it record the request, and then experiment.
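For example, a quick way to see exactly what requests sends by default is to ask httpbin to echo the request back; the /headers endpoint simply returns the request headers it received as JSON:

import requests

# httpbin echoes back the headers it received, so you can compare them
# with what the browser sends for the same URL.
echo = requests.get("https://httpbin.org/headers")
print(echo.json()["headers"])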
For requests, there are several headers that are set automatically, and many of these you would not normally expect to have to change:
Host; this must be set to the hostname you are contacting, so that it can properly multi-host different sites. requests sets this one.
Content-Length and Content-Type, for POST requests, are usually set from the arguments you pass to requests. If these don't match, alter the arguments you pass in to requests (but watch out with multipart/* requests, which use a generated boundary recorded in the Content-Type header; leave generating that to requests).
Connection: leave this to the client to manage
Cookies: these are often set on an initial GET request, or after first logging into the site. Make sure you capture cookies with a requests.Session() object and that you are logged in (supplying credentials the same way the browser did).
Everything else is fair game but if requests has set a default value, then more often than not those defaults are not the issue. That said, I usually start with the User-Agent header and work my way up from there.
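A minimal sketch of the cookie point above, using a requests.Session; the login URL and form field names here are hypothetical placeholders, not something taken from the question:

import requests

session = requests.Session()

# Hypothetical login endpoint and form fields; adjust them to match what
# the browser actually sends (copy the values from the developer tools).
session.post(
    "https://example.com/login",
    data={"username": "me", "password": "secret"},
)

# Cookies set during the login are stored on the session and sent
# automatically with later requests.
protected = session.get("https://example.com/account")
print(protected.status_code)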
In this case, the site is filtering on the user agent; it looks like they are blacklisting Python, and setting the header to almost any other value already works:
>>> requests.get('https://rent.591.com.tw', headers={'User-Agent': 'Custom'})
<Response [200]>
Next, you need to take into account that requests is not a browser. requests is only an HTTP client; a browser does much, much more. A browser parses HTML for additional resources such as images, fonts, styling and scripts, loads those additional resources too, and executes scripts. Scripts can then alter what the browser displays and load additional resources. If your requests results don't match what you see in the browser, but the initial request the browser makes does match, then you'll need to figure out what other resources the browser has loaded and make additional requests with requests as needed. If all else fails, use a project like requests-html, which lets you run a URL through an actual, headless Chromium browser.
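If you do go the requests-html route, a minimal sketch might look like this; it assumes the requests-html package is installed (it downloads a bundled Chromium on first render), and the render() step executes the page's JavaScript:

from requests_html import HTMLSession

session = HTMLSession()
r = session.get("https://rent.591.com.tw", headers={"User-Agent": "Custom"})

# render() runs the page in headless Chromium and executes its scripts,
# so JavaScript-built content becomes available on r.html.
r.html.render()
print(r.html.links)  # links discovered in the rendered page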
The site you are trying to contact makes an additional AJAX request to https://rent.591.com.tw/home/search/rsList?is_new_list=1&type=1&kind=0&searchtype=1&region=1; take that into account if you are trying to scrape data from this site.
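A hedged sketch of requesting that endpoint directly with requests; whether a plain GET with a custom User-Agent is enough, or whether the site also expects cookies or an anti-CSRF header from a prior page load, is an assumption you would need to verify:

import requests

session = requests.Session()
session.headers["User-Agent"] = "Custom"

# Load the main page first so any cookies the site sets are captured.
session.get("https://rent.591.com.tw")

# Then call the AJAX endpoint the page itself uses.
listing = session.get(
    "https://rent.591.com.tw/home/search/rsList",
    params={"is_new_list": 1, "type": 1, "kind": 0, "searchtype": 1, "region": 1},
)
print(listing.status_code)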
Next, well-built sites will use security best-practices such as CSRF tokens, which require you to make requests in the right order (e.g. a GET request to retrieve a form before a POST to the handler) and handle cookies or otherwise extract the extra information a server expects to be passed from one request to another.
Last but not least, if a site is blocking scripts from making requests, they are probably either trying to enforce terms of service that prohibit scraping, or they have an API they would rather have you use. Check for either, and take into consideration that you might be blocked more effectively if you continue to scrape the site anyway.
One thing to note: I was using requests.get() to do some web scraping on links I was reading from a file. What I didn't realise was that each link had a trailing newline character (\n) when I read it from the file.
If you're getting links from a file instead of from a Python data type like a string, make sure to strip any \r or \n characters before you call requests.get("your link"). In my case, I used:
with open("filepath", 'w') as file:
links = file.read().splitlines()
for link in links:
response = requests.get(link)
In my case this was due to the fact that the website address had recently changed, and I had been given the old address. At least this changed the status code from 404 to 500, which, I think, is progress :)
I visit en.wikipedia.org/wiki/Hello while keeping the Chrome console open: in the Network tab I can inspect the content of the HTTP requests; the first one to be made is:
GET https://en.wikipedia.org/wiki/Hello -> 200
Then a lot of other HTTP requests are made: the Wikipedia logo .png, some CSS, scripts and other files are downloaded to my browser, and together they render the actual Wikipedia page.
With requests, I want to do the same thing: a simple
requests.get("https://en.wikipedia.org/wiki/Hello")
will return the HTML document of the Hello page, but no other resources will be downloaded.
I want to keep track of the number of connections opened to render a page and which elements are downloaded; the GET request above will not return images, CSS or scripts.
I think I'm missing something important: who knows which resources are required to completely load a web page?
I'm asking this because I want to know (with requests) what resources are downloaded and how many connections it took to get them.
I think the server is the one that knows what a page needs in order to be loaded, so the server should pass this information to the client, but I can't see where: I did not find anything about it in the HTTP headers.
I need this list/dictionary/JSON/whatever of the resources necessary to fully render a page, so I can do it manually with Python.
High five myself XD
The other required resources are listed in the first downloaded resource: the HTML document.
I'm going to parse it (with BeautifulSoup4) and pull out what I need (<link rel=... href=... /> and similar tags); this should give me the number of downloads and the resources the page needs.
As for the number of connections, I read about HTTP keep-alive: if a single TCP connection is used to download the resources, I don't have to worry about how many connections are opened, since HTTP/1.1 connections are kept alive by default. I should just check whether the server is using HTTP/1.0 and, if so, look for the Connection: keep-alive header.
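A minimal sketch of that parsing step; it only collects the URLs referenced directly in the HTML (resources added at runtime by scripts will not appear), and it assumes requests and bs4 are installed:

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

page_url = "https://en.wikipedia.org/wiki/Hello"
response = requests.get(page_url)
soup = BeautifulSoup(response.text, "html.parser")

# Collect the resources the HTML itself references.
resources = set()
for tag, attr in (("link", "href"), ("script", "src"), ("img", "src")):
    for element in soup.find_all(tag):
        value = element.get(attr)
        if value:
            resources.add(urljoin(page_url, value))

print(len(resources), "resources referenced by the HTML")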
I want to change the first line of my HTTP request (the request line), modifying the method and/or the URL.
The (excellent) Tamperdata Firefox plugin allows a developer to modify the headers of a request, but not the URL itself. This latter part is what I want to be able to do.
So something like...
GET http://foo.com/?foo=foo HTTP/1.1
... could become ...
GET http://bar.com/?bar=bar HTTP/1.1
For context, I need to tamper with (i.e. correct) an erroneous request coming from Flash, to see if the error can be fixed by correcting the URL.
Any ideas? It sounds like something that may need to be done at the proxy level; in which case, any suggestions?
Check out Charles Proxy (multiplatform) and/or Fiddler2 (Windows only) for more client-side solutions - both of these run as a proxy and can modify requests before they get sent out to the server.
If you have access to the webserver and it's running Apache, you can set up some rewrite rules that will modify the URL before it gets processed by the main HTTP engine.
For those coming to this page from a search engine, I would also recommend the Burp Proxy suite: http://www.portswigger.net/burp/proxy.html
Although more specifically targeted towards security testing, it's still an invaluable tool.
If you're trying to intercept the HTTP packets and modify them on the way out, then Tamperdata may be the route you want to take.
However, if you want minute control over these things, you'd be much better off simulating the entire browser session using a utility such as curl.
Curl: http://curl.haxx.se/
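If a scriptable client is easier to iterate with than curl, a minimal Python requests sketch of sending the corrected request would be the following; the URL and parameter are just the placeholders from the question:

import requests

# Send the "fixed" request directly, choosing the method and URL yourself.
response = requests.request("GET", "http://bar.com/", params={"bar": "bar"})
print(response.status_code, response.url)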