Simulate file download with Gatling - http

Good morning,
I would like to simulate a file download with Gatling. I'm not sure that a simple GET request on a file resource really simulates one:
val stuffDownload: ScenarioBuilder = scenario("Download stuff")
  .exec(http("Download stuff").get("https://stuff.pdf")
    .header("Content-Type", "application/pdf")
    .header("Content-Type", "application/force-download"))
I want to hit my server with many simultaneous downloads, and I need to be sure I'm using the right method to do it.
Thanks in advance for your help.
EDIT: Other headers I send:
"User-Agent" -> "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36",
"Accept" -> "application/json, text/plain, */*; q=0.01",
"Accept-Encoding" -> "gzip, deflate, br",
"Accept-Language" -> "fr-FR,fr;q=0.9,en-US;q=0.8,en;q=0.7",
"DNT" -> "1",
"Connection" -> "keep-alive"

Technically it looks globally fine, except that:
you have two Content-Type headers? Is there a mistake in the second one?
Also, aren't you missing other browser headers, like User-Agent?
Aren't you missing an important one related to compression, like Accept-Encoding?
But regarding the functional part, aren't you missing some steps before it? I mean, do your users access the link immediately, or do they hit a login screen, then do a search, and finally click on a link?
Also, is it always the same file? Shouldn't you introduce some variability, for example with a Gatling CSV feeder over a set of files, as sketched below?
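For illustration, a minimal sketch of such a feeder, assuming a hypothetical files.csv whose "path" column lists the downloadable resources, and a base URL configured on the HTTP protocol:
val filesFeeder = csv("files.csv").random // hypothetical feeder file

val stuffDownload: ScenarioBuilder = scenario("Download stuff")
  .feed(filesFeeder)                           // picks a random "path" per virtual user
  .exec(http("Download stuff").get("${path}")) // Gatling EL resolves the fed value
That way each virtual user downloads a different file instead of hitting the same, possibly cached, resource.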

Related

Instagram blocks my requests with 429

I have sent a lot of requests to https://www.instagram.com/{username}/?__a=1 to check whether a username exists, and now I am getting 429.
Before, I just had to wait a few minutes for the 429 to disappear. Now it is persistent! :( I try once a day, and it doesn't work anymore.
Do you know anything about Instagram's request limits?
Do you have any workaround, please? Thanks
Code:
import requests
r = requests.get('https://www.instagram.com/test123/?__a=1')
res = str(r.status_code)
Try adding the User-Agent header; otherwise the website thinks you're a bot and will block you.
import requests
URL = "https://www.instagram.com/bla/?__a=1"
HEADERS = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"}
response = requests.get(URL, headers=HEADERS)
print(response.status_code) # <- Output: 200

When do I have to set headers and how do I get them?

I am trying to crawl some information from www.blogabet.com.
In the meantime, I am attending a Udemy course about web crawling. The course author already gave me the answer to my problem; however, I do not fully understand why I have to do the specific steps he mentioned. You can find his code below.
I am asking myself:
1. For which websites do I have to use headers?
2. How do I get the information that I have to provide in the headers?
3. How do I get the URL he fetches? Basically, I just wanted to fetch: https://blogabet.com/tipsters
Thank you very much :)
scrapy shell
from scrapy import Request
url = 'https://blogabet.com/tipsters/?f[language]=all&f[pickType]=all&f[sport]=all&f[sportPercent]=&f[leagues]=all&f[picksOver]=0&f[lastActive]=12&f[bookiesUsed]=null&f[bookiePercent]=&f[order]=followers&f[start]=0'
page = Request(url,
               headers={'Accept': '*/*',
                        'Accept-Encoding': 'gzip, deflate, br',
                        'Accept-Language': 'en-US,en;q=0.9,pl;q=0.8,de;q=0.7',
                        'Connection': 'keep-alive',
                        'Host': 'blogabet.com',
                        'Referer': 'https://blogabet.com/tipsters',
                        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36',
                        'X-Requested-With': 'XMLHttpRequest'})
fetch(page)
If you look at the network panel when you load that page, you can see the XHR request and the headers it sends, so it looks like he just copied those.
In general you can skip everything except User-Agent, and you should avoid setting the Host, Connection and Accept headers unless you know what you're doing.
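For example, a stripped-down sketch of the same scrapy shell fetch, keeping only User-Agent (plus X-Requested-With, since the page requests this URL via XHR):
from scrapy import Request

# Same `url` as defined above; only the headers that usually matter are kept.
page = Request(url,
               headers={'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36',
                        'X-Requested-With': 'XMLHttpRequest'})
fetch(page)  # `fetch` is provided by the scrapy shell session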

Scrapy Shell: twisted.internet.error.ConnectionLost although USER_AGENT is set

When I try to scrape a certain web site (with both, spider and shell), I get the following error:
twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.>]
I found out that this can happen when no user agent is set. But even after setting it manually, I still got the same error.
You can see the whole output of scrapy shell here: http://pastebin.com/ZFJZ2UXe
Notes:
I am not behind a proxy, and I can access other sites via scrapy shell without problems. I am also able to access the site with Chrome, so it is not a network or connection issue.
Can anyone give me a hint on how to solve this problem?
Here is 100% working code.
What you need to do is send the request headers as well.
Also set ROBOTSTXT_OBEY = False in settings.py.
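That setting lives in your project's settings.py:
# settings.py
ROBOTSTXT_OBEY = False  # don't let Scrapy skip URLs disallowed by robots.txt
And the spider itself: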
# -*- coding: utf-8 -*-
import logging

import scrapy
from scrapy.http.request import Request


class Test1SpiderSpider(scrapy.Spider):
    name = "test1_spider"

    def start_requests(self):
        headers = {
            "Host": "www.firmenabc.at",
            "Connection": "keep-alive",
            "Cache-Control": "max-age=0",
            "Upgrade-Insecure-Requests": "1",
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36",
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
            "DNT": "1",
            "Accept-Encoding": "gzip, deflate, sdch",
            "Accept-Language": "en-US,en;q=0.8",
        }
        yield Request(url='http://www.firmenabc.at/result.aspx?what=&where=Graz',
                      callback=self.parse_detail_page, headers=headers)

    def parse_detail_page(self, response):
        logging.info(response.body)
EDIT:
You can see which headers to send by inspecting the requests in your browser's Dev Tools.

Issue with HTTP header settings

I am trying to pull a web page in my client (not a browser) with the following HTTP request headers:
Accept: "text/html;charset=UTF-8"
Accept-Charset: "ISO-8859-1"
User-Agent: "Mozilla/5.0"
However, I get error code 406. I also tried changing to:
Accept: "text/html"
with no success; the error code and status message in the response are:
statusCode: 406
statusMessage: "Not Acceptable"
Any idea what the correct header settings should be? The page loads fine in a browser.
Finally figured it out. I ran a sniffer to see which header settings worked, and here is what worked in every case:
headers: {
    'User-Agent': 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X; de-de) AppleWebKit/523.10.3 (KHTML, like Gecko) Version/3.0.4 Safari/523.10',
    'Accept-Charset': 'ISO-8859-1,UTF-8;q=0.7,*;q=0.7',
    'Accept-Language': 'de,en;q=0.7,en-us;q=0.3'
}
You should add an Accept-Language header.
Why are you sending contradictory headers? You are requesting a representation that is both UTF-8 and ISO-8859-1 at the same time; I suppose you could interpret that as a request for a 7-bit ASCII representation.
In this case I would omit Accept-Charset and change the Accept header to text/html, */*;q=0.1, so that you get something back with a strong preference for HTML. See the Content Negotiation section of RFC 7231 for details about these headers.
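For illustration, here is that request as a minimal sketch using Python's requests (the asker's client is unspecified, so the library choice and URL are assumptions):
import requests

# Prefer HTML strongly, accept anything else at low preference, and let the
# server choose the charset rather than constraining it with Accept-Charset.
headers = {
    'User-Agent': 'Mozilla/5.0',
    'Accept': 'text/html, */*;q=0.1',
    'Accept-Language': 'de,en;q=0.7,en-us;q=0.3',
}
r = requests.get('https://example.com/page', headers=headers)  # hypothetical URL
print(r.status_code, r.headers.get('Content-Type'))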

Illegal characters in path depending on User-Agent?

I have two identical calls to ASP.NET; the only difference is the User-Agent. I used Fiddler to reproduce the issue.
The HTTP request line is:
PUT http://localhost/API/es/us/havana/club/tickets/JiWOUUMxukGVWwVXQnjgfw%7C%7C214 HTTP/1.1
Works with:
User-Agent: Mozilla/5.0 (Linux; Android 4.3; Nexus 10 Build/JSS15Q) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2307.2 Safari/537.36
Fails with:
User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/600.1.3 (KHTML, like Gecko) Version/8.0 Mobile/12A4345d Safari/600.1.4
Everything else is 100% the same.
In my case, the root cause was MVC's DisplayMode providers (the multiple-views feature). This lets MVC apps magically pick up device-specific views, e.g.:
customer.cshtml
customer.mobile.cshtml
This article has a good explanation of the functionality as well as details on how to turn it off:
https://learn.microsoft.com/en-us/archive/msdn-magazine/2013/august/cutting-edge-creating-mobile-optimized-views-in-asp-net-mvc-4-part-2-using-wurfl
I fixed this by adding the Microsoft.AspNet.WebPages package to my project and calling this code during startup (Application_Start in Global.asax or, if using OWIN, the method decorated with the OwinStartup attribute):
public static void RegisterDisplayModes()
{
    // MVC has a handy helper to find device-specific views. Ain't nobody got time for that.
    var modeDesktop = new DefaultDisplayMode("") { ContextCondition = (c => { return true; }) };
    var displayModes = DisplayModeProvider.Instance.Modes;
    displayModes.Clear();
    displayModes.Add(modeDesktop);
}
