Why won't Splash render this webpage? - web-scraping

I'm quite new to Splash, and though I was able to set it up on Ubuntu 18 (via Docker), it gives me different results for this page:
Normally it's rendered like so:
But when I try to render it in Splash, it renders it like this:
I have tried changing the user agent in Splash to this:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36
This makes the Splash script look like so:
function main(splash, args)
  -- Override the default user agent before loading the page
  splash:set_user_agent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36')
  assert(splash:go(args.url))
  return {
    html = splash:html(),
    png = splash:png(),
    har = splash:har(),
  }
end
Yet, despite these additions, it still fails to render the page.
How can I get Splash to render this page?

It seems that overstock.com requires the Connection and Accept headers. Add them to your request and it should work as expected.
I tested this in Postman with and without the Connection: keep-alive and Accept: */* headers. Without them, I get the same error page:
After adding the two headers above:
Therefore, your script should be edited accordingly:
function main(splash, args)
  splash:set_custom_headers({
    ["Connection"] = "keep-alive",
    ["Accept"] = "*/*",
  })
  assert(splash:go(args.url))
  return {
    html = splash:html(),
    png = splash:png(),
    har = splash:har(),
  }
end
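If you call Splash over its HTTP API rather than through a Lua script, the same two headers can be passed in the JSON body of a POST to the render.html endpoint. The following is a minimal sketch, not a tested recipe for this site: it assumes Splash is listening on its default port 8050 on localhost, the page URL is a placeholder, and build_splash_request is a hypothetical helper name. It only builds the request; sending it is left as a comment.

```python
import json
import urllib.request

# Assumption: Splash runs locally on its default port.
SPLASH_RENDER_URL = "http://localhost:8050/render.html"


def build_splash_request(page_url, extra_headers, wait=0.5):
    """Build a POST request asking Splash to render page_url with extra headers."""
    payload = {"url": page_url, "wait": wait, "headers": extra_headers}
    return urllib.request.Request(
        SPLASH_RENDER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL: substitute the page you are actually rendering.
request = build_splash_request(
    "https://www.overstock.com/",
    {"Connection": "keep-alive", "Accept": "*/*"},
)

# To actually render the page (requires a running Splash instance):
# html = urllib.request.urlopen(request).read().decode("utf-8")
```

The headers argument is only honoured for application/json POST requests, which is why the payload is sent as JSON rather than as query parameters.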


Scraping data from https://cardano.ideascale.com webpage, but server noticed I am using Internet Explorer

I am scraping the content of this link. My procedure is:
GET-TOKEN to get a Bearer token.
GET Fork Gitcoin and deploy on Cardano, using the above token in the header, to get JSON content in the response.
My issue is that when I run the code below, the GET /detail request returns a response saying that I am using Internet Explorer. That is odd, because my request header has "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/ Safari/537.36".
<div id="ie-unsupported-alert" class="ie-d-none">
<p>We noticed you are using Internet Explorer. We don\'t have support for this browser in Incoming Moderation!
<p>We recommend using the Microsoft Edge Browser, Chrome, Firefox or Safari. <a
href="https://help.ideascale.com/knowledge/internet-web-browsers-supported-by-ideascale">Click for more
Can anyone explain the error and teach me how to fix it?
Below is my python code.
import requests

def get_content(url):
    s = requests.session()
    # Step 1: fetch the Bearer token
    response = s.get("https://cardano.ideascale.com/a/community/api/get-token")
    if response.status_code != 200:
        return None
    cookies = response.cookies
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/ Safari/537.36",
        'Accept': 'application/json,',
        'Accept-Language': 'en-US,en;q=0.5',
        'Accept-Encoding': 'gzip, deflate',
        'Cache-Control': 'no-cache',
        'Pragma': 'no-cache',
        'Authorization': f'Bearer {response.content.decode("utf-8")}',
        'Alt-Used': 'cardano.ideascale.com',
        'Connection': 'keep-alive',
        'Referer': url,
        'Sec-Fetch-Dest': 'empty',
        'Sec-Fetch-Mode': 'cors',
        'Sec-Fetch-Site': 'same-origin',
        'TE': 'trailers',
    }
    #import ipdb; ipdb.set_trace()
    # Step 2: request the detail endpoint using the token
    response = s.get(f"{url}/detail", headers=headers, cookies=cookies)
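One possible reading of the response quoted above, offered as a guess rather than a diagnosis: the ie-unsupported-alert block carries the class ie-d-none, which suggests it is hidden markup shipped with every HTML page, not a verdict on the User-Agent. If that is right, the real symptom would be that the /detail request returned an HTML page instead of JSON. A minimal sketch for telling the two apart, using only the standard library (looks_like_json is a hypothetical helper name):

```python
import json


def looks_like_json(content_type, body):
    """Return True when a response appears to carry JSON rather than an HTML page."""
    # Strip parameters such as "; charset=utf-8" from the Content-Type value
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "application/json" and not media_type.endswith("+json"):
        return False
    try:
        json.loads(body)
    except ValueError:
        # The header claimed JSON but the body does not parse
        return False
    return True


html_body = '<div id="ie-unsupported-alert" class="ie-d-none"></div>'
print(looks_like_json("text/html; charset=utf-8", html_body))
print(looks_like_json("application/json", '{"id": 1}'))
```

With requests, the two arguments would come from response.headers.get("Content-Type", "") and response.text.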

Python Scrapy: How to login into ASP.net website

I am trying to write a script that logs in to a private website and crawls data with Scrapy.
However, this website requires a login.
I used Chrome to inspect the network traffic during a manual login and found that three requests are sent after I click the login button.
The first is login:
Login request
The second is checkuservalid:
Check valid user
The third is a request to the index page:
Get request to index page
Note: requests 1 and 2 appear only briefly and disappear after a successful login.
I tried following some instructions that use Scrapy's FormRequest and FormRequest.from_response, but I cannot log in.
Please give me some advice for this case.
import scrapy

class LoginSpider(scrapy.Spider):
    name = "Test"
    start_urls = ['http://hvsfcweb.fushan.fihnbb.com/Login.aspx']
    headers = {
        'Content-Type': 'application/json; charset=UTF-8',
        'Referer': 'http://hvsfcweb.fushan.fihnbb.com/Login.aspx',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36',
        'X-Requested-With': 'XMLHttpRequest',
    }

    # Scrapy's hook is named start_requests (plural); a method called
    # start_request is never invoked, so the headers would not be sent.
    def start_requests(self):
        # start_urls is a list, so each URL gets its own Request
        for url in self.start_urls:
            yield scrapy.Request(url=url, headers=self.headers)

    def parse(self, response):
        # Save the response body for inspection
        filename = 'quotes.html'
        with open(filename, 'wb') as f:
            f.write(response.body)

Scrapy - 403 error not solving even after adding Headers

I am trying to scrape doordash.com, but every time I run the request it returns a 403, along with this log line: INFO: Ignoring response <403 http://doordash.com/>: HTTP status code is not handled or not allowed.
I tried many things, such as adding a User-Agent, but it still didn't work. I also added a full set of headers, but the same thing happens.
Here's my code:
import scrapy

class DoordashSpider(scrapy.Spider):
    name = 'doordash'
    allowed_domains = ['doordash.com']
    start_urls = ['http://doordash.com/']

    def start_requests(self):
        # Browser-like headers sent with every start request
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.101 Safari/537.36',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
            'Accept-Language': 'en-US,en;q=0.9',
            'Accept-Encoding': 'gzip, deflate, br',
        }
        for url in self.start_urls:
            yield scrapy.Request(url, headers=headers)

    def parse(self, response):
        print('Crawled Successfully')
How can I get a 200 response?

400 Bad request on python requests.get()

I am doing a bit of web scraping of political donations. I scrape a link from one page and then need to scrape the page it points to. I can get the secondary links just fine; however, when I send a requests.get() call, the HTML returned gives me a 400 Bad Request error.
I've already tried changing the request by changing or adding headers, but nothing seems to work.
headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    "Accept - Encoding": "gzip, deflate",
    "Accept-Language": "en-US,en;q=0.9",
    "Cache - Control": "max - age = 0",
    "Connection": "keep-alive",
    "DNT": "1",
    "Host": "docquery.fec.gov",
    "Referer": "http://www.politicalmoneyline.com/tr/tr_MG_IndivDonor.aspx?tm=3",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"
}
params = {
    "14960391627": ""
}
pdf_page = requests.get(potential_donor[10], headers=headers, params=params)
html = pdf_page.text
soup_donor_page = BeautifulSoup(html, 'html.parser')
note: the url of the sites should look something like this:
with the ending digits being different
The output of the print(soup_donor_page) is:
400 Bad request
Your browser sent an invalid request.
I need to get the actual html of the page in order to grab the embedded pdf from the page.
I suspect the cause is an issue with requests that arises when it is provided a parameter without a value.
Try building the url with a format string instead:
import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"
}
param = "14960391627"
r = requests.get(f"http://docquery.fec.gov/cgi-bin/fecimg/?{param}", headers=headers)
soup = BeautifulSoup(r.content, "html.parser")
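To illustrate the difference between the two approaches: a dict parameter with an empty value is encoded with a trailing "=", while the format string produces the bare query string. The sketch below uses only the standard library's urlencode. That requests encodes params the same way is an assumption here; you can check it by printing r.request.url after the call.

```python
from urllib.parse import urlencode

base = "http://docquery.fec.gov/cgi-bin/fecimg/"
param = "14960391627"

# A key with an empty value keeps its "=" when encoded
encoded_url = f"{base}?{urlencode({param: ''})}"

# The format-string version carries the bare value with no "="
formatted_url = f"{base}?{param}"

print(encoded_url)
print(formatted_url)
```

Whether the server rejects the trailing "=" is not something this sketch can confirm; it only shows that the two URLs differ.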

ASP.NET Core Azure App Service httpContext.Request.Headers["Host"] Value

I ran into some strange behaviour today.
We are hosting an ASP.NET Core 1.1 web app on Azure App Service and use subdomains that route to a specific controller or area.
In my SubdomainConstraint : IRouteConstraint I use
httpContext.Request.Headers["Host"]
to get the host name. Previously, that returned something like
mywebsite.com or subdomain.mywebsite.com
Starting today (or maybe yesterday), it began returning my App Service name instead of the host name. On localhost everything works fine.
Enumerating through the request headers
in one of my Views gives me the following on localhost:
Accept :
Accept-Encoding : gzip, deflate, sdch, br
Accept-Language : ru-RU,ru;q=0.8,en-US;q=0.6,en;q=0.4,ca;q=0.2
Cookie : .AspNetCore.Antiforgery....
Host : localhost:37202
User-Agent : Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36
Upgrade-Insecure-Requests : 1
in Azure App Service:
Connection : Keep-Alive
Accept : text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding : gzip, deflate, sdch
Accept-Language : ru-RU,ru;q=0.8,en-US;q=0.6,en;q=0.4,ca;q=0.2
Cookie : AspNetCore.Antiforgery....
Host : mydeploymentname:80
Max-Forwards : 10
User-Agent : Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36
Upgrade-Insecure-Requests : 1
X-LiveUpgrade : 1
X-WAWS-Unencoded-URL : /
X-Original-URL : /
X-ARR-LOG-ID : 9c76e796-84a8-4335-919c-9ca4rb745f4fefdfde
DISGUISED-HOST : mywebsite.com
X-SITE-DEPLOYMENT-ID : mydeploymentname
WAS-DEFAULT-HOSTNAME : mydeploymentname.azurewebsites.net
X-Forwarded-For : IP:56548
MS-ASPNETCORE-TOKEN : a97b93ba-6106-4301-87b2-8af9a929d7dc
X-Original-For :
X-Original-Proto : http
I can get what I need from
But I am having problems with redirects to the login page: they go to the wrong URL, one that uses my deployment name.
I wonder whether I could have messed something up somewhere, but our last deployment was a few days ago and everything worked fine after that.
This is caused by a regression in AspNetCoreModule deployed to a small number of apps in Azure App Service. This issue is being investigated. Please follow this thread for status.
Here is a workaround you can use until the fix is deployed: in your Configure method (typically in startup.cs), add the following:
public void Configure(IApplicationBuilder app, IHostingEnvironment env, ILoggerFactory loggerFactory)
{
    app.Use((ctx, next) =>
    {
        string disguisedHost = ctx.Request.Headers["DISGUISED-HOST"];
        if (!String.IsNullOrWhiteSpace(disguisedHost))
        {
            ctx.Request.Host = new Microsoft.AspNetCore.Http.HostString(disguisedHost);
        }
        return next();
    });
    // Rest of your code here...
}
