IdHTTP returns HTTP/1.1 403 Forbidden, but THTTPClient works fine

I'm using IdHTTP for a simple Get request:
procedure TForm1.FormCreate(Sender: TObject);
const
  URL = 'https://www.udemy.com/course/the-modern-angular-bootcamp/';
begin
  IdSSLIOHandler.SSLOptions.Method := sslvSSLv23;
  IdHTTP1.Request.UserAgent :=
    'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0';
  Memo1.Lines.Text := IdHTTP1.Get(URL);
end;
The call to IdHTTP1.Get results in the error "HTTP/1.1 403 Forbidden".
A similar question (Why do I get "403 Forbidden" when I connect to whatismyip.com?) says the UserAgent should be set to a modern value. I have done so, but it does not help.
I'm using OpenSSL Win32 v1.0.2u with Delphi Tokyo on Windows 10.
Similar code using THttpClient works without any problem:
HttpClient := THTTPClient.Create;
try
  Response := HttpClient.Get(URL);
  Memo1.Lines.Text := Response.ContentAsString();
finally
  HttpClient.Free;
end;
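One difference between the two components worth ruling out (a diagnostic suggestion, not a confirmed cause): THTTPClient delegates TLS to the operating system, while Indy negotiates through the OpenSSL 1.0.2 DLLs, and sslvSSLv23 lets OpenSSL pick whichever protocol version it can. Pinning an explicit modern floor makes the negotiated version deterministic, so a failure shows up at the TLS layer rather than as an HTTP 403. The Python sketch below shows the idea; the Indy analogue would be setting SSLOptions.Method to sslvTLSv1_2 (available in current Indy releases).

```python
import ssl

# Sketch: require at least TLS 1.2 on the client side, so an incompatible
# server fails the handshake instead of returning an application-level error.
ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_2
print(ctx.minimum_version.name)  # TLSv1_2
```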

Related

Scrapy request not going through

I don't know exactly how to frame this question. I am a beginner at web scraping, and I am trying to crawl a website using Python Scrapy. The website is dynamic and uses JavaScript, and I can't retrieve any data using basic XPath and CSS selectors.
I am trying to mimic the site's API request in my spider by requesting the URL that returns the data as a JSON object. That request URL throws an "HTTP status code is not handled or not allowed" error.
I think I am calling the wrong URL. Nine times out of ten, this method of calling the JSON object's URL directly has worked for me. What can I do differently?
The URL has parameters and form data items in the headers section, and it doesn't even look like a valid website URL:
it starts with https://ih3kc909gb-dsn.algolia.net/1/indexes....
I know this is a long question, but I could really use some help getting a response for this.
You should use the start_requests() method instead of the start_urls property; you can read more about it here. Now, all you need to do is make a POST request.
Code
import scrapy


class CarswitchSpider(scrapy.Spider):
    name = 'car'
    headers = {
        "Connection": "keep-alive",
        "Pragma": "no-cache",
        "Cache-Control": "no-cache",
        "sec-ch-ua": "\" Not;A Brand\";v=\"99\", \"Google Chrome\";v=\"91\", \"Chromium\";v=\"91\"",
        "accept": "application/json",
        "sec-ch-ua-mobile": "?0",
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36",
        "content-type": "application/x-www-form-urlencoded",
        "Origin": "https://carswitch.com",
        "Sec-Fetch-Site": "cross-site",
        "Sec-Fetch-Mode": "cors",
        "Sec-Fetch-Dest": "empty",
        "Referer": "https://carswitch.com/",
        "Accept-Language": "en-US,en;q=0.9",
    }
    body = '{"params":"query=&hitsPerPage=24&page=0&numericFilters=%5B%22country_id%3D1%22%2C%22used_car%20%3D%201%22%5D&facetFilters=&typoTolerance=&tagFilters=%5B%5D&attributesToHighlight=%5B%5D&attributesToRetrieve=%5B%22make%22%2C%22make_ar%22%2C%22model%22%2C%22model_ar%22%2C%22year%22%2C%22trim%22%2C%22displayTrim%22%2C%22colorPaint%22%2C%22bodyType%22%2C%22salePrice%22%2C%22transmissionType%22%2C%22GPS%22%2C%22carID%22%2C%22inspectionID%22%2C%22inspectionStatus%22%2C%22rate%22%2C%22certified_dealer_id%22%2C%22dealer_category%22%2C%22used_car%22%2C%22new%22%2C%22top_condition%22%2C%22featured%22%2C%22photo%22%2C%22modifiedPlace%22%2C%22city%22%2C%22mileage%22%2C%22urgent_sales%22%2C%22price_dropped%22%2C%22urgent_sales_days%22%2C%22urgent_sales_end_date%22%2C%22date%22%2C%22negotiable%22%2C%22oldPrice%22%2C%22zero_downpayment%22%2C%22cashOnly%22%2C%22hasPriceGuidance%22%2C%22dealerOffer%22%2C%22maxPrice%22%2C%22fairPrice%22%2C%22pricey_deal%22%2C%22fair_deal%22%2C%22good_deal%22%2C%22great_deal%22%2C%22dealership_info%22%2C%22logo_small%22%2C%22GCCspecs%22%2C%22country%22%2C%22export%22%2C%22monthly_price%22%5D"}'

    def start_requests(self):
        url = 'https://ih3kc909gb-dsn.algolia.net/1/indexes/All_Carswitch_Cars/query?x-algolia-agent=Algolia%20for%20JavaScript%20(3.33.0)%3B%20Browser&x-algolia-application-id=IH3KC909GB&x-algolia-api-key=493a9bbc57331df3b278fa39c1dd8f2d'
        yield scrapy.Request(url=url, method='POST', headers=self.headers,
                             body=self.body, callback=self.parse)

    def parse(self, response):
        print(response.body)
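Since the Algolia endpoint answers with JSON, the parse callback can decode the body rather than printing raw bytes. A minimal sketch of the decoding step; the field names used here ("hits", "make", "model", "year") are assumptions based on Algolia's usual response shape and the attributesToRetrieve list above:

```python
import json

# Stand-in for response.body; a real body would come from the request above.
# In the spider itself, Scrapy >= 2.2 also offers response.json() directly.
sample_body = b'{"hits": [{"make": "Nissan", "model": "Sunny", "year": 2016}], "nbHits": 1}'

data = json.loads(sample_body)
for hit in data["hits"]:
    print(hit["make"], hit["model"], hit["year"])
```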

HTTP Get request to IP-based host using Indy

I have some Delphi code that connects to a servlet, and I'm trying to switch from TIdTCPClient to TIdHTTP.
I connect to the servlet this way
try
  lHTTP := TIdHTTP.Create( nil );
  responseStream := TMemoryStream.Create;
  lHTTP.Get(HttpMsg, responseStream);
  SetString( html, PAnsiChar(responseStream.Memory), responseStream.Size );
  AnotarMensaje( odDepurar, 'IMPFIS: Impresora fiscal reservada ' + html );
Where HttpMsg is localhost:6080/QRSRPServer/PedirImpresion?usuarioDMS=hector
All I'm getting is
GET localhost:6080/QRSRPServer/PedirImpresion?usuarioDMS=hector HTTP/1.1
Content-Type: text/html
Accept: text/html, */*
User-Agent: Mozilla/3.0 (compatible; Indy Library)
HTTP/1.1 400 Bad Request
The HTTP dialog that I had before was like this
GET /QRSRPServer/PedirImpresion?usuarioDMS=hector HTTP/1.1
Host: localhost:6080
HTTP/1.1 200 OK
So, I try to add the Host header, with this host: localhost:6080
try
  lHTTP := TIdHTTP.Create( nil );
  lHTTP.Host := Host;
  responseStream := TMemoryStream.Create;
  lHTTP.Get(HttpMsg, responseStream);
  SetString( html, PAnsiChar(responseStream.Memory), responseStream.Size );
  AnotarMensaje( odDepurar, 'IMPFIS: Impresora fiscal reservada ' + html );
And I get
Socket Error # 11004
Where HttpMsg is localhost:6080/QRSRPServer/PedirImpresion?usuarioDMS=hector
HttpMsg must begin with http:// or https://:
http://localhost:6080/QRSRPServer/PedirImpresion?usuarioDMS=hector
You should be getting an EIdUnknownProtocol exception raised when TIdHTTP parses the URL and sees the missing protocol scheme.
TIdHTTP always sends a Host header, especially for an HTTP 1.1 request, yet you claim it is not being sent. That is why you are getting a Bad Request error: HTTP 1.1 servers are required to reject a request that omits that header.
You also claim that TIdHTTP is including the host and port values in the GET line. The ONLY time it ever does that is when connecting to a host through an HTTP proxy, but I don't see you configuring the TIdHTTP.ProxyParams property at all.
In short, TIdHTTP should not be behaving the way you describe.
The correct solution is to make sure you are passing a full URL to TIdHTTP.Get().
On a side note, your code requires html to be an AnsiString. You should change it to a standard string (which is AnsiString in D2007 and earlier) and let TIdHTTP return a string for you, then you don't need the TMemoryStream anymore:
html := lHTTP.Get(HttpMsg);
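The failure mode generalizes beyond Indy: without a scheme, a URL parser cannot reliably tell the host apart from the rest of the string. A quick illustration with Python's urllib (used here only to show the parsing behaviour, not as part of the Delphi fix):

```python
from urllib.parse import urlparse

# Scheme-less: the host never lands in netloc, so a client has nothing to
# connect to (Indy raises EIdUnknownProtocol in the same situation).
bad = urlparse('localhost:6080/QRSRPServer/PedirImpresion?usuarioDMS=hector')
print(repr(bad.netloc))   # ''

# With the scheme present, host and port are recovered correctly.
good = urlparse('http://localhost:6080/QRSRPServer/PedirImpresion?usuarioDMS=hector')
print(good.netloc)        # localhost:6080
```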
It was easier than I thought. I was assuming that having a "Host" parameter that included the port would be enough, but looking at a Wireshark capture I saw it was sending everything over the standard HTTP port.
So this did the trick
try
  lHTTP := TIdHTTP.Create( nil );
  lHTTP.Host := GatewayIp;
  lHTTP.Port := GatewayPuerto;
  responseStream := TMemoryStream.Create;
  lHTTP.Request.CustomHeaders.Clear;
  lHTTP.Request.CustomHeaders.Add('Host: ' + Host);
  lHTTP.Get(HttpMsg, responseStream);
  SetString( html, PAnsiChar(responseStream.Memory), responseStream.Size );
  AnotarMensaje( odDepurar, 'IMPFIS: Impresora fiscal reservada ' + html );

ASP.NET Core Azure App Service httpContext.Request.Headers["Host"] Value

I ran into some strange behaviour today.
We are hosting asp.net core 1.1 web app with Azure App Services and using subdomains that route to a specific controller or area.
So in my SubdomainConstraint: IRouteConstraint I use
HttpContext.Request.Headers["Host"]
to get the host name. That previously returned something like
mywebsite.com or subdomain.mywebsite.com
Starting today (or maybe yesterday) it started to return my App Service name instead of the host name. On localhost everything works fine.
Enumerating through
Context.Request.Headers
in one of my Views gives me on localhost:
Accept :
text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding : gzip, deflate, sdch, br
Accept-Language : ru-RU,ru;q=0.8,en-US;q=0.6,en;q=0.4,ca;q=0.2
Cookie : .AspNetCore.Antiforgery....
Host : localhost:37202
User-Agent : Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36
(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36
Upgrade-Insecure-Requests : 1
in Azure App Service:
Connection : Keep-Alive
Accept : text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding : gzip, deflate, sdch
Accept-Language : ru-RU,ru;q=0.8,en-US;q=0.6,en;q=0.4,ca;q=0.2
Cookie : AspNetCore.Antiforgery....
Host : mydeploymentname:80
Max-Forwards : 10
User-Agent : Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36
Upgrade-Insecure-Requests : 1
X-LiveUpgrade : 1
X-WAWS-Unencoded-URL : /
X-Original-URL : /
X-ARR-LOG-ID : 9c76e796-84a8-4335-919c-9ca4rb745f4fefdfde
DISGUISED-HOST : mywebsite.com
X-SITE-DEPLOYMENT-ID : mydeploymentname
WAS-DEFAULT-HOSTNAME : mydeploymentname.azurewebsites.net
X-Forwarded-For : IP:56548
MS-ASPNETCORE-TOKEN : a97b93ba-6106-4301-87b2-8af9a929d7dc
X-Original-For : 127.0.0.1:55602
X-Original-Proto : http
I can get what I need from
Headers["DISGUISED-HOST"]
But I'm having problems with redirects to the login page: it redirects to a wrong URL containing my deployment name.
I wondered whether I had messed something up somewhere, but we made our last deployment a few days ago and it worked fine after that.
This is caused by a regression in AspNetCoreModule deployed to a small number of apps in Azure App Service. This issue is being investigated. Please follow this thread for status.
Here is a workaround you can use until the fix is deployed: in your Configure method (typically in startup.cs), add the following:
public void Configure(IApplicationBuilder app, IHostingEnvironment env, ILoggerFactory loggerFactory)
{
    app.Use((ctx, next) =>
    {
        string disguisedHost = ctx.Request.Headers["DISGUISED-HOST"];
        if (!String.IsNullOrWhiteSpace(disguisedHost))
        {
            ctx.Request.Host = new Microsoft.AspNetCore.Http.HostString(disguisedHost);
        }
        return next();
    });

    // Rest of your code here...
}

Convert XHR (XML Http Request) into R command

I am trying to turn an XHR (XMLHttpRequest) request into an R command.
I am using the following code:
library(httr)
x <- POST("https://transparency.entsoe.eu/generation/r2/actualGenerationPerGenerationUnit/getDataTableDetailData/?name=&defaultValue=false&viewType=TABLE&areaType=BZN&atch=false&dateTime.dateTime=17.03.2017+00%3A00%7CUTC%7CDAYTIMERANGE&dateTime.endDateTime=17.03.2017+00%3A00%7CUTC%7CDAYTIMERANGE&area.values=CTY%7C10YBE----------2!BZN%7C10YBE----------2&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&dateTime.timezone=UTC&dateTime.timezone_input=UTC&dv-datatable-detail_22WAMERCO000010Y_22WAMERCO000008L_length=10&dv-datatable_length=50&detailId=22WAMERCO000010Y_22WAMERCO000008L",
user_agent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.50 Safari/537.36"),
add_headers(`Referer`="https://transparency.entsoe.eu/generation/r2/actualGenerationPerGenerationUnit/show?name=&defaultValue=true&viewType=TABLE&areaType=BZN&atch=false&dateTime.dateTime=17.03.2017+00:00|UTC|DAYTIMERANGE&dateTime.endDateTime=17.03.2017+00:00|UTC|DAYTIMERANGE&area.values=CTY|10YBE----------2!BZN|10YBE----------2&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&productionType.values=B20&dateTime.timezone=UTC&dateTime.timezone_input=UTC&dv-datatable_length=100",
Connection = "keep-alive",
Host = "https://transparency.entsoe.eu/",
Accept = "application/json, text/javascript, */*; q=0.01",
`Accept-Encoding` = "gzip, deflate, br",
Origin = "https://transparency.entsoe.eu",
`X-Requested-With` = "XMLHttpRequest",
`Content-Type` = "application/json;charset=UTF-8",
`Accept-Language`= "en-US,en;q=0.8,nl;q=0.6,fr-FR;q=0.4,fr;q=0.2"))
But I keep getting a 400 Bad Request error instead of the 200 that would mark a successful response.
I've extracted the values via the Chrome network monitor from this website. The XHR request is sent when the plus button is clicked. I can send it repeatedly from my browser, but it doesn't seem to work from R.
What am I doing wrong in creating the Post request?
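One likely culprit is visible in the call itself: Host = "https://transparency.entsoe.eu/" is not a valid Host header value. A Host header carries only host[:port], with no scheme or trailing slash, and httr (via libcurl) fills it in from the request URL anyway, so the override can simply be dropped. A small sketch of what the header should contain, using the URL from the POST above:

```python
from urllib.parse import urlparse

# A Host header carries only host[:port]; scheme and path never belong in it.
# HTTP clients derive it from the request URL, so overriding it with a full
# URL ("https://transparency.entsoe.eu/") sends an invalid value.
url = ('https://transparency.entsoe.eu/generation/r2/'
       'actualGenerationPerGenerationUnit/getDataTableDetailData/')
print(urlparse(url).netloc)   # transparency.entsoe.eu
```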

How to avoid some sites rejecting HTTP get using go

We have a script that on a daily basis checks all of the web links in all of our database records (the users want notifications when a link becomes out of date).
There are a couple of sites that work fine through a web browser from this IP address, but when fetched with Go, they either disconnect before completing the request or return an HTTP authorisation-denied message.
I am assuming some sort of firewall (F5) is filtering/blocking the request. This occurs even when I change the HTTP request to use a common user agent. What can we do to ensure a Go request looks like a standard browser's?
func fetch_url(url string, d time.Duration) (int, error) {
    client := &http.Client{
        Timeout: d,
    }
    req, err := http.NewRequest("GET", url, nil)
    if err != nil {
        return 0, err
    }
    req.Header.Set("User-Agent", "Mozilla/5.0 (iPad; CPU OS 7_0 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A465 Safari/9537.53")
    resp, err := client.Do(req)
    if err != nil {
        return 0, err
    }
    status := resp.StatusCode
    resp.Body.Close()
    return status, nil
}
Try matching the exact headers from a request from your web browser to eliminate other factors. A smart firewall could have heuristics on what looks like a web browser versus a robot.
Notice that the Go http client sends only a minimal HTTP request:
GET /foo HTTP/1.1
Host: localhost:3030
User-Agent: Go 1.1 package http
Accept-Encoding: gzip
Whereas a web browser is more chatty:
GET /foo HTTP/1.1
Host: localhost:3030
Connection: keep-alive
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.89 Safari/537.36
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8
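Acting on the comparison means copying the browser's header set onto the programmatic request, one header at a time (in the Go code above, one req.Header.Set call per header). A sketch of the same idea with Python's urllib, building but not sending the request; the localhost URL is the placeholder from the example dialogs:

```python
from urllib.request import Request

# The browser-style headers from the "chatty" request above.
browser_headers = {
    'Connection': 'keep-alive',
    'Accept': 'text/html,application/xhtml+xml,application/xml;'
              'q=0.9,image/webp,*/*;q=0.8',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/41.0.2272.89 Safari/537.36',
    'Accept-Encoding': 'gzip, deflate, sdch',
    'Accept-Language': 'en-US,en;q=0.8',
}

# Build the request object; urllib normalizes header names internally
# (e.g. 'User-Agent' is stored as 'User-agent').
req = Request('http://localhost:3030/foo', headers=browser_headers)
print(req.get_header('User-agent'))
```

Whether this defeats a given firewall is not guaranteed; heuristics may also consider TLS fingerprints, header order, or request timing, which header copying alone does not change.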
