Python requests POST returns 400 instead of 200 - python-requests

I'm expecting 200, but 400 gets returned.
Does anyone see what I'm doing wrong in my request?
Code:
import requests
import json
import lxml.html
from lxml.cssselect import CSSSelector
from lxml.etree import fromstring

SELECTOR = CSSSelector('[type=hidden]')

BASE_URL = 'https://www.bonuscard.ch/myos/en/login'
LOGIN_URL = BASE_URL+'1.IFormSubmitListener-homePanel-loginPanel-loginForm'

# headers copied from chromium (returns 200)
headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en,de;q=0.9",
    "Cache-Control": "no-cache",
    "Connection": "keep-alive",
    "Content-Length": "151",
    "Content-Type": "application/x-www-form-urlencoded",
    "DNT": "1",
    "Host": "www.bonuscard.ch",
    "Origin": "https://www.bonuscard.ch",
    "Pragma": "no-cache",
    "Referer": "https://www.bonuscard.ch/myos/en/login",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36"
}

with requests.Session() as session:
    response = session.get(BASE_URL)
    tree = lxml.html.fromstring(response.content)
    keyOnly_token = [e.get('id') for e in SELECTOR(tree)][0]

    payload = {
        keyOnly_token: "",
        "userName-border:userName-border_body:userName ": "jon@doe.com",
        "password-border:password-border_body:password ": "123",
        "login ": ""
    }

    response = session.post(LOGIN_URL, headers=headers, data=payload)
    # Returns 400
    print(response)
These changes made no difference either:
POST without headers
POST with json=payload instead of data=payload
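(Note: data=payload sends the body as application/x-www-form-urlencoded, while json=payload sends a JSON body with Content-Type: application/json, so for a form endpoint like this one switching to json=payload would not be expected to help.)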

Thanks to Ivan's pointer I found this curl-to-requests converter, which was the solution: https://curl.trillworks.com/#
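For reference, the workflow is: in Chromium's Network panel, right-click the successful login request, choose "Copy as cURL", and paste it into the converter, which emits ready-to-run requests code including the browser's cookies and form fields. A rough sketch of the shape of that output is below; the cookie name, the hidden token field and the URL are placeholders, not the converter's actual output for this site.

import requests

# Sketch of a "Copy as cURL" -> requests conversion (placeholder values).
login_url = "https://www.bonuscard.ch/myos/en/login"  # placeholder; use the exact form-submit URL from the copied cURL
cookies = {
    "SESSIONID": "...",  # hypothetical; the converter fills in the real browser cookies
}
headers = {
    "Content-Type": "application/x-www-form-urlencoded",
    "Referer": "https://www.bonuscard.ch/myos/en/login",
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36",
}
data = {
    "hidden_token_field": "",  # hypothetical; the hidden input captured from the real form post
    "userName-border:userName-border_body:userName": "jon@doe.com",
    "password-border:password-border_body:password": "123",
    "login": "",
}
response = requests.post(login_url, headers=headers, cookies=cookies, data=data)
print(response.status_code)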

Related

How to scrape url links when the website takes us to a splash screen?

import requests
from bs4 import BeautifulSoup
import re

R = []
url = "https://ascscotties.com/"
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; '
                         'Intel Mac OS X 10.6; rv:16.0) Gecko/20100101 Firefox/16.0'}
reqs = requests.get(url, headers=headers)
soup = BeautifulSoup(reqs.text, 'html.parser')
links = soup.find_all('a', href=re.compile("roster"))
s = [url + link.get("href") for link in links]
for i in s:
    r = requests.get(i, allow_redirects=True, headers=headers)
    if r.status_code < 400:
        R.append(r.url)
Output
['https://ascscotties.com/sports/womens-basketball/roster',
'https://ascscotties.com/sports/womens-cross-country/roster',
'https://ascscotties.com/sports/womens-soccer/roster',
'https://ascscotties.com/sports/softball/roster',
'https://ascscotties.com/sports/womens-tennis/roster',
'https://ascscotties.com/sports/womens-volleyball/roster']
The code looks for roster links from the URLs and gives the output above, but for a site like "https://auyellowjackets.com/" it fails because the URL takes us to a splash screen. What can be done?
The site uses a cookie to indicate it has shown a splash screen before. So set it to get to the main page:
import re

import requests
from bs4 import BeautifulSoup

R = []
url = "https://auyellowjackets.com"
cookies = {"splash_2": "splash_2"}  # <--- set cookie
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; "
    "Intel Mac OS X 10.6; rv:16.0) Gecko/20100101 Firefox/16.0"
}
reqs = requests.get(url, headers=headers, cookies=cookies)
soup = BeautifulSoup(reqs.text, "html.parser")
links = soup.find_all("a", href=re.compile("roster"))
s = [url + link.get("href") for link in links]
for i in s:
    r = requests.get(i, allow_redirects=True, headers=headers)
    if r.status_code < 400:
        R.append(r.url)
print(*R, sep="\n")
Prints:
https://auyellowjackets.com/sports/mens-basketball/roster
https://auyellowjackets.com/sports/mens-cross-country/roster
https://auyellowjackets.com/sports/football/roster
https://auyellowjackets.com/sports/mens-track-and-field/roster
https://auyellowjackets.com/sports/mwrest/roster
https://auyellowjackets.com/sports/womens-basketball/roster
https://auyellowjackets.com/sports/womens-cross-country/roster
https://auyellowjackets.com/sports/womens-soccer/roster
https://auyellowjackets.com/sports/softball/roster
https://auyellowjackets.com/sports/womens-track-and-field/roster
https://auyellowjackets.com/sports/volleyball/roster
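A note on finding the cookie in the first place: one way is to open the site in a browser, let the splash screen play, and then inspect the stored cookies in the developer tools (Application → Cookies in Chrome); whatever name and value the splash page sets is what needs to be sent along with requests, as splash_2 is above.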

How to web scrape AQI from airnow?

I am trying to scrape the current AQI for my location with BeautifulSoup 4.
url = "https://www.airnow.gov/?city=Burlingame&state=CA&country=USA"
header = {
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36",
"Accept-Language": "en-GB,en-US;q=0.9,en;q=0.8"
}
response = requests.get(url, headers=header)
soup = BeautifulSoup(response.content, "lxml")
aqi = soup.find("div", class_="aqi")
When I print aqi, it is just an empty div.
However, on the website there should be an element inside this div containing the AQI number that I want.
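The div comes back empty because the AQI value is filled in by JavaScript after the initial page load, so requests and BeautifulSoup only ever see the empty placeholder markup. One option is to render the page in a real browser; below is a minimal Selenium sketch under that assumption (the aqi class name is taken from the question, and the 20-second timeout is arbitrary):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

url = "https://www.airnow.gov/?city=Burlingame&state=CA&country=USA"

driver = webdriver.Chrome()  # assumes a local Chrome/chromedriver setup
try:
    driver.get(url)
    # Poll until the element with class "aqi" has non-empty text,
    # i.e. the JavaScript has filled in the number.
    aqi_text = WebDriverWait(driver, 20).until(
        lambda d: d.find_element(By.CLASS_NAME, "aqi").text.strip()
    )
    print(aqi_text)
finally:
    driver.quit()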

Implement curl request in R

I am trying to access the Amadeus travel API.
To obtain a token, the given curl command is:
curl "https://test.api.amadeus.com/v1/security/oauth2/token" \
-H "Content-Type: application/x-www-form-urlencoded" \
-d "grant_type=client_credentials&client_id={client_id}&client_secret={client_secret}"
My R script attempt is:
library("httr")
# Get Token
response <- POST("https://test.api.amadeus.com/v1/security/oauth2/token",
                 add_headers("Content-Type" = "application/x-www-form-urlencoded"),
                 body = list(
                   "grant_type" = "client_credentials",
                   "client_id" = API_KEY,
                   "client_secret" = API_SECRET),
                 encode = "json")
response
rsp_content <- content(response, as = "parsed", type = "application/json")
rsp_content
Resulting in the error:
Response [https://test.api.amadeus.com/v1/security/oauth2/token]
Date: 2021-07-23 00:59
Status: 400
Content-Type: application/json
Size: 217 B
{
"error":"invalid_request",
"error_description": "Mandatory grant_type form parameter missing",
"code": 38187,
"title": "Invalid parameters"
}
>
What is the correct way to call this API to obtain a token using R?
The curl -d option is used to send data in the same way an HTML form would. To match that format, use encode="form" rather than encode="json" in the call to POST().
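For comparison with the requests examples above, the equivalent Python call sends a form-encoded body whenever the payload goes through data= (a minimal sketch; the key and secret values are placeholders):

import requests

# data= sends the body as application/x-www-form-urlencoded, matching curl -d
response = requests.post(
    "https://test.api.amadeus.com/v1/security/oauth2/token",
    data={
        "grant_type": "client_credentials",
        "client_id": "YOUR_API_KEY",         # placeholder
        "client_secret": "YOUR_API_SECRET",  # placeholder
    },
)
print(response.status_code)
print(response.json())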

I have a Scrapy script, but I cannot scrape data and I don't know why

I run the script, but I get None, even though the data is there on the URL.
# -*- coding: utf-8 -*-
import scrapy
from scrapy.selector import Selector


class GetSpider(scrapy.Spider):
    name = 'gets'
    start_urls = ['https://www.retailmenot.com/coupons/insurance?u=ZTF65B5PJZEU3JDF326WY2SXOQ']

    def parse(self, response):
        s = Selector(response)
        code = s.xpath("//button[contains(@class,'CopyCode')][1]/text()").get()
        yield {'code': code}
I expect 52YR, but I get None.
The easiest way to go about this is probably to load the JSON in the page's script tag as a Python dictionary and navigate through it to get to the codes.
The code below should get you started:
import scrapy
import json
import logging


class GetSpider(scrapy.Spider):
    name = 'gets'
    start_urls = ['https://www.retailmenot.com/coupons/insurance?u=ZTF65B5PJZEU3JDF326WY2SXOQ']
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36',
    }
    custom_settings = {'ROBOTSTXT_OBEY': False}

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(url,
                                 callback=self.parse,
                                 headers=self.headers,
                                 dont_filter=True)

    def parse(self, response):
        script = response.xpath(
            '//script[contains(text(), "__NEXT_DATA__")]/text()'
        ).extract_first()
        dict_start_index = script.index('{')
        dict_end_index = script.index('};') + 1
        data = json.loads(script[dict_start_index:dict_end_index])
        coupon_data = data['props']['pageProps']['serverState']['apollo']['data']
        for key, value in coupon_data.items():
            try:
                code = value['code']
            except KeyError:
                logging.debug("no code found")
            else:
                yield {'code': code}
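The spider can then be run with Scrapy's command line, e.g. scrapy runspider get_spider.py -o codes.json (the file names here are just examples); each yielded {'code': ...} item becomes one entry in the output file.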

nginx keeps passing the same http_cookie to uwsgi

I have a small Python app running via uWSGI, with requests served by nginx.
I'm printing the environment variables, and it looks like after a couple of OK requests, nginx is sending the same HTTP_COOKIE param for unrelated requests:
For example:
{'UWSGI_CHDIR': '/ebs/py', 'HTTP_COOKIE':
'ge_t_c=4fcee8450c3bee709800920c', 'UWSGI_SCRIPT': 'server',
'uwsgi.version': '1.1.2', 'REQUEST_METHOD': 'GET', 'PATH_INFO':
'/redirect/ebebaf3b-475a-4010-9a72-96eeff797f1e', 'SERVER_PROTOCOL':
'HTTP/1.1', 'QUERY_STRING': '', 'x-wsgiorg.fdevent.readable':
, 'CONTENT_LENGTH': '',
'uwsgi.ready_fd': None, 'HTTP_USER_AGENT': 'Mozilla/5.0 (compatible;
MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)', 'HTTP_CONNECTION':
'close', 'HTTP_REFERER': 'http://www.facebook.com/', 'SERVER_NAME':
'pixel.domain.com', 'REMOTE_ADDR': '10.load.bal.ip',
'wsgi.url_scheme': 'http', 'SERVER_PORT': '80', 'wsgi.multiprocess':
True, 'uwsgi.node': 'py.domain.com', 'DOCUMENT_ROOT':
'/etc/nginx/html', 'UWSGI_PYHOME': '/ebs/py', 'uwsgi.core': 127,
'HTTP_X_FORWARDED_PROTO': 'http', 'x-wsgiorg.fdevent.writable':
, 'wsgi.input':
,
'HTTP_HOST': 'track.domain.com', 'wsgi.multithread': False,
'REQUEST_URI': '/redirect/ebebaf3b-475a-4010-9a72-96eeff797f1e',
'HTTP_ACCEPT': 'text/html, application/xhtml+xml, /',
'wsgi.version': (1, 0), 'x-wsgiorg.fdevent.timeout': None,
'HTTP_X_FORWARDED_FOR': '10.load.bal.ip', 'wsgi.errors': , 'REMOTE_PORT': '36462',
'HTTP_ACCEPT_LANGUAGE': 'en-US', 'wsgi.run_once': False,
'HTTP_X_FORWARDED_PORT': '80', 'CONTENT_TYPE': '',
'wsgi.file_wrapper': ,
'HTTP_ACCEPT_ENCODING': 'gzip, deflate'}
and
{'UWSGI_CHDIR': '/ebs/py', 'HTTP_COOKIE':
'ge_t_c=4fcee8450c3bee709800920c', 'UWSGI_SCRIPT': 'server',
'uwsgi.version': '1.1.2', 'REQUEST_METHOD': 'GET', 'PATH_INFO':
'/redirect/2391e658-95ef-4300-80f5-83dbb1a0e526', 'SERVER_PROTOCOL':
'HTTP/1.1', 'QUERY_STRING': '', 'x-wsgiorg.fdevent.readable':
, 'CONTENT_LENGTH': '',
'uwsgi.ready_fd': None, 'HTTP_USER_AGENT': 'Mozilla/5.0 (iPad; CPU OS
5_1_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko)
Version/5.1 Mobile/9B206 Safari/7534.48.3', 'HTTP_CONNECTION':
'close', 'HTTP_REFERER': 'http://www.facebook.com/', 'SERVER_NAME':
'pixel.domain.com', 'REMOTE_ADDR': '10.load.balancer.ip',
'wsgi.url_scheme': 'http', 'SERVER_PORT': '80', 'wsgi.multiprocess':
True, 'uwsgi.node': 'py.domain.com', 'DOCUMENT_ROOT':
'/etc/nginx/html', 'UWSGI_PYHOME': '/ebs/py', 'uwsgi.core': 127,
'HTTP_X_FORWARDED_PROTO': 'http', 'x-wsgiorg.fdevent.writable':
, 'wsgi.input':
,
'HTTP_HOST': 'fire.domain.com', 'wsgi.multithread': False,
'REQUEST_URI': '/redirect/2391e658-95ef-4300-80f5-83dbb1a0e526',
'HTTP_ACCEPT':
'text/html,application/xhtml+xml,application/xml;q=0.9,/;q=0.8',
'wsgi.version': (1, 0), 'x-wsgiorg.fdevent.timeout': None,
'HTTP_X_FORWARDED_FOR': '10.load.bal.ip', 'wsgi.errors': , 'REMOTE_PORT': '39498',
'HTTP_ACCEPT_LANGUAGE': 'en-us', 'wsgi.run_once': False,
'HTTP_X_FORWARDED_PORT': '80', 'CONTENT_TYPE': '',
'wsgi.file_wrapper': ,
'HTTP_ACCEPT_ENCODING': 'gzip, deflate'}
These are 2 distinct clients. I opened an incognito session, confirmed that no cookie was sent in the headers, and the uwsgi log shows that it received the same HTTP_COOKIE.
How can I make sure that nginx only passes the proper information for the current request, without regard to other requests?
Figured it out...
I had to add this line to uwsgi_params in /etc/nginx/:
uwsgi_param HTTP_COOKIE $http_cookie;
Without it, the HTTP_COOKIE variable could not be trusted in the uwsgi/Python app.
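For context: uwsgi_params is the file nginx pulls in with include uwsgi_params; in the location block that does uwsgi_pass, and each uwsgi_param directive defines a variable that nginx forwards to uWSGI. $http_cookie is nginx's per-request variable holding the incoming Cookie header, so the line above ties HTTP_COOKIE explicitly to the current request.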
