I want to use requests to scrape a site that requires a login. I have already written the code using Selenium, but that approach is inconvenient and slower, and since I want to make it public, every user would have to download ChromeDriver.
The problem is that the site makes multiple requests, and I don't have any experience processing that data or extracting the header data and names. Any help is great, thanks.
[Premise]
Using the requests module, you can send requests in this way:
import requests
url = "http://www.example.com" # request url
headers = {  # headers dict to send in request
    "header_name": "headers_value",
}
params = {  # params to be encoded in the url
    "param_name": "param_value",
}
data = {  # data to send in the request body
    "data_name": "data_value",
}
# Send GET request.
requests.get(url, params=params, headers=headers)
# Send POST request.
requests.post(url, params=params, headers=headers, data=data)
Once you perform a request, you can get a lot of information from the response object:
>>> import requests
>>> # We perform a request and get the response object.
>>> response = requests.get(url, params=params, headers=headers)
>>> response = requests.post(url, params=params, headers=headers, data=data)
>>> response.status_code  # server response status code
200  # e.g.
>>> response.request.method
'GET'  # or 'POST', depending on the request
>>> response.request.headers  # headers you sent with the request
{'Accept-Encoding': 'gzip, deflate, br'}  # e.g.
>>> response.request.url  # sent request url
'http://www.example.com'
>>> response.request.body  # body you sent with the request
'name=value&name2=value2'  # e.g.
In conclusion, you can retrieve all the information that you can find in Dev Tools in the browser, from the response object. You need nothing else.
Once you send a GET or POST request, you can retrieve the same information from Dev Tools:
In General:
Request URL: the url you sent the request to. Corresponds to response.request.url
Request Method: corresponds to response.request.method
Status Code: corresponds to response.status_code
In Response Headers:
You find response headers which correspond to response.headers
eg. Connection: Keep-Alive,
Content-Length: 0,
Content-Type: text/html; charset=UTF-8...
In Request Headers:
You find request headers which correspond to response.request.headers
In Form Data:
You can find the data you passed with data keyword in requests.post.
Corresponds to response.request.body
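For the login case in the question, a rough sketch of how this could look with requests.Session, which keeps cookies between requests. The URL, the form field names and the User-Agent value below are hypothetical; the real ones have to be copied from the Request URL and Form Data entries that Dev Tools shows for the login request. The request is only prepared here, not sent, so the method, URL, headers and body can be compared with Dev Tools before going to the network:

```python
import requests

# Hypothetical values: copy the real URL and field names from the
# "Request URL" and "Form Data" entries Dev Tools shows for the login request.
LOGIN_URL = "https://www.example.com/login"


def build_login_request(session, username, password):
    """Prepare (but do not send) the login POST so it can be inspected."""
    request = requests.Request(
        "POST",
        LOGIN_URL,
        data={"username": username, "password": password},
    )
    # prepare_request merges the session's headers and cookies into the request.
    return session.prepare_request(request)


session = requests.Session()
session.headers.update({"User-Agent": "my-scraper/0.1"})  # hypothetical value
prepared = build_login_request(session, "alice", "secret")

# The same fields Dev Tools lists under General, Request Headers and Form Data.
print(prepared.method, prepared.url)
print(dict(prepared.headers))
print(prepared.body)

# To actually log in, send the prepared request; cookies set by the server
# are stored on the session and reused by later calls on it:
# response = session.send(prepared)
# page = session.get("https://www.example.com/protected")  # hypothetical URL
```

If the site sends several requests during login, the same comparison can be repeated for each of them in the order Dev Tools lists them.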
def get_all_patent():
    patent_list = []
    for i in range(100):
        res = requests.get(url).text
        patent_list.append(res)
    return patent_list
Scrapy does not return the response directly from a Request (reference: How can I get the response from the Request in Scrapy?).
I want to extend the variable patent_list, but I can't get the response body.
Can I do this through the Request meta, or do something in the Response?
Can anyone help me get partial data from a URL? This code gives me a "failed to decode" error:
import requests
url = "http://tools.ietf.org/rfc/rfc2822.txt"
start=24
end=30
headers = {"Range":f"bytes={start}-{end}"}
r = requests.get(url,stream=True,headers=headers)
print(r.text)
You ask for a range of the resource, and the server responds with it, but because of the default header Accept-Encoding: gzip, deflate, the server sends back bytes of the encoded (compressed) resource. To retrieve the non-encoded bytes, remove that header by setting it to None:
import requests
url = "http://tools.ietf.org/rfc/rfc2822.txt"
start=24
end=30
headers = {"Range":f"bytes={start}-{end}", "Accept-Encoding": None}
r = requests.get(url,stream=True,headers=headers)
print(r.text)
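A small sketch of what the None value does, without contacting the server: it prepares the request through a Session (which is what requests.get uses internally) and compares the headers with and without the override. Only the header handling is shown; whether the server honours the range is not checked here.

```python
import requests

url = "http://tools.ietf.org/rfc/rfc2822.txt"
session = requests.Session()

# Default behaviour: the session adds its own Accept-Encoding header.
default = session.prepare_request(
    requests.Request("GET", url, headers={"Range": "bytes=24-30"})
)

# Setting the header to None drops it when session and request headers are merged.
plain = session.prepare_request(
    requests.Request(
        "GET", url, headers={"Range": "bytes=24-30", "Accept-Encoding": None}
    )
)

print("default:", default.headers.get("Accept-Encoding"))
# Prints None here: the header is absent from what requests prepares.
print("override:", plain.headers.get("Accept-Encoding"))
print("range:", plain.headers["Range"])
```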
I am trying to send an API call to get the time from the Questrade platform. Here is the sample request from their guide
GET /v1/time HTTP/1.1
Host: https://api01.iq.questrade.com
Authorization: Bearer C3lTUKuNQrAAmSD/TPjuV/HI7aNrAwDp
I am able to get it working with the requests module:
headers = {'Authorization': f'{token_type} {access_token}'}
print(headers)  # -> {'Authorization': 'Bearer -xSoUNCLYCrFjxxxxx_wAQVpi4olWrQs0'}
qt_time_obj = requests.get(api_server + 'v1/time', headers=headers)
qt_time = qt_time_obj.json()['time']
print(qt_time)  # -> 2020-10-13T17:06:32.388000-04:00
Now I am trying to get urllib3 to work but without luck
headers = {'Authorization': f'{token_type} {access_token}'}
url = api_server + 'v1/time'
http = urllib3.PoolManager()
qt_time_obj = http.urlopen('GET', url, headers)
print(qt_time_obj.status)  # -> 401
print(qt_time_obj.data)  # -> b'{"code":1014,"message":"Missing authorization header"}'
I also tried with the make_headers method but it gives me the same error.
headers = urllib3.make_headers(basic_auth="Authorization: Bearer AdKt3YUl46_tGnZp7cRgTu4W2vtfBME50")
Could you point out what I did wrong? Thank you!
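After some trying, I found that I need to use http.request instead of http.urlopen, and that I need to pass headers=headers as a keyword argument instead of just headers positionally. Passed positionally, the dictionary was not treated as headers, so no Authorization header was sent.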
qt_time_obj = http.request('GET', url, headers=headers)
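To see why the positional call failed, one option is to look at the method signatures with inspect. This is only an illustrative sketch: it prints the parameter names as they are in whichever urllib3 version is installed and makes no network call.

```python
import inspect

import urllib3

# Parameter names of both methods, in order, without "self".
urlopen_params = list(inspect.signature(urllib3.PoolManager.urlopen).parameters)[1:]
request_params = list(inspect.signature(urllib3.PoolManager.request).parameters)[1:]

print("urlopen:", urlopen_params)
print("request:", request_params)

# In http.urlopen('GET', url, headers) the dict is bound to whatever
# parameter comes third in urlopen's signature, which is not "headers".
third_positional = urlopen_params[2]
print("third positional parameter of urlopen:", third_positional)
```

Passing headers=headers by keyword avoids depending on the parameter order at all.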
I am trying to do local load testing with Locust. I got the test environment up and running, and a local build is also working. I am trying to test the responses of a local path, and the response I get in the terminal is correct. But the Locust UI, and also the statistics after terminating the test, show 100% failures.
To create the Locust code (I am pretty new to it), I used the code generated by Postman and adjusted it. This is the code for Locust:
from locust import HttpLocust, TaskSet, task, between
import requests
url = "http://localhost:8080/registry/downloadCounter"
payload = "[\n {\n \"appName\": \"test-app\",\n \"appVersion\": \"1.6.0\"\n }\n]"
class MyTaskSet(TaskSet):
    @task(2)
    def index(self):
        self.client.get("")
        headers = {
            'Content-Type': 'application/json',
            'Accept':'application/json'
        }
        response = requests.request("POST", url, headers=headers, data = payload)
        print(response.text.encode('utf8'))
class MyLocust(HttpLocust):
    task_set = MyTaskSet
    wait_time = between(2.0, 4.0)
For the Locust swarm I used just basic numbers:
Number of total users to simulate: 1
Hatch Rate: 3
Host: http://localhost:8080/registry/downloadCounter
I do not get any results there, the table stays blank. I guess it has something to do with the json format but I am not able to find the solution myself.
I also put a Screenshot of the Terminal response after termination in this post.
Thank you in advance for your help!
Best regards
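This helped: sending the POST through self.client instead of calling requests directly, so that Locust records the request in its statistics.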
from locust import HttpLocust, TaskSet, task, between
import requests
url = "http://localhost:8080/registry/downloadCounter"
payload = "[\n {\n \"appName\": \"test-app\",\n \"appVersion\": \"1.6.0\"\n }\n]"
headers = {'Content-type':'application/json', 'Accept':'application/json'}
class MyTaskSet(TaskSet):
    @task(2)
    def index(self):
        response = self.client.post(url = url, data = payload, headers=headers)
        print(response.text.encode('utf8'))
        print(response.status_code)
class MyLocust(HttpLocust):
    task_set = MyTaskSet
    wait_time = between(2.0, 4.0)
I have this python code that does not work as expected.
import requests
import json
API_ENDPOINT = "https://lkokpdvhc4.execute-api.us-east-1.amazonaws.com/mycall"
data = {'mnumber':'9819838466'}
r = requests.post(url = API_ENDPOINT, data = json.dumps(data))
print (r.text)
This will return an error:
{"stackTrace": [["/var/task/index.py", 5, "handler", "return mydic[code]"]], "errorType": "KeyError", "errorMessage": "''"}
When I test the API using the Amazon console's gateway, I get the expected output (i.e. a string like "mumbai"). This means it is a client-side issue. I confirmed this with Postman, which returns the same error as above. How do I send the correct headers with the POST request?
You can create a dictionary with the headers, such as
headers = {
    "Authorization": "Bearer 12345",
    "Content-Type": "application/json",
    "key": "value"
}
Then, when making the request, pass it as the headers keyword argument to the request method, i.e. .post(), .get() or .put().
This will be
response = requests.post(API_ENDPOINT, data=json.dumps(data), headers=headers)
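A sketch comparing the Content-Type that requests would send in three variants, again only preparing the requests rather than sending them. It assumes, without confirmation from the question, that the missing Content-Type is what the API objects to; the json= variant is an alternative to the answer above, not something taken from it.

```python
import json

import requests

API_ENDPOINT = "https://lkokpdvhc4.execute-api.us-east-1.amazonaws.com/mycall"
data = {"mnumber": "9819838466"}

# As in the question: a pre-serialised string body and no headers.
as_string = requests.Request("POST", API_ENDPOINT, data=json.dumps(data)).prepare()

# As in the answer: the same body plus an explicit headers dictionary.
headers = {"Content-Type": "application/json"}
with_headers = requests.Request(
    "POST", API_ENDPOINT, data=json.dumps(data), headers=headers
).prepare()

# Alternative: json= serialises the dict and sets Content-Type itself.
with_json = requests.Request("POST", API_ENDPOINT, json=data).prepare()

variants = [
    ("data=str", as_string),
    ("data=str + headers", with_headers),
    ("json=dict", with_json),
]
for name, prepared in variants:
    print(name, "->", prepared.headers.get("Content-Type"))
```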