How to extract HTTP response body from a Python requests call? - http-request

I'm using the Python requests library. I'm trying to figure out how to extract the actual HTML body from a response. The code looks a bit like this:
r = requests.get(...)
print r.content
This should indeed print lots of content, but instead prints nothing.
Any suggestions? Maybe I've misunderstood how requests.get() works?

Your code is correct. I tested:
r = requests.get("http://www.google.com")
print(r.content)
And it returned plenty of content.
Check the url, try "http://www.google.com".

You can try this method:
import requests
response = requests.get("http://www.google.com")
response.raise_for_status()
data = response.json()
print(data)

import requests
site_request = requests.get("https://abhiunix.in")
site_response = str(site_request.content)
print(site_response)
You can do it either way.

Related

How to retrieve EPPO Database info via POST?

I can retrieve EPPO DB info from GET requests.
I am looking for help to retrieve the info from POST requests.
Example code and other info in the linked Rmarkdown HTMP output
As suggested, I have gone trough the https://httr.r-lib.org/ site.
Interesting. I followed the links to https://httr.r-lib.org/articles/api-packages.html and then to https://cdn.zapier.com/storage/learn_ebooks/e06a35cfcf092ec6dd22670383d9fd12.pdf.
I suppose that the arguments for the POST() function should be (more or less) as follows, but yet the response is always 404
url = "https://data.eppo.int/api/rest/1.0/"
config = list(authtoken=my_authtoken)
body = list(intext = "Fraxinus anomala|Tobacco ringspot virus|invalide name|Sequoiadendron giganteum")
encode = "json"
#handle = ???
Created on 2021-04-26 by the reprex package (v0.3.0)
How do I find the missing pieces?
It is a little bit tricky:
You need to use correct url with name of the service from https://data.eppo.int/documentation/rest, e.g. to use Search preferred names from EPPOCodes list:
url = "https://data.eppo.int/api/rest/1.0/tools/codes2prefnames"
Authorization should be passed to body:
body = list(authtoken = "yourtoken", intext = "BEMITA")
So, if you want to check names for two eppocodes: XYLEFA and BEMITA the code should look like:
httr::POST(
url = "https://data.eppo.int/api/rest/1.0/tools/codes2prefnames",
body = list(authtoken = "yourtoken", intext = "BEMITA|XYLEFA")
)
Nonetheless, I would also recommend you to just use the pestr package. However, to find eppocodes it uses SQLite database under the hood instead of EPPO REST API. Since the db is not big itself (circa 40MB) this shouldn't be an issue.
I found the easy way following a suggestion in the DataCamp course:
"To find an R client for an API service search 'CRAN '"
I found the 'pestr' package that gives great access to EPPO database.
I still do not know how to use the POST() function myself. Any hint on that side is warmly welcome.
Here is a solution to loop over names to get EPPO-codes. Whit minor adjustments this also works for distribution and other information in the EPPO DB
# vector of species names
host_name <- c("Fraxinus anomala", "Tobacco ringspot virus", "invalide name",
"Sequoiadendron giganteum")
EPPO_key <- "enter personal key"
# EPPO url
path_eppo_code <- "https://data.eppo.int/api/rest/1.0/tools/names2codes"
# epty list
my_list <- list()
for (i in host_name) {
# POST request on eppo database
response <- httr::POST(path_eppo_code, body=list(authtoken=EPPO_key,
intext=i))
# get EPPO code
pest_eppo_code <- strsplit(httr::content(response)[[1]], ";")[[1]][2]
# add to list
my_list[[i]] <- pest_eppo_code
}
# list to data frame
data <- plyr::ldply(my_list)

Using HTTR POST to Create a Azure Devops Work Item

Im attempting to setup some R code to create a new work item task in Azure Devops. Im okay with a mostly empty work item to start with if thats okay to do (my example code is only trying to create a work item with a title).
I receive a 203 response but the work item doesn't appear in Devops.
Ive been following this documentation from Microsoft, I suspect that I might be formatting the body incorrectly.
https://learn.microsoft.com/en-us/rest/api/azure/devops/wit/work%20items/create?view=azure-devops-rest-5.1
Ive tried updating different fields and formatting the body differently with no success. I have attempted to create either a bug or feature work item but both return the same 203 response.
To validate that my token is working I can GET work item data by ID but the POST continues to return a 203.
require(httr)
require(jsonlite)
url <- 'https://dev.azure.com/{organization}/{project}/_apis/wit/workitems/$bug?api-version=5.1'
headers = c(
'Authorization' = sprintf('basic %s',token),
'Content-Type' = 'application/json-patch+json',
'Host' = 'dev.azure.com'
)
data <- toJSON(list('body'= list("op"= "add",
"path"= "/fields/System.AreaPath",
"value"= "Sample task")), auto_unbox = TRUE, pretty = TRUE)
res <- httr::POST(url,
httr::add_headers(.headers=headers),
httr::verbose(),
body = data)
Im expecting a 200 response (similar to the example in the link above) and a work item task in Azure DevOps Services when I navigate to the website.
Im not the best with R so please be detailed. Thank you in advanced!
The POST continues to return a 203.
The HTTP response code 203 means Non-Authoritative Information, it should caused by your token format is converted incorrectly.
If you wish to provide the personal access token through an HTTP
header, you must first convert it to a Base64 string.
Refer to this doc described, if you want to use VSTS rest api, you must convert your token to a Base64 string. But in your script, you did not have this script to achieve this convert.
So, please try with the follow script to convert the token to make the key conformant with the requirements(load the base64enc package first):
require(base64enc)
key <- token
keys <- charToRaw(paste0(key,":token"))
auth <- paste0("Basic ",base64encode(keys))
Hope this help you get 200 response code
I know this question is fairly old, but I cannot seem to find a good solution posted yet. So, I will add my solution in case others find themselves in this situation. Note, this did take some reading through other SO posts and trial-and-error.
Mengdi is correct that you do need to convert your token to a Base64 string.
Additionally, Daniel from this SO question pointed out that:
In my experience with doing this via other similar mechanisms, you have to include a leading colon on the PAT, before base64 encoding.
Mengdi came up big in another SO solution
Please try with adding [{ }] outside your request body.
From there, I just made slight modifications to your headers and data objects. Removed 'body' from your json, and made use of paste to add square brackets as well. I found that the Rcurl package made base64 encoding a breeze. Then I was able to successfully create a blank ticket (just titled) using the API! Hope this helps someone!
library(httr)
library(jsonlite)
library(RCurl)
#user and PAT for api
userid <- ''
token= 'whateveryourtokenis'
url <- 'https://dev.azure.com/{organization}/{project}/_apis/wit/workitems/$bug?api-version=5.1'
#create a combined user/pat
#user id can actually be a blank string
#RCurl's base64 seemed to work well
c_id <- RCurl::base64(txt = paste0(userid,
":",
devops_pat
),
mode = "character"
)
#headers for api call
headers <- c(
"Authorization" = paste("Basic",
c_id,
sep = " "
),
'Content-Type' = 'application/json-patch+json',
'Host' = 'dev.azure.com'
)
#body
data <- paste0("[",
toJSON(list( "op"= "add",
"path"= "/fields/System.Title",
"value"= "API test - please ignore"),
auto_unbox = TRUE,
pretty = TRUE
),
"]"
)
#make the call
res <- httr::POST(url,
httr::add_headers(.headers=headers),
httr::verbose(),
body = data
)
#check status
status <- res$status_code
#check content of response
check <- content(res)

Run HTTP requests with PySpark in parallel and asynchronously

I have a text file containing several million URLs and I have to run a POST request for each of those URLs.
I tried to do it on my machine but it is taking forever so I would like to use my Spark cluster instead.
I wrote this PySpark code:
from pyspark.sql.types import StringType
import requests
url = ["http://myurltoping.com"]
list_urls = url * 1000 # The final code will just import my text file
list_urls_df = spark.createDataFrame(list_urls, StringType())
print 'number of partitions: {}'.format(list_urls_df.rdd.getNumPartitions())
def execute_requests(list_of_url):
final_iterator = []
for url in list_of_url:
r = requests.post(url.value)
final_iterator.append((r.status_code, r.text))
return iter(final_iterator)
processed_urls_df = list_urls_df.rdd.mapPartitions(execute_requests)
but it is still taking a lot of time, how can I make the function execute_requests more efficient launching the requests in each partition asynchronously for example?
Thanks!
Using the python package grequests(installable with pip install grequests) might be an easy solution for your problem without using spark.
The Documentation (can be found here https://github.com/kennethreitz/grequests) gives a simple example:
import grequests
urls = [
'http://www.heroku.com',
'http://python-tablib.org',
'http://httpbin.org',
'http://python-requests.org',
'http://fakedomain/',
'http://kennethreitz.com'
]
Create a set of unsent Requests:
>>> rs = (grequests.get(u) for u in urls)
Send them all at the same time:
>>> grequests.map(rs)
[<Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>, None, <Response [200]>]
I found out, that using gevent wihtin a foreach on a spark Dataframe results in some weird errors and does not work. It seems as if spark also relies on gevent, which is used by grequests...

Scapy raw data manipulation

I am having troubles manipulating raw data. I am trying to change around a
resp_cookie in my ISAKMP header and when I do a sniff on the packet it is all in raw data format under Raw Load='\x00\x43\x01........... ' with about 3 lines like that. When I do a Wireshark capture I see the information I want to change but I cant seem to find a way to convert and change that raw data to find and replace the information I am looking for. Also, I can see the information I need when I do a hexdump(), but I can't store that in a variable. when I type i = hexdump(pkt) it spits out the hexdump but doesn't store the hexdump in i.
So this post is a little old, but I've come across it a dozen or so times trying to find the answer to a similar problem I'm having. I doubt OP has need for an answer anymore, but if anyone else is looking to do something similar...here you go!
I found the following code snippet somewhere in the deep, dark depths of google and it worked for my situation.
Hexdump(), show() and other methods of Scapy just output the packet to the terminal/console; they don't actually return a string or any other sort of object. So you need a way to intercept that data that it intends to write and put it in a variable to be manipulated.
NOTE: THIS IS PYTHON 3.X and SCAPY 3K
import io
import scapy
#generic scapy sniff
sniff(iface=interface,prn=parsePacket, filter=filter)
With the above sniff method, you're going to want to do the following.
def parsePacket(packet):
outputPacket = ''
#setup
qsave = sys.stdout
q = io.StringIO()
#CAPTURES OUTPUT
sys.stdout = q
#Text you're capturing
packet.show()
#restore original stdout
sys.stdout = qsave
#release output
sout = q.getvalue()
#Add to string (format if need be)
outputPacket += sout + '\n'
#Close IOStream
q.close()
#return your packet
return outputPacket
The string you return (outputPacket) can now be manipulated how you want.
Swap out .show() with whatever function you see fit.
P.S. Forgive me if this is a little rough from a Pythonic point of view...not a python dev by any stretch.

Using RCurl/httr for Github Basic Authorization

I am trying to create an OAuth token from the command line using the instructions here. I am able to use curl from the command line, and get the correct response
curl -u 'username:pwd' -d '{"scopes":["user", "gist"]}' \
https://api.github.com/authorizations
Now, I want to replicate the same in R using RCurl or httr. Here is what I tried, but both commands return an error. Can anyone point out what I am doing wrong here?
httr::POST(
'https://api.github.com/authorizations',
authenticate('username', 'pwd'),
body = list(scopes = list("user", "gist"))
)
RCurl::postForm(
uri = 'https://api.github.com/authorizations',
.opts = list(
postFields = '{"scopes": ["user", "gist"]}',
userpwd = 'username:pwd'
)
)
The question is ages old, but maybe still helpfull to some: The problem should be that the opts arguments are passed in the wrong way (lacking a curlOptions function call). The following worked for me in a different context:
result <- getURL(url,.opts=curlOptions(postfields=postFields))
(and yes, as far as I know you can use getURL function for POST requests).

Resources