Python Get Request Printing specific key and value - python-requests

I am new to HTTP requests, and I am trying to make a simple GET request to GitHub's API using Python and the Requests library.
I've currently tried to implement parameters for a key and value pair:
import requests
r = requests.get("https://api.github.com/repos/git/git", params= {'name':name} )
print(name)
Obviously this is incorrect, as I'm getting an error saying name isn't defined, which makes perfect sense. However, I don't know how to print specific values for the keys I want rather than printing the entire r.json() response.
I've just tried to use this:
import requests
import json

r = requests.get("https://api.github.com/repos/git/git")
data = r.json()

class User:
    def __init__(self, json_def):
        self.__dict__ = json.loads(json_def)

user = User(data)
print(user.size)
However, I'm getting the error:
TypeError: the JSON object must be str, bytes or bytearray, not 'dict'

You are working with a Response object, which contains a server's response to an HTTP request. From that link, I am guessing you are trying to read the contents of that response. So you can modify that code to this:
import requests
import json

r = requests.get("https://api.github.com/repos/git/git")
data = json.loads(r.content)  # r.content is raw bytes, which json.loads can parse

class User:
    def __init__(self, json_def):
        self.__dict__ = json_def  # json_def is already a dict, so assign it directly

user = User(data)
print(user.size)
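Alternatively, since r.json() already returns a plain dict, you can skip the wrapper class and index the keys you need directly; a minimal sketch:
import requests

r = requests.get("https://api.github.com/repos/git/git")
data = r.json()  # already a dict, no json.loads needed

# print only the values you care about
print(data["size"])
print(data["full_name"])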

Related

How to recover a hidden ID from a query string from an XHR GET request?

I'm trying to use the hidden airbnb api. I need to reverse engineer where the ID comes from in the query string of a GET request. For example, take this listing:
https://www.airbnb.ca/rooms/47452643
The "public" ID is shown to be 47452643. However, another ID is needed to use the API.
If you look at the XHR requests in Chrome, you'll see a request starting with "StaysPdpSections?operationName". This is the request I want to replicate. If I copy the request in Insomnia or Postman, I see a variable in the query string starting with:
"variables":"{"id":"U3RheUxpc3Rpbmc6NDc0NTI2NDM="
The hidden ID "U3RheUxpc3Rpbmc6NDc0NTI2NDM" is what I need. It is needed to get the data from this request and must be inserted into the query string. How can I recover the hidden ID "U3RheUxpc3Rpbmc6NDc0NTI2NDM" for each listing dynamically?
That target id is buried really deep in the HTML...
import requests
from bs4 import BeautifulSoup as bs
import json

url = 'https://www.airbnb.ca/rooms/47452643'
req = requests.get(url)
soup = bs(req.content, 'html.parser')

# the page embeds its client state as JSON inside a <script id="data-state"> tag
script = soup.select_one('script[type="application/json"][id="data-state"]')
data = json.loads(script.text)

# dig down to the 'variables' object that holds the hidden id
target = data.get('niobeMinimalClientData')[2][1]['variables']
print(target.get('id'))
Output:
U3RheUxpc3Rpbmc6NDc0NTI2NDM=
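As a side note, that hidden ID appears to be just the base64 encoding of "StayListing:<public id>": decoding the value above gives StayListing:47452643, which matches the public ID in the URL. If that pattern holds, you could construct the ID directly instead of scraping the HTML; a sketch under that assumption:
import base64

public_id = "47452643"
# assumes the hidden id is always base64("StayListing:<public id>")
hidden_id = base64.b64encode(f"StayListing:{public_id}".encode()).decode()
print(hidden_id)  # U3RheUxpc3Rpbmc6NDc0NTI2NDM=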

How to get a stream of an ndjson response

I am trying to connect to an HTTP API. This API responds with ndjson, that is, newline-separated JSON strings. I need to consume these lines one by one, before I download them all (in fact, even before the server knows what it will output on future lines).
In Python, I can achieve this by:
import requests, json
lines = requests.get("some url", stream=True).iter_lines()
for line in lines:
    # parse line as JSON and do whatever
and it works like a charm.
I want the same effect in Nim, but the program blocks. For example, I tried to load just the first line of the response:
import httpclient, json, streams
var stream = newHttpClient().get("some url").bodyStream
var firstLine = ""
discard stream.readLine(firstLine)
echo firstLine
but with no luck - that is, the program never echoes.
I also tried streams.lines iterator, but that didn't help either.
Is there some idiom similar to the Python snippet that would allow me to easily work with the HTTP response stream line by line?
The solution is to use the net module, as in the question linked by @pietroppeter. That initially didn't work for me because I didn't construct the HTTP request correctly.
The resulting code:
import net, json, strformat

const HOST = "host"
const TOKEN = "token"

iterator getNdjsonStream(path: string): JsonNode =
  # newContext/wrapSocket require compiling with -d:ssl
  let s = newSocket()
  wrapSocket(newContext(), s)
  s.connect(HOST, Port(443))
  # strformat's `&` builds the raw HTTP request
  var req = &"GET {path} HTTP/1.1\r\nHost:{HOST}\r\nAuthorization: {TOKEN}\r\n\r\n"
  s.send(req)
  while true:
    var line = ""
    # skip the response headers and blank lines; ndjson payload lines start with '{'
    while line == "" or line[0] != '{':
      line = s.recvLine
    yield line.parseJson
I think this can't be achieved using the httpClient module. The async versions might look like they can do it, but it seems to me that you can only work with the received data once the Future is completed, that is, after all the data has been downloaded.
The fact that such a simple thing cannot be done simply, and the lack of examples I could find, led to a couple of days of frustration and to opening a Stack Overflow account after 10 years of programming.

How to access trailing metadata from python gRPC client

Here is how I am sending the metadata from the server:
def DoSomething(self, request, context):
    response = detection2g_pb2.SomeResponse()
    response.message = 'done'
    _SERVER_TRAILING_METADATA = (
        ('method_status', '1010'),
        ('error', 'No Error')
    )
    context.set_trailing_metadata(_SERVER_TRAILING_METADATA)
    return response
Here is what I tried:
res = _stub.DoSomething(req)
print (res.trailing_metadata())
In this case I get AttributeError: object has no attribute 'trailing_metadata'. I want to know the way to access the trailing metadata on the client side.
I apologize that we don't yet have an example illustrating metadata, but you can see here that getting the trailing metadata on the invocation side requires using with_call (or future, but that may change the control flow in a way that you don't want changed, so I think with_call should be your first choice). I think your invocation-side code should look like:
response, call = _stub.DoSomething.with_call(request)
print(call.trailing_metadata())
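Since call.trailing_metadata() returns a sequence of key/value pairs, you can also iterate over it; a small sketch reusing the stub and request from the question:
response, call = _stub.DoSomething.with_call(request)

# trailing metadata is a sequence of (key, value) pairs
for key, value in call.trailing_metadata():
    print(key, value)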

How does adding the dont_filter=True argument in scrapy.Request make my parsing method work?

Here's a simple scrapy spider
import scrapy

class ExampleSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["https://www.dmoz.org"]
    start_urls = ('https://www.dmoz.org/')

    def parse(self, response):
        yield scrapy.Request(self.start_urls[0], callback=self.parse2)

    def parse2(self, response):
        print(response.url)
When you run the program, the parse2 method doesn't run and it doesn't print response.url. I then found the solution in the thread below:
Why is my second request not getting called in the parse method of my scrapy spider
It's just that I needed to add dont_filter=True as an argument to the request to make the parse2 function work:
yield scrapy.Request(self.start_urls[0], callback=self.parse2, dont_filter=True)
But in the examples given in the Scrapy documentation and many YouTube tutorials, they never use the dont_filter=True argument in scrapy.Request, and still their second parse functions work.
Take a look at this:
def parse_page1(self, response):
    return scrapy.Request("http://www.example.com/some_page.html",
                          callback=self.parse_page2)

def parse_page2(self, response):
    # this would log http://www.example.com/some_page.html
    self.logger.info("Visited %s", response.url)
Why can't my spider work unless dont_filter=True is added? What am I doing wrong? What were the duplicate links that my spider filtered in my first example?
P.S. I could've resolved this in the Q&A thread I posted above, but I'm not allowed to comment unless I have 50 reputation (poor me!!)
Short answer: You are making duplicate requests. Scrapy has built-in duplicate filtering which is turned on by default. That's why parse2 doesn't get called. When you add dont_filter=True, Scrapy doesn't filter out the duplicate requests, so this time the request is processed.
Longer version:
In Scrapy, if you have set start_urls or have the method start_requests() defined, the spider automatically requests those URLs and passes each response to the parse method, which is the default method used for parsing responses. You can then yield new requests from there, which will again be handled by Scrapy. If you don't set a callback, the parse method will be used again; if you set a callback, that callback will be used.
Scrapy also has a built-in filter that stops duplicate requests. That is, if Scrapy has already crawled a URL and parsed the response, it will not process another request you yield for that same URL.
In your case, you have the URL in start_urls. Scrapy starts with that URL: it crawls the site and passes the response to parse. Inside that parse method, you again yield a request to that same URL (which Scrapy just processed), but this time with parse2 as the callback. When this request is yielded, Scrapy sees it as a duplicate, so it ignores the request and never processes it. So no call to parse2 is made.
If you want to control which URLs are processed and which callback is used, I recommend you override start_requests() and return your own scrapy.Request objects instead of relying on the single start_urls attribute, as in the sketch below.
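A minimal sketch of that approach (the URLs and callback name here are placeholders, not the original spider's):
import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"

    def start_requests(self):
        # explicitly decide which URLs are requested and which callback handles each response
        urls = ["https://www.example.com/page1", "https://www.example.com/page2"]
        for url in urls:
            yield scrapy.Request(url, callback=self.parse_page)

    def parse_page(self, response):
        self.logger.info("Visited %s", response.url)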

How to pass a parameter to a URL with Python urlopen

I'm new to Python programming. My problem is that my Python program doesn't seem to pass/encode the parameter properly to the ASP file that I've created. This is my sample code:
import urllib.request
url = 'http://www.sample.com/myASP.asp'
full_url = url + "?data='" + str(sentData).replace("'", '"').replace(" ", "%20").replace('"', "%22") + "'"
print (full_url)
response = urllib.request.urlopen(full_url)
print(response)
the output would give me something like:
http://www.sample.com/myASP.asp?data='{%22mykey%22:%20[{%22idno%22:%20%22id123%22,%20%22name%22:%20%22ej%22}]}'
The ASP file is supposed to insert the acquired query string into a database. But whenever I check my database, no record is saved. Though if I copy and paste the printed output into my browser's URL bar, the record is saved. Any input on this? TIA
Update:
Is it possible that Python calls my ASP File A but it doesn't call my ASP File B? ASP File A is called by Python, while ASP File B is called by ASP File A. Whenever I run the URL in a browser, the saving goes well, but from Python no database save occurs, even though the data passed from Python is read by ASP File A.
Use Firebug with Firefox and watch the network traffic when the page is loaded. If it is actually an HTTP POST, which I suspect it is, check the POST parameters on that request and do something like this:
from bs4 import BeautifulSoup
import urllib.parse
import urllib.request

post_params = {
    'param1': 'val1',
    'param2': 'val2',
    'param3': 'val3'
}
# urlencode the parameters and pass them as the request body, which makes urlopen issue a POST
post_args = urllib.parse.urlencode(post_params).encode('utf-8')

url = 'http://www.sample.com/myASP.asp'
fp = urllib.request.urlopen(url, post_args)
soup = BeautifulSoup(fp, 'html.parser')
If it's actually an HTTP POST, this will work.
In case anybody stumbles upon this, this is what I've come up with:
py file:
url = "my.url.com"
data = {'sample': 'data'}
encodeddata = urllib.parse.urlencode(data).encode('UTF-8')
req = urllib.request.Request(url, encodeddata)
response = urllib.request.urlopen(req)
and in my ASP file, I used json2.js:
jsondata = request.form("data")
jsondata = replace(jsondata,"'","""")
SET jsondata = JSON.parse(jsondata)
Note: use requests instead. ;)
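For reference, a minimal sketch of the same POST done with the requests library (the URL and payload here are placeholders):
import requests

url = "http://www.sample.com/myASP.asp"
payload = {"sample": "data"}

# requests form-encodes the dict and sends it as the body of a POST
response = requests.post(url, data=payload)
print(response.status_code)
print(response.text)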
First off, I don't know Python.
But from this doc on urllib.request:
the HTTP request will be a POST instead of a GET when the data
parameter is provided
Let me make a really wild guess: you are accessing the form values as Request.Querystring(..) in the ASP page, so your POST won't pass any values. But when you paste the URL in the address bar, it is a GET and it works.
Just guessing; you could show the .asp page for a further check.
