I am new to using APIs. I have R output in the form of a list that I want to paste onto a Confluence page, and I have no idea how. I have been trying to use the REST API, but it's confusing me.
I have been able to get a 200 response from the website using:
library(httr)
# Disable SSL certificate verification (only do this for a server you trust)
set_config(config(ssl_verifypeer = FALSE))
URL <- "http://xxx.xx.xx/xx/xx/daily_report"
response <- GET(URL, authenticate("xxx", "xxx"))
response
I'm really clueless about where to go next.
Try this:
content(response, "parsed", "application/json")$results
Beginner here. I have a list (or rather a column) of website redirect URLs from which I want to get the "correct" website URL. For example, I have the URL https://icoholder.com/en/v2/ico/ico-redirect/4321?to=https%3A//sirinlabs.com%3Futm_source%3Dicoholder but I want the correct website URL https://sirinlabs.com/?utm_source=icoholder, which appears in the address bar when you click the previous link and the website loads.
Any idea how to do this in R for an entire column of these URLs?
Thanks in advance.
You can use the httr library to get the final URL
url <- "https://icoholder.com/en/v2/ico/ico-redirect/4321?to=https%3A//sirinlabs.com%3Futm_source%3Dicoholder"
httr::GET(url)$url
# [1] "https://sirinlabs.com/?utm_source=icoholder"
That will actually make the HTTP request to see where the server sends you.
If you want to assume that the correct URL will always be in the ?to= querystring parameter, you can use
httr::parse_url(url)$query$to
# [1] "https://sirinlabs.com?utm_source=icoholder"
without making any sort of HTTP request.
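For comparison, here is a minimal sketch of the same querystring approach in Python, using only the standard library and applied to a whole list of URLs. The helper name `redirect_target` is made up for this illustration, and it assumes, as above, that the target always sits in the `to` parameter.

```python
from urllib.parse import urlsplit, parse_qs


def redirect_target(url, param="to"):
    """Return the decoded value of `param` from the query string, or None.

    `redirect_target` is a hypothetical helper, not part of any library.
    No HTTP request is made; the URL is only parsed.
    """
    query = urlsplit(url).query
    values = parse_qs(query).get(param)
    return values[0] if values else None


urls = [
    "https://icoholder.com/en/v2/ico/ico-redirect/4321"
    "?to=https%3A//sirinlabs.com%3Futm_source%3Dicoholder",
]
targets = [redirect_target(u) for u in urls]
```

URLs without a `to` parameter come back as `None`, so they can be filtered out or resolved with a real request afterwards.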
I have a coding problem regarding web crawling with Python 3.5.
I am trying to use 'requests.get' to extract the real link from 'http://www.baidu.com/link?url=ePp1pCIHlDpkuhgOrvIrT3XeWQ5IRp3k0P8knV3tH0QNyeA042ZtaW6DHomhrl_aUXOaQvMBu8UmDjySGFD2qCsHHtf1pBbAq-e2jpWuUd3'. An example of the code is below:
import requests
response = requests.get('http://www.baidu.com/link?url=ePp1pCIHlDpkuhgOrvIrT3XeWQ5IRp3k0P8knV3tH0QNyeA042ZtaW6DHomhrl_aUXOaQvMBu8UmDjySGFD2qCsHHtf1pBbAq-e2jpWuUd3')
c = response.url
I expected 'c' to be 'caifu.cnstock.com/fortune/sft_jj/tjj_yndt/201605/3787477.htm'. (I removed http:// from the link because I can't post two links in one question.)
However, it doesn't work and keeps returning the same link that I put in.
Can anyone help with this? Many thanks in advance.
#
Thanks a lot to Charlie.
I have found the solution. I first use .content.decode to read the response body, but that is mixed up with a lot of irrelevant information. I then use re.findall to extract the redirect URL from the body, which should be the first quoted string in it. Finally, I use requests.get again to retrieve the target page. Below is the code:
import re
import requests

# url is the Baidu link from the question
rep1 = requests.get(url)
cont = rep1.content.decode('utf-8')
# Collect every double-quoted string; the first one is the redirect target
extract_cont = re.findall('"([^"]*)"', cont)
redir_url = extract_cont[0]
rep = requests.get(redir_url)
You may consider looking in the response headers for a 'location' header:
response.headers['location']
You may also consider looking at the response history, which contains one response object for each redirect in the chain:
response.history
Your sample URL doesn't redirect at the HTTP level; the response is a 200, and the page then uses a JavaScript window.location change. The requests library doesn't follow this type of redirect.
<script>window.location.replace("http://caifu.cnstock.com/fortune/sft_jj/tjj_yndt/201605/3787477.htm")</script>
<noscript><META http-equiv="refresh" content="0;URL='http://caifu.cnstock.com/fortune/sft_jj/tjj_yndt/201605/3787477.htm'"></noscript>
If you know you will always be using this one service, you could parse the response, maybe using regex.
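As a rough sketch of that regex idea, the function below looks for the two client-side redirect forms shown in the response above. The patterns and the name `client_side_redirect` are assumptions written for this one response format, not a general HTML parser.

```python
import re

# Patterns for the two redirect forms in the sample response. Both are
# assumptions tailored to that markup and may miss other variations.
JS_REDIRECT = re.compile(
    r"window\.location\.replace\(\s*[\"']([^\"']+)[\"']\s*\)"
)
META_REFRESH = re.compile(
    r"http-equiv=[\"']?refresh[\"']?[^>]*?URL=['\"]?([^'\">\s]+)",
    re.IGNORECASE,
)


def client_side_redirect(html):
    """Return the redirect target found in `html`, or None if there is none."""
    for pattern in (JS_REDIRECT, META_REFRESH):
        match = pattern.search(html)
        if match:
            return match.group(1)
    return None
```

With requests, this would be called as `client_side_redirect(response.text)` after the initial GET.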
If you don't know which service will be used and also want to handle every possible situation, you might need to instantiate a WebKit instance or something similar and somehow try to determine when it finally finishes. I'm sure there's a page-load-complete event which you could use, but you still might have pages that do a window.location change after the page is loaded using a timer. This will be very heavyweight and still not cover every conceivable type of redirect.
I recommend starting by writing a special handler for each type of edge case and falling back on a default handler that just looks at response.url. As new edge cases come up, write new handlers. It's kind of a 'trial and error' approach.
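The handler idea could be organised roughly like this. It is only a sketch: `Page` is a hypothetical stand-in for a response object (with requests you would read `response.url`, `response.headers` and `response.text` instead), and the handler names are made up.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical stand-in for an HTTP response: final URL, headers, body."""
    url: str
    text: str = ""
    headers: dict = field(default_factory=dict)


def from_location_header(page):
    # A plain dict is case-sensitive; requests' headers mapping is not.
    return page.headers.get("location")


def from_js_replace(page):
    match = re.search(
        r"window\.location\.replace\(\s*[\"']([^\"']+)[\"']", page.text
    )
    return match.group(1) if match else None


def from_default(page):
    # Fallback: trust whatever URL the HTTP client ended up at.
    return page.url


# Special cases first, default last. New edge cases get a new handler here.
HANDLERS = [from_location_header, from_js_replace, from_default]


def resolve(page):
    """Return the first non-empty answer from the handler chain."""
    for handler in HANDLERS:
        target = handler(page)
        if target:
            return target
    return None
```

Keeping the handlers in an ordered list means a new edge case only needs a new function and one list entry.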
I had a nice little package to scrape Google Ngram data, but I have discovered they have switched to SSL and my package has broken. Switching from readLines to getURL gets me some of the way there, but some of the script included in the page is missing. Do I need to get fancy with user agents or something?
Here is what I have tried so far (pretty basic):
library(RCurl)
myurl <- "https://books.google.com/ngrams/graph?content=hacker&year_start=1950&year_end=2000"
getURL(myurl)
Comparing the results to viewing the source after entering the url in a browser shows that the crucial content is missing from the results returned to R. In the browser, the source includes content looking like this:
<script type="text/javascript">
var data = [{"ngram": "hacker", "type": "NGRAM", "timeseries": [9.4930387994907051e-09,
1.1685493106483591e-08, 1.0784501440023556e-08, 1.0108472218003532e-08,
etc.
Any suggestions would be greatly appreciated!
Sorry, not a direct solution, but it doesn't seem to be a user-agent problem. When you open your URL in a browser, you can see that there is a redirection that adds a parameter at the end of the address: direct_url=t1%3B%2Chacker%3B%2Cc0.
If you use getURL() to download this new URL, complete with the new parameter, then the javascript you are mentioning is present in the result.
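To illustrate, here is a small Python sketch that builds the full URL including the extra parameter. The `t1;,<term>;,c0` pattern is inferred from the single redirect observed above and is an assumption; it may not hold for other queries. `ngram_url` is a made-up helper name.

```python
from urllib.parse import urlencode

BASE = "https://books.google.com/ngrams/graph"


def ngram_url(term, year_start, year_end):
    """Build the graph URL with the direct_url parameter appended."""
    params = {
        "content": term,
        "year_start": year_start,
        "year_end": year_end,
        # Assumption: pattern guessed from one observed redirect for "hacker".
        "direct_url": f"t1;,{term};,c0",
    }
    # urlencode percent-encodes ';' and ',' as %3B and %2C.
    return f"{BASE}?{urlencode(params)}"


url = ngram_url("hacker", 1950, 2000)
```

The resulting string can then be passed to getURL() in R or to any other HTTP client.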
Another solution could be to try to access the data via Google BigQuery, as mentioned in this SO question: Google N-Gram Web API
I am trying to use the XML and RCurl packages to read some HTML tables from the following URL:
http://www.nse-india.com/marketinfo/equities/cmquote.jsp?key=SBINEQN&symbol=SBIN&flag=0&series=EQ#
Here is the code I am using:
library(RCurl)
library(XML)
options(RCurlOptions = list(useragent = "R"))
url <- "http://www.nse-india.com/marketinfo/equities/cmquote.jsp?key=SBINEQN&symbol=SBIN&flag=0&series=EQ#"
wp <- getURLContent(url)
doc <- htmlParse(wp, asText = TRUE)
docName(doc) <- url
tmp <- readHTMLTable(doc)
## Required tables
tmp[[13]]
tmp[[14]]
If you look at the tables, the code has not been able to parse the values from the webpage.
I guess this is due to some JavaScript evaluation happening on the fly.
If I use the "Save page as" option in Google Chrome (it does not work in Mozilla),
save the page, and then run the above code on the saved file, I am able to read in the values.
But is there a workaround so that I can read the table on the fly?
It would be great if you could help.
Regards,
Looks like they're building the page using javascript by accessing http://www.nse-india.com/marketinfo/equities/ajaxGetQuote.jsp?symbol=SBIN&series=EQ and parsing out some string. Maybe you could grab that data and parse it out instead of scraping the page itself.
Looks like you'll have to build a request with the proper referrer headers using cURL, though. As you can see, you can't just hit that ajaxGetQuote page with a bare request.
You can probably read the appropriate headers to put in by using the Web Inspector in Chrome or Safari, or by using Firebug in Firefox.
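As a sketch of what such a request might look like, the Python below builds (but does not send) a request with a Referer and User-Agent header. Which headers the server actually checks is not known here; the values are assumptions, using the quote page from the question as the referrer, and should be replaced with whatever the browser inspector shows.

```python
from urllib.request import Request

AJAX_URL = (
    "http://www.nse-india.com/marketinfo/equities/"
    "ajaxGetQuote.jsp?symbol=SBIN&series=EQ"
)
# Assumption: the quote page from the question is the expected referrer.
PAGE_URL = (
    "http://www.nse-india.com/marketinfo/equities/"
    "cmquote.jsp?key=SBINEQN&symbol=SBIN&flag=0&series=EQ"
)


def build_quote_request(ajax_url=AJAX_URL, referer=PAGE_URL,
                        user_agent="Mozilla/5.0"):
    """Return a Request carrying browser-like headers (nothing is sent)."""
    headers = {"Referer": referer, "User-Agent": user_agent}
    return Request(ajax_url, headers=headers)


request = build_quote_request()
# To actually send it: urllib.request.urlopen(request).read()
```

The same two headers can be set in RCurl through the `httpheader` and `useragent` options.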
Hi friends, I am using the LinkedIn API to integrate LinkedIn profiles and messages into my website.
I found a sample through Googling, but the output comes back as XML.
How can I bind that data to the controls on my website?
You should prefer LinkedIn's JavaScript API.
For more details, see the LinkedIn JavaScript API docs: JavaScript API
You get XML output from LinkedIn, saved for example as XMLFile.xml.
The following example reads it and shows the first name in a label.
DataSet tmpDs = new DataSet();
tmpDs.ReadXml(Server.MapPath("~/XMLFile.xml"));
lblUser.Text = tmpDs.Tables[0].Rows[0]["first-name"].ToString();
If you are using the LinkedIn API URLs, you get an XML response by default. You can add ?format=json at the end of the URL to get the response in JSON format instead of XML.
e.g.
http://api.linkedin.com/v1/people/~
will give XML result while
http://api.linkedin.com/v1/people/~?format=json
will give json result.
Hope it helps.