I tried
download.file('https://www.dropbox.com/s/r3asyvybozbizrm/Himalayas.jpg',
destfile="1.jpg",
method="auto")
but it returns the HTML source of that page.
Also tried a little bit of rdrop
library(rdrop2)
# please put in your key/secret
drop_auth(new_usesr = FALSE, key=key, secret=secret, cache=T)
And the pop up website reports:
Invalid redirect_uri: "http://localhost:1410": It must exactly match one of the redirect URIs you've pre-configured for your app (including the path).
I don't understand the URI thing very well. Can somebody recommend some document to read please....
I read some posts but most of them discuss how to read data from excel files.
repmis worked only for reading excel files...
library(repmis)
repmis::source_DropboxData("test.csv",
"tcppj30pkluf5ko",
sep = ",",
header = F)
Also tried
library(RCurl)
url='https://www.dropbox.com/s/tcppj30pkluf5ko/test.csv'
x = getURL(url)
read.csv(textConnection(x))
And it didn't work...
Any help and discussion's appreciated. Thanks!
The first issue is because the https://www.dropbox.com/s/r3asyvybozbizrm/Himalayas.jpg link points to a preview page, not the file content itself, which is why you get the HTML. You can modify links like this though to point to the file content, as shown here:
https://www.dropbox.com/help/201
E.g., add a raw=1 URL parameter:
https://www.dropbox.com/s/r3asyvybozbizrm/Himalayas.jpg?raw=1
Your downloader will need to follow redirects for that to work though.
The second issue is because you're trying to use a OAuth 2 app authorization flow, which requires that all redirect URIs be pre-registered. You can register redirect URIs, in your case it's http://localhost:1410, for Dropbox API apps on the app's page on the App Console:
https://www.dropbox.com/developers/apps
For more information on using OAuth, you can refer to the Dropbox API OAuth guide here:
https://www.dropbox.com/developers/reference/oauthguide
I use read.table(url("yourdropboxpubliclink")) for instance I use this link
instead of using https://www.dropbox.com/s/xyo8sy9velpkg5y/foo.txt?dl=0, which is chared link on dropbox I use
https://dl.dropboxusercontent.com/u/15634209/histogram/foo.txt
and non-public link raw=1 will work
It works fine for me.
Related
Everytime I am testing postman collection ,I need to change the authorization token under header manually followed by exporting the collection again and running through newman.
Is there any way , instead of giving here, I can give it in a CSV file which is being used as test data file. which would reduce the efforts of changing code every time.
Please suggest.
Yes you can, in your csv file, add one more column - name as "AT"
and mention that reference AT in your request as in the following picture.
I'm triying to scrape reviews from this webpage https://www.leroymerlin.es/fp/82142706/armario-serie-one-blanco-abatible-2-puertas-200x100x50cm. I'm running into some issues to get XPath, when I ran the code I found the output is always NULL.
Code:
library(XML)
url <- "https://www.leroymerlin.es/fp/82142706/armario-serie-one-blanco-abatible-2-puertas-200x100x50cm"
source <- readLines(url, encoding = "UTF-8")
parsed_doc <- htmlParse(source, encoding = "UTF-8")
xpathSApply(parsed_doc, path = '//*[#id="reviewsContent"]/div[1]/div[2]/div[3]/h3', xmlValue)
I must be doing something wrong! I'm trying everything. Many thanks for your helps.
The This webpage is dynamically created upon load with the data is stored in a secondary file, typical scraping and xpath methods will not work.
If you access your browser's developer's tools and goto the network tab.
Reload the webpage and filter for the XHR files. Review each file and one should see a file named "reviews", this is the file where the reviews are stored in a JSON format. Right click the file and copy the link address.
One can access this file directly:
library(jsonlite)
fromJSON("https://www.leroymerlin.es/bin/leroymerlin/reviews?product=82142706&page=1&sort=best&reviewsPerPage=5")
Here is a good reference: How to Find The Link for JSON Data of a Certain Website
We are currently working with an API for file upload and download.
We upload the file successfully, yet we can't download it. We have found that some of the files contain a '+' character which seems to be the issue here.
My question is, if we call the API like so :
nameoffile = "testing+test"
https://sitename/api/filesystem/nameoffile
Could there be another way of calling the API so the clients/browsers handle the name in the way we need them to?
Yes. You're talking about URLEncoding special characters: https://www.tutorialspoint.com/html/html_url_encoding.htm
In your case, a '+' character encodes to "%2B":
nameOfFile = "testing%2Btest"
url = "https://www.google.com/abc/"+nameOfFile
I had a nice little package to scrape Google Ngram data but I have discovered they have switched to SSL and my package has broken. If I switch from readLines to getURL gets some of the way there, but some of the included script in the page is missing. Do I need to get fancy with user agents or something?
Here is what I have tried so far (pretty basic):
library(RCurl)
myurl <- "https://books.google.com/ngrams/graph?content=hacker&year_start=1950&year_end=2000"
getURL(myurl)
Comparing the results to viewing the source after entering the url in a browser shows that the crucial content is missing from the results returned to R. In the browser, the source includes content looking like this:
<script type="text/javascript">
var data = [{"ngram": "hacker", "type": "NGRAM", "timeseries": [9.4930387994907051e-09,
1.1685493106483591e-08, 1.0784501440023556e-08, 1.0108472218003532e-08,
etc.
Any suggestions would be greatly appreciated!
Sorry, not a direct solution, but it doesn't seem to be an user-agent problem. When you open your URL in a browser, you can see that there is a redirection that adds a parameter at the end of the address : direct_url=t1%3B%2Chacker%3B%2Cc0.
If you use getURL() to download this new URL, complete with the new parameter, then the javascript you are mentioning is present in the result.
Another solution could be to try to access data via Google BigQuery, as mentioned in this SO question :
Google N-Gram Web API
I am trying to use XML, RCurl package to read some html tables of the following URL
http://www.nse-india.com/marketinfo/equities/cmquote.jsp?key=SBINEQN&symbol=SBIN&flag=0&series=EQ#
Here is the code I am using
library(RCurl)
library(XML)
options(RCurlOptions = list(useragent = "R"))
url <- "http://www.nse-india.com/marketinfo/equities/cmquote.jsp?key=SBINEQN&symbol=SBIN&flag=0&series=EQ#"
wp <- getURLContent(url)
doc <- htmlParse(wp, asText = TRUE)
docName(doc) <- url
tmp <- readHTMLTable(doc)
## Required tables
tmp[[13]]
tmp[[14]]
If you look at the tables it has not been able to parse the values from the webpage.
I guess this due to some javascipt evaluation happening on the fly.
Now if I use "save page as" option in google chrome(it does not work in mozilla)
and save the page and then use the above code i am able to read in the values.
But is there a work around so that I can read the table of the fly ?
It will be great if you can help.
Regards,
Looks like they're building the page using javascript by accessing http://www.nse-india.com/marketinfo/equities/ajaxGetQuote.jsp?symbol=SBIN&series=EQ and parsing out some string. Maybe you could grab that data and parse it out instead of scraping the page itself.
Looks like you'll have to build a request with the proper referrer headers using cURL, though. As you can see, you can't just hit that ajaxGetQuote page with a bare request.
You can probably read the appropriate headers to put in by using the Web Inspector in Chrome or Safari, or by using Firebug in Firefox.