read.csv fails to read a CSV file from Google Docs - r

I wish to use read.csv to read a Google Docs spreadsheet.
I try using the following code:
data_url <- "http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv"
read.csv(data_url)
This results in the following error:
Error in file(file, "rt") : cannot open the connection
I'm on Windows 7, and the code was tried on R 2.12 and 2.13.
I remember trying this a few months ago and it worked fine.
Any suggestion as to what might be causing this, or how to solve it?
Thanks.

It might have something to do with the fact that Google is reporting a 302 Moved Temporarily response.
> download.file(data_url, "~/foo.csv", method = "wget")
--2011-04-29 18:01:01-- http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv
Resolving spreadsheets0.google.com... 74.125.230.132, 74.125.230.128, 74.125.230.130, ...
Connecting to spreadsheets0.google.com|74.125.230.132|:80... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv [following]
--2011-04-29 18:01:01-- https://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv
Connecting to spreadsheets0.google.com|74.125.230.132|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/plain]
Saving to: `/home/gavin/foo.csv'
[ <=> ] 41 --.-K/s in 0s
2011-04-29 18:01:02 (1.29 MB/s) - `/home/gavin/foo.csv' saved [41]
> read.csv("~/foo.csv")
column1 column2
1 a 1
2 b 2
3 ds 3
4 d 4
5 f 5
6 ga 5
I'm not sure R's internal download code is capable of responding to such redirects:
> download.file(data_url, "~/foo.csv")
trying URL 'http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv'
Error in download.file(data_url, "~/foo.csv") :
cannot open URL 'http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv'
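
Given that, one workaround is to let libcurl itself follow the redirect via RCurl. This is only a sketch, assuming the published URL is otherwise reachable; ssl.verifypeer = FALSE sidesteps certificate verification on systems without a CA bundle and is not secure:
library(RCurl)
# Follow the 302 to the HTTPS location and read the response body as text
txt <- getURL(data_url, followlocation = TRUE, ssl.verifypeer = FALSE)
data <- read.csv(textConnection(txt))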

I ran into the same problem and eventually found a solution in a forum thread. Using my own public CSV file:
library(RCurl)
tt = getForm("https://spreadsheets.google.com/spreadsheet/pub",
             hl = "en_US",
             key = "0Aonsf4v9iDjGdHRaWWRFbXdQN1ZvbGx0LWVCeVd0T1E",
             output = "csv",
             .opts = list(followlocation = TRUE, verbose = TRUE,
                          ssl.verifypeer = FALSE))
holidays <- read.csv(textConnection(tt))

Check the solution at http://blog.forret.com/2011/07/google-docs-infamous-moved-temporarily-error-fixed/:
So what is the solution? Just add “&ndplr=1” to your URL and you will skip the authentication redirect. I'm not sure what the NDPLR parameter name stands for; let's just call it “Never Do Published Link Redirection”.
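
In R terms that is a one-line change. A sketch, assuming the published-sheet URL from the question is still live and that Google still honours the parameter:
# Append ndplr=1 so Google serves the CSV directly instead of redirecting
data <- read.csv(paste0(data_url, "&ndplr=1"))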


libcurl function was given a bad argument CURLOPT_SSL_VERIFYHOST no longer supports 1 as value

While running the "PrepareAnnotationRefseq" function from the customProDB package in R, I ran into a problem due to a compatibility issue with the curl version. I am currently using curl version 4.3.2. The error report I got is:
PrepareAnnotationRefseq(genome='mm39',CDSfasta="geneseq.fasta",pepfasta="proteinseq.fasta", annotation_path, dbsnp = NULL, splice_matrix=FALSE, ClinVar=FALSE)
In curlSetOpt(..., .opts = .opts, curl = h, .encoding = .encoding) : Error setting the option for # 3 (status = 43) (enum = 81) (value = 0x55822c7f3b70): A libcurl function was given a bad argument CURLOPT_SSL_VERIFYHOST no longer supports 1 as value!
This may be a trivial problem for an expert in R; however, with my current skill set I have been unable to resolve it after looking for a solution on several forums and R groups. I would be very grateful if you could shed some light on this issue, perhaps with a patch that fixes the problem.
The answer is in the libcurl manual. If the verify value is set to 1:
From libcurl 7.28.1 to 7.65.3, setting it to 1 made curl_easy_setopt() return an error and leave the flag untouched.
Use 2 instead.
When CURLOPT_SSL_VERIFYHOST is 2, the server's certificate must indicate that the server is the one you meant to connect to, or the connection fails. Simply put, the certificate has to carry the same name as the one in the URL you operate against.
But why touch this option at all? Its default value is 2, which is suitable for most uses of libcurl.
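
If customProDB builds its requests through RCurl, one possible workaround is to override the offending value globally before calling the function. This is a sketch, and whether customProDB actually reads the RCurlOptions option is an assumption on my part:
# Force the value libcurl still accepts (2 = full host-name verification).
# Using 0 would disable the check entirely, which is insecure.
options(RCurlOptions = list(ssl.verifyhost = 2))
PrepareAnnotationRefseq(genome = 'mm39', CDSfasta = "geneseq.fasta",
                        pepfasta = "proteinseq.fasta", annotation_path,
                        dbsnp = NULL, splice_matrix = FALSE, ClinVar = FALSE)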

Is there a restriction on the number of HERE API calls I can make in a loop (using R)?

I am trying to loop through a list of origin-destination lat/long locations to get the transit time. I am getting the following error when I loop; however, when I do a single call (without looping), I get an output without error. I use the freemium HERE API, and I am allowed 250k transactions a month.
library(RJSONIO)  # assumed JSON package: its fromJSON() accepts a URL and a simplify argument

for (i in 1:nrow(y)) {  # loop over y, the data frame indexed below (the original looped over test)
  call <- paste0("https://route.api.here.com/routing/7.2/calculateroute.json",
                 "?app_id=", "appid",
                 "&app_code=", "appcode",
                 "&waypoint0=geo!", y$dc_lat[i], ",", y$dc_long[i],
                 "&waypoint1=geo!", y$store_lat[i], ",", y$store_long[i],
                 "&mode=", "fastest;truck;traffic:enabled",
                 "&trailerscount=", "1",
                 "&routeattributes=", "sh",
                 "&maneuverattributes=", "di,sh",
                 "&limitedweight=", "20")
  response <- fromJSON(call, simplify = TRUE)
  Traffic_time <- response[["response"]][["route"]][[1]][["summary"]][["trafficTime"]] / 60
  Base_time <- response[["response"]][["route"]][[1]][["summary"]][["baseTime"]] / 60
  print(Traffic_time)
}
Error in file(con, "r"): cannot open the connection to 'https://route.api.here.com/routing/7.2/calculateroute.json?app_id=appid&app_code=appcode&waypoint0=geo!45.1005200,-93.2452000&waypoint1=geo!45.0978500,-95.0413620&mode=fastest;truck;traffic:enabled&trailerscount=1&routeattributes=sh&maneuverattributes=di,sh&limitedweight=20'
Traceback:
As per the error, this suggests there is a problem with the file at your end; it could be corrupt, so it is worth trying to change the extension of the file. You can also try restarting your IDE. The number of API calls depends on the plan you have opted for, freemium or pro. You can find more details at https://developer.here.com/faqs
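
If the failures are intermittent throttling rather than a hard quota, a gentler loop sometimes helps. The sketch below adds a pause and a single retry around the call from the question; the pause length and retry count are guesses on my part, not documented HERE limits:
library(RJSONIO)
get_route <- function(call) {
  # Return NULL instead of aborting the loop when the connection fails
  tryCatch(fromJSON(call, simplify = TRUE), error = function(e) NULL)
}
for (i in 1:nrow(y)) {
  call <- paste0("https://route.api.here.com/routing/7.2/calculateroute.json",
                 "?app_id=appid", "&app_code=appcode",
                 "&waypoint0=geo!", y$dc_lat[i], ",", y$dc_long[i],
                 "&waypoint1=geo!", y$store_lat[i], ",", y$store_long[i],
                 "&mode=fastest;truck;traffic:enabled")
  response <- get_route(call)
  if (is.null(response)) {   # one retry after a longer pause
    Sys.sleep(5)
    response <- get_route(call)
  }
  Sys.sleep(0.5)             # throttle successive requests
}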

Get JSON request from link into R

I am trying to learn how to collect data from the web into R. There's a website from the Brazilian Ministry of Health that shares the numbers of the disease here in Brazil; it is a public portal.
COVIDBRASIL
So, on this page, I am interested in the graph that displays the daily reporting of cases here in Brazil. Using the inspector in Google Chrome, I can access the JSON file feeding the data to this chart. My question is how I could get this file automatically with R. When I try to open the JSON in a new tab outside the inspector's "Response" tab, I get an "Unauthorized" message. Is there any way of doing this, or would I have to manually copy the JSON from the inspector and update my R script every time?
In my case, I am interested in the "PortalDias" response. Thank you.
URL PORTAL DIAS
You need to set some headers to prevent this "Unauthorized" message. I copied them from the 'Headers' section in the browser's 'Network' window.
library(curl)
library(jsonlite)
url <- "https://xx9p7hp1p7.execute-api.us-east-1.amazonaws.com/prod/PortalDias"
h <- new_handle()
handle_setheaders(h,
  Host = "xx9p7hp1p7.execute-api.us-east-1.amazonaws.com",
  `Accept-Encoding` = "gzip, deflate, br",
  `X-Parse-Application-Id` = "unAFkcaNDeXajurGB7LChj8SgQYS2ptm")
fromJSON(rawToChar(curl_fetch_memory(url, handle = h)$content))
# $results
# objectId label createdAt updatedAt qtd_confirmado qtd_obito
# 1 6vr9rUPbd4 26/02 2020-03-25T16:25:53.970Z 2020-03-25T22:25:42.967Z 1 123
# 2 FUNHS00sng 27/02 2020-03-25T16:27:34.040Z 2020-03-25T22:25:55.169Z 0 34
# 3 t4qW51clpj 28/02 2020-03-25T19:08:36.689Z 2020-03-25T22:26:02.427Z 0 35
# ...
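
For comparison, the same request can be made with httr if you already have it installed. A sketch under the same assumptions as above; it sends only the application-id header, on the assumption that this is the one the API actually checks:
library(httr)
library(jsonlite)
res <- GET("https://xx9p7hp1p7.execute-api.us-east-1.amazonaws.com/prod/PortalDias",
           add_headers(`X-Parse-Application-Id` = "unAFkcaNDeXajurGB7LChj8SgQYS2ptm"))
dat <- fromJSON(content(res, as = "text", encoding = "UTF-8"))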

What is the maximum length of a URL in R?

I am constructing a URI in R which is generated on the fly with ~40,000 characters.
I tried using
RCurl
jsonlite
curl
All three give a "bad URL" error when connecting through an HTTP GET request. I am refraining from using httr, as it would install 5 additional dependencies, while I want minimal dependencies in my R program. I am unsure whether even httr would be able to handle so many characters in a URL.
Is there a way I can encode/pack it into an allowed limit, or a better approach/package that can handle URLs of any length, similar to Python's urllib?
Thanks in advance.
This is not a limitation of RCurl.
Let's make a long URL and try it:
> library(RCurl)
> s = paste0(rep(letters, 2000), collapse = "")
> nchar(s)
[1] 52000
That's 52000 characters of a-z. Stick it on a URL:
> url = paste0("http://www.omegahat.net/RCurl/", s)
> nchar(url)
[1] 52030
> substr(url, 1, 40)
[1] "http://www.omegahat.net/RCurl/abcdefghij"
Now try and get it:
> txt = getURL(url)
> txt
[1] "<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML 2.0//EN\">\n<html><head>\n<title>414 Request-URI Too Large</title>\n</head><body>\n<h1>Request-URI Too Large</h1>\n<p>The requested URL's length exceeds the capacity\nlimit for this server.<br />\n</p>\n</body></html>\n"
>
That's the correct response from the server: it decided the URL was too long and returned a 414 error, which proves RCurl can request URLs of over 40,000 characters.
Until we know more, I can only presume the "bad URL" message is coming from the server, about which we know nothing.
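
If you control the server, or it offers an equivalent endpoint, the usual escape hatch is to move the long query into a POST body, which is not subject to request-line limits. A sketch; httpbin.org/post is just a stand-in echo endpoint, not your actual service, and "payload" is a hypothetical parameter name:
library(RCurl)
long_query <- paste0(rep(letters, 2000), collapse = "")  # ~52,000 characters
# Send the data in the request body rather than in the URL
res <- postForm("http://httpbin.org/post", payload = long_query, style = "POST")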

R Quandl: couldn't connect to host

I am beginning to use Quandl facilities to import datasets into R with the Quandl R API. It appears to be the easiest thing. However, I have a problem. The snippet of code pasted below does not work for me; it returns an error.
library(Quandl)
my_quandl_dtst <- Quandl("DOE/RBRTE")
Error in function (type, msg, asError = TRUE) : couldn't connect to host
What could be the cause of the problem?
I searched this site and found some solutions, including the one below, but it does not work for me.
set_config(use_proxy(url='your.proxy.url',port,username,password))
On the other hand, read.csv with the URL pasted from the Quandl website's export-dataset facility works:
my_quandl_dtst <- read.csv('http://www.quandl.com/api/v1/datasets/DOE/RBRTE.csv?', colClasses = c('Date' = 'Date'))
I would really like to use the Quandl library, since using it would make my code cleaner. Therefore I would appreciate any help. Thanks in advance.
OK, I found the solution: I had to set RCurlOptions, because the Quandl function uses getURL() to download data from the URL. But I had to use the options() function as well. So:
options(RCurlOptions = list(proxy = "my.proxy", proxyport = my.proxyport.number))
head(quandldata <- Quandl("NSE/OIL"))
Date Open High Low Last Close Total Trade Quantity Turnover (Lacs)
1 2014-03-03 453.5 460.05 450.10 450.30 451.30 90347 410.08
2 2014-02-28 440.0 460.00 440.00 457.60 455.55 565074 2544.66
3 2014-02-26 446.2 450.95 440.00 440.65 440.60 179055 794.24
4 2014-02-25 445.1 451.75 445.10 446.60 447.20 86858 389.38
5 2014-02-24 443.0 449.50 443.00 446.50 446.30 81197 362.33
6 2014-02-21 447.9 448.65 442.95 445.50 446.80 95791 427.32
I guess you need to check whether the domain quandl.com accepts remote connections to the RBRTE.csv file.
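
One way to rule that out from the same R session is a quick reachability check with RCurl, which Quandl uses under the hood. A sketch; a FALSE result would point at a proxy or firewall issue on your side rather than at Quandl:
library(RCurl)
# TRUE means the host answered the request for this URL
url.exists("http://www.quandl.com/api/v1/datasets/DOE/RBRTE.csv")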
