Read a CSV from github into R - r

I am trying to read a CSV from github into R:
latent.growth.data <- read.csv("https://github.com/aronlindberg/latent_growth_classes/blob/master/LGC_data.csv")
However, this gives me:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : unsupported URL scheme
I tried ?read.csv, ?download.file, getURL (which only returned strange HTML), as well as the data import manual, but still cannot understand how to make it work.
What am I doing wrong?

Try this:
library(RCurl)
x <- getURL("https://raw.github.com/aronlindberg/latent_growth_classes/master/LGC_data.csv")
y <- read.csv(text = x)
You have two problems:
You're not linking to the "raw" text file, but Github's display version (visit the URL for https:\raw.github.com....csv to see the difference between the raw version and the display version).
https is a problem for R in many cases, so you need to use a package like RCurl to get around it. In some cases (not with Github, though) you can simply replace https with http and things work out, so you can always try that out first, but I find using RCurl reliable and not too much extra typing.

From the documentation of url:
Note that ‘https://’ connections are not supported (with some
exceptions on Windows).
So the problem is that R does not allow conncetions to https URL's.
You can use download.file with curl:
download.file("https://raw.github.com/aronlindberg/latent_growth_classes/master/LGC_data.csv",
destfile = "/tmp/test.csv", method = "curl")

I am using R 3.0.2 and this code does the job.
urlfile<-'https://raw.github.com/aronlindberg/latent_growth_classes/master/LGC_data.csv'
dsin<-read.csv(urlfile)
and this as well
urlfile<-'https://raw.github.com/aronlindberg/latent_growth_classes/master/LGC_data.csv'
dsin<-read.csv(url(urlfile))
edit (sessionInfo)
R version 3.0.2 (2013-09-25)
Platform: i386-w64-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=Polish_Poland.1250 LC_CTYPE=Polish_Poland.1250
[3] LC_MONETARY=Polish_Poland.1250 LC_NUMERIC=C
[5] LC_TIME=Polish_Poland.1250
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] tools_3.0.2

In similar style to akhmed, I thought I would update the answer, since now you can just use Hadley's readr package.
Just one thing to note: you'll need the url to be the raw content (see the //raw.git... below).
Here's an example:
library(readr)
data <- read_csv("https://raw.githubusercontent.com/RobertMyles/Bayesian-Ideal-Point-IRT-Models/master/Senate_Example.csv")
Voilà!

Realizing that the question is very old, Google still reported it as a top result (at least for me) so I decided to provide the answer for year 2015.
Folks are generally migrating now to curl package (including famous httr) as described by r-bloggers which offers the following very simple solution:
library(curl)
x <- read.csv( curl("https://raw.githubusercontent.com/trinker/dummy/master/data/gcircles.csv") )

This is what I've been helping develop rio for. It's basically a universal data import/export package that supports HTTPS/SSL and infers the file type from its extension, thus allowing you to read basically anything using one import function:
library("rio")
If you grab the "raw" url for your CSV from Github, you can load it one line with import:
import("https://raw.githubusercontent.com/aronlindberg/latent_growth_classes/master/LGC_data.csv")
The result is a data.frame:
top100_repository_name month monthly_increase monthly_begin_at monthly_end_with
1 Bukkit 2012-03 9 431 440
2 Bukkit 2012-04 19 438 457
3 Bukkit 2012-05 19 455 474
4 Bukkit 2012-06 18 475 493
5 Bukkit 2012-07 15 492 507
6 Bukkit 2012-08 50 506 556
...

Seems nowadays GitHub wants you to go through their API to fetch content. I used the gh package as follows:
require(gh)
tmp = tempfile()
qurl = 'https://raw.githubusercontent.com/aronlindberg/latent_growth_classes/master/LGC_data.csv'
# download
gh(paste0('GET ', qurl), .destfile = tmp, .overwrite = TRUE)
# read
read.csv(tmp)
The important part is that you provide an personal access token (PAT). Either through the gh(.token = ) argument, or as I did, by setting the PAT globally in an ~/.Renviron file [1]. Of course you first have to create the PAT at your GitHub account.
[1] ~/.Renviron, I guess is searched first by all r-lib packages, as gh is one. The token therein should look like this:
GITHUB_PAT = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
You could also use the usethis package to set up the PAT.

A rather dummy way... using copy/paste from clipboard
x <- read.table(file = "clipboard", sep = "t", header=TRUE)

curl might not work in windows at least for me
This is what worked for me in Windows
download.file("https://github.com/aronlindberg/latent_growth_classes/master/LGC_data.csv",
destfile = "/tmp/test.csv",method="wininet")
In Linux
download.file("https://github.com/aronlindberg/latent_growth_classes/master/LGC_data.csv",
destfile = "/tmp/test.csv",method="curl")

As mentioned by other postings, just go to the link for the raw code on github.
For example:
x <- read.csv("https://raw.githubusercontent.com/rfordatascience/
tidytuesday/master/data/2018/2018-04-23/week4_australian_salary.csv")

Related

How to retrieve a tsv file from github directly to my R session? [duplicate]

I am trying to read a CSV from github into R:
latent.growth.data <- read.csv("https://github.com/aronlindberg/latent_growth_classes/blob/master/LGC_data.csv")
However, this gives me:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : unsupported URL scheme
I tried ?read.csv, ?download.file, getURL (which only returned strange HTML), as well as the data import manual, but still cannot understand how to make it work.
What am I doing wrong?
Try this:
library(RCurl)
x <- getURL("https://raw.github.com/aronlindberg/latent_growth_classes/master/LGC_data.csv")
y <- read.csv(text = x)
You have two problems:
You're not linking to the "raw" text file, but Github's display version (visit the URL for https:\raw.github.com....csv to see the difference between the raw version and the display version).
https is a problem for R in many cases, so you need to use a package like RCurl to get around it. In some cases (not with Github, though) you can simply replace https with http and things work out, so you can always try that out first, but I find using RCurl reliable and not too much extra typing.
From the documentation of url:
Note that ‘https://’ connections are not supported (with some
exceptions on Windows).
So the problem is that R does not allow conncetions to https URL's.
You can use download.file with curl:
download.file("https://raw.github.com/aronlindberg/latent_growth_classes/master/LGC_data.csv",
destfile = "/tmp/test.csv", method = "curl")
I am using R 3.0.2 and this code does the job.
urlfile<-'https://raw.github.com/aronlindberg/latent_growth_classes/master/LGC_data.csv'
dsin<-read.csv(urlfile)
and this as well
urlfile<-'https://raw.github.com/aronlindberg/latent_growth_classes/master/LGC_data.csv'
dsin<-read.csv(url(urlfile))
edit (sessionInfo)
R version 3.0.2 (2013-09-25)
Platform: i386-w64-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=Polish_Poland.1250 LC_CTYPE=Polish_Poland.1250
[3] LC_MONETARY=Polish_Poland.1250 LC_NUMERIC=C
[5] LC_TIME=Polish_Poland.1250
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] tools_3.0.2
In similar style to akhmed, I thought I would update the answer, since now you can just use Hadley's readr package.
Just one thing to note: you'll need the url to be the raw content (see the //raw.git... below).
Here's an example:
library(readr)
data <- read_csv("https://raw.githubusercontent.com/RobertMyles/Bayesian-Ideal-Point-IRT-Models/master/Senate_Example.csv")
Voilà!
Realizing that the question is very old, Google still reported it as a top result (at least for me) so I decided to provide the answer for year 2015.
Folks are generally migrating now to curl package (including famous httr) as described by r-bloggers which offers the following very simple solution:
library(curl)
x <- read.csv( curl("https://raw.githubusercontent.com/trinker/dummy/master/data/gcircles.csv") )
This is what I've been helping develop rio for. It's basically a universal data import/export package that supports HTTPS/SSL and infers the file type from its extension, thus allowing you to read basically anything using one import function:
library("rio")
If you grab the "raw" url for your CSV from Github, you can load it one line with import:
import("https://raw.githubusercontent.com/aronlindberg/latent_growth_classes/master/LGC_data.csv")
The result is a data.frame:
top100_repository_name month monthly_increase monthly_begin_at monthly_end_with
1 Bukkit 2012-03 9 431 440
2 Bukkit 2012-04 19 438 457
3 Bukkit 2012-05 19 455 474
4 Bukkit 2012-06 18 475 493
5 Bukkit 2012-07 15 492 507
6 Bukkit 2012-08 50 506 556
...
Seems nowadays GitHub wants you to go through their API to fetch content. I used the gh package as follows:
require(gh)
tmp = tempfile()
qurl = 'https://raw.githubusercontent.com/aronlindberg/latent_growth_classes/master/LGC_data.csv'
# download
gh(paste0('GET ', qurl), .destfile = tmp, .overwrite = TRUE)
# read
read.csv(tmp)
The important part is that you provide an personal access token (PAT). Either through the gh(.token = ) argument, or as I did, by setting the PAT globally in an ~/.Renviron file [1]. Of course you first have to create the PAT at your GitHub account.
[1] ~/.Renviron, I guess is searched first by all r-lib packages, as gh is one. The token therein should look like this:
GITHUB_PAT = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
You could also use the usethis package to set up the PAT.
A rather dummy way... using copy/paste from clipboard
x <- read.table(file = "clipboard", sep = "t", header=TRUE)
curl might not work in windows at least for me
This is what worked for me in Windows
download.file("https://github.com/aronlindberg/latent_growth_classes/master/LGC_data.csv",
destfile = "/tmp/test.csv",method="wininet")
In Linux
download.file("https://github.com/aronlindberg/latent_growth_classes/master/LGC_data.csv",
destfile = "/tmp/test.csv",method="curl")
As mentioned by other postings, just go to the link for the raw code on github.
For example:
x <- read.csv("https://raw.githubusercontent.com/rfordatascience/
tidytuesday/master/data/2018/2018-04-23/week4_australian_salary.csv")

encoding error with read_html

I am trying to web scrape a page. I thought of using the package rvest.
However, I'm stuck in the first step, which is to use read_html to read the content.
Here´s my code:
library(rvest)
url <- "http://simec.mec.gov.br/painelObras/recurso.php?obra=17956"
obra_caridade <- read_html(url,
encoding = "ISO-8895-1")
And I got the following error:
Error in doc_parse_raw(x, encoding = encoding, base_url = base_url, as_html = as_html, :
Input is not proper UTF-8, indicate encoding !
Bytes: 0xE3 0x6F 0x20 0x65 [9]
I tried using what similar questions had as answers, but it did not solve my issue:
obra_caridade <- read_html(iconv(url, to = "UTF-8"),
encoding = "UTF-8")
obra_caridade <- read_html(iconv(url, to = "ISO-8895-1"),
encoding = "ISO-8895-1")
Both attempts returned a similar error.
Does anyone has any suggestion about how to solve this issue?
Here's my session info:
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
locale:
[1] LC_COLLATE=Portuguese_Brazil.1252 LC_CTYPE=Portuguese_Brazil.1252
[3] LC_MONETARY=Portuguese_Brazil.1252 LC_NUMERIC=C
[5] LC_TIME=Portuguese_Brazil.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rvest_0.3.2 xml2_1.1.1
loaded via a namespace (and not attached):
[1] httr_1.2.1 magrittr_1.5 R6_2.2.1 tools_3.3.1 curl_2.6 Rcpp_0.12.11
What's the issue?
Your issue here is in correctly determining the encoding of the webpage.
The good news
Your approach looks like a good one to me since you looked at the source code and found the Meta charset, given as ISO-8895-1. It is certainly ideal to be told the encoding, rather than have to resort to guess-work.
The bad news
I don't believe that encoding exists. Firstly, when I search for it online the results tend to look like typos. Secondly, R provides you with a list of supported encodings via iconvlist(). ISO-8895-1 is not in the list, so entering it as an argument to read_html isn't useful. I think it'd be nice if entering a non-supported encoding threw a warning, but this doesn't seem to happen.
Quick solution
As suggested by #MrFlick in a comment, using encoding = "latin1" appears to work.
I suspect the Meta charset has a typo and it should read ISO-8859-1 (which is the same thing as latin1).
Tips on guessing an encoding
What is your browser doing?
When loading the page in a browser, you can see what encoding it is using to read the page. If the page looks right, this seems like a sensible guess. In this instance, Firefox uses Western encoding (i.e. ISO-8859-1).
Guessing with R
rvest::guess_encoding is a nice, user-friendly function which can give a quick estimate. You can provide the function with a url e.g. guess_encoding(url), or copy in phrases with more complex characters e.g. guess_encoding("Situação do Termo/Convênio:").
One thing to note about this function is it can only detect from 30 of the more common encodings, but there are many more possibilities.
As mentioned earlier, iconvlist() provides a list of supported encodings. By looping through these encodings and examining some text in the page to see if it's what we expect, we should end up with a shortlist of possible encodings (and rule many encodings out).
Sample code can be found at the bottom of this answer.
Final comments
All the above points towards ISO-8859-1 being a sensible guess for the encoding.
The page url contains a .br extension indicating it's Brazilian, and - according to Wikipedia - this encoding has complete language coverage for Brazilian Portuguese, which suggests it might not be a crazy choice for whoever created the webpage. I believe this is also a reasonably common encoding type.
Code
Sample code for 'Guessing with R' point 2 (using iconvlist()):
library(rvest)
url <- "http://simec.mec.gov.br/painelObras/recurso.php?obra=17956"
# 1. See which encodings don't throw an error
read_page <- lapply(unique(iconvlist()), function(encoding_attempt) {
# Optional print statement to show progress to 1 since this can take some time
print(match(encoding_attempt, iconvlist()) / length(iconvlist()))
read_attempt <- tryCatch(expr=read_html(url, encoding=encoding_attempt),
error=function(condition) NA,
warning=function(condition) message(condition))
return(read_attempt)
})
names(read_page) <- unique(iconvlist())
# 2. See which encodings correctly display some complex characters
read_phrase <- lapply(x, function(encoded_page)
if(!is.na(encoded_page))
html_text(html_nodes(encoded_page, ".dl-horizontal:nth-child(1) dt")))
# We've ended up with 27 encodings which could be sensible...
encoding_shortlist <- names(read_phrase)[read_phrase == "Situação:"]

Import data from GitHub [duplicate]

I am trying to read a CSV from github into R:
latent.growth.data <- read.csv("https://github.com/aronlindberg/latent_growth_classes/blob/master/LGC_data.csv")
However, this gives me:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : unsupported URL scheme
I tried ?read.csv, ?download.file, getURL (which only returned strange HTML), as well as the data import manual, but still cannot understand how to make it work.
What am I doing wrong?
Try this:
library(RCurl)
x <- getURL("https://raw.github.com/aronlindberg/latent_growth_classes/master/LGC_data.csv")
y <- read.csv(text = x)
You have two problems:
You're not linking to the "raw" text file, but Github's display version (visit the URL for https:\raw.github.com....csv to see the difference between the raw version and the display version).
https is a problem for R in many cases, so you need to use a package like RCurl to get around it. In some cases (not with Github, though) you can simply replace https with http and things work out, so you can always try that out first, but I find using RCurl reliable and not too much extra typing.
From the documentation of url:
Note that ‘https://’ connections are not supported (with some
exceptions on Windows).
So the problem is that R does not allow conncetions to https URL's.
You can use download.file with curl:
download.file("https://raw.github.com/aronlindberg/latent_growth_classes/master/LGC_data.csv",
destfile = "/tmp/test.csv", method = "curl")
I am using R 3.0.2 and this code does the job.
urlfile<-'https://raw.github.com/aronlindberg/latent_growth_classes/master/LGC_data.csv'
dsin<-read.csv(urlfile)
and this as well
urlfile<-'https://raw.github.com/aronlindberg/latent_growth_classes/master/LGC_data.csv'
dsin<-read.csv(url(urlfile))
edit (sessionInfo)
R version 3.0.2 (2013-09-25)
Platform: i386-w64-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=Polish_Poland.1250 LC_CTYPE=Polish_Poland.1250
[3] LC_MONETARY=Polish_Poland.1250 LC_NUMERIC=C
[5] LC_TIME=Polish_Poland.1250
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] tools_3.0.2
In similar style to akhmed, I thought I would update the answer, since now you can just use Hadley's readr package.
Just one thing to note: you'll need the url to be the raw content (see the //raw.git... below).
Here's an example:
library(readr)
data <- read_csv("https://raw.githubusercontent.com/RobertMyles/Bayesian-Ideal-Point-IRT-Models/master/Senate_Example.csv")
Voilà!
Realizing that the question is very old, Google still reported it as a top result (at least for me) so I decided to provide the answer for year 2015.
Folks are generally migrating now to curl package (including famous httr) as described by r-bloggers which offers the following very simple solution:
library(curl)
x <- read.csv( curl("https://raw.githubusercontent.com/trinker/dummy/master/data/gcircles.csv") )
This is what I've been helping develop rio for. It's basically a universal data import/export package that supports HTTPS/SSL and infers the file type from its extension, thus allowing you to read basically anything using one import function:
library("rio")
If you grab the "raw" url for your CSV from Github, you can load it one line with import:
import("https://raw.githubusercontent.com/aronlindberg/latent_growth_classes/master/LGC_data.csv")
The result is a data.frame:
top100_repository_name month monthly_increase monthly_begin_at monthly_end_with
1 Bukkit 2012-03 9 431 440
2 Bukkit 2012-04 19 438 457
3 Bukkit 2012-05 19 455 474
4 Bukkit 2012-06 18 475 493
5 Bukkit 2012-07 15 492 507
6 Bukkit 2012-08 50 506 556
...
Seems nowadays GitHub wants you to go through their API to fetch content. I used the gh package as follows:
require(gh)
tmp = tempfile()
qurl = 'https://raw.githubusercontent.com/aronlindberg/latent_growth_classes/master/LGC_data.csv'
# download
gh(paste0('GET ', qurl), .destfile = tmp, .overwrite = TRUE)
# read
read.csv(tmp)
The important part is that you provide an personal access token (PAT). Either through the gh(.token = ) argument, or as I did, by setting the PAT globally in an ~/.Renviron file [1]. Of course you first have to create the PAT at your GitHub account.
[1] ~/.Renviron, I guess is searched first by all r-lib packages, as gh is one. The token therein should look like this:
GITHUB_PAT = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
You could also use the usethis package to set up the PAT.
A rather dummy way... using copy/paste from clipboard
x <- read.table(file = "clipboard", sep = "t", header=TRUE)
curl might not work in windows at least for me
This is what worked for me in Windows
download.file("https://github.com/aronlindberg/latent_growth_classes/master/LGC_data.csv",
destfile = "/tmp/test.csv",method="wininet")
In Linux
download.file("https://github.com/aronlindberg/latent_growth_classes/master/LGC_data.csv",
destfile = "/tmp/test.csv",method="curl")
As mentioned by other postings, just go to the link for the raw code on github.
For example:
x <- read.csv("https://raw.githubusercontent.com/rfordatascience/
tidytuesday/master/data/2018/2018-04-23/week4_australian_salary.csv")

Quantmod, getSymbols error trying to replicate answer

I just downloaded the package Quantmod and have been playing with getSymbols. I want to be able to get data for multiple stocks as in this question: getSymbols and using lapply, Cl, and merge to extract close prices.
Unfortuantely, when I try to duplicate the answer:
tickers <- c("SPY","DIA","IWM","SMH","OIH","XLY",
"XLP","XLE","XLI","XLB","XLK","XLU")
getSymbols(tickers, from="2001-03-01", to="2011-03-11")
I get the following error message:
Error in download.file(paste(yahoo.URL, "s=", Symbols.name, "&a=", from.m, :
cannot open URL
'http://chart.yahoo.com/table.csv?s=SPY&a=2&b=01&c=2001&d=2&e=11&f=2011&g=d&q=q&y=0&z=SPY&x=.csv'
In addition: Warning message:
In download.file(paste(yahoo.URL, "s=", Symbols.name, "&a=", from.m,
: cannot open: HTTP status was '0 (null)'
Here is my sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] quantmod_0.4-0 TTR_0.22-0 xts_0.9-7 zoo_1.7-10 Defaults_1.1-1
loaded via a namespace (and not attached):
[1] grid_3.0.2 lattice_0.20-23 tools_3.0.2
EDIT: In response to OP's comment:
So the bottom line seems to be that sites which provide free downloads of historical data are quirky, to say the least. They do not necessarily work for all valid symbols, and sometimes they become unavailable for no apparent reason. ichart.yahoo.com/table.csv worked for me 24 hours ago, but does not work (for me) at the moment. This may be because Yahoo has imposed a 24 hour lockout on my IP, which they will do if they detect activity interpretable as a DDOS attack. Or it might be for some other reason...
The updated code below, which queries Google, does work (again, at the moment), for everything except DJIA. Note that there is more success if you specify the exchange and the symbol (EXCHANGE:SYMBOL). I was unable to download SMH without the exchange. Finally, if your are having problems try uncommenting the print statement and pasting the url into a browser to see what happens.
tickers <- c("SPY","DJIA","IWM","NYSEARCA:SMH","OIH","XLY",
"XLP","XLE","XLI","XLB","XLK","XLU")
g <- function(x,from,to,output="csv") {
uri <- "http://www.google.com/finance/historical"
q.symbol <- paste("q",x,sep="=")
q.from <- paste("startdate",from,sep="=")
q.to <- paste("enddate",to,sep="=")
q.output <- paste("output",output,sep="=")
query <- paste(q.symbol,q.output,q.from,q.to,sep="&")
url <- paste(uri,query,sep="?")
# print(url)
try(assign(x,read.csv(url),envir=.GlobalEnv))
}
lapply(tickers,g,from="2001-03-01",to="2011-03-11",output="csv")
You can download DJI from the St. Louis Fed, which is very reliable. Unfortunately, you get all of it (from 1896), and it's a time series.
getSymbols("DJIA",src="FRED")
Original Response:
This worked for me, for everything except SMH and OIH.
tickers <- c("SPY","DJIA","IWM","SMH","OIH","XLY",
"XLP","XLE","XLI","XLB","XLK","XLU")
f <- function(x) {
uri <- "http://ichart.yahoo.com/table.csv"
symbol <- paste("s",x,sep="=")
from <- "a=2&b=1&c=2001"
to <- "d=2&e=11&f=2011"
period <- "g=d"
ignore <- "ignore=.csv"
query <- paste(symbol,from,to,period,ignore,sep="&")
url <- paste(uri,query,sep="?")
try(assign(x,read.csv(url),envir=.GlobalEnv))
}
lapply(tickers,f)
The main difference between this and getSymbols(...) is that this uses ichart.yahoo.com (as documented here), whereas getSymbols(...) uses chart.yahoo.com. The former seems to be much more reliable. In my experience, using getSymbols(...) with Yahoo is a monumental headache.
If the suggestion of #user2492310, to use src="google" works for you, then clearly this is the way to go. It didn't work for me.
One other note: SMH and OIH did not exist in 2001. The others all go back to at least 2000. So it might be that ichart.yahoo.com (and chart.yahoo.com) throws an error if you provide a date range outside of the symbol's operating range.

Fail to download csv with read.table(url()) from github [duplicate]

I am trying to read a CSV from github into R:
latent.growth.data <- read.csv("https://github.com/aronlindberg/latent_growth_classes/blob/master/LGC_data.csv")
However, this gives me:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : unsupported URL scheme
I tried ?read.csv, ?download.file, getURL (which only returned strange HTML), as well as the data import manual, but still cannot understand how to make it work.
What am I doing wrong?
Try this:
library(RCurl)
x <- getURL("https://raw.github.com/aronlindberg/latent_growth_classes/master/LGC_data.csv")
y <- read.csv(text = x)
You have two problems:
You're not linking to the "raw" text file, but Github's display version (visit the URL for https:\raw.github.com....csv to see the difference between the raw version and the display version).
https is a problem for R in many cases, so you need to use a package like RCurl to get around it. In some cases (not with Github, though) you can simply replace https with http and things work out, so you can always try that out first, but I find using RCurl reliable and not too much extra typing.
From the documentation of url:
Note that ‘https://’ connections are not supported (with some
exceptions on Windows).
So the problem is that R does not allow conncetions to https URL's.
You can use download.file with curl:
download.file("https://raw.github.com/aronlindberg/latent_growth_classes/master/LGC_data.csv",
destfile = "/tmp/test.csv", method = "curl")
I am using R 3.0.2 and this code does the job.
urlfile<-'https://raw.github.com/aronlindberg/latent_growth_classes/master/LGC_data.csv'
dsin<-read.csv(urlfile)
and this as well
urlfile<-'https://raw.github.com/aronlindberg/latent_growth_classes/master/LGC_data.csv'
dsin<-read.csv(url(urlfile))
edit (sessionInfo)
R version 3.0.2 (2013-09-25)
Platform: i386-w64-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=Polish_Poland.1250 LC_CTYPE=Polish_Poland.1250
[3] LC_MONETARY=Polish_Poland.1250 LC_NUMERIC=C
[5] LC_TIME=Polish_Poland.1250
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] tools_3.0.2
In similar style to akhmed, I thought I would update the answer, since now you can just use Hadley's readr package.
Just one thing to note: you'll need the url to be the raw content (see the //raw.git... below).
Here's an example:
library(readr)
data <- read_csv("https://raw.githubusercontent.com/RobertMyles/Bayesian-Ideal-Point-IRT-Models/master/Senate_Example.csv")
Voilà!
Realizing that the question is very old, Google still reported it as a top result (at least for me) so I decided to provide the answer for year 2015.
Folks are generally migrating now to curl package (including famous httr) as described by r-bloggers which offers the following very simple solution:
library(curl)
x <- read.csv( curl("https://raw.githubusercontent.com/trinker/dummy/master/data/gcircles.csv") )
This is what I've been helping develop rio for. It's basically a universal data import/export package that supports HTTPS/SSL and infers the file type from its extension, thus allowing you to read basically anything using one import function:
library("rio")
If you grab the "raw" url for your CSV from Github, you can load it one line with import:
import("https://raw.githubusercontent.com/aronlindberg/latent_growth_classes/master/LGC_data.csv")
The result is a data.frame:
top100_repository_name month monthly_increase monthly_begin_at monthly_end_with
1 Bukkit 2012-03 9 431 440
2 Bukkit 2012-04 19 438 457
3 Bukkit 2012-05 19 455 474
4 Bukkit 2012-06 18 475 493
5 Bukkit 2012-07 15 492 507
6 Bukkit 2012-08 50 506 556
...
Seems nowadays GitHub wants you to go through their API to fetch content. I used the gh package as follows:
require(gh)
tmp = tempfile()
qurl = 'https://raw.githubusercontent.com/aronlindberg/latent_growth_classes/master/LGC_data.csv'
# download
gh(paste0('GET ', qurl), .destfile = tmp, .overwrite = TRUE)
# read
read.csv(tmp)
The important part is that you provide an personal access token (PAT). Either through the gh(.token = ) argument, or as I did, by setting the PAT globally in an ~/.Renviron file [1]. Of course you first have to create the PAT at your GitHub account.
[1] ~/.Renviron, I guess is searched first by all r-lib packages, as gh is one. The token therein should look like this:
GITHUB_PAT = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
You could also use the usethis package to set up the PAT.
A rather dummy way... using copy/paste from clipboard
x <- read.table(file = "clipboard", sep = "t", header=TRUE)
curl might not work in windows at least for me
This is what worked for me in Windows
download.file("https://github.com/aronlindberg/latent_growth_classes/master/LGC_data.csv",
destfile = "/tmp/test.csv",method="wininet")
In Linux
download.file("https://github.com/aronlindberg/latent_growth_classes/master/LGC_data.csv",
destfile = "/tmp/test.csv",method="curl")
As mentioned by other postings, just go to the link for the raw code on github.
For example:
x <- read.csv("https://raw.githubusercontent.com/rfordatascience/
tidytuesday/master/data/2018/2018-04-23/week4_australian_salary.csv")

Resources