Importing an Excel file directly from the web in R

I need to import an Excel file directly from the NYSE website. The spreadsheet URL is https://quotespeed.morningstar.com/exportChartDataToExcel.jsp?tickers=AAPL&symbols=126.1.AAPL&st=1980-12-1&ed=2015-6-8&f=m&dty=1&types=1&ver=1.6.0&qs_wsid=E43474CC03753FE0E777D89877788ECB . I tried using the gdata package and changing https to http, but it still doesn't work. Does anybody know a solution to this issue?
EDIT: The file has to be imported into R directly from the website (project requirement).

Without information about why the gdata package does not work for you, I have to guess. Make sure you have Perl installed - you can download it at http://www.activestate.com/activeperl
This works for me:
library('gdata')
## URL broken into multiple lines for readability
url <- paste("https://quotespeed.morningstar.com/exportChartDataToExcel.",
             "jsp?tickers=AAPL&symbols=126.1.AAPL&st=1980-12-1&ed=2015-",
             "6-8&f=m&dty=1&types=1&ver=1.6.0&qs_wsid=E43474CC03753FE0E",
             "777D89877788ECB", sep = "")
url <- gsub("https", "http",url)
data <- read.xls(url, perl = "C:/Perl64/bin/perl.exe")
Without perl = "path_to_perl.exe" I got the error
Error in findPerl(verbose = verbose) :
perl executable not found. Use perl= argument to specify the correct path.
Error in file.exists(tfn) : invalid 'file' argument

Use the RCurl package to download the file and the readxl package by Hadley to read the Excel file.
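A minimal sketch of that approach, assuming the url object built above still serves a valid .xls file and that skipping SSL peer verification is acceptable for your use case:
library(RCurl)
library(readxl)
## fetch the raw bytes over https and write them to a temporary file
bin <- getBinaryURL(url, ssl.verifypeer = FALSE)
tmp <- tempfile(fileext = ".xls")
writeBin(bin, tmp)
## read the downloaded workbook into a data frame
data <- read_excel(tmp)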

Related

Error when exporting R data frame using openxlsx ("Error in zipr")

Usually I use the openxlsx package and the write.xlsx function when exporting R data frames into .xlsx files. Since yesterday - probably after I used the XLConnect package - something got messed up and the write.xlsx function doesn't work anymore. This is the error I get:
Error in zipr(zipfile = tmpFile, include_directories = FALSE, files = list.files(path = tmpDir, :
unused argument (include_directories = FALSE)
Unfortunately, I don't understand what this error means. Thanks for any helpful advice.
Edit: The function works when I use an older openxlsx version (4.1.0).
I was getting the same error.
I think the problem is with the dependencies of openxlsx. There is a "zipR" package that might be picked up when you install openxlsx, while the actual dependency is the "zip" package:
https://cran.r-project.org/web/packages/zip/index.html
https://cran.r-project.org/web/packages/zipR/zipR.pdf
I installed "zip" along with openxlsx and I don't get the error anymore.
I do not really understand the error message here, but my computer does not allow me to save files directly to "c:/". If I remove the "c:/" part, it works fine and saves the file to the current working directory.
library(openxlsx)
df <- data.frame('x' = c(1,2,3),
                 'y' = c(3,2,1))
openxlsx::write.xlsx(df, "test.xlsx")
You could also try another package: writexl
writexl::write_xlsx(df, "text5.xlsx")
This works on my machine.

Cannot read data from an xlsx file in RStudio

I have installed the required packages - gdata and ggplot2 - and I have installed Perl.
library(gdata)
library(ggplot2)
# Read the data from the excel spreadsheet
df = data.frame(read.xls("AssignmentData.xlsx", sheet = "Data", header = TRUE, perl = "C:\\Strawberry\\perl\\bin\\perl.exe"))
However when I run this I get the following error:
Error in xls2sep(xls, sheet, verbose = verbose, ..., method = method, :
Intermediate file 'C:\Users\CLAIRE~1\AppData\Local\Temp\RtmpE3UYWA\file8983d8e1efc.csv' missing!
In addition: Warning message:
running command '"C:\STRAWB~1\perl\bin\perl.exe" "C:/Users/Claire1992/Documents/R/win-library/3.1/gdata/perl/xls2csv.pl" "AssignmentData.xlsx" "C:\Users\CLAIRE~1\AppData\Local\Temp\RtmpE3UYWA\file8983d8e1efc.csv" "Data"' had status 2
Error in file.exists(tfn) : invalid 'file' argument
Thanks to @Stibu I realised I had to set my working directory. This is the command you run in RStudio: setwd("C:/Documents..."). The file path is the folder where the Excel file is located.
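In other words, something like this (the folder path below is just a placeholder for wherever AssignmentData.xlsx actually lives):
## point R at the folder that contains the workbook, then read it as before
setwd("C:/path/to/assignment/folder")
library(gdata)
df = data.frame(read.xls("AssignmentData.xlsx", sheet = "Data", header = TRUE,
                         perl = "C:\\Strawberry\\perl\\bin\\perl.exe"))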
I had the same issue but solved it differently.
My problem was that my file had an Excel extension (.xls) but was actually a plain-text file.
Once I corrected the file, I did not run into any other error with the R function.

Issue with get_rollit_source

I tried to use get_rollit_source from the RcppRoll package as follows:
library(RcppRoll)
get_rollit_source(roll_max,edit=TRUE,RStudio=TRUE)
I get an error:
Error in get("outFile", envir = environment(fun)) :
object 'outFile' not found
I tried
outFile="C:/myDir/Test.cpp"
get_rollit_source(roll_max,edit=TRUE,RStudio=FALSE,outFile=outFile)
I get an error:
Error in get_rollit_source(roll_max, edit = TRUE, RStudio = FALSE, outFile = outFile) :
File does not exist!
How can I fix this issue?
I noticed that the RcppRoll folder in the R library doesn't contain any src directory. Should I download it?
get_rollit_source only works for 'custom' functions. For things baked into the package, you could just download + read the source code (you can download the source tarball here, or go to the GitHub repo).
Anyway, something like the following should work:
rolling_sqsum <- rollit(final_trans = "x * x")
get_rollit_source(rolling_sqsum)
(I wrote this package quite a while back when I was still learning R / Rcpp so there are definitely some rough edges...)

R produces "unsupported URL scheme" error when getting data from https sites

R version 3.0.1 (2013-05-16) on Windows 8, knitr version 1.5, RStudio 0.97.551
I am using knitr to do the markdown of my R code.
As part of my analysis I download various data sets from the web. knitr is totally fine with getting data from http sites, but with https ones it generates an "unsupported URL scheme" message.
I know that when using the download.file function on a Mac, the method parameter has to be set to "curl" to get data from an https URL; however, this doesn't help when using knitr.
What do I need to do so that knitr will gather data from https websites?
Edit:
Here is the code chunk that returns an error in Knitr but when run through R works without error.
```{r}
fileurl <- "https://dl.dropbox.com/u/7710864/data/csv_hid/ss06hid.csv"
download.file(fileurl, destfile = "C:/Users/xxx/yyy")
```
You can use https with the download.file() function by passing "curl" to the method argument:
download.file(url,destination,method="curl")
Edit (May 2016): As of R 3.3.0, download.file() should handle SSL websites automatically on all platforms, making the rest of this answer moot.
You want something like this:
library(RCurl)
data <- getURL("https://dl.dropbox.com/u/7710864/data/csv_hid/ss06hid.csv",
               ssl.verifypeer=0L, followlocation=1L)
That reads the data into memory as a single string. You'll still have to parse it into a dataset in some way. One strategy is:
writeLines(data,'temp.csv')
read.csv('temp.csv')
You can also separate out the data directly without writing to file:
read.csv(text=data)
Edit: A much easier option is actually to use the rio package:
library("rio")
import("https://dl.dropbox.com/u/7710864/data/csv_hid/ss06hid.csv")
This will read directly from the HTTPS URL and return a data.frame.
Use setInternet2(use = TRUE) before using the download.file() function. It works on Windows 7.
setInternet2(use = TRUE)
download.file(url, destfile = "test.csv")
I am sure you have already found a solution to your problem by now.
I was working on an assignment and ended up getting the same error. I tried some of the tricks, but they did not work for me, maybe because I am working on a Windows machine.
Anyhow, I changed the link to http: rather than https: and that did the trick.
Following is chunk of my code:
if (!file.exists("./PeerAssessment2")) { dir.create("./PeerAssessment2") }
fileURL <- "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileURL, destfile = "./PeerAssessment2/Data.zip")
install.packages("R.utils")
library(R.utils)
if (!file.exists("./PeerAssessment2/Data")) {
    bunzip2("./PeerAssessment2/Data.zip", destname = "./PeerAssessment2/Data")
}
list.files("./PeerAssessment2")
noaaData <- read.csv("./PeerAssessment2/Data")
Hope this helps.
I had the same issue with knitr and download.file() with a https url, on Windows 8.
You could try setInternet2(TRUE) before using the download.file() function. However I'm not sure that this fix works on Unix-like systems.
setInternet2(TRUE) # set the R_WIN_INTERNET2 to TRUE
fileurl <- "https://dl.dropbox.com/u/7710864/data/csv_hid/ss06hid.csv"
download.file(fileurl, destfile = "C:/Users/xxx/yyy") # now it should work
Source: R documentation (?download.file):
Note that https:// URLs are only supported if --internet2 or environment variable R_WIN_INTERNET2 was set or setInternet2(TRUE) was used (to make use of Internet Explorer internals), and then only if the certificate is considered to be valid.
I had the same problem with an https URL: the following code ran perfectly in R but produced "unsupported URL scheme" when knitting to HTML:
temp = tempfile()
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2Factivity.zip", temp)
data = read.csv(unz(temp, "activity.csv"), colClasses = c("numeric", "Date", "numeric"))
I tried all the solutions posted here and nothing worked; in my absolute desperation I just removed the "s" from "https" in the URL and everything worked fine...
Using the R download package takes care of the quirky details typically associated with file downloads. For your example, all you needed to do would have been:
```{r}
library(download)
fileurl <- "https://dl.dropbox.com/u/7710864/data/csv_hid/ss06hid.csv"
download(fileurl, destfile = "C:/Users/xxx/yyy")
```

Error with tex2docx function from reports R package

I'm trying to reproduce the example for the tex2docx function in the reports R package and I am getting the following error.
DOC <- system.file("extdata/doc_library/apa6.qual_tex/doc.tex",
                   package = "reports")
BIB <- system.file("extdata/docs/example.bib", package = "reports")
tex2docx(DOC, file.path(getwd(), "test.docx"), path = NULL, bib.loc = BIB)
Error Message
pandoc.exe: Error reading bibliography `C:/Users/Muhammad'
citeproc: the format of the bibliographic database could not be recognized
using the file extension.
docx file generated!
Warning message:
running command 'C:\Users\MUHAMM~1\AppData\Local\Pandoc\pandoc.exe -s C:/Users/Muhammad Yaseen/R/win-library/3.0/reports/extdata/doc_library/apa6.qual_tex/doc.tex -o C:/Users/Muhammad Yaseen/Documents/test.docx --bibliography=C:/Users/Muhammad Yaseen/R/win-library/3.0/reports/extdata/docs/example.bib' had status 23
I wonder how to get the tex2docx function in the reports R package working properly.
As described in the comments above, the error is caused by passing a filename/path that includes spaces which are neither escaped nor quoted. A workaround could be to wrap all file paths and names in shQuote before passing them to the command line with system.
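For illustration, this is roughly what shQuote does to a path containing a space (the path below is just an example):
bib <- "C:/Users/Muhammad Yaseen/Documents/example.bib"  # example path with a space
shQuote(bib)                # default "sh" quoting: wraps the path in single quotes
shQuote(bib, type = "cmd")  # double quotes, which cmd.exe on Windows expects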
Code: https://github.com/trinker/reports/pull/31
Demo:
Loading package
library(reports)
Creating a dummy dir with a space in the name that would hold the bib file
dir.create('foo bar')
file.copy(system.file("extdata/docs/example.bib", package = "reports"), 'foo bar/example.bib')
Specifying the source and the copied bib file:
DOC <- system.file("extdata/doc_library/apa6.qual_tex/doc.tex", package = "reports")
BIB <- 'foo bar/example.bib'
Running the test:
tex2docx(DOC, file.path(getwd(), "test2.docx"), path = NULL, bib.loc = BIB)
Disclaimer: I tried to test this pull request, but I could not set up an environment with all the needed tools to run R CMD check with vignettes and everything else in 5 minutes (sorry, I'm on vacation right now, enjoying the siesta after lunch), so please consider this pull request as "untested" -- although it should work.
