read.xls and url on Windows in R - r

I have seen many posts on here about using read.xls with a URL, and they all worked on my Mac, but now that I am trying to use the code on my Windows computer, it is not working. I used the code below on my Mac:
tmp <- tempfile()
download.file("https://www.spdrs.com/site-content/xls/SPY_All_Holdings.xls?fund=SPY&docname=All+Holdings&onyx_code1=1286&onyx_code2=1700", destfile = tmp, method = "curl")
SPY <- read.xls(tmp, skip=3)
unlink(tmp)
Using "curl" no longer woks ("had status 127" is the warning message) and when I try "internal" or "wininet", it says " formal argument "method" matched by multiple actual arguments". When I try read.xls, it says the file is "missing" and "invalid". I have downloaded Perl, Java, gdata, Rcurl and the "downloader" package (because I heard that works better with https) and could use that instead....Is there something else I would have to do on a Windows computer to make this code work?
Thanks!

> library(RCurl)
> library(XLConnect)
> URL <- "https://www.spdrs.com/site-content/xls/SPY_All_Holdings.xls?fund=SPY&docname=All+Holdings&onyx_code1=1286&onyx_code2=1700"
> f = CFILE("SPY_All_Holdings.xls", mode="wb")
> curlPerform(url = URL, writedata = f@ref, ssl.verifypeer = FALSE)
# OK
# 0
> close(f)
# An object of class "CFILE"
# Slot "ref":
# <pointer: (nil)>
> out <- readWorksheetFromFile(file = "SPY_All_Holdings.xls",sheet="SPY_All_Holdings")
> head(out)
# Fund.Name. SPDR..S.P.500..ETF Col3 Col4 Col5
# 1 Ticker Symbol: SPY <NA> <NA> <NA>
# 2 Holdings: As of 06/06/2016 <NA> <NA> <NA>
# 3 Name Identifier Weight Sector Shares Held
# 4 Apple Inc. AAPL 2.945380 Information Technology 54545070.000
# 5 Microsoft Corporation MSFT 2.220684 Information Technology 77807630.000
# 6 Exxon Mobil Corporation XOM 1.998224 Energy 40852760.000
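If you only need the data rather than XLConnect specifically, here is a minimal Windows-friendly sketch (my own suggestion, assuming Perl is installed for gdata::read.xls): pass method only once ("matched by multiple actual arguments" means method was supplied twice in the call), let the default "wininet" method handle https (the default since R 3.2.2), and use mode = "wb" so the binary .xls file is not corrupted on Windows.
library(gdata)  # read.xls needs Perl

URL <- "https://www.spdrs.com/site-content/xls/SPY_All_Holdings.xls?fund=SPY&docname=All+Holdings&onyx_code1=1286&onyx_code2=1700"
tmp <- tempfile(fileext = ".xls")
download.file(URL, destfile = tmp, mode = "wb")  # default method; do not also pass method=
SPY <- read.xls(tmp, skip = 3)
unlink(tmp)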

Related

Why can I plot my stars_proxy object but not write it out?

I'm new to stars, so I'm hoping this is a simple answer and just me failing to understand the stars workflow properly.
R Version: 4.1.1
Stars Version: 0.5-5
library(stars)
library(starsdata) #install.packages("starsdata", repos = "http://gis-bigdata.uni-muenster.de", type = "source")
#Create the rasters to read in as proxy
granule = system.file("sentinel/S2A_MSIL1C_20180220T105051_N0206_R051_T32ULE_20180221T134037.zip", package = "starsdata")
s2 = paste0("SENTINEL2_L1C:/vsizip/", granule, "/S2A_MSIL1C_20180220T105051_N0206_R051_T32ULE_20180221T134037.SAFE/MTD_MSIL1C.xml:10m:EPSG_32632")
r1 <- read_stars(s2, RasterIO = list(bands = 1), proxy = TRUE)
r2 <- read_stars(s2, RasterIO = list(bands = 2), proxy = TRUE)
r3 <- read_stars(s2, RasterIO = list(bands = 3), proxy = TRUE)
write_stars(r1,dsn="r1.tif")
write_stars(r2,dsn="r2.tif")
write_stars(r3,dsn="r3.tif")
Then I clear the objects from my environment and restart the R session.
#I clear all the objects and restart my R session here.
library(stars)
foo<-read_stars(c("r1.tif","r2.tif","r3.tif"),proxy=T)
r1<- foo[1]*0
r1[foo[1] > 4000 & foo[2] < 3000] <- 1
r1[foo[1] > 4000 & foo[2] >= 3000 & foo[2] <= 8000] <- 2
r1[foo[1] > 4000 & foo[2] > 8000 & foo[3] < 2000] <- 4
r1[foo[1] > 4000 & foo[2] > 8000 & foo[3] >= 2000] <- 2
# plot(r1) #this works just fine if you run it
#why doesn't the below work?
write_stars(r1,dsn="out.tif")
Attempting to write out the file results in the following error:
Error in st_as_stars.list(mapply(fun, x, i, value = value, SIMPLIFY = FALSE), :
!is.null(dx) is not TRUE
If instead of writing out the file, I plot the raster, it works just fine/as expected.
Perhaps the issue is just my failure to understand that this answer applies to me too:
How to reassign cell/pixel values in R stars objects
First of all, thank you for the effort you made to provide a minimal reproducible example. Unfortunately the image you use is very heavy and my PC is very old! ;-) So I chose to rerun your example with another image (the test image of the stars library), which is easier to handle on my old computer.
So please find below a reprex which describes step by step the approach.
Reprex
STEP 1: Create three dummy stars proxy objects from the test image of the stars library
library(stars)
r <- system.file("tif/L7_ETMs.tif", package = "stars")
r1 <- read_stars(r, RasterIO = list(bands=1), proxy = TRUE)
r2 <- read_stars(r, RasterIO = list(bands=2), proxy = TRUE)
r3 <- read_stars(r, RasterIO = list(bands=3), proxy = TRUE)
STEP 2: Write every stars proxy object as .tif files on disk
write_stars(r1,dsn="r1.tif")
write_stars(r2,dsn="r2.tif")
write_stars(r3,dsn="r3.tif")
STEP 3: Merge r1, r2 and r3 in the stars proxy object foo
foo <- read_stars(c("r1.tif","r2.tif","r3.tif"), proxy = TRUE)
foo <- merge(foo)
STEP 4: Visualization of the foo stars proxy object
plot(foo)
If you want to display a specific band, proceed like this (here, showing band 3):
plot(foo[,,,3], main = st_dimensions(foo)["band"]$band$values[3])
STEP 5: Run your chunk of code
r1 <- foo[,,,3]*0 # create a proxy filled with 0s that we will replace using the rules below
r1[foo[,,,1] > 40 & foo[,,,2] < 30] <- 1
r1[foo[,,,1] > 40 & foo[,,,2] >= 30 & foo[,,,2] <= 70] <- 2
r1[foo[,,,1] > 40 & foo[,,,2] > 70 & foo[,,,3] < 7] <- 4
r1[foo[,,,1] > 40 & foo[,,,2] > 70 & foo[,,,3] >= 7] <- 2
STEP 6: Visualization of the output
plot(r1)
NB: I have deliberately not included the output raster here because at the end of the execution of your chunk of code, all pixels of the test image have the value 2. The output image is therefore a monochrome raster without any interest [this result is consistent with the pixel values of the original test image].
STEP 7: Save the output
write_stars(r1, dsn = "out.tif")
STEP 8: Checks that the file has been successfully written to the disk
file.exists("out.tif")
#> [1] TRUE
Created on 2021-12-10 by the reprex package (v2.0.1)
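If you need to stay with the original full-size workflow, one workaround that may help (an assumption on my part, not tested against the Sentinel-2 image) is to materialize the lazily evaluated proxy with st_as_stars() before writing, so that write_stars() receives an in-memory stars object instead of a proxy with pending operations:
# hypothetical workaround: pull the proxy into memory first, then write
r1_mem <- st_as_stars(r1)            # applies the pending lazy operations
write_stars(r1_mem, dsn = "out.tif")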

Download zip file to R when download link ends in '/download'

My issue is similar to this post, but the suggested solution does not appear applicable.
I have a lot of zipped data stored on an online server (B2Drop) that provides download links ending in "/download" instead of ".zip". I have been unable to get the method described here to work.
I have created a test download page https://b2drop.eudat.eu/s/K9sPPjWz3jxtXEq, where the download link https://b2drop.eudat.eu/s/K9sPPjWz3jxtXEq/download can be obtained by right clicking the download button. Here is my script:
temp <- tempfile()
download.file("https://b2drop.eudat.eu/s/K9sPPjWz3jxtXEq/download",temp, mode="wb")
data <- read.table(unz(temp, "Test_file1.csv"))
unlink(temp)
When I run it, I get the error:
download.file("https://b2drop.eudat.eu/s/K9sPPjWz3jxtXEq/download",temp, mode="wb")
trying URL 'https://b2drop.eudat.eu/s/K9sPPjWz3jxtXEq/download'
Content type 'application/zip' length 558 bytes
downloaded 558 bytes
data <- read.table(unz(temp, "Test_file1.csv"))
Error in open.connection(file, "rt") : cannot open the connection
In addition: Warning message:
In open.connection(file, "rt") :
cannot locate file 'Test_file1.csv' in zip file 'C:\Users\User_name\AppData\Local\Temp\RtmpMZ6gXi\file3e881b1f230e'
which typically indicates a problem with the directory where R is looking for the file, which in this case should be the temp directory.
Your internal path is wrong. You can use list=TRUE to list the files in the archive, analogous to the command-line utility's -l argument.
unzip(temp, list=TRUE)
# Name Length Date
# 1 Test/Test_file1.csv 256 2021-09-27 10:13:00
# 2 Test/Test_file2.csv 286 2021-09-27 10:14:00
Better than read.table, though, use read.csv since it's comma-delimited.
data <- read.csv(unz(temp, "Test/Test_file1.csv"))
head(data, 3)
# ID Variable1 Variable2 Variable Variable3
# 1 1 f 54654 25 t1
# 2 2 t 421 64 t2
# 3 3 x 4521 85 t3
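If you need every file in the archive, a small sketch along the same lines (assuming the same archive layout as listed above):
temp <- tempfile()
download.file("https://b2drop.eudat.eu/s/K9sPPjWz3jxtXEq/download", temp, mode = "wb")
files <- unzip(temp, list = TRUE)$Name          # e.g. "Test/Test_file1.csv", "Test/Test_file2.csv"
data_list <- lapply(files, function(f) read.csv(unz(temp, f)))
names(data_list) <- basename(files)
unlink(temp)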

Getting an error while installing the DMwR package

Hi, I am getting this error message while installing the DMwR package from RGui 3.3.1.
Error in read.dcf(file.path(pkgname, "DESCRIPTION"), c("Package", "Type")) :
cannot open the connection
In addition: Warning messages:
1: In unzip(zipname, exdir = dest) : error 1 in extracting from zip file
2: In read.dcf(file.path(pkgname, "DESCRIPTION"), c("Package", "Type")) :
cannot open compressed file 'bitops/DESCRIPTION', probable reason 'No such file or directory'
Approach 1:
The error being reported is an inability to open a connection. On Windows that is often a firewall problem, and it is covered in the R for Windows FAQ. The usual first attempt is to switch to internet2.dll. From a console session you can use:
setInternet2(TRUE)
However, from the NEWS for R version 3.3.1 Patched (2016-09-13 r71247):
(Windows only) Function setInternet2() has no effect and will be removed in due course. The choice between methods "internal" and "wininet" is now made by the method arguments of url() and download.file() and their defaults can be set via options. The out-of-the-box default remains "wininet" (as it has been since R 3.2.2).
You are using version 3.3.1, which is why setInternet2() no longer works.
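On R >= 3.3.x the equivalent of the old setInternet2(TRUE) is therefore to choose the download method explicitly, for example via a global option:
# modern replacement for setInternet2(TRUE)
options(download.file.method = "wininet")  # or "libcurl"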
Approach 2
The error suggests that the package requires another package, bitops, that is not available. That package is not among DMwR's direct dependencies, but perhaps one of the dependencies requires it in turn (in this case: ROCR).
Try installing:
install.packages("bitops",repos="https://cran.r-project.org/bin/windows/contrib/3.3/bitops_1.0-6.zip",dependencies=TRUE,type="source")
The package DMwR lists abind, zoo, xts, quantmod and ROCR as imports. So, in addition to those 5 packages, you must install the DMwR package itself. Install these packages manually, in the following sequence:
install.packages('abind')
install.packages('zoo')
install.packages('xts')
install.packages('quantmod')
install.packages('ROCR')
install.packages("DMwR")
library("DMwR")
Approach 3:
chooseCRANmirror()
Select CRAN mirror from popup list. Then install packages:
install.packages("bitops")
install.packages("DMwR")
Package ‘DMwR’ was removed from the CRAN repository.
Formerly available versions can be obtained from the archive.
https://CRAN.R-project.org/package=DMwR
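A minimal sketch for installing straight from the archive (the 0.4.1 version number is taken from the devtools answer below; adjust it if you need a different archived version):
url <- "https://cran.r-project.org/src/contrib/Archive/DMwR/DMwR_0.4.1.tar.gz"
install.packages(url, repos = NULL, type = "source")  # source install: the imports must already be installed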
Alternatively, you can use the function as written in the CRAN package. Copy the following code into a new R script, run it, and save it for future use if you want. Once you have run this function, you should be able to use SMOTE the way you have been trying to use it.
# ===================================================
# Creating a SMOTE training sample for classification problems
#
# If called with learner=NULL (the default) it does not
# learn any model, simply returning the SMOTEd data set
#
# NOTE: It does not handle NAs!
#
# Examples:
#   ms <- SMOTE(Species ~ ., iris, 'setosa', perc.under=400, perc.over=300,
#               learner='svm', gamma=0.001, cost=100)
#   newds <- SMOTE(Species ~ ., iris, 'setosa', perc.under=300, k=3, perc.over=400)
#
# L. Torgo, Feb 2010
# ---------------------------------------------------
SMOTE <- function(form, data,
                  perc.over = 200, k = 5,
                  perc.under = 200,
                  learner = NULL, ...
                  )
  # INPUTS:
  # form    a model formula
  # data    the original training set (with the unbalanced distribution)
  # minCl   the minority class label
  # perc.over/100 is the number of new cases (smoted cases) generated
  #         for each rare case. If perc.over < 100 a single case
  #         is generated uniquely for a randomly selected perc.over
  #         of the rare cases
  # k       is the number of neighbours to consider as the pool from where
  #         the new examples are generated
  # perc.under/100 is the number of "normal" cases that are randomly
  #         selected for each smoted case
  # learner the learning system to use.
  # ...     any learning parameters to pass to learner
{
  # the column where the target variable is
  tgt <- which(names(data) == as.character(form[[2]]))
  minCl <- levels(data[, tgt])[which.min(table(data[, tgt]))]

  # get the cases of the minority class
  minExs <- which(data[, tgt] == minCl)

  # generate synthetic cases from these minExs
  if (tgt < ncol(data)) {
    cols <- 1:ncol(data)
    cols[c(tgt, ncol(data))] <- cols[c(ncol(data), tgt)]
    data <- data[, cols]
  }
  newExs <- smote.exs(data[minExs, ], ncol(data), perc.over, k)
  if (tgt < ncol(data)) {
    newExs <- newExs[, cols]
    data <- data[, cols]
  }

  # get the undersample of the "majority class" examples
  selMaj <- sample((1:NROW(data))[-minExs],
                   as.integer((perc.under/100) * nrow(newExs)),
                   replace = TRUE)

  # the final data set (the undersample + the rare cases + the smoted exs)
  newdataset <- rbind(data[selMaj, ], data[minExs, ], newExs)

  # learn a model if required
  if (is.null(learner)) return(newdataset)
  else do.call(learner, list(form, newdataset, ...))
}

# ===================================================
# Obtain a set of smoted examples for a set of rare cases.
# L. Torgo, Feb 2010
# ---------------------------------------------------
smote.exs <- function(data, tgt, N, k)
  # INPUTS:
  # data are the rare cases (the minority "class" cases)
  # tgt  is the name of the target variable
  # N    is the percentage of over-sampling to carry out;
  # and k is the number of nearest neighbours to use for the generation
  # OUTPUTS:
  # The result of the function is a (N/100)*T set of generated
  # examples with rare values on the target
{
  nomatr <- c()
  T <- matrix(nrow = dim(data)[1], ncol = dim(data)[2] - 1)
  for (col in seq.int(dim(T)[2]))
    if (class(data[, col]) %in% c('factor', 'character')) {
      T[, col] <- as.integer(data[, col])
      nomatr <- c(nomatr, col)
    } else T[, col] <- data[, col]

  if (N < 100) { # only a percentage of the T cases will be SMOTEd
    nT <- NROW(T)
    idx <- sample(1:nT, as.integer((N/100) * nT))
    T <- T[idx, ]
    N <- 100
  }

  p <- dim(T)[2]
  nT <- dim(T)[1]

  ranges <- apply(T, 2, max) - apply(T, 2, min)

  nexs <- as.integer(N/100) # this is the number of artificial exs generated
                            # for each member of T
  new <- matrix(nrow = nexs * nT, ncol = p) # the new cases

  for (i in 1:nT) {
    # the k NNs of case T[i,]
    xd <- scale(T, T[i, ], ranges)
    for (a in nomatr) xd[, a] <- xd[, a] == 0
    dd <- drop(xd^2 %*% rep(1, ncol(xd)))
    kNNs <- order(dd)[2:(k + 1)]

    for (n in 1:nexs) {
      # select randomly one of the k NNs
      neig <- sample(1:k, 1)

      ex <- vector(length = ncol(T))

      # the attribute values of the generated case
      difs <- T[kNNs[neig], ] - T[i, ]
      new[(i - 1)*nexs + n, ] <- T[i, ] + runif(1) * difs
      for (a in nomatr)
        new[(i - 1)*nexs + n, a] <- c(T[kNNs[neig], a], T[i, a])[1 + round(runif(1), 0)]
    }
  }

  newCases <- data.frame(new)
  for (a in nomatr)
    newCases[, a] <- factor(newCases[, a],
                            levels = 1:nlevels(data[, a]),
                            labels = levels(data[, a]))

  newCases[, tgt] <- factor(rep(data[1, tgt], nrow(newCases)),
                            levels = levels(data[, tgt]))
  colnames(newCases) <- colnames(data)
  newCases
}
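A quick usage sketch (my own example, not from the original post): build an artificially unbalanced two-class set from iris, then let SMOTE rebalance it.
data(iris)
ub <- iris[c(1:15, 51:150), ]  # 15 setosa rows vs 100 others -> unbalanced
ub$Species <- factor(ifelse(ub$Species == "setosa", "rare", "common"))
table(ub$Species)      # common: 100, rare: 15
newds <- SMOTE(Species ~ ., ub, perc.over = 200, perc.under = 200)
table(newds$Species)   # rare class oversampled, common class undersampled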
It has been removed from the CRAN repository. There are instructions there on how to retrieve it from the archive.
Either follow the link - https://packagemanager.rstudio.com/client/#/repos/2/packages/DMwR
OR copy-paste the three lines of code mentioned below:
install.packages("devtools")
devtools::install_version('DMwR', '0.4.1')
library("DMwR")
EDIT: this is the error I got while installing the DMwR package in 2022; it looks like when the question was posted, the error had a different cause.
The reason is that the package 'DMwR' was built under R version 3.4.3. The solution is explained in detail in the marked answer; in short, just run the script below to get the problem solved!
install.packages('abind')
install.packages('zoo')
install.packages('xts')
install.packages('quantmod')
install.packages('ROCR')
install.packages("DMwR")
library("DMwR")

API request with R

I am trying to geocode French addresses. I'd like to use the following website: http://adresse.data.gouv.fr/
There is an example on this website of how the API works, but it is command-line code and I'd like to translate it into R. The aim is to submit a CSV file of addresses and get geographic coordinates back.
Command-line code (example given on the website; the http command is the HTTPie client)
http --timeout 600 -f POST http://api-adresse.data.gouv.fr/search/csv/ data@path/to/file.csv
I tried to "translate" this in R with the following code
library(httr)
library(RCurl)
queryResults=POST("http://api-adresse.data.gouv.fr/search/csv/",body=list(data=fileUpload("file.csv")))
result_geocodage=content(queryResults)
But unfortunately I get a bad request error.
Does somebody know what I'm missing in the translation to R?
Thanks!
Here's an example. First, some example data plus the request:
library(httr)
df <- data.frame(c("13 Boulevard Chanzy", "Gloucester St"),
                 c("93100 Montreuil", "Jersey"))
write.csv2(df, tf <- tempfile(fileext = ".csv"))
res <- POST("http://api-adresse.data.gouv.fr/search/csv/",
            timeout(600),
            body = list(data = upload_file(tf)))
Then, the result:
content(res, sep = ";", row.names = 1)
# c..13.Boulevard.Chanzy....Gloucester.St.. c..93100.Montreuil....Jersey.. latitude longitude
# 1 13 Boulevard Chanzy 93100 Montreuil 48.85825 2.434462
# 2 Gloucester St Jersey 49.46712 1.145554
# result_label result_score result_type result_id result_housenumber
# 1 13 Boulevard Chanzy 93100 Montreuil 0.88 housenumber ADRNIVX_0000000268334929 13
# 2 2 Résidence le Jersey 76160 Saint-Martin-du-Vivier 0.24 housenumber ADRNIVX_0000000311480901 2
# result_name result_street result_postcode result_city result_context result_citycode
# 1 Boulevard Chanzy NA 93100 Montreuil 93, Seine-Saint-Denis, Île-de-France 93048
# 2 Résidence le Jersey NA 76160 Saint-Martin-du-Vivier 76, Seine-Maritime, Haute-Normandie 76617
Or, just the coordinates:
subset(content(res, sep = ";", row.names = 1, check.names = FALSE), select = c("latitude", "longitude"))
# latitude longitude
# 1 48.85825 2.434462
# 2 49.46712 1.145554
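For a single address rather than a whole CSV, the same service also exposes a /search/ endpoint; a minimal sketch (the endpoint comes from adresse.data.gouv.fr's documentation, and I am assuming a GeoJSON-shaped response):
res1 <- GET("http://api-adresse.data.gouv.fr/search/",
            query = list(q = "13 Boulevard Chanzy Montreuil", limit = 1))
feat <- content(res1)$features[[1]]
unlist(feat$geometry$coordinates)  # longitude, latitude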

Download a file from HTTPS using download.file()

I would like to read online data into R using download.file() as shown below.
URL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
download.file(URL, destfile = "./data/data.csv", method="curl")
Someone suggested to me that I add the line setInternet2(TRUE), but it still doesn't work.
The error I get is:
Warning messages:
1: running command 'curl "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv" -o "./data/data.csv"' had status 127
2: In download.file(URL, destfile = "./data/data.csv", method = "curl", :
download had nonzero exit status
Appreciate your help.
It might be easiest to try the RCurl package. Install the package and try the following:
# install.packages("RCurl")
library(RCurl)
URL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
x <- getURL(URL)
## Or
## x <- getURL(URL, ssl.verifypeer = FALSE)
out <- read.csv(textConnection(x))
head(out[1:6])
# RT SERIALNO DIVISION PUMA REGION ST
# 1 H 186 8 700 4 16
# 2 H 306 8 700 4 16
# 3 H 395 8 100 4 16
# 4 H 506 8 700 4 16
# 5 H 835 8 800 4 16
# 6 H 989 8 700 4 16
dim(out)
# [1] 6496 188
download.file("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv",destfile="reviews.csv",method="libcurl")
Here's an update as of Nov 2014. I find that setting method='curl' did the trick for me (while method='auto' does not).
For example:
# does not work
download.file(url = 'https://s3.amazonaws.com/tripdata/201307-citibike-tripdata.zip',
              destfile = 'localfile.zip')
# does not work either; this appears to be the default anyway
download.file(url = 'https://s3.amazonaws.com/tripdata/201307-citibike-tripdata.zip',
              destfile = 'localfile.zip', method = 'auto')
# works!
download.file(url = 'https://s3.amazonaws.com/tripdata/201307-citibike-tripdata.zip',
              destfile = 'localfile.zip', method = 'curl')
I've succeeded with the following code:
url = "http://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
x = read.csv(file=url)
Note that I've changed the protocol from https to http, since https did not seem to be supported by my R setup.
If you get an SSL error from RCurl's getURL() function, then set these options before calling getURL(). This sets the curl SSL settings globally.
The extended code:
install.packages("RCurl")
library(RCurl)
options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))
URL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
x <- getURL(URL)
Worked for me on Windows 7 64-bit using R3.1.0!
Offering the curl package as an alternative that I found to be reliable when extracting large files from an online database. In a recent project, I had to download 120 files from an online database and found it to halve the transfer times and to be much more reliable than download.file.
#install.packages("curl")
library(curl)
#install.packages("RCurl")
library(RCurl)
ptm <- proc.time()
URL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
x <- getURL(URL)
proc.time() - ptm
ptm
ptm1 <- proc.time()
curl_download(url =URL ,destfile="TEST.CSV",quiet=FALSE, mode="wb")
proc.time() - ptm1
ptm1
ptm2 <- proc.time()
y = download.file(URL, destfile = "./data/data.csv", method="curl")
proc.time() - ptm2
ptm2
In this case, rough timing on your URL showed no consistent difference in transfer times. In my application, using curl_download in a script to select and download 120 files from a website decreased my transfer times from 2000 seconds per file to 1000 seconds and increased the reliability from 50% to 2 failures in 120 files. The script is posted in my answer to a question I asked earlier, see .
Try following with heavy files
library(data.table)
URL <- "http://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
x <- fread(URL)
Exit status 127 means "command not found". In your case it means the curl executable could not be found.
You need to install/reinstall curl; that's all. Get the latest version for your OS from http://curl.haxx.se/download.html
Close RStudio before installation.
I had exactly the same problem as UseR (the original question); I'm also using Windows 7. I tried all the proposed solutions and they didn't work.
I resolved the problem as follows:
Using RStudio instead of the R console.
Updating R (from 3.1.0 to 3.1.1) so that the RCurl library runs OK on it. (I'm now using R 3.1.1 32-bit, although my system is 64-bit.)
Typing the URL with https (secure connection) and with / instead of backslashes \\.
Setting method = "auto".
It works for me now. You should see the message:
Content type 'text/csv; charset=utf-8' length 9294 bytes
opened URL
downloaded 9294 bytes
You can set a global option and try:
options('download.file.method'='curl')
download.file(URL, destfile = "./data/data.csv", method="auto")
For background on this issue, refer to this link:
https://stat.ethz.ch/pipermail/bioconductor/2011-February/037723.html
Downloading files through the httr package also works:
URL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
httr::GET(URL,
          httr::write_disk(path = basename(URL),
                           overwrite = TRUE))
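A quick sanity check after any of these approaches (basename(URL) is simply where the httr example above writes the file):
file.exists(basename(URL))          # TRUE if the download landed on disk
head(read.csv(basename(URL)), 3)    # peek at the first rows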
