Invalid 'path' argument with XLConnect - r

I am trying and failing to get the following process to complete in R Version 3.1.2:
library(RCurl)
library(XLConnect)
yr <- substr(Sys.Date(), 1, 4)
mo <- as.character(as.numeric(substr(Sys.Date(), 6, 7)) - 1) # previous month (note: yields "0" if run in January)
temp <- tempfile()
temp <- getForm("http://strikemap.clb.org.hk/strikes/api.v4/export",
                FromYear = "2011", FromMonth = "1",
                ToYear = yr, ToMonth = mo,
                `_lang` = "en")
CLB <- readWorksheetFromFile(temp, sheet=1)
unlink(temp)
I have been able to export the requested data set manually and then read it into R from a local directory using the same readWorksheetFromFile syntax. My goal now is to do the whole thing in R. The call to the API seems to work (thanks to some earlier help), but the process fails at the next step, when I try to ingest the results. Here's what happens:
> CLB <- readWorksheetFromFile(temp, sheet=1)
Error in path.expand(filename) : invalid 'path' argument
Any thoughts on what I'm doing wrong or what's broken?

Turns out the problem didn't lie with XLConnect at all. Based on Hadley's tip that I needed to save the results of my query to the API to a file before reading them back into R, I have (almost) managed to complete the process using the following code:
library(httr)
library(readxl)
yr <- substr(Sys.Date(), 1, 4)
mo <- as.character(as.numeric(substr(Sys.Date(), 6, 7)) - 1)
baseURL <- paste0("http://strikemap.clb.org.hk/strikes/api.v4/export?FromYear=2011&FromMonth=1&ToYear=", yr, "&ToMonth=", mo, "&_lang=en")
queryList <- parse_url(baseURL)
clb <- GET(build_url(queryList), write_disk("clb.temp.xlsx", overwrite=TRUE))
CLB <- read_excel("clb.temp.xlsx")
The object that this creates, CLB, includes the desired data, with one glitch: the dates in the first column are not being read properly. If I open "clb.temp.xlsx" in Excel, they show up as expected (e.g., 2015-06-30, or 6/30/2015 if I click on the cell). But read_excel() reads them as numbers that don't map to those dates in an obvious way (e.g., 42185 for 2015-06-30). I tried fixing that by specifying in the call to read_excel that the column contains dates, but that produced a long string of warnings about expecting dates but getting those numbers.
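Those numbers look like Excel's date serials, which count days from 1899-12-30 in Excel's 1900 date system, so presumably they could be converted after the fact. A minimal sketch:
as.Date(42185, origin = "1899-12-30") # gives "2015-06-30"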
If I use readWorksheetFromFile() instead of read_excel at that last step, here's what happens:
> CLB <- readWorksheetFromFile("clb.temp.xlsx")
Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘readWorksheet’ for signature ‘"workbook", "missing"’
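That signature ("workbook", "missing") suggests the sheet argument was simply never supplied, so presumably passing it explicitly, e.g. readWorksheetFromFile("clb.temp.xlsx", sheet = 1), would dispatch correctly.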
I will search for a solution to the problem using read_excel and will create a new question if I can't find one.

Related

Problem with XLS files with R's package readxl

I need to read an XLS file in R, but I'm having a problem related to the way my file is generated and the R function readxl. I do not have this issue in Python, and my hope is that it's possible to solve the problem inside R.
An application we use at my company exports reports in XLS format (not XLSX). This report is generated daily. What I need is to sum the total value of the rows in each file, in order to create a new report containing each day followed by this total value.
When I try to read these files in R using the readxl package, the program returns this error:
Error: Can't subset columns that don't exist.
x Location 5 doesn't exist.
i There are only 0 columns.
Run rlang::last_error() to see where the error occurred.
Now, the weird thing is that when I open the XLS file in Excel before running my script, R is able to read it properly.
I guessed this was an error caused by something like the file only being completed when I open it... but the equivalent Python script gives me the correct result without opening the file first.
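One way I could test that guess: a genuine binary XLS is an OLE2 compound document whose first 8 bytes are always D0 CF 11 E0 A1 B1 1A E1, so checking a fresh export (file name made up here) should show whether it is a real XLS before Excel touches it:
# read the first 8 bytes and compare against the OLE2/XLS signature
sig <- readBin("F:\\variancia\\2021\\Aug\\some_report.xls", "raw", n = 8) # hypothetical file name
identical(sig, as.raw(c(0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1)))
If this returns FALSE, the application is not writing a true XLS file until Excel re-saves it.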
I am now assuming this is a bug in the readxl package. Is there another package I could use to read XLS (not XLSX) files? One that does not depend on Java being installed on my computer, I mean.
My readxl script:
if (!require("readxl")) {install.packages("readxl"); library("readxl")}
if (!require("rio")) {install.packages("rio"); library("rio")} # assuming import() below is rio::import()
"%,%" <- function(x, y) paste0(x, "\\", y) # small helper to join path pieces
year <- "2021"
month <- "Aug"
column <- 5 # VL_COVAR
path <- "F:\\variancia" %,% year %,% month
tiposDF <- c("date", "numeric", "list", "numeric", "numeric", "numeric", "list")
file.names <- dir(path, pattern = "\\.xls$") # escape the dot so only .xls files match
vari <- c()
for (i in 1:length(file.names)) {
  file <- paste(path, file.names[i], sep = "\\")
  print(paste("Reading ", file))
  dados <- read_excel(file, col_types = tiposDF)
  somaVar <- sum(dados[column]) # total of the VL_COVAR column
  vari <- append(vari, c(somaVar))
}
vari
file <- paste(path, 'Covariância.xls_02082021.xls', sep = "\\")
print(paste("Reading ", file))
dados <- read_excel(file, col_types = tiposDF)
somaVar <- sum(dados[column])
vari <- append(vari, c(somaVar))
x <- import(file)
View(x)
Thanks everyone!

Convert bed file to vcf with bed2vcf function

I am trying to convert a .bed file to VCF using the bed2vcf function from the bedr R package.
I tried the following code:
cromXvcf <- bed2vcf("cromXmerged2_pruned_removed_sex_mr_hh_sex_pop.bed",
                    filename = cromXmerged, zero.based = 1, header = NULL,
                    fasta = "/media/iriel/Cosmos/Doctorado/Proyectos/Cromosoma X/Bases dedatos/human_g1k_v37.fasta")
and it throws the following error:
VALIDATE REGIONS
* Checking input type... FAIL
ERROR: Not sure what the input format is!
Error in is.valid.region(x) :
Can anybody tell me what could be wrong? Any other suggestions for how I could do this conversion without using Perl?
I solved it by loading the bed file into a variable and changing the data types of columns 1 and 4 to character.
Afterwards I also checked that my reference names the chromosomes chr1..chr22, not just 1..22.
The other thing I checked was that my bed file is sorted.
library(bedr)
x <- read.table("cromXmerged2_pruned_removed_sex_mr_hh_sex_pop.bed")
x$V1 <- as.character(x$V1) # chromosome column as character, not factor
x$V4 <- as.character(x$V4) # name column as character
sapply(x, mode) # confirm the column types
y <- bed2vcf(x, zero.based = TRUE, header = NULL,
             fasta = "/media/iriel/Cosmos/Doctorado/Proyectos/Cromosoma X/Bases dedatos/human_g1k_v37.fasta")
And it worked fine for me.
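As a follow-up sketch (assuming x from the snippet above, and that bedr's engines are set up), bedr itself can confirm both points before calling bed2vcf:
is.valid.region(x) # should now pass the input-type check
x <- bedr.sort.region(x) # make sure the regions are sorted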

Using a function in R to scrape website, returning "subscript out of bounds" error

I am trying to scrape player data from the Baseball Reference website, using a function to loop through multiple years (variable "year") for each player, identified by "playerid".
library(plyr)
library(XML)
fetch_stats <- function(playerid, year) {
  url <- paste0("http://www.baseball-reference.com/players/gl.cgi?id=", playerid, "&t=b&year=", year)
  data <- readHTMLTable(url, stringsAsFactors = FALSE)
  data <- data[[3]] # keep the third table on the page
  data$Year <- year
  data$PlayerId <- playerid
  data
}
This function works perfectly well when it is applied to a single year's worth of data, as seen here:
AdrianGonzales <- ldply("gonzaad01", fetch_stats, year= 2008, .progress="text")
However, as soon as I actually use the function to loop through the multiple years in a player's career, it always spits out the following error:
AdrianGonzales <- ldply("gonzaad01", fetch_stats, year= 2009:2004, .progress="text")
Error in data[[3]] : subscript out of bounds
In addition: Warning message:
XML content does not seem to be XML: 'http://www.baseball-reference.com/players/gl.cgi?id=gonzaad01&t=b&year=2009
http://www.baseball-reference.com/players/gl.cgi?id=gonzaad01&t=b&year=2008
http://www.baseball-reference.com/players/gl.cgi?id=gonzaad01&t=b&year=2007
http://www.baseball-reference.com/players/gl.cgi?id=gonzaad01&t=b&year=2006
http://www.baseball-reference.com/players/gl.cgi?id=gonzaad01&t=b&year=2005
http://www.baseball-reference.com/players/gl.cgi?id=gonzaad01&t=b&year=2004'
From what I have been able to find, the "subscript out of bounds" error happens when you exceed the limits of a defined dataset within R. For this particular function, I may just be dumb, but I don't see how that would apply in this case, or why it would work for a single year but not for several at a time.
I'm open to any and all suggestions. Thanks ahead of time.
You could just use lapply, as shown below. I put in a minor fix to fetch_stats, since the 6th column returned has no name. You can rename that column however you like; this is just to show how you can use lapply instead.
library(plyr)
library(XML)
# Minor change made to get function working (naming column 6)
fetch_stats <- function(playerid, year) {
  url <- paste0("http://www.baseball-reference.com/players/gl.cgi?id=", playerid, "&t=b&year=", year)
  data <- readHTMLTable(url, stringsAsFactors = FALSE)
  data <- data[[3]]
  data$Year <- year
  data$PlayerId <- playerid
  ### Column six name is empty.
  names(data)[6] <- 'EMPTY'
  data
}
res <- lapply(2009:2004, function(x) fetch_stats("gonzaad01", x))
resdf <- ldply(res)
This will create a list of 6 elements, one for each year, and then convert the list to a data.frame.
The way ldply is applied in your code, it is not giving fetch_stats one year at a time; it is passing the entire vector of years all at once.
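You can see what the function receives by evaluating the paste0 call with the whole vector; a quick sketch:
# paste0 is vectorized, so year = 2009:2004 builds six URLs in one call,
# and readHTMLTable() is then handed a six-element vector instead of a single URL
urls <- paste0("http://www.baseball-reference.com/players/gl.cgi?id=", "gonzaad01", "&t=b&year=", 2009:2004)
length(urls) # 6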
EDIT
After looking a little closer, here is a solution using ldply
new_res <- ldply(.data = 2009:2004,
                 .fun = function(x) fetch_stats("gonzaad01", x),
                 .progress = "text")
This gave me the same results as the other method above.

Web Scraping (in R) - readHTMLTable error

I have a file called Schedule.csv, which is structured as follows:
URLs
http://www.basketball-reference.com/friv/dailyleaders.cgi?month=10&day=27&year=2015
http://www.basketball-reference.com/friv/dailyleaders.cgi?month=10&day=28&year=2015
I am trying to use the explanation provided in the following question to scrape the HTML tables, but it isn't working: How to scrape HTML tables from a list of links
My current code is as follows:
library(XML)
schedule <- read.csv("Schedule.csv")
stats <- list()
for (i in seq_along(schedule)) {
  print(i)
  total <- readHTMLTable(schedule[i])
  n.rows <- unlist(lapply(total, function(t) dim(t)[1]))
  stats[[i]] <- as.data.frame(total[[which.max(n.rows)]])
}
I get an error when I run this code as follows:
Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘readHTMLTable’ for signature ‘"data.frame"’
If I manually type the URLs in a vector, as per below, I get exactly what I want when I run the readHTMLTable code.
schedule<-c("http://www.basketball-reference.com/friv/dailyleaders.cgi?month=10&day=27&year=2015","http://www.basketball-reference.com/friv/dailyleaders.cgi?month=10&day=28&year=2015")
Can someone please explain to me why read.csv is not giving me a usable vector of information to input into the readHTMLTable function?
read.csv creates a data.frame in your schedule variable, so you need to access it by rows; seq_along(schedule) and schedule[i] work along the columns of the data frame.
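A quick illustration of that behaviour:
df <- data.frame(URLs = c("a", "b"))
seq_along(df) # 1 -- a data.frame has length one per column
nrow(df) # 2 -- the rows you actually want to loop over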
In your case you can do:
for (i in 1:nrow(schedule)) {
  total <- readHTMLTable(schedule[i, 1])
  # ... rest of the loop as before
}
As I understand it you want the first column of your data.frame; change the [i, 1] subscript or use a column name otherwise.
Also notice that read.csv will read your first column as a factor, so you may prefer to read it as character:
schedule<-read.csv("Schedule.csv", as.is = TRUE)
Another alternative, if your file has a single column, is to use readLines, and then you can keep your loop as it was...
schedule <- readLines("Schedule.csv")
stats <- list()
for (i in seq_along(schedule)) {
  print(i)
  total <- readHTMLTable(schedule[i])
  ...
}
but be careful with the column name, because the header will be the first element of your schedule vector.
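For instance, a small sketch that drops the header before looping:
schedule <- readLines("Schedule.csv")[-1] # drop the "URLs" header line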

cannot handle matrix/array columns with write.dbf

Hope I get everything together for this problem; it's the first time for me and it's a little bit tricky to describe.
I want to add some attributes to a dbf file and save it afterwards for use in QGIS. It's about elections, and the data are the votes for the 11 parties in absolute and relative values. I use the shapefiles package for this, but I also tried it simply with foreign.
My system: RStudio 0.97.311, R 2.15.2, shapefiles 0.7, foreign 0.8-52, Ubuntu 12.04
try #1 => no problems
shpDistricts <- read.shapefile(filename)
shpDataDistricts <- shpDistricts$dbf[[1]]
shpDataDistricts <- shpDataDistricts[, -c(3, 4, 5)] # delete some columns
shpDistricts$dbf[[1]] <- shpDataDistricts
write.shapefile(shpDistricts, filename)
try #2 => "error in get("write.dbf", "package:foreign")(dbf$dbf, out.name) : cannot handle matrix/array columns"
shpDistricts <- read.shapefile(filename)
shpDataDistricts <- shpDistricts$dbf[[1]]
shpDataDistricts <- shpDataDistricts[, -c(3, 4, 5)] # delete some columns
shpDataDistricts <- cbind(shpDataDistricts, votesDistrict[, 2]) # add a new column
names(shpDataDistricts)[5] <- "SPOE"
shpDistricts$dbf[[1]] <- shpDataDistricts
write.shapefile(shpDistricts, filename)
The write function returns "error in get("write.dbf", "package:foreign")(dbf$dbf, out.name) : cannot handle matrix/array columns".
So by simply adding an (integer) column to the data.frame, the write.dbf function is no longer able to write it out. I have now been debugging this simple issue for three hours. I tried it with the shapefiles package, opening the shapefile and the dbf file separately, and got the same problem every time. The same thing happens when I use the foreign package directly (read.dbf).
If I save the dbf file without the voting data (only with the small adaptations from steps 1 and 2), there is no problem. It must have to do with the merge with the voting data.
I got the same error message ("error in get("write.dbf"...)") while working with shapefiles in R using rgdal. I added a column to the shapefile, then tried to save the output and got the error. I had added the column to the shapefile as a data frame; when I converted it to a factor via as.factor(), the error went away.
shapefile$column <- as.factor(additional.column)
writePolyShape(shapefile, filename) # writePolyShape() comes from the maptools package
The problem is that write.dbf cannot write a data frame column into an attribute table, so I tried changing it to character data.
My initial wrong code was:
d1 <- data.frame(as.character(data1))
colnames(d1) <- c("county") # rbind needs both pieces to share the column name
d2 <- data.frame(as.character(data2))
colnames(d2) <- c("county")
county <- rbind(d1, d2) # county is itself a data.frame here
dbfdata$county <- county # so this assigns a data.frame as a column
write.dbf(dbfdata, "PANY_animals_84.dbf") ## doesn't work:
## Error in write.dbf(...) : cannot handle matrix/array columns
Then I changed everything to character, and it worked! The right code is:
d1 <- as.character(data1)
d2 <- as.character(data2)
county <- c(d1, d2) # a plain character vector this time
dbfdata$county <- county
write.dbf(dbfdata, "filename")
Hope it helps!
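Both answers run into the same underlying rule: write.dbf can only handle plain atomic columns (vectors or factors), and assigning a matrix or a data.frame as a column silently creates a nested column. A minimal illustration with made-up data:
library(foreign)
df <- data.frame(id = 1:2)
df$bad <- matrix(1:4, ncol = 2) # a matrix column -- this is what write.dbf() refuses
df$good <- c("a", "b") # a plain character vector -- fine
str(df$bad) # int [1:2, 1:2], i.e. still a matrix inside the data.frame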
