I would like to download and open the following Excel file with monthly and annual consumer price indices directly from within R.
https://www.bfs.admin.ch/bfsstatic/dam/assets/7066959/master
(the link can be found on this site: https://www.bfs.admin.ch/bfs/de/home/statistiken/preise/landesindex-konsumentenpreise/lik-resultate.assetdetail.7066959.html)
I used to download this file manually in the browser, save it locally on my computer, and then open the .xlsx file with R and work with the data without any problems.
I have now tried to read the file directly from within R, but without luck so far. As you can see from the URL above, there is no .xlsx extension or the like, so I figured the file is zipped somehow. Here is what I've tried so far and where I am stuck.
library(foreign)
library(xlsx)
# in a browser, this link opens or downloads an xlsx file
likurl <- "https://www.bfs.admin.ch/bfsstatic/dam/assets/7066959/master"
temp <- tempfile()
download.file(likurl, temp)
list.files <- unzip(temp,list=TRUE)
data <- read.xlsx(unz(temp,
+ list.files$Name[8]), sheetIndex=2)
The result from the last step is
Error in +list.files$Name[8] : invalid argument to unary operator
I do not really understand the unz function (I found this suggested solution somewhere online), but from its help file I can see that something about this is wrong.
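For reference, here is a minimal sketch of how unz() is normally used (the archive and entry names are made up): it opens a read connection to a single entry inside an existing zip archive, which a reader function such as read.csv() can then consume.
# hypothetical example: read one csv entry from inside archive.zip without extracting the archive
con <- unz("archive.zip", "data.csv")  # connection to the entry "data.csv"
dat <- read.csv(con)                   # read.csv() opens and closes the connection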
I also tried the following, different approach:
library(XLConnect)
likurl <- "https://www.bfs.admin.ch/bfsstatic/dam/assets/7066959/master"
tmp = tempfile(fileext = ".xlsx")
download.file(likurl, tmp)
readWorksheetFromFile(tmp, sheet = 2, startRow = 4,
colNames = TRUE, rowNames = FALSE)
with the last line returning:
Error: ZipException (Java): invalid entry size (expected 1644 but got 1668 bytes)
I would greatly appreciate any help on how I can open this data and work with it as usual when reading in data from excel into R.
Thanks a lot in advance!
Here's my solution, thanks to the hint by @Johnny. Reading the data from Excel worked better with read.xlsx from the xlsx package (instead of read_excel as suggested in the link above).
Some ugly details remain with how the columns are named (colNames is not passed on correctly, except for the first and 11th column) and with the new columns that are strangely created from the options passed to read.xlsx (e.g., a column named colNames with all entries == TRUE; for details, see the output structure with str(LIK.m)). However, those would be for another question and, for the moment, they can be fixed in a quick and dirty way :-).
library(httr)
library(foreign)
library(xlsx)
# in a browser, this link opens or downloads an xlsx file
likurl <- 'https://www.bfs.admin.ch/bfsstatic/dam/assets/7066959/master'
p1f <- tempfile()
download.file(likurl, p1f, mode="wb")
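# note: the GET() call below downloads the file again, to tf, which is what read.xlsx() reads later;
# p1f written by download.file() above is not used again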
GET(likurl, write_disk(tf <- tempfile(fileext = ".xlsx")))
# annual CPI
LIK.y <- read.xlsx(tf,
                   sheetIndex = 2, startRow = 4,
                   colNames = TRUE, rowNames = FALSE, stringsAsFactors = FALSE,
                   detectDates = FALSE, skipEmptyRows = TRUE, skipEmptyCols = TRUE,
                   na.strings = "NA", check.names = TRUE, fillMergedCells = FALSE)
LIK.y$X. <- as.numeric(LIK.y$X.)
str(LIK.y)
# monthly CPI
LIK.m <- read.xlsx(tf,
                   sheetIndex = 1, startRow = 4,
                   colNames = TRUE, rowNames = FALSE, stringsAsFactors = FALSE,
                   detectDates = FALSE, skipEmptyRows = TRUE, skipEmptyCols = TRUE,
                   na.strings = "NA", check.names = TRUE, fillMergedCells = FALSE)
LIK.m$X. <- as.numeric(LIK.m$X.)
str(LIK.m)
Related
I write a data.frame to an Excel document with the write.xlsx function. The header of the data.frame contains characters like "95%CI", "Pr(>|W|)", etc. The data.frame prints in the R console without any problem, but when I write it to an Excel file with write.xlsx(), 95%CI becomes X95.CI and Pr(>|W|) becomes Pr...W..
How can I solve this problem?
The test code is as follows:
library("openxlsx")
mydata <- data.frame("95%CI" = 1,
                     "Pr(>|W|)" = 2)
write.xlsx(mydata,
           "test.xlsx",
           sheetName = "test",
           overwrite = TRUE,
           borders = "all", colWidths = "auto")
I don't think this code works correctly in the R console either.
mydata <- data.frame("95%CI" = 1,"Pr(>|W|)" =2)
mydata
# X95.CI Pr...W..
#1 1 2
You have some non-standard characters in the column names (like %, (, >, etc.). If you want to keep them, use check.names = FALSE in the data.frame() call.
mydata <- data.frame("95%CI" = 1,"Pr(>|W|)" =2, check.names = FALSE)
mydata
# 95%CI Pr(>|W|)
#1 1 2
Now when you write it to Excel:
openxlsx::write.xlsx(mydata,
                     "test.xlsx",
                     sheetName = "test",
                     overwrite = TRUE,
                     borders = "all", colWidths = "auto")
I use this script
library(openxlsx)
Output <- read.xlsx(xlsxFile = "Excel_file.xlsx", fillMergedCells = TRUE, colNames = TRUE)
When I do this, it works and does what it is supposed to do. The file contains a lot of merged rows and columns. However, it doesn't always work. Is there another package or another script I can use, maybe one that can overwrite? I'm wondering if that is the issue.
Otherwise, when I try to run the script above again, it tells me:
Error in read.xlsx(xlsxFile = "Excel_file.xlsx", fillMergedCells = TRUE, :
Please provide a sheet name OR a sheet index.
When I do as it asks and add a sheet name or sheet index ...
library(openxlsx)
Output <- read.xlsx(xlsxFile = "Excel_file.xlsx", fillMergedCells = TRUE,
                    colNames = TRUE, sheetIndex = 1)
or
library(openxlsx)
Output <- read.xlsx(xlsxFile = "Excel_file.xlsx", fillMergedCells = TRUE,
                    colNames = TRUE, sheetName = "sheet 1")
I get another error message:
Error in loadWorkbook(file, password = password) :
argument "file" is missing, with no default
Within R, I am trying to print a series of dataframes into an Excel file using openxlsx. Specifically in this case, I'm using list.files, read.xlsx and write.xlsx.
I'm still unable to write multiple tabs into one Excel file.
Please see my code below. I've tried to approach this problem using a for loop as well as a manual solution to test feasibility, but have had no luck.
This is what my code currently looks like: for the length of the file list, pipe each file into a read function, which then writes the results.
lapply(
  1:length(file.list),
  function(x) {
    write.xlsx(
      read.xlsx(file.list[i]),
      file = file_name,
      sheetName = file.list[i],
      col.names = TRUE,
      row.names = FALSE,
      append = TRUE)
  }
)
A manual solution below also doesn't seem to have any luck for me either
df1 <- read.xlsx(file.list[1])
write.xlsx(df1, file = file_name, sheetName = file.list[1], col.names = TRUE, row.names = FALSE, append = FALSE)
df2 <- read.xlsx(file.list[2])
write.xlsx(df2, file = file_name, sheetName = file.list[2], col.names = TRUE, row.names = FALSE, append = TRUE)
No error messages so far. The final file does have data written into it; however, it seems only the last file's results appear. I'm thinking it's essentially a cycle of overwrites, with each write replacing the previous one.
Maybe you could try this:
wb <- createWorkbook(title = "Your_Workbook_Name")
lapply(seq_along(file.list), function(i) {
  addWorksheet(wb, sheetName = file.list[i])                        # one sheet per file
  writeData(wb, sheet = file.list[i], x = read.xlsx(file.list[i]))  # write that file's data
})
saveWorkbook(wb, file = file_name, overwrite = TRUE)
Since I don't have a way to replicate this, perhaps you can still get the main idea from it.
A loop in which you traverse all the files you want to write; before writing each one you create a sheet named after it, then write the data into the newly created sheet, and finally you save the workbook. I hope it's understandable (my knowledge of lapply and sapply is not the best, but the idea still stands).
You can simply use a named list of dataframes in write.xlsx. Something like this should work:
library(openxlsx)
df.list <- lapply(file.list, read.xlsx)
named.df.lst <- setNames(df.list, file.list)
write.xlsx(named.df.lst, file = file_name)
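One small caveat, assuming file.list holds file paths: Excel sheet names are limited to 31 characters and cannot contain characters such as / or \, so the names may need cleaning first, for example:
# hypothetical clean-up: use the bare file name, without directory or extension, as the sheet name
named.df.lst <- setNames(df.list, tools::file_path_sans_ext(basename(file.list)))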
I have multiple csv's, each containing multiple observations for one participant on several variables. Let's say each csv file looks something like the below, and the name of the file indicates the participant's ID:
data.frame(
happy = sample(1:20, 10),
sad = sample(1:20, 10),
angry = sample(1:20, 10)
)
I found some code in an excellent stackoverflow answer that allows me to access all files saved into a specific folder, calculate the sums of these emotions, and output them into a file:
# access all csv files in the working directory
fileNames <- Sys.glob("*.csv")
for (fileName in fileNames) {
# read original data:
sample <- read.csv(fileName,
header = TRUE,
sep = ",")
# create new data based on contents of original file:
data.summary <- data.frame(
File = fileName,
happy.sum = sum(sample$happy),
sad.sum = sum(sample$sad),
angry.sum = sum(sample$angry))
# write new data to separate file:
write.table(data.summary,
"sample-allSamples.csv",
append = TRUE,
sep = ",",
row.names = FALSE,
col.names = FALSE)}
However, I can ONLY get "sum" to work in this function. I would like to not only find the sums of each emotion for each participant, but also the maximum value of each.
When I try to modify the above:
for (fileName in fileNames) {
# read original data:
sample <- read.csv(fileName,
header = TRUE,
sep = ",")
# create new data based on contents of original file:
data.summary <- data.frame(
File = fileName,
happy.sum = sum(sample$happy),
happy.max = max(sample$happy),
sad.sum = sum(sample$sad),
angry.sum = sum(sample$angry))
# write new data to separate file:
write.table(data.summary,
"sample-allSamples.csv",
append = TRUE,
sep = ",",
row.names = FALSE,
col.names = FALSE)}
I get the following warning message:
In max(sample$happy) : no non-missing arguments to max; returning -Inf
Would sincerely appreciate any advice anyone can give me!
Using your test data, the max() statement works fine for me. Could it be a discrepancy between the sample code you posted and your actual csv file structure?
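If it helps, here is a small diagnostic sketch (assuming the same fileNames vector as in the question): the -Inf warning typically appears when sample$happy is NULL or empty, i.e. when a file does not actually contain a column named happy.
# print the column names of any file that is missing one of the expected columns
for (fileName in fileNames) {
  cols <- names(read.csv(fileName, header = TRUE, nrows = 1))
  if (!all(c("happy", "sad", "angry") %in% cols))
    message(fileName, ": ", toString(cols))
}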
I am having a hard time importing some data from .xls files in R.
library(gdata)
file.names <- list.files(path = ".", pattern = "\\.xls$")
file.names
for (file in seq(file.names))
temp <- read.xls(file.names[file],
verbose = FALSE, skip = 16, nrows = 14, header = FALSE,
check.names = FALSE, sep = "\t", fill = TRUE, fileEncoding="UTF-8")
write.csv(temp, "file.csv")
The code above fails to do what I want, producing the error I provided in the title of this question. Some similar questions here on SO aren't helpful at all.
Is there a conflict with additional arguments? Could this be a perl script error or something caused by bad encoding?
Omit the sep= and fileEncoding= arguments, in which case I get a 14x48 data frame with the sample data.
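A sketch of the adjusted call, keeping the other arguments from the question unchanged:
# same call as above, but without sep= and fileEncoding=
temp <- read.xls(file.names[file],
                 verbose = FALSE, skip = 16, nrows = 14, header = FALSE,
                 check.names = FALSE, fill = TRUE)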