Working with Excel and r2excel in R

I created a function that takes an Excel file and splits it into smaller files using the r2excel package. Basically, the function reads an Excel file that contains all the students in our district and creates an individual file for each teacher in a school (e.g. a class list). It seems to work fine on one Excel file. However, when I tested it on a different one, it produced some files and then suddenly stopped. My workaround was to remove some of the rows that cause the problem and rerun the function, but this is only a temporary solution.
Below is the error I received.
Error in .jnew("java/io/File", file) :
java.lang.NoSuchMethodError:
Here is my code:
library(r2excel) # r2excel is built on the Java-based xlsx package
df <- read.csv("bigfile.csv", check.names = FALSE) # keep names like "Teacher Name" (the default would give "Teacher.Name")
extract <- function(name){
  temp_df <- subset(df, `Teacher Name` == name)
  temp_df <- temp_df[order(temp_df$Class, temp_df$`Student Name`),]
  wb <- createWorkbook(type="xlsx")
  sheet <- createSheet(wb, sheetName = "Class List")
  xlsx.addTable(wb, sheet, temp_df, fontColor="darkblue", row.names=FALSE, startCol=1,fontSize=11)
  xlsx.addLineBreak(sheet,0)
  filename <- paste(unique(temp_df$`School Name`), unique(temp_df$`Teacher Name`),sep=" ")
  filename <- paste(filename, " 2D.xlsx", sep="")
  saveWorkbook(wb, filename)
}
lapply(unique(df$`Teacher Name`), extract)
Can someone please explain what the error implies? I am not familiar with r2excel or Java. Is there something wrong with my Excel file, or did I not implement r2excel correctly? I am using the latest versions of R and RStudio. Thank you.
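A possible explanation, offered as an assumption rather than a confirmed diagnosis: the error is raised while Java builds a File object from the file argument of saveWorkbook(), and a NoSuchMethodError there suggests Java did not receive a single string. One way that can happen is a teacher listed under more than one school, so unique() on the "School Name" column returns two values and filename becomes a character vector of length 2. The hypothetical helper make_filename() below builds the same name as the original code but stops with a readable message when there is not exactly one school and one teacher. The toy data is made up.

```r
# Hypothetical helper (not part of r2excel): build exactly one output file
# name per teacher, or stop with a readable message before Java is involved.
make_filename <- function(temp_df) {
  school  <- unique(temp_df[["School Name"]])
  teacher <- unique(temp_df[["Teacher Name"]])
  if (length(school) != 1 || length(teacher) != 1 || anyNA(c(school, teacher))) {
    stop("Expected one school and one teacher, got: ",
         paste(c(school, teacher), collapse = " | "))
  }
  # Same name the original code produces: "<school> <teacher> 2D.xlsx"
  paste0(school, " ", teacher, " 2D.xlsx")
}

# Made-up example: "Ms B" is listed under two schools.
toy <- data.frame(`School Name`  = c("North", "North", "South"),
                  `Teacher Name` = c("Ms A", "Ms B", "Ms B"),
                  check.names = FALSE)
make_filename(toy[toy[["Teacher Name"]] == "Ms A", ])   # "North Ms A 2D.xlsx"
```

Inside extract(), the two paste() lines could then be replaced by filename <- make_filename(temp_df). Teachers who appear under several schools can be listed with names(which(tapply(df[["School Name"]], df[["Teacher Name"]], function(s) length(unique(s))) > 1)).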

Related

Reading all sheets in multiple excel files into R

I am trying to read a bunch of Excel files, and all of the sheets in these files, into R. I would then like to save each sheet as a separate data frame, named after the sheet. Some files have only one sheet, while others have several, so I'm not sure how to specify all sheets rather than just a number.
I have tried:
library(XLConnect)
files.list <- list.files(recursive=T,pattern='*.xlsx') #get files list from folder
for (i in 1:length(files.list)){
wb <- loadWorkbook(files.list[i])
sheet <- getSheets(wb)
for (j in 1:length(sheet)){
tmp<-read.xlsx(files.list[i], sheetIndex=j,
sheetName=NULL,
as.data.frame=TRUE, header=F)
if (i==1&j==1) dataset<-tmp else dataset<-rbind(dataset,tmp)
}
}
and I get the error "could not find function "loadWorkbook"". At one point I resolved that issue, but then got the error "could not find function "getSheets"". I have had some issues getting this package to work, so if anyone has an alternative, I would appreciate it!
You could try with readxl...
I've not tested this for the case of different workbooks with duplicate worksheet names.
There were a number of issues with your code:
the list.files pattern is a regular expression, and . is a special character there (it matches any single character), so it needs to be escaped with \\
As @deschen pointed out, the Excel-related functions are from the openxlsx package
library(readxl)
files.list <- list.files(recursive = TRUE, pattern = '\\.xlsx$') #get files list from folder
for (i in seq_along(files.list)){
sheet_nm <- excel_sheets(files.list[i])
for (j in seq_along(sheet_nm)){
assign(x = sheet_nm[j], value = read_xlsx(path = files.list[i], sheet = sheet_nm[j]), envir = .GlobalEnv)
}
}
Created on 2022-01-31 by the reprex package (v2.0.1)
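The answer above assigns each sheet to a global variable and notes that it is untested for duplicate sheet names. A sketch addressing that concern, not tested against real workbooks: read every sheet into one named list with readxl, and prefix repeated sheet names with their workbook name so nothing is overwritten. sheet_object_names() and read_all_sheets() are hypothetical helpers; list2env() can still create separate data frames if that is really wanted.

```r
# Hypothetical helpers; readxl is called with readxl:: so it only has to be
# installed (not attached) when read_all_sheets() is actually run.

# Name each (file, sheet) pair: keep the bare sheet name when it is unique,
# otherwise prefix it with the workbook name so nothing gets overwritten.
sheet_object_names <- function(files, sheets) {
  clash <- duplicated(sheets) | duplicated(sheets, fromLast = TRUE)
  prefixed <- paste(tools::file_path_sans_ext(basename(files)), sheets, sep = "_")
  make.unique(as.character(ifelse(clash, prefixed, sheets)))
}

# Read every sheet of every workbook into one named list of data frames.
read_all_sheets <- function(files) {
  sheet_list <- lapply(files, readxl::excel_sheets)
  files_rep  <- rep(files, lengths(sheet_list))     # one entry per sheet
  sheets     <- as.character(unlist(sheet_list))
  out <- Map(function(f, s) readxl::read_xlsx(f, sheet = s), files_rep, sheets)
  names(out) <- sheet_object_names(files_rep, sheets)
  out
}

# Usage (not run here):
# dfs <- read_all_sheets(list.files(recursive = TRUE, pattern = "\\.xlsx$"))
# list2env(dfs, envir = .GlobalEnv)  # only if separate objects are really needed
```

Keeping the sheets in a list also makes it easy to loop over them later with lapply() instead of looking up global variables by name.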
I'm pretty sure the loadWorkbook function comes from the openxlsx package, so you should use:
library(openxlsx)
https://cran.r-project.org/web/packages/openxlsx/openxlsx.pdf

Problem with XLS files with R's package readxl

I need to read an XLS file in R, but I'm having a problem related to the way my file is generated and the readxl package. I do not have this issue with Python, and I hope it's possible to solve this problem within R.
An application we use at my company exports reports in XLS format (not XLSX). This report is generated daily. What I need is to sum the total value of the rows in each file, in order to create a new report containing each day followed by this total value.
When I try to read these files in R using the readxl package, the program returns this error:
Error: Can't subset columns that don't exist.
x Location 5 doesn't exist.
i There are only 0 columns.
Run rlang::last_error() to see where the error occurred.
Now, the weird thing is that when I open the XLS file in Excel before running my script, R is able to run properly.
I guessed this was an error caused by something like the file only being completed when I open it... but the same Python script gives me the correct result without opening the file first.
I am now assuming this is a bug in the readxl package. Is there another package I could use to read XLS (not XLSX) files? Ideally, one that does not depend on Java being installed on my computer.
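Before switching packages, it may be worth checking whether the exported files are genuine binary .xls workbooks (readxl itself does not need Java). As an assumption to verify: some reporting tools write HTML or XML under an .xls extension, which Excel opens without complaint but readxl may not parse. Genuine binary .xls files are OLE2 compound documents whose first eight bytes are D0 CF 11 E0 A1 B1 1A E1. The hypothetical helper is_ole2_xls() checks that signature with base R only; whether this explains the open-in-Excel symptom is not certain.

```r
# Hypothetical helper: TRUE if the file begins with the 8-byte OLE2
# compound-document signature that genuine binary .xls workbooks start with.
is_ole2_xls <- function(path) {
  sig <- as.raw(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1))
  con <- file(path, open = "rb")
  on.exit(close(con))
  identical(readBin(con, what = "raw", n = 8L), sig)
}

# Usage (not run here), with path and file.names from the script below:
# sapply(file.path(path, file.names), is_ole2_xls)
```

A FALSE for a file that readxl rejects would point at the export format rather than at readxl.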
My readxl script:
if (!require("readxl")) {install.packages("readxl"); library("readxl")}
"%,%" <- function(x,y) paste0(x,"\\",y)
year = "2021"
month = "Aug"
column = 5 # VL_COVAR
path <- "F:\\variancia" %,% year %,% month
tiposDF = c("date","numeric","list","numeric","numeric","numeric","list")
file.names <- dir(path, pattern = "\\.xls$") # regex: only names ending in .xls
vari <- c()
for (i in seq_along(file.names)){
  file <- paste(path,sep="\\",file.names[i])
  print(paste("Reading ", file))
  dados <- read_excel(file, col_types = tiposDF)
  somaVar <- sum(dados[[column]]) # [[ ]] extracts the column as a plain vector
  vari <- append(vari,c(somaVar))
}
vari
file <- paste(path,sep="\\",'Covariância.xls_02082021.xls')
print(paste("Reading ", file))
dados <- read_excel(file, col_types = tiposDF)
somaVar <- sum(dados[[column]])
vari <- append(vari,c(somaVar))
x <- rio::import(file) # import() comes from the rio package
View(x)
Thanks everyone!
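For the final report (one row per day with its total), a minimal sketch assuming every file name ends in an 8-digit day-month-year stamp like the 02082021 in 'Covariância.xls_02082021.xls'. That naming is inferred from this single example. daily_report() is a hypothetical helper that pairs each file's date with the total computed in the loop.

```r
# Hypothetical helper: one row per file with its date and total, sorted by day.
daily_report <- function(file_names, totals) {
  stopifnot(length(file_names) == length(totals))
  # Pull the 8 digits between the last "_" and the ".xls" extension
  stamp <- sub("^.*_([0-9]{8})\\.xls$", "\\1", file_names)
  day <- as.Date(stamp, format = "%d%m%Y")  # NA where the pattern did not match
  report <- data.frame(day = day, total = totals)
  report[order(report$day), , drop = FALSE]
}

# Usage with the loop above (not run here): daily_report(file.names, vari)
```

Files whose names do not follow the assumed pattern get an NA date and sort to the end, which makes them easy to spot.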

I am trying to read only the tail of multiple .xlsx files merged into a data.frame of lists

I am trying to merge multiple .xlsx sheets into one data file within R, extracting only the last row of each sheet.
I am a clinical academic, and we currently have a prediction algorithm implemented via a macro-enabled Excel spreadsheet. This spreadsheet outputs a .xlsx sheet into a pre-specified folder.
Unfortunately, it inserts a series of test rows into the output .xlsx. Furthermore, users occasionally enter the same data multiple times until it is correct. For this reason, we would like only the final row of each .xlsx file to be included in the cleaned data.
I have managed to merge all the files using the code below, mainly thanks to help and code I found in this community.
Unfortunately, I am stuck on the following error message (see below):
library(plyr)
library(dplyr)
library(readxl)
#file directory where the .xlsx files to be listed are stored
path <- "//c:/documents"
filenames_list <- list.files(path= path, full.names=TRUE)
All_list <- lapply (filenames_list,
function(filename){
print(paste("Merging",filename,sep = " "))
read.xlsx(filename)
})
#this below code doesnt work
#it returns the following error
# Error in x[seq.int(to = xlen, length.out = n)] :
# object of type 'S4' is not subsettable
tail_only_list_df <- lapply (All_list,
function(newtail){
tail(newtail, 1)
})
final_df <- rbind.fill(tail_only_list_df)
Try the following:
df <- do.call(rbind, lapply(filenames_list, function(filename)
tail(openxlsx::read.xlsx(filename), 1)))
Or, if you already have the list of data frames read from the Excel files, do:
df <- do.call(rbind, lapply(All_list, tail, 1))
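Because of the test rows, it may help to know which file each kept row came from. A hedged variation on the answer above: the hypothetical last_rows() helper takes the last row of each data frame, records its source file, and skips empty sheets. It uses base rbind, so it assumes all files share the same columns; plyr::rbind.fill (from the question) or dplyr::bind_rows are alternatives when they do not.

```r
# Hypothetical helper: keep the last row of each data frame and tag it with
# the file it came from. `dfs` is a list of data frames, `sources` a
# character vector of the same length (e.g. filenames_list).
last_rows <- function(dfs, sources) {
  stopifnot(length(dfs) == length(sources))
  rows <- Map(function(d, src) {
    if (nrow(d) == 0) return(NULL)       # skip empty sheets entirely
    out <- d[nrow(d), , drop = FALSE]    # equivalent to tail(d, 1)
    out$source_file <- src
    out
  }, dfs, sources)
  rows <- Filter(Negate(is.null), unname(rows))
  if (length(rows) == 0) return(NULL)
  res <- do.call(rbind, rows)            # columns must match across files
  rownames(res) <- NULL
  res
}
```

For example: last_rows(lapply(filenames_list, openxlsx::read.xlsx), filenames_list).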

Create dataframe from list in Rproj

I have an issue that really bugs me: I've recently tried switching to RStudio projects (.Rproj), because I would like to make my data and scripts available at some point. But with one of my scripts, I get an error that, I think, should not occur. Here is the tiny piece of code that gives me so much trouble; the project is available at https://github.com/fredlm/mockup.
library(readxl)
list <- list.files(path = "data", pattern = "file.*.xls") #List excel files
#Aggregate all excel files
df <- lapply(list, read_excel)
for (i in 1:length(df)){
df[[i]] <- cbind(df[[i]], list[i])
}
df <- do.call("rbind", df)
It gives me the following error right after "df <- lapply(list, read_excel)":
Error in read_fun(path = path, sheet = sheet, limits = limits, shim =
shim, : path[1]="file_1.xls": No such file or directory
Do you know why? When I do it the old-school way, i.e. using setwd() before creating 'list', everything works fine. So it looks like lapply does not know where to look for the files when used in an Rproj, which seems very odd...
What did I miss?
Thanks :)
Thanks to a non-Stack Overflow user, a solution was found. It's silly: list.files() returns only the file names (e.g. "file_1.xls", not "data/file_1.xls"), so read_excel looked for them in the project's working directory, where they do not exist. Adding the directory fixes it:
list <- paste("data/", list.files(path = "data", pattern = "file.*.xls"), sep = "") #List excel files
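A slightly shorter variant of the same fix, sketched as an alternative rather than the poster's solution: list.files() can prepend the directory itself via full.names = TRUE. The tighter regex in the code is an assumption that only names starting with "file" and ending exactly in .xls are wanted (the original pattern would also match .xlsx). list_excel_files() is a hypothetical helper name.

```r
# list.files() returns bare names ("file_1.xls") unless full.names = TRUE;
# with it, each entry already includes the directory ("data/file_1.xls").
list_excel_files <- function(dir = "data") {
  # Regex: names starting with "file" and ending exactly in ".xls"
  list.files(path = dir, pattern = "^file.*\\.xls$", full.names = TRUE)
}

# Usage inside the project (not run here):
# xls_files <- list_excel_files("data")
# df <- lapply(xls_files, readxl::read_excel)
```

Giving the result a name other than list also avoids confusion with the base function list().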

Reading files from loops

I have the following code, which contains two loops. I have some txt files that I want to read and analyze in R separately, one by one, but I currently have a problem importing them into R. For example, the name of the first file is "C:/Users/User 1/Documents/Folder 1/1 1986.txt". To read it in R, I wrote the following loop:
## company
for(i in 1)
{
## year
for(j in 1986)
{
df=read.delim(paste("C:/Users/User 1/Documents/Folder 1/", i, j, ".txt"), stringsAsFactors=FALSE, header=FALSE)
df<-data.frame(rename(df, c("V3"="weight")))
}
}
When I run the loop, I get the following error:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file 'C:/Users/User 1/Documents/Folder 1/ 13 1986 .txt': No such file or directory
How do I avoid the extra spaces that R inserts into the file name?
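You should replace paste with paste0.
By default, paste uses a space as the separator, which is what produced the file name you got, whereas paste0 uses no separator. Note that your file names do contain one space, between the company number and the year, so that one has to be written out explicitly.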
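A sketch of the corrected path construction, assuming the file naming shown in the question ("1 1986.txt"). txt_path() is a hypothetical helper, and its default directory is the one from the question; file.path() supplies the "/" between directory and file name.

```r
# Hypothetical helper: build "<dir>/<company> <year>.txt".
txt_path <- function(company, year,
                     base_dir = "C:/Users/User 1/Documents/Folder 1") {
  # paste0() adds nothing between its pieces, so the single space that the
  # real file names contain is written out explicitly.
  file.path(base_dir, paste0(company, " ", year, ".txt"))
}

txt_path(1, 1986)   # "C:/Users/User 1/Documents/Folder 1/1 1986.txt"
```

Inside the loop, this would be read.delim(txt_path(i, j), stringsAsFactors = FALSE, header = FALSE). The helper is vectorised, so txt_path(1, 1986:1990) returns five paths at once.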
Because I don't know exactly what your files look like, maybe this won't help you, but this is how I read in files with a loop.
First, set the working directory:
setwd("/Users/User 1/Documents/Folder 1")
I always save my data as one Excel file with several sheets. In this example, my Excel file is named 2000-2014 and has 15 sheets: the first is called "2000", the second "2001", and so on.
library(readxl)
sheets <- list() # creating empty list named sheets
k <- 2000:2014 # the year that each sheet holds
for(i in 1:15){
  sheets[[i]] <- read_excel("2000-2014.xlsx", sheet = i) # every sheet will be one layer of the list sheets
  sheets[[i]]$Year <- k[i] # to every list layer I add a column "Year", matching the actual year my data is from
}
Now I want my data from 2000 to 2014 merged into one big data frame. I can still analyse the years one by one!
data <- do.call(rbind.data.frame, sheets)
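The loop above can also be written without the separate year vector k. A sketch, assuming every sheet has the same columns and that the sheet names are the years (as in this example): add_year() is a hypothetical helper that attaches the year to each data frame and stacks them with rbind.

```r
# Hypothetical helper: attach the year to each sheet and stack them.
# `sheets` is a list of data frames with identical columns, `years` a
# vector of the same length (e.g. 2000:2014, or as.integer(sheet names)).
add_year <- function(sheets, years) {
  stopifnot(length(sheets) == length(years))
  tagged <- Map(function(d, yr) { d$Year <- rep(yr, nrow(d)); d }, sheets, years)
  out <- do.call(rbind, unname(tagged))
  rownames(out) <- NULL
  out
}

# Usage with the workbook from this answer (not run here):
# nms    <- readxl::excel_sheets("2000-2014.xlsx")   # "2000", "2001", ...
# sheets <- lapply(nms, function(s) readxl::read_excel("2000-2014.xlsx", sheet = s))
# data   <- add_year(sheets, as.integer(nms))
```

Taking the years from the sheet names keeps the Year column correct even if a sheet is added or reordered.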
To tidy my data and get it into the long form that Hadley Wickham and ggplot2 prefer (http://vita.had.co.nz/papers/tidy-data.pdf), I restructure it:
library(tidyr) # gather(); tidyr also re-exports the %>% pipe
data_restructed <- data %>%
  as.data.frame() %>%
  tidyr::gather(key = "categories", value = "values", 2:12)
I use 2:12 because in my case columns 2 to 12 contain all the values, while column 1 contains the country names. Now you have all your data in one big data frame and can analyse it by specific variables, such as year, category, or year and category together.
I would avoid the loop in this case and go with lapply.
Files <- list.files('C:/Users/User 1/Documents/Folder 1/', pattern = "\\.txt$", full.names = TRUE) # full paths, so read.delim can find them
fileList <- lapply(Files, FUN = function(x){
  df <- read.delim(x, stringsAsFactors=FALSE, header=FALSE)
  df <- data.frame(plyr::rename(df, c("V3"="weight"))) # plyr's rename(x, c(old = "new"))
  return(df)
})
do.call('rbind', fileList)
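If the goal is still to analyse each file separately, as the question asks, the list from lapply() can be kept instead of stacked. A sketch under two assumptions: every file is named "<company> <year>.txt" like "1 1986.txt", and the third column is the weight. It renames V3 with base R instead of plyr::rename and tags each data frame with the company and year parsed from its file name; read_weights() is a hypothetical helper.

```r
# Hypothetical helper: read every "<company> <year>.txt" file in `dir` into
# a list of data frames, one per file, for separate analysis.
read_weights <- function(dir) {
  files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  lapply(files, function(f) {
    df <- read.delim(f, stringsAsFactors = FALSE, header = FALSE)
    names(df)[names(df) == "V3"] <- "weight"   # base-R rename of column V3
    # "1 1986.txt" -> company "1", year "1986" (assumed naming scheme)
    parts <- strsplit(tools::file_path_sans_ext(basename(f)), " ")[[1]]
    df$company <- parts[1]
    df$year <- parts[2]
    df
  })
}

# Usage (not run here):
# fileList <- read_weights("C:/Users/User 1/Documents/Folder 1")
```

Because each data frame carries its own company and year, the list can still be combined later with do.call(rbind, fileList) without losing track of where rows came from.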
