Can someone please explain how to loop over multiple files with XLConnect, depending on whether a given Excel sheet exists in each file? I have looked at "How to write the first for loop", but I can't work out how to adapt it to what I am trying to do.
Example: if I have 5 Excel files, some with sheet1 and some with both sheet1 and sheet2, how can I say "if the file has sheet2, then readWorksheet, else skip to the next file"?
filenames<-list.files(location,pattern='xlsx',full.names = TRUE)#this gives the file path of all files
data = lapply(filenames, function(f) {
wb = loadWorkbook(f)
existsSheet(filenames, "sheet2")
})# this throws an error because it can't find the sheet from file
I think I need something like the code below, but with the error handled so that, if a file has no sheet2, it moves on and reads the worksheet from the next file.
for (file in filenames) {
newFile = readWorksheetFromFile(file=file, sheet="sheet2")
df = merge(newFile, newFile, all=TRUE)
}
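A minimal sketch of one way to do this with XLConnect: the key change is that existsSheet() is called on the loaded workbook wb rather than on the filenames vector, and files without "sheet2" return NULL and are dropped afterwards. The demo folder, file names and data are made up so the block runs on its own (XLConnect needs a working Java installation); replace location with your own folder and drop the setup part.

```r
library(XLConnect)  # Java-based; all calls below are XLConnect functions

# Demo setup (hypothetical files): three workbooks in a temp folder,
# only file1 and file3 contain a sheet named "sheet2"
location <- file.path(tempdir(), "sheet2_demo")
dir.create(location, showWarnings = FALSE)
for (k in 1:3) {
  wb <- loadWorkbook(file.path(location, paste0("file", k, ".xlsx")), create = TRUE)
  createSheet(wb, "sheet1")
  writeWorksheet(wb, data.frame(id = k), sheet = "sheet1")
  if (k != 2) {
    createSheet(wb, "sheet2")
    writeWorksheet(wb, data.frame(id = k, value = k * 10), sheet = "sheet2")
  }
  saveWorkbook(wb)  # saves to the file given in loadWorkbook()
}

filenames <- list.files(location, pattern = "\\.xlsx$", full.names = TRUE)

# existsSheet() needs the workbook object, not the vector of file names.
# Workbooks without "sheet2" yield NULL instead of an error.
sheet2_list <- lapply(filenames, function(f) {
  wb <- loadWorkbook(f)
  if (existsSheet(wb, "sheet2")) readWorksheet(wb, sheet = "sheet2") else NULL
})
sheet2_list <- Filter(Negate(is.null), sheet2_list)  # drop the skipped files

# Stack the sheets (assumes every "sheet2" has the same columns)
df <- do.call(rbind, sheet2_list)
df
```

Stacking with do.call(rbind, ...) assumes identical columns; if the sheets differ, Reduce(function(a, b) merge(a, b, all = TRUE), sheet2_list) is one alternative.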
Related
I need to reshape the data stored in Excel files and save it as new .csv files. I have figured out which specific actions are needed, but I can't work out how to use lapply.
All Excel files have the same structure. Each .csv file should have the name of its original file.
## the original actions successfully performed on a single file
library(readxl)
library("reshape2")
DataSource <- read_excel("File1.xlsx", sheet = "Sheet10")
DataShaped <- melt(subset(DataSource [-(1),], select = - c(ng)), id.vars = c ("itemname","week"))
write.csv2(DataShaped, "C:/Users/Ol/Desktop/Meta/File1.csv")
## my attempt to apply to the rest of the files in the directory
lapply(Files, function (i){write.csv2((melt(subset(read_excel(i,sheet = "Sheet10")[-(1),], select = - c(ng)), id.vars = c ("itemname","week"))))})
R prints the result to the console but doesn't create any files. The output looks like .csv content.
Could anybody explain what I am doing wrong? I'm new to R and would be really grateful for any help.
Answer
Thanks to the prompt answer from @Parfait, the code is working! So glad. Here it is:
library(readxl)
library(reshape2)
# only .xlsx files; a bare list.files() would also pick up the .csv outputs on a rerun
Files <- list.files(pattern = "\\.xlsx$", full.names = TRUE)
lapply(Files, function(i) {
  write.csv2(
    melt(subset(read_excel(i, sheet = "Decomp_Val")[-1, ],
                select = -c(ng)),
         id.vars = c("itemname", "week")),
    file = sub("\\.xlsx$", ".csv", i))
})
It reads an Excel file in the directory, drops the first data row (the header row is kept) and the column named "ng", melts the data using "itemname" and "week" as id variables, and writes the result as a .csv file to the working directory under the name of the original file. Then rinse and repeat for the next file.
Simply pass an actual file path to write.csv2. Otherwise, as stated in the docs (?write.csv), the file argument defaults to the empty string "":
file: either a character string naming a file or a connection open for writing. "" indicates output to the console.
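A quick base-R check of the behaviour quoted above. The data frame and the demo.csv file in tempdir() are made up for illustration; only write.csv2() and read.csv2() from base R are used.

```r
df <- data.frame(itemname = c("a", "b"), week = 1:2, value = c(1.5, 2.5))

# No file argument: file = "" by default, so the CSV text goes to the console
write.csv2(df)

# With a path, the same call writes a file instead of printing
out <- file.path(tempdir(), "demo.csv")
write.csv2(df, file = out)

# Read it back (write.csv2 uses ";" as separator and "," as decimal mark)
back <- read.csv2(out)
```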
The code below appends the Excel file's name, with its extension changed to .csv, to the specified output directory:
path <- "C:/Users/Ol/Desktop/Meta/"
lapply(Files, function(i) {
  write.csv2(
    melt(subset(read_excel(i, sheet = "Sheet10")[-1, ],
                select = -c(ng)),
         id.vars = c("itemname", "week")),
    # basename() drops any directory already in i (e.g. from full.names = TRUE)
    file = paste0(path, sub("\\.xlsx$", ".csv", basename(i)))
  )
})
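When Files comes from list.files(..., full.names = TRUE), each i already carries its directory, and the unescaped pattern ".xlsx" in sub() treats "." as "any character". A small base-R sketch of both pitfalls; the file names here are hypothetical.

```r
path <- "C:/Users/Ol/Desktop/Meta/"  # output folder from the question
i <- "./data/File1.xlsx"             # hypothetical full.names = TRUE entry

# Pasting the full entry duplicates the directory
paste0(path, sub(".xlsx", ".csv", i))
# -> "C:/Users/Ol/Desktop/Meta/./data/File1.csv"

# Unescaped ".xlsx" can match too early in a name
sub(".xlsx", ".csv", "Sales_xlsx_2020.xlsx")
# -> "Sales.csv_2020.xlsx"

# Escaped, anchored pattern applied to the bare file name
out_file <- paste0(path, sub("\\.xlsx$", ".csv", basename(i)))
out_file
# -> "C:/Users/Ol/Desktop/Meta/File1.csv"
```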
I have multiple Excel files with different names, e.g. USA.xlsx, India.xlsx, etc. Each file has only one sheet. I want to rename the sheet in each file to Sheet 1.
Desired output: USA.xlsx should have Sheet 1, India.xlsx should have Sheet 1, and so on. I know renameWorksheet(wb, sheet, newName) works for one file, but I have 1800 Excel files.
I think this should get the job done:
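library(openxlsx)
# "\\.xlsx$" matches only .xlsx names; "^[^~]" skips Excel's ~$ lock files
list.files(pattern = "^[^~].*\\.xlsx$")  # preview the files that will be changed
for (file in list.files(pattern = "^[^~].*\\.xlsx$")) {
  wb <- loadWorkbook(file)
  names(wb)[1] <- "Sheet1"
  saveWorkbook(wb, file, overwrite = TRUE)
}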
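A self-contained sketch of the same loop using openxlsx's renameWorksheet(), the function named in the question. The demo folder and the two country workbooks are hypothetical, created only so the block can run; point demo_dir at your own folder for real use.

```r
library(openxlsx)

# Demo setup (hypothetical): two one-sheet workbooks with country-named sheets
demo_dir <- file.path(tempdir(), "rename_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (country in c("USA", "India")) {
  wb <- createWorkbook()
  addWorksheet(wb, country)
  writeData(wb, country, data.frame(x = 1:3))
  saveWorkbook(wb, file.path(demo_dir, paste0(country, ".xlsx")), overwrite = TRUE)
}

# Rename the first (only) sheet of every workbook, skipping ~$ lock files
files <- list.files(demo_dir, pattern = "^[^~].*\\.xlsx$", full.names = TRUE)
for (file in files) {
  wb <- loadWorkbook(file)
  renameWorksheet(wb, 1, "Sheet1")  # sheet may be given by index or by name
  saveWorkbook(wb, file, overwrite = TRUE)
}

# Check the result: one "Sheet1" per file
sapply(files, getSheetNames)
```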
I am trying to read the contents of a score of Excel files into R with XLConnect. This is a simplified version of my code:
# point to a folder
path <- "/path/to/folder"
# get all the Excel files in that folder
files <- list.files(path, pattern = "*.xlsx")
# create an empty data frame
dat <- data.frame(var.1 = character(), var.2 = numeric())
# load XLConnect
library("XLConnect")
# loop over the files
for (i in seq_along(files)) {
# read each Excel file
wb <- loadWorkbook(paste(path, files[i], sep = "/"))
# fill the data frame with data from the Excel file
dat[i, 1:2] <- readWorksheet(wb, "Table1", startRow = 1, startCol = 1, endRow = 2, endCol = 1, header = FALSE)
rm(wb)
}
I can read in a single file when I specify it with loadWorkbook(paste(path, files[1], sep = "/")), but when I loop over the file list with files[i], the code inside the for loop returns the following error:
Error: InvalidFormatException (Java):
Your InputStream was neither an OLE2 stream, nor an OOXML stream
What am I doing wrong?
The problem had nothing to do with my code.
I had some of the files in that folder open in Excel. When you open a file in Excel, Excel creates an invisible file named "~$filename.xlsx". Since my regular expression searched for files with the suffix ".xlsx", these files were found, too, and since these files are not spreadsheet files, XLConnect couldn't read them and threw an error.
I solved the problem by closing those files in Excel.
Another solution would be to exclude files that begin with a tilde in the regular expression, with something like:
list.files(path, pattern = "^[^~].*\\.xlsx$")
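A base-R sketch showing that a suffix-only pattern also picks up Excel's lock files, plus two ways to exclude them. The demo folder and file names are made up; empty placeholder files are enough because only the names are listed.

```r
# Demo folder (hypothetical): two workbooks, an Excel lock file and a text file
demo_dir <- file.path(tempdir(), "lockfile_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir,
  c("a.xlsx", "report.xlsx", "~$report.xlsx", "notes.txt"))))

# Suffix-only pattern: the lock file "~$report.xlsx" is matched too
all_xlsx <- list.files(demo_dir, pattern = "\\.xlsx$")
all_xlsx

# Option 1: anchored regex; ".*" (not ".+") so one-letter names like a.xlsx still match
files <- list.files(demo_dir, pattern = "^[^~].*\\.xlsx$")

# Option 2: list by suffix, then drop names that start with "~$"
files2 <- all_xlsx[!startsWith(all_xlsx, "~$")]
```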
I want to read a bunch of excel files all located in the same directory and store them in different sheets in a consolidated Excel file.
I initially tried using XLConnect but kept getting the error "GC overhead limit exceeded". I stumbled upon this question, which says that it is a common problem with Java-based Excel packages such as XLConnect and xlsx. I tried the memory management trick suggested there, but it did not work. One of the comments on the accepted answer suggested using openxlsx, as it is based on Rcpp and hence avoids this particular problem.
My current code is as follows:
library(openxlsx)
mnth="January"
files <- list.files(path="./Original Files", pattern=mnth, full.names=T, recursive=FALSE) #pattern match as multiple files are from the same month
# Read them into a list and write to sheet
wb <- createWorkbook()
lapply(files, function(x){
print(x)
xlFile<-read.xlsx(xlsxFile = x, sheet = 1, startRow = 2, colNames = T) #Also tried
str(xlFile)
#Create a sheet in the new Excel file called Consolidated.xlsx with the month name
#Append current data in sheet
})
The problem I am getting is the error: Error in read.xlsx.default(xlsxFile = x, sheet = 1, startRow = 2, colNames = T) : openxlsx can not read .xls or .xlm files!
I have checked that the files variable contains all the files of interest (e.g. January 2015.xls, January 2016.xls, etc.). I have also checked that the paths are correct and that the Excel files actually exist there.
I have left the writing to Excel as skeleton code as I need to solve the problem with reading the files first.
In case it helps, here is my attempt with XLConnect:
library(XLConnect)
setwd("D:/something/something")
mnth="January"
files <- list.files(path="./Original Files", pattern=mnth, full.names=T, recursive=FALSE)
# Read them into a list
df.list = lapply(files, readWorksheetFromFile, sheet=1, startRow=2)
#combine them into a single data frame and write to disk:
df = do.call(rbind, df.list)
rm(df.list)
outputFileName<-"Consolidated.xlsx"
# Load workbook (create if not existing)
wb <- loadWorkbook(outputFileName, create = TRUE)
createSheet(wb, name = mnth)
writeWorksheet(wb,df,sheet = mnth)
#write.xlsx2(df, outputFileName, sheetName = mnth, col.names = T, row.names = F, append = TRUE)
saveWorkbook(wb)
rm(df)
gc()
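The error message itself says that openxlsx cannot read .xls files, and the January files are .xls. A minimal sketch of one workaround, assuming readxl for reading (its read_excel() handles both .xls and .xlsx and does not use Java) and openxlsx only for writing. It follows the structure of the XLConnect attempt: combine all files for the month and write them to a sheet named after the month. The demo files are hypothetical and written as .xlsx, since neither package can create .xls files.

```r
library(readxl)    # read_excel() reads both .xls and .xlsx
library(openxlsx)  # used only to write the consolidated workbook

# Demo setup (hypothetical data): two monthly files with a title in row 1
# and the column names in row 2
src_dir <- file.path(tempdir(), "Original Files")
dir.create(src_dir, showWarnings = FALSE)
for (yr in c("2015", "2016")) {
  wb_in <- createWorkbook()
  addWorksheet(wb_in, "Data")
  writeData(wb_in, "Data", paste("Report for January", yr), startRow = 1)
  writeData(wb_in, "Data", data.frame(item = c("a", "b"), qty = c(1, 2)), startRow = 2)
  saveWorkbook(wb_in, file.path(src_dir, paste0("January ", yr, ".xlsx")), overwrite = TRUE)
}

mnth <- "January"
files <- list.files(path = src_dir, pattern = mnth, full.names = TRUE)

# skip = 1 plays the role of startRow = 2: the header row is row 2
df <- do.call(rbind, lapply(files, read_excel, sheet = 1, skip = 1))

# Reuse Consolidated.xlsx if it exists (one sheet per month), else start a new one
out_file <- file.path(tempdir(), "Consolidated.xlsx")
wb <- if (file.exists(out_file)) loadWorkbook(out_file) else createWorkbook()
if (mnth %in% names(wb)) removeWorksheet(wb, mnth)  # replace an older copy of this month
addWorksheet(wb, mnth)
writeData(wb, mnth, df)
saveWorkbook(wb, out_file, overwrite = TRUE)
```

For real use, point src_dir at "./Original Files" and out_file at "Consolidated.xlsx" in the working directory.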
I am new to R and am currently trying to apply a function to worksheets of the same index in different workbooks using the package XLConnect.
So far I have managed to read in a folder of excel files:
filenames <- list.files( "file path here", pattern="\\.xls$", full.names=TRUE)
and I have looped through the different files, reading each worksheet of each file
for (i in 1:length(filenames)){
tmp<-loadWorkbook(file.path(filenames[i],sep=""))
lst<- readWorksheet(tmp,
sheet = getSheets(tmp), startRow=5, startCol=1, header=TRUE)}
What I think I want to do is loop through the files in filenames, take the worksheets with the same index (e.g. the 1st worksheet of all the files, then the 2nd worksheet of all the files, etc.) and save them to new workbooks (the first workbook containing all the 1st worksheets, the second containing all the 2nd worksheets, etc.), with a separate sheet for each original sheet taken from the source files, and then use
for (sheet in newlst){
Count<-t(table(sheet$County))}
and apply my function to the parameter Count.
Does anyone know how I can do this or offer me any guidance at all? Sorry if it is not clear, please ask and I will try to explain further! Thanks :)
If I understand your question correctly, the following should solve your problem:
require(XLConnect)
# Example: workbooks w1.xls - w[n].xls each with sheets S1 - S[m]
filenames = list.files("file path here", pattern = "\\.xls$", full.names = TRUE)
# Read data from all workbooks and all worksheets
data = lapply(filenames, function(f) {
wb = loadWorkbook(f)
readWorksheet(wb, sheet = getSheets(wb)) # read all sheets in one go
})
# Assumption for this example: all workbooks contain same number of sheets
nWb = sapply(data, length)
stopifnot(diff(range(nWb)) == 0)
nWb = min(nWb)
for(i in seq(length.out = nWb)) {
# List of data.frames of all i'th sheets
dout = lapply(data, "[[", i)
# Note: write all collected sheets in one go ...
writeWorksheetToFile(file = paste0("wout", i, ".xls"), data = dout,
sheet = paste0("Sout", seq(length.out = length(data))))
}
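The core of the answer is lapply(data, "[[", i), which pulls the i-th sheet out of every workbook. A base-R sketch of that regrouping on made-up in-memory data frames (no Excel files), followed by the question's t(table(sheet$County)) step; data here only stands in for the list built from readWorksheet().

```r
# Stand-in for `data` (made-up values): 2 workbooks, each a named list of 2 sheets
data <- list(
  list(S1 = data.frame(County = c("Kent", "Essex")), S2 = data.frame(County = "Kent")),
  list(S1 = data.frame(County = "Kent"), S2 = data.frame(County = c("Essex", "Essex")))
)
nSheets <- length(data[[1]])

# by_index[[i]] is a list holding the i-th sheet of every workbook
by_index <- lapply(seq_len(nSheets), function(i) lapply(data, "[[", i))

# Apply the question's per-sheet step to every regrouped sheet
counts <- lapply(by_index, function(dout) {
  lapply(dout, function(sheet) t(table(sheet$County)))
})
counts[[2]]
```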
Using gdata or xlsx might be best. Although they are among the slowest packages out there, they are among the more intuitive ones.
This method fails if the sheets are ordered differently across the Excel files.
Given there's no real example, here's some food for thought.
gdata requires Perl to be installed.
library(gdata)
filenames <- list.files( "file path here", pattern="\\.xls$", full.names=TRUE)
amountOfSheets <- 2  # number of sheets per workbook; set this to match your files
# This function takes sheet number `whichSheet` from all the files and combines them
readXlsSheet <- function(whichSheet, files) {
  for (i in seq_along(files)) {
    piece <- read.xls(files[i], sheet = whichSheet)
    # rbind-ing frames should work if the sheets are similar; use merge if not.
    if (i == 1) complete <- piece else complete <- rbind(complete, piece)
  }
  complete
}
# Now loop through all the sheets using the function above
animals <- lapply(seq_len(amountOfSheets), readXlsSheet, files = filenames)
# The output is a list where animals[[n]] holds all the nth sheets combined.
xlsx might only work on 32-bit R, and it has its own fair share of issues.
library(xlsx)
filenames <- list.files(pattern="\\.xls$", full.names=TRUE)
amountOfSheets <- 2  # number of sheets per workbook; set this to match your files
# This function takes sheet number `whichSheet` from all the files and combines them
readXlsSheet <- function(whichSheet, files) {
  for (i in seq_along(files)) {
    piece <- read.xlsx(files[i], sheetIndex = whichSheet)
    # rbind-ing frames should work if the sheets are similar; use merge if not.
    if (i == 1) complete <- piece else complete <- rbind(complete, piece)
  }
  complete
}
readXlsSheet(1, filenames)  # first sheet of every file, combined
# Now loop through all the sheets using the function above
animals <- lapply(seq_len(amountOfSheets), readXlsSheet, files = filenames)
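Both readXlsSheet() versions grow complete with rbind() inside the loop. A base-R sketch of the same combine step that binds once at the end with do.call(rbind, lapply(...)). To keep it runnable without gdata/Perl or xlsx/Java, the reader is passed in as an argument; readSheetAcrossFiles and fake_reader are hypothetical names, and fake_reader only stands in for read.xls() or read.xlsx().

```r
# Combine sheet number `whichSheet` from every file, binding once at the end
readSheetAcrossFiles <- function(whichSheet, files, reader) {
  pieces <- lapply(files, reader, whichSheet)  # calls reader(file, whichSheet)
  do.call(rbind, pieces)
}

# Hypothetical reader standing in for read.xls() / read.xlsx()
fake_reader <- function(file, whichSheet) {
  data.frame(file = file, sheet = whichSheet, value = nchar(file))
}

filenames <- c("a.xls", "bb.xls", "ccc.xls")
amountOfSheets <- 2
animals <- lapply(seq_len(amountOfSheets), readSheetAcrossFiles,
                  files = filenames, reader = fake_reader)
# animals[[n]] holds the nth sheet of every file, stacked
```

With gdata, the reader would presumably be something like function(f, s) read.xls(f, sheet = s); with xlsx, function(f, s) read.xlsx(f, sheetIndex = s).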