Using the openxlsx package to write xlsx files in a loop.
I have a variable that is a vector of numbers
x <- 1:8
I then paste ".xlsx" to the end of each element of x to later create an xlsx file
new_x <- paste(x,".xlsx", sep = "")
I then call write.xlsx from the openxlsx package in a for loop to create the new xlsx files
for (i in x) {
for (j in new_x) {
write.xlsx(i,j)
}}
When I open the files ("1.xlsx" to "8.xlsx"), every one of them contains only the number 8. What I don't understand is why 1.xlsx doesn't contain 1, 2.xlsx doesn't contain 2, and so on up to 7.xlsx. Why does the 8th value overwrite everything else?
I even tried creating a new output for the dataframes as most others suggested
for (i in x) {
for (j in new_x) {
output[[i]] <- i
write.xlsx(output[[i]],j)
}}
And it still comes up with the same problem. I don't understand what is going wrong.
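The problem is the nested loops: for every value of i, the inner loop writes that value to all eight files, so each file is rewritten eight times and ends up holding the last value, 8. Use a single loop instead and index into new_x so that each value is paired with exactly one file name.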
x <- 1:8
new_x <- paste(x,".xlsx", sep = "")
for (i in seq_along(x)) {
write.xlsx(x[i], new_x[i])
}
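To make the overwrite visible without creating any files, here is a minimal sketch in base R. It assumes a named list (the hypothetical `fake_files`) as a stand-in for the file system, where assigning to an entry plays the role of write.xlsx overwriting a file.

```r
x <- 1:8
new_x <- paste0(x, ".xlsx")

# Stand-in for the file system: assigning to an entry replaces its contents,
# just as write.xlsx(i, j) replaces the file j.
fake_files <- list()
for (i in x) {
  for (j in new_x) {
    fake_files[[j]] <- i
  }
}
nested_result <- unlist(fake_files)   # every entry holds the last i

# Single loop: each value is paired with exactly one file name.
paired_files <- list()
for (k in seq_along(x)) {
  paired_files[[new_x[k]]] <- x[k]
}
paired_result <- unlist(paired_files)

print(nested_result)
print(paired_result)
```

Printing the two vectors side by side shows the nested version storing the same value under every name, while the paired version keeps one value per name.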
If you want to read a number of .csv files and save them as xlsx files, the approach is similar: use a single for loop, such as:
# Define the directory to search for csv files and the directory to save Excel files in
csvDirectory <- "C:/Foo/Bar"
ExcelDirectory <- file.path(Sys.getenv("USERPROFILE"), "Desktop")
# Find all the csv files of interest (pattern is a regular expression, not a glob)
csvFiles <- list.files(csvDirectory, pattern = "\\.csv$")
# Go through the list of files; read each one into R, then save it as Excel
for (i in seq_along(csvFiles)) {
csvFile <- read.csv(file.path(csvDirectory, csvFiles[i]))
write.xlsx(csvFile, file.path(ExcelDirectory, sub("\\.csv$", ".xlsx", csvFiles[i])))
}
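The file-name handling can be checked separately from the Excel writing. The sketch below uses only base R and assumes two throwaway folders under tempdir() (the names `csv_in` and `xlsx_out` are made up for illustration). It shows that list.files takes a regular expression, so an anchored pattern skips files that merely contain ".csv" somewhere in their name.

```r
# Hypothetical input and output folders, created under tempdir().
csv_dir <- file.path(tempdir(), "csv_in")
xlsx_dir <- file.path(tempdir(), "xlsx_out")
dir.create(csv_dir, showWarnings = FALSE)
dir.create(xlsx_dir, showWarnings = FALSE)

write.csv(data.frame(a = 1:3), file.path(csv_dir, "first.csv"), row.names = FALSE)
write.csv(data.frame(a = 4:6), file.path(csv_dir, "second.csv"), row.names = FALSE)
invisible(file.create(file.path(csv_dir, "notes.csv.bak")))  # should be ignored

# Anchored regular expression: only names that end in ".csv".
csv_files <- list.files(csv_dir, pattern = "\\.csv$")

# Build one output path per input file by swapping the extension.
xlsx_paths <- file.path(xlsx_dir, sub("\\.csv$", ".xlsx", csv_files))
print(basename(xlsx_paths))
```

Each element of `xlsx_paths` could then be passed as the file argument of write.xlsx inside the single loop.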
I have two lists: one with the Excel file paths that I would like to read, and another with the names that I would like to assign to the resulting dataframes. I am trying to create a loop using the code below, but the loop only creates a single dataframe named n. Any idea how to make this work?
files <- c("file1.xlsx","file2.xlsx")
names <- c('name1','name2')
for (f in files) {
for (n in names) {
n <- read_excel(path = f)
}
}
You are overwriting n on each iteration of the loop.
Edit:
@Parfait commented that we shouldn't use assign if we can avoid it, and he is right (e.g. why-is-using-assign-bad).
This does not use assign and puts the data in a neat list:
files <- c("file1.xlsx","file2.xlsx")
names <- c('name1','name2')
result <- list()
for (i in seq_along(files)) {
result[[names[i]]] <- read_excel(path = files[i])
}
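The same named-list idea can be written without an explicit loop. This is a sketch under one loud assumption: it reads small CSV files with read.csv instead of xlsx files with read_excel, purely so that it runs with base R alone. The file names and `df_names` are made up for illustration.

```r
# Two small stand-in files (CSV rather than xlsx, so only base R is needed).
files <- file.path(tempdir(), c("file1.csv", "file2.csv"))
write.csv(data.frame(id = 1:2), files[1], row.names = FALSE)
write.csv(data.frame(id = 3:5), files[2], row.names = FALSE)
df_names <- c("name1", "name2")

# Read every file, then attach the desired names to the resulting list.
result <- setNames(lapply(files, read.csv), df_names)

# Each data frame is now reachable by name.
print(names(result))
print(nrow(result[["name2"]]))
```

With readxl installed, replacing read.csv by read_excel in the lapply call should give the equivalent list for xlsx files.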
Old and not recommended answer (only left here for transparency reasons):
We can use assign to use a character string as variable name:
files <- c("file1.xlsx","file2.xlsx")
names <- c('name1','name2')
for (i in seq_along(files)) {
assign(names[i], read_excel(path = files[i]))
}
An alternative is to loop through all Excel files in a folder, rather than a list. I'm assuming they exist in some kind of folder, somewhere.
# load names of excel files
files <- list.files(path = "C:/your_path_here/", full.names = TRUE, pattern = "\\.xlsx$")
# create function to read multiple sheets per excel file
read_excel_allsheets <- function(filename, tibble = FALSE) {
sheets <- readxl::excel_sheets(filename)
sapply(sheets, function(f) as.data.frame(readxl::read_excel(filename, sheet = f)),
simplify = FALSE)
}
# execute function for all excel files in "files"
all_data <- lapply(files, read_excel_allsheets)
Updated...
I've got a lot of files in a folder, many of them are empty and others with data inside.
What I'm trying to do is this:
#Load all data in a list
file_all <- list.files(file.path(getwd(), "testall"), pattern = "\\.txt$")
Using this list, I'm trying to skip empty files using the method explained by @nrussell in "How to skip empty files when importing text files in R?"
library(plyr)
df_list <- lapply(files, function(x) {
if (!file.size(x) == 0) {
list.files(x)
}
})
And for the empty files:
df_list2 <- lapply(files, function(x) {
if (file.size(x) == 0) {
list.files(x)
}
})
The difference between @nrussell's approach and mine is that I want to create one list of empty files and another list of non-empty files. I'd like to know how many files are empty and how many are not.
# create a list of files in the current working directory
list.of.files <- file.info(dir())
# get the size for each file
sizes <- file.info(dir())$size
# subset the files that have non-zero size
list.of.non.empty.files <- rownames(list.of.files)[which(sizes != 0)]
# here you can further subset that list by file name, e.g. if I only want the files with extension .mp3
list.of.non.empty.files[grep("\\.mp3$", list.of.non.empty.files)]
# get the empty files
list.of.empty.files <- rownames(list.of.files)[which(sizes == 0)]
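To get the two lists and their counts for the .txt files specifically, a sketch using file.size from base R could look like the following. The folder name `testall` comes from the question; here it is created under tempdir() with made-up demo files so the example is self-contained.

```r
# Demo folder with two non-empty files and one empty file.
demo_dir <- file.path(tempdir(), "testall")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("some data", file.path(demo_dir, "a.txt"))
writeLines("more data", file.path(demo_dir, "b.txt"))
invisible(file.create(file.path(demo_dir, "empty.txt")))

# full.names = TRUE so file.size() can find the files from any working directory.
txt_files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)
is_empty <- file.size(txt_files) == 0

empty_files <- txt_files[is_empty]
non_empty_files <- txt_files[!is_empty]

# How many files fall in each group.
counts <- c(empty = sum(is_empty), non_empty = sum(!is_empty))
print(counts)
```

Passing full paths matters here: list.files without full.names = TRUE returns bare file names, which file.size only resolves relative to the current working directory.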
I am reading each worksheet of an Excel file named "REL", up to worksheet 4, using the repeat loop given below. After reading the worksheet for each value of i, I want to save it in my working directory before reading worksheet i + 1.
i <- 1
repeat {
fcr <- read.xlsx("REL.xlsx", sheet = i, colNames = TRUE)
i <- i + 1
print(i)
if (i > 4) {
break
}
}
In the future, please indicate which packages you are using when referencing non-base functions; presumably this is read.xlsx from the xlsx package. To save each worksheet as a csv, you would need to call write.csv(...) after reading the sheet in and before the loop begins its next iteration. But you shouldn't even bother with repeat and a manual counter as above. Use something more idiomatic to R, such as sapply:
library(xlsx)
##
list.files()
#[1] "REL.xlsx"
##
sapply(1:4, function(i) {
write.csv(
read.xlsx("REL.xlsx", sheetIndex = i, header = TRUE),
file = sprintf("WS%d.csv", i)
)
})
##
list.files()
#[1] "REL.xlsx" "WS1.csv" "WS2.csv" "WS3.csv" "WS4.csv"
I'm not a very experienced R user. I need to loop through a folder of csv files and apply a function to each one. Then I would like to take the value I get for each one and have R dump them into a new column called "stratindex", which will be in one new csv file.
Here's the function applied to a single file
ctd=read.csv(file.choose(), header=T)
stratindex=function(x){
x=ctd$Density..sigma.t..kg.m.3..
(x[30]-x[1])/29
}
Then I can spit out one value with
stratindex(Density..sigma.t..kg.m.3..)
I tried formatting another file loop someone made on this board. That link is here:
Looping through files in R
Here's my go at putting it together
out.file <- 'strat.csv'
for (i in list.files()) {
tmp.file <- read.table(i, header=TRUE)
tmp.strat <- function(x)
x=tmp.file(Density..sigma.t..kg.m.3..)
(x[30]-x[1])/29
write(paste0(i, "," tmp.strat), out.file, append=TRUE)
}
What have I done wrong/what is a better approach?
It's easier if you read the file inside the function:
stratindex <- function(file){
ctd <- read.csv(file)
x <- ctd$Density..sigma.t..kg.m.3..
(x[30] - x[1]) / 29
}
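Then apply the function to a vector of filenames and write the result to a file (without a file argument, write.csv only prints to the console):
the.files <- list.files()
index <- sapply(the.files, stratindex)
output <- data.frame(File = the.files, StratIndex = index)
write.csv(output, "strat.csv", row.names = FALSE)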
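A slightly more defensive variant is sketched below. It makes several assumptions for illustration: the demo folder `ctd_demo` and its two generated files are invented, the density column carries the name used in the question, and files with fewer than 30 rows yield NA instead of an error. vapply is used so the result is guaranteed to be one number per file.

```r
# Demo input: two CSV files with 30 density readings each.
strat_dir <- file.path(tempdir(), "ctd_demo")
dir.create(strat_dir, showWarnings = FALSE)
for (k in 1:2) {
  demo <- data.frame(Density..sigma.t..kg.m.3.. = 1020 + k * (0:29))
  write.csv(demo, file.path(strat_dir, paste0("cast", k, ".csv")), row.names = FALSE)
}

stratindex <- function(file) {
  ctd <- read.csv(file)
  x <- ctd$Density..sigma.t..kg.m.3..
  # Assumption: too-short or unrelated files give NA rather than stopping the run.
  if (length(x) < 30) return(NA_real_)
  (x[30] - x[1]) / 29
}

the_files <- list.files(strat_dir, pattern = "\\.csv$", full.names = TRUE)
index <- vapply(the_files, stratindex, numeric(1))

output <- data.frame(File = basename(the_files), stratindex = unname(index))
# Written outside the input folder so a rerun does not pick it up as input.
out_path <- file.path(tempdir(), "strat.csv")
write.csv(output, out_path, row.names = FALSE)
print(output)
```

Restricting list.files to the .csv pattern and writing the summary to a different folder keeps the output file from being read back in as data on the next run.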
I am new to R and am currently trying to apply a function to worksheets of the same index in different workbooks using the package XLConnect.
So far I have managed to read in a folder of excel files:
filenames <- list.files( "file path here", pattern="\\.xls$", full.names=TRUE)
and I have looped through the different files, reading each worksheet of each file
for (i in 1:length(filenames)){
tmp<-loadWorkbook(file.path(filenames[i],sep=""))
lst<- readWorksheet(tmp,
sheet = getSheets(tmp), startRow=5, startCol=1, header=TRUE)}
What I think I want to do is to loop through the files in filenames and then take the worksheets with the same index (eg. the 1st worksheet of the all the files, then the 2nd worksheet of all the files etc.) and save these to a new workbook (the first workbook containing all the 1st worksheets, then a second workbook with all the 2nd worksheets etc.), with a new sheet for each original sheet that was taken from the previous files and then use
for (sheet in newlst){
Count<-t(table(sheet$County))}
and apply my function to the parameter Count.
Does anyone know how I can do this or offer me any guidance at all? Sorry if it is not clear, please ask and I will try to explain further! Thanks :)
If I understand your question correctly, the following should solve your problem:
require(XLConnect)
# Example: workbooks w1.xls - w[n].xls each with sheets S1 - S[m]
filenames = list.files("file path here", pattern = "\\.xls$", full.names = TRUE)
# Read data from all workbooks and all worksheets
data = lapply(filenames, function(f) {
wb = loadWorkbook(f)
readWorksheet(wb, sheet = getSheets(wb)) # read all sheets in one go
})
# Assumption for this example: all workbooks contain same number of sheets
nWb = sapply(data, length)
stopifnot(diff(range(nWb)) == 0)
nWb = min(nWb)
for(i in seq(length.out = nWb)) {
# List of data.frames of all i'th sheets
dout = lapply(data, "[[", i)
# Note: write all collected sheets in one go ...
writeWorksheetToFile(file = paste0("wout", i, ".xls"), data = dout,
sheet = paste0("Sout", seq(length.out = length(data))))
}
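The regrouping step, lapply(data, "[[", i), can be tried on its own without XLConnect. The sketch below assumes a hand-built stand-in for `data` (one list element per workbook, each holding a named list of sheets with a County column, as in the question) and then applies the question's table call to the regrouped sheets.

```r
# Stand-in for `data`: two workbooks, each with sheets S1 and S2.
data <- list(
  list(S1 = data.frame(County = c("A", "B")), S2 = data.frame(County = "C")),
  list(S1 = data.frame(County = c("A", "A")), S2 = data.frame(County = "D"))
)

n_sheets <- length(data[[1]])

# by_sheet[[i]] is the list of the i-th sheets of every workbook.
by_sheet <- lapply(seq_len(n_sheets), function(i) lapply(data, "[[", i))

# The question's per-sheet count, applied to all first sheets.
counts_first <- lapply(by_sheet[[1]], function(sheet) t(table(sheet$County)))
print(counts_first)
```

Keeping the regrouped sheets in a list like `by_sheet` means the counting function can be applied directly, with or without writing the intermediate workbooks.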
Using gdata or xlsx might be the best option. Even though they are among the slowest packages out there, they are also among the more intuitive ones.
This method fails if the sheets are ordered differently across the Excel files.
Given that there's no real example, here's some food for thought.
gdata requires Perl to be installed.
library(gdata)
filenames <- list.files( "file path here", pattern="\\.xls$", full.names=TRUE)
amountOfSheets <- 2 # placeholder: set this to the number of sheets per workbook
#This function reads the given sheet from every file and combines the results
readXlsSheet <- function(whichSheet, files){
for(i in seq(files)){
piece <- read.xls(files[i], sheet=whichSheet)
#rbinding frames should work if the sheets are similar, use merge if not.
if(i == 1) complete <- piece else complete <- rbind(complete, piece)
}
complete
}
#Now looping through all the sheets using the previous method
animals <- lapply(seq(amountOfSheets), readXlsSheet, files=filenames)
#The output is a list where animals[[n]] has all the nth sheets combined.
xlsx might only work on 32-bit R, and has its own fair share of issues.
library(xlsx)
filenames <- list.files(pattern="\\.xls$", full.names=TRUE)
amountOfSheets <- 2 # input the number of sheets per workbook
#This function reads the given sheet from every file and combines the results
readXlsSheet <- function(whichSheet, files){
for(i in seq(files)){
piece <- read.xlsx(files[i], sheetIndex=whichSheet)
#rbinding frames should work if the sheets are similar, use merge if not.
if(i == 1) complete <- piece else complete <- rbind(complete, piece)
}
complete
}
readXlsSheet(1, filenames)
#Now looping through all the sheets using the previous method
animals <- lapply(seq(amountOfSheets), readXlsSheet, files=filenames)
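Growing `complete` with rbind inside the loop can also be written as one lapply followed by a single do.call(rbind, ...). The sketch below is an illustration only: `fake_read_sheet` is a hypothetical stand-in for read.xls or read.xlsx so that it runs without either package, and the file names are invented.

```r
# Hypothetical stand-in for read.xls()/read.xlsx(): returns a small data frame
# that records which file and sheet it was asked for.
fake_read_sheet <- function(file, sheet) {
  data.frame(file = file, sheet = sheet, value = seq_len(2),
             stringsAsFactors = FALSE)
}

# Read one sheet index from every file and stack the pieces in one step.
read_sheet_from_all <- function(which_sheet, files, reader = fake_read_sheet) {
  pieces <- lapply(files, reader, sheet = which_sheet)
  do.call(rbind, pieces)
}

demo_files <- c("w1.xls", "w2.xls", "w3.xls")
combined <- lapply(1:2, read_sheet_from_all, files = demo_files)
print(combined[[1]])
```

As with the loop version, rbind only works if the sheets share the same columns; sheets with differing layouts would need merge or a similar step instead.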