R loop over write.xlsx()

I want to export a couple of data frames to Excel using the write.xlsx() function from openxlsx. So, for example:
library(openxlsx)
x <- c(1, 2, 3)
for (i in x) {
  name <- paste("sheet", i, sep = "")
  assign(name, data.frame(1:4, 2:3))
  path <- paste("/some_directory/", name, ".xlsx", sep = "")
  write.xlsx(name, file = path)
}
This does create three different data frames with the values 1 to 4 and 2 to 3, and they have the right names; it also creates three different Excel files with the right names. But the Excel files only contain the sheet name instead of the values from the data frame. Does anyone know how to fix that?

You need to keep your data.frame in a variable:
library(glue)
library(openxlsx)
x <- c(1, 2, 3)
for (i in x) {
  name <- paste("sheet", i, sep = "")
  df <- data.frame(1:4, 2:3)  # this step is missing in your example
  assign(name, df)
  path <- glue("/some_directory/{name}.xlsx", name = name)
  write.xlsx(df, file = path)
}
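As a side note, if the goal is a single workbook with one sheet per data frame, write.xlsx() also accepts a named list of data frames and writes one worksheet per element; a minimal sketch (the output path is just an example):
library(openxlsx)
# Each list element becomes a worksheet named after the element (sheet1, sheet2, sheet3).
sheets <- list(sheet1 = data.frame(1:4, 2:3),
               sheet2 = data.frame(1:4, 2:3),
               sheet3 = data.frame(1:4, 2:3))
write.xlsx(sheets, file = "/some_directory/all_sheets.xlsx")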

Related

Converting text files to excel files in R

I have radiotelemetry data that is downloaded as a series of text files. I was provided with code in 2018 that looped through all the text files and converted them into CSV files. Up until 2021 this code worked. However, the code below (specifically the lapply loop) now returns the following error:
"Error in setnames(x, value) :
Can't assign 1 names to a 4 column data.table"
library(data.table)
library(readr)
library(dplyr)
library(stringr)
library(tidyr)
# set the working directory to the folder that contains this script; must be run in RStudio
setwd(dirname(rstudioapi::callFun("getActiveDocumentContext")$path))
# get the path to the master data folder
path_to_data <- paste(getwd(), "data", sep = "/", collapse = NULL)
# find the .TXT files
files <- list.files(path = path_to_data, pattern = "*.TXT", full.names = TRUE, recursive = TRUE)
# regular expression of the record we want
regex <- "^\\d*\\/\\d*\\/\\d*\\s*\\d*:\\d*:\\d*\\s*\\d*\\s*\\d*\\s*\\d*\\s*\\d*"
# vector of column names, no whitespace
columns <- c("Date", "Time", "Channel", "TagID", "Antenna", "Power")
# loop through all .TXT files, extract valid records and save to .csv files
lapply(files, function(file) {
  df <- read_table(file)        # read the .TXT file into a data frame
  dt <- data.table(df)          # convert the data frame to a more efficient data structure
  colnames(dt) <- c("columns")  # modify the column name (this is the line that errors)
  valid <- dt %>% filter(str_detect(col, regex))               # filter based on the regular expression
  valid <- separate(valid, col, into = columns, sep = "\\s+")  # split into columns
  tower_name <- str_sub(basename(file), start = 1, end = 2)    # extract the tower name
  valid$Tower <- rep(tower_name, nrow(valid))                  # add a Tower column
  file_path <- file.path(dirname(file), paste(str_sub(basename(file), end = -5), ".csv", sep = ""))
  write.csv(valid, file = file_path, row.names = FALSE, quote = FALSE)  # save to .csv
})
I looked up possible fixes for this and found that using setnames(skip_absent = TRUE) in the loop resolved the setnames error, but it instead gave the error "Error in is.data.frame(x) : argument "x" is missing, with no default":
lapply(files, function(file) {
  df <- read_table(file)    # read the .TXT file into a data frame
  dt <- data.table(df)      # convert the data frame to a more efficient data structure
  setnames(skip_absent = TRUE)
  colnames(dt) <- c("col")  # modify the column name
  valid <- dt %>% filter(str_detect(col, regex))               # filter based on the regular expression
  valid <- separate(valid, col, into = columns, sep = "\\s+")  # split into columns
  tower_name <- str_sub(basename(file), start = 1, end = 2)    # extract the tower name
  valid$Tower <- rep(tower_name, nrow(valid))                  # add a Tower column
  file_path <- file.path(dirname(file), paste(str_sub(basename(file), end = -5), ".csv", sep = ""))
  write.csv(valid, file = file_path, row.names = FALSE, quote = FALSE)  # save to .csv
})
I'm confused as to why this code is no longer working despite working fine last year. Any help would be greatly appreciated!
The error occurred at the line colnames(dt) <- c("columns"), where you provided only one name for a (presumably) 4-column data.table. If you meant to rename one particular column, you can use
colnames(dt)[i] <- c("columns")
where i is the index of the column you are renaming. Alternatively, provide a vector of 4 new names.
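For instance, here is a minimal sketch of both options using data.table::setnames() (the function that produces the error message above); the table and column names are hypothetical stand-ins for the real data:
library(data.table)
dt <- data.table(V1 = 1:3, V2 = 4:6, V3 = 7:9, V4 = 10:12)  # stands in for the 4-column table
# Option 1: supply one new name per existing column
dt_all <- copy(dt)
setnames(dt_all, c("Date", "Time", "Channel", "TagID"))
# Option 2: rename only the first column by position, leaving the rest untouched
dt_one <- copy(dt)
setnames(dt_one, 1, "col")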

Rename all files in a directory based off of columns in another index data frame

I have a large number of CSV files in a directory that I need to rename based on corresponding columns in another index/reference data frame. Here is a three-element sample of what I'm dealing with:
dir.create("dir1")
write.csv(mtcars[1:2,], "dir1/20821659.csv", row.names=FALSE)
write.csv(mtcars[3:4,], "dir1/20821654.csv", row.names=FALSE)
write.csv(mtcars[5:6,], "dir1/20821657.csv", row.names=FALSE)
Now I have another data frame with the original names of these files in one column, and another column that I would like to use to rename them:
location <- c("SFM01_2", "SFM05_2", "02M08_2")
sn <- c("20821659", "20821654", "20821657")
df<- data.frame(location, sn)
For example, the location name that corresponds to the first file name (20821659) is SFM01_2, so I would like to change that file name to SFM01_2, and so on for all the files in this folder.
You could loop over the rows, each time using paste0() to create a mv command, which is then provided to system()
purrr::walk(1:nrow(df), function(i) {
  cmd <- paste0("mv dir1/", df[["sn"]][i], ".csv dir1/", df[["location"]][i], ".csv")
  system(command = cmd)
})
Tested. file.rename returns TRUE on success.
dir1 <- "dir1"
apply(df, 1, \(x) {
  new <- paste0(x[1], ".csv")
  new <- file.path(dir1, new)
  old <- paste0(x[2], ".csv")
  old <- file.path(dir1, old)
  if (file.exists(old)) file.rename(old, new)
})
#[1] TRUE TRUE TRUE
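As a follow-up on that design, file.rename() is itself vectorised over from and to, so the row-by-row apply() can also be collapsed into a single call; a minimal sketch reusing df and dir1 from above:
# One call renames every file: element i of old is renamed to element i of new.
old <- file.path(dir1, paste0(df$sn, ".csv"))
new <- file.path(dir1, paste0(df$location, ".csv"))
file.rename(old, new)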
Here is a solution using mapply. You can build the full paths of the files from both columns, then rename each file using the two columns row by row.
dir.create("dir1")
write.csv(mtcars[1:2,], "dir1/20821659.csv", row.names=FALSE)
write.csv(mtcars[3:4,], "dir1/20821654.csv", row.names=FALSE)
write.csv(mtcars[5:6,], "dir1/20821657.csv", row.names=FALSE)
list.files('dir1') # "20821654.csv" "20821657.csv" "20821659.csv"
location <- c("SFM01_2", "SFM05_2", "02M08_2")
sn <- c("20821659", "20821654", "20821657")
df<- data.frame(location, sn)
# Create a new dataframe with the full paths of the files
df2 <- sapply(df, function(i) {
  paste0('dir1/', i, '.csv')
})
# rename the file using the specification of the 2 columns row by row
mapply(FUN = file.rename, from = df2[, 2], to = df2[, 1],
MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)
list.files('dir1') # "02M08_2.csv" "SFM01_2.csv" "SFM05_2.csv"

Dataframes are created but column names are not changing when reading from excel workbook

I am trying to read an Excel workbook in R and create a data frame for each sheet.
In the next step, I want to take each created data frame and prefix each of its column names with the sheet name and an underscore.
Here is what I am doing:
library(readxl)
library(xlsx)  # read.xlsx(..., sheetIndex = ) below comes from the xlsx package
# Store Sheet Names in a vector
sheet <- excel_sheets("D:/OTC/JULY DATA.XLSX")
# Trim any of the Trailing White Spaces
sheet_trim_trailing <- function(x) sub("\\s+$", "", x)
sheet <- sheet_trim_trailing(sheet)
# Read each of the sheets in the workbook and create a
# dataframe using the respective sheet name
for (i in 1:length(sheet)) {
  # read a sheet and create the dataframe using its name
  assign(sheet[i], read.xlsx("DATA.XLSX", sheetIndex = i))
  # store the dataframe name in a vector
  sname <- sheet[i]
  # use the vector to change the column names in the respective dataframe
  colnames(sname) <- gsub("^", paste0(sname, "_"), colnames(sname))
}
The data frames are created, but the column names are not changing.
I don't know where I am going wrong.
What you need to do is something like
colnames(get(sheet[i])) <- gsub("^", paste0(sname,"_"), colnames(get(sheet[i])))
But this will give an error:
target of assignment expands to non-language object
A workaround is to use a temporary variable to change the column names.
Reproducible example
temp <- mtcars[1:5,]
d <- get("temp")
colnames(d) <- sub("y", " ", colnames(d))
assign("temp", d)
Try this
for (i in 1:length(sheet)) {
  assign(sheet[i], read.xlsx("DATA.XLSX", sheetIndex = i))
  t <- get(sheet[i])
  colnames(t) <- gsub("^", paste0(sheet[i], "_"), colnames(t))
  assign(sheet[i], t)
}
I think I was looking for something like this, which does the same as above.
Try this alternative:
library(readxl)
# function to read all the sheets from an Excel workbook
read_all_sheets <- function(xlsfile) {
  sheets <- excel_sheets(xlsfile)
  setNames(lapply(sheets, function(.) {
    tbl <- read_excel(xlsfile, sheet = .)
    # prefix the column names with the sheet name and an underscore
    names(tbl) <- paste(., names(tbl), sep = "_")
    tbl
  }), sheets)
}
## create dataframes from the sheets
# first read all the sheets as a list
List_of_All_Sheets <- read_all_sheets("Location/of/the/file.xlsx")
# then create a dataframe from each list element
lapply(names(List_of_All_Sheets),
       function(nams) assign(nams, List_of_All_Sheets[[nams]],
                             envir = .GlobalEnv))
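If the only goal is to drop every sheet into the global environment as its own data frame, list2env() does the same thing as the lapply()/assign() call in one step; a minimal sketch reusing the list from above:
# Each named element of the list becomes a variable in the global environment,
# one per sheet, keeping the sheet name as the variable name.
list2env(List_of_All_Sheets, envir = .GlobalEnv)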

applying same function on multiple files in R

I am new to R and currently working on a set of financial data. I have around 10 CSV files in my working directory, and I want to analyze one of them and then apply the same commands to the rest of the CSV files.
Here are all the names of these files: ("US%10y.csv", "UK%10y.csv", "GER%10y.csv","JAP%10y.csv", "CHI%10y.csv", "SWI%10y.csv","SOA%10y.csv", "BRA%10y.csv", "CAN%10y.csv", "AUS%10y.csv")
For example, because the Date column in the CSV files is read as a factor, I need to convert it to Date format:
CAN <- read.csv("CAN%10y.csv", header = T, sep = ",")
CAN$Date <- as.character(CAN$Date)
CAN$Date <- as.Date(CAN$Date, format ="%m/%d/%y")
CAN_merge <- merge(all.dates.frame, CAN, all = T)
CAN_merge$Bid.Yield.To.Maturity <- NULL
all.dates.frame is a data frame of 731 consecutive days. I want to merge against it so that each file has the same number of rows, which later lets me combine the 10 files into a 731 x 11 master data frame.
Surely I can copy and paste this code and change the file name each time, but is there a simpler approach using apply or a for loop?
Thank you very much for your help.
This should do the trick. Leave a comment if a certain part doesn't work; I wrote this without testing.
Get a list of the files in your current directory whose names end in .csv:
L = list.files(".", ".csv")
Loop through each of the names, read in each file, perform the actions you want, return the data frame DF_Merge, and store the results in a list:
O = lapply(L, function(x) {
  DF <- read.csv(x, header = T, sep = ",")
  DF$Date <- as.character(DF$Date)
  DF$Date <- as.Date(DF$Date, format = "%m/%d/%y")
  DF_Merge <- merge(all.dates.frame, DF, all = T)
  DF_Merge$Bid.Yield.To.Maturity <- NULL
  return(DF_Merge)
})
Bind all the DF_Merge data.frames into one big data.frame
do.call(rbind, O)
I'm guessing you also need some kind of indicator of which file each row came from, so this may be useful: create an indicator column based on the first 3 characters of each file name with rep(substring(L, 1, 3), each = 731), as sketched below.
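A minimal sketch of that indicator idea, reusing L and O from the answer above and assuming every merged file contributes exactly 731 rows (one per date in all.dates.frame); the column name Country is only an example:
# Stack the merged files, then tag each block of 731 rows with the first three
# characters of the file it came from (e.g. "CAN" for "CAN%10y.csv").
combined <- do.call(rbind, O)
combined$Country <- rep(substring(L, 1, 3), each = 731)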
A dplyr solution (though untested since no reproducible example given):
library(dplyr)
file_list <- c("US%10y.csv", "UK%10y.csv", "GER%10y.csv","JAP%10y.csv", "CHI%10y.csv", "SWI%10y.csv","SOA%10y.csv", "BRA%10y.csv", "CAN%10y.csv", "AUS%10y.csv")
can_l <- lapply(file_list, read.csv)
can_l <- lapply(can_l, function(df) {
  df %>% mutate(Date = as.Date(as.character(Date), format = "%m/%d/%y"))
})
# Rows do need to match when column-binding
can_merge <- left_join(
  all.dates.frame,
  bind_cols(can_l)
)
can_merge <- can_merge %>%
  select(-Bid.Yield.To.Maturity)
One possible solution would be to read all the files into R in the form of a list, and then use lapply to apply a function to all data files. For example:
# Create a vector of the file names in the working directory
files <- list.files()
files <- files[grep("csv", files)]
# create an empty list
lst <- vector("list", length(files))
# Read the files into the list
for (i in 1:length(files)) {
  lst[[i]] <- read.csv(files[i])
}
# Apply a function to the list
l <- lapply(lst, function(x) {
  x$Date <- as.Date(as.character(x$Date), format = "%m/%d/%y")
  return(x)
})
Hope it's helpful.

Output formatting in R

I am new to R and trying to do some correlation analysis on multiple sets of data. I am able to do the analysis, but I am trying to figure out how to output the results. I'd like to have output like the following:
NAME,COR1,COR2
....,....,....
....,....,....
If I could write such a file, then I could post-process it as needed. My processing script looks like this:
run_analysis <- function(logfile, name)
{
  preds <- read.table(logfile, header = T, sep = ",")
  # do something with the data: create some_col, another_col, etc.
  result1 <- cor(some_col, another_col)
  result2 <- cor(some_col2, another_col2)
  # somehow output name, result1, result2 to a CSV file
}
args <- commandArgs(trailingOnly = TRUE)
date <- args[1]
basepath <- args[2]
logbase <- paste(basepath, date, sep="/")
logfile_pattern <- paste( "*", date, "csv", sep=".")
logfiles <- list.files(path=logbase, pattern=logfile_pattern)
for (f in logfiles) {
  name <- unlist(strsplit(f, "\\."))[1]
  logfile <- paste(logbase, f, sep = "/")
  run_analysis(logfile, name)
}
Is there an easy way to create a blank data frame and then add data to it, row by row?
Have you looked at the functions in R for writing data to files? For instance, write.csv. Perhaps something like this:
rs <- data.frame(name = name, COR1 = result1, COR2 = result2)
write.csv(rs,"path/to/file",append = TRUE,...)
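One caveat worth noting: write.csv() ignores attempts to set append (with a warning), so appending one row per run is easier with write.table() directly. A minimal sketch, with a hypothetical helper name and file path:
# Hypothetical helper: append one row of results to a CSV file, writing the
# header only when the file does not exist yet.
append_result <- function(rs, path = "results.csv") {
  write.table(rs, path, sep = ",", append = TRUE,
              col.names = !file.exists(path), row.names = FALSE)
}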
I like using the foreach library for this sort of thing:
library(foreach)
run_analysis <- function(logfile, name) {
  preds <- read.table(logfile, header = T, sep = ",")
  # do something with the data: create some_col, another_col, etc.
  result1 <- cor(some_col, another_col)
  result2 <- cor(some_col2, another_col2)
  # Return one row of results.
  data.frame(name = name, cor1 = result1, cor2 = result2)
}
args <- commandArgs(trailingOnly = TRUE)
date <- args[1]
basepath <- args[2]
logbase <- paste(basepath, date, sep="/")
logfile_pattern <- paste( "*", date, "csv", sep=".")
logfiles <- list.files(path=logbase, pattern=logfile_pattern)
## Collect results from run_analysis into a table, by rows.
dat <- foreach(f = logfiles, .combine = "rbind") %do% {
  name <- unlist(strsplit(f, "\\."))[1]
  logfile <- paste(logbase, f, sep = "/")
  run_analysis(logfile, name)
}
## Write output.
write.csv(dat, "output.dat", quote=FALSE)
What this does is to generate one row of output on each call to run_analysis, binding them into a single table called dat (the .combine="rbind" part of the call to foreach causes row binding). Then you can just use write.csv to get the output you want.
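If the per-file analysis is slow, the same loop can run in parallel by registering a backend and switching %do% to %dopar%; a minimal sketch assuming the doParallel package is installed (foreach usually exports the needed variables automatically, but .export is available if it does not):
library(doParallel)
registerDoParallel(cores = 4)  # start 4 worker processes
dat <- foreach(f = logfiles, .combine = "rbind") %dopar% {
  name <- unlist(strsplit(f, "\\."))[1]
  run_analysis(paste(logbase, f, sep = "/"), name)
}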
