I am reading each worksheet of an Excel file named "REL", up to worksheet 4, using the repeat loop given below. After reading the worksheet for each value of i, I want to save it to my working directory before reading the worksheet for i + 1.
i <- 1
repeat {
fcr <- read.xlsx("REL.xlsx", sheet = i, colNames = TRUE)
i <- i + 1
print(i)
if (i > 4) {
break
}
}
In the future, please indicate which packages you are using when you reference non-base functions; presumably this is read.xlsx from the xlsx package (although the sheet and colNames arguments in your call match read.xlsx from openxlsx). To save each worksheet as a CSV, call write.csv(...) after reading the worksheet and before the loop begins its next iteration. There is no need for repeat and a manual counter, though. Use something more idiomatic to R, such as sapply:
library(xlsx)
##
list.files()
#[1] "REL.xlsx"
##
sapply(1:4, function(i) {
  write.csv(
    read.xlsx("REL.xlsx", sheetIndex = i, header = TRUE),
    file = sprintf("WS%d.csv", i)
  )
})
##
list.files()
#[1] "REL.xlsx" "WS1.csv" "WS2.csv" "WS3.csv" "WS4.csv"
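The read-then-save pattern can be tried without an Excel file. This is a minimal sketch, assuming a hypothetical list called sheets that stands in for the four worksheets and a scratch folder under tempdir() in place of the working directory; neither comes from the original post.

```r
# Stand-in for the four worksheets (made-up data, not from REL.xlsx)
sheets <- lapply(1:4, function(i) data.frame(id = 1:3, value = i * (1:3)))

# Scratch output folder (assumption: used instead of the working directory)
out_dir <- file.path(tempdir(), "ws_demo")
dir.create(out_dir, showWarnings = FALSE)

# Each iteration handles one sheet and writes it out before moving on
written <- vapply(seq_along(sheets), function(i) {
  out_file <- file.path(out_dir, sprintf("WS%d.csv", i))
  write.csv(sheets[[i]], file = out_file, row.names = FALSE)
  out_file
}, character(1))

basename(written)
```

Returning the file path from each iteration gives a useful result; write.csv itself returns NULL, so the sapply call above prints a list of four NULLs.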
Related
I am trying to do some R coding for my project. I have to read some .csv files from one directory into R and assign each one to a data frame named like df_subject1_activity1. I have tried nested loops, but it is not working.
For example:
My directory is named "Test" and I have six .csv files:
subject1activity1.csv,
subject1activity2.csv,
subject1activity3.csv,
subject2activity1.csv,
subject2activity2.csv,
subject2activity3.csv
Now I want to write code that loads these .csv files into R and assigns the data frame names like this:
For example:
subject1activity1 = df_subject1_activity1
subject1activity2 = df_subject1_activity2
... and so on, using a for loop.
My expected output is:
df_subject1_activity1
df_subject1_activity2
df_subject1_activity3
df_subject2_activity1
df_subject2_activity2
df_subject2_activity3
I have tried the following code:
setwd(dirname(getActiveDocumentContext()$path))
new_path <- getwd()
new_path
data_files <- list.files(pattern=".csv") # Identify file names
data_files
for(i in 1:length(data_files)) {
for(j in 1:4){
assign(paste0("df_subj",i,"_activity",j)
read.csv2(paste0(new_path,"/",data_files[i]),sep=",",header=FALSE))
}
}
I am not getting the desired output.
I am new to R; can anyone please help?
Thanks
One solution is to use the vroom package (https://www.tidyverse.org/blog/2019/05/vroom-1-0-0/), e.g.
library(tidyverse)
library(vroom)
library(fs)
files <- fs::dir_ls(glob = "subject*.csv") # file names look like subject1activity1.csv, so no underscore after "subject"
data <- purrr::map(files, ~vroom::vroom(.x))
list2env(data, envir = .GlobalEnv)
# You can also combine all the dataframes if they have the same columns, e.g.
library(data.table)
concat <- data.table::rbindlist(data, fill = TRUE)
You are almost there. As always, if you are unsure, it is never a bad idea to code clearly using more lines.
data_files <- list.files(pattern = "\\.csv$", full.names = TRUE) # Identify file names
for (data_file in data_files) {
  ## check that the data file matches our expected pattern:
  if (!grepl("subject[0-9]activity[0-9]", basename(data_file))) {
    warning("skipping file ", basename(data_file))
    next
  }
  ## start creating the variable name from the filename
  ## remove the .csv extension
  var.name <- sub("\\.csv$", "", basename(data_file), ignore.case = TRUE)
  ## prepend 'df' and introduce underscores:
  ## gsub looks for literal 'subject' and 'activity' and, if found, adds an underscore in front of it
  var.name <- paste0("df", gsub("(subject|activity)", "_\\1", var.name))
  ## now read the file (comma-separated, no header row, as in the question)
  data.from.file <- read.csv2(data_file, sep = ",", header = FALSE)
  ## and assign it to our variable name
  assign(var.name, data.from.file)
}
I don't have your files to test with, but should the above fail, you should be able to run the code line by line and easily see where it starts to go wrong.
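The name-building step can be checked on its own, without any files on disk. This is a small sketch using made-up file names (notes.csv is a hypothetical non-matching file added for illustration):

```r
file_names <- c("subject1activity1.csv", "subject2activity3.csv", "notes.csv")

# Keep only names that follow the subject<N>activity<N>.csv pattern
matching <- file_names[grepl("^subject[0-9]+activity[0-9]+\\.csv$", file_names)]

# Drop the extension, then put an underscore in front of each keyword
stem <- sub("\\.csv$", "", matching, ignore.case = TRUE)
var_names <- paste0("df", gsub("(subject|activity)", "_\\1", stem))

var_names
```

Because sub, gsub and paste0 are vectorised, all names are produced in one pass, which makes it easy to inspect them before any assign call is made.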
I'm processing some .xlsx files; they are named like time1_drug1, time1_drug2, up to time6_drug5 (30 files in total). I want to load these .xlsx files into R and give the datasets names such as t1d1, t2d2.
I tried to use sprintf, but I cannot figure out how to make it valid.
for(i in 1:6) {
for(j in 1:5) {
sprintf("time%i","drug%j,i,j)=read.xlsx("/Users/pathway/dataset/time_sprintf(%i,i)_drug(%j,j).xlsx", 1)}
names(sprintf("t%i","d%j,i,j))=c("result", "testF","TestN")
sprintf("t%i","d%j,i,j)$Discription[which(sprintf("t%i","d%j,i,j)$testF>=1&sprintf("t%i","d%j,i,j)$TestN>=2)]="High+High"
}
}
I expect to get 30 datasets, named t1d1 through t6d5.
You should (almost) never use assign. When reading multiple files into R you should (almost) always put them in a named list.
A rough outline of a much better approach is this:
# Put all the excel files in a directory and this retrieves all their paths
f <- dir("/Users/pathway/dataset/",full.names = TRUE)
# Read all files into a list
drug_time <- lapply(X = f,FUN = read.xlsx)
# Name each list element based on the file name
names(drug_time) <- gsub(pattern = ".xlsx",replacement = "",x = basename(f),fixed = TRUE)
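Here is a runnable sketch of the named-list approach. It assumes small CSV files in a scratch folder as stand-ins for the .xlsx files, so read.csv is used where the outline uses read.xlsx. The data values are made up, and the column is spelled Description rather than Discription.

```r
# Create three small CSV files in a scratch folder (stand-ins for the .xlsx files)
data_dir <- file.path(tempdir(), "drug_demo")
dir.create(data_dir, showWarnings = FALSE)
for (nm in c("time1_drug1", "time1_drug2", "time2_drug1")) {
  write.csv(data.frame(result = 1:2, testF = c(0, 1), TestN = c(2, 3)),
            file.path(data_dir, paste0(nm, ".csv")), row.names = FALSE)
}

# Read every file into one list and name the elements after the files
f <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
drug_time <- lapply(f, read.csv)
names(drug_time) <- sub("\\.csv$", "", basename(f))

# Apply the same processing step to every data set in the list
drug_time <- lapply(drug_time, function(d) {
  d$Description <- ifelse(d$testF >= 1 & d$TestN >= 2, "High+High", NA_character_)
  d
})

names(drug_time)
drug_time[["time1_drug2"]]$Description
```

Keeping the data sets in one list means a step such as the High+High labelling is written once and applied to every element, rather than repeated for 30 separately named variables.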
You can keep the for loops as you have them, but you need the assign function to create each variable from a string:
for (i in 1:6) {
  for (j in 1:5) {
    assign(paste0("t", i, "d", j), read.xlsx(paste0("/Users/pathway/dataset/time", i, "_drug", j, ".xlsx"), 1))
  }
}
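The sprintf attempt in the question can be made valid by keeping the whole template in one format string and passing the values as separate arguments; %d (or %i) is the integer placeholder, while %j is not a conversion at all. This sketch only builds the names and paths and reads nothing. The file name pattern time<i>_drug<j>.xlsx is an assumption taken from the description in the question.

```r
# All 30 combinations of time (1-6) and drug (1-5); drug varies fastest
grid <- expand.grid(drug = 1:5, time = 1:6)

# sprintf is vectorised: each %d is filled from the matching argument
obj_names <- sprintf("t%dd%d", grid$time, grid$drug)
paths <- file.path("/Users/pathway/dataset",
                   sprintf("time%d_drug%d.xlsx", grid$time, grid$drug))

head(obj_names, 3)
length(paths)
```

The two vectors line up element by element, so they can be fed to a single loop, or to lapply followed by names(), instead of two nested loops.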
My script reads in a list of text files from a folder. A calculation for all values in a few columns in each text file is made.
At the end I want to write the resulting data.frame into a new text file in a different location.
The problem is that the script keeps overwriting the file it created before, so I end up with only one file (the last one that was read in).
But I don't get what I am doing wrong here. The output file name is different each time, so in my head it should produce separate files.
The script looks as follows:
RAW <- "C:/path/tofiles"
files <- list.files(RAW, full.names = TRUE)
for(j in length(files)) {
if(file.exists(files[[j]])){
data <- read.csv(files[[j]], skip = 0, header=FALSE)
data[9] <- do.call(cbind,lapply(data[9], function(x){(data[9]*0.01701)/0.00848}))
data[11] <- do.call(cbind,lapply(data[11], function(x){(data[11]*0.01834)/0.00848}))
data[13] <- do.call(cbind,lapply(data[13], function(x){(data[13]*0.00982)/0.00848}))
data[15] <- do.call(cbind,lapply(data[15], function(x){(data[15]*0.01011)/0.00848}))
OUT <- paste("C:/path/to/destination_folder",basename(files[[j]]),sep="")
write.table(data, OUT, sep=",", row.names = FALSE, col.names = FALSE, append = FALSE)
}
}
The problem is in your for loop. length(files) returns just one value, namely the length of your files vector, so the loop body runs only once, for the last file. I think you want a sequence of that length.
Try for(j in seq_along(files)), or just for(j in files) and then use j in place of files[[j]].
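The difference is easy to see in a scratch folder. This sketch assumes temporary input and output directories and made-up two-column files in place of the real paths and data. It also builds the output path with file.path, because paste(..., sep = "") in the question joins the folder and file name with no separator between them.

```r
# Scratch input and output folders (stand-ins for the real paths)
raw_dir <- file.path(tempdir(), "raw_demo")
out_dir <- file.path(tempdir(), "out_demo")
dir.create(raw_dir, showWarnings = FALSE)
dir.create(out_dir, showWarnings = FALSE)
for (k in 1:3) {
  write.table(data.frame(a = 1:2, b = c(10, 20)),
              file.path(raw_dir, sprintf("file%d.txt", k)),
              sep = ",", row.names = FALSE, col.names = FALSE)
}

files <- list.files(raw_dir, full.names = TRUE)

length(files)      # a single number, so for(j in length(files)) runs once
seq_along(files)   # one index per file

for (j in seq_along(files)) {
  data <- read.csv(files[[j]], header = FALSE)
  data[[2]] <- data[[2]] * 0.01701 / 0.00848   # rescale one column directly
  out <- file.path(out_dir, basename(files[[j]]))
  write.table(data, out, sep = ",", row.names = FALSE, col.names = FALSE)
}

list.files(out_dir)
```

Arithmetic on a data frame column is already vectorised, so the do.call(cbind, lapply(...)) wrapper from the question is not needed for the rescaling.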
I'm not a very experienced R user. I need to loop through a folder of csv files and apply a function to each one. Then I would like to take the value I get for each one and have R dump them into a new column called "stratindex", which will be in one new csv file.
Here's the function applied to a single file
ctd=read.csv(file.choose(), header=T)
stratindex=function(x){
x=ctd$Density..sigma.t..kg.m.3..
(x[30]-x[1])/29
}
Then I can spit out one value with
stratindex(Density..sigma.t..kg.m.3..)
I tried formatting another file loop someone made on this board. That link is here:
Looping through files in R
Here's my go at putting it together
out.file <- 'strat.csv'
for (i in list.files()) {
tmp.file <- read.table(i, header=TRUE)
tmp.strat <- function(x)
x=tmp.file(Density..sigma.t..kg.m.3..)
(x[30]-x[1])/29
write(paste0(i, "," tmp.strat), out.file, append=TRUE)
}
What have I done wrong/what is a better approach?
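It's easier if you read the file inside the function:
stratindex <- function(file){
  ctd <- read.csv(file)
  x <- ctd$Density..sigma.t..kg.m.3..
  (x[30] - x[1]) / 29
}
Then apply the function to a vector of file names and write the result to a file:
the.files <- list.files(pattern = "\\.csv$")
index <- sapply(the.files, stratindex)
output <- data.frame(File = the.files, StratIndex = index)
# without a file argument write.csv only prints to the console;
# write strat.csv outside this folder if you plan to re-run the script
write.csv(output, "strat.csv", row.names = FALSE)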
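The same idea can be tested end to end on made-up data. This sketch assumes two hypothetical files, cast1.csv and cast2.csv, each with a single 30-row column named density; the real column name is Density..sigma.t..kg.m.3.. as in the question.

```r
# Scratch folder with two small CSV files (made-up density profiles)
ctd_dir <- file.path(tempdir(), "ctd_demo")
dir.create(ctd_dir, showWarnings = FALSE)
write.csv(data.frame(density = seq(20, 49, by = 1)),
          file.path(ctd_dir, "cast1.csv"), row.names = FALSE)
write.csv(data.frame(density = seq(10, 68, by = 2)),
          file.path(ctd_dir, "cast2.csv"), row.names = FALSE)

# Read one file and return (30th value - 1st value) / 29
strat_index <- function(file, column = "density") {
  x <- read.csv(file)[[column]]
  (x[30] - x[1]) / 29
}

the_files <- list.files(ctd_dir, pattern = "\\.csv$", full.names = TRUE)
# vapply checks that every file yields exactly one number
index <- vapply(the_files, strat_index, numeric(1))

output <- data.frame(File = basename(the_files), StratIndex = unname(index))
output
```

vapply is used here in place of sapply so that a file with fewer than 30 rows or a missing column fails loudly instead of silently changing the shape of the result.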
Using (openxlsx) package to write xlsx files.
I have a variable that is a vector of numbers
x <- 1:8
I then paste ".xlsx" to the end of each element of x to later create an xlsx file
new_x <- paste(x,".xlsx", sep = "")
I then call write.xlsx from the openxlsx package in a for loop to create new xlsx files
for (i in x) {
for (j in new_x) {
write.xlsx(i,j)
}}
When I open ("1.xlsx" - "8.xlsx"), all the files only have the number "8" on them. What I don't understand is why it doesn't have the number 1 for 1.xlsx - 7 for 7.xlsx, why does the 8th one overwrite everything else.
I even tried creating a new output for the dataframes as most others suggested
for (i in x) {
for (j in new_x) {
output[[i]] <- i
write.xlsx(output[[i]],j)
}}
And it still comes up with the same problem. I don't understand what is going wrong.
The problem is that you are creating each Excel file multiple times because you have nested loops. Try just using a single loop, and referring to an element of new_x.
library(openxlsx)
x <- 1:8
new_x <- paste(x, ".xlsx", sep = "")
for (i in seq_along(x)) {
  write.xlsx(x[i], new_x[i]) # pair each value with its own file name
}
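To see why every file ended up containing 8, the writes can be simulated with a plain list in place of the files. This is an illustrative sketch only; the lists nested and single and the counter writes are hypothetical names, and no Excel files are created.

```r
x <- 1:8
new_x <- paste0(x, ".xlsx")

# Nested loops: record what would be written to each file name
nested <- list()
writes <- 0
for (i in x) {
  for (j in new_x) {
    nested[[j]] <- i        # a later write replaces the earlier one
    writes <- writes + 1
  }
}

# Single loop: each value is paired with its own file name
single <- list()
for (k in seq_along(x)) {
  single[[new_x[k]]] <- x[k]
}

writes
unlist(nested)
unlist(single)
```

The nested version performs 8 x 8 = 64 writes, and the last pass (i = 8) overwrites all eight files, whereas the single loop performs exactly one write per file.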
If you want to read a number of .csv files and save them as .xlsx files, the approach is similar; you still want only a single for loop, such as:
# Define the directory to search for csv files and the directory to save Excel files in
csvDirectory <- "C:/Foo/Bar"
ExcelDirectory <- file.path(Sys.getenv("USERPROFILE"), "Desktop")
# Find all the csv files of interest (pattern is a regular expression, not a glob)
csvFiles <- list.files(csvDirectory, pattern = "\\.csv$")
# Go through the list of files and for each one read it into R, and then save it as Excel
for (i in seq_along(csvFiles)) {
  csvFile <- read.csv(file.path(csvDirectory, csvFiles[i]))
  write.xlsx(csvFile, file.path(ExcelDirectory, sub("\\.csv$", ".xlsx", csvFiles[i])))
}