I have a list of files like:
nE_pT_sbj01_e2_2.csv,
nE_pT_sbj02_e2_2.csv,
nE_pT_sbj04_e2_2.csv,
nE_pT_sbj05_e2_2.csv,
nE_pT_sbj09_e2_2.csv,
nE_pT_sbj10_e2_2.csv
As you can see, the name of the files is the same with the exception of 'sbj' (the number of the subject) which is not consecutive.
I need to run a for loop, but I would like to retain the original number of the subject. How to do this?
I assume I need to replace length(file) with something that keeps the original number of the subject, but not sure how to do it.
setwd("/path")
file = list.files(pattern="\\.csv$")
for(i in 1:length(file)){
data=read.table(file[i],header=TRUE,sep=",",row.names=NULL)
source("functionE.R")
Output = paste("e_sbj", i, "_e2.Rdata")
save.image(Output)
}
The code above gives me as output:
e_sbj1_e2.Rdata,e_sbj2_e2.Rdata,e_sbj3_e2.Rdata,
e_sbj4_e2.Rdata,e_sbj5_e2.Rdata,e_sbj6_e2.Rdata.
Instead, I would like to obtain:
e_sbj01_e2.Rdata,e_sbj02_e2.Rdata,e_sbj04_e2.Rdata,
e_sbj05_e2.Rdata,e_sbj09_e2.Rdata,e_sbj10_e2.Rdata.
Drop the extension "csv", then add "Rdata", and use filenames in the loop, for example:
myFiles <- list.files(pattern = "\\.csv$")
for(i in myFiles){
myDf <- read.csv(i)
outputFile <- paste0(tools::file_path_sans_ext(i), ".Rdata")
outputFile <- gsub("nE_pT_", "e_", outputFile, fixed = TRUE)
save(myDf, file = outputFile)
}
Note: I changed your variable names, try to avoid using function names as a variable name.
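To check the renaming step on its own, here is a minimal sketch that works on hard-coded copies of the file names from the question, so no files need to exist. The names `csvFiles` and `stem` are made up for illustration, and the extra `sub("_2$", ...)` step is an assumption about wanting to drop the trailing "_2", as the desired output in the question suggests:

```r
# File names copied from the question; hard-coded so the sketch runs anywhere.
csvFiles <- c("nE_pT_sbj01_e2_2.csv", "nE_pT_sbj04_e2_2.csv", "nE_pT_sbj10_e2_2.csv")

# Drop the ".csv" extension.
stem <- tools::file_path_sans_ext(csvFiles)
# Assumption: the trailing "_2" is not wanted in the output name.
stem <- sub("_2$", "", stem)
# Swap the "nE_pT_" prefix for "e_".
stem <- sub("nE_pT_", "e_", stem, fixed = TRUE)

# Add the new extension; the subject number is carried over untouched.
outputFiles <- paste0(stem, ".Rdata")
print(outputFiles)
```

Because the subject number is never rebuilt from a loop counter, gaps such as the missing sbj03 cannot shift the numbering.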
If you use regular expressions and sprintf (or paste0), you can do it easily without a loop:
fls <- c('nE_pT_sbj01_e2_2.csv', 'nE_pT_sbj02_e2_2.csv', 'nE_pT_sbj04_e2_2.csv', 'nE_pT_sbj05_e2_2.csv', 'nE_pT_sbj09_e2_2.csv', 'nE_pT_sbj10_e2_2.csv')
sprintf('e_%s_e2.Rdata',regmatches(fls,regexpr('sbj\\d{2}',fls)))
[1] "e_sbj01_e2.Rdata" "e_sbj02_e2.Rdata" "e_sbj04_e2.Rdata" "e_sbj05_e2.Rdata" "e_sbj09_e2.Rdata" "e_sbj10_e2.Rdata"
You can then feed the vector to a function (if it is vectorised) or apply the function to each element with sapply or lapply:
fls_new <- sprintf('e_%s_e2.Rdata',regmatches(fls,regexpr('sbj\\d{2}',fls)))
res <- lapply(fls_new,function(x) yourfunction(x))
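A variation on the same idea, sketched with a capture group in `sub()` instead of `regmatches()`. It assumes every file name contains exactly one `_sbjNN_` part; `file_map` is a hypothetical name used only to keep input and output names side by side:

```r
fls <- c("nE_pT_sbj01_e2_2.csv", "nE_pT_sbj09_e2_2.csv")

# Capture "sbj" plus its digits and rebuild the whole name around the capture.
fls_new <- sub("^.*_(sbj[0-9]+)_.*$", "e_\\1_e2.Rdata", fls)

# Named vector: names are the input files, values are the output files.
file_map <- setNames(fls_new, fls)
print(file_map)
```

Looking up `file_map[["nE_pT_sbj09_e2_2.csv"]]` then gives the output name that belongs to that input file.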
If I understood correctly, you only change extension from .csv to .Rdata, remove last "_2" and change prefix from "nE_pT" to "e". If yes, this should work:
Output = sub("_2.csv", ".Rdata", sub("nE_pT", "e", file[i]), fixed = TRUE)
Related
I am trying to do some R coding for my project. I have to read some .csv files from one directory into R and assign each one to a data frame named like df_subject1_activity1. I have tried nested loops, but it is not working.
ex:
my dir name is "Test" and i have six .csv files
subject1activity1.csv,
subject1activity2.csv,
subject1activity3.csv,
subject2activity1.csv,
subject2activity2.csv,
subject2activity3.csv
Now I want to write code to load these .csv files into R and assign the data frame names as
ex:
subject1activity1 = df_subject1_activity1
subject1activity2 = df_subject1_activity2
.... so on using for loop.
my expected output is:
df_subject1_activity1
df_subject1_activity2
df_subject1_activity3
df_subject2_activity1
df_subject2_activity2
df_subject2_activity3
I have tried the following code:
setwd(dirname(getActiveDocumentContext()$path))
new_path <- getwd()
new_path
data_files <- list.files(pattern=".csv") # Identify file names
data_files
for(i in 1:length(data_files)) {
for(j in 1:4){
assign(paste0("df_subj",i,"_activity",j),
read.csv2(paste0(new_path,"/",data_files[i]),sep=",",header=FALSE))
}
}
I am not getting the desired output.
I am new to R. Can anyone please help?
Thanks
One solution is to use the vroom package (https://www.tidyverse.org/blog/2019/05/vroom-1-0-0/), e.g.
library(tidyverse)
library(vroom)
library(fs)
files <- fs::dir_ls(glob = "subject*.csv")
data <- purrr::map(files, ~vroom::vroom(.x))
list2env(data, envir = .GlobalEnv)
# You can also combine all the dataframes if they have the same columns, e.g.
library(data.table)
concat <- data.table::rbindlist(data, fill = TRUE)
You are almost there. As always, if you are unsure, it is never a bad idea to code clearly using more lines.
data_files <- list.files(pattern=".csv", full.names=TRUE) # Identify file names
for( data_file in data_files) {
## check that the data file matches our expected pattern:
if(!grepl( "subject[0-9]activity[0-9]", basename(data_file) )) {
warning( "skipping file ", basename(data_file) )
next
}
## start creating the variable name from the filename
## remove the .csv extension
var.name <- sub( "\\.csv", "", basename(data_file), ignore.case=TRUE )
## prepend 'df' and introduce underscores:
var.name <- paste0(
"df",
gsub( "(subject|activity)", "_\\1", var.name ) ## this looks for literal 'subject' and 'activity' and, if found, adds an underscore in front of it
)
## now read the file
data.from.file <- read.csv2( data_file )
## and assign it to our variable name
assign( var.name, data.from.file )
}
I don't have your files to test with, but should the above fail, you should be able to run the code line by line and easily see where it starts to go wrong.
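The name-building step can be tried without any files. A minimal sketch on two file names taken from the question (`file_names` and `var_names` are illustrative names, not part of the answer above):

```r
file_names <- c("subject1activity1.csv", "subject2activity3.csv")

# Remove the extension, then put an underscore in front of each keyword
# and prepend "df".
var_names <- sub("\\.csv$", "", file_names, ignore.case = TRUE)
var_names <- paste0("df", gsub("(subject|activity)", "_\\1", var_names))

print(var_names)
```

If the printed names look right, the same two lines can be trusted inside the loop.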
I'm trying to call a dataframe but it's named with a number because it was originally multiple. I want to either rename the dataframes in my loop or find a way to call my dataframe even though it is titled with a number. Right now, after I run this code:
filenames <- list.files(path = "filepath",pattern = ".*txt")
head(filenames)
names <- substr(filenames,1,22)
for(i in names){
filepath <-file.path("filepath",paste(i,".txt",sep = ""))
assign(i,read.delim(filepath,colClasses = c('character','character','factor','factor'),sep = "\t"))
}
I get a lot of separate dataframes with names like '101_1b1_Al_sc_Meditron.txt'. When I try to even view the dataframe, R is confused because the name begins with a number.
Is there a good solution here?
The simplest solution is to reference the original names using backticks.
example:
`123_mtcars` <- mtcars
View(`123_mtcars`)
If you would prefer to create a naming convention or just to remove numbers from each dataframe name you could do that in your loop and use the new variable in your assign statement.
example:
filenames <- list.files(path = "filepath",pattern = ".*txt")
head(filenames)
names <- substr(filenames,1,22)
for(i in names){
filepath <-file.path("filepath",paste(i,".txt",sep = ""))
# gsub to replace all numbers with "" for the name i
dfName <- gsub("[0-9]", "", i)
assign(dfName,read.delim(filepath,colClasses = c('character','character','factor','factor'),sep = "\t"))
}
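One caveat with stripping digits: the result can still start with an underscore, which R also treats as a non-syntactic name, and two files can collapse to the same name. As a hedged alternative, base R's `make.names()` can produce syntactic, unique names. The second file name below is invented for the sketch:

```r
# First name is from the question; the second is a made-up sibling file.
raw_names <- c("101_1b1_Al_sc_Meditron", "102_1b1_Ar_sc_Meditron")

# Digit stripping leaves a leading underscore, so backticks are still needed.
stripped <- gsub("[0-9]", "", raw_names)
print(stripped)

# make.names() prefixes an "X" where needed and keeps the names distinct.
safe_names <- make.names(raw_names, unique = TRUE)
print(safe_names)
```

`safe_names` could then be passed to `assign()` in place of `dfName`.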
There are 3 solutions I can think of:
1. Keeping your code in current state.
If we don't change anything about your code and your dataframes are named as '101_1b1_Al_sc_Meditron' to view the contents of the dataframe you can use backticks. Try using it like this :
`101_1b1_Al_sc_Meditron`
2. Change the name of dataframes.
In your loop change the assign line to
assign(paste0('df_', i), read.delim(filepath,
colClasses = c('character','character','factor','factor'),sep = "\t"))
So after running the for loop you'll have data frames named like df_101_1b1_Al_sc_Meditron, which is a syntactic name, and you can access them without any problem.
3. Store data in a list.
Instead of having so many dataframes in the global environment why not store them in a list. Lists are easier to manage.
list_of_files <- lapply(file.path("filepath", filenames), function(x) read.delim(x,
colClasses = c('character','character','factor','factor'),sep = "\t"))
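A self-contained sketch of the list approach, using two small stand-in files written to a temporary folder (the folder, file names and columns are all invented for the demo). Naming the list elements after the files means names that start with a digit are no problem, because they are only ever used as strings:

```r
# Stand-in data: two tiny tab-delimited files in a temporary folder.
tmp_dir <- file.path(tempdir(), "txt_demo")
dir.create(tmp_dir, showWarnings = FALSE)
demo <- data.frame(id = c("a", "b"), value = c("1", "2"))
for (f in c("101_first.txt", "102_second.txt")) {
  write.table(demo, file.path(tmp_dir, f), sep = "\t",
              row.names = FALSE, quote = FALSE)
}

# Read every file into one list and name each element after its file.
paths <- list.files(tmp_dir, pattern = "\\.txt$", full.names = TRUE)
frames <- lapply(paths, read.delim, colClasses = "character")
names(frames) <- tools::file_path_sans_ext(basename(paths))

# Access by name; no backticks needed.
print(frames[["101_first"]])
```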
How can I read many CSV files and make each of them into data tables?
I have files of 'A1.csv' 'A2.csv' 'A3.csv'...... in Folder 'A'
So I tried this.
link <- c("C:/A")
filename<-list.files(link)
listA <- c()
for(x in filename) {
temp <- read.csv(paste0(link , x), header=FALSE)
listA <- list(unlist(listA, recursive=FALSE), temp)
}
And it doesn't work well. How can I do this job?
Write a regex to match the filenames
reg_expression <- "A[0-9]+"
files <- grep(reg_expression, list.files(directory), value = TRUE)
and then run the same loop but use assign to dynamically name the dataframes if you want
for(file in files){
assign(paste0(file, "_df"),read.csv(file))
}
But in general introducing unknown variables into the scope is bad practice so it might be best to do a loop like
dfs <- list()
for(index in seq_along(files)){
file <- files[index]
dfs[[index]] <- read.csv(file) # [[ ]] stores the whole data frame as one list element
}
Unless each file is a completely different structure (i.e., different columns ... the number of rows does not matter), you can consider a more efficient approach of reading the files in using lapply and storing them in a list. One of the benefits is that whatever you do to one frame can be immediately done to all of them very easily using lapply.
files <- list.files(link, full.names = TRUE, pattern = "csv$")
list_of_frames <- lapply(files, read.csv)
# optional
names(list_of_frames) <- files # or basename(files), if filenames are unique
Something like sapply(list_of_frames, nrow) will tell you how many rows are in each frame. If you have something more complex,
new_list_of_frames <- lapply(list_of_frames, function(x) {
# do something with 'x', a single frame
})
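To make the pattern concrete without any CSV files, here is a small sketch that uses slices of the built-in `mtcars` data as stand-ins for the frames. The element names `A1` and `A2` and the derived column `mpg_per_cyl` are made up for illustration:

```r
# Stand-ins for frames read from A1.csv and A2.csv.
list_of_frames <- list(A1 = head(mtcars, 3), A2 = head(mtcars, 5))

# One call reports the row count of every frame.
row_counts <- sapply(list_of_frames, nrow)
print(row_counts)

# Apply the same transformation to each frame and return the modified frame.
with_ratio <- lapply(list_of_frames, function(x) {
  x$mpg_per_cyl <- x$mpg / x$cyl
  x
})

# If the columns match, the frames can be stacked into one.
combined <- do.call(rbind, with_ratio)
print(dim(combined))
```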
The most immediate problem is that when pasting your file path together, you need a path separator. When composing file paths, it's best to use the function file.path, which inserts the separator between the parts for you. So you want to use:
read.csv(file.path(link , x), header=FALSE)
Better yet, just have the full path returned when listing out the files (and can filter for .csv):
filename <- list.files(link, full.names = TRUE, pattern = "csv$")
Combining with the idea to use assign to dynamically create the variables:
link <- c("C:/A")
files <-list.files(link, full.names = TRUE, pattern = "csv$")
for(file in files){
assign(paste0(basename(file), "_df"), read.csv(file))
}
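The difference between the two ways of building the path is easy to see on plain strings, with no files involved. A minimal sketch using the folder and file name from the question:

```r
link <- "C:/A"
x <- "A1.csv"

# paste0() glues the pieces together with nothing in between.
bad_path <- paste0(link, x)
print(bad_path)

# file.path() inserts the separator between the pieces.
good_path <- file.path(link, x)
print(good_path)
```

The first result points at a file called AA1.csv on the C: drive, which is why the original loop could not find anything to read.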
I am looking for an elegant way to insert a character string (name) into a directory path and create a .csv file. I found one possible solution, but I am looking for another that works by "inserting" text between specific characters rather than "replacing".
#lets start
df <-data.frame()
name <- c("John Johnson")
dir <- c("C:/Users/uzytkownik/Desktop/.csv")
#how to insert "name" vector between "Desktop/" and "." to get:
dir <- c("C:/Users/uzytkownik/Desktop/John Johnson.csv")
write.csv(df, file=dir)
#???
#I found the answer but it is not very elegant in my opinion
library(qdapRegex)
dir2 <- c("C:/Users/uzytkownik/Desktop/ab.csv")
dir2<-rm_between(dir2,'a','b', replacement = name)
> dir2
[1] "C:/Users/uzytkownik/Desktop/John Johnson.csv"
write.csv(df, file=dir2)
I like sprintf syntax for "fill-in-the-blank" style string construction:
name <- c("John Johnson")
sprintf("C:/Users/uzytkownik/Desktop/%s.csv", name)
# [1] "C:/Users/uzytkownik/Desktop/John Johnson.csv"
Another option, if you can't put the %s in the directory string, is to use sub. This is replacing, but it replaces .csv with <name>.csv.
dir <- c("C:/Users/uzytkownik/Desktop/.csv")
sub(".csv", paste0(name, ".csv"), dir, fixed = TRUE)
# [1] "C:/Users/uzytkownik/Desktop/John Johnson.csv"
This should get you what you need.
dir <- "C:/Users/uzytkownik/Desktop/.csv"
name <- "joe depp"
dirsplit <- strsplit(dir,"\\/\\.")
paste0(dirsplit[[1]][1],"/",name,".",dirsplit[[1]][2])
[1] "C:/Users/uzytkownik/Desktop/joe depp.csv"
I find that paste0() is the way to go, so long as you store your directory and extension separately:
path <- "some/path/"
file <- "file"
ext <- ".csv"
write.csv(myobj, file = paste0(path, file, ext))
For those unfamiliar, paste0() is shorthand for paste( , sep="").
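A short sketch of how the pieces combine, assuming the directory is stored without a trailing slash (the values are placeholders). It contrasts `paste()` with its default separator, `paste0()`, and `file.path()`:

```r
path <- "some/path"
file <- "file"
ext <- ".csv"

# paste() separates the pieces with a space by default.
with_spaces <- paste(path, file, ext)

# paste0() needs the slash spelled out when the path has no trailing one.
with_paste0 <- paste0(path, "/", file, ext)

# file.path() supplies the slash itself.
with_file_path <- file.path(path, paste0(file, ext))

print(c(with_spaces, with_paste0, with_file_path))
```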
Let's suppose you have a vector with the desired names for some data structures you want to save, for instance:
names <- c("file_1", "file_2", "file_3")
Now, you want to build the path where each file will be saved by adding the name plus the extension:
path <- "/Users/Documents/Test_Folder/"
extension <- ".csv"
A simple way to achieve this is to use paste0() to create the full path as input for write.csv() inside an lapply(), as follows (paste() would insert spaces between the pieces unless you set sep = ""):
lapply(names, function(x) {
write.csv(x = data,
file = paste0(path, x, extension))
}
)
The good thing about this approach is that you can iterate over the vector that contains the names of your files, and the final path is updated automatically. One possible extension is to define a vector of extensions and update the path accordingly.
I am trying to analyze 10 sets of data, for which I have to import the data, remove some values and plot histograms. I could do it individually but can naturally save a lot of time with a for loop. I know this code is not correct, but I have no idea of how to specify the name for the input files and how to name each iterated variable in R.
par(mfrow = c(10,1))
for (i in 1:10)
{
freqi <- read.delim("freqspeci.frq", sep="\t", row.names=NULL)
freqveci <- freqi$N_CHR
freqveci <- freqveci[freqveci != 0 & freqveci != 1]
hist(freqveci)
}
What I want to do is to have the counter number in every "i" in my code. Am I just approaching this the wrong way in R? I have read about the assign and paste functions, but honestly do not understand how I can apply them properly in this particular problem.
You can do it in several ways:
Use list.files() to get all files in a given directory. Its pattern argument accepts a regular expression as well.
If the names are consecutive, then you can use
for (i in 1:10)
{
filename <- sprintf("freqspeci.frq_%s",i)
freqi <- read.delim(filename, sep="\t", row.names=NULL)
freqveci <- freqi$N_CHR
freqveci <- freqveci[freqveci != 0 & freqveci != 1]
hist(freqveci)
}
You can also use paste() to create the file names.
paste("filename", 1:10, sep='_')
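Both `sprintf()` and `paste()` are vectorised, so all file names can be built in one call. The sketch below assumes the files are called freqspec1.frq to freqspec10.frq, which is a guess at what the question means by "freqspeci.frq"; the zero-padded variant is shown in case the real names use two digits:

```r
# Assumed naming scheme: the counter replaces the "i" in "freqspeci.frq".
file_names <- sprintf("freqspec%d.frq", 1:10)

# Same counter padded to two digits.
padded_names <- sprintf("freqspec%02d.frq", 1:10)

print(head(file_names, 3))
print(head(padded_names, 3))
```

Inside the loop, `file_names[i]` then replaces the hard-coded string.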
You could just save all your data files into an otherwise empty folder. Then get the file names like this:
filenames <- dir()
for (i in 1:length(filenames)){
freqi <- read.delim(filenames[i], sep="\t", row.names=NULL)
# and here whatever else you want to do with these files
}
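Putting the pieces together, here is a self-contained sketch that writes three tiny stand-in files to a temporary folder and then processes them in one pass. The folder, the file names and the values in `N_CHR` are invented for the demo; only the column name and the filtering rule come from the question:

```r
# Stand-in data: three small tab-delimited files with an N_CHR column.
tmp_dir <- file.path(tempdir(), "frq_demo")
dir.create(tmp_dir, showWarnings = FALSE)
for (i in 1:3) {
  demo <- data.frame(N_CHR = c(0, 0.25, 0.5, 1))
  write.table(demo, file.path(tmp_dir, sprintf("freqspec%d.frq", i)),
              sep = "\t", row.names = FALSE, quote = FALSE)
}

# Read each file and drop the 0 and 1 values, keeping one vector per file.
filenames <- list.files(tmp_dir, pattern = "\\.frq$", full.names = TRUE)
freqvecs <- lapply(filenames, function(f) {
  freq <- read.delim(f, sep = "\t")
  v <- freq$N_CHR
  v[v != 0 & v != 1]
})
names(freqvecs) <- basename(filenames)

# Compute one histogram object per file without drawing anything yet.
hists <- lapply(freqvecs, hist, plot = FALSE)
print(lengths(freqvecs))
```

Each element of `hists` can be drawn later with `plot()`, for example after setting `par(mfrow = c(3, 1))`.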