I am using a for loop to read in multiple csv files and naming the datasets import1, import2, etc. For example:
assign(paste("import",i,sep=""), read.csv(files[i], header=FALSE))
However, I now want to rename the variables in each dataset. I have tried the following:
names(as.name(paste("import",i,sep=""))) <- c("xxxx", "yyyy")
But I get the error "target of assignment expands to non-language object". (I need to rename the variables inside the loop because the variable names must be different in each dataset.)
Any suggestions on how to do this would be much appreciated.
Thanks.
While I do agree it would be much better to keep your data.frames in a list rather than creating a bunch of variables in your global environment, you can also set the names when you read the files in:
assign(paste("import",i,sep=""),
read.csv(files[i], header=FALSE, col.names=c("xxxx", "yyyy")))
Using assign() isn't very "R-like".
A better approach would be to read the files into a list of data.frames, instead of one data.frame object per file. Assuming files is the vector of file names (as you imply above):
import <- lapply(files, read.csv, header=FALSE)
Then if you want to operate on each data.frame in the list using a loop, you easily can:
for (i in seq_along(import)) names(import[[i]]) <- c('xxx', 'yyy')
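Since the column names need to differ from file to file, one way to extend the list approach is to keep the names in a parallel list and apply them with Map(). This is only a minimal sketch: the temporary files, tmp_dir, and the column names in new_names are made up for illustration.

```r
# Hypothetical example data: two small CSV files without header rows
tmp_dir <- tempfile("imports")
dir.create(tmp_dir)
write.table(data.frame(1:3, 4:6), file.path(tmp_dir, "a.csv"),
            sep = ",", row.names = FALSE, col.names = FALSE)
write.table(data.frame(7:8, 9:10), file.path(tmp_dir, "b.csv"),
            sep = ",", row.names = FALSE, col.names = FALSE)

files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file into one list of data frames
import <- lapply(files, read.csv, header = FALSE)

# One character vector of column names per file (made-up names)
new_names <- list(c("id", "height"), c("id", "weight"))

# Pair each data frame with its own set of names
import <- Map(setNames, import, new_names)
```

Each element is then available as import[[1]], import[[2]], and so on, so no assign() or pasted object names are needed.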
I am writing a generic procedure and I don't understand how to handle names of objects that are unknown. In this case I am loading all *.Rda files in a directory and doing rbind to make a data frame. The names and number of Rda files can vary. My question is how best to handle this situation?
library(data.table)
# Load all data frames in wd
my_files <- list.files(pattern='*.Rda',full.names = TRUE)
# Names of files without .Rda suffix
my_files_names <- gsub(".Rda$","",list.files(pattern='*.Rda'))
# load each data frame, creates objects with names in my_files_names
for(i in 1:length(my_files)){
load(my_files[i])
}
# make large data frame from all loaded data frames
combined_df <- rbindlist(my_files_names)
I am getting the error
Input is character but should be a plain list of items to be stacked
combined_df <- rbindlist(as.list(my_files_names)) doesn't work.
The example works using rbind with each object as an argument, but for some reason a character vector can't be used to refer to objects with names not known at run-time. What am I missing?
The solution was a two-liner:
library(dplyr)
my_files <- list.files(pattern='\\.rds$',full.names = TRUE)
combined_df <- bind_rows(lapply(my_files, readRDS))
First, the names of the objects were not important, so I could use this different approach. Second, the use of .Rda files was causing problems. This file type can contain more than one object. Although my files only had a single data frame per file, the code above would not run with load as the function passed to lapply. I converted my files to .rds files, which store exactly one object per file, and the code ran fine.
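For reference, load() does not work directly inside lapply() because it returns the names of the restored objects as a character vector, not the objects themselves. If converting the files is not an option, one workaround is to load each file into a throwaway environment and pull the single object out of it. The following is a sketch that assumes each file holds exactly one data frame; load_single and the example files are hypothetical, and base rbind is used only to keep the example free of package dependencies.

```r
# Hypothetical helper: return the single object stored in an .Rda file
load_single <- function(path) {
  env <- new.env()
  obj_names <- load(path, envir = env)  # load() returns the object names
  stopifnot(length(obj_names) == 1)     # assumption: one object per file
  get(obj_names, envir = env)
}

# Made-up example data: two .Rda files whose object names differ
tmp_dir <- tempfile("rda")
dir.create(tmp_dir)
first_df <- data.frame(x = 1:2, y = c("a", "b"))
second_df <- data.frame(x = 3:5, y = c("c", "d", "e"))
save(first_df, file = file.path(tmp_dir, "first.Rda"))
save(second_df, file = file.path(tmp_dir, "second.Rda"))

my_files <- list.files(tmp_dir, pattern = "\\.Rda$", full.names = TRUE)

# Stack all data frames without ever referring to their object names
combined_df <- do.call(rbind, lapply(my_files, load_single))
```

The list returned by lapply(my_files, load_single) could equally be handed to rbindlist or bind_rows.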
I have written a program in R that takes all of the .csv files in a folder and imports them as data frames with the naming convention "main1," "main2," "main3" and so on for each data frame. The number of files in the folder may vary, so I was hoping the convention would make it easier to join the files later by being able to paste together the number of records. I successfully coded a way to find the folder and identify all of the files, as well as the total number of files.
agencyloc <- dirname(file.choose())
setwd(agencyloc)
listagencyfiles <- list.files(pattern = "*.csv")
numagencies <- 1:length(listagencyfiles)
I then created the individual dataframes without issue. I am not including this because it is long and does not relate to my problem. The problem is when I try to rbind these dataframes into one large dataframe, it says "Input to rbindlist must be a list of data.tables." Since there will be varying numbers of files, I can't just hard code this in, it has to be something similar to this. I tried the following, but it creates a list of strings and not a list of objects:
allfiles <- paste0("main", 1:length(numagencies))
However, this outputs a character vector of names that can't be used to bind the files. Is there a way to convert the character strings into the objects they name, so that this will work when executed:
finaltable <- rbindlist(allfiles)
What I am looking for would almost be the opposite of as.character(objectname) if that makes any sense. I need to go from character to object instead of object to character.
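The "character to object" conversion asked about here exists in base R: get() looks up one object by name, and mget() returns a named list of several. Below is a minimal sketch in which made-up data frames stand in for main1, main2, and so on. Base rbind is used so the example has no dependencies; the list produced by mget() could be passed to rbindlist in the same way.

```r
# Made-up stand-ins for the imported data frames
main1 <- data.frame(agency = "A", records = 10)
main2 <- data.frame(agency = "B", records = 20)
main3 <- data.frame(agency = "C", records = 30)

allfiles <- paste0("main", 1:3)   # character vector of object names

# mget() turns the names into a named list of the objects themselves
tables <- mget(allfiles)

# Bind the list into one data frame
finaltable <- do.call(rbind, tables)
```

That said, reading the files straight into a list avoids the need to look objects up by name at all.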
I am all too inexperienced in programming generally and R specifically so please forgive me if what I have is bad coding.
The problem I am trying to solve is to load many separate csv files into R, tidy up the input a bit, perform a few operations on the resulting objects and eventually plot the results of those operations. The way I have tried to solve it is to use a vector of strings which echoes the object names to call the objects in question. This does not work.
Below is a bit of code which after loading the data does not work.
files=list.files('foldername',pattern="*.csv",full.names=F) #Make a list of files
filen=str_extract(files, '.*(?=\\.csv)') #Pretty the file names for object names
for (i in 1:length(files)){
assign(paste(filen[i]),read.csv(paste(files[i]))) #Load the files
as.object(filen[i])=as.object(filen[i])[,order(names(ATCN_21))] # pseudocode line
as.object(filen[i])=operation(as.object(filen[i]),parameter 1, parameter 2, etc) #More pseudocode
}
where operation may be a plot command or an arbitrary function such as rbind, colnames, whatever you may fancy.
In other words: I need some way to use string i in vector filen exactly as if it were an object name. How can I do this?
The solution: Lists. (Thank you, Pierre)
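library(stringr) #Needed for str_extract
files=list.files('foldername',pattern="\\.csv$",full.names=TRUE) #Make a list of files, including the folder path
filen=str_extract(basename(files), '.*(?=\\.csv)') #Pretty the file names for object names
lst=list() #Empty list to hold the data frames
for (i in seq_along(files)){
lst[[i]]=read.csv(files[i]) #Load the files
names(lst)[i]<-filen[i] #Name the entries
lst[[i]]=lst[[i]][,order(names(lst[[i]])),drop=FALSE] #Sort the columns by name
lst[[i]]=operation(lst[[i]]) #operation() is a placeholder for your own function
}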
Thank you for helping a clueless n00b.
I know that this question has been asked previously, but answers to the previous posts cannot seem to solve my problem.
I have dozens of tab-delimited .txt files. Each file has two columns ("pos", "score"). I would like to compile all of the "score" columns into one file with multiple columns. The number of rows in each file varies and they are irrelevant for the compilation.
If someone could direct me on how to accomplish this, preferably in R, it would be very helpful.
Alternatively, my ultimate goal is to read the median and mean of the "score" column from each file. So if this could be accomplished, with or without compiling the files, it would be even more helpful.
Thanks.
UPDATE:
As appealing as the idea of personal code ninjas is, I understand this will have to remain a fantasy. Sorry for not being explicit.
I have tried lapply and Reduce, e.g.,
> files <- dir(pattern="X.*\\.txt$")
> File_list <- lapply(filesToProcess,function(score)
+ read.table(score,header=TRUE,row.names=1))
> File_list <- lapply(files,function(z) z[c("pos","score")])
> out_file <- Reduce(function(x,y) {merge(x,y,by=c("pos"))},File_list)
which I know doesn't really make sense, considering I have variable row numbers. I have also tried plyr
> files <- list.files()
> out_list <- llply(files,read.table)
As well as cbind and rbind. Usually I get an error message, because the row numbers don't match up or I just get all the "score" data compiled into one column.
The advice on similar posts (e.g., Merging multiple csv files in R, Simultaneously merge multiple data.frames in a list, and Merge multiple files in a list with different number of rows) has not been helpful.
I hope this clears things up.
This problem could be solved in two steps:
Step 1. Read the data from your files into a list of data frames, where files is a vector of file names. The example uses read.csv; for tab-delimited files such as yours, substitute read.delim. If you need to pass extra arguments to the reading function, add them as shown below. See ?lapply for details.
list_of_dataframes <- lapply(files, read.csv, stringsAsFactors = FALSE)
Step 2. Calculate means for each data frame:
means <- sapply(list_of_dataframes, function(df) mean(df$score))
Of course, you can always do it in one step like this:
means <- sapply(files, function(filename) mean(read.csv(filename)$score))
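Because the question describes tab-delimited files and also asks for the median, the same pattern can be adapted as follows. This is a sketch under assumptions: the two example files and their values are made up, and read.delim() is used because it defaults to tab-separated input with a header row.

```r
# Made-up example data: two tab-delimited files with different row counts
tmp_dir <- tempfile("scores")
dir.create(tmp_dir)
write.table(data.frame(pos = 1:3, score = c(2, 4, 9)),
            file.path(tmp_dir, "X1.txt"), sep = "\t",
            row.names = FALSE, quote = FALSE)
write.table(data.frame(pos = 1:4, score = c(1, 3, 5, 7)),
            file.path(tmp_dir, "X2.txt"), sep = "\t",
            row.names = FALSE, quote = FALSE)

files <- list.files(tmp_dir, pattern = "\\.txt$", full.names = TRUE)

# One row of summary statistics per file; row counts need not match
score_summary <- do.call(rbind, lapply(files, function(f) {
  scores <- read.delim(f)$score
  data.frame(file = basename(f),
             mean = mean(scores),
             median = median(scores))
}))
```

Summarising each file separately sidesteps the merge problem entirely, since the "score" columns never have to be lined up row by row.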
I think you want something like this:
all_data = do.call(rbind, lapply(files,
function(f) {
cbind(read.csv(f), file_name=f)
}))
You can then do whatever "by" type of action you like. Also, don't forget to adjust the various read.csv options to suit your needs.
E.g. once you have the above, you can do the following (and much more):
library(data.table)
dt = data.table(all_data)
dt[, list(mean(score), median(score)), by = file_name]
A small note: you could also read the files with data.table's fread instead of read.table and its derivatives, which would be much faster, and, while we're at it, use rbindlist instead of do.call(rbind, ...).
I have to load in many files and transform their data. Each file contains only one data.table; however, the tables have various names.
I would like to run a single script over all of the files. To do so, I must assign the unknown data.table to a common name, say blob.
What is the R way of doing this? At present, my best guess (which seems like a hack, but works) is to load the data.table into a new environment, and then: assign('blob', get(objects(envir=newEnv)[1], env=newEnv)).
In a reproducible context this is:
newEnv <- new.env()
assign('a', 1:10, envir = newEnv)
assign('blob', get(objects(envir=newEnv)[1], env=newEnv))
Is there a better way?
The R way is to create a single object, i.e. a single list of data tables.
Here is some pseudocode that contains three steps:
Use list.files() to create a list of all files in a folder.
Use lapply() and read.csv() to read your files and create a list of data frames. Replace read.csv() with read.table() or whatever is appropriate for your data.
Use lapply() again, this time with as.data.table() to convert the data frames to data tables.
The pseudocode:
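library(data.table) # provides as.data.table()
filenames <- list.files("path/to/files", full.names = TRUE) # full paths, so reading works from any working directory
dat <- lapply(filenames, read.csv) # one data frame per file
dat <- lapply(dat, as.data.table) # convert each data frame to a data table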
Your result should be a single list, called dat, containing a data table for each of your original files.
I assume that you saved the data.tables using save() somewhat like this:
d1 <- data.table(value=1:10)
save(d1, file="data1.rdata")
and your problem is that when you load the file you don't know the name (here: d1) that you used when saving the file. Correct?
I suggest you instead use saveRDS() and readRDS() for saving/loading single objects:
d1 <- data.table(value=1:10)
saveRDS(d1, file="data1.rds")
blob <- readRDS("data1.rds")