I have the following problem. I have a number of CSV files named a, b, c, ..., m. I want to load them as data frames and name the resulting objects a1, a2, a3, etc. How can I do this in R?
I have tried the following, but it gives me an error:
paste0("a",1)<-read.csv("a")
I also tried renaming them after loading, but I don't know how to do it successfully.
If you want to create multiple data.frame objects in the global environment (I would rather keep those datasets within a list), you can read the specific files into a list using lapply, change the names of the list elements to the desired object names, and finally use list2env. For example, suppose I have 3 files a.csv, b.csv, and c.csv and want to create data frame objects a1, a2, a3 for those corresponding files.
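files <- list.files(pattern = '^[a-z]\\.csv$')  # single-letter names only: a.csv, b.csv, c.csv
nm1 <- paste0('a', seq_along(files))            # one name per file found: a1, a2, a3
lst1 <- setNames(lapply(files, read.csv), nm1)  # named list of data frames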
list2env(lst1, envir=.GlobalEnv)
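To try the steps above without touching your own files, here is a minimal self-contained sketch. The demo directory, the toy columns id and src, and the environment demo_env are assumptions made for illustration only; demo_env stands in for .GlobalEnv so the example does not create objects in your workspace.

```r
# Hypothetical demo files, written to a temporary directory
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (nm in c("a", "b", "c")) {
  write.csv(data.frame(id = 1:2, src = nm),
            file.path(demo_dir, paste0(nm, ".csv")),
            row.names = FALSE)
}

# Read every single-letter csv file into a named list
files <- list.files(demo_dir, pattern = "^[a-z]\\.csv$", full.names = TRUE)
lst1 <- setNames(lapply(files, read.csv), paste0("a", seq_along(files)))

# Copy the list elements into an environment as separate objects
demo_env <- new.env()
list2env(lst1, envir = demo_env)
ls(demo_env)
```

Replacing demo_env with .GlobalEnv gives the behaviour described in the answer.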
Try
assign(paste0("a",1),read.csv("a"))
If you don't want a separate line for each data set, you can put the csv file names in a vector, named for example list_files, and then do:
for (i in seq_along(list_files)) {
assign(paste0("a",i),read.csv(list_files[i]))
}
Related
I have a (long) list of .csv file names and want to read each .csv file into its own data frame in R.
... "./data/2019-Q2.csv"
"./data/2019-Q3.csv" ...
I thought this should work:
allDFs <- lapply(csvPath, read.csv)
But it just loops indefinitely and I have to stop it manually. Thanks for any help.
You can read in the data using list.files and lapply, as suggested in the OP comments. To make each list item a separate data frame, use the assign() function in a for loop:
d <- split(iris, f = iris$Species)
for (i in names(d)) {
assign(i, d[[i]])
}
This uses the list names as the names of the newly assigned variables, so make sure the list is named appropriately.
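For the file-reading case, one way to get sensible list names is to derive them from the file paths. The sketch below is only an illustration: the temporary directory and the toy columns quarter and value are made up, and only the file-name pattern follows the question.

```r
# Hypothetical demo files named like the ones in the question
demo_dir <- file.path(tempdir(), "quarter_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (q in c("2019-Q2", "2019-Q3")) {
  write.csv(data.frame(quarter = q, value = 1:3),
            file.path(demo_dir, paste0(q, ".csv")),
            row.names = FALSE)
}

# Read all csv files and name each list element after its file
csvPath <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
allDFs <- lapply(csvPath, read.csv)
names(allDFs) <- tools::file_path_sans_ext(basename(csvPath))
names(allDFs)
```

Names such as 2019-Q2 are not syntactic R names, so after assign() they would have to be quoted with backticks; inside the list, allDFs[["2019-Q2"]] works as is.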
I have a set of excel files each containing one sheet of data, all of similar structure (mostly -- see below), that I want to ultimately combine into one large data frame (with each sub-set indexed by original file source).
I am able to create a list of multiple dataframes, and then merge these into one dataframe, pretty easily with the following code:
files <- grep(".xlsx", dir(), value=TRUE) # vector of file names
IDnos <- substr(files,20,24) #vector with key 5-digit ID info of each file
library("XLConnect")
library("data.table")
datalist <- lapply(files, readWorksheetFromFile, sheet = "Data")
names(datalist) <- IDnos
bigdatatable <- rbindlist(datalist, idcol = "IDNo")
One data column, "Value", is usually class numeric, but I found that in several files an "ND" had been entered in one row, making the column class character, so in the final data frame the column is character.
Although I can fix this with some simple cleaning, I was left wondering whether there is a way to identify, at the "list of dataframes" stage, which files (or data frame components of the list I created) have class character for column "Value". For example, I can't run sapply(datalist,class) or other variations. I am hoping to avoid a for-loop.
Is there any way to use lapply or sapply to drill down into dataframes within a list?
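Here's how I would use lapply to find the class of column a in a list of 2 data frames, named x and y. (The output shown is from R before 4.0.0, where data.frame converted strings to factors by default; from R 4.0.0 on, $x is reported as "character".)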
datalist <- list(x = data.frame(a = letters),
y = data.frame(a = 1:26))
lapply(datalist, function(x) class(x$a))
$x
[1] "factor"
$y
[1] "integer"
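To go one step further and list only the offending files, the same idea can return a plain character vector that is then filtered. This is a sketch with made-up data: the element names f1, f2, f3 and their values are hypothetical stand-ins for the imported sheets.

```r
# Hypothetical stand-in for the list of imported data frames
datalist <- list(
  f1 = data.frame(Value = c(1.5, 2.0)),
  f2 = data.frame(Value = c("3.1", "ND"), stringsAsFactors = FALSE),
  f3 = data.frame(Value = c(4, 5))
)

# Class of the Value column in each data frame, as a named character vector
value_class <- vapply(datalist, function(d) class(d$Value)[1], character(1))

# Names of the list elements whose Value column was read as character
suspect <- names(value_class)[value_class == "character"]
suspect
```

vapply is used here instead of sapply only because it guarantees a character vector of one class name per data frame.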
I have a data.frame that contains one Date type variable. I want to export 4 files, one containing a subset corresponding to each week. The following will divide my data in 4 however I don't know how to store each of this in a new data.frame.
split(DataAir, sample(rep(1:4)))
Thanks
If you save your split data frames in a variable, you can access the elements with double-bracket subsetting (e.g. s[[1]]). To save them, create a vector of file names as you'd like and write each element to a file.
s <- split(iris, iris$Species)
filenames <- paste0("my_path/file", 1:3, ".csv")
for (i in seq_along(s)) write.csv(s[[i]], filenames[i])
And for R users that get unnecessarily bugged out by for loops:
mapply(function(x,y) write.csv(x,y), s, filenames)
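Since the question asks for one file per week, here is a sketch of how the grouping itself could be built from the date column. The DataAir below is made-up data with a hypothetical Reading column, and the output goes to a temporary directory; cut() on a Date with breaks = "week" labels each row by the start of its week.

```r
# Hypothetical data: 28 consecutive days starting on a Monday
DataAir <- data.frame(
  Date = as.Date("2021-03-01") + 0:27,
  Reading = seq_len(28)
)

# Label each row with the first day of its week, then split on that label
week_start <- cut(DataAir$Date, breaks = "week")
s <- split(DataAir, week_start)

# Write one csv file per week into a temporary directory
out_dir <- file.path(tempdir(), "weekly_demo")
dir.create(out_dir, showWarnings = FALSE)
filenames <- file.path(out_dir, paste0("week_", names(s), ".csv"))
for (i in seq_along(s)) write.csv(s[[i]], filenames[i], row.names = FALSE)
```

Using the week label in the file name keeps each file traceable to its date range.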
I have multiple data tables and all have a common column called ID. I have a vector vec that contains a set of ID values.
I would like to use lapply to subset all data tables using vec.
I understand how to use lapply to subset the data tables, but my question is how to assign the subsetted results back to the original data tables.
Here is what I tried:
tables<-c("dt1","dt2","dt3","dt4")
lapply(mget(tables),function(x)x[ID %in% vec,])
The above gives subsets of all data tables but how do I assign them back to dt1,dt2,dt3,dt4 ?
I would keep the datasets in the list rather than updating the dataset objects in the global environment, as most operations can be done within the list (including reading the files and writing to output files). But, if you insist, we can use list2env, which will update the original dataset objects with the subsetted data:
lst <- lapply(mget(tables),function(x)x[ID %in% vec,])
list2env(lst, envir=.GlobalEnv)
You could also just name the datasets in the list (mget already returns a list named after tables, so setNames here only makes the naming explicit):
tables <- c("dt1","dt2","dt3","dt4")
dflist <- lapply(mget(tables),function(x)x[ID %in% vec,])
dflist <- setNames(dflist, tables)
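Here is a small self-contained sketch of the mget and list2env round trip. To stay free of package dependencies it uses plain data frames (hence x$ID rather than the data.table form ID %in% vec), and it works in a separate environment, demo_env, instead of the global one; the toy tables and columns are assumptions for illustration.

```r
# Hypothetical tables living in their own environment
demo_env <- new.env()
demo_env$dt1 <- data.frame(ID = 1:5, x = letters[1:5])
demo_env$dt2 <- data.frame(ID = 3:8, y = 6:1)

vec <- c(3, 4)
tables <- c("dt1", "dt2")

# mget returns a list named after the objects; subset each element
lst <- lapply(mget(tables, envir = demo_env),
              function(x) x[x$ID %in% vec, , drop = FALSE])

# Write the subsets back over the original objects
list2env(lst, envir = demo_env)
nrow(demo_env$dt1)
```

Because the list keeps the names dt1 and dt2, list2env overwrites exactly those objects and nothing else.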
I'm trying to store multiple dataframes in a list. However, at some point, the dataframes end up getting converted into lists, and so I end up with a list of lists.
All I'm really trying to do is keep all my dataframes together in some sort of structure.
Here's the code that fails:
all_dframes <- list() # initialise a list that will hold a dataframe as each item
for(file in filelist){ # load each file
dframe <- read.csv(file) # read CSV file
all_dframes[length(all_dframes)+1] <- dframe # add to the list
}
If I now call, for example, class(all_dframes[1]), I get 'list', whereas if I call class(dframe) I get 'data.frame'!
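Of course, the class of all_dframes[1] is list since all_dframes is a list. The function [ returns a subset of the list. In this example, the length of the returned list is one. If you want to extract the data frame you have to use [[, i.e., all_dframes[[1]]. The same applies to the assignment in your loop: all_dframes[n] <- dframe treats the data frame as a list of columns and stores only its first column (with a warning), so use all_dframes[[n]] <- dframe to store the whole data frame.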
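A quick way to see the difference between the two bracket forms, for both extraction and assignment, using a small made-up data frame (dframe and broken are illustrative names, not from the question's files):

```r
dframe <- data.frame(a = 1:3, b = c("x", "y", "z"))

# Double brackets store the data frame as a single list element
all_dframes <- list()
all_dframes[[length(all_dframes) + 1]] <- dframe
class(all_dframes[1])    # a list of length one
class(all_dframes[[1]])  # the data frame itself

# Single-bracket assignment treats the data frame as a list of columns,
# so only the first column ends up in the slot (R warns about this)
broken <- list()
suppressWarnings(broken[length(broken) + 1] <- dframe)
class(broken[[1]])       # class of column a, not "data.frame"
```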
May I suggest this:
library(data.table)
all_dframes <- vector("list",length(filelist))
for (i in seq_along(filelist)) { # load each file
all_dframes[[i]]<-fread(filelist[i])
}
Is this what you need?