I have a data.frame that contains one Date type variable. I want to export 4 files, each containing the subset corresponding to one week. The following will divide my data into 4; however, I don't know how to store each of these in a new data.frame.
split(DataAir, sample(rep(1:4)))
Thanks
If you save your split data frames in a variable, you can access the elements with double-bracket subsetting (e.g. s[[1]]). To save them, create a vector of file names as you'd like and write each element to its own file.
s <- split(iris, iris$Species)
filenames <- paste0("my_path/file", 1:3, ".csv")
for (i in seq_along(s)) write.csv(s[[i]], filenames[i])
And for R users that get unnecessarily bugged out by for loops:
mapply(function(x,y) write.csv(x,y), s, filenames)
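The iris example splits by species, not by week. A minimal sketch of a week-based split could look like the following, assuming the Date column is called `date` (a hypothetical name) and that ISO weeks are what you want. The `DataAir` built here is only a stand-in, and the files go to a temporary folder.

```r
# Hypothetical stand-in for DataAir: 28 daily rows with a Date column named `date`
DataAir <- data.frame(date = as.Date("2018-01-01") + 0:27,
                      value = seq_len(28))

# ISO year-week label for each row, e.g. "2018-W01"
week_id <- format(DataAir$date, "%G-W%V")

# One data frame per week, named by its week label
by_week <- split(DataAir, week_id)

# Write each weekly subset to its own CSV file in a temporary folder
out_dir <- file.path(tempdir(), "weekly_demo")
dir.create(out_dir, showWarnings = FALSE)
filenames <- file.path(out_dir, paste0(names(by_week), ".csv"))
invisible(Map(function(d, f) write.csv(d, f, row.names = FALSE),
              by_week, filenames))
```

Naming the files after `names(by_week)` avoids hard-coding how many weeks there are.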
There are around 3k .txt files, comma separated with equal structure and no col names.
e.g. 08/15/2018,11.84,11.84,11.74,11.743,27407 ///
I only need col 1 (date) and col 5 (11.743) and would like to import all those vectors with the name of the .txt file assigned (AAAU.txt -> AAAU vector). In a second step I would like to merge them into a matrix, with all the possible dates in rows, one column per .txt filename, and the col 5 value for each date.
I tried using readr, but I was unable to include the information of the filename, thus I cannot proceed.
Cheers for any help!
I didn't test this code, but I think this will work for you. You can use list.files() to pull all file names into a variable, then read each file individually and append it to a new data frame with either rbind() or cbind().
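setwd("C:/your_favorite_directory/")
# list only the .txt files in the working directory
fnames <- list.files(pattern = "\\.txt$")
# the files have no header row, so R names the columns V1, V2, ...
csv <- lapply(fnames, read.csv, header = FALSE)
result <- do.call(rbind, csv)
# grab a subset of the fields you need (date and column 5)
df <- subset(result, select = c(V1, V5))
# then write your final file
write.table(df, "AllFiles.txt", sep = ",")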
Also, the '-' sign indicates dropping variables. Make sure the variable names are NOT quoted when you drop columns this way with the subset() function.
df = subset(mydata, select = -c(b,c,d) )
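The rbind() approach stacks the files but loses the file names. A rough sketch of the "one column per file" layout from the question might look like this. It assumes the files have no header row (so the date is column 1 and the value is column 5), and the two demo files, including the name BBBB.txt, are made up for illustration.

```r
# Hypothetical demo files in a temporary folder (stand-ins for the real .txt files)
demo_dir <- file.path(tempdir(), "ticker_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("08/15/2018,11.84,11.84,11.74,11.743,27407",
             "08/16/2018,11.75,11.80,11.70,11.790,30112"),
           file.path(demo_dir, "AAAU.txt"))
writeLines("08/16/2018,20.10,20.30,20.00,20.250,1500",
           file.path(demo_dir, "BBBB.txt"))

fnames <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

# Read each file, keep columns 1 and 5, and name the value column after the file
per_file <- lapply(fnames, function(f) {
  d <- read.csv(f, header = FALSE, stringsAsFactors = FALSE)[, c(1, 5)]
  names(d) <- c("date", tools::file_path_sans_ext(basename(f)))
  d
})

# Full outer join on date: every date appears once, gaps are filled with NA
wide <- Reduce(function(a, b) merge(a, b, by = "date", all = TRUE), per_file)
```

With around 3k files, repeated merge() calls may be slow; this is only meant to show the shape of the result, not a tuned solution.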
I have a set of excel files each containing one sheet of data, all of similar structure (mostly -- see below), that I want to ultimately combine into one large data frame (with each sub-set indexed by original file source).
I am able to create a list of multiple dataframes, and then merge these into one dataframe, pretty easily with the following code:
files <- grep(".xlsx", dir(), value=TRUE) # vector of file names
IDnos <- substr(files,20,24) #vector with key 5-digit ID info of each file
library("XLConnect")
library("data.table")
datalist <- lapply(files, readWorksheetFromFile, sheet = "Data")
names(datalist) <- IDnos
bigdatatable <- rbindlist(datalist, idcol = "IDNo")
One data column "Value" is usually class numeric, except I found that in several there was an "ND" put in to one row, making it class character, so in the final data frame the column is character.
Although I can fix this with some simple cleaning, I was left wondering if there is a way to identify, at the "list of dataframes" stage, which files (or dataframe components of the list I created) have class character for column "Value". For example, I can't run sapply(datalist, class) or other variations. I am hoping to avoid a for-loop.
Is there any way to use lapply or sapply to drill down into dataframes within a list?
Here's how I would use lapply to find the class of column a in a list of 2 data frames, named x and y.
datalist <- list(x = data.frame(a = letters),
                 y = data.frame(a = 1:26))
lapply(datalist, function(x) class(x$a))
$x
[1] "factor"
$y
[1] "integer"
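Applied to the original question, the same idea can pick out which list elements have a character "Value" column. This is only a sketch: the three small data frames and their ID names below are invented stand-ins for the data read from the Excel files.

```r
# Hypothetical stand-in for the list of data frames read from the Excel files
datalist <- list(
  `10001` = data.frame(Value = c(1.5, 2.0)),
  `10002` = data.frame(Value = c("3.1", "ND"), stringsAsFactors = FALSE),
  `10003` = data.frame(Value = c(4L, 5L))
)

# Class of the Value column in each data frame
# (class() can return several strings, so keep only the first)
value_class <- vapply(datalist, function(d) class(d$Value)[1], character(1))

# IDs of the files whose Value column was read as character
bad_ids <- names(value_class)[value_class == "character"]
```

vapply() is used instead of sapply() so the result is always a named character vector, even if the list is empty.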
I'm just learning R. I have 300 different files containing rainfall data. I want to create a function that takes a range of values (i.e., 20-40). I will then read csv files named "020.csv", "021.csv", "022.csv" etc. up to "040.csv".
Each of these files has a variable named "rainfall". I want to open each csv file, extract the "rainfall" values and store (append) them to some sort of object, like a data frame (maybe something else is better?). So, when I'm done, I'll have a data frame or list with a single column containing rainfall data from all processed files.
This is what I have...
rainfallValues <- function(id = 1:300) {
  df = data.frame()
  # Read anywhere from 1 to 300 files
  for(i in id) {
    # Form a file name
    fileName <- sprintf("%03d.csv",i)
    # Read the csv file which has four variables (columns). I'm interested in
    # a variable named "rainfall".
    x <- read.csv(fileName,header=T)
    # This is where I am stuck. I know how to extract the "rainfall" variable values from
    # x, I just don't know how to append them to my data frame.
  }
}
Here is a method using lapply that will return a list of rainfalls:
rainList <- lapply(id, function(i) {
  temp <- read.csv(sprintf("%03d.csv",i))
  temp$rainfall
})
To put this into a single vector:
rainVec <- unlist(rainList)
The unlist function will preserve the order in which you read the files, so the first element of rainVec will be the first observation of the rainfall column from the first file in id, the second element will be the second observation in that file, and so on to the last observation of the last file.
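Putting the pieces together inside the function from the question could look roughly like this. The `directory` argument is an addition that is not in the original question, and the three demo files are made up so the sketch can run on its own.

```r
# Hypothetical demo data: three small CSV files named 020.csv to 022.csv
demo_dir <- file.path(tempdir(), "rain_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 20:22) {
  write.csv(data.frame(day = 1:2, rainfall = c(i, i + 0.5)),
            file.path(demo_dir, sprintf("%03d.csv", i)),
            row.names = FALSE)
}

# `directory` is an assumed extra argument, not part of the original function
rainfallValues <- function(id = 1:300, directory = ".") {
  # One numeric vector of rainfall values per file, in the order given by id
  rainList <- lapply(id, function(i) {
    temp <- read.csv(file.path(directory, sprintf("%03d.csv", i)))
    temp$rainfall
  })
  # Flatten into a single-column data frame
  data.frame(rainfall = unlist(rainList))
}

rain <- rainfallValues(20:22, demo_dir)
```

Building the list first and flattening once at the end avoids growing a data frame inside the loop.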
I have the following problem. I have a number of csv files (data frames) named a, b, c, ..., m. I want to load them and change their names to a1, a2, a3, etc. How can I do it in R?
I have tried the following, but it gives me an error:
paste0("a",1)<-read.csv("a")
I also tried to rename the files after loading, but I don't know a way to do it successfully.
If you want to create multiple data.frame objects in the global environment (I would rather keep those datasets within a list), you can read the specific files into a list using lapply, change the names of the list elements to the desired object names, and finally use list2env. For example, suppose I have 3 files a.csv, b.csv, and c.csv and want to create data frame objects a1, a2, a3 for those corresponding files.
files <- list.files(pattern='^[a-z]\\.csv')
nm1 <- paste0('a', 1:3)
lst1 <- setNames(lapply(files, function(x) read.csv(x)), nm1)
list2env(lst1, envir=.GlobalEnv)
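For anyone who wants to try this without touching their own files or workspace, here is a small self-contained variant. It is only a sketch: the three CSV files are created in a temporary folder, and the objects go into a separate environment (`demo_env`, a made-up name) instead of the global one.

```r
# Hypothetical demo files a.csv, b.csv, c.csv in a temporary folder
demo_dir <- file.path(tempdir(), "rename_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (f in c("a", "b", "c")) {
  write.csv(data.frame(id = 1:2, src = f),
            file.path(demo_dir, paste0(f, ".csv")),
            row.names = FALSE)
}

files <- list.files(demo_dir, pattern = "^[a-z]\\.csv$", full.names = TRUE)

# Names a1, a2, ... generated from however many files were found
nm1 <- paste0("a", seq_along(files))
lst1 <- setNames(lapply(files, read.csv), nm1)

# Create the objects in a dedicated environment rather than .GlobalEnv
demo_env <- new.env()
invisible(list2env(lst1, envir = demo_env))
```

Using seq_along(files) instead of a fixed 1:3 keeps the names in step with the number of files.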
Try
assign(paste0("a",1),read.csv("a"))
If you don't want to have a line for each data set, you can list your csv files in a vector, named for example list_files, and then do:
for (i in seq_along(list_files)) {
  assign(paste0("a", i), read.csv(list_files[i]))
}
I'm trying to store multiple dataframes in a list. However, at some point, the dataframes end up getting converted into lists, and so I end up with a list of lists.
All I'm really trying to do is keep all my dataframes together in some sort of structure.
Here's the code that fails:
all_dframes <- list() # initialise a list that will hold a dataframe as each item
for(file in filelist){ # load each file
dframe <- read.csv(file) # read CSV file
all_dframes[length(all_dframes)+1] <- dframe # add to the list
}
If I now call, for example, class(all_dframes[1]), I get 'list', whereas if I call class(dframe) I get 'data.frame'!
Of course, the class of all_dframes[1] is list since all_dframes is a list. The function [ returns a subset of the list. In this example, the length of the returned list is one. If you want to extract the data frame you have to use [[, i.e., all_dframes[[1]].
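The same distinction applies when adding to the list. As far as I can tell, single-bracket assignment treats the data frame as a list of columns, so the whole data frame is not what ends up stored. A small sketch with a made-up data frame:

```r
dframe <- data.frame(x = 1:3, y = c("a", "b", "c"))

# Single brackets: the data frame is treated as a list of columns,
# so the stored element is not the data frame itself (R also warns here)
single <- list()
suppressWarnings(single[length(single) + 1] <- dframe)

# Double brackets: the data frame is stored as one list element
double <- list()
double[[length(double) + 1]] <- dframe

is.data.frame(single[[1]])
is.data.frame(double[[1]])
```

So changing the assignment in the loop to all_dframes[[length(all_dframes) + 1]] <- dframe should keep each element a data frame.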
May I suggest this:
library(data.table)
all_dframes <- vector("list", length(filelist))
for (i in seq_along(filelist)) { # load each file
  all_dframes[[i]] <- fread(filelist[i])
}
Is this what you need?