I am working with multiple csv files in long format. Each file has a different number of columns but the same number of rows. I was trying to read all the files and merge them into one data frame, but I could not get it to work.
So far I use this code to read each file individually:
try <- read.table('input/SMPS/new_format/COALA_SMPS_20200218.txt', # file to read
                  sep = ',',    # separator
                  header = F,   # do not read the header
                  skip = 17,    # skip the first 17 lines of information
                  fill = T) %>% # fill empty cells in the df
  t() %>%          # transpose the data
  data.frame() %>% # make it a df
  select(1:196)    # select the useful data
My plan was to use something similar to this code but I don't know where to include the transpose function to make it work.
smps_files_new <- list.files(pattern = '*.txt',path = 'input/SMPS/new_format/')#Change the path where the files are located
myfiles <- do.call("rbind",               ## bind the files together
                   lapply(smps_files_new, ## loop over the list
                          function(x)     ## apply this function to each file
                            read.csv(paste("input/SMPS/new_format/", x, sep = ''),
                                     sep = ',',  # separator
                                     header = F, # do not read the header
                                     skip = 17,  # skip the first 17 lines of information
                                     stringsAsFactors = F,
                                     fill = T)))
Use the same pipeline inside lapply that you used for the individual files:
do.call(rbind,                  ## bind the files together
        lapply(smps_files_new,  ## loop over the list
               function(x)      ## apply this function to each file
                 read.csv(paste("input/SMPS/new_format/", x, sep = ''), sep = ',',
                          header = F, # do not read the header
                          skip = 17,  # skip the first 17 lines of information
                          stringsAsFactors = FALSE,
                          fill = TRUE) %>%
                 t() %>%
                 data.frame() %>%
                 select(1:196)))
Another way would be to use purrr::map_df or map_dfr instead of lapply + do.call(rbind, ...):
purrr::map_df(smps_files_new,
              function(x)
                read.csv(paste("input/SMPS/new_format/", x, sep = ''), sep = ',',
                         header = F,
                         skip = 17,
                         stringsAsFactors = FALSE,
                         fill = TRUE) %>%
                t() %>%
                data.frame() %>%
                select(1:196))
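One small caveat worth adding (an aside, not part of the original answers): list.files() interprets pattern as a regular expression, not a shell glob, so '*.txt' only works by accident; '\\.txt$' is the safe spelling. Passing full.names = TRUE also returns complete paths, which removes the need for paste() inside the reader. A minimal self-contained sketch with throwaway files (the directory and file names here are made up for illustration):

```r
# Create a temp folder with two fake comma-separated .txt files
dir_in <- file.path(tempdir(), "smps_demo")
dir.create(dir_in, showWarnings = FALSE)
writeLines(c("1,2,3", "4,5,6"), file.path(dir_in, "a.txt"))
writeLines(c("7,8,9", "10,11,12"), file.path(dir_in, "b.txt"))

# "\\.txt$" is a regex: a literal dot, then "txt" at the end of the name;
# full.names = TRUE returns ready-to-use paths
paths <- list.files(path = dir_in, pattern = "\\.txt$", full.names = TRUE)
myfiles <- do.call(rbind,
                   lapply(paths, read.csv, header = FALSE))
```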
The goal is to combine multiple .txt files, each containing a single column, from different subfolders, then cbind them into one data frame (each file becomes one column) and keep the file names as column names. An example of the .txt files:
0.348107
0.413864
0.285974
0.130399
...
My code:
#list all the files in the folder
listfile <- list.files(path = "",
                       pattern = "txt", full.names = T,
                       recursive = TRUE) # recursive = TRUE includes subdirectories, FALSE does not
#extract the files with folder name aINS
listfile_aINS <- listfile[grep("aINS",listfile)]
#inspect file names
head(listfile_aINS)
#combined all the text files in listfile_aINS and store in dataframe 'Data'
for (i in 1:length(listfile_aINS)){
  if (i == 1){
    assign(paste0("Data"), read.table(listfile_aINS[i], header = FALSE, sep = ","))
  }
  if (i != 1){
    assign(paste0("Test", i), read.table(listfile_aINS[i], header = FALSE, sep = ","))
    Data <- cbind(Data, get(paste0("Test", i))) # choose one: cbind (combine by column) or rbind (combine by row)
    rm(list = ls(pattern = "Test"))
  }
}
rm(list = ls(pattern = "list.+?"))
I ran into two problems:
1. R returns this error because the .txt files have different numbers of rows:
"Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 37, 36"
I have too many files, so I hope to work around the error without having to trim the files to the same length.
2. My code won't keep the file name as the column name.
It will be easier to write a function and then rbind() the data from each file. The resulting data frame will have a file column with the filename from the listfile_aINS vector.
read_file <- function(filename) {
  dat <- read.table(filename, header = FALSE, sep = ",")
  dat$file <- filename
  return(dat)
}
all_dat <- do.call(rbind, lapply(listfile_aINS, read_file))
If they don't all have the same number of rows it might not make sense to have each column be a file, but if you really want that you could make it into a wide dataset with NA filling out the empty rows:
library(dplyr)
library(tidyr)
all_dat %>%
  group_by(file) %>%
  mutate(n = 1:n()) %>%
  pivot_wider(names_from = file, values_from = V1)
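To make the NA padding concrete, here is a toy version of all_dat with two "files" of unequal length (the values are invented for illustration):

```r
library(dplyr)
library(tidyr)

# Two fake files: "a.txt" has 3 values, "b.txt" only 2
all_dat <- data.frame(V1 = c(0.34, 0.41, 0.29, 0.13, 0.57),
                      file = c("a.txt", "a.txt", "a.txt", "b.txt", "b.txt"))

wide <- all_dat %>%
  group_by(file) %>%
  mutate(n = 1:n()) %>%                 # row index within each file
  pivot_wider(names_from = file, values_from = V1)
# the shorter file is padded with NA in its third row
```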
I am trying to combine multiple csv files into one data frame and add a new column with each file's date. I have already read all the files into one df, but I can't add the date column per file. I'm using the following code:
ccn_files <- list.files(pattern = '*.csv', path = "input/CCN/") ## list all the files

ccn_data_raw <- do.call("rbind",          ## bind the files together
                        lapply(ccn_files, ## loop over the list
                               function(x) ## apply this function to each file
                                 read.csv(paste("input/CCN/", x, sep = ''), fill = T,
                                          header = TRUE, skip = 4)))
I was also able to get the dates of all the files in a vector using this line:
test <- ymd(substr(ccn_files, 14, 19)) # ymd() from the lubridate package
How can I add this line inside the first chunk of code so it does what I want?
We can use Map
ccn_data_raw <- do.call(rbind, Map(cbind,
                                   lapply(ccn_files, function(x)
                                     read.csv(paste("input/CCN/", x, sep = ''),
                                              fill = TRUE, header = TRUE, skip = 4)),
                                   date = test))
Or using purrr functions:
library(purrr)
ccn_data_raw <- map2_df(map(ccn_files, function(x)
                          read.csv(paste("input/CCN/", x, sep = ''),
                                   fill = TRUE, header = TRUE, skip = 4)),
                        test, cbind)
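An alternative sketch (my own suggestion, not from the answers above): derive the date inside the per-file reader, so each data frame is paired with its own filename and the two vectors can never fall out of alignment. Here as.Date(..., format = "%y%m%d") stands in for the ymd() call to avoid the lubridate dependency, and the filename layout (date in characters 14-19) is assumed to match the question's files; the demo file name is made up.

```r
# Hypothetical helper; the substr() positions copy the question's
# test <- ymd(substr(ccn_files, 14, 19)) line, applied per file.
read_ccn <- function(path) {
  dat <- read.csv(path, fill = TRUE, header = TRUE, skip = 4)
  dat$date <- as.Date(substr(basename(path), 14, 19), format = "%y%m%d")
  dat
}

# Demo with one made-up file whose name has the date at positions 14-19
tmp <- file.path(tempdir(), "CCN_100_data_200218.csv")
writeLines(c("meta1", "meta2", "meta3", "meta4", "conc,temp", "1,25"), tmp)
one <- read_ccn(tmp)
```

With this helper, ccn_data_raw <- do.call(rbind, lapply(ccn_files, read_ccn)) replaces the Map() step, assuming ccn_files was built with full.names = TRUE so the paths are complete.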
I need to add 2 columns to a list of csv files and then write the csvs back out to a folder, so I used llply.
library(plyr) # for llply
data_files <- list.files(pattern = ".csv$", recursive = T, full.names = F)
x <- llply(data_files, read.csv, header = T)
y <- llply(x, within, Cf <- var1 * 8)
z <- llply(y, within, Pc <- Cf + 1)
When I tried to write the files again using write.table in a loop:
lapply(z, FUN = function(eachPath) {
  b <- read.csv(eachPath, header = F)
  write.table(b, file = eachPath, row.names = F, col.names = T, quote = F)
})
I get this error and I think it is because z is a list of lists.
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
'file' must be a character string or connection
What I think needs to be done is to convert z into a list of data frames. I would like some advice on how to do that, plus a command to extract the name of each file from a column containing the sample ID.
Thanks
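No answer was recorded here, but a minimal sketch of one way out (my own assumption, not from the thread): z is already a list of data frames, since llply returns one data frame per input file, so there is nothing to re-read; Map() can pair each data frame with the path it came from and write it back out. Toy data stands in for the real files below.

```r
# Toy stand-ins for data_files and z (a list of data frames);
# the lapply() line mimics what the llply/within steps produce
data_files <- file.path(tempdir(), c("sample1.csv", "sample2.csv"))
z <- list(data.frame(var1 = 1:2), data.frame(var1 = 3:4))
z <- lapply(z, function(d) { d$Cf <- d$var1 * 8; d$Pc <- d$Cf + 1; d })

# Map() walks z and data_files in parallel: one write per data frame
invisible(Map(function(df, path)
  write.table(df, file = path, row.names = FALSE,
              col.names = TRUE, quote = FALSE, sep = ","),
  z, data_files))
```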
I am trying to apply the same function to all csv files (identical structure) in a folder: adding two new columns based on 'old' columns, adding 0.05 to each variable, and then saving each file under the same name, in the same folder, as csv. This should be easy, and there are several examples here of doing that, mostly using lapply; however, I keep running into an error:
Error in `$<-.data.frame`(`*tmp*`, "LAT", value = numeric(0)) : replacement has 0 rows, data has 3
This is my code:
my_files <- list.files(path="C:/PATH", pattern=".csv", full.names=T, recursive=FALSE)
add_col <- function(my_files) {
  mpa <- read.csv(my_files, header = T)
  mpa$LAT <- mpa$lat_bin + 0.05
  mpa$LON <- mpa$lon_bin + 0.05
  return(mpa)
  write.csv(mpa,
            append = FALSE,
            quote = FALSE,
            sep = ",",
            row.names = FALSE,
            col.names = TRUE)
}
I am unsure how best to do that for a large number of files.
Here is some sample code for the files
Df1 <- data.frame(lat_bin = c(50,40,70,6,8,4),lon_bin = (c(1,5,2,4,9,11)))
Df2 <- data.frame(lat_bin = c(66, 77, 82, 65, 88, 43),lon_bin = (c(2,3,4,5,11,51)))
Df3 <- data.frame(lat_bin = c(43,46,55,67,1,11),lon_bin = (c(7,6,5,9,11,15)))
write.csv(Df1, "data_1.csv", row.names=F)
write.csv(Df2, "data_2.csv", row.names=F)
write.csv(Df3, "data_3.csv", row.names=F)
Simply change the function's parameter so it receives one file, and pass the entire list of files to lapply. For background, lapply is perhaps the most popular of the apply family of functions; it takes a list/vector input and returns an equal-length list, with each input element passed into the function.
Specifically, res below is a list of data frames, one per file in my_files, each with the column changes applied. Also, write.csv was missing a file name; the version below saves new csv files with a _new suffix (the double backslashes escape the period, a special character in regex). Note too that write.csv must come before return(), which ends the function, and that write.csv sets sep, append, and col.names itself, so those arguments are dropped.
my_files <- list.files(path="C:/PATH", pattern=".csv", full.names=T,
recursive=FALSE)
add_col <- function(one_file) {
  mpa <- read.csv(one_file, header = T)
  mpa$LAT <- mpa$lat_bin + 0.05
  mpa$LON <- mpa$lon_bin + 0.05
  write.csv(mpa,
            file = sub("\\.csv", "_new\\.csv", one_file),
            quote = FALSE,
            row.names = FALSE) # sep, append, col.names are fixed by write.csv itself
  return(mpa)
}
res <- lapply(my_files, function(i) add_col(i)) # LONGER VERSION
res <- lapply(my_files, add_col) # SHORTER VERSION
I have 900 text files in my directory. Each file consists of data in the following format:
667869 667869.000000
580083 580083.000000
316133 316133.000000
11065 11065.000000
I would like to extract the fourth row from each text file and store the values in an array. Any suggestions are welcome.
This sounds more like a StackOverflow question, similar to
Importing multiple .csv files into R
You can try something like:
setwd("/path/to/files")
files <- list.files(path = getwd(), recursive = FALSE)
head(files)
# the sample files are whitespace-separated with no header,
# so read.table with header = FALSE fits better than read.csv
myfiles <- lapply(files, function(x) read.table(file = x, header = FALSE))
mydata <- lapply(myfiles, FUN = function(df) df[4, ])
str(mydata)
do.call(rbind, mydata)
A lazy answer is:
array <- c()
for (file in dir()) {
  row4 <- read.table(file,
                     header = FALSE,
                     row.names = NULL,
                     skip = 3,  # skip the first 3 rows
                     nrows = 1, # read only the next row after skipping
                     sep = "")  # "" splits on any whitespace; change if needed
  array <- rbind(array, row4)  # one row per file (each file's row 4 has 2 fields)
}
You can further keep the names of the files:
rownames(array) <- dir()
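If only the numeric value in row 4 is needed, a sapply() one-liner returns a plain vector (a sketch, assuming whitespace-separated files like the sample shown above; the demo files here are invented):

```r
# Self-contained demo: two files with the layout from the question
dir_in <- file.path(tempdir(), "row4_demo")
dir.create(dir_in, showWarnings = FALSE)
writeLines(c("667869 667869.000000", "580083 580083.000000",
             "316133 316133.000000", "11065 11065.000000"),
           file.path(dir_in, "f1.txt"))
writeLines(c("1 1.0", "2 2.0", "3 3.0", "4 4.0"),
           file.path(dir_in, "f2.txt"))

files <- list.files(dir_in, full.names = TRUE)
# skip = 3, nrows = 1 reads just row 4; [[1]] keeps the first column
vals <- sapply(files, function(f)
  read.table(f, skip = 3, nrows = 1)[[1]])
```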