read.csv from a list of files to get unique colnames - R

I am reading the names of my files into file_list and then reading the data with read.csv. However, I want each data frame in datalist to have column names taken from the file names in file_list. The original files do not have a header.
How do I change function(x) so that the second column gets a column name matching the file name? The first column does not have to be unique.
file_list <- list.files(pattern = "\\.csv$")  # pattern is a regular expression, not a glob
datalist <- lapply(file_list, function(x) {read.csv(file = x, header = FALSE, sep = "\t")})

datalist <- lapply(file_list, function(x) {
  dat <- read.csv(file = x, header = FALSE, sep = "\t")
  names(dat)[2] <- x  # use the file name as the name of the second column
  return(dat)
})
This sets the file name as the name of the second column. If you want to edit that name first, use gsub or substr (or similar) on x to modify the string.
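A minimal sketch of that edit, assuming tab-separated files without a header as in the question. So that it runs on its own, it writes two small hypothetical files (sampleA.csv, sampleB.csv) to a temporary directory, and it uses sub() to drop the directory and the ".csv" extension before the name is used for the second column.

```r
# Create two small, hypothetical tab-separated files without a header
dir <- tempfile("demo_")
dir.create(dir)
writeLines(c("a\t1", "b\t2"), file.path(dir, "sampleA.csv"))
writeLines(c("a\t3", "b\t4"), file.path(dir, "sampleB.csv"))

file_list <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

datalist <- lapply(file_list, function(x) {
  dat <- read.csv(file = x, header = FALSE, sep = "\t")
  # Name the second column after the file, without directory or ".csv"
  names(dat)[2] <- sub("\\.csv$", "", basename(x))
  dat
})

# Second-column name of each data frame
sapply(datalist, function(d) names(d)[2])
```

The first column keeps the default name V1 that read.csv assigns when header = FALSE.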

Alternatively, you can add one more step that names the list elements after the files:
names(datalist) <- file_list
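A small sketch of what that extra step gives you: each data frame can then be looked up by its file name. The two data frames below are hypothetical stand-ins for what read.csv would return for two files.

```r
# Hypothetical stand-ins for the data frames read from each file
file_list <- c("sampleA.csv", "sampleB.csv")
datalist <- list(
  data.frame(V1 = c("a", "b"), V2 = c(1, 2)),
  data.frame(V1 = c("a", "b"), V2 = c(3, 4))
)

# Name the list elements after the files they came from
names(datalist) <- file_list

# Look up one data frame by its file name
datalist[["sampleB.csv"]]$V2
```

Note that this names the list elements; the columns inside each data frame keep their V1/V2 names.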

Related

Use string from filename as column data when reading multiple files with lapply in R

I am importing multiple Excel files and removing rows and columns before binding them together. I want to add a column called id, whose value is part of each Excel file name. I am having trouble with the line of code that adds this column.
This is my code:
library(readxl)
f_path <- "filepath/"
filelist = list.files(path = f_path, pattern = "\\.xlsx", full.names = TRUE)
dat1= lapply(filelist, function(x){
df = read_excel(x, col_names = FALSE)
df1$id = gsub("\\-\\d+\\.xlsx$", "" , filelist) # this is the line causing difficulty. The regex for the gsub is correct.
})
dat2= do.call("rbind.data.frame", dat1)
The error I get is:
Assigned data `gsub("\\-\\d+\\.xlsx$", "", filelist)` must be compatible with existing data.
UPDATE
I have replaced filelist with x, as follows:
dat1= lapply(filelist, function(x){
df = read_excel(x, col_names = FALSE)
df$id = gsub("\\-\\d+\\.xlsx$", "" , x)
})
dat2= do.call("rbind.data.frame", dat1)
But rather than creating id as a new "column" (a named item in each list element) holding the results of gsub, the result of gsub becomes the name of the "column", so the list items can't be bound together.
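The UPDATE version fails because an R function returns the value of its last expression, and the value of an assignment such as df$id = ... is the assigned string itself. Each list element therefore ends up being a character string, not a data frame. A minimal sketch of the fix is to return df explicitly. So it can run without Excel files, the sketch swaps read_excel for read.csv on small hypothetical CSV files; the file names, the value column, and the simpler regex "-[0-9]+\\.csv$" are illustrative assumptions. It also applies the regex to basename(x) so the directory is not kept in the id.

```r
# Hypothetical CSV files standing in for the Excel files in the question
dir <- tempfile("ids_")
dir.create(dir)
write.csv(data.frame(value = 1:2), file.path(dir, "siteA-1.csv"), row.names = FALSE)
write.csv(data.frame(value = 3:4), file.path(dir, "siteB-2.csv"), row.names = FALSE)

filelist <- list.files(path = dir, pattern = "\\.csv$", full.names = TRUE)

dat1 <- lapply(filelist, function(x) {
  df <- read.csv(x)
  # Derive the id from this file's own name (x), not from the whole filelist
  df$id <- gsub("-[0-9]+\\.csv$", "", basename(x))
  df  # return the data frame; otherwise the assigned string is returned
})

dat2 <- do.call("rbind.data.frame", dat1)
dat2
```

With read_excel the same pattern applies: keep df as the last line of the anonymous function.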

Create a list of tibbles with unique names using a for loop

I'm working on a project where I want to create a list of tibbles containing data that I read in from Excel. The idea will be to call on the columns of these different tibbles to perform analyses on them. But I'm stuck on how to name tibbles in a for loop with a name that changes based on the for loop variable. I'm not certain I'm going about this the correct way. Here is the code I've got so far.
filenames <- list.files(path = getwd(), pattern = "xlsx")
RawData <- list()
for(i in filenames) {
RawData <- list(i <- tibble(read_xlsx(path = i, col_names = c('time', 'intesity'))))
}
I also have the issue that, right now, the for loop overwrites RawData on each turn of the loop, but I think I can remedy that if I can get the naming convention to work. If there is another method or data structure that would better suit this task, I'm open to suggestions.
Cheers,
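Your code overwrites RawData in each iteration. To add each new tibble to the list instead, use something like RawData[[i]] <- read_xlsx(...), which also names the element after the file. If you prefer c(), wrap the tibble in list(): RawData <- c(RawData, list(read_xlsx(...))); without list(), c() splices the tibble's columns into RawData as separate elements.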
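A minimal sketch of the for loop storing each result under its file name. To keep it runnable without Excel files, it swaps read_xlsx for read.csv on two hypothetical headerless CSV files (run1.csv, run2.csv) written to a temporary directory; the column names are also an assumption.

```r
# Hypothetical headerless CSV files standing in for the Excel files
dir <- tempfile("raw_")
dir.create(dir)
writeLines(c("0,1.5", "1,2.5"), file.path(dir, "run1.csv"))
writeLines(c("0,3.5", "1,4.5"), file.path(dir, "run2.csv"))
filenames <- list.files(path = dir, pattern = "\\.csv$")

RawData <- list()
for (i in filenames) {
  # Store each data frame under its file name instead of overwriting RawData
  RawData[[i]] <- read.csv(file.path(dir, i), header = FALSE,
                           col.names = c("time", "intensity"))
}

names(RawData)
RawData[["run2.csv"]]$intensity
```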
A simpler way would be to use lapply instead of a for loop:
RawData <-
lapply(
filenames,
read_xlsx,
col_names = c('time', 'intesity')
)
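Here is an approach using map from the purrr package:
library(tidyverse)
library(readxl)  # installed with the tidyverse, but not attached by library(tidyverse)
filenames <- list.files(path = getwd(), pattern = "xlsx")
mylist <- map(filenames, ~ read_xlsx(.x, col_names = c('time', 'intesity'))) %>%
  set_names(filenames)  # name the list elements, not the columns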
Similar to the answer by @py_b, but this adds a column with the original file name to each element of the list.
filenames <- list.files(path = getwd(), pattern = "xlsx")
Raw_Data <- lapply(filenames, function(x) {
out_tibble <- read_xlsx(path = x, col_names = c('time', 'intesity'))
out_tibble$source_file <- basename(x) # add a column with the excel file name
return(out_tibble)
})
If you want to merge the list of tibbles into one big one, you can use do.call('rbind', Raw_Data).

R - extracting column in dataframes of a loop

I need to read a list of CSV files and extract the values of a specific column (the second one), from the 13th row on, from each of the data frames.
Here's my try:
temp <- list.files(FILEPATH, pattern="*\\.csv$", full.names = TRUE)
for (i in 1:length(temp)){
assign(temp[i], read.csv(temp[i], header=TRUE, skip=13, na.strings=c("", "NA")))
subset(temp[i], select=2) #extract the second column of the dataframe
temp[i] <- na.omit(temp[i])
}
However, this doesn't work. On the one hand, I think that's because of the skip argument of the read.csv command, as it apparently ignores the headers. On the other hand, if skip is not used, the following error pops up:
Error in subset.default(temp[i], select = 2) : argument "subset" is missing, with no default
When I insert the argument subset=TRUE in the subset command, it doesn't give any error, but no extraction is performed.
Any possible solution?
Without seeing the files it's not easy to tell, but I would use lapply, not a for loop. Maybe you can get inspiration from something like the following. I use read.table because you skip 13 lines, and read.csv by default reads the first line it sees as column headers. Note that I avoid the use of assign.
df_list <- lapply(temp, read.table, sep = ",", skip = 13, na.strings = c("", "NA"))
names(df_list) <- temp
col2_list <- lapply(df_list, `[[`, 2)
col2_list <- lapply(col2_list, na.omit)
names(col2_list) <- temp
col2_list
If you want col2_list to be a list of data frames with one column each (column 2 of the original files), then, as I said in a comment, use
col2_list <- lapply(df_list, `[`, 2)
And to rename that one column and renumber the rows consecutively:
new_name <- "the_column_of_choice" # change this!
col2_list <- lapply(col2_list, function(x){
names(x) <- new_name
row.names(x) <- NULL
x
})
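A self-contained sketch of the lapply approach on one hypothetical file with 13 preamble lines before comma-separated data, written to a temporary path so it runs anywhere. The file layout is an assumption, since the real files weren't shown. It also contrasts `[[` (column 2 as a vector) with `[` (column 2 as a one-column data frame).

```r
# A hypothetical file: 13 preamble lines followed by comma-separated data
f <- tempfile(fileext = ".csv")
writeLines(c(paste("preamble line", 1:13), "1,10", "2,", "3,30"), f)
temp <- f

df_list <- lapply(temp, read.table, sep = ",", skip = 13, na.strings = c("", "NA"))
names(df_list) <- temp

# `[[` extracts column 2 as a vector; na.omit drops the missing value
col2_list <- lapply(df_list, `[[`, 2)
col2_list <- lapply(col2_list, na.omit)

# `[` keeps column 2 as a one-column data frame
col2_df <- lapply(df_list, `[`, 2)
```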

How to get the column name with reference to an object in R

I have multiple CSV files. For every file I want to access the second column and apply a regex that removes everything after ":". The pattern is the same for all the files.
I have referred to this question:
In R, how to get an object's name after it is sent to a function?
This is a sample of my file
ID POLL
1 1,2:ksd ksj
2 3:jj
3 6:ok0j
This is what I have tried
setwd("D:/Data/STN")
temp = list.files(pattern="*.csv")
for(i in 1:length(temp)){
DF1=read.csv(temp[i])
col2=colnames(DF1)[2]
assign(paste(DF1,"$"),col2)
DF1$col2 = gsub(":.*","",DF1$col2)
temp holds the names of all the files. I tried with assign, but got no output.
Thanks in advance
We can use lapply to loop over the list of data.frames and replace the suffix part in the second column using sub.
lst1 <- lapply(lst, function(x) {
  x[, 2] <- sub(":.*", "", x[, 2])
  x
})
This assumes the data.frames have already been read into a list, lst, as follows:
temp <- list.files(pattern="*.csv")
lst <- lapply(temp, read.csv, stringsAsFactors=FALSE)
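A runnable sketch of the two snippets together, using two hypothetical files built from the sample in the question. They are written with write.csv, so the comma inside "1,2:ksd ksj" is quoted; the file names are assumptions.

```r
# Hypothetical files built from the sample shown in the question
dir <- tempfile("stn_")
dir.create(dir)
sample_df <- data.frame(ID = 1:3, POLL = c("1,2:ksd ksj", "3:jj", "6:ok0j"))
write.csv(sample_df, file.path(dir, "stn1.csv"), row.names = FALSE)
write.csv(sample_df, file.path(dir, "stn2.csv"), row.names = FALSE)

temp <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
lst <- lapply(temp, read.csv, stringsAsFactors = FALSE)

# Drop everything from the first ":" onwards in the second column of each file
lst1 <- lapply(lst, function(x) {
  x[, 2] <- sub(":.*", "", x[, 2])
  x
})

lst1[[1]]$POLL
```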

How can I turn the filename into a variable when reading multiple csvs into R

I have a bunch of csv files that follow the naming scheme: est2009US.csv.
I am reading them into R as follows:
myFiles <- list.files(path="~/Downloads/gtrends/", pattern = "^est[[:digit:]][[:digit:]][[:digit:]][[:digit:]]US*\\.csv$")
myDB <- do.call("rbind", lapply(myFiles, read.csv, header = TRUE))
I would like to find a way to create a new variable that, for each record, is populated with the name of the file the record came from.
You can avoid looping twice by using an anonymous function that assigns the file name as a column to each data.frame in the same lapply that you use to read the csvs.
myDB <- do.call("rbind", lapply(myFiles, function(x) {
dat <- read.csv(x, header=TRUE)
dat$fileName <- tools::file_path_sans_ext(basename(x))
dat
}))
I stripped out the directory and file extension. basename() returns the file name, not including the directory, and tools::file_path_sans_ext() removes the file extension.
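A quick sketch of what those two functions return for hypothetical paths that follow the est2009US.csv scheme. The last line, which pulls out just the year with sub(), is an extra assumption for the case where a numeric year variable is more useful than the whole file name.

```r
# Hypothetical paths following the est2009US.csv naming scheme
paths <- c("~/Downloads/gtrends/est2009US.csv", "~/Downloads/gtrends/est2010US.csv")

files <- basename(paths)                  # "est2009US.csv" "est2010US.csv"
stems <- tools::file_path_sans_ext(files) # "est2009US" "est2010US"

# Optional: keep only the four-digit year as an integer
years <- as.integer(sub("^est([0-9]{4})US$", "\\1", stems))
```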
plyr makes this very easy:
library(plyr)
paths <- dir(pattern = "\\.csv$")
names(paths) <- basename(paths)
all <- ldply(paths, read.csv)
Because paths is named, all will automatically get a column containing those names.
Nrows <- sapply(lapply(myFiles, read.csv, header = TRUE), NROW)
# might have been easier to store: lapply(myFiles, read.csv, header=TRUE)
myDB$grp <- rep(myFiles, Nrows)
You can create the object from lapply first.
Lapply <- lapply(myFiles, read.csv, header = TRUE)
names(Lapply) <- myFiles
for(i in myFiles)
Lapply[[i]]$Source = i
do.call(rbind, Lapply)
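Another base R option, sketched here with hypothetical files written to a temporary directory, pairs each data frame with its own file name using Map() and cbind(), so neither a for loop nor a row count is needed. The Source column name is just an illustrative choice.

```r
# Hypothetical files following the est<year>US.csv scheme
dir <- tempfile("est_")
dir.create(dir)
write.csv(data.frame(x = 1:2), file.path(dir, "est2009US.csv"), row.names = FALSE)
write.csv(data.frame(x = 3), file.path(dir, "est2010US.csv"), row.names = FALSE)

myFiles <- list.files(dir, pattern = "^est[0-9]{4}US\\.csv$", full.names = TRUE)
Lapply <- lapply(myFiles, read.csv, header = TRUE)

# Pair each data frame with its file name; cbind recycles the name down the rows
myDB <- do.call(rbind, Map(cbind, Lapply, Source = basename(myFiles)))
myDB
```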
