Import multiple .tsv files at once as data frames [duplicate] - r

This question already has answers here:
Read multiple CSV files into separate data frames
(11 answers)
Closed 3 years ago.
I have N .tsv files saved in a folder named "data" inside my RStudio working directory, and I want to import them all at once as separate data frames. Below is how I read them one by one, but there are too many of them and I want something faster. The total number of files may also differ each time.
#read files into R
f1<-read.table(file = 'a_CompositeSources/In1B1A_WDNdb_DrugTargetInteractions_CompositeDBs_Adhesion.tsv', sep = '\t', header = TRUE)
f2<-read.table(file = 'a_CompositeSources/In1B2A_WDNdb_DrugTargetInteractions_CompositeDBs_Cytochrome.tsv', sep = '\t', header = TRUE)
I have used :
library(readr)
library(dplyr)
files <- list.files(path = "C:/Users/user/Documents/kate/data", pattern = "*.tsv", full.names = T)
tbl <- sapply(files, read_tsv, simplify=FALSE) %>%
bind_rows(.id = "id")
## Read files named xyz1111.tsv, xyz2222.tsv, etc.
filenames <- list.files(path="C:/Users/user/Documents/kate/data",
                        pattern="*.tsv")
## Create list of data frame names without the ".tsv" part
names <- gsub(".tsv", "", filenames)
###Load all files
for(i in names){
filepath <- file.path("C:/Users/user/Documents/kate/data",paste(i,".tsv",sep=""))
assign(i, read.delim(filepath,
colClasses=c("factor","character",rep("numeric",2)),
sep = "\t"))
}
But only the first file is read.

If all the .tsv files are in one folder, you can read them into a list using lapply (or a for loop):
files_to_read <- list.files(path = "a_CompositeSources/", pattern = "\\.tsv$", full.names = TRUE)
all_files <- lapply(files_to_read, function(x) {
  read.table(file = x,
             sep = '\t',
             header = TRUE)
})
If you need to reference the files by name, you could set names(all_files) <- files_to_read. You can then combine them into one data frame using bind_rows from the dplyr package, or simply work with the list of data frames.
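Since the question asks for separate data frames, here is a minimal sketch of one way to get from the list to individual objects. The temporary folder and the file names adhesion.tsv and cytochrome.tsv are made up for the demo; the list is named after each file without its folder or extension, and list2env() then copies the elements into the global environment.

```r
# Hypothetical demo data: two small .tsv files in a temporary folder
demo_dir <- file.path(tempdir(), "tsv_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.table(data.frame(id = 1:2, value = c(0.5, 1.5)),
            file.path(demo_dir, "adhesion.tsv"),
            sep = "\t", row.names = FALSE, quote = FALSE)
write.table(data.frame(id = 3:4, value = c(2.5, 3.5)),
            file.path(demo_dir, "cytochrome.tsv"),
            sep = "\t", row.names = FALSE, quote = FALSE)

# Read every .tsv file in the folder into a list of data frames
files_to_read <- list.files(path = demo_dir, pattern = "\\.tsv$", full.names = TRUE)
all_files <- lapply(files_to_read, read.table, sep = "\t", header = TRUE)

# Name each element after its file, without folder or extension
names(all_files) <- tools::file_path_sans_ext(basename(files_to_read))

# Optional: copy each list element into the global environment
# as a separate data frame (here: adhesion and cytochrome)
list2env(all_files, envir = globalenv())
```

Keeping the named list is usually easier to work with than many loose objects, so the list2env() step is only worth it if separate variables are really needed.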

Related

Combine .txt files and add part of file name as new column value

I am combining a number of files that are essentially .txt files, though called .sta.
I've used the following code to combine them after having trouble with base R apply and dplyr lapply:
library(plyr)
myfiles <- list.files(path="LDI files", pattern ="*.sta", full.names = TRUE)
dat_tab <- ldply(myfiles, read.table, header= TRUE, sep = "\t", skip = 5)
I want to add a column which has values which are part of the file names. File name examples are "GFREX28-00-1" and "GFREX1534-00-1" . I want to keep the digits immediately after GFREX, before the first dash -.
I'm not sure I understood your question correctly, so this is a tentative answer. The idea is to add the new column to each data.frame before returning it, and then bind the results together.
filepaths <- list.files(path = "LDI files", pattern = "\\.sta$",
                        full.names = TRUE)
filenames <- basename(filepaths)
dat_list <- lapply(seq_along(filepaths), function(i) {
  df <- read.table(filepaths[i], header = TRUE, sep = "\t", skip = 5)
  # keep only the digits between "GFREX" and the first dash
  df$fn <- sub("^GFREX([0-9]+)-.*$", "\\1", filenames[i])
  df
})
# combine the per-file data frames into one
dat_tab <- do.call(rbind, dat_list)
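To check the file name handling on its own, the digits after GFREX and before the first dash can be pulled out with a capture group. This is a small sketch using the two example names from the question; the .sta extension on them is an assumption.

```r
# File names taken from the question (the .sta extension is assumed)
file_names <- c("GFREX28-00-1.sta", "GFREX1534-00-1.sta")

# Capture the run of digits that follows "GFREX" and precedes the first dash
ids <- sub("^GFREX([0-9]+)-.*$", "\\1", file_names)
print(ids)  # "28" "1534"
```

Note that sub() returns a name unchanged when the pattern does not match, so files that do not follow the GFREX naming scheme would keep their full name in the new column.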

How do I read multiple txt file and save them as respective object names? [duplicate]

This question already has answers here:
Read multiple CSV files into separate data frames
(11 answers)
Closed 5 years ago.
I have hundreds of .txt files. I want to automate a process to read them all and save them with their respective file name. For example, I want to save them in this order without typing the name of individual files.
mytext1.txt <-read.table("./mytext1.txt", sep = "\t")
mytext2.txt <-read.table("./mytext2.txt", sep = "\t")
Here is the code I have tried which of course doesn't save the dataframe in a separate object name.
filelist = list.files(pattern = ".*.txt")
datalist = lapply(filelist, FUN=read.table, header=TRUE, sep = "\t")
It looks like you are missing a line:
datafr = do.call("rbind", datalist)
See this post for reference: How do you read in multiple .txt files into R?
This may not be the best way, but it should do what you want:
read.and.write.table <- function(files){
  for(fn in files){
    input <- read.table(file = fn, header = TRUE, sep = "\t")
    # create a global variable named after the file
    assign(x = fn, value = input, envir = .GlobalEnv)
  }
}
filelist <- list.files(pattern = "\\.txt$")
read.and.write.table(filelist)
This creates separate global variables named after your .txt files. You could add some string manipulation to tidy up the names.
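As a sketch of the string manipulation mentioned above: the helper clean_name() below is hypothetical (it is not part of the answer) and combines basename(), tools::file_path_sans_ext() and make.names() to turn a file path into a syntactically valid object name.

```r
# Hypothetical helper: turn a file path into a syntactically valid object name
clean_name <- function(path) {
  make.names(tools::file_path_sans_ext(basename(path)))
}

clean_name("./mytext1.txt")          # "mytext1"
clean_name("data/2020 results.txt")  # "X2020.results"
```

Inside the loop, assign(x = clean_name(fn), ...) would then create mytext1 rather than an object called mytext1.txt, which otherwise has to be quoted with backticks to be used.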

Include .csv filename when reading data into r using list.files [duplicate]

This question already has answers here:
Add "filename" column to table as multiple files are read and bound [duplicate]
(6 answers)
When importing CSV into R how to generate column with name of the CSV?
(7 answers)
Closed last year.
I'm aggregating a bunch of CSV files in R, which I have done successfully using the following code (found here):
Tbl <- list.files(path = "./Data/CSVs/",
pattern="*.csv",
full.names = T) %>%
map_df(~read_csv(., col_types = cols(.default = "c")))
I want to include the .csv filename (ideally without the file extension) as a column in Tbl. I found a solution using plyr, but I want to stick with dplyr as plyr causes glitches further down my code.
Is there any way I can add something to the above code that will tell R to include the file name in Tbl$filename?
Many thanks!
Here's my solution. Let me know if this helps.
Tbl <- list.files(path = "./Data/CSVs/",
                  pattern = "\\.csv$",
                  full.names = TRUE) %>%
  map_df(function(x) {
    read_csv(x, col_types = cols(.default = "c")) %>%
      # file name without folder or extension
      mutate(filename = tools::file_path_sans_ext(basename(x)))
  })
It's difficult to know exactly what you want since the format of your data in .csv is unclear. But try gsub. Assuming you have list of your files in Tbl.list:
library(dplyr)
Tbl.list <- list.files(path = "./Data/CSVs/",
pattern="*.csv",
full.names = T)
Convert to a data.frame and then mutate, replacing the ".csv" extension with "":
Tbl.df <- data.frame(X1 = Tbl.list) %>%
  mutate(filename_wo_ext = sub("\\.csv$", "", X1))
You could also try the following, though I'm not sure it will work (assuming you still have Tbl.list). Start by changing your map_df statement to add an index column:
Tbl <- Tbl.list %>%
  map_df(~ read_csv(., col_types = cols(.default = "c")),
         .id = "index") %>%
  mutate(filename = sub("\\.csv$", "", basename(Tbl.list[as.numeric(index)])))
The index column should contain the positions 1...n as character strings. The mutate statement looks up the file name at that position in Tbl.list and strips the ".csv" extension.

Read all txt files from a folder and create separate variable for each file in r [duplicate]

This question already has an answer here:
How can I read multiple (excel) files into R? [duplicate]
(1 answer)
Closed 7 years ago.
I have yearly stock data for the last 15 years in a folder containing 15 files (one file per year). This folder is also set as my working directory. I can read each file separately and save it to a variable, but I want to write a loop or function that reads all the files and creates a variable for each year. I have tried the following code but cannot get the desired result. Any help?
Reading each file separately:
allData_2000 <- read.csv("......../Data_1999-2015/scrip_high_low_year_2000.txt",sep = ",", header = TRUE, stringsAsFactors = FALSE)
allData_2001 <- read.csv("......../Data_1999-2015/scrip_high_low_year_2001.txt",sep = ",", header = TRUE, stringsAsFactors = FALSE)
But I would like to read all the files using a loop:
path <- "....Data_1999-2015"
files <- list.files(path=path, pattern="*.txt")
for(file in files)
{
perpos <- which(strsplit(file, "")[[1]]==".")
assign(
gsub(" ","",substr(file, 1, perpos-1)),
read.csv(paste(path,file,sep=",",header = TRUE, stringsAsFactors = FALSE)))
}
Try this improved code:
library(tools)
library(data.table)
files <- list.files(pattern = "\\.txt$")
for (f in seq_along(files)) {
  # e.g. "scrip_high_low_year_2000.txt" becomes AllData_2000
  assign(paste0("AllData_", gsub("[^0-9]", "", file_path_sans_ext(files[f]))),
         fread(files[f]))
}
Try something like this, maybe.
df_list = list()
counter = 1
for(file in files){
temp_df = read.csv(paste0(path, '/', file), header=T, stringsAsFactors = F)
temp_df$year = gsub('[^0-9]', '', file)
df_list[[counter]] = temp_df
counter = counter + 1
}
big_df = do.call(rbind, df_list)
Create an empty list, then iterate through the files, reading them in. Remove any non-numeric characters from the file name to get the year (this is based on what your files look like above: some text along with the year; if the files don't look like that, you'll need a different method than the gsub I used), store it as a new variable, and then store the whole data frame in the list. Finally, bind the data frames into a single data frame.
Edit: on rereading your question, I'm not sure that what I told you to do is what you want. If you just want to load all the data frames into memory and give each one a name so that you can access it, without putting them into a single data frame, I'd probably do something like this:
df_list = list()
for(file in files){
temp_df = read.csv(paste0(path, '/', file), header=T, stringsAsFactors = F)
year = gsub('[^0-9]', '', file)
df_list[[year]] = temp_df
}
Then each dataframe can be accessed like: df_list[['2000']] would be the dataframe for the year 2000.
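The year extraction in both loops relies on gsub('[^0-9]', '', file) deleting every non-digit character. A quick sketch of what that does, using the file name from the question, and of one caveat: if the full path is passed instead of the bare file name, digits in the folder name end up in the result too.

```r
# File name pattern from the question
file <- "scrip_high_low_year_2000.txt"
gsub("[^0-9]", "", file)                 # "2000"

# With a full path, digits in folder names are kept as well
full_path <- file.path("Data_1999-2015", file)
gsub("[^0-9]", "", full_path)            # "199920152000"

# Stripping the folder first avoids this
gsub("[^0-9]", "", basename(full_path))  # "2000"
```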

How do you read multiple .txt files into R? [duplicate]

This question already has answers here:
How to import multiple .csv files at once?
(15 answers)
Closed 4 years ago.
I'm using R to visualize some data all of which is in .txt format. There are a few hundred files in a directory and I want to load it all into one table, in one shot.
Any help?
EDIT:
Listing the files is not a problem. But I am having trouble going from list to content. I've tried some of the code from here, but I get a bug with this part:
all.the.data <- lapply( all.the.files, txt , header=TRUE)
saying
Error in match.fun(FUN) : object 'txt' not found
Any snippets of code that would clarify this problem would be greatly appreciated.
You can try this:
filelist <- list.files(pattern = "\\.txt$")
# assuming tab-separated values with a header
datalist <- lapply(filelist, function(x) read.table(x, header = TRUE, sep = "\t"))
# assuming the same header/columns for all files
datafr <- do.call("rbind", datalist)
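A self-contained sketch of the same read-then-rbind approach in base R, extended with a column that records which file each row came from. The temporary folder, the file names part1.txt and part2.txt, and the column name source are made up for the demo.

```r
# Hypothetical demo data: two tab-separated files with the same columns
demo_dir <- file.path(tempdir(), "txt_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:2) {
  write.table(data.frame(x = i, y = i * 10),
              file.path(demo_dir, paste0("part", i, ".txt")),
              sep = "\t", row.names = FALSE, quote = FALSE)
}

filelist <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

# Read each file and record which file every row came from
datalist <- lapply(filelist, function(f) {
  d <- read.table(f, header = TRUE, sep = "\t")
  d$source <- basename(f)
  d
})

# Stack the per-file data frames; all must share the same column names
datafr <- do.call(rbind, datalist)
```

rbind() stops with an error if the column names differ between files, which is why the "same header/columns" assumption matters.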
There are three fast ways to read multiple files and put them into a single data frame or data table.
First, get the list of all txt files (including those in sub-folders):
list_of_files <- list.files(path = ".", recursive = TRUE,
pattern = "\\.txt$",
full.names = TRUE)
1) Use fread() w/ rbindlist() from the data.table package
#install.packages("data.table", repos = "https://cran.rstudio.com")
library(data.table)
# Read all the files and create a FileName column to store filenames
DT <- rbindlist(sapply(list_of_files, fread, simplify = FALSE),
use.names = TRUE, idcol = "FileName")
2) Use readr::read_table2() w/ purrr::map_df() from the tidyverse framework (in readr 2.0 and later, read_table2() is deprecated in favor of read_table()):
#install.packages("tidyverse",
# dependencies = TRUE, repos = "https://cran.rstudio.com")
library(tidyverse)
# Read all the files and create a FileName column to store filenames
df <- list_of_files %>%
set_names(.) %>%
map_df(read_table2, .id = "FileName")
3) (Probably the fastest out of the three) Use vroom::vroom():
#install.packages("vroom",
# dependencies = TRUE, repos = "https://cran.rstudio.com")
library(vroom)
# Read all the files and create a FileName column to store filenames
df <- vroom(list_of_files, id = "FileName")
Note: to clean up file names, use basename or gsub functions
Benchmark: readr vs data.table vs vroom for big data
Edit 1: to read multiple csv files and skip the header using readr::read_csv
list_of_files <- list.files(path = ".", recursive = TRUE,
pattern = "\\.csv$",
full.names = TRUE)
df <- list_of_files %>%
purrr::set_names(nm = (basename(.) %>% tools::file_path_sans_ext())) %>%
purrr::map_df(read_csv,
col_names = FALSE,
skip = 1,
.id = "FileName")
Edit 2: to convert a pattern including a wildcard into the equivalent regular expression, use glob2rx()
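A short sketch of what glob2rx() returns and why it matters: list.files() expects a regular expression in pattern, so a wildcard such as "*.txt" is better converted first. The three file names below are made up for illustration.

```r
# glob2rx() is in the utils package, which is attached by default
rx <- glob2rx("*.txt")
print(rx)  # "^.*\\.txt$"

# The resulting regex matches only names that end in ".txt"
grepl(rx, c("notes.txt", "notes.txt.bak", "txt_notes.csv"))  # TRUE FALSE FALSE
```

The result can be passed straight to list.files(), as in list.files(pattern = glob2rx("*.txt")).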
There is a really, really easy way to do this now: the readtext package.
readtext::readtext("path_to/your_files/*.txt")
It really is that easy.
Look at the help for the functions dir() and list.files(). They let you get a list of files, possibly filtered by a regular expression, over which you can loop.
If you want to read them all at once, the content first has to be in a single stream. One option would be to use cat to write all the files to stdout and read that through a pipe() connection (R's interface to popen). See help(Connections) for more.
Thanks for all the answers!
In the meantime, I also hacked together a method of my own. Let me know if it is of any use:
setwd("/path/to/directory")
files <- list.files()
# start from an empty character vector and append the contents of each file
data <- character(0)
for (f in files) {
  tempData <- scan(f, what = "character")
  data <- c(data, tempData)
}
