I'm trying to read 145 CSV files. The idea is to then keep just the first row of each of them. I found this suggestion on a blog about how to read them:
library(readr)
library(dplyr)  # needed for %>% and bind_rows()
files <- list.files(path = "~/Dropbox/Data/multiple_files", pattern = "*.csv", full.names = TRUE)
tbl <- sapply(files, read_csv, simplify = FALSE) %>%
  bind_rows(.id = "id")
But the columns of my csv files are separated by | and I have not been able to figure out how to read them with that separator.
How should I specify the separator in the code?
Thanks!
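One possible sketch of the same pattern with an explicit separator, assuming readr's read_delim() (read_csv()'s general-purpose sibling, which takes a delim argument) in place of read_csv():
library(readr)
library(dplyr)
files <- list.files(path = "~/Dropbox/Data/multiple_files", pattern = "\\.csv$", full.names = TRUE)
# delim = "|" names the column separator explicitly
tbl <- sapply(files, read_delim, delim = "|", simplify = FALSE) %>%
  bind_rows(.id = "id")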
Related
I'm trying to combine multiple csv files with the same variables, but some csv files have 'ZIP' as character and others have 'ZIP' as numeric, so bind_rows() won't let me combine them.
Is there any way I can adapt this to convert the 'ZIP' variable in each csv file to character so I can apply bind_rows()?
Thanks,
library(readr)
library(dplyr)
df <- list.files(path = "/Users/XXXX", pattern = "*.csv", full.names = TRUE) %>%
  lapply(read_csv) %>%
  bind_rows()
Set col_types when you read the files. Perhaps:
df <- list.files(path = "/Users/XXXX", pattern = "*.csv", full.names = TRUE) %>%
  lapply(read_csv, col_types = cols(ZIP = col_character())) %>%
  bind_rows()
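If several columns have this problem, a blunter sketch (my addition, not part of the original answer) is to read every column as character and combine first; you can then convert individual columns back with as.numeric() etc. where needed, leaving ZIP as character:
df <- list.files(path = "/Users/XXXX", pattern = "*.csv", full.names = TRUE) %>%
  lapply(read_csv, col_types = cols(.default = col_character())) %>%
  bind_rows()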
I need to open 100 large ndjson files (with the same columns).
I have prepared a script to apply to each file, but I would not like to repeat this 100 times!
With ndjson::stream_in, I can only open one ndjson file into R as a data frame.
I tried the process for opening multiple csv files and consolidating them into a single dataframe, but it does not work with ndjson files :(
library(data.table)
library(purrr)
map_df_fread <- function(path, pattern = "*.ndjson") {
  list.files(path, pattern, full.names = TRUE) %>%
    map_df(~fread(., stringsAsFactors = FALSE))
}
myfiles <-
  list.files(path = "C:/Users/sandrine/Documents/Projet/CAD/A/",
             pattern = "*.ndjson",
             full.names = TRUE) %>%
  map_df_fread(~fread(., stringsAsFactors = FALSE))
I also tried to find a package to convert ndjson files into csv, but did not find any.
Any idea?
Using the approach you mentioned first, does this work?
library(tidyverse)
library(ndjson)
final_df <-
  list.files(path = "C:/Users/sandrine/Documents/Projet/CAD/A/",
             pattern = "*.ndjson",
             full.names = TRUE) %>%
  map_dfr(~stream_in(.))
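If you also want to record which file each row came from, one sketch (the set_names() step and the source_file column name are my additions, not part of the original answer):
final_df <-
  list.files(path = "C:/Users/sandrine/Documents/Projet/CAD/A/",
             pattern = "*.ndjson",
             full.names = TRUE) %>%
  set_names(basename(.)) %>%               # name each element after its file
  map_dfr(~stream_in(.), .id = "source_file")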
I have hundreds of csv files (all in a folder "Project A"), each containing the same columns, but the first five rows are not part of the data frame.
I need to merge the rows of every csv file starting from row 6 and create a master sheet in R. Here is my code.
library(plyr)
library(readr)
myfiles <- list.files(path = "~/Projects/Project A", pattern = "*.csv", full.names = TRUE)
myfiles
do.call("rbind", lapply(myfiles, read.csv, header = TRUE))
How do I skip the first 5 rows? I know I should use skip = 5, but I am not sure where to put it or whether it can be integrated here.
I don't have a good way to test this, but I think this will work (the slice() needs to happen per file, before the rbind, so that each file is trimmed individually):
library(tidyverse)
do.call("rbind", lapply(myfiles, function(f) read.csv(f, header = TRUE) %>% slice(5:n())))
Or, as James pointed out:
do.call("rbind", lapply(myfiles, read.csv, skip = 5, header = TRUE))
This question already has answers here:
Read multiple CSV files into separate data frames
I have N .tsv files saved in a folder named "data" inside my RStudio working directory, and I want to import them as separate data frames all at once. Below is an example of doing it one by one, but there are too many of them and I want something faster. Also, their total number may be different each time.
#read files into R
f1<-read.table(file = 'a_CompositeSources/In1B1A_WDNdb_DrugTargetInteractions_CompositeDBs_Adhesion.tsv', sep = '\t', header = TRUE)
f2<-read.table(file = 'a_CompositeSources/In1B2A_WDNdb_DrugTargetInteractions_CompositeDBs_Cytochrome.tsv', sep = '\t', header = TRUE)
I have used:
library(readr)
library(dplyr)
files <- list.files(path = "C:/Users/user/Documents/kate/data", pattern = "*.tsv", full.names = T)
tbl <- sapply(files, read_tsv, simplify = FALSE) %>%
  bind_rows(.id = "id")
## Read files named xyz1111.tsv, xyz2222.tsv, etc.
filenames <- list.files(path = "C:/Users/user/Documents/kate/data",
                        pattern = "*.tsv")
## Create list of data frame names without the ".tsv" part
names <- gsub("\\.tsv$", "", filenames)
## Load all files
for (i in names) {
  filepath <- file.path("C:/Users/user/Documents/kate/data", paste(i, ".tsv", sep = ""))
  assign(i, read.delim(filepath,
                       colClasses = c("factor", "character", rep("numeric", 2)),
                       sep = "\t"))
}
but only the 1st file is read.
If you have all the .tsv files in one folder, you can read them into a list using lapply or a for loop:
files_to_read <- list.files(path = "a_CompositeSources/",
                            pattern = "\\.tsv$",
                            full.names = TRUE)
all_files <- lapply(files_to_read, function(x) {
  read.table(file = x,
             sep = '\t',
             header = TRUE)
})
If you need to reference the files by name you could do names(all_files) <- files_to_read. You could then go ahead and combine them into one dataframe using bind_rows from the dplyr package or simply work with the list of dataframes.
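A minimal sketch of that last step (assuming dplyr is installed; the basename() call is just to make the names shorter):
library(dplyr)
# Name each list element after its source file
names(all_files) <- basename(files_to_read)
# Combine into one data frame, keeping the file name in a "source_file" column
combined <- bind_rows(all_files, .id = "source_file")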
This question already has answers here:
How to import multiple .csv files at once?
I'm using R to visualize some data, all of which is in .txt format. There are a few hundred files in a directory, and I want to load them all into one table in one shot.
Any help?
EDIT:
Listing the files is not a problem. But I am having trouble going from list to content. I've tried some of the code from here, but I get a bug with this part:
all.the.data <- lapply( all.the.files, txt , header=TRUE)
saying
Error in match.fun(FUN) : object 'txt' not found
Any snippets of code that would clarify this problem would be greatly appreciated.
You can try this:
filelist = list.files(pattern = "\\.txt$")
# assuming tab-separated values with a header
datalist = lapply(filelist, function(x) read.table(x, header = TRUE))
# assuming the same header/columns for all files
datafr = do.call("rbind", datalist)
There are three fast ways to read multiple files and put them into a single data frame or data table.
First, get the list of all txt files (including those in sub-folders):
list_of_files <- list.files(path = ".", recursive = TRUE,
                            pattern = "\\.txt$",
                            full.names = TRUE)
1) Use fread() w/ rbindlist() from the data.table package
#install.packages("data.table", repos = "https://cran.rstudio.com")
library(data.table)
# Read all the files and create a FileName column to store filenames
DT <- rbindlist(sapply(list_of_files, fread, simplify = FALSE),
                use.names = TRUE, idcol = "FileName")
2) Use readr::read_table2() w/ purrr::map_df() from the tidyverse framework:
#install.packages("tidyverse",
# dependencies = TRUE, repos = "https://cran.rstudio.com")
library(tidyverse)
# Read all the files and create a FileName column to store filenames
df <- list_of_files %>%
  set_names(.) %>%
  map_df(read_table2, .id = "FileName")
3) (Probably the fastest out of the three) Use vroom::vroom():
#install.packages("vroom",
# dependencies = TRUE, repos = "https://cran.rstudio.com")
library(vroom)
# Read all the files and create a FileName column to store filenames
df <- vroom(list_of_files, id = "FileName")
Note: to clean up file names, use basename or gsub functions
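For example, a minimal sketch of that cleanup (assuming the FileName column created above and dplyr loaded):
df <- df %>%
  mutate(FileName = basename(FileName))  # drop the directory part of each path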
Benchmark: readr vs data.table vs vroom for big data
Edit 1: to read multiple csv files and skip the header using readr::read_csv
list_of_files <- list.files(path = ".", recursive = TRUE,
                            pattern = "\\.csv$",
                            full.names = TRUE)
df <- list_of_files %>%
  purrr::set_names(nm = (basename(.) %>% tools::file_path_sans_ext())) %>%
  purrr::map_df(read_csv,
                col_names = FALSE,
                skip = 1,
                .id = "FileName")
Edit 2: to convert a pattern including a wildcard into the equivalent regular expression, use glob2rx()
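For instance, glob2rx("*.csv") yields the kind of pattern used above:
glob2rx("*.csv")
#> [1] "^.*\\.csv$"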
There is a really, really easy way to do this now: the readtext package.
readtext::readtext("path_to/your_files/*.txt")
It really is that easy.
Look at the help for the functions dir() aka list.files(). This allows you to get a list of files, possibly filtered by regular expressions, over which you could loop.
If you want to read them all at once, you first have to have the content in one file. One option would be to use cat to type all files to stdout and read that using popen(). See help(Connections) for more.
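A sketch of that idea on a Unix-alike (my illustration, assuming the files live in one directory and share the same header-less column layout):
# cat concatenates every .txt file; pipe() exposes the combined stream as a connection
all_data <- read.table(pipe("cat /path/to/directory/*.txt"), header = FALSE)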
Thanks for all the answers!
In the meanwhile, I also hacked together a method of my own. Let me know if it is of any use:
setwd("/path/to/directory")
files <- list.files()
data <- character(0)  # start empty; initializing with 0 would leave a stray "0" element
for (f in files) {
  tempData <- scan(f, what = "character")
  data <- c(data, tempData)
}
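A slightly more idiomatic sketch of the same hack (my rewrite: no loop and no growing vector):
files <- list.files("/path/to/directory", full.names = TRUE)
data <- unlist(lapply(files, scan, what = "character"))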