Open 100 files in R

I need to read many files with data, but I can't make it work.
For example, I have 6 ASCII files named "rain", "wind", and so on.
This is what I tried:
namelist <- c("rain","wind","sunshine hour","radiation","soil moisture","pressure")
for (i in 1:6){
  metedata <- read.table('d:/namelist[i].txt')
  metedata
}
But that didn't work. What should I do?

Try this: your version failed because 'd:/namelist[i].txt' is a literal string, so namelist[i] is never substituted. Build the file name with paste0() instead:
namelist <- c("rain","wind","sunshine hour","radiation","soil moisture","pressure")
for (name in namelist){
  metedata <- read.table(paste0('d:/', name, '.txt'))
  print(metedata)
}
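Note that the loop above keeps only the last file's table in metedata once it finishes. A runnable sketch of the same idea that collects each table in a named list instead (it writes made-up demo files to a temporary directory so it runs anywhere; the names are a subset of those in the question):

```r
# Runnable sketch: write small demo files to a temp directory, then read
# them back into a named list, one data.frame per variable name.
dir <- file.path(tempdir(), "met_demo")
dir.create(dir, showWarnings = FALSE)

namelist <- c("rain", "wind", "pressure")  # subset of the names in the question
for (name in namelist) {
  write.table(data.frame(value = 1:3), file.path(dir, paste0(name, ".txt")))
}

metedata <- vector("list", length(namelist))
names(metedata) <- namelist
for (name in namelist) {
  metedata[[name]] <- read.table(file.path(dir, paste0(name, ".txt")))
}

str(metedata[["rain"]])  # each table is now retrievable by name
```

Each table stays accessible after the loop, e.g. metedata[["wind"]], instead of being overwritten on every pass.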

Or read them into a list using lapply. Assuming your working directory is in the location of the files:
dat = lapply(list.files(pattern = "txt"), read.table)
This makes a list of all the .txt files in your working directory, calls read.table on each, and returns a list of their contents.
Or directly read them into one big data.frame:
library(plyr)
dat = ldply(list.files(pattern = "txt"), read.table)
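One detail worth adding to the list approach: the result of lapply is unnamed, so it can be hard to tell which element came from which file. A runnable sketch (using demo files in a temporary directory) that names each element after its source file:

```r
# Runnable sketch: read all .txt files from a directory into a list and
# name each element after the file it came from.
dir <- file.path(tempdir(), "txt_demo")
dir.create(dir, showWarnings = FALSE)
for (f in c("a.txt", "b.txt")) {
  write.table(data.frame(x = 1:2), file.path(dir, f), row.names = FALSE)
}

files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
dat <- lapply(files, read.table, header = TRUE)
names(dat) <- tools::file_path_sans_ext(basename(files))

dat[["a"]]  # the table read from a.txt
```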

Related

R - read_csv without using paste

Use the read_csv function to read each of the files in the files object, created with the code below:
path <- system.file("extdata", package = "dslabs")
files <- list.files(path)
files
I tried the code below but I get a "vroom" error. Please help.
for (f in files){
read_csv(f)
}
First, in list.files you should set full.names=TRUE to include the whole path. Next, if you look into files, there are also .xls and .pdf files included. You may want to filter just for .csv files, which can easily be done using grep.
files <- list.files(path, full.names=TRUE)
files <- grep('.csv$', files, value=TRUE)
However, even then readr::read_csv complains about column issues.
lst <- readr::read_csv(files)
# Error: Files must all have 2 columns:
# * File 2 has 57 columns
To avoid editing the columns by hand, I recommend using rio::import_list instead, which gives just a warning that a column name was guessed and can be changed if needed. You may even include the .xls files in the grep.
files <- grep('.csv$|.xls', files, value=TRUE)
lst <- rio::import_list(files)
Note that rio::import_list (as well as readr::read_csv) is vectorized, so you won't need a loop.
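A related convenience worth knowing (a sketch, assuming the rio package is installed; rbind is an argument of rio::import_list per its documentation): the imported files can also be stacked into one data.frame directly, with a "_file" column recording which file each row came from.

```r
# Sketch (assumes rio is installed and `path` is the dslabs extdata
# directory from above): stack all .csv files into one data.frame.
library(rio)
files <- grep("\\.csv$", list.files(path, full.names = TRUE), value = TRUE)
combined <- import_list(files, rbind = TRUE)
head(combined)
```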
Data:
path <- system.file("extdata", package="dslabs")

Import directory of docx files

I have a directory with .docx files that I want to import via textreadr's read_docx function.
First I set the working directory and create list of files:
setwd("C:/R")
files <- list.files("C:/R", pattern = "\\.docx")
Now I want to iterate through the list and import every file individually, named data_"file":
for (file in files) {
assign("data_", file, sep = "") <- read_docx("file")
}
Alternatively, I tried creating a list of lists:
data_list <- lapply(files, function(v){
read_docx("v")
})
Neither variant works and I'm not sure what I'm doing wrong.
Maybe the full path is not present; we can add
files <- list.files("C:/R", pattern = "\\.docx", full.names = TRUE)
The issue is that v or file is quoted, i.e. the code tries to read the literal string "v" instead of the value of the variable. Thus, the code in the OP's post can be corrected to
data_list <- lapply(files, function(v){
read_docx(v)
})
or in the for loop
for (file in files) {
assign(paste0("data_", file), read_docx(file))
}
Also, as noted in the comments, if there are 1000 files, assign creates 1000 new objects, which is messy when we want to gather all of them again. Instead, as with lapply, which creates a single list, the output of the for loop can be stored in a list:
data_list2 <- vector('list', length(files))
names(data_list2) <- files
for(file in files) {
data_list2[[file]] <- read_docx(file)
}
First off, you need to grab the full path instead of just the filenames from list.files:
files <- list.files("C:/R", pattern = "\\.docx$", full.names = TRUE)
Then the lapply solution works if you pass the parameter v to read_docx instead of a literal string "v". You don’t even need the nested function:
data_list <- lapply(files, read_docx)
As an aside, there’s no need for setwd in your code, and its use is strongly discouraged.
Furthermore, using the assign function as in your code doesn’t work and even after fixing the syntax, this use is simply completely inappropriate: at best it is a hack that approximates the functionality of lists, but badly. The correct solution, 10 times out of 10, is to use a named list or vector in its place.
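To make the named-list advice concrete, here is a minimal runnable sketch; it substitutes plain readLines on temporary text files for read_docx, so it runs without the textreadr package, but the pattern is identical:

```r
# Runnable sketch of "a named list instead of assign()": one element per
# document, looked up by file name rather than via 1000 stray objects.
dir <- file.path(tempdir(), "docs_demo")
dir.create(dir, showWarnings = FALSE)
writeLines("hello", file.path(dir, "a.txt"))
writeLines("world", file.path(dir, "b.txt"))

files <- list.files(dir, full.names = TRUE)
data_list <- setNames(lapply(files, readLines), basename(files))

data_list[["a.txt"]]  # one document's contents, by name
```

With read_docx in place of readLines, the same two lines replace the entire assign-based loop.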

reading multiple csv files using data.table doesn't work when given files path, possible bug?

I want to read multiple csv files where I only read two columns from each. So my code is this:
library(data.table)
files <- list.files(pattern="C:\\Users\\XYZ\\PROJECT\\NAME\\venv\\RawCSV_firstBatch\\*.csv")
temp <- lapply(files, function(x) fread(x, select = c("screenNames", "retweetUserScreenName")))
data <- rbindlist(temp)
This yields character(0). However when I move those csv files out to where my script is, and change the files to this:
files <- list.files(pattern="*.csv")
#....
My dir() output is this:
[1] "adjaceny_list.R" "cleanusrnms_firstbatch"
[3] "RawCSV_firstBatch" "username_cutter.py"
everything gets read. Could you help me track down what exactly is going on, please? The folder that contains these csv files is in the same directory as the script. So even if I do pattern="RawCSV_firstBatch\\*.csv", same problem.
EDIT:
also did:
files <- list.files(path="C:\\Users\\XYZ\\PROJECT\\NAME\\venv\\RawCSV_firstBatch\\",pattern="*.csv")
#and
files <- list.files(pattern="C:/Users/XYZ/PROJECT/NAME/venv/RawCSV_firstBatch/*.csv")
Both yielded empty data frame.
@NelsonGon mentioned a workaround:
Do something like: list.files("./path/folder", pattern="*.csv$"). Use .. or . as required. (Not sure about using the actual path.) Can also utilise ~.
So that works, thank you. (Sorry, I have a 2-day limit before I can tick this as the answer.)
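The underlying rule: pattern is a regular expression matched against file names only, so a path (or a wildcard like *.csv) does not belong there. The directory goes in path, and full.names = TRUE returns paths that are readable from any working directory. A runnable sketch with a temporary directory:

```r
# Runnable sketch: directory in path=, a regex in pattern=, and
# full.names = TRUE so fread() can find the files from anywhere.
dir <- file.path(tempdir(), "csv_demo")
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(screenNames = "a", retweetUserScreenName = "b"),
          file.path(dir, "t1.csv"), row.names = FALSE)

files <- list.files(path = dir, pattern = "\\.csv$", full.names = TRUE)
basename(files)  # "t1.csv", with the full path attached in `files`
```

With a files vector built this way, the lapply/fread/rbindlist code from the question works unchanged.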

Read files with a specific a extension from a folder in R

I want to read files with extension .output with the function read.table.
I used pattern=".output" but it's not correct.
Any suggestions?
As an example, here's how you could read in files with the extension ".output" and create a list of tables:
list.filenames <- list.files(pattern="\\.output$")
trialsdata <- lapply(list.filenames,read.table,sep="\t")
or, if you just want to read them one at a time manually, just include the extension in the filename argument.
read.table("ACF.output",sep=...)
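If the per-file tables share the same columns, the list produced by lapply can also be stacked into a single data.frame with base R. A runnable sketch using demo .output files in a temporary directory (the file names and columns here are made up for illustration):

```r
# Runnable sketch: read every .output file into a list, then stack the
# tables with do.call(rbind, ...), assuming they share the same columns.
dir <- file.path(tempdir(), "output_demo")
dir.create(dir, showWarnings = FALSE)
for (f in c("ACF.output", "CCF.output")) {
  write.table(data.frame(lag = 0:1, value = c(1, 0.5)),
              file.path(dir, f), sep = "\t", row.names = FALSE)
}

list.filenames <- list.files(dir, pattern = "\\.output$", full.names = TRUE)
trialsdata <- lapply(list.filenames, read.table, sep = "\t", header = TRUE)
combined <- do.call(rbind, trialsdata)
nrow(combined)  # 4 rows: two files of two rows each
```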
So finally, because I didn't find a solution (something is going wrong with my path), I made a text file listing all the .output files with ls *.output > data.txt.
After that, using:
files = read.table("./data.txt")
I make a data.frame containing all my file names, and with
files[] <- lapply(files, as.character)
Finally, with test = read.table(files[i,], header=F, row.names=1)
we can read the file stored in line i.

Read several PDF files into R with pdf_text

I have several PDF files in my directory. I have downloaded them previously, no big deal so far.
I want to read all those files in R. My idea was to use the "pdf_text" function from the "pdftools" package and write a formula like this:
mypdftext <- pdf_text(files)
Where "files" is an object that gathers all the PDF file names, so that I don't have to write them all manually. Because I have downloaded a lot of files, it would save me writing:
mypdftext <- pdf_text("file1.pdf", "file2.pdf", and many more files...)
To create the object "pdflist", I used "files <- list.files (pattern = "pdf$")"
The “files” vector contains all the PDF file names.
But "files" does not work with pdf_text function, probably because it's a vector. What can I do instead?
Maybe this is not the best solution, but it works for me:
library(pdftools)
# Set your path here.
your_path <- 'C:/Users/.../pdf_folder'
setwd(your_path)
# List the PDF files in the directory.
lf <- list.files(path = getwd(), pattern = "\\.pdf$", full.names = FALSE)
# Create a list to fill.
my_pdfs <- vector("list", length(lf))
# Iterate: assign the text of each file to an element of the list.
for (i in seq_along(lf)) { my_pdfs[[i]] <- pdf_text(lf[i]) }
# Calling the first pdf of the list.
my_pdfs[[1]]
Each PDF's text is stored in its own element of the list, so you can then work with each one however you want. Does this solve your problem?
You could try using lapply over the vector that contains the location of every pdf file (files). I would recommend using list.files(..., full.names = T) to get the complete location of each pdf file. This should work.
mypdfs<-lapply(files, pdf_text)
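Following the lapply answer, naming the result makes it easy to track which text came from which file. A sketch, assuming the pdftools package is installed and that PDFs exist in the working directory ("report.pdf" below is a hypothetical file name):

```r
# Sketch (pdftools assumed installed): read every PDF in the directory
# and name each list element after its file.
library(pdftools)
files <- list.files(pattern = "\\.pdf$", full.names = TRUE)
mypdfs <- setNames(lapply(files, pdf_text), basename(files))
# mypdfs[["report.pdf"]] would then be a character vector with one
# element per page ("report.pdf" is a hypothetical name).
```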
