Reading and saving large rds files in a single rds file - r

I have a list that contains many large files. All the files have the same column names. I want to combine them into a single rds file and save it.
list.nam<- list.files(pattern="*.I S")
list.fil <- lapply (list.nam, readRDs)
Error in match.fun(FUN) : object 'readRDs' not found

You have entered an incorrect function name. R is case-sensitive, so replace readRDs with readRDS and it works:
list.fil <- lapply (list.nam, readRDS)
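Once the files are read, the remaining step from the question (combining them and saving one rds file) could look like the minimal sketch below. It assumes every file holds a data frame with identical column names; the folder, file names and example data are made up for illustration.

```r
# Create three small example .rds files in a temporary folder
# (hypothetical data standing in for the large files).
folder <- file.path(tempdir(), "rds_demo")
dir.create(folder, showWarnings = FALSE)
for (i in 1:3) {
  saveRDS(data.frame(id = i, value = c(10, 20) * i),
          file.path(folder, paste0("part", i, ".rds")))
}

# List the files with their full paths, then read each one into a list.
list.nam <- list.files(folder, pattern = "\\.rds$", full.names = TRUE)
list.fil <- lapply(list.nam, readRDS)

# Stack the data frames (this relies on identical column names)
# and write the result to a single .rds file outside the input folder.
combined <- do.call(rbind, list.fil)
out_file <- file.path(tempdir(), "combined.rds")
saveRDS(combined, out_file)
```

For very large inputs, do.call(rbind, ...) can be slow and memory-hungry, so this is only a starting point.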

Related

Read several PDF files into R with pdf_text

I have several PDF files in my directory. I have downloaded them previously, no big deal so far.
I want to read all those files in R. My idea was to use the "pdf_text" function from the "pdftools" package and write a formula like this:
mypdftext <- pdf_text(files)
Where "files" is an object that gathers all the PDF file names, so that I don't have to write all the names manually. Because I have actually downloaded a lot of files, it would save me from writing:
mypdftext <- pdf_text("file1.pdf", "file2.pdf", and many more files...)
To create the object "files", I used "files <- list.files (pattern = "pdf$")"
The “files” vector contains all the PDF file names.
But "files" does not work with pdf_text function, probably because it's a vector. What can I do instead?
Maybe this is not the best solution, but this works for me:
library(pdftools)
# Set your path here.
your_path = 'C:/Users/.../pdf_folder'
setwd(your_path)
getwd()
lf = list.files(path=getwd(), pattern=NULL, all.files=FALSE, full.names=FALSE)
#Create an empty list to fill
my_pdfs = list()
#Iterate. Assign the text of each file to one element of the list.
#pdf_text returns one string per page, so [[i]] is needed to store the whole vector.
for (i in seq_along(lf)){my_pdfs[[i]] <- pdf_text(lf[i])}
#Calling the first pdf of the list.
my_pdfs[[1]]
Then you can assign each PDF to whatever object you want. The text of each file is stored in its own element of the list. Does this solve your problem?
You could try using lapply over the vector that contains the location of every pdf file (files). I would recommend using list.files(..., full.names = TRUE) to get the complete path of each pdf file. This should work.
mypdfs<-lapply(files, pdf_text)
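The lapply pattern works for any reader that accepts one file at a time. Below is a minimal sketch with a hypothetical helper, read_all, that also names each result after its file. So that it runs without pdftools installed, it uses text files and readLines as a stand-in; for PDFs one would pass pdf_text as the reader and use a pattern such as "\\.pdf$".

```r
# Hypothetical helper: apply a single-file reader to every file and
# name each result after its file.
read_all <- function(files, reader) {
  out <- lapply(files, reader)
  names(out) <- basename(files)
  out
}

# Stand-in demo with text files and readLines(), so the sketch runs
# without pdftools; for PDFs one would pass pdf_text instead.
folder <- file.path(tempdir(), "read_all_demo")
dir.create(folder, showWarnings = FALSE)
writeLines(c("page one", "page two"), file.path(folder, "a.txt"))
writeLines("single page", file.path(folder, "b.txt"))

files <- list.files(folder, pattern = "\\.txt$", full.names = TRUE)
docs <- read_all(files, readLines)
docs[["a.txt"]]
```

Naming the elements makes it possible to look up a document by file name instead of by position.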

Looping over a set of standardized files to collect information and save it in a different files

I have several files in a folder. They all have the same layout, and I have extracted the information I want from them.
So now, for each file, I want to write a .csv file and name it after the original input file and add "_output" to it.
However, I don't want to repeat this process manually for each file. I want to loop over them. I looked for help online and found lots of great tips, including many in here.
Here's what I tried:
#Set directory
dir = setwd("D:/FRhData/elb") #set directory
filelist = list.files(dir) #save file names into filelist
myfile = matrix()
#Read files into R
for ( i in 1:length(filelist)){
myfile[i] = readLines(filelist[i])
*code with all calculations*
write.csv(x = finalDF, file = paste (filename[i] ,"_output. csv")
}
Unfortunately, it didn't work out. Here's the error message I get:
Error in as.character(x) :
cannot coerce type 'closure' to vector of type 'character'
In addition: Warning message:
In myfile[i] <- readLines(filelist[i]) :
number of items to replace is not a multiple of replacement length
And 'report2016-03.txt' is the name of the first file the code should be executed on.
Does anyone know what I should do to correct this mistake - or any other possible mistakes you can foresee?
Thanks a lot.
======================================================================
Here's some of the resources I used:
https://www.r-bloggers.com/looping-through-files/
How to iterate over file names in a R script?
Looping through files in R
Loop in R loading files
How to loop through a folder of CSV files in R
This worked for me. I used a vector instead of a matrix, took out the readLines() call and used paste0 since there was no separator.
dir = "C:/R_projects"
setwd(dir) #set directory (setwd() returns the previous directory, so keep the path in its own variable)
filelist = list.files(dir) #save file names into filelist
myfile = vector()
finalDF <- data.frame(a=3, b=2)
#Loop over the file names and write one output file per input file
for ( i in seq_along(filelist)){
myfile[i] = filelist[i]
write.csv(x = finalDF, file = paste0(myfile[i] ,"_output.csv"))
}
list.files(dir)
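If the calculations need the file contents, the loop could read each file, compute, and write the result in one pass. This is a minimal sketch under assumptions: the temporary folder, the second file name and the line-count "calculation" are placeholders, and tools::file_path_sans_ext is used so the output name does not keep the ".txt" extension.

```r
# Demo input: two small text files in a temporary folder
# (hypothetical stand-ins for the report files).
in_dir <- file.path(tempdir(), "loop_demo")
dir.create(in_dir, showWarnings = FALSE)
writeLines(c("a", "b", "c"), file.path(in_dir, "report2016-03.txt"))
writeLines(c("d", "e"), file.path(in_dir, "report2016-04.txt"))

filelist <- list.files(in_dir, pattern = "\\.txt$", full.names = TRUE)

for (f in filelist) {
  lines <- readLines(f)  # all lines of one file
  # Placeholder for the real calculations: count the lines.
  finalDF <- data.frame(file = basename(f), n_lines = length(lines))
  # Drop the ".txt" extension and append "_output.csv".
  out <- paste0(tools::file_path_sans_ext(f), "_output.csv")
  write.csv(finalDF, file = out, row.names = FALSE)
}

list.files(in_dir, pattern = "_output\\.csv$")
```

Reading into a local variable inside the loop avoids the "number of items to replace" warning, which came from assigning all lines of a file to a single element myfile[i].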

import multiple txt files into R

I am working with MODIS 8-day data and am trying to import all the txt files of one MODIS product into R, not as one single data.frame but as individual objects, so that I can later apply the same functions to them. The main objective is to extract specific elements from each txt file. I was successful in extracting the desired elements from one txt file with the following commands:
# selecting the element within the table
idxs <- gsub("\\]",")", gsub("\\[", "c(", "[24,175], [47,977], [159,520], [163,530]
,[165,721], [168,56], [217,820],[243,397],[252,991],[284,277],[292,673]
,[322,775], [369,832], [396,872], [434,986],[521,563],[522,717],[604,554]
,[608,50],[614,69],[752,213],[780,535],[786,898],[788,1008],[853,1159],[1014,785],[1078,1070]") )
lst <- rbind( c(24,175), c(47,977), c(159,520), c(163,530) ,c(165,721), c(168,56), c(217,820),c(243,397),c(252,991),c(284,277),c(292,673),c(322,775), c(369,832), c(396,872), c(434,986),c(521,563),c(522,717),c(604,554),c(608,50),c(614,69),c(752,213),c(780,535),c(786,898),c(788,1008),c(853,1159),c(1014,785),c(1078,1070))
mat <- matrix(scan("lst.txt",skip = 6),nrow=1200)
Clist <- as.data.frame(mat[lst])
But I need these elements from all of the txt files, and honestly I do not want to run it manually 871 times. So I tried to read all the txt files and then apply this function to them, but unfortunately it does not work. Here is my approach:
folder <- "C:/Users/Documents/R/MODIS/txt/"
txt_files <- list.files(path=folder, pattern=".txt")
df= c(rep(data.frame(), length(txt_files)))
for(i in 1:length(txt_files)) {df[[i]]<- as.list(read.table(txt_files[i]))}
and this is the error I encounter:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file 'rastert_a2001361.txt': No such file or directory
Additional information: each txt file includes 1200 rows and 1200 columns, and 20-30 elements need to be extracted from the table.
I am very much looking forward to your answers and appreciate any help or recommendations on this matter.
The issue is that list.files returns only the file names within the folder, not the full paths to the files. If your working directory is not "C:/Users/Documents/R/MODIS/txt/", your code will not work. Change your code to
for(i in 1:length(txt_files)) {df[[i]]<- as.list(read.table(file.path(folder, txt_files[i])))}
Now it should be working.
file.path combines your path and your file name with the correct, OS-specific path separator.
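Putting the pieces together, the same cells could be extracted from every file with lapply. This is a minimal sketch under assumptions: small 4 x 5 demo grids stand in for the 1200 x 1200 files, the positions in lst are made up, and read.table is used for reading. For the real files, the reading step would need to match the single-file version from the question (scan with skip = 6 and nrow = 1200).

```r
# Demo data: three small 4 x 5 grids written as whitespace-separated text
# (hypothetical stand-ins for the 1200 x 1200 MODIS files).
folder <- file.path(tempdir(), "modis_demo")
dir.create(folder, showWarnings = FALSE)
for (k in 1:3) {
  write.table(matrix(k * 100 + 1:20, nrow = 4),
              file.path(folder, paste0("grid", k, ".txt")),
              row.names = FALSE, col.names = FALSE)
}

# Two-column matrix of (row, column) positions to extract.
lst <- rbind(c(1, 1), c(2, 3), c(4, 5))

# full.names = TRUE returns paths that can be opened from any working directory.
txt_files <- list.files(folder, pattern = "\\.txt$", full.names = TRUE)

# Read each file as a matrix and pull out the same cells from each.
extracted <- lapply(txt_files, function(f) {
  mat <- as.matrix(read.table(f))
  mat[lst]  # indexing by a two-column matrix picks individual cells
})
names(extracted) <- basename(txt_files)
```

Each element of extracted is then a vector with one value per requested position, named after its source file.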

Save data.frame objects into .Rds files within a loop

I have data.frame objects with normalized names in my global environment and I want to save them into .Rda files.
My first question is: should I save them into one big .Rda file, or should I create one file for each data frame? (Each data frame has 14 columns and ~260,000 rows.)
Assuming that I'll save them into different files, I was thinking about a function like this (all my data.frame names begin with "errDatas"):
sapply(ls(pattern = "errDatas"), function(x) save(as.name(x), file = paste0(x, ".Rda")))
But I get this error:
Error in save(as.name(x), file = paste0(x, ".Rda")) :
object 'as.name(x)' not found
It seems that save doesn't evaluate as.name(x) and instead takes it literally as an object name. I also tried eval(parse(text = x)), but the result is the same.
Do you have an idea how I can save my data frames within a loop? Thanks.
And I have a bonus question, to know whether what I'm trying to do is useful and legitimate:
These data frames come from csv files (one data frame per csv file, which I import with read.csv). Each day I have one new csv file, and I want to do some analysis on all the csv files. I realized that reading from csv is much slower than saving and loading an Rda file. So instead of reading all the csv files each time I run my program, I actually want to read each csv file only once, save it into an Rda file and load that afterwards. Is this a good idea? Are there best practices for that in R?
Use the list= parameter of the save function. This allows you to specify the name of the object as a character vector rather than passing the object itself. For example
lapply(ls(pattern = "errDatas"), function(x) {
save(list=x, file = paste0(x, ".Rda"))
})
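A self-contained round trip could look like the minimal sketch below. The two small data frames, the temporary output folder and the explicit env variable are assumptions added for illustration; passing envir explicitly just makes clear where save should look up the names.

```r
# Example objects (hypothetical data) following the naming scheme.
errDatas_a <- data.frame(x = 1:3)
errDatas_b <- data.frame(x = 4:6)

out_dir <- file.path(tempdir(), "rda_demo")
dir.create(out_dir, showWarnings = FALSE)

# The environment holding the data frames (the global env at top level).
env <- environment()

# save(list = ) takes object names as strings, so it works inside a loop.
for (nm in ls(env, pattern = "^errDatas")) {
  save(list = nm, file = file.path(out_dir, paste0(nm, ".Rda")), envir = env)
}

# load() restores an object under its original name.
restored <- new.env()
load(file.path(out_dir, "errDatas_a.Rda"), envir = restored)
ls(restored)
```

An alternative for one object per file is saveRDS(get(nm), ...) with readRDS, which lets the caller choose the name on reading.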

Open multiple files and assign to individual variables using R

I have 100 text files with matrices which I want to open using R. The read.table() command can be used for that.
I can't figure out how to assign these files to separate variable names so that I can carry out operations on the matrices.
I am trying to use a for loop but keep getting error messages.
I hope somebody can help me out with this.
If you have 100 files, it may make more sense to simply keep them in one neat list.
# Get the list of files
#----------------------------#
folder <- "path/to/files"
fileList <- dir(folder, recursive=TRUE) # grep through these, if you are not loading them all
# use platform appropriate separator
files <- paste(folder, fileList, sep=.Platform$file.sep)
# Read them in
#----------------------------#
myMatrices <- lapply(files, read.table)
Then access them via, e.g., myMatrices[[37]], or operate on all of them using lapply.
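To illustrate operating on the whole list, here is a minimal sketch that reads a few demo matrices, names the list elements after their files, and applies one operation to each. The temporary folder, the 2 x 3 demo matrices and the choice of sum as the operation are assumptions for illustration.

```r
# Demo data: three small matrices written to text files
# (hypothetical stand-ins for the 100 files).
folder <- file.path(tempdir(), "matrix_demo")
dir.create(folder, showWarnings = FALSE)
for (k in 1:3) {
  write.table(matrix(1:6 * k, nrow = 2),
              file.path(folder, paste0("m", k, ".txt")),
              row.names = FALSE, col.names = FALSE)
}

files <- list.files(folder, pattern = "\\.txt$", full.names = TRUE)

# Read every file into one list and name the elements after the files.
myMatrices <- lapply(files, function(f) as.matrix(read.table(f)))
names(myMatrices) <- basename(files)

# One operation applied to every matrix: the sum of its entries.
totals <- sapply(myMatrices, sum)
totals
```

A named list gives most of the convenience of separate variables (myMatrices[["m2.txt"]]) without creating 100 objects in the workspace.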
Would it be easier to just use list.files?
For example:
files <- list.files("directory/path", pattern = "regexp.if.needed")
And then you could access each element by calling files[1], files[2], etc. This would allow you to pull out either all the files in a directory, or just the ones that matched a regular expression.
