Open multiple files and assign to individual variables using R

I have 100 text files containing matrices that I want to open in R; the read.table() command can be used for that.
I can't figure out how to assign these files to separate variable names so that I can carry out operations on the matrices.
I am trying to use a for loop but keep getting error messages.
I hope somebody can help me out with this...

If you have 100 files, it may make more sense to simply keep them in one neat list.
# Get the list of files
#----------------------------#
folder <- "path/to/files"
fileList <- dir(folder, recursive=TRUE) # grep through these, if you are not loading them all
# use platform appropriate separator
files <- paste(folder, fileList, sep=.Platform$file.sep)
# Read them in
#----------------------------#
myMatrices <- lapply(files, read.table)
Then access them via, e.g., myMatrices[[37]], or operate on every element at once with lapply.
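For instance, a small sketch of working with the whole list at once (assuming each file really holds a numeric matrix; the index 37 is only an example):
# convert each data.frame returned by read.table into a matrix
myMatrices <- lapply(myMatrices, as.matrix)
# apply the same operation to every matrix, e.g. get its dimensions
dims <- lapply(myMatrices, dim)
# or work with a single element
colSums(myMatrices[[37]])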

Would it be easier to just use list.files?
For example:
files <- list.files("directory/path", pattern = "regexp.if.needed")
And then you could access each element by calling files[1], files[2], etc. This would let you pull out either all the files in a directory, or just the ones that match a regular expression.
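A rough sketch of combining that with read.table (the pattern, path, and the file name used for lookup are only illustrations):
files <- list.files("directory/path", pattern = "\\.txt$", full.names = TRUE)
myMatrices <- lapply(files, read.table)
names(myMatrices) <- basename(files)  # keep track of which file is which
myMatrices[["001.txt"]]               # access an element by its file name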

Related

Creating objects from all .xlsx documents in working directory

I am trying to create objects from all the files in my working directory, each named after the original file. I tried the following approach, but couldn't solve the problems that appeared.
# - SETTING WD
getwd()
setwd("PATH TO THE FILE")
library(readxl)
# - CREATING OBJECTS
file_objects <- list.files()
xlsx_objects <- unlist(grep(".xlsx",file_objects,value = T))
for (i in xlsx_objects) {
xlsx_objects[i] <- read_xlsx(xlsx_objects[i], header = T)
}
I tried to paste the [i]th item from "xlsx_objects" with the path to the WD, but it only created a list of file names from the docs in the WD.
I also found information that read.csv can read only one file at a time, but I guess that should be fine with a for loop, right? It reads only one file at a time.
Using lapply (as described on this forum) I was able to get the data into the environment, but the header argument didn't work, and I lost the names of my docs in the resulting object, which does not have the desired structure. I am still looking for a way to get these files into separate objects without calling every document explicitly.
IIUC, you could do something like:
files = list.files("PATH TO THE FILE", full.names = T, pattern = 'xlsx')
list_files = purrr::map(files, readxl::read_excel) # map() comes from purrr; lapply() works the same way
(You can't use read.csv to read excel files)
Also, I recommend reading about R Projects so you don't have to use setwd() ever again; setwd() makes your code harder to reproduce down the pipeline.
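If keeping the original document names matters (the asker mentioned losing them, and note that read_excel takes col_names rather than header, which is likely why that argument failed), one sketch is to name the list elements after the files; the lookup name in the comment is purely illustrative:
# name each element after its source file, without the .xlsx extension
names(list_files) <- tools::file_path_sans_ext(basename(files))
# then a single workbook can be pulled out by name, e.g.
# list_files[["my_first_workbook"]]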

Read several PDF files into R with pdf_text

I have several PDF files in my directory. I have downloaded them previously, no big deal so far.
I want to read all those files in R. My idea was to use the "pdf_text" function from the "pdftools" package and write a formula like this:
mypdftext <- pdf_text(files)
Where "files" is an object that gathers all the PDF file names, so that I don't have to write manually all the names. Because I have actually downlaoded a lot of files, it would avoid me to write:
mypdftext <- pdf_text("file1.pdf", "file2.pdf", and many more files...)
To create the object "files", I used files <- list.files(pattern = "pdf$")
The “files” vector contains all the PDF file names.
But "files" does not work with pdf_text function, probably because it's a vector. What can I do instead?
Maybe this is not the best solution, but it works for me:
library(pdftools)
# Set your path here.
your_path = 'C:/Users/.../pdf_folder'
setwd(your_path)
getwd()
lf = list.files(path = getwd(), pattern = NULL, all.files = FALSE,
                full.names = FALSE) # consider pattern = "pdf$" if the folder holds other files
# Create a list to hold the results
my_pdfs = list()
# Iterate: assign the text of each file to an element of the list.
# [[i]] keeps multi-page PDFs intact (one string per page).
for (i in 1:length(lf)) { my_pdfs[[i]] <- pdf_text(lf[i]) }
# Calling the first pdf of the list.
my_pdfs[[1]]
Then you can assign each of the PDFs to a separate object if you want; each file's text is stored in one element of the list. Does this solve your problem?
You could try using lapply over the vector that contains the location of every pdf file (files). I would recommend using list.files(..., full.names = T) to get the complete location of each pdf file. This should work.
mypdfs <- lapply(files, pdf_text)
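As a small follow-up sketch building on the line above (the naming step is just one option):
names(mypdfs) <- basename(files)    # look each document up by its file name
# each element of mypdfs is a character vector with one string per page;
# collapse the pages if you want one long string per PDF
full_text <- vapply(mypdfs, paste, character(1), collapse = "\n")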

How can I perform the same set of commands separately for each file in the same file path?

I am trying to extract a subset from multiple .grb2 files in the same file path and write it to a CSV. I am able to do it for one (or a few) files using the following set of commands:
library(raster) # provides brick()
library(rgdal)  # provides readGDAL()
GRIB <- brick("tmp2m.1989102800.time.grb2")
GRIB <- as.array(GRIB)
readGDAL("tmp2m.1989102800.time.grb2")
tmp2m.6hr <- GRIB[51,27,c(261:1232)]
str(tmp2m.6hr)
tmp2m.data <- data.frame(tmp2m.6hr)
write.csv(tmp2m.data,"tmp1.csv")
The above set of commands extracts, to CSV, temperature values for a specific latitude "51" and longitude "27", as well as for a specific time range "c(261:1232)".
Now I have hundreds of these files (with different file names, of course) in the same directory and I want to do the same for all of them. As you know better than me, I cannot do this one by one, changing the file name each time.
I have struggled a lot with this, but so far I did not manage to do it. Since I am new in R, and my knowledge is limited, I would very much appreciate any possible help with this.
The simplest way would be to use a normal for loop:
path <- "your file path here"
input.file.names <- dir(path, pattern =".grb2")
output.file.names <- paste0(tools::file_path_sans_ext(file.names),".csv")
for(i in 1:length(file.names)){
GRIB <- brick(input.file.names[i])
GRIB <- as.array(GRIB)
readGDAL(input.file.names[i]) # edited line
tmp2m.6hr <- GRIB[51,27,c(261:1232)]
str(tmp2m.6hr)
tmp2m.data <- data.frame(tmp2m.6hr)
write.csv(tmp2m.data,output.file.names[i])
}
You could of course wrap the body of the for loop in a function and then use the standard lapply or the map function from purrr.
Note that this code writes a separate CSV file for each input. If you want to append all the data to a single file, then you should check out write.table (it has an append argument).
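For example, a rough sketch of that refactoring, reusing the variables defined above and assuming the raster package is loaded for brick():
extract_tmp2m <- function(infile, outfile) {
  GRIB <- as.array(brick(infile))       # read the .grb2 file into an array
  tmp2m.6hr <- GRIB[51, 27, 261:1232]   # same lat/lon/time subset as above
  write.csv(data.frame(tmp2m.6hr), outfile)
}
Map(extract_tmp2m, file.path(path, input.file.names),
    file.path(path, output.file.names))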

Importing files with almost similar path and name

I have many txt files that I want to import into R. These files are imported one by one, I do the operations that I want, and then I import the next file.
All these files are located in a database system where all the folders have almost the same names, e.g.
database\type4\system50
database\type6\system50
database\type4\system30
database\type4\system50
Similarly, the names of the files are also almost the same, referring to the folder in which they are located, e.g.
type4.system50.txt
type6.system50.txt
type4.system30.txt
type4.system50.txt
I have heard that there should be an easier way of importing these many files one by one than simply using multiple setwd and read.csv2 commands. As far as I understand, this is possible with the macro import function in SAS, where you specify an overall path and then, each time you want to import a file, you specify only what is specific about that file name/folder name.
Is there a similar function in R? I tried to look at Importing Data in R like SAS macro, but that question did not really show me how to specify the folder name/file name.
Thank you for your help.
If you want to specify folder name / file name, try this
databasepath <- "path/to/database"
## list all txt files under the database folder
tmp <- list.files(databasepath, recursive = TRUE, full.names = TRUE, pattern = "\\.txt$")
## filter the files you want to read
readmyfile <- function(foldername, filename){
  tmp[which(grepl(foldername, tmp) & grepl(filename, tmp))]
}
files_to_read <- readmyfile("type4", "system50")
some_files <- lapply(files_to_read, read.csv2)
## Or you can read all of them (if memory is large enough to hold them)
all_files <- lapply(tmp, read.csv2)
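If the goal is really the SAS-macro style of "one overall path plus the varying pieces", here is a small sketch, assuming the folder layout and file naming shown in the question:
## build the varying part of the path directly from the type/system labels
read_one <- function(type, system) {
  f <- file.path(databasepath, type, system, paste0(type, ".", system, ".txt"))
  read.csv2(f)
}
dat <- read_one("type4", "system50")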
Instead of using setwd continuously, you could specify the absolute path for each file, save all of the paths to a vector, loop through the vector of paths, and load the files into a list:
library(data.table)
file_dir <- "path/to/files/"
file_vec <- list.files(path = file_dir, pattern = "\\.txt$") # list.files expects a regex, not a glob
file_list <- list()
for (n in seq_along(file_vec)){ # loop over the file names, not the (still empty) list
  file_list[[n]] <- fread(input = paste0(file_dir, file_vec[n]))
}
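A possible follow-up, assuming all files share the same column layout, is to keep the source name on each element and stack everything into one table:
names(file_list) <- file_vec                        # remember which file each table came from
all_data <- rbindlist(file_list, idcol = "source")  # one data.table, with a 'source' column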

R: read a txt file, write specific column to new txt file

I need to read a file "001.txt" with 3 columns, and export a file with a new name "new001C2.txt" that only contains the second column. I can do this sort of thing one file at a time, but I'm just starting out and haven't been able to find an automated way to do this for all the files in the directory.
This can be done with lapply. First, make a vector of all the file names. Then read them all with lapply. This puts each file into a list, so "001.txt" will be the first member of the list and can be accessed with readAll[[1]]. newFiles builds the new file names, keeping the numbers in line with the original file names. Then we use lapply again to write each column to a separate file.
files <- c("001.txt", "002.txt", "003.txt")
readAll <- lapply(files, read.csv)
col2 <- lapply(readAll, function(x) x[,2])
newFiles <- paste0("new", gsub(".txt", "", files), "C2.txt")
lapply(1:length(col2), function(i) write(col2[[i]], newFiles[i]))
If you're on a Linux-based system, here's a simpler solution you can run directly from the shell (cut assumes tab-separated columns; use -d to set another delimiter):
cut -f 2 001.txt > new001C2.txt
