Merge RDS files from two different file paths? - r

Folder 1 and Folder 2 are full of .rds files. How would I go about merging all files in both folders into 1 .rds file?
What I have so far
mergedat <- do.call('rbind', lapply(list.files("File/Path/To/Folder/1/", full.names = TRUE), readRDS))
However I don't know how to add the second file path and even then, the code above does not seem to be working.
The .rds files are all set up exactly the same as far as the number of columns and the column headers go, but the values in them are obviously different. I also just figured out that the files were not being read within my code.
Any suggestions?

You can do something like this twice, each time for a different path:
path <- "./files"
files <- list.files(path = path,
                    full.names = TRUE,
                    all.files = FALSE)
# Drop sub-directories so that only regular files are read
files <- files[!file.info(files)$isdir]
data <- lapply(files, readRDS)
You end up with two data objects, each a list in which every element is the data frame stored in one RDS file. If all those files have the same structure, you can use dplyr::bind_rows() to concatenate all the data frames into one combined data frame.
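As a minimal sketch of the whole round trip, assuming the files all hold data frames with identical columns: list.files() accepts a vector of directories, so both folders can be listed in one call instead of running the code twice. The temporary folders, file names, and sample values below are made up for illustration, and base rbind is used in place of dplyr::bind_rows() to keep the example dependency-free.

```r
# Sketch only: two temporary folders stand in for "Folder 1" and "Folder 2".
dir1 <- tempfile("folder1_")
dir2 <- tempfile("folder2_")
dir.create(dir1)
dir.create(dir2)

# Write a few small example .rds files with identical columns (made-up data).
saveRDS(data.frame(id = 1:2, value = c(0.1, 0.2)), file.path(dir1, "a.rds"))
saveRDS(data.frame(id = 3:4, value = c(0.3, 0.4)), file.path(dir1, "b.rds"))
saveRDS(data.frame(id = 5:6, value = c(0.5, 0.6)), file.path(dir2, "c.rds"))

# list.files() accepts a vector of directories, so both folders are listed at once.
files <- list.files(path = c(dir1, dir2), pattern = "\\.rds$", full.names = TRUE)

# Read every file into a list of data frames, then stack them row-wise.
data_list <- lapply(files, readRDS)
merged <- do.call(rbind, data_list)

# Save the combined data frame as a single .rds file.
out_file <- file.path(tempdir(), "merged.rds")
saveRDS(merged, out_file)
```

With full.names = TRUE each entry in files carries its folder, so readRDS() works regardless of the current working directory.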

Related

Read excel folder with some containing worksheets in R

I have a folder containing excel files. There are some that have worksheets and I just want to know how to use a special case (maybe a for loop) where I would identify which files need to be used to read worksheets and then select only certain tabs to join all of the excel files together at the end. In addition, these tabs would need to skip 40 lines which I assume would use skip=40. When I type the code that I have, it looks like a giant mess.
files <- list.files(path = "/Users/Desktop/folder2", pattern = "*.xlsx")
files_join <- lapply(files, read_excel) %>%
bind_rows()

R script to open folders then identify a file, rename it, and read it

I have recently learned to code with R and I sort of manage to handle the data within files but I can't get it to manipulate the files themselves. Here is my problem:
I'd like to open successively, in my working directory "Laurent/R", the 3 folders that are within it ("gene_1", "gene_2", "gene_3").
In each folder, I want one specific .csv file (the one containing the specific word "Cq") to be renamed as "gene_x_Cq" (and then to move these 3 renamed files in a new folder (is that necessary?)).
I then want to be able to successively open these 3 .csv files (with read.csv, I suppose) to manipulate the data within them.
I've looked at different functions like list.files, unlist, and file.rename. I'm sure they are appropriate, but I can't figure out how to use them in my case.
Can anyone help ? (I use a Mac)
Thanks
Laurent
Here's a potential solution. If you don't understand something, just shout out and ask!
setwd("Your own file path/Laurent")
library(stringr)
# list all .csv files
csvfiles <- list.files(recursive = T, pattern = "\\.csv")
csvfiles
# Pick out files that have cq in them, ensuring that you ignore uppercase/lowercase
cq.files <- csvfiles[str_detect(csvfiles, fixed("cq", ignore_case = T))]
# Get gene number for both files - using "2" here because gene folder is at the second level in the file path
gene.nb <- str_sub(word(cq.files, 2, 2, sep = "/"), 6, 6)
gene.nb
# create a new folder to place new files into
dir.create("R/genefiles")
# This will copy files, not move them. To move them, use file.rename - but be careful, I'd try file.copy first.
# file.copy() returns TRUE/FALSE per file, so keep the result separate from the file list.
copied <- file.copy(cq.files,
                    paste0("R/genefiles/gene_", gene.nb, "_", "Cq", ".csv"))
# Now to work with all files in the new folder
library(purrr)
genefiles <- list.files("R/genefiles", full.names = T)
# This will bring in all data into one dataframe. If you want them brought in as separate dataframes,
# use something like gene1 <- read.csv("R/genefiles/gene_1_Cq.csv")
files <- map_dfr(genefiles, read.csv)
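The same idea can be sketched in base R without stringr or purrr. This is only an illustration under assumptions: the folder layout is rebuilt in a temporary directory, the file names (run_Cq_results.csv, other.csv) are invented, and the gene label is taken from the parent folder name rather than from a character position in the path.

```r
# Sketch only: build a throwaway folder layout like the one described in the question.
root <- tempfile("Laurent_")
for (g in c("gene_1", "gene_2", "gene_3")) {
  dir.create(file.path(root, "R", g), recursive = TRUE)
  # One file with "Cq" in its name and one unrelated file per folder (made-up names).
  write.csv(data.frame(well = c("A1", "A2"), Cq = c(20.1, 21.3)),
            file.path(root, "R", g, "run_Cq_results.csv"), row.names = FALSE)
  write.csv(data.frame(x = 1), file.path(root, "R", g, "other.csv"), row.names = FALSE)
}

# Find all .csv files whose file name contains "cq", ignoring case.
csv_files <- list.files(root, pattern = "\\.csv$", recursive = TRUE, full.names = TRUE)
cq_files <- csv_files[grepl("cq", basename(csv_files), ignore.case = TRUE)]

# The parent folder name ("gene_1", ...) supplies the prefix for the new file name.
gene_dirs <- basename(dirname(cq_files))
out_dir <- file.path(root, "R", "genefiles")
dir.create(out_dir)
new_paths <- file.path(out_dir, paste0(gene_dirs, "_Cq.csv"))
copied <- file.copy(cq_files, new_paths)

# Read each copied file into a named list of data frames.
gene_data <- lapply(new_paths, read.csv)
names(gene_data) <- gene_dirs
```

Copying rather than renaming leaves the originals untouched, and gene_data[["gene_2"]] then gives the data for one gene without any further path handling.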

How to combine .txt files from multiple folders

I want to combine multiple .txt files in R from multiple folders. However, I'm running into trouble when I want to separate the data into different columns. Right now, the files combine but into one single column when there should be four.
I used list.files to find .txt files in the folders in my working directory. Then I used rbind and lapply to combine them with read.delim. (see below)
files = list.files(pattern = "*.txt")
myfiles = do.call(rbind, lapply(files, function(x) read.delim(x, header = FALSE, stringsAsFactors = FALSE)))
The above code combines all of the .txt files, but the first 3 rows of each file are artifacts of the data download (basically just a naming feature) and are not pertinent to the data itself. So once the files are combined, the three lines repeat. I cannot use filter(), as I would have to manually go through the data (many thousands of lines). I would also like to repeat this process in another folder with a similar setup. So I'd like to be able to use the same code.
I think I can resolve the issue by removing the top 3 lines of each .txt file before combining them. Then I can set header = FALSE and just add in headers once the files are combined. But again, there are many hundreds of files, so I do not wish to do this manually. I'm not sure how to do this, though. Any suggestions?
Thank you for any help.
Options, transcribed from the comment:
By itself, read.delim(..., skip = 3) will remove those leading duplicate rows. This will also remove the header row, so all of your frames will have generic column names, not a big problem.
To fix that, you can re-read the first row of one of the files (first?) to get the column names, with read.delim(..., nrows=1). If we used nrows=0, it reads all, so we need a minimum of 1 to limit the rows read; in the comment I included [0,], but since all you need is the column-names, it doesn't really affect things.
You can do it the first time with something like:
files = list.files(pattern = "*.txt")
# skip = 3 is the only addition to the original call
myfiles = do.call(rbind, lapply(files, function(x) read.delim(x, skip = 3, header = FALSE, stringsAsFactors = FALSE)))
colnames(myfiles) <- colnames(read.delim(files[1], header=TRUE, nrows=1))
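A self-contained sketch of the skip approach, under stated assumptions: the sample files are invented, they are tab-separated with four columns, the three artifact lines sit above data that has no header row, and the column names are hypothetical. If the real files use another delimiter, sep has to match it; a mismatched sep is one common reason for everything landing in a single column.

```r
# Sketch only: create two small tab-separated files, each with three junk lines on top.
txt_dir <- tempfile("txt_")
dir.create(txt_dir)
junk <- c("artifact line 1", "artifact line 2", "artifact line 3")
writeLines(c(junk, "1\t2\t3\t4", "5\t6\t7\t8"), file.path(txt_dir, "a.txt"))
writeLines(c(junk, "9\t10\t11\t12"), file.path(txt_dir, "b.txt"))

files <- list.files(txt_dir, pattern = "\\.txt$", full.names = TRUE)

# skip = 3 drops the three artifact lines; sep is assumed to be a tab here.
read_one <- function(f) {
  read.delim(f, skip = 3, header = FALSE, sep = "\t", stringsAsFactors = FALSE)
}
combined <- do.call(rbind, lapply(files, read_one))

# Hypothetical column names, assigned once after combining.
colnames(combined) <- c("col_a", "col_b", "col_c", "col_d")
```

Because the reading logic lives in read_one(), the same code can be pointed at another folder by changing only txt_dir.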

Writing multiple csv files into multiple data frames in R

I tried to write multiple csv files with the same amount of columns and rows into multiple data frames, which can be accessed in a way like this:
file[1] #Outputs the whole content of the first csv file
file[2] #Outputs the whole content of the second csv file
and so on...
I have already saved everything into one data frame, but the necessary values can't be accessed in such a way:
files = list.files(pattern="*.csv")
myfiles = do.call(rbind, lapply(files, function(x) read.csv(x, stringsAsFactors = FALSE)))
myfiles is one big data frame but I want to access them in the way I explained above.
I am using RStudio 0.9 and my working directory is where all files are located. The csv files are named like this:
"001.csv"
"002.csv"
"003.csv"
...
Thank you in advance
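One possible approach, sketched under assumptions: dropping the do.call(rbind, ...) step leaves a list with one data frame per file, which can be indexed by position or by name. The temporary folder and sample values are invented for illustration; the file names follow the pattern given in the question.

```r
# Sketch only: write two small example files named like those in the question.
csv_dir <- tempfile("csv_")
dir.create(csv_dir)
write.csv(data.frame(x = 1:2, y = c("a", "b")),
          file.path(csv_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:4, y = c("c", "d")),
          file.path(csv_dir, "002.csv"), row.names = FALSE)

files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# Without do.call(rbind, ...), lapply() returns a list with one data frame per file.
myfiles <- lapply(files, function(x) read.csv(x, stringsAsFactors = FALSE))
names(myfiles) <- basename(files)

first <- myfiles[[1]]           # whole content of the first csv file
second <- myfiles[["002.csv"]]  # access by file name also works
```

Note the double brackets: myfiles[1] returns a list of length one, while myfiles[[1]] returns the data frame itself.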

To stack up results in one masterfile in R

Using this script I have created a specific folder for each csv file and then saved all my further analysis results in this folder. The name of the folder and csv file are same. The csv files are stored in the main/master directory.
Now, I have created a csv file in each of these folders which contains a list of all the fitted values.
I would now like to do the following:
Set the working directory to the particular filename
Read fitted values file
Add a row/column stating the name of the site/ unique ID
Add it to the masterfile which is stored in the main directory with a title specifying site name/filename. It can be stacked by rows or by columns it doesn't really matter.
Come to the main directory to pick the next file
Repeat the loop
Using merge(), rbind(), or cbind() combines all the data under one column name. I want to keep all the sites separate for comparison at a later stage.
This is what I'm using at the moment and I'm lost on how to proceed further.
setwd( "path") # main directory
path <-"path" # need this for convenience while switching back to main directory
# import all files and create a character type array
files <- list.files(path=path, pattern="*.csv")
for(i in seq(1, length(files), by = 1)){
fileName <- read.csv(files[i]) # repeat to set the required working directory
base <- strsplit(files[i], ".csv")[[1]] # getting the filename
setwd(file.path(path, base)) # setting the working directory to the same filename
master <- read.csv(paste0(base, "_fiited_values curve.csv")) # paste0() avoids the space that paste() would insert
# read the fitted value csv file for the site and store it in a list
}
I want to construct a for loop to make one master file with the files in different directories. I do not want to merge all under one column name.
For example, if I have 50 similar csv files and each has two columns of data, I would like to have one csv file which accommodates all of it, but in its original format rather than appended to the existing rows/columns. So then I will have 100 columns of data.
Please tell me what further information I can provide.
For reading a group of files from a number of different directories, with path names patha, pathb, and pathc:
paths = c('patha','pathb','pathc')
files = unlist(sapply(paths, function(path) list.files(path,pattern = "*.csv", full.names = TRUE)))
listContainingAllFiles = lapply(files, read.csv)
If you want to be really quick about it, you can grab fread from data.table:
library(data.table)
listContainingAllFiles = lapply(files, fread)
Either way this will give you a list of all objects, kept separate. If you want to join them together vertically/horizontally, then:
do.call(rbind, listContainingAllFiles)
do.call(cbind, listContainingAllFiles)
EDIT: NOTE, the latter makes no sense unless your rows actually mean something when they're corresponding. It makes far more sense to just create a field tracking what location the data is from.
If you want to include the names of the files as the method of determining sample location (I don't see where you're getting this info from in your example), then you want to do this as you read in the files, so:
listContainingAllFiles = lapply(files,
function(file) data.frame(filename = file,
read.csv(file)))
Then later you can split that column to get your details (assuming, of course, that you have a standard naming convention).
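A small sketch of that tracking idea, under assumptions: the site names, file names, and values are invented, and the site ID is taken to be the file name without its extension. Stacking by rows with an ID column keeps every site identifiable, and split() recovers one data frame per site when they need to be compared.

```r
# Sketch only: two made-up site files in a temporary folder.
site_dir <- tempfile("sites_")
dir.create(site_dir)
write.csv(data.frame(time = 1:2, fitted = c(0.5, 0.7)),
          file.path(site_dir, "siteA.csv"), row.names = FALSE)
write.csv(data.frame(time = 1:2, fitted = c(1.5, 1.7)),
          file.path(site_dir, "siteB.csv"), row.names = FALSE)

files <- list.files(site_dir, pattern = "\\.csv$", full.names = TRUE)

# Tag each data frame with a site ID taken from its file name (extension removed).
tagged <- lapply(files, function(f) {
  data.frame(site = tools::file_path_sans_ext(basename(f)), read.csv(f))
})
master <- do.call(rbind, tagged)

# Sites stay distinguishable, so they can be separated again for comparison.
by_site <- split(master, master$site)
```

Using full paths throughout means no setwd() calls are needed inside the loop, which avoids having to switch back to the main directory after each file.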
