Import multiple CSV files in R columnwise - r

I am reshaping my question.
I need to import several CSV files of 296x2 (296 rows x 2 columns) into a single dataset.
The first column is the same for all files.
I would like to merge all the CSVs into a single dataset column-wise, keeping the shared first column only once as row names.
In other words, all 329 CSV files are comma delimited and all the same 296x2. I would like to end up with a 296x329 dataset that contains the second column of each file.
Thanks in advance
Emiliano

Without knowing your data it's difficult to say, but assume your files are in a folder named C:/foo/. Try this one:
filenames <- list.files('C:/foo/', pattern="\\.csv$", full.names=TRUE)
la <- lapply(filenames, read.csv)
Reduce(function(x, y) merge(x, y, by="Wavelength"), la)
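To see the whole pipeline end to end, here is a self-contained sketch (the file names and the "Wavelength" column are assumptions carried over from the answer above); it also renames each value column after its source file, so the merged columns stay distinguishable:

```r
# Write three small CSVs to a temporary folder standing in for C:/foo/
dir <- file.path(tempdir(), "foo")
dir.create(dir, showWarnings = FALSE)
for (f in c("a.csv", "b.csv", "c.csv")) {
  write.csv(data.frame(Wavelength = 1:3, Value = rnorm(3)),
            file.path(dir, f), row.names = FALSE)
}

filenames <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
la <- lapply(filenames, read.csv)
# Name each second column after its file so the merged columns are distinct
la <- Map(function(df, f) {
  names(df)[2] <- tools::file_path_sans_ext(basename(f))
  df
}, la, filenames)
merged <- Reduce(function(x, y) merge(x, y, by = "Wavelength"), la)
dim(merged)  # 3 rows, 4 columns: Wavelength plus one column per file
```

With your 329 real files this yields a 296x330 data frame (the shared Wavelength column plus 329 value columns).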

Related

Is there a way in R to read multiple excel files, change columns to character, and then merge them?

New-ish to R and I feel like this has a simple solution, but I can't figure it out.
I have 59 excel files that I want to combine. However, 4 of the columns have a mix of dates and NAs (depending on whether the study animal is a migrant or not), so R won't let me combine them because those columns are numeric in some files and character in others. I was hoping to read all of the excel files into R, convert those 4 columns in each file to character, and then merge them all. I figured a loop could do this.
Anything I find online has me typing out the name of each file, which I don't really want to do for 59 files. And once I do have them read into R and those columns converted, can I merge them easily? Sorry if this is simple, but I'm not sure what would make this easier.
You can do this quickly using lapply. It was unclear exactly how you wanted to combine the files (a true merge by a common variable, appending rows, or appending columns), so all three approaches (2a-2c) are shown below. Either way, I do not believe you need to convert anything with as.character for any of them to work:
library(readxl)
# 1. Read in all excel files in a folder given a specific filepath
filepath <- "your/file/path/"
file_list <- list.files(path = filepath, pattern = "\\.xlsx$", full.names = TRUE)
df_list <- lapply(file_list, read_excel)
# 2a. Merge data (assuming a unique identifier, e.g. "studyid")
final_data <- Reduce(function(...) merge(..., by = "studyid", all = TRUE), df_list)
# 2b. If all files have the same columns, append to one long dataset
final_data <- do.call(rbind, df_list)
# 2c. If you want to make a wide dataset (append all columns)
final_data <- do.call(cbind, df_list)
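If read_excel does guess different types for the same column across files (the numeric-vs-character problem in the question), you can coerce before combining. A minimal sketch, using plain data frames in place of the 59 workbooks and an invented column name arrival_date:

```r
# Two mock "files": the date column is all-NA (logical) in one and
# character in the other -- the mismatch the question describes
df_list <- list(
  data.frame(studyid = 1:2, arrival_date = c(NA, NA)),
  data.frame(studyid = 3:4, arrival_date = c("2021-03-01", NA))
)
# Coerce the problem column to character in every file before stacking
df_list <- lapply(df_list, function(df) {
  df$arrival_date <- as.character(df$arrival_date)
  df
})
final_data <- do.call(rbind, df_list)
```

In your case you would run the same lapply over the list produced by read_excel, coercing each of the 4 date columns.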

Import multiple .txt files and merging them

There are around 3k .txt files, comma separated with equal structure and no col names.
e.g. 08/15/2018,11.84,11.84,11.74,11.743,27407 ///
I only need col 1 (the date) and col 5 (11.743 in the example) and would like to import all those vectors with the name of the .txt file assigned (AAAU.txt -> AAAU vector). In a second step I would like to merge them into a matrix, with all the possible dates in rows and one column per .txt file holding its col 5 value for each date.
I tried using readr, but I was unable to include the information of the filename, so I cannot proceed.
Cheers for any help!
I didn't test this code, but I think this will work for you. You can use list.files() to pull all the file names into a variable, then read each one individually and append it to a new data frame with either rbind() or cbind().
setwd("C:/your_favorite_directory/")
fnames <- list.files(pattern = "\\.txt$")
# the files have no column names, so read them with header = FALSE
csv <- lapply(fnames, read.csv, header = FALSE)
result <- do.call(rbind, csv)
# grab a subset of the fields you need (read.csv names them V1..V6)
df <- subset(result, select = c(V1, V5))
# then write your final file
write.table(df, "AllFiles.txt", sep = ",")
Also, a '-' sign indicates dropping variables. Make sure the variable names are NOT specified in quotes when using the subset() function.
df = subset(mydata, select = -c(b,c,d) )
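For the second step the question asks about (one column per file, all possible dates as rows), here is a hedged base-R sketch; the file names and contents are invented to mirror the example line:

```r
# Mock two .txt files in a temporary folder
dir <- file.path(tempdir(), "txts")
dir.create(dir, showWarnings = FALSE)
writeLines(c("08/15/2018,11.84,11.84,11.74,11.743,27407",
             "08/16/2018,11.9,11.95,11.8,11.85,30000"),
           file.path(dir, "AAAU.txt"))
writeLines("08/15/2018,20.1,20.2,20.0,20.05,1000",
           file.path(dir, "BBBB.txt"))

fnames <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
# Read each file, keep cols 1 and 5, and label the value column
# with the file name (AAAU.txt -> column AAAU)
lst <- lapply(fnames, function(f) {
  df <- read.csv(f, header = FALSE)[, c(1, 5)]
  names(df) <- c("date", tools::file_path_sans_ext(basename(f)))
  df
})
# all = TRUE keeps dates missing from some files, filled with NA
wide <- Reduce(function(x, y) merge(x, y, by = "date", all = TRUE), lst)
```

Here BBBB has no row for 08/16/2018, so that cell comes out NA, which is the behavior the question describes.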

(In R) Merge Directory of CSVs, with some having different number of columns

I have roughly 50 .csv's with varying numbers of rows and columns. I want to stack the 50 files on top of one another to form one master list. However, I want their columns to be matched up, with a row that is missing a value for a column filling it with NA.
My Code so far:
#reads in all my csvs
library(plyr)  # rbind.fill comes from plyr
filelist <- list.files(pattern = "\\.csv$")
list_of_data = lapply(filelist, read.csv)
all_data = do.call(rbind.fill, list_of_data)
write.csv(all_data, file = "all_Data_test.csv")
What am I doing wrong? Functionally, do I have to read in all the column headers and match based off that?
Thank you
One option is bind_rows from dplyr
library(dplyr)
all_data <- bind_rows(list_of_data)
The output can be written with write_csv
library(readr)
write_csv(all_data, "all_data_test.csv")
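For clarity, here is what that NA-filling amounts to, sketched in base R; bind_rows and plyr::rbind.fill do the same thing for you:

```r
# Stack data frames with differing columns, padding absent ones with NA
fill_rbind <- function(dfs) {
  all_cols <- unique(unlist(lapply(dfs, names)))
  dfs <- lapply(dfs, function(df) {
    # add any column this data frame is missing, then fix column order
    for (m in setdiff(all_cols, names(df))) df[[m]] <- NA
    df[all_cols]
  })
  do.call(rbind, dfs)
}

a <- data.frame(x = 1:2, y = c("p", "q"))
b <- data.frame(x = 3, z = TRUE)
combined <- fill_rbind(list(a, b))  # columns x, y, z; missing cells are NA
```

In practice prefer bind_rows, which is faster and also handles type promotion across files.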

extract data from a ". dat" file and form a 9x4 matrix

I want to extract matrix data from THREE ".dat" files with file names x1, x2 and x3 and combine them into one matrix. (I have merged them here for convenience, but they should be assumed to come from three files.) Each file has 3x3 matrix data. I want to extract the data in each file with the corresponding DATE in one column, so the result will have 4 columns and 9 rows. The date should be written on the first row of each matrix and the rest of the spaces can be filled with NAs or left empty. (The example file was attached as an image in the original post.)
Assuming that the files have 3 header lines before the beginning of the data, and that all the files are in the working directory: get all the file names with list.files(), then loop through the files, reading each dataset with read.csv, skipping the first 3 lines and specifying header as FALSE. Then read the third line of each file with scan, remove the substring up to the date part with sub, add the date as a column to each list element using Map, and rbind the output to get a single data.frame.
files <- list.files()
lst <- lapply(files, read.csv, skip=3, header=FALSE)
lst2 <- lapply(files, scan, skip=2, nlines=1, what = "")
Datetime <- sub(".*:\\s+", "", unlist(lst2))
do.call(rbind, Map(cbind, lst, Datetime=Datetime))
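Here is a reproducible version of the approach above, with mock .dat files written to a temporary folder. The exact header layout (date on the third line as "Date: ...") is an assumption; adjust skip and the sub() pattern to your files. Note that scan is given sep = "\n" so the whole date line comes back as one string rather than whitespace-split tokens:

```r
# Create three mock .dat files: 3 header lines, then a 3x3 matrix
dir <- file.path(tempdir(), "dats")
dir.create(dir, showWarnings = FALSE)
for (i in 1:3) {
  writeLines(c("station x", "matrix 3x3", paste0("Date: 2020-01-0", i),
               "1,2,3", "4,5,6", "7,8,9"),
             file.path(dir, paste0("x", i, ".dat")))
}

files <- list.files(dir, full.names = TRUE)
lst <- lapply(files, read.csv, skip = 3, header = FALSE)
# read the third line of each file as a single string
lst2 <- lapply(files, scan, skip = 2, nlines = 1, what = "",
               sep = "\n", quiet = TRUE)
Datetime <- sub(".*:\\s+", "", unlist(lst2))
result <- do.call(rbind, Map(cbind, lst, Datetime = Datetime))
dim(result)  # 9 rows, 4 columns
```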

Change column names after merging multiple data frames into one in R

After merging multiple data frames into one, I would like to know how to change the column headers in the master data frame to represent the original files that they came from. I merged a large number of data frames into one using the code below:
library(plyr)
dflist = list.files(path=dir, pattern="csv$", full.names=TRUE, recursive=FALSE)
import.list = llply(dflist, read.csv)
Master = Reduce(function(x, y) merge(x, y, by="Hours"), import.list)
I would like the columns that belonged to each original data frame to be named by the unique ID that the original data frame/csv file is named by (i.e. aa, ab, ac). The unique ID in each filename comes immediately before an underscore ("_"), so I can isolate it using the code below. However, I am having trouble applying this to the column headers. Any help would be much appreciated.
filename = dflist[1]
unqID = strsplit(filename,"_")[[1]][1]
You could define a function in your llply call and have read.csv assign the names, or just rename them after reading them in and before merging, as #joran suggested:
#First get the file names
filenames = dflist
#Isolate the ID before the first underscore in each name
unqID = sapply(filenames, function(x) strsplit(basename(x), "_")[[1]][1])
#Rename the list items
names(import.list) <- unqID
And then merge using your code
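Putting it together, a runnable base-R sketch of the renaming-then-merging step (lapply stands in for plyr's llply; file names like aa_data.csv and the "Hours"/"Value" columns are invented to match the pattern described):

```r
# Mock three CSVs whose IDs precede the first underscore in the name
dir <- file.path(tempdir(), "merge_demo")
dir.create(dir, showWarnings = FALSE)
for (id in c("aa", "ab", "ac")) {
  write.csv(data.frame(Hours = 1:3, Value = rnorm(3)),
            file.path(dir, paste0(id, "_data.csv")), row.names = FALSE)
}

dflist <- list.files(dir, pattern = "csv$", full.names = TRUE)
import.list <- lapply(dflist, read.csv)
# Isolate the ID before the first underscore in each file name
unqID <- sapply(dflist, function(x) strsplit(basename(x), "_")[[1]][1])
# Prefix each data column (everything except the merge key) with its ID,
# so the merged columns show which file they came from
import.list <- Map(function(df, id) {
  names(df)[-1] <- paste(id, names(df)[-1], sep = ".")
  df
}, import.list, unqID)
Master <- Reduce(function(x, y) merge(x, y, by = "Hours"), import.list)
names(Master)  # "Hours" "aa.Value" "ab.Value" "ac.Value"
```

Renaming before the merge avoids the ambiguous .x/.y suffixes that merge() would otherwise generate for duplicate column names.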
