extract data from a ".dat" file and form a 9x4 matrix - r

I want to extract matrix data from THREE ".dat" files with file names x1, x2 and x3 and combine them into one matrix. (I have merged them here for convenience, but assume they come from three separate files.) Each file has 3x3 matrix data. I want to extract the data in each file with the corresponding DATE in one column, so the result will have 4 columns and 9 rows. The date should be written on the first row of each matrix and the rest of the spaces can be filled with NAs or left blank. Here is the file:

Assuming that the files have 3 header lines before the beginning of data and that all the files are in the working directory: get all the file names with list.files(), loop through the files and read each dataset with read.csv, skipping the first 3 lines and specifying header = FALSE. Then read the third line from each of the files with scan, remove the substring up to the date part with sub, add a column to each list element using Map, and rbind the output into a single data.frame.
files <- list.files()
# read the 3x3 data block from each file, skipping the 3 header lines
lst <- lapply(files, read.csv, skip = 3, header = FALSE)
# read only the third line of each file, which holds the date
lst2 <- lapply(files, scan, skip = 2, nlines = 1, what = "")
# drop everything up to and including the colon before the date
Datetime <- sub(".*:\\s+", "", unlist(lst2))
# attach the date to each block and stack them into one data.frame
do.call(rbind, Map(cbind, lst, Datetime=Datetime))
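The code above repeats the date on every row of its block. If, as the question asks, you want the date only on the first row of each 3x3 block and NA in the remaining rows, here is a small variation, reusing lst and Datetime from above:
do.call(rbind, Map(function(d, dt) {
  d$Datetime <- c(dt, rep(NA, nrow(d) - 1))  # date on the first row, NA below
  d
}, lst, Datetime))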

Related

How can I take one column from multiple CSV files to create a new dataframe in R?

Essentially I am looking to take the third column from every CSV file in a folder and append it to a data frame as a new column. Ideally I would like the header for each column to be the respective file name. I have about 172 files in the folder, each with a unique filename (i.e. file1.csv, file2.csv, etc.); however, the title of the third column is the same in every file. Illustrating this on a smaller scale, if I had file 1 and file 2, the output would look like what is shown below.
EDIT: added some clarification.
Will your third column always be the same name in both files?
If not, you could do the below:
cbind(file1[,3], file2[,3])
cbind would combine the data frames by column
You can use lapply to read all the files, extract the 3rd column from each, assign the file name as the column name, and bind all the columns together into one data frame.
filenames <- list.files(pattern = '\\.csv$', full.names = TRUE)
# read each file, keep its 3rd column, and name it after the file
result <- do.call(cbind, lapply(filenames, function(x) {
  setNames(data.frame(read.csv(x)[[3]]), tools::file_path_sans_ext(basename(x)))
}))
result
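One assumption worth stating: do.call(cbind, ...) only lines up cleanly if every file has the same number of rows. A quick check before binding, reusing filenames from above:
# collect the 3rd column of each file and make sure all lengths match
cols <- lapply(filenames, function(x) read.csv(x)[[3]])
stopifnot(length(unique(lengths(cols))) == 1)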

Import multiple .txt files and merge them

There are around 3k .txt files, comma-separated, with identical structure and no column names.
e.g. 08/15/2018,11.84,11.84,11.74,11.743,27407
I only need col 1 (the date) and col 5 (11.743), and would like to import each file as a vector named after the .txt file (AAAU.txt -> AAAU vector). In a second step I would like to merge them into a matrix, with all the possible dates in rows and a column per .txt file name holding that file's col 5 value for each date.
I tried using readr, but I was unable to include the file name information, so I cannot proceed.
Cheers for any help!
I didn't test this code, but I think this will work for you. You can use list.files() to pull all the file names into a variable, then read each one individually and append it to a new data frame with either rbind() or cbind().
setwd("C:/your_favorite_directory/")
fnames <- list.files()
csv <- lapply(fnames, read.csv)
result <- do.call(rbind, csv)
# grab a subset of the fields you need
df <- subset(result, select = c(a, e))
#then write your final file
write.table(df,"AllFiles.txt",sep=",")
Also, the '-' sign indicates dropping variables. Make sure the variable names are NOT specified in quotes when using the subset() function.
df = subset(mydata, select = -c(b,c,d) )
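The code above stacks everything into one long table. The question also asks for a wide matrix with all possible dates in rows and one column per .txt file; here is a minimal sketch of that second step (assuming the files sit in the working directory, have no header row, and use the column positions described above):
fnames <- list.files(pattern = "\\.txt$")
# read each file, keep date (col 1) and value (col 5),
# and name the value column after the file (AAAU.txt -> AAAU)
per_file <- lapply(fnames, function(x) {
  d <- read.csv(x, header = FALSE)[, c(1, 5)]
  names(d) <- c("date", tools::file_path_sans_ext(basename(x)))
  d
})
# full outer join on date, so every date gets a row and missing values become NA
wide <- Reduce(function(a, b) merge(a, b, by = "date", all = TRUE), per_file)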

How to combine multiple .csv files, and add a column with each dataset's name, in R?

I'm trying to combine multiple CSV files in R so that I can do some predictive modeling. While each file has the same columns in the same order, the names differ for some columns. So far, my code combines the files just fine and strips away the headers. What I now need it to do, however, is add another two columns for the date associated with each CSV. The file name of each CSV contains the date.
The file names are formatted as follows: 'January 2017', 'February 2017', 'March 2017', etcetera.
So I want the two columns to be the month and year.
Below is the code I've used so far. It combines all the CSV's into one, but doesn't create the two additional columns which I need.
dat <- setwd('C:/Users/ . . . /Historical Data')
file_names <- dir(dat)
dataset <- do.call(rbind, lapply(file_names, read.csv, skip = 1, header = FALSE))
dataset <- do.call(rbind, lapply(file_names, read.csv, header = FALSE, function(x) cbind(read.csv(x), name=strsplit(x,'\\.')[[1]][1])))
head(dataset)
Can anyone point me in the right direction for how to best code these two columns into this?
Your code was pretty good to begin with.
The following code reads each file in file_names, tags it with its file name in an origin column, and then binds all the pieces together. It is a handy pattern for batch-reading files while keeping their file names in a separate column.
Try doing this:
library(data.table)
library(readr)   # read_csv() comes from readr

file_list <- lapply(file_names, function(x){
  ret <- read_csv(x)
  ret$origin <- x   # keep the source file name
  return(ret)})
df <- rbindlist(file_list)
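That keeps the full file name in one column. Since the question wants separate Month and Year columns, a possible follow-up (assuming df is the data.table built above and the file names look like 'January 2017.csv'):
# strip the ".csv" extension, then split "January 2017" on the space
df[, c("Month", "Year") := tstrsplit(sub("\\.csv$", "", origin), " ", fixed = TRUE)]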
Here is a library(tidyverse) way of accomplishing what you need. You can still set your working directory to where it needs to be, and instead of using dir() you can use list.files().
dat_files <- list.files(".../Historical Data", pattern = '\\.csv$')
map_df(dat_files, ~ read_csv(.x) %>%
  mutate(month_year = str_remove(.x, "\\.csv$")) %>%
  separate(month_year, into = c("Month", "Year"), sep = " ")
)
This code will read all your files into one df and use the file name, with the .csv extension stripped, to create a new column. It then separates that column into the Month and Year columns by splitting on the space (" ").

writing single column to .csv in R

Hi folks: I'm trying to write a vector of length 100 to a single-column .csv in R. Each time I try, I get two columns in the csv file: the first with index numbers from the vector, the second with the contents of my vector. For example:
MyPath<-("~/rstudioshared/Data/HW3")
Files<-dir(MyPath)
write.csv(Files,"Names.csv",row.names = FALSE)
If I convert the vector to a data frame and then check its dimensions,
Files<-data.frame(Files)
dim(Files)
I get 100 rows by 1 column, and the column contains the names of the files in my directory folder. This is what I want.
Then I write the csv. When I open it outside of R, or read it back in and look at it, I get a 100 x 2 data frame where the first column contains the index numbers and the second column has the names of my files.
Why does this happen?
How do I write just the single column of data to the .csv?
Thanks!
Row names are written by write.csv() by default (and by default, a data frame with n rows will have row names 1,...,n). You can see this by looking at e.g.:
dat <- data.frame(mevar=rnorm(10))
# then compare what gets written by:
write.csv(dat, "outname1.csv")
# versus:
rownames(dat) <- letters[1:10]
write.csv(dat, "outname2.csv")
Just use write.csv(dat, "outname.csv", row.names=FALSE) and the row names won't show up.
And a suggestion: it might be easier/cleaner to just write the vector directly to a text file with writeLines(your_vector, "your_outfile.txt") (you can still use read.csv() to read it back in if you prefer that :p).
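A minimal sketch of that suggestion, using the vector from the question (and assuming the file names contain no commas):
MyPath <- "~/rstudioshared/Data/HW3"
Files <- dir(MyPath)            # character vector of 100 file names
writeLines(Files, "Names.txt")  # one name per line, no index column
# read it back as a one-column data frame if needed
read.csv("Names.txt", header = FALSE)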

(In R) Merge Directory of CSVs, with some having different number of columns

I have roughly 50 .csv files with varying numbers of rows and columns. I want to stack the 50 files on top of one another to form one master list, but with their columns matched up by name, filling in NA wherever a row comes from a file that is missing a column.
My Code so far:
# reads in all my csvs
filelist <- list.files(pattern = ".csv")
list_of_data <- lapply(filelist, read.csv)
all_data <- do.call(plyr::rbind.fill, list_of_data)   # rbind.fill() comes from plyr
write.csv(all_data, file = "all_Data_test.csv")
What am I doing wrong? Functionally, do I have to read in all the column headers and match based on them?
Thank you
One option is bind_rows from dplyr
library(dplyr)
all_data <- bind_rows(list_of_data)
The output can be written with write_csv
library(readr)
write_csv(all_data, "all_data_test.csv")
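To see the NA filling on a small, made-up example (the column names and values here are hypothetical):
a <- data.frame(x = 1:2, y = c("a", "b"))
b <- data.frame(x = 3, z = TRUE)
bind_rows(a, b)
#   x    y    z
# 1 1    a   NA
# 2 2    b   NA
# 3 3 <NA> TRUE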
