There are around 3,000 .txt files, comma-separated, all with the same structure and no column names.
e.g. 08/15/2018,11.84,11.84,11.74,11.743,27407
I only need col 1 (the date) and col 5 (11.743) and would like to import all those vectors with the name of the .txt file assigned (AAAU.txt -> AAAU vector). In a second step I would like to merge them into a matrix, with all the possible dates in rows and one column per .txt filename holding its col 5 value for each date.
I tried using readr, but I was unable to include the filename information, so I cannot proceed.
Cheers for any help!
I didn't test this code, but I think it will work for you. You can use list.files() to pull all the file names into a variable, then read each one individually and append it to a new data frame with either rbind() or cbind().
setwd("C:/your_favorite_directory/")
fnames <- list.files()
csv <- lapply(fnames, read.csv, header = FALSE)  # the files have no column names
result <- do.call(rbind, csv)
# grab a subset of the fields you need (with header = FALSE they are named V1..V6)
df <- subset(result, select = c(V1, V5))
#then write your final file
write.table(df,"AllFiles.txt",sep=",")
Also, the '-' sign indicates dropping variables. Make sure the variable names are NOT specified in quotes when using the subset() function.
df = subset(mydata, select = -c(b,c,d) )
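To also attach each file's name and build the wide date-by-file matrix asked for in the question, here is an untested sketch along the same lines; the column positions (1 and 5) and the final merge on date are assumptions based purely on the description above.
fnames <- list.files(pattern = "\\.txt$")
# read each file (no header), keep the date and the 5th column,
# and name the value column after the file (AAAU.txt -> AAAU)
lst <- lapply(fnames, function(f) {
  x <- read.csv(f, header = FALSE)[, c(1, 5)]
  names(x) <- c("date", sub("\\.txt$", "", f))
  x
})
# full outer merge on date: every date becomes a row, one column per file
wide <- Reduce(function(a, b) merge(a, b, by = "date", all = TRUE), lst)
With ~3,000 files the repeated merge will be slow; stacking everything into one long data frame and reshaping it to wide would scale better, but the idea is the same.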
EDIT:
Using the following solved my problem:
bind_rows(setNames(df, basename(files)), .id = "id")
Faster summary of problem:
Wanting to add a file-name attribute while combining multiple csv files with lapply, read_csv, and bind_rows. Because .id pulls the names from the list of data frames, I wasn't sure how to set them preemptively... but basename() serves that function.
Here, 'files' refers to the list of file names for my data.
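Putting the pieces together, the full pipeline looks roughly like this (an untested sketch; read_csv is called with the skip = 4 from the original post below, and 'files' is the same vector of file names):
library(readr)
library(dplyr)

df <- lapply(files, read_csv, skip = 4)   # one data frame per file
names(df) <- basename(files)              # name each list element after its file
combined <- bind_rows(df, .id = "id")     # "id" now holds the file name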
ORIGINAL POST:
I have a number of csv files where the column names are on the fifth row, so right now I'm skipping the first four rows so all the files will have the same column structure. However, I want to add a column with an ID tag that is the name of that csv file. If it's any easier, the name of the file is repeated in one of the skipped cells in the first four rows.
Right now my code is this:
To create the list of file names
folder = "my\\desktop\\path"
files = list.files(
path = folder,
pattern = "_indicator.*csv$",
recursive = TRUE, # include subfolders
ignore.case = T,
full.names = T
)
To read all the files
df <-
lapply(files, function(i){
read.csv(i, header=TRUE, skip=4)
})
They read in as a list of 70 data frames, each containing 15 rows and 76 columns.
This is all good and correct; I got to this point by reading other Stack Overflow questions.
However, I ultimately want to combine them into one data frame, not a list of data frames.
I'm thinking about doing this:
df <- bind_rows(df, .id = "id")
I then get the data structure I want, but the id is just a number. I still have my list of file names... so I could use the mutate function and change the id number to the file name (assuming it's still in the same order). But that just feels really inefficient. Is there a way to make the indicator my filename at the bind_rows or read_csv step?
I understand that .id is using the names from my list of data frames... but I don't know how to use lapply / read_csv to create that list with filenames instead of numbers... or even if that's the best approach.
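One alternative worth mentioning (an assumption on my part; it requires readr 2.0 or later, so check your version): read_csv can take the whole vector of paths at once and record each row's source file via its id argument, which removes the need to name a list at all.
library(readr)

# "source" will contain the full path of the file each row came from
df <- read_csv(files, skip = 4, id = "source")
# strip the directory if only the file name is wanted
df$source <- basename(df$source)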
I have some 100 json files in one folder.
I want to construct a dataset in R that contains a column holding the complete data from each of the json files.
The dataset must contain a column, say jsonfile, and each row in this column should contain the data from one json file, i.e. data from the 1st json file in the folder should be in the first row of this column, data from the 2nd json file in the second row, and so on, without destroying the structure of the json in this column.
Can we achieve this using R? If yes, how can we do this?
I would really appreciate any help.
If your files are in a data folder, you can iterate over it and collect the contents into a data frame:
df <- NULL
dir <- "data/"
for (file_ in list.files(dir)) {
  fullPath <- paste0(dir, file_)
  # read the whole file into a single string so the json structure is untouched
  content <- readChar(fullPath, file.info(fullPath)$size)
  df <- rbind(df, data.frame(fileName = fullPath, content = content,
                             stringsAsFactors = FALSE))
}
Without sample data and example output it's hard to know exactly what you're asking, but generally, to bind identically structured json files into a single data.frame:
purrr::map_dfr(
list.files("data/", full.names = TRUE), jsonlite::fromJSON
)
fromJSON creates a data.frame from the file. map_dfr iterates over the files and binds the data.frames together.
I have two csv files and I am using R:
https://drive.google.com/open?id=1CSLDs9qQXPMqMegdsWK2cQI_64B9org7
https://drive.google.com/open?id=1mVp1s0m4OZNNctVBn5JXIYK1JPsp-aiw
As is visible from the files, each file has a list of dates running from 2008 to the present, along with other columns.
I want my output to be two files, but both should contain rows of data only for the dates present in both files.
For example, if date X is not in one file, then it should be removed from the other file where it is present as well. Only dates and their corresponding rows present in both files should survive in both output files.
I tried the inner_join function in the dplyr library but that didn't work because the dates are in factor format.
You can avoid the factor conversion of character strings by adding stringsAsFactors = FALSE. In addition, in your dataset NA values are coded as the string "null", so you should also specify this in the call to read.csv:
library(dplyr)

path1 <- "the path for the first dataset KS"
path2 <- "the path for the second dataset 105560.KS"
df1 <- read.csv(path1, stringsAsFactors = FALSE)
df2 <- read.csv(path2, stringsAsFactors = FALSE, na.strings = "null")
# keep only the dates that appear in both files
df_comb <- inner_join(df1, df2, by = "Date")
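If you really need two separate output files rather than one joined table, a semi_join keeps only the rows of each data set whose Date also occurs in the other; the output file names below are made up for illustration.
# rows of each file restricted to the dates present in both
common1 <- semi_join(df1, df2, by = "Date")
common2 <- semi_join(df2, df1, by = "Date")
write.csv(common1, "file1_common_dates.csv", row.names = FALSE)
write.csv(common2, "file2_common_dates.csv", row.names = FALSE)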
I want to extract matrix data from THREE ".dat" files named x1, x2 and x3 and combine them into one matrix. (I have merged them here for convenience, but they should be assumed to come from three separate files.) Each file has a 3x3 matrix of data. I want to extract the data from each file with the corresponding DATE in one column, so the result will have 4 columns and 9 rows. The date should be written on the first row of each matrix and the rest of the spaces can be filled with NAs or left empty. Here is the file: [screenshot of the files omitted]
Assuming that the files have 3 header lines before the data begins and that all of them are in the working directory: get all the files from the working directory with list.files(), loop through 'files' reading each dataset with read.csv, skipping the first 3 lines and specifying header = FALSE. Then read the third line of each file with scan, remove the substring before the date part with sub, add the date as a column to each list element using Map, and rbind the output into a single data.frame.
files <- list.files()
# data part: skip the 3 header lines, no column names
lst <- lapply(files, read.csv, skip=3, header=FALSE)
# date line: the 3rd line of each file, read as a character string
lst2 <- lapply(files, scan, skip=2, nlines=1, what = "")
# strip everything up to and including the last colon and whitespace, keeping the date
Datetime <- sub(".*:\\s+", "", unlist(lst2))
# add the date to each data set and stack them into one data.frame
do.call(rbind, Map(cbind, lst, Datetime=Datetime))
I have a data.frame that contains one Date-type variable. I want to export 4 files, each containing the subset corresponding to one week. The following will divide my data into 4, however I don't know how to store each of these in a new data.frame.
split(DataAir, sample(rep(1:4)))
Thanks
If you save your split data frames in a variable, you can access the elements with double-bracket subsetting (e.g. s[[1]]). To save, create a vector of file names as you'd like and write each element to file.
s <- split(iris, iris$Species)
filenames <- paste0("my_path/file", 1:3, ".csv")
for(i in 1:length(s)) write.csv(s[[i]], filenames[i])
And for R users who get unnecessarily bugged out by for loops:
mapply(function(x,y) write.csv(x,y), s, filenames)
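If the four files should correspond to actual calendar weeks rather than a random split, a sketch along these lines may be closer to the original goal (assuming the date column in DataAir is called Date and is of class Date; the week labels and file naming scheme are made up):
# label every row with its calendar week, e.g. "2018-33"
wk <- format(DataAir$Date, "%Y-%U")
s <- split(DataAir, wk)
filenames <- paste0("week_", names(s), ".csv")
for (i in seq_along(s)) write.csv(s[[i]], filenames[i], row.names = FALSE)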