I am trying to read multiple csv files, all with the same number of columns and rows, into multiple data frames that can be accessed like this:
file[1] #Outputs the whole content of the first csv file
file[2] #Outputs the whole content of the second csv file
and so on...
I have already combined everything into one data frame, but the necessary values can't be accessed that way:
files = list.files(pattern="*.csv")
myfiles = do.call(rbind, lapply(files, function(x) read.csv(x, stringsAsFactors = FALSE)))
myfiles is one big data frame but I want to access them in the way I explained above.
I am using RStudio 0.9 and my working directory is where all the files are located. The csv files are named like this:
"001.csv", "002.csv", "003.csv", ...
Thank you in advance
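A minimal sketch of the list-based approach, assuming the files sit in one folder and share the same columns: skip the rbind() step and keep the result of lapply(), which is already a list with one data frame per file. Note that file[[1]] returns the data frame itself, while file[1] returns a one-element list wrapping it. The toy folder csv_list_demo is hypothetical and only there so the sketch runs on its own.

```r
# Toy files standing in for "001.csv", "002.csv", ... (same columns and rows).
demo_dir <- file.path(tempdir(), "csv_list_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(data.frame(a = i, b = i * 10),
            file.path(demo_dir, sprintf("%03d.csv", i)), row.names = FALSE)
}

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
# Keep each file as its own data frame in a list instead of rbind()-ing them.
file <- lapply(files, read.csv, stringsAsFactors = FALSE)
names(file) <- basename(files)   # optional: also allows file[["001.csv"]]

file[[1]]   # whole content of 001.csv
file[[2]]   # whole content of 002.csv
```

With your real files, point list.files() at your working directory instead of demo_dir.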
Related
I have a folder containing Excel files. Some of them have multiple worksheets, and I want to know how to handle that special case (maybe with a for loop): identify which files have worksheets that need to be read, select only certain tabs, and join all of the Excel files together at the end. In addition, these tabs need to skip 40 lines, which I assume would use skip = 40. When I type the code that I have, it looks like a giant mess:
library(readxl)
library(dplyr)
files <- list.files(path = "/Users/Desktop/folder2", pattern = "\\.xlsx$", full.names = TRUE)
files_join <- lapply(files, read_excel) %>%
  bind_rows()
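One way to handle the "only certain tabs, skip 40 rows" case is a small per-file helper built on readxl's excel_sheets() and read_excel(sheet =, skip =). This is a sketch, not a definitive solution: the helper name read_wanted_tabs and the tab name "Sheet1" are hypothetical, and bind_rows() will complain if the same column has different types across tabs.

```r
library(readxl)
library(dplyr)

# Hypothetical helper: read only the wanted tabs of one workbook,
# skipping the first `skip` rows of each tab (skip = 40 per the question).
read_wanted_tabs <- function(path, wanted_sheets, skip = 40) {
  sheets <- intersect(excel_sheets(path), wanted_sheets)  # wanted tabs present in this file
  if (length(sheets) == 0) return(NULL)                   # bind_rows() drops NULLs
  tabs <- lapply(sheets, function(s) {
    mutate(read_excel(path, sheet = s, skip = skip), sheet = s)  # record the source tab
  })
  mutate(bind_rows(tabs), file = basename(path))          # record the source workbook
}

# Folder from the question; "Sheet1" is a placeholder tab name, adjust to your data.
files <- list.files("/Users/Desktop/folder2", pattern = "\\.xlsx$", full.names = TRUE)
files_join <- bind_rows(lapply(files, read_wanted_tabs, wanted_sheets = c("Sheet1")))
```

Files that contain none of the wanted tabs return NULL and simply drop out of the final bind_rows(), so no separate "which files have worksheets" loop is needed.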
Folder 1 and Folder 2 are full of .rds files. How would I go about merging all files in both folders into 1 .rds file?
What I have so far
mergedat <- do.call('rbind', lapply(list.files("File/Path/To/Folder/1/", full.names = TRUE), readRDS))
However, I don't know how to add the second file path, and even then the code above does not seem to work.
The .rds files are all set up exactly the same in terms of number of columns and column headers, but the data in them is obviously different. I also just figured out that my code was not actually reading the files.
Any suggestions?
You can do something like this twice, each time for a different path:
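path <- "./files"
# list everything in the folder (full paths, no hidden files)
files <- list.files(path = path,
                    full.names = TRUE,
                    all.files = FALSE)
# drop sub-directories, keep only files
files <- files[!file.info(files)$isdir]
# read each .rds file into one element of a list
data <- lapply(files,
               function(x) {
                 readRDS(x)
               })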
You end up with two data objects, each a list whose elements are the data frames stored in the individual RDS files. If all those files have the same structure, you can use dplyr::bind_rows() to concatenate all the data frames into one combined data frame.
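As a sketch of doing both folders in one pass: list.files() accepts a vector of directories, so the two paths can be scanned together, stacked, and written back out with saveRDS() as the single .rds file the question asks for. The folder names below are hypothetical stand-ins, and base rbind() is used here in place of dplyr::bind_rows(); either should work when the columns match.

```r
# Hypothetical folder locations; replace with your own two folders.
folder1 <- file.path(tempdir(), "rds_folder1")
folder2 <- file.path(tempdir(), "rds_folder2")
dir.create(folder1, showWarnings = FALSE)
dir.create(folder2, showWarnings = FALSE)

# Toy data so the sketch runs on its own: identical columns in every file.
saveRDS(data.frame(id = 1:2, value = c(0.5, 1.5)), file.path(folder1, "a.rds"))
saveRDS(data.frame(id = 3:4, value = c(2.5, 3.5)), file.path(folder1, "b.rds"))
saveRDS(data.frame(id = 5:6, value = c(4.5, 5.5)), file.path(folder2, "c.rds"))

# list.files() takes a vector of directories, so both folders are scanned at once.
rds_files <- list.files(c(folder1, folder2), pattern = "\\.rds$", full.names = TRUE)

# Read every file into a list of data frames, then stack them by row.
merged <- do.call(rbind, lapply(rds_files, readRDS))

# Write the combined data frame out as one .rds file (kept outside both folders).
saveRDS(merged, file.path(tempdir(), "merged.rds"))
```

Writing merged.rds outside the two source folders avoids picking it up again the next time the folders are listed.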
I want to combine multiple .txt files in R from multiple folders. However, I'm running into trouble when I want to separate the data into different columns. Right now, the files are combined into a single column when there should be four.
I used list.files to find .txt files in the folders in my working directory. Then I used rbind and lapply with read.delim to combine them (see below):
files = list.files(pattern = "*.txt")
myfiles = do.call(rbind, lapply(files, function(x) read.delim(x, header = FALSE, stringsAsFactors = FALSE)))
The above code combines all of the .txt files, but the first 3 rows of each file are artifacts of the data download (basically just a naming feature) and are not pertinent to the data itself. So once the files are combined, those three lines repeat throughout. I cannot use filter(), as I would have to go through the data manually (many thousands of lines). I would also like to repeat this process in another folder with a similar setup, so I'd like to be able to reuse the same code.
I think I can resolve the issue by removing the top 3 lines of each .txt file before combining them. Then I can set header = FALSE and just add in headers once the files are combined. But again, there are many hundreds of files, so I do not wish to do this manually. I'm not sure how to do this, though. Any suggestions?
Thank you for any help.
Options, transcribed from the comment:
By itself, read.delim(..., skip = 3) will remove those leading duplicate rows. This also removes the header row, so all of your frames will have generic column names, which is not a big problem.
To fix that, you can re-read the first row of one of the files (say, the first one) with read.delim(..., nrows = 1) to get the column names. Using nrows = 0 reads everything, so we need at least 1 to limit the rows read; in the comment I included [0,], but since all you need is the column names, it doesn't really matter.
You can do it the first time with something like:
files = list.files(pattern = "*.txt")
myfiles = do.call(rbind, lapply(files, function(x) read.delim(x, skip = 3, header = FALSE, stringsAsFactors = FALSE)))
# added this part ^^^^^^^^^
colnames(myfiles) <- colnames(read.delim(files[1], header=TRUE, nrows=1))
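To reuse the same code for another folder, as the question asks, the two lines above can be wrapped in a function that takes the folder path. This is a sketch: the helper name combine_txt_folder and the toy files are hypothetical, and like the answer it assumes the first line of each file holds the column names and the data starts on line 4.

```r
# Hypothetical helper: combine every tab-delimited .txt file in `folder`,
# dropping the first `skip` lines of each file before stacking them.
combine_txt_folder <- function(folder, skip = 3) {
  files <- list.files(folder, pattern = "\\.txt$", full.names = TRUE)
  combined <- do.call(rbind, lapply(files, function(x)
    read.delim(x, skip = skip, header = FALSE, stringsAsFactors = FALSE)))
  # Assumes line 1 of the first file holds the column names.
  colnames(combined) <- colnames(read.delim(files[1], header = TRUE, nrows = 1))
  combined
}

# Toy input: two files whose first three lines are download artifacts.
demo <- file.path(tempdir(), "txt_demo")
dir.create(demo, showWarnings = FALSE)
for (i in 1:2) {
  writeLines(c("site\tdate\ttemp\tflow",          # line 1: column names
               "# downloaded 2020-01-01",         # lines 2-3: artifacts
               "# units row",
               paste0("S", i, "\t2020-01-0", i, "\t", 10 + i, "\t", 100 * i)),
             file.path(demo, paste0("file", i, ".txt")))
}

result <- combine_txt_folder(demo)
```

For the second folder you would then only call combine_txt_folder() with that folder's path.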
Using this script, I have created a separate folder for each csv file and saved all my further analysis results in that folder. Each folder has the same name as its csv file. The csv files themselves are stored in the main (master) directory.
Now, I have created a csv file in each of these folders which contains a list of all the fitted values.
I would now like to do the following:
Set the working directory to the folder named after the current file
Read the fitted values file
Add a row/column stating the name of the site (unique ID)
Add it to the master file, which is stored in the main directory, with a title specifying the site name/filename. It can be stacked by rows or by columns; it doesn't really matter.
Return to the main directory to pick the next file
Repeat the loop
Using merge(), rbind(), or cbind() combines all the data under one column name. I want to keep all the sites separate for comparison at a later stage.
This is what I'm using at the moment and I'm lost on how to proceed further.
setwd("path") # main directory
path <- "path" # need this for convenience while switching back to main directory
# list all csv files in the main directory
files <- list.files(path = path, pattern = "\\.csv$")
for (i in seq_along(files)) {
  fileName <- read.csv(file.path(path, files[i])) # full path, so it still works after setwd() below
  base <- sub("\\.csv$", "", files[i]) # the filename without ".csv"
  setwd(file.path(path, base)) # set the working directory to the folder with the same name
  master <- read.csv(paste0(base, "_fiited_values curve.csv"))
  # read the fitted value csv file for the site and store it in a list
}
I want to construct a for loop to make one master file with the files in different directories. I do not want to merge all under one column name.
For example, if I have 50 similar csv files and each has two columns of data, I would like one csv file that holds all of it, but in its original format rather than appended to the existing rows/columns. So I would end up with 100 columns of data.
Please let me know what further information I can provide.
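The steps listed in the question can be sketched without setwd() by building full paths with file.path(), which avoids having to switch back to the main directory. This is only a sketch: the fitted-values file name pattern <name>_fitted_values.csv is a hypothetical stand-in for the real one, and the toy folders exist only so the code runs.

```r
# Toy layout: main directory with 001.csv, 002.csv and a folder per file.
main_dir <- file.path(tempdir(), "main_demo")
for (base in c("001", "002")) {
  dir.create(file.path(main_dir, base), recursive = TRUE, showWarnings = FALSE)
  write.csv(data.frame(raw = 1:3), file.path(main_dir, paste0(base, ".csv")),
            row.names = FALSE)
  write.csv(data.frame(fitted = c(0.5, 1.5, 2.5)),
            file.path(main_dir, base, paste0(base, "_fitted_values.csv")),
            row.names = FALSE)
}

# Top-level csv files only; exclude the master file written below.
files <- setdiff(list.files(main_dir, pattern = "\\.csv$"), "master.csv")
fitted_by_site <- list()
for (f in files) {
  base <- sub("\\.csv$", "", f)                         # "001.csv" -> "001"
  fitted <- read.csv(file.path(main_dir, base, paste0(base, "_fitted_values.csv")))
  fitted$site <- base                                   # column stating the site ID
  fitted_by_site[[base]] <- fitted                      # sites stay separate in the list
}

# Stack by rows for the master file; the site column keeps sites distinguishable.
master <- do.call(rbind, fitted_by_site)
write.csv(master, file.path(main_dir, "master.csv"), row.names = FALSE)
```

The named list fitted_by_site keeps each site apart for later comparison, while the master file holds everything in one table.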
For reading a group of files from a number of different directories, with path names patha, pathb and pathc:
paths = c('patha','pathb','pathc')
files = unlist(sapply(paths, function(path) list.files(path,pattern = "*.csv", full.names = TRUE)))
listContainingAllFiles = lapply(files, read.csv)
If you want to be really quick about it, you can grab fread from data.table:
library(data.table)
listContainingAllFiles = lapply(files, fread)
Either way, this gives you a list of all the data frames, kept separate. If you want to join them together vertically or horizontally, then:
do.call(rbind, listContainingAllFiles)
do.call(cbind, listContainingAllFiles)
EDIT: NOTE, the latter (cbind) makes no sense unless corresponding rows across files actually belong together. It makes far more sense to just create a field tracking which location the data comes from.
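If you do want the wide layout from the question (each file's two columns side by side, 100 columns for 50 files), one way to keep the sites distinguishable is to prefix every column with the site name before cbind(). A sketch only: it assumes every file has the same number of rows in matching order, and the folder names are hypothetical.

```r
# Toy data: two hypothetical site folders, each with a two-column csv.
root <- file.path(tempdir(), "wide_demo")
for (site in c("site001", "site002")) {
  dir.create(file.path(root, site), recursive = TRUE, showWarnings = FALSE)
  write.csv(data.frame(x = 1:3, fitted = c(0.1, 0.2, 0.3)),
            file.path(root, site, "fitted.csv"), row.names = FALSE)
}

files <- list.files(root, pattern = "\\.csv$", recursive = TRUE, full.names = TRUE)
master <- NULL
for (f in files) {
  df <- read.csv(f)
  site <- basename(dirname(f))                     # folder name used as the site ID
  names(df) <- paste(site, names(df), sep = "_")   # e.g. "site001_fitted"
  # cbind() requires every file to have the same number of rows.
  master <- if (is.null(master)) df else cbind(master, df)
}
write.csv(master, file.path(tempdir(), "master_wide.csv"), row.names = FALSE)
```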
If you want to use the file names to determine the sample location (I don't see where you're getting this info from in your example), then you want to add them as you read in the files, so:
listContainingAllFiles = lapply(files,
function(file) data.frame(filename = file,
read.csv(file)))
Then later you can split that column to get your details (assuming, of course, that you have a standard naming convention).
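As an example of that last step, assuming a naming convention where each file is called <site>.csv (hypothetical), the site ID can be recovered with basename() and sub(). stringsAsFactors = FALSE is passed explicitly so the filename column stays character on older R versions too.

```r
# Toy data: three csv files in two hypothetical folders, named <site>.csv.
dirs <- file.path(tempdir(), c("long_a", "long_b"))
for (d in dirs) dir.create(d, showWarnings = FALSE)
write.csv(data.frame(x = 1:2, y = c(5, 6)), file.path(dirs[1], "001.csv"), row.names = FALSE)
write.csv(data.frame(x = 1:2, y = c(7, 8)), file.path(dirs[1], "002.csv"), row.names = FALSE)
write.csv(data.frame(x = 1:2, y = c(9, 10)), file.path(dirs[2], "003.csv"), row.names = FALSE)

files <- unlist(lapply(dirs, list.files, pattern = "\\.csv$", full.names = TRUE))
all_sites <- do.call(rbind, lapply(files, function(file)
  data.frame(filename = file, read.csv(file), stringsAsFactors = FALSE)))

# Derive a site ID from the file name: strip the folder and the ".csv" extension.
all_sites$site <- sub("\\.csv$", "", basename(all_sites$filename))
```

Keeping the data in this long format, with one site column, makes later per-site comparisons straightforward (for example with split(all_sites, all_sites$site)).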
I have a zip file having .txt documents. I want to unzip the file in R and convert the text documents into .csv so that I can use it for further analysis.
Can I assign header names during the conversion?
I also want to repeat the process by writing a function that reads each converted .csv file and generates basic graphs from the data. Is this feasible in R?
For instance, let's say the zip file is named 'data.zip' and contains 5 text files (1.txt, 2.txt, 3.txt, 4.txt, 5.txt). Each text file has log information on a single row with IP, date and time:
111.999.88.80 - - [27/Mar/2017:00:03:16 -0600] "HEAD / HTTP/1.1"
Your answers will be of great help.
Thanks in advance!
I created a reproducible sample, and I think this may solve your problem.
You can download the sample zip file I created from here.
Here is the full code:
## Clean Memory
rm(list=ls())
## Set path for your working location
setwd("D:/blah")
## Unzip the file
unzip("D:/blah/text.zip")
## Check the extracted files
list.files()
## Read the file
temp = list.files(pattern="*.txt")
There are two options here. I think what you want is the second one, which reads the two files in the sample and merges them into one data frame.
## Read the file as list
myfiles= lapply(temp, read.delim)
## Read the file all together
myfiles = do.call("rbind", lapply(temp, function(x) read.table(x, stringsAsFactors = FALSE,header = TRUE)))
Make sure to adjust the header setting if needed.
Alrighty, good luck.
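For the header-name and graph parts of the question, here is a sketch under a few assumptions: every line follows the single-line log format shown in the question, the column names (ip, date, time, tz, request) are ones chosen for illustration, and the helper names log_txt_to_csv and plot_requests are hypothetical. The regular expression splits each line into those fields, write.csv() writes them with the chosen headers, and a second function draws one basic bar chart per csv into a pdf.

```r
# Hypothetical helper: parse access-log lines and write them to a csv
# with named columns (this is where the header names are assigned).
log_txt_to_csv <- function(txt_file, csv_file) {
  lines <- readLines(txt_file)
  pattern <- '^([^ ]+) [^ ]+ [^ ]+ \\[([^:]+):([0-9:]+) ([-+][0-9]{4})\\] "([^"]*)"'
  m <- regmatches(lines, regexec(pattern, lines))
  m <- m[lengths(m) > 0]                      # drop lines that do not match
  parts <- do.call(rbind, m)                  # column 1 is the full match
  df <- data.frame(ip = parts[, 2], date = parts[, 3], time = parts[, 4],
                   tz = parts[, 5], request = parts[, 6], stringsAsFactors = FALSE)
  write.csv(df, csv_file, row.names = FALSE)
  df
}

# Hypothetical helper: one basic graph per csv (requests per IP), saved as a pdf.
plot_requests <- function(csv_file) {
  df <- read.csv(csv_file, stringsAsFactors = FALSE)
  pdf(sub("\\.csv$", ".pdf", csv_file))
  barplot(table(df$ip), main = basename(csv_file), ylab = "Requests")
  dev.off()
}

out_dir <- file.path(tempdir(), "logs")
dir.create(out_dir, showWarnings = FALSE)
# With the real archive, extract it first; this is skipped if data.zip is absent.
if (file.exists("data.zip")) unzip("data.zip", exdir = out_dir)

# Toy input standing in for the extracted 1.txt ... 5.txt.
writeLines('111.999.88.80 - - [27/Mar/2017:00:03:16 -0600] "HEAD / HTTP/1.1"',
           file.path(out_dir, "1.txt"))

for (txt in list.files(out_dir, pattern = "\\.txt$", full.names = TRUE)) {
  csv <- sub("\\.txt$", ".csv", txt)
  log_txt_to_csv(txt, csv)
  plot_requests(csv)
}
```

pdf() is used rather than png() because it does not depend on a graphics display being available.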