Reading files from loops in R

I have the following part of the code that contains two loops. I have some txt files which I want to read and analyze in R separately, one by one. Currently I have a problem importing them into R. For example, the name of the first file is "C:/Users/User 1/Documents/Folder 1/1 1986.txt". To read it in R I have written the following loop:
## company
for(i in 1)
{
  ## year
  for(j in 1986)
  {
    df = read.delim(paste("C:/Users/User 1/Documents/Folder 1/", i, j, ".txt"), stringsAsFactors=FALSE, header=FALSE)
    df <- data.frame(rename(df, c("V3"="weight")))
  }
}
When I run the loop, I get the following error:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file 'C:/Users/User 1/Documents/Folder 1/ 13 1986 .txt': No such file or directory
How do I avoid the extra spaces that R inserts into the file name?

You should replace paste with paste0.
By default, paste uses a single space as the separator between its arguments, which is exactly what produces the result above; paste0 uses no separator at all. Note, though, that your actual file name ("1 1986.txt") does contain one space between the company number and the year, so that space has to be put back explicitly, as in the sketch below.
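A minimal sketch of the corrected call (assuming all file names follow the "company year.txt" pattern shown above):
## the file name "1 1986.txt" contains one space between the company
## number and the year, so that space is added back explicitly
for(i in 1)        ## company
{
  for(j in 1986)   ## year
  {
    f <- paste0("C:/Users/User 1/Documents/Folder 1/", i, " ", j, ".txt")
    df <- read.delim(f, stringsAsFactors=FALSE, header=FALSE)
  }
}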

Because I don't know exactly what your files look like, maybe this won't help you... But this is how I read in files with a loop:
First: setting the working directory
setwd("/Users/User 1/Documents/Folder 1")
Then I always save my data as one Excel file with different sheets. For this example I have 15 different sheets in my Excel file named 2000-2014.xlsx; the first sheet is called "2000", the second "2001", and so on.
library(readxl)

sheets <- list()  # create an empty list named sheets
k <- c(2000:2014) # the years matching the 15 sheets
for(i in 1:15){
  sheets[[i]] <- read_excel("2000-2014.xlsx", sheet = i) # every sheet becomes one element of the list
  sheets[[i]]$Year <- k[i] # add a "Year" column to each element, matching the actual year the data is from
}
Now I want my data from 2000 to 2014 merged into one big data frame. I can still analyse the years one by one!
data <- do.call(rbind.data.frame, sheets)
To get all my data tidied in one place, in the form Hadley Wickham and ggplot2 like (http://vita.had.co.nz/papers/tidy-data.pdf), I restructure it:
library(dplyr) # for the %>% pipe

data_restructed <- data %>%
  as.data.frame() %>%
  tidyr::gather(key = "categories", value = "values", 2:12)
2:12 because in my case columns 2:12 contain all the values, while column 1 contains the country names. Now you have all your data in one big data frame and can analyse it separated by specific variables, like the year, the category, or year AND category, and so on.

I would avoid the loop in this case and go with lapply.
library(plyr) # for rename()

Files <- list.files('C:/Users/User 1/Documents/Folder 1/', pattern = "\\.txt$", full.names = TRUE)
fileList <- lapply(Files, FUN = function(x){
  df <- read.delim(x, stringsAsFactors=FALSE, header=FALSE)
  df <- data.frame(rename(df, c("V3"="weight")))
  return(df)
})
do.call('rbind', fileList)
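If you also need to know which file each row came from, one small extension of the sketch above is to name the list elements before binding; the names then carry over into the row names of the combined data frame:
names(fileList) <- basename(Files)      # e.g. "1 1986.txt"
combined <- do.call('rbind', fileList)  # row names now indicate the source file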

Related

I am trying to read only the tail of multiple .xlsx files and merge them into a data.frame of lists

I am trying to merge multiple .xlsx sheets together into one data file within R, extracting only the last row of each sheet.
I am a clinical academic, and we currently have a prediction algorithm implemented via a macro-enabled Excel spreadsheet. This spreadsheet outputs a .xlsx sheet into a pre-specified folder.
Unfortunately it inserts a series of test rows into the output .xlsx. Furthermore, the users occasionally input the same data multiple times until it is correct. For these reasons, only the final row of each .xlsx file should be included in the cleaned data.
I have managed to merge all the files, using the code below, mainly thanks to the help/code I have found in this community.
Unfortunately I am stuck on the following error message; see below.
library(plyr)
library(dplyr)
library(readxl)
# file directory where the .xlsx files are listed
path <- "//c:/documents"
filenames_list <- list.files(path = path, full.names = TRUE)
All_list <- lapply(filenames_list,
  function(filename){
    print(paste("Merging", filename, sep = " "))
    read.xlsx(filename)
  })
# the code below doesn't work;
# it returns the following error:
# Error in x[seq.int(to = xlen, length.out = n)] :
#   object of type 'S4' is not subsettable
tail_only_list_df <- lapply(All_list,
  function(newtail){
    tail(newtail, 1)
  })
final_df <- rbind.fill(tail_only_list_df)
Try doing the following:
df <- do.call(rbind, lapply(filenames_list, function(filename)
tail(openxlsx::read.xlsx(filename), 1)))
Or, if you already have the list of data frames read from the Excel files, do
df <- do.call(rbind, lapply(All_list, tail, 1))
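If the individual sheets don't all share exactly the same columns, rbind() will fail; since plyr is already loaded above, plyr::rbind.fill() is a drop-in alternative that pads missing columns with NA:
df <- plyr::rbind.fill(lapply(All_list, tail, 1))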

Importing multiple csv files, manipulating them (filter a column and do a summary) and exporting the results to one txt file

I have code here where I am trying to manipulate my files so that one column contains only 1s, not 1s and 0s. I have multiple files and multiple columns, but filtering one column to keep only the 1s while leaving everything else alone should be easy to do. I cannot get the dplyr call d %>% filter(CreaseUp>0) to work. Maybe there is another approach with lapply that would work? Everything else works: I can get the files summarized and output into one file. I'm so close to getting this right. Please help.
setwd("~/OneDrive/School/R/R Workspace/2016_Coda-Brundage/cb")
#assuming your working directory is the folder with the CSVs
f = list.files(pattern="*.csv")
for (i in 1:length(f)) assign(f[i], read.csv(f[i]))
d<-lapply(f, read.csv)
f.1<-d %>%
filter(CreaseUp>0)
w<-lapply(f.1, summary)
write.table(w, file = "SeedScan_results1.csv", sep = ",", col.names = NA,
qmethod = "double")
Final script. I had to open the .txt file in Office, change the spaces in between the headings and numbers to commas, and then create a table from the text. From there I could put it in Excel and pull my means from this set.
setwd("~/OneDrive")
#assuming your working directory is the folder with the CSVs
f=list.files(pattern="*.csv")
library(dplyr)
sink("SeedScan_results1.txt")
for (i in 1:length(f)){
df=assign(f[i], read.csv(f[i]))
df=filter(df, CreaseUp>0)
print(lapply(df, summary))
}
sink(NULL)
The d seems to be a list of data frames, not a data frame, so dplyr can't handle it. Also, what is that loop doing now? Why not put the read (and possibly the filtering) inside the loop?
library(dplyr)

alldfs <- NULL
for (i in f){
  df <- read.csv(i)
  df <- filter(df, CreaseUp>0)
  alldfs <- bind_rows(alldfs, df)
}
# print summary etc.
EDIT - if you want to print the summary from within the loop:
sink("SeedScan_results1.txt")
for (i in f){
df = read.csv(i)
df = filter(df, CreaseUp>0)
print(lapply(df, summary))
}
sink(NULL)
The append flag might be helpful if you want to move sink inside the loop, as in the sketch below.
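A sketch of that variant, reopening the sink with append = TRUE each time so the earlier output is kept instead of overwritten:
for (i in f){
  df <- filter(read.csv(i), CreaseUp>0)
  sink("SeedScan_results1.txt", append = TRUE)
  print(lapply(df, summary))
  sink(NULL)
}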

R: Reading and writing multiple csv files in a loop, then using the original names for output

Apologies if this seems simple, but I can't find a workable answer anywhere on the site.
My data is in the form of csv files whose names are a name plus a number, so it is not quite as simple as a file with a generic word and an increasing number...
I've achieved exactly what I want to do with just one file, but the issue is that there are a couple of hundred to do, so changing the name each time is quite tedious.
Posting my original single-batch code here in the hope that someone may be able to ease the growing tension of failed searches.
# set workspace
getwd()
setwd(".../Desktop/R Workspace")
# bring in original file, skipping first four rows
Person_7<- read.csv("PersonRound7.csv", header=TRUE, skip=4)
# cut matrix down to 4 columns
Person7<- Person_7[,c(1,2,9,17)]
# give columns names
colnames(Person7) <- c("Time","Spare", "Distance","InPeriod")
# find the empty rows, create new subset. Take 3 rows away for empty lines.
nullrow <- (which(Person7$Spare == "Velocity"))-3
Person7 <- Person7[(1:nullrow), ]
#keep 3 needed columns from matrix
Person7<- Person7[,c(1,3,4)]
colnames(Person7) <- c("Time","Distance","InPeriod")
#convert distance and time columns to numeric
options(digits=9)
Person7$Distance <- as.numeric(as.character(Person7$Distance))
Person7$Time <- as.numeric(as.character(Person7$Time))
#Create the differences column for distance
Person7$Diff <- c(0, diff(Person7$Distance))
...whole heap of other stuff...
#export Minutes to an external file
write.csv(Person7_maxs, ".../Desktop/GPS Minutes/Person7.csv")
So the three-part issue is as follows:
1. I can create a list or vector to read through the file names, but not a data frame for each file each time (if that's even a good way to do it).
2. The variable names throughout the code will need to change: instead of just being "Person1", "Person2", they'll be more like "Johnny1", "Lou23".
3. Each resulting data frame needs to be exported to its own csv file with the original name.
Taking any and all suggestions on board - struggling with this one.
Cheers!
Consider using one list of the ~200 data frames. There is no need for separate named objects flooding the global environment (though list2env is still shown below). Hence, use lapply() to iterate through all csv files in the working directory, then simply name each element of the list after its file's basename:
setwd(".../Desktop/R Workspace")
files <- list.files(path=getwd(), pattern="\\.csv$")
# CREATE DATA FRAME LIST
dfList <- lapply(files, function(f) {
  df <- read.csv(f, header=TRUE, skip=4)
  df <- setNames(df[c(1,2,9,17)], c("Time","Spare","Distance","InPeriod"))
  # ...same code as in the question, referencing the temp variable df
  # (df_max stands in for the result of the elided steps)
  write.csv(df_max, paste0(".../Desktop/GPS Minutes/", f))
  return(df)
})
# NAME EACH ELEMENT TO CORRESPONDING FILE'S BASENAME
dfList <- setNames(dfList, gsub("\\.csv$", "", files))
# REFERENCE A DATAFRAME WITH LIST INDEXING
str(dfList$PersonRound7) # PRINT STRUCTURE
View(dfList$PersonRound7) # VIEW DATA FRAME
dfList$PersonRound7$Time # OUTPUT ONE COLUMN
# OUTPUT ALL DFS TO SEPARATE OBJECTS (THOUGH NOT NEEDED)
list2env(dfList, envir = .GlobalEnv)

Applying lapply to one column of each csv file

I have a folder with several hundred csv files. I want to use lapply to calculate the mean of one column within each csv file and save the values into a new csv file that would have two columns: column 1 would be the name of the original file, and column 2 would be the mean value for the chosen field from that file. Here's what I have so far:
setwd("C:/~~~~")
list.files()
filenames <- list.files()
read_csv <- lapply(filenames, read.csv, header = TRUE)
dataset <- lapply(filenames[1], mean)
write.csv(dataset, file = "Expected_Value.csv")
Which gives the error message:
Warning message: In mean.default("2pt.csv"[[1L]], ...) : argument is not numeric or logical: returning NA
So I think I have (at least) two problems that I cannot figure out.
First, why doesn't R recognize that column 1 is numeric? I double- and triple-checked the csv files and I'm sure this column is numeric.
Second, how do I get the output file to return two columns the way I described above? I haven't gotten far with the second part yet; I wanted to get the first part to work first. Any help is appreciated.
I didn't use lapply but have done something similar. Hope this helps!
##create an empty data frame
df <- NULL
##directory from which all files are to be read
directory <- "C:/mydir/"
##read all csv file names from the directory
x <- list.files(directory, pattern='csv')
xpath <- paste0(directory, x)
##for loop to read each file and save the metric and the file name
for(i in seq_along(x))
{
  file <- read.csv(xpath[i], header=TRUE, sep=",")
  first_col <- file[,1]
  d <- data.frame(mean = mean(first_col), filename = x[i])
  df <- rbind(df, d)
}
###write all output to csv
write.csv(df, file = "C:/mydir/final.csv")
The output CSV file looks like this:
mean filename
1999.000661 hist_03082015.csv
1999.035121 hist_03092015.csv
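For what it's worth, a minimal lapply/sapply version of the same idea (a sketch only; it reuses x and xpath from above and assumes the first column of every file really is numeric):
means <- sapply(xpath, function(p) mean(read.csv(p)[, 1]))
df <- data.frame(filename = x, mean = unname(means))
write.csv(df, file = "C:/mydir/final.csv", row.names = FALSE)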
Thanks for the two answers. After much review, it turns out that there was a much easier way to accomplish my goal. The csv files that I had were originally in one file; I had split them into multiple files by location. At the time, I thought this was necessary to calculate the mean for each type. Clearly, that was a mistake. I went back to the original file and used aggregate. Code:
setwd("C:/~~")
allshots <- read.csv("All_Shots.csv", header=TRUE)
EV <- aggregate(allshots$points, list(Location = allshots$Loc), mean)
write.csv(EV, file= "EV_location.csv")
This was a simple solution. Thanks again for the answers. I'll need to get better at lapply for future projects, so they were not a waste of time.

How to not overwrite file in R

I am trying to copy and paste tables from R into Excel. Consider the following code from a previous question:
data <- list.files(path=getwd())
n <- length(data)
for (i in 1:n)
{
  data1 <- read.csv(data[i])
  outline <- data1[,2]
  outline <- as.data.frame(table(outline))
  print(outline) # this prints all n tables
  name <- paste0(i, "X.csv")
  write.csv(outline, name)
}
This code writes each table into separate Excel files (i.e. "1X.csv", "2X.csv", etc..). Is there any way of "shifting" each table down some rows instead of rewriting the previous table each time? I have also tried this code:
output <- as.data.frame(output)
wb = loadWorkbook("X.xlsx", create=TRUE)
createSheet(wb, name = "output")
writeWorksheet(wb,output,sheet="output",startRow=1,startCol=1)
writeNamedRegion(wb,output,name="output")
saveWorkbook(wb)
But this does not copy the dataframes exactly into Excel.
I think, as mentioned in the comments, the way to go is to first merge the data frames in R and then write them into one output file:
# get vector of filenames
filenames <- list.files(path=getwd())
# for each filename: load file and create outline
outlines <- lapply(filenames, function(filename) {
  data <- read.csv(filename)
  outline <- data[,2]
  outline <- as.data.frame(table(outline))
  outline
})
# merge all outlines into one data frame (by appending them row-wise)
outlines.merged <- do.call(rbind, outlines)
# save merged data frame
write.csv(outlines.merged, "all.csv")
Despite what Microsoft would like you to believe, .csv files are not Excel files; they are a common file type that can be read by Excel and many other programs.
The best approach depends on what you really want to do. Do you want all the tables to be read into a single worksheet in Excel? If so, you could write to a single file with write.table and append = TRUE (note that write.csv ignores the append argument), or use a connection that you keep open so each new table is appended. You may want to use cat to put a couple of newlines before each new table. A sketch follows below.
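A rough sketch of that single-file approach, reusing data and n from the question's loop (the output name all_tables.csv is made up; write.table warns when appending column names, which is harmless here):
out <- "all_tables.csv"
for (i in 1:n)
{
  data1 <- read.csv(data[i])
  outline <- as.data.frame(table(data1[,2]))
  write.table(outline, out, sep = ",", append = TRUE,
              row.names = FALSE, col.names = TRUE)
  cat("\n", file = out, append = TRUE) # blank line before the next table
}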
Your second attempt looks like it uses the XLConnect package (but you don't say, so it could be something else). I would think this is the best approach; how is the result different from what you are expecting? If it is XLConnect, the "shifting down" you describe can be had by advancing startRow on each write, as sketched below.
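A sketch under the assumption that XLConnect is indeed the package in use, again reusing data and n from the question:
library(XLConnect)
wb <- loadWorkbook("X.xlsx", create = TRUE)
createSheet(wb, name = "output")
row <- 1
for (i in 1:n)
{
  outline <- as.data.frame(table(read.csv(data[i])[,2]))
  writeWorksheet(wb, outline, sheet = "output", startRow = row, startCol = 1)
  row <- row + nrow(outline) + 2 # header row + data rows + one blank row
}
saveWorkbook(wb)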
