Read files, manipulate file names and write files from/to a folder in R

I have 10 files in a folder, which I have to manipulate. At the moment I do that manually, meaning I adapt the script ten times. Now I'm trying to do it automatically.
First I get the file names with this command:
data <- list.files(path=".", pattern=".csv", full.names=TRUE)
Now my idea is to iterate over the file names with a for loop.
for(i in data) {
df <- read.csv("i", header=T)
df$sum <- sum(df$value1, df$value2)
write.table(df, file=i, row.names=FALSE)
}
I'm not sure if the read.csv command works. Moreover, I don't want to overwrite the original files. The original file names have the following structure:
30LOV_1.csv
30LOV_2.csv
100LOV_1.csv
2000LOV_1.csv
I want to add something like _20min to the file names, e.g.
30LOV_1_20min.csv
30LOV_2_20min.csv
100LOV_1_20min.csv
2000LOV_1_20min.csv
How can I achieve this? Do you have any suggestions?
Thanks

You can use sub to add additional information to the end of your file names. I used parentheses to create a capture group that grabs the first part of the file name, (.*), and then recalled it in the replacement with \\1.
for(i in data) {
  df <- read.csv(i, header=TRUE)   # use the loop variable i, not the string "i"
  df$sum <- sum(df$value1, df$value2)
  filename <- sub("(.*)\\.csv$", "\\1_20min.csv", i)
  filename
  # [1] "30LOV_1_20min.csv"
  write.table(df, file=filename, row.names=FALSE)
}
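Since sub is vectorized, a quick way to sanity-check the new names before writing anything is to run it over the whole vector of file names at once (a small sketch, assuming data still holds the result of list.files above):
# preview all output names without writing any files
new_names <- sub("(.*)\\.csv$", "\\1_20min.csv", data)
new_names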

Related

How to read multiple files with specific character or number using a loop in R

I have a directory with a bunch of .tif files like: tnt_xxx_2015.tif, tnt_xxx_2016.tif, tnt_xxx_2017.tif......tnt_xxx_2100.tif. 'tnt' is one of the variable names, and for each year "xxx" represents multiple strings. The years are from 2015 to 2100. So I want to find all files where the variable name and the year have the same values. I tried to find all the files with the variable 'tnt' in the same year first:
for (i in 2015:2100)
{
myfiles = Sys.glob("*i.tif")
}
But it doesn't work. It won't accept wildcards; it needs an exact match on the file name. Sys.glob("2015.tif") runs successfully.
Any ideas?
How about:
for(i in 2015:2100){
  myfiles <- list.files(pattern=paste0(i, ".tif$"))
}
This replicates your code above, but will overwrite myfiles each time through the loop. If you want to just grab all the files in one vector, you could do the following:
myfiles <- NULL
for(i in 2015:2100){
  myfiles <- c(myfiles, list.files(pattern=paste0(i, ".tif$")))
}
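If you would rather stick with Sys.glob from the original attempt, the same idea works there too; the issue was only that i is not substituted inside a quoted string, so the pattern has to be built with paste0 (a sketch):
myfiles <- NULL
for(i in 2015:2100){
  # build the wildcard pattern so the year is actually substituted
  myfiles <- c(myfiles, Sys.glob(paste0("*", i, ".tif")))
}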

remove multiple sequenced files in R

I created multiple files, named 1:100 plus a random letter, with this code:
for (i in 1:100){
  file.create(paste0(i, ".txt"), showWarnings=TRUE)
  # assign random LETTER to files
  AZ <- sample(LETTERS,1)
  cat(AZ, file = paste0(i,".txt"), append=TRUE)
  # rename files, and create new file with append of LETTERS
  name <- scan(file=paste0(i,".txt"), what="character")
  file.rename(paste0(i,".txt"), paste0(i, name,".txt"))
}
Now I have a lot of files named like "1T, 2C, 3Y, ..., 100A" and I want to remove all these files (without removing the rest of the files in the directory) with the file.remove function. How can I remove them without naming them one by one? And how can I remove the whole directory named "exercicio03" with everything inside?
PS: I have already tried
file.remove(paste0(i,name,".txt"))
but it removes only the last file, "100A".
You can easily remove only the files with names like "1T.txt, 2C.txt, 3Y.txt, ..., 100A.txt" with the following two lines of code:
remove.files <- list.files(".", pattern="^[0-9]{1,3}[A-Z]{1}\\.txt$")
do.call(file.remove,list(remove.files))
The first line finds all text files in the current directory whose names consist of 1-3 digits followed by a single uppercase letter, and the second line removes them.
Since you used the sample function, I think you can only be 100% sure that you remove these files and no others if you saved the values you got from sample.
So your first part should have been:
AZ<-NA
for (i in 1:100){
file.create( paste0(i , ".txt"), showWarnings=TRUE)
# assign random LETTER to files
AZ[i] <- sample(LETTERS,1)
cat(AZ[i],file = paste0(i,".txt"),append=TRUE)
#rename files, and create new file with append of LETTERS
name <- scan(file=paste0(i,".txt"), what="character")
file.rename(paste0(i,".txt"), paste0(i, name,".txt"))
}
That way you can afterwards remove them all via this:
for (i in 1:100){
file.remove(paste0(i,AZ[i],".txt"))
}
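For the directory part of the question ("exercicio03" and everything inside it), base R's unlink with recursive = TRUE removes a directory together with its contents; a one-line sketch, assuming "exercicio03" sits under the current working directory:
unlink("exercicio03", recursive = TRUE)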

lapply r to one column of a csv file

I have a folder with several hundred csv files. I want to use lappply to calculate the mean of one column within each csv file and save that value into a new csv file that would have two columns: Column 1 would be the name of the original file. Column 2 would be the mean value for the chosen field from the original file. Here's what I have so far:
setwd("C:/~~~~")
list.files()
filenames <- list.files()
read_csv <- lapply(filenames, read.csv, header = TRUE)
dataset <- lapply(filenames[1], mean)
write.csv(dataset, file = "Expected_Value.csv")
Which gives the warning message:
Warning message: In mean.default("2pt.csv"[[1L]], ...) : argument is not numeric or logical: returning NA
So I think I have (at least) two problems that I cannot figure out.
First, why doesn't R recognize that column 1 is numeric? I double, triple checked the csv files and I'm sure this column is numeric.
Second, how do I get the output file to return two columns the way I described above? I haven't gotten far with the second part yet.
I wanted to get the first part to work first. Any help is appreciated.
I didn't use lapply but have done something similar. Hope this helps!
## indices of the files to process; modify as per need
idx <- 1:2
## create an empty data frame to collect the results
df <- NULL
## directory from which all files are to be read
directory <- "C:/mydir/"
## read all csv file names from the directory
x <- list.files(directory, pattern="csv")
xpath <- paste(directory, x, sep="")
## for loop to read each file and save the metric and the file name
for(i in idx){
  file <- read.csv(xpath[i], header=TRUE, sep=",")
  first_col <- file[,1]
  d <- data.frame(mean = mean(first_col), filename = x[i])
  df <- rbind(df, d)
}
### write all output to csv
write.csv(df, file = "C:/mydir/final.csv")
The resulting CSV file looks like this:
mean filename
1999.000661 hist_03082015.csv
1999.035121 hist_03092015.csv
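For reference, here is a minimal lapply-style sketch of the same idea, closer to what the question originally asked for (it assumes the column of interest is the first column of every csv and that it is numeric):
filenames <- list.files(pattern = "\\.csv$")
# mean of the first column of each file
means <- unlist(lapply(filenames, function(f) mean(read.csv(f)[[1]], na.rm = TRUE)))
result <- data.frame(filename = filenames, mean = means)
write.csv(result, file = "Expected_Value.csv", row.names = FALSE)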
Thanks for the two answers. After much review, it turns out that there was a much easier way to accomplish my goal. The csv files that I had were originally in one file. I split them into multiple files by location. At the time, I thought this was necessary to calculate the mean for each type. Clearly, that was a mistake. I went back to the original file and used aggregate. Code:
setwd("C:/~~")
allshots <- read.csv("All_Shots.csv", header=TRUE)
EV <- aggregate(allshots$points, list(Location = allshots$Loc), mean)
write.csv(EV, file= "EV_location.csv")
This was a simple solution. Thanks again for the answers. I'll need to get better at lapply for future projects, so they were not a waste of time.

Assigning Directory as a Variable in R

I need to create a function called PollutantMean with the following arguments: directory, pollutant, and id = 1:332.
I have most of the code written but I can't figure out how to assign my directory as a variable. My current working directory is C:/Users/User/Documents. I tried writing the variable as:
directory <- "C:/Users/User/specdata" and that didn't work.
Next I tried the following:
directory <- list.files("specdata", full.names=TRUE) and that didn't work either.
Any ideas on how to change this?
If you are trying to assign your current working directory to the variable "directory", why not take the simple route and add:
directory <- getwd()
This assigns the path of the current working directory to the variable "directory".
I've already worked with directories as variables; I usually declare them like this:
directory <- "C://Users//User//specdata//"
to take up your example.
Then, if I want to read a specific file in this directory, I just go like this:
read.table(paste(directory, "myfile.txt", sep=""), ...)
It's the same process to write to a file:
write.table(res, file=paste(directory, "myfile.txt", sep=""), ...)
Is this helping?
EDIT: you can then use read.csv and it will work fine.
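As a small side note, base R's file.path builds the path with the separator for you, so the directory variable does not need trailing slashes (a sketch; "myfile.csv" is just a placeholder name):
directory <- "C:/Users/User/specdata"                 # no trailing separator needed
dat <- read.csv(file.path(directory, "myfile.csv"))   # "myfile.csv" is a placeholder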
I think you are confused by the assignment operation in R. The following line
directory <- "C:/Users/User/specdata"
assigns a string to a new object that just happened to be called directory. It has the same effect on your working environment as
elephant <- "C:/Users/User/specdata"
To change where R reads its files, use the function setwd (short for set working directory):
setwd("C:/Users/User/specdata")
You can also specify full path names to functions that read in data (like read.table). For your specific problem,
# creates a list of all files ending with `csv` (i.e. all csv files)
all.specdata.files <- list.files(path = "C:/Users/User/specdata", pattern = "csv$")
# creates a list resulting from the application of `read.csv` to
# each of these files (which may be slow!!)
all.specdata.list <- lapply(all.specdata.files, read.csv)
Then we use dplyr::rbind_all (superseded by bind_rows in current versions of dplyr) to row-bind them into one data frame.
library(dplyr)
all.specdata <- rbind_all(all.specdata.list)   # bind_rows(all.specdata.list) in newer dplyr
Then use colMeans to determine the grand means. Not sure how to do this without seeing the data.
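For completeness, a minimal sketch of that last step, assuming every column of all.specdata is a numeric measurement column:
# grand mean of each column across all of the combined files
colMeans(all.specdata, na.rm = TRUE)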
Assuming that the columns in each of the 300+ csv files are the same, that is, column j contains the same type of data in all files, then the following example should be of use:
# let's use a temp directory for storing the files
tmpdr <- tempdir()
# let's create a large data frame of values and then split it into many
# different files
original_data <- data.frame(matrix(rnorm(10000L), nrow = 1000L))
# write each row to its own file, zero-padding the number so the files sort correctly
for(i in seq(1, nrow(original_data), by = 1)) {
  write.csv(original_data[i, ],
            file = paste0(tmpdr, "/", formatC(i, format = "d", width = 4, flag = "0"), ".csv"),
            row.names = FALSE)
}
# get a character vector with the full path of each of the files
files <- list.files(path = tmpdr, pattern = "\\.csv$", full.names = TRUE)
# read each file into a list
read_data <- lapply(files, read.csv)
# bind the read_data into one data.frame
read_data <- do.call(rbind, read_data)
# check that our two data.frames are the same
all.equal(read_data, original_data)
# [1] TRUE

Applying a task to several files in R

I would like to apply a loop in R to process several files, one file at a time. The files all have exactly the same pattern; only the number in the string "...split1..." increases. So I have files like "...split1...", "...split2..." ... "...split777...". I want output files following the same logic, in the example: "newsplit1.txt", "newsplit2.txt" ... "newsplit777.txt".
all <- read.table("nsamplescluster.split1.adjusted", header=TRUE, sep=";")
all <- all[, -grep("GType", colnames(all))]
write.table(all, "newsplit1.txt", sep=";")
Cheers!
Use a loop and paste the file names together:
for(i in 1:777){
  infile <- paste0("nsamplescluster.split", i, ".adjusted")
  outfile <- paste0("newsplit", i, ".txt")
  all <- read.table(infile, header=TRUE, sep=";")
  all <- all[, -grep("GType", colnames(all))]
  write.table(all, outfile, sep=";")
}
If the files are all in the same directory, you can also use
filenames <- list.files(your.directory, pattern="nsamplescluster")
This will create a vector with all file names in your.directory that match the indicated pattern. You can then use it to loop over your files, for instance:
for(i in filenames){
  # do stuff with file i
}
This may come in handy if the number of files changes.
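Putting the two suggestions together, here is a sketch that derives each output name from the input name, so the split number is preserved even if the files are not numbered 1 to 777 without gaps (it assumes the names follow exactly the pattern in the question):
filenames <- list.files(pattern = "^nsamplescluster\\.split[0-9]+\\.adjusted$")
for(infile in filenames){
  # turn "nsamplescluster.splitN.adjusted" into "newsplitN.txt"
  outfile <- sub("^nsamplescluster\\.(split[0-9]+)\\.adjusted$", "new\\1.txt", infile)
  all <- read.table(infile, header=TRUE, sep=";")
  all <- all[, -grep("GType", colnames(all))]
  write.table(all, outfile, sep=";")
}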
