I'm using the following code:
lst <- split(data, cut(data$Pos, breaks = maxima, include.lowest = TRUE))
dir <- getwd()
lapply(seq_len(length(lst)),
function (i) write.csv(lst[[i]], file = paste0(dir,"/",names(lst[i]), ".csv"), row.names = FALSE)) ## split data into .csv files based on max.csvima values
that another user provided me with, to split and save a dataset into separate .csv files. However, when the files are saved they are saved in a naming format as so: [0,9], (9,19], etc., which the analysis program I'm using cannot read in. How would I change the filenames that they are being saved as? I assumed that it was the
names(lst[i])
portion, however when I changed that (e.g. to names(vec[i]) with vec being a vector of numbers with the same length as the number of data files), no data files were created.
Any help is appreciated!
#desc provides the answer in the comment you only need to change your code to
lst <- split(data, cut(data$Pos, breaks = maxima, include.lowest = TRUE))
dir <- getwd()
lapply(seq_len(length(lst)),
function (i) write.csv(lst[[i]], file = paste0(dir,"/your_desired_label_here",names(lst[i]), ".csv"), row.names = FALSE)) ## split data into .csv files based on max.csvima values
Related
I have multiple .csv files in a folder that I would like to combine into multiple dataframes. I already can do this for one set of files using the following code:
DF_week1 <- list.files(path = 'x:/full/file/path',
pattern = "^Dai15E_ABC_10mbin_20211201_fullwatercolumn_evening_BNR*.*_week1.csv", full.names = TRUE) %>%
map_dfr(read_csv)
I have standard file names like this for about 5 weeks worth of data. What I want to do is make a loop. For example, the first batch of file names ends in "week1" with the next set ending in "week2". I'm trying to write something that would recognize that the number after "week" has changed and to now combine those .csv files into a new data frame (DF_week2, DF_week3, etc.). I'm unfortunately a bit stuck here and rather new to looping.
You can store the dataframes in a list after reading them, and set up a simple for loop:
n = 10 # fill with number of weeks
file_list = list()
for (i in seq(n)) {
file_list[[i]] <- list.files(path = 'x:/full/file/path',
pattern = str_c("^Dai15E_ABC_10mbin_20211201_fullwatercolumn_evening_BNR*.*_week", i, ".csv"), full.names = TRUE) %>%
map_dfr(read_csv)
}
I have a folder with many csv files. Each file has several columns as well as lat and long columns. Another folder have many rasters in tif format. The .csv files are named based on Julian date (e.g. 251.csv), and so the rasters (e.g. 251.tif). I would like to be able to add the raster value to the csv with matching name and save to a new csv in R. What I want to achieve is this:
raster<-raster("c:/temp/TIFF/2001/273.tif")
points<-read.csv("c:/temp/csv/2001/273.csv")
coordinates(points)=~long+lat
rasValue=extract(raster,points)
combinePointValue <- cbind(points,rasValue)
head(combinePointValue)
library(spdplyr)
combinePointValue <- combinePointValue %>%
rename(chloro = 10)
write.table(combinePointValue,file="c:/temp/2001/chloro/273_chloro.csv",append=FALSE,
sep=",",row.names=FALSE, col.names=TRUE)
Considering the many csv and many tif files, I would prefer avoiding having to type this over and over. Anyone able to help?
Many thanks in advance!
Ilaria
It is better to provide a minimal reproducible example since your code can not run without your specific data. However, if I understand well, you can try something like this. Since csv and tif files have the same name, you can sort them and loop through the file index. You can use the original path of csv files to save a new file just by pasting the suffix "_chloro:
library(spdplyr)
csv <- sort(list.files("c:/temp/csv/2001/",full.names = T))
tif <- sort(list.files("c:/temp/TIFF/2001/",full.names = T))
lapply(1:length(csv),function(i){
raster<-raster(tif[i])
points<-read.csv(csv[i])
coordinates(points)=~long+lat
rasValue=extract(raster,points)
combinePointValue <- cbind(points,rasValue)
head(combinePointValue)
combinePointValue <- combinePointValue %>%
rename(chloro = 10)
write.table(combinePointValue,file=paste0(tools::file_path_sans_ext(csv[i]),"_chloro.csv"),append=FALSE,
sep=",",row.names=FALSE, col.names=TRUE)
})
SInce the R spatial "ecosystem" is undergoing dramatic changes over the past few years, and package like sp and raster will be deprecated, you might consider a solution based on the terra package.
It would go something like:
# Not tested!
library(terra)
csv_path = "c:/temp/csv/2001/"
tif_path = "c:/temp/TIFF/2001/"
tif_list = list.files(file.path(tif_path, pattern = "*.tif", full.names = FALSE)
result_list = lapply(1:length(tif_list), function(i) {
tif_file = file.path(tif_path, tif_list[i])
# Do not assume that the list of files are exactly equivalent.
# Instead create CSV file name from tif file
csv_name = gsub("tif", "csv", tif_file)
csv_file = file.path(csv_path, csv_name)
r = rast(tif_file)
csv_df = read.csv(csv_file)
# Assume csv long/lat are the same CRS as the tif files
pts = vect(csv_df, geom=c("long", "lat"), crs=st_crs(tif))
result = extract(r, pts, xy = TRUE)
new_csv = paste0(tools::file_path_sans_ext(csv_file),"_chloro.csv")
write.csv(result, file.path(csv_path, new_csv))
return(result)
})
I am pretty new to R and programming so I do apologies if this question has been asked elsewhere.
I'm trying to load multiple .csv files, edit them and save again. But cannot find out how to manage more than one .csv file and also name new files based on a list of character strings.
So I have .csv file and can do:
species_name<-'ace_neg'
{species<-read.csv('species_data/ace_neg.csv')
species_1_2<-species[,1:2]
species_1_2$species<-species_name
species_3_2_1<-species_1_2[,c(3,1,2)]
write.csv(species_3_2_1, file='ace_neg.csv',row.names=FALSE)}
But I would like to run this code for all .csv files in the folder and add text to a new column based on .csv file name.
So I can load all .csv files and make a list of character strings for use as a new column text and as new file names.
NDOP_files <- list.files(path="species_data", pattern="*.csv$", full.names=TRUE, recursive=FALSE)
short_names<- substr(NDOP_files, 14,20)
Then I tried:
lapply(NDOP_files, function(x){
species<-read.csv(x)
species_1_2<-species[,1:2]
species_1_2$species<-'name' #don't know how to insert first character string of short_names instead of 'name', than second character string from short_names for second csv. file etc.
Then continue in the code to change an order of columns
species_3_2_1<-species_1_2[,c(3,1,2)]
And then write all new modified csv. files and name them again by the list of short_names.
I'm sorry if the text is somewhat confusing.
Any help or suggestions would be great.
You are actually quite close and using lapply() is really good idea.
As you state, the issue is, it only takes one list as an argument,
but you want to work with two. mapply() is a function in base R that you can feed multiple lists into and cycle through synchronically. lapply() and mapply()are both designed to create/ manipulate objects inRbut you want to write the files and are not interested in the out withinR. Thepurrrpackage has thewalk*()\ functions which are useful,
when you want to cycle through lists and are only interested in creating
side effects (in your case saving files).
purrr::walk2() takes two lists, so you can provide the data and the
file names at the same time.
library(purrr)
First I create some example data (I’m basically already using the same concept here as I will below):
test_data <- map(1:5, ~ data.frame(
a = sample(1:5, 3),
b = sample(1:5, 3),
c = sample(1:5, 3)
))
walk2(test_data,
paste0("species_data/", 1:5, "test.csv"),
~ write.csv(.x, .y))
Instead of getting the file paths and then stripping away the path
to get the file names, I just call list.files(), once with full.names = TRUE and once with full.names = FALSE.
NDOP_filepaths <-
list.files(
path = "species_data",
pattern = "*.csv$",
full.names = TRUE,
recursive = FALSE
)
NDOP_filenames <-
list.files(
path = "species_data",
pattern = "*.csv$",
full.names = FALSE,
recursive = FALSE
)
Now I feed the two lists into purrr::walk2(). Using the ~ before
the curly brackets I can define the anonymous function a bit more elegant
and then use .x, and .y to refer to the entries of the first and the
second list.
walk2(NDOP_filepaths,
NDOP_filenames,
~ {
species <- read.csv(.x)
species <- species[, 1:2]
species$species <- gsub(".csv", "", .y)
write.csv(species, .x)
})
Learn more about purrr at purrr.tidyverse.org.
Alternatively, you could just extract the file name in the loop and stick to lapply() or use purrr::map()/purrr::walk(), like this:
lapply(NDOP_filepaths,
function(x) {
species <- read.csv(x)
species <- species[, 1:2]
species$species <- gsub("species///|.csv", "", x)
write.csv(species, gsub("species///", "", x))
})
NDOP_files <- list.files(path="species_data", pattern="*.csv$",
full.names=TRUE, recursive=FALSE)
# Get name of each file (without the extension)
# basename() removes all of the path up to and including the last path seperator
# file_path_sands_ext() removes the .csv extension
csvFileNames <- tools::file_path_sans_ext(basename(NDOP_files))
Then, I would write a function that takes in 1 csv file and does some manipulation to the file and outputs out a data frame. Since you have a list of csv files from using list.files, you can use the map function in the purrr package to apply your function to each csv file.
doSomething <- function(NDOP_file){
# your code here to manipulate NDOP_file to your liking
return(NDOP_file)
NDOP_files <- map(NDOP_files, ~doSomething(.x))
Lastly, you can manipulate the file names when you write the new csv files using csvFileNames and a custom function you write to change the file name to your liking. Essentially, use the same architecture of defining your custom function and using map to apply to each of your files.
I am trying to loop through all the subfolders of my wd, list their names, open 'data.csv' in each of them and extract the second and last value from that csv file.
The df would look like this :
Name_folder_1 2nd value Last value
Name_folder_2 2nd value Last value
Name_folder_3 2nd value Last value
For now, I managed to list the subfolders and each of the file (thanks to this thread: read multiple text files from multiple folders) but I struggle to implement (what I'm guessing should be) a nested loop to read and extract data from the csv files.
parent.folder <- "C:/Users/Desktop/test"
setwd(parent.folder)
sub.folders1 <- list.dirs(parent.folder, recursive = FALSE)
r.scripts <- file.path(sub.folders1)
files.v <- list()
for (j in seq_along(r.scripts)) {
files.v[j] <- dir(r.scripts[j],"data$")
}
Any hints would be greatly appreciated !
EDIT :
I'm trying the solution detailed below but there must be something I'm missing as it runs smoothly but does not produce anything. It might be something very silly, I'm new to R and the learning curve is making me dizzy :p
lapply(files, function(f) {
dat <- fread(f) # faster
dat2 <- c(basename(dirname(f)), head(dat$time, 1), tail(dat$time, 1))
write.csv(dat2, file = "test.csv")
})
Not easy to reproduce but here is my suggestion:
library(data.table)
files <- list.files("PARENTDIR", full.names = T, recursive = T, pattern = ".*.csv")
lapply(files, function(f) {
dat <- fread(f) # faster
# Do whatever, get the subfolder name for example
basename(dirname(f))
})
You can simply look recursivly for all CSV files in your parent directory and still get their corresponding parent folder.
I need to create a function called PollutantMean with the following arguments: directory, pollutant, and id=1:332)
I have most of the code written but I can't figure out how to assign my directory as a variable. My current working directory is C:/Users/User/Documents. I tried writing the variable as:
directory <- "C:/Users/User/specdata" and that didn't work.
Next I tried the following:
directory <- list.files("specdata", full.names=TRUE) and that didn't work either.
Any ideas on how to change this?
If you are trying to assign the values in your current working directory to the variable "directory" Why not take the simple method and add:
directory <- getwd()
This should take the contents of the working directory and assign the values to the variable "directory".
I've already worker with directory as variables, I usually declare them like that
directory<-"C://Users//User//specdata//"
To take back your example.
Then, if I want to read a specific file in this directory, I will just go like :
read.table(paste(directory,"myfile.txt",sep=""),...)
It's the same process to write in a file
write.table(res,file=paste(directory,"myfile.txt",sep=""),...)
Is this helping ?
EDIT : you can then use read.csv and it will work fine
I think you are confused by the assignment operation in R. The following line
directory <- "C:/Users/User/specdata"
assigns a string to a new object that just happened to be called directory. It has the same effect on your working environment as
elephant <- "C:/Users/User/specdata"
To change where R reads its files, use the function setwd (short for set working directory):
setwd("C:/Users/User/specdata")
You can also specify full path names to functions that read in data (like read.table). For your specific problem,
# creates a list of all files ending with `csv` (i.e. all csv files)
all.specdata.files <- list.files(path = "C:/Users/User/specdata", pattern = "csv$")
# creates a list resulting from the application of `read.csv` to
# each of these files (which may be slow!!)
all.specdata.list <- lapply(all.specdata.files, read.csv)
Then we use dplyr::rbind_all to row-bind them into one file.
library(dplyr)
all.specdata <- rbind_all(all.specdata.list)
Then use colMeans to determine the grand means. Not sure how to do this without seeing the data.
Assuming that the columns in each of the 300+ csv files are the same, that is have column j contains the same type of data in all files, then the following example should be of use:
# let's use a temp directory for storing the files
tmpdr <- tempdir()
# Let's creat a large matrix of values and then split it into many different
# files
original_data <- data.frame(matrix(rnorm(10000L), nrow = 1000L))
# write each row to a file
for(i in seq(1, nrow(original_data), by = 1)) {
write.csv(original_data[i, ],
file = paste0(tmpdr, "/", formatC(i, format = "d", width = 4, flag = 0), ".csv"),
row.names = FALSE)
}
# get a character vector with the full path of each of the files
files <- list.files(path = tmpdr, pattern = "\\.csv$", full.names = TRUE)
# read each file into a list
read_data <- lapply(files, read.csv)
# bind the read_data into one data.frame,
read_data <- do.call(rbind, read_data)
# check that our two data.frames are the same.
all.equal(read_data, original_data)
# [1] TRUE