Specifying file names in a loop in R (converting .nc to geotiff)

I have a folder of .nc files on sea surface temperature, and I have a loop which extracts the variable I want ("analysed_sst") from each .nc file and writes it out as a raster.
I want the name of each output raster to be the first section of the original .nc file name (which is the date).
For example, if the original .nc file is called "20220113090000-JPL-L4_GHRSST-SSTfnd-MUR25-GLOB-v02.0-fv04.2.nc", I would like the output raster to be called "20220113_SST.tiff".
I've attached the loop I'm using below.
library(ncdf4)
library(raster)
#input directory
dir.nc <- #file path
files.nc <- list.files(dir.nc, full.names = T, recursive = T)
#output directory
dir.output <- #file path
#loop
for (i in 1:length(files.nc)) {
  r.nc <- raster(files.nc[i])
  writeRaster(r.nc, paste(dir.output, i, '.tiff', sep = ''), format = 'GTiff', overwrite = T)
}

dirname() is vectorized so you should be able to safely use dirname(files.nc) to get the directory for each file.
Side note: It can be safer to use seq_along(files.nc) rather than 1:length(files.nc). When length(files.nc) == 0 you can get some confusing errors because 1:0 produces [1] 1 0 and your loop will try to do some weird stuff.
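For the original question about naming the GeoTIFFs by date, a minimal sketch of the idea (untested; it assumes every .nc file name starts with the date as its first eight characters, as in the example above, and that "_SST" is the suffix you want):
for (i in seq_along(files.nc)) {
  r.nc <- raster(files.nc[i])
  # "20220113090000-JPL-...nc" -> "20220113"
  date.str <- substr(basename(files.nc[i]), 1, 8)
  writeRaster(r.nc, file.path(dir.output, paste0(date.str, "_SST.tiff")),
              format = 'GTiff', overwrite = TRUE)
}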

Related

How to loop through CSV files in directory and output them as RDS files

This is the code I have for now. I keep getting an error that fd is undefined. I tried defining it as
fd=data.frame()
but it doesn't work.
Code:
file <- list.files(pattern = ".csv$")
#file creates a list of csv file names
for (i in seq_along(filenames))
{
  fd[i] <- read.csv(file[i])
  #read each csv file
  output = c("o1.RDS", "o2.RDS", "o3.RDS")
  #save each csv file as RDS every iteration,
  #with the name as specified in the vector output.
  saveRDS(fd[i], file = output[i])
}
You can do something like this, although it's untested because I don't have a folder of .csv files at the moment:
library(tidyverse)
files <- list.files("./", pattern = ".csv")
map(files, ~ read_csv(.x) %>%
      write_rds(path = paste0("YOUR/PATH/HERE/", tools::file_path_sans_ext(basename(.x)), ".rds")))
Have you tried defining fd as a list?
fd <- list()
Also, in the example above you have a mistake: the loop iterates over filenames, but the object is actually called file.
Here is the result, which worked for me:
fd <- list()
file <- list.files(pattern = ".csv$")
#file creates a list of csv file names
for (i in seq_along(file))
{
  fd[[i]] <- read.csv(file[i])
  #read each csv file
  output = c("o1.RDS", "o2.RDS", "o3.RDS")
  #save each csv file as RDS every iteration,
  #with the name as specified in the vector output.
  saveRDS(fd[[i]], file = output[i])
}
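If you want the loop to scale beyond three files, one option (a sketch, not tested here) is to build each RDS name from the corresponding CSV name instead of hard-coding the output vector, e.g. with tools::file_path_sans_ext():
fd <- list()
file <- list.files(pattern = ".csv$")
for (i in seq_along(file)) {
  fd[[i]] <- read.csv(file[i])
  # "data1.csv" becomes "data1.RDS", and so on
  saveRDS(fd[[i]], file = paste0(tools::file_path_sans_ext(file[i]), ".RDS"))
}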

How can I loop over multiple files when the function argument is different each time?

I'm trying to extract sea surface temperature data from a series of .nc files.
So I have one folder containing the 30 downloaded .nc files all written like "1981.nc", "1982.nc" and so on.
But rather than load them in individually, I want to loop over each one and calculate the mean temperature for each file, so I'd have 30 values of temperature at the end.
The problem is that the year in the date arguments has to change for each file. I thought of extracting the year with something like years <- substr(filenames, 1, 4), but it doesn't work.
I was thinking of something along the following lines:
library(ncdf4)
setwd("C:\\Users\\Desktop\\sst")
source("C:\\Users\\Desktop\\NOAA_OISST_ncdf4.R")
out.file <- ""
filenames <- dir(pattern = ".nc")
years <- substr(filenames, 1, 4)
lst <- vector("list", length(filenames))
for (i in 1:length(filenames)) {
  ssts = extractOISSTdaily(filenames[i], "C:\\Users\\Desktop\\lsmask.oisst.v2.nc",
                           lonW=350, lonE=351, latS=52, latN=56,
                           date1='years[i]-11-23', date2='years[i]-12-31')
  mean(ssts)
}
The extractOISSTdaily function to do the extracting is described here: http://lukemiller.org/index.php/2014/11/extracting-noaa-sea-surface-temperatures-with-ncdf4/
The .nc files are here: https://www.esrl.noaa.gov/psd/data/gridded/data.noaa.oisst.v2.highres.html#detail
Does this work?
# Get filenames
filenames <- dir(pattern = ".nc")
# Mean SSTs
m.ssts <- NULL
# Loop through filenames
for (i in filenames) {
  # Get year (assuming form of filename is, e.g., 1981.nc)
  year <- sub(".nc", "", i)
  # Do whatever this function does
  ssts <- extractOISSTdaily(i, "C:\\Users\\Desktop\\lsmask.oisst.v2.nc",
                            lonW=350, lonE=351, latS=52, latN=56,
                            date1=paste(year, "-11-23", sep = ""),
                            date2=paste(year, "-12-31", sep = ""))
  # Profit!
  m.ssts <- c(m.ssts, mean(ssts))
}
The code works by first collecting all filenames in the current directory with the extension .nc and creating an empty object in which to store the mean SSTs. The for loop goes through the filenames in turn, stripping off the file extension to get the year (i.e., 1981.nc becomes 1981) by substituting an empty string in place of .nc. Next, the netCDF data for the specified interval is placed in ssts; the interval is created by pasting together the current year with the desired month and day. Finally, the mean is calculated and appended to the m.ssts object. As the OP notes, the last line should actually read m.ssts <- c(m.ssts, mean(ssts, na.rm = TRUE)) to allow for NA values in the data.
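The same idea can also be written without growing m.ssts inside the loop; a sketch, assuming extractOISSTdaily() returns something mean() can summarise to a single number:
filenames <- dir(pattern = ".nc")
years <- sub(".nc", "", filenames)
m.ssts <- vapply(seq_along(filenames), function(i) {
  ssts <- extractOISSTdaily(filenames[i], "C:\\Users\\Desktop\\lsmask.oisst.v2.nc",
                            lonW = 350, lonE = 351, latS = 52, latN = 56,
                            date1 = paste0(years[i], "-11-23"),
                            date2 = paste0(years[i], "-12-31"))
  mean(ssts, na.rm = TRUE)
}, numeric(1))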

R: Exporting potentially infinite function outputs to a csv file

I have a script that takes raw csv files in a folder, transforms the data using a function called "analyze" (defined as function(filename)), and prints values to the console. When I attempt to write.csv these values, it only gives the last value of the function. If there were a set number of files per folder I would just run each specific csv file through the program, say [1:5], and lapply/set a matrix into write.csv. However, there is a potentially infinite number of files drawn from the directory, so this will not work (I think?). How would I export potentially infinite function outputs to a csv file? I have listed below my final steps after the function definition. It lists all the files in the folder and applies the function "analyze" to all the files in the folder.
filename <- list.files(path = "VCDATA", pattern = ".csv", full.names = TRUE)
for (f in filename) {
  print(f)
  analyze(f)
}
Best,
Evan
It's hard to tell without a reproducible example, but I think you have to assign the output of analyze to a vector or a data frame (instead of printing it to the console).
Something along these lines:
filename <- list.files(path = "VCDATA", pattern = ".csv", full.names = TRUE)
results <- vector() #empty vector
for (f in filename) {
  print(f)
  results[which(filename == f)] <- analyze(f) #assign output to the vector
}
write.csv(results, file = xxx) #write csv file when loop is finished
I hope this answers your question, but it really depends on the format of the output of the analyze function.
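If the number of files really is unbounded, another pattern (a sketch, assuming analyze() returns a single value per file; "results.csv" is just a placeholder name) is to append each result to the output file as it is computed, so nothing needs to be held in memory:
filename <- list.files(path = "VCDATA", pattern = ".csv", full.names = TRUE)
out.csv <- "results.csv"
for (f in filename) {
  res <- data.frame(file = f, value = analyze(f))
  # write the header only on the first pass, then append
  write.table(res, file = out.csv, sep = ",", row.names = FALSE,
              append = file.exists(out.csv),
              col.names = !file.exists(out.csv))
}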

Applying an R script to multiple files

I have an R script that reads a certain type of file (nexus files of phylogenetic trees), whose name ends in *.trees.txt. It then applies a number of functions from an R package called bGMYC, available here and creates 3 pdf files. I would like to know what I should do to make the script loop through the files for each of 14 species.
The input files are in a separate folder for each species, but I can put them all in one folder if that facilitates the task. Ideally, I would like to output the pdf files to a folder for each species, different from the one containing the input file.
Here's the script
# bGMYC and ape (which provides read.nexus) need to be loaded first
library(ape)
library(bGMYC)
# Call Tree file
trees <- read.nexus("L_boscai_1411_test2.trees.txt")
# To use with different species, substitute "L_boscai_1411_test2.trees.txt" by the path to each species tree
#Store the number of tips of the tree
ntips <- length(trees$tip.label[[1]])
#Apply bgmyc.single
results.single <- bgmyc.singlephy(trees[[1]], mcmc=150000, burnin=40000, thinning=100, t1=2, t2=ntips, start=c(1,1,ntips/2))
#Create the 1st pdf
pdf('results_single_boscai.pdf')
plot(results.single)
dev.off()
#Sample 50 trees
n <- sample(1:length(trees), 50)
trees.sample <- trees[n]
#Apply bgmyc.multiphylo
results.multi <- bgmyc.multiphylo(trees.sample, mcmc=150000, burnin=40000, thinning=100, t1=2, t2=ntips, start=c(1,1,ntips/2))
#Create 2nd pdf
pdf('results_boscai.pdf') # Substitute 'results_boscai.pdf' by "*speciesname.pdf"
plot(results.multi)
dev.off()
#Apply bgmyc.spec and spec.probmat
results.spec <- bgmyc.spec(results.multi)
results.probmat <- spec.probmat(results.multi)
#Create 3rd pdf
pdf('trees_boscai.pdf') # Substitute 'trees_boscai.pdf' by "trees_speciesname.pdf"
for (i in 1:50) plot(results.probmat, trees.sample[[i]])
dev.off()
I've read several posts with a similar question, but they almost always involve .csv files, refer to multiple files in a single folder, have a simpler script or do not need to output files to separate folders, so I couldn't find a solution to my specific problem.
Should I use a for loop, or could I create a function out of this script and use lapply or another sort of apply? Could you provide me with sample code for your proposed solution or point me to a tutorial or another reference?
Thanks for your help.
It really depends on the way you want to run it.
If you are using Linux / command-line job submission, it might be best to look at
How can I read command line parameters from an R script?
If you are using a GUI (RStudio...) you might not be familiar with this, so I would solve the problem as a function or a loop.
First, get all your file names.
files = list.files(path = "your/folder")
# Now you have list of your file name as files. Just call each name one at a time
# and use for loop or apply (anything of your choice)
And since you need to name the pdf files, you can use the file name or an index (e.g. the loop counter) and append it to the desired file name (e.g. paste0("single_boscai_", i, ".pdf")).
In your case,
files = list.files(path = "your/folder")
# Use pattern = "" if you want to do string matching, and extract
# only matching files from the source folder.
genPDF = function(input) {
  # Read the file
  trees <- read.nexus(input)
  # Store the index (numeric)
  index = which(files == input)
  #Store the number of tips of the tree
  ntips <- length(trees$tip.label[[1]])
  #Apply bgmyc.single
  results.single <- bgmyc.singlephy(trees[[1]], mcmc=150000, burnin=40000, thinning=100, t1=2, t2=ntips, start=c(1,1,ntips/2))
  #Create the 1st pdf
  outname = paste('results_single_boscai', index, '.pdf', sep = "")
  pdf(outname)
  plot(results.single)
  dev.off()
  #Sample 50 trees
  n <- sample(1:length(trees), 50)
  trees.sample <- trees[n]
  #Apply bgmyc.multiphylo
  results.multi <- bgmyc.multiphylo(trees.sample, mcmc=150000, burnin=40000, thinning=100, t1=2, t2=ntips, start=c(1,1,ntips/2))
  #Create 2nd pdf
  outname = paste('results_boscai', index, '.pdf', sep = "")
  pdf(outname) # Substitute 'results_boscai.pdf' by "*speciesname.pdf"
  plot(results.multi)
  dev.off()
  #Apply bgmyc.spec and spec.probmat
  results.spec <- bgmyc.spec(results.multi)
  results.probmat <- spec.probmat(results.multi)
  #Create 3rd pdf
  outname = paste('trees_boscai', index, '.pdf', sep = "")
  pdf(outname) # Substitute 'trees_boscai.pdf' by "trees_speciesname.pdf"
  for (i in 1:50) plot(results.probmat, trees.sample[[i]])
  dev.off()
}
for (i in 1:length(files)) {
  genPDF(files[i])
}
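If you also want each species' PDFs in their own folder, as described in the question, one possible variation (untested; "your/output/folder" is a placeholder, and genPDF would need to be changed to accept the species name and output directory) is to derive both from the input file name:
files <- list.files(path = "your/folder", pattern = "\\.trees\\.txt$", full.names = TRUE)
for (f in files) {
  # e.g. "L_boscai_1411_test2.trees.txt" -> "L_boscai_1411_test2"
  species <- sub("\\.trees\\.txt$", "", basename(f))
  outdir <- file.path("your/output/folder", species)
  dir.create(outdir, recursive = TRUE, showWarnings = FALSE)
  # inside genPDF, file names could then be built like:
  # pdf(file.path(outdir, paste0("results_single_", species, ".pdf")))
  genPDF(f)
}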

Assigning Directory as a Variable in R

I need to create a function called PollutantMean with the following arguments: directory, pollutant, and id = 1:332.
I have most of the code written but I can't figure out how to assign my directory as a variable. My current working directory is C:/Users/User/Documents. I tried writing the variable as:
directory <- "C:/Users/User/specdata" and that didn't work.
Next I tried the following:
directory <- list.files("specdata", full.names=TRUE) and that didn't work either.
Any ideas on how to change this?
If you are trying to assign your current working directory to the variable "directory", why not take the simple approach and add:
directory <- getwd()
This assigns the path of the current working directory to the variable "directory".
I've often worked with directories as variables; I usually declare them like this (to use your example):
directory <- "C:/Users/User/specdata/"
Then, if I want to read a specific file in this directory, I just do:
read.table(paste(directory, "myfile.txt", sep = ""), ...)
It's the same process to write to a file:
write.table(res, file = paste(directory, "myfile.txt", sep = ""), ...)
Does this help?
EDIT : you can then use read.csv and it will work fine
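A small variation on the same idea: file.path() inserts the separators for you, so the trailing slashes in the directory string are not needed (a sketch using the same placeholder file name as above):
directory <- "C:/Users/User/specdata"
read.table(file.path(directory, "myfile.txt"))
write.table(res, file = file.path(directory, "myfile.txt"))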
I think you are confused by the assignment operation in R. The following line
directory <- "C:/Users/User/specdata"
assigns a string to a new object that just happened to be called directory. It has the same effect on your working environment as
elephant <- "C:/Users/User/specdata"
To change where R reads its files, use the function setwd (short for set working directory):
setwd("C:/Users/User/specdata")
You can also specify full path names to functions that read in data (like read.table). For your specific problem,
# creates a character vector of all files ending with `csv` (i.e. all csv files)
all.specdata.files <- list.files(path = "C:/Users/User/specdata", pattern = "csv$", full.names = TRUE)
# creates a list resulting from the application of `read.csv` to
# each of these files (which may be slow!!)
all.specdata.list <- lapply(all.specdata.files, read.csv)
Then we use dplyr::bind_rows to row-bind them into one data frame.
library(dplyr)
all.specdata <- bind_rows(all.specdata.list)
Then use colMeans to determine the grand means. Not sure how to do this without seeing the data.
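As a sketch of that last step (assuming the measurement columns are numeric; the selection here just keeps every numeric column rather than naming them):
# na.rm = TRUE skips missing observations when averaging
colMeans(all.specdata[, sapply(all.specdata, is.numeric)], na.rm = TRUE)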
Assuming that the columns in each of the 300+ csv files are the same, that is, column j contains the same type of data in all files, then the following example should be of use:
# let's use a temp directory for storing the files
tmpdr <- tempdir()
# Let's create a large data frame of values and then split it into many
# different files
original_data <- data.frame(matrix(rnorm(10000L), nrow = 1000L))
# write each row to a file
for(i in seq(1, nrow(original_data), by = 1)) {
write.csv(original_data[i, ],
file = paste0(tmpdr, "/", formatC(i, format = "d", width = 4, flag = "0"), ".csv"),
row.names = FALSE)
}
# get a character vector with the full path of each of the files
files <- list.files(path = tmpdr, pattern = "\\.csv$", full.names = TRUE)
# read each file into a list
read_data <- lapply(files, read.csv)
# bind the read_data into one data.frame,
read_data <- do.call(rbind, read_data)
# check that our two data.frames are the same.
all.equal(read_data, original_data)
# [1] TRUE
