Read Multiple ncdf files and make average in R - r

By using R ill try to open my NetCDF data that contain 5 dimensional space with 15 variables. (variable for calculation is in matrix 1000X920 )
This problem actually look like the same with the other question before.
I got explanation from here and the others
At first I used RNetCDF package, but after some trial i found unconsistensy when the package read my data. And then finally better after used ncdf package.
there is no problem for opening data in a single file, but after ill try for looping in more than hundred data inside folder for a spesific variable (for example: var no 15) the program was failed.
> days = formatC(001:004, width=3, flag="0")
> ncfiles = lapply (days,
> function(d){ filename = paste("data",d,".nc",sep="")
> open.ncdf(filename) })
also when i try the command like this for a spesific variable
> sapply(ncfiles,function(file,{get.var.ncdf(file,"var15")})
so my question is, any solution to read all netcdf file with special variable then make calculation in one frame. From the solution before i was failed for generating the variable no 15 on whole netcdf data.
thanks for any solution to this problem.
UPDATE:
this is the last what i have done
when i write
library(ncdf)
files=list.files("allnc/",pattern='*nc',full.names=TRUE)
for(i in seq_along(files)) {
nc <- lapply(files[i],open.ncdf)
lw = get.var.ncdf(nc,"var15")
x=dim(lw)
rbind(df,data.frame(lw))->df
}
i can get all netcdf data by > nc
so i how i can get variable data with new name automatically like lw1,lw2...etc
i cant apply
var1 <- lapply(files, FUN = get.var.ncdf, variable = "var15")
then i can do calculation with all data.
the other technique i try used RNetCDF package n doing a looping
# Declare data frame
df=NULL
#Open all files
files= list.files("allnc/",pattern='*.nc',full.names=TRUE)
# Loop over files
for(i in seq_along(files)) {
nc = open.nc(files[i])
# Read the whole nc file and read the length of the varying dimension (here, the 3rd dimension, specifically time)
lw = var.get.nc(nc,'DBZH')
x=dim(lw)
# Vary the time dimension for each file as required
lw = var.get.nc(nc,'var15')
# Add the values from each file to a single data.frame
}
i can take a variable data but i just got one data from my all file nc.
note: sampe of my data name ( data20150102001.nc,data20150102002.nc.....etc)

This solution uses NCO, not R. You may use it to check your R solution:
ncra -v var15 data20150102*.nc out.nc
That is all.
Full documentation in NCO User Guide.

You can use the ensemble statistics capabilities of CDO, but note that on some systems the number of files is limited to 256:
cdo ensmean data20150102*.nc ensmean.nc
you can replace "mean" with the statistic of your choice, max, std, var, min etc...

Related

Retaining original file names when processing multiple raster files using R

I have the following problem: I need to process multiple raster files using the same function in R package landscapemetrics. Basically my raster files are parts of a country map, all of the same shape and size (i.e. quadrants. I figured out a code for 1 file, but I have to do the same with more than 600 rasters. So, doing it manually is very irrational. The steps in my code are the following:
# 1. I load "raster" and "landscapemetrics" packages:
library(raster)
library(landscapemetrics)
# 2. I read in my quadrant:
Quadrant <- raster("C:\\Users\\customer\\Documents\\ ... \\2434-44.tif")
# 3. I process the raster to get landscape metrics tibble:
LS_metrics <- calculate_lsm(landscape = Quadrant)
# 4. Finally, I write it into a csv:
write.csv(LS_metrics, file = "2434-44.csv")
I need to keep the same file name for my csv files as I had for tif (e.g. results from processing quadrant "2434-44.tif", need to be stored in "2434-44.csv", possibly in a folder in wd).
I am new to R. I tried to use list.files() and then apply a for loop, but my code did not work.
I need your advice.
Yours faithfully,
Denis
Your question is really about iteration and character (filename) manipulation; not about landscapemetrics etc. There are many similar questions on this site and resources elsewhere that you can consult. The basic approach can be like this:
# get input filenames
inf <- list.files("/my/path", pattern="\\.tif$", full=TRUE)
# create output filenames
outf <- gsub(".tif", ".csv", basename(inf))
# perhaps put output files in particular folder
dir.create("out", FALSE, FALSE)
outf <- file.path("out", outf)
# iterate
for (i in 1:length(inf)) {
# read input
input <- raster(inf[i])
# do something
output <- data.frame(id=1)
# write output
write.csv(output, outf[i])
}
It's very hard to help without further information. What was the issue with your approach of looping through all files using list.files(). In general, this should work.
Furthermore, most likely you don't want to calculate all available landscape metrics, but rather specify a subselection during the calculate_lsm() function call.

how to use "for loop" to write multiple .csv file names?

Does anyone know the best way to carry out a "for loop" that would read in different subject id's and append them to the name of an exported csv?
As an example, I have multiple output files from an electrocardiogram software program (each file belongs to one individual). The files are named C800_HR.bdf.evt, C801_HR.bdf.evt, C802_HR.bdf.evt etc. Each file gets read into r and then has a script applied to calculate heart rate variability. At the end of the script, I need to add a loop that will extract the subject id (e.g., C800, C801, C802) and write a new file name for each individual so that it becomes C800_RtoR.csv. Essentially, I would like to avoid changing the syntax every time I read in and export a file name.
I am currently using the following syntax to read in multiple files:
>setwd("/Users/kmpc/Downloads")
>myhrvdata <-lapply(Sys.glob("C8**_HR.bdf.evt"), read.delim)
Try this out:
cardio_files <- list.files(pattern = "C8\\d{2}_HR.bdf.evt")
subject_ids <- sub("^(C8\\d{2})_.*", "\\1" cardio_files)
myList <- lapply(cardio_files, read.delim)
## do calculations on the list
for (i in names(myList)) {
write.csv(myList[[i]], paste0(subject_ids[i], "_RtoR.csv"))
}
The only thing is, you have to deal with using a list when doing your calculations. You could combine them to a single data.frame, but it would be best to leave it as a list to write the files at the end.
Consider generalizing your process by creating a function that: 1) reads in file, 2) processes data, 3) outputs to csv. Then have lapply call the defined method iteratively across all Sys.glob items and even return a list of calculated data frames.
proc_heart_rate <- function(f_name) {
# READ IN .evt FILE INTO df
df <- read.delim(f_name)
# CALCULATE HEART RATE VARIABILITY WITH df
...
# OUTPUT df TO CSV
subject_id <- gsub("\\_.*", "", f_name)
write.csv(df, paste0(subject_id, "_RtoR.csv"))
# RETURN df FOR OTHER USES
return(df)
}
# LIST OF DATA FRAMES WITH CALCULATIONS
myhrvdata_list <-lapply(Sys.glob("C8**_HR.bdf.evt"), proc_heart_rate)

How to merge netcdf files separately in R?

I am a user of R and would like some help in the following:
I have two netcdf files (each of dimensions 30x30x365) and one more with 30x30x366. These 3 files contain a year's worth of daily data, where the last dimension refers to the time dimension. I wanted to combine them separately i.e. I wanted the output file to contain 30x30x1096.
Note: I have seen a similar question but the output results in an average (i.e. 30x30x3) which I do not want.
from the comment I see below you seem to want to merge 3 files in the time dimension. As an alternative to R, you could do this quickly from the command line using cdo (climate data operators):
cdo mergetime file1.nc file2.nc file3.nc mergedfile.nc
or using wildcards:
cdo mergetime file?.nc mergedfile.nc
cdo is easy to install under ubuntu:
sudo apt install cdo
Without knowing exactly what dimensions and variables you have, this may be enough to get you started:
library(ncdf4)
output_data <- array(dim = c(30, 30, 1096))
files <- c('file1.nc', 'file2.nc', 'file3.nc')
days <- c(365, 365, 366)
# Open each file and add it to the final output array
for (i in seq_along(files)) {
nc <- nc_open(files[i])
input_arr <- ncvar_get(nc, varid='var_name')
nc_close(nc)
# Calculate the indices where each file's data should go
if (i > 1) {
day_idx <- (1:days[i]) + sum(days[1:(i-1)])
} else {
day_idx <- 1:days[i]
}
output_data[ , , day_idx] <- input_arr
}
# Write out output_data to a NetCDF. How exactly this should be done depends on what
# dimensions and variables you have.
# See here for more:
# https://publicwiki.deltares.nl/display/OET/Creating+a+netCDF+file+with+R

Populating a matrix (or a DF or a DT) with a loop from a folder containing txt files

I wrote my first code in R for treating some spectra [basically .txt files with a Xcol (wavelength) and Ycol (intensity)].
The code works for single files, provided I write the file name in the code. Here the code working for the first file HKU47_PSG_1_LW_0.txt.
setwd("C:/Users/dd16722/R/Raman/Data")
# import Spectra
PSG1_LW<-read.table("HKU47_PSG_1_LW_0.txt")
colnames(PSG1_LW)[colnames(PSG1_LW)=="V2"] <- "PSG1_LW"
PSG2_LW<-read.table("HKU47_PSG_2_LW_all_0.txt")
colnames(PSG2_LW)[colnames(PSG2_LW)=="V2"] <- "PSG2_LW"
#Plot 2 spectra and define the Y range
plot(PSG1_LW$V1, PSG1_LW$PSG1_LW, type="l",xaxs="i", yaxs="i", main="Raman spectra", xlab="Raman shift (cm-1)", ylab="Intensity", ylim=range(PSG1_LW,PSG2_LW))
lines(PSG2_LW$V1, PSG2_LW$PSG2_LW, col=("red"), yaxs="i")
# Temperature-excitation line correction
laser = 532
PSG1_LW_corr <- PSG1_LW$PSG1_LW*((10^7/laser)^3*(1-exp(-6.62607*10^(-34)*29979245800*PSG1_LW$V1/(1.3806488*10^(-23)*293.15)))*PSG1_LW$V1/((10^7/laser)-PSG1_LW$V1)^4)
PSG1_Raw_Corr <-cbind (PSG1_LW,PSG1_LW_corr)
lines(PSG1_LW$V1, PSG1_LW_corr, col="red")
plot(PSG1_LW$V1, PSG1_Raw_Corr$PSG1_LW_corr, type="l",xaxs="i", yaxs="i", xlab="Raman shift (cm-1)", ylab="Intensity")
Now, it's time for another little step forward. In the folder, there are many spectra (in the code above I reported the second one: HKU47_PSG_2_LW_all_0.txt) having again 2 columns, same length of the first file. I suppose I should merge all the files in a matrix (or DF or DT).
Probably I need a loop as I need a code able to check automatically the number of files contained in the folder and ultimately to create an object with several columns (i.e. the double of the number of the files).
So I started like this:
listLW <- list.files(path = ".", pattern = "LW")
numLW <- as.integer(length(listLW))
numLW represents the number of iterations I need to set. The question is: how can I populate a matrix (or DF or DT) in order to have in the first 2 columns the first txt file in my folder, then the second file in the 3rd and 4th columns etc? Considering that I need to perform some other operations as I showed above in the code.
I have been reading about loop in R since yestarday but actually could not find the best and easy solution.
Thanks!
You could do something like
# Load data.table library
require(data.table)
# Import the first file
DT_final <- fread(file = listLW[1])
# Loop over the rest of the files and use cbind to merge them into 1 DT
for(file in setdiff(listLW, listLW[1])) {
DT_temp <- fread(file)
DT_final <- cbind(DT_final, DT_temp)
}

Opening and reading multiple netcdf files with RnetCDF

Using R, I am trying to open all the netcdf files I have in a single folder (e.g 20 files) read a single variable, and create a single data.frame combining the values from all files. I have been using RnetCDF to read netcdf files. For a single file, I read the variable with the following commands:
library('RNetCDF')
nc = open.nc('file.nc')
lw = var.get.nc(nc,'LWdown',start=c(414,315,1),count=c(1,1,240))
where 414 & 315 are the longitude and latitude of the value I would like to extract and 240 is the number of timesteps.
I have found this thread which explains how to open multiple files. Following it, I have managed to open the files using:
filenames= list.files('/MY_FOLDER/',pattern='*.nc',full.names=TRUE)
ldf = lapply(filenames,open.nc)
but now I'm stuck. I tried
var1= lapply(ldf, var.get.nc(ldf,'LWdown',start=c(414,315,1),count=c(1,1,240)))
but it doesn't work.
The added complication is that every nc file has a different number of timestep. So I have 2 questions:
1: How can I open all files, read the variable in each file and combine all values in a single data frame?
2: How can I set the last dimension in count to vary for all files?
Following #mdsummer's comment, I have tried a do loop instead and have managed to do everything I needed:
# Declare data frame
df=NULL
#Open all files
files= list.files('MY_FOLDER/',pattern='*.nc',full.names=TRUE)
# Loop over files
for(i in seq_along(files)) {
nc = open.nc(files[i])
# Read the whole nc file and read the length of the varying dimension (here, the 3rd dimension, specifically time)
lw = var.get.nc(nc,'LWdown')
x=dim(lw)
# Vary the time dimension for each file as required
lw = var.get.nc(nc,'LWdown',start=c(414,315,1),count=c(1,1,x[3]))
# Add the values from each file to a single data.frame
rbind(df,data.frame(lw))->df
}
There may be a more elegant way but it works.
You're passing the additional function parameters wrong. You should use ... for that. Here's a simple example of how to pass na.rm to mean.
x.var <- 1:10
x.var[5] <- NA
x.var <- list(x.var)
x.var[[2]] <- 1:10
lapply(x.var, FUN = mean)
lapply(x.var, FUN = mean, na.rm = TRUE)
edit
For your specific example, this would be something along the lines of
var1 <- lapply(ldf, FUN = var.get.nc, variable = 'LWdown', start = c(414, 315, 1), count = c(1, 1, 240))
though this is untested.
I think this is much easier to do with CDO as you can select the varying timestep easily using the date or time stamp, and pick out the desired nearest grid point. This would be an example bash script:
# I don't know how your time axis is
# you may need to use a date with a time stamp too if your data is not e.g. daily
# see the CDO manual for how to define dates.
date=20090101
lat=10
lon=50
files=`ls MY_FOLDER/*.nc`
for file in $files ; do
# select the nearest grid point and the date slice desired:
# %??? strips the .nc from the file name
cdo seldate,$date -remapnn,lon=$lon/lat=$lat $file ${file%???}_${lat}_${lon}_${date}.nc
done
Rscript here to read in the files
It is possible to merge all the new files with cdo, but you would need to be careful if the time stamp is the same. You could try cdo merge or cdo cat - that way you can read in a single file to R, rather than having to loop and open each file separately.

Resources