Batch read netcdf files and average one variable - r

I'm new to R. I have daily NetCDF files for the year 1979, named like these:
sm19790101.1.nc
sm19790102.1.nc
.
.
.
sm19791231.1.nc
I need to average a variable called "sm" to monthly resolution. I can now do this:
glob2rx("sm197901*.1.nc")
jan<-list.files(pattern=glob2rx("sm197901*.1.nc"),full.names=TRUE)
to collect the names of all the January files in jan, but I don't know how to open each file and read a specific variable (I have the RNetCDF package installed). If I were to do this manually, it would be:
s19790101<-open.nc("sm19790101.1.nc")
sm19790101<-var.get.nc(s19790101,"sm",na.mode=0)
and then average them...
I guess the question is how to read files with a variable (e.g. 01-31) as part of the file name and then loop through the whole month.

If you have a lot of data, you could reduce the daily files to monthly means with the NetCDF Operators (NCO) record averager, ncra: http://nco.sourceforge.net/nco.html#ncra-netCDF-Record-Averager
ncra DAILY/sm197901??.1.nc MONTHLY/sm197901.1.nc

It looks like you can paste together the filename components "sm197901", day, and ".1.nc" to construct the desired filename.
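library(RNetCDF)
# zero-pad the day of month to two digits
days = formatC(1:31, width=2, flag="0")
# read the "sm" variable from each daily file
sm_daily = lapply(days, function(d){
filename = paste("sm197901", d, ".1.nc", sep="")
nc = open.nc(filename)
sm = var.get.nc(nc, "sm", na.mode=0)
close.nc(nc)
sm
})
# average the daily arrays element by element to get the monthly mean
sm_jan = Reduce("+", sm_daily) / length(sm_daily)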
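Hard-coding days 1 to 31 only works for 31-day months. As a rough sketch of one way to extend the idea to the whole year, the file names can be generated from a sequence of dates and grouped by month. The names `dates`, `filenames` and `by_month` are made up for this illustration, and the naming scheme is assumed to be exactly the one shown in the question.

```r
# Every calendar day of 1979
dates <- seq(as.Date("1979-01-01"), as.Date("1979-12-31"), by = "day")

# Build the file name for each day, e.g. "sm19790101.1.nc"
filenames <- paste0("sm", format(dates, "%Y%m%d"), ".1.nc")

# Group the names by month; the result is a list with one character vector per month
by_month <- split(filenames, format(dates, "%Y-%m"))
```

Each element of `by_month` can then be passed to the same open, read and average steps, so February is handled with 28 files and no missing-file errors.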

As an alternative to Dave's ncra answer, you can also do it with cdo:
cdo mergetime sm1979????.1.nc year.nc
# you only need this next step if there is more than one variable in the file;
# if you do run it, pass yearsm.nc instead of year.nc to monmean below
cdo selvar,sm year.nc yearsm.nc
cdo monmean year.nc month.nc
On some systems the number of open files is limited to 256. If that applies to you, you can replace "mergetime" with "cat", and I think it should still work since the files will be listed in time order.

Related

How to shift timestamps in multiple sequential file names (wav files, from bioacoustic monitoring) using R

I deployed passive acoustic recorders to detect animal calls but on one of my devices the date & time settings got messed up and reverted to the default (Jan 1, 2000). And then the device recorded for a month, writing 20-min files that are day & time stamped. e.g., file name is swift1_20000101_020000.wav (deviceName_YYYYMMDD_HHMMSS). And now I reeeeally don't want to have to manually rename 2000+ individual files.
I know the actual start date & time from my field notes and I'm wondering if there's a way to input that actual start date/time and have all the files shift off of that. So swift1_20000101_000000 would become swift1_20220617_093000, swift1_20000101_002000 would become swift1_20220617_095000, and so on in some sort of loop.
Any ideas? I know you can rename files with file.rename(), paste0(), etc but I would need a function that iterated on all files within the directory sequentially and I haven't been able to find something that will do it. Any thoughts or ideas would be much appreciated!
It sounds like you want to add 8203 days, 9 hours, and 30 minutes to the date/time implied in each file name. Try this. Assumes you are updating all the files in your current directory.
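library(lubridate)
# only pick up the recordings, so other files in the directory are left alone
files <- list.files(getwd(), pattern = "^swift1_\\d{8}_\\d{6}\\.wav$")
for(f in seq_along(files)){
x <- files[f]
# characters 8-15 of the name hold YYYYMMDD, characters 17-22 hold HHMMSS
orig <- ymd_hms(paste0(substr(x, 8, 11),"-",substr(x,12,13),"-",substr(x,14,15)," ",
substr(x, 17,18),":",substr(x, 19,20),":", substr(x, 21,22)))
shifted <- orig + days(8203) + hours(9) + minutes(30)
# format() always writes the time part, even when it is exactly midnight
new <- paste0("swift1_", format(shifted, "%Y%m%d_%H%M%S"), ".wav")
file.rename(from = x, to = new)
}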
First we get all the file names from the directory. Then for each file, we extract the date components, combine, convert to a date, add the adjustment, then break the date components back out to form a new file name. Finally, we rename the original file.
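Before renaming 2000+ files it may be worth checking the arithmetic on a few names without touching the disk. Here is a minimal base R sketch of the same shift as a pure function. `shift_name` is a hypothetical helper, not part of any package, and it assumes the deviceName_YYYYMMDD_HHMMSS.wav layout from the question with a six-character device name.

```r
# Hypothetical helper: compute the corrected file name, without renaming anything
shift_name <- function(name,
                       actual_start  = "2022-06-17 09:30:00",
                       default_start = "2000-01-01 00:00:00") {
  # Offset in seconds between the true start and the device's default clock
  offset <- as.numeric(difftime(as.POSIXct(actual_start, tz = "UTC"),
                                as.POSIXct(default_start, tz = "UTC"),
                                units = "secs"))
  # Characters 8-22 hold "YYYYMMDD_HHMMSS"
  stamp <- as.POSIXct(substr(name, 8, 22), format = "%Y%m%d_%H%M%S", tz = "UTC")
  # Keep the "swift1_" prefix (characters 1-7) and rebuild the name
  paste0(substr(name, 1, 7),
         format(stamp + offset, "%Y%m%d_%H%M%S", tz = "UTC"),
         ".wav")
}

old_names <- c("swift1_20000101_000000.wav", "swift1_20000101_002000.wav")
new_names <- shift_name(old_names)
```

If `new_names` looks right for a sample of files, the same vector can be handed to `file.rename(old_names, new_names)`. Working in UTC throughout is a deliberate choice here so that daylight saving changes do not shift any of the timestamps.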

R -find and replace within a script, iteratively [duplicate]

I have a somewhat complex script that is working well. It imports multiple .csvs, combines them, adjusts them, re-sorts them and writes them out as multiple new .csvs. All good.
The problem is that I need to run this script on each of 2100 files. Each .csv file has a name incorporating a seven or eight digit non-numeric string which also has other specific identifiers. There are numerous files with the same string suffix and the script works on all of them at once. An example of the naming system:
gfdlesm2g_45Fall_17100202.csv
ccsm4_45Fall_10270102.csv
bnuesm_45Fall_5130205.csv
mirocesmchem_45Fall_5010007.csv
The script begins with fnames <- dir("~/Desktop/modified_files/", pattern = "*_45Fall_1030001.csv")
And I need to replace the "1030001", in this case, with the next number. Right now I am using Find and Replace in RStudio to replace the seven (or eight) digit number each time the script has completed. I know there has to be a better way than to do this all manually for 2100 files.
All the research I've found is for iterating within a dataframe or whatever, in the columns or rows, and I can't process how to make this work for my needs.
I am thinking that if I made a vector of all the numbers (really they're names), like "01080204", "01090003", "01100001", "18020116", "18020125", "15080303", "16020301", "03170006", "04010101", "04010201", etc
There must be a way to say, in code, "now pick the next name, and run the script". I looked at the lapply, mapply, sapply family and couldn't seem to figure it out.
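If you want every file whose name contains _45Fall_ followed by the numeric ID, you can use list.files with a regular expression. Note that pattern is a regular expression, not a shell glob, so the leading * is not needed:
fnames <- list.files("~/Desktop/modified_files/", pattern = "_45Fall_\\d+\\.csv$")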
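To run the script once per ID rather than once over everything, one possible approach is to pull the unique IDs out of the file names and loop over them. The sketch below uses a temporary directory and empty stand-in files so that it runs anywhere. `process_id` is a hypothetical placeholder for the body of the existing script, and the three file names are made up for the demonstration.

```r
# Hypothetical stand-in for the existing script: here it only counts the matching files
process_id <- function(id, dir) {
  fnames <- list.files(dir, pattern = paste0("_45Fall_", id, "\\.csv$"),
                       full.names = TRUE)
  length(fnames)  # the real script would read, combine, and write the files here
}

# Demo directory with empty files (made-up names following the question's scheme)
demo_dir <- tempfile("modified_files_")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir, c("gfdlesm2g_45Fall_17100202.csv",
                                            "ccsm4_45Fall_17100202.csv",
                                            "bnuesm_45Fall_5130205.csv"))))

# Extract the ID after "_45Fall_" from every file name and keep each one once
all_files <- list.files(demo_dir, pattern = "_45Fall_\\d+\\.csv$")
ids <- unique(sub("^.*_45Fall_(\\d+)\\.csv$", "\\1", all_files))

# Run the per-ID step for every ID; the result is named by ID
counts <- vapply(ids, process_id, integer(1), dir = demo_dir)
```

Wrapping the script body in a function that takes the ID as an argument is what removes the need for find and replace; a plain `for (id in ids)` loop would work just as well as `vapply`.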

How can i List all files with .nc (netcdf) in a folder and extract 1 variable out of 10 variable?

My task is to read multiple similar NetCDF (.nc) files from a folder and stack one variable out of the 10 variables they contain.
I used:
a <- list.files(path=ncpath, pattern = "nc$", full.names = TRUE)
This gets me all the files with the .nc extension.
How do I proceed with the second task?
I want to extract one variable from each of the files listed in a and stack the results.
If you just want the output in a netcdf file, you might consider doing this task from the command line in linux and using cdo?
files=$(ls *.nc) # you might want to be more selective with your wildcard
# this loop selects the variable from each file and puts into file with the var name at the end
for file in $files ; do
cdo selvar,varname $file ${file%???}_varname.nc
done
# now merge into one file:
cdo merge *_varname.nc merged_file.nc
where you need to obviously replace varname with the name of your variable of choice.

Not able to append two netcdf files using nco

I am using netcdf operators to append two NCEP netCDF files together.
These files are of different sizes but they represent the same atmospheric variable, i.e. geopotential height. One is at 1000 hPa and the other is at 925 hPa. They have the same dimensions and the same latitudinal and longitudinal extent. Both represent the same time instant.
This is the command I am using - ncks -A hgt_1000.nc hgt_925.nc
The command runs without any issue, but when I look at the output in hgt_925.nc it looks like the files have not been merged. From the NCO documentation it looks like they have to be the same size to append. Is there another way forward, or should I write my own code to append? These are netCDF4 classic files downloaded using nccopy.
new answer, based on new user information:
Since your input files already have a level dimension, the procedure is: turn level into a record dimension, then concatenate the files along it with ncrcat, then permute back with ncpdq. The NCO manual has examples.
old answer:
What you want to do seems to be what NCO would handle with ncecat (appending is for copying new variables to existing files). Concatenate the files together and rename the resulting record variable as, e.g., level, with
ncecat -u level hgt_1000.nc hgt_925.nc out.nc
You can also use CDO to merge the netCDF files. The command is:
cdo merge hgt_1000.nc hgt_925.nc out.nc

R how to combine data from a for loop in one txt file

I have one loop
for (chr in paste('chr',c(seq(1,22),'X','Y'),sep='')){
.
.
write.table(exp,file="list.txt",col.names =FALSE,row.names=TRUE,sep="\t")
}
right now the loop will give me only one txt with the data from the last loop (chromosome Y)
What i want is one txt file with the data from all the chromosomes/ from all the loops. Not 24 different txt (one per chromosome)
Thank you
Best regards
Anna
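What you are not yet recognizing is that the 'append' argument of write.table is FALSE by default, so each pass through the loop overwrites list.txt. You need to set it to TRUE explicitly if you want to record each 'exp' object (which, by the way, is an unfortunate name for a data object because it is shared with a common math function). If you set append=TRUE, and the 'exp' object is a full record of one chromosome each time write.table is called (ideally with a 'chr' identifier column so you can keep track of where the data came from), then you should succeed. Stay with write.table rather than write.csv here, because write.csv ignores append with a warning. Also delete any list.txt left over from an earlier run first, since append=TRUE adds to whatever is already in the file.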
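A minimal sketch of the append approach, assuming a made-up per-chromosome result: `res`, `out_file` and the `value` column are placeholders for this illustration, and the real loop would compute its own data frame where the comment indicates.

```r
# Write to a fresh temporary file so earlier runs cannot leave rows behind
out_file <- tempfile(fileext = ".txt")

chromosomes <- paste0("chr", c(1:22, "X", "Y"))

for (chr in chromosomes) {
  # Hypothetical per-chromosome result; the first column records which chromosome it came from
  res <- data.frame(chr = chr, value = nchar(chr))
  # append = TRUE adds the rows to the file instead of overwriting it
  write.table(res, file = out_file, append = TRUE,
              col.names = FALSE, row.names = FALSE, sep = "\t", quote = FALSE)
}

# Read the combined file back to confirm that all chromosomes are present
combined <- read.table(out_file, sep = "\t", stringsAsFactors = FALSE)
```

An alternative design is to collect the per-chromosome results in a list inside the loop, bind them with `do.call(rbind, ...)` afterwards, and call write.table once. That avoids the stale-file problem entirely.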

Resources