How to concatenate multiple netCDF files with varying dimension sizes?

I have 20 netCDF files containing oceanographic CTD data. Each file contains the same dimension and variable names; however, they differ in the size of the vertical coordinate (i.e., inshore CTD profiles have a smaller depth range than offshore profiles). I need to concatenate these separate files into one netCDF file with a record variable "station".
I have tried:
ncecat -u station *.nc outfile.nc
This concatenates the files in the correct way, but it takes the vertical dimension size from the first netCDF file (which is the smallest), so for all the other files I lose the data below the depth of the shallowest CTD profile.
I'm assuming I need to pad the shallower profiles with _FillValues (or similar) down to the maximum depth of the deepest CTD profile.
Is there a way to do this using ncecat?

The closest you can get with ncecat alone is to use group aggregation to store each station profile as its own group in a netCDF4 file. Then you do not need to search for and fill in any missing data:
ncecat --gag *.nc outfile.nc
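If you do want a single record dimension with the shorter profiles padded, ncecat itself will not create the fill values for you, so the padding has to happen outside NCO. A minimal sketch in R with ncdf4, assuming each file holds a 1-D profile variable named temperature (the variable name is an assumption; use the names in your own files):
library(ncdf4)

files <- list.files(pattern = '\\.nc$')

# Read the profile variable from each file
profiles <- lapply(files, function(f) {
  nc <- nc_open(f)
  on.exit(nc_close(nc))
  ncvar_get(nc, 'temperature')   # assumed variable name
})

# Pad every profile with NA up to the length of the deepest one
nmax <- max(sapply(profiles, length))
padded <- sapply(profiles, function(p) c(p, rep(NA, nmax - length(p))))
# 'padded' is now a depth x station matrix; writing it out with
# ncvar_def(..., missval = ...) turns the NA padding into proper _FillValues
This is only a sketch of the fill-value approach you describe, not something ncecat can do on its own.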

Related

Making shapefiles with zero values eliminated

I am trying to make a shapefile where the zero values are taken out in order to reduce the amount of space used. I attempted to do this through the following procedure but only got more zero values displayed:
cdo gtc,0.000008 precip_2022110815_0p50.nc mask1.nc
cdo setctomiss,0 mask1.nc mask2.nc
cdo mul precip_2022110815_0p50.nc mask2.nc precip_2022110815_0p50_adjust.nc
cdo setctomiss,0 precip_2022110815_0p50_adjust.nc precip_2022110815_0p50_final.nc
gdal_contour -a precip -i 0.00001 precip_2022110815_0p50_final.nc precip_2022110815_0p50.shp
I got the netCDF from a GRIB file that was obtained from ftp.ncep.noaa.gov.
Is there any way I could tweak this code, or are there other methods I could use, to get a shapefile where all zero values, or even values below a certain threshold, are filtered out? Is there a way to filter out values below a certain amount from a GRIB2 file?
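One alternative that avoids the repeated CDO round-trips is to do the thresholding and polygonizing in R with the terra package; a rough sketch, assuming the netCDF variable loads as a single precipitation layer (the threshold here is the one from your gtc call):
library(terra)

# Read the precipitation field
r <- rast('precip_2022110815_0p50.nc')
# Mask out zeros and everything below the threshold
r[r < 0.000008] <- NA
# Polygonize the remaining cells and write a shapefile
p <- as.polygons(r)
writeVector(p, 'precip_2022110815_0p50.shp', overwrite = TRUE)
Cells set to NA are skipped when polygonizing, so the zero values never make it into the shapefile.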

How can I split a netCDF file with conditions

I'm working with a netCDF file with a spatially averaged wind variable, which is a function of time only.
I would like to split the file into years with east wind and years with west wind.
I thought I would do it with cdo but I don't know how to write the condition.
Anything with splityear, 'u < 0'?
I do not think this is advisable, as you will split the data into two different netCDF files with incompatible grids. In my view this would defeat the purpose of storing the data in netCDF files.
But if you wish to do it, there is a way within CDO. As you haven't provided files, I can only outline a strategy.
First create a mask file identifying cells with u<0:
cdo -setrtomiss,-10000,0 -selname,u infile.nc mask.nc
Then apply reducegrid to the infile using this mask:
cdo -reducegrid,mask.nc infile.nc outfile.nc
That should do it for the u condition. Just test that and modify it for the other variables.
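As a complement, the year classification itself can also be checked outside CDO; a rough R sketch, assuming a 1-D variable u (a function of time only, as you describe) and a time axis with 'days since ...' units:
library(ncdf4)

nc <- nc_open('infile.nc')
u <- ncvar_get(nc, 'u')                        # spatially averaged wind
time_vals <- ncvar_get(nc, 'time')
tu <- ncatt_get(nc, 'time', 'units')$value     # e.g. 'days since 1900-01-01'
nc_close(nc)

# Convert the time axis to calendar years (assumes 'days since ...' units)
years <- format(as.Date(time_vals, origin = sub('days since ', '', tu)), '%Y')

# Classify years by the sign of the annual-mean zonal wind
yearly_mean <- tapply(u, years, mean)
east_years <- names(yearly_mean[yearly_mean < 0])    # u < 0: wind from the east
west_years <- names(yearly_mean[yearly_mean >= 0])
The two year lists could then be passed to cdo selyear to write the east-wind and west-wind files.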

How do I use a for loop to open netCDF files and average a matrix variable that has different values across the files? (using R)

I'm currently trying to compute an averaged matrix for a specific air quality variable (ColumnAmountNO2TropCloudScreened) stored in different .nc4 files. The only way I was able to do it was by listing all the files, opening them using lapply, creating a separate NO2 variable for every .nc4 file and then applying abind to all of the variables. Even though it worked, it took me a lot of time to type in different names for the NO2 variables (NO2_1, NO2_2, NO2_3, etc.) and the index of each file in the original list ([[1]], [[2]], [[3]], etc.).
I am trying to write code that's smarter and easier than just typing in a bunch of numbers. I have all the original .nc4 files listed, and am trying to loop over the files to open them and extract the 'ColumnAmountNO2TropCloudScreened' matrix from each, so that I can then average them. However, I am having no luck. Does someone know what is wrong with this code or my thinking behind it? Thanks.
I'm trying the following code:
# Load libraries
library(ncdf4)
library(abind)
library(plot.matrix)
# Set working directory
setwd("~/researchdatasets/2020")
# Declare data frame
df <- NULL
# List all files in one folder
files1 <- list.files(pattern = '\\.nc4$', full.names = FALSE)
# Loop to open files, get NO2 variables
for (i in seq(along = files1)) {
  nc_data <- nc_open(files1[i])
  NO2_var <- ncvar_get(nc_data, 'ColumnAmountNO2TropCloudScreened')
  nc_close(nc_data)
}
# Average variables
list_NO2 <- apply(abind::abind(NO2_var, along = 3), 1:2, mean, na.rm = TRUE)
NCO's ncra averages variables across all input files with, e.g.,
ncra in*.nc out.nc
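If you want to stay in R, the problem with the loop above is that NO2_var is overwritten on every pass, so only the last file survives. Collecting the matrices in a list fixes it; a sketch, assuming all files share the same grid:
library(ncdf4)
library(abind)

files1 <- list.files(pattern = '\\.nc4$')

# Collect one matrix per file instead of overwriting a single variable
NO2_list <- lapply(files1, function(f) {
  nc_data <- nc_open(f)
  on.exit(nc_close(nc_data))
  ncvar_get(nc_data, 'ColumnAmountNO2TropCloudScreened')
})

# Stack along a third dimension and average cell-wise
NO2_mean <- apply(abind(NO2_list, along = 3), 1:2, mean, na.rm = TRUE)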

Reading multiple uneven .csv files into one dataframe with R

I need the following help; if possible, please let me know your comments.
My objective:
I have multiple .csv files in one location.
All .csv files have different numbers of rows (m) and columns (n), i.e. m != n.
All .csv files cover almost the same dates (calendar day and timestamps, e.g. 04/01/2016 7:01), but some files have some timestamps missing.
All .csv files have the following columns in common: Open, High, Low, Close, Date.
My objective is to import only the "Close" column from all .csv files, although each file has a different number of rows because some timestamps are missing in some files.
If a timestamp is missing but the previous one is present, repeat the previous value.
If a timestamp is missing and the previous one is also missing, put NA there. This only applies to the first few data points.
Here is my plan:
Reading/writing files: we'll need logic to read the files in a certain fashion and then write separate files for different sets of instruments.
Inconsistent time series: you'll notice that the time series is not consistent and continuous for some securities, so you need to generate your own datetime stamps and then fill in data against each datestamp (wherever available).
Missing data points: there will be certain timestamps for which you don't have data; make your time series continuous by filling in the missing points with data from the previous timestamp.
Maybe try
library(data.table)

read_in <- function(csv) {
  f <- read.csv(csv)
  f <- f[!is.na(f$time_stamp), ]   # drop rows whose timestamp is missing
  data.frame(close = f$close)      # rbindlist needs data.frame/list items
}

csv_list <- list.files(pattern = '\\.csv$')
l <- lapply(csv_list, read_in)
df <- rbindlist(l)
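The sketch above drops rows with missing timestamps rather than filling them. To meet the carry-forward requirement, one option is to merge each file against the complete timestamp sequence and fill forward with zoo::na.locf; a sketch, where all_times stands for the full sequence of expected timestamps (something you would build from your own data):
library(zoo)

fill_close <- function(f, all_times) {
  # Align the file against every expected timestamp
  merged <- merge(data.frame(time_stamp = all_times), f,
                  by = 'time_stamp', all.x = TRUE)
  # Carry the previous close forward; leading gaps stay NA
  merged$close <- na.locf(merged$close, na.rm = FALSE)
  merged
}
Applying fill_close to each file before combining gives every file the same number of rows, which matches the missing-timestamp rules in the question.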

How to get RPKM values from BED or WIG files? And what's the difference between these two types of files?

I want to download raw fastq files from RNA-seq to get gene expression values, but GEO only provides .bed.gz and .wig.gz formats. What can I do to get the RPKM values? Thank you very much!
In order to calculate RPKM, you need (mapped) raw reads, as contained in BAM/SAM or even CRAM files. Wiggle, BED and their derivatives such as bigWig are compressed versions of those containing only the coverage (mainly used for plotting); that is, they have lost the read information needed for counting and therefore for calculating RPKM (or FPKM/TPM, for that matter).
The standard approach is to start from a BAM file, extract the read counts for regions of interest and calculate RPKM etc. There are many pipelines out there, such as this one.
If BAM files are not available, GEO usually has at least the raw fastq files (or SRA files that can be converted to fastq) as a basis for mapping to obtain a BAM file. Also have a look at ArrayExpress; they could have the raw files for that project, since it mirrors GEO.
As a word of warning: if you intend to do differential expression analysis, you need to start from the raw counts, not the RPKM values.
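For reference, once you do have raw counts per region, the RPKM computation itself is a simple scaling; a minimal sketch in R (using the summed counts as the library size, a common but not universal choice):
# counts:     raw read counts per gene, e.g. from featureCounts on a BAM
# lengths_bp: gene/region lengths in base pairs
rpkm <- function(counts, lengths_bp) {
  counts / (lengths_bp / 1e3) / (sum(counts) / 1e6)
}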
