Memory error using ncks to set time as record dimension - netcdf

I am trying to use ncks to set time as the record dimension in a large (14 GB) file and am getting the following error:
$ ncks -O --mk_rec_dmn time vorticity_1979_1.40625deg.nc test.nc
nco_def_var_chunking(): ERROR Total requested chunk size = 14926479360 exceeds netCDF
maximium-supported chunk size = 4294967295
nco_err_exit(): ERROR Short NCO-generated message (usually name of function that triggered
error): nco_def_var_chunking()
nco_err_exit(): ERROR Error code is -127. Translation into English with nc_strerror(-127) is
"NetCDF: Bad chunk sizes."
nco_err_exit(): ERROR NCO will now exit with system call exit(EXIT_FAILURE)

I was able to solve this using the following command:
ncks -6 -O --mk_rec_dmn time vorticity_1979_1.40625deg.nc test.nc

I'm not sure why it automatically chose too large a chunk size. Changing the format to netCDF3 64-bit offset (the -6 switch) works because netCDF3 files are not chunked. You could also, I think, keep the netCDF4 format by explicitly setting the chunk size to that of, e.g., a single timestep with:
ncks -O --cnk_dmn time,1 --mk_rec_dmn time vorticity_1979_1.40625deg.nc test.nc
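To confirm the chunk sizes that were actually written, I believe recent versions of ncks can print the hidden chunking attributes with the --hdn switch (treat this as a sketch; older NCO builds may lack it):
ncks --hdn -m test.nc | grep -i chunk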

Related

Failed to concatenate global layer netCDF data using NCO

I am using monthly global potential evapotranspiration data from TerraClimate from 1958-2020 (available as one nc file per year) and plan to concatenate all of them into a single nc file.
The data has a variable pet with three dimensions: pet(time,lat,lon).
I managed to combine all of the data using cdo mergetime TerraClimate_*.nc, which generated an output file of around 100 GB.
For analysis purposes on a Windows machine, I need a single netCDF file with dimension order lat,lon,time. What I have done is as follows:
Reorder the dimensions from time,lat,lon to lat,lon,time using the ncpdq command
for fl in *.nc; do ncpdq -a lat,lon,time $fl ../pet2/$fl; done
Loop over all files in the folder to make time the record dimension/variable (used for concatenating files) with the ncks command
for fl in *.nc; do ncks -O --mk_rec_dmn time $fl $fl; done
Concatenate all nc files in the folder into one nc file using the ncrcat command
ncrcat -h TerraClimate_*.nc -O TerraClimate_pet_1958_2020.nc
It worked, but the result is not what I expected: it generated a 458 KB file, and when I check the result using Panoply all values are -3276.7. See the picture below.
I have checked the files from steps 1 and 2, and everything is correct.
I also tried concatenating only two files, the 1958 and 1959 data (each file 103 MB), but the result was still not what I expected.
ncrcat -h TerraClimate_pet_1958.nc TerraClimate_pet_1959.nc -O ../TerraClimate_pet_1958_1959.nc
Did I miss something or write the wrong command? Any suggestions on how to solve the problem?
UPDATE 1 (22 Oct 2021):
Here's the metadata of the original data downloaded from the link above.
UPDATE 2 (23 Oct 2021):
Following the suggestion from Charlie, I unpacked all the data from point 2 above using the command below.
for fl in *.nc4; do ncpdq --unpack $fl ../unpack/$fl; done
Here's the example metadata from the unpack process.
And the data visualised using Panoply.
Then I tested concatenating again using two files from the unpack process (1958 and 1959):
ncrcat -h TerraClimate_pet_1958.nc TerraClimate_pet_1959.nc -O ../TerraClimate_pet_1958_1959.nc
Unfortunately the result remained the same: I got a 1 MB file. Below is the metadata.
And the ncrcat result visualised using Panoply.
Your commands appear to be correct. However, I suspect that the data in the input files is packed. As explained in the ncrcat documentation here, the input data should be unpacked (e.g., with ncpdq --unpack) prior to concatenating all the input files, unless they all share the same values of scale_factor and add_offset. If that does not solve the problem, then (1) there is likely an issue with _FillValue, and (2) please post the pet metadata from a sample input file.
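A quick way to check whether the packing attributes differ across inputs (the glob is a stand-in for your own file names):
for fl in TerraClimate_*.nc; do ncks -m -v pet "$fl" | grep -E 'scale_factor|add_offset'; done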

Select data along non-conventional dimension with CDO or NCO

I have a large number of NetCDF files from which I would like to extract a small number of variables for one location, and merge them into a new NetCDF file. The dimensions of the files are:
dimensions:
time = 18 ;
level = 65 ;
levelh = 66 ;
domain = 36 ;
I can extract/merge the files for all domains with something like:
cdo select,name=u,v file1.nc file2.nc out.nc
But all other operators seem to be related to selections in space (e.g. sellonlatbox) or time (e.g. seltimestep); I can't find a way to select only one domain from the NetCDF files. Is this possible with CDO or NCO?
I'm not sure I fully understand the question/intent. NCO treats all dimensions equally. If you want domain #17, then try:
ncrcat -v u,v -d domain,17 file1.nc file2.nc out.nc
If file1.nc and file2.nc are not sequential in a record coordinate, then try:
ncecat -v u,v -d domain,17 file1.nc file2.nc out.nc
ADDED 20180929:
Or, if you don't like that and the files do not have a record dimension yet are time-sequential, then before using ncrcat turn the temporal dimension into a record coordinate for each file with:
ncks -O --mk_rec_dmn time file1.nc file1.nc
ncks -O --mk_rec_dmn time file2.nc file2.nc
...
etc., and proceed as above. That may be the best way forward with NCO.
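As a sketch, the whole sequence over many files could look like this (the file*.nc glob is an assumption about your naming):
for fl in file*.nc; do ncks -O --mk_rec_dmn time "$fl" "$fl"; done
ncrcat -v u,v -d domain,17 file*.nc out.nc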

How old is a file?

I have a shell script that checks how many days old a file is. I used stat -f "%m%t%Sm %N" "$file", but I want to store the result in a variable and then compare the current time with the file's timestamp.
Assuming you're using bash, you can capture the output of commands with something like:
fdate=$(stat -f "%m%t%Sm %N" "$file")
and then do whatever you will with the results:
echo ${fdate}
That's assuming the command itself works in the first place. If it does, you can ignore the text below.
The GNU stat program uses -f to specify that you want to query the filesystem rather than a file, and the other options you have don't seem to make sense in the context of your question.
Using GNU stat, you can get the time since the last file update(1) as:
ageInSeconds=$(($(date -u +%s) - $(stat --printf "%Y" "$file")))
This subtracts the last modification time of the file from the current time (both expressed as seconds since the epoch) to give you the age in seconds.
To turn that into days, assuming you're not overly concerned about the possible error from leap seconds (an error of, at most, one part in about 15.7 million, or 0.000006%), you can just divide by 86,400:
ageInDays=$((($(date -u +%s) - $(stat --printf "%Y" "$file")) / 86400))
(1) Note that, although stat purports to have a %W format specifier that gives the birth time of the file, this doesn't always work (it returns zero). You could check that first if you're really interested in when the file was created rather than last updated, but you may have to be prepared to accept that the information is not available. I've used the last modification time above since, frequently, it's what is used for things like detecting changes.
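Putting it together, a minimal sketch of the day-based comparison the question asks for (GNU stat assumed; the path and the 7-day threshold are just examples):
file="/path/to/some/file"                 # hypothetical path
now=$(date -u +%s)                        # current time, seconds since the epoch
mtime=$(stat --printf "%Y" "$file")       # last-modification time, same units
ageInDays=$(( (now - mtime) / 86400 ))    # integer days
if [ "$ageInDays" -ge 7 ]; then
    echo "$file is $ageInDays days old"
fi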

How to identify errors when loading data into BigQuery

While importing a ~5 GB file with ~41 million rows into BigQuery, I received the following error message:
Errors:
File: 0 / Offset:4026531933 / Line:604836 / Field:39, Value cannot be converted to expected type.
My question: how would I use the Offset / Line information in the error message above to determine the line number of the offending record?
For large files, BigQuery splits them into large pieces and loads them in parallel. That means BigQuery doesn't know how many lines come before a particular piece, since the file was chunked by byte ranges. The offset mentioned is the start of the chunk, in bytes from the beginning of the file. So the error should occur 604836 lines after the 4026531933th byte.
You can isolate the line with the bad value on Unix with:
tail -c +4026531933 <input file> | head -n $((604836 + 1)) | tail -1
Or with sed:
tail -c +4026531933 <input file> | sed -n $((604836 + 1))p
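In general terms (the offset and line values come straight from the error message; input.csv is a stand-in for your file):
offset=4026531933   # byte offset of the chunk, from the error message
line=604836         # line number within the chunk
tail -c +"$offset" input.csv | head -n $(( line + 1 )) | tail -1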

Why the target duration of the created segments is not the one I specified

I am using the following command:
ffmpeg -i Apple.ts -map 0:0 -map 0:1 -c copy -f segment -segment_time 10 -segment_list test.m3u8 -segment_format ts 'fileSequence%d.ts'
The files do get segmented, but the values are not precise. See the generated .m3u8 below: the target duration is 12, but I specified 10.
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-ALLOWCACHE:1
#EXTINF:11.478100,
fileSequence0.ts
#EXTINF:10.410400,
fileSequence1.ts
#EXTINF:11.745067,
fileSequence2.ts
#EXTINF:7.841167,
fileSequence3.ts
#EXTINF:8.024678,
fileSequence4.ts
#EXT-X-TARGETDURATION:12
#EXT-X-ENDLIST
Also, if I don't want the floating-point durations, how do I do that?
To get strict durations you need to ensure your input file has been encoded with a strict I-picture rate. Ensure you have set the keyframe interval to 10 seconds and sc_threshold to 0 while encoding the original mp4 with ffmpeg, then run your segmentation command. This will give you exact 10-second segments.
Otherwise ffmpeg will try to cut around the 10-second mark (or whatever duration you have given) at the closest I-picture it can find.
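A sketch of such a re-encode (Apple.mp4 is a stand-in for your original source; -force_key_frames inserts a keyframe every 10 seconds and -sc_threshold 0 suppresses extra scene-cut keyframes with libx264):
ffmpeg -i Apple.mp4 -c:v libx264 -force_key_frames "expr:gte(t,n_forced*10)" -sc_threshold 0 -c:a copy Apple_keyint.ts
Then re-run the segment command on the result, and the cuts should land exactly on the forced keyframes.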
