How to extract variable names from a netCDF file in R? - r

I am writing a function in R to extract some air quality modelling data from netCDF files. I have the package "ncdf" installed.
To let other users or myself choose which variables to extract from a netCDF file, I would like to extract the names of all variables in the file, so that I can present them as a simple list rather than calling print.ncdf() on the file, which gives too much information. Is there any way of doing this?
I tried applying unlist() to the var field of the ncdf object, but it seemed to return the contents as well...
I googled and searched Stack Overflow but didn't find an answer, so your help is very much appreciated.
Many thanks in advance.

If your ncdf object is called nc, then quite simply:
names(nc$var)
With an example, using the dataset downloaded here, for instance (since you didn't provide one):
nc <- open.ncdf("20130128-ABOM-L4HRfnd-AUS-v01-fv01_0-RAMSSA_09km.nc")
names(nc$var)
[1] "analysed_sst" "analysis_error" "sea_ice_fraction" "mask"

It is now 2016. The ncdf package is deprecated.
The same code as in SE user plannapus' answer is now:
library(ncdf4)
netcdf.file <- "flux.nc"
nc = ncdf4::nc_open(netcdf.file)
variables = names(nc[['var']])
#print(nc)
A note from the documentation:
Package: ncdf
Title: Interface to Unidata netCDF Data Files
Maintainer: Brian Ripley <ripley#stats.ox.ac.uk>
Version: 1.6.9
Author: David Pierce <dpierce#ucsd.edu>
Description: This is deprecated and will be removed
from CRAN in early 2016: use 'RNetCDF' or 'ncdf4' instead.
Newer package "ncdf4" is designed to work with the netcdf library
version 4, and supports features such as compression and
chunking. Unfortunately, for various reasons the ncdf4 package must have
a different API than the ncdf package.
A note from the home page of the maintainer:
Package ncdf4 -- use this for new code
The "ncdf4" package is designed to work with the netcdf library, version 4.
It includes the ability to use compression and chunking,
which seem to be some of the most anticipated benefits of the version 4
library. Note that the API of ncdf4 has to be different
from the API of ncdf, unfortunately. New code should use ncdf4, not ncdf.
http://cirrus.ucsd.edu/~pierce/ncdf/
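As a minimal sketch of how the ncdf4 version could be wrapped for the use case in the question (show a plain list of variable names, then read the one the user picks): the helper name list_nc_vars, the temporary demo file, and its variables no2 and pm10 are made up for illustration and do not come from the posts above.

```r
library(ncdf4)

# Hypothetical helper: return only the variable names of a netCDF file.
list_nc_vars <- function(path) {
  nc <- nc_open(path)
  on.exit(nc_close(nc))  # close the file even if something fails
  names(nc$var)
}

# Demo setup (assumption): build a tiny netCDF file with two variables.
demo_path <- tempfile(fileext = ".nc")
dim_x <- ncdim_def("x", units = "count", vals = 1:3)
var_no2 <- ncvar_def("no2", units = "ug/m3", dim = dim_x)
var_pm10 <- ncvar_def("pm10", units = "ug/m3", dim = dim_x)
nc_new <- nc_create(demo_path, list(var_no2, var_pm10))
ncvar_put(nc_new, var_no2, c(10, 20, 30))
ncvar_put(nc_new, var_pm10, c(1, 2, 3))
nc_close(nc_new)

# List the names, then read one chosen variable.
var_names <- list_nc_vars(demo_path)
print(var_names)

nc <- nc_open(demo_path)
no2 <- ncvar_get(nc, "no2")
nc_close(nc)
print(no2)
```

Note that names(nc$var) lists data variables only; dimensions such as x are stored separately under nc$dim.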

Related

Get variable data out of a group in a NetCDF file using RNetCDF or ncdf4

I am trying to access data from variables within a NetCDF file that contains hierarchical groups using R. For example:
I can't find anything about how to do this in the RNetCDF documentation - though this seems to be out of date online.
Latest version: https://www.rdocumentation.org/packages/RNetCDF/versions/2.6-1
Latest documented version: https://www.rdocumentation.org/packages/RNetCDF/versions/1.9-1
I am open to using ncdf4, though would rather do this in RNetCDF since I think the syntax is easier to read and I use R only for teaching purposes.
I can do this in Python using xarray - but need an R solution in this case. Thanks!

How to include raw data in an R package

I'm working on the final assignment of the course Building R Packages.
In this assignment, we need to create an R package based on some example functions provided by the instructors. We need to organize and document the package, then make it available on GitHub. My package is called FARS and is already available in this GitHub repo.
I'm having trouble with making raw data available with the package. After following the instructions provided in the course's readings and also in chapter 14.3 of the book Building R Packages, the files are still not being recognized.
What did I do so far?
Prepared all the package's documentation, including roxygen2 tags, DESCRIPTION, README.Md, and vignette, following these steps in addition to instructions provided in the readings and book mentioned;
Created a subdirectory named inst/extdata in the package's directory;
Copied all three example files (.csv.bz2) with raw data to inst/extdata;
Tested the functions using testthat;
Installed my FARS package.
Now I'm trying to check if one of the files is available after installing the package:
system.file("extdata", "accident_2013.csv.bz2",
package = "FARS",
mustWork = TRUE)
I get an error message:
Error in system.file("extdata", "accident_2013.csv.bz2", package = "FARS", :
no file found
These data files need to be available with the package, so the examples provided in the vignette work properly.
Here's a "real-life" example, using a simple package I wrote recently.
I have a "data" directory in the build directory.
EDIT To clarify the comments found in R-exts, the directory tree packagename/inst/extdata is intended for data that your functions call directly, by specifying that directory path. Since you want to load data into your workspace, use the data directory.
My "data" directory contains one file named preciseNumbersAsChar.r . That file contains assignments such as
charE <- {long number string}
If you read the help page for the command data, it explains that files ending in .r are sourced when called.
library(FunWithNumbers)
data('preciseNumbersAsChar') #works
Which is to say, the defined objects are now in my environment.
It's worth reading the help page for data in detail as different file types are handled slightly differently.
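For the inst/extdata route from the question, one way to see what actually ended up in the installed package is to list the installed extdata directory. This is only a diagnostic sketch: the helper name list_extdata is made up, and "FARS" is simply the package name from the question.

```r
# Hypothetical helper: list the files an installed package ships in extdata.
# Returns character(0) if the package or the directory cannot be found.
list_extdata <- function(pkg) {
  # system.file() returns "" when nothing is found (mustWork defaults to FALSE)
  ext_dir <- system.file("extdata", package = pkg)
  if (!nzchar(ext_dir)) {
    return(character(0))
  }
  list.files(ext_dir)
}

# "FARS" is the package name from the question; substitute your own.
fars_files <- list_extdata("FARS")
if (length(fars_files) == 0) {
  message("No extdata files found in the installed copy of FARS.")
} else {
  print(fars_files)
}
```

Files placed in inst/extdata of the source package are copied to extdata in the installed package, so an empty result suggests the installed copy was built without them, for example because it was installed before the files were added.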

How to write an effective loop to access datasets inside 1000's of h5 files in R [duplicate]

I have a file in hdf5 format. I know that it is supposed to be a matrix, but I want to read that matrix in R so that I can study it. I see that there is an h5r package that is supposed to help with this, but I do not see any simple tutorial that is easy to read and understand. Is such a tutorial available online? Specifically, how do you read an hdf5 object with this package, and how do you actually extract the matrix?
UPDATE
I found the rhdf5 package, which is not on CRAN but is part of Bioconductor. The interface is relatively easy to understand, and the documentation and example code are quite clear. I could use it without problems. My problem, it seems, was the input file. The matrix that I wanted to read was actually stored in the hdf5 file as a Python pickle, so every time I tried to open it and access it through R I got a segmentation fault. I did figure out how to save the matrix from within Python as a tsv file, and now that problem is solved.
The rhdf5 package works really well, although it is not on CRAN. Install it from Bioconductor:
# as of 2020-09-08, these are the updated instructions per
# https://bioconductor.org/install/
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
# install the rhdf5 package itself from Bioconductor
BiocManager::install("rhdf5")
And to use it:
library(rhdf5)
List the objects within the file to find the data group you want to read:
h5ls("path/to/file.h5")
Read the HDF5 data:
mydata <- h5read("path/to/file.h5", "/mygroup/mydata")
And inspect the structure:
str(mydata)
(Note that multidimensional arrays may appear transposed). Also you can read groups, which will be named lists in R.
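For the many-files case in the title, the same h5read() call can be applied over a vector of file paths. The following is a sketch under stated assumptions: the temporary directory, the file names, and the dataset name "mydata" at the root of each file are invented for the demo, and real files would need their own paths and dataset names.

```r
library(rhdf5)

# Demo setup (assumption): write three small HDF5 files to a temp directory.
demo_dir <- tempfile("h5demo")
dir.create(demo_dir)
for (i in 1:3) {
  path <- file.path(demo_dir, sprintf("file%d.h5", i))
  h5createFile(path)
  h5write(matrix(i, nrow = 2, ncol = 2), path, "mydata")
}

# Collect all .h5 files and read the same dataset from each one.
h5_files <- list.files(demo_dir, pattern = "\\.h5$", full.names = TRUE)
all_data <- lapply(h5_files, h5read, name = "mydata")
names(all_data) <- basename(h5_files)

# Release any HDF5 handles that are still open.
h5closeAll()

str(all_data)
```

Using lapply() keeps one result per file in a named list, which is usually easier to combine afterwards than growing an object inside a for loop.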
You could also use h5, a package which I recently published on CRAN.
Compared to rhdf5 it has the following features:
S4 object model to directly interact with HDF5 objects like files, groups, datasets and attributes.
Simpler syntax, implemented R-like subsetting operators for datasets supporting commands like
readdata <- dataset[1:3, 1:3]
dataset[1:3, 1:3] <- matrix(1:9, nrow = 3)
Supported NA values for all data types
200+ Test cases with a code coverage of 80%+.
To save a matrix you could use:
library(h5)
testmat <- matrix(rnorm(120), ncol = 3)
# Create HDF5 File
file <- h5file("test.h5")
# Save matrix to file in group 'testgroup' and datasetname 'testmat'
file["testgroup", "testmat"] <- testmat
# Close file
h5close(file)
... and read the entire matrix back into R:
file <- h5file("test.h5")
testmat_in <- file["testgroup", "testmat"][]
h5close(file)
See also h5 on
CRAN: http://cran.r-project.org/web/packages/h5/index.html
Github: https://github.com/mannau/h5
I used the rgdal package to read HDF5 files. You do need to take care that probably the binary version of rgdal does not support hdf5. In that case, you need to build gdal from source with HDF5 support before building rgdal from source.
Alternatively, try and convert the files from hdf5 to netcdf. Once they are in netcdf, you can use the excellent ncdf package to access the data. The conversion I think could be done with the cdo tool.
The ncdf4 package, an interface to netCDF-4, can also be used to read hdf5 files (netCDF-4 is compatible with netCDF-3, but it uses hdf5 as the storage layer).
In the developer's words:
the HDF group says:
NetCDF-4 combines the netCDF-3 and HDF5 data models, taking the desirable characteristics of each, while taking advantage of their separate strengths
Unidata says:
The netCDF-4 format implements and expands the netCDF-3 data model by using an enhanced version of HDF5 as the storage layer.
In practice, ncdf4 provides a simple interface, and migrating code from using older hdf5 and ncdf packages to a single ncdf4 package has made our code less buggy and easier to write (some of my trials and workarounds are documented in my previous answer).

communicating with SAS datasets from R

I have a bunch of datasets that are in SAS format. I would like to avoid using SAS since I think R provides more than enough functionality for me. Therefore, is there a package that would allow me to interact with the SAS datasets from R? I have the SAS software installed but I would like to avoid coding things in multiple languages.
Since you have SAS, you can use Frank Harrell's 'Hmisc' package, which has the sas.get and sasxport.get functions. It also has a bunch of utility functions: label, sas.get, contents, describe. For those without a SAS license, package 'foreign' has read.ssd, lookup.xport, and read.xport.
EDIT1: I will also mention that Anthony Joseph Damico recently announced a package to parse SAS INPUT code into read.fwf code. From its description file: "Using importation code designed for SAS users to read ASCII files into sas7bdat files, the SAScii package parses through the INPUT block of a (.sas) syntax file to design the parameters needed for a read.fwf() function call."
EDIT2: There is also a package by Matt Shotwell called 'sas7bdat' with read.sas7bdat(file) that describes its function as "Read SAS files in the sas7bdat data format."
More recently, the haven package can read and write sas7bdat and SAS xpt files. This package is consistent with other import/export packages in the tidyverse.
There is also a package called libr that simulates a SAS libname() function almost exactly. This package is part of a system of packages called sassy that recreates many basic SAS concepts in R.
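As a small sketch of the haven route, here is a round trip through the SAS transport (xpt) format. The data frame and the temporary file are made up for the demo; for an existing sas7bdat file the corresponding reader would be haven::read_sas().

```r
library(haven)

# Demo data (assumption): a tiny data frame standing in for a SAS dataset.
scores <- data.frame(id = c(1, 2, 3), score = c(1.5, 2.5, 3.5))

# Write it as a SAS transport file, then read it back into R.
xpt_path <- tempfile(fileext = ".xpt")
write_xpt(scores, xpt_path)
sas_data <- read_xpt(xpt_path)

# The result is a tibble that can be used like any other data frame.
print(sas_data)
```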
