Reading an attribute value from a NetCDF4 file with nested groups

This should be trivial but I can't for the life of me figure out how to do this: I am trying to read an attribute value from a NetCDF4 file in R. Now, my NetCDF4 file (uploaded here) is fairly complex, i.e. it contains nested groups.
I would like to extract the value of the attribute called gml:posList from the group METADATA/EOP_METADATA/om:featureOfInterest/eop:multiExtentOf/gml:surfaceMembers/gml:exterior using R. I am not sure if it matters in this context, but this group does not contain any variables, only metadata attributes.
I have tried the following:
library(ncdf4)
fid = nc_open('S5P_NRTI_L2__NO2____20180728T130136_20180728T130636_04089_01_010100_20180728T140302.nc')
ncatt_get(fid, varid=0, attname='METADATA/EOP_METADATA/om:featureOfInterest/eop:multiExtentOf/gml:surfaceMembers/gml:exterior/gml:posList', verbose=TRUE)
but this returns
[1] "ncatt_get: entering"
[1] "ncatt_get: is a global att"
[1] "ncatt_get: calling ncatt_get_inner for a global att"
[1] "ncatt_get_inner: entering with ncid= 65536 varid= -1 attname= METADATA/EOP_METADATA/om:featureOfInterest/eop:multiExtentOf/gml:surfaceMembers/gml:exterior/gml:posList"
[1] "ncatt_get_inner: about to call R_nc4_inq_att"
[1] "ncatt_get_inner: R_nc4_inq_att returned with error= -43 type= -1"
$`hasatt`
[1] FALSE
$value
[1] 0
presumably indicating that it cannot find the attribute and I assume that I got the path wrong somehow.
So my question is, how do I need to specify the path to an attribute that is a) in a nested group and b) not linked to a specific variable, such that ncatt_get() can find the attribute and return its value?
By the way, just for reference, in Matlab the command
test = ncreadatt(file, 'METADATA/EOP_METADATA/om:featureOfInterest/eop:multiExtentOf/gml:surfaceMembers/gml:exterior', 'gml:posList')
works fine, so I know it's not an issue with the file.
Any hints would be highly appreciated!
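The verbose output shows ncatt_get() treating the whole string as the name of a global attribute on the root group (varid= -1), so it never descends into the groups. One workaround, which swaps in the RNetCDF package instead of ncdf4, is to walk to the innermost group first and then read the group-level attribute through the "NC_GLOBAL" pseudo-variable. This is a sketch under those assumptions: get_group() and read_group_att() are hypothetical helpers, and the demo file, its two groups and its coordinate string are made up so the example runs without the original .nc file.

```r
library(RNetCDF)

# Hypothetical helper: walk a slash-separated group path
# (e.g. "METADATA/EOP_METADATA") one level at a time and
# return the id of the innermost group.
get_group <- function(nc, path) {
  parts <- strsplit(path, "/", fixed = TRUE)[[1]]
  grp <- nc
  for (name in parts[nzchar(parts)]) {
    grp <- grp.inq.nc(grp, name)$self  # id of the child group called `name`
  }
  grp
}

# Hypothetical helper: read an attribute that belongs to a group itself
# (not to a variable); such attributes are addressed via "NC_GLOBAL".
read_group_att <- function(file, group_path, attname) {
  nc <- open.nc(file)
  on.exit(close.nc(nc))
  att.get.nc(get_group(nc, group_path), "NC_GLOBAL", attname)
}

# Demo only: build a small NetCDF4 file with two nested groups and a
# group-level attribute, then read it back.
demo <- tempfile(fileext = ".nc")
nc <- create.nc(demo, format = "netcdf4")
g1 <- grp.def.nc(nc, "METADATA")
g2 <- grp.def.nc(g1, "EOP_METADATA")
att.put.nc(g2, "NC_GLOBAL", "gml:posList", "NC_CHAR", "10.5 20.5 11.0 21.0")
close.nc(nc)

value <- read_group_att(demo, "METADATA/EOP_METADATA", "gml:posList")
print(value)
```

For the file in the question, the call would presumably be read_group_att() with the same file name as above, the group path 'METADATA/EOP_METADATA/om:featureOfInterest/eop:multiExtentOf/gml:surfaceMembers/gml:exterior' and 'gml:posList', mirroring the group/attribute split that Matlab's ncreadatt() uses.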

Related

Why does R Markdown always generate the same numbers when loading a workspace?

I've just noticed that R Markdown always generates the same numbers when I begin by loading a workspace. For instance, I type something like
load("C:/Users/Piotr/Documents/MyWorkSpace.RData")
rnorm(10,0,1)
and the result is always the same. In my instance this is
[1] 1.2741648 -0.7905977 -0.4062481 0.3983397 0.3917316 -1.4122062
[7] 0.6595976 0.5776770 -1.0952124 0.1878156
Can you explain this to me and tell me how to deal with it? ;/
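The random number generator stores its current state in a hidden variable called .Random.seed in the global environment.
Because that variable is saved along with the rest of the workspace, loading MyWorkSpace.RData restores the RNG to the same saved state every time, so the same numbers follow.
To reset the random number generator, just delete .Random.seed: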
rm(.Random.seed, envir=globalenv())
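A small self-contained demonstration of the effect described above: it saves only the RNG state to a temporary file (a stand-in for MyWorkSpace.RData), loads it twice and gets the same draws both times, then removes the restored state. The temporary file and the seed value are arbitrary choices for the demo.

```r
ws <- tempfile(fileext = ".RData")

set.seed(123)  # creates .Random.seed in the global environment
# Save only the RNG state; save.image() would include it as well,
# because .Random.seed lives in the global environment.
save(list = ".Random.seed", envir = globalenv(), file = ws)

load(ws, envir = globalenv())
a <- rnorm(3)
load(ws, envir = globalenv())  # restores the saved state again...
b <- rnorm(3)                  # ...so the same numbers come out
identical(a, b)  # TRUE

# Drop the restored state; R creates a fresh seed on the next RNG call
rm(.Random.seed, envir = globalenv())
```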

Automate Response at Prompt in R interactive

Please see below my reference to a previous question asked along these lines.
I am running the library taxize in R. Taxize includes a function for getting a stable number associated with a scientific name, get_tsn().
I can run this in interactive mode or non-interactive mode, so that I am either prompted or not, respectively, to choose among multiple hits.
Interactive:
> tax.num <- get_tsn("Acer rubrum", ask=TRUE)
Retrieving data for taxon 'Acer rubrum'
tsn target commonNames nameUsage
1 28728 Acer rubrum red maple accepted
2 28730 Acer rubrum ssp. drummondii NA not accepted
3 526853 Acer rubrum var. drummondii Drummond's maple accepted
...
More than one TSN found for taxon 'Acer rubrum'!
Enter rownumber of taxon (other inputs will return 'NA'):
Non-interactive:
> tax.num <- get_tsn("Acer rubrum", ask=TRUE)
Retrieving data for taxon 'Acer rubrum'
Warning message:
> 1 result; no direct match found
I need to run this library in interactive mode so that I do not get an empty result when there is more than one match. However, babysitting this script is totally unrealistic for the size of my data, which are in the millions of scientific names. Thus, I want to automate a response to the prompt so that the answer is always 1. This will be the right answer for probably 99% of cases and will ultimately still lead to the right answer downstream in 100% of cases for reasons that are probably beyond the scope of this question.
Thus, how can I automate the response to always be 1?
I looked at this question and tried modifying my code accordingly.
options(httr_oauth_cache=T)
tax.num <- get_tsn("Acer rubrum",ask=T)
However, this gave the same result shown for interactive mode above.
Your help is appreciated.
UPDATE: Ignore below. Obviously Nathan Werth posted the best answer in a comment above.
tax.num <- get_tsn_(searchterm = "Acer rubrum", rows = 1)
works wonderfully!
...
I decided to modify the source code to handle this. I suspect that there is a more desirable solution, but this one meets my needs.
Thus, in the file get_tsn.R from the source, I replaced the following block of code
# prompt
message("\n\n")
print(tsn_df)
message("\nMore than one TSN found for taxon '", x, "'!\n
Enter rownumber of taxon (other inputs will return 'NA'):\n")
# prompt
take <- scan(n = 1, quiet = TRUE, what = 'raw')
with
take <- 1
I could also have deleted the other bits that echo to the screen, which are now unnecessary and no longer accurate.
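The edit above boils down to replacing the interactive scan() call with a fixed row number. Here is a minimal, self-contained sketch of that pattern outside taxize: choose_row() is a hypothetical helper (not a taxize function), and the toy data frame copies the tsn and target columns of the first three rows printed above.

```r
# Hypothetical helper (not part of taxize): mimics the prompt block in
# get_tsn.R, but uses a fixed row instead of calling scan() when ask = FALSE.
choose_row <- function(df, ask = interactive(), default = 1L) {
  if (nrow(df) == 1L) return(df$tsn[1])
  take <- if (ask) {
    print(df)
    message("Enter rownumber of taxon (other inputs will return 'NA'):")
    suppressWarnings(as.integer(scan(n = 1, quiet = TRUE, what = "raw")))
  } else {
    default
  }
  # Anything that is not a valid row number gives NA, as in the original prompt
  if (length(take) != 1L || is.na(take) || take < 1L || take > nrow(df)) {
    return(NA_character_)
  }
  df$tsn[take]
}

tsn_df <- data.frame(
  tsn = c("28728", "28730", "526853"),
  target = c("Acer rubrum", "Acer rubrum ssp. drummondii",
             "Acer rubrum var. drummondii"),
  stringsAsFactors = FALSE
)

choose_row(tsn_df, ask = FALSE)  # "28728"
```

Keeping ask as an argument means the same code still prompts in an interactive session but never blocks in a batch run over millions of names.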
The revised function, which I tested using trace("get_tsn",edit=TRUE), returns as follows:
> print(tax.num)
[1] "28728"
attr(,"match")
[1] "found"
attr(,"multiple_matches")
[1] TRUE
attr(,"pattern_match")
[1] FALSE
attr(,"uri")
[1] "http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=28728"
attr(,"class")
[1] "tsn"
I will recompile and install it on Linux now with the edit for use with this particular project.
I still welcome other, better answers.

In R, how do I access information in a data set using a variable after the $?

I'm using Bioconductor to look at GO terms. I can use, for instance, GOBPANCESTOR$"GO:0060412" to get all the ancestor terms of GO:0060412. However, I need to loop through many possible terms, and I can't seem to get GOBPANCESTOR$ to accept a variable after the $.
> GOBPANCESTOR$"GO:0060412"
[1] "GO:0003007" "GO:0003205" "GO:0003206" "GO:0003231" "GO:0003279" "GO:0003281" "GO:0007275" "GO:0009653" "GO:0009887" "GO:0007507" "GO:0008150"
[12] "GO:0032501" "GO:0032502" "GO:0044699" "GO:0044707" "GO:0044767" "GO:0048513" "GO:0048731" "GO:0048856" "GO:0060411" "GO:0072358" "GO:0072359"
[23] "all"
But...
> mygoterm <- "GO:0060412"
> GOBPANCESTOR$mygoterm
NULL
I also tried using paste(), to no avail. I feel like I must be misunderstanding something integral about the way R works...
Thanks for your help!
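The $ operator never evaluates what follows it: GOBPANCESTOR$mygoterm looks up a key literally named "mygoterm", which does not exist, hence NULL. [[ evaluates its argument, so GOBPANCESTOR[[mygoterm]] should work, and AnnotationDbi also provides mget() for looking up several keys at once. The sketch below uses a plain named list as a stand-in for GOBPANCESTOR; its values are toy data (a truncated version of the output above plus a made-up second entry), not real GO annotations.

```r
# Toy stand-in for GOBPANCESTOR: a named list keyed by GO ID (not real data)
ancestors <- list(
  "GO:0060412" = c("GO:0003007", "GO:0003205", "all"),
  "GO:0060411" = c("GO:0003007", "all")
)

mygoterm <- "GO:0060412"
ancestors$mygoterm     # NULL: `$` looks for an element literally named "mygoterm"
ancestors[[mygoterm]]  # `[[` evaluates the variable and finds "GO:0060412"

# Looping over many terms: name the result by the input terms
terms <- c("GO:0060412", "GO:0060411")
res <- lapply(setNames(terms, terms), function(t) ancestors[[t]])
```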

Writing help information for user defined functions in R

I frequently use user defined functions in my code.
RStudio supports automatic code completion using the Tab key. I find this amazing because I can always quickly read what is supposed to go inside the (...) of functions/calls.
However, my user-defined functions just show the parameters, with no additional info and, obviously, no help page.
This isn't much of a problem for me, but since I'd like to share my code, I think it would be useful to have some information at hand besides the # comments on every line.
Nowadays, when I share, my lines usually look like this
myfun <- function(x1,x2,x3,...){
# This is a function for this and that
# x1 is a factor, x2 is an integer ...
# This line of code is useful for transformation of x2 by x1
# ... some code here ...
# Now we do this other thing
# ... more code ...
# This is where the magic happens
return (magic)
}
I think these line-by-line comments are fine, but I'd like to improve on them and make the information as handy as it is for every other function.
Not really an answer, but if you are interested in exploring this further, you should start at the rcompgen-help page (although that's not a function name) and also examine the code of:
rc.settings
Also, executing this allows you to see what the .CompletionEnv has in it for currently loaded packages:
names(rc.status())
#-----
[1] "attached_packages" "comps" "linebuffer" "start"
[5] "options" "help_topics" "isFirstArg" "fileName"
[9] "end" "token" "fguess" "settings"
And if you just look at:
rc.status()$help_topics
... you see the character items that the tab-completion mechanism uses for matching. On my machine at the moment there are 8881 items in that vector.
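Another route, not covered in the answer above, is to write roxygen2 comments (lines starting with #') directly above the function. When a script is simply sourced they are ordinary comments, but once the function lives in a package, roxygen2::roxygenise() (or devtools::document()) turns them into a proper help page. A minimal sketch with a hypothetical center_by_group() function:

```r
#' Center a numeric vector within groups
#'
#' @param x A numeric vector.
#' @param g A grouping vector (e.g. a factor) of the same length as `x`.
#' @return A numeric vector: each element of `x` minus the mean of its group.
#' @examples
#' center_by_group(c(1, 2, 3, 10), c("a", "a", "b", "b"))
#' @export
center_by_group <- function(x, g) {
  x - ave(x, g)  # ave() returns, for each element, the mean of its group
}

center_by_group(c(1, 2, 3, 10), c("a", "a", "b", "b"))
```

Even without building a package, the @param lines keep the argument descriptions in one predictable place above the function, which is easier for others to scan than comments scattered through the body.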

How to combine/nest qmake spec variables?

I'd like to create a variable in a QMAKESPEC file based on other variables, like below (see also the inline comments):
# some project-related paths
PROJECT_ABC_ROOT_PATH=$HOME/dev/project_one
PROJECT_XYZ_ROOT_PATH=$HOME/dev/project_two
# variable below is used to select one from the paths above
PROJ_NAME=ABC
# [1] this gives the "project_one" path properly
CURRENT_PATH=$${PROJECT_ABC_ROOT_PATH}
# [2] this doesn't work
CURRENT_PATH=$${PROJECT_$${PROJ_NAME}_ROOT_PATH}
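Can anyone advise how I could correct version [2], please?
Try the following. $$eval() takes a variable name and returns that variable's contents, so the name can first be assembled from other variables: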
# some project-related paths
PROJECT_ABC_ROOT_PATH=$HOME/dev/project_one
PROJECT_XYZ_ROOT_PATH=$HOME/dev/project_two
# variable below is used to select one from the paths above
PROJ_NAME=ABC
CURRENT_PATH=$$eval(PROJECT_$${PROJ_NAME}_ROOT_PATH)
