spChFIDs() on level 1 or higher map-files

Hopefully this is (one of) the last questions on map-files.
Why is this not working, and how would I do that right?
load(url('http://gadm.org/data/rda/CUB_adm1.RData'))
CUB <- gadm
CUB <- spChFIDs(CUB, paste("CUB", rownames(CUB), sep = "_"))
Thank you very much!!!
It seems to work with row.names():
load(url('http://gadm.org/data/rda/CUB_adm1.RData'))
CUB <- gadm
CUB <- spChFIDs(CUB, paste("CUB", row.names(CUB), sep = "_"))

The answer becomes apparent once one reads the help pages ?row.names and ?rownames.
The rownames() function only knows about matrix-like objects, and CUB is not one of those, hence it has no row names that rownames() can find:
> rownames(CUB)
NULL
row.names() is different: it is an S3 generic function, which means package authors can write methods for specific types of objects so that the row names of those objects can be extracted.
Here is a list of the methods available for row.names() in my current session, with the sp package loaded:
> methods(row.names)
[1] row.names.data.frame
[2] row.names.default
[3] row.names.SpatialGrid*
[4] row.names.SpatialGridDataFrame*
[5] row.names.SpatialLines*
[6] row.names.SpatialLinesDataFrame*
[7] row.names.SpatialPixels*
[8] row.names.SpatialPoints*
[9] row.names.SpatialPointsDataFrame*
[10] row.names.SpatialPolygons*
[11] row.names.SpatialPolygonsDataFrame*
Non-visible functions are asterisked
The class of the object CUB is:
> class(CUB)
[1] "SpatialPolygonsDataFrame"
attr(,"package")
[1] "sp"
So what is happening is that the SpatialPolygonsDataFrame method of the row.names() function is being used and it knows where to find the required row names.
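A quick illustrative sketch (not from the original answer): a plain data frame is matrix-like, so both functions find its row names, while a Spatial object only answers to the S3 generic.
df <- data.frame(a = 1:3, row.names = c("r1", "r2", "r3"))
rownames(df)   # works: a data frame is matrix-like
row.names(df)  # also works, via the data.frame method
row.names(CUB) # works: dispatches to the SpatialPolygonsDataFrame method
rownames(CUB)  # NULL, as shown above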

Related

Nested List Parsing with jsonlite

This is the second time I have faced this recently, so I wanted to reach out to see if there is a better way to parse data frames returned from jsonlite when one of the elements is an array stored as a list column in the data frame.
I know that this is part of the power of jsonlite, but I am not sure how to work with this nested structure. In the end, I suppose I could write my own custom parsing, but given that I am almost there, I wanted to see how to work with this data.
For example:
## options
options(stringsAsFactors=F)
## packages
library(httr)
library(jsonlite)
## setup
gameid="2015020759"
SEASON = '20152016'
BASE = "http://live.nhl.com/GameData/"
URL = paste0(BASE, SEASON, "/", gameid, "/PlayByPlay.json")
## get the data
x <- GET(URL)
## parse
api_response <- content(x, as="text")
api_response <- jsonlite::fromJSON(api_response, flatten=TRUE)
## get the data of interest
pbp <- api_response$data$game$plays$play
colnames(pbp)
And exploring what comes back:
> class(pbp$aoi)
[1] "list"
> class(pbp$desc)
[1] "character"
> class(pbp$xcoord)
[1] "integer"
From above, the column pbp$aoi is a list. Here are a few entries:
> head(pbp$aoi)
[[1]]
[1] 8465009 8470638 8471695 8473419 8475792 8475902
[[2]]
[1] 8470626 8471276 8471695 8476525 8476792 8477956
[[3]]
[1] 8469619 8471695 8473492 8474625 8475727 8476525
[[4]]
[1] 8469619 8471695 8473492 8474625 8475727 8476525
[[5]]
[1] 8469619 8471695 8473492 8474625 8475727 8476525
[[6]]
[1] 8469619 8471695 8473492 8474625 8475727 8475902
I don't necessarily need to keep these lists in the same data frame, but what options do I have for parsing out the data?
I would prefer to take the data out of the lists and parse it into a data frame that can be "related" back to the original record it came from.
Thanks in advance for your help.
From @hrbmstr above, I was able to get what I wanted using unnest() (from tidyr, with dplyr loaded for select()):
library(dplyr); library(tidyr)
select(pbp, eventid, aoi) %>% unnest() %>% head()
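For reference, a base-R sketch that builds the same kind of "related" long table (assuming pbp as constructed above, with the eventid column used in the answer):
aoi_long <- data.frame(
  eventid = rep(pbp$eventid, lengths(pbp$aoi)),  # repeat each event id once per player of interest
  aoi     = unlist(pbp$aoi)                      # flatten the list column
)
head(aoi_long)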

R - get values from multiple variables in the environment

I have some variables in my current R environment:
ls()
[1] "clt.list" "commands.list" "dirs.list" "eq" "hurs.list" "mlist" "prec.list" "temp.list" "vars"
[10] "vars.list" "wind.list"
where each one of the variables "clt.list", "hurs.list", "prec.list", "temp.list" and "wind.list" is a (huge) list of strings.
For example:
clt.list[1:20]
[1] "clt_Amon_ACCESS1-0_historical_r1i1p1_185001-200512.nc" "clt_Amon_ACCESS1-3_historical_r1i1p1_185001-200512.nc"
[3] "clt_Amon_bcc-csm1-1_historical_r1i1p1_185001-201212.nc" "clt_Amon_bcc-csm1-1-m_historical_r1i1p1_185001-201212.nc"
[5] "clt_Amon_BNU-ESM_historical_r1i1p1_185001-200512.nc" "clt_Amon_CanESM2_historical_r1i1p1_185001-200512.nc"
[7] "clt_Amon_CCSM4_historical_r1i1p1_185001-200512.nc" "clt_Amon_CESM1-BGC_historical_r1i1p1_185001-200512.nc"
[9] "clt_Amon_CESM1-CAM5_historical_r1i1p1_185001-200512.nc" "clt_Amon_CESM1-CAM5-1-FV2_historical_r1i1p1_185001-200512.nc"
[11] "clt_Amon_CESM1-FASTCHEM_historical_r1i1p1_185001-200512.nc" "clt_Amon_CESM1-WACCM_historical_r1i1p1_185001-200512.nc"
[13] "clt_Amon_CMCC-CESM_historical_r1i1p1_190001-190412.nc" "clt_Amon_CMCC-CESM_historical_r1i1p1_190001-200512.nc"
[15] "clt_Amon_CMCC-CESM_historical_r1i1p1_190501-190912.nc" "clt_Amon_CMCC-CESM_historical_r1i1p1_191001-191412.nc"
[17] "clt_Amon_CMCC-CESM_historical_r1i1p1_191501-191912.nc" "clt_Amon_CMCC-CESM_historical_r1i1p1_192001-192412.nc"
[19] "clt_Amon_CMCC-CESM_historical_r1i1p1_192501-192912.nc" "clt_Amon_CMCC-CESM_historical_r1i1p1_193001-193412.nc"
What I need to do is extract the subset of the string that is between "Amon_" and "_historical".
I can do this for a single variable, as shown here:
levels(as.factor(sub(".*?Amon_(.*?)_historical.*", "\\1", clt.list[1:20])))
[1] "ACCESS1-0" "ACCESS1-3" "bcc-csm1-1" "bcc-csm1-1-m" "BNU-ESM" "CanESM2" "CCSM4"
[8] "CESM1-BGC" "CESM1-CAM5" "CESM1-CAM5-1-FV2" "CESM1-FASTCHEM" "CESM1-WACCM" "CMCC-CESM"
However, what I'd like to do is run the command above for all five variables at once. Instead of using just "clt.list" as the argument in the command above, I'd like to use all of the variables "clt.list", "hurs.list", "prec.list", "temp.list" and "wind.list" at once.
How can I do that?
Many thanks in advance!
You can put your operation into a function and then iterate over it:
get_my_substr <- function(vecname)
  levels(as.factor(sub(".*?Amon_(.*?)_historical.*", "\\1", get(vecname))))
lapply(my_vecnames, get_my_substr)
lapply acts like a loop. You can create your list of vector names with
my_vecnames <- ls(pattern=".list$")
It is generally good practice to post a reproducible example in your question. Since none was provided here, I tested this approach with...
# example-maker
prestr <- "grr_Amon_"
posstr <- "_historical_zzz"
make_ex <- function()
  replicate(
    sample(10, 1),
    paste0(prestr, paste0(sample(LETTERS, sample(5, 1)), collapse = ""), posstr)
  )
# make a couple examples
set.seed(1)
m01 <- make_ex()
m02 <- make_ex()
# test result
lapply(ls(pattern="^m[0-9][0-9]$"),get_my_substr)
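Another sketch along the same lines: mget() fetches several objects by name in one call and returns a named list, so the results stay labelled by variable (the five names come from the question above).
vecs <- mget(c("clt.list", "hurs.list", "prec.list", "temp.list", "wind.list"))
result <- lapply(vecs, function(v)
  levels(as.factor(sub(".*?Amon_(.*?)_historical.*", "\\1", v))))
result$clt.list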
One solution would be to create a vector containing the names of the variables that you want to extract the data from, for example:
var.names <- c("clt.list", "commands.list", "dirs.list")
Then to access the value of each variable from the name:
for (var.name in var.names) {
  var.value <- as.list(environment())[[var.name]]
  # Do something with var.value
}
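The loop body can also be written with get(), which looks an object up by name directly (a minimal variant of the snippet above):
for (var.name in var.names) {
  var.value <- get(var.name)
  # Do something with var.value
}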

How to distinguish package namespace environment from other environment objects

Is there any way to programmatically distinguish between package environments and non-package environment objects? For example, the objects x and y below are both environments, with the same class and attributes.
x <- as.environment(cars)
y <- getNamespace("graphics")
However judging from the print method there is a difference:
> print(x)
<environment: 0x1d38118>
> print(y)
<environment: namespace:graphics>
Now suppose I have an arbitrary object, how can I determine which of the two it is (without looking at the output of print)? I would like to know this to determine how to store the object on disk. In case of the former I need to store the list representation of the environment (and perhaps its parents), but for the latter I would just store the name and version of the package.
isNamespace?
isNamespace(y)
# [1] TRUE
isNamespace(x)
# [1] FALSE
And, for future reference, apropos is often helpful when you've got a question like this.
apropos("namespace")
# [1] "..getNamespace" ".BaseNamespaceEnv" ".getNamespace"
# [4] ".methodsNamespace" "asNamespace" "assignInMyNamespace"
# [7] "assignInNamespace" "attachNamespace" "fixInNamespace"
# [10] "getFromNamespace" "getNamespace" "getNamespaceExports"
# [13] "getNamespaceImports" "getNamespaceInfo" "getNamespaceName"
# [16] "getNamespaceUsers" "getNamespaceVersion" "isBaseNamespace"
# [19] "isNamespace" "loadedNamespaces" "loadingNamespaceInfo"
# [22] "loadNamespace" "namespaceExport" "namespaceImport"
# [25] "namespaceImportClasses" "namespaceImportFrom" "namespaceImportMethods"
# [28] "packageHasNamespace" "parseNamespaceFile" "requireNamespace"
# [31] "setNamespaceInfo" "unloadNamespace"

How can I get a list of all methods defined on an S4 class in R?

Is there a way in R to get a list of all methods defined on an S4 class, given the name of that class?
Edit: I know that showMethods can show me all the methods, but I want to manipulate the list programmatically, so that's no good.
Maybe this would be useful:
mtext <- showMethods(class = "SpatialPolygons", printTo = FALSE)
fvec <- gsub("Function(\\:\\s|\\s\\\")(.+)(\\s\\(|\\\")(.+$)",
             "\\2", mtext[grep("^Function", mtext)])
fvec
[1] ".quad" "[" "addAttrToGeom"
[4] "area" "as.data.frame" "click"
[7] "coerce" "coordinates" "coordnames"
[10] "coordnames<-" "coords" "disaggregate"
[13] "extract" "fromJSON" "isDiagonal"
[16] "isTriangular" "isValidJSON" "jsType"
[19] "over" "overlay" "plot"
[22] "polygons" "polygons<-" "rasterize"
[25] "recenter" "spChFIDs" "spsample"
[28] "spTransform" "text" "toJSON"
The original version did not properly extract the quoted non-S4 generics in mtext, such as:
[60] "Function \"jsType\":"
[61] " <not an S4 generic function>"
Are you looking for showMethods()?
library(sp)
showMethods(class="SpatialPolygons")
Maybe something like
library(sp)
x <- capture.output(showMethods(class = "SpatialPolygons"))
unlist(lapply(strsplit(x[grep("Function: ", x)], " "), function(x) x[2]))
I also stumbled upon this; how about:
library(sp)
attr(methods(class="SpatialPolygons"), "info")$generic
# Alternatively:
# attr(.S4methods(class="SpatialPolygons"), "info")$generic
This will directly yield a vector of method names.
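Since the result is an ordinary vector (or factor), it can be manipulated programmatically right away; a couple of hypothetical examples:
library(sp)
gens <- attr(methods(class = "SpatialPolygons"), "info")$generic
sort(unique(as.character(gens)))    # clean, de-duplicated vector of generic names
grep("^coord", gens, value = TRUE)  # e.g. pick out the coordinate-related generics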

XCMS Package - Retention time

Is there a simple way to get the list/array of retention times from a xcmsRaw object?
Example Code:
xraw <- xcmsRaw(cdfFile)
So, for example, getting information from it:
xraw@env$intensity
or
xraw@env$mz
You can see what slots are available in your xcmsRaw instance with
> slotNames(xraw)
[1] "env" "tic" "scantime"
[4] "scanindex" "polarity" "acquisitionNum"
[7] "profmethod" "profparam" "mzrange"
[10] "gradient" "msnScanindex" "msnAcquisitionNum"
[13] "msnPrecursorScan" "msnLevel" "msnRt"
[16] "msnPrecursorMz" "msnPrecursorIntensity" "msnPrecursorCharge"
[19] "msnCollisionEnergy" "filepath"
What you want is xraw@msnRt - it is a numeric vector.
The env slot is an environment that stores 3 variables:
> ls(xraw@env)
[1] "intensity" "mz" "profile"
More details on the class itself at class?xcmsRaw.
EDIT: The msnRt slot is populated only if you specify includeMSn = TRUE, and your input file must be in mzXML or mzML format, not in cdf; if you use the faahKO example from ?xcmsRaw, you will see that
xr <- xcmsRaw(cdffiles[1], includeMSn = TRUE)
Warning message:
In xcmsRaw(cdffiles[1], includeMSn = TRUE) :
Reading of MSn spectra for NetCDF not supported
Also, xr@msnRt will only store the retention times for MSn scans, with n > 1. See xset@rt, where xset is an xcmsSet instance, for the raw/corrected MS1 retention times as provided by xcms.
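As an aside (a hedged suggestion based on the slotNames() output above, not something stated in the original answer): if all you need are the MS1 retention times of an xcmsRaw object, they should also be available directly from the scantime slot.
xraw <- xcmsRaw(cdfFile)
rt <- xraw@scantime  # numeric vector, one retention time per scan
head(rt)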
EDIT2: Alternatively, have a go with the mzR package
> library(mzR)
> cdffiles[1]
[2] "/home/lgatto/R/x86_64-unknown-linux-gnu-library/2.16/faahKO/cdf/KO/ko15.CDF"
> xx <- openMSfile(cdffiles[1])
> xx
Mass Spectrometry file handle.
Filename: /home/lgatto/R/x86_64-unknown-linux-gnu-library/2.16/faahKO/cdf/KO/ko15.CDF
Number of scans: 1278
> hd <- header(xx)
> names(hd)
[1] "seqNum" "acquisitionNum"
[3] "msLevel" "peaksCount"
[5] "totIonCurrent" "retentionTime"
[7] "basePeakMZ" "basePeakIntensity"
[9] "collisionEnergy" "ionisationEnergy"
[11] "highMZ" "precursorScanNum"
[13] "precursorMZ" "precursorCharge"
[15] "precursorIntensity" "mergedScan"
[17] "mergedResultScanNum" "mergedResultStartScanNum"
[19] "mergedResultEndScanNum"
> class(hd)
[1] "data.frame"
> dim(hd)
[1] 1278 19
but you will be outside of the default xcms pipeline if you take this route (although Steffen Neumann, from xcms, does know mzR very well, obviously).
Finally, you are better off using the Bioconductor mailing list or the xcms online forum if you want to maximise your chances of getting feedback from the xcms developers.
Hope this helps.
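As a further aside (inferred from the names(hd) output above rather than stated in the answer), in the mzR route the retention times are simply a column of that header data frame:
rts <- hd$retentionTime  # one retention time per scan
head(rts)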
Good answer, but I was looking for this:
xraw <- xcmsRaw(cdfFile)
dp_index <- which(duplicated(rawMat(xraw)[,1]))
xraw_rt_dp <- rawMat(xraw)[,1]
xrawData.rt <- xraw_rt_dp[-dp_index]
Now :
xrawData.rt #contains the retention time.
Observation: using the mzR package:
nc <- mzR:::netCDFOpen(cdfFile)
ncData <- mzR:::netCDFRawData(nc)
mzR:::netCDFClose(nc)
ncData$rt #contains the same retention time for the same input !!!
