Extract lmg from calc.relimp in R

I'm trying to extract the lmg information from the results I get from calc.relimp in the relaimpo package.
When I view my results I see:
Response variable: DS[, 2]
Total response variance: 107.5848
Analysis based on 21985 observations
3 Regressors:
DS[, 33] DS[, 18] DS[, 23]
Proportion of variance explained by model: 1.39%
Metrics are not normalized (rela=FALSE).
Relative importance metrics:
lmg
DS[, 33] 0.007041436
DS[, 18] 0.001038892
DS[, 23] 0.005823708
Average coefficients for different model sizes:
1X 2Xs 3Xs
DS[, 33] -1.9229313 -2.3138967 -2.4784731
DS[, 18] -0.9155606 -0.8011497 -0.6107294
DS[, 23] 1.3592192 2.0488534 2.3525688
I would ideally like to extract 33 0.00704, 18 0.00103, 23 0.00582 so I can run more analysis on the lmg values.
Thank you for your help!

relaimpo also caters for users who are used to lists and the $ extractor, i.e. the following would also work:
library(relaimpo)
ll <- calc.relimp(swiss)
ll$lmg ## instead of ll@lmg

You can see the structure of the object calculated with the calc.relimp() function using the function str().
The lmg values are stored in the slot of the same name and can be selected as object@lmg.
Here is an example using data from this package:
library(relaimpo)
data(swiss)
ll<-calc.relimp(swiss)
str(ll)
Formal class 'relimplm' [package "relaimpo"] with 36 slots
..@ var.y : num 156
..@ R2 : num 0.707
..@ R2.decomp : num 0.707
..@ lmg : Named num [1:5] 0.0571 0.1712 0.2601 0.1056 0.1128
.. ..- attr(*, "names")= chr [1:5] "Agriculture" "Examination" "Education" "Catholic" ...
.....
..@ car.diff : num(0)
..@ namen : chr [1:6] "Fertility" "Agriculture" "Examination" "Education" ...
..@ nobs : int 47
..@ ave.coeffs : num [1:5, 1:5] 0.194 -1.011 -0.862 0.139 1.786 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:5] "Agriculture" "Examination" "Education" "Catholic" ...
.. .. ..$ : chr [1:5] "1X" "2Xs" "3Xs" "4Xs" ...
....
ll@lmg
Agriculture Examination Education Catholic Infant.Mortality
0.05709122 0.17117303 0.26013468 0.10557015 0.11276592
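To get from the named lmg vector to the column numbers the question asks for (33, 18, 23), one possible sketch is below. It assumes the names have the form DS[, 33] as in the printed output. The vector is typed in by hand so the example runs without the DS data; in practice it would come from the lmg slot. The names lmg_table and column are made up for illustration.

```r
# Hand-typed stand-in for the lmg slot of the question's calc.relimp() result.
lmg <- c("DS[, 33]" = 0.007041436,
         "DS[, 18]" = 0.001038892,
         "DS[, 23]" = 0.005823708)

# Strip everything that is not a digit from each name to recover the column number.
column <- as.integer(gsub("[^0-9]", "", names(lmg)))

# One row per regressor, ready for further analysis.
lmg_table <- data.frame(column = column, lmg = unname(lmg))
print(lmg_table)
```

This only works while the regressor names contain a single number; with named columns the names of the lmg vector can be used directly.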

Related

Plotting realisations of random Gaussian fields using RandomFields package results in blank graph. Why?

For some reason, when I try to use the plot() function to visualise the output of the RFsimulate() function in the RandomFields package, the output is always an empty plot.
I am just using the example code included in the help file:
## first let us look at the list of implemented models
RFgetModelNames(type="positive definite", domain="single variable",
iso="isotropic")
## our choice is the exponential model;
## the model includes nugget effect and the mean:
model <- RMexp(var=5, scale=10) + # with variance 5 and scale 10
RMnugget(var=1) + # nugget
RMtrend(mean=0.5) # and mean
## define the locations:
from <- 0
to <- 20
x.seq <- seq(from, to, length=200)
y.seq <- seq(from, to, length=200)
simu <- RFsimulate(model=model, x=x.seq, y=y.seq)
str(simu)
Which gives:
Formal class 'RFspatialGridDataFrame' [package ""] with 5 slots
..@ .RFparams :List of 5
.. ..$ n : num 1
.. ..$ vdim : int 1
.. ..$ T : num(0)
.. ..$ coordunits: NULL
.. ..$ varunits : NULL
..@ data :'data.frame': 441 obs. of 1 variable:
.. ..$ variable1: num [1:441] 4.511 2.653 3.951 0.771 2.718 ...
..@ grid :Formal class 'GridTopology' [package "sp"] with 3 slots
.. .. ..@ cellcentre.offset: Named num [1:2] 0 0
.. .. .. ..- attr(*, "names")= chr [1:2] "coords.x1" "coords.x2"
.. .. ..@ cellsize : Named num [1:2] 1 1
.. .. .. ..- attr(*, "names")= chr [1:2] "coords.x1" "coords.x2"
.. .. ..@ cells.dim : int [1:2] 21 21
..@ bbox : num [1:2, 1:2] -0.5 -0.5 20.5 20.5
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:2] "coords.x1" "coords.x2"
.. .. ..$ : chr [1:2] "min" "max"
..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
.. .. ..@ projargs: chr NA
... so data has been simulated, but when I call
plot(simu)
I end up with something like this:
[Image: empty plot]
Can anyone tell me what is going on here?
I would coerce the object back to an sp SpatialGridDataFrame and plot that, as RandomFields creates a wrapper around this S4 class:
sgdf = sp::SpatialGridDataFrame(simu@grid, simu@data, simu@proj4string)
sp::plot(sgdf)
Also, you can coerce to matrix and plot using the standard graphics library:
graphics::image(as.matrix(simu))
The strange thing is that converting it to a SpatialGridDataFrame requires a flip and transpose before plotting:
graphics::image(t(apply(as.matrix(sgdf), 1, rev)))
Apparently, they are a bit internally inconsistent. The simplest solution is to convert simu to raster and plot:
r = raster::raster(simu)
raster::plot(r)
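For reference, the expression t(apply(m, 1, rev)) used above reverses the order of the columns of a matrix. A minimal base R sketch on a made-up 2 x 3 matrix (m, flipped and reversed_cols are illustrative names, not part of RandomFields or sp):

```r
# Small example matrix: 2 rows, 3 columns.
m <- matrix(1:6, nrow = 2)

# The idiom from the answer: reverse each row, then transpose back.
flipped <- t(apply(m, 1, rev))

# The same result, obtained by reversing the column order directly.
reversed_cols <- m[, ncol(m):1]

print(flipped)
print(all(flipped == reversed_cols))
```

Since graphics::image() draws the rows of its matrix along the x axis and the columns along the y axis, reversing the columns corresponds to a vertical flip of the picture.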

Extract statistics from Anderson-Darling test (list)

I would like to extract the p-values from the Anderson-Darling test (ad.test from package kSamples). The test result is a list of 12 containing a 2x3 matrix. The p value is part of the 2x3 matrix and is present in element 7.
When using the following code:
lapply(AD_result, "[[", 7)
I get the following subset of AD test results (first 2 of a total of 50 shown)
[[1]]
AD T.AD asympt. P-value
version 1: 1.72 0.94536 0.13169
version 2: 1.51 0.66740 0.17461
[[2]]
AD T.AD asympt. P-value
version 1: 12.299 14.624 6.9248e-07
version 2: 11.900 14.144 1.1146e-06
My question is how to extract only the p-value (e.g. from version 1) and put these 50 results into a vector.
The output from str(AD_result) is:
List of 55
$ :List of 12
..$ test.name : chr "Anderson-Darling"
..$ k : int 2
..$ ns : int [1:2] 103 2905
..$ N : int 3008
..$ n.ties : int 2873
..$ sig : num 0.762
..$ ad : num [1:2, 1:3] 1.72 1.51 0.945 0.667 0.132 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:2] "version 1:" "version 2:"
.. .. ..$ : chr [1:3] "AD" "T.AD" " asympt. P-value"
..$ warning : logi FALSE
..$ null.dist1: NULL
..$ null.dist2: NULL
..$ method : chr "asymptotic"
..$ Nsim : num 1
..- attr(*, "class")= chr "kSamples"
You could try:
unlist(lapply(AD_result, function(x) x$ad[,3]))
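The call above returns the p-values of both versions for every test. If only version 1 is wanted, indexing row 1 and column 3 of each ad matrix gives one value per test. The sketch below uses a hand-built list that mimics the structure shown by str(). mock_result and the helper make_ad are stand-ins for the real AD_result, filled with the values printed in the question.

```r
# Helper (hypothetical) that builds a 2 x 3 matrix shaped like the $ad element.
make_ad <- function(values) {
  matrix(values, nrow = 2, byrow = TRUE,
         dimnames = list(c("version 1:", "version 2:"),
                         c("AD", "T.AD", " asympt. P-value")))
}

# Stand-in for AD_result: a list of test results, each with an $ad matrix.
mock_result <- list(
  list(ad = make_ad(c(1.72, 0.94536, 0.13169, 1.51, 0.66740, 0.17461))),
  list(ad = make_ad(c(12.299, 14.624, 6.9248e-07, 11.900, 14.144, 1.1146e-06)))
)

# Row 1 is version 1, column 3 is the asymptotic p-value.
p_version1 <- vapply(mock_result, function(x) x$ad[1, 3], numeric(1))
print(p_version1)
```

vapply() is used instead of unlist(lapply()) so that the result is guaranteed to be a numeric vector with exactly one value per test.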

R: Extract values from a summary() in a clValid object

I'm working on validating the goodness of hierarchical clustering using clValid. Below is my code. The clustering always results in one noisy cluster that contains 70% of the elements, and hence I recursively cluster the elements in the noisy cluster.
intern <- clValid(primaryDataSource, 2:10,clMethods = c("Hierarchical"),
validation="internal", maxitems = 2200)
summary(intern)
Output of summary(intern):
Clustering Methods:
hierarchical
Cluster sizes:
2 3 4 5 6 7 8 9 10
Validation Measures:
2 3 4 5 6 7 8 9 10
hierarchical Connectivity 3.8738 3.8738 8.2563 10.9452 16.0286 18.6452 20.6452 22.6452 24.6452
Dunn 4.0949 0.8810 0.6569 0.8694 0.8808 1.0416 1.0230 1.0262 1.3724
Silhouette 0.9592 0.9879 0.9785 0.9751 0.9727 0.9729 0.9727 0.9726 0.9725
Optimal Scores:
Score Method Clusters
Connectivity 3.8738 hierarchical 2
Dunn 4.0949 hierarchical 2
Silhouette 0.9879 hierarchical 3
At each iteration I have to execute the clValid() and select the number of clusters which would give me the highest Silhouette value (in the above example it's 3). I'm trying to automate the recursive clustering approach. Hence I'm looking to pick the number of clusters which would have the highest Silhouette value. Can you please help me in extracting that piece of information? Thank you.
P.S: I tried converting the results into a data frame or a table. However it didn't work.
Update: After using str()
> str(intern)
Formal class 'clValid' [package "clValid"] with 14 slots
..@ clusterObjs:List of 1
.. ..$ hierarchical:List of 7
.. .. ..$ merge : int [1:2173, 1:2] -1673 -714 -1121 -1688 -1876 -1123 -1689 -1228 -429 -535 ...
.. .. ..$ height : num [1:2173] 0 0.001 0.001 0.001 0.001 ...
.. .. ..$ order : int [1:2174] 2165 2166 1950 1951 1954 1955 1577 1565 1564 1576 ...
.. .. ..$ labels : chr [1:2174] "out_M_aacald_c_boundary" "out_M_12ppd_DASH_R_e_boundary" "out_M_12ppd_DASH_S_e_boundary" "in_M_14glucan_e_boundary" ...
.. .. ..$ method : chr "average"
.. .. ..$ call : language hclust(d = Dist, method = method)
.. .. ..$ dist.method: chr "euclidean"
.. .. ..- attr(*, "class")= chr "hclust"
..@ measures : num [1:3, 1:9, 1] 3.874 4.095 0.959 3.874 0.881 ...
.. ..- attr(*, "dimnames")=List of 3
.. .. ..$ : chr [1:3] "Connectivity" "Dunn" "Silhouette"
.. .. ..$ : chr [1:9] "2" "3" "4" "5" ...
.. .. ..$ : chr "hierarchical"
..@ measNames : chr [1:3] "Connectivity" "Dunn" "Silhouette"
..@ clMethods : chr "hierarchical"
..@ labels : chr [1:2174] "out_M_aacald_c_boundary" "out_M_12ppd_DASH_R_e_boundary" "out_M_12ppd_DASH_S_e_boundary" "in_M_14glucan_e_boundary" ...
..@ nClust : num [1:9] 2 3 4 5 6 7 8 9 10
..@ validation : chr "internal"
..@ metric : chr "euclidean"
..@ method : chr "average"
..@ neighbSize : num 10
..@ annotation : NULL
..@ GOcategory : chr "all"
..@ goTermFreq : num 0.05
..@ call : language clValid(obj = primaryDataSource, nClust = 2:10, clMethods = c("Hierarchical"), validation = "internal", maxitems = 2200)
I guess the important section is
@ measures : num [1:3, 1:9, 1] 3.874 4.095 0.959 3.874 0.881 ...
.. ..- attr(*, "dimnames")=List of 3
.. .. ..$ : chr [1:3] "Connectivity" "Dunn" "Silhouette"
.. .. ..$ : chr [1:9] "2" "3" "4" "5" ...
.. .. ..$ : chr "hierarchical"
When I executed intern@measures I got the result below.
2 3 4 5 6 7 8 9
Connectivity 3.8738095 3.8738095 8.2563492 10.9452381 16.0285714 18.6452381 20.6452381 22.645238
Dunn 4.0948837 0.8810494 0.6568857 0.8694067 0.8808228 1.0415614 1.0230197 1.026192
Silhouette 0.9591803 0.9879153 0.9784684 0.9751393 0.9727454 0.9728736 0.9727153 0.972622
10
Connectivity 24.6452381
Dunn 1.3724494
Silhouette 0.9725379
I'm able to get the max and access individual items based on the index. I want to get the maximum value for Silhouette.
intern@measures[1]
max(intern@measures)
Some additional explanation: when str() shows @ signs, it indicates that the object you are inspecting is an S4 object with slots. I am not familiar with clValid, but a quick look at the source code shows that clValid is an S4 class.
You can access the slots using object@slot. A slot can hold any kind of object.
Looking at the print function for clValid, it seems that you can access the measures using the convenience function measures(object). Looking at the rest of the source code for clValid, there are utility functions that may be of use to you. Check optimalScores().
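As a sketch of how the number of clusters with the highest Silhouette value could be picked programmatically, the example below rebuilds the measures array by hand from the values printed in the question, so it runs without clValid or the original data. With the real object, the array would come from intern@measures or measures(intern). The names sil and best_k are illustrative.

```r
# Stand-in for intern@measures: 3 measures x 9 cluster counts x 1 method.
measures <- array(
  c(3.8738095,  4.0948837, 0.9591803,   # 2 clusters
    3.8738095,  0.8810494, 0.9879153,   # 3
    8.2563492,  0.6568857, 0.9784684,   # 4
    10.9452381, 0.8694067, 0.9751393,   # 5
    16.0285714, 0.8808228, 0.9727454,   # 6
    18.6452381, 1.0415614, 0.9728736,   # 7
    20.6452381, 1.0230197, 0.9727153,   # 8
    22.645238,  1.026192,  0.972622,    # 9
    24.6452381, 1.3724494, 0.9725379),  # 10
  dim = c(3, 9, 1),
  dimnames = list(c("Connectivity", "Dunn", "Silhouette"),
                  as.character(2:10),
                  "hierarchical"))

# Select the Silhouette row by name; the result is a vector named by cluster count.
sil <- measures["Silhouette", , "hierarchical"]

# The name of the largest entry is the number of clusters to use.
best_k <- as.integer(names(sil)[which.max(sil)])
print(best_k)
```

Indexing by dimnames rather than by position keeps the code correct if the order of the measures or the range of cluster counts changes between iterations.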

Extract knot values out of gam with spline [duplicate]

This question already has an answer here:
mgcv: How to set number and / or locations of knots for splines
(1 answer)
Closed 5 years ago.
I am running a GAM across many samples and am extracting coefficients/t-values/r-squared from the results in the way shown below. For background, I am using a natural spline, so the regular lm() works fine here and perhaps that is why this method works fine.
tvalsm93exf=ldply(fitsm93exf, function(x) as.data.frame(t(coef(summary(x))[,'t value', drop=FALSE])))
r2m93exf=ldply(fitsm93exf, function(x) as.data.frame(t(summary(x))[,'r.squared', drop=FALSE]))
I would also like to extract the knot locations for each sample set (df=4 and no intercept, so three internal knots and the boundaries). I have tried several variations of the commands above, but haven't been able to index into this. The regular way to do this is below, so I was attempting to put this into the form above. But I am not certain if the summary function contains these values, or if there is another result I should be including instead.
attr(terms(fits),"predvars")
http://www.inside-r.org/r-doc/splines/ns
Note: This question is related to the question below, if that helps, though its solution did not help me solve my problem:
Extract estimates of GAM
The knots are fixed at the time that the ns function is called in the examples on the help page you linked to, so you could have extracted the knots without going into the model object. But ... you have not provided the code for the GAM model creation, so we can only speculate about what you might have done. Just because the word "spline" is used in both the ?ns-help-page and in the documentation does not mean they are the same. The model in the other page you linked to had two "smooth" terms constructed with the s function.
.... + s(time,bs="cr",k=200) + s(tmpd,bs="cr")
The result of that gam call had a list node named "smooth" and the first one looked like this when viewed with str():
str(ap1$smooth)
List of 2
$ :List of 22
..$ term : chr "time"
..$ bs.dim : num 200
..$ fixed : logi FALSE
..$ dim : int 1
..$ p.order : logi NA
..$ by : chr "NA"
..$ label : chr "s(time)"
..$ xt : NULL
..$ id : NULL
..$ sp : Named num -1
.. ..- attr(*, "names")= chr "s(time)"
..$ S :List of 1
.. ..$ : num [1:199, 1:199] 5.6 -5.475 2.609 -0.577 0.275 ...
..$ rank : num 198
..$ null.space.dim: num 1
..$ df : num 199
..$ xp : Named num [1:200] -2556 -2527 -2502 -2476 -2451 ...
.. ..- attr(*, "names")= chr [1:200] "0.0000000%" "0.5025126%" "1.0050251%" "1.5075377%" ...
..$ F : num [1:40000] 0 0 0 0 0 0 0 0 0 0 ...
..$ plot.me : logi TRUE
..$ side.constrain: logi TRUE
..$ S.scale : num 9.56e-05
..$ vn : chr "time"
..$ first.para : num 5
..$ last.para : num 203
..- attr(*, "class")= chr [1:2] "cr.smooth" "mgcv.smooth"
..- attr(*, "qrc")=List of 4
.. ..$ qr : num [1:200, 1] -0.0709 0.0817 0.0709 0.0688 0.0724 ...
.. ..$ rank : int 1
.. ..$ qraux: num 1.03
.. ..$ pivot: int 1
.. ..- attr(*, "class")= chr "qr"
..- attr(*, "nCons")= int 1
So the smooth was evaluated at each of 200 points and a polynomial function fit to the data on that grid. If you forced the knots to be at three interior locations then they will just be at the extremes and at evenly spaced locations between the extremes.
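If the models were in fact fitted with lm() and splines::ns(), as the question suggests, the knots can be read from the attributes of the spline basis stored in the model frame. The sketch below uses made-up data, since the question does not show the model code. It assumes the ns() term is the only predictor, so it is the second column of the model frame. It does not apply to mgcv smooths built with s().

```r
library(splines)

set.seed(1)                          # made-up data, for illustration only
x <- runif(100, 0, 10)
y <- sin(x) + rnorm(100, sd = 0.2)

fit <- lm(y ~ ns(x, df = 4))

# The basis matrix is kept in the model frame together with its attributes.
basis <- fit$model[[2]]
interior_knots <- attr(basis, "knots")           # three interior knots for df = 4
boundary_knots <- attr(basis, "Boundary.knots")  # the range of x

print(interior_knots)
print(boundary_knots)
```

When ns() is called with df rather than knots, it places the interior knots at quantiles of x. For a list of such fits, the same extraction could be wrapped in lapply(), for example lapply(fits, function(f) attr(f$model[[2]], "knots")), where fits is a hypothetical list of lm objects.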

Accessing control chart results in R?

I have a short R script that loads a bunch of data and plots it in an XBar chart. Using the following code, I can plot the data and view the various statistical information.
library(qcc)
tir <- read.table("data.dat", header=TRUE, sep="\t")
names(tir)
attach(tir)
rand <- sample(tir)
xbarchart <- qcc(rand[1:100,],type="R")
summary(xbarchart)
I want to be able to do some process capability analysis (described here(PDF) on page 5) immediately after the XBar chart is created. In order to create the analysis chart, I need to store the LCL and UCL results from the XBar chart results created before as variables. Is there any way I can do this?
I shall answer your question using the example in the ?qcc help file.
x <- c(33.75, 33.05, 34, 33.81, 33.46, 34.02, 33.68, 33.27, 33.49, 33.20,
33.62, 33.00, 33.54, 33.12, 33.84)
xbarchart <- qcc(x, type="xbar.one", std.dev = "SD")
A useful function to inspect the structure of variables and function results is str(), short for structure.
str(xbarchart)
List of 11
$ call : language qcc(data = x, type = "xbar.one", std.dev = "SD")
$ type : chr "xbar.one"
$ data.name : chr "x"
$ data : num [1:15, 1] 33.8 33 34 33.8 33.5 ...
..- attr(*, "dimnames")=List of 2
.. ..$ Group : chr [1:15] "1" "2" "3" "4" ...
.. ..$ Samples: NULL
$ statistics: Named num [1:15] 33.8 33 34 33.8 33.5 ...
..- attr(*, "names")= chr [1:15] "1" "2" "3" "4" ...
$ sizes : int [1:15] 1 1 1 1 1 1 1 1 1 1 ...
$ center : num 33.5
$ std.dev : num 0.342
$ nsigmas : num 3
$ limits : num [1, 1:2] 32.5 34.5
..- attr(*, "dimnames")=List of 2
.. ..$ : chr ""
.. ..$ : chr [1:2] "LCL" "UCL"
$ violations:List of 2
..$ beyond.limits : int(0)
..$ violating.runs: num(0)
- attr(*, "class")= chr "qcc"
You will notice the second to last element in this list is called $limits and contains the two values for LCL and UCL.
It is simple to extract this element:
limits <- xbarchart$limits
limits
LCL UCL
32.49855 34.54811
Thus LCL <- limits[1] and UCL <- limits[2]
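Because the limits element is a one-row matrix with column names, the two values can also be selected by name, which does not depend on their position. A minimal sketch, using a hand-built matrix with the values printed above in place of xbarchart$limits so that it runs without qcc:

```r
# Stand-in for xbarchart$limits, shaped as shown by str().
limits <- matrix(c(32.49855, 34.54811), nrow = 1,
                 dimnames = list("", c("LCL", "UCL")))

# Select each control limit by its column name.
LCL <- limits[, "LCL"]
UCL <- limits[, "UCL"]

print(c(LCL = LCL, UCL = UCL))
```

The variables LCL and UCL can then be passed on to whatever capability analysis follows.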
