Scoping of variables in aes(...) inside a function in ggplot

Consider this use of ggplot(...) inside a function.
x <- seq(1,10,by=0.1)
df <- data.frame(x,y1=x, y2=cos(2*x)/(1+x))
library(ggplot2)
gg.fun <- function(){
  i = 2
  plot(ggplot(df, aes(x = x, y = df[, i])) + geom_line())
}
if(exists("i")) remove(i)
gg.fun()
# Error in `[.data.frame`(df, , i) : object 'i' not found
i=3
gg.fun() # plots df[,3] vs. x
It looks like ggplot does not recognize the variable i defined inside the function, but does recognize i if it is defined in the global environment. Why is that?
Note that this gives the expected result.
gg.new <- function(){
  i = 2
  plot(ggplot(data.frame(x = df$x, y = df[, i]), aes(x, y)) + geom_line())
}
if(exists("i")) remove(i)
gg.new() # plots df[,2] vs. x
i=3
gg.new() # also plots df[,2] vs. x

Let's look at the non-rendered ggplot object to see what's going on:
gg.str <- function() {
  i = 2
  str(ggplot(df, aes(x = x, y = df[, i])) + geom_line())
}
gg.str()
List of 9
$ data :'data.frame': 91 obs. of 3 variables:
..$ x : num [1:91] 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 ...
..$ y1: num [1:91] 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 ...
..$ y2: num [1:91] -0.208 -0.28 -0.335 -0.373 -0.393 ...
$ layers :List of 1
..$ :Classes 'proto', 'environment' <environment: 0x0000000009886ca0>
$ scales :Reference class 'Scales' [package "ggplot2"] with 1 fields
..$ scales: list()
..and 21 methods, of which 9 are possibly relevant:
.. add, clone, find, get_scales, has_scale, initialize, input, n, non_position_scales
$ mapping :List of 2
..$ x: symbol x
..$ y: language df[, i]
$ theme : list()
$ coordinates:List of 1
..$ limits:List of 2
.. ..$ x: NULL
.. ..$ y: NULL
..- attr(*, "class")= chr [1:2] "cartesian" "coord"
$ facet :List of 1
..$ shrink: logi TRUE
..- attr(*, "class")= chr [1:2] "null" "facet"
$ plot_env :<environment: R_GlobalEnv>
$ labels :List of 2
..$ x: chr "x"
..$ y: chr "df[, i]"
- attr(*, "class")= chr [1:2] "gg" "ggplot"
As we can see, the mapping for y is simply an unevaluated expression. When we ask for the actual plot, that expression is evaluated within plot_env, which is the global environment. I don't know why it is done this way; presumably there are reasons for it.
Here's a demo that overrides this behaviour:
gg.envir <- function(envir = environment()) {
  i = 2
  p <- ggplot(df, aes(x = x, y = df[, i])) + geom_line()
  p$plot_env <- envir
  plot(p)
}
# evaluation in the local environment; ok
gg.envir()
# evaluation in the global environment (ggplot's own default); fails if there is no global i
gg.envir(environment())
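Note that since ggplot2 3.0.0, aes() captures its own calling environment via quosures, so on current versions the original gg.fun may work as-is. On older versions you could also set the environment up front rather than patching plot_env after the fact; here is a sketch using the (since-deprecated) environment argument of ggplot():
gg.env.arg <- function() {
  i = 2
  # 'environment' told older ggplot2 where to evaluate the mapping;
  # it has since been deprecated in favour of tidy evaluation
  plot(ggplot(df, aes(x = x, y = df[, i]), environment = environment()) +
         geom_line())
}
gg.env.arg() # plots df[,2] vs. x even with no i in the global environment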

Related

Display the lambda used in caret::BoxCoxTrans function

I'm using BoxCoxTrans function from the caret package:
library(caret)
library(purrr)
model1 <- apply(X = my.df, 2, BoxCoxTrans)
model2 <- purrr::map2(model1, my.df, function(x,y) predict(x,y))
trans.df <- as.data.frame(do.call(cbind, model2))
library(rcompanion)
plotNormalHistogram(trans.df)
print(trans.df)
It is working correctly and transforming the data, but I have no way of knowing which lambda value is used for the transformation.
You can find these values in model1. I'll show you how to get them using the iris data.
library(caret)
fudge <- 0.2
out <- lapply(iris[1:2], BoxCoxTrans, fudge = fudge) # instead of apply(..., margin = 2, ...)
Now look at the structure of out
str(out, 2)
#List of 2
# $ Sepal.Length:List of 6
# ..$ lambda : num -0.1
# ..$ fudge : num 0.2
# ..$ n : int 150
# ..$ summary :Classes 'summaryDefault', 'table' Named num [1:6] 4.3 5.1 5.8 5.84 6.4 ...
# .. .. ..- attr(*, "names")= chr [1:6] "Min." "1st Qu." "Median" "Mean" ...
# ..$ ratio : num 1.84
# ..$ skewness: num 0.309
# ..- attr(*, "class")= chr "BoxCoxTrans"
# $ Sepal.Width :List of 6
# ..$ lambda : num 0.3
# ..$ fudge : num 0.2
# ..$ n : int 150
# ..$ summary :Classes 'summaryDefault', 'table' Named num [1:6] 2 2.8 3 3.06 3.3 ...
# .. .. ..- attr(*, "names")= chr [1:6] "Min." "1st Qu." "Median" "Mean" ...
# ..$ ratio : num 2.2
# ..$ skewness: num 0.313
# ..- attr(*, "class")= chr "BoxCoxTrans"
Using base R, you can now extract the lambdas with sapply and `[[` as follows:
sapply(out, `[[`, "lambda")
#Sepal.Length Sepal.Width
# -0.1 0.3
Since you are already using purrr, you might prefer map_dbl and pluck:
map_dbl(out, pluck, "lambda")
#Sepal.Length Sepal.Width
# -0.1 0.3
Thanks to @missuse's helpful comments, we can recover the lambda actually used for the transformation as follows:
library(dplyr)
lambda <- sapply(out, `[[`, "lambda") # the raw lambdas extracted above
real_lambda <- case_when(between(lambda, -fudge, fudge) ~ 0,
                         between(lambda, 1 - fudge, 1 + fudge) ~ 1,
                         TRUE ~ lambda)
real_lambda <- setNames(real_lambda, names(lambda))
real_lambda
#Sepal.Length Sepal.Width
# 0.0 0.3
This is necessary because the function BoxCoxTrans has the argument fudge which is
a tolerance value: lambda values within +/-fudge will be coerced to 0 and within 1+/-fudge will be coerced to 1.
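As a quick sanity check (a sketch using the objects above): with a real_lambda of 0 for Sepal.Length, the Box-Cox transform reduces to log(x), so caret's predict method should agree with a manual log transform:
# transform as applied by caret (lambda coerced to 0 inside the fudge band)
head(predict(out$Sepal.Length, iris$Sepal.Length))
# manual lambda = 0 transform
head(log(iris$Sepal.Length))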

R kohonen - Is the input data scaled and centred automatically?

I have been following an online example for R Kohonen self-organising maps (SOM) which suggested that the data should be centred and scaled before computing the SOM.
However, I've noticed the object created seems to have attributes for centre and scale, in which case am I applying a redundant step by centring and scaling first? An example script is below:
# Load package
require(kohonen)
# Set data
data(iris)
# Scale and centre
dt <- scale(iris[, 1:4],center=TRUE)
# Prepare SOM
set.seed(590507)
som1 <- som(dt,
            somgrid(6, 6, "hexagonal"),
            rlen = 500,
            keep.data = TRUE)
str(som1)
The output from the last line of the script is:
List of 13
$ data :List of 1
..$ : num [1:150, 1:4] -0.898 -1.139 -1.381 -1.501 -1.018 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length"
"Petal.Width"
.. ..- attr(*, "scaled:center")= Named num [1:4] 5.84 3.06 3.76 1.2
.. .. ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width"
"Petal.Length" "Petal.Width"
.. ..- attr(*, "scaled:scale")= Named num [1:4] 0.828 0.436 1.765 0.762
.. .. ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width"
"Petal.Length" "Petal.Width"
$ unit.classif : num [1:150] 3 5 5 5 4 2 4 4 6 5 ...
$ distances : num [1:150] 0.0426 0.0663 0.0768 0.0744 0.1346 ...
$ grid :List of 6
..$ pts : num [1:36, 1:2] 1.5 2.5 3.5 4.5 5.5 6.5 1 2 3 4 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:2] "x" "y"
..$ xdim : num 6
..$ ydim : num 6
..$ topo : chr "hexagonal"
..$ neighbourhood.fct: Factor w/ 2 levels "bubble","gaussian": 1
..$ toroidal : logi FALSE
..- attr(*, "class")= chr "somgrid"
$ codes :List of 1
..$ : num [1:36, 1:4] -0.376 -0.683 -0.734 -1.158 -1.231 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:36] "V1" "V2" "V3" "V4" ...
.. .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length"
"Petal.Width"
$ changes : num [1:500, 1] 0.0445 0.0413 0.0347 0.0373 0.0337 ...
$ alpha : num [1:2] 0.05 0.01
$ radius : Named num [1:2] 3.61 0
..- attr(*, "names")= chr [1:2] "66.66667%" ""
$ user.weights : num 1
$ distance.weights: num 1
$ whatmap : int 1
$ maxNA.fraction : int 0
$ dist.fcts : chr "sumofsquares"
- attr(*, "class")= chr "kohonen"
Notice that in lines 7 and 10 of the output there are references to centre and scale. I would appreciate an explanation of what is going on here.
Your scaling step is not redundant: the som() source code does no scaling of its own, and the attributes you see in lines 7 and 10 are simply the scaled:center and scaled:scale attributes carried over from the training dataset.
To check this, run the following chunk and compare the results:
# Load package
require(kohonen)
# Set data
data(iris)
# Scale and centre
dt <- scale(iris[, 1:4],center=TRUE)
#compare train datasets
str(dt)
str(as.matrix(iris[, 1:4]))
# Prepare SOM
set.seed(590507)
som1 <- kohonen::som(dt,
                     kohonen::somgrid(6, 6, "hexagonal"),
                     rlen = 500,
                     keep.data = TRUE)
# without scaling
som2 <- kohonen::som(as.matrix(iris[, 1:4]),
                     kohonen::somgrid(6, 6, "hexagonal"),
                     rlen = 500,
                     keep.data = TRUE)
#compare results of som function
str(som1)
str(som2)
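As one more quick check (a sketch): since som() does no rescaling of its own, the codebook vectors stay on whatever scale the training data had:
# som1 was trained on scaled data, som2 on raw data
range(som1$codes[[1]]) # roughly the range of the scaled iris measurements
range(som2$codes[[1]]) # roughly the range of the raw iris measurements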

R - low-level file IO

I am trying to read a file in "flexible data format" using R.
I know the number of bytes I should read, counting from EOF (e.g., I should read bytes EOF-32 through EOF as my data).
I am looking for the R equivalents of MATLAB's fseek and fread.
I think you would do better with a different approach (if I've got the right "flexible data format" file format here). You can deal with much of these (horrible) files with basic string functions in R:
library(stringr)
# read in fdf file
l <- readLines("http://rud.is/dl/Fe.fdf")
# some basic cleanup
l <- sub("#.*$", "", l)  # remove comments
l <- sub("^=.*$", "", l) # remove comments
l <- gsub(" +", " ", l)  # compress spaces
l <- str_trim(l)         # beg/end space trim
l <- grep("^$", l, value=TRUE, invert=TRUE) # ignore blank lines
# start of data blocks
blocks <- which(grepl("^%block", l))
# all "easy"/simple lines
simple <- str_split_fixed(grep("^[[:digit:]%]", l, value=TRUE, invert=TRUE),
                          "[[:space:]]+", 2)
# "simple" name/val [unit] conversions
convert_vals <- function(simple) {
  vals <- simple[,2]
  names(vals) <- simple[,1]
  lapply(vals, function(v) {
    # if logical
    if (tolower(v) %in% c("t", "true", ".true.", "f", "false", ".false.")) {
      return(as.logical(gsub("\\.", "", v)))
    }
    # if it's just a number
    # i may be missing a numeric fmt char in this horrible format
    if (grepl("^[[:digit:]\\.\\+\\-]+$", v)) {
      return(as.numeric(v))
    }
    # if value and unit, convert to an actual number with a unit attribute
    # or convert it here from the table starting on line 927 of fdf.f
    if (grepl("^[[:digit:]]", v) & (!any(is.na(str_locate(v, " "))))) {
      vu <- str_split_fixed(v, " ", 2)
      x <- as.numeric(vu[,1])
      attr(x, "unit") <- vu[,2]
      return(x)
    }
    # handle "1.d-3" and other vals with other if's
    # anything not handled is returned
    return(v)
  })
}
# handle begin/end block "complex" data conversion
convert_blocks <- function(lines) {
  block_names <- sub("^%block ", "", grep("^%block", lines, value=TRUE))
  lapply(blocks, function(blk_start) {
    blk <- lines[blk_start]
    blk_info <- str_split_fixed(blk, " ", 2)
    blk_end <- which(grepl(sprintf("^%%endblock %s", blk_info[,2]), lines))
    # this is overly simplistic since you have to do some conversions, but you
    # know the line range of the data values now so you can process them
    # however you need to
    read.table(text=lines[(blk_start+1):(blk_end-1)],
               header=FALSE, stringsAsFactors=FALSE, fill=TRUE)
  }) -> blks
  names(blks) <- block_names
  return(blks)
}
fdf <- c(convert_vals(simple),
         convert_blocks(l))
str(fdf)
Output of the str:
List of 32
$ SystemName : chr "bcc Fe ferro GGA"
$ SystemLabel : chr "Fe"
$ WriteCoorStep : chr ""
$ WriteMullikenPop : num 1
$ NumberOfSpecies : num 1
$ NumberOfAtoms : num 1
$ PAO.EnergyShift : atomic [1:1] 50
..- attr(*, "unit")= chr "meV"
$ PAO.BasisSize : chr "DZP"
$ Fe : num 2
$ LatticeConstant : atomic [1:1] 2.87
..- attr(*, "unit")= chr "Ang"
$ KgridCutoff : atomic [1:1] 15
..- attr(*, "unit")= chr "Ang"
$ xc.functional : chr "GGA"
$ xc.authors : chr "PBE"
$ SpinPolarized : logi TRUE
$ MeshCutoff : atomic [1:1] 150
..- attr(*, "unit")= chr "Ry"
$ MaxSCFIterations : num 40
$ DM.MixingWeight : num 0.1
$ DM.Tolerance : chr "1.d-3"
$ DM.UseSaveDM : logi TRUE
$ DM.NumberPulay : num 3
$ SolutionMethod : chr "diagon"
$ ElectronicTemperature : atomic [1:1] 25
..- attr(*, "unit")= chr "meV"
$ MD.TypeOfRun : chr "cg"
$ MD.NumCGsteps : num 0
$ MD.MaxCGDispl : atomic [1:1] 0.1
..- attr(*, "unit")= chr "Ang"
$ MD.MaxForceTol : atomic [1:1] 0.04
..- attr(*, "unit")= chr "eV/Ang"
$ AtomicCoordinatesFormat : chr "Fractional"
$ ChemicalSpeciesLabel :'data.frame': 1 obs. of 3 variables:
..$ V1: int 1
..$ V2: int 26
..$ V3: chr "Fe"
$ PAO.Basis :'data.frame': 5 obs. of 3 variables:
..$ V1: chr [1:5] "Fe" "0" "6." "2" ...
..$ V2: num [1:5] 2 2 0 2 0
..$ V3: chr [1:5] "" "P" "" "" ...
$ LatticeVectors :'data.frame': 3 obs. of 3 variables:
..$ V1: num [1:3] 0.5 0.5 0.5
..$ V2: num [1:3] 0.5 -0.5 0.5
..$ V3: num [1:3] 0.5 0.5 -0.5
$ BandLines :'data.frame': 5 obs. of 5 variables:
..$ V1: int [1:5] 1 40 28 28 34
..$ V2: num [1:5] 0 2 1 0 1
..$ V3: num [1:5] 0 0 1 0 1
..$ V4: num [1:5] 0 0 0 0 1
..$ V5: chr [1:5] "\\Gamma" "H" "N" "\\Gamma" ...
$ AtomicCoordinatesAndAtomicSpecies:'data.frame': 1 obs. of 4 variables:
..$ V1: num 0
..$ V2: num 0
..$ V3: num 0
..$ V4: int 1
You can see the output (and the file and this code) in this gist since it's easier to copy/paste/clone a gist.
You still need to:
deal with unit conversion (but with this grid::unit-like structure that should be far more straightforward)
swap out the naive read.table with a better "block reader"
deal with file includes (pretty simple, though, if you add a function or two)
With a bit of tweaking/polish this could be a new R package, not that I'd ever want a data file in this format ever.
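For the literal fseek/fread part of the question, R's connection functions are the direct equivalents; a minimal sketch, assuming a hypothetical binary file data.bin with a 32-byte payload at the end:
con <- file("data.bin", open = "rb")          # binary-mode connection
seek(con, where = -32, origin = "end")        # fseek equivalent: 32 bytes before EOF
payload <- readBin(con, what = "raw", n = 32) # fread equivalent
close(con)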

Extract knot values out of gam with spline [duplicate]

This question already has an answer here:
mgcv: How to set number and / or locations of knots for splines (1 answer)
I am running a GAM across many samples and am extracting coefficients/t-values/r-squared from the results in the way shown below. For background, I am using a natural spline, so the regular lm() works here, which is perhaps why this method succeeds.
tvalsm93exf=ldply(fitsm93exf, function(x) as.data.frame(t(coef(summary(x))[,'t value', drop=FALSE])))
r2m93exf=ldply(fitsm93exf, function(x) as.data.frame(t(summary(x))[,'r.squared', drop=FALSE]))
I would also like to extract the knot locations for each sample set (df=4 and no intercept, so three internal knots plus the boundaries). I have tried several variations of the commands above but haven't been able to index into this. The usual way to do it is below, so I was attempting to put this into the form above. But I am not certain whether the summary object contains these values, or whether there is another result I should be using instead.
attr(terms(fits),"predvars")
http://www.inside-r.org/r-doc/splines/ns
Note: This question is related to the question below, if that helps, though its solution did not help me solve my problem:
Extract estimates of GAM
The knots are fixed at the time the ns function is called in the examples on the help page you linked to, so you could have extracted the knots without going into the model object. But ... you have not provided the code that created your GAM model, so we can only speculate about what you might have done. Just because the word "spline" is used in both the ?ns help page and in the mgcv documentation does not mean they are the same thing. The model on the other page you linked to had two "smooth" terms constructed with the s function.
.... + s(time,bs="cr",k=200) + s(tmpd,bs="cr")
The result of that gam call had a list node named "smooth" and the first one looked like this when viewed with str():
str(ap1$smooth)
List of 2
$ :List of 22
..$ term : chr "time"
..$ bs.dim : num 200
..$ fixed : logi FALSE
..$ dim : int 1
..$ p.order : logi NA
..$ by : chr "NA"
..$ label : chr "s(time)"
..$ xt : NULL
..$ id : NULL
..$ sp : Named num -1
.. ..- attr(*, "names")= chr "s(time)"
..$ S :List of 1
.. ..$ : num [1:199, 1:199] 5.6 -5.475 2.609 -0.577 0.275 ...
..$ rank : num 198
..$ null.space.dim: num 1
..$ df : num 199
..$ xp : Named num [1:200] -2556 -2527 -2502 -2476 -2451 ...
.. ..- attr(*, "names")= chr [1:200] "0.0000000%" "0.5025126%" "1.0050251%" "1.5075377%" ...
..$ F : num [1:40000] 0 0 0 0 0 0 0 0 0 0 ...
..$ plot.me : logi TRUE
..$ side.constrain: logi TRUE
..$ S.scale : num 9.56e-05
..$ vn : chr "time"
..$ first.para : num 5
..$ last.para : num 203
..- attr(*, "class")= chr [1:2] "cr.smooth" "mgcv.smooth"
..- attr(*, "qrc")=List of 4
.. ..$ qr : num [1:200, 1] -0.0709 0.0817 0.0709 0.0688 0.0724 ...
.. ..$ rank : int 1
.. ..$ qraux: num 1.03
.. ..$ pivot: int 1
.. ..- attr(*, "class")= chr "qr"
..- attr(*, "nCons")= int 1
So the smooth was evaluated at each of 200 points and a polynomial function was fit to the data on that grid. If you forced there to be three interior knots, they would simply be placed at the extremes and at evenly spaced locations between the extremes.
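To make the extraction concrete, here is a sketch of both routes; fit is a hypothetical lm with an ns() term (some.data is a placeholder), and ap1 is the gam fit discussed above:
# ns() knots are fixed at call time and baked into the terms object
fit <- lm(y ~ splines::ns(x, df = 4), data = some.data) # hypothetical model
attr(terms(fit), "predvars")
# for an mgcv "cr" smooth, the knot locations live in $xp
ap1$smooth[[1]]$xp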

Accessing control chart results in R?

I have a short R script that loads a bunch of data and plots it in an XBar chart. Using the following code, I can plot the data and view the various statistical information.
library(qcc)
tir <- read.table("data.dat", header=TRUE, sep="\t")
names(tir)
attach(tir)
rand <- sample(tir)
xbarchart <- qcc(rand[1:100,],type="R")
summary(xbarchart)
I want to be able to do some process capability analysis (described here (PDF) on page 5) immediately after the XBar chart is created. In order to create the analysis chart, I need to store the LCL and UCL from the XBar chart above as variables. Is there any way to do this?
I shall answer your question using the example in the ?qcc help file.
x <- c(33.75, 33.05, 34, 33.81, 33.46, 34.02, 33.68, 33.27, 33.49, 33.20,
       33.62, 33.00, 33.54, 33.12, 33.84)
xbarchart <- qcc(x, type="xbar.one", std.dev = "SD")
A useful function to inspect the structure of variables and function results is str(), short for structure.
str(xbarchart)
List of 11
$ call : language qcc(data = x, type = "xbar.one", std.dev = "SD")
$ type : chr "xbar.one"
$ data.name : chr "x"
$ data : num [1:15, 1] 33.8 33 34 33.8 33.5 ...
..- attr(*, "dimnames")=List of 2
.. ..$ Group : chr [1:15] "1" "2" "3" "4" ...
.. ..$ Samples: NULL
$ statistics: Named num [1:15] 33.8 33 34 33.8 33.5 ...
..- attr(*, "names")= chr [1:15] "1" "2" "3" "4" ...
$ sizes : int [1:15] 1 1 1 1 1 1 1 1 1 1 ...
$ center : num 33.5
$ std.dev : num 0.342
$ nsigmas : num 3
$ limits : num [1, 1:2] 32.5 34.5
..- attr(*, "dimnames")=List of 2
.. ..$ : chr ""
.. ..$ : chr [1:2] "LCL" "UCL"
$ violations:List of 2
..$ beyond.limits : int(0)
..$ violating.runs: num(0)
- attr(*, "class")= chr "qcc"
You will notice the second to last element in this list is called $limits and contains the two values for LCL and UCL.
It is simple to extract this element:
limits <- xbarchart$limits
limits
LCL UCL
32.49855 34.54811
Thus LCL <- limits[1] and UCL <- limits[2].
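From there you can pass the chart object straight into qcc's capability analysis; a sketch, where the spec limits (32 and 35) are made-up placeholders for your real LSL/USL:
# process capability analysis against hypothetical spec limits
process.capability(xbarchart, spec.limits = c(32, 35))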
