User-defined kernels in kernlab library

User-defined kernels in kernlab library - r

Lately I've worked with kernlab library in R-statistical software. In specific, I use the ksvm function to identify patterns. I'm trying to implement a new kernel in SVM, my code is:
<gnndot<-function (sigma = 1, degree=2)
{
rval <- function(x, y = NULL) {
return(exp(-sigma * ( sqrt(-(round(2 * crossprod(x, y) - crossprod(x) - crossprod(y), 9)))^degree ) ))
}
}
library(mvtnorm)
library(kernlab)
x <- rmvnorm(n=500, mean=c(1,1.5,2,2.5,3), sigma=diag(5))
class<-rep(c("ONE","TWO","THREE","FOUR","FIVE"),each=100)
ksvm(x,class,type="C-svc",kernel ="gnndot",kpar=list(sigma=0.05,degree=1.5),C=600,cross=5)>
The result that I got is: Error in match.arg(kernel, c("rbfdot", "polydot", "tanhdot", "vanilladot", : 'arg' should be one of “rbfdot”, “polydot”, “tanhdot”, “vanilladot”, “laplacedot”, “besseldot”, “anovadot”, “splinedot”, “matrix”
This is, I must define this "user-defined" kernel as one of the already designed kernels. Is it possible to use the kernel here defined (gnndot) as one usable by the function ksvm?

Related

could not find function "symbolicInt"

I simply cannot get these functions: symbolicInt and integral. Not sure why. I just re-installed everthing. I thought maybe it was becuase of the libraries but I tried each library, by itself; did not work.
For example, according to documentation of
Document of symbolicInt
Document of integral
Current R Versions:
R 4.2
R42-tools are downloaded.
Code:
library('mosaic')
library('mosaicCore')
library('mosaicCalc')
f <- function(x){a * x^2 ~ x}
F = antiD(a * x^2 ~ x, a = 1)
F
symbolicInt(f )
f = integral( 3*x**2 ~ x)
f(from=1,to=4)
Result:
Error in symbolicInt(f) : could not find function "symbolicInt"
Error in integral(f ~ x) : could not find function "integral"
Edit: I am able to use symbolicInt by mosaicCalc:::symbolicInt ; but I cannot process this function:
(x/sigma**2)*exp((-x**2)/2*sigma**2 )
It gives following error:
Error in .intArith(form, ...) :
Error: symbolic algorithm gave up
6. stop("Error: symbolic algorithm gave up")
5. .intArith(form, ...)
4. symbolicAntiD(lform, ...)
3. .intArith(form, ...)
2. symbolicAntiD(form, ...)
1. sI((x/sigma^2) * exp((-x^2)/2 * sigma^2) + 1 ~ x)

Target multiple variables for a function within a custom function

I would like to build a function which runs a chi square test if calc=TRUE. The problem appears to be targeting the variables used within the wtd.chi.sq function.
Thank you very much in advance.
library(weights); library(sjstats)
testfunction <- function(dta, x, y, calc=TRUE, annotate=note){
if(isTRUE(calc)){
testresult<-wtd.chi.sq(dta[[x]], dta[[y]])
return(testresult["p.value"])
}
else {
annotate <- paste0(note, "... and no calc necessary")
}
return(annotate)
}
testfunction(dta=df, x=f1, y=s12x, calc=TRUE, annotate=note)
Error in tbl_subset2(x, j = i, j_arg = substitute(i)) :
object 'f1' not found

How to use `foreach` and `%dopar%` with an `R6` class in R?

I ran into an issue trying to use %dopar% and foreach() together with an R6 class. Searching around, I could only find two resources related to this, an unanswered SO question and an open GitHub issue on the R6 repository.
In one comment (i.e., GitHub issue) an workaround is suggested by reassigning the parent_env of the class as SomeClass$parent_env <- environment(). I would like to understand what exactly does environment() refer to when this expression (i.e., SomeClass$parent_env <- environment()) is called within the %dopar% of foreach?
Here is a minimal reproducible example:
Work <- R6::R6Class("Work",
public = list(
values = NULL,
initialize = function() {
self$values <- "some values"
}
)
)
Now, the following Task class uses the Work class in the constructor.
Task <- R6::R6Class("Task",
private = list(
..work = NULL
),
public = list(
initialize = function(time) {
private$..work <- Work$new()
Sys.sleep(time)
}
),
active = list(
work = function() {
return(private$..work)
}
)
)
In the Factory class, the Task class is created and the foreach is implemented in ..m.thread().
Factory<- R6::R6Class("Factory",
private = list(
..warehouse = list(),
..amount = NULL,
..parallel = NULL,
..m.thread = function(object, ...) {
cluster <- parallel::makeCluster(parallel::detectCores() - 1)
doParallel::registerDoParallel(cluster)
private$..warehouse <- foreach::foreach(1:private$..amount, .export = c("Work")) %dopar% {
# What exactly does `environment()` encapsulate in this context?
object$parent_env <- environment()
object$new(...)
}
parallel::stopCluster(cluster)
},
..s.thread = function(object, ...) {
for (i in 1:private$..amount) {
private$..warehouse[[i]] <- object$new(...)
}
},
..run = function(object, ...) {
if(private$..parallel) {
private$..m.thread(object, ...)
} else {
private$..s.thread(object, ...)
}
}
),
public = list(
initialize = function(object, ..., amount = 10, parallel = FALSE) {
private$..amount = amount
private$..parallel = parallel
private$..run(object, ...)
}
),
active = list(
warehouse = function() {
return(private$..warehouse)
}
)
)
Then, it is called as:
library(foreach)
x = Factory$new(Task, time = 2, amount = 10, parallel = TRUE)
Without the following line object$parent_env <- environment(), it throws an error (i.e., as mentioned in the other two links): Error in { : task 1 failed - "object 'Work' not found".
I would like to know, (1) what are some potential pitfalls when assigning the parent_env inside foreach and (2) why does it work in the first place?
Update 1:
I returned environment() from within foreach(), such that private$..warehouse captures those environments
using rlang::env_print() in a debug session (i.e., the browser() statement was placed right after foreach has ended execution) here is what they consist of:
Browse[1]> env_print(private$..warehouse[[1]])
# <environment: 000000001A8332F0>
# parent: <environment: global>
# bindings:
# * Work: <S3: R6ClassGenerator>
# * ...: <...>
Browse[1]> env_print(environment())
# <environment: 000000001AC0F890>
# parent: <environment: 000000001AC20AF0>
# bindings:
# * private: <env>
# * cluster: <S3: SOCKcluster>
# * ...: <...>
Browse[1]> env_print(parent.env(environment()))
# <environment: 000000001AC20AF0>
# parent: <environment: global>
# bindings:
# * private: <env>
# * self: <S3: Factory>
Browse[1]> env_print(parent.env(parent.env(environment())))
# <environment: global>
# parent: <environment: package:rlang>
# bindings:
# * Work: <S3: R6ClassGenerator>
# * .Random.seed: <int>
# * Factory: <S3: R6ClassGenerator>
# * Task: <S3: R6ClassGenerator>

Disclaimer: a lot of what I say here are educated guesses and inferences based on what I know,
I can't guarantee everything is 100% correct.
I think there can be many pitfalls,
and which one applies really depends on what you do.
I think your second question is more important,
because if you understand that,
you'll be able to evaluate some of the pitfalls by yourself.
The topic is rather complex,
but you can probably start by reading about R's lexical scoping.
In essence, R has a sort of hierarchy of environments,
and when R code is executed,
variables whose values are not found in the current environment
(which is what environment() returns)
are sought in the parent environments
(not to be confused with the caller environments).
Based on the GitHub issue you linked,
R6 generators save a "reference" to their parent environments,
and they expect that everything their classes may need can be found in said parent or somewhere along the environment hierarchy,
starting at that parent and going "up".
The reason the workaround you're using works is because you're replacing the generator's parent environment with the one in the current foreach call inside the parallel worker
(which may be a different R process, not necessarily a different thread),
and, given your .export specification probably exports necessary values,
R's lexical scoping can then search for missing values starting from the foreach call in the separate thread/process.
For the specific example you linked,
I found that a simpler way to make it work
(at least on my Linux machine)
is to do the following:
library(doParallel)
cluster <- parallel::makeCluster(parallel::detectCores() - 1)
doParallel::registerDoParallel(cluster)
parallel::clusterExport(cluster, setdiff(ls(), "cluster"))
x = Factory$new(Task, time = 1, amount = 3)
but leaving the ..m.thread function as:
..m.thread = function(object, amount, ...) {
private$..warehouse <- foreach::foreach(1:amount) %dopar% {
object$new(...)
}
}
(and manually call stopCluster when done).
The clusterExport call should have semantics similar to*:
take everything from the main R process' global environment except cluster,
and make it available in each parallel worker's global environment.
That way, any code inside the foreach call can use the generators when lexical scoping reaches their respective global environments.
foreach can be clever and exports some variables automatically
(as shown in the GitHub issue),
but it has limitations,
and the hierarchy used during lexical scoping can get very messy.
*I say "similar to" because I don't know what exactly R does to distinguish (global) environments if forks are used,
but since that export is needed,
I assume they are indeed independent of each other.
PS: I'd use a call to on.exit(parallel::stopCluster(cluster)) if you create workers inside a function call,
that way you avoid leaving processes around until they are somehow stopped if an error occurs.

Can R recognize the type of distribution used as a function argument?

Background
I have a simple function called TBT. This function has a single argument called x. A user can provide any type rdistribution_name() (e.g., rnorm(), rf(), rt(), rbinom() etc.) existing in R for argument x, EXCEPT ONE: "rcauchy()".
Question
I was wondering how R could recognize that a user has provided an rcauchy() as the input for x, and when this is the case, then R issues a warning message?
Here is my R code with no success:
TBT = function(x) {
if( x == rcauchy(...) ) { warning("\n\tThis type of distribution is not supported.") }
}
TBT( x = rcauchy(1e4) )
Error in TBT(rcauchy(10000)) : '...' used in an incorrect context

If you are expeciting them do call to random function when they call your function, you could so
TBT <- function(x) {
xcall <- match.call()$x
if (class(xcall)=="call" && xcall[[1]]=="rcauchy") {
warning("\n\tThis type of distribution is not supported.")
}
}
TBT( x = rcauchy(1e4) )
But this would not catch cases like
x <- rcauchy(1e4)
TBT( x )
R can't track where the data in the x variable came from

(in R) Why is result of ksvm using user-defined linear kernel different from that of ksvm using "vanilladot"?

I wanted to use user-defined kernel function for Ksvm in R.
so, I tried to make a vanilladot kernel and compare with "vanilladot" which is built in "kernlab" as practice.
I write my kernel as follow.
#
###vanilla kernel with class "kernel"
#
kfunction.k <- function(){
k <- function (x,y){crossprod(x,y)}
class(k) <- "kernel"
k}
l<-0.1 ; C<-1/(2*l)
###use kfunction.k
tmp<-ksvm(x,factor(y),scaled=FALSE, type = "C-svc", kernel=kfunction.k(), C = C)
alpha(tmp)[[1]]
ind<-alphaindex(tmp)[[1]]
x.s<-x[ind,] ; y.s<-y[ind]
w.class.k<-t(alpha(tmp)[[1]]*y.s)%*%x.s
w.class.k
I thouhgt result of this operation is eqaul to that of following.
However It dosn't.
#
###use "vanilladot"
#
l<-0.1 ; C<-1/(2*l)
tmp1<-ksvm(x,factor(y),scaled=FALSE, type = "C-svc", kernel="vanilladot", C = C)
alpha(tmp1)[[1]]
ind1<-alphaindex(tmp1)[[1]]
x.s<-x[ind1,] ; y.s<-y[ind1]
w.tmp1<-t(alpha(tmp1)[[1]]*y.s)%*%x.s
w.tmp1
I think maybe this problem is related to kernel class.
When class is set to "kernel", this problem is occured.
However When class is set to "vanillakernel", the result of ksvm using user-defined kernel is equal to that of ksvm using "vanilladot" which is built in Kernlab.
#
###vanilla kernel with class "vanillakernel"
#
kfunction.v.k <- function(){
k <- function (x,y){crossprod(x,y)}
class(k) <- "vanillakernel"
k}
# The only difference between kfunction.k and kfunction.v.k is "class(k)".
l<-0.1 ; C<-1/(2*l)
###use kfunction.v.k
tmp<-ksvm(x,factor(y),scaled=FALSE, type = "C-svc", kernel=kfunction.v.k(), C = C)
alpha(tmp)[[1]]
ind<-alphaindex(tmp)[[1]]
x.s<-x[ind,] ; y.s<-y[ind]
w.class.v.k<-t(alpha(tmp)[[1]]*y.s)%*%x.s
w.class.v.k
I don't understand why the result is different from "vanilladot", when setting the class to "kernel".
Is there an error in my operation?

First, it seems like a really good question!
Now to the point. In the sources of ksvm we can find when is a line drawn between using user-defined kernel, and the built-ins:
if (type(ret) == "spoc-svc") {
if (!is.null(class.weights))
weightedC <- class.weights[weightlabels] * rep(C,
nclass(ret))
else weightedC <- rep(C, nclass(ret))
yd <- sort(y, method = "quick", index.return = TRUE)
xd <- matrix(x[yd$ix, ], nrow = dim(x)[1])
count <- 0
if (ktype == 4)
K <- kernelMatrix(kernel, x)
resv <- .Call("tron_optim", as.double(t(xd)), as.integer(nrow(xd)),
as.integer(ncol(xd)), as.double(rep(yd$x - 1,
2)), as.double(K), as.integer(if (sparse) xd#ia else 0),
as.integer(if (sparse) xd#ja else 0), as.integer(sparse),
as.integer(nclass(ret)), as.integer(count), as.integer(ktype),
as.integer(7), as.double(C), as.double(epsilon),
as.double(sigma), as.integer(degree), as.double(offset),
as.double(C), as.double(2), as.integer(0), as.double(0),
as.integer(0), as.double(weightedC), as.double(cache),
as.double(tol), as.integer(10), as.integer(shrinking),
PACKAGE = "kernlab")
reind <- sort(yd$ix, method = "quick", index.return = TRUE)$ix
alpha(ret) <- t(matrix(resv[-(nclass(ret) * nrow(xd) +
1)], nclass(ret)))[reind, , drop = FALSE]
coef(ret) <- lapply(1:nclass(ret), function(x) alpha(ret)[,
x][alpha(ret)[, x] != 0])
names(coef(ret)) <- lev(ret)
alphaindex(ret) <- lapply(sort(unique(y)), function(x)
which(alpha(ret)[,
x] != 0))
xmatrix(ret) <- x
obj(ret) <- resv[(nclass(ret) * nrow(xd) + 1)]
names(alphaindex(ret)) <- lev(ret)
svindex <- which(rowSums(alpha(ret) != 0) != 0)
b(ret) <- 0
param(ret)$C <- C
}
The important parts are two things, first, if we provide ksvm with our own kernel, then ktype=4 (while for vanillakernel, ktype=0) so it makes two changes:
in case of user-defined kernel, the kernel matrix is computed instead of actually using the kernel
tron_optim routine is ran with the information regarding the kernel
Now, in the svm.cpp we can find the tron routines, and in the tron_run (called from tron_optim), that LINEAR kernel has a separate optimization routine
if (param->kernel_type == LINEAR)
{
/* lots of code here */
while (Cpj < Cp)
{
totaliter += s.Solve(l, prob->x, minus_ones, y, alpha, w,
Cpj, Cnj, param->eps, sii, param->shrinking,
param->qpsize);
/* lots of code here */
}
totaliter += s.Solve(l, prob->x, minus_ones, y, alpha, w, Cp, Cn,
param->eps, sii, param->shrinking, param->qpsize);
delete[] w;
}
else
{
Solver_B s;
s.Solve(l, BSVC_Q(*prob,*param,y), minus_ones, y, alpha, Cp, Cn,
param->eps, sii, param->shrinking, param->qpsize);
}
As you can see, the linear case is treated in the more complex, more detailed way. There is an inner optimization loop calling the solver many times. It would require really deep analysis of actual optimization being performed here, but at this step one can answer your question in a following way:
There is no error in your operation
kernlab's svm has a separate routine for training SVM with linear kernel, which is based on the type of kernel passed to the code, changing "kernel" to "vanillakernel" made the ksvm think it is actually working with vanillakernel, and so performed this separate optimization routine
It does not seem as a bug in fact, as the linear SVM is in fact very different from the kernelized version in terms of efficient optimization techniques. Amount of heuristic as well as numerical issues that has to be taken care of is really big. As a result, some approximations are required and can lead to the different results. While for the rich feature space (like those induced by RBF kernel) it should not really matter, for simple kernels line linear ones - this simplifications can lead to significant output changes.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

User-defined kernels in kernlab library - r

Related

could not find function "symbolicInt"

Target multiple variables for a function within a custom function

How to use `foreach` and `%dopar%` with an `R6` class in R?

Can R recognize the type of distribution used as a function argument?

(in R) Why is result of ksvm using user-defined linear kernel different from that of ksvm using "vanilladot"?

Categories

Resources