Parameter "n" in edarf partial dependency function - r

I am currently using the partial_dependence function within the edarf package.
Regarding the n parameter, I found the definition below at https://rdrr.io/cran/edarf/f/vignettes/edarf.Rmd, but I am still confused about exactly how I should be using and understanding this variable. Any suggestions are greatly appreciated.
Thanks
"The parameter n is a numeric vector of length two, which controls via its first element the size of the grid to evaluate the columns specified by the vars argument at, and the second element gives the number of rows of the other variables to sample when marginalizing the prediction function. All the actual computation is performed in the mmpf package using the function marginalPrediction. Additional arguments can be passed to this function by name (i.e., using ...)."

Related

Can someone please explain this code to me? Especially the role of "function(x)" and "[[x]]"?

This is some code in R, and I'm having trouble understanding the role of function(x) and qdata[[x]] in it. Can someone walk me through it piece by piece? I didn't write this code. Thank you.
outs=lapply(names(qdata[,12:35]), function(x)
  hist(qdata[[x]], data=qdata, main="Histogram of Quality Trait",
       xlab=as.character(x), las=1.5)$out)
This code generates a series of histograms, one for each of columns 12 to 35 of the dataframe qdata. The lapply function iterates over the columns. At each iteration, the name of the current column is passed as the argument "x" to the anonymous function defined by "function(x)". The body of the function is a call to the hist() function, which creates the histogram. qdata[[x]] (where x is the name of a column) extracts the data from that column. I am actually confused by "data=qdata".
We don't have the data object named qdata, so we cannot really be sure what will happen with this code. It appears that the author of this code is trying to collect the values of components named out from the calls to hist. If qdata is an ordinary dataframe, then I suspect that this code will fail in that goal, because the value returned by hist does not have an out component. (Look at the output of ?hist.) When I run this with a simple dataframe, I do get histogram plots in my interactive plotting device, but I get NULL values for the out components. Furthermore, the 12 warnings are caused by passing a data argument to the hist function, which has no such parameter.
qdata <- data.frame(a=rnorm(10), b=rnorm(10))
outs=lapply(names(qdata), function(x)
  hist(qdata[[x]], data=qdata, main="Histogram of Quality Trait",
       xlab=as.character(x), las=1.5)$out)
#There were 12 warnings (use warnings() to see them)
> str(outs)
List of 2
$ : NULL
$ : NULL
So I think we need to be concerned about the level of R knowledge of the author of this code. It's possible I'm wrong about this presumption. The hist function is generic, and it is possible that some unreferenced package has a method designed to handle a data argument and return an out value when given a vector of a particular class. In a typical starting situation with only the base packages loaded, however, there are only three hist.* methods:
methods(hist)
#[1] hist.Date* hist.default hist.POSIXt*
#see '?methods' for accessing help and source code
As for the questions about the role of function and [[x]]: the keyword function returns a function object (a closure) that can receive parameter values, perform operations, and finally return results. In this case the column names get passed to the anonymous function and become, each in turn, the local name x, and that value is used by the `[[` function to look up the column in what I am presuming is the qdata dataframe.
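For completeness, here is a minimal sketch of the same pattern that runs cleanly. This is my own rewrite, not the original author's code: it drops the spurious data= argument and keeps the whole return value of hist(), which has components such as $breaks and $counts but no $out.

qdata <- data.frame(a = rnorm(100), b = rnorm(100))
# each column name is passed in turn as x; `[[` extracts that column
outs <- lapply(names(qdata), function(x)
  hist(qdata[[x]], main = "Histogram of Quality Trait",
       xlab = as.character(x), las = 1))
str(outs[[1]])  # a "histogram" object: $breaks, $counts, $density, ...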

Optim with multiple parameters - define function argument type

I want to minimize a function over its first 11 parameters (or any large number), where the function takes 15 parameters in total (or any number greater than the first). The problem is that when I use the optim function with par=c(...) for the first 11 values, the function cannot be evaluated, because all 11 values are passed to its first argument only. Is there any way to force each input argument to be a scalar / single number?
To be more specific:
fun_to_optim <- function(m1,m2,m3,m4,m5,m6,m7,m8,m9,m10,m11,p1,p2,p3,p4)
{
# I want to make sure m1 to m11 are single numbers and no vectors
...
}
# optimization
optim(par=c(0,0,0,0,0,0,0,0,0,0,0),fn=fun_to_optim,p1=data,p2=opt,p3=data,p4=1)
Trying to run this code always ends with the error message: Error in fn(par, ...) : argument "m1" is missing, with no default. By printing out the arguments, I figured out that the function assumes all values from par are for the first input variable, m1. So m1 is taken to be a vector holding the initial values given through par.
Please note: I am aware of the possibility of simply putting the first 11 parameters into a vector (let's say para_to_optim) and then calling the optim function as I did. However, I explicitly do not want to do this, because I need to optimize the parameters iteratively, i.e. starting with just 1 parameter and ending with all 11. Writing the function 11 times with different input argument structures seems a bit inefficient.
I have already checked similar posts about function arguments, such as Forcing specific data types as arguments to a function and How to define argument types for R functions?, but it would not be possible in my case to check the type of the argument within the function, and wrappers or S3 classes are also somewhat problematic. Is there any other way to solve this?
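One common workaround, sketched here with a three-parameter toy objective rather than the real 15-argument function: keep the scalar arguments of fun_to_optim and give optim a thin wrapper that splits par back into scalars with do.call. The number of optimized parameters can then grow from 1 to 11 without rewriting the function.

# toy stand-in for fun_to_optim, with scalar arguments only
fun_to_optim <- function(m1, m2, m3, p1) (m1-1)^2 + (m2-2)^2 + (m3-3)^2 + p1
# as.list(par) splits the vector into m1, m2, ... by position;
# fixed supplies the arguments that are not being optimized
wrapper <- function(par, fixed) do.call(fun_to_optim, c(as.list(par), fixed))
optim(par = c(0, 0, 0), fn = wrapper, fixed = list(p1 = 1))$par
# optimizing just one parameter: fix the others by name
optim(par = 0, fn = wrapper, fixed = list(m2 = 2, m3 = 3, p1 = 1),
      method = "BFGS")$par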

R: parameter in update function

Here is a snippet of R script doing beta regression on data "GasolineYield":
library("betareg")
data("GasolineYield", package = "betareg")
gy_logit <- betareg(yield ~ batch + temp, data = GasolineYield)
gy_logit4 <- update(gy_logit, subset = -4)
The 4th line magically deletes the 4th observation and updates the fit automatically, but I don't quite understand why this parameter works in the update function here, because when I looked up the documentation with ?update I couldn't find any such parameter.
I'm curious about how to find the right documentation in this case, because maybe I want to add some new observations instead of removing one. Any help?
subset in betareg works the same as subset in lm, so you can read the lm documentation.
From the help file you can find:
subset an optional vector specifying a subset of observations to be used in the fitting process.
Hence by setting subset = -4 you are leaving out the fourth row in the estimation.
update() contains the ... parameter, which means any parameters that are not matched in your call to update() are passed on to the function that does the estimation. In this case, that is betareg(), which does have the subset argument.
This type of thing is very common in R. Many higher-level functions that call other user-visible functions will have the three-dot parameter and pass any unmatched parameters on, so you have to search all the user-visible functions that get called in order to know all possible options.
You can check out the help file for the top level function (update() in this case) to get an idea of which functions get the leftover parameters.
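A toy illustration of that pass-through mechanism (not from the original thread): the outer function forwards anything it does not recognize, just as update() forwards subset on to betareg().

g <- function(a, b = 1) a + b
f <- function(x, ...) g(x, ...)  # f() itself has no b argument
f(2, b = 10)  # b falls through ... to g() and the call returns 12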

Taylor diagram from existing Correlation and Standard Dev values

Is it possible to create a Taylor diagram from already calculated correlation and standard deviation values?
I am doing model evaluation, and I already have the correlation and standard deviation values. I understand that there is already a package, plotrix, which creates the diagram when given the observed and modeled values. However, for the type of work that I am doing, it is easier to start directly from the correlation and standard deviation values.
Is there any way I can do this in R?
There's no reason it shouldn't be possible, but the authors didn't seem to allow for that when they wrote the function. The function is a bit long and complex, but the part that does the calculation is at the top. It is possible to swap out that code and replace it to allow for the passing of summary statistics. Now, keep in mind that what I'm about to do is a hack, and I've only tested it with version 3.5-5 of plotrix. Other versions may not work.
Here we will create a new function, taylor.diagram2, that takes all the code from taylor.diagram but adds an extra if statement to check for a list of summarized data as the first argument:
taylor.diagram2 <- taylor.diagram
bl <- as.list(body(taylor.diagram))
cond <- list(
  as.name("if"),
  quote(is.list(ref) & missing(model)),                    # condition
  quote({R <- ref$R; sd.r <- ref$sd.r; sd.f <- ref$sd.f}), # if true
  as.call(c(as.symbol("{"), bl[3:8])))                     # else
bl <- c(bl[1:2], as.call(cond), bl[9:length(bl)]) # splice in the new code
body(taylor.diagram2) <- as.call(bl)              # update the function
Now we can test the function. First, we'll do things the standard way
#test data
aref<-rnorm(30,sd=2)
amodel1<-aref+rnorm(30)/2
#standard behavior function
taylor.diagram2(aref, amodel1, main="Standard Behavior")
#summarized data
xx<-list(
R=cor(aref, amodel1, use = "pairwise"),
sd.r=sd(aref),
sd.f=sd(amodel1)
)
#modified behavior
taylor.diagram2(xx, main="Modified Behavior")
So the new taylor.diagram2 function can do both. If you pass it two vectors, it will show the standard behavior. If you pass it a list with the names R, sd.r, and sd.f, it will draw the same plot but with the values you passed in. Also, the model parameter must be missing for the modified version to work, which means that if you want to set any additional parameters, you must pass them by name rather than by position.

Accessing class values in R's poLCA

I am trying my hand at learning Latent Class Analysis, while also learning R. I'm using the poLCA package, and am having a bit of trouble accessing the attributes. I can run the sample code just fine:
ds = read.csv("http://www.math.smith.edu/r/data/help.csv")
ds = within(ds, (cesdcut = ifelse(cesd>20, 1, 0)))
library(poLCA)
res2 = poLCA(cbind(homeless=homeless+1,
                   cesdcut=cesdcut+1, satreat=satreat+1,
                   linkstatus=linkstatus+1) ~ 1,
             maxiter=50000, nclass=3,
             nrep=10, data=ds)
but in order to make this more useful, I'd like to access the attributes within the objects created by the poLCA class as such:
attr(res2, 'Nobs')
attr(res2, 'maxiter')
but they both come up as NULL. I expect Nobs to be 453 (determined by the function) and maxiter to be 50000 (dictated by my input value).
I'm sure I'm just being naive, but I could use any help available. Thanks a lot!
Welcome to R. You've got the model-fitting syntax right, in that you can get a model out (I don't know how latent class analysis works, so I can't speak to the statistical validity of your result). However, you've mixed up the different ways in which R can store information pertaining to a model.
poLCA returns an object of class poLCA, which is a list containing the following elements:
(. . .)
Nobs: number of fully observed cases (less than or equal to N).
maxiter: maximum number of iterations through which the estimation algorithm was set to run.
Since it's a list, you can extract individual elements from your model object using the $ operator:
res2$Nobs # number of observations
res2$maxiter # maximum iterations
In some cases, there might be extractor functions to get this information without having to do low-level indexing. For example, many model-fitting functions will have a fitted method, which pulls out the vector of fitted values on the training data; and similarly residuals pulls out the vector of residuals. You should check whether there are such extractor functions provided by the poLCA package and use them if possible; that way, you're not making assumptions about the structure of the model object that might be broken in the future.
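As a generic illustration (using lm(), since I have not checked exactly which extractors poLCA provides):

fit <- lm(mpg ~ wt, data = mtcars)
head(fitted(fit))     # extractor function: fitted values
head(residuals(fit))  # extractor function: residuals
fit$df.residual       # direct list-element access, like res2$Nobs above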
This is distinct from getting the attributes of an object, which is what attr is for. Attributes in R are what you might call metadata: they contain R-specific information about the object itself, rather than information about whatever it is the object represents. Examples of common attributes include class (the class of an object), dim (the dimensions of an array or matrix), names (the names of the individual elements of a vector/list/array), and so on.
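For example:

m <- matrix(1:6, nrow = 2)
attributes(m)   # $dim = c(2, 3): metadata about the object's shape
attr(m, "dim")  # c(2, 3)
# a classed list such as res2 typically carries only $names and $class
# as attributes; the substantive results live in the list elements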
