Number of k when building a yai object in the yaImpute package - R

I'm trying to find the best number of nearest neighbours (k) to use with the mahalanobis method in the yai function offered by the yaImpute package (1.0-19). I tried running yai with the 'mahalanobis' method and different values of k:
mal <- yai(x=x, y=y, method="mahalanobis", k=5, noTrgs=FALSE, nVec=NULL, pVal=.05, ann=FALSE)
mal <- yai(x=x, y=y, method="mahalanobis", k=20, noTrgs=FALSE, nVec=NULL, pVal=.05, ann=FALSE)
But when I look at the rmsd (root mean square distance) of each, they are exactly the same. The process does find the number of neighbours I asked for (I can see them when I print the 'mal' object), but it seems like it does not use them.
My aim is to use the AsciiGridImpute function to impute values over my entire map, but I don't understand what the k number in my yai object is used for. How does AsciiGridImpute use it?
Thank you
Sorry for my bad English!

I finally found out why the RMSD was the same whatever number of k was used in the yai object.
The function rmsd.yai automatically calls another function, impute.yai. That function accepts methods such as 'mean' or 'dstWeighted' to compute the imputed values for continuous variables, and using those methods does change the rmsd values depending on the number of k.
But the automatic call to impute.yai computes imputed values with the default method, 'closest', so only one neighbour is used. You can see the difference by calling impute explicitly, as sketched below.
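A minimal sketch of that explicit call (assuming 'mal' is the yai object built above; dispatching impute() on a yai object calls impute.yai):
# impute with a k-aware method instead of the default 'closest',
# then compute rmsd on the result, which should now respond to k
imp <- impute(mal, method = "dstWeighted")
rmsd.yai(imp)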
I think the same thing happens with the AsciiGridImpute function.


How would you optimize dividing bivariate data in R?

I'm not looking for a specific line of code - just built-in functions or common packages that may help me do the following; basically something like "write up some code and use this function". I'm stuck on how to actually optimize - should I use SGD?
I have two variables, X and Y. I want to separate the data into 4 groups by cutting on Y so that the L2 loss, that is $\sum_{j=1}^{4} \sum_{i:\, y_i \in G_j} (x_i - \bar{x}_j)^2$, where $G_j$ is the $j$-th group defined by the Y cutoffs and $\bar{x}_j$ is the mean of X within that group, is minimized subject to the constraint that there are at least n observations in each group.
How would one go about solving this? I'd imagine you can't do this with the optim function? Basically the algorithm needs to move 3 values around (there are 3 cutoff points for Y) until the L2 loss is minimized subject to each group having at least n observations.
Thanks
You could try optim and simply add a penalty if the constraints are not satisfied: since you minimise, add zero if all constraints are okay; otherwise a positive number.
If that does not work, since you only look for three cutoff points, I'd probably try a grid search, i.e. compute the objective function for different levels of the cutoff point; throw away those that violate the constraints, and then keep the best solution.
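A minimal sketch of the grid-search idea (x, y, and the minimum group size n_min are assumed inputs; the candidate grid of quantiles is my choice, purely for illustration):
# objective: within-group sum of squares of x, with groups cut on y;
# return Inf when any group is smaller than n_min (the penalty idea)
within_ss <- function(cuts, x, y, n_min) {
  g <- cut(y, breaks = c(-Inf, sort(cuts), Inf))  # 3 cutoffs -> 4 groups
  if (any(table(g) < n_min)) return(Inf)
  sum(tapply(x, g, function(v) sum((v - mean(v))^2)))
}
# brute force over all triples of candidate cutoffs taken from quantiles of y
cand <- quantile(y, seq(0.05, 0.95, by = 0.05))
grid <- t(combn(cand, 3))
obj  <- apply(grid, 1, within_ss, x = x, y = y, n_min = n_min)
best <- grid[which.min(obj), ]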

TSFRESH: Get N most relevant features

Is there any way to get the N most relevant features in TSFRESH? Currently, the method extract_relevant_features has a parameter fdr_level, but for a large number of time series (>1000), the function returns more than 400 features even with a very low fdr_level (< 0.01). I would like to get just the 20 or 40 most relevant features.
You could use the function calculate_relevance_table (see the tsfresh documentation), which is called internally by the select_features method, which in turn is called by extract_relevant_features, to get the p-value for each feature and then keep only the top N sorted by p-value.
So the general flow would be:
extract all features with extract_features
call calculate_relevance_table
sort by p-value
get only the top N
You could even tell tsfresh to extract only those features next time (to save a lot of computation time), as described in the tsfresh documentation.

Function doesn't change value (R)

I have written a function that takes two arguments: a number from 0:16 and a vector which contains four parameter values.
The output of the function changes if I change the parameters in the vector, but it does not change if I change the number from 0:16.
I can add that the function I'm having trouble with calls another function (called 'pi') which takes the same arguments.
I have checked that the 'pi' function does change value if I change the number from 0:16 (and it also changes if I change the values of the parameters).
First, here is my code:
pterm_ny <- function(x, theta){
  (1 - sum(theta[1:2])) * (theta[4]^x) * exp(-theta[4]) / pi(x, theta)
}
pi <- function(x, theta){
  theta[1] * (x == 0) +
    theta[2] * (theta[3]^x) * exp(-theta[3]) +
    (1 - sum(theta[1:2])) * (theta[4]^x) * exp(-theta[4])
}
This returns 0.75 for pterm_ny(i, c(0.2,0.2,2,2)), where i = 1,...,16, and 0.2634 for i = 0, which tells me that the indicator-function part in 'pi' does work.
With respect to raising a number to a power, I have been told that one should wrap the desired number in I(), for example like this:
x^I(2)
I have tried to do that in my code, but it didn't help either.
I can't remember the reasoning behind it, but I expect it's to ensure that the number in parentheses is interpreted as an integer.
My end goal is to get 17 different values of 'pterm', and to accomplish that I was thinking of using the sapply function like this:
sapply(0:16, pterm_ny, theta = c(0.2, 0.2, 2, 2))
I really hope that someone can point out what I'm missing here.
In advance, thank you!
You have a theta[4]^x term both in your main expression and in your pi() function. With your test values theta[3] == theta[4] == 2, so for x > 0 the whole theta[4]^x * exp(-theta[4]) factor cancels between numerator and denominator, leaving the result invariant to changes in x ...
Also:
you might want to avoid using pi as your function name, as it's also a built-in constant (3.14159...) - this can sometimes cause confusion
the advice about using the "as is" function I() to protect powers is only relevant within formulas, e.g. as used in lm() (linear regression). (It would be used as I(x^2), not x^I(2).)
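A quick check of the cancellation (the theta values here are mine, chosen only so that theta[3] != theta[4]):
# with theta[3] != theta[4] the common factor no longer cancels,
# so the result now varies with x
sapply(0:4, pterm_ny, theta = c(0.2, 0.2, 1, 2))
# with theta[3] == theta[4] every x > 0 gives the same value (0.75)
sapply(0:4, pterm_ny, theta = c(0.2, 0.2, 2, 2))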

Optimisation tool in R to find the input parameter of function that minimises output value?

I wish to find an optimisation tool in R that lets me determine the value of an input parameter (say, a specific value between 0.001 and 0.1) that results in my function producing a desired output value.
My function takes an input parameter and computes a value. I want this output value to exactly match a predetermined number, so the function outputs the absolute of the difference between these two values; when they are identical, the output of the function is zero.
I've tried optimize(), but it seems to be set up to minimise the input parameter, not the output value. I also tried uniroot(), but it produces the error f() values at end points not of opposite sign, suggesting that it doesn't like the fact that increasing/decreasing the input parameter reduces the output value up to a point, but going beyond that point then increases it again.
Apologies if I'm missing something obvious here - I'm completely new to optimising functions.
Indeed you are missing something obvious :-) There is a straightforward way to formulate your problem.
Assuming the function that must equal a desired output value is f.
Define a function g satisfying
g <- function(x) f(x) - output_value
Now you can use uniroot to find a zero of g. But you must provide endpoints that satisfy the requirements of uniroot. I.e. the value of g for one endpoint must be positive and the value of g for the other endpoint must be negative (or the other way around).
Example:
f <- function(x) x - 10
g <- function(x) f(x) - 8
then
uniroot(g,c(0,20))
will do what you want but
uniroot(g,c(0,2))
will issue the error message values at end points not of opposite sign.
You could also use an optimization function, but then you want to minimise the absolute value (or square) of g, not g itself. And to set you straight: optimize does not minimise the input parameter; it searches the given interval for the input value that minimises the function's output. Read the help thoroughly.
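A minimal sketch of that route, reusing f and g from the example above:
# minimise |g| over the same interval; the minimiser should be
# close to 18, the root that uniroot found above
opt <- optimize(function(x) abs(g(x)), interval = c(0, 20))
opt$minimum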

Using outer() with a multivariable function

Suppose you have a function f <- function(x, y, z) { ... }. How would you go about passing a constant to one argument, but letting the other ones vary? In other words, I would like to do something like this:
output <- outer(x,y,f(x,y,z=2))
This code doesn't evaluate, but is there a way to do this?
outer(x, y, f, z=2)
The arguments after the function are additional arguments to it, see ... in ?outer. This syntax is very common in R, the whole apply family works the same for instance.
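For instance, with a toy f of my own just to illustrate the mechanics:
# z is held fixed at 2 while x and y vary over the grid
f <- function(x, y, z) x + y * z
outer(1:3, 1:2, f, z = 2)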
Update:
I can't tell exactly what you want to accomplish in your follow-up question, but I think a solution of this form is probably what you should use.
outer(sigma_int, theta_int,
      Vectorize(function(s, t) dmvnorm(y, rep(0, n), y_mat(n, lambda, t, s))))
This calculates a variance matrix for each combination of the values in sigma_int and theta_int, uses that matrix to define a density, and evaluates it at the point(s) defined in y. Note the Vectorize wrapper: outer calls its function once with whole vectors of argument combinations, so a function that builds one matrix per (s, t) pair has to be vectorized first. I haven't been able to test it though, since I don't know the types and dimensions of the variables involved.
outer (along with the apply family of functions and others) will pass extra arguments along to the function it calls. However, if you are dealing with a case where this is not supported, then you can use the more general approach of currying. To curry a function is to create a new function which has (some of) the arguments fixed and therefore takes fewer parameters.
library("functional")
output <- outer(x,y,Curry(f,z=2))
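The same currying can also be written in base R with an anonymous function, no package needed:
# base-R equivalent of the Curry call above
output <- outer(x, y, function(x, y) f(x, y, z = 2))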
