bnlearn package: unexpected cpdist (prediction) behaviour in R

I've run into a problem that goes beyond my understanding.
I have made a simple reproducible example for you to test.
Basically, I create a Bayesian network with two strongly correlated variables that are linked together. One would expect that if one of them is high, the other should be high as well (since they are directly linked).
library(bnlearn)
Learning.set4 = cbind(c(1,2,1,8,9,9),c(2,0,1,10,10,10))
Learning.set4 = as.data.frame(Learning.set4)
colnames(Learning.set4) = c("Cause","Cons")
b.network = empty.graph(colnames(Learning.set4))
struct.mat = matrix(0,2,2)
colnames(struct.mat) = colnames(Learning.set4)
rownames(struct.mat) = colnames(struct.mat)
struct.mat[1,2] = 1
bnlearn::amat(b.network) = struct.mat
haha = bn.fit(b.network,Learning.set4)
# Here we get a mean that is close to 10
seems_logic_to_me=cpdist(haha, nodes="Cons",
evidence=list("Cause"=10), method="lw")
# Here I get a mean that is close to 5, so a high value
# of Cons wouldn't mean anything for Cause?
very_low_cause_values = cpdist(haha, nodes="Cause",
evidence=list("Cons"=10), method="lw")
Could anyone enlighten me as to why it doesn't work with the lw method? (You can try with ls and it seems to work fine.)
lw stands for likelihood weighting.
UPDATE:
Got the solution from the maintainer.
Adding the following at the end will print the expected prediction:
print(sum(very_low_cause_values[, 1] * attr(very_low_cause_values, "weights")) / sum(attr(very_low_cause_values, "weights")))
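Equivalently, the same weighted average can be computed with weighted.mean(), a small sketch based on the maintainer's snippet above:
# cpdist(..., method = "lw") returns samples carrying likelihood weights in the
# "weights" attribute; the prediction is their weighted average
w <- attr(very_low_cause_values, "weights")
print(weighted.mean(very_low_cause_values[, 1], w))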

Related

How do I fix "Newton failed to find minimum" in R

I have a series of ARGOS data from various individual animals. I am using the R package aniMotum (previously foieGras) to create move persistence maps like this example output from the documentation. My goal is to be able to create one of these maps for each animal in my dataframe.
When I adapt the example code to my tracking data, it fails on some tracks and not others. When it fails, I receive a series of errors like these:
Error in newton(par = c(X = 4205.28883097774, X = -5904.86998464547, X = 4204.95187507608, :
Newton failed to find minimum.
And a final error
Warning message:
The optimiser failed. Try simplifying the model with the following argument:
map = list(rho_o = factor(NA))
Here is a sample of the code I used
library(aniMotum)
library(tidyverse)
movePersistence <- readRDS("newton_error.rds")
x <- fit_ssm(movePersistence,
             vmax = 3,
             model = "mp",
             time.step = 2,
             control = ssm_control(verbose = 2)
)
aniMotum::map(x, what = "p", normalise = TRUE, silent = TRUE)
From what I can tell, the issue is occurring in the newton() function in the TMB package (a dependency of aniMotum). I have noticed that if I change the time.step value, the number of errors changes. For example, on track 14, I get 13 Newton errors with a step of 2 but none with a step of 12. I am using 2 as it gives a better approximation than 12. I did try my whole dataset at various steps and each time different tracks fail. I also tried changing the tolerance and number of iterations in ssm_control, but that was more of a hail-Mary approach and it did not work.
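One thing worth trying is the simplification the warning itself suggests; a sketch, assuming fit_ssm accepts a map argument (as the warning implies) and otherwise mirroring the call above:
# untested sketch: retry the failing track with rho_o fixed, as suggested by
# the optimiser warning; the `map` argument is assumed to be passed through to TMB
x_simplified <- fit_ssm(movePersistence,
                        vmax = 3,
                        model = "mp",
                        time.step = 2,
                        map = list(rho_o = factor(NA)),
                        control = ssm_control(verbose = 2))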
Here is a dput() of track 14 to reproduce the errors on a smaller scale using the above code - https://pastebin.com/xMXzdPDh - if anyone has some recommendations I would greatly appreciate it as I do not understand the inconsistent results.

I'm unable to solve an unused arguments error I get when using the smd package

I keep getting an 'unused arguments' error when I call the smd() function.
I'm using smd() as part of a larger data analysis, comparing groups created through k-means clustering. And it was all working fine... until it wasn't. I'd been editing other parts of the main script - adding a derived variable.
I've puzzled over this for some time, checking the syntax and the code that creates the function arguments, all to no avail. Finally I wrote a short script to see if I had this problem with some very basic data. And I still do. The new script is:
library(smd)
Mean_x <- 75
Mean_y <- 25
n_x <- 25
n_y <- 25
sd_x <- 40
sd_y <- 20
temp_smd <- smd(Mean.1=Mean_x, Mean.2=Mean_y, s.1=sd_x, s.2=sd_y, n.1=n_x, n.2=n_y)
... and I get the error message:
Error in smd(Mean.1 = Mean_x, Mean.2 = Mean_y, s.1 = sd_x, s.2 = sd_y, :
unused arguments (Mean.1 = Mean_x, Mean.2 = Mean_y, s.1 = sd_x, s.2 = sd_y, n.1 = n_x, n.2 = n_y)
I even tried smd::smd, in case there was a package conflict that I wasn't aware of.
All help appreciated
From the documentation of the smd package (https://cran.r-project.org/web/packages/smd/), it looks like smd() expects the following arguments:
x a vector or matrix of values
g a vector of at least 2 groups to compare. This should be coercible to a factor.
w a vector of numeric weights (optional)
std.error Logical indicator for computing standard errors using compute_smd_var. Defaults to FALSE.
na.rm Remove NA values from x? Defaults to FALSE.
gref an integer indicating which level of g to use as the reference group. Defaults to 1.
Whereas you're giving it a bunch of different arguments, which the function isn't able to make use of. Your arguments make sense in terms of the mathematical explanation of what smd is supposed to calculate as presented in the documentation, but the documentation doesn't make it clear (at least to me) how the arguments it expects relate to the number it calculates. If it were me, I'd probably write my own function to do the calculation.
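For illustration, here is how the documented interface would be called on raw two-group data; a minimal sketch with made-up values, since smd() works from the observations themselves rather than from summary statistics:
library(smd)
# hypothetical raw data for two groups (not the asker's summary statistics)
x <- c(rnorm(25, mean = 75, sd = 40), rnorm(25, mean = 25, sd = 20))
g <- rep(c("group1", "group2"), each = 25)
smd(x = x, g = g)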
Shutting everything down and re-installing the MBESS package seems to have fixed it? It is all working now anyway! :-)

zfit straight line fitting for 2 dim dataset

I would like to fit a 2-dim plot with a straight line (a*x + b) using zfit, like the following figure.
That was very easy with the probfit package, but it has been deprecated by scikit-hep. https://nbviewer.jupyter.org/github/scikit-hep/probfit/blob/master/tutorial/tutorial.ipynb
How can I fit such 2-dim plots with any function?
I've checked the zfit examples, but they seem to assume some distribution (histogram), so zfit expects a dataset like a 1d array, and I couldn't work out how to pass 2d data to zfit.
There is currently no direct way in zfit to do this out of the box (with one line), since a corresponding loss simply hasn't been added.
However, SimpleLoss (zfit.loss.SimpleLoss) allows you to construct any loss that you can think of (have a look at the example in the docstring as well). In your case, it would look along these lines:
x = your_data
y = your_targets  # y-value
obs = zfit.Space('x', (lower, upper))
param1 = zfit.Parameter(...)
param2 = zfit.Parameter(...)
...
model = Func(...)  # a function is the way to go here
data = zfit.Data.from_numpy(array=x, obs=obs)

def mse():
    prediction = model.func(data)
    value = tf.reduce_mean((prediction - y) ** 2)  # or whatever you want to have
    return value

loss = zfit.loss.SimpleLoss(mse, [param1, param2])
# etc.
On another note, it would be a good idea to add such a loss. If you're interested in contributing, I recommend getting in contact with the authors; they will gladly help you and guide you through it.
UPDATE
The loss function itself presumably consists of three to four things: x, y, a model, and maybe an uncertainty on y. The chi2 loss looks like this:
def chi2():
    y_pred = model.func(x)
    return tf.reduce_sum(((y_pred - y) / y_error) ** 2)

loss = zfit.loss.SimpleLoss(chi2, model.get_params())
That's all, four lines of code. x is a zfit.Data object and model is in this case a Func. Does that work?

Estimate parameters of Frechet distribution using mmedist or fitdist(with mme) error

I'm relatively new to R and I would appreciate it if you could take a look at the following code. I'm trying to estimate the shape parameter of the Frechet distribution (or inverse Weibull) using mmedist (I also tried fitdist, which calls mmedist), but I get the following error:
Error in mmedist(data, distname, start = start, fix.arg = fix.arg, ...) :
the empirical moment function must be defined.
The code that I use is below:
require(actuar)
library(fitdistrplus)
library(MASS)
#values
n=100
scale = 1
shape=3
# simulate a sample
data_fre = rinvweibull(n, shape, scale)
memp=minvweibull(c(1,2), shape=3, rate=1, scale=1)
# estimating the parameters
para_lm = mmedist(data_fre,"invweibull",start=c(shape=3,scale=1),order=c(1,2),memp = "memp")
Please note that I tried changing the code many times to see if my mistake was in the syntax, but I always get the same error.
I'm aware of the example in the documentation. I've tried that as well, but with no luck. Please note that for the method to work, the order of the moment must be smaller than the shape parameter (i.e. shape).
The example is the following:
require(actuar)
#simulate a sample
x4 <- rpareto(1000, 6, 2)
#empirical raw moment
memp <- function(x, order)
ifelse(order == 1, mean(x), sum(x^order)/length(x))
#fit
mmedist(x4, "pareto", order=c(1, 2), memp="memp",
start=c(shape=10, scale=10), lower=1, upper=Inf)
Thank you in advance for any help.
You will need to make non-trivial changes to the source of mmedist -- I recommend that you copy out the code, and make your own function foo_mmedist.
The first change you need to make is on line 94 of mmedist:
if (!exists("memp", mode = "function"))
That line checks whether "memp" is a function that exists, as opposed to whether the argument that you have actually passed exists as a function. Change it to:
if (!exists(as.character(expression(memp)), mode = "function"))
The second change, as I have already noted, relates to the fact that the optim routine actually calls funobj, which calls DIFF2, which in turn calls (see line 112) the user-supplied memp function (minvweibull in your case) with two arguments: obs, which resolves to data, and order. Since minvweibull does not take the data as its first argument, this fails.
This is expected, as the help page tells you:
memp    A function implementing empirical moments, raw or centered, but it has to be consistent with the distr argument. This function must have two arguments: the first is the numeric vector of the data and the second is the order of the moment returned by the function.
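For reference, an empirical raw-moment function matching that description might look like this; a minimal sketch (moments::moment with central = FALSE computes the same raw moments):
# empirical raw moments: first argument is the data vector, second the order(s)
memp_raw <- function(x, order) sapply(order, function(o) mean(x^o))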
How can you fix this? Pass the function moment from the moments package. Here is complete code (assuming that you have made the change above, and created a new function called foo_mmedist):
library(moments)  # provides moment()
# values
n = 100
scale = 1
shape = 3
# simulate a sample
data_fre = rinvweibull(n, shape, scale)
# estimating the parameters
para_lm = foo_mmedist(data_fre, "invweibull",
                      start = c(shape = 5, scale = 2), order = c(1, 2), memp = moment)
You can check that optimization has occurred as expected:
> para_lm$estimate
shape scale
2.490816 1.004128
Note, however, that this actually reduces to a crude way of doing an overdetermined method of moments, and I am not sure that it is theoretically appropriate.

(R) function: object not found: environment depth fine?

I'm puzzled by a function error & would appreciate some insight.
The function, very briefly, automates the multiple processes involved in Boosted Regression Trees using gbm.step & other gbm's.
"gbm.auto" <- function (grids, samples, 3 parameters) {
starts 2 counters, require(gbm), does various small processing jobs with grids & samples
for parameter 1{
for parameter 2{
for parameter 3{
Runs 2 BRTs per parameter-combination loop, generates & iteratively updates a 'best' BRT for each, adds to counters. Extensive use of samples.
}}}
closes the loops, function continues as the first } is still open.
The next BRT can't find samples, even though it's at the same environment depth (1?) as the pre-loop processing jobs which used it successfully. Furthermore, adding globalsamples <<- samples after the }}} loops successfully saves the object, suggesting that samples is still available. Adding env1, env2 & env3 <<- environment() before the {{{ loops, within them & after them gives the same environment for all three, also suggesting it's all the same function environment & that samples should be available.
What am I missing here? Thanks in advance!
Edit: exact message:
Error in eval(expr, envir, enclos) : object 'samples' not found
Function (lots removed & compacted, but it still gives the same error message):
"gbm.auto" <-
function (samples, expvar, resvar, tc, lr, bf)
{ # open function
require(gbm)
require(dismo)
# create binary (0/1) response variable, for bernoulli BRTs
samples$brv <- ifelse(samples[resvar] > 0, 1, 0)
brvcol <- which(colnames(samples)=="brv") # brv column number for BRT
for(j in tc){ # permutations of tree complexity
for(k in lr){ # permutations of learning rate
for(l in bf){ # permutations of bag fraction
Bin_Best_Model<- gbm.step(data=samples,gbm.x = expvar, gbm.y = brvcol, family = "bernoulli", tree.complexity = j, learning.rate = k, bag.fraction = l)
}}} # close loops, producing all BRT/GBM objects & continue through model selection
Bin_Best_Simp_Check <- gbm.simplify(Bin_Best_Model) # simplify model
# if best number of variables to remove isn't 0 (i.e. it's worth simplifying), re-run the best model (Bin_Best_Model, using gbm.call to get its values)
# with just-calculated best number of variables to remove, removed. gbm.x asks which number of drops has the minimum mean (lowest point on the line)
# & that calls up the list of predictor variables with those removed, from $pred.list
if(min(Bin_Best_Simp_Check$deviance.summary$mean) < 0)
assign("Bin_Best_Simp", gbm.step(data = samples,
gbm.x = Bin_Best_Simp_Check$pred.list[[which.min(Bin_Best_Simp_Check$deviance.summary$mean)]],
gbm.y = brvcol, family = "bernoulli", tree.complexity = j, learning.rate = k, bag.fraction = l))
}
Read in data:
mysamples <- data.frame(response = round(sqrt(rnorm(5000, mean = 2.5, sd = 1.5)^2)),
                        depth = sqrt(rnorm(5000, mean = 35, sd = 24)^2),
                        temp = rnorm(5000, mean = 15, sd = 1.2),
                        sal = rnorm(5000, mean = 34, sd = 0.34))
Run this:
gbm.auto(expvar = c(2,3,4), resvar = 1, samples = mysamples, tc = 2, lr = 0.00000000000000001, bf = 0.5)
Problem now: this causes a different error because my fake data are somehow wrong. ARGHG!
Edit: rounded the response data to integers and kept shrinking the learning rate until it runs. If it doesn't work for you, add zeroes until it does.
Edit: so this worked on my computer, but reading it back into a clean session from online fails with a DIFFERENT error:
Error in var(cv.cor.stats, use = "complete.obs") :
no complete element pairs
In cor(y_i, u_i) : the standard deviation is zero
Is it allowed to attach or link to a csv of a small clip of my data? I'm currently burrowing deeper & deeper into bugfixing problems created by using fake data which I'm only using for this question, & thus getting off topic from the actual problem. Exasperation mode on!
Cheers
Edit 2: if this is allowed, a 1000-row, 4-column csv is linked here: https://drive.google.com/file/d/0B6LsdZetdypkaC1WYXpKU3ZScjQ
