I have a series of Argos tracking data from various individual animals. I am using the R package aniMotum (formerly foieGras) to create move persistence maps like the example output from the documentation. My goal is to create one of these maps for each animal in my dataframe.
When I adapt the example code to my tracking data, it fails on some tracks and not others. When it fails, I receive a series of errors like these:
Error in newton(par = c(X = 4205.28883097774, X = -5904.86998464547, X = 4204.95187507608, :
Newton failed to find minimum.
and finally this warning:
Warning message:
The optimiser failed. Try simplifying the model with the following argument:
map = list(rho_o = factor(NA))
Here is a sample of the code I used:
library(aniMotum)
library(tidyverse)
movePersistence <- readRDS("newton_error.rds")
x <- fit_ssm(movePersistence,
             vmax = 3,
             model = "mp",
             time.step = 2,
             control = ssm_control(verbose = 2))
aniMotum::map(x, what = "p", normalise = TRUE, silent = TRUE)
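To get the one-map-per-animal output described at the top, a simple approach (only a sketch, assuming the usual aniMotum input columns, including an id column identifying each animal) is to split the data by individual and fit/map each track separately:

# Illustrative only: one fit and one map per individual (column names assumed)
tracks <- split(movePersistence, movePersistence$id)
maps_by_id <- lapply(tracks, function(trk) {
  fit <- fit_ssm(trk, vmax = 3, model = "mp", time.step = 2)
  aniMotum::map(fit, what = "p", normalise = TRUE, silent = TRUE)
})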
From what I can tell, the issue is occurring in the Newton function of the TMB package (a dependency of aniMotum). I have noticed that if I change the time.step value, the number of errors changes. For example, on track 14 I get 13 Newton errors with a step of 2 but none with a step of 12. I am using 2 because it gives a better approximation than 12. I did try my whole dataset at various steps, and each time different tracks fail. I also tried changing the tolerance and the number of iterations in ssm_control, but that was more of a Hail Mary and did not help.
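For reference, this is roughly what the optimiser's own suggestion would look like in the call above; as far as I can tell from the documentation, fit_ssm accepts a map argument (passed on to the underlying TMB model), so fixing rho_o is a reasonable first thing to try. This is only a sketch of that suggestion, not a guaranteed fix:

# Re-fit with rho_o fixed, as the warning suggests (sketch only)
x_simple <- fit_ssm(movePersistence,
                    vmax = 3,
                    model = "mp",
                    time.step = 2,
                    map = list(rho_o = factor(NA)),
                    control = ssm_control(verbose = 2))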
Here is a dput() of track 14 to reproduce the errors on a smaller scale using the above code - https://pastebin.com/xMXzdPDh - if anyone has some recommendations I would greatly appreciate it as I do not understand the inconsistent results.
I keep getting an 'unused arguments' error when I call the SMD function
I'm using smd() as part of a larger data analysis, comparing groups created through k-means clustering. It was all working fine... until it wasn't. I'd been editing other parts of the main script, adding a derived variable.
I've puzzled over this for some time, checking the syntax and the code that creates the function arguments, all to no avail. Finally I wrote a short script to see if I have this problem with some very basic data. And I still do. The new script is:
library(smd)
Mean_x <- 75
Mean_y <- 25
n_x <- 25
n_y <- 25
sd_x <- 40
sd_y <- 20
temp_smd <- smd(Mean.1 = Mean_x, Mean.2 = Mean_y, s.1 = sd_x, s.2 = sd_y, n.1 = n_x, n.2 = n_y)
... and I get the error message
Error in smd(Mean.1 = Mean_x, Mean.2 = Mean_y, s.1 = sd_x, s.2 = sd_y, :
unused arguments (Mean.1 = Mean_x, Mean.2 = Mean_y, s.1 = sd_x, s.2 = sd_y, n.1 = n_x, n.2 = n_y)
I even tried smd::smd, in case there was a package conflict that I wasn't aware of.
All help appreciated
From the documentation for the smd package (https://cran.r-project.org/web/packages/smd/), it looks like smd() expects the following arguments:
x: a vector or matrix of values
g: a vector of at least 2 groups to compare; this should be coercible to a factor
w: a vector of numeric weights (optional)
std.error: logical indicator for computing standard errors using compute_smd_var; defaults to FALSE
na.rm: remove NA values from x? Defaults to FALSE
gref: an integer indicating which level of g to use as the reference group; defaults to 1
Whereas you're giving it a bunch of different arguments, which the function isn't able to make use of. Your arguments make sense in terms of the mathematical explanation of what smd is supposed to calculate as presented in the documentation, but the documentation doesn't make it clear (at least to me) how the arguments it's expecting relate to the number it calculates. If it were me, I'd probably write my own function to do the calculation.
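For instance, a minimal hand-rolled version working from summary statistics like yours could look like the sketch below. It uses the pooled-SD (Cohen's d) form of the standardized mean difference, so check that this matches the definition you actually need:

# Standardised mean difference from summary statistics
# (pooled-SD / Cohen's d form; verify this is the definition you want)
smd_from_summary <- function(m1, m2, s1, s2, n1, n2) {
  s_pooled <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
  (m1 - m2) / s_pooled
}

smd_from_summary(m1 = 75, m2 = 25, s1 = 40, s2 = 20, n1 = 25, n2 = 25)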
Shutting everything down and re-installing the MBESS package seems to have fixed it? It is all working now anyway! :-)
I am supposed to perform combined k-means + Gaussian mixture model clustering to determine a set of consensus clusters for a fixed number of clusters (k = 4). My data consist of 231 cells from 4 different tumor types, with a total of 19,177 variables (genes in this case).
I have never done this before, so I tried to follow the instructions for this R package: https://search.r-project.org/CRAN/refmans/diceR/html/consensus_cluster.html
However, I must have done something wrong, because when I try to run the code:
cc <- consensus_cluster(data, nk = 4, algorithms = c("gmm", "km"), progress = FALSE)
it runs for a very long time and eventually fails with this error:
Error: cannot allocate vector of size 11.0 Gb
So clearly the vector being generated is too large, and I must have misunderstood something in the tutorial.
Is anyone familiar with the diceR package who could explain whether there is a way to make this work?
consensus_cluster "eats up" memory of the R session during its execution. You have so many variables that the objects needed to handle them cannot be allocated in memory.
So you have two choices: increase physical memory, or work with a partial sample of the data rather than the full dataset. Let's assume that increasing physical memory is not feasible. Then you should use the prep.data = "sample" option. However, you'll still need to wait: I ran this on simulated data of comparable size and the GMM step alone reported an estimated 8 hours.
Please see below:
library(diceR)

observ <- 23
variables <- 19177
dat <- matrix(rnorm(observ * variables), ncol = variables)

cc <- consensus_cluster(dat, nk = 4, algorithms = c("gmm", "km"),
                        progress = TRUE, prep.data = "sample")
Output (I was not patient enough to wait for it to finish):
Clustering Algorithm 1 of 2: GMM (k = 4) [---------------------------------] 1% eta: 8h
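If subsampling is not acceptable for your analysis, another common workaround (not specific to diceR, just a sketch; the 2,000-gene cutoff is arbitrary and the code assumes cells are rows and genes are columns in your data object) is to shrink the 19,177 genes to the most variable ones before clustering, which also shrinks the memory footprint:

# Illustrative pre-filtering: keep only the most variable genes
gene_var <- apply(data, 2, var, na.rm = TRUE)
top_genes <- order(gene_var, decreasing = TRUE)[1:2000]
data_small <- data[, top_genes]

cc <- consensus_cluster(data_small, nk = 4,
                        algorithms = c("gmm", "km"), progress = TRUE)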
I am very new to R, and this is my first time encountering the eval() function. I am trying to use the med and boot.med functions from the following package: mma. I am using it to conduct mediation analysis. med and boot.med take in models such as linear models, and data frames that specify mediators and predictors, and then estimate the mediation effect of each mediator.
The author of the package gives the flexible option of specifying one's own custom.function. From the source code of med, it can be seen that the custom.function is passed to eval(). So I tried inserting the gbmt function as the custom function. However, R kept giving me the error message: Error during wrapup: Number of trees to be used in prediction must be provided. I have been searching online for days and have tried many ways of specifying the number-of-trees parameter n.trees, but nothing works (I believe others have raised similar issues: post 1, post 2).
The following lines are part of the source code of the med function:
cf1 = gsub("responseY", "y[,j]", custom.function[j])
cf1 = gsub("dataset123", "x2", cf1)
cf1 = gsub("weights123", "w", cf1)
full.model[[j]] <- eval(parse(text = cf1))
One custom function example the author gives in the package documentation is as follows:
temp1<-med(data=data.bin,n=2,custom.function = 'glm(responseY~.,data=dataset123,family="quasibinomial",
weights=weights123)')
Here glm is the custom function. This example works and you can replicate it easily (if you have mma installed and loaded). However, when I try to use the gbmt function on a survival object, I get errors; here is what my code looks like:
temp1 <- med(data = data.surv,n=2,type = "link",
custom.function = 'gbmt(responseY ~.,
data = dataset123,
distribution = dist,
train_params = start_stop,
cv_folds=10,
keep_gbm_data = TRUE,
)')
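For context on the mechanism, the source excerpt above only rewrites the custom.function string and then evaluates it, so for the author's glm example the call that actually gets run can be reproduced like this (j, y, x2 and w stand for the loop index, response, data and weights inside med):

custom <- 'glm(responseY ~ ., data = dataset123, family = "quasibinomial", weights = weights123)'
cf1 <- gsub("responseY", "y[,j]", custom)
cf1 <- gsub("dataset123", "x2", cf1)
cf1 <- gsub("weights123", "w", cf1)
cat(cf1)
# glm(y[,j] ~ ., data = x2, family = "quasibinomial", weights = w)

If that reading is right, the string only controls the model-fitting call; the later prediction step inside med is a separate internal call, which would explain why adding n.trees inside the string has not helped so far.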
Does anyone have any idea how the number-of-trees argument n.trees can be added somewhere in the above code?
Many thanks in advance!
Update: in order to replicate the example code, please install mma and try the following:
library("mma")
data("weight_behavior") ##binary x #binary y
x=weight_behavior[,c(2,4:14)]
pred=weight_behavior[,3]
y=weight_behavior[,15]
data.bin<-data.org(x,y,pred=pred,contmed=c(7:9,11:12),binmed=c(6,10), binref=c(1,1),catmed=5,catref=1,predref="M",alpha=0.4,alpha2=0.4)
temp1<-med(data=data.bin,n=2) #or use self-defined final function
temp1<-med(data=data.bin,n=2, custom.function = 'glm(responseY~.,data=dataset123,family="quasibinomial",
weights=weights123)')
I changed the custom.function to gbmt and used a survival object as responseY and the error occurs. When I use the gbmt function on my data outside the med function, there is no error.
I have 30 datasets that are combined in a data list. I want to analyze spatial point patterns with the L function along with a randomisation test. The code follows.
The first block of code works well for a single dataset (data1), but once it is applied to a list of datasets with lapply() as shown in the second block, it gives me a very long error like so:
"Error in Kcross(X, i, j, ...) : No points have mark i = Acoraceae
Error in envelopeEngine(X = X, fun = fun, simul = simrecipe, nsim =
nsim, : Exceeded maximum number of errors"
Can anybody tell me what is wrong with the second block of code?
grp <- factor(data1$species)
window <- ripras(data1$utmX, data1$utmY)
pp.grp <- ppp(data1$utmX, data1$utmY, window = window, marks = grp)
L.grp <- alltypes(pp.grp, Lest, correction = "Ripley")
LE.grp <- alltypes(pp.grp, Lcross, nsim = 100, envelope = TRUE)
plot(L.grp)
plot(LE.grp)
L.LE.sp <- lapply(data.list, function(x) {
  grp <- factor(x$species)
  window <- ripras(x$utmX, x$utmY)
  pp.grp <- ppp(x$utmX, x$utmY, window = window, marks = grp)
  L.grp <- alltypes(pp.grp, Lest, correction = "Ripley")
  LE.grp <- alltypes(pp.grp, Lcross, envelope = TRUE)
  result <- list(L.grp = L.grp, LE.grp = LE.grp)
  return(result)
})
plot(L.LE.sp[[1]]$LE.grp)
This question is about the R package spatstat.
It would help if you could add a minimal working example including data which demonstrate this problem.
If that is not available, please generate the error on your computer, then type traceback() and capture the output and post it here. This will trace the location of the error.
Without this information, my best guess is the following:
The error message says No points have mark i=Acoraceae. That means that the code is expecting a point pattern to include points of type Acoraceae but found that there were none. This can happen because in alltypes(... envelope=TRUE) the code generates random point patterns according to complete spatial randomness. In the simulated patterns, the number of points of type Acoraceae (say) will be random according to a Poisson distribution with a mean equal to the number of points of type Acoraceae in the observed data. If the number of Acoraceae in the actual data is small then there is a reasonable chance that the simulated pattern will contain no Acoraceae at all. This is probably what is causing the error message No points have mark i=Acoraceae.
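As a quick numeric illustration of why this hits rare types (the counts here are made up, not taken from the poster's data), the chance that a simulated CSR pattern contains no points of a given type is the Poisson probability of zero:

# P(zero points of a type) when the simulated count is Poisson(lambda)
dpois(0, lambda = 3)   # ~0.05 if the observed pattern has 3 points of that type
dpois(0, lambda = 1)   # ~0.37 for a very rare type

With nsim around 100, it then becomes very likely that at least one simulation drops the type entirely, which is presumably what triggers the "Exceeded maximum number of errors" message.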
If this interpretation is correct then you should be able to suppress the error by including the argument fix.marks=TRUE, that is,
alltypes(pp.grp, Lcross, envelope=TRUE, fix.marks=TRUE, nsim=99)
I'm not suggesting this is necessarily appropriate for your application, but this should remove the error message if my guess is correct.
In the latest development version of spatstat, available on github, the code for envelope has been tweaked to detect this error.
I am currently working on a project where I have to do some feature selection for building a predictive model. I was led to an R package called mRMRe. I am just trying to work through the example but cannot get it working. The example can be found here - http://www.inside-r.org/packages/cran/mRMRe/docs/mRMR.ensemble.
Here is my code -
data(cgps)
data <- data.frame(target=cgps.ic50, cgps.ge)
mRMR.ensemble(data, 1, rep.int(1, 30))
When I run this code I get the error -
Error in .local(.Object, ...) : data must be of type mRMRe.Data.
I dug a little further and found that you actually have to convert the data to the mRMRe.Data type with mRMR.data(). So I made this update -
# Update
data <- mRMR.data(data = data.frame(target=cgps.ic50, cgps.ge))
mRMR.ensemble(data, 1, rep.int(1, 30))
but I still get the same error. When I look at the class I have -
> class(data)
[1] "mRMRe.Data"
attr(,"package")
[1] "mRMRe"
So the data is the requested type but the code is still not functional.
My question is whether anyone has experience using this package; any help or comments would be appreciated!
I also want to note that when I load the data, the object names are not the same as in the linked example:
cgps_ic50 -> cgps.ic50
cgps_ge -> cgps.ge
so the names in my session differ from those used in the example.
With the code you wrote:
data(cgps)
data <- mRMR.data(data = data.frame(target=cgps.ic50, cgps.ge))
mRMR.ensemble(data, 1, rep.int(1, 30))
The function mRMR.ensemble is receiving the data as its first positional argument, but the first parameter of this function is solution_count.
I understand that your intention in running that example is to find 30 relevant and non-redundant features using the classic mRMR feature-selection algorithm, so try this:
data(cgps)
data <- mRMR.data(data = data.frame(target=cgps.ic50, cgps.ge))
mRMR.ensemble(data = data, target_indices = 1,
feature_count = 30, solution_count = 1)
The target_indices are the positions, in the original data.frame, of the features whose relevance is to be maximised (by correlation or another quality measure), so the features selected in the end will be good for explaining the features indicated by target_indices.
For example, in a classification problem, we would choose the position of the class variable as the value for the target_indices parameter.
The feature_count parameter indicates the number of variables to be chosen.
The solution_count is not a parameter of the classic mRMR. It indicates the number of mRMR algorithms to be ensembled to get a final feature selection, so if set to 1 it performs only one classic mRMR.
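If I remember the mRMRe accessors correctly (solutions() and featureNames(); do check the package help pages), you can then pull the selected feature names out of the result roughly like this:

result <- mRMR.ensemble(data = data, target_indices = 1,
                        feature_count = 30, solution_count = 1)

# Indices of the selected features for the first (and only) target
sel <- solutions(result)[[1]]

# Map those indices back to column names of the original data
featureNames(data)[sel]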