Simulating SBML models with coupled compartment and species rate rules - simulator

I am working on the Systems Biology Simulation Core Library (SBSCL), where we are currently simulating the SBML models from the SBML Test Suite. However, I am having issues simulating SBML models that have coupled compartment and species rate rules where the species are in concentration units (i.e. the species' concentration depends on the value of the compartment). Models with this property can be found in the SBML Test Suite, one of them being test case 1198.
A series of discussions on this issue can be found at the sbml-discuss Google group [Link]. I have also created an issue for this in SBSCL.
What is the best way to simulate this type of SBML model?

If you can treat your species concentration like a parameter, this is the easiest solution: simply set the rate of change directly. If asked to compute the amount of the species, multiply that concentration by the compartment size.
If, however, your simulator is like most SBML simulators, your fundamental unit of change is the species amount. This makes this scenario particularly awkward, because the rate of change of the species amount must be derived from the rate of change of the species concentration as well as the rate of change of the compartment size.
However, it's still doable. If we use "S1" as the species amount, "[S1]" as the species concentration, and "C" as the compartment size:

    dS1/dt = C * d[S1]/dt + [S1] * dC/dt

Again, this applies in the scenario where dC/dt and d[S1]/dt are defined, and dS1/dt is unknown.
This can be derived as follows: since amount = concentration * size, we have S1 = [S1] * C, and differentiating both sides gives

    dS1/dt = d([S1] * C)/dt = C * d[S1]/dt + [S1] * dC/dt

(via the product rule).
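As a minimal sketch of this bookkeeping (not the SBSCL implementation; the rate rules dC/dt = 0.5*C and d[S1]/dt = 0.1 are made up for illustration, not taken from test case 1198), an amount-based integrator can apply the formula above as follows, here using R's deSolve package:

    library(deSolve)

    rhs <- function(t, state, parms) {
      with(as.list(state), {
        conc  <- S1 / C    # species concentration, [S1], recomputed each step
        dC    <- 0.5 * C   # compartment rate rule (assumed for illustration)
        dconc <- 0.1       # concentration rate rule (assumed for illustration)
        # dS1/dt = C * d[S1]/dt + [S1] * dC/dt  (product rule)
        dS1 <- C * dconc + conc * dC
        list(c(dC, dS1))   # derivatives in the same order as `state`
      })
    }

    out <- ode(y = c(C = 1, S1 = 1), times = seq(0, 5, by = 0.1),
               func = rhs, parms = NULL)
    head(out)

The state vector carries the amount S1; the concentration is derived as S1/C whenever a rate rule needs it, so both rate rules can be evaluated in their native units.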

Related

What is Threshold in the Evaluate Model Module?

Notice in the image below: if I increase the value of "Threshold," the accuracy of the model seems to increase (with diminishing returns after about 0.62).
What does this mean and can I somehow update this value such that my model will retain this setting?
For example, I am using a boosted decision tree, but I don't see any such value for "threshold."
Ref. https://learn.microsoft.com/en-us/previous-versions/azure/machine-learning/studio-module-reference/evaluate-model?redirectedfrom=MSDN
The Threshold here is the probability cutoff used to turn the model's scored probability into a binary class label before the evaluation metrics are computed. A scored row with probability above the threshold is labeled positive, otherwise negative; the default is 0.5.
For example, if a boosted decision tree scores a row at 0.7, that row is classified as positive at a threshold of 0.5 but as negative at a threshold of 0.8. Moving the slider re-labels the scored dataset and recomputes the metrics, which is why the accuracy changes as you drag it.
It is true that increasing the threshold can increase accuracy up to a point, and it also shifts the trade-off between precision and recall. The threshold is an evaluation-time setting rather than a parameter stored with the trained model, which is why you don't see any such value on the boosted decision tree itself.
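To see the effect concretely, you can re-label a fixed set of scored probabilities at different cutoffs and recompute accuracy. A minimal sketch in R (the probabilities and labels are made up for illustration):

    # Hypothetical scored probabilities and true labels
    probs <- c(0.15, 0.35, 0.48, 0.55, 0.62, 0.71, 0.80, 0.91)
    truth <- c(0, 0, 1, 0, 1, 1, 1, 1)

    for (thr in c(0.30, 0.50, 0.62, 0.80)) {
      pred <- as.integer(probs >= thr)  # re-label at this threshold
      cat(sprintf("threshold = %.2f  accuracy = %.2f\n",
                  thr, mean(pred == truth)))
    }

Each pass through the loop corresponds to one position of the slider: the model and its scores never change, only the labeling rule does.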
There are also blog posts on the importance of thresholds in decision trees, e.g. at https://www.geeksforgeeks.org/.

clustering standard errors within MLMs/lme4

Is it possible to use both cluster standard errors and multilevel models together and how does one implement this in R?
In my set-up I am running a conjoint experiment in 26 countries with 2000 participants per country. As in any conjoint experiment, each participant is shown two vignettes and asked to choose/rate each vignette. The same participant is then shown two fresh vignettes and asked to repeat the task, so each participant performs two comparisons. The hierarchy is thus comparisons nested within individuals nested within countries. I am currently running a multilevel model with each comparison at level 1 and country as the level 2 unit. Obviously comparisons within individuals are likely to be correlated, so I'd like to cluster standard errors at the individual level as well. It seems overkill to add another level in the MLM for this, since my clusters are extremely small (n = 2) and it makes more sense to do my analysis at the individual level (not to mention that it would unnecessarily complicate the model, since with 2000 individuals x 26 countries the parameter space becomes huge). Is this possible? If so, how does one do this in R together with a multilevel model set-up?
The cluster size of 2 is not an issue, and I don't see any issue with the parameter space either. If you fit random intercepts for participants and countries, these are estimated as latent normally distributed variables (variance components), not as one fixed parameter per cluster. A model such as:
lmer(outcome ~ fixed_effects + (1 | country/participant), data = dat)
will handle the dependencies within clusters (at the participant level and the country level), so there is no need for clustered standard errors.
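A minimal sketch of that model, assuming a data frame dat with columns outcome, treatment (standing in for your fixed effects), country, and participant:

    library(lme4)

    # (1 | country/participant) expands to
    # (1 | country) + (1 | country:participant), i.e. nested random intercepts
    m <- lmer(outcome ~ treatment + (1 | country/participant), data = dat)
    summary(m)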

Singularity in Linear Mixed Effects Models

Dataset Description: I use a dataset with neuropsychological (np) tests from several subjects. Every subject has more than one test in his/her follow-up, i.e. one test per year. I study the cognitive decline in these subjects. The information that I have is: Individual number (identity number), Education (years), Gender (M/F as factor), Age (years), and Time from Baseline (= years after the first np test).
AIM: My aim is to measure the rate of change in their np tests, i.e. the cognitive decline per year for each of them. To do that I use Linear Mixed Effects Models (LMEM), taking into account the above parameters, and I compute the slope for each subject.
Question: When I run the possible models (combining different parameters every time), I also check their singularity, and the result in almost all cases is TRUE. So my models present singularity! If I wanted to use these models for prediction this would be bad, as it means the model overfits the data. But since I just want to find the slope for each individual, I think this is not a problem; or even better, I think it is an advantage, as in that case singularity offers a more precise calculation of the subjects' slopes. Do you think this reasoning is correct?
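For reference, a minimal sketch in lme4 of the kind of model and singularity check described (the data frame np and its column names are assumptions for illustration):

    library(lme4)

    # Random intercept and slope in time for each subject
    m <- lmer(Score ~ TimeFromBaseline + Age + Education + Gender +
                (TimeFromBaseline | ID), data = np)

    isSingular(m)  # TRUE flags a singular fit, e.g. a variance component at 0
    coef(m)$ID     # per-subject intercepts and slopes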

Did I just do an ANCOVA or MANOVA?

I’m trying to do an ANCOVA here ...
I want to analyze the effect of EROSION FORCE and ZONATION on all the species (listed in lowercase) in each POOL.STEP (ranging from 1-12/1-4), while controlling for the effect of FISH.
I’m not sure if I’m doing it right. What is the command for ANCOVA?
So far I used lm(EROSIONFORCE ~ ZONATION + FISH, data = d), which yields:
So what I see here is that both erosion force percentage (intercept?) and sublittoral zonation are significant in some way, but I'm still not sure whether I've done an ANCOVA correctly here or whether this is just an ANOVA.
In general, ANCOVA (analysis of covariance) is simply a special case of the general linear model with one categorical predictor (factor) and one continuous predictor (the "covariate"), so lm() is the right function to use.
However ... the bottom line is that you have a moderately challenging statistical problem here, and I would strongly recommend that you try to get local help (if you're working within a research group, can you consult with others in your group about appropriate methods?). I would suggest following up either on CrossValidated or r-sig-ecology@r-project.org.
By putting EROSIONFORCE on the left side of the formula, you're specifying that you want to use EROSIONFORCE as a response (dependent) variable, i.e. your model is estimating how erosion force varies across zones and with fish numbers; it says nothing about species responses.
If you want to analyze the response of a single species to erosion and zone, controlling for fish numbers, you need something like
lm(`Acmaeidae s...` ~ EROSIONFORCE + ZONATION + FISH, data = your_data)
The lm() suggestion above handles each species independently, i.e. you'd have to run a separate analysis for each species. If you also want to do it separately for each POOL.STEP, you're going to have to do a lot of separate analyses. There are various ways of automating this in R; the most idiomatic is probably to melt your data (see reshape2::melt or tidyr::gather) into long format and then use lmList from lme4 (sketched below).
Since you have count data with low means, i.e. lots of zeros (and a few big values), you should probably consider a Poisson or negative binomial model, and possibly even a zero-inflated/hurdle model (i.e. analyze presence/absence and the size of positive responses separately).
If you really want to analyze the joint distribution of all species (i.e. a multivariate response, which is the M in MANOVA), you're going to have to work quite a bit harder ... there are a variety of joint species distribution models by people like Pierre Legendre, David Warton and others ... I'd suggest starting with the mvabund package, but you might need to do some reading first.
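A minimal sketch of the melt-then-lmList approach mentioned above, assuming the species counts live in columns of d alongside EROSIONFORCE, ZONATION, FISH, and POOL.STEP (everything beyond those names is illustrative):

    library(reshape2)
    library(lme4)

    # Reshape: one row per (observation, species) pair
    long <- melt(d,
                 id.vars       = c("EROSIONFORCE", "ZONATION", "FISH", "POOL.STEP"),
                 variable.name = "species",
                 value.name    = "count")

    # One separate linear model per species
    fits <- lmList(count ~ EROSIONFORCE + ZONATION + FISH | species, data = long)
    summary(fits)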

How can I include repeated measures to my lmer correctly

In my study I sampled the same sites in different regions over many years. Each site has different properties in each year, which is important for my research question. I want to know if the properties of the sites affect biodiversity on the sites, and I am interested in the interaction of the properties and the regions.
Overview:
Biodiversity = response
Site property = fixed factor, changes every year
Region = fixed factor, same regions every year
Site = random effect, is repeatedly sampled in the different sampling years
Year = random effect, is the factor in which "site" is repeated
At the moment my model looks like this:
mod1 <- lmer(biodiversity ~ region*siteProperty + (1|Year) + (1|site))
I'm not sure if this accounts for the repeated measures.
Alternatively I was thinking about this one, as it also includes the nesting of the sites within years, but maybe that is not necessary:
mod2 <- lmer(biodiversity ~ region*siteProperty + (1|Year) + (1|Year:site))
The problem with this approach is that it only works if my site properties are not zero. But I have zeros in different properties, and I need to analyse their effects as well.
If you need more information, just ask.
Thanks for your help!
Your first example,
mod1 <- lmer(biodiversity ~ region*siteProperty + (1|Year) + (1|site))
should be fine (although I'd encourage you to use the data argument explicitly ...). If you have samples for "many years" for each site, you might want to consider
including a trend model, i.e. include Year as a numeric variable in the fixed-effects part of the model as well, either as a simple linear term or as part of an additive model, e.g. using splines::ns (sketched at the end of this answer)
checking for/allowing for autocorrelation (although this is tricky in lme4; you can use nlme::lme instead, but then crossed random effects of Year and site become harder).
If you have one sample per site/year combination, you don't want (1|Year:site), because that term would be confounded with the residual variability term.
Your statement "only works if my site properties are not zero" doesn't make sense to me: in general, having zeros in predictor variables shouldn't cause any problems ... ?
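For concreteness, a sketch of mod1 with an explicit data argument and a spline trend in year added (a data frame dat with the variables described above is assumed; Year is a factor, so a numeric copy is made for the trend):

    library(lme4)
    library(splines)

    dat$yearNum <- as.numeric(as.character(dat$Year))

    # Random intercepts for Year and site, plus a smooth fixed-effect trend in year
    mod3 <- lmer(biodiversity ~ region * siteProperty + ns(yearNum, 3) +
                   (1 | Year) + (1 | site), data = dat)
    summary(mod3)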
