Piecewise regression: reproducibility problems with breakpoint detection in the segmented R package

I'm trying to fit a three-piece regression to my data with the segmented package, and I'm a bit lost.
First, here is a reproducible example:
library(segmented)
y=c(520.0000, 620.0000, 653.3333, 853.3333, 1220.0000, 1553.3333, 1586.6667, 1586.6667, 1586.6667, 1586.6667, 1586.6667)
x=c(33320, 41020, 49020, 56920, 69220, 76320, 86320, 95420, 103720, 111520, 120320)
plot(y~x)
out=lm(y~x)
My data, with 2 visible breakpoints:
- First I tried specifying the known number of breakpoints with K=2:
mdl2=segmented(out, seg.Z=~x, psi=NA, control=seg.control(K=2, n.boot=0, it.max=500, stop.if.error=FALSE, display=TRUE))
plot(mdl2)
points(y~x)
This gives me a result with only one breakpoint:
- But if I set 2<K<8 (so a wrong value), the right number of breakpoints is detected:
- One last point puzzles me:
If I set K=4, the output printed by the display option shows a result with 3 breakpoints, but the object returned by the function still has two breakpoints.
******EDIT OF THE 09/19/2016******
I also tried specifying psi directly, since I have some prior knowledge of the breakpoint locations (although that is not my goal), and the results from segmented are still really bad.
For some regressions I have to run the function many times before the algorithm succeeds in ending with a solution. The proposed solutions also often have reproducibility problems.
Does anyone know a way to robustly estimate these breakpoints? My data don't look that hard to fit, do they?
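One deterministic way to sidestep the dependence on starting values is an exhaustive grid search over pairs of candidate breakpoints, keeping the pair with the smallest residual sum of squares. The following is only a sketch under stated assumptions, and it is not the algorithm that segmented uses: the candidates are restricted to the interior observed x values to keep the grid small, the fitted line is forced to be continuous through hinge terms pmax(x - b, 0), and the names cand, grid, rss and best are illustrative.

```r
# Data from the question
y <- c(520.0000, 620.0000, 653.3333, 853.3333, 1220.0000, 1553.3333,
       1586.6667, 1586.6667, 1586.6667, 1586.6667, 1586.6667)
x <- c(33320, 41020, 49020, 56920, 69220, 76320, 86320, 95420,
       103720, 111520, 120320)

# Candidate breakpoints: interior observed x values (an assumption of this sketch)
cand <- x[2:(length(x) - 1)]

# All pairs (b1, b2) with b1 < b2; cand is sorted, so combn() preserves the order
grid <- t(combn(cand, 2))

# Residual sum of squares of the continuous three-segment fit for each pair
rss <- apply(grid, 1, function(b) {
  fit <- lm(y ~ x + pmax(x - b[1], 0) + pmax(x - b[2], 0))
  deviance(fit)
})

# Pair of breakpoints with the smallest RSS, and the corresponding fit
best <- grid[which.min(rss), ]
best_fit <- lm(y ~ x + pmax(x - best[1], 0) + pmax(x - best[2], 0))
print(best)
```

Because nothing here is random, repeated runs return the same pair. A finer grid between the observed x values would refine the estimate, and the resulting pair could also be tried as the psi starting values for segmented.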

Related

How to remove the models that failed convergence from a set of random questions?

I want to include some random replications of model estimations (e.g., GARCH model) in the question. The code uses a different data series randomly. In this process, some GARCH estimations for some random data series may not achieve numerical convergence. Therefore, I need to code the question/problem in such a way that it has to remove the models that failed convergence from the set of questions. How can I code this when I use R-exams?

Basic idea
In general when using random data in the generation of exercises, there is a chance that sometimes something goes wrong, e.g., the solution does not fall into a desired range (i.e., becomes too large or too small), or the solution does not even exist due to mathematical intractability or numerical problems (as you point out) etc.
Of course, it is best to avoid such problems in the data-generating process so that they do not occur at all. However, it is not always possible to do so, or not worth the effort because problems occur very rarely. In such situations I typically use a while() loop to re-generate the random data if necessary. As this might potentially run for several iterations, it is important, though, to make the probability that it is needed sufficiently small.
Worked example
A worked example can be found in the fourfold exercise that ships with the package. It randomly generates a fourfold table with probabilities that should subsequently be reconstructed from partial information in the actual exercise. In order for the exercise to be well-defined all entries of the table must be (strictly) between 0 and 1 and they must sum up to 1. The simulation code actually tries to assure that but edge cases might occur. Rather than writing more code to avoid these edge cases, a simple while() loop tries to catch them and sample a new table if needed:
ok <- FALSE
while(!ok) {
[...generate probabilities...]
tab <- cbind(c(prob1, prob3), c(prob2, prob4))
[...compute solutions...]
ok <- sum(tab) == 1 & all(tab > 0) & all(tab < 1)
}
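For readers who want to run the pattern, here is a self-contained sketch of the same loop. The generation step is elided in the excerpt above, so the generator below is a hypothetical stand-in and not the code from the fourfold exercise; the names pA, pB_A and pB_notA are assumptions. The sketch also compares the sum with a tolerance via all.equal(), a choice made here to guard against floating-point rounding.

```r
set.seed(42)

ok <- FALSE
while (!ok) {
  # Hypothetical generator: one marginal and two conditional probabilities
  pA      <- round(runif(1, 0.1, 0.9), 2)
  pB_A    <- round(runif(1, 0.1, 0.9), 2)
  pB_notA <- round(runif(1, 0.1, 0.9), 2)

  # Joint probabilities of the fourfold table
  prob1 <- pA * pB_A
  prob2 <- (1 - pA) * pB_notA
  prob3 <- pA * (1 - pB_A)
  prob4 <- (1 - pA) * (1 - pB_notA)
  tab <- cbind(c(prob1, prob3), c(prob2, prob4))

  # Accept the table only if it is a valid probability table
  ok <- isTRUE(all.equal(sum(tab), 1)) && all(tab > 0) && all(tab < 1)
}
print(tab)
```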
Application to catching errors
The same type of strategy could also be used for other problems such as the ones you describe. You can wrap the model estimation in code like
fit <- try(mymodel(...), silent = TRUE)
and then use something like
ok <- !inherits(fit, "try-error")
In addition to not producing an error, you might require, say, that all coefficients are positive (or something like that). Then you would do:
ok <- !inherits(fit, "try-error") && all(coef(fit) > 0)
Analogously, you could check the convergence of the model etc.
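Putting the pieces together, a minimal sketch of the full loop might look as follows. It uses a logistic regression fitted with glm() as a stand-in for the GARCH model, because glm() ships with base R and exposes a converged flag; the cap max_attempts is an assumption added here so that the loop cannot run forever.

```r
set.seed(1)

ok <- FALSE
attempts <- 0
max_attempts <- 50  # safety cap (assumption of this sketch)

while (!ok && attempts < max_attempts) {
  attempts <- attempts + 1

  # Re-generate the random data on every pass
  x <- rnorm(30)
  y <- rbinom(30, 1, plogis(0.5 + x))

  # Fit the model; errors are caught instead of stopping the exercise
  fit <- try(glm(y ~ x, family = binomial), silent = TRUE)

  # Accept only fits that ran without error and report convergence
  ok <- !inherits(fit, "try-error") && isTRUE(fit$converged)
}

if (!ok) stop("no converged fit within max_attempts")
```

For another model class the convergence check would use whatever indicator that class provides; fit$converged is specific to glm objects.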

Estimation to plot person-item map not feasible because items "have no 0-responses" in data matrix

I am trying to create a person-item map that orders the questions from a dataset by difficulty. I am using the eRm package, and the output should look as follows:
[person-item map] (https://hansjoerg.me/post/2018-04-23-rasch-in-r-tutorial_files/figure-html/unnamed-chunk-3-1.png)
In one of the earlier steps, before running the function that outputs the map, I have to fit the model to the dataset to obtain the object that the plotting function uses to create the actual map, but I get an error when creating that object.
I have already reviewed some documentation, listed here in case the extra information is useful:
[Tutorial] https://hansjoerg.me/2018/04/23/rasch-in-r-tutorial/#plots
[Ploting function] https://rdrr.io/rforge/eRm/man/plotPImap.html
[Documentation] https://eeecon.uibk.ac.at/psychoco/2010/slides/Hatzinger.pdf
Now, this is the code that I am using. First, I install and load the respective libraries and the data:
> library(eRm)
> library(ltm)
Loading required package: MASS
Loading required package: msm
Loading required package: polycor
> library(difR)
Then I fit the PCM to generate the object of class Rm, and this is where the error appears:
*The PCM function is specific to polytomous data; if I use a different one, the output says that I am not using a dichotomous dataset.
> res <- PCM(my.data)
Warning:
The following items have no 0-responses:
AUT_10_04 AUN_07_01 AUN_07_02 AUN_09_01 AUN_10_01 AUT_11_01 AUT_17_01
AUT_20_03 CRE_05_02 CRE_07_04 CRE_10_01 CRE_16_02 EFEC_03_07 EFEC_05
EFEC_09_02 EFEC_16_03 EVA_02_01 EVA_07_01 EVA_12_02 EVA_15_06 FLX_04_01
... [rest of items]
Responses are shifted such that lowest category is 0.
Warning:
The following items do not have responses on each category:
EFEC_03_07 LC_07_03 LC_11_05
Estimation may not be feasible. Please check data matrix
I should clarify that the whole dataset ranges from 1 to 5. It is a polytomous Likert dataset.
Finally, I try to use the plot function and it produces no output; the system just keeps loading indefinitely with no answer.
>plotPImap(res, sorted=TRUE)
I would like to add the description of that particular function and the arguments:
>PCM(X, W, se = TRUE, sum0 = TRUE, etaStart)
#X
Input data matrix or data frame with item responses (starting from 0);
rows represent individuals, columns represent items. Missing values are
inserted as NA.
#W
Design matrix for the PCM. If omitted, the function will compute W
automatically.
#se
If TRUE, the standard errors are computed.
#sum0
If TRUE, the parameters are normed to sum-0 by specifying an appropriate
W.
If FALSE, the first parameter is restricted to 0.
#etaStart
A vector of starting values for the eta parameters can be specified. If
missing, the 0-vector is used.
I do not understand why the scores need to start from 0. I think that is what the error is trying to say, but I don't quite understand that output.
I would highly appreciate any hint that you can provide.
Feel free to ask for any information that could help solve this issue.

The problem is not caused by the fact that there are items with no 0-responses. The model corrects this automatically by shifting the response categories so that the lowest one is 0, as the warning itself reports. (You'll notice that the PI-map that you linked to is centered on zero. Also, I believe the map you linked to is of dichotomous data. Polytomous data should include the scale categories on the PI-map, I believe.)
Without being able to see your data, it is impossible to know the exact cause, though.
It may be that the model is not converging. That may be what this message was alluding to: Estimation may not be feasible. Please check data matrix. You could check by entering res at the prompt. If the model was able to converge you should see something like:
Conditional log-likelihood: -2.23709
Number of iterations: 27
Number of parameters: 8
...
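Since the second warning asks you to check the data matrix, it can help to tabulate which categories each item actually uses. The sketch below uses base R only and a small hypothetical data frame toy in place of my.data; the item names and the 1 to 5 scale are assumptions taken from the question's description.

```r
# Hypothetical stand-in for my.data: three Likert items scored 1..5
toy <- data.frame(
  item_a = rep(1:5, times = 8),            # uses every category
  item_b = rep(2:5, times = 10),           # never uses category 1
  item_c = rep(c(1, 2, 4, 5), times = 10)  # never uses category 3
)

# Shift so that the lowest possible category is 0, as PCM() expects
shifted <- toy - 1

# Count responses per category (0..4) for every item
counts <- sapply(shifted, function(col) table(factor(col, levels = 0:4)))
print(counts)

# Items with at least one unused category
missing_cat <- names(which(colSums(counts == 0) > 0))
print(missing_cat)
```

Items flagged this way correspond to the ones listed under "do not have responses on each category", and they are natural candidates to inspect first.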

Does your data contain answers with decimal numbers? I ran into the same error and solved it by using the dplyr::dense_rank() function:
df_ranked <- sapply(df_decimal_data, dense_rank)
That worked.
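To see what this recoding does, here is a base R sketch of the same idea: each column's distinct values are replaced by consecutive integers 1, 2, 3, and so on, in increasing order. The helper dense_rank_base and the example values are hypothetical; the helper only mimics dense ranking for illustration and is not dplyr's implementation.

```r
# Hypothetical data with decimal response codes
df_decimal_data <- data.frame(
  a = c(1.5, 2.0, 1.5, 3.25),
  b = c(0.5, 0.5, 4.0, 2.5)
)

# Dense rank: position of each value among the sorted distinct values
dense_rank_base <- function(v) match(v, sort(unique(v)))

df_ranked <- sapply(df_decimal_data, dense_rank_base)
print(df_ranked)
```

Note that the ranks start at 1 and that the spacing between the original values is discarded, so this only makes sense when the decimals are category labels and not measurements.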

How do I use prodlim function with a non-binary variable in formula?

I am trying to (eventually) plot data by groups, using the prodlim function.
I'm adjusting and adapting code that someone else (not available for questions) has written, and I'm not very familiar with the prodlim library/function. There are definitely other ways to do what I'd like to, but I'm trying to keep it consistent with what the previous person did.
I have code that works, when dividing the data into 2 groups, but when I try to adjust for a 4 group situation, I get an error.
Of note, the data is coming over from SAS using StatTransfer, which has been working fine.
I am new to coding, but I have compared the dataframes I'm trying to work with. The second is just a subset of the first (where the code does work), with all the same variables, and both of the variables I'm trying to group by are integer values.
Hist(medpop$dz_time, medpop$dz_status) works just fine, so the problem must be with the prodlim function, and sadly I haven't understood much of what I've looked up about it. But the documentation seems to indicate that it supports continuous or categorical variables, and it doesn't seem limited to binary ones either. None of the options seem applicable as I understand them.
this works:
M <- prodlim(Hist(dz_time, dz_status)~med, data=pop)
where med is a binary value =1 when a member of this population is taking it, and dz is a disease that some portion develop.
this does not:
(either of these get the error as below)
N <- prodlim(Hist(dz_time, dz_status)~strength, data=medpop)
N <- prodlim(Hist(dz_time, dz_status)~strength, data=pop, subset=pop$med==1)
medpop = the subset of the original population taking the med,
strength = categorical variable ("1","2","3","4")
For the line that does work, the next step is just plot(M), giving a plot with two lines, med==0 and med==1 (showing cumulative incidence of dz_status by dz_time).
For the other line, I get an error saying
Error in KernSmooth::dpik(cumtabx/N, kernel = "box") :
scale estimate is zero for input data
I don't know what that means or how to fix it.. :/
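One possible explanation, which cannot be confirmed without the data: the prodlim documentation describes a discrete.level argument (default 3), and numeric covariates with more distinct values than that are treated as continuous and smoothed over neighborhoods whose bandwidth comes from KernSmooth::dpik. A strength variable stored as an integer with four distinct values would fall into that case. Under that assumption, converting it to a factor should give one curve per group. The data frame toy below is a hypothetical stand-in for medpop, and the model call only runs if prodlim is installed.

```r
set.seed(1)

# Hypothetical stand-in for medpop
toy <- data.frame(
  dz_time   = rexp(200, rate = 0.1),
  dz_status = rbinom(200, 1, 0.6),
  strength  = rep(1:4, times = 50)  # stored as integer, four distinct values
)

# Make the grouping variable explicitly categorical
toy$strength_f <- factor(toy$strength)

if (requireNamespace("prodlim", quietly = TRUE)) {
  library(prodlim)
  # One product-limit curve per level of strength_f
  N <- prodlim(Hist(dz_time, dz_status) ~ strength_f, data = toy)
  print(N)
}
```

With the real data the equivalent step would be to convert strength to a factor in medpop before calling prodlim and then plot the result as before.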

GLMM's for meta-analysis - error using metabin

I'm trying to run a generalised linear mixed effects (binomial-normal) meta-analysis for 7 randomised studies, where each study records the presence of an adverse event within the treatment and placebo populations (exposure and control).
To do this, I'm hoping to use the metabin function (meta package). However, I'm getting an error and I'm not sure why. E.g. running this code:
install.packages('meta')
library(meta)
# Data
data<-data.frame(exposure.events=c(11,34,152,4,60,3,25), exposure.population=c(184,152,9500,77,2012,15,60), control.events=c(3,33,4729,133,1441,1,25), control.population=c(184,375,613978,15865,480485,105,238), Study=c("1","2","3","4","5","6","7"))
# Calling metabin
metabin(event.e=exposure.events, n.e=exposure.population, event.c=control.events, n.c=control.population, studlab=Study, data=data, method="GLMM",model.glmm = "CM.AL",method.tau = "ML")
I get this output:
Error in metafor::rma.glmm(ai = event.e[!exclude], n1i = n.e[!exclude], :
Cannot fit ML model.
I've also tried calling the rma.glmm function directly (instead of doing this via metabin), but get the same error message. I've also tried reading the source code for rma.glmm but I'm not sure I understand what's going on. However, I think the issue is related to the third study (the largest), and in particular the size of the control population, as both of the following run smoothly:
# Modifying 3rd row's control population
data<-data.frame(exposure.events=c(11,34,152,4,60,3,25), exposure.population=c(184,152,9500,77,2012,15,60), control.events=c(3,33,4729,133,1441,1,25), control.population=c(184,375,61378,15865,480485,105,238), Study=c("1","2","3","4","5","6","7"))
metabin(event.e=exposure.events, n.e=exposure.population, event.c=control.events, n.c=control.population, studlab=Study, data=data, method="GLMM",model.glmm = "CM.AL",method.tau = "ML")
# Deleting 3rd row
data<-data.frame(exposure.events=c(11,34,4,60,3,25), exposure.population=c(184,152,77,2012,15,60), control.events=c(3,33,133,1441,1,25), control.population=c(184,375,15865,480485,105,238), Study=c("1","2","3","4","5","6"))
metabin(event.e=exposure.events, n.e=exposure.population, event.c=control.events, n.c=control.population, studlab=Study, data=data, method="GLMM",model.glmm = "CM.AL",method.tau = "ML")
Is this a convergence problem, and does anyone know if there is any way around this? The only other thing I can find about this error message is for a problem (and thus solution) which does not apply to me.
Any help would be really appreciated :)
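To make the suspicion about the third study easier to examine, the per-study sample sizes, event rates and log odds ratios can be tabulated with base R before fitting anything. This is only a diagnostic sketch of the data from the question; it does not fit the GLMM and does not by itself explain the error. The added column names are illustrative.

```r
# Data from the question
data <- data.frame(
  exposure.events     = c(11, 34, 152, 4, 60, 3, 25),
  exposure.population = c(184, 152, 9500, 77, 2012, 15, 60),
  control.events      = c(3, 33, 4729, 133, 1441, 1, 25),
  control.population  = c(184, 375, 613978, 15865, 480485, 105, 238),
  Study               = c("1", "2", "3", "4", "5", "6", "7")
)

# Total sample size and event rates per study
data$total.n       <- data$exposure.population + data$control.population
data$exposure.rate <- data$exposure.events / data$exposure.population
data$control.rate  <- data$control.events / data$control.population

# Log odds ratio per study (exposure versus control)
odds_e <- data$exposure.events / (data$exposure.population - data$exposure.events)
odds_c <- data$control.events / (data$control.population - data$control.events)
data$log.or <- log(odds_e / odds_c)

# Largest studies first
print(data[order(-data$total.n),
           c("Study", "total.n", "exposure.rate", "control.rate", "log.or")])
```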

Correctly setting up Shannon's Entropy Calculation in R

I was trying to run some entropy() calculations on force platform data and I get a warning message:
> library(entropy)
> d2 <- read.csv("c:/users/SLA9DI/Documents/data2.csv")
> entropy(d2$CoPy, method="MM")
[1] 10.98084
> entropy(d2$CoPx, method="MM")
[1] 391.2395
Warning message:
In log(freqs) : NaNs produced
I am sure it is because entropy() is trying to take the log of a negative number. I also know R can handle complex numbers using complex(); however, I have not been successful in getting it to work with my data. I did not get this warning on my CoPy data, only on the CoPx data (a force platform records Center of Pressure data in 2 dimensions). Does anyone have any suggestions on getting complex() to work on my data set, or is there another function that would work better for a proper entropy calculation? Entropy shouldn't be that much greater in CoPx compared to CoPy. I also tried it with more data sets from other subjects and the same thing happened: CoPx entropy measures gave me warning messages and CoPy measurements did not. I am attaching a data set link so anyone can try it out for themselves and see if they can figure it out, as the data is a little long to post here.
Data
Edit: Correct Answer
As suggested, I tried the table(...) function and received no warning or error message, and the entropy output was in the expected range as well. However, I had apparently overlooked a function in the package, discretize(), and that is what you are supposed to use to correctly set up the data for the entropy calculation.

I think there's no point in applying the entropy function on your data. According to ?entropy, it
estimates the Shannon entropy H of the random variable Y from the corresponding observed counts y
(emphasis mine). This means that you need to convert your data (which seems to be continuous) to count data first, for instance by binning it.
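The binning step can be sketched in base R as follows. The vector cop is simulated and only stands in for a CoPx trace, including negative values; the choice of 10 equal-width bins is an assumption. The entropy is computed with the plain plug-in formula on the bin proportions, which is not the "MM" estimator used in the question.

```r
set.seed(123)

# Hypothetical stand-in for a continuous CoPx signal (contains negative values)
cop <- rnorm(1000, mean = -2, sd = 5)

# Bin the continuous values into 10 equal-width intervals and count them
counts <- table(cut(cop, breaks = 10))

# Plug-in Shannon entropy (in nats) from the non-empty bin proportions
p <- counts[counts > 0] / sum(counts)
H <- -sum(p * log(p))
print(H)
```

Because the counts are non-negative, no logarithm of a negative number is taken, and the result is bounded above by log(10) for 10 bins. The resulting counts vector is the kind of input that entropy() expects in place of the raw signal.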
