Extract Optimal Parameters from Flexmix: Univariate Data - r

I am trying to replicate a table in the book HMM:
Specifically, the k=2 model.
I am using the code:
library(flexmix)
dat<-as.data.frame(c(13, 8, 23, 22, 18, 15, 14, 11, 24, 18, 14, 16, 8, 14, 27, 15, 10, 13, 10, 23, 41, 20, 15, 15, 16, 18, 31, 15, 8, 16, 26, 17, 27, 22, 15, 11, 32, 19, 35, 19, 6, 11, 27, 20, 26, 16, 11, 18, 22, 28, 30, 8, 32, 19, 36, 27, 7, 36, 13, 39, 29, 18, 24, 26, 21, 23, 16, 22, 13, 17, 20, 13, 23, 14, 22, 16, 12, 22, 22, 17, 21, 13, 18, 24, 19, 21, 20, 25, 21, 15, 25, 15, 21, 22, 34, 16, 16, 21, 26, 10, 18, 12, 14, 21, 15, 15, 18))
colnames(dat)<-"x"
x2 <- flexmix(x~1, data=dat, k=2,model=FLXMRglm(formula=x~1,family="poisson"))
summary(x2)
Examining the priors and the log-likelihood, it appears the syntax is doing what I want. My question is how to extract the component means lambda (15.777 and 26.840 in the book's table)? I do not believe they are simply the mean value of the data in each cluster.

exp(parameters(x2))
You can use parameters() to extract the coefficients, but the model defaults to a log link function, so you need exp() to convert them back to the original scale.
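To see why the exp() step is needed, here is a minimal sketch that uses base R's glm() as a stand-in for a single mixture component (an assumption for illustration; it does not call flexmix). With family poisson and an intercept-only formula, the fitted coefficient is the log of the mean, so exponentiating recovers lambda. The short vector x is just the first few values of the question's data.

```r
# Stand-in for one Poisson component: intercept-only GLM with the default log link
x <- c(13, 8, 23, 22, 18, 15)
fit <- glm(x ~ 1, family = poisson())

# The coefficient is on the log scale
log_lambda <- unname(coef(fit))

# Back-transform to the original scale; for this model it matches the sample mean
lambda <- exp(log_lambda)
print(c(log_lambda = log_lambda, lambda = lambda, sample_mean = mean(x)))
```

In a two-component mixture the back-transformed values are weighted means rather than plain cluster means, which is consistent with the suspicion in the question.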

Related

error in densityplot mice- missing data example

I have the following data:
dput(example)
structure(list(q1 = c(5, 22, 16, 24, 9, 20, 21, 16, 28, 28, 24,
25, 34, 22, 29, NA, 24, 13, 10, 17, 24, 21, 22, 35, 20, 25, 25,
23, 22, 20, 27, 22, 20, 23, 5, 21, 19, 17, 27, 20, 35, 35, 10,
16, 22, 34, 34, 23, 25, 23, 25, 30, 18, 21, 15, 23, 5, 35, 5,
30), q2 = c(5, 5, 24, 15, 5, 5, 26, 23, 24, 9, 24, 5, 15, 26,
30, 14, 14, 19, 11, 25, 20, 5, 14, 13, 11, 10, 13, 16, 16, 21,
10, 12, 20, 9, 15, 5, 13, 5, 30, 18, 12, 27, 10, 9, 20, 5, 9,
10, 11, 26, 22, 8, 6, 5, 15, 6, 5, 35, 10, 18), q3 = c(11, 22,
NA, 22, 6, 18, 30, 6, 26, NA, 17, 22, 33, 19, 22, 25, 23, 13,
13, 15, 16, 16, 23, 24, 6, 25, 27, 12, 25, 17, 28, 15, 20, 31,
5, 17, 17, 20, 24, 7, 35, 35, 10, 10, 20, 10, 31, 21, 16, 32,
25, 30, 10, 24, 15, 24, 5, 35, 9, 26), q4 = c(14, 15, 23, 21,
NA, 25, 30, 23, 28, 20, 25, 5, 35, 30, 19, 23, 30, 5, 23, 18,
30, 15, 30, 22, 8, 29, 35, 23, 23, 24, 25, 25, 20, 25, 5, 15,
34, 8, 32, 35, 35, 35, 10, 6, 21, 10, 24, 27, 10, 30, 35, 15,
6, 21, 15, 15, 5, 35, 19, 26), q5 = c(5, 18, 21, 19, 5, 6, 5,
29, 20, 23, 22, 5, 16, 22, 12, 13, 18, 5, 17, 15, 18, 16, 20,
8, 12, 19, 12, 23, 9, 16, 5, 29, 20, 5, 5, 5, 5, 5, 30, 22, 32,
35, 10, 13, 20, 13, 12, 16, 5, 24, 22, 17, 5, 20, 14, 5, 5, 35,
15, 16), q6 = c(15, 9, 25, 26, 6, 17, 28, 32, 26, 28, 24, 25,
11, 24, 31, 18, 19, 6, 20, 26, 29, 17, 21, 24, 7, 29, 17, 17,
14, 25, 24, 35, 24, 6, 16, 6, 9, 6, 38, 19, 30, 42, 12, 20, 27,
26, 25, 13, 9, 36, 27, 27, 7, 24, 22, 6, 16, 42, 14, 11)), class = "data.frame", row.names = c(NA,
-60L))
I then use mice:
*edit: forgot the complete line
library(mice)
imp <- mice(example,m=5,maxit=50,meth='pmm',seed=500)
example_i <- complete(imp,1)
But when trying to get a densityplot I get the following error:
densityplot(imp)
Error in str2lang(x) : <text>:2:0: unexpected end of input
1: ~
^
My questions are:
Is there something fundamentally wrong about my approach to impute missing data? (this is just a small example)
Am I using properly the MICE arguments?
What am I doing wrong with the density plot, as I have gotten it for all of the other scales I am working with?
Answer
You need to supply a formula to densityplot; otherwise it will plot all variables with more than 2 missing values. Since none of your variables has more than 2 missing values, and densityplot does not handle that case, it produces this cryptic error.
Example that works
example$q4[1:10] <- NA
imp <- mice(example, m = 5, maxit = 50, meth = "pmm", seed = 500)
densityplot(imp)
# equivalent: densityplot(imp, ~ q4)
Rationale
imp is of class mids, so you are calling densityplot.mids. Normally, densityplot.mids requires you to provide a formula (data argument), so that it knows which variables to plot (see ?densityplot.mids). If you want to plot q4, then the code is densityplot(imp, ~ q4).
Inside densityplot.mids, we see:
if (missing(data)) {
  vnames <- vnames[!allfactors & x$nmis > 2 & x$nmis <
    nrow(x$data) - 1]
  formula <- as.formula(paste("~", paste(vnames,
    collapse = "+", sep = ""), sep = ""))
}
If we use traceback() right after getting your error, then you will see that the last line above is the line that throws the error.
In the first line, you can see the condition x$nmis > 2, which means that it will grab all the columns with more than 2 missing values. When no columns satisfy the conditions, vnames evaluates to character(0), and so the subsequent line produces the string ~, i.e. the code that you see in your error.
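The failure can be reproduced in base R without mice. The sketch below imitates the two lines quoted above with an empty vnames; the variable names formula_text and result are only for illustration.

```r
# No variable passed the filter, so vnames is empty
vnames <- character(0)

# Same construction as in the quoted snippet: the right-hand side collapses to ""
formula_text <- paste("~", paste(vnames, collapse = "+", sep = ""), sep = "")
print(formula_text)

# Converting the bare "~" to a formula fails while parsing
result <- tryCatch(as.formula(formula_text), error = function(e) e)
print(inherits(result, "error"))
print(conditionMessage(result))
```

A check such as length(vnames) == 0 before building the formula would be enough to raise a more informative error.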
So, why does it give an error when there are too few missings? That's because densityplot plots a distribution, and plotting a distribution of 1 or 2 points is just not doable.
Suggestion
The package maintainers could improve the error by checking whether vnames has any content and, if not, throwing an informative error. You may want to open an issue on GitHub if you think it is useful.

R Time series data: Plot multiple batches

I have a big timeseries dataset which looks like the table below. T0, T1, T2,... (goes on till T70) are the timestamps and over 400 batches (A,B,C,...). There are multiple features in the data (Description Column in the sample data) which I'm interested in plotting. My first attempt was to separate the dataset for each description so that I get one row per batch in each subset ranging from T0 to T70.
My aim is to convert this dataframe into a timeseries object and check for seasonality for Good and bad batches (for each description). Can someone help with any easy fixes in R? Thanks!
Update:
My subset of the data for one Description looks like this:
In order to melt the data, I used:
mdf <- melt(df,id.vars = c('Batch',colnames(df[, c(2:70)])))
and it didn't work. I want to get just three variables out of it:
Batch - Time - Value.
Any help would be appreciated!
EDIT: dput(head(df,20)) gave the following output. I have truncated the output at T20 instead of T70.
structure(list(Batch = c("A", "B", "C",
"D", "E", "F", "G", "H",
"I", "J", "K", "L", "M",
"N", "O", "P", "Q", "R",
"S", "T"),
T0 = c(5, 6,
4, 2, 6, 3, 4, 6, 4, 1, 6, 5, 4, 5, 6, 5, 6, 5,
5, 6), T1 = c(6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 5, 6, 6), T2 = c(6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6, 5, 6, 6, 6, 6, 6), T3 = c(20,
19, 19, 19, 19, 18, 20, 20, 20, 20, 20, 20, 20, 19,
18, 19, 20, 20, 20, 19), T4 = c(21, 21, 21, 21, 20,
20, 21, 21, 21, 21, 22, 21, 22, 21, 21, 21, 22, 21,
22, 20), T5 = c(22, 22, 22, 22, 22, 21, 21, 22, 21,
22, 23, 22, 23, 22, 22, 23, 23, 23, 23, 22), T6 = c(23,
23, 24, 23, 23, 23, 23, 23, 23, 24, 24, 23, 23, 24,
23, 24, 24, 24, 24, 23), T7 = c(25, 25, 25, 24, 24,
24, 24, 25, 25, 25, 24, 25, 24, 25, 25, 26, 25, 25,
25, 25), T8 = c(26, 26, 25, 26, 25, 26, 26, 26, 26,
26, 25, 26, 26, 26, 26, 26, 25, 26, 25, 26), T9 = c(20,
23, 19, 21, 22, 27, 24, 26, 24, 25, 21, 23, 21, 22,
28, 22, 20, 24, 19, 27), T10 = c(16, 18, 14, 15, 15,
23, 19, 20, 19, 20, 15, 16, 15, 17, 23, 16, 15, 18,
15, 23), T11 = c(15, 16, 15, 15, 16, 17, 15, 14, 15,
15, 15, 14, 15, 15, 17, 15, 15, 15, 15, 17), T12 = c(15,
16, 15, 15, 16, 14, 17, 15, 15, 15, 15, 15, 15, 16,
15, 15, 15, 16, 15, 15), T13 = c(15, 16, 15, 15, 16,
15, 15, 15, 15, 15, 15, 15, 15, 16, 15, 15, 15, 16,
14, 15), T14 = c(16, 16, 15, 16, 16, 15, 16, 15, 16,
15, 15, 15, 15, 16, 16, 15, 16, 16, 15, 16), T15 = c(16,
16, 16, 16, 17, 15, 16, 15, 16, 15, 16, 15, 16, 16,
16, 16, 16, 16, 15, 16), T16 = c(16, 17, 16, 16, 17,
15, 17, 15, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
15, 16), T17 = c(17, 19, 17, 18, 20, 15, 18, 15, 16,
16, 18, 16, 18, 19, 19, 17, 19, 17, 17, 17), T18 = c(24,
26, 27, 26, 28, 22, 25, 20, 25, 20, 26, 25, 27, 26,
25, 25, 28, 25, 27, 24), T19 = c(36, 37, 36, 38, 36,
38, 37, 31, 36, 26, 36, 37, 36, 36, 37, 36, 37, 35,
35, 35), T20 = c(38, 39, 37, 38, 38, 43, 39, 41, 39,
40, 38, 39, 38, 39, 43, 38, 37, 39, 37, 42)), row.names = c(NA,
20L), class = "data.frame")
Since you did not provide reproducible data, I will use some dummy data. For future questions, dput() your data and paste the output into your question. Your issue can be solved by melting your data. With the function melt() from reshape2, you choose the variables that act as ids, and the remaining variables are stacked into rows, with a key variable recording which column each value came from. Next, I apply that method and build some plots related to what you want:
library(reshape2)
library(ggplot2)
#Data
df <- data.frame(Batch=rep(c('A','B','C'),2),
                 Type=c('Good','Bad','Good','Good','Bad','Good'),
                 Description=c(rep('In',3),rep(c('Out'),3)),
                 T0=c(1,2,1,4,3,2),
                 T1=c(2,3,4,1,3,4),
                 T2=c(3,5,3,5,5,6),stringsAsFactors = FALSE)
#Melt
mdf <- melt(df,id.vars = c('Batch','Type','Description'))
#Plot for description
ggplot(mdf,aes(x=Description,y=value,fill=variable))+
  geom_bar(stat='identity')
Using Description on the x-axis you will get this:
You can also use facet_wrap() to split the plot by some variable, like this:
#Wrap by description
ggplot(mdf,aes(x=Batch,y=value,fill=variable))+
  geom_bar(stat='identity')+
  facet_wrap(.~Description)
With the melted data mdf you can experiment and build any other plots you want.
Update: With the data provided, here is a possible solution to your issue:
library(tidyverse)
#Data
dff <- structure(list(Batch = c("A", "B", "C", "D", "E", "F", "G", "H",
"I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T"),
T0 = c(5, 6, 4, 2, 6, 3, 4, 6, 4, 1, 6, 5, 4, 5, 6, 5, 6,
5, 5, 6), T1 = c(6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 5, 6, 6), T2 = c(6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 5, 6, 6, 6, 6, 6), T3 = c(20, 19, 19, 19, 19, 18,
20, 20, 20, 20, 20, 20, 20, 19, 18, 19, 20, 20, 20, 19),
T4 = c(21, 21, 21, 21, 20, 20, 21, 21, 21, 21, 22, 21, 22,
21, 21, 21, 22, 21, 22, 20), T5 = c(22, 22, 22, 22, 22, 21,
21, 22, 21, 22, 23, 22, 23, 22, 22, 23, 23, 23, 23, 22),
T6 = c(23, 23, 24, 23, 23, 23, 23, 23, 23, 24, 24, 23, 23,
24, 23, 24, 24, 24, 24, 23), T7 = c(25, 25, 25, 24, 24, 24,
24, 25, 25, 25, 24, 25, 24, 25, 25, 26, 25, 25, 25, 25),
T8 = c(26, 26, 25, 26, 25, 26, 26, 26, 26, 26, 25, 26, 26,
26, 26, 26, 25, 26, 25, 26), T9 = c(20, 23, 19, 21, 22, 27,
24, 26, 24, 25, 21, 23, 21, 22, 28, 22, 20, 24, 19, 27),
T10 = c(16, 18, 14, 15, 15, 23, 19, 20, 19, 20, 15, 16, 15,
17, 23, 16, 15, 18, 15, 23), T11 = c(15, 16, 15, 15, 16,
17, 15, 14, 15, 15, 15, 14, 15, 15, 17, 15, 15, 15, 15, 17
), T12 = c(15, 16, 15, 15, 16, 14, 17, 15, 15, 15, 15, 15,
15, 16, 15, 15, 15, 16, 15, 15), T13 = c(15, 16, 15, 15,
16, 15, 15, 15, 15, 15, 15, 15, 15, 16, 15, 15, 15, 16, 14,
15), T14 = c(16, 16, 15, 16, 16, 15, 16, 15, 16, 15, 15,
15, 15, 16, 16, 15, 16, 16, 15, 16), T15 = c(16, 16, 16,
16, 17, 15, 16, 15, 16, 15, 16, 15, 16, 16, 16, 16, 16, 16,
15, 16), T16 = c(16, 17, 16, 16, 17, 15, 17, 15, 16, 16,
16, 16, 16, 16, 16, 16, 16, 16, 15, 16), T17 = c(17, 19,
17, 18, 20, 15, 18, 15, 16, 16, 18, 16, 18, 19, 19, 17, 19,
17, 17, 17), T18 = c(24, 26, 27, 26, 28, 22, 25, 20, 25,
20, 26, 25, 27, 26, 25, 25, 28, 25, 27, 24), T19 = c(36,
37, 36, 38, 36, 38, 37, 31, 36, 26, 36, 37, 36, 36, 37, 36,
37, 35, 35, 35), T20 = c(38, 39, 37, 38, 38, 43, 39, 41,
39, 40, 38, 39, 38, 39, 43, 38, 37, 39, 37, 42)), row.names = c(NA,
-20L), class = "data.frame")
Next the code:
#Code
Melted <- pivot_longer(dff,cols = -Batch)
Melted$name <- factor(Melted$name,levels = unique(Melted$name))
#Plot
ggplot(Melted,aes(x=Batch,y=value,color=name,group=name))+geom_line()
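For the three-column Batch - Time - Value layout asked for in the question, a dependency-free sketch in base R follows. It assumes only Batch is an id column and every other column is a time point; the small df here is a hypothetical three-batch excerpt, and TimeIndex is an added helper column. In the melt() call from the question the time columns were passed as id.vars, which left few or no columns to stack.

```r
# Hypothetical excerpt: one id column (Batch) and three time columns
df <- data.frame(Batch = c("A", "B", "C"),
                 T0 = c(5, 6, 4),
                 T1 = c(6, 6, 6),
                 T2 = c(6, 6, 5),
                 stringsAsFactors = FALSE)

# Every column except the id is treated as a time point
time_cols <- setdiff(names(df), "Batch")

# Stack the time columns: one row per Batch/Time combination
long <- data.frame(
  Batch = rep(df$Batch, times = length(time_cols)),
  Time  = rep(time_cols, each = nrow(df)),
  Value = unlist(df[time_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Numeric time index (T0 -> 0, T1 -> 1, ...) so that time sorts correctly
long$TimeIndex <- as.integer(sub("^T", "", long$Time))
print(long)
```

With reshape2, melt(df, id.vars = "Batch") should give the same layout, with the columns named variable and value.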

Inline data.frame inclusion in R script

While there are functions for saving data as a separate CSV file (write.table) or as an R-data file (save, saveRDS), I have not found a way to store or print a data frame as R code that recreates this data frame.
Background of my question is that I want to include data with a script (instead of storing it in a separate file), and am thus looking for a way to generate the specific code provided the data frame already exists. I could hack on with sed or other external tools, but I wonder whether someone knows of a built-in method in R.
Try with "dput" like so:
dput(cars)
# Returns:
structure(list(speed = c(4, 4, 7, 7, 8, 9, 10, 10, 10, 11, 11,
12, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14, 15, 15, 15, 16,
16, 17, 17, 17, 18, 18, 18, 18, 19, 19, 19, 20, 20, 20, 20, 20,
22, 23, 24, 24, 24, 24, 25), dist = c(2, 10, 4, 22, 16, 10, 18,
26, 34, 17, 28, 14, 20, 24, 28, 26, 34, 34, 46, 26, 36, 60, 80,
20, 26, 54, 32, 40, 32, 40, 50, 42, 56, 76, 84, 36, 46, 68, 32,
48, 52, 56, 64, 66, 54, 70, 92, 93, 120, 85)), class = "data.frame",
row.names = c(NA, -50L))
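As a quick check that the dput() output really recreates the object, the sketch below captures the generated code as text, evaluates it again, and compares the result with the original. The names code_text and rebuilt are only for illustration.

```r
# Capture the R code that dput() prints for the built-in cars data frame
code_text <- paste(capture.output(dput(cars)), collapse = "\n")

# Evaluating that code rebuilds the data frame
rebuilt <- eval(parse(text = code_text))

# Compare the rebuilt object with the original
print(all.equal(rebuilt, cars))
```

In a script you would simply paste the printed structure(...) call after an assignment, for example my_data <- structure(...).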

r subset `x` and `labels` must be same type

I am trying to subset my dataset as follows
df[df$Age > 19,]
I am seeing this error: Error: x and labels must be same type
I am not sure I understand this, any suggestions are much appreciated.
=================
dput(df$Age)
c(20, 11, 10, 15, 6, 23, 45, 30, 18, 11, 15, 20, 7, 18, 19, 30,
40, 16, 14, 33, 12, 22, 12, 5, NA, 18, 30, 26, 25, 27, 12, 27,
13, 15, 32, 19, NA, 18, 13, 30, 10, 16, 47, 24, 64, 21, 9, 30,
12, 33, 16, 20, 14, 10, 19, 18, 20, 18, 10, 15, 55, 18, 50, 14,
35, 18, 21, 17, 14, 9, 25, 17, 10, 16, 12, 30, 38, 10, 27, 20,
27, 16, 30, 11, 5, 20, 30, 12, 24, 11, 7, 26, 48, 25, 20, 18,
27, 18, 28, 15, 17, 46, 30, 20, 20, 14, 35, 31, 10, 26, 13, NA,
15, 3, 30, 33, 15, 43, 19, 40, 8, 16, 8, 3, 37, 40, 58, 18, 12,
19, 14, 24, 34, 30, 23, 28, 47, 29, 21, 35, 23, 47, 11, 30, 16,
25, 30, 30, 8, 18, 20, 12, 8, 18, 30, 6, 54, 60, 18, 27, 42,
6, 42, 13, 21, 15, 17, 10, 33, 15, 16, 36, 16, 52, 4, 30, 28,
30, 14, 13, 14, NA, 15, 20, 20, 24, 27, 23, 10, 13, 22, 30, 45,
10, 23, 14, 27, 19, 12, 25, 10, 10, 14, 16, 16, 19, 18, 12, 65,
18, 35, 20, 31, NA, 21, 40, 8, 13, 25, 8, 13, 15, 19, 25, 10,
9, 24, 8, 25, 30, 38, 35, 20, 12, 15, 25, 27, 39, 8, 10, NA,
12, 50, 16, 14, 22, 12, 20, 44, 13, 8, 43, 48, 13, 21, 20, 42,
11, 20, 35, 53, 22, 17, 5, NA, 14, 10, 21, 33, 21, 69, 24, 15,
12, 8, 28, 11, 32, 25, 26, 21, 36, 12, 24, 20, 23, 14, 30, 50,
26, NA, 30, 22, 44, 22, 14, 30, 28, 10, 16, 32, 35, 40, 16, 40,
33, 23, 25, 10, 17, 10, 14, 22, 14, 25, 20, 39, 24, 52, 16, 34,
26, 23, 11, 12, 70, 59, 12, 38, 22, 13, 40, 57, 30, 7, 21, 20,
30, 12, 13, 5, 19, 35, 56, 17, 40, 48, 19, 8, 30, 21, 5, 40,
16, 22, 20, 17, 16, 30, 18, 13, 17, NA, 40, 9, 24, 26, 20, 22,
17, 44, 45, 18, 26, 50, 10, 21, 15, NA, 20, 12, 16, 54, 15, 16,
33, 22, 26, 60, 35, 11, 30, 16, 48, 16, 16, 16, 10, 14, 15, 23,
17, 18, NA, 49, 12, 7, 18, 24, 17, 14, 30, 13, 6, 51, 36, 16,
10, 43, 34, 15, 12, 15, 15, 17, 40, 58, 15, 33, 16, 48, 25, 15,
16, 5, NA, 40, 34, 10, 30, 30, 30, 15, 15, 12, 5, 10, 20, 18,
20, 16, 20, 26, 12, 14, 14, 20, 12, 30, 30, 29, 22, 19, 26, 11,
23, 40, 30, 16, 50, 20, 25, 29, 40, 44, 20, 40, 8, 16, 15, 38,
11, 27, 63, 16, NA, 47, 65, 21, 29, 30, 16, 21, 25, 16, 23, 5,
17, 22, 12, 14, 27, NA, 16, 9, 33, 11, 15, 34, 41, 30, 33, 15,
25, 40, 25, 12, 12, 17, 14)

How do you include data frame output inside warnings and errors?

How can I include array or data frame output in a message, warning or error?
By default, the output is collapsed by deparsing each column, which isn't useful. Here's an example, using the cars dataset.
message(cars)
## c(4, 4, 7, 7, 8, 9, 10, 10, 10, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14, 15, 15, 15, 16, 16, 17, 17, 17, 18, 18, 18, 18, 19, 19, 19, 20, 20, 20, 20, 20, 22, 23, 24, 24, 24, 24, 25)c(2, 10, 4, 22, 16, 10, 18, 26, 34, 17, 28, 14, 20, 24, 28, 26, 34, 34, 46, 26, 36, 60, 80, 20, 26, 54, 32, 40, 32, 40, 50, 42, 56, 76, 84, 36, 46, 68, 32, 48, 52, 56, 64, 66, 54, 70, 92, 93, 120, 85)
Print the output, recapture it using capture.output(), and collapse into a single string separated by newlines.
print_and_capture <- function(x)
{
  paste(capture.output(print(x)), collapse = "\n")
}
message(print_and_capture(cars))
## speed dist
## 1 4 2
## 2 4 10
## # etc.
stop("An error was found in the cars dataset:\n", print_and_capture(cars))
## Error: An error was found in the cars dataset:
## speed dist
## 1 4 2
## 2 4 10
## # etc.
print_and_capture() is now available in assertive.base.
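The same idea works for warning(). The sketch below uses a hypothetical variant, print_head_and_capture(), that prints only the first n rows so that a large data frame does not flood the message; the helper name and the n argument are assumptions for illustration and are not part of assertive.base.

```r
# Hypothetical helper: capture the printed form of the first n rows only
print_head_and_capture <- function(x, n = 6L) {
  paste(capture.output(print(head(x, n))), collapse = "\n")
}

# Raise a warning that embeds the captured output, and catch it for inspection
w <- tryCatch(
  warning("Suspicious rows in cars:\n", print_head_and_capture(cars, 3L)),
  warning = function(cond) cond
)

# The condition message holds the header line plus three data rows
cat(conditionMessage(w), "\n")
```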
