error in densityplot mice- missing data example - r

I have the following data:
dput(example)
structure(list(q1 = c(5, 22, 16, 24, 9, 20, 21, 16, 28, 28, 24,
25, 34, 22, 29, NA, 24, 13, 10, 17, 24, 21, 22, 35, 20, 25, 25,
23, 22, 20, 27, 22, 20, 23, 5, 21, 19, 17, 27, 20, 35, 35, 10,
16, 22, 34, 34, 23, 25, 23, 25, 30, 18, 21, 15, 23, 5, 35, 5,
30), q2 = c(5, 5, 24, 15, 5, 5, 26, 23, 24, 9, 24, 5, 15, 26,
30, 14, 14, 19, 11, 25, 20, 5, 14, 13, 11, 10, 13, 16, 16, 21,
10, 12, 20, 9, 15, 5, 13, 5, 30, 18, 12, 27, 10, 9, 20, 5, 9,
10, 11, 26, 22, 8, 6, 5, 15, 6, 5, 35, 10, 18), q3 = c(11, 22,
NA, 22, 6, 18, 30, 6, 26, NA, 17, 22, 33, 19, 22, 25, 23, 13,
13, 15, 16, 16, 23, 24, 6, 25, 27, 12, 25, 17, 28, 15, 20, 31,
5, 17, 17, 20, 24, 7, 35, 35, 10, 10, 20, 10, 31, 21, 16, 32,
25, 30, 10, 24, 15, 24, 5, 35, 9, 26), q4 = c(14, 15, 23, 21,
NA, 25, 30, 23, 28, 20, 25, 5, 35, 30, 19, 23, 30, 5, 23, 18,
30, 15, 30, 22, 8, 29, 35, 23, 23, 24, 25, 25, 20, 25, 5, 15,
34, 8, 32, 35, 35, 35, 10, 6, 21, 10, 24, 27, 10, 30, 35, 15,
6, 21, 15, 15, 5, 35, 19, 26), q5 = c(5, 18, 21, 19, 5, 6, 5,
29, 20, 23, 22, 5, 16, 22, 12, 13, 18, 5, 17, 15, 18, 16, 20,
8, 12, 19, 12, 23, 9, 16, 5, 29, 20, 5, 5, 5, 5, 5, 30, 22, 32,
35, 10, 13, 20, 13, 12, 16, 5, 24, 22, 17, 5, 20, 14, 5, 5, 35,
15, 16), q6 = c(15, 9, 25, 26, 6, 17, 28, 32, 26, 28, 24, 25,
11, 24, 31, 18, 19, 6, 20, 26, 29, 17, 21, 24, 7, 29, 17, 17,
14, 25, 24, 35, 24, 6, 16, 6, 9, 6, 38, 19, 30, 42, 12, 20, 27,
26, 25, 13, 9, 36, 27, 27, 7, 24, 22, 6, 16, 42, 14, 11)), class = "data.frame", row.names = c(NA,
-60L))
I then use mice:
*edit: forgot the complete line
library(mice)
imp <- mice(example,m=5,maxit=50,meth='pmm',seed=500)
example_i <- complete(imp,1)
But when trying to get a densityplot I get the following error:
densityplot(imp)
Error in str2lang(x) : <text>:2:0: unexpected end of input
1: ~
^
My questions are:
Is there something fundamentally wrong about my approach to impute missing data? (this is just a small example)
Am I using properly the MICE arguments?
What am I doing wrong with the density plot, as I have gotten it for all of the other scales I am working with?

Answer
You need to supply a formula to densityplot, otherwise it will plot all variables with > 2 missing values. Since you don't have any variables with 2 > missing values, and since densityplot doesn't expect that, it produces this cryptic error.
Example that works
example$q4[1:10] <- NA
imp <- mice(example, m = 5, maxit = 50, meth = "pmm", seed = 500)
densityplot(imp)
# equivalent: densityplot(imp, ~ q4)
Rationale
imp is of class mids, so you are calling densityplot.mids. Normally, densityplot.mids requires you to provide a formula (data argument), so that it knows which variables to plot (see ?densityplot.mids). If you want to plot q4, then the code is densityplot(imp, ~ q4).
Inside densityplot.mids, we see:
if (missing(data)) {
vnames <- vnames[!allfactors & x$nmis > 2 & x$nmis <
nrow(x$data) - 1]
formula <- as.formula(paste("~", paste(vnames,
collapse = "+", sep = ""), sep = ""))
}
If we use traceback() right after getting your error, then you will see that the last line above is the line that throws the error.
In the first line, you can see the condition xnmis > 2, which means that it will grab all the columns with more than 2 missing values. When no columns satisfy the conditions, then vnames will evaluate to character(0), and so the subsequent line yields as output ~, i.e. the code that you see in your error.
So, why does it give an error when there are too few missings? That's because densityplot plots a distribution, and plotting a distribution of 1 or 2 points is just not doable.
Suggestion
The package maintainers could improve the error by simply checking whether vnames has any content, and if not, they can throw an error that is informative. You may want to add this as an issue on Github if you think it is useful.

Related

R Time series data: Plot multiple batches

I have a big timeseries dataset which looks like the table below. T0, T1, T2,... (goes on till T70) are the timestamps and over 400 batches (A,B,C,...). There are multiple features in the data (Description Column in the sample data) which I'm interested in plotting. My first attempt was to separate the dataset for each description so that I get one row per batch in each subset ranging from T0 to T70.
My aim is to convert this dataframe into a timeseries object and check for seasonality for Good and bad batches (for each description). Can someone help with any easy fixes in R? Thanks!
Update:
My subset of the data for one Description looks like this:
In order to melt the data, I used:
mdf <- melt(df,id.vars = c('Batch',colnames(df[, c(2:70)])))
and it didn't work. I want to get just three variables out of it:
Batch - Time - Value.
Any help would be appreciated!
EDIT:dput(head(df,20)) gave the following output. I have truncated the output till T20 instead of T70.
structure(list(Batch = c("A", "B", "C",
"D", "E", "F", "G", "H",
"I", "J", "K", "L", "M",
"N", "O", "P", "Q", "R",
"S", "T"),
T0 = c(5, 6,
4, 2, 6, 3, 4, 6, 4, 1, 6, 5, 4, 5, 6, 5, 6, 5,
5, 6), T1 = c(6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 5, 6, 6), T2 = c(6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6, 5, 6, 6, 6, 6, 6), T3 = c(20,
19, 19, 19, 19, 18, 20, 20, 20, 20, 20, 20, 20, 19,
18, 19, 20, 20, 20, 19), T4 = c(21, 21, 21, 21, 20,
20, 21, 21, 21, 21, 22, 21, 22, 21, 21, 21, 22, 21,
22, 20), T5 = c(22, 22, 22, 22, 22, 21, 21, 22, 21,
22, 23, 22, 23, 22, 22, 23, 23, 23, 23, 22), T6 = c(23,
23, 24, 23, 23, 23, 23, 23, 23, 24, 24, 23, 23, 24,
23, 24, 24, 24, 24, 23), T7 = c(25, 25, 25, 24, 24,
24, 24, 25, 25, 25, 24, 25, 24, 25, 25, 26, 25, 25,
25, 25), T8 = c(26, 26, 25, 26, 25, 26, 26, 26, 26,
26, 25, 26, 26, 26, 26, 26, 25, 26, 25, 26), T9 = c(20,
23, 19, 21, 22, 27, 24, 26, 24, 25, 21, 23, 21, 22,
28, 22, 20, 24, 19, 27), T10 = c(16, 18, 14, 15, 15,
23, 19, 20, 19, 20, 15, 16, 15, 17, 23, 16, 15, 18,
15, 23), T11 = c(15, 16, 15, 15, 16, 17, 15, 14, 15,
15, 15, 14, 15, 15, 17, 15, 15, 15, 15, 17), T12 = c(15,
16, 15, 15, 16, 14, 17, 15, 15, 15, 15, 15, 15, 16,
15, 15, 15, 16, 15, 15), T13 = c(15, 16, 15, 15, 16,
15, 15, 15, 15, 15, 15, 15, 15, 16, 15, 15, 15, 16,
14, 15), T14 = c(16, 16, 15, 16, 16, 15, 16, 15, 16,
15, 15, 15, 15, 16, 16, 15, 16, 16, 15, 16), T15 = c(16,
16, 16, 16, 17, 15, 16, 15, 16, 15, 16, 15, 16, 16,
16, 16, 16, 16, 15, 16), T16 = c(16, 17, 16, 16, 17,
15, 17, 15, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
15, 16), T17 = c(17, 19, 17, 18, 20, 15, 18, 15, 16,
16, 18, 16, 18, 19, 19, 17, 19, 17, 17, 17), T18 = c(24,
26, 27, 26, 28, 22, 25, 20, 25, 20, 26, 25, 27, 26,
25, 25, 28, 25, 27, 24), T19 = c(36, 37, 36, 38, 36,
38, 37, 31, 36, 26, 36, 37, 36, 36, 37, 36, 37, 35,
35, 35), T20 = c(38, 39, 37, 38, 38, 43, 39, 41, 39,
40, 38, 39, 38, 39, 43, 38, 37, 39, 37, 42)), row.names = c(NA,
20L), class = "data.frame")
As long as you don't have data for reproducible practice of the problem, I will add some dummy data. For future questions dput() your data and paste with your question. Your issue can be solved melting your data. In this method with the function melt() from reshape2 you choose variables to be ids and the rest of variables are made rows with a reference in a key variable. Next, I apply that method and I build some plots related to what you want:
library(reshape2)
library(ggplot2)
#Data
df <- data.frame(Batch=rep(c('A','B','C'),2),
Type=c('Good','Bad','Good','Good','Bad','Good'),
Description=c(rep('In',3),rep(c('Out'),3)),
T0=c(1,2,1,4,3,2),
T1=c(2,3,4,1,3,4),
T2=c(3,5,3,5,5,6),stringsAsFactors = F)
#Melt
mdf <- melt(df,id.vars = c('Batch','Type','Description'))
#Plot for description
ggplot(mdf,aes(x=Description,y=value,fill=variable))+
geom_bar(stat='identity')
Using Description on x-axis you will get this:
Also you can wrap by some variable to get different plots like this using facet_wrap():
#Wrap by description
ggplot(mdf,aes(x=Batch,y=value,fill=variable))+
geom_bar(stat='identity')+
facet_wrap(.~Description)
With the melted data mdf you can play and obtain other plots you want.
Update: With the data provided, here a possible solution to your issue:
library(tidyverse)
#Data
dff <- structure(list(Batch = c("A", "B", "C", "D", "E", "F", "G", "H",
"I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T"),
T0 = c(5, 6, 4, 2, 6, 3, 4, 6, 4, 1, 6, 5, 4, 5, 6, 5, 6,
5, 5, 6), T1 = c(6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 5, 6, 6), T2 = c(6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 5, 6, 6, 6, 6, 6), T3 = c(20, 19, 19, 19, 19, 18,
20, 20, 20, 20, 20, 20, 20, 19, 18, 19, 20, 20, 20, 19),
T4 = c(21, 21, 21, 21, 20, 20, 21, 21, 21, 21, 22, 21, 22,
21, 21, 21, 22, 21, 22, 20), T5 = c(22, 22, 22, 22, 22, 21,
21, 22, 21, 22, 23, 22, 23, 22, 22, 23, 23, 23, 23, 22),
T6 = c(23, 23, 24, 23, 23, 23, 23, 23, 23, 24, 24, 23, 23,
24, 23, 24, 24, 24, 24, 23), T7 = c(25, 25, 25, 24, 24, 24,
24, 25, 25, 25, 24, 25, 24, 25, 25, 26, 25, 25, 25, 25),
T8 = c(26, 26, 25, 26, 25, 26, 26, 26, 26, 26, 25, 26, 26,
26, 26, 26, 25, 26, 25, 26), T9 = c(20, 23, 19, 21, 22, 27,
24, 26, 24, 25, 21, 23, 21, 22, 28, 22, 20, 24, 19, 27),
T10 = c(16, 18, 14, 15, 15, 23, 19, 20, 19, 20, 15, 16, 15,
17, 23, 16, 15, 18, 15, 23), T11 = c(15, 16, 15, 15, 16,
17, 15, 14, 15, 15, 15, 14, 15, 15, 17, 15, 15, 15, 15, 17
), T12 = c(15, 16, 15, 15, 16, 14, 17, 15, 15, 15, 15, 15,
15, 16, 15, 15, 15, 16, 15, 15), T13 = c(15, 16, 15, 15,
16, 15, 15, 15, 15, 15, 15, 15, 15, 16, 15, 15, 15, 16, 14,
15), T14 = c(16, 16, 15, 16, 16, 15, 16, 15, 16, 15, 15,
15, 15, 16, 16, 15, 16, 16, 15, 16), T15 = c(16, 16, 16,
16, 17, 15, 16, 15, 16, 15, 16, 15, 16, 16, 16, 16, 16, 16,
15, 16), T16 = c(16, 17, 16, 16, 17, 15, 17, 15, 16, 16,
16, 16, 16, 16, 16, 16, 16, 16, 15, 16), T17 = c(17, 19,
17, 18, 20, 15, 18, 15, 16, 16, 18, 16, 18, 19, 19, 17, 19,
17, 17, 17), T18 = c(24, 26, 27, 26, 28, 22, 25, 20, 25,
20, 26, 25, 27, 26, 25, 25, 28, 25, 27, 24), T19 = c(36,
37, 36, 38, 36, 38, 37, 31, 36, 26, 36, 37, 36, 36, 37, 36,
37, 35, 35, 35), T20 = c(38, 39, 37, 38, 38, 43, 39, 41,
39, 40, 38, 39, 38, 39, 43, 38, 37, 39, 37, 42)), row.names = c(NA,
-20L), class = "data.frame")
Next the code:
#Code
Melted <- pivot_longer(dff,cols = -Batch)
Melted$name <- factor(Melted$name,levels = unique(Melted$name))
#Plot
ggplot(Melted,aes(x=Batch,y=value,color=name,group=name))+geom_line()

r subset `x` and `labels` must be same type

I am trying to subset my dataset as follows
df[df$Age > 19,]
I am seeing an error , Error: x and labels must be same type
I am not sure I understand this, any suggestions are much appreciated.
=================
dput(df$Age)
c(20, 11, 10, 15, 6, 23, 45, 30, 18, 11, 15, 20, 7, 18, 19, 30,
40, 16, 14, 33, 12, 22, 12, 5, NA, 18, 30, 26, 25, 27, 12, 27,
13, 15, 32, 19, NA, 18, 13, 30, 10, 16, 47, 24, 64, 21, 9, 30,
12, 33, 16, 20, 14, 10, 19, 18, 20, 18, 10, 15, 55, 18, 50, 14,
35, 18, 21, 17, 14, 9, 25, 17, 10, 16, 12, 30, 38, 10, 27, 20,
27, 16, 30, 11, 5, 20, 30, 12, 24, 11, 7, 26, 48, 25, 20, 18,
27, 18, 28, 15, 17, 46, 30, 20, 20, 14, 35, 31, 10, 26, 13, NA,
15, 3, 30, 33, 15, 43, 19, 40, 8, 16, 8, 3, 37, 40, 58, 18, 12,
19, 14, 24, 34, 30, 23, 28, 47, 29, 21, 35, 23, 47, 11, 30, 16,
25, 30, 30, 8, 18, 20, 12, 8, 18, 30, 6, 54, 60, 18, 27, 42,
6, 42, 13, 21, 15, 17, 10, 33, 15, 16, 36, 16, 52, 4, 30, 28,
30, 14, 13, 14, NA, 15, 20, 20, 24, 27, 23, 10, 13, 22, 30, 45,
10, 23, 14, 27, 19, 12, 25, 10, 10, 14, 16, 16, 19, 18, 12, 65,
18, 35, 20, 31, NA, 21, 40, 8, 13, 25, 8, 13, 15, 19, 25, 10,
9, 24, 8, 25, 30, 38, 35, 20, 12, 15, 25, 27, 39, 8, 10, NA,
12, 50, 16, 14, 22, 12, 20, 44, 13, 8, 43, 48, 13, 21, 20, 42,
11, 20, 35, 53, 22, 17, 5, NA, 14, 10, 21, 33, 21, 69, 24, 15,
12, 8, 28, 11, 32, 25, 26, 21, 36, 12, 24, 20, 23, 14, 30, 50,
26, NA, 30, 22, 44, 22, 14, 30, 28, 10, 16, 32, 35, 40, 16, 40,
33, 23, 25, 10, 17, 10, 14, 22, 14, 25, 20, 39, 24, 52, 16, 34,
26, 23, 11, 12, 70, 59, 12, 38, 22, 13, 40, 57, 30, 7, 21, 20,
30, 12, 13, 5, 19, 35, 56, 17, 40, 48, 19, 8, 30, 21, 5, 40,
16, 22, 20, 17, 16, 30, 18, 13, 17, NA, 40, 9, 24, 26, 20, 22,
17, 44, 45, 18, 26, 50, 10, 21, 15, NA, 20, 12, 16, 54, 15, 16,
33, 22, 26, 60, 35, 11, 30, 16, 48, 16, 16, 16, 10, 14, 15, 23,
17, 18, NA, 49, 12, 7, 18, 24, 17, 14, 30, 13, 6, 51, 36, 16,
10, 43, 34, 15, 12, 15, 15, 17, 40, 58, 15, 33, 16, 48, 25, 15,
16, 5, NA, 40, 34, 10, 30, 30, 30, 15, 15, 12, 5, 10, 20, 18,
20, 16, 20, 26, 12, 14, 14, 20, 12, 30, 30, 29, 22, 19, 26, 11,
23, 40, 30, 16, 50, 20, 25, 29, 40, 44, 20, 40, 8, 16, 15, 38,
11, 27, 63, 16, NA, 47, 65, 21, 29, 30, 16, 21, 25, 16, 23, 5,
17, 22, 12, 14, 27, NA, 16, 9, 33, 11, 15, 34, 41, 30, 33, 15,
25, 40, 25, 12, 12, 17, 14)

scatterplot3d I can not make a surface graph

I'm trying to make a surface graph but I can not do it. The graphic is not pretty and I have tried in several forums how to do and I did not succeed.
library(scatterplot3d)
x1 <- rep(10, 6)
x2 <- rep(15, 6)
x3 <- rep(20, 6)
x4 <- rep(25, 6)
x5 <- rep(30, 6)
x <- c(x1, x2, x3, x4, x5)
y1 <- rep(7, 30)
y2 <- rep(21, 30)
y3 <- rep(35, 30)
y <- c(y1, y2, y3)
z = 1781.166805 + 52.445903*y + 203.454647*x -1.570445*x*y -4.119635*(x**2)
scatterplot3d(x, y, z)
I'd highly appreciate if you help me!
First off, for future postings, please use the code tags to properly format your code. Secondly, your formula for z is not valid R syntax.
Lastly, I'd strongly recommend to spend some time taking the tour and learning how to ask good questions.
scatterplot3d allows you to plot a three dimensional point cloud, not a surface. Based on the (poorly formatted) data you provide, this works just fine:
library(scatterplot3d);
z <- 1781.166805 + 52.445903 * y + 203.454647 * x -1.570445 * x * y -4.119635 * x^2;
scatterplot3d(x, y, z);
Update
If you want to have a 3d surface (mesh) plot, you can use e.g. plotly:
require(plotly);
plot_ly(x = ~x, y = ~y, z = ~z, type = "mesh3d");
which produces an interactive (rotatable, zoomable) plot, a screenshot of which looks like this:
Sample data
x <- c(10, 10, 10, 10, 10, 10, 15, 15, 15, 15, 15, 15, 20, 20, 20,
20, 20, 20, 25, 25, 25, 25, 25, 25, 30, 30, 30, 30, 30, 30, 10,
10, 10, 10, 10, 10, 15, 15, 15, 15, 15, 15, 20, 20, 20, 20, 20,
20, 25, 25, 25, 25, 25, 25, 30, 30, 30, 30, 30, 30, 10, 10, 10,
10, 10, 10, 15, 15, 15, 15, 15, 15, 20, 20, 20, 20, 20, 20, 25,
25, 25, 25, 25, 25, 30, 30, 30, 30, 30, 30);
y <- c(7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 21, 21, 21, 21, 21, 21, 21, 21,
21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
21, 21, 21, 21, 21, 21, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35,
35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35,
35, 35, 35, 35);

sorting columns from lowest to highest values (i.e. 1, 2, 3 etc, not 1, 10, 11...2, 20, 21... etc)

I have a dataset with 50 thousand rows that I want to sort according the the values in one of the columns. The numbers in the column go from 1-30, and when I do the following
data=data[order(data$columnname),]
it gets sorted so that the order of the columns is like this
1, 10, 11 12, 13, 14, 15, 16, 17, 18, 19, 2, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 3, 30, 4, 5, 6, 7, 8, 9
how could I sort it so that it is like this
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
For me it seems, that your format is not numeric. Try this:
data$columnname<-as.numeric(data$columnname)
data=data[order(data$columnname),]

Extract Optimal Parameters from Flexmix: Univariate Data

I am trying to replicate a table in the book HMM:
Specifically, the k=2 model.
I am using the code:
dat<-as.data.frame(c(13, 8, 23, 22, 18, 15, 14, 11, 24, 18, 14, 16, 8, 14, 27, 15, 10, 13, 10, 23, 41, 20, 15, 15, 16, 18, 31, 15, 8, 16, 26, 17, 27, 22, 15, 11, 32, 19, 35, 19, 6, 11, 27, 20, 26, 16, 11, 18, 22, 28, 30, 8, 32, 19, 36, 27, 7, 36, 13, 39, 29, 18, 24, 26, 21, 23, 16, 22, 13, 17, 20, 13, 23, 14, 22, 16, 12, 22, 22, 17, 21, 13, 18, 24, 19, 21, 20, 25, 21, 15, 25, 15, 21, 22, 34, 16, 16, 21, 26, 10, 18, 12, 14, 21, 15, 15, 18))
colnames(dat)<-"x"
x2 <- flexmix(x~1, data=dat, k=2,model=FLXMRglm(formula=x~1,family="poisson"))
summary(x2)
Examining the priors and the log lik it appears the syntax is doing what I want :). My question is how to extract the component means lambda (15.777 and 26.840 above)? I do not believe they are simply the mean value of the data in each cluster.
exp(parameters(x2))
You can use parameters to extract the coefficients but the model defaults to using a log link-function so you need to use exp to convert back to the original scale.

Resources