Related
I am trying to create a histogram with a rainbow color scale but I also want to have the bin labels. I have been able to create a histogram with labeled bins and I have read a couple of posts talking about how to make a rainbow histogram which I have been able to recreate (here and here). However, I have not been able to create a rainbow histogram with the correct bin labels. I will attach an example data set and some sample code that I have tried. Ideally, I would also like to remove any bin labels that have zero as a value but I don't want to be too greedy here.
ggplot(final_df,aes(x=V1, fill = cut(V1, 25)))+ geom_histogram(show.legend = FALSE) +
stat_bin(aes(y=..count.., label=..count..), geom="text", vjust=-.5)
As you can see, it creates the rainbow histogram but the bin labels are all messed up.
structure(list(V1 = c(18, 0, 20, 21, 0, 2, 0, 1, 0, 0, 4, 16,
0, 0, 20, 20, 2, 0, 19, 22, 0, 0, 19, 0, 22, 22, 19, 2, 0, 0,
1, 18, 23, 1, 3, 1, 1, 1, 0, 21, 21, 0, 0, 15, 24, 0, 20, 19,
0, 1, 20, 21, 0, 0, 20, 22, 20, 0, 21, 0, 0, 22, 0, 0, 0, 23,
2, 1, 1, 21, 0, 2, 3, 23, 23, 1, 22, 0, 19, 23, 1, 2, 23, 1,
0, 0, 20, 1, 0, 0, 1, 18, 0, 0, 0, 0, 0, 2, 0, 7, 22, 0, 0, 23,
1, 0, 23, 0, 0, 1, 2, 0, 0, 18, 16, 0, 0, 1, 0, 0, 0, 2, 22,
0, 2, 0, 0, 0, 24, 0, 0, 0, 1, 1, 20, 0, 0, 1, 18, 0, 1, 1, 0,
0, 3, 0, 20, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 20, 2,
0, 1, 22, 0, 1, 23, 2, 0, 1, 5, 0, 10, 1, 17, 0, 0, 1, 1, 2,
1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 2, 0, 23, 2, 19, 2, 1, 21, 3,
0, 0, 20, 0, 1, 0, 1, 0, 0, 24, 2, 1, 1, 23, 1, 1, 0, 1, 0, 0,
22, 23, 0, 23, 0, 22, 2, 19, 0, 20, 22, 0, 23, 0, 21, 0, 0, 23,
0, 0, 0, 0, 3, 22, 1, 0, 1, 22, 22, 20, 0, 1, 2, 22, 2, 23, 0,
18, 1, 23, 0, 2, 0, 1, 22, 0, 21, 0, 2, 20, 0, 0, 23, 0, 1, 18,
0, 18, 20, 1, 0, 20, 0, 1, 0, 0, 17, 20, 0, 0, 1, 22, 20, 22,
2, 1, 1, 0, 1, 0, 0, 0, 18, 0, 0, 21, 0, 0, 2, 22, 20, 1, 0,
0, 0, 0, 1, 0, 0, 1, 0, 4, 1, 0, 21, 21, 0, 0, 1, 0, 1, 3, 0,
1, 1, 0, 24, 0, 0, 22, 17, 0, 1, 20, 1, 1, 21, 1, 21, 21, 0,
21, 0, 1, 23, 0, 0, 23, 21, 0, 0, 24, 0, 6, 17, 0, 21, 0, 23,
0, 0, 22, 1, 1, 22, 0, 2, 0, 0, 1, 19, 0, 21, 21, 2, 1, 18, 1,
21, 0, 1, 1, 0, 0, 1, 23, 0, 0, 1, 0, 0, 0, 1, 2, 1, 0, 0, 0,
25, 0, 0, 1, 0, 0, 0, 23, 23, 0, 0, 0, 21, 19, 2, 0, 0, 0, 0,
0, 1, 0, 22, 22, 0, 19, 0, 3, 0, 21, 0, 1, 20, 1, 1, 1, 22, 1,
22, 1, 22, 1, 0, 2, 0, 25, 23, 0, 20, 0, 2, 22, 0, 0, 1, 0, 1,
23, 22, 0, 1, 19, 23, 1, 0, 2, 0, 18, 0, 0, 2, 0, 0, 23, 0, 0,
0, 0, 0, 1, 2, 1, 0, 21, 0, 21, 20, 0, 1, 19, 23, 0, 1, 23, 0,
1, 22, 21, 3, 0, 22, 2, 0, 1, 23, 2, 0, 24, 23, 21, 23, 20, 0,
0, 0, 20, 22, 0, 2, 0, 17, 0, 0, 1, 22, 1, 1, 1, 0, 0, 3, 3,
5, 21, 21, 1, 19, 18, 0, 24, 1, 2, 0, 0, 1, 1, 0, 0, 0, 0, 0,
0, 23, 1, 20, 0, 0, 1, 19, 22, 21, 24, 3, 1, 2, 24, 0, 0, 23,
17, 22, 0, 24, 23, 16, 1, 0, 2, 20, 0, 19, 0, 2, 1, 22, 20, 0,
20, 0, 1, 22, 0, 1, 0, 2, 0, 1, 0, 0, 2, 25, 24, 2, 20, 3, 0,
0, 23, 0, 4, 0, 19, 1, 0, 1, 0, 3, 19, 22, 0, 0, 0, 1, 0, 1,
23, 20, 20, 23, 0, 0, 0, 24, 0, 21, 20, 23, 0, 1, 1, 0, 19, 0,
0, 0, 1, 22, 0, 22, 0, 1, 18, 0, 20, 1, 0, 0, 1, 20, 0, 0, 0,
0, 0, 0, 0, 0, 19, 0, 0, 1, 0, 2, 23, 19, 21, 4, 1, 0, 0, 1,
23, 21, 21, 4, 20, 24, 0, 3, 0, 20, 23, 1, 23, 21, 20, 18, 0,
21, 2, 1, 21, 0)), class = "data.frame", row.names = c(NA, -713L
))
The issue is that you manually bin your V1 variable using cut(V1, 25. Thereby you get 25 groups which (while most of the time having a zero count) get stacked on top of each other. Hence, you end up with 25 stacked (and overlapping) labels per bin. Instead make use of the bins computed by stat_bin by mapping factor(..x..) on fill:
library(ggplot2)
p <- ggplot(final_df, aes(x = V1, fill = factor(..x..))) +
geom_histogram(show.legend = FALSE)
p +
stat_bin(aes(y = ..count.., label = ..count..), geom = "text", vjust = -.5)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
To get rid of the zero entries you could make use of an ifelse:
p +
stat_bin(aes(y = ..count.., label = ifelse(..count.. > 0, ..count.., "")), geom = "text", vjust = -.5)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Consider dput:
structure(list(REAÇÃO = structure(c(0, 1, 0, 0, 1, 0, 1, 1,
0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1,
1, 0, 1, 1, 0, 1, 1), format.spss = "F11.0"), IDADE = structure(c(22,
38, 36, 58, 37, 31, 32, 54, 60, 34, 45, 27, 30, 20, 30, 30, 22,
26, 19, 18, 22, 23, 24, 50, 20, 47, 34, 31, 43, 35, 23, 34, 51,
63, 22, 29), format.spss = "F11.0"), ESCOLARIDADE = structure(c(6,
12, 12, 8, 12, 12, 10, 12, 8, 12, 12, 12, 8, 4, 8, 8, 12, 8,
9, 4, 12, 6, 12, 12, 12, 12, 12, 12, 12, 8, 8, 12, 16, 12, 12,
12), format.spss = "F11.0"), SEXO = structure(c(1, 1, 0, 0, 1,
0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0,
0, 1, 0, 1, 0, 0, 0, 1, 1, 1), format.spss = "F11.0")), .Names = c("REAÇÃO",
"IDADE", "ESCOLARIDADE", "SEXO"), row.names = c(NA, -36L), class = "data.frame")
where: REAÇÃO is a dependent variable in the model.
Constant: -4.438.
How can I obtain this value using a simple function in R?
For obtain constant term in Discriminant Analysis on R (with library MASS):
groupmean<-(model$prior%*%model$means)
constant<-(groupmean%*%model$scaling)
constant
where model is the lda discriminant expression:
model<-lda(y~x1+x2+xn,data=mydata)
model
Example data:
df <- structure(
list(
group = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 7, 7, 7, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11),
val = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.000141111115226522, 0, 0, 0, 0.00127000000793487, 0.00070555554702878, 0.000141111115226522, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.00127000000793487, 0.000282222230453044, 0, 0, 0.000141111115226522, 0.000282222230453044, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.00070555554702878, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.000141111115226522, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
),
.Names = c("group", "val"),
row.names = c(NA, -1000L),
class = c("data.table", "data.frame"),
.internal.selfref = <pointer: 0xe4d468>
)
I can plot this as a timeseries per group, like so:
ggplot(df, aes(x=unlist(by(val, group, seq_along)), y=val, group=group)) +
geom_line(alpha=0.5)
but I want to plot a rolling mean of the data, like so:
library(zoo)
ggplot(df, aes(x=unlist(by(val, group, seq_along)),
y=rollmean(val, 48, fill=NA), group=group)) +
geom_line(alpha=0.5)
But this adds upticks to the end of each line, that do not exist in the data:
The upticks at 130 and 670 do not exist in the data, nor do they exist in the rolling mean, as you can see with rollmean(df[group==5, val], 48, fill=NA). So what is causing them?
The first uptick does occur at exactly 132. In your rolling mean, you chose the default align, which sets it to center, meaning that each point is the mean of the previous k/2 and the future k/2 points. Since you set k=48, it means that point 132 will be the mean of (132-24):(132+24). You can verify that the first non-zero point is indeed 156.
# First non-zero value
min(which(df$val!=0))
# 156
You can also verify that the first non-zero value in the rolling mean is 132.
df$rollmean <- rollmean(df$val, 48, fill=NA)
min(which(df$rollmean!=0))
# 132
Additionally, it looks like you are applying your rolling mean across all groups, which you almost certainly don't want. Try splitting by group, like you did with by to create the time variable. Here is an example:
# Set a time variable before hand
df$time <- with(df, unlist(by(val, group, seq_along)))
df$group <- as.factor(df$group)
k=48
# Remove those groups wtihout enough values for rolling mean of k window
df.subset <- df[df$group %in% names(which(table(df$group) >= k)),]
# Calculate a rolling mean on each group
df.subset$rollmean <- unlist(by(df.subset$val, df.subset$group, FUN=rollmean, k=k, fill=NA))
# Plot
ggplot(df.subset, aes(x=time,
y=rollmean,
colour=group)) + geom_line()
I work with animal trials in which I try to get information about movement for several groups of animals (normally 4 groups of 12 individuals, but not allways the same).
My final data frame per trial looks like this.
> dput(aa)
structure(list(Tiempo = c(618.4, 618.6, 618.8, 619, 619.2, 619.4,
619.6, 619.8, 620, 620.2, 620.4), UT1 = c(0, 0, 15, 19, 26, 27,
29, 37, 42, 44, 45), UT2 = c(0, 0, 0, 0, 0, 1, 18, 19, 21, 21,
21), UT3 = c(0, 2, 3, 3, 3, 3, 16, 19, 20, 20, 20), UT4 = c(0,
0, 0, 0, 0, 0, 5, 17, 29, 34, 39), UT5 = c(0, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1), UT6 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), UT7 = c(0,
0, 1, 2, 2, 3, 4, 6, 7, 7, 8), UT8 = c(0, 19, 20, 23, 24, 25,
33, 80, 119, 122, 130), UT9 = c(0, 1, 1, 1, 1, 3, 6, 9, 19, 19,
19), UT10 = c(0, 0, 0, 0, 0, 1, 2, 3, 10, 12, 14), TR1 = c(0,
0, 0, 0, 0, 0, 0, 1, 2, 2, 2), TR2 = c(0, 0, 0, 0, 0, 0, 2, 19,
32, 37, 43), TR3 = c(0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), TR4 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0), TR5 = c(0, 0, 0, 0, 0, 0, 13,
18, 20, 22, 26), TR6 = c(0, 2, 11, 20, 25, 29, 37, 40, 41, 42,
43), TR7 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), TR8 = c(0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0), TR9 = c(0, 0, 4, 9, 16, 19, 23, 27,
31, 33, 34), TR10 = c(0, 1, 9, 25, 32, 41, 49, 49, 51, 57, 60
), UT1.1 = c(0, 10, 15, 17, 23, 31, 37, 48, 53, 57, 58), UT2.1 = c(0,
1, 1, 1, 1, 2, 2, 4, 4, 4, 4), UT3.1 = c(0, 2, 11, 14, 20, 22,
24, 25, 26, 26, 26), UT4.1 = c(0, 0, 0, 0, 0, 0, 0, 11, 13, 13,
14), UT5.1 = c(0, 3, 5, 7, 18, 19, 19, 27, 37, 39, 42), UT6.1 = c(0,
0, 0, 0, 0, 0, 2, 2, 3, 4, 4), UT7.1 = c(0, 0, 2, 8, 9, 9, 12,
16, 18, 18, 18), UT8.1 = c(0, 0, 1, 8, 13, 15, 44, 68, 80, 89,
94), UT9.1 = c(0, 1, 1, 1, 1, 2, 3, 5, 9, 10, 10), UT10.1 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0), UT11 = c(0, 12, 17, 17, 18, 34,
74, 116, 131, 145, 170), UT12 = c(0, 1, 2, 3, 3, 3, 5, 14, 21,
22, 24), TR1.1 = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1), TR2.1 = c(0,
0, 0, 11, 16, 19, 40, 94, 121, 134, 145), TR3.1 = c(0, 0, 0,
2, 3, 5, 6, 6, 6, 7, 7), TR4.1 = c(0, 0, 0, 1, 1, 1, 1, 1, 4,
4, 5), TR5.1 = c(0, 24, 27, 28, 29, 37, 86, 151, 212, 258, 288
), TR6.1 = c(0, 0, 1, 1, 1, 2, 5, 9, 12, 12, 13), TR7.1 = c(0,
4, 7, 28, 47, 70, 108, 125, 127, 127, 127), TR8.1 = c(0, 1, 2,
2, 2, 2, 3, 3, 4, 4, 4), TR9.1 = c(0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0), TR10.1 = c(0, 1, 1, 1, 1, 1, 13, 40, 41, 45, 49), TR11 = c(0,
0, 0, 1, 4, 8, 10, 11, 17, 23, 25), TR12 = c(0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0)), .Names = c("Tiempo", "UT1", "UT2", "UT3", "UT4",
"UT5", "UT6", "UT7", "UT8", "UT9", "UT10", "TR1", "TR2", "TR3",
"TR4", "TR5", "TR6", "TR7", "TR8", "TR9", "TR10", "UT1.1", "UT2.1",
"UT3.1", "UT4.1", "UT5.1", "UT6.1", "UT7.1", "UT8.1", "UT9.1",
"UT10.1", "UT11", "UT12", "TR1.1", "TR2.1", "TR3.1", "TR4.1",
"TR5.1", "TR6.1", "TR7.1", "TR8.1", "TR9.1", "TR10.1", "TR11",
"TR12"), row.names = c(NA, -11L), class = "data.frame")
My goal is to lm the individuals represented in each column using Tiempo variable as x so I do it like this:
fit<-apply(aa,2,function(x) lm(x~aa$Tiempo))
It works perfect but the problem is that all the valuable (and useless) information gets stored in that lm object and I can't extract the data in an efficient way. My lm object looks like this
summary(fit)
Length Class Mode
Tiempo 12 lm list
UT1 12 lm list
UT2 12 lm list
UT3 12 lm list
UT4 12 lm list
UT5 12 lm list
UT6 12 lm list
UT7 12 lm list
UT8 12 lm list
UT9 12 lm list
UT10 12 lm list
TR1 12 lm list
TR2 12 lm list
TR3 12 lm list
TR4 12 lm list
TR5 12 lm list
TR6 12 lm list
TR7 12 lm list
TR8 12 lm list
TR9 12 lm list
TR10 12 lm list
UT1.1 12 lm list
UT2.1 12 lm list
UT3.1 12 lm list
UT4.1 12 lm list
UT5.1 12 lm list
UT6.1 12 lm list
UT7.1 12 lm list
UT8.1 12 lm list
UT9.1 12 lm list
UT10.1 12 lm list
UT11 12 lm list
UT12 12 lm list
TR1.1 12 lm list
TR2.1 12 lm list
TR3.1 12 lm list
TR4.1 12 lm list
TR5.1 12 lm list
TR6.1 12 lm list
TR7.1 12 lm list
TR8.1 12 lm list
TR9.1 12 lm list
TR10.1 12 lm list
TR11 12 lm list
TR12 12 lm list
And each animal looks like this
summary(fit$UT1)
Call:
lm(formula = x ~ aa$Tiempo)
Residuals:
Min 1Q Median 3Q Max
-6.873 -1.845 1.182 2.314 4.918
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -14642.700 1104.825 -13.25 3.29e-07 ***
aa$Tiempo 23.682 1.784 13.28 3.24e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.742 on 9 degrees of freedom
Multiple R-squared: 0.9514, Adjusted R-squared: 0.946
F-statistic: 176.3 on 1 and 9 DF, p-value: 3.24e-07
I would like to get the summary information organised in a data frame with all animals (or at least the coefficients and R-squared data) in order to keep doing some statistical analysis. Having that information cuould possibly help me to think a function to evaluate if the R-squared is lower than a fixed value and I should check that fit (or discard that animal if it's really not performing well). Besides, I should find a way to make it reproducible because nowadays I'm using
FIT<-data.frame(UT1=fit$UT1$coefficients,
UT2=fit$UT2$coefficients,
UT3=fit$UT3$coefficients,...)
This approach doesn't even meet what I'm trying to do and it's really precarious.
I've made a little search and find about coef function but
coef(fit)
NULL
With your fit list, you can extract the coefficients and r-squared values with
fit<-apply(aa,2,function(x) lm(x~aa$Tiempo))
mysummary <- t(sapply(fit, function(x) {
ss<-summary(x); c(coef(x),
r.square=ss$r.squared, adj.r.squared=ss$adj.r.squared)
}))
We use sapply to go over the list you created and extract the coefficients from the model and the r-squared values from the summary. The output is
> mysummary
(Intercept) aa$Tiempo r.square adj.r.squared
Tiempo 0.0000 1.0000000 1.0000000 1.0000000
UT1 -14642.7000 23.6818182 0.9514231 0.9460256
UT2 -8662.4182 14.0000000 0.7973105 0.7747894
UT3 -7535.5091 12.1818182 0.8404400 0.8227111
...
I was wondering if anyone could tell me how to represent the enumeration of vectors of privite key f in a Meet-In-the-Middle Attack on an NTRU Private key. I can not understand the example, given here http://securityinnovation.com/cryptolab/pdf/NTRUTech004v2.pdf
I'll be very thankful if anyone could show an example in detail.
(Full disclosure: I work for Security Innovation and worked for NTRU until SI acquired us)
Warning: Long answer!
Let's look at a toy example: N = 11, q = 29. Let's take df = 3, so f consists of 3 coefficients equal to 1 and 8 coefficients equal to 0. Take dg = 5. And assume that h = g*f^{-1} mod p, rather than using the optimizations that have f = 1+pF. Then we might have
f = [1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0]
finv = [16, 12, 4, 18, 17, 14, 9, 28, 8, 26, 3]
g = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0]
h = [15, 20, 1, 21, 4, 26, 14, 17, 25, 11, 12]
You can check that f*h = g here.
The attacker wants to find f, so they can do the brute force search for df = 3. They can speed this up by taking advantage of the fact that there will be some rotation of f that has a 1 in the first position, so they only need to search the (10 pick 2) possible locations for the other two nonzero coefficients of f. The full search they perform is this:
f*h (=g) f
[9, 18, 7, 13, 26, 22, 15, 28, 27, 24, 19]; [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
[23, 17, 4, 8, 16, 2, 3, 6, 10, 21, 11]; [1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
[15, 2, 3, 5, 11, 21, 12, 23, 17, 4, 8]; [1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
[12, 23, 17, 4, 8, 16, 2, 3, 5, 11, 20]; [1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
[24, 20, 9, 18, 7, 13, 26, 22, 14, 28, 27]; [1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
[2, 3, 6, 10, 21, 12, 23, 17, 4, 8, 15]; [1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0]
[19, 10, 18, 7, 13, 26, 22, 14, 28, 27, 24]; [1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0]
[28, 27, 25, 19, 10, 18, 7, 13, 25, 22, 14]; [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
[18, 7, 13, 26, 22, 15, 28, 27, 24, 19, 9]; [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
[22, 14, 28, 27, 25, 19, 10, 18, 7, 13, 25]; [1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0]
[14, 28, 27, 24, 20, 9, 19, 6, 14, 25, 22]; [1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0]
[11, 20, 12, 23, 17, 4, 9, 15, 2, 3, 5]; [1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0]
[23, 17, 4, 8, 16, 1, 4, 5, 11, 20, 12]; [1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]
[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0]; [1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0]
[18, 7, 13, 26, 22, 14, 0, 26, 25, 19, 9]; [1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0]
[27, 24, 20, 9, 19, 6, 14, 25, 22, 14, 28]; [1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
[17, 4, 8, 16, 2, 3, 6, 10, 21, 11, 23]; [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1]
[28, 27, 24, 19, 10, 18, 7, 13, 26, 22, 14]; [1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0]
[25, 19, 9, 18, 7, 13, 26, 22, 14, 0, 26]; [1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0]
[8, 16, 1, 3, 6, 10, 21, 12, 23, 17, 4]; [1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
[15, 28, 27, 24, 20, 9, 18, 7, 13, 26, 21]; [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
[3, 6, 10, 21, 12, 23, 17, 4, 8, 16, 1]; [1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
[12, 23, 17, 4, 9, 15, 2, 3, 5, 11, 20]; [1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0]
[2, 3, 5, 11, 21, 12, 23, 17, 4, 8, 15]; [1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]
[17, 4, 8, 15, 2, 3, 6, 10, 21, 12, 23]; [1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0]
[0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1]; [1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0]
[7, 13, 26, 21, 15, 28, 27, 24, 20, 9, 18]; [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
[24, 20, 9, 18, 7, 13, 26, 21, 15, 28, 27]; [1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0]
[4, 8, 16, 1, 4, 5, 11, 20, 12, 23, 17]; [1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
[23, 17, 4, 8, 16, 2, 3, 5, 11, 20, 12]; [1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1]
[26, 22, 14, 28, 27, 24, 20, 9, 18, 7, 13]; [1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
[4, 5, 11, 20, 12, 23, 17, 4, 8, 16, 1]; [1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0]
[21, 12, 23, 17, 4, 8, 16, 1, 3, 6, 10]; [1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0]
[1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0]; [1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0]
[20, 9, 18, 7, 13, 26, 22, 14, 28, 27, 24]; [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
[16, 2, 3, 5, 11, 20, 12, 23, 17, 4, 8]; [1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0]
[4, 9, 15, 2, 3, 5, 11, 20, 12, 23, 17]; [1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0]
[13, 26, 22, 14, 0, 26, 25, 19, 9, 18, 7]; [1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0]
[3, 6, 10, 21, 12, 23, 17, 4, 8, 15, 2]; [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1]
[11, 21, 12, 23, 17, 4, 8, 15, 2, 3, 5]; [1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0]
[20, 9, 19, 6, 14, 25, 22, 14, 28, 27, 24]; [1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0]
[10, 18, 7, 13, 26, 22, 14, 28, 27, 24, 19]; [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1]
[8, 16, 2, 3, 6, 10, 21, 11, 23, 17, 4]; [1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
[27, 25, 19, 10, 18, 7, 13, 25, 22, 14, 28]; [1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
[7, 13, 26, 22, 15, 28, 27, 24, 19, 9, 18]; [1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
Scan down there, and you can see that g appears in row 14, 26 and 34 of the 45 rows. (g appears three times because there are three 1's in f, so there are three rotations of f that have a 1 in the leading position).
Now let's look at the meet-in-the-middle attack. The attacker uses the formula
(f1+f2) * h = g
so
f1*h = g - f2*h
Using e[i] to mean the i'th coefficient of e, this means that the attacker knows that
(f1*h)[i] = - (f2*h)[i] + 0 or 1
So the attacker calculates all possible values of f1*h. Call the resulting list {g1}. They then calculate -f2*h and for each result g2, they see if g2 is the same as an existing g1 or if g2 differs from any g1 by no more than 1 in each coefficient. In other words,
[3, 10, 12, 7]
would match
[4, 10, 12, 8]
Doing it this way, the attacker needs only work through the following:
All 10 f1s with a 1 in the leading position and a 1 somewhere else
All 10 f2s with a single 1 in any position other than the leading one
This gives the following. I've sorted the lists to make the matches easier to spot.
f1*h = g1 f1
[00, 08, 26, 03, 16, 12, 05, 18, 17, 15, 09] [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
[03, 16, 12, 04, 19, 17, 15, 09, 00, 08, 26] [1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
[06, 21, 22, 25, 01, 11, 02, 13, 07, 23, 27] [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
[07, 24, 27, 06, 21, 22, 25, 00, 11, 02, 13] [1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
[11, 02, 13, 07, 24, 27, 06, 21, 22, 25, 00] [1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
[12, 05, 18, 17, 15, 09, 00, 08, 26, 03, 16] [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
[16, 12, 05, 18, 18, 14, 10, 28, 08, 26, 03] [1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
[19, 17, 15, 09, 00, 08, 26, 03, 16, 12, 04] [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
[26, 03, 16, 12, 05, 18, 18, 14, 10, 28, 08] [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
[27, 06, 21, 22, 25, 01, 11, 02, 13, 07, 23] [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
-f2*h = g2 f2
[03, 15, 12, 04, 18, 17, 14, 09, 28, 08, 25] [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
[04, 18, 17, 14, 09, 28, 08, 25, 03, 15, 12] [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
[08, 25, 03, 15, 12, 04, 18, 17, 14, 09, 28] [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
[09, 28, 08, 25, 03, 15, 12, 04, 18, 17, 14] [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
[12, 04, 18, 17, 14, 09, 28, 08, 25, 03, 15] [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
[15, 12, 04, 18, 17, 14, 09, 28, 08, 25, 03] [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
[17, 14, 09, 28, 08, 25, 03, 15, 12, 04, 18] [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[18, 17, 14, 09, 28, 08, 25, 03, 15, 12, 04] [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
[25, 03, 15, 12, 04, 18, 17, 14, 09, 28, 08] [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
[28, 08, 25, 03, 15, 12, 04, 18, 17, 14, 09] [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
You can see that:
line 1 of g1 matches with line 10 of g2, giving [1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0]
line 2 of g1 matches with line 1 of g2, giving [1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0]
line 6 of g1 matches with line 5 of g2, giving [1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0]
line 7 of g1 matches with line 6 of g2, giving [1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0]
line 8 of g1 matches with line 8 of g2, giving [1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0]
line 9 of g1 matches with line 9 of g2, giving [1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0]
There are 6 collisions here because there are 3 rotations with a 1 in the leading position and for each rotation there are two ways to pick the other two coefficients.
So an attacker would have to do about 45/3 = 15 work to find the key with a brute force search and about 10 work to find the key with a meet-in-the-middle attack (slightly less than 10 due to the rotations, but I don't have a clean formula to hand).
There are various optimizations, but this should be enough to give you the idea.
One thing I haven't dealt with so far is how to keep the search time down. A straightforward way to do it is simply to sort the results as you're going along. The time to insert or look for a collision with an entry is about log_2(size of the search space). Alternatively, at the cost of using more memory, it's possible to bring this search time down to a constant by reserving a block for each possible value of the first few coefficients of g1.
Hope this helps. Let me know if you have any more questions.