I need to fit a glmer with a continuous response variable, two variables as fixed effects, and a random effect:
glmer.1 <- glmer(Conc ~ Metodo * Kit +
(1|Especie),
data = Input_2,
family = Gamma(link = "inverse"))
I got this warning message:
Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.0652105 (tol = 0.002, component 1)
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
I tried again with "bobyqa" as the optimizer and got these warning messages:
glmer.Conc<-glmer(Conc ~ Metodo * Kit
+ (1 | Especie),
data = Input_2,
control = glmerControl(optimizer = "bobyqa"),
family = Gamma)
Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.0747655 (tol = 0.002, component 1)
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
My summary looks like this:
summary(glmer.1)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: Gamma ( inverse )
Formula: Conc ~ Metodo * Kit + (1 | Especie)
Data: Input_2
AIC BIC logLik deviance df.resid
3202.0 3223.6 -1595.0 3190.0 264
Scaled residuals:
Min 1Q Median 3Q Max
-0.7347 -0.5647 -0.2876 0.2740 12.0396
Random effects:
Groups Name Variance Std.Dev.
Especie (Intercept) 5.264e-05 0.007256
Residual 1.724e+00 1.313030
Number of obs: 270, groups: Especie, 6
Fixed effects:
Estimate Std. Error t value Pr(>|z|)
(Intercept) 0.0089191 0.0028893 3.087 0.00202 **
Metodo -0.0002581 0.0009247 -0.279 0.78013
Kit -0.0009974 0.0007818 -1.276 0.20203
Metodo:Kit 0.0007497 0.0004608 1.627 0.10380
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) Metodo Kit
Metodo -0.531
Kit -0.531 0.857
Metodo:Kit 0.449 -0.907 -0.897
convergence code: 0
Model failed to converge with max|grad| = 0.0652105 (tol = 0.002, component 1)
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
Especie represents the species, Metodo and Kit represent the method and kit used to obtain the concentration, and Conc holds the concentration values.
My data frame looks like this:
Input_2 <- readr::read_table2("Especie Metodo Kit Conc A1 A2
1 1 1 70.4 1.5 0.9
1 1 1 57.2 1.7 1.4
1 1 1 22.2 1.6 1.1
1 1 1 60.6 1.5 0.9
1 1 1 50.2 1.6 1
1 1 2 40.5 2.1 1.7
1 1 2 86.6 2.1 2.1
1 1 2 53.9 1.8 1.3
1 1 2 50.3 1.9 0.2
1 1 2 44.1 2 1.5
1 1 3 159.1 2.1 2.3
1 1 3 246 2.1 2.4
1 1 3 81.5 2 1.3
1 1 3 107.9 2.1 1.9
1 1 3 125 2 1.7
1 2 1 72.9 1.6 1.1
1 2 1 38.1 1.6 1.1
1 2 1 8.3 1.3 1.8
1 2 1 31 1.6 0.8
1 2 1 36.3 1.3 1.2
1 2 2 16.8 1.9 1.8
1 2 2 4.8 2.2 1
1 2 2 20.4 1.4 1
1 2 2 31.8 1.9 0.4
1 2 2 5 1.6 1
1 2 3 17.9 2 2.2
1 2 3 14.9 2.6 2.5
1 2 3 5.3 1.6 0.5
1 2 3 17.1 2.2 1.3
1 2 3 17.1 1.9 1
1 3 1 67.3 1.6 1
1 3 1 11.3 1.4 0.7
1 3 1 4.2 1 1
1 3 1 9.7 1.1 0.8
1 3 1 20 1.4 0.8
1 3 2 4.3 1.4 1.3
1 3 2 4.1 1.6 2
1 3 2 11.4 1.5 0.3
1 3 2 12.5 1.4 0.2
1 3 2 11 1.5 0.9
1 3 3 59.7 1.9 2.4
1 3 3 81.4 2.3 2.4
1 3 3 32.2 1.9 1.6
1 3 3 24.9 1.8 1.2
1 3 3 50.9 2.1 1.2
2 1 1 185 1.6 1
2 1 1 146.2 1.7 1.2
2 1 1 239 1.8 1.4
2 1 1 141.9 2 2
2 1 1 303.7 1.9 1.8
2 1 2 53.6 1.9 0.6
2 1 2 424.8 2.1 1.5
2 1 2 30.3 1.9 0.4
2 1 2 291 2.1 1.5
2 1 2 471.6 2.1 2.1
2 1 3 132.8 1.9 1.4
2 1 3 124.5 2.1 1.7
2 1 3 484.4 2.1 2.1
2 1 3 251.3 2.2 2.2
2 1 3 598.6 2.1 2.2
2 2 1 15.6 1.4 0.9
2 2 1 107.6 1.7 1.1
2 2 1 172.2 1.8 1.6
2 2 1 210.2 1.9 1.9
2 2 1 699.7 2 2.2
2 2 2 20 1.7 0.2
2 2 2 86.2 2 0.9
2 2 2 36.8 2.1 0.3
2 2 2 434.4 2.1 2.3
2 2 2 209.5 2.2 1.5
2 2 3 37 2.1 1.6
2 2 3 94.8 2.1 1.8
2 2 3 220.2 2.2 2.1
2 2 3 17.7 2.3 0.8
2 2 3 145.3 2.1 1.9
2 3 1 33.4 1.5 0.8
2 3 1 87 1.7 1.2
2 3 1 98.9 1.8 1.8
2 3 1 176.6 2 2.2
2 3 1 180.7 1.9 1.9
2 3 2 20.8 1.8 0.3
2 3 2 226.2 2.1 1.1
2 3 2 227.8 2.1 1.7
2 3 2 40.8 2.1 0.4
2 3 2 83.4 2.1 0.9
2 3 3 35.7 2.1 1.2
2 3 3 191.9 2.1 1.8
2 3 3 203.2 2.2 2.2
2 3 3 143.3 2.2 2
2 3 3 70.7 2.1 1.7
3 1 1 190.7 2.1 2
3 1 1 456.5 2 2.2
3 1 1 520.9 2 2.2
3 1 1 535.5 2 2.1
3 1 1 894.3 2 2.3
3 1 2 265.2 2.1 1.7
3 1 2 392.2 2.1 2.3
3 1 2 241.3 2.1 1.4
3 1 2 156.9 2.1 1.7
3 1 2 277.2 2.1 1.7
3 1 3 431.6 2.1 2.1
3 1 3 669.9 2.2 2.3
3 1 3 342.5 2.1 2.1
3 1 3 572.3 2.1 2.2
3 1 3 435.2 2.1 2.1
3 2 1 498.6 2 2.1
3 2 1 1137.2 2.1 2.3
3 2 1 764.5 2.1 2.3
3 2 1 746.4 2.1 2.2
3 2 1 737.2 2.1 2.2
3 2 2 98.1 2 1.4
3 2 2 237.9 2.1 2.2
3 2 2 144 2 1.2
3 2 2 240.3 2.1 2.1
3 2 2 253.4 2.1 1.8
3 2 3 247.3 2.2 1.7
3 2 3 155.3 2.1 1.8
3 2 3 179.3 2.2 1.6
3 2 3 225.7 2.1 2.2
3 2 3 274.4 2.1 2.1
3 3 1 944.9 2.1 2.3
3 3 1 978.4 2.1 2.3
3 3 1 785.9 2 2.1
3 3 1 510.7 2 2.1
3 3 1 164.5 2 2.2
3 3 2 48.1 2 0.3
3 3 2 427.7 2.1 2.2
3 3 2 75 2 0.9
3 3 2 153 2.1 1.3
3 3 2 293.8 2.1 2.4
3 3 3 335.8 2.1 1.9
3 3 3 395.5 2.1 2.1
3 3 3 181.1 2.1 1.9
3 3 3 468.4 2.1 2.2
3 3 3 348 2.1 2.1
4 1 1 335.8 2 1.9
4 1 1 188.1 1.9 1.5
4 1 1 219 2 2
4 1 1 104.1 1.9 1.3
4 1 1 273.7 2 2
4 1 2 573.2 2.2 1.8
4 1 2 115.3 2.1 2.1
4 1 2 56.7 2 0.4
4 1 2 316.5 2.1 1.6
4 1 2 85.2 2.1 1.9
4 1 3 585 2.2 2.3
4 1 3 377.5 2.1 2
4 1 3 47.1 2.1 1.6
4 1 3 73.1 2 1.4
4 1 3 91.9 2 1.2
4 2 1 374.8 2 1.7
4 2 1 85.8 1.9 1.9
4 2 1 53.1 1.8 1.8
4 2 1 34.3 1.8 1.1
4 2 1 27 1.9 1.1
4 2 2 182.9 2.1 1.3
4 2 2 25.3 1.8 0.4
4 2 2 4.4 2.4 0.1
4 2 2 117.8 2.2 2.1
4 2 2 6.6 1.9 0.8
4 2 3 158.9 2.2 2.2
4 2 3 44.5 2 1
4 2 3 12.6 1.7 0.5
4 2 3 22.5 2.3 1
4 2 3 91 2 1.3
4 3 1 212.1 1.7 1.2
4 3 1 101.7 1.9 1.8
4 3 1 63.4 1.4 0.9
4 3 1 67.9 1.8 1.3
4 3 1 40.4 1.6 0.9
4 3 2 73 2.1 0.9
4 3 2 33.1 2.1 0.6
4 3 2 7.8 2 0.3
4 3 2 10.9 1.9 1.4
4 3 2 11.6 2 0.2
4 3 3 382.8 2.2 2.3
4 3 3 7 2.4 0.4
4 3 3 13.1 2 0.7
4 3 3 24.7 2.2 1.1
4 3 3 42.9 2 0.9
5 1 1 340 1.4 0.9
5 1 1 48.6 1.5 0.6
5 1 1 53.7 1.4 0.6
5 1 1 33.3 1.3 0.5
5 1 1 86 1.3 0.5
5 1 2 11.4 1.4 0.5
5 1 2 27 1.6 1
5 1 2 33.7 1.7 0.9
5 1 2 14.2 1.8 1.3
5 1 2 16.2 1.9 0.8
5 1 3 606 1.6 0.7
5 1 3 265.8 1.7 1
5 1 3 73.5 1.5 2.6
5 1 3 189 1.6 0.7
5 1 3 223.1 1.6 0.7
5 2 1 101.5 1.5 0.6
5 2 1 7.2 1.2 0.5
5 2 1 49.7 1.4 0.7
5 2 1 61.9 1.5 0.6
5 2 1 6.9 1.1 0.6
5 2 2 20.6 1.4 0.7
5 2 2 13.7 1.7 1
5 2 2 48.5 1.8 1
5 2 2 6.8 1.6 0.7
5 2 2 5.7 1.3 0.7
5 2 3 239.5 1.7 0.8
5 2 3 152.5 1.7 0.8
5 2 3 2065.4 1.5 1
5 2 3 112.7 1.6 0.6
5 2 3 104.5 1.6 0.7
5 3 1 40.7 1.3 0.5
5 3 1 82 1.4 0.9
5 3 1 20.6 1.3 1
5 3 1 21.4 1.4 0.6
5 3 1 29.6 1.4 0.6
5 3 2 4.3 1.1 0.5
5 3 2 38.5 1.3 1.3
5 3 2 22.9 1.7 1.1
5 3 2 10.3 1.7 1
5 3 2 4.9 1.3 0.6
5 3 3 216.8 1.7 0.8
5 3 3 220.7 1.7 0.9
5 3 3 51.8 1.3 1.3
5 3 3 161 1.6 0.7
5 3 3 144.4 1.6 0.6
6 1 1 79.9 2.1 1.8
6 1 1 295.3 2.1 2.1
6 1 1 136.4 2.1 2.1
6 1 1 177.3 2 2.1
6 1 1 116.4 1.7 1.1
6 1 2 93.1 2.1 2.2
6 1 2 385.9 2.1 2.3
6 1 2 318.6 2.1 1.8
6 1 2 20.3 2.1 1.4
6 1 2 131.5 2.1 1.4
6 1 3 53.9 2.2 2.7
6 1 3 156.9 2.2 1.8
6 1 3 344.7 2.1 2
6 1 3 15.5 2.6 0.6
6 1 3 33.7 2.1 1.3
6 2 1 137.8 1.8 1.2
6 2 1 104 2.1 2
6 2 1 151.7 2.1 2
6 2 1 15.4 2.1 1.6
6 2 1 31.5 1.9 2.1
6 2 2 8 2.1 1.5
6 2 2 23.9 2 0.9
6 2 2 14.8 2.2 0.4
6 2 2 8.6 1.8 0.4
6 2 2 18.2 2.2 0.3
6 2 3 35 2.2 2.7
6 2 3 61 2.2 1.7
6 2 3 136.9 2.1 2.3
6 2 3 3.1 8.5 0.2
6 2 3 8.7 1.6 1.1
6 3 1 155.7 2 2.2
6 3 1 48.5 1.7 0.9
6 3 1 77.7 2 1.5
6 3 1 45.6 1.6 1.2
6 3 1 142 1.6 1.1
6 3 2 5.6 1.4 1.3
6 3 2 7.4 2.2 0.1
6 3 2 66.4 1.9 1.4
6 3 2 6.8 2.3 0.7
6 3 2 10.3 1.9 2.2
6 3 3 42.9 2.1 2.5
6 3 3 48.3 2.3 1.2
6 3 3 62.6 2.3 1.4
6 3 3 6.9 2.2 0.3
6 3 3 3.9 1.8 0.5")
Did I forget to include anything in the GLMM? Thanks!
Edit 1:
I have converted the Metodo and Kit variables to factors:
Input_2$Metodo<-as.factor(Input_2$Metodo)
Input_2$Kit<-as.factor(Input_2$Kit)
This gives a different warning:
Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.00649539 (tol = 0.002, component 1)
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
How can I solve these problems?
Based on a suggestion, I tried to rescale my variables:
numcols <- grep("^c\\.",names(Input_2))
dfs <- Input_2
dfs[,numcols] <- scale(dfs[,numcols])
m1_sc <- update(glmer.1,data=dfs)
This gives the following warning message:
Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.00649539 (tol = 0.002, component 1)
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
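The warning is unchanged most likely because nothing was rescaled: none of the column names in Input_2 (Especie, Metodo, Kit, Conc, A1, A2) starts with "c.", so grep("^c\\.", ...) returns no columns. A minimal sketch of selecting the numeric columns explicitly, using four rows taken from the data above. Two assumptions worth stating: the response Conc is left alone because the Gamma family needs strictly positive values (centring would create negatives), and after Edit 1 Metodo and Kit are factors, so the model formula no longer contains any continuous predictor to rescale.

```r
# Four rows taken from Input_2 above, with Metodo and Kit already converted to factors
Input_2 <- data.frame(Especie = c(1, 1, 2, 2),
                      Metodo  = factor(c(1, 2, 1, 2)),
                      Kit     = factor(c(1, 1, 1, 1)),
                      Conc    = c(70.4, 72.9, 185, 15.6),
                      A1      = c(1.5, 1.6, 1.6, 1.4),
                      A2      = c(0.9, 1.1, 1.0, 0.9))

# The original pattern matches nothing: no column name starts with "c."
grep("^c\\.", names(Input_2))

# Pick numeric columns explicitly, excluding the grouping variable and the Gamma response
numcols <- setdiff(names(Input_2)[sapply(Input_2, is.numeric)], c("Especie", "Conc"))

# Centre and scale only those columns
dfs <- Input_2
dfs[numcols] <- lapply(dfs[numcols], function(v) as.numeric(scale(v)))
dfs
```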
Edit 2: Based on the suggestions, I replaced the interaction (*) with an additive model (+). In addition, I converted the species column to a factor:
Input_2$Especie<-as.factor(Input_2$Especie)
glmer.1 <- glmer(Conc ~ Metodo + Kit +
(1|Especie),
data = Input_2,
family = Gamma(link = "inverse"))
and I get the following warning:
Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.00513611 (tol = 0.002, component 1)
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
A second suggestion was to switch from glmer to glm, to check with a simpler model whether the model was saturated (since glm only fits fixed effects, I changed the species variable to a fixed factor):
glmer.1 <- glm(Conc ~ Metodo + Kit +Especie,
data = Input_2,
family = Gamma(link = "inverse"))
When I run it, it works without any problem.
How can I solve these problems with glmer? What is my mistake?
Thank you in advance
Related
Given a data frame like the one below, how can I get an estimated treatment means plot in R?
I have tried using ggplot:
p <- ggplot(data2, aes(x = faca, y = res, colour = facb, shape = facb))
plot(p)
but I'm getting an empty plot. How can I fix this?
>data2
res faca facb obs
1 2.4 1 low 1
2 2.7 1 low 2
3 2.3 1 low 3
4 2.5 1 low 4
5 4.6 1 medium 1
6 4.2 1 medium 2
7 4.9 1 medium 3
8 4.7 1 medium 4
9 4.8 1 high 1
10 4.5 1 high 2
11 4.4 1 high 3
12 4.6 1 high 4
13 5.8 2 low 1
14 5.2 2 low 2
15 5.5 2 low 3
16 5.3 2 low 4
17 8.9 2 medium 1
18 9.1 2 medium 2
19 8.7 2 medium 3
20 9.0 2 medium 4
21 9.1 2 high 1
22 9.3 2 high 2
23 8.7 2 high 3
24 9.4 2 high 4
25 6.1 3 low 1
26 5.7 3 low 2
27 5.9 3 low 3
28 6.2 3 low 4
29 9.9 3 medium 1
30 10.5 3 medium 2
31 10.6 3 medium 3
32 10.1 3 medium 4
33 13.5 3 high 1
34 13.0 3 high 2
35 13.3 3 high 3
36 13.2 3 high 4
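The plot comes out empty because ggplot() on its own only sets up the data and aesthetics; nothing is drawn until a layer is added (in recent ggplot2 versions, something like + stat_summary(fun = mean, geom = "point") + stat_summary(fun = mean, geom = "line")). As a dependency-free alternative, here is a minimal base R sketch, assuming data2 has exactly the columns listed above, that computes the treatment means with aggregate() and draws them with interaction.plot():

```r
# Rebuild data2 from the listing above: 3 levels of faca x 3 levels of facb x 4 obs
data2 <- data.frame(
  res  = c(2.4, 2.7, 2.3, 2.5, 4.6, 4.2, 4.9, 4.7, 4.8, 4.5, 4.4, 4.6,
           5.8, 5.2, 5.5, 5.3, 8.9, 9.1, 8.7, 9.0, 9.1, 9.3, 8.7, 9.4,
           6.1, 5.7, 5.9, 6.2, 9.9, 10.5, 10.6, 10.1, 13.5, 13.0, 13.3, 13.2),
  faca = factor(rep(1:3, each = 12)),
  facb = factor(rep(rep(c("low", "medium", "high"), each = 4), times = 3),
                levels = c("low", "medium", "high")),
  obs  = rep(1:4, times = 9)
)

# Mean response for every faca x facb combination
means <- aggregate(res ~ faca + facb, data = data2, FUN = mean)
means

# Treatment means plot: one line per level of facb across the levels of faca
interaction.plot(x.factor = data2$faca, trace.factor = data2$facb,
                 response = data2$res, fun = mean,
                 xlab = "faca", ylab = "mean of res", trace.label = "facb")
```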
I have a data.table with millions of rows in the following format.
There are multi-year results for each ID; however, I only know the day of the year, running from 1 to 365 (or 366). I know neither the month nor the year, but I know the date of the first row (e.g. 1995/1/1).
ID DAY ATRR1 ATRR2
1 1 0.2 0.4
2 1 1.2 0.5
3 1 0.8 1.4
1 2 1.3 1.5
2 2 2.3 0.3
3 2 1.7 1.3
1 3 1.5 1.4
2 3 2.1 1.3
3 3 1.2 0.3
...
1 365 1.5 1.4
2 365 2.1 1.3
3 365 1.2 0.3
1 1 1.5 1.4
2 1 2.1 1.3
3 1 1.2 0.3
1 2 1.3 1.5
2 2 2.3 0.3
3 2 1.7 1.3
...
I would like to add a DATE column that advances by one day each time the value in the DAY column changes, so the result would be:
ID DAY ATRR1 ATRR2 DATE
1 1 0.2 0.4 1995/1/1
2 1 1.2 0.5 1995/1/1
3 1 0.8 1.4 1995/1/1
1 2 1.3 1.5 1995/1/2
2 2 2.3 0.3 1995/1/2
3 2 1.7 1.3 1995/1/2
1 3 1.5 1.4 1995/1/3
2 3 2.1 1.3 1995/1/3
3 3 1.2 0.3 1995/1/3
...
1 365 1.5 1.4 1995/12/31
2 365 2.1 1.3 1995/12/31
3 365 1.2 0.3 1995/12/31
1 1 1.5 1.4 1996/1/1
2 1 2.1 1.3 1996/1/1
3 1 1.2 0.3 1996/1/1
1 2 1.3 1.5 1996/1/2
2 2 2.3 0.3 1996/1/2
3 2 1.7 1.3 1996/1/2
...
How would it be possible to do that?
You can simply do this:
as.Date(x, origin="1994-12-31")
This assumes that there are no gaps in your dates and that the rows are arranged as described in the question; otherwise it will produce wrong results.
Sample data:
df <- data.frame(Day = rep(c(1:365,1:2),each=3))
Create a sequence like this using run-length encoding (data.table::rleid):
df$seq <- data.table::rleid(df$Day)
df$date <- as.Date(df$seq, origin="1994-12-31") #final answer
tail(df,8)
Let me know if this is what you expect.
Sample Output:
> tail(df,8)
Day seq date
1094 365 365 1995-12-31
1095 365 365 1995-12-31
1096 1 366 1996-01-01
1097 1 366 1996-01-01
1098 1 366 1996-01-01
1099 2 367 1996-01-02
1100 2 367 1996-01-02
1101 2 367 1996-01-02
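If data.table is not available, the same run counter can be built in base R. A minimal sketch, assuming (like the answer above) that the rows are in time order and no days are missing; cumsum(c(TRUE, diff(DAY) != 0)) plays the role of data.table::rleid():

```r
# Hypothetical sample: 3 IDs per day, days 1..365 of the first year, then days 1..2
df <- data.frame(ID  = rep(1:3, times = 367),
                 DAY = rep(c(1:365, 1:2), each = 3))

# Start a new run whenever DAY changes (base R equivalent of data.table::rleid)
run_id <- cumsum(c(TRUE, diff(df$DAY) != 0))

# Run 1 is the known first date; each later run is one day after the previous one
df$DATE <- as.Date("1995-01-01") + (run_id - 1)
tail(df, 4)
```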
Date gaps are no problem for this solution:
library(data.table)
library(lubridate)
library(magrittr)
read.table(text = "
ID DAY ATRR1 ATRR2
1 1 0.2 0.4
2 1 1.2 0.5
3 1 0.8 1.4
1 2 1.3 1.5
2 2 2.3 0.3
3 2 1.7 1.3
1 3 1.5 1.4
2 3 2.1 1.3
3 3 1.2 0.3
1 365 1.5 1.4
2 365 2.1 1.3
3 365 1.2 0.3
1 1 1.5 1.4
2 1 2.1 1.3
3 1 1.2 0.3
1 2 1.3 1.5
2 2 2.3 0.3
3 2 1.7 1.3", header = T) %>% setDT -> x
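# Convert day-of-year to a date, treating every DAY as a day of 1995
x[, date := as.Date(DAY, origin = "1995-01-01") - 1]
x[, date := {
t1 = c(0, diff(DAY))       # change in DAY from the previous row
t2 = ifelse(t1 < 0, 1, 0)  # 1 where DAY drops, i.e. a new year starts
t3 = cumsum(t2)            # number of years elapsed since 1995
t4 = date + years(t3)      # move each date forward by that many years
}]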
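Mapping every DAY through the 1995 calendar and then adding whole years puts dates from day 60 onward one day late in leap years (and day 366 lands on January 1 of the following year). A minimal base R sketch that avoids this, assuming the first row is 1995-01-01 and that a drop in DAY always marks the start of the next calendar year, anchors each row on January 1 of its own year instead:

```r
# Hypothetical sample: one ID, days 1..365 of 1995 followed by days 1..366 of leap year 1996
x <- data.frame(ID = 1, DAY = c(1:365, 1:366))

# A drop in DAY marks the start of a new year; count the drops to get the year
year <- 1995 + cumsum(c(0, diff(x$DAY)) < 0)

# Anchor on January 1 of that year and add the day-of-year offset
x$date <- as.Date(paste0(year, "-01-01")) + (x$DAY - 1)
x[c(365, 366, 425, 731), ]
```

The same two expressions can be used inside x[, date := ...] on a data.table, since rows for different IDs on the same day have a DAY difference of 0.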
I've been working on developing a reference curve for placenta thickness by week in a high-risk group.
So I calculated the .03, .05, .10, .50, .90, .95, and .99 quantiles for each gestational week.
Consequently, I have two datasets, one for placenta thickness and one for the quantiles. I'd like to create a new variable in the former dataset that flags outliers using the lowest and highest quantiles for each week.
Here are examples of the data:
Data A for thickness:
ID week day thickness
1 15 0 1.3
2 15 0 1.5
3 16 2 2.3
4 16 1 3.5
5 16 1 2.5
6 17 0 3.6
7 17 0 3.4
8 17 3 2.4
Data B for quantiles:
week .03 .05 .10 .50 .90 .95 .99
15 1.6 1.7 1.8 2.4 2.6 2.7 2.8
16 1.7 1.8 2.0 2.5 3.1 3.3 3.4
17 1.7 1.8 2.1 2.6 3.4 3.5 3.7
So I tried code using ifelse() like below:
C<-within(A, {outlier = ifelse(A$Thickness<B[2] & A$week == B[1], 1, 0)
outlier = ifelse(A$Thickness>B[8] & A$week == B[1], 1, 0)})
But an error occurred about the mismatched number of rows between the two datasets:
Error in `[<-.data.frame`(`*tmp*`, nl, value = list(outlier = c(0, 0, : replacement element 1 is a matrix/data frame of 33 rows, need 55808
The expected result, based on Data A, looks like this:
Data C:
ID week day thickness outlier
1 15 0 1.3 1
2 15 0 1.5 1
3 16 2 2.3 0
4 16 1 3.5 1
5 16 1 2.5 0
6 17 0 3.6 0
7 17 0 3.4 0
8 17 3 2.4 0
Here is the base R solution I can think of:
transform(A,outlier=as.numeric((C<-thickness-B[as.factor(week),c(2,8)])[,1]<0|C[,2]>0))
ID week day thickness outlier
1 1 15 0 1.3 1
2 2 15 0 1.5 1
3 3 16 2 2.3 0
4 4 16 1 3.5 1
5 5 16 1 2.5 0
6 6 17 0 3.6 0
7 7 17 0 3.4 0
8 8 17 3 2.4 0
You could also write it in two steps:
C=A$thickness-B[as.factor(A$week),c(2,8)] # subtract the .03 (column 2) and .99 (column 8) quantiles of each row's week
transform(A,outlier=as.numeric(C[,1]<0|C[,2]>0)) # outlier if below the .03 quantile (column 1 < 0) or above the .99 quantile (column 2 > 0)
ID week day thickness outlier
1 1 15 0 1.3 1
2 2 15 0 1.5 1
3 3 16 2 2.3 0
4 4 16 1 3.5 1
5 5 16 1 2.5 0
6 6 17 0 3.6 0
7 7 17 0 3.4 0
8 8 17 3 2.4 0
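Indexing B by as.factor(week) uses the factor codes as row positions, so it relies on the k-th row of B holding the k-th distinct week in A (B sorted by week, no weeks skipped). A minimal sketch that looks up each row's week explicitly with match(), assuming B is read with read.table() so its quantile columns are named X.03 ... X.99:

```r
A <- read.table(text = "ID week day thickness
1 15 0 1.3
2 15 0 1.5
3 16 2 2.3
4 16 1 3.5
5 16 1 2.5
6 17 0 3.6
7 17 0 3.4
8 17 3 2.4", header = TRUE)
# read.table turns the headers .03 ... .99 into X.03 ... X.99
B <- read.table(text = "week .03 .05 .10 .50 .90 .95 .99
15 1.6 1.7 1.8 2.4 2.6 2.7 2.8
16 1.7 1.8 2.0 2.5 3.1 3.3 3.4
17 1.7 1.8 2.1 2.6 3.4 3.5 3.7", header = TRUE)

# Row of B holding the quantiles for each row's week (NA if that week is missing from B)
idx <- match(A$week, B$week)

# 1 if thickness is below the .03 quantile or above the .99 quantile, else 0
A$outlier <- as.integer(A$thickness < B$X.03[idx] | A$thickness > B$X.99[idx])
A
```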
A solution using dplyr. We can perform a join and then determine the outlier condition.
library(dplyr)
B2 <- B %>% select(week, X.03, X.99)
A2 <- A %>%
left_join(B2, by = "week") %>%
mutate(outlier = as.integer(thickness < X.03 | thickness > X.99)) %>%
select(-starts_with("X"))
A2
# ID week day thickness outlier
# 1 1 15 0 1.3 1
# 2 2 15 0 1.5 1
# 3 3 16 2 2.3 0
# 4 4 16 1 3.5 1
# 5 5 16 1 2.5 0
# 6 6 17 0 3.6 0
# 7 7 17 0 3.4 0
# 8 8 17 3 2.4 0
Here is the base R version of the same operation.
B2 <- B[, c("week", "X.03", "X.99")]
A2 <- merge(A, B2, by = "week", all.x = TRUE)
A2$outlier <- as.integer(A2$thickness < A2$X.03 | A2$thickness > A2$X.99)
A2[, c("X.03", "X.99")] <- NULL
A2
# week ID day thickness outlier
# 1 15 1 0 1.3 1
# 2 15 2 0 1.5 1
# 3 16 3 2 2.3 0
# 4 16 4 1 3.5 1
# 5 16 5 1 2.5 0
# 6 17 6 0 3.6 0
# 7 17 7 0 3.4 0
# 8 17 8 3 2.4 0
Here is the data.table version of the same operation.
library(data.table)
setDT(A)
setDT(B)
B2 <- B[, .(week, X.03, X.99)]
setkey(A, week)
setkey(B2, week)
A2 <- merge(A, B2)[, outlier := as.integer(!between(thickness, X.03, X.99))  # 1 if outside [.03, .99]
][, c("X.03","X.99"):=NULL]
A2[]
# week ID day thickness outlier
# 1: 15 1 0 1.3 1
# 2: 15 2 0 1.5 1
# 3: 16 3 2 2.3 0
# 4: 16 4 1 3.5 1
# 5: 16 5 1 2.5 0
# 6: 17 6 0 3.6 0
# 7: 17 7 0 3.4 0
# 8: 17 8 3 2.4 0
DATA
A <- read.table(text = "ID week day thickness
1 15 0 1.3
2 15 0 1.5
3 16 2 2.3
4 16 1 3.5
5 16 1 2.5
6 17 0 3.6
7 17 0 3.4
8 17 3 2.4
",
header = TRUE)
B <- read.table(text = "week .03 .05 .10 .50 .90 .95 .99
15 1.6 1.7 1.8 2.4 2.6 2.7 2.8
16 1.7 1.8 2.0 2.5 3.1 3.3 3.4
17 1.7 1.8 2.1 2.6 3.4 3.5 3.7",
header = TRUE)
Here is an option using data.table join
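library(data.table)
# assumes B is a data.frame whose columns are literally named .03 and .99 (e.g. read with check.names = FALSE)
setDT(A)[B[c('week', '.03', '.99')], outlier :=
as.integer(thickness < `.03`| thickness > `.99`), on = .(week)]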
A
# ID week day thickness outlier
#1: 1 15 0 1.3 1
#2: 2 15 0 1.5 1
#3: 3 16 2 2.3 0
#4: 4 16 1 3.5 1
#5: 5 16 1 2.5 0
#6: 6 17 0 3.6 0
#7: 7 17 0 3.4 0
#8: 8 17 3 2.4 0
Please see this example. Look at the y axis: the data there has only two levels, 1 and 2, but the plot draws 6 tick marks on that axis. How can I fix that? The x axis has the same problem.
The data
extra group ID
1 0.7 1 1
2 -1.6 1 2
3 -0.2 1 3
4 -1.2 1 4
5 -0.1 1 5
6 3.4 1 6
7 3.7 1 7
8 0.8 1 8
9 0.0 1 9
10 2.0 1 10
11 1.9 2 1
12 0.8 2 2
13 1.1 2 3
14 0.1 2 4
15 -0.1 2 5
16 4.4 2 6
17 5.5 2 7
18 1.6 2 8
19 4.6 2 9
20 3.4 2 10
The script
require('mise')
require('scatterplot3d')
mise() # clear the workspace
# example data
print(sleep)
# plot it
scatterplot3d(x=sleep$ID,
x.ticklabs=levels(sleep$ID),
y=sleep$group,
y.ticklabs=levels(sleep$group),
z=sleep$extra)
The result
How about this:
scatterplot3d(x=sleep$ID, y=sleep$extra, z=sleep$group, lab.z = c(1, 2))
I have a data frame like this:
A B value
1 1 0.123
2 1 0.213
3 1 0.543
1 2 0.313
2 2 0.123
3 2 0.412
What I want to do is create a function that shifts this data frame by a given value. For example,
if the shift value is 1, the data frame becomes:
A B value
3 2 0.412
1 1 0.123
2 1 0.213
3 1 0.543
1 2 0.313
2 2 0.123
etc...
The function should look like this:
shift<-function(dataframe,shiftvalue)
Is there a simple way to do this in R without writing a lot of loops?
There are many ways to do it; one is to use head and tail:
df <- data.frame(a=1:10, b = 11:20)
shift <- function(d, k) rbind( tail(d,k), head(d,-k), deparse.level = 0 )
> shift(df,3)
a b
8 8 18
9 9 19
10 10 20
1 1 11
2 2 12
3 3 13
4 4 14
5 5 15
6 6 16
7 7 17
I prefer plain old modulo ;-)
shift<-function(df,offset) df[((1:nrow(df))-1-offset)%%nrow(df)+1,]
It is pretty straightforward; the only quirk is R's one-based indexing. It also works for offsets like 0, -7, or 7*nrow(df).
Here is my implementation:
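To check the modulo version against the example in the question, here is a short usage sketch; the drop = FALSE is an addition (not part of the answer) that keeps a one-column data frame from collapsing to a vector:

```r
# Circular shift: row i of the result is row ((i - 1 - offset) mod n) + 1 of df
shift <- function(df, offset) {
  n <- nrow(df)
  df[((seq_len(n) - 1 - offset) %% n) + 1, , drop = FALSE]
}

# The data frame from the question
d <- data.frame(A = c(1, 2, 3, 1, 2, 3), B = c(1, 1, 1, 2, 2, 2),
                value = c(0.123, 0.213, 0.543, 0.313, 0.123, 0.412))

shift(d, 1)    # last row moves to the top, as in the question
shift(d, -1)   # first row moves to the bottom
```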
> shift <- function(df, sv = 1) df[c((sv+1):nrow(df), 1:sv),]
> head(shift(iris, 3))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5.0 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
> tail(shift(iris, 3))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
148 6.5 3.0 5.2 2.0 virginica
149 6.2 3.4 5.4 2.3 virginica
150 5.9 3.0 5.1 1.8 virginica
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
>
Updated (this version moves the last sv rows to the top, as the question asks):
> shift <- function(df, sv = 1) df[c((nrow(df)-sv+1):nrow(df), 1:(nrow(df)-sv)),]
> head(shift(iris, 3))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
148 6.5 3.0 5.2 2.0 virginica
149 6.2 3.4 5.4 2.3 virginica
150 5.9 3.0 5.1 1.8 virginica
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
> tail(shift(iris, 3))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
142 6.9 3.1 5.1 2.3 virginica
143 5.8 2.7 5.1 1.9 virginica
144 6.8 3.2 5.9 2.3 virginica
145 6.7 3.3 5.7 2.5 virginica
146 6.7 3.0 5.2 2.3 virginica
147 6.3 2.5 5.0 1.9 virginica
There's a shift function in taRifx that works on vectors. Applying it to a data frame directly coerces all columns to character if any of them is character, so we'll use a trick from plyr (colwise). I'll likely write a data.frame method for it soon:
dd <- data.frame(b = seq(4),
x = c("A", "D", "A", "C"), y = c('a','b','c','d'),
z = c(1, 1, 1, 2),stringsAsFactors=FALSE)
> dd
b x y z
1 1 A a 1
2 2 D b 1
3 3 A c 1
4 4 C d 2
library(taRifx)
library(plyr)
shift.data.frame <- colwise(shift)
> shift.data.frame(dd)
b x y z
1 2 D b 1
2 3 A c 1
3 4 C d 2
4 1 A a 1
> shift(dd,n=-1)
b x y z
1 4 C d 2
2 1 A a 1
3 2 D b 1
4 3 A c 1
> shift(dd,n=-1,wrap=FALSE)
b x y z
1 1 A a 1
2 2 D b 1
3 3 A c 1
> shift(dd,n=-1,wrap=FALSE,pad=TRUE)
b x y z
1 NA <NA> <NA> NA
2 1 A a 1
3 2 D b 1
4 3 A c 1
The advantage of shift is that it takes several options:
n can be positive or negative to wrap from left/right
wrap can be turned on or off
If wrap is turned off, pad can be turned on to pad with NAs so the vector keeps the same length
dplyr's lag() (and its counterpart lead()) also shifts vectors: https://dplyr.tidyverse.org/reference/lead-lag.html
lag(1:5, n = 1)
#> [1] NA 1 2 3 4
lag(1:5, n = 2)
#> [1] NA NA 1 2 3
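For a non-wrapping shift without loading dplyr, the behaviour of lag() shown above can be approximated in base R. A minimal sketch with a hypothetical helper lag_base() that only handles non-negative n:

```r
# Hypothetical helper: shift x right by n positions, padding the front with NA
lag_base <- function(x, n = 1L) {
  stopifnot(n >= 0)
  n <- min(n, length(x))                 # never pad beyond the length of x
  c(rep(NA, n), head(x, length(x) - n))  # NA padding, then the first length(x) - n values
}

lag_base(1:5, n = 1)
lag_base(1:5, n = 2)
```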