Adding one day at each value change - r

I have a data.table with millions of rows in the following format.
There are multi-year results for each ID; however, I only know the day of the year, which runs from 1 to 365 or 366. I don't know the month or the year, but I do know the date of the first row (e.g. 1995/1/1).
ID DAY ATRR1 ATRR2
1 1 0.2 0.4
2 1 1.2 0.5
3 1 0.8 1.4
1 2 1.3 1.5
2 2 2.3 0.3
3 2 1.7 1.3
1 3 1.5 1.4
2 3 2.1 1.3
3 3 1.2 0.3
...
1 365 1.5 1.4
2 365 2.1 1.3
3 365 1.2 0.3
1 1 1.5 1.4
2 1 2.1 1.3
3 1 1.2 0.3
1 2 1.3 1.5
2 2 2.3 0.3
3 2 1.7 1.3
...
I would like to add a DATE column adding one day at each change in the DAY column, so the result would be:
ID DAY ATRR1 ATRR2 DATE
1 1 0.2 0.4 1995/1/1
2 1 1.2 0.5 1995/1/1
3 1 0.8 1.4 1995/1/1
1 2 1.3 1.5 1995/1/2
2 2 2.3 0.3 1995/1/2
3 2 1.7 1.3 1995/1/2
1 3 1.5 1.4 1995/1/3
2 3 2.1 1.3 1995/1/3
3 3 1.2 0.3 1995/1/3
...
1 365 1.5 1.4 1995/12/31
2 365 2.1 1.3 1995/12/31
3 365 1.2 0.3 1995/12/31
1 1 1.5 1.4 1996/1/1
2 1 2.1 1.3 1996/1/1
3 1 1.2 0.3 1996/1/1
1 2 1.3 1.5 1996/1/2
2 2 2.3 0.3 1996/1/2
3 2 1.7 1.3 1996/1/2
...
How would it be possible to do that?

You can simply do this:
as.Date(x, origin="1994-12-31")
My assumption here is that there are no gaps in your dates and that the rows are arranged as described in the question; otherwise this will produce undesirable results.
Sample data:
df <- data.frame(Day = rep(c(1:365,1:2),each=3))
Create a sequence like this using data.table::rleid() (run-length encoding IDs):
df$seq <- data.table::rleid(df$Day)
df$date <- as.Date(df$seq, origin="1994-12-31") #final answer
tail(df,8)
Let me know if this is what you expected.
Sample Output:
> tail(df,8)
Day seq date
1094 365 365 1995-12-31
1095 365 365 1995-12-31
1096 1 366 1996-01-01
1097 1 366 1996-01-01
1098 1 366 1996-01-01
1099 2 367 1996-01-02
1100 2 367 1996-01-02
1101 2 367 1996-01-02
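For the OP's actual table, the same idea can be written directly in data.table syntax. A minimal sketch, assuming the data.table is called DT (a hypothetical name), has the columns ID, DAY, ATRR1, ATRR2, and is ordered as in the question:
library(data.table)
# rleid(DAY) starts a new id every time DAY changes, so run k maps to the k-th calendar day after the origin
DT[, DATE := as.Date(rleid(DAY), origin = "1994-12-31")]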

Gaps in the dates are not a problem for this solution:
library(data.table)
library(lubridate)
library(magrittr)
read.table(text = "
ID DAY ATRR1 ATRR2
1 1 0.2 0.4
2 1 1.2 0.5
3 1 0.8 1.4
1 2 1.3 1.5
2 2 2.3 0.3
3 2 1.7 1.3
1 3 1.5 1.4
2 3 2.1 1.3
3 3 1.2 0.3
1 365 1.5 1.4
2 365 2.1 1.3
3 365 1.2 0.3
1 1 1.5 1.4
2 1 2.1 1.3
3 1 1.2 0.3
1 2 1.3 1.5
2 2 2.3 0.3
3 2 1.7 1.3", header = T) %>% setDT -> x
# first pass: treat DAY as the day of year within the starting year
x[, date := as.Date(DAY, origin = "1995-01-01") - 1]
# second pass: each time DAY drops, a new year has started, so add one more year
x[, date := {
  t1 = c(0, diff(DAY))        # change in DAY between consecutive rows
  t2 = ifelse(t1 < 0, 1, 0)   # 1 where DAY resets, i.e. at a year boundary
  t3 = cumsum(t2)             # number of year boundaries passed so far
  t4 = date + years(t3)       # shift each row's date by that many years
}]
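As a quick sanity check (a sketch, assuming x was built from the sample above), inspect the rows around the year boundary:
x[DAY %in% c(365, 1), .(ID, DAY, date)]
The day-365 rows should show 1995-12-31, the first block of day-1 rows 1995-01-01, and the day-1 rows after the rollover 1996-01-01.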

Related

Warning glmer: GLMM failed

I have to run a glmer with a continuous response variable, two variables as fixed effects, and one random effect:
glmer.1 <- glmer(Conc ~ Metodo * Kit + (1 | Especie),
                 data = Input_2,
                 family = Gamma(link = "inverse"))
I got this warning message:
Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.0652105 (tol = 0.002, component 1)
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
I tried this with bobyqa as the optimizer and got these warning messages:
glmer.Conc <- glmer(Conc ~ Metodo * Kit + (1 | Especie),
                    data = Input_2,
                    control = glmerControl(optimizer = "bobyqa"),
                    family = Gamma)
Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.0747655 (tol = 0.002, component 1)
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
my summary looks like this:
summary(glmer.1)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: Gamma ( inverse )
Formula: Conc ~ Metodo * Kit + (1 | Especie)
Data: Input_2
AIC BIC logLik deviance df.resid
3202.0 3223.6 -1595.0 3190.0 264
Scaled residuals:
Min 1Q Median 3Q Max
-0.7347 -0.5647 -0.2876 0.2740 12.0396
Random effects:
Groups Name Variance Std.Dev.
Especie (Intercept) 5.264e-05 0.007256
Residual 1.724e+00 1.313030
Number of obs: 270, groups: Especie, 6
Fixed effects:
Estimate Std. Error t value Pr(>|z|)
(Intercept) 0.0089191 0.0028893 3.087 0.00202 **
Metodo -0.0002581 0.0009247 -0.279 0.78013
Kit -0.0009974 0.0007818 -1.276 0.20203
Metodo:Kit 0.0007497 0.0004608 1.627 0.10380
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) Metodo Kit
Metodo -0.531
Kit -0.531 0.857
Metodo:Kit 0.449 -0.907 -0.897
convergence code: 0
Model failed to converge with max|grad| = 0.0652105 (tol = 0.002, component 1)
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
Especie represents the species, Metodo and Kit represent the method and kit used to obtain the concentration, and Conc is the concentration value.
My data frame looks like this:
Input_2 <- readr::read_table2("Especie Metodo Kit Conc A1 A2
1 1 1 70.4 1.5 0.9
1 1 1 57.2 1.7 1.4
1 1 1 22.2 1.6 1.1
1 1 1 60.6 1.5 0.9
1 1 1 50.2 1.6 1
1 1 2 40.5 2.1 1.7
1 1 2 86.6 2.1 2.1
1 1 2 53.9 1.8 1.3
1 1 2 50.3 1.9 0.2
1 1 2 44.1 2 1.5
1 1 3 159.1 2.1 2.3
1 1 3 246 2.1 2.4
1 1 3 81.5 2 1.3
1 1 3 107.9 2.1 1.9
1 1 3 125 2 1.7
1 2 1 72.9 1.6 1.1
1 2 1 38.1 1.6 1.1
1 2 1 8.3 1.3 1.8
1 2 1 31 1.6 0.8
1 2 1 36.3 1.3 1.2
1 2 2 16.8 1.9 1.8
1 2 2 4.8 2.2 1
1 2 2 20.4 1.4 1
1 2 2 31.8 1.9 0.4
1 2 2 5 1.6 1
1 2 3 17.9 2 2.2
1 2 3 14.9 2.6 2.5
1 2 3 5.3 1.6 0.5
1 2 3 17.1 2.2 1.3
1 2 3 17.1 1.9 1
1 3 1 67.3 1.6 1
1 3 1 11.3 1.4 0.7
1 3 1 4.2 1 1
1 3 1 9.7 1.1 0.8
1 3 1 20 1.4 0.8
1 3 2 4.3 1.4 1.3
1 3 2 4.1 1.6 2
1 3 2 11.4 1.5 0.3
1 3 2 12.5 1.4 0.2
1 3 2 11 1.5 0.9
1 3 3 59.7 1.9 2.4
1 3 3 81.4 2.3 2.4
1 3 3 32.2 1.9 1.6
1 3 3 24.9 1.8 1.2
1 3 3 50.9 2.1 1.2
2 1 1 185 1.6 1
2 1 1 146.2 1.7 1.2
2 1 1 239 1.8 1.4
2 1 1 141.9 2 2
2 1 1 303.7 1.9 1.8
2 1 2 53.6 1.9 0.6
2 1 2 424.8 2.1 1.5
2 1 2 30.3 1.9 0.4
2 1 2 291 2.1 1.5
2 1 2 471.6 2.1 2.1
2 1 3 132.8 1.9 1.4
2 1 3 124.5 2.1 1.7
2 1 3 484.4 2.1 2.1
2 1 3 251.3 2.2 2.2
2 1 3 598.6 2.1 2.2
2 2 1 15.6 1.4 0.9
2 2 1 107.6 1.7 1.1
2 2 1 172.2 1.8 1.6
2 2 1 210.2 1.9 1.9
2 2 1 699.7 2 2.2
2 2 2 20 1.7 0.2
2 2 2 86.2 2 0.9
2 2 2 36.8 2.1 0.3
2 2 2 434.4 2.1 2.3
2 2 2 209.5 2.2 1.5
2 2 3 37 2.1 1.6
2 2 3 94.8 2.1 1.8
2 2 3 220.2 2.2 2.1
2 2 3 17.7 2.3 0.8
2 2 3 145.3 2.1 1.9
2 3 1 33.4 1.5 0.8
2 3 1 87 1.7 1.2
2 3 1 98.9 1.8 1.8
2 3 1 176.6 2 2.2
2 3 1 180.7 1.9 1.9
2 3 2 20.8 1.8 0.3
2 3 2 226.2 2.1 1.1
2 3 2 227.8 2.1 1.7
2 3 2 40.8 2.1 0.4
2 3 2 83.4 2.1 0.9
2 3 3 35.7 2.1 1.2
2 3 3 191.9 2.1 1.8
2 3 3 203.2 2.2 2.2
2 3 3 143.3 2.2 2
2 3 3 70.7 2.1 1.7
3 1 1 190.7 2.1 2
3 1 1 456.5 2 2.2
3 1 1 520.9 2 2.2
3 1 1 535.5 2 2.1
3 1 1 894.3 2 2.3
3 1 2 265.2 2.1 1.7
3 1 2 392.2 2.1 2.3
3 1 2 241.3 2.1 1.4
3 1 2 156.9 2.1 1.7
3 1 2 277.2 2.1 1.7
3 1 3 431.6 2.1 2.1
3 1 3 669.9 2.2 2.3
3 1 3 342.5 2.1 2.1
3 1 3 572.3 2.1 2.2
3 1 3 435.2 2.1 2.1
3 2 1 498.6 2 2.1
3 2 1 1137.2 2.1 2.3
3 2 1 764.5 2.1 2.3
3 2 1 746.4 2.1 2.2
3 2 1 737.2 2.1 2.2
3 2 2 98.1 2 1.4
3 2 2 237.9 2.1 2.2
3 2 2 144 2 1.2
3 2 2 240.3 2.1 2.1
3 2 2 253.4 2.1 1.8
3 2 3 247.3 2.2 1.7
3 2 3 155.3 2.1 1.8
3 2 3 179.3 2.2 1.6
3 2 3 225.7 2.1 2.2
3 2 3 274.4 2.1 2.1
3 3 1 944.9 2.1 2.3
3 3 1 978.4 2.1 2.3
3 3 1 785.9 2 2.1
3 3 1 510.7 2 2.1
3 3 1 164.5 2 2.2
3 3 2 48.1 2 0.3
3 3 2 427.7 2.1 2.2
3 3 2 75 2 0.9
3 3 2 153 2.1 1.3
3 3 2 293.8 2.1 2.4
3 3 3 335.8 2.1 1.9
3 3 3 395.5 2.1 2.1
3 3 3 181.1 2.1 1.9
3 3 3 468.4 2.1 2.2
3 3 3 348 2.1 2.1
4 1 1 335.8 2 1.9
4 1 1 188.1 1.9 1.5
4 1 1 219 2 2
4 1 1 104.1 1.9 1.3
4 1 1 273.7 2 2
4 1 2 573.2 2.2 1.8
4 1 2 115.3 2.1 2.1
4 1 2 56.7 2 0.4
4 1 2 316.5 2.1 1.6
4 1 2 85.2 2.1 1.9
4 1 3 585 2.2 2.3
4 1 3 377.5 2.1 2
4 1 3 47.1 2.1 1.6
4 1 3 73.1 2 1.4
4 1 3 91.9 2 1.2
4 2 1 374.8 2 1.7
4 2 1 85.8 1.9 1.9
4 2 1 53.1 1.8 1.8
4 2 1 34.3 1.8 1.1
4 2 1 27 1.9 1.1
4 2 2 182.9 2.1 1.3
4 2 2 25.3 1.8 0.4
4 2 2 4.4 2.4 0.1
4 2 2 117.8 2.2 2.1
4 2 2 6.6 1.9 0.8
4 2 3 158.9 2.2 2.2
4 2 3 44.5 2 1
4 2 3 12.6 1.7 0.5
4 2 3 22.5 2.3 1
4 2 3 91 2 1.3
4 3 1 212.1 1.7 1.2
4 3 1 101.7 1.9 1.8
4 3 1 63.4 1.4 0.9
4 3 1 67.9 1.8 1.3
4 3 1 40.4 1.6 0.9
4 3 2 73 2.1 0.9
4 3 2 33.1 2.1 0.6
4 3 2 7.8 2 0.3
4 3 2 10.9 1.9 1.4
4 3 2 11.6 2 0.2
4 3 3 382.8 2.2 2.3
4 3 3 7 2.4 0.4
4 3 3 13.1 2 0.7
4 3 3 24.7 2.2 1.1
4 3 3 42.9 2 0.9
5 1 1 340 1.4 0.9
5 1 1 48.6 1.5 0.6
5 1 1 53.7 1.4 0.6
5 1 1 33.3 1.3 0.5
5 1 1 86 1.3 0.5
5 1 2 11.4 1.4 0.5
5 1 2 27 1.6 1
5 1 2 33.7 1.7 0.9
5 1 2 14.2 1.8 1.3
5 1 2 16.2 1.9 0.8
5 1 3 606 1.6 0.7
5 1 3 265.8 1.7 1
5 1 3 73.5 1.5 2.6
5 1 3 189 1.6 0.7
5 1 3 223.1 1.6 0.7
5 2 1 101.5 1.5 0.6
5 2 1 7.2 1.2 0.5
5 2 1 49.7 1.4 0.7
5 2 1 61.9 1.5 0.6
5 2 1 6.9 1.1 0.6
5 2 2 20.6 1.4 0.7
5 2 2 13.7 1.7 1
5 2 2 48.5 1.8 1
5 2 2 6.8 1.6 0.7
5 2 2 5.7 1.3 0.7
5 2 3 239.5 1.7 0.8
5 2 3 152.5 1.7 0.8
5 2 3 2065.4 1.5 1
5 2 3 112.7 1.6 0.6
5 2 3 104.5 1.6 0.7
5 3 1 40.7 1.3 0.5
5 3 1 82 1.4 0.9
5 3 1 20.6 1.3 1
5 3 1 21.4 1.4 0.6
5 3 1 29.6 1.4 0.6
5 3 2 4.3 1.1 0.5
5 3 2 38.5 1.3 1.3
5 3 2 22.9 1.7 1.1
5 3 2 10.3 1.7 1
5 3 2 4.9 1.3 0.6
5 3 3 216.8 1.7 0.8
5 3 3 220.7 1.7 0.9
5 3 3 51.8 1.3 1.3
5 3 3 161 1.6 0.7
5 3 3 144.4 1.6 0.6
6 1 1 79.9 2.1 1.8
6 1 1 295.3 2.1 2.1
6 1 1 136.4 2.1 2.1
6 1 1 177.3 2 2.1
6 1 1 116.4 1.7 1.1
6 1 2 93.1 2.1 2.2
6 1 2 385.9 2.1 2.3
6 1 2 318.6 2.1 1.8
6 1 2 20.3 2.1 1.4
6 1 2 131.5 2.1 1.4
6 1 3 53.9 2.2 2.7
6 1 3 156.9 2.2 1.8
6 1 3 344.7 2.1 2
6 1 3 15.5 2.6 0.6
6 1 3 33.7 2.1 1.3
6 2 1 137.8 1.8 1.2
6 2 1 104 2.1 2
6 2 1 151.7 2.1 2
6 2 1 15.4 2.1 1.6
6 2 1 31.5 1.9 2.1
6 2 2 8 2.1 1.5
6 2 2 23.9 2 0.9
6 2 2 14.8 2.2 0.4
6 2 2 8.6 1.8 0.4
6 2 2 18.2 2.2 0.3
6 2 3 35 2.2 2.7
6 2 3 61 2.2 1.7
6 2 3 136.9 2.1 2.3
6 2 3 3.1 8.5 0.2
6 2 3 8.7 1.6 1.1
6 3 1 155.7 2 2.2
6 3 1 48.5 1.7 0.9
6 3 1 77.7 2 1.5
6 3 1 45.6 1.6 1.2
6 3 1 142 1.6 1.1
6 3 2 5.6 1.4 1.3
6 3 2 7.4 2.2 0.1
6 3 2 66.4 1.9 1.4
6 3 2 6.8 2.3 0.7
6 3 2 10.3 1.9 2.2
6 3 3 42.9 2.1 2.5
6 3 3 48.3 2.3 1.2
6 3 3 62.6 2.3 1.4
6 3 3 6.9 2.2 0.3
6 3 3 3.9 1.8 0.5")
Did I forget to include anything in the GLMM? Thanks!
Edit 1:
I have converted the Metodo and Kit variables to factors:
Input_2$Metodo<-as.factor(Input_2$Metodo)
Input_2$Kit<-as.factor(Input_2$Kit)
This produced a different warning:
Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.00649539 (tol = 0.002, component 1)
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
How can I solve these problems?
Based on a suggestion, I tried to rescale my values:
numcols <- grep("^c\\.",names(Input_2))
dfs <- Input_2
dfs[,numcols] <- scale(dfs[,numcols])
m1_sc <- update(glmer.1,data=dfs)
This produced the following warning messages:
Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.00649539 (tol = 0.002, component 1)
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
Edit 2: based on the suggestions, I replaced the interaction (*) with + and also converted the species column into a factor:
Input_2$Especie<-as.factor(Input_2$Especie)
glmer.1 <- glmer(Conc ~ Metodo + Kit + (1 | Especie),
                 data = Input_2,
                 family = Gamma(link = "inverse"))
and I get the following warning:
Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.00513611 (tol = 0.002, component 1)
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
A second suggestion was to switch from glmer to glm to see whether the model was saturated, using a simpler strategy (since glm only takes fixed effects, I changed the species variable to a fixed factor):
glmer.1 <- glm(Conc ~ Metodo + Kit + Especie,
               data = Input_2,
               family = Gamma(link = "inverse"))
And when I execute it, I have no problem.
How can I solve these problems with glmer? What is my mistake?
Thank you in advance

Create lower triangle genetic distance matrix

I have a distance matrix like this:
1 2 3 4 5
A 0.1 0.2 0.3 0.5 0.6
B 0.7 0.8 0.9 1 1.1
C 1.2 1.3 1.4 1.5 1.6
D 1.7 1.8 1.9 2 2.1
E 2.2 2.3 2.4 2.5 2.6
and now I want to create a lower triangular matrix like this:
1 2 3 4 5 A B C D E
1 0
2 0.1 0
3 0.2 0.1 0
4 0.4 0.3 0.2 0
5 0.5 0.4 0.3 0.1 0
A 0.1 0.2 0.3 0.5 0.6 0
B 0.7 0.8 0.9 1 1.1 0.6 0
C 1.2 1.3 1.4 1.5 1.6 1.1 0.5 0
D 1.7 1.8 1.9 2 2.1 1.6 1 0.5 0
E 2.2 2.3 2.4 2.5 2.6 2.1 1.5 1 0.5 0
To get the genetic distance between 1 and 2, I subtracted the distance for 1 from the distance for 2 in the first table (0.2 - 0.1 = 0.1), and I did the same for the rest of the entries; I don't know whether doing it like this is correct. After doing that calculation I made the lower triangular matrix. I tried this in R:
x <- read.csv("AD2.csv", head = FALSE, sep = ",")
b <- lower.tri(x, diag = FALSE)
but I am getting only TRUE and FALSE as output, not a distance matrix.
Can anyone help solve this problem? Here is a link to my example data.
You can make use of dist to calculate the sub-matrices, use cbind to create the top and bottom halves, rbind the two halves, and then set the upper triangle to NA to create the desired output.
mat <- rbind(
  cbind(as.matrix(dist(tbl[1, ])), tbl),
  cbind(tbl, as.matrix(dist(tbl[, 1])))
)
mat[upper.tri(mat, diag = FALSE)] <- NA
mat
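For reference, dist() on a plain numeric vector returns all pairwise absolute differences, which is exactly the subtraction described in the question. A small illustration using the first row of the example data:
as.matrix(dist(c(0.1, 0.2, 0.3, 0.5, 0.6)))
# e.g. the entry for points 1 and 2 is |0.2 - 0.1| = 0.1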
Hope it helps.
data:
tbl <- as.matrix(read.table(text="1 2 3 4 5
A 0.1 0.2 0.3 0.5 0.6
B 0.7 0.8 0.9 1 1.1
C 1.2 1.3 1.4 1.5 1.6
D 1.7 1.8 1.9 2 2.1
E 2.2 2.3 2.4 2.5 2.6", header=TRUE, check.names=FALSE, row.names=1))

Filling a column in a data frame based on certain conditions

I am using R and RStudio.
I have the following data frame:
ID TIME DV GpH SIpH GTT SITT
1 0 0 1.4 7.1 1.8 3.5
1 1 0.5 1.4 7.1 1.8 3.5
1 2 2 1.4 7.1 1.8 3.5
1 3 5 1.4 7.1 1.8 3.5
2 0 0 1.5 7.5 0.7 2.5
2 1 0.5 1.5 7.5 0.7 2.5
2 2 2 1.5 7.5 0.7 2.5
2 3 5 1.5 7.5 0.7 2.5
I want to add a pH column to the data frame such that:
1) If TIME is less than GTT for a subject ID, then pH is GpH for that subject.
2) If TIME is greater than GTT and less than the sum GTT + SITT, then pH = SIpH for that subject.
3) If TIME is greater than the sum GTT + SITT for a subject, then pH = 6.
How can I achieve this in R in a fast way?
You can try this, assuming that your data frame is stored as df1:
df1$pH <- with(df1, (TIME < GTT) * GpH +
                    (TIME > GTT & TIME < (GTT + SITT)) * SIpH +
                    (TIME > (GTT + SITT)) * 6)
#> df1
# ID TIME DV GpH SIpH GTT SITT pH
#1 1 0 0.0 1.4 7.1 1.8 3.5 1.4
#2 1 1 0.5 1.4 7.1 1.8 3.5 1.4
#3 1 2 2.0 1.4 7.1 1.8 3.5 7.1
#4 1 3 5.0 1.4 7.1 1.8 3.5 7.1
#5 2 0 0.0 1.5 7.5 0.7 2.5 1.5
#6 2 1 0.5 1.5 7.5 0.7 2.5 7.5
#7 2 2 2.0 1.5 7.5 0.7 2.5 7.5
#8 2 3 5.0 1.5 7.5 0.7 2.5 7.5
You can try nesting ifelse calls:
"Creating" the data:
data <- read.csv(head=TRUE, text =
"ID,TIME,DV,GpH,SIpH,GTT,SITT
1,0,0,1.4,7.1,1.8,3.5
1,1,0.5,1.4,7.1,1.8,3.5
1,2,2,1.4,7.1,1.8,3.5
1,3,5,1.4,7.1,1.8,3.5
2,0,0,1.5,7.5,0.7,2.5
2,1,0.5,1.5,7.5,0.7,2.5
2,2,2,1.5,7.5,0.7,2.5
2,3,5,1.5,7.5,0.7,2.5")
Adding ph
data$ph <- ifelse(
data$TIME < data$GTT,
data$GpH,
ifelse (
data$TIME > data$GTT & data$TIME < data$GTT + data$SITT,
data$SIpH,
6
)
)
Printing the result
data
ID TIME DV GpH SIpH GTT SITT ph
1 1 0 0.0 1.4 7.1 1.8 3.5 1.4
2 1 1 0.5 1.4 7.1 1.8 3.5 1.4
3 1 2 2.0 1.4 7.1 1.8 3.5 7.1
4 1 3 5.0 1.4 7.1 1.8 3.5 7.1
5 2 0 0.0 1.5 7.5 0.7 2.5 1.5
6 2 1 0.5 1.5 7.5 0.7 2.5 7.5
7 2 2 2.0 1.5 7.5 0.7 2.5 7.5
8 2 3 5.0 1.5 7.5 0.7 2.5 7.5

expand data.frame and insert average in another column

If the data set is
date CPI
2000/ 1 1.2
2000/ 2 3.2
2000/ 3 1.6
then I want to get a weekly CPI.
This is my expected result:
date CPI Average
2000/ 1 1.2 0.3
2000/ 1 1.2 0.3
2000/ 1 1.2 0.3
2000/ 1 1.2 0.3
2000/ 2 3.2 0.8
2000/ 2 3.2 0.8
2000/ 2 3.2 0.8
2000/ 2 3.2 0.8
2000/ 3 1.6 0.4
2000/ 3 1.6 0.4
2000/ 3 1.6 0.4
2000/ 3 1.6 0.4
How can I do this in R?
Please help me; I have almost 200 monthly CPI values.
Maybe this helps:
n <- 4
mydf1 <- transform(mydf, Average=CPI/n) #created a new column `Average` by dividing CPI by n
mydf2 <-mydf1[rep(1:nrow(mydf1),each=n),] #replicate the row numbers of the dataset `mydf1` by `n` and used the numeric index to expand the rows of `mydf1`
row.names(mydf2) <- 1:nrow(mydf2) #change the rownames
mydf2
# date CPI Average
#1 2000/ 1 1.2 0.3
#2 2000/ 1 1.2 0.3
#3 2000/ 1 1.2 0.3
#4 2000/ 1 1.2 0.3
#5 2000/ 2 3.2 0.8
#6 2000/ 2 3.2 0.8
#7 2000/ 2 3.2 0.8
#8 2000/ 2 3.2 0.8
#9 2000/ 3 1.6 0.4
#10 2000/ 3 1.6 0.4
#11 2000/ 3 1.6 0.4
#12 2000/ 3 1.6 0.4
Or using data.table:
The idea is similar to the above. First convert the data.frame to a data.table using setDT, create a new column with Average := CPI/n, then use rep to replicate the row numbers of the dataset n times each and use that numeric index to expand the rows of mydf.
library(data.table)
setDT(mydf)[mydf[, Average:=CPI/n][,rep(seq_len(.N), each=n)]]
# date CPI Average
# 1: 2000/ 1 1.2 0.3
# 2: 2000/ 1 1.2 0.3
# 3: 2000/ 1 1.2 0.3
# 4: 2000/ 1 1.2 0.3
# 5: 2000/ 2 3.2 0.8
# 6: 2000/ 2 3.2 0.8
# 7: 2000/ 2 3.2 0.8
# 8: 2000/ 2 3.2 0.8
# 9: 2000/ 3 1.6 0.4
#10: 2000/ 3 1.6 0.4
#11: 2000/ 3 1.6 0.4
#12: 2000/ 3 1.6 0.4
If you need to separate the date into year and quarter as shown in #KFB's post, you could use cSplit along with data.table. In the code below, setnames is used to rename the columns after the split; the rest of the procedure is the same as above.
cSplit is available at https://gist.github.com/mrdwab/11380733
library(devtools)
source_gist(11380733)
DT1 <- setnames(cSplit(mydf, "date", '[/]', fixed=FALSE,direction='wide'),
c("CPI", "year", "Quarter"))
DT1[DT1[, Average:= CPI/n][,rep(seq_len(.N), each=n)]]
# CPI year Quarter Average
#1: 1.2 2000 1 0.3
#2: 1.2 2000 1 0.3
#3: 1.2 2000 1 0.3
#4: 1.2 2000 1 0.3
#5: 3.2 2000 2 0.8
#6: 3.2 2000 2 0.8
#7: 3.2 2000 2 0.8
#8: 3.2 2000 2 0.8
#9: 1.6 2000 3 0.4
#10: 1.6 2000 3 0.4
#11: 1.6 2000 3 0.4
#12: 1.6 2000 3 0.4
data
mydf <- structure(list(date = c("2000/ 1", "2000/ 2", "2000/ 3"), CPI = c(1.2,
3.2, 1.6)), .Names = c("date", "CPI"), class = "data.frame", row.names = c("1",
"2", "3"))
Another data.table solution using #akrun's mydf:
mydt = data.table(mydf)
mydt2 = mydt[,data.table(apply(.SD,2,function(x) rep(x,4))),]
mydt2$CPI = as.numeric(mydt2$CPI)
mydt2[,Average:=CPI/4,]
mydt2
date CPI Average
1: 2000/ 1 1.2 0.3
2: 2000/ 2 3.2 0.8
3: 2000/ 3 1.6 0.4
4: 2000/ 1 1.2 0.3
5: 2000/ 2 3.2 0.8
6: 2000/ 3 1.6 0.4
7: 2000/ 1 1.2 0.3
8: 2000/ 2 3.2 0.8
9: 2000/ 3 1.6 0.4
10: 2000/ 1 1.2 0.3
11: 2000/ 2 3.2 0.8
12: 2000/ 3 1.6 0.4

Shifting (+ wrap around) a data frame in R

I have a data frame like this:
A B value
1 1 0.123
2 1 0.213
3 1 0.543
1 2 0.313
2 2 0.123
3 2 0.412
What I want to do is create a function that shifts this data frame by a value. For example, if the shift value is 1, the data frame will become:
A B value
3 2 0.412
1 1 0.123
2 1 0.213
3 1 0.543
1 2 0.313
2 2 0.123
etc...
The function signature should be like this:
shift <- function(dataframe, shiftvalue)
Is there any simple way to do this in R without writing a lot of loops?
You can do it many ways, but one way is to use head and tail:
df <- data.frame(a=1:10, b = 11:20)
shift <- function(d, k) rbind( tail(d,k), head(d,-k), deparse.level = 0 )
> shift(df,3)
a b
4 4 14
5 5 15
6 6 16
7 7 17
8 8 18
9 9 19
10 10 20
1 1 11
2 2 12
3 3 13
I prefer plain old modulo ;-)
shift <- function(df, offset) df[((1:nrow(df)) - 1 - offset) %% nrow(df) + 1, ]
It is pretty straightforward; the only quirk is R's one-based indexing. It also works for offsets like 0, -7, or 7*nrow(df)...
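A quick check of the modulo version on a small made-up data frame (hypothetical data, just for illustration):
df <- data.frame(a = 1:6, b = letters[1:6])
shift(df, 1)   # the last row wraps around to the top
shift(df, -2)  # negative offsets wrap in the other direction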
Here is my implementation:
> shift <- function(df, sv = 1) df[c((sv+1):nrow(df), 1:sv),]
> head(shift(iris, 3))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5.0 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
> tail(shift(iris, 3))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
148 6.5 3.0 5.2 2.0 virginica
149 6.2 3.4 5.4 2.3 virginica
150 5.9 3.0 5.1 1.8 virginica
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
>
Updated:
> shift <- function(df, sv = 1) df[c((nrow(df)-sv+1):nrow(df), 1:(nrow(df)-sv)),]
> head(shift(iris, 3))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
148 6.5 3.0 5.2 2.0 virginica
149 6.2 3.4 5.4 2.3 virginica
150 5.9 3.0 5.1 1.8 virginica
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
> tail(shift(iris, 3))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
142 6.9 3.1 5.1 2.3 virginica
143 5.8 2.7 5.1 1.9 virginica
144 6.8 3.2 5.9 2.3 virginica
145 6.7 3.3 5.7 2.5 virginica
146 6.7 3.0 5.2 2.3 virginica
147 6.3 2.5 5.0 1.9 virginica
There's a shift function in taRifx that works on vectors. Applying it results in coercion of all columns to character if any are character, so we'll use a trick from plyr. I'll likely write a data.frame method for it soon:
dd <- data.frame(b = seq(4),
x = c("A", "D", "A", "C"), y = c('a','b','c','d'),
z = c(1, 1, 1, 2),stringsAsFactors=FALSE)
> dd
b x y z
1 1 A a 1
2 2 D b 1
3 3 A c 1
4 4 C d 2
library(taRifx)
library(plyr)
shift.data.frame <- colwise(shift)
> shift.data.frame(dd)
b x y z
1 2 D b 1
2 3 A c 1
3 4 C d 2
4 1 A a 1
> shift(dd,n=-1)
b x y z
1 4 C d 2
2 1 A a 1
3 2 D b 1
4 3 A c 1
> shift(dd,n=-1,wrap=FALSE)
b x y z
1 1 A a 1
2 2 D b 1
3 3 A c 1
> shift(dd,n=-1,wrap=FALSE,pad=TRUE)
b x y z
1 NA <NA> <NA> NA
2 1 A a 1
3 2 D b 1
4 3 A c 1
The advantage of shift is that it takes a bunch of options:
n can be positive or negative to wrap from left/right
wrap can be turned on or off
If wrap is turned off, pad can be turned on to pad with NAs so the vector remains the same length
dplyr's lag and lead can also shift a vector, although without wrap-around: https://dplyr.tidyverse.org/reference/lead-lag.html
lag(1:5, n = 1)
#> [1] NA 1 2 3 4
lag(1:5, n = 2)
#> [1] NA NA 1 2 3
