What does dqAIC mean in r? - r

i am using the IC.test formula to compare models but don't know how to interpret it
dqAIC df
DM7 0.0 20
DM1 65.7 17
DM2 65.7 17
DM3 65.7 17
DM4 67.8 7
DM5 88.2 11
DM6 NA 23
This is the formula i used
testIC.D <- ICtab(DM1, DM2, DM3, DM4, DM5, DM6, DM7, type="qAIC")
testIC.D

Related

reshape untidy data frame, spreading rows to columns names [duplicate]

This question already has answers here:
Transpose a data frame
(6 answers)
Closed 2 years ago.
Have searched the threads but can't understand a solution that will solve the problem with the data frame that I have.
My current data frame (df):
# A tibble: 8 x 29
`Athlete` Monday...2 Tuesday...3 Wednesday...4 Thursday...5 Friday...6 Saturday...7 Sunday...8
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Date 29/06/2020 30/06/2020 43837.0 43868.0 43897.0 43928.0 43958.0
2 HR 47.0 54.0 51.0 56.0 59.0 NA NA
3 HRV 171.0 91.0 127.0 99.0 77.0 NA NA
4 Sleep Duration 9.11 7.12 8.59 7.15 8.32 NA NA
5 Sleep Efficien~ 92.0 94.0 89.0 90.0 90.0 NA NA
6 Recovery Score 98.0 66.0 96.0 72.0 46.0 NA NA
7 Life Stress NO NO NO NO NO NA NA
8 Sick NO NO NO NO NO NA NA
Have tried to use spread and pivot wider but I know there would require additional functions in order to get the desired output which beyond my level on understanding in R.
Do I need to u
Desired output:
Date HR HRV Sleep Duration Sleep Efficiency Recovery Score Life Stress Sick
29/06/2020 47.0 171.0 9.11
30/06/2020 54.0 91.0 7.12
43837.0 51.0 127.0 8.59
43868.0 56.0 99.0 7.15
43897.0 59.0 77.0 8.32
43928.0 NA NA NA
43958.0 NA NA NA
etc.
Thank you
In Base R you will do:
type.convert(setNames(data.frame(t(df[-1]), row.names = NULL), df[,1]))
Date HR HRV Sleep Duration Sleep Efficien~ Recovery Score Life Stress Sick
1 29/06/2020 47 171 9.11 92 98 NO NO
2 30/06/2020 54 91 7.12 94 66 NO NO
3 43837.0 51 127 8.59 89 96 NO NO
4 43868.0 56 99 7.15 90 72 NO NO
5 43897.0 59 77 8.32 90 46 NO NO
6 43928 NA NA NA NA NA <NA> <NA>
7 43958 NA NA NA NA NA <NA> <NA>

Time series forecasting by lm() using lapply

I was trying to forecast a time series problem using lm() and my data looks like below
Customer_key date sales
A35 2018-05-13 31
A35 2018-05-20 20
A35 2018-05-27 43
A35 2018-06-03 31
BH22 2018-05-13 60
BH22 2018-05-20 67
BH22 2018-05-27 78
BH22 2018-06-03 55
Converted my df to a list format by
df <- dcast(df, date ~ customer_key,value.var = c("sales"))
df <- subset(df, select = -c(dt))
demandWithKey <- as.list(df)
Trying to write a function such that applying this function across all customers
my_fun <- function(x) {
fit <- lm(ds_load ~ date, data=df) ## After changing to list ds_load and date column names
## are no longer available for formula
fit_b <- forecast(fit$fitted.values, h=20) ## forecast using lm()
return(data.frame(c(fit$fitted.values, fit_b[["mean"]])))
}
fcast <- lapply(df, my_fun)
I know the above function doesn't work, but basically I'm looking for getting both the fitted values and forecasted values for a grouped data.
But I've tried all other methods using tslm() (converting into time series data) and so on but no luck I can get the lm() work somehow on just one customer though. Also many questions/posts were on just fitting the model but I would like to forecast too at same time.
lm() is for a regression model
but here you have a time serie so for forecasting the serie you have to use one of the time serie model (ARMA ARCH GARCH...)
so you can use the function in r : auto.arima() in "forecast" package
I don't know what you're up to exactly, but you could make this less complicated.
Using by avoids the need to reshape your data, it splits your data e.g. by customer ID as in your case and applies a function on the subsets (i.e. it's a combination of split and lapply; see ?by).
Since you want to compare fitted and forecasted values somehow in your result, you probably need predict rather than $fitted.values, otherwise the values won't be of same length. Because your independent variable is a date in weekly intervals, you may use seq.Date and take the first date as a starting value; the sequence has length actual values (nrow each customer) plus h= argument of the forecast.
For demonstration purposes I add the fitted values as first column in the following.
res <- by(dat, dat$cus_key, function(x) {
H <- 20 ## globally define 'h'
fit <- lm(sales ~ date, x)
fitted <- fit$fitted.values
pred <- predict(fit, newdata=data.frame(
date=seq(x$date[1], length.out= nrow(x) + H, by="week")))
fcst <- c(fitted, forecast(fitted, h=H)$mean)
fit.na <- `length<-`(unname(fitted), length(pred)) ## for demonstration
return(cbind(fit.na, pred, fcst))
})
Result
res
# dat$cus_key: A28
# fit.na pred fcst
# 1 41.4 41.4 41.4
# 2 47.4 47.4 47.4
# 3 53.4 53.4 53.4
# 4 59.4 59.4 59.4
# 5 65.4 65.4 65.4
# 6 NA 71.4 71.4
# 7 NA 77.4 77.4
# 8 NA 83.4 83.4
# 9 NA 89.4 89.4
# 10 NA 95.4 95.4
# 11 NA 101.4 101.4
# 12 NA 107.4 107.4
# 13 NA 113.4 113.4
# 14 NA 119.4 119.4
# 15 NA 125.4 125.4
# 16 NA 131.4 131.4
# 17 NA 137.4 137.4
# 18 NA 143.4 143.4
# 19 NA 149.4 149.4
# 20 NA 155.4 155.4
# 21 NA 161.4 161.4
# 22 NA 167.4 167.4
# 23 NA 173.4 173.4
# 24 NA 179.4 179.4
# 25 NA 185.4 185.4
# ----------------------------------------------------------------
# dat$cus_key: B16
# fit.na pred fcst
# 1 49.0 49.0 49.0
# 2 47.7 47.7 47.7
# 3 46.4 46.4 46.4
# 4 45.1 45.1 45.1
# 5 43.8 43.8 43.8
# 6 NA 42.5 42.5
# 7 NA 41.2 41.2
# 8 NA 39.9 39.9
# 9 NA 38.6 38.6
# 10 NA 37.3 37.3
# 11 NA 36.0 36.0
# 12 NA 34.7 34.7
# 13 NA 33.4 33.4
# 14 NA 32.1 32.1
# 15 NA 30.8 30.8
# 16 NA 29.5 29.5
# 17 NA 28.2 28.2
# 18 NA 26.9 26.9
# 19 NA 25.6 25.6
# 20 NA 24.3 24.3
# 21 NA 23.0 23.0
# 22 NA 21.7 21.7
# 23 NA 20.4 20.4
# 24 NA 19.1 19.1
# 25 NA 17.8 17.8
# ----------------------------------------------------------------
# dat$cus_key: C12
# fit.na pred fcst
# 1 56.4 56.4 56.4
# 2 53.2 53.2 53.2
# 3 50.0 50.0 50.0
# 4 46.8 46.8 46.8
# 5 43.6 43.6 43.6
# 6 NA 40.4 40.4
# 7 NA 37.2 37.2
# 8 NA 34.0 34.0
# 9 NA 30.8 30.8
# 10 NA 27.6 27.6
# 11 NA 24.4 24.4
# 12 NA 21.2 21.2
# 13 NA 18.0 18.0
# 14 NA 14.8 14.8
# 15 NA 11.6 11.6
# 16 NA 8.4 8.4
# 17 NA 5.2 5.2
# 18 NA 2.0 2.0
# 19 NA -1.2 -1.2
# 20 NA -4.4 -4.4
# 21 NA -7.6 -7.6
# 22 NA -10.8 -10.8
# 23 NA -14.0 -14.0
# 24 NA -17.2 -17.2
# 25 NA -20.4 -20.4
As you can see, prediction and forecast yield the same values, since both methods are based on the same single explanatory variable date in this case.
Toy data:
set.seed(42)
dat <- transform(expand.grid(cus_key=paste0(LETTERS[1:3], sample(12:43, 3)),
date=seq.Date(as.Date("2018-05-13"), length.out=5, by="week")),
sales=sample(20:80, 15, replace=TRUE))

Slide along data frame rows and compare rows with next rows

I guess something similar should have been asked before, however I could only find an answer for python and SQL. So please notify me in the comments when this was also asked for R!
Data
Let's say we have a dataframe like this:
set.seed(1); df <- data.frame( position = 1:20,value = sample(seq(1,100), 20))
# In cause you do not get the same dataframe see the comment by #Ian Campbell - thanks!
position value
1 1 27
2 2 37
3 3 57
4 4 89
5 5 20
6 6 86
7 7 97
8 8 62
9 9 58
10 10 6
11 11 19
12 12 16
13 13 61
14 14 34
15 15 67
16 16 43
17 17 88
18 18 83
19 19 32
20 20 63
Goal
I'm interested in calculating the average value for n positions and subtract this from the average value of the next n positions, let's say n=5 for now.
What I tried
I now used this method, however when I apply this to a bigger dataframe it takes a huge amount of time, and hence wonder if there is a faster method for this.
calc <- function( pos ) {
this.five <- df %>% slice(pos:(pos+4))
next.five <- df %>% slice((pos+5):(pos+9))
differ = mean(this.five$value)- mean(next.five$value)
data.frame(dif= differ)
}
df %>%
group_by(position) %>%
do(calc(.$position))
That produces the following table:
position dif
<int> <dbl>
1 1 -15.8
2 2 9.40
3 3 37.6
4 4 38.8
5 5 37.4
6 6 22.4
7 7 4.20
8 8 -26.4
9 9 -31
10 10 -35.4
11 11 -22.4
12 12 -22.3
13 13 -0.733
14 14 15.5
15 15 -0.400
16 16 NaN
17 17 NaN
18 18 NaN
19 19 NaN
20 20 NaN
I suspect a data.table approach may be faster.
library(data.table)
setDT(df)
df[,c("roll.position","rollmean") := lapply(.SD,frollmean,n=5,fill=NA, align = "left")]
df[, result := rollmean[.I] - rollmean[.I + 5]]
df[,.(position,value,rollmean,result)]
# position value rollmean result
# 1: 1 27 46.0 -15.8
# 2: 2 37 57.8 9.4
# 3: 3 57 69.8 37.6
# 4: 4 89 70.8 38.8
# 5: 5 20 64.6 37.4
# 6: 6 86 61.8 22.4
# 7: 7 97 48.4 4.2
# 8: 8 62 32.2 -26.4
# 9: 9 58 32.0 -31.0
#10: 10 6 27.2 -35.4
#11: 11 19 39.4 -22.4
#12: 12 16 44.2 NA
#13: 13 61 58.6 NA
#14: 14 34 63.0 NA
#15: 15 67 62.6 NA
#16: 16 43 61.8 NA
#17: 17 88 NA NA
#18: 18 83 NA NA
#19: 19 32 NA NA
#20: 20 63 NA NA
Data
RNGkind(sample.kind = "Rounding")
set.seed(1); df <- data.frame( position = 1:20,value = sample(seq(1,100), 20))
RNGkind(sample.kind = "default")

Find first previous lower value for each value in dataframe column

Given the following dataframe :
set.seed(1)
my_df = data.frame(x = rep(words[1:5], 50) %>% sort(),
y = 1:250,
z = sample(seq(from = 30 , to = 90, by = 0.1), size = 250, replace = T))
my_df %>% head(30)
x y z
1 a 1 45.9
2 a 2 52.3
3 a 3 64.4
4 a 4 84.5
5 a 5 42.1
6 a 6 83.9
7 a 7 86.7
8 a 8 69.7
9 a 9 67.8
10 a 10 33.7
11 a 11 42.3
12 a 12 40.6
13 a 13 71.2
14 a 14 53.0
15 a 15 76.2
16 a 16 59.9
17 a 17 73.1
18 a 18 89.6
19 a 19 52.8
20 a 20 76.7
21 a 21 86.1
22 a 22 42.7
23 a 23 69.1
24 a 24 37.5
25 a 25 46.0
26 a 26 53.2
27 a 27 30.8
28 a 28 52.9
29 a 29 82.2
30 a 30 50.4
I would like to create the following column using dplyr mutate
for each value In column z show the row index of the first value in z which is lower.
For example:
for row 8 show 5
for row 22 show 12
I'm not sure how to do this using dplyr, but here is a data.table attempt using a self non-equi join
library(data.table)
setDT(my_df) %>% #convert to data.table
# Run a self non-equi join and find the closest lower value
.[., .N - which.max(rev(z < i.z)) + 1L, on = .(y <= y), by = .EACHI] %>%
# filter the cases where there are no such values
.[y != V1] %>%
# join the result back to the original data
my_df[., on = .(y), res := V1]
head(my_df, 22)
# x y z res
# 1: a 1 45.9 NA
# 2: a 2 52.3 1
# 3: a 3 64.4 2
# 4: a 4 84.5 3
# 5: a 5 42.1 NA
# 6: a 6 83.9 5
# 7: a 7 86.7 6
# 8: a 8 69.7 5
# 9: a 9 67.8 5
# 10: a 10 33.7 NA
# 11: a 11 42.3 10
# 12: a 12 40.6 10
# 13: a 13 71.2 12
# 14: a 14 53.0 12
# 15: a 15 76.2 14
# 16: a 16 59.9 14
# 17: a 17 73.1 16
# 18: a 18 89.6 17
# 19: a 19 52.8 12
# 20: a 20 76.7 19
# 21: a 21 86.1 20
# 22: a 22 42.7 12
I have managed to find a dplyr solution inspired
by a solution given to one of my other questions using rollapply
in this link.
set.seed(1)
my_df = data.frame(x = rep(words[1:5], 50) %>% sort(),
y = 1:250,
z = sample(seq(from = 30 , to = 90, by = 0.1), size = 250, replace = T))
my_df %>%
mutate(First_Lower_z_Backwards = row_number() - rollapply(z,
width = list((0:(-n()))),
FUN = function(x) which(x < x[1])[1] - 1,
fill = NA,
partial = T)) %>%
head(22)
x y z First_Lower_z_Backwards
1 a 1 45.9 NA
2 a 2 52.3 1
3 a 3 64.4 2
4 a 4 84.5 3
5 a 5 42.1 NA
6 a 6 83.9 5
7 a 7 86.7 6
8 a 8 69.7 5
9 a 9 67.8 5
10 a 10 33.7 NA
11 a 11 42.3 10
12 a 12 40.6 10
13 a 13 71.2 12
14 a 14 53.0 12
15 a 15 76.2 14
16 a 16 59.9 14
17 a 17 73.1 16
18 a 18 89.6 17
19 a 19 52.8 12
20 a 20 76.7 19
21 a 21 86.1 20
22 a 22 42.7 12

R program - getting particular values depending on another column

So I have data regarding Id number and time
Id number Time(hr)
1 5
2 6.1
3 7.2
4 8.3
5 9.6
6 10.9
7 13
8 15.1
9 17.2
10 19.3
11 21.4
12 23.5
13 25.6
14 27.1
15 28.6
16 30.1
17 31.8
18 33.5
19 35.2
20 36.9
21 38.6
22 40.3
23 42
24 43.7
25 45.4
I want this output
Time Id number
10 5
20 10
30 16
40 22
So I want the time to be in 10 hour intervals and get the ID that corresponds to that particular hour...I decided to use this code data <- data2[seq(0, nrow(data2), by=5), ] but instead of the Time being in 10 hr intervals...the ID number is at 10 intervals....but I dont want that output..so far I'm getting this output
Id.number Time..s.
10 19.3
20 36.9
You can use %% (mod) operator.
data[data$Time %% 10 == 0, ]
I use cut() and cumsum(table()) but I don't quite get the answer you are expecting. How exactly are you calculating this?
# first load the data
v.txt <- '1 5
2 6.1
3 7.2
4 8.3
5 9.6
6 10.9
7 13
8 15.1
9 17.2
10 19.3
11 21.4
12 23.5
13 25.6
14 27.1
15 28.6
16 30.1
17 31.8
18 33.5
19 35.2
20 36.9
21 38.6
22 40.3
23 42
24 43.7
25 45.4'
# load in the data... awkwardly...
v <- as.data.frame(matrix(as.numeric(unlist(strsplit(strsplit(v.txt, '\n')[[1]], ' +'))), byrow=T, ncol=2))
tens <- seq(from=0, by=10, to=100)
v$cut <- cut(v$Time, tens, labels=tens[-1])
v2 <- as.data.frame(cumsum(table(v$cut)))
names(v2) <- 'Time'
v2$Id <- rownames(v2)
rownames(v2) <- 1:nrow(v2)
v2 <- v2[,c(2,1)]
rm(v, v.txt, tens) # not needed anymore
v2 # the answer... but doesn't quite match your expected answer...
Id Time
1 10 5
2 20 10
3 30 15
4 40 21
5 50 25

Resources