R: Upsampling a time series in a dataframe, filling missing values

I have collected data from two instruments: one samples at 10 Hz and the other at 100 Hz.
I would like to upsample the 10 Hz data to 100 Hz in one dataframe, so that I can then align and merge the two dataframes.
The example data frame is:
DeltaT   Speed  Acc    HR  Player
48860,7  0,03   -0,05  0   Player1
48860,8  0,02   -0,05  0   Player1
48860,9  0,02   -0,04  0   Player1
48861,0  0,02   -0,03  0   Player1
48861,1  0,01   -0,02  0   Player1
Is there a package function that can help me create data between two points?

Manually with the approx function:
# read the sample data, converting the decimal commas to decimal points
dt <- read.table(text = gsub(",", ".", 'DeltaT Speed Acc HR Player
48860,7 0,03 -0,05 0 Player1
48860,8 0,02 -0,05 0 Player1
48860,9 0,02 -0,04 0 Player1
48861,0 0,02 -0,03 0 Player1
48861,1 0,01 -0,02 0 Player1', fixed = TRUE), header = TRUE)
# 100 Hz time grid spanning the original 10 Hz range
upsampleDeltaT <- seq(from = min(dt$DeltaT), to = max(dt$DeltaT), by = .01)
# linearly interpolate each numeric column onto the new grid
Speed <- approx(dt$DeltaT, dt$Speed, upsampleDeltaT)$y
Acc <- approx(dt$DeltaT, dt$Acc, upsampleDeltaT)$y
HR <- approx(dt$DeltaT, dt$HR, upsampleDeltaT)$y
# repeat each Player label 10 times (the last one once) to match the grid
Player <- rep(dt$Player, c(rep(10, nrow(dt) - 1), 1))
data.frame(upsampleDeltaT, Speed, Acc, HR, Player)
#> upsampleDeltaT Speed Acc HR Player
#> 1 48860.70 0.030 -0.050 0 Player1
#> 2 48860.71 0.029 -0.050 0 Player1
#> 3 48860.72 0.028 -0.050 0 Player1
#> 4 48860.73 0.027 -0.050 0 Player1
#> 5 48860.74 0.026 -0.050 0 Player1
#> 6 48860.75 0.025 -0.050 0 Player1
#> 7 48860.76 0.024 -0.050 0 Player1
#> 8 48860.77 0.023 -0.050 0 Player1
#> 9 48860.78 0.022 -0.050 0 Player1
#> 10 48860.79 0.021 -0.050 0 Player1
#> 11 48860.80 0.020 -0.050 0 Player1
#> 12 48860.81 0.020 -0.049 0 Player1
#> 13 48860.82 0.020 -0.048 0 Player1
#> 14 48860.83 0.020 -0.047 0 Player1
#> 15 48860.84 0.020 -0.046 0 Player1
#> 16 48860.85 0.020 -0.045 0 Player1
#> 17 48860.86 0.020 -0.044 0 Player1
#> 18 48860.87 0.020 -0.043 0 Player1
#> 19 48860.88 0.020 -0.042 0 Player1
#> 20 48860.89 0.020 -0.041 0 Player1
#> 21 48860.90 0.020 -0.040 0 Player1
#> 22 48860.91 0.020 -0.039 0 Player1
#> 23 48860.92 0.020 -0.038 0 Player1
#> 24 48860.93 0.020 -0.037 0 Player1
#> 25 48860.94 0.020 -0.036 0 Player1
#> 26 48860.95 0.020 -0.035 0 Player1
#> 27 48860.96 0.020 -0.034 0 Player1
#> 28 48860.97 0.020 -0.033 0 Player1
#> 29 48860.98 0.020 -0.032 0 Player1
#> 30 48860.99 0.020 -0.031 0 Player1
#> 31 48861.00 0.020 -0.030 0 Player1
#> 32 48861.01 0.019 -0.029 0 Player1
#> 33 48861.02 0.018 -0.028 0 Player1
#> 34 48861.03 0.017 -0.027 0 Player1
#> 35 48861.04 0.016 -0.026 0 Player1
#> 36 48861.05 0.015 -0.025 0 Player1
#> 37 48861.06 0.014 -0.024 0 Player1
#> 38 48861.07 0.013 -0.023 0 Player1
#> 39 48861.08 0.012 -0.022 0 Player1
#> 40 48861.09 0.011 -0.021 0 Player1
#> 41 48861.10 0.010 -0.020 0 Player1
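To then align with the 100 Hz instrument, one possible follow-up (a sketch: df100 is a hypothetical dataframe holding the 100 Hz data, with a DeltaT column on the same clock) is to round both time stamps to two decimals to avoid floating-point mismatches before merging:
up <- data.frame(DeltaT = round(upsampleDeltaT, 2), Speed, Acc, HR, Player)
df100$DeltaT <- round(df100$DeltaT, 2)  # df100: hypothetical 100 Hz dataframe
merged <- merge(up, df100, by = "DeltaT")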

library(data.table)
library(zoo)
set.seed(123)
# 10Hz and 100Hz sample data
DT10 <- data.table(time = seq(0,1, by = 0.1), value = sample(1:10, 11, replace = TRUE))
DT100 <- data.table(time = seq(0,1, by = 0.01), value = sample(1:10, 101, replace = TRUE))
# use setDT() if your data is not already a data.table
# join DT10 onto DT100
DT100[DT10, value2 := i.value, on = .(time)]
# interpolate the NA values
DT100[, value2_inter := zoo::na.approx(value2)]
# output
head(DT100, 31)
# time value value2 value2_inter
# 1: 0.00 3 3 3.0
# 2: 0.01 9 NA 3.0
# 3: 0.02 9 NA 3.0
# 4: 0.03 9 NA 3.0
# 5: 0.04 3 NA 3.0
# 6: 0.05 8 NA 3.0
# 7: 0.06 10 NA 3.0
# 8: 0.07 7 NA 3.0
# 9: 0.08 10 NA 3.0
# 10: 0.09 9 NA 3.0
# 11: 0.10 3 3 3.0
# 12: 0.11 4 NA 3.7
# 13: 0.12 1 NA 4.4
# 14: 0.13 7 NA 5.1
# 15: 0.14 5 NA 5.8
# 16: 0.15 10 NA 6.5
# 17: 0.16 7 NA 7.2
# 18: 0.17 9 NA 7.9
# 19: 0.18 9 NA 8.6
# 20: 0.19 10 NA 9.3
# 21: 0.20 7 10 10.0
# 22: 0.21 5 NA 9.8
# 23: 0.22 7 NA 9.6
# 24: 0.23 5 NA 9.4
# 25: 0.24 6 NA 9.2
# 26: 0.25 9 NA 9.0
# 27: 0.26 2 NA 8.8
# 28: 0.27 5 NA 8.6
# 29: 0.28 8 NA 8.4
# 30: 0.29 2 NA 8.2
# 31: 0.30 1 NA 8.0
# time value value2 value2_inter

Have a look at the approx function: it linearly interpolates values at new points between the given data points.
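For instance, a minimal sketch with made-up points:
x <- c(0, 1, 2)
y <- c(10, 20, 40)
# interpolate y at the new locations 0.5 and 1.5
approx(x, y, xout = c(0.5, 1.5))$y
#> [1] 15 30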

Related

Method in R to find difference between rows with varying row spacing

I want to add an extra column in a dataframe which displays the difference between certain rows, where the distance between the rows also depends on values in the table.
I found out that:
mutate(Col_new = Col_1 - lead(Col_1, n = x))
can find the difference for a fixed n, but only an integer can be used as input. How would you find the difference between rows when the distance between them varies?
I am trying to get the output in Col_new, which is the difference between row i and row i+n, where n takes the value in column Count. (The data is rounded, so there might be 0.01 discrepancies in Col_new.)
Col_1 Count Col_new
1 0.90 1 -0.68
2 1.58 1 -0.31
3 1.89 1 0.05
4 1.84 1 0.27
5 1.57 1 0.27
6 1.30 2 -0.26
7 1.35 2 -0.89
8 1.56 2 -1.58
9 2.24 2 -1.80
10 3.14 2 -1.58
11 4.04 3 -0.95
12 4.72 3 0.01
13 5.04 3 0.60
14 4.99 3 0.60
15 4.71 3 0.01
16 4.44 4 -1.84
17 4.39 4 NA
18 4.70 4 NA
19 5.38 4 NA
20 6.28 4 NA
Data:
df <- data.frame(Col_1 = c(0.90, 1.58, 1.89, 1.84, 1.57, 1.30, 1.35,
1.56, 2.24, 3.14, 4.04, 4.72, 5.04, 4.99,
4.71, 4.44, 4.39, 4.70, 5.38, 6.28),
Count = sort(rep(1:4, 5)))
Some code that generates the intended output, but which can undoubtedly be made more efficient:
library(dplyr)
df %>%
  mutate(col_2 = sapply(1:4, function(s) lead(Col_1, n = s))) %>%
  rowwise() %>%
  mutate(Col_new = Col_1 - col_2[Count]) %>%
  select(-col_2)
Output:
# A tibble: 20 × 3
# Rowwise:
Col_1 Count Col_new
<dbl> <int> <dbl>
1 0.9 1 -0.68
2 1.58 1 -0.310
3 1.89 1 0.0500
4 1.84 1 0.27
5 1.57 1 0.27
6 1.3 2 -0.26
7 1.35 2 -0.89
8 1.56 2 -1.58
9 2.24 2 -1.8
10 3.14 2 -1.58
11 4.04 3 -0.95
12 4.72 3 0.0100
13 5.04 3 0.600
14 4.99 3 0.600
15 4.71 3 0.0100
16 4.44 4 -1.84
17 4.39 4 NA
18 4.7 4 NA
19 5.38 4 NA
20 6.28 4 NA
df %>% mutate(Col_new = case_when(
  Count == 1 ~ Col_1 - lead(Col_1, n = 1),
  Count == 2 ~ Col_1 - lead(Col_1, n = 2),
  Count == 3 ~ Col_1 - lead(Col_1, n = 3),
  Count == 4 ~ Col_1 - lead(Col_1, n = 4),
  Count == 5 ~ Col_1 - lead(Col_1, n = 5)
))
Col_1 Count Col_new
1 0.90 1 -0.68
2 1.58 1 -0.31
3 1.89 1 0.05
4 1.84 1 0.27
5 1.57 1 0.27
6 1.30 2 -0.26
7 1.35 2 -0.89
8 1.56 2 -1.58
9 2.24 2 -1.80
10 3.14 2 -1.58
11 4.04 3 -0.95
12 4.72 3 0.01
13 5.04 3 0.60
14 4.99 3 0.60
15 4.71 3 0.01
16 4.44 4 -1.84
17 4.39 4 NA
18 4.70 4 NA
19 5.38 4 NA
20 6.28 4 NA
This gives the desired result, but it is not a good solution for the general case: with 10 or more distinct counts, another approach is required.
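One approach that scales to any number of distinct counts is to index each row's partner directly instead of enumerating lead() calls. A base-R sketch (assuming the columns are named Col_1 and Count as in the data above):
idx <- seq_len(nrow(df)) + df$Count  # row i is paired with row i + Count[i]
idx[idx > nrow(df)] <- NA            # out-of-range partners become NA
df$Col_new <- df$Col_1 - df$Col_1[idx]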

using melt() does not sort by set ID

I have the data set:
Time a b
[1,] 0 5.06 9.60
[2,] 4 9.57 4.20
[3,] 8 1.78 3.90
[4,] 12 2.21 3.90
[5,] 16 4.10 5.84
[6,] 20 2.81 8.10
[7,] 24 2.70 1.18
[8,] 36 52.00 5.68
[9,] 48 NA 6.66
And I would like to reshape it to:
Time variable value
0 a 5.06
4 a 9.57
8 a 1.78
...
0 b 9.60
4 b 4.20
8 b 3.90
...
The code I am using is:
library(reshape2)
Time <- c(0,4,8,12,16,20,24,36,48)
a <- c(5.06,9.57,1.78,2.21,4.1,2.81,2.7,52,NA)
b <- c(9.6,4.2,3.9,3.9,5.84,8.1,1.18,5.68,6.66)
Mono <- cbind(Time,a,b)
mono <- melt(Mono,id="Time",na.rm=F)
Which produces:
Var1 Var2 value
1 1 Time 0.00
2 2 Time 4.00
3 3 Time 8.00
4 4 Time 12.00
5 5 Time 16.00
6 6 Time 20.00
7 7 Time 24.00
8 8 Time 36.00
9 9 Time 48.00
10 1 a 5.06
11 2 a 9.57
12 3 a 1.78
13 4 a 2.21
14 5 a 4.10
15 6 a 2.81
16 7 a 2.70
17 8 a 52.00
18 9 a NA
19 1 b 9.60
20 2 b 4.20
21 3 b 3.90
22 4 b 3.90
23 5 b 5.84
24 6 b 8.10
25 7 b 1.18
26 8 b 5.68
27 9 b 6.66
I'm sure it's a small error, but I can't figure it out. It's especially frustrating because I've used melt() without problems many times before. How can I fix the code to produce the table I'm looking for?
Thanks for your help!
Use tidyr::gather() to move from wide to long format.
> df <- data.frame(time = seq(0, 20, 5),
+                  a = rnorm(5, 0, 1),
+                  b = rnorm(5, 0, 1))
> library(tidyr)
> gather(df, variable, value, -time)
time variable value
1 0 a 1.5406529
2 5 a 1.5048055
3 10 a -1.1138529
4 15 a -0.1199039
5 20 a -1.7052608
6 0 b -1.1976938
7 5 b 0.7997127
8 10 b 1.1940454
9 15 b 0.5177981
10 20 b 0.6725264
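For what it's worth, the underlying problem in the original code is that cbind(Time, a, b) returns a matrix, and melt() on a matrix ignores the id argument and returns Var1/Var2 position indices instead. Converting to a data frame first makes the original melt() call behave as intended:
Mono <- data.frame(Time, a, b)
mono <- melt(Mono, id = "Time", na.rm = FALSE)
(In current tidyr, gather() is superseded by pivot_longer(df, -time, names_to = "variable", values_to = "value"), which produces the same long format.)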

Replace ID variable values with counts of value occurrences

I have a data frame like:
DATE x y ID
06/10/2003 7.21 0.651 1
12/10/2003 5.99 0.428 1
18/10/2003 4.68 1.04 1
24/10/2003 3.47 0.363 1
30/10/2003 2.42 0.507 1
02/05/2010 2.72 0.47 2
05/05/2010 2.6 0.646 2
08/05/2010 2.67 0.205 2
11/05/2010 3.57 0.524 2
12/05/2010 0.428 4.68 3
13/05/2010 1.04 3.47 3
14/05/2010 0.363 2.42 3
18/10/2003 0.507 2.52 3
24/10/2003 0.418 4.68 3
30/10/2003 0.47 3.47 3
29/04/2010 0.646 2.42 4
18/10/2003 3.47 2.52 4
I have the count of the number of rows per group in column ID as an integer vector, 5 4 6 2.
Is there a way to replace the group values in column ID with this integer vector 5 4 6 2?
The output I am expecting is:
DATE x y ID
06/10/2003 7.21 0.651 5
12/10/2003 5.99 0.428 5
18/10/2003 4.68 1.04 5
24/10/2003 3.47 0.363 5
30/10/2003 2.42 0.507 5
02/05/2010 2.72 0.47 4
05/05/2010 2.6 0.646 4
08/05/2010 2.67 0.205 4
11/05/2010 3.57 0.524 4
12/05/2010 0.428 4.68 6
13/05/2010 1.04 3.47 6
14/05/2010 0.363 2.42 6
18/10/2003 0.507 2.52 6
24/10/2003 0.418 4.68 6
30/10/2003 0.47 3.47 6
29/04/2010 0.646 2.42 2
18/10/2003 3.47 2.52 2
I am quite new to R and tried to find out whether there is any replace function for this, but I'm having a hard time. Any help is much appreciated.
The above data is just an example for understanding my requirement.
A compact solution with the data.table-package:
library(data.table)
setDT(mydf)[, ID := .N, by = ID][]
which gives:
> mydf
DATE x y ID
1: 06/10/2003 7.210 0.651 5
2: 12/10/2003 5.990 0.428 5
3: 18/10/2003 4.680 1.040 5
4: 24/10/2003 3.470 0.363 5
5: 30/10/2003 2.420 0.507 5
6: 02/05/2010 2.720 0.470 4
7: 05/05/2010 2.600 0.646 4
8: 08/05/2010 2.670 0.205 4
9: 11/05/2010 3.570 0.524 4
10: 12/05/2010 0.428 4.680 6
11: 13/05/2010 1.040 3.470 6
12: 14/05/2010 0.363 2.420 6
13: 18/10/2003 0.507 2.520 6
14: 24/10/2003 0.418 4.680 6
15: 30/10/2003 0.470 3.470 6
16: 29/04/2010 0.646 2.420 2
17: 18/10/2003 3.470 2.520 2
What this does:
setDT(mydf) converts the dataframe to a data.table
by = ID groups by ID
ID := .N replaces the original value of ID with the count by group
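One caveat: := updates mydf by reference, so the original ID values are overwritten in place. If you still need them, take a copy first, e.g.:
mydf_orig <- copy(setDT(mydf))  # deep copy before modifying by reference
mydf[, ID := .N, by = ID]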
You can use the ave() function to calculate how many rows each ID takes up. In the example below I created a new variable ID2, but you could replace the original ID if you want.
(I included code to create your data in R below, but when you ask questions in the future please include your data in the question by using the dput() function on the data object. That's what I did to make the code below.)
mydata <- structure(list(DATE = c("06/10/2003", "12/10/2003", "18/10/2003",
"24/10/2003", "30/10/2003", "02/05/2010", "05/05/2010", "08/05/2010",
"11/05/2010", "12/05/2010", "13/05/2010", "14/05/2010", "18/10/2003",
"24/10/2003", "30/10/2003", "29/04/2010", "18/10/2003"),
x = c(7.21, 5.99, 4.68, 3.47, 2.42, 2.72, 2.6, 2.67, 3.57, 0.428, 1.04, 0.363,
0.507, 0.418, 0.47, 0.646, 3.47),
y = c(0.651, 0.428, 1.04, 0.363, 0.507, 0.47, 0.646, 0.205, 0.524, 4.68, 3.47,
2.42, 2.52, 4.68, 3.47, 2.42, 2.52),
ID = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4)),
.Names = c("DATE", "x", "y", "ID"),
class = c("data.frame"),
row.names = c(NA, -17L))
# ave() takes an input object, an object of group IDs of the same length
# as the input object, and a function to apply to the input object split across groups
mydata$ID2 <- ave(mydata$ID, mydata$ID, FUN = length)
mydata
DATE x y ID ID2
1 06/10/2003 7.210 0.651 1 5
2 12/10/2003 5.990 0.428 1 5
3 18/10/2003 4.680 1.040 1 5
4 24/10/2003 3.470 0.363 1 5
5 30/10/2003 2.420 0.507 1 5
6 02/05/2010 2.720 0.470 2 4
7 05/05/2010 2.600 0.646 2 4
8 08/05/2010 2.670 0.205 2 4
9 11/05/2010 3.570 0.524 2 4
10 12/05/2010 0.428 4.680 3 6
11 13/05/2010 1.040 3.470 3 6
12 14/05/2010 0.363 2.420 3 6
13 18/10/2003 0.507 2.520 3 6
14 24/10/2003 0.418 4.680 3 6
15 30/10/2003 0.470 3.470 3 6
16 29/04/2010 0.646 2.420 4 2
17 18/10/2003 3.470 2.520 4 2
# if you want to replace the original ID variable, you can assign to it
# instead of adding a new variable
mydata$ID <- ave(mydata$ID, mydata$ID, FUN = length)
A solution with dplyr:
library(dplyr)
df %>%
  group_by(ID) %>%
  mutate(ID2 = n()) %>%
  ungroup() %>%
  mutate(ID = ID2) %>%
  select(-ID2)
Edit:
I've just found a solution that's a bit cleaner than the above:
df %>%
  group_by(ID2 = ID) %>%
  mutate(ID = n()) %>%
  select(-ID2)
Result:
# A tibble: 17 x 4
DATE x y ID
<fctr> <dbl> <dbl> <int>
1 06/10/2003 7.210 0.651 5
2 12/10/2003 5.990 0.428 5
3 18/10/2003 4.680 1.040 5
4 24/10/2003 3.470 0.363 5
5 30/10/2003 2.420 0.507 5
6 02/05/2010 2.720 0.470 4
7 05/05/2010 2.600 0.646 4
8 08/05/2010 2.670 0.205 4
9 11/05/2010 3.570 0.524 4
10 12/05/2010 0.428 4.680 6
11 13/05/2010 1.040 3.470 6
12 14/05/2010 0.363 2.420 6
13 18/10/2003 0.507 2.520 6
14 24/10/2003 0.418 4.680 6
15 30/10/2003 0.470 3.470 6
16 29/04/2010 0.646 2.420 2
17 18/10/2003 3.470 2.520 2
Notes:
The reason behind ungroup() %>% mutate(ID = ID2) %>% select(-ID2) is that dplyr doesn't allow mutating grouping variables. So this would not work:
df %>%
  group_by(ID) %>%
  mutate(ID = n())
Error in mutate_impl(.data, dots) : Column ID can't be modified
because it's a grouping variable
If you don't care about replacing the original ID column, you can just do:
df %>%
  group_by(ID) %>%
  mutate(ID2 = n())
Alternative Result:
# A tibble: 17 x 5
# Groups: ID [4]
DATE x y ID ID2
<fctr> <dbl> <dbl> <int> <int>
1 06/10/2003 7.210 0.651 1 5
2 12/10/2003 5.990 0.428 1 5
3 18/10/2003 4.680 1.040 1 5
4 24/10/2003 3.470 0.363 1 5
5 30/10/2003 2.420 0.507 1 5
6 02/05/2010 2.720 0.470 2 4
7 05/05/2010 2.600 0.646 2 4
8 08/05/2010 2.670 0.205 2 4
9 11/05/2010 3.570 0.524 2 4
10 12/05/2010 0.428 4.680 3 6
11 13/05/2010 1.040 3.470 3 6
12 14/05/2010 0.363 2.420 3 6
13 18/10/2003 0.507 2.520 3 6
14 24/10/2003 0.418 4.680 3 6
15 30/10/2003 0.470 3.470 3 6
16 29/04/2010 0.646 2.420 4 2
17 18/10/2003 3.470 2.520 4 2
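As an aside, a compact alternative that sidesteps the grouping restriction is dplyr's add_count(), which appends the group size as a column n that can then replace ID (a sketch using the same df):
df %>%
  add_count(ID) %>%  # adds a column n with the row count per ID
  mutate(ID = n) %>%
  select(-n)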

taking average by groups, excluding NA value

I'm struggling to find a way to aggregate my data frame by taking the mean while ignoring NA values, but where the end result still shows a missing value when a whole group is NA.
The data table looks, for instance, like this:
Guar1 Bucket2 1 2 3 4 Total Month
10 -10 NA NA NA NA 0 201110
10 -0.2 0 9.87 8.42 0 18.29 201110
10 0 0.81 7.49 3.32 5.92 17.54 201110
10 0.4 0 0 NA 0 0 201110
10 999 0.73 7.57 4.61 0.77 13.68 201110
20 -10 NA NA NA NA 0 201110
20 -0.2 NA NA 100 NA 100 201110
20 0 NA 0 0 0 0 201110
20 0.4 1.39 3.13 14.04 2.98 21.54 201110
20 999 1.38 3.11 17.08 2.97 24.54 201110
999 999 1.06 5.44 8.61 1.52 16.63 201110
10 -10 NA NA NA NA 0 201111
10 -0.2 0 0 8.54 0 8.54 201111
10 0 1.87 6.12 16.6 0 24.59 201111
10 0.4 0 0 0 1.47 1.47 201111
10 999 1.68 5.82 13.15 1.67 22.32 201111
20 -10 NA NA NA NA 0 201111
20 -0.2 NA 0 NA NA 0 201111
20 0 NA NA 0 0 0 201111
20 0.4 2.29 5.38 14.91 14.18 36.76 201111
20 999 2.29 5.35 13.09 14.1 34.83 201111
And the final table
Guar1 Bucket2 1 2 3 4 Total
10 -10 NA NA NA NA 0
10 -0.2 0 4.935 8.48 0 13.415
10 0 1.34 6.805 9.96 2.96 21.065
10 0.4 0 0 0 0.735 0.735
10 999 1.205 6.695 8.88 1.22 18
20 -10 NA NA NA NA 0
20 -0.2 NA 0 100 NA 50
20 0 NA 0 0 0 0
20 0.4 1.84 4.255 14.475 8.58 29.15
20 999 1.835 4.23 15.085 8.535 29.685
999 999 1.06 5.44 8.61 1.52 16.63
I've tried
aggregate(. ~ Guar1 + Bucket2, df, mean, na.rm = FALSE)
but it then excludes all rows containing NA from the final table.
And if I set all the NA values in df to 0, I would not get the desired averages.
I hope that someone can help me with this. Thanks!
Check this example with the dplyr package.
You can group by more than one variable. The dplyr package is great for data editing, summarising and so on.
dataFrame <- data.frame(group = c("a","a","a", "b","b","b"), value = c(1,2,NA,NA,NA,3))
library("dplyr")
df <- dataFrame %>%
  group_by(group) %>%
  summarise(Mean = mean(value, na.rm = TRUE))
Output
# A tibble: 2 × 2
group Mean
<fctr> <dbl>
1 a 1.5
2 b 3.0
To avoid the NA rows being removed, use na.action = na.pass; with na.rm = TRUE passed to mean, only the non-NA elements are used to compute each mean:
aggregate(.~ Guar1+Bucket2, df, mean, na.rm =TRUE, na.action = na.pass)
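One caveat with either approach: a group whose values are all NA yields NaN rather than NA, because mean(x, na.rm = TRUE) on an empty vector is NaN. A sketch extending the dplyr version to every value column of the question's frame (assuming it is named df, requires dplyr >= 1.0, and drops Month before averaging):
library(dplyr)
df %>%
  select(-Month) %>%
  group_by(Guar1, Bucket2) %>%
  summarise(across(everything(), ~ {
    m <- mean(.x, na.rm = TRUE)
    if (is.nan(m)) NA_real_ else m  # restore NA for all-NA groups
  }), .groups = "drop")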

Error: missing value where TRUE/FALSE needed

WEEK PRICE QUANTITY SALE_PRICE TYPE
1 4992 5.99 2847.50 0.00 3
2 4995 3.33 36759.00 3.33 3
3 4996 5.99 2517.00 0.00 3
4 4997 5.49 2858.50 0.00 3
5 5001 3.33 32425.00 3.33 3
6 5002 5.49 4205.50 0.00 3
7 5004 5.99 4329.50 0.00 3
8 5006 2.74 55811.00 2.74 3
9 5007 5.49 4133.00 0.00 3
10 5008 5.99 4074.00 0.00 3
11 5009 3.99 12125.25 3.99 3
12 5017 2.74 77645.00 2.74 3
13 5018 5.49 5315.50 0.00 3
14 5020 2.74 78699.00 2.74 3
15 5021 5.49 5158.50 0.00 3
16 5023 5.99 5315.00 0.00 3
17 5024 5.49 6545.00 0.00 3
18 5025 3.33 63418.00 3.33 3
If there are consecutive zero sale-price entries, I want to keep only the last entry of each run. For example, I want to remove week 4996 and keep week 4997; keep week 5004 and remove week 5002. Similarly, I want to delete weeks 5021 and 5023 and keep week 5024.
We can use data.table. Convert the 'data.frame' to 'data.table' (setDT(df1)). Create a grouping variable with rleid based on a logical vector of the presence of 0 in 'SALE_PRICE' (!SALE_PRICE). Using 'grp' as the grouping variable, we get the last row of the Subset of Data.table (.SD[.N]) if the 'SALE_PRICE' elements are all 0, or else get the .SD, i.e. the full rows, for each group.
library(data.table)
setDT(df1)[, grp := rleid(!SALE_PRICE)
  ][, if (all(!SALE_PRICE)) .SD[.N] else .SD, grp
  ][, grp := NULL][]
# WEEK PRICE QUANTITY SALE_PRICE TYPE
# 1: 4992 5.99 2847.50 0.00 3
# 2: 4995 3.33 36759.00 3.33 3
# 3: 4997 5.49 2858.50 0.00 3
# 4: 5001 3.33 32425.00 3.33 3
# 5: 5004 5.99 4329.50 0.00 3
# 6: 5006 2.74 55811.00 2.74 3
# 7: 5008 5.99 4074.00 0.00 3
# 8: 5009 3.99 12125.25 3.99 3
# 9: 5017 2.74 77645.00 2.74 3
#10: 5018 5.49 5315.50 0.00 3
#11: 5020 2.74 78699.00 2.74 3
#12: 5024 5.49 6545.00 0.00 3
#13: 5025 3.33 63418.00 3.33 3
Or an option using dplyr: create a grouping variable with diff and cumsum, then filter to keep only the last row of each run where 'SALE_PRICE' is 0, or (|) the rows where 'SALE_PRICE' is not 0.
library(dplyr)
df1 %>%
  group_by(grp = cumsum(c(TRUE, diff(!SALE_PRICE) != 0))) %>%
  filter(!duplicated(!SALE_PRICE, fromLast = TRUE) | SALE_PRICE != 0) %>%
  select(-grp)
# grp WEEK PRICE QUANTITY SALE_PRICE TYPE
# (int) (int) (dbl) (dbl) (dbl) (int)
#1 1 4992 5.99 2847.50 0.00 3
#2 2 4995 3.33 36759.00 3.33 3
#3 3 4997 5.49 2858.50 0.00 3
#4 4 5001 3.33 32425.00 3.33 3
#5 5 5004 5.99 4329.50 0.00 3
#6 6 5006 2.74 55811.00 2.74 3
#7 7 5008 5.99 4074.00 0.00 3
#8 8 5009 3.99 12125.25 3.99 3
#9 8 5017 2.74 77645.00 2.74 3
#10 9 5018 5.49 5315.50 0.00 3
#11 10 5020 2.74 78699.00 2.74 3
#12 11 5024 5.49 6545.00 0.00 3
#13 12 5025 3.33 63418.00 3.33 3
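For completeness, the same rule fits in a couple of lines of base R (a sketch: a zero-sale row is kept only when the following row is not also zero-sale, i.e. it is the last of its run):
zero <- df1$SALE_PRICE == 0
next_zero <- c(zero[-1], FALSE)  # is the following row also zero-sale?
df1[!(zero & next_zero), ]       # drop zero rows followed by another zero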
