Rolling condition with column criteria

Rolling condition with column criteria - r

I have combined two data sets, one which has values for every day over a period and the other has values on some days of the same period but not all.
My data looks like this:
ID Date C1 C2 C3 C4 C5
1 AA 2019-11-25 6 6 6 0 0
2 AA 2019-11-26 6 6 6 0 18
3 AA 2019-11-27 5 6 6 0 0
4 AA 2019-11-28 4 5 5 65 60
5 AA 2019-11-29 5 6 6 0 0
6 AA 2019-11-30 5 6 6 0 0
7 AA 2019-12-01 5 6 6 0 0
8 AA 2019-12-02 4 5 5 65 60
9 BB 2019-11-25 6 6 6 20 0
10 BB 2019-11-26 6 6 6 54 12
11 BB 2019-11-27 5 6 6 0 0
12 BB 2019-11-28 4 5 5 0 0
13 BB 2019-11-29 5 6 6 90 33
Where C1:C3 are from the first data set and C4:C5 is from the second.
I am trying to create a new column to achieve the following:
ID Date C1 C2 C3 C4 C5 New Column
1 AA 2019-11-25 6 6 6 0 0 1
2 AA 2019-11-26 6 6 6 0 18 0
3 AA 2019-11-27 5 6 6 0 0 1
4 AA 2019-11-28 4 5 5 65 60 0
5 AA 2019-11-27 5 6 6 0 0 1
6 AA 2019-11-27 5 6 6 0 0 2
7 AA 2019-11-27 5 6 6 0 0 3
8 AA 2019-11-28 4 5 5 65 60 0
9 BB 2019-11-25 6 6 6 20 0 0
10 BB 2019-11-26 6 6 6 54 12 0
11 BB 2019-11-27 5 6 6 0 0 1
12 BB 2019-11-28 4 5 5 0 0 2
13 BB 2019-11-29 5 6 6 90 33 0
I am trying to determine if the sum of C4 and C5 is 0, add 1 to the value above but if not then = 0
Can anyone help me on this?
Thanks!

You can create a group using cumsum :
library(dplyr)
df %>%
group_by(grp = cumsum(C4 + C5 != 0)) %>%
mutate(new_col = cumsum(as.integer(C4+C5==0))) %>%
select(-grp)
# A tibble: 13 x 8
# ID Date C1 C2 C3 C4 C5 new_col
# <chr> <chr> <int> <int> <int> <int> <int> <dbl>
# 1 AA 2019-11-25 6 6 6 0 0 1
# 2 AA 2019-11-26 6 6 6 0 18 0
# 3 AA 2019-11-27 5 6 6 0 0 1
# 4 AA 2019-11-28 4 5 5 65 60 0
# 5 AA 2019-11-29 5 6 6 0 0 1
# 6 AA 2019-11-30 5 6 6 0 0 2
# 7 AA 2019-12-01 5 6 6 0 0 3
# 8 AA 2019-12-02 4 5 5 65 60 0
# 9 BB 2019-11-25 6 6 6 20 0 0
#10 BB 2019-11-26 6 6 6 54 12 0
#11 BB 2019-11-27 5 6 6 0 0 1
#12 BB 2019-11-28 4 5 5 0 0 2
#13 BB 2019-11-29 5 6 6 90 33 0
data
df <- structure(list(ID = c("AA", "AA", "AA", "AA", "AA", "AA", "AA",
"AA", "BB", "BB", "BB", "BB", "BB"), Date = c("2019-11-25", "2019-11-26",
"2019-11-27", "2019-11-28", "2019-11-29", "2019-11-30", "2019-12-01",
"2019-12-02", "2019-11-25", "2019-11-26", "2019-11-27", "2019-11-28",
"2019-11-29"), C1 = c(6L, 6L, 5L, 4L, 5L, 5L, 5L, 4L, 6L, 6L,
5L, 4L, 5L), C2 = c(6L, 6L, 6L, 5L, 6L, 6L, 6L, 5L, 6L, 6L, 6L,
5L, 6L), C3 = c(6L, 6L, 6L, 5L, 6L, 6L, 6L, 5L, 6L, 6L, 6L, 5L,
6L), C4 = c(0L, 0L, 0L, 65L, 0L, 0L, 0L, 65L, 20L, 54L, 0L, 0L,
90L), C5 = c(0L, 18L, 0L, 60L, 0L, 0L, 0L, 60L, 0L, 12L, 0L,
0L, 33L)), class = "data.frame", row.names = c(NA, -13L))

Related

R Only Keep Rows up to a certin condition

I have a dataframe as follows
head(data)
subject block trial timeLeft timeRight stim1 stim2 Chosen
1 1 13 0 0 0 2 1 2
2 1 13 1 0 1 3 2 2
3 1 13 3 0 0 3 1 1
4 1 13 4 2 0 2 3 3
5 1 13 6 1 1 1 3 1
6 1 13 7 2 2 2 1 1
...
454 1006 14 0 0 0 6 5 5
455 1006 14 1 0 0 6 4 6
456 1006 14 3 0 1 4 5 4
457 1006 14 4 1 1 4 5 4
458 1006 14 6 1 2 6 4 6
my objective is to group by subject and block and to only keep rows prior and including where both timeLeft and timeRight =0
in this case the output would be
subject block trial timeLeft timeRight stim1 stim2 Chosen
1 1 13 0 0 0 2 1 2
2 1 13 1 0 1 3 2 2
3 1 13 3 0 0 3 1 1
...
454 1006 14 0 0 0 6 5 5
455 1006 14 1 0 0 6 4 6
Thank you in advance!
here is the structure of the data
'data.frame': 64748 obs. of 8 variables:
$ subject : num 1 1 1 1 1 1 1 1 1 1 ...
$ block : int 13 13 13 13 13 13 13 13 13 13 ...
$ trial : int 0 1 3 4 6 7 9 10 12 13 ...
$ timeLeft : int 0 0 0 2 1 2 2 1 3 4 ...
$ timeRight: int 0 1 0 0 1 2 1 3 4 4 ...
$ stim1 : int 2 3 3 2 1 2 2 3 2 2 ...
$ stim2 : int 1 2 1 3 3 1 3 1 1 1 ...
$ Chosen : int 2 2 1 3 1 1 2 1 2 2 ...

You may do this with the help of custom function -
library(dplyr)
select_rows <- function(timeLeft, timeRight) {
inds <- which(timeLeft == 0 & timeRight == 0)
if(length(inds) >= 2) inds[1]:inds[2]
else 0
}
data %>%
group_by(subject, block) %>%
slice(select_rows(timeLeft, timeRight)) %>%
ungroup
# subject block trial timeLeft timeRight stim1 stim2 Chosen
# <int> <int> <int> <int> <int> <int> <int> <int>
#1 1 13 0 0 0 2 1 2
#2 1 13 1 0 1 3 2 2
#3 1 13 3 0 0 3 1 1
#4 1006 14 0 0 0 6 5 5
#5 1006 14 1 0 0 6 4 6
If the data is huge you may also do this with data.table -
library(data.table)
setDT(data)[, .SD[select_rows(timeLeft, timeRight)], .(subject, block)]
data
It is easier to help if you provide data in a reproducible format
data <- structure(list(subject = c(1L, 1L, 1L, 1L, 1L, 1L, 1006L, 1006L,
1006L, 1006L, 1006L), block = c(13L, 13L, 13L, 13L, 13L, 13L,
14L, 14L, 14L, 14L, 14L), trial = c(0L, 1L, 3L, 4L, 6L, 7L, 0L,
1L, 3L, 4L, 6L), timeLeft = c(0L, 0L, 0L, 2L, 1L, 2L, 0L, 0L,
0L, 1L, 1L), timeRight = c(0L, 1L, 0L, 0L, 1L, 2L, 0L, 0L, 1L,
1L, 2L), stim1 = c(2L, 3L, 3L, 2L, 1L, 2L, 6L, 6L, 4L, 4L, 6L
), stim2 = c(1L, 2L, 1L, 3L, 3L, 1L, 5L, 4L, 5L, 5L, 4L), Chosen = c(2L,
2L, 1L, 3L, 1L, 1L, 5L, 6L, 4L, 4L, 6L)), class = "data.frame", row.names =
c("1", "2", "3", "4", "5", "6", "454", "455", "456", "457", "458"))

If you want to keep all rows before timeLeft and timeRight are 0, you can try this way.
Data
subject block trial timeLeft timeRight stim1 stim2 Chosen
1 1 13 0 0 0 2 1 2
2 1 13 1 0 1 3 2 2
3 1 13 3 0 0 3 1 1
4 1 13 4 2 0 2 3 3
5 1 13 6 1 1 1 3 1
6 1 13 7 2 2 2 1 1
7 1006 14 0 0 1 6 5 5
8 1006 14 0 0 0 6 5 5
9 1006 14 1 0 0 6 4 6
10 1006 14 3 0 1 4 5 4
11 1006 14 4 1 1 4 5 4
12 1006 14 6 1 2 6 4 6
I add one more row for subject:1006, to make first row is not 0,0.
Code
df %>%
group_by(subject) %>%
mutate(key = max(which((timeLeft == 0 & timeRight ==0)))) %>%
slice(1:key)
subject block trial timeLeft timeRight stim1 stim2 Chosen key
<int> <int> <int> <int> <int> <int> <int> <int> <int>
1 1 13 0 0 0 2 1 2 3
2 1 13 1 0 1 3 2 2 3
3 1 13 3 0 0 3 1 1 3
4 1006 14 0 0 1 6 5 5 3
5 1006 14 0 0 0 6 5 5 3
6 1006 14 1 0 0 6 4 6 3

You can filter for only rows that meet the condition and then group
data %>%
filter(timeLeft > 0 & timeRight > 0) %>%
group_by(subject, block)

Create a new variable for all observations within a group that equals another variable value conditional on a thirs variable value

I have a data frame that looks like this:
cnpj
time2
n_act_contracts
12
-1
10
12
0
8
12
1
6
13
-1
3
13
0
5
13
1
7
14
1
3
14
2
5
14
3
7
15
NA
3
15
NA
5
15
NA
7
I want to define another variable that takes, for all observations that have the same cnpj, the value of the n_act_contracts when the variable time2 is equal to zero.
cnpj
time2
n_act_contracts
zero_n_act_contracts
12
-1
10
8
12
0
8
8
12
1
6
8
13
-1
3
5
13
0
5
5
13
1
7
5
14
1
3
NA
14
2
5
NA
14
3
7
NA
15
NA
3
NA
15
NA
5
NA
15
NA
7
NA
I have beeing doing it with the following lines of code, but I need to make it more efficient.
data <- data %>%
group_by(cnpj) %>%
mutate(
zero_n_act_contracts = ifelse(time2 == 0,n_act_contracts,-1000),
zero_n_act_contracts = max(zero_n_act_contracts, na.rm = TRUE),
zero_n_act_contracts = ifelse(zero_n_act_contracts == -1000,NA,zero_n_act_contracts))
obs: I have already tryied replacing base "ifelse" by dplyr: "if_else", but my code took longer to run.

We can use
library(dplyr)
data %>%
group_by(cnpj) %>%
mutate(zero_n_act_contracts = n_act_contracts[time2 == 0][1]) %>%
ungroup
-output
# A tibble: 12 x 4
# cnpj time2 n_act_contracts zero_n_act_contracts
# <int> <int> <int> <int>
# 1 12 -1 10 8
# 2 12 0 8 8
# 3 12 1 6 8
# 4 13 -1 3 5
# 5 13 0 5 5
# 6 13 1 7 5
# 7 14 1 3 NA
# 8 14 2 5 NA
# 9 14 3 7 NA
#10 15 NA 3 NA
#11 15 NA 5 NA
#12 15 NA 7 NA
data
df1 <- structure(list(cnpj = c(12L, 12L, 12L, 13L, 13L, 13L, 14L, 14L,
14L, 15L, 15L, 15L), time2 = c(-1L, 0L, 1L, -1L, 0L, 1L, 1L,
2L, 3L, NA, NA, NA), n_act_contracts = c(10L, 8L, 6L, 3L, 5L,
7L, 3L, 5L, 7L, 3L, 5L, 7L)), class = "data.frame", row.names = c(NA,
-12L))

A data.table option
setDT(df)[,zero_n_act_contracts := n_act_contracts[!time2],cnpj]
gives
> df
cnpj time2 n_act_contracts zero_n_act_contracts
1: 12 -1 10 8
2: 12 0 8 8
3: 12 1 6 8
4: 13 -1 3 5
5: 13 0 5 5
6: 13 1 7 5
7: 14 1 3 NA
8: 14 2 5 NA
9: 14 3 7 NA
10: 15 NA 3 NA
11: 15 NA 5 NA
12: 15 NA 7 NA

R - Insert Missing Numbers in A Sequence by Group's Max Value

I'd like to insert missing numbers in the index column following these two conditions:
Partitioned by multiple columns
The minimum value is always 1
The maximum value is always the maximum for the group and type
Current Data:
group type index vol
A 1 1 200
A 1 2 244
A 1 5 33
A 2 2 66
A 2 3 2
A 2 4 199
A 2 10 319
B 1 4 290
B 1 5 188
B 1 6 573
B 1 9 122
Desired Data:
group type index vol
A 1 1 200
A 1 2 244
A 1 3 0
A 1 4 0
A 1 5 33
A 2 1 0
A 2 2 66
A 2 3 2
A 2 4 199
A 2 5 0
A 2 6 0
A 2 7 0
A 2 8 0
A 2 9 0
A 2 10 319
B 1 1 0
B 1 2 0
B 1 3 0
B 1 4 290
B 1 5 188
B 1 6 573
B 1 7 0
B 1 8 0
B 1 9 122
I've just added in spaces between the partitions for clarity.
Hope you can help out!

You can do the following
library(dplyr)
library(tidyr)
my_df %>%
group_by(group, type) %>%
complete(index = 1:max(index), fill = list(vol = 0))
# group type index vol
# 1 A 1 1 200
# 2 A 1 2 244
# 3 A 1 3 0
# 4 A 1 4 0
# 5 A 1 5 33
# 6 A 2 1 0
# 7 A 2 2 66
# 8 A 2 3 2
# 9 A 2 4 199
# 10 A 2 5 0
# 11 A 2 6 0
# 12 A 2 7 0
# 13 A 2 8 0
# 14 A 2 9 0
# 15 A 2 10 319
# 16 B 1 1 0
# 17 B 1 2 0
# 18 B 1 3 0
# 19 B 1 4 290
# 20 B 1 5 188
# 21 B 1 6 573
# 22 B 1 7 0
# 23 B 1 8 0
# 24 B 1 9 122
With group_by you specify the groups you indicated withed the white spaces. With complete you specify which columns should be complete and then what values should be filled in for the remaining column (default would be NA)
Data
my_df <-
structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"),
type = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L),
index = c(1L, 2L, 5L, 2L, 3L, 4L, 10L, 4L, 5L, 6L, 9L),
vol = c(200L, 244L, 33L, 66L, 2L, 199L, 319L, 290L, 188L, 573L, 122L)),
class = "data.frame", row.names = c(NA, -11L))

One dplyr and tidyr possibility could be:
df %>%
group_by(group, type) %>%
complete(index = full_seq(1:max(index), 1), fill = list(vol = 0))
group type index vol
<fct> <int> <dbl> <dbl>
1 A 1 1 200
2 A 1 2 244
3 A 1 3 0
4 A 1 4 0
5 A 1 5 33
6 A 2 1 0
7 A 2 2 66
8 A 2 3 2
9 A 2 4 199
10 A 2 5 0
11 A 2 6 0
12 A 2 7 0
13 A 2 8 0
14 A 2 9 0
15 A 2 10 319
16 B 1 1 0
17 B 1 2 0
18 B 1 3 0
19 B 1 4 290
20 B 1 5 188
21 B 1 6 573
22 B 1 7 0
23 B 1 8 0
24 B 1 9 122

How to keep sequence of numbers until certain row number reached

I have been trying to assign numbers with sequence. I would like to further add to repeat the sequence until the certain row numbers reached. For example repeat the sequence for every 44th row.
Here is what I mean
test_table <- data.frame(col=rep(0:10,each=11), row=c(rev(0:10)))
and assigning cumulative numbers in this way
> library(dplyr)
test_table%>%
mutate(No=(row_number() - 1) %/% 11)
test_table
col row No
1 0 10 0
2 0 9 0
3 0 8 0
4 0 7 0
5 0 6 0
6 0 5 0
7 0 4 0
8 0 3 0
9 0 2 0
10 0 1 0
11 0 0 0
12 1 10 1
13 1 9 1
14 1 8 1
15 1 7 1
16 1 6 1
17 1 5 1
18 1 4 1
19 1 3 1
20 1 2 1
21 1 1 1
22 1 0 1
23 2 10 2
24 2 9 2
25 2 8 2
26 2 7 2
27 2 6 2
28 2 5 2
29 2 4 2
30 2 3 2
31 2 2 2
32 2 1 2
33 2 0 2
34 3 10 3
35 3 9 3
36 3 8 3
37 3 7 3
38 3 6 3
39 3 5 3
40 3 4 3
41 3 3 3
42 3 2 3
43 3 1 3
44 3 0 3
45 4 10 4
46 4 9 4
47 4 8 4
48 4 7 4
49 4 6 4
50 4 5 4
51 4 4 4
52 4 3 4
53 4 2 4
54 4 1 4
55 4 0 4
56 5 10 5
57 5 9 5
58 5 8 5
59 5 7 5
60 5 6 5
61 5 5 5
62 5 4 5
63 5 3 5
64 5 2 5
65 5 1 5
66 5 0 5
67 6 10 6
68 6 9 6
69 6 8 6
70 6 7 6
71 6 6 6
72 6 5 6
73 6 4 6
74 6 3 6
75 6 2 6
76 6 1 6
77 6 0 6
78 7 10 7
79 7 9 7
80 7 8 7
81 7 7 7
82 7 6 7
83 7 5 7
84 7 4 7
85 7 3 7
86 7 2 7
87 7 1 7
88 7 0 7
89 8 10 8
90 8 9 8
91 8 8 8
92 8 7 8
93 8 6 8
94 8 5 8
95 8 4 8
96 8 3 8
97 8 2 8
98 8 1 8
99 8 0 8
100 9 10 9
101 9 9 9
102 9 8 9
103 9 7 9
104 9 6 9
105 9 5 9
106 9 4 9
107 9 3 9
108 9 2 9
109 9 1 9
110 9 0 9
111 10 10 10
112 10 9 10
113 10 8 10
114 10 7 10
115 10 6 10
116 10 5 10
117 10 4 10
118 10 3 10
119 10 2 10
120 10 1 10
121 10 0 10
Ok. Good! But I would like to keep the sequence for example 0 and 1 until the 44th row reached. After that, start to the new sequence from 2 and go 88th row like this.
So the expected output will be
test_table
col row No
1 0 10 0
2 0 9 0
3 0 8 0
4 0 7 0
5 0 6 0
6 0 5 0
7 0 4 0
8 0 3 0
9 0 2 0
10 0 1 0
11 0 0 0
12 1 10 1
13 1 9 1
14 1 8 1
15 1 7 1
16 1 6 1
17 1 5 1
18 1 4 1
19 1 3 1
20 1 2 1
21 1 1 1
22 1 0 1
23 2 10 0
24 2 9 0
25 2 8 0
26 2 7 0
27 2 6 0
28 2 5 0
29 2 4 0
30 2 3 0
31 2 2 0
32 2 1 0
33 2 0 0
34 3 10 1
35 3 9 1
36 3 8 1
37 3 7 1
38 3 6 1
39 3 5 1
40 3 4 1
41 3 3 1
42 3 2 1
43 3 1 1
44 3 0 1
45 4 10 2
46 4 9 2
47 4 8 2
48 4 7 2
49 4 6 2
50 4 5 2
51 4 4 2
52 4 3 2
53 4 2 2
54 4 1 2
55 4 0 2
56 5 10 3
57 5 9 3
58 5 8 3
59 5 7 3
60 5 6 3
61 5 5 3
62 5 4 3
63 5 3 3
64 5 2 3
65 5 1 3
66 5 0 3
67 6 10 2
68 6 9 2
69 6 8 2
70 6 7 2
71 6 6 2
72 6 5 2
73 6 4 2
74 6 3 2
75 6 2 2
76 6 1 2
77 6 0 2
78 7 10 3
79 7 9 3
80 7 8 3
81 7 7 3
82 7 6 3
83 7 5 3
84 7 4 3
85 7 3 3
86 7 2 3
87 7 1 3
88 7 0 3
89 8 10 4
90 8 9 4
91 8 8 4
92 8 7 4
93 8 6 4
94 8 5 4
95 8 4 4
96 8 3 4
97 8 2 4
98 8 1 4
99 8 0 4
100 9 10 5
101 9 9 5
102 9 8 5
103 9 7 5
104 9 6 5
105 9 5 5
106 9 4 5
107 9 3 5
108 9 2 5
109 9 1 5
110 9 0 5
111 10 10 4
112 10 9 4
113 10 8 4
114 10 7 4
115 10 6 4
116 10 5 4
117 10 4 4
118 10 3 4
119 10 2 4
120 10 1 4
121 10 0 4
How can we do that ?
Thanks in advance!

This would do it in more general way
num.seq = 11L # total number of sequences in the first column
num.rows = N * num.seq # total number of rows
seq.length.3 = 44 # length of the pattern in the 3rd column
# number of paterns in the 3rd column
num.seq.3 = ( num.rows - 1 ) %/% seq.length.3 +1
# starting number in the sequence of the 3rd column
nseq=0
# vector for the 3rd column (could be done right in data frame def.)
No = (rep(rep( nseq:(nseq+1), each = N, times= 2), times=num.seq.3) +
rep(0:(num.seq.3 -1)*2, each= seq.length.3)) [1:num.rows]
test_table <- data.frame(col=rep(0:10,each=11),
row=c(rev(0:10)),
No=No)
An alternative way:
library(dplyr)
dt2 <- test_table%>%
mutate(No = (row_number() - 1) %/% 11)
dt2$No <- dt2$No %% 2 + (rep(0:num.seq.3, each =44, times=num.seq.3 )*2)
[1:num.rows]

The arithmetic, which is totally dependent on row numbers, seems right this way.
test_table%>%
mutate(No=((row_number() - 1) %/% 11) %% 2) %>% # alternating 11 rows of 0's and 1's
mutate(No = No + ((row_number() - 1) %/% 44) * 2) # add 2 every after 44 rows
Here is the result, as intended.
structure(list(col = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L), row = c(10L, 9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L, 0L,
10L, 9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L, 0L, 10L, 9L, 8L, 7L,
6L, 5L, 4L, 3L, 2L, 1L, 0L, 10L, 9L, 8L, 7L, 6L, 5L, 4L, 3L,
2L, 1L, 0L, 10L, 9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L, 0L, 10L,
9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L, 0L, 10L, 9L, 8L, 7L, 6L,
5L, 4L, 3L, 2L, 1L, 0L, 10L, 9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L,
1L, 0L, 10L, 9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L, 0L, 10L, 9L,
8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L, 0L, 10L, 9L, 8L, 7L, 6L, 5L,
4L, 3L, 2L, 1L, 0L), No = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4)), class = "data.frame", .Names = c("col", "row",
"No"), row.names = c(NA, -121L))

This would deliver what I understand to be the requested vector (except I think your sequencing"skipped a beat"):
c( rep( c(1,2,1,2), each=11) , rep(c(3,4,3,4), each=11), rep(c(5,6,5,6), each=11) )
[1] 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 3 3 3
[48] 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5
[95] 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 6 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 6
A more general way:
c( sapply( seq(1, 6, by=2), function(start) {
rep( rep(start:(start+1) , 2), each=11) }))
The outer c() will remove the matrix character that sapply defaults to.

Dataframe, split by column values and put into new columns

I am new to R, and currently working with setting up my data.
My data comes in a format where I each row contains a single measurement (DV), and a column with an explanation for the type of measurement (DVID).
Here is an example of my data:
ID TIME DV DVID
1 0 0.0 7
1 1 27.5 1
1 1 0.0 7
1 4 19.6 1
1 4 0.0 7
1 8 17.9 1
1 8 0.0 7
1 12 17.7 1
1 12 0.0 7
1 24 19.6 1
1 24 0.0 7
1 48 32.9 1
1 48 0.0 7
2 0 0.0 7
2 1 0.0 7
2 4 0.0 7
2 8 0.0 7
2 12 0.0 7
2 24 0.0 7
2 48 27.3 1
2 72 30.9 1
2 72 0.0 7
2 96 20.8 1
3 0 1.0 7
3 1 7.0 1
3 1 0.0 7
3 4 15.0 1
3 4 0.0 7
3 8 27.2 1
3 8 0.0 7
3 12 0.0 7
3 24 47.0 1
3 24 0.0 7
3 48 65.4 1
3 48 0.0 7
3 72 68.7 1
3 72 0.0 7
3 96 82.8 1
3 96 0.0 7
3 120 70.5 1
What I want to do is to "pair together" the different types of measurements, so I have one column with the measurements that is one type (DVID=1) and another column with the measurements that is another type (DVID=7).
I also need to delete the measurements where I don't have both type of measurements (or, alternatively, put in NA in these fields)
An example of this looks like:
ID TIME DV_1 DV_7
1 1 27.5 0
1 4 19.6 0
1 8 17.9 0
1 12 17.7 0
1 24 19.6 0
1 48 32.9 0
The purpose is that I want to be able to plot the DVID = 1 values against the DVID = 7 values.
Can anyone here help me with doing this?
I now that i probably have to use functions in the split and apply family, but I have no idea about where to start.
Thanks in advance!

Here is one approach.
library(dplyr)
library(tidyr)
#Create one column for group1 and another for group7 in DVID
ana <- spread(foo, DVID, DV)
colnames(ana) <- c("ID", "TIME", "DV1", "DV7")
# Remove rows which have NA
filter(ana, !DV1 %in% NA & !DV7 %in% NA)
# ID TIME DV1 DV7
#1 1 1 27.5 0
#2 1 4 19.6 0
#3 1 8 17.9 0
#4 1 12 17.7 0
#5 1 24 19.6 0
#6 1 48 32.9 0
#7 2 72 30.9 0
#8 3 1 7.0 0
#9 3 4 15.0 0
#10 3 8 27.2 0
#11 3 24 47.0 0
#12 3 48 65.4 0
#13 3 72 68.7 0
#14 3 96 82.8 0
Another way could be this given you convert your data frame to data.table
setDT(foo)
bob <- dcast.data.table(foo, ID + TIME ~ DVID, value.var = "DV")
setnames(bob, c("1","7"), c("DV1", "DV7"))[!DV1 %in% NA & !DV7 %in% NA, ]
Update
Given #Arun's advice, the 3rd line can be like this using data.table 1.9.5
na.omit(bob, by=c("1", "7"))

You appear to be wanting to reshape your data. Use cast from the reshape package.
library(reshape)
# read data
dfX = read.table(textConnection("ID TIME DV DVID
1 0 0.0 7
1 1 27.5 1
1 1 0.0 7
1 4 19.6 1
1 4 0.0 7
1 8 17.9 1
1 8 0.0 7
1 12 17.7 1
1 12 0.0 7
1 24 19.6 1
1 24 0.0 7
1 48 32.9 1
1 48 0.0 7
2 0 0.0 7
2 1 0.0 7
2 4 0.0 7
2 8 0.0 7
2 12 0.0 7
2 24 0.0 7
2 48 27.3 1
2 72 30.9 1
2 72 0.0 7
2 96 20.8 1
3 0 1.0 7
3 1 7.0 1
3 1 0.0 7
3 4 15.0 1
3 4 0.0 7
3 8 27.2 1
3 8 0.0 7
3 12 0.0 7
3 24 47.0 1
3 24 0.0 7
3 48 65.4 1
3 48 0.0 7
3 72 68.7 1
3 72 0.0 7
3 96 82.8 1
3 96 0.0 7
3 120 70.5 1"), header = TRUE)
# reshape the data
reshape::cast(dfX, ID + TIME ~ DVID, value = "DV")
Here is the output:
> reshape::cast(dfX, ID + TIME ~ DVID, value = "DV")
ID TIME 1 7
1 1 0 NA 0
2 1 1 27.5 0
3 1 4 19.6 0
4 1 8 17.9 0
5 1 12 17.7 0
6 1 24 19.6 0
7 1 48 32.9 0
8 2 0 NA 0
9 2 1 NA 0
10 2 4 NA 0
11 2 8 NA 0
12 2 12 NA 0
13 2 24 NA 0
14 2 48 27.3 NA
15 2 72 30.9 0
16 2 96 20.8 NA
17 3 0 NA 1
18 3 1 7.0 0
19 3 4 15.0 0
20 3 8 27.2 0
21 3 12 NA 0
22 3 24 47.0 0
23 3 48 65.4 0
24 3 72 68.7 0
25 3 96 82.8 0
26 3 120 70.5 NA

In addition, you could use reshape from base R
na.omit(reshape(df, idvar = c("ID","TIME"),
timevar="DVID", direction = "wide"))[,c(1:2,4:3)]
# ID TIME DV.1 DV.7
#2 1 1 27.5 0
#4 1 4 19.6 0
#6 1 8 17.9 0
#8 1 12 17.7 0
#10 1 24 19.6 0
#12 1 48 32.9 0
#21 2 72 30.9 0
#25 3 1 7.0 0
#27 3 4 15.0 0
#29 3 8 27.2 0
#32 3 24 47.0 0
#34 3 48 65.4 0
#36 3 72 68.7 0
#38 3 96 82.8 0
data
df <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), TIME = c(0L,
1L, 1L, 4L, 4L, 8L, 8L, 12L, 12L, 24L, 24L, 48L, 48L, 0L, 1L,
4L, 8L, 12L, 24L, 48L, 72L, 72L, 96L, 0L, 1L, 1L, 4L, 4L, 8L,
8L, 12L, 24L, 24L, 48L, 48L, 72L, 72L, 96L, 96L, 120L), DV = c(0,
27.5, 0, 19.6, 0, 17.9, 0, 17.7, 0, 19.6, 0, 32.9, 0, 0, 0, 0,
0, 0, 0, 27.3, 30.9, 0, 20.8, 1, 7, 0, 15, 0, 27.2, 0, 0, 47,
0, 65.4, 0, 68.7, 0, 82.8, 0, 70.5), DVID = c(7L, 1L, 7L, 1L,
7L, 1L, 7L, 1L, 7L, 1L, 7L, 1L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 1L,
1L, 7L, 1L, 7L, 1L, 7L, 1L, 7L, 1L, 7L, 7L, 1L, 7L, 1L, 7L, 1L,
7L, 1L, 7L, 1L)), .Names = c("ID", "TIME", "DV", "DVID"), class = "data.frame", row.names = c(NA,
-40L))

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Rolling condition with column criteria - r

Related

R Only Keep Rows up to a certin condition

Create a new variable for all observations within a group that equals another variable value conditional on a thirs variable value

R - Insert Missing Numbers in A Sequence by Group's Max Value

How to keep sequence of numbers until certain row number reached

Dataframe, split by column values and put into new columns

Categories

Resources