How can we use weight of evidence to bin continuous data in R? For example, I have this data:
Recency
364
91
692
13
126
4
40
93
13
33
262
12
136
21
88
16
4
19
24
89
36
5
274
125
740
6
13
715
591
443
104
853
260
125
62
357
559
155
163
16
433
91
1380
96
374
130
574
101
5
11
34
401
13
215
168
So, what command should I use to bin this variable into groups based on weight of evidence (in other words, coarse classing)?
The output I want is:
Group I: Recency < 200
Group II: Recency 200-400
Group III: Recency > 400
Thanks
df$gr <- cut(df$Recency, breaks = c(0, 200, 400, Inf))
df
Recency gr
1 364 (200,400]
2 91 (0,200]
3 692 (400,Inf]
4 13 (0,200]
5 126 (0,200]
6 4 (0,200]
7 40 (0,200]
8 93 (0,200]
9 13 (0,200]
10 33 (0,200]
11 262 (200,400]
12 12 (0,200]
13 136 (0,200]
14 21 (0,200]
15 88 (0,200]
16 16 (0,200]
17 4 (0,200]
18 19 (0,200]
19 24 (0,200]
20 89 (0,200]
21 36 (0,200]
22 5 (0,200]
23 274 (200,400]
24 125 (0,200]
25 740 (400,Inf]
26 6 (0,200]
27 13 (0,200]
28 715 (400,Inf]
29 591 (400,Inf]
30 443 (400,Inf]
31 104 (0,200]
32 853 (400,Inf]
33 260 (200,400]
34 125 (0,200]
35 62 (0,200]
36 357 (200,400]
37 559 (400,Inf]
38 155 (0,200]
39 163 (0,200]
40 16 (0,200]
41 433 (400,Inf]
42 91 (0,200]
43 1380 (400,Inf]
44 96 (0,200]
45 374 (200,400]
46 130 (0,200]
47 574 (400,Inf]
48 101 (0,200]
49 5 (0,200]
50 11 (0,200]
51 34 (0,200]
52 401 (400,Inf]
53 13 (0,200]
54 215 (200,400]
55 168 (0,200]
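The cut() call above uses hand-picked breaks; it does not compute weight of evidence. WoE needs a binary outcome per row, which the question does not supply. As a minimal sketch, assuming a made-up 0/1 column `bad` and a shortened Recency vector (both hypothetical), and using the common convention WoE = ln(share of goods / share of bads) per bin, the calculation could look like this:

```r
# Hypothetical data: a few Recency values and an invented binary outcome `bad`
df <- data.frame(
  Recency = c(364, 91, 692, 13, 126, 4, 40, 93, 262, 740, 443, 853, 274, 357),
  bad     = c(0,   1,  0,   1,  1,   1, 0,  1,  1,   0,   1,   0,   0,   0)
)

# Same hand-picked breaks as in the answer
df$gr <- cut(df$Recency, breaks = c(0, 200, 400, Inf))

# Counts of goods (bad == 0) and bads (bad == 1) per bin
tab <- table(df$gr, df$bad)

# Share of all goods / all bads that falls in each bin
dist_good <- tab[, "0"] / sum(tab[, "0"])
dist_bad  <- tab[, "1"] / sum(tab[, "1"])

# Weight of evidence per bin (one common sign convention)
woe <- log(dist_good / dist_bad)
woe
```

Bins with similar WoE are candidates for merging in coarse classing. A bin with zero goods or zero bads gives an infinite WoE, so such bins usually need to be merged or smoothed first.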
Hello, I have this df:
res1 res4 aa1234
1 1 4 IVGG
2 10 13 RQFP
3 102 105 TSSV
4 112 115 LQNA
5 118 121 EAGT
6 12 15 FPFL
7 132 135 RSGG
8 138 141 SRFP
9 150 153 PEDQ
10 151 154 EDQC
11 155 158 RPNN
12 165 168 TRRG
13 171 174 CNGD
14 172 175 NGDG
15 174 177 DGGT
16 181 184 CEGL
17 195 198 PCGR
18 20 23 NQGR
19 205 208 RVAL
20 32 35 HARF
21 39 42 AASC
22 40 43 ASCF
23 48 51 PGVS
24 57 60 AYDL
25 59 62 DLRR
26 64 67 ERQS
27 65 68 RQSR
28 78 81 ENGY
29 8 11 RPRQ
30 82 85 DPQQ
31 83 86 PQQN
32 86 89 NLND
33 95 98 LDRE
I want to subset it, keeping only the rows whose res1 value is within 4 of another row's res1 value (that is, res1 values i and j with i < j <= i+4), like this:
res1 res4 aa1234
29 8 11 RPRQ
6 12 15 FPFL
21 39 42 AASC
22 40 43 ASCF
24 57 60 AYDL
25 59 62 DLRR
26 64 67 ERQS
27 65 68 RQSR
28 78 81 ENGY
30 82 85 DPQQ
31 83 86 PQQN
32 86 89 NLND
9 150 153 PEDQ
10 151 154 EDQC
11 155 158 RPNN
13 171 174 CNGD
14 172 175 NGDG
15 174 177 DGGT
I tried something with the functions "filter" and "subset", but I didn't get the expected result.
So in general, I need the overlap between two rows within a range (i to i+4), including i+4.
For example, in these 3 lines there is an overlap between rows [9] and [10] (150-153 overlaps with 151-154), and row [11] also corresponds to res1[10] + 4 (151+4 = 155). So maybe the idea should be to take res1[i] and check whether res1[i+1] <= res1[i] + 4.
9 150 153 PEDQ
10 151 154 EDQC
11 155 158 RPNN
Why not simply do this?
df[df$res1 %in% c(df$res1 -4,df$res1 -3, df$res1-2, df$res1 -1, df$res1+1,df$res1 +2, df$res1 +3, df$res1 +4),]
res1 res4 aa1234
2 10 13 RQFP
6 12 15 FPFL
9 150 153 PEDQ
10 151 154 EDQC
11 155 158 RPNN
13 171 174 CNGD
14 172 175 NGDG
15 174 177 DGGT
21 39 42 AASC
22 40 43 ASCF
24 57 60 AYDL
25 59 62 DLRR
26 64 67 ERQS
27 65 68 RQSR
28 78 81 ENGY
29 8 11 RPRQ
30 82 85 DPQQ
31 83 86 PQQN
32 86 89 NLND
Edited scenario: just order the df first, and the rest is the same. See:
df <- df[order(df$res1),]
df[sort(unique(c(which(rev(diff(rev(df$res1))) >= -4 & rev(diff(rev(df$res1))) <= 0), which(diff(df$res1) <= 4 & diff(df$res1) >= 0)+1))),]
res1 res4 aa1234
29 8 11 RPRQ
2 10 13 RQFP
6 12 15 FPFL
21 39 42 AASC
22 40 43 ASCF
24 57 60 AYDL
25 59 62 DLRR
26 64 67 ERQS
27 65 68 RQSR
28 78 81 ENGY
30 82 85 DPQQ
31 83 86 PQQN
32 86 89 NLND
9 150 153 PEDQ
10 151 154 EDQC
11 155 158 RPNN
13 171 174 CNGD
14 172 175 NGDG
15 174 177 DGGT
Old answer: use this
df[sort(unique(c(which(rev(diff(rev(df$res1))) >= -4 & rev(diff(rev(df$res1))) <= 0), which(diff(df$res1) <= 4 & diff(df$res1) >= 0)+1))),]
res1 res4 aa1234
9 150 153 PEDQ
10 151 154 EDQC
11 155 158 RPNN
13 171 174 CNGD
14 172 175 NGDG
15 174 177 DGGT
21 39 42 AASC
22 40 43 ASCF
24 57 60 AYDL
25 59 62 DLRR
26 64 67 ERQS
27 65 68 RQSR
30 82 85 DPQQ
31 83 86 PQQN
32 86 89 NLND
Data used
df <- read.table(text = "res1 res4 aa1234
1 1 4 IVGG
2 10 13 RQFP
3 102 105 TSSV
4 112 115 LQNA
5 118 121 EAGT
6 12 15 FPFL
7 132 135 RSGG
8 138 141 SRFP
9 150 153 PEDQ
10 151 154 EDQC
11 155 158 RPNN
12 165 168 TRRG
13 171 174 CNGD
14 172 175 NGDG
15 174 177 DGGT
16 181 184 CEGL
17 195 198 PCGR
18 20 23 NQGR
19 205 208 RVAL
20 32 35 HARF
21 39 42 AASC
22 40 43 ASCF
23 48 51 PGVS
24 57 60 AYDL
25 59 62 DLRR
26 64 67 ERQS
27 65 68 RQSR
28 78 81 ENGY
29 8 11 RPRQ
30 82 85 DPQQ
31 83 86 PQQN
32 86 89 NLND
33 95 98 LDRE", header = T)
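The "within 4 of a neighbour" filter discussed above can also be written as a small helper. This is only a sketch: `keep_near` and its `width` argument are hypothetical names, and it assumes that equal values count as neighbours. It sorts the values, checks the gap to the previous and next value, and maps the result back to the original row order.

```r
# Hypothetical helper: TRUE for every value that has another value within `width`
keep_near <- function(x, width = 4) {
  o <- order(x)                      # positions of x in ascending order
  d <- diff(x[o])                    # gaps between consecutive sorted values
  near_next <- c(d <= width, FALSE)  # a close neighbour follows
  near_prev <- c(FALSE, d <= width)  # a close neighbour precedes
  keep <- logical(length(x))
  keep[o] <- near_next | near_prev   # back to the original order
  keep
}

# res1 column from the question, in its original (unsorted) order
res1 <- c(1, 10, 102, 112, 118, 12, 132, 138, 150, 151, 155, 165, 171, 172,
          174, 181, 195, 20, 205, 32, 39, 40, 48, 57, 59, 64, 65, 78, 8, 82,
          83, 86, 95)

sort(res1[keep_near(res1)])
```

With the data frame from the question, the same logical vector could be used as `df[keep_near(df$res1), ]`; no prior ordering of df is needed.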
Assume a data.frame as follows:
df <- data.frame(name = paste0("Person",rep(1:30)),
number = sample(1:100, 30, replace=TRUE),
focus = sample(1:500, 30, replace=TRUE))
I want to split the above data.frame into 9 groups, each with 9 observations. Each person can be assigned to multiple groups (sampling with replacement), so that all 9 groups have all 9 observations (since 9 groups x 9 observations require 81 rows while the df has only 30).
The output will ideally be a list of data.frames, one per group.
Are there any efficient ways of doing this? This is just a sample data.frame. The actual df has ~10k rows and will require 1000 groups, each with 30 rows.
Many thanks.
Is this what you are looking for?
res <- replicate(1000, df[sample.int(nrow(df), 30, replace = TRUE), ], simplify = FALSE)
The df I used:
df <- data.frame(name = paste0("Person",rep(1:1e4)),
number = sample(1:100, 1e4, replace=TRUE),
focus = sample(1:500, 1e4, replace=TRUE))
Output
> res[1:3]
[[1]]
name number focus
529 Person529 5 351
9327 Person9327 4 320
1289 Person1289 78 164
8157 Person8157 46 183
6939 Person6939 38 61
4066 Person4066 26 103
132 Person132 34 39
6576 Person6576 36 397
5376 Person5376 47 456
6123 Person6123 10 18
5318 Person5318 39 42
6355 Person6355 62 212
340 Person340 90 256
7050 Person7050 19 198
1500 Person1500 42 208
175 Person175 34 30
3751 Person3751 99 441
3813 Person3813 93 492
7428 Person7428 72 142
6840 Person6840 58 45
6501 Person6501 95 499
5124 Person5124 16 159
3373 Person3373 38 36
5622 Person5622 40 203
8761 Person8761 9 225
6252 Person6252 75 444
4502 Person4502 58 337
5344 Person5344 24 233
4036 Person4036 59 265
8764 Person8764 45 1
[[2]]
name number focus
8568 Person8568 87 360
3968 Person3968 67 468
4481 Person4481 46 140
8055 Person8055 73 286
7794 Person7794 92 336
1110 Person1110 6 434
6736 Person6736 4 58
9758 Person9758 60 49
9356 Person9356 89 300
9719 Person9719 100 366
4183 Person4183 5 124
1394 Person1394 87 346
2642 Person2642 81 449
3592 Person3592 65 358
579 Person579 21 395
9551 Person9551 39 495
4946 Person4946 73 32
4081 Person4081 98 270
4062 Person4062 27 150
7698 Person7698 52 436
5388 Person5388 89 177
9598 Person9598 91 474
8624 Person8624 3 464
392 Person392 82 483
5710 Person5710 43 293
4942 Person4942 99 350
3333 Person3333 89 91
6789 Person6789 99 259
7115 Person7115 100 320
1431 Person1431 77 263
[[3]]
name number focus
201 Person201 100 272
4674 Person4674 27 410
9728 Person9728 18 275
9422 Person9422 2 396
9783 Person9783 45 37
5552 Person5552 76 109
3871 Person3871 49 277
3411 Person3411 64 24
5799 Person5799 29 131
626 Person626 31 122
3103 Person3103 2 76
8043 Person8043 90 384
3157 Person3157 90 392
7093 Person7093 11 169
2779 Person2779 83 2
2601 Person2601 77 122
9003 Person9003 50 163
9653 Person9653 4 235
9361 Person9361 100 391
4273 Person4273 83 383
4725 Person4725 35 436
2157 Person2157 71 486
3995 Person3995 25 258
3735 Person3735 24 221
303 Person303 81 407
4838 Person4838 64 198
6926 Person6926 90 417
6267 Person6267 82 284
8570 Person8570 67 317
2670 Person2670 21 342
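The replicate() call above samples with replacement inside each group, so the same person can appear twice in one group. If the intent is that a person may belong to several groups but appears at most once per group (an interpretation of the question, not something it states outright), a sketch for the small 9 x 9 case could sample without replacement within each group. The seed and the variable names `n_groups` and `group_size` are assumptions for illustration.

```r
set.seed(42)  # assumption: fixed seed so the draws are reproducible

df <- data.frame(name   = paste0("Person", 1:30),
                 number = sample(1:100, 30, replace = TRUE),
                 focus  = sample(1:500, 30, replace = TRUE))

n_groups   <- 9
group_size <- 9

# Each group is drawn independently, so people can recur across groups,
# but sample.int() without replacement keeps each group free of repeats
groups <- replicate(
  n_groups,
  df[sample.int(nrow(df), group_size), ],
  simplify = FALSE
)

length(groups)
```

For the full-size case, `n_groups` would be 1000 and `group_size` 30.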
I have this dataframe
time power hr fr VE VO2 VCO2 id
1 1462.0104166666667 25 90 24 20 632 549 LM01-PRD-S1
2 1462.0194444444444 25 92 23 21 679 597 LM01-PRD-S1
3 1462.0305555555556 25 93 22 21 675 607 LM01-PRD-S1
4 1462.0416666666667 25 93 20 19 680 577 LM01-PRD-S1
5 1462.0520833333333 40 96 20 22 745 660 LM01-PRD-S1
6 1462.0618055555556 40 98 21 22 764 675 LM01-PRD-S1
7 1462.0722222222223 40 100 21 22 789 703 LM01-PRD-S1
8 1462.0826388888888 40 100 20 23 805 734 LM01-PRD-S1
9 1462.09375 55 105 22 26 911 843 LM01-PRD-S1
10 1462.1041666666667 55 105 20 25 881 831 LM01-PRD-S1
11 1462.1131944444444 55 109 19 25 895 847 LM01-PRD-S1
12 1462.1229166666667 55 112 21 25 908 868 LM01-PRD-S1
13 1462.1347222222223 70 120 21 28 981 947 LM01-PRD-S1
14 1462.1451388888888 70 120 21 29 1044 1021 LM01-PRD-S1
15 1462.1548611111111 70 122 22 27 1066 1031 LM01-PRD-S1
16 1462.1652777777779 70 127 19 30 1136 1122 LM01-PRD-S1
17 1462.1770833333333 85 130 20 32 1181 1218 LM01-PRD-S1
18 1462.1868055555556 85 141 21 32 1194 1216 LM01-PRD-S1
19 1462.1958333333334 85 139 22 34 1231 1295 LM01-PRD-S1
20 1462.2069444444444 85 139 19 32 1193 1268 LM01-PRD-S1
21 1462.2166666666667 100 139 21 31 1192 1274 LM01-PRD-S1
22 1462.2291666666667 100 146 21 38 1363 1460 LM01-PRD-S1
23 1462.2395833333333 100 150 28 50 1551 1801 LM01-PRD-S1
24 1462.2479166666667 100 148 30 51 1499 1810 LM01-PRD-S1
25 1462.2597222222223 115 150 30 55 1564 1883 LM01-PRD-S1
26 1462.2708333333333 115 153 31 56 1544 1892 LM01-PRD-S1
27 1462.2805555555556 115 157 33 59 1545 2012 LM01-PRD-S1
28 1462.2881944444443 115 157 34 62 1647 2091 LM01-PRD-S1
29 NA NA NA RÈcupÈ ration NA NA LM01-PRD-S1
30 1462.0027777777777 65 157 39 61 1466 1940 LM01-PRD-S1
31 1462.0131944444445 20 153 32 58 1518 1939 LM01-PRD-S1
32 1462.0236111111112 20 148 28 50 1422 1748 LM01-PRD-S1
33 1462.0333333333333 20 144 26 46 1222 1555 LM01-PRD-S1
34 1462.0430555555556 20 141 22 37 963 1209 LM01-PRD-S1
35 1462.0541666666666 20 133 22 42 1165 1464 LM01-PRD-S1
36 1462.0645833333333 20 133 24 47 1021 1384 LM01-PRD-S1
37 1462.0743055555556 20 130 22 40 914 1228 LM01-PRD-S1
38 1462.0854166666666 20 130 23 38 847 1128 LM01-PRD-S1
39 1462.0944444444444 20 120 18 32 755 998 LM01-PRD-S1
40 1462.1069444444445 0 117 17 29 674 904 LM01-PRD-S1
41 1462.1173611111112 0 115 20 27 587 805 LM01-PRD-S1
42 1462.1277777777777 0 113 20 28 536 803 LM01-PRD-S1
43 1462.1368055555556 0 112 18 26 489 744 LM01-PRD-S1
44 1462.1479166666666 0 110 18 25 457 703 LM01-PRD-S1
45 1462.1590277777777 0 103 19 23 419 633 LM01-PRD-S1
46 1462.16875 0 103 17 24 479 672 LM01-PRD-S1
47 1462.1791666666666 0 103 19 21 423 560 LM01-PRD-S1
48 1462.1902777777777 0 100 19 22 459 609 LM01-PRD-S1
49 1462.1993055555556 0 101 18 22 440 599 LM01-PRD-S1
50 1462.004861111111 0 98 18 22 410 572 LM01-PRD-S1
51 1.0416666666666666E-2 35 102 16 18 659 576 LB02-PRD-S1
52 1.9444444444444445E-2 35 101 17 19 729 613 LB02-PRD-S1
53 3.0555555555555555E-2 35 105 15 28 977 851 LB02-PRD-S1
54 4.0972222222222222E-2 35 96 16 28 886 852 LB02-PRD-S1
55 4.9999999999999996E-2 50 90 16 16 593 504 LB02-PRD-S1
56 6.1111111111111116E-2 50 106 18 17 737 552 LB02-PRD-S1
57 7.2222222222222229E-2 50 108 19 23 1053 775 LB02-PRD-S1
58 8.2638888888888887E-2 50 117 17 30 1236 1008 LB02-PRD-S1
59 9.2361111111111116E-2 65 113 18 29 1181 983 LB02-PRD-S1
60 0.10347222222222223 65 114 15 31 1167 1016 LB02-PRD-S1
61 0.11388888888888889 65 118 16 31 1167 1052 LB02-PRD-S1
62 0.12430555555555556 65 114 17 28 1104 967 LB02-PRD-S1
63 0.13402777777777777 80 120 17 35 1318 1172 LB02-PRD-S1
64 0.1451388888888889 80 117 16 32 1236 1153 LB02-PRD-S1
65 0.15486111111111112 80 122 17 31 1168 1094 LB02-PRD-S1
66 0.16458333333333333 80 122 17 34 1312 1205 LB02-PRD-S1
67 0.1763888888888889 95 126 18 37 1311 1274 LB02-PRD-S1
68 0.18611111111111112 95 129 18 35 1248 1201 LB02-PRD-S1
69 0.19722222222222222 95 131 15 33 1275 1196 LB02-PRD-S1
70 0.20625000000000002 95 134 18 39 1444 1381 LB02-PRD-S1
71 0.21736111111111112 110 134 19 43 1539 1472 LB02-PRD-S1
72 0.22847222222222222 110 136 19 41 1417 1406 LB02-PRD-S1
73 0.2388888888888889 110 137 20 43 1496 1437 LB02-PRD-S1
74 0.25 110 139 20 44 1561 1539 LB02-PRD-S1
75 0.25972222222222224 125 142 21 46 1561 1560 LB02-PRD-S1
76 0.26944444444444443 125 146 21 46 1535 1552 LB02-PRD-S1
77 0.28055555555555556 125 148 23 51 1698 1703 LB02-PRD-S1
78 0.29166666666666669 125 150 23 53 1725 1776 LB02-PRD-S1
79 0.30069444444444443 140 151 22 52 1726 1760 LB02-PRD-S1
80 0.31180555555555556 140 151 23 53 1713 1763 LB02-PRD-S1
81 0.32222222222222224 140 153 25 55 1807 1836 LB02-PRD-S1
82 0.33263888888888887 140 155 26 58 1897 1941 LB02-PRD-S1
83 0.34375 155 153 26 59 1929 1963 LB02-PRD-S1
84 0.35347222222222219 155 157 26 57 1843 1908 LB02-PRD-S1
85 0.36388888888888887 155 160 28 65 1942 2065 LB02-PRD-S1
86 0.375 155 164 26 64 2011 2131 LB02-PRD-S1
87 0.38472222222222219 170 166 26 65 2048 2178 LB02-PRD-S1
88 0.39583333333333331 170 166 26 64 2069 2171 LB02-PRD-S1
89 0.40625 170 169 25 64 2165 2269 LB02-PRD-S1
90 0.41666666666666669 170 169 28 76 2328 2539 LB02-PRD-S1
91 0.42638888888888887 185 169 30 76 2189 2449 LB02-PRD-S1
92 0.4368055555555555 185 171 29 73 2225 2411 LB02-PRD-S1
93 0.44722222222222219 185 171 29 68 2170 2292 LB02-PRD-S1
94 0.45763888888888887 185 171 31 82 2458 2712 LB02-PRD-S1
95 0.4680555555555555 200 171 33 89 2443 2780 LB02-PRD-S1
96 0.47847222222222219 200 173 33 87 2465 2784 LB02-PRD-S1
97 0.48888888888888887 200 176 32 88 2536 2853 LB02-PRD-S1
98 0.5 200 176 34 93 2571 2899 LB02-PRD-S1
99 0.51041666666666663 215 176 36 98 2529 2924 LB02-PRD-S1
100 0.52083333333333337 215 179 36 105 2602 3087 LB02-PRD-S1
101 0.53125 215 179 39 111 2795 3282 LB02-PRD-S1
102 0.54097222222222219 215 181 40 118 2679 3240 LB02-PRD-S1
103 0.55208333333333337 230 179 40 113 2649 3160 LB02-PRD-S1
104 0.56180555555555556 230 179 41 111 2601 3055 LB02-PRD-S1
105 0.57291666666666663 230 176 42 116 2639 3129 LB02-PRD-S1
106 0.58263888888888882 230 181 43 126 2683 3277 LB02-PRD-S1
107 0.59375 245 181 47 123 2597 3160 LB02-PRD-S1
108 0.60416666666666663 245 181 48 128 2482 3122 LB02-PRD-S1
109 NA NA NA RÈcupÈ ration NA NA LB02-PRD-S1
110 9.7222222222222224E-3 20 179 42 108 2320 2830 LB02-PRD-S1
111 2.013888888888889E-2 20 173 40 106 2134 2594 LB02-PRD-S1
112 3.125E-2 20 171 37 103 1869 2531 LB02-PRD-S1
113 4.0972222222222222E-2 20 166 38 97 1438 2207 LB02-PRD-S1
114 5.1388888888888894E-2 20 164 36 88 1192 1918 LB02-PRD-S1
115 6.1805555555555558E-2 20 155 37 81 1121 1746 LB02-PRD-S1
116 7.0833333333333331E-2 20 142 32 71 1072 1585 LB02-PRD-S1
117 8.1944444444444445E-2 20 151 26 56 961 1345 LB02-PRD-S1
118 9.2361111111111116E-2 20 148 28 58 996 1367 LB02-PRD-S1
119 0.10277777777777779 20 144 24 49 858 1189 LB02-PRD-S1
120 0.11319444444444444 20 141 25 49 722 1053 LB02-PRD-S1
121 0.125 0 136 25 42 611 895 LB02-PRD-S1
122 0.13472222222222222 0 131 26 42 642 893 LB02-PRD-S1
123 0.1451388888888889 0 129 28 44 612 874 LB02-PRD-S1
124 0.15555555555555556 0 126 24 36 544 728 LB02-PRD-S1
125 0.16527777777777777 0 127 26 40 658 840 LB02-PRD-S1
126 0.1763888888888889 0 130 23 31 511 665 LB02-PRD-S1
127 0.18611111111111112 0 126 24 39 646 815 LB02-PRD-S1
128 0.19652777777777777 0 120 25 38 527 716 LB02-PRD-S1
129 0.20694444444444446 0 120 24 36 509 684 LB02-PRD-S1
130 1462.0104166666667 25 101 20 18 712 584 GC03-PRD-S1
131 1462.0208333333333 25 99 20 17 673 551 GC03-PRD-S1
132 1462.03125 25 97 20 17 686 559 GC03-PRD-S1
133 1462.0402777777779 25 96 20 16 639 524 GC03-PRD-S1
134 1462.0506944444444 40 99 19 16 647 518 GC03-PRD-S1
135 1462.0604166666667 40 105 19 16 669 543 GC03-PRD-S1
136 1462.0729166666667 40 107 21 18 723 598 GC03-PRD-S1
137 1462.0826388888888 40 107 25 19 746 605 GC03-PRD-S1
138 1462.0916666666667 55 109 23 20 775 645 GC03-PRD-S1
139 1462.1020833333334 55 111 20 20 780 671 GC03-PRD-S1
140 1462.1118055555555 55 116 21 21 811 710 GC03-PRD-S1
141 1462.1243055555556 55 113 17 22 858 765 GC03-PRD-S1
142 1462.1340277777779 70 117 21 23 900 789 GC03-PRD-S1
143 1462.1458333333333 70 117 20 23 953 843 GC03-PRD-S1
144 1462.15625 70 120 20 25 980 882 GC03-PRD-S1
145 1462.1652777777779 70 122 22 26 1000 916 GC03-PRD-S1
146 1462.1763888888888 85 122 23 27 1049 961 GC03-PRD-S1
147 1462.1868055555556 85 126 23 28 1072 992 GC03-PRD-S1
148 1462.1965277777779 85 131 22 29 1110 1056 GC03-PRD-S1
149 1462.2076388888888 85 130 22 30 1066 1047 GC03-PRD-S1
150 1462.2173611111111 100 129 21 28 1166 1057 GC03-PRD-S1
151 1462.2284722222223 100 137 27 34 1346 1247 GC03-PRD-S1
152 1462.2395833333333 100 137 22 34 1272 1261 GC03-PRD-S1
153 1462.25 100 136 20 33 1222 1235 GC03-PRD-S1
154 1462.2590277777779 115 139 23 36 1321 1321 GC03-PRD-S1
155 1462.2701388888888 115 142 23 37 1340 1377 GC03-PRD-S1
156 1462.2798611111111 115 144 24 38 1362 1418 GC03-PRD-S1
157 1462.2909722222223 115 150 27 44 1470 1579 GC03-PRD-S1
158 1462.3013888888888 130 151 27 45 1466 1618 GC03-PRD-S1
159 1462.3125 130 153 31 54 1686 1875 GC03-PRD-S1
160 1462.3222222222223 130 155 33 59 1679 1998 GC03-PRD-S1
161 1462.3326388888888 130 157 33 59 1676 2021 GC03-PRD-S1
162 1462.3423611111111 145 157 33 61 1700 2041 GC03-PRD-S1
163 1462.3534722222223 145 160 35 64 1764 2120 GC03-PRD-S1
164 1462.3638888888888 145 160 36 67 1765 2182 GC03-PRD-S1
165 1462.3743055555556 145 162 40 71 1762 2208 GC03-PRD-S1
166 1462.0006944444444 145 162 39 69 1754 2208 GC03-PRD-S1
167 NA NA NA RÈcupÈ ration NA NA GC03-PRD-S1
168 1462.0097222222223 20 155 38 68 1687 2124 GC03-PRD-S1
169 1462.0194444444444 20 148 39 67 1576 1996 GC03-PRD-S1
170 1462.0298611111111 20 142 35 62 1390 1842 GC03-PRD-S1
171 1462.0409722222223 20 136 35 58 1189 1632 GC03-PRD-S1
172 1462.05 20 127 26 46 991 1337 GC03-PRD-S1
173 1462.0604166666667 20 117 21 26 776 896 GC03-PRD-S1
174 1462.0715277777779 20 115 22 31 855 1012 GC03-PRD-S1
175 1462.0819444444444 20 111 23 30 783 950 GC03-PRD-S1
176 1462.0930555555556 20 109 23 30 756 939 GC03-PRD-S1
177 1462.1020833333334 20 100 23 28 702 870 GC03-PRD-S1
178 1462.1131944444444 20 104 23 29 685 853 GC03-PRD-S1
179 1462.1236111111111 20 90 19 20 471 594 GC03-PRD-S1
180 1462.1340277777779 0 96 20 20 494 607 GC03-PRD-S1
181 1462.1444444444444 0 94 20 19 439 559 GC03-PRD-S1
182 1462.1548611111111 0 93 20 19 425 561 GC03-PRD-S1
183 1462.1638888888888 0 90 19 17 357 480 GC03-PRD-S1
184 1462.175 0 91 18 16 345 443 GC03-PRD-S1
185 1462.1854166666667 0 96 21 18 370 480 GC03-PRD-S1
186 1462.1958333333334 0 92 20 16 324 420 GC03-PRD-S1
187 1462.2076388888888 0 92 20 16 324 414 GC03-PRD-S1
188 1462.0083333333334 0 93 20 15 309 391 GC03-PRD-S1
189 1462.0104166666667 60 127 27 40 1267 1274 GT04-PRD-S1
190 1462.0201388888888 60 131 29 40 1264 1274 GT04-PRD-S1
191 1462.0305555555556 60 133 30 40 1281 1298 GT04-PRD-S1
192 1462.0402777777779 60 134 29 42 1304 1360 GT04-PRD-S1
193 1462.0513888888888 80 134 28 40 1274 1324 GT04-PRD-S1
194 1462.0625 80 137 28 40 1337 1335 GT04-PRD-S1
195 1462.0729166666667 80 144 29 45 1485 1501 GT04-PRD-S1
196 1462.0833333333333 80 144 30 50 1573 1630 GT04-PRD-S1
197 1462.0930555555556 100 148 30 47 1380 1478 GT04-PRD-S1
198 1462.1034722222223 100 150 30 49 1520 1576 GT04-PRD-S1
199 1462.1145833333333 100 153 31 50 1553 1589 GT04-PRD-S1
200 1462.1243055555556 100 151 31 55 1735 1818 GT04-PRD-S1
201 1462.1340277777779 120 153 32 65 1905 2146 GT04-PRD-S1
202 1462.1444444444444 120 151 32 62 1748 2026 GT04-PRD-S1
203 1462.1555555555556 120 160 31 61 1799 2041 GT04-PRD-S1
204 1462.1652777777779 120 160 30 64 1810 2105 GT04-PRD-S1
205 1462.1756944444444 140 164 33 73 1895 2314 GT04-PRD-S1
206 1462.1861111111111 140 162 33 72 1966 2345 GT04-PRD-S1
207 1462.1972222222223 140 166 36 79 2021 2470 GT04-PRD-S1
208 1462.2083333333333 140 166 35 76 2022 2450 GT04-PRD-S1
209 1462.2180555555556 160 164 37 78 2115 2491 GT04-PRD-S1
210 1462.2284722222223 160 169 40 82 2147 2583 GT04-PRD-S1
211 1462.2388888888888 160 169 38 83 2190 2647 GT04-PRD-S1
212 1462.2493055555556 160 173 38 85 2202 2713 GT04-PRD-S1
213 1462.2604166666667 180 171 38 88 2332 2837 GT04-PRD-S1
214 1462.2701388888888 180 171 41 95 2321 2937 GT04-PRD-S1
215 1462.28125 180 176 39 94 2358 2994 GT04-PRD-S1
216 1462.2909722222223 180 176 42 104 2339 3086 GT04-PRD-S1
217 1462.2979166666667 200 176 44 105 2444 3186 GT04-PRD-S1
218 NA NA NA RÈcupÈ ration NA NA GT04-PRD-S1
219 1462.0034722222222 125 179 42 97 2304 2957 GT04-PRD-S1
220 1462.0131944444445 30 171 38 92 2266 2900 GT04-PRD-S1
221 1462.0236111111112 30 166 36 93 2136 2851 GT04-PRD-S1
222 1462.0347222222222 30 166 35 91 1829 2619 GT04-PRD-S1
223 1462.0444444444445 30 162 34 83 1576 2306 GT04-PRD-S1
224 1462.0548611111112 30 160 31 65 1411 1904 GT04-PRD-S1
225 1462.0652777777777 30 155 36 78 1439 2013 GT04-PRD-S1
226 1462.0763888888889 30 153 34 69 1337 1832 GT04-PRD-S1
227 1462.0861111111112 30 153 34 66 1283 1716 GT04-PRD-S1
228 1462.0965277777777 30 144 28 49 1012 1303 GT04-PRD-S1
229 1462.1069444444445 30 134 25 41 897 1147 GT04-PRD-S1
230 1462.1180555555557 0 130 25 40 756 1051 GT04-PRD-S1
231 1462.1284722222222 0 126 20 28 500 741 GT04-PRD-S1
232 1462.1381944444445 0 123 23 27 533 712 GT04-PRD-S1
233 1462.1486111111112 0 123 23 29 548 737 GT04-PRD-S1
234 1462.1590277777777 0 117 24 24 415 560 GT04-PRD-S1
235 1462.16875 0 114 21 27 610 728 GT04-PRD-S1
236 1462.1798611111112 0 111 19 23 508 612 GT04-PRD-S1
237 1462.1902777777777 0 113 21 26 548 666 GT04-PRD-S1
238 1462.2006944444445 0 113 23 27 552 683 GT04-PRD-S1
239 1462.0020833333333 0 114 22 28 547 702 GT04-PRD-S1
I would like to remove all rows from the word "ration" in column VE onward, but separately within each id.
This means that I would like to remove lines 29 to 50, 109 to 129, 167 to 188, and 218 to 239.
The word "ration" appears several times (once per id), and please take into account that I have several more ids (I cannot include them in my question because it would be too long).
I tried to create at the end of each id but it did not work.
Thank you for your help!
With dplyr:
library(dplyr)
data %>%
  group_by(id) %>%
  filter(cumsum(VE == "ration") == 0)
Assuming every id has a row with "ration", you can use dplyr like this:
library(dplyr)
df %>% group_by(id) %>% slice(seq_len(which.max(VE == "ration") - 1))
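For comparison, the same per-id cut-off can be done in base R. This is a sketch on a hypothetical miniature of the data (two invented ids and a handful of VE values); the idea is the same as the cumsum() filter: count the marker rows seen so far within each id and keep only rows where that count is still zero.

```r
# Hypothetical miniature of the data: two ids, each with a "ration" marker row
df <- data.frame(
  id = c("A", "A", "A", "A", "B", "B", "B"),
  VE = c("20", "21", "ration", "61", "18", "ration", "108"),
  stringsAsFactors = FALSE
)

# Running count of marker rows within each id; 0 means "before the marker"
seen <- ave(as.integer(df$VE == "ration"), df$id, FUN = cumsum)

# Keep only the rows that come before the marker in their own id
result <- df[seen == 0, ]
result
```

Because the count never drops back to zero within an id, the marker row and everything after it are removed, and an id without any marker row is kept whole.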
I would like to merge and sum the values of each row that contains duplicated IDs.
For example, the data frame below contains a duplicated symbol 'LOC102723897'. I would like to merge these two rows and sum the value within each column, so that one row appears for the duplicated symbol.
> head(y$genes)
SM01 SM02 SM03 SM04 SM05 SM06 SM07 SM08 SM09 SM10 SM11 SM12 SM13 SM14 SM15 SM16 SM17 SM18 SM19 SM20 SM21 SM22
1 32 29 23 20 27 105 80 64 83 80 94 58 122 76 78 70 34 32 45 42 138 30
2 246 568 437 343 304 291 542 457 608 433 218 329 483 376 410 296 550 533 537 473 296 382
3 30 23 30 13 20 18 23 13 31 11 15 27 36 21 23 25 26 27 37 27 31 16
4 1450 2716 2670 2919 2444 1668 2923 2318 3867 2084 1121 2175 3022 2308 2541 1613 2196 1851 2843 2078 2180 1902
5 288 366 327 334 314 267 550 410 642 475 219 414 679 420 425 308 359 406 550 398 399 268
6 34 59 62 68 42 31 49 45 62 51 40 32 30 39 41 75 54 59 83 99 37 37
SM23 SM24 SM25 SM26 SM27 SM28 SM29 SM30 Symbol
1 41 23 57 160 84 67 87 113 LOC102723897
2 423 535 624 304 568 495 584 603 LINC01128
3 31 21 49 13 33 31 14 31 LINC00115
4 2453 3041 3590 2343 3450 3725 3336 3850 NOC2L
5 403 347 468 478 502 563 611 577 LOC102723897
6 45 51 56 107 79 105 92 131 PLEKHN1
> dim(y)
[1] 12928 30
I attempted using plyr to merge rows based on the 'Symbol' column, but it's not working.
> ddply(y$genes,"Symbol",numcolwise(sum))
> dim(y)
[1] 12928 30
> length(y$genes$Symbol)
[1] 12928
> length(unique(y$genes$Symbol))
[1] 12896
Group by Symbol and sum all columns:
library(dplyr)
df %>% group_by(Symbol) %>% summarise_all(sum)
Using data.table:
library(data.table)
setDT(df)[ , lapply(.SD, sum),by="Symbol"]
We can just use aggregate from base R
aggregate(.~ Symbol, df, FUN = sum)
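Two hedged notes on the above. First, the ddply() call in the question prints its result without assigning it, so `dim(y)` would stay unchanged even if the summing itself worked; that may be part of why it looked like it was "not working". Second, base R's rowsum() is another way to sum rows by a grouping vector. The sketch below uses only the SM01 and SM02 values of four rows from the question; the object names `counts` and `merged` are made up for illustration.

```r
# Four rows from the question (SM01 and SM02 only), with one duplicated Symbol
counts <- data.frame(
  SM01   = c(32, 246, 30, 288),
  SM02   = c(29, 568, 23, 366),
  Symbol = c("LOC102723897", "LINC01128", "LINC00115", "LOC102723897"),
  stringsAsFactors = FALSE
)

# Sum every numeric column within each Symbol; the result has one row per Symbol
num_cols <- setdiff(names(counts), "Symbol")
merged <- rowsum(as.matrix(counts[num_cols]), group = counts$Symbol)
merged
```

The result is a matrix whose row names are the unique symbols, which is often convenient for count data; wrap it in as.data.frame() if a data frame is needed.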
Hi, I have time series data as follows:
Code Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1001 2009 183 175 151 173 169 169 158 132 91 91 146 114
1001 2010 76 103 130 103 78 72 64 96 89 91 61 62
1001 2011 73 50 99 90 74 112 113 111 112 112 97 137
1001 2012 105 140 160 129 162 161 150 167 151 161 114 120
1001 2013 140 137 153 128 137 137 135 148 116 134 121 95
1001 2014 135 145 144 110 109 130 110 58 100 109 67 66
1015 2009 21 19 19 21 17 29 56 35 46 33 42 45
1015 2010 46 29 55 62 49 48 44 37 39 46 33 39
1015 2011 59 36 52 41 36 38 42 43 37 37 37 35
1015 2012 46 53 55 41 69 41 38 42 37 50 46 48
1015 2013 64 43 58 43 50 39 29 48 45 26 51 55
1015 2014 40 54 64 58 76 59 69 66 57 60 58 55
1031 2009 2408 2370 2799 3460 3263 3102 2769 2749 3018 3283 3343 3193
1031 2010 3130 3069 3776 3348 3341 4129 3920 4131 4152 4044 4241 3522
1031 2011 3454 3768 5217 4242 4624 5105 4712 6064 5546 6049 5957 4670
1031 2012 4959 3554 2163 1274 1452 1248 1303 1278 916 906 522 324
1031 2013 537 442 417 389 469 423 328 246 291 387 201 122
1031 2014 249 203 42 30 29 36 39 16 36 23 11 19
I am trying to decompose the time series separately for each value of the Code column, since different Codes have different trends and seasonality across the months.
I tried using data.table, but it gives me an error. This is the code that I am using:
sa_data_ssn_cnt_ts <- data.table(sa_data_ssn_cnt_ts)
sa_data_ssn_cnt_si <- sa_data_ssn_cnt_ts[,list(SI = decompose(sa_data_ssn_cnt_ts, type = "multiplicative", filter = NULL)), by = sa_data_ssn_cnt_ts$site_id]
Error that I get:
Error in decompose(sa_data_ssn_cnt_ts, type = "multiplicative", filter = NULL) : time series has no or less than 2 periods
What is it that I am messing up here?
Is there any other way that I can get the decompositions by Code column?
Thanks a lot for the help.
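One likely cause of the error, offered as a guess from the code shown: decompose() is being handed the whole table rather than a `ts` object with a known frequency, so it cannot find two full periods. A minimal base R sketch, using the rows for Codes 1001 and 1015 from the question and the hypothetical names `wide` and `decomp_by_code`, builds one monthly series per Code and decomposes each:

```r
# Rows for two Codes copied from the question (wide layout: one row per year)
wide <- read.table(text = "Code Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1001 2009 183 175 151 173 169 169 158 132 91 91 146 114
1001 2010 76 103 130 103 78 72 64 96 89 91 61 62
1001 2011 73 50 99 90 74 112 113 111 112 112 97 137
1001 2012 105 140 160 129 162 161 150 167 151 161 114 120
1001 2013 140 137 153 128 137 137 135 148 116 134 121 95
1001 2014 135 145 144 110 109 130 110 58 100 109 67 66
1015 2009 21 19 19 21 17 29 56 35 46 33 42 45
1015 2010 46 29 55 62 49 48 44 37 39 46 33 39
1015 2011 59 36 52 41 36 38 42 43 37 37 37 35
1015 2012 46 53 55 41 69 41 38 42 37 50 46 48
1015 2013 64 43 58 43 50 39 29 48 45 26 51 55
1015 2014 40 54 64 58 76 59 69 66 57 60 58 55", header = TRUE)

decomp_by_code <- lapply(split(wide, wide$Code), function(d) {
  d <- d[order(d$Year), ]
  # t() puts each year in a column, so as.vector() reads Jan..Dec year by year
  monthly <- as.vector(t(as.matrix(d[, month.abb])))
  series <- ts(monthly, start = c(min(d$Year), 1), frequency = 12)
  decompose(series, type = "multiplicative")
})

# Seasonal indices (one per month) for Code 1001
decomp_by_code[["1001"]]$figure
```

The result is a named list with one `decomposed.ts` object per Code, each holding `$trend`, `$seasonal`, `$random` and `$figure`. Multiplicative decomposition assumes strictly positive values, which holds for the rows used here.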