Filling in multiple columns of missing data from another dataset

I have a data set that contains some missing values, which can be completed by merging with another data set. My example:
This is the updated data set I am working with.
DF1
Name Paper Book Mug soap computer tablet coffee coupons
1 2 3 4 5 6 7 8 9
2 21 22 23 23 23 7 23 9
3 56 57 58 59 60 7 62 9
4 80.33333 81.33333 82.33333 83 83.66667 7 85 9
5 107.3333 108.3333 109.3333 110 110.6667 7 112 9
6 134.3333 135.3333 136.3333 137 137.6667 7 139 9
7 161.3333 162.3333 163.3333 164 164.6667
8 188.3333 189.3333 190.3333 191 191.6667 7 193 9
9 215.3333 216.3333 217.3333 218 218.6667 7 220 9
10 242.3333 243.3333 244.3333 245 245.6667 7 247 9
11 269.3333 270.3333 271.3333 272 272.6667 7 274 9
12 296.3333 297.3333 298.3333 299 299.6667
13 323.3333 324.3333 325.3333 326 326.6667 7 328 9
14 350.3333 351.3333 352.3333 353 353.6667 7 355 9
15 377.3333 378.3333 379.3333 380 380.6667
16 404.3333 405.3333 406.3333 407 407.6667 7 409 9
17 431.3333 432.3333 433.3333 434 434.6667 7 436 9
18 458.3333 459.3333 460.3333 461 461.6667 7 463 9
19 485.3333 486.3333 487.3333 488 488.6667
DF2
Name Paper Book Mug soap computer tablet coffee coupons
7 161.3333 162.3333 163.3333 164 164.6667 6 6 6
12 296.3333 297.3333 298.3333 299 299.6667 88 96 25
15 377.3333 378.3333 379.3333 380 380.6667 88 62 25
19 485.3333 486.3333 487.3333 488 488.6667 88 88 78
I want to get:
Name Paper Book Mug soap computer tablet coffee coupons
1 2 3 4 5 6 7 8 9
2 21 22 23 23 23 7 23 9
3 56 57 58 59 60 7 62 9
4 80.33333 81.33333 82.33333 83 83.66667 7 85 9
5 107.3333 108.3333 109.3333 110 110.6667 7 112 9
6 134.3333 135.3333 136.3333 137 137.6667 7 139 9
7 161.3333 162.3333 163.3333 164 164.6667 6 6 6
8 188.3333 189.3333 190.3333 191 191.6667 7 193 9
9 215.3333 216.3333 217.3333 218 218.6667 7 220 9
10 242.3333 243.3333 244.3333 245 245.6667 7 247 9
11 269.3333 270.3333 271.3333 272 272.6667 7 274 9
12 296.3333 297.3333 298.3333 299 299.6667 88 96 25
13 323.3333 324.3333 325.3333 326 326.6667 7 328 9
14 350.3333 351.3333 352.3333 353 353.6667 7 355 9
15 377.3333 378.3333 379.3333 380 380.6667 88 62 25
16 404.3333 405.3333 406.3333 407 407.6667 7 409 9
17 431.3333 432.3333 433.3333 434 434.6667 7 436 9
18 458.3333 459.3333 460.3333 461 461.6667 7 463 9
19 485.3333 486.3333 487.3333 488 488.6667 88 88 78
I have tried the following code:
DF1[,c(4:6)][is.na(DF1[,c(4:6)]<-DF2[,c(2:4)][match(DF1[,1],DF2[,1])]
[which(is.na(DF1[,c(4:6)]))]
One of the solutions using dplyr works if I omit the columns that are already complete. I am not sure whether it is my version of dplyr, which I updated last week.
Any help is greatly appreciated! Thanks!

We can do a left join and then coalesce the columns
library(dplyr)
DF1 %>%
left_join(DF2, by = c('NameVar')) %>%
transmute(NameVar, Var1, Var2,
Var3 = coalesce(Var3.x, Var3.y),
Var4 = coalesce(Var4.x, Var4.y),
Var5 = coalesce(Var5.x, Var5.y))
-output
# NameVar Var1 Var2 Var3 Var4 Var5
#1 Sub1 30 45 40 34 65
#2 Sub2 25 30 30 45 45
#3 Sub3 74 34 25 30 49
#4 Sub4 30 45 40 34 65
#5 Sub5 25 30 69 56 72
#6 Sub6 74 34 74 34 60
Or using data.table
library(data.table)
nm1 <- setdiff(intersect(names(DF1), names(DF2)), 'NameVar')
setDT(DF1)[DF2, (nm1) := Map(fcoalesce, mget(nm1),
mget(paste0("i.", nm1))), on = .(NameVar)]
data
DF1 <- structure(list(NameVar = c("Sub1", "Sub2", "Sub3", "Sub4", "Sub5",
"Sub6"), Var1 = c(30L, 25L, 74L, 30L, 25L, 74L), Var2 = c(45L,
30L, 34L, 45L, 30L, 34L), Var3 = c(40L, NA, NA, 40L, 69L, NA),
Var4 = c(34L, NA, NA, 34L, 56L, NA), Var5 = c(65L, NA, NA,
65L, 72L, NA)), class = "data.frame", row.names = c(NA, -6L
))
DF2 <- structure(list(NameVar = c("Sub2", "Sub3", "Sub6"), Var3 = c(30L,
25L, 74L), Var4 = c(45L, 30L, 34L), Var5 = c(45L, 49L, 60L)),
class = "data.frame", row.names = c(NA,
-3L))
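For comparison, the same fill can be sketched in base R with match, which is close to what the attempt in the question was aiming for. This is only a sketch: it reuses the DF1/DF2 sample data from this answer (not the Name/Paper/... columns of the question), assumes NameVar is unique in DF2, and the helper names idx, fill_cols and is_missing are made up for illustration.

```r
# Sample data from the answer above (NameVar is the key column)
DF1 <- data.frame(
  NameVar = c("Sub1", "Sub2", "Sub3", "Sub4", "Sub5", "Sub6"),
  Var1 = c(30L, 25L, 74L, 30L, 25L, 74L),
  Var2 = c(45L, 30L, 34L, 45L, 30L, 34L),
  Var3 = c(40L, NA, NA, 40L, 69L, NA),
  Var4 = c(34L, NA, NA, 34L, 56L, NA),
  Var5 = c(65L, NA, NA, 65L, 72L, NA)
)
DF2 <- data.frame(
  NameVar = c("Sub2", "Sub3", "Sub6"),
  Var3 = c(30L, 25L, 74L),
  Var4 = c(45L, 30L, 34L),
  Var5 = c(45L, 49L, 60L)
)

# Row of DF2 that corresponds to each row of DF1 (NA when there is no match)
idx <- match(DF1$NameVar, DF2$NameVar)

# Columns present in both data frames, excluding the key
fill_cols <- setdiff(intersect(names(DF1), names(DF2)), "NameVar")

# Overwrite only the missing cells, one shared column at a time
for (col in fill_cols) {
  is_missing <- is.na(DF1[[col]])
  DF1[[col]][is_missing] <- DF2[[col]][idx[is_missing]]
}
DF1
```

Because only the columns shared by both data frames are looped over, columns that are already complete (Var1 and Var2 here) are left untouched.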

Related

Merge 2 data frames using common date, plus 2 rows before and n-1 rows after

I need to merge 2 data frames:
The first data frame contains dates in YYYY-mm-dd format and event lengths:
datetime length
2003-06-03 1
2003-06-07 1
2003-06-13 1
2003-06-17 3
2003-06-28 5
2003-07-10 1
2003-07-23 1
...
The second data frame contains dates in the same format and discharge data:
datetime q
2003-05-29 36.2
2003-05-30 34.6
2003-05-31 33.1
2003-06-01 30.7
2003-06-02 30.0
2003-06-03 153.0
2003-06-04 69.0
...
The second data frame is much larger.
I want to merge/join only the following rows of the second data frame to the first:
all rows that have the same date as the first frame (I know this can be done with left_join(df1, df2, by = c("datetime")))
two rows before that row
n-1 rows after that row, where n = "length" value of row in first data frame.
I would like to identify the rows belonging to the same event as well.
Ideally I would have the following output (notice the event from 2003-06-17):
EventDatesNancy length q event#
2003-06-03 1 153.0 1
2003-06-07 1 120.0 2
2003-06-13 1 45.3 3
2003-06-15 na 110.0 4
2003-06-16 na 53.1 4
2003-06-17 3 78.0 4
2003-06-18 na 167.0 4
2003-06-19 na 145.0 4
...
I hope this makes clear what I am trying to do.

This might be one approach using tidyverse and fuzzyjoin.
First, indicate event numbers in your first data.frame. Add two columns to indicate the start and end dates (the start date is 2 days before the event date, and the end date is length - 1 days after it).
Then, you can use fuzzy_inner_join to get the selected rows from the second data.frame. Here, you want to keep the rows where the datetime in the second data.frame falls on or after the start date and on or before the end date from the first data.frame.
library(tidyverse)
library(fuzzyjoin)
df1$event <- seq_len(nrow(df1))
df1$start_date <- df1$datetime - 2
df1$end_date <- df1$datetime + df1$length - 1
fuzzy_inner_join(
df1,
df2,
by = c("start_date" = "datetime", "end_date" = "datetime"),
match_fun = c(`<=`, `>=`)
) %>%
select(datetime.y, length, q, event)
I tried this out with some made up data:
R> df1
datetime length
1 2003-06-03 1
2 2003-06-12 1
3 2003-06-21 1
4 2003-06-30 3
5 2003-07-09 5
6 2003-07-18 1
7 2003-07-27 1
8 2003-08-05 2
9 2003-08-14 1
10 2003-08-23 1
11 2003-09-01 3
R> df2
datetime q
1 2003-06-03 44
2 2003-06-04 52
3 2003-06-05 34
4 2003-06-06 20
5 2003-06-07 57
6 2003-06-08 67
7 2003-06-09 63
8 2003-06-10 51
9 2003-06-11 56
10 2003-06-12 37
11 2003-06-13 16
12 2003-06-14 54
13 2003-06-15 46
14 2003-06-16 6
15 2003-06-17 32
16 2003-06-18 91
17 2003-06-19 61
18 2003-06-20 42
19 2003-06-21 28
20 2003-06-22 98
21 2003-06-23 77
22 2003-06-24 81
23 2003-06-25 13
24 2003-06-26 15
25 2003-06-27 73
26 2003-06-28 38
27 2003-06-29 27
28 2003-06-30 49
29 2003-07-01 10
30 2003-07-02 89
31 2003-07-03 9
32 2003-07-04 80
33 2003-07-05 68
34 2003-07-06 26
35 2003-07-07 31
36 2003-07-08 29
37 2003-07-09 84
38 2003-07-10 60
39 2003-07-11 19
40 2003-07-12 97
41 2003-07-13 35
42 2003-07-14 47
43 2003-07-15 70
This will give the following output:
datetime.y length q event
1 2003-06-03 1 44 1
2 2003-06-10 1 51 2
3 2003-06-11 1 56 2
4 2003-06-12 1 37 2
5 2003-06-19 1 61 3
6 2003-06-20 1 42 3
7 2003-06-21 1 28 3
8 2003-06-28 3 38 4
9 2003-06-29 3 27 4
10 2003-06-30 3 49 4
11 2003-07-01 3 10 4
12 2003-07-02 3 89 4
13 2003-07-07 5 31 5
14 2003-07-08 5 29 5
15 2003-07-09 5 84 5
16 2003-07-10 5 60 5
17 2003-07-11 5 19 5
18 2003-07-12 5 97 5
19 2003-07-13 5 35 5
If the output desired is different than above, please let me know what should be different so that I can correct it.
Data
df1 <- structure(list(datetime = structure(c(12206, 12215, 12224, 12233,
12242, 12251, 12260, 12269, 12278, 12287, 12296), class = "Date"),
length = c(1, 1, 1, 3, 5, 1, 1, 2, 1, 1, 3), event = 1:11,
start_date = structure(c(12204, 12213, 12222, 12231, 12240,
12249, 12258, 12267, 12276, 12285, 12294), class = "Date"),
end_date = structure(c(12206, 12215, 12224, 12235, 12246,
12251, 12260, 12270, 12278, 12287, 12298), class = "Date")), row.names = c(NA,
-11L), class = "data.frame")
df2 <- structure(list(datetime = structure(c(12206, 12207, 12208, 12209,
12210, 12211, 12212, 12213, 12214, 12215, 12216, 12217, 12218,
12219, 12220, 12221, 12222, 12223, 12224, 12225, 12226, 12227,
12228, 12229, 12230, 12231, 12232, 12233, 12234, 12235, 12236,
12237, 12238, 12239, 12240, 12241, 12242, 12243, 12244, 12245,
12246, 12247, 12248), class = "Date"), q = c(44L, 52L, 34L, 20L,
57L, 67L, 63L, 51L, 56L, 37L, 16L, 54L, 46L, 6L, 32L, 91L, 61L,
42L, 28L, 98L, 77L, 81L, 13L, 15L, 73L, 38L, 27L, 49L, 10L, 89L,
9L, 80L, 68L, 26L, 31L, 29L, 84L, 60L, 19L, 97L, 35L, 47L, 70L
)), class = "data.frame", row.names = c(NA, -43L))
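The same windowing idea can also be sketched without fuzzyjoin, in base R only: expand each event into its own run of dates (2 days before through length - 1 days after) and then merge that with the discharge data. This is a sketch under assumptions, not the answer's method: the two small data frames below are made up for illustration (q is simply the day of the month), and event_windows, event_day and days are hypothetical names.

```r
# Made-up events and discharge data, for illustration only
df1 <- data.frame(
  datetime = as.Date(c("2003-06-03", "2003-06-12")),
  length = c(1, 3)
)
df2 <- data.frame(
  datetime = seq(as.Date("2003-06-01"), as.Date("2003-06-20"), by = "day"),
  q = 1:20
)

# One block of dates per event: 2 days before through (length - 1) days after
event_windows <- do.call(rbind, lapply(seq_len(nrow(df1)), function(i) {
  event_day <- df1$datetime[i]
  days <- seq(event_day - 2, event_day + df1$length[i] - 1, by = "day")
  data.frame(
    datetime = days,
    # keep the length only on the event date itself, NA on the other days
    length = ifelse(days == event_day, df1$length[i], NA),
    event = i
  )
}))

# Attach the discharge values; dates missing from df2 are dropped
result <- merge(event_windows, df2, by = "datetime")
result <- result[order(result$event, result$datetime), ]
result
```

Unlike the fuzzy join output above, this keeps length only on the event date and NA elsewhere, which is closer to the layout requested in the question.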

Splitting data.frame into matrices and multiplying the diagonal elements to produce a new column

Here is my data structure:
structure(list(a = c(57L, 39L, 31L, 70L, 8L, 93L, 68L, 85L),
b = c(161L, 122L, 101L, 104L, 173L, 192L, 110L, 152L)), class = "data.frame", row.names = c(NA,
-8L))
Each pair of rows represents a separate matrix, for example:
a b
<int> <int>
1 57 161
2 39 122
I want to multiply the first row's a by the second row's b and save the result in a variable called c. Then I want to repeat the operation for the first row's b and the second row's a and save that in c as well.
For one matrix, the desired output looks like this:
a b c
<int> <int> <dbl>
1 57 161 6954
2 39 122 6279
For the whole data set, the desired output looks like this:
a b c
<int> <int> <dbl>
1 57 161 6954
2 39 122 6279
3 31 101 3224
4 70 104 7070
5 8 173 1536
6 93 192 16089
7 68 110 10336
8 85 152 9350
Base R functions would be preferable.
Thanks in advance.

We can create a group with gl
library(dplyr)
df1 %>%
group_by(grp = as.integer(gl(n(), 2, n()))) %>%
mutate(c = a * rev(b)) %>%
ungroup %>%
select(-grp)
-output
# A tibble: 8 × 3
a b c
<int> <int> <int>
1 57 161 6954
2 39 122 6279
3 31 101 3224
4 70 104 7070
5 8 173 1536
6 93 192 16089
7 68 110 10336
8 85 152 9350
Or with ave from base R
df1$c <- with(df1, a * ave(b, as.integer(gl(length(b), 2, length(b))), FUN = rev))
df1$c
[1] 6954 6279 3224 7070 1536 16089 10336 9350

Here's another way -
inds <- seq(nrow(df))
df$c <- df$a * df$b[inds + rep(c(1, -1), length.out = nrow(df))]
df
# a b c
#1 57 161 6954
#2 39 122 6279
#3 31 101 3224
#4 70 104 7070
#5 8 173 1536
#6 93 192 16089
#7 68 110 10336
#8 85 152 9350
Explanation -
We create an alternating vector of 1 and -1 and add it to the row numbers to get the index of the partner row, whose b value is then multiplied by a. This assumes an even number of rows.
inds
#[1] 1 2 3 4 5 6 7 8
rep(c(1, -1), length.out = nrow(df))
#[1] 1 -1 1 -1 1 -1 1 -1
inds + rep(c(1, -1), length.out = nrow(df))
#[1] 2 1 4 3 6 5 8 7
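Another base R way to express the same within-pair swap is to lay b out as a two-row matrix, one column per pair, and flip the rows. This is a minimal sketch that assumes an even number of rows; b_swapped is a name made up for illustration.

```r
df <- data.frame(
  a = c(57L, 39L, 31L, 70L, 8L, 93L, 68L, 85L),
  b = c(161L, 122L, 101L, 104L, 173L, 192L, 110L, 152L)
)

# Each column of the 2-row matrix holds one pair of b values;
# indexing the rows as 2:1 swaps the values within every pair
b_swapped <- as.vector(matrix(df$b, nrow = 2)[2:1, ])

df$c <- df$a * b_swapped
df
```

as.vector reads the matrix back column by column, so the swapped values line up with the original row order.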

Subsetting a list of data frames by condition

Sorry I can't embed pictures yet
I have 21 data frames in a list (listb), all with the same headings: Timestamp and Rainfall.
I would like to sort each of them by Rainfall (descending) and then take the top 30 rows (including the corresponding Timestamp) of each of the 21 data frames. I would then like to put them back into a single data frame, labelled with the name of the initial data frame.
Please find the list of data frames below, and a small cut from the b1 dataframe
Would I need to create a new dataframe for each of the new subsets then combine them into a list later?
Descending_b1 <- listb$b1[order(-listb$b1$Rainfall), ]
b1_30 <- Descending_b1[1:30,1:2]
From that, I produce the following
b1_30 <- structure(list(Timestamp = c("25/1/2013", "24/1/2013", "2/2/2004",
"21/3/2010", "16/7/2016", "1/2/2010", "26/1/2007", "29/12/1998",
"24/2/2008", "5/2/2003", "6/2/2003", "11/11/2001", "3/12/2010",
"8/3/2020", "27/12/2010", "29/1/1998", "18/10/2017", "13/3/2007",
"5/4/2006", "10/6/2006", "19/11/2008", "20/2/2015", "26/3/2014",
"15/3/2017", "27/8/2011", "1/3/2013", "27/8/1998", "11/2/2012",
"11/2/2008", "26/1/2013"),
Rainfall = c(238L, 158L, 131L, 131L,129L, 122L, 112L, 109L, 101L, 94L,
92L, 88L, 82L, 81L, 78L, 74L, 71L, 69L, 65L, 64L, 64L,
64L, 63L, 63L, 62L, 61L, 60L, 60L, 58L,57L)),
row.names = c(5915L, 5914L, 2640L, 4874L, 7183L, 4826L, 3725L, 939L, 4118L, 2278L, 2279L, 1827L, 5131L, 8514L, 5155L,
605L, 7642L, 3771L, 3429L, 3495L, 4387L, 6671L, 6340L, 7425L,
5398L, 5950L, 815L, 5566L, 4105L, 5916L), class = "data.frame")
b1_30
#> Timestamp Rainfall
#> 5915 25/1/2013 238
#> 5914 24/1/2013 158
#> 2640 2/2/2004 131
#> 4874 21/3/2010 131
#> 7183 16/7/2016 129
#> 4826 1/2/2010 122
#> 3725 26/1/2007 112
#> 939 29/12/1998 109
#> 4118 24/2/2008 101
#> 2278 5/2/2003 94
#> 2279 6/2/2003 92
#> 1827 11/11/2001 88
#> 5131 3/12/2010 82
#> 8514 8/3/2020 81
#> 5155 27/12/2010 78
#> 605 29/1/1998 74
#> 7642 18/10/2017 71
#> 3771 13/3/2007 69
#> 3429 5/4/2006 65
#> 3495 10/6/2006 64
#> 4387 19/11/2008 64
#> 6671 20/2/2015 64
#> 6340 26/3/2014 63
#> 7425 15/3/2017 63
#> 5398 27/8/2011 62
#> 5950 1/3/2013 61
#> 815 27/8/1998 60
#> 5566 11/2/2012 60
#> 4105 11/2/2008 58
#> 5916 26/1/2013 57
I hope to do the same with the rest of the data frames in the list, creating a new data frame for each while keeping the initial data frame name, and then combine them into a new list.

Suppose you have a list like this
library(dplyr) # for %>%, mutate() and slice_max()
library(lubridate) # for days()
set.seed(2021)
listb <- list(b1 = data.frame(Timestamp = as.Date("2010-01-01") + days(sample(1:100, 10)),
Rainfall = sample(200:300, 10)),
b2 = data.frame(Timestamp = as.Date("2010-01-01") + days(sample(1:100, 10)),
Rainfall = sample(200:300, 10)),
b3 = data.frame(Timestamp = as.Date("2010-01-01") + days(sample(1:100, 10)),
Rainfall = sample(200:300, 10)))
> listb
$b1
Timestamp Rainfall
1 2010-01-08 275
2 2010-02-08 250
3 2010-02-16 259
4 2010-02-28 217
5 2010-01-13 298
6 2010-03-12 202
7 2010-03-06 245
8 2010-04-10 225
9 2010-03-11 235
10 2010-01-24 285
$b2
Timestamp Rainfall
1 2010-02-01 242
2 2010-04-09 258
3 2010-01-20 269
4 2010-03-10 285
5 2010-03-28 298
6 2010-01-06 262
7 2010-03-15 278
8 2010-03-05 233
9 2010-02-08 221
10 2010-01-19 215
$b3
Timestamp Rainfall
1 2010-03-21 216
2 2010-03-30 240
3 2010-01-18 230
4 2010-01-21 272
5 2010-03-10 292
6 2010-04-05 226
7 2010-03-14 210
8 2010-03-25 235
9 2010-03-09 237
10 2010-01-03 278
Now you only need to do the following (replace the n argument in slice_max with your desired n = 30):
purrr::map2_dfr(listb, names(listb), ~ .x %>%
mutate(list_name = .y) %>%
slice_max(Rainfall, n=5))
Timestamp Rainfall list_name
1 2010-01-13 298 b1
2 2010-01-24 285 b1
3 2010-01-08 275 b1
4 2010-02-16 259 b1
5 2010-02-08 250 b1
6 2010-03-28 298 b2
7 2010-03-10 285 b2
8 2010-03-15 278 b2
9 2010-01-20 269 b2
10 2010-01-06 262 b2
11 2010-03-10 292 b3
12 2010-01-03 278 b3
13 2010-01-21 272 b3
14 2010-03-30 240 b3
15 2010-03-09 237 b3
If you want to return the output back into a similar list
purrr::map(listb, ~ .x %>%
slice_max(Rainfall, n=5))
$b1
Timestamp Rainfall
1 2010-01-13 298
2 2010-01-24 285
3 2010-01-08 275
4 2010-02-16 259
5 2010-02-08 250
$b2
Timestamp Rainfall
1 2010-03-28 298
2 2010-03-10 285
3 2010-03-15 278
4 2010-01-20 269
5 2010-01-06 262
$b3
Timestamp Rainfall
1 2010-03-10 292
2 2010-01-03 278
3 2010-01-21 272
4 2010-03-30 240
5 2010-03-09 237
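Since the question started from order() and row indexing, here is a hedged base R sketch of the same two results, the named list and the single combined data frame. The tiny listb below is made up for illustration, top_n_rows is a hypothetical helper, and n = 3 stands in for the 30 rows asked for.

```r
# Made-up stand-in for the real list of 21 data frames
stamps <- c("1/1/2010", "2/1/2010", "3/1/2010", "4/1/2010", "5/1/2010")
listb <- list(
  b1 = data.frame(Timestamp = stamps, Rainfall = c(5, 9, 1, 7, 3)),
  b2 = data.frame(Timestamp = stamps, Rainfall = c(20, 11, 15, 12, 18))
)

# Sort one data frame by Rainfall (descending) and keep the first n rows
top_n_rows <- function(d, n) head(d[order(-d$Rainfall), ], n)

# Same list structure and names, each element reduced to its top rows
top_list <- lapply(listb, top_n_rows, n = 3)

# One data frame, with the originating list name kept as a column
combined <- do.call(rbind, Map(function(d, nm) cbind(d, list_name = nm),
                               top_list, names(top_list)))
rownames(combined) <- NULL
combined
```

One difference worth knowing: head() always returns exactly n rows, whereas slice_max() keeps ties by default and can therefore return more than n.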

Change data set from wide to long while retaining group id, and also gathering columns [duplicate]

I'd really appreciate some help getting this messy set of new survey data into a usable form. It was collected in a strange way and now I've got strange data to work with. I've looked through tidyr and tried those approaches to no avail. I suspect my problem is that I'm thinking about this data set all wrong and I'm blind to some real answer. But given all the things I need to do to this df, I can't figure out where to start and thus where to start googling.
What I need:
For each person to be their own row
Each person retains their GroupID and Treated value
For the variables currently attached to each person individually to become columns (age, weight, height)
Fake (and much smaller):
structure(list(GroupID = 1:5, Treated = c("Y", "Y", "N", "Y",
"N"), person1_age = c(45L, 33L, 71L, 19L, 52L), person1_weight = c(187L,
145L, 136L, 201L, 168L), person1_height = c(69L, 64L, 51L, 70L,
66L), person2_age = c(54L, 20L, 48L, 63L, 26L), person2_weight = c(140L,
122L, 186L, 160L, 232L), person2_height = c(62L, 70L, 65L, 72L,
74L), person3_age = c(21L, 56L, 40L, 59L, 67L), person3_weight = c(112L,
143L, 187L, 194L, 159L), person3_height = c(61L, 69L, 73L, 63L,
72L)), .Names = c("GroupID", "Treated", "person1_age", "person1_weight",
"person1_height", "person2_age", "person2_weight", "person2_height",
"person3_age", "person3_weight", "person3_height"), row.names = c(NA,
5L), class = "data.frame")
Any help or further readings you could point me to would be very much appreciated.

reshape can do this, with the appropriate arguments. Note that varying is given as a list with one vector of column positions per measure, in the same order as v.names; with a plain vector of names plus v.names, reshape regroups the columns alphabetically and the age, weight and height labels end up on the wrong values:
> reshape(x, direction="long", varying=list(c(3, 6, 9), c(4, 7, 10), c(5, 8, 11)), timevar='person', v.names=c('age', 'weight', 'height'))
GroupID Treated person age weight height id
1.1 1 Y 1 45 187 69 1
2.1 2 Y 1 33 145 64 2
3.1 3 N 1 71 136 51 3
4.1 4 Y 1 19 201 70 4
5.1 5 N 1 52 168 66 5
1.2 1 Y 2 54 140 62 1
2.2 2 Y 2 20 122 70 2
3.2 3 N 2 48 186 65 3
4.2 4 Y 2 63 160 72 4
5.2 5 N 2 26 232 74 5
1.3 1 Y 3 21 112 61 1
2.3 2 Y 3 56 143 69 2
3.3 3 N 3 40 187 73 3
4.3 4 Y 3 59 194 63 4
5.3 5 N 3 67 159 72 5
This relies on the columns appearing in the original data in increasing person order, with age, weight and height in that order for each person.
If that's not the case, adjust the positions in varying. Here are the columns they refer to:
> names(x)[3:11]
[1] "person1_age" "person1_weight" "person1_height" "person2_age" "person2_weight" "person2_height"
[7] "person3_age" "person3_weight" "person3_height"

We can also use melt from data.table which can take multiple patterns in the measure argument
library(data.table)
melt(setDT(x), measure = patterns("age$", "weight$", "height$"),
variable.name = "person", value.name = c("age", "weight", "height"))
# GroupID Treated person age weight height
# 1: 1 Y 1 45 187 69
# 2: 2 Y 1 33 145 64
# 3: 3 N 1 71 136 51
# 4: 4 Y 1 19 201 70
# 5: 5 N 1 52 168 66
# 6: 1 Y 2 54 140 62
# 7: 2 Y 2 20 122 70
# 8: 3 N 2 48 186 65
# 9: 4 Y 2 63 160 72
#10: 5 N 2 26 232 74
#11: 1 Y 3 21 112 61
#12: 2 Y 3 56 143 69
#13: 3 N 3 40 187 73
#14: 4 Y 3 59 194 63
#15: 5 N 3 67 159 72
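Since the question mentions tidyr, here is a hedged sketch of the same reshape with pivot_longer, assuming tidyr 1.0.0 or later. The special ".value" entry in names_to tells pivot_longer to turn that part of each column name (age, weight) into its own output column. The data is a cut-down, made-up version of the example (two groups, two people, two measures).

```r
library(tidyr)

# Cut-down version of the example data, for illustration only
x <- data.frame(
  GroupID = 1:2,
  Treated = c("Y", "N"),
  person1_age = c(45L, 33L),
  person1_weight = c(187L, 145L),
  person2_age = c(54L, 20L),
  person2_weight = c(140L, 122L)
)

# "person<digits>_<measure>": the digits go to `person`,
# the measure name becomes a column of its own via ".value"
long <- pivot_longer(
  x,
  cols = starts_with("person"),
  names_to = c("person", ".value"),
  names_pattern = "person(\\d+)_(.*)"
)
long
```

Because the columns are matched by name rather than by position, this does not depend on the order of the columns in the wide data.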

Grouping the dataframe based on one variable

I have a data frame with 10 variables, all of them numeric, one of which is named age. I want to group the observations by age, for example ages 17 to 18 in one group and 19 to 22 in another, so that each row is assigned to a group. The result should be a data frame for further manipulation.
Model of the dataframe:
A B AGE
25 50 17
30 42 22
50 60 19
65 105 17
355 400 21
68 47 20
115 98 18
25 75 19
And I want result like
17-18
A B AGE
25 50 17
65 105 17
115 98 18
19-22
A B AGE
30 42 22
50 60 19
355 400 21
68 47 20
25 75 19
I grouped the data set by the AGE variable using the split function; my concern now is how to manipulate the grouped data. For example, the result looked like this:
$1
A B AGE
25 50 17
65 105 17
115 98 18
$2
A B AGE
30 42 22
50 60 19
355 400 21
68 47 20
25 75 19
My question is: how can I access each group for further manipulation?
For example, what if I want to run a t-test for each group separately?

The split function will work with data frames. Use either cut with 'breaks' or findInterval with an appropriate set of cutpoints (named 'vec' if you are using named parameters) as the criterion for grouping, the second argument to split. By default, cut uses intervals closed on the right and findInterval uses intervals closed on the left. Note that the cutpoints below put ages 17 to 19 in the first group and 20 to 22 in the second:
> split(dat, findInterval(dat$AGE, c(17, 19.5, 22.5)))
$`1`
A B AGE
1 25 50 17
3 50 60 19
4 65 105 17
7 115 98 18
8 25 75 19
$`2`
A B AGE
2 30 42 22
5 355 400 21
6 68 47 20

Here is the approach with cut
lst <- split(df1, cut(df1$AGE, breaks=c(16, 18, 22), labels=FALSE))
lst
# $`1`
# A B AGE
#1 25 50 17
#4 65 105 17
#7 115 98 18
#$`2`
# A B AGE
#2 30 42 22
#3 50 60 19
#5 355 400 21
#6 68 47 20
#8 25 75 19
Update
If you need to find the sum, mean of columns for each "list" element
lapply(lst, function(x) rbind(colSums(x[-3]),colMeans(x[-3])))
But, if the objective is to find the summary statistics based on the group, it can be done using any of the aggregating functions
library(dplyr)
df1 %>%
group_by(grp=cut(AGE, breaks=c(16, 18, 22), labels=FALSE)) %>%
summarise_each(funs(sum=sum(., na.rm=TRUE),
mean=mean(., na.rm=TRUE)), A:B)
# grp A_sum B_sum A_mean B_mean
#1 1 205 253 68.33333 84.33333
#2 2 528 624 105.60000 124.80000
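summarise_each and funs are deprecated in recent dplyr releases. Assuming dplyr 1.0.0 or later, a roughly equivalent sketch uses across; note that the result columns then come out grouped by variable (A_sum, A_mean, B_sum, B_mean) rather than by statistic.

```r
library(dplyr)

df1 <- data.frame(
  A = c(25L, 30L, 50L, 65L, 355L, 68L, 115L, 25L),
  B = c(50L, 42L, 60L, 105L, 400L, 47L, 98L, 75L),
  AGE = c(17L, 22L, 19L, 17L, 21L, 20L, 18L, 19L)
)

res <- df1 %>%
  group_by(grp = cut(AGE, breaks = c(16, 18, 22), labels = FALSE)) %>%
  # apply both summaries to columns A and B; names follow "{column}_{function}"
  summarise(across(A:B, list(sum = ~ sum(.x, na.rm = TRUE),
                             mean = ~ mean(.x, na.rm = TRUE))))
res
```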
Or using aggregate from base R
do.call(data.frame,
aggregate(cbind(A,B)~cbind(grp=cut(AGE, breaks=c(16, 18, 22),
labels=FALSE)), df1, function(x) c(sum=sum(x), mean=mean(x))))
data
df1 <- structure(list(A = c(25L, 30L, 50L, 65L, 355L, 68L, 115L, 25L
), B = c(50L, 42L, 60L, 105L, 400L, 47L, 98L, 75L), AGE = c(17L,
22L, 19L, 17L, 21L, 20L, 18L, 19L)), .Names = c("A", "B", "AGE"
), class = "data.frame", row.names = c(NA, -8L))
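On the original question of how to access each group: the result of split is a named list, so a single group can be pulled out with [[ and a function can be run on every group with lapply. The sketch below runs a t-test per group; the question does not say which test is wanted, so comparing columns A and B with a default (Welch) two-sample t.test is purely an assumption for illustration, and tests and p_values are made-up names.

```r
df1 <- data.frame(
  A = c(25L, 30L, 50L, 65L, 355L, 68L, 115L, 25L),
  B = c(50L, 42L, 60L, 105L, 400L, 47L, 98L, 75L),
  AGE = c(17L, 22L, 19L, 17L, 21L, 20L, 18L, 19L)
)
lst <- split(df1, cut(df1$AGE, breaks = c(16, 18, 22), labels = FALSE))

# Access a single group by its name in the list
lst[["1"]]

# Run the same test on every group (A versus B is an assumed comparison)
tests <- lapply(lst, function(d) t.test(d$A, d$B))

# Collect one number per group, here the p-value
p_values <- sapply(tests, function(tt) tt$p.value)
p_values
```

Each element of tests is an ordinary htest object, so tests[["2"]] prints the full result for the second group.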
