How to get max from multiple columns in R

INPUT:-
year max1 max2 max3
2001 10 101 87
2002 103 19 88
2003 21 23 89
2004 27 28 91
OUTPUT:-
YEAR MAX
2001 101
2002 103
2003 89
2004 91

A dplyr solution:
Data:
library(data.table)  # provides fread()
df <- fread(" year max1 max2 max3
2001 10 101 87
2002 103 19 88
2003 21 23 89
2004 27 28 91 ")
Code:
library(dplyr)
df %>%
rowwise() %>%
mutate(MAX = max(max1, max2, max3)) %>%
select(year, MAX)
Output:
# A tibble: 4 x 2
# Rowwise:
year MAX
<int> <int>
1 2001 101
2 2002 103
3 2003 89
4 2004 91

A vectorized dplyr solution with pmax(), which facilitates tidyselection while avoiding rowwise() inefficiency:
Solution
Once you've set everything up
library(dplyr)
# ...
# Code to generate 'your_data'.
# ...
here is your solution for the columns {max1, max2, max3}
your_data %>% transmute(YEAR = year, MAX = pmax(max1, max2, max3))
and more generally for all columns of the form max*:
your_data %>% transmute(YEAR = year, MAX = do.call(pmax, unname(across(
# Tidy selection of 'max*' columns:
starts_with("max")
))))
At your discretion, you can replace starts_with() with another selection helper like matches("^max\\d+$").
Results
Given your_data reproduced here
your_data <- structure(
list(
year = c(2001, 2002, 2003, 2004),
max1 = c(10, 103, 21, 27),
max2 = c(101, 19, 23, 28),
max3 = c(87, 88, 89, 91)
),
row.names = c(NA, -4L),
class = "data.frame"
)
this tidy workflow should yield the following data.frame:
YEAR MAX
1 2001 101
2 2002 103
3 2003 89
4 2004 91

This should do it:
OUTPUT = data.frame(
YEAR = INPUT$year,
MAX = apply(INPUT[-1], 1, max)
)
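A vectorized base R variant of the same idea: apply() walks over the rows one at a time, while pmax() compares whole columns at once, which may scale better on large data. This is only a minimal sketch. It rebuilds INPUT from the question so it runs on its own, and it assumes the columns of interest all start with "max"; the helper name max_cols is made up for illustration.

```r
# Rebuild the question's data so the snippet is self-contained.
INPUT <- data.frame(
  year = 2001:2004,
  max1 = c(10, 103, 21, 27),
  max2 = c(101, 19, 23, 28),
  max3 = c(87, 88, 89, 91)
)

# Assumption: every column to compare has a name starting with "max".
max_cols <- grep("^max", names(INPUT), value = TRUE)

# pmax() takes the columns as separate arguments, so pass them as an
# unnamed list via do.call(); na.rm = TRUE ignores missing values.
OUTPUT <- data.frame(
  YEAR = INPUT$year,
  MAX = do.call(pmax, c(unname(as.list(INPUT[max_cols])), na.rm = TRUE))
)
OUTPUT
```
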

Here's a slightly different option using dplyr::c_across() which gives you handy access to tidyselect semantics
library(tidyverse)
d <- structure(list(year = 2001:2004, max1 = c(10L, 103L, 21L, 27L), max2 = c(101L, 19L, 23L, 28L), max3 = c(87L, 88L, 89L, 91L)), class = "data.frame", row.names = c(NA, -4L))
d %>%
rowwise() %>%
mutate(max = max(c_across(starts_with("max"))), .keep = "unused") %>%
ungroup()
#> # A tibble: 4 x 2
#> year max
#> <int> <int>
#> 1 2001 101
#> 2 2002 103
#> 3 2003 89
#> 4 2004 91
Benchmarking
For what it's worth, there are some performance differences. They would probably only matter if your dataset is very large, but they are worth noting. In this benchmark, the solution from @Gregor Thomas is by far the fastest.
library(microbenchmark)
microbenchmark(
# Dan Adams
c_across = d %>%
rowwise() %>%
mutate(max = max(c_across(starts_with("max"))), .keep = "unused") %>%
ungroup(),
# MonJeanJean
max_all = d %>%
rowwise() %>%
mutate(MAX = max(max1, max2, max3)) %>%
select(year, MAX),
# Greg
do.call = d %>%
transmute(YEAR = year, MAX = do.call(pmax, unname(across(starts_with(
"max"
))))),
# Gregor Thomas
apply = data.frame(
year = d$year,
max = apply(d[-1], 1, max))
)
#> Unit: microseconds
#> expr min lq mean median uq max neval cld
#> c_across 6928.9 8123.20 9548.682 9187.00 10709.95 17595.6 100 c
#> max_all 7890.5 9016.35 10473.327 10176.90 11366.65 16389.5 100 d
#> do.call 3392.3 3976.20 4609.419 4473.55 4981.30 9282.5 100 b
#> apply 349.1 470.20 567.896 535.05 670.70 1017.7 100 a
Created on 2022-02-08 by the reprex package (v2.0.1)

Related

How to choose column with the largest number by row

I have a data frame. Each row is a separate person. I need to create a data frame that only shows the latest "date" and "salary" per row. Below is an example of the data frame I'm starting with:
example_df <- tribble(
~person_id, ~date1, ~date2, ~date3, ~salary1, ~salary2, ~salary3,
1, 2010, 2013, 2015, 100, 200, 300,
2, 1998, NA, NA, 50, NA, NA,
3, 2000, 2001, NA, 100, 200, NA,
4, 1987, 1989, 2005, 50, 300, 500
)
This is what I need the data frame to look like after processing:
example_clean_df <- tribble(
~person_id, ~date, ~salary,
1, 2015,300,
2, 1998, 50,
3, 2001, 200,
4, 2005, 500
)
Any ideas would be super helpful. Thank you!
Does this work:
library(dplyr)
example_df %>%
rowwise() %>%
mutate(date = max(date1, date2, date3, na.rm = TRUE),
salary = max(salary1, salary2, salary3, na.rm = TRUE)) %>%
select(person_id, date, salary)
# A tibble: 4 × 3
# Rowwise:
person_id date salary
<dbl> <dbl> <dbl>
1 1 2015 300
2 2 1998 50
3 3 2001 200
4 4 2005 500
Use pivot_longer and slice_max:
library(dplyr)
library(tidyr)
example_df %>%
pivot_longer(-person_id, names_pattern = "(date|salary)(\\d)", names_to = c(".value", "number")) %>%
group_by(person_id) %>%
slice_max(salary) %>%
select(-number)
output
# A tibble: 4 × 3
# Groups: person_id [4]
person_id date salary
<dbl> <dbl> <dbl>
1 1 2015 300
2 2 1998 50
3 3 2001 200
4 4 2005 500
The primitive base r / for loop version:
result.df <- list()
for(i in seq(nrow(example_df))){
result.df[[i]] <- cbind(example_df[1][i,],
max(example_df[,2:4][i,], na.rm = TRUE),
max(example_df[,5:7][i,], na.rm = TRUE))
}
result.df <- setNames(do.call(rbind, result.df), c('person_id', 'date', 'salary'))
person_id date salary
1 1 2015 300
2 2 1998 50
3 3 2001 200
4 4 2005 500
PS:
microbenchmark suggests using @Karthik S's method, as it is the fastest.
# test1: loop base R
# test2: rowwise mutate
# test3: pivot_longer
expr min lq mean median uq max neval
test1 9.9957 10.41815 11.858530 10.86845 11.97645 21.8334 100
test2 7.6594 7.96195 9.457389 8.29315 9.49365 25.9524 100
test3 12.0949 12.49685 14.080567 12.83050 13.85300 26.6272 100
PPS:
using lapply speeds up the process:
result.df <- lapply(seq(nrow(example_df)), \(x) cbind(example_df[1][x,],
max(example_df[,2:4][x,], na.rm = TRUE),
max(example_df[,5:7][x,], na.rm = TRUE))) %>%
do.call(rbind, .) %>%
setNames(c('person_id', 'date', 'salary'))
min lq mean median uq max neval
3.6828 3.89075 5.754244 4.41195 7.14130 14.0325 100
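A vectorized alternative that avoids looping over rows: pmax() with na.rm = TRUE takes the largest non-missing value across columns. This is a minimal sketch, not a benchmarked recommendation. It assumes the columns are named date1..date3 and salary1..salary3, it uses a plain data.frame instead of a tibble, and row_max() is a hypothetical helper. Note that it returns the largest salary, which matches the latest salary only when salaries never decrease over time, as in the example data.

```r
# Example data from the question, rebuilt as a plain data.frame.
example_df <- data.frame(
  person_id = 1:4,
  date1 = c(2010, 1998, 2000, 1987),
  date2 = c(2013, NA, 2001, 1989),
  date3 = c(2015, NA, NA, 2005),
  salary1 = c(100, 50, 100, 50),
  salary2 = c(200, NA, 200, 300),
  salary3 = c(300, NA, NA, 500)
)

# Hypothetical helper: row-wise maximum over all columns whose
# names start with `prefix`, ignoring NA values.
row_max <- function(data, prefix) {
  cols <- unname(as.list(data[startsWith(names(data), prefix)]))
  do.call(pmax, c(cols, na.rm = TRUE))
}

result <- data.frame(
  person_id = example_df$person_id,
  date = row_max(example_df, "date"),
  salary = row_max(example_df, "salary")
)
result
```
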

Extract data based on time to death

Hi I'm analysing the pattern of spending for individuals before they died. My dataset contains individuals' monthly spending and their dates of death. The dataset looks similar to this:
ID 2018_11 2018_12 2019_01 2019_02 2019_03 2019_04 2019_05 2019_06 2019_07 2019_08 2019_09 2019_10 2019_11 2019_12 2020_01 date_of_death
A 15 14 6 23 23 5 6 30 1 15 6 7 8 30 1 2020-01-02
B 2 5 6 7 7 8 9 15 12 14 31 30 31 0 0 2019-11-15
Each column denotes the month of the year. For example, "2018_11" means November 2018. The number in each cell denotes the spending in that specific month.
I would like to construct a data frame which contains the spending data of each individual in their last 0-12 months. It will look like this:
ID last_12_month last_11_month ...... last_1_month last_0_month date_of_death
A 6 23 30 1 2020-01-02
B 2 5 30 31 2019-11-15
Each individual died at different time. For example, individual A died on 2020-01-02, so the data of the "last_0_month" for this person should be extracted from the column "2020_01", and that of "last_12_month" extracted from "2019_01"; individual B died on 2019-11-15, so the data of "last_0_month" for this person should be extracted from the column "2019_11", and that of "last_12_month" should be extracted from the column "2018_11".
I will be really grateful for your help.
Using data.table and lubridate packages
library(data.table)
library(lubridate)
setDT(dt)
dt <- melt(dt, id.vars = c("ID", "date_of_death"))
dt[, since_death := interval(ym(variable), ymd(date_of_death)) %/% months(1)]
dt <- dcast(dt[since_death %between% c(0, 12)], ID + date_of_death ~ since_death, value.var = "value", fun.aggregate = sum)
setcolorder(dt, c("ID", "date_of_death", rev(names(dt)[3:15])))
setnames(dt, old = names(dt)[3:15], new = paste("last", names(dt)[3:15], "month", sep = "_"))
Results
dt
# ID date_of_death last_12_month last_11_month last_10_month last_9_month last_8_month last_7_month last_6_month last_5_month last_4_month last_3_month
# 1: A 2020-01-02 6 23 23 5 6 30 1 15 6 7
# 2: B 2019-11-15 2 5 6 7 7 8 9 15 12 14
# last_2_month last_1_month last_0_month
# 1: 8 30 1
# 2: 31 30 31
Data
dt <- structure(list(ID = c("A", "B"), `2018_11` = c(15L, 2L), `2018_12` = c(14L,
5L), `2019_01` = c(6L, 6L), `2019_02` = c(23L, 7L), `2019_03` = c(23L,
7L), `2019_04` = c(5L, 8L), `2019_05` = c(6L, 9L), `2019_06` = c(30L,
15L), `2019_07` = c(1L, 12L), `2019_08` = 15:14, `2019_09` = c(6L,
31L), `2019_10` = c(7L, 30L), `2019_11` = c(8L, 31L), `2019_12` = c(30L,
0L), `2020_01` = 1:0, date_of_death = structure(c(18263L, 18215L
), class = c("IDate", "Date"))), row.names = c(NA, -2L), class = c("data.frame"))
Here you can find a similar approach to the one presented by @RuiBarradas, but using lubridate to compute the difference in months:
library(dplyr)
library(tidyr)
library(lubridate)
# Initial data
df <- structure(list(
ID = c("A", "B"),
`2018_11` = c(15, 2),
`2018_12` = c(14, 5),
`2019_01` = c(6, 6),
`2019_02` = c(23, 7),
`2019_03` = c(23, 7),
`2019_04` = c(5, 8),
`2019_05` = c(6, 9),
`2019_06` = c(30, 15),
`2019_07` = c(1, 12),
`2019_08` = c(15, 14),
`2019_09` = c(6, 31),
`2019_10` = c(7, 30),
`2019_11` = c(8, 31),
`2019_12` = c(30, 0),
`2020_01` = c(1, 0),
date_of_death = c("2020-01-02", "2019-11-15")
),
row.names = c(NA, -2L),
class = "data.frame"
)
# Convert to longer all cols that start with 20 (e.g. 2020, 2021)
df_long <- df %>%
pivot_longer(starts_with("20"), names_to = "month")
# treatment
df_long <- df_long %>%
mutate(
# To date, just in case
date_of_death = as.Date(date_of_death),
# Need to reformat the colnames from (e.g.) 2021_01 to 2021-01-01
month_fmt = as.Date(paste0(gsub("_", "-", month), "-01")),
# End of month
month_fmt = ceiling_date(month_fmt, "month") - days(1),
# End of month for month of death
date_of_death_eom = ceiling_date(date_of_death, "month") - days(1),
# Difference in months (using end of months)
month_diff = round(time_length(
interval(month_fmt, date_of_death_eom),"month"),0)) %>%
# Select only months bw 0 and 12
filter(month_diff %in% 0:12) %>%
# Create labels for the next step
mutate(labs = paste0("last_", month_diff,"_month"))
# To wider
end <- df_long %>%
pivot_wider(
id_cols = c(ID, date_of_death),
names_from = labs,
values_from = value
)
end
#> # A tibble: 2 x 15
#> ID date_of_death last_12_month last_11_month last_10_month last_9_month
#> <chr> <date> <dbl> <dbl> <dbl> <dbl>
#> 1 A 2020-01-02 6 23 23 5
#> 2 B 2019-11-15 2 5 6 7
#> # ... with 9 more variables: last_8_month <dbl>, last_7_month <dbl>,
#> # last_6_month <dbl>, last_5_month <dbl>, last_4_month <dbl>,
#> # last_3_month <dbl>, last_2_month <dbl>, last_1_month <dbl>,
#> # last_0_month <dbl>
Created on 2022-03-09 by the reprex package (v2.0.1)
Here is a tidyverse solution.
Reshape the data to long format, coerce the date columns to class "Date", use Dirk Eddelbuettel's accepted answer to this question to compute the date differences in months and keep the rows with month differences between 0 and 12.
This grouped long format is probably more useful and I compute means by group and plot the spending of the last 12 months prior to death but since the question asks for a wide format, the output data set spending12_wide is created.
options(width=205)
df1 <- read.table(text = "
ID 2018_11 2018_12 2019_01 2019_02 2019_03 2019_04 2019_05 2019_06 2019_07 2019_08 2019_09 2019_10 2019_11 2019_12 2020_01 date_of_death
A 15 14 6 23 23 5 6 30 1 15 6 7 8 30 1 2020-01-02
B 2 5 6 7 7 8 9 15 12 14 31 30 31 0 0 2019-11-15
", header = TRUE, check.names = FALSE)
suppressPackageStartupMessages(library(dplyr))
library(tidyr)
library(ggplot2)
# Dirk's functions
monnb <- function(d) {
lt <- as.POSIXlt(as.Date(d, origin = "1900-01-01"))
lt$year*12 + lt$mon
}
# compute a month difference as a difference between two monnb's
diffmon <- function(d1, d2) { monnb(d2) - monnb(d1) }
spending12 <- df1 %>%
pivot_longer(cols = starts_with('20'), names_to = "month") %>%
mutate(month = as.Date(paste0(month, "_01"), "%Y_%m_%d"),
date_of_death = as.Date(date_of_death)) %>%
group_by(ID, date_of_death) %>%
mutate(diffm = diffmon(month, date_of_death)) %>%
filter(diffm >= 0 & diffm <= 12)
spending12 %>% summarise(spending = mean(value), .groups = "drop")
#> # A tibble: 2 x 3
#> ID date_of_death spending
#> <chr> <date> <dbl>
#> 1 A 2020-01-02 12.4
#> 2 B 2019-11-15 13.6
spending12_wide <- spending12 %>%
mutate(month = zoo::as.yearmon(month)) %>%
pivot_wider(
id_cols = c(ID, date_of_death),
names_from = diffm,
names_glue = "last_{.name}_month",
values_from = value
)
spending12_wide
#> # A tibble: 2 x 15
#> # Groups: ID, date_of_death [2]
#> ID date_of_death last_12_month last_11_month last_10_month last_9_month last_8_month last_7_month last_6_month last_5_month last_4_month last_3_month last_2_month last_1_month last_0_month
#> <chr> <date> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 A 2020-01-02 6 23 23 5 6 30 1 15 6 7 8 30 1
#> 2 B 2019-11-15 2 5 6 7 7 8 9 15 12 14 31 30 31
ggplot(spending12, aes(month, value, color = ID)) +
geom_line() +
geom_point()
Created on 2022-03-09 by the reprex package (v2.0.1)
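All of the answers above rest on the same arithmetic: turn a column name such as "2019_01" and the date of death into whole calendar months and subtract. To check that logic in isolation, here is a small base R sketch. The function name months_before() is made up for illustration, and it assumes column names always have the form YYYY_MM.

```r
# Hypothetical helper: number of calendar months between the month
# encoded in a column name ("YYYY_MM") and the month of death.
months_before <- function(col_name, death_date) {
  ym <- as.integer(strsplit(col_name, "_", fixed = TRUE)[[1]])
  death <- as.POSIXlt(as.Date(death_date))
  # POSIXlt stores years since 1900 and months as 0-11.
  (death$year + 1900L - ym[1]) * 12L + (death$mon + 1L - ym[2])
}

# Individual A died on 2020-01-02.
a_last_12 <- months_before("2019_01", "2020-01-02")
a_last_0 <- months_before("2020_01", "2020-01-02")

# Individual B died on 2019-11-15.
b_last_12 <- months_before("2018_11", "2019-11-15")
b_last_0 <- months_before("2019_11", "2019-11-15")
```

Columns whose result falls outside 0 to 12 are the ones the answers filter out before reshaping back to wide format.
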

Counting the number of values that are more than 60 for each row

I have a data frame that looks like this:
location td1_2019 td2_2019 td3_2019 td4_2019 td1_2020 td2_2020 td3_2020 td4_2020
1 a 50 55 60 58 63 55 60 58
2 b 45 65 57 50 61 66 62 59
3 c 61 66 62 59 45 65 57 50
here, td1_2019 = temperature day1 in 2019 ... and so on
I want to count the number of days the temperature was above 60 in both 2019 and 2020 for each location. I want the table to look like the following:
location 2019 2020
1 a 1 2
2 b 1 3
3 c 3 1
I am using R, so I would prefer a solution in R. Any help would be appreciated! Thank you!
A dplyr solution
library(dplyr)
library(tidyr)  # provides pivot_longer() and pivot_wider()
df1 %>%
pivot_longer(
-location,
names_to = c("day", "year"),
names_pattern = "td(\\d)_(\\d{4})",
values_to = "temperature"
) %>%
group_by(year, location) %>%
summarise(n = sum(temperature >= 60)) %>%
pivot_wider(names_from = "year", values_from = "n")
A Base R solution
nms <- names(df1)
cond <- df1 >= 60
Reduce(
function(out, y) `[[<-`(out, y, value = rowSums(cond[, which(grepl(y, nms))])),
c("2019", "2020"),
init = df1[, "location", drop = FALSE]
)
Output
location `2019` `2020`
<chr> <int> <int>
1 a 1 2
2 b 1 3
3 c 3 1
Assume that df1 looks like this
> df1
# A tibble: 3 x 9
location td1_2019 td2_2019 td3_2019 td4_2019 td1_2020 td2_2020 td3_2020 td4_2020
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 a 50 55 60 58 63 55 60 58
2 b 45 65 57 50 61 66 62 59
3 c 61 66 62 59 45 65 57 50
Does this work: I think you want something more year wise.
> library(dplyr)
> library(tidyr)
> temp %>% pivot_longer(-location, names_to = c('td', 'year'), names_pattern = '(.*)_(.*)', values_to = 'temp') %>%
+ filter(temp >= 60) %>% count(location, year, name = 'Count') %>%
+ pivot_wider(id_cols = location, names_from = year, values_from = Count, values_fill = list(Count = 0))
# A tibble: 3 x 3
location `2019` `2020`
<chr> <int> <int>
1 a 1 2
2 b 1 3
3 c 3 1
>
You can use the following tidy solution. Just as in the other solutions posted (which are very nice), a key move is to get the data in a long format using pivot_longer().
library(dplyr)
library(tidyr)
library(stringr)
data %>%
pivot_longer(-location) %>%
mutate(year = str_sub(name, -2)) %>%
group_by(location, year) %>%
mutate(above60 = sum(value >= 60)) %>%
ungroup() %>%
distinct(location, year, above60) %>%
pivot_wider(names_from = year, values_from = above60)
# location `19` `20`
# <chr> <int> <int>
# 1 a 1 2
# 2 b 1 3
# 3 c 3 1
data
structure(list(location = c("a", "b", "c"), td1_2019 = c(50,
45, 61), td2_2019 = c(55, 65, 66), td3_2019 = c(60, 57, 62),
td4_2019 = c(58, 50, 59), td1_2020 = c(63, 61, 45), td2_2020 = c(55,
66, 65), td3_2020 = c(60, 62, 57), td4_2020 = c(58, 59, 50
)), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"
))
A base R option
cbind(
df[1],
list2DF(
lapply(
split.default(
as.data.frame(df[-1] >= 60),
gsub(".*?(\\d+)$", "\\1", names(df)[-1],
perl = TRUE
)
),
rowSums
)
)
)
which gives
location 2019 2020
1 a 1 2
2 b 1 3
3 c 3 1
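A shorter base R sketch of the same idea, for readers who find the nested calls above hard to follow: pick the columns for one year by their suffix, compare against the threshold, and sum the TRUE values per row. The helper count_at_least() is hypothetical, and it assumes every temperature column ends in "_<year>". It uses >= 60 because the expected output in the question counts a reading of exactly 60.

```r
# Data from the question, rebuilt so the snippet runs on its own.
df <- data.frame(
  location = c("a", "b", "c"),
  td1_2019 = c(50, 45, 61), td2_2019 = c(55, 65, 66),
  td3_2019 = c(60, 57, 62), td4_2019 = c(58, 50, 59),
  td1_2020 = c(63, 61, 45), td2_2020 = c(55, 66, 65),
  td3_2020 = c(60, 62, 57), td4_2020 = c(58, 59, 50)
)

# Hypothetical helper: per-row count of values >= threshold among
# the columns whose names end in "_<year>".
count_at_least <- function(data, year, threshold = 60) {
  cols <- endsWith(names(data), paste0("_", year))
  unname(rowSums(data[cols] >= threshold))
}

result <- data.frame(
  location = df$location,
  `2019` = count_at_least(df, 2019),
  `2020` = count_at_least(df, 2020),
  check.names = FALSE  # keep "2019" and "2020" as column names
)
result
```
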

R Replace NA for all Columns Except *

library(tidyverse)
df <- tibble(Date = c(rep(as.Date("2020-01-01"), 3), NA),
col1 = 1:4,
thisCol = c(NA, 8, NA, 3),
thatCol = 25:28,
col999 = rep(99, 4))
#> # A tibble: 4 x 5
#> Date col1 thisCol thatCol col999
#> <date> <int> <dbl> <int> <dbl>
#> 1 2020-01-01 1 NA 25 99
#> 2 2020-01-01 2 8 26 99
#> 3 2020-01-01 3 NA 27 99
#> 4 NA 4 3 28 99
My actual R data frame has hundreds of columns that aren't neatly named, but can be approximated by the df data frame above.
I want to replace all values of NA with 0, with the exception of several columns (in my example, I want to leave out the Date column and the thatCol column). I'd want to do it in this sort of fashion:
df %>% replace(is.na(.), 0)
#> Error: Assigned data `values` must be compatible with existing data.
#> i Error occurred for column `Date`.
#> x Can't convert <double> to <date>.
#> Run `rlang::last_error()` to see where the error occurred.
And my unsuccessful ideas for accomplishing the "everything except" replace NA are shown below.
df %>% replace(is.na(c(., -c(Date, thatCol)), 0))
df %>% replace_na(list([, c(2:3, 5)] = 0))
df %>% replace_na(list(everything(-c(Date, thatCol)) = 0))
Is there a way to select everything BUT in the way I need to? There are hundreds of columns, named inconsistently, so typing them one by one is not a practical option.
You can use mutate_at :
library(dplyr)
Remove them by Name
df %>% mutate_at(vars(-c(Date, thatCol)), ~replace(., is.na(.), 0))
Remove them by position
df %>% mutate_at(-c(1,4), ~replace(., is.na(.), 0))
Select them by name
df %>% mutate_at(vars(col1, thisCol, col999), ~replace(., is.na(.), 0))
Select them by position
df %>% mutate_at(c(2, 3, 5), ~replace(., is.na(.), 0))
If you want to use replace_na
df %>% mutate_at(vars(-c(Date, thatCol)), tidyr::replace_na, 0)
Note that mutate_at() has been superseded by across() as of dplyr 1.0.0.
You have several options here based on data.table.
One of the coolest options: setnafill (version >= 1.12.4):
library(data.table)
setDT(df)
data.table::setnafill(df, fill = 0, cols = setdiff(colnames(df), c("Date", "thatCol")))
Note that your dataframe is updated by reference.
Another base solution:
to_change<-grep("^(this|col)",names(df))
df[to_change]<- sapply(df[to_change],function(x) replace(x,is.na(x),0))
df
# A tibble: 4 x 5
Date col1 thisCol thatCol col999
<date> <dbl> <dbl> <int> <dbl>
1 2020-01-01 1 0 25 99
2 2020-01-01 2 8 26 99
3 2020-01-01 3 0 27 99
4 NA 0 3 28 99
Data(I changed one value):
df <- structure(list(Date = structure(c(18262, 18262, 18262, NA), class = "Date"),
col1 = c(1L, 2L, 3L, NA), thisCol = c(NA, 8, NA, 3), thatCol = 25:28,
col999 = c(99, 99, 99, 99)), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame"))
replace works on a data.frame, so we can just do the replacement by index and update the original dataset
df[-c(1, 4)] <- replace(df[-c(1, 4)], is.na(df[-c(1, 4)]), 0)
Or using replace_na with across (from the new dplyr)
library(dplyr)
library(tidyr)
df %>%
mutate(across(-c(Date, thatCol), ~ replace_na(., 0)))
If you know the ones that you don't want to change, you could do it like this:
df <- tibble(Date = c(rep(as.Date("2020-01-01"), 3), NA),
col1 = 1:4,
thisCol = c(NA, 8, NA, 3),
thatCol = 25:28,
col999 = rep(99, 4))
#dplyr
df_nonreplace <- select(df, c("Date", "thatCol"))
df_replace <- df[ ,!names(df) %in% names(df_nonreplace)]
df_replace[is.na(df_replace)] <- 0
df <- cbind(df_nonreplace, df_replace)
> head(df)
Date thatCol col1 thisCol col999
1 2020-01-01 25 1 0 99
2 2020-01-01 26 2 8 99
3 2020-01-01 27 3 0 99
4 <NA> 28 4 3 99
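If you want to keep the original column order and avoid splitting and re-binding the data, a base R sketch along these lines may be enough. It rebuilds the example as a plain data.frame, and the names keep_na and targets are just illustrative. Using lapply() rather than sapply() keeps each column's own type.

```r
# Example data from the question, as a plain data.frame.
df <- data.frame(
  Date = as.Date(c("2020-01-01", "2020-01-01", "2020-01-01", NA)),
  col1 = 1:4,
  thisCol = c(NA, 8, NA, 3),
  thatCol = 25:28,
  col999 = rep(99, 4)
)

# Columns that must keep their NA values.
keep_na <- c("Date", "thatCol")

# Every other column gets NA replaced by 0, column by column.
targets <- setdiff(names(df), keep_na)
df[targets] <- lapply(df[targets], function(x) replace(x, is.na(x), 0))
df
```
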

Apply a function to all pairs in the same group

I want to apply a function to all pairs of items in the same group e.g.
Example input:
Group Item Value
A 1 89
A 2 76
A 3 2
B 4 21
B 5 10
The desired output is a vector of the function output for all items in the same group.
e.g. for arguments sake if the function was:
addnums=function(x,y){
x+y
}
Then the desired output would be:
165, 91, 78, 31
I have tried to do this using summarize in the dplyr package but this can only be used if the output is a single value.
We can split Value for each Group and then use combn to calculate sum for each pair.
sapply(split(df$Value, df$Group), combn, 2, sum)
#$A
#[1] 165 91 78
#$B
#[1] 31
If needed as one vector we can use unlist.
unlist(sapply(split(df$Value, df$Group), combn, 2, sum), use.names = FALSE)
#[1] 165 91 78 31
If you are interested in tidyverse solution using the same logic we can do
library(dplyr)
library(purrr)
df %>%
group_split(Group) %>%
map(~combn(.x %>% pull(Value), 2, sum)) %>% flatten_dbl
#[1] 165 91 78 31
We can use a group by option with data.table
library(data.table)
setDT(df1)[, combn(Value, 2, FUN = sum), Group]
# Group V1
#1: A 165
#2: A 91
#3: A 78
#4: B 31
If we want to use addnums from the OP's post
setDT(df1)[, combn(Value, 2, FUN = function(x) addnums(x[1], x[2])), Group]
# Group V1
#1: A 165
#2: A 91
#3: A 78
#4: B 31
Or using tidyverse
library(dplyr)
library(tidyr)
df1 %>%
group_by(Group) %>%
summarise(Sum = list(combn(Value, 2, FUN = sum))) %>%
unnest(Sum)
# A tibble: 4 x 2
# Group Sum
# <chr> <int>
#1 A 165
#2 A 91
#3 A 78
#4 B 31
Using addnums
df1 %>%
group_by(Group) %>%
summarise(Sum = list(combn(Value, 2, FUN =
function(x) addnums(x[1], x[2])))) %>%
unnest(Sum)
Or using base R with aggregate
aggregate(Value ~ Group, df1, FUN = function(x) combn(x, 2, FUN = sum))
data
df1 <- structure(list(Group = c("A", "A", "A", "B", "B"), Item = 1:5,
Value = c(89L, 76L, 2L, 21L, 10L)), class = "data.frame", row.names = c(NA,
-5L))
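One caveat with all of the combn() approaches: when a group contains a single value n, combn(n, 2) treats it as seq_len(n) and returns pairs from 1..n instead of nothing. A guarded base R sketch is shown below. The wrapper pair_apply() and the extra single-item group "C" are hypothetical additions for illustration only; addnums is the function from the question.

```r
# Function from the question.
addnums <- function(x, y) {
  x + y
}

# Question data plus a hypothetical single-item group "C".
df1 <- data.frame(
  Group = c("A", "A", "A", "B", "B", "C"),
  Item = 1:6,
  Value = c(89L, 76L, 2L, 21L, 10L, 7L)
)

# Hypothetical wrapper: apply f to every pair within one group,
# returning an empty vector when there are fewer than two items.
pair_apply <- function(values, f) {
  if (length(values) < 2) {
    return(numeric(0))
  }
  combn(values, 2, FUN = function(p) f(p[1], p[2]))
}

pair_sums <- unlist(
  lapply(split(df1$Value, df1$Group), pair_apply, f = addnums),
  use.names = FALSE
)
pair_sums
```
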
