Summation of money amounts in character format by group - r

I have a data frame that contains the monetary transactions among individuals. The transactions can be two-way, i.e. A can transfer money to B and B can also transfer money to A. The structure of the data frame looks like below:
From To Amount
A B $100
A C $40
A D $30
B A $25
B C $70
C A $190
C D $110
I want to sum the transaction amounts between each pair of individuals who have transacted with each other, regardless of direction. The result should look something like:
Individual_1 Individual_2 Sum
A B $125
A C $230
A D $30
B C $70
C D $110
I tried to utilize the grouping feature of the package dplyr but I think it does not apply to my case.

You can use pmin/pmax to put each From/To pair into a consistent order, then group by the ordered pair and sum the parsed Amount values.
library(dplyr)
df %>%
  group_by(col1 = pmin(From, To),
           col2 = pmax(From, To)) %>%
  summarise(Amount = sum(readr::parse_number(Amount)))
# col1 col2 Amount
# <chr> <chr> <dbl>
#1 A B 125
#2 A C 230
#3 A D 30
#4 B C 70
#5 C D 110
Using the same logic in base R, you can do:
aggregate(Amount ~ col1 + col2,
          transform(df, col1 = pmin(From, To), col2 = pmax(From, To),
                    Amount = as.numeric(sub('$', '', Amount, fixed = TRUE))), sum)
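To see why the pmin/pmax step makes the direction of a transfer irrelevant, here is a minimal sketch on a small made-up pair of vectors (from and to are illustrative names, not the question's data). For character vectors, pmin and pmax compare elementwise, so a transfer from A to B and one from B to A end up with the same ordered pair.

```r
# Illustrative vectors (hypothetical, not the question's data)
from <- c("A", "B", "C")
to   <- c("B", "A", "A")

# Elementwise minimum/maximum puts each pair in a consistent order
first  <- pmin(from, to)
second <- pmax(from, to)

# Rows 1 and 2 now share the key (A, B); row 3 becomes (A, C)
data.frame(from, to, first, second)
```

Grouping on first and second therefore treats both directions of a pair as one group.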
data
df <- structure(list(From = c("A", "A", "A", "B", "B", "C", "C"), To = c("B",
"C", "D", "A", "C", "A", "D"), Amount = c("$100", "$40", "$30",
"$25", "$70", "$190", "$110")), class = "data.frame", row.names = c(NA, -7L))

A solution using the tidyverse package. The key is to create a common grouping column in which the two individuals always appear in the same (sorted) order, so that A-to-B and B-to-A transfers fall into one group. dat2 is the final output.
library(tidyverse)
dat2 <- dat %>%
  mutate(Amount = as.numeric(str_remove(Amount, "\\$"))) %>%
  mutate(Group = map2_chr(From, To, ~str_c(sort(c(.x, .y)), collapse = "_"))) %>%
  group_by(Group) %>%
  summarize(Sum = sum(Amount, na.rm = TRUE)) %>%
  separate(Group, into = c("Individual_1", "Individual_2"), sep = "_") %>%
  mutate(Sum = str_c("$", Sum))
print(dat2)
# # A tibble: 5 x 3
# Individual_1 Individual_2 Sum
# <chr> <chr> <chr>
# 1 A B $125
# 2 A C $230
# 3 A D $30
# 4 B C $70
# 5 C D $110
Data
dat <- read.table(text = "From To Amount
A B $100
A C $40
A D $30
B A $25
B C $70
C A $190
C D $110",
header = TRUE)

A complete solution without packages, based on @RonakShah's great pmin/pmax approach. It uses list notation in aggregate (rather than formula notation), which allows the output columns to be named directly.
with(
  transform(d, a = as.numeric(gsub("\\D", "", Amount)), b = pmin(From, To), c = pmax(From, To)),
  aggregate(list(Sum = a), list(Individual_1 = b, Individual_2 = c), function(x)
    paste0("$", sum(x))))
# Individual_1 Individual_2 Sum
# 1 A B $125
# 2 A C $230
# 3 B C $70
# 4 A D $30
# 5 C D $110
Data:
d <- structure(list(From = c("A", "A", "A", "B", "B", "C", "C"), To = c("B",
"C", "D", "A", "C", "A", "D"), Amount = c("$100", "$40", "$30",
"$25", "$70", "$190", "$110")), class = "data.frame", row.names = c(NA,
-7L))
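One caveat with the gsub("\\D", "", Amount) step is that it removes every non-digit character, including a decimal point. The question's amounts are whole dollars, so this does not matter there. As a hedged sketch on made-up amounts (the amounts vector below is hypothetical), keeping the dot in the allowed character set preserves cents:

```r
# Hypothetical amounts with cents and a thousands separator
amounts <- c("$100", "$25.50", "$1,200.75")

# Stripping all non-digits silently drops the decimal point
digits_only <- as.numeric(gsub("\\D", "", amounts))

# Keeping digits and the dot preserves the fractional part
with_cents <- as.numeric(gsub("[^0-9.]", "", amounts))

digits_only  # 100, 2550, 120075
with_cents   # 100, 25.5, 1200.75
```

This assumes a dot is the decimal mark and commas are only thousands separators; other locales would need a different pattern.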

Related

How to replace all values in a column with another value?

Suppose I have a data frame df with two columns:
id category
A 1
B 4
C 3
D 1
I want to replace the numbers in category with the following: 1 = "A", 2 = "B", 3 = "C", 4 = "D".
I.e. the output should be
id category
A A
B D
C C
D A
Does anyone know how to do this?
Here I propose three methods to achieve your goal.
Base R
If you have a named vector that maps old values to new ones, you can use match to look up each category value among the vector's names and replace the column with the corresponding elements.
vec <- c("1" = "A", "2" = "B", "3" = "C", "4" = "D")
df$category <- vec[match(df$category, names(vec))]
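An equivalent way to write the same lookup, shown here as a small self-contained sketch, is to index the named vector directly by the category values converted to character. The data frame below is rebuilt from the question's example (it is named df_example here only to keep the sketch separate), and unname is used just to drop the names that indexing carries over.

```r
# Input rebuilt from the question's example
df_example <- data.frame(id = c("A", "B", "C", "D"),
                         category = c(1, 4, 3, 1),
                         stringsAsFactors = FALSE)

# Named lookup vector: names are the old codes, values the replacements
vec <- c("1" = "A", "2" = "B", "3" = "C", "4" = "D")

# Index by name; codes missing from vec would become NA
df_example$category <- unname(vec[as.character(df_example$category)])

df_example
```

Indexing by name gives the same result as match here; match is the more explicit form when the lookup keys are stored separately from the values.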
dplyr
Use a case_when statement to test the values in category and assign the new string for each case.
library(dplyr)
df %>% mutate(category = case_when(category == 1 ~ "A",
                                   category == 2 ~ "B",
                                   category == 3 ~ "C",
                                   category == 4 ~ "D",
                                   TRUE ~ NA_character_))
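left_join from dplyr
Or, if you have a data frame with two columns specifying the values for conversion, you can left_join them. Here, the conversion data frame is created with tibble::enframe. Because enframe stores the names as character, category is converted to character before joining, and the joined value column then replaces it.
left_join(mutate(df, category = as.character(category)), tibble::enframe(vec),
          by = c("category" = "name")) %>% transmute(id, category = value)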
Output
id category
1 A A
2 B D
3 C C
4 D A
Data
df <- structure(list(id = c("A", "B", "C", "D"), category = c(1,
4, 3, 1)), row.names = c(NA, -4L), class = "data.frame")
A possible solution, which works here because the codes 1 to 4 correspond to positions in the built-in LETTERS vector:
library(tidyverse)
df %>%
mutate(category = LETTERS[category])
#> id category
#> 1 A A
#> 2 B D
#> 3 C C
#> 4 D A

Filter rows which has at least two of particular values

I have a data frame like this.
df
Languages Order Machine Company
[1] W,X,Y,Z,H,I D D B
[2] W,X B A G
[3] W,I E B A
[4] H,I B C B
[5] W G G C
I want to get the number of rows where Languages contains at least 2 of the 3 values W, H, I.
The result should be 3, because rows 1, 3 and 4 each contain at least 2 of the 3 values W, H, I.
You can use strsplit on df$Languages and intersect each result with W, H, I. Then take the lengths of these intersections and count how many are greater than 1. Using lapply rather than sapply keeps the result a list even when all intersections happen to have the same length.
sum(lengths(lapply(strsplit(df$Languages, ",", TRUE), intersect, c("W","H","I"))) > 1)
#[1] 3
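You can also count, for each row, how many of the target values are present, and then count the rows with at least 2:
sum(sapply(strsplit(df$Languages, ','), function(x)
  sum(c("W","H","I") %in% x) >= 2))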
#[1] 3
data
df<- structure(list(Languages = c("W,X,Y,Z,H,I", "W,X", "W,I", "H,I",
"W"), Order = c("D", "B", "E", "B", "G"), Machine = c("D", "A",
"B", "C", "G"), Company = c("B", "G", "A", "B", "C")),
class = "data.frame", row.names = c(NA, -5L))
A tidyverse approach, which returns the matching rows rather than their count:
library(tidyverse)
df %>% filter(map_int(str_split(Languages, ','), ~ sum(.x %in% c('W', 'H', 'I'))) >= 2)
Languages Order Machine Company
1 W,X,Y,Z,H,I D D B
2 W,I E B A
3 H,I B C B
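The counting logic in the answers above can be sketched in base R so that both the count and the matching rows come from one intermediate vector. The names targets and hits are introduced here for illustration only; the data are the Languages values from the question.

```r
# Languages column from the question (other columns omitted for brevity)
df <- data.frame(Languages = c("W,X,Y,Z,H,I", "W,X", "W,I", "H,I", "W"),
                 stringsAsFactors = FALSE)

targets <- c("W", "H", "I")

# For each row, count how many target values appear among its languages
hits <- vapply(strsplit(df$Languages, ",", fixed = TRUE),
               function(x) sum(targets %in% x),
               integer(1))

n_rows   <- sum(hits >= 2)     # number of qualifying rows
matching <- df[hits >= 2, , drop = FALSE]  # the qualifying rows themselves
```

vapply is used instead of sapply so the result is guaranteed to be an integer vector with one element per row.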

how to add a column to identify specific combination of values in R?

I have a data set with several columns (more than 20), two of which contain the subject names. I would like to add another column containing a number that identifies each combination of the two subjects, regardless of their order.
Here is an example with only the 2 columns of names (I don't include the others for convenience):
ID1 ID2
A B
A C
A B
B C
A B
B A
C B
And here is what I would like to create:
ID1 ID2 CODE
A B 1
A C 2
A B 1
B C 3
A B 1
B A 1
C B 3
I am fairly new to R and I think it can be done with stringr, but I am not sure how.
Thanks for the help!
Simo
# Restrict to the two ID columns so other columns do not enter the key
df$CODE <- as.integer(
  factor(
    apply(df[c("ID1", "ID2")], 1, function(x) paste0(sort(x), collapse = ""))
  )
)
# ID1 ID2 CODE
# 1 A B 1
# 2 A C 2
# 3 A B 1
# 4 B C 3
# 5 A B 1
# 6 B A 1
# 7 C B 3
Data
df <- data.frame(
ID1 = c("A", "A", "A", "B", "A", "B", "C"),
ID2 = c("B", "C", "B", "C", "B", "A", "B")
)
Try this:
library(dplyr)
#Code
new <- df %>% rowwise() %>%
  mutate(Var = paste0(sort(c(ID1, ID2)), collapse = '')) %>%
  group_by(Var) %>%
  mutate(CODE = cur_group_id()) %>%
  ungroup() %>%
  select(-Var)
Output:
# A tibble: 7 x 3
ID1 ID2 CODE
<chr> <chr> <int>
1 A B 1
2 A C 2
3 A B 1
4 B C 3
5 A B 1
6 B A 1
7 C B 3
Some data used:
#Data
df <- structure(list(ID1 = c("A", "A", "A", "B", "A", "B", "C"), ID2 = c("B",
"C", "B", "C", "B", "A", "B")), class = "data.frame", row.names = c(NA,
-7L))
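A vectorised variant of the same idea, given as a sketch rather than a drop-in replacement: build an order-independent key with pmin/pmax and number the keys with match against their unique values. The variable name key is illustrative. Unlike the factor-based answer, this numbers the combinations by first appearance rather than alphabetically, which gives the same codes for this particular data.

```r
# Data from the question
df <- data.frame(
  ID1 = c("A", "A", "A", "B", "A", "B", "C"),
  ID2 = c("B", "C", "B", "C", "B", "A", "B"),
  stringsAsFactors = FALSE
)

# Order-independent key: smaller name first, larger name second
key <- paste(pmin(df$ID1, df$ID2), pmax(df$ID1, df$ID2), sep = "_")

# Number each distinct key in order of first appearance
df$CODE <- match(key, unique(key))

df
```

Using a separator such as "_" in the key avoids ambiguity when subject names have more than one character.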

left_join in a for loop with different columns names

I have a data.frame called a whose structure is similar to:
a <- data.frame(X1=c("A", "B", "C", "A", "C", "D"),
X2=c("B", "C", "D", "A", "B", "A"),
X3=c("C", "D", "A", "B", "A", "B")
)
And I have another data.frame, b:
b <- data.frame(Xn=c("A", "B", "C", "D"),
Feature=c("some", "more", "what", "why"))
I want to add the Feature values from b to a, so that X1, X2 and X3 each get a corresponding feature column in a. In other words, the column names of a become:
colnames(a) <- c("X1", "X2", "X3", "Features1", "Features2", "Features3")
How can I do this using left_join in a for loop?
In base R, we can unlist a dataframe and match it with b$Xn to get corresponding Feature value. We can cbind this dataframe to original dataframe to get final answer.
temp <- a
temp[] <- b$Feature[match(unlist(temp), b$Xn)]
names(temp) <- paste0('Feature', seq_along(temp))
cbind(a, temp)
# X1 X2 X3 Feature1 Feature2 Feature3
#1 A B C some more what
#2 B C D more what why
#3 C D A what why some
#4 A A B some some more
#5 C B A what more some
#6 D A B why some more
In tidyverse, we can get the data in long format, join the data and get it back to wide format.
library(dplyr)
library(tidyr)
a %>%
  mutate(row = row_number()) %>%
  pivot_longer(cols = -row) %>%
  left_join(b, by = c('value' = 'Xn')) %>%
  select(-value) %>%
  pivot_wider(names_from = name, values_from = Feature) %>%
  select(-row) %>%
  rename_all(~paste0('Feature', seq_along(.))) %>%
  bind_cols(a, .)
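This can be done by using mutate_all to recode all of the columns in a. Note that funs() has been deprecated since dplyr 0.8.0, so this code may warn or fail on recent versions:
library(tidyverse)
a %>%
  mutate_all(funs(feat = recode(., !!!set_names(as.character(b$Feature), b$Xn))))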
X1 X2 X3 X1_feat X2_feat X3_feat
1 A B C some more what
2 B C D more what why
3 C D A what why some
4 A A B some some more
5 C B A what more some
6 D A B why some more
You can add a rename_at to get the desired names:
a %>%
  mutate_all(funs(f = recode(., !!!set_names(as.character(b$Feature), b$Xn)))) %>%
  rename_at(vars(matches("f")), ~gsub(".([0-9]).*", "Feature\\1", .))
X1 X2 X3 Feature1 Feature2 Feature3
1 A B C some more what
2 B C D more what why
3 C D A what why some
4 A A B some some more
5 C B A what more some
6 D A B why some more
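Since the question asks specifically about left_join inside a for loop, here is one way that could look, as a sketch under the assumption that every value in a appears exactly once in b$Xn. The names result and lookup are introduced for illustration. In each iteration, b is renamed so that its key column matches the current column of a and its Feature column gets the numbered name.

```r
library(dplyr)

a <- data.frame(X1 = c("A", "B", "C", "A", "C", "D"),
                X2 = c("B", "C", "D", "A", "B", "A"),
                X3 = c("C", "D", "A", "B", "A", "B"),
                stringsAsFactors = FALSE)
b <- data.frame(Xn = c("A", "B", "C", "D"),
                Feature = c("some", "more", "what", "why"),
                stringsAsFactors = FALSE)

result <- a
for (i in seq_along(names(a))) {
  col <- names(a)[i]
  # Rename b's columns: key -> current column of a, Feature -> Features<i>
  lookup <- setNames(b, c(col, paste0("Features", i)))
  # Join on the current column; each pass appends one feature column
  result <- left_join(result, lookup, by = col)
}

result
```

The loop is easy to read, but the match-based base R answer does the same work in one vectorised step and is likely the better fit for many columns.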

Convert information from rows to new columns

Is there a way in R to place every three values in the column "V" (below) into new columns? In other words, I need to reshape the data from long to wide, but only to three columns, where the values are what appears in column V. Below is a demonstration.
Thank you in advance!
data = structure(list(Key = c(200, 200, 200, 200, 200, 200, 300, 300,
300, 300, 300, 300, 400, 400, 400, 400, 400, 400),
V = c("a", "b", "c", "b", "d", "c", "d", "b", "c", "a", "f", "c", "d", "b",
"c", "a", "b", "c")),
row.names = c(NA, 18L),
class = "data.frame")
Here is one option, using dplyr and tidyr:
library(dplyr)
library(tidyr)
data %>%
  group_by(Key) %>%
  mutate(
    grp = gl(n() / 3, 3),
    col = c("x", "y", "z")[(row_number() + 2) %% 3 + 1]) %>%
  group_by(Key, grp) %>%
  spread(col, V) %>%
  ungroup() %>%
  select(-grp)
## A tibble: 6 x 4
# Key x y z
# <dbl> <chr> <chr> <chr>
#1 200 a b c
#2 200 b d c
#3 300 d b c
#4 300 a f c
#5 400 d b c
#6 400 a b c
Note: This assumes that the number of entries per Key is divisible by 3.
Instead of grp = gl(n() / 3, 3) you can also use grp = rep(1:(n() / 3), each = 3).
Update
In response to your comments, let's create sample data by removing some rows from data such that for Key = 200 and Key = 300 we don't have a multiple of 3 V entries.
data2 <- data %>% slice(-c(1, 8))
Then we can do
data2 %>%
  group_by(Key) %>%
  mutate(grp = gl(ceiling(n() / 3), 3)[1:n()]) %>%
  group_by(Key, grp) %>%
  mutate(col = c("x", "y", "z")[1:n()]) %>%
  spread(col, V) %>%
  ungroup() %>%
  select(-grp)
## A tibble: 6 x 4
# Key x y z
# <dbl> <chr> <chr> <chr>
#1 200 b c b
#2 200 d c NA
#3 300 d c a
#4 300 f c NA
#5 400 d b c
#6 400 a b c
Note how "missing" values are filled with NA.
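For the simpler case where every Key has a multiple of 3 entries, the same reshaping can be sketched in base R by filling a three-column matrix row by row within each Key. This is an alternative to the dplyr/tidyr answer rather than part of it; the name wide is illustrative, and the data below are the first two Keys from the question.

```r
# First two Keys of the question's data (6 entries each, divisible by 3)
data <- data.frame(
  Key = rep(c(200, 300), each = 6),
  V = c("a", "b", "c", "b", "d", "c", "d", "b", "c", "a", "f", "c"),
  stringsAsFactors = FALSE
)

# Within each Key, lay the values out three per row
wide <- do.call(rbind, lapply(split(data, data$Key), function(g) {
  m <- matrix(g$V, ncol = 3, byrow = TRUE,
              dimnames = list(NULL, c("x", "y", "z")))
  data.frame(Key = g$Key[1], m, stringsAsFactors = FALSE)
}))
rownames(wide) <- NULL

wide
```

If a Key's count is not a multiple of 3, matrix() recycles values with a warning instead of padding with NA, so the grouped approach above is the safer choice for ragged data.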

Resources