Mutate within a for loop

I have a dataframe like this:
dataframe <- structure(list(a = c(1, 3, 4, 6, 3, 2, 5, 1), b = c(1, 3, 4,
2, 6, 7, 2, 6), c = c(6, 3, 6, 5, 3, 6, 5, 3), d = c(6, 2, 4,
5, 3, 7, 2, 6), e = c(1, 2, 4, 5, 6, 7, 6, 3), f = c(2, 3, 4,
2, 2, 7, 5, 2)), .Names = c("Love_ABC", "Love_CNN", "Hate_ABC", "Hate_CNN", "Love_CNBC", "Hate_CNBC"), row.names = c(NA,
8L), class = "data.frame")
I have made the following for loop
channels = c("ABC", "CNN", "CNBC")
for (channel in channels) {
dataframe <- dataframe %>%
mutate(ALL_channel = Love_channel + Hate_channel)
}
But when I run the for loop, R tells me "object 'Love_channel' not found". Have I done something wrong in the for loop?

Here's a way with rlang. Note, reshaping the data is likely more straightforward. Non-standard evaluation (NSE) is a complicated topic.
for (channel in channels) {
DF <- DF %>%
mutate(!!sym(paste0("ALL_", channel)) := !!sym(paste0("Love_", channel)) + !!sym(paste0("Hate_", channel)))
}
DF
## Love_ABC Love_CNN Hate_ABC Hate_CNN Love_CNBC Hate_CNBC ALL_ABC ALL_CNN ALL_CNBC
## 1 1 1 6 6 1 2 7 7 3
## 2 3 3 3 2 2 3 6 5 5
## 3 4 4 6 4 4 4 10 8 8
## 4 6 2 5 5 5 2 11 7 7
## 5 3 6 3 3 6 2 6 9 8
## 6 2 7 6 7 7 7 8 14 14
## 7 5 2 5 2 6 5 10 4 11
## 8 1 6 3 6 3 2 4 12 5
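The loop in the question fails because mutate() looks for a column literally named Love_channel; the loop variable is never substituted into the name. As an alternative to !!sym(), the same loop can be sketched with the .data pronoun, which looks a column up by a string. This is a minimal sketch, assuming the question's data is stored in dataframe; the "ALL_{channel}" name on the left of := relies on glue-style naming, which assumes a reasonably recent dplyr and rlang.

```r
library(dplyr)

# Data from the question, written out with its final column names
dataframe <- data.frame(
  Love_ABC  = c(1, 3, 4, 6, 3, 2, 5, 1),
  Love_CNN  = c(1, 3, 4, 2, 6, 7, 2, 6),
  Hate_ABC  = c(6, 3, 6, 5, 3, 6, 5, 3),
  Hate_CNN  = c(6, 2, 4, 5, 3, 7, 2, 6),
  Love_CNBC = c(1, 2, 4, 5, 6, 7, 6, 3),
  Hate_CNBC = c(2, 3, 4, 2, 2, 7, 5, 2)
)

channels <- c("ABC", "CNN", "CNBC")

for (channel in channels) {
  dataframe <- dataframe %>%
    mutate(
      # .data[[...]] fetches a column by its name given as a string
      "ALL_{channel}" := .data[[paste0("Love_", channel)]] +
        .data[[paste0("Hate_", channel)]]
    )
}

dataframe
```

The result should carry the same ALL_ABC, ALL_CNN and ALL_CNBC columns as the output shown above.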

This is a solution with dplyr and tidyr:
library(tidyr)
library(dplyr)
dataframe <- dataframe %>%
tibble::rowid_to_column()
dataframe %>%
pivot_longer(-rowid, names_to = c(NA, "channel"), names_sep = "_") %>%
pivot_wider(names_from = channel, names_prefix = "ALL_", values_from = value, values_fn = sum) %>%
right_join(dataframe, by = "rowid") %>%
select(-rowid)
#> # A tibble: 8 x 9
#> ALL_ABC ALL_CNN ALL_CNBC Love_ABC Love_CNN Hate_ABC Hate_CNN Love_CNBC Hate_CNBC
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 7 7 3 1 1 6 6 1 2
#> 2 6 5 5 3 3 3 2 2 3
#> 3 10 8 8 4 4 6 4 4 4
#> 4 11 7 7 6 2 5 5 5 2
#> 5 6 9 8 3 6 3 3 6 2
#> 6 8 14 14 2 7 6 7 7 7
#> 7 10 4 11 5 2 5 2 6 5
#> 8 4 12 5 1 6 3 6 3 2
The idea is to reshape the data so the sums become easy to compute, then join the result back to the initial dataframe:
1. Start by uniquely identifying each row with a rowid.
2. Reshape with pivot_longer so that all values sit in one column. In this step you also split the names Love/Hate_channel in two and drop the Love/Hate part, since only the channel matters (that is what the NA in names_to does).
3. Reshape again with pivot_wider to get one column per channel. In this step the former Love and Hate values are summed for each rowid and channel (that is what values_fn = sum does), and names_prefix = "ALL_" adds the prefix needed to match the expected column names.
4. With right_join, add the sums back to the original dataframe. The rowid is no longer needed, so remove it.
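To make the two reshaping steps easier to follow, here is a small sketch that keeps the intermediate results. The two-row dataframe small and the names long and sums are made up for illustration; only the pivot_longer and pivot_wider calls come from the answer above.

```r
library(tidyr)

# Hypothetical two-row excerpt with a rowid already attached
small <- data.frame(
  rowid    = 1:2,
  Love_ABC = c(1, 3),
  Hate_ABC = c(6, 3),
  Love_CNN = c(1, 3),
  Hate_CNN = c(6, 2)
)

# Step 2: one row per (rowid, channel, value); the Love/Hate part is dropped
long <- pivot_longer(small, -rowid,
                     names_to = c(NA, "channel"), names_sep = "_")

# Step 3: one column per channel; the duplicated (rowid, channel) pairs are summed
sums <- pivot_wider(long,
                    names_from = channel, names_prefix = "ALL_",
                    values_from = value, values_fn = sum)

long
sums
```

Printing long shows that each rowid appears twice per channel, which is why values_fn = sum is needed in the second step.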

Related

Count how many rows have the same ID and add the number in an new column

My dataframe contains data about political careers, such as a unique identifier column (called ui) for each politician and the electoral term (called electoral_term) in which they were elected. Since a politician can be elected in multiple electoral terms, multiple rows can contain the same ui.
Now I would like to add another column to my dataframe that counts how many times the politician got re-elected.
For example, the politician with ui=1 was re-elected 2 times, since he occurred in 3 electoral terms.
I already tried
df %>% count(ui)
But that only gives out a table which can't be added into my dataframe.
Thanks in advance!

We may use base R
df$reelected <- with(df, ave(ui, ui, FUN = length)-1)
-output
> df
ui electoral reelected
1 1 1 2
2 1 2 2
3 1 3 2
4 2 2 0
5 3 7 1
6 3 9 1
data
df <- structure(list(ui = c(1, 1, 1, 2, 3, 3), electoral = c(1, 2,
3, 2, 7, 9)), class = "data.frame", row.names = c(NA, -6L))

df <- tibble::tribble(~ui, ~electoral, 1, 1, 1, 2, 1, 3, 2, 2, 3, 7, 3, 9)
library(dplyr)
df |>
add_count(ui, name = "re_elected") |>
mutate(re_elected = re_elected - 1)
# A tibble: 6 × 3
ui electoral re_elected
<dbl> <dbl> <dbl>
1 1 1 2
2 1 2 2
3 1 3 2
4 2 2 0
5 3 7 1
6 3 9 1

library(tidyverse)
df %>%
group_by(ui) %>%
mutate(re_elected = n() - 1)
# A tibble: 6 × 3
# Groups: ui [3]
ui electoral re_elected
<dbl> <dbl> <dbl>
1 1 1 2
2 1 2 2
3 1 3 2
4 2 2 0
5 3 7 1
6 3 9 1
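The count(ui) table from the question can also be used directly by joining it back onto the original rows. The following is a minimal sketch of that idea; the intermediate names counts, terms and result are chosen here for illustration.

```r
library(dplyr)

df <- data.frame(ui = c(1, 1, 1, 2, 3, 3),
                 electoral = c(1, 2, 3, 2, 7, 9))

# One row per politician with the number of terms served
counts <- df %>% count(ui, name = "terms")

# Join the per-politician count back onto every original row
result <- df %>%
  left_join(counts, by = "ui") %>%
  mutate(re_elected = terms - 1L) %>%
  select(-terms)

result
```

This does the same work as add_count(), just in two explicit steps, which can help when the summary table is needed on its own as well.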

R dataframe with special cumsum

I have a dataframe like this:
df <- data.frame(grp = c(rep("a", 5), rep("b", 5)), t = c(1:5, 1:5), value = c(-1, 5, 9, -15, 6, 5, 1, 7, -11, 9))
# Limits for desired cumulative sum (CumSum)
maxCumSum <- 8
minCumSum <- 0
What I would like to calculate is a cumulative sum of value by group (grp) within the values of maxCumSum and minCumSum. The respective table dt2 should look something like this:
grp t value CumSum
a 1 -1 0
a 2 5 5
a 3 9 8
a 4 -15 0
a 5 6 6
b 1 5 5
b 2 1 6
b 3 7 8
b 4 -11 0
b 5 9 8
Think of CumSum as a water storage which has a certain maximum capacity and whose level cannot sink below zero.
The normal cumsum obviously does not do the trick, since it has no upper or lower limit. Does anyone have a suggestion for how to achieve this? In the real dataframe there are of course more than 2 groups and far more than 5 time steps.
Many thanks!

You can write a function that adds each value to the running total and clamps the result between minCumSum and maxCumSum, then apply it cumulatively within each group with Reduce:
df <- data.frame(grp = c(rep("a", 5), rep("b", 5)), t = c(1:5, 1:5), value = c(-1, 5, 9, -15, 6, 5, 1, 7, -11, 9))
library(dplyr)
maxCumSum <- 8
minCumSum <- 0
f <- function(x, y) max(min(x + y, maxCumSum), minCumSum)
df %>%
group_by(grp) %>%
mutate(CumSum = Reduce(f, value, 0, accumulate = TRUE)[-1])
#> # A tibble: 10 × 4
#> # Groups: grp [2]
#> grp t value CumSum
#> <chr> <int> <dbl> <dbl>
#> 1 a 1 -1 0
#> 2 a 2 5 5
#> 3 a 3 9 8
#> 4 a 4 -15 0
#> 5 a 5 6 6
#> 6 b 1 5 5
#> 7 b 2 1 6
#> 8 b 3 7 8
#> 9 b 4 -11 0
#> 10 b 5 9 8
Created on 2022-07-04 by the reprex package (v2.0.1)
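For readers who prefer an explicit loop, the same clamped running total can be sketched in base R. The helper name bounded_cumsum and its arguments are made up for this example; like the Reduce version, it assumes the storage level starts at 0.

```r
df <- data.frame(grp = c(rep("a", 5), rep("b", 5)),
                 t = c(1:5, 1:5),
                 value = c(-1, 5, 9, -15, 6, 5, 1, 7, -11, 9))

# Hypothetical helper: running sum that is clamped to [lower, upper] at each step
bounded_cumsum <- function(x, lower = 0, upper = 8, start = 0) {
  out <- numeric(length(x))
  level <- start
  for (i in seq_along(x)) {
    level <- max(min(level + x[i], upper), lower)
    out[i] <- level
  }
  out
}

# ave() applies the helper separately within each group and keeps the row order
df$CumSum <- ave(df$value, df$grp, FUN = bounded_cumsum)
df
```

The clamping has to happen inside the loop: clamping an ordinary cumsum() afterwards would give different results, because an overflow at one step must not carry over into later steps.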

Convert list of named vectors of different length to tibble

I have the following list
example <- list(a = c(1, 2, 3),
b = c(2, 3),
c = c(3, 4, 5, 6))
that I'd like to transform into the following tibble
# A tibble: 9 × 2
name value
<chr> <dbl>
1 a 1
2 a 2
3 a 3
4 b 2
5 b 3
6 c 3
7 c 4
8 c 5
9 c 6
I've found multiple StackOverflow questions on this subject like here, here or here, but none addresses this particular case, where the name of the vector is not expected to become a column name.
I managed to achieve the desired result with a good old loop like below, but I'm looking for a faster and more elegant way.
library(dplyr)
example_list <- list(a = c(1, 2, 3),
b = c(2, 3),
c = c(3, 4, 5, 6))
example_tibble <- tibble()
for (i in 1:length(example_list)) {
example_tibble <- example_tibble %>%
bind_rows(as_tibble(example_list[[i]]) %>%
mutate(name = names(example_list)[[i]]))
}
example_tibble <- example_tibble %>%
relocate(name)

Try stack
> stack(example)
values ind
1 1 a
2 2 a
3 3 a
4 2 b
5 3 b
6 3 c
7 4 c
8 5 c
9 6 c
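stack() returns the columns as values and ind, with ind stored as a factor. To match the layout asked for in the question, the columns can be renamed and reordered. This is a small base R sketch; the name result is chosen here for illustration, and wrapping it in tibble::as_tibble() would be one way to get an actual tibble.

```r
example <- list(a = c(1, 2, 3),
                b = c(2, 3),
                c = c(3, 4, 5, 6))

stacked <- stack(example)

# Put the names first, as character, and rename both columns
result <- data.frame(name = as.character(stacked$ind),
                     value = stacked$values)
result
```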

example <- list(a = c(1, 2, 3),
b = c(2, 3),
c = c(3, 4, 5, 6))
library(tidyverse)
enframe(example) %>%
unnest(value)
#> # A tibble: 9 x 2
#> name value
#> <chr> <dbl>
#> 1 a 1
#> 2 a 2
#> 3 a 3
#> 4 b 2
#> 5 b 3
#> 6 c 3
#> 7 c 4
#> 8 c 5
#> 9 c 6
Created on 2021-11-04 by the reprex package (v2.0.1)

Creating a column based on existing column where new column has values plus or minus certain value of old one

I am trying to create a new column whose values are the old column's values plus or minus some fixed number (or left unchanged). For example, my old column is a and my new column is b.
data = data.frame(a = 2:11)
new_data = data.frame(a = 2:11, b = c(1, 4, 5, 5, 6, 8, 9, 8, 11, 12))
new_data
#> a b
#> 1 2 1
#> 2 3 4
#> 3 4 5
#> 4 5 5
#> 5 6 6
#> 6 7 8
#> 7 8 9
#> 8 9 8
#> 9 10 11
#> 10 11 12

data$b <- data$a + sample(c(0, -1, +1), nrow(data), replace = TRUE)
So if the fixed number is, say, x, do this:
x <- 1
data$b <- data$a + sample(c(0, -1*x, x), nrow(data), replace = TRUE)
Edit based on the requirements stated in the comments below: use pmin and pmax to keep b between 2 and 11. The seed is fixed so that the result is reproducible.
library(dplyr)
set.seed(19)
data %>% mutate(b = pmin(11, pmax(2, a + sample(-1:1, nrow(.), TRUE)))) %>% pull(b) %>% cat
2 3 4 6 5 7 7 10 9 11
#otherwise
set.seed(19)
data %>% mutate(b = a + sample(-1:1, nrow(.), T))
a b
1 2 1
2 3 3
3 4 4
4 5 6
5 6 5
6 7 7
7 8 7
8 9 10
9 10 9
10 11 12
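The clamping step can also be written without hard-coding 2 and 11. The sketch below uses base R only and takes the bounds from the range of a, which is an assumption about what the limits are meant to be; the seed value is arbitrary.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

data <- data.frame(a = 2:11)
x <- 1

# Draw -x, 0 or +x for each row
offset <- sample(c(-x, 0, x), nrow(data), replace = TRUE)

# pmax() raises anything below the minimum, pmin() lowers anything above the maximum
data$b <- pmin(max(data$a), pmax(min(data$a), data$a + offset))
data
```

Every value of b then stays within x of a and inside the range of a, whatever the random draw turns out to be.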

Using spread with duplicate identifiers for rows

I have a long-form dataframe that has multiple entries for the same date and person.
jj <- data.frame(month=rep(1:3,4),
student=rep(c("Amy", "Bob"), each=6),
A=c(9, 7, 6, 8, 6, 9, 3, 2, 1, 5, 6, 5),
B=c(6, 7, 8, 5, 6, 7, 5, 4, 6, 3, 1, 5))
I want to convert it to wide form and make it like this:
month Amy.A Bob.A Amy.B Bob.B
1
2
3
1
2
3
1
2
3
1
2
3
My question is very similar to this. I have used the code given in the answer:
kk <- jj %>%
gather(variable, value, -(month:student)) %>%
unite(temp, student, variable) %>%
spread(temp, value)
but it gives following error:
Error: Duplicate identifiers for rows (1, 4), (2, 5), (3, 6), (13, 16), (14, 17), (15, 18), (7, 10), (8, 11), (9, 12), (19, 22), (20, 23), (21, 24)
Thanks in advance.
Note: I don't want to delete multiple entries.

Your attempt was missing an id column to tell the duplicate rows apart. Here is a solution using dplyr and tidyr:
jj %>%
gather(variable, value, -(month:student)) %>%
unite(temp, student, variable) %>%
group_by(temp) %>%
mutate(id=1:n()) %>%
spread(temp, value)
# A tibble: 6 x 6
# month id Amy_A Amy_B Bob_A Bob_B
# * <int> <int> <dbl> <dbl> <dbl> <dbl>
# 1 1 1 9 6 3 5
# 2 1 4 8 5 5 3
# 3 2 2 7 7 2 4
# 4 2 5 6 6 6 1
# 5 3 3 6 8 1 6
# 6 3 6 9 7 5 5
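The "Duplicate identifiers" error arises because each month/student pair occurs more than once, so spread cannot tell which value belongs in which cell. A quick way to confirm this before reshaping is to count the key combinations. This is a minimal sketch; the name dupes is chosen here for illustration.

```r
library(dplyr)

jj <- data.frame(month = rep(1:3, 4),
                 student = rep(c("Amy", "Bob"), each = 6),
                 A = c(9, 7, 6, 8, 6, 9, 3, 2, 1, 5, 6, 5),
                 B = c(6, 7, 8, 5, 6, 7, 5, 4, 6, 3, 1, 5))

# Key combinations that appear more than once
dupes <- jj %>%
  count(month, student) %>%
  filter(n > 1)

dupes
```

Here every month/student pair appears twice, which is why an extra id column (or an aggregation function) is needed before the data can be widened.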

The issue is the two columns for both A and B. If we can make that one value column, we can spread the data as you would like. Take a look at the output for jj_melt when you use the code below.
library(reshape2)
jj_melt <- melt(jj, id=c("month", "student"))
jj_spread <- dcast(jj_melt, month ~ student + variable, value.var="value", fun=sum)
# month Amy_A Amy_B Bob_A Bob_B
# 1 1 17 11 8 8
# 2 2 13 13 8 5
# 3 3 15 15 6 11
I won't mark this as a duplicate since the other question did not summarize by sum, but the data.table answer could help with one additional argument, fun=sum:
library(data.table)
dcast(setDT(jj), month ~ student, value.var=c("A", "B"), fun=sum)
# month A_sum_Amy A_sum_Bob B_sum_Amy B_sum_Bob
# 1: 1 17 8 11 8
# 2: 2 13 8 13 5
# 3: 3 15 6 15 11
If you would like to use the tidyr solution, combine it with dcast to summarize by sum.
jj <- as.data.frame(jj)  # undo the earlier setDT(jj) so jj is a plain data frame again
library(tidyr)
jj %>%
gather(variable, value, -(month:student)) %>%
unite(temp, student, variable) %>%
dcast(month ~ temp, fun=sum)
# month Amy_A Amy_B Bob_A Bob_B
# 1 1 17 11 8 8
# 2 2 13 13 8 5
# 3 3 15 15 6 11
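If summing the duplicates is acceptable, the same table can probably be obtained with tidyr alone, without dcast. The sketch below assumes a tidyr version in which pivot_wider accepts a single function for values_fn; note that the column names come out as A_Amy rather than Amy_A.

```r
library(tidyr)

jj <- data.frame(month = rep(1:3, 4),
                 student = rep(c("Amy", "Bob"), each = 6),
                 A = c(9, 7, 6, 8, 6, 9, 3, 2, 1, 5, 6, 5),
                 B = c(6, 7, 8, 5, 6, 7, 5, 4, 6, 3, 1, 5))

# Duplicated month/student pairs are collapsed by summing
jj_sum <- pivot_wider(jj,
                      names_from = student,
                      values_from = c(A, B),
                      values_fn = sum)
jj_sum
```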
Edit
Based on your new requirements, I have added an activity column.
library(dplyr)
jj %>% group_by(month, student) %>%
mutate(id=1:n()) %>%
melt(id=c("month", "id", "student")) %>%
dcast(... ~ student + variable, value.var="value")
# month id Amy_A Amy_B Bob_A Bob_B
# 1 1 1 9 6 3 5
# 2 1 2 8 5 5 3
# 3 2 1 7 7 2 4
# 4 2 2 6 6 6 1
# 5 3 1 6 8 1 6
# 6 3 2 9 7 5 5
The other solutions can also be used. Here I added an optional expression to arrange the final output by activity number:
library(tidyr)
jj %>%
gather(variable, value, -(month:student)) %>%
unite(temp, student, variable) %>%
group_by(temp) %>%
mutate(id=1:n()) %>%
dcast(... ~ temp) %>%
arrange(id)
# month id Amy_A Amy_B Bob_A Bob_B
# 1 1 1 9 6 3 5
# 2 2 2 7 7 2 4
# 3 3 3 6 8 1 6
# 4 1 4 8 5 5 3
# 5 2 5 6 6 6 1
# 6 3 6 9 7 5 5
The data.table syntax is compact because it allows for multiple value.var columns and will take care of the spread for us. We can then skip the melt -> cast process.
library(data.table)
setDT(jj)[, activityID := rowid(student)]
dcast(jj, ... ~ student, value.var=c("A", "B"))
# month activityID A_Amy A_Bob B_Amy B_Bob
# 1: 1 1 9 3 6 5
# 2: 1 4 8 5 5 3
# 3: 2 2 7 2 7 4
# 4: 2 5 6 6 6 1
# 5: 3 3 6 1 8 6
# 6: 3 6 9 5 7 5

Since tidyr 1.0.0, pivot_wider is the recommended replacement for spread, and you could do the following:
jj <- data.frame(month=rep(1:3,4),
student=rep(c("Amy", "Bob"), each=6),
A=c(9, 7, 6, 8, 6, 9, 3, 2, 1, 5, 6, 5),
B=c(6, 7, 8, 5, 6, 7, 5, 4, 6, 3, 1, 5))
library(tidyr)
pivot_wider(
jj,
names_from = "student",
values_from = c("A","B"),
names_sep = ".",
values_fn = list(A= list, B= list)) %>%
unchop(everything())
#> # A tibble: 6 x 5
#> month A.Amy A.Bob B.Amy B.Bob
#> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 1 9 3 6 5
#> 2 1 8 5 5 3
#> 3 2 7 2 7 4
#> 4 2 6 6 6 1
#> 5 3 6 1 8 6
#> 6 3 9 5 7 5
Created on 2019-09-14 by the reprex package (v0.3.0)
The twist in this problem is that month is not unique within each student. To solve this:
values_fn = list(A = list, B = list) puts the multiple values in a list
unchop(everything()) unnests the lists vertically; you can use unnest here as well

If we create a unique sequence, then we can get the output in the correct format with pivot_wider
library(dplyr)
library(tidyr)
jj %>%
group_by(month, student) %>%
mutate(rn = row_number()) %>%
pivot_wider(names_from = 'student', values_from = c('A', 'B'),
names_sep='.') %>%
select(-rn)
# A tibble: 6 x 5
# Groups: month [3]
# month A.Amy A.Bob B.Amy B.Bob
# <int> <dbl> <dbl> <dbl> <dbl>
#1 1 9 3 6 5
#2 2 7 2 7 4
#3 3 6 1 8 6
#4 1 8 5 5 3
#5 2 6 6 6 1
#6 3 9 5 7 5
data
jj <- structure(list(month = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L), student = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L), .Label = c("Amy", "Bob"), class = "factor"),
A = c(9, 7, 6, 8, 6, 9, 3, 2, 1, 5, 6, 5), B = c(6, 7, 8,
5, 6, 7, 5, 4, 6, 3, 1, 5)), class = "data.frame", row.names = c(NA,
-12L))