My dataframe contains data about political careers, such as a unique identifier (called: ui) column for each politician and the electoral term(called: electoral_term) in which they were elected. Since a politician can be elected in multiple electoral terms, there are multiple rows that contain the same ui.
Now I would like to add another column to my dataframe, that counts how many times the politician got re-elected.
So e.g. the politician with ui=1 was re-elected 2 times, since he occured in 3 electoral_terms.
I already tried
df %>% count(ui)
But that only gives out a table which can't be added into my dataframe.
Thanks in advance!
We may use base R
df$reelected <- with(df, ave(ui, ui, FUN = length)-1)
-output
> df
ui electoral reelected
1 1 1 2
2 1 2 2
3 1 3 2
4 2 2 0
5 3 7 1
6 3 9 1
data
df <- structure(list(ui = c(1, 1, 1, 2, 3, 3), electoral = c(1, 2,
3, 2, 7, 9)), class = "data.frame", row.names = c(NA, -6L))
mydf <- tibble::tribble(~ui, ~electoral, 1, 1, 1, 2, 1, 3, 2, 2, 3, 7, 3, 9)
library(dplyr)
df |>
add_count(ui, name = "re_elected") |>
mutate(re_elected = re_elected - 1)
# A tibble: 6 × 3
ui electoral re_elected
<dbl> <dbl> <dbl>
1 1 1 2
2 1 2 2
3 1 3 2
4 2 2 0
5 3 7 1
6 3 9 1
library(tidyverse)
df %>%
group_by(ui) %>%
mutate(re_elected = n() - 1)
# A tibble: 6 × 3
# Groups: ui [3]
ui electoral re_elected
<dbl> <dbl> <dbl>
1 1 1 2
2 1 2 2
3 1 3 2
4 2 2 0
5 3 7 1
6 3 9 1
I have a dateframe like this:
df <- data.frame(grp = c(rep("a", 5), rep("b", 5)), t = c(1:5, 1:5), value = c(-1, 5, 9, -15, 6, 5, 1, 7, -11, 9))
# Limits for desired cumulative sum (CumSum)
maxCumSum <- 8
minCumSum <- 0
What I would like to calculate is a cumulative sum of value by group (grp) within the values of maxCumSum and minCumSum. The respective table dt2 should look something like this:
grp t value CumSum
a 1 -1 0
a 2 5 5
a 3 9 8
a 4 -15 0
a 5 6 6
b 1 5 5
b 2 1 6
b 3 7 8
b 4 -11 0
b 5 9 8
Think of CumSum as a water storage with has a certain maximum capacity and the level of which cannot sink below zero.
The normal cumsum does obviously not do the trick since there are no limitations to maximum or minimum. Has anyone a suggestion how to achieve this? In the real dataframe there are of course more than 2 groups and far more than 5 times.
Many thanks!
What you can do is create a function which calculate the cumsum until it reach the max value and start again at the min value like this:
df <- data.frame(grp = c(rep("a", 5), rep("b", 5)), t = c(1:5, 1:5), value = c(-1, 5, 9, -15, 6, 5, 1, 7, -11, 9))
library(dplyr)
maxCumSum <- 8
minCumSum <- 0
f <- function(x, y) max(min(x + y, maxCumSum), minCumSum)
df %>%
group_by(grp) %>%
mutate(CumSum = Reduce(f, value, 0, accumulate = TRUE)[-1])
#> # A tibble: 10 × 4
#> # Groups: grp [2]
#> grp t value CumSum
#> <chr> <int> <dbl> <dbl>
#> 1 a 1 -1 0
#> 2 a 2 5 5
#> 3 a 3 9 8
#> 4 a 4 -15 0
#> 5 a 5 6 6
#> 6 b 1 5 5
#> 7 b 2 1 6
#> 8 b 3 7 8
#> 9 b 4 -11 0
#> 10 b 5 9 8
Created on 2022-07-04 by the reprex package (v2.0.1)
I have the following list
example <- list(a = c(1, 2, 3),
b = c(2, 3),
c = c(3, 4, 5, 6))
that I'd like to transform into the following tibble
# A tibble: 9 × 2
name value
<chr> <dbl>
1 a 1
2 a 2
3 a 3
4 b 2
5 b 3
6 c 3
7 c 4
8 c 5
9 c 6
I've found multiple StackOverflow questions on this subject like here, here or here, but none is adressing this particular case where the name of the vector is not expected to become a column name.
I managed to achieve the desired result with a good old loop like below, but I'm looking for a faster and more elegant way.
library(dplyr)
example_list <- list(a = c(1, 2, 3),
b = c(2, 3),
c = c(3, 4, 5, 6))
example_tibble <- tibble()
for (i in 1:length(example_list)) {
example_tibble <- example_tibble %>%
bind_rows(as_tibble(example_list[[i]]) %>%
mutate(name = names(example_list)[[i]]))
}
example_tibble <- example_tibble %>%
relocate(name)
Try stack
> stack(example)
values ind
1 1 a
2 2 a
3 3 a
4 2 b
5 3 b
6 3 c
7 4 c
8 5 c
9 6 c
example <- list(a = c(1, 2, 3),
b = c(2, 3),
c = c(3, 4, 5, 6))
library(tidyverse)
enframe(example) %>%
unnest(value)
#> # A tibble: 9 x 2
#> name value
#> <chr> <dbl>
#> 1 a 1
#> 2 a 2
#> 3 a 3
#> 4 b 2
#> 5 b 3
#> 6 c 3
#> 7 c 4
#> 8 c 5
#> 9 c 6
Created on 2021-11-04 by the reprex package (v2.0.1)
I am trying to create a column, where the new column has values plus or minus some fixed number or existing number of old column. For example, my old column is a and new column is b.
data = data.frame(a = 2:11)
new_data = data.frame(a = 2:11, b = c(1, 4, 5, 5, 6, 8, 9, 8, 11, 12))
new_data
#> a b
#> 1 2 1
#> 2 3 4
#> 3 4 5
#> 4 5 5
#> 5 6 6
#> 6 7 8
#> 7 8 9
#> 8 9 8
#> 9 10 11
#> 10 11 12
data$b <- data$a + sample(c(0, -1, +1), nrow(data), replace = T)
so If fixed number is say x do this
x <- 1
data$b <- data$a + sample(c(0, -1*x, x), nrow(data), replace = T)
Edit based on requirements stated in comments below. Use pmin and pmax. seed fixed in order to demonstrate
set.seed(19)
data %>% mutate(b = pmin(11, pmax(2, a + sample(-1:1, nrow(.), T)))) %>% pull(b) %>% cat
2 3 4 6 5 7 7 10 9 11
#otherwise
set.seed(19)
data %>% mutate(b = a + sample(-1:1, nrow(.), T))
a b
1 2 1
2 3 3
3 4 4
4 5 6
5 6 5
6 7 7
7 8 7
8 9 10
9 10 9
10 11 12
I have a long form dataframe that have multiple entries for same date and person.
jj <- data.frame(month=rep(1:3,4),
student=rep(c("Amy", "Bob"), each=6),
A=c(9, 7, 6, 8, 6, 9, 3, 2, 1, 5, 6, 5),
B=c(6, 7, 8, 5, 6, 7, 5, 4, 6, 3, 1, 5))
I want to convert it to wide form and make it like this:
month Amy.A Bob.A Amy.B Bob.B
1
2
3
1
2
3
1
2
3
1
2
3
My question is very similar to this. I have used the given code in the answer :
kk <- jj %>%
gather(variable, value, -(month:student)) %>%
unite(temp, student, variable) %>%
spread(temp, value)
but it gives following error:
Error: Duplicate identifiers for rows (1, 4), (2, 5), (3, 6), (13, 16), (14, 17), (15, 18), (7, 10), (8, 11), (9, 12), (19, 22), (20, 23), (21, 24)
Thanks in advance.
Note: I don't want to delete multiple entries.
Your answer was missing mutate id! Here is the solution using dplyr packge only.
jj %>%
gather(variable, value, -(month:student)) %>%
unite(temp, student, variable) %>%
group_by(temp) %>%
mutate(id=1:n()) %>%
spread(temp, value)
# A tibble: 6 x 6
# month id Amy_A Amy_B Bob_A Bob_B
# * <int> <int> <dbl> <dbl> <dbl> <dbl>
# 1 1 1 9 6 3 5
# 2 1 4 8 5 5 3
# 3 2 2 7 7 2 4
# 4 2 5 6 6 6 1
# 5 3 3 6 8 1 6
# 6 3 6 9 7 5 5
The issue is the two columns for both A and B. If we can make that one value column, we can spread the data as you would like. Take a look at the output for jj_melt when you use the code below.
library(reshape2)
jj_melt <- melt(jj, id=c("month", "student"))
jj_spread <- dcast(jj_melt, month ~ student + variable, value.var="value", fun=sum)
# month Amy_A Amy_B Bob_A Bob_B
# 1 1 17 11 8 8
# 2 2 13 13 8 5
# 3 3 15 15 6 11
I won't mark this as a duplicate since the other question did not summarize by sum, but the data.table answer could help with one additional argument, fun=sum:
library(data.table)
dcast(setDT(jj), month ~ student, value.var=c("A", "B"), fun=sum)
# month A_sum_Amy A_sum_Bob B_sum_Amy B_sum_Bob
# 1: 1 17 8 11 8
# 2: 2 13 8 13 5
# 3: 3 15 6 15 11
If you would like to use the tidyr solution, combine it with dcast to summarize by sum.
as.data.frame(jj)
library(tidyr)
jj %>%
gather(variable, value, -(month:student)) %>%
unite(temp, student, variable) %>%
dcast(month ~ temp, fun=sum)
# month Amy_A Amy_B Bob_A Bob_B
# 1 1 17 11 8 8
# 2 2 13 13 8 5
# 3 3 15 15 6 11
Edit
Based on your new requirements, I have added an activity column.
library(dplyr)
jj %>% group_by(month, student) %>%
mutate(id=1:n()) %>%
melt(id=c("month", "id", "student")) %>%
dcast(... ~ student + variable, value.var="value")
# month id Amy_A Amy_B Bob_A Bob_B
# 1 1 1 9 6 3 5
# 2 1 2 8 5 5 3
# 3 2 1 7 7 2 4
# 4 2 2 6 6 6 1
# 5 3 1 6 8 1 6
# 6 3 2 9 7 5 5
The other solutions can also be used. Here I added an optional expression to arrange the final output by activity number:
library(tidyr)
jj %>%
gather(variable, value, -(month:student)) %>%
unite(temp, student, variable) %>%
group_by(temp) %>%
mutate(id=1:n()) %>%
dcast(... ~ temp) %>%
arrange(id)
# month id Amy_A Amy_B Bob_A Bob_B
# 1 1 1 9 6 3 5
# 2 2 2 7 7 2 4
# 3 3 3 6 8 1 6
# 4 1 4 8 5 5 3
# 5 2 5 6 6 6 1
# 6 3 6 9 7 5 5
The data.table syntax is compact because it allows for multiple value.var columns and will take care of the spread for us. We can then skip the melt -> cast process.
library(data.table)
setDT(jj)[, activityID := rowid(student)]
dcast(jj, ... ~ student, value.var=c("A", "B"))
# month activityID A_Amy A_Bob B_Amy B_Bob
# 1: 1 1 9 3 6 5
# 2: 1 4 8 5 5 3
# 3: 2 2 7 2 7 4
# 4: 2 5 6 6 6 1
# 5: 3 3 6 1 8 6
# 6: 3 6 9 5 7 5
Since tidyr 1.0.0 pivot_wider is the recommended replacement of spread and you could do the following :
jj <- data.frame(month=rep(1:3,4),
student=rep(c("Amy", "Bob"), each=6),
A=c(9, 7, 6, 8, 6, 9, 3, 2, 1, 5, 6, 5),
B=c(6, 7, 8, 5, 6, 7, 5, 4, 6, 3, 1, 5))
library(tidyr)
pivot_wider(
jj,
names_from = "student",
values_from = c("A","B"),
names_sep = ".",
values_fn = list(A= list, B= list)) %>%
unchop(everything())
#> # A tibble: 6 x 5
#> month A.Amy A.Bob B.Amy B.Bob
#> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 1 9 3 6 5
#> 2 1 8 5 5 3
#> 3 2 7 2 7 4
#> 4 2 6 6 6 1
#> 5 3 6 1 8 6
#> 6 3 9 5 7 5
Created on 2019-09-14 by the reprex package (v0.3.0)
The twist in this problem is that month is not unique by student, to solve this :
values_fn = list(A= list, B= list)) puts the multiple values in a list
unchop(everything()) unnest the lists vertically, you can use unnest as well here
If we create a unique sequence, then we can the output in the correct format with pivot_wider
library(dplyr)
library(tidyr)
jj %>%
group_by(month, student) %>%
mutate(rn = row_number()) %>%
pivot_wider(names_from = 'student', values_from = c('A', 'B'),
names_sep='.') %>%
select(-rn)
# A tibble: 6 x 5
# Groups: month [3]
# month A.Amy A.Bob B.Amy B.Bob
# <int> <dbl> <dbl> <dbl> <dbl>
#1 1 9 3 6 5
#2 2 7 2 7 4
#3 3 6 1 8 6
#4 1 8 5 5 3
#5 2 6 6 6 1
#6 3 9 5 7 5
data
jj <- structure(list(month = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L), student = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L), .Label = c("Amy", "Bob"), class = "factor"),
A = c(9, 7, 6, 8, 6, 9, 3, 2, 1, 5, 6, 5), B = c(6, 7, 8,
5, 6, 7, 5, 4, 6, 3, 1, 5)), class = "data.frame", row.names = c(NA,
-12L))