Randomly assigning columns to other columns in R - r

I have a column of student names and a column containing the group number for each of those students. How can I randomly assign each student to judge another group's work? Could anyone show me how to build a function for this? Students cannot judge their own group.
Bob Ross 1
Kanye West 1
Chris Evans 1
Robert Jr 1
Bruce Wayne 2
Peter Parker 2
Steven Strange 2
Danny rand 2
Daniel Fisher 2
Rob Son 3
Son Bob 3
Chun Li 3
Ching Do 3
Ping Pong 3
Michael Jackson 4
Rich Brian 4
Ryan Gosling 4
Nathan Nguyen 4
Justin Bieber 4

Here's one way, using tidyverse methods. Basically this says for each value (map_int) in group, take a sample from the groups that aren't the current one.
library(tidyverse)
df <- structure(list(name = c("Kanye West", "Chris Evans", "Robert Jr", "Bruce Wayne", "Peter Parker", "Steven Strange", "Danny rand", "Daniel Fisher", "Rob Son", "Son Bob", "Chun Li", "Ching Do", "Ping Pong", "Michael Jackson", "Rich Brian", "Ryan Gosling", "Nathan Nguyen", "Justin Bieber"), group = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -18L))
df %>%
  mutate(
    to_judge = map_int(
      .x = group,
      .f = ~ sample(
        x = unique(group)[unique(group) != .x],
        size = 1
      )
    )
  )
#> # A tibble: 18 x 3
#> name group to_judge
#> <chr> <int> <int>
#> 1 Kanye West 1 4
#> 2 Chris Evans 1 2
#> 3 Robert Jr 1 3
#> 4 Bruce Wayne 2 1
#> 5 Peter Parker 2 3
#> 6 Steven Strange 2 3
#> 7 Danny rand 2 4
#> 8 Daniel Fisher 2 1
#> 9 Rob Son 3 1
#> 10 Son Bob 3 2
#> 11 Chun Li 3 4
#> 12 Ching Do 3 4
#> 13 Ping Pong 3 4
#> 14 Michael Jackson 4 2
#> 15 Rich Brian 4 3
#> 16 Ryan Gosling 4 1
#> 17 Nathan Nguyen 4 2
#> 18 Justin Bieber 4 1
Created on 2018-09-20 by the reprex package (v0.2.0).
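Since the question asks for a function, the same idea can be wrapped in a small base-R helper. This is only a minimal sketch: the name assign_judges is hypothetical and does not come from the answer, and it assumes the groups are given as an integer or character vector. It indexes with sample.int() rather than calling sample() on the candidates directly, because sample(x, 1) treats a single number x as 1:x, which could return a student's own group when only two groups exist.

```r
# Hypothetical helper: for each student, draw one group other than their own.
# Assumes `groups` is an integer or character vector (not a factor).
assign_judges <- function(groups) {
  all_groups <- unique(groups)
  stopifnot(length(all_groups) >= 2)  # at least two groups are needed
  vapply(groups, function(g) {
    candidates <- all_groups[all_groups != g]
    # Index by position so a single remaining candidate is returned as-is
    candidates[sample.int(length(candidates), 1)]
  }, FUN.VALUE = groups[1], USE.NAMES = FALSE)
}

group <- c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L)
set.seed(42)  # only to make the draw reproducible
to_judge <- assign_judges(group)

# Check the constraint: nobody judges their own group
all(to_judge != group)
```

With a data frame like the one above, this could be used as df$to_judge <- assign_judges(df$group).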

Another option with tidyverse would be to group_by the group column, define the sample vector with setdiff and draw a sample of the size of the group:
df <- data.frame(Student = LETTERS[1:20],
Group = gl(4, 5))
library(tidyverse)
df %>%
  group_by(Group) %>%
  mutate(Judge = sample(setdiff(unique(df$Group), Group), n(), replace = TRUE))
# A tibble: 20 x 3
# Groups: Group [4]
Student Group Judge
<fct> <fct> <chr>
1 A 1 4
2 B 1 2
3 C 1 3
4 D 1 3
5 E 1 4
6 F 2 4
7 G 2 4
8 H 2 1
9 I 2 1
10 J 2 4
11 K 3 4
12 L 3 2
13 M 3 1
14 N 3 2
15 O 3 2
16 P 4 2
17 Q 4 1
18 R 4 2
19 S 4 1
20 T 4 3

Related

My question is about R: How to number each repetition in a table in R?

In my data set, there is a column of full names (example below), and I want to add another column next to it indicating whether this is the first, second, third, fourth, ... time the name has appeared, using R. My output should look like the column below: Number of repetition.
Eg: Data set name: People
**Full name** **Number of repetition**
Peter 1
Peter 2
Alison
Warren
Jack 1
Jack 2
Jack 3
Jack 4
Susan 1
Susan 2
Henry 1
Walison
Tinder 1
Peter 3
Henry 2
Tinder 2
Thanks
Teena
Here is an alternative way solved with help from akrun: sum() condition in ifelse statement
library(dplyr)
df1 %>%
group_by(Fullname) %>%
mutate(newcol = row_number(),
newcol = if(sum(newcol)> 1) newcol else NA) %>%
ungroup
Fullname newcol
<chr> <int>
1 Peter 1
2 Peter 2
3 Alison NA
4 Warren NA
5 Jack 1
6 Jack 2
7 Jack 3
8 Jack 4
9 Susan 1
10 Susan 2
11 Henry 1
12 Walison NA
13 Tinder 1
14 Peter 3
15 Henry 2
16 Tinder 2
Here is one way. Group by 'Fullname' and create the sequence with row_number() if the group has more than one row. case_when returns NA for rows that match no condition.
library(dplyr)
df1 <- df1 %>%
group_by(Fullname) %>%
mutate(Number_of_repetition = case_when(n() > 1 ~ row_number())) %>%
ungroup
-output
df1
# A tibble: 16 × 2
Fullname Number_of_repetition
<chr> <int>
1 Peter 1
2 Peter 2
3 Alison NA
4 Warren NA
5 Jack 1
6 Jack 2
7 Jack 3
8 Jack 4
9 Susan 1
10 Susan 2
11 Henry 1
12 Walison NA
13 Tinder 1
14 Peter 3
15 Henry 2
16 Tinder 2
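For comparison, the same numbering can be sketched in base R without dplyr. This is an illustrative alternative, not part of the answers above; the variable names (full_names, occurrence, group_size, repetition) are made up for the example. ave() applies a function within each group of names and returns the result in the original row order.

```r
full_names <- c("Peter", "Peter", "Alison", "Warren", "Jack", "Jack", "Jack",
                "Jack", "Susan", "Susan", "Henry", "Walison", "Tinder",
                "Peter", "Henry", "Tinder")

# Running count of each name, in order of appearance
occurrence <- ave(seq_along(full_names), full_names, FUN = seq_along)

# Total number of rows for each name
group_size <- ave(seq_along(full_names), full_names, FUN = length)

# Keep the running count only for names that appear more than once
repetition <- ifelse(group_size > 1, occurrence, NA_integer_)

data.frame(Fullname = full_names, Number_of_repetition = repetition)
```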
If we need to add a third column, use unite on the updated data from previous step
library(tidyr)
df1 %>%
unite(FullNameRep, Fullname, Number_of_repetition, sep="", na.rm = TRUE, remove = FALSE)
-output
# A tibble: 16 × 3
FullNameRep Fullname Number_of_repetition
<chr> <chr> <int>
1 Peter1 Peter 1
2 Peter2 Peter 2
3 Alison Alison NA
4 Warren Warren NA
5 Jack1 Jack 1
6 Jack2 Jack 2
7 Jack3 Jack 3
8 Jack4 Jack 4
9 Susan1 Susan 1
10 Susan2 Susan 2
11 Henry1 Henry 1
12 Walison Walison NA
13 Tinder1 Tinder 1
14 Peter3 Peter 3
15 Henry2 Henry 2
16 Tinder2 Tinder 2
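If tidyr is not available, the combined column can also be built with paste0(). The following is a small sketch under that assumption; the vector names are hypothetical and the repetition values are copied from the output above.

```r
full_name <- c("Peter", "Peter", "Alison", "Warren", "Jack", "Jack", "Jack",
               "Jack", "Susan", "Susan", "Henry", "Walison", "Tinder",
               "Peter", "Henry", "Tinder")
repetition <- c(1L, 2L, NA, NA, 1L, 2L, 3L, 4L, 1L, 2L, 1L, NA, 1L, 3L, 2L, 2L)

# Append the number only where one exists; otherwise keep the bare name
full_name_rep <- ifelse(is.na(repetition),
                        full_name,
                        paste0(full_name, repetition))
full_name_rep
```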
data
df1 <- structure(list(Fullname = c("Peter", "Peter", "Alison", "Warren",
"Jack", "Jack", "Jack", "Jack", "Susan", "Susan", "Henry", "Walison",
"Tinder", "Peter", "Henry", "Tinder")), row.names = c(NA, -16L
), class = "data.frame")

Binned physiological time series data in R: calculate duration spent in each bin

I have a dataset containing changes in mean arterial blood pressure (MAP) over time from multiple participants. Here is an example dataframe:
df=structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L), Time = structure(1:14, .Label = c("11:02:00",
"11:03:00", "11:04:00", "11:05:00", "11:06:00", "11:07:00", "11:08:00",
"13:30:00", "13:31:00", "13:32:00", "13:33:00", "13:34:00", "13:35:00",
"13:36:00"), class = "factor"), MAP = c(90.27999878, 84.25, 74.81999969,
80.87000275, 99.38999939, 81.51000214, 71.51000214, 90.08999634,
88.75, 84.72000122, 83.86000061, 94.18000031, 98.54000092, 51
)), class = "data.frame", row.names = c(NA, -14L))
I have binned the data into groups: e.g. MAP 40-60, 60-80, 80-100 and added a unique flag (1, 2 or 3) in an additional column map_bin. This is my code so far:
library(dplyr)
#Mean Arterial Pressure
#Bin 1=40-60; Bin 2=60-80; Bin 3=80-100
map_bin=c("1","2","3")
output <- as_tibble(df) %>%
mutate(map_bin = case_when(
MAP >= 40 & MAP < 60 ~ map_bin[1],
MAP >= 60 & MAP < 80 ~ map_bin[2],
MAP >= 80 & MAP < 100 ~ map_bin[3]
))
For each ID I wish to calculate, in an additional column, the total time MAP is in each bin. I expect the following output:
ID Time MAP map_bin map_bin_dur
1 11:02:00 90.27999878 3 5
1 11:03:00 84.25 3 5
1 11:04:00 74.81999969 2 2
1 11:05:00 80.87000275 3 5
1 11:06:00 99.38999939 3 5
1 11:07:00 81.51000214 3 5
1 11:08:00 71.51000214 2 2
2 13:30:00 90.08999634 3 6
2 13:31:00 88.75 3 6
2 13:32:00 84.72000122 3 6
2 13:33:00 83.86000061 3 6
2 13:34:00 94.18000031 3 6
2 13:35:00 98.54000092 3 6
2 13:36:00 51 1 1
Where map_bin_dur is the time in minutes that MAP for each individual resided in each bin. e.g. ID 1 had a MAP in Bin 3 for 5 minutes in total.
If consecutive rows in the Time column are always 1 minute apart, the number of rows per ID and bin equals the duration in minutes, so you can use add_count:
library(dplyr)
output <- output %>% add_count(ID, map_bin, name = 'map_bin_dur')
output
# ID Time MAP map_bin map_bin_dur
# <int> <fct> <dbl> <chr> <int>
# 1 1 11:02:00 90.3 3 5
# 2 1 11:03:00 84.2 3 5
# 3 1 11:04:00 74.8 2 2
# 4 1 11:05:00 80.9 3 5
# 5 1 11:06:00 99.4 3 5
# 6 1 11:07:00 81.5 3 5
# 7 1 11:08:00 71.5 2 2
# 8 2 13:30:00 90.1 3 6
# 9 2 13:31:00 88.8 3 6
#10 2 13:32:00 84.7 3 6
#11 2 13:33:00 83.9 3 6
#12 2 13:34:00 94.2 3 6
#13 2 13:35:00 98.5 3 6
#14 2 13:36:00 51 1 1
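The binning and counting steps can also be sketched in base R. This is an illustrative alternative rather than the answer's method; the vector names are hypothetical. It uses cut() with right = FALSE so the intervals are [40, 60), [60, 80) and [80, 100), matching the case_when conditions in the question, and it relies on the same assumption that each row represents one minute.

```r
id <- c(rep(1L, 7), rep(2L, 7))
map <- c(90.27999878, 84.25, 74.81999969, 80.87000275, 99.38999939,
         81.51000214, 71.51000214, 90.08999634, 88.75, 84.72000122,
         83.86000061, 94.18000031, 98.54000092, 51)

# Left-closed intervals: [40, 60) -> "1", [60, 80) -> "2", [80, 100) -> "3"
map_bin <- as.character(cut(map, breaks = c(40, 60, 80, 100),
                            labels = c("1", "2", "3"), right = FALSE))

# Count rows per (ID, bin) pair and map the count back onto every row.
# With one row per minute, the count is the duration in minutes.
key <- paste(id, map_bin)
map_bin_dur <- as.integer(table(key)[key])

data.frame(ID = id, MAP = map, map_bin, map_bin_dur)
```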

reorder/standardize and create rows in R

I am new to R and I have been looking for a way to restructure a dataframe I have been given. It has a set of variables, each of which contains some subcategories. Assume it looks something like this:
Michael Physics 1 2
Michael Math 2 4
Michael Science 3 4
Michael PE 2 1
James Art 0 9
James PE 1 2
James Physics -1 2
James Science 1 2
Simon PE 1 2
Simon Art 1 3
Simon Music 1 4
Simon Science 1 4
Notably, the second column has a "standard" set of variables: each student shares most but not necessarily all of them, and their order is scrambled. I want to convert this dataframe to a "standard format", in which every student has ALL of the variables in the same order. So if I define a list of all the subjects, say Physics, Math, Science, Art, PE, Music, I would like the modified dataframe to have 18 rows (6 for each student, ordered as in the subject list). If a student and subject pair exists in the original dataset, the row should carry the data from that row; if it doesn't, the other data columns should just be NA.
Update on OP's comment:
To keep the original order you could factor Student and define level:
df <- df %>%
mutate(Student = factor(Student, levels = c("Michael", "James", "Simon")))
df1 <- df %>%
expand(Student, Course)
df %>%
right_join(df1) %>%
arrange(Student, Course)
Output:
Student Course V1 V2
<fct> <chr> <dbl> <dbl>
1 Michael Art NA NA
2 Michael Math 2 4
3 Michael Music NA NA
4 Michael PE 2 1
5 Michael Physics 1 2
6 Michael Science 3 4
7 James Art 0 9
8 James Math NA NA
9 James Music NA NA
10 James PE 1 2
11 James Physics -1 2
12 James Science 1 2
13 Simon Art 1 3
14 Simon Math NA NA
15 Simon Music 1 4
16 Simon PE 1 2
17 Simon Physics NA NA
18 Simon Science 1 4
We could combine expand and right_join
library(dplyr)
library(tidyr)
df1 <- df %>%
expand(Student, Course)
df %>%
right_join(df1) %>%
arrange(Student, Course)
Output:
Student Course V1 V2
<chr> <chr> <dbl> <dbl>
1 James Art 0 9
2 James Math NA NA
3 James Music NA NA
4 James PE 1 2
5 James Physics -1 2
6 James Science 1 2
7 Michael Art NA NA
8 Michael Math 2 4
9 Michael Music NA NA
10 Michael PE 2 1
11 Michael Physics 1 2
12 Michael Science 3 4
13 Simon Art 1 3
14 Simon Math NA NA
15 Simon Music 1 4
16 Simon PE 1 2
17 Simon Physics NA NA
18 Simon Science 1 4
In the below, we repeatedly use pivot_ to get the desired result. The output is sorted by student name and subject.
library(tidyverse)
df <- read_delim("Michael Physics 1 2
Michael Math 2 4
Michael Science 3 4
Michael PE 2 1
James Art 0 9
James PE 1 2
James Physics -1 2
James Science 1 2
Simon PE 1 2
Simon Art 1 3
Simon Music 1 4
Simon Science 1 4", delim = " ", col_names = c("student", "subject", "v1", "v2"))
df %>%
pivot_wider(names_from = "subject", values_from = c("v1", "v2")) %>%
pivot_longer(cols = starts_with("v"), names_to = "name", values_to = "value") %>%
separate(name, into = c("var", "subject"), sep = "_") %>%
pivot_wider(names_from = var, values_from = value) %>%
arrange(student, subject)
#> # A tibble: 18 x 4
#> student subject v1 v2
#> <chr> <chr> <dbl> <dbl>
#> 1 James Art 0 9
#> 2 James Math NA NA
#> 3 James Music NA NA
#> 4 James PE 1 2
#> 5 James Physics -1 2
#> 6 James Science 1 2
#> 7 Michael Art NA NA
#> 8 Michael Math 2 4
#> 9 Michael Music NA NA
#> 10 Michael PE 2 1
#> 11 Michael Physics 1 2
#> 12 Michael Science 3 4
#> 13 Simon Art 1 3
#> 14 Simon Math NA NA
#> 15 Simon Music 1 4
#> 16 Simon PE 1 2
#> 17 Simon Physics NA NA
#> 18 Simon Science 1 4
Created on 2021-07-18 by the reprex package (v2.0.0)
You can use complete. To preserve the original ordering of the data you can save the name of the students in a variable and use match and arrange.
library(dplyr)
library(tidyr)
original_order <- unique(df$V1)
df %>% complete(V1, V2) %>% arrange(match(V1, original_order))
# V1 V2 V3 V4
# <chr> <chr> <int> <int>
# 1 Michael Art NA NA
# 2 Michael Math 2 4
# 3 Michael Music NA NA
# 4 Michael PE 2 1
# 5 Michael Physics 1 2
# 6 Michael Science 3 4
# 7 James Art 0 9
# 8 James Math NA NA
# 9 James Music NA NA
#10 James PE 1 2
#11 James Physics -1 2
#12 James Science 1 2
#13 Simon Art 1 3
#14 Simon Math NA NA
#15 Simon Music 1 4
#16 Simon PE 1 2
#17 Simon Physics NA NA
#18 Simon Science 1 4
data
df <- structure(list(V1 = c("Michael", "Michael", "Michael", "Michael",
"James", "James", "James", "James", "Simon", "Simon", "Simon",
"Simon"), V2 = c("Physics", "Math", "Science", "PE", "Art", "PE",
"Physics", "Science", "PE", "Art", "Music", "Science"), V3 = c(1L,
2L, 3L, 2L, 0L, 1L, -1L, 1L, 1L, 1L, 1L, 1L), V4 = c(2L, 4L,
4L, 1L, 9L, 2L, 2L, 2L, 2L, 3L, 4L, 4L)),
class = "data.frame", row.names = c(NA, -12L))
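The answers above sort the subjects alphabetically, whereas the question mentions a user-defined subject order. A base-R sketch that honours such an order could look like the following. It is an assumption-laden illustration: the subjects vector is taken from the list in the question, and the names students, grid and full are hypothetical.

```r
df <- data.frame(
  V1 = c("Michael", "Michael", "Michael", "Michael", "James", "James",
         "James", "James", "Simon", "Simon", "Simon", "Simon"),
  V2 = c("Physics", "Math", "Science", "PE", "Art", "PE", "Physics",
         "Science", "PE", "Art", "Music", "Science"),
  V3 = c(1L, 2L, 3L, 2L, 0L, 1L, -1L, 1L, 1L, 1L, 1L, 1L),
  V4 = c(2L, 4L, 4L, 1L, 9L, 2L, 2L, 2L, 2L, 3L, 4L, 4L),
  stringsAsFactors = FALSE
)

# Desired subject order, as listed in the question
subjects <- c("Physics", "Math", "Science", "Art", "PE", "Music")
students <- unique(df$V1)  # students in order of first appearance

# Every student/subject combination, then a left join onto the data
grid <- expand.grid(V2 = subjects, V1 = students, stringsAsFactors = FALSE)
full <- merge(grid, df, by = c("V1", "V2"), all.x = TRUE)

# merge() sorts by key, so restore the intended student and subject order
full <- full[order(match(full$V1, students), match(full$V2, subjects)),
             c("V1", "V2", "V3", "V4")]
rownames(full) <- NULL
full
```

Missing student and subject pairs come out with NA in V3 and V4, as in the tidyr answers.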

Sum by groups in two columns in R

I have the following DF:
DAY BRAND SOLD
2018/04/10 KIA 10
2018/04/15 KIA 5
2018/05/01 KIA 7
2018/05/06 KIA 3
2018/04/04 BMW 2
2018/05/25 BMW 8
2018/06/19 BMW 5
2018/06/14 BMW 1
I would like to sum the units sold by month and repeat them in every row where the date belongs to the month (the sum can't be done for different BRANDS in the same MONTH, that's a condition), like this:
DAY BRAND SOLD TOTAL
2018/04/10 KIA 10 15
2018/04/15 KIA 5 15
2018/05/01 KIA 7 10
2018/05/06 KIA 3 10
2018/04/04 BMW 2 2
2018/05/25 BMW 8 8
2018/06/19 BMW 5 6
2018/06/14 BMW 1 6
How can I do this?
We can use ave after extracting the 'month' from the 'DAY' column and use that as grouping variable along with "BRAND"
df1$TOTAL <- with(df1, ave(SOLD, BRAND,
format(as.Date(DAY, "%Y/%m/%d"), "%m"), FUN = sum))
df1$TOTAL
#[1] 15 15 10 10 2 8 6 6
Or in dplyr/lubridate
library(dplyr)
library(lubridate)
df1 %>%
group_by(BRAND, MONTH = month(ymd(DAY))) %>%
mutate(TOTAL = sum(SOLD))
# A tibble: 8 x 5
# Groups: BRAND, MONTH [5]
# DAY BRAND SOLD MONTH TOTAL
# <chr> <chr> <int> <dbl> <int>
#1 2018/04/10 KIA 10 4 15
#2 2018/04/15 KIA 5 4 15
#3 2018/05/01 KIA 7 5 10
#4 2018/05/06 KIA 3 5 10
#5 2018/04/04 BMW 2 4 2
#6 2018/05/25 BMW 8 5 8
#7 2018/06/19 BMW 5 6 6
#8 2018/06/14 BMW 1 6 6
Remove the 'MONTH' column after ungrouping with select(-MONTH) if needed
data
df1 <- structure(list(DAY = c("2018/04/10", "2018/04/15", "2018/05/01",
"2018/05/06", "2018/04/04", "2018/05/25", "2018/06/19", "2018/06/14"
), BRAND = c("KIA", "KIA", "KIA", "KIA", "BMW", "BMW", "BMW",
"BMW"), SOLD = c(10L, 5L, 7L, 3L, 2L, 8L, 5L, 1L)),
class = "data.frame", row.names = c(NA,
-8L))
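One caveat: grouping on the month number alone would pool the same month from different years. The sample data only covers 2018, so the results above are unaffected, but if the data may span several years, a variation that groups on year and month could be sketched as follows. The year_month name is hypothetical.

```r
df1 <- data.frame(
  DAY = c("2018/04/10", "2018/04/15", "2018/05/01", "2018/05/06",
          "2018/04/04", "2018/05/25", "2018/06/19", "2018/06/14"),
  BRAND = c("KIA", "KIA", "KIA", "KIA", "BMW", "BMW", "BMW", "BMW"),
  SOLD = c(10L, 5L, 7L, 3L, 2L, 8L, 5L, 1L),
  stringsAsFactors = FALSE
)

# "2018-04", "2018-05", ... keeps months from different years apart
year_month <- format(as.Date(df1$DAY, "%Y/%m/%d"), "%Y-%m")

# Sum SOLD within each BRAND and year-month, repeated on every row
df1$TOTAL <- ave(df1$SOLD, df1$BRAND, year_month, FUN = sum)
df1
```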

How to reset a Variable Value to 0 for starting point? [duplicate]

This question already has answers here:
Numbering rows within groups in a data frame
(10 answers)
Closed 4 years ago.
I have odometer readings, with the following sample data, for different cars. I intend to reset the odometer value for each car so that I can measure the distance it traveled.
Sample data
ID ODometer
1 2132
1 2133
1 2134
1 2135
1 2136
1 2137
2 1123
2 1124
2 1125
Expected output:
ID Odometer
1 1
1 2
1 3
1 4
1 5
1 6
2 1
2 2
2 3
We can use row_number() after grouping by 'ID'
library(dplyr)
df1 %>%
group_by(ID) %>%
mutate(Odometer = row_number())
# A tibble: 9 x 3
# Groups: ID [2]
# ID ODometer Odometer
# <int> <int> <int>
#1 1 2132 1
#2 1 2133 2
#3 1 2134 3
#4 1 2135 4
#5 1 2136 5
#6 1 2137 6
#7 2 1123 1
#8 2 1124 2
#9 2 1125 3
data
df1 <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L),
ODometer = c(2132L,
2133L, 2134L, 2135L, 2136L, 2137L, 1123L, 1124L, 1125L)),
class = "data.frame", row.names = c(NA, -9L))
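Note that row_number() reproduces the expected output only because the sample readings increase by exactly 1 per row. If the readings could have gaps, one possible variation is to subtract each car's first reading instead. This is a base-R sketch under that assumption; the Distance column name is hypothetical.

```r
df1 <- data.frame(
  ID = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L),
  ODometer = c(2132L, 2133L, 2134L, 2135L, 2136L, 2137L, 1123L, 1124L, 1125L)
)

# Re-base each car's readings so that its first reading becomes 1
df1$Distance <- ave(df1$ODometer, df1$ID,
                    FUN = function(x) x - x[1] + 1L)
df1
```

For the sample data this gives the same values as row_number(); the two would differ as soon as consecutive readings are more than 1 apart.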
