Iterative functions in R

I'm trying to create multiple new score columns based on other columns. I'd like to use a function to avoid copy-pasting large blocks of code.
I'm trying to do something like:
Myfunction <- function(column){
  Column_df <- old_df %>%
    mutate(column.score = if_else(column == 1, "yes", "no"))
}
Score_df <- Myfunction(c(math, reading, science))
But I'm getting an error saying that object math is not found.

Starting with an example data frame as below
df <- purrr::map_dfc(c('math', 'reading', 'science', 'history'),
~ rlang::list2(!!.x := sample(1:3, 10, TRUE)))
df
#> # A tibble: 10 × 4
#> math reading science history
#> <int> <int> <int> <int>
#> 1 2 1 3 1
#> 2 3 2 3 1
#> 3 2 2 2 2
#> 4 2 3 1 2
#> 5 3 3 1 2
#> 6 1 2 3 2
#> 7 3 3 2 1
#> 8 3 3 3 2
#> 9 1 2 2 1
#> 10 2 2 2 3
You can create new "score" columns with a function by passing your columns argument to across inside {{ }}, and using the .names argument to add ".score" to each name.
If you want only the "score" columns in the output, rather than to add them to existing columns, use transmute instead of mutate.
library(dplyr, warn.conflicts = FALSE)
Myfunction <- function(df, columns){
  df %>%
    mutate(across({{ columns }}, ~ if_else(. == 1, 'yes', 'no'),
                  .names = '{.col}.score'))
}
df %>%
  Myfunction(c(math, reading, science))
#> # A tibble: 10 × 7
#> math reading science history math.score reading.score science.score
#> <int> <int> <int> <int> <chr> <chr> <chr>
#> 1 2 1 3 1 no yes no
#> 2 3 2 3 1 no no no
#> 3 2 2 2 2 no no no
#> 4 2 3 1 2 no no yes
#> 5 3 3 1 2 no no yes
#> 6 1 2 3 2 yes no no
#> 7 3 3 2 1 no no no
#> 8 3 3 3 2 no no no
#> 9 1 2 2 1 yes no no
#> 10 2 2 2 3 no no no
Created on 2022-01-18 by the reprex package (v2.0.1)
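The transmute variant mentioned above could look roughly like the sketch below. The function name score_only and the small hand-written data frame are illustrative assumptions, not part of the original answer; note also that recent dplyr releases mark transmute as superseded, although it still works.

```r
library(dplyr, warn.conflicts = FALSE)

# Small hand-written example data (an assumption, for reproducibility)
df <- tibble(
  math    = c(1L, 2L, 3L),
  reading = c(2L, 1L, 1L),
  science = c(3L, 3L, 1L),
  history = c(1L, 1L, 2L)
)

# Hypothetical helper: keeps only the new ".score" columns
score_only <- function(df, columns) {
  df %>%
    transmute(across({{ columns }}, ~ if_else(.x == 1, "yes", "no"),
                     .names = "{.col}.score"))
}

scores <- score_only(df, c(math, reading, science))
```

Here scores contains only math.score, reading.score and science.score; the original columns and history are dropped.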

Related

tidy syntax for matrix to tibble by index?

I have a matrix foo and want to create a data.frame or tibble like bar with the data in a long format with the indices as columns. What's a simple way to do this in the tidyverse?
z <- c(1,8,6,4,7,3,2,4,7)
foo <- matrix(z,3,3)
bar <- expand.grid(j=1:3,i=1:3)
bar$z <- z
foo
bar
Here are two ways.
The first is essentially a base R solution: swap magrittr's pipe for R's native pipe operator |> and it no longer needs the tidyverse.
The second is a tidyverse solution, though I find it too complicated for this task.
suppressPackageStartupMessages(
library(tidyverse)
)
z <- c(1,8,6,4,7,3,2,4,7)
foo <- matrix(z,3,3)
bar <- expand.grid(j=1:3,i=1:3)
bar$z <- z
cbind(
  i = foo %>% row() %>% c(),
  j = foo %>% col() %>% c(),
  z = foo %>% c()
) %>%
  as.data.frame()
#> i j z
#> 1 1 1 1
#> 2 2 1 8
#> 3 3 1 6
#> 4 1 2 4
#> 5 2 2 7
#> 6 3 2 3
#> 7 1 3 2
#> 8 2 3 4
#> 9 3 3 7
foo %>%
  t() %>%
  as.data.frame() %>%
  pivot_longer(everything(), values_to = "z") %>%
  mutate(i = c(row(foo)), j = c(col(foo))) %>%
  select(-name) %>%
  relocate(z, .after = j)
#> # A tibble: 9 × 3
#> i j z
#> <int> <int> <dbl>
#> 1 1 1 1
#> 2 2 1 8
#> 3 3 1 6
#> 4 1 2 4
#> 5 2 2 7
#> 6 3 2 3
#> 7 1 3 2
#> 8 2 3 4
#> 9 3 3 7
Created on 2022-10-12 with reprex v2.0.2
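For reference, the first approach written with the native pipe might look like the sketch below. It assumes R 4.1 or later, where |> is available, and loads no packages; the name bar_native is only for illustration.

```r
z <- c(1, 8, 6, 4, 7, 3, 2, 4, 7)
foo <- matrix(z, 3, 3)

# row() and col() return matrices of row/column indices;
# c() flattens them column by column, matching c(foo)
bar_native <- cbind(
  i = foo |> row() |> c(),
  j = foo |> col() |> c(),
  z = foo |> c()
) |>
  as.data.frame()
```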
Another base R method is to take advantage of as.table and as.data.frame:
as.data.frame(lapply(as.data.frame(as.table(foo)), as.numeric),
              col.names = c("row", "col", "val"))
#> row col val
#> 1 1 1 1
#> 2 2 1 8
#> 3 3 1 6
#> 4 1 2 4
#> 5 2 2 7
#> 6 3 2 3
#> 7 1 3 2
#> 8 2 3 4
#> 9 3 3 7

Is there a way to get subdataframes with purrr in magrittr pipes workflow without using data.frame name?

That is, I was interested in doing the same as in the example, but with purrr functions.
tibble(a, b = a * 2, c = 1) %>%
{lapply(X = names(.), FUN = function(.x) select(., 1:.x))}
[[1]]
# A tibble: 5 x 1
a
<int>
1 1
2 2
3 3
4 4
5 5
[[2]]
# A tibble: 5 x 2
a b
<int> <dbl>
1 1 2
2 2 4
3 3 6
4 4 8
5 5 10
[[3]]
# A tibble: 5 x 3
a b c
<int> <dbl> <dbl>
1 1 2 1
2 2 4 1
3 3 6 1
4 4 8 1
5 5 10 1
I only could do it if I named foo <- tibble(a, b = a * 2, c = 1) and inside map I did select(foo, ...), but I wanted to avoid that, since I wanted to mutate the named dataframe in pipe workflow.
Thank you!
You can use map in the following way:
library(dplyr)
library(purrr)
tibble(a = 1:5, b = a * 2, c = 1) %>%
  {map(names(.), function(.x) select(., 1:.x))}
Depending on your actual use case, you can also use imap, which passes each column's values (.x) along with its name (.y).
tibble(a = 1:5, b = a * 2, c = 1) %>%
  imap(function(.x, .y) select(., 1:.y))
#$a
# A tibble: 5 x 1
# a
# <int>
#1 1
#2 2
#3 3
#4 4
#5 5
#$b
# A tibble: 5 x 2
# a b
# <int> <dbl>
#1 1 2
#2 2 4
#3 3 6
#4 4 8
#5 5 10
#$c
# A tibble: 5 x 3
# a b c
# <int> <dbl> <dbl>
#1 1 2 1
#2 2 4 1
#3 3 6 1
#4 4 8 1
#5 5 10 1
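A variation on the map call above, offered as a sketch rather than a definitive answer: iterating over column positions with seq_along and subsetting with [ avoids passing column names into select. The name subsets is only for illustration.

```r
library(dplyr)
library(purrr)

subsets <- tibble(a = 1:5, b = a * 2, c = 1) %>%
  {
    # `.` is the piped tibble; the i-th element keeps its first i columns
    map(seq_along(.), function(i) .[seq_len(i)])
  }
```

The result is an unnamed list of three tibbles with one, two and three columns, and the data frame never needs a name of its own.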

Adding sequential IDs to rows in data frame

I have a dataset called Snapper_new that has 330 rows and each set of nine rows is named 1 through 9 as shown in the id column. I want each set of nine rows (1-9, 10-18, etc.) to have a unique ID (1,2, etc.). How would I do this in R?
Here is an approach with the tidyverse:
library(tidyverse)
Snapper_new <- rep(1:9, 3) %>%
  enframe(name = NULL, value = "id")
Snapper_new %>%
  mutate(group_start = case_when(id == 1 ~ 1,
                                 TRUE ~ as.numeric(0))) %>%
  mutate(group_index = cumsum(group_start))
#> # A tibble: 27 x 3
#> id group_start group_index
#> <int> <dbl> <dbl>
#> 1 1 1 1
#> 2 2 0 1
#> 3 3 0 1
#> 4 4 0 1
#> 5 5 0 1
#> 6 6 0 1
#> 7 7 0 1
#> 8 8 0 1
#> 9 9 0 1
#> 10 1 1 2
#> # ... with 17 more rows
Created on 2020-11-30 by the reprex package (v0.3.0)
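The same idea can be condensed, since id == 1 already marks the start of each group. This is a minimal sketch that assumes every set of nine rows begins with id equal to 1; the name indexed is illustrative.

```r
library(dplyr)

# Stand-in for Snapper_new: three sets of ids 1 through 9
Snapper_new <- tibble(id = rep(1:9, 3))

indexed <- Snapper_new %>%
  # cumsum over the logical vector increments at every id == 1
  mutate(group_index = cumsum(id == 1))
```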
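A base R answer:
# Example data frame with 330 rows
a <- data.frame(test = 1:330, pokus = 1:330)
# Repeat each group number 9 times, then trim to the number of rows
b <- unlist(lapply(1:ceiling(330/9), function(x) {replicate(9, x)}))
b <- b[1:nrow(a)]
a <- cbind(a, b)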
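If the rows are already in order, integer division gives the same group numbers without building and trimming a longer vector. A small sketch, where n_rows and group_id are illustrative names:

```r
n_rows <- 330

# Rows 1-9 map to 1, rows 10-18 map to 2, and so on
group_id <- (seq_len(n_rows) - 1) %/% 9 + 1
```

This relies only on row position, so it does not depend on the values in the id column.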

Create new column based on condition from other column per group using tidy evaluation

Similar to this question but I want to use tidy evaluation instead.
df = data.frame(group = c(1,1,1,2,2,2,3,3,3),
date = c(1,2,3,4,5,6,7,8,9),
speed = c(3,4,3,4,5,6,6,4,9))
> df
group date speed
1 1 1 3
2 1 2 4
3 1 3 3
4 2 4 4
5 2 5 5
6 2 6 6
7 3 7 6
8 3 8 4
9 3 9 9
The task is to create a new column (newValue) whose values equal the value of the date column (per group) in the row where speed == 4. Example: group 1 has a newValue of 2 because date[speed==4] = 2.
group date speed newValue
1 1 1 3 2
2 1 2 4 2
3 1 3 3 2
4 2 4 4 4
5 2 5 5 4
6 2 6 6 4
7 3 7 6 8
8 3 8 4 8
9 3 9 9 8
This works without tidy evaluation:
df %>%
  group_by(group) %>%
  mutate(newValue=date[speed==4L])
#> # A tibble: 9 x 4
#> # Groups: group [3]
#> group date speed newValue
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 3 2
#> 2 1 2 4 2
#> 3 1 3 3 2
#> 4 2 4 4 4
#> 5 2 5 5 4
#> 6 2 6 6 4
#> 7 3 7 6 8
#> 8 3 8 4 8
#> 9 3 9 9 8
But I got an error with tidy evaluation:
my_fu <- function(df, filter_var){
  filter_var <- sym(filter_var)
  df <- df %>%
    group_by(group) %>%
    mutate(newValue=!!filter_var[speed==4L])
}
my_fu(df, "date")
#> Error in quos(..., .named = TRUE): object 'speed' not found
Thanks in advance.
We can wrap the unquoting in parentheses. Otherwise, because !! binds less tightly than [, the whole expression filter_var[speed == 4L] is unquoted instead of filter_var alone, so speed is looked up outside the data frame and is not found.
library(rlang)
library(dplyr)
my_fu <- function(df, filter_var){
  filter_var <- sym(filter_var)
  df %>%
    group_by(group) %>%
    mutate(newValue=(!!filter_var)[speed==4L])
}
my_fu(df, "date")
# A tibble: 9 x 4
# Groups: group [3]
# group date speed newValue
# <dbl> <dbl> <dbl> <dbl>
#1 1 1 3 2
#2 1 2 4 2
#3 1 3 3 2
#4 2 4 4 4
#5 2 5 5 4
#6 2 6 6 4
#7 3 7 6 8
#8 3 8 4 8
#9 3 9 9 8
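As an alternative sketch, dplyr's .data pronoun can look up a column by its string name, which sidesteps sym() and !! altogether. The function name my_fu_data is hypothetical, and the sketch assumes each group has exactly one row with speed == 4.

```r
library(dplyr)

df <- data.frame(group = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                 date  = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                 speed = c(3, 4, 3, 4, 5, 6, 6, 4, 9))

# Hypothetical variant of my_fu that takes the column name as a string
my_fu_data <- function(df, filter_var) {
  df %>%
    group_by(group) %>%
    # .data[[filter_var]] is the named column within the current group;
    # a group with zero or several speed == 4 rows would make mutate fail
    mutate(newValue = .data[[filter_var]][speed == 4L]) %>%
    ungroup()
}

result <- my_fu_data(df, "date")
```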
You can also use sqldf: join df with a subquery that keeps only the rows where speed = 4:
library(sqldf)
df = data.frame(group = c(1,1,1,2,2,2,3,3,3),
date = c(1,2,3,4,5,6,7,8,9),
speed = c(3,4,3,4,5,6,6,4,9))
sqldf("SELECT df_origin.*, df4.`date` new_value FROM
df df_origin join (SELECT `group`, `date` FROM df WHERE speed = 4) df4
on (df_origin.`group` = df4.`group`)")

The dplyr way to get grouped differences

I am trying to figure out the dplyr way to do grouped differences.
Here is some fake data:
crossing(year=seq(1,4),week=seq(1,3)) %>%
  mutate(value = c(rep(4,3),rep(3,3),rep(2,3),rep(1,3)))
year week value
<int> <int> <dbl>
1 1 1 4
2 1 2 4
3 1 3 4
4 2 1 3
5 2 2 3
6 2 3 3
7 3 1 2
8 3 2 2
9 3 3 2
10 4 1 1
11 4 2 1
12 4 3 1
What I would like is year 1 - year 2, year 2 - year 3, and year 3 - year 4. The result would look like the following.
year week diffs
<int> <int> <dbl>
1 1 1 1
2 1 2 1
3 1 3 1
4 2 1 1
5 2 2 1
6 2 3 1
7 3 1 1
8 3 2 1
9 3 3 1
Edit:
I apologize. I was trying to make a simple reprex, but I messed up a lot.
Please let me know what the proper etiquette is. I don't want to ruffle any feathers.
I did not know that diff() was a function. What I am actually looking for is the percent difference, ((new - old) / old) * 100, and I am not able to find a straightforward way to use diff to get that value.
I am starting from the largest year; adding an arrange(desc(year)) to the above code is what I have. I would be trimming the smallest year, not the largest.
If this edit is worth a separate question, let me know.
If you don't have missing years for each week:
df %>%
  arrange(year) %>%
  group_by(week) %>%
  mutate(diffs = value - lead(value)) %>%
  na.omit() %>%
  select(-value)
# A tibble: 9 x 3
# Groups: week [3]
# year week diffs
# <int> <int> <dbl>
#1 1 1 1
#2 1 2 1
#3 1 3 1
#4 2 1 1
#5 2 2 1
#6 2 3 1
#7 3 1 1
#8 3 2 1
#9 3 3 1
You can use diff, but it needs adjusting, as it subtracts the other way and returns a vector that's one shorter than what it's passed:
library(tidyverse)
diffed <- crossing(year = seq(1,4),
                   week = seq(1,3)) %>%
  mutate(value = rep(4:1, each = 3)) %>%
  group_by(week) %>%
  mutate(value = c(-diff(value), NA)) %>%
  drop_na(value)
diffed
#> # A tibble: 9 x 3
#> # Groups: week [3]
#> year week value
#> <int> <int> <int>
#> 1 1 1 1
#> 2 1 2 1
#> 3 1 3 1
#> 4 2 1 1
#> 5 2 2 1
#> 6 2 3 1
#> 7 3 1 1
#> 8 3 2 1
#> 9 3 3 1
Using dplyr and do:
library(dplyr)
df %>% group_by(week) %>% do(cbind(.[-nrow(.),1:2],diffs=-diff(.$value)))
# # A tibble: 9 x 3
# # Groups: week [3]
# year week diffs
# <int> <int> <dbl>
# 1 1 1 1
# 2 2 1 1
# 3 3 1 1
# 4 1 2 1
# 5 2 2 1
# 6 3 2 1
# 7 1 3 1
# 8 2 3 1
# 9 3 3 1
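For the percent difference mentioned in the question's edit, one possible sketch uses lag within each week. It assumes "old" means the previous year's value for the same week and that the smallest year, which has no previous year, is the one to drop; the names pct and pct_diff are illustrative.

```r
library(dplyr)
library(tidyr)

df <- crossing(year = 1:4, week = 1:3) %>%
  mutate(value = rep(4:1, each = 3))

pct <- df %>%
  arrange(year) %>%
  group_by(week) %>%
  # (new - old) / old * 100, where old is the previous year's value
  mutate(pct_diff = (value - lag(value)) / lag(value) * 100) %>%
  ungroup() %>%
  # the first year in each week has no previous value, so drop it
  filter(!is.na(pct_diff))
```

With the fake data above, year 2 comes out at -25 for every week and year 4 at -50.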
