I can't think how to do this in a tidy fashion.
I have a table as follows:
tibble(
Min = c(1, 5, 12, 13, 19),
Max = c(3, 11, 12, 14, 19),
Value = c("a", "bb", "c", "d", "e" )
)
and I want to generate another table from it, as shown below:
tibble(
Row = c(1:3, 5:11, 12:12, 13:14, 19:19),
Value = c( rep("a", 3), rep("bb", 7), "c", "d", "d", "e" )
)
Grateful for any suggestions folk might have. The only 'solutions' which come to mind are a bit cumbersome.
1) If DF is the input (and each Value occurs in only one row, since the grouping is by Value), then:
library(dplyr)
DF %>%
group_by(Value) %>%
group_modify(~ tibble(Row = seq(.$Min, .$Max))) %>%
ungroup
giving:
# A tibble: 14 x 2
Value Row
<chr> <int>
1 a 1
2 a 2
3 a 3
4 bb 5
5 bb 6
6 bb 7
7 bb 8
8 bb 9
9 bb 10
10 bb 11
11 c 12
12 d 13
13 d 14
14 e 19
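For comparison, here is a base R sketch of the same expansion. It assumes the input is a plain data.frame called DF with the Min, Max and Value columns above. Each input row contributes Max - Min + 1 output rows, so Value is repeated that many times while seq() generates the row numbers for each input row. Unlike the group_by() version, this does not rely on Value being unique.

```r
DF <- data.frame(
  Min = c(1, 5, 12, 13, 19),
  Max = c(3, 11, 12, 14, 19),
  Value = c("a", "bb", "c", "d", "e"),
  stringsAsFactors = FALSE
)

# Number of output rows contributed by each input row
n_rows <- DF$Max - DF$Min + 1

out <- data.frame(
  Row = unlist(Map(seq, DF$Min, DF$Max)),  # Min:Max for every input row, concatenated
  Value = rep(DF$Value, times = n_rows),   # each Value repeated once per generated row
  stringsAsFactors = FALSE
)
out
```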
2) This one creates a list column L containing tibbles and then unnests it. Duplicate Value elements are ok with this one.
library(dplyr)
library(tidyr)
DF %>%
rowwise %>%
summarize(L = list(tibble(Value, Row = seq(Min, Max)))) %>%
ungroup %>%
unnest(L)
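The list-column idea in 2) has a direct base R analogue: build one small data frame per input row, then stack them with do.call(rbind, ...). The sketch below uses a hypothetical input DF2 in which the Value "a" appears twice. It shows that duplicates are handled row by row here as well.

```r
DF2 <- data.frame(
  Min = c(1, 5, 8),
  Max = c(2, 6, 8),
  Value = c("a", "bb", "a"),   # hypothetical data: "a" deliberately appears twice
  stringsAsFactors = FALSE
)

# One data frame per input row (the analogue of the list column L) ...
pieces <- Map(function(v, lo, hi) {
  data.frame(Value = v, Row = seq(lo, hi), stringsAsFactors = FALSE)
}, DF2$Value, DF2$Min, DF2$Max)

# ... then stack them (the analogue of unnest(L)); unname() avoids row names built from Value
out <- do.call(rbind, unname(pieces))
out
```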
I wish to dynamically select variables and modify them in a tibble. I have copied a sample problem below. Can someone please help with that?
Thanks
df <- tibble(
x = c(1, 2, 3),
y = c(4, 5, 6),
z = c(6, 7, 8))
variables = c("x", "y")
for (var in variables)
{
df <- df %>% mutate(var = var + 1)
}
We can use mutate with across
library(dplyr)
df %>%
mutate(across(all_of(variables), ~ . + 1))
-output
# A tibble: 3 x 3
# x y z
# <dbl> <dbl> <dbl>
#1 2 5 6
#2 3 6 7
#3 4 7 8
data
df <- tibble(
x = c(1, 2, 3),
y = c(4, 5, 6),
z = c(6, 7, 8))
variables = c("x", "y")
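Outside dplyr, the same update can be written in base R. Index the data frame with the character vector of column names, then apply the function to each selected column with lapply(). Here is a sketch using a plain data.frame with the same data:

```r
df <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6), z = c(6, 7, 8))
variables <- c("x", "y")

# df[variables] selects the columns by name; lapply() returns a list of
# modified columns, which is assigned back into the same columns
df[variables] <- lapply(df[variables], function(col) col + 1)
df
```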
Try this. You can use sym() (from rlang, re-exported by dplyr) together with !! and the := operator to get the evaluation you want. Here is the code:
library(dplyr)
#Data and code
df <- tibble(
x = c(1, 2, 3),
y = c(4, 5, 6),
z = c(6, 7, 8))
variables = c("x", "y")
for (var in variables)
{
var <- sym(var)
df <- df %>% mutate(!!var := !!var + 1)
}
Output:
# A tibble: 3 x 3
x y z
<dbl> <dbl> <dbl>
1 2 5 6
2 3 6 7
3 4 7 8
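The loop from the question can also be kept almost unchanged in base R. [[ ]] accepts a column name stored in a variable, so no quoting or unquoting is needed. Here is a sketch with a plain data.frame:

```r
df <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6), z = c(6, 7, 8))
variables <- c("x", "y")

for (var in variables) {
  # var is a string such as "x"; df[[var]] looks the column up by that name
  df[[var]] <- df[[var]] + 1
}
df
```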
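I have a dataset with columns group and x (actually I have more than 100 groups),
and I want to use dplyr to create a variable y for each group, where the first value of y is 1,
the second y = 1 * first x + 2 * first y, and so on (each y is the previous x plus twice the previous y).
The expected result is given by the dput further below.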
I tried creating a column y with all values equal to 1, then using
df %>% group_by(group) %>% mutate(var = shift(x) + 2 * shift(y)) %>% ungroup()
but then the formula always uses the initialized y value of 1, so each y is computed as
1 * previous x + 2 * 1
Could someone give me some ideas about this? Thank you!
The dput of my result data is:
structure(list(group = c("a", "a", "a", "a", "a", "b", "b", "b" ), x =
c(1, 2, 3, 4, 5, 6, 7, 8), y = c(1, 3, 8, 19, 42, 1, 8, 23)),
row.names = c(NA, -8L), class = c("tbl_df", "tbl", "data.frame" ))
To perform such a calculation we can use accumulate from purrr or Reduce in base R.
Since you are already using dplyr, we can use accumulate:
library(dplyr)
df %>%
group_by(group) %>%
mutate(y1 = purrr::accumulate(x[-n()], ~.x * 2 + .y, .init = 1))
# group x y y1
# <chr> <dbl> <dbl> <dbl>
#1 a 1 1 1
#2 a 2 3 3
#3 a 3 8 8
#4 a 4 19 19
#5 a 5 42 42
#6 b 6 1 1
#7 b 7 8 8
#8 b 8 23 23
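The answer mentions Reduce() as the base R alternative. The sketch below assumes df is the data from the dput above and that the recurrence is y[1] = 1, y[i] = 2 * y[i - 1] + x[i - 1], which reproduces the expected y. Reduce() with accumulate = TRUE and init = 1 returns the running values 1, 2 * 1 + x[1], and so on. ave() applies this within each group.

```r
df <- data.frame(
  group = c("a", "a", "a", "a", "a", "b", "b", "b"),
  x = c(1, 2, 3, 4, 5, 6, 7, 8),
  y = c(1, 3, 8, 19, 42, 1, 8, 23)
)

# One step of the recurrence: next y from the previous y and the previous x
next_y <- function(prev_y, prev_x) 2 * prev_y + prev_x

df$y1 <- ave(df$x, df$group, FUN = function(x) {
  # The last x of a group is never used, hence x[-length(x)];
  # with init = 1 the accumulated result has one value per row
  Reduce(next_y, x[-length(x)], accumulate = TRUE, init = 1)
})
df
```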
I have two dataframes like so
df_1 <- data.frame(Min = c(1, 4, 9, 25),
Max = c(3, 7, 14, 100))
df_2 <- data.frame(Value = c(5, 2, 33),
Symbol = c("B", "A", "D"))
I want to attach df_2$Symbol to df_1 based on whether or not df_2$Value falls between df_1$Min and df_1$Max. If there's no df_2$Value in the appropriate range I'd like NA instead:
df_target <- data.frame(
Min = c(1, 4, 9, 25),
Max = c(3, 7, 14, 100),
Symbol = c("A", "B", NA, "D")
)
If df_1 and df_2 were of equal lengths this would be simple with findInterval or something with cut but alas...
A solution in either base or tidyverse would be appreciated.
We could use a non-equi join in data.table:
library(data.table)
setDT(df_1)[df_2, Symbol := Symbol, on = .(Min < Value, Max > Value)]
df_1
# Min Max Symbol
#1: 1 3 A
#2: 4 7 B
#3: 9 14 <NA>
#4: 25 100 D
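Since a base R solution was also requested, here is a sketch of the same lookup without data.table. For each interval of df_1, find the first Value in df_2 that lies strictly between Min and Max (mirroring the strict < and > of the join above) and take its Symbol. The result is NA when no Value qualifies.

```r
df_1 <- data.frame(Min = c(1, 4, 9, 25), Max = c(3, 7, 14, 100))
df_2 <- data.frame(Value = c(5, 2, 33), Symbol = c("B", "A", "D"),
                   stringsAsFactors = FALSE)

# For each interval, the index of the first Value strictly inside it;
# which(...)[1] is NA when no Value falls in the interval
idx <- mapply(function(lo, hi) which(df_2$Value > lo & df_2$Value < hi)[1],
              df_1$Min, df_1$Max)

# Indexing with NA yields NA, which gives the desired NA for empty intervals
df_1$Symbol <- df_2$Symbol[idx]
df_1
```

This checks every Value against every interval, so it also copes with overlapping intervals, at the cost of scaling with nrow(df_1) * nrow(df_2).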
Or we can use fuzzy_left_join from fuzzyjoin (dplyr is loaded for %>%):
library(fuzzyjoin)
library(dplyr)
fuzzy_left_join(df_1, df_2, by = c('Min' = 'Value', 'Max' = 'Value'),
match_fun = list(`<`, `>`)) %>%
dplyr::select(-Value)
# Min Max Symbol
#1 1 3 A
#2 4 7 B
#3 9 14 <NA>
#4 25 100 D
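The question mentions findInterval(), which does not actually need the two data frames to have equal lengths. The sketch below assumes df_1 is sorted by Min and that its intervals do not overlap. findInterval() tells which interval's Min each Value follows, and a second check against Min and Max discards Values that fall into a gap between intervals. If several Values land in the same interval, the last one wins.

```r
df_1 <- data.frame(Min = c(1, 4, 9, 25), Max = c(3, 7, 14, 100))
df_2 <- data.frame(Value = c(5, 2, 33), Symbol = c("B", "A", "D"),
                   stringsAsFactors = FALSE)

# Index of the last Min that is <= each Value (0 if the Value is below every Min)
i <- findInterval(df_2$Value, df_1$Min)

# Keep only Values strictly inside the interval they were assigned to
ok <- i > 0
ok[ok] <- df_2$Value[ok] > df_1$Min[i[ok]] & df_2$Value[ok] < df_1$Max[i[ok]]

df_1$Symbol <- NA_character_
df_1$Symbol[i[ok]] <- df_2$Symbol[ok]
df_1
```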
I am trying to calculate the median (though it could be substituted by a similar metric) by group for multiple columns, based on subsets defined by other columns. This is a direct follow-on question from a previous post of mine. I have attempted to incorporate calculating the median via aggregate into the Map(function(x,y) dosomething, x, y) solution kindly provided by @Frank, but that didn't work. Let me illustrate:
Calculate the median of A and B by groups GRP1 and GRP2:
df <- data.frame(GRP1 = c("A","A","A","A","A","A","B","B","B","B","B","B"), GRP2 = c("A","A","A","B","B","B","A","A","A","B","B","B"), A = c(0,4,6,7,0,1,9,0,0,8,3,4), B = c(6,0,4,8,6,7,0,9,9,7,3,0))
med <- aggregate(.~GRP1+GRP2,df,FUN=median)
Simple. Now add columns that define which rows should be used when calculating the median: rows where the defining column is NA should be dropped. Column a defines which rows are used for the median of column A, and likewise column b for column B:
a <- c(1,4,7,3,NA,3,7,NA,NA,4,8,1)
b <- c(5,NA,7,9,5,6,NA,8,1,7,2,9)
df1 <- cbind(df,a,b)
As mentioned above, I have tried combining Map and aggregate, but that didn't work. I assume that Map doesn't know what to do with GRP1 and GRP2.
med1 <- Map(function(x,y) aggregate(.~GRP1+GRP2,df1[!is.na(y)],FUN=median), x=df1[,3:4], y=df1[, 5:6])
This is the result I'm looking for:
GRP1 GRP2 A B
1 A A 4 5
2 B A 9 9
3 A B 4 7
4 B B 4 3
Any help will be much appreciated!
Using data.table
library(data.table)
setDT(df1)
df1[, .(A = median(A[!is.na(a)]), B = median(B[!is.na(b)])), by = .(GRP1, GRP2)]
GRP1 GRP2 A B
1: A A 4 5
2: A B 4 7
3: B A 9 9
4: B B 4 3
The same logic in dplyr:
library(dplyr)
df1 %>%
group_by(GRP1, GRP2) %>%
summarise(A = median(A[!is.na(a)]), B = median(B[!is.na(b)]))
The original df1:
df1 <- data.frame(
GRP1 = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"),
GRP2 = c("A", "A", "A", "B", "B", "B", "A", "A", "A", "B", "B", "B"),
A = c(0, 4, 6, 7, 0, 1, 9, 0, 0, 8, 3, 4),
B = c(6, 0, 4, 8, 6, 7, 0, 9, 9, 7, 3, 0),
a = c(1, 4, 7, 3, NA, 3, 7, NA, NA, 4, 8, 1),
b = c(5, NA, 7, 9, 5, 6, NA, 8, 1, 7, 2, 9)
)
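One way to make the Map() idea from the question work in base R is to pass each value column together with its selector column, keep the rows where the selector is not NA, and hand the grouping columns to aggregate() explicitly through its by argument. The per-column results are then merged on GRP1 and GRP2. This is a sketch: the helper names per_col and med1 are illustrative, and merge(all = TRUE) keeps a group even if one column has no usable rows in it.

```r
df1 <- data.frame(
  GRP1 = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"),
  GRP2 = c("A", "A", "A", "B", "B", "B", "A", "A", "A", "B", "B", "B"),
  A = c(0, 4, 6, 7, 0, 1, 9, 0, 0, 8, 3, 4),
  B = c(6, 0, 4, 8, 6, 7, 0, 9, 9, 7, 3, 0),
  a = c(1, 4, 7, 3, NA, 3, 7, NA, NA, 4, 8, 1),
  b = c(5, NA, 7, 9, 5, 6, NA, 8, 1, 7, 2, 9),
  stringsAsFactors = FALSE
)

# Pair each value column with its selector column: A with a, B with b
per_col <- Map(function(val, sel, name) {
  keep <- !is.na(sel)
  out <- aggregate(list(med = val[keep]),
                   by = list(GRP1 = df1$GRP1[keep], GRP2 = df1$GRP2[keep]),
                   FUN = median)
  names(out)[3] <- name  # rename the summary column to A or B
  out
}, df1[c("A", "B")], df1[c("a", "b")], c("A", "B"))

# Merge the per-column results back together on the grouping columns
med1 <- Reduce(function(l, r) merge(l, r, by = c("GRP1", "GRP2"), all = TRUE),
               per_col)
med1
```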
With dplyr:
library(dplyr)
df1 %>%
# Set to NA the values we don't want to include in the median
mutate(A = ifelse(is.na(a), NA, A),
B = ifelse(is.na(b), NA, B)) %>%
group_by(GRP1, GRP2) %>%
summarise(A = median(A, na.rm = TRUE),
B = median(B, na.rm = TRUE))
# A tibble: 4 x 4
# Groups: GRP1 [?]
GRP1 GRP2 A B
<fct> <fct> <dbl> <dbl>
1 A A 4 5
2 A B 4 7
3 B A 9 9
4 B B 4 3
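The same masking idea also works with base R aggregate(), which is where the question started. The sketch sets the excluded values to NA and then tells aggregate() to keep rows containing NA with na.action = na.pass. The default na.omit would drop a whole row, so row 2, whose B is masked, would also lose its A value. median() then skips the masked values through na.rm = TRUE, which aggregate() passes on to FUN.

```r
df1 <- data.frame(
  GRP1 = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"),
  GRP2 = c("A", "A", "A", "B", "B", "B", "A", "A", "A", "B", "B", "B"),
  A = c(0, 4, 6, 7, 0, 1, 9, 0, 0, 8, 3, 4),
  B = c(6, 0, 4, 8, 6, 7, 0, 9, 9, 7, 3, 0),
  a = c(1, 4, 7, 3, NA, 3, 7, NA, NA, 4, 8, 1),
  b = c(5, NA, 7, 9, 5, 6, NA, 8, 1, 7, 2, 9),
  stringsAsFactors = FALSE
)

# Mask values whose selector column is NA
df1$A[is.na(df1$a)] <- NA
df1$B[is.na(df1$b)] <- NA

# na.pass keeps rows with NA in A or B; na.rm = TRUE is forwarded to median()
med <- aggregate(cbind(A, B) ~ GRP1 + GRP2, data = df1, FUN = median,
                 na.rm = TRUE, na.action = na.pass)
med
```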