I often have to dynamically generate multiple columns based on values in existing columns. Is there a dplyr equivalent of the following?
cols <- c("x", "y")
foo <- c("a", "b")
df <- data.frame(a = 1, b = 2)
df[cols] <- df[foo] * 5
> df
a b x y
1 1 2 5 10
Not the most elegant:
library(tidyverse)
df %>%
  mutate_at(vars(foo), function(x) x * 5) %>%
  set_names(., nm = cols) %>%
  cbind(df, .)
a b x y
1 1 2 5 10
This can be made more elegant, as suggested by @akrun:
df %>%
  mutate_at(vars(foo), list(new = ~ . * 5)) %>%
  rename_at(vars(matches('new')), ~ c('x', 'y'))
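With dplyr 1.0.0 or later, the same idea can probably be written with across() and rename_with(), which supersede the _at variants. This is only a minimal sketch; it assumes the temporary "_new" suffix does not clash with any existing column name.

```r
library(dplyr)

cols <- c("x", "y")   # names of the new columns
foo  <- c("a", "b")   # names of the source columns
df   <- data.frame(a = 1, b = 2)

out <- df %>%
  # create a_new and b_new next to the originals
  mutate(across(all_of(foo), ~ .x * 5, .names = "{.col}_new")) %>%
  # rename the temporary columns to the requested names, in order
  rename_with(~ cols, .cols = ends_with("_new"))

out
```

Because all_of() takes a character vector, foo and cols can be built at run time.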
Let's say I have the following data: x <- 1:2.
My desired output is a data.frame() like the following:
a b
1 2
With base R I would do something like:
df <- data.frame(t(x))
colnames(df) <- c("a", "b")
Question: How would I do this with the pipe operator?
What I tried so far:
library(magrittr)
x %>% data.frame(a = .[1], b = .[2])
After the transpose, convert to a tibble with as_tibble and set the column names with setNames:
library(dplyr)
library(tibble)
x %>%
  t %>%
  as_tibble(.name_repair = "unique") %>%
  setNames(c("a", "b"))
# A tibble: 1 x 2
# a b
# <int> <int>
#1 1 2
Another option, if we want to keep the OP's syntax, is to wrap the call in {}:
x %>%
{data.frame(a = .[1], b = .[2])}
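The braces matter because magrittr still inserts the left-hand side as the first argument when the dot only appears inside nested calls such as .[1]. A small sketch comparing the two forms (the names bad and good are just illustrative):

```r
library(magrittr)

x <- 1:2

# Without braces the dot only appears in nested calls, so x is also passed
# as the first argument: data.frame(x, a = x[1], b = x[2])
bad <- x %>% data.frame(a = .[1], b = .[2])

# With braces the expression is evaluated as written, with . bound to x
good <- x %>% {data.frame(a = .[1], b = .[2])}

dim(bad)
dim(good)
```

The unbraced version ends up with an extra column holding x itself, and two rows instead of one.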
I'm trying to put a dplyr pipeline in a function, but after reading the vignette multiple times as well as the tidy evaluation guide (https://tidyeval.tidyverse.org/dplyr.html), I still can't get it to work...
#Sample data:
dat <- read.table(text = "A ID B
1 X 83
2 X NA
3 X NA
4 Y NA
5 X 2
6 Y 2
12 Y 10
7 Y 18
8 Y 85", header = TRUE)
# What I'm trying to do:
x <- dat %>% filter(!is.na(B)) %>% count('ID') %>% filter(freq>3)
x$ID
# Now in a function:
n_occurences <- function(df, n, column){
  # Group by ID and return IDs with number of non-NA > n in column
  column <- enquo(column)
  x <- df %>%
    filter(!is.na(!!column)) %>%
    count('ID') %>% filter(freq>n)
  x$ID
}
# Let's try:
col <- 'B'
n_occurences(dat, n=3, column = col)
There is no error, but the output is wrong. This has something to do with tidy evaluation, but I just can't get my head around it.
With rlang 0.4.0 or later, we can do this much more easily by using the {{ }} (curly-curly) operator:
library(rlang)
library(dplyr)
n_occurences <- function(df, n1, column){
  df %>%
    filter(!is.na({{column}})) %>%
    count(ID) %>%
    filter(n > n1) %>%
    pull(ID)
}
n_occurences(dat, n1 = 3, column = B)
#[1] Y
#Levels: X Y
If we intend to pass a quoted string, convert it to a symbol (sym) and then unquote it (!!):
n_occurences <- function(df, n1, column){
  column <- rlang::sym(column)
  df %>%
    filter(!is.na(!!column)) %>%
    count(ID) %>%
    filter(n > n1) %>%
    pull(ID)
}
col <- 'B'
n_occurences(dat, n1=3, column = col)
#[1] Y
#Levels: X Y
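As far as I can tell, the original attempt returns the wrong IDs because enquo() captures the variable col, which evaluates to the string "B"; is.na("B") is always FALSE, so no rows are dropped. When the column name arrives as a string, the .data pronoun is another option. The sketch below uses a hypothetical function name, n_occurrences_str, and rebuilds only the ID and B columns of the sample data.

```r
library(dplyr)

# ID and B columns of the sample data from the question
dat <- data.frame(
  ID = c("X", "X", "X", "Y", "X", "Y", "Y", "Y", "Y"),
  B  = c(83, NA, NA, NA, 2, 2, 10, 18, 85)
)

# Hypothetical variant that takes the column name as a string
n_occurrences_str <- function(df, n1, column) {
  df %>%
    filter(!is.na(.data[[column]])) %>%  # look the column up by name
    count(ID) %>%                        # one row per ID, count in column n
    filter(n > n1) %>%
    pull(ID)
}

col <- "B"
res <- n_occurrences_str(dat, n1 = 3, column = col)
res
```

This avoids the explicit sym() and !! steps while still accepting a character argument.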
I'm new to pipes in R.
I have a dataframe like this:
library(magrittr)
library(dplyr)
df = data.frame(a= c(1,2,3,4,5), b = c(3,4,5,6,7))
The result is
df_min = df %>% filter(a > 2) %$% as.data.frame( cbind(a=a*10, b))
> df_min
a b
1 30 5
2 40 6
3 50 7
Is there a more convenient and shorter way to write this than %$% as.data.frame( cbind(a=a*10, b))?
A shorter option with data.table
library(data.table)
setDT(df)[a > 2, .(a = a * 10, b)]
A more convenient way to do it is as follows:
library(magrittr)
library(dplyr)
df = data.frame(a= c(1,2,3,4,5), b = c(3,4,5,6,7))
df_min = df %>% filter(a>2) %>% mutate(a=a*10)
class(df_min)
df_min
You can read more about mutate, with further examples, in the dplyr documentation.
I'm trying to apply a function over the rows of a data frame and return a value based on the value of each element in a column. I'd prefer to pass the whole dataframe instead of naming each variable as the actual code has many variables - this is a simple example.
I've tried purrr map_dbl and rowwise but can't get either to work. Any suggestions please?
#sample df
df <- data.frame(Y=c("A","B","B","A","B"),
X=c(1,5,8,23,31))
#required result
Res <- data.frame(Y=c("A","B","B","A","B"),
X=c(1,5,8,23,31),
NewVal=c(10,500,800,230,3100)
)
#use mutate and map or rowwise etc
Res <- df %>%
mutate(NewVal=map_dbl(.x=.,.f=FnAdd(.)))
Res <- df %>%
rowwise() %>%
mutate(NewVal=FnAdd(.))
#sample fn
FnAdd <- function(Data){
  if(Data$Y=="A"){
    X=Data$X*10
  }
  if(Data$Y=="B"){
    X=Data$X*100
  }
  return(X)
}
If there are multiple values, it is better to build a key/value dataset, join it, and then do the multiplication:
keyVal <- data.frame(Y = c("A", "B"), NewVal = c(10, 100))
df %>%
left_join(keyVal, by = "Y") %>%
mutate(NewVal = X*NewVal)
# Y X NewVal
#1 A 1 10
#2 B 5 500
#3 B 8 800
#4 A 23 230
#5 B 31 3100
It is not clear how many unique values there are in the 'Y' column of the actual dataset. If there are only a few, case_when can be used:
FnAdd <- function(Data){
  Data %>%
    mutate(NewVal = case_when(Y == "A" ~ X * 10,
                              Y == "B" ~ X * 100,
                              TRUE ~ X))
}
FnAdd(df)
# Y X NewVal
#1 A 1 10
#2 B 5 500
#3 B 8 800
#4 A 23 230
#5 B 31 3100
You were originally looking for a solution using dplyr's rowwise() function, so here is that solution. The nice thing about this approach is that you don't need to create a separate function.
Here's the version using ifelse(), with an explicit NA for rows that match neither value:
df %>%
  rowwise() %>%
  mutate(NewVal = ifelse(Y == "A", X * 10,
                  ifelse(Y == "B", X * 100, NA_real_)))
and here's the version using case_when:
df %>%
  rowwise() %>%
  mutate(NewVal = case_when(Y == "A" ~ X * 10,
                            Y == "B" ~ X * 100))
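Since case_when() and ifelse() are both vectorised, rowwise() is probably not needed for this particular example. A sketch of the same calculation applied to whole columns, using the sample data from the question:

```r
library(dplyr)

df <- data.frame(Y = c("A", "B", "B", "A", "B"),
                 X = c(1, 5, 8, 23, 31))

# case_when() evaluates each condition over the whole column at once;
# rows matching neither condition would get NA
res <- df %>%
  mutate(NewVal = case_when(Y == "A" ~ X * 10,
                            Y == "B" ~ X * 100))
res
```

rowwise() is still useful when the function being applied really does accept only one row's values at a time.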
I have a dataframe:
source= c("A", "A", "B")
target = c("B", "C", "C")
source_A = c(5, 5, 6)
target_A = c(6, 7, 7)
source_B = c(10, 10, 11)
target_B = c(11, 12, 12)
c = c(0.5, 0.6, 0.7)
df = data.frame(source, target, source_A, target_A, source_B, target_B, c)
> df
source target source_A target_A source_B target_B c
1 A B 5 6 10 11 0.5
2 A C 5 7 10 12 0.6
3 B C 6 7 11 12 0.7
How can I reduce this dataframe to only the values for each unique source and target value (ignoring column c)?
For the values [A B C], the result should be:
id A B
1 A 5 10
2 B 6 11
3 C 7 12
At the moment I do something like this:
df1 <- df[,c("source","source_A", "source_B")]
df2 <- df[,c("target","target_A", "target_B")]
names(df1)[names(df1) == 'source'] <- 'id'
names(df1)[names(df1) == 'source_A'] <- 'A'
names(df1)[names(df1) == 'source_B'] <- 'B'
names(df2)[names(df2) == 'target'] <- 'id'
names(df2)[names(df2) == 'target_A'] <- 'A'
names(df2)[names(df2) == 'target_B'] <- 'B'
df3 <- rbind(df1,df2)
df3[!duplicated(df3$id),]
id A B
1 A 5 10
3 B 6 11
5 C 7 12
In reality, I have tens of columns so this is non-viable long term.
How can I do this more succinctly (and ideally, generaliseable to more columns)?
library(dplyr)
library(magrittr)
# select columns by name prefix rather than relying on objects in the workspace
df1 <- select(df, starts_with("source"))
df2 <- select(df, starts_with("target"))
names(df1) <- names(df2)
df <- bind_rows(df1, df2)
df %<>% group_by(target, target_A, target_B) %>% slice(1)
This should do it, but I do not quite know how you want to generalize it.
I don't think this is the most elegant solution in the world, but it serves the purpose. Hopefully the columns that you intend to use can be targeted by the column name string pattern!
Here's a more general method with dplyr functions. You basically need to gather everything into a long format, where you can rename the variable accordingly, then spread them back into id, A, B:
library(dplyr)
library(tidyr)
df %>%
  select(-c) %>%
  mutate(index = row_number()) %>%
  gather(key, value, -index) %>%
  separate(key, c("type", "name"), fill = "right") %>%
  mutate(name = ifelse(is.na(name), "id", name)) %>%
  spread(key = name, value = value) %>%
  select(id, matches("[A-Z]", ignore.case = FALSE)) %>%
  distinct()
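If the columns always follow the prefix, prefix_A, prefix_B, ... naming pattern, another way to generalise is to strip the prefix with a small helper and stack the two halves. The helper name strip_prefix is hypothetical, and the sketch assumes dplyr 1.0.0 or later and that the values are consistent for a given id.

```r
library(dplyr)

df <- data.frame(
  source   = c("A", "A", "B"),
  target   = c("B", "C", "C"),
  source_A = c(5, 5, 6),
  target_A = c(6, 7, 7),
  source_B = c(10, 10, 11),
  target_B = c(11, 12, 12),
  c        = c(0.5, 0.6, 0.7)
)

# Hypothetical helper: keep the columns for one prefix, rename the bare
# prefix column to id, and drop the "<prefix>_" part from the others
strip_prefix <- function(data, prefix) {
  data %>%
    select(id = all_of(prefix), starts_with(paste0(prefix, "_"))) %>%
    rename_with(~ sub(paste0("^", prefix, "_"), "", .x))
}

out <- bind_rows(strip_prefix(df, "source"), strip_prefix(df, "target")) %>%
  distinct(id, .keep_all = TRUE)   # keep the first row seen for each id
out
```

Unlike the gather/spread route, this keeps the numeric columns numeric, since nothing is stacked into a single value column.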