Mutate data frame via function in R - r

Sorry for the basic question, but I could not find an example in this forum to solve this question. I've tried this one and this one.
I want to change / create a new variable in my data.frame via function in R and tidyverse:
Example:
crivo <- function(x) {
x <<- x %>%
mutate(resp_1 = if_else(MEMO_RE01 == 0,"VN","FP")) %>%
mutate(resp_2 = if_else(MEMO_RE02 == 1,"VP","FN"))
}
crivo(memo_re)
My data.frame is named "memo_re", but I'll use this function on other datasets as well, just by changing the x argument. R is creating a new data.frame named x instead of creating a new variable in "memo_re" (the original dataset). In other words, I want a function that does this:
memo_re <- memo_re %>% mutate(resp_1 = if_else(MEMO_RE01 == 0,"VN","FP"))
But I need to change many datasets and because of that, I want to be able to specify which dataset I'll change.
reproducible code
library(tidyverse)
memo_re <- data.frame(MEMO_RE01=rep(c(0,1),100), MEMO_RE02=c(0,1))
crivo <- function(x) {
x <<- x %>%
mutate(resp_1 = if_else(MEMO_RE01 == 0,"VN","FP")) %>%
mutate(resp_2 = if_else(MEMO_RE02 == 1,"VP","FN"))
}
crivo(memo_re)

R is doing exactly what you asked it to do. In the definition of crivo, you assign the new data frame, called x, outside the function. That is what the <<- operator does: it looks for an existing x in the enclosing environments and, if it finds none, creates x in the global environment. After running your code, use ls() to see what is in your environment, then look at x. Everything is there, just as you asked, including the correctly mutated data frame x.
> memo_re <- data.frame(MEMO_RE01=rep(c(0,1),100), MEMO_RE02=c(0,1))
>
> crivo <- function(x) {
+ x <<- x %>%
+ mutate(resp_1 = if_else(MEMO_RE01 == 0,"VN","FP")) %>%
+ mutate(resp_2 = if_else(MEMO_RE02 == 1,"VP","FN"))
+ }
> crivo(memo_re)
> ls()
[1] "crivo" "memo_re" "x"
> head(x)
MEMO_RE01 MEMO_RE02 resp_1 resp_2
1 0 0 VN FN
2 1 1 FP VP
3 0 0 VN FN
4 1 1 FP VP
5 0 0 VN FN
6 1 1 FP VP
Now, if you wanted to have crivo() return something that you could then assign any name you wanted, you should use
crivo <- function(x) {
x %>%
mutate(resp_1 = if_else(MEMO_RE01 == 0,"VN","FP"),
resp_2 = if_else(MEMO_RE02 == 1,"VP","FN"))
}
Note that I haven't used the <<- operator anywhere. As a result, the crivo function returns the mutated data frame, so you can do
new <- memo_re %>% crivo()
This way, you can pipe anything you want to crivo and assign it to any new variable. Alternatively, if you just wanted to call the function on memo_re, you can do that too:
memo_re <- memo_re %>% crivo()
Note that the "classic" way to write a function is to use return() to specify what you want the function to return. If you don't use return() (as I haven't above), R returns the value of the last expression evaluated in the function. Here, that is just the mutated data frame.
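Since the goal is to apply the same function to many datasets, one option is to keep them in a named list and map crivo over it. This is only a minimal sketch: other_re is a hypothetical second dataset invented for illustration, and crivo is the version without <<- shown above.

```r
library(dplyr)

# crivo as defined above: it returns the mutated data frame
crivo <- function(x) {
  x %>%
    mutate(resp_1 = if_else(MEMO_RE01 == 0, "VN", "FP"),
           resp_2 = if_else(MEMO_RE02 == 1, "VP", "FN"))
}

memo_re  <- data.frame(MEMO_RE01 = rep(c(0, 1), 100), MEMO_RE02 = c(0, 1))
other_re <- data.frame(MEMO_RE01 = c(1, 0), MEMO_RE02 = c(1, 1))  # hypothetical dataset

# apply crivo to every dataset and keep the results under the same names
datasets <- list(memo_re = memo_re, other_re = other_re)
datasets <- lapply(datasets, crivo)

head(datasets$memo_re)
```

Each element of datasets can then be assigned back to its original name if separate objects are needed, for example memo_re <- datasets$memo_re.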

Related

Using function-scoped variable in a tidyverse filter expression

Suppose I have this code:
library(dplyr)
foo <- function(df, var) {
message("var is ", var)
df %>% filter(var==var)
}
df <- data.frame(var=c(1,2,3))
foo(df, 3)
The output is unfiltered, because var==var uses only the data frame column, and not the function parameter. See below:
> foo(df, 3)
var is 3
var
1 1
2 2
3 3
What I always do is rename the function parameter the_var, and use var == the_var. However, I'd like to learn more about tidyverse scoping.
How can I filter the var column by the var function parameter value without changing any names?
We can unquote the variable with !! inside the function, so that it is evaluated in the function's environment instead of being looked up as a column of the data frame:
foo <- function(df, var) {
message("var is ", var)
df %>%
filter(var == !!var)
}
-output
foo(df, 3)
#var is 3
# var
#1 3
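Another way to make the two scopes explicit is to use the .data and .env pronouns that are available inside data-masking verbs such as filter(). A minimal sketch with the same foo and df as in the question:

```r
library(dplyr)

foo <- function(df, var) {
  message("var is ", var)
  # .data$var is the data frame column, .env$var is the function argument
  df %>%
    filter(.data$var == .env$var)
}

df <- data.frame(var = c(1, 2, 3))
res <- foo(df, 3)
res
```

This keeps both names unchanged and states on each side of the comparison where the value comes from.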

For loop to extract data

I have a data set with these variables: Branch, Item, Sales, Stock. I need to write a for loop that extracts the data meeting the following conditions.
The same item
1- appears in different branches
2- has sales higher than its stock
The result should be saved in a data frame.
The code I used is
trials <- sample_n(Data_with_stock,1000)
for (i in 1:nrow(trials))
{
if(trials$sales[i] > trials$stock[i] & trials$item[i] == trials$item[i+1] & trials$branch[i] != trials$branch[i+1])
{s <-data.frame( (trials$NAME[i])
,(trials$branch[i]))
}
}
I suggest using the dplyr library. After installing it, and assuming "df" is your dataset, use the commands below for questions 1 and 2.
Question 1
question_one = df %>%
group_by(Item) %>%
summarise(No_of_branches = n_distinct(Branch))
items_with_more_than_one_branch = question_one[which(question_one$No_of_branches > 1), "Item"]
Question 2: Similarly,
question_two = df %>%
group_by(Item) %>%
summarise(Stock_Val = sum(Stock), Sales_Val = sum(Sales))
item_with_sales_greater_than_stock = question_two[which(question_two$Sales_Val > question_two$Stock_Val), "Item"]
I couldn't solve this without dplyr. If you haven't used it yet, I suggest learning it, as it will always be useful for data crunching.
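The two conditions can also be checked in a single pipeline. The sketch below uses a small made-up data frame (the values are hypothetical and only illustrate the shape of the data) and, like the answer above, compares total sales with total stock per item:

```r
library(dplyr)

# hypothetical toy data with the four columns from the question
df <- data.frame(
  Branch = c("A", "B", "A", "A", "B"),
  Item   = c("pen", "pen", "ink", "pad", "pad"),
  Sales  = c(10, 5, 9, 1, 1),
  Stock  = c(4, 6, 1, 5, 5)
)

result <- df %>%
  group_by(Item) %>%
  summarise(No_of_branches = n_distinct(Branch),
            Sales_Val = sum(Sales),
            Stock_Val = sum(Stock)) %>%
  # keep items sold in more than one branch whose sales exceed their stock
  filter(No_of_branches > 1, Sales_Val > Stock_Val)

result
```

If the comparison should be made per branch rather than on totals, the filter on Sales and Stock would have to be applied before summarising.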
As you just want to fix your code:
You missed one = in your code.
Use:
trials <- sample_n(Data_with_stock, 1000)
# define s before the loop: one row per row of trials, two columns (NAME, branch)
s <- array(NA, dim = c(nrow(trials), 2))
# stop one row early, because row i is compared with row i + 1
for (i in seq_len(nrow(trials) - 1)) {
  # comparing neighbouring rows only finds matches if trials is sorted by item
  if (trials$sales[i] > trials$stock[i] && trials$item[i] == trials$item[i + 1] && trials$branch[i] != trials$branch[i + 1]) {
    s[i, ] <- c(as.character(trials$NAME[i]), as.character(trials$branch[i]))
  }
  # rows that fail the test stay NA; you can drop them afterwards with na.omit()
}

How can I simultaneously assign value to multiple new columns with R and dplyr?

Given
base <- data.frame( a = 1)
f <- function() c(2,3,4)
I am looking for a solution that would result in a function f being applied to each row of base data frame and the result would be appended to each row. Neither of the following works:
result <- base %>% rowwise() %>% mutate( c(b,c,d) = f() )
result <- base %>% rowwise() %>% mutate( (b,c,d) = f() )
result <- base %>% rowwise() %>% mutate( b,c,d = f() )
What is the correct syntax for this task?
This appears to be a similar problem (Assign multiple new variables on LHS in a single line in R) but I am specifically interested in solving this with functions from tidyverse.
I think the best you are going to do is a do() to modify the data.frame. Perhaps
base %>% do(cbind(., setNames(as.list(f()), c("b","c","d"))))
It would probably be best if f() returned a list in the first place, with one element per column.
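With dplyr 1.0.0 or later there is another option: when an unnamed expression inside mutate() returns a data frame, its columns are added to the result. The sketch below relies on that behaviour; f_df is a hypothetical wrapper introduced here only to give the values of f() column names.

```r
library(dplyr)

base <- data.frame(a = 1)
f <- function() c(2, 3, 4)

# hypothetical helper: wrap f() so it returns a one-row data frame with named columns
f_df <- function() {
  v <- f()
  data.frame(b = v[1], c = v[2], d = v[3])
}

result <- base %>%
  rowwise() %>%
  mutate(f_df()) %>%   # unnamed data frame result is unpacked into columns b, c, d
  ungroup()

result
```

This follows the same idea as the suggestion above: the function returns one named element per column, and dplyr does the binding.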
In case you're willing to do this without dplyr:
# starting data frame
base_frame <- data.frame(col_a = 1:10, col_b = 10:19)
# the function you want applied to a given column
add_to <- function(x) { x + 100 }
# run this function on your base data frame, specifying the column you want to apply the function to:
add_computed_col <- function(frame, funct, col_choice) {
frame[paste(floor(runif(1, min=0, max=10000)))] = lapply(frame[col_choice], funct)
return(frame)
}
Usage:
df <- add_computed_col(base_frame, add_to, 'col_a')
head(df)
And add as many columns as needed:
df_b <- add_computed_col(df, add_to, 'col_b')
head(df_b)
Then rename the new columns, since add_computed_col() gives each one a random number as its name.

How to pass an anonymous function to dplyr summarise

I have a simple data frame with 3 columns: name, goal, and actual.
Because this is a simplification of a much larger data frame, I want to use dplyr to compute the number of times a goal has been met by each person.
df <- data.frame(name = c(rep('Fred', 3), rep('Sally', 4)),
goal = c(4,6,5,7,3,8,5), actual=c(4,5,5,3,3,6,4))
The result should look like this:
I should be able to pass an anonymous function similar to what is shown below, but don't have the syntax quite right:
library(dplyr)
g <- group_by(df, name)
summ <- summarise(g, met_goal = sum((function(x,y) {
if(x>y){return(0)}
else{return(1)}
})(goal, actual)
)
)
When I run the code above, I see three of these warnings:
Warning messages:
1: In if (x == y) { :
the condition has length > 1 and only the first element will be used
We have equal length vectors in goal and actual, so the relational operators are appropriate to use here. However, when we use them in a simple if() statement we may get unexpected results because if() expects length 1 vectors. Since we have equal length vectors and we require a binary result, taking the sum of the logical vector is the best approach, as follows.
group_by(df, name) %>%
summarise(met_goal = sum(goal <= actual))
# A tibble: 2 x 2
name met_goal
<fctr> <int>
1 Fred 2
2 Sally 1
The operator is switched to <= because you want 0 for goal > actual and 1 otherwise.
Note that you can use an anonymous function. It was the if() statement that was throwing you off. For example, using
sum((function(x, y) x <= y)(goal, actual))
would work in the manner you are asking about.
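If you want to keep the 0/1 logic of the original anonymous function, the vectorised ifelse() can replace the scalar if/else. A minimal sketch using the data from the question:

```r
library(dplyr)

df <- data.frame(name = c(rep("Fred", 3), rep("Sally", 4)),
                 goal = c(4, 6, 5, 7, 3, 8, 5),
                 actual = c(4, 5, 5, 3, 3, 6, 4))

summ <- df %>%
  group_by(name) %>%
  # ifelse() works element by element, unlike if(), which expects a single condition
  summarise(met_goal = sum((function(x, y) ifelse(x > y, 0, 1))(goal, actual)))

summ
```

Note that in R 4.2.0 and later, passing a condition of length greater than one to if() is an error rather than a warning, so the vectorised form is needed there.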
Solution using data.table:
You asked for a dplyr solution, but since your actual data is much larger, you can use data.table. foo is the function you want to apply.
foo <- function(x, y) {
res <- 0
if (x <= y) {
res <- 1
}
return(res)
}
library(data.table)
setDT(df)
setkey(df, name)[, foo(goal, actual), .(name, 1:nrow(df))][, sum(V1), name]
If you prefer pipes then you can use this:
library(magrittr)
setDT(df) %>%
setkey(name) %>%
.[, foo(goal, actual), .(name, 1:nrow(.))] %>%
.[, .(met_goal = sum(V1)), name]
name met_goal
1: Fred 2
2: Sally 1
Found myself needing to do something similar to this again (a year later) but with a more complex function than the simple one provided in the original question. The originally accepted answer took advantage of a specific feature of the problem, but the more general approach was touched on here. Using this approach, the answer I was ultimately after was something like this:
library(dplyr)
df <- data.frame(name = c(rep('Fred', 3), rep('Sally', 4)),
goal = c(4,6,5,7,3,8,5), actual=c(4,5,5,3,3,6,4))
my_func = function(act, goa) {
if(act < goa) {
return(0)
} else {
return(1)
}
}
g <- group_by(df, name)
summ = df %>% group_by(name) %>%
summarise(met_goal = sum(mapply(my_func, .data$actual, .data$goal)))
> summ
# A tibble: 2 x 2
name met_goal
<fct> <dbl>
1 Fred 2
2 Sally 1
The original question referred to using an anonymous function. In that spirit, the last part would look like this:
g <- group_by(df, name)
summ = df %>% group_by(name) %>%
summarise(met_goal = sum(mapply(function(act, go) {
if(act < go) {
return(0)
} else {
return(1)
}
}, .data$actual, .data$goal)))
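A related option is base R's Vectorize(), which wraps a scalar function so that it can be called on whole vectors (it uses mapply internally). A minimal sketch, assuming the same df and my_func as above:

```r
library(dplyr)

df <- data.frame(name = c(rep("Fred", 3), rep("Sally", 4)),
                 goal = c(4, 6, 5, 7, 3, 8, 5),
                 actual = c(4, 5, 5, 3, 3, 6, 4))

# scalar function: works on one pair of values at a time
my_func <- function(act, goa) {
  if (act < goa) {
    return(0)
  } else {
    return(1)
  }
}

# vectorised wrapper around the scalar function
my_func_vec <- Vectorize(my_func)

summ <- df %>%
  group_by(name) %>%
  summarise(met_goal = sum(my_func_vec(actual, goal)))

summ
```

This keeps the summarise() call short when the same scalar function is reused in several places.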

Calling recursive functions in R

Assuming I have a dataframe, df with this info
group wk source revenue
1 1 C 100
1 1 D 200
1 1 A 300
1 1 B 400
1 2 C 500
1 2 D 600
I'm trying to programmatically filter df down to the rows of each unique combination of group, wk and source, and then perform some operations on them before combining them back into another data frame. I want to write a function that scales to any number of segments (and not just the example scenario here). All I need to pass would be the column names by which I want to segment
eg.
seg <- c("group", "wk", "source")
One unique combination to filter rows in df would be
df %>% filter(group == 1 & wk == 1 & source == "A")
I wrote a recursive function (get_rows) to do so, but it doesn't seem to do what I want. Could anyone provide inputs on where I'm going wrong ?
library(dplyr)
filter_row <- function(df,x)
{
df %>% filter(group == x$group & wk == x$wk & source == x$source)
}
seg <- c("group", "wk", "source")
get_rows <- function(df,seg,pos = 1, l = list())
{
while(pos <= (length(seg) + 1))
{
if(pos <= length(seg))
for(j in 1:length(unique(df[,seg[pos]])))
{
k <- unique(df[,seg[pos]])
l[seg[pos]] <- k[j]
get_rows(df,seg,pos+1,l)
return()
}
if(pos > length(seg))
{
tmp <- df %>% filter_row(l)
<call some function on tmp>
return()
}
}
}
get_rows(df,seg)
EDIT: I understand there are prebuilt methods I can use to get what I need, but I'm curious about where I'm going wrong in the recursive function I wrote.
There might be a data.table/dplyr solution out there, but this one is pretty simple.
# Just paste together the values of the column you want to aggregate over.
# This creates a character vector of group labels, which by() treats as a factor
f <- function(data, v) {apply(data[,v,drop=F], 1, paste, collapse = ".")}
# Aggregate, tapply, ave, and a few more functions can do the same thing
by(data = df, # Your data here
INDICES = f(df, c("group", "wk", "source")), # Your data and columns here
FUN = identity, simplify = F) # Your function here
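You can also use library(dplyr). Here your_function is a placeholder for your own function; inside do(), . refers to the rows of the current group, and the function must return a data frame:
df %>% group_by(group, wk, source) %>% do(your_function(.))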
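As for where the recursive function in the question goes wrong: the return() inside the for loop exits after the first value, pos is never changed inside the while loop, and the results of the recursive calls are discarded. Below is a minimal sketch of a recursive version that fixes these points. The fun argument is a hypothetical stand-in for "call some function on tmp", and the function narrows df at each level instead of building a filter list.

```r
df <- data.frame(
  group   = c(1, 1, 1, 1, 1, 1),
  wk      = c(1, 1, 1, 1, 2, 2),
  source  = c("C", "D", "A", "B", "C", "D"),
  revenue = c(100, 200, 300, 400, 500, 600)
)
seg <- c("group", "wk", "source")

get_rows <- function(df, seg, fun, pos = 1) {
  # base case: every segment column has been fixed, so apply fun to the remaining rows
  if (pos > length(seg)) {
    return(list(fun(df)))
  }
  out <- list()
  for (value in unique(df[[seg[pos]]])) {
    # keep only the rows with this value, then recurse on the next segment column
    subset_df <- df[df[[seg[pos]]] == value, , drop = FALSE]
    out <- c(out, get_rows(subset_df, seg, fun, pos + 1))
  }
  out
}

# example: total revenue for each unique combination of group, wk and source
results <- get_rows(df, seg, function(tmp) sum(tmp$revenue))
unlist(results)
```

The results come back as a list with one element per combination, which can then be combined, for example with do.call(rbind, results) when fun returns data frames.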
