My question is how to run multiple actions inside an if statement. For example:
vartest <- var.test(var1 ~ group, alternative='two.sided', conf.level=.95, data=data1)
ttest <- t.test(var1~group, alternative='two.sided', conf.level=.95, var.equal=FALSE, data=data1)
if (vartest$p.value>0.05) {
if (ttest$p.value<=0.05) {
cat(ttest$p.value)
ggboxplot(data1, x="group", y="var1", color="group", palette=c("#00AFBB", "#E7B800"), ylab="var1", xlab="group")
group_by(data1, group) %>% summarise(count = n(), mean = mean(var1, na.rm = TRUE), sd = sd(var1, na.rm = TRUE))
} else{
cat("text1.")
}
} else{
cat("text2.")
}
This doesn't give me the result I want: only the first and the last of the expressions are printed to the console.
If I swap the order of the last two actions:
vartest <- var.test(var1 ~ group, alternative='two.sided', conf.level=.95, data=data1)
ttest <- t.test(var1~group, alternative='two.sided', conf.level=.95, var.equal=FALSE, data=data1)
if (vartest$p.value>0.05) {
if (ttest$p.value<=0.05) {
cat(ttest$p.value)
group_by(data1, group) %>% summarise(count = n(), mean = mean(var1, na.rm = TRUE), sd = sd(var1, na.rm = TRUE))
ggboxplot(data1, x="group", y="var1", color="group", palette=c("#00AFBB", "#E7B800"), ylab="var1", xlab="group")
} else{
cat("text1.")
}
} else{
cat("text2.")
}
Only the first and the last part of the expressions are printed to the console.
Any ideas how I can run all actions between if and else?
Please help!
Implicit printing is disabled inside braces, and the value of the last expression is returned.
{
"hello"
cat("world\n")
"how are you"
"today?"
}
## world
## [1] "today?"
If you want to print or otherwise display something from inside braces you can use explicit print, or message or cat.
{
print("hello")
cat("world\n")
message("how are you")
"today?"
}
## [1] "hello"
## world
## how are you
## [1] "today?"
I don't have the data you're working with, but I recreated your problem as follows:
Example:
library(ggplot2)
library(dplyr)
p.value <- 0.02
if (p.value<=0.05) {
cat(p.value)
mtcars %>% group_by(gear) %>% summarise(count = n(), mean = mean(hp))
qplot(x = wt, y = mpg, data = mtcars, geom = "point")
} else{
cat("text1.")
}
The example above will only print p.value and the plot. Switching the order of the plot and the summary table will return only the p.value and summary table.
To solve this, we can assign back and explicitly print.
library(ggplot2)
library(dplyr)
p.value <- 0.02
if (p.value<=0.05) {
cat(p.value)
my_sum <- mtcars %>% group_by(gear) %>% summarise(count = n(), mean = mean(hp))
print(my_sum)
my_plot <- qplot(x = wt, y = mpg, data = mtcars, geom = "point")
print(my_plot)
} else{
cat("text1.")
}
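The same fix can be sketched against the nested structure from the question. This is only an illustration under assumptions: mtcars stands in for data1 (mpg for var1, am for group), and a plain ggplot2 boxplot replaces ggboxplot() so that ggpubr is not needed.

```r
library(dplyr)
library(ggplot2)

# mtcars ships with R; mpg and am are stand-ins for var1 and group.
cars <- mtcars %>% mutate(am = factor(am, labels = c("auto", "manual")))

vartest <- var.test(mpg ~ am, data = cars)
ttest <- t.test(mpg ~ am, data = cars, var.equal = FALSE)

if (vartest$p.value > 0.05) {
  if (ttest$p.value <= 0.05) {
    cat(ttest$p.value, "\n")
    # Keep each result in a variable, then print it explicitly.
    group_summary <- cars %>%
      group_by(am) %>%
      summarise(count = n(), mean = mean(mpg, na.rm = TRUE), sd = sd(mpg, na.rm = TRUE))
    print(group_summary)
    box_plot <- ggplot(cars, aes(x = am, y = mpg, colour = am)) +
      geom_boxplot()
    print(box_plot)
  } else {
    cat("text1.\n")
  }
} else {
  cat("text2.\n")
}
```

Whichever branch runs, every result that should appear is passed to print() or cat(), so the order of the statements no longer decides what is shown.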
Related
I am writing some dplyr across() statements and want to create p-values using the functions t.test and varTest. The x= columns for the calculations are in df_vars, and the mu= and sigma.squared= parameter values are in df_mu_sigma.
A hard-coded version of the data I need is in df_sumry. If the variable names were always the same when the code is run, something like this would suffice. That's not the case, however.
The beginnings of a non-hard-coded version of what I need are in df_sumry2. That doesn't yield a correct result yet though, because values of mu= and sigma.squared= are not dynamically specified. Only the first two p-values are correct in df_sumry2. They are always wrong after that because the code always uses values for the mpg variable.
How can I consistently get the right values inserted for mu and sigma.squared?
library(dplyr)
library(magrittr)
library(EnvStats)
df_vars <- mtcars %>%
select(mpg, cyl, disp, hp)
set.seed(9302)
df_mu_sigma <- mtcars %>%
select(mpg, cyl, disp, hp) %>%
slice_sample(n = 12) %>%
summarize(
across(
everything(),
list(mean = mean,
std = sd
))
)
df_sumry <- df_vars %>%
summarize(
mpg_mean = mean(mpg),
mpg_mean_prob = t.test(mpg, mu = df_mu_sigma$mpg_mean)$p.value,
mpg_std = sd(mpg),
mpg_std_prob = varTest(mpg, sigma.squared = df_mu_sigma$mpg_std^2)$p.value,
cyl_mean = mean(cyl),
cyl_mean_prob = t.test(cyl, mu = df_mu_sigma$cyl_mean)$p.value,
cyl_std = sd(cyl),
cyl_std_prob = varTest(cyl, sigma.squared = df_mu_sigma$cyl_std^2)$p.value,
disp_mean = mean(disp),
disp_mean_prob = t.test(disp, mu = df_mu_sigma$disp_mean)$p.value,
disp_std = sd(disp),
disp_std_prob = varTest(disp, sigma.squared = df_mu_sigma$disp_std^2)$p.value,
hp_mean = mean(hp),
hp_mean_prob = t.test(hp, mu = df_mu_sigma$hp_mean)$p.value,
hp_std = sd(hp),
hp_std_prob = varTest(hp, sigma.squared = df_mu_sigma$hp_std^2)$p.value
)
vars_num <- names(df_vars)
df_sumry2 <- df_vars %>%
summarize(
across(
all_of(vars_num),
list(mean = mean,
mean_prob = function(x) t.test(x, mu = df_mu_sigma$mpg_mean)$p.value,
std = sd,
std_prob = function(x) varTest(x, sigma.squared = df_mu_sigma$mpg_std^2)$p.value)
)
)
I appear to have come up with a solution to my own problem. I'd be happy to see alternative solutions though as they may be better than mine.
library(dplyr)
library(magrittr)
library(EnvStats)
df_vars <- mtcars %>%
select(mpg, cyl, disp, hp)
df_mu_sigma <- mtcars %>%
select(mpg, cyl, disp, hp) %>%
slice_sample(n = 12) %>%
summarize(
across(
everything(),
list(mean = mean,
std = sd
))
)
df_sumry <- df_vars %>%
summarize(
mpg_mean = mean(mpg),
mpg_mean_prob = t.test(mpg, mu = df_mu_sigma$mpg_mean)$p.value,
mpg_std = sd(mpg),
mpg_std_prob = varTest(mpg, sigma.squared = df_mu_sigma$mpg_std^2)$p.value,
cyl_mean = mean(cyl),
cyl_mean_prob = t.test(cyl, mu = df_mu_sigma$cyl_mean)$p.value,
cyl_std = sd(cyl),
cyl_std_prob = varTest(cyl, sigma.squared = df_mu_sigma$cyl_std^2)$p.value,
disp_mean = mean(disp),
disp_mean_prob = t.test(disp, mu = df_mu_sigma$disp_mean)$p.value,
disp_std = sd(disp),
disp_std_prob = varTest(disp, sigma.squared = df_mu_sigma$disp_std^2)$p.value,
hp_mean = mean(hp),
hp_mean_prob = t.test(hp, mu = df_mu_sigma$hp_mean)$p.value,
hp_std = sd(hp),
hp_std_prob = varTest(hp, sigma.squared = df_mu_sigma$hp_std^2)$p.value
)
vars_num <- names(df_vars)
library(glue)
df_sumry2 <- df_vars %>%
summarize(
across(
all_of(vars_num),
list(mean = mean,
mean_prob = function(x) {
mu_name <- glue("{ensym(x)}_mean")
t.test(x, mu = df_mu_sigma[[mu_name]])$p.value
},
std = sd,
std_prob = function(x) {
sigma_name <- glue("{ensym(x)}_std")
varTest(x, sigma.squared = df_mu_sigma[[sigma_name]]^2)$p.value
}
)
)
)
all.equal(df_sumry, df_sumry2)
This is not much better than your solution, but I would use cur_column() instead of ensym() to avoid handling quosures.
Also, putting the lookup in a separate function makes things a bit tidier.
Finally, I would use purrr-style lambda formulas (~ with .x) instead of anonymous functions for clarity.
get_mu = function(suffix){
df_mu_sigma[[paste0(cur_column(), suffix)]] #you could use glue() as well here
}
df_vars %>%
summarize(
across(
all_of(vars_num),
list(
mean = mean,
mean_prob = ~t.test(.x, mu = get_mu("_mean"))$p.value,
std = sd,
std_prob = ~varTest(.x, sigma.squared = get_mu("_std")^2)$p.value
)
)
) %>% t() #just to format the output
# [,1]
# mpg_mean 20.09062500
# mpg_mean_prob 0.01808550
# mpg_std 6.02694805
# mpg_std_prob 0.96094601
# cyl_mean 6.18750000
# cyl_mean_prob 0.10909740
# cyl_std 1.78592165
# cyl_std_prob 0.77092484
# disp_mean 230.72187500
# disp_mean_prob 0.17613878
# disp_std 123.93869383
# disp_std_prob 0.96381507
# hp_mean 146.68750000
# hp_mean_prob 0.03914858
# hp_std 68.56286849
# hp_std_prob 0.03459963
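For comparison, the underlying idea (look up the reference value by the same name as the data column) can also be sketched in base R without across(). This is a simplified illustration, not a drop-in replacement: ref_rows and ref_mean are hypothetical names, the 12-row sample is drawn with sample() rather than slice_sample() so the numbers will differ from those above, and only the t.test half is covered.

```r
vars <- c("mpg", "cyl", "disp", "hp")

# Hypothetical reference means, one per variable, named like the data columns.
set.seed(9302)
ref_rows <- mtcars[sample(nrow(mtcars), 12), vars]
ref_mean <- colMeans(ref_rows)

# vapply() walks the names, so data column and reference value share one key.
mean_prob <- vapply(
  vars,
  function(v) t.test(mtcars[[v]], mu = ref_mean[[v]])$p.value,
  numeric(1)
)
mean_prob
```

Iterating over names instead of over columns makes the current variable name available directly, which is what cur_column() provides inside across().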
I have a rate formula, which changes with each model. So, I want the aggregate function to check the number of elements in the rate formula and then run the aggregate function accordingly.
mtcars$rate_1 = mtcars$mpg - mtcars$cyl - mtcars$disp
rate_1 = "mpg - cyl - disp"
open.rate = unlist(strsplit(rate_1,"-",fixed = TRUE))
or
mtcars$rate_1 = mtcars$mpg - mtcars$cyl - mtcars$disp - mtcars$hp
rate_1 = "mpg - cyl - disp - hp"
open.rate = unlist(strsplit(rate_1,"-",fixed = TRUE))
if(length(open.rate) == 3){
data_plot = mtcars %>% group_by(carb) %>% summarise(
####### Rates #######
mpg = weighted.mean(eval(parse(text=mpg)),wt, na.rm = TRUE),
cyl = weighted.mean(eval(parse(text=cyl)),wt, na.rm = TRUE),
disp = weighted.mean(eval(parse(text=disp)),wt, na.rm = TRUE)
)
} else {
data_plot = mtcars %>% group_by(carb) %>% summarise(
####### Rates #######
mpg = weighted.mean(eval(parse(text=mpg)),wt, na.rm = TRUE),
cyl = weighted.mean(eval(parse(text=cyl)),wt, na.rm = TRUE),
disp = weighted.mean(eval(parse(text=disp)),wt, na.rm = TRUE),
hp = weighted.mean(eval(parse(text=hp)),wt, na.rm = TRUE)
)
}
The rate equation can have any number of elements. Depending on the number of elements, the aggregation needs to have the same number of elements. My aim is to write one function which takes care of that, instead of writing multiple if-else conditions.
Is there a way I can do it, without having to write the code twice for two different rates?
My solution is:
aggdata <- mtcars[, "carb"]
for(i in compratelist){
print(i)
data = mtcars %>% group_by(carb) %>% summarise(
eval(parse(text = paste0(i,"_mean = weighted.mean(eval(parse(text = ", i,")),wt, na.rm = TRUE)")))
)
colnames(data)[2] <- i
data <- as.data.frame(data)
aggdata <- left_join(aggdata, data, by = "carb")
}
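An alternative that avoids both the if-else branches and eval(parse()) is to split the rate string into column names and hand them to across(). The sketch below assumes every element of the rate formula is the name of a column in mtcars; rate_cols is a name introduced here for illustration.

```r
library(dplyr)

rate_1 <- "mpg - cyl - disp - hp"

# Split the formula on "-" and trim the surrounding spaces to get column names.
rate_cols <- trimws(strsplit(rate_1, "-", fixed = TRUE)[[1]])

# One weighted mean per rate column and per carb group, weighted by wt.
data_plot <- mtcars %>%
  group_by(carb) %>%
  summarise(across(all_of(rate_cols), ~ weighted.mean(.x, wt, na.rm = TRUE)))

data_plot
```

Because the column list is computed from the string, the same code handles a rate formula with three, four, or any other number of elements.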
The problem is to apply a function f to each group of a tibble. There is a simpler way to do this, but I would like to solve the problem using the group_map() function.
Data used: starwars of the dplyr package.
What I want is to get an average of the height variable for a grouped tibble considering the variables gender and species. I know the problem could be easily solved by doing:
starwars %>% group_by(gender, species) %>%
summarise(mean = mean(height, na.rm = TRUE))
However, my desire is to implement summarise(mean = mean(height, na.rm = TRUE)) in a function and send to group_map().
I tried to create the f() function that gets the data argument which is a tibble object with the previously defined groups. The second argument of the f() function would be ... so that I could pass the variables of interest from data to f().
f <- function(dados, ...){
dados %>% summarise(mean = mean(..., na.rm = TRUE))
}
starwars %>% group_by(gender, species) %>%
group_map(.tbl = ., .f = ~f(dados = .x), height)
Solutions:
func_1 <- function(dados, var, ...){
var_interesse <- enquo(var)
dots <- enquos(...)
# The dots could also be passed to group_by() directly as ...
dados %>% group_by(!!!dots) %>%
summarise(media = mean(x = !!var_interesse, na.rm = TRUE))
}
starwars %>% func_1(var = height, gender, species)
or
func_2 <- function(dados, var){
var_interesse <- enquo(var)
#dots <- enquos(...)
dados %>% summarise(media = mean(x = !!var_interesse, na.rm = TRUE))
}
agrupamento <- starwars %>% group_by(gender, species)
agrupamento %>%
group_map(.tbl = ., .f = ~func_2(dados = .x, var = height))
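A shorter variant of the second solution is sketched below, assuming a dplyr version that supports the embrace operator {{ }} (rlang 0.4.0 or later). It uses group_modify() instead of group_map() so that the grouping keys stay attached to the result; mean_of is a hypothetical name.

```r
library(dplyr)

# {{ var }} forwards the bare column name into summarise().
mean_of <- function(dados, var) {
  dados %>% summarise(media = mean({{ var }}, na.rm = TRUE))
}

# group_modify() calls the function once per group and binds the results,
# prepending the grouping columns.
res <- starwars %>%
  group_by(gender, species) %>%
  group_modify(~ mean_of(.x, height))

res
```

group_map() would return the same per-group tibbles as a plain list without the gender and species columns, which is why group_modify() is used here.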
I am trying to use purrr::pmap() to apply a custom function in a rowwise fashion along some dataframe rows. I can achieve my desired end result with a for-loop and with apply(), but when I try to use pmap() I can only get the result I want in combination with mutate(), which in my real-life applied case will be insufficient.
Is there a way to use pmap() to apply my custom function and just have the output print rather than be stored in a new column?
library(dplyr)
library(purrr)
library(tibble)
Create demo data & custom function
set.seed(57)
ds_mt <-
mtcars %>%
rownames_to_column("model") %>%
mutate(
am = factor(am, labels = c("auto", "manual")),
vs = factor(vs, labels = c("V", "S"))
) %>%
select(model, mpg, wt, cyl, am, vs) %>%
sample_n(3)
foo <- function(model, am, mpg){
print(
paste("The", model, "has a", am, "transmission and gets", mpg, "mpgs.")
)
}
Successful example of rowwise for-loop:
for (row in 1:nrow(ds_mt)) {
foo(
model = ds_mt[row, "model"],
am = ds_mt[row, "am"],
mpg = ds_mt[row, "mpg"]
)
}
Successful example using apply():
row.names(ds_mt) <- NULL # to avoid named vector as output
apply(
ds_mt,
MARGIN = 1,
FUN = function(ds)
foo(
model = ds["model"],
am = ds["am"],
mpg = ds["mpg"]
)
)
Example using pmap() within mutate() that is almost what I need.
ds_mt %>%
mutate(new_var =
pmap(
.l =
list(
model = model,
am = am,
mpg = mpg
),
.f = foo
))
FAILING CODE: Why doesn't this work?
ds_mt %>%
pmap(
.l =
list(
model = model,
am = am,
mpg = mpg
),
.f = foo
)
So after some more reading it seems this is a case for pwalk() rather than pmap(), because I am trying to get output to print (i.e., a side effect) rather than to be stored in a dataframe.
library(dplyr)
library(purrr)
library(tibble)
set.seed(57)
ds_mt <-
mtcars %>%
rownames_to_column("model") %>%
mutate(
am = factor(am, labels = c("auto", "manual")),
vs = factor(vs, labels = c("V", "S"))
) %>%
select(model, mpg, wt, cyl, am, vs) %>%
sample_n(3)
foo <- function(model, am, mpg){
print(
paste("The", model, "has a", am, "transmission and gets", mpg, "mpgs.")
)
}
ds_mt %>%
select(model, am, mpg) %>%
pwalk(
.l = .,
.f = foo
)
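If the text is needed as a value as well as on screen, another option is to let the function return the string and do the printing afterwards. The sketch below is an assumption-laden variant rather than the approach above: describe_car is a hypothetical function that returns its sentence instead of printing it, am is stored as a character column, and the first three rows of mtcars are used instead of a random sample.

```r
library(purrr)

# Small demo table; column names match the function's argument names.
demo_cars <- data.frame(
  model = rownames(mtcars)[1:3],
  am = ifelse(mtcars$am[1:3] == 1, "manual", "auto"),
  mpg = mtcars$mpg[1:3]
)

# Returns the sentence instead of printing it.
describe_car <- function(model, am, mpg) {
  paste("The", model, "has a", am, "transmission and gets", mpg, "mpgs.")
}

# pmap_chr() calls describe_car() once per row and collects a character vector.
sentences <- pmap_chr(demo_cars, describe_car)
writeLines(sentences)
```

Passing the data frame itself as the list works because a data frame is a list of columns; the original failing call instead referred to model, am and mpg outside of any data context, where those names do not exist.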
I would like to programmatically set a column name for the dplyr::top_n function.
getSubset <- function(df, t, f) {
df %>%
top_n(t, wt = eval(as.name(f), envir = df))
}
data.frame(x = 1:20, y = 20:1) %>%
getSubset(10, "x")
And it tells me that Error: object 'f' not found. I tried to play with lazyeval package but somehow I keep misunderstanding the concept. Could somebody push me in the right direction? Thanks!
Your problem is that top_n uses non-standard evaluation. It's trying to evaluate the expression eval(as.name(f), envir = df) in the context of the data frame df, and f doesn't exist in that environment.
One workaround would be to temporarily add the desired value of wt to the data frame.
getSubset <- function(df, t, f) {
df %>%
mutate(.wt = eval(as.name(f), envir = df)) %>%
top_n(t, wt = .wt) %>%
select(-.wt)
}
data.frame(x = 1:20, y = 20:1) %>%
getSubset(10, "x")
Another approach would be to use interp from the lazyeval package.
library(lazyeval) # provides interp()
getSubset <- function(df, t, f) {
call <- quote(df %>%
top_n(t, wt = .wt))
call <- interp(call, .wt = eval(as.name(f), envir = df))
eval(call)
}
data.frame(x = 1:20, y = 20:1) %>%
getSubset(10, "x")
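With more recent dplyr versions the same task can be sketched without lazyeval. This assumes dplyr 1.0.0 or later, where slice_max() is available and the .data pronoun can look up a column from a string; get_subset, n_rows and col are names chosen here for illustration.

```r
library(dplyr)

# col is a column name given as a string; .data[[col]] looks it up in df.
get_subset <- function(df, n_rows, col) {
  df %>% slice_max(order_by = .data[[col]], n = n_rows)
}

res <- data.frame(x = 1:20, y = 20:1) %>%
  get_subset(10, "x")

res
```

The .data pronoun resolves the string inside the data frame, so no eval() or as.name() is required.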