How do I plot the histograms for these four random variables? This works, but it seems unnecessarily long.
#libraries
library(tidyverse)
library(purrr)
# Standard deviation question
std_devs <- 1:4  # not defined in the original snippet; values taken from the answer below
std_devs %>%
map(rnorm, n=1000, mean=75) %>%
do.call('rbind', .) %>%
t() %>%
as.data.frame() %>%
gather() %>%
ggplot(., aes(x=value))+geom_histogram()+facet_wrap(~key)
purrr is loaded by tidyverse so you can skip that line. map_df makes the rest much more condensed.
library(tidyverse)
# Standard deviation question
set.seed(10)
std_devs <- 1:4
std_devs %>%
map_df(~tibble(key = ., value = rnorm(n=1000, mean=75, sd = .))) %>%  # tibble() replaces the deprecated data_frame()
ggplot(aes(x=value))+geom_histogram()+facet_wrap(~key)
I want to use the following example to do t-tests with multiple variables. The code is taken from https://www.datanovia.com/en/blog/how-to-perform-multiple-t-test-in-r-for-different-variables/:
options(scipen = 99)
# Load required R packages
library(tidyverse)
library(rstatix)
library(ggpubr)
# Prepare the data and inspect a random sample of the data
mydata <- iris %>%
filter(Species != "setosa") %>%
as_tibble()
mydata %>% sample_n(6)
# Transform the data into long format
# Put all variables in the same column except `Species`, the grouping variable
mydata.long <- mydata %>%
pivot_longer(-Species, names_to = "variables", values_to = "value")
mydata.long %>% sample_n(6)
stat.test <- mydata.long %>%
group_by(variables) %>%
t_test(value ~ Species) %>%
adjust_pvalue(method = "BH") %>%
add_significance()
stat.test
This tutorial uses the t_test function of the rstatix package. It works great, but is there a way to disable the scientific notation of the p-values? I want to output p-values like 0.000445 instead of 4.45e-4.
Unfortunately, setting
options(scipen = 99)
did not change anything.
Thank you!
EDIT: The solution can be found in the comments - it is necessary to call stat.test this way:
as.data.frame(stat.test)
Thanks to rawr for the comment!
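The as.data.frame() call works because a plain data frame is printed by base R, which honours scipen, whereas the tibble print method apparently does not. If the result should stay a tibble, another option is to convert the p-values to fixed-notation strings with format(). A minimal sketch, assuming made-up p-values (p_values is a hypothetical stand-in for the p column of stat.test):

```r
# Hypothetical p-values standing in for the `p` column of stat.test
p_values <- c(4.45e-4, 1.2e-7, 0.031)

# format() with scientific = FALSE returns character strings in fixed notation
p_fixed <- format(p_values, scientific = FALSE)
print(p_fixed)
```

Note that the result is a character vector, so this is best done as a last step before printing or exporting.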
I want to create a plot in R with ggplot() to visualise the data included in variable matrix that looks like this:
matrix <- matrix(c(time =c(1,2,3,4,5),v1=rnorm(5),v2=c(NA,1,0.5,0,0.1)),nrow=5)
colnames(matrix) <- c("time","v1","v2")
df <-data.frame(
time=rep(matrix[,1],2),
values=c(matrix[,2],matrix[,3]),
names=rep(c("v1","v2"), each=length(matrix[,1]))
)
ggplot(df, aes(x=time,y=values,color=names)) +
geom_point()+
facet_grid(names~.)
Is there a faster way than transforming the data into a data.frame as I do? This approach seems very laborious.
I would appreciate any help! Thanks in advance.
A tidyverse approach:
This will produce the data structure you need for ggplot:
library(tidyverse)
matrix %>%
as_tibble() %>%  # as_tibble() replaces the deprecated as_data_frame()
gather(., names, value, -time)
This will generate the data structure and the plot all at once:
matrix %>%
as_tibble() %>%  # as_tibble() replaces the deprecated as_data_frame()
gather(., names, value, -time) %>%
ggplot(., aes(x=time,y=value,color=names)) +
geom_point()+
facet_grid(names~.)
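In current tidyr, gather() is superseded by pivot_longer(), so the same reshaping can probably be written as below. This is a sketch rather than the answerer's code: the matrix is rebuilt here under the hypothetical name m so that the block is self-contained, and it uses the base pipe, which assumes R 4.1 or later.

```r
library(tibble)
library(tidyr)

# Rebuild the example matrix (named `m` here to avoid masking base::matrix)
m <- matrix(c(1:5, rnorm(5), c(NA, 1, 0.5, 0, 0.1)), nrow = 5)
colnames(m) <- c("time", "v1", "v2")

# One row per (time, variable) pair, ready for ggplot
long <- m |>
  as_tibble() |>
  pivot_longer(-time, names_to = "names", values_to = "value")

print(long)
```

The long tibble can then be passed to the same ggplot() call as above.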
I am looking to use the interp and interp2xyz functions from akima in a dplyr pipe, as I would like to calculate the interpolation by a group variable and output the xyz values AND a grouping variable. Ideally I'd like something like this (a general solution):
DATFRAME %>%
group_by(GROUPING VARIABLE) %>%
summarise(interp(x, y , z))
It is relatively simple to calculate interpolation values on the whole dataframe and then create another dataframe holding those values, using the interpolation functions from the akima package:
library(dplyr)
library(akima)
df <- data.frame(
x=runif(200, 0, 5),
y=runif(200, 0, 5),
z=runif(200, 1, 2),
Group=LETTERS[seq( from = 1, to = 2 )])
interp_df <- interp(x=df$x, y=df$y, z=df$z)
interp2xyz(interp_df, data.frame=TRUE)
But when I try to incorporate those into a dplyr pipe setup like so:
df %>%
group_by(Group) %>%
summarise(interp(x=x, y=y, z=z))
Error: expecting a single value
Or then maybe using mutate:
df %>%
group_by(Group) %>%
mutate(interp(x=x, y=y, z=z))
Error: incompatible size (3), expecting 100 (the group size) or 1
I am not married to the dplyr solution - that is just the approach I can think of. Does anyone know of a way to calculate a 3D interpolation by a grouping variable such that a dataframe with all groups and their interpolations is the result?
There is no try, only do():
dpinterp <- function(df) {
interp_df <- interp(x=df$x, y=df$y, z=df$z)
interp2xyz(interp_df, data.frame=TRUE)
}
df %>%
group_by(Group) %>%
do(dpinterp(.))
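do() has since been superseded in dplyr, and group_modify() covers the same "one data frame in, one data frame out per group" pattern. A sketch under that assumption, reusing the dpinterp() helper and the example data from the question (it requires akima to be installed; the result name interp_by_group is introduced here for illustration):

```r
library(dplyr)
library(akima)

# Example data from the question: 200 points split across groups A and B
df <- data.frame(
  x = runif(200, 0, 5),
  y = runif(200, 0, 5),
  z = runif(200, 1, 2),
  Group = LETTERS[seq(from = 1, to = 2)])

# Interpolate one group's points and return the grid as a data frame
dpinterp <- function(df) {
  interp_df <- interp(x = df$x, y = df$y, z = df$z)
  interp2xyz(interp_df, data.frame = TRUE)
}

# group_modify() calls the function once per group and re-attaches Group
interp_by_group <- df %>%
  group_by(Group) %>%
  group_modify(~ dpinterp(.x)) %>%
  ungroup()
```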
One really cool feature from the ggplot2 package that I never really exploited enough was adding lists of layers to a plot. The fun thing about this was that I could pass a list of layers as an argument to a function and have them added to the plot. I could then get the desired appearance of the plot without necessarily returning the plot from the function (whether or not this is a good idea is another matter, but it was possible).
library(ggplot2)
x <- ggplot(mtcars,
aes(x = qsec,
y = mpg))
layers <- list(geom_point(),
geom_line(),
xlab("Quarter Mile Time"),
ylab("Fuel Efficiency"))
x + layers
Is there a way to do this with pipes? Something akin to:
#* Obviously isn't going to work
library(dplyr)
action <- list(group_by(am, gear),
summarise(mean = mean(mpg),
sd = sd(mpg)))
mtcars %>% action
To construct a sequence of magrittr steps, start the pipeline with a dot (.) instead of a data object:
action = . %>% group_by(am, gear) %>% summarise(mean = mean(mpg), sd = sd(mpg))
Then it can be used as imagined in the OP:
mtcars %>% action
Like a list, we can subset to see each step:
action[[1]]
# function (.)
# group_by(., am, gear)
To review all steps, use functions(action) or just type the name:
action
# Functional sequence with the following components:
#
# 1. group_by(., am, gear)
# 2. summarise(., mean = mean(mpg), sd = sd(mpg))
#
# Use 'functions' to extract the individual functions.
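Because a functional sequence is itself a function, it can also be called directly or passed as an argument to another function, which is close to the "list of layers" idea from the question. A small sketch (the names res_piped and res_called are introduced here for illustration):

```r
library(magrittr)
library(dplyr)

# Starting the pipeline with `.` builds a function instead of running it
action <- . %>%
  group_by(am, gear) %>%
  summarise(mean = mean(mpg), sd = sd(mpg))

res_piped  <- mtcars %>% action   # used as a pipe step
res_called <- action(mtcars)      # called like an ordinary function
```

Both calls should give the same grouped summary.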
I'm using do() to fit a model to grouped data, and then I want to plot the fit for each group. In plyr, I guess I would use d_ply(). In dplyr, I'm trying either do() or summarise() using a function that makes the plot as a side effect.
I'm getting different results depending on whether I use do() or summarise(), and I'm not sure why. Specifically it seems like summarise() isn't operating on each row correctly.
Here's my example:
require(nycflights13)
require(mgcv)
# fit a gam to the flights grouped by dest (from ?do)
by_dest <- flights %>% group_by(dest) %>% filter(n() > 100)
models = by_dest %>% do(smooth = gam(arr_delay ~ s(dep_time) + month, data = .))
# print the first 4 rows, the dest is ABQ, ACK, ALB, ATL
models %>% slice(1:4)
# make a function to plot the models, titled by dest
plot.w.title = function(title, gam.model){
plot.gam(gam.model, main=title)
return(1)
}
# This code makes plots with the wrong titles, for example ATL is listed twice:
models %>%
slice(1:4) %>%
rowwise %>%
summarise(useless.column = plot.w.title(dest, smooth)) # for plot side effect
# this code gives me the correct titles...why the difference?
models %>%
slice(1:4) %>%
rowwise %>%
do(useless.column = plot.w.title(.$dest, .$smooth))
The summarise() method will work if you modify the function by applying unique() to the title:
plot.w.title = function(title, gam.model){
plot.gam(gam.model, main=unique(title))
return(1)
}