I want to add a summary table to a plot made with ggplot. I am using annotation_custom to add a previously created table.
My problem is that the values in the table are displayed with differing numbers of decimal places.
As an example I am using the mtcars dataset, and my code is the following:
rm(list=ls()) #Clear environment console
data(mtcars)
head(mtcars)
library(dplyr)
library(tidyr)
library(ggplot2)
library(gridExtra)
table <- mtcars %>% # summary table to be overlaid on the plot
select(wt) %>%
summarise(
Obs = n(),
Q05 = quantile(wt, probs = 0.05),
Mean = mean(wt),
Med = median(wt),
Q95 = quantile(wt, probs = 0.95),
SD = sd(wt))
dens <- ggplot(mtcars) + # Create example density plot for the wt variable
geom_density(aes(x = wt)) +
labs(title = "Density plot")
plot(dens)
dens1 <- dens + # Overlay the summary table on the density plot
annotation_custom(tableGrob(t(table),
cols = c("WT"),
rows=c("Obs", "Q-05", "Mean", "Median", "Q-95", "S.D." ),
theme = ttheme_default(base_size = 11)),
xmin=4.5, xmax=5, ymin=0.2, ymax=0.5)
print(dens1)
Running the code above produces the following plot:
[image: density plot]
I would like to fix the number of displayed decimals to only 2.
I already tried adding sprintf
annotation_custom(tableGrob(t(sprintf("%0.2f",table)),
But obtained the following error "Error in sprintf("%0.2f", table_pet) :
(list) object cannot be coerced to type 'double'"
I have been searching without any luck. Any idea how I can do this?
Thank you in advance
grid.table leaves the formatting up to you, so convert the numeric columns to formatted strings before drawing the table:
d = data.frame(x = "pi", y = pi)
d2 = d %>% mutate_if(is.numeric, ~sprintf("%.3f",.))
grid.table(d2)
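Applied to the mtcars summary from the question, the same idea might look like the sketch below. It assumes the table is built as a one-column character matrix (the names `stats`, `stats_txt` and `tab` are mine, not from the question), with the count printed as an integer and everything else with two decimals. The original `sprintf("%0.2f", table)` fails because `table` is a data frame (a list), so the formatting has to be applied to the numeric values themselves.

```r
library(ggplot2)
library(gridExtra)

wt <- mtcars$wt

# Numeric summary; unname() drops the "5%"/"95%" labels that quantile() attaches
stats <- c(
  Obs    = length(wt),
  "Q-05" = unname(quantile(wt, probs = 0.05)),
  Mean   = mean(wt),
  Median = median(wt),
  "Q-95" = unname(quantile(wt, probs = 0.95)),
  "S.D." = sd(wt)
)

# Convert to text before building the grob:
# the count as an integer, all other statistics with 2 decimals
stats_txt <- ifelse(names(stats) == "Obs",
                    sprintf("%d", as.integer(stats)),
                    sprintf("%.2f", stats))

# One-column character matrix; row and column names become the table labels
tab <- matrix(stats_txt, ncol = 1, dimnames = list(names(stats), "WT"))

dens1 <- ggplot(mtcars, aes(x = wt)) +
  geom_density() +
  labs(title = "Density plot") +
  annotation_custom(tableGrob(tab, theme = ttheme_default(base_size = 11)),
                    xmin = 4.5, xmax = 5, ymin = 0.2, ymax = 0.5)
# print(dens1) draws the plot with the formatted table
```

Because the matrix already holds strings, tableGrob prints them as they are and no further rounding takes place.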
I am using the survival package to make Kaplan-Meier estimates of survival curves by group, and then I plot the curves using the packages ggfortify and survminer. Everything works fine except the legend labels. I want to show the group sizes (N) in the legend labels. I thought that adding N to the grouping variable itself using paste0 was a good way to go. In my case this is easier than using something like scale_fill_discrete("", labels = legend_labeller_for_plot).
library(dplyr)
library(ggplot2)
library(survival)
library(survminer)
library(ggfortify)
set.seed(100)
data <- data.frame(
time = rlnorm(20),
event = as.integer(runif(20) < 0.5),
group = ifelse(runif(20) > 0.5,
"group A",
"group B")
)
# Plotting survival curves without N sizes in the legend
fit <- survfit(
with(data, Surv(time, event)) ~ group,
data)
autoplot(fit)
# Adding N sizes to the data and plotting
data_new <- data %>%
group_by(group) %>% mutate(N = n()) %>%
ungroup() %>%
mutate(group_with_N = paste0(group, ", N = ", N))
fit_new <- survfit(
with(data, Surv(time, event)) ~ group_with_N,
data_new)
autoplot(fit_new)
When I try to add N sizes to the groups variable, the part with "N =" in the grouping variable disappears, i.e. the group variable isn't displayed on the legend labels as expected.
For comparison, what I expect is something like the following using Iris data:
What is more, I found that the culprit is the equals sign (=). When I remove the = sign, the legend labels correspond to the grouping variable values.
My question is, why does the equal sign cause this?
An option could be using ggsurvplot where you can specify the legend.labs so you can show your size in the legend like this:
library(dplyr)
library(ggplot2)
library(survival)
library(survminer)
library(ggfortify)
set.seed(100)
data <- data.frame(
time = rlnorm(20),
event = as.integer(runif(20) < 0.5),
group = ifelse(runif(20) > 0.5,
"group A",
"group B")
)
# Adding N sizes to the data and plotting
data_new <- data %>%
group_by(group) %>% mutate(N = n()) %>%
ungroup() %>%
mutate(group_with_N = paste0(group, ", N = ", N))
fit_new <- survfit(
with(data, Surv(time, event)) ~ group_with_N,
data_new)
p <- autoplot(fit_new)
p
# ggsurvplot
ggsurvplot(fit_new, data_new,
legend.labs = unique(sort(data_new$group_with_N)),
conf.int = TRUE)
Created on 2022-08-18 with reprex v2.0.2
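As for why the equals sign matters: survfit names each stratum as "variable=level", so a label that itself contains "=" ends up with two of them. The sketch below shows how a greedy "strip everything up to =" step would then cut the label short. I have not confirmed that ggfortify cleans the names in exactly this way, so treat the regular expressions as an assumption that merely reproduces the symptom; the data frame `d` and the fixed group sizes are made up for illustration.

```r
library(survival)

set.seed(100)
d <- data.frame(
  time  = rlnorm(20),
  event = as.integer(runif(20) < 0.5),
  group = rep(c("group A", "group B"), each = 10)
)
# Append the group size to the label, as in the question
d$group_with_N <- paste0(d$group, ", N = ", ave(d$time, d$group, FUN = length))

fit <- survfit(Surv(time, event) ~ group_with_N, data = d)

# survfit stores strata as "<variable>=<level>"
strata_names <- names(fit$strata)

# Assumed cleaning step: a greedy match removes everything up to the LAST "="
greedy <- sub(".*=", "", strata_names)

# Removing only up to the FIRST "=" keeps the full label
first_only <- sub("^[^=]*=", "", strata_names)

print(strata_names)
print(greedy)
print(first_only)
```

Passing the labels explicitly, as with legend.labs above, avoids depending on how the strata names are parsed.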
I am trying to add p-values to my boxplot using ggboxplot, but it seems stat_compare_means() doesn't work when I pass multiple y variables.
Here is sample code using the palmerpenguins dataset:
library(palmerpenguins)
library(tidyverse)
library(ggplot2)
library(ggpubr)
#Load data
data(package = 'palmerpenguins')
#Remove NA data
df_clean <- na.omit(penguins)
#Group dataset according to species
df_new <- df_clean %>%
group_by(species)
#Generate multiple boxplots
df_boxplot <- ggboxplot(df_new,
x = "species",
y = c("bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"),
ylab = "Bill Length (mm)",
xlab = "Species",
color = "species",
fill = "species",
notch = TRUE,
alpha = 0.5,
ggtheme = theme_pubr()) +
stat_compare_means()
df_boxplot
I also tried adding a comparison list, but it didn't work.
I added this variable:
comp_list <- list(c("Chinstrap", "Adelie"), c("Chinstrap", "Gentoo"), c("Adelie", "Gentoo"))
and then changed stat_compare_means() to stat_compare_means(comparisons = comp_list).
I hope someone can provide an alternative and explain why this does not work. Why won't stat_compare_means() automatically add p-values to the 4 different boxplots being created in df_boxplot?
The issue is that ggboxplot returns a list of ggplots, one for each of your variables. Hence adding + stat_compare_means() to the list won't work but instead will return NULL.
To add p-values to each of your plots you have to add + stat_compare_means() to each element of the list, e.g. using lapply:
library(palmerpenguins)
library(tidyverse)
library(ggplot2)
library(ggpubr)
# Remove NA data
df_clean <- na.omit(penguins)
# Group dataset according to species
df_new <- df_clean %>%
group_by(species)
# Generate multiple boxplots
df_boxplot <- ggboxplot(df_new,
x = "species",
y = c("bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"),
ylab = "Bill Length (mm)",
xlab = "Species",
color = "species",
fill = "species",
notch = TRUE,
alpha = 0.5,
ggtheme = theme_pubr()
)
lapply(df_boxplot, function(x) x + stat_compare_means())
#> $bill_length_mm
#>
#> $bill_depth_mm
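To show the resulting list on one page, the plots can be passed to ggarrange. The following is a minimal sketch that uses the built-in iris data as a stand-in for the penguins, with two measurement columns chosen arbitrarily. It also leaves ylab unset, so that a single label such as "Bill Length (mm)" is not forced onto every panel.

```r
library(ggpubr)

# Stand-in data: iris instead of penguins, two y variables instead of four
measures <- c("Sepal.Length", "Sepal.Width")

# With several y variables, ggboxplot returns a list of ggplots
plots <- ggboxplot(iris,
                   x = "Species",
                   y = measures,
                   color = "Species",
                   ggtheme = theme_pubr())

# Add the global test p-value to each plot in the list
plots <- lapply(plots, function(p) p + stat_compare_means())

# Arrange the plots side by side with one shared legend
combined <- ggarrange(plotlist = plots, ncol = 2, common.legend = TRUE)
# print(combined) draws the arranged figure
```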
I've currently got a barplot that has a few basic parameters. However, I'm looking to try and convert this into ggplot. The extra parameters don't matter too much; the main problem that I'm having is that I'm trying to plot the sum of various columns, but I'm unable to transpose it correctly as t(data) doesn't seem to work. Here's what I've got so far:
## Subset of indicators
indicators <- clean_data[c(8, 12, 14:23)]
## Get sum of columns
indicator_sums <- colSums(indicators, na.rm = TRUE)
### Transpose for ggplot
(empty)
## Make bar plot
barplot(indicator_sums, ylim=range(pretty(c(0, indicator_sums))), cex.axis=0.75,cex.lab=0.8, cex.names=0.7, col='magenta', las=2, ylab = 'Offences Recorded Using Indicator')
You may try
library(dplyr)
library(reshape2)
library(tibble) # for rownames_to_column()
library(ggplot2)
dummy <- data.frame(
A = c(1:20),
B = rnorm(20, 10, 4),
C = runif(20, 19,30),
D = sample(c(10:40),20, replace = T)
)
barplot(colSums(dummy))
dummy %>%
colSums %>%
melt %>%
rownames_to_column %>%
ggplot(aes(x = rowname, y = value)) +
geom_col()
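If you would rather avoid melt, the named vector returned by colSums can be turned into a two-column data frame directly; no transpose is needed. A minimal sketch, using three mtcars columns as a stand-in for the indicator columns (the column names `indicator` and `total` are my own choice):

```r
library(ggplot2)

# Stand-in for the indicator subset: any all-numeric data frame works
indicators <- mtcars[c("cyl", "gear", "carb")]

# colSums() returns a named numeric vector
indicator_sums <- colSums(indicators, na.rm = TRUE)

# One row per column: the name goes on the x axis, the sum on the y axis
sums_df <- data.frame(indicator = names(indicator_sums),
                      total     = unname(indicator_sums))

p <- ggplot(sums_df, aes(x = indicator, y = total)) +
  geom_col(fill = "magenta") +
  labs(x = NULL, y = "Offences Recorded Using Indicator") +
  # Vertical axis labels, similar to las = 2 in barplot()
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
# print(p) draws the bar chart
```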
I have created a qqplot (with quantiles of beta distribution) from a dataset including two groups. To visualize, which points belong to which group, I would like to color them. I have tried the following:
res <- beta.mle(data$values) #estimate parameters of beta distribution
qqplot(qbeta(ppoints(500),res$param[1], res$param[2]),data$values,
col = data$group,
ylab = "Quantiles of data",
xlab = "Quantiles of Beta Distribution")
the result is shown here:
I have seen solutions specifying a "col" vector for qqnorm; however, this does not seem to work with qqplot, as simply half of the points are colored in either color, regardless of group. Is there a way to fix this?
I simulated some data just to show how to add color in ggplot.
Libraries
library(tidyverse)
# install.packages("Rfast")
Data
#Simulating data from beta distribution
x <- rbeta(n = 1000,shape1 = .5,shape2 = .5)
#Estimating parameters
res <- Rfast::beta.mle(x)
data <-
tibble(
simulated_data = sort(x),
quantile_data = qbeta(ppoints(length(x)),res$param[1], res$param[2])
) %>%
#Creating a group variable using quartiles
mutate(group = cut(x = simulated_data,
quantile(simulated_data,seq(0,1,.25)),
include.lowest = T))
Code
data %>%
# Adding group variable as color
ggplot(aes( x = quantile_data, y = simulated_data, col = group))+
geom_point()
Output
For those who are wondering, how to work with pre-defined groups, this is the code that worked for me:
library(tidyverse)
library(Rfast)
res <- beta.mle(x)
# make sure groups are not numerical
# (else the color scale might turn out continuous)
g <- plyr::mapvalues(g, c("1", "2"), c("Group1", "Group2"))
data <-
tibble(
my_data = sort(x),
quantile_data = qbeta(ppoints(length(x)),res$param[1], res$param[2]),
group = g[order(x)]
)
data %>%
# Adding group variable as color
ggplot(aes( x = quantile_data, y = my_data, col = group))+
geom_point()
result
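The same reordering also explains the base R behaviour in the question: qqplot sorts both samples before plotting, so a col vector given in the original data order no longer lines up with the points. Below is a minimal base R sketch, assuming the theoretical quantiles have the same length as the data (with unequal lengths qqplot interpolates the longer vector). The simulated values and the shape parameters (2, 2) are placeholders, not estimates.

```r
set.seed(1)

# Simulated stand-in data: two groups with different beta distributions
values <- c(rbeta(50, 2, 5), rbeta(50, 5, 2))
group  <- factor(rep(c("Group1", "Group2"), each = 50))

# Theoretical quantiles; the shape parameters are hypothetical
theo <- qbeta(ppoints(length(values)), 2, 2)

# qqplot() sorts x and y internally, so reorder the groups the same way
ord       <- order(values)
point_col <- c("steelblue", "firebrick")[group[ord]]

qq <- qqplot(theo, values, plot.it = FALSE)
plot(qq$x, qq$y, col = point_col, pch = 19,
     xlab = "Quantiles of Beta Distribution",
     ylab = "Quantiles of data")
legend("topleft", legend = levels(group),
       col = c("steelblue", "firebrick"), pch = 19)
```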
I'm new to R and statistics and haven't been able to figure out how one would go about plotting predicted values vs. Actual values after running a multiple linear regression. I have come across similar questions (just haven't been able to understand the code). I would greatly appreciate it if you explain the code.
This is what I have done so far:
# Attach file containing variables and responses
q <- read.csv("C:/Users/A/Documents/Design.csv")
attach(q)
# Run a linear regression
model <- lm(qo~P+P1+P4+I)
# Summary of linear regression results
summary(model)
The plot of predicted vs. actual is so I can graphically see how well my regression fits on my actual data.
It would be better if you provided a reproducible example, but here's an example I made up:
set.seed(101)
dd <- data.frame(x=rnorm(100),y=rnorm(100),
z=rnorm(100))
dd$w <- with(dd,
rnorm(100,mean=x+2*y+z,sd=0.5))
It's (much) better to use the data argument; you should almost never use attach().
m <- lm(w~x+y+z,dd)
plot(predict(m),dd$w,
xlab="predicted",ylab="actual")
abline(a=0,b=1)
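Since the question asks for an explanation of the code, here is the same example again with each step commented. As an aside that is not in the answer above: for a linear model with an intercept, the squared correlation between the predicted and actual values equals the model's R-squared, so the scatter around the line reflects that number directly.

```r
set.seed(101)

# Made-up data: w depends linearly on x, y and z, plus noise
dd <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))
dd$w <- with(dd, rnorm(100, mean = x + 2 * y + z, sd = 0.5))

# Fit the regression; data = dd tells lm() where to find the variables
m <- lm(w ~ x + y + z, data = dd)

# predict() without new data returns the fitted value for every row used in the fit
pred <- predict(m)

# Predicted on the x axis, actual on the y axis
plot(pred, dd$w, xlab = "predicted", ylab = "actual")

# Line with intercept 0 and slope 1: points on it are predicted exactly
abline(a = 0, b = 1)

# Squared correlation of predicted and actual values
r2_from_plot <- cor(pred, dd$w)^2
```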
Besides predicted vs actual plot, you can get an additional set of plots which help you to visually assess the goodness of fit.
--- execute previous code by Ben Bolker ---
par(mfrow = c(2, 2))
plot(m)
A tidy way of doing this would be to use broom::augment():
library(tidyverse)
library(cowplot)
library(broom)
set.seed(101)
# Using Ben's data above:
dd <- data.frame(x=rnorm(100),y=rnorm(100),
z=rnorm(100))
dd$w <- with(dd,rnorm(100,mean=x+2*y+z,sd=0.5))
m <- lm(w~x+y+z,dd)
m %>% augment() %>%
ggplot() +
geom_point(aes(.fitted, w)) +
geom_smooth(aes(.fitted, w), method = "lm", se = FALSE, color = "lightgrey") +
labs(x = "Fitted", y = "Actual") +
theme_bw()
This will work nicely for deeply nested regression lists especially.
To illustrate this, consider some nested list of regressions:
Reglist <- list()
Reglist$Reg1 <- dd %>% do(reg = lm(as.formula("w~x*y*z"), data = .)) %>% mutate( Name = "Type 1")
Reglist$Reg2 <- dd %>% do(reg = lm(as.formula("w~x+y*z"), data = .)) %>% mutate( Name = "Type 2")
Reglist$Reg3 <- dd %>% do(reg = lm(as.formula("w~x"), data = .)) %>% mutate( Name = "Type 3")
Reglist$Reg4 <- dd %>% do(reg = lm(as.formula("w~x+z"), data = .)) %>% mutate( Name = "Type 4")
Now is where the power of the above tidy plotting framework comes to life...:
Graph_Creator <- function(Reglist){
Reglist %>% pull(reg) %>% .[[1]] %>% augment() %>%
ggplot() +
geom_point(aes(.fitted, w)) +
geom_smooth(aes(.fitted, w), method = "lm", se = FALSE, color = "lightgrey") +
labs(x = "Fitted", y = "Actual",
title = paste0("Regression Type: ", Reglist$Name) ) +
theme_bw()
}
Reglist %>% map(~Graph_Creator(.)) %>%
cowplot::plot_grid(plotlist = ., ncol = 1)
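An alternative to building one plot per model is to stack the fitted and actual values of all models into one long data frame and let facet_wrap split the panels. This is a sketch of that different design, not a rewrite of the function above; it uses a plain named list of formulas instead of do(), and the names `formulas`, `models` and `pred_df` are mine.

```r
library(ggplot2)

set.seed(101)
dd <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))
dd$w <- with(dd, rnorm(100, mean = x + 2 * y + z, sd = 0.5))

# The four model specifications, named by regression type
formulas <- list(
  "Type 1" = w ~ x * y * z,
  "Type 2" = w ~ x + y * z,
  "Type 3" = w ~ x,
  "Type 4" = w ~ x + z
)

# Fit every model on the same data
models <- lapply(formulas, lm, data = dd)

# Stack fitted and actual values, labelled by model
pred_df <- do.call(rbind, lapply(names(models), function(nm) {
  data.frame(type   = nm,
             fitted = unname(fitted(models[[nm]])),
             actual = dd$w)
}))

p <- ggplot(pred_df, aes(x = fitted, y = actual)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 1, colour = "grey60") +
  facet_wrap(~ type) +
  labs(x = "Fitted", y = "Actual") +
  theme_bw()
# print(p) draws one panel per regression type
```

The panels share their axes by default, which makes it easier to compare how tightly each specification follows the 1:1 line.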
Same as @Ben Bolker's solution, but producing a ggplot object instead of using base R:
#first generate the dd data set using the code in Ben's solution, then...
library(ggpubr)
m <- lm(w~x+y+z,dd)
ggscatter(x = "prediction",
y = "actual",
data = data.frame(prediction = predict(m),
actual = dd$w)) +
geom_abline(intercept = 0,
slope = 1)