For a presentation, I am trying to make a scatter plot. Here is code similar to what I am running for the visualization I would like to have:
library(tidyverse)
library(showtext)
library(ggtext)

# "title", "subtitle" and "caption" are custom font families; they must be
# registered beforehand (e.g. with sysfonts::font_add()) and showtext enabled
# with showtext_auto(). That setup is not shown in the question.
ggplot(mtcars, aes(x = mpg, y = disp)) +
  geom_point(size = 1.5) +
  labs(title = "<span style = 'color:#005A9C;'>A vs. B</span>",
       x = "B<br>BBBBB<br>BBBBBBBBBB<br>BBBBBBBBBB",
       y = "A<br>AAAAA<br>AAAAAAAAAA<br>AAAAAAAAAA",
       caption = expression(paste(italic("Source:"), " JJJJJ"))) +
  theme_minimal() +
  theme(
    plot.background = element_rect(fill = "white", colour = NA),
    panel.background = element_rect(fill = "white", colour = "white"),
    panel.border = element_blank(),
    legend.position = "none",
    plot.title = element_markdown(hjust = 0.5, family = "title", size = 45),
    # to colour numbers differently, element_markdown() is used instead of element_text()
    plot.subtitle = element_markdown(hjust = 0.5, family = "subtitle", size = 30),
    axis.title.x = element_markdown(hjust = 0.5, family = "subtitle", size = 33),
    axis.text.x = element_markdown(hjust = 0.5, family = "subtitle", size = 20),
    axis.title.y = element_markdown(hjust = 0.5, family = "subtitle", size = 33),
    axis.text.y = element_markdown(hjust = 0.5, family = "subtitle", size = 20),
    plot.caption = element_text(hjust = 0, size = 21, family = "caption"))

ggsave("plot.png", height = 6, width = 7)
And here is the result:
My question is: how can I reduce the spacing between the lines of text in the x- and y-axis titles?
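One thing that may be worth trying (a sketch, not taken from an answer in the original thread): element_markdown() from ggtext accepts a lineheight argument, given as a multiple of the font size, so values below the usual default of 0.9 should pull the <br>-separated lines of each axis title closer together. The value 0.5 is an arbitrary starting point, and the custom font families are left out so the snippet runs without any font registration.

```r
library(ggplot2)
library(ggtext)

# Multi-line axis titles written with <br>, as in the question
p <- ggplot(mtcars, aes(x = mpg, y = disp)) +
  geom_point(size = 1.5) +
  labs(x = "B<br>BBBBB<br>BBBBBBBBBB",
       y = "A<br>AAAAA<br>AAAAAAAAAA") +
  theme_minimal() +
  theme(
    # lineheight below the default 0.9 tightens the spacing between lines
    axis.title.x = element_markdown(hjust = 0.5, lineheight = 0.5),
    axis.title.y = element_markdown(hjust = 0.5, lineheight = 0.5)
  )

built <- ggplot_build(p)
p
```

If the spacing in the PNG written by ggsave() still differs from the on-screen preview, one possible cause to check (an assumption, not confirmed in the question) is a mismatch between showtext's resolution and ggsave()'s default of 300 dpi; showtext::showtext_opts(dpi = 300) aligns the two.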
P-values can be added to ggplot2 figures using the function ggpubr::stat_compare_means(). However, I cannot get the text "p = " to show up in front of the p-values. The help page for the function has examples of how to add "p = " in front of p-values, but they do not seem to work.
Example
library(ggplot2)
library(ggpubr)
library(dplyr)

# Cars93 ships with the MASS package
data("Cars93", package = "MASS")

# List of the comparisons I would like to make for which p-values will be derived
my_comparisons <- list(c("Front", "Rear"),
                       c("Front", "4WD"),
                       c("Rear", "4WD"))

# creates the figure with p-values but no label indicating the values are p-values
Cars93 %>%
  mutate(DriveTrain = factor(DriveTrain, levels = c("Front", "Rear", "4WD"))) %>%
  ggplot(aes(x = DriveTrain, y = Price)) +
  stat_compare_means(paired = FALSE,
                     comparisons = my_comparisons) +
  geom_boxplot(outlier.colour = "white", outlier.fill = "white", outlier.shape = 1, outlier.size = 0) +
  geom_jitter(shape = 1, position = position_jitter(0.2), color = "black", fill = "white", size = 2) +
  theme_bw() +
  theme(axis.text.x = element_text(size = 16, color = "black"),
        axis.text.y = element_text(size = 16, color = "black"),
        axis.title = element_text(size = 16, color = "black"),
        axis.title.x = element_text(vjust = -0.5),
        panel.grid = element_blank(),
        panel.background = element_blank())
Following the example at the bottom of the ?stat_compare_means help page, I tried aes(label = paste0("p = ", ..p.format..)), which does not work.
?stat_compare_means
Cars93 %>%
  mutate(DriveTrain = factor(DriveTrain, levels = c("Front", "Rear", "4WD"))) %>%
  ggplot(aes(x = DriveTrain, y = Price)) +
  stat_compare_means(paired = FALSE,
                     comparisons = my_comparisons,
                     aes(label = paste0("p = ", ..p.format..))) +
  geom_boxplot(outlier.colour = "white", outlier.fill = "white", outlier.shape = 1, outlier.size = 0) +
  geom_jitter(shape = 1, position = position_jitter(0.2), color = "black", fill = "white", size = 2) +
  theme_bw() +
  theme(axis.text.x = element_text(size = 16, color = "black"),
        axis.text.y = element_text(size = 16, color = "black"),
        axis.title = element_text(size = 16, color = "black"),
        axis.title.x = element_text(vjust = -0.5),
        panel.grid = element_blank(),
        panel.background = element_blank())
The description of the label argument on the ?stat_compare_means help page says the allowed values include "p.signif" and "p.format", which made me think ..p.format.. was deprecated, so I tried "p.format" instead. That did not work either.
Cars93 %>%
  mutate(DriveTrain = factor(DriveTrain, levels = c("Front", "Rear", "4WD"))) %>%
  ggplot(aes(x = DriveTrain, y = Price)) +
  stat_compare_means(paired = FALSE,
                     comparisons = my_comparisons,
                     aes(label = paste0("p = ", "p.format"))) +
  geom_boxplot(outlier.colour = "white", outlier.fill = "white", outlier.shape = 1, outlier.size = 0) +
  geom_jitter(shape = 1, position = position_jitter(0.2), color = "black", fill = "white", size = 2) +
  theme_bw() +
  theme(axis.text.x = element_text(size = 16, color = "black"),
        axis.text.y = element_text(size = 16, color = "black"),
        axis.title = element_text(size = 16, color = "black"),
        axis.title.x = element_text(vjust = -0.5),
        panel.grid = element_blank(),
        panel.background = element_blank())
In the end, I would like each p-value to be preceded by "p = ", so that the labels read p = 0.00031, p = 0.059, and p = 0.027.
When you pass a list of comparisons, stat_compare_means() falls back on geom_signif() from the ggsignif package, essentially acting as a glorified wrapper function. In doing so, you lose some of the formatting flexibility. In this case it is better to use geom_signif() directly:
library(ggsignif)

Cars93 %>%
  mutate(DriveTrain = factor(DriveTrain, levels = c("Front", "Rear", "4WD"))) %>%
  ggplot(aes(x = DriveTrain, y = Price)) +
  geom_signif(y_position = c(55, 60, 65),
              comparisons = my_comparisons,
              map_signif_level = function(x) paste("p =", scales::pvalue(x))) +
  geom_boxplot(outlier.colour = "white", outlier.fill = "white",
               outlier.shape = 1, outlier.size = 0) +
  geom_jitter(shape = 1, position = position_jitter(0.2),
              color = "black", fill = "white", size = 2) +
  theme_bw() +
  theme(axis.text.x = element_text(size = 16, color = "black"),
        axis.text.y = element_text(size = 16, color = "black"),
        axis.title = element_text(size = 16, color = "black"),
        axis.title.x = element_text(vjust = -0.5),
        panel.grid = element_blank(),
        panel.background = element_blank())
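One detail worth noting: scales::pvalue() uses accuracy = 0.001 by default, so any p-value below 0.001 is printed as "<0.001" and the first label would read "p = <0.001" rather than the "p = 0.00031" asked for. A minimal sketch of an alternative formatter, assuming two significant digits are wanted (fmt_p is a hypothetical helper name, not part of ggsignif), passed to the same map_signif_level argument:

```r
library(ggplot2)
library(ggsignif)

# Hypothetical formatter: two significant digits, prefixed with "p = ".
# formatC(format = "g") keeps 0.00031 in full; very small values
# (below 1e-4) switch to scientific notation such as "1.2e-05".
fmt_p <- function(p) paste("p =", formatC(p, format = "g", digits = 2))

cars <- MASS::Cars93
cars$DriveTrain <- factor(cars$DriveTrain, levels = c("Front", "Rear", "4WD"))

my_comparisons <- list(c("Front", "Rear"), c("Front", "4WD"), c("Rear", "4WD"))

p <- ggplot(cars, aes(x = DriveTrain, y = Price)) +
  geom_boxplot(outlier.shape = NA) +
  geom_signif(comparisons = my_comparisons,
              y_position = c(55, 60, 65),
              map_signif_level = fmt_p)   # called with the raw p-values

b <- ggplot_build(p)
p
```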
I need to display the x-axis labels neatly without affecting the plotted points in the final output. Currently the x-axis labels are closely spaced, which does not look good when I show the plot in PowerPoint.
library("readxl")
my_data <- read_excel("central_high.xlsx") # Input file
str(my_data)
my_data = as.data.frame(my_data)
str(my_data)
my_data$var1 = NULL
f20 = as.data.frame(table(my_data$Year20))
f20$Var1 = as.Date(f20$Var1, "%Y-%m-%d")
f20$Var1 = format(f20$Var1, format="%m-%d")
f20$Cumulative_F20 = cumsum(f20$Freq) # cumulative calculation
f20
newcol_20 = c(my_data$Year19,
              my_data$Year18, my_data$Year17,
              my_data$Year16, my_data$Year15,
              my_data$Year14, my_data$Year13,
              my_data$Year12, my_data$Year11,
              my_data$Year10, my_data$Year9,
              my_data$Year8, my_data$Year7,
              my_data$Year6, my_data$Year5,
              my_data$Year4, my_data$Year3,
              my_data$Year2, my_data$Year1)
str(newcol_20)
newdata_20 = data.frame(newcol_20)
str(newdata_20)
newdata_20$newcol_20 = as.Date(as.character(newdata_20$newcol_20), "%Y-%m-%d")
newdata_20$newcol_20 = format(newdata_20$newcol_20, format="%m-%d")
str(newdata_20)
newtable_20 = table(newdata_20$newcol_20)
newtable_20
newdf_20 = as.data.frame(newtable_20)
#newdf_20$Cumulative_20 = cumsum(newdf_20$Freq)/19 # cumulative calculation
newdf_20$Freq = newdf_20$Freq/19
newdf_20
newcol_05 = c(my_data$Year19,
              my_data$Year18, my_data$Year17,
              my_data$Year16)
str(newcol_05)
newdata_05 = data.frame(newcol_05)
str(newdata_05)
newdata_05$newcol_05 = as.Date(as.character(newdata_05$newcol_05), "%Y-%m-%d")
newdata_05$newcol_05 = format(newdata_05$newcol_05, format="%m-%d")
str(newdata_05)
newtable_05 = table(newdata_05$newcol_05)
newtable_05
newdf_05 = as.data.frame(newtable_05)
newdf_05$Cumulative_05 = cumsum(newdf_05$Freq)/4 # cumulative calculation
newdf_05$Freq = newdf_05$Freq/4
newdf_05
library(ggplot2)
library(ggpubr)

# note: a colour string inside aes() is treated as a group label, not a colour;
# the hex codes below only take effect if scale_colour_identity() is added
ggplot() +
  geom_line(data = newdf_20, aes(x = Var1, y = cumsum(Freq), group = 1, color = "#111111"), size = 1.6) +
  geom_line(data = newdf_05, aes(x = Var1, y = cumsum(Freq), group = 1, color = "#999999"), size = 1.6) +
  geom_line(data = f20, aes(x = Var1, y = cumsum(Freq), group = 1, color = "#CC79A7"), size = 1.6) +
  geom_vline(xintercept = "03-25", color = "gray", size = 1) +
  geom_vline(xintercept = "04-21", color = "gray", size = 1) +
  labs(y = "Cumulative_Frequency", colour = "#000000", size = 16) +
  font("ylab", size = 15, color = "black", face = "bold.italic") +
  font("legend.text", size = 10, face = "bold") +
  font("legend.title", size = 15, face = "bold") +
  theme(axis.line.x = element_line(size = 0.5, colour = "black"), # theme modification
        axis.line.y = element_line(size = 0.5, colour = "black"),
        #axis.text.x = element_blank(), axis.ticks.x = element_blank(),
        panel.background = element_blank(),
        legend.position = "none",
        axis.text.x = element_text(colour = "#000000", size = 7,
                                   angle = 90, face = "bold"),
        axis.text.y = element_text(colour = "#000000", size = 12,
                                   angle = 90, face = "bold"))
Please help me modify the code. I have also added the output I currently get; the code needs a little modification to make the x-axis look neat.
One option is to dodge the x-axis labels over two rows with guide_axis(n.dodge = 2):
library(ggplot2)
library(ggpubr)

ggplot() +
  geom_line(data = newdf_20, aes(x = Var1, y = cumsum(Freq), group = 1, color = "#111111"), size = 1.6) +
  geom_line(data = newdf_05, aes(x = Var1, y = cumsum(Freq), group = 1, color = "#999999"), size = 1.6) +
  geom_line(data = f20, aes(x = Var1, y = cumsum(Freq), group = 1, color = "#CC79A7"), size = 1.6) +
  geom_vline(xintercept = "03-25", color = "gray", size = 1) +
  geom_vline(xintercept = "04-21", color = "gray", size = 1) +
  scale_x_discrete(guide = guide_axis(n.dodge = 2)) +   # alternate labels over two rows
  labs(y = "Cumulative_Frequency", colour = "#000000", size = 16) +
  font("ylab", size = 15, color = "black", face = "bold.italic") +
  font("legend.text", size = 10, face = "bold") +
  font("legend.title", size = 15, face = "bold") +
  theme(axis.line.x = element_line(size = 0.5, colour = "black"), # theme modification
        axis.line.y = element_line(size = 0.5, colour = "black"),
        #axis.text.x = element_blank(), axis.ticks.x = element_blank(),
        panel.background = element_blank(),
        legend.position = "none",
        axis.text.x = element_text(colour = "#000000", size = 7,
                                   angle = 90, face = "bold"),
        axis.text.y = element_text(colour = "#000000", size = 12,
                                   angle = 90, face = "bold"))
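Another option (not from the original answer) is to label only every n-th day. For a discrete scale, the breaks argument of scale_x_discrete() can be a function that receives the scale limits, so the labels are thinned while every point is still drawn. A minimal sketch with made-up daily data standing in for Var1 (every_nth is a hypothetical helper; the step of 7 is an arbitrary choice):

```r
library(ggplot2)

# Hypothetical stand-in for the question's data: one row per day, with the
# x values stored as "%m-%d" strings like Var1 in the question
days <- seq(as.Date("2020-03-01"), as.Date("2020-05-31"), by = "day")
df <- data.frame(Var1 = format(days, "%m-%d"), Freq = 1,
                 stringsAsFactors = FALSE)

# Returns a breaks function that keeps every n-th level; ggplot2 calls it
# with the scale limits (all levels), so no data points are dropped
every_nth <- function(n) {
  function(limits) limits[seq(1, length(limits), by = n)]
}

p <- ggplot(df, aes(x = Var1, y = cumsum(Freq), group = 1)) +
  geom_line() +
  scale_x_discrete(breaks = every_nth(7)) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

p
```

With 92 days and a step of 7, 14 labels remain (03-01, 03-08, ..., 05-31); the breaks function can be combined with guide_axis(n.dodge = 2) if the labels are still crowded.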
Output:
I would like to set both the colour and the shape of my two indicators, which are plotted in two layers. scale_color_manual() works, but scale_shape_manual() does not: with or without the scale_shape_manual() line the result is the same, and shape 16 (filled circle) is used. Why?
comp_graph_1 <- ggplot() +
  layer(mapping = aes(x = log(FV), y = msd, colour = "Reference"), # factor(Dataset)
        data = ref,
        stat = "identity",
        geom = "point",
        position = "identity") +
  layer(mapping = aes(x = log(FV), y = msd, colour = "Target"), # "red" "blue"
        data = target, # data = target[Is_Phone == 0],
        stat = "identity",
        geom = "point",
        position = "identity") +
  theme(panel.background = element_rect(fill = "white"),
        panel.grid = element_line(colour = "grey90"), panel.ontop = FALSE) +
  theme(legend.justification = c(0, 0), legend.position = "bottom",
        legend.background = element_rect(), legend.title = element_blank(),
        legend.key = element_rect(fill = "white"),
        legend.text = element_text(size = 9, colour = "#7F7F7F"), panel.border = element_blank(),
        axis.line = element_line(color = "#7F7F7F")) +
  theme(plot.title = element_text(size = 16, colour = "#7F7F7F"),
        axis.title.x = element_text(size = 11, hjust = 1, face = "bold", colour = "#7F7F7F"),
        axis.title.y = element_text(size = 11, hjust = 1, face = "bold", colour = "#7F7F7F")) +
  ggtitle(paste0(x, " / ", y, " distribution ")) + xlab(paste0("log ", x)) + ylab(y) +
  scale_color_manual(values = c("Reference" = "#FFC000", "Target" = "#00AEEF")) +
  scale_shape_manual(values = c("Reference" = 17, "Target" = 4))
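I think that where you have colour = "Target" you need a shape mapping as well: add shape = "Target" (and shape = "Reference" in the other layer) inside aes(), and it should work. A scale only acts on an aesthetic that is actually mapped, so without a shape mapping scale_shape_manual() has nothing to apply to.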
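A minimal sketch of that fix, using small random data frames as hypothetical stand-ins for the question's ref and target, and geom_point() in place of the equivalent layer(geom = "point", ...) calls. Giving the colour and shape legends the same title (here the arbitrary "Dataset") should let ggplot2 merge them into one legend; legend.title = element_blank() then hides that title.

```r
library(ggplot2)

# Hypothetical stand-ins for the question's `ref` and `target` data frames
set.seed(1)
ref    <- data.frame(FV = runif(20, 1, 100), msd = rnorm(20))
target <- data.frame(FV = runif(20, 1, 100), msd = rnorm(20))

p <- ggplot() +
  # Map BOTH colour and shape to the same constant label in each layer
  geom_point(data = ref,
             aes(x = log(FV), y = msd, colour = "Reference", shape = "Reference")) +
  geom_point(data = target,
             aes(x = log(FV), y = msd, colour = "Target", shape = "Target")) +
  scale_color_manual(values = c("Reference" = "#FFC000", "Target" = "#00AEEF")) +
  scale_shape_manual(values = c("Reference" = 17, "Target" = 4)) +
  labs(colour = "Dataset", shape = "Dataset") +   # same title so the legends merge
  theme(legend.title = element_blank(), legend.position = "bottom")

built <- ggplot_build(p)
p
```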
I have produced some nice plots with the plotLearnerPrediction() function from the mlr package. I was able to make some adjustments to the returned ggplot (see my code below), but I am not sure how to make the last one: I want to colour the data points by their labels (the groups in the example plot).
My last plot (with black data points)
Another produced plot (overlapping data points)
This is the latest version of my code (normally part of a for loop):
plot <- plotLearnerPrediction(learner = learner_name, task = tasks[[i]], cv = 0,
                              pointsize = 1.5, gridsize = 500) +
  ggtitle(trimws(sprintf("Predictions %s %s", meta$name[i], meta$nr[i])),
          subtitle = sprintf("DR = %s, ML = %s, CV = LOO, ACC = %.2f", meta$type[i],
                             toupper(strsplit(learner_name, "classif.")[[1]][2]), acc[[i]])) +
  xlab(sprintf("%s 1", lab)) +
  ylab(sprintf("%s 2", lab)) +
  scale_fill_manual(values = colors) +
  theme(plot.title = element_text(size = 18, face = "bold"),
        plot.subtitle = element_text(size = 12, face = "bold", colour = "grey40"),
        axis.text.x = element_text(vjust = 0.5, hjust = 1),
        axis.text = element_text(size = 14, face = "bold"),
        axis.title.x = element_text(vjust = 0.5),
        axis.title = element_text(size = 16, face = "bold"),
        #panel.grid.minor = element_line(colour = "grey80"),
        axis.line.x = element_line(color = "black", size = 1),
        axis.line.y = element_line(color = "black", size = 1),
        panel.grid.major = element_line(colour = "grey80"),
        panel.background = element_rect(fill = "white"),
        legend.justification = "top",
        legend.margin = margin(l = 0),
        legend.title = element_blank(),
        legend.text = element_text(size = 14))
Below is part of the source code of plotLearnerPrediction(). I want to override geom_point(colour = "black"). Simply adding geom_point(colour = "pink") to my code does not colour the data points but the whole plot. Is there a way to override that code with a vector of colours? Possibly the aes() also needs to change so that the colours follow the groups.
else if (taskdim == 2L) {
  p = ggplot(mapping = aes_string(x = x1n, y = x2n))
  p = p + geom_tile(data = grid, mapping = aes_string(fill = target))
  p = p + scale_fill_gradient2(low = bg.cols[1L], mid = bg.cols[2L],
                               high = bg.cols[3L], space = "Lab")
  p = p + geom_point(data = data, mapping = aes_string(x = x1n,
                     y = x2n, colour = target), size = pointsize)
  p = p + geom_point(data = data, mapping = aes_string(x = x1n,
                     y = x2n), size = pointsize, colour = "black",
                     shape = 1)
  p = p + scale_colour_gradient2(low = bg.cols[1L],
                                 mid = bg.cols[2L], high = bg.cols[3L], space = "Lab")
  p = p + guides(colour = FALSE)
}
You can always hack into gg objects. The following works for ggplot2 2.2.1 and adds a manual alpha value to all geom_point layers.
library(mlr)
library(ggplot2)

g = plotLearnerPrediction(makeLearner("classif.qda"), iris.task)
# indices of all layers whose geom is GeomPoint
ids.geom.point = which(sapply(g$layers, function(z) class(z$geom)[[1]]) == "GeomPoint")
for (i in ids.geom.point) {
  g$layers[[i]]$aes_params$alpha = 0.1
}
g
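The same layer-editing idea can target the black outline layer from the excerpt above instead of alpha. The sketch below does not call mlr: it builds a plain ggplot with the same two-layer structure (points coloured by group, then open black circles drawn on top) and recolours only the point layers whose colour is fixed to "black". It relies on ggplot2 internals (layers being ggproto environments with an aes_params field), which are not a public API and may change between versions.

```r
library(ggplot2)

# Hypothetical plot mimicking the structure in plotLearnerPrediction():
# points coloured by group, then an open black circle on top of each point
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(colour = Species), size = 1.5) +
  geom_point(size = 1.5, colour = "black", shape = 1)

# Layers are environments, so assigning into a layer's aes_params
# changes the layer stored inside p by reference
for (lyr in p$layers) {
  if (inherits(lyr$geom, "GeomPoint") &&
      identical(lyr$aes_params$colour, "black")) {
    lyr$aes_params$colour <- "pink"
  }
}

b <- ggplot_build(p)
p
```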
The plotLearnerPrediction() function returns a ggplot object, which allows some customization without having to modify the source code. In your particular case, you can use scale_fill_manual() to set custom fill colors:
library(mlr)
g = plotLearnerPrediction(makeLearner("classif.randomForest"), iris.task)
g + scale_fill_manual(values = c("yellow", "orange", "red"))