I would like to draw a chart with ggplot for a couple of model accuracies. The details of the plotted result don't matter; my problem is filling the geom_point objects.
A sample file can be found here: https://ufile.io/z1z4c
My code is:
library(ggplot2)
library(ggthemes)
Palette <- c('#A81D35', '#085575', '#1DA837')
results <- read.csv('test.csv', colClasses=c('factor', 'factor', 'factor', 'numeric'))
results$dates <- factor(results$dates, levels = c('01', '15', '27'))
results$pocd <- factor(results$pocd, levels = c('without POCD', 'with POCD', 'null accuracy'))
results$model <- factor(results$model, levels = c('SVM', 'DT', 'RF', 'Ada', 'NN'))
ggplot(data = results, group = pocd) +
geom_point(aes(x = dates, y = acc,
shape = pocd,
color = pocd,
fill = pocd,
size = pocd)) +
scale_shape_manual(values = c(0, 1, 3)) +
scale_color_manual(values = c(Palette[1], Palette[2], Palette[3])) +
scale_fill_manual(values = c(Palette[1], Palette[2], Palette[3])) +
scale_size_manual(values = c(2, 2, 1)) +
facet_grid(. ~ model) +
xlab('Date of knowledge') +
ylab('Accuracy') +
theme(legend.position = 'right',
legend.title = element_blank(),
axis.line = element_line(color = '#DDDDDD'))
As a result I get unfilled circles and squares. How can I fix this so that the squares and circles are filled with the specific color?
Additional question: I would like to add a geom_line to the graph, connecting the three points in each group. However, I fail to adjust the line type and width. It always takes the values from scale_*_manual, which is very inconvenient, especially in the case of size.
Thanks for helping!
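You need to change the shapes specified. Shapes 0, 1 and 3 are outlines only and ignore fill, whereas shapes 21 to 25 have an interior that takes the fill colour: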
scale_shape_manual(values = c(21,22,23)) +
For your additional question, that should be solved if you set aes(size=) in the first part of your code (under ggplot(data=...)) and then manually specify size=1 under geom_line, as in + geom_line(size=1, ...).
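A minimal sketch of both points, using simulated data in place of test.csv (the `sim` data frame, its values and the name `palette3` are made up for illustration). Here `size` is mapped only inside `geom_point()`, and `geom_line()` gets a constant line type, so none of the manual scales can touch the line. This is one possible arrangement, not the only one.

```r
library(ggplot2)

# Simulated stand-in for test.csv (values are invented for illustration)
set.seed(42)
sim <- expand.grid(
  dates = factor(c("01", "15", "27")),
  pocd  = factor(c("without POCD", "with POCD", "null accuracy"),
                 levels = c("without POCD", "with POCD", "null accuracy")),
  model = factor(c("SVM", "DT"))
)
sim$acc <- runif(nrow(sim), min = 0.5, max = 0.9)

palette3 <- c("#A81D35", "#085575", "#1DA837")

p <- ggplot(sim, aes(x = dates, y = acc, color = pocd)) +
  # group = pocd connects the three dates within each series;
  # linetype is a constant here, so no scale_*_manual() can override it.
  # Width is left at its default: the argument is `linewidth` in
  # ggplot2 >= 3.4.0 and `size` in older versions.
  geom_line(aes(group = pocd), linetype = "dashed") +
  # size is mapped only in this layer, so only the points use scale_size_manual()
  geom_point(aes(shape = pocd, fill = pocd, size = pocd)) +
  scale_shape_manual(values = c(21, 22, 23)) +  # fillable circle, square, diamond
  scale_color_manual(values = palette3) +
  scale_fill_manual(values = palette3) +
  scale_size_manual(values = c(2, 2, 1)) +
  facet_grid(. ~ model)

print(p)
```

With shapes 21 to 25, `color` controls the outline and `fill` the interior, so the two scales can also be given different palettes.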
I want the Girls to have a dashed trendline and the Boys a solid one. I'd also like to remove the box around the graph, except for the x- and y-axis lines, and remove the shading behind the shapes in the legend key. I am using ggplot2 in R.
dr <- ggplot(DATASET,
aes(x=EC,
y=sqrt_Percent.5,
color=Sex1M,
shape=Sex1M,
linetype=Sex1M)) +
geom_point(size= 3,
aes(shape=Sex1M,
color=Sex1M)) +
scale_shape_manual(values=c(1,16))+
geom_smooth(method=lm,
se=FALSE,
fullrange=TRUE) +
labs(x="xaxis title",
y = "yaxis title",
fill= "") +
xlim(3,7) +
ylim(0,10) +
theme(legend.position = 'right',
legend.title = element_blank(),
panel.border = element_rect(fill=NA,
color = 'white'),
panel.background = NULL,
legend.background =element_rect(fill=NA,
size=0.5,
linetype="solid")) +
scale_color_grey(start = 0.0,
end = 0.4)
Current Graph
There is quite a lot going on in your visualisation. One strategy for developing it is to start from a base plot and add layers and features one at a time.
There are different ways to change the "sequence" of your colours, shapes, etc.
You can do this in ggplot with one of the scale_xxx_manual layers.
Conceptually, I suggest you deal with this in the data and only use the scales for tweaking. But that is a question of style.
In your case, you use Sex1M as a categorical variable. There is a built-in sequence for (automatic) colouring and shapes, assigned in the order of the factor levels. So to change which group gets which style, you have to define the levels in another order.
As you have not provided a representative sample, I simulate some data points and define Sex1M as part of the data creation process.
DATASET <- data.frame(
x = sample(x = 2:7, size = 20, replace = TRUE)
, y = sample(x = 0.2:9.8, size = 20, replace = TRUE)
, Sex1M = sample(c("Boys", "Girls"), size = 20, replace = TRUE )
)
Now let's plot
library(dplyr)
library(ggplot2)
DATASET <- DATASET %>%
mutate(Sex1M = factor(Sex1M, levels = c("Boys", "Girls"))) # set sequence of levels: boys are now the first level aka 1st colour, linetype, shape.
# plot
ggplot(DATASET,
aes(x=x, # adapted to simulated data
y=y, # adapted to simulated data
color=Sex1M, # these values are now defined in the sequence
shape=Sex1M, # of the categorical factor you created
linetype=Sex1M) # adapt the factor levels as needed (e.g change order)
) +
geom_point(size= 3,
aes(shape=Sex1M,
color=Sex1M)) +
scale_shape_manual(values=c(1,16))+
geom_smooth(method=lm,
se=FALSE,
fullrange=TRUE) +
labs(x="xaxis title",
y = "yaxis title",
fill= "") +
xlim(3,7) +
ylim(0,10) +
theme(legend.position = 'right',
legend.title = element_blank(),
panel.border = element_rect(fill=NA,
color = 'white'),
panel.background = NULL,
#------------ ggplot is not always intuitive: the legend background is the panel
# comprising the legend keys (symbols) and the labels;
# to remove the colouring behind the legend keys, set legend.key
legend.key = element_rect(fill = NA)
# ----------- legend.background can go. To see the difference between background and key,
# set fill = "blue"
# legend.background =element_rect(fill = NA, size=0.5,linetype="solid")
) +
scale_color_grey(start = 0.0,
end = 0.4)
The settings for the background panel make the outer line disappear in my plot.
Hope this helps to get you started.
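If you would rather not rely on the order of the factor levels, a named vector in `scale_linetype_manual()` pins each line type to a level explicitly. A short sketch with simulated data (the `sim` data frame is made up; `theme_classic()` is just one way to drop the panel box while keeping the axis lines):

```r
library(ggplot2)

# Simulated stand-in for DATASET (values are invented for illustration)
set.seed(1)
sim <- data.frame(
  x     = runif(40, min = 3, max = 7),
  y     = runif(40, min = 0, max = 10),
  Sex1M = sample(c("Boys", "Girls"), size = 40, replace = TRUE)
)

p <- ggplot(sim, aes(x = x, y = y, color = Sex1M,
                     shape = Sex1M, linetype = Sex1M)) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # Named values are matched to levels by name, not by position
  scale_linetype_manual(values = c(Boys = "solid", Girls = "dashed")) +
  scale_shape_manual(values = c(Boys = 16, Girls = 1)) +
  scale_color_grey(start = 0.0, end = 0.4) +
  theme_classic() +                              # no panel box, axis lines kept
  theme(legend.title = element_blank(),
        legend.key   = element_rect(fill = NA))  # no shading behind the keys

print(p)
```

Because the values are named, swapping the two styles only means editing the vector; the data stay untouched.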
I am new to Stack Overflow and to the R language.
Here is my problem.
I have a data frame with one variable called Type and 14 other variables, for which I need to compute and plot a correlation matrix heatmap.
origin dataset
I already have an overall version using ggplot2. The theme is the default theme_grey, which is fine for me to view. The code is:
m<- melt(get_lower_tri(round(cor(xrf[3:16], method = 'pearson', use = 'pairwise.complete.obs'), 2)),na.rm = TRUE)
ggplot(m, aes(Var1, Var2, fill = value)) +
geom_tile() +
scale_fill_gradient2(low = 'skyblue4',
high = 'coral2',
mid = 'white',
midpoint = 0,
limits = c(-1, 1),
space = "Lab",
name = 'Pearson\nCorrelation') +
theme_grey()+
coord_fixed() +
theme(axis.title = element_blank())
The result is fine and the background looks good to view.
But when I tried to generate a grouped correlation matrix heatmap, I found that no matter what I tried (using theme(panel.background = element_rect()) or theme(panel.background = element_blank())), the subplot backgrounds would not change and stayed this ugly grey, which is even different from the grey of the overall plot.
Here is my code:
Type = rep(c('(a)', '(b)', '(c)','(d)', '(e)', '(f)', '(g)', '(h)', '(i)', '(j)'), each = 14^2)
# Get lower triangle of the correlation matrix
get_lower_tri<-function(x){
x[upper.tri(x)] <- NA
return(x)
}
df2 <- do.call(rbind, lapply(split(xrf, xrf$Type),
function(x) melt(get_lower_tri(round(cor(x[3:16], method = 'pearson', use = 'pairwise.complete.obs'), 2)),na.rm = FALSE)))
my_cors <- cbind(Type,df2)
my_cors %>%
ggplot(aes(Var1, Var2, fill = value)) +
geom_tile() +
scale_fill_gradient2(low = 'skyblue4',
high = 'coral2',
mid = 'white',
midpoint = 0,
limits = c(-1, 1),
space = "Lab",
name = 'Pearson\nCorrelation') +
theme_grey()+
coord_fixed() +
theme(axis.title = element_blank(),
panel.background = element_rect(fill = 'grey90',colour = NA))+
facet_wrap("Type",ncol = 5, nrow = 2)
Shouldn't the facet subplot backgrounds be the same as the overall one when the same theme is used? And how can I change them?
Update: sorry! It’s my first time asking a question and it’s not a good one!
xrf is my original dataset. But I have now figured out the cause, thanks to Tjebo and those who commented on my faulty question. It was very instructive for me!
scale_fill_gradient2(..., na.value = 'transparent') will solve it. The default value of this parameter is "grey50", which I had taken for the background color.
I am truly sorry for asking such a silly question, and I really appreciate your kind comments to a rookie! Thank you all!
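To make the fix concrete, here is a small self-contained sketch. Since xrf is not available, the built-in `iris` data stand in for it, with `Species` playing the role of `Type`; the helper name `lower_cor_long` is made up for this example, and base R replaces `melt()` so that only ggplot2 is needed.

```r
library(ggplot2)

# Lower triangle of a correlation matrix in long format; the upper triangle
# is kept as NA, which is what produces the grey tiles
lower_cor_long <- function(d) {
  m <- round(cor(d, method = "pearson", use = "pairwise.complete.obs"), 2)
  m[upper.tri(m)] <- NA
  as.data.frame(as.table(m), responseName = "value")
}

# iris stands in for the xrf data, with Species playing the role of Type
parts <- lapply(split(iris[1:4], iris$Species), lower_cor_long)
cors  <- do.call(rbind, Map(cbind, Type = names(parts), parts))

p <- ggplot(cors, aes(Var1, Var2, fill = value)) +
  geom_tile() +
  scale_fill_gradient2(low = "skyblue4", mid = "white", high = "coral2",
                       midpoint = 0, limits = c(-1, 1),
                       na.value = "transparent",   # default is "grey50"
                       name = "Pearson\nCorrelation") +
  coord_fixed() +
  theme(axis.title = element_blank()) +
  facet_wrap(~ Type)

print(p)
```

Dropping the NA rows before plotting (for example with `na.rm = TRUE` in `melt()`, as in the ungrouped version) would have the same visual effect, because then no tile is drawn for those cells at all.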
I have a bunch of data for people touching bacteria for up to 5 touches. I'm comparing how much they pick up with and without gloves. I'd like to plot the mean by the factor NumberContacts and colour it red. E.g. the red dots on the following graphs.
So far I have:
require(tidyverse)
require(reshape2)
Make some data
df<-data.frame(Yes=rnorm(n=100),
No=rnorm(n=100),
NumberContacts=factor(rep(1:5, each=20)))
Calculate the mean for each group (NumberContacts)
centroids<-aggregate(data=melt(df,id.vars ="NumberContacts"),value~NumberContacts+variable,mean)
Get them into two columns
centYes<-subset(centroids, variable=="Yes",select=c("NumberContacts","value"))
centNo<-subset(centroids, variable=="No",select="value")
centroids<-cbind(centYes,centNo)
colnames(centroids)<-c("NumberContacts","Gloved","Ungloved")
Make an ugly plot.
ggplot(df,aes(x=Yes,y=No))+
geom_point()+
geom_abline(slope=1,linetype=2)+
stat_ellipse(type="norm",linetype=2,level=0.975)+
geom_point(data=centroids,aes(x=Gloved,y=Ungloved),size=5,color='red')+
#stat_summary(fun.y="mean",colour="red")+ doesn't work
facet_wrap(~NumberContacts,nrow=2)+
theme_classic()
Is there a more elegant way using stat_summary? Also, how can I change the look of the boxes at the top of my graphs?
stat_summary is not an option because (see ?stat_summary):
stat_summary operates on unique x
That is, while we can take a mean of y, x remains fixed. But we may do something else that is very concise:
ggplot(df, aes(x = Yes, y = No, group = NumberContacts)) +
geom_point() + geom_abline(slope = 1, linetype = 2)+
stat_ellipse(type = "norm", linetype = 2, level = 0.975)+
geom_point(data = df %>% group_by(NumberContacts) %>% summarise_all(mean), size = 5, color = "red")+
facet_wrap(~ NumberContacts, nrow = 2) + theme_classic() +
theme(strip.background = element_rect(fill = "black"),
strip.text = element_text(color = "white"))
which also shows that to modify the boxes above you want to look at strip elements of theme.
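For reference, the per-group centroid can also be computed in base R with a two-column formula in `aggregate()`. This is only an alternative sketch of the same idea, using data simulated as in the question. Keeping the column names `Yes` and `No` in the summary lets the extra layer inherit the plot's `aes()`.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(Yes = rnorm(100),
                 No  = rnorm(100),
                 NumberContacts = factor(rep(1:5, each = 20)))

# One row per NumberContacts holding the mean of both columns;
# keeping the names Yes/No lets the layer reuse the plot's aes()
centroids <- aggregate(cbind(Yes, No) ~ NumberContacts, data = df, FUN = mean)

p <- ggplot(df, aes(x = Yes, y = No)) +
  geom_point() +
  geom_point(data = centroids, size = 5, color = "red") +
  facet_wrap(~ NumberContacts, nrow = 2) +
  theme_classic()

print(p)
```

Since `centroids` still contains `NumberContacts`, each red point lands in its own facet.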
I have data which comes from a statistical test (gene set enrichment analysis, but that's not important), so I obtain p-values for statistics that are normally distributed, i.e., both positive and negative values:
The test is run on several categories:
set.seed(1)
df <- data.frame(col = rep(1,7),
category = LETTERS[1:7],
stat.sign = sign(rnorm(7)),
p.value = runif(7, 0, 1),
stringsAsFactors = TRUE)
I want to present these data in a geom_tile ggplot such that I color code the df$category by their df$p.value multiplied by their df$stat.sign (i.e, the sign of the statistic)
For that I first take the log10 of df$p.value:
df$sig <- df$stat.sign*(-1*log10(df$p.value))
Then I order the df by df$sig for each sign of df$sig:
library(dplyr)
df <- rbind(dplyr::filter(df, sig < 0)[order(dplyr::filter(df, sig < 0)$sig), ],
dplyr::filter(df, sig > 0)[order(dplyr::filter(df, sig > 0)$sig), ])
And then I ggplot it:
library(ggplot2)
df$category <- factor(df$category, levels=df$category)
ggplot(data = df,
aes(x = col, y = category)) +
geom_tile(aes(fill=sig)) +
scale_fill_gradient2(low='darkblue', mid='white', high='darkred') +
theme_minimal() +
xlab("") + ylab("") + labs(fill="-log10(P-Value)") +
theme(axis.text.y = element_text(size=12, face="bold"),
axis.text.x = element_blank())
which gives me:
Is there a way to manipulate the legend such that the values of df$sig are represented by their absolute value but everything else remains unchanged? That way I still get both red and blue shades and maintain the order I want.
If you check ggplot's documentation, scale_fill_gradient2, like other continuous scales, accepts one of the following for its labels argument:
NULL for no labels
waiver() for the default labels computed by the transformation object
a character vector giving labels (must be same length as breaks)
a function that takes the breaks as input and returns labels as output
Since you only want the legend values to be absolute, I assume you're satisfied with the default breaks in the legend colour bar (-0.1 to 0.4 in increments of 0.1), so all you really need is to add a function that manipulates the labels.
I.e. instead of this:
scale_fill_gradient2(low = 'darkblue', mid = 'white', high = 'darkred') +
Use this:
scale_fill_gradient2(low = 'darkblue', mid = 'white', high = 'darkred',
labels = abs) +
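Any function that maps the vector of breaks to a vector of labels can be used in the same way. The sketch below uses a hypothetical helper, `abs_label`, that also fixes the number of decimals; the data are simulated along the lines of the question.

```r
library(ggplot2)

# Hypothetical helper: takes the numeric breaks, returns character labels
abs_label <- function(breaks) format(abs(breaks), nsmall = 1)

set.seed(1)
df <- data.frame(col = 1,
                 category = factor(LETTERS[1:7]),
                 sig = sign(rnorm(7)) * -log10(runif(7)))

p <- ggplot(df, aes(x = col, y = category, fill = sig)) +
  geom_tile() +
  scale_fill_gradient2(low = "darkblue", mid = "white", high = "darkred",
                       labels = abs_label) +
  labs(fill = "-log10(P-Value)")

print(p)

# The helper can be checked on its own
abs_label(c(-1, 0, 0.5))
```

Only the text of the legend changes; the fill values, and therefore the red and blue shades, are still computed from the signed `sig`.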
I'm not sure I understood what you're looking for. Do you mean that you want to change the labels within the legend? If so, setting the breaks and labels arguments of scale_fill_gradient2() should do it.
ggplot(data=df,aes(x=col,y=category)) +
geom_tile(aes(fill=sig)) +
scale_fill_gradient2(low='darkblue',mid='white',high='darkred',
breaks = sort(unique(df$sig)),
labels = round(abs(sort(unique(df$sig))), 2)) +
theme_minimal()+xlab("")+ylab("")+labs(fill="-log10(P-Value)") +
theme(axis.text.y=element_text(size=12,face="bold"),axis.text.x=element_blank())
If that is what you're looking for, you could also display text inside the figure to show the values. Try stacking stat_bin_2d() like this:
ggplot(data=df,aes(x=col,y=category)) +
geom_tile(aes(fill=sig)) +
scale_fill_gradient2(low='darkblue',mid='white',high='darkred',
breaks = sort(unique(df$sig)),
labels = round(abs(sort(unique(df$sig))), 2)) +
theme_minimal()+xlab("")+ylab("")+labs(fill="-log10(P-Value)") +
stat_bin_2d(geom = 'text', aes(label = sig), colour = 'black', size = 16) +
theme(axis.text.y=element_text(size=12,face="bold"),axis.text.x=element_blank())
You might want to experiment with the size and colour arguments.
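A plainer way to print the values, offered as an alternative sketch and not as a replacement for the code above, is `geom_text()` with the rounded absolute value as the label. The data are simulated as in the question.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(col = 1,
                 category = factor(LETTERS[1:7]),
                 stat.sign = sign(rnorm(7)),
                 p.value = runif(7))
df$sig <- df$stat.sign * -log10(df$p.value)

p <- ggplot(df, aes(x = col, y = category)) +
  geom_tile(aes(fill = sig)) +
  # Label each tile with |sig|, rounded to two decimals
  geom_text(aes(label = round(abs(sig), 2)), colour = "black", size = 4) +
  scale_fill_gradient2(low = "darkblue", mid = "white", high = "darkred",
                       labels = abs) +
  theme_minimal() +
  labs(x = NULL, y = NULL, fill = "-log10(P-Value)") +
  theme(axis.text.x = element_blank())

print(p)
```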
First of all, my data:
http://www.pastebin.ca/2599202 (I hope this is not too inconvenient; I could not manage to create well-fitting example data)
What I basically need, is a plot like the one I did here.
I plotted the repeated-measures factor time (x-axis, 3 levels) against ias (continuous dependent variable) for my 3 experimental groups. I did this 4 times (once for each quantile of my trait measure MIHT, miht.binned, .25 - 1.00).
I have to admit I am not really an R professional, and the ggplot2 manual is simply overkill for me. I created the plot with ezPlot (from the ez package, which also provides ezANOVA) and only managed to do a bit of layout tweaking with ggplot2:
library(ez)
PlotIAS = ezPlot(
data = MyData
, dv = .(ias)
, wid = .(id)
, between = .(GROUP, miht.binned)
, within = .(time)
, x = .(time)
, split = .(GROUP)
, col = .(miht.binned)
, x_lab = 'time of measurement'
, y_lab = 'IAS Score (Mean)'
#, do_bars = FALSE
, type = 3
)
PlotIAS = PlotIAS +
theme(
panel.grid.major.y = element_line(colour = "gray80", size = NULL, linetype = NULL,
lineend = NULL)
,panel.grid.minor.y = element_line(colour = "gray90", size = NULL, linetype = NULL,
lineend = NULL)
,panel.grid.major.x = element_blank()
,panel.grid.minor.x = element_blank()
,legend.background = element_rect(fill = NULL, colour = "black")
,panel.background = element_rect(fill = "white", colour = "white", size = NULL,
linetype = NULL)
)
print(PlotIAS)
I didn't find any information about the error bars ezPlot creates. They seem to be the same for each point, and their length can be arbitrarily redefined with bar_size =. I just need error bars showing the SE or a CI. I don't know whether it is possible to add these to my ezPlot-based code (and how), or whether one has to create a completely new ggplot object for that (which is quite a stretch for me...). Help is highly appreciated.
I think this comes close to what you want:
ggplot(MyData, aes(x=time, y=ias, colour=GROUP, group=GROUP,
linetype=GROUP, shape=GROUP)) +
facet_grid(~miht.binned) +
stat_summary(fun.data="mean_cl_boot", geom="errorbar", fun.args=list(conf.int=0.90)) +
#alternatives:
#stat_summary(fun.data="mean_cl_normal", geom="errorbar") +
#stat_summary(fun.data="mean_sdl", geom="errorbar") +
stat_summary(fun=mean, geom="point", size=2) +
stat_summary(fun=mean, geom="line") +
theme_bw()
See ?mean_cl_boot etc. for more info on where these error bars come from (these helpers need the Hmisc package installed). conf.int is the confidence level, given as a proportion and passed through fun.args; in ggplot2 3.3.0 and later the summary function argument is fun rather than fun.y. Also, having all three of colour, linetype AND shape mapped to GROUP seems like overkill. You could probably do without linetype and shape.
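If plain standard-error bars are enough, `mean_se()` ships with ggplot2 itself, so no extra package is required. A minimal sketch on simulated data (the `sim` data frame and its values are invented; the column names mirror those in the question, and the `fun` argument assumes ggplot2 3.3.0 or later):

```r
library(ggplot2)

# Simulated stand-in for MyData (values invented for illustration)
set.seed(1)
sim <- expand.grid(id = 1:10,
                   time = factor(1:3),
                   GROUP = factor(c("A", "B", "C")))
sim$ias <- rnorm(nrow(sim), mean = 50 + 2 * as.integer(sim$time), sd = 5)

p <- ggplot(sim, aes(x = time, y = ias, colour = GROUP, group = GROUP)) +
  # mean_se() returns y, ymin and ymax as mean -/+ one standard error
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.1) +
  stat_summary(fun = mean, geom = "point", size = 2) +
  stat_summary(fun = mean, geom = "line") +
  theme_bw()

print(p)
```

Note that these bars treat the observations at each time point as independent, so they ignore the repeated-measures structure of the design.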
Let me add that
ggplot(MyData, aes(x=time, y=ias, fill=GROUP)) + facet_grid(~miht.binned) +
geom_boxplot() + theme_bw()
may actually be a plot that's easier to read (no crossing/overlapping lines) while at the same time retaining more of the data's characteristics (min/max, outliers).