Creating a density histogram in ggplot2?

I want to create the following density histogram with ggplot2. The "normal" way (base graphics) is really easy:
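set.seed(46)
vector <- rnorm(500)
# decile breaks, so each bar holds about 10% of the data
breaks <- quantile(vector, seq(0, 1, by = 0.1))
labels <- 1:(length(breaks) - 1)
den <- density(vector)
# one colour per bar: there is one bar fewer than there are breaks
hist(vector,
breaks = breaks,
col = rainbow(length(breaks) - 1),
probability = TRUE)
lines(den)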
With ggplot2, this is as far as I have got:
library(ggplot2)
seg <- cut(vector,breaks,
labels=labels,
include.lowest = TRUE, right = TRUE)
df = data.frame(vector=vector,seg=seg)
ggplot(df) +
geom_histogram(breaks=breaks,
aes(x=vector,
y=..density..,
fill=seg)) +
geom_density(aes(x=vector,
y=..density..))
But the y scale has the wrong magnitude. I have noticed that the following version, without the fill, gets the y scale right:
ggplot(df) +
geom_histogram(breaks=breaks,
aes(x=vector,
y=..density..)) +
geom_density(aes(x=vector,
y=..density..))
I just don't understand it. y=..density.. is there, and that should give the height. So why on earth does my scale change when I add the fill?
I do need the colours. I just want a histogram whose breaks I set myself and whose blocks are coloured with the default ggplot2 fill colours.

I added colours to your percentile bars manually. See if this works for you:
library(ggplot2)
ggplot(df, aes(x=vector)) +
geom_histogram(breaks=breaks,aes(y=..density..),colour="black",fill=c("red","orange","yellow","lightgreen","green","darkgreen","blue","darkblue","purple","pink")) +
geom_density(aes(y=..density..)) +
scale_x_continuous(breaks=c(-3,-2,-1,0,1,2,3)) +
ylab("Density") + xlab("df$vector") + ggtitle("Histogram of df$vector") +
theme_bw() + theme(plot.title=element_text(size=20),
axis.title.y=element_text(size = 16, vjust=+0.2),
axis.title.x=element_text(size = 16, vjust=-0.2),
axis.text.y=element_text(size = 14),
axis.text.x=element_text(size = 14),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank())

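Mapping fill=seg creates groups, so you are actually getting a separate histogram for each value of seg. ..density.. is then computed within each group, so each group's bars have an area of 1 on their own. If you don't need the colours, you could use this: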
ggplot(df) +
geom_histogram(breaks=breaks,aes(x=vector,y=..density..), position="identity") +
geom_density(aes(x=vector,y=..density..))
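A quick base-R check of that explanation, as a sketch that reuses the question's set.seed(46) data (widths, overall, per_group and ratio are illustrative names). With decile breaks every group holds about a tenth of the data, so normalising each group on its own should inflate the bar heights by roughly a factor of ten:

```r
set.seed(46)
vector <- rnorm(500)
breaks <- quantile(vector, seq(0, 1, by = 0.1))
widths <- diff(breaks)

# Number of observations falling in each decile bin
counts <- as.numeric(table(cut(vector, breaks, include.lowest = TRUE)))

# Density when all 500 points are normalised together (what hist() draws)
overall <- counts / length(vector) / widths

# Density when each bin is its own group: its points are its whole sample,
# so its single bar must have an area of 1 by itself
per_group <- 1 / widths

ratio <- per_group / overall
print(round(ratio, 2))
```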
If you need the colours, it might be easiest to calculate the density values outside of ggplot2.
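One way to follow that suggestion, sketched under the assumption that base R's hist(..., plot = FALSE) does the binning (the bins data frame and its column names are illustrative): compute every bar's density once, over all the data, and then draw the bars with geom_rect(), so the fill colours can no longer change the heights.

```r
library(ggplot2)

set.seed(46)
vector <- rnorm(500)
breaks <- quantile(vector, seq(0, 1, by = 0.1))

# Let base R do the binning; the density is computed over all 500 points
h <- hist(vector, breaks = breaks, plot = FALSE)

# One row per bar: left and right edges, height, and a factor for the fill
bins <- data.frame(
  xmin = as.numeric(head(h$breaks, -1)),
  xmax = as.numeric(tail(h$breaks, -1)),
  density = as.numeric(h$density),
  seg = factor(seq_along(h$density))
)

p <- ggplot() +
  geom_rect(data = bins,
            aes(xmin = xmin, xmax = xmax, ymin = 0, ymax = density, fill = seg),
            colour = "black") +
  geom_density(data = data.frame(vector = vector), aes(x = vector))
p
```

geom_rect() is used instead of geom_histogram() so that ggplot2 does no binning or normalising of its own; the fill mapping then only picks colours.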

Another option uses ggpubr:
library(ggpubr)
gghistogram(df, x = "vector", add = "mean", rug = TRUE, fill = "seg",
palette = c("#00AFBB", "#E7B800", "#E5A800", "#00BFAB", "#01ADFA",
"#00FABA", "#00BEAF", "#01AEBF", "#00EABA", "#00EABB"), add_density = TRUE)

The confusion about the y-axis may be because density, rather than count, is plotted. The y values are not proportions: each bar's height is the proportion of the sample in that bin divided by the bin width, so it is the total area of the bars (height times width), not the sum of their heights, that equals 1.
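A small base-R check of the point about areas, reusing the question's seed and decile breaks (h and areas are illustrative names):

```r
set.seed(46)
vector <- rnorm(500)
breaks <- quantile(vector, seq(0, 1, by = 0.1))
h <- hist(vector, breaks = breaks, plot = FALSE)

# The bar heights on their own do not add up to 1...
sum(h$density)

# ...but height times bin width (the area of each bar) does
areas <- as.numeric(h$density * diff(h$breaks))
sum(areas)

# With decile breaks, each bar holds a tenth of the sample
areas
```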

Related

How to log transform only one axis in dual axis plot in r?

I am currently plotting long-format data consisting of fluorescence (RFU) on the first y-axis and growth (OD600) on the second y-axis. I have managed to create the plots, but I find it very difficult to log-transform the second y-axis (for OD600) without messing up the entire plot. (All the data come from the same data frame.)
My question is this: is there any way to log10-transform only the second y-axis (from 0.01 to 1), with perhaps 5 breaks, something like ("0.01","0.1","0.5","0.1")?
My code looks like this (apologies for the ugly code):
for (i in 1:length(unique(lf_combined$media)[grepl("^.+(gfp)$",unique(lf_combined$media))])){
print(i)
coeff <- 1/max(lf_combined_test$normalized_gfp)
p1<-lf_combined_test[lf_combined_test$media %in% unique(lf_combined$media)[grepl("^.+(gfp)$",unique(lf_combined$media))][i], ] %>%
# filter(normalized_gfp>0) %>%
filter(row_number() %% 3 == 1) %>%
ggplot( aes(x=time)) +
geom_bar( aes(y=normalized_gfp), stat="identity", size=.1, fill="green", color="green", alpha=.4)+
geom_line( aes(y=od / coeff), size=2, color="tomato") +
scale_x_continuous(breaks = round(seq(0,92, by = 5),1))+
geom_vline(xintercept = 12, linetype="dotted",
color = "blue", size=1)+
scale_y_continuous(limits = c(0,80000),
name = "Relative Fluorescence [RFU]/[OD] ",
sec.axis = sec_axis(~.*coeff, name="[OD600]")
) +
scale_y_log10(limits=c(0.01,1))+
theme_grey() +
theme(
axis.title.y = element_text(color = "green", size=13),
axis.title.y.right = element_text(color = "tomato", size=13)
) +
ggtitle(paste("Relative fluorescence & OD600 time series for",unique(lf_combined$media)[grepl("^.+(gfp)$",unique(lf_combined$media))][i],sep=" "))
print(p1)
}
This currently gives plots that look like this:
Thank you very much in advance! :))
Yes, this is certainly possible. Without your data set it is difficult to give you specific code, but here is an example using the built-in mtcars data set. We plot a smoothed line of mpg against wt on the x axis (for a data set this small, geom_smooth() uses a loess fit by default).
library(ggplot2)
p <- ggplot(mtcars, aes(wt, mpg)) + geom_smooth(aes(color = 'mpg'))
p
Suppose we also want to draw disp on a log scale, shown on the secondary y axis. To do this we log-transform the data ourselves, and also multiply it by 10 to put it on a visual scale similar to the mpg line:
p <- p + geom_smooth(aes(y = 10 * log10(disp), color = 'disp'))
p
To draw the secondary axis, we need to supply it with the inverse of the transformation 10 * log10(x), which is 10^(x/10), and we supply logarithmic breaks at 10, 100 and 1000:
p + scale_y_continuous(
sec.axis = sec_axis(~ 10^(.x/10), breaks = c(10, 100, 1000), name = 'disp'))
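It seems that you are generating the values of your line with od / coeff and reversing that transform with .*coeff, which is appropriate. To get a log10 axis, you will instead need something like log10(od) * constant for the line, reversed with 10^(.x / constant) in sec_axis(). Because log10 is negative for OD values below 1 (down to -2 at OD 0.01), you will probably also need an offset to keep the line above a primary axis that starts at 0, e.g. (log10(od) + 2) * constant, reversed with 10^(.x / constant - 2). Also drop scale_y_log10(): a plot has only one y scale, so adding it replaces the scale_y_continuous() that carries your secondary axis. Without your data it's impossible to know what the constant should be, but you can try different values until it looks right visually.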
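Here is a minimal sketch of that idea applied to data shaped like the question's. The toy data frame, the constant k and the helpers to_primary() and from_primary() are all hypothetical; the point is that sec_axis() receives the exact inverse of what was applied to the line, and that the log10 values are shifted up by 2 so they fit on a primary axis starting at 0.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's data
toy <- data.frame(
  time = 0:40,
  normalized_gfp = seq(0, 80000, length.out = 41),
  od = 10^seq(-2, 0, length.out = 41)  # OD600 from 0.01 to 1
)

# log10(od) runs from -2 to 0: shift it by 2 so it starts at 0, then
# scale it so that OD = 1 lands on the top of the primary axis (80000)
k <- 80000 / 2
to_primary <- function(od) (log10(od) + 2) * k
from_primary <- function(y) 10^(y / k - 2)  # exact inverse of to_primary()

p <- ggplot(toy, aes(x = time)) +
  geom_col(aes(y = normalized_gfp), fill = "green", alpha = 0.4) +
  geom_line(aes(y = to_primary(od)), colour = "tomato") +
  scale_y_continuous(
    name = "Relative Fluorescence [RFU]/[OD]",
    limits = c(0, 80000),
    sec.axis = sec_axis(~ from_primary(.x), name = "[OD600]",
                        breaks = c(0.01, 0.1, 0.5, 1))
  )
p
```

Only one y scale is used here; the logarithmic look of the right-hand axis comes entirely from the transformation passed to sec_axis().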

How to properly form ggplot graphs, without cutting off important parts of the graph?

I have created a bar chart with the ggplot() and geom_bar() functions from the ggplot2 package. I have also used coord_flip() to swap the orientation of the bars and geom_text() to add the values at the end of each bar. Some of the bars have different colours, so there is a legend next to the graph. The result is a picture half occupied by the graph and half by the legend, and the values at the ends of the longest bars are cut off because the graph is so small.
Any ideas on how to enlarge the graph and shrink the legend, so that the bar values are not cut off?
Thank you
This is my code on imaginary data:
labels <- c("A","B","C","D","E")
freq <- c(10.3678, 5.84554, 1.5673, 2.313, 7.111)
type <- c("rich","poor","poor","poor","rich")
# data.frame() keeps freq numeric; cbind() would turn every column into character
df <- data.frame(labels, freq, type)
library(ggplot2)
ggplot(df, aes(x = reorder(labels,freq), y= freq, fill = type)) +
geom_bar(stat = "identity", alpha = 1, width = 0.9)+
coord_flip()+
xlab("")+
ylab("Mean frequency")+
scale_fill_manual(name = "Type", values = c("red", "blue")) +
ggtitle("Mean frequency of different labels")+
geom_text(label = sort(freq, decreasing = FALSE), size = 3.5, hjust = -0.2)
And this is the resulting graph:
There are a few fixes to this:
Change your Limits
As indicated by @Dave2e (see his response).
Change the size of your output
The interesting thing about graphics in R is that the aspect ratio and resolution of the graphics device change the look of a plot. When I ran your code, no clipping was observed. You can test this by creating the plot and then saving it at different sizes. Taking your code unchanged, here's what I get with different width= and height= arguments to ggsave() for a png:
ggsave('a1.png', width=10, height=5)
ggsave('a2.png', width=15, height=5)
Set an Expansion
The third way is to set an expansion on the scale limits. By default, ggplot2 adds some "padding" to the ends of a scale. So, if you set your limits from 0 to 10, you'll actually have a plot area that goes a bit beyond this (for continuous scales, 5% of the range on each side by default). You can change that setting with the expand= argument of the scale_... functions, for example in the following code:
labels <- c("A","B","C","D","E")
freq <- c(10.3678, 5.84554, 1.5673, 2.313, 7.111)
type <- c("rich","poor","poor","poor","rich")
df <- data.frame(labels, freq, type)
library(ggplot2)
ggplot(df, aes(x = reorder(labels,freq), y= freq, fill = type)) +
geom_bar(stat = "identity", alpha = 1, width = 0.9)+
coord_flip()+
xlab("")+
ylab("Mean frequency")+
scale_fill_manual(name = "Type", values = c("red", "blue")) +
ggtitle("Mean frequency of different labels")+
geom_text(label = freq, size = 3.5, hjust = -0.2) +
scale_y_continuous(expand=expansion(mult=c(0,0.15)))
You can set the lower and upper expansion of an axis separately: the code above uses no expansion at the lower limit of the y scale and a multiplier of 0.15 (15% of the range) at the upper limit. The default for continuous scales is a multiplier of 0.05 (5%) on each side.
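To make the multipliers concrete, here is the arithmetic for this data set done by hand, as a sketch (the variable names are illustrative; ggplot2 performs the equivalent step internally). The bars run from 0 to the largest frequency, and each multiplier is applied to the width of that range:

```r
freq <- c(10.3678, 5.84554, 1.5673, 2.313, 7.111)

# The bars span 0 up to the largest value
data_range <- c(0, max(freq))
span <- diff(data_range)

# Lower and upper multipliers, as in expansion(mult = c(0, 0.15))
mult <- c(0, 0.15)

expanded <- c(data_range[1] - mult[1] * span,
              data_range[2] + mult[2] * span)
expanded
```

With the default multiplier of 0.05 the upper end would be about 10.89 instead of about 11.92, which leaves much less room for a label pushed outwards with hjust = -0.2.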
You can override the default limits of the y axis scale with the ylim() function.
labels <- c("A","B","C","D","E")
freq <- c(10.3678, 5.84554, 1.5673, 2.313, 7.111)
type <- c("rich","poor","poor","poor","rich")
df <- data.frame(labels, freq, type)
#set the max y axis limit to allow enough room for the label
ylimitmax <- 11
library(ggplot2)
ggplot(df, aes(x = reorder(labels,freq), y= freq, fill = type)) +
geom_bar(stat = "identity", alpha = 1, width = 0.9)+
coord_flip()+
xlab("")+
ylab("Mean frequency")+
scale_fill_manual(name = "Type", values = c("red", "blue")) +
ggtitle("Mean frequency of different labels")+
ylim(0, ylimitmax) +
geom_text(label = freq, size = 3.5, hjust = -0.2)
The script shows how to set the limit manually, but you may want to automate the calculation with something like ylimitmax <- max(freq) * 1.2.
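A sketch of the automated version with the same toy data. The factor 1.2 is only a starting point to tune, and the label is mapped inside aes() so that each bar keeps its own value:

```r
library(ggplot2)

labels <- c("A","B","C","D","E")
freq <- c(10.3678, 5.84554, 1.5673, 2.313, 7.111)
type <- c("rich","poor","poor","poor","rich")
df <- data.frame(labels, freq, type)

# Leave about 20% headroom above the longest bar for its label
ylimitmax <- max(freq) * 1.2

p <- ggplot(df, aes(x = reorder(labels, freq), y = freq, fill = type)) +
  geom_col(width = 0.9) +
  coord_flip() +
  xlab("") +
  ylab("Mean frequency") +
  scale_fill_manual(name = "Type", values = c("red", "blue")) +
  ggtitle("Mean frequency of different labels") +
  ylim(0, ylimitmax) +
  geom_text(aes(label = freq), size = 3.5, hjust = -0.2)
p
```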

ggplot2 geom_points won't colour or dodge

So I'm using ggplot2 to plot both a bar graph and points. I'm currently getting this:
As you can see, the bars are nicely separated and colored in the desired colors. However, my points are all uncolored and stacked on top of each other. I would like each point to sit above its designated bar, in the same color.
#Add bars
A <- A + geom_col(aes(y = w1, fill = factor(Species1)),
position = position_dodge(preserve = 'single'))
#Add colors
A <- A + scale_fill_manual(values = c("A. pelagicus"= "skyblue1","A. superciliosus"="dodgerblue","A. vulpinus"="midnightblue","Alopias sp."="black"))
#Add points
A <- A + geom_point(aes(y = f1/2.5),
shape= 24,
size = 3,
fill = factor(Species1),
position = position_dodge(preserve = 'single'))
#change x and y axis range
A <- A + scale_x_continuous(breaks = c(2000:2020), limits = c(2016,2019))
A <- A + expand_limits(y=c(0,150))
# now adding the secondary axis, following the example in the help file ?scale_y_continuous
# and, very important, reverting the above transformation
A <- A + scale_y_continuous(sec.axis = sec_axis(~.*2.5, name = " "))
# modifying axis and title
A <- A + labs(y = " ",
x = " ")
A <- A + theme(plot.title = element_text(size = rel(4)))
A <- A + theme(axis.text.x = element_text(face="bold", size=14, angle=45),
axis.text.y = element_text(face="bold", size=14))
#A <- A + theme(legend.title = element_blank(),legend.position = "none")
#Print plot
A
When I run this code I get the following error:
Error: Unknown colour name: A. pelagicus
In addition: Warning messages:
1: Width not defined. Set with position_dodge(width = ?)
2: In max(table(panel$xmin)) : no non-missing arguments to max; returning -Inf
I've tried a couple of things, but I can't figure out why it works for geom_col and not for geom_point.
Thanks in advance
The two basic problems you have are the color error and the points not dodging. The color error comes from fill = factor(Species1) being set outside aes() in your geom_point() call: a value set outside aes() is taken literally, so ggplot2 tries to use "A. pelagicus" as a color name. The dodging is solved by applying the group= aesthetic.
You'll see the answer to these two questions using an example:
# dummy dataset
year <- c(rep(2017, 4), rep(2018, 4))
species <- rep(c('things', 'things1', 'wee beasties', 'ew'), 2)
values <- c(10, 5, 5, 4, 60, 10, 25, 7)
pt.value <- c(8, 7, 10, 2, 43, 12, 20, 10)
df <- data.frame(year, species, values, pt.value)
I made "values" the column heights and, for illustrative purposes, used a different y aesthetic for the points, called "pt.value". Otherwise, the data setup is similar to your own. Note that df$year will be numeric, so it's best to convert it either to Date format (more trouble than it's worth here) or just to a factor, since "2017.5" doesn't make much sense here :). The point is, I need "year" to be discrete, not continuous.
Solve the color error
For the plot, I'll try to create something similar to yours. Here the fill is mapped inside aes() in geom_col(), and geom_point() gets no fill setting at all, so the error goes away. In scale_fill_manual(), the values= argument matches colors to species by name: a named vector such as c(name1 = color1, name2 = color2) works (your bars already use one), and so does the named list used here.
ggplot(df, aes(x=as.factor(year), y=values)) +
geom_col(aes(fill=species), position=position_dodge(width=0.62), width=0.6) +
scale_fill_manual(values=
list('ew' = 'skyblue1', 'things' = 'dodgerblue',
'things1'='midnightblue', 'wee beasties' = 'gray')) +
geom_point(aes(y=pt.value), shape=24, position=position_dodge(width=0.62)) +
theme_bw() + labs(x='Year')
So the colors are applied correctly and my axis is discrete, and the y values of the points are mapped to pt.value like I wanted, but why don't the points dodge?!
Solve the dodging issue
Dodging is a funny thing in ggplot2. position_dodge() moves elements apart within each x position according to their group. In geom_col(), the fill=species mapping splits the data into one group per species, so the bars have something to dodge by. In geom_point(), nothing splits the data by species, so all the points for a year form a single group and are drawn on top of each other.
Basically, we need to tell the points what they are dodging by. Is it "species"? With geom_col, the fill mapping already implies it, but with geom_point you need to specify it. We do that with the group= aesthetic, which lets geom_point know what to use as the criterion for dodging. When you add that, it works!
ggplot(df, aes(x=as.factor(year), y=values, group=species)) +
geom_col(aes(fill=species), position=position_dodge(width=0.62), width=0.6) +
scale_fill_manual(values=
list('ew' = 'skyblue1', 'things' = 'dodgerblue',
'things1'='midnightblue', 'wee beasties' = 'gray')) +
geom_point(aes(y=pt.value), shape=24, position=position_dodge(width=0.62)) +
theme_bw() + labs(x='Year')
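The plot above dodges the points but still leaves them unfilled, while the question asks for each point in the same color as its bar. A sketch of one way to get that with the dummy data: map fill=species inside aes() in geom_point() as well, so the same manual fill scale colors both layers (shape 24 is a fillable triangle, so it takes its color from fill and keeps a black outline).

```r
library(ggplot2)

# dummy dataset from above
year <- c(rep(2017, 4), rep(2018, 4))
species <- rep(c('things', 'things1', 'wee beasties', 'ew'), 2)
values <- c(10, 5, 5, 4, 60, 10, 25, 7)
pt.value <- c(8, 7, 10, 2, 43, 12, 20, 10)
df <- data.frame(year, species, values, pt.value)

p <- ggplot(df, aes(x = as.factor(year), y = values, group = species)) +
  geom_col(aes(fill = species), position = position_dodge(width = 0.62), width = 0.6) +
  # fill is mapped inside aes(), so species names go through the scale
  # instead of being read as color names
  geom_point(aes(y = pt.value, fill = species), shape = 24, size = 3,
             position = position_dodge(width = 0.62)) +
  # a named vector: colors are matched to species by name
  scale_fill_manual(values = c('ew' = 'skyblue1', 'things' = 'dodgerblue',
                               'things1' = 'midnightblue', 'wee beasties' = 'gray')) +
  theme_bw() + labs(x = 'Year')
p
```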

Ggplot2 in R gives incorrect coloring when creating overlapping demographic pyramids

I am creating overlapping demographic pyramids in R with the ggplot2 library to compare demographic data from two different sources.
However, I have run into problems with ggplot2's colouring when using the alpha parameter. I have tried to make sense of ggplot2 and the geom_bar structure, but so far it has gotten me nowhere. The idea is to draw four geom_bars that overlap in pairs (one pair for males, one for females). I'd have no problems if I didn't need to use alpha to show differences in my data.
I would really appreciate answers on where I am going wrong. As an R programmer I am pretty close to a beginner, so bear with me if my code looks weird.
Below is my code which results in the image also shown below. I have altered my demographic data to be random for this question.
library(ggplot2)
# Here I randomise my data for StackOverflow
poptest<-data.frame(matrix(NA, nrow = 101, ncol = 5))
poptest[,1]<- seq(0,100)
poptest[,2]<- rpois(n = 101, lambda = 100)
poptest[,3]<- rpois(n = 101, lambda = 100)
poptest[,4]<- rpois(n = 101, lambda = 100)
poptest[,5]<- rpois(n = 101, lambda = 100)
colnames(poptest) <- c("age","A_males", "A_females","B_males", "B_females")
myLimits<-c(-250,250)
myBreaks<-seq(-250,250,50)
# Plot demographic pyramid
poptestPlot <- ggplot(data = poptest) +
geom_bar(aes(age,A_females,fill="black"), stat = "identity", alpha=0.75, position = "identity")+
geom_bar(aes(age,-A_males, fill="black"), stat = "identity", alpha=0.75, position="identity")+
geom_bar(aes(age,B_females, fill="white"), stat = "identity", alpha=0.5, position="identity")+
geom_bar(aes(age,-B_males, fill="white"), stat = "identity", alpha=0.5, position="identity")+
coord_flip()+
#set the y-axis which (because of the flip) shows as the x-axis
scale_y_continuous(name = "",
limits = myLimits,
breaks = myBreaks,
#give the values on the y-axis a name, to remove the negatives
#give abs() command to remove negative values
labels = paste0(as.character(abs(myBreaks))))+
#set the x-axis which (because of the flip) shows as the y-axis
scale_x_continuous(name = "age",breaks=seq(0,100,5)) +
#remove the legend
theme(legend.position = 'none')+
# Annotate geom_bars
annotate("text", x = 100, y = -200, label = "males",size=6)+
annotate("text", x = 100, y = 200, label = "females",size=6)
# show results in a separate window
x11()
print(poptestPlot)
This is the result I get (sorry, as a Stack Overflow newcomer I can't embed my pictures):
The colouring is really nonsensical. Black is not black and white is not white. Instead, it seems to use some sort of default colouring, as if R or ggplot2 can't interpret my code.
I welcome any and all answers. Thank you.
You are trying to map "black" to data points. Inside aes(), "black" is treated as a data value (a category), not as a colour, so ggplot2 gives it a default palette colour. To use it literally, you would have to add a manual scale that tells ggplot to fill each instance of "black" with the colour black; the shortcut for this is scale_fill_identity() (scale_colour_identity() for the colour aesthetic). However, if each layer only needs one colour, it is much easier to set fill outside aes(). This way the whole geom is filled in black or white respectively:
poptestPlot <- ggplot(data = poptest) +
geom_bar(aes(age,A_females),fill="black", stat = "identity", alpha=0.75, position = "identity")+
geom_bar(aes(age,-A_males), fill="black", stat = "identity", alpha=0.75, position="identity")+
geom_bar(aes(age,B_females), fill="white", stat = "identity", alpha=0.5, position="identity")+
geom_bar(aes(age,-B_males), fill="white", stat = "identity", alpha=0.5, position="identity")+
coord_flip()+
#set the y-axis which (because of the flip) shows as the x-axis
scale_y_continuous(name = "",
limits = myLimits,
breaks = myBreaks,
#give the values on the y-axis a name, to remove the negatives
#give abs() command to remove negative values
labels = paste0(as.character(abs(myBreaks))))+
#set the x-axis which (because of the flip) shows as the y-axis
scale_x_continuous(name = "age",breaks=seq(0,100,5)) +
#remove the legend
theme(legend.position = 'none')+
# Annotate geom_bars
annotate("text", x = 100, y = -200, label = "males",size=6)+
annotate("text", x = 100, y = 200, label = "females",size=6)
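For completeness, here is a sketch of the scale_fill_identity() route, using randomised data in the question's layout (the seed is arbitrary). The fill stays inside aes(), but the identity scale uses the strings as the actual colours instead of treating them as categories:

```r
library(ggplot2)

set.seed(1)
poptest <- data.frame(
  age = 0:100,
  A_males = rpois(101, 100), A_females = rpois(101, 100),
  B_males = rpois(101, 100), B_females = rpois(101, 100)
)

p <- ggplot(poptest) +
  geom_col(aes(age, A_females, fill = "black"), alpha = 0.75, position = "identity") +
  geom_col(aes(age, -A_males, fill = "black"), alpha = 0.75, position = "identity") +
  geom_col(aes(age, B_females, fill = "white"), alpha = 0.5, position = "identity") +
  geom_col(aes(age, -B_males, fill = "white"), alpha = 0.5, position = "identity") +
  # use the fill values as literal colours; no legend is drawn by default
  scale_fill_identity() +
  coord_flip()
p
```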

Manipulating the legend of scale_fill_gradient2

I have data which comes from a statistical test (gene set enrichment analysis, but that's not important), so I obtain p-values for statistics that are normally distributed, i.e., both positive and negative values:
The test is run on several categories:
set.seed(1)
df <- data.frame(col = rep(1,7),
category = LETTERS[1:7],
stat.sign = sign(rnorm(7)),
p.value = runif(7, 0, 1),
stringsAsFactors = TRUE)
I want to present these data in a geom_tile ggplot such that I color code each df$category by its df$p.value combined with its df$stat.sign (i.e., the sign of the statistic).
For that I first take -log10 of df$p.value and multiply it by the sign:
df$sig <- df$stat.sign*(-1*log10(df$p.value))
Then I order the df by df$sig for each sign of df$sig:
library(dplyr)
df <- rbind(dplyr::filter(df, sig < 0)[order(dplyr::filter(df, sig < 0)$sig), ],
dplyr::filter(df, sig > 0)[order(dplyr::filter(df, sig > 0)$sig), ])
And then I ggplot it:
library(ggplot2)
df$category <- factor(df$category, levels=df$category)
ggplot(data = df,
aes(x = col, y = category)) +
geom_tile(aes(fill=sig)) +
scale_fill_gradient2(low='darkblue', mid='white', high='darkred') +
theme_minimal() +
xlab("") + ylab("") + labs(fill="-log10(P-Value)") +
theme(axis.text.y = element_text(size=12, face="bold"),
axis.text.x = element_blank())
which gives me:
Is there a way to manipulate the legend such that the values of df$sig are represented by their absolute value but everything else remains unchanged? That way I still get both red and blue shades and maintain the order I want.
If you check ggplot's documentation, scale_fill_gradient2, like other continuous scales, accepts one of the following for its labels argument:
NULL for no labels
waiver() for the default labels computed by the transformation object
a character vector giving labels (must be same length as breaks)
a function that takes the breaks as input and returns labels as output
Since you only want the legend values to be absolute, I assume you're satisfied with the default breaks in the legend colour bar (-0.1 to 0.4 in increments of 0.1), so all you really need is to add a function that manipulates the labels.
I.e. instead of this:
scale_fill_gradient2(low = 'darkblue', mid = 'white', high = 'darkred') +
Use this:
scale_fill_gradient2(low = 'darkblue', mid = 'white', high = 'darkred',
labels = abs) +
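Any function that takes the breaks and returns labels works in the same place. A sketch that also fixes the number of decimals, rebuilding the question's df (sorted with order() for brevity); abs_label is a hypothetical helper name:

```r
library(ggplot2)

set.seed(1)
df <- data.frame(col = rep(1, 7),
                 category = LETTERS[1:7],
                 stat.sign = sign(rnorm(7)),
                 p.value = runif(7, 0, 1),
                 stringsAsFactors = TRUE)
df$sig <- df$stat.sign * (-1 * log10(df$p.value))
# Sort by sig (negatives first), then keep that order on the y axis
df <- df[order(df$sig), ]
df$category <- factor(df$category, levels = df$category)

# Hypothetical helper: absolute value of each break, one decimal place
abs_label <- function(breaks) sprintf("%.1f", abs(breaks))

p <- ggplot(df, aes(x = col, y = category)) +
  geom_tile(aes(fill = sig)) +
  scale_fill_gradient2(low = 'darkblue', mid = 'white', high = 'darkred',
                       labels = abs_label) +
  theme_minimal() +
  labs(x = "", y = "", fill = "-log10(P-Value)")
p
```

Only the printed labels change; the breaks, the colours and the tile order stay as they were.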
I'm not sure I understood what you're looking for. Do you mean that you want to change the labels within the legend? If so, manipulating the breaks and labels arguments of scale_fill_gradient2() should do it.
ggplot(data=df,aes(x=col,y=category)) +
geom_tile(aes(fill=sig)) +
scale_fill_gradient2(low='darkblue',mid='white',high='darkred',
breaks = sort(unique(df$sig)),
labels = as.character(round(abs(sort(unique(df$sig))), 2))) +
theme_minimal()+xlab("")+ylab("")+labs(fill="-log10(P-Value)") +
theme(axis.text.y=element_text(size=12,face="bold"),axis.text.x=element_blank())
If that's what you're after, you could also display the values as text inside the tiles by stacking a stat_bin_2d() layer, like this:
ggplot(data=df,aes(x=col,y=category)) +
geom_tile(aes(fill=sig)) +
scale_fill_gradient2(low='darkblue',mid='white',high='darkred',
breaks = sort(unique(df$sig)),
labels = as.character(round(abs(sort(unique(df$sig))), 2))) +
theme_minimal()+xlab("")+ylab("")+labs(fill="-log10(P-Value)") +
stat_bin_2d(geom = 'text', aes(label = sig), colour = 'black', size = 16) +
theme(axis.text.y=element_text(size=12,face="bold"),axis.text.x=element_blank())
You may want to experiment with the size and colour arguments.
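If the goal is simply to print each tile's value, a plainer sketch uses geom_text() with the label mapped to a rounded sig (rounding to 2 decimals and size = 5 are arbitrary choices). It rebuilds the question's df so it runs on its own:

```r
library(ggplot2)

set.seed(1)
df <- data.frame(col = rep(1, 7),
                 category = LETTERS[1:7],
                 stat.sign = sign(rnorm(7)),
                 p.value = runif(7, 0, 1),
                 stringsAsFactors = TRUE)
df$sig <- df$stat.sign * (-1 * log10(df$p.value))
df <- df[order(df$sig), ]
df$category <- factor(df$category, levels = df$category)

p <- ggplot(df, aes(x = col, y = category)) +
  geom_tile(aes(fill = sig)) +
  scale_fill_gradient2(low = 'darkblue', mid = 'white', high = 'darkred',
                       labels = abs) +
  # one text label per tile, drawn at the tile's centre
  geom_text(aes(label = round(sig, 2)), colour = 'black', size = 5) +
  theme_minimal() +
  labs(x = "", y = "", fill = "-log10(P-Value)")
p
```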
