How can I make a Frequency distribution bar plot in ggplot2? - r

Sample of the dataset.
nq
0.140843018
0.152855833
0.193245919
0.156860105
0.171658019
0.186281942
0.290739146
0.162779517
0.164694042
0.171658019
0.195866609
0.166967913
0.136841748
0.108907644
0.264136384
0.356655651
0.250508305
I would like to make a Percentage Bar plot/Histogram like this question: RE: Alignment of numbers on the individual bars with ggplot2
The max value of NQ for full dataset is 21 and minimum value is 0.00005
But I am unable to adapt the code as I don't have a Freq column and I have one series.
I have made a mockup of the figure I am trying to make.
Could you please help?

Would that work for you?
nq <- read.table(text = "
0.140843018
0.152855833
0.193245919
0.156860105
0.171658019
0.186281942
0.290739146
0.162779517
0.164694042
0.171658019
0.195866609
0.166967913
0.136841748
0.108907644
0.264136384
0.356655651
0.250508305", header = F) # Your data
nq$V2 <- cut(nq$V1, 5, include.lowest = T)
nq2 <- aggregate(V1 ~ V2, nq, length)
nq2$V3 <- nq2$V1/sum(nq2$V1)
library(ggplot2)
ggplot() + geom_bar(data = nq2, aes(V2, V1), stat = "identity", width=1, fill = "white", col = "black", size = 2) +
geom_text(vjust=1, fontface="bold", data = nq2, aes(label = paste(sprintf("%.1f", V3*100), "%", sep=""), x = V2, y = V1 + 0.4), size = 5) +
theme_bw() +
scale_x_discrete(expand = c(0,0), labels = sprintf("%.3f",seq(min(nq$V1), max(nq$V1), by = max(nq$V1)/6))) +
ylab("No. of Cases") + xlab("") +
scale_y_continuous(expand = c(0,0)) +
theme(
axis.title.y = element_text(size = 20, face = "bold", angle = 0),
panel.grid.major = element_blank() ,
panel.grid.minor = element_blank() ,
panel.border = element_blank() ,
panel.background = element_blank(),
axis.line = element_line(color = 'black', size = 2),
axis.text.x = element_text(face="bold"),
axis.text.y = element_text(face="bold")
)
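If the only thing missing is the Freq column, a base R sketch along these lines builds it from the single series. The column names bin, Freq and Pct are chosen here for illustration and are not taken from the linked question.

```r
# Sample values from the question (the full dataset is not available here)
nq <- c(0.140843018, 0.152855833, 0.193245919, 0.156860105, 0.171658019,
        0.186281942, 0.290739146, 0.162779517, 0.164694042, 0.171658019,
        0.195866609, 0.166967913, 0.136841748, 0.108907644, 0.264136384,
        0.356655651, 0.250508305)

# Split the observed range into 5 equal-width bins, as the answer above does
bins <- cut(nq, breaks = 5, include.lowest = TRUE)

# table() counts every bin, including bins that happen to be empty
freq <- as.data.frame(table(bin = bins), responseName = "Freq")

# Percentage of all cases that fall into each bin
freq$Pct <- 100 * freq$Freq / sum(freq$Freq)
freq
```

One difference from aggregate(V1 ~ V2, nq, length): table() keeps empty bins with a count of 0, which may matter once the full dataset (0.00005 to 21) is binned.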

I thought this would be easy, but it turned out to be frustrating. So perhaps the "right" way is to transform your data before using ggplot, as it looks like @DavidArenburg has done. But if you feel like hacking ggplot, here's what I ended up doing.
First, some sample data.
set.seed(15)
dd<-data.frame(x=sample(1:25, 100, replace=T, prob=25:1))
br <- seq(0,25, by=5) # break points
My first attempt was
library(ggplot2)
ggplot(dd, aes(x)) +
stat_bin(position="stack", breaks=br) +
geom_text(aes(y=..count.., label=..density..*..width.., ymax=..count..+1),
vjust=-.5, breaks=br, stat="bin")
but that didn't make "pretty labels",
so I thought I'd use the percent() function from the scales package to make it pretty. However, ggplot doesn't really make it possible to use functions with ..().. variables, because it evaluates them in the data.frame only (and then in baseenv(), which holds only base R functions), so it has no way to find the function you use. This is when I turned to hacking. First I'll extract the "Layer" definition from ggplot and the map_statistic function from it. (NOTE: this was done with "ggplot2_1.0.0" and is specific to that version; this is a private function that may change in future releases.)
orig.map_statistic <- ggplot2:::Layer$map_statistic
new.map_statistic <- orig.map_statistic
body(new.map_statistic)[[9]]
# stat_data <- as.data.frame(lapply(new, eval, data, baseenv()))
Here's the line that's causing grief. I would prefer that the function resolve names in the plot environment when they are not found in the data.frame, so I decided to change it with
body(new.map_statistic)[[9]] <- quote(stat_data <- as.data.frame(lapply(new, eval, data, plot$plot_env)))
assign("map_statistic", new.map_statistic, envir=ggplot2:::Layer)
So now I can use functions with ..().. variables. So I can do
library(scales)
ggplot(dd, aes(x)) +
stat_bin(position="stack", breaks=br) +
geom_text(aes(y=..count.., ymax=..count..+2,
label=percent(..density..*..width..)),
vjust=-.5, breaks=br, stat="bin")
to get
I'm not sure why ggplot has this default behavior. There could be a good reason for it, but I don't know what it is. Note that this changes how ggplot behaves for the rest of the session. You can change back to the default with
assign("map_statistic", orig.map_statistic, envir=ggplot2:::Layer)
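For readers on a current ggplot2, the patch above should no longer be needed. As far as I can tell, releases from 3.3.0 onward provide after_stat(), and ordinary functions such as scales::percent() are found when the computed aesthetic is evaluated. A minimal sketch with the same sample data (an assumption about newer versions, not something tested against ggplot2 1.0.0):

```r
library(ggplot2)

set.seed(15)
dd <- data.frame(x = sample(1:25, 100, replace = TRUE, prob = 25:1))
br <- seq(0, 25, by = 5)  # break points

p <- ggplot(dd, aes(x)) +
  geom_histogram(breaks = br, fill = "white", colour = "black") +
  # after_stat() refers to variables computed by stat_bin (here: count)
  geom_text(
    aes(y = after_stat(count),
        label = after_stat(scales::percent(count / sum(count)))),
    stat = "bin", breaks = br, vjust = -0.5
  )
p
```

Nothing is modified inside the ggplot2 namespace, so there is no session-wide change to undo afterwards.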


How do I flip the trendline patterns on my ggplot2 graph?

I want to make the Girls have the dashed trendline and the Boys have a solid trendline. I'd also like to remove the box around the graph, save the y and x-axis lines, and the shading behind the shapes on the key. I am using ggplot2 in R.
dr <- ggplot(DATASET,
aes(x=EC,
y=sqrt_Percent.5,
color=Sex1M,
shape=Sex1M,
linetype=Sex1M)) +
geom_point(size= 3,
aes(shape=Sex1M,
color=Sex1M)) +
scale_shape_manual(values=c(1,16))+
geom_smooth(method=lm,
se=FALSE,
fullrange=TRUE) +
labs(x="xaxis title",
y = "yaxis title",
fill= "") +
xlim(3,7) +
ylim(0,10) +
theme(legend.position = 'right',
legend.title = element_blank(),
panel.border = element_rect(fill=NA,
color = 'white'),
panel.background = NULL,
legend.background =element_rect(fill=NA,
size=0.5,
linetype="solid")) +
scale_color_grey(start = 0.0,
end = 0.4)
Current Graph
There is quite a lot going on in your visualisation. One strategy for developing it is to start from a base plot and add layers and features one at a time.
There are different ways to change the "sequence" of your colours, shapes, etc.
You can do this in ggplot with one of the scale_xxx_manual layers.
Conceptually, I suggest you deal with this in the data and only use the scales for fine-tuning. But that is a question of style.
In your case, you use Sex1M as a categorical variable. There is a built-in sequence for (automatic) colours and shapes, so you have to define the levels in another order.
As you have not provided a representative sample, I simulate some data points and define Sex1M as part of the data creation process.
DATASET <- data.frame(
x = sample(x = 2:7, size = 20, replace = TRUE)
, y = sample(x = 0.2:9.8, size = 20, replace = TRUE)
, Sex1M = sample(c("Boys", "Girls"), size = 20, replace = TRUE )
)
Now let's plot
library(dplyr)
library(ggplot2)
DATASET <- DATASET %>%
mutate(Sex1M = factor(Sex1M, levels = c("Boys", "Girls"))) # set sequence of levels: boys are now the first level aka 1st colour, linetype, shape.
# plot
ggplot(DATASET,
aes(x=x, # adapted to simulated data
y=y, # adapted to simulated data
color=Sex1M, # these values are now defined in the sequence
shape=Sex1M, # of the categorical factor you created
linetype=Sex1M) # adapt the factor levels as needed (e.g change order)
) +
geom_point(size= 3,
aes(shape=Sex1M,
color=Sex1M)) +
scale_shape_manual(values=c(1,16))+
geom_smooth(method=lm,
se=FALSE,
fullrange=TRUE) +
labs(x="xaxis title",
y = "yaxis title",
fill= "") +
xlim(3,7) +
ylim(0,10) +
theme(legend.position = 'right',
legend.title = element_blank(),
panel.border = element_rect(fill=NA,
color = 'white'),
panel.background = NULL,
#------------ ggplot is not always intuitive: legend.background is the panel
# comprising the legend keys (symbols) and the labels;
# to remove the shading behind the legend keys, set legend.key
legend.key = element_rect(fill = NA)
# ----------- legend.background can go. To see the difference between background
# and key mentioned above, set fill = "blue"
# legend.background = element_rect(fill = NA, size = 0.5, linetype = "solid")
) +
scale_color_grey(start = 0.0,
end = 0.4)
The settings for the background panel make the outer line disappear in my plot.
Hope this helps to get you started.
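As an alternative to reordering the factor levels, the scales can also be given named values, so that each group is tied to a linetype and shape regardless of level order. The sketch below uses simulated data (the data frame toy and its values are made up for illustration) and theme_classic(), which drops the box around the panel but keeps the x and y axis lines.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in for DATASET; only the column name Sex1M comes from the question
toy <- data.frame(
  x = runif(40, 3, 7),
  Sex1M = rep(c("Boys", "Girls"), each = 20)
)
toy$y <- 1 + toy$x + rnorm(40)

p <- ggplot(toy, aes(x, y, linetype = Sex1M)) +
  geom_point(aes(shape = Sex1M), size = 3) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "black") +
  # Named vectors pin each group to a value, independent of level order
  scale_linetype_manual(values = c(Boys = "solid", Girls = "dashed")) +
  scale_shape_manual(values = c(Boys = 16, Girls = 1)) +
  theme_classic() +
  theme(legend.title = element_blank(),
        legend.key = element_blank())  # no shading behind the legend symbols
p
```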

Tidying up the ggplot pie chart

After looking at various posts and asking questions here, I have been able to make a multi-faceted pie chart. But I am having trouble tidying it up. Here are the things I am having trouble with:
How do I remove the facet labels from each row and only have one facet label on the top or bottom and left or right? How do I control how the facet label looks?
I have tried using facet_grid instead of facet_wrap, and that removes the label from each row, but the labels are still inside a box. I would like to remove the box, which I do not seem to be able to do.
Centering the labels so that the value for each fraction of the pie is inside its pie slice.
Some of my pie charts have 8 to 10 values, and they are not always inside their fraction. First I used geom_text_repel, but that only helped me to repel the text; it didn't place the text inside each fraction. I also looked at this thread. I tried that by creating a new data frame which has position values and using that pos inside geom_text, like so: d<-c %>% group_by(Parameter)%>% mutate(pos= ave(Values, Zones, FUN = function(x) cumsum(x) - 0.5 * x)), and using the same code to make the pie chart for the d data frame, but it didn't quite work.
Grouping the values under a certain level into one single "others" group so that there are fewer slices.
It would be ideal for me to be able to group the values of less than 1% into one single group called "others" so that there are fewer slices. So far I have had to completely ignore those values with c<-c[c$Values>1,] and use this newly created data frame.
Any suggestions/help regarding these issues would be appreciated.
The following is a reproducible example of my current pie chart:
library(RColorBrewer)
library(ggrepel)
library(ggplot2)
library(tidyverse)
my_pal <- colorRampPalette(brewer.pal(9, "Set1"))
#### create new matrix ############
new_mat<-matrix(, nrow=40, ncol = 4)
colnames(new_mat)<-c("Zones", "ssoilcmb", "Erosion_t", "area..sq.m.")
for ( i in 1:nrow(new_mat)){
new_mat[i,4]<-as.numeric(sample(0:20, 1))
new_mat[i,3]<-as.numeric(sample(0:20, 1))
a<-sample(c("S2","S3","S4","S5","S1"),1)
b<-sample(c("Deep","Moderate","Shallow"),1)
new_mat[i,1]<-sample(c("High Precip","Moderate Precip","Low Precip"),1)
new_mat[i,2]<-paste0(a,"_",b)
}
m_dt<-as.data.frame(new_mat)
m_dt$Erosion_t<-as.numeric(m_dt$Erosion_t)
m_dt$area..sq.m.<-as.numeric(m_dt$area..sq.m.)
#### calculate parea
m_dt<- m_dt %>%
group_by(Zones)%>%
mutate(per_er=signif((`Erosion_t`/sum(`Erosion_t`))*100,3), per_area=signif((`area..sq.m.`/sum(`area..sq.m.`))*100,3))
## Rearranging data:
a<-data.frame(m_dt$Zones,m_dt$ssoilcmb, m_dt$per_er)
b<-data.frame(m_dt$Zones,m_dt$ssoilcmb, m_dt$per_area)
c<-data.frame(Zones=m_dt$Zones,ssoilcmb=m_dt$ssoilcmb,
Parameter=c(rep("Erosion",40),rep("Area",40)),
Values=c(m_dt$per_er,m_dt$per_area))
### New Plot ###
ggplot(c, aes(x="", y=Values, fill=ssoilcmb)) +
geom_bar(stat="identity", width=1, position = position_fill())+
coord_polar("y", start=0) +
facet_wrap(Zones~Parameter, nrow = 3) +
geom_text_repel(aes(label = paste0(Values, "%")), position = position_fill(vjust = 0.5))+
scale_fill_manual(values=my_pal(15)) +
labs(x = NULL, y = NULL, fill = NULL, title = "Erosions")+
theme_classic() + theme(axis.line = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank(),
plot.title = element_text(hjust = 0.5, color = "#666666"))
If you're open to alternatives, maybe a facet_wrapped barplot will suit your needs, e.g.
library(RColorBrewer)
library(ggrepel)
library(tidyverse)
my_pal <- colorRampPalette(brewer.pal(9, "Set1"))
#### create new matrix ############
new_mat<-matrix(nrow=40, ncol = 4)
colnames(new_mat)<-c("Zones", "ssoilcmb", "Erosion_t", "area..sq.m.")
for ( i in 1:nrow(new_mat)){
new_mat[i,4]<-as.numeric(sample(0:20, 1))
new_mat[i,3]<-as.numeric(sample(0:20, 1))
a<-sample(c("S2","S3","S4","S5","S1"),1)
b<-sample(c("Deep","Moderate","Shallow"),1)
new_mat[i,1]<-sample(c("High Precip","Moderate Precip","Low Precip"),1)
new_mat[i,2]<-paste0(a,"_",b)
}
m_dt<-as.data.frame(new_mat)
m_dt$Erosion_t<-as.numeric(m_dt$Erosion_t)
m_dt$area..sq.m.<-as.numeric(m_dt$area..sq.m.)
#### calculate parea
m_dt<- m_dt %>%
group_by(Zones)%>%
mutate(per_er=signif((`Erosion_t`/sum(`Erosion_t`))*100,3),
per_area=signif((`area..sq.m.`/sum(`area..sq.m.`))*100,3))
## Rearranging data:
a<-data.frame(m_dt$Zones,m_dt$ssoilcmb, m_dt$per_er)
b<-data.frame(m_dt$Zones,m_dt$ssoilcmb, m_dt$per_area)
c<-data.frame(Zones=m_dt$Zones,ssoilcmb=m_dt$ssoilcmb,
Parameter=c(rep("Erosion",40),rep("Area",40)),
Values=c(m_dt$per_er,m_dt$per_area))
### New Plot ###
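# Explicit level order; levels(c$Zones) is NULL when Zones is a character column (the default since R 4.0)
c$Zones <- factor(c$Zones, levels = c("Low Precip", "Moderate Precip", "High Precip"))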
ggplot(c, aes(x=ssoilcmb, y=Values, fill=ssoilcmb)) +
geom_col()+
facet_wrap(Zones~Parameter, nrow = 3) +
scale_fill_manual(values=my_pal(15)) +
labs(x = NULL, fill = NULL, title = "Erosions")+
theme_minimal() + theme(axis.line = element_blank(),
axis.ticks = element_blank(),
axis.text.x = element_text(angle = 90,
hjust = 1,
vjust = 0.5),
plot.title = element_text(hjust = 0.5,
color = "#666666"))
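The question also asks how to lump values below 1% into a single "Others" slice, which the bar plot above does not address. One possible approach, sketched in base R on a small made-up data frame (toy, slice and the numbers are illustrative assumptions; only the column names Zones, ssoilcmb and Values follow the question):

```r
# Made-up percentages per zone; each zone sums to 100
toy <- data.frame(
  Zones = rep(c("High Precip", "Low Precip"), each = 4),
  ssoilcmb = rep(c("S1_Deep", "S2_Shallow", "S3_Moderate", "S4_Deep"), times = 2),
  Values = c(60.0, 39.1, 0.5, 0.4, 98.5, 0.7, 0.6, 0.2),
  stringsAsFactors = FALSE
)

# Relabel every slice below the 1% threshold as "Others"
toy$slice <- ifelse(toy$Values < 1, "Others", toy$ssoilcmb)

# Sum the values within each zone and (possibly relabelled) slice
lumped <- aggregate(Values ~ Zones + slice, data = toy, FUN = sum)
lumped
```

The lumped data frame can then be plotted with fill = slice in place of fill = ssoilcmb. Unlike filtering with c[c$Values>1,], the small values still contribute to the total.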

ggplot2 : create a faceted pie chart with an empty space after the first column

At the moment I'm creating a graph like this with the following code using ggplot2
ggplot(data.df[sel,], aes(x=1, y = mean_ponderation_norm, fill = Usage)) +
geom_bar(stat="identity",color='black') +
coord_polar(theta='y')+
scale_fill_brewer(type="qual", palette = 2, direction = 1, name="")+
facet_grid(zone.x~Type_service)+
theme(#panel.grid = element_blank(), ## remove guide lines
#axis.text.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks=element_blank(), # the axis ticks
axis.title=element_blank() # the axis labels
)+
scale_y_continuous(limits = c(0, 100))
But I would like the slices in the second column to start where they stop in the first column. To get something like this
When drawing with polar co-ordinates, it's always useful to think about how the desired effect would look in Cartesian co-ordinates. What you are looking for would look something like this in Cartesian coordinates:
This particular effect is just about possible with a stacked bar chart and a bit of hacking, provided you only have two facets as your picture suggests. However, your code suggests you have 4 or more, so this isn't going to work.
Rather than using a stacked bar, you can achieve the result you're looking for with geom_rect. It's a bit disappointing that you didn't supply any data or a minimal reproducible example, since you'll need to transform your own data to be able to use geom_rect. I have had to create my own data to show how this can be done, and that means that you will have to figure out how to transform your own data to do the same thing.
Anyway, here's the data I'm going to use. In your data, mean_ponderation_norm contains the y values. I'm going to call mine all_widths. You need to cumsum and normalise this. You also need to create a lagged copy of it with a zero at position 1 and add this to your data frame:
set.seed(69)
all_widths <- runif(10, 5, 10)
all_widths <- 60 * cumsum(all_widths)/sum(all_widths)
df <- data.frame(facet_var = rep(LETTERS[1:2], each = 5),
factor_var = rep(LETTERS[1:10]),
start_var = c(0, all_widths[-length(all_widths)]),
end_var = all_widths)
Now your plot goes like this:
ggplot(df, aes(xmin = 0, xmax = 1, ymin = start_var,
ymax = end_var, fill = factor_var)) +
geom_rect(color = 'black', size = 1) +
coord_polar(theta = 'y') +
facet_grid(. ~ facet_var) +
theme(axis.text.y = element_blank(),
axis.ticks = element_blank(),
axis.title = element_blank()) +
scale_y_continuous(limits = c(0, 100)) +
guides(fill = "none")
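The cumsum-and-lag step can be sketched on data shaped like the question's. The data frame raw and its numbers are hypothetical; only the column names Type_service, Usage and mean_ponderation_norm are taken from the question's code, and the values are assumed to add up to no more than 100 so that they fit the scale limits.

```r
# Hypothetical stand-in for data.df: one row per slice
raw <- data.frame(
  Type_service = rep(c("A", "B"), each = 3),
  Usage = rep(c("u1", "u2", "u3"), times = 2),
  mean_ponderation_norm = c(10, 20, 5, 15, 25, 10),
  stringsAsFactors = FALSE
)

# Order rows so that all slices of the first facet come first
raw <- raw[order(raw$Type_service), ]

# Each slice ends at the running total and starts where the previous one ended
raw$end_var <- cumsum(raw$mean_ponderation_norm)
raw$start_var <- raw$end_var - raw$mean_ponderation_norm
raw
```

With start_var and end_var in place, the geom_rect() call above can be reused with facet_grid(. ~ Type_service) and fill = Usage.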

Failed to change the subplot background when generating a grouped correlation matrix heat map using ggplot2 facet

I am new to Stack Overflow and to the R language.
Here is my problem.
I have a data frame with one variable called Type and 14 other variables for which I need to plot a correlation matrix heatmap.
origin dataset
I already have an overall plot made with ggplot2; the theme is the default theme_grey, which is fine for me. The code is:
m<- melt(get_lower_tri(round(cor(xrf[3:16], method = 'pearson', use = 'pairwise.complete.obs'), 2)),na.rm = TRUE)
ggplot(m, aes(Var1, Var2, fill = value)) +
geom_tile() +
scale_fill_gradient2(low = 'skyblue4',
high = 'coral2',
mid = 'white',
midpoint = 0,
limit = c(-1, 1),
space = "Lab",
name = 'Pearson\nCorrelation') +
theme_grey()+
coord_fixed() +
theme(axis.title = element_blank())
The result is fine and the background looks good to view.
But when I managed to generate a grouped correlation matrix heatmap, I found that no matter how hard I tried (using theme(panel.background = element_rect()) or theme(panel.background = element_blank())), the subplot backgrounds won’t change and remain this ugly grey which is even different from the overall one.
Here is my code:
Type = rep(c('(a)', '(b)', '(c)','(d)', '(e)', '(f)', '(g)', '(h)', '(i)', '(j)'), each = 14^2)
# Get lower triangle of the correlation matrix
get_lower_tri<-function(x){
x[upper.tri(x)] <- NA
return(x)
}
df2 <- do.call(rbind, lapply(split(xrf, xrf$Type),
function(x) melt(get_lower_tri(round(cor(x[3:16], method = 'pearson', use = 'pairwise.complete.obs'), 2)),na.rm = FALSE)))
my_cors <- cbind(Type,df2)
my_cors %>%
ggplot(aes(Var1, Var2, fill = value)) +
geom_tile() +
scale_fill_gradient2(low = 'skyblue4',
high = 'coral2',
mid = 'white',
midpoint = 0,
limit = c(-1, 1),
space = "Lab",
name = 'Pearson\nCorrelation') +
theme_grey()+
coord_fixed() +
theme(axis.title = element_blank(),
panel.background = element_rect(fill = 'grey90',colour = NA))+
facet_wrap("Type",ncol = 5, nrow = 2)
Aren’t the facet subplot backgrounds the same as the overall one when the same theme is used? And how can I change them?
Update: sorry! It’s my first time asking a question and it’s not a good one!
xrf is my original dataset... But I have now figured out why, thanks to Tjebo and those who commented on my faulty question. It was very instructive for me!
scale_fill_gradient2(..., na.value = 'transparent') solves it. The default value of this parameter is "grey50", which I mistook for the background colour.
I am truly sorry for asking such a silly question, and I really appreciate your kind comments to a rookie! Thank you!
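The fix from the update can be tried on a built-in dataset. This is only a sketch: mtcars stands in for xrf, which is not available here, and as.data.frame(as.table(...)) replaces melt() to avoid an extra package.

```r
library(ggplot2)

# Lower triangle of a small correlation matrix; the upper triangle becomes NA
cm <- round(cor(mtcars[, 1:5], method = "pearson"), 2)
cm[upper.tri(cm)] <- NA

# Long format with columns Var1, Var2 and value (NA rows are kept)
m <- as.data.frame(as.table(cm), responseName = "value")

p <- ggplot(m, aes(Var1, Var2, fill = value)) +
  geom_tile() +
  scale_fill_gradient2(low = "skyblue4", mid = "white", high = "coral2",
                       midpoint = 0, limits = c(-1, 1),
                       na.value = "transparent",  # NA tiles are no longer drawn grey
                       name = "Pearson\nCorrelation") +
  coord_fixed() +
  theme(axis.title = element_blank())
p
```

Dropping the NA rows before plotting (na.rm = TRUE in melt(), as in the first, ungrouped plot) should have a similar visual effect, because no tile is drawn for them at all.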

Color code points based on percentile in ggplot

I have some very large files that contain a genomic position (position) and a corresponding population genetic statistic (value). I have successfully plotted these values and would like to color code the top 5% (blue) and 1% (red) of values. I am wondering if there is an easy way to do this in R.
I have explored writing a function that defines the quantiles, however, many of them end up being not unique and thus cause the function to fail. I've also looked into stat_quantile but only had success in using this to plot a line marking the 95% and 99% (and some of the lines were diagonals which did not make any sense to me.) (Sorry, I am new to R.)
Any help would be much appreciated.
Here is my code: (The files are very large)
########Combine data from multiple files
fst <- rbind(data.frame(key="a1-a3", position=a1.3$V2, value=a1.3$V3), data.frame(key="a1-a2", position=a1.2$V2, value=a1.2$V3), data.frame(key="a2-a3", position=a2.3$V2, value=a2.3$V3), data.frame(key="b1-b2", position=b1.2$V2, value=b1.2$V3), data.frame(key="c1-c2", position=c1.2$V2, value=c1.2$V3))
########the plot
theme_set(theme_bw(base_size = 16))
p1 <- ggplot(fst, aes(x=position, y=value)) +
geom_point() +
facet_wrap(~key) +
ylab("Fst") +
xlab("Genomic Position (Mb)") +
scale_x_continuous(breaks=c(1e+06, 2e+06, 3e+06, 4e+06), labels=c("1", "2", "3", "4")) +
scale_y_continuous(limits=c(0,1)) +
theme(plot.background = element_blank(),
panel.background = element_blank(),
panel.border = element_blank(),
legend.position="none",
legend.title = element_blank()
)
p1
You can achieve this slightly more elegantly by incorporating quantile and cut into the aes colour expression. For example col=cut(d,quantile(d)) in this example:
d = as.vector(round(abs(10 * sapply(1:4, function(n)rnorm(20, mean=n, sd=.6)))))
ggplot(data=NULL, aes(x=1:length(d), y=d, col=cut(d,quantile(d)))) +
geom_point(size=5) + scale_colour_manual(values=rainbow(5))
I've also made a useful workflow for pretty legend labels which someone might find handy.
This is how I would approach it - basically creating a factor defining which group each observation is in, then mapping colour to that factor.
First, some data to work with!
dat <- data.frame(key = c("a1-a3", "a1-a2"), position = 1:100, value = rlnorm(200, 0, 1))
#Get quantiles
quants <- quantile(dat$value, c(0.95, 0.99))
There are plenty of ways of getting a factor to determine which group each observation falls into, here is one:
dat$quant <- with(dat, factor(ifelse(value < quants[1], 0,
ifelse(value < quants[2], 1, 2))))
So quant now indicates whether an observation is in the 0-95, 95-99 or 99+ group. The colour of the points in a plot can then easily be mapped to quant.
ggplot(dat, aes(position, value)) + geom_point(aes(colour = quant)) + facet_wrap(~key) +
scale_colour_manual(values = c("black", "blue", "red"),
labels = c("0-95", "95-99", "99-100")) + theme_bw()
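The question mentions that non-unique quantiles made a cut()-based function fail. A variant of the grouping step that should tolerate tied quantiles uses findInterval(), which only requires the break points to be sorted, whereas cut() stops with an error when two breaks coincide. The data below are made up so that the group sizes are easy to check.

```r
# Made-up data: 200 evenly spaced values
dat <- data.frame(position = 1:200, value = 1:200)

# 95th and 99th percentiles of the statistic
quants <- quantile(dat$value, c(0.95, 0.99))

# 0 = below the 95th percentile, 1 = between the two, 2 = at or above the 99th
grp <- findInterval(dat$value, quants)
dat$quant <- factor(grp, levels = 0:2,
                    labels = c("0-95", "95-99", "99-100"))
table(dat$quant)
```

The resulting quant column can be mapped to colour exactly as in the plot above; fixing levels = 0:2 keeps all three labels even if one group turns out to be empty.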
I'm not sure if this is what you are searching for, but maybe it helps:
# a little function which returns factors with three levels, normal, 95% and 99%
qfun <- function(x, qant_1=0.95, qant_2=0.99){
q <- sort(c(quantile(x, qant_1), quantile(x, qant_2)))
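# include.lowest = TRUE keeps the smallest value from falling outside the first interval (NA)
factor(cut(x, breaks = c(min(x), q[1], q[2], max(x)), include.lowest = TRUE))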
}
df <- data.frame(samp=rnorm(1000))
ggplot(df, aes(x=1:1000, y=df$samp)) + geom_point(colour=qfun(df$samp))+
xlab("")+ylab("")+
theme(plot.background = element_blank(),
panel.background = element_blank(),
panel.border = element_blank(),
legend.position="none",
legend.title = element_blank())
as a result I got
