Tidying up the ggplot pie chart - r

After looking at various posts and asking questions here, I have been able to make a multi-faceted pie chart, but I am having trouble tidying it up. Here are the things I am struggling with:
How do I remove the facet labels from each row and have only one facet label on the top or bottom and on the left or right? How do I control how the facet label looks?
I have tried using facet_grid instead of facet_wrap. That removes the label from each row, but the labels are still inside a box. I would like to remove the box, which I do not seem to be able to do.
Centering the labels so that the value for each fraction of the pie is inside its pie slice.
Some of my pie charts have 8 to 10 values, and the labels are not always inside their slices. First I used geom_text_repel, but that only repelled the text; it did not place the text inside each slice. I also looked at this thread. I tried that approach by creating a new data frame with a position value and using that pos inside geom_text, like so: d<-c %>% group_by(Parameter)%>% mutate(pos= ave(Values, Zones, FUN = function(x) cumsum(x) - 0.5 * x)), and then using the same code to make the pie chart from the d data frame, but it did not quite work.
Grouping the values under a certain level into one single "others" group so that the number of slices is smaller.
It would be ideal to group the values of less than 1% into one single group called "others" so that there are fewer slices. So far I have had to drop those values completely with c<-c[c$Values>1,] and use the newly created data frame.
Any suggestions or help regarding these issues would be appreciated.
Following is the reproducible example of my current pie chart:
library(RColorBrewer)
library(ggrepel)
library(ggplot2)
library(tidyverse)
my_pal <- colorRampPalette(brewer.pal(9, "Set1"))
#### create new matrix ############
new_mat<-matrix(, nrow=40, ncol = 4)
colnames(new_mat)<-c("Zones", "ssoilcmb", "Erosion_t", "area..sq.m.")
for ( i in 1:nrow(new_mat)){
new_mat[i,4]<-as.numeric(sample(0:20, 1))
new_mat[i,3]<-as.numeric(sample(0:20, 1))
a<-sample(c("S2","S3","S4","S5","S1"),1)
b<-sample(c("Deep","Moderate","Shallow"),1)
new_mat[i,1]<-sample(c("High Precip","Moderate Precip","Low Precip"),1)
new_mat[i,2]<-paste0(a,"_",b)
}
m_dt<-as.data.frame(new_mat)
m_dt$Erosion_t<-as.numeric(m_dt$Erosion_t)
m_dt$area..sq.m.<-as.numeric(m_dt$area..sq.m.)
#### calculate parea
m_dt<- m_dt %>%
group_by(Zones)%>%
mutate(per_er=signif((`Erosion_t`/sum(`Erosion_t`))*100,3), per_area=signif((`area..sq.m.`/sum(`area..sq.m.`))*100,3))
## Rearranging data:
a<-data.frame(m_dt$Zones,m_dt$ssoilcmb, m_dt$per_er)
b<-data.frame(m_dt$Zones,m_dt$ssoilcmb, m_dt$per_area)
c<-data.frame(Zones=m_dt$Zones,ssoilcmb=m_dt$ssoilcmb,
Parameter=c(rep("Erosion",40),rep("Area",40)),
Values=c(m_dt$per_er,m_dt$per_area))
### New Plot ###
ggplot(c, aes(x="", y=Values, fill=ssoilcmb)) +
geom_bar(stat="identity", width=1, position = position_fill())+
coord_polar("y", start=0) +
facet_wrap(Zones~Parameter, nrow = 3) +
geom_text_repel(aes(label = paste0(Values, "%")), position = position_fill(vjust = 0.5))+
scale_fill_manual(values=my_pal(15)) +
labs(x = NULL, y = NULL, fill = NULL, title = "Erosions")+
theme_classic() + theme(axis.line = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank(),
plot.title = element_text(hjust = 0.5, color = "#666666"))

If you're open to alternatives, maybe a facet_wrapped barplot will suit your needs, e.g.
library(RColorBrewer)
library(ggrepel)
library(tidyverse)
my_pal <- colorRampPalette(brewer.pal(9, "Set1"))
#### create new matrix ############
new_mat<-matrix(nrow=40, ncol = 4)
colnames(new_mat)<-c("Zones", "ssoilcmb", "Erosion_t", "area..sq.m.")
for ( i in 1:nrow(new_mat)){
new_mat[i,4]<-as.numeric(sample(0:20, 1))
new_mat[i,3]<-as.numeric(sample(0:20, 1))
a<-sample(c("S2","S3","S4","S5","S1"),1)
b<-sample(c("Deep","Moderate","Shallow"),1)
new_mat[i,1]<-sample(c("High Precip","Moderate Precip","Low Precip"),1)
new_mat[i,2]<-paste0(a,"_",b)
}
m_dt<-as.data.frame(new_mat)
m_dt$Erosion_t<-as.numeric(m_dt$Erosion_t)
m_dt$area..sq.m.<-as.numeric(m_dt$area..sq.m.)
#### calculate parea
m_dt<- m_dt %>%
group_by(Zones)%>%
mutate(per_er=signif((`Erosion_t`/sum(`Erosion_t`))*100,3),
per_area=signif((`area..sq.m.`/sum(`area..sq.m.`))*100,3))
## Rearranging data:
a<-data.frame(m_dt$Zones,m_dt$ssoilcmb, m_dt$per_er)
b<-data.frame(m_dt$Zones,m_dt$ssoilcmb, m_dt$per_area)
c<-data.frame(Zones=m_dt$Zones,ssoilcmb=m_dt$ssoilcmb,
Parameter=c(rep("Erosion",40),rep("Area",40)),
Values=c(m_dt$per_er,m_dt$per_area))
### New Plot ###
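# Set the facet order explicitly. levels(c$Zones) is NULL when Zones is a
# character column (the default since R 4.0), so indexing the levels fails there.
c$Zones <- factor(c$Zones, levels = c("Low Precip", "Moderate Precip", "High Precip"))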
ggplot(c, aes(x=ssoilcmb, y=Values, fill=ssoilcmb)) +
geom_col()+
facet_wrap(Zones~Parameter, nrow = 3) +
scale_fill_manual(values=my_pal(15)) +
labs(x = NULL, fill = NULL, title = "Erosions")+
theme_minimal() + theme(axis.line = element_blank(),
axis.ticks = element_blank(),
axis.text.x = element_text(angle = 90,
hjust = 1,
vjust = 0.5),
plot.title = element_text(hjust = 0.5,
color = "#666666"))
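If you would rather keep the pies, the first two points of the question (boxed facet labels and labels drifting out of their slices) can be sketched as follows. This is only a minimal sketch on hypothetical toy data (the `toy` data frame and its values are made up, not the asker's data): facet_grid gives one strip per row and column, strip.background = element_blank() drops the box, strip.text controls the label appearance, and a plain geom_text with the same position_fill() as the bars and vjust = 0.5 places each label at the middle of its slice.

```r
library(ggplot2)

# Hypothetical toy data: percentages that sum to 100 within each panel
toy <- data.frame(
  Zones     = rep(c("High Precip", "Low Precip"), each = 6),
  Parameter = rep(rep(c("Area", "Erosion"), each = 3), times = 2),
  ssoilcmb  = rep(c("S1_Deep", "S2_Shallow", "S3_Moderate"), times = 4),
  Values    = c(50, 30, 20, 40, 35, 25, 60, 25, 15, 10, 70, 20)
)

pie <- ggplot(toy, aes(x = "", y = Values, fill = ssoilcmb)) +
  geom_col(width = 1, position = position_fill()) +
  # Same position as the bars; vjust = 0.5 puts each label mid-slice
  geom_text(aes(label = paste0(Values, "%")),
            position = position_fill(vjust = 0.5)) +
  coord_polar("y", start = 0) +
  # One strip per column (top) and per row (moved to the left by switch = "y")
  facet_grid(Zones ~ Parameter, switch = "y") +
  labs(x = NULL, y = NULL, fill = NULL, title = "Erosions") +
  theme_classic() +
  theme(axis.line = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank(),
        strip.background = element_blank(),            # removes the box
        strip.text = element_text(face = "bold", size = 11),
        plot.title = element_text(hjust = 0.5, color = "#666666"))
pie
```

With many thin slices the centred labels can still overlap, so this tends to work best after small values have been merged or dropped.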

Related

How do I flip the trendline patterns on my ggplot2 graph?

I want the Girls to have the dashed trendline and the Boys to have a solid trendline. I'd also like to remove the box around the graph (keeping only the x- and y-axis lines) and the shading behind the shapes in the key. I am using ggplot2 in R.
dr <- ggplot(DATASET,
aes(x=EC,
y=sqrt_Percent.5,
color=Sex1M,
shape=Sex1M,
linetype=Sex1M)) +
geom_point(size= 3,
aes(shape=Sex1M,
color=Sex1M)) +
scale_shape_manual(values=c(1,16))+
geom_smooth(method=lm,
se=FALSE,
fullrange=TRUE) +
labs(x="xaxis title",
y = "yaxis title",
fill= "") +
xlim(3,7) +
ylim(0,10) +
theme(legend.position = 'right',
legend.title = element_blank(),
panel.border = element_rect(fill=NA,
color = 'white'),
panel.background = NULL,
legend.background =element_rect(fill=NA,
size=0.5,
linetype="solid")) +
scale_color_grey(start = 0.0,
end = 0.4)
Current Graph
There is quite a lot going on in your visualisation. One strategy is to start from a base plot and then add layers and features one at a time.
There are different ways to change the "sequence" of your colours, shapes, etc.
You can do this in ggplot with one of the scale_xxx_manual layers.
Conceptually, I suggest you deal with this in the data and only use the scales for fine-tuning. But that is a question of style.
In your case, you use Sex1M as a categorical variable. There is a built-in sequence for (automatic) colouring and shapes, which follows the order of the factor levels. So you have to define the levels in the order you want.
As you have not provided a representative sample, I simulate some data points and define Sex1M as part of the data creation process.
DATASET <- data.frame(
x = sample(x = 2:7, size = 20, replace = TRUE)
, y = sample(x = 0.2:9.8, size = 20, replace = TRUE)
, Sex1M = sample(c("Boys", "Girls"), size = 20, replace = TRUE )
)
Now let's plot
library(dplyr)
library(ggplot2)
DATASET <- DATASET %>%
mutate(Sex1M = factor(Sex1M, levels = c("Boys","Girls"))) # set sequence of levels: boys are now the first level aka 1st colour, linetype, shape.
# plot
ggplot(DATASET,
aes(x=x, # adapted to simulated data
y=y, # adapted to simulated data
color=Sex1M, # these values are now defined in the sequence
shape=Sex1M, # of the categorical factor you created
linetype=Sex1M) # adapt the factor levels as needed (e.g change order)
) +
geom_point(size= 3,
aes(shape=Sex1M,
color=Sex1M)) +
scale_shape_manual(values=c(1,16))+
geom_smooth(method=lm,
se=FALSE,
fullrange=TRUE) +
labs(x="xaxis title",
y = "yaxis title",
fill= "") +
xlim(3,7) +
ylim(0,10) +
theme(legend.position = 'right',
legend.title = element_blank(),
panel.border = element_rect(fill=NA,
color = 'white'),
panel.background = NULL,
# ggplot is not always intuitive: the legend background is the panel
# comprising the legend keys (symbols) and the labels, whereas
# legend.key controls the colouring behind each key, which you want to remove
legend.key = element_rect(fill = NA)
# The legend.background setting below can go. To see the difference between
# background and key, set fill = "blue"
# legend.background = element_rect(fill = NA, size = 0.5, linetype = "solid")
) +
scale_color_grey(start = 0.0,
end = 0.4)
The settings for the background panel make the outer line disappear in my plot.
Hope this helps to get you started.
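For completeness, the scale_xxx_manual route mentioned above can be sketched like this. It is a minimal sketch on simulated data (the `toy` data frame is hypothetical): naming the entries of `values` ties each line type and shape to a specific group, regardless of the level order. The axis.line setting is my assumption about what "keep only the axis lines" should look like.

```r
library(ggplot2)

# Simulated stand-in for DATASET
set.seed(1)
toy <- data.frame(
  x = runif(40, 3, 7),
  y = runif(40, 0, 10),
  Sex1M = sample(c("Boys", "Girls"), 40, replace = TRUE)
)

p <- ggplot(toy, aes(x = x, y = y, color = Sex1M,
                     shape = Sex1M, linetype = Sex1M)) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # Named vectors map each group to its line type and shape explicitly
  scale_linetype_manual(values = c(Boys = "solid", Girls = "dashed")) +
  scale_shape_manual(values = c(Boys = 16, Girls = 1)) +
  scale_color_grey(start = 0, end = 0.4) +
  theme(legend.title = element_blank(),
        legend.key = element_blank(),        # no shading behind the key symbols
        panel.background = element_blank(),  # no grey panel
        axis.line = element_line(color = "black"))
p
```

Swapping the two values in scale_linetype_manual flips the patterns without touching the data.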

Make side by side pie-chart for Two different columns in ggplot also facet wrap it for different factors

I am trying to make some pie charts for the data matrix that I have. I have created a dummy matrix below with similar variables. I have two result values to work with, "AREA" and "EROSION". Each result value can fall into one of three factor levels: "High Precip. Zones", "Moderate Precip. Zones" and "Low Precip. Zones".
I want to show this as a grid of pie charts with 3 rows and 2 columns. Each row needs to have an "Area" pie chart and an "Erosion" pie chart for one zone. I am able to separate the pie charts by adding facet_wrap(~Zones), but I am not sure how to separate them by two different columns, so that I can show the Erosion pie chart and the Area pie chart side by side in each row.
Also, my original matrix can have up to 15 values in each row, which means the pie chart is going to be crowded. Is there an easy way or function to ignore values that are 0 or below a certain threshold?
I will appreciate any help and suggestions regarding this.
library(ggplot2)
library(ggrepel) # needed for geom_text_repel() used in the plot below
library(tidyverse)
library(RColorBrewer)
my_pal <- colorRampPalette(brewer.pal(9, "Set1"))
#### create new matrix ############
new_mat<-matrix(, nrow=40, ncol = 4)
colnames(new_mat)<-c("Zones", "ssoilcmb", "Erosion_t", "area..sq.m.")
for ( i in 1:nrow(new_mat)){
new_mat[i,4]<-as.numeric(sample(0:20, 1))
new_mat[i,3]<-as.numeric(sample(0:20, 1))
a<-sample(c("S2","S3","S4","S5","S1"),1)
b<-sample(c("Deep","Moderate","Shallow"),1)
new_mat[i,1]<-sample(c("High Precip","Moderate Precip","Low Precip"),1)
new_mat[i,2]<-paste0(a,"_",b)
}
m_dt<-as.data.frame(new_mat)
m_dt$Erosion_t<-as.numeric(m_dt$Erosion_t)
m_dt$area..sq.m.<-as.numeric(m_dt$area..sq.m.)
#### calculate parea
m_dt<- m_dt %>%
group_by(Zones)%>%
mutate(per_er=signif((`Erosion_t`/sum(`Erosion_t`))*100,3), per_area=signif((`area..sq.m.`/sum(`area..sq.m.`))*100,3))
############ plot
ggplot(m_dt, aes(x="", y=per_er, fill=ssoilcmb)) + geom_bar(stat="identity", width=1, position = position_fill())+
coord_polar("y", start=0) + facet_wrap(~ Zones) +geom_text_repel(aes(label = paste0(per_er, "%")), position = position_fill(vjust = 0.5))+
scale_fill_manual(values=my_pal(15)) +
labs(x = NULL, y = NULL, fill = NULL, title = "Erosions")+
theme_classic() + theme(axis.line = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank(),
plot.title = element_text(hjust = 0.5, color = "#666666"))
I think you need to rearrange your data in a suitable manner rather than playing with the graphical language itself.
I have given my suggestion here. Please forgive the High, Medium and Low having changed to 1, 2 and 3. I did that to make sense of the data for myself. You will obviously be renaming it.
library(tidyverse)
library(RColorBrewer)
library(ggrepel)
my_pal <- colorRampPalette(brewer.pal(9, "Set1"))
#### create new matrix ############
new_mat<-matrix(, nrow=40, ncol = 4)
colnames(new_mat)<-c("Zones", "ssoilcmb", "Erosion_t", "area..sq.m.")
for ( i in 1:nrow(new_mat)){
new_mat[i,4]<-as.numeric(sample(0:20, 1))
new_mat[i,3]<-as.numeric(sample(0:20, 1))
a<-sample(c("S2","S3","S4","S5","S1"),1)
b<-sample(c("Deep","Moderate","Shallow"),1)
new_mat[i,1]<-sample(c("High Precip","Moderate Precip","Low Precip"),1)
new_mat[i,2]<-paste0(a,"_",b)
}
m_dt<-as.data.frame(new_mat)
m_dt$Erosion_t<-as.numeric(m_dt$Erosion_t)
m_dt$area..sq.m.<-as.numeric(m_dt$area..sq.m.)
#### calculate parea
m_dt<- m_dt %>%
group_by(Zones)%>%
mutate(per_er=signif((`Erosion_t`/sum(`Erosion_t`))*100,3), per_area=signif((`area..sq.m.`/sum(`area..sq.m.`))*100,3))
## You must rearrange your data as given here:
a<-data.frame(m_dt$Zones,m_dt$ssoilcmb, m_dt$per_er)
b<-data.frame(m_dt$Zones,m_dt$ssoilcmb, m_dt$per_area)
c<-data.frame(Zones=m_dt$Zones,ssoilcmb=m_dt$ssoilcmb,
Parameter=c(rep("Erosion",40),rep("Area",40)),
Values=c(m_dt$per_er,m_dt$per_area))
### Your New Plot ###
ggplot(c, aes(x="", y=Values, fill=ssoilcmb)) +
geom_bar(stat="identity", width=1, position = position_fill())+
coord_polar("y", start=0) +
facet_wrap(Zones~Parameter, nrow = 3) +
geom_text_repel(aes(label = paste0(Values, "%")), position = position_fill(vjust = 0.5))+
scale_fill_manual(values=my_pal(15)) +
labs(x = NULL, y = NULL, fill = NULL, title = "Erosions")+
theme_classic() + theme(axis.line = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank(),
plot.title = element_text(hjust = 0.5, color = "#666666"))
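On the crowded-pie concern from the question, one option is to collapse everything below a threshold into a single "Others" slice before plotting, rather than dropping it. The following is only a sketch on hypothetical toy data shaped like the rearranged data frame above. The 1% cut-off is an arbitrary choice, and the `.groups` argument assumes dplyr 1.0 or later.

```r
library(dplyr)

# Hypothetical toy data in the same shape as the rearranged data frame
toy <- data.frame(
  Zones     = rep(c("High Precip", "Low Precip"), each = 4),
  Parameter = "Erosion",
  ssoilcmb  = rep(c("S1_Deep", "S2_Shallow", "S3_Moderate", "S4_Deep"), times = 2),
  Values    = c(60, 39, 0.6, 0.4, 50, 49.5, 0.3, 0.2),
  stringsAsFactors = FALSE
)

threshold <- 1  # percent; slices below this are merged

lumped <- toy %>%
  # Relabel small slices, then add them up within each panel
  mutate(ssoilcmb = ifelse(Values < threshold, "Others", ssoilcmb)) %>%
  group_by(Zones, Parameter, ssoilcmb) %>%
  summarise(Values = sum(Values), .groups = "drop")

lumped
```

The `lumped` data frame can then be passed to the same ggplot call in place of the original data; each panel still sums to 100%, which plain filtering would not preserve.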

ggplot2 : create a faceted pie chart with an empty space after the first column

At the moment I'm creating a graph like this with the following code using ggplot2
ggplot(data.df[sel,], aes(x=1, y = mean_ponderation_norm, fill = Usage)) +
geom_bar(stat="identity",color='black') +
coord_polar(theta='y')+
scale_fill_brewer(type="qual", palette = 2, direction = 1, name="")+
facet_grid(zone.x~Type_service)+
theme(#panel.grid = element_blank(), ## remove guide lines
#axis.text.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks=element_blank(), # the axis ticks
axis.title=element_blank() # the axis labels
)+
scale_y_continuous(limits = c(0, 100))
But I would like the slices in the second column to start where they stop in the first column. To get something like this
When drawing with polar co-ordinates, it's always useful to think about how the desired effect would look in Cartesian co-ordinates. What you are looking for would look something like this in Cartesian coordinates:
This particular effect is just about possible with a stacked bar chart and a bit of hacking, provided you only have two facets as your picture suggests. However, your code suggests you have 4 or more, so this isn't going to work.
Rather than using a stacked bar, you can achieve the result you're looking for with geom_rect. It's a bit disappointing that you didn't supply any data or a minimal reproducible example, since you'll need to transform your own data to be able to use geom_rect. I have had to create my own data to show how this can be done, and that means that you will have to figure out how to transform your own data to do the same thing.
Anyway, here's the data I'm going to use. In your data, mean_ponderation_norm contains the y values. I'm going to call mine all_widths. You need to cumsum and normalise this. You also need to create a lagged copy of it with a zero at position 1 and add this to your data frame:
set.seed(69)
all_widths <- runif(10, 5, 10)
all_widths <- 60 * cumsum(all_widths)/sum(all_widths)
df <- data.frame(facet_var = rep(LETTERS[1:2], each = 5),
factor_var = rep(LETTERS[1:10]),
start_var = c(0, all_widths[-length(all_widths)]),
end_var = all_widths)
Now your plot goes like this:
ggplot(df, aes(xmin = 0, xmax = 1, ymin = start_var,
ymax = end_var, fill = factor_var)) +
geom_rect(color = 'black', size = 1) +
coord_polar(theta = 'y') +
facet_grid(. ~ facet_var) +
theme(axis.text.y = element_blank(),
axis.ticks = element_blank(),
axis.title = element_blank()) +
scale_y_continuous(limits = c(0, 100)) +
guides(fill = "none")
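As a rough guide for transforming your own data into this shape, the cumsum-and-lag step can be sketched with dplyr. The `raw` data frame and its `value` column are hypothetical stand-ins for your data and for mean_ponderation_norm; the factor of 60 simply mirrors the scaling used above.

```r
library(dplyr)

# Hypothetical raw values; `value` stands in for mean_ponderation_norm
raw <- data.frame(
  facet_var  = rep(c("A", "B"), each = 3),
  factor_var = letters[1:6],
  value      = c(10, 20, 30, 15, 15, 10)
)

rects <- raw %>%
  # No group_by(): the running total deliberately continues across facets,
  # so the second facet starts where the first one stops
  mutate(end_var   = 60 * cumsum(value) / sum(value),
         start_var = dplyr::lag(end_var, default = 0))

rects
```

The resulting start_var and end_var columns play the same role as in `df` above and can be mapped to ymin and ymax in geom_rect.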

Put legend under each facet using facet_grid; adding one title and one caption to plot

I'm working with a plot analogous to the following:
ggplot(data=mtcars, aes(x=wt, y=mpg, color=carb)) +
geom_line() + facet_grid(gear ~ .) +
ggtitle(expression("Title")) +
labs(caption = "Sources: Compustat, Author's Calculations") +
theme(plot.title = element_text(size = 20, hjust = 0.5),
plot.caption=element_text(size=8, hjust=.5),
strip.background = element_blank(),
strip.text = element_blank(),
legend.title = element_blank())
I'm trying to do the following:
Insert a legend beneath each of the 3 facets, each legend specific to the facet above it.
Insert one plot title (as opposed to the same title above each facet).
Insert one caption beneath the final facet (as opposed to three captions beneath each facet).
I was able to reproduce this example on assigning a legend to each facet.
However, the plot title was placed above and the caption below each facet. Also, this example uses facet_wrap and not facet_grid.
Thank you in advance.
library(dplyr)
library(ggplot2)
tempgg <- mtcars %>%
group_by(gear) %>%
do(gg = {ggplot(data=., aes(x=wt, y=mpg, color=carb)) +
geom_point() +
labs(x = NULL) +
guides(color = guide_colorbar(title.position = "left")) +
theme(plot.title = element_text(size = 20, hjust = 0.5),
plot.caption=element_text(size=8, hjust=.5),
legend.position = "bottom")})
tempgg$gg[1][[1]] <- tempgg$gg[1][[1]] + labs(title = "Top title")
tempgg$gg[3][[1]] <- tempgg$gg[3][[1]] + labs(x = "Axis label", caption = "Bottom caption")
tempgg %>% gridExtra::grid.arrange(grobs = .$gg)
This isn't the most elegant way to do it. Each of the three grobs gets an equal space when you grid.arrange them, so the first and last ones are squished from the title and caption taking up space. You could add something like heights = c(3,2,3) inside the grid.arrange call, but you'd have to fiddle with each of the heights to get it to look right, and even then it would be a visual approximation, not exact.
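The heights tweak mentioned above could look roughly like the sketch below. It rebuilds the three plots with base split() and lapply() rather than do(), purely to keep the example self-contained, and the c(3, 2, 3) ratio is a starting guess that needs tuning by eye.

```r
library(ggplot2)
library(gridExtra)

# One plot per gear value, each with its own legend at the bottom
plots <- lapply(split(mtcars, mtcars$gear), function(d) {
  ggplot(d, aes(x = wt, y = mpg, color = carb)) +
    geom_point() +
    labs(x = NULL) +
    theme(legend.position = "bottom")
})
plots[[1]] <- plots[[1]] + labs(title = "Top title")
plots[[3]] <- plots[[3]] + labs(x = "Axis label", caption = "Bottom caption")

# Numeric heights are treated as relative row heights
combined <- arrangeGrob(grobs = plots, ncol = 1, heights = c(3, 2, 3))
grid::grid.draw(combined)
```

Because the extra space is only approximated, the panels will be close to equal in size but not exactly equal.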
To do it the more precise way, you'd need to look at the underlying gtables in each of the grobs. https://stackoverflow.com/users/471093/baptiste is the expert on that.
Update:
I used a @baptiste solution, which is still not particularly elegant, but gives you the same plot space for each panel. Use this snippet in place of the last line above.
tempggt <- tempgg %>% do(ggt = ggplot_gtable(ggplot_build(.$gg))) %>% .$ggt
gg1 <- tempggt[[1]]
gg2 <- tempggt[[2]]
gg3 <- tempggt[[3]]
gridExtra::grid.arrange(gridExtra::rbind.gtable(gg1, gg2, gg3))

How can I make a Frequency distribution bar plot in ggplot2?

Sample of the dataset.
nq
0.140843018
0.152855833
0.193245919
0.156860105
0.171658019
0.186281942
0.290739146
0.162779517
0.164694042
0.171658019
0.195866609
0.166967913
0.136841748
0.108907644
0.264136384
0.356655651
0.250508305
I would like to make a Percentage Bar plot/Histogram like this question: RE: Alignment of numbers on the individual bars with ggplot2
The max value of NQ for full dataset is 21 and minimum value is 0.00005
But I am unable to adapt the code as I don't have a Freq column and I have one series.
I have made a mockup of the figure I am trying to make.
Could you please help?
Would that work for you?
nq <- read.table(text = "
0.140843018
0.152855833
0.193245919
0.156860105
0.171658019
0.186281942
0.290739146
0.162779517
0.164694042
0.171658019
0.195866609
0.166967913
0.136841748
0.108907644
0.264136384
0.356655651
0.250508305", header = F) # Your data
nq$V2 <- cut(nq$V1, 5, include.lowest = T)
nq2 <- aggregate(V1 ~ V2, nq, length)
nq2$V3 <- nq2$V1/sum(nq2$V1)
library(ggplot2)
ggplot() + geom_bar(data = nq2, aes(V2, V1), stat = "identity", width=1, fill = "white", col = "black", size = 2) +
geom_text(vjust=1, fontface="bold", data = nq2, aes(label = paste(sprintf("%.1f", V3*100), "%", sep=""), x = V2, y = V1 + 0.4), size = 5) +
theme_bw() +
scale_x_discrete(expand = c(0,0), labels = sprintf("%.3f",seq(min(nq$V1), max(nq$V1), by = max(nq$V1)/6))) +
ylab("No. of Cases") + xlab("") +
scale_y_continuous(expand = c(0,0)) +
theme(
axis.title.y = element_text(size = 20, face = "bold", angle = 0),
panel.grid.major = element_blank() ,
panel.grid.minor = element_blank() ,
panel.border = element_blank() ,
panel.background = element_blank(),
axis.line = element_line(color = 'black', size = 2),
axis.text.x = element_text(face="bold"),
axis.text.y = element_text(face="bold")
)
I thought this would be easy, but it turned out to be frustrating. So perhaps the "right" way is to transform your data before using ggplot, as it looks like @DavidArenburg has done. But if you feel like hacking ggplot, here's what I ended up doing.
First, some sample data.
set.seed(15)
dd<-data.frame(x=sample(1:25, 100, replace=T, prob=25:1))
br <- seq(0,25, by=5) # break points
My first attempt was
library(ggplot2)
ggplot(dd, aes(x)) +
stat_bin(position="stack", breaks=br) +
geom_text(aes(y=..count.., label=..density..*..width.., ymax=..count..+1),
vjust=-.5, breaks=br, stat="bin")
but that didn't make "pretty labels",
so I thought I'd use the percent() function from the scales package to make them pretty. However, ggplot doesn't really make it possible to use functions with ..().. variables, because it evaluates them in the data.frame only (and then in baseenv()). It has no way to find the function you use. So this is when I turned to hacking. First I'll extract the "Layer" definition from ggplot and the map_statistic function from it. (NOTE: this was done with "ggplot2_1.0.0" and is specific to that version; this is a private function that may change in future releases.)
orig.map_statistic <- ggplot2:::Layer$map_statistic
new.map_statistic <- orig.map_statistic
body(new.map_statistic)[[9]]
# stat_data <- as.data.frame(lapply(new, eval, data, baseenv()))
Here's the line that's causing grief. I would prefer it if the function resolved names that are not found in the data.frame in the plot environment instead. So I decided to change it with
body(new.map_statistic)[[9]] <- quote(stat_data <- as.data.frame(lapply(new, eval, data, plot$plot_env)))
assign("map_statistic", new.map_statistic, envir=ggplot2:::Layer)
So now I can use functions with ..().. variables. So I can do
library(scales)
ggplot(dd, aes(x)) +
stat_bin(position="stack", breaks=br) +
geom_text(aes(y=..count.., ymax=..count..+2,
label=percent(..density..*..width..)),
vjust=-.5, breaks=br, stat="bin")
to get
I'm not sure why ggplot has this default behavior. There could be some good reason for it, but I don't know what it is. Note that this changes how ggplot will behave for the rest of the session. You can change back to the default with
assign("map_statistic", orig.map_statistic, envir=ggplot2:::Layer)
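As a hedged aside: the hack above is tied to ggplot2 1.0.0. Assuming ggplot2 3.3.0 or later, where after_stat() is available, functions such as scales::percent() can be used directly on computed variables, so the same labels can be sketched without touching any internals. The data and break points below reuse the sample data from this answer.

```r
library(ggplot2)
library(scales)

set.seed(15)
dd <- data.frame(x = sample(1:25, 100, replace = TRUE, prob = 25:1))
br <- seq(0, 25, by = 5)  # break points

p <- ggplot(dd, aes(x)) +
  geom_histogram(breaks = br, fill = "white", colour = "black") +
  # after_stat() refers to variables computed by the bin stat;
  # count / sum(count) is each bin's share of all observations
  geom_text(aes(y = after_stat(count),
                label = after_stat(percent(count / sum(count)))),
            stat = "bin", breaks = br, vjust = -0.5)
p
```

This leaves ggplot2's behaviour unchanged for the rest of the session, so there is nothing to restore afterwards.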
