How do I flip the trendline patterns on my ggplot2 graph? - r

I want to make the Girls have the dashed trendline and the Boys have a solid trendline. I'd also like to remove the box around the graph, save the y and x-axis lines, and the shading behind the shapes on the key. I am using ggplot2 in R.
dr <- ggplot(DATASET,
aes(x=EC,
y=sqrt_Percent.5,
color=Sex1M,
shape=Sex1M,
linetype=Sex1M)) +
geom_point(size= 3,
aes(shape=Sex1M,
color=Sex1M)) +
scale_shape_manual(values=c(1,16))+
geom_smooth(method=lm,
se=FALSE,
fullrange=TRUE) +
labs(x="xaxis title",
y = "yaxis title",
fill= "") +
xlim(3,7) +
ylim(0,10) +
theme(legend.position = 'right',
legend.title = element_blank(),
panel.border = element_rect(fill=NA,
color = 'white'),
panel.background = NULL,
legend.background =element_rect(fill=NA,
size=0.5,
linetype="solid")) +
scale_color_grey(start = 0.0,
end = 0.4)
Current Graph

There is quite something going on in your visualisation. One strategy to develop this is to add layer and feature by feature once you have your base plot.
There a different ways to change the "sequence" of your colours, shapes, etc.
You can do this in ggplot with one of the scale_xxx_manual layers.
Conceptually, I suggest you deal with this in the data and only use the scales for "twisting". But that is a question of style.
In your case, you use Sex1M as a categorical variable. There is a built in sequence for (automatic) colouring and shapes. So in your case, you have to "define" the levels in another order.
As you have not provided a representative sample, I simulate some data points and define Sex1M as part of the data creation process.
DATASET <- data.frame(
x = sample(x = 2:7, size = 20, replace = TRUE)
, y = sample(x = 0.2:9.8, size = 20, replace = TRUE)
, Sex1M = sample(c("Boys", "Girls"), size = 20, replace = TRUE )
Now let's plot
library(dplyr)
library(ggplot2)
DATASET <- DATASET %>%
mutate(Sex1M = factor(Sex1M, levels = c("Boys","Girls)) # set sequence of levels: boys are now the first level aka 1st colour, linetype, shape.
# plot
ggplot(DATASET,
aes(x=x, # adapted to simulated data
y=y, # adapted to simulated data
color=Sex1M, # these values are now defined in the sequence
shape=Sex1M, # of the categorical factor you created
linetype=Sex1M) # adapt the factor levels as needed (e.g change order)
) +
geom_point(size= 3,
aes(shape=Sex1M,
color=Sex1M)) +
scale_shape_manual(values=c(1,16))+
geom_smooth(method=lm,
se=FALSE,
fullrange=TRUE) +
labs(x="xaxis title",
y = "yaxis title",
fill= "") +
xlim(3,7) +
ylim(0,10) +
theme(legend.position = 'right',
legend.title = element_blank(),
panel.border = element_rect(fill=NA,
color = 'white'),
panel.background = NULL,
#------------ ggplot is not always intuitive - the legend background the panel
# comprising the legend keys (symbols) and the labels
# you want to remove the colouring of the legend keys
legend.key = element_rect(fill = NA),
# ----------- that can go. To see above mentioned difference of background and key
# set fill = "blue"
# legend.background =element_rect(fill = NA, size=0.5,linetype="solid")
) +
scale_color_grey(start = 0.0,
end = 0.4)
The settings for the background panel make the outer line disappear in my plot.
Hope this helps to get you started.

Related

Tidying up the ggplot pie chart

After looking at various post and asking questions here i have been able to make a multi faceted pie chart. But i am facing a problem in tidying up the pie chart. Here are the things i am having troubles with:
How do i remove the facet labels from each row and only have one facet label on the top or bottom and left or right? How do i control how the facet label looks?
I have tried using facet_grid instead of facet_wrap and that removes the label from each row but still the labels are inside a box. I would like to remove the box which i donot seem to be able to do.
Centering the labels so that the values for each fraction of the pie is inside that pie-slice.
Some of my piechart have 8 to 10 values and they are not always inside there fraction. First i used geom_text_repel but that only helped me to repel the text. It didnt place the text inside each fraction. I also looked at this thread. I tried that by creating a new dataframe which has a position values and using that pos inside geom_text like so d<-c %>% group_by(Parameter)%>% mutate(pos= ave(Values, Zones, FUN = function(x) cumsum(x) - 0.5 * x)) and using the same code to make pie chart for d dataframe but it didnt quite work.
Grouping the values under certain level into one single "other" groups so the number of slices would be less
It would be ideal for me to be able to group the values with less than 1 % into one single group and call it "others" so that the number of slices are less. So far i have to completely ignore those values by c<-c[c$Values>1,] and using this newly created data frame.
Any suggestions/help regarding these issues would be helpful.
Following is the reproducible example of my current pie chart:
library(RColorBrewer)
library(ggrepel)
library(ggplot2)
library(tidyverse)
my_pal <- colorRampPalette(brewer.pal(9, "Set1"))
#### create new matrix ############
new_mat<-matrix(, nrow=40, ncol = 4)
colnames(new_mat)<-c("Zones", "ssoilcmb", "Erosion_t", "area..sq.m.")
for ( i in 1:nrow(new_mat)){
new_mat[i,4]<-as.numeric(sample(0:20, 1))
new_mat[i,3]<-as.numeric(sample(0:20, 1))
a<-sample(c("S2","S3","S4","S5","S1"),1)
b<-sample(c("Deep","Moderate","Shallow"),1)
new_mat[i,1]<-sample(c("High Precip","Moderate Precip","Low Precip"),1)
new_mat[i,2]<-paste0(a,"_",b)
}
m_dt<-as.data.frame(new_mat)
m_dt$Erosion_t<-as.numeric(m_dt$Erosion_t)
m_dt$area..sq.m.<-as.numeric(m_dt$area..sq.m.)
#### calculate parea
m_dt<- m_dt %>%
group_by(Zones)%>%
mutate(per_er=signif((`Erosion_t`/sum(`Erosion_t`))*100,3), per_area=signif((`area..sq.m.`/sum(`area..sq.m.`))*100,3))
## Rearranging data:
a<-data.frame(m_dt$Zones,m_dt$ssoilcmb, m_dt$per_er)
b<-data.frame(m_dt$Zones,m_dt$ssoilcmb, m_dt$per_area)
c<-data.frame(Zones=m_dt$Zones,ssoilcmb=m_dt$ssoilcmb,
Parameter=c(rep("Erosion",40),rep("Area",40)),
Values=c(m_dt$per_er,m_dt$per_area))
### New Plot ###
ggplot(c, aes(x="", y=Values, fill=ssoilcmb)) +
geom_bar(stat="identity", width=1, position = position_fill())+
coord_polar("y", start=0) +
facet_wrap(Zones~Parameter, nrow = 3) +
geom_text_repel(aes(label = paste0(Values, "%")), position = position_fill(vjust = 0.5))+
scale_fill_manual(values=my_pal(15)) +
labs(x = NULL, y = NULL, fill = NULL, title = "Erosions")+
theme_classic() + theme(axis.line = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank(),
plot.title = element_text(hjust = 0.5, color = "#666666"))
If you're open to alternatives, maybe a facet_wrapped barplot will suit your needs, e.g.
library(RColorBrewer)
library(ggrepel)
library(tidyverse)
my_pal <- colorRampPalette(brewer.pal(9, "Set1"))
#### create new matrix ############
new_mat<-matrix(nrow=40, ncol = 4)
colnames(new_mat)<-c("Zones", "ssoilcmb", "Erosion_t", "area..sq.m.")
for ( i in 1:nrow(new_mat)){
new_mat[i,4]<-as.numeric(sample(0:20, 1))
new_mat[i,3]<-as.numeric(sample(0:20, 1))
a<-sample(c("S2","S3","S4","S5","S1"),1)
b<-sample(c("Deep","Moderate","Shallow"),1)
new_mat[i,1]<-sample(c("High Precip","Moderate Precip","Low Precip"),1)
new_mat[i,2]<-paste0(a,"_",b)
}
m_dt<-as.data.frame(new_mat)
m_dt$Erosion_t<-as.numeric(m_dt$Erosion_t)
m_dt$area..sq.m.<-as.numeric(m_dt$area..sq.m.)
#### calculate parea
m_dt<- m_dt %>%
group_by(Zones)%>%
mutate(per_er=signif((`Erosion_t`/sum(`Erosion_t`))*100,3),
per_area=signif((`area..sq.m.`/sum(`area..sq.m.`))*100,3))
## Rearranging data:
a<-data.frame(m_dt$Zones,m_dt$ssoilcmb, m_dt$per_er)
b<-data.frame(m_dt$Zones,m_dt$ssoilcmb, m_dt$per_area)
c<-data.frame(Zones=m_dt$Zones,ssoilcmb=m_dt$ssoilcmb,
Parameter=c(rep("Erosion",40),rep("Area",40)),
Values=c(m_dt$per_er,m_dt$per_area))
### New Plot ###
c$Zones <- factor(c$Zones,levels(c$Zones)[c(2,3,1)])
ggplot(c, aes(x=ssoilcmb, y=Values, fill=ssoilcmb)) +
geom_col()+
facet_wrap(Zones~Parameter, nrow = 3) +
scale_fill_manual(values=my_pal(15)) +
labs(x = NULL, fill = NULL, title = "Erosions")+
theme_minimal() + theme(axis.line = element_blank(),
axis.ticks = element_blank(),
axis.text.x = element_text(angle = 90,
hjust = 1,
vjust = 0.5),
plot.title = element_text(hjust = 0.5,
color = "#666666"))

Make side by side pie-chart for Two different columns in ggplot also facet wrap it for different factors

I am trying to make some pie chart for the data matrix that I have. I have created a dummy matrix below with similar variables. I have two result value to work with that is "AREA" and "EROSION". For each result values I can have one of three factors that is "High Precip Zones", "Moderate Precip. Zones" and "Low Precip. Zones".
I want to show this in piechart of 3 rows and 2 columns. Each row needs to have "Area" pie chart and "Erosion" pie chart for each Zones. I am able to separate the pie chart by adding facet_wrap(~Zones) but I am not sure what I can do to separate the pie chart by two different columns? So that I can show Erosion pie chart and Area pie chart side by side in each row.
Also my original matrix can have upto 15 values in each row which means pie chart is going to be crowded. Is there a way or may be a easy function that I can use the ignore values that are 0 or below certain threshold?
I will appreciate any help and suggestion regarding this.
library(ggplot2)
library(tidyverse)
library(RColorBrewer)
my_pal <- colorRampPalette(brewer.pal(9, "Set1"))
#### create new matrix ############
new_mat<-matrix(, nrow=40, ncol = 4)
colnames(new_mat)<-c("Zones", "ssoilcmb", "Erosion_t", "area..sq.m.")
for ( i in 1:nrow(new_mat)){
new_mat[i,4]<-as.numeric(sample(0:20, 1))
new_mat[i,3]<-as.numeric(sample(0:20, 1))
a<-sample(c("S2","S3","S4","S5","S1"),1)
b<-sample(c("Deep","Moderate","Shallow"),1)
new_mat[i,1]<-sample(c("High Precip","Moderate Precip","Low Precip"),1)
new_mat[i,2]<-paste0(a,"_",b)
}
m_dt<-as.data.frame(new_mat)
m_dt$Erosion_t<-as.numeric(m_dt$Erosion_t)
m_dt$area..sq.m.<-as.numeric(m_dt$area..sq.m.)
#### calculate parea
m_dt<- m_dt %>%
group_by(Zones)%>%
mutate(per_er=signif((`Erosion_t`/sum(`Erosion_t`))*100,3), per_area=signif((`area..sq.m.`/sum(`area..sq.m.`))*100,3))
############ plot
ggplot(m_dt, aes(x="", y=per_er, fill=ssoilcmb)) + geom_bar(stat="identity", width=1, position = position_fill())+
coord_polar("y", start=0) + facet_wrap(~ Zones) +geom_text_repel(aes(label = paste0(per_er, "%")), position = position_fill(vjust = 0.5))+
scale_fill_manual(values=my_pal(15)) +
labs(x = NULL, y = NULL, fill = NULL, title = "Erosions")+
theme_classic() + theme(axis.line = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank(),
plot.title = element_text(hjust = 0.5, color = "#666666"))
I think you need to rearrange your data in a suitable manner rather than playing with the graphical language itself.
I have given my suggestion here. Please forgive the High, Medium and Low having changed to 1, 2 and 3. I did that to make sense of the data for myself. You will obviously be renaming it.
library(tidyverse)
library(RColorBrewer)
library(ggrepel)
my_pal <- colorRampPalette(brewer.pal(9, "Set1"))
#### create new matrix ############
new_mat<-matrix(, nrow=40, ncol = 4)
colnames(new_mat)<-c("Zones", "ssoilcmb", "Erosion_t", "area..sq.m.")
for ( i in 1:nrow(new_mat)){
new_mat[i,4]<-as.numeric(sample(0:20, 1))
new_mat[i,3]<-as.numeric(sample(0:20, 1))
a<-sample(c("S2","S3","S4","S5","S1"),1)
b<-sample(c("Deep","Moderate","Shallow"),1)
new_mat[i,1]<-sample(c("High Precip","Moderate Precip","Low Precip"),1)
new_mat[i,2]<-paste0(a,"_",b)
}
m_dt<-as.data.frame(new_mat)
m_dt$Erosion_t<-as.numeric(m_dt$Erosion_t)
m_dt$area..sq.m.<-as.numeric(m_dt$area..sq.m.)
#### calculate parea
m_dt<- m_dt %>%
group_by(Zones)%>%
mutate(per_er=signif((`Erosion_t`/sum(`Erosion_t`))*100,3), per_area=signif((`area..sq.m.`/sum(`area..sq.m.`))*100,3))
## You must rearrange your data as given here:
a<-data.frame(m_dt$Zones,m_dt$ssoilcmb, m_dt$per_er)
b<-data.frame(m_dt$Zones,m_dt$ssoilcmb, m_dt$per_area)
c<-data.frame(Zones=Zones,ssoilcmb=m_dt$ssoilcmb,
Parameter=c(rep("Erosion",40),rep("Area",40)),
Values=c(m_dt$per_er,m_dt$per_area))
### Your New Plot ###
ggplot(c, aes(x="", y=Values, fill=ssoilcmb)) +
geom_bar(stat="identity", width=1, position = position_fill())+
coord_polar("y", start=0) +
facet_wrap(Zones~Parameter, nrow = 3) +
geom_text_repel(aes(label = paste0(Values, "%")), position = position_fill(vjust = 0.5))+
scale_fill_manual(values=my_pal(15)) +
labs(x = NULL, y = NULL, fill = NULL, title = "Erosions")+
theme_classic() + theme(axis.line = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank(),
plot.title = element_text(hjust = 0.5, color = "#666666"))

Stat summary for each factor in scatter plot ggplot2: What about fun.x, fun_y combinations?

I have a bunch of data for people touching bacteria for up to 5 touches. I'm comparing how much they pick up with and without gloves. I'd like to plot the mean by the factor NumberContacts and colour it red. E.g. the red dots on the following graphs.
So far I have:
require(tidyverse)
require(reshape2)
Make some data
df<-data.frame(Yes=rnorm(n=100),
No=rnorm(n=100),
NumberContacts=factor(rep(1:5, each=20)))
Calculate the mean for each group= NumberContacts
centroids<-aggregate(data=melt(df,id.vars ="NumberContacts"),value~NumberContacts+variable,mean)
Get them into two columns
centYes<-subset(centroids, variable=="Yes",select=c("NumberContacts","value"))
centNo<-subset(centroids, variable=="No",select="value")
centroids<-cbind(centYes,centNo)
colnames(centroids)<-c("NumberContacts","Gloved","Ungloved")
Make an ugly plot.
ggplot(df,aes(x=gloves,y=ungloved)+
geom_point()+
geom_abline(slope=1,linetype=2)+
stat_ellipse(type="norm",linetype=2,level=0.975)+
geom_point(data=centroids,size=5,color='red')+
#stat_summary(fun.y="mean",colour="red")+ doesn't work
facet_wrap(~NumberContacts,nrow=2)+
theme_classic()
Is there a more elegant way by using stat_summary? Also How can I change the look of the boxes at the top of my graphs?
stat_summary is not an option because (see ?stat_summary):
stat_summary operates on unique x
That is, while we can take a mean of y, x remains fixed. But we may do something else that is very concise:
ggplot(df, aes(x = Yes, y = No, group = NumberContacts)) +
geom_point() + geom_abline(slope = 1, linetype = 2)+
stat_ellipse(type = "norm", linetype = 2, level = 0.975)+
geom_point(data = df %>% group_by(NumberContacts) %>% summarise_all(mean), size = 5, color = "red")+
facet_wrap(~ NumberContacts, nrow = 2) + theme_classic() +
theme(strip.background = element_rect(fill = "black"),
strip.text = element_text(color = "white"))
which also shows that to modify the boxes above you want to look at strip elements of theme.

geom_point isn't filled by scale_fill_manual

I would like to draw a chart with ggplot for a couple of model accuracies. The detail of the plotted result doesn't matter, however, I've a problem to fill the geom_point objects.
A sample file can be found here: https://ufile.io/z1z4c
My code is:
library(ggplot2)
library(ggthemes)
Palette <- c('#A81D35', '#085575', '#1DA837')
results <- read.csv('test.csv', colClasses=c('factor', 'factor', 'factor', 'numeric'))
results$dates <- factor(results$dates, levels = c('01', '15', '27'))
results$pocd <- factor(results$pocd, levels = c('without POCD', 'with POCD', 'null accuracy'))
results$model <- factor(results$model, levels = c('SVM', 'DT', 'RF', 'Ada', 'NN'))
ggplot(data = results, group = pocd) +
geom_point(aes(x = dates, y = acc,
shape = pocd,
color = pocd,
fill = pocd,
size = pocd)) +
scale_shape_manual(values = c(0, 1, 3)) +
scale_color_manual(values = c(Palette[1], Palette[2], Palette[3])) +
scale_fill_manual(values = c(Palette[1], Palette[2], Palette[3])) +
scale_size_manual(values = c(2, 2, 1)) +
facet_grid(. ~ model) +
xlab('Date of knowledge') +
ylab('Accuracy') +
theme(legend.position = 'right',
legend.title = element_blank(),
axis.line = element_line(color = '#DDDDDD'))
As a result I get unfilled circles and squares. How can I fix it, so that the squares and circles are filled with the specfic color?
Additional question: I would like to add a geom_line to the graph, connecting the three points in each group. However, I fail to adjust linetype and width. It always take the values of scale_*_manual, which is very adverse especially in the case of size.
Thanks for helping!
You need to change the shapes specified, like so:
scale_shape_manual(values = c(21,22,23)) +
For your additional question, that should be solved if you set aes(size=) in the first part of your code (under ggplot(data=...) and then manually specify size=1 under geom_line as +geom_line(size=1....`

How can I make a Frequency distribution bar plot in ggplot2?

Sample of the dataset.
nq
0.140843018
0.152855833
0.193245919
0.156860105
0.171658019
0.186281942
0.290739146
0.162779517
0.164694042
0.171658019
0.195866609
0.166967913
0.136841748
0.108907644
0.264136384
0.356655651
0.250508305
I would like to make a Percentage Bar plot/Histogram like this question: RE: Alignment of numbers on the individual bars with ggplot2
The max value of NQ for full dataset is 21 and minimum value is 0.00005
But I am unable to adapt the code as I don't have a Freq column and I have one series.
I have made a mockup of the figure I am trying to make.
Could you please help?
Would that work for you?
nq <- read.table(text = "
0.140843018
0.152855833
0.193245919
0.156860105
0.171658019
0.186281942
0.290739146
0.162779517
0.164694042
0.171658019
0.195866609
0.166967913
0.136841748
0.108907644
0.264136384
0.356655651
0.250508305", header = F) # Your data
nq$V2 <- cut(nq$V1, 5, include.lowest = T)
nq2 <- aggregate(V1 ~ V2, nq, length)
nq2$V3 <- nq2$V1/sum(nq2$V1)
library(ggplot2)
ggplot() + geom_bar(data = nq2, aes(V2, V1), stat = "identity", width=1, fill = "white", col = "black", size = 2) +
geom_text(vjust=1, fontface="bold", data = nq2, aes(label = paste(sprintf("%.1f", V3*100), "%", sep=""), x = V2, y = V1 + 0.4), size = 5) +
theme_bw() +
scale_x_discrete(expand = c(0,0), labels = sprintf("%.3f",seq(min(nq$V1), max(nq$V1), by = max(nq$V1)/6))) +
ylab("No. of Cases") + xlab("") +
scale_y_continuous(expand = c(0,0)) +
theme(
axis.title.y = element_text(size = 20, face = "bold", angle = 0),
panel.grid.major = element_blank() ,
panel.grid.minor = element_blank() ,
panel.border = element_blank() ,
panel.background = element_blank(),
axis.line = element_line(color = 'black', size = 2),
axis.text.x = element_text(face="bold"),
axis.text.y = element_text(face="bold")
)
I thought this would be easy, but it turned out to be frustrating. So perhaps the "right" way is to transform your data before using ggplot as it looks like #DavidArenburg has done. But, if you feel like hacking ggplot, here's what I ended up doing.
First, some sample data.
set.seed(15)
dd<-data.frame(x=sample(1:25, 100, replace=T, prob=25:1))
br <- seq(0,25, by=5) # break points
My first attempt was
library(ggplot2)
ggplot(dd, aes(x)) +
stat_bin(position="stack", breaks=br) +
geom_text(aes(y=..count.., label=..density..*..width.., ymax=..count..+1),
vjust=-.5, breaks=br, stat="bin")
but that didn't make "pretty labels"
so i thought i'd use the percent() function from the scales package to make it pretty. However, silly ggplot doesn't really make it possible to use functions with ..().. variables because it evaluates them in the data.frame only (then the empty baseenv()). It doesn't have a way to find the function you use. So this is when I turned to hacking. First i'll extract the "Layer" definition from ggplot and the map_statistic from it. (NOTE: this was done with "ggplot2_1.0.0" and is specific to that version; this is a private function that may change in future releases)
orig.map_statistic <- ggplot2:::Layer$map_statistic
new.map_statistic <- orig.map_statistic
body(new.map_statistic)[[9]]
# stat_data <- as.data.frame(lapply(new, eval, data, baseenv()))
here's the line that's causing grief I would prefer it the function resolved other names in the plot environment that are not found in the data.frame. So I decided to change it with
body(new.map_statistic)[[9]] <- quote(stat_data <- as.data.frame(lapply(new, eval, data, plot$plot_env)))
assign("map_statistic", new.map_statistic, envir=ggplot2:::Layer)
So now I can use functions with ..().. variables. So I can do
library(scales)
ggplot(dd, aes(x)) +
stat_bin(position="stack", breaks=br) +
geom_text(aes(y=..count.., ymax=..count..+2,
label=percent(..density..*..width..)),
vjust=-.5, breaks=br, stat="bin")
to get
So i'm not sure why ggplot has this default behavior. There could be some good reason for it but I don't know what it is. This does change how ggplot will behave for the rest of the session. You can change back to default with
assign("map_statistic", orig.map_statistic, envir=ggplot2:::Layer)

Resources