I have a sample data set containing an end-of-week date and a churn value that is either negative or positive. In ggplot2 I use scale_fill_manual() with the sign of the value as the group.
This works perfectly fine, showing the colors for positive versus negative values, and the legend labels are rewritten according to the labels provided. However, if I simply turn it into a plotly graph, I lose my labels and they are set back to the -1, 1 factor levels instead. Does plotly not support this, and if so, is there another way to get this done?
library(ggplot2)
library(plotly)
dt <- structure(list(date = structure(c(18651L, 18658L, 18665L, 18672L,
18679L, 18686L, 18693L, 18700L, 18707L, 18714L), class = c("IDate",
"Date")), churn = c(-3.27088948787062, -0.582518144525087, -0.125024925224327,
-0.333746898263027, -0.685714285714286, -0.340165549862042, 0.0601176470588235,
-0.119351608461635, -0.0132513279284316, -0.011201854099989)), row.names = c(NA,
-10L), class = c("data.table", "data.frame"))
plot_ggplot <- ggplot(dt, aes(x = date, y = churn * 100)) +
geom_bar(stat = "identity", aes(fill = factor(sign(churn)))) +
scale_fill_manual(
values = c("#4da63f", "#e84e62"),
breaks = c("-1", "1"),
labels = c("Growing base", "Declining base")
) +
ylim(-75, 25) +
labs(
title = "Weekly churn rate",
fill = "Legend"
)
plot_ggplot
plot_ggplotly <- ggplotly(plot_ggplot)
plot_ggplotly
Does this do the trick?
# Negative churn means a growing base, matching the labels in the question
dt$base <- ifelse(dt$churn < 0, "Growing base", "Declining base")
plot_ggplot <- ggplot(dt, aes(x = date, y = churn * 100)) +
geom_bar(stat = "identity", aes(fill = base)) +
scale_fill_manual(
values = c("Growing base" = "#4da63f", "Declining base" = "#e84e62")
) +
ylim(-75, 25) +
labs(
title = "Weekly churn rate",
fill = "Legend"
)
plot_ggplot
plot_ggplotly <- ggplotly(plot_ggplot)
edit: I just read the comment, I think it is what was suggested
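One way to read the answer above: the legend text is stored in the data rather than in the scale, so it no longer depends on how ggplotly() treats the labels argument. Here is a minimal base R sketch of the same idea using factor() with explicit levels and labels. The toy churn values and the name base_label are assumptions for illustration, and the direction (negative churn means a growing base) follows the labels in the question.

```r
# Toy churn values (assumed for illustration, not the question's data)
churn <- c(-3.27, -0.58, 0.06, -0.12)

# Store the legend text in the factor itself instead of
# passing it through scale_fill_manual(labels = ...)
base_label <- factor(sign(churn),
                     levels = c(-1, 1),
                     labels = c("Growing base", "Declining base"))

print(table(base_label))
```

Mapping fill = base_label and passing named colours to scale_fill_manual(values = ...) would then need no labels argument. Note that a churn of exactly 0 becomes NA with these levels.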
I have two data frames like the following:
df_hk_genes_pre = structure(list(ACTB = c(11.6704399, 12.458028, 11.200511, 12.3073524,
12.066374, 12.064411, 12.1516557, 8.669943, 12.045182, 12.35896,
11.3328069, 10.226411, 11.8971381, 12.0288182, 11.6919341, 12.0735249,
11.8812387, 11.8266266, 11.5526943, 12.3936434), ATP5F1 = c(8.6677137,
8.260138, 8.421619, 8.1465627, 8.956782, 8.792251, 8.6480966,
8.700314, 8.850915, 8.446602, 8.7311666, 8.762719, 8.1397597,
7.9228606, 8.909108, 8.8039817, 8.4693453, 8.5861887, 8.2678096,
8.7482762)), row.names = c(NA, 20L), class = "data.frame")
df_hk_genes_post = structure(list(ACTB = c(11.7087918, 13.1847403, 8.767737, 12.2949669,
12.399929, 12.130683, 9.816222, 10.700336, 11.862543, 12.479818,
12.48152, 11.798277, 12.0932696, 11.014992, 12.3496682, 11.9810211,
11.946094, 12.1517049, 11.6794028, 12.4895911), ATP5F1 = c(8.3731175,
8.3995189, 8.871088, 8.4389342, 8.529104, 9.004405, 8.883721,
8.70097, 8.24411, 8.393635, 8.76813, 8.756177, 8.4418168, 7.986864,
8.4840108, 8.6523954, 8.5645576, 8.2452877, 8.2440872, 8.7155973
)), row.names = c(NA, 20L), class = "data.frame")
I used the following code to generate grid of histograms for each separately:
setDT(df_hk_genes_post)
melt(df_hk_genes_post) %>%
ggplot(aes(x = value)) +
facet_wrap(~ variable, nrow = 2, scale = "free") +
geom_histogram(
fill="#69b3a2", color="#69b3a2", alpha=0.4, position="identity", bins=20
) +
scale_x_continuous(
sec.axis = sec_axis(
~ . , name = "CPM of House-Keeping Genes Distribution",
breaks = NULL, labels = NULL
)
)
But now I wish to plot both on the same grid with different colors. Is this possible using the same snippet, or should a different approach be taken?
You could use the following code. First make a single data frame with an extra column specifying which rows are pre and which are post.
Then generate a plot that also facets on the PrePost variable.
library(data.table)
library(ggplot2)
library(magrittr) # provides %>%
## add column identifying pre or post: PrePost
## and rowbind together,
## make a factor from PrePost
df_hk_genes_pre$PrePost <- "pre"
df_hk_genes_post$PrePost <- "post"
df_hk_genes_all <- rbind(df_hk_genes_pre, df_hk_genes_post)
df_hk_genes_all$PrePost <- factor(df_hk_genes_all$PrePost)
## plot with facets in rows for "PrePost"
## and facets in columns for "variable"
setDT(df_hk_genes_all)
melt(df_hk_genes_all) %>%
ggplot(aes(x = value, fill = PrePost)) + ### fill col based on PrePost
facet_grid(cols = vars(variable), rows = vars(PrePost)) + ### PrePost in facet rows
geom_histogram(
bins=20
, color= "grey" ### visually distinct bars
) +
scale_x_continuous(
sec.axis = sec_axis(
~ . , name = "CPM of House-Keeping Genes Distribution",
breaks = NULL, labels = NULL
)
)
This yields the following graph:
If you want to change the fill colors, you could add a line similar to the following:
scale_fill_manual(values= c("#69b3a2", "#25a3c9")) +
Please, let me know whether this is what you had in mind.
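The combine-then-reshape step can also be sketched in base R, without data.table. This is only an illustration under assumptions: the two small data frames below use made-up values, and stack() names its output columns values and ind rather than value and variable.

```r
# Two small wide data frames (values assumed for illustration)
pre  <- data.frame(ACTB = c(11.7, 12.5), ATP5F1 = c(8.7, 8.3))
post <- data.frame(ACTB = c(11.7, 13.2), ATP5F1 = c(8.4, 8.4))

# stack() turns each wide frame into long format
# (columns "values" and "ind"); the extra column marks the source
long_pre  <- cbind(stack(pre),  PrePost = "pre")
long_post <- cbind(stack(post), PrePost = "post")

# Row-bind both and fix the level order so "pre" comes first
long_all <- rbind(long_pre, long_post)
long_all$PrePost <- factor(long_all$PrePost, levels = c("pre", "post"))

print(long_all)
```

Setting the levels explicitly is a design choice: with the default alphabetical order, "post" would be drawn before "pre" in facets and legends.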
Edit 01
If you want to have pre and post on the same subplot, then you may use
position = "dodge" as argument to geom_histogram()
setDT(df_hk_genes_all)
melt(df_hk_genes_all) %>%
ggplot(aes(x = value, fill = PrePost)) + ### fill col based on PrePost
facet_grid(cols = vars(variable)) +
geom_histogram(
bins=20
, color= "grey" ### visually distinct bars
, position = "dodge" ### dodging
) +
scale_x_continuous(
sec.axis = sec_axis(
~ . , name = "CPM of House-Keeping Genes Distribution",
breaks = NULL, labels = NULL
)
)
... yielding this plot:
Similarly, you could combine both variable levels on the same plot if you wanted to demonstrate a closer overlap:
library(dplyr) # mutate(), bind_rows(), %>%
library(ggplot2)
df_hk_genes_pre2 = reshape::melt(df_hk_genes_pre) %>%
mutate(status = "pre")
df_hk_genes_post2 = reshape::melt(df_hk_genes_post) %>%
mutate(status = "post")
bind_rows(df_hk_genes_pre2, df_hk_genes_post2) %>%
mutate(status = factor(status, levels = c("pre", "post"))) %>%
ggplot(aes(x = value, fill = variable)) +
facet_wrap(~ status, nrow = 2, scale = "free") +
geom_histogram(position = "identity", bins = 20, alpha = .4) +
scale_fill_manual(values= c("#69b3a2", "black")) +
scale_color_manual(values = c("#69b3a2", "black")) +
scale_x_continuous(
sec.axis = sec_axis(
~ . , name = "CPM of House-Keeping Genes Distribution",
breaks = NULL, labels = NULL
)
)
Below is the code I am having trouble with and its output. The data set is linked at the bottom of the post.
What I am wanting to do is group the StateCodes together with each MSN (opposite of what is showing now in the output).
plotdata <- EnergyData %>%
filter(MSN %in% c("BMTCB", "GETCB", "HYTCB", "SOTCB", "WYTCB")) %>%
filter(Year %in% c("2009")) %>%
select(StateCode, MSN, Data) %>%
group_by(StateCode) %>%
mutate(pct = Data/sum(Data),
lbl = scales::percent(pct))
plotdata
This outputs to:
I thought that the group_by function would do that for me but I would like to know if I am missing a key chunk of code?
Once the above chunk runs correctly, I want to create side-by-side bar charts by StateCode using the percentages of each of the 5 MSNs.
Here's the code I have so far.
ggplot(EnergyData,
aes(x = factor(StateCode,
levels = c("AZ", "CA", "NM", "TX")),
y = pct,
fill = factor(drv,
levels = c("BMTCB", "GETCB", "HYTCB", "SOTCB", "WYTCB"),
labels = c("BMTCB", "GETCB", "HYTCB", "SOTCB", "WYTCB")))) +
geom_bar(stat = "identity",
position = "fill") +
scale_y_continuous(breaks = seq(0, 1, .2),
label = pct) +
geom_text(aes(label = lbl),
size = 3,
position = position_stack(vjust = 0.5)) +
scale_fill_brewer(palette = "Set2") +
labs(y = "Percent",
fill = "MSN",
x = "State",
title = "Renewable Resources by State") +
theme_minimal()
As of now I believe this all has to do with how I create the percentages for the bar charts.
Any assistance would be great. Thank you!
Here's the data I used: Energy Data, http://www.mathmodels.org/Problems/2018/MCM-C/ProblemCData.xlsx
Here is a version using data.table for the initial filtering, and changes to the plot function that hopefully get you the result you are after:
library(readxl)
library(data.table)
library(ggplot2)
# mode = "wb" keeps the binary .xlsx intact on Windows
download.file("http://www.mathmodels.org/Problems/2018/MCM-C/ProblemCData.xlsx", "~/ex/ProblemCData.xlsx", mode = "wb")
# by default, factor levels will be in alphabetical order, so we do not need to specify that
EnergyData <- data.table(read_xlsx("~/ex/ProblemCData.xlsx"), key="StateCode", stringsAsFactors = TRUE)
# filter by Year and MSN list
plotdata <- EnergyData[as.character(MSN) %chin% c("BMTCB", "GETCB", "HYTCB", "SOTCB", "WYTCB") & Year == 2009]
# calculate percentages of Data by StateCode
plotdata[, pct := Data/sum(Data), by = "StateCode"]
# plot using percent format and specified number of breaks
ggplot(plotdata,
aes(x = StateCode,
y = pct,
fill = MSN)) +
geom_bar(stat = "identity",
position = "fill") +
scale_y_continuous(labels = scales::percent_format(accuracy = 1), n.breaks = 6) +
scale_fill_brewer(palette = "Set2") +
labs(y = "Percent",
fill = "MSN",
x = "State",
title = "Renewable Resources by State") +
theme_minimal()
Created on 2020-03-20 by the reprex package (v0.3.0)
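The key step in this answer is the per-state percentage, pct := Data/sum(Data) with by = "StateCode". A base R sketch of the same calculation using ave() may make it easier to check; the toy values below are assumptions for illustration and do not come from the linked spreadsheet.

```r
# Toy data (assumed values, not from the linked spreadsheet)
toy <- data.frame(
  StateCode = c("AZ", "AZ", "CA", "CA"),
  MSN       = c("HYTCB", "SOTCB", "HYTCB", "SOTCB"),
  Data      = c(30, 10, 50, 150)
)

# Share of each MSN within its state: divide by the per-state total
toy$pct <- toy$Data / ave(toy$Data, toy$StateCode, FUN = sum)
print(toy)

# The shares add up to 1 inside every state
print(tapply(toy$pct, toy$StateCode, sum))
```

If the shares did not sum to 1 per state, the grouping variable would be the first thing to check.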
I am trying to create a bar plot with base R plotting, and a geom_point plot with ggplot, to visualize two variables by a column of factors.
I have created a data sheet that holds R-squared values for the Species and Elasmobranchs variables, calculated using each factor.
Graph <- structure(list(Factors = structure(c(5L, 4L, 11L, 6L, 8L, 10L
), .Label = c("Activity", "Bait", "Depth", "Location", "Marine Park",
"Month", "Sea State", "Start Time", "Substrate", "Swell", "Year"
), class = "factor"), Species = c(0.1064, 0.5806, 0.05974, 0.07888,
0.1325, 0.05725), Elasmobranchs = c(0.02658, 0.4074, 0.02072,
0.1419, 0.1065, 0.08661)), row.names = c(NA, 6L), class = "data.frame")
ggplot(data = Graph, aes(x = Species, y = Factors, colour = Factors)) +
geom_point(size = 3) +
xlab("Species Richness") +
ylab("Factors") +
theme_classic()
barplot(Species~Factors, data = Graph, xlab="R-Squared Values", ylab="Factors",
horiz = TRUE, # barplot() takes horiz, not horizontal
main = "Factor correlations with species richness", frame.plot=FALSE)
barplot(Elasmobranchs~Factors, data = Graph, xlab="R-Squared Values", ylab="Factors",
horiz = TRUE,
main = "Factor correlations with species richness", frame.plot=FALSE)
These ggplot and standard plots work nicely; however, I would simply like to add Elasmobranchs on the x-axis alongside Species and have the result displayed in decreasing order. Is there a simple way to do this by adding a small line of code to my existing plots?
Thank you for any assistance.
If I understood correctly what you're trying to do, you can reshape your data frame using pivot_longer (formerly gather) and then differentiate between Species and Elasmobranchs through shape. You can reverse the scale using scale_x_reverse.
library(tidyverse)
ggplot(data = Graph %>% pivot_longer(-Factors),
aes(x = value, y = Factors, colour = Factors, shape=name)) +
geom_point(size = 3) +
xlab("Species Richness") +
ylab("Factors") +
scale_x_reverse() +
theme_classic()
EDIT:
In case you're working with an older version of tidyr that doesn't have the pivot_longer function, you can use gather (from the same package).
ggplot(data = Graph %>% gather("name", "value", -Factors),
aes(x = value, y = Factors, colour = Factors, shape=name)) +
geom_point(size = 3) +
xlab("Species Richness") +
ylab("Factors") +
scale_x_reverse() +
theme_classic()
EDIT 2:
To reorder the y-axis based on the x-axis values:
ggplot(data = Graph %>% gather("name", "value", -Factors),
aes(x = value, y = fct_reorder(Factors, value), colour = Factors, shape=name)) +
geom_point(size = 3) +
xlab("Species Richness") +
ylab("Factors") +
theme_classic()
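To see what the reordering does to the factor levels, here is a base R sketch using reorder(), which plays a similar role to fct_reorder(). The toy factor and values are assumptions for illustration. FUN = median is passed explicitly because reorder() summarises with the mean by default, whereas fct_reorder() uses the median.

```r
# Toy long-format data: two values per factor level (assumed values)
f     <- factor(c("Location", "Month", "Year", "Location", "Month", "Year"))
value <- c(0.58, 0.08, 0.06, 0.41, 0.14, 0.02)

# Reorder the levels by the median value within each level (ascending)
f_sorted <- reorder(f, value, FUN = median)

print(levels(f))         # alphabetical default
print(levels(f_sorted))  # ordered by median value
```

On a plot, the first level sits at the bottom of the y-axis, so ascending order puts the largest values at the top.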
My data frame looks like
df
Group value
1 Positive 52
2 Negative 239
3 Neutral 9
I would like to make a pie chart of the data frame using ggplot.
pie <- ggplot(df, aes(x="", y=value, fill=Group)) +
geom_bar(width = 1, stat = "identity") +
coord_polar("y", start=0)
This is my pie chart.
But when I try to add percentage labels on the chart
pie <- ggplot(df, aes(x="", y=value, fill=Group)) +
geom_bar(width = 1, stat = "identity") +
coord_polar("y", start=0) +
geom_text(aes(y = value/2 + c(0, cumsum(value)[-length(value)]),
label = percent(value/300 )), size=5)
This is my result.
I have already seen many questions similar to mine, e.g. "R + ggplot2 => add labels on facet pie chart", and the solutions are not helping.
How about:
vals <- c(239, 52, 9)
val_names <- sprintf("%s (%s)", c("Negative", "Positive", "Neutral"), scales::percent(round(vals/sum(vals), 2)))
names(vals) <- val_names
waffle::waffle(vals) +
ggthemes::scale_fill_tableau(name=NULL)
instead?
It's "fresher" than a pie chart and you aren't really gaining anything with the level of precision you have/want on those pie labels now.
For example, I create a vector e3 with 400 vehicles:
e3 <- rep( c("car", "truck", "other", "bike", "suv"), c(60, 120, 20, 50, 150))
Since pie charts are especially useful for proportions, let's have a look at the proportions of our vehicles, which we will then report on the graph:
paste(prop.table(table(e3))*100, "%", sep = "")
[1] "12.5%" "15%" "5%" "37.5%" "30%"
Then you can draw your pie chart,
pie(table(e3), labels = paste(round(prop.table(table(e3))*100), "%", sep = ""),
col = heat.colors(5), main = "Vehicles proportions - n: 400")
Here is an idea matching the order of groups in the pie chart and the order of labels. I sorted the data in descending order by value. I also calculated the percentage in advance. When I drew the ggplot figure, I specified the order of Group in the order in mydf (i.e., Negative, Positive, and Neutral) using fct_inorder(). When geom_label_repel() added labels to the pie, the order of label was identical to that of the pie.
library(dplyr)
library(ggplot2)
library(ggrepel)
library(forcats)
library(scales)
mydf %>%
arrange(desc(value)) %>%
mutate(prop = percent(value / sum(value))) -> mydf
pie <- ggplot(mydf, aes(x = "", y = value, fill = fct_inorder(Group))) +
geom_bar(width = 1, stat = "identity") +
coord_polar("y", start = 0) +
geom_label_repel(aes(label = prop), size=5, show.legend = F, nudge_x = 1) +
guides(fill = guide_legend(title = "Group"))
DATA
mydf <- structure(list(Group = structure(c(3L, 1L, 2L), .Label = c("Negative",
"Neutral", "Positive"), class = "factor"), value = c(52L, 239L,
9L)), .Names = c("Group", "value"), class = "data.frame", row.names = c("1",
"2", "3"))
I agree with @hrbrmstr that a waffle chart would be better. But to answer the original question: your problem comes from the order in which the wedges are drawn, which defaults to alphabetical. Because you calculate where to place the labels based on the ordering in your data frame, the positions come out wrong.
As a general principle of readability, do all the fancy calculations of labels and their positions before the actual code that draws the graphic.
library(dplyr)
library(ggplot2)
library(ggmap) # for theme_nothing
df <- data.frame(value = c(52, 239, 9),
Group = c("Positive", "Negative", "Neutral")) %>%
# factor levels need to be the opposite order of the cumulative sum of the values
mutate(Group = factor(Group, levels = c("Neutral", "Negative", "Positive")),
cumulative = cumsum(value),
midpoint = cumulative - value / 2,
label = paste0(Group, " ", round(value / sum(value) * 100, 1), "%"))
ggplot(df, aes(x = 1, weight = value, fill = Group)) +
geom_bar(width = 1, position = "stack") +
coord_polar(theta = "y") +
geom_text(aes(x = 1.3, y = midpoint, label = label)) +
theme_nothing()
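The label positions in this answer come from a running total: each label sits at the cumulative sum minus half of its own wedge. A small base R sketch of that arithmetic, using the three values from the question (the variable names are just for illustration):

```r
# Values from the question, in data-frame order
value <- c(Positive = 52, Negative = 239, Neutral = 9)

cumulative <- cumsum(value)            # upper edge of each wedge
midpoint   <- cumulative - value / 2   # centre of each wedge
label      <- paste0(names(value), " ",
                     round(value / sum(value) * 100, 1), "%")

print(data.frame(value, cumulative, midpoint, label))
```

These midpoints only line up with the wedges if the stacking order in the plot matches the order used for cumsum(), which is why the answer sets the factor levels explicitly.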
This is my example, using only base R code. Hope it helps.
Take iris for example
attach(iris)
check the ratio of iris$Species
a<- table(iris$Species)
class(a)
then convert the table into a matrix in order to use rownames()
a_mat<- as.matrix(a)
a_mat
calculate the percentage of each Species
a_ratio<- a_mat[,1]/sum(a_mat[,1])*100
a_ratio
since each Species accounts for 0.33333 (i.e. 33.33333%), I keep just 3 significant digits (33.3) by using signif()
a_ratio<- signif(a_ratio,3)
a_ratio
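Since signif() counts significant digits rather than decimal places, a quick side-by-side comparison may help. This is a small illustrative sketch; the variable x simply stands in for one of the percentages above.

```r
# One third expressed as a percentage
x <- 100 / 3

s3  <- signif(x, 3)          # three significant digits: 33.3
r2  <- round(x, 2)           # two decimal places: 33.33
txt <- sprintf("%.2f%%", x)  # formatted text with two decimals

print(c(s3, r2))
print(txt)
```

For labels, sprintf() or format() is often the more convenient choice, because the result is already a string.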
basic pie chart with base R
pie(a_ratio,labels=rownames(a_mat))
further add the ratio values to the labels by using paste0()
pie(a_ratio,labels=paste0(rownames(a_mat), " ", a_ratio, "%"))
I'm using ggplot2 to make some bullseye charts in R. They look delightful, and everyone is very pleased - except that they'd like to have the values of the bullseye layers plotted on the chart. I'd be happy just to put them in the lower-right corner of the plot, or even in the plot margins, but I'm having some difficulty doing this.
Here's the example data again:
critters <- structure(list(Zoo = "Omaha", Animals = 50, Bears = 10, PolarBears = 3), .Names = c("Zoo",
"Animals", "Bears", "PolarBears"), row.names = c(NA, -1L), class = "data.frame")
And how to plot it:
d <- data.frame(animal=factor(c(rep("Animals", critters$Animals),
rep("Bears", critters$Bears), rep("PolarBears", critters$PolarBears)),
levels = c("PolarBears", "Bears", "Animals"), ordered= TRUE))
grr <- ggplot(d, aes(x = factor(1), fill = factor(animal))) + geom_bar() +
coord_polar() + labs(x = NULL, fill = NULL) +
scale_fill_manual(values = c("firebrick2", "yellow2", "green3")) +
ggtitle("Animals, Bears and Polar Bears:\nOmaha Zoo") # opts() was removed from ggplot2
I'd like to add a list to, say, the lower right corner of this plot saying,
Animals: 50
Bears: 10
PolarBears: 3
But I can't figure out how. My efforts so far with annotate() have been thwarted, in part by the polar coordinates. If I have to add the numbers to the title, so be it - but I always hold out hope for a more elegant solution.
EDIT:
An important note for those who come after: the bullseye is a bar plot mapped to polar coordinates. The ggplot2 default for bar plots is, sensibly, to stack them. However, that means that the rings of your bullseye will also be stacked (e.g. the radius in my example equals the sum of all three groups, 63, instead of the size of the largest group, 50). I don't think that's what most people expect from a bullseye plot, especially when the groups are nested. Using geom_bar(position = position_identity()) will turn the stacked rings into layered circles.
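The difference between stacked and layered rings comes down to a sum versus a maximum. A tiny base R sketch with the counts from the example data (the variable names are just for illustration):

```r
# Counts from the critters example
counts <- c(Animals = 50, Bears = 10, PolarBears = 3)

# Stacked bars: the rings sit on top of each other, so radii add up
stacked_radius <- sum(counts)

# position = position_identity(): every ring starts at the centre,
# so the outer radius is just the largest group
layered_radius <- max(counts)

print(c(stacked = stacked_radius, layered = layered_radius))
```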
EDIT 2: Example from ggplot2 docs:
You can also add it directly to the plot:
grr <- ggplot(d, aes(x = factor(1), fill = factor(animal))) + geom_bar() +
coord_polar() + labs(x = NULL, fill = NULL) +
scale_fill_manual(values = c("firebrick2", "yellow2", "green3")) +
ggtitle("Animals, Bears and Polar Bears:\nOmaha Zoo") +
# annotate() takes fixed positions, so the label vectors need not match nrow(d)
annotate("text", x = 1, y = c(3,10,50)-3, label = c("3","10","50"), size = 4)
grr
You could add the numbers to the legend.
library(ggplot2)
critters <- structure(list(Zoo = "Omaha", Animals = 50, Bears = 10, PolarBears = 3), .Names = c("Zoo", "Animals", "Bears", "PolarBears"), row.names = c(NA, -1L), class = "data.frame")
d <- data.frame(animal=factor(c(rep("Animals", critters$Animals),
rep("Bears", critters$Bears), rep("PolarBears", critters$PolarBears)),
levels = c("PolarBears", "Bears", "Animals"), ordered= TRUE))
levels(d$animal) <- apply(data.frame(table(d$animal)), 1, paste, collapse = ": ")
ggplot(d, aes(x = factor(1), fill = factor(animal))) + geom_bar() +
coord_polar() + labs(x = NULL, fill = NULL) +
scale_fill_manual(values = c("firebrick2", "yellow2", "green3")) +
ggtitle("Animals, Bears and Polar Bears:\nOmaha Zoo") # opts() was removed from ggplot2