I need to add value labels for data points on two separate lines that are generated from a data frame in R using ggplot2. The following is the code snippet that I am using:
DataFrame = data.frame(Amount = c(results$Costs,
results$TotalPoC),
Legend = rep(c("Cost as % of initial costs",
"Revenue as % of cost"),
each = nrow(results)),
Year = rep(0:5,2))
p <- ggplot(DataFrame, aes(x=Year, y=Amount, group=Legend)) +
geom_line(aes(linetype=Legend))+
geom_point(aes(shape=Legend))+
geom_text(aes(label=Amount))+
theme_classic(base_size = 15) +
ggtitle("Hospital Costs and Revenues")
print(p)
However, the graph is only displaying the labels on the second line, i.e. the one corresponding to the Legend "Revenue as % of cost". How can I generate labels for data points on all lines generated from the same data frame in ggplot2?
I am not able to reproduce your example. Can you please use this dataset to reproduce your problem or draw on it what you would like to change?
library(tidyverse)
set.seed(1)
df <-
tibble(
amount = sample(10:30, 10),
legend = rep(
c("Cost as % of initial costs",
"Revenue as % of cost"),
each = 5),
year = rep(1:5, 2)
)
ggplot(df, aes(x = year, y = amount, group = legend)) +
geom_line(aes(linetype = legend)) +
geom_point(aes(shape = legend)) +
geom_text(aes(label = amount), hjust = -1) +
theme_classic(base_size = 15) +
xlim(1, 5.5) +
ggtitle("Hospital Costs and Revenues")
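If the labels of the two lines end up on top of each other, one option is to give each series its own vertical offset. The following is only a sketch built on the example data above: the helper column `label_vjust` and its offset values are assumptions for illustration, not part of the original question. `layer_data()` is used at the end to check that every row received a label.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(
  amount = sample(10:30, 10),
  legend = rep(c("Cost as % of initial costs", "Revenue as % of cost"), each = 5),
  year   = rep(1:5, 2)
)

# Hypothetical helper column: labels of the first series are pushed above
# their points, labels of the second series below.
df$label_vjust <- ifelse(df$legend == "Cost as % of initial costs", -1, 2)

p <- ggplot(df, aes(x = year, y = amount, group = legend)) +
  geom_line(aes(linetype = legend)) +
  geom_point(aes(shape = legend)) +
  geom_text(aes(label = amount, vjust = label_vjust)) +
  theme_classic(base_size = 15) +
  ggtitle("Hospital Costs and Revenues")

# Data actually used by the third layer (geom_text): one row per label
label_data <- layer_data(p, 3)
```

If `nrow(label_data)` equals `nrow(df)`, every data point on both lines has a label, so any label that seems to be missing is probably hidden behind another one or clipped by the axis limits.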
This is not my data (for confidentiality reasons), but I have tried to create a reproducible example using a dataset included in the ggplot2 library. I have a histogram summarizing the value of some variable by group (a factor with 2 levels). First, I did not want the counts but proportions of the total, so I used this code:
library(ggplot2)
library(dplyr)
df_example <- diamonds %>% as.data.frame() %>% filter(cut=="Premium" | cut=="Ideal")
ggplot(df_example,aes(x=z,fill=cut)) +
geom_histogram(aes(y=after_stat(width*density)),binwidth=1,center=0.5,col="black") +
facet_wrap(~cut) +
scale_x_continuous(breaks=seq(0,9,by=1)) +
scale_y_continuous(labels=scales::percent_format(accuracy=2,suffix="")) +
scale_fill_manual(values=c("#CC79A7","#009E73")) +
labs(x="Depth (mm)",y="Count") +
theme_bw() + theme(legend.position="none")
It gave me this as a result.
The issue is that I would like to print the numeric percentages on top of the bins and haven't found a way to do so.
As I saw it done for printing counts elsewhere, I attempted to print them using stat_bin(), including the same y and label values as the y in geom_histogram, thinking it would print the right numbers:
ggplot(df_example,aes(x=z,fill=cut)) +
geom_histogram(aes(y=after_stat(width*density)),binwidth=1,center=0.5,col="black") +
stat_bin(aes(y=after_stat(width*density),label=after_stat(width*density*100)),geom="text",vjust=-.5) +
facet_wrap(~cut) +
scale_x_continuous(breaks=seq(0,9,by=1)) +
scale_y_continuous(labels=scales::percent_format(accuracy=2,suffix="")) +
scale_fill_manual(values=c("#CC79A7","#009E73")) +
labs(x="Depth (mm)",y="%") +
theme_bw() + theme(legend.position="none")
However, it prints far more values than there are bins, the values are not consistent with the bar heights, and they ignore vjust=-.5, which should place them slightly above the bars.
What am I missing here? I know that if there was no grouping variable/facet_wrap, I could use after_stat(count/sum(count)) instead of after_stat(width*density) and it seems that it would have fixed my issue. But I need the histograms for both groups to appear next to each other. Thanks in advance!
You have to pass the same binning arguments (binwidth and center) to stat_bin as to geom_histogram when adding your labels, so that both layers use the same bins and the labels align with the bars:
library(ggplot2)
library(dplyr)
df_example <- diamonds %>%
as.data.frame() %>%
filter(cut == "Premium" | cut == "Ideal")
ggplot(df_example, aes(x = z, fill = cut)) +
geom_histogram(aes(y = after_stat(width * density)),
binwidth = 1, center = 0.5, col = "black"
) +
stat_bin(
aes(
y = after_stat(width * density),
label = scales::number(after_stat(width * density), scale = 100, accuracy = 1)
),
geom = "text", binwidth = 1, center = 0.5, vjust = -.25
) +
facet_wrap(~cut) +
scale_x_continuous(breaks = seq(0, 9, by = 1)) +
scale_y_continuous(labels = scales::number_format(scale = 100)) +
scale_fill_manual(values = c("#CC79A7", "#009E73")) +
labs(x = "Depth (mm)", y = "%") +
theme_bw() +
theme(legend.position = "none")
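As background on why `width * density` yields a per-panel proportion: `stat_bin` computes `density` as `count / (sum(count) * width)` within each group, so multiplying by `width` leaves `count / sum(count)` for that group. The base R sketch below is a rough cross-check of that idea. It assumes integer bin boundaries (as implied by `binwidth = 1` and `center = 0.5`), the names `breaks` and `prop_by_cut` are illustrative, and the treatment of values lying exactly on a bin edge may differ slightly from ggplot2's.

```r
library(ggplot2)  # only needed for the diamonds dataset

df_example <- subset(as.data.frame(diamonds), cut %in% c("Premium", "Ideal"))

# Integer bin edges covering the observed range of z
breaks <- seq(0, ceiling(max(df_example$z)), by = 1)

# Proportion of observations per bin, computed separately for each cut
prop_by_cut <- lapply(
  split(df_example$z, droplevels(df_example$cut)),
  function(z) {
    counts <- table(cut(z, breaks = breaks, include.lowest = TRUE))
    counts / sum(counts)
  }
)

# Percentages per bin for one group, rounded as in the labels
round(100 * prop_by_cut[["Ideal"]])
```

Because the proportions are computed within each group, they sum to 1 per facet, which is what the side-by-side histograms need.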
I am trying to create a density plot for particle size data. My data has multiple density and size readings for each genotype set. Is there a way to specify multiple columns into x and y using ggplot? I tried coding for this but am only getting a blank plot as of now. This is the link to the csv file I used: https://drive.google.com/file/d/11djXTmZliPCGLCZavukjb0TT28HsKMRQ/view?usp=sharing
Thanks!
crop.data6 <- read.csv("barleygt25.csv", header = TRUE)
crop.data6
library(ggplot2)
plot1 = ggplot(data=crop.data6, aes(x=, xend=bq, y=a, yend=bq, color=genotype))
plot1
Your data is in a strange format that doesn't lend itself well to plotting. Effectively, it needs to be transposed then pivoted into long format to make it suitable for plotting:
df <- data.frame(xvals = c(t(crop.data6[1:9, -c(1:2)])),
yvals = c(t(crop.data6[10:18, -c(1:2)])),
genotype = rep(crop.data6$genotype[1:9], each = 68))
ggplot(df, aes(xvals, yvals, color = genotype)) +
geom_line(size = 1) +
scale_color_brewer(palette = "Set1") +
theme_bw(base_size = 16) +
labs(x = "value", y = "density")
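To illustrate what the transpose-and-flatten step does, here is a small sketch on made-up data. The layout is assumed to mirror the CSV (two identifier columns, then one value column per reading, with the x rows stacked above the y rows); the data frame `wide`, its column names and its values are hypothetical. Using `ncol()` instead of a hard-coded count such as 68 keeps the `rep()` call in step with the data.

```r
# Hypothetical wide layout: rows 1-2 hold sizes, rows 3-4 hold densities
wide <- data.frame(
  genotype = c("g1", "g2", "g1", "g2"),
  measure  = c("size", "size", "density", "density"),
  v1 = c(1, 2, 0.10, 0.20),
  v2 = c(3, 4, 0.30, 0.40),
  v3 = c(5, 6, 0.50, 0.60)
)

n_values <- ncol(wide) - 2  # number of value columns per row

# t() turns each row into a column; c() then flattens column by column,
# so the values come out row by row (all of g1 first, then all of g2).
long <- data.frame(
  xvals    = c(t(wide[1:2, -(1:2)])),
  yvals    = c(t(wide[3:4, -(1:2)])),
  genotype = rep(wide$genotype[1:2], each = n_values)
)

long
```

Each row of `long` is now one (x, y) reading tagged with its genotype, which is the shape `ggplot()` expects.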
Here is my sample data:
Singer <- c("A","B","C","A","B","C")
Rank <- c(1,2,3,3,2,1)
Episode <- c(1,1,1,2,2,2)
Votes <- c(0.3,0.28,0.11,0.14,0.29,0.38)
data <- data.frame(Episode,Singer,Rank,Votes)
data$Episode <- as.character(data$Episode)
I would like to make a line graph to show the performance of each singer.
I tried to use ggplot2 like below:
ggplot(data,aes(x=Episode,y=Votes,group = Singer)) + geom_line()
I have two questions:
How can I format the y-axis as percentage?
How can I label each dot in this line graph as the values of "Rank", which allows me to show rank and votes in the same graph?
To label each point use:
geom_label(aes(label = Rank))
# or
geom_text(aes(label = Rank), nudge_y = .01, nudge_x = 0)
To format the axis labels use:
scale_y_continuous(labels = scales::percent_format())
# or without package(scales):
scale_y_continuous(breaks = (seq(0, .4, .2)), labels = sprintf("%1.f%%", 100 * seq(0, .4, .2)), limits = c(0,.4))
Complete code:
library(ggplot2)
library(scales)
ggplot(data, aes(x = factor(Episode), y = Votes, group = Singer)) +
geom_line() +
geom_label(aes(label = Rank)) +
scale_y_continuous(labels = scales::percent_format())
Data:
Singer <- c("A","B","C","A","B","C")
Rank <- c(1,2,3,3,2,1)
Episode <- c(1,1,1,2,2,2)
Votes <- c(0.3,0.28,0.11,0.14,0.29,0.38)
data <- data.frame(Episode,Singer,Rank,Votes)
# no need to transform to character bc we use factor(Episode) in aes(x=..)
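For the variant without the scales package, it can help to look at the label strings on their own before putting them into the scale. A minimal base R sketch, where the names `breaks` and `pct_labels` are just for illustration:

```r
# Break positions on the proportion scale
breaks <- seq(0, 0.4, by = 0.2)

# "%1.f" prints the number with zero decimals; "%%" is a literal percent sign
pct_labels <- sprintf("%1.f%%", 100 * breaks)

pct_labels
```

The resulting vector has one label per break, so it can be passed as `labels` together with the same `breaks` in `scale_y_continuous()`.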
I have temporal data of gas emissions from two species of plant, both of which have been subjected to the same treatments. With some previous help, I put this code together:
soilflux = read.csv("soil_fluxes.csv")
library(ggplot2)
soilflux$Treatment <- factor(soilflux$Treatment,levels=c("L-","C","L+"))
soilplot = ggplot(soilflux, aes(factor(Week), Flux, fill=Species, alpha=Treatment)) + stat_boxplot(geom ='errorbar') + geom_boxplot()
soilplot = soilplot + labs(x = "Week", y = "Flux (mg m-2 d-1)") + theme_bw(base_size = 12, base_family = "Helvetica")
soilplot
This produces a plot that works well but has its flaws.
Whilst it conveys all the information I need it to, despite Google trawls and searching here I couldn't get the 'Treatment' part of the legend to show that L- is lightest and L+ darkest. I've also been told that a monochrome colour scheme is easier to differentiate, hence I'm trying to get something like this, where the legend is clear.
(source: biomedcentral.com)
As a workaround you could create a combined factor from species and treatment and assign the fill colors manually:
library(ggplot2)
library(RColorBrewer)
d <- expand.grid(week = factor(1:4), species = factor(c("Heisteria", "Simarouba")),
trt = factor(c("C", "L-", "L+"), levels = c("L-", "C", "L+")))
d <- d[rep(1:24, each = 30), ]
d$flux <- runif(NROW(d))
# Create a combined factor for coding the color
d$spec.trt <- interaction(d$species, d$trt, lex.order = TRUE, sep = " - ")
ggplot(d, aes(x = week, y = flux, fill = spec.trt)) +
stat_boxplot(geom ='errorbar') + geom_boxplot() +
scale_fill_manual(values = c(brewer.pal(3, "Greens"), brewer.pal(3, "Reds")))
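The manual colours line up with the groups because of the level order that `interaction()` produces. The sketch below, with illustrative variable names, shows that order for the same factors: with `lex.order = TRUE` the first factor (species) varies slowest, so the first three levels belong to one species and the last three to the other, each running from L- to L+.

```r
species <- factor(c("Heisteria", "Simarouba"))
trt     <- factor(c("C", "L-", "L+"), levels = c("L-", "C", "L+"))

# One row per species/treatment combination
grid <- expand.grid(species = species, trt = trt)

# Combined factor, as used for the fill aesthetic
spec_trt <- interaction(grid$species, grid$trt, lex.order = TRUE, sep = " - ")

levels(spec_trt)
```

Since `scale_fill_manual(values = ...)` assigns an unnamed vector in level order, three shades of one colour followed by three shades of another map light to dark within each species.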
I have a dataframe in R like this:
dat = data.frame(Sample = c(1,1,2,2,3), Start = c(100,300,150,200,160), Stop = c(180,320,190,220,170))
And I would like to plot it such that the x-axis is the position and the y-axis is the number of samples at that position, with each sample in a different colour. So in the above example you would have some positions with height 1, some with height 2 and one area with height 3. The aim being to find regions where there are a large number of samples and what samples are in that region.
i.e. something like:
&
---
********- -- **
where * = Sample 1, - = Sample 2 and & = Sample 3
My first try:
dat$Sample = factor(dat$Sample)
ggplot(aes(x = Start, y = Sample, xend = Stop, yend = Sample, color = Sample), data = dat) +
geom_segment(size = 2) +
geom_segment(aes(x = Start, y = 0, xend = Stop, yend = 0), size = 2, alpha = 0.2, color = "black")
I combine two segment geometries here. One draws the colored horizontal bars, which show where each Sample has been measured. The second draws the grey bar along the bottom, where overlapping segments appear darker and so indicate the density of the samples. Any comments to improve on this quick hack?
This hack may be what you're looking for, however I've greatly increased the size of the dataframe in order to take advantage of stacking by geom_histogram.
library(ggplot2)
dat = data.frame(Sample = c(1,1,2,2,3),
Start = c(100,300,150,200,160),
Stop = c(180,320,190,220,170))
# Reformat the data for plotting with geom_histogram.
dat2 = matrix(ncol=2, nrow=0, dimnames=list(NULL, c("Sample", "Position")))
for (i in seq(nrow(dat))) {
Position = seq(dat[i, "Start"], dat[i, "Stop"])
Sample = rep(dat[i, "Sample"], length(Position))
dat2 = rbind(dat2, cbind(Sample, Position))
}
dat2 = as.data.frame(dat2)
dat2$Sample = factor(dat2$Sample)
plot_1 = ggplot(dat2, aes(x=Position, fill=Sample)) +
theme_bw() +
theme(panel.grid.minor=element_blank(), panel.grid.major=element_blank()) +
geom_hline(yintercept=seq(0, 20), colour="grey80", size=0.15) +
geom_hline(yintercept=3, linetype=2) +
geom_histogram(binwidth=1) +
ylim(c(0, 20)) +
ylab("Count") +
theme(axis.title.x=element_text(size=11, vjust=0.5)) +
theme(axis.title.y=element_text(size=11, angle=90)) +
ggtitle("Segment Plot")
png("plot_1.png", height=200, width=650)
print(plot_1)
dev.off()
Note that the way I've reformatted the dataframe is a bit ugly, and will not scale well (e.g. if you have millions of segments and/or large start and stop positions).
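The reformatting loop can be written without growing a matrix through repeated `rbind()` calls. The following is a sketch of one way to do it with `Map()`; the intermediate name `positions` is illustrative. It still creates one row per covered position, so the caveat about very long segments remains.

```r
dat <- data.frame(Sample = c(1, 1, 2, 2, 3),
                  Start  = c(100, 300, 150, 200, 160),
                  Stop   = c(180, 320, 190, 220, 170))

# One integer sequence per segment, from Start to Stop inclusive
positions <- Map(seq, dat$Start, dat$Stop)

# Repeat each Sample id once per position it covers
dat2 <- data.frame(
  Sample   = factor(rep(dat$Sample, lengths(positions))),
  Position = unlist(positions)
)

# Number of samples covering each position (the height of the stacked bars)
coverage <- table(dat2$Position)
```

`dat2` has the same two columns as the loop version and can be passed to the `ggplot()` call unchanged.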