Barplot overlay with geom line - r

here is the data example:
S P C P_int C_int
10 20 164 72 64
20 550 709 92 89
30 142 192 97 96
40 45 61 99 98
50 12 20 99 99
60 5 6 99 99
70 2 2 99 99
80 4 1 99 99
90 1 0 10 99
100 0 1 10 99
Let's say i have a dataframe called df, the aim is to have a bar chart using variables P and C, with an line chart overlayed using sum of variables P_int and C_int. Currently I have these lines of codes to create the bar chart:
final <- df %>% tidyr::gather(type, value, c(`P`, `C`))
ggplot(final, aes(S))+
geom_bar(aes(y=value, fill=type), stat="identity", position="dodge")
The thing I can't figure out is hot to plot the sum of variables P_int and C_int as a line chart overlayed on the above plot with a second Y axis. Would appreciate any help.

Do you need something like this ?
library(ggplot2)
library(dplyr)
ggplot(final, aes(S))+
geom_bar(aes(y=value, fill=type), stat="identity", position="dodge") +
geom_line(data = final %>%
group_by(S) %>%
summarise(total = sum(P_int + C_int)),
aes(y = total), color = 'blue') +
scale_y_continuous(sec.axis = sec_axis(~./1)) +
theme_classic()
I have kept the scale of secondary y-axis same as primary y-axis since they are in the same range but you might need to adjust it in according to your real data.

Related

Plotting each value of columns for a specific row

I am struggling to plot a specific row from a dataframe. Below is the Graph i am trying to plot. I have tried using ggplot and normal plot but i cannot figure it out.
Wt2 Wt3 Wt4 Wt5 Lngth2 Lngth3 Lngth4 Lngth5
1 48 59 95 82 141 157 168 183
2 59 68 102 102 140 168 174 170
3 61 77 93 107 145 162 172 177
4 54 43 104 104 146 159 176 171
5 100 145 185 247 150 158 168 175
6 68 82 95 118 142 140 178 189
7 68 95 109 111 139 171 176 175
Above is the Data frame I am trying to plot with. The rows are for each bears measurement. So row 1 is for bear 1. How would I plot only the Wt columns for bear 1 against an X-axis that goes from years 2 to 5
You can pivot your data frame into a longer format:
First add a column with the row number (bear number):
df = cbind("Bear"=as.factor(1:nrow(df)), df)
It needs to be factor so we can pass it as a group variable to ggplot. Now pivot:
df2 = tidyr::pivot_longer(df[,1:5], cols=2:5,
names_to="Year", values_to="Weight", names_prefix="Wt")
df2$Year = as.numeric(df2$Year)
We ignore the Length columns with df[,1:5]; say that we only want to pivot the weight columns with df[,2:5]; then say the name of the columns we want to create with names_to and values_to; and lastly the names_prefix="Wt" removes the "Wt" before the column names, leaving only the year number, but we get a character, so we need to make it numeric with as.numeric().
Then plot:
ggplot(df2, aes(x=Year, y=Weight, linetype=Bear)) + geom_line()
Output (Ps: i created my own data, so the actual numbers are off):
Just an addition, if you don't want to specify the columns of your dataset explicity, you can do:
df2 = df2[,grep("Wt|Bear", colnames(df)]
df2 = tidyr::pivot_longer(df2, cols=grep("Wt", colnames(df2)),
names_to="Year", values_to="Weight", names_prefix="Wt")
Edit: one plot for each group
You can use facet_wrap:
ggplot(df2, aes(x=Year, y=Weight, linetype=Bear)) +
facet_wrap(~Bear, nrow=2, ncol=4) +
geom_line()
Output:
You can change the nrow and ncol as you wish, and can remove the linetype from aes() as you already have a differenciation, but it's not mandatory.
You can also change the levels of the categorical data to make the labels on each graph better, do levels(df2$Bear) = paste("Bear", 1:7) for example (or do that the when creating it).
Try
ggplot(mapping = aes(x = seq.int(2, 5), y = c(48, 59, 95, 82))) +
geom_point(color = "blue") +
geom_line(color = "blue") +
xlab("Year") +
ylab("Weight")

ggplot showing a trend with more than 1 variables across y axis

I have a dataframe df where I need to see the comparison of the trend between weeks
df
Col Mon Tue Wed
1 47 164 163
2 110 168 5
3 31 146 109
4 72 140 170
5 129 185 37
6 41 77 96
7 85 26 41
8 123 15 188
9 14 23 163
10 152 116 82
11 118 101 5
Right now I can only plot 2 variables like below. But I need to see for Tuesday and Wednesday as well
ggplot(data=df,aes(x=Col,y=Mon))+geom_line()
You can either add a
geom_line(aes(x = Col, y = Mon), col = 1)
for each day, or you would need to restructure your data frame using a function like gather so your new columns are col, day, value. Without reformatting the data, your result would be
ggplot(data=df)+geom_line(aes(x=Col,y=Mon), col = 1) + geom_line(aes(x=Col,y=Tue), col = 2) + geom_line(aes(x=Col,y=Wed), col = 3)
with a restructure it would be
ggplot(data=df)+geom_line(aes(x=Col,y=Val, col = Day))
The standard way would be to get the data in long format and then plot
library(tidyverse)
df %>%
gather(key, value, -Col) %>%
ggplot() + aes(factor(Col), value, col = key, group = key) + geom_line()

ggplot2: geom_bar(); how to alternate order of fill so bars are not lost inside a bar with a higher value?

I am trying to position two bars at the same position on the x-axis and seperated out by colour (almost as if stacking).
However, instead of stacking I want the bar simply inside the other bar - with the smallest Y-value being visable inside the bar with the highest Y-value.
I can get this to work to some extent - but the issue is that one Y-value is not consistently higher across one of the two factors. This leads to bars being 'lost' within a bar with a higher Y-value.
Here is a subset of my dataset and the current ggplot code:
condition hours expression freq_genes
1 tofde 9 up 27
2 tofde 12 up 92
3 tofde 15 up 628
17 tofde 9 down 0
18 tofde 12 down 1
19 tofde 15 down 0
33 tofp 9 up 2462
34 tofp 12 up 786
35 tofp 15 up 298
49 tofp 9 down 651
50 tofp 12 down 982
51 tofp 15 down 1034
65 tos 0 up 27
66 tos 3 up 123
67 tos 6 up 752
81 tos 0 down 1
82 tos 3 down 98
83 tos 6 down 594
sf_plot <- ggplot(data = gene_freq,
aes(x = hours,
y = freq_genes,
group = condition,
fill = factor(expression,
labels=c("Down",
"Up"))))
sf_plot <- sf_plot + labs(fill="Expression")
sf_plot <- sf_plot + geom_bar(stat = "identity",
width = 2.5,
position = "dodge")
sf_plot <- sf_plot + scale_fill_manual(values=c("#9ecae1",
"#3182bd"))
sf_plot <- sf_plot + xlab("Time (Hours)")
sf_plot <- sf_plot + scale_x_continuous(breaks =
seq(min(gene_freq$freq_genes),
max(gene_freq$freq_genes),
by = 3))
sf_plot <- sf_plot + ylab("Gene Frequency")
sf_plot <- sf_plot + facet_grid(. ~ condition, scales = "free")
sf_plot <- sf_plot + theme_bw()
sf_plot <- sf_plot + theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank())
sf_plot <- sf_plot + theme(axis.text.x = element_text(angle = 90))
# Print plot
sf_plot
You can add alpha = 0.5 to your geom_bar() statement to make the bars transparent. This will allow both bars to be seen. Adding that alpha statement and nothing else will produce what you're looking for, to make both overlaid bars visible. The colors, however, make seeing the two different bars challenging.
Another (and maybe better) option is to change the order in which the plot is created. If I recall correctly, ggplot will plot the bars in alphabetical or numeric or factor-level order. Here, your expression values are c("Down", "Up") and "Down" is being plotted first. If you force "Up" to be plotted first, you could resolve this, too.
library(dplyr)
library(ggplot2)
dat <-
read.table(text = "condition hours expression freq_genes
1 tofde 9 up 27
2 tofde 12 up 92
3 tofde 15 up 628
17 tofde 9 down 0
18 tofde 12 down 1
19 tofde 15 down 0
33 tofp 9 up 2462
34 tofp 12 up 786
35 tofp 15 up 298
49 tofp 9 down 651
50 tofp 12 down 982
51 tofp 15 down 1034
65 tos 0 up 27
66 tos 3 up 123
67 tos 6 up 752
81 tos 0 down 1
82 tos 3 down 98
83 tos 6 down 594") %>%
mutate(expression2 = ifelse(expression == "up", 1, 2))
dat %>%
ggplot(aes(x = hours, y = freq_genes, group = condition,
fill = factor(expression2, labels=c("Up", "Down")))) +
labs(fill="Expression") +
geom_bar(stat = "identity", position = "dodge", width = 2.5, alpha = 0.5) +
scale_fill_manual(values=c("#9ecae1", "#3182bd")) +
xlab("Time (Hours)") +
scale_x_continuous(breaks = seq(min(dat$freq_genes),
max(dat$freq_genes),
by = 3)) +
ylab("Gene Frequency") +
facet_grid(. ~ condition, scales = "free") +
theme_bw() +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position = "bottom",
axis.text.x = element_text(angle = 90))
Here, I've created a new column called expression2 that is just a numeric version of expression. I changed the fill variable in aes() to match with those new labels. I left the colors in scale_fill_manual() the same as in your original statement and kept the alpha value. "Down" is being plotted on top of "Up" but in keeping the same colors with the alpha value, both bars are easier to see. You can play with the legend to display "Down" before "Up" if that's necessary.
Note that providing machine readable data goes a long way in allowing others to help you out. Consider using dput() to output your data next time rather than pasting it in. Also note that you can "chain" together ggplot() statements with a +. This makes code much more compact and easier to read.

ggplot facets: show annotated text in selected facets

I want to create a 2 by 2 faceted plot with a vertical line shared by the four facets. However, because the facets on top have the same date information as the facets at the bottom, I only want to have the vline annotated twice: in this case in the two facets at the bottom.
I looked a.o. here, which does not work for me. (In addition I have my doubts whether this is still valid code, today.) I also looked here. I also looked up how to influence the font size in geom_text: according to the help pages this is size. In the case below it doesn't work out well.
This is my code:
library(ggplot2)
library(tidyr)
my_df <- read.table(header = TRUE, text =
"Date AM_PM First_Second Systolic Diastolic Pulse
01/12/2017 AM 1 134 83 68
01/12/2017 PM 1 129 84 76
02/12/2017 AM 1 144 88 56
02/12/2017 AM 2 148 93 65
02/12/2017 PM 1 131 85 59
02/12/2017 PM 2 129 83 58
03/12/2017 AM 1 153 90 62
03/12/2017 AM 2 143 92 59
03/12/2017 PM 1 139 89 56
03/12/2017 PM 2 141 86 56
04/12/2017 AM 1 140 87 58
04/12/2017 AM 2 135 85 55
04/12/2017 PM 1 140 89 67
04/12/2017 PM 2 128 88 69
05/12/2017 AM 1 134 99 67
05/12/2017 AM 2 128 90 63
05/12/2017 PM 1 136 88 63
05/12/2017 PM 2 123 83 61
")
# setting the classes right
my_df$Date <- as.Date(as.character(my_df$Date), format = "%d/%m/%Y")
my_df$First_Second <- as.factor(my_df$First_Second)
# to tidy format
my_df2 <- gather(data = my_df, key = Measure, value = Value,
-c(Date, AM_PM, First_Second), factor_key = TRUE)
# Measures in 1 facet, facets split over AM_PM and First_Second
## add anntotations column for geom_text
my_df2$Annotations <- rep("", 54)
my_df2$Annotations[c(4,6)] <- "Start"
p2 <- ggplot(data = my_df2) +
ggtitle("Blood Pressure and Pulse as a function of AM/PM,\n Repetition, and date") +
geom_line(aes(x = Date, y = Value, col= Measure, group = Measure), size = 1.) +
geom_point(aes(x = Date, y = Value, col= Measure, group = Measure), size= 1.5) +
facet_grid(First_Second ~ AM_PM) +
geom_vline(aes(xintercept = as.Date("2017/12/02")), linetype = "dashed",
colour = "darkgray") +
theme(axis.text.x=element_text(angle = -90))
p2
yields this graph:
This is the basic plot from which I start. Now we try to annotate it.
p2 + annotate(geom="text", x = as.Date("2017/12/02"), y= 110, label="start", size= 3)
yielding this plot:
This plot has the problem that the annotation occurs 4 times, while we only want it in the bottom parts of the graph.
Now we use geom_text which will use the "Annotations" column in our dataframe, in line with this SO Question. Be carefull, the column added to the dataframe must be present when you create "p2", the first time (that is why we added the column supra)
p2 + geom_text(aes(x=as.Date("2017/12/02"), y=100, label = Annotations, size = .6))
yielding this plot:
Yes, we succeeded in getting the annotation only in the bottom two parts of the graph. But the font is too big ( ... and ugly) and when we try to correct it with size, two things are interesting: (1) the font size is not changed (although you would expect that from the help pages) and (2) a legend is added.
I have been clicking around a lot and have been unable to solve this after hours and hours. Any help would be appreciated.

geom_bar labeling for melted data / stacked barplot

I have a problem with drawing stacked barplot with ggplot. My data looks like this:
timeInterval TotalWilling TotalAccepted SimID
1 16 12 Sim1
1 23 23 Sim2
1 63 60 Sim3
1 69 60 Sim4
1 61 60 Sim5
1 60 54 Sim6
2 16 8 Sim1
2 23 21 Sim2
2 63 52 Sim3
2 69 64 Sim4
2 61 45 Sim5
2 60 32 Sim6
3 16 14 Sim1
3 23 11 Sim2
3 63 59 Sim3
3 69 69 Sim4
3 61 28 Sim5
3 60 36 Sim6
I would like to draw a stacked barplot for each simID over a timeInterval, and Willing and Accepted should be stacked. I achieved the barplot with the following simple code:
dat <- read.csv("myDat.csv")
meltedDat <- melt(dat,id.vars = c("SimID", "timeInterval"))
ggplot(meltedDat, aes(timeInterval, value, fill = variable)) + facet_wrap(~ SimID) +
geom_bar(stat="identity", position = "stack")
I get the following graph:
Here my problem is that I would like to put percentages on each stack. Which means, I want to put percentage as for Willing label: (Willing/(Willing+Accepted)) and for Accepted part, ((Accepted/(Accepted+Willing)) so that I can see how many percent is willing how many is accepted such as 45 on red part of stack to 55 on blue part for each stack. I cannot seem to achieve this kind of labeling.
Any hint is appreciated.
applied from Showing data values on stacked bar chart in ggplot2
meltedDat <- melt(dat,id.vars = c("SimID", "timeInterval"))
meltedDat$normvalue <- meltedDat$value
meltedDat$valuestr <- sprintf("%.2f%%", meltedDat$value, meltedDat$normvalue*100)
meltedDat <- ddply(meltedDat, .(timeInterval, SimID), transform, pos = cumsum(normvalue) - (0.5 * normvalue))
ggplot(meltedDat, aes(timeInterval, value, fill = variable)) + facet_wrap(~ SimID) + geom_bar(stat="identity", position = "stack") + geom_text(aes(x=timeInterval, y=pos, label=valuestr), size=2)
also, it looks like you may have some of your variables coded as factors.

Resources