R ggplot2: Line overlayed on Bar Graph (from separate data frames) - r

I have a bar graph coming from one set of monthly data and I want to overlay on it data from another set of monthly data in the form of a line. Here is a simplified example (in my data the second data set is not a simple manipulation of the first):
library(reshape2)
library(ggplot2)
test<-abs(rnorm(12)*1000)
test<-rbind(test, test+500)
colnames(test)<-month.abb[seq(1:12)]
rownames(test)<-c("first", "second")
otherTest<-apply(test, 2, mean)
test<-melt(test)
otherTest<-as.data.frame(otherTest)
p<-ggplot(test, aes(x=Var2, y=value, fill=Var1, order=-as.numeric(Var2))) + geom_bar(stat="identity")+
theme_bw() + theme(panel.border = element_blank(), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), axis.line = element_line(colour = "black")) +
ggtitle("Test Graph") +
scale_fill_manual(values = c(rgb(1,1,1), rgb(.9,0,0))) +
guides(fill=FALSE) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
works great to get the bar graph:
but I have tried multiple iterations to get the line on there and can't figure it out (like this):
p + geom_line(data=otherTest,size=1, color=rgb(0,.5,0))
Also, if anybody knows how I can make the bars in front of each other so that all you see is a red bar of height 500, I would appreciate any suggestions. I know I can just take the difference between the two lines of the matrix and keep it as a stacked bar but I thought there might be an easy way to put both bars on the x-axis, white in front of red. Thanks!

You have a few problems to deal with here.
Directly answering your question, if you don't provide a mapping via aes(...) in a geom call (like your geom_line...), then the mapping will come from ggplot(). Your ggplot() specifies x=Var2, y=value, fill=Var1.... All of these variable names must exist in your data frame otherTest for this to work, and they don't right now.
So, you either need to ensure that these variable names exist in otherTest, or specify mapping separately in geom_line. You might want to read up about how these layering options work. E.g., here's a post of mine that goes into some detail.
If you go for the first option, some other problems to think about:
is Var2 a factor with the same levels in both data frames? It probably should be.
to use geom_line as you are, you might need to add group = 1. See here.
Some others too, but here's a brief example of what you might do:
library(reshape2)
library(ggplot2)
test <- abs(rnorm(12)*1000)
test <- rbind(test, test+500)
colnames(test) <- month.abb[seq(1:12)]
rownames(test) <- c("first", "second")
otherTest <- apply(test, 2, mean)
test <- melt(test)
otherTest <- data.frame(
Var2 = names(otherTest),
value = otherTest
)
otherTest$Var2 = factor(otherTest$Var2, levels = levels(test$Var2))
ggplot(test, aes(x = Var2, y = value, group = 1)) +
geom_bar(aes(fill = Var1), stat="identity") +
geom_line(data = otherTest)
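The second option mentioned above, giving geom_line its own mapping instead of renaming columns, might look like the following minimal sketch. The data frame names bars and line_df, the columns month and avg, and the fixed seed are assumptions made for illustration; inherit.aes = FALSE keeps the line layer from picking up fill = Var1 from ggplot().

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the sketch is repeatable
vals <- abs(rnorm(12) * 1000)

# Long-format bar data, built directly instead of via melt()
bars <- data.frame(
  Var1  = rep(c("first", "second"), each = 12),
  Var2  = factor(rep(month.abb, times = 2), levels = month.abb),
  value = c(vals, vals + 500)
)

# Line data with its own (hypothetical) column names
line_df <- data.frame(
  month = factor(month.abb, levels = month.abb),
  avg   = vals + 250  # mean of "first" and "second" for each month
)

p <- ggplot(bars, aes(x = Var2, y = value, fill = Var1)) +
  geom_bar(stat = "identity") +
  geom_line(
    data = line_df,
    mapping = aes(x = month, y = avg, group = 1),
    inherit.aes = FALSE,       # do not inherit x, y, fill from ggplot()
    colour = rgb(0, .5, 0)
  )
print(p)
```

Both factors share the same levels (month.abb), so the line lands on the same x positions as the bars.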

Related

How can I make the labels more readable in this lollipop plot?

I am trying to make a lollipop plot that includes a text 'condition' and a value associated. The issue I am having is that, because there is so much data, the labels overlap. Is there an easy fix for this?
This is my code (and my issue):
library(ggplot2)
df <- read.table(file = '24 hpi MP BP.tsv', sep = '\t', header = TRUE)
group <- df$Name
value <- df$Bgd.count
data <- data.frame(
x=group,
y=value
)
ggplot(data, aes(x=x, y=y)) +
geom_segment( aes(x=x, xend=x, y=0, yend=y), color="skyblue") +
geom_point( color="blue", size=4, alpha=0.6) +
theme_light() +
coord_flip() +
theme(
panel.grid.major.y = element_blank(),
panel.border = element_blank(),
axis.ticks.y = element_blank()
)
I am hoping to get a clear separation on the labels
Your question does not provide a reproducible example, so here is a more general answer.
The problem is that you want to plot hundreds of discrete values. That is bound to yield a crowded graphic.
Your options:
Reduce the number of labels (don’t label every tick on the axis) and show only a few.
Focus on only a few important data points. I think this would be my preferred approach, as it also does your “story” more justice.
Group your values and show “aggregate values” such as means/error bars
Make your graph appropriately large (change the height of the so-called graphics device)
Use facets (but this will not really help with the crowding in all cases)
Shorten your labels
Make the font smaller
Last, but definitely not least, change your visualisation strategy.
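As a rough illustration of two of these options (fewer data points and smaller labels), here is a minimal sketch. The synthetic data frame stands in for the TSV file from the question, and the cut-off of 20 rows is an arbitrary assumption.

```r
library(ggplot2)

set.seed(42)  # assumption: synthetic stand-in for the asker's TSV file
data <- data.frame(
  x = paste0("condition_", 1:200),
  y = rpois(200, lambda = 50)
)

# Keep only the 20 largest values and order the labels by value
top_rows <- head(data[order(-data$y), ], 20)
top_rows$x <- reorder(top_rows$x, top_rows$y)

p <- ggplot(top_rows, aes(x = x, y = y)) +
  geom_segment(aes(xend = x, y = 0, yend = y), color = "skyblue") +
  geom_point(color = "blue", size = 4, alpha = 0.6) +
  coord_flip() +
  theme_light() +
  theme(axis.text.y = element_text(size = 7))  # smaller label font
print(p)
```

To give every label more room instead of dropping rows, the same plot can be written to a taller file with ggsave() and a larger height argument.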

Reverse/flip position of plots on ggplot2 [duplicate]

I'm trying to make a heatmap with ggplot2 using the geom_tile function.
Here is my code:
p<-ggplot(data,aes(Treatment,organisms))+geom_tile(aes(fill=S))+
scale_fill_gradient(low = "black",high = "red") +
scale_x_discrete(expand = c(0, 0)) +
scale_y_discrete(expand = c(0, 0)) +
theme(legend.position = "right",
axis.ticks = element_blank(),
axis.text.x = element_text(size = base_size, angle = 90, hjust = 0, colour = "black"),
axis.text.y = element_text(size = base_size, hjust = 1, colour = "black"))
data is my data.csv file
my X axis is types of Treatment
my Y axis is types of organisms
I'm not too familiar with commands and programming and I'm relatively new at this. I just want to be able to specify the order of the labels on the x axis. In this case, I'm trying to specify the order of "Treatment". By default, it orders alphabetically. How do I override this/keep the data in the same order as in my original csv file?
I've tried this command
scale_x_discrete(limits=c("Y","X","Z"))
where X, Y and Z are my treatment conditions in the order I want. However, it doesn't work very well and gives me missing heat boxes.
It is a little difficult to answer your specific question without a full, reproducible example. However something like this should work:
#Turn your 'treatment' column into a character vector
data$Treatment <- as.character(data$Treatment)
#Then turn it back into a factor with the levels in the correct order
data$Treatment <- factor(data$Treatment, levels=unique(data$Treatment))
In this example, the order of the factor will be the same as in the data.csv file.
If you prefer a different order, you can order them by hand:
data$Treatment <- factor(data$Treatment, levels=c("Y", "X", "Z"))
However this is dangerous if you have a lot of levels: if you get any of them wrong, that will cause problems.
One can also simply factorise within the aes() call directly. I am not sure why setting the limits doesn't work for you - I assume you get NA's because you might have typos in your level vector.
The below is certainly not much different than user Drew Steen's answer, but with the important difference of not changing the original data frame.
library(ggplot2)
## this vector might be useful for other plots/analyses
level_order <- c('virginica', 'versicolor', 'setosa')
p <- ggplot(iris)
p + geom_bar(aes(x = factor(Species, levels = level_order)))
## or directly in the aes() call without a pre-created vector:
p + geom_bar(aes(x = factor(Species, levels = c('virginica', 'versicolor', 'setosa'))))
## plot identical to the above - not shown
## or use your vector as limits in scale_x_discrete
p + geom_bar(aes(x = Species)) +
scale_x_discrete(limits = level_order)
Created on 2022-11-20 with reprex v2.0.2
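The "missing heat boxes" from the question are consistent with limits that do not exactly match the values in the column, since, as far as I know, discrete values that are not listed in limits are dropped from the plot. A quick base-R check, using made-up treatment names, could look like this:

```r
# Hypothetical data: note the lower-case "x" in the last row
data <- data.frame(Treatment = c("Y", "X", "Z", "x"))
wanted <- c("Y", "X", "Z")

# Values present in the data but absent from the intended order
unmatched <- setdiff(unique(data$Treatment), wanted)
print(unmatched)

# factor() turns those unmatched values into NA
f <- factor(data$Treatment, levels = wanted)
print(sum(is.na(f)))
```

If unmatched is not empty, fixing the spelling (or stray whitespace) in either the data or the level vector should bring the missing tiles back.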

Creating a custom legend in r

I'm hoping to have a legend that includes references to all colours, not just the vertical lines, and does not include a title.
I've tried scale_colour_manual and scale_fill_manual and they all either overlap or only show the vertical lines. I would appreciate any suggestions.
Reprex is below, including the custom colour palette.
library(ggplot2)
library(data.table)
var1 <- c(head(randu$x,n=12))
var2 <- as.Date(c("2010-01-01","2010-02-01","2010-03-01","2010-04-01","2010-05-01","2010-06-01","2010-07-01","2010-08-01","2010-09-01","2010-10-01","2010-11-01","2010-12-01"))
var3 <- c(tail(randu[which(randu$x + randu$y < 1),]$x,n=12))
var4 <- c(tail(randu[which(randu$x + randu$y < 1),]$y,n=12))
dat <- data.frame(var1,var2,var3,var4)
setDT(dat)
dat$var5 <- dat[,(var3+var4)]
new_dates <- as.Date(c("2010-09-01","2010-05-01"))
cbp2 <- c("#000000", "#56B4E9", "#009E73", "#0072B2", "#D55E00", "#CC79A7")
ggplot()+
geom_bar(data=dat,colour=cbp2[1],fill = cbp2[1],aes(x=var2,y=var5,colour="var4"),stat="identity")+
geom_bar(data=dat,colour=cbp2[2],fill = cbp2[2],aes(x=var2,y=var3,colour="var3"),stat="identity")+
geom_line(data=dat,colour=cbp2[1],aes(x=var2,y=var1))+
geom_vline(data=data.frame(xintercept = new_dates),
aes(xintercept = new_dates,linetype = "Changes", colour="red"),
linetype="dashed",key_glyph = "path")+
scale_color_manual(name = "",
values = c("red",cbp2[2],cbp2[1]),
breaks = c("red",cbp2[2],cbp2[1]),
labels = c("Changes","Var3","Var4"))+
scale_fill_manual(name = "",
values = c(cbp2[2],cbp2[1]),
breaks = c(cbp2[2],cbp2[1]),
labels = c("var3","var4"))+
ylab("")+
xlab("")+
scale_x_date(expand=c(0,0),date_breaks = "3 month", date_labels = "%b %y") +
scale_y_continuous(labels = function(var5) paste0(var5*100, "%"),
limits=c(0,1),
breaks=c(0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1)) +
theme(panel.background = element_blank(),
axis.line = element_line(colour = "#000000"),
axis.text.x = element_text(angle=60, hjust=1),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.title.x= (element_text(margin = unit(c(3, 0, 0, 0), "mm"))),
legend.position = "top")
There's quite a lot to unpack here with this one, but I gave it my best shot.
First of all, consider what you are trying to plot here. Normally, it's not a problem to call things var1, var2, var3,...; however, in this context it's really quite confusing. Consequently, for this solution, I will be re-posting your entire code reworked instead of just the plotting portion for reasons I hope to outline in this answer.
The Data and the Question
With all that being said, here is my understanding about the nature of the dataset and your desire for the final plot:
var2 in the dataset contains Date class information, and this is the common x axis for the entire plot.
var1 contains values that are to be used for the y values of the geom_line plot layer
var3 and var4 contain values that are to be used for creation of the stacked barplot which should make up the background of the plot
var5 is a sum of var3 + var4, and was a device to create the plot. Herein, it will not be useful, given the data analysis we are to do on the dataset and the application of Tidy Data principles.
xintercept Values for the geom_vline plot layer are supplied as the two dates new_dates
The OP's question indicates a need for the Legend to be displayed correctly. In this case, we want to indicate:
fill color of the bars as var3 and var4
the nature of the vertical lines as dashed red lines.. called "Changes"
A label for the geom_line plot layer. Assume the label will be var1.
Hope all that was correct!
Synthesizing the Dataset
I encourage the OP to consult use of Tidy Data Principles, which will make synthesis of data such as this much more straightforward in the future. Herein, I will apply these principles to the dataset dat.
First of all, let's handle the bar layer data. Applying Tidy Data principles, we would want to gather together var3 and var4 and create out of them two columns: (1) one for the name of the variable ("var3" or "var4"), and (2) one for the value. We will be telling ggplot2 to "stack" bars, so var5 is not needed here: ggplot2 will do that calculation automatically. To gather the columns together, my preference is always to use gather() from tidyr, together with the pipe from dplyr:
library(dplyr)
library(tidyr)
library(ggplot2)
library(data.table)
var1 <- c(head(randu$x,n=12))
var2 <- as.Date(c("2010-01-01","2010-02-01","2010-03-01","2010-04-01","2010-05-01","2010-06-01","2010-07-01","2010-08-01","2010-09-01","2010-10-01","2010-11-01","2010-12-01"))
var3 <- c(tail(randu[which(randu$x + randu$y < 1),]$x,n=12))
var4 <- c(tail(randu[which(randu$x + randu$y < 1),]$y,n=12))
dat <- data.frame(var1,var2,var3,var4)
setDT(dat)
# dat$var5 <- dat[,(var3+var4)] no longer needed
new_dates <- as.Date(c("2010-09-01","2010-05-01"))
cbp2 <- c("#000000", "#56B4E9", "#009E73", "#0072B2", "#D55E00", "#CC79A7")
newdat <- dat %>%
gather(key='var_name', value='value', -var2) # gather all columns except for var2
names(newdat) <- c('Dates', 'var_name', 'value')
newdat$var_name <- factor(newdat$var_name, levels=c('var4', 'var3','var1'))
In addition to gathering the columns together, you will also note that I'm adjusting the names of the columns to make them a bit easier to follow when it comes to plotting. Additionally, I'm setting the order of the levels for newdat$var_name. The purpose here is that the order we specify will relate to the ordering used to create the plot. I want var3 to appear as a bar "under" var4, so we need to specify that var4 is first.
You could also create a separate dataset containing var2 and var1 to use for plotting the geom_line layer... but this also works fine.
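For readers on a recent tidyr, where gather() is marked as superseded, the same reshaping can be sketched with pivot_longer(). The two-row dat below is a made-up stand-in for the real data, and the row order of the result differs from what gather() returns, which does not matter for plotting.

```r
library(tidyr)

# Small stand-in for `dat`; values are made up for illustration
dat <- data.frame(
  var1 = c(0.1, 0.2),
  var2 = as.Date(c("2010-01-01", "2010-02-01")),
  var3 = c(0.3, 0.4),
  var4 = c(0.5, 0.1)
)

newdat <- pivot_longer(
  dat,
  cols = -var2,            # everything except the date column
  names_to = "var_name",
  values_to = "value"
)
names(newdat)[names(newdat) == "var2"] <- "Dates"
newdat$var_name <- factor(newdat$var_name, levels = c("var4", "var3", "var1"))
print(newdat)
```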
The Plot
For the plot, I've tried to organize the code into separate sections. What OP was trying to do was to plot column-by-column, rather than using aes(fill= and aes(color= to set and create legends. In addition, the OP's original code had numerous examples of the following:
geom_*(aes(color=...), color=...)
The result of this in ggplot2 is that if you set an aesthetic value (like color=) outside of aes() while also stating this argument inside aes(), the value on the outside will overwrite the value specified inside the mapping--effectively removing any call to place that within a legend. This was the biggest cause for issue in the OP's example, and why certain items were the "right" color, but did not appear in any legend.
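A small sketch of that override, using a made-up four-row data frame: layer_data() shows the colours ggplot2 ends up drawing for each layer.

```r
library(ggplot2)

df <- data.frame(x = 1:4, y = c(2, 4, 3, 5), grp = c("a", "a", "b", "b"))

# Mapped only: colour varies by group and a legend is drawn
mapped <- ggplot(df, aes(x, y)) +
  geom_point(aes(colour = grp))

# Mapped and set: the constant outside aes() wins, so no legend is drawn
overridden <- ggplot(df, aes(x, y)) +
  geom_point(aes(colour = grp), colour = "red")

mapped_cols     <- unique(layer_data(mapped)$colour)      # one colour per group
overridden_cols <- unique(layer_data(overridden)$colour)  # the constant only
print(mapped_cols)
print(overridden_cols)
```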
Specifying arguments in aes() only indicates that a legend should be created and tells ggplot2 on what basis to apply color, fill, linetype... it does not actually specify the color. Color should be specified using the scale_*_*() functions. In this case, we have 3 legend types created. The OP can organize them however they wish, but I tried to keep this example illustrative so it is easy to adapt, since it is still not entirely clear exactly how the legend should look.
Note that values= is used to apply the color, linetype, or fill aesthetic, and is done by feeding that argument a named vector. You can also use a non-named vector, in which case the attributes will be applied according to the ordering of the levels for that factor.
Note that I changed the line color of the geom_line to blue... just so that it stands out a bit. It would be a bit confusing otherwise, since there is a fill color that is also black.
ggplot(newdat, aes(x=Dates, y=value)) +
# plot layers
geom_col(
data=subset(newdat, var_name != 'var1'),
aes(fill=var_name), position='stack') +
geom_line(
data=subset(newdat, var_name == 'var1'),
aes(color=var_name)
) +
geom_vline(data=data.frame(xintercept = new_dates),
aes(xintercept = new_dates, linetype = "Changes"), colour="red",
key_glyph = "path")+
# color and legend settings
scale_fill_manual(
name="Fill",
values=c('var3'=cbp2[2], 'var4'=cbp2[1])) +
scale_color_manual(
name='Color',
values = 'blue') +
scale_linetype_manual(
name='Linetype',
values=2) +
# scale adjustment and theme stuff
scale_y_continuous(labels = function(var5) paste0(var5*100, "%"),
limits=c(0,1),
breaks=c(0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1)) +
theme(panel.background = element_blank(),
axis.line = element_line(colour = "#000000"),
axis.text.x = element_text(angle=60, hjust=1),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.title.x= (element_text(margin = unit(c(3, 0, 0, 0), "mm"))),
legend.position = "top")

ggplot2 geom_points won't colour or dodge

So I'm using ggplot2 to plot both a bar graph and points. I'm currently getting this:
As you can see, the bars are nicely separated and colored in the desired colors. However, my points are all uncolored and stacked on top of each other. I would like the points to be above their designated bar and in the same color.
#Add bars
A <- A + geom_col(aes(y = w1, fill = factor(Species1)),
position = position_dodge(preserve = 'single'))
#Add colors
A <- A + scale_fill_manual(values = c("A. pelagicus"= "skyblue1","A. superciliosus"="dodgerblue","A. vulpinus"="midnightblue","Alopias sp."="black"))
#Add points
A <- A + geom_point(aes(y = f1/2.5),
shape= 24,
size = 3,
fill = factor(Species1),
position = position_dodge(preserve = 'single'))
#change x and y axis range
A <- A + scale_x_continuous(breaks = c(2000:2020), limits = c(2016,2019))
A <- A + expand_limits(y=c(0,150))
# now adding the secondary axis, following the example in the help file ?scale_y_continuous
# and, very important, reverting the above transformation
A <- A + scale_y_continuous(sec.axis = sec_axis(~.*2.5, name = " "))
# modifying axis and title
A <- A + labs(y = " ",
x = " ")
A <- A + theme(plot.title = element_text(size = rel(4)))
A <- A + theme(axis.text.x = element_text(face="bold", size=14, angle=45),
axis.text.y = element_text(face="bold", size=14))
#A <- A + theme(legend.title = element_blank(),legend.position = "none")
#Print plot
A
When I run this code I get the following error:
Error: Unknown colour name: A. pelagicus
In addition: Warning messages:
1: Width not defined. Set with position_dodge(width = ?)
2: In max(table(panel$xmin)) : no non-missing arguments to max; returning -Inf
I've tried a couple of things but I can't figure out why it works for geom_col and not for geom_point.
Thanks in advance
The two basic problems you have are the color error and the points not dodging. The color error most likely comes from setting fill = factor(Species1) outside aes() in geom_point, where ggplot2 treats the values as literal colour names (hence "Unknown colour name: A. pelagicus"); the dodging is solved by applying the group= aesthetic.
You'll see the answer to these two questions using an example:
# dummy dataset
year <- c(rep(2017, 4), rep(2018, 4))
species <- rep(c('things', 'things1', 'wee beasties', 'ew'), 2)
values <- c(10, 5, 5, 4, 60, 10, 25, 7)
pt.value <- c(8, 7, 10, 2, 43, 12, 20, 10)
df <-data.frame(year, species, values, pt.value)
I made the "values" set for my column heights and I wanted to use a different y aesthetic for points for illustrative purposes, called "pt.value". Otherwise, the data setup is similar to your own. Note that df$year will be set as numeric, so it's best to change that into either Date format (kinda more trouble than it's worth here), or just as a factor, since "2017.5" isn't gonna make too much sense here :). The point is, I need "year" to be discrete, not continuous.
Solve the color error
For the plot, I'll try to create it similar to yours. Note that the named vector (c()) you pass to the values= argument of scale_fill_manual is fine as it is: name1=color1, name2=color2,... is the documented way to match colors to levels. The example below passes a list instead, which gave me the same result; what avoids the error is that fill is never set outside aes() for the points.
ggplot(df, aes(x=as.factor(year), y=values)) +
geom_col(aes(fill=species), position=position_dodge(width=0.62), width=0.6) +
scale_fill_manual(values=
list('ew' = 'skyblue1', 'things' = 'dodgerblue',
'things1'='midnightblue', 'wee beasties' = 'gray')) +
geom_point(aes(y=pt.value), shape=24, position=position_dodge(width=0.62)) +
theme_bw() + labs(x='Year')
So the colors are applied correctly and my axis is discrete, and the y values of the points are mapped to pt.value like I wanted, but why don't the points dodge?!
Solve the dodging issue
Dodging is a funny thing in ggplot2. The best reasoning here I can give you is that for columns and barplots, dodging is sort of "built-in" to the geom, since the default position is "stack" and "dodge" represents an alternative method to draw the geom. For points, text, labels, and others, the default position is "identity" and you have to be more explicit in how they are going to dodge or they just don't dodge at all.
Basically, we need to let the points know what they are dodging based on. Is it "species"? With geom_col, it's assumed to be, but with geom_point, you need to specify. We do that by using a group= aesthetic, which lets geom_point know what to use as the criterion for dodging. When you add that, it works!
ggplot(df, aes(x=as.factor(year), y=values, group=species)) +
geom_col(aes(fill=species), position=position_dodge(width=0.62), width=0.6) +
scale_fill_manual(values=
list('ew' = 'skyblue1', 'things' = 'dodgerblue',
'things1'='midnightblue', 'wee beasties' = 'gray')) +
geom_point(aes(y=pt.value), shape=24, position=position_dodge(width=0.62)) +
theme_bw() + labs(x='Year')
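If the points should also carry the same colors as their bars, as the question asks, one possible variation is to map fill inside aes() for the points as well. This is a sketch on the dummy data from above; the names species_cols and dodge are introduced here only for illustration.

```r
library(ggplot2)

df <- data.frame(
  year     = rep(c(2017, 2018), each = 4),
  species  = rep(c("things", "things1", "wee beasties", "ew"), 2),
  values   = c(10, 5, 5, 4, 60, 10, 25, 7),
  pt.value = c(8, 7, 10, 2, 43, 12, 20, 10)
)

# A named character vector is the documented form for values=
species_cols <- c("ew" = "skyblue1", "things" = "dodgerblue",
                  "things1" = "midnightblue", "wee beasties" = "gray")

dodge <- position_dodge(width = 0.62)  # shared so bars and points line up

p <- ggplot(df, aes(x = factor(year), y = values, group = species)) +
  geom_col(aes(fill = species), position = dodge, width = 0.6) +
  # fill is mapped inside aes(), so the points use the bars' fill scale
  geom_point(aes(y = pt.value, fill = species), shape = 24, size = 3,
             position = dodge) +
  scale_fill_manual(values = species_cols) +
  theme_bw() + labs(x = "Year")
print(p)
```

Shape 24 is a filled triangle, so it responds to the fill aesthetic while keeping a black outline.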

Plot with a grid

I am looking for a type of plot that is essentially a grid with, say, 10 columns and 50 rows, something like this:
Each of the boxes (in this case, 10*50 = 500) will have a unique value that I will be providing via a data frame. Based on the unique values, I'll have a function that will assign a colour to each box. So then it becomes a grid to visualize "the range" of each box. I'd also need to label each of the columns (probably vertically so all labels fit) and rows (horizontally).
I just don't know what kind of plot that will be and I don't know if any libraries do this. I'm just looking for some help in finding something that does this. I'd appreciate some help if possible.
How about heatmap?
m=matrix(runif(12),3,4)
rownames(m)=c("Me","You","Him")
colnames(m)=c("We","Us","Them","I")
heatmap(m,NA,NA)
Note that it works on a matrix and not a data frame because all the values have to be numbers, and data frames are row-oriented records.
See the help for other options.
Look at the image function in the graphics package, or the rasterImage function if you want more control.
You could also build the plot up from scratch using the rect function.
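A minimal base-graphics sketch of the image() route for a 50 by 10 grid is below. The matrix values and the row/column names are made up, and heat.colors() stands in for whatever colour function you end up writing.

```r
set.seed(1)
# Assumption: 50 rows by 10 columns of made-up values
m <- matrix(runif(500), nrow = 50, ncol = 10,
            dimnames = list(paste0("row", 1:50), paste0("col", 1:10)))

# image() draws z[i, j] at (x[i], y[j]), so transpose to put the matrix
# columns along the x axis, and reverse the rows so row1 ends up on top
z <- t(m[nrow(m):1, ])

image(x = 1:ncol(m), y = 1:nrow(m), z = z,
      col = heat.colors(12), axes = FALSE, xlab = "", ylab = "")
# Vertical column labels on the x axis
axis(1, at = 1:ncol(m), labels = colnames(m), las = 2)
# Horizontal row labels on the y axis, shrunk so all 50 fit
axis(2, at = 1:nrow(m), labels = rev(rownames(m)), las = 1, cex.axis = 0.5)
box()
```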
I would go to ggplot2 for this as it allows a high degree of flexibility. In particular geom_tile is useful. If you actually want the panel lines you can comment out the theme(panel.grid.major = element_blank()) + and theme(panel.grid.minor = element_blank()) + lines and of course you can specify the colours as well. The text in each cell is optional; comment out the geom_text call if you don't need that. Note that you can control the size of the plot (rows and columns) simply by resizing the plot window or - if you want to output to a file using png() - by specifying the width and height arguments.
library(ggplot2)
library(reshape)
library(scales)
set.seed(1234)
num.els <- 5
mydf <- data.frame(category1 = rep(LETTERS[1:num.els], 1, each = num.els),
category2 = rep(1:num.els, num.els),
value = runif(num.els^2, 0, 100))
p <- ggplot(mydf, aes(x = category1,
y = category2,
fill = value)) +
geom_tile() +
geom_text(label = round(mydf$value, 2), size = 4, colour = "black") +
scale_fill_gradient2(low = "blue", high = "red",
limits = c(min(mydf$value), max(mydf$value)),
midpoint = median(mydf$value)) +
scale_x_discrete(expand = c(0,0)) +
scale_y_reverse() +
theme(panel.grid.minor = element_blank()) +
theme(panel.grid.major = element_blank()) +
theme(axis.ticks = element_blank()) +
theme(panel.background = element_rect(fill = "transparent"))+
theme(legend.position = "none") +
theme()
print(p)
Output:
And resized:
Let's say you have a data frame with "x" and "y" coordinates for each cell of the grid and a variable "z" for each cell, and you have loaded it in R as "intlgrid":
head(intlgrid)
x y z
243.742 6783.367 0.0035285
244.242 6783.367 0.0037111
244.742 6783.367 0.0039073
"..."
"so on..."
With the ggplot2 package you can easily plot your raster. First install it:
install.packages("ggplot2")
Once ggplot2 is installed, load it:
library(ggplot2)
Now the code:
ggplot(intlgrid, aes(x,y, fill = z)) + geom_raster() + coord_equal()
And then you get your grid plotted.
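To try this without the original file, a stand-in for intlgrid can be built with expand.grid(). The coordinates and z values below are made up; only the column names x, y and z follow the example above.

```r
library(ggplot2)

set.seed(1)
# Made-up regular grid with 0.5-unit spacing and one z value per cell
intlgrid <- expand.grid(
  x = seq(244, by = 0.5, length.out = 10),
  y = seq(6783, by = 0.5, length.out = 50)
)
intlgrid$z <- runif(nrow(intlgrid), 0.003, 0.004)

p <- ggplot(intlgrid, aes(x, y, fill = z)) +
  geom_raster() +
  coord_equal()
print(p)
```

geom_raster() expects evenly spaced cells like these; for irregular spacing, geom_tile() is the more forgiving choice.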

Resources