Connect points within x values for ggplot2? - r

I am plotting a series of points grouped by two factors. I would like to add lines within one group across the other, and within each x value (across the position-dodge distance), to visually highlight trends in the data.
geom_line(), geom_segment(), and geom_path() all seem to draw only to the actual x value rather than to the dodged positions of the points. Is there a way to add a line connecting points within an x value?
Here is a structurally analogous sample:
# Create a sample data set
d <- data.frame(expand.grid(x=letters[1:3],
g1=factor(1:2),
g2=factor(1:2)),
y=rnorm(12))
# Load ggplot2
library(ggplot2)
# Define position dodge
pd <- position_dodge(0.75)
# Define the plot
p <- ggplot(d, aes(x=x, y=y, colour=g1, group=interaction(g1,g2))) +
geom_point(aes(shape = factor(g2)), position=pd) +
geom_line()
# Look at the figure
p
# How to plot the line instead across g1, within g2, and within x?

Simply trying to close this question (@Axeman, please feel free to take over my answer).
p <- ggplot(d, aes(x=x, y=y, colour=g1, group=interaction(g1,g2))) +
geom_point(aes(shape = factor(g2)), position=pd) +
geom_line(position = pd)
# Look at the figure
p
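The answer above dodges whole lines across x. For the literal request (a line across g1, within g2, inside a single x), dodging seems not to help on its own: as far as I can tell, position_dodge() shifts each group as a unit, so a group made of the two g1 points at the same x would get one offset and the line would come out vertical. A workaround, sketched here under stated assumptions, is to compute the dodged x positions yourself and group the lines by interaction(x, g2). The slot order (g1 within g2), the 0.75 total width, and the names width, slot, xpos and p2 are my own choices made to mimic pd; they are not guaranteed to match ggplot2's internal dodge order, so use this in place of position_dodge() rather than mixing the two.

```r
library(ggplot2)

set.seed(1)
d <- data.frame(expand.grid(x = letters[1:3],
                            g1 = factor(1:2),
                            g2 = factor(1:2)),
                y = rnorm(12))

# Hand-rolled dodge: 4 slots per x category, ordered g1 within g2,
# spread over a total width of 0.75 to mimic position_dodge(0.75).
width <- 0.75
slot <- (as.integer(d$g2) - 1) * 2 + as.integer(d$g1)   # values 1..4
d$xpos <- as.integer(d$x) + (slot - 2.5) * width / 4

p2 <- ggplot(d, aes(x = xpos, y = y, colour = g1)) +
  # one line per (x, g2) pair, joining its g1 = 1 and g1 = 2 points
  geom_line(aes(group = interaction(x, g2)), colour = "grey40") +
  geom_point(aes(shape = g2), size = 3) +
  # put the category labels back on the numeric axis
  scale_x_continuous(breaks = 1:3, labels = levels(d$x)) +
  labs(x = "x")
p2
```

Because the line layer's group is set explicitly, each of the 6 (x, g2) pairs becomes its own two-point line.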

Related

Is it possible to make a column plot using ggplot in which the column fill is controlled by a third variable?

I have a data frame with three continuous variables (x,y,z). I want a column plot in which x defines the x-axis position of the columns, y defines the length of the columns, and the column colors (function of y) are defined by z. The test code below shows the set up.
require(ggplot2)
require(viridis)
# Create a dummy data frame
x <- c(rep(0.0, 5),rep(0.5,10),rep(1.0,15))
y <- c(seq(0.0,-5,length.out=5),
seq(0.0,-10,length.out=10),
seq(0.0,-15,length.out=15))
z <- c(seq(10,0,length.out=5),
seq(8,0,length.out=10),
seq(6,0,length.out=15))
df <- data.frame(x=x, y=y, z=z)
pbase <- ggplot(df, aes(x=x, y=y, fill=z))
ptest <- pbase + geom_col(width=0.5, position="identity") +
scale_fill_viridis(option="turbo",
limits = c(0,10),
breaks=seq(0,10,2.5),
labels=c("0","2.5","5.0","7.5","10.0"))
print(ptest)
The legend has the correct colors but the columns do not. Perhaps this is not the correct way to do this type of plot. I tried using geom_bar(), which creates bars with the correct colors, but the y-values are incorrect.
It looks like you have 3 x values that appear 5, 10, and 15 times respectively. Do you want the bars overlaid on top of one another, as they are now? If you add alpha = 0.5 to the geom_col() call, you'll see the overlapping bars.
Alternatively, you might use dodging to show the bars next to one another instead of on top of one another.
ggplot(df, aes(x=x, y=y, fill=z, group = z)) +
geom_col(width=0.5, position=position_dodge()) +
scale_fill_viridis_c(option="turbo", # scale_fill_viridis_c() was added in ggplot2 3.0.0 (2018)
limits = c(0,10),
breaks=seq(0,10,2.5),
labels=c("0","2.5","5.0","7.5","10.0"))
Or you might plot the data in order of y so that the smaller bars appear on top, visibly:
ggplot(dplyr::arrange(df,y), aes(x=x, y=y, fill=z))+
geom_col(width=0.5, position="identity") +
scale_fill_viridis_c(option="turbo",
limits = c(0,10),
breaks=seq(0,10,2.5),
labels=c("0","2.5","5.0","7.5","10.0"))
I solved this by using geom_tile() in place of geom_col().
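The geom_tile() fix isn't shown, so here is one possible reconstruction, not necessarily the asker's code. With geom_col() and position = "identity", every row draws a full bar from 0 to its y; in this data the last row of each x is the longest bar (z = 0), and it covers the others. Treating each row instead as one cell of its column avoids that: a tile at each (x, y), with a height equal to the spacing between consecutive y values within that x, filled by z. The helper column h and the object name ptile are introduced here; tiles are centred on each y, so the top cell extends half a step above 0.

```r
library(ggplot2)

x <- c(rep(0.0, 5), rep(0.5, 10), rep(1.0, 15))
y <- c(seq(0.0, -5, length.out = 5),
       seq(0.0, -10, length.out = 10),
       seq(0.0, -15, length.out = 15))
z <- c(seq(10, 0, length.out = 5),
       seq(8, 0, length.out = 10),
       seq(6, 0, length.out = 15))
df <- data.frame(x = x, y = y, z = z)

# Assumption: rows are evenly spaced cells, so the tile height is
# (span of y within this x) / (number of cells - 1).
df$h <- ave(df$y, df$x, FUN = function(v) abs(diff(range(v))) / (length(v) - 1))

ptile <- ggplot(df, aes(x = x, y = y, fill = z)) +
  geom_tile(aes(height = h), width = 0.5) +
  scale_fill_viridis_c(option = "turbo",
                       limits = c(0, 10),
                       breaks = seq(0, 10, 2.5))
print(ptile)
```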

3-variables plotting heatmap ggplot2

I'm currently working on a very simple data.frame, containing three columns:
x contains x-coordinates of a set of points,
y contains y-coordinates of the set of points, and
weight contains a value associated to each point;
Working in ggplot2, I seem to be able to plot contour levels for these data, but I can't find a way to fill the plot according to the variable weight. Here's the code I used:
ggplot(df, aes(x,y, fill=weight)) +
geom_density_2d() +
coord_fixed(ratio = 1)
You can see that there's no filling whatsoever, sadly.
I've been trying for three days now, and I'm starting to get depressed.
Specifying fill=weight and/or color=weight in the main ggplot() call resulted in nothing. I've tried different geoms (tile, raster, polygon...), still nothing. I also tried specifying the aes directly in the geom layer; that didn't work either.
I tried converting the object to a ppp, but ggplot can't handle those, and base-R plotting didn't work either. I honestly have no idea what's wrong!
I'm attaching the first 10 points' data, which is spaced on an irregular grid:
x = c(-0.13397460,-0.31698730,-0.13397460,0.13397460,-0.28867513,-0.13397460,-0.31698730,-0.13397460,-0.28867513,-0.26794919)
y = c(-0.5000000,-0.6830127,-0.5000000,-0.2320508,-0.6547005,-0.5000000,-0.6830127,-0.5000000,-0.6547005,0.0000000)
weight = c(4.799250e-01,5.500250e-01,4.799250e-01,-2.130287e+12,5.798250e-01,4.799250e-01,5.500250e-01,4.799250e-01,5.798250e-01,6.618956e-01)
Any advice? The desired output would be something along these lines: [image of desired output]
Thank you in advance.
From your description, geom_density_2d() doesn't sound right.
You could try geom_raster:
ggplot(df, aes(x,y, fill = weight)) +
geom_raster() +
coord_fixed(ratio = 1) +
scale_fill_gradientn(colours = rev(rainbow(7))) # colourmap
Here is a second-best option using fill=..level..; there is a good explanation of ..level.. here.
# load libraries
library(ggplot2)
library(RColorBrewer)
# build your data.frame
df <- data.frame(x=x, y=y, weight=weight)
# build color Palette
myPalette <- colorRampPalette(rev(brewer.pal(11, "Spectral")), space="Lab")
# Plot
ggplot(df, aes(x,y, fill=..level..) ) +
stat_density_2d( bins=11, geom = "polygon") +
scale_fill_gradientn(colours = myPalette(11)) +
theme_minimal() +
coord_fixed(ratio = 1)
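Two things in the sample data may get in the way of both answers: geom_raster() is meant for values on an evenly spaced grid, which these points are not, and the single weight of -2.130287e+12 would stretch any continuous fill scale so that every other value maps to nearly the same colour. A hedged alternative is stat_summary_2d(), which bins irregular points onto a regular grid and fills each tile with a summary of the z aesthetic. Dropping the outlier with abs(weight) < 1e6, using bins = 10, and the names df_ok and ph are assumptions for this sketch, not part of the question.

```r
library(ggplot2)

x <- c(-0.13397460, -0.31698730, -0.13397460, 0.13397460, -0.28867513,
       -0.13397460, -0.31698730, -0.13397460, -0.28867513, -0.26794919)
y <- c(-0.5000000, -0.6830127, -0.5000000, -0.2320508, -0.6547005,
       -0.5000000, -0.6830127, -0.5000000, -0.6547005, 0.0000000)
weight <- c(4.799250e-01, 5.500250e-01, 4.799250e-01, -2.130287e+12, 5.798250e-01,
            4.799250e-01, 5.500250e-01, 4.799250e-01, 5.798250e-01, 6.618956e-01)
df <- data.frame(x = x, y = y, weight = weight)

# Assumption: the -2.13e12 value is a bad reading, so keep it out of the fill scale.
df_ok <- subset(df, abs(weight) < 1e6)

# Bin the points onto a regular grid; each tile is filled with the
# mean weight of the points that fall inside it (empty bins are dropped).
ph <- ggplot(df_ok, aes(x = x, y = y, z = weight)) +
  stat_summary_2d(fun = mean, bins = 10) +
  coord_fixed(ratio = 1) +
  labs(fill = "mean weight")
ph
```

With only ten sample points most bins are empty; on the full data set, bins (or binwidth) would need tuning to the point spacing.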

Print correlation data in same plot position across facets

I have a faceted ggplot2 scatterplot, and would like to print summary statistics about the linear regression on each facet, as has been done here and here. Unlike those examples, I am using scales="free", and the ranges of the data in each facet are quite different, but I would like the summary statistics to show up in the same relative position in each facet (e.g. top right corner, or whatever). How can I specify to geom_text or annotate that the label should appear in the same position relative to the panel?
Where I am right now:
library(ggplot2)
library(plyr) # provides ddply() and summarise()
# Fake data
set.seed(2112)
x <- c(1:10, 6:15)
y <- x + c(runif(10), runif(10)*10)
l <- gl(2, 10)
d <- data.frame(x=x, y=y, l=l)
# Calculate a summary statistic (here, the r-squared) in a separate data frame
r_df <- ddply(d, .(l), summarise, rsq=round(summary(lm(y~x))$r.squared, 2))
# Use geom_text and a separate data frame to print the summary statistic
ggplot(d, aes(x=x, y=y)) +
geom_text(data=r_df, aes(x=8, y=8, label=paste("rsq=", rsq)))+
geom_point() +
facet_wrap(~l, scales="free")
I would like, instead, to have ggplot automatically position the text in the same relative position in each facet.
If you want to place them relative to the corners, you can achieve that by specifying an x or y position of Inf or -Inf:
ggplot(d, aes(x=x, y=y)) +
geom_text(data=r_df, aes(label=paste("rsq=", rsq)),
x=-Inf, y=Inf, hjust=-0.2, vjust=1.2)+
geom_point() +
facet_wrap(~l, scales="free")
I also adjusted hjust and vjust so the label is not in the exact corner of the panel but pushed away from it a bit.
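plyr, which provides ddply(), is retired. If you would rather not depend on it, here is a base-R sketch of the same per-facet r-squared table using split() and lapply(). It assumes the d built in the question; the resulting r_df has the same columns (l and rsq), so it should drop into the geom_text() call above unchanged.

```r
set.seed(2112)
x <- c(1:10, 6:15)
y <- x + c(runif(10), runif(10) * 10)
l <- gl(2, 10)
d <- data.frame(x = x, y = y, l = l)

# One row per facet level: fit y ~ x within that level and keep the rounded R^2.
r_df <- do.call(rbind, lapply(split(d, d$l), function(s) {
  data.frame(l = s$l[1],
             rsq = round(summary(lm(y ~ x, data = s))$r.squared, 2))
}))
r_df
```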

When I use stat_summary with line and point geoms I get a double legend

I have data for 4 sectors (A, B, C, D) and 5 years. I would like to draw 4 lines, one per sector, with a point for every year, plus a fifth line for the mean drawn with stat_summary(). Line colors are controlled with scale_color_manual() and point shapes through aes(). The problem is that once the points are added alongside the mean line, the legend splits into two parts, one for point shapes and one for line colors. I can't work out how to get a single legend combining colors and shapes.
Here is an example. First of all let's build the data frame dtfr as follows:
a <- 100; b <- 100; c <- 100; d <- 100
for(k in 2:5){
a[k] <- a[k-1]*(1+rnorm(1)/100)
b[k] <- b[k-1]*(1+rnorm(1)/100)
c[k] <- c[k-1]*(1+rnorm(1)/100)
d[k] <- d[k-1]*(1+rnorm(1)/100)
}
v <- numeric()
for(k in 1:5){ v <- c(v,a[k],b[k],c[k],d[k]) }
dtfr <- data.frame(Year=rep(2008:2012,1, each=4),
Sector=rep(c("A","B","C","D"),5),
Value=v,
stringsAsFactors=FALSE)
Now let's draw the graph with ggplot2. The first graph has the line and point geoms but no mean line:
library(ggplot2)
ggplot(dtfr, aes(x=Year, y=Value)) +
geom_line(aes(group=Sector, color=Sector)) +
geom_point(aes(color=Sector, shape=Sector)) +
# stat_summary(aes(colour="mean",group=1), fun.y=mean, geom="line", size=1.1) +
scale_color_manual(values=c("#004E00", "#33FF00", "#FF9966", "#3399FF", "#FF004C")) +
ggtitle("Test for ggplot2 graph")
This graph has a single legend combining line colors and point shapes.
But if I use the stat_summary to draw the mean line using the following code:
ggplot(dtfr, aes(x=Year, y=Value)) +
geom_line(aes(group=Sector, color=Sector)) +
geom_point(aes(color=Sector, shape=Sector)) +
stat_summary(aes(colour="mean",group=1), fun.y=mean, geom="line", size=1.1) +
scale_color_manual(values=c("#004E00", "#33FF00", "#FF9966", "#3399FF", "#FF004C")) +
ggtitle("Test for ggplot2 graph")
I get the mean (red) line, but the legend is split into two parts, one for line colors and one for point shapes. So my question is: how can I draw the mean line and still get a single legend, like the one in the first graph, that combines lines and shapes?
Try this:
ggplot(dtfr, aes(x=Year, y=Value)) +
geom_line(aes(group=Sector, color=Sector)) +
geom_point(aes(color=Sector, shape=Sector)) +
stat_summary(aes(colour="mean",shape="mean",group=1), fun.y=mean, geom="line", size=1.1) +
scale_color_manual(values=c("#004E00", "#33FF00", "#FF9966", "#3399FF", "#FF004C")) +
scale_shape_manual(values=c(1:4, 32)) +
ggtitle("Test for ggplot2 graph")
Maybe someone more knowledgeable can correct my explanation (or provide a better solution), but here's how I understand it: you have 5 values in the color scale but only 4 in the shape scale, because there is no shape value for "mean". Since the two scales don't match, they aren't merged into one legend. You can fix this by assigning a blank shape (32) to the mean line.
Here is a different approach that calculates the summary/mean beforehand and adds it as an additional level to the data frame before building the plot.
This makes it easy to add an extra line with its own color, which may be what you want for a summary such as the mean.
First, I calculate the mean and add it to the OP's dtfr.
library(dplyr) # provides %>% and the verbs below
dtfr2 <- dtfr %>%
dplyr::group_by(Year) %>%
dplyr::summarise(Value = mean(Value)) %>%
dplyr::mutate(Sector = NA) %>%
dplyr::bind_rows(dtfr)
dtfr2 now has additional rows with the mean values stored in Value and NAs in Sector.
Then, building the plot is easy:
p1 <- ggplot(dtfr2, aes(x=Year, y=Value, color = Sector, shape = Sector)) +
geom_line() +
geom_point()
Finally, you may tweak the legend a little:
p1 +
scale_color_discrete(labels = c(letters[1:4], "M"), na.value = "black") +
scale_shape_discrete(labels = c(letters[1:4], "M"))

ggplot2 using geom_errorbar and geom_point to add points to a plot

I have a plot using ggplot, and I would like to add points and error bars to it. I am using geom_errorbar and geom_point, but I am getting an error: "Discrete value supplied to continuous scale" and I am not sure why. The data labels in the plot below should remain the same. I simply want to add new points to the existing graph. The new graph should look like the one below, except with two points/CI bars for each label on the Y axis.
The following example uses the lme4 package and produces the plot with confidence intervals shown below (everything can be reproduced except the last two lines of broken code). My data differs only in that it has about 15 intercepts instead of the 6 below (which is why I am using scale_shape_manual).
The last two lines of code are my attempt at adding points/confidence intervals. I'm going to put a 50-point bounty on this. Please let me know if I am being unclear. Thanks!
library("lme4")
data(package = "lme4")
# Dyestuff
# a balanced one-way classification of Yield
# from samples produced from six Batches
summary(Dyestuff)
# Batch is an example of a random effect
# Fit 1-way random effects linear model
fit1 <- lmer(Yield ~ 1 + (1|Batch), Dyestuff)
summary(fit1)
coef(fit1) #intercept for each level in Batch
randoms<-ranef(fit1, postVar = TRUE)
qq <- attr(ranef(fit1, postVar = TRUE)[[1]], "postVar")
rand.interc<-randoms$Batch
#THESE ARE THE ADDITIONAL POINTS TO BE ADDED TO THE PLOT
Inter <- c(-25,-45,20,30,23,67)
SE2 <- c(20,20,20,20,20,20)
df<-data.frame(Intercepts=randoms$Batch[,1],
sd.interc=2*sqrt(qq[,,1:length(qq)]), Intercepts2=Inter, sd.iterc2=SE2,
lev.names=rownames(rand.interc))
df$lev.names<-factor(df$lev.names,levels=df$lev.names[order(df$Intercepts)])
library(ggplot2)
p <- ggplot(df,aes(lev.names,Intercepts,shape=lev.names))
#Added horizontal line at y=0
#Includes first set of points/confidence intervals. This works without error
p <- p + geom_hline(yintercept=0) +geom_errorbar(aes(ymin=Intercepts-sd.interc, ymax=Intercepts+sd.interc), width=0,color="black") + geom_point(aes(size=2))
#Removed legends and with scale_shape_manual point shapes set to 1 and 16
p <- p + guides(size=FALSE,shape=FALSE) + scale_shape_manual(values=c(16,16,16,16,16,16))
#Changed appearance of plot (black and white theme) and x and y axis labels
p <- p + theme_bw() + xlab("Levels") + ylab("")
#Final adjustments of plot
p <- p + theme(axis.text.x=element_text(size=rel(1.2)),
axis.title.x=element_text(size=rel(1.3)),
axis.text.y=element_text(size=rel(1.2)),
panel.grid.minor=element_blank(),
panel.grid.major.x=element_blank())
#To put levels on y axis you just need to use coord_flip()
p <- p+ coord_flip()
print(p)
#####
# code for adding more plots, NOT working yet
p <- p +geom_errorbar(aes(ymin=Intercepts2-sd.interc2, ymax=Intercepts2+sd.interc2),
width=0,color="gray40", lty=1, size=1)
p <- p + geom_point(aes(Intercepts2, lev.names),size=0,pch=7)
First, your data frame df and your geom_errorbar() call use two different names, sd.iterc2 and sd.interc2. I renamed the column in df to sd.interc2.
The last geom_point() call fails because your x and y values are in the wrong order. Because you are using coord_flip(), x and y must be given in the same order as in the original plot before coord_flip(), that is, lev.names as x and Intercepts2 as y. I also changed size= to 5 to make the points easier to see.
+ geom_point(aes(lev.names,Intercepts2),size=5,pch=7)
Update - adding legend
To add a legend for the two intercept types, one option is to reshape your data to long format and add a new column with the intercept type. Another option, with your existing data, is to first remove shape=lev.names from the ggplot() call, then add shape="somename" inside aes() in both geom_point() calls, and finally set the shape values you need with scale_shape_manual().
ggplot(df,aes(lev.names,Intercepts))+
geom_hline(yintercept=0) +
geom_errorbar(aes(ymin=Intercepts-sd.interc, ymax=Intercepts+sd.interc), width=0,color="black")+
geom_point(aes(shape="Intercepts"),size=5)+
theme_bw() + xlab("Levels") + ylab("")+
theme(axis.text.x=element_text(size=rel(1.2)),
axis.title.x=element_text(size=rel(1.3)),
axis.text.y=element_text(size=rel(1.2)),
panel.grid.minor=element_blank(),
panel.grid.major.x=element_blank())+
coord_flip()+
geom_errorbar(aes(ymin=Intercepts2-sd.interc2, ymax=Intercepts2+sd.interc2),
width=0,color="gray40", lty=1, size=1) +
geom_point(aes(lev.names,Intercepts2,shape="Intercepts2"),size=5)+
scale_shape_manual(values=c(16,7))
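The long-format option mentioned above could look roughly like this. To keep the sketch runnable without lme4, the df here is a stand-in: the column names follow the answer (with sd.interc2), Intercepts2 and the standard errors of 20 come from the question, but the Intercepts values are made up for illustration. The names long, est, se, type and p_long are introduced here, and the small position_dodge() that separates the two estimates per level is my own addition.

```r
library(ggplot2)

# Stand-in for the question's df; the Intercepts values are placeholders.
df <- data.frame(lev.names   = LETTERS[1:6],
                 Intercepts  = c(-12, -1, 14, -17, 32, -16),
                 sd.interc   = rep(20, 6),
                 Intercepts2 = c(-25, -45, 20, 30, 23, 67),
                 sd.interc2  = rep(20, 6))

# Stack both sets of estimates with a 'type' column, so a single
# shape scale (and a single legend) covers both.
long <- rbind(
  data.frame(lev.names = df$lev.names, est = df$Intercepts,
             se = df$sd.interc, type = "Intercepts"),
  data.frame(lev.names = df$lev.names, est = df$Intercepts2,
             se = df$sd.interc2, type = "Intercepts2")
)

dodge <- position_dodge(width = 0.5)
p_long <- ggplot(long, aes(x = lev.names, y = est, shape = type, colour = type)) +
  geom_hline(yintercept = 0) +
  geom_errorbar(aes(ymin = est - se, ymax = est + se), width = 0, position = dodge) +
  geom_point(size = 3, position = dodge) +
  scale_shape_manual(values = c(16, 7)) +
  scale_colour_manual(values = c("black", "gray40")) +
  theme_bw() + xlab("Levels") + ylab("") +
  coord_flip()
p_long
```

Since shape and colour both map to type with the same title and levels, their legends should merge into one.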
