How to use sec_axis() for discrete data in ggplot2 R? - r

I have discrete data that looks like this:
height <- c(1,2,3,4,5,6,7,8)
weight <- c(100,200,300,400,500,600,700,800)
person <- c("Jack","Jim","Jill","Tess","Jack","Jim","Jill","Tess")
set <- c(1,1,1,1,2,2,2,2)
dat <- data.frame(set,person,height,weight)
I'm trying to plot a graph with the same x-axis (person) and two different y-axes (weight and height). All the examples I find either plot a secondary axis (sec_axis) or plot discrete data using base plots, not both.
Is there an easy way to use sec_axis for discrete data in ggplot2?
Edit: Someone in the comments suggested I try the suggested reply. However, I now run into an error.
Here is my current code:
p1 <- ggplot(data = dat, aes(x = person, y = weight)) +
geom_point(color = "red") + facet_wrap(~set, scales="free")
p2 <- p1 + scale_y_continuous("height",sec_axis(~.*1.2, name="height"))
p2
I get the error: Error in x < range[1] :
comparison (3) is possible only for atomic and list types
Alternately, now I have modified the example to match this example posted.
p <- ggplot(dat, aes(x = person))
p <- p + geom_line(aes(y = height, colour = "Height"))
# adding the relative weight data, transformed to match roughly the range of the height
p <- p + geom_line(aes(y = weight/100, colour = "Weight"))
# now adding the secondary axis, following the example in the help file ?scale_y_continuous
# and, very important, reverting the above transformation
p <- p + scale_y_continuous(sec.axis = sec_axis(~.*100, name = "Relative weight [%]"))
# modifying colours and theme options
p <- p + scale_colour_manual(values = c("blue", "red"))
p <- p + labs(y = "Height [inches]",
x = "Person",
colour = "Parameter")
p <- p + theme(legend.position = c(0.8, 0.9))+ facet_wrap(~set, scales="free")
p
I get an error that says
"geom_path: Each group consists of only one observation. Do you need to
adjust the group aesthetic?"
I get the template, but no points get plotted

R function arguments are matched by position if argument names are not specified explicitly. As mentioned by @Z.Lin in the comments, you need sec.axis= before your sec_axis() call to indicate that you are passing it to the sec.axis argument of scale_y_continuous. If you don't do that, it is passed to the second argument of scale_y_continuous, which is breaks=. The error message therefore arises because a sec_axis object is not an acceptable data type for the breaks argument:
p1 <- ggplot(data = dat, aes(x = person, y = weight)) +
geom_point(color = "red") + facet_wrap(~set, scales="free")
p2 <- p1 + scale_y_continuous("weight", sec.axis = sec_axis(~.*1.2, name="height"))
p2
The first argument (name=) of scale_y_continuous is for the first y scale, whereas the sec.axis= argument is for the second y scale. I changed your first y scale name to correct that.
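The positional-matching rule described above can be seen in isolation with a toy function. This is only a sketch: toy_scale() is a hypothetical stand-in that mimics the argument order discussed in the answer (name, then breaks, with sec.axis later); it is not the real scale_y_continuous().

```r
# Hypothetical stand-in for a scale function; only the argument order matters here.
toy_scale <- function(name = NULL, breaks = NULL, sec.axis = NULL) {
  list(name = name, breaks = breaks, sec.axis = sec.axis)
}

# An unnamed second argument is matched by position, so it lands in `breaks`.
by_position <- toy_scale("weight", "secondary axis spec")

# Naming the argument routes it to `sec.axis` regardless of its position.
by_name <- toy_scale("weight", sec.axis = "secondary axis spec")

str(by_position)
str(by_name)
```

Comparing the two str() outputs shows where the second value ended up in each call, which is the same mix-up that triggered the error in the question.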

Related

Creating a legend with shapes using ggplot2

I have created the following code for a graph in which four fitted lines and corresponding points are plotted. I have problems with the legend. For some reason I cannot find a way to assign the different shapes of the points to a variable name. Also, the colours do not line up with the actual colours in the graph.
y1 <- c(1400,1200,1100,1000,900,800)
y2 <- c(1300,1130,1020,970,830,820)
y3 <- c(1340,1230,1120,1070,940,850)
y4 <- c(1290,1150,1040,920,810,800)
df <- data.frame(x,y1,y2,y3,y4)
g <- ggplot(df, aes(x=x), shape="shape") +
geom_smooth(aes(y=y1), colour="red", method="auto", se=FALSE) + geom_point(aes(y=y1),shape=14) +
geom_smooth(aes(y=y2), colour="blue", method="auto", se=FALSE) + geom_point(aes(y=y2),shape=8) +
geom_smooth(aes(y=y3), colour="green", method="auto", se=FALSE) + geom_point(aes(y=y3),shape=6) +
geom_smooth(aes(y=y4), colour="yellow", method="auto", se=FALSE) + geom_point(aes(y=y4),shape=2) +
ylab("x") + xlab("y") + labs(title="overview")
geom_line(aes(y=1000), linetype = "dashed")
theme_light() +
theme(plot.title = element_text(color="black", size=12, face="italic", hjust = 0.5)) +
scale_shape_binned(name="Value g", values=c(y1="14",y2="8",y3="6",y4="2"))
print(g)
I am wondering why the colours don't match up and how I can construct such a legend that it is clear which shape corresponds to which variable name.
While you can add the legend manually via scale_shape_manual, a more appropriate solution would be to reshape your data (try using tidyr::pivot_longer() on the y1:y4 variables) and then assign the resulting grouping variable to the shape aesthetic (you can then manually set the colors to your liking). You would then need only a single geom_point() and geom_smooth() instead of four of each.
Also, your example is not reproducible (what are the values of x?), and your code emits some warnings while trying to perform loess smoothing (because there are fewer data points than needed to perform it).
Update (2021-12-12)
Here's a reproducible example in which we reshape the original data and feed it to ggplot using its aes() function to automatically plot different geom_point and geom_smooth for each "y group". I made up the values for the x variable.
library(ggplot2)
library(tidyr)
x <- 1:6
y1 <- c(1400,1200,1100,1000,900,800)
y2 <- c(1300,1130,1020,970,830,820)
y3 <- c(1340,1230,1120,1070,940,850)
y4 <- c(1290,1150,1040,920,810,800)
df <- data.frame(x,y1,y2,y3,y4)
data2 <- df %>%
pivot_longer(y1:y4, names_to = "group", values_to = "y")
ggplot(data2, aes(x, y, color = group, shape = group)) +
geom_point(size = 3) + # increased size for increased visibility
geom_smooth(method = "auto", se = FALSE)
Run the code line by line in RStudio and inspect data2; I think it will make more sense that way. Here's the resulting output:
Another update
Freek19, in your second example you'll need to specify both the shape and color scales manually, so that ggplot2 considers them to be the same, like so:
library(ggplot2)
data <- ... # from your previous example
ggplot(data, aes(x, y, shape = group, color = group)) +
geom_smooth() +
geom_point(size = 3) +
scale_shape_manual("Program type", values=c(1, 2, 3,4,5)) +
scale_color_manual("Program type", values=c(1, 2, 3,4,5))
Hope this helps.
I managed to get close to what I want, using:
library(ggplot2)
data <- data.frame(x = c(0,0.02,0.04,0.06,0.08,0.1),
y = c(1400,1200,1100,1000,910,850, #y1
1300,1130,1010,970,890,840, #y2
1200,1080,980,950,880,820, #y3
1100,1050,960,930,830,810, #y4
1050,1000,950,920,810,800), #y5
group = rep(c("5%","6%","7%","8%","9%"), each = 6))
data
Values <- ggplot(data, aes(x, y, shape = group, color = group)) + # Create line plot with default colors
geom_smooth(aes(color=group)) + geom_point(aes(shape=group),size=3) +
scale_shape_manual(values=c(1, 2, 3,4,5))+
geom_line(aes(y=1000), linetype = "dashed") +
ylab("V(c)") + xlab("c") + labs(title="Valuation")+
theme_light() +
theme(plot.title = element_text(color="black", size=12, face="italic", hjust = 0.5))+
labs(group="Program Type")
Values
My only remaining problem is that I end up with 2 legends. I want to change the name of both, because otherwise they overlap, but I am not sure how to do this.

How to stop ggplot line plot adding fill

I am producing a ggplot which looks at a curve in a dataset. When I build the plot, ggplot is automatically adding fill to data which is on the negative side of the x axis. Script and plot shown below.
ggplot(df, aes(x = Var1, y = Var2)) +
geom_line() +
geom_vline(xintercept = 0) +
geom_hline(yintercept = Var2[1])
Using base R, I am able to get the plot shown below which is how it should look.
plot(x = df$Var1, y = df$Var2, type = "l",
xlab = "Var1", ylab = "Var2")
abline(v = 0)
abline(h = df$Var2[1])
If anyone could help identify why I might be getting the automatic fill and how I could make it stop, I would be very appreciative. I would like to make this work in ggplot so I can later animate the line as it is a time series that can be used to compare between other datasets from the same source.
Can add data if necessary. Data set is 1561 obs long however. Thanks in advance.
I guess you should try
ggplot(df, aes(x = Var1, y = Var2)) +
geom_path() +
geom_vline(xintercept = 0) +
geom_hline(yintercept = df$Var2[1])
instead. geom_line() connects the points in order of the variable on the x-axis, whereas geom_path() connects them in the order in which they appear in the data.
Take a look at this example
dt <- data.frame(
x = c(seq(-pi/2,3*pi,0.001),seq(-pi/2,3*pi,0.001)),
y = c(sin(seq(-pi/2,3*pi,0.001)), cos(seq(-pi/2,3*pi,0.001)))
)
ggplot(dt, aes(x,y)) + geom_line()
The two points with x-coordinate -pi/2 will be connected first, creating a vertical black line. Next x = -pi/2 + 0.001 will be processed and so on. The x values will be processed in order.
Therefore you should use geom_path() to get the desired result
dt <- data.frame(
x = c(seq(-pi/2,3*pi,0.001),seq(-pi/2,3*pi,0.001)),
y = c(sin(seq(-pi/2,3*pi,0.001)), cos(seq(-pi/2,3*pi,0.001)))
)
ggplot(dt, aes(x,y)) + geom_path()
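The ordering difference is easier to see on a tiny data set. The following is a minimal sketch with made-up values (the data frame d is hypothetical and not taken from the question):

```r
library(ggplot2)

# Four points whose row order differs from their x order.
d <- data.frame(x = c(0, 2, 1, 3),
                y = c(0, 1, 2, 3))

# geom_line() joins the points in order of x: (0,0), (1,2), (2,1), (3,3).
p_line <- ggplot(d, aes(x, y)) + geom_line() + ggtitle("geom_line()")

# geom_path() joins the points in row order: (0,0), (2,1), (1,2), (3,3).
p_path <- ggplot(d, aes(x, y)) + geom_path() + ggtitle("geom_path()")

print(p_line)
print(p_path)
```

If the data describe a curve that doubles back on itself along x, as a time series plotted against another variable can, the row order is usually the one that should be preserved.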

Adding multiple points to a ggplot ecdf plot

I'm trying to generate a ggplot only C.D.F. plot for some of my data. I am also looking to be able to plot an arbitrary number of percentiles as points on top. I have a solution that works for adding a single point to my curve but fails for multiple values.
This works for plotting one percentile value
TestDf <- as.data.frame(rnorm(1000))
names(TestDf) <- c("Values")
percentiles <- c(0.5)
ggplot(data = TestDf, aes(x = Values)) +
stat_ecdf() +
geom_point(aes(x = quantile(TestDf$Values, percentiles),
y = percentiles))
However this fails
TestDf <- as.data.frame(rnorm(1000))
names(TestDf) <- c("Values")
percentiles <- c(0.25,0.5,0.75)
ggplot(data = TestDf, aes(x = Values)) +
stat_ecdf() +
geom_point(aes(x = quantile(TestDf$Values, percentiles),
y = percentiles))
With error
Error: Aesthetics must be either length 1 or the same as the data (1000): x, y
How can I add an arbitrary number of points to a stat_ecdf() plot?
You need to define a new dataset, outside of the aesthetics. aes refers to the original dataframe that you used for making the CDF (in the original ggplot argument).
ggplot(data = TestDf, aes(x = Values)) +
stat_ecdf() +
geom_point(data = data.frame(x=quantile(TestDf$Values, percentiles),
y=percentiles), aes(x=x, y=y))

Violin plots with additional points

Suppose I make a violin plot, with say 10 violins, using the following code:
library(ggplot2)
library(reshape2)
df <- melt(data.frame(matrix(rnorm(500),ncol=10)))
p <- ggplot(df, aes(x = variable, y = value)) +
geom_violin()
p
I can add a dot representing the mean of each variable as follows:
p + stat_summary(fun.y=mean, geom="point", size=2, color="red")
How can I do something similar but for arbitrary points?
For example, if I generate 10 new points, one drawn from each distribution, how could I plot those as dots on the violins?
You can give any function to stat_summary, provided it returns a single value, so one can use the function sample. Put extra arguments, such as size, in fun.args:
# fun.y was deprecated in ggplot2 3.3.0 in favour of fun
p + stat_summary(fun = "sample", geom = "point", fun.args = list(size = 1))
Assuming your points are qualified using the same group names (i.e., variable), you should be able to define them manually with:
library(dplyr) # provides group_by(), sample_n() and %>%
newdf <- group_by(df, variable) %>% sample_n(10)
p + geom_point(data=newdf)
The points can be anything, including static numbers:
newdf <- data.frame(variable = unique(df$variable), value = seq(-2, 2, len=10))
p + geom_point(data=newdf)
I had a similar problem. Code below exemplifies the toy problem - How does one add arbitrary points to a violin plot? - and solution.
## Visualize data set that comes in base R
head(ToothGrowth)
## Make a violin plot with dose variable on x-axis, len variable on y-axis
# Convert dose variable to factor - Important!
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
# Plot
p <- ggplot(ToothGrowth, aes(x=dose, y=len)) +
geom_violin(trim = FALSE) +
geom_boxplot(width=0.1)
# Suppose you want to add 3 blue points
# [0.5, 10], [1,20], [2, 30] to the plot.
# Make a new data frame with these points
# and add them to the plot with geom_point().
TrueVals <- ToothGrowth[1:3,]
TrueVals$len <- c(10,20,30)
# Make dose variable a factor - Important for positioning points correctly!
TrueVals$dose <- as.factor(c(0.5, 1, 2))
# Plot with 3 added blue points
p <- ggplot(ToothGrowth, aes(x=dose, y=len)) +
geom_violin(trim = FALSE) +
geom_boxplot(width=0.1) +
geom_point(data = TrueVals, color = "blue")

Why are the colors wrong on this ggplot? [duplicate]

I am new to ggplot2 so please have mercy on me.
My first attempt produces a strange result (at least it's strange to me). My reproducible R code is:
library(ggplot2)
iterations = 7
variables = 14
data <- matrix(ncol=variables, nrow=iterations)
data[1,] = c(0,0,0,0,0,0,0,0,10134,10234,10234,10634,12395,12395)
data[2,] = c(18596,18596,18596,18596,19265,19265,19390,19962,19962,19962,19962,20856,20856,21756)
data[3,] = c(7912,11502,12141,12531,12718,12968,13386,17998,19996,20226,20388,20583,20879,21367)
data[4,] = c(0,0,0,0,0,0,0,43300,43500,44700,45100,45100,45200,45200)
data[5,] = c(11909,11909,12802,12802,12802,13202,13307,13808,21508,21508,21508,22008,22008,22608)
data[6,] = c(11622,11622,11622,13802,14002,15203,15437,15437,15437,15437,15554,15554,15755,16955)
data[7,] = c(8626,8626,8626,9158,9158,9158,9458,9458,9458,9458,9458,9458,9558,11438)
df <- data.frame(data)
n_data_rows = nrow(df)
previous_volumes = df[1:(n_data_rows-1),]/1000
todays_volume = df[n_data_rows,]/1000
time = seq(ncol(df))/6
min_y = min(previous_volumes, todays_volume)
max_y = max(previous_volumes, todays_volume)
ylimit = c(min_y, max_y)
x = seq(nrow(previous_volumes))
# This gives a plot with 6 gray lines and one red line, but no legend
p = ggplot()
for (row in x) {
y1 = as.integer(previous_volumes[row,])
dd = data.frame(time, y1)
p = p + geom_line(data=dd, aes(x=time, y=y1, group="1"), color="gray")
}
p
This code produces a correct plot... but no legend. The plot looks like:
If I move "color" inside "aes", I now get a legend... but the colors are wrong.
For example, the code:
p = ggplot()
for (row in x) {
y1 = as.integer(previous_volumes[row,])
dd = data.frame(time, y1)
p = p + geom_line(data=dd, aes(x=time, y=y1, group="1", color="gray"))
}
y2 = as.integer(todays_volume[1,])
dd = data.frame(time, y2)
p = p + geom_line(data=dd, aes(x=time, y=y2, group="2", colour="red"))
p
produces:
Why are the line colors wrong?
Charles
Colours can be set on an individual layer (i.e. colour = XYZ outside aes()), but colours set this way will not appear in any legend. Legends are produced when an aesthetic (in this case colour) is mapped to a variable in your data, in which case you need to tell ggplot2 how to represent that mapping. In your second attempt, "gray" and "red" inside aes() are treated as data values rather than colours, so ggplot2 assigns them colours from its default discrete palette. If you do not specify the representation explicitly, ggplot2 will make a best guess (for example, a discrete scale for factor data versus a continuous scale for numeric data). There are many options available here, including (but not limited to): scale_colour_continuous, scale_colour_discrete, scale_colour_brewer, scale_colour_manual.
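The difference between setting a colour and mapping one can be sketched on a small made-up data set. The data frame d and its column names are hypothetical and only serve to illustrate the point:

```r
library(ggplot2)

# Hypothetical long-format data: two series over three time points.
d <- data.frame(time   = rep(1:3, times = 2),
                value  = c(1, 2, 3, 3, 2, 1),
                series = rep(c("previous", "today"), each = 3))

# Setting: colour is a constant outside aes(), so there is no legend.
p_set <- ggplot(d, aes(time, value, group = series)) +
  geom_line(colour = "gray")

# Mapping: colour is tied to a variable inside aes(), which produces a legend;
# the scale then decides which actual colour each value receives.
p_map <- ggplot(d, aes(time, value, colour = series)) +
  geom_line() +
  scale_colour_manual(values = c(previous = "gray", today = "red"))

print(p_set)
print(p_map)
```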
By the sounds of it, scale_colour_manual is probably what you are after. Note that below I have mapped the 'variable' column in the data to the colour aesthetic, and the discrete values [PREV-A to PREV-F, Today] exist in that column, so we now need to specify which actual colour each of 'PREV-A', 'PREV-B', ... 'PREV-F' and 'Today' represents.
Alternatively, if the variable column contains 'actual' colours (i.e. hex '#FF0000' or name 'red'), then you can use scale_colour_identity. We can also create another column of categories ('Previous', 'Today') to make things a little easier, in which case be sure to introduce the 'group' aesthetic mapping so that series with the same colour (which are actually different series) are not joined into one continuous line.
First prepare the data, then go through some different methods to assign colours.
# Put data as points 1 per row, series as columns, start with
# previous days
df.new = as.data.frame(t(previous_volumes))
#Rename the series, for colour mapping
colnames(df.new) = sprintf("PREV-%s",LETTERS[1:ncol(df.new)])
#Add the times for each point.
df.new$Times = seq(0,1,length.out = nrow(df.new))
#Add the Todays Volume
df.new$Today = as.numeric(todays_volume)
#Put in long format, to enable mapping of the 'variable' to colour.
df.new.melt = reshape2::melt(df.new,'Times')
#Create some colour mappings for use later
df.new.melt$color_group = sapply(as.character(df.new.melt$variable),
function(x)switch(x,'Today'='Today','Previous'))
df.new.melt$color_identity = sapply(as.character(df.new.melt$variable),
function(x)switch(x,'Today'='red','grey'))
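And here are a few different ways of manipulating the colours:
# Base plot shared by the variants below. Its definition is missing from this copy
# of the answer; this one is assumed from the columns of df.new.melt.
base = ggplot(df.new.melt, aes(x = Times, y = value))
#1. Base plot + color mapped to variable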
plot1 = base + geom_path(aes(color=variable)) +
ggtitle("Plot #1")
#2. Base plot + color mapped to variable, Manual scale for Each of the previous days and today
colors = setNames(c(rep('gray',nrow(previous_volumes)),'red'),
unique(df.new.melt$variable))
plot2 = plot1 + scale_color_manual(values = colors) +
ggtitle("Plot #2")
#3. Base plot + color mapped to color group
plot3 = base + geom_path(aes(color = color_group,group=variable)) +
ggtitle("Plot #3")
#4. Base plot + color mapped to color group, Manual scale for each of the groups
plot4 = plot3 + scale_color_manual(values = c('Previous'='gray','Today'='red')) +
ggtitle("Plot #4")
#5. Base plot + color mapped to color identity
plot5 = base + geom_path(aes(color = color_identity,group=variable))
plot5a = plot5 + scale_color_identity() + #Identity not usually in legend
ggtitle("Plot #5a")
plot5b = plot5 + scale_color_identity(guide='legend') + #Identity forced into legend
ggtitle("Plot #5b")
gridExtra::grid.arrange(plot1,plot2,plot3,plot4,
plot5a,plot5b,ncol=2,
top="Various Outputs")
So given your question, #2 or #4 is probably what you are after, using #2, we can add another layer to render the value of the last points:
#Additionally, add label of the last point in each series.
df.new.melt.labs = plyr::ddply(df.new.melt,'variable',function(df){
df = tail(df,1) #Last Point
df$label = sprintf("%.2f",df$value)
df
})
baseWithLabels = base +
geom_path(aes(color=variable)) +
geom_label(data = df.new.melt.labs,aes(label=label,color=variable),
position = position_nudge(y=1.5),size=3,show.legend = FALSE) +
scale_color_manual(values=colors)
print(baseWithLabels)
If you want to be able to distinguish between the various 'PREV-X' lines, then you can also map linetype to this variable and/or make the label geometry more descriptive, below demonstrates both modifications:
#Add labels of the last point in each series, include series info:
df.new.melt.labs2 = plyr::ddply(df.new.melt,'variable',function(df){
df = tail(df,1) #Last Point
df$label = sprintf("%s: %.2f",df$variable,df$value)
df
})
baseWithLabelsAndLines = base +
geom_path(aes(color=variable,linetype=variable)) +
geom_label(data = df.new.melt.labs2,aes(label=label,color=variable),
position = position_nudge(y=1.5),hjust=1,size=3,show.legend = FALSE) +
scale_color_manual(values=colors) +
labs(linetype = 'Series')
print(baseWithLabelsAndLines)
My solution, which I got from here, is to add scale_colour_identity() to your ggplot object:
p = p + geom_line(data=dd, aes(x=time, y=y2, group="2", colour="red"))
p = p + scale_colour_identity()
p
