Add multiple shape legends in ggplot and overlaying shapes - r

I am trying to create an easily understandable ggplot graph with 3 subgroups delineated by geom_point's 1) color (3 colors; for A, B, and C variables), 2) overall shape (3 colored shapes with borders; for c, d, and e criteria), and 3) a cross shape overlaid over the points (2 groups; some with shape overlaid and some without based on df$Subscale = 1 vs. 0).
I am having difficulty figuring out how to incorporate the aesthetics and a separate legend for no. 3, since that would represent a second shape-based aesthetic.
Here's what I have so far:
It looks okay with these subgroups (other than the color legend not working yet). Next I want to overlay a shape on all the points whose variable names (y-axis) contain underscores, using df$Subscale (e.g., A_1, C2_1, B_2 rather than A, C, B2). Since I am already using the shape aesthetic, I don't know how to re-apply a shape conditionally.
Example of the shape I'd like applied:
Here is the code for the sample dataset df:
#The way my data is currently structured
a<- c("A", "A_1", "A_2", "A_3", "A2", "A2_1", "A2_2",
"B", "B_1", "B_2", "B2", "B2_1",
"C", "C_1", "C_2", "C2", "C2_1")
b<- c(rep(1, times=4),
rep(2, times = 3),
rep(1, times = 3),
rep(2, times = 2),
rep(1, times = 3),
rep(2, times = 2))
col<- c(rep(1, times=7),
rep(2, times = 5),
rep(3, times = 5))
u <- c(0, rep(1, times=3),
0, rep(1, times = 2),
0, rep(1, times = 2),
0, rep(1, times = 1),
0, rep(1, times = 2),
0, rep(1, times=1))
set.seed(12)
c <- round(rnorm(17, .5, 1),2)
d <- round(rnorm(17, .0, .5),2)
e <- round(rnorm(17, -.2, .5),2)
dat<-data.frame(cbind(a, b, col, u, c, d, e))
#Restructuring for graphing
library(reshape)
df <- melt(dat, id.vars = c("a", "b", "col", "u"))
colnames(df) <- c("Name", "Type", "Color", "Subscale", "Criteria", "Value")
df$Value<- as.numeric(as.character(df$Value))
df$Name_order <- factor(df$Name, levels=df$Name[order(df$Value[df$Criteria == "c"])], ordered=TRUE)
Here is the code to create the graphs:
library(ggplot2)
palette <- c("#56B4E9", "#D55E00","#009E73")
graph_test <- ggplot(df, aes(x=df$Value, y = df$Name_order,
colour = df$Color, shape = df$Criteria)) +
geom_point(size = 6, aes(#colour=factor(df$Color),
fill=factor(df$Color),
shape=factor(df$Criteria))) +
scale_shape_manual(values=c(21, 24, 22),
labels=c("Criteria1", "Criteria2", "Criteria3")) +
scale_fill_manual(values=palette,
labels = c("c", "d", "e")) +
scale_color_manual(values=c(rep("black", times = 3))) +
labs(fill = "ABC", shape = "Criteria")
#First graph
graph_test
#Second graph
graph_test + geom_point(size = 5, shape=3)
I considered 6 categories for the shape aes, but I would still need to overlay the cross shape conditionally, and I would prefer 3 legends (3 colors, 3 shapes, 2 with overlay vs. not).
df$CriteriabySub <- paste0(df$Criteria, df$Subscale)
Any ideas/tips for correctly applying the cross shape to some of the points and creating a third legend for it?
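One possible approach, sketched here on a small stand-in data set rather than the full df: draw the crosses in a second geom_point() layer that only receives the rows with Subscale == 1, and map an otherwise unused aesthetic (alpha here) to a constant label so that the layer gets its own legend. The toy data frame and the "Subscale item" / "Overlay" labels are assumptions made for illustration; the column names follow the question.

```r
library(ggplot2)

# Small stand-in data set (hypothetical values); columns mirror the question's df
toy <- data.frame(
  Name     = rep(c("A", "A_1", "B", "B_1"), times = 2),
  Color    = factor(rep(c(1, 1, 2, 2), times = 2)),
  Subscale = rep(c(0, 1, 0, 1), times = 2),
  Criteria = rep(c("c", "d"), each = 4),
  Value    = c(0.2, -0.3, 0.8, 0.1, 0.5, 0.4, -0.6, 0.9)
)

p <- ggplot(toy, aes(x = Value, y = Name)) +
  # Layer 1: filled shapes with a black border (fill = group, shape = criteria)
  geom_point(aes(fill = Color, shape = Criteria), size = 6, colour = "black") +
  # Layer 2: crosses only for subscale rows; alpha is mapped to a constant
  # label purely so that this layer gets a separate legend entry
  geom_point(data = subset(toy, Subscale == 1),
             aes(alpha = "Subscale item"), shape = 3, size = 4) +
  scale_shape_manual(values = c(c = 21, d = 24)) +
  # Show a fillable shape in the fill legend so the colours are visible
  scale_fill_manual(values = c("#56B4E9", "#D55E00"),
                    guide = guide_legend(override.aes = list(shape = 21))) +
  scale_alpha_manual(name = "Overlay", values = c("Subscale item" = 1)) +
  labs(fill = "ABC", shape = "Criteria")

p
```

Because only the cross layer maps alpha, the "Overlay" legend should show just the cross, giving three legends in total. The same idea should carry over to the full df with three fill colours and three shapes.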

Related

Values in gganimate col chart differs from original data values

I'm starting with animated charts and using gganimate package. I've found that when generating a col chart animation over time, values of variables change from original. Let me show you an example:
Data <- as.data.frame(cbind(c(1,1,1,2,2,2,3,3,3),
c("A","B","C","A","B","C","A","B","C"),
c(20,10,15,20,20,20,30,25,35)))
colnames(Data) <- c("Time","Object","Value")
Data$Time <- as.integer(Data$Time)
Data$Value <- as.numeric(Data$Value)
Data$Object <- as.character(Data$Object)
p <- ggplot(Data,aes(Object,Value)) +
stat_identity() +
geom_col() +
coord_cartesian(ylim = c(0,40)) +
transition_time(Time)
p
The chart obtained looks like this:
Values obtained in the Y-axis are between 1 and 6. It seems that the original value of 10 corresponds to a value of 1 in the Y-axis. 15 is 2, 20 is 3 and so on...
Is there a way for keeping the original values in the chart?
Thanks in advance
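Your data changed when you coerced a factor variable to numeric. cbind() builds a character matrix, so with stringsAsFactors = TRUE (the default before R 4.0) Value becomes a factor, and as.numeric() on a factor returns the level codes (10 becomes 1, 15 becomes 2, 20 becomes 3, ...) instead of the original numbers. (See the Data section for a more direct way to define a data.frame.)
You were also missing position = "identity", which keeps the bars in the same place. I added fill = Time for illustration.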
Code
library(ggplot2)
library(gganimate)
p <- ggplot(Data, aes(Object, Value, fill = Time)) +
geom_col(position = "identity") +
coord_cartesian(ylim = c(0, 40)) +
transition_time(Time)
p
Data
Data <- data.frame(Time = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
Object = c("A", "B", "C", "A", "B", "C", "A", "B", "C"),
Value = c(20, 10, 15, 20, 20, 20, 30, 25, 35))
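The factor coercion described in this answer can be reproduced in isolation with base R. This is a minimal sketch; the vector name value is just for illustration.

```r
# Values stored as a factor, as they would be after cbind() + as.data.frame()
# with stringsAsFactors = TRUE
value <- factor(c("20", "10", "15"))

codes  <- as.numeric(value)                # level codes, not the data
values <- as.numeric(as.character(value))  # the original numbers

print(codes)
print(values)
```

Going through as.character() first recovers the numbers, but building the data.frame directly, as in the Data section, avoids the problem altogether.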

Plot order in ggplot by colour and not alphabetical

I have the following code which splits the ggplot by color. Instead of plotting the x axis alphabetically, is there an easy way to group the bars so that, for example, the red bars are next to each other? Manually moving them is not an option as I have many more variables - cheers.
mydata <- data.frame(x = c("a", "d", "c", "q", "e", "s", "r", "b"),
n = c("UK","EUR","UK", "UK", "EUR", "GLB", "GLB", "EUR"),
F = c(-6, 17, 26, -37, 44, -22, 15, 12))
ggplot(mydata, aes(x = x, y = F, colour = n, fill =n)) + geom_bar(stat = "Identity")
You can try:
library(tidyverse)
mydata %>%
mutate(x1 = factor(x, levels=x[order(n,F)])) %>%
ggplot(aes(x = x1, y = F, colour = n, fill =n)) +
geom_col()
Not sure if it is what you want.
I guess you are plotting bar charts, and the bars are currently in alphabetical order like the following example,
library(ggplot2)
library(dplyr)
sample_data <- data.frame(
city = letters[1:5],
value = c(10, 11, 17, 12, 13),
country = c("c1", "c2", "c1", "c1", "c2")
)
ggplot(sample_data) +
geom_col(aes(x=city, y=value, colour=country, fill=country))
The order of the bars (left to right) is a, b, c, d, e. However, you want the bars ordered by country (the variable that determines the colour/fill), i.e. a (c1), c (c1), d (c1), b (c2), e (c2).
To do this, you can set the 'correct' order of city using factor(city, levels=...). Since you want to sort city by country, the levels would be city[order(country)].
sample_data <- sample_data %>%
mutate(city2 = factor(city, levels=city[order(country)]))
ggplot(sample_data) +
geom_col(aes(x=city2, y=value, colour=country, fill=country))
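If the bars should also be sorted by value within each country, order() accepts more than one key. The sketch below uses base R only and the same sample_data as above; the helper name ord is just for illustration.

```r
sample_data <- data.frame(
  city    = letters[1:5],
  value   = c(10, 11, 17, 12, 13),
  country = c("c1", "c2", "c1", "c1", "c2")
)

# Sort by country first, then by value within each country
ord <- order(sample_data$country, sample_data$value)
sample_data$city2 <- factor(sample_data$city, levels = sample_data$city[ord])

# Inspect the resulting axis order without plotting
levels(sample_data$city2)
```

Checking levels() before plotting is a quick way to confirm that the factor carries the order you expect.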

ggplot2 geom_line indicating group size

Following http://docs.ggplot2.org/current/aes_group_order.html
h <- ggplot(Oxboys, aes(age, height))
h + geom_line(aes(group = Subject))
Produces
But if two Subjects have exactly the same line, one subject's line will hide the other. Could we use line thickness or intensity to indicate the number of subjects who have the same line? Could we add a bubble using geom_point() to indicate the number of subjects?
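Use geom_line(aes(group = Subject), alpha = .5), so that overlapping lines appear darker. (Subject must be unquoted; aes(group = 'Subject') would put every row into a single group.) Play around with the alpha values.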
You could accomplish it by first mapping the colour and size aesthetics and then adjusting their values using the scale_size_manual and scale_colour_manual functions. Below is a demonstration of the approach.
# a fake data set with two pairs of identical lines:
df <- data.frame(t = c(1:10, 1:10, 1:10, 1:10),
a = c(1:10, 1:10, seq(5, 8, length =10), seq(5, 8, length =10)),
c = rep(c("a", "b", "c", "d"), each = 10))
ggplot(df, aes(x = t, y = a, group = c)) +
geom_line(aes(size = c, colour = c)) +
scale_size_manual(values = c(4, 2, 3, 1.5)) +
scale_colour_manual(values = c("black", "red", "blue", "yellow"))
You must consider how your grouping factor (in the example c) is ordered, because the lines are also plotted in this order. So the line which is plotted first should get a larger value for size.
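To make the line thickness reflect how many subjects share a line, one option is to count identical trajectories first and map that count to the width. The following is only a sketch on toy data: the names toy, sig and n_same are hypothetical, it assumes each subject's rows are already sorted by t, and the linewidth aesthetic assumes ggplot2 3.4.0 or later.

```r
library(ggplot2)

# Toy data: subjects "a" and "b" share a line, "c" and "d" do not
toy <- data.frame(
  t  = rep(1:3, times = 4),
  y  = c(1:3, 1:3, 4:6, 7:9),
  id = rep(c("a", "b", "c", "d"), each = 3)
)

# One signature string per subject, built from its y values in order
sig <- sapply(split(toy$y, toy$id), paste, collapse = "_")

# Number of subjects sharing each signature, looked up for every row
n_by_sig <- table(sig)
toy$n_same <- as.integer(n_by_sig[sig[as.character(toy$id)]])

p <- ggplot(toy, aes(t, y, group = id)) +
  geom_line(aes(linewidth = n_same)) +  # thicker line = more subjects
  geom_point(aes(size = n_same))        # bubble size = number of subjects

p
```

Comparing pasted signatures only detects exactly identical lines; nearly identical lines would need rounding or a tolerance before pasting.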

connect points in ggplot based on specific column values

I have the following data set called t:
n <- 12
t <- data.frame(
V1 = runif(n, 0.12, 0.35),
V2 = runif(n, 0.25, 0.39),
group = gl(3, 4, labels = c("a1", "a2", "a3")),
x = seq_len(n),
color = rep(rep.int(c("R", "G"), 2), c(3, 4, 3, 2))
)
I created the following plot from this data.
p <- ggplot(t, aes(x, colour = color)) +
geom_point(aes(y = V1, size = 10)) +
geom_point(aes(y = V2, size = 10))
What I want to do now is connect the points according to the group column (e.g., points of group a1 connected by a blue line, points of group a2 by a yellow line, ...), and I want the line type to differ between V1 and V2 (dashed line for V1 and solid line for V2).
How can this be done?
First of all: naming a dataset "t" is not a good idea because it is confusing since there is a function t() as well.
The easiest way is to melt() your dataset first
library(reshape2) # melt() is also provided by the older reshape package
Molten <- melt(t, id.vars = c("group", "x", "color"))
ggplot(Molten, aes(x = x, y = value, colour = group, linetype = variable)) + geom_line()
Have a look at the ggplot2 website on how to customise the colours.
If you want to plot your graph without using melt():
p <-ggplot(t) + geom_line(aes(x,V2,color=group)) + geom_line(aes(x,V1,color=group), linetype = "dashed")

creating and combining two plots - xy line plot with bar chain plot in R

I intend to create a graph from the following two data sets:
first data (will develop bottom portion)
position <- c(10, 26, 31, 50, 73, 92, 120, 124) # need scale: minimum 0 to maximum 130
label <- c("A", "B", "C", "D", "E", "F", "G", "H")
mydf <- data.frame (position, label)
second data (will develop the overlaid line plot)
pos <- 1:130
value <- seq (0, 1.29, 0.01)
mydf2 <- data.frame (pos, value)
The graph I want to develop (similar or higher quality):
My trial
The following is what I tried, complete scratch !
yvar <- rep(1, length(position))
require (ggplot2)
bar <- data.frame(y = c(1, 1), x = c(0, 130))
ggplot() +
geom_line(aes(x, factor(y), group = factor(y)),
bar, size = 2, colour = "skyblue") +
geom_rect(aes(y = yvar,
xmin = position - 0.1,
xmax = position + 0.1,
ymin = 1 - yvar /2,
ymax = 1 + yvar /2))
Here is a solution with base graphics.
# Split the plot area in two
layout(matrix(c(1,1,2),nc=1))
# First plot
plot( pos, value, type="l", las=1 )
# Reduce the margins for the second plot
m <- par()$mar
m[1] <- m[3] <- 0
par(mar=m)
# Set the limits of the second plot
plot( pos, pos-pos, type="n", axes=FALSE, xlab="", ylab="" )
# Add the rectangle, the segments and the text.
polygon(
c(0,max(mydf2$pos),max(mydf2$pos),0),
.2*c(-1,-1,1,1),
col=rgb(.1,.5,.3)
)
segments( mydf$position, -.5, mydf$position, .5 )
text(mydf$position, -.7, mydf$label)
text(mydf$position, .7, mydf$position)
