How to add a legend on a multiple line graph in R?

I am trying to plot two different datasets on the same plot. I am using this code to add the lines and to actually plot everything
ggplot()+
geom_point(data=Acc, aes(x=Year, y=Accumulo), color="lightskyblue")+
geom_line(data=Acc, aes(x=Year, y=RM3), color="gold1")+
geom_line(data=Acc, aes(x=Year, y=RM5), color="springgreen3")+
geom_line(data=Acc, aes(x=Year, y=RM50), color="blue")+
geom_line(data=Vulcani, aes(x=Year, y=Accumulo.V), color="red")+
theme_bw()+
scale_x_continuous(expand=expand_scale(0)) + scale_y_continuous(limits=c(50,350),expand=expand_scale(0))
but I can't find any way to add a legend with custom labels for the different series. I found a way to add a legend for a single dataset, but I can't find a way to add a legend on the side of this plot.

You are better off first creating a single dataset in long format, tailored to what the plot needs, so that you can issue a single geom_line() call and color the lines with aes(color = ...) inside it. Here's an example with the midwest dataset (the two subsets below are treated as distinct datasets for the sake of example):
library(ggplot2)
library(dplyr)
library(tidyr)
long_midwest <- midwest %>%
select(popwhite, popasian, PID, poptotal) %>%
gather(key = "variable", value = "value", -PID, -poptotal) # convert to long format
long_midwest2 <- midwest %>%
select(poptotal, perchsd, PID) %>%
gather(key = "variable", value = "value", -PID, -poptotal)
plot_data <- bind_rows(long_midwest, long_midwest2) %>% # bind datasets vertically
mutate(line_type = ifelse(variable == 'perchsd', 'A', 'B')) # creates a line_type variable
ggplot(data = plot_data, aes(x=poptotal, y = value))+
geom_line(aes(color = variable, linetype = line_type)) +
scale_color_manual(
values = c('lightskyblue', 'gold1', 'blue'),
name = "My color legend"
) +
scale_linetype_manual(
values = c(3, 1), # play with the numbers to get the correct styling
name = "My linetype legend"
)
I added a line_type variable to show the most generic case, where you want a specific mapping between the column values and the line type. If it is the same as, say, variable, just use aes(color = variable, linetype = variable). You can then decide which linetype you want (see here for more details).
For customising the labels, just change the content of variable within the dataset with the desired values.
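If reshaping is not convenient, a legend can also be obtained while keeping one layer per series, by mapping a constant label inside aes() and assigning the colors in scale_color_manual(). The following is only a minimal sketch: the Acc and Vulcani data frames are random stand-ins for the ones in the question, and the legend labels ("3-year mean", etc.) are hypothetical.

```r
library(ggplot2)

# Hypothetical stand-ins for the Acc and Vulcani data frames from the question
set.seed(1)
Acc <- data.frame(
  Year = 1950:2000,
  RM3  = runif(51, 100, 300),
  RM5  = runif(51, 100, 300)
)
Vulcani <- data.frame(Year = 1950:2000, Accumulo.V = runif(51, 100, 300))

# A constant string inside aes() becomes a legend entry;
# scale_color_manual() maps each entry to its color.
p <- ggplot() +
  geom_line(data = Acc, aes(x = Year, y = RM3, color = "3-year mean")) +
  geom_line(data = Acc, aes(x = Year, y = RM5, color = "5-year mean")) +
  geom_line(data = Vulcani, aes(x = Year, y = Accumulo.V, color = "Volcanoes")) +
  scale_color_manual(
    name = "Series",
    values = c("3-year mean" = "gold1",
               "5-year mean" = "springgreen3",
               "Volcanoes"   = "red")
  ) +
  theme_bw()

p
```

This works for a handful of series, but the long-format approach above scales better as the number of lines grows.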

Related

How to insert color code for two geom_step functions in the same grid

I am currently working on a comparison between two inventory levels and I want to plot two step graphs in the same grid with a color code. This is my code.
Intento1<-data.frame(Fecha, NivelI)
Intento2<-data.frame(Fecha, Nivel2)
#Printing the step graphs in one grid
ggplot()+geom_step(Intento1, mapping=aes(x=Fecha, y=NivelI))+geom_step(Intento2, mapping=aes(x=Fecha, y=Nivel2))
It works fine for plotting both graphs in the same grid, and I could also give each graph a different color, but I couldn't add the little colored labels that normally appear at the right. All support is appreciated.
For example, with dummy data:
library(ggplot2)
library(magrittr) # provides %>%
library(reshape2) # provides melt()
dummy <- data.frame(
Fecha = seq(as.Date("2020/1/1"), as.Date("2020/1/31"), "day")
)
dummy$NivelI = runif(31, 0, 10)
dummy$Nivel2 = runif(31, 0, 10)
plotting with reshape2::melt as below will work.
dummy %>%
melt(id.vars = "Fecha") %>%
ggplot(aes(Fecha, value, group = variable, color = variable)) +
geom_step() + guides(color = guide_legend(title = "aaa"))
In your case, to build a data frame shaped like dummy, if Fecha, NivelI and Nivel2 are vectors, just try
df <- data.frame(
Fecha,
NivelI,
Nivel2
)
then
df %>%
melt(id.vars = "Fecha") %>%
ggplot(aes(Fecha, value, group = variable, color = variable)) +
geom_step() + guides(color = guide_legend(title = "aaa"))
where "aaa" will be your legend name.
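As a variation on the reshaping step above, here is a minimal sketch that uses tidyr::pivot_longer() instead of melt() and sets the legend title, colors and entry labels in one scale_color_manual() call. The random data, the colors and the labels "Level 1" / "Level 2" are assumptions for illustration only.

```r
library(ggplot2)
library(tidyr)

# Random example data with the same column names as in the question
set.seed(42)
dummy <- data.frame(
  Fecha  = seq(as.Date("2020-01-01"), as.Date("2020-01-31"), by = "day"),
  NivelI = runif(31, 0, 10),
  Nivel2 = runif(31, 0, 10)
)

# One row per (date, series) pair
long_dummy <- pivot_longer(
  dummy,
  cols = c(NivelI, Nivel2),
  names_to = "variable",
  values_to = "value"
)

p <- ggplot(long_dummy, aes(Fecha, value, color = variable)) +
  geom_step() +
  scale_color_manual(
    name   = "Inventory level",            # legend title
    breaks = c("NivelI", "Nivel2"),        # order of the legend entries
    labels = c("Level 1", "Level 2"),      # hypothetical display labels
    values = c(NivelI = "steelblue", Nivel2 = "firebrick")
  )

p
```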

create plot in ggplot for each unique value in a row in r

I have a dataframe like this:
library(tidyverse)
my_data <- tibble(name = c("Justin", "Janet", "Marisa"),
x = c(100, 50, 75),
y = c(2, 3, 6))
Each name is unique, and I want to make a bar graph for each person without having to do it line by line. I also want to save each plot as a unique object because I'll be inserting it into a PowerPoint file using the officer package. Lastly, the names won't always be the same, but each name will always be unique.
For instance, I want one plot for Janet, one plot for Justin, and one plot for Marisa. I don't want them faceted but instead as their own objects.
Any thoughts?
We can get the data in long format first and for each individual name create the plot.
library(tidyverse)
long_data <- my_data %>% tidyr::pivot_longer(cols = -name, names_to = 'col')
plots_list <- map(unique(my_data$name), ~long_data %>%
filter(name == .x) %>%
ggplot() + aes(name, value, fill = col) +
geom_bar(stat = 'identity', position = 'dodge') +
scale_fill_manual(values = c('red', 'blue')) +
ggtitle(paste0('Plot for ', .x)))
This will return a list of plots, where individual plots can be accessed via plots_list[[1]], plots_list[[2]], etc.
plots_list[[1]]
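Since the names change from run to run, it may be handier to look plots up by name rather than by position. A minimal sketch of that variation, assuming purrr's set_names() (loaded with the tidyverse) and using geom_col() as shorthand for geom_bar(stat = 'identity'); the object name named_plots is just illustrative:

```r
library(tidyverse)

my_data <- tibble(name = c("Justin", "Janet", "Marisa"),
                  x = c(100, 50, 75),
                  y = c(2, 3, 6))

long_data <- my_data %>%
  pivot_longer(cols = -name, names_to = "col")

# set_names() with no second argument names each element after itself,
# so map() returns a list named by person.
named_plots <- unique(my_data$name) %>%
  set_names() %>%
  map(~ long_data %>%
        filter(name == .x) %>%
        ggplot(aes(name, value, fill = col)) +
        geom_col(position = "dodge") +
        ggtitle(paste0("Plot for ", .x)))

named_plots[["Janet"]]
```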

Can I control the order of multiple labels for the same value using geom_text_repel?

I'm making a plot where several data points have the same coordinates. By default, the labels all overlap, but using geom_text_repel with direction = "y", I can vertically space them out.
However, every time I generate the plot, it chooses a new order for the labels. I would like them to be ordered based on a value.
I have tried:
using "arrange" to order the dataframe in the order that I want to see the labels (this seems to have no effect)
Trying to use "nudge_y" to re-arrange the labels in the order I want them. This seems to change the plot - it does "nudge" them - but it does NOT nudge them into the correct order!
Here is sample code to recreate the problem. Basically, I want the final plot to be ordered by the "order" value - so, for the three datapoints on "10", the order should be Ayala, Zoe, JL, and for the two datapoints on "5", the order should be Raph, Oona.
I've color-coded the plot to make it obvious what order they should be in - for each value, the lightest blue should be on top, and the darkest should be on the bottom.
library(tidyverse)
library(ggrepel)
name <- c("Oona","Sam","Raph", "JL", "Zoe","Ayala")
year <- rep(c("2016"),6)
value <- c(5,15,5,10,10,10) #The value I'm plotting
order <- c(5,-10,10,-5,0,5) #The value I want to order the labels by
test_df <- bind_cols(name = name, year = year, value = value, order = order) %>%
arrange(-value, -order) #arranging the df doesn't seem to affect the order on the plot at all, I just do it so I can easily preview the df in the correct order
ggplot(data = test_df, aes(x = year, y = value, group = name)) +
geom_point(aes(color = order)) +
geom_text_repel(data = test_df,
aes(label = name, color = order),
hjust = "left",
nudge_y = order, #This is where I'm trying to "nudge" them into the right order
nudge_x = -.45,
direction = "y")
I think the values in your order column were too big for the y-axis scale provided, so geom_text_repel was doing behind-the-scenes work to make it all actually fit, and changed the order of the labels in the process. When I scaled the order column down to one-fifth the sizes you had originally, it worked perfectly.
test_df$order <- test_df$order*1/5
ggplot(data = test_df, aes(x = year, y = value, group = name)) +
geom_point(aes(color = order)) +
geom_text_repel(data = test_df,
aes(label = name, color = order),
hjust = "left",
nudge_y = test_df$order,
nudge_x = -.45,
direction = "y"
)
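On the separate complaint that the layout changes every time the plot is generated: the repulsion algorithm is randomized, and geom_text_repel() accepts a seed argument in recent ggrepel releases. A sketch under that assumption (the seed value 123 is arbitrary, and hjust = 0 is used as the numeric equivalent of "left"):

```r
library(ggplot2)
library(ggrepel)

test_df <- data.frame(
  name  = c("Oona", "Sam", "Raph", "JL", "Zoe", "Ayala"),
  year  = "2016",
  value = c(5, 15, 5, 10, 10, 10),
  order = c(5, -10, 10, -5, 0, 5)
)

p <- ggplot(test_df, aes(x = year, y = value, group = name)) +
  geom_point(aes(color = order)) +
  geom_text_repel(
    aes(label = name, color = order),
    hjust = 0,                       # left-aligned labels
    nudge_y = test_df$order / 5,     # scaled-down nudge, as in the answer
    nudge_x = -.45,
    direction = "y",
    seed = 123                       # fixed seed: same layout on every run
  )

p
```

Fixing the seed only makes the result repeatable; it is the scaled nudge that steers the labels toward the intended order.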

Create stacked bar plot with certain columns as 'fill' argument and row summaries as the 'x' argument

I have the dataframe below:
Target_Category<-c("Adhesion","Cytochrome")
Validated<-c(5,10)
Candidate<-c(7,8)
dataf<-data.frame(Target_Category,Validated,Candidate)
and I want to create a stacked bar plot like the one in the image attached in which the fill argument should be given by the Validated and Candidate while the x argument should be the summary of each row.
I have tried something like:
p4 <- ggplot() + geom_bar(aes(y = , x = Target_Category, fill = ), data = dataf,
stat="identity")
p4
You'll need to reshape your dataset before plotting:
library(tidyverse)
Target_Category<-c("Adhesion","Cytochrome")
Validated<-c(5,10)
Candidate<-c(7,8)
dataf<-data.frame(Target_Category,Validated,Candidate)
dataf %>%
gather(col, value, -Target_Category) %>%
ggplot() +
geom_bar(aes(Target_Category, value, fill = col), stat="identity")
The rest is about selecting useful names for your variables (i.e. what to show on the plot), picking the colours you like and placing the legend wherever you want.
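Those finishing touches could look like the sketch below. It uses pivot_longer() in place of gather(); the column names Status and Count, the axis titles, the colours and the legend position are all arbitrary choices for illustration.

```r
library(tidyverse)

dataf <- data.frame(
  Target_Category = c("Adhesion", "Cytochrome"),
  Validated = c(5, 10),
  Candidate = c(7, 8)
)

# Long format: one row per (category, status) pair
long_dataf <- dataf %>%
  pivot_longer(cols = -Target_Category,
               names_to = "Status", values_to = "Count")

p <- ggplot(long_dataf, aes(Target_Category, Count, fill = Status)) +
  geom_col() +                                   # stacked by default
  scale_fill_manual(values = c(Validated = "steelblue",
                               Candidate = "orange")) +
  labs(x = "Target category", y = "Number of targets") +
  theme(legend.position = "bottom")

p
```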

How to graph "before and after" measures using ggplot with connecting lines and subsets?

I’m totally new to ggplot, relatively fresh with R, and want to make a smashing "before-and-after" scatterplot with connecting lines to illustrate the movement in percentages of different subgroups before and after a special training initiative. I’ve tried some options, but have yet to:
show each individual observation separately (now same values are overlapping)
connect the related before and after measures (x=0 and x=1) with lines to more clearly illustrate the direction of variation
subset the data along class and id using shape and colors
How can I best create a scatter plot using ggplot (or other) fulfilling the above demands?
Main alternative: geom_point()
Here is some sample data and example code using geom_point()
x <- c(0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1) # 0=before, 1=after
y <- c(45,30,10,40,10,NA,30,80,80,NA,95,NA,90,NA,90,70,10,80,98,95) # percentage of "feelings of peace"
class <- c(0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,1) # 0=multiple days 1=one day
id <- c(1,1,2,3,4,4,4,4,5,6,1,1,2,3,4,4,4,4,5,6) # id = per individual
df <- data.frame(x,y,class,id)
ggplot(df, aes(x=x, y=y), fill=id, shape=class) + geom_point()
Alternative: scale_size()
I have explored stat_sum() to summarize the frequencies of overlapping observations, but then I am not able to subset using colors and shapes because of the overlap.
ggplot(df, aes(x=x, y=y)) +
stat_sum()
Alternative: geom_dotplot()
I have also explored geom_dotplot() to clarify the overlapping observations that arise from using geom_point(), as in the example below. However, I have yet to understand how to combine the before and after measures into the same plot.
df1 <- df[1:10,] # data before
df2 <- df[11:20,] # data after
p1 <- ggplot(df1, aes(x=x, y=y)) +
geom_dotplot(binaxis = "y", stackdir = "center",stackratio=2,
binwidth=(1/0.3))
p2 <- ggplot(df2, aes(x=x, y=y)) +
geom_dotplot(binaxis = "y", stackdir = "center",stackratio=2,
binwidth=(1/0.3))
grid.arrange(p1,p2, nrow=1) # GridExtra package
Or maybe it is better to summarize the data by x, id and class as the mean or median of y, filter out the ids that produce NAs (e.g. ids 3 and 6), and connect the points with lines? If you don't really need to show the variability within each id (which may be the case if the plot only illustrates tendencies), you can do it this way:
library(ggplot2)
library(dplyr)
#library(ggthemes)
df <- df %>%
group_by(x, id, class) %>%
summarize(y = median(y, na.rm = TRUE)) %>%
ungroup() %>%
mutate(
id = factor(id),
x = factor(x, labels = c("before", "after")),
class = factor(class, labels = c("multiple days", "one day")) # 0 = multiple days, 1 = one day
) %>%
group_by(id) %>%
mutate(nas = any(is.na(y))) %>%
ungroup() %>%
filter(!nas) %>%
select(-nas)
ggplot(df, aes(x = x, y = y, col = id, group = id)) +
geom_point(aes(shape = class)) +
geom_line(show.legend = FALSE) +
#theme_few() +
#theme(legend.position = "none") +
ylab("Feelings of peace, %") +
xlab("")
Here's one possible solution for you.
First - to get the color and shapes determined by variables, you need to put these into the aes function. I turned several into factors, so the labs function fixes the labels so they don't appear as "factor(x)" but just "x".
To address multiple points, one solution is to use geom_smooth with method = "lm". This plots the regression line, instead of connecting all the dots.
The option se = FALSE prevents confidence intervals from being plotted - I don't think they add a lot to your plot, but play with it.
Connecting the dots is done by geom_line - feel free to try that as well.
Within geom_point, the option position = position_jitter(width = .1) adds random noise to the x-axis so points do not overlap.
ggplot(df, aes(x=factor(x), y=y, color=factor(id), shape=factor(class), group = id)) +
geom_point(position = position_jitter(width = .1)) +
geom_smooth(method = 'lm', se = FALSE) +
labs(
x = "x",
color = "ID",
shape = 'Class'
)
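To try the geom_line() route mentioned above while still showing every raw observation, one option is to draw the lines through a per-individual summary. This is a sketch only: using the mean as the summary, and the helper name means, are assumptions rather than part of the answer.

```r
library(ggplot2)
library(dplyr)

# Sample data from the question
df <- data.frame(
  x = rep(c(0, 1), each = 10),                       # 0 = before, 1 = after
  y = c(45,30,10,40,10,NA,30,80,80,NA,95,NA,90,NA,90,70,10,80,98,95),
  class = c(0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,1),
  id = c(1,1,2,3,4,4,4,4,5,6,1,1,2,3,4,4,4,4,5,6)
)

# Mean per individual and time point; groups with only NAs give NaN
# and are dropped.
means <- df %>%
  group_by(id, x) %>%
  summarize(y = mean(y, na.rm = TRUE), .groups = "drop") %>%
  filter(!is.nan(y))

p <- ggplot(df, aes(x = factor(x), y = y, color = factor(id))) +
  geom_point(aes(shape = factor(class)),
             position = position_jitter(width = .1), na.rm = TRUE) +
  geom_line(data = means, aes(group = id)) +         # connects before/after means
  labs(x = "x", color = "ID", shape = "Class")

p
```

Individuals with a value at only one time point (ids 3 and 6 here) keep their points but get no connecting line.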
