How to colour my scatter plot with data that I've combined? - r

I have two datasets (I will eventually be working with eight) that I have combined to create a scatter plot. The issue is, now I've plotted the scatter graph, I do not know how to separate the data I've combined so the colours represent the individual datasets. This is my code...
#Here is what I've combined:
t<-rbind(test202, test342)
#Here is plotting the scatter-plot
```{r}
g<-ggplot(t,aes(x=percentage,y=as.numeric(area), col = area, group = area ))+
geom_point()+
labs(x=expression(paste("Percentage (%)")),
y=expression(paste("Area (m"^2,")"))
)+
scale_y_continuous(breaks = seq(0, 20, by = 1)) +
theme_bw() +
theme(element_blank())
g
```
I've tried googling this but cannot seem to find any specific code to fix this. I've attached a picture of the final product and of 't's contents.
EDIT: dput(t)
t <- structure(list(percentage = structure(c(1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L), .Label = c("30", "35",
"40", "45", "50", "55", "60", "65", "70", "75", "80", "85", "90",
"95"), class = "factor"), area = c(1.0068507612755, 1.28144642344154,
1.55604208560758, 1.92216963516231, 2.28829718471704, 2.65442473427176,
3.20361605860385, 3.75280738293594, 4.39353059465671, 5.30884946854352,
6.49876400459638, 7.96327420281528, 10.068507612755, 13.6382512209135,
1.12935650675177, 1.4004020683722, 1.67144762999262, 1.98766745188312,
2.34906153404369, 2.75562987647432, 3.16219821890496, 3.65911508187574,
4.15603194484652, 4.74329732835744, 5.37573697213844, 6.18887365699971,
7.22788164321134, 8.89932927320397)), row.names = c(NA, -28L), class = "data.frame")

It looks like you are losing track of the source of the data when you rbind the rows together. Since you say you will have more data frames to join eventually, I will provide a dplyr solution.
library(dplyr)
library(ggplot2)
t <- dplyr::bind_rows(list(test202 = test202,
test342 = test342),
.id = 'source')
ggplot(t, aes(x = percentage,
y = as.numeric(area),
col = source,
group = source ))+
geom_point()+
labs(x = expression(paste("Percentage (%)")),
y = expression(paste("Area (m"^2,")"))
)+
scale_y_continuous(breaks = seq(0, 20, by = 1)) +
theme_bw() +
theme(element_blank())
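Since eight datasets are expected eventually, the same idea extends to any number of data frames if they are kept in a named list. The sketch below uses small made-up data frames (test202, test342 and test500 here are placeholders with invented values, not the real data) to show that bind_rows() turns the list names into the source column.

```r
library(dplyr)

# Hypothetical stand-ins for the real data frames; the values are invented
test202 <- data.frame(percentage = c(30, 35), area = c(1.0, 1.3))
test342 <- data.frame(percentage = c(30, 35), area = c(1.1, 1.4))
test500 <- data.frame(percentage = c(30, 35), area = c(0.9, 1.2))

# Keep every dataset in one named list; the names become the labels
datasets <- list(test202 = test202, test342 = test342, test500 = test500)

# .id = "source" adds a column holding the list name of each row's origin
combined <- bind_rows(datasets, .id = "source")

table(combined$source)
```

Mapping col = source in aes() then gives one colour per dataset, however many entries the list has.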

Related

Labeling a single point with ggrepel

I am trying to use geom_label_repel to add labels to a couple of data points on a plot. In this case, they happen to be outliers on box plots. I've got most of the code working, I can label the outlier, but for some reason I am getting multiple labels (equal to my sample size for the entire data set) mapped to that point. I'd like just one label for this outlier.
Example:
Here is my data:
dput(sus_dev_data)
structure(list(time_point = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L), .Label = c("3", "8", "12"), class = "factor"),
days_to_pupation = c(135L, 142L, 143L, 155L, 149L, 159L,
153L, 171L, 9L, 67L, 53L, 49L, 72L, 67L, 55L, 64L, 60L, 122L,
53L, 51L, 49L, 53L, 50L, 56L, 44L, 47L, 60L)), row.names = c(1L,
2L, 3L, 4L, 5L, 6L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L,
17L, 18L, 20L, 21L, 22L, 23L, 24L, 26L, 27L, 28L, 29L, 30L), class = "data.frame")
and my code...
####################################################################################################
# Time to pupation statistical analysis
####################################################################################################
## linear model
pupation_Model=lm(sus_dev_data$days_to_pupation~sus_dev_data$time_point)
pupationANOVA=aov(pupation_Model)
summary(pupationANOVA)
# Tukey test to study each pair of treatment :
pupationTUKEY <- TukeyHSD(x=pupationANOVA, which = 'sus_dev_data$time_point',
conf.level=0.95)
## Function to generate significance labels on box plot
generate_label_df <- function(pupationTUKEY, variable){
# Extract labels and factor levels from Tukey post-hoc
Tukey.levels <- pupationTUKEY[[variable]][,4]
Tukey.labels <- data.frame(multcompLetters(Tukey.levels, reversed = TRUE)['Letters'])
#I need to put the labels in the same order as in the boxplot :
Tukey.labels$treatment=rownames(Tukey.labels)
Tukey.labels=Tukey.labels[order(Tukey.labels$treatment) , ]
return(Tukey.labels)
}
#generate labels using function
labels<-generate_label_df(pupationTUKEY , "sus_dev_data$time_point")
#rename columns for merging
names(labels)<-c('Letters','time_point')
# obtain letter position for y axis using means
pupationyvalue<-aggregate(.~time_point, data=sus_dev_data, max)
#merge dataframes
pupationfinal<-merge(labels,pupationyvalue)
####################################################################################################
# Time to pupation plot
####################################################################################################
# Plot of data
(pupation_plot <- ggplot(sus_dev_data, aes(time_point, days_to_pupation)) +
Alex_Theme +
geom_boxplot(fill = "grey80", outlier.size = 0.75) +
geom_text(data = pupationfinal, aes(x = time_point, y = days_to_pupation,
label = Letters),vjust=-2,hjust=.5, size = 4) +
#ggtitle(expression(atop("Days to pupation"))) +
labs(y = 'Days to pupation', x = 'Weeks post-hatch') +
scale_y_continuous(limits = c(0, 200)) +
scale_x_discrete(labels=c("3" = "13", "8" = "18",
"12" = "22")) +
geom_label_repel(aes(x = 1, y = 9),
label = '1')
)
Here's a shorter example to demonstrate what is going on. Essentially, your labels are being recycled to be the same length as the data.
library(ggplot2)
library(ggrepel)
df = data.frame(x=1:5, y=1:5)
ggplot(df, aes(x,y, color=x)) +
geom_point() +
geom_label_repel(aes(x = 1, y = 1), label = '1')
You can override this by providing new data for the geom_label_repel layer:
ggplot(df, aes(x,y, color=x)) +
geom_point() +
geom_label_repel(data = data.frame(x=1, y=1), label = '1')
Based on your data, you have 3 outliers (one in each group). You can identify them manually by applying John Tukey's classic definition of outliers (upper: Q3+1.5*IQR, lower: Q1-1.5*IQR), although you are free to set your own rules to define an outlier. The functions quantile and IQR give you those bounds.
Here, I incorporated them in a sequence of pipes using the dplyr package:
library(tidyverse)
Outliers <- sus_dev_data %>% group_by(time_point) %>%
mutate(Out_up = ifelse(days_to_pupation > quantile(days_to_pupation,0.75)+1.5*IQR(days_to_pupation), "Out","In"))%>%
mutate(Out_Down = ifelse(days_to_pupation < quantile(days_to_pupation,0.25)-1.5*IQR(days_to_pupation), "Out","In")) %>%
filter(Out_up == "Out" | Out_Down == "Out")
# A tibble: 3 x 4
# Groups: time_point [3]
time_point days_to_pupation Out_up Out_Down
<fct> <int> <chr> <chr>
1 3 9 In Out
2 8 122 Out In
3 12 60 Out In
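The same Tukey rule can be wrapped in a small helper so that one logical vector replaces the two Out_up/Out_Down columns. This is only a sketch; is_tukey_outlier is a made-up name, not a function from any package.

```r
# Hypothetical helper: TRUE where a value lies more than k * IQR
# below the first quartile or above the third quartile
is_tukey_outlier <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  spread <- IQR(x, na.rm = TRUE)
  x < q[[1]] - k * spread | x > q[[2]] + k * spread
}

# Days to pupation for time point "3", copied from the question
days <- c(135, 142, 143, 155, 149, 159, 153, 171, 9)
days[is_tukey_outlier(days)]
```

For this group the helper flags only the value 9, matching the first row of the table above. With dplyr it could be applied per group, for example by calling filter(is_tukey_outlier(days_to_pupation)) after group_by(time_point).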
As mentioned by @dww, you need to pass a new data frame to geom_label_repel if you want each outlier to get a single label. So, here we use the data frame Outliers to feed the geom_label_repel function:
library(ggplot2)
library(ggrepel)
ggplot(sus_dev_data, aes(time_point, days_to_pupation)) +
#Alex_Theme +
geom_boxplot(fill = "grey80", outlier.size = 0.75) +
geom_text(data = pupationfinal, aes(x = time_point, y = days_to_pupation,
label = Letters),vjust=-2,hjust=.5, size = 4) +
#ggtitle(expression(atop("Days to pupation"))) +
labs(y = 'Days to pupation', x = 'Weeks post-hatch') +
scale_y_continuous(limits = c(0, 200)) +
scale_x_discrete(labels=c("3" = "13", "8" = "18",
"12" = "22")) +
geom_label_repel(inherit.aes = FALSE,
data = Outliers,
aes(x = time_point, y = days_to_pupation, label = "Out"))
And you get the following graph:
I hope this helps you figure out how to label all your outliers.

labeling each data point

I have data like this:
df<-structure(list(X = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 10L,
9L, 11L, 12L, 8L), .Label = c("A", "B", "C", "D", "E", "F", "GG",
"IR", "MM", "TT", "TTA", "UD"), class = "factor"), X_t = c(3.7066,
3.6373, 3.2693, 2.5626, 2.4144, 2.2868, 2.1238, 1.8671, 1.7627,
1.4636, 1.4195, 1.0159), NEW = structure(c(8L, 7L, 9L, 1L, 2L,
3L, 4L, 5L, 6L, 10L, 11L, 12L), .Label = c("12-Jan", "14-Jan",
"16-Jan", "19-Jan", "25-Jan", "28-Jan", "4-Jan", "Feb-38", "Feb-48",
"Jan-39", "Jan-41", "Jan-66"), class = "factor")), class = "data.frame", row.names = c(NA,
-12L))
I am trying to put a label on each dot, but I get a warning.
Here is how I plot it:
ggplot(data=df)+
geom_point(aes(X_t, X,size =X_t,colour =X_t,label = NEW))
I also want to merge the two legends into one because they are redundant. If you have any tips, please let me know.
Use geom_text for text (e.g., labels):
ggplot(data=df, aes(X_t, X)) +
geom_point(aes(size = X_t, colour = X_t)) +
geom_text(aes(label = NEW), nudge_y = 0.5) +
guides(color = guide_legend(), size = guide_legend())
Aesthetics you specify in the ggplot() call are inherited by subsequent layers (geoms). So by putting the x and y aesthetics in ggplot(), we don't have to specify them again.
As for the legend question, see this answer for details. To combine color and size legends we use guide_legend.

Issues with displaying data points on every frame of facet_wrap/facet_grid object

I'm trying to produce a plot with either facet_wrap or facet_grid (no preference at this time), but display a selection of data points on every frame within the facet_wrap/facet_grid object.
I read that you can simply remove the facetting variable from the data set you want included on every plot, but for whatever reason this doesn't seem to be working for me.
This is on RStudio Version 1.1.453.
I found this code sample:
ggplot(mpg, aes(displ, hwy)) +
geom_point(data = transform(mpg, class = NULL), colour = "grey85") +
geom_point() +
facet_wrap(~class)
And pretty much copied it for my code below. The above code works fine, but for whatever reason in my implementation it returns an error message. Note I've tried setting both geom features to geom_point also with no luck.
ggplot(data = Total, aes(Total$Time, Total$Killing)) +
geom_jitter(data = transform(Total, Run = NULL), colour = "grey85") +
geom_point() +
facet_wrap(~Run)
Error: Aesthetics must be either length 1 or the same as the data (2700): x, y
This is the error message I've been encountering on attempting to run this code.
Ultimately my goal is to run the below code, but I simplified it a bit for the purposes of the question above.
ggplot(data = filter(Total, Cell_Line != "stDev"), aes(x= Time, y=Killing)) +
geom_line(data = filter(select(Total, -Run), Cell_Line == "Wild_Type"), aes(x = Time, y = filter(Total, Cell_Line == "Wild_Type")[,3])) +
geom_errorbar(aes(x = filter(Total, Cell_Line == "Wild_Type")[,2], ymax = filter(Total, Cell_Line == "Wild_Type")[,3] + filter(Total, Cell_Line == "stDev")[,3], ymin = filter(Total, Cell_Line == "Wild_Type")[,3] - filter(Total, Cell_Line == "stDev")[,3])) +
geom_point() +
facet_wrap(~Run)
And here's the result of dput(Total) trimmed down to the first 30 rows:
structure(list(Cell_Line = structure(c(5L, 12L, 13L, 1L, 2L,
3L, 4L, 6L, 7L, 8L, 9L, 10L, 11L, 15L, 14L, 5L, 12L, 13L, 1L,
2L, 3L, 4L, 6L, 7L, 8L, 9L, 10L, 11L, 15L, 14L), .Label = c("17",
"19", "20", "29", "3", "33", "38", "47", "49", "53", "55", "7",
"8", "stDev", "Wild_Type"), class = "factor"), Time = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("00",
"02", "04", "08", "12", "18", "24", "32", "40", "48", "56", "64",
"72", "80"), class = "factor"), Killing = c(0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0.0704388, 0.2881066, -0.0132908,
0.04700991, 0.03049371, -0.02243472, 0.1513817, 0.129636, 0.09328508,
0.05876777, 0.1063291, 0.0357473, 0.1974026, 0.07732854, 0.07383331
)), row.names = c(NA, 30L), class = "data.frame")
Your call to transform has an error: you don't have a column named Run.
set.seed(1)
Total$Run <- sample(1:100, 30)
# this is your own code:
ggplot(data = Total, aes(Total$Time, Total$Killing)) +
geom_jitter(data = transform(Total, Run = NULL), colour = "grey85") +
geom_point() +
facet_wrap(~Run)
Which produces this plot:
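For comparison, here is a sketch of the same plot with bare column names inside aes() instead of Total$Time and Total$Killing. Referring to columns through $ inside aes() is generally discouraged, because ggplot2 then cannot subset the values per layer or per facet. The small Total data frame below is invented for illustration, including its Run column.

```r
library(ggplot2)

# Invented stand-in for Total: two runs, three time points each
Total <- data.frame(
  Time    = factor(rep(c("00", "02", "04"), times = 2)),
  Killing = c(0, 0.07, 0.15, 0, 0.29, 0.35),
  Run     = rep(c(1, 2), each = 3)
)

p <- ggplot(Total, aes(Time, Killing)) +
  # Dropping Run from this layer's data makes its points appear in every facet
  geom_point(data = transform(Total, Run = NULL), colour = "grey85") +
  # This layer keeps Run, so each facet shows only its own points on top
  geom_point() +
  facet_wrap(~Run)

p
```

The grey layer is drawn in both panels because its data no longer contains the facetting variable, while the default layer is split by Run as usual.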

New error when producing boxplot

So I had this script working yesterday on a different data set, and it actually worked once on this data set, but when I tried to combine it with another figure using plot_grid, I got this error:
Error:
T_SHOW_BACKTRACE environmental variable.
Error in grid.Call(L_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
polygon edge not found
Now when I try to construct the boxplot itself, I get the same error...
Here is my data:
dput(SUICMass)
structure(list(ChillTime = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L,
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L), .Label = c("2", "4", "6", "24",
"27", "29", "31"), class = "factor"), Mass = c(1.2687, 1.5417,
1.6898, 1.7655, 2.413, 2.0333, 2.0824, 1.2676, 1.4916, 2.1585,
2.2453, 1.3624, 1.2951, 2.4209, 2.0804, 1.9227, 1.9032, 2.1063,
1.7601, 1.9905, 1.9837, 1.6312, 1.8567, 1.4433, 1.9369, 2.1029,
2.0265, 1.3212, 1.2971, 1.5823, 1.4759, 1.2745, 0.714, 1.5693,
1.7906, 1.8607, 1.8851, 1.9192, 1.6307, 1.4269, 1.7011, 0.8249,
1.7198, 1.3939, 1.394, 2.1527, 1.288, 1.4724, 1.5264, 1.6562,
1.5796, 1.4982, 1.2794, 1.6021, 0.6345, 2.4041, 2.0246, 1.8398,
1.349, 2.0156, 1.1563, 2.0462)), .Names = c("ChillTime", "Mass"
), row.names = c(NA, -62L), class = "data.frame")
Here is my code:
library(ggplot2)
library(multcompView)
library(plyr)
library(gridExtra)
library(cowplot)
## Box plot for Susans WMA population
SUICMass <- read.csv('SUICMass_Test_June_28_2017.csv', header = TRUE)
SUICMass$ChillTime <- factor(SUICMass$ChillTime, levels=c("2", "4", "6", "24", "27", "29", "31"))
generate_label_df <- function(SUICMassTUKEY, variable){
# Extract labels and factor levels from Tukey post-hoc
Tukey.levels <- SUICMassTUKEY[[variable]][,4]
Tukey.labels <- data.frame(multcompLetters(Tukey.levels)['Letters'])
#I need to put the labels in the same order as in the boxplot :
Tukey.labels$treatment=rownames(Tukey.labels)
Tukey.labels=Tukey.labels[order(Tukey.labels$treatment) , ]
return(Tukey.labels)
}
SUICMassmodel=lm(SUICMass$Mass~SUICMass$ChillTime )
SUICMassANOVA=aov(SUICMassmodel)
# Tukey test to study each pair of treatment :
SUICMassTUKEY <- TukeyHSD(x=SUICMassANOVA, 'SUICMass$ChillTime', conf.level=0.95)
labels<-generate_label_df(SUICMassTUKEY , "SUICMass$ChillTime")#generate labels using function
names(labels)<-c('Letters','ChillTime')#rename columns for merging
SUICMassyvalue<-aggregate(.~ChillTime, data=SUICMass, max)# obtain letter position for y axis using the maximum of each group
SUICMassfinal<-merge(labels,SUICMassyvalue) #merge dataframes
SUICMassPlot <- ggplot(SUICMass, aes(x = ChillTime, y = Mass)) +
stat_boxplot(geom ='errorbar', width=.2) +
geom_blank() +
theme_bw() +
theme(panel.border = element_rect(fill=NA, colour = "black", size=0.75)) +
theme(axis.text.x = element_text(face="bold")) +
theme(axis.text.y = element_text(face="bold")) +
labs(x = 'Time (weeks)', y = 'Mass (g)') +
ggtitle(expression(atop(bold("Fresh Mass"), atop(italic("(Sarah's - UIC Colony)"))))) +
theme(plot.title = element_text(hjust = 0.5, vjust = -0.6, face='bold')) +
geom_boxplot(fill = 'dodgerblue1', stat = "boxplot") +
geom_text(data = SUICMassfinal, aes(x = ChillTime, y = Mass, label = Letters),vjust=-2,hjust=.5) +
scale_y_continuous(limit = c(0, 3.5))
I can't figure out what the issue is here, because sometimes I can get the script to work and other times not.

ggplot: assign color of lines by their value at one specific point of the x-axis

I have a plot of lines with colors from black to green. However, I want to color the lines gradually by their y-value at "Value2" on the x-axis. The line with the highest y-value at "Value2" should be green, the one with the lowest y-value at "Value2" should be black.
How can I assign the color to the lines by their y values at a specific point of the x-axis?
My code:
library(ggplot2)
x <- structure(list(ID = c("1998-06-05_area2", "1999-07-11_area2",
"1998-05-13_area1", "1998-05-20_area1", "1998-06-05_area2", "1999-07-11_area2",
"1998-05-13_area1", "1998-05-20_area1", "1998-06-05_area2", "1999-07-11_area2",
"1998-05-13_area1", "1998-05-20_area1"), variable = structure(c(1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L), .Label = c("Value1",
"Value2", "Value3"), class = "factor"), value = c(322, 280, 210,
416, 384, 252, 329, 601, 83, 66, 100, 147)), .Names = c("ID",
"variable", "value"), na.action = structure(c(1L, 2L, 3L, 4L,
13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 25L, 26L, 27L, 28L), .Names = c("1",
"2", "3", "4", "13", "14", "15", "16", "17", "18", "19", "20",
"25", "26", "27", "28"), class = "omit"), row.names = c(5L, 6L,
7L, 8L, 9L, 10L, 11L, 12L, 21L, 22L, 23L, 24L), class = "data.frame")
pal <- colorRampPalette(c("black","green"))
colorlist <- pal(length(unique(x$ID)))
ggplot(data = x , aes(x = variable, y = value, color = ID)) +
geom_line(aes(group =ID),size=1) + geom_point(size = 2) +
scale_colour_manual(values=colorlist)
We can use dplyr to create an extra column in your data for the colour mapping, and then pipe the result into the ggplot() call to generate the plot.
library(dplyr)
library(ggplot2)
x %>% group_by(ID) %>%
mutate(col = value[variable == "Value2"]) %>% # Add column to map colours
ggplot(aes(x = variable, y = value, color = factor(col))) +
geom_line(aes(group =ID),size=1) + geom_point(size = 2) +
scale_colour_manual(values=colorlist)
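An alternative sketch keeps the new column numeric and lets a continuous scale interpolate between black and green, so the colours reflect the actual distances between the Value2 values rather than only their rank. The data frame lines_df and the column name value2 are illustrative names; the values are copied from the question.

```r
library(dplyr)
library(ggplot2)

# Same values as in the question, rebuilt as a plain data frame
lines_df <- data.frame(
  ID = rep(c("1998-06-05_area2", "1999-07-11_area2",
             "1998-05-13_area1", "1998-05-20_area1"), times = 3),
  variable = factor(rep(c("Value1", "Value2", "Value3"), each = 4)),
  value = c(322, 280, 210, 416, 384, 252, 329, 601, 83, 66, 100, 147)
)

coloured <- lines_df %>%
  group_by(ID) %>%
  # Every row of a line carries that line's y value at Value2
  mutate(value2 = value[variable == "Value2"]) %>%
  ungroup()

ggplot(coloured, aes(variable, value, colour = value2, group = ID)) +
  geom_line() +
  geom_point(size = 2) +
  # Continuous scale: lowest Value2 is black, highest is green
  scale_colour_gradient(low = "black", high = "green")
```

The trade-off is that the legend becomes a colour bar instead of one entry per line, so this variant suits cases where the lines do not need to be identified individually.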
