Jitter categorical variable and keep variable name in graph - r

I had to jitter points along a categorical axis to avoid overplotting. To do this, I converted my categorical variable to a factor and then to numeric, but the plot now shows a numeric axis without the category labels. Is there a way I can get the labels to show up?
Here is the code:
levels(factor(All_VARs$Dataset))
[1] "Data1" "Data2" "Data3"
df$Dataset_jit <- jitter(as.numeric(factor(df$Dataset)))
ggplot(df, aes(x = POS_start, y = Dataset_jit, color = Type)) +
geom_point() +
scale_color_manual(values = annotation_color_associations) +
theme_classic()
I would like the y axis to be categorical, while maintaining the jitter.

You can use position = position_jitter():
ggplot(df, aes(x = POS_start, y = as.factor(Set), color = as.factor(Type))) +
geom_point(position = position_jitter(height = 0.2), show.legend = FALSE) +
theme_classic() +
scale_color_manual(values = colorRampPalette(c("pink", "purple"))(5)) +
labs(x = "CDS Position", y = "Dataset")
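A minimal sketch of the same idea, assuming you want the jitter to be reproducible and confined to the categorical axis. The data frame here is made up for illustration; `width = 0` and `seed = 1` are choices of this sketch, not part of the original answer.

```r
library(ggplot2)

# Hypothetical sample data: positions along x, three datasets along y
set.seed(3)
df <- data.frame(
  POS_start = round(runif(100, 1, 1500)),
  Set = sample(1:3, 100, replace = TRUE)
)

# width = 0 leaves the x positions untouched, height = 0.2 spreads the
# points around each category, and seed makes the jitter repeatable
p <- ggplot(df, aes(x = POS_start, y = factor(Set))) +
  geom_point(position = position_jitter(width = 0, height = 0.2, seed = 1)) +
  theme_classic() +
  labs(x = "CDS Position", y = "Dataset")
```

Because the y aesthetic stays a factor, the axis keeps its category labels without any manual work.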
Edit:
OP says they need to be able to do other things, so another approach is to manually control the y-axis with scale_y_continuous:
df$Dataset_jit <- jitter(as.numeric(factor(df$Set)))
ggplot(df, aes(x = POS_start, y = Dataset_jit, color = as.factor(Type))) +
geom_point(show.legend = FALSE) +
theme_classic() +
scale_color_manual(values = colorRampPalette(c("pink", "purple"))(5)) +
scale_y_continuous(breaks = 1:3, labels = c("Data 1", "Data 2", "Data 3")) +
labs(x = "CDS Position", y = "Dataset")
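If hardcoding the breaks and labels feels fragile, they can probably be derived from the factor itself. This is a sketch under the assumption that the category column holds the names you want on the axis; `set_factor` and the sample data are hypothetical.

```r
library(ggplot2)

# Hypothetical sample data with character dataset names
set.seed(3)
df <- data.frame(
  POS_start = round(runif(100, 1, 1500)),
  Set = sample(c("Data1", "Data2", "Data3"), 100, replace = TRUE)
)

# Keep the factor so its levels can be reused for the axis labels
set_factor <- factor(df$Set)

# amount = 0.2 bounds the jitter to +/- 0.2 around each integer code
df$Dataset_jit <- jitter(as.numeric(set_factor), amount = 0.2)

p <- ggplot(df, aes(x = POS_start, y = Dataset_jit)) +
  geom_point() +
  theme_classic() +
  scale_y_continuous(
    breaks = seq_along(levels(set_factor)),  # 1, 2, 3, one per level
    labels = levels(set_factor)              # the original category names
  ) +
  labs(x = "CDS Position", y = "Dataset")
```

The breaks then stay in sync with the data if a category is added or renamed.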
Sample Data
set.seed(3)
df <- data.frame(POS_start = round(runif(n = 100,1,1500),0),
Set = sample(1:3,100, prob = c(0.45,0.1,0.45), replace = TRUE),
Type = sample(1:5,100,replace = TRUE))

Related

ggplot2: Keep x-axis labels in non-alphabetical order

I'm using ggplot2 to plot the annual occurrence of events in states. I want the state labels to be in the same order as shown in the data table "AZ CT NH NM DE..." but ggplot automatically reorganizes the state labels in alphabetical order "AZ CT DE NH...". I created groups so I could display ranges in "num" values (ex. NM and TN). Please ignore the group numbering--I took out some data points to make the table smaller.
ggplot(guidelines, aes(x = state, y = num, group = grp)) +
geom_point() + geom_line(linetype = "dotted") +
labs(x = "State", y = "Number") +
labs(title = "A") +
scale_y_continuous(breaks = seq(0, 11, 1),
limits=c(0,11))
I have tried the suggestions of previous posts to use factor and levels like so:
guidelines$state <- factor(guidelines$state, levels = unique(guidelines$state))
But it does not work because I am using groups and repeating state names. Any ideas on how to get around this?
We can use ordered():
library(dplyr)
library(ggplot2)
guidelines %>%
mutate(state =ordered(state, levels = unique(state))) %>%
ggplot(aes(x = state, y = num, group = grp)) +
geom_point() +
geom_line(linetype = "dotted") +
labs(x = "State", y = "Number") +
labs(title = "A") +
scale_y_continuous(breaks = seq(0, 11, 1),
limits=c(0,11))
Try this. You were close: unique() is the right tool for setting the levels in order of first appearance. Adding ordered = TRUE inside factor() additionally marks the factor as ordered. Here is the code (next time, please share your data using dput(), since it can be hard to reproduce data from screenshots, especially when they are large):
library(ggplot2)
#Data
guidelines <- data.frame(state=c('AZ','CT','NH','NM','NM','DE','NJ','TN','TN'),
num=c(10,10,10,5,10,5,5,2,5),
grp=c(3,4,17,19,19,5,18,25,25),stringsAsFactors = F)
#Format factor
guidelines$state <- factor(guidelines$state,levels = unique(guidelines$state),ordered = T)
#Plot
ggplot(guidelines, aes(x = state, y = num, group = grp)) +
geom_point() + geom_line(linetype = "dotted") +
labs(x = "State", y = "Number") +
labs(title = "A") +
scale_y_continuous(breaks = seq(0, 11, 1),
limits=c(0,11))
Or, as mentioned in the comments by @TTS, you can use scale_x_discrete() with the limits option:
#Data
guidelines <- data.frame(state=c('AZ','CT','NH','NM','NM','DE','NJ','TN','TN'),
num=c(10,10,10,5,10,5,5,2,5),
grp=c(3,4,17,19,19,5,18,25,25),stringsAsFactors = F)
#Plot 2
ggplot(guidelines, aes(x = state, y = num, group = grp)) +
geom_point() + geom_line(linetype = "dotted") +
labs(x = "State", y = "Number") +
labs(title = "A") +
scale_y_continuous(breaks = seq(0, 11, 1),
limits=c(0,11))+
scale_x_discrete(limits=unique(guidelines$state))
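A quick check, using the state vector from the answers above, suggests why repeated values are not a problem: `unique()` keeps each value once, in order of first appearance, so it can serve as the level order or as the axis limits.

```r
state <- c("AZ", "CT", "NH", "NM", "NM", "DE", "NJ", "TN", "TN")

# unique() drops the repeats but preserves first-appearance order
state_levels <- unique(state)

# Without explicit levels, factor() would sort the levels alphabetically
state_factor <- factor(state, levels = state_levels)

levels(state_factor)
# "AZ" "CT" "NH" "NM" "DE" "NJ" "TN"
```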

adding a label in geom_line in R

I have two very similar plots, which have two y-axis - a bar plot and a line plot:
code:
sec_plot <- ggplot(data, aes(x = year, group = 1)) +
geom_col(aes(y = frequency), fill = "orange", alpha = 0.5) +
geom_line(aes(y = severity))
However, there are no labels. I want to get a label for the barplot as well as a label for the line plot, something like:
How can I add the labels to the plot if there is only one single group? Is there a way to specify this manually? Until now I have only found options where the labels are added by specifying them in the aes.
EXTENSION (added a posteriori):
getSecPlot <- function(data, xvar, yvar, yvarsec, groupvar){
if ("agegroup" %in% xvar) xvar <- get("agegroup")
# data <- data[, startYear:= as.numeric(startYear)]
data <- data[!claims == 0][, ':=' (scaled = get(yvarsec) * max(get(yvar))/max(get(yvarsec)),
param = max(get(yvar))/max(get(yvarsec)))]
param <- data[1, param] # important, otherwise not found in ggplot
sec_plot <- ggplot(data, aes_string (x = xvar, group = groupvar)) +
geom_col(aes_string(y = yvar, fill = groupvar, alpha = 0.5), position = "dodge") +
geom_line(aes(y = scaled, color = gender)) +
scale_y_continuous(sec.axis = sec_axis(~./(param), name = paste0("average ", yvarsec),labels = function(x) format(x, big.mark = " ", scientific = FALSE))) +
labs(y = paste0("total ", yvar)) +
scale_alpha(guide = 'none') +
theme_pubclean() +
theme(legend.title=element_blank(), legend.background = element_rect(fill = "white"))
}
plot.ExposureYearly <- getSecPlot(freqSevDataAge, xvar = "agegroup", yvar = "exposure", yvarsec = "frequency", groupvar = "gender")
plot.ExposureYearly
How can the same be done on a plot where both the line plot as well as the bar plot are separated by gender?
Here is a possible solution. The method I used was to move the color and fill inside the aes and then use scale_*_identity to create and format the legends.
Also, I needed to rescale severity onto the frequency axis, because ggplot2's secondary axis is only a one-to-one transformation of the primary axis, not an independent scale.
data<-data.frame(year= 2000:2005, frequency=3:8, severity=as.integer(runif(6, 4000, 8000)))
library(ggplot2)
library(scales)
sec_plot <- ggplot(data, aes(x = year)) +
geom_col(aes(y = frequency, fill = "orange"), alpha = 0.6) +
geom_line(aes(y = severity/1000, color = "black")) +
scale_fill_identity(guide = "legend", label="Claim frequency (Number of paid claims per 100 Insured exposure)", name=NULL) +
scale_color_identity(guide = "legend", label="Claim Severity (Average insurance payment per claim)", name=NULL) +
theme(legend.position = "bottom") +
scale_y_continuous(sec.axis =sec_axis( ~ . *1, labels = label_dollar(scale=1000), name="Severity") ) + #formats the 2nd axis
guides(fill = guide_legend(order = 1), color = guide_legend(order = 2)) #control which scale plots first
sec_plot
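The fixed factor of 1000 above could also be computed from the data, similar to what the `param` value in the question's extension does. This is a sketch under that assumption; `ratio` is a name introduced here for illustration, and the seed is only there to make the random data repeatable.

```r
library(ggplot2)

set.seed(1)
data <- data.frame(
  year = 2000:2005,
  frequency = 3:8,
  severity = as.integer(runif(6, 4000, 8000))
)

# Scale factor that maps the largest severity onto the largest frequency
ratio <- max(data$severity) / max(data$frequency)

p <- ggplot(data, aes(x = year)) +
  geom_col(aes(y = frequency), fill = "orange", alpha = 0.6) +
  # Draw severity in frequency units so both layers share the primary axis
  geom_line(aes(y = severity / ratio)) +
  scale_y_continuous(
    name = "Frequency",
    # Undo the scaling on the secondary axis so it reads in severity units
    sec.axis = sec_axis(~ . * ratio, name = "Severity")
  )
```

The division in the line layer and the multiplication in `sec_axis()` must be exact inverses, otherwise the secondary axis will mislabel the line.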

Why do two legends appear when manually editing in ggplot2?

I want to plot two lines, one solid and another one dotted, both with different colors. I'm having trouble dealing with the legends for this plot. Take this example:
library(ggplot2)
library(reshape2)
df = data.frame(time = 0:127,
mean_clustered = rnorm(128),
mean_true = rnorm(128)
)
test_data_long <- melt(df, id="time") # convert to long format
p = ggplot(data=test_data_long,
aes(x=time, y=value, colour=variable)) +
geom_line(aes(linetype=variable)) +
labs(title = "", x = "Muestras", y = "Amplitud", color = "Spike promedio\n") +
scale_color_manual(labels = c("Hallado", "Real"), values = c("blue", "red")) +
xlim(0, 127)
print(p)
Two legends appear, and on top of it, none of them is correct (the one with the right colors has wrong line styles, and the one with the right line styles has all other things wrong).
Why is this happening and how can I get the right legend to appear?
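The two legends appear because the colour scale was given a custom name and labels while the linetype scale kept its defaults; ggplot2 merges legends only when their names and labels match. Give both scales the same name and labels: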
library(ggplot2)
library(reshape2)
data.frame(
time = 0:127,
mean_clustered = rnorm(128),
mean_true = rnorm(128)
) -> xdf
test_data_long <- melt(xdf, id = "time")
ggplot(
data = test_data_long,
aes(x = time, y = value, colour = variable)
) +
geom_line(aes(linetype = variable)) +
scale_color_manual(
name = "Spike promedio\n", labels = c("Hallado", "Real"), values = c("blue", "red")
) +
scale_linetype(
name = "Spike promedio\n", labels = c("Hallado", "Real")
) +
labs(
x = "Muestras", y = "Amplitud", title = ""
) +
xlim(0, 127)
Might I suggest also using theme parameters to adjust the legend title:
ggplot(data = test_data_long, aes(x = time, y = value, colour = variable)) +
geom_line(aes(linetype = variable)) +
scale_x_continuous(name = "Muestras", limits = c(0, 127)) +
scale_y_continuous(name = "Amplitud") +
scale_color_manual(name = "Spike promedio", labels = c("Hallado", "Real"), values = c("blue", "red")) +
scale_linetype(name = "Spike promedio", labels = c("Hallado", "Real")) +
labs(title = "") +
theme(legend.title = element_text(margin = margin(b=15)))
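Since the question asks for one solid and one dotted line specifically, a variant with `scale_linetype_manual()` may be closer to the goal. This is a sketch, not the answerer's code: the long-format data is built by hand instead of with `melt()`, and `legend_name` and `legend_labels` are helper names introduced here so both scales are guaranteed to match.

```r
library(ggplot2)

set.seed(42)
long <- data.frame(
  time = rep(0:127, times = 2),
  variable = rep(c("mean_clustered", "mean_true"), each = 128),
  value = rnorm(256)
)

# Shared name and labels so the two legends merge into one
legend_name <- "Spike promedio"
legend_labels <- c("Hallado", "Real")

p <- ggplot(long, aes(x = time, y = value,
                      colour = variable, linetype = variable)) +
  geom_line() +
  scale_color_manual(name = legend_name, labels = legend_labels,
                     values = c("blue", "red")) +
  scale_linetype_manual(name = legend_name, labels = legend_labels,
                        values = c("solid", "dotted")) +
  labs(x = "Muestras", y = "Amplitud")
```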

Set specific color in ggplot2 using scale_fill

I am visualizing missing data in R using this method which uses ggplot2:
library(magrittr) # provides %>%, which ggplot2 and reshape2 do not export
library(reshape2)
library(ggplot2)
ggplot_missing <- function(x){
x %>%
is.na %>%
melt %>%
ggplot(data = .,
aes(x = Var2,
y = Var1)) +
geom_raster(aes(fill = value)) +
scale_fill_grey(name = "", labels = c("Present","Missing")) +
theme_minimal() +
theme(axis.text.x = element_text(angle=45, vjust=0.5)) +
labs(x = "Columns / Attributes",
y = "Rows / Observations")
}
The scale_fill_grey method uses black and grey. How can I change the color of the cells to a specific color, say "red"?
I have tried:
scale_fill_brewer(name = "", labels = c("Present","Missing"), na.val="red")
Also,
scale_fill_gradient(name = "", labels = c("Present","Missing"), low = "#FF69B4", high = "#FF0000")
But I get the error:
Error: Discrete value supplied to continuous scale
I got it to work by replacing scale_fill_grey with the following:
scale_fill_manual(name = "", values = c('my_color_1', 'my_color_2'), labels = c("Present","Missing")) +
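The error comes from `scale_fill_gradient()` being a continuous scale, while the fill value produced by `is.na()` is logical and therefore discrete. A sketch of the manual-scale fix with concrete colours follows; the small data frame and the colour choices are made up for illustration, and the long format is built in base R rather than with `melt()`.

```r
library(ggplot2)

# Hypothetical data with two missing cells
x <- data.frame(a = c(1, NA, 3), b = c(NA, "y", "z"))

# is.na() on a data frame returns a logical matrix
miss <- is.na(x)

# One row per cell: row index, column name, and whether it is missing
long <- data.frame(
  row = as.vector(row(miss)),
  column = colnames(miss)[as.vector(col(miss))],
  missing = as.vector(miss)
)

p <- ggplot(long, aes(x = column, y = row)) +
  geom_raster(aes(fill = missing)) +
  # Named values tie each colour to a logical level explicitly
  scale_fill_manual(
    name = "",
    values = c("FALSE" = "grey80", "TRUE" = "red"),
    labels = c("Present", "Missing")
  ) +
  theme_minimal()
```

Naming the values avoids depending on the order in which the levels happen to appear.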

ggplot2 legend with two different geom_point

I have the following ggplot graph with circles representing the observed data and the crosses the mean for each treatment :
d <- data.frame(Number = rnorm(12,100,20),
Treatment = rep(c("A","B","C", "D"), each = 3))
av <- aggregate(d["Number"], d["Treatment"], mean)
ggplot(data = d, aes(y = Number, x = Treatment)) +
geom_point(shape = 1, size = 6, color = "grey50") +
geom_point(data=av, shape = 4) +
theme_bw()
I would like to add a legend with the exact same symbols on top of the graph, but I'm a bit lost. I use aes to force the creation of a legend and then try to modify it with manual scales, but the result is not convincing. I would like to have one grey circle of size 6. That also sounds quite complicated for such a basic thing. There is probably an easier solution.
ggplot(data = d, aes(y = Number, x = Treatment)) +
geom_point(aes(shape = "1", size = "6", color = "grey50")) +
geom_point(data=av, aes(shape = "4")) +
theme_bw() +
scale_shape_manual(name = "", values = c(1,4), labels = c("observed values", "mean")) +
scale_size_manual(name = "", values = c(6,1), labels = c("observed values", "mean")) +
scale_color_manual(name = "", values = c("grey50","black"),
labels = c("observed values", "mean")) +
theme(legend.position = "top",
legend.key = element_rect(color = NA))
http://imagizer.imageshack.us/v2/320x240q90/842/4pgj.png
The ggplot2 way would be combining everything into a single data.frame like this:
av$Aggregated <- "mean"
d$Aggregated <- "observed value"
d <- rbind(d, av)
ggplot(data = d, aes(y = Number, x = Treatment,
shape=Aggregated, size=Aggregated, colour=Aggregated)) +
geom_point()
And then customize using manual scales and themes.
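One way the customization step might look is sketched below. The specific shape, size and colour values are taken from the question's plot where possible (grey circle of size 6, black cross); the size of 2 for the mean and the name `combined` are assumptions of this sketch.

```r
library(ggplot2)

set.seed(1)
d <- data.frame(
  Number = rnorm(12, 100, 20),
  Treatment = rep(c("A", "B", "C", "D"), each = 3)
)
av <- aggregate(d["Number"], d["Treatment"], mean)

# Tag each row with its role, then stack (rbind matches columns by name)
d$Aggregated <- "observed value"
av$Aggregated <- "mean"
combined <- rbind(d, av)

p <- ggplot(combined, aes(x = Treatment, y = Number, shape = Aggregated,
                          size = Aggregated, colour = Aggregated)) +
  geom_point() +
  # The same name on all three scales lets ggplot2 merge them into one legend
  scale_shape_manual(name = NULL, values = c("observed value" = 1, "mean" = 4)) +
  scale_size_manual(name = NULL, values = c("observed value" = 6, "mean" = 2)) +
  scale_colour_manual(name = NULL,
                      values = c("observed value" = "grey50", "mean" = "black")) +
  theme_bw() +
  theme(legend.position = "top")
```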

Resources