I have the following data on school enrollment for two years. I want to highlight data from school H in my plot and in the legend by giving it a different alpha.
library(tidyverse)
schools <- c("A","B","C","D","E",
"F","G","H","I","J")
yr2010 <- c(601,809,604,601,485,485,798,662,408,451)
yr2019 <- c(971,1056,1144,933,732,833,975,617,598,822)
data <- data.frame(schools,yr2010,yr2019)
I did some data management to get the data ready for plotting.
data2 <- data %>%
gather(key = "year", value = "students", 2:3)
data2a <- data2 %>%
filter(schools != "H")
data2b <- data2 %>%
filter(schools == "H")
Then I tried to graph the data using two separate geom_line plots, one for school H with default alpha and size=1.5, and one for the remaining schools with alpha=.3 and size=1.
ggplot(data2, aes(x=year,y=students,color=schools,group=schools)) +
theme_classic() +
geom_line(data = data2a, alpha=.3, size=1) +
scale_color_manual(values=c("red","orange","green","skyblue","aquamarine","purple",
"pink","brown","black")) +
geom_line(data = data2b, color="blue", size=1.5)
However, the school I want to highlight is not included in the legend. So I tried to include the color of school H in scale_color_manual instead of in the geom_line call.
ggplot(data2, aes(x=year,y=students,color=schools,group=schools)) +
theme_classic() +
geom_line(data = data2a, alpha=.3, size=1) +
scale_color_manual(values=c("red","orange","green","skyblue","aquamarine","purple",
"pink","blue","brown","black")) +
geom_line(data = data2b, size=1.5)
However, now the alphas in the legend are all the same, which doesn't highlight school H as much as I'd like.
How can I call the plot so that the legend matches the alpha of the line itself for all schools?
You need to map alpha and size inside aes(), just as you did with color. Then you can use scale_alpha_manual and scale_size_manual to set the values you need. This also removes the need to create data2a and data2b.
See the code below:
ggplot(data2, aes(x=year,y=students,color=schools,group=schools,
alpha=schools, size = schools)) +
theme_classic() +
geom_line() +
scale_color_manual(values=c("red","orange","green","skyblue","aquamarine","purple",
"pink","blue","brown","black")) +
scale_alpha_manual(values = c(0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3,NA, 0.3, 0.3)) +
#for the default alpha, you can write 1 or NA
scale_size_manual(values= c(1,1,1,1,1,1,1,1.5,1,1))
I hope it will be useful.
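The positional vectors passed to scale_alpha_manual above rely on the alphabetical order of the schools. As a minimal sketch (the names `highlight`, `alpha_vals` and `data_long` are my own and do not come from the answer), the values can instead be built as a named vector so that the highlighted school is picked by name. Only the alpha scale is shown, and the long-format data are rebuilt with base R to keep the block self-contained.

```r
library(ggplot2)

schools <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")
yr2010 <- c(601, 809, 604, 601, 485, 485, 798, 662, 408, 451)
yr2019 <- c(971, 1056, 1144, 933, 732, 833, 975, 617, 598, 822)

# Long format: one row per school and year
data_long <- data.frame(
  schools  = rep(schools, times = 2),
  year     = rep(c("yr2010", "yr2019"), each = length(schools)),
  students = c(yr2010, yr2019)
)

# Hypothetical helper values: full opacity for one school, 0.3 for the rest
highlight  <- "H"
alpha_vals <- setNames(ifelse(schools == highlight, 1, 0.3), schools)

# The plot object is built but not printed here
p <- ggplot(data_long, aes(x = year, y = students, group = schools,
                           color = schools, alpha = schools)) +
  theme_classic() +
  geom_line() +
  scale_alpha_manual(values = alpha_vals)
```

Because color and alpha are mapped to the same variable, ggplot2 should combine them into a single legend, so the legend keys carry the same alpha as the lines.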
I'm plotting correlations in ggpairs and am splitting the data based on a filter.
The density plots are normalising themselves on the number of data points in each filtered group. I would like them to normalise on the total number of data points in the entire data set. Essentially, I would like to be able to have the sum of the individual density plots be equal to the density plot of the entire dataset.
I know this probably breaks the definition of "density plot", but this is a presentation style I'd like to explore.
In plain ggplot, I can do this by adding y=..count.. to the aesthetic, but ggpairs doesn't accept x or y aesthetics.
Some sample code and plots:
library(tidyverse)
library(GGally)
set.seed(1234)
group = as.numeric(cut(runif(100),c(0,1/2,1),c(1,2)))
x = rnorm(100,group,1)
x[group == 1] = (x[group == 1])^2
y = (2 * x) + rnorm(100,0,0.1)
data = data.frame(group = as.factor(group), x = x, y = y)
#plot of everything
data %>%
ggplot(aes(x)) +
geom_density(color = "black", alpha = 0.7)
#the scaling I want
data %>%
ggplot(aes(x,y=..count.., fill=group)) +
geom_density(color = "black", alpha = 0.7)
#the scaling I get
data %>%
ggplot(aes(x, fill=group)) +
geom_density(color = "black", alpha = 0.7)
data %>% ggpairs(., columns = 2:3,
mapping = ggplot2::aes(colour=group),
lower = list(continuous = wrap("smooth", alpha = 0.5, size=1.0)),
diag = list(continuous = wrap("densityDiag", alpha=0.5 ))
)
Are there any suggestions that don't involve reformatting the entire dataset?
I am not sure I understand the question, but if the densities of both groups plus the density of the entire data set are to be plotted, it can easily be done as follows:
1. Get rid of the grouping aesthetic, in this case fill, for the overall density.
2. Add another call to geom_density, this time with inherit.aes = FALSE so that the previous aesthetics are not inherited.
3. Then plot the densities.
library(tidyverse)
data %>%
ggplot(aes(x, y=..count.., fill = group)) +
geom_density(color = "black", alpha = 0.7) +
geom_density(mapping = aes(x, y = ..count..),
inherit.aes = FALSE)
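To make the count scaling concrete: the count variable is the density estimate multiplied by the number of observations in the group. The base R sketch below (the helper `count_density` and the simulated data are my own assumptions) checks numerically that the per-group count curves add up to the count curve of the whole data set when every estimate shares one bandwidth and one evaluation grid.

```r
set.seed(1234)
group <- rep(c(1, 2), each = 50)
x <- rnorm(100, mean = group, sd = 1)

# Shared settings so that all estimates use the same kernel and grid
bw   <- bw.nrd0(x)
from <- min(x) - 3 * bw
to   <- max(x) + 3 * bw

# Hypothetical helper: kernel density times the number of observations
count_density <- function(v) {
  d <- density(v, bw = bw, from = from, to = to, n = 512)
  d$y * length(v)
}

by_group <- sapply(split(x, group), count_density)  # one column per group
overall  <- count_density(x)

# Largest pointwise difference between the summed groups and the whole
max_gap <- max(abs(rowSums(by_group) - overall))
```

Note that geom_density chooses a bandwidth per group by default, so in a plot the group curves would only approximately sum to the overall curve unless a common bandwidth is supplied.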
I have a data frame with several thousand rows of the form
group = c("gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr3","gr3","gr3","gr3","gr3","gr3","gr3","gr3","gr3","gr3")
pos = c(1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10)
color = c(2,2,2,2,3,3,2,2,3,2,1,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2,1,1,2,2)
df = data.frame(group, pos, color)
and would like to make a kind of heatmap in which one axis has a continuous scale (position). The color column is categorical. However, due to the large number of data points, I want to use binning, i.e. treat it as a continuous variable.
This is more or less what the plot should look like:
I can't think of a way to create such a plot using ggplot2/R. I have tried several geometries, e.g. geom_point():
ggplot(data=df, aes(x=group, y=pos, color=color)) +
geom_point() +
scale_colour_gradientn(colors=c("yellow", "black", "orange"))
Thanks for your help in advance.
Does this help you?
library(ggplot2)
group = c("gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr3","gr3","gr3","gr3","gr3","gr3","gr3","gr3","gr3","gr3")
pos = c(1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10)
color = c(2,2,2,2,3,3,2,2,3,2,1,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2,1,1,2,2)
df = data.frame(group, pos, color)
ggplot(data = df, aes(x = group, y = pos)) + geom_tile(aes(fill = color))
An improved version with a three-color gradient, if you like:
library(scales)
ggplot(data = df, aes(x = group, y = pos)) +
  geom_tile(aes(fill = color)) +
  scale_fill_gradientn(colours = c("orange", "black", "yellow"),
                       values = rescale(c(1, 2, 3)),
                       guide = "colorbar")
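The question also asks about binning positions when there are many rows. One possible approach, sketched here under my own assumptions (a fixed `bin_width` of 5 and the mean of the color codes as the summary, neither of which comes from the answer), is to aggregate before plotting and draw one tile per group and bin.

```r
library(ggplot2)

group <- rep(c("gr1", "gr2", "gr3"), each = 10)
pos   <- rep(1:10, times = 3)
color <- c(2,2,2,2,3,3,2,2,3,2,1,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2,1,1,2,2)
df <- data.frame(group, pos, color)

bin_width <- 5                          # assumed bin size, adjust to the data
df$bin <- (df$pos - 1) %/% bin_width    # 0-based bin index per position

# Mean color code per group and bin
binned <- aggregate(color ~ group + bin, data = df, FUN = mean)

# Centre of each bin on the position axis (3 and 8 for bins 1-5 and 6-10)
binned$bin_mid <- binned$bin * bin_width + (bin_width + 1) / 2

# The plot object is built but not printed here
p <- ggplot(binned, aes(x = group, y = bin_mid, fill = color)) +
  geom_tile(height = bin_width) +
  scale_fill_gradientn(colours = c("orange", "black", "yellow"))
```

Averaging treats the categorical codes as numbers, which only makes sense if intermediate values are meaningful for the data at hand.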
I have the following example.
require(ggplot2)
# Example Data
x <- data.frame(var1=rnorm(800,0,1),
var2=rnorm(800,0,1),
var3=rnorm(800,0,1),
type=factor(rep(c("x", "y"), length.out=800)),
set=factor(rep(c("A","B","C","D"), each=200))
)
Now, I would like to draw (thin) parallel coordinate plots of these data, with points at each of the variable values. I would like to overlay a boxplot (a different color for each method) on these parallel coordinate plots at the variable values. On top of this, I would like to facet by the groups and types, say using set ~ type. Is this possible using ggplot2?
Any suggestions? Thanks!
You need to put data in long format first. I didn't put in points, since the graph is already cluttered enough, but you can do so by adding a geom_point.
require(tidyr)
x$id <- 1:nrow(x)
x2 <- gather(x, var, value, var1:var3)
Boxplots
ggplot(x2, aes(var, value)) +
geom_line(aes(group = id), size = 0.05, alpha = 0.3) +
geom_boxplot(aes(fill = var), alpha = 0.5) +
facet_grid(set ~ type) +
theme_bw()
Or perhaps violins
Replacing the boxplots with violins looks pretty cool as well.
ggplot(x2, aes(var, value)) +
geom_line(aes(group = id), size = 0.05, alpha = 0.3) +
geom_violin(aes(fill = var), col = NA, alpha = 0.6) +
facet_grid(set ~ type) +
theme_bw()
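The answer mentions that points can be added with geom_point. A minimal sketch of that variant follows. It uses a smaller simulated data set and a base R reshape in place of gather, both of which are my own choices to keep the block self-contained.

```r
library(ggplot2)

set.seed(42)
n <- 80  # smaller than the 800 rows in the question, for illustration
x <- data.frame(var1 = rnorm(n), var2 = rnorm(n), var3 = rnorm(n),
                type = factor(rep(c("x", "y"), length.out = n)),
                set  = factor(rep(c("A", "B", "C", "D"), each = n / 4)))
x$id <- seq_len(nrow(x))

# Long format: stack var1..var3 and repeat the identifying columns
vars <- c("var1", "var2", "var3")
x2 <- data.frame(
  id    = rep(x$id,   times = length(vars)),
  type  = rep(x$type, times = length(vars)),
  set   = rep(x$set,  times = length(vars)),
  var   = rep(vars, each = nrow(x)),
  value = unlist(x[vars], use.names = FALSE)
)

# Lines per observation, then points, then boxplots on top
p <- ggplot(x2, aes(var, value)) +
  geom_line(aes(group = id), alpha = 0.3) +
  geom_point(size = 0.5, alpha = 0.5) +
  geom_boxplot(aes(fill = var), alpha = 0.5) +
  facet_grid(set ~ type) +
  theme_bw()
```

Layers are drawn in the order they are added, so the boxplots sit above the points and lines.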
I have a melted data set that also includes data generated from a normal distribution. I want to plot the empirical density function of my data against the normal distribution, but the scales of the two resulting density plots are different. I found this post, which deals with two separate data sets:
Normalising the x scales of overlaying density plots in ggplot
but I couldn't figure out how to apply it to melted data. Suppose I have a data frame like this:
library(reshape2) # provides melt()
library(ggplot2)
df<-data.frame(type=rep(c('A','B'),each=100),x=rnorm(200,1,2)/10,y=rnorm(200))
df.m<-melt(df)
using the code below:
qplot(value,data=df.m,col=variable,geom='density',facets=~type)
produces this graph:
How can I make the two densities comparable given the fact that normal distribution is the reference plot? (I prefer to use qplot instead of ggplot)
UPDATE:
I want to produce something like this (i.e. in terms of plot-comparison) but with ggplot2:
plot(density(rnorm(200,1,2)/10),col='red',main=NA) #my data
par(new=T)
plot(density(rnorm(200)),axes=F,main=NA,xlab=NA,ylab=NA) # reference data
which generates this:
Is this what you had in mind?
There's a built-in computed variable, ..scaled.., that does this automatically: it rescales each density estimate to a maximum of 1.
library(reshape2) # provides melt()
library(ggplot2)
set.seed(1)
df<-data.frame(type=rep(c('A','B'),each=100),x=rnorm(200,1,2)/10,y=rnorm(200))
df.m<-melt(df)
ggplot(df.m) +
stat_density(aes(x=value, y=..scaled..,color=variable), position="dodge", geom="line")
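To illustrate what the scaled variable represents, here is a base R sketch (the helper name `scaled_density` is my own): each kernel density estimate is divided by its own maximum, so every curve peaks at 1 regardless of how spread out the data are.

```r
set.seed(1)
x <- rnorm(200, 1, 2) / 10   # narrow distribution, tall density
y <- rnorm(200)              # standard normal reference

# Hypothetical helper: kernel density rescaled to a maximum of 1
scaled_density <- function(v) {
  d <- density(v)
  data.frame(value = d$x, scaled = d$y / max(d$y))
}

sx <- scaled_density(x)
sy <- scaled_density(y)
```

After this rescaling the curves no longer integrate to 1, so they are comparable in shape and location but not as probabilities.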
library(reshape2) # provides melt()
library(ggplot2)
df<-data.frame(type=rep(c('A','B'),each=100),x = rnorm(200,1,2)/10, y = rnorm(200))
df.m<-melt(df)
require(data.table)
DT <- data.table(df.m)
Add a new column to DT that holds the value standardized within each variable, then plot.
The code for the plots:
DT <- DT[, scaled := scale(value), by = "variable"]
str(DT)
ggplot(DT) +
geom_density(aes(x = scaled, color = variable)) +
facet_grid(. ~ type)
qplot(data = DT, x = scaled, color = variable,
facets = ~ type, geom = "density")
# Using fill (inside aes) and alpha outside (so you don't get a legend for it)
ggplot(DT) +
geom_density(aes(x = scaled, fill = variable), alpha = 0.2) +
facet_grid(. ~ type)
qplot(data = DT, x = scaled, fill = variable, geom = "density", alpha = I(0.2), facets = ~type) # I() sets alpha as a constant instead of mapping it
# Histogram
ggplot(DT, aes(x = scaled, fill = variable)) +
geom_histogram(binwidth=.2, alpha=.5, position="identity") +
facet_grid(. ~ type, scales = "free")
qplot(data = DT, x = scaled, fill = variable, alpha = I(0.2), facets = ~type) # I() sets alpha as a constant instead of mapping it
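For readers without data.table, the per-variable standardization can be sketched in base R. The helper `standardize` and the simulated long-format data are my own assumptions. By default scale() subtracts the mean and divides by the standard deviation, which the helper reproduces for a plain vector.

```r
set.seed(1)
# Long-format stand-in for the melted data frame
df.m <- data.frame(
  variable = rep(c("x", "y"), each = 200),
  value    = c(rnorm(200, 1, 2) / 10, rnorm(200))
)

# Hypothetical helper: centre to mean 0 and scale to standard deviation 1
standardize <- function(v) (v - mean(v)) / sd(v)

# Apply the helper within each level of `variable`
df.m$scaled <- ave(df.m$value, df.m$variable, FUN = standardize)

group_means <- tapply(df.m$scaled, df.m$variable, mean)
group_sds   <- tapply(df.m$scaled, df.m$variable, sd)
```

Unlike the scaled computed variable, which changes only the height of each curve, this approach changes the x axis, so both variables end up with mean 0 and standard deviation 1.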