I am trying to add a direct label to each curve in my density plot, similar to the labels on the plots in this tutorial, but I can't get it to work.
Here is my data frame:
values <- runif(1200, 35, 60)
ind <- as.factor(rep(c(1:6), each=200))
inout <- as.factor(rep(c(1:2), each =600))
df <- data.frame(values,ind,inout)
Here is the density plot:
ggplot(df) +
geom_density(aes(x=values, group=interaction(ind,inout), colour=factor(inout)), alpha=1) +
geom_density(aes(x=values, group=inout, fill=factor(inout)), alpha=.4) +
theme(text = element_text(size=25)) +
theme(legend.justification=c(1,1), legend.position=c(1,1)) +
guides(colour="none") +
scale_fill_discrete(name="Ave.",breaks=c("1", "2"),labels=c("S1", "S2"))
How can I add direct labels (i.e., 1 to 6) to each curve for two groups (i.e., S1 and S2)? Two averaged curves don't need to be labeled.
Thanks a lot.
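One possible approach, offered only as a sketch: compute each curve's density yourself with density(), take the peak of each curve as the label position, and pass those positions to geom_text(). The helper name label_pos is hypothetical, and the sketch assumes the default bandwidth is used both here and in geom_density(), so that the peaks line up with the drawn curves.

```r
# Rebuild the example data (seed added only to make the sketch reproducible)
set.seed(1)
values <- runif(1200, 35, 60)
ind    <- as.factor(rep(1:6, each = 200))
inout  <- as.factor(rep(1:2, each = 600))
df     <- data.frame(values, ind, inout)

# For each of the six curves, find the location and height of the density peak
label_pos <- do.call(rbind, lapply(split(df, df$ind), function(d) {
  dens <- density(d$values)          # default bandwidth ("nrd0")
  i    <- which.max(dens$y)          # index of the peak
  data.frame(ind = d$ind[1], inout = d$inout[1],
             x = dens$x[i], y = dens$y[i])
}))

# The positions can then be added to the existing plot, for example:
# ... + geom_text(data = label_pos,
#                 aes(x = x, y = y, label = ind, colour = inout),
#                 vjust = -0.5)
```

Labelling at the peak is just one choice; any other point on each curve (for example the right-most point) can be picked from dens$x and dens$y in the same way.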
Related
Currently, I'm using ggplot2 to make a density plot.
ggplot(data=resultFile,aes(x=V19, colour=V1) ) +
geom_line(stat="density") +
xlab("score") +
ylab("density") +
ggtitle(paste(data_name,protocol,level,sep=" ")) +
theme(legend.title=element_blank(), legend.position=c(0.92,0.9)) +
scale_color_manual(values=c("blue","red"),
labels=c("A", "B"))
Using this code, I get the plot below.
However, I get a different plot if I use the plot(density(...)) function in R.
There, the y value starts from 0.
How can I make the ggplot2 plot look like the one from plot(density(...)) in R?
ggplot(data=resultFile,aes(x=V19, colour=V1) ) +
ylim(0,range) + # you can use this; replace range with the upper y limit you want
geom_line(stat="density") +
xlab("score") +
ylab("density") +
ggtitle(paste(data_name,protocol,level,sep=" ")) +
theme(legend.title=element_blank(), legend.position=c(0.92,0.9)) +
scale_color_manual(values=c("blue","red"),
labels=c("A", "B"))
ggplot cuts off the x-axis at the minimum and maximum of the empirical distribution. You can extend the x-axis by adding xlim to the plot, but make sure the plot does not extend beyond the theoretical limits of the distribution (in the example below the theoretical range is [0, 1], so there is little reason to show anything outside it).
set.seed(1)
temp <- data.frame(x =runif(100)^3)
library(ggplot2)
ggplot(temp, aes(x = x)) + geom_line(stat = "density") + xlim(-.2, 1.2)
plot(density(temp$x))
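To see where the difference between the two plots comes from, it can help to look at the grid on which base R's density() evaluates the estimate. The short sketch below uses only base R; by default density() extends the grid by cut = 3 bandwidths beyond the extremes of the data, which is why the curve from plot(density(...)) reaches down towards 0 at both ends.

```r
set.seed(1)
x <- runif(100)^3

d <- density(x)      # default cut = 3

range(x)             # range of the data itself
range(d$x)           # range of the evaluation grid, wider than the data

# The grid starts 3 bandwidths below the minimum and ends 3 above the maximum
lower_matches <- isTRUE(all.equal(min(d$x), min(x) - 3 * d$bw))
upper_matches <- isTRUE(all.equal(max(d$x), max(x) + 3 * d$bw))
```

Choosing xlim values close to range(d$x) should therefore give a ggplot2 curve that covers roughly the same region as the base plot.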
I want to create a (time-series) plot out of 40 million data points in order to show two regression lines with two specific events on each of it (first occurrence of an optimum in time-series).
Currently, I draw the regression lines and add a geom_vline to it to indicate the event.
As I want to be independent from colours in the plot, it would be beneficial if I could just plot the marker geom_vline as a point on the regression line.
Do you have any idea how to solve this using ggplot2?
My current approach is the following (the real data points are replaced with test data):
library(ggplot2)
# Generate data
m1 <- "method 1"
m2 <- "method 2"
data1 <- data.frame(Time=seq(100), Value=sample(1000, size=100), Type=rep(as.factor(m1), 100))
data2 <- data.frame(Time=seq(100), Value=sample(1000, size=100), Type=rep(as.factor(m2), 100))
df <- rbind(data1, data2)
rm(data1, data2)
# Calculate first minima for each Type
m1_intercept <- df[which(df$Type == m1), ][which.min(df[which(df$Type == m1), ]$Value),]
m2_intercept <- df[which(df$Type == m2), ][which.min(df[which(df$Type == m2), ]$Value),]
# Plot regression and vertical lines
p1 <- ggplot(df, aes(x=Time, y=Value, group=Type, colour=Type), linetype=Type) +
geom_smooth(se=F) +
geom_vline(aes(xintercept=m1_intercept$Time, linetype=m1_intercept$Type)) +
geom_vline(aes(xintercept=m2_intercept$Time, linetype=m2_intercept$Type)) +
scale_linetype_manual(name="", values=c("dotted", "dashed")) +
guides(colour=guide_legend(title="Regression"), linetype=guide_legend(title="First occurrence of optimum")) +
theme(legend.position="bottom")
ggsave("regression.png", plot=p1, height=5, width=7)
which generates this plot:
My desired plot would be something like this:
So my questions are
Does it make sense to indicate a minimum value on a regression line? The marker's y-axis position would in fact be wrong; it would only indicate the time point.
If yes, how can I achieve such a behaviour?
If no, what would you think could be better?
Thank you very much in advance!
Robin
If you first run your ggplot() call with only geom_smooth(), you can access plotted values through ggplot_build(), which we then can use to plot points on the two fitted lines. Example:
# Create initial plot
p1<-ggplot(df, aes(x=Time, y=Value, colour=Type)) +
geom_smooth(se=F)
# Now we can access the fitted values
smooths <- ggplot_build(p1)$data[[1]]
smooths_1 <- smooths[smooths$group==1,] # First group (method 1)
smooths_2 <- smooths[smooths$group==2,] # Second group (method 2)
# Then we find the closest plotted values to the minima
smooth_1_x <- smooths_1$x[which.min(abs(smooths_1$x - m1_intercept$Time))]
smooth_2_x <- smooths_2$x[which.min(abs(smooths_2$x - m2_intercept$Time))]
# Subset the previously defined datasets for respective closest values
point_data1 <- smooths_1[smooths_1$x==smooth_1_x,]
point_data2 <- smooths_2[smooths_2$x==smooth_2_x,]
Now we use point_data1 and point_data2 to place the points on your plot:
ggplot(df, aes(x=Time, y=Value, colour=Type)) +
geom_smooth(se=F) +
geom_point(data=point_data1, aes(x=x, y=y), colour = "red",size = 5) +
geom_point(data=point_data2, aes(x=x, y=y), colour = "red", size = 5)
To reproduce this plot, you can use set.seed(42) for your data generation step.
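As a rough alternative to reading the values back out of ggplot_build(), the smoother can be fitted directly and evaluated at the time of the minimum. This is only a sketch under an assumption: for small data sets such as this test data, geom_smooth() defaults to a loess fit, so loess() with its default settings is used here. With the full 40 million points ggplot2 would pick a different smoother, and the fit below would need to be changed to match. The name marker_points is hypothetical.

```r
# Test data in the same shape as in the question
set.seed(42)
df <- data.frame(
  Time  = rep(seq(100), 2),
  Value = c(sample(1000, size = 100), sample(1000, size = 100)),
  Type  = factor(rep(c("method 1", "method 2"), each = 100))
)

# For each Type: locate the first minimum, then evaluate a loess fit at that time
marker_points <- do.call(rbind, lapply(split(df, df$Type), function(d) {
  t_min <- d$Time[which.min(d$Value)]      # which.min() returns the first minimum
  fit   <- loess(Value ~ Time, data = d)   # default span and degree
  data.frame(Type  = d$Type[1],
             Time  = t_min,
             Value = predict(fit, newdata = data.frame(Time = t_min)))
}))

# The markers can then be drawn on the smoothed lines, for example:
# ggplot(df, aes(x = Time, y = Value, colour = Type)) +
#   geom_smooth(se = FALSE) +
#   geom_point(data = marker_points, aes(shape = Type), size = 5)
```

Mapping shape to Type in the geom_point() layer keeps the two markers distinguishable without relying on colour.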
I have scatterplots of 2D data from two categories. I want to add density lines for each dimension -- not outside the plot (cf. Scatterplot with marginal histograms in ggplot2) but right on the plotting surface. I can get this for the x-axis dimension, like this:
set.seed(123)
dim1 <- c(rnorm(100, mean=1), rnorm(100, mean=4))
dim2 <- rnorm(200, mean=1)
cat <- factor(c(rep("a", 100), rep("b", 100)))
mydf <- data.frame(cbind(dim2, dim1, cat))
ggplot(data=mydf, aes(x=dim1, y=dim2, colour=as.factor(cat))) +
geom_point() +
stat_density(aes(x=dim1, y=(-2+(..scaled..))),
position="identity", geom="line")
It looks like this:
But I want an analogous pair of density curves running vertically, showing the distribution of points in the y-dimension. I tried
stat_density(aes(y=dim2, x=0+(..scaled..)), position="identity", geom="line")
but receive the error "stat_density requires the following missing aesthetics: x".
Any ideas? thanks
You can get the densities of the dim2 variables. Then, flip the axes and store them in a new data.frame. After that it is simply plotting them on top of the other graph.
p <- ggplot(data=mydf, aes(x=dim1, y=dim2, colour=as.factor(cat))) +
geom_point() +
stat_density(aes(x=dim1, y=(-2+(..scaled..))),
position="identity", geom="line")
stuff <- ggplot_build(p)
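# Extract the x range so that the new densities align with the y-axis.
# Note: the location of this value depends on the ggplot2 version; in ggplot2 3.x it is typically stuff$layout$panel_params[[1]]$x.range.
xrange <- stuff[[2]]$ranges[[1]]$x.range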
## Get densities of dim2
ds <- do.call(rbind, lapply(unique(mydf$cat), function(lev) {
dens <- with(mydf, density(dim2[cat==lev]))
data.frame(x=dens$y+xrange[1], y=dens$x, cat=lev)
}))
p + geom_path(data=ds, aes(x=x, y=y, color=factor(cat)))
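The horizontal curves in the question use ..scaled.., which rescales each density so that its maximum is 1, whereas the code above uses the raw density heights. If the vertical curves should match, the same rescaling can be done by hand. The following is a sketch only; x_offset is a hypothetical name for the x position at which the vertical curves start, and the data are rebuilt here so the block runs on its own.

```r
set.seed(123)
dim2 <- rnorm(200, mean = 1)
cat  <- factor(c(rep("a", 100), rep("b", 100)))

x_offset <- -2   # hypothetical: x position of the baseline of the vertical curves

# One density per category, rescaled to a maximum of 1 and with axes swapped
ds_scaled <- do.call(rbind, lapply(levels(cat), function(lev) {
  dens <- density(dim2[cat == lev])
  data.frame(x   = x_offset + dens$y / max(dens$y),  # density height runs along x
             y   = dens$x,                           # data values run along y
             cat = lev)
}))

# Usage, analogous to the answer above:
# p + geom_path(data = ds_scaled, aes(x = x, y = y, colour = factor(cat)))
```

Because the axes are swapped in the data frame itself, geom_path() draws the curves in the order of the rows, which is what keeps them running vertically.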
So far I can produce:
distrib_horiz <- stat_density(aes(x=dim1, y=(-2+(..scaled..))),
position="identity", geom="line")
ggplot(data=mydf, aes(x=dim1, y=dim2, colour=as.factor(cat))) +
geom_point() + distrib_horiz
And:
distrib_vert <- stat_density(data=mydf, aes(x=dim2, y=(-2+(..scaled..))),
position="identity", geom="line")
ggplot(data=mydf, aes(x=dim2, y=dim1, colour=as.factor(cat))) +
geom_point() + distrib_vert + coord_flip()
But combining them is proving tricky.
So far I have only a partial solution since I didn't manage to obtain a vertical stat_density line for each individual category, only for the total set. Maybe this can nevertheless help as a starting point for finding a better solution. My suggestion is to try with the ggMarginal() function from the ggExtra package.
p <- ggplot(data=mydf, aes(x=dim1, y=dim2, colour=as.factor(cat))) +
geom_point() + stat_density(aes(x=dim1, y=(-2+(..scaled..))),
position="identity", geom="line")
library(ggExtra)
ggMarginal(p,type = "density", margins = "y", size = 4)
This is what I obtain:
I know it's not perfect, but maybe it's a step in a helpful direction. At least I hope so. Looking forward to seeing other answers.
I have data for 4 sectors (A, B, C, D) and 5 years. I would like to draw 4 lines, one for each sector, with a point for every year, and add a fifth line representing the mean using stat_summary. The line colors are controlled with scale_color_manual and the point shapes through the aes() argument. The problem is that when I add the point geom, the legend is split into two parts: one for point shapes and one for line colors. I don't understand how to obtain a single legend that combines colors and points.
Here is an example. First of all let's build the data frame dtfr as follows:
a <- 100; b <- 100; c <- 100; d <- 100
for(k in 2:5){
a[k] <- a[k-1]*(1+rnorm(1)/100)
b[k] <- b[k-1]*(1+rnorm(1)/100)
c[k] <- c[k-1]*(1+rnorm(1)/100)
d[k] <- d[k-1]*(1+rnorm(1)/100)
}
v <- numeric()
for(k in 1:5){ v <- c(v,a[k],b[k],c[k],d[k]) }
dtfr <- data.frame(Year=rep(2008:2012,1, each=4),
Sector=rep(c("A","B","C","D"),5),
Value=v,
stringsAsFactors=F)
Now let us start to draw our graph with ggplot2. In the first graph we draw the line and point geoms without the mean line:
library(ggplot2)
ggplot(dtfr, aes(x=Year, y=Value)) +
geom_line(aes(group=Sector, color=Sector)) +
geom_point(aes(color=Sector, shape=Sector)) +
# stat_summary(aes(colour="mean",group=1), fun.y=mean, geom="line", size=1.1) +
scale_color_manual(values=c("#004E00", "#33FF00", "#FF9966", "#3399FF", "#FF004C")) +
ggtitle("Test for ggplot2 graph")
In this graph we have the legend with line colors and point shapes all in one:
But if I use the stat_summary to draw the mean line using the following code:
ggplot(dtfr, aes(x=Year, y=Value)) +
geom_line(aes(group=Sector, color=Sector)) +
geom_point(aes(color=Sector, shape=Sector)) +
stat_summary(aes(colour="mean",group=1), fun.y=mean, geom="line", size=1.1) +
scale_color_manual(values=c("#004E00", "#33FF00", "#FF9966", "#3399FF", "#FF004C")) +
ggtitle("Test for ggplot2 graph")
I get the mean (red) line, but the legend is split into two parts: one for line colors and one for point shapes. My question is: how can I get the graph with the mean line and a legend like the one in the first graph? That is, how do I get a single legend combining lines and shapes in the second graph, where the mean line is drawn?
Try this:
ggplot(dtfr, aes(x=Year, y=Value)) +
geom_line(aes(group=Sector, color=Sector)) +
geom_point(aes(color=Sector, shape=Sector)) +
stat_summary(aes(colour="mean",shape="mean",group=1), fun.y=mean, geom="line", size=1.1) +
scale_color_manual(values=c("#004E00", "#33FF00", "#FF9966", "#3399FF", "#FF004C")) +
scale_shape_manual(values=c(1:4, 32)) +
ggtitle("Test for ggplot2 graph")
Maybe someone more knowledgeable can come in and correct my explanation (or provide a better solution), but here's how I understand it: You have 5 values in the color scale, but you only have 4 in the shape scale; you're missing a value for "mean". So the scales aren't really compatible in a way. You can fix this by assigning a blank shape (32) to your mean line.
Here is a different approach that calculates the summary/mean beforehand and adds it as an additional level to the data frame before building the plot.
The approach can be used to easily add an additional line but with a specific color, which may be desired for a summary/mean for example.
First, I calculate the mean and add it to the dtfr of the OP.
library(dplyr)  # provides the %>% pipe used below
dtfr2 <- dtfr %>%
dplyr::group_by(Year) %>%
dplyr::summarise(Value = mean(Value)) %>%
dplyr::mutate(Sector = NA) %>%
dplyr::bind_rows(dtfr)
dtfr2 now has additional rows with the mean values stored in Value and NAs in Sector.
Then, building the plot is easy:
p1 <- ggplot(dtfr2, aes(x=Year, y=Value, color = Sector, shape = Sector)) +
geom_line() +
geom_point()
Finally, you may tweak the legend a little:
p1 +
scale_color_discrete(labels = c(letters[1:4], "M"), na.value = "black") +
scale_shape_discrete(labels = c(letters[1:4], "M"))
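The same idea can also be sketched in base R, without dplyr, by giving the mean rows an explicit label instead of NA so that the legend entry needs no relabelling. This is only an illustration: the data frame below is stand-in data with the same columns as dtfr, and the label "Mean" is an arbitrary choice.

```r
# Stand-in data with the same structure as dtfr in the question
set.seed(1)
dtfr <- data.frame(
  Year   = rep(2008:2012, each = 4),
  Sector = rep(c("A", "B", "C", "D"), 5),
  Value  = 100 + rnorm(20),
  stringsAsFactors = FALSE
)

# Yearly mean across sectors, stored as an extra "sector" called "Mean"
means        <- aggregate(Value ~ Year, data = dtfr, FUN = mean)
means$Sector <- "Mean"

# Append the mean rows, matching the column order of dtfr
dtfr2 <- rbind(dtfr, means[, names(dtfr)])

# Plotting then needs only one colour and one shape mapping:
# ggplot(dtfr2, aes(x = Year, y = Value, colour = Sector, shape = Sector)) +
#   geom_line() +
#   geom_point()
```

Since colour and shape are mapped to the same variable with the same levels, ggplot2 merges them into one legend.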
Is there any way to plot the cumulative probability from a frequency table? I mean a "smooth" version of it, similar to the way geom_density() plots.
So far, I managed to plot the individually calculated probabilities as points joined by lines, but it doesn't look very good.
I generate some test data:
set.seed(1)
x <- sort(sample(1:100, 20))
p <- runif(x); p <- cumsum(p)/sum(p)
table <- data.frame(x=x, prob=p)
You can use geom_smooth from the ggplot2 package.
require("ggplot2")
qplot(x=x, y=p, data=table, aes(ymin=0, ymax=1)) + ylab("ecf") +
geom_smooth(se=F, stat="smooth", method="loess", fullrange=T, fill="lightgrey", size=1)
As an alternative, for an easy way to control the smoothing with a single parameter, try DeconCdf from the decon package:
require("decon")
plot(DeconCdf(x, sig=1))
If you want to use ggplot, you first have to convert the object returned by DeconCdf into a data.frame.
f <- DeconCdf(x, sig=1)
m <- ggplot(data=data.frame(x=f$x, p=f$y), aes(x=x, y=p, ymin=0, ymax=1)) + ylab("ecf")
m + geom_line(size=1)
Use the sig parameter as your smoothing parameter:
f <- DeconCdf(x, sig=0.3)
m <- ggplot(data=data.frame(x=f$x, p=f$y), aes(x=x, y=p, ymin=0, ymax=1)) + ylab("ecf")
m + geom_line(size=1)
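A different technique from the two above, sketched here in base R only: interpolate the tabulated cumulative probabilities with a monotone spline. Unlike a loess smooth, the resulting curve passes through the given points and never decreases, which is usually what is wanted from a cumulative distribution. The names cdf_fun and smooth_cdf are hypothetical.

```r
# Test data as in the question
set.seed(1)
x <- sort(sample(1:100, 20))
p <- runif(x); p <- cumsum(p) / sum(p)

# Monotone cubic interpolation (Fritsch-Carlson) through the (x, p) pairs
cdf_fun <- splinefun(x, p, method = "monoH.FC")

# Evaluate on a fine grid to obtain a smooth-looking curve
grid       <- seq(min(x), max(x), length.out = 200)
smooth_cdf <- data.frame(x = grid, prob = cdf_fun(grid))

# Usage with ggplot2:
# ggplot(smooth_cdf, aes(x = x, y = prob)) + geom_line() + ylim(0, 1)
```

This is interpolation rather than smoothing, so it has no tuning parameter; if the tabulated probabilities are noisy, one of the smoothing approaches above may be more appropriate.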
This version plots a histogram with a smoothed line from geom_density:
# Generate some data:
set.seed(28986)
x2 <- rweibull(100, 1, 1/2)
# Plot the points:
library(ggplot2)
library(scales)
ggplot(data.frame(x=x2),aes(x=x, y=1-cumsum(..count..)/sum(..count..))) +
geom_histogram(aes(fill=..count..)) +
geom_density(fill=NA, color="black", adjust=1/2) +
scale_y_continuous("Percent of units\n(equal to or larger than x)",labels=percent) +
theme_grey(base_size=18)
Note that I've used 1 - "cumulative probability" due to individual preference (I think it looks better and I'm accustomed to dealing with "reliability" metrics), but obviously that's just a preference that you could ignore by removing the 1- part in the aes.
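To make the relationship between the two conventions concrete, here is a small base R sketch using the empirical distribution function ecdf(). It shows the cumulative probability next to its complement for the same data. One caveat: 1 - ecdf gives the share of units strictly larger than x, which differs slightly at the data points from "equal to or larger than". The name surv is hypothetical.

```r
set.seed(28986)
x2 <- rweibull(100, 1, 1/2)

Fn   <- ecdf(x2)          # empirical cumulative distribution function
grid <- sort(x2)

surv <- data.frame(
  x    = grid,
  cdf  = Fn(grid),        # share of units <= x
  surv = 1 - Fn(grid)     # share of units > x
)

# Usage with ggplot2 (step functions suit empirical distributions):
# ggplot(surv, aes(x = x, y = surv)) + geom_step()
```

Plotting the cdf column instead of surv gives the plain cumulative version.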