I have a data frame, which after applying the melt function looks similar to:
var val
1 a 0.6133426
2 a 0.9736237
3 b 0.6201497
4 b 0.3482745
5 c 0.3693730
6 c 0.3564962
..................
The initial data frame had 3 columns, named a, b, and c, with their associated values.
I need to plot the ECDF of each of these columns (ecdf(a), ecdf(b), ecdf(c)) on the same graph using ggplot, but I am failing to do so. I tried:
p<-ggplot(melt_exp,aes(melt_exp$val,ecdf,colour=melt_exp$var))
pg<-p+geom_step()
But I am getting an error: arguments imply differing number of rows: 34415, 0.
Does anyone have an idea on how this can be done? The graph should look similar to the one returned by plot(ecdf(x)), not a step-like one.
Thank you!
My first thought was to try to use stat_function, but since ecdf returns a function, I couldn't get that working quickly. Instead, here's a solution that requires you to attach the computed values to the data frame first (using Ramnath's example data):
library(plyr) # function ddply()
mydf_m <- ddply(mydf_m, .(variable), transform, ecd = ecdf(value)(value))
ggplot(mydf_m,aes(x = value, y = ecd)) +
geom_line(aes(group = variable, colour = variable))
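The per-group step ecdf(value)(value) can also be sketched in base R with ave(), without plyr. The data frame long_df and its values are made up for illustration; only the column names follow the melted data above.

```r
# Small long-format data frame standing in for the melted data (made-up values)
long_df <- data.frame(
  variable = rep(c("a", "b"), each = 4),
  value    = c(3, 1, 2, 2, 10, 40, 20, 30)
)

# ecdf(v) returns a function; calling it on v gives, for each observation,
# the fraction of observations in the same group that are <= that value
long_df$ecd <- ave(long_df$value, long_df$variable,
                   FUN = function(v) ecdf(v)(v))

print(long_df)
```

The resulting ecd column can be used as the y aesthetic in the same way as the column created by ddply.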
If you want a smooth estimate of the ECDF, you could also use geom_smooth together with the function ns() from the splines package:
library(splines) # function ns()
ggplot(mydf_m, aes(x = value, y = ecd, group = variable, colour = variable)) +
geom_smooth(se = FALSE, formula = y ~ ns(x, 3), method = "lm")
As noted in a comment above, as of version 0.9.2.1, ggplot2 has a specific stat for this purpose: stat_ecdf. Using that, we'd just do something like this:
ggplot(mydf_m,aes(x = value)) + stat_ecdf(aes(colour = variable))
Based on Ramnath's approach above, you can get the ECDF from ggplot2 by doing the following:
require(ggplot2)
require(reshape2) # provides melt()
mydf = data.frame(
a = rnorm(100, 0, 1),
b = rnorm(100, 2, 1),
c = rnorm(100, -2, 0.5)
)
mydf_m = melt(mydf)
p0 = ggplot(mydf_m, aes(x = value)) +
stat_ecdf(aes(group = variable, colour = variable))
print(p0)
Here is one approach
require(ggplot2)
require(reshape2) # provides melt()
mydf = data.frame(
a = rnorm(100, 0, 1),
b = rnorm(100, 2, 1),
c = rnorm(100, -2, 0.5)
)
mydf_m = melt(mydf)
p0 = ggplot(mydf_m, aes(x = value)) +
geom_density(aes(group = variable, colour = variable)) +
theme(legend.position = c(0.85, 0.85)) # opts() has been removed from ggplot2; theme() replaces it
Related
I created these two violin plots in R, using:
install.packages("vioplot")
par(mfrow = c(1, 2))
vioplot::vioplot(HEL$Y,las=2,main="HEL$Y",col="deepskyblue",notch=TRUE)
vioplot::vioplot(ITA$Y,las=2,main="ITA$Y",col="aquamarine",notch=TRUE)
As a result I get the following. However, I don't know why I get 1 and 2 on the x-axis. How can I get rid of the 2?
Thanks for your help.
This mysterious behavior is due to the use of the argument "notch = TRUE". Example:
library(vioplot)
set.seed(456)
vioplot(rnorm(10), notch = TRUE)
My interpretation is that notch is not an argument of vioplot, so the function interprets it as data to add to the graph (see the little smudge at y = 1: that's where it wants to put the new data, since TRUE equals 1 when it is converted into a numeric).
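A minimal sketch of that interpretation, assuming vioplot collects additional data vectors through `...`. The function count_datasets is hypothetical and is not vioplot's actual source; it only mimics the calling pattern.

```r
# Hypothetical function, NOT vioplot's source: it only shows how a
# signature of the form f(x, ...) swallows unknown named arguments
count_datasets <- function(x, ...) {
  datasets <- list(x, ...)  # every extra argument becomes one more "dataset"
  length(datasets)
}

count_datasets(rnorm(10))                # one dataset
count_datasets(rnorm(10), notch = TRUE)  # two datasets: the data and TRUE
as.numeric(TRUE)                         # TRUE is coerced to 1
```

Under this assumption, dropping notch = TRUE from the call leaves a single dataset and therefore a single tick on the x-axis.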
To confirm that an unknown argument is interpreted as data to be plotted, here is a little experiment:
vioplot(rnorm(10), unknown_argument = rnorm(10))
And the result:
This is a ggplot2 solution in case you're interested.
library(ggplot2)
library(dplyr)
# Recreate similar data
HEL <- data.frame(Y = rnorm(50, 8, 3))
ITA <- data.frame(Y = rnorm(50, 9, 2))
# Join in a single dataframe and reshape to longer format
dat <- bind_rows(rename(HEL, hel_y = Y),
rename(ITA, ita_y = Y)) |>
tidyr::pivot_longer(everything())
# Make the plots
dat |>
ggplot(aes(name, value)) +
geom_violin(aes(fill = name)) +
geom_boxplot(width = 0.1) +
scale_fill_manual(values = c("deepskyblue", "aquamarine")) +
theme(legend.position = "none")
Created on 2022-04-28 by the reprex package (v2.0.1)
I would like to force ggplot to render a smoothed line for this multi-group plot, even in situations where a group has only one or two values. See below:
library(ggplot2)
set.seed(1234)
df <- data.frame(group = factor(c(rep("A",3),rep("B",2),"C")), x = c(1,2,3,1,2,2), value = runif(6))
ggplot(df,aes(x=x,y=value,group=group,color=group))+
geom_point(size=2)+
geom_line(stat="smooth",method = "loess",size = 2, alpha = 0.3)
Here's the output I want to see:
The call gives a lot of warnings which can be inspected by warnings(). One of the warnings says "zero-width neighborhood. make span bigger".
So, I tried OP's code with the additional span = 1 parameter:
library(ggplot2)
ggplot(df, aes(x = x, y = value, group = group, color = group)) +
geom_point(size = 2) +
geom_line(
stat = "smooth",
method = "loess",
span = 1,
size = 2,
alpha = 0.3
)
and got smoothed curves for groups A and B with only 3 and 2 data points, resp.
I'm currently finishing off my Masters project and need to include some graphics for the write-up. Without boring you too much, I have some data which is associated with AR(1) parameters ranging from 0.1 to 0.9 by 0.1 increments. As such I thought of doing a faceted histogram like the one below (worry not about the hideous fruit salad of colours, it will not be used).
I used this code.
ggplot(opt_lens_geom,aes(x=l_1024,fill=factor(rho))) + geom_histogram()+coord_flip()+facet_grid(.~rho,scales = "free_x")
I would also like to draw a trend line for the median values, since the AR(1) parameter is continuous. In a later iteration I deleted the padding and made it "look" like one graph, but I have had issues with the endpoints matching up, since each facet is a separate graphical device. Can anyone give me some advice on how to do this? I am not particularly attached to the faceting, so if it is not needed I can do away with it.
I will try to upload sample data, but simulating 100 values for each of the 9 rhos would work just to get started, like:
opt_lens_geom <- data.frame(rho= rep(seq(0.1,0.9,by=0.1),each=100),l_1024=rnorm(900))
You might consider ggridges. I've assumed here that you want a median value for each value of rho.
library(ggplot2)
library(ggridges)
library(dplyr)
set.seed(1001)
opt_lens_geom <- data.frame(rho = rep(seq(0.1, 0.9, by = 0.1), each = 100),
l_1024 = rnorm(900))
opt_lens_geom %>%
mutate(rho_f = factor(rho)) %>%
ggplot(aes(l_1024, rho_f)) +
stat_density_ridges(quantiles = 2, quantile_lines = TRUE)
Result. You can add scale = 1 as a parameter to stat_density_ridges if you don't like the amount of overlap.
Try the following. It uses a pre-computed data frame of the medians.
library(ggplot2)
df <- iris[c(1, 5)]
names(df) <- c("val", "rho")
med <- plyr::ddply(df, "rho", summarise, m = median(val))
ggplot(data = df, aes(x = val, fill = factor(rho))) +
geom_histogram() +
coord_flip() +
geom_vline(data = med, aes(xintercept = m), colour = 'black') +
facet_wrap(~ factor(rho))
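The pre-computed medians can also be sketched in base R with aggregate(), in case plyr is not installed. This is an alternative to the ddply call above, not what the answer itself uses; the column name m is kept so that the geom_vline layer would work unchanged.

```r
# Same data as above: Sepal.Length as "val", Species as "rho"
df <- iris[c(1, 5)]
names(df) <- c("val", "rho")

# One row per level of rho, holding the median of val for that level
med <- aggregate(val ~ rho, data = df, FUN = median)
names(med)[2] <- "m"  # match the column name used by geom_vline(aes(xintercept = m))

print(med)
```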
You could do a variant on this using geom_violin instead of using histograms, although you wouldn't get labelled counts, just an idea of the relative density. Example with made up data:
df = data.frame(
rho = rep(c(0.1, 0.2, 0.3), each = 50),
val = sample(1:10, 150, replace = TRUE)
)
df$val = df$val + (5 * (df$rho == 0.2)) + (8 * (df$rho == 0.3))
ggplot(df, aes(x = rho, y = val, fill = factor(rho))) +
geom_violin() +
stat_summary(aes(group = 1), colour = "black",
geom = "line", fun = "median") # fun replaces fun.y, deprecated since ggplot2 3.3.0
This produces a violin for each value of rho, and joins the medians for each violin.
I have a data frame mydataAll with columns DESWC, journal, and highlight. To calculate the average and standard deviation of DESWC for each journal, I do
avg <- aggregate(DESWC ~ journal, data = mydataAll, mean)
stddev <- aggregate(DESWC ~ journal, data = mydataAll, sd)
Now I plot a horizontal stripchart with the values of DESWC along the x-axis and each journal along the y-axis. But for each journal, I want to indicate the standard deviation and average with a simple line. Here is my current code and the results.
stripchart2 <-
ggplot(data=mydataAll, aes(x=mydataAll$DESWC, y=mydataAll$journal, color=highlight)) +
geom_segment(aes(x=avg[1,2] - stddev[1,2],
y = avg[1,1],
xend=avg[1,2] + stddev[1,2],
yend = avg[1,1]), color="gray78") +
geom_segment(aes(x=avg[2,2] - stddev[2,2],
y = avg[2,1],
xend=avg[2,2] + stddev[2,2],
yend = avg[2,1]), color="gray78") +
geom_segment(aes(x=avg[3,2] - stddev[3,2],
y = avg[3,1],
xend=avg[3,2] + stddev[3,2],
yend = avg[3,1]), color="gray78") +
geom_point(size=3, aes(alpha=highlight)) +
scale_x_continuous(limit=x_axis_range) +
scale_y_discrete(limits=mydataAll$journal) +
scale_alpha_discrete(range = c(1.0, 0.5), guide='none')
show(stripchart2)
See the three horizontal geom_segments at the bottom of the image indicating the spread? I want to do that for all journals, but without handcrafting each one. I tried using the solution from this question, but when I put everything in a loop and remove the aes(), it gives me an error that says:
Error in x - from[1] : non-numeric argument to binary operator
Can anyone help me condense the geom_segment() statements?
I generated some dummy data to demonstrate. First, we use aggregate like you have done, then we combine those results to create a data.frame in which we create upper and lower columns. Then, we pass these to the geom_segment specifying our new dataset. Also, I specify x as the character variable and y as the numeric variable, and then use coord_flip():
library(ggplot2)
set.seed(123)
df <- data.frame(lets = sample(letters[1:8], 100, replace = TRUE),
vals = rnorm(100),
stringsAsFactors = FALSE)
means <- aggregate(vals~lets, data = df, FUN = mean)
sds <- aggregate(vals~lets, data = df, FUN = sd)
df2 <- data.frame(means, sds)
df2$upper = df2$vals + df2$vals.1
df2$lower = df2$vals - df2$vals.1
ggplot(df, aes(x = lets, y = vals))+geom_point()+
geom_segment(data = df2, aes(x = lets, xend = lets, y = lower, yend = upper))+
coord_flip()+theme_bw()
Here, the lets column would resemble your character variable.
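A slightly more defensive way to build the summary table is to join the two aggregates by their key with merge(), rather than relying on both having the same row order. This is a sketch with made-up data; the names stats, mean, and sd are illustrative and do not appear in the answer above.

```r
# Made-up data: three groups with four values each
df <- data.frame(lets = rep(c("a", "b", "c"), each = 4),
                 vals = 1:12,
                 stringsAsFactors = FALSE)

means <- aggregate(vals ~ lets, data = df, FUN = mean)
sds   <- aggregate(vals ~ lets, data = df, FUN = sd)
names(means)[2] <- "mean"
names(sds)[2]   <- "sd"

# Join on the grouping column, then derive the segment endpoints
stats <- merge(means, sds, by = "lets")
stats$lower <- stats$mean - stats$sd
stats$upper <- stats$mean + stats$sd

print(stats)
```

The stats data frame can then be passed to geom_segment(data = stats, ...) with lower and upper as y and yend, as in the plot above.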
I need to add a legend of the two lines (best fit line and 45 degree line) on TOP of my two plots. Sorry I don't know how to add plots! Please please please help me, I really appreciate it!!!!
Here is an example
type=factor(rep(c("A","B","C"),5))
xvariable=seq(1,15)
yvariable=2*xvariable+rnorm(15,0,2)
newdata=data.frame(type,xvariable,yvariable)
p = ggplot(newdata,aes(x=xvariable,y=yvariable))
p+geom_point(size=3)+ facet_wrap(~ type) +
geom_abline(intercept =0, slope =1,color="red",size=1)+
stat_smooth(method="lm", se=FALSE,size=1)
Here is another approach which uses aesthetic mapping to string constants to identify different groups and create a legend.
First an alternate way to create your test data (and naming it DF instead of newdata)
DF <- data.frame(type = factor(rep(c("A", "B", "C"), 5)),
xvariable = 1:15,
yvariable = 2 * (1:15) + rnorm(15, 0, 2))
Now the ggplot code. Note that for both geom_abline and stat_smooth, the colour is set inside an aes call, which means each of the two values used will be mapped to a different colour and a guide (legend) will be created for that mapping.
ggplot(DF, aes(x = xvariable, y = yvariable)) +
geom_point(size = 3) +
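# intercept and slope go inside aes(): recent ggplot2 versions ignore the mapping when they are passed as plain arguments
geom_abline(aes(intercept = 0, slope = 1, colour = "one-to-one"), size = 1) +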
stat_smooth(aes(colour="best fit"), method = "lm", se = FALSE, size = 1) +
facet_wrap(~ type) +
scale_colour_discrete("")
Try this:
# original data
type <- factor(rep(c("A", "B", "C"), 5))
x <- 1:15
y <- 2 * x + rnorm(15, 0, 2)
df <- data.frame(type, x, y)
# create a copy of original data, but set y = x
# this data will be used for the one-to-one line
df2 <- data.frame(type, x, y = x)
# bind original and 'one-to-one data' together
df3 <- rbind.data.frame(df, df2)
# create a grouping variable to separate stat_smoothers based on original and one-to-one data
df3$grp <- as.factor(rep(1:2, each = nrow(df)))
# plot
# use original data for points
# use 'double data' for abline and one-to-one line, set colours by group
ggplot(df, aes(x = x, y = y)) +
geom_point(size = 3) +
facet_wrap(~ type) +
stat_smooth(data = df3, aes(colour = grp), method = "lm", se = FALSE, size = 1) +
scale_colour_manual(values = c("red","blue"),
labels = c("abline", "one-to-one"),
name = "") +
theme(legend.position = "top")
# If you would rather stack the two keys in the legend, you can add:
# guide = guide_legend(direction = "vertical")
#...as argument in scale_colour_manual
Please note that this solution does not extrapolate the one-to-one line outside the range of your data, which seemed to be the case for the original geom_abline.