I would like to force ggplot2 to render a smoothed line for each group in this multi-group plot, even in situations where a group has only one or two values. See below:
library(ggplot2)
set.seed(1234)
df <- data.frame(group = factor(c(rep("A",3),rep("B",2),"C")), x = c(1,2,3,1,2,2), value = runif(6))
ggplot(df,aes(x=x,y=value,group=group,color=group))+
geom_point(size=2)+
geom_line(stat="smooth",method = "loess",size = 2, alpha = 0.3)
Here's the output I want to see:
The call gives a lot of warnings which can be inspected by warnings(). One of the warnings says "zero-width neighborhood. make span bigger".
So, I tried OP's code with the additional span = 1 parameter:
library(ggplot2)
ggplot(df, aes(x = x, y = value, group = group, color = group)) +
geom_point(size = 2) +
geom_line(
stat = "smooth",
method = "loess",
span = 1,
size = 2,
alpha = 0.3
)
and got smoothed curves for groups A and B with only 3 and 2 data points, resp.
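Group C is not mentioned in that result: it has a single observation, so there is nothing for a smoother to draw a curve through. As a rough sketch (not part of the original answer; n_per_group and smoothable are illustrative names, and the two-point threshold is an assumption), one can count the points per group and hand only the larger groups to the smoothing layer, leaving C as a point only:

```r
library(ggplot2)

set.seed(1234)
df <- data.frame(
  group = factor(c(rep("A", 3), rep("B", 2), "C")),
  x     = c(1, 2, 3, 1, 2, 2),
  value = runif(6)
)

# Number of observations per group (A has 3, B has 2, C has 1)
n_per_group <- table(df$group)

# Illustrative choice: only smooth groups with at least two points
smoothable <- names(n_per_group)[n_per_group >= 2]

p <- ggplot(df, aes(x = x, y = value, group = group, color = group)) +
  geom_point(size = 2) +
  geom_line(
    data   = subset(df, group %in% smoothable),
    stat   = "smooth",
    method = "loess",
    span   = 1,
    alpha  = 0.3
  )
```

Printing p draws the plot. loess may still emit warnings for groups this small; the subset only avoids asking it to fit a single point.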
On the same ggplot figure, I am trying to have the points (from geom_point), the lines (from geom_line) and the errorbars (from geom_errorbar) on the same "plane" (i.e. not overlapping), for each level of the factor.
As you can see, the "layering" of the errorbars does not follow the "layering" of the lines (not to mention the points).
Here is a reproducible example:
# reproducible example
# package
library(dplyr)
library(ggplot2)
# generate the data
set.seed(244)
d1 <- data.frame(time_serie = as.factor(rep(rep(1:3, each = 6), 3)),
treatment = as.factor(rep(c("HIGH", "MEDIUM", "LOW"), each = 18)),
value = runif(54, 1, 10))
# create the error intervals
d2 <- d1 %>%
dplyr::group_by(time_serie,treatment) %>%
dplyr::summarise(mean_value = mean(value),
SE_value = sd(value) / sqrt(length(value))) %>%
as.data.frame()
# plot
p1 <- ggplot(aes(x = time_serie, y = mean_value, color = treatment, group = treatment), data=d2)
p1
p1a <- p1 + geom_errorbar(aes(ymin = mean_value - SE_value, ymax = mean_value + SE_value), width = .2, position = position_dodge(0.3), size =1) +
geom_point(aes(), position = position_dodge(0.3), size = 3) +
geom_line(aes(color = treatment), position=position_dodge(0.3), size =1)
p1a
Any idea?
Any help would be greatly appreciated :)
Thanks a lot!
Valérian
Up front: this is a partial answer that has two notable issues still to fix (see the end). Edit: the two issues have been resolved, see the far bottom.
I'll change the "dodge" slightly to clarify the point, identify an area of concern, and demonstrate a suggested workaround.
# generate the data
set.seed(244)
d1 <- data.frame(time_serie = as.factor(rep(rep(1:3, each = 6), 3)),
treatment = as.factor(rep(c("HIGH", "MEDIUM", "LOW"), each = 18)),
value = runif(54, 1, 10))
# create the error intervals
d2 <- d1 %>%
dplyr::group_by(time_serie,treatment) %>%
dplyr::summarise(mean_value = mean(value),
SE_value = sd(value) / sqrt(length(value))) %>%
dplyr::arrange(desc(treatment)) %>%
as.data.frame()
# plot
ggplot(aes(x = time_serie, y = mean_value, color = treatment, group = treatment), data=d2) +
geom_errorbar(aes(ymin = mean_value - SE_value, ymax = mean_value + SE_value),
width = 0.2, position = position_dodge(0.03), size = 2) +
geom_point(aes(), position = position_dodge(0.03), size = 3) +
geom_line(aes(color = treatment), position = position_dodge(0.03), size = 2)
Namely, I'll assume that we want HIGH (red) points/lines/error-bars as the top-most layer, masked by nothing. We can see a clear violation of this in the right-most bar: the red dot is over the green errorbar but under the green line.
Unless/until there is an aes(layer=..) aesthetic (there is not afaik), you need to add layers one treatment at a time. While one could hard-code this with nine geoms, you can automate this with lapply. Note that ggplot(.) + list(geom1,geom2,geom3) works just fine, even with nested lists.
I'll control the order of layers with rev(levels(d2$treatment)), assuming that you want LOW as the bottom-most layer (ergo added first). The order of geoms within the list is what defines their layers. Technically we still have a single treatment's errorbar, point, and line on different layers, but they are consecutive so appear to be the same.
ggplot(aes(x = time_serie, y = mean_value, color = treatment, group = treatment), data=d2) +
lapply(rev(levels(d2$treatment)), function(trtmnt) {
list(
geom_errorbar(data = ~ subset(., treatment == trtmnt),
aes(ymin = mean_value - SE_value, ymax = mean_value + SE_value),
width = 0.2, position = position_dodge(0.03), size = 2),
geom_point(data = ~ subset(., treatment == trtmnt), aes(), position = position_dodge(0.03), size = 3),
geom_line(data = ~ subset(., treatment == trtmnt), position = position_dodge(0.03), size = 2)
)
})
(Side note: I use levels(d2$treatment) and data=~subset(., treatment==trtmnt) here, but that's just one way to do it. Another would be lapply(split(d2, d2$treatment), function(x) ...) and use data=x in all of the inner geoms. This latter method allows for multi-variable grouping, if desired. I see no immediate advantage to one over the other.)
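For completeness, here is a minimal sketch of the split() variant mentioned in the side note. It assumes the same d2 as above (rebuilt here so the snippet runs on its own); layers_by_trt is an illustrative name, and the dodging is left out to keep it short.

```r
library(dplyr)
library(ggplot2)

set.seed(244)
d1 <- data.frame(time_serie = as.factor(rep(rep(1:3, each = 6), 3)),
                 treatment  = as.factor(rep(c("HIGH", "MEDIUM", "LOW"), each = 18)),
                 value      = runif(54, 1, 10))
d2 <- d1 %>%
  group_by(time_serie, treatment) %>%
  summarise(mean_value = mean(value),
            SE_value   = sd(value) / sqrt(n()),
            .groups    = "drop") %>%
  as.data.frame()

# One list of three geoms per treatment; each geom only sees its own subset.
# split() orders by factor level (HIGH, LOW, MEDIUM), so rev() adds HIGH last,
# i.e. on top.
layers_by_trt <- lapply(rev(unname(split(d2, d2$treatment))), function(x) {
  list(
    geom_errorbar(data = x,
                  aes(ymin = mean_value - SE_value, ymax = mean_value + SE_value),
                  width = 0.2),
    geom_point(data = x, size = 3),
    geom_line(data = x)
  )
})

p <- ggplot(d2, aes(x = time_serie, y = mean_value,
                    color = treatment, group = treatment)) +
  layers_by_trt +
  scale_color_discrete(drop = FALSE)
```

Splitting on several columns at once, e.g. split(d2, list(d2$treatment, d2$time_serie)), is what would give the multi-variable grouping.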
The problems with this:
1. The order of the legend is not consistent with the order of the factor levels; somehow that is lost. (To be clear, I don't demonstrate this very well here: I can move "medium" to the middle of the legend using levels<-, and it works with the non-lapply rendering code with incorrect layering, but it is lost again with the lapply-geoms.)
2. position_dodge no longer has awareness of the other treatments, so it does not dodge the other errorbars. The only way around this is to manually dodge before plotting, shown below.
1: Order of legend elements
This one was solved in the question "lapply'd geoms lose factor-ordering": we just need to add scale_color_discrete(drop=FALSE).
2: Dodging
The dodge issue can be fixed by using real numerics in the x aesthetic. This is kind of a hack, as it is no longer done by ggplot2 but controlled externally. It's also applying an offset and not dodging, per se. But it does get the desired results.
d2$time_serie2 <- as.integer(as.character(d2$time_serie)) + as.numeric(d2$treatment)/10
ggplot(aes(x = time_serie2, y = mean_value, color = treatment, group = treatment), data = d2) +
lapply(rev(levels(d2$treatment)), function(trtmnt) {
list(
geom_errorbar(data = ~ subset(., treatment == trtmnt),
aes(ymin = mean_value - SE_value, ymax = mean_value + SE_value),
width = 0.2, size = 2),
geom_point(data = ~ subset(., treatment == trtmnt), aes(), size = 3),
geom_line(data = ~ subset(., treatment == trtmnt), size = 2)
)
}) +
scale_color_discrete(drop = FALSE)
I'm currently finishing off my Master's project and need to include some graphics for the write-up. Without boring you too much, I have some data which is associated with AR(1) parameters ranging from 0.1 to 0.9 in 0.1 increments. As such I thought of doing a faceted histogram like the one below (worry not about the hideous fruit salad of colours, it will not be used).
I used this code.
ggplot(opt_lens_geom,aes(x=l_1024,fill=factor(rho))) + geom_histogram()+coord_flip()+facet_grid(.~rho,scales = "free_x")
I would also like to draw a trend line for the median values, since the AR(1) parameter is continuous. In a later iteration I deleted the padding and made it "look" like one graph, but I have had issues getting the endpoints to match up, since each facet is a separate graphical device. Can anyone give me some advice on how to do this? I am not particularly attached to the faceting, so if it is not needed I can do away with it.
I will try to upload sample data, but simulating 100 values for each of the 9 rhos would work just to get started, like:
opt_lens_geom <- data.frame(rho= rep(seq(0.1,0.9,by=0.1),each=100),l_1024=rnorm(900))
You might consider ggridges. I've assumed here that you want a median value for each value of rho.
library(ggplot2)
library(ggridges)
library(dplyr)
set.seed(1001)
opt_lens_geom <- data.frame(rho = rep(seq(0.1, 0.9, by = 0.1), each = 100),
l_1024 = rnorm(900))
opt_lens_geom %>%
mutate(rho_f = factor(rho)) %>%
ggplot(aes(l_1024, rho_f)) +
stat_density_ridges(quantiles = 2, quantile_lines = TRUE)
If you don't like the amount of overlap, you can add scale = 1 as a parameter to stat_density_ridges.
Try the following. It uses a pre-computed data frame of the medians.
library(ggplot2)
df <- iris[c(1, 5)]
names(df) <- c("val", "rho")
med <- plyr::ddply(df, "rho", plyr::summarise, m = median(val))
ggplot(data = df, aes(x = val, fill = factor(rho))) +
geom_histogram() +
coord_flip() +
geom_vline(data = med, aes(xintercept = m), colour = 'black') +
facet_wrap(~ factor(rho))
You could do a variant on this using geom_violin instead of using histograms, although you wouldn't get labelled counts, just an idea of the relative density. Example with made up data:
df = data.frame(
rho = rep(c(0.1, 0.2, 0.3), each = 50),
val = sample(1:10, 150, replace = TRUE)
)
df$val = df$val + (5 * (df$rho == 0.2)) + (8 * (df$rho == 0.3))
ggplot(df, aes(x = rho, y = val, fill = factor(rho))) +
geom_violin() +
stat_summary(aes(group = 1), colour = "black",
geom = "line", fun = "median")
This produces a violin for each value of rho, and joins the medians for each violin.
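Applied to the simulated data from the question, the same idea might look like the sketch below. Treat it as an illustration rather than a finished figure: rho_f and medians are names I made up, and the fun argument assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

set.seed(1001)
opt_lens_geom <- data.frame(rho    = rep(seq(0.1, 0.9, by = 0.1), each = 100),
                            l_1024 = rnorm(900))
# Discrete copy of rho so that each value gets its own violin
opt_lens_geom$rho_f <- factor(opt_lens_geom$rho)

p <- ggplot(opt_lens_geom, aes(x = rho_f, y = l_1024)) +
  geom_violin(aes(fill = rho_f), show.legend = FALSE) +
  # group = 1 joins the per-rho medians into a single trend line
  stat_summary(aes(group = 1), fun = median, geom = "line", colour = "black")

# The same medians computed directly, for checking or for a table
medians <- tapply(opt_lens_geom$l_1024, opt_lens_geom$rho_f, median)
```

Because everything sits in one panel, there are no facet endpoints to match up.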
I am generating density plots for observations. The observations belong to a species and some are also connected to an individual ID.
With the data below, I want to generate a line for each level of IndID for species One and Two, and only a single line for Species Three, which does not include IndID. There are related questions on SO, but not with reproducible data and looking for different results.
library(ggplot2)
set.seed(1)
dat <- data.frame(Species = c(rep(c("One", "Two"), each = 2, length = 30), rep("Three",50)),
IndID = c(rep(letters[1:5],each = 6),rep(NA,50) ),
Value = sample(1:20, replace = T))
Keeping the color aesthetic on the Species level, I want to create multiple lines for Species One and Two (green and red) and a single blue line for Species Three.
ggplot(dat, aes(Value)) + geom_density(aes(color = Species), size = 1.25) +
scale_colour_manual(values = c("darkgreen","blue", "red"))
If you want to be able to tell them apart, you can set the linetype to IndID. Note, however, that you will need to change the NA to some other value to (easily) get it to plot.
I also expanded your data a little bit to give enough values per individual to show meaningful lines. I also used geom_line(stat = "density") instead of geom_density() because it omits the line along the bottom and gives legends with lines instead of boxes.
set.seed(1)
dat <- data.frame(Species = c(rep(c("One", "Two"), each = 2, length = 60), rep("Three",50)),
IndID = c(rep(letters[1:5],each = 12),rep("NA",50) ),
Value = sample(1:20, 110, replace = T))
ggplot(dat
, aes(x = Value
, color = Species
, linetype = IndID)) +
geom_line(stat = "density"
, size = 1.25) +
scale_colour_manual(values = c("darkgreen","blue", "red"))
gives
If you want the lines to all be solid, you can run:
ggplot(dat
, aes(x = Value
, color = Species
, linetype = IndID)) +
geom_line(stat = "density"
, size = 1.25) +
scale_colour_manual(values = c("darkgreen","blue", "red")) +
scale_linetype_manual(values = rep("solid", 6)) +
guides(linetype = "none")
(or use the group aesthetic, as Henrik suggested in a comment)
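A sketch of what the group version could look like, using the same expanded data. This is my reading of the suggestion, not the commenter's exact code. Note that each individual here has observations of both species, so I group on the Species/IndID combination to keep one colour per line.

```r
library(ggplot2)

set.seed(1)
dat <- data.frame(
  Species = c(rep(c("One", "Two"), each = 2, length.out = 60), rep("Three", 50)),
  IndID   = c(rep(letters[1:5], each = 12), rep("NA", 50)),
  Value   = sample(1:20, 110, replace = TRUE)
)

# One density line per Species/IndID combination, all solid,
# with no linetype legend to suppress
p <- ggplot(dat, aes(x = Value, color = Species,
                     group = interaction(Species, IndID))) +
  geom_line(stat = "density") +
  scale_colour_manual(values = c("darkgreen", "blue", "red"))
```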
I have recently come across a problem with ggplot2::geom_density that I am not able to solve. I am trying to visualise the density of some variable and compare it to a constant. To plot the density, I am using ggplot2::geom_density. The variable for which I am plotting the density, however, happens to be a constant (this time):
df <- data.frame(matrix(1,ncol = 1, nrow = 100))
colnames(df) <- "dummy"
dfV <- data.frame(matrix(5,ncol = 1, nrow = 1))
colnames(dfV) <- "latent"
ggplot() +
geom_density(data = df, aes(x = dummy, colour = 's'),
fill = '#FF6666', alpha = 0.2, position = "identity") +
geom_vline(data = dfV, aes(xintercept = latent, color = 'ls'), size = 2)
This is OK and something I would expect. But, when I shift this distribution to the far right, I get a plot like this:
df <- data.frame(matrix(71,ncol = 1, nrow = 100))
colnames(df) <- "dummy"
dfV <- data.frame(matrix(75,ncol = 1, nrow = 1))
colnames(dfV) <- "latent"
ggplot() +
geom_density(data = df, aes(x = dummy, colour = 's'),
fill = '#FF6666', alpha = 0.2, position = "identity") +
geom_vline(data = dfV, aes(xintercept = latent, color = 'ls'), size = 2)
which probably means that the kernel estimation is still taking 0 as the centre of the distribution (right?).
Is there any way to circumvent this? I would like to see a plot like the one above, only with the centre of the kernel density at 71 and the vline at 75.
Thanks
Well, I am not sure what the code does, but I suspect the geom_density primitive was not designed for a case where the values are all the same, and it is making some assumptions about the distribution that are not what you expect. Here is some code and a plot that shed some light:
# Generate 10 data sets with 100 constant values from 0 to 90
# and then merge them into a single dataframe
dfs <- list()
for (i in 1:10){
v <- 10*(i-1)
dfs[[i]] <- data.frame(dummy=rep(v,100),facet=v)
}
df <- do.call(rbind,dfs)
# facet plot them
ggplot() +
geom_density(data = df, aes(x = dummy, colour = 's'),
fill = '#FF6666', alpha = 0.5, position = "identity") +
facet_wrap( ~ facet,ncol=5 )
Yielding:
So it is not doing what you thought it was, but it is also probably not doing what you want. You could of course make it "translation-invariant" (almost) by adding some noise like this for example:
set.seed(1234)
noise <- rnorm(100, 0, 1e-3)
dfs <- list()
for (i in 1:10){
v <- 10*(i-1)
dfs[[i]] <- data.frame(dummy=rep(v,100)+noise,facet=v)
}
df <- do.call(rbind,dfs)
ggplot() +
geom_density(data = df, aes(x = dummy, colour = 's'),
fill = '#FF6666', alpha = 0.5, position = "identity") +
facet_wrap( ~ facet,ncol=5 )
Yielding:
Note that there is apparently a random component to the geom_density function, and I can't see how to set the seed before each instance, so the estimated density is a bit different each time.
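One possible explanation for the width changing with location, as a sketch: I am assuming geom_density is left at its default bw = "nrd0", which is computed by stats::bw.nrd0. For constant data the standard deviation and the IQR are both zero, and bw.nrd0 then falls back to the absolute value of the first observation. The bandwidth therefore grows with the value itself, so the kernel would still be centred on the data, just far wider at 71 than at 1. The names bw_at_1, bw_at_71 and ratio are illustrative.

```r
# Default bandwidth rule applied to 100 identical values
bw_at_1  <- bw.nrd0(rep(1, 100))
bw_at_71 <- bw.nrd0(rep(71, 100))

# With zero spread the rule reduces to 0.9 * abs(x[1]) * n^(-1/5),
# so the bandwidth is proportional to the constant itself
ratio <- bw_at_71 / bw_at_1
```

If that is what is happening, passing a fixed numeric bandwidth, for example geom_density(bw = 0.36), should give the same shape wherever the constant sits.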
There was some example code in R using the ggplot2 library:
theme_set(theme_bw())
dat = data.frame(value = rnorm(100,sd=2.5))
dat = within(dat, {
value_scaled = scale(value, scale = sd(value))
obs_idx = 1:length(value)
})
ggplot(aes(x = obs_idx, y = value_scaled), data = dat) +
geom_ribbon(ymin = -1, ymax = 1, alpha = 0.1) +
geom_line() + geom_point()
My question: based on this example, how can I draw the first 10 lines in red and the rest in blue in ggplot2? I tried to use some kind of layer syntax, but it doesn't work.
First, add another column to your data frame dat. It has value 0 for the first 10 rows and 1 for the rest.
dat$group <- factor(rep.int(c(0, 1), c(10, nrow(dat)-10)))
Generate the plot:
library(ggplot2)
ggplot(aes(x = obs_idx, y = value_scaled), data = dat) +
geom_ribbon(ymin = -1, ymax = 1, alpha = 0.1) +
geom_line(aes(colour = group), show.legend = FALSE) +
scale_colour_manual(values = c("red", "blue")) +
geom_point()
The parameter show.legend = FALSE suppresses the legend for the red and blue lines. (Older ggplot2 versions called this parameter show_guide.)
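One thing to watch for: mapping colour to a factor also splits the line into two groups, which leaves a gap between observations 10 and 11. A small variation, sketched below, adds group = 1 so that the line stays in one piece and only changes colour. I left out the ribbon to keep it short, and the seed is arbitrary.

```r
library(ggplot2)

set.seed(1)
dat <- data.frame(value = rnorm(100, sd = 2.5))
dat$value_scaled <- as.numeric(scale(dat$value))
dat$obs_idx <- seq_len(nrow(dat))
# 0 for the first 10 rows, 1 for the rest
dat$group <- factor(rep.int(c(0, 1), c(10, nrow(dat) - 10)))

p <- ggplot(dat, aes(x = obs_idx, y = value_scaled)) +
  # group = 1 keeps a single connected path; colour still varies along it
  geom_line(aes(colour = group, group = 1), show.legend = FALSE) +
  scale_colour_manual(values = c("red", "blue")) +
  geom_point()
```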
OK, I managed it with layers; the code (not elegant, but it works) is:
require(ggplot2)
value=round(rnorm(50,200,50),0)
nmbrs<-length(value) ## length of vector
obrv<-1:length(value) ## list of observations
#create data frame from the values
data_lj<-data.frame(obrv,value)
data_lj20<-data.frame(data_lj[1:20,1:2])
data_lj21v<-data.frame(data_lj[20:nmbrs,1:2])
#plot with ggplot
# one geom_line layer per colour; a bare layer() call needs stat, position and params in current ggplot2
rr<-ggplot()+
geom_line(mapping=aes(obrv,value),data=data_lj20,colour="red")+
geom_line(mapping=aes(obrv,value),data=data_lj21v,colour="blue")
print(rr)