Plotting the average of multiple variables in a time series using ggplot

I have a file containing time-series data for multiple variables, a through k.
I would like to create a graph that plots the average of variables a through k over time and, above and below that average line, adds a smoothed band representing the maximum and minimum values on each day.
So something like confidence intervals, but smoothed.
Here's the dataset:
https://dl.dropbox.com/u/22681355/co.csv
and here's the code I have so far:
library(ggplot2)
library(reshape2)
df <- read.csv("https://dl.dropbox.com/u/22681355/co.csv")  # the dataset linked above
meltdf <- melt(df, id = "Year")
ggplot(meltdf, aes(x = Year, y = value, colour = variable, group = variable)) +
  geom_line()

This depicts bootstrapped 95% confidence intervals (mean_cl_boot requires the Hmisc package):
ggplot(meltdf, aes(x = Year, y = value, colour = variable, group = variable)) +
  stat_summary(fun.data = "mean_cl_boot", geom = "smooth")
This depicts the mean of all values of all variables ±1 SD (in current ggplot2, mult is passed via fun.args):
ggplot(meltdf, aes(x = Year, y = value)) +
  stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1), geom = "smooth")
You might want to calculate the yearly means first, before computing the mean and SD across the variables, but I leave that to you.
However, I believe a bootstrap confidence interval would be more sensible, since the distribution is clearly not symmetric. It would also be narrower. ;)
And of course you could log-transform your values.
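For the min/max band the question actually asks for, here is a minimal sketch (assuming meltdf was built as above, so it has Year, variable, and value columns): summarise per year, then draw a ribbon behind the mean line.
library(dplyr)

summ <- meltdf %>%
  group_by(Year) %>%
  summarise(avg = mean(value, na.rm = TRUE),
            lo  = min(value, na.rm = TRUE),
            hi  = max(value, na.rm = TRUE))

ggplot(summ, aes(x = Year, y = avg)) +
  geom_ribbon(aes(ymin = lo, ymax = hi), alpha = 0.3) +  # shaded min/max band
  geom_line()                                            # mean line on top
If you want the band itself smoothed rather than tracking the raw yearly extremes, you could smooth lo and hi separately (e.g. with loess()) before drawing the ribbon.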

Dot-and-whisker coefficient plots using only mean and 95% confidence interval estimates (not the data that produced them)

I have this:
workday <- data.frame(Measure = c("A", "A", "A"),
Session = c("Welcome", "Class 1", "Lunch Talk"),
Mean = c(7.10, 8.90, 4.47),
Ci95 = c(0.40, 0.56, 0.33))
I need to create a coefficient plot similar to this, like the output of dwplot() (from the dotwhisker package), where the y-axis represents the different categorical values of Session. The estimated mean should be a point, and the lower and upper 95% confidence limits should be plotted as a horizontal line running through the corresponding mean.
I don't have the raw data used to produce the estimated means (Mean) and 95% confidence intervals (Ci95), just the values themselves, as seen in workday. In other words, I want the equivalent of a dwplot() built with position = "identity" in ggplot2.
I can get here:
workday %>%
  ggplot(aes(x = Mean, y = Session)) +
  geom_point(position = "identity") +
  ggtitle("A")
But it obviously does not include the horizontal confidence interval line I need.
How can I use ggplot2 (or dwplot) to produce the desired result?
Maybe try this:
library(tidyverse)
# Code
workday %>%
  mutate(Low = Mean - Ci95, High = Mean + Ci95) %>%
  ggplot(aes(x = Session, y = Mean)) +
  geom_point() +
  geom_errorbar(aes(ymin = Low, ymax = High), width = 0) +
  ggtitle("A") +
  coord_flip()
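An alternative sketch (with the tidyverse loaded as above) uses geom_pointrange(), which draws the point and the interval in one layer; in ggplot2 3.3+ the horizontal orientation works directly via xmin/xmax, while older versions need the coord_flip() approach shown above.
workday %>%
  mutate(Low = Mean - Ci95, High = Mean + Ci95) %>%
  ggplot(aes(x = Mean, y = Session)) +
  geom_pointrange(aes(xmin = Low, xmax = High)) +  # point at the mean, horizontal CI line
  ggtitle("A")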

How to detect where the width of a loess 95% CI reaches a certain amount?

I've fitted a loess curve with a 95% CI, and now I'd like to determine where the CI reaches a certain width.
For example, using the "cars" dataset:
plot <- ggplot(cars, aes(x = speed, y = dist)) +
  geom_point() +
  stat_smooth(method = "loess", se = TRUE) +
  xlab("Speed") +
  ylab("Distance") +
  theme_bw()
plot
I'd like to find out at what values of "Speed" the CI width equals 20 units of distance. Looking at the plot, it might be approximately 7 and 24.
Thanks!
You can use ggplot_build(plot) to extract the data underlying each layer ggplot2 builds, along with other miscellaneous information.
In this case, the limits of the confidence interval are in the ymin and ymax columns and can be reached with:
foo <- ggplot_build(plot)
foo[["data"]][[2]]  # the second layer here is the stat_smooth() layer
You can then do a simple mutate to compute the difference between ymax and ymin and, via the x column, see at what "Speed" the CI gap reaches 20.
mutate_info <- foo[["data"]][[2]] %>% dplyr::mutate(ci_gap = ymax - ymin)
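To turn that into approximate Speed values, one sketch is to flag where ci_gap crosses the 20-unit threshold along the smoother's x grid (the threshold of 20 comes from the question; precision is limited by the grid resolution):
library(dplyr)

crossings <- mutate_info %>%
  mutate(above = ci_gap > 20,
         crossed = above != lag(above)) %>%  # TRUE where the width passes the threshold
  filter(crossed)

crossings$x  # approximate Speed values (the question eyeballs these at about 7 and 24)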

Density plot in R (ggplot2), colored by variable, returning very different distribution than histogram and frequency plot?

I've combed through several questions on here already and I can't seem to figure out what's happening with my density plots.
I have a set of radiocarbon dates that are attributed to different cultures. I need to display the frequencies of the dates through time, but distinguish the dates by culture. A stacked histogram does the job (Fig. 1), but its use is generally discouraged, so that's out of the question; still, I want something smoother than a frequency plot (Fig. 2).
Figure 1: Histogram
Figure 2: Frequency plot
When I produce a density plot coloured by culture (Fig. 3), the relative heights of the cultures on the y-axis change drastically compared to their counts. For example, in the density plot the blue density curve is much higher than the purple one; yet in the histogram we can see that there are far more dates attributed to the purple group.
Figure 3: Density plot
Am I doing something wrong with my code (see below)? Or perhaps I need to scale the density curves in some way? Or is there something about density plots I'm not understanding? (Disclaimer: my knowledge of stats is fairly weak)
Thanks in advance!
ggplot(test, aes(x=CalBP))+
theme_tufte(base_family="sans")+
theme(axis.line=element_line(), axis.text=element_text(color="black")) +
theme(legend.position="none") +
theme(text=element_text(size=14)) +
geom_density(aes(color=factor(Culture), fill=factor(Culture)), alpha = 0.5) +
scale_x_reverse() +
labs(x="Cal. B.P.") +
ylab(expression("Density")) +
coord_cartesian(xlim = c(4773, 225)) +
scale_fill_manual(values=c("#cf9045", "#ebe332", "#5f9388", "#6abeef", "#9d88d6")) +
scale_color_manual(values=c("#cf9045", "#ebe332", "#5f9388", "#6abeef", "#9d88d6"))
The difference is that a density plot is scaled so that the total area under the curve is 1. Its purpose is to model a probability density function, which (by definition) has area 1.
If every group in your data had the same number of observations, then the only difference between the density plot and the histogram would be the y-axis. When the groups have different numbers of observations, the density plot normalizes for this (each curve still has total area 1), while the histogram bars are much taller for the group with more observations.
In base R, you can put a histogram on the density scale by setting freq = FALSE, but I've not seen density plots scaled up to histogram counts; it's usually more interesting to ignore the effect of the relative sample sizes.
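If you do want the curves to reflect group sizes (so the purple group's curve sits above the blue one), one option is to plot counts rather than densities via ggplot2's computed variable count (density multiplied by the number of observations). A minimal sketch reusing the question's data and aesthetics; after_stat() needs ggplot2 3.3+, and older versions wrote this as ..count..:
ggplot(test, aes(x = CalBP)) +
  geom_density(aes(y = after_stat(count),          # scale each curve by its group size
                   color = factor(Culture),
                   fill = factor(Culture)),
               alpha = 0.5) +
  scale_x_reverse() +
  labs(x = "Cal. B.P.", y = "Count-scaled density")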

R: Plot the interaction between a categorical factor and a continuous variable on a DV

What I have is a 3-level repeated-measures factor and a continuous variable (scores on a psychological questionnaire, NEO, measured only once, pre-experiment), which showed a significant interaction in a linear mixed-effects model with the dependent variable (DV; state scores, IAS, measured at each time level).
To see the nature of this interaction, I would like to create a plot with the time levels on the x-axis, the state score on the y-axis, and multiple curves for the continuous variable, similar to this. The continuous variable should be categorized into, say, quartiles (so I get four different curves), which is exactly what I can't achieve. So far I get a separate curve for each value of the continuous variable.
My goal is also comparable to this, but I need the categorical (time) variable not as separate curves but on the x-axis.
I tried a lot of different plot functions in R but didn't manage to get what I want, maybe because I am not very skilled with R.
For example,
ggplot(Data_long, aes(x = time, y = IAS, colour = NEO, group = NEO)) +
  geom_line()
from the first link shows me dozens of curves (one for each value of NEO), and I can't see how to group a continuous variable in a meaningful way in that ggplot call.
Edit:
Original Data:
http://www.pastebin.ca/2598926
(I hope it is not too inconvenient.)
This object (Data_long) was created/converted with the following line:
Data_long <- transform(Data_long0, neo.binned=cut(NEO,c(25,38,46,55,73),labels=c(".25",".50",".75","1.00")))
Every value in the neo.binned column seems to be set correctly, with enough cases per quantile.
What I then tried and didn't work:
ggplot(Data_long, aes(x = time, y = ias, color = neo.binned)) + stat_summary(fun.y="median",geom="line")
geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?
I have 92 subjects and NEO values between 26 and 73. Any hints on what to pass to cut() for breaks and labels? The quantiles (0%, 25%, 50%, 75%, 100%) are 26, 38, 46, 55, 73.
Do you mean something like this? Here, your data is binned by NEO into three classes, and then the median of IAS over these bins is drawn. Check out ?cut.
Data_long <- transform(Data_long, neo.binned = cut(NEO, c(0, 3, 7, 10), labels = c("lo", "med", "hi")))
Plot everything in one plot:
ggplot(Data_long, aes(x = time, y = IAS, color = neo.binned)) +
  stat_summary(aes(group = neo.binned), fun.y = "median", geom = "line")
And stealing from CMichael's answer, you can do it all in multiple panels (somehow you linked to facetted plots in your question):
ggplot(Data_long, aes(x = time, y = IAS)) +
  stat_summary(fun.y = "median", geom = "line") +
  facet_grid(neo.binned ~ .)
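For the follow-up about what to pass to cut(): the breaks can come straight from quantile(), so the bins are true quartiles of NEO. A sketch reusing Data_long0 from the question's edit (the labels are just illustrative):
Data_long <- transform(
  Data_long0,
  neo.binned = cut(NEO,
                   breaks = quantile(NEO, probs = seq(0, 1, 0.25), na.rm = TRUE),
                   include.lowest = TRUE,   # keep the minimum (26) in the first bin
                   labels = c("Q1", "Q2", "Q3", "Q4")))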
Do you mean facetting @ziggystar's initial plot?
quantiles <- quantile(Data_long$NEO, c(0.25, 0.5, 0.75))
Data_long$NEOQuantile <- ifelse(Data_long$NEO <= quantiles[1], "first NEO quantile",
                         ifelse(Data_long$NEO <= quantiles[2], "second NEO quantile",
                         ifelse(Data_long$NEO <= quantiles[3], "third NEO quantile",
                                                               "fourth NEO quantile")))
require(ggplot2)
p <- ggplot(Data_long, aes(x = time, y = IAS)) + stat_quantile(quantiles = c(1), formula = y ~ x)
p <- p + facet_grid(. ~ NEOQuantile)
p
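A note on the above: stat_quantile() fits a quantile regression, which may not be what is wanted here. A hedged alternative is to draw the median of IAS per time level within each facet using stat_summary() (fun replaces the older fun.y in ggplot2 3.3+), reusing the NEOQuantile column created above:
ggplot(Data_long, aes(x = time, y = IAS, group = 1)) +  # group = 1 connects the medians across time levels
  stat_summary(fun = "median", geom = "line") +
  facet_grid(. ~ NEOQuantile)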

Scaled/weighted density plot

I want to generate a density plot of observed temperatures that is scaled by the number of events observed for each temperature data point. My data contains two columns: Temperature and Number [of observations].
Right now, I have a density plot that only incorporates the Temperature frequency according to:
plot(density(Temperature, na.rm = TRUE), type = "l", bty = "n")
How do I scale this density to account for the Number of observations at each temperature? For example, I want the temperature density plot scaled so that it reflects whether there are more or fewer observations at higher or lower temperatures.
I think I'm looking for something that weights the temperatures?
I think you can get what you want by passing a weights argument to density. Here's an example using ggplot (note that the ggplot2 aesthetic is called weight, not weights):
dat <- data.frame(Temperature = sort(runif(10)), Number = 1:10)
ggplot(dat, aes(Temperature)) + geom_density(aes(weight = Number / sum(Number)))
And to do this in base R (using DanM's data):
plot(density(dat$Temperature, weights = dat$Number / sum(dat$Number), na.rm = TRUE), type = "l", bty = "n")
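Equivalently, when Number holds integer counts you can expand the data and skip the weights argument entirely (a quick sketch; the default bandwidth may differ slightly because it is chosen from the expanded sample):
expanded <- rep(dat$Temperature, dat$Number)  # repeat each temperature by its count
plot(density(expanded, na.rm = TRUE), type = "l", bty = "n")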
