How to highlight certain days on a timeseries in ggplot2? - r

library(ggplot2)
dateVec <- as.Date(c("08-01-2015","08-02-2015","08-03-2015","08-04-2015","08-05-2015"), format="%m-%d-%Y")
myData <- data.frame(dat=c(.1,.2,-.1,1,.1),
                     dates=dateVec,
                     indicator=c(0,0,0,1,0))
ggplot(myData, aes(x=dates, y=dat)) + geom_point()
I manually altered the plot here to shade the area around the datapoint with the highest value, where 'indicator' = 1.
How could I create this shading in ggplot automatically? Ideally I'd like the shaded area to have width, even though each flagged day is a single value on the x-axis. I've tried coloring the geom_point objects themselves according to the indicator, and while that works, it doesn't stand out visually the way I would like.
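One possible approach, sketched here rather than taken from an accepted answer, is to build a small data frame of flagged days and draw it with geom_rect behind the points. The `shade` data frame and the half-day padding on each side are assumptions chosen for illustration.

```r
library(ggplot2)

dateVec <- as.Date(c("08-01-2015", "08-02-2015", "08-03-2015", "08-04-2015", "08-05-2015"),
                   format = "%m-%d-%Y")
myData <- data.frame(dat = c(.1, .2, -.1, 1, .1),
                     dates = dateVec,
                     indicator = c(0, 0, 0, 1, 0))

# One row per flagged day; the half-day padding is an arbitrary choice
flagged <- myData$dates[myData$indicator == 1]
shade <- data.frame(xmin = flagged - 0.5, xmax = flagged + 0.5)

# Draw the rectangles first so the points sit on top of the shading
p <- ggplot(myData, aes(x = dates, y = dat)) +
  geom_rect(data = shade, aes(xmin = xmin, xmax = xmax),
            ymin = -Inf, ymax = Inf, inherit.aes = FALSE,
            fill = "grey70", alpha = 0.5) +
  geom_point()
p
```

Because the rectangles come from the indicator column, any number of flagged days is shaded without manual editing.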

Related

Setting a fixed color scale for a series of data in ggplot2

I've been searching for a while, and I've found a number of answers for problems similar to mine, but not quite working when I try to implement them.
I'm trying to make a series of radar plots for different observations of performance. The data has been normalized so that the mean is 0 and the standard deviation is 1, and the y-axis on the plot has been set from -3 to 3 to make it visually comparable how well the subjects performed, with more extreme observations being worse. I would like to add colors associated with that scale, preferably such that -1 to 1 is green, the bands between +/- 1 and 2 are yellow, and the bands between +/- 2 and 3 are red. All the examples I've been able to find relating to color fills are based directly on the data or on factors rather than a fixed scale, and anything I try doesn't display correctly. I'm not even sure whether ggplot can set a color scale in the way I'm looking for.
Here's the toy data I've been working with while working out the plotting (after reshaping):
variable <- c("time", "distance", "turns")
value <- c(0.9536197, 0.5842319, -2.1814528)
df <- data.frame(variable, value)
and here's my most recent attempt as far as ggplot code goes (using ggiraphExtra):
ggplot(df, aes(x=variable, y=value, group=1)) + geom_point() + geom_polygon() +
  ggiraphExtra:::coord_radar() + ylim(-3,3) +
  scale_fill_gradient(low="red", high="green")
and this is the output:
radar plot with solid green geom_polygon fill
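A minimal sketch of one way to tie colors to a fixed scale rather than to the data range: bin the absolute value into fixed bands with cut() and map those bands through scale_colour_manual. The `band` column, the band labels, and the use of coord_polar() as a stand-in for ggiraphExtra's coord_radar() are all assumptions made for illustration; this colors the points, not the polygon fill.

```r
library(ggplot2)

variable <- c("time", "distance", "turns")
value <- c(0.9536197, 0.5842319, -2.1814528)
df <- data.frame(variable, value)

# Fixed bands on |value|; the break points follow the question's description.
# Values beyond +/- 3 would fall outside the breaks and become NA.
df$band <- cut(abs(df$value), breaks = c(0, 1, 2, 3),
               labels = c("within 1 SD", "1 to 2 SD", "2 to 3 SD"),
               include.lowest = TRUE)

band_colours <- c("within 1 SD" = "green", "1 to 2 SD" = "yellow", "2 to 3 SD" = "red")

p <- ggplot(df, aes(x = variable, y = value, group = 1)) +
  geom_polygon(fill = NA, colour = "black") +
  geom_point(aes(colour = band), size = 4) +
  # drop = FALSE keeps all three bands in the legend even if one is unused
  scale_colour_manual(values = band_colours, drop = FALSE) +
  ylim(-3, 3) +
  coord_polar()  # stand-in for ggiraphExtra's coord_radar()
p
```

Since the breaks are fixed, the same value gets the same color in every plot of the series, regardless of what else is plotted.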

separate density plots with same colors

I'm trying to create a plot that has hundreds of background densities and a single observed density. I'd like each of the background densities to have the same appearance (color, linetype, etc.) and the observed density will be the only one that is different.
Is there a simple way to "group" the background data so that each density is plotted as a separate line without dividing by color/shape/fill/etc...?
Here is the closest I've come, but I'd like the density plots to all have the same appearance:
library(ggplot2)
library(magrittr)  # provides %>%
library(reshape2)  # provides melt()
df <- replicate(100, {rnorm(20, 0)}) %>% cbind(rnorm(20, 1)) %>% melt
ggplot(df) +
  geom_density(aes(x=value, color=factor(Var2)), alpha=.1) +
  geom_density(data=df[df$Var2==101,], aes(x=value), color="black")
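One way this might be done is with the `group` aesthetic, which splits the data into separate densities without mapping them to color, fill, or linetype. The sketch below builds the long data frame in base R instead of using melt, so the column names `Var2` and `value` are recreated by hand; the grey color for the background lines is an arbitrary choice.

```r
library(ggplot2)

set.seed(1)
# 100 background columns plus one observed column (column 101)
mat <- cbind(replicate(100, rnorm(20, 0)), rnorm(20, 1))

# Long format: as.vector() walks the matrix column by column,
# so each block of 20 values belongs to one column index
df <- data.frame(Var2 = rep(seq_len(ncol(mat)), each = nrow(mat)),
                 value = as.vector(mat))

p <- ggplot(df, aes(x = value)) +
  # group = Var2 draws one line per column, all with the same appearance
  geom_density(data = df[df$Var2 != 101, ], aes(group = Var2), colour = "grey60") +
  geom_density(data = df[df$Var2 == 101, ], colour = "black")
p
```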

Adding group mean lines to geom_bar plot and including in legend

I want to create a bar graph which also shows the mean value for the bars in each group, and shows the mean line in the legend.
I have been able to get this graph (bar chart with means) using the code below, which is fine, but I would like to be able to see the mean lines in the legend.
##The data to be graphed is the proportion of persons receiving a treatment
## (num=numerator) in each population (denom=denominator). The population is
##grouped by two age groups (Age) and further divided by a categorical
##variable V1
###SET UP DATAFRAME###
require(ggplot2)
df <- data.frame(V1 = c(rep(c("S1","S2","S3","S4","S5"),2)),
Age= c(rep(70,5),rep(80,5)),
num=c(5280,6570,5307,4894,4119,3377,4244,2999,2971,2322),
denom=c(9984,12600,9425,8206,7227,7290,8808,6386,6206,5227))
df$prop<-df$num/df$denom*100
PopMean<-sum(df$num)/sum(df$denom)*100
df70<-df[df$Age==70,]
group70mean<-sum(df70$num)/sum(df70$denom)*100
df80<-df[df$Age==80,]
group80mean<-sum(df80$num)/sum(df80$denom)*100
df$PopMean<-c(rep(PopMean,10))
df$groupmeans<-c(rep(group70mean,5),rep(group80mean,5))
I want the plot to look like this, but want the lines in the legend too, to be labelled as 'mean of group' or similar.
#basic plot
P<-ggplot(df, aes(x=factor(Age), y=prop, fill=factor(V1))) +
geom_bar(position=position_dodge(), colour='black',stat="identity")
P
####add mean lines
P+geom_errorbar(aes(y=df$groupmeans, ymax=df$groupmeans,
ymin=df$groupmeans), col="red", lwd=2)
Adding show.legend=TRUE overlays the error bars onto the factor legend, rather than separately. If there is a way of showing geom_errorbar separately in the legend this is probably the simplest solution.
I have also tried various things with geom_line.
The syntax below produces a line for the population mean value, but it runs from the centre of each group rather than covering the full width of the bars. It does produce a legend, but one showing a bar of colour rather than a line.
P+geom_line(aes(y=df$PopMean, group=df$PopMean, color=df$PopMean),lwd=1)
If I try to draw lines for the group means, the lines are not visible (because each one consists of only a single point).
P+geom_line(aes(y=df$groupmeans, group=df$groupmeans, color=df$groupmeans))
I also tried to get round this with facet plot, although this requires me to pretend my categorical variable is numeric to get it to work.
###set up new df
df2<-df
df2$V1<-c(rep(c(1,2,3,4,5),2))
P<-ggplot(df2, aes(x=factor(V1), y=prop, fill=factor(V1))) +
geom_bar(position=position_dodge(),
colour='black',stat="identity",width=1)
P+facet_grid(.~factor(df2$Age))
P+facet_grid(.~factor(df2$Age))+geom_line(aes(y=df$groupmeans,
group=df$groupmeans, color=df$groupmeans))
Facetplot
This allows me to show the mean lines, using geom_line, so a legend does appear (although it doesn't look right, showing a colour gradient rather than coloured lines!). However, the lines still do not go the full width of the bars. Also my x-axis now needs relabelling to show S1, S2 etc rather than numeric 1,2,3
To sum up: is there a way of showing error bar lines separately in the legend?
If not, and I use faceting, how do I correct the legend appearance, relabel the axis with my categorical variable, and get the line to span the full width of the plot?
Or is there an alternate solution that I am missing!?
Thanks
To get a legend for geom_errorbar, you need to map the colour aesthetic inside aes().
As you want only one category (here red), I've created a dummy variable first:
df$mean <- "Mean"
ggplot(df, aes(x=factor(Age), y=prop, fill=factor(V1))) +
geom_bar(position=position_dodge(), colour='black',stat="identity") +
geom_errorbar(aes (ymax=groupmeans,
ymin=groupmeans, colour=mean), lwd=2) +
scale_colour_manual(name="",values = "#ff0000")
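For the faceting variant raised in the question, a possible sketch is to keep V1 as a categorical x-axis and draw the group means with geom_hline, which spans the full panel width and gets a line-style legend key when colour is mapped inside aes(). The `means` data frame and the "Mean of group" label are assumptions introduced here for illustration.

```r
library(ggplot2)

df <- data.frame(V1 = rep(c("S1", "S2", "S3", "S4", "S5"), 2),
                 Age = c(rep(70, 5), rep(80, 5)),
                 num = c(5280, 6570, 5307, 4894, 4119, 3377, 4244, 2999, 2971, 2322),
                 denom = c(9984, 12600, 9425, 8206, 7227, 7290, 8808, 6386, 6206, 5227))
df$prop <- df$num / df$denom * 100

# One row per age group, with the pooled proportion as the group mean
means <- aggregate(cbind(num, denom) ~ Age, data = df, FUN = sum)
means$groupmean <- means$num / means$denom * 100

p <- ggplot(df, aes(x = V1, y = prop, fill = V1)) +
  geom_col(colour = "black") +
  # Mapping colour to a constant string creates a separate colour legend
  geom_hline(data = means, aes(yintercept = groupmean, colour = "Mean of group")) +
  facet_grid(. ~ Age) +
  scale_colour_manual(name = NULL, values = c("Mean of group" = "red"))
p
```

Because `means` contains the Age column, each facet receives only its own mean line, and the x-axis keeps the S1 to S5 labels without converting V1 to numbers.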

Standardize Color Range For Multiple Plots

I am plotting multiple dataframes, where the color of the line is dependent on a variable in the dataframe. The problem is that for each plot, R makes the color spectrum relative to the range of each plot.
I would like the range (and corresponding colors) to be kept constant for all of the dataframes I'm using. I won't know the range of numbers in advance, though they'll all be set before plotting. In addition, there will be hundreds of values, so a manual mapping is not feasible.
As of right now, I have:
library(ggplot2)
df1 <- as.data.frame(list('x'=1:5,'y'=1:5,'colors'=6:10))
df2 <- as.data.frame(list('x'=1:5,'y'=1:5,'colors'=8:12))
qplot(data=df1,x,y,geom='line', colour=colors)
qplot(data=df2,x,y,geom='line', colour=colors)
In the first plot the color range goes from 6 to 10; in the second plot it goes from 8 to 12.
I would like a constant range for both that goes from 6-12.
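A minimal sketch of one way to get a shared range: compute the limits across all data frames before plotting and pass them to the colour scale via its `limits` argument. The `shared_limits` name is an assumption, and ggplot() is used here in place of qplot().

```r
library(ggplot2)

df1 <- data.frame(x = 1:5, y = 1:5, colors = 6:10)
df2 <- data.frame(x = 1:5, y = 1:5, colors = 8:12)

# Range over every data frame that will be plotted
shared_limits <- range(df1$colors, df2$colors)

p1 <- ggplot(df1, aes(x, y, colour = colors)) +
  geom_line() +
  scale_colour_gradient(limits = shared_limits)

p2 <- ggplot(df2, aes(x, y, colour = colors)) +
  geom_line() +
  scale_colour_gradient(limits = shared_limits)
```

With many data frames, the same idea extends to something like `range(unlist(lapply(list_of_dfs, function(d) d$colors)))`, where `list_of_dfs` is a hypothetical list holding them.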

Histogram color fills with categorical variables in R

I am trying to create a plot like this:
qplot(carat, data = diamonds, geom = "histogram", fill = color)
However, instead of having a quantitative variable for the x-axis, I am using categorical data. I am using a data frame like this:
refBases=c("A","A","A","C","C","C","G","G","G","T","T","T")
altBases=c("C","G","T","A","G","T","A","C","T","A","C","G")
myDF=data.frame(ref=refBases, alt=altBases)
myDF$Freq=c(5,2,3,6,9,6,8,6,7,4,6,4)
So, basically, I would like my plot to look the same, except that the x-axis will be four bins from the ref column (A,C,G,T); the y-axis will be the Freq; and the color legend will be the four variables in the alt column (A,C,G,T). So, basically, there will be four ref bins on the x-axis, each divided into three parts along the y-axis, with the color legend indicating the alt value.
I get something rather silly when I try what I expect:
qplot(ref,Freq,data=myDF,fill=alt)
What you're describing doesn't sound like a histogram (which bins a continuous variable and plots the count in each bin); it sounds like you just want a bar chart. I believe this is what you're looking for:
myDF <- data.frame(
ref=c("A","A","A","C","C","C","G","G","G","T","T","T"),
alt=c("C","G","T","A","G","T","A","C","T","A","C","G"),
Freq=c(5,2,3,6,9,6,8,6,7,4,6,4)
)
library(ggplot2)
ggplot(myDF, aes(ref, Freq, fill=alt)) +
geom_bar(stat="identity", position="dodge")
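The question describes each ref bin as divided into parts along the y-axis, which reads like a stacked bar chart rather than a dodged one. As a hedged variant of the answer above, geom_col() stacks by default; the `totals` vector is only there to show the height each stacked bar should reach.

```r
library(ggplot2)

myDF <- data.frame(
  ref = c("A", "A", "A", "C", "C", "C", "G", "G", "G", "T", "T", "T"),
  alt = c("C", "G", "T", "A", "G", "T", "A", "C", "T", "A", "C", "G"),
  Freq = c(5, 2, 3, 6, 9, 6, 8, 6, 7, 4, 6, 4)
)

# geom_col() uses the y values as given and stacks the alt segments per ref
p <- ggplot(myDF, aes(x = ref, y = Freq, fill = alt)) +
  geom_col()
p

# Total height of each stacked bar, one entry per ref value
totals <- tapply(myDF$Freq, myDF$ref, sum)
```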

Resources