Standardize Color Range For Multiple Plots - r

I am plotting multiple dataframes, where the color of the line is dependent on a variable in the dataframe. The problem is that for each plot, R makes the color spectrum relative to the range of each plot.
I would like the range (and corresponding colors) to be kept constant for all of the dataframes I'm using. I won't know the range of numbers in advance, though they'll all be set before plotting. In addition, there will be hundreds of values, so a manual mapping is not feasible.
As of right now, I have:
library(ggplot2)
df1 <- as.data.frame(list('x'=1:5,'y'=1:5,'colors'=6:10))
df2 <- as.data.frame(list('x'=1:5,'y'=1:5,'colors'=8:12))
qplot(data=df1,x,y,geom='line', colour=colors)
qplot(data=df2,x,y,geom='line', colour=colors)
The first plot produces a color range that goes from 6 to 10.
The second plot produces a color range that goes from 8 to 12.
I would like a constant range for both that goes from 6-12.
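One way this could be approached, as a minimal sketch: compute the overall range across all dataframes once they are set, then pass it as `limits` to a continuous colour scale on every plot. The name `shared_limits` is an assumption introduced for illustration, and `ggplot()` is used in place of `qplot()`.

```r
library(ggplot2)

df1 <- data.frame(x = 1:5, y = 1:5, colors = 6:10)
df2 <- data.frame(x = 1:5, y = 1:5, colors = 8:12)

# Compute one range across every data frame before plotting
# (shared_limits is a hypothetical name, not from the original post).
shared_limits <- range(df1$colors, df2$colors)

# Give each plot the same limits so a value maps to the same colour everywhere.
p1 <- ggplot(df1, aes(x, y, colour = colors)) +
  geom_line() +
  scale_colour_gradient(limits = shared_limits)

p2 <- ggplot(df2, aes(x, y, colour = colors)) +
  geom_line() +
  scale_colour_gradient(limits = shared_limits)
```

With more dataframes, the same idea should carry over by passing all of the colour columns to `range()`, so no manual mapping of individual values is needed.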

Related

Different colours in geomline and geomplot from same vector

I am currently trying to plot some data with dots and lines. My dataframe has its own column (FarbDots) in which I specify the colours I want. When I try to plot the data, geom_point() takes the colours in the wanted order, while geom_line() creates a total mess (see image).
I was not able to recreate the same effect in a sample data set. Any idea on how to get my colours in order while still specifying them within the geom_line()/ geom_point()?
This is the code I used for plotting: (with b specifying the dataset, x, y, and groups)
b +
geom_line(colour=Data_Biol_long$FarbDots)+
geom_point(colour=Data_Biol_long$FarbDots)+
scale_y_log10()+
facet_grid(Analysis~., scales='free')
dots and lines should receive colour from same vector?!
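A possible explanation, offered as a guess since the original data is not available: a colour vector passed outside aes() is matched to rows by position, and geom_line() reorders its data along x within each group, so the positions may no longer line up. A minimal sketch of an alternative is to map the colour column inside aes() and use scale_colour_identity() so the stored colour names are used as-is. The dataframe `d` and its columns `grp`, `x`, and `y` are hypothetical stand-ins for the poster's data.

```r
library(ggplot2)

# Hypothetical data: two groups, each with its own colour stored in FarbDots.
d <- data.frame(
  x = rep(1:3, times = 2),
  y = c(1, 3, 2, 2, 4, 3),
  grp = rep(c("a", "b"), each = 3),
  FarbDots = rep(c("red", "blue"), each = 3),
  stringsAsFactors = FALSE
)

# Mapping the column inside aes() keeps each colour attached to its row,
# and scale_colour_identity() tells ggplot2 to use the values literally.
p <- ggplot(d, aes(x, y, group = grp, colour = FarbDots)) +
  geom_line() +
  geom_point() +
  scale_colour_identity()
```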

specify order of variables in position dodge

I honestly don't know why this is being so hard.
I'm creating a simple scatter plot. The x axis is a continuous variable, and at every tick in x I need to plot four points with error bars. I'm using position dodge and everything works fine.
Each point has a different color, size and shape as governed by three further variables: color and shape are governed by factors, size by a continuous variable.
By default, the four points reflect the order of the levels in the color variable (red always left, then green, then blue) but I would like them to reflect the order of the size variable (the continuous one), smallest left and largest right. How do I specify that size should be prioritised when ordering points in position dodge? I tried using reverse ordering but then the points are ordered first according to the shape legend.
I could change the mapping between variable and aesthetics (all variables are fundamentally continuous and could be used with size) but I think it'd be useful to know how to specify the order in which multiple variables should be considered when dodging points.
The question is somewhat unclear unfortunately. You don't show "a simple scatter plot". You are showing some statistics (mean with error band??) for specific x values - although this is seemingly continuous, this looks as if you have categorised it beforehand - resulting in some summary statistics which you are plotting.
Also, it is not easy (impossible) to fully help you without knowing what you have done until now to come to where you are.
I have tried to reproduce a similar looking plot with mtcars.
Dodging is only possible by one group (but one group can contain more than one variable). To specify how to group, add group = ... to your aesthetics.
Like so:
library(tidyverse)
ggplot(filter(mtcars, carb %in% 1:4)) +
geom_point(aes(carb, mpg, size= gear, group = gear, shape = as.character(vs), color = as.factor(cyl)),
position = position_dodge(width = .5))
This is now dodged by gear, which is also used as size aesthetic.

How can I remove space/gaps between continuous x-values in geom_raster

I am working with some time-frequency decomposed EEG data and want to produce a spectrogram-like figure using ggplot2. But, I end up with blank spaces between each of my time points.
Data <- read.csv(url("https://www.dropbox.com/s/al3cygigm86mr3s/Test_Spec_Data.csv?dl=0"))
If I create a vanilla geom_raster I get gaps in the x and y data:
ggplot(Data,aes(Times,Frequency)) +
geom_raster(aes(fill = ERSP))
If I make Frequency a factor, it fills in the y gap; but, the gaps along the x-axis remain:
ggplot(Data,aes(Times,factor(round(Frequency,digits=1)))) +
geom_raster(aes(fill = ERSP))
I can eliminate the gaps by making Times a factor.
But, managing scale_x_discrete with this many data points is cumbersome (note the x-axis labels). Also, these time data are continuous and not really factor-like.
geom_raster doesn't have a width argument like geom_bar and I can't see anything similar in the geom_raster documentation.
Is there a way to keep Times as continuous but remove the gaps between observations?
There are gaps because there are not enough data (or more precisely, they are not evenly spaced).
Your "factor" transformation removes the gaps because it removes the parts of the X or Y axis where data are missing. Look at the Y axis: the ticks are evenly spaced but the values are not (8.5-8.1=0.4, while 11.3-10.7=0.6, which is where you have your biggest gap).
I can see two solutions:
Interpolate data so your source data are evenly spaced
Use geom_tile instead of geom_raster and specify width and height parameters to "expand" your tiles and fill the gaps, as explained in the fourth example of the doc.
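The second option could be sketched as follows. The grid below is made-up data standing in for the EEG file, and `tile_w` and `tile_h` are hypothetical names; using the largest gap on each axis as the tile size is an assumption chosen so that neighbouring tiles touch or overlap.

```r
library(ggplot2)

# Hypothetical, unevenly spaced grid standing in for the EEG data.
times <- c(0, 4, 8, 12, 16)
freqs <- c(8.1, 8.5, 9.0, 9.6, 10.7, 11.3)
grid <- expand.grid(Times = times, Frequency = freqs)
grid$ERSP <- sin(grid$Times / 4) + grid$Frequency / 10

# Use the largest gap on each axis as the tile size so no blank space remains.
tile_w <- max(diff(sort(unique(grid$Times))))
tile_h <- max(diff(sort(unique(grid$Frequency))))

p <- ggplot(grid, aes(Times, Frequency)) +
  geom_tile(aes(fill = ERSP), width = tile_w, height = tile_h)
```

Where the spacing is smaller than the largest gap, tiles will overlap, so the drawn extent of each value is only approximate; interpolating to an even grid avoids that trade-off.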

How to highlight certain days on a timeseries in ggplot2?

dateVec <- as.Date(c("08-01-2015","08-02-2015","08-03-2015","08-04-2015","08-05-2015"),format="%m-%d-%Y")
myData <- data.frame(dat=c(.1,.2,-.1,1,.1),
dates=dateVec,
indicator=c(0,0,0,1,0))
ggplot(myData,aes(x=dates,y=dat)) + geom_point()
I manually altered the plot here to shade the area around the datapoint with the highest value, where 'indicator' = 1.
How could I create this shading in ggplot automatically? Ideally I'd like the shaded area to have width, even though the x value is categorical. I've played with coloring the geom_point objects themselves according to the indicator, and while that works it doesn't really pop visually the way I would like it to.
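One possible approach, shown as a minimal sketch: build a small dataframe with one rectangle per flagged day and draw it with geom_rect() underneath the points. The half-day padding on each side and the names `flagged` and `shade` are assumptions made for illustration.

```r
library(ggplot2)

dateVec <- as.Date(c("08-01-2015", "08-02-2015", "08-03-2015",
                     "08-04-2015", "08-05-2015"), format = "%m-%d-%Y")
myData <- data.frame(dat = c(.1, .2, -.1, 1, .1),
                     dates = dateVec,
                     indicator = c(0, 0, 0, 1, 0))

# One rectangle per flagged day, padded by half a day on each side (assumed width).
flagged <- subset(myData, indicator == 1)
shade <- data.frame(xmin = flagged$dates - 0.5,
                    xmax = flagged$dates + 0.5)

# Draw the shading first so the points sit on top of it;
# -Inf/Inf stretch each rectangle over the full height of the panel.
p <- ggplot(myData, aes(x = dates, y = dat)) +
  geom_rect(data = shade,
            aes(xmin = xmin, xmax = xmax, ymin = -Inf, ymax = Inf),
            inherit.aes = FALSE, fill = "grey70", alpha = 0.5) +
  geom_point()
```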

adding text to ggplot geom_jitter points that match a condition

How can I add text to points rendered with geom_jitter to label them? geom_text will not work because I don't know the coordinates of the jittered dots. Could you capture the position of the jittered points so I can pass it to geom_text?
My practical usage would be to plot a boxplot with the geom_jitter over it to show the data distribution and I would like to label the outliers dots or the ones that match certain condition (for example the lower 10% for the values used for color the plots).
One solution would be to capture the xy positions of the jittered plots and use it later in another layer, is that possible?
[update]
From Joran's answer, a solution is to calculate the jittered values with the jitter function from the base package, add them to a data frame, and use them with geom_point. For filtering, he used ddply to create a filter column (a logical vector) and used it to subset the data in geom_text.
He asked for a minimal dataset. I just modified his example (adding a unique identifier in the label column):
dat <- data.frame(x=rep(letters[1:3],times=100),y=runif(300),
lab=paste('id_',1:300,sep=''))
This is the result of Joran's example with my data, with the displayed ids restricted to the lowest 1%.
And this is a modification of the code that colors the points by another variable and displays some values of that variable (the lowest 1% for each group):
library("ggplot2")
library("plyr")  # provides ddply
#Create some example data
dat <- data.frame(x=rep(letters[1:3],times=100),y=runif(300),
lab=paste('id_',1:300,sep=''),quality= rnorm(300))
#Create a copy of the data and a jittered version of the x variable
datJit <- dat
datJit$xj <- jitter(as.numeric(factor(dat$x)))
#Create indicator variables that pick out, within each level of x,
# the obs in the lowest 1% of y and the lowest 1% of quality
datJit <- ddply(datJit,.(x),.fun=function(g){
g$grp <- g$y <= quantile(g$y,0.01);
g$top_q <- g$quality <= quantile(g$quality,0.01);
g})
#Create a boxplot, overlay the jittered points and
# label the bottom 1% points
ggplot(dat,aes(x=x,y=y)) +
geom_boxplot() +
geom_point(data=datJit,aes(x=xj,colour=quality)) +
geom_text(data=subset(datJit,grp),aes(x=xj,label=lab)) +
geom_text(data=subset(datJit,top_q),aes(x=xj,label=sprintf("%0.2f",quality)))
Your question isn't completely clear; for example, you mention labeling points at one point but also mention coloring points, so I'm not sure which you really mean, or perhaps both. A reproducible example would be very helpful. But using a little guesswork on my part, the following code does what I think you're describing:
library(ggplot2)
library(plyr)  # provides ddply
#Create some example data
dat <- data.frame(x=rep(letters[1:3],times=100),y=runif(300),
lab=rep('label',300))
#Create a copy of the data and a jittered version of the x variable
datJit <- dat
datJit$xj <- jitter(as.numeric(factor(dat$x)))
#Create an indicator variable that picks out, within each level of x,
# those obs that are in the lowest 10% of y
datJit <- ddply(datJit,.(x),.fun=function(g){
g$grp <- g$y <= quantile(g$y,0.1); g})
#Create a boxplot, overlay the jittered points and
# label the bottom 10% points
ggplot(dat,aes(x=x,y=y)) +
geom_boxplot() +
geom_point(data=datJit,aes(x=xj)) +
geom_text(data=subset(datJit,grp),aes(x=xj,label=lab))
Just an addition to Joran's wonderful solution:
I ran into trouble with the x-axis positioning when I tried to use it in a faceted plot with facet_wrap(). The problem is that ggplot2 uses 1 as the x-value on every facet. The solution is to create a vector of jittered 1s:
datJit$xj <- jitter(rep(1,length(dat$x)),amount=0.1)
