Please help: I want to shade a time-series plot in base R for all values where an indicator variable, z, equals 1.
Here is code that generates a scenario similar to the one I am looking at:
x <- runif(100, 5.0, 7.5)
y <- runif(100, 1, 10)
z <- as.numeric(y >= 5)
date <- seq(as.Date("1910/1/1"), as.Date("2009/1/1"), "years")
data <- data.frame(x, y, z)
color <- rgb(190, 190, 190, alpha = 80, maxColorValue = 255)
plot(date, x, type = 'l')
rect(xleft = date[10], xright = date[40], ybottom = 5, ytop = 7.5, col = color, density = 100)
With this code I can only specify the dates one range at a time. How can I shade all the areas where z == 1, i.e. all the dates where z == 1? Any ideas how this could be done?
Many thanks, Nic
Just feed an entire vector of dates into the xleft and xright arguments, indexed by z == 1. Don't use line shading (density), because it takes a long time to draw; use a solid grey fill instead. Afterwards, draw the time series again on top of the rectangles:
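plot(date, x, type='l')
# one rectangle per year with z == 1, extended 180 days each side so it covers roughly that year
rect(xleft=date[z==1]-180, xright=date[z==1]+180,
     ybottom=5, ytop=7.5, col="grey", border=NA)
# redraw the series so it sits on top of the rectangles
lines(date, x)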
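A variation on the same idea, not part of the original answer: rle() groups consecutive years with z == 1 into runs so each run becomes a single rectangle, and par("usr") makes the shading span the full plot height instead of the fixed 5 to 7.5. The set.seed() call and the 182-day padding are assumptions made for this sketch.

```r
# Toy data in the same shape as the question (seed only for reproducibility)
set.seed(1)
x <- runif(100, 5.0, 7.5)
y <- runif(100, 1, 10)
z <- as.numeric(y >= 5)
date <- seq(as.Date("1910/1/1"), as.Date("2009/1/1"), "years")

# Split z into runs of identical values; keep only the runs of 1s
r <- rle(z)
run_end <- cumsum(r$lengths)
run_start <- run_end - r$lengths + 1
keep <- r$values == 1
starts <- date[run_start[keep]]
ends <- date[run_end[keep]]

plot(date, x, type = "n")
u <- par("usr")  # u[3] and u[4] are the bottom and top of the plot region
# Pad each run by about half a year so its first and last years are covered
rect(xleft = starts - 182, xright = ends + 182,
     ybottom = u[3], ytop = u[4], col = "grey", border = NA)
lines(date, x)
```

Merging runs means a stretch of ten consecutive z == 1 years is drawn as one rectangle rather than ten abutting ones.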
Related
I have made a graph with the year on the x-axis and sea level rise on the y-axis.
I am trying to make the data from 2025 onward (predictions) a different colour from the data before 2025.
I have selected the predicted data using this code, and I have also included the code for my graph:
predictions=data[which(data$Year>2024 & data$Year<2121),]
plot(data$Sea.Level..cm.~data$Year,xlab="Year",ylab="Sea Level (cm)",pch=21,col=c("Blue"))
How do I go from here to make the predictions red and the earlier data blue?
Thanks in advance
You can try something like this (toy data):
vec <- c( rep(2001,10), rep(2002, 3) )
# 2 (blue) for years before 2002, 1 (red) otherwise
tf <- (vec < 2002) + 1
barplot( seq_along(vec), col=c("red","blue")[tf], names.arg=vec )
You can provide a vector of colours to plot in the col argument. For example, following your existing syntax:
point_cols <- rep("blue", nrow(data))
point_cols[which(data$Year>2024 & data$Year<2121)] <- "red"
plot(data$Sea.Level..cm.~data$Year,xlab="Year",ylab="Sea Level (cm)",pch=21,col=point_cols)
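The same colour-vector idea as a self-contained sketch using ifelse(). The data frame is hypothetical toy data that only borrows the column names from the question, and the legend() call is an addition.

```r
# Hypothetical toy data standing in for the question's data frame
data <- data.frame(
  Year = 2000:2040,
  Sea.Level..cm. = seq(0, 40, length.out = 41)
)

# Blue for observed years, red for predicted years (2025 to 2120)
is_pred <- data$Year > 2024 & data$Year < 2121
point_cols <- ifelse(is_pred, "red", "blue")

plot(Sea.Level..cm. ~ Year, data = data,
     xlab = "Year", ylab = "Sea Level (cm)", pch = 21, col = point_cols)
legend("topleft", legend = c("Observed", "Predicted"),
       pch = 21, col = c("blue", "red"))
```

Because col is matched to the points in row order, the colour vector must have one entry per row of data.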
I have a dummy variable (call it "drink") and a corresponding age variable that gives a precise age estimate (several decimal places) for each person in a dataset. I want to first "bin" the age variable, compute the mean of the "drink" dummy within each bin, and then graph the result. My code looks like this:
library(plyr)     # ddply
library(ggplot2)  # qplot
df$bins <- cut(df$age, seq(from = 17, to = 31, by = .2), include.lowest = TRUE)
df.plot <- ddply(df, .(bins), summarise, avg.drink = mean(drinks_alcohol))
qplot(bins, avg.drink, data = df.plot)
This works well enough, but the x-axis in the graph is unreadable because every bin gets its own label. Is there a way to modify the x-axis to show, for example, only ages 19 to 23, with the ticks still aligned with the correct bins? For example, my current code has a bin for (19, 19.2] and another for (20, 20.2]. I would like only the bins that start at whole numbers to be labelled on the x-axis, using the first number (19, 20) rather than the second (19.2, 20.2).
Is there any straightforward way to do this?
The most direct way to specify axis labels is with the appropriate scale function; for factors on the x axis, that is scale_x_discrete. It will use whatever labels you pass to its labels argument, or you can give it a function that formats the breaks as you like.
To "manually" specify the labels, you just need to create a vector of the appropriate length. In this case, if your factor levels are intervals beginning at seq(17, 31.8, by = 0.2) (that is, breaks running from 17 to 32) and you want to label only the bins that begin at integers, then your labels vector will be
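bin_starts = seq(17, 31.8, by = 0.2)
bin_labels = ifelse(abs(bin_starts - round(bin_starts)) < 0.0001, as.character(bin_starts), "")
(I compare against a small tolerance rather than testing for exact equality in case of floating-point precision problems, though it shouldn't be a problem in this particular case. Using round() instead of trunc() also catches values that land just below a whole number.)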
A more robust solution would be to label the factor levels with the number at the start of each interval from the beginning; cut also has a labels argument.
my_breaks = seq(17, 32, by = 0.2)
df$bins <- cut(df$age, breaks = my_breaks, labels = head(my_breaks, -1),
include.lowest = TRUE)
You could then fairly easily write a formatter (following templates from the scales package) to print only the ones you want:
int_only = function(x) {
  # test if we can coerce to numeric; if not, do nothing
  num = suppressWarnings(as.numeric(x))
  if (any(is.na(num))) return(x)
  # otherwise keep whole numbers as labels and blank out the rest
  return(ifelse(abs(num - round(num)) < 1e-10, as.character(num), ""))
}
Then, using the relabelled bins created above, you should be able to pass int_only as the labels argument of scale_x_discrete to get the labels you want. (Note: untested! Necessary tweaks are left as an exercise for the reader, though I'll gladly accept edits :) )
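To check the labelling without ggplot2, here is a base-R sketch under several assumptions: the ages and drinks_alcohol values are simulated (hypothetical data), tapply() stands in for plyr::ddply, and barplot()'s names.arg receives the whole-number labels in place of scale_x_discrete(labels = int_only). It is a stand-in, not the answer's tested code.

```r
# Hypothetical toy data: ages between 17 and 32 and a 0/1 drinking dummy
set.seed(42)
df <- data.frame(age = runif(2000, 17, 32))
df$drinks_alcohol <- rbinom(2000, 1, plogis(df$age - 21))

# Label each bin by its left endpoint, as in the answer
my_breaks <- seq(17, 32, by = 0.2)
df$bins <- cut(df$age, breaks = my_breaks, labels = head(my_breaks, -1),
               include.lowest = TRUE)

# Mean of the dummy in each bin (tapply replaces plyr::ddply here)
avg_drink <- tapply(df$drinks_alcohol, df$bins, mean)

# A version of int_only: keep whole-number labels, blank out the rest
int_only <- function(x) {
  num <- suppressWarnings(as.numeric(x))
  if (anyNA(num)) return(x)
  ifelse(abs(num - round(num)) < 1e-8, as.character(num), "")
}

barplot(avg_drink, names.arg = int_only(names(avg_drink)), las = 2,
        xlab = "Age bin (start)", ylab = "Mean of drinks_alcohol")
```

Every bin keeps its own bar, so the tick positions stay aligned with the bins; only the text of the non-integer labels is blanked.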
I would like to colour a line plot differently in each of the x-axis intervals between these points:
52660, 106784, 151429, 192098, 233666, 273857, 307933, 343048, 373099, 408960, 441545, 472813, 497822, 518561, 537471, 556747, 571683, 591232, 599519, 616567, 625727, 633745
The intervals represent SNP positions along 22 chromosomes.
The problem is that the intervals are unequal (e.g. 52660 - 106784, 106784 - 151429, ... 472813 - 497822, ...). The y-axis values are ancestry frequencies, and the x-axis variable is SNP_position.
The closest I have found is using "ifelse", but for some reason it doesn't work well for me.
For instance, for the first interval (0 - 52660) I passed a col argument to plot and tried:
col = ifelse(SNP_position < 52660,'blue', 'green')
or
col=ifelse(SNP_position < 52660 & SNP_position > 106784,"blue","green")
but when I do this the whole line becomes green.
Here is the plot I want to colour
Any help would be highly appreciated.
Here's a proof of concept using segments(). The first step is to split the boundaries into alternating start and end points, using the odd and even positions of the vector. Here the line is simply y = x, so the y coordinates equal the x coordinates; you will have to plug in your actual y-axis data.
x <- 1:700000
bounds <- c(52660, 106784, 151429, 192098, 233666, 273857, 307933, 343048, 373099, 408960, 441545, 472813, 497822, 518561, 537471, 556747, 571683, 591232, 599519, 616567, 625727, 633745)
# odd positions start a blue stretch, even positions end it
stOdds <- bounds[seq_along(bounds) %% 2 == 1]
stEvens <- bounds[seq_along(bounds) %% 2 == 0]
plot(x, type="l", col="green", lwd=2)
# because the toy line is y = x, the segment y values are the same as the x values
segments(stOdds, stOdds, stEvens, stEvens, col="blue", lwd=2)
UPDATE
With the additional info, here's how to do it with cut and lines.
# create data (x runs from 1000 to 700000 in steps of 1000)
x <- (1:700) * 1000
y <- runif(700)
z <- data.frame(x, y)
# interval index of each x; NA (set to 0) outside the boundaries
my_segments <- c(52660, 106784, 151429, 192098, 233666, 273857, 307933, 343048, 373099, 408960, 441545, 472813, 497822, 518561, 537471, 556747, 571683, 591232, 599519, 616567, 625727, 633745)
my_cuts <- cut(x, my_segments, labels = FALSE)
my_cuts[is.na(my_cuts)] <- 0
# copy of the data with every even-numbered interval blanked out
z_alt <- z
z_alt[my_cuts %% 2 == 0, ] <- NA
# plot everything in green, then draw the odd intervals in blue on top
plot(z, type="l", col="green", lwd=2)
lines(z_alt, col="blue", lwd=2)
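Two things explain the ifelse() attempt in the question: with type = "l", plot() draws the whole line in the first colour of col (per ?plot.default, multiple colours apply to points, not lines), and SNP_position < 52660 & SNP_position > 106784 can never be TRUE. One way around the first limitation is to draw the line as many short segments() pieces, each with its own colour. This sketch uses simulated positions and frequencies (hypothetical data) and swaps in findInterval() for cut() to number the chromosomes.

```r
# Hypothetical toy data standing in for SNP positions and ancestry frequencies
set.seed(7)
SNP_position <- sort(sample(1:640000, 2000))
freq <- runif(length(SNP_position))

# Chromosome boundaries from the question
bounds <- c(52660, 106784, 151429, 192098, 233666, 273857, 307933, 343048,
            373099, 408960, 441545, 472813, 497822, 518561, 537471, 556747,
            571683, 591232, 599519, 616567, 625727, 633745)

# findInterval() returns 0 before the first boundary, 1 up to the second, ...
chrom <- findInterval(SNP_position, bounds)
point_cols <- ifelse(chrom %% 2 == 0, "green", "blue")

# Draw each piece of the line separately, coloured by its left-hand point
n <- length(SNP_position)
plot(SNP_position, freq, type = "n",
     xlab = "SNP_position", ylab = "Ancestry frequency")
segments(SNP_position[-n], freq[-n], SNP_position[-1], freq[-1],
         col = point_cols[-n], lwd = 2)
```

Unlike the NA-blanking approach, this needs no second copy of the data, and each piece is coloured by its own interval even when the intervals are very unequal.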
library(gplots)
shades= c(seq(-1,0.8,length=64),seq(0.8,1.2,length=64),seq(1.2,3,length=64))
heatmap.2(cor_mat, dendrogram='none', Rowv=FALSE, Colv=FALSE, col=redblue(64),
breaks=shades, key=TRUE, cexCol=0.7, cexRow=1, keysize=1)
There seems to be a problem with breaks, and I would appreciate help with it.
After running the code I get this error message:
Error in image.default(1:nc, 1:nr, x, xlim = 0.5 + c(0, nc), ylim = 0.5 + : must have one more break than colour
Thank you for your time and consideration.
Well, we don't have cor_mat so we can't try this ourselves, but the problem seems to be exactly what the error message says. The way heatmap.2 (and generally all functions based on image) works with breaks and a vector of colours is that the breaks define the points at which the colour changes as the values in your data matrix change. In short, if breaks = c(1, 2, 3) and col = c("red", "blue"):
values < 1 will be transparent
values >= 1, <= 2 will be plotted as red
values > 2, <= 3 will be plotted as blue
values > 3 will be transparent
What's going on in your code is that with shades you've supplied a vector of length 3*64 = 192 to breaks, while redblue(64) only gives you 64 colours; 192 breaks need 191 colours. Try replacing redblue(64) with, say, redblue(3*64-1).
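The rule can be checked in base R: build the breaks, then request exactly length(breaks) - 1 colours. In this sketch colorRampPalette() provides a similar red-white-blue ramp in place of gplots::redblue(), and a random matrix stands in for cor_mat (both assumptions). unique() drops the values 0.8 and 1.2 that appear twice where the three sequences meet; that is a precaution, not something the error message requires.

```r
# Same three-part break vector as the question
shades <- c(seq(-1, 0.8, length.out = 64), seq(0.8, 1.2, length.out = 64),
            seq(1.2, 3, length.out = 64))
# Drop the repeated endpoints where the pieces meet
shades <- unique(shades)

# One colour fewer than breaks; colorRampPalette stands in for gplots::redblue
cols <- colorRampPalette(c("red", "white", "blue"))(length(shades) - 1)

# Hypothetical matrix in place of cor_mat
set.seed(3)
m <- matrix(runif(100, -1, 3), nrow = 10)
image(m, col = cols, breaks = shades)
```

Deriving the colour count from length(shades) keeps the two in step if the break vector is edited later.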
I have a chart of financial activity and a couple of running sums. Things are getting a little busy and I'm having trouble distinguishing fiscal years (ending June 30) from calendar years. Is there a way to set the background to different colours based on date?
In other words, could I set the background to light green where 2009-06-30 < date < 2010-07-01?
This combines parts of the suggestions by @G-Grothendieck and @vincent: use rect together with the zoo package, which works well for plotting time series.
library(zoo)
# random monthly series starting in January 2009
v <- zooreg(rnorm(37), start = as.yearmon("2009-1"), freq=12)
plot(v, type = "n",xlab="",xaxt="n")
# par("usr") holds the plot region limits; u[3] and u[4] are the bottom and top of the y range
u <- par("usr")
#plot green rect - notice that x-coordinates are defined by date points
rect(as.yearmon("2009-6-30"), u[3], as.yearmon("2010-7-1"), u[4],
  border = NA, col = "lightgreen")
lines(v)
axis(1, floor(time(v)))
#customized x-axis labels based on dates values
axis(1,at=c(2009.4, 2010.5),padj=-2,lty=0,labels=c("start","end"),cex.axis=0.8)
Check out xblocks.zoo in the zoo package, e.g. run example(xblocks.zoo).
You can plot grey rectangles, with rect, before plotting the curves.
You will also need the dimensions of the plotting area: they are in par("usr").
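library(quantmod)
getSymbols("A")
# year() is not part of base R or quantmod (it comes from packages such as lubridate),
# so this hypothetical helper extracts the year with format() instead
yr <- function(d) as.numeric(format(d, "%Y"))
plot( index(A), coredata(Ad(A)), type="n" )
# This example uses calendar years: adapt as needed
dates <- c(
  ISOdate( yr(min(index(A))), 1, 1 ),
  ISOdate( yr(max(index(A))) + 1, 1, 1 )
)
dates <- as.Date(dates)
# shade every other calendar year
dates <- seq.Date(dates[1], dates[2], by="2 year")
rect(
  dates,
  par("usr")[3],
  as.Date( ISOdate( yr(dates) + 1, 1, 1 ) ),
  par("usr")[4],
  col="grey",
  border=NA
)
lines(index(A), coredata(Ad(A)), lwd=3)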
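For the fiscal-year question itself, the same rect() and par("usr") approach works with plain Date coordinates and no extra packages. This sketch uses a simulated daily series (hypothetical data), assumes fiscal years run from July 1 to June 30, and shades every other one, including the 2009-07-01 to 2010-07-01 year from the question.

```r
# Hypothetical daily series; replace with your own dates and values
set.seed(11)
d <- seq(as.Date("2008-01-01"), as.Date("2012-12-31"), by = "day")
v <- cumsum(rnorm(length(d)))

plot(d, v, type = "n", xlab = "", ylab = "Value")
u <- par("usr")  # u[3] and u[4]: bottom and top of the plot region

# Fiscal years start on July 1; shade every other one
fy_start <- seq(as.Date("2007-07-01"), as.Date("2012-07-01"), by = "year")
fy_end <- seq(as.Date("2008-07-01"), as.Date("2013-07-01"), by = "year")
shade <- seq_along(fy_start) %% 2 == 1
rect(fy_start[shade], u[3], fy_end[shade], u[4],
     col = "lightgreen", border = NA)

lines(d, v)
```

Rectangles that start before or end after the data range are simply clipped to the plot region.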