I have two vectors (in a data frame) that I want to plot with plot(df$timeStamp, df$value), which works nicely by itself. However, the x-axis tick labels show the timestamps as plain numbers.
When I format the timestamps into a vector of "hh:mm:ss" strings, plot() complains (which makes sense, since the x-axis data is now a character vector).
Is there a way to say plot(x-vector, y-vector, label-x-vector) where the label-x-vector contains the elements to display along the x-axis?
The last part of your general question is done in two commands rather than one. If you look at ?plot.default (linked from ?plot) you'll see an option to leave off the x-axis altogether using the xaxt argument (xaxt = 'n'). Do that and then use axis to make the x-axis what you want (check ?axis). I don't know what format your timestamp is currently in, so it's hard to help further.
In general it's:
plot(x, y, xaxt = 'n')             # draw the plot without an x-axis
axis(1, at = x, labels = xlabels)  # side 1 = bottom: put the strings in xlabels at positions x
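As a concrete sketch of those two steps, suppose (an assumption, since the question doesn't say) that timeStamp holds numeric seconds since 1970-01-01 UTC. The data frame below is hypothetical; format() turns the numbers into "hh:mm:ss" strings that are handed to axis() as labels, while the tick positions stay numeric, so plot() never sees strings.

```r
# Draw to a null device so the sketch runs without a display
pdf(NULL)

# Hypothetical data: numeric timestamps (assumed seconds since 1970-01-01 UTC), 10 minutes apart
df <- data.frame(timeStamp = 1325030400 + seq(0, by = 600, length.out = 6),
                 value     = c(3, 5, 2, 8, 6, 4))

# Step 1: plot without the default x-axis
plot(df$timeStamp, df$value, xaxt = "n", xlab = "time", ylab = "value")

# Step 2: build "hh:mm:ss" labels and draw them at the original numeric positions
lbls <- format(as.POSIXct(df$timeStamp, origin = "1970-01-01", tz = "UTC"), "%H:%M:%S")
axis(1, at = df$timeStamp, labels = lbls)

invisible(dev.off())
```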
(The help for plotting may be just about the messiest part of R-help, but once you get used to looking at plot.default, axis, and par you'll start getting a better handle on things.)
The standard R plots are pretty good at doing what you want if you give them the correct information. If you can convert your timestamps to actual time objects (Date or POSIXct objects) then plot will tend to do the correct thing. Try the following examples:
tmp <- as.POSIXct( seq(0, length.out=10, by=60*5), origin='2011-12-28' )     # 5-minute steps
tmp
plot( tmp, runif(10) )
tmp2 <- as.POSIXct( seq(0, length.out=10, by=60*60*5), origin='2011-12-28' ) # 5-hour steps
tmp2
plot( tmp2, runif(10) )
tmp3 <- as.POSIXct( seq(0, length.out=10, by=60*60/2), origin='2011-12-28' ) # 30-minute steps
tmp3
plot( tmp3, runif(10) )
In each case the tick labels are pretty meaningful, but if you would like a different format you can follow John's example: suppress the default axis, then use axis.POSIXct and specify the format you want.
The examples use equally spaced times (due to my laziness), but will work equally well for unequally spaced times.
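A minimal sketch of that combination, reusing the 5-minute example: xaxt = "n" suppresses the default axis and axis.POSIXct draws ticks at the supplied times with an explicit format. The tz = "UTC" argument and the choice of labelling every other point are assumptions added only for illustration, so the labels don't depend on the local time zone.

```r
pdf(NULL)
set.seed(1)
tmp <- as.POSIXct(seq(0, length.out = 10, by = 60 * 5), origin = "2011-12-28", tz = "UTC")
plot(tmp, runif(10), xaxt = "n", xlab = "time", ylab = "value")

# Ticks at every other time point, labelled as hours:minutes only
ticks <- tmp[c(TRUE, FALSE)]
axis.POSIXct(1, at = ticks, format = "%H:%M")
invisible(dev.off())
```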
Related
I apologize if this is a messy question, but I don't know a better way to format it. I'm trying to get all of the dates in my dataset shown on the x-axis of my graph. Right now only about half of them are shown (I'll attach a picture of my graph). I would also like more tick marks between each date. I already combined my date and time columns into a single date-time column, so that's not an issue. I was wondering if there is a way to manipulate the axis tick marks even though these aren't typical numerical values. My code is below.
PS4T1$newdate <- with(PS4T1, as.POSIXct(paste(date, time), format="%m-%d-%Y %H:%M"))
plot(average ~ newdate, data=PS4T1, type="b", col="blue")
First, convert the date with as.POSIXct; this is important because it determines which plot method is called. Apparently you have already done that.
dat <- transform(dat, date=as.POSIXct(date))
Then select the time points where the hour is, e.g., '00' (midnight). Next, plot without the x-axis and build a custom axis using axis and mtext.
st <- format(dat$date, '%H') == '00'  # TRUE for midnight time points
plot(dat, type='b', col='blue', xaxt='n')
axis(1, dat$date[st], labels=FALSE)   # tick marks only; labels are added by mtext
mtext(strftime(dat$date[st], '%b %d'), side=1, line=1, at=dat$date[st])
Data:
set.seed(42)
dat <- data.frame(
date=format(seq.POSIXt(as.POSIXct('2021-06-22'), as.POSIXct('2021-06-29'), 'hour'), '%Y-%m-%d %H:%M:%S'),
v=runif(169)
)
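The question also asked for extra tick marks between the dates. One way to do that, sketched here as an extension rather than as part of the answer above, is a second axis() call with labels = FALSE and a shorter tick length (tcl) at, say, every 6 hours. The data are rebuilt directly as POSIXct in UTC; that is an assumption made to keep the example self-contained and free of daylight-saving surprises.

```r
pdf(NULL)
set.seed(42)
dat <- data.frame(
  date = seq(as.POSIXct("2021-06-22", tz = "UTC"), as.POSIXct("2021-06-29", tz = "UTC"), by = "hour"),
  v    = runif(169)
)
hr    <- as.integer(format(dat$date, "%H"))
major <- dat$date[hr == 0]                 # midnights get labelled ticks
minor <- dat$date[hr %% 6 == 0 & hr != 0]  # 06:00, 12:00, 18:00 get small unlabelled ticks

plot(dat, type = "b", col = "blue", xaxt = "n")
axis(1, at = major, labels = format(major, "%b %d"))
axis(1, at = minor, labels = FALSE, tcl = -0.25)  # tcl is the tick length; the default is -0.5
invisible(dev.off())
```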
Using the plot() function in R, I'm trying to produce a scatterplot of points of the form (SaleDate, SalePrice) = (saldt, salpr) from a time-series, cross-section real estate sales dataset in data frame format. My problem concerns the labels on the x-axis. Just about any series of annual labels would be adequate, e.g. 1999, 2000, ..., 2013 or 1999-01-01, ..., 2013-01-01. What I'm getting now, a single label (2000) at what appears to be the proper location, won't work.
The following is my call to plot():
plot(r12rgr0$saldt, r12rgr0$salpr/1000, type="p", pch=20, col="blue", cex.axis=.75,
xlim=c(as.Date("1999-01-01"),as.Date("2014-01-01")),
ylim=c(100,650),
main="Heritage Square Sales Prices $000s 1990-2014",xlab="Sale Date",ylab="$000s")
The xlim and ylim are called out to bound the date and price ranges of the data to be plotted; note prices are plotted as $000s. r12rgr0$saldt really is a date; str(r12rgr0$saldt) returns:
Date[1:4190], format: "1999-10-26" "2013-07-06" "2003-08-25" NA NA "2000-05-24" xx
I have reviewed several threads here about similar questions and see that the solution probably lies in turning off the default x-axis and using axis.Date, but (i) at my current level of R skill I'm not sure I'd be able to solve the problem, and (ii) I wonder why the plotting defaults produce these rather puzzling (to me, at least) results.
Additional observations: the y-axis labels are just fine (100, 200, ..., 600). The general appearance of the scatterplot indicates that the requested date ranges are being observed and that the relative positions of the plotted points are correct. Replacing xlim=... above with xlim=c("1999-01-01","2014-01-01")
or
xlim=c(as.numeric(as.character("1999-01-01")),as.numeric(as.character("2014-01-01")))
or
xlim=c(as.POSIXct("1999-01-01", format="%Y-%m-%d"),as.POSIXct("2014-01-01", format="%Y-%m-%d"))
all result in error messages.
With plots it's very hard to reproduce results without sample data. Here's a sample I'll use:
dd<-data.frame(
saldt=seq(as.Date("1999-01-01"), as.Date("2014-01-10"), by="6 mon"),
salpr = cumsum(rnorm(31))
)
A simple plot with
with(dd, plot(saldt, salpr))
produces a few year marks
If I wanted more control, I could use axis.Date as you alluded to:
with(dd, plot(saldt, salpr, xaxt="n"))
axis.Date(1, at=seq(min(dd$saldt), max(dd$saldt), by="30 mon"), format="%m-%Y")
which gives
Note that xlim only zooms the plot to a given range. It is not directly connected to the axis labels; the axis labels adjust to provide a "pretty" range covering the data that is plotted. Doing just
xlim=c(as.Date("1999-01-01"),as.Date("2014-01-01"))
is the correct way to zoom the plot. No need for conversion to numeric or POSIXct.
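To get the yearly labels the question asked for, the two ideas above can be combined: keep the Date-valued xlim for zooming, suppress the default axis, and pass an explicit yearly sequence to axis.Date. This sketch reuses the dd sample data; set.seed is added only for reproducibility, and R may still skip some labels if they would overlap (a smaller cex.axis helps).

```r
pdf(NULL)
set.seed(1)
dd <- data.frame(
  saldt = seq(as.Date("1999-01-01"), as.Date("2014-01-10"), by = "6 mon"),
  salpr = cumsum(rnorm(31))
)
lims  <- as.Date(c("1999-01-01", "2014-01-01"))
years <- seq(lims[1], lims[2], by = "year")   # 1 January of every year in the range

with(dd, plot(saldt, salpr, xlim = lims, xaxt = "n", cex.axis = 0.75))
axis.Date(1, at = years, format = "%Y", cex.axis = 0.75)
invisible(dev.off())
```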
If you are running a plot in real time and don't mind some warnings, you can just pass, e.g., format = "%Y-%m-%d" in the plot function. For instance:
plot(seq((Sys.Date()-9),Sys.Date(), 1), runif(10), xlab = "Date", ylab = "Random")
yields:
while:
plot(seq((Sys.Date()-9), Sys.Date(), 1), runif(10), format = "%Y-%m-%d", xlab = "Date", ylab = "Random")
yields:
with lots of warnings about format not being a graphical parameter.
I want to break the x-axis of a plot of an empirical cumulative distribution function, which I draw with plot.stepfun, but I can't figure out how.
Here's some example data:
set.seed(1)
x <- sample(seq(1,20,0.01),300,replace=TRUE)
Then I use the function ecdf to get the empirical cumulative distribution function of x:
x.cdf <- ecdf(x)
And I change the class of x.cdf to stepfun, because I prefer to call plot.stepfun directly rather than plot.ecdf (which also uses plot.stepfun but offers fewer ways to customize the plot).
class(x.cdf) <- "stepfun"
Then I am able to create a plot as follows:
plot(x.cdf, do.points=FALSE)
But now I want to break the x-axis between 12 and 20, e.g. using axis.break from the plotrix package, such as here. Since plot.stepfun does not take ordinary x and y arguments, I don't know how to do this.
Any help would be very much appreciated!
"Breaking the axis between 12 and 20" doesn't make a lot of sense to me, since 20 is the end of the x range, so I will illustrate breaking it between 12 and 15. The plotrix axis.break function doesn't actually do very much (as you can see if you step through that example): all it does is draw a couple of slashes at a particular location, the "breakpos". The rest of the work has to be done with regular plotting functions, and plot.stepfun isn't really set up for that, so I'm using plain plot.default with the type="s" argument. You have to handle the offset consistently yourself: shift the plotted x values and the tick positions, but pass unshifted values to the ecdf function and use unshifted numbers as the axis labels.
library(plotrix)  # provides axis.break()
png()             # writes Rplot001.png to the working directory
plot( c(seq(1,12,0.1), seq(15,20,0.1)-3),      # Supply the range, shifted
      x.cdf(c(seq(1,12,0.1), seq(15,20,0.1))), # calc domain values, not shifted
      type="s", xaxt="n", xlab="X", ylab="Quantile")
axis(1, at=c( 1:12, (16:20)-3), labels=c(1:12, (16:20)) ) # shift x's, labels unshifted
axis.break(breakpos=12)
dev.off()
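The manual "-3" offsets above can be wrapped in a small helper so the same shift is applied to the plotted points and to the tick positions. This is a sketch, not part of the original answer: squash is a hypothetical helper name, it assumes nothing is plotted inside the removed interval [12, 15), and the axis.break call is skipped when plotrix is not installed.

```r
pdf(NULL)
set.seed(1)
x     <- sample(seq(1, 20, 0.01), 300, replace = TRUE)
x.cdf <- ecdf(x)

# Hypothetical helper: shift every value at or above `hi` left by (hi - lo),
# so the interval [lo, hi) disappears from the plot
squash <- function(v, lo = 12, hi = 15) ifelse(v >= hi, v - (hi - lo), v)

xs <- c(seq(1, 12, 0.1), seq(15, 20, 0.1))   # points to evaluate, unshifted
plot(squash(xs), x.cdf(xs), type = "s", xaxt = "n", xlab = "X", ylab = "Fn(x)")

ticks <- c(1:12, 16:20)                       # tick values in data units
axis(1, at = squash(ticks), labels = ticks)   # positions shifted, labels not

# Draw the break marks only if plotrix is available
if (requireNamespace("plotrix", quietly = TRUE)) plotrix::axis.break(1, breakpos = 12)
invisible(dev.off())
```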
I'm looking to plot a set of sparklines in R with just a 0 and 1 state that looks like this:
Does anyone know how I might create something like that ideally with no extra libraries?
I don't know of any simple way to do this, so I'm going to build up this plot from scratch. This would probably be a lot easier to design in Illustrator or something like that, but here's one way to do it in R (if you don't want to read the whole step-by-step, I provide my solution wrapped in a reusable function at the bottom of the post).
Step 1: Sparklines
You can use the pch argument of the points function to define the plotting symbol. ASCII symbols are supported, which means you can use the "pipe" symbol for vertical lines. The ASCII code for this symbol is 124, so to use it for our plotting symbol we could do something like:
plot(df, pch=124)
Step 2: labels and numbers
We can put text on the plot by using the text command:
text(x,y,char_vect)
Step 3: Alignment
This is basically just going to take a lot of trial and error to get right, but it'll help if we use values relative to our data.
Here's the sample data I'm working with:
df = data.frame(replicate(4, rbinom(50, 1, .7)))
colnames(df) = c('steps','atewell','code','listenedtoshell')
I'm going to start out by plotting an empty box to use as our canvas. To make my life a little easier, I'm going to set the coordinates of the box relative to values meaningful to my data. The Y positions of the 4 data series will be the same across all plotting elements, so I'm going to store that for convenience.
n=ncol(df)
m=nrow(df)
plot(1:m,
seq(1,n, length.out=m),
# The following arguments suppress plotting values and axis elements
type='n',
xaxt='n',
yaxt='n',
ann=F)
With this box in place, I can start adding elements. For each element the X values are all the same, so we can use rep to build that vector, and seq to build the Y vector relative to the Y range of our plot (1:n). I shift the positions by percentages of the X and Y ranges to align things, and adjust the text size with the cex parameter. After some experimenting, I found that this works:
ypos = rev(seq(1+.1*n,n*.9, length.out=n))
text(rep(1,n),
ypos,
colnames(df), # These are our labels
pos=4, # This positions the text to the right of the coordinate
cex=2) # Increase the size of the text
I reversed the sequence of Y values because I built my sequence in ascending order, and the values on the Y axis in my plot increase from bottom to top. Reversing the Y values then makes it so the series in my dataframe will print from top to bottom.
I then repeated this process for the second label, shifting the X values over but keeping the Y values the same.
text(rep(.37*m,n), # Shifted towards the middle of the plot
ypos,
colSums(df), # new label
pos=4,
cex=2)
Finally, we shift X over one last time and use points to build the sparklines with the pipe symbol described earlier. I'm going to do something slightly unusual here: I tell points to plot at as many positions as I have data points, but use ifelse to decide whether each position actually gets a pipe symbol. This way everything stays evenly spaced. Where I don't want a line, I use a space as the plotting symbol (ASCII code 32). I repeat this for every column in my data frame:
for(i in 1:n){
points(seq(.5*m,m, length.out=m),
rep(ypos[i],m),
pch=ifelse(df[,i], 124, 32), # This determines whether to plot or not
cex=2,
col='gray')
}
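An alternative to the pipe/space trick, not the approach used in this answer, is to draw the bars with segments() only where the value is 1; empty slots then simply get nothing drawn. This sketch reuses the sample data and positions from the steps above; the bar half-height (half) and line width are assumed settings you would tune by eye.

```r
pdf(NULL)
set.seed(1)
df <- data.frame(replicate(4, rbinom(50, 1, .7)))
colnames(df) <- c('steps', 'atewell', 'code', 'listenedtoshell')
n <- ncol(df)
m <- nrow(df)

# Empty canvas, as in the step-by-step version
plot(1:m, seq(1, n, length.out = m), type = 'n', xaxt = 'n', yaxt = 'n', ann = FALSE)
ypos <- rev(seq(1 + .1 * n, n * .9, length.out = n))
xs   <- seq(.5 * m, m, length.out = m)   # one evenly spaced slot per observation
half <- 0.15                              # assumed half-height of each bar, in y units

for (i in 1:n) {
  on <- df[, i] == 1                      # draw a bar only where the state is 1
  if (any(on)) {
    segments(xs[on], ypos[i] - half, xs[on], ypos[i] + half, col = 'gray', lwd = 2)
  }
}
invisible(dev.off())
```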
So, piecing it all together and wrapping it in a function, we have:
df = data.frame(replicate(4, rbinom(50, 1, .7)))
colnames(df) = c('steps','atewell','code','listenedtoshell')
BinarySparklines = function(df,
L_adj=1,
mid_L_adj=0.37,
mid_R_adj=0.5,
R_adj=1,
bottom_adj=0.1,
top_adj=0.9,
spark_col='gray',
cex1=2,
cex2=2,
cex3=2
){
# 'adj' parameters are scalar multipliers in [-1,1]. For most purposes, use [0,1].
# The exception is L_adj which is any value in the domain of the plot.
# L_adj < mid_L_adj < mid_R_adj < R_adj
# and
# bottom_adj < top_adj
n=ncol(df)
m=nrow(df)
plot(1:m,
seq(1,n, length.out=m),
# The following arguments suppress plotting values and axis elements
type='n',
xaxt='n',
yaxt='n',
ann=F)
ypos = rev(seq(1+bottom_adj*n, n*top_adj, length.out=n)) # vertical positions of the series, top to bottom
text(rep(L_adj,n),
ypos,
colnames(df), # These are our labels
pos=4, # This positions the text to the right of the coordinate
cex=cex1) # Increase the size of the text
text(rep(mid_L_adj*m,n), # Shifted towards the middle of the plot
ypos,
colSums(df), # new label
pos=4,
cex=cex2)
for(i in 1:n){
points(seq(mid_R_adj*m, R_adj*m, length.out=m),
rep(ypos[i],m),
pch=ifelse(df[,i], 124, 32), # This determines whether to plot or not
cex=cex3,
col=spark_col)
}
}
BinarySparklines(df)
Which gives us the following result:
Try playing with the alignment parameters and see what happens. For instance, to shrink the side margins, you could try decreasing the L_adj parameter and increasing the R_adj parameter like so:
BinarySparklines(df, L_adj=-1, R_adj=1.02)
It took a bit of trial and error to get the alignment right for the result I provided (which is what I used to inform the default values for BinarySparklines), but I hope I've given you some intuition about how I achieved it and how moving things using percentages of the plotting range made my life easier. In any event, I hope this serves as both a proof of concept and a template for your code. I'm sorry I don't have an easier solution for you, but I think this basically gets the job done.
I did my prototyping in RStudio, so I didn't have to specify the dimensions of my plot, but for reference my plot was 832 x 456 with the aspect ratio maintained.
I have some 100k values. When I plot them as a line in R (using plot(type="l")), the numbers next to the x-axis ticks are printed in scientific format (e.g. 0e+00, 2e+04, ..., 1e+05). Instead, I would like them to be:
A) 0,20kb,...,100kb
B) the same but now the first coordinate should be 1 (i.e. starting to count from 1 instead of 0).
BTW
R arrays use numbering that starts from 1 (in contrast to arrays in Perl, Java, etc.), so I wonder why "they" decided to start from 0 when plotting...
A)
xpos <- seq(0, 1000, by=100)
plot(1:1000, rnorm(1000), type="l", xaxt="n")          # suppress the default x-axis
axis(1, at=xpos, labels=sprintf("%.2fkb", xpos/1000))  # e.g. "0.10kb" at x = 100
B) Same as above; just adjust xpos.
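Scaled up to the 100k values in the question, the same idea might look like the sketch below. The data are hypothetical, the tick spacing of 20,000 is chosen to match the requested labels, and plain "0" (or "1" for B) is used for the first tick as the question shows.

```r
pdf(NULL)
set.seed(1)
y <- cumsum(rnorm(1e5))                  # hypothetical 100k values

# A) ticks every 20,000 positions, labelled 0, 20kb, ..., 100kb
xpos <- seq(0, 1e5, by = 2e4)
labs <- c("0", paste0(xpos[-1] / 1000, "kb"))
plot(seq_along(y), y, type = "l", xaxt = "n", xlab = "position")
axis(1, at = xpos, labels = labs)

# B) same, but the first tick sits at position 1 and is labelled "1"
xposB <- c(1, xpos[-1])
labsB <- c("1", labs[-1])
plot(seq_along(y), y, type = "l", xaxt = "n", xlab = "position")
axis(1, at = xposB, labels = labsB)
invisible(dev.off())
```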
The question is quite old, but it ranked highly when I searched for solutions to this problem, so I'm adding this (quite late) answer in the hope that it helps others :-)
In some situations it might be useful to use the tick locations that R suggests; R provides the function axTicks for this purpose. (Possibly it did not exist in R 2.x and was only added in R 3.x.)
A)
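plot(1:100000, rnorm(100000), type = "l", xaxt = "n")  # plot first, without the default x-axis
myTicks = axTicks(1)  # tick positions R would have chosen for side 1
axis(1, at = myTicks, labels = paste(formatC(myTicks/1000, format = 'd'), 'kb', sep = ''))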
B)
If you plot data like plot(rnorm(1000)), the first x-value is 1, not 0, so the numbering of the data already starts at 1. The first tick label can still be 0, because R places ticks at "pretty" round numbers covering the plotted range rather than at the first index.