Binary spark lines with R

I'm looking to plot a set of sparklines in R with just a 0 and 1 state that looks like this:
Does anyone know how I might create something like that ideally with no extra libraries?

I don't know of any simple way to do this, so I'm going to build up this plot from scratch. This would probably be a lot easier to design in Illustrator or a similar tool, but here's one way to do it in R (if you don't want to read the whole step-by-step, my solution is wrapped in a reusable function at the bottom of the post).
Step 1: Sparklines
You can use the pch argument of the points function to define the plotting symbol. ASCII symbols are supported, which means you can use the "pipe" symbol for vertical lines. The ASCII code for this symbol is 124, so to use it for our plotting symbol we could do something like:
plot(df, pch=124)
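As a quick sanity check of the symbol code, here is a minimal sketch using made-up toy vectors (x and y are not part of the original data). It confirms that code point 124 is the pipe character and draws a row of pipes:

```r
# Confirm that code point 124 is the pipe character.
pipe_code <- utf8ToInt("|")
print(pipe_code)

# Draw ten pipes in a row on an otherwise bare canvas (toy data).
x <- 1:10
y <- rep(1, 10)
plot(x, y, pch = pipe_code, cex = 2, yaxt = "n", ann = FALSE)
```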
Step 2: labels and numbers
We can put text on the plot by using the text command:
text(x,y,char_vect)
Step 3: Alignment
This is basically just going to take a lot of trial and error to get right, but it'll help if we use values relative to our data.
Here's the sample data I'm working with:
df = data.frame(replicate(4, rbinom(50, 1, .7)))
colnames(df) = c('steps','atewell','code','listenedtoshell')
I'm going to start out by plotting an empty box to use as our canvas. To make my life a little easier, I'm going to set the coordinates of the box relative to values meaningful to my data. The Y positions of the 4 data series will be the same across all plotting elements, so I'm going to store that for convenience.
n=ncol(df)
m=nrow(df)
plot(1:m,
seq(1,n, length.out=m),
# The following arguments suppress plotting values and axis elements
type='n',
xaxt='n',
yaxt='n',
ann=F)
With this box in place, I can start adding elements. For each element, the X values will all be the same, so we can use rep to set that vector, and seq to set the Y vector relative to the Y range of our plot (1:n). I'm going to shift the positions by percentages of the X and Y ranges to align my values, and modify the size of the text using the cex parameter. Ultimately, I found that this works out:
ypos = rev(seq(1+.1*n,n*.9, length.out=n))
text(rep(1,n),
ypos,
colnames(df), # These are our labels
pos=4, # This positions the text to the right of the coordinate
cex=2) # Increase the size of the text
I reversed the sequence of Y values because I built my sequence in ascending order, and the values on the Y axis in my plot increase from bottom to top. Reversing the Y values then makes it so the series in my dataframe will print from top to bottom.
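To make the effect of the reversal concrete, here is a small sketch that only computes the row positions for the four-column sample data. The intermediate name ascending is introduced here for illustration:

```r
n <- 4  # number of series (columns), as in the sample data

# Ascending positions, inset from the bottom and top of the 1..n range
ascending <- seq(1 + 0.1 * n, n * 0.9, length.out = n)

# Reverse so the first column gets the largest y, i.e. the top row
ypos <- rev(ascending)

print(round(ypos, 2))  # 3.60 2.87 2.13 1.40
```

The first column name is therefore drawn at y = 3.6, near the top of the box, and the last one at y = 1.4, near the bottom.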
I then repeated this process for the second label, shifting the X values over but keeping the Y values the same.
text(rep(.37*m,n), # Shifted towards the middle of the plot
ypos,
colSums(df), # new label
pos=4,
cex=2)
Finally, we shift X over one last time and use points to build the sparklines with the pipe symbol as described earlier. I'm going to do something sort of weird here: I'm going to tell points to plot at as many positions as I have data points, but use ifelse to decide whether to actually draw a pipe symbol at each one. This way everything will be properly spaced. When I don't want to plot a line, I'll use a space as my plotting symbol (ASCII code 32). I will repeat this procedure, looping through all columns in my dataframe:
for(i in 1:n){
points(seq(.5*m,m, length.out=m),
rep(ypos[i],m),
pch=ifelse(df[,i], 124, 32), # This determines whether to plot or not
cex=2,
col='gray')
}
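The symbol selection can be seen in isolation with a short toy series (the vector series below is made up for illustration). Every position gets a symbol, so the spacing stays even; zeros simply get an invisible one:

```r
series <- c(1, 0, 1, 1, 0)            # toy 0/1 series

# 1 -> pipe (124), 0 -> space (32); ifelse treats non-zero values as TRUE
symbols <- ifelse(series, 124, 32)
print(symbols)                        # 124 32 124 124 32

# All five positions are plotted, but only three are visible
plot(seq_along(series), rep(1, length(series)),
     pch = symbols, cex = 2, col = "gray", yaxt = "n", ann = FALSE)
```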
So, piecing it all together and wrapping it in a function, we have:
df = data.frame(replicate(4, rbinom(50, 1, .7)))
colnames(df) = c('steps','atewell','code','listenedtoshell')
BinarySparklines = function(df,
L_adj=1,
mid_L_adj=0.37,
mid_R_adj=0.5,
R_adj=1,
bottom_adj=0.1,
top_adj=0.9,
spark_col='gray',
cex1=2,
cex2=2,
cex3=2
){
# 'adj' parameters are scalar multipliers in [-1,1]. For most purposes, use [0,1].
# The exception is L_adj which is any value in the domain of the plot.
# L_adj < mid_L_adj < mid_R_adj < R_adj
# and
# bottom_adj < top_adj
n=ncol(df)
m=nrow(df)
plot(1:m,
seq(1,n, length.out=m),
# The following arguments suppress plotting values and axis elements
type='n',
xaxt='n',
yaxt='n',
ann=F)
ypos = rev(seq(1+bottom_adj*n,n*top_adj, length.out=n)) # bottom_adj replaces the hard-coded .1
text(rep(L_adj,n),
ypos,
colnames(df), # These are our labels
pos=4, # This positions the text to the right of the coordinate
cex=cex1) # Increase the size of the text
text(rep(mid_L_adj*m,n), # Shifted towards the middle of the plot
ypos,
colSums(df), # new label
pos=4,
cex=cex2)
for(i in 1:n){
points(seq(mid_R_adj*m, R_adj*m, length.out=m),
rep(ypos[i],m),
pch=ifelse(df[,i], 124, 32), # This determines whether to plot or not
cex=cex3,
col=spark_col)
}
}
BinarySparklines(df)
Which gives us the following result:
Try playing with the alignment parameters and see what happens. For instance, to shrink the side margins, you could try decreasing the L_adj parameter and increasing the R_adj parameter like so:
BinarySparklines(df, L_adj=-1, R_adj=1.02)
It took a bit of trial and error to get the alignment right for the result I provided (which is what I used to inform the default values for BinarySparklines), but I hope I've given you some intuition about how I achieved it and how moving things using percentages of the plotting range made my life easier. In any event, I hope this serves as both a proof of concept and a template for your code. I'm sorry I don't have an easier solution for you, but I think this basically gets the job done.
I did my prototyping in RStudio, so I didn't have to specify the dimensions of my plot, but for posterity: the plot was 832 x 456 with the aspect ratio maintained.

Related

Plot piecewise data, x-axis limits

I use Julia with Plots to generate my plots.
I want to plot data (A, B), and I know that all the interesting data lie in two regions of A. The two regions should be plotted next to each other in one plot.
My A data are evenly spaced, so I cut out the interesting pieces and glued them into one object.
My problem is that I don't know how to manipulate the scale on the x-axis.
When I just plot the B data against their array index, I basically get the form I want. I just need the numbers from A on the x-axis.
Here is a toy example:
using Plots
N=5000
B=rand(N)
A=(1:1:N)
xl_1=100
xu_1=160
xl_2=600
xu_2=650
A_new=vcat(A[xl_1:xu_1],A[xl_2:xu_2])
B_new=vcat(B[xl_1:xu_1],B[xl_2:xu_2])
plot(A_new,B_new) # This leaves the spacing between the data explicit
plot(B_new) # This creates basically the right spacing, but
# without the right x axis grid
I did not find any way to use two successive xlims, so I tried it this way.
You can't pass two successive xlims, because you can't have a break in the axis. That is by design in Plots.
So your possibilities are: 1) to have two subplots with different parts of the plot, or 2) to plot with the index, and just change the axis labels.
The second approach would use a command like xticks = ([1, 31, 62, 112], ["100", "130", "600", "650"]), which labels positions in the glued array with the matching values from A, but I'd recommend the first, as it is strictly speaking a more correct way of displaying the data:
plot(
plot(A[xl_1:xu_1], B[xl_1:xu_1], legend = false),
plot(A[xl_2:xu_2], B[xl_2:xu_2], yshowaxis = false),
link = :y
)

In R rgl, how to choose the position of tick marks in plot3d?

In R,
library(rgl)
m <- matrix(rnorm(300),100,3)
par3d(ignoreExtent=F)
plot3d(m,box=T,axes=F,xlab='',ylab='',zlab='')
axes3d(labels=F,tick=F,box=F)
gr <- grid3d('z')
par3d(ignoreExtent=T)
plot3d(cbind(m[,1:2],rgl.attrib(gr[1],'vertices')[1,3]),col='gray',add=T)
still prints the ticks with numbers:
Shouldn't tick=F parameter in axes3d() get rid of the tick marks and the numbers?
I want to add the x and y axes at the bottom of the graph, not at the top. Also, when I add them using axis3d(), the ticks aren't orthogonal anymore, but inclined in 45 degrees relative to their plane, which I think is ugly.
par3d(ignoreExtent=F)
plot3d(m,box=T,axes=F,xlab='',ylab='',zlab='')
box3d()
axis3d('x--',labels=T,tick=T)
axis3d('y+-',labels=T,tick=T)
axis3d('z++',labels=T,tick=T)
gr <- grid3d('z')
par3d(ignoreExtent=T)
plot3d(cbind(m[,1:2],rgl.attrib(gr[1],'vertices')[1,3]),col='gray',add=T)
If I have to go this second way, how to get rid of the front lines of the box? Or is there another way to print the default tick marks (orthogonal) in the desired position?
Axes in rgl are somewhat confusing and not very flexible. First, there are two different kinds: those drawn by axis3d, and those drawn with rgl.bbox. Only the first type pays attention to the tick argument, and your first example used the second type.
You can remove the ticks in the rgl.bbox axes by setting marklen = 0, marklen.rel = FALSE, but this has the unfortunate effect of putting the numbers right on the box. There isn't a separate parameter to control placement of the numbers independent of tickmark length. If you don't want numbers at all, use xlen = 0, ylen = 0, zlen = 0.
The axis3d axes are also not very flexible. If you want to change their orientation, you'll need to modify that function. The mpos array holds the coordinates of each tick; change it to make the ticks point the way you want.
Regarding the box: it's fixed if you use box3d() to draw it. If you want the rgl.bbox style, you'll have to use that function. You could also use segments3d() and mtext3d() to construct your own axes, but they won't move around like the rgl.bbox axes.

Change axis in R with different number of datas

I want to change the x-axis in my plot, but it doesn't work properly with axis(). The data in the plot are daily, and I want to show only years. I hope someone understands the problem and can find a solution. This is the code I tried: axis(1, at = seq(1800, 1975, by = 25), las=2)
Without reproducible code, it is not easy to tell what the problem is, so I'll try a "quick and dirty" approach.
High-level plots are composed of elements that are themselves composed of smaller parts. Separate drawing commands therefore give finer control over the plotting procedure.
In practice, the first thing to do is plot "nothing".
> plot(x, y, type = "n", xlab = "", ylab = "", axes = F)
type = "n" causes the data to not be drawn. axes = F suppresses the axis and the box around the plot. In spite of that, the plotting region is ready to show the data.
The main benefit is that now the plotting area is correctly dimensioned. Try now to add the desired x axis as you tried before.
> points(x, y) # Plots the data in the area
> axis() # Plots the desired axis with your scale
> title() # Plots the desired titles
> box() # Prints the box surrounding the plot
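The commands above are placeholders, so here is a complete version of the same sequence as a sketch. The toy x and y vectors, the tick positions, and the titles are assumptions made up for illustration, not the asker's data:

```r
# Toy data standing in for the asker's series (an assumption).
x <- 0:100
y <- sin(x / 10)

# 1. Set up the coordinate system without drawing anything.
plot(x, y, type = "n", xlab = "", ylab = "", axes = FALSE)

# 2. Add each element separately.
points(x, y)                             # the data
x_ticks <- seq(0, 100, by = 25)          # tick positions in data units
axis(1, at = x_ticks, las = 2)           # custom x axis
axis(2)                                  # default y axis
title(main = "Toy series", xlab = "x", ylab = "y")
box()                                    # frame around the plot region
```

The key point is that the at values passed to axis() must be in the same units as the plotted x values.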
EDITED based on comment by @scoa
As a quick and dirty solution, you can simply enter the following line after your plot() line:
# This reads as: on the x axis (1), place ticks from the first day value (0)
# to the last day value (63917) in steps of 9131 days (about 25 years),
# draw the labels perpendicular to the axis (las=2) for readability,
# and label the ticks (labels) with the years 1800 to 1975 in 25-year steps.
axis(1, at = seq(0, 63917, by = 9131), las=2, labels=seq(1800, 1975, by=25))
For other parameters, check out ?axis. As @scoa mentioned, this is approximate: I used 365.25 days per year as the conversion, which is not exact, but it should suffice for visual accuracy at the scale you have provided. If you need a precise conversion from days to years, do it on your original data set before plotting.
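If exact tick positions matter, one option is to compute the day offset of each labelled year with Date arithmetic. This is only a sketch: it assumes that day 0 of the series is 1 January 1800, which the question does not state, and the names origin, years, and tick_at are introduced here for illustration.

```r
origin <- as.Date("1800-01-01")      # ASSUMED calendar date of day 0
years  <- seq(1800, 1975, by = 25)

# Exact number of days from the origin to 1 January of each labelled year
tick_at <- as.numeric(as.Date(paste0(years, "-01-01")) - origin)
print(tick_at)

# Usage, after the plot() call:
# axis(1, at = tick_at, labels = years, las = 2)
```

Under that assumption the last offset is 63917, the same endpoint used in the axis() call above, while the intermediate ticks follow the actual leap years instead of a fixed step.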

How to superimpose a histogram on each panel

I would like to superimpose, on each lattice histogram panel, an additional histogram (which will be the same in each panel). I want the overlaid histogram to have solid borders but an empty fill (col), to allow comparison with the underlying histograms.
That is, the end result will be a series of panels, each with a different colored histogram, and each with the same extra outline histogram on top of the colored histogram.
Here's something that I tried, but it just produces empty panels:
foo.df <- data.frame(x=rnorm(40), categ=c(rep("A", 20), rep("B", 20)))
bar.df <- data.frame(x=rnorm(20))
histogram(~ x | categ, data=foo.df,
panel=function(...){histogram(...);
histogram(~ x, data=bar.df, col=NULL)})
(My guess is that I need to use panel.superpose, but this function is somewhat confusing. Sarkar's book doesn't explain how to use it, and the R help page has no examples. I'm finding it difficult to make sense of the panel.superpose help page without already having a basic understanding. There are a very small number of examples that I've found on the web, but I have been unable to figure out what aspects of those examples apply to my case. This answer is surely relevant, but I don't understand its use of panel.groups, and the example overlays three different groups from a single dataframe, whereas I want to repeatedly overlay the same data on multiple panels that also have different data.)
I continued working on this problem, and came up with an answer. I had been on the right track but got several crucial details wrong. Comments in the code below spell out important points.
library(lattice) # provides histogram() and panel.histogram()
# Main data, which will be displayed as solid histograms, different in each panel:
foo.df <- data.frame(y=rnorm(40), cat=c(rep("A", 20), rep("B", 20)))
# Comparison data: This will be displayed as an outline histogram in each panel:
bar.df <- data.frame(y=rnorm(30)-2)
# Define some vectors that we'll use in the histogram call.
# These have to be adjusted for the data by trial and error.
# Usually, panel.histogram will figure out reasonable default values for these.
# However, the two calls to panel.histogram below may figure out different values,
# producing pairs of histograms that aren't comparable.
bks <- seq(-5,3,0.5) # breaks that define the bar bins
yl <- c(0,50) # height of plot
# The key is to coordinate breaks in the two panel.histogram calls below.
# The first one inherits the breaks from the top-level call through '...' .
# Using "..." in the second call generates an error, so I specify parameters explicitly.
# It's not necessary to specify type="percent" at the top level, since that's the default,
# but it is necessary to specify it in the second panel.histogram call.
histogram(~ y | cat, data=foo.df, ylim=yl, breaks=bks, type="percent", border="cyan",
panel=function(...){panel.histogram(...)
panel.histogram(x=bar.df$y, col="transparent",
type="percent", breaks=bks)})
# col="transparent" is what makes the second set of bars into outlines.
# In the first set of bars, I set the border color to be the same as the value of col
# (cyan by default) rather than using border="transparent" because otherwise a filled
# bar with the same number of points as an outline bar will be slightly smaller.
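As an alternative to choosing the breaks by trial and error, they can be derived from the combined range of both data sets, so that the same bins are guaranteed to cover every value. This is a sketch with toy data; the names main_y, comp_y, and bin_width are assumptions introduced here, not part of the code above:

```r
set.seed(1)                    # only to make the toy data reproducible
main_y <- rnorm(40)            # stands in for foo.df$y
comp_y <- rnorm(30) - 2        # stands in for bar.df$y

bin_width <- 0.5
all_y <- c(main_y, comp_y)

# Round the combined range outwards to whole multiples of the bin width
bks <- seq(floor(min(all_y) / bin_width) * bin_width,
           ceiling(max(all_y) / bin_width) * bin_width,
           by = bin_width)
print(bks)
```

The resulting bks vector can then be passed as breaks= to both the top-level histogram call and the explicit panel.histogram call.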

R How to build angled column headings above columns in heatmap.2: pass (text) plot to the layout?

I am very close to the heatmap I want, but I have been struggling for several days to figure out the headings problem. I want angled headings (such as 45 or 50 degrees) at the top of each column. I have already suppressed the dendrograms (for both columns and rows), and used the first column of my matrix (depth) as labels for the rows.
From my many, many searches on this issue, I see that mtext won't help me because the text cannot be rotated in mtext. Then I thought I could do it with text, but the column labels get overwritten onto the heatmap itself; they "disappear" (get covered) when the words reach the edge of the heatmap layout space. So I examined the layout used by heatmap (thanks very much to @Ian Sudbery), and it occurred to me that what I really need is a dedicated space in the layout for my column headings. I can allocate that space using the layout function, and I have done so in the code below. But I think the next step may involve getting inside the heatmap.2 code. Heatmap.2 calls four plots (two of which I have suppressed, the dendrograms). How do I make it call a fifth plot? And if that is possible, how do I tell it that I want to use text as my fifth "plot", where the text is my column headings, rotated 50deg?
Thanks very much for any ideas. Please forgive the clumsy way I've provided sample data in the code below; I am new to R and generally do not know the most elegant way to do most things.
library(gplots) # provides heatmap.2()
library(RColorBrewer) # provides brewer.pal()
par(oma=c(0.5,0.5,.5,0.5)) # outer margins of whole figure
lmat = rbind(c(3,5,5),c(2,1,1),c(0,4,0))
lwid = c(.08,.9, .1)
lhei = c(1,4,1.5)
Depth<-c("0m","20m","40m","60m","80m","100m")
Sept2008<-c(3,6,8,10,15,16)
March2010<-c(10,12,11,13,12,11)
Sept2010<-c(5,6,NA,8,11,13)
March2011<-c(4,6,10,NA,14,14)
Sept2011<-c(2,5,3,9,16,12)
heatmap_frame=data.frame(Depth=Depth,Sept2008=Sept2008,March2010=March2010,Sept2010=Sept2010, March2011=March2011, Sept2011=Sept2011)
row.names(heatmap_frame)<-heatmap_frame$Depth
heatmap_frame<-heatmap_frame[,-1]
heatmap_matrix <- as.matrix(heatmap_frame)
labCol=c("Sept 2008","March 2010","Sept 2010","March 2011","Sept 2011")
cexCol=1.1
heatmap <- heatmap.2(heatmap_matrix, dendrogram="none", trace="none",Rowv=NA, Colv=NA,
col = brewer.pal(9,"Blues"), scale="none", margins=c(2,5),labCol="",
lmat=lmat, lwid=lwid,lhei=lhei, density.info="none",key=TRUE)
# want to plot a fifth area, to use for col labels
# don't know how to pass a text line to the heatmap.2/layout/matrix to print as my fifth plot
mtext("Use for main title", side=3,outer=F,line=2.75, font=2, cex=1.25)
# testing the text function; did not work as desired
#text(x=1, y=1, labels="Label_1",xpd=T)
text(x=c(0,.2,.4,.6,.8), y=0.95, pos=3, srt=50, labels=labCol,xpd=T, cex=1)
Here's a hack that doesn't involve pulling apart the convoluted code of heatmap.2:
pos2 <- locator() #will return plotting coordinates after doing this:
# Shift focus to the graphics window by clicking on an edge
# Left-Click once where you want the first label to be centered
# Left-click again on the point where you want the last label centered
# Right-Click, then return focus to the console session window
pos2 <- structure(list(x = c(0.27149971320082, 0.858971646016485),
y = c(0.861365598392473, 0.857450478257082)),
.Names = c("x", "y"))
text(x=seq(pos2$x[1], pos2$x[2], len=5), y=rep(pos2$y[1],5) ,
srt=50, xpd=TRUE, adj = 0,
labels=c("Sept 2008","March 2010","Sept 2010",
"March 2011","Sept 2011") )
I don't know if you actually need the xpd in there, since it appears that after heatmap.2 is finished it returns the window to its native coordinates: [0,1]x[0,1]
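The same text() call can be tried on its own in base graphics, without heatmap.2. The sketch below uses an empty unit-square canvas as a stand-in for the finished heatmap device, and the end points 0.1 and 0.9 are arbitrary assumptions in place of the coordinates returned by locator():

```r
labels <- c("Sept 2008", "March 2010", "Sept 2010", "March 2011", "Sept 2011")

# Empty unit-square canvas with a tall top margin to hold the labels
op <- par(mar = c(2, 2, 6, 2))
plot(0:1, 0:1, type = "n", xaxs = "i", yaxs = "i", ann = FALSE)

# One x position per label, evenly spaced between two chosen end points
x_pos <- seq(0.1, 0.9, length.out = length(labels))

text(x = x_pos, y = 1.02, labels = labels,
     srt = 50,      # rotate each label by 50 degrees
     adj = 0,       # anchor each label at its left end
     xpd = TRUE)    # allow drawing outside the plot region
par(op)             # restore the previous margins
```

In this stand-alone setting xpd = TRUE is needed, because the labels sit in the margin above the plot region.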
