spplot() - make color.key look nice - r

I'm afraid I have a spplot() question again.
I want the colors in my spplot() to represent absolute values, not automatic values as spplot does it by default.
I achieve this by making a factor out of the variable I want to draw (using the command cut()). This works very fine, but the color-key doesn't look good at all.
See it yourself:
library(sp)
data(meuse.grid)
gridded(meuse.grid) = ~x+y
meuse.grid$random <- rnorm(nrow(meuse.grid), 7, 2)
meuse.grid$random[meuse.grid$random < 0] <- 0
meuse.grid$random[meuse.grid$random > 10] <- 10
# making a factor out of meuse.grid$ random to have absolute values plotted
meuse.grid$random <- cut(meuse.grid$random, seq(0, 10, 0.1))
spplot(meuse.grid, c("random"), col.regions = rainbow(100, start = 4/6, end = 1))
How can I have the color.key on the right look good - I'd like to have fewer ticks and fewer labels (maybe just one label on each extreme of the color.key)
Thank you in advance!
[edit]
To make clear what I mean with absolute values: Imagine a map where I want to display the sea height. Seaheight = 0 (which is the min-value) should always be displayed blue. Seaheight = 10 (which, just for the sake of the example, is the max-value) should always be displayed red. Even if there is no sea on the regions displayed on the map, this shouldn't change.
I achieve this with the cut() command in my example. So this part works fine.
THIS IS WHAT MY QUESTION IS ABOUT
What I don't like is the color description on the right side. There are 100 ticks and each tick has a label. I want fewer ticks and fewer labels.

The way to go is using the attribute colorkey. For example:
## labels
labelat = c(1, 2, 3, 4, 5)
labeltext = c("one", "two", "three", "four", "five")
## plot
spplot(meuse.grid,
c("random"),
col.regions = rainbow(100, start = 4/6, end = 1),
colorkey = list(
labels=list(
at = labelat,
labels = labeltext
)
)
)

First, it's not at all clear what you are wanting here. There are many ways to make the color.key look "nice" and that is to understand first the data being passed to spplot and what is being asked of it. cut() is providing fully formatted intervals like (2.3, 5.34] which will need to be handled a different way, increasing the margins in the plot, specific formatting and spacing for the labels, etc. etc. This just may not be what you ultimately want.
Perhaps you just want integer values, rounded from the input values?
library(sp)
data(meuse.grid)
gridded(meuse.grid) = ~x+y
meuse.grid$random <- rnorm(nrow(meuse.grid), 7, 2)
Round the values (or trunc(), ceil(), floor() them . . .)
meuse.grid$rclass <- round(meuse.grid$random)
spplot(meuse.grid, c("rclass"), col.regions = rainbow(100, start = 4/6, end = 1))

Related

R: Increase space between multiple boxplots to avoid omitted x axis labels

Let's say I generate 5 sets of random data and want to visualize them using boxplots and save those to a file "boxplots.png". Using the code
png("boxplots.png")
data <- matrix(rnorm(25),5,5)
boxplot(data, names = c("Name1","Name2","Name3","Name4","Name5"))
dev.off()
there are 5 boxplots created as desired in "boxplots.png", however the names for the second ("Name2") and the fourth ("Name4") boxplot are omitted. Even changing the window of my png-view makes no difference. How can I avoid this behavior?
Thank you!
Your offered code does not produce an overlap in my setting, but that point is relatively moot: you want a way to allow more space between words.
One (brute-force-ish) way to fix the symptom is to alternate putting them on separate lines:
set.seed(42)
data <- matrix(rnorm(25),5,5)
nms <- c("Name1","Name2","Name3","Name4","Name5")
oddnums <- which(seq_along(nms) %% 2 == 0)
evennums <- which(seq_along(nms) %% 2 == 1)
(There's got to be a better way to do that, but it works.)
From here:
png("boxplot.png", height = 240)
boxplot(data, names = FALSE)
mtext(nms[oddnums], side = 1, line = 2, at = oddnums)
mtext(nms[evennums], side = 1, line = 1, at = evennums)
dev.off()
(The use of png is not important here, I just used it because of your edit.)

R: Create a more readable X-axis after binning data in ggplot2. Turn bins into whole numbers

I have a dummy variable call it "drink" and a corresponding age variable that represents a precise age estimate (several decimal points) for each person in a dataset. I want to first "bin" the age variable, extracting the mean value for each bin based on the "drink" dummy, and then graph the result. My code to do so looks like this:
df$bins <- cut(df$age, seq(from = 17, to = 31, by = .2), include.lowest = TRUE)
df.plot <- ddply(df, .(bins), summarise, avg.drink = mean(drinks_alcohol))
qplot(bins, avg.drink, data = df.plot)
This works well enough, but the x-axis in the graph is unreadable because it corresponds to the length size of the bins. Is there a way to make the modify the X-axis to show, for example, ages 19-23 only, with the "ticks" still aligning with the correct bins? For example, in my current code there is a bin for (19, 19.2] and another bin for (20, 20.2]. I would want only the bins that start in whole numbers to be identified on the X-axis with the first number (19, 20), not the second (19.2, 20.2) shown.
Is there any straightforward way to do this?
The most direct way to specify axis labels is with the appropriate scale function... in the case of factors on the x axis, scale_x_discrete. It will use whatever labels you give it with the labels argument, or you can give it a function that formats things as you like.
To "manually" specify the labels, you just need to create a vector of appropriate length. In this case, if you factor values go are intervals beginning with seq(17, 31.8, by = 0.2) and you want to label bins beginning with integers, then your labels vector will be
bin_starts = seq(17, 31.8, by = 0.2)
bin_labels = ifelse(bin_starts - trunc(bin_starts) < 0.0001, as.character(bin_starts), "")
(I use the a - b < 0.0001 in case of precision problems, though it shouldn't be a problem in this particular case).
A more robust solution would to label the factor levels with the number at the start of the interval from the beginning. cut also has a labels argument.
my_breaks = seq(17, 32, by = 0.2)
df$bins <- cut(df$age, breaks = my_breaks, labels = head(my_breaks, -1),
include.lowest = TRUE)
You could then fairly easily write a formatter (following templates from the scales package) to print only the ones you want:
int_only = function(x) {
# test if we can coerce to numeric, if not do nothing
if (any(is.na(as.numeric(x)))) return(x)
# otherwise convert to numeric and return integers and blanks as labels
x = as.numeric(x)
return(ifelse(x - trunc(x) < 1e-10, as.character(x), ""))
}
Then, using the nicely formatted data created above, you should be able to pass int_only as a formatter function to labels to get the labels you want. (Note: untested! necessary tweaks left as an exercise for the reader, though I'll gladly accept edits :) )

R Circular Chord Plots

Im learning how to create circular plots in R, similiar to CIRCOS
Im using the package circlize to draw links between origin and destination pairs based on if the flight was OB, Inbound and Return. The logic fo the data doesnt really matter, its just a toy example
I have gotten the plot to work based on the code below which works based on the following logic
Take my data, combine destination column with the flight type
Convert to a matrix and feed the origin and the new column into circlize
Reference
library(dplyr)
library(circlize)
# Create Fake Flight Information in a table
orig = c("IE","GB","US","ES","FI","US","IE","IE","GB")
dest = c("FI","FI","ES","ES","US","US","FI","US","IE")
direc = c("IB","OB","RETURN","DOM","OB","DOM","IB","RETURN","IB")
mydf = data.frame(orig, dest, direc)
# Add a column that combines the dest and direction together
mydf <- mydf %>%
mutate(key = paste(dest,direc)) %>%
select (orig, key)
# Create a Binary Matrix Based on mydf
mymat <- data.matrix(as.data.frame.matrix(table(mydf)))
# create the objects you want to link from to in your diagram
from <- rownames(mymat)
to <- colnames(mymat)
# Create Diagram by suppling the matrix
par(mar = c(1, 1, 1, 1))
chordDiagram(mymat, order = sort(union(from, to)), directional = TRUE)
circos.clear()
I like the plot a lot but would like to change it a little bit. For example FI (which is Finland) has 3 measurements on the diagram FI IB, FI OB and FI. I would like to combine them all under FI if possible and distinguish between the three Types of flights using either a colour scheme, Arrows or even adding an additional track which acts as an umbrella for IB OB and RETURN flights
So for Example,
FI OB would be placed in FI but have a one way arrow to GB to signify OB
FI IB would be placed in FI but have a one way arrow into FI
FI RETURN (if it exists) would have a double headed arrow
Can anyone help, Has anyone seen anything similiar been done before?
The end result should just have the countries on the plot once so that someone can see very quickly which countries have the most amount of flights
I have tried following other posts but am afraid im getting lost when they move to the more advanced stuff
Thank you very much for your time
First, I think there is a duplicated record (IE-FI-IB) in your data.
I will first attach the code and figure and then explain a little bit.
df = data.frame(orig, dest, direc, stringsAsFactors = FALSE)
df = unique(df)
col = c("IB" = "red",
"OB" = "blue",
"RETURN" = "orange",
"DOM" = "green")
directional = c("IB" = -1,
"OB" = 1,
"RETURN" = 2,
"DOM" = 0)
diffHeight = c("IB" = -0.04,
"OB" = 0.04,
"RETURN" = 0,
"DOM" = 0)
chordDiagram(df[1:2], col = col[df[[3]]], directional = directional[df[[3]]],
direction.type = c("arrows+diffHeight"),
diffHeight = diffHeight[df[[3]]])
legend("bottomleft", pch = 15, legend = names(col), col = col)
First you need to use the development version of circlize for which
you can install it by
devtools::install_github("jokergoo/circlize")
In this new version, chordDiagram() supports input variable as a data frame and drawing two-head arrows for the links (just after reading your post :)).
In above code, col, directional, direction.type and diffHeight can all be set as a vector which corresponds to rows in df.
When directional argument in chordDiagram() is set to 2, the corresponding link will have two directions. Then if direction.type contains arrows, there will be a two-head arrow.
Since diffHeight is a vector which correspond to rows in df, if you want to visualize the direction for a single link both by arrow and offset of the roots, you need to merge these two options as a single string as shown in the example code "arrows+diffHeight".
By default direction for links are from the first column to the second column. But in your case, IB means the reversed direction, so we need to set diffHeight to a negative value to reverse the default direction.
Finally, I observe you have links which start and end in a same sector (ES-ES-DOM and US-US-DOM), you can use self.link argument to control how to represent such self-link. self.link is set to 1 in following figure.
Do you need the arrows because the color coding in the graph is telling the From / To story already (FROM -> color edge FROM COUNTRY, TO is color of the FROM COUNTRY arriving at the TO COUNTRY, IF FROM == TO Its own color returns at its own base (see US or ES for example)).
library(dplyr)
library(circlize)
# Create Fake Flight Information in a table
orig = c("IE","GB","US","ES","FI","US","IE","IE","GB")
dest = c("FI","FI","ES","ES","US","US","FI","US","IE")
mydf = data.frame(orig, dest)
# Create a Binary Matrix Based on mydf
mymat <- data.matrix(as.data.frame.matrix(table(mydf)))
# create the objects you want to link from to in your diagram
from <- rownames(mymat)
to <- colnames(mymat)
# Create Diagram by suppling the matrix
par(mar = c(1, 1, 1, 1))
chordDiagram(mymat, order = sort(union(from, to)), directional = TRUE)
circos.clear()
BY the way -> there is also a OFFSET difference on the edge that tells if it is FROM (wider edge) or TO (smaller edge)

avoiding over-crowding of labels in r graphs

I am working on avoid over crowding of the labels in the following plot:
set.seed(123)
position <- c(rep (0,5), rnorm (5,1,0.1), rnorm (10, 3,0.1), rnorm (3, 4, 0.2), 5, rep(7,5), rnorm (3, 8,2), rnorm (10,9,0.5),
rep (0,5), rnorm (5,1,0.1), rnorm (10, 3,0.1), rnorm (3, 4, 0.2), 5, rep(7,5), rnorm (3, 8,2), rnorm (10,9,0.5))
group <- c(rep (1, length (position)/2),rep (2, length (position)/2) )
mylab <- paste ("MR", 1:length (group), sep = "")
barheight <- 0.5
y.start <- c(group-barheight/2)
y.end <- c(group+barheight/2)
mydf <- data.frame (position, group, barheight, y.start, y.end, mylab)
plot(0,type="n",ylim=c(0,3),xlim=c(0,10),axes=F,ylab="",xlab="")
#Create two horizontal lines
require(fields)
yline(1,lwd=4)
yline(2,lwd=4)
#Create text for the lines
text(10,1.1,"Group 1",cex=0.7)
text(10,2.1,"Group 2",cex=0.7)
#Draw vertical bars
lng = length(position)/2
lg1 = lng+1
lg2 = lng*2
segments(mydf$position[1:lng],mydf$y.start[1:lng],y1=mydf$y.end[1:lng])
segments(mydf$position[lg1:lg2],mydf$y.start[lg1:lg2],y1=mydf$y.end[lg1:lg2])
text(mydf$position[1:lng],mydf$y.start[1:lng]+0.65, mydf$mylab[1:lng], srt = 90)
text(mydf$position[lg1:lg2],mydf$y.start[lg1:lg2]+0.65, mydf$mylab[lg1:lg2], srt = 90)
You can see some areas are crowed with the labels - when x value is same or similar. I want just to display only one label (when there is multiple label at same point). For example,
mydf$position[1:5] are all 0,
but corresponding labels mydf$mylab[1:5] -
MR1 MR2 MR3 MR4 MR5
I just want to display the first one "MR1".
Similarly the following points are too close (say the difference of 0.35), they should be considered a single cluster and first label will be displayed. In this way I would be able to get rid of overcrowding of labels. How can I achieve it ?
If you space the labels out and add some extra lines you can label every marker.
clpl <- function(xdata, names, y=1, dy=0.25, add=FALSE){
o = order(xdata)
xdata=xdata[o]
names=names[o]
if(!add)plot(0,type="n",ylim=c(y-1,y+2),xlim=range(xdata),axes=F,ylab="",xlab="")
abline(h=1,lwd=4)
dy=0.25
segments(xdata,y-dy,xdata,y+dy)
tpos = seq(min(xdata),max(xdata),len=length(xdata))
text(tpos,y+2*dy,names,srt=90,adj=0)
segments(xdata,y+dy,tpos,y+2*dy)
}
Then using your data:
clpl(mydf$position[lg1:lg2],mydf$mylab[lg1:lg2])
gives:
You could then think about labelling clusters underneath the main line.
I've not given much thought to doing multiple lines in a plot, but I think with a bit of mucking with my code and the add parameter it should be possible. You could also use colour to show clusters. I'm fairly sure these techniques are present in some of the clustering packages for R...
Obviously with a lot of markers even this is going to get smushed, but with a lot of clusters the same thing is going to happen. Maybe you end up labelling clusters with a this technique?
In general, I agree with #Joran that cluster labelling can't be automated but you've said that labelling a group of lines with the first label in the cluster would be OK, so it is possible to automate some of the process.
Putting the following code after the line lg2 = lng*2 gives the result shown in the image below:
clust <- cutree(hclust(dist(mydf$position[1:lng])),h=0.75)
u <- rep(T,length(unique(clust)))
clust.labels <- sapply(c(1:lng),function (i)
{
if (u[clust[i]])
{
u[clust[i]] <<- F
as.character(mydf$mylab)[i]
}
else
{
""
}
})
segments(mydf$position[1:lng],mydf$y.start[1:lng],y1=mydf$y.end[1:lng])
segments(mydf$position[lg1:lg2],mydf$y.start[lg1:lg2],y1=mydf$y.end[lg1:lg2])
text(mydf$position[1:lng],mydf$y.start[1:lng]+0.65, clust.labels, srt = 90)
text(mydf$position[lg1:lg2],mydf$y.start[lg1:lg2]+0.65, mydf$mylab[lg1:lg2], srt = 90)
(I've only labelled the clusters on the lower line -- the same principle could be applied to the upper line too). The parameter h of cutree() might have to be adjusted case-by-case to give the resolution of labels that you want, but this approach is at least easier than labelling every cluster by hand.

aligning patterns across panels with gridExtra and grid.pattern()

The gridExtra package adds a grob of class "pattern" that lets one fill rectangles with patterns. For example,
library(gridExtra)
grid.pattern(pattern = 1)
creates a box filled with diagonal lines. I want to create a stack of panels in which each panel is filled with these diagonal lines. This is easy:
library(lattice); library(gridExtra)
examplePlot <- xyplot(
1 ~ 1 | 1:2,
panel = function () grid.pattern(pattern = 1),
layout = c(1, 2),
# Remove distracting visual detail
scales = list(x=list(draw=FALSE), y=list(draw=FALSE)),
strip = FALSE, xlab = '', ylab = ''
)
print(examplePlot)
The problem is that the diagonal lines aren't aligned across panels. That is, there is a visual "break" where the bottom of the first panel meets the top of the second panel: at that point, the lines don't line up. This is the problem that I want to fix.
I can eliminate most of the visual break by adding the argument pattern.offset = c(.2005, 0) to the grid.pattern call, and making sure that it applies only to the bottom panel. But this solution doesn't generalize. For example, if I change the pattern (e.g., by using the granularity argument to grid.pattern), this solution won't work. Is there a more general fix?
To make this work, you'll have to take charge of setting the panel.height argument used by print.trellis. (To see why, try resizing your plotting device after running your example code: as the size of the device and the panels changes, so does the matching/mismatching of the lines):
## Calculate vertical distance (in mm) between 45 degree diagonal lines
## spaced 5mm apart (the default distance for grid.pattern).
vdist <- 5 * sqrt(2)
nLines <- 8L ## can be any integer
panelHeight <- list(x = nLines*vdist, units = "mm", data = NULL)
## Plot it
print(examplePlot, panel.height=panelHeight)

Resources