R Bubble chart: labels overlapping

I am trying to write a function that draws a bubble chart from aggregated data.
I pass it a column of a data.frame as the "agg" argument.
aggs2 <- function(agg, deporur=0, all=TRUE){
##create aggregate from library data
agg1 <- aggregate(agg, by=list(NoNA$IMD_NATIONAL_QUINTILE, NoNA$UR),
FUN=function(x) c(mn=mean(x), n=length(x)))
##bind into a dataframe
agg1 <- cbind(agg1[,1:2], agg1[,3])
##add column holding values of Deprivation Quantile and Urban/Rural status
agg1$NewCol <- do.call(paste, c("Deprivation Quantile", agg1[c("Group.1", "Group.2")],
sep = " "))
##set column names
colnames(agg1) <- c("Deprivation", "Urban and Rural", "Mean", "Count", "DepUR")
##remove categories with low counts
if(all==FALSE){
agg1 <- subset(agg1, agg1$Count > 9)
}
##order data.frame by mean
agg1 <- agg1[order(agg1$Mean, decreasing=TRUE),]
##create bubble chart
if(deporur==1){
radius3 <- sqrt(agg1$Count/pi)
symbols(factor(agg1$DepUR), agg1$Mean, circles=radius3, inches=0.35,
xlim=c(0,10.0), ylim=c(min(agg1$Mean-0.25),10.0), fg="white", bg="purple",
xlab="Deprivation Quantile and Urban/Rural Status", ylab="Mean Response")
text(factor(agg1$DepUR), agg1$Mean-.1, agg1$DepUR, cex=0.7)
}
#return ordered dataframe
agg1
}
This returns a sorted data.frame by mean, and the following chart:
Because this function will need to create graphs from a variety of different documents and columns, I would like to code it so that the labels do not overlap the bubbles, or other labels.
I have looked at the directlabels library, but I have been unable to work out how to code it properly.
Would greatly appreciate any assistance.

I'm not aware of any solution for non-overlapping labels with regards to other labels AND other circles. Nevertheless, wordcloud::textplot might be a starting point:
library(wordcloud)
set.seed(8)
df <- data.frame(x = runif(10), y = runif(10), size = sample(10:20, 10), lab = paste0("label", 1:10))
par(mfrow = c(1,2))
with(df, {
plot(x, y, cex = size, pch = 19, col = adjustcolor("violet", alpha.f = .4), main = "non-overlapping")
textplot(x, y, lab, new = FALSE, show.lines = FALSE, cex = 2)
plot(x, y, cex = size, pch = 19, col = adjustcolor("violet", alpha.f = .4), main = "overlapping")
text(x, y, lab, cex = 2)
})
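If you would rather stay in base graphics, here is a minimal sketch that at least keeps each label clear of its own bubble by pushing it below the circle edge. It does not resolve label-to-label overlap. The data frame and its column names are hypothetical stand-ins for the aggregated table, and the sketch assumes that symbols() scales the largest circle's radius to the inches value; if it scales the diameter instead, the labels simply sit a little further away.

```r
# Hypothetical data standing in for the aggregated table
set.seed(8)
df <- data.frame(x = 1:8, y = runif(8, 4, 9), n = sample(10:200, 8),
                 lab = paste("Group", 1:8))

max_in <- 0.35                      # same value as passed to inches= below
r      <- sqrt(df$n / pi)           # radius proportional to sqrt(count)
r_in   <- max_in * r / max(r)       # assumed on-page radius of each circle, in inches

symbols(df$x, df$y, circles = r, inches = max_in,
        ylim = range(df$y) + c(-1, 1),
        fg = "white", bg = "purple", xlab = "Group", ylab = "Mean response")

# yinch() converts a length in inches to y-axis user units for the current plot
lab_y <- df$y - yinch(r_in)
text(df$x, lab_y, df$lab, pos = 1, cex = 0.7)   # pos = 1 draws the text below the point
```

Because the offset is computed from the same radii that are passed to symbols(), it adapts when the counts change from one data set to the next.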

Related

Insert legend into plot that is associated with factors

I have a plot(x,y) associated with two other factors z and t. There are three levels in z and two levels in t. How do I properly use legend function to insert legend to give three levels of z with t1, such as z1t1, z2t1, z3t1, and three levels of z with t2, such as z1t2, z2t2, z3t2? In other words, the legends should show a total of six.
with(df, plot(x, y,
pch = as.numeric(as.factor(paste(z,t))),
col = as.numeric(as.factor(paste(z, t)))))
This looks like what you are looking for.
UPDATE: Factors in the legend are sorted now.
#creating test data
x <- rnorm(20)
y <- x + runif(20)
dat <- data.frame("x" = x, "y" = y,
z = sample(c("z1", "z2", "z3"), 20, replace = TRUE),
t = sample(c("t1", "t2"), 20, replace = TRUE))
#it's quicker to do the pasting outside
dat$zt <- as.numeric(as.factor(paste(dat$z,dat$t)))
with(dat, plot(x, y,
pch = zt,
col = zt))
with(dat, legend(x = "bottomright",
legend = sort(unique(paste(z,t))),
# sort the codes as well, so each symbol lines up with its sorted label
pch = sort(unique(zt)),
col = sort(unique(zt))))
Hope it helps.
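A variation on the code above, as a sketch: build one factor from the pasted z and t values and take both the plotting codes and the legend labels from it, so the two cannot get out of step. The names zt, codes and keys are just illustrative.

```r
set.seed(42)
dat <- data.frame(x = rnorm(20),
                  z = sample(c("z1", "z2", "z3"), 20, replace = TRUE),
                  t = sample(c("t1", "t2"), 20, replace = TRUE))
dat$y <- dat$x + runif(20)

zt    <- factor(paste(dat$z, dat$t))   # levels are sorted alphabetically
codes <- as.integer(zt)                # one integer code per observation
keys  <- seq_along(levels(zt))         # code i belongs to levels(zt)[i]

plot(dat$x, dat$y, pch = codes, col = codes, xlab = "x", ylab = "y")
legend("bottomright", legend = levels(zt), pch = keys, col = keys)
```

Note that the legend only lists combinations that actually occur in the data, so it can show fewer than six entries for a small sample.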

How to plot a table of values and corresponding colors in R? The dreaded heat map [duplicate]

I am trying to create a data table whose cells are different colors based on the value in the cell. I can achieve this with the function addtable2plot from the plotrix package. The addtable2plot function lays a table on an already existing plot. The problem with that solution is that I don't want a plot, just the table.
I've also looked at the heatmap functions. The problem there is that some of the values in my table are character, and the heatmap functions, from what I can tell, only accept numeric matrices. Also, I want my column names to be at the top of the table, not the bottom, and that doesn't seem to be an option.
Here's the example code for addtable2plot. If I could get just the table, filling the whole screen, that would be great.
library(plotrix)
testdf<-data.frame(Before=c(10,7,5,9),During=c(8,6,2,5),After=c(5,3,4,3))
rownames(testdf)<-c("Red","Green","Blue","Lightblue")
barp(testdf,main="Test addtable2plot",ylab="Value",
names.arg=colnames(testdf),col=2:5)
# show most of the options including the christmas tree colors
abg<-matrix(c(2,3,5,6,7,8),nrow=4,ncol=3)
addtable2plot(2,8,testdf,bty="o",display.rownames=TRUE,hlines=TRUE,
vlines=TRUE,title="The table",bg=abg)
Any help would be greatly appreciated.
A heatmap alternative:
library(gplots)
# need data as matrix
mm <- as.matrix(testdf)
heatmap.2(x = mm, Rowv = FALSE, Colv = FALSE, dendrogram = "none",
cellnote = mm, notecol = "black", notecex = 2,
trace = "none", key = FALSE, margins = c(7, 11))
In heatmap.2 the side of the plot the axis is to be drawn on is hard-coded. But if you type "heatmap.2" at the console and copy the output to an editor, you can search for axis(1, where the 1 is the side argument (two hits). You can then change from a 1 (axis below plot) to a 3 (axis above the plot). Assign the updated function to a new name, e.g. heatmap.3, and run it as above.
An addtable2plot alternative
library(plotrix)
# while plotrix is loaded anyway:
# set colors with color.scale
# need data as matrix*
mm <- as.matrix(testdf)
cols <- color.scale(mm, extremes = c("red", "yellow"))
par(mar = c(0.5, 1, 2, 0.5))
# create empty plot
plot(1:10, axes = FALSE, xlab = "", ylab = "", type = "n")
# add table
addtable2plot(x = 1, y = 1, table = testdf,
bty = "o", display.rownames = TRUE,
hlines = TRUE, vlines = TRUE,
bg = cols,
xjust = 2, yjust = 1, cex = 3)
# *According to `?color.scale`, `x` can be a data frame.
# However, when I tried with `testdf`, I got "Error in `[.data.frame`(x, segindex) : undefined columns selected".
A color2D.matplot alternative
library(plotrix)
par(mar = c(0.5, 8, 3.5, 0.5))
color2D.matplot(testdf,
show.values = TRUE,
axes = FALSE,
xlab = "",
ylab = "",
vcex = 2,
vcol = "black",
extremes = c("red", "yellow"))
axis(3, at = seq_len(ncol(testdf)) - 0.5,
labels = names(testdf), tick = FALSE, cex.axis = 2)
axis(2, at = seq_len(nrow(testdf)) -0.5,
labels = rev(rownames(testdf)), tick = FALSE, las = 1, cex.axis = 2)
After this little exercise, I tend to agree with @Drew Steen that LaTeX alternatives may be investigated as well. For example, check here and here.
You can hack something with grid and gtable:
palette(c(RColorBrewer::brewer.pal(8, "Pastel1"),
RColorBrewer::brewer.pal(8, "Pastel2")))
library(grid) # textGrob, rectGrob, gpar, unit, grid.draw
library(gtable)
gtable_add_grobs <- gtable_add_grob # alias
d <- head(iris, 3)
nc <- ncol(d)
nr <- nrow(d)
extended_matrix <- cbind(c("", rownames(d)), rbind(colnames(d), as.matrix(d)))
## text for each cell
all_grobs <- matrix(lapply(extended_matrix, textGrob), ncol=ncol(d) + 1)
## define the fill background of cells
fill <- lapply(seq_len(nc*nr), function(ii)
rectGrob(gp=gpar(fill=ii)))
## some calculations of cell sizes
row_heights <- function(m){
do.call(unit.c, apply(m, 1, function(l)
max(do.call(unit.c, lapply(l, grobHeight)))))
}
col_widths <- function(m){
do.call(unit.c, apply(m, 2, function(l)
max(do.call(unit.c, lapply(l, grobWidth)))))
}
## place labels in a gtable
g <- gtable_matrix("table", grobs=all_grobs,
widths=col_widths(all_grobs) + unit(4,"mm"),
heights=row_heights(all_grobs) + unit(4,"mm"))
## add the background
g <- gtable_add_grobs(g, fill, t=rep(seq(2, nr+1), each=nc),
l=rep(seq(2, nc+1), nr), z=0,name="fill")
## draw
grid.newpage()
grid.draw(g)
Sort of a hacky solution based on ggplot2. I don't totally understand how you actually want to map your colors, since in your example the colors in the table are not mapped to the rownames of testdf, but here I've mapped the colors to the value (converted to a factor).
library(reshape2) # for melt()
library(ggplot2)
testdf$color <- rownames(testdf)
dfm <- melt(testdf, id.vars="color")
p <- ggplot(dfm, aes(x=variable, y=color, label=value, fill=as.factor(value))) +
geom_text(colour="black") +
geom_tile(alpha=0.2)
p
You can change which variable the colors are mapped to using fill=, and you can change the mapping itself using scale_fill_manual(values=[a vector of values]).
That said, I'd be curious to see a solution that produces an actual table, rather than a plot masquerading as a table. Possibly using Sweave and LaTeX tables?

How to make the "ylim" of two plots to be exactly the same in R

Background
I have a function called TPN. When you run this function, it produces two plots (see picture below). The bottom-row plot samples from the top-row plot.
Question
I'm wondering how I could fix the ylim of the bottom-row plot to be always (i.e., regardless of the input values) the same as ylim of the top-row plot?
R code is provided below the picture (Run the entire block of code).
############## Input Values #################
TPN = function( each.sub.pop.n = 150,
sub.pop.means = 20:10,
predict.range = 10:0,
sub.pop.sd = .75,
n.sample = 2 ) {
#############################################
par( mar = c(2, 4.1, 2.1, 2.1) )
m = matrix( c(1, 2), nrow = 2, ncol = 1 ); layout(m)
set.seed(2460986)
Vec.rnorm <- Vectorize(function(n, mean, sd) rnorm(n, mean, sd), 'mean')
y <- c( Vec.rnorm(each.sub.pop.n, sub.pop.means, sub.pop.sd) )
set.seed(NULL)
x <- rep(predict.range, each = each.sub.pop.n)
plot(x, y) ## Plot #1
sample <- lapply(split(y, x), function(z) sample(z, n.sample, replace = TRUE))
sample <- data.frame(y = unlist(sample),
x = as.numeric(rep(names(sample), each = n.sample)))
plot(sample$x, sample$y) ## Plot # 2
}
## TEST HERE:
TPN()
par("yaxp")[1:2] returns the positions of the extreme y-axis tick marks of the current plot, which is a close stand-in for its ylim. So, you can change the second plot call to reuse the first plot's values:
plot(sample$x, sample$y, ylim = par("yaxp")[1:2]) ## Plot # 2
Or, as mentioned in the comments, you can compute a single ylim from the range of both data sets and pass it to both plot calls:
ylim = range(c(y, sample$y))
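If the two panels must match exactly rather than approximately, one option is to read the actual plot-region limits from par("usr") after the first plot and pass them to the second plot with the "i" axis style, which stops R from adding its usual 4% padding. The sketch below uses simplified, hypothetical data rather than the TPN function itself.

```r
set.seed(1)
x   <- rep(0:10, each = 50)
y   <- rnorm(length(x), mean = 20 - x, sd = 0.75)
idx <- sample(seq_along(y), 22)         # hypothetical subsample for the second panel

par(mar = c(2, 4.1, 2.1, 2.1))
layout(matrix(1:2, nrow = 2))

plot(x, y)                              # Plot 1
usr1 <- par("usr")                      # c(x1, x2, y1, y2) of plot 1, padding included

plot(x[idx], y[idx],                    # Plot 2
     xlim = usr1[1:2], ylim = usr1[3:4],
     xaxs = "i", yaxs = "i")            # "i": use the limits exactly as given
usr2 <- par("usr")
```

After this, usr1 and usr2 hold the same limits, so both panels share the same x and y extents.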
Another option: produce the same plot again but with type = "n", and then fill in the points with points(). For example, change your plot 2 to
plot(x, y, type = "n")
points(sample$x, sample$y)
A benefit of this approach is that everything in the plot will be exactly the same, not just the y-axis (which may or may not matter for your function).

Label the x axis correctly in a histogram in R

I tried to label the x axis correctly.
hist(InsectSprays$count, col='pink', xlab='Sprays', labels=levels(InsectSprays$spray), xaxt='n')
axis(1, at=unique(InsectSprays$spray), labels=levels(InsectSprays$spray))
But this produces
I want the letters below the bars and not on top.
You have to plot the labels at the histogram bin midpoints. If you drop the axis ticks and keep only the lettering, padj moves the letters closer to where the axis would have been.
h <- hist(InsectSprays$count, plot = FALSE)
plot(h, xaxt = "n", xlab = "Insect Sprays", ylab = "Counts",
main = "", col = "pink")
axis(1, h$mids, labels = LETTERS[1:6], tick = FALSE, padj= -1.5)
I generally think bar plots are better suited to categorical variables. A solution in base R could be, with some rearrangement of the data:
d <- aggregate(InsectSprays$count, by=list(spray=InsectSprays$spray), FUN=sum)
d <- d[order(d$x, decreasing = TRUE),]
t <- d$x
names(t) <- d$spray
barplot(t, las = 1, space = 0, col = "pink", xlab = "Sprays", ylab = "Count")
The output is the following:
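As a shorter sketch of the same idea, tapply() returns the per-spray totals already named by spray, so the separate names() step is not needed. The variable name totals is just illustrative.

```r
# Total count per spray, named by spray level
totals <- tapply(InsectSprays$count, InsectSprays$spray, sum)
# Largest total first
totals <- totals[order(totals, decreasing = TRUE)]
barplot(totals, las = 1, space = 0, col = "pink",
        xlab = "Sprays", ylab = "Count")
```

This also avoids naming a variable t, which masks the base function t() for the rest of the session.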
Since you mentioned a ggplot solution would be nice:
library(ggplot2)
library(dplyr)
InsectSprays %>%
group_by(spray) %>%
summarise(count = sum(count)) %>%
ggplot(aes(reorder(spray, -count),count)) +
geom_bar(stat = "identity", fill = "pink2") +
xlab("Sprays")
The output being:

clusplot - showing variables

I would like to add to a clusplot plot the variables used for pca as arrows. I am not sure that a way has been implemented (I can't find anything in the documentation).
I have produced a clusplot that looks like this:
With the function princomp I can independently plot the observations in an analogous space of representation, with the variables (columns) as arrows:
Is there a way to do the two things at the same time, by showing the clusters and the variables of pca on the same diagram?
I wanted to do the same thing as the OP today and ended up putting pieces from clusplot and biplot together. This is the result, which may be useful if you want to do the same thing:
clusplot2 <- function(dat, clustering, ...) {
clusplot(dat, clustering, ...)
## this is from clusplot.default
pca <- princomp(dat, scores = TRUE, cor = (ncol(dat) != 2))
## this is (adapted) from biplot.princomp
directions <- t(t(pca$loadings[, 1:2]) * pca$sdev[1:2]) * sqrt(pca$n.obs)
## all below is (adapted) from biplot.default
unsigned.range <- function(x) c(-abs(min(x, na.rm = TRUE)),
abs(max(x, na.rm = TRUE)))
x <- predict(pca)[, 1:2]
y <- directions
rangx1 <- unsigned.range(x[, 1L])
rangx2 <- unsigned.range(x[, 2L])
rangy1 <- unsigned.range(y[, 1L])
rangy2 <- unsigned.range(y[, 2L])
xlim <- ylim <- rangx1 <- rangx2 <- range(rangx1, rangx2)
ratio <- max(rangy1/rangx1, rangy2/rangx2)
par(new = TRUE)
col <- par("col")
if (!is.numeric(col))
col <- match(col, palette(), nomatch = 1L)
col <- c(col, col + 1L)
cex <- rep(par("cex"), 2)
plot(y, axes = FALSE, type = "n", xlim = xlim * ratio, ylim = ylim *
ratio, xlab = "", ylab = "", col = col[1L])
axis(3, col = col[2L])
axis(4, col = col[2L])
box(col = col[1L])
text(y, labels = names(dat), cex = cex[2L], col = col[2L])
arrows(0, 0, y[, 1L] * 0.8, y[, 2L] * 0.8, col = col[2L],
length = 0.1)
}
############################################################
library(cluster)
dat <- iris[, 1:4]
clus <- pam(dat, k = 3)
clusplot2(dat, clus$clustering, main = "Test")
Of course there is much room for improvement (as this is just copied together) but I think anyone can easily adapt it if needed.
If you wonder why the arrows (loadings * sdev) are scaled with 0.8 * sqrt(n): I have absolutely no idea. I would have plotted loadings * sdev which should resemble the correlation between the principal components and the variables but this is how biplot does it.
Anyway, this should produce the same arrows as biplot.princomp and use the same pca as clusplot which was the primary goal for me.
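The remark about loadings * sdev resembling the correlations can be checked numerically. The sketch below assumes the PCA is run on the correlation matrix (cor = TRUE, which is what the code above does whenever dat has more than two columns) and compares the scaled loadings with the correlations between the original variables and the component scores. The names load_sd and var_cor are illustrative.

```r
dat <- iris[, 1:4]
pca <- princomp(dat, scores = TRUE, cor = TRUE)

# Loadings of the first two components, each column scaled by its component's sdev
load_sd <- t(t(pca$loadings[, 1:2]) * pca$sdev[1:2])

# Correlations between the original variables and the first two component scores
var_cor <- cor(dat, pca$scores[, 1:2])

print(round(load_sd, 3))
print(round(var_cor, 3))
```

The two matrices agree up to floating-point error, so dropping the extra 0.8 * sqrt(n) factor would give arrows whose coordinates are these correlations; the factor only changes their length, not their direction.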
