ggplot heatmap failing to fill tiles - r

This (minimal, self-contained) example is broken:
require(ggplot2)
min_input = c(1, 1, 1, 2, 2, 2, 4, 4, 4)
input_range = c(4, 470, 1003, 4, 470, 1003, 4, 470, 1003)
density = c(
1.875000e-01,
5.598958e-04,
0.000000e+00,
1.250000e-02,
3.841146e-04,
0.000000e+00,
1.250000e-02,
1.855469e-04,
0.000000e+00)
df = data.frame(min_input, input_range, density)
pdf(file='problemspace.pdf')
ggplot(df, aes(x=min_input, y=input_range, fill=density)) +
  geom_tile()
dev.off()
This produces a plot (image not included) in which the tiles do not fill the plotting area.
Why are there big gaps?

There are gaps because you don't have data for all of the tiles. If you want to fill them in, your only option is to interpolate (assuming you don't have access to additional data). In theory, geom_raster() (a close relative of geom_tile()) supports interpolation. However, according to a GitHub issue, that feature is not currently functional.
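A likely contributing factor, besides the missing combinations, is how geom_tile() sizes tiles when no width or height is given: it appears to use the smallest gap between distinct values on each axis (ggplot2's resolution() with zero = FALSE). Here that is 1 on x and 466 on y, so the tiles around min_input = 4 and input_range = 1003 are smaller than the spacing around them. A minimal sketch, assuming it is acceptable to treat both axes as categorical (no longer to scale), maps them to factors so the tiles sit on an evenly spaced grid:

```r
library(ggplot2)

df <- data.frame(
  min_input   = c(1, 1, 1, 2, 2, 2, 4, 4, 4),
  input_range = c(4, 470, 1003, 4, 470, 1003, 4, 470, 1003),
  density     = c(1.875000e-01, 5.598958e-04, 0, 1.250000e-02, 3.841146e-04, 0,
                  1.250000e-02, 1.855469e-04, 0)
)

# Smallest gap between distinct values on each axis
res_x <- resolution(df$min_input, zero = FALSE)    # gap between 1 and 2
res_y <- resolution(df$input_range, zero = FALSE)  # gap between 4 and 470

# Discrete axes put the three values of each variable at equally spaced
# positions, so every tile touches its neighbours (axes are not to scale)
p <- ggplot(df, aes(x = factor(min_input), y = factor(input_range), fill = density)) +
  geom_tile() +
  labs(x = "min_input", y = "input_range")
print(p)
```

The trade-off is that the distance between 2 and 4 (or 470 and 1003) is no longer shown proportionally; if that matters, interpolation is the way to go.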
As a workaround, however, you can use qplot, which is just a wrapper around ggplot:
qplot(min_input, input_range, data=df, geom="raster", fill=density, interpolate=TRUE)
If there is too much space between the points that you have data for, you will still end up with blank spaces in your graph, but this will extend the range that you can estimate values for.
EDIT:
Based on the example that you posted, this is the resulting plot (image not included).
As you can see, there is a vertical band of white running through the middle, due to the lack of data points between 2 and 4.
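To fill that band, one option is to interpolate onto a regular grid yourself before plotting. A minimal sketch, assuming linear interpolation is acceptable for density and that every (min_input, input_range) combination is present, as in the example: the hypothetical helper interp_grid() uses base R's approx() first along min_input within each input_range, then along input_range within each new min_input column, which amounts to bilinear interpolation on the rectilinear grid. The grid sizes nx and ny are arbitrary choices:

```r
library(ggplot2)

df <- data.frame(
  min_input   = c(1, 1, 1, 2, 2, 2, 4, 4, 4),
  input_range = c(4, 470, 1003, 4, 470, 1003, 4, 470, 1003),
  density     = c(1.875000e-01, 5.598958e-04, 0, 1.250000e-02, 3.841146e-04, 0,
                  1.250000e-02, 1.855469e-04, 0)
)

# Hypothetical helper: linear interpolation onto an evenly spaced nx-by-ny grid
interp_grid <- function(df, nx = 4, ny = 20) {
  xs <- seq(min(df$min_input), max(df$min_input), length.out = nx)
  ys <- seq(min(df$input_range), max(df$input_range), length.out = ny)
  # Step 1: within each observed input_range, interpolate along min_input
  step1 <- do.call(rbind, lapply(split(df, df$input_range), function(d) {
    data.frame(min_input = xs, input_range = d$input_range[1],
               density = approx(d$min_input, d$density, xout = xs)$y)
  }))
  # Step 2: within each new min_input column, interpolate along input_range
  do.call(rbind, lapply(split(step1, step1$min_input), function(d) {
    data.frame(min_input = d$min_input[1], input_range = ys,
               density = approx(d$input_range, d$density, xout = ys)$y)
  }))
}

filled <- interp_grid(df)
# Both axes are now evenly spaced, so geom_raster() covers the whole area
p <- ggplot(filled, aes(min_input, input_range, fill = density)) +
  geom_raster()
print(p)
```

The interpolated values are estimates, not data; a finer grid only makes the picture smoother, it does not add information.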

Related

Fitting a long list of studies in Metafor Forestplot to make it readable

I'm trying to do a forest plot using metafor::forest. The problem is that because I have over 100 studies the output is unreadable because the rows blend together and overlap unless I choose a very small font (in which case you also can't read it).
Any ideas on how to plot this so that the font is large enough to be read without having all rows overlap?
Here's the code I'm using:
library(metafor)
res <- rma(yi, vi, data=dat)
par(mar = c(6, 6, 6, 6))
forest(res, addpred=TRUE, header=TRUE, atransf=exp, at=log(c(.05, 0.5, 5, 15)), xlim=c(-5,5), ylim=c(-3,190), cex=.75)
In the output (image not included) the rows overlap.
The only way to avoid the overlap is with cex = .05, but then, of course, it is impossible to read anything.
I don't know the library or the data and cannot reproduce your example, but try using another output device, for instance a PNG with a much taller canvas:
png("file.png" , width = 480, height = 4000)
plot(runif(100))
dev.off()
Then look at the resulting file in the current working directory (getwd()).
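The point of the taller device is that the space available per row grows with the height. A minimal sketch, assuming a hypothetical helper forest_png_height() and an arbitrary 20 pixels per row, sizes the PNG from the number of studies k; the base-graphics plot is only a stand-in for forest(res, ...), since the original data aren't available:

```r
# Hypothetical helper: give every study row a fixed number of pixels,
# plus a few extra rows for the header, summary and prediction lines
forest_png_height <- function(k, px_per_row = 20, extra_rows = 8) {
  (k + extra_rows) * px_per_row
}

set.seed(1)
k <- 120                          # number of studies (illustrative)
h <- forest_png_height(k)
out <- file.path(tempdir(), "forest_tall.png")

png(out, width = 800, height = h)
par(mar = c(4, 8, 1, 1))
# Stand-in for forest(res, ...): one labelled row per study
est <- rnorm(k)
plot(est, k:1, xlim = c(-4, 4), ylim = c(1, k), yaxt = "n", pch = 15,
     xlab = "Estimate", ylab = "")
axis(2, at = k:1, labels = paste("Study", 1:k), las = 1, cex.axis = 0.8)
dev.off()
```

With metafor, the forest(res, ...) call from the question would go between png() and dev.off() in place of the stand-in.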

Lines don't align in rasterToPolygons

I am having trouble with aligning grids on a plot I made. Basically the plots show the result of a 34x34 matrix where each point has a value of 0,1,2,3 and is colored based on this. The lines which outline the cells do not match up perfectly with the coloring of the cells. My code and image are below.
library(raster)
r<-raster(xmn=1,xmx=34,ymn=1,ymx=34,nrows=34,ncols=34)
data1<-read.csv(file ="mat_aligned.csv",row.names = 1)
numbers<-data.matrix(data1)
r[]<-numbers
breakpoints<-c(-1,0.1,1.1,2.1,3.1)
colors<-c("white","blue","green","red")
plot(r,breaks=breakpoints,col=colors)
plot(rasterToPolygons(r),add=TRUE,border='black',lwd=3)
I would appreciate any help with this!
The problem is that the base R plot and the drawing of the grid use different plotting systems. The polygons will stay constant relative to the plotting window (they will appear narrower as the window shrinks), and won't preserve their relationship to the underlying plot axes, whereas the coloured squares will resize to preserve shape. You'll probably find that you can get your grid to match better by resizing your window, but of course, this isn't ideal.
The best way to get round this is to use the specific method designed for plotting a SpatialPolygonsDataFrame, which is the S4 class produced by rasterToPolygons. This is, after all, how you're "meant" to create such a plot.
Here's a reprex (obviously I've had to make some random data, as yours wasn't shared in the question):
library(raster)
r <- raster(xmn = 1, xmx = 34, ymn = 1, ymx = 34, nrows = 34, ncols = 34)
r[] <- data.matrix(as.data.frame(replicate(34, sample(0:3, 34, TRUE))))
colors <- c("white","blue","green","red")
spplot(rasterToPolygons(r), at = 0:4 - 0.5, col.regions = colors)
Created on 2020-05-04 by the reprex package (v0.3.0)
It is difficult to help if you do not provide a minimal, self-contained, reproducible example. Something like this:
library(raster)
r <- raster(xmn=1,xmx=34,ymn=1,ymx=34,nrows=34,ncols=34)
values(r) <- sample(4, ncell(r), replace=TRUE)
p <- rasterToPolygons(r)
plot(r)
lines(p)
I see what you describe, although the misalignment is minimal. A workaround could be to plot only the polygons:
colors<-c("white","blue","green","red")
plot(p, col=colors[p$layer])
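Both answers rely on mapping the cell values onto the four colours. A small base-R illustration of how the breaks carve up the values; findInterval() is only a stand-in to show the intervals and is not necessarily what spplot() or raster's plot() call internally. It also shows that direct indexing, as in colors[p$layer], needs values starting at 1, so the question's 0 to 3 data would need a +1 shift:

```r
colors <- c("white", "blue", "green", "red")
vals <- c(0, 1, 2, 3, 2, 0)   # made-up cell values in 0..3

# at = 0:4 - 0.5 gives breaks -0.5, 0.5, 1.5, 2.5, 3.5:
# each integer value falls in its own interval
idx_at <- findInterval(vals, 0:4 - 0.5)

# The question's breakpoints assign the same interval to each value
idx_q <- findInterval(vals, c(-1, 0.1, 1.1, 2.1, 3.1))

# Direct indexing needs 1-based values, so shift 0..3 up by one
idx_direct <- vals + 1

colors[idx_at]
```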

Add curve to Lattice barchart

I hope the question is correctly posted. It is probably trivial, but I am still not able to solve it. I checked several options, including the information contained here, but without success. Perhaps I am still not used to Lattice commands, or the problem is actually not relevant.
I would like to overlay a bar chart with a curve, such as a standard normal distribution curve or the density of the data.
Please consider the following data as example, representing the results of several die rolls:
library(lattice)
e11 <- data.frame(freq = rep(seq(1, 6, 1), c(53, 46, 42, 65, 47, 44)))
plot_e11 <- barchart(e11,
  horizontal = FALSE,
  type = "density",
  main = "Die results frequencies",
  panel = function(x, ...){
    panel.barchart(x, ...)
    panel.abline(densityplot(e11$freq))})
print(plot_e11)
It returns the normal barchart instead of the expected result.
How can I add a curve to the barchart, such as the one in the following example?
plot_e11b <- densityplot(e11$freq,
plot.points = FALSE)
panel.abline is the wrong panel function: it adds a line of the form y = a + b * x, or vertical and/or horizontal lines.
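library(lattice)
densityplot(e11$freq,
  panel=function(x, ...) {
    tab <- table(x)
    # bars: relative frequency of each face
    panel.barchart(as.numeric(names(tab)), as.numeric(tab)/length(x),
      horizontal=FALSE)
    # curve: kernel density estimate of the same data
    panel.densityplot(x, plot.points=FALSE)},
  ylim=c(0, 0.3))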
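The question also mentions a normal curve. A minimal sketch of that variant swaps panel.densityplot() for lattice's panel.mathdensity(), assuming a normal curve fitted to the rolls is what's wanted (sample mean and standard deviation; a standard normal with mean 0 and sd 1 would sit mostly to the left of the 1 to 6 range):

```r
library(lattice)

e11 <- data.frame(freq = rep(1:6, c(53, 46, 42, 65, 47, 44)))

p <- densityplot(e11$freq,
  panel = function(x, ...) {
    tab <- table(x)
    # bars: relative frequency of each face
    panel.barchart(as.numeric(names(tab)), as.numeric(tab) / length(x),
                   horizontal = FALSE)
    # curve: normal density with the sample mean and sd of the rolls
    panel.mathdensity(dmath = dnorm,
                      args = list(mean = mean(x), sd = sd(x)),
                      col = "red")
  },
  ylim = c(0, 0.3))
print(p)
```

The ylim of 0.3 works because no face has a relative frequency above 65/297, roughly 0.22.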

Spidergraph in R

The following is some code that produces various spider graphs:
library(fmsb)  # radarchart() is provided by the fmsb package
# Data must be given as the data frame, where the first cases show maximum.
maxmin <- data.frame(
total=c(5, 1),
phys=c(15, 3),
psycho=c(3, 0),
social=c(5, 1),
env=c(5, 1))
# data for radarchart function version 1 series, minimum value must be omitted from above.
RNGkind("Mersenne-Twister")
set.seed(123)
dat <- data.frame(
total=runif(3, 1, 5),
phys=rnorm(3, 10, 2),
psycho=c(0.5, NA, 3),
social=runif(3, 1, 5),
env=c(5, 2.5, 4))
dat <- rbind(maxmin,dat)
op <- par(mar=c(1, 2, 2, 1),mfrow=c(2, 2))
radarchart(dat, axistype=1, seg=5, plty=1, vlabels=c("Total\nQOL", "Physical\naspects",
"Psychological\naspects", "Social\naspects", "Environmental\naspects"),
title="(axis=1, 5 segments, with specified vlabels)")
radarchart(dat, axistype=2, pcol=topo.colors(3), plty=1, pdensity=30, pfcol=topo.colors(3),
title="(topo.colors, fill, axis=2)")
radarchart(dat, axistype=3, pty=32, plty=1, axislabcol="grey", na.itp=FALSE,
title="(no points, axis=3, na.itp=FALSE)")
radarchart(dat, axistype=1, plwd=1:5, pcol=1, centerzero=TRUE,
seg=4, caxislabels=c("worst", "", "", "", "best"),
title="(use lty and lwd but b/w, axis=1,\n centerzero=TRUE, with centerlabels)")
par(op)
The output of the graphs consists of two sets of line segments with different colors. Where did the second set of line segments come from? Also what is a good way to graph multiple items on the same spider graph?
You should mention that you are using the fmsb library to create the graph. The code you show is the example in the documentation. The puzzling thing at first glance is why three sets of lines are shown (not two as you imply with "second set") while there are 5 records in dat.
It is all in that same documentation you took the code from:
row 1 = the maximum values (defined in `maxmin` in the example code)
row 2 = the minimum values (defined in `maxmin` in the example code)
rows 3 to 5 = example data points, each row producing one of the three line segments that you see in the example graphs.
Just read the documentation for radarchart {fmsb} again and play with the numbers in the example as you do so. It should be pretty clear what is happening and what options you have for your own data. You can add as many data-rows and create corresponding lines as you wish. But these do tend to become unreadable if you overdo it.
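For several items on one chart, the same layout applies: maxima, minima, then one row per item. A minimal sketch with made-up scores (the items, values and colours are hypothetical); the plotting call is wrapped in requireNamespace() so the data-preparation part runs even if fmsb isn't installed:

```r
# Axis ranges: row 1 = maxima, row 2 = minima (same as `maxmin` in the question)
maxmin <- data.frame(total = c(5, 1), phys = c(15, 3), psycho = c(3, 0),
                     social = c(5, 1), env = c(5, 1))

# Hypothetical scores, one row per item to be drawn as its own polygon
items <- data.frame(total = c(3, 4, 2), phys = c(10, 12, 8),
                    psycho = c(1, 2.5, 3), social = c(2, 4, 3),
                    env = c(4, 3, 5),
                    row.names = c("Item A", "Item B", "Item C"))

radar_df <- rbind(maxmin, items)   # columns are matched by name
colours <- c("red", "blue", "darkgreen")

if (requireNamespace("fmsb", quietly = TRUE)) {
  fmsb::radarchart(radar_df, pcol = colours, plty = 1,
                   title = "Three items on one chart")
  legend("topright", legend = rownames(items), col = colours,
         lty = 1, bty = "n")
}
```

A legend keyed to the row names helps once there are more than two or three series.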

avoiding over-crowding of labels in r graphs

I am trying to avoid overcrowding of the labels in the following plot:
set.seed(123)
position <- c(rep (0,5), rnorm (5,1,0.1), rnorm (10, 3,0.1), rnorm (3, 4, 0.2), 5, rep(7,5), rnorm (3, 8,2), rnorm (10,9,0.5),
rep (0,5), rnorm (5,1,0.1), rnorm (10, 3,0.1), rnorm (3, 4, 0.2), 5, rep(7,5), rnorm (3, 8,2), rnorm (10,9,0.5))
group <- c(rep (1, length (position)/2),rep (2, length (position)/2) )
mylab <- paste ("MR", 1:length (group), sep = "")
barheight <- 0.5
y.start <- c(group-barheight/2)
y.end <- c(group+barheight/2)
mydf <- data.frame (position, group, barheight, y.start, y.end, mylab)
plot(0,type="n",ylim=c(0,3),xlim=c(0,10),axes=F,ylab="",xlab="")
#Create two horizontal lines
require(fields)
yline(1,lwd=4)
yline(2,lwd=4)
#Create text for the lines
text(10,1.1,"Group 1",cex=0.7)
text(10,2.1,"Group 2",cex=0.7)
#Draw vertical bars
lng = length(position)/2
lg1 = lng+1
lg2 = lng*2
segments(mydf$position[1:lng],mydf$y.start[1:lng],y1=mydf$y.end[1:lng])
segments(mydf$position[lg1:lg2],mydf$y.start[lg1:lg2],y1=mydf$y.end[lg1:lg2])
text(mydf$position[1:lng],mydf$y.start[1:lng]+0.65, mydf$mylab[1:lng], srt = 90)
text(mydf$position[lg1:lg2],mydf$y.start[lg1:lg2]+0.65, mydf$mylab[lg1:lg2], srt = 90)
You can see that some areas are crowded with labels where the x values are the same or similar. I want to display only one label when there are multiple labels at the same point. For example, mydf$position[1:5] are all 0, but the corresponding labels mydf$mylab[1:5] are
MR1 MR2 MR3 MR4 MR5
and I just want to display the first one, "MR1".
Similarly, points that are very close together (say, within 0.35 of each other) should be treated as a single cluster, and only the first label should be displayed. That way I would get rid of the overcrowded labels. How can I achieve this?
If you space the labels out and add some extra lines you can label every marker.
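clpl <- function(xdata, names, y=1, dy=0.25, add=FALSE){
  # sort markers (and their labels) by position
  o <- order(xdata)
  xdata <- xdata[o]
  names <- names[o]
  if(!add) plot(0, type="n", ylim=c(y-1, y+2), xlim=range(xdata), axes=FALSE, ylab="", xlab="")
  abline(h=y, lwd=4)                      # main line at height y
  segments(xdata, y-dy, xdata, y+dy)      # one tick per marker
  # spread the labels evenly across the x range
  tpos <- seq(min(xdata), max(xdata), length.out=length(xdata))
  text(tpos, y+2*dy, names, srt=90, adj=0)
  segments(xdata, y+dy, tpos, y+2*dy)     # leader lines from tick to label
}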
Then using your data:
clpl(mydf$position[lg1:lg2],mydf$mylab[lg1:lg2])
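gives a plot (image not included) in which the labels are spread evenly along the line and joined to their markers by leader lines.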
You could then think about labelling clusters underneath the main line.
I've not given much thought to doing multiple lines in a plot, but I think with a bit of mucking with my code and the add parameter it should be possible. You could also use colour to show clusters. I'm fairly sure these techniques are present in some of the clustering packages for R...
Obviously with a lot of markers even this is going to get smushed, but with a lot of clusters the same thing is going to happen. Maybe you end up labelling clusters with this technique?
In general, I agree with @Joran that cluster labelling can't be automated, but you've said that labelling a group of lines with the first label in the cluster would be OK, so it is possible to automate some of the process.
Putting the following code after the line lg2 = lng*2 gives the result shown in the image below:
clust <- cutree(hclust(dist(mydf$position[1:lng])), h=0.75)
u <- rep(TRUE, length(unique(clust)))   # u[k] stays TRUE until cluster k is labelled
clust.labels <- sapply(1:lng, function(i) {
  if (u[clust[i]]) {
    u[clust[i]] <<- FALSE                # label only the first member of each cluster
    as.character(mydf$mylab)[i]
  } else {
    ""
  }
})
segments(mydf$position[1:lng],mydf$y.start[1:lng],y1=mydf$y.end[1:lng])
segments(mydf$position[lg1:lg2],mydf$y.start[lg1:lg2],y1=mydf$y.end[lg1:lg2])
text(mydf$position[1:lng],mydf$y.start[1:lng]+0.65, clust.labels, srt = 90)
text(mydf$position[lg1:lg2],mydf$y.start[lg1:lg2]+0.65, mydf$mylab[lg1:lg2], srt = 90)
(I've only labelled the clusters on the lower line -- the same principle could be applied to the upper line too). The parameter h of cutree() might have to be adjusted case-by-case to give the resolution of labels that you want, but this approach is at least easier than labelling every cluster by hand.
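If you would rather use the 0.35 threshold from the question directly, a dependency-free alternative to hclust() is a simple gap rule: sort the positions and start a new cluster wherever the gap to the previous point exceeds the threshold. In one dimension this behaves like single-linkage clustering cut at that height, whereas hclust() defaults to complete linkage, so the clusters can differ. The helper thin_labels() is hypothetical; duplicated() keeps the first label (in the original order) of each cluster, replacing the <<- bookkeeping:

```r
# Hypothetical helper: blank out all but the first label in each 1-D cluster
thin_labels <- function(pos, labels, gap = 0.35) {
  o <- order(pos)
  # new cluster wherever consecutive sorted positions differ by more than gap
  cl_sorted <- cumsum(c(TRUE, diff(pos[o]) > gap))
  cl <- integer(length(pos))
  cl[o] <- cl_sorted          # cluster id back in the original order
  ifelse(duplicated(cl), "", as.character(labels))
}

pos <- c(0, 0, 0, 1, 1.1, 3, 3.2, 5)
lab <- paste0("MR", seq_along(pos))
thin_labels(pos, lab)
```

With the question's data it would replace clust.labels, e.g. text(mydf$position[1:lng], mydf$y.start[1:lng]+0.65, thin_labels(mydf$position[1:lng], mydf$mylab[1:lng]), srt = 90).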
