(R) Axis widths in gbm.plot - r

Hoping for some pointers or some experience-based insight, as I'm literally losing my mind over this: I've been trying for 2 full days to set up the right values to have a function spit out clean, simple line plots from the gbm.plot function (packages dismo & gbm).
Here's where I start. bty="n" in par turns off the box & leaves me with only the left & bottom axes. gbm.plot typically spits out one plot per explanatory variable, so usually 6 plots etc., but I'm tweaking it to do one per variable & looping it. I've removed the loop & lots of other code so it's easy to see what's going on.
png(filename = "whatever.png",width=4*480, height=4*480, units="px", pointsize=80, bg="white", res = NA, family="", type="cairo-png")
par(mar=c(2.6,2,0.4,0.5), fig=c(0,1,0.1,1), las=1, bty="n", mgp=c(1.6,0.5,0))
gbm.plot(my_gbm_model,
n.plots=1,
plot.layout = c(1,1),
y.label = "",
write.title=F,
variable.no = 1, #this is part of the multiple plots thing, calls the explanatory variable
lwd=8, #this controls the width of the main result line ONLY
rug=F)
dev.off()
So this is what the starting condition looks like. Aim: make the axes & ticks thicker. That's it.
Putting "lwd=20" in par does nothing.
Adding axes=F into gbm.plot() turns the axes and their numbers off. So I conclude that the control of these axes is handled by gbm.plot, not par. Here's where it gets frustrating and crap. Accepted wisdom from searches says that lwd should control this, but it only controls the wiggly centre line as per my note above. So maybe I could add axis(side=1, lwd=8) into gbm.plot()?
It runs but inexplicably adds a smoother! (which is very thin & hard to see on the web but it's there, I promise). It adds these warnings:
In if (smooth & is.vector(predictors[[j]])) { ... :
the condition has length > 1 and only the first element will be used
Fine, R's going to be a dick for seemingly no reason, I'll keep plugging the leaks as they come up. New code with axis as before and now smoother turned off:
png(filename = "whatever.png",width=4*480, height=4*480, units="px", pointsize=80, bg="white", res = NA, family="", type="cairo-png")
par(mar=c(2.6,2,0.4,0.5), fig=c(0,1,0.1,1), las=1, bty="n", mgp=c(1.6,0.5,0))
gbm.plot(my_gbm_model,
n.plots=1,
plot.layout = c(1,1),
y.label = "",
write.title=F,
variable.no = 1,
lwd=8,
rug=F,
smooth=F,
axis(side=1,lwd=8))
dev.off()
Gives error:
Error in axis(side = 1, lwd = 8) : plot.new has not been called yet
So it's CLEARLY drawing axes within plot since I can't affect the axes from par and I can turn them off in plot. I can do what I want and make one axis bold, but that results in a smoother and warnings. I can turn the smoother off, but then it fails because it says plot.new hadn't been called. And this doesn't even account for the other axis I have to deal with, which also causes the plot.new failure if I call 2 axis sequentially and allow the smoother.
Am I the butt of a big joke here, or am I missing something obvious? It took me long enough to work out that par is supposed to be before all plots unless you're outputting them with png etc in which case it has to be between png & plot - unbelievably this info isn't in ?par. I know I'm going off topic by ranting, sorry, but yeah, 2 full days. Has this been everyone's experience of plotting in R?
I'm going to open the vodka in the freezer. I appreciate I've not put the full reproducible code here, apologies, I can do if absolutely necessary, but it's such a huge timesuck to get to reproducible stage and I'm hoping someone can see a basic logical/coding failure screaming out at them from what I've given.
Thanks guys.
EDIT: reproducibility
core data csv: https://drive.google.com/file/d/0B6LsdZetdypkWnBJVDJ5U3l4UFU
(I've tried to make these data reproducible before and I can't work out how to do so)
samples<-read.csv("data.csv", header = TRUE, row.names=NULL)
my_gbm_model<-gbm.step(data=samples, gbm.x=1:6, gbm.y=7, family = "bernoulli", tree.complexity = 2, learning.rate = 0.01, bag.fraction = 0.5)

Here's what will widen your axis ticks:
..... , lwd.ticks=4 , ...
I predict (on the basis of no testing, because I keep getting errors with the limited code you have provided) that it will be handled correctly either in gbm.plot or in a subsequent axis call. There will need to be a subsequent axis call, two of them in fact (because, as you noted, 'lwd' gets passed around indiscriminately):
png(filename = "whatever.png",width=4*480, height=4*480, units="px", pointsize=80, bg="white", res = NA, family="", type="cairo-png")
par(mar=c(2.6,2,0.4,0.5), fig=c(0,1,0.1,1), las=1, bty="n", mgp=c(1.6,0.5,0))
gbm.plot(my_gbm_model,
n.plots=1,
plot.layout = c(1,1),
y.label = "",
write.title=F,
variable.no = 1,
lwd=8,
rug=F,
smooth=F,
axes=F) # suppress the default axes so they can be redrawn below
axis(1, lwd.ticks=4, lwd=4)
# the only way to prevent `lwd` from also affecting plot line
axis(2, lwd.ticks=4, lwd=4)
dev.off()
This is what I see with a simple example:
png(); Speed <- cars$speed
Distance <- cars$dist
plot(Speed, Distance,
panel.first = lines(stats::lowess(Speed, Distance), lty = "dashed"),
pch = 0, cex = 1.2, col = "blue", axes=FALSE)
axis(1, lwd.ticks=4, lwd=4)
axis(2, lwd.ticks=4, lwd=4)
dev.off()
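One plausible reading of the symptoms in the question, offered as an assumption rather than something checked against the dismo source: an unnamed axis(...) written inside the argument list is not run as a statement after plotting. R binds its value to the first formal argument that has not been matched by name, which would explain a smoother appearing out of nowhere. The sketch below uses a hypothetical toy_plot() whose leading arguments only mimic that kind of signature.

```r
# Hypothetical stand-in for a plotting function whose leading arguments are
# (model, variable.no, smooth, rug, ...); it only reports what it received.
toy_plot <- function(model, variable.no = 0, smooth = FALSE, rug = TRUE, ...) {
  list(smooth = smooth, rug = rug, extra = list(...))
}

# Stands in for the value returned by a stray call written inside the
# argument list (axis() returns its tick positions invisibly).
stray <- c(0, 5, 10)

# `smooth` is not named here, so the unnamed value is matched to it by position.
res1 <- toy_plot("model", variable.no = 1, rug = FALSE, stray)

# With `smooth` named as well, the unnamed value falls through to `...`.
res2 <- toy_plot("model", variable.no = 1, rug = FALSE, smooth = FALSE, stray)

str(res1$smooth)  # the stray vector ended up in `smooth`
str(res2$extra)   # the stray vector ended up in `...`
```

Either way the stray call never acts on the finished plot, which is why the axis() calls belong after the plotting call and before dev.off().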

Related

R: plot() Function with type="h" Misrepresents Small Numbers ( For Larger Values of "lwd" )

I am trying to generate a plot showing the probabilities of a Binomial(10, 0.3) distribution.
I'd like to do this in base R.
The following code is the best I have come up with,
plot(dbinom(1:10, 10, 0.3), type="h", lend=2, lwd=20, yaxs="i")
My issue with the above code is the small numbers get disproportionately large bars. (See below) For example P(X = 8) = 0.00145 but the height in the plot looks like about 0.025.
It seems to be an artifact created by wanting wider bars, if the lwd = 20 argument is removed you get tiny bars but their heights seem to be representative.
I think the problem is your choice of lend (line-end) parameter. The 'round' (0) and 'square' (2) choices extend the line a little beyond the end of each segment, which is what you want when adjacent segments should join nicely as part of a connected line (see example below). With lwd=20 that extension is roughly half the line width at each end, so it adds a visible amount to the height of every bar; 'butt' (1) ends the line exactly at the data value.
f <- function(le) plot(dbinom(1:10, 10, 0.3),
type="h", lend = le, lwd=20, yaxs="i", main = le)
par(mfrow=c(1,3))
invisible(lapply(c("round", "butt", "square"), f))
"round", "butt", and "square" could also be specified (less mnemonically) as 0, 1, and 2 ...
x <- 1:5; y <- c(1,4,2,3,5)
f2 <- function(le) {
plot(x,y, type ="n", main = le)
segments(x[-length(x)], y[-length(x)], x[-1], y[-1],
lwd = 20, lend = le)
}
par(mfrow=c(1,3))
invisible(lapply(c("round", "butt", "square"), f2))
Here you can see that the round end caps work well for connected segments, while both 'butt' and 'square' have issues. (I can't think offhand of a use case for "square", but I'm sure one exists ...) There is a good description of line-drawing parameters here (although it also doesn't suggest use cases ...)
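A minimal sketch of the fix applied to the binomial plot, assuming the goal is bars whose tops sit exactly at the probabilities. It uses lend = "butt" and plots x = 0:10 explicitly (the call in the question starts at 1, so P(X = 0) is left out). The off-screen pdf(NULL) device is only there so the snippet runs anywhere.

```r
x <- 0:10
p <- dbinom(x, size = 10, prob = 0.3)

pdf(NULL)  # off-screen device; swap for png("binom.png") to keep the figure
plot(x, p, type = "h", lend = "butt", lwd = 20,
     yaxs = "i", ylim = c(0, max(p) * 1.05),
     xlab = "x", ylab = "P(X = x)")
invisible(dev.off())
```

If wide bars are the main requirement, barplot() is another base R option, since it draws rectangles rather than thick lines and so has no end caps at all.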

plot function in R producing legend without legend() being called

I'm trying to produce a cumulative incidence plot for a competing hazards survival analysis using plot() in R. For some reason, the plot that is produced has a legend that I have not called. The legend is intersecting with the lines on my graph and I can't figure out how to get rid of it. Please help!
My code is as follows:
CompRisk2 <- cuminc(ftime=ADI$time_DeathTxCensor, fstatus=ADI$status, group=ADI$natADI_quart)
cols <- c("darkorange","coral1","firebrick1","firebrick4","lightskyblue","darkturquoise","dodgerblue","dodgerblue4")
par(bg="white")
plot(CompRisk2,
col=cols,
xlab="Years",
ylab="Probability of Mortality or Transplant",
xlim=c(0,10),
ylim=c(0,0.6))
Which produces the following plot:
I tried adding the following code to move the legend out of the frame, but I got an error:
legend(0,5, legend=c(11,21,31,41,12,22,32,42),
col=c("darkorange","coral1","firebrick1","firebrick4","lightskyblue","darkturquoise","dodgerblue","dodgerblue4"),
lty=1:2, cex=0.8, text.font=4, box.lty=0)
Error: Error in title(...) : invalid graphics parameter
Any help would be much appreciated!
You are using the cuminc function from the cmprsk package. This produces an object of class cuminc, which has an S3 plot method. ?plot.cuminc shows you the documentation and typing plot.cuminc shows you the code.
There is some slightly obscure code that suggests a workaround:
u <- list(...)
if (length(u) > 0) {
i <- pmatch(names(u), names(formals(legend)), 0)
do.call("legend", c(list(x = wh[1], y = wh[2], legend = curvlab,
col = color, lty = lty, lwd = lwd, bty = "n", bg = -999999),
u[i > 0]))
}
This says that any additional arguments passed in ... whose names match the names of arguments to legend will be passed to legend(). legend() has a plot argument:
plot: logical. If ‘FALSE’, nothing is plotted but the sizes are returned.
So it looks like adding plot=FALSE to your plot() command will work.
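The filtering step in that excerpt can be tried on its own. The sketch below is not the cmprsk code; it only replays the pmatch() idiom on a made-up argument list u, and then shows what legend(plot = FALSE) does in plain base R.

```r
# Arguments a caller might pass through `...` (made up for illustration)
u <- list(plot = FALSE, xlab = "Years", cex = 0.8)

# Same filter as in the excerpt: keep names that match formals of legend()
i <- pmatch(names(u), names(formals(legend)), 0)
forwarded <- u[i > 0]
names(forwarded)   # "plot" "cex"; "xlab" is not a legend() argument

# legend(plot = FALSE) draws nothing and returns the computed geometry
pdf(NULL)          # off-screen device so the sketch runs anywhere
plot(1:10)
geom <- legend("topright", legend = c("a", "b"), lty = 1:2, plot = FALSE)
invisible(dev.off())
names(geom)        # "rect" "text"
```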
In principle you could try looking at the other arguments to legend() and see if any of them will adjust the legend position/size as you want. Unfortunately the x argument to legend (which would determine the horizontal position) is masked by the first argument to plot.cuminc.
I don't think that the ellipsis arguments are intended for the legend call inside plot.cuminc. The code offered in Ben's answer suggests that there might be a wh argument that determines the location of the legend. It is not named within the parameters as "x" in the code he offered, but is rather given as a positionally-defined argument. If you look at the plot.cuminc function you do in fact find that wh is documented.
I cannot test this because you have not offered us access to the ADI-object but my suggestion would be to try:
opar <- par(xpd=TRUE) # xpd lets graphics be placed 'outside'
plot(CompRisk2,
col=cols, wh=c(-.5, 7),
xlab="Years",
ylab="Probability of Mortality or Transplant",
xlim=c(0,10),
ylim=c(0,0.6))
par(opar) # restores original graphics parameters
It's always a bit risky to put out a code chunk without testing, but I'm happy to report that I did find a suitable test and it seems to work reasonably as predicted. Using the code below on the object from a prior SO question about using the gg-packages for cmprsk:
library(cmprsk)
# some simulated data to get started
comp.risk.data <- data.frame("tfs.days" = rweibull(n = 100, shape = 1, scale = 1)*100,
"status.tfs" = c(sample(c(0,1,1,1,1,2), size=50, replace=T)),
"Typing" = sample(c("A","B","C","D"), size=50, replace=T))
# fitting a competing risks model
CR <- cuminc(ftime = comp.risk.data$tfs.days,
fstatus = comp.risk.data$status.tfs,
cencode = 0,
group = comp.risk.data$Typing)
opar <- par(xpd=TRUE) # xpd lets graphics be placed 'outside'
plot(CR,
wh=c(-15, 1.1), # obviously different than the OP's coordinates
xlab="Years",
ylab="Probability of Mortality or Transplant",
xlim=c(0,400),
ylim=c(0,1))
par(opar) # restores graphics parameters
I get the legend to move up and leftward from its original position.
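The xpd idea is not specific to plot.cuminc. As a rough base R illustration (made-up lines, a widened right margin, and a negative inset are all assumptions of this sketch), a legend can be pushed past the plot region like this:

```r
pdf(NULL)                                     # off-screen device for the sketch
opar <- par(mar = c(5, 4, 2, 8), xpd = TRUE)  # wide right margin, no clipping
plot(1:10, type = "l", xlab = "Years", ylab = "Probability")
lines(10:1, lty = 2)
usr <- par("usr")                             # plot-region limits in user units
# A negative horizontal inset pushes the legend past the right-hand edge
lg <- legend("topright", inset = c(-0.35, 0),
             legend = c("group A", "group B"), lty = 1:2, bty = "n")
par(opar)                                     # restore the previous settings
invisible(dev.off())
```

The value returned by legend() holds the box geometry, so lg$rect$left + lg$rect$w can be compared with usr[2] to confirm the box extends beyond the plot region.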

How to put statistical information on the output diagram with hist()?

I made the following histogram. I want to display a stats box with a border on it, like the one in the image on this site, and write various information there. In addition, I expect to add legends for multiple histograms as well.
As a result of research, I found that ROOT has a similar function, but I would like to realize this with R. How can I do this with R?
The xlsx file I used is large, so I uploaded it here.
library(readxl)
file = read_excel("./data.xlsx")
summary(file)
data = file[["foo[%]"]]
par(las=1, family="Century gothic", xaxs="i", yaxs="i", cex.main=3, cex.lab=1.1, cex.axis=1.1, font.lab=2, font.axis=2)
h = hist(data, yaxt="n", tck=0.03, breaks=seq(18,18.35,0.01), main=NA, xlab="xaxis", ylab="freq", border=NA, col="#00000000", ylim=c(0,100))
axis(2, tck=0.03, at=c(0,10,20,30,40,50,60,70,80,90,100))
grid()
lines(rep(h$breaks, each=2)[-c(1,2*length(h$breaks))], rep(h$counts, each=2), lwd=3)
box()
@jay.sf's suggestion is a little tricky to implement. Here's an example that comes close, but it's not perfect: the spacing is wrong. Maybe someone can improve it?
statname <- expression("Entries", "Mean", "Std Dev", chi^2/"ndf", "Prob", "Constant", "Slope")
statvalue <- expression(5000, 19.59, 18.61, 131.1/115, 0.1447, 5.54 %+-% 0.04,
-0.0514 %+-% 0.0011)
plot(1)
legend("topright", title = "hB",
legend = c(statname, statvalue),
ncol = 2,
x.intersp = 0)
Created on 2021-02-21 by the reprex package (v0.3.0)
It uses expression() for the entries because you have math in them; you might be able to do it reasonably well with special characters instead. I couldn't find a way to set the spacing of the two columns differently: your example had the names flush left and the values flush right. Nor could I set the column widths separately. Probably if you really want to fine tune this, you'll have to write your own function based on the legend function.
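As a rough idea of what such a hand-written function could look like, here is a sketch that skips legend() entirely and draws the box with rect() and text(). The helper name stat_box, its arguments, and the simulated data are all hypothetical; it assumes linear axes, plain character labels (no plotmath), and an already open plot.

```r
# Hypothetical helper: bordered box in the top-right corner, labels flush
# left and values flush right. Assumes linear axes and an open plot.
stat_box <- function(labels, values, title = NULL, cex = 1) {
  usr  <- par("usr")                        # x1, x2, y1, y2 of the plot region
  lh   <- 1.6 * strheight("M", cex = cex)   # row height in user units
  pad  <- strwidth("M", cex = cex)          # horizontal padding
  rows <- length(labels) + (if (is.null(title)) 0 else 1)
  w <- max(strwidth(labels, cex = cex)) +
       max(strwidth(values, cex = cex)) + 4 * pad
  right  <- usr[2] - pad
  top    <- usr[4] - lh / 2
  left   <- right - w
  bottom <- top - (rows + 0.5) * lh
  rect(left, bottom, right, top, col = "white", border = "black")
  y <- top - lh * (seq_len(rows) - 0.25)    # vertical centre of each row
  if (!is.null(title)) {
    text((left + right) / 2, y[1], title, cex = cex, font = 2)
    y <- y[-1]
  }
  text(left + pad,  y, labels, adj = c(0, 0.5), cex = cex)  # flush left
  text(right - pad, y, values, adj = c(1, 0.5), cex = cex)  # flush right
  invisible(c(left = left, bottom = bottom, right = right, top = top))
}

pdf(NULL)                                   # off-screen device for the sketch
set.seed(1)
x <- rnorm(5000, mean = 18.2, sd = 0.05)    # made-up data, not the poster's file
hist(x, main = NA, xlab = "xaxis", ylab = "freq")
box()
b <- stat_box(c("Entries", "Mean", "Std Dev"),
              c(format(length(x)), format(mean(x), digits = 4),
                format(sd(x), digits = 3)),
              title = "hB")
invisible(dev.off())
```

Because the two columns are placed with separate text() calls, their alignment and the column gap can be tuned independently, which is the part legend() does not expose.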

R: PCA plot with different colors for Sites

I'm currently trying to analyse my data and want to make the graphs a little nicer, but I'm failing at this.
So I have a data set with 144 sites and 5 environmental variables. It's basically about the substrate composition around an island and the fish abundance. On this island there is supposed to be a difference in the substrate composition between the north and the south side. Right now I am doing a PCA, and with the biplot function it works quite fine, but I would like to change the plot a bit.
I need one where the sites are just points and not numbered, arrows point to the different variables, and the sites are colored according to their location (north or south side). So I tried everything I could find.
Most examples were with the dune data and suggested something like this:
library(vegan)
library(biplot)
data(dune)
mod <- rda(dune, scale = TRUE)
biplot(mod, scaling = 3, type = c("text", "points"))
So according to this I would just need to say text and points, and R would label the variables and just make points for the sites. When I do this, however, I get the error:
Error in plot.default(x, type = "n", xlim = xlim, ylim = ylim, col = col[1L], :
formal argument "type" matched by multiple actual arguments
No idea how to get around this.
So next strategy I found, is to make a plot manually like this:
require("vegan")
data(dune, dune.env)
mod <- rda(dune, scale = TRUE)
scl <- 3 ## scaling == 3
colvec <- c("red2", "green4", "mediumblue")
plot(mod, type = "n", scaling = scl)
with(dune.env, points(mod, display = "sites", col = colvec[Use],
scaling = scl, pch = 21, bg = colvec[Use]))
text(mod,display="species", scaling = scl, cex = 0.8, col = "darkcyan")
with(dune.env, legend("bottomright", legend = levels(Use), bty = "n",
col = colvec, pch = 21, pt.bg = colvec))
This works fine so far as well: I get different colors and points, but now the arrows are missing. I found that this should be easy to correct if I just put display="bp" in the text line. But this doesn't work either. Every time I put "bp", R says:
Error in match.arg(display) :
argument "display" is missing, with no default
So I'm kind of desperate now. I looked through all the answers here and I don't understand why display="bp" and type=c("text","points") are not working for me.
If anyone has an idea I would be super grateful.
https://www.dropbox.com/sh/y8xzq0bs6mus727/AADmasrXxUp6JTTHN5Gr9eufa?dl=0
This is the link to my dropbox folder. It contains my R-script and the csv files. The one named environmentalvariables_Kon1 also contains the data about north and southside.
So yeah... if anyone could help me, that would be awesome. I really don't know what to do anymore.
Best regards,
Nancy
You can add arrows with arrows(). See the code for vegan:::biplot.rda to see how it works in the original function.
With your plot, add
g <- scores(mod, display = "species", scaling = scl) # same scaling as the plot, so arrows line up with the labels
len <- 1
arrows(0, 0, len * g[, 1], len * g[, 2], length = 0.05, col = "darkcyan")
You might want to adjust the value of len to make the arrows longer.
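For anyone without vegan at hand, the same layout (points coloured by group, arrows for the variables, a legend) can be sketched with base R alone. This swaps rda() for prcomp() and uses the built-in iris data as a stand-in, with Species playing the role of the north/south grouping; none of it comes from the poster's files.

```r
pca <- prcomp(iris[, 1:4], scale. = TRUE)  # iris stands in for the site data
grp <- iris$Species                         # stands in for north/south side
colvec <- c("red2", "green4", "mediumblue")

pdf(NULL)                                   # off-screen device for the sketch
# Sites as plain coloured points on the first two principal components
plot(pca$x[, 1:2], col = colvec[grp], pch = 19, asp = 1)
rot <- pca$rotation[, 1:2]                  # variable loadings on PC1 and PC2
len <- 2                                    # arrow stretch factor, tune by eye
arrows(0, 0, len * rot[, 1], len * rot[, 2], length = 0.05, col = "darkcyan")
text(len * 1.1 * rot, labels = rownames(rot), col = "darkcyan", cex = 0.8)
legend("topright", legend = levels(grp), col = colvec, pch = 19, bty = "n")
invisible(dev.off())
```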

Inconsistent results saving png() and jpeg() in R

I am saving some complicated graphs off in an R program that include plot(), lines(), points() and abline() function calls and have tried using both png() and jpeg(), but both are rendering very inconsistent results. In one run the grid will be saved in the background, in the next it will not. In one run, the points will be added at the correct lwd, in another they will be huge, or maybe not added at all. In another run, a line will be added, and then disappear when I run it again. I am looping through hundreds of iterations, and getting different results with almost every run.
png(paste("/someFilePlace/pics/", propIn, ".png", sep = ""))
plot(plotDat$yhat, col = "white", ylim = c(0,max(plotDat$yhat)*1.1),xaxt='n')
fairlylightgray <- rgb(204/255, 204/255, 204/255, alpha=0.4)
abline(v=(seq(0,1700,100)), col=fairlylightgray, lty="dotted")
abline(h=(seq(0,10,0.5)), col=fairlylightgray, lty="dotted")
points(plotDat$y, cex = '*', lwd = 3, col= "gray")
lines(plotDat$yhat, col = "#08519C")
axis(1, at = c(1,500,1000,1500),
labels = c(plotDat$dt[1],plotDat$dt[500],plotDat$dt[1000],plotDat$dt[1500]))
dev.off()
Congratulations, I think you may have found an obscure almost-bug (at least, failure to intercept a user error). Try replacing cex="*" with something sensible in your code (it should be a number -- or maybe you meant pch="*").
I am able to get different results in different PNGs as follows (if I plot in an X11 window I can get funny things to happen as I resize the window).
for (i in 1:10) {
png(paste("tmp",i,"png", sep="."))
plot(1:10,1:10,cex="*");
dev.off()
}
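Since base graphics does not seem to reject a non-numeric cex here, one defensive option is to check it yourself before drawing. The wrapper below is a hypothetical helper (draw_points is not a base R function), shown with pch = "*" and a numeric cex, which is probably what was intended in the question.

```r
# Hypothetical guard: refuse a non-numeric `cex` before it reaches points()
draw_points <- function(y, cex = 1, ...) {
  if (!is.numeric(cex) || anyNA(cex) || any(cex <= 0)) {
    stop("`cex` must be a positive number; use `pch` to choose the symbol")
  }
  points(y, cex = cex, ...)
}

out <- tempfile(fileext = ".png")   # temporary file standing in for the real path
png(out)
plot(1:10, type = "n")
draw_points(1:10, cex = 1.5, pch = "*", col = "gray")
invisible(dev.off())
```

Calling draw_points(1:10, cex = "*") now stops with an error instead of producing a different picture on each run.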
