How to set x limits on varImpPlot - r

How can I change the x limits of a plot produced by varImpPlot from the randomForest package?
If I try
library(randomForest)
set.seed(4543)
data(mtcars)
mtcars.rf <- randomForest(mpg ~ ., data=mtcars, ntree=1000, keep.forest=FALSE,
                          importance=TRUE)
varImpPlot(mtcars.rf, scale=FALSE, type=1, xlim=c(0,15))
I get the following error:
Error in dotchart(imp[ord, i], xlab = colnames(imp)[i], ylab = "", main = if (nmeas == : formal argument "xlim" matched by multiple actual arguments
This is because varImpPlot defines its own x limits, I think, but how could I get around this if I wanted to set the x limits myself (perhaps for consistency across plots)?

First I extracted the values using importance() (thanks to the suggestion from @dww):
impToPlot <- importance(mtcars.rf, scale=FALSE)
Then I plotted them using dotchart(), which allowed me to manually set the x limits (and any other plot features I'd like)
dotchart(sort(impToPlot[,1]), xlim=c(0,15), xlab="%IncMSE")
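For consistency across several plots, the two steps above could be wrapped in a small helper. This is only a sketch: plot_importance is a hypothetical name (not part of randomForest), and imp_demo is a made-up stand-in for the matrix that importance() returns, used so the block runs without fitting a forest.

```r
# Hypothetical helper: draw one importance measure with fixed x limits.
# `imp` is a matrix shaped like the result of importance():
# one row per predictor, one column per importance measure.
plot_importance <- function(imp, measure = 1, xlim = c(0, 15), ...) {
  vals <- sort(imp[, measure])            # smallest at the bottom of the chart
  dotchart(vals, xlim = xlim, xlab = colnames(imp)[measure], ...)
  invisible(vals)                         # return the plotted values
}

# Stand-in for importance(mtcars.rf, scale = FALSE); the numbers are made up.
imp_demo <- matrix(c(2.1, 7.4, 4.9, 0.8, 3.0, 1.2), ncol = 2,
                   dimnames = list(c("cyl", "disp", "hp"),
                                   c("%IncMSE", "IncNodePurity")))

vals <- plot_importance(imp_demo, xlim = c(0, 15))
```

With a real model, calling plot_importance(importance(mtcars.rf, scale=FALSE), xlim=c(0,15)) for each forest should give every plot the same axis.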

Related

ggsurvplot - axes crossing at 0,0

Survminer produces nice plots, but is there a way to further change the outcome with regular ggplot-commands?
What I try to do is make the y-axis start in the origin, as stated here.
For a regular ggplot, this works perfectly, but I can't make it work with survminer:
library(survival)
library(survminer)
df<-genfan
df$treat<-sample(c(0,1),nrow(df),replace=TRUE)
fit <- survfit(Surv(hours, status) ~ treat,
data = df)
p<-ggsurvplot(fit, data = df, risk.table = TRUE, ncensor.plot=FALSE, tables.theme = theme_cleantable(),
ggtheme = theme_survminer(font.legend=c(12,"bold","black") ))
p%+%scale_x_continuous(expand=c(0,0))%+%scale_y_continuous(expand=c(0,0))
This produces an error
"Scale for 'y' is already present. Adding another scale for 'y', which will replace the existing scale."
and an additional error
"Error: Discrete value supplied to continuous scale"
Is there some way around this?
A possible solution to your problem is to modify ggsurvplot and ggrisktable, inserting expand=c(0,0) in the proper place.
At this link you can find my proposal.
Click on "download" and save the file a_modified_ggsurvplot.txt in your working directory.
Then run the following code:
library(survival)
library(survminer)
df <- genfan
df$treat <- sample(c(0,1),nrow(df),replace=TRUE)
fit <- survfit(Surv(hours, status) ~ treat, data = df)
source("a_modified_ggsurvplot.txt")
p <- myggsurvplot(fit, data = df, risk.table = TRUE,
ncensor.plot=FALSE,tables.theme = theme_cleantable(),
ggtheme = theme_survminer(font.legend=c(12,"bold","black")))
print(p)
Here is the plot:
Hope it could help you.

Change x and y labels on a gbm partial plot

I am having trouble changing the x and y labels on a partial plot for a gbm model. I need to rename them for the journal article.
I read this in and create the plot as follows:
library(gbm)
final<- readRDS(final_gbm_model)
summary(final, n.trees=final$n.trees)
Here is the summary output:
                                var                               rel.inf
ProbMn50ppb                     ProbMn50ppb                     11.042750
ProbDOpt5ppm                    ProbDOpt5ppm                     7.585275
Ngw_1975                        Ngw_1975                         6.314080
PrecipMinusETin_1971_2000_GWRP  PrecipMinusETin_1971_2000_GWRP   4.988598
N_total                         N_total                          4.776950
DTW60YrJurgens                  DTW60YrJurgens                   4.415016
CVHM_TextZone                   CVHM_TextZone                    4.225048
RiverDist_NEAR                  RiverDist_NEAR                   4.165035
LateralPosition                 LateralPosition                  4.036406
CAML1990_natural_water          CAML1990_natural_water           3.720303
PctCoarseMFUpActLayer           PctCoarseMFUpActLayer            3.668184
BioClim_BIO12                   BioClim_BIO12                    3.561071
MFDTWSpr2000Faunt               MFDTWSpr2000Faunt                3.383900
PBot_krig                       PBot_krig                        3.362289
WaterUse2                       WaterUse2                        3.291040
AVG_CLAY                        AVG_CLAY                         3.280454
Age_yrs                         Age_yrs                          3.144734
MFVelSept2000                   MFVelSept2000                    3.064030
AVG_SILT                        AVG_SILT                         2.882709
ScreenLength                    ScreenLength                     2.683542
HydGrp_C                        HydGrp_C                         2.666106
AVG_POR                         AVG_POR                          2.563147
MFVelFeb2000                    MFVelFeb2000                     2.505106
HiWatTabDepMin                  HiWatTabDepMin                   2.421521
RechargeAnnualmmWolock          RechargeAnnualmmWolock           2.252706
I can create a partial dependence plot as follows:
plot(final,"ProbMn50ppb",n.trees=final$n.trees)
But if I try to set the label arguments I get the following error:
plot(final,"ProbMn50ppb",n.trees=final$n.trees,ylab="LNNO3")
Error in plot.default(X$X1, X$y, type = "l", xlab = x$var.names[i.var], :
formal argument "ylab" matched by multiple actual arguments
How can I change the y and x axis labels?
The plot.gbm function passes its own xlab and ylab to the generic plot function, so any label you supply collides with the one it already sets. You will not be able to customize the plot the way you want in that mode. The authors did provide an alternative, though: when you set return.grid=TRUE, it outputs the data itself instead of building a plot. You can then use that data for any plot, including ggplot2.
plotdata <- plot(gbm1, return.grid=TRUE)
plot(plotdata, type="l", ylab="ylab", xlab="xlab")
Here gbm1 is the example model from help(gbm).
You can also change the gbm object itself before plotting (or in a function):
your_gbm_obj$var.names[index] = "axis label"

Error using bquote() for axis labelling

I'm experiencing a strange error when using the function bquote for axis labeling. The error only occurs when applying the label (containing the Greek symbol "mu") to the y-axis:
df <- data.frame(x=1:10, y=1:10)
plot(y~x, df, t="l", xlab=bquote(.("Size [")*mu*m*.("]"))) # works
plot(y~x, df, t="l", ylab=bquote(.("Size [")*mu*m*.("]"))) # doesn't work
# Error in plot.default(1:10, 1:10, ylab = "Size [" * mu * m * "]", xlab = quote("x"), : object 'mu' not found
I know I could use expression as an alternative in this case, but I'm trying to understand the error.
This is due to subtleties of evaluation rules and the specifics of the implementation of this plotting function.
Note that this does not occur when not using the formula interface
plot(df$x,df$y, type="l", ylab=bquote(.("Size [")*mu*m*.("]"))) #works as you expect
To see what is happening, examine the source
getAnywhere("plot.formula")
and you'll see the equivalent of this simplified example
plotex<-function(x,y,type="l",ylab,...) {
m=match.call(expand.dots = FALSE)
dots <- lapply(m$..., eval)
dots$xlab <- enquote(dots$xlab)
do.call(plot,c(list(x=x,y=y,type=type,ylab=ylab),dots))
}
The xlab argument is in ... and protected against evaluation with an explicit enquote. The ylab is a named parameter and its evaluation is forced by inclusion in the list provided to do.call.
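The difference can be reproduced without any plotting. The following is a minimal sketch in which show_class is a hypothetical stand-in for plot() that only reports the class of its argument; it illustrates how do.call treats an unprotected language object versus one wrapped in enquote.

```r
# The label from the question: an unevaluated call to `*`.
lab <- bquote(.("Size [") * mu * m * .("]"))

# Hypothetical stand-in for plot(): it only reports the class of `ylab`.
show_class <- function(ylab) class(ylab)

# Unprotected: do.call() splices the call into the constructed call, so
# forcing the argument tries to evaluate "Size [" * mu * m * "]".
unprotected <- tryCatch(
  do.call(show_class, list(ylab = lab)),
  error = function(e) conditionMessage(e)
)

# Protected: enquote() wraps the call in quote(), which evaluates back to
# the original call instead of evaluating its contents.
protected <- do.call(show_class, list(ylab = enquote(lab)))

print(unprotected)  # an error message mentioning `mu`
print(protected)
```

The first call should fail the same way the ylab case does in the question, while the second receives the label as an unevaluated call.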

Visualize data using histogram in R

I am trying to visualize some data and in order to do it I am using R's hist.
Below are my data
jancoefabs <- as.numeric(as.vector(abs(Janmodelnorm$coef)))
jancoefabs
[1] 1.165610e+00 1.277929e-01 4.349831e-01 3.602961e-01 7.189458e+00
[6] 1.856908e-04 1.352052e-05 4.811291e-05 1.055744e-02 2.756525e-04
[11] 2.202706e-01 4.199914e-02 4.684091e-02 8.634340e-01 2.479175e-02
[16] 2.409628e-01 5.459076e-03 9.892580e-03 5.378456e-02
Now as the more cunning of you might have guessed these are the absolute values of some model's coefficients.
What I need is a histogram with the following axes:
x will be the number (count or length) of coefficients which is 19 in total, along with their names.
y will show values of each column (as breaks?) having a ylim="" set, according to min and max of those values (or something similar).
Note that Janmodelnorm$coef simply produces the following
  (Intercept)           LON           LAT            ME           RAT
 1.165610e+00 -1.277929e-01 -4.349831e-01 -3.602961e-01 -7.189458e+00
           DS           DSA           DSI          DRNS          DREW
-1.856908e-04  1.352052e-05  4.811291e-05 -1.055744e-02 -2.756525e-04
        ASPNS         ASPEW            SI           CUR     W_180_270
-2.202706e-01 -4.199914e-02  4.684091e-02 -8.634340e-01 -2.479175e-02
      W_0_360      W_90_180       W_0_180          NDVI
 2.409628e-01  5.459076e-03 -9.892580e-03 -5.378456e-02
So far, after consulting ?hist, I have been playing with the code below without success, so I am starting from scratch.
# hist(jancoefabs, col="lightblue", border="pink",
# breaks=8,
# xlim=c(0,10), ylim=c(20,-20), plot=TRUE)
When plot=FALSE is set, I get a bunch of somewhat useful info about the set. I also find it hard to use the breaks argument efficiently.
Any suggestion will be appreciated. Thanks.
Rather than using hist, why not use a barplot or a standard plot? For example:
## Generate some data
set.seed(1)
y = rnorm(19, sd=5)
names(y) = c("Inter", LETTERS[1:18])
Then plot the coefficients
barplot(y)
Alternatively, you could use a scatter plot
plot(1:19, y, axes=FALSE, ylim=c(-10, 10))
axis(2)
axis(1, 1:19, names(y))
and add error bars to indicate the standard errors (see for example Add error bars to show standard deviation on a plot in R)
Are you sure you want a histogram for this? A lattice barchart might be pretty nice. An example with the mtcars built-in data set.
coef <- lm(mpg ~ ., data = mtcars)$coef
library(lattice)
barchart(coef, col = 'lightblue', horizontal = FALSE,
         ylim = range(coef), xlab = '',
         scales = list(y = list(labels = coef),
                       x = list(labels = names(coef))))
A base R dotchart might be good too,
dotchart(coef, pch = 19, xlab = 'value')
text(coef, seq(coef), labels = round(coef, 3), pos = 2)
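Applied to the coefficients from the question, the barplot suggestion might look like the sketch below. The values are copied from the Janmodelnorm$coef output shown above; the log-scaled y-axis is an assumption added here (not something either answer proposed), chosen because the absolute values span roughly six orders of magnitude.

```r
# Coefficients copied from the Janmodelnorm$coef output in the question.
coefs <- c(
  "(Intercept)" = 1.165610e+00, LON = -1.277929e-01, LAT = -4.349831e-01,
  ME = -3.602961e-01, RAT = -7.189458e+00, DS = -1.856908e-04,
  DSA = 1.352052e-05, DSI = 4.811291e-05, DRNS = -1.055744e-02,
  DREW = -2.756525e-04, ASPNS = -2.202706e-01, ASPEW = -4.199914e-02,
  SI = 4.684091e-02, CUR = -8.634340e-01, W_180_270 = -2.479175e-02,
  W_0_360 = 2.409628e-01, W_90_180 = 5.459076e-03, W_0_180 = -9.892580e-03,
  NDVI = -5.378456e-02
)
abs_coefs <- abs(coefs)

# One bar per coefficient, labelled by name (las = 2 rotates the labels).
# Assumption: log = "y" so that the very small coefficients stay visible.
mids <- barplot(abs_coefs, log = "y", las = 2, cex.names = 0.7,
                col = "lightblue", ylab = "|coefficient| (log scale)")
```

barplot returns the bar midpoints (stored in mids here), which can be reused to add text or error bars above each bar.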

Add jittered data points to lattice xYplot with error bars

I'm trying to plot error bars overlaid on jittered raw data points using xYplot from package Hmisc. Seemed straightforward to just call a function within xYplot using panel.stripplot. It works, but there is a strange glitch - I can't 'jitter' the data plotted with panel.stripplot. Let me illustrate my point:
library(reshape2)
library(Hmisc)
data(iris)
#get error bars
d <- melt(iris, id=c("Species"), measure=c("Sepal.Length"))
X <- dcast(d, Species ~ variable, mean)
SD <- dcast(d, Species ~ variable, sd)
SE = SD[,2]/1 # this is wrong on purpose, to plot larger error bars
Lo = X[,2]-SE
Hi = X[,2]+SE
fin <- data.frame(X,Lo=Lo,Hi=Hi)
#plot the error bars combined with raw data points
quartz(width=5,height=7)
xYplot(Cbind(Sepal.Length, Lo, Hi) ~ numericScale(Species), fin,
type=c("p"), ylim=c(4,8),lwd=3, col=1,
scales = list(x = list(at=1:3, labels=levels(d$Species))),
panel = function(x, y, ...) {
panel.xYplot(x, y, ...)
panel.stripplot(d$Species, d$value, jitter.data = TRUE, cex=0.2, ...)
}
)
Which results in:
As you can see, the points are lined up vertically with the error bars, whereas I would like them to be slightly offset in the horizontal plane. I tried to tweak the factor and amount parameters in panel.stripplot, but that doesn't change anything. Any suggestions? Lattice-only solutions please, preferably using xYplot.
Use horizontal=FALSE:
panel.stripplot(d$Species, d$value,
jitter.data = TRUE, cex=0.2,horizontal=FALSE, ...)
Internally, this is just a call to:
panel.xyplot(d$Species, d$value, cex=0.2,jitter.x=TRUE, ...)
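For comparison, the same idea can be sketched with lattice alone, without Hmisc. This is an illustrative alternative rather than the xYplot solution asked for: it jitters the raw points with panel.xyplot(jitter.x = TRUE) and draws mean +/- SD bars by hand. The factor value and the styling are arbitrary choices.

```r
library(lattice)

# Jittered raw points with mean +/- SD bars drawn in the same panel.
p <- xyplot(Sepal.Length ~ Species, data = iris, ylim = c(4, 8),
  panel = function(x, y, ...) {
    # Raw data, jittered horizontally; `factor` controls the jitter width.
    panel.xyplot(x, y, jitter.x = TRUE, factor = 1, cex = 0.4, col = "grey40")
    # Group means and standard deviations, one per species.
    m <- as.vector(tapply(y, x, mean))
    s <- as.vector(tapply(y, x, sd))
    pos <- seq_along(m)
    panel.segments(pos, m - s, pos, m + s, lwd = 3, col = 1)
    panel.points(pos, m, pch = 19, col = 1)
  })
print(p)
```

Increasing factor (or setting amount) in the panel.xyplot call widens the horizontal spread of the points.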
