R: plotting decision tree labels leaves text cut off

(I'm still learning how to handle images in R; this is sort of a continuation of rpart package: Save Decision Tree to PNG )
I'm trying to save a decision tree plot from rpart in PNG form, instead of the provided postscript. My code looks like this:
png("tree.png", width=1000, height=800, antialias="cleartype")
plot(fit, uniform=TRUE,
main="Classification Tree")
text(fit, use.n=TRUE, all=TRUE, cex=.8)
dev.off()
but it cuts off a little of the labels for the edge nodes on both sides. This isn't a problem in the original post image, which I've converted to PNG just to check. I've tried using both the oma and mar settings in par, which were recommended as solutions for label/text problems; both added white space around the image but don't show any more of the labels. Is there any way to get the text to fit?

The rpart.plot package plots rpart trees and automatically takes care of
the margin and related issues. Use rpart.plot (instead of plot and text in the rpart package). For example:
library(rpart.plot)
data(ptitanic)
fit <- rpart(survived~., data=ptitanic)
png("tree.png", width=1000, height=800, antialias="cleartype")
rpart.plot(fit, main="Classification Tree")
dev.off()

The default margin in plot.rpart is 0. If your labels are long (several words or one long word), try adding more margin in the plot call. For example,
plot(fit, uniform=TRUE,margin=0.2)
text(fit, use.n=TRUE, all=TRUE, cex=.8)
Alternatively, you can reduce the text font size by changing cex in the text call. For example,
plot(fit, uniform=TRUE)
text(fit,use.n=TRUE, all=TRUE, cex=.7)
Of course, you can adjust both margin in the plot call and cex in the text call to get what you want.

In the rpart manual, the examples for rpart() give the solution: set the par option xpd = NA:
par(mfrow = c(1,2), xpd = NA)
Otherwise the text is clipped on some devices.

The problem with the titanic dataset is that the plot will not group ages and fares to display a nice "age > 10" label. It will list the values individually, like:
age = 11,18,19,22,24,28,29,30,32,33,37,39,40,42,45.5,5,56,58,60...
That leaves no room for the labels (see the picture).
[image: bad labels]
The solution is here:
https://community.rstudio.com/t/rpart-result-is-too-small-to-see/60702/4
Basically, you have to convert the age and fare columns to numeric variables with mutate, like this:
library(dplyr) # provides %>%, select() and mutate()
clean_titanic <- titanic %>%
  select(-c(home.dest, cabin, name, x, ticket)) %>%
  mutate(
    pclass = factor(pclass, levels = c(1, 2, 3), labels = c('Upper', 'Middle', 'Lower')),
    survived = factor(survived, levels = c(0, 1), labels = c('No', 'Yes')),
    # HERE: convert age and fare to numeric
    age = as.numeric(age),
    fare = as.numeric(fare)
  )
That will give you better labels, and room for them in the plot.
One more thing: you may get a warning when as.numeric is applied to non-numeric values (they become NA). There are a couple of ways to handle that, such as replacing the offending characters first or ignoring the warning. To ignore it:
suppressWarnings(as.numeric(age))
[image: good plot]
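As a small, hedged illustration of the conversion step (the values below are made up and are not taken from the titanic data): as.numeric turns text that does not parse as a number into NA with a warning, and a factor has to go through as.character first, otherwise you get the internal level codes instead of the numbers.

```r
# Made-up values for illustration only; not from the titanic data
age_raw <- c("29", "0.9167", "unknown", "45.5")

# Non-numeric strings become NA; suppressWarnings() hides the coercion warning
age_num <- suppressWarnings(as.numeric(age_raw))
n_missing <- sum(is.na(age_num))   # counts the values that could not be converted

# If the column is a factor, convert via as.character() first,
# because as.numeric() on a factor returns the integer level codes
age_fac <- factor(c("30", "5", "30"))
codes   <- as.numeric(age_fac)                 # level codes, not ages
ages    <- as.numeric(as.character(age_fac))   # the actual numbers
```

Checking n_missing after the conversion is a cheap way to see how many values were silently turned into NA before you decide to suppress the warning.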

Related

plot function in R producing legend without legend() being called

I'm trying to produce a cumulative incidence plot for a competing hazards survival analysis using plot() in R. For some reason, the plot that is produced has a legend that I have not called. The legend is intersecting with the lines on my graph and I can't figure out how to get rid of it. Please help!
My code is as follows:
CompRisk2 <- cuminc(ftime=ADI$time_DeathTxCensor, fstatus=ADI$status, group=ADI$natADI_quart)
cols <- c("darkorange","coral1","firebrick1","firebrick4","lightskyblue","darkturquoise","dodgerblue","dodgerblue4")
par(bg="white")
plot(CompRisk2,
col=cols,
xlab="Years",
ylab="Probability of Mortality or Transplant",
xlim=c(0,10),
ylim=c(0,0.6))
Which produces the following plot:
I tried adding the following code to move the legend out of the frame, but I got an error:
legend(0,5, legend=c(11,21,31,41,12,22,32,42),
col=c("darkorange","coral1","firebrick1","firebrick4","lightskyblue","darkturquoise","dodgerblue","dodgerblue4"),
lty=1:2, cex=0.8, text.font=4, box.lty=0)
Error: Error in title(...) : invalid graphics parameter
Any help would be much appreciated!
You are using the cuminc function from the cmprsk package. This produces an object of class cuminc, which has an S3 plot method. ?plot.cuminc shows you the documentation and typing plot.cuminc shows you the code.
There is some slightly obscure code that suggests a workaround:
u <- list(...)
if (length(u) > 0) {
i <- pmatch(names(u), names(formals(legend)), 0)
do.call("legend", c(list(x = wh[1], y = wh[2], legend = curvlab,
col = color, lty = lty, lwd = lwd, bty = "n", bg = -999999),
u[i > 0]))
}
This says that any additional arguments passed in ... whose names match the names of arguments to legend will be passed to legend(). legend() has a plot argument:
plot: logical. If ‘FALSE’, nothing is plotted but the sizes are returned.
So it looks like adding plot=FALSE to your plot() command will work.
In principle you could try looking at the other arguments to legend() and see if any of them will adjust the legend position/size as you want. Unfortunately the x argument to legend (which would determine the horizontal position) is masked by the first argument to plot.cuminc.
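To see how that matching step behaves, here is a minimal sketch in plain R. The list u is a made-up stand-in for whatever is passed through ... (it is not taken from the question); only pmatch, formals and legend are real functions.

```r
# Hypothetical extra arguments, standing in for the contents of `...`
u <- list(xlab = "Years", cex = 0.8, plot = FALSE)

# Same matching step as in plot.cuminc: 0 means "not an argument of legend()"
i <- pmatch(names(u), names(formals(legend)), 0)

# Only these names would be forwarded to legend()
forwarded <- names(u)[i > 0]
print(forwarded)
```

Here xlab is dropped because legend() has no such argument, while cex and plot are kept, which is why plot=FALSE reaches legend() when it is added to the plot() call.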
I don't think that the ellipsis arguments are intended for the legend call inside plot.cuminc. The code offered in Ben's answer suggests that there might be a wh argument that determines the location of the legend. It is not named within the parameters as "x" in the code he offered, but is rather given as a positionally-defined argument. If you look at the plot.cuminc function you do in fact find that wh is documented.
I cannot test this because you have not offered us access to the ADI-object but my suggestion would be to try:
opar <- par(xpd=TRUE) # xpd lets graphics be placed 'outside'
plot(CompRisk2,
col=cols, wh=c(-.5, 7),
xlab="Years",
ylab="Probability of Mortality or Transplant",
xlim=c(0,10),
ylim=c(0,0.6))
par(opar) # restores original graphics parameters
It's always a bit risky to put out a code chunk without testing, but I'm happy to report that I did find a suitable test case and it seems to work reasonably well, as predicted. I used the code below on the object from a prior SO question about using the gg-packages for cmprsk:
library(cmprsk)
# some simulated data to get started
comp.risk.data <- data.frame("tfs.days" = rweibull(n = 100, shape = 1, scale = 1)*100,
"status.tfs" = c(sample(c(0,1,1,1,1,2), size=50, replace=T)),
"Typing" = sample(c("A","B","C","D"), size=50, replace=T))
# fitting a competing risks model
CR <- cuminc(ftime = comp.risk.data$tfs.days,
fstatus = comp.risk.data$status.tfs,
cencode = 0,
group = comp.risk.data$Typing)
opar <- par(xpd=TRUE) # xpd lets graphics be placed 'outside'
plot(CR,
wh=c(-15, 1.1), # obviously different than the OP's coordinates
xlab="Years",
ylab="Probability of Mortality or Transplant",
xlim=c(0,400),
ylim=c(0,1))
par(opar) # restores graphics parameters
I get the legend to move up and leftward from its original position.

r: decision tree label doesn't show completely

When I use rpart to draw a decision tree, there is a small problem: the labels of the plot are not displayed properly.
Only half of the text is visible at the top and bottom of the plot. How can I fix this?
Here is the code:
library(rpart)
iris.rpart = rpart(Species ~ ., data = iris)
plot(iris.rpart) #Plot the tree
text(iris.rpart) #Show the labels
When you read the documentation of plot.rpart, there are two options mentioned: use of the par option xpd, or using the parameter margin of the plot.rpart function.
1)
The margin parameter adds an extra portion of white space
library(rpart)
iris.rpart = rpart(Species ~ ., data = iris)
plot(iris.rpart, margin = .2) # margin added
text(iris.rpart, use.n = T)
2)
Looking at your picture, it could be that xpd has been set to FALSE (see ?par). From the documentation:
xpd: A logical value or NA. If FALSE, all plotting is clipped to the plot
region, if TRUE, all plotting is clipped to the figure region, and if
NA, all plotting is clipped to the device region. See also clip.
You can see the setting of xpd by typing:
par()$xpd
If xpd is not TRUE, the solution is as follows:
opar <- par(xpd = TRUE) # set xpd and keep the old value to reset later
plot(iris.rpart)
text(iris.rpart, use.n = T)
par(opar) # restore old setting
Please, let me know whether this solved your problem.
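A small sketch of the save-and-restore pattern for par settings: calling par() with new values returns the old ones invisibly, and that list can be handed back to par() later. pdf(NULL) is used here only as a throwaway device so that nothing is written to disk; that choice is mine and not part of the answer.

```r
pdf(NULL)                   # throwaway device, no file is written
before <- par("xpd")        # value on a fresh device
opar   <- par(xpd = TRUE)   # par() returns the previous value invisibly
during <- par("xpd")        # the new setting is now active
par(opar)                   # hand the saved value back to restore it
after  <- par("xpd")
dev.off()
```

Comparing before, during and after shows that the setting is changed only between the two par() calls.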

Define margins of PDF used for boxplot rendering

When I render a boxplot on a PDF device in R, there is a large white space around the graph, especially at the top, that I intend to reduce.
My script is basically just:
data <- read.csv("input.csv")
pdf(file="output.pdf", width=4, height=5)
boxplot(data, xlab="input graphs", ylab="vertex count")
This leads to something like:
where the grey outline indicates the end of the document.
I tried to use the par attributes "mar" and "mai" as described in https://stat.ethz.ch/R-manual/R-devel/library/graphics/html/par.html but it had no effect.
boxplot(data, mar=c(0,0,0,0), mai=c(0,0,0,0))
Do you have any advice on how I can gain control over the whitespace? I want zero outer whitespace, as the generated graph will be used in a LaTeX environment that provides sufficient spacing on its own. I am using Ubuntu as my OS.
Set mar with par() right after opening the pdf device. Try this as an example:
pdf(file = "test.pdf", width = 5, height = 5)
par(mar = c(5, 5, 0.05, 0.05))
set.seed(42)
plot(rnorm(20))
dev.off()
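The reason the order matters is that par settings belong to a device, so each newly opened device starts from the defaults again. A minimal sketch of that behaviour, using pdf(NULL) as a throwaway device (my choice, so that no file is written):

```r
pdf(NULL)                          # first device
par(mar = c(5, 5, 0.05, 0.05))     # applies to the first device only
first <- par("mar")

pdf(NULL)                          # second device starts from the defaults
second <- par("mar")

dev.off()                          # close the second device
dev.off()                          # close the first device
```

The margins set on the first device do not carry over to the second one, so par(mar = ...) has to come after pdf() and before the plotting call.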

Heatmap like plot with Lattice

I can not figure out how the lattice levelplot works. I have played with this now for some time, but could not find reasonable solution.
Sample data:
Data <- data.frame(x=seq(0,20,1),y=runif(21,0,1))
Data.mat <- data.matrix(Data)
Plot with levelplot:
rgb.palette <- colorRampPalette(c("darkgreen","yellow", "red"), space = "rgb")
levelplot(Data.mat, main="", xlab="Time", ylab="", col.regions=rgb.palette(100),
cuts=100, at=seq(0,1,0.1), ylim=c(0,2), scales=list(y=list(at=NULL)))
This is the outcome:
Since I do not understand how levelplot really works, I cannot make it work. What I would like is for the colour strips to fill the whole window for the corresponding x (Time).
Alternative solution with other method.
Basically, I'm trying here to plot the increasing risk over time, where the red is the highest risk = 1. I would like to visualize the sequence of possible increase or clustering risk over time.
From ?levelplot we're told that if the first argument is a matrix then "'x' provides the
'z' vector described above, while its rows and columns are
interpreted as the 'x' and 'y' vectors respectively.", so
> m = Data.mat[, 2, drop=FALSE]
> dim(m)
[1] 21 1
> levelplot(m)
plots a levelplot with 21 columns and 1 row, where the levels are determined by the values in m. The formula interface might look like
> df <- data.frame(x=1, y=1:21, z=runif(21))
> levelplot(z ~ y + x, df)
(these approaches do not quite result in the same image).
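The drop = FALSE part is what keeps the single column a matrix. A short sketch with the sample data from the question, using base R only:

```r
# Sample data as in the question
Data <- data.frame(x = seq(0, 20, 1), y = runif(21, 0, 1))
Data.mat <- data.matrix(Data)

full_dim <- dim(Data.mat)            # both columns: time and risk

# Keep only the risk column, but keep it as a 21 x 1 matrix
m <- Data.mat[, 2, drop = FALSE]
m_dim <- dim(m)

# Without drop = FALSE the result is a plain vector with no dimensions
v <- Data.mat[, 2]
v_dim <- dim(v)                      # NULL
```

With the full two-column matrix, the time values themselves are treated as z values, which is why the original plot did not look like a strip of risk levels.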
Unfortunately I don't know much about lattice, but I noted your "Alternative solution with other method", so may I suggest another possibility:
library(plotrix)
color2D.matplot(t(Data[ , 2]), show.legend = TRUE, extremes = c("yellow", "red"))
Heaps of things to do to make it prettier. Still, a start. Of course it is important to consider the breaks in your time variable. In this very simple attempt, regular intervals are implicitly assumed, which happens to be the case in your example.
Update
Following the advice in the 'Details' section in ?color2D.matplot: "The user will have to adjust the plot device dimensions to get regular squares or hexagons, especially when the matrix is not square". Well, well, quite ugly solution.
windows(width = 10, height = 2.5) # Windows only; open the device first
par(mar = c(5.1, 4.1, 0, 2.1)) # par() settings apply to the currently open device
color2D.matplot(t(Data[ , 2]),
show.legend = TRUE,
axes = TRUE,
xlab = "",
ylab = "",
extremes = c("yellow", "red"))

Combine two plots created with effects package in R

I have the following problem. After running an ordered logit model, I want to use R's effects package to visualize the results. This works fine and I did so for two independent variables; then I tried to combine the two plots. However, this does not seem to work. I provide a small reproducible example here so you can see my problem for yourself:
library(car) # Chile data
library(MASS) # polr()
library(effects) # effect() and its plot method
data(Chile)
mod <- polr(vote ~ age + log(income), data=Chile)
eff <- effect("log(income)", mod)
plot1 <- plot(eff, style="stacked",rug=F, key.args=list(space="right"))
eff2 <- effect("age", mod)
plot2 <- plot(eff2, style="stacked",rug=F, key.args=list(space="right"))
I can print these two plots now independently, but when I try to plot them together, the first plot is overwritten. I tried setting par(mfrow=c(2,1)), which didn't work. Next I tried the following:
print(plot1, position=c(0, .5, 1, 1), more=T)
print(plot2, position=c(0,0, 1, .5))
In this latter case, the positions of the two plots are just fine, but still the first plot vanishes once I add the second (or better, it is overwritten). Any suggestions how to prevent this behavior would be appreciated.
Reading down the long list of arguments in ?plot.eff we see that there are some arguments for doing just this:
plot(eff, style="stacked",rug=F, key.args=list(space="right"),
row = 1,col = 1,nrow = 1,ncol = 2,more = TRUE)
plot(eff2, style="stacked",rug=F, key.args=list(space="right"),
row = 1,col = 2,nrow = 1,ncol = 2)
The reason par() didn't work is because this package is using lattice graphics, which are based on the grid system, which is incompatible with base graphics. Neither par() nor layout will have any effect on grid graphics.
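For comparison, this is how two plain lattice plots can share one page through the print method for trellis objects. It is only a sketch with plain lattice objects built from mtcars (my choice of data); whether objects returned by the effects package behave the same way is not verified here.

```r
library(lattice)

# Two simple trellis objects built from a built-in dataset
p1 <- xyplot(mpg ~ wt, data = mtcars)
p2 <- xyplot(mpg ~ hp, data = mtcars)

pdf(NULL)   # throwaway device so the sketch runs without a screen
# split = c(column, row, number of columns, number of rows);
# more = TRUE tells lattice that another plot follows on the same page
print(p1, split = c(1, 1, 1, 2), more = TRUE)
print(p2, split = c(1, 2, 1, 2))
dev.off()
```

Note that par(mfrow = ...) plays no part here; the page layout is handled entirely by the arguments to print.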
This seems to work:
plot(eff,col=1,row=2,ncol=1,nrow=2,style="stacked",rug=F,
key.args=list(space="right"),more=T)
plot(eff2,col=1,row=1,ncol=1,nrow=2,style="stacked",rug=F,
key.args=list(space="right"))
edit: Too late...
