R: decision tree labels don't show completely

When I use rpart to draw a decision tree, the labels are not displayed properly: only half of the text is visible at the top and bottom of the plot. How can I fix this?
Here is the code:
library(rpart)
iris.rpart = rpart(Species ~ ., data = iris)
plot(iris.rpart) #Plot the tree
text(iris.rpart) #Show the labels

The documentation of plot.rpart mentions two options: setting the par option xpd, or using the margin parameter of the plot.rpart function.
1) The margin parameter adds an extra fraction of white space around the borders of the tree:
library(rpart)
iris.rpart = rpart(Species ~ ., data = iris)
plot(iris.rpart, margin = .2) # margin added
text(iris.rpart, use.n = TRUE)
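To see what margin actually changes, a quick check can compare the x-axis user coordinates (par("usr")) that plot.rpart sets up with and without a margin. This is a sketch that draws to an off-screen PDF device so it also runs non-interactively; the usr_width helper is hypothetical, for illustration only.

```r
library(rpart)

iris.rpart <- rpart(Species ~ ., data = iris)

# Hypothetical helper: draw the tree with a given margin and return the
# width of the x-axis user coordinate range that plot.rpart set up.
usr_width <- function(fit, margin) {
  plot(fit, margin = margin)
  usr <- par("usr")
  usr[2] - usr[1]
}

pdf(tempfile(fileext = ".pdf"))  # off-screen device
w0  <- usr_width(iris.rpart, 0)
w02 <- usr_width(iris.rpart, 0.2)
dev.off()

cat(sprintf("x range: %.3f without margin, %.3f with margin = 0.2\n", w0, w02))
```

The wider coordinate range is what leaves room for labels near the edges of the tree.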
2) Looking at your picture, xpd may have been set to FALSE (see ?par). From the documentation:
xpd: A logical value or NA. If FALSE, all plotting is clipped to the plot region, if TRUE, all plotting is clipped to the figure region, and if NA, all plotting is clipped to the device region. See also clip.
You can check the current setting of xpd by typing:
par("xpd")
If xpd is not TRUE, the solution is as follows:
opar <- par(xpd = TRUE) # set xpd and keep the old value to reset later
plot(iris.rpart)
text(iris.rpart, use.n = TRUE)
par(opar) # restore old setting
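The set-and-restore idiom above can also be wrapped in a function with on.exit(), so the old xpd value comes back even if plotting fails halfway. A minimal sketch; plot_tree_unclipped is a hypothetical helper, not part of rpart:

```r
library(rpart)

# Hypothetical helper (not part of rpart): draw a tree with clipping turned
# off and restore the previous xpd value even if plotting fails.
plot_tree_unclipped <- function(fit, ...) {
  old <- par(xpd = NA)  # par() invisibly returns the old values it changed
  on.exit(par(old))     # restore them when the function exits
  plot(fit, ...)
  text(fit, use.n = TRUE)
  invisible(NULL)
}

iris.rpart <- rpart(Species ~ ., data = iris)

out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
xpd_before <- par("xpd")
plot_tree_unclipped(iris.rpart, margin = 0.1)
xpd_after <- par("xpd")
dev.off()
```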
Please let me know whether this solves your problem.

Related

Modify figure sizes of `pr_curve` and `auc_curve` from R package yardstick

I'm trying to generate an ROC curve and a precision-recall curve using the yardstick package. However, I could not find a way to modify the figure shape. Here's a toy example.
library(yardstick)
library(ggplot2)  # autoplot() generic
library(dplyr)    # %>%
## Precision-recall curve
data.frame(true = as.factor(rep(c(0, 1), 10)),
           pred = runif(20)) %>%
  pr_curve(truth = true, pred) %>%
  autoplot()
## ROC curve
data.frame(true = as.factor(rep(c(0, 1), 10)),
           pred = runif(20)) %>%
  roc_curve(truth = true, pred) %>%
  autoplot()
When you run the code, the ROC curve comes out as a square, while the precision-recall curve is a rectangle.
I've tried changing the width and height options of the pdf function, and various options supported by ggplot2 (e.g. plot.margin via theme), but could not find a good way to make the two figures the same shape.
How can I give them the same shape?
Any comments would be much appreciated.
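coord_fixed() from ggplot2 will do the trick. Note that you also need to set xlim and ylim if you want the plot area to be square, because coord_fixed() only fixes the ratio of x units to y units, and the ranges of recall and precision in the data may differ.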
tmp1 <- data.frame(true = as.factor(rep(c(0, 1), 10)),
                   pred = runif(20))
pr_curve(tmp1, truth = true, pred) %>%
  autoplot() +
  coord_fixed(xlim = 0:1, ylim = 0:1)
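Why both the ratio and the limits matter can be checked without yardstick. This sketch uses a made-up curve (not yardstick output) whose precision never drops below 0.5, then inspects the panel ranges ggplot2 builds: with equal x and y ranges and a 1:1 ratio, the panel is square.

```r
library(ggplot2)

# Toy stand-in for a precision-recall curve: precision stays above 0.5,
# so without explicit limits the y range would be about half the x range.
set.seed(42)
pr <- data.frame(recall = sort(runif(20)),
                 precision = runif(20, min = 0.5, max = 1))

p <- ggplot(pr, aes(recall, precision)) +
  geom_path() +
  coord_fixed(xlim = c(0, 1), ylim = c(0, 1))  # 1 unit on x == 1 unit on y

# Ranges of the built panel (including ggplot2's default expansion).
panel <- ggplot_build(p)$layout$panel_params[[1]]
print(panel$x.range)
print(panel$y.range)
```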

Control plot layout in foodweb plot from mvbutils R-package

I would like to visualize how functions in my own R package depend on each other. For this purpose I use the foodweb() function from the mvbutils package.
I can get the right functional dependencies out without a problem but the plot looks a bit messy, with lines crossing each other and function names not aligned vertically or horizontally.
Is there a way to control the layout of the plot similar to the way this works in the igraph package?
Example
dirPath <- "~/dev/stackoverflow/46910042"
setwd(dirPath)
## Download example Package
urlPackage <- "https://github.com/kbroman/qtlcharts/archive/master.zip"
download.file(urlPackage, destfile = "master.zip")
unzip("./master.zip", exdir = dirPath, overwrite = TRUE)
## Install (if needed) and load mvbutils
if (!require(mvbutils)) {
  install.packages("mvbutils")
  library(mvbutils)
}
thefiles = list.files(path = "./qtlcharts-master/R/", full.names = TRUE)
thefiles
## Now we load all the package files into memory, so we can have
## foodweb generate a map of the package functions.
sapply(thefiles, source)
## Generate plot
par(mar = rep(0.1, 4))
foodweb(border = TRUE, boxcolor = "pink", lwd = 1.5, cex = 0.8)
Michael,
One option is to look behind the curtains of foodweb. The mvbutils::foodweb function returns an object of (S3) class foodweb. This has three components:
funmat a matrix of 0s and 1s showing what (row) calls what (column). The dimnames are the function names.
x shows the x-axis location of the centre of each function’s name in the display, in par("usr") units
level shows the y-axis location of the centre of each function’s name in the display, in par("usr") units.
Thus one approach is to call foodweb with plotting = FALSE, so that it returns a foodweb object instead of drawing a plot. This lets us manipulate the data directly, or plot it via graphics::plot() outside the defaults provided by mvbutils::foodweb().
Why? To do what you suggest, I see three options:
Play with the mvbutils::foodweb() parameters.
Work with the returned data structure in another plotting package.
Use graphics::par() and graphics::plot() to manipulate the plot size and attributes of the returned foodweb structure.
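For the second option, the funmat component is a 0/1 matrix in which rows call columns, so it can be turned into an edge list that other graph packages understand. A minimal sketch, using a hand-made toy matrix with hypothetical function names in place of a real fw$funmat:

```r
# Toy stand-in for fw$funmat: rows call columns (names are hypothetical).
funmat <- matrix(c(0, 1, 1,
                   0, 0, 1,
                   0, 0, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("main", "helper", "util"),
                                 c("main", "helper", "util")))

# Positions of the 1s, i.e. (caller, callee) pairs.
idx <- which(funmat == 1, arr.ind = TRUE)

edges <- data.frame(from = rownames(funmat)[idx[, "row"]],
                    to   = colnames(funmat)[idx[, "col"]],
                    stringsAsFactors = FALSE)
print(edges)
```

With a real foodweb object, fw$funmat would replace the toy matrix, and a data frame like edges can then be passed to, for example, igraph::graph_from_data_frame() to use igraph's layout functions.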
It would be great to know your preference. In the meantime, here is a base graphics example:
Plot Package Example
If you use graphics::plot, you need to look at how to manipulate graphics::par. par() allows you to set or query graphical parameters. For example, to clean up the function plot you might modify the graphics::par() fin parameter to increase the figure region dimensions (width, height) in inches. This is a simple example, but I think it helps map out and demonstrate the options available to you.
## Generate plot
if (!requireNamespace("qtlcharts", quietly = TRUE)) install.packages("qtlcharts")
## Here we specify `asNamespace` to get the package internals
fw <- foodweb(where = asNamespace("qtlcharts"),
              plotting = FALSE)
#Display foodweb structure
str(fw)
# Expand plot figure region dimensions...
par(fin = c(9.9,7))
# Plot fw structure
plot(fw,
border = TRUE,
expand.xbox = 1,
boxcolor = "pink", lwd = 1.5, cex = 0.8)
Note that the function names are not spaced out. Note also that I cropped the white space at the top and bottom of the plot here. You can play with par settings such as the margins (mar) to get the plot you want.
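A sketch of the fin and mar settings in base graphics only (no mvbutils), assuming an 11 x 8.5 inch PDF device: fin sets the figure region in inches and has to fit inside the device (R will complain once a plot is started if it does not), while smaller mar values leave more of that region for the plot itself.

```r
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 11, height = 8.5)  # device size in inches
par(fin = c(9.9, 7))                     # figure region (width, height) in inches
par(mar = rep(0.5, 4))                   # small margins, in lines of text
fin <- par("fin")
pin <- par("pin")                        # plot region left inside the margins
dev.off()

print(fin)
print(pin)
```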
Pruning your plot
Another option within mvbutils::foodweb is to use the prune and rprune options to simplify your plots. These are very powerful and useful, especially rprune, the regular-expression version.
if (!requireNamespace("qtlcharts", quietly = TRUE)) install.packages("qtlcharts")
fw <- foodweb( where = asNamespace( "qtlcharts"),
plotting = FALSE)
str(fw)
par(fin = c(9.9,7))
plot(fw,
border = TRUE,
expand.xbox = 1,
boxcolor = "pink", lwd = 1.5, cex = 0.8)
fw <- foodweb( where = asNamespace( "qtlcharts"),
rprune = "convert_", ## search on `convert_` to negate use `~convert_`
plotting = FALSE)
str(fw)
par(fin = c(9.9,7))
plot(fw,
border = TRUE,
expand.xbox = 1,
boxcolor = "pink", lwd = 1.5, cex = 0.8)
I hope the above points you in the right direction.
T.
Because there are many functions and connections, the plot is squeezed to fit on the screen, which makes it messy.
I suggest saving it as a PDF or PNG with a large enough width and height, so that you can zoom in. This will save you a lot of time.
E.g.:
## Generate plot
pdf( "mygraph.pdf", width = 50, height = 80 )
par(mar = rep(0.1, 4))
foodweb(border = TRUE, boxcolor = "pink", lwd = 1.5, cex = 0.8)
dev.off()
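The same advice can be wrapped in a small helper so the device is always closed, even if the plotting code fails. save_big_pdf is a hypothetical name, and a simple base plot stands in for foodweb() so the sketch runs without mvbutils:

```r
# Hypothetical helper (not from mvbutils): open a large PDF device, run the
# plotting code, and always close the device again.
save_big_pdf <- function(file, plot_fun, width = 50, height = 80) {
  pdf(file, width = width, height = height)  # size in inches
  on.exit(dev.off())
  par(mar = rep(0.1, 4))
  plot_fun()
  par("din")  # device size actually used, in inches
}

out_file <- tempfile(fileext = ".pdf")
din <- save_big_pdf(out_file, function() plot(1:10))
```

With mvbutils loaded, the call above would become save_big_pdf("mygraph.pdf", function() foodweb(border = TRUE, boxcolor = "pink", lwd = 1.5, cex = 0.8)).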
In addition, you can play with the plot options of foodweb.
Hope it helps.

How to make a biplot without labels in R

I used
biplot(prcomp(data, scale.=T), xlabs=rep("·", nrow(data)))
but it did not omit the labels.
Even with the labels removed, my plot is very messy and hard to read.
I also need to show the percentage of variance explained by each PC on the axes.
I used the following command to plot the image:
biplot(prcomp(data, scale.=T), xlabs=rep("·", nrow(data)), ylabs = rep("·", ncol(data)))
Try this one:
devtools::install_github("sinhrks/ggfortify")
library(ggfortify)
ggplot2::autoplot(stats::prcomp(USArrests, scale=TRUE), label = FALSE, loadings.label = TRUE)
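If you would rather stay in base R, the percentage of variance can be computed from the prcomp standard deviations and passed to biplot() as axis labels. A sketch on the built-in USArrests data; blank observation labels are just one way to hide them:

```r
# Base R only (no ggfortify): PCA on the built-in USArrests data.
pca <- prcomp(USArrests, scale. = TRUE)

# Percentage of total variance explained by each principal component.
pct <- 100 * pca$sdev^2 / sum(pca$sdev^2)

out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
biplot(pca,
       xlabs = rep("", nrow(USArrests)),  # empty strings: no observation labels
       xlab = sprintf("PC1 (%.1f%%)", pct[1]),
       ylab = sprintf("PC2 (%.1f%%)", pct[2]))
dev.off()

round(pct, 1)
```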

Equivalent of boxplot lwd parameter for bwplot

I want to draw the box with thicker lines. In the boxplot function I simply use lwd=2, but with lattice's bwplot I have been pulling my hair out and haven't found a solution!
(By the box I mean the blue rectangle drawn for each group.)
Sample code to work with:
library(lattice)
set.seed(123)
n <- 300
type <- sample(c("city", "river", "village"), n, replace = TRUE)
month <- sample(c("may", "june"), n, replace = TRUE)
x <- rnorm(n)
df <- data.frame(x, type, month)
bwplot(x ~ type|month, data = df, panel=function(...) {
panel.abline(h=0, col="green")
panel.bwplot(...)
})
As John Paul pointed out, the line widths are controlled by the box.rectangle and box.umbrella components of lattice's graphical parameter list. (For future reference, typing names(trellis.par.get()) is a fast way to scan the graphical attributes controlled by that list.)
Here's a slightly cleaner way to set those options for one or more particular figures:
thickBoxSettings <- list(box.rectangle=list(lwd=2), box.umbrella=list(lwd=2))
bwplot(x ~ type|month, data = df,
par.settings = thickBoxSettings,
panel = function(...) {
panel.abline(h=0, col="green")
panel.bwplot(...)
})
One thing you can do is get the trellis settings for the box, and change those. Try
rect.settings<-trellis.par.get("box.rectangle") #gets all rectangle settings
rect.settings$lwd<-4 #sets width to 4, you can choose what you like
trellis.par.set("box.rectangle",rect.settings)
Put these above your bwplot call and it should do it.
The box.rectangle settings also include col, fill, etc.
Edit: similarly, if you get box.umbrella, you can edit it to change how the whiskers above and below the box look.
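Because trellis.par.set() changes the settings for every later plot on the device, it can be worth saving the old settings and putting them back afterwards. A sketch with a small made-up data set; a PDF device is opened first so the settings apply to it:

```r
library(lattice)

pdf(tempfile(fileext = ".pdf"))  # open a device first

old_rect <- trellis.par.get("box.rectangle")  # current box settings
new_rect <- old_rect
new_rect$lwd <- 4
trellis.par.set("box.rectangle", new_rect)    # affects later plots on this device

set.seed(123)
df <- data.frame(x = rnorm(60),
                 type = factor(sample(c("city", "river", "village"), 60, replace = TRUE)))
print(bwplot(x ~ type, data = df))
lwd_used <- trellis.par.get("box.rectangle")$lwd

trellis.par.set("box.rectangle", old_rect)    # put the old settings back
lwd_restored <- trellis.par.get("box.rectangle")$lwd
dev.off()
```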
There is a further feature of lattice plots worth mentioning: they are objects, so methods such as update() exist for modifying them:
myBW <- bwplot(x ~ type|month, data = df, panel=function(...) {
panel.abline(h=0, col="green")
panel.bwplot(...)
})
newBW <- update(myBW, par.settings=list(box.rectangle=list(lwd=4) ))
plot(newBW) # a trellis object must be printed or plotted to be drawn
You can also use trellis.focus() and then call panel functions (for example panel.text()) to overlay new data or text on an existing plot.
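A sketch of the trellis.focus() route, reusing the question's simulated data (with explicit factors): the plot is drawn first, then the first panel is re-entered to add text. The label and its position are arbitrary.

```r
library(lattice)

set.seed(123)
n <- 300
df <- data.frame(x = rnorm(n),
                 type = factor(sample(c("city", "river", "village"), n, replace = TRUE)),
                 month = factor(sample(c("may", "june"), n, replace = TRUE)))

out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
p <- bwplot(x ~ type | month, data = df)
p <- update(p, par.settings = list(box.rectangle = list(lwd = 4)))  # modify the object
print(p)                                                             # draw it

# Re-enter the first panel after drawing and add text to it.
trellis.focus("panel", column = 1, row = 1)
panel.text(x = 2, y = 0, labels = "added after printing")
trellis.unfocus()
dev.off()
```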

R: plotting decision tree labels leaves text cut off

(I'm still learning how to handle images in R; this is a continuation of "rpart package: Save Decision Tree to PNG".)
I'm trying to save a decision tree plot from rpart in PNG form, instead of the provided postscript. My code looks like this:
png("tree.png", width=1000, height=800, antialias="cleartype")
plot(fit, uniform=TRUE,
main="Classification Tree")
text(fit, use.n=TRUE, all=TRUE, cex=.8)
dev.off()
But this cuts off part of the labels of the edge nodes on both sides. This isn't a problem in the original postscript image, which I've converted to PNG just to check. I've tried both the oma and mar settings in par, which were recommended as solutions for label/text problems; both added white space around the image but didn't reveal any more of the labels. Is there any way to get the text to fit?
The rpart.plot package plots rpart trees and automatically takes care of the margin and related issues. Use rpart.plot (instead of plot and text in the rpart package). For example:
library(rpart.plot)
data(ptitanic)
fit <- rpart(survived~., data=ptitanic)
png("tree.png", width=1000, height=800, antialias="cleartype")
rpart.plot(fit, main="Classification Tree")
dev.off()
The default margin is 0, so if your labels consist of several words or one long word, try adding more margin in the plot call. For example,
plot(fit, uniform=TRUE,margin=0.2)
text(fit, use.n=TRUE, all=TRUE, cex=.8)
Alternatively, you can adjust text font size by changing cex in text call. For example,
plot(fit, uniform=TRUE)
text(fit,use.n=TRUE, all=TRUE, cex=.7)
Of course, you can adjust both margin in the plot call and cex in the text call to get what you want.
The rpart manual gives the solution in the examples for rpart(): set the par option xpd = NA:
par(mfrow = c(1,2), xpd = NA) # otherwise on some devices the text is clipped
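One detail matters for the PNG case: par() settings belong to the currently active device, so par(xpd = NA) has to be called after png() opens the file, not before. This sketch uses pdf() so it runs without a bitmap backend; with png("tree.png", ...) the order is the same. The kyphosis model mirrors the one in the rpart examples.

```r
library(rpart)

# kyphosis ships with rpart.
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 10, height = 8)  # 1. open the file device first ...
par(xpd = NA)                          # 2. ... then relax clipping on that device
xpd_on_device <- par("xpd")
plot(fit, uniform = TRUE, main = "Classification Tree")
text(fit, use.n = TRUE, all = TRUE, cex = 0.8)
dev.off()
```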
The problem with the titanic dataset is that rpart does not treat age and fare as continuous, so it cannot display a nice "age > 10" label. Instead it lists the individual values, like:
age = 11,18,19,22,24,28,29,30,32,33,37,39,40,42,45.5,5,56,58,60...
That leaves no room for the labels.
The solution is here:
https://community.rstudio.com/t/rpart-result-is-too-small-to-see/60702/4
Basically, you have to convert the age and fare columns into numeric variables, like this:
library(dplyr)
clean_titanic <- titanic %>%
  select(-c(home.dest, cabin, name, x, ticket)) %>%
  mutate(
    pclass = factor(pclass, levels = c(1, 2, 3), labels = c('Upper', 'Middle', 'Lower')),
    survived = factor(survived, levels = c(0, 1), labels = c('No', 'Yes')),
    # HERE: convert age and fare to numeric so rpart can split on thresholds
    age = as.numeric(age),
    fare = as.numeric(fare)
  )
That will give you better labels, and room for them in the plot.
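The effect can be reproduced with the kyphosis data that ships with rpart (used here instead of titanic): turning the numeric Start column into a factor simulates a numeric column read in as categorical. Splits on a numeric predictor get threshold labels, while splits on a factor list level names.

```r
library(rpart)

# Simulate a numeric column that was read in as categorical.
k_factor <- kyphosis
k_factor$Start <- factor(k_factor$Start)

fit_num    <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
fit_factor <- rpart(Kyphosis ~ Age + Number + Start, data = k_factor)

# Numeric predictor: splits are thresholds such as "Start>=8.5".
labels(fit_num)
# Factor predictor: any split on Start lists level names instead.
labels(fit_factor, minlength = 0)
```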
One more thing: as.numeric() warns ("NAs introduced by coercion") when it has to convert non-numeric values. There are a couple of ways to handle that, such as replacing the offending characters first or ignoring the warning. To ignore it:
age = suppressWarnings(as.numeric(age))
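Both approaches side by side on a small made-up character vector. The regular expression in the second option is an assumption that only plain non-negative decimals are valid; values such as "-3" or "1e3" would also become NA.

```r
age_raw <- c("22", "38", "unknown", "4.5", "")

# Option 1: convert and silence the "NAs introduced by coercion" warning.
age_quiet <- suppressWarnings(as.numeric(age_raw))

# Option 2: blank out anything that does not look like a plain number first,
# so as.numeric() has nothing to warn about.
looks_numeric <- grepl("^[0-9]+(\\.[0-9]+)?$", age_raw)
age_clean <- as.numeric(ifelse(looks_numeric, age_raw, NA))

print(age_quiet)
```

Option 2 is more explicit about which values are discarded, while option 1 may also hide warnings you did not expect.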
