rpart plot text shorter - r

I am using the prp function from the rpart.plot package to plot a tree. For categorical data like states, it gives a really long list of levels in the split labels, which makes the plot less readable. Is there any way to wrap the text over two or more lines if it exceeds some length?

Here's an example that wraps long split labels over multiple
lines. The maximum length of each line is 25 characters. Change the
25 to suit your purposes. (This example is derived from Section 6.1 in
the rpart.plot vignette.)
library(rpart)       # provides rpart() and the cu.summary data
library(rpart.plot)  # provides prp()
tree <- rpart(Price/1000 ~ Mileage + Type + Country, cu.summary)
split.fun <- function(x, labs, digits, varlen, faclen)
{
    # replace commas with spaces (needed for strwrap)
    labs <- gsub(",", " ", labs)
    for(i in seq_along(labs)) {
        # split labs[i] into multiple lines
        labs[i] <- paste(strwrap(labs[i], width=25), collapse="\n")
    }
    labs
}
prp(tree, split.fun=split.fun)

Related

How to split the label in R for a plot

I am trying to plot a CART model that I have built. The node label is too long, and I want to split it into two lines. I have used gsub() to break the label after the variable name, but I want to wrap the line that follows too.
Here is the code.
model=rpart(Train$service_date_max_count~ injury_type_cd+specialty_type_cd
+state+age+sex+pedestrian_yn+vehicle_driver_yn+vehicle_passenger_yn
+marital_status+category+extesio, data=Train,method ="anova")
split.fun <- function(x, labs, digits, varlen, faclen)
{
gsub(" = ", ":\n", labs)
}
fancyRpartPlot(model , palettes = 'Oranges' ,tweak =0.8, split.fun=split.fun)
I have attached a snippet showing how the plot looks
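One possible way to combine the two ideas (a break after the variable name, then wrapping of the level list) is sketched below. It assumes the split labels have the form "variable = level1,level2,..." and the helper name wrap_split_labels is made up for this illustration. It has not been checked against fancyRpartPlot itself, only against plain character vectors.

```r
# Hypothetical helper: break after the variable name, then wrap the levels.
wrap_split_labels <- function(labs, width = 25) {
  vapply(labs, function(lab) {
    parts <- strsplit(lab, " = ", fixed = TRUE)[[1]]
    # Labels without " = " (e.g. numeric splits) are returned unchanged
    if (length(parts) < 2) return(lab)
    # Replace commas with spaces so strwrap can break between levels
    levels_txt <- gsub(",", " ", paste(parts[-1], collapse = " = "))
    wrapped <- paste(strwrap(levels_txt, width = width), collapse = "\n")
    paste0(parts[1], ":\n", wrapped)
  }, character(1), USE.NAMES = FALSE)
}

# Wrapper with the argument list that prp expects for split.fun
split.fun <- function(x, labs, digits, varlen, faclen) {
  wrap_split_labels(labs, width = 25)
}

cat(wrap_split_labels("state = AL,AK,AZ,AR,CA,CO,CT,DE,FL,GA", width = 12), "\n")
```

Wrapping only the part after " = " matters here because strwrap treats a single newline as ordinary whitespace, so calling it on a label that already contains "\n" would undo the first break.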

Where is this extra data coming from? (in R plot)

I'm a biologist, but I had to teach myself Python and R while working at different places a few years ago. A situation came up at my current job that R would be really useful for, and so I cobbled together a program. Surprisingly, it does just what I'd like, EXCEPT the graphs it's generating have an extra bar at the beginning.
I've entered no data to correspond to that first bar:
I'm hoping this is some simple error in how I've set the plot parameters. Could it be because I'm using plot instead of boxplot? Is it plotting the headings?
More worrisome is the possibility that while reading in and merging my 3 data frames I'm creating some sort of artifact data, which would also affect the statistical tests and make me very sad, though I don't see anything like this when I have it write the matrix to a file.
I greatly appreciate any help!
Here's what it looks like, and then the function it calls (in another script).
(I'm really not a programmer, so I apologize if the following code is miserable.) The goal is to compare our data (which is in columns 10-17 of a csv) to all of the data in a big sheet of clinical data in turn. Then, if there is a significant correlation (the p value is less than .05), to graph the two against each other. This gives me a fast way to find if there's something worth looking further into in this big data set.
first <- read.csv(labdata)
second <- read.csv(mrntoimacskey)
third <- read.csv(imacsdata)
firsthalf<-merge(first,second)
mp <-merge(firsthalf, third, by="PATIENTIDNUMBER")
setwd(aplaceforus)
pfile2<- sprintf("%spvalues", todayis)
setwd("fulldataset")
for (m in 10:17) {
    n<-m-9
    pretty= pretties[n]
    for (i in 1:length(colnames(mp))) {
        tryCatch(sigsearchA(pfile2,mp, m, i, crayon=pretty), error= function(e)
            {cat("ERROR :", conditionMessage(e), "\n")})
        tryCatch(sigsearchC(pfile2,mp, m, i, crayon=pretty), error= function(e)
            {cat("ERROR :", conditionMessage(e), "\n")})
    }
}
sigsearchA<-function(n, mp, y, x, crayon="deepskyblue"){
    #anova, plots if significant. takes name of file, name of database,
    #and the count of the columns to use for x and y
    stat<-oneway.test(mp[[y]]~mp[[x]])
    pval<-stat[3]
    heads<-colnames(mp)
    a<-heads[y]
    b<-heads[x]
    ps<-c(a, b, pval)
    write.table(ps, file=n, append= TRUE, sep =",", col.names=FALSE)
    feedback<- paste(c("Added", b, "to", n), collapse=" ")
    if (pval <= 0.05 & pval>0) {
        #horizontal labels
        callit<-paste(c(a,b,".pdf"), collapse="")
        val<-sprintf("p=%.5f", pval)
        pdf(callit)
        plot(mp[[x]], mp[[y]], ylab=a, main=b, col=crayon)
        mtext(val, adj=1)
        dev.off()
        #with vertical labels, in case of many groups
        callit<-paste(c(a,b,"V.pdf"), collapse="")
        pdf(callit)
        plot(mp[[x]], mp[[y]], ylab=a, main=b,las=2,cex.axis=0.7, col=crayon)
        mtext(val, adj=1)
        dev.off()
    }
    print(feedback)
}
graphics.off()
I can't be absolutely certain without a reproducible example, but it looks like the x-variable in your plot (let's call it x and let's assume your data frame is called df) has at least one row with an empty string ("") or maybe a space character (" ") and x is also coded as a factor. Even if you remove all of the "" values from the data frame, the level for that value will still be part of the factor coding and will show up in plots. To remove the level, do df$x = droplevels(df$x) and then run your plot again.
For illustration, here's an analogous example with the built-in iris data frame:
# Shows that Species is coded as a factor
str(iris)
# Species is a factor with three levels
levels(iris$Species)
# There are 50 rows for each level of Species
table(iris$Species)
# Three boxplots, one for each level of Species
boxplot(iris$Sepal.Width ~ iris$Species)
# Now let's remove all the rows with Species = "setosa"
iris = iris[iris$Species != "setosa",]
# The "setosa" rows are gone, but the factor level remains and shows up
# in the table and the boxplot
levels(iris$Species)
table(iris$Species)
boxplot(iris$Sepal.Width ~ iris$Species)
# Remove empty levels
iris$Species = droplevels(iris$Species)
# Now the "setosa" level is gone from all plots and summaries
levels(iris$Species)
table(iris$Species)
boxplot(iris$Sepal.Width ~ iris$Species)
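The same idea can be sketched for the blank-level case described in the answer. The data frame below is made up for illustration (the real column names and values are not known), and it assumes the extra bar comes from an empty-string or whitespace-only level in the x factor.

```r
# Toy data (made up for illustration): one row has an empty-string group
df <- data.frame(
  x = factor(c("", "a", "b", "a", "b")),
  y = c(NA, 1.2, 2.3, 1.1, 2.8)
)
levels(df$x)   # the empty string shows up as a level of its own

# Drop rows whose group is empty or only whitespace, then drop the unused level
has_blank <- trimws(as.character(df$x)) == ""
df <- df[!has_blank, ]
df$x <- droplevels(df$x)
levels(df$x)   # only the real groups remain

# plot() with a factor on the x axis draws one box per remaining level
plot(df$x, df$y)
```

Another option, if the blanks come from the CSV itself, may be to read them in as missing values with the na.strings argument of read.csv, so that no empty-string level is created in the first place.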

How to put comma in large number of VennDiagram?

I have a Venn diagram that I make with the package VennDiagram. The numbers are above 100,000.
I would like the number in the middle to be 150,001, with a comma separator, or 150 001, with a small space in between. Is this possible to do with VennDiagram?
This is my example
library(VennDiagram)
venn.diagram(x = list(A = 1:200000,B = 50000:300000), filename = "../example.tiff")
I don't think you can do this easily. There are two print modes, raw and percent, but these are hard-coded in the function (have a look at VennDiagram::draw.triple.venn). You can add formats by changing the function (which I wouldn't fancy) or by manually tweaking the grobs (which is done below).
library(VennDiagram)
p <- venn.diagram(x = list(A = 1:200000,B = 50000:300000), filename = NULL)
# Change labels for first three text grobs
# hard-coded three, but it would be the number of text labels
# minus the number of groups passed to venn.diagram
idx <- sapply(p, function(i) grepl("text", i$name))
for(i in 1:3){
p[idx][[i]]$label <-
format(as.numeric(p[idx][[i]]$label), big.mark=",", scientific=FALSE)
}
grid.newpage()
grid.draw(p)
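The formatting step in that loop can be tried on its own, without VennDiagram. This is a small sketch of what format() does with big.mark; the helper name add_sep is made up for illustration.

```r
# Hypothetical helper: turn a numeric label (stored as text) into a
# label with a thousands separator.
add_sep <- function(label, mark = ",") {
  # trim = TRUE avoids padding when several labels are formatted at once
  format(as.numeric(label), big.mark = mark, scientific = FALSE, trim = TRUE)
}

add_sep("150001")              # "150,001"
add_sep("150001", mark = " ")  # "150 001"
add_sep(c("49999", "150001"))  # "49,999"  "150,001"
```

Passing mark = " " gives the space-separated variant asked about in the question.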

ggplot2 : printing multiple plots in one page with a loop

I have several subjects for which I need to generate a plot. As I have many subjects, I'd like to have several plots on one page rather than one figure per subject.
Here is what I have done so far:
# Read txt file with subjects name
subjs <- scan ("ListSubjs.txt", what = "")
# Create a list to hold plot objects
pltList <- list()
for(s in 1:length(subjs))
{
setwd(file.path("C:/Users/", subjs[[s]])) #load subj directory
ifile=paste("Co","data.txt",sep="",collapse=NULL) #Read subj file
dat = read.table(ifile)
dat <- unlist(dat, use.names = FALSE) #make dat usable for ggplot2
df <- data.frame(dat)
pltList[[s]]<- print(ggplot( df, aes(x=dat)) + #save each plot with unique name
geom_histogram(binwidth=.01, colour="cyan", fill="cyan") +
geom_vline(aes(xintercept=0), # Ignore NA values for mean
color="red", linetype="dashed", size=1)+
xlab(paste("Co_data", subjs[[s]] , sep=" ",collapse=NULL)))
}
At this point I can display the single plots for example by
print (pltList[1]) #will print first plot
print(pltList[2]) # will print second plot
I'd like a solution that displays several plots on the same page. I've tried something along the lines of previous posts, but I can't get it to work.
For example:
for (p in seq(length(pltList))) {
do.call("grid.arrange", pltList[[p]])
}
gives me the following error
Error in arrangeGrob(..., as.table = as.table, clip = clip, main = main, :
input must be grobs!
I can use more basic graphing features, but I'd like to achieve this with ggplot. Many thanks for your consideration.
Matilde
Your error comes from indexing a list with [[.
Consider
pl = list(qplot(1,1), qplot(2,2))
pl[[1]] returns the first plot, but do.call expects a list of arguments. You could do it with do.call(grid.arrange, pl[1]) (no error), but that's probably not what you want: it arranges a single plot on the page, so there's little point in doing that. Presumably you wanted all plots,
grid.arrange(grobs = pl)
or, equivalently,
do.call(grid.arrange, pl)
If you want a selection of this list, use [,
grid.arrange(grobs = pl[1:2])
do.call(grid.arrange, pl[1:2])
Further parameters can be passed trivially with the first syntax; with do.call care must be taken to make sure the list is in the correct form,
grid.arrange(grobs = pl[1:2], ncol=3, top=textGrob("title"))
do.call(grid.arrange, c(pl[1:2], list(ncol=3, top=textGrob("title"))))
library(gridExtra) # for grid.arrange
library(grid)
grid.arrange(pltList[[1]], pltList[[2]], pltList[[3]], pltList[[4]], ncol = 2, top = "Whatever") # say you have 4 plots; older gridExtra versions used main= instead of top=
OR,
do.call(grid.arrange,pltList)
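Putting the pieces together, a self-contained sketch might look like the following. The subject names and the data are simulated stand-ins (the original files are not available), and the grobs= argument assumes a reasonably recent gridExtra.

```r
library(ggplot2)
library(gridExtra)

set.seed(1)
subjs <- c("s01", "s02", "s03", "s04")  # hypothetical subject names

# Build one plot per subject; lapply returns the list directly,
# so no print() call is needed while filling it.
pltList <- lapply(subjs, function(s) {
  # Simulated stand-in for the values read from each subject's file
  df <- data.frame(dat = rnorm(200, sd = 0.05))
  ggplot(df, aes(x = dat)) +
    geom_histogram(binwidth = 0.01, colour = "cyan", fill = "cyan") +
    geom_vline(xintercept = 0, color = "red", linetype = "dashed") +
    xlab(paste("Co_data", s))
})

# Arrange all plots on one page, two per row
grid.arrange(grobs = pltList, ncol = 2)
```

To write the page to a file, the grid.arrange call can be wrapped between pdf("plots.pdf") and dev.off(); the file name here is only an example.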
You can use the following solution to get it to work.
I would do exactly what you did to build pltList, then use the multiplot function from this recipe. Note that you will need to specify the number of columns. For example, if you want to plot all plots in the list in two columns, you can do this:
print(multiplot(plotlist=pltList, cols=2))

Using plot with title containing text, formulas and variables

I want to make the title of my plot contain text, formulas and variables. Consider the toy example where I want the title to read as:
Histogram of normal distribution with (mu/sigma) equal to (value of mu/sigma)
(where the first bracket is to be rendered as a formula)
Based on some questions around this site, I tried the following code:
x <- rnorm(1000)
mu <- 1
sigma <- 0
hist(x, main=bquote("Histogram of normal distribution with " *frac(mu,sigma)* " equal to ", .(mu/sigma) ) )
Now the problem is that the value of mu/sigma is not shown, like so:
How can I get the last bit to show?
Here's one way to do it:
title(main=substitute(paste("Histogram of normal distribution with ",
frac(mu,sigma), " equal to ", frac(m,s)),
list(m=mu, s=sigma)))
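If the computed value itself should appear in the title, a bquote variant along the lines of the original attempt may also work. In that attempt, the comma before .(mu/sigma) made it a second argument to bquote (its where argument) rather than part of the expression; joining it with * keeps it inside. This is a sketch, and sigma is set to a hypothetical nonzero value here so that the ratio is finite.

```r
set.seed(42)
x <- rnorm(1000)
mu <- 1
sigma <- 2  # hypothetical nonzero value so that mu/sigma is finite

# Everything is joined with * so that .(mu / sigma) stays inside the expression;
# frac(mu, sigma) is drawn as a formula, .(mu / sigma) is replaced by its value.
ttl <- bquote("Histogram of normal distribution with " * frac(mu, sigma) *
              " equal to " * .(mu / sigma))
hist(x, main = ttl)
```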
