ggplot2 proportional squares - r

I am looking to make some kind of proportional squares (for lack of a better name) visualization in R. Example:
Any advice on how to do this in R (preferably ggplot2)?

This type of visualization is called a treemap. Appropriately, you can use the treemap package. You can find a detailed tutorial for treemap here but I'll show you the basics. Below I show you how to create a treemap in ggplot2 as well.
Treemap package
library(treemap)
cars <- mtcars
cars$carname <- rownames(cars)
treemap(
  cars,
  index = "carname",
  vSize = "disp",
  vColor = "cyl",
  type = "value",
  format.legend = list(scientific = FALSE, big.mark = " ")
)
ggplot2
There's also a development package on GitHub for creating treemaps using ggplot2. Here's the repo for installing the package.
library(tidyverse)
library("ggfittext")
library("treemapify")
cars <- mtcars
cars$carname <- rownames(cars)
cars <- mutate(cars, cyl = factor(cyl))
ggplot(cars, aes(area = disp, fill = cyl, label = carname)) +
  geom_treemap() +
  geom_treemap_text(
    fontface = "italic",
    colour = "white",
    place = "centre",
    grow = TRUE
  )
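For intuition about what both packages draw, here is a minimal base R sketch of the core idea: each tile's area share equals its value's share of the total. It uses a simple one-row "slice" layout rather than the squarified layout the packages produce, and the `values` vector and `tiles` data frame are made-up names for illustration only.

```r
# Hypothetical values; any positive numeric vector would do.
values <- c(a = 6, b = 3, c = 1)

# Each tile gets a width equal to its share of the total.
share <- values / sum(values)
xmax <- cumsum(share)
xmin <- xmax - share

tiles <- data.frame(
  name = names(values),
  xmin = unname(xmin),
  xmax = unname(xmax),
  ymin = 0,
  ymax = 1
)
tiles$area <- (tiles$xmax - tiles$xmin) * (tiles$ymax - tiles$ymin)

# Draw the tiles in the unit square.
plot(c(0, 1), c(0, 1), type = "n", xlab = "", ylab = "", axes = FALSE)
rect(tiles$xmin, tiles$ymin, tiles$xmax, tiles$ymax,
     col = c("grey30", "grey60", "grey85"))
text((tiles$xmin + tiles$xmax) / 2, 0.5, tiles$name)
```

A real treemap layout additionally splits the rectangle in both directions so that tiles stay close to square, which is what makes many small values readable.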

Related

Visualizing the CLARA cluster center/medoid

I visualized my CLARA results using fviz_cluster (ggplot2) and I would like to have the medoids of each cluster more prominent (like changing their shape or color, etc) than other data points. The issue is, I have more than 800,000 data points and it is impossible to see it just through the "show.clust.cent".
How can I color the medoids with different colors and make them so much bigger than other data points, or make other data points invisible except the medoids? I also tried to use the star.plot but somehow it didn't work.
I know the line number of the medoids and thought to add it manually, but I also don't know how to integrate it to the fviz_cluster.
Can anyone help me with this? Thank you!
fviz_cluster(clara.res,
  palette = c("#004c6d",
              "#00ffff",
              "#00a1c1",
              "#6efa75",
              "#78ab63",
              "#cc0089",
              "#ffc334",
              "#ff9509",
              "#ffb6de",
              "#00cfe3"
  ), # color palette
  ellipse.type = "t",
  geom = "point",
  show.clust.cent = TRUE,
  repel = TRUE,
  pointsize = 0.5,
  ggtheme = theme_classic()
)
Will this be ok for you?
library(tidyverse)
fpoint = function(n) tibble(
  Dlm1 = rnorm(n, sample(-20:20,1), sample(1:5,1)),
  Dlm2 = rnorm(n, sample(-20:20,1), sample(1:5,1))
)
df = tibble(cluster = paste(1:10)) %>%
  mutate(data = map(cluster, ~fpoint(1000))) %>%
  unnest(data)
df %>% ggplot(aes(Dlm1, Dlm2, color=cluster))+
  geom_point(alpha = 0.2, pch=21)+
  stat_ellipse(size=0.7)
Write data to tibble and use standard ggplot.
Update 1
library(factoextra)
library(cluster)
df = USArrests %>% na.omit() %>% scale()
kmed = pam(df, k = 4)
fviz_cluster(kmed, data = df, alpha=0.3, geom = "point", show.clust.cent = TRUE,repel = TRUE, pointsize = 2)
Is that the point?

Heatmap in plotly with defined colors per category in r

I am trying to plot a heatmap with specified colors (by category) in plotly. I asked a similar question here: "Split" up by category in plotly.
However, I ran into a new problem while trying a similar thing with a heatmap. My code looks like:
# Test DataFrame
test_df <- data.frame(
  "weekday" = c("Fr", "Sa", "Su"),
  "time" = c("06:00:00", "12:00:00", "18:00:00"),
  "channel" = c("NBC", "CBS Drama", "ABC"),
  "colors" = c("#FCB711", "#162B48", "#AA8002"),
  "views" = c(1200, 1000, 1250)
)
plot_ly(colors = unique(as.character(test_df$colors)), type = "heatmap") %>%
  add_trace(test_df,
            x = test_df$weekday,
            y = test_df$time,
            z = test_df$views,
            type = "heatmap")
What I get is the following picture:
The problems I have here are:
1. The colors are not the colors which I told R to use
2. I do not want a colorscale; rather, I want the tiles colored by category (one color per channel).
I know there is a workaround in ggplot, and I am working on it, but I want to have it in plotly.
Here is what it looks like in ggplot and what I want to have in plotly (I am aware of ggplotly, but that still isn't pure plotly):
Here is the code for the above picture:
channel_colors <- test_df %>% distinct(colors) %>% pull(colors)
names(channel_colors) <- test_df %>% distinct(channel) %>% pull(channel)
p <- ggplot(data = test_df,
            aes(
              x = weekday,
              y = time,
              fill = channel)) +
  geom_tile(aes(alpha = views)) +
  scale_alpha(range = c(0.5, 1)) +
  theme_minimal() +
  scale_fill_manual(values = channel_colors)
ggplotly(p)
I would appreciate any help.
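A side note on the named color vector used in the ggplot code above: building the colors and the names from two separate `distinct()` calls only pairs them correctly if both columns deduplicate in the same order. A minimal base R sketch that deduplicates the (channel, color) pairs together is shown here. It reuses only the `channel` and `colors` columns from the question, and `pairs` is a name introduced for illustration.

```r
# Only the two columns needed for the color mapping, copied from the question.
test_df <- data.frame(
  channel = c("NBC", "CBS Drama", "ABC"),
  colors  = c("#FCB711", "#162B48", "#AA8002"),
  stringsAsFactors = FALSE
)

# Deduplicate channel/color pairs together so they cannot drift apart.
pairs <- unique(test_df[, c("channel", "colors")])

# Fail early if one channel was given two different colors.
stopifnot(!anyDuplicated(pairs$channel))

# Named vector: names are channels, values are hex colors.
channel_colors <- setNames(pairs$colors, pairs$channel)
channel_colors
```

The resulting vector can be passed to `scale_fill_manual(values = channel_colors)` exactly as in the question.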

How do I plot the Variable Importance of my trained rpart decision tree model?

I trained a model using rpart and I want to generate a plot displaying the Variable Importance for the variables it used for the decision tree, but I cannot figure out how.
I was able to extract the Variable Importance. I've tried ggplot but none of the information shows up. I tried using the plot() function on it, but it only gives me a flat graph. I also tried plot.default, which is a little better but still not what I want.
Here's rpart model training:
argIDCART = rpart(Argument ~ .,
                  data = trainSparse,
                  method = "class")
Got the variable importance into a data frame.
argPlot <- as.data.frame(argIDCART$variable.importance)
Here is a section of what that prints:
argIDCART$variable.importance
noth 23.339346
humanitarian 16.584430
council 13.140252
law 11.347241
presid 11.231916
treati 9.945111
support 8.670958
I'd like to plot a graph that shows the variable/feature name and its numerical importance. I just can't get it to do that. It appears to only have one column. I tried separating them using the separate function, but can't do that either.
ggplot(argPlot, aes(x = "variable importance", y = "feature"))
Just prints blank.
The other plots look really bad.
plot.default(argPlot)
Looks like it plots the points, but doesn't put the variable name.
Since no reproducible example was provided, I based my answer on a dataset that ships with R (kyphosis, from the rpart package), using ggplot2 for plotting and a few other packages for data manipulation.
library(rpart)
library(tidyverse)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
# Importance is a named vector; the names become row names here
df <- data.frame(imp = fit$variable.importance)
# Move the row names into a column and order the factor by importance
df2 <- df %>%
  tibble::rownames_to_column() %>%
  dplyr::rename("variable" = rowname) %>%
  dplyr::arrange(imp) %>%
  dplyr::mutate(variable = forcats::fct_inorder(variable))
# Bar chart
ggplot2::ggplot(df2) +
  geom_col(aes(x = variable, y = imp),
           col = "black", show.legend = FALSE) +
  coord_flip() +
  scale_fill_grey() +
  theme_bw()
# Lollipop chart
ggplot2::ggplot(df2) +
  geom_segment(aes(x = variable, y = 0, xend = variable, yend = imp),
               size = 1.5, alpha = 0.7) +
  geom_point(aes(x = variable, y = imp, col = variable),
             size = 4, show.legend = FALSE) +
  coord_flip() +
  theme_bw()
If you want to see the variable names, it may be best to use them as the labels on the x-axis.
plot(argIDCART$variable.importance, xlab="variable",
     ylab="Importance", xaxt = "n", pch=20)
axis(1, at=1:7, labels=row.names(argIDCART))
(You may need to resize the window to see the labels properly.)
If you have a lot of variables, you may want to rotate the variable names so that they do not overlap.
par(mar=c(7,4,3,2))
plot(argIDCART$variable.importance, xlab="variable",
     ylab="Importance", xaxt = "n", pch=20)
axis(1, at=1:7, labels=row.names(argIDCART), las=2)
Data
argIDCART = read.table(text="variable.importance
noth 23.339346
humanitarian 16.584430
council 13.140252
law 11.347241
presid 11.231916
treati 9.945111
support 8.670958",
header=TRUE)
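Note that the stand-in data above is a data frame, so `row.names()` supplies the labels. With a real rpart fit, `variable.importance` is a named numeric vector, which is why it looks like it "only has one column"; the labels then come from `names()`. A minimal base R sketch, assuming the seven values printed in the question (the names `imp` and `imp_df` are introduced here for illustration):

```r
# Named numeric vector, same shape as fit$variable.importance.
# Values copied from the question's printed output.
imp <- c(noth = 23.339346, humanitarian = 16.584430, council = 13.140252,
         law = 11.347241, presid = 11.231916, treati = 9.945111,
         support = 8.670958)

# Turn the names into a proper column so there are two columns to plot.
imp_df <- data.frame(
  feature = names(imp),
  importance = unname(imp),
  stringsAsFactors = FALSE
)

# Sort ascending so the most important feature ends up on top
# of a horizontal bar chart.
imp_df <- imp_df[order(imp_df$importance), ]

op <- par(mar = c(4, 8, 2, 1))  # widen the left margin for the labels
barplot(imp_df$importance, names.arg = imp_df$feature,
        horiz = TRUE, las = 1, xlab = "Importance")
par(op)
```

The same `imp_df` can be handed to ggplot2 with unquoted column names, for example `aes(x = feature, y = importance)`; quoting them, as in the question, maps constant strings instead of columns.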

R: Change axis label in plot.Mclust AND/OR plot uncertainty of mclust model with ggplot2

I am really confused. I would like to change the axis labels of a plot (classification or uncertainty) for a 'Mclust' model object in R and I don't understand why it's working for a simple object with just two variables, but not several ones.
Here is an example:
require(mclust)
mod1 = Mclust(iris[,1:2])
plot(mod1, what = "uncertainty", dimens = c(1,2), xlab = "test")
# changed x-axis-label
mod2 = Mclust(iris[,1:4])
plot(mod2, what = "uncertainty", dimens = c(1,2), xlab = "test")
# no changed x-axis-label
Another way I tried was with coordProj:
coordProj(data= iris[, -5], dimens = c(1,2), parameters = mod2$parameters,
          z = mod2$z, what = "uncertainty", xlab = "test")
# Error in plot.default(data[, 1], data[, 2], pch = 19, main = "", xlab = xlab, :
# formal argument "xlab" matched by multiple actual arguments
So I thought, maybe it will work with ggplot2 (and that would be my favourite option). Now I can change the axis labels and so on but I don't know how to plot the ellipses?
require(ggplot2)
ggplot(data = iris) +
  geom_point(aes(x = Sepal.Length, y = Sepal.Width, size = mod2$uncertainty)) +
  scale_x_continuous(name = "test")
It would be nice, if someone might know a solution to change the axis labels in plot.Mclust or to add the ellipses to ggplot.
Thanks a lot!
I started to look at the code for plot.Mclust, but then I just used stat_ellipse and changed the level until the plots looked the same. It appears to be a joint t-distribution (the default) at 50% confidence (instead of the default 95%). There's probably a better way to do it using the actual covariance matrix (mod2$parameters$variance$sigma), but this gets you to where you want.
require(dplyr)
iris %>%
  mutate(uncertainty = mod2$uncertainty,
         classification = factor(mod2$classification)) %>%
  ggplot(aes(Sepal.Length, Sepal.Width, size = uncertainty, colour = classification)) +
  geom_point() +
  guides(size = "none", colour = "none") + theme_classic() +
  stat_ellipse(level = 0.5, type = "t") +
  labs(x = "Label X", y = "Label Y")
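As for the "better way" using the actual covariance matrix: the following is a minimal base R sketch, not mclust's own drawing code. It computes points on the Gaussian contour that encloses a given probability mass for one 2-D component. The function name `ellipse_points` and the `mu` and `sigma` values are made up for illustration. In practice they would come from the fitted model's parameters for the two plotted dimensions, and mclust may scale its ellipses differently.

```r
# Points on the contour of a bivariate normal with mean `mu` and
# covariance `sigma` that encloses probability `level`.
ellipse_points <- function(mu, sigma, level = 0.5, n = 100) {
  # Squared Mahalanobis distance of a 2-D normal is chi-square with 2 df.
  radius <- sqrt(stats::qchisq(level, df = 2))
  theta <- seq(0, 2 * pi, length.out = n)
  unit_circle <- rbind(cos(theta), sin(theta))
  # The lower Cholesky factor maps the unit circle onto the covariance shape.
  pts <- mu + radius * (t(chol(sigma)) %*% unit_circle)
  data.frame(x = pts[1, ], y = pts[2, ])
}

# Hypothetical mean and covariance for a single component.
mu <- c(5.8, 3.0)
sigma <- matrix(c(0.68, -0.04, -0.04, 0.19), nrow = 2)
ell <- ellipse_points(mu, sigma, level = 0.5)
head(ell)
```

The resulting data frame can be drawn on top of the scatter plot with something like `geom_path(data = ell, aes(x, y), inherit.aes = FALSE)`, once per cluster.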

exporting to powerpoint using a function to create graphs

I’m currently trying to export multiple graphs into the same Powerpoint presentation in R. The multiple graphs are created using a function.
However, when I run the code below it produces a separate PowerPoint for each of the variables Age_Banded, InstalmentsRequestedInd and NetPrem_Banded (I want them in the same one for each of CalcCommission and CalcCommission_Perc). This is because the ggsave just looks at the last plot, I'm assuming.
Any ideas?
Also, the CreateGraph function is just producing the graph for CalcCommission_Perc. Both CalcCommission and CalcCommission_Perc work independently when the other is removed…
require(ggplot2)
require(RDCOMClient)
require(R2PPT)
date <- "20160401"
CalcCommission <- function(Variable,FName,AxisAngle){
  Actual_Commission <- tapply(Converted_A$Commission,Converted_A[Variable],mean)
  Predicted_Commission <- tapply(Final_cut$Commission_Response*Final_cut$Origination.Demand,Final_cut[Variable],sum)/tapply(Final_cut$Origination.Demand,Final_cut[Variable],sum)/100
  Data <- data.frame(x=names(Actual_Commission),Actual_Commission,Predicted_Commission)
  Commission_Plot <- ggplot(Data,aes(x=seq(length(unique(x))))) +
    geom_line(aes(y=Actual_Commission, colour = "Actual Commission")) +
    geom_line(aes(y=Predicted_Commission, colour = "Predicted Commission")) +
    scale_x_continuous(name = FName,
                       breaks = seq(length(unique(Data$x))),
                       labels = unique(Data$x)) +
    scale_y_continuous(name = "Commission £") +
    ggtitle("Commission £") +
    theme(legend.title=element_blank(),axis.text.x = element_text(angle = AxisAngle, hjust = 1))
  mypres <- PPT.Init(method="RDCOMClient")
  mypres<-PPT.AddTitleSlide(mypres,title="Commission £",subtitle=date)
  ggsave(my_temp_file<-paste(tempfile(),".wmf",sep=""), plot=Commission_Plot)
  mypres <- PPT.AddBlankSlide(mypres)
  mypres <- PPT.AddGraphicstoSlide(mypres,file=my_temp_file)
  unlink(my_temp_file)
}
CalcCommission_Perc <- function(Variable,FName,AxisAngle){
  Actual_Commission_Perc <- tapply((Converted_A$Commission/Converted_A$NetPremium)*100,Converted_A[Variable],mean)
  Predicted_Commission_Perc <- (((tapply(Final_cut$Commission_Response*Final_cut$Origination.Demand,Final_cut[Variable],sum)/tapply(Final_cut$Origination.Demand,Final_cut[Variable],sum))/100)/
    (tapply(Final_cut$Prem_Net*Final_cut$Origination.Demand,Final_cut[Variable],sum)/tapply(Final_cut$Origination.Demand,Final_cut[Variable],sum)))*100
  Data <- data.frame(x=names(Actual_Commission_Perc),Actual_Commission_Perc,Predicted_Commission_Perc)
  Commission_Perc_Plot <- ggplot(Data,aes(x=seq(length(unique(x))))) +
    geom_line(aes(y=Actual_Commission_Perc, colour = "Actual Commission %")) +
    geom_line(aes(y=Predicted_Commission_Perc, colour = "Predicted Commission %")) +
    scale_x_continuous(name = FName,
                       breaks = seq(length(unique(Data$x))),
                       labels = unique(Data$x)) +
    scale_y_continuous(name = "Commission £") +
    ggtitle("Commission %") +
    theme(legend.title=element_blank(),axis.text.x = element_text(angle = AxisAngle, hjust = 1))
  mypres <- PPT.Init(method="RDCOMClient")
  mypres<-PPT.AddTitleSlide(mypres,title="Commission %",subtitle=date)
  ggsave(my_temp_file<-paste(tempfile(),".wmf",sep=""), plot=Commission_Perc_Plot)
  mypres <- PPT.AddBlankSlide(mypres)
  mypres <- PPT.AddGraphicstoSlide(mypres,file=my_temp_file)
  unlink(my_temp_file)
}
CreateGraph <- function(Variable,FName,AxisAngle){
  CalcCommission(Variable,FName,AxisAngle)
  CalcCommission_Perc(Variable,FName,AxisAngle)
}
CreateGraph("Age_Banded","Age",0)
CreateGraph("InstalmentsRequestedInd","DD Payment",0)
CreateGraph("NetPrem_Banded","Net Premium",45)
Here's one way to save two plots in one pptx file:
library(ReporteRs)
library(ggplot2)
library(magrittr)
pptx() %>%
  addSlide("Title and Content") %>%
  addTitle("plot #1") %>%
  addPlot(function() barplot( 1:8, col = 1:8 )) %>%
  addSlide("Title and Content") %>%
  addTitle("plot #2") %>%
  addPlot(fun = print, x = qplot(Sepal.Length, Petal.Length, data = iris, color = Species, size = Petal.Width, alpha = I(0.7) )) %>%
  writeDoc(ppfn <<- tempfile(fileext = ".pptx"))
ppfn contains the PowerPoint file name including its path. Check out the package documentation here.
The answer above is outdated, as ReporteRs has been removed from CRAN and is superseded by officer. I just made a new package, export, built on top of officer, that makes it easy to export several graphs to a single PowerPoint presentation using the graph2ppt() command and the append=TRUE option, e.g. to produce a presentation with 2 slides:
install.packages("export")
library(export)
library(ggplot2)
qplot(Sepal.Length, Petal.Length, data = iris, color = Species,
      size = Petal.Width, alpha = I(0.7))
graph2ppt(file="plots.pptx", width=6, height=5)
qplot(Sepal.Width, Petal.Length, data = iris, color = Species,
      size = Petal.Width, alpha = I(0.7))
graph2ppt(file="plots.pptx", width=6, height=5, append=TRUE)
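Whichever package is used, the structural problem in the question's code appears to be that each helper calls PPT.Init() itself, so every call starts a new presentation. The usual fix is to create the presentation once, pass it into each helper, and have each helper return it. Here is a minimal sketch of that pattern in plain R. The "deck" is just a list standing in for a presentation object, and every function name below is hypothetical rather than part of R2PPT, officer or export.

```r
# Stand-in for a presentation: a plain list of slide titles (hypothetical).
new_deck <- function() list(slides = list())

# Append one slide and return the modified deck.
add_plot_slide <- function(deck, title) {
  deck$slides[[length(deck$slides) + 1]] <- title
  deck
}

# Each helper receives the deck and returns it; neither creates a new one.
add_commission <- function(deck, fname) {
  add_plot_slide(deck, paste("Commission by", fname))
}
add_commission_perc <- function(deck, fname) {
  add_plot_slide(deck, paste("Commission % by", fname))
}

create_graphs <- function(deck, fname) {
  deck <- add_commission(deck, fname)
  deck <- add_commission_perc(deck, fname)
  deck
}

# One deck, initialised once, accumulates the slides for all variables.
deck <- new_deck()
for (fname in c("Age", "DD Payment", "Net Premium")) {
  deck <- create_graphs(deck, fname)
}
length(deck$slides)
```

Applied to the question, this would mean calling PPT.Init() once outside the functions and threading `mypres` through CalcCommission and CalcCommission_Perc as an argument and return value.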
