R: Saving JPEGs in R

I am working with the R programming language.
I have the following dataset:
library(treemap)
library(dplyr)
var1 <- c("A", "B", "C", "D", "E")
var1<- sample(var1, 1000, replace=TRUE, prob=c(0.2, 0.2, 0.2, 0.2, 0.2))
var1<- as.factor(var1)
var2 <- c("abc", "bcd", "egf", "hiu", "lkj", "oiu", "piu", "xsr", "zze", "tre")
var2<- sample(var2, 1000, replace=TRUE, prob=c(0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1))
var2<- as.factor(var2)
my_data = data.frame(var1, var2)
my_data = data.frame(my_data %>% group_by(var1, var2) %>% summarise(counts = n()))
I am now using this code to make a treemap for the entire dataset:
resize.win <- function(Width=6, Height=6)
{
# works for windows
dev.off(); # dev.new(width=6, height=6)
windows(record=TRUE, width=Width, height=Height)
}
resize.win(10,10)
treemap(my_data, index = c("var1", "var2"), vSize = "counts", type = "index", palette = "Set1", title = "my map")
Now I would like to use a loop to make an individual treemap for each level of var1, and save each one to a file. This is what I tried:
levs = my_data$var1
factors = lapply(levs[sapply(levs, is.factor)], levels)
factors = factors[[1]]
my_list = list()
for (i in 1:length(factors))
{
index_i = factors[[i]]
title_i = paste0("Tree Map", index_i)
data_i = my_data[which(my_data$var1 == index_i),]
treemap_i = treemap(data_i, index = c("var1", "var2"), vSize = "counts", type = "index",
palette = "Set1", title = title_i)
my_list[[i]] = treemap_i
save_title_i = paste0("treemap_",i, ".png")
png(save_title_i )
}
However, I don't think I am doing this right, since the saved files come out corrupted.
Can someone please show me how to fix this?
Thanks!

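In your loop, png() is called after treemap() and the file is never closed, so each plot is drawn on whichever device was open before it (the screen or the previous file), and none of the files is written out completely.
If you want to save the plot to a file, remember two things:
use png() (or bmp(), jpeg(), pdf(), etc.) before the plot, to open the file
use dev.off() after the plot, to close the file
In addition, you can change the dimensions of the saved plot (in pixels for png() and jpeg()):
png(save_title_i, width = 800, height = 800)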
factors = levels(my_data$var1) # the levels of var1 to loop over
my_list = list()
for (i in seq_along(factors))
{
index_i = factors[[i]]
title_i = paste0("Tree Map ", index_i)
data_i = my_data[which(my_data$var1 == index_i),]
save_title_i = paste0("treemap_", i, ".png") # use jpeg() and ".jpg" for JPEG files
png(save_title_i) # open the file
treemap_i = treemap(data_i, index = c("var1", "var2"), vSize = "counts", type = "index", palette = "Set1", title = title_i)
my_list[[i]] = treemap_i
dev.off() # close the file
}
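For reuse, the same open/plot/close pattern can be wrapped in a small function. This is only a sketch: save_per_level() is a hypothetical helper, and barplot() stands in for treemap() so that it runs with base R alone (treemap() also draws on whichever device is currently open, so a wrapper around it passed as plot_fun should behave the same way). The device is closed in finally, so a plot that fails does not leave a file open.

```r
# Hypothetical helper: write one plot per level of `by`, opening the graphics
# device before each plot and always closing it afterwards.
save_per_level <- function(data, by, plot_fun, device = grDevices::png,
                           ext = "png", dir = tempdir(), ...) {
  lvls <- levels(factor(data[[by]]))
  paths <- character(0)
  for (lv in lvls) {
    path <- file.path(dir, paste0("plot_", lv, ".", ext))
    device(path, ...)                  # open the file (extra args go to the device)
    tryCatch(
      plot_fun(data[data[[by]] == lv, , drop = FALSE], paste("Plot", lv)),
      finally = grDevices::dev.off()   # close the file even if plotting fails
    )
    paths <- c(paths, path)
  }
  paths
}

# Example data in the same shape as the question (before summarising)
set.seed(1)
my_data <- data.frame(
  var1 = factor(sample(c("A", "B", "C", "D", "E"), 1000, replace = TRUE)),
  var2 = factor(sample(c("abc", "bcd", "egf", "hiu"), 1000, replace = TRUE))
)

# Base-graphics stand-in for treemap(): counts of var2 within one level of var1
bar_plot <- function(d, title) barplot(table(d$var2), main = title)

# pdf() is used here because it is available on every platform
paths <- save_per_level(my_data, "var1", bar_plot,
                        device = grDevices::pdf, ext = "pdf",
                        width = 6, height = 6)
print(paths)
```

Passing device = grDevices::jpeg and ext = "jpg" should give JPEG files instead; for png() and jpeg(), width and height are in pixels rather than inches.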

Related

R: Deduplicating Legends in Plotly

I am working with the R programming language. I have the following data:
myFun <- function(n = 5000) {
a <- do.call(paste0, replicate(5, sample(LETTERS, n, TRUE), FALSE))
paste0(a, sprintf("%04d", sample(9999, n, TRUE)), sample(LETTERS, n, TRUE))
}
var1 <- c("a", "b", "c", "d", "e")
var1 <- sample(var1, 100, replace=TRUE, prob=c(0.2, 0.2, 0.2, 0.2, 0.2))
var2 = rnorm(100, 100,100)
var3 = rnorm(100,100,100)
var4 = rnorm(100, 100,100)
label1 = myFun(100)
label2 = myFun(100)
label3 = myFun(100)
label4 = myFun(100)
my_data =data.frame(var1, var2, var3, var4, label1, label2, label3, label4)
I am trying to make a Bubble Plot using the Plotly library. Here is the code I am using:
library(plotly)
p = plot_ly(my_data, x = ~ var2, y = ~ var3, color = ~ var1, type = "scatter", mode = "markers", size = ~var4, marker = list(symbol = 'circle', sizemode = 'diameter', line = list(width = 2, color = '#FFFFFF'), opacity =0.4, autosize = T))
p = p %>% layout(xaxis = list(title = 'Title 1'), yaxis = list(title = 'title 2'), legend = list(title = list(text = '<b> var1 </b>')))
p = p %>% layout(title = paste0('Main Title', '<br>', '<sup>', 'Subtitle', '</sup>'))
annotation_row = my_data[1,]
annotation = list( x = my_data$var2, y = my_data$var3, text = my_data$var1, xref = "x", yref = "y", showarrow = TRUE, arrowhead = 7, ax = 20, ay = -40)
p = p %>% add_trace (text = paste("Var 1:", my_data$var1, "<br> Var 2: ", my_data$var2, "<br> Var 3: ", my_data$var3 ), hoverinfo = "text" )
p = p %>% layout(annotations = annotation)
htmltools::save_html(html = p, file = "file.html")
The plot runs fine, but I am noticing two problems:
The values in the legend are appearing twice (e.g. a,b,c,d,e,a,b,c,d,e)
When I run this code on my real data, the plot appears "horizontally stretched" (even though the plot in this example looks fine; see e.g. https://stackoverflow.com/questions/73552382/r-forcing-plotly-to-save-full-sized-plots). I tried different options to fix this "horizontal stretching".
For example, instead of using "autosize", I tried to specify the height and width manually:
p = plot_ly(my_data, x = ~ var2, y = ~ var3, color = ~ var1, type = "scatter", mode = "markers", size = ~var4, marker = list(symbol = 'circle', sizemode = 'diameter', line = list(width = 2, color = '#FFFFFF'), opacity =0.4, height = 1000, width = 1000))
Can someone please show me how to fix this problem?
Is it possible to undo this "legend duplication" and re-write the plotly code in such a way that the "horizontal stretching" problem is fixed?
Thanks!

How to segment descriptive bar plots using DataExplorer

I have a dataset and would like to do some exploratory data analysis before building a predictive model. All variables are categorical. I know that I can use 'DataExplorer' to do some quick EDA:
library(tidyverse)
library(DataExplorer)
dat <- data.frame(circuit = sample(c("China", "Murica", "Brazil"), 100, replace = T),
driver = sample(c("Kimi", "Seb", "Max", "Lando", "Lance"), 100, replace = T),
opinion = sample(c("Garbage", "Not.Garbage"), 100, replace = T, prob = c(0.8, 0.2)))
dat %>%
select(-opinion) %>%
plot_bar
However, I want the bars for 'circuit' and 'driver' to be filled in to represent the respective proportions of 'opinion' for each variable (see below). This is so that I can see which predictor variables are most closely associated with my outcome variable.
dat %>%
ggplot(aes(x = circuit, fill = opinion)) +
geom_bar()
However, I don't want to build each plot individually and then use grid.arrange to organize them.
Thanks for any help :)
I don't think there is an easy way unless you tweak the plot_bar function, since it is currently designed to visualize univariate distributions. Run the following function and it should work with your example:
library(tidyverse)
library(data.table) ## Note: You will need to load data.table
library(DataExplorer)
## Rewrite plot_bar
plot_bar2 <- function(data, group, with = NULL, maxcat = 50, order_bar = TRUE, binary_as_factor = TRUE, title = NULL, ggtheme = theme_gray(), theme_config = list(), nrow = 3L, ncol = 3L, parallel = FALSE) {
frequency <- measure <- variable <- value <- NULL
if (!is.data.table(data)) data <- data.table(data)
split_data <- split_columns(data, binary_as_factor = binary_as_factor)
if (split_data$num_discrete == 0) stop("No discrete features found!")
discrete <- split_data$discrete
ind <- DataExplorer:::.ignoreCat(discrete, maxcat = maxcat)
if (length(ind)) {
message(length(ind), " columns ignored with more than ", maxcat, " categories.\n", paste0(names(ind), ": ", ind, " categories\n"))
drop_columns(discrete, names(ind))
if (length(discrete) == 0) stop("Note: All discrete features ignored! Nothing to plot!")
}
feature_names <- names(discrete)
if (is.null(with)) {
dt <- discrete[, list(frequency = .N), by = feature_names]
} else {
if (is.factor(data[[with]])) {
measure_var <- suppressWarnings(as.numeric(levels(data[[with]]))[data[[with]]])
} else if (is.character(data[[with]])) {
measure_var <- as.numeric(data[[with]])
} else {
measure_var <- data[[with]]
}
if (all(is.na(measure_var))) stop("Failed to convert `", with, "` to continuous!")
if (with %in% names(discrete)) drop_columns(discrete, with)
tmp_dt <- data.table(discrete, "measure" = measure_var)
dt <- tmp_dt[, list(frequency = sum(measure, na.rm = TRUE)), by = feature_names]
}
dt2 <- suppressWarnings(melt.data.table(dt, id.vars = c(group, "frequency"), measure.vars = setdiff(feature_names, group))) # This line is updated
layout <- DataExplorer:::.getPageLayout(nrow, ncol, ncol(discrete))
plot_list <- DataExplorer:::.lapply(
parallel = parallel,
X = layout,
FUN = function(x) {
if (order_bar) {
base_plot <- ggplot(dt2[variable %in% feature_names[x]], aes(x = reorder(value, frequency), y = frequency))
} else {
base_plot <- ggplot(dt2[variable %in% feature_names[x]], aes(x = value, y = frequency))
}
base_plot +
geom_bar(stat = "identity", aes_string(fill = group)) + # This line is updated
coord_flip() +
xlab("") + ylab(ifelse(is.null(with), "Frequency", tools::toTitleCase(with)))
}
)
class(plot_list) <- c("multiple", class(plot_list))
plotDataExplorer(
plot_obj = plot_list,
page_layout = layout,
title = title,
ggtheme = ggtheme,
theme_config = theme_config,
facet_wrap_args = list(
"facet" = ~ variable,
"nrow" = nrow,
"ncol" = ncol,
"scales" = "free"
)
)
}
## Create data and plot
dat <- data.frame(
circuit = sample(c("China", "Murica", "Brazil"), 100, replace = T),
driver = sample(c("Kimi", "Seb", "Max", "Lando", "Lance"), 100, replace = T),
opinion = sample(c("Garbage", "Not.Garbage"), 100, replace = T, prob = c(0.8, 0.2))
)
plot_bar2(dat, group = "opinion")
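If the goal is only to see how each predictor relates to the outcome, the proportions can also be computed and drawn without patching DataExplorer. The base-R sketch below uses two hypothetical helpers, outcome_shares() and plot_outcome_shares(). For each level of every predictor, it computes the share of each opinion category and then draws one stacked bar chart per predictor. Note that it shows proportions (each bar sums to 1), whereas plot_bar2 stacks raw counts.

```r
# Example data from the question
set.seed(1)
dat <- data.frame(
  circuit = sample(c("China", "Murica", "Brazil"), 100, replace = TRUE),
  driver = sample(c("Kimi", "Seb", "Max", "Lando", "Lance"), 100, replace = TRUE),
  opinion = sample(c("Garbage", "Not.Garbage"), 100, replace = TRUE, prob = c(0.8, 0.2)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: long table of outcome shares within each predictor level
outcome_shares <- function(data, outcome) {
  predictors <- setdiff(names(data), outcome)
  do.call(rbind, lapply(predictors, function(v) {
    # rows = levels of predictor v, columns = outcome categories; each row sums to 1
    shares <- prop.table(table(value = data[[v]], outcome = data[[outcome]]), margin = 1)
    data.frame(variable = v,
               as.data.frame(shares, responseName = "share", stringsAsFactors = FALSE),
               stringsAsFactors = FALSE)
  }))
}

# Hypothetical helper: one stacked (proportion) bar chart per predictor, side by side
plot_outcome_shares <- function(data, outcome) {
  predictors <- setdiff(names(data), outcome)
  old <- par(mfrow = c(1, length(predictors)))
  on.exit(par(old))
  for (v in predictors) {
    shares <- prop.table(table(data[[v]], data[[outcome]]), margin = 1)
    # t(): bars = predictor levels, stacked segments = outcome categories
    barplot(t(shares), main = v, ylab = "Proportion", legend.text = TRUE)
  }
}

res <- outcome_shares(dat, "opinion")
print(res)
plot_outcome_shares(dat, "opinion")
```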

R Shiny Datatable color cell based on value range

I have this dataset:
test <- data.frame("type" = c("A", "B", "C", "D"), "goal" = c(3000,4500,250,2000),"Jan" = c(4000,2000,240,800), "Feb" = c(2000,3000,300,1500), "Mar" = c(2800,4000,100,1400) )
I would like to display this as a data table in my Shiny app, with color-coded cells using the following criteria:
Green: Below 60% of goal
Amber: 60-80% of goal
Red: Above 80% of goal
I have played around with the JS callback option in the renderDataTable() function, but it seems to need the cells to contain percentages instead of actual numbers. Any help would be greatly appreciated! Thank you!
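This is what I could come up with using formatStyle() and styleEqual(); I'm not sure how scalable you need it to be.
First, compute a code for each month from where the actual value falls relative to the goal (0 = below 60%, 1 = 60-80%, 2 = above 80%).
Then use that code to color the corresponding cell.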
#courtesy: https://stackoverflow.com/a/50950368/5086335
library(DT)
test <- data.frame("type" = c("A", "B", "C", "D"), "goal" = c(3000,4500,250,2000),"Jan" = c(4000,2000,240,800), "Feb" = c(2000,3000,300,1500), "Mar" = c(2800,4000,100,1400) )
# Green: Below 60% of goal Amber: 60-80% of goal Red: Above 80% of goal
test$jan_goal <- ifelse(test$Jan > test$goal * 0.8, 2,
ifelse(test$Jan < test$goal * 0.6, 0,
1))
test$feb_goal <- ifelse(test$Feb > test$goal * 0.8, 2,
ifelse(test$Feb < test$goal * 0.6, 0,
1))
test$mar_goal <- ifelse(test$Mar > test$goal * 0.8, 2,
ifelse(test$Mar < test$goal * 0.6, 0,
1))
DT::datatable(
test,
rownames = FALSE,
options = list(
columnDefs = list(list(targets = c(5,6,7), visible = FALSE))
)
) %>%
formatStyle(columns = "Jan",
valueColumns = "jan_goal",
backgroundColor = styleEqual(levels = c(0,1,2), values = c("#008000","#FFA500","#F00"))) %>%
formatStyle(columns = "Feb",
valueColumns = "feb_goal",
backgroundColor = styleEqual(levels = c(0,1,2), values = c("#008000","#FFA500","#F00"))) %>%
formatStyle(columns = "Mar",
valueColumns = "mar_goal",
backgroundColor = styleEqual(levels = c(0,1,2), values = c("#008000","#FFA500","#F00")))
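To make this scale to more months, the code columns, the hidden-column indices and the formatStyle() calls can be generated in a loop instead of written out by hand. This is a sketch, assuming the month columns are listed explicitly in months and reusing the thresholds and colors above. add_goal_codes() is a hypothetical helper name, and the DT part only runs if DT is installed.

```r
test <- data.frame("type" = c("A", "B", "C", "D"), "goal" = c(3000, 4500, 250, 2000),
                   "Jan" = c(4000, 2000, 240, 800), "Feb" = c(2000, 3000, 300, 1500),
                   "Mar" = c(2800, 4000, 100, 1400))

# Hypothetical helper: add one "<month>_goal" code column per month.
# 0 = below 60% of goal (green), 1 = 60-80% (amber), 2 = above 80% (red),
# using the same comparisons as the ifelse() calls above.
add_goal_codes <- function(df, months, goal_col = "goal", low = 0.6, high = 0.8) {
  for (m in months) {
    actual <- df[[m]]
    goal <- df[[goal_col]]
    df[[paste0(tolower(m), "_goal")]] <-
      ifelse(actual > goal * high, 2, ifelse(actual < goal * low, 0, 1))
  }
  df
}

months <- c("Jan", "Feb", "Mar")
coded <- add_goal_codes(test, months)
code_cols <- paste0(tolower(months), "_goal")
# DT column targets are 0-based when rownames = FALSE
hidden <- match(code_cols, names(coded)) - 1

if (requireNamespace("DT", quietly = TRUE)) {
  widget <- DT::datatable(
    coded, rownames = FALSE,
    options = list(columnDefs = list(list(targets = hidden, visible = FALSE)))
  )
  # one formatStyle() call per month, each driven by that month's code column
  for (i in seq_along(months)) {
    widget <- DT::formatStyle(
      widget, columns = months[i], valueColumns = code_cols[i],
      backgroundColor = DT::styleEqual(c(0, 1, 2), c("#008000", "#FFA500", "#F00"))
    )
  }
  # print `widget`, or return it from DT::renderDT() in a Shiny app, to display it
}
```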

Label edges in geom_net in r

How can I label edges when using geom_net() from the geomnet package?
library(geomnet)
library(ggplot2)
x <- structure(list(from = c("a", "b", "d", "f", "g", "e", "c", "i",
"e", "h", "i", "i", "j", "j"), to = c("", "", "", "", "", "a",
"b", "c", "d", "e", "f", "g", "h", "i"), edge_val = c(NA, NA,
NA, NA, NA, 1.6, 2.25, 1.75, 0.95, 1.8, 3.2, 2.6, 2.95, 2.45)), .Names = c("from",
"to", "edge_val"), class = "data.frame", row.names = c(NA, -14L
))
ggplot(x, aes(from_id = from, to_id = to, linewidth = edge_val)) +
geom_net(layout.alg = "fruchtermanreingold", labelgeom = "text",repel = TRUE,
size = 4, labelon = TRUE, vjust = -1, ecolour = "grey80",
directed = FALSE, fontsize = 4, ealpha = 0.5) +
theme_net()
The idea would be to print each edge's value on the edge itself.
I made some modifications to the code used for geom_net() (found here). It can be used like this:
# similar code as question, with linelabel = edge_val added to aes() & geom_net2
ggplot(x,
aes(from_id = from, to_id = to, linewidth = edge_val, linelabel = edge_val)) +
geom_net2(layout.alg = "fruchtermanreingold", labelgeom = "text", repel = TRUE,
size = 4, labelon = TRUE, vjust = -1, ecolour = "grey80",
directed = FALSE, fontsize = 4, ealpha = 0.5) +
theme_net()
To create geom_net2():
Step 1: Create a modified version of the draw_panel function used by geomnet::GeomNet, with line labels if aes(...) includes a mapping for linelabel.
old.draw_panel <- environment(GeomNet$draw_panel)$f
new.draw_panel <- old.draw_panel
# convert function body to a list, for easier code chunk insertions
body(new.draw_panel) <- as.list(body(new.draw_panel))
# geomnet code includes usage of %||%, which is an unexported function
# (it is identical to the exported version in rlang / purrr, so you can skip
# this step if you have one of those packages loaded)
body(new.draw_panel) <-
append(body(new.draw_panel),
substitute(
"%||%" <- function(a, b) {if (!is.null(a)) a else b}
), after = 1)
# remove the last chunk of code, which returns a grobTree for the geom layer
# (we'll add on a new grobTree later)
body(new.draw_panel) <-
body(new.draw_panel)[-length(body(new.draw_panel))]
# define label_line as NULL
body(new.draw_panel) <-
append(body(new.draw_panel),
substitute(
label_line <- NULL
))
# if aes(...) includes a mapping for linelabel, use it for label_line, positioned at the
# midpoint of each line
body(new.draw_panel) <-
append(body(new.draw_panel),
substitute(
if (!is.null(data$linelabel)){
label_line.df <- subset(data, to != "")
label_line.df$x <- (label_line.df$x + label_line.df$xend) / 2
label_line.df$y <- (label_line.df$y + label_line.df$yend) / 2
label_line.df$label <- label_line.df$linelabel
label_line <- ggplot2::GeomText$draw_panel(label_line.df,
panel_scales, coord)
}
))
# return a grobTree, with label_line added
body(new.draw_panel) <-
append(body(new.draw_panel),
substitute(
ggplot2:::ggname("geom_net2",
grid::grobTree(edges_draw, selfies_draw, selfies_arrows,
GeomPoint$draw_panel(vertices, panel_scales, coord),
label_grob, label_line))
))
body(new.draw_panel) <- as.call(body(new.draw_panel))
rm(old.draw_panel)
Step 2: Create GeomNet2 ggproto, which inherits from geomnet::GeomNet, but uses the modified draw_panel function.
GeomNet2 <- ggproto(`_class` = "GeomNet2",
`_inherit` = geomnet::GeomNet,
draw_panel = new.draw_panel)
Step 3: Create geom_net2 function, which is similar to geomnet::geom_net, except that it uses GeomNet2 as its geom.
geom_net2 <- function (
mapping = NULL, data = NULL, stat = "net", position = "identity", show.legend = NA,
na.rm = FALSE, inherit.aes = TRUE, layout.alg="kamadakawai", layout.par=list(),
directed = FALSE, fiteach=FALSE, selfloops = FALSE, singletons = TRUE, alpha = 0.25,
ecolour=NULL, ealpha=NULL, arrow=NULL, arrowgap=0.01, arrowsize=1, labelon=FALSE,
labelcolour=NULL, labelgeom = 'text', repel = FALSE,
vertices=NULL, ...) {
ggplot2::layer(
geom = GeomNet2, mapping = mapping, data = data, stat = stat,
position = position, show.legend = show.legend, inherit.aes = inherit.aes,
params = list(na.rm = na.rm, layout.alg=layout.alg, layout.par=layout.par,
fiteach=fiteach, labelon=labelon, labelgeom=labelgeom, ecolour = ecolour,
ealpha=ealpha, arrow=arrow, arrowgap=arrowgap, directed=directed, repel=repel,
arrowsize=arrowsize, singletons=singletons, labelcolour=labelcolour,
vertices=vertices, selfloops = selfloops,
...)
)
}
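Steps 1-3 rely on editing a function body as a list of statements. The toy sketch below (f and g are made-up names, unrelated to geomnet) shows that mechanism in isolation. as.list() splits a braced body into the `{` symbol followed by one element per statement, append() adds new statements captured with quote(), and as.call() turns the list back into a body.

```r
# A toy function whose body we will patch, in the same spirit as Step 1
f <- function(x) {
  y <- x * 2
  y + 1
}

g <- f
# first element is the `{` symbol, then one element per statement
parts <- as.list(body(g))
# drop the final statement (the return value), like removing the old grobTree
parts <- parts[-length(parts)]
# append new statements; quote() captures code without evaluating it
parts <- append(parts, list(quote(z <- y + 100), quote(c(y = y, z = z))))
body(g) <- as.call(parts)

print(g)
print(g(1))  # the patched copy; f itself is unchanged
```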
You can use the ggplot_build object to get the positions for the text labels. You need to set the seed so that the plot is not redrawn with a new layout.
library(geomnet)
library(ggplot2)
set.seed(1)
p <- ggplot(x, aes(from_id = from, to_id = to, linewidth = edge_val)) +
geom_net(layout.alg = "fruchtermanreingold", labelgeom = "text",repel = TRUE,
size = 4, labelon = TRUE, vjust = -1, ecolour = "grey80",
directed = FALSE, fontsize = 4, ealpha = 0.5) +
theme_net()
# grab plot data
g <- ggplot_build(p)
edgeData <- subset(g$data[[1]], !is.na(linewidth))
# draw labels
# x and y coords are mid between vertices
set.seed(1) # use the same seed
p + geom_text(data=edgeData,
aes(x=(xend+x)/2, y=(yend+y)/2, label=linewidth),
inherit.aes = FALSE)
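Both answers place each edge label at the midpoint of its edge, ((x + xend)/2, (y + yend)/2). The base-graphics sketch below shows just that computation. It uses made-up node coordinates in place of the layout that geom_net computes, so nodes and edges here are hypothetical data, not output from ggplot_build().

```r
# Hypothetical node coordinates, standing in for a layout computed by geom_net
nodes <- data.frame(name = c("a", "b", "c", "d"),
                    x = c(0, 2, 1, 3), y = c(0, 0, 2, 2),
                    stringsAsFactors = FALSE)
# Edge list in the question's format: rows with to == "" are isolated vertices
edges <- data.frame(from = c("a", "b", "d"), to = c("b", "c", ""),
                    edge_val = c(1.6, 2.25, NA), stringsAsFactors = FALSE)

# keep only real edges, as geom_net2 does with subset(data, to != "")
e <- subset(edges, to != "")
from_i <- match(e$from, nodes$name)
to_i <- match(e$to, nodes$name)
e$x <- nodes$x[from_i]
e$y <- nodes$y[from_i]
e$xend <- nodes$x[to_i]
e$yend <- nodes$y[to_i]
# label position = midpoint of each segment
e$mid_x <- (e$x + e$xend) / 2
e$mid_y <- (e$y + e$yend) / 2

# draw nodes, edges, and edge labels at the midpoints
plot(nodes$x, nodes$y, pch = 19, xlim = c(-0.5, 3.5), ylim = c(-0.5, 2.5),
     xlab = "", ylab = "", axes = FALSE)
text(nodes$x, nodes$y, nodes$name, pos = 3)
segments(e$x, e$y, e$xend, e$yend, col = "grey60")
text(e$mid_x, e$mid_y, e$edge_val, col = "blue")
```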
If you are not required to use the geomnet package, take a look at the ggraph package.
Here is something to think about:
library(tidyverse)
library(tidytext)
library(tidygraph)
library(ggraph)
library(ggrepel)
# first, put the data into the shape ggraph expects:
edges <- x[-c(1:5),] # edges
colnames(edges) <- c('a','b','edge_val') # colnames
# second the nodes, taking all the nodes in the edges. You can also give them a weight.
nodes <- rbind(data.frame(node = edges$a, n = 1),data.frame(node = edges$b, n = 1)) %>% group_by(node) %>% summarise(n = sum(n))
Now you have to avoid a possible bug in the package, as stated here:
# here the fix
edges$a <- match(edges$a, nodes$node)
edges$b <- match(edges$b, nodes$node)
# build the graph object from the node and edge tables
tidy <- tbl_graph(nodes = nodes, edges = edges, directed = T)
tidy <- tidy %>%
activate(nodes)
# lastly, the plot
set.seed(1)
ggraph(tidy, layout = "gem") +
geom_node_point(aes(size=1, color = 1)) +
geom_edge_link(alpha = 0.8,aes(label = edge_val)) +
scale_edge_width(range = c(0.2, 2)) +
geom_text_repel(aes(x = x, y=y , label=node), size = 6) +
# (this may emit some font warnings)
theme_graph()
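The node table and the match() step can also be written in base R. In the sketch below, which rebuilds the question's x with stringsAsFactors = FALSE, nodes gets one row per name with the number of edge endpoints that name appears in, and each edge endpoint is then replaced by that node's row number. As I read the tidygraph documentation, tbl_graph() takes the first two edge columns as integer node indices when there are no from/to columns, which is probably why this step is needed.

```r
# Edge list in the question's format (the first five rows have an empty `to`)
x <- data.frame(
  from = c("a", "b", "d", "f", "g", "e", "c", "i", "e", "h", "i", "i", "j", "j"),
  to = c("", "", "", "", "", "a", "b", "c", "d", "e", "f", "g", "h", "i"),
  edge_val = c(NA, NA, NA, NA, NA, 1.6, 2.25, 1.75, 0.95, 1.8, 3.2, 2.6, 2.95, 2.45),
  stringsAsFactors = FALSE
)
edges <- x[-(1:5), ]
colnames(edges) <- c("a", "b", "edge_val")

# Node table: each node and the number of edge endpoints it appears in
# (base-R equivalent of the rbind() / group_by() / summarise() step above)
deg <- table(c(edges$a, edges$b))
nodes <- data.frame(node = names(deg), n = as.integer(deg), stringsAsFactors = FALSE)

# Replace node names by their row number in `nodes`
edges$a <- match(edges$a, nodes$node)
edges$b <- match(edges$b, nodes$node)
print(nodes)
print(edges)
```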

Manipulating legend text in R plotly

I have a data.frame that I'd like to scatter-plot using R's plotly, with two factors that I'd like to color and shape by.
Here's my data:
set.seed(1)
df <- data.frame(x=rnorm(12),y=rnorm(12),
group=c(rep(1,3),rep(2,3),rep(3,3),rep(4,3)),
treatment=c(rep("A",6),rep("B",6)),
stringsAsFactors=F)
df$group <- factor(df$group,levels=1:4)
df$treatment <- factor(df$treatment,levels=c("A","B"))
Here's how I'm trying to plot:
require(plotly)
plot_ly(marker=list(size=10),type='scatter',mode="markers",x=~df$x,y=~df$y,color=~df$group,symbol=~df$treatment) %>%
add_annotations(text="group,treatment",xref="paper",yref="paper",x=1.02, xanchor="left",y=1.02,yanchor="top",legendtitle=TRUE,showarrow=FALSE) %>%
layout(xaxis=list(title="x"),yaxis=list(title="y"))
Is it possible to have the group and treatment text in the legend separated by a comma instead of by a newline, as it is now?
This means that instead of:
1
A
2
A
3
B
4
B
I'll have:
1,A
2,A
3,B
4,B
Sounds trivial, but it's one of the cases where Plotly decides what's good for you.
The legend labels are composed of the categories of color and symbol which are all passed in one command. In order to get control over the output, let's add each trace separately.
for (grou in groups) {
for (treat in treatments) {
trace_data <- subset(df, group == grou & treatment == treat)
if (nrow(trace_data) > 0) {
p <- add_trace(p,
x = trace_data$x,
y = trace_data$y,
marker = list(size = 10,
color = group_colors[grou], # group_colors: hypothetical vector with one color per group
symbol = as.integer(charToRaw(treat)) - 65),
type = 'scatter',
mode = "markers",
name = paste(grou, treat, sep = ",")
)
}
}
}
We pass both the color (not strictly necessary) and the symbol via marker. Both could also be passed as top-level arguments to add_trace(), but then Plotly again decides for you what to do with them.
The legend label is passed via name.
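Note: you need to convert treatment explicitly, because symbol expects either a symbol name or a number (unless your treatments happen to be named e.g. "diamond" or "circle"). Here, as.integer(charToRaw(treat)) - 65 maps "A" to 0 and "B" to 1, which only works for single uppercase letters.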
Complete code
library(utils)
library(plotly)
set.seed(1)
df <- data.frame(x = rnorm(12),
y = rnorm(12),
group = c(rep(1, 3),
rep(2, 3),
rep(3, 3),
rep(4, 3)
),
treatment=c(rep("A", 6),
rep("B", 6)
),
stringsAsFactors = FALSE
)
groups <- unique(df$group)
treatments <- unique(df$treatment)
# one color per group (plotly's first four default colors)
group_colors <- c("#1f77b4", "#ff7f0e", "#2ca02c", "#d62728")
p <- plot_ly()
for (grou in groups) {
for (treat in treatments) {
trace_data <- subset(df, group == grou & treatment == treat)
if (nrow(trace_data) > 0) {
p <- add_trace(p,
x = trace_data$x,
y = trace_data$y,
marker = list(size = 10,
color = group_colors[grou],
symbol = as.integer(charToRaw(treat)) - 65),
type = 'scatter',
mode = "markers",
name = paste(grou, treat, sep = ",")
)
}
}
}
p <- add_annotations(p,
text = "group,treatment",
xref = "paper",
yref = "paper",
x = 0.96,
xanchor = "left",
y = 1.03,
yanchor = "top",
legendtitle = TRUE,
showarrow = FALSE) %>%
layout(xaxis = list(title = "x"),
yaxis = list(title = "y"))
p
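The two ingredients of this answer, a legend label per group/treatment combination and a numeric symbol per treatment, can also be computed up front from the data. The sketch below is a hypothetical restructuring, not the answer's code. It uses match() instead of charToRaw() for the symbol codes, so treatment names do not have to be single capital letters, and it only builds the plotly figure if plotly is installed.

```r
set.seed(1)
df <- data.frame(x = rnorm(12), y = rnorm(12),
                 group = rep(1:4, each = 3),
                 treatment = rep(c("A", "B"), each = 6),
                 stringsAsFactors = FALSE)

# one row per (group, treatment) combination that actually occurs in the data
combos <- unique(df[, c("group", "treatment")])
# legend label "group,treatment"
combos$name <- paste(combos$group, combos$treatment, sep = ",")
# symbol code: position of the treatment among all treatments, starting at 0
combos$symbol <- match(combos$treatment, unique(df$treatment)) - 1L
print(combos)

if (requireNamespace("plotly", quietly = TRUE)) {
  p <- plotly::plot_ly()
  # one trace per combination, so each gets its own legend entry
  for (k in seq_len(nrow(combos))) {
    d <- df[df$group == combos$group[k] & df$treatment == combos$treatment[k], ]
    p <- plotly::add_trace(p, x = d$x, y = d$y, type = "scatter", mode = "markers",
                           marker = list(size = 10, symbol = combos$symbol[k]),
                           name = combos$name[k])
  }
}
```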
