R edit labels on hover of a fviz_cluster - r

I have a fviz_cluster plot that currently shows the x position, y position and cluster of each data point in the hover label. I want it to display the row name (the index column of the data frame, the Company).
Currently the labels look like this:
And this is the code that is generating them.
output$clustering_plot <- renderPlotly({
valuation_total_raised <- unicorn_countries_clustering_cleaned[, c("Valuation...B.", "Total.Raised")]
rownames(valuation_total_raised) <- unicorn_countries_clustering_cleaned$Company
valuation_total_raised <- valuation_total_raised %>% drop_na()
# Perform k-means clustering
kmeans_fancy <- kmeans(valuation_total_raised, max(input$range_clusters) , nstart = 100)
# Add cluster column to the original dataframe
unicorn_countries_clustering_cleaned$cluster <- kmeans_fancy$cluster
# plot the clusters
#fviz_cluster(kmeans_fancy, data = scale(valuation_total_raised), geom = c("point"),ellipse.type = "euclid")
# Create the ggplot2 object
plot <- fviz_cluster(kmeans_fancy, data = scale(valuation_total_raised),
geom = c("point"),
ellipse.type = "convex")
# Convert the ggplot2 object to an interactive plotly object
plotly_plot <- plotly_build(plot)
# Show the interactive plotly object
plotly_plot
})
How can I edit this code so that it displays the Company name of each data point, in which the Company name is the index of each row? (When I add geom = c("point", "text") the Company appears on top of the point. I want it displayed on the label popup that is on the image above)

Related

Venn Diagram in R to show character labels

I'm very new to R and trying to create a 4-dimensional Venn diagram with a data which contains all categorical variables.
For instance, if I have the data below and I want to create a Venn diagram in R that shows the word "hello" at the intersection of A & B instead of the counts and percentages, how do I do that? I used the code ggVennDiagram(x) after creating the list below and it gave me the counts instead of the actual data labels.
x=list()
x$A=as.character(c("hello"))
x$B=as.character(c("hello", "how", "are"))
x$C=as.character(c("how", "You"))
x$D=as.character(c("me", "her", "they"))
The diagram cannot show item labels by default. Here is a solution that relies on a bit of a hack.
(modified from Venn Diagram with Item labels)
library(VennDiagram)
library(stringr)
library(purrr)
# Generate plot
v <- venn.diagram(x,
fill = c("orange", "blue", "red", "green"),
filename=NULL)
# Calculate the overlap regions
overlaps <- calculate.overlap(x)
overlaps <- overlaps[str_sort(names(overlaps), numeric = TRUE)] # sort based on numeric value
# Write the item names into the labels of the global plot object v
# The label indices in v start at the overlap index + 8 for this 4-set diagram. You have to find the offset yourself for other diagrams (3, 5, 6, 7, ... sets)
walk2(seq(overlaps) +8, seq(overlaps),
function(x,y) {v[[x]]$label <<- paste0(overlaps[[y]], collapse = "\n")})
# Draw plot
grid.draw(v)

Base R Choropleth: colors aren't being applied to the map according to the order of the interval/breaks which makes the map hard to read

I created a choropleth with base R but I'm struggling with the colors. First, the colors don't follow the same order as the intervals, and second, two of the intervals use the same color, all of which makes the graph hard to read. This happens regardless of how many colors I use. It also doesn't matter whether I'm using brewer.pal or base colors. Here is a map with its respective legend illustrating the issue.
Below are the statements that I use to create the graph once data has been downloaded:
#Relevant packages:
library(dplyr)
library(RColorBrewer)
library(rgdal)
#create colors vector
pop_colors <- brewer.pal(8,"Purples")
#create breaks/intervals
pop_breaks <- c(0,20000,40000,60000,80000,100000,120000)
#apply breaks to population
cuts <- cut(cal_pop$Pop2016, pop_breaks, dig.lab = 6)
#create a vector with colors by population according to the interval they belong to:
color_breaks <- pop_colors[findInterval(cal_pop$Pop2016,vec = pop_breaks)]
#Create choropleth
plot(cal_pop,col = color_breaks, main = "Calgary Population (2016)")
#create legend
legend("topleft", fill = color_breaks, legend = levels(cuts), title = "Population")
I used the readOGR() command to read the shapefile, which I'm linking here in case anybody is interested in taking a look at the data.
I'd appreciate any advice you could give me.
Thanks!
Your error is in this line:
color_breaks <- pop_colors[findInterval(cal_pop$Pop2016,vec = pop_breaks)]
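I can't read your data file, so I'll use a built-in shapefile instead (the sids data that ships with the maptools package).
library(rgdal) # provides readOGR()
library(RColorBrewer) # provides brewer.pal()
nc <- readOGR(system.file("shapes/", package="maptools"), "sids")
str(nc@data)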
colors <- brewer.pal(8,"Purples")
#create breaks/intervals
sid_breaks <- c(0,2,4,6,8,10,12,20,60)
#apply breaks to the SID counts
sid_cuts <- cut(nc$SID79, sid_breaks, dig.lab = 6, include.lowest = TRUE)
#create a vector with one color per region according to the interval it belongs to:
sid_colors <- colors[sid_cuts]
#Create choropleth
par(mar=c(0,0,0,0))
plot(nc, col = sid_colors)
legend("bottomleft", fill = colors, legend = levels(sid_cuts), ncol = 2, title = "SID (1979)", bty = "n")
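To make the difference between the two versions easier to see, here is a minimal base R sketch without any spatial data. The values in pop and the hex colors in palette are made up for illustration and are not taken from the question's data. It shows that indexing a palette by the factor returned by cut() gives each region the color of its interval, while the legend needs the palette itself (one entry per interval) rather than the per-region color vector.

```r
# Hypothetical values standing in for a population column
pop <- c(5000, 25000, 45000, 115000, 65000)
breaks <- c(0, 20000, 40000, 60000, 80000, 100000, 120000)

# One color per interval: 7 breaks give 6 intervals, so 6 colors
# (illustrative light-to-dark hex values)
palette <- c("#f2f0f7", "#dadaeb", "#bcbddc", "#9e9ac8", "#756bb1", "#54278f")

bins <- cut(pop, breaks, dig.lab = 6, include.lowest = TRUE)

# Indexing by a factor uses its integer codes, so each value
# receives the color of the interval it falls into
region_colors <- palette[bins]

# For the legend, pair the interval labels with the palette itself,
# not with the per-region vector (which is in data order and may repeat)
legend_key <- data.frame(interval = levels(bins), fill = palette)
print(legend_key)
```

Passing region_colors to the legend's fill argument would list colors in the order of the regions, which is why the legend in the question looked shuffled and repeated a color.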

Extracting the exact coordinates of a mouse click in an interactive plot

In short: I'm looking for a way to get the exact coordinates of a series of mouse positions (on-clicks) in an interactive x/y scatter plot rendered by ggplot2 and ggplotly.
I'm aware that plotly (and several other interactive plotting packages for R) can be combined with Shiny, where a box or lasso select can return a list of all data points within the selected subspace. This list will be HUGE in most of the datasets I'm analysing, however, and I need to be able to do the analysis reproducibly in an R markdown format (writing a few, mostly fewer than 5-6, point coordinates is much more readable). Furthermore, I have to know the exact positions of the clicks to be able to extract points within the same polygon of points in a different dataset, so a list of points within the selection in one dataset is not useful.
The grid.locator() function from the grid package does almost what I'm looking for (the one wrapped in, e.g., gglocator). However, I hope there is a way to do the same within an interactive plot rendered by plotly (or maybe something else that I don't know of?), as the data sets are often HUGE (see the plot below) and being able to zoom in and out interactively is very much appreciated during several iterations of analysis.
Normally I have to rescale the axes several times to simulate zooming in and out which is exhausting when doing it MANY times. As you can see in the plot above, there is a LOT of information in the plots to explore (the plot is about 300MB in memory).
Below is a small reprex of how I'm currently doing it using grid.locator on a static plot:
library(ggplot2)
library(grid)
p <- ggplot(mtcars, aes(wt, mpg)) +
geom_point()
locator <- function(p) {
# Build ggplot object
ggobj <- ggplot_build(p)
# Extract coordinates
xr <- ggobj$layout$panel_ranges[[1]]$x.range
yr <- ggobj$layout$panel_ranges[[1]]$y.range
# Variable for selected points
selection <- data.frame(x = as.numeric(), y = as.numeric())
colnames(selection) <- c(ggobj$plot$mapping$x, ggobj$plot$mapping$y)
# Detect and move to plot area viewport
suppressWarnings(print(ggobj$plot))
panels <- unlist(current.vpTree()) %>%
grep("panel", ., fixed = TRUE, value = TRUE)
p_n <- length(panels)
seekViewport(panels, recording=TRUE)
pushViewport(viewport(width=1, height=1))
# Select point, plot, store and repeat
for (i in 1:10){
tmp <- grid.locator('native')
if (is.null(tmp)) break
grid.points(tmp$x,tmp$y, pch = 16, gp=gpar(cex=0.5, col="darkred"))
selection[i, ] <- as.numeric(tmp)
}
grid.polygon(x= unit(selection[,1], "native"), y= unit(selection[,2], "native"), gp=gpar(fill=NA))
#return a data frame with the coordinates of the selection
return(selection)
}
locator(p)
and from here use the point.in.polygon function to subset the data based on the selection.
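For context, a minimal sketch of that subsetting step, assuming point.in.polygon() from the sp package. The polygon vertices in selection are hypothetical and stand in for the clicked coordinates that locator() would return.

```r
library(sp)  # provides point.in.polygon()

# Hypothetical polygon vertices, standing in for clicked coordinates
selection <- data.frame(wt = c(2, 4, 4, 2), mpg = c(15, 15, 25, 25))

# Returns one integer per point:
# 0 = outside, 1 = inside, 2 = on an edge, 3 = on a vertex
inside <- point.in.polygon(mtcars$wt, mtcars$mpg,
                           selection$wt, selection$mpg)

# Keep every point that is inside or on the boundary of the polygon
subset_in_polygon <- mtcars[inside > 0, ]
```

The same selection data frame can be reused against a different dataset, as long as its columns are on the same x/y scales.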
A possible solution could be to add, say 100x100, invisible points to the plot and then use the plotly_click feature of event_data() in a Shiny app, but this is not at all ideal.
Thanks in advance for your ideas or solutions, I hope my question was clear enough.
-- Kasper
I used ggplot2. Besides the materials at https://shiny.rstudio.com/articles/plot-interaction.html, I'd like to mention the following:
Firstly, when you create the plot, don't use "print( )" within "renderPlot( )", or the coordinates would be wrong. For instance, if you have the following in UI:
plotOutput("myplot", click = "myclick")
The following in the Server would work:
output$myplot <- renderPlot({
p = ggplot(data = mtcars, aes(x=mpg, y=hp)) + geom_point()
p
})
But the clicking coordinates would be wrong if you do:
output$myplot <- renderPlot({
p = ggplot(data = mtcars, aes(x=mpg, y=hp)) + geom_point()
print(p)
})
Then, you could store the coordinates by adding to the Server:
mydata = reactiveValues(x_values = c(), y_values = c())
observeEvent(input$myclick, {
mydata$x_values = c(mydata$x_values, input$myclick$x)
mydata$y_values = c(mydata$y_values, input$myclick$y)
})
In addition to the X-Y coordinates, when you use facets with ggplot2, you can refer to the clicked facet panel with
input$myclick$panelvar1
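Putting the pieces above together, a minimal self-contained app might look like the sketch below. The clicks table output and the use of req() are additions for illustration and are not part of the snippets above.

```r
library(shiny)
library(ggplot2)

ui <- fluidPage(
  # click = "myclick" makes the click available as input$myclick
  plotOutput("myplot", click = "myclick"),
  tableOutput("clicks")  # illustrative: lists the stored coordinates
)

server <- function(input, output, session) {
  # Accumulates the coordinates of every click
  mydata <- reactiveValues(x_values = numeric(), y_values = numeric())

  output$myplot <- renderPlot({
    # Return the plot object; do not wrap it in print()
    ggplot(data = mtcars, aes(x = mpg, y = hp)) + geom_point()
  })

  observeEvent(input$myclick, {
    mydata$x_values <- c(mydata$x_values, input$myclick$x)
    mydata$y_values <- c(mydata$y_values, input$myclick$y)
  })

  output$clicks <- renderTable({
    req(length(mydata$x_values) > 0)  # show nothing until the first click
    data.frame(x = mydata$x_values, y = mydata$y_values)
  })
}

app <- shinyApp(ui, server)
# Start it interactively with: runApp(app)
```

The coordinates in input$myclick are in data units (here mpg and hp), so they can be copied into a script or R Markdown document for later reuse.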

How to color connecting lines when comparing two dendrograms in R dendextend

I am using dendextend's tanglegram to compare two dendrograms. Almost everything is working, including coloring the node labels to correspond to the clusters. What is not working is that I want all the connecting lines (from a node label in one dendrogram to the node label in the other dendrogram) to be black. Currently they are randomly colored, which makes it difficult to interpret.
Here is the R script. What is curious to me is that if I run it in RGui, it is correct (all connecting lines black), but the png has connecting lines of various colors.
suppressPackageStartupMessages(library(dendextend))
library(dendextend)
# for StackOverflow question, data is here rather than loaded from csv file
hab1 <- data.frame(matrix(c(100,90.6,88.9,89.2,91.2,98.2,91,55.9,91.5,97.2,90.6,100,93.9,85.3,98.3,90.1,96.2,53,88.7,91.6,88.9,93.9,100,82.9,94.4,88,93.4,51.9,87.1,90.4,89.2,85.3,82.9,100,86.6,89.8,85.2,60.7,95.8,91,91.2,98.3,94.4,86.6,100,90.6,96.4,53.4,89.2,92.2,98.2,90.1,88,89.8,90.6,100,90.4,56,91.8,97,91,96.2,93.4,85.2,96.4,90.4,100,52.4,88.6,92,55.9,53,51.9,60.7,53.4,56,52.4,100,59.8,56,91.5,88.7,87.1,95.8,89.2,91.8,88.6,59.8,100,93.3,97.2,91.6,90.4,91,92.2,97,92,56,93.3,100),nrow=10,ncol=10))
# set the column names, which are used for node labels
colnames(hab1) <- c("W01","W02","W03","W04","W05","W06","W07","W08","W09","W10")
hclust1 <- hclust(as.dist(100 - hab1), method="average")
dend1 <- as.dendrogram(hclust1)
hab2 <- data.frame(matrix(c(100,89.5,87.4,88.1,90.1,96.4,89.7,55.1,89.9,96,89.5,100,93.3,85.3,98.3,89.5,96,52.9,88.2,91.6,87.4,93.3,100,82.4,93.9,87,92.7,51.5,86.1,89.9,88.1,85.3,82.4,100,86.6,89.3,85.1,60.6,95.2,91,90.1,98.3,93.9,86.6,100,90.1,96.2,53.3,88.7,92.2,96.4,89.5,87,89.3,90.1,100,89.7,55.5,90.7,96.4,89.7,96,92.7,85.1,96.2,89.7,100,52.2,88,91.8,55.1,52.9,51.5,60.6,53.3,55.5,52.2,100,59.4,55.9,89.9,88.2,86.1,95.2,88.7,90.7,88,59.4,100,92.7,96,91.6,89.9,91,92.2,96.4,91.8,55.9,92.7,100),nrow=10,ncol=10))
# set the column names, which are used for node labels
colnames(hab2) <- c("W01","W02","W03","W04","W05","W06","W07","W08","W09","W10")
hclust2 <- hclust(as.dist(100 - hab2), method="average")
dend2 <- as.dendrogram(hclust2)
# colors for the node labels
colors_to_use1 <- c("purple","orange","blue","darkolivegreen","orange","purple","magenta","red","darkolivegreen","cyan")
colors_to_use2 <- c("purple","orange","blue","darkolivegreen","orange","purple","magenta","red","darkolivegreen","cyan")
# sort the colors based on their order in dend1
colors_to_use_dend1 <- colors_to_use1[order.dendrogram(dend1)]
labels_colors(dend1) <- colors_to_use_dend1
# sort the colors based on their order in dend2
colors_to_use_dend2 <- colors_to_use2[order.dendrogram(dend2)]
labels_colors(dend2) <- colors_to_use_dend2
dends_1_2 <- dendlist(dend1, dend2)
x <- dends_1_2 %>% untangle(method = "step2side") %>% tanglegram(color_lines = c("black"))
png("Exclude vs not exclude.png")
x %>% plot(main = "Exclude vs not exclude")
dev.off( )
What am I doing wrong? I have tried using common_subtrees_color_lines = FALSE for tanglegram, but to no avail.

Creating Hexbins with Dates in R hexbin()

I am trying to create hexbins where the x-axis is a date using the hexbin function in the hexbin package in R. When I feed in my data, it seems to convert the dates into numerics, which get displayed on the x-axis. I want to force the x-axis to be a date.
#Create Hex Bins
hbin <- hexbin(xData$Date, xData$YAxis, xbins = 80)
#Plot using rBokeh
figure() %>%
ly_hexbin(hbin)
This gives me:
Here's a brute force approach using the underlying grid plotting package. The axes are ugly; maybe someone with better grid skills than I could pretty them up.
library(hexbin)
library(grid)
# make some data
x = seq.Date(as.Date("2015-01-01"),as.Date("2015-12-31"),by='days')
y = sample(x)
# make the plot and capture the plot
p <- plot(hexbin(x,y),yaxt='n',xaxt='n')
# calculate the ticks
x_ticks_date <-
x_ticks <- axTicks(1, log = FALSE, usr = as.numeric(range(x)),
axp=c(as.numeric(range(x)) ,5))
class(x_ticks_date) <- 'Date'
y_ticks_date <-
y_ticks <- axTicks(1, log = FALSE, usr = as.numeric(range(y)),
axp=c(as.numeric(range(y)) ,5))
class(y_ticks_date) <- 'Date'
# push the ticks to the view port.
pushViewport(p$plot.vp@hexVp.off)
grid.xaxis(at=x_ticks, label = format(x_ticks_date))
grid.yaxis(at=y_ticks, label = format(y_ticks_date))
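The approach above works because a Date is stored as the number of days since 1970-01-01, which is what hexbin ends up plotting. A small base R sketch of the round trip from dates to numbers and back to labels follows; it uses pretty() for the tick positions, which is a substitute for illustration and not what the code above uses.

```r
# Same kind of data as above: one date per day of 2015
x <- seq.Date(as.Date("2015-01-01"), as.Date("2015-12-31"), by = "days")

# Dates are stored as days since 1970-01-01
x_num <- as.numeric(x)

# Choose tick positions on the numeric scale
ticks <- pretty(x_num, n = 5)

# Convert the numeric tick positions back to dates for the axis labels
tick_labels <- format(as.Date(ticks, origin = "1970-01-01"))
print(tick_labels)
```

The ticks vector is what goes into the at argument of the axis call, and tick_labels into label.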
