plotly: Map size (shape) to value colum in scatterplot - r

I really appreaciate the 'plotly' r-package. Currently I run into an issue, where I want to visualize a data frame as points and map the point size (as well as the shape potentially) to a dimension of the data frame.
The problem I run into with my own dataset is, that the sizes are somehow "mixed up" in the sense, that the bigger points don't correspond to the bigger values.
I haven't fully understood the options I have with plotly (sizeref and other marker-options; the fundamental difference between mapping the dimension directly or in the marker arguments; etc) , so this is my best shot as a minimal example right here.
(The second plot is closer to what I currently do. If this one could be fixed, it would be preferable to me)
Your thoughts are greatly appreciated. :)
library(plotly)
set.seed(1)
df <- data.frame(x = 1:10,
y = rep(c("id1", "id2"), 5),
col = factor(sample(3, 10, replace = TRUE)))
df$size <- c(40, 40, 40, 30, 30, 30, 20, 20, 20, 10)
df
#> x y col size
#> 1 1 id1 1 40
#> 2 2 id2 2 40
#> 3 3 id1 2 40
#> 4 4 id2 3 30
#> 5 5 id1 1 30
#> 6 6 id2 3 30
#> 7 7 id1 3 20
#> 8 8 id2 2 20
#> 9 9 id1 2 20
#> 10 10 id2 1 10
# Mapping looks right, but the size may not be correct
plot_ly(df,
x = ~x,
y = ~y,
color = ~col,
size = ~size,
type = 'scatter',
mode = 'markers',
hoverinfo = "text",
text = ~paste('</br> x: ', x,
'</br> y: ', y,
'</br> col: ', col,
'</br> size: ', size)
# , marker = list(size = ~size)
)
# Size looks right, but mapping to points is wrong
plot_ly(df,
x = ~x,
y = ~y,
color = ~col,
# size = ~size,
type = 'scatter',
mode = 'markers',
hoverinfo = "text",
text = ~paste('</br> x: ', x,
'</br> y: ', y,
'</br> col: ', col,
'</br> size: ', size)
, marker = list(size = ~size)
)
devtools::session_info() # excerpt
#> plotly * 4.8.0

Related

Seeking help to generate a similar heatmap as attached

I am trying to generate a heatmap as the following figure. I have already tried pheatmap and the code is as follows:
breaks_2 <- seq(min(0), max(2), by = 0.1)
pheatmap::pheatmap(
mat = data,
cluster_cols = F,
cluster_rows = F,
scale = "column",
border_color = "white",
color = inferno(20),
show_colnames = TRUE,
show_rownames = FALSE,
breaks = breaks_2
)
But this does not seem to work. So far I am understanding I am mistaking with defining break or have to use another package than pheatmap. Any suggestion will be really helpful.
The color scale in pheatmap adjusts to the range of the input data. If you want anything above a certain value to be coloured daffodil, then simply send pheatmap a copy of your data with the highest values rounded to 2.
Suppose you have a data frame like this, with values anywhere between 0 and 3:
set.seed(1)
data <- as.data.frame(matrix(runif(64, 0, 3), nrow = 8))
names(data) <- LETTERS[1:8]
data
#> A B C D E F G H
#> 1 0.7965260 1.8873421 2.1528555 0.801662 1.4806239 2.46283888 2.1969412 0.9488151
#> 2 1.1163717 0.1853588 2.9757183 1.158342 0.5586528 1.94118058 2.0781947 1.5559028
#> 3 1.7185601 0.6179237 1.1401055 0.040171 2.4821200 2.34879829 1.4328589 1.9860152
#> 4 2.7246234 0.5296703 2.3323357 1.147164 2.0054002 1.65910893 2.5836284 1.2204906
#> 5 0.6050458 2.0610685 2.8041157 2.609073 2.3827196 1.58915874 1.3142913 2.7386278
#> 6 2.6951691 1.1523112 0.6364276 1.021047 0.3238309 2.36806870 0.7343918 0.8808101
#> 7 2.8340258 2.3095243 1.9550213 1.446240 2.1711328 0.06999361 0.2120371 1.3771972
#> 8 1.9823934 1.4930977 0.3766653 1.798697 1.2338233 1.43169020 0.2983985 0.9971840
Some of the values are greater than two. We want all of these to appear the same colour on our heatmap, so we create a copy of our data for plotting, and round down all of the values that were greater than 2 to be exactly 2:
data_2 <- data
data_2[] <- lapply(data_2, function(x) { x[x > 2] <- 2; x })
So now if we run pheatmap on data_2, we see that all the values that were greater than 2 in our original data frame are coloured daffodil.
library(viridis)
library(pheatmap)
breaks_2 <- seq(0, 2, by = 0.1)
pheatmap(
mat = data_2,
cluster_cols = F,
cluster_rows = F,
border_color = "white",
scale = 'none',
color = inferno(22),
show_colnames = TRUE,
show_rownames = FALSE,
legend_breaks = breaks_2
)

R: Customizing Scatterplots

I am using the R programming language. I am trying to follow the answer posted in this previous stackoverflow post (scatterplot3d: regression plane with residuals) and add a "plane" to a scatterplot.
Suppose I have the following data:
my_data <- data.frame(read.table(header=TRUE,
row.names = 1,
text="
weight height age
1 2998.958 15.26611 53
2 3002.208 18.08711 52
3 3008.171 16.70896 49
4 3002.374 17.37032 55
5 3000.658 18.04860 50
6 3002.688 17.24797 45
7 3004.923 16.45360 47
8 2987.264 16.71712 47
9 3011.332 17.76626 50
10 2983.783 18.10337 42
11 3007.167 18.18355 50
12 3007.049 18.11375 53
13 3002.656 15.49990 42
14 2986.710 16.73089 47
15 2998.286 17.12075 52
"))
I adapted the code to fit my example:
library(scatterplot3d)
model_1 <- lm(age ~ weight + height, data = my_data)
# scatterplot
s3d <- scatterplot3d(my_data$height, my_data$weight, my_data$age, pch = 19, type = "p", color = "darkgrey",
main = "Regression Plane", grid = TRUE, box = FALSE,
mar = c(2.5, 2.5, 2, 1.5), angle = 55)
# regression plane
s3d$plane3d(model_1, draw_polygon = TRUE, draw_lines = TRUE,
polygon_args = list(col = rgb(.1, .2, .7, .5)))
# overlay positive residuals
wh <- resid(model_1) > 0
s3d$points3d(my_data$height, my_data$weight, my_data$age, pch = 19)
Problem: However, the "plane" appears to be absent :
Desired Result:
Can someone please show me what I am doing wrong?
Thanks
The order of height and weight caused the problem.
s3d <- scatterplot3d(my_data$weight, my_data$height,my_data$age, pch = 19, type = c("p"), color = "darkgrey",
main = "Regression Plane", grid = TRUE, box = FALSE,
mar = c(2.5, 2.5, 2, 1.5), angle = 55)
# regression plane
s3d$plane3d(model_1, draw_polygon = TRUE, draw_lines = TRUE,
polygon_args = list(col = rgb(.1, .2, .7, .5)))
# overlay positive residuals
wh <- resid(model_1) > 0
s3d$points3d(my_data$height, my_data$weight, my_data$age, pch = 19)

r plotly chart based on multiple columns [duplicate]

This question already has answers here:
Format axis tick labels to percentage in plotly
(2 answers)
Closed 2 years ago.
I have a df which can have 2 or more columns with the first one month always fixed.I am trying to plot them using plotly r. As of now it has three columns: month,apple,orange. Based on analysis it can have another column banana. Below is the code I am using right now but it even takes the column month for y-axis. How do I fix this:
> sample_test
month apple orange
2 Aug-17 2 1
3 Dec-17 2 1
4 Feb-18 2 1
5 Jan-18 2 1
6 Jul-17 2 1
7 Jun-17 2 1
8 May-17 2 1
9 Nov-17 2 1
10 Oct-17 2 1
11 Sep-17 2 1
p<- plot_ly(sample_test, x = sample_test$month, name = 'alpha', type = 'scatter', mode = 'lines',
line = list(color = 'rgb(24, 205, 12)', width = 4)) %>%
layout(#title = "abbb",
xaxis = list(title = "Time"),
yaxis = list (title = "Percentage"))
for(trace in colnames(sample_test)){
p <- p %>% plotly::add_trace(y = as.formula(paste0("~`", trace, "`")), name = trace)
}
p
The output looks like this :
Does this help?
sample_test <- read.table(
text = ' month apple orange
2 Aug-17 2 1
3 Dec-17 2 1
4 Feb-18 2 1
5 Jan-18 2 1
6 Jul-17 2 1
7 Jun-17 2 1
8 May-17 2 1
9 Nov-17 2 1
10 Oct-17 2 1
11 Sep-17 2 1'
)
sample_test$month <- as.Date(paste('01', sample_test$month, sep = '-'), format = '%d-%b-%y')
library(plotly)
p <- plot_ly(sample_test, type = 'scatter', mode = 'lines',
line = list(color = 'rgb(24, 205, 12)', width = 4)) %>%
layout(#title = "abbb",
xaxis = list(title = "Time"),
yaxis = list (title = "Percentage", tickformat = '%'))
for(trace in colnames(sample_test)[2:ncol(sample_test)]){
p <- p %>% plotly::add_trace(x = sample_test[['month']], y = sample_test[[trace]], name = trace)
}
p
There are couple of things to note here -
While dealing with dates, it's best to format them as dates. This can save a lot of headache later on. It is also useful as most if not all functions that require dealing with dates have methods built to handle them.
While adding traces in a for loop, always reference the vector to be plotted explicitly like data$vector or data[['vector']] and not like y = ~vector, because plotly for some reason ends up plotting just one trace over and over again.
You can specify a trace for the first y element, which will give you your raw counts. Next you can add a format for your y-axis using tickformat, which will convert to percentages.
sample_test <- data.frame(month = c("Aug-17", "Dec-17", "Feb-18"), apple = c(2,2,2), orange = c(1,1,1))
p <- plot_ly(sample_test, x = sample_test$month, y = ~apple, name = 'alpha', type = 'scatter', mode = 'lines',
line = list(color = 'rgb(24, 205, 12)', width = 4)) %>%
layout(xaxis = list(title = "Time")) %>%
layout(yaxis = list(tickformat = "%", title = "Percentage"))
Although for some reason this appears to just multiply by 100 and add a % label for some reason, rather than actually calculate a percentage. From this SO answer, looks like that's all it does. I don't really use plotly, but in ggplot you can do this if you reshape your data to long and map your categorical variable (in this case fruit) as a percent.
Edit: per OP's comment, removed month from being traced.
p <- plot_ly(type = 'scatter', mode = 'lines') %>%
layout(yaxis = list(tickformat = "%", title = "Percentage"))
colNames <- names(sample_test)
colNames <- colNames[-which(colNames == 'month')]
for(trace in colNames){
p <- p %>% plotly::add_trace(data = sample_test, x = ~ month, y = as.formula(paste0("~`", trace, "`")), name = trace)
print(paste0("~`", trace, "`"))
}
p

how to plot networks over a map with the least overlap

I have some authors with their city or country of affiliation. I would like to know if it is possible to plot the coauthors' networks (figure 1), on the map, having the coordinates of the countries. Please consider multiple authors from the same country. [EDIT: Several networks could be generated as in the example and should not show avoidable overlaps]. This is intended for dozens of authors. A zooming option is desirable. Bounty promise +100 for future better answer.
refs5 <- read.table(text="
row bibtype year volume number pages title journal author
Bennett_1995 article 1995 76 <NA> 113--176 angiosperms. \"Annals of Botany\" \"Bennett Md, Leitch Ij\"
Bennett_1997 article 1997 80 2 169--196 estimates. \"Annals of Botany\" \"Bennett MD, Leitch IJ\"
Bennett_1998 article 1998 82 SUPPL.A 121--134 weeds. \"Annals of Botany\" \"Bennett MD, Leitch IJ, Hanson L\"
Bennett_2000 article 2000 82 SUPPL.A 121--134 weeds. \"Annals of Botany\" \"Bennett MD, Someone IJ\"
Leitch_2001 article 2001 83 SUPPL.A 121--134 weeds. \"Annals of Botany\" \"Leitch IJ, Someone IJ\"
New_2002 article 2002 84 SUPPL.A 121--134 weeds. \"Annals of Botany\" \"New IJ, Else IJ\"" , header=TRUE,stringsAsFactors=FALSE)
rownames(refs5) <- refs5[,1]
refs5<-refs5[,2:9]
citations <- as.BibEntry(refs5)
authorsl <- lapply(citations, function(x) as.character(toupper(x$author)))
unique.authorsl<-unique(unlist(authorsl))
coauth.table <- matrix(nrow=length(unique.authorsl),
ncol = length(unique.authorsl),
dimnames = list(unique.authorsl, unique.authorsl), 0)
for(i in 1:length(citations)){
paper.auth <- unlist(authorsl[[i]])
coauth.table[paper.auth,paper.auth] <- coauth.table[paper.auth,paper.auth] + 1
}
coauth.table <- coauth.table[rowSums(coauth.table)>0, colSums(coauth.table)>0]
diag(coauth.table) <- 0
coauthors<-coauth.table
bip = network(coauthors,
matrix.type = "adjacency",
ignore.eval = FALSE,
names.eval = "weights")
authorcountry <- read.table(text="
author country
1 \"LEITCH IJ\" Argentina
2 \"HANSON L\" USA
3 \"BENNETT MD\" Brazil
4 \"SOMEONE IJ\" Brazil
5 \"NEW IJ\" Brazil
6 \"ELSE IJ\" Brazil",header=TRUE,fill=TRUE,stringsAsFactors=FALSE)
matched<- authorcountry$country[match(unique.authorsl, authorcountry$author)]
bip %v% "Country" = matched
colorsmanual<-c("red","darkgray","gainsboro")
names(colorsmanual) <- unique(matched)
gdata<- ggnet2(bip, color = "Country", palette = colorsmanual, legend.position = "right",label = TRUE,
alpha = 0.9, label.size = 3, edge.size="weights",
size="degree", size.legend="Degree Centrality") + theme(legend.box = "horizontal")
gdata
In other words, adding the names of authors, lines and bubbles to the map. Note, several authors maybe from the same city, or country and should not overlap.
Figure 1 Network
EDIT: The current JanLauGe answer overlaps two non-related networks. authors "ELSE" and "NEW" need to be apart from others as in figure 1.
Are you looking for a solution using exactly the packages you used, or would you be happy to use suite of other packages? Below is my approach, in which I extract the graph properties from the network object and plot them on a map using the ggplot2 and map package.
First I recreate the example data you gave.
library(tidyverse)
library(sna)
library(maps)
library(ggrepel)
set.seed(1)
coauthors <- matrix(
c(0,3,1,1,3,0,1,0,1,1,0,0,1,0,0,0),
nrow = 4, ncol = 4,
dimnames = list(c('BENNETT MD', 'LEITCH IJ', 'HANSON L', 'SOMEONE ELSE'),
c('BENNETT MD', 'LEITCH IJ', 'HANSON L', 'SOMEONE ELSE')))
coords <- data_frame(
country = c('Argentina', 'Brazil', 'USA'),
coord_lon = c(-63.61667, -51.92528, -95.71289),
coord_lat = c(-38.41610, -14.23500, 37.09024))
authorcountry <- data_frame(
author = c('LEITCH IJ', 'HANSON L', 'BENNETT MD', 'SOMEONE ELSE'),
country = c('Argentina', 'USA', 'Brazil', 'Brazil'))
Now I generate the graph object using the snp function network
# Generate network
bip <- network(coauthors,
matrix.type = "adjacency",
ignore.eval = FALSE,
names.eval = "weights")
# Graph with ggnet2 for centrality
gdata <- ggnet2(bip, color = "Country", legend.position = "right",label = TRUE,
alpha = 0.9, label.size = 3, edge.size="weights",
size="degree", size.legend="Degree Centrality") + theme(legend.box = "horizontal")
From the network object we can extract the values of each edge, and from the ggnet2 object we can get degree of centrality for nodes as below:
# Combine data
authors <-
# Get author numbers
data_frame(
id = seq(1, nrow(coauthors)),
author = sapply(bip$val, function(x) x$vertex.names)) %>%
left_join(
authorcountry,
by = 'author') %>%
left_join(
coords,
by = 'country') %>%
# Jittering points to avoid overlap between two authors
mutate(
coord_lon = jitter(coord_lon, factor = 1),
coord_lat = jitter(coord_lat, factor = 1))
# Get edges from network
networkdata <- sapply(bip$mel, function(x)
c('id_inl' = x$inl, 'id_outl' = x$outl, 'weight' = x$atl$weights)) %>%
t %>% as_data_frame
dt <- networkdata %>%
left_join(authors, by = c('id_inl' = 'id')) %>%
left_join(authors, by = c('id_outl' = 'id'), suffix = c('.from', '.to')) %>%
left_join(gdata$data %>% select(label, size), by = c('author.from' = 'label')) %>%
mutate(edge_id = seq(1, nrow(.)),
from_author = author.from,
from_coord_lon = coord_lon.from,
from_coord_lat = coord_lat.from,
from_country = country.from,
from_size = size,
to_author = author.to,
to_coord_lon = coord_lon.to,
to_coord_lat = coord_lat.to,
to_country = country.to) %>%
select(edge_id, starts_with('from'), starts_with('to'), weight)
Should look like this now:
dt
# A tibble: 8 × 11
edge_id from_author from_coord_lon from_coord_lat from_country from_size to_author to_coord_lon
<int> <chr> <dbl> <dbl> <chr> <dbl> <chr> <dbl>
1 1 BENNETT MD -51.12756 -16.992729 Brazil 6 LEITCH IJ -65.02949
2 2 BENNETT MD -51.12756 -16.992729 Brazil 6 HANSON L -96.37907
3 3 BENNETT MD -51.12756 -16.992729 Brazil 6 SOMEONE ELSE -52.54160
4 4 LEITCH IJ -65.02949 -35.214117 Argentina 4 BENNETT MD -51.12756
5 5 LEITCH IJ -65.02949 -35.214117 Argentina 4 HANSON L -96.37907
6 6 HANSON L -96.37907 36.252312 USA 4 BENNETT MD -51.12756
7 7 HANSON L -96.37907 36.252312 USA 4 LEITCH IJ -65.02949
8 8 SOMEONE ELSE -52.54160 -9.551913 Brazil 2 BENNETT MD -51.12756
# ... with 3 more variables: to_coord_lat <dbl>, to_country <chr>, weight <dbl>
Now moving on to plotting this data on a map:
world_map <- map_data('world')
myMap <- ggplot() +
# Plot map
geom_map(data = world_map, map = world_map, aes(map_id = region),
color = 'gray85',
fill = 'gray93') +
xlim(c(-120, -20)) + ylim(c(-50, 50)) +
# Plot edges
geom_segment(data = dt,
alpha = 0.5,
color = "dodgerblue1",
aes(x = from_coord_lon, y = from_coord_lat,
xend = to_coord_lon, yend = to_coord_lat,
size = weight)) +
scale_size(range = c(1,3)) +
# Plot nodes
geom_point(data = dt,
aes(x = from_coord_lon,
y = from_coord_lat,
size = from_size,
colour = from_country)) +
# Plot names
geom_text_repel(data = dt %>%
select(from_author,
from_coord_lon,
from_coord_lat) %>%
unique,
colour = 'dodgerblue1',
aes(x = from_coord_lon, y = from_coord_lat, label = from_author)) +
coord_equal() +
theme_bw()
Obviously you can change the colour and design in the usual way with ggplot2 grammar. Notice that you could also use geom_curve and the arrow aesthetic to get a plot similar to the one in the uber post linked in the comments above.
As an effort to avoid the overlapping of the 2 networks, I came to this modification of the x and y coordenates of the ggplot, which by default does not overlap the networks, see figure 1 in the question.
# get centroid positions for countries
# add coordenates to authorcountry table
# download and unzip
# https://worldmap.harvard.edu/data/geonode:country_centroids_az8
setwd("~/country_centroids_az8")
library(rgdal)
cent <- readOGR('.', "country_centroids_az8", stringsAsFactors = F)
countrycentdf<-cent#data[,c("name","Longitude","Latitude")]
countrycentdf$name[which(countrycentdf$name=="United States")]<-"USA"
colnames(countrycentdf)[names(countrycentdf)=="name"]<-"country"
authorcountry$Longitude<-countrycentdf$Longitude[match(authorcountry$country,countrycentdf$country)]
authorcountry$Latitude <-countrycentdf$Latitude [match(authorcountry$country,countrycentdf$country)]
# original coordenates of plot and its transformation
ggnetbuild<-ggplot_build(gdata)
allcoord<-ggnetbuild$data[[3]][,c("x","y","label")]
allcoord$Latitude<-authorcountry$Latitude [match(allcoord$label,authorcountry$author)]
allcoord$Longitude<-authorcountry$Longitude [match(allcoord$label,authorcountry$author)]
allcoord$country<-authorcountry$country [match(allcoord$label,authorcountry$author)]
# increase with factor the distance among dots
factor<-7
allcoord$coord_lat<-allcoord$y*factor+allcoord$Latitude
allcoord$coord_lon<-allcoord$x*factor+allcoord$Longitude
allcoord$author<-allcoord$label
# plot as in answer of JanLauGe, without jitter
library(tidyverse)
library(ggrepel)
authors <-
# Get author numbers
data_frame(
id = seq(1, nrow(coauthors)),
author = sapply(bip$val, function(x) x$vertex.names)) %>%
left_join(
allcoord,
by = 'author')
# Continue as in answer of JanLauGe
networkdata <- ##
dt <- ##
world_map <- map_data('world')
myMap <- ##
myMap

Get separate graphed lines for different variables

I have a variable data with the following structure:
week: int 1 1 2 2 3 3 4 4 5 5 ...
earn: int 2 3 2 7 8 9 2 6 4 2 ...
name: chr "C", "A", "C", "A" ...
Each name (person) has a week with what they earned. So from the above we can see that C earned 2 in week 1 while A earned 3 in week 1. C earned 2 in week two while A earned 7 in week too.
I wish to plot this on a line graph. The below is what I have tried.
p <- plot.ly(data, x = data$week, name = "Week", type = "scatter", mode = "lines") %>%
add_trace(y = data$earn, name = "earn", mode = "lines+markers) %>%
add_trace(y = data$earn, name = "earn", mode = "markers")
p
However, this gives a graph with one line where the marker for week one has (2,3) as this is both the earning on this week. However I would like two lines so it can be clearly seen the difference in earnings for both names.
Defining color will give you what you want.
p <- plot_ly(data, x = ~week, y = ~ earn) %>%
add_lines(color = ~name) %>%
add_markers(color = ~name, showlegend = FALSE)
p
alternatively you can also use:
p <- plot_ly(data=data, x = ~week, y = ~ earn) %>%
add_traces(color = ~name, mode = "lines+markers")
p

Resources