scatter3d - colour dots based on 4th variable - r

I'd like to make a 3D plot with scatter3d but don't know how to colour the dots based on the values of a fourth variable (here VAR4). Could someone kindly help me? Ideally, adding these colours would still keep the fading effect of the default version (points further back in the 3D plot appear in a lighter colour). Thank you!
df <- data.frame(VAR4=c(10,52,78,34,13,54),
A=c(12, 8, 10, 7, 13, 15),
B=c(4,3,2,1,7,5),
C=c(1,3,2,1,3,1))
library(rgl)
library(car)
scatter3d(A ~B + C,color=VAR4, data=df, surface=F)

You can try this:
library(rgl)
library(car)
# pass VAR4 as the grouping variable, converted to a factor
scatter3d(A ~B + C, groups = as.factor(df$VAR4), data=df, surface=F)
That said, if I may offer some advice, the plotly package gives a nicer-looking result:
library(plotly)
# here VAR4 is treated as continuous; use color = ~as.factor(VAR4) to treat it as a factor instead
plot_ly(df, x = ~A, y = ~B, z = ~C, color = ~VAR4) %>%
  add_markers() %>%
  layout(scene = list(xaxis = list(title = 'A'),
                      yaxis = list(title = 'B'),
                      zaxis = list(title = 'C')))
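If VAR4 should stay continuous in scatter3d itself rather than being turned into groups, one option is to compute a colour for each point yourself. The following is only a sketch: the palette endpoints (blue to red), `n_bins` and `point_cols` are illustrative choices, and it assumes that scatter3d hands a vector given to `point.col` on to the underlying rgl drawing call, which is worth checking against your installed version of car.

```r
df <- data.frame(VAR4 = c(10, 52, 78, 34, 13, 54),
                 A = c(12, 8, 10, 7, 13, 15),
                 B = c(4, 3, 2, 1, 7, 5),
                 C = c(1, 3, 2, 1, 3, 1))

# Build a blue-to-red ramp and pick one colour per point according to VAR4
n_bins <- 100
pal <- colorRampPalette(c("blue", "red"))(n_bins)
bin <- cut(df$VAR4, breaks = n_bins, labels = FALSE)
point_cols <- pal[bin]

# Assumption: point.col accepts one colour per point when no groups are given
if (interactive() &&
    requireNamespace("car", quietly = TRUE) &&
    requireNamespace("rgl", quietly = TRUE)) {
  car::scatter3d(A ~ B + C, data = df, point.col = point_cols, surface = FALSE)
}
```

The smallest VAR4 value ends up with the first palette colour and the largest with the last, so the ramp spans the full range of the data.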


R Plotly linked subplot with percentage histogram and categories coloured

The Background
I am using the plotly API in R to create two linked plots. The first is a scatter plot and the second is a bar chart that should show the percentage of data belonging to each category, in the current selection. I can't make the percentages behave as expected.
The problem
The plots render correctly and the interactive selection works fine. When I select a set of data points in the top scatter plot, I would like to see the percentage of that selection that belongs to each category. Instead what I see is the percentage of points in that selection in that category that belong to that category, in other words always 100%. I guess this is because I set color = ~c which applies a grouping to the category.
The Example
Here is a reproducible example to follow. First create some dummy data.
library(plotly)
n = 1000
make_axis = function(n) c(rnorm(n, -1, 1), rnorm(n, 2, 0.25))
data = data.frame(
x = make_axis(n),
y = make_axis(n),
c = rep(c("A", "B"), each = n)
)
Create a sharedData object and supply it to plot_ly() for the base plot.
shared_data = data %>%
highlight_key()
baseplot = plot_ly(shared_data)
Make the individual panels.
points = baseplot %>%
add_markers(x = ~x, y = ~y, color = ~c)
bars = baseplot %>%
add_histogram(x = ~c, color = ~c, histnorm = "percent", showlegend = FALSE) %>%
layout(barmode = "group")
And put them together in a linked subplot with selection and highlighting.
subplot(points, bars) %>%
layout(dragmode = "select") %>%
highlight("plotly_selected")
Here is a screenshot of this to illustrate the problem.
An Aside
Incidentally when I set histnorm = "" in add_histogram() then I get closer to the expected behaviour but I do want percentages and not counts. When I remove color = ~c then I get closer to the expected behaviour but I do want the consistent colour scheme.
What I have tried
I have tried manually supplying the colours but then some of the linked selection breaks. I have tried creating a separate summarised data set from the sharedData object first and then plotting that but again this breaks the linkage between the plots.
If anyone has any clues as to how to solve this I would be very grateful.
To me it seems the behaviour you are looking for isn't implemented in plotly.
Please see schema(): object ► traces ► histogram ► attributes ► histnorm ► description
However, here is the closest I was able to achieve, using add_bars and preprocessing the data (sorry for bringing in data.table; it is just personal preference, and the same can be done in base R):
library(plotly)
library(data.table)
n = 1000
make_axis = function(n) c(rnorm(n, -1, 1), rnorm(n, 2, 0.25))
DT = data.table(
x = make_axis(n),
y = make_axis(n),
c = rep(c("A", "B"), each = n)
)
DT[, grp_percent := rep(100/.N, .N), by = "c"]
shared_data = DT %>%
highlight_key()
baseplot = plot_ly(shared_data)
# Make the individual panels.
points = baseplot %>%
add_markers(x = ~x, y = ~y, color = ~c)
bars = baseplot %>%
add_bars(x = ~c, y = ~grp_percent, color = ~c, showlegend = FALSE) %>%
layout(barmode = "group")
subplot(points, bars) %>%
layout(dragmode = "select") %>%
highlight("plotly_selected")
Unfortunately, the resulting hoverinfo isn't really desirable.
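For readers who prefer to avoid data.table, the grp_percent column can be built in base R as well. This is a sketch under the assumption that each row should carry an equal share of 100 within its category, so that the bars sum to 100 per category; `data` mirrors the dummy data from the question. Note that the bar height for a selection then shows the share of each category that is selected, which is close to, but not the same as, the share of the selection that falls in each category. The last lines compute the latter for a hypothetical selection `x > 0`.

```r
set.seed(1)
n <- 1000
make_axis <- function(n) c(rnorm(n, -1, 1), rnorm(n, 2, 0.25))
data <- data.frame(x = make_axis(n), y = make_axis(n),
                   c = rep(c("A", "B"), each = n))

# Each row gets 100 / (number of rows in its category)
data$grp_percent <- 100 / ave(rep(1, nrow(data)), data$c, FUN = length)

# Sanity check: the shares add up to 100 within each category
group_totals <- tapply(data$grp_percent, data$c, sum)
group_totals

# Share of a hypothetical selection that falls in each category
selected <- data$x > 0
pct_of_selection <- 100 * prop.table(table(data$c[selected]))
pct_of_selection
```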

How can I explicitly assign unique colors to every point in an R Plotly scatterplot?

I have some data like this:
data <- data.frame(x=runif(500), y=runif(500), z=runif(500))
I want a scatterplot with points colored independently/discretely in each dimension (X, Y, and Z) using RGB values.
This is what I have tried:
Code:
library(dplyr)
library(plotly)
xyz_colors <- rgb(data$x, data$y, data$z)
plot_ly(data = data,
x = ~x, y = ~y, z = ~z,
color= xyz_colors,
type = 'scatter3d',
mode='markers') %>%
layout(scene = list(xaxis = list(title = 'X'),
yaxis = list(title = 'Y'),
zaxis = list(title = 'Z')))
RColorBrewer thinks I'm trying to create a continuous scale from 500 intermediate colors:
Warning messages:
1: In RColorBrewer::brewer.pal(N, "Set2") :
n too large, allowed maximum for palette Set2 is 8
Returning the palette you asked for with that many colors
2: In RColorBrewer::brewer.pal(N, "Set2") :
n too large, allowed maximum for palette Set2 is 8
Returning the palette you asked for with that many colors
What are some correct ways to color the points like this in R with Plotly?
Also, how can one generally assign colors to data points in R with Plotly, individually?
To clarify, I am trying to give each point a color of the format "#XXYYZZ", where 'XX' is a value between 00 and FF linearly mapped from the value of data$x between 0 and 1, and likewise 'YY' for data$y and 'ZZ' for data$z. That is, the X dimension determines the amount of red, the Y dimension determines the amount of green, and the Z dimension determines the amount of blue. At 0,0,0 the point should be black and at 1,1,1 the point should be white. The reason for this is to make the 3D position of the points as easy to visualize as possible.
Updated answer after comments:
So, is there no way to color every point separately?
Yes, there is, through the power and flexibility of add_trace(). And it's a lot less cumbersome than I first thought.
Just set up an empty plotly figure with some required 3D features:
p <- plot_ly(data = data, type = 'scatter3d', mode='markers')
And apply add_trace() in a loop over each defined color:
for (i in seq_along(xyz_colors)){
p <- p %>% add_trace(x=data$x[i], y=data$y[i], z=data$z[i],
marker = list(color = xyz_colors[i], opacity=0.6, size = 5),
name = xyz_colors[i])
}
And you can easily define single points with a color of your choice like this:
p <- p %>% add_trace(x=0, y=0, z=0,
marker = list(color = rgb(0, 0, 0), opacity=0.8, size = 20),
name = rgb(0, 0, 0))
Complete code:
library(dplyr)
library(plotly)
# data and colors
data <- data.frame(x=runif(500), y=runif(500), z=runif(500))
xyz_colors <- rgb(data$x, data$y, data$z)
# empty 3D plot
p <-plot_ly(data = data, type = 'scatter3d', mode='markers') %>%
layout(scene = list(xaxis = list(title = 'X'),
yaxis = list(title = 'Y'),
zaxis = list(title = 'Z')))
# one trace per color
for (i in seq_along(xyz_colors)){
p <- p %>% add_trace(x=data$x[i], y=data$y[i], z=data$z[i],
marker = list(color = xyz_colors[i], opacity=0.6, size = 5),
name = xyz_colors[i])
}
# Your favorite data point with your favorite color
p <- p %>% add_trace(x=0, y=0, z=0,
marker = list(color = rgb(0, 0, 0), opacity=0.8, size = 20),
name = rgb(0, 0, 0))
p
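As an alternative to one trace per point, a single trace can usually carry per-point colours if the colour vector is passed through the marker list instead of the color argument, which sidesteps the palette mapping that triggers the RColorBrewer warnings. This is a sketch rather than a definitive recipe; it assumes that the marker colour accepts one hex string per point, and the plot is only drawn when plotly is installed and the session is interactive.

```r
set.seed(1)
data <- data.frame(x = runif(500), y = runif(500), z = runif(500))

# One "#RRGGBB" string per point: x -> red, y -> green, z -> blue
xyz_colors <- rgb(data$x, data$y, data$z)

# The corners of the unit cube map to black and white
black <- rgb(0, 0, 0)  # "#000000"
white <- rgb(1, 1, 1)  # "#FFFFFF"

if (interactive() && requireNamespace("plotly", quietly = TRUE)) {
  library(plotly)
  p <- plot_ly(data, x = ~x, y = ~y, z = ~z,
               type = "scatter3d", mode = "markers",
               marker = list(color = xyz_colors, size = 5, opacity = 0.6))
  print(p)
}
```

With a single trace there is one legend entry instead of 500, and hovering still reports the coordinates of each point.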
Original answer:
In 3D plots you can use the same color for all of the points, distinguish clusters or categories from each other using different colors, or use individual colors for each point to illustrate a fourth value (or fourth dimension if you like, as described here) in your dataset. All these approaches are, as you put it, examples of '[...] correct ways to color the points [...]'. Have a look below and see if this suits your needs. I've included fourthVal <- data$x+data$y+data$z as an example of an extra dimension. What you end up using will depend entirely on your dataset and what you'd like to illustrate.
Code:
library(dplyr)
library(plotly)
data <- data.frame(x=runif(500), y=runif(500), z=runif(500))
xyz_colors <- rgb(data$x, data$y, data$z)
fourthVal <- data$x+data$y+data$z
plot_ly(data = data,
x = ~x, y = ~y, z = ~z,
color= fourthVal,
type = 'scatter3d',
mode='markers') %>%
layout(scene = list(xaxis = list(title = 'X'),
yaxis = list(title = 'Y'),
zaxis = list(title = 'Z')))

Bar-plot does not show bar for only one x value

I am having an issue with a plotly bar plot when I define the date range for the x-axis.
When all data points share a single x-value, the bars do not show in the plot. If there are at least two different x-values, or if I do not set an x-axis range, the bars show as they should.
Below follows an example (I am currently using lubridate to deal with dates).
library(lubridate)
library(plotly)
# Same x-value: bar does not show
plot_ly(x = c(ymd("2019-08-25"), ymd("2019-08-25")), y = c(1, 2), type = "bar") %>%
layout(xaxis = list(range = ymd(c("2019-08-20", "2019-08-30"))))
# Different x-values: bars are shown
plot_ly(x = c(ymd("2019-08-25"), ymd("2019-08-26")), y = c(1, 2), type = "bar") %>%
layout(xaxis = list(range = ymd(c("2019-08-20", "2019-08-30"))))
# No x-axis range defined, same x-values: the bar is shown
plot_ly(x = c(ymd("2019-08-25"), ymd("2019-08-25")), y = c(1, 2), type = "bar")
Any solution?
Edit: For comparison, ggplot2 does not have the same issue:
# ggplot works as expected
library(lubridate)
library(ggplot2)
ggplot(NULL, aes(x = ymd(c("2019-08-25", "2019-08-25")), y = c(1, 2))) +
geom_col() +
xlim(ymd(c("2019-08-20", "2019-08-30")))
Your first version is actually being interpreted correctly, but you need to set the width of the bars for them to show up.
I'm not sure what the units are (maybe milliseconds?), so you may need to experiment or do some research to find a good width for your actual scenario.
plot_ly() %>%
add_bars(x = c(ymd("2019-08-25"), ymd("2019-08-25")), y = c(1, 2), type = "bar",width=100000000)%>%
layout(xaxis = list(range = ymd(c("2019-08-20", "2019-08-30"))))
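On the units: plotly date axes are generally expressed in milliseconds, so a bar width can be derived from a time span instead of guessed. The sketch below assumes that this also holds for the width argument used here; `day_ms` and `bar_width` are illustrative names, and 80% of a day is an arbitrary choice.

```r
# Milliseconds in one day
day_ms <- 24 * 60 * 60 * 1000

# Make each bar 80% of a day wide so that bars on neighbouring days do not touch
bar_width <- 0.8 * day_ms

if (interactive() && requireNamespace("plotly", quietly = TRUE)) {
  library(plotly)
  p <- plot_ly() %>%
    add_bars(x = as.Date(c("2019-08-25", "2019-08-25")), y = c(1, 2),
             width = bar_width) %>%
    layout(xaxis = list(range = as.Date(c("2019-08-20", "2019-08-30"))))
  print(p)
}
```

By this reckoning the value 100000000 used above is a little over one day.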

Adding color and bubble size legend in R plotly

Probably an easy one.
I have an xy dataset I'd like to plot using R's plotly. Here are the data:
set.seed(1)
df <- data.frame(x=1:10,y=runif(10,1,10),group=c(rep("A",9),"B"),group.size=as.integer(runif(10,1,10)))
I'd like to color the data by df$group and have the size of the points follow df$group.size (i.e., a bubble plot). In addition, I'd like to have both legends added.
This is my naive attempt:
require(plotly)
require(dplyr)
main.plot <-
plot_ly(type='scatter',mode="markers",color=~df$group,x=~df$x,y=~df$y,size=~df$group.size,marker=list(sizeref=0.1,sizemode="area",opacity=0.5),data=df,showlegend=T) %>%
layout(title="Title",xaxis=list(title="X",zeroline=F),yaxis=list(title="Y",zeroline=F))
which comes out as:
and unfortunately messes up the legend, at least how I want it to be: a point for each group having the same size but different colors.
Then to add a legend for the group.size I followed this, also helped by aocall's answer:
legend.plot <- plot_ly() %>% add_markers(x = 1, y = unique(df$group.size),
size = unique(df$group.size),
showlegend = T,
marker = list(sizeref=0.1,sizemode="area")) %>%
layout(title="TITLE",xaxis = list(zeroline=F,showline=F,showticklabels=F,showgrid=F),
yaxis=list(showgrid=F))
which comes out as:
Here my problem is that the legend is including values that do not exist in my data.
Then I combine them using subplot:
subplot(legend.plot, main.plot, widths = c(0.1, 0.9))
I get this:
where the legend title is eliminated.
So I'd be grateful for some help.
Based on the updated request:
Note the changes in legend.plot (mapping values to a sequence of integers, then manually changing the axis tick text), and the use of annotations to get a legend title. As explained in this answer, only one title may be used, regardless of how many subplots are used.
The circle on the plot legend seems to correspond to the minimum point size of each trace. Thus, I've added a point at (12, 12), and restricted the range of the axes to ensure it isn't shown.
titleX and titleY control the display of axis labels, as explained here.
set.seed(1)
df <- data.frame(x=1:10,y=runif(10,1,10),group=c(rep("A",9),"B"),group.size=as.integer(runif(10,1,10)))
require(plotly)
require(dplyr)
## Take unique values before adding dummy value
unique_vals <- unique(df$group.size)
## Add the dummy point as a data frame row so the column types are preserved
df <- rbind(data.frame(x = 12, y = 12, group = "B", group.size = 1), df)
main.plot <-
plot_ly(type='scatter',
mode="markers",
color=~df$group,
x=~df$x,
y=~df$y,
size=~df$group.size,
marker=list(
sizeref=0.1,
sizemode="area",
opacity=0.5),
data=df,
showlegend=T) %>%
layout(title="Title",
xaxis=list(title="X",zeroline=F, range=c(0, 11)),
yaxis=list(title="Y",zeroline=F, range=c(0, 11)))
legend.plot <- plot_ly() %>%
add_markers(x = 1,
y = seq_len(length(unique_vals)),
size = sort(unique_vals),
showlegend = F,
marker = list(sizeref=0.1,sizemode="area")) %>%
layout(
annotations = list(
list(x = 0.2,
y = 1,
text = "LEGEND TITLE",
showarrow = F,
xref='paper',
yref='paper')),
xaxis = list(
zeroline=F,
showline=F,
showticklabels=F,
showgrid=F),
yaxis=list(
showgrid=F,
tickmode = "array",
tickvals = seq_len(length(unique_vals)),
ticktext = sort(unique_vals)))
subplot(legend.plot, main.plot, widths = c(0.1, 0.9),
titleX=TRUE, titleY=TRUE)
Firstly, you are only passing the unique values in to the legend. If you pass in all possible values (i.e., seq(min(x), max(x), by=1), or in this case seq_len(max(x))), the legend will show the full range.
Secondly, sizeref and sizemode in the marker argument alter the way that point size is calculated. The following example should produce a more consistent plot:
set.seed(1)
df <- data.frame(x=1:10,y=runif(10,1,10),group=c(rep("A",9),"B"),group.size=as.integer(runif(10,1,10)))
require(plotly)
require(dplyr)
a <- plot_ly(type='scatter',mode="markers",
color=~df$group,
x=~df$x,
y=~df$y,
size=df$group.size,
marker = list(sizeref=0.1, sizemode="area"),
data=df,
showlegend=F) %>%
layout(title="Title",
xaxis=list(title="X",zeroline=F),
yaxis=list(title="Y",zeroline=F))
b <- plot_ly() %>% add_markers(x = 1, y = seq_len(max(df$group.size)),
size = seq_len(max(df$group.size)),
showlegend = F,
marker = list(sizeref=0.1, sizemode="area")) %>%
layout(
xaxis = list(zeroline=F,showline=F,showticklabels=F,showgrid=F),
yaxis=list(showgrid=F))
subplot(b, a, widths = c(0.1, 0.9))
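The difference between the two ways of building the size legend can be seen without plotting. This sketch uses a made-up vector `group_size` (not the question's random data) to show which values each approach puts on the legend axis.

```r
# Hypothetical bubble sizes with gaps (no 1, 3, 6 or 8 present)
group_size <- c(2, 2, 7, 4, 7, 5, 7, 9, 4, 7)

# Legend built only from the values that occur in the data
legend_observed <- sort(unique(group_size))

# Legend built from every integer up to the maximum
legend_full <- seq_len(max(group_size))

legend_observed  # 2 4 5 7 9
legend_full      # 1 2 3 4 5 6 7 8 9

# Values the full legend shows that never occur in the data
legend_extra <- setdiff(legend_full, legend_observed)
legend_extra     # 1 3 6 8
```

The full-range version gives an evenly spaced scale at the cost of listing sizes that are absent from the data; the observed-values version lists only real sizes but needs the tick labels remapped, as in the first answer.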

Adding a Vertical / Horizontal Reference Line using Plotly

I'm working with a proportional bar chart and I'd like to draw a vertical line at a particular X value. I'd prefer to accomplish this using the plotly package, but it doesn't seem to be easily done.
The solution found at Horizontal/Vertical Line in plotly doesn't seem to get the job done.
I've provided some sample code below that could be used to draw the vertical line at X = 3.
library(plotly)
library(ggplot2)
plot_ly(diamonds[1:1000, ], x = ~x, y = ~cut, color = ~color) %>% add_bars()
I'd appreciate any help in this matter.
I found some information about lines in plotly from Zappos Engineering here. The y range runs from -0.5 to 4.5 because the data provided has five categories, each centered on a whole number (0 through 4). The y range gives the line its length, while the constant x value (3) keeps it vertical.
p <- plot_ly(diamonds[1:1000, ], x = ~x, y = ~cut, color = ~color) %>% add_bars()
p <- layout(p, shapes = list(type = "line", fillcolor = "red",
line = list(color = "red"),
opacity = 1,
x0 = 3, x1 = 3, xref = 'x',
y0 = -0.5, y1 = 4.5, yref = 'y'))
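If the number of categories is not known in advance, the hard-coded -0.5 to 4.5 range can be avoided by giving the line's vertical extent in paper coordinates, where 0 and 1 are the bottom and top of the plotting area. The helper `vline()` below is a hypothetical convenience function, not part of plotly; it only builds the shape list.

```r
# Hypothetical helper: a vertical line shape spanning the full plot height
vline <- function(x, color = "red") {
  list(type = "line",
       x0 = x, x1 = x, xref = "x",
       y0 = 0, y1 = 1, yref = "paper",
       line = list(color = color))
}

shape <- vline(3)
str(shape)

if (interactive() &&
    requireNamespace("plotly", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE)) {
  library(plotly)
  p <- plot_ly(ggplot2::diamonds[1:1000, ], x = ~x, y = ~cut, color = ~color) %>%
    add_bars() %>%
    layout(shapes = list(vline(3)))
  print(p)
}
```

A horizontal line can be sketched the same way by swapping the roles of x and y (xref = "paper", a constant y value).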
