I am using the plotly package in R, and my graph draws a line connecting the first and last points. I want to avoid that. The code is as follows:
graph<-plot_ly(data, x = ~date, y = ~variable, z = ~value, mode="lines")
I searched for a solution, but nothing has worked so far.
The graph looks like this.
Can anyone help?
If I understand correctly, you don't want the lines in variables 1, 2, 3... to be connected to each other, right?
If this is the case, I think what is happening is that plotly is assuming all your data belongs to the same series.
You need to tell it that the data from each variable is a different series. You can do it by mapping the variable to an attribute of the line (color, linetype, stroke, etc...).
library(tidyverse)
library(lubridate)  # ymd() and days(); attached by tidyverse only from version 2.0.0 onward
library(plotly)
# Create a data set from EuStockMarkets data for this example
# (this is just to put the data in a dataframe in the same format as your dataset. You can skip this part)
df.data <- EuStockMarkets %>% as.data.frame() %>%
  dplyr::mutate(date=time(EuStockMarkets)) %>%
  dplyr::mutate(year=as.integer(floor(date))) %>%
  dplyr::mutate(day.of.year = ceiling((date-year) * 365)) %>%
  dplyr::mutate(date=ymd(sprintf('%4d-01-01', year))+ days(day.of.year)) %>%
  dplyr::select(-year, -day.of.year) %>%
  tidyr::pivot_longer(-date, names_to = "variable") %>%
  dplyr::arrange(variable, date)
plot_ly(data=df.data, x = ~date, y = ~variable, z = ~value, type="scatter3d", mode='lines', color = ~variable)
If you don't want the lines for each variable to have different colors, you can use the split argument, which creates a separate trace (i.e., line) for each value of variable. I set the line color manually, because otherwise plotly assigns a different color to each trace automatically. I also removed the legend.
plot_ly(data=df.data, x = ~date, y = ~variable, z = ~value, type="scatter3d", mode='lines', split=~variable, color=I('black')) %>% layout(showlegend = FALSE)
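To see why the stray line disappears, it can help to count the traces plotly builds. The sketch below uses a small made-up data frame (`toy` and its values are illustrative, not taken from the question) and `plotly_build()` to compare a plot without and with `split`:

```r
library(plotly)

# Hypothetical toy data: two variables, three dates each
toy <- data.frame(
  date     = rep(1:3, times = 2),
  variable = rep(c("a", "b"), each = 3),
  value    = c(1, 2, 3, 3, 2, 1)
)

# Without a grouping attribute, all rows end up in one trace, so the last
# point of "a" is joined to the first point of "b"
single <- plot_ly(toy, x = ~date, y = ~variable, z = ~value,
                  type = "scatter3d", mode = "lines")

# With split, each value of variable gets its own trace (its own line)
separate <- plot_ly(toy, x = ~date, y = ~variable, z = ~value,
                    type = "scatter3d", mode = "lines", split = ~variable)

# Number of traces in each built plot
n_single   <- length(plotly_build(single)$x$data)
n_separate <- length(plotly_build(separate)$x$data)
```

One trace means one continuous line through every row; one trace per variable means the lines can no longer be joined to each other.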
If you add the first point of your list again, this time after the last point, a line from the last point to the first will be added. Alternatively, you can add a separate curve to the plot consisting of only the first and last points.
I want to plot multiple points connected by lines in plotly. While there should be exactly one line per id-group (as defined in the dataset), the color of those lines should depend on a different column in the data.
Consider this example data:
df <- data.frame(
x = c(1,2,3,4,5,6),
y = c(0,0,3,1,4,5),
id= c(1,1,2,2,3,3), # defines belonging to line
group = c("A","A","A","A","B","B") # defines color of the point and line
)
If I plot it this way, it kind of works:
plot_ly(df,type = "scatter", mode = "lines+markers",
x = ~x,
y = ~y,
split = ~id,
color = ~group)
Unfortunately, the legend is also split along split and color:
I only want it to display the color part of the legend. In this example case the legend would only have two entries. One for "A" and one for "B".
Is there a simple way to do this that I am not seeing here?
Addendum: An id is always linked to exactly one group. No id value is shared between different group values.
plotly's color parameter will always create traces based on the provided data (just like split, but with colors mapped).
However, we can combine the separate legend items via legendgroup and display only one of them:
library(plotly)
DF <- data.frame(
x = c(1,2,3,4,5,6),
y = c(0,0,3,1,4,5),
id= c(1,1,2,2,3,3), # defines belonging to line
group = c("A","A","A","A","B","B") # defines color of the point and line
)
fig <- plot_ly(DF, type = "scatter", mode = "lines+markers",
x = ~x,
y = ~y,
split = ~id,
color = ~group,
name = ~group,
legendgroup = ~group) %>%
layout(showlegend = TRUE) %>%
style(showlegend = FALSE, traces = 1)
fig
Please check my answer here for a more general approach.
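When there are many ids, working out by hand which traces to hide becomes tedious. One possible generalisation, offered as a sketch rather than as the approach from the linked answer, is to build the figure with `plotly_build()` and hide the legend entry of every trace whose `legendgroup` has already appeared. The names `built`, `trace_groups` and `hide` are illustrative, and the code assumes each built trace carries the `legendgroup` value set in `plot_ly()`:

```r
library(plotly)

DF <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(0, 0, 3, 1, 4, 5),
  id = c(1, 1, 2, 2, 3, 3),
  group = c("A", "A", "A", "A", "B", "B")
)

fig <- plot_ly(DF, type = "scatter", mode = "lines+markers",
               x = ~x, y = ~y, split = ~id, color = ~group,
               name = ~group, legendgroup = ~group) %>%
  layout(showlegend = TRUE)

# Resolve the figure into its final list of traces
built <- plotly_build(fig)

# Legend group of each built trace (NA if a trace has none)
trace_groups <- vapply(
  built$x$data,
  function(tr) as.character(tr$legendgroup)[1],
  character(1)
)

# Hide the legend entry of every trace whose group was already seen
hide <- which(duplicated(trace_groups) & !is.na(trace_groups))
for (i in hide) built$x$data[[i]]$showlegend <- FALSE

# Print `built` to display the figure
```

Because the traces still share a `legendgroup`, clicking the one remaining legend entry toggles all lines of that group.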
The Background
I am using the plotly API in R to create two linked plots. The first is a scatter plot and the second is a bar chart that should show the percentage of data belonging to each category, in the current selection. I can't make the percentages behave as expected.
The problem
The plots render correctly and the interactive selection works fine. When I select a set of data points in the top scatter plot, I would like to see the percentage of that selection that belongs to each category. Instead what I see is the percentage of points in that selection in that category that belong to that category, in other words always 100%. I guess this is because I set color = ~c which applies a grouping to the category.
The Example
Here is a reproducible example to follow. First create some dummy data.
library(plotly)
n = 1000
make_axis = function(n) c(rnorm(n, -1, 1), rnorm(n, 2, 0.25))
data = data.frame(
x = make_axis(n),
y = make_axis(n),
c = rep(c("A", "B"), each = n)
)
Create a sharedData object and supply it to plot_ly() for the base plot.
shared_data = data %>%
highlight_key()
baseplot = plot_ly(shared_data)
Make the individual panels.
points = baseplot %>%
add_markers(x = ~x, y = ~y, color = ~c)
bars = baseplot %>%
add_histogram(x = ~c, color = ~c, histnorm = "percent", showlegend = FALSE) %>%
layout(barmode = "group")
And put them together in a linked subplot with selection and highlighting.
subplot(points, bars) %>%
layout(dragmode = "select") %>%
highlight("plotly_selected")
Here is a screenshot of this to illustrate the problem.
An Aside
Incidentally when I set histnorm = "" in add_histogram() then I get closer to the expected behaviour but I do want percentages and not counts. When I remove color = ~c then I get closer to the expected behaviour but I do want the consistent colour scheme.
What I have tried
I have tried manually supplying the colours but then some of the linked selection breaks. I have tried creating a separate summarised data set from the sharedData object first and then plotting that but again this breaks the linkage between the plots.
If anyone has any clues as to how to solve this I would be very grateful.
To me it seems the behaviour you are looking for isn't implemented in plotly.
Please see schema(): object ► traces ► histogram ► attributes ► histnorm ► description
However, here is the closest I was able to achieve via add_bars and preprocessing the data (sorry for adding data.table; you can do the same in base R, it is just personal preference):
library(plotly)
library(data.table)
n = 1000
make_axis = function(n) c(rnorm(n, -1, 1), rnorm(n, 2, 0.25))
DT = data.table(
x = make_axis(n),
y = make_axis(n),
c = rep(c("A", "B"), each = n)
)
DT[, grp_percent := rep(100/.N, .N), by = "c"]
shared_data = DT %>%
highlight_key()
baseplot = plot_ly(shared_data)
# Make the individual panels.
points = baseplot %>%
add_markers(x = ~x, y = ~y, color = ~c)
bars = baseplot %>%
add_bars(x = ~c, y = ~grp_percent, color = ~c, showlegend = FALSE) %>%
layout(barmode = "group")
subplot(points, bars) %>%
layout(dragmode = "select") %>%
highlight("plotly_selected")
Unfortunately, the resulting hoverinfo isn't really desirable.
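The arithmetic behind `grp_percent` can be checked in isolation. In the base R sketch below (the toy data and the `selected` row indices are made up for illustration), every row carries `100 / N` of its category, so summing the column over any subset of rows gives the percentage of each category contained in that subset. Note that this is the share of the category that was selected, not the share of the selection that belongs to the category. How plotly draws the resulting bars is not covered here.

```r
# Hypothetical toy data: four rows per category
toy <- data.frame(c = rep(c("A", "B"), each = 4))

# Each row carries 100 / (number of rows in its category)
toy$grp_percent <- 100 / ave(seq_along(toy$c), toy$c, FUN = length)

# Pretend rows 1, 2 and 5 were selected in the scatter plot
selected <- c(1, 2, 5)

# Percentage of each category that falls inside the selection
selected_share <- tapply(toy$grp_percent[selected], toy$c[selected], sum)
```

Here two of the four "A" rows and one of the four "B" rows are selected, so the sums are 50 for "A" and 25 for "B".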
A feature of plotly that I really like is the ability to dive into the data by clicking on specific groupings in the legend. For example, if I map a column to color in a scatter plot, I can filter on the values of that column. However, I only know how to create this filter by assigning the column to color. Is there a way to assign a variable to the legend for filtering without changing the design of the plot? For example, is there something like a legend_filter argument in plotly that I could use:
iris2 <- iris
iris2$sample <- sample(c('A','B'), nrow(iris2), replace = T)
p <- plot_ly(data = iris2, x = ~Sepal.Length, y = ~Petal.Length, color = ~Species,
# legend_filter = ~sample
)
p
such that 'A' and 'B' show up in the side bar to interactively click on, but aren't referenced on the graph?
Thanks
This method lets you toggle all of A and B on off as a group by clicking any one of the entries.
The legend is definitely on the cluttered side, and you have to add another set of markers for each level of your grouping variable. I don't think this is really the end result you're looking for, but I figured I might as well post it anyway in case any pieces of it are useful.
plot_ly() %>%
add_markers(data = iris2[iris2$sample == "A",],
x = ~Sepal.Length,
y = ~Petal.Length,
color = ~Species,
legendgroup = "A",
name = "A") %>%
add_markers(data = iris2[iris2$sample == "B",],
x = ~Sepal.Length,
y = ~Petal.Length,
color = ~Species,
legendgroup = "B",
name = "B")
Yields
I have a bunch of 'paired' observations from a study for the same subject, and I am trying to build a spaghetti plot to visualize these observations as follows:
library(plotly)
df <- data.frame(id = rep(1:10, 2),
type = c(rep('a', 10), rep('b', 10)),
state = rep(c(0, 1), 10),
values = c(rnorm(10, 2, 0.5), rnorm(10, -2, 0.5)))
df <- df[order(df$id), ]
plot_ly(df, x = type, y = values, group = id, type = 'line') %>%
layout(showlegend = FALSE)
It produces the plot I am looking for. However, each grouped line is drawn in its own color, which is really annoying and distracting. I can't seem to find a way to get rid of the colors.
Bonus question: I actually want to use color = state and actually color the sloped lines by that variable instead.
Any approaches / thoughts?
You can set the lines to the same colour like this
plot_ly(df, x = type, y = values, group = id, type = 'scatter', mode = 'lines+markers',
line=list(color='#000000'), showlegend = FALSE)
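The calls in this thread use the pre-4.0 plotly syntax. In plotly 4.0 and later, columns are referenced with formulas (`~type`) and the `group` argument is deprecated in favour of `split` or `group_by()`. A sketch of an equivalent call under that assumption follows; `set.seed()` and the explicit marker color are additions for reproducibility and a uniform look, not part of the original answer:

```r
library(plotly)

set.seed(1)  # added so the random values are reproducible
df <- data.frame(id = rep(1:10, 2),
                 type = c(rep("a", 10), rep("b", 10)),
                 state = rep(c(0, 1), 10),
                 values = c(rnorm(10, 2, 0.5), rnorm(10, -2, 0.5)))
df <- df[order(df$id), ]

# One trace per id via split; every line and marker drawn in black
p_black <- plot_ly(df, x = ~type, y = ~values, split = ~id,
                   type = "scatter", mode = "lines+markers",
                   line = list(color = "#000000"),
                   marker = list(color = "#000000"),
                   showlegend = FALSE)

# Number of traces in the built plot (one per id)
n_traces <- length(plotly_build(p_black)$x$data)
```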
For the 'bonus' two-for-the-price-of-one question 'how to color by a different variable to the one used for grouping':
If you were only plotting markers, and no lines, this would be simple, as you can simply provide a vector of colours to marker.color. Unfortunately, however, line.color only takes a single value, not a vector, so we need to work around this limitation.
Provided the data are not too numerous (in which case this method becomes slow, and a faster method is given below), you can set colours of each line individually by adding them as separate traces one by one in a loop (looping over id)
p <- plot_ly()
for (id in unique(df$id)) {  # unique(): add each line once, not once per row
col <- c('#AA0000','#0000AA')[df[which(df$id==id),3][1]+1] # calculate color for this line based on the 3rd column of df (df$state).
p <- add_trace(data=df[which(df$id==id),], x=type, y=values, type='scatter', mode='markers+lines',
marker=list(color=col),
line=list(color=col),
showlegend = FALSE,
evaluate=T)
}
p
Although this one-trace-per-line approach is probably the simplest way conceptually, it does become very (impractically) slow if applied to hundreds or thousands of line segments. In this case there is a faster method, which is to plot only one line per colour, but to split this line up into multiple segments by inserting NA's between the separate segments and using the connectgaps=FALSE option to break the line into segments where there are missing data.
Begin by using dplyr to insert missing values between line segments (i.e. for each unique id we add a row containing NA in the columns that provide x and y coordinates).
library(dplyr)
library(magrittr)  # provides the %<>% assignment pipe used below
df %<>% distinct(id) %>%
`[<-`(,c(2,4),NA) %>%
rbind(df) %>%
arrange (id)
and plot, using connectgaps=FALSE:
plot_ly(df, x = type, y = values, group = state, type = 'scatter', mode = 'lines+markers',
showlegend = FALSE,
connectgaps=FALSE)
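In plotly 4.0 and later, the NA rows do not have to be inserted by hand: when the data is grouped with `group_by()`, plotly separates the groups within a trace itself. The sketch below is a modern-syntax counterpart of the idea above, not the original answer's code. It draws one trace per `state` and one line segment per `id`; the data setup is repeated so that the block runs on its own, and the two colors are reused from the loop example.

```r
library(plotly)
library(dplyr)

set.seed(1)  # added so the random values are reproducible
df <- data.frame(id = rep(1:10, 2),
                 type = c(rep("a", 10), rep("b", 10)),
                 state = rep(c(0, 1), 10),
                 values = c(rnorm(10, 2, 0.5), rnorm(10, -2, 0.5)))
df <- df[order(df$id), ]

# group_by(id) keeps the ids as separate segments inside each color trace
p_state <- df %>%
  group_by(id) %>%
  plot_ly(x = ~type, y = ~values, color = ~factor(state),
          colors = c("#AA0000", "#0000AA"),
          type = "scatter", mode = "lines+markers",
          showlegend = FALSE)

# Only two traces are built, one per state
n_traces <- length(plotly_build(p_state)$x$data)
```

Keeping the trace count at the number of colors rather than the number of lines is what makes this variant scale to many segments.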
How can I create a subplot grid with Plotly in R?
The official site has this nice Python example:
The Python code has the options rows=2 and cols=2, but in R the subplot function has only the parameter nrows, without ncols.
I tried this example in R, but nrows does not seem to work as expected:
# Basic subplot
library(plotly)
p <- plot_ly(economics, x = date, y = uempmed)
subplot(p,p,p,p,
margin = 0.05,
nrows=2
) %>% layout(showlegend = FALSE)
They are in a line instead of in a grid. See the result:
Here is the R subplots page for reference. Unfortunately, using ggplotly, as in this example, is not an option for me.
UPDATE
It was a bug. Plotly team is very fast, and it was fixed in just 3 days (check here)! Github version is already updated. Great job!
This seems to be a genuine bug in the way subplot() generates the y-axis domains for the two plots. Indeed, they overlap which can easily be seen if you execute
p <- plot_ly(economics, x = date, y = uempmed)
q <- plot_ly(economics, x = date, y = unemploy)
subplot(p,q, nrows = 2)
This will produce the following plot:
If you take a close look at the y-axes, you can see that they overlap. That hints at a problem in the way subplot() defines the domains of the y-axes of the subplots.
If we correct the domain specification of the y-axes manually (following the plotly documentation), we can solve the problem:
subplot(p,q, nrows = 2) %>% layout(yaxis = list(domain = c(0, 0.48)),
yaxis2 = list(domain = c(0.52, 1)))
This produces:
Now, if you want to reproduce a 2x2 subplot grid like the one in the Python example (rows=2, cols=2), you probably need to adjust the x-axis domains manually in a similar way.
Since this is a bug and my solution is only a workaround, I suggest, however, that you file an issue with plotly on GitHub.
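For readers on a plotly version that includes the fix mentioned in the update, the original example should no longer need manual domains. A sketch in the current formula syntax, assuming plotly 4.0 or later and the `economics` data set from ggplot2:

```r
library(plotly)

# economics ships with ggplot2; columns are referenced with formulas
p <- plot_ly(ggplot2::economics, x = ~date, y = ~uempmed,
             type = "scatter", mode = "lines")

# Four plots with nrows = 2 are arranged in two rows of two
grid <- subplot(p, p, p, p, nrows = 2, margin = 0.05) %>%
  layout(showlegend = FALSE)

# The combined figure holds one trace per panel
n_panels <- length(grid$x$data)
```

The number of columns is not set directly; it follows from the number of plots divided by `nrows`.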
Based on this:
p <- economics %>%
  tidyr::gather(variable, value, -date) %>%
  transform(id = as.integer(factor(variable))) %>%
  plot_ly(x = ~date, y = ~value, color = ~variable, colors = "Dark2",
          yaxis = ~paste0("y", id)) %>%
  add_lines() %>%
  subplot(nrows = 5, shareX = TRUE)