How do I add multiple regression lines to the same plot in plotly?
I want to graph the scatter plot, as well as a regression line for each CATEGORY
The scatter plot renders fine, but the regression lines are not drawn correctly (compared with the Excel output; see below).
df <- as.data.frame(1:19)
df$CATEGORY <- c("C","C","A","A","A","B","B","A","B","B","A","C","B","B","A","B","C","B","B")
df$x <- c(126,40,12,42,17,150,54,35,21,71,52,115,52,40,22,73,98,35,196)
df$y <- c(92,62,4,23,60,60,49,41,50,76,52,24,9,78,71,25,21,22,25)
df[,1] <- NULL
fv <- df %>%
filter(!is.na(x)) %>%
lm(x ~ y + y*CATEGORY,.) %>%
fitted.values()
p <- plot_ly(data = df,
x = ~x,
y = ~y,
color = ~CATEGORY,
type = "scatter",
mode = "markers"
) %>%
add_trace(x = ~y, y = ~fv, mode = "lines")
p
Apologies for not adding in all the information beforehand, and thanks for adding the suggestion of "y*CATEGORY" to fix the parallel line issue.
Excel Output
https://i.imgur.com/2QMacSC.png
R Output
https://i.imgur.com/LNypvDn.png
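Try this. The model needs to regress y on x (not x on y), using the x*CATEGORY interaction so that each category gets its own intercept and slope, and the fitted values need to be stored in df and plotted against x: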
library(plotly)
df <- as.data.frame(1:19)
df$CATEGORY <- c("C","C","A","A","A","B","B","A","B","B","A","C","B","B","A","B","C","B","B")
df$x <- c(126,40,12,42,17,150,54,35,21,71,52,115,52,40,22,73,98,35,196)
df$y <- c(92,62,4,23,60,60,49,41,50,76,52,24,9,78,71,25,21,22,25)
df[,1] <- NULL
df$fv <- df %>%
filter(!is.na(x)) %>%
lm(y ~ x*CATEGORY,.) %>%
fitted.values()
p <- plot_ly(data = df,
x = ~x,
y = ~y,
color = ~CATEGORY,
type = "scatter",
mode = "markers"
) %>%
add_trace(x = ~x, y = ~fv, mode = "lines")
p
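As a sanity check on the approach above, the sketch below (base R only, no plotly) compares the fitted values from the single interaction model with those from a separate simple regression inside each category. The names fit_all, fv_interaction and fv_by_group are introduced here for illustration and are not part of the original answer.

```r
# Same data as in the question
df <- data.frame(
  CATEGORY = c("C","C","A","A","A","B","B","A","B","B","A","C","B","B","A","B","C","B","B"),
  x = c(126,40,12,42,17,150,54,35,21,71,52,115,52,40,22,73,98,35,196),
  y = c(92,62,4,23,60,60,49,41,50,76,52,24,9,78,71,25,21,22,25),
  stringsAsFactors = FALSE
)

# One model with an interaction term: a separate intercept and slope per CATEGORY
fit_all <- lm(y ~ x * CATEGORY, data = df)
fv_interaction <- unname(fitted(fit_all))

# Alternative: fit an independent simple regression within each category
fv_by_group <- numeric(nrow(df))
for (grp in unique(df$CATEGORY)) {
  idx <- df$CATEGORY == grp
  fv_by_group[idx] <- fitted(lm(y ~ x, data = df[idx, ]))
}

# Compare the two sets of fitted values (up to floating-point tolerance)
same_fit <- isTRUE(all.equal(fv_interaction, fv_by_group))
print(same_fit)
```

If the two agree, each category's line in the plot is the ordinary least-squares line for that category alone, which is what a per-series trend line in a spreadsheet would normally show.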
Related
I want to show multiple lines being added to a plotly plot (as an animation) using R. For example, I have the following plotly line graphs (p, p2, p3):
library(plotly)
set.seed(3)
x = 1:10
y = 1:10
y2 = y^2
y3 = y^3
p = plot_ly(data = data.frame(x = x, y = y), x = ~ x, y = ~y, type = "scatter", mode = "lines")
p2 = plot_ly(data = data.frame(x = x, y = y2), x = ~ x, y = ~y2, type = "scatter", mode = "lines")
p3 = plot_ly(data = data.frame(x = x, y = y3), x = ~ x, y = ~y3, type = "scatter", mode = "lines")
Here p, p2, p3 are different plots but they all have the same x axis and different y axis. I want to be able to make an animation where the lines y, y2, y3 will successively appear in the plotly graph.
P.S: It does not strictly have to be done using plotly, but strongly preferred.
An idea might be to create a 'dataset' for each frame.
The first frame contains all values for y and all values for y2 and y3 are located outside the y-axis limits. For the second frame all values from y and y2 are shown and just the values from y3 are beyond the limit. In frame 3 all values are included.
library(tidyverse)
library(plotly)
# transform dataframe into a long format
df <- data.frame(x = 1:10,
y = 1:10) %>%
mutate(y2 = y^2,
y3 = y^3) %>%
pivot_longer(cols = -x,
names_to = "line",
values_to = "value")
# set the values for each frame and line
# (lines that should not be visible yet must be moved outside the plot limits; NA won't work)
df_plot <- map_df(1:3, ~ mutate(df, frame = .)) %>%
mutate(value = case_when(frame == 1 & line %in% c("y2", "y3") ~ -10,
frame == 2 & line %in% c("y3") ~ -10,
TRUE ~ value))
# create plot
plot_ly(data = df_plot,
x = ~x,
y = ~value,
color = ~line,
type = "scatter",
mode = "lines",
frame = ~frame) %>%
layout(yaxis=list(range = c(0, 1000))) %>%
animation_opts(easing = "bounce")
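The per-frame data set described above can also be sketched in base R, which makes its shape easier to inspect. This is only an illustration under the same assumptions as the answer (three lines, three frames, hidden values parked at -10 below the axis range); the names long, frames, first_frame and hidden are made up for this sketch.

```r
x <- 1:10

# Long format: one row per (x, line) pair
long <- data.frame(
  x = rep(x, times = 3),
  line = rep(c("y", "y2", "y3"), each = length(x)),
  value = c(x, x^2, x^3),
  stringsAsFactors = FALSE
)

# Replicate the long data once per animation frame
frames <- do.call(rbind, lapply(1:3, function(f) transform(long, frame = f)))

# Each line becomes visible from a given frame onwards
first_frame <- c(y = 1, y2 = 2, y3 = 3)
hidden <- frames$frame < first_frame[as.character(frames$line)]

# Move hidden points below the y-axis range used in the plot (0 to 1000)
frames$value[hidden] <- -10

# Count of hidden vs. visible rows in each frame
print(table(frame = frames$frame, hidden = hidden))
```

The resulting frames data frame has the same columns as df_plot above (x, line, value, frame), so it could be passed to the same plot_ly() call.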
I am trying to "functionize" my plot statements. If I want to add an additional trace from another dataframe, I get an error saying that the number of values on the y axis does not equal the number of values in the first dataframe. I am not certain why this is relevant.
library(tidyverse)
library(plotly)
library(lubridate)
Date <- seq(as.Date("2016-10-1"), as.Date("2018-09-01"), by="month")
Values <- c(2,3,4,3,4,5,6,4,5,6,7,8,9,10,8,9,10,11,12,13,11,12,13,14)
Date2 <- seq(as.Date("2018-07-1"), as.Date("2018-09-01"), by="month")
Values2 <- c(16,17,18)
df <- tibble::tibble(Date, Values)
df2 <- tibble::tibble(Date2, Values2)
testfunction <- function(x, y, y2){
p <- plot_ly(df,x = ~x, y = ~y, colors = "Blues", type = 'scatter', mode = 'lines') %>%
add_trace(data = df2, y = ~y2, line = list(color = 'rgb(255, 36,1)', width = 2.25)) %>%
layout(xaxis = list(tickformat = "%b %e"))
p
}
testfunction(Date, Values, Values2)
#Error: Column `y` must be length 1 or 24, not 3
Notice that Date, Values, and Values2 are objects that exist in your global environment. So, testfunction is actually using those objects in the call to plot_ly. To demonstrate this, try removing df in the plot_ly call -- you should still be able to get a plot (i.e. plot_ly isn't actually using the values in the dataframe). However, I suspect what you're trying to do is to specify variable names in your dataframe in the arguments to your function. In which case, try
testfunction <- function(x, y, x2, y2) {
x <- enquo(x)
y <- enquo(y)
x2 <- enquo(x2)
y2 <- enquo(y2)
plot_ly(df, x = x, y = y, type = "scatter", mode = "lines") %>%
add_trace(x = x2, y = y2, data = df2)
}
testfunction(Date, Values, Date2, Values2)
with a hat tip to this question and answer: Pass variables as parameters to plot_ly function
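An alternative to enquo(), sketched here as an assumption rather than as the method from the linked answer, is to pass column names as strings and build one-sided formulas from them. The helper col_formula is hypothetical and introduced only for this example; the block uses base R so that it runs without plotly.

```r
# Hypothetical helper: turn a column name (a string) into a one-sided formula
col_formula <- function(name) stats::as.formula(paste0("~", name))

df <- data.frame(Values = c(2, 3, 4))
f <- col_formula("Values")
print(f)

# The right-hand side of the formula can be evaluated against a data frame,
# which is roughly how a formula such as ~Values is resolved to a column
vals <- eval(f[[2]], df)
print(vals)
```

Formulas built this way should be usable wherever a literal formula is accepted, for example plot_ly(df, x = col_formula("Date"), y = col_formula("Values")), with the data argument of each trace deciding which dataframe the name is looked up in.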
I'm trying to learn how to draw surfaces in a 3D scatter plot in Plotly using R.
I tried to extend the example given in this question: Add Regression Plane to 3d Scatter Plot in Plotly
Except I changed the example from using the standard Iris data set to using two random clusters separated by a 2D plane with the formula: Z = -X -Y
I get the error:
Error in traces[[i]][[obj]] :
attempt to select less than one element in get1index
So I set up my data to be divided by the plane
rm(list=ls())
library(plotly)
library(reshape2)
x <- rnorm(100,1,1)
y <- rnorm(100,1,1)
z <- rnorm(100,1,1)
col <- rep("red",100)
df.1 <- data.frame(x,y,z,col)
x <- rnorm(100,-1,1)
y <- rnorm(100,-1,1)
z <- rnorm(100,-1,1)
col <- rep("blue",100)
df.2 <- data.frame(x,y,z,col)
df<- rbind(df.1,df.2)
Next, I want to calculate a surface for a plane whose formula is x + y + z = 0
graph_reso <- 0.1
#Setup Axis
axis_x <- seq(min(df$x), max(df$x), by = graph_reso)
axis_y <- seq(min(df$y), max(df$y), by = graph_reso)
surface <- expand.grid(x = axis_x,y = axis_y,KEEP.OUT.ATTRS = F)
Here I compute the surface - in the cited example they use linear regression
surface$z <- 0 - surface$x - surface$y
surface2 <- acast(surface, y ~ x, value.var = "z") #y ~ x
Next, I use plot_ly to create the 3D scatter plot -- which works fine
p <- plot_ly(df, x = ~x, y = ~y, z = ~z, color = ~col, colors = c('#BF382A', '#0C4B8E')) %>%
add_markers() %>%
layout(scene = list(xaxis = list(title = 'X'),
yaxis = list(title = 'Y'),
zaxis = list(title = 'Z')))
So this is the step where I get stuck -- I guess I'm not creating my surface correctly. Tried google and troubleshooting -- it seems I'm stuck.
add_trace(p,z=surface2,x=axis_x,y=axis_y,type="surface")
The error I get is:
Error in traces[[i]][[obj]] :
attempt to select less than one element in get1index
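Add inherit=FALSE inside add_trace, so that the surface trace does not inherit the attributes passed to plot_ly() (x, y, z, color, colors), which were meant for the scatter points: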
p <- plot_ly(df, x = ~x, y = ~y, z = ~z, color = ~col, colors=c('#BF382A', '#0C4B8E')) %>%
add_markers() %>%
add_trace(z=surface2, x=axis_x, y=axis_y, type="surface", inherit=FALSE) %>%
layout(scene = list(xaxis = list(title = 'X'), yaxis = list(title = 'Y'),
zaxis = list(title = 'Z'), aspectmode='cube'))
print(p)
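Because the plane here is given by a formula rather than a fitted model, the z matrix can also be built directly with outer() instead of expand.grid() plus acast(). This is a sketch under the assumption that the surface trace expects rows of z to correspond to y and columns to x; the small axis ranges are made up to keep the output readable.

```r
# Illustrative axis grids (much coarser than in the question)
axis_x <- seq(-2, 2, by = 0.5)
axis_y <- seq(-1, 1, by = 0.5)

# Rows index y, columns index x: z[i, j] = -axis_x[j] - axis_y[i]
z <- outer(axis_y, axis_x, function(y, x) 0 - x - y)

# The matrix has one row per y value and one column per x value
print(dim(z))
```

The matrix z could then be passed as z = z in place of surface2 in the add_trace() call above, together with the same axis_x and axis_y vectors.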
In creating a trend line for a scatter plot, I am using add_trace to add a linear trend line.
When the data only has one "series" of data, i.e. there is only one group of coordinates, the code below works fine. However, when I introduce a number of series, the "trend line" looks like this:
Here is the relevant part of the code:
p <- plot_ly(filteredFull(), x=Relative.Time.Progress, y=cumul.ans.keystroke,
mode='markers', color=KeystrokeRate, size=KeystrokeRate,
marker=list(sizeref=100), type='scatter',
hoverinfo='text', text=paste("token: ",Token, "Keystrokes: ",
KeystrokeCount)) %>%
layout(
xaxis=list(range=c(0,1)),
yaxis=list(range=c(0,max(filteredFull()$cumul.ans.keystroke)))
)
lm.all <- lm(cumul.ans.keystroke ~ Relative.Time.Progress,
data=df)
observe(print(summary(lm.all)))
p <- add_trace(p, y=fitted(lm.all), x=Relative.Time.Progress,
mode='lines') %>%
layout(
xaxis= list(range = c(0,1))
)
p
I can add more code, or try to make a minimal working example, if necessary. However, I'm hoping that this is a familiar problem that is obvious from the code.
I think you'll need to specify the data = ... argument in add_trace(p, y=fitted(lm.all), x=Relative.Time.Progress, mode='lines').
The first trace seems to be a subset but the second trace uses the regression fitted values which are obtained by fitting a regression model to the entire dataset.
There might be a mismatch between Relative.Time.Progress in filteredFull() vs df.
Here's an example; hopefully it helps.
library(plotly)
df <- diamonds[sample(1:nrow(diamonds), size = 500),]
fit <- lm(price ~ carat, data = df)
df1 <- df %>% filter(cut == "Ideal")
plot_ly(df1, x = carat, y = price, mode = "markers") %>%
add_trace(x = carat, y = fitted(fit), mode = "lines")
plot_ly(df1, x = carat, y = price, mode = "markers") %>%
add_trace(data = df, x = carat, y = fitted(fit), mode = "lines")
The plotly syntax has changed a bit since then (columns are now referenced with formulas such as ~carat); the following code should work fine:
df <- diamonds[sample(1:nrow(diamonds), size = 500),]
fit <- lm(price ~ carat, data = df)
df1 <- df %>% filter(cut == "Ideal")
plot_ly() %>%
add_trace(data = df1, x = ~carat, y = ~price, mode = "markers") %>%
add_trace(data = df, x = ~carat, y = fitted(fit), mode = "lines")
You need to start with an empty plot_ly() and then add the traces.
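The mismatch described above (fitted values computed on the full data set, markers drawn from a filtered subset) can be illustrated without plotly. The sketch below uses the built-in mtcars data instead of diamonds, purely as an assumption to keep it dependency-free, and shows predict() with newdata as one way to get values aligned with the subset.

```r
df <- mtcars
fit <- lm(mpg ~ wt, data = df)

# A filtered subset, analogous to filteredFull() or df1 above
df1 <- df[df$cyl == 4, ]

# fitted() returns one value per row used to fit the model, not per row of the subset
n_fitted <- length(fitted(fit))
n_subset <- nrow(df1)
print(c(fitted = n_fitted, subset = n_subset))

# predict() with newdata gives values on the same regression line, aligned with df1
df1$fv <- predict(fit, newdata = df1)
print(head(df1[, c("wt", "mpg", "fv")]))
```

With the fitted values stored as a column of the subset, a line trace can use that same dataframe for both x and y, so the lengths always agree.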
The two separate charts below, created from a data.frame, work correctly with the R plotly package. However, I am not sure how to combine them into one (presumably with the add_trace function).
df <- data.frame(season=c("2000","2000","2001","2001"), game=c(1,2,1,2),value=c(1:4))
plot_ly(df, x = game, y = value, mode = "markers", color = season)
plot_ly(subset(df,season=="2001"), x = game, y = value, mode = "line")
Thanks in advance
The answer given by @LukeSingham does not work anymore with plotly 4.5.2.
You have to start with an "empty" plot_ly() and then to add the traces:
df1 <- data.frame(season=c("2000","2000","2001","2001"), game=c(1,2,1,2), value=c(1:4))
df2 <- subset(df1, season=="2001")
plot_ly() %>%
add_trace(data=df1, x = ~game, y = ~value, type="scatter", mode="markers") %>%
add_trace(data=df2, x = ~game, y = ~value, type="scatter", mode = "lines")
Here is a way to do what you want, but with ggplot2 :-) You can change the background, line, and point colors as you want.
library(ggplot2)
library(plotly)
df_s <- df[c(3:4), ]
p <- ggplot(data=df, aes(x = game, y = value, color = season)) +
geom_point(size = 4) +
geom_line(data=df_s, aes(x = game, y = value, color = season))
(gg <- ggplotly(p))
There are two main ways you can do this with plotly: make a ggplot and convert it to a plotly object, as @MLavoie suggests, or, as you suspected, use add_trace on an existing plotly object (see below).
library(plotly)
#data
df <- data.frame(season=c("2000","2000","2001","2001"), game=c(1,2,1,2),value=c(1:4))
#Initial scatter plot
p <- plot_ly(df, x = game, y = value, mode = "markers", color = season)
#subset of data
df1 <- subset(df,season=="2001")
#add line
p %>% add_trace(x = df1$game, y = df1$value, mode = "line")