This may look similar to this question: How to format two Axes in Plotly using R? but it is not. Plus this one doesn't have any responses.
I am trying to stack three subplots (all time series) one under the other using plot_ly in R. While the plots are correct, I want their x-axes, i.e. the date-time ranges, to be the same. At the moment the third plot's time axis starts at 8 AM, whereas the others start at 00:00. I would like the third plot to start at 00:00 as well, even if there are no values there. This would make visual comparison much easier. Here is my code snippet:
pWrist <- plot_ly(combinedCounts, x = ~DATE_TIME, y= ~Vector.Magnitude_ANKLE, name = "ANKLE_COUNTS", legendgroup = "ANKLE", type = "bar")
pAnkle <- plot_ly(combinedCounts, x = ~DATE_TIME, y= ~Vector.Magnitude_WRIST, name = "WRIST_COUNTS", legendgroup = "WRIST", type = "bar")
pResponses <- plot_ly(uEMAResponsesOnly, x=~PROMPT_END_TIME, y=~RESPONSE_NUMERIC, name = "uEMA_RESPONSES", legendgroup = "uEMA", type = "bar")
subplot( style(pWrist, showlegend = TRUE), style(pAnkle, showlegend = TRUE), style(pResponses, showlegend = TRUE), nrows = 3, margin = 0.01)
The subplot function puts them all one under the other. Any help or information is appreciated.
**EDIT:** Here is what it looks like right now. I just want the third axis to start at the same time as the first two. As you can see, the third one starts at 8 AM. (Image: current plot)
This is solved. Setting an explicit range on the x-axis of the third plot makes it start at the same time as the other two:
pResponses <- plot_ly(uEMAResponsesOnly, x=~PROMPT_END_TIME, y=~RESPONSE_NUMERIC, name = "uEMA_RESPONSES", legendgroup = "uEMA", type = "bar") %>% layout(xaxis = list(range = as.POSIXct(c('2018-01-13 00:00:00', '2018-01-14 23:00:00'))))
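If the dates change between runs, the range can probably be taken from the data instead of being typed in. The following is only a sketch: the data frames `counts` and `responses` are made-up stand-ins for `combinedCounts` and `uEMAResponsesOnly`, and the column `value` is hypothetical.

```r
library(plotly)

# Hypothetical stand-in data: hourly counts over two days, and responses
# that only start at 8 AM on the first day.
set.seed(1)
counts <- data.frame(
  DATE_TIME = seq(as.POSIXct("2018-01-13 00:00:00", tz = "UTC"),
                  by = "hour", length.out = 48),
  value = rpois(48, 20)
)
responses <- data.frame(
  PROMPT_END_TIME = seq(as.POSIXct("2018-01-13 08:00:00", tz = "UTC"),
                        by = "2 hours", length.out = 6),
  RESPONSE_NUMERIC = sample(1:5, 6, replace = TRUE)
)

# Take the x-axis range from the series that covers the whole period.
x_range <- range(counts$DATE_TIME)

p_counts <- plot_ly(counts, x = ~DATE_TIME, y = ~value,
                    name = "COUNTS", type = "bar")
p_resp <- plot_ly(responses, x = ~PROMPT_END_TIME, y = ~RESPONSE_NUMERIC,
                  name = "uEMA_RESPONSES", type = "bar") %>%
  layout(xaxis = list(range = x_range))

fig <- subplot(p_counts, p_resp, nrows = 2, margin = 0.01)
fig
```

Another option that may be worth trying is `subplot(..., shareX = TRUE)`, which lets the stacked plots share a single x-axis.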
I am using Plotly to plot actual versus predicted values from a dataset. Everything runs fine, except I would like to color actual value points (SalePrice) in blue and predicted value points (pred.price) in red. I'm relatively new to Plotly and have been playing around with some examples given online but it's not quite what I'm looking for. I have the following syntax below that plots all points in a singular color but I am not sure how to proceed. Thank you
avp.plot <- plot_ly(data = avp.df, x = ~SalePrice, y = ~pred.price, type = 'scatter')
avp.plot
The head of the df looks like this:
SalePrice pred.price
1 208500 206900.5
2 181500 175133.0
3 223500 216843.6
4 140000 216189.2
5 250000 281824.2
6 143000 135957.3
In my understanding you still need the x-axis defined. I did this with the column id.
Here is the provided data which I took from your example:
avp.df <- data.frame(id = c(1,2,3,4,5,6),
SalePrice = c(208500, 181500, 223500, 140000, 250000, 143000),
pred.price = c(206900.5, 175133.0, 216843.6, 216189.2, 281824.2, 135957.3))
There are different ways to create a plotly scatter plot. In this example you need to define your x and y values: id is used for x, and SalePrice and pred.price are the y-axis values.
fig <- plot_ly(data = avp.df, x= ~id, y = ~SalePrice, name = 'Sale Price', type = 'scatter', mode = 'markers', marker=list(color='rgb(0,0,255)')) # actual values in blue
fig <- fig %>% add_trace(y = ~pred.price, name = 'Predicted Price', mode = 'markers', marker=list(color='rgb(255,0,0)')) # predicted values in red
fig
This gives a scatter plot with the two series drawn in different colors.
For additional adjustments I recommend the official plotly homepage.
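As an alternative to adding one trace per column, the data can be reshaped to long format and the color mapped to a series column. This is only a sketch: the `long` data frame and its `series` and `price` columns are names I made up for illustration.

```r
library(plotly)

avp.df <- data.frame(id = c(1, 2, 3, 4, 5, 6),
                     SalePrice = c(208500, 181500, 223500, 140000, 250000, 143000),
                     pred.price = c(206900.5, 175133.0, 216843.6, 216189.2, 281824.2, 135957.3))

# Stack both price columns into one column, with a label per row.
# The factor level order decides which color each series gets.
long <- data.frame(
  id = rep(avp.df$id, 2),
  series = factor(rep(c("Sale Price", "Predicted Price"), each = nrow(avp.df)),
                  levels = c("Sale Price", "Predicted Price")),
  price = c(avp.df$SalePrice, avp.df$pred.price)
)

# Colors are assigned in level order: Sale Price blue, Predicted Price red.
fig <- plot_ly(long, x = ~id, y = ~price, color = ~series,
               colors = c("blue", "red"),
               type = "scatter", mode = "markers")
fig
```

This scales to more than two series without adding further `add_trace()` calls.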
I am having trouble understanding the default axis range for bars and lines in plotly for R. They seem to be different. To be precise, the default y-axis range for bars is not based on the extrema of the input data while it is based as such for lines. A little bit of background follows.
So I am making a plot of different economic time series. As is often the case with visualization of economic data, I often need two y-axes to show different variables which might be related. Currently I am making a line chart on the primary y axis and a bar chart on the secondary axis. The problem is that the secondary axis bar chart does not adequately represent the data because it selects a very wide range for the axis on a default basis. E.g. the specific variable on the sec. axis ranges from 3500 to 4000 but the range is shown from 0 to 4000. For line charts there is no such problem.
I can of course change these ranges manually using the "range" attribute in the "layout" function, but I want to be able to get my desired plot without much manual input. It would also help if plotly figured out the extrema by itself, because the input data changes quite frequently. Here is my current code:
plot_ly(data = filter(dlx_df3, month_date >= "2013-01-01", month_date <= "2014-01-01")) %>%
  add_lines(x = ~month_date, y = ~walr, name = "walr") %>%
  add_bars(x = ~month_date, y = ~advances, yaxis = "y2", name = "adv") %>%
  layout(
    xaxis = list(ticks = "outside"),
    yaxis2 = list(
      side = "right",
      autotick = TRUE,
      ticks = "outside",
      rangemode = "normal"
    ),
    yaxis = list(
      overlaying = "y2",
      autotick = TRUE,
      ticks = "outside"
    ),
    legend = list(x = 1.08, y = 0.7)
  )
You can see that the bars do not show much "variation". But this changes if I change add_bars to add_lines.
How do I change this default axis range for bars?
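One possible workaround, offered as a sketch rather than a definitive answer: as far as I know, the automatic range for bar traces always includes the zero baseline the bars are drawn from, so the range has to be set explicitly. It can still be computed from the data, so it follows the data when it changes. The data frame `dlx` below is a made-up stand-in for `dlx_df3`, and the 5% padding is an arbitrary choice.

```r
library(plotly)

# Hypothetical stand-in for dlx_df3: one row per month.
dlx <- data.frame(
  month_date = seq(as.Date("2013-01-01"), by = "month", length.out = 13),
  walr = seq(12.5, 11.9, length.out = 13),
  advances = seq(3500, 4000, length.out = 13)
)

# Range of the bar variable, padded by 5% of its spread on each side.
pad <- 0.05 * diff(range(dlx$advances))
y2_range <- range(dlx$advances) + c(-pad, pad)

fig <- plot_ly(data = dlx) %>%
  add_lines(x = ~month_date, y = ~walr, name = "walr") %>%
  add_bars(x = ~month_date, y = ~advances, yaxis = "y2", name = "adv") %>%
  layout(
    yaxis2 = list(side = "right", range = y2_range),
    yaxis = list(overlaying = "y2")
  )
fig
```

Keep in mind that bars on a truncated axis no longer have lengths proportional to their values, which some readers may find misleading.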
I really like the parallel coordinates plot available in Plotly, but I just ran into an issue I could use help with.
Is it possible to have a log10-based axis for some of the coordinates?
As you can see in the example below, performing a log10 transform makes it easier to distinguish the smaller values. However, by transforming the data we lose the ability to interpret the values. I would prefer to log-scale the axis instead of the data, but couldn't find a way to do this.
I did find something related to "axis styling" in the GitHub issue https://github.com/plotly/plotly.js/issues/1071#issuecomment-264860379 but not a solution to this problem.
I would appreciate any ideas/pointers.
library(plotly)
# Setting up some data that span a wide range.
df <- read.csv("https://raw.githubusercontent.com/bcdunbar/datasets/master/iris.csv")
df$sepal_width[1] = 50
df$sepal_width_log10 = log10(df$sepal_width)
p <- df %>%
plot_ly(type = 'parcoords',
line = list(color = ~species_id,
colorscale = list(c(0,'red'),c(0.5,'green'),c(1,'blue'))),
dimensions = list(
list(range = c(~min(sepal_width),~max(sepal_width)),
label = 'Sepal Width', values = ~sepal_width),
list(range = c(~min(sepal_width_log10),~max(sepal_width_log10)),
tickformat='.2f',
label = 'log10(Sepal Width)', values = ~sepal_width_log10),
list(range = c(4,8),
constraintrange = c(5,6),
label = 'Sepal Length', values = ~sepal_length))
)
p
More Parallel Coordinate Examples
Plotly Parallel Coordinates Doc
Since the log projection is not supported (yet), creating the tick labels manually seems to be a valid solution.
# Let's create the axis text manually and map the log10 transform
# back to the original scale.
my_tickvals = seq(min(df$sepal_width_log10), max(df$sepal_width_log10), length.out=8)
my_ticktext = signif(10 ^ my_tickvals, digits = 2)
library(plotly)
# Setting up some data that span a wide range.
df <- read.csv("https://raw.githubusercontent.com/bcdunbar/datasets/master/iris.csv")
df$sepal_width[1] = 50
df$sepal_width_log10 = log10(df$sepal_width)
# Let's create the axis text manually and map the log10 transform back to the original scale.
my_tickvals = seq(min(df$sepal_width_log10), max(df$sepal_width_log10), length.out=8)
my_ticktext = signif(10 ^ my_tickvals, digits = 2)
p <- df %>%
plot_ly(type = 'parcoords',
line = list(color = ~species_id,
colorscale = list(c(0,'red'),c(0.5,'green'),c(1,'blue'))),
dimensions = list(
list(range = c(~min(sepal_width),~max(sepal_width)),
label = 'Sepal Width', values = ~sepal_width),
list(range = c(~min(sepal_width_log10),~max(sepal_width_log10)),
tickformat='.2f',
label = 'log10(Sepal Width)', values = ~sepal_width_log10),
list(range = c(~min(sepal_width_log10),~max(sepal_width_log10)),
tickvals = my_tickvals,
ticktext = my_ticktext,
label = 'Sepal Width (log10 axis)', values = ~sepal_width_log10),
list(range = c(4,8),
constraintrange = c(5,6),
label = 'Sepal Length', values = ~sepal_length))
)
p
The underlying plotly.js parcoords doesn't support log projection (scales, axes) at the moment, though, as you mention, it comes up from time to time and we plan to add this functionality. In the meantime, an option is to take the logarithm of the data ahead of time, with the big drawback that the axis ticks will show log values, which needs explanation and adds to the cognitive burden.
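Building on the manual tick label idea, the ticks can also be placed at round values of the original scale rather than at evenly spaced log values. This is a sketch under assumptions: it uses R's built-in `iris` data instead of the CSV above, and `candidates` is an arbitrary list of "nice" values I picked for illustration.

```r
library(plotly)

# Built-in iris data with one artificially large value, as in the question.
d <- iris
d$Sepal.Width[1] <- 50
d$width_log10 <- log10(d$Sepal.Width)

# Keep only the round values that fall inside the data range, then
# position them on the log10 scale while labeling them in original units.
candidates <- c(1, 2, 5, 10, 20, 50, 100)
keep <- candidates >= min(d$Sepal.Width) & candidates <= max(d$Sepal.Width)
tick_text <- candidates[keep]
tick_vals <- log10(tick_text)

fig <- plot_ly(d, type = "parcoords",
               line = list(color = ~as.integer(Species),
                           colorscale = list(c(0, "red"), c(0.5, "green"), c(1, "blue"))),
               dimensions = list(
                 list(range = range(d$width_log10),
                      tickvals = tick_vals,
                      ticktext = as.character(tick_text),
                      label = "Sepal Width (log10 axis)", values = ~width_log10),
                 list(range = c(4, 8),
                      label = "Sepal Length", values = ~Sepal.Length)))
fig
```

The axis is still linear in the log10 values; only the labels are mapped back, so brushing a range on that axis selects in log space.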
I am plotting jitter boxplots through plotly in R. Plotly boxplots allow analyzing interactively the quartiles and the values of outliers. (Examples here: https://plot.ly/r/box-plots/)
I would like to see the names of the observations that are outliers, so I can analyze them later.
However, it seems that boxplots have no option for showing which observation a point belongs to, in contrast to scatter plots, where one can show it through the 'text' option.
Before implementing other approaches, however, I would like to confirm that there is no way to have this information plotted.
I didn't find this option either.
I tried to plot it directly but didn't succeed, so I located the outliers with the function boxplot.stats and wrote an annotation over them.
Look at this example:
library(plotly)
set.seed(1234)
a<-rnorm(50)
a2 <- rnorm(50, 1)
plot_ly(y = a, type = 'box') %>%
add_trace(y = a2) %>%
layout(title = 'Box Plot',xaxis = list(title = "cond", showgrid = F), yaxis = list(title = "rating"),
annotations = list(
x = -0.01,
y = boxplot.stats(a)$out,
text = "Outlier",
showarrow = FALSE,
xanchor = "right"
))
If you still want the outliers labeled by tooltips, you can also identify them separately and pass the outlier dataset to add_markers(), drawing over the boxplot's own outlier points. Try something like this:
#Set seed
set.seed(9)
#Generate random dataset
x <- data.frame(values = rnorm(100,sd=2),labels = paste("point",as.character(1:100)))
#Get boxplot statistics (whisker ends are in vals$stats)
vals <- boxplot(x[,"values"],plot = FALSE)
#Make outliers dataset
y <- x[x[,"values"] > vals$stats[5,1] | x[,"values"] < vals$stats[1,1],]
#Make plot
plot_ly(x,y = ~values,x = 1,type = "box") %>%
add_markers(data = y, text = y[,'labels'])
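A variation on the same idea, as a minimal sketch: `boxplot.stats()` returns the outlier values directly in `$out`, so the outlier rows (with their labels) can be pulled out for later analysis and shown with hover text. The name `outliers` is my own. Note that R's boxplot statistics and plotly's quartile computation may differ slightly, so the two sets of outliers are not guaranteed to match exactly.

```r
library(plotly)

set.seed(9)
x <- data.frame(values = rnorm(100, sd = 2),
                labels = paste("point", 1:100))

# Outlier values according to R's default 1.5 * IQR rule.
out_vals <- boxplot.stats(x$values)$out

# Rows of the original data that are outliers, labels included.
outliers <- x[x$values %in% out_vals, ]

# Box plot with the labeled outliers drawn on top; hovering shows the label.
fig <- plot_ly(x, y = ~values, x = 1, type = "box", name = "values") %>%
  add_markers(data = outliers, x = 1, y = ~values, text = ~labels,
              hoverinfo = "text+y", name = "outliers")
fig
```

The `outliers` data frame can then be inspected or exported independently of the plot.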
I know that this comes horrendously late.
Check this link. Plotly allows a few options for showing outliers.
Curiously, it does not allow any option to NOT plot outliers (that is what I was looking for).
I have a bunch of 'paired' observations from a study for the same subject, and I am trying to build a spaghetti plot to visualize these observations as follows:
library(plotly)
df <- data.frame(id = rep(1:10, 2),
type = c(rep('a', 10), rep('b', 10)),
state = rep(c(0, 1), 10),
values = c(rnorm(10, 2, 0.5), rnorm(10, -2, 0.5)))
df <- df[order(df$id), ]
plot_ly(df, x = type, y = values, group = id, type = 'line') %>%
layout(showlegend = FALSE)
It produces the plot I am after. But each grouped line is drawn in its own color, which is really annoying and distracting. I can't seem to find a way to get rid of the colors.
Bonus question: I actually want to use color = state and actually color the sloped lines by that variable instead.
Any approaches / thoughts?
You can set all the lines to the same colour like this:
plot_ly(df, x = type, y = values, group = id, type = 'scatter', mode = 'lines+markers',
line=list(color='#000000'), showlegend = FALSE)
For the 'bonus' two-for-the-price-of-one question 'how to color by a different variable to the one used for grouping':
If you were only plotting markers, and no lines, this would be simple, as you can simply provide a vector of colours to marker.color. Unfortunately, however, line.color only takes a single value, not a vector, so we need to work around this limitation.
Provided the data are not too numerous (in which case this method becomes slow, and a faster method is given below), you can set colours of each line individually by adding them as separate traces one by one in a loop (looping over id)
p <- plot_ly()
for (id in unique(df$id)) { # loop over each id once; df$id contains every id twice
  rows <- df[which(df$id==id),]
  col <- c('#AA0000','#0000AA')[rows[,3][1]+1] # calculate color for this line based on the 3rd column of df (df$state).
  p <- add_trace(p, data=rows, x=type, y=values, type='scatter', mode='markers+lines',
                 marker=list(color=col),
                 line=list(color=col),
                 showlegend = FALSE,
                 evaluate=T)
}
p
Although this one-trace-per-line approach is probably the simplest way conceptually, it does become very (impractically) slow if applied to hundreds or thousands of line segments. In this case there is a faster method, which is to plot only one line per colour, but to split this line up into multiple segments by inserting NA's between the separate segments and using the connectgaps=FALSE option to break the line into segments where there are missing data.
Begin by using dplyr to insert missing values between line segments (i.e. for each unique id we add a row containing NA in the columns that provide x and y coordinates).
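library(dplyr)
library(magrittr) # provides the %<>% assignment pipe
df %<>% distinct(id, .keep_all = TRUE) %>% # keep all columns (needed in current dplyr versions)
`[<-`(,c(2,4),NA) %>%
rbind(df) %>%
arrange (id)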
and plot, using connectgaps=FALSE:
plot_ly(df, x = type, y = values, group = state, type = 'scatter', mode = 'lines+markers',
showlegend = FALSE,
connectgaps=FALSE)
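The snippets above use the older plotly syntax (bare column names, `group =`, `evaluate =`). With plotly 4 and later, columns are given as formulas, and as far as I understand it a `group_by()` on the data makes plotly break the line between groups, so the NA rows do not need to be inserted by hand. A rough sketch of that approach, not tested against every plotly version:

```r
library(plotly)
library(dplyr)

set.seed(42)
df <- data.frame(id = rep(1:10, 2),
                 type = c(rep("a", 10), rep("b", 10)),
                 state = rep(c(0, 1), 10),
                 values = c(rnorm(10, 2, 0.5), rnorm(10, -2, 0.5)))
df <- df[order(df$id), ]

# One line per id (via group_by), coloured by state: two traces in total,
# each made up of several disconnected segments.
fig <- df %>%
  group_by(id) %>%
  plot_ly(x = ~type, y = ~values, color = ~factor(state),
          colors = c("#AA0000", "#0000AA")) %>%
  add_lines() %>%
  layout(showlegend = FALSE)
fig
```

Dropping the `color` and `colors` arguments and passing `line = list(color = '#000000')` to `add_lines()` should give the single-colour version.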