I have a time series I'm plotting in R, using ggplotly to auto-convert my ggplot2 graph to plotly. My time series goes back 20 years, but when the plot first loads I only want it to display the most recent 4 years of data. I've used
layout(ggplotly_object, xaxis=list(range=c(min_date,max_date)))
This does not appear to limit the date range at all. I'm setting the range with lubridate by subtracting 4 years from the maximum date.
I have not found any documentation on changing the default zoom of a plotly plot to a limited range of data while still allowing the user to zoom out and pan to past data. Any tips would be appreciated.
The date axis is measured in milliseconds, so you need to convert to this unit first. Here's an example:
library(plotly)
library(lubridate)
set.seed(42)
# Dummy data
t1 <- ymd_hms("2006-03-14 12:00:00")
t2 <- ymd_hms("2016-03-14 12:00:00")
df <- data.frame(t = seq(t1, t2, by = 'week'),
                 y = rexp(522, rate = 0.25))
# Full plot
p <- plot_ly(df, x = ~t, y = ~y, type = 'scatter', mode = 'markers')
p
# Now zoom. Needs to be the number of milliseconds since 01/01/1970.
# I'm deliberately using lubridate functions.
min_Date <- ymd_hms("2010-03-14 12:00:00")
min_Date_ms <- interval("1970-01-01 00:00:00", min_Date) / dmilliseconds(1)
max_Date <- ymd_hms("2012-03-14 12:00:00")
max_Date_ms <- interval("1970-01-01 00:00:00", max_Date) / dmilliseconds(1)
p %>% layout(xaxis = list(range = c(min_Date_ms, max_Date_ms)))
There's probably a more elegant way of doing this, but it should work.
So for range, you should set it to a vector of length 2, i.e. c(min_value, max_value).
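To tie this back to the original question (a default zoom of the most recent 4 years, computed from the data, while the older points stay available for panning and zooming out), here's a sketch building on the example above. The to_ms() helper is something I'm defining here, not a plotly or lubridate function:
# Default zoom: the last 4 years, derived from the most recent date in the data
to_ms <- function(d) interval(ymd_hms("1970-01-01 00:00:00"), d) / dmilliseconds(1)
max_Date <- max(df$t)
min_Date <- max_Date - years(4)
p %>% layout(xaxis = list(range = c(to_ms(min_Date), to_ms(max_Date))))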
I've developed some R code to find and draw trend lines on stock market data. However, the approach I'm using involves brute-force use of processing power and can take a long time, especially if I want to draw trend lines over more than a year's worth of price data. So, I'd love it if someone could help me find a more efficient way to do this.
Basically, my current method involves generating all possible pairs of two daily lows in the data set, generating all possible trend lines that pass through a pair of points, and then testing each line to see if any daily low in the data set falls below the line. We keep all lines for which this is FALSE.
The processing time grows rapidly as you increase the time frame over which you're trying to generate trend lines, since the number of candidate pairs grows quadratically with the number of lows and each candidate line is then checked against every low. To cut down on processing time, I've been filtering the data set for lows that are below the simple moving average. This gets rid of about half the data and generally preserves the most relevant data points. However, it doesn't fully solve the problem. When analyzing long time frames over more than one ticker symbol, running this code can still take a long time.
Here's what I've got:
# Load libraries for tidy stock data analysis
library(tidyverse)
library(tidyquant)
# Retrieve 1 year's worth of Apple stock price data from Yahoo! Finance
ticker <- "AAPL"
start <- Sys.Date() %m-% years(1)
prices <- tq_get(ticker, from = start) %>%
  mutate(open = round(open, digits = 2),
         high = round(high, digits = 2),
         low = round(low, digits = 2),
         close = round(close, digits = 2)) %>%
  select(symbol, date, open, high, low, close)
# Filter prices data for lows that are below the simple moving average
lows <- prices %>%
  filter(low < SMA(close), date < max(date))
# Find all unique possible combinations of two lows
# (and all unique possible combinations of their associated dates)
all_lowcombos <- bind_cols(as.data.frame(t(combn(lows$date, m = 2, simplify = TRUE))),
                           as.data.frame(t(combn(lows$low, m = 2, simplify = TRUE))))
colnames(all_lowcombos) <- c("X1", "X2", "Y1", "Y2")
# Generate a trendline for every combination of points
n <- seq_len(nrow(all_lowcombos))
low_trendfinder <- function(n, all_lowcombos){
  model <- lm(c(all_lowcombos$Y1[n], all_lowcombos$Y2[n]) ~ c(all_lowcombos$X1[n], all_lowcombos$X2[n]))
  data.frame(intercept = model$coefficients[1], slope = model$coefficients[2])
}
low_trendlines <- map_dfr(n,low_trendfinder,all_lowcombos = all_lowcombos)
# For each low_trendline, check if any low in the prices dataframe falls below the line
# Keep only trendlines for which this is FALSE
# Also make sure the trendline wouldn't be less than half the current price for today's date; I only want lines that might be tradeable in the next week
low_trendline_test <- function(x, y, prices){
  !any(x * as.numeric(prices$date) + y > prices$low + 0.01) &
    !(x * as.numeric(Sys.Date()) + y < 0.5 * prices$close[nrow(prices)])
}
none_below <- map2(.x = low_trendlines$slope,.y = low_trendlines$intercept,.f = low_trendline_test,prices = prices)
none_below <- unlist(none_below)
low_trendlines <- low_trendlines[none_below,]
# Chart support trendlines on a candlestick chart
prices %>%
  ggplot(aes(x = date, y = close)) +
  geom_candlestick(aes(open = open, high = high, low = low, close = close)) +
  geom_abline(intercept = low_trendlines$intercept, slope = low_trendlines$slope) +
  labs(title = paste(ticker, "Trendline Chart"),
       y = "Price",
       x = "Date",
       caption = paste("Price data courtesy of Yahoo! Finance. Accessed ",
                       Sys.Date(),
                       ".",
                       sep = "")) +
  theme_tq()
Here's an implementation of the solution from the comments.
# Load libraries for tidy stock data analysis
library(tidyverse)
library(tidyquant)
# Retrieve 2 years' worth of Apple stock price data from Yahoo! Finance
ticker <- "AAPL"
start <- Sys.Date() %m-% years(2)
prices <- tq_get(ticker, from = start) %>%
  mutate(open = round(open, digits = 2),
         high = round(high, digits = 2),
         low = round(low, digits = 2),
         close = round(close, digits = 2)) %>%
  select(symbol, date, open, high, low, close)
# Filter prices data for lows that fall on the convex hull
lows <- prices[chull(prices[c("date", "low")]), ] %>%
  filter(date < max(date))
# Find all unique possible combinations of two lows
# (and all unique possible combinations of their associated dates)
all_lowcombos <- bind_cols(as.data.frame(t(combn(lows$date, m = 2, simplify = TRUE))),
                           as.data.frame(t(combn(lows$low, m = 2, simplify = TRUE))))
colnames(all_lowcombos) <- c("X1", "X2", "Y1", "Y2")
# Generate a trend line for every combination of points
n <- seq_len(nrow(all_lowcombos))
low_trendfinder <- function(n, all_lowcombos){
  model <- lm(c(all_lowcombos$Y1[n], all_lowcombos$Y2[n]) ~ c(all_lowcombos$X1[n], all_lowcombos$X2[n]))
  data.frame(intercept = model$coefficients[1], slope = model$coefficients[2])
}
low_trendlines <- map_dfr(n,low_trendfinder,all_lowcombos = all_lowcombos)
# For each low_trendline, check if any low in the prices dataframe falls below the line
# Keep only trendlines for which this is FALSE
# Also make sure the trendline wouldn't be less than half the current price for today's date
low_trendline_test <- function(x, y, prices){
  !any(x * as.numeric(prices$date) + y > prices$low + 0.01) &
    !(x * as.numeric(Sys.Date()) + y < 0.5 * prices$close[nrow(prices)])
}
none_below <- map2(.x = low_trendlines$slope,.y = low_trendlines$intercept,.f = low_trendline_test,prices = prices)
none_below <- unlist(none_below)
low_trendlines <- low_trendlines[none_below,]
# Chart support trendlines on a candlestick chart
prices %>%
  ggplot(aes(x = date, y = close)) +
  geom_candlestick(aes(open = open, high = high, low = low, close = close)) +
  geom_abline(intercept = low_trendlines$intercept, slope = low_trendlines$slope) +
  labs(title = paste(ticker, "Trendline Chart"), y = "Price", x = "Date",
       caption = paste("Price data courtesy of Yahoo! Finance. Accessed ", Sys.Date(), ".", sep = "")) +
  theme_tq()
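For reference, chull() (base R, in grDevices) is what makes this version fast: it returns the indices of the points lying on the convex hull, and any two lows supporting a line with no low beneath it must lie on that hull, so the candidate set shrinks dramatically. A minimal, self-contained illustration on toy data (nothing here comes from the question):
set.seed(1)
pts <- data.frame(x = 1:20, y = cumsum(rnorm(20)))
hull_idx <- chull(pts$x, pts$y)  # indices of the convex hull points, in hull order
pts[hull_idx, ]                  # only these points can anchor an outermost trend line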
I am trying to plot a scatterplot in shiny, using times of day on the y axis.
When plotting multiple points, the y axis looks great.
Here's the code:
output$outputPlot <- renderPlot({
  coords <- subset(coords, location == input$cities)
  month <- coords$month
  time <- strptime(coords$format_time, format = "%l:%M:%S %p")
  plot(month, time)
})
But when there's only 1 data point in coords, the plot's time scale on the y-axis isn't in terms of time anymore, and the data point appears in the middle of the graph.
Thanks for your help!
What you're seeing is that R doesn't know how to guess an appropriate range for a single point. Typically it expands the plotted range by 4% of the data range (see ?par and look for 'xaxs'), but with a single point there is no range to expand.
So we need to tell it what ylim to use. (Similarly, your x-axis needs some guidance, too.)
Fake data:
set.seed(2)
coords <- data.frame(tm = Sys.time() + runif(20, -3e7, 3e7))
coords$month <- as.integer(format(coords$tm, format = "%m"))
coords$format_time <- format(coords$tm, format = "%l:%M:%S %p")
head(coords)
# tm month format_time
# 1 2018-10-24 20:15:17 10 8:15:17 PM
# 2 2019-10-19 05:07:04 10 5:07:04 AM
# 3 2019-07-21 14:19:22 7 2:19:22 PM
# 4 2018-10-13 03:44:57 10 3:44:57 AM
# 5 2020-04-03 21:32:22 4 9:32:22 PM
# 6 2020-04-03 15:27:59 4 3:27:59 PM
The "normal" plot looks fine:
month <- coords$month
time <- strptime(coords$format_time, format = "%l:%M:%S %p")
plot(month, time)
but the single-point does not:
sub <- coords[1,]
month <- sub$month
time <- strptime(sub$format_time, format = "%l:%M:%S %p")
plot(month, time)
So we fix it by specifying the xlim and ylim arguments. In this case, since I'm inferring it is meant to be a year of months (x) and a day of times (y), I can hard-code them, but in other situations you might want to just subtract/add a small amount around the one datum you have (see the sketch at the end of this answer):
sub <- coords[1,]
month <- sub$month
time <- strptime(sub$format_time, format = "%l:%M:%S %p")
xlim <- c(1, 12)
ylim <- strptime(c("12:00:00 AM", "11:59:59 PM"), format = "%l:%M:%S %p")
plot(month, time, xlim = xlim, ylim = as.numeric(ylim))
You only need to specify ylim to answer this question, but without setting xlim= here, the previous x-axis spanned 6-14, which is not good for months. Also of note: I had to coerce ylim to numeric for the plot; it did not work with ylim in its pure POSIXt form ... not sure exactly why that is the case, but it doesn't detract from the utility of the plot in general.
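As for the padding alternative mentioned above, here is a sketch; the +/- 1 month on x and +/- 1 hour on y are arbitrary amounts I picked, not anything dictated by the data:
sub <- coords[1,]
month <- sub$month
time <- strptime(sub$format_time, format = "%l:%M:%S %p")
# pad the single datum by a small, arbitrary amount on each side
xlim <- month + c(-1, 1)                   # +/- 1 month
ylim <- as.numeric(time) + c(-3600, 3600)  # +/- 1 hour, in seconds since the epoch
plot(month, time, xlim = xlim, ylim = ylim)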
I am having issues with trying to plot some Time Series data; namely, trying to plot the date (increments in months) against a real number (which represents price).
I can plot the data with just plot(months, mydata) with no issue, but it's in a scatter plot format.
However, when I try the same with ts.plot, i.e. ts.plot(months, mydata), I get the following error:
Error in .cbind.ts(list(...), .makeNamesTs(...), dframe = dframe, union = TRUE) : no time series supplied
I tried to bypass this with ts.plot(ts(months, mydata)), but then I get a straight line (which I know isn't correct).
I have made sure that both months and mydata have the same length.
EDIT: What I mean by custom x-axis
I need the data to be in monthly increments (specifically from 03/1998 to 02/2018) - so I ran the following in R:
d <- seq(as.Date("1998-03-01"), as.Date("2018-02-01"), "day")
months <- seq(min(d), max(d), "month")
Now that I have attained the monthly increments, I need the above variable, months, to act as the x-axis for the Time Series plot (perhaps more accurately, the time index).
With package zoo you can do the following.
library(zoo)
z <- zoo(mydata, order.by = months)
labs <- seq(min(index(z)), max(index(z)), length.out = 10)
plot(z, xaxt = "n")
axis(1, at = labs, labels = format(labs, "%m/%Y"))
Data creation code.
set.seed(1234)
d <- seq(as.Date("1998-03-01"), as.Date("2018-02-01"), "day")
months <- seq(min(d), max(d), "month")
n <- length(months)
mydata <- cumsum(rnorm(n))
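If you'd prefer to stay in base R, note that plot() already draws a date-aware x-axis when the x vector is of class Date; type = "l" turns the scatter into a line. A sketch using the months and mydata created above (the "%m/%Y" label format is just one choice):
plot(months, mydata, type = "l", xaxt = "n", xlab = "", ylab = "mydata")
axis.Date(1, x = months, format = "%m/%Y")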
I want to plot a density time series with following data:
density vector (4,2,5,8,4,6,4)
sampling period vector (unit: seconds) (2,2,2,2,3,2,2)
As you can see, the sampling period is not constant. I only know the starting date and time.
I somehow need to assign the start date and time to the first measurement and then compute the dates and times for the following measurements, but I don't know exactly how to code it.
First try converting the density vector to a ts, given an initial start time and the cumulative sum of the sampling periods.
I assumed that you are sampling a continuous process (there are no gaps/dead times).
require(lubridate)
require(tidyr)
require(ggplot2)
require(ggfortify)
require(timetk)
density <- c(4, 2, 5, 8, 4, 6, 4)
seconds <- c(2, 2, 2, 2, 3, 2, 2)
starttime <- 0
time <- starttime + cumsum(seconds)
df <- data.frame(time, seconds, density)
df$time <- as_datetime(df$time)
df$ts <- tk_ts(df, select = density)
autoplot(df$ts, ts.geom = 'bar', fill = 'blue')
Plot the density against the cumulative sum of the seconds added to the start.
dens <- c(4,2,5,8,4,6,4)
secs <- c(2,2,2,2,3,2,2)
st <- as.POSIXct("2000-01-01 00:00:00")
plot(st + cumsum(secs), dens, xlab = "", type = "l")
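Since the sampling period is not constant, a regularly spaced ts object is not a natural fit; if you do want a proper time-series object rather than plain vectors, zoo handles irregular indexes. A sketch reusing the vectors above (zoo is an extra dependency, not used by either answer here):
library(zoo)
z <- zoo(dens, order.by = st + cumsum(secs))
plot(z, xlab = "", ylab = "density")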
Below is my sample data: daily Users counts for the desktop, mobile and tablet device categories across four dates (2017-01-01 to 2017-01-04).
Though there are values for mobile and tablet across all 4 dates, when I try to facet_wrap across device category, my results are not what I expected. All the values corresponding to each date are being attributed to desktop only instead of being distributed across the 3 categories.
The code that I used is
qplot(data=gaData, x=gaData$Date, y=gaData$Users, xlim = c(20170101,20170101))+
facet_wrap(~gaData$Device.Category, ncol = 1)
The output that I'm seeing in the plot shows everything under the desktop facet.
I'm new to the whole data visualization area. I'm unable to identify what is wrong with the code.
P.S. I'm able to plot mobile and tablet individually for the same dates successfully as individual plots.
library(ggplot2)
x <- data.frame(Date = c('2017-01-01','2017-01-01','2017-01-01','2017-01-02','2017-01-02','2017-01-02',
                         '2017-01-03','2017-01-03','2017-01-03','2017-01-04','2017-01-04','2017-01-04'),
                Device = c("desktop","mobile","tablet","desktop","mobile","tablet",
                           "desktop","mobile","tablet","desktop","mobile","tablet"),
                Users = c(404,223,39,529,211,43,1195,285,29,1019,275,35))
x$Date <- as.POSIXct(x$Date, tz = "UTC")
ggplot(x, aes(Date, Users)) + geom_line() + facet_wrap(~Device)
Is this what you wanted?
Hope this helps.
# Load ggplot2 and the pipe (%>%) used below
library(tidyverse)
# Simulate some dummy data
dat <- data.frame(
  Date = rep(20170101:20170104, each = 3),
  Device = rep(c('D', 'M', 'T'), 4),
  Users = runif(n = 12, max = 1000, min = 10) %>% round()
)
# This is the 'base map', map variables onto aesthetics
ggplot(aes(x = Date, y = Users, col = Device), data = dat) +
  # What kind of geometry?
  geom_line() +
  geom_point() +
  # From 1d panel to 2d
  facet_wrap(~Device, ncol = 1)
You may also consider converting the Date variable to class Date.
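For instance, a one-line sketch of that conversion for the dummy dat above (it assumes the yyyymmdd-style integers used there):
dat$Date <- as.Date(as.character(dat$Date), format = "%Y%m%d")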
The following references may help you gain some understanding of ggplot2.
http://tutorials.iq.harvard.edu/R/Rgraphics/Rgraphics.html
http://r-statistics.co/Complete-Ggplot2-Tutorial-Part1-With-R-Code.html
http://ggplot2.tidyverse.org/reference/
Also, DataCamp provides wonderful online tutorials.
Welcome to the amazing world of R. Cheers.
First I suggest you convert 'Date' to a proper date class (ymd() from the lubridate package returns a Date):
library(lubridate)
library(ggplot2)
Date=ymd(c(rep("2017-01-01",3),rep("2017-01-02",3),rep("2017-01-03",3),rep("2017-01-04",3)))
Then we can build the rest of the data frame:
Country=rep("United States",12)
Device.Category=rep(c("Desktop","Mobile","Tablet"),4)
Users=c(404,223,39,529,211,43,1195,285,29,1019,275,35)
df=data.frame(Date,Country,Device.Category,Users)
If you want to plot only for "2017-01-01" use this
ggplot(df,aes(x=Date,y=Users))+geom_point()+facet_grid(Device.Category~.)+xlim(ymd("2017-01-01"),ymd("2017-01-01"))
Or if you want all dates, just remove the xlim call:
ggplot(df,aes(x=Date,y=Users))+geom_point()+facet_grid(Device.Category~.)
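If you want finer control over the date axis than xlim() gives you, scale_x_date() is worth a look. A sketch (assuming the df built above; the limits, break and label choices are arbitrary):
ggplot(df, aes(x = Date, y = Users)) +
  geom_point() +
  facet_grid(Device.Category ~ .) +
  scale_x_date(limits = ymd(c("2017-01-01", "2017-01-04")),
               date_breaks = "1 day", date_labels = "%b %d")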