Hi, I am desperately trying to plot several time series with a 12-month moving average.
Here is an example with two time series of flower and seed densities (I have many more time series to work on).
#datasets
library(lubridate) # provides ymd(), which is used below to build the monthly dates
taxon <- c(rep("Flower",36),rep("Seeds",36))
density <- c(seq(20, 228, length=36),seq(33, 259, length=36))
year <- rep(c(rep("2000",12),rep("2001",12),rep("2002",12)),2)
ymd <- c(rep(seq(ymd('2000-01-01'),ymd('2002-12-01'), by = 'months'),2))
#dataframe
df <- data.frame(taxon, density, year, ymd)
library(forecast)
#create function that does a Symmetric Weighted Moving Average (2x12) of the monthly log density of flowers and seeds
ma_12 <- function(x) {
ts_x <- ts(x, frequency = 12, start = c(2000, 1), end = c(2002, 12)) # transform to a time-series object, as required by ma()
return(ma(log(ts_x + 1), order = 12, centre = TRUE))
}
#trial of the function
ma_12(df[df$taxon=="Flower",]$density) #works well
library(ggplot2)
#Trying to plot flower and seeds log density as two time series
ggplot(df,aes(x=year,y=density,colour=factor(taxon),group=factor(taxon))) +
stat_summary(fun.y = ma_12, geom = "line") #or geom = "smooth"
#Warning message:
#Computation failed in `stat_summary()`:
#invalid time series parameters specified
The function ma_12 works correctly. The problem arises when I try to plot both time series (Flower and Seeds) with ggplot: I cannot define the two taxa as separate time series and apply the moving average to each of them. It seems to have something to do with stat_summary().
Any help would be more than welcome! Thanks in advance
Note: the following link is useful but does not directly help me, because I want to apply a specific function and plot the result for each level of a grouping variable. So far I haven't found a solution. Anyway, thank you for suggesting it.
Multiple time series in one plot
Is this what you need?
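f <- ma_12(df[df$taxon=="Flower", ]$density)
s <- ma_12(df[df$taxon=="Seeds", ]$density)
# stack both smoothed series with their decimal time index (e.g. 2000.083 for Feb 2000)
serie <- data.frame(ma = c(f, s),
                    time = c(time(f), time(s)),
                    taxon = rep(c("Flower", "Seeds"), c(length(f), length(s))))
# back-transform from log(density + 1) to the density scale
serie$density <- exp(serie$ma) - 1
library(lubridate)
# convert decimal years to the first day of each month (rounding guards against floating-point error)
m <- round(serie$time * 12)
serie$time <- make_date(m %/% 12, m %% 12 + 1, 1)
library(ggplot2)
ggplot() + geom_point(data=df, aes(x=ymd, y=density, color=taxon, group=taxon)) +
geom_line(data=serie, aes(x= time, y=density, color=taxon, group=taxon))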
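Because the question mentions many more series, a variant that does not need one object per taxon may scale better. The sketch below is my own alternative, not part of the answer above, and rests on a few assumptions: it computes the 2x12 moving average inside each taxon with base R's ave() and stats::filter(), using weights 1/24, 1/12 (eleven times), 1/24, which to my understanding is what forecast::ma(order = 12, centre = TRUE) applies. The helper name ma_2x12 and the column names date and ma are hypothetical. It also avoids stat_summary(), which calls its summary function separately on the y values at each x position and expects a single value back, so it never receives a whole series.

```r
library(ggplot2)

# Same example data as in the question (dates built with base R instead of lubridate)
dates <- seq(as.Date("2000-01-01"), as.Date("2002-12-01"), by = "month")
df <- data.frame(taxon   = rep(c("Flower", "Seeds"), each = 36),
                 density = c(seq(20, 228, length.out = 36), seq(33, 259, length.out = 36)),
                 date    = rep(dates, 2))

# Hypothetical helper: centred 2x12 moving average of log(x + 1);
# the first and last 6 values are NA because the window is incomplete there
ma_2x12 <- function(x) {
  w <- c(0.5, rep(1, 11), 0.5) / 12
  as.numeric(stats::filter(log(x + 1), w, sides = 2))
}

# Apply the smoother separately within each taxon; ave() keeps the original row order
df$ma <- ave(df$density, df$taxon, FUN = ma_2x12)

# colour = taxon also defines the groups, so each taxon gets its own line
p <- ggplot(df, aes(date, density, colour = taxon)) +
  geom_point() +
  geom_line(aes(y = exp(ma) - 1), na.rm = TRUE)  # back-transform to the density scale
print(p)
```

Adding a third taxon only means adding rows to df; no code changes are needed.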
I want to import daily stock market price data into R for any ticker and examine one historical time segment of it. From this segment, I want to convert the prices into daily ROC (rate of change) % changes. Next, I want to take this ROC series and create a cumulative probability density function that lets me set any custom number of bins and any size limit for each bin, for example 22 bins with a 0.3% limit. Then I want to plot this CPDF as either a histogram or a scatterplot. The final step would be to do this for two different periods of the same stock and plot them next to each other for visual inspection. I have started writing code for the ticker SPY, but I cannot get it to work.
library(quantmod)
library(tidyquant)
library(tidyverse)
# using tidyverse to import a ticker
spy <- tq_get("spy")
spy010422 <- tq_get("spy", get ="stock.prices", from ='2022-01-04', to = '2022-01-24')
str(spy010422)
# getting ROC between prices in the series
spy010422.rtn = ROC(spy010422$close, n = 1, type = c("discrete"), na.pad = TRUE)
str(spy010422.rtn)
# trying to use ggplot and tibble to create an ECDF function
spy010422.rtn %>%
tibble() %>%
ggplot() +
stat_ecdf(aes(.))
# another attempt at running ECDF on the ROC series
spy010422.rtn %>%
ggplot(spy010422.rtn) +
stat_ecdf(aes(close))
# trying to set the number of bins and bin size for the ECDF
spy010422.rtn %>%
mutate(rounded = round(close/.3, 0) *.3,
bin = min_rank(rounded)) %>%
ggplot(aes(close, bin)) +
geom_line()
# next time segment of the ticker spy to compare this to
spy020222 <- tq_get("spy", get ="stock.prices", from ='2022-02-02', to = '2022-02-24')
I couldn't understand exactly what you wanted to plot. Normally a CPDF is just a continuous line and doesn't have bins to customize. Also, "plot this CPDF as either a histogram or a scatterplot" is a weird phrase to me, as one normally plots the histogram/scatterplot of the variable, not of its CPDF. Given that, I made a function that plots the histogram of the ROC of the ticker; you can comment on whether that is what you wanted.
The function takes a list of dates in the format list(c(from1, to1), c(from2, to2), ...) (you can add as many intervals as you want) and loops over each interval in this list with purrr::map(). In each iteration it creates a histogram with the requested number of bins. After the loop, the plots are combined into one figure with ggpubr::ggarrange() (run install.packages("ggpubr") if you don't have the package installed).
library(quantmod)
library(tidyquant)
library(tidyverse)
gg.roc.hist = function(ticker, dates, bins = 30){
map(dates, function(dates){ #loop for each interval in the 'dates' list
df = tq_get(ticker, get ="stock.prices", from = dates[1], to = dates[2]) #get the prices
df$roc = ROC(df$close, n = 1, type = c("discrete"), na.pad = TRUE) #add a column with the ROC
ggplot(df, aes(x = roc)) +
geom_histogram(bins = bins) + #create a histogram changing the bins
labs(title = paste0(dates[1], " to ", dates[2]))}) %>%
ggpubr::ggarrange(plotlist = .) #bind the graphs together
}
Running:
gg.roc.hist('spy', list(c('2022-01-04','2022-01-24'), c('2022-02-02', '2022-02-24')), 22)
This yields a single figure with one histogram per date interval.
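If the goal is really the binned cumulative distribution described in the question (for example 22 bins of 0.3%), one possible reading, and it is only my interpretation, is to evaluate the empirical CDF of the returns at the upper edge of each fixed-width bin. The helper binned_cdf() and its defaults are hypothetical. Simulated returns replace the tq_get()/ROC() download so the sketch runs offline; with real data you would pass the roc column built in the function above.

```r
library(ggplot2)

# Hypothetical helper: ECDF of daily returns evaluated at the upper edges of
# n_bins fixed-width bins centred on zero (width = 0.003 means 0.3%)
binned_cdf <- function(returns, n_bins = 22, width = 0.003) {
  returns <- returns[!is.na(returns)]        # ROC() leaves an NA in the first row
  edges <- (0:n_bins - n_bins / 2) * width   # n_bins + 1 bin edges
  upper <- edges[-1]
  data.frame(upper = upper, cum_prop = ecdf(returns)(upper))
}

# Simulated returns stand in for two date ranges of real data
set.seed(42)
period_a <- c(NA, rnorm(15, 0, 0.008))
period_b <- c(NA, rnorm(15, -0.002, 0.012))
cdfs <- rbind(cbind(binned_cdf(period_a), period = "Period A (simulated)"),
              cbind(binned_cdf(period_b), period = "Period B (simulated)"))

# Side-by-side bars make the two periods easy to compare visually
p <- ggplot(cdfs, aes(upper, cum_prop, fill = period)) +
  geom_col(position = "dodge") +
  scale_x_continuous(labels = scales::percent) +
  labs(x = "Daily return (upper bin edge)", y = "Cumulative proportion")
print(p)
```

Returns beyond the outermost edges are still counted (the ECDF is evaluated on all of them), so the last bar only reaches 1 if no return exceeds n_bins / 2 * width.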
I am having trouble plotting some time series data: namely, plotting dates (in monthly increments) against a real number (which represents a price).
I can plot the data with plot(months, mydata) without any issue, but it comes out as a scatter plot.
However, when I try the same with ts.plot, i.e. ts.plot(months, mydata), I get the following error:
Error in .cbind.ts(list(...), .makeNamesTs(...), dframe = dframe, union = TRUE) : no time series supplied
I tried to bypass this with ts.plot(ts(months, mydata)), but that gives me a straight line (which I know isn't correct).
I have made sure that months and mydata have the same length.
EDIT: what I mean by a custom x-axis
I need the data to be in monthly increments (specifically from 03/1998 to 02/2018), so I ran the following in R:
d <- seq(as.Date("1998-03-01"), as.Date("2018-02-01"), "day")
months <- seq(min(d), max(d), "month")
Now that I have attained the monthly increments, I need the above variable, months, to act as the x-axis for the Time Series plot (perhaps more accurately, the time index).
With package zoo you can do the following.
library(zoo)
z <- zoo(mydata, order.by = months)
labs <- seq(min(index(z)), max(index(z)), length.out = 10)
plot(z, xaxt = "n")
axis(1, at = labs, labels = format(labs, "%m/%Y"))
Data creation code.
set.seed(1234)
d <- seq(as.Date("1998-03-01"), as.Date("2018-02-01"), "day")
months <- seq(min(d), max(d), "month")
n <- length(months)
mydata <- cumsum(rnorm(n))
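As a complement, here is a sketch of two base-R routes that avoid the zoo dependency. The original plot(months, mydata) call only needs type = "l" to draw a line instead of points. The straight line from ts(months, mydata) arises because ts() treats its first argument as the data, so the steadily increasing dates themselves were plotted; passing the values as data and the first month as start gives a monthly ts that ts.plot() accepts. The two-year tick spacing and the "Price" label are arbitrary choices.

```r
set.seed(1234)
months <- seq(as.Date("1998-03-01"), as.Date("2018-02-01"), by = "month")
mydata <- cumsum(rnorm(length(months)))

# Option 1: the original plot() call, drawn as a line with mm/YYYY tick labels
plot(months, mydata, type = "l", xaxt = "n", xlab = "", ylab = "Price")
ticks <- seq(min(months), max(months), by = "2 years")
axis(1, at = ticks, labels = format(ticks, "%m/%Y"))

# Option 2: a monthly ts object; only the first month (March 1998) is needed as start
y <- ts(mydata, start = c(1998, 3), frequency = 12)
ts.plot(y, ylab = "Price")
```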
I would like to automate, in R, an analysis I have been doing in GraphPad Prism, but apparently it is harder than I thought.
I have Voltage~Time data that I would like to integrate and plot. In GraphPad Prism, this is done via Analysis -> Integrate -> Create the Integral.
Below, I plotted the data in Prism along with the trace I got from the Plot Integral command.
How can I do that with R?
The data I used are similar to these:
Time <- seq(1,100,1)
Voltage <- sample(1:1000,100, replace = F)
I tried integrate(), but that requires a function to integrate, which I do not have, and gives me just a number.
I tried approxfun() and could create a function from my data, but again, as soon as I apply integrate() I only get a single value.
Do you have any idea what the GraphPad Prism function does and how I can translate it to R?
Thank you for the help!
With discrete, evenly spaced values (here the time step is 1), you can use cumsum():
set.seed(1)
Time <- seq(1,100,1)
Voltage <- sample(1:1000,100, replace = F)
df = data.frame(Time, Voltage)
library(ggplot2)
p1 <- ggplot(data = df)+
geom_line(aes(x = Time, y = Voltage))
p2 <- ggplot(data = df)+
geom_line(aes(x = Time, y = cumsum(Voltage)))
library(gridExtra)
grid.arrange(p1, p2)
For unevenly spaced time values, multiply each value by the width of its time step before taking the cumulative sum (a left Riemann sum):
cumsum(df$Voltage[-nrow(df)] * diff(df$Time))
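For a result closer to a continuous integral, the trapezoidal rule averages neighbouring values instead of using only the left one. This is a minimal sketch: cum_trapz() is a hypothetical helper name, the unevenly spaced times are simulated, and I have not verified that it reproduces Prism's exact algorithm. The running integral starts at 0 at the first time point.

```r
library(ggplot2)

# Hypothetical helper: cumulative trapezoidal integral of y over time.
# Works for evenly and unevenly spaced time values.
cum_trapz <- function(time, y) {
  stopifnot(length(time) == length(y))
  dt <- diff(time)
  avg <- (head(y, -1) + tail(y, -1)) / 2   # mean height of each trapezoid
  c(0, cumsum(dt * avg))
}

set.seed(1)
df <- data.frame(Time = sort(runif(100, 0, 100)),  # unevenly spaced samples
                 Voltage = sample(1:1000, 100))
df$Integral <- cum_trapz(df$Time, df$Voltage)

p <- ggplot(df, aes(Time, Integral)) + geom_line()
print(p)
```

For a linear signal the trapezoidal rule is exact: cum_trapz(c(0, 1, 3, 6), c(0, 1, 3, 6)) gives 0, 0.5, 4.5, 18, i.e. t^2/2 at each time point.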
I have four years of time-series data. I want to plot the data year by year and do a comparative analysis. The dummy data are as follows:
library(xts)
library(ggplot2)
timeindex <- seq(as.POSIXct('2016-01-01'),as.POSIXct('2016-12-31 23:59:59'), by = "1 mins")
dataframe <- data.frame(year1=rnorm(length(timeindex),100,10),year2=rnorm(length(timeindex),150,7),
year3=rnorm(length(timeindex),200,3),
year4=rnorm(length(timeindex),350,4))
xts_df <- xts(dataframe,timeindex)
However, plotting all the series with ggplot takes too long when I use the following line:
visualize_dataframe_all_columns(xts_df)
The above function is defined as:
visualize_dataframe_all_columns <- function(xts_data) {
library(RColorBrewer)# to increase no. of colors
library(plotly)
dframe <- data.frame(timeindex=index(xts_data),coredata(xts_data))
df_long <- reshape2::melt(dframe,id.vars = "timeindex")
colourCount = length(unique(df_long$variable))
getPalette = colorRampPalette(brewer.pal(8, "Dark2"))(colourCount) # brewer.pal(8, "Dark2") or brewer.pal(9, "Set1")
g <- ggplot(df_long,aes(timeindex,value,col=variable,group=variable))
g <- g + geom_line() + scale_colour_manual(values=getPalette)
ggplotly(g)
}
The problems with the above approach are:
It takes a long time to plot. Can I reduce the plotting time?
It is very difficult to zoom into the plot using plotly. Is there a better way?
Are there any better approaches to visualize this data?
I faced more or less the same problem with 10-minute data. However, the real question is: does it make sense to plot minute-level data for a whole year? The human eye cannot distinguish that many points (about 527,000 per series here).
I would aggregate the data into a daily xts series and plot that for the whole year, and modify the function so that it plots the minute data only for a chosen period.
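A minimal sketch of that suggestion, assuming the dummy data from the question: apply.daily() with colMeans reduces each column to one mean per day, and xts's ISO-8601 range subsetting keeps minute resolution for a chosen window only. The one-week window is an arbitrary example, and UTC is used so that daylight-saving changes do not shift the day boundaries. The daily object can then be passed to visualize_dataframe_all_columns() from the question in place of the minute data.

```r
library(xts)

# Same dummy data as in the question (UTC avoids daylight-saving gaps)
set.seed(1)
timeindex <- seq(as.POSIXct("2016-01-01", tz = "UTC"),
                 as.POSIXct("2016-12-31 23:59:59", tz = "UTC"), by = "1 min")
n <- length(timeindex)
xts_df <- xts(data.frame(year1 = rnorm(n, 100, 10), year2 = rnorm(n, 150, 7),
                         year3 = rnorm(n, 200, 3),  year4 = rnorm(n, 350, 4)),
              order.by = timeindex)

# 1) Whole year: one mean per day and column (366 rows instead of ~527,000)
daily <- apply.daily(xts_df, colMeans)
plot(daily, main = "Daily means")

# 2) Minute resolution only for a chosen window ("from/to", both days inclusive)
one_week <- xts_df["2016-03-01/2016-03-07"]
plot(one_week, main = "Minute data, 1-7 March 2016")
```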
I'm reasonably familiar with the usual ways of modifying a plot by writing your own x axis labels or a main title, but I've been unable to customize the output when plotting the results of a time series decomposition.
For example,
library(TTR)
t <- ts(co2, frequency=12, start=1, deltat=1/12)
td <- decompose(t)
plot(td)
plot(td, main="Title Doesn't Work") # gets you an error message
gives you a nice, basic plot of the observed time series, trend, etc. With my own data (changes in depth below the water surface), however, I'd like to be able to switch the orientation of the y axes (e.g. ylim=c(40,0) for 'observed', or ylim=c(18,12) for 'trend'), change 'seasonal' to 'tidal', include the units for the x axis ('Time (days)'), and provide a more descriptive title for the figure.
My impression is that the kind of time series analyses I'm doing is pretty basic and, eventually, I may be better off using another package, perhaps with better graphical control, but I'd like to use ts() and decompose() if I can for now (yeah, cake and consumption). Assuming this doesn't get too horrendous.
Is there a way to do this?
Thanks! Pete
You can modify the plot.decomposed.ts function, which is the plot "method" that gets dispatched when you call plot on an object of class decomposed.ts (the class of td):
getAnywhere(plot.decomposed.ts)
function (x, ...)
{
xx <- x$x
if (is.null(xx))
xx <- with(x, if (type == "additive")
random + trend + seasonal
else random * trend * seasonal)
plot(cbind(observed = xx, trend = x$trend, seasonal = x$seasonal, random = x$random),
main = paste("Decomposition of", x$type, "time series"), ...)
}
Notice in the code above that the function hard-codes the title. So let's modify it so that we can choose our own title:
my_plot.decomposed.ts = function(x, title="", ...) {
xx <- x$x
if (is.null(xx))
xx <- with(x, if (type == "additive")
random + trend + seasonal
else random * trend * seasonal)
plot(cbind(observed = xx, trend = x$trend, seasonal = x$seasonal, random = x$random),
main=title, ...)
}
my_plot.decomposed.ts(td, "My Title")
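The same idea can be extended to the other wishes in the question. Because plot.ts() draws all panels from one call, a ylim passed through ... would apply to every panel; one way around that, sketched below, is to draw each component in its own panel with par(mfrow = ...). The function plot_decomp_custom() and its arguments are hypothetical, and reversing each panel's own range is just one choice; any per-panel ylim could be supplied instead.

```r
# Hypothetical helper: one panel per component, each with its own reversed y-axis,
# custom panel labels, a shared x-axis label and an overall title
plot_decomp_custom <- function(td, labels = c("observed", "trend", "seasonal", "random"),
                               xlab = "Time", main = "") {
  comps <- list(td$x, td$trend, td$seasonal, td$random)
  old <- par(mfrow = c(4, 1), mar = c(0.5, 4.5, 0.5, 1), oma = c(4, 0, 3, 0))
  on.exit(par(old))                                # restore graphics settings afterwards
  for (i in seq_along(comps)) {
    tt <- as.numeric(time(comps[[i]]))
    yy <- as.numeric(comps[[i]])
    plot(tt, yy, type = "l",
         ylim = rev(range(yy, na.rm = TRUE)),      # reversed: larger values lower down
         xaxt = if (i < 4) "n" else "s",           # x-axis only on the bottom panel
         xlab = "", ylab = labels[i])
  }
  mtext(xlab, side = 1, outer = TRUE, line = 2.5)  # shared x-axis label
  mtext(main, side = 3, outer = TRUE, line = 1)    # overall title
  invisible(comps)
}

td <- decompose(ts(co2, frequency = 12, start = 1))
res <- plot_decomp_custom(td, labels = c("observed", "trend", "tidal", "random"),
                          xlab = "Time", main = "Decomposition with reversed y-axes")
```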
Here's a ggplot version of the plot. ggplot requires a data frame, so the first step is to get the decomposed time series into data frame form and then plot it.
library(tidyverse) # Includes the packages ggplot2 and tidyr, which we use below
# Get the time values for the time series
Time = as.numeric(time(co2)) # one decimal-year value per observation (468 for co2)
# Convert td to data frame
dat = cbind(Time, with(td, data.frame(Observed=x, Trend=trend, Seasonal=seasonal, Random=random)))
ggplot(gather(dat, component, value, -Time), aes(Time, value)) +
facet_grid(component ~ ., scales="free_y") +
geom_line() +
theme_bw() +
labs(y=expression(CO[2]~(ppm)), x="Year") +
ggtitle(expression(Decomposed~CO[2]~Time~Series)) +
theme(plot.title=element_text(hjust=0.5))
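The ggplot version can also cover the other requests in the question: keeping the panels in decomposition order, renaming Seasonal to Tidal and flipping the y-axes. This sketch assumes tidyr 1.0 or later for pivot_longer() (gather() as above works too). scale_y_reverse() flips every panel at once, each still over its own free range. The axis labels and title are placeholders, e.g. "Time (days)" would replace "Year" for the depth data.

```r
library(ggplot2)
library(tidyr)

td <- decompose(co2)   # co2 is a monthly ts, so time(co2) gives decimal years
dat <- data.frame(Time = as.numeric(time(co2)),
                  Observed = as.numeric(td$x), Trend = as.numeric(td$trend),
                  Seasonal = as.numeric(td$seasonal), Random = as.numeric(td$random))
long <- pivot_longer(dat, -Time, names_to = "component", values_to = "value")

# Keep the panels in decomposition order (not alphabetical) and rename Seasonal
long$component <- factor(long$component,
                         levels = c("Observed", "Trend", "Seasonal", "Random"),
                         labels = c("Observed", "Trend", "Tidal", "Random"))

p <- ggplot(long, aes(Time, value)) +
  facet_grid(component ~ ., scales = "free_y") +
  geom_line(na.rm = TRUE) +   # trend and random are NA at both ends
  scale_y_reverse() +         # every panel is flipped, each over its own range
  labs(x = "Year", y = NULL, title = "Decomposed series with reversed y-axes") +
  theme_bw()
print(p)
```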