How do you create a trend line in Grafana charts - OpenTSDB

I am using Grafana with OpenTSDB. I can create charts with avg, max, min, etc., but I don't see how a trend can be added. Is it possible to put a trend line on charts in Grafana?

I found a way to do this. Use the movingAverage function, and set the window size to something really large, like in the thousands. The higher you set it, the smoother the trendline gets.
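To see why a larger window gives a smoother line, here is a minimal Python sketch of a trailing moving average. It is not Grafana's or OpenTSDB's implementation; the function name, the handling of the first few points, and the sample series are assumptions made for illustration.

```python
def moving_average(values, window):
    """Trailing mean over the last `window` points (fewer at the start)."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the point that just fell out of the window.
            running -= values[i - window]
        out.append(running / min(i + 1, window))
    return out


# Hypothetical noisy series: an upward trend with a +/-5 zigzag on top.
series = [i + (5 if i % 2 else -5) for i in range(100)]

small = moving_average(series, 2)
large = moving_average(series, 50)

# Largest point-to-point jump: the bigger window leaves much less jitter.
jitter_small = max(abs(b - a) for a, b in zip(small, small[1:]))
jitter_large = max(abs(b - a) for a, b in zip(large[50:], large[51:]))
print(jitter_small, jitter_large)
```

The trade-off is lag: a very large window also reacts slowly to real changes in the data, so the result is a smoothed curve rather than a fitted straight line.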

So, Grafana does not have a built-in way to add a trendline. A tragedy, to be sure.
That doesn't mean it is impossible to add one, but it is VERY time-consuming.
Here is how I did it.
For my purposes I already had the y values as separate Grafana variables. You could copy what I did, or create another WITH query to populate your data; either way, you'll need to be able to reference each y value separately.
Once you have your y values, you can calculate your trendline.
More info on the trendline equation here: https://classroom.synonym.com/calculate-trendline-2709.html
with
a as (
select
(12*($1*1 + $2*2 + $3*3 + $4*4 + $5*5 + $6*6 + $7*7 + $8*8 + $9*9 + $10*10 + $11*11 + $12*12)) as value
),
b as (
select
($1+$2+$3+$4+$5+$6+$7+$8+$9+$10+$11+$12)*(1+2+3+4+5+6+7+8+9+10+11+12) as value
),
c as (
select
12*(1^2+2^2+3^2+4^2+5^2+6^2+7^2+8^2+9^2+10^2+11^2+12^2) as value
),
d as (
select
(1+2+3+4+5+6+7+8+9+10+11+12)^2 as value
),
slope as (
select
(a.value-b.value)/(c.value-d.value) as value
from a, b, c, d),
e as (
select
($1+$2+$3+$4+$5+$6+$7+$8+$9+$10+$11+$12) as value
),
f as (
select
slope.value*(1+2+3+4+5+6+7+8+9+10+11+12) as value
from slope),
y_intercept as (
select
(e.value-f.value)/12 as value
from e, f
)
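Now you just need to populate the x values and y values for your trendline. The x values must be dates. I used relative date ranges to match the time range of my y value data. The select below continues the same query as the with block above, since it reads from the slope and y_intercept expressions.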
select
x_value as time,
trendline_value
from
(select
now() - interval '1 month' as x_value,
slope.value*1+y_intercept.value as trendline_value
from
slope, y_intercept
union
select
now() - interval '2 month' as x_value,
slope.value*2+y_intercept.value as trendline_value
from
slope, y_intercept
union
select
now() - interval '3 month' as x_value,
slope.value*3+y_intercept.value as trendline_value
from
slope, y_intercept
union
select
now() - interval '4 month' as x_value,
slope.value*4+y_intercept.value as trendline_value
from
slope, y_intercept
union
select
now() - interval '5 month' as x_value,
slope.value*5+y_intercept.value as trendline_value
from
slope, y_intercept
union
select
now() - interval '6 month' as x_value,
slope.value*6+y_intercept.value as trendline_value
from
slope, y_intercept
union
select
now() - interval '7 month' as x_value,
slope.value*7+y_intercept.value as trendline_value
from
slope, y_intercept
union
select
now() - interval '8 month' as x_value,
slope.value*8+y_intercept.value as trendline_value
from
slope, y_intercept
union
select
now() - interval '9 month' as x_value,
slope.value*9+y_intercept.value as trendline_value
from
slope, y_intercept
union
select
now() - interval '10 month' as x_value,
slope.value*10+y_intercept.value as trendline_value
from
slope, y_intercept
union
select
now() - interval '11 month' as x_value,
slope.value*11+y_intercept.value as trendline_value
from
slope, y_intercept
union
select
now() - interval '12 month' as x_value,
slope.value*12+y_intercept.value as trendline_value
from
slope, y_intercept
) as line_data
order by time
Here is what the final product looks like: [image: Grafana with trendline]
It's not pretty but it works.
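The a, b, c, d, e and f expressions in the query are the usual least-squares sums for x = 1..12. As a way to sanity-check the numbers the query returns, here is a minimal Python sketch of the same arithmetic. The function name and the sample values are made up for illustration; they are not part of the original query.

```python
def trendline(ys):
    """Least-squares slope and intercept for the points (1, ys[0]), (2, ys[1]), ..."""
    n = len(ys)
    xs = range(1, n + 1)
    sum_x = sum(xs)                               # 1 + 2 + ... + n
    sum_y = sum(ys)                               # "e" in the query
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # "a" is n * sum_xy
    sum_x2 = sum(x * x for x in xs)               # "c" is n * sum_x2
    # slope = (a - b) / (c - d), with b = sum_x * sum_y and d = sum_x ** 2
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # y_intercept = (e - f) / n, with f = slope * sum_x
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


# Hypothetical monthly values that lie exactly on y = 2x + 3.
ys = [2 * x + 3 for x in range(1, 13)]
slope, intercept = trendline(ys)
print(slope, intercept)

# The trendline value at each x is slope * x + intercept, as in the union query.
line = [slope * x + intercept for x in range(1, 13)]
```

Feeding the same twelve y values to both the query and this function should give matching slope and intercept values, up to rounding.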

Related

how can I determine when the confidence interval of a model hits certain value?

The company I work for has certain COVID infection rate targets before letting people return from home. One of those targets is daily new infections per unit population to be below 10 per 100k. How can I determine when the upper and lower confidence intervals hit that target? See image with annotation in red.
Right now, the two vertical lines are entered manually, but I'd like to add them automatically at the intersection points of the upper and lower confidence intervals.
Data: https://raw.githubusercontent.com/robhanssen/covid19-v3/main/data/sc-casesdeath.csv (filtered after Jan 21, 2021 in image).
Code example (from https://github.com/robhanssen/covid19-v3/blob/main/process_us_data.r)
casesdeathsbylocation %>% filter(date > as.Date("2021-01-21")) %>%
ggplot + aes(x=date, y=casesper100k) + geom_point() + geom_smooth(method="lm", fullrange=TRUE) +
scale_y_continuous(limit=c(-50,100), breaks=seq(0,100,10)) +
scale_x_date(breaks="2 weeks", date_labels="%b %d", limit=as.Date(c("2021-01-21","2021-04-07"))) +
labs(x="Date", y="Cases per 100k population", title="Cases in South Carolina", subtitle="Cases per 100,000") +
geom_hline(yintercept=10, lty=2) +
geom_hline(yintercept=5, lty=3) +
geom_hline(yintercept=0, lty=1) +
geom_vline(xintercept=as.Date("2021-03-09"),lty=2) + geom_vline(xintercept=as.Date("2021-04-02"),lty=2)
You can do this by creating a linear approximation of the inverse of a specified confidence limit (which is linear in this case anyway!) and using it to interpolate the value at which the line hits a specified threshold.
Note that here we are approximating x as a function of y (e.g. date as a function of lower CI):
find_value <- function(x,y,target=10) {
aa <- approx(y,x,xout=target)$y
as.Date(aa,origin="1970-01-01") ## convert back to a date (ugh)
}
Once we have this helper function, we can use it in a tidy workflow that uses broom::augment to generate the confidence intervals.
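library(broom)
library(dplyr) ## select(), summarise(), %>%
library(tidyr) ## pivot_longer()
## cdbl is assumed to be the question's data filtered to dates after 2021-01-21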
lims <- (cdbl
## fit linear model
%>% lm(formula=casesper100k~date)
## predict/add confidence intervals
%>% augment(interval="confidence",
newdata=data.frame(date=
seq.Date(from=min(cdbl$date),to=max(cdbl$date)+20,
by="1 day")))
%>% select(date,.lower,.upper)
## interpolate to find date corresponding to target value (10)
## should use across() but I can't get it working
%>% summarise(lwr=find_value(date,.lower),
upr=find_value(date,.upper))
## convert to useful data frame for ggplot
%>% pivot_longer(cols=everything(),names_to="limit",values_to="date")
)
Now you have a lims data frame that you can use for whatever you want. Using it in the plotting context:
(ggplot(cdbl)
+ aes(x=date, y=casesper100k)
+ geom_point()
+ expand_limits(x=max(cdbl$date+20))
+ geom_smooth(method="lm", fullrange=TRUE)
+ scale_y_continuous(limit=c(-50,100), breaks=seq(0,100,10))
+ scale_x_date(breaks="2 weeks", date_labels="%b %d")
+ geom_hline(yintercept=10,lty=2)
+ geom_vline(data=lims,aes(xintercept=date),lty=2)
)
As pointed out in the comments, you will get a more reliable answer if you use a more sophisticated forecasting method. As long as you get the confidence intervals returned by augment, the code here will work.
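For readers who want to see the inverse interpolation spelled out, here is a minimal Python sketch of the idea behind find_value: walk along the confidence limit and linearly interpolate the date at which it crosses the target. It is not a port of R's approx(); the function name and the sample confidence limits are hypothetical.

```python
from datetime import date, timedelta


def find_crossing(dates, values, target=10.0):
    """Date at which a piecewise-linear curve first reaches `target`, or None."""
    for i in range(len(dates) - 1):
        v0, v1 = values[i], values[i + 1]
        if v0 == target:
            return dates[i]
        if (v0 - target) * (v1 - target) < 0:
            # The target lies strictly between two neighbouring points.
            frac = (target - v0) / (v1 - v0)
            span_days = (dates[i + 1] - dates[i]).days
            # Adding a timedelta to a date drops the sub-day part.
            return dates[i] + timedelta(days=frac * span_days)
    return dates[-1] if values and values[-1] == target else None


# Hypothetical confidence limits that fall by one case per 100k each day.
days = [date(2021, 3, 1) + timedelta(days=i) for i in range(21)]
lower = [18.0 - i for i in range(21)]
upper = [20.5 - i for i in range(21)]

print(find_crossing(days, lower), find_crossing(days, upper))
```

If the limit never reaches the target inside the predicted range, the function returns None, which is the cue to extend the prediction window further into the future.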

Best way to ggplot time series with multiple categorical values

I am working with following data set:
team, time, rank1, rank2, rank3, rank4, rank5
bull, 20180102,0,0,0,0,1
corn, 20180102,0,29,0,0,1
fivfo, 20180102,23,4,0,0,1
lazy, 20180102,0,0,0,0,1
tt, 20180102,0,4,222,0,1
cheer, 20180102,23,0,34,0,1
manup, 20180102,0,13,0,0,1
bull, 20180103,0,10,0,10,1
corn, 20180103,0,59,0,0,1
fivfo, 20180103,43,4,0,0,1
lazy, 20180103,0,0,0,0,1
tt, 20180103,0,4,122,0,1
cheer, 20180103,23,0,34,0,11
manup, 20180103,0,13,10,0,11
The goal is to plot rank per team over time. I was trying to use melt but can't figure out which columns to melt against.
I tried to use melt as follows:
melt.s <- melt(s, id=c("team","time"))
ggplot(melt.s,aes(x=time,y=value,colour=variable,group=variable)) + geom_line()
The problem with the above is that the team name doesn't appear. The key takeaway of the plot that I want to showcase is the team and the number of times it has reached each rank.
I'm trying to figure out the best way to plot this, but so far I'm thinking of the following:
rank5 |
rank4 |
rank3 | legend (team)
rank2 |
rank1 |___________________
time
Perhaps something like this:
library(dplyr); library(tidyr); library(lubridate); library(ggplot2)
gather(s, rank, rank_count, -c(team, time)) %>%
mutate(time = ymd(time)) %>%
ggplot(aes(time, rank_count)) +
geom_line() +
ggrepel::geom_text_repel(aes(label = rank_count), size = 3) +
scale_x_date(date_labels = "%b %d") +
facet_grid(rank~team)

ggplot2 creating specific color gradients for specific geom-lines/ribbons from different dataframe

I would like to produce graphs similar to the one produced by NOAA for their SST and Degree Heating Week (DHW) data, in R using ggplot (but if you have another way, I'll take it too).
NOAA graph for SST and DHW time series:
So I have one data frame per surveyed site that contains a time series of Sea Surface Temperature (SST), and a different data frame with the DHW values for my study area.
The lengths of those data frames do not always match because the time series cover different periods, but they overlap.
The SST data frame looks like this:
DateTime meanSST DateShort
07/29/15 07:00:00 PM 26.781 07/29/15
07/29/15 08:00:00 PM 27.272 07/29/15
07/29/15 09:00:00 PM 25.708 07/29/15
07/29/15 10:00:00 PM 25.902 07/29/15
07/29/15 11:00:00 PM 25.805 07/29/15
07/30/15 12:00:00 AM 25.610 07/30/15
while the DHW data frame looks like this:
Date2 SST_MIN SST_MAX DHWvalue
6/3/2013 26.683 29.8978 0
6/4/2013 26.9522 30.6074 0.168
6/5/2013 27.066 30.5716 0.342
6/6/2013 25.7735 30.6236 0.5282
6/7/2013 25.2618 30.4678 0.6848
I tried the following code in R:
#dataset
dd<-read.csv("HBH-LTER20150728-201610b.csv")
DHW<- read.csv("NOAA data South TW/DHW SouthernTW NOAA.csv")
#date format (requires lubridate for mdy())
library(lubridate)
Sys.setlocale("LC_TIME", "usa") #windows
dd$Date = mdy(dd$Date)
DHW$Date2=mdy(DHW$Date2)
library(ggplot2)
#plot
ggplot() +
geom_line(data = DHW, aes(x = Date2, y = DHWvalue, color = DHWvalue)) +
scale_y_continuous(limits = c(0,31), breaks = seq(0,31,5)) +
geom_line(data = dd, aes(x = Date, y = meanSST, color = meanSST) )+
scale_colour_gradient2(low = "blue", mid = "light green" , high = "red", midpoint = 26)+
scale_x_date(date_breaks = "2 month", date_labels = waiver())+
theme(axis.text.x = element_text(angle = 45, hjust = 1))
and I obtain this figure:
My problem is that I have only managed to apply one color gradient to the entire graph, but I want one color gradient for my SST data (upper line) going from 20 to 31 degrees and a second color gradient going from 0 to 15 for the DHW line, both running from blue for their low values to dark red for their highest values.
When I try to add a second scale_colour_gradient, my first gradient is replaced by the latest one, so I still cannot apply a different color gradient to each line.
Later on, I will also add SST for other sites in the area where I have longer time series; that's why I kept the entire time series for DHW.
Any help will be extremely welcome.
Sincerely,
Lauriane

Producing a histogram - occurrence of events across the hours

I have a set of data showing patients arrival and departure in a hospital:
arrival<-c("12:00","12:30","14:23","16:55","00:04","01:00","03:00")
departure<-c("13:00","16:00","17:38","00:30","02:00","07:00","23:00")
I want to produce a histogram counting the number of patients in the hospital in each time band (00:00-01:00; 01:00-02:00 etc).
So I would get something like: between 12:00 and 12:59 there are 2 patients, etc.
You can try this. I changed the example data a little to ensure that each departure time is later than the corresponding arrival time; it would be better to have both date and time in the arrival and departure values. In the figure below, the time label 10:00 actually represents the band 10:00-10:59; you can change the labels if you want.
arrival<-c("12:00","12:30","14:23","16:55","00:04","01:00","03:00")
departure<-c("13:00","16:00","17:38","23:30","02:00","07:00","11:00")
df <- data.frame(arrival=strptime(arrival, '%H:%M'),departure=strptime(departure, '%H:%M'))
hours_present <- do.call('c', apply(df, 1, function(x) seq(from=as.POSIXct(x[1], tz='UTC'),
to=as.POSIXct(x[2], tz='UTC'), by="hour")))
library(ggplot2)
qplot(hours_present, geom='bar') +
scale_x_datetime(date_breaks= "1 hour", date_labels = "%H:%M",
limits = as.POSIXct(c(strptime("0:00", "%H:%M"), strptime("23:00", "%H:%M")), tz='UTC')) +
scale_y_continuous(breaks=1:5) +
theme(axis.text.x = element_text(angle=90, vjust = 0.5))
You can use 'histogram' instead as the geom in qplot to get the following figure:
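To check the bar heights by hand, here is a minimal Python sketch that counts how many patients are present in each hourly band. It uses a plain overlap test per band rather than the seq(..., by="hour") expansion in the R code, and it assumes, like the modified example data, that every stay starts and ends on the same day. The function names are made up for illustration.

```python
def to_minutes(hhmm):
    """Convert 'HH:MM' to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def hourly_occupancy(arrivals, departures):
    """Number of patients present for some part of each band HH:00-HH:59."""
    counts = [0] * 24
    for arr, dep in zip(arrivals, departures):
        start, end = to_minutes(arr), to_minutes(dep)
        for hour in range(24):
            band_start, band_end = hour * 60, hour * 60 + 60
            # Present if the stay overlaps the band for a positive duration.
            if start < band_end and end > band_start:
                counts[hour] += 1
    return counts


arrival = ["12:00", "12:30", "14:23", "16:55", "00:04", "01:00", "03:00"]
departure = ["13:00", "16:00", "17:38", "23:30", "02:00", "07:00", "11:00"]

counts = hourly_occupancy(arrival, departure)
for hour, n in enumerate(counts):
    print(f"{hour:02d}:00-{hour:02d}:59  {n}")
```

With this definition a patient who leaves at exactly 13:00 is counted in the 12:00 band but not in the 13:00 band. Stays that cross midnight would need full date-times, as noted in the answer.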

R growth rate calculation week over week on daily timeseries data

I'm trying to calculate w/w growth rates entirely in R. I could use excel, or preprocess with ruby, but that's not the point.
data.frame example
date gpv type
1 2013-04-01 12900 back office
2 2013-04-02 16232 back office
3 2013-04-03 10035 back office
I want to do this factored by 'type' and I need to wrap up the Date type column into weeks. And then calculate the week over week growth.
I think I need to do ddply to group by week - with a custom function that determines if a date is in a given week or not?
Then, after that, use diff and find the growth b/w weeks divided by the previous week.
Then I'll plot week/week growths, or use a data.frame to export it.
This was closed but had some useful ideas.
UPDATE: answer with ggplot:
All the same as below, just use this instead of plot
ggplot(data.frame(week=seq(length(gr)), gr), aes(x=week,y=gr*100)) + geom_point() + geom_smooth(method='loess') + coord_cartesian(xlim = c(.95, 10.05)) + scale_x_discrete() + ggtitle('week over week growth rate, from Apr 1') + ylab('growth rate %')
(old, correct answer but using only plot)
Well, I think this is it:
df_net <- ddply(df_all, .(date), summarise, gpv=sum(gpv)) # df_all has my daily data.
df_net$week_num <- strftime(df_net$date, "%U") #get the week # to 'group by' in ddply
df_weekly <- ddply(df_net, .(week_num), summarize, gpv=sum(gpv))
gr <- diff(df_weekly$gpv)/df_weekly$gpv[-length(df_weekly$gpv)] #seems correct, but this I don't understand via: http://stackoverflow.com/questions/15356121/how-to-identify-the-virality-growth-rate-in-time-series-data-using-r
plot(gr, type='l', xlab='week #', ylab='growth rate percent', main='Week/Week Growth Rate')
Any better solutions out there?
For the last part, if you want to calculate the growth rate you can take logs and then use diff, with the default parameters lag = 1 (previous week) and differences = 1 (first difference):
df_weekly_log <- log(df_weekly$gpv) # log of the numeric column; log() on the whole data frame fails on week_num
gr <- diff(df_weekly_log, lag = 1, differences = 1)
The latter is an approximation, valid for small differences.
Hope it helps.
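To see how close the log-difference approximation is to the exact week-over-week rate, here is a minimal Python sketch that computes both from a list of weekly totals. The weekly values and function names are hypothetical, not taken from the question's data.

```python
import math


def growth_rates(values):
    """Exact period-over-period growth: (current - previous) / previous."""
    return [(cur - prev) / prev for prev, cur in zip(values, values[1:])]


def log_growth_rates(values):
    """Log-difference approximation: log(current) - log(previous)."""
    return [math.log(cur) - math.log(prev) for prev, cur in zip(values, values[1:])]


# Hypothetical weekly totals: +5%, +5%, then -10%.
weekly_gpv = [100.0, 105.0, 110.25, 99.225]

for exact, approx in zip(growth_rates(weekly_gpv), log_growth_rates(weekly_gpv)):
    print(f"exact {exact:+.4f}   log-diff {approx:+.4f}")
```

The two columns agree to about two decimal places for the 5% changes and drift further apart for the 10% drop, which is the sense in which the log difference is only valid for small changes.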
