I would like to plot a threshold model with smooth 95% confidence interval lines between line segments. You would think this would be on the simple side but I have not been able to find an answer!
My thresholds/breakpoints are known, and it would be great if there were a way to visualize this data. I have tried the segmented package, which produces the following plot:
The plot shows a threshold model with a breakpoint at 5.4. However, the confidence intervals are not smooth between regression lines.
If anyone knows of any way to produce smooth (i.e. without the jump between line segments) CI lines between segmented regression lines (ideally in ggplot) that would be amazing. Thank you so much.
I have included sample data and the code I have tried below:
x <- c(2.26, 1.95, 1.59, 1.81, 2.01, 1.63, 1.62, 1.19, 1.41, 1.35, 1.32, 1.52, 1.10, 1.12, 1.11, 1.14, 1.23, 1.05, 0.95, 1.30, 0.79,
0.81, 1.15, 1.10, 1.29, 0.97, 1.05, 1.05, 0.84, 0.64, 0.80, 0.81, 0.61, 0.71, 0.75, 0.30, 0.30, 0.49, 1.13, 0.55, 0.77, 0.51,
0.67, 0.43, 1.11, 0.29, 0.36, 0.57, 0.02, 0.22, 3.18, 3.79, 2.49, 2.44, 2.12, 2.45, 3.22, 3.44, 3.86, 3.53, 3.13)
y <- c(22.37, 18.93, 16.99, 15.65, 14.62, 13.79, 13.09, 12.49, 11.95, 11.48, 11.05, 10.66, 10.30, 9.96, 9.65, 9.35, 9.07, 8.81,
8.56, 8.32, 8.09, 7.87, 7.65, 7.45, 7.25, 7.05, 6.86, 6.68, 6.50, 6.32, 6.15, 5.97, 5.80, 5.63, 5.47, 5.30,
5.13, 4.96, 4.80, 4.63, 4.45, 4.28, 4.09, 3.90, 3.71, 3.50, 3.27, 3.01, 2.70, 2.28, 22.37, 16.99, 11.05, 8.81,
8.56, 8.32, 7.25, 7.05, 6.50, 6.15, 5.63)
library(segmented)
lin.mod <- lm(y ~ x)
segmented.mod <- segmented(lin.mod, seg.Z = ~x, psi = 2)  # psi = starting value for the breakpoint
plot(x, y)
plot(segmented.mod, add = TRUE, conf.level = 0.95)
which produces the following plot (and associated jumps in 95% confidence intervals):
[Image: segmented plot]
Background: The non-smoothness in existing change point packages is due to the fact that frequentist packages operate with a fixed change point value. But, as with all inferred parameters, this is wrong because there is uncertainty about the location of the change.
Solution: AFAIK, only Bayesian methods can quantify that and the mcp package fills this space.
library(mcp)
library(ggplot2)  # for ggtitle() in the plots below
model = list(
  y ~ 1 + x,  # Segment 1: Intercept and slope
  ~ 0 + x     # Segment 2: Joined slope (no intercept change)
)
fit = mcp(model, data = data.frame(x, y))
Default plot (plot.mcpfit() returns a ggplot object):
plot(fit) + ggtitle("Default plot")
Each line represents a possible model that generated the data. The posterior for the change point is shown as a blue density. You can add a credible interval on top using plot(fit, q_fit = TRUE) or plot it alone:
plot(fit, lines = 0, q_fit = c(0.025, 0.975), cp_dens = FALSE) + ggtitle("Credible interval only")
If your change point is indeed known and if you want to model different residual scales for each segment (i.e., quasi-emulate segmented), you can do:
model2 = list(
  y ~ 1 + x,
  ~ 0 + x + sigma(1)  # Add intercept change in residual scale
)
fit = mcp(model2, data = data.frame(x, y), prior = list(cp_1 = 1.9))  # Note: prior is a fixed value - not a distribution.
plot(fit, q_fit = TRUE, cp_dens = FALSE)
Notice that the CI does not "jump" around the change point as in segmented. I believe that this is the correct behavior. Disclosure: I am the author of mcp.
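If the change point really is treated as known and fixed, a plain least-squares version in ggplot2 is also possible. The following is only a minimal sketch under stated assumptions: the data are simulated stand-ins (swap in the x and y from the question), the change point of 1.9 is taken from the fixed prior above, and the names dat, bp, hinge.mod, grid and pred are made up for illustration. Unlike the mcp fit, it ignores any uncertainty in the change point location. It uses a hinge term, pmax(x - bp, 0), so the fitted line and its confidence band are continuous at the change point.

```r
library(ggplot2)

# Simulated stand-in data (an assumption); replace with the x and y from the question
set.seed(1)
dat <- data.frame(x = runif(80, 0, 4))
bp <- 1.9  # known change point, fixed rather than estimated
dat$y <- 3 + 2 * dat$x + 5 * pmax(dat$x - bp, 0) + rnorm(80, sd = 1)

# Continuous piecewise-linear fit: the hinge term changes the slope at bp
hinge.mod <- lm(y ~ x + pmax(x - bp, 0), data = dat)

# Predict on a fine grid that includes the change point itself
grid <- data.frame(x = sort(c(bp, seq(min(dat$x), max(dat$x), length.out = 200))))
pred <- cbind(grid, predict(hinge.mod, newdata = grid,
                            interval = "confidence", level = 0.95))

# Points, 95% confidence ribbon and fitted line
p <- ggplot(dat, aes(x, y)) +
  geom_point() +
  geom_ribbon(data = pred, aes(x = x, ymin = lwr, ymax = upr),
              inherit.aes = FALSE, alpha = 0.3) +
  geom_line(data = pred, aes(x = x, y = fit), colour = "blue")
print(p)
```

Because the model is linear in its coefficients once bp is fixed, predict() returns a band with a kink but no jump at the change point.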
Here is a reproducible example of my data:
df_1 <- data.frame(cbind("Thriving" = c(2.33, 4.21, 6.37, 5.28, 4.87, 3.92, 4.16, 5.53), "Satisfaction" = c(3.45, 4.53, 6.01, 3.87, 2.92, 4.50, 5.89, 4.72), "Wellbeing" = c(2.82, 3.45, 5.23, 3.93, 6.18, 4.22, 3.68, 4.74), "id" = c(1:8)))
As you can see, it includes three variables of psychological measures and one identifier with an id for each respondent.
Now, my aim is to create a 2D grid that gives a clear overview of the values of every variable for every respondent. The x-axis would show the id of each respondent and the y-axis the variables, and the colour of each field would depend on its value: 1 to 3 in red, 3 to 5 in yellow and 5 to 7 in green. The style of the grid should be like this image.
All I have achieved so far is the following code which compresses all the variables/items into one column so they can together be portrayed on the y-axis - the id is of course included in its own column as are the values:
library(dplyr)
library(tidyr)
df_1 %>%
  select("Thr" = Thriving, "Stf" = Satisfaction, "Wb" = Wellbeing, "id" = id) %>%
  na.omit() %>%
  gather(key = "variable", value = "value", -id)
I am looking for a solution that works without storing the data in a new frame.
Also, I am looking for a solution that would be useful for even 100 or more respondents and up to about 40 variables. It would not matter if one rectangle would then be very small, I just want to have a nice colour play which would give a nice taste of where an organisation may be achieving low or high - and how it is achieving in general.
Thanks for reading, very grateful for any help!
There is probably a better graphics oriented approach, but you can do this with base plot and by treating your data as a raster:
library(raster)
df_1 <- cbind("Thriving" = c(2.33, 4.21, 6.37, 5.28, 4.87, 3.92, 4.16, 5.53), "Satisfaction" = c(3.45, 4.53, 6.01, 3.87, 2.92, 4.50, 5.89, 4.72), "Wellbeing" = c(2.82, 3.45, 5.23, 3.93, 6.18, 4.22, 3.68, 4.74), "id" = c(1:8))
r <- raster(ncol=nrow(df_1), nrow=3, xmn=0, xmx=8, ymn=0, ymx=3)
values(r) <- as.vector(as.matrix(df_1[,1:3]))
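plot(r, axes=FALSE, box=FALSE, asp=NA)
axis(1, at=seq(0.5, 7.5, 1), labels=1:8)  # one tick per respondent, centred on each cell
axis(2, at=seq(0.5, 2.5, 1), labels=rev(colnames(df_1)[1:3]), las=1)  # raster rows fill from the top, so label bottom-up in reverse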
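For a ggplot2 route that keeps the three colour bands from the question and does not store a reshaped copy of the data, something along these lines might work. This is a sketch, not a definitive implementation: the column name band and the legend title are my own choices, and pivot_longer() is used in place of gather(). Values are binned with cut() and drawn with geom_tile().

```r
library(dplyr)
library(tidyr)
library(ggplot2)

df_1 <- data.frame(
  Thriving     = c(2.33, 4.21, 6.37, 5.28, 4.87, 3.92, 4.16, 5.53),
  Satisfaction = c(3.45, 4.53, 6.01, 3.87, 2.92, 4.50, 5.89, 4.72),
  Wellbeing    = c(2.82, 3.45, 5.23, 3.93, 6.18, 4.22, 3.68, 4.74),
  id           = 1:8
)

# Reshape to long format and bin the values inside the pipe, then plot directly
p <- df_1 %>%
  pivot_longer(-id, names_to = "variable", values_to = "value") %>%
  mutate(band = cut(value, breaks = c(1, 3, 5, 7), include.lowest = TRUE)) %>%
  ggplot(aes(x = factor(id), y = variable, fill = band)) +
  geom_tile(colour = "white") +
  scale_fill_manual(
    values = c("[1,3]" = "red", "(3,5]" = "yellow", "(5,7]" = "green"),
    drop = FALSE  # keep all three bands in the legend even if one is empty
  ) +
  labs(x = "id", y = NULL, fill = "Value")
print(p)
```

Each respondent/variable pair becomes one tile, so the same code should scale to 100 or more respondents and about 40 variables; the tiles just get smaller.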
I am having trouble making a moving average crossover plot in R. I added ma5 and ma20 as two moving average lines based on my price data.
Here is my sample code:
library("TTR")
library(ggplot2)
price<- c(3.23, 3.29, 3.29 , 3.21, 3.19, 3.18, 3.11, 3.21, 3.25,
3.40, 3.39, 3.28, 3.31 , 3.32, 3.21, 3.19, 3.16, 3.20,
3.26, 3.30, 3.42, 3.44, 3.40, 3.41, 3.59, 3.83, 3.70,
3.86, 3.95, 3.89, 3.94, 3.78, 3.69, 3.74, 3.67, 3.69,
3.69, 3.61, 3.64, 3.83, 3.88, 3.98, 3.98, 3.86, 3.87,
3.93, 4.05, 3.97, 3.90, 3.93, 4.00, 3.85, 3.81, 4.20,
4.17, 4.05, 3.95, 3.96, 3.97, 3.96, 3.88, 3.85, 3.79,
3.83, 3.68, 3.72, 3.73, 3.81, 3.80, 3.81, 3.75, 3.87,
3.90, 3.89, 3.86, 3.81, 3.86, 3.78, 3.83, 3.87, 3.91,
4.05, 4.07, 4.02, 4.01, 4.00, 4.13, 4.07, 4.11, 4.26,
4.33, 4.32, 4.39, 4.30, 4.39, 4.68, 4.69, 4.70, 4.60,
4.71, 4.81, 4.73, 4.78, 4.64, 4.64, 4.64, 4.61, 4.44)
date<- c("2004-01-23", "2004-01-26", "2004-01-27", "2004-01-28",
"2004-02-02", "2004-02-03", "2004-02-04", "2004-02-05",
"2004-02-06", "2004-02-11", "2004-02-12", "2004-02-13",
"2004-02-17", "2004-02-18", "2004-02-19", "2004-02-20",
"2004-02-23", "2004-02-24", "2004-02-25", "2004-02-26",
"2004-02-27", "2004-03-01", "2004-03-02", "2004-03-03",
"2004-03-04", "2004-03-05", "2004-03-08", "2004-03-09",
"2004-03-10", "2004-03-11", "2004-03-12", "2004-03-15",
"2004-03-16", "2004-03-17", "2004-03-18", "2004-03-19",
"2004-03-22", "2004-03-23", "2004-03-24", "2004-03-25",
"2004-03-26", "2004-03-29", "2004-03-30", "2004-03-31",
"2004-04-01", "2004-04-02", "2004-04-05", "2004-04-06",
"2004-04-07", "2004-04-08", "2004-04-12", "2004-04-13",
"2004-04-14", "2004-04-15", "2004-04-16", "2004-04-19",
"2004-04-20", "2004-04-21", "2004-04-22", "2004-04-23",
"2004-04-26", "2004-04-27", "2004-04-28", "2004-04-29",
"2004-04-30", "2004-05-03", "2004-05-04", "2004-05-05",
"2004-05-06", "2004-05-07", "2004-05-10", "2004-05-11",
"2004-05-12", "2004-05-13", "2004-05-14", "2004-05-17",
"2004-05-18", "2004-05-19", "2004-05-20", "2004-05-21",
"2004-05-24", "2004-05-25", "2004-05-26", "2004-05-27",
"2004-05-28", "2004-06-01", "2004-06-02", "2004-06-03",
"2004-06-04", "2004-06-07", "2004-06-08", "2004-06-09",
"2004-06-10", "2004-06-14", "2004-06-15", "2004-06-16",
"2004-06-17", "2004-06-18", "2004-06-21", "2004-06-22",
"2004-06-23", "2004-06-24", "2004-06-25", "2004-06-28",
"2004-06-29", "2004-06-30", "2004-07-01", "2004-07-02")
price5<- SMA(price,n=5)
price20<- SMA(price,n=20)
pricedf<- data.frame(date,price5,price20,price)
ggplot(pricedf, aes(date)) +
  geom_line(group = 1, aes(y = price5, colour = "ma5")) +
  geom_line(group = 1, aes(y = price20, colour = "ma20")) +
  xlab("Date") +
  ylab("Price")
There are a couple of crossovers on this plot. What I want is for the 'price' line (one column in my pricedf) to be drawn in green wherever ma5 is above ma20, and in red wherever ma5 is below ma20.
The example plot looks like this picture:
I was thinking of subtracting price20 from price5 and checking whether the result is greater than 0. But how can I draw the result on another plot with different colours?
Here is how I solved it.
library("TTR")
library(ggplot2)
price<- c(3.23, 3.29, 3.29 , 3.21, 3.19, 3.18, 3.11, 3.21, 3.25,
3.40, 3.39, 3.28, 3.31 , 3.32, 3.21, 3.19, 3.16, 3.20,
3.26, 3.30, 3.42, 3.44, 3.40, 3.41, 3.59, 3.83, 3.70,
3.86, 3.95, 3.89, 3.94, 3.78, 3.69, 3.74, 3.67, 3.69,
3.69, 3.61, 3.64, 3.83, 3.88, 3.98, 3.98, 3.86, 3.87,
3.93, 4.05, 3.97, 3.90, 3.93, 4.00, 3.85, 3.81, 4.20,
4.17, 4.05, 3.95, 3.96, 3.97, 3.96, 3.88, 3.85, 3.79,
3.83, 3.68, 3.72, 3.73, 3.81, 3.80, 3.81, 3.75, 3.87,
3.90, 3.89, 3.86, 3.81, 3.86, 3.78, 3.83, 3.87, 3.91,
4.05, 4.07, 4.02, 4.01, 4.00, 4.13, 4.07, 4.11, 4.26,
4.33, 4.32, 4.39, 4.30, 4.39, 4.68, 4.69, 4.70, 4.60,
4.71, 4.81, 4.73, 4.78, 4.64, 4.64, 4.64, 4.61, 4.44)
date<- c("2004-01-23", "2004-01-26", "2004-01-27", "2004-01-28",
"2004-02-02", "2004-02-03", "2004-02-04", "2004-02-05",
"2004-02-06", "2004-02-11", "2004-02-12", "2004-02-13",
"2004-02-17", "2004-02-18", "2004-02-19", "2004-02-20",
"2004-02-23", "2004-02-24", "2004-02-25", "2004-02-26",
"2004-02-27", "2004-03-01", "2004-03-02", "2004-03-03",
"2004-03-04", "2004-03-05", "2004-03-08", "2004-03-09",
"2004-03-10", "2004-03-11", "2004-03-12", "2004-03-15",
"2004-03-16", "2004-03-17", "2004-03-18", "2004-03-19",
"2004-03-22", "2004-03-23", "2004-03-24", "2004-03-25",
"2004-03-26", "2004-03-29", "2004-03-30", "2004-03-31",
"2004-04-01", "2004-04-02", "2004-04-05", "2004-04-06",
"2004-04-07", "2004-04-08", "2004-04-12", "2004-04-13",
"2004-04-14", "2004-04-15", "2004-04-16", "2004-04-19",
"2004-04-20", "2004-04-21", "2004-04-22", "2004-04-23",
"2004-04-26", "2004-04-27", "2004-04-28", "2004-04-29",
"2004-04-30", "2004-05-03", "2004-05-04", "2004-05-05",
"2004-05-06", "2004-05-07", "2004-05-10", "2004-05-11",
"2004-05-12", "2004-05-13", "2004-05-14", "2004-05-17",
"2004-05-18", "2004-05-19", "2004-05-20", "2004-05-21",
"2004-05-24", "2004-05-25", "2004-05-26", "2004-05-27",
"2004-05-28", "2004-06-01", "2004-06-02", "2004-06-03",
"2004-06-04", "2004-06-07", "2004-06-08", "2004-06-09",
"2004-06-10", "2004-06-14", "2004-06-15", "2004-06-16",
"2004-06-17", "2004-06-18", "2004-06-21", "2004-06-22",
"2004-06-23", "2004-06-24", "2004-06-25", "2004-06-28",
"2004-06-29", "2004-06-30", "2004-07-01", "2004-07-02")
price5<- SMA(price,n=5)
price20<- SMA(price,n=20)
pricedf<- data.frame(date,price5,price20,price)
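coldf <- ifelse(price5 - price20 > 0, 'green', 'red')
coldf[is.na(coldf)] <- 'green'  # the moving averages are NA for the first rows
coldf
ggplot(pricedf) +
  geom_line(aes(x = date, y = price, group = 1, color = coldf)) +
  scale_color_identity() +  # draw the literal colour names rather than the default palette
  xlab("Date") +
  ylab("Price")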
Which creates this graph.
I used an ifelse statement to find where price5 is greater than price20. This produces NAs wherever a moving average is not yet defined (the first 19 rows, since price20 needs 20 observations), which I filled with green. I am not 100% sure which way round you wanted the green and the red. To swap them, simply change
coldf <- ifelse(price5 - price20 > 0, 'green', 'red')
to
coldf <- ifelse(price5 - price20 > 0, 'red', 'green')
Which looks like graph2.
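If you also want the positions of the crossovers themselves, the same comparison can be reused. This is only a rough sketch: sma() is a hypothetical base-R stand-in for TTR::SMA, the series is just the first 12 prices from the question, and the window lengths 2 and 4 are assumptions chosen so that such a short series contains a crossover.

```r
# Trailing simple moving average in base R (stand-in for TTR::SMA);
# the first n - 1 values are NA, as with SMA()
sma <- function(v, n) as.numeric(stats::filter(v, rep(1 / n, n), sides = 1))

# First 12 prices from the question
price <- c(3.23, 3.29, 3.29, 3.21, 3.19, 3.18, 3.11, 3.21, 3.25, 3.40, 3.39, 3.28)

fast <- sma(price, 2)  # short window (assumed for this toy series)
slow <- sma(price, 4)  # long window (assumed for this toy series)

# TRUE where the fast average is above the slow one, NA while either is undefined
above <- fast > slow

# A crossover is any position where 'above' differs from the previous position;
# which() silently drops the NA comparisons at the start
cross <- which(diff(above) != 0) + 1
cross
```

With the full data you would use price5 and price20 in place of fast and slow; the resulting indices could then be marked on the plot, for example with geom_vline().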