Order axis when doing a bubble chart using plotly in R - r

I have a bubble chart made with plotly in R, but the order of the axis values appears odd.
The output is as follows, and you can see that the axes are not ordered correctly:
The code that I'm using is as follows:
library(plotly)
library(ggplot2)
file <- c("C://link//data.csv")
#dataSource <- read.csv(file, sep =",", header = TRUE)
dataSource <- read.table(file, header=T, sep=",")
dataSource <- na.omit(dataSource)
slope <- 1
dataSource$size <- sqrt(dataSource$Y.1 * slope)
colors <- c('#4AC6B7', '#1972A4') #, '#965F8A', '#FF7070', '#C61951')
plot_ly(dataSource,
x = ~Y.1.vs.Y.2,
y = ~YTD.vs.Y.1.YTD,
color = ~BU,
size = ~size,
colors = colors,
type = 'scatter',
mode = 'markers',
sizes = c(min(dataSource$size), max(dataSource$size)),
marker = list(symbol = 'circle', sizemode = 'diameter',
line = list(width = 2, color = '#FFFFFF')),
text = ~paste('Business Unit:',
BU, '<br>Product:',
Product, '<br>Y.1.vs.Y.2:',
Y.1.vs.Y.2, '<br>YTD.vs.Y.1.YTD:',
YTD.vs.Y.1.YTD)) %>%
layout(title = 'Y.1.vs.Y.2 v. YTD.vs.Y.1.YTD',
xaxis = list(title = 'Y.1.vs.Y.2',
gridcolor = 'rgb(255, 255, 255)',
zerolinewidth = 1,
ticklen = 5,
gridwidth = 2),
yaxis = list(title = 'YTD.vs.Y.1.YTD',
gridcolor = 'rgb(255, 255, 255)',
zerolinewidth = 1,
ticklen = 5,
gridwidth = 2),
paper_bgcolor = 'rgb(243, 243, 243)',
plot_bgcolor = 'rgb(243, 243, 243)')
The data is as follows:
structure(list(BU = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("B", "D"), class = "factor"), Product = structure(c(4L, 5L, 7L, 8L, 9L, 13L, 1L, 3L, 4L, 11L, 12L, 13L), .Label = c("ADT", "BHL", "CEX", "CMX", "CTL", "HTH", "MTL", "SSL", "TLS", "UTV", "WEX", "WLD", "WMX"), class = "factor"), Y.2 = c(4065L, 499L, 20L, 5491L, 781L, 53L, 34L, 1338L, 557L, 428L, 310L, 31L), Y.1 = c(4403L, 550L, 28L, 5225L, 871L, 46L, 22L, 1289L, 602L, 426L, 318L, 37L), Y.1.YTD = c(4403L, 550L, 28L, 5225L, 871L, 46L, 22L, 1289L, 602L, 426L, 318L, 37L), YTD = c(5026L, 503L, 29L, 3975L, 876L, 40L, 62L, 1395L, 717L, 423L, 277L, 35L), Y.1.vs.Y.2 = structure(c(12L, 7L, 11L, 4L, 8L, 1L, 2L, 3L, 12L, 6L, 10L, 9L), .Label = c("-13%", "-35%", "-4%", "-5%", "-76%", "0%", "10%", "12%", "19%", "3%", "40%", "8%"), class = "factor"), YTD.vs.Y.1.YTD = structure(c(8L, 5L, 11L, 3L, 7L, 2L, 9L, 12L, 10L, 1L, 2L, 4L), .Label = c("-1%", "-13%", "-24%", "-5%", "-9%", "0%", "1%", "14%", "182%", "19%", "4%", "8%"), class = "factor")), .Names = c("BU", "Product", "Y.2", "Y.1", "Y.1.YTD", "YTD", "Y.1.vs.Y.2", "YTD.vs.Y.1.YTD"), row.names = c(2L, 3L, 4L, 5L, 6L, 8L, 9L, 10L, 11L, 13L, 14L, 15L), class = "data.frame", na.action = structure(c(1L, 7L, 12L), .Names = c("1", "7", "12"), class = "omit"))
Any ideas on how I can order the axes properly?
Thanks

There are a few ways to manipulate factor levels, but things can get a bit messy if you're not careful. You should familiarize yourself with ?levels and ?factor, as well as maybe ?reorder and ?relevel.
In the meantime, try something like this:
dataSource[[7]] <- factor(dataSource[[7]], levels = c("-76%", "-35%", "-13%", "-5%", "-4%", "0%", "3%", "8%", "10%", "12%", "19%", "40%"))
Edit
To consolidate my answer and comment...
This behaviour is caused by the way factors are encoded. Your axis values are strings, and factor levels are ordered alphanumerically by default. So to change their order you have to specify it as above, or else code them numerically and give them the required names. There are many different ways to change them, in several packages. This answer provides a standard base R method for handling factors. For further info, start with the manual pages I suggested.
As for it being "very manual": since factors are categorical (and therefore have a potentially arbitrary order), there is no way to automate their order unless you code them numerically in the desired order.
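The numeric coding mentioned above can be sketched in base R. This is only a minimal illustration: `pct` is a hypothetical stand-in for one of the percentage columns, and it assumes every level is a number followed by a percent sign.

```r
# Hypothetical stand-in for one of the percentage columns in the question
pct <- factor(c("8%", "10%", "40%", "-5%", "12%", "-13%", "-35%", "-4%"))

# Strip the percent sign and parse each level as a number
level_values <- as.numeric(sub("%", "", levels(pct), fixed = TRUE))

# Rebuild the factor with its levels sorted by numeric value
pct_ordered <- factor(pct, levels = levels(pct)[order(level_values)])

levels(pct_ordered)
```

The values themselves are unchanged; only the level order, which is what the axis follows, is rewritten.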

Thanks to the comments above I've been able to resolve the issue. Find below the full code, which I hope might help other users:
library(plotly)
library(ggplot2)
file <- c("C://link//data.csv")
dataSource <- read.table(file, header=T, sep=",")
dataSource <- na.omit(dataSource)
# Additional code to format the input values and recalculate the percentages
BUValues = dataSource$BU
ProductValues = dataSource$Product
dataSource <- as.data.frame(data.matrix(dataSource), stringsAsFactors = FALSE)
dataSource$BU = BUValues
dataSource$Product = ProductValues
dataSource$Y.1.vs.Y.2 = round((dataSource$Y.1/dataSource$Y.2 -1)*100,2)
dataSource$YTD.vs.Y.1.YTD = round((dataSource$YTD/dataSource$Y.1.YTD -1)*100,2)
slope <- 1
dataSource$size <- sqrt(dataSource$Y.1 * slope)
colors <- c('#4AC6B7', '#1972A4') #, '#965F8A', '#FF7070', '#C61951')
plot_ly(dataSource,
x = ~Y.1.vs.Y.2,
y = ~YTD.vs.Y.1.YTD,
color = ~BU,
size = ~size,
colors = colors,
type = 'scatter',
mode = 'markers',
sizes = c(min(dataSource$size), max(dataSource$size)),
marker = list(symbol = 'circle', sizemode = 'diameter',
line = list(width = 2, color = '#FFFFFF')),
text = ~paste('Business Unit:', BU,
'<br>Product:', Product,
'<br>YoY:',Y.1.vs.Y.2,
'<br>YTD:',YTD.vs.Y.1.YTD)) %>%
layout(title = 'YoY vs YTD Performance',
xaxis = list(title = 'YoY Performance (%)',
gridcolor = 'rgb(255, 255, 255)',
zerolinewidth = 1,
ticklen = 5,
gridwidth = 2),
yaxis = list(title = 'YTD Performance (%)',
gridcolor = 'rgb(255, 255, 255)',
zerolinewidth = 1,
ticklen = 5,
gridwidth = 2),
paper_bgcolor = 'rgb(243, 243, 243)',
plot_bgcolor = 'rgb(243, 243, 243)')

Related

how to show average, median, and stdv in a histogram

How would I go about showing the average, median, and standard deviation of the data in these histograms?
Here is my histogram code:
hist(PRE$Productivity...Productivité, main = "PRE", xlab = "Productivity")
hist(DBN$Productivity...Productivité, main = "DBN", xlab = "Productivity")
hist(DBG$Productivity...Productivité, main = "DBG", xlab = "Productivity")
hist(POST$Productivity...Productivité, main = "POST", xlab = "Productivity")
And here is its output:
dput(head(DBN))
structure(list(Participant.Code = c("AE1_02", "AE1_02", "AE1_02",
"AE1_02", "AE2_08", "AE2_08"), Condition = structure(c(5L, 5L,
5L, 5L, 5L, 5L), levels = c("", "DBG", "DBG DBN", "DBG POST",
"DBN", "DBN DBG", "DBN POST", "POST", "PRE", "PRE DBG", "PRE DBN"
), class = "factor"), Start.time = c("3-9-22 8:39:27", "3-9-22 16:27:44",
"3-10-22 8:48:34", "3-10-22 16:09:33", "3-18-22 8:36:15", "3-18-22 17:26:13"
), Stiffness...Raideur = c(7L, 7L, 7L, 7L, 4L, 4L), Fatigue...Fatigue = c(7L,
8L, 8L, 8L, 4L, 6L), Discomfort...Inconfort = c(7L, 7L, 7L, 7L,
3L, 6L), Happiness...Joie = c(8L, 8L, 8L, 8L, 6L, 5L), Productivity...Productivité = c(6L,
8L, 7L, 7L, 5L, 4L), Ability.to.concentrate...Capacité.de.se.concentrer = c(7L,
8L, 7L, 6L, 5L, 4L), Alertness...Vigilance = c(7L, 8L, 7L, 6L,
5L, 5L), Stress...Stress = c(6L, 8L, 7L, 6L, 5L, 5L), Back.Pain...Mal.de.dos = c(8L,
7L, 8L, 8L, 3L, 4L), Neck.Pain...Douleur.au.cou = c(5L, 4L, 7L,
7L, 3L, 4L), Head.Pain...Mal.de.tête = c(1L, 1L, 1L, 1L, 2L,
4L), Eye.Pain...Douleur.oculaire = c(1L, 1L, 1L, 1L, 3L, 4L)), row.names = c(17L,
18L, 21L, 22L, 57L, 58L), class = "data.frame")
You can call abline() immediately after the histogram call to add a vertical line at a given position on the x axis. In this case, I am drawing a line at the mean of each dataset. Then, to show the value, you can add it directly as a text label or put it into a legend. I am adding some padding to ylim so the legend or label doesn't overlap with the title. Finally, to arrange the plots in a grid like the one you want, you can prepare the panel with par():
n <- 10000
example_a <- rgamma(n, 5, 2)
example_b <- rnorm(n, 5, 2)
example_c <- rbeta(n, 5, 2)
max_a <- max(hist(example_a, plot = F)$counts)
max_b <- max(hist(example_b, plot = F)$counts)
max_c <- max(hist(example_c, plot = F)$counts)
mean_a <- mean(example_a)
mean_b <- mean(example_b)
mean_c <- mean(example_c)
par(mfrow = c(2,2)) # creates a 2x2 layout
hist(example_a, main = "PRE", xlab = "Productivity",
col = "slategray1", border = "gray",
ylim = c(0, max_a + 200))
abline(v = mean_a, col = "darkred", lwd = 3, lty = 2)
legend("topright", legend = c("Mean", round(mean_a, 3)),
lwd = c(3, NA), lty = c(2, NA), col = c("darkred", NA))
hist(example_b, main = "DBN", xlab = "Productivity",
col = "slategray1", border = "gray",
ylim = c(0, max_b + 200))
abline(v = mean_b, col = "forestgreen", lwd = 3, lty = 2)
text(x = mean_b - 2, y = 1990, paste("Mean = ", round(mean_b, 3)))
hist(example_c, main = "DBG", xlab = "Productivity",
col = "slategray1", border = "gray",
ylim = c(0, max_c + 200))
abline(v = mean_c, col = "purple4", lwd = 3, lty = 2)
legend("topleft", legend = c("Mean", round(mean_c, 3)),
lwd = c(3, NA), lty = c(2, NA), col = c("purple4", NA))
And that gives us the following plots:
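The answer above marks only the mean; the median and standard deviation asked about in the question can be added in the same way. The following is a rough sketch on simulated data: `productivity` is a hypothetical stand-in for a column such as `PRE$Productivity...Productivité`, and the colours and line types are arbitrary choices.

```r
set.seed(1)  # only to make this sketch reproducible
productivity <- rnorm(200, mean = 6, sd = 1.5)  # hypothetical stand-in data

m   <- mean(productivity)
med <- median(productivity)
s   <- sd(productivity)

hist(productivity, main = "PRE", xlab = "Productivity",
     col = "slategray1", border = "gray")
abline(v = m, col = "darkred", lwd = 2, lty = 2)                # mean
abline(v = med, col = "forestgreen", lwd = 2, lty = 3)          # median
abline(v = c(m - s, m + s), col = "purple4", lwd = 1, lty = 4)  # mean +/- 1 SD
legend("topright", bty = "n", cex = 0.8,
       legend = c(paste("Mean =", round(m, 2)),
                  paste("Median =", round(med, 2)),
                  paste("SD =", round(s, 2))),
       col = c("darkred", "forestgreen", "purple4"),
       lty = c(2, 3, 4), lwd = c(2, 2, 1))
```

Drawing the standard deviation as two lines at the mean plus and minus one SD is one possible convention; reporting it only in the legend would work as well.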

R circos plot calculating percent of total for each category

My data has two variables, TRV and TRJ, and I am seeing how often they match with each other. I would like to place on a circos figure what percent each variable shows up in total on the outer layer. It should add up to 200%, 100% for TRV, 100% for TRJ.
library(circlize)
library(plyr)
df <- structure(list(TRV = structure(c(1L, 1L, 1L, 2L, 3L, 3L, 3L,
3L, 4L, 4L, 4L, 5L, 5L, 5L), .Label = c("TRAV29/DV5", "TRAV36/DV7",
"TRDV1", "TRDV2", "TRDV3", "TRGV8", "TRGV9"), class = "factor"),
TRJ = structure(c(64L, 65L, 67L, 64L, 64L, 65L, 66L, 67L,
64L, 65L, 66L, 64L, 65L, 66L), .Label = c("", "mTRAJ22",
"mTRAJ30", "mTRAJ34", "mTRAJ37", "mTRAJ45", "mTRAJ49", "mTRBJ1-1",
"mTRBJ2-5", "mTRDJ1", "mTRDJ2", "mTRGJ1", "mTRGJ4", "TRAJ10",
"TRAJ15", "TRAJ16", "TRAJ19", "TRAJ2", "TRAJ20", "TRAJ21",
"TRAJ22", "TRAJ23", "TRAJ24", "TRAJ26", "TRAJ27", "TRAJ30",
"TRAJ32", "TRAJ34", "TRAJ36", "TRAJ37", "TRAJ38", "TRAJ39",
"TRAJ40", "TRAJ41", "TRAJ42", "TRAJ43", "TRAJ44", "TRAJ45",
"TRAJ49", "TRAJ5", "TRAJ50", "TRAJ52", "TRAJ53", "TRAJ54",
"TRAJ56", "TRAJ57", "TRAJ58", "TRAJ6", "TRAJ7", "TRAJ8",
"TRBJ1-1", "TRBJ1-2", "TRBJ1-3", "TRBJ1-4", "TRBJ1-5", "TRBJ1-6",
"TRBJ2-1", "TRBJ2-2", "TRBJ2-3", "TRBJ2-4", "TRBJ2-5", "TRBJ2-6",
"TRBJ2-7", "TRDJ1", "TRDJ2", "TRDJ3", "TRDJ4", "TRGJ1", "TRGJ2",
"TRGJP", "TRGJP1", "TRGJP2"), class = "factor"), freq = c(387L,
3L, 1L, 1L, 3533L, 445L, 132L, 55L, 563L, 15L, 5L, 830L,
4L, 72L)), row.names = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,
10L, 11L, 12L, 13L, 14L, 15L), class = "data.frame")
grid.col = c(`TRAV29/DV5` = "red", TRDV3 = "green", TRDV2 = "blue", TRDV1 = "purple", `TRAV36/DV7` = "pink",
TRDJ4 = "orange", TRDJ1 = "palegreen", TRDJ2 = "lightsteelblue", TRDJ3 = "thistle", TRGJP = "yellow", TRGJ2 = "grey", TRGJP2 = "brown", TRGJ1 = "lightpink")
circ_plot <- function(df){
circos.par(canvas.xlim=c(-1.5,1.5),canvas.ylim=c(-1.5,1.5))
chordDiagram(df, annotationTrack = "grid",
grid.col = grid.col,
link.lwd = matrix(1, nrow = nrow(df), ncol = ncol(df)),
link.border = "black")
for(si in get.all.sector.index()) {
xlim = get.cell.meta.data("xlim", sector.index = si, track.index = 1)
ylim = get.cell.meta.data("ylim", sector.index = si, track.index = 1)
circos.text(mean(xlim),ylim[1], si, sector.index = si, track.index = 1,
facing = "clockwise",
cex=0.8,
adj=c(-1,0),
niceFacing = TRUE)
circos.axis(h = 0,
major.at = c(0,0.5,1,1.5,2,2.5,3,3.5,4,4.5,5) ,
labels.cex = 0.2,labels.facing = "inside",
sector.index = si, track.index = 1)
}
# Restart circular layout parameters
circos.clear()
}
circ_plot(df)
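The percentages described in the question can be computed before plotting. Below is a minimal base R sketch, assuming a small hypothetical table `pairs` with the same three columns as `df`; it does not touch the circlize calls.

```r
# Small hypothetical subset of the TRV/TRJ/freq table from the question
pairs <- data.frame(
  TRV  = c("TRDV1", "TRDV1", "TRDV2", "TRDV3"),
  TRJ  = c("TRDJ1", "TRDJ2", "TRDJ1", "TRDJ1"),
  freq = c(60, 20, 15, 5)
)

# Share of the total frequency for each TRV and each TRJ category
trv_pct <- 100 * tapply(pairs$freq, pairs$TRV, sum) / sum(pairs$freq)
trj_pct <- 100 * tapply(pairs$freq, pairs$TRJ, sum) / sum(pairs$freq)

# One named vector covering every sector; the TRV and TRJ parts sum to 100 each
sector_pct <- c(trv_pct, trj_pct)
round(sector_pct, 1)
```

Inside the labelling loop, something like `paste0(si, " (", round(sector_pct[si], 1), "%)")` could then replace `si` as the label text, provided the sector names match the names of `sector_pct`. If the columns are factors with unused levels, as in the question's data, calling `droplevels()` first avoids NA entries.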

Add Count Labels on Top of Barchart in base R

I have a barplot to which I'm looking to add count labels on top of each bars. Can someone tell me how to do that in base R and NOT using ggplot2?
Since I saw your last question I have a bit more detail than others.
Example data
structure(list(ID = c(140L, 620L, 868L, 1120L, 2313L), DemAffl = c(10L,
4L, 5L, 10L, 11L), DemAge = c(76L, 49L, 70L, 65L, 68L), DemCluster = c(16L,
35L, 27L, 51L, 4L), DemClusterGroup = c("C", "D", "D", "F", "A"
), DemGender = c("U", "U", "F", "M", "F"), DemReg = c("Midlands",
"Midlands", "Midlands", "Midlands", "Midlands"), DemTVReg = c("Wales & West",
"Wales & West", "Wales & West", "Midlands", "Midlands"), PromClass = c("Gold",
"Gold", "Silver", "Tin", "Tin"), PromSpend = c(16000, 6000, 0.02,
0.01, 0.01), PromTime = c(4L, 5L, 8L, 7L, 8L), TargetBuy = c(0L,
0L, 1L, 1L, 0L), TargetAmt = c(0L, 0L, 1L, 1L, 0L)), row.names = c(NA,
5L), class = "data.frame")
To make the plot
counts <- table(df$TargetBuy)
Here you will need to change the y-axis scale, because otherwise the label above the tallest bar won't show
b <- barplot(counts, main= "number of yes/no", xlab = "response", ylab = "number of occurrences", ylim=c(0,4))
To add the labels you need to add text() to the plot
text(x= b, y=counts,pos = 3, label = counts, cex = 0.8, col = "red")
So the full thing will look like this
counts <- table(df$TargetBuy)
b <- barplot(counts, main= "number of yes/no", xlab = "response", ylab = "number of occurrences", ylim=c(0,4))
text(x= b, y=counts,pos = 3, label = counts, cex = 0.8, col = "red")
This produces a plot that looks like this. Note that I extended the y axis to 4. If it were left at 3, the label "3" above the first bar would not show.
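Instead of hard-coding the upper limit, the headroom can be derived from the data. This is a small sketch, with `target_buy` as a hypothetical stand-in for `df$TargetBuy`; the 15% padding is an arbitrary assumption and may need adjusting for larger text.

```r
# Hypothetical stand-in for df$TargetBuy from the example data
target_buy <- c(0L, 0L, 1L, 1L, 0L)
counts <- table(target_buy)

# Leave roughly 15% headroom above the tallest bar for its label
y_top <- max(counts) * 1.15

b <- barplot(counts, main = "number of yes/no", xlab = "response",
             ylab = "number of occurrences", ylim = c(0, y_top))
text(x = b, y = as.vector(counts), pos = 3,
     labels = as.vector(counts), cex = 0.8, col = "red")
```

barplot() returns the bar midpoints, which is why `b` can be passed directly as the x positions of the labels.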

Plotly heatmap with different cell widths

I would like to plot an interactive heatmap, where the column widths are different.
Although I managed to get different cell widths, the widths do not correspond to the values and the ordering is not correct.
The order of the x-axis should remain the same as the segments column in the df data.frame.
If the heatmap doesn't work, I would also be fine with a stacked barchart.
df <- structure(list(
segments = c(101493L, 101493L, 101493L, 101492L, 101492L, 101492L, 101494L, 101494L, 101494L, 102018L, 102018L,
102018L, 102018L, 102018L, 102019L, 102019L, 102019L, 102019L, 102019L),
timestamp = structure(c(1579233600, 1579240800, 1579248000,
1579233600, 1579240800, 1579248000, 1579233600, 1579240800, 1579248000,
1579219200, 1579226400, 1579233600, 1579240800, 1579248000, 1579219200,
1579226400, 1579233600, 1579240800, 1579248000), class = c("POSIXct", "POSIXt"), tzone = "Europe/Berlin"),
value = c(91.772, 91.923, 96.968, 104.307, 101.435, 105.539, 104.879, 104.197, 103.038,
96.403, 90.926, 111.807, 115.931, 111.729, 100.129, 86.903, 108.22, 117.841, 112.293),
width = c(5L, 5L, 5L, 2L, 2L, 2L, 3L, 3L, 3L, 10L, 10L, 10L, 10L, 10L, 9L, 9L, 9L, 9L, 9L)),
row.names = c(1L, 2L, 3L, 11L, 12L, 13L, 21L, 22L, 23L, 31L, 32L, 33L, 34L, 35L,43L, 44L, 45L, 46L, 47L),
class = "data.frame")
library(plotly)
plot_ly(data = df) %>%
add_trace(type="heatmap",
x = ~as.character(width),
y = ~timestamp,
z = ~value,
xgap = 0.2, ygap = 0.2) %>%
plotly::layout(xaxis = list(rangemode = "nonnegative",
tickmode = "array",
tickvals=as.character(unique(df$width)),
ticktext=as.character(unique(df$segments)),
zeroline = FALSE))
By giving Plotly a matrix for the z-values it seems to work and the widths are respected.
df$newx <- rep(cumsum(df[!duplicated(df$segments),]$width), rle(df$segments)$lengths)
mappdf <- expand.grid(timestamp=unique(df$timestamp), newx=unique(df$newx))
mappdf <- merge(mappdf, df[,c("timestamp","value","newx")], all.x = T, all.y = F, sort = F)
mappdf <- mappdf[order(mappdf$newx, mappdf$timestamp),]
zvals <- matrix(data = mappdf$value,
nrow = length(unique(df$timestamp)),
ncol = length(unique(df$newx)))
plot_ly() %>%
add_heatmap(y = sort(unique(df$timestamp)),
x = c(0,unique(df$newx)),
z = zvals) %>%
plotly::layout(xaxis = list(
title = "",
tickvals=unique(df$newx),
ticktext=paste(unique(df$segments), "-", unique(df$width))
))

Multiple vertical shaded area

I am plotting the proportion of deep sleep (y axis) vs days (x axis). I would like to add vertical shaded areas for a better understanding (e.g. grey for weekends, orange for sick periods...).
I have tried using geom_ribbon (I created a variable that takes the value 30, which is the top of my y axis, if the day falls on a weekend, based on information given in another column), but instead of getting rectangles, I get trapezoids.
In another post, someone proposed using "geom_rect", or "annotate" if one knows the x and y coordinates, but I don't see how to adapt it to my case, where I want the colored area repeated for every weekend (it is not exactly every 7 days because some data are missing).
Do you have any ideas?
Many thanks in advance!
ggplot(Sleep.data, aes(x = DATEID)) +
geom_line(aes(y = P.DEEP, group = 1), col = "deepskyblue3") +
geom_point(aes(y = P.DEEP, group = 1, col = Sign.deep)) +
guides(col=FALSE) +
geom_ribbon(aes(ymin = min, ymax = max.WE), fill = '#6495ED80') +
facet_grid(MONTH~.) +
geom_hline(yintercept = 15, col = "forestgreen") +
geom_hline(yintercept = 20, col = "forestgreen", linetype = "dashed") +
geom_vline(xintercept = c(7,14,21,28), col = "grey") +
scale_x_continuous(breaks=seq(0,28,7)) +
scale_y_continuous(breaks=seq(0,30,5)) +
labs(x = "Days",y="Proportion of deep sleep stage", title = "Deep sleep")
Proportion of deep sleep vs time
Head(Sleep.data)
> dput(head(Sleep.data))
structure(list(DATE = structure(c(1L, 4L, 7L, 10L, 13L, 16L), .Label = c("01-Dec-17",
"01-Feb-18", "01-Jan-18", "02-Dec-17", "02-Feb-18", "02-Jan-18",
"03-Dec-17", "03-Feb-18", "03-Jan-18", "04-Dec-17", "04-Feb-18",
"04-Jan-18", "05-Dec-17", "05-Feb-18", "05-Jan-18", "06-Dec-17",
"06-Feb-18", "06-Jan-18", "07-Dec-17", "07-Feb-18", "07-Jan-18",
"08-Dec-17", "08-Jan-18", "09-Dec-17", "09-Feb-18", "09-Jan-18",
"10-Dec-17", "10-Jan-18", "11-Dec-17", "11-Feb-18", "11-Jan-18",
"12-Dec-17", "12-Jan-18", "13-Dec-17", "13-Feb-18", "13-Jan-18",
"14-Dec-17", "14-Feb-18", "14-Jan-18", "15-Dec-17", "15-Jan-18",
"16-Dec-17", "16-Jan-18", "17-Dec-17", "17-Jan-18", "18-Dec-17",
"18-Jan-18", "19-Dec-17", "19-Jan-18", "20-Dec-17", "21-Dec-17",
"21-Jan-18", "22-Dec-17", "22-Jan-18", "23-Dec-17", "23-Jan-18",
"24-Dec-17", "24-Jan-18", "25-Dec-17", "25-Jan-18", "26-Dec-17",
"26-Jan-18", "27-Dec-17", "27-Jan-18", "28-Dec-17", "28-Jan-18",
"29-Dec-17", "29-Jan-18", "30-Dec-17", "30-Jan-18", "31-Dec-17",
"31-Jan-18"), class = "factor"), DATEID = 1:6, MONTH = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = c("Decembre", "Janvier", "Février"
), class = "factor"), DURATION = c(8.08, 7.43, 6.85, 6.23, 7.27,
6.62), D.DEEP = c(1.67, 1.37, 1.62, 1.75, 1.95, 0.9), P.DEEP = c(17L,
17L, 21L, 24L, 25L, 12L), STIMS = c(0L, 0L, 0L, 0L, 390L, 147L
), D.REM = c(1.7, 0.95, 0.95, 1.43, 1.47, 0.72), P.REM = c(17L,
11L, 12L, 20L, 19L, 9L), D.LIGHT = c(4.7, 5.12, 4.27, 3.05, 3.83,
4.98), P.LIGHT = c(49L, 63L, 55L, 43L, 49L, 66L), D.AWAKE = c(1.45,
0.58, 0.47, 0.87, 0.37, 0.85), P.AWAKE = c(15L, 7L, 6L, 12L,
4L, 11L), WAKE.UP = c(-2L, 0L, 2L, -1L, 3L, 1L), AGITATION = c(-1L,
-3L, -1L, -2L, 2L, -1L), FRAGMENTATION = c(1L, -2L, 2L, 1L, 0L,
-1L), PERIOD = structure(c(3L, 3L, 4L, 4L, 4L, 4L), .Label = c("HOLIDAYS",
"SICK", "WE", "WORK"), class = "factor"), SPORT = structure(c(2L,
1L, 2L, 2L, 2L, 1L), .Label = c("", "Day", "Evening"), class = "factor"),
ACTIVITY = structure(c(6L, 1L, 3L, 4L, 5L, 1L), .Label = c("",
"Bkool", "eBike", "Gym", "Natation", "Run"), class = "factor"),
TABLETS = c(0.5, 0.5, 0.5, 0.5, 0.5, 0.5), Ratio = c(1.15,
2.36, 3.45, 2.01, 5.27, 1.06), Sign = structure(c(2L, 2L,
2L, 2L, 2L, 2L), .Label = c("0", "1"), class = "factor"),
Sign.ratio = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("0",
"1"), class = "factor"), Sign.deep = structure(c(2L, 2L,
2L, 2L, 2L, 1L), .Label = c("0", "1"), class = "factor"),
Sign.awake = structure(c(1L, 2L, 2L, 1L, 2L, 1L), .Label = c("0",
"1"), class = "factor"), Sign.light = structure(c(2L, 1L,
1L, 2L, 2L, 1L), .Label = c("0", "1"), class = "factor"),
index = structure(c(1L, 1L, 1L, 1L, 2L, 1L), .Label = c("0",
"1"), class = "factor"), min = c(0, 0, 0, 0, 0, 0), max.WE = c(30,
30, 0, 0, 0, 0)), .Names = c("DATE", "DATEID", "MONTH", "DURATION",
"D.DEEP", "P.DEEP", "STIMS", "D.REM", "P.REM", "D.LIGHT", "P.LIGHT",
"D.AWAKE", "P.AWAKE", "WAKE.UP", "AGITATION", "FRAGMENTATION",
"PERIOD", "SPORT", "ACTIVITY", "TABLETS", "Ratio", "Sign", "Sign.ratio",
"Sign.deep", "Sign.awake", "Sign.light", "index", "min", "max.WE"
), row.names = c(NA, 6L), class = "data.frame")
Thanks for adding the data, that makes it easier to understand exactly what you're working with and to confirm that an answer actually addresses your question.
I thought it would be helpful to make a separate table with just the start and end of each contiguous set of rows with the same PERIOD. I did this using dplyr::case_when, assuming we should mark dates as a "start" if they are the first row in the table (row_number() == 1), or they have a different PERIOD value than the prior row. I mark dates as an "end" if they are the last row of the table, or have a different PERIOD than the next row. I only keep the starts and ends, and spread these into new columns called start and end.
library(tidyverse)
Period_ranges <- Sleep.data %>%
mutate(period_status = case_when(row_number() == 1 ~ "start",
PERIOD != lag(PERIOD) ~ "start",
row_number() == n() ~ "end",
PERIOD != lead(PERIOD) ~ "end",
TRUE ~ "other")) %>%
filter(period_status %in% c("start", "end")) %>%
select(DATEID, PERIOD, period_status) %>%
mutate(PERIOD_NUM = cumsum(PERIOD != lag(PERIOD) | row_number() == 1)) %>%
spread(period_status, DATEID)
# Output based on sample data only. If there's a problem with the full data, please add more. To share full data, use `dput(Sleep.data)` or to share 20 rows use `dput(head(Sleep.data, 20))`.
>Period_ranges
PERIOD PERIOD_NUM end start
1 WE 1 2 1
2 WORK 2 6 3
We can now use that in the plot. If you want to show only some PERIOD types, or adjust their appearance separately, you could replace the data in the code below with a filtered version such as Period_ranges %>% filter(PERIOD == "WE").
ggplot(Sleep.data, aes(x = DATEID)) +
# Here I specify that this geom should use its own data.
# I start the rectangles half a day before and end half a day after to fill the space.
geom_rect(data = Period_ranges, inherit.aes = F,
aes(xmin = start - 0.5, xmax = end + 0.5,
ymin = 0, ymax = 30,
fill = PERIOD), alpha = 0.5) +
# Here we can specify the shading color for each type of PERIOD
scale_fill_manual(values = c(
"WE" = '#6495ED80',
"WORK" = "gray60"
)) +
# rest of your code
Chart based on data sample:
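The start/end table can also be derived with base R's rle(), which finds runs of identical consecutive values. The sketch below is an alternative to the case_when approach above, not part of it; `period` and `dateid` are hypothetical stand-ins for the PERIOD and DATEID columns of the sample data.

```r
# Hypothetical stand-ins for Sleep.data$PERIOD and Sleep.data$DATEID
period <- c("WE", "WE", "WORK", "WORK", "WORK", "WORK")
dateid <- 1:6

runs <- rle(period)
last_row  <- cumsum(runs$lengths)         # row index where each run ends
first_row <- last_row - runs$lengths + 1  # row index where each run starts

period_ranges <- data.frame(
  PERIOD = runs$values,
  start  = dateid[first_row],
  end    = dateid[last_row]
)
period_ranges
```

Since rle() does not accept factors, a factor column would need `as.character()` first. A run of length one simply gets the same start and end value.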
