A circular histogram in R shows incorrect values - r

I'm trying to recreate a circular plot from here (the first plot on that page), but the output I get seems incorrect. The 'last' bar (between 23 and 0) is missing and the 'first' one (between 0 and 1) is disproportionately high. What's more, the bars appear shifted one unit to the left, while on that website the plot looks fine.
Here is the code, which I copied from that site. The only change I made was to remove "width=2" from geom_histogram(), because otherwise it raised an error saying that the width argument was deprecated.
library(lubridate)
library(ggplot2)
set.seed(44)
N=500
events <- as.POSIXct("2011-01-01", tz="GMT") +
days(floor(365*runif(N))) +
hours(floor(24*rnorm(N))) +
minutes(floor(60*runif(N))) +
seconds(floor(60*runif(N)))
hour_of_event <- hour(events)
eventdata <- data.frame(datetime = events, eventhour = hour_of_event)
# determine if event is in business hours
eventdata$Workday <- eventdata$eventhour %in% seq(9, 17)
ggplot(eventdata, aes(x = eventhour, fill = Workday)) +
geom_histogram(breaks = seq(0, 24), colour = "grey") +
coord_polar(start = 0) + theme_minimal() +
scale_fill_brewer() + ylab("Count") +
ggtitle("Events by Time of day") +
scale_x_continuous("", limits = c(0, 24), breaks = seq(0, 24), labels = seq(0, 24))
Here is what I got:
Here is a table of the data. You can see that hour 23 should have a value of 17 instead of 0 as in my plot.
table(eventdata$eventhour)
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
23 22 18 26 28 20 19 21 16 17 20 16 18 22 16 21 24 21 22 27 25 18 23 17
Do you have an idea why my plot doesn't show correct values and how I can fix this?

I propose this solution, based on this post:
library(lubridate)
library(ggplot2)
set.seed(44)
N=500
events <- as.POSIXct("2011-01-01", tz="GMT") +
days(floor(365*runif(N))) +
hours(floor(24*rnorm(N))) +
minutes(floor(60*runif(N))) +
seconds(floor(60*runif(N)))
hour_of_event <- hour(events)
eventdata <- data.frame(datetime = events, eventhour = hour_of_event)
# determine if event is in business hours
eventdata$Workday <- eventdata$eventhour %in% seq(9, 17)
df <- data.frame(table(eventdata$eventhour),
business_hour = 0:23 %in% seq(9, 17))
colnames(df)[1:2] <- c("hour", "value")
ggplot(df, aes(hour, value, fill = business_hour)) +
coord_polar(theta = "x", start = 0) +
geom_bar(stat = "identity", width = .9)
I hope it helps. It doesn't tell you why you have a problem in your case but it gives you a viable solution.

It seems the issue was caused by the arguments passed to geom_histogram() and scale_x_continuous().
Instead of this:
geom_histogram(breaks = seq(0, 24), colour = "grey") +
scale_x_continuous("", limits = c(0, 24), breaks = seq(0, 24), labels = seq(0, 24))
it should be:
geom_histogram(bins = 24, colour = "grey") +
scale_x_continuous(breaks = seq(-0.5, 23.5), labels = seq(0, 24))
It's still a bit confusing to me why it works only this way, but it finally works...
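A likely explanation, sketched under assumptions: histogram bins are right-closed by default, so with breaks = seq(0, 24) the integer hour 1 falls into the first bin together with hour 0, every later hour lands one bin to the left, and the bin (23, 24] stays empty. The base-R sketch below rebuilds the hours from the table in the question and uses hist() as a stand-in for the binning. That geom_histogram() bins the same way, and that closed = "left" is the matching fix, is an assumption worth checking against your ggplot2 version.

```r
# Hour counts copied from the table in the question (hours 0..23).
counts <- c(23, 22, 18, 26, 28, 20, 19, 21, 16, 17, 20, 16,
            18, 22, 16, 21, 24, 21, 22, 27, 25, 18, 23, 17)
eventhour <- rep(0:23, times = counts)

# Right-closed bins (a, b], lowest edge included: hours 0 and 1 share bin 1.
right_closed <- hist(eventhour, breaks = 0:24, right = TRUE, plot = FALSE)$counts

# Left-closed bins [a, b): each hour gets its own bin.
left_closed <- hist(eventhour, breaks = 0:24, right = FALSE, plot = FALSE)$counts

# Plot with left-closed bins; skipped if ggplot2 is not installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data.frame(eventhour = eventhour), aes(x = eventhour)) +
    geom_histogram(breaks = seq(0, 24), closed = "left", colour = "grey") +
    coord_polar(start = 0)
}
```

With right-closed bins the first count is 23 + 22 and the last is 0, which matches the symptoms in the question. With left-closed bins the counts equal the table.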

Related

Change legend labels and position dodge

I created an interaction plot with ggplot and added outliers from a different data frame to the same plot. I want to change the legend's labels (yes and no), but a new legend is added instead of the existing labels being changed. Here is the code:
the theme I'm using:
theme_apa(
legend.pos = "right",
legend.use.title = FALSE,
legend.font.size = 12,
x.font.size = 12,
y.font.size = 12,
facet.title.size = 12,
remove.y.gridlines = TRUE,
remove.x.gridlines = TRUE
)
the plot:
InteractionWithOutliers <- ggplot() +
geom_line(data=data2, aes(x=Messzeitpunkt,
y = Sum_PCLMean,group = TB2,linetype=TB2),) +
scale_color_manual(labels = c("test", "test"),values=c('#000000','#000000'))+
geom_point(data = outliersDF, aes(Messzeitpunkt,Sum_PCL,
shape=TB2, color=TB2, size=TB2),) +
geom_point(data = data2, aes(Messzeitpunkt,Sum_PCLMean,
shape=TB2, color=TB2, size=TB2), ) +
scale_shape_manual(values=c(15, 17))+
scale_size_manual(values=c(2,2)) +
ylim(0, 60) +
scale_x_continuous(breaks = seq(0,2)) +
geom_errorbar(data=data2,aes(x = Messzeitpunkt,ymin=Sum_PCLMean-Sum_PCLSD, ymax=Sum_PCLMean+Sum_PCLSD), width=.2,)
InteractionWithOutliers + theme_apa() +
labs(x ="Measurement Period", y = "PTSS mean scores")
Image of the Graph:
Furthermore, when I try to use position_dodge to separate the interaction plot from the outliers, not everything moves the same way.
Code:
InteractionWithOutliers <- ggplot() +
geom_line(data=data2, aes(x=Messzeitpunkt,
y = Sum_PCLMean,group = TB2,linetype=TB2),position = position_dodge(width = 0.4)) +
scale_color_manual(labels = c("test", "test"),values=c('#000000','#000000'))+
geom_point(data = outliersDF, aes(Messzeitpunkt,Sum_PCL,
shape=TB2, color=TB2, size=TB2),position = position_dodge(width = 0.4)) +
geom_point(data = data2, aes(Messzeitpunkt,Sum_PCLMean,
shape=TB2, color=TB2, size=TB2),position = position_dodge(width = 0.4) ) +
scale_shape_manual(values=c(15, 17))+
scale_size_manual(values=c(2,2)) +
ylim(0, 60) +
scale_x_continuous(breaks = seq(0,2)) +
geom_errorbar(data=data2,aes(x = Messzeitpunkt,ymin=Sum_PCLMean-Sum_PCLSD, ymax=Sum_PCLMean+Sum_PCLSD),
width=.2,position = position_dodge(width = 0.4))
InteractionWithOutliers + theme_apa() +
labs(x ="Measurement Period", y = "PTSS mean scores")
Thank you for your help!
Edit: Data for the Outliers:
Messzeitpunkt Sum_PCL TB2
0 38 no
0 37 yes
0 40 yes
0 41 yes
0 38 yes
1 56 no
1 33 no
2 39 no
2 33 no
Data for the interaction plots:
Messzeitpunkt Sum_PCLMean TB2 Sum_PCLSD
0 9 no 11
0 12 yes 11
1 9 no 15
1 18 yes 16
2 8 no 12
2 14 yes 12
Merging legends can be painful. If your variable already carries the labels you want (as in your example), you don't need to specify breaks or labels at all (see the first example).
However, a good rule is: don't map an aesthetic unless you really need it. Size and color are constant in your case, so you could (and should) set them as constants outside aes() (see the second example).
P.S. I have slightly changed the plot to make the essentials more visible. I personally prefer to keep my plots in the order geoms -> scales -> coordinates -> labels -> theme, which helps me keep an overview of the layers.
library(ggplot2)
## outliers, as given in the question
outliersDF <- read.table(text = "Messzeitpunkt Sum_PCL TB2
0 38 no
0 37 yes
0 40 yes
0 41 yes
0 38 yes
1 56 no
1 33 no
2 39 no
2 33 no", header = TRUE)
## means and SDs for the interaction plot, as given in the question
data2 <- read.table(text = "Messzeitpunkt Sum_PCLMean TB2 Sum_PCLSD
0 9 no 11
0 12 yes 11
1 9 no 15
1 18 yes 16
2 8 no 12
2 14 yes 12", header = TRUE)
ggplot() +
  geom_line(data = data2, aes(
    x = Messzeitpunkt,
    y = Sum_PCLMean, group = TB2, linetype = TB2
  )) +
  geom_point(data = outliersDF, aes(Messzeitpunkt, Sum_PCL,
    shape = TB2, color = TB2, size = TB2
  )) +
  geom_point(data = data2, aes(Messzeitpunkt, Sum_PCLMean,
    shape = TB2, color = TB2, size = TB2
  )) +
  ## if your variable is labelled, no need to specify breaks or labels
  scale_color_manual(values = c("#000000", "#000000")) +
  scale_shape_manual(values = c(15, 17)) +
  scale_size_manual(values = c(2, 2))
## Better, if you have constant aesthetics, not to use aes(), but
## add the values as constants instead
ggplot() +
  geom_line(data = data2, aes(
    x = Messzeitpunkt,
    y = Sum_PCLMean, group = TB2, linetype = TB2
  )) +
  geom_point(data = outliersDF, aes(Messzeitpunkt, Sum_PCL,
    shape = TB2
  ), size = 2) +
  geom_point(data = data2, aes(Messzeitpunkt, Sum_PCLMean,
    shape = TB2
    ## black color is default, this is just for demonstration
  ), color = "black", size = 2) +
  scale_shape_manual(values = c(15, 17))
Created on 2022-07-15 by the reprex package (v2.0.1)

ggplot pretty scale function not displaying max value

My maximum pretty axis break value is not displayed in my plot. Specifically, the rightmost break value 25 is not displayed in ggplot axis.
breaks = pretty(1:23)
> breaks
#[1] 0 5 10 15 20 25
p <- p + scale_x_continuous(breaks = pretty(1:23))
Try setting the limits too:
p <- p + scale_x_continuous(breaks = pretty(1:23), limits = c(0, 25))
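If the break vector may change, the limits can be derived from it rather than hard-coded. A small sketch: the plot object p from the question is not shown, so a hypothetical stand-in with made-up data is built here, and only when ggplot2 is installed.

```r
breaks <- pretty(1:23)  # 0 5 10 15 20 25
lims <- range(breaks)   # limits wide enough to contain the outermost breaks

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Hypothetical stand-in for the asker's plot `p` (made-up data).
  p <- ggplot(data.frame(x = 1:23, y = 1:23), aes(x, y)) + geom_point()
  p <- p + scale_x_continuous(breaks = breaks, limits = lims)
}
```

Breaks that fall outside the scale limits are not drawn, which is why widening the limits to cover range(breaks) brings the 25 back.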

How to define day from 6am to 6am on x axis in ggplot?

I am trying to do a bar chart of an aggregate, by the hour.
hourly <- data.frame(
hour = 0:23,
N = 7+0:23,
hour.mod = c(18:23, 0:17))
The day is from 6am to 6am, so I added an offset, hour.mod, and then:
ggplot(hourly, aes(x = hour.mod, y = N)) +
geom_col() +
labs(x = "6am to 6am", y = "Count")
Except, the x-axis scale at 0 contradicts the label. While tinkering with scales, scale_x_discrete(breaks = c(6, 10, 14, 18, 22)) made the axis labels disappear altogether, which works for now but is sub-optimal.
How do I make the x axis start at an hour other than 0 or 23? Is there a way to do so without creating an offset column? I am a novice, so please assume you are explaining to the village idiot.
You don't say what you want to see, but it's fairly clear that you should be using scale_x_continuous and shifting your labels somehow, either "by hand" or with some simple math:
ggplot(hourly, aes(x = hour.mod, y = N)) +
geom_col() +
labs(x = "6am to 6am", y = "Count") +
scale_x_continuous(breaks= c(0,4,8,12,16), labels = c(6, 10, 14, 18, 22) )
Or perhaps:
ggplot(hourly, aes(x = hour.mod, y = N)) +
geom_col() +
labs(x = "6am to 6am", y = "Count") +
scale_x_continuous(breaks= c(6, 10, 14, 18, 22)-6, # shifts all values lower
labels = c(6, 10, 14, 18, 22) )
It's possible you need to use modulo arithmetic, which in R involves the use of %% and %/%:
1:24 %% 12
[1] 1 2 3 4 5 6 7 8 9 10 11 0 1 2 3 4 5 6 7 8 9 10 11 0
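As a sketch of how the modulo idea could apply here: both the offset column and the axis labels can be computed from a single start hour instead of being typed by hand. The names start, axis_breaks and axis_labels are made up for this example.

```r
hourly <- data.frame(hour = 0:23, N = 7 + 0:23)
start <- 6L  # the "day" begins at 6am

# Position of each hour on a 6am-to-6am axis: 6 -> 0, 23 -> 17, 0 -> 18, 5 -> 23
hourly$hour.mod <- (hourly$hour - start) %% 24L

# Tick positions on the shifted axis and the clock hours they stand for
axis_breaks <- seq(0, 20, by = 4)
axis_labels <- (axis_breaks + start) %% 24  # 6 10 14 18 22 2

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(hourly, aes(x = hour.mod, y = N)) +
    geom_col() +
    scale_x_continuous(breaks = axis_breaks, labels = axis_labels) +
    labs(x = "6am to 6am", y = "Count")
}
```

To avoid the extra column altogether, the same expression can go straight into the mapping, for example aes(x = (hour - 6) %% 24, y = N).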

ggplot2 facets: Different annotation text for each plot

I have the following generated data frame called Raw_Data:
Time Velocity Type
1 10 1 a
2 20 2 a
3 30 3 a
4 40 4 a
5 50 5 a
6 10 2 b
7 20 4 b
8 30 6 b
9 40 8 b
10 50 9 b
11 10 3 c
12 20 6 c
13 30 9 c
14 40 11 c
15 50 13 c
I plotted this data with ggplot2:
ggplot(Raw_Data, aes(x=Time, y=Velocity))+geom_point() + facet_grid(Type ~.)
I have the objects: Regression_a, Regression_b, Regression_c. These are the linear regression equations for each plot. Each plot should display the corresponding equation.
Using annotate displays the same equation on every plot:
annotate("text", x = 1.78, y = 5, label = Regression_a, color="black", size = 5, parse=FALSE)
I tried to overcome the issue with the following code:
Regression_a_eq <- data.frame(x = 1.78, y = 1,label = Regression_a,
Type = "a")
p <- x + geom_text(data = Raw_Data,label = Regression_a)
This did not solve the problem. Each plot still showed Regression_a, rather than just plot a.
You can put the expressions as character values in a new data frame that has the same unique Type values as your data, and add them with geom_text:
regrDF <- data.frame(Type = c('a','b','c'), lbl = c('Regression_a', 'Regression_b', 'Regression_c'))
ggplot(Raw_Data, aes(x = Time, y = Velocity)) +
geom_point() +
geom_text(data = regrDF, aes(x = 10, y = 10, label = lbl), hjust = 0) +
facet_grid(Type ~.)
which gives:
You can replace the text values in regrDF$lbl with the appropriate expressions.
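If the regression equations are not yet available as strings, the label column can be built from per-facet fits. A minimal sketch, assuming one lm(Velocity ~ Time) fit per Type and a label format made up for illustration. Raw_Data is rebuilt from the table in the question.

```r
Raw_Data <- data.frame(
  Time = rep(c(10, 20, 30, 40, 50), times = 3),
  Velocity = c(1, 2, 3, 4, 5, 2, 4, 6, 8, 9, 3, 6, 9, 11, 13),
  Type = rep(c("a", "b", "c"), each = 5)
)

# One linear fit per facet; keep only intercept and slope.
coefs <- lapply(split(Raw_Data, Raw_Data$Type), function(d) {
  coef(lm(Velocity ~ Time, data = d))
})

# One label row per Type, so each facet receives its own text.
regrDF <- data.frame(
  Type = names(coefs),
  lbl = vapply(coefs, function(b) {
    sprintf("y = %.2f + %.2f x", b[["(Intercept)"]], b[["Time"]])
  }, character(1))
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(Raw_Data, aes(x = Time, y = Velocity)) +
    geom_point() +
    geom_text(data = regrDF, aes(x = 10, y = 10, label = lbl), hjust = 0) +
    facet_grid(Type ~ .)
}
```

The key point is that the label data frame has a Type column matching the faceting variable. That match is what routes each row to a single panel.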
Just a supplement to the accepted answer, for the case where there are facets in both the horizontal and vertical directions.
regrDF <- data.frame(Type1 = c('a','a','b','b'),
Type2 = c('c','d','c','d'),
lbl = c('Regression_ac', 'Regression_ad', 'Regression_bc', 'Regression_bd'))
ggplot(Raw_Data, aes(x = Time, y = Velocity)) +
geom_point() +
geom_text(data = regrDF, aes(x = 10, y = 10, label = lbl), hjust = 0) +
facet_grid(Type1 ~ Type2)
This approach works but is still imperfect, as I do not know how to combine math expressions and newlines (see "Adding a newline in a substitute() expression").

add geom_bar to circular plot ggplot2

I have a data frame that looks like so:
dat<-structure(list(x = 1:14, y = c(1.26071476002898, 1.97600316441492,
2.41629009067185, 3.48953782319898, 10, 8.49584395945854, 3.80688560348562,
3.07092373734549, 2.96665740569527, 2.73020216450355, 2.39926441554745,
2.4236111796397, 2.63338121290737, 2.13662243060685)), .Names = c("x",
"y"), row.names = c(NA, -14L), class = "data.frame")
x y
1 1.260715
2 1.976003
3 2.416290
4 3.489538
5 10.000000
6 8.495844
7 3.806886
8 3.070924
9 2.966657
10 2.730202
11 2.399264
12 2.423611
13 2.633381
14 2.136622
I'm trying to create a circular plot in ggplot2 where the circle is divided into the 14 data points I have and the length of each of the arcs corresponds to the value of y. Something like this:
My code produces a very weird output with the bars overlapping one another. I have searched everywhere to fix it but no success. Here is my code:
ggplot(dat, aes(x, y)) +
  geom_bar(breaks = seq(1,14), width = 2, colour = "grey", stat="identity") +
  coord_polar(start = 0) +
  scale_x_continuous("", limits = c(1, 14), breaks = seq(1, 14), labels = seq(1, 14))
Please help me… thanks in advance...
It's because you specified width=2. This produces the overlap.
Note also that you don't need any breaks.
Try this:
library(ggplot2)
ggplot(dat, aes(x, y)) +
geom_bar(stat="identity") +
coord_polar(start = 0) +
scale_x_continuous("",breaks = seq(1, 14))
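To see why width = 2 overlaps, it may help to compute the bar edges by hand. In this sketch the helper names bar_edges and overlaps are made up, and each bar is assumed to be centred on its x value and to span x ± width / 2.

```r
x <- 1:14  # bar centres, one unit apart

# Left and right edge of each bar for a given width
bar_edges <- function(x, width) {
  data.frame(xmin = x - width / 2, xmax = x + width / 2)
}

# TRUE if any bar's right edge passes the next bar's left edge
overlaps <- function(edges) {
  any(head(edges$xmax, -1) > tail(edges$xmin, -1))
}

wide <- overlaps(bar_edges(x, 2))      # bars span x - 1 to x + 1
narrow <- overlaps(bar_edges(x, 0.9))  # bars span x - 0.45 to x + 0.45
```

With centres one unit apart, any width above 1 makes neighbouring bars cover the same stretch of the axis. Dropping the width argument lets geom_bar fall back to its default, which leaves a small gap between bars.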
