R hist vs geom_histogram break points

I am using both geom_histogram (ggplot2) and hist (base R) with the same breakpoints, but I get different graphs. I did a quick search; does anyone know how the breaks are defined in each function and why there would be a difference?
These produce two different plots.
library(ggplot2)
set.seed(25)
data <- data.frame(Mos = rnorm(500, mean = 25, sd = 8))
data$Mos <- round(data$Mos)
pAge <- ggplot(data, aes(x = Mos))
pAge + geom_histogram(breaks = seq(0, 50, by = 2))
hist(data$Mos, breaks = seq(0, 50, by = 2))
Thanks

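To get the same histogram in ggplot2, both functions need the same bin edges and must close their intervals on the same side. In the code below, scale_x_continuous(breaks = ...) only places the axis ticks on the bin edges so they are easy to compare; the bins themselves are defined inside geom_histogram.
The two functions use different defaults to build the intervals:
hist: right-closed (left-open) intervals. Default: right = TRUE
stat_bin (ggplot2): left-closed (right-open) intervals. Default: right = FALSE. (Later ggplot2 releases replace right with a closed argument whose default is "right", i.e. the same as hist.)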
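| Bin | hist interval | hist count | ggplot2 interval | ggplot2 count |
|---|---|---|---|---|
| 1 | (0,2] | 0 | [0,2) | 0 |
| 2 | (2,4] | 2 | [2,4) | 2 |
| 3 | (4,6] | 2 | [4,6) | 1 |
| 4 | (6,8] | 1 | [6,8) | 2 |
| 5 | (8,10] | 6 | [8,10) | 2 |
| 6 | (10,12] | 9 | [10,12) | 7 |
| 7 | (12,14] | 24 | [12,14) | 17 |
| 8 | (14,16] | 27 | [14,16) | 26 |
| 9 | (16,18] | 39 | [16,18) | 31 |
| 10 | (18,20] | 48 | [18,20) | 46 |
| 11 | (20,22] | 52 | [20,22) | 43 |
| 12 | (22,24] | 38 | [22,24) | 57 |
| 13 | (24,26] | 44 | [24,26) | 36 |
| 14 | (26,28] | 46 | [26,28) | 52 |
| 15 | (28,30] | 39 | [28,30) | 39 |
| 16 | (30,32] | 31 | [30,32) | 33 |
| 17 | (32,34] | 30 | [32,34) | 26 |
| 18 | (34,36] | 24 | [34,36) | 29 |
| 19 | (36,38] | 18 | [36,38) | 27 |
| 20 | (38,40] | 9 | [38,40) | 12 |
| 21 | (40,42] | 5 | [40,42) | 6 |
| 22 | (42,44] | 4 | [42,44) | 0 |
| 23 | (44,46] | 1 | [44,46) | 5 |
| 24 | (46,48] | 1 | [46,48) | 0 |
| 25 | (48,50] | 0 | [48,50) | 1 |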
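Because the rounded Mos values are integers, many of them fall exactly on a break, and that is precisely where the two closure rules disagree. A minimal base-R sketch with a small made-up vector (x and brks are illustrative, not from the question) shows the effect:

```r
# Small, made-up integer vector: several values sit exactly on a break (2 and 4)
x <- c(2, 3, 4, 4, 5)
brks <- seq(0, 6, by = 2)

# hist() default (right = TRUE, include.lowest = TRUE): bins [0,2], (2,4], (4,6]
right_closed <- hist(x, breaks = brks, plot = FALSE)$counts

# hist(right = FALSE): bins [0,2), [2,4), [4,6]
left_closed <- hist(x, breaks = brks, right = FALSE, plot = FALSE)$counts

right_closed  # 1 3 1
left_closed   # 0 2 3
```

The same five values give different counts in every bin, which is the pattern visible in the table above.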
I included the argument right = FALSE in hist so that its intervals are left-closed (right-open), as they are in ggplot2. I also added count labels to both plots, so it is easier to check that the intervals are the same.
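library(ggplot2)
ggplot(data, aes(x = Mos)) +
  # left-closed bins [0,2), [2,4), ... like hist(right = FALSE)
  geom_histogram(breaks = seq(0, 50, by = 2), closed = "left",
                 colour = "black", fill = "white") +
  scale_x_continuous(breaks = seq(0, 50, by = 2)) +
  # count label above each bar, computed on the same bins
  stat_bin(breaks = seq(0, 50, by = 2), closed = "left",
           aes(label = after_stat(count)), vjust = -0.5, geom = "text")
hist(data$Mos, breaks = seq(0, 50, by = 2), labels = TRUE, right = FALSE)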
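To compare the two sets of bins numerically rather than by eye, the counts ggplot2 computes can be pulled out with layer_data() and set against hist(plot = FALSE). This is a sketch assuming a ggplot2 version whose stat_bin() accepts breaks and closed (the newer releases, where closed replaces right); the toy data frame is made up:

```r
library(ggplot2)

# Toy integer data with values on the bin edges (made up for illustration)
toy <- data.frame(Mos = c(0, 1, 2, 2, 3, 4, 6))
brks <- seq(0, 6, by = 2)

# Left-closed bins in ggplot2: [0,2), [2,4), [4,6]
p <- ggplot(toy, aes(x = Mos)) +
  geom_histogram(breaks = brks, closed = "left")
gg_counts <- layer_data(p)$count

# The same left-closed bins in base R
base_counts <- hist(toy$Mos, breaks = brks, right = FALSE, plot = FALSE)$counts

gg_counts    # 2 3 2
base_counts  # 2 3 2
```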
To check the frequencies in each bin:
freq <- cut(data$Mos, breaks = seq(0, 50, by = 2), dig.lab = 4, right = FALSE)
as.data.frame(table(freq))
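One caveat with this check: with right = FALSE, cut() leaves the last interval open ([48,50)), whereas hist() closes it by default (include.lowest = TRUE), so a value exactly equal to the top break would be counted by hist() but turned into NA by cut(). Adding include.lowest = TRUE to cut() makes the two agree, as this base-R sketch with made-up values shows:

```r
# Made-up values; 6 sits exactly on the top break
x <- c(0, 1, 2, 2, 3, 6)
brks <- seq(0, 6, by = 2)

h_counts <- hist(x, breaks = brks, right = FALSE, plot = FALSE)$counts

# Without include.lowest the value 6 falls outside [4,6) and becomes NA (not tabulated)
cut_open <- table(cut(x, brks, right = FALSE))
# With include.lowest = TRUE the last bin is [4,6], matching hist()
cut_closed <- table(cut(x, brks, right = FALSE, include.lowest = TRUE))

h_counts               # 2 3 1
as.vector(cut_open)    # 2 3 0
as.vector(cut_closed)  # 2 3 1
```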

Related

Adding text in one of the four facets [duplicate]

This question already has an answer here:
Annotation on only the first facet of ggplot in R?
I want to add some text labels to only one of the four facets in my ggplot.
I am using annotate() to add text, but it draws the text at the given (x, y) location in every facet. Because each variable has a different y range (the facets use free y scales), the text does not land where I want it.
Please let me know what I should do. Thanks.
library(dplyr)
library(tidyr)
library(ggplot2)
df %>%
  select(Date, Ca, Na, K, Mg) %>%
  gather(var, value, -Date) %>%
  ggplot(aes(as.Date(Date), value)) +
  geom_point() +
  theme_bw() +
  facet_wrap(~var, scales = 'free_y', ncol = 1) +
  ylab(" (ppm) (ppm)
(ppm) (ppm)") +
  facet_wrap(~var, scales = 'free_y', ncol = 1, strip.position = "right") +
  geom_vline(aes(xintercept = as.Date("2021-04-28")), col = "red") +
  geom_vline(aes(xintercept = as.Date("2021-04-28")), col = "red") +
  geom_vline(aes(xintercept = as.Date("2021-04-29")), col = "red") +
  theme(axis.title = element_text(face = "bold")) +
  theme(axis.text = element_text(face = "bold")) +
  xlab('Date') +
  theme(axis.title.x = element_text(margin = margin(t = 10))) +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  annotate("text", label = "E1", x = as.Date("2021-04-28"), y = 2.8)
This is the code I am using. I want to label the vertical (xintercept) lines E1, E2 and E3 (from left to right) at the top of the plot, i.e. only in the first facet, the one for Ca. Any suggestions?
Here is a part of my data:
df <- read.table(text = "
Date Ca K Mg Na
2/18/2021 1 25 21 19
2/22/2021 2 26 22 20
2/26/2021 3 27 23 21
3/4/2021 4 28 5 22
3/6/2021 5 29 6 8
3/10/2021 6 30 7 9
3/13/2021 7 31 8 10
3/17/2021 8 32 9 11
3/20/2021 9 33 10 12
3/23/2021 10 34 11 13
3/27/2021 11 35 12 14
3/31/2021 12 36 13 15
4/3/2021 13 37 14 16
4/7/2021 14 38 15 17
4/10/2021 15 39 16 18
4/13/2021 16 40 17 19
4/16/2021 17 41 18 20
4/19/2021 8 42 19 21
4/22/2021 9 43 20 22
4/26/2021 0 44 21 23
4/28/2021 1 45 22 24
4/28/2021 2 46 23 25
4/28/2021 3 47 24 26
4/28/2021 5 48 25 27
4/29/2021 6 49 26 28
5/4/2021 7 50 27 29
5/7/2021 8 51 28 30
5/8/2021 9 1 29 31
5/10/2021 1 2 30 32
5/29/2021 3 17 43 45
5/31/2021 6 18 44 46
6/1/2021 4 19 45 47
6/2/2021 8 20 46 48
6/3/2021 2 21 47 49
6/7/2021 3 22 48 50
6/10/2021 5 23 49 51
6/14/2021 3 5 50 1
6/18/2021 1 6 51 2
", header = TRUE)
Prepare the data before plotting, and make a separate data frame for the text annotations:
dfplot <- df %>%
  select(Date, Ca, Na, K, Mg) %>%
  # convert to Date class before plotting
  mutate(Date = as.Date(Date, "%m/%d/%Y")) %>%
  # use pivot_longer instead of gather; gather is superseded
  # gather(var, value, -Date)
  pivot_longer(cols = 2:5, names_to = "grp", values_to = "ppm")
dftext <- data.frame(grp = "Ca", # we want the text to show up only in the "Ca" facet
                     ppm = max(dfplot[dfplot$grp == "Ca", "ppm"]),
                     Date = as.Date(c("2021-04-27", "2021-04-28", "2021-04-29")),
                     label = c("E1", "E2", "E3"))
After cleaning up your code, we can use geom_text with dftext:
ggplot(dfplot, aes(Date, ppm)) +
  geom_point() +
  facet_wrap(~grp, scales = 'free_y', ncol = 1, strip.position = "right") +
  geom_vline(xintercept = dftext$Date, col = "red") +
  geom_text(aes(x = Date, y = ppm, label = label), data = dftext, nudge_y = -2)
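This works because dftext contains the faceting column (grp), so geom_text() draws each row only in the panel whose grp matches; annotate() has no such column, so its text is repeated in every panel. A self-contained sketch of the same idea using the built-in mtcars data (the label and its position are made up) confirms that the text layer ends up in a single panel:

```r
library(ggplot2)

# One label row; its cyl value decides which facet it is drawn in
label_df <- data.frame(cyl = 4, wt = 4.5, mpg = 32, label = "E1")

p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  facet_wrap(~cyl, ncol = 1, scales = "free_y") +
  geom_text(data = label_df, aes(label = label))

# Built data for the text layer (layer 2): one row, in panel 1 (cyl == 4)
text_data <- layer_data(p, 2)
nrow(text_data)
text_data$PANEL
```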
To avoid overlapping labels, try the ggrepel package (library(ggrepel)) and replace geom_text with one of these:
#geom_text_repel(aes(x = Date, y = ppm, label = label), data = dftext)
#geom_label_repel(aes(x = Date, y = ppm, label = label), data = dftext)
After cleaning up the code and seeing the plot, I think this post is a duplicate of the question "Annotation on only the first facet of ggplot in R?".

Remove link between time series and add minor date ticks on the x-axis in ggplot

I was trying to plot a time series composed of weekly averages. Here is the plot that I obtained:
[Plot: weekly averages of A](https://i.stack.imgur.com/XMGMs.png)
As you can see, the time series does not cover every year completely, so where there is no data, ggplot connects the end of one year to the start of the next. I think I have to group the data somehow, but I do not understand how. Here is the code:
library(dplyr)
library(ggplot2)
library(scales)  # for date_format()
df4 <- data.frame(df$Date, df$A)
colnames(df4) <- c("date", "A")
df4$date <- as.Date(df4$date, "%Y/%m/%d")
# %w gives 0 = Sunday ... 6 = Saturday, so endofweek is the Saturday closing each week
df4$week_day <- as.numeric(format(df4$date, format = '%w'))
df4$endofweek <- df4$date + (6 - df4$week_day)
week_aveA <- df4 %>%
  group_by(endofweek) %>%
  summarise_all(list(mean = mean), na.rm = TRUE) %>%
  na.omit()
# group = 1 draws a single line through all weeks, bridging the gaps between years
g1 = ggplot() +
  geom_step(data = week_aveA, aes(group = 1, x = (endofweek), y = (A_mean)), colour = "gray25") +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 2500)) +
  scale_x_date(breaks = "year", labels = date_format("%Y")) +
  labs(y = expression(A~ ~index),
       x = NULL) +
  theme(axis.text.x = element_text(size = 10),
        axis.title = element_text(size = 10))
Here is an extract (the first few years) of the dataset:
endofweek date_mean A_mean week_day_mean
1 20/03/2010 17/03/2010 939,2533437 3
2 27/03/2010 24/03/2010 867,3620121 3
3 03/04/2010 31/03/2010 1426,791222 3
4 10/04/2010 07/04/2010 358,5698314 3
5 17/04/2010 13/04/2010 301,1815352 2
6 24/04/2010 21/04/2010 273,4922895 3,333333333
7 01/05/2010 28/04/2010 128,5989633 3
8 08/05/2010 05/05/2010 447,8858881 3
9 15/05/2010 12/05/2010 387,9828891 3
10 22/05/2010 19/05/2010 138,0770986 3
11 29/05/2010 26/05/2010 370,2147933 3
12 05/06/2010 02/06/2010 139,0451791 3
13 12/06/2010 09/06/2010 217,1286356 3
14 19/06/2010 16/06/2010 72,36972411 3
15 26/06/2010 23/06/2010 282,2911902 3
16 03/07/2010 30/06/2010 324,3215936 3
17 10/07/2010 07/07/2010 210,568691 3
18 17/07/2010 14/07/2010 91,76930829 3
19 24/07/2010 21/07/2010 36,4211218 3,666666667
20 31/07/2010 28/07/2010 37,53981103 3
21 07/08/2010 04/08/2010 91,33282642 3
22 14/08/2010 11/08/2010 28,38587352 3
23 21/08/2010 18/08/2010 58,72836406 3
24 28/08/2010 24/08/2010 102,1050612 2,5
25 04/09/2010 02/09/2010 13,45357513 4,5
26 11/09/2010 08/09/2010 51,24017212 3
27 18/09/2010 15/09/2010 159,7395663 3
28 25/09/2010 21/09/2010 62,71136678 2
29 02/04/2011 31/03/2011 1484,661164 4
30 09/04/2011 06/04/2011 656,1827964 3
31 16/04/2011 13/04/2011 315,3097313 3
32 23/04/2011 20/04/2011 293,2904042 3
33 30/04/2011 26/04/2011 255,7517519 2,4
34 07/05/2011 04/05/2011 360,7035289 3
35 14/05/2011 11/05/2011 342,0902797 3
36 21/05/2011 18/05/2011 386,1380421 3
37 28/05/2011 24/05/2011 418,9624807 2,833333333
38 04/06/2011 01/06/2011 112,7568 3
39 11/06/2011 08/06/2011 85,17855619 3,2
40 18/06/2011 15/06/2011 351,8714638 3
41 25/06/2011 22/06/2011 139,7936898 3
42 02/07/2011 29/06/2011 68,57716191 3,6
43 09/07/2011 06/07/2011 62,31823822 3
44 16/07/2011 13/07/2011 80,7328917 3
45 23/07/2011 20/07/2011 114,9475331 3
46 30/07/2011 27/07/2011 90,13118758 3
47 06/08/2011 03/08/2011 43,29372258 3
48 13/08/2011 10/08/2011 49,39935204 3
49 20/08/2011 16/08/2011 133,746822 2
50 03/09/2011 31/08/2011 76,03928942 3
51 10/09/2011 05/09/2011 27,99834637 1
52 24/03/2012 23/03/2012 366,2625797 5,5
53 31/03/2012 28/03/2012 878,8535513 3
54 07/04/2012 04/04/2012 1029,909052 3
55 14/04/2012 11/04/2012 892,9163416 3
56 21/04/2012 18/04/2012 534,8278693 3
57 28/04/2012 25/04/2012 255,1177585 3
58 05/05/2012 02/05/2012 564,5280546 3
59 12/05/2012 09/05/2012 767,5018168 3
60 19/05/2012 16/05/2012 516,2680148 3
61 26/05/2012 23/05/2012 241,2113073 3
62 02/06/2012 30/05/2012 863,6123397 3
63 09/06/2012 06/06/2012 201,2019288 3
64 16/06/2012 13/06/2012 222,9955486 3
65 23/06/2012 20/06/2012 91,14166632 3
66 30/06/2012 27/06/2012 26,93145693 3
67 07/07/2012 04/07/2012 67,32183278 3
68 14/07/2012 11/07/2012 46,25297513 3
69 21/07/2012 18/07/2012 81,34359825 3,666666667
70 28/07/2012 25/07/2012 49,59130851 3
71 04/08/2012 01/08/2012 44,13438077 3
72 11/08/2012 08/08/2012 30,15773151 3
73 18/08/2012 15/08/2012 57,47256772 3
74 25/08/2012 22/08/2012 31,9109555 3
75 01/09/2012 29/08/2012 52,71058484 3
76 08/09/2012 04/09/2012 24,52495229 2
77 06/04/2013 01/04/2013 1344,388042 1,5
78 13/04/2013 10/04/2013 1304,838687 3
79 20/04/2013 17/04/2013 892,620141 3
80 27/04/2013 24/04/2013 400,1720434 3
81 04/05/2013 01/05/2013 424,8473083 3
82 11/05/2013 08/05/2013 269,2380208 3
83 18/05/2013 15/05/2013 238,9993749 3
84 25/05/2013 22/05/2013 128,4096151 3
85 01/06/2013 29/05/2013 158,5576121 3
86 08/06/2013 05/06/2013 175,2036942 3
87 15/06/2013 12/06/2013 79,20250839 3
88 22/06/2013 19/06/2013 126,9065428 3
89 29/06/2013 26/06/2013 133,7480108 3
90 06/07/2013 03/07/2013 218,0092943 3
91 13/07/2013 10/07/2013 54,08460936 3
92 20/07/2013 17/07/2013 91,54285041 3
93 27/07/2013 24/07/2013 44,64567928 3
94 03/08/2013 31/07/2013 229,5067999 3
95 10/08/2013 07/08/2013 49,70729373 3
96 17/08/2013 14/08/2013 53,38618335 3
97 24/08/2013 21/08/2013 217,2800997 3
98 31/08/2013 28/08/2013 49,43590136 3
99 07/09/2013 04/09/2013 64,88783029 3
100 14/09/2013 11/09/2013 11,04300773 3
So in the end I have one main question: how can I remove the connection between the years? And an aesthetic question: how can I add minor ticks to the x-axis, at least one every 6 months, to make the plot easier to read?
Thanks in advance for any suggestion!
Edit
This is the code I tried with the suggestion; maybe I mistyped part of it.
library(tidyverse)
library(dplyr)
library(lubridate)
df4 <- data.frame(df$Date, df$A)
colnames(df4)<- c("date","A")
df4$date <- as.Date(df4$date,"%Y/%m/%d")
df4$week_day <- as.numeric(format(df4$date, format='%w'))
df4$endofweek <- df4$date + (6 - df4$week_day)
week_aveA <- df4 %>%
group_by(endofweek) %>%
summarise_all(list(mean=mean), na.rm=TRUE) %>%
na.omit()
week_aveA$endofweek <- as.Date(week_aveA$endofweek,"%d/%m/%Y")
week_aveA$A_mean <- as.numeric(gsub(",", ".", week_aveA$A_mean))
week_aveA$week_day_mean <- as.numeric(gsub(",", ".", week_aveA$week_day_mean))
week_aveA$year <- format(week_aveA$endofweek, "%Y")
library(ggplot2)
library(methods)
library(scales)
mylabel <- function(x) {
ifelse(grepl("-07-01$", x), "", format(x, "%Y"))
}
ggplot() +
geom_step(data=week_aveA, aes(x = endofweek, y = A_mean, group = year), colour="gray25") +
scale_y_continuous(expand = c(0, 0), limits = c(0, 2500)) +
scale_x_date(breaks="6 month", labels = mylabel) +
labs(y = expression(A~ ~index),
x = NULL) +
theme(axis.text.x = element_text(size=10),
axis.title = element_text(size=10))
You have to group by year:
1. Add a variable with the year to your dataset.
2. Map the year variable to the group aesthetic.
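Grouping by year works here because each season lies within one calendar year. A different technique, not the one this answer uses, is to start a new group whenever consecutive weeks are more than some number of days apart; the 14-day threshold below is an arbitrary assumption. A base-R sketch using rows 26 to 30 of the extract (the end of 2010 and the start of 2011):

```r
# Week-ending dates from rows 26-30 of the extract
endofweek <- as.Date(c("2010-09-11", "2010-09-18", "2010-09-25",
                       "2011-04-02", "2011-04-09"))

# Gap, in days, between consecutive (sorted) weeks; 14 is an arbitrary threshold
gap_days <- as.numeric(diff(endofweek), units = "days")

# A new segment starts at the first row and after every gap longer than 14 days
segment <- cumsum(c(TRUE, gap_days > 14))
segment  # 1 1 1 2 2
```

With a segment column computed the same way on the date-sorted week_aveA, aes(group = segment) would take the place of aes(group = year) in geom_step().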
For the ticks, increase the number of breaks. If you want ticks without labels, you can use a custom labelling function to drop the unwanted labels. For example, my approach below sets the breaks to "6 month" but replaces the mid-year labels with an empty string:
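# The pasted extract has dd/mm/yyyy dates and comma decimal marks, so convert them first
week_aveA$endofweek <- as.Date(week_aveA$endofweek, "%d/%m/%Y")
week_aveA$A_mean <- as.numeric(gsub(",", ".", week_aveA$A_mean))
week_aveA$week_day_mean <- as.numeric(gsub(",", ".", week_aveA$week_day_mean))
# year is used as the group aesthetic, so each year gets its own line
week_aveA$year <- format(week_aveA$endofweek, "%Y")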
library(ggplot2)
mylabel <- function(x) {
ifelse(grepl("-07-01$", x), "", format(x, "%Y"))
}
ggplot() +
geom_step(data=week_aveA, aes(x = endofweek, y = A_mean, group = year), colour="gray25") +
scale_y_continuous(expand = c(0, 0), limits = c(0, 2500)) +
scale_x_date(breaks="6 month", labels = mylabel) +
labs(y = expression(A~ ~index),
x = NULL) +
theme(axis.text.x = element_text(size=10),
axis.title = element_text(size=10))
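To see what mylabel() does with half-yearly breaks, it can be called directly on a sequence of dates (the dates here are illustrative; in the plot, scale_x_date() chooses the actual breaks): the mid-year break keeps its tick but gets an empty label.

```r
mylabel <- function(x) {
  # Blank out labels for mid-year (1 July) breaks; label the rest with the year
  ifelse(grepl("-07-01$", x), "", format(x, "%Y"))
}

# Half-yearly dates like the "6 month" breaks
brks <- seq(as.Date("2010-01-01"), as.Date("2012-01-01"), by = "6 months")
mylabel(brks)
# "2010" ""     "2011" ""     "2012"
```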

How to show legend values as percent

For the following data:
RW GA Freq percFreq
0 0 9 0.13043478
0 3 1 0.01449275
0 14 1 0.01449275
0 16 1 0.01449275
0 23 1 0.01449275
0 25 1 0.01449275
0 29 2 0.02898551
0 30 1 0.01449275
2 30 1 0.01449275
15 30 2 0.02898551
19 30 1 0.01449275
22 30 1 0.01449275
24 30 1 0.01449275
29 30 1 0.01449275
30 29 16 0.23188406
30 30 29 0.42028986
I would like to change the legend values in my plot so they are shown as percentages.
The script to generate the plot is:
ggplot(counts, aes(x=RW, y=GA, size=Freq, color=as.factor(percFreq))) + geom_point(alpha=0.7) +
scale_size(range = c(1, 10), name="Freq", limits=c(1,30), breaks=lbreaks) +
scale_color_discrete(name="Freq", breaks=lbreaks)
Basically, instead of showing 0.42028986 in the legend, I want it to be shown as 42%.
How can I do that?
Use percent() from the scales package.
Load the scales library:
library(scales)
Then add labels built with percent() to your discrete scale:
ggplot(counts, aes(x=RW, y=GA, size=Freq, color=as.factor(percFreq))) +
geom_point(alpha=0.7) +
scale_size(range = c(1, 10), name="Freq", limits=c(1,30), breaks=lbreaks) +
scale_color_discrete(name="Freq", breaks=lbreaks, labels = percent(lbreaks, accuracy = .01))
If you want to change how the number is rounded, use the accuracy argument, e.g.:
scales::percent(percFreq, accuracy = 1)
(accuracy = 1 gives "42%", accuracy = 0.1 gives "42.0%", and the accuracy = .01 used above gives "42.03%")
Hope this helps.
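If you would rather not depend on scales, a small base-R labelling function does the same job. With color = as.factor(percFreq), the breaks a discrete scale passes to a labels function are the factor levels, i.e. character strings, so they must be converted to numbers first. pct_label is a hypothetical helper name:

```r
# Hypothetical helper: format proportions such as "0.42028986" as percentages
pct_label <- function(breaks, digits = 0) {
  # Discrete-scale breaks arrive as character (factor levels), so convert first
  sprintf(paste0("%.", digits, "f%%"), 100 * as.numeric(breaks))
}

pct_label(c("0.42028986", "0.13043478"))              # "42%" "13%"
pct_label(c("0.42028986", "0.13043478"), digits = 1)  # "42.0%" "13.0%"
```

It would be passed as a function, e.g. scale_color_discrete(name = "Freq", labels = pct_label).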
You can either transform percFreq into percentages:
df$percFreq <- df$percFreq*100
or map color = as.factor(percFreq*100) inside aes().
Reproducible example:
library(ggplot2)
df <- data.frame(RW = round(runif(16, 0, 30)),
                 GA = round(runif(16, 0, 30)),
                 Freq = round(runif(16, 1, 30)),
                 percFreq = runif(16, 0.1, 0.9))
# convert proportions to percentages, rounded to 2 decimals
df$percFreq <- round(df$percFreq * 100, digits = 2)
ggplot(df, aes(x = RW, y = GA, size = Freq, color = as.factor(percFreq))) +
  geom_point(alpha = 0.7) +
  scale_size(range = c(1, 10), name = "Freq", limits = c(1, 30)) +
  scale_color_discrete(name = "%")
I would advise against it, but if you want the % sign next to the numbers, simply use paste(df$percFreq, "%", sep = " ").

ggplot2 merge color and fill legends

I want to merge two legends in ggplot2. I use the following code:
ggplot(dat_ribbon, aes(x = x)) +
geom_ribbon(aes(ymin = ymin, ymax = ymax,
group = group, fill = "test4 test5"), alpha = 0.2) +
geom_line(aes(y = y, color = "Test2"), data = dat_m) +
scale_colour_manual(values=c("Test2" = "white", "test"="black", "Test3"="red")) +
scale_fill_manual(values = c("test4 test5"= "dodgerblue4")) +
theme(legend.title=element_blank(),
legend.position = c(0.8, 0.85),
legend.background = element_rect(fill="transparent"),
legend.key = element_rect(colour = 'purple', size = 0.5))
The output has two problems:
1. When I use two or more words in the fill legend label, the alignment is off.
2. I want to merge the two legends into one, so that the fill key is just one entry in a single block of four.
Does anyone know how I can achieve this?
Edit: reproducible data:
dat_m <- read.table(text="x quantile y group
1 1 50 0.4967335 0
2 2 50 0.4978249 0
3 3 50 0.5113562 0
4 4 50 0.4977866 0
5 5 50 0.5013287 0
6 6 50 0.4997994 0
7 7 50 0.4961121 0
8 8 50 0.4991302 0
9 9 50 0.4976087 0
10 10 50 0.5011666 0")
dat_ribbon <- read.table(text="
x ymin group ymax
1 1 0.09779713 40 0.8992385
2 2 0.09979283 40 0.8996875
3 3 0.10309222 40 0.9004759
4 4 0.10058433 40 0.8985366
5 5 0.10259125 40 0.9043807
6 6 0.09643109 40 0.9031940
7 7 0.10199870 40 0.9022920
8 8 0.10018253 40 0.8965690
9 9 0.10292754 40 0.9010934
10 10 0.09399359 40 0.9053067
11 1 0.20164694 30 0.7974174
12 2 0.20082056 30 0.7980642
13 3 0.20837821 30 0.8056074
14 4 0.19903399 30 0.7973723
15 5 0.19903322 30 0.8050146
16 6 0.19965049 30 0.8051922
17 7 0.20592719 30 0.8042850
18 8 0.19810139 30 0.7956606
19 9 0.20537392 30 0.8007527
20 10 0.19325158 30 0.8023044
21 1 0.30016463 20 0.6953927
22 2 0.29803646 20 0.6976961
23 3 0.30803808 20 0.7048137
24 4 0.30045448 20 0.6991248
25 5 0.29562249 20 0.7031225
26 6 0.29647060 20 0.7043499
27 7 0.30159103 20 0.6991356
28 8 0.30369025 20 0.6949053
29 9 0.30196483 20 0.6998127
30 10 0.29578036 20 0.7015861
31 1 0.40045725 10 0.5981147
32 2 0.39796299 10 0.5974115
33 3 0.41056038 10 0.6057062
34 4 0.40046287 10 0.5943157
35 5 0.39708008 10 0.6014512
36 6 0.39594129 10 0.6011162
37 7 0.40052411 10 0.5996186
38 8 0.40128517 10 0.5959748
39 9 0.39917658 10 0.6004600
40 10 0.39791453 10 0.5999168")
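You are not using ggplot2 according to its philosophy (mapping variables in your data to aesthetics), which makes this difficult. One workaround is to add an invisible geom_blank() layer that supplies all four keys to both the color and the fill scale, and to give both scales the same name and the same four keys; ggplot2 merges legends that share the same title and labels into one: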
ggplot(dat_ribbon, aes(x = x)) +
geom_ribbon(aes(ymin = ymin, ymax = ymax, group = group, fill = "test4 test5"),
alpha = 0.2) +
geom_line(aes(y = y, color = "Test2"), data = dat_m) +
geom_blank(data = data.frame(x = rep(5, 4), y = 0.5,
group = c("test4 test5", "Test2", "test", "Test3")),
aes(y = y, color = group, fill = group)) +
scale_color_manual(name = "combined legend",
values=c("test4 test5"= NA, "Test2" = "white",
"test"="black", "Test3"="red")) +
scale_fill_manual(name = "combined legend",
values = c("test4 test5"= "dodgerblue4",
"Test2" = NA, "test"=NA, "Test3"=NA))

Circular time plots in R with stacked rose

I have a data frame imported from Excel (via a CSV file) with the following values:
> dt <- read.csv(file="teste1.csv",head=TRUE,sep=";")
> dt
hour occur time tt
1 1 one 00:00:59 59
2 2 one 08:40:02 31202
3 3 one 07:09:59 25799
4 4 one 01:22:16 4936
5 5 one 01:30:28 5428
6 6 one 01:28:57 5337
7 7 one 19:05:34 68734
8 8 one 01:57:47 7067
9 9 one 00:13:17 797
10 10 one 12:14:48 44088
11 11 one 23:24:43 84283
12 12 one 13:23:14 48194
13 13 one 02:28:51 8931
14 14 one 14:21:24 51684
15 15 one 13:26:14 48374
16 16 one 00:27:24 1644
17 17 one 15:56:51 57411
18 18 one 11:07:50 40070
19 19 one 07:18:18 26298
20 20 one 07:33:13 27193
21 21 one 10:02:03 36123
22 22 one 11:30:32 41432
23 23 one 21:21:27 76887
24 24 one 00:49:18 2958
25 1 two 21:01:11 75671
26 2 two 11:00:40 39640
27 3 two 21:40:09 78009
28 4 two 01:05:37 3937
29 5 two 00:44:17 2657
30 6 two 12:43:21 45801
31 7 two 10:53:49 39229
32 8 two 08:29:09 30549
33 9 two 05:07:46 18466
34 10 two 17:32:37 63157
35 11 two 09:35:16 34516
36 12 two 03:04:19 11059
37 13 two 23:09:13 83353
38 14 two 01:15:49 4549
39 15 two 14:24:33 51873
40 16 two 01:12:53 4373
41 17 two 21:20:11 76811
42 18 two 02:25:21 8721
43 19 two 01:17:37 4657
44 20 two 15:07:50 54470
45 21 two 22:27:32 80852
46 22 two 01:41:07 6067
47 23 two 09:40:23 34823
48 24 two 05:31:17 19877
I want to create a circular time plot with stacked roses (like a wind rose) from this data frame: the bars are stacked by the occur column, their size is given by the time column, and the hour column gives the position of each bar around the circle.
I tried the following, but the result is not what I want:
ggplot(dt, aes(x = hour, fill = occur)) +
  geom_histogram(breaks = seq(0, 24), width = 2, colour = "grey") +
  coord_polar(start = 0) +
  theme_minimal() +
  scale_fill_brewer() +
  scale_x_continuous("", limits = c(0, 24), breaks = seq(0, 24), labels = seq(0, 24))
What am I doing wrong? I want something like this: http://blog.odotech.com/Portals/57087/images/French%20landfill%20wind%20rose.png
I hope I've explained correctly. Thank you!
Not sure, but I hope this helps:
Convert your time values to numbers. I used the chron package, but there are many other ways, so you don't have to load it; it just makes the conversion more straightforward:
library(chron)
# dt is the data frame from the question; time holds "HH:MM:SS" strings
tm <- times(as.character(dt$time))
dt$tt <- hours(tm) * 3600 + minutes(tm) * 60 + seconds(tm)
And make a graph (breaks is not a geom_bar() argument, so it is left out):
p <- ggplot(dt, aes(x = hour, y = tt, fill = occur)) +
  geom_bar(stat = "identity", width = 2, colour = "grey") +
  theme_minimal() +
  scale_fill_brewer() +
  coord_polar(start = 0) +
  scale_x_continuous("", limits = c(0, 24), breaks = seq(0, 24), labels = seq(0, 24))
Is that ok?
Some bars show only one color. That is a scaling issue: some times are close to 24 hours while others are only a few seconds, so the small segments are barely visible.
You can also try separate panels using facets (you may want to adjust the colors afterwards):
p + facet_grid(~occur) +
  theme(axis.title.y = element_blank(),
        axis.text.y = element_blank())
The circular graph works well for comparing data across hours, but if you also want to compare the occur groups, I think an ordinary bar chart is the better choice.
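As mentioned in the answer, chron is optional. A base-R sketch that converts the HH:MM:SS strings of the time column to seconds (to_seconds is a hypothetical helper name); the three sample values reproduce the question's tt column for hours 1, 2 and 11:

```r
# Hypothetical helper: "HH:MM:SS" strings -> seconds since midnight
to_seconds <- function(hms) {
  parts <- strsplit(hms, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

to_seconds(c("00:00:59", "08:40:02", "23:24:43"))  # 59 31202 84283
```

On the question's data this would be dt$tt <- to_seconds(as.character(dt$time)); as.character() covers the case where read.csv stored time as a factor.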
