x axis tick mark labels of different length starting at same position - r

I have problems with the axis labels and tick marks in ggplot2.
The x axis displays different length classes, the y axis the number of individuals. How can I make all the tick mark labels of the length classes on the x axis start at the same position (at the top)? At the moment the shorter labels, e.g. (51-60), are centered, whereas the longer ones (121-130) are written at a higher position. How can I arrange them so that they all start at the same height? Also, I do not know why my x and y axis titles are not displayed.
Thanks for the help!
ggplot(data=ALL, aes(x=Langenklasse_Zahl, y=Anz.10.ha)) +
geom_bar(stat="identity")+
scale_x_continuous(name="Längenklasse")+
scale_y_continuous(name="Anzahl Bachforellen")+
scale_y_continuous(limits=c(0, 84))+
scale_x_discrete(breaks=c("1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31","32"),
labels=c("31-40","41-50","51-60","61-70","71-80","81-90","91-100","101-110","111-120","121-130","131-140", "141-150", "151-160", "161-170", "171-180", "181-190", "191-200", "201-210", "211-220", "221-230", "231-240", "241-250", "251-260", "261-270", "271-280", "281-290", "291-300", "301-310", "311-320", "321-330","331-340", ">340"))+
theme(axis.title.y = element_text(vjust=1.3, size=15),
axis.text.y = element_text(vjust=0.5, size=15),
axis.title.x = element_text(vjust=-.5, size=15),
axis.text.x = element_text(angle=90,vjust=0.5, size=15))+
ggtitle("Längendiagramm der kanalisierten Strecke im Mai 2014") +
theme(plot.title = element_text(lineheight=3, size=20, face="bold"))
the data:
Langenklasse_Zahl Langenklasse Anz 10 ha
1 31-40 0
2 41-50 0
3 51-60 0
4 61-70 0
5 71-80 0
6 81-90 0
7 91-100 0
8 101-110 3
9 111-120 12
10 121-130 12
11 131-140 15
12 141-150 9
13 151-160 9
14 161-170 6
15 171-180 3
16 181-190 0
17 191-200 3
18 201-210 3
19 211-220 0
20 221-230 0
21 231-240 0
22 241-250 0
23 251-260 3
24 261-270 3
25 271-280 9
26 281-290 0
27 291-300 3
28 301-310 3
29 311-320 0
30 321-330 3
31 331-340 0
32 >340 6

To make all x axis labels start at the same position, add hjust=1 (or hjust=0) to element_text() for the theme() element axis.text.x:
+ theme(axis.text.x = element_text(angle=90,vjust=0.5, size=15,hjust=1))
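Your axis titles are not displayed because you have two scale calls for each axis (scale_x_continuous() plus scale_x_discrete(), and two scale_y_continuous() calls). A later scale call for the same axis replaces the earlier one, including its name. Move the axis titles into the same scale_...() call where you provide breaks, labels and limits, for example: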
+ scale_y_continuous(name="Anzahl Bachforellen",limits=c(0, 84))+
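Putting both fixes together, a minimal sketch could look like the following. The small data frame is a hypothetical stand-in that copies a few rows from the question, and geom_col() is used as a shorthand for geom_bar(stat="identity"); the rest follows the calls described above, with one scale call per axis.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's data frame ALL (a few rows only)
ALL <- data.frame(
  Langenklasse_Zahl = 8:12,
  Langenklasse = c("101-110", "111-120", "121-130", "131-140", "141-150"),
  Anz.10.ha = c(3, 12, 12, 15, 9)
)

p <- ggplot(ALL, aes(x = Langenklasse_Zahl, y = Anz.10.ha)) +
  geom_col() +
  # one scale call per axis: title, breaks and labels together
  scale_x_continuous(name = "Längenklasse",
                     breaks = ALL$Langenklasse_Zahl,
                     labels = ALL$Langenklasse) +
  scale_y_continuous(name = "Anzahl Bachforellen", limits = c(0, 84)) +
  # hjust = 1 lines up the rotated labels at the end nearest the axis
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1, size = 15))

p
```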

Related

Align x labels in flipped bar chart

I am trying to align all of my x-labels, where they are left justified, and start from the same point. In the code below, when I set hjust=-.01, it basically looks correct:
However, if I try to nudge it a bit further to the right, by setting hjust=-.05, everything falls out of alignment:
ggplot(dt.summ, aes(x=reorder(dialogue_act,n), y=n)) +
geom_col(aes(alpha=.3)) +
geom_text(aes(y=-.5, x=dialogue_act, label=dialogue_act), hjust=-.01, size=3) +
theme(axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank()) +
scale_y_continuous(expand = c(0, 0)) +
coord_flip()
How can I correct this?
Data:
> print(dt.summ, n=nrow(dt.summ))
# A tibble: 27 × 2
dialogue_act n
<chr> <int>
1 Statement-non-opinion 2650
2 Statement-opinion 666
3 Yes-No-Question 483
4 Wh-Question 255
5 Appreciation 211
6 Conventional-closing 107
7 Conventional-opening 83
8 Agree/Accept 77
9 Declarative Yes-No-Question 71
10 Acknowledge (Backchannel) 60
11 Open-Question 56
12 Action-directive 27
13 Repeat-phrase 22
14 Quotation 18
15 Collaborative Completion 16
16 Signal-non-understanding 13
17 Negative Non-no Answers 11
18 Backchannel in Question Form 8
19 No Answers 8
20 Apology 7
21 Hold Before Answer/Agreement 7
22 Or-Clause 6
23 Rhetorical-Question 6
24 Offers, Options Commits 4
25 Hedge 3
26 Other 2
27 Self-talk 2
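Answered my own question: I changed to hjust=0 and aes(y=100). A negative hjust shifts each label by a fraction of its own width, so labels of different lengths move by different amounts; hjust=0 left-aligns every label at the same y position.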
ggplot(dt.summ, aes(x=reorder(dialogue_act,n), y=n)) +
geom_col(aes(alpha=.3)) +
geom_text(aes(y=100, x=dialogue_act, label=dialogue_act), hjust=0, size=3) +
theme(axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank()) +
scale_y_continuous(expand = c(0, 0)) +
coord_flip()

time series aesthetics with ggplot2

Hello, I have tried to graph the following time series:
fecha importaciones
1 Ene\n1994 171.0
2 Feb\n1994 170.7
3 Mar\n1994 183.7
4 Abr\n1994 214.6
5 May\n1994 227.2
6 Jun\n1994 221.1
7 Jul\n1994 216.4
8 Ago\n1994 235.3
9 Sep\n1994 227.0
10 Oct\n1994 216.0
11 Nov\n1994 221.5
12 Dic\n1994 270.9
13 Ene\n1995 250.4
14 Feb\n1995 259.6
15 Mar\n1995 258.2
16 Abr\n1995 232.9
17 May\n1995 335.0
18 Jun\n1995 295.2
19 Jul\n1995 302.5
20 Ago\n1995 283.3
21 Sep\n1995 264.4
22 Oct\n1995 277.6
23 Nov\n1995 289.1
24 Dic\n1995 280.5
25 Ene\n1996 252.4
26 Feb\n1996 250.1
.
.
.
320 Ago\n2020 794.6
321 Sep\n2020 938.2
322 Oct\n2020 966.3
323 Nov\n2020 958.9
324 Dic\n2020 1059.2
325 Ene\n2021 1056.2
326 Feb\n2021 982.5
I can graph it with Office Calc, but I am having trouble plotting it in R with ggplot:
ggplot(datos, aes(x = fecha, y = importaciones)) +
geom_line(size = 1) +
scale_color_manual(values=c("#00AFBB", "#E7B800"))+
theme_minimal()
I have tried every approach I could think of, but the plot does not come out correctly. Could someone guide me?
Change the x-axis to date class.
library(ggplot2)
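# The month abbreviations are Spanish, so parsing may need a Spanish LC_TIME locale (see the locale argument of dmy())
datos$fecha <- lubridate::dmy(paste0(1, datos$fecha))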
ggplot(datos, aes(x = fecha, y = importaciones, group = 1)) +
geom_line(size = 1) +
scale_color_manual(values=c("#00AFBB", "#E7B800"))+
theme_minimal()
You can use scale_x_date to change the breaks and display format of dates on x-axis.
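As a rough illustration of the scale_x_date() suggestion, the sketch below assumes fecha has already been converted to Date. The data frame is a hypothetical stand-in built from the first 26 rows shown in the question, and the six-month break interval is only an example choice.

```r
library(ggplot2)

# Hypothetical stand-in for datos, with fecha already of class Date
datos <- data.frame(
  fecha = seq(as.Date("1994-01-01"), by = "month", length.out = 26),
  importaciones = c(171.0, 170.7, 183.7, 214.6, 227.2, 221.1, 216.4, 235.3,
                    227.0, 216.0, 221.5, 270.9, 250.4, 259.6, 258.2, 232.9,
                    335.0, 295.2, 302.5, 283.3, 264.4, 277.6, 289.1, 280.5,
                    252.4, 250.1)
)

p <- ggplot(datos, aes(x = fecha, y = importaciones)) +
  geom_line() +
  # one tick every six months, labelled as abbreviated month over year
  scale_x_date(date_breaks = "6 months", date_labels = "%b\n%Y") +
  theme_minimal()

p
```

The month names in the tick labels follow the session's LC_TIME locale.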

Add a percent to y axis labels [duplicate]

I'm sure I missed an obvious solution to this problem, but I can't figure out how to add a percent sign to the y axis labels.
Data Sample:
Provider Month Total_Count Total_Visits Procedures RX State
Roberts 2 19 19 0 0 IL
Allen 2 85 81 4 4 IL
Dawson 2 34 34 0 0 CA
Engle 2 104 100 4 4 CA
Goldbloom 2 7 6 1 1 NM
Nathan 2 221 192 29 20 NM
Castro 2 6 6 0 0 AK
Sherwin 2 24 24 0 0 AK
Brown 2 282 270 12 12 UT
Jackson 2 114 96 18 16 UT
Corwin 2 22 22 0 0 CO
Dorris 2 124 102 22 22 CO
Ferris 2 427 318 109 108 OH
Jeffries 2 319 237 82 67 OH
The following code gives graphs with inaccurate values because R seems to be multiplying by 100.
procs <- read.csv(paste0(dirdata, "Procedure percents Feb.csv"))
procs$Percentage <- round(procs$Procedures/procs$Total.Visits*100, 2)
procs$Percentage[is.na(procs$Percentage)] <- 0
procsplit <- split(procs, procs$State)
plots <- function(procs) {
ggplot(data = procs, aes(x= Provider, y= Percentage, fill= Percentage)) +
geom_bar(stat = "identity", position = "dodge") +
geom_text(aes(x = Provider, y = Percentage, label = sprintf("%.1f%%", Percentage)), position = position_dodge(width = 0.9), hjust = .5, vjust = 0 , angle = 0) +
theme(axis.text.x = element_text(angle = 45, vjust = .5)) +
ggtitle("Procedure Percentages- February 2018", procs$State) +
theme(plot.title = element_text(size = 22, hjust = .5, family = "serif")) +
theme(plot.subtitle = element_text(size = 18, hjust = .5, family = "serif")) +
scale_y_continuous(name = "Percentage", labels = percent)
}
lapply(procsplit, plots)
I'm not sure if there's a way to use sprintf to add it or if there's a way to paste it onto the labels.
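Replacing the existing scale_y_continuous() call with + scale_y_continuous(name = "Percentage", labels = function(x) paste0(x, "%")) fixes this issue. The percent labeller from the scales package expects proportions and multiplies them by 100, but Percentage has already been multiplied by 100.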
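The difference between the two labelling approaches can be checked outside of a plot. This is a small sketch with made-up percentage values; add_pct is a hypothetical helper name, and label_percent(scale = 1) assumes a reasonably recent version of the scales package.

```r
library(scales)

# Hypothetical values that are already percentages (0 to 100)
vals <- c(0, 4.94, 18.75)

# Labeller that only appends the sign, as used in the fix above
add_pct <- function(x) paste0(x, "%")
add_pct(vals)
# "0%" "4.94%" "18.75%"

# scales::percent() multiplies by 100 by default, so feed it proportions
percent(vals / 100)

# Alternative: tell the scales labeller that the input is already scaled
label_percent(scale = 1)(vals)
```

Either add_pct or label_percent(scale = 1) can be passed as the labels argument of scale_y_continuous().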

R hist vs geom_histogram break points

I am using both geom_histogram and hist in R with the same break points, but I get different graphs. I did a quick search without success. Does anyone know how the breaks are defined in each case and why there would be a difference?
These produce two different plots.
set.seed(25)
data <- data.frame(Mos=rnorm(500, mean = 25, sd = 8))
data$Mos<-round(data$Mos)
pAge <- ggplot(data, aes(x=Mos))
pAge + geom_histogram(breaks=seq(0, 50, by = 2))
hist(data$Mos,breaks=seq(0, 50, by = 2))
Thanks
To get the same histogram in ggplot2 you specify the breaks inside scale_x_continuous and binwidth inside geom_histogram.
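Additionally, hist and histograms in ggplot2 use different defaults to create the intervals:
hist: right-closed (left-open) intervals. Default: right = TRUE
stat_bin (ggplot2, at the time of this answer): left-closed (right-open) intervals. Default: right = FALSE
Note that in ggplot2 2.0.0 and later the right argument of stat_bin was replaced by closed, so check the documentation of your installed version.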
**hist** **ggplot2**
freq1 Freq freq2 Freq
1 (0,2] 0 [0,2) 0
2 (2,4] 2 [2,4) 2
3 (4,6] 2 [4,6) 1
4 (6,8] 1 [6,8) 2
5 (8,10] 6 [8,10) 2
6 (10,12] 9 [10,12) 7
7 (12,14] 24 [12,14) 17
8 (14,16] 27 [14,16) 26
9 (16,18] 39 [16,18) 31
10 (18,20] 48 [18,20) 46
11 (20,22] 52 [20,22) 43
12 (22,24] 38 [22,24) 57
13 (24,26] 44 [24,26) 36
14 (26,28] 46 [26,28) 52
15 (28,30] 39 [28,30) 39
16 (30,32] 31 [30,32) 33
17 (32,34] 30 [32,34) 26
18 (34,36] 24 [34,36) 29
19 (36,38] 18 [36,38) 27
20 (38,40] 9 [38,40) 12
21 (40,42] 5 [40,42) 6
22 (42,44] 4 [42,44) 0
23 (44,46] 1 [44,46) 5
24 (46,48] 1 [46,48) 0
25 (48,50] 0 [48,50) 1
I included the argument right = FALSE in hist so the histogram intervals are left-closed (right-open), as they are in ggplot2. I added the count labels to both plots, so it is easier to check that the intervals are the same.
ggplot(data, aes(x = Mos))+
geom_histogram(binwidth = 2, colour = "black", fill = "white")+
scale_x_continuous(breaks = seq(0, 50, by = 2))+
stat_bin(binwidth = 2, aes(label=..count..), vjust=-0.5, geom = "text")
hist(data$Mos,breaks=seq(0, 50, by = 2), labels =TRUE, right =FALSE)
To check the frequencies in each bin:
freq <- cut(data$Mos, breaks = seq(0, 50, by = 2), dig.lab = 4, right = FALSE)
as.data.frame(table(freq))
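The effect of the interval convention can be seen on a small example using only base R. The vector below is made up for illustration (it is not the question's data), with several values sitting exactly on break points so that the two conventions disagree.

```r
# Hypothetical data with values exactly on the break points
x <- c(2, 4, 4, 5, 6, 8, 8, 9)
breaks <- seq(0, 10, by = 2)

# Right-closed bins (a, b], the default of hist()
right_closed <- hist(x, breaks = breaks, plot = FALSE)$counts

# Left-closed bins [a, b)
left_closed <- hist(x, breaks = breaks, right = FALSE, plot = FALSE)$counts

# Same left-closed counts computed with cut(), as in the check above
left_cut <- as.vector(table(cut(x, breaks = breaks, right = FALSE)))

right_closed  # 1 2 2 2 1
left_closed   # 0 1 3 1 3
left_cut      # 0 1 3 1 3
```

Values that fall on a break (here 2, 4, 6 and 8) move to the neighbouring bin when the closed side changes, which is why rounded data such as Mos shows the discrepancy so clearly.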

Circular time plots in R with stacked rose

I have a data frame imported in excel with the following values:
> dt <- read.csv(file="teste1.csv",head=TRUE,sep=";")
> dt
hour occur time tt
1 1 one 00:00:59 59
2 2 one 08:40:02 31202
3 3 one 07:09:59 25799
4 4 one 01:22:16 4936
5 5 one 01:30:28 5428
6 6 one 01:28:57 5337
7 7 one 19:05:34 68734
8 8 one 01:57:47 7067
9 9 one 00:13:17 797
10 10 one 12:14:48 44088
11 11 one 23:24:43 84283
12 12 one 13:23:14 48194
13 13 one 02:28:51 8931
14 14 one 14:21:24 51684
15 15 one 13:26:14 48374
16 16 one 00:27:24 1644
17 17 one 15:56:51 57411
18 18 one 11:07:50 40070
19 19 one 07:18:18 26298
20 20 one 07:33:13 27193
21 21 one 10:02:03 36123
22 22 one 11:30:32 41432
23 23 one 21:21:27 76887
24 24 one 00:49:18 2958
25 1 two 21:01:11 75671
26 2 two 11:00:40 39640
27 3 two 21:40:09 78009
28 4 two 01:05:37 3937
29 5 two 00:44:17 2657
30 6 two 12:43:21 45801
31 7 two 10:53:49 39229
32 8 two 08:29:09 30549
33 9 two 05:07:46 18466
34 10 two 17:32:37 63157
35 11 two 09:35:16 34516
36 12 two 03:04:19 11059
37 13 two 23:09:13 83353
38 14 two 01:15:49 4549
39 15 two 14:24:33 51873
40 16 two 01:12:53 4373
41 17 two 21:20:11 76811
42 18 two 02:25:21 8721
43 19 two 01:17:37 4657
44 20 two 15:07:50 54470
45 21 two 22:27:32 80852
46 22 two 01:41:07 6067
47 23 two 09:40:23 34823
48 24 two 05:31:17 19877
I want to create a circular time plot with stacked roses based on this data frame, i.e. the segments of each stacked rose are grouped by the column occur, and their size is defined by the column time.
The column hour indicates the x position of each rose.
I tried it this way, but the result doesn't match what I want:
ggplot(dt, aes(x = hour, fill = occur)) + geom_histogram(breaks = seq(0,
24), width = 2, colour = "grey") + coord_polar(start = 0) + theme_minimal() +
scale_fill_brewer() + scale_x_continuous("", limits = c(0, 24), breaks = seq(0, 24), labels = seq(0,
24))
What am I doing wrong? I want something like this: http://blog.odotech.com/Portals/57087/images/French%20landfill%20wind%20rose.png
I hope I've explained this clearly. Thank you!
Not sure this is exactly what you want, but I hope it helps.
First convert your time values to numeric seconds. I used the chron package, but there are numerous other ways, so you don't have to load this library; it just makes the conversion more straightforward:
library(chron)
dt$tt <- hours(times(dt$time))*3600 + minutes(times(dt$time))*60 + seconds(times(dt$time))
Then make the graph (element_blank() replaces the defunct theme_blank(), and the breaks argument is dropped because geom_bar does not use it):
p <- ggplot(dt, aes(x = hour, y = tt, fill = occur)) +
geom_bar(width = 2, colour = "grey", stat = "identity") +
theme_minimal() +
scale_fill_brewer() + coord_polar(start = 0) +
scale_x_continuous("", limits = c(0, 24), breaks = seq(0, 24), labels = seq(0, 24))
Is that OK?
Some hours appear to show only one colour. This is a scaling issue: some times are close to 24 hours, while others are only a few seconds.
You can also try separate graphs using facets (you can play with the colours afterwards):
p + facet_grid(~occur) + theme(axis.title.y = element_blank(),
axis.text.y = element_blank())
The circular graph is good if you're comparing data by hour, but if you also want to compare differences in the occur variable, I think it's better to show them in old-fashioned bar graphs.
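If you prefer not to depend on chron, the conversion from "HH:MM:SS" to seconds can also be sketched in base R. The helper name to_seconds is hypothetical, and the three sample values are taken from the first rows of the question's data.

```r
# Hypothetical helper: convert "HH:MM:SS" strings to seconds
to_seconds <- function(hms) {
  # as.character() guards against the column having been read as a factor
  parts <- do.call(rbind, strsplit(as.character(hms), ":", fixed = TRUE))
  storage.mode(parts) <- "numeric"
  # weight hours, minutes and seconds, then sum per row
  as.vector(parts %*% c(3600, 60, 1))
}

to_seconds(c("00:00:59", "08:40:02", "07:09:59"))
# 59 31202 25799
```

The results match the tt column shown in the question, so the same helper could be applied as dt$tt <- to_seconds(dt$time).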
