wrong labeling in ggplot pie chart - r

I am trying to create a pie chart of percentage values. When I add labels, they end up in the wrong place:
each value points at the wrong slice of the chart.
ggplot(Consumption_building_type, aes(x="", y=percentage, fill=Building_type))+
geom_bar(width = 0.5,stat ="identity")+
coord_polar(theta = "y",direction = -1)+
geom_text(aes(x=1.3,y = percentage/3 + c(0, cumsum(percentage)[- length(percentage)]),label = round(Consumption_building_type$percentage,0))) +
theme_void()+
scale_fill_brewer(palette="GnBu")+ggtitle("Breakdown of building types")+theme_minimal()
This is the code I used and this is the result I got:
When I change the direction=1 both the graph and the labels shift
the data I used
structure(list(
Building_type = c("Commercial", "Industrial", "Institutional", "Large residential",
"Large Residential", "Residential", "Small residential"),
Total_consumption_GJ = c(99665694, 5970695, 10801610, 63699633,
16616981, 24373766, 70488556),
average_consumption_GJ = c(281541.508474576, 72813.3536585366, 109107.171717172,
677655.670212766, 213038.217948718, 123099.828282828, 640805.054545455),
total = c(354L, 82L, 99L, 94L, 78L, 198L, 110L),
percentage = c(34.8768472906404, 8.07881773399015, 9.75369458128079,
9.26108374384236, 7.68472906403941, 19.5073891625616, 10.8374384236453)),
.Names = c("Building_type", "Total_consumption_GJ", "average_consumption_GJ", "total", "percentage"),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -7L)))
Sorry about the data format; I'm a new user and not sure how best to paste it.

Update for ggplot 2.0+
ggplot2 2.0+ has some new parameters for position_stack() that make solving this problem much simpler. There's no need to calculate the center point of each bar manually (though that solution may still be preferred in some situations and is therefore preserved below). Instead, we can simply use the "vjust" parameter of position_stack(), where vjust = 0.5 centers each label within its segment:
g <- ggplot(Consumption_building_type, aes(x="", y=percentage, fill=Building_type))+
geom_bar(width = 0.5,stat ="identity")+
coord_polar(theta = "y",direction = 1)+
geom_text(aes(x=1.3,y = percentage, label = round(percentage,0)), position = position_stack(vjust = 0.5)) +
scale_fill_brewer(palette="GnBu")+ggtitle("Breakdown of building types")+theme_minimal() +
labs(x = NULL)
General solution: calculating the midpoint of stacked bars manually
I'm assuming that your goal is to place a label for each bar at the bar's center point. In that case, first we can calculate the center point and add it to the data frame:
Consumption_building_type$zone.start <- with(Consumption_building_type, c(0, cumsum(percentage)[-length(percentage)]))
Consumption_building_type$zone.end <- with(Consumption_building_type, cumsum(percentage))
Consumption_building_type$label.point <- with(Consumption_building_type, (zone.start + zone.end) / 2)
      Building_type Total_consumption_GJ average_consumption_GJ total percentage zone.start zone.end label.point
1        Commercial             99665694              281541.51   354      34.87       0.00    34.87      17.435
2        Industrial              5970695               72813.35    82       8.07      34.87    42.94      38.905
3     Institutional             10801610              109107.17    99       9.75      42.94    52.69      47.815
4 Large residential             63699633              677655.67    94       9.26      52.69    61.95      57.320
5 Large Residential             16616981              213038.22    78       7.68      61.95    69.63      65.790
6       Residential             24373766              123099.83   198      19.50      69.63    89.13      79.380
7 Small residential             70488556              640805.05   110      10.83      89.13    99.96      94.545
And then the y aesthetic in geom_text() is simply the newly created "label.point" column.
I've also added labs(x = NULL) so that the placeholder x aesthetic does not leave a stray axis title on the final plot.
new.plot <- ggplot(Consumption_building_type, aes(x="", y=percentage, fill=Building_type))+
geom_bar(width = 0.5,stat ="identity")+
coord_polar(theta = "y",direction = 1)+
geom_text(aes(x=1.3,y = label.point, label = round(percentage,0))) +
scale_fill_brewer(palette="GnBu")+ggtitle("Breakdown of building types")+theme_minimal() +
labs(x = NULL)
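A note on stacking order, offered as a hedged sketch rather than as part of the original answer: since ggplot2 2.2.0, position_stack() stacks groups in reverse order, so the first fill level sits at the top of the stack. If the manually computed labels look mirrored, computing the midpoints from the other end may help. stack_midpoints() is a hypothetical helper name, the three values are made-up illustration data, and the sketch assumes the rows are in the same order as the fill levels.

```r
# Hypothetical helper: label midpoints when the FIRST group is stacked on top
# (the default stacking order in ggplot2 >= 2.2.0).
stack_midpoints <- function(values) {
  upper <- rev(cumsum(rev(values)))  # upper edge of each segment
  upper - values / 2                 # midpoint = upper edge minus half the height
}

# Made-up percentages: 50 occupies [50, 100], 30 occupies [20, 50], 20 occupies [0, 20]
pct <- c(50, 30, 20)
mid <- stack_midpoints(pct)
print(mid)
```

Alternatively, geom_bar(position = position_stack(reverse = TRUE)) should restore the bottom-up order, in which case the cumsum-based label.point column lines up as written.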

Related

How to plot Quarterly and Year-to-Date values in ggplot?

Raw data
structure(list(attainment_target = c(7.5, 15), quarter_2022 = c("Q1",
"Q2"), total_attainment = c(2, 4), percent_attainment = c(0.2666,
0.2666)), row.names = c(NA, -2L), class = c("tbl_df", "tbl",
"data.frame"))
Quarter    | Target | Attainment
2022-01-01 | 7.5    | 2
2022-04-01 | 15     | 4
Scenario
I would like to plot a ggplot (geom_col or geom_bar) with Quarter as x-axis and Attainment as y-axis with Target as a horizontal dash line that shows how far off I am from that value.
However, I am having trouble plotting YTD (Total attainment given # of quarters) in the same plot. Here is an example of how I used dplyr to create new field that shows calculated YTD value:
Desired output
Quarter    | Target | Attainment | YTD | % Attainment
2022-01-01 | 7.5    | 2          | 2   | 27
2022-04-01 | 15     | 4          | 6   | 40
Which is the best way to plot this via ggplot in R? Here is my current approach but having trouble incorporating all the above:
df1 <- df %>%
mutate(YTD_TOTAL = sum(total_attainment)) %>%
mutate(YTD_PERCENT_ATTAINMENT = sum(total_attainment) / max(attainment_target))
ggplot(data = df1, aes(fill=quarter_2022, x=attainment_target, y=total_attainment, color = quarter_2022, palette = "Paired",
label = TRUE,
position = position_dodge(0.9)))
Not sure exactly what you have in mind but here are some of the pieces you might want to use:
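library(dplyr)
library(ggplot2)
# df is the data frame from the question's "Raw data"
df %>%
  # cumsum() gives the running (year-to-date) total per quarter
  mutate(YTD_TOTAL = cumsum(total_attainment)) %>%
  mutate(YTD_PERCENT_ATTAINMENT = YTD_TOTAL / attainment_target) %>%
  ggplot(aes(quarter_2022, total_attainment)) +
  # hollow bar: year-to-date total, labelled with the YTD percentage
  geom_col(aes(y = YTD_TOTAL), fill = NA, color = "gray20") +
  geom_text(aes(y = YTD_TOTAL, label = scales::percent(YTD_PERCENT_ATTAINMENT)),
            vjust = -0.5) +
  # filled bar: attainment for the quarter itself
  geom_col(fill = "gray70", color = "gray20") +
  geom_text(aes(label = total_attainment),
            position = position_stack(vjust = 0.5)) +
  # dashed segment: the target, drawn across the width of each bar
  geom_segment(aes(x = as.numeric(as.factor(quarter_2022)) - 0.4,
                   xend = as.numeric(as.factor(quarter_2022)) + 0.4,
                   y = attainment_target, yend = attainment_target),
               linetype = "dashed")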
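The key difference from the question's attempt is cumsum() versus sum(). A minimal base-R sketch, using the same values as the raw data (the grand_total column is only there for comparison and is not part of either post):

```r
# Same values as the question's raw data
df <- data.frame(
  quarter_2022      = c("Q1", "Q2"),
  attainment_target = c(7.5, 15),
  total_attainment  = c(2, 4)
)

# sum() collapses to one grand total, repeated on every row
df$grand_total <- sum(df$total_attainment)

# cumsum() gives the running total up to and including each quarter
df$YTD_TOTAL <- cumsum(df$total_attainment)
df$YTD_PERCENT_ATTAINMENT <- df$YTD_TOTAL / df$attainment_target

print(df)
```

Rounded to whole percentages, YTD_PERCENT_ATTAINMENT reproduces the 27 and 40 from the desired output.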

Consistent mapping from value to color in ggplot

I think I'm missing something very easy here, but I just can't figure it out at the moment:
I would like to consistently assign colors to certain values from a column across multiple plots.
So I have this tibble (sl):
# A tibble: 15 x 3
class hex x
<chr> <chr> <int>
1 translational slide #c23b22 1
2 rotational slide #AFC6CE 2
3 fast flow-type #b7bf5e 3
4 complex #A6CEE3 4
5 area subject to rockfall/topple #1F78B4 5
6 fall-type #B2DF8A 6
7 n.d. #33A02C 7
8 NA #FB9A99 8
9 area subject to shallow-slides #E31A1C 9
10 slow flow-type #FDBF6F 10
11 topple #FF7F00 11
12 deep-seated movement #CAB2D6 12
13 subsidence #6A3D9A 13
14 areas subject to subsidence #FFFF99 14
15 area of expansion #B15928 15
This should recreate it:
structure(list(class = c("translational slide", "rotational slide",
"fast flow-type", "complex", "area subject to rockfall/topple",
"fall-type", "n.d.", NA, "area subject to shallow-slides", "slow flow-type",
"topple", "deep-seated movement", "subsidence", "areas subject to subsidence",
"area of expansion"), hex = c("#c23b22", "#AFC6CE", "#b7bf5e",
"#A6CEE3", "#1F78B4", "#B2DF8A", "#33A02C", "#FB9A99", "#E31A1C",
"#FDBF6F", "#FF7F00", "#CAB2D6", "#6A3D9A", "#FFFF99", "#B15928"
), x = 1:15), row.names = c(NA, -15L), class = c("tbl_df", "tbl",
"data.frame"))
Now I would like to plot each class as a bar in the color of its hex code (for now just for visualization purposes). So I did:
ggplot(sl) +
geom_col(aes(x = x,
y = 1,
fill = class)) +
scale_fill_manual(values = sl$hex) +
geom_text(aes(x = x,
y = 0.5,
label = class),
angle = 90)
But these are not the colors as they are in the tibble.
So I tried to follow this guide: How to assign colors to categorical variables in ggplot2 that have stable mapping? and created this:
# create the color palette
mycols = sl$hex
names(mycols) = sl$class
and then plotted it with
ggplot(sl) +
geom_col(aes(x = x,
y = 1,
fill = class)) +
scale_fill_manual(values = mycols) +
geom_text(aes(x = x,
y = 0.5,
label = class),
angle = 90)
But the result is the same. It's this:
For example, the translational slide has the hex code "#c23b22" and should be a pastel, darkish red.
Does anyone have an idea what I'm missing here?
Consider this:
sl <- structure(list(class = c("translational slide", "rotational slide",
"fast flow-type", "complex", "area subject to rockfall/topple",
"fall-type", "n.d.", NA, "area subject to shallow-slides", "slow flow-type",
"topple", "deep-seated movement", "subsidence", "areas subject to subsidence",
"area of expansion"), hex = c("#c23b22", "#AFC6CE", "#b7bf5e",
"#A6CEE3", "#1F78B4", "#B2DF8A", "#33A02C", "#FB9A99", "#E31A1C",
"#FDBF6F", "#FF7F00", "#CAB2D6", "#6A3D9A", "#FFFF99", "#B15928"
), x = 1:15), row.names = c(NA, -15L), class = c("tbl_df", "tbl",
"data.frame"))
sl$class <- factor( sl$class, levels=unique(sl$class) )
cl <- sl$hex
names(cl) <- paste( sl$class )
ggplot(sl) +
geom_col(aes(x = x,
y = 1,
fill = class)) +
scale_fill_manual( values = cl, na.value = cl["NA"] ) +
geom_text(aes(x = x,
y = 0.5,
label = class),
angle = 90)
By converting class to a factor with explicit levels, passing a named vector as the values in scale_fill_manual, and setting na.value there as well, you should get something that looks more as expected.
You need to supply the colors in the same order as the fill variable. Since there is already a column called 'x', I used it for the fill. I also replaced NA with the character string 'NA'. I have checked a few of the colors; please let me know if this is not the desired output. Thanks
#Assuming df is your dataframe:
df[is.na(df$class), 'class'] <- 'NA'
ggplot(df) +
geom_col(aes(x = x,
y = 1,
fill = factor(x))) +
scale_fill_manual(values = df$hex, labels=df$class) +
geom_text(aes(x = x,
y = 0.5,
label = class),
angle = 90)
Output:
I think the problem is that scale_fill_manual expects the order of its values and labels arguments to match the order of the fill levels. This isn't the case with your dataset.
Does
sl %>% ggplot() +
geom_col(aes(x = x,
y = 1,
fill = hex)) +
geom_text(aes(x = x,
y = 0.5,
label = class),
angle = 90) +
scale_fill_manual(values=sl$hex, labels=sl$class)
Give you what you want?
Next time, please dput() your test data: it took me as long to create the test dataset as to answer your question. Also, using hex codes for colours makes it difficult to check that the colours are as expected. For a MWE, blue/green/black etc. would have been more helpful.
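One caveat with mapping fill = hex, sketched here under assumptions rather than taken from any of the answers: a discrete scale sorts its levels alphabetically, so an unnamed values vector in row order may not line up with them. If the hex column should simply be used as-is, scale_fill_identity() avoids the matching step altogether. The three-row pal data frame below is a hypothetical stand-in for sl.

```r
library(ggplot2)

# Hypothetical three-row stand-in for the `sl` tibble
pal <- data.frame(
  class = c("translational slide", "rotational slide", "complex"),
  hex   = c("#c23b22", "#AFC6CE", "#A6CEE3"),
  x     = 1:3
)

# Discrete levels are ordered alphabetically, not in row order
level_order <- sort(unique(pal$class))
print(level_order)

# Use the hex strings literally; breaks/labels only relabel the legend
p <- ggplot(pal, aes(x = x, y = 1, fill = hex)) +
  geom_col() +
  scale_fill_identity(guide = "legend", breaks = pal$hex, labels = pal$class)
```

With the identity scale, each bar is drawn in exactly the color stored in its row, regardless of how the levels are sorted.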

How to put gestational age in weeks.days on x-axis in ggplot

I am trying to plot weight of a fetus over time.
The y-axis is fetal weight in grams
The x-axis needs to be formatted as the following:
27 weeks 3 days == 27.3
29 weeks 6 days == 29.6
etc
My data (df) looks something like this
weight age
2013 22.4
2302 25.6
2804 27.2
3011 29.1
I have tried something like this... but not sure how to adjust the scale...
ggplot(df, aes(x = age, y = weight)) +
geom_point() +
scale_x_continuous()
If I compute the actual numeric value for the age (i.e. 22.4 == 22 weeks + 4/7 of a week == 22.57),
is it possible to label the corresponding age value with the label I want?
For example...
weight age.label age.value
2013 22.4 22.57
2302 25.6 25.86
2804 27.2 27.29
3011 29.1 29.14
When I call this:
df <- df %>% mutate(age.label = as.character(age.label))
ggplot(df, aes(x = age.value, y = weight)) +
geom_point() +
scale_x_continuous(label = "age.label")
I get the following...
Error in f(..., self = self) : Breaks and labels are different lengths
Any help much appreciated
I borrowed from this answer and this one to create variable tick labels that use formatting to separate the days and the weeks.
I have supplied three different methods.
1. Simply places ticks at every day point but does not number them.
2. Numbers the days and the weeks correctly and distinguishes between them by making weeks bold and days light grey.
3. Same as 2 but uses size. This method doesn't work very well, as it creates a large gap between the labels and the plot. It has been included for completeness... and in the hope somebody says how to fix it.
The plot below is the second method.
I think the vertical tick lines could also be coloured so that some of them disappear if you want as well.
library(ggplot2)
library(tidyverse)
df<-read.table(header=TRUE, text="weight age.label age.value
2013 22.4 22.57
2302 25.6 25.86
2804 27.2 27.29
3011 29.1 29.14")
#have ticks for every day using 1/7 distance tick marks
ggplot(df, aes(x = age.value, y = weight)) +
geom_point() +
scale_x_continuous(limits=c(22, 30),
minor_breaks = seq(from = 1, to = 33, by = 1/7),
breaks = 1:30)
#create a df of tick marks labels containing day number and week number
breaks_labels_df <- data.frame(breaks = seq(from = 1, to = 33, by = 1/7)) %>%
mutate(minors= rep(0:6, length.out = nrow(.)),
break_label = ifelse(minors == 0, breaks, minors))
#plot both day number and week number differentiating between them by the label formatting.
#remove the minor tick lines to reduce the busyness of the plot
ggplot(df, aes(x = age.value, y = weight)) +
geom_point() +
scale_x_continuous(limits=c(22, 30),
breaks = seq(from = 1, to = 33, by = 1/7),
labels = breaks_labels_df$break_label) +
theme(axis.text.x = element_text(color = c("grey60","grey60","black",rep("grey60",4)),
size = 8, angle = 0,
hjust = .5, vjust = .5,
face = c("plain","plain","bold",rep("plain",4))),
panel.grid.minor.x = element_blank()) +
labs(title = "Baby weight in relation to age", x = "Age in weeks and days", y = "weight in grams")
#Changing the font size places a large gap between the tick labels and the axis
ggplot(df, aes(x = age.value, y = weight)) +
geom_point() +
scale_x_continuous(limits=c(22, 30),
breaks = seq(from = 1, to = 33, by = 1/7),
labels = breaks_labels_df$break_label) +
theme(axis.text.x = element_text(vjust = 0, size = c(8,8,12,rep(8,4)),
margin = margin(t = 0), lineheight = 0))
In order to add labels to the plot, use the geom_text function in the ggplot2 package. One can use the "hjust" and "vjust" to fine tune the placement.
df<-read.table(header=TRUE, text="weight age
2013 22.4
2302 25.6
2804 27.2
3011 29.1")
library(dplyr)
library(ggplot2)
#calculate the proper decimal value for axis
df<-df %>%mutate(age.value=floor(age)+ (age-floor(age))*10/7) %>% round(2)
ggplot(df, aes(x = age.value, y = weight)) +
geom_point() +
scale_x_continuous(limits=c(20, 30)) +
geom_text(aes(label = age), hjust = -.2, vjust=.1)
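The "Breaks and labels are different lengths" error in the question arises because labels was given a single string while the scale had several breaks. A minimal sketch of one way around it, assuming it is acceptable to put a tick only at each observed age: pass breaks and labels as vectors of equal length. to_decimal_weeks() is a hypothetical helper name.

```r
library(ggplot2)

# Hypothetical helper: "weeks.days" notation (22.4 = 22 weeks 4 days) -> decimal weeks
to_decimal_weeks <- function(age) {
  weeks <- floor(age)
  days  <- round((age - weeks) * 10)  # round() guards against floating-point noise
  weeks + days / 7
}

df <- data.frame(weight = c(2013, 2302, 2804, 3011),
                 age    = c(22.4, 25.6, 27.2, 29.1))
df$age.value <- to_decimal_weeks(df$age)

# One break per observation, each labelled with its weeks.days string
p <- ggplot(df, aes(x = age.value, y = weight)) +
  geom_point() +
  scale_x_continuous(breaks = df$age.value,
                     labels = format(df$age, nsmall = 1))
```

The points are then positioned on a true time scale while the axis still reads in weeks.days notation.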

How to visualize two column in bar chart using R?

I don't know if my question clear enough...
I have this table
Name Mark_Oral Mark_Written Total_M_Oral Total_M_Written
1 Hercule Poirot 50 49 858 781
2 Joe O'Neil 70 79 1056 1083
3 John McAuley 81 99 1219 1333
and I have to visualize the last two columns in a bar chart using R, to compare the students' total marks.
Data
table <- structure(list(Name = c("Hercule Poirot", "Joe O'Neil", "John McAuley"),
Mark_Oral = c(50L, 70L, 81L),
Mark_Written = c(49L, 79L, 99L),
Total_M_Oral = c(858L, 1056L, 1219L),
Total_M_Written = c(781L, 1083L, 1333L)),
.Names = c("Name", "Mark_Oral", "Mark_Written", "Total_M_Oral", "Total_M_Written"),
row.names = c("1", "2", "3"), class = "data.frame")
You can use + to combine other plots on the same ggplot object. For example:
ggplot(survey, aes(often_post,often_privacy)) +
geom_point() +
geom_smooth() +
geom_point(aes(frequent_read,often_privacy)) +
geom_smooth(aes(frequent_read,often_privacy))
With ggplot2 (as your tags suggest) the syntax is:
ggplot(data = table,aes(x= Total_M_Oral,y=Total_M_Written))+geom_bar(stat = "identity")
Where table is replaced by the name of your dataframe.
Edit
I was unsure that my first answer really answered your question (multiple uses of bars).
Create dummy data
df<-data.frame(x = rpois(n = 100,lambda = 800),y = rpois(n = 100,lambda = 800))
With previous plot:
If you want to count and have a color for Oral and one for written
df2<-data.frame(x = c(df$x,df$y),y = rep(c("written","oral"),each = nrow(df)))
ggplot(data = df2,aes(x= x,fill=y))+geom_bar(stat = "count",alpha = 0.5)
Which gives:
Comment: alpha parameter is not necessary, it just deals with the transparency so that you can see when there are overlapping bars.
With student names
df3<-data.frame(name = rep(table$Name,times = 2),
y = c(table$Total_M_Oral, table$Total_M_Written),
fill = rep(c("oral","written"),each = nrow(table)))
ggplot(data = df3, aes(x = name,y= y,fill = fill))+geom_bar(stat= "identity",alpha = 0.5)
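To compare the two totals per student, side-by-side bars may read more easily than stacked ones. A minimal sketch, assuming the same long layout as df3 above; the names marks, long, total and exam are hypothetical choices made here (marks also avoids shadowing base R's table() function).

```r
library(ggplot2)

# Totals from the question's table
marks <- data.frame(
  Name            = c("Hercule Poirot", "Joe O'Neil", "John McAuley"),
  Total_M_Oral    = c(858L, 1056L, 1219L),
  Total_M_Written = c(781L, 1083L, 1333L)
)

# Long format: one row per student and exam type
long <- data.frame(
  name  = rep(marks$Name, times = 2),
  total = c(marks$Total_M_Oral, marks$Total_M_Written),
  exam  = rep(c("oral", "written"), each = nrow(marks))
)

# position = "dodge" places the oral and written bars next to each other
p <- ggplot(long, aes(x = name, y = total, fill = exam)) +
  geom_col(position = "dodge")
```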

Overlaying whiskers or error-bar-esque lines on a ggplot

I am creating plots similar to the first example image below, and need plots like the second example below.
library(ggplot2)
library(scales)
# some data
data.2015 = data.frame(score = c(-50,20,15,-40,-10,60),
area = c("first","second","third","first","second","third"),
group = c("Findings","Findings","Findings","Benchmark","Benchmark","Benchmark"))
data.2014 = data.frame(score = c(-30,40,-15),
area = c("first","second","third"),
group = c("Findings","Findings","Findings"))
# breaks and limits
breaks.major = c(-60,-40,-22.5,-10, 0,10, 22.5, 40, 60)
breaks.minor = c(-50,-30,-15,-5,0, 5, 15,30,50)
limits =c(-70,70)
# plot 2015 data
ggplot(data.2015, aes(x = area, y = score, fill = group)) +
geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
coord_flip() +
scale_y_continuous(limits = limits, oob = squish, minor_breaks = breaks.minor, breaks = breaks.major)
The data.2014 has only values for the "Findings" group. I would like to show those 2014 Findings values on the plot, on the appropriate/corresponding data.2015$area, where there is 2014 data available.
To show last year's data just on the "Finding" (red bars) data, I'd like to use a one-sided errorbar/whisker that emanates from the value of the relevant data.2015 bar, and terminates at the data.2014 value, for example:
I thought to do this by using layers and plotting error bars so that the 2015 data could overlap, however this doesn't work when the 2014 result is abs() smaller than the 2015 result and is thus occluded.
Considerations:
I'd like the errorbar/whisker to be the same width as the bars, perhaps even dashed line with a solid cap.
Bonus points for a red line when the value has decreased, and green when the value has increased
I generate lots of these plots in a loop, sometimes with many groups, with a different amount of areas in each plot. The 2014 data is (at this stage) always displayed only for a single group, and every area has some data (except for just one NA case, but need to provision for that scenario)
EDIT
I've built on the solution below. I used that exact code, but with geom_linerange to draw the lines without caps, plus geom_errorbar with ymin and ymax set to the same value to draw a single cap. The result is a one-sided error bar on a ggplot geom_bar. Thanks for the help.
I believe you can get most of what you want with a little data manipulation. Doing an outer join of the two datasets will let you add the error bars with the appropriate dodging.
alldat = merge(data.2015, data.2014, all = TRUE, by = c("area", "group"),
suffixes = c(".2015", ".2014"))
To make the error bar one-sided, you'll want ymin to be either the same as y or NA depending on the group. It seemed easiest to make a new variable, which I called plotscore, to achieve this.
alldat$plotscore = with(alldat, ifelse(is.na(score.2014), NA, score.2015))
The last thing I did is to make a variable direction for when the 2015 score decreased vs increased compared to 2014. I included a third category for the Benchmark group as filler because I ran into some issues with the dodging without it.
alldat$direction = with(alldat, ifelse(score.2015 < score.2014, "dec", "inc"))
alldat$direction[is.na(alldat$score.2014)] = "absent"
The dataset used for plotting would look like this:
area group score.2015 score.2014 plotscore direction
1 first Benchmark -40 NA NA absent
2 first Findings -50 -30 -50 dec
3 second Benchmark -10 NA NA absent
4 second Findings 20 40 20 dec
5 third Benchmark 60 NA NA absent
6 third Findings 15 -15 15 inc
The final code I used looked like this:
ggplot(alldat, aes(x = area, y = score.2015, fill = group)) +
geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
geom_errorbar(aes(ymin = plotscore, ymax = score.2014, color = direction),
position = position_dodge(width = .9), lwd = 1.5, show.legend = FALSE) +
coord_flip() +
scale_y_continuous(limits = limits, oob = squish, minor_breaks = breaks.minor, breaks = breaks.major) +
scale_color_manual(values = c(NA, "red", "green"))
I'm using the development version of ggplot2, ggplot2_1.0.1.9002, and show_guide is now deprecated in favor of show.legend, which I used in geom_errorbar.
I obviously didn't change the line type of the error bars to dashed with a solid cap, nor did I remove the bottom whisker as I don't know an easy way to do either of these things.
In response to a comment suggesting I add the full solution as an answer:
library(ggplot2)
library(scales)
# some data
data.2015 = data.frame(score = c(-50,20,15,-40,-10,60),
area = c("first","second","third","first","second","third"),
group = c("Findings","Findings","Findings","Benchmark","Benchmark","Benchmark"))
data.2014 = data.frame(score = c(-30,40,-15),
area = c("first","second","third"),
group = c("Findings","Findings","Findings"))
# breaks and limits
breaks.major = c(-60,-40,-22.5,-10, 0,10, 22.5, 40, 60)
breaks.minor = c(-50,-30,-15,-5,0, 5, 15,30,50)
limits =c(-70,70)
# reconfigure data to create values for the additional errorbar/linerange
alldat = merge(data.2015, data.2014, all = TRUE, by = c("area", "group"),
suffixes = c(".2015", ".2014"))
alldat$plotscore = with(alldat, ifelse(is.na(score.2014), NA, score.2015))
alldat$direction = with(alldat, ifelse(score.2015 < score.2014, "dec", "inc"))
alldat$direction[is.na(alldat$score.2014)] = "absent"
ggplot(alldat, aes(x = area, y = score.2015, fill = group)) +
geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
# set the data min and max as the same to have a single 'cap' with no line
geom_errorbar(aes(ymin = score.2014, ymax = score.2014, color = direction),
position = position_dodge(width = .9), lwd = 1.5, show.legend = FALSE) +
#then add the line
geom_linerange(aes(ymin = score.2015, ymax = score.2014, color = direction),
position = position_dodge(width = .9), lwd = 1.5, show.legend = FALSE) +
coord_flip() +
scale_y_continuous(limits = limits, oob = squish, minor_breaks = breaks.minor, breaks = breaks.major) +
scale_color_manual(values = c(NA, "red", "green"))
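Because the line and the cap are separate layers in this version, the "dashed line with a solid cap" from the question's wish list might be reachable by setting linetype on the linerange only. This is an untested-in-the-thread sketch on a reduced, hypothetical data frame d (Findings rows only, no dodging), not something either poster confirmed.

```r
library(ggplot2)

# Reduced stand-in: Findings rows only, with both years side by side
d <- data.frame(
  area       = c("first", "second", "third"),
  score.2015 = c(-50, 20, 15),
  score.2014 = c(-30, 40, -15)
)
d$direction <- ifelse(d$score.2015 < d$score.2014, "dec", "inc")

p <- ggplot(d, aes(x = area, y = score.2015)) +
  geom_bar(stat = "identity", fill = "grey70") +
  # dashed line from the 2015 value to the 2014 value
  geom_linerange(aes(ymin = score.2015, ymax = score.2014, color = direction),
                 linetype = "dashed", lwd = 1) +
  # solid cap: ymin and ymax are equal, so only the cap is visible
  geom_errorbar(aes(ymin = score.2014, ymax = score.2014, color = direction),
                lwd = 1) +
  coord_flip() +
  # named values, so the mapping does not depend on level order
  scale_color_manual(values = c(dec = "red", inc = "green"))
```

Naming the entries in scale_color_manual() also removes the need for the "absent" filler colour when no rows are missing.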
