Getting an error while trying to apply the labels of geom_text to a subset of bars - r

I want to apply geom_text to a particular set of variables.
I have, for example:
decade count
<dbl> <int>
1 1930 505
2 1940 630
3 1950 806
4 1960 446
5 1970 469
6 1980 807
7 1990 1057
8 2000 1856
9 2010 2133
My plot looks like this:
My plot
So, I want to add some labels to each bar, where the year has to be shown. For the bars with a value of >= 500, I want the label to be inside the bar, for the rest I want it to be outside.
I tried to do this with geom_text:
geom_col(fill = THISBLUE,
width = 0.7) +
geom_text(data = subset(data, count >= 500),
aes(0, y = name, label = name))
However, I get this error message:
Error in count >= 500 : comparison (5) is possible only for atomic and list types

How about this approach?
ggplot(df,aes(decade,count)) +
geom_col(fill = "blue", width = 4) +
coord_flip() +
geom_text(data = subset(df, count >= 500), aes(label = count),nudge_y = -100,color="white") +
geom_text(data = subset(df, count < 500), aes(label = count),nudge_y = 100,color="black")
Input:
library(tibble)
df <- tribble(
~decade,~count,
1930, 505,
1940, 630,
1950, 806,
1960, 446,
1970, 469,
1980, 807,
1990, 1057,
2000, 1856,
2010, 2133
)
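One plausible cause of the original error, stated as an assumption because the question does not show how `data` was created: if the object passed to `subset()` is not a data frame with a `count` column, R resolves the name `count` to a function (for example `dplyr::count` when dplyr is attached), and comparing a function with a number fails with this kind of message. The sketch below reproduces that failure in base R with a hypothetical stand-in function, and shows that the same `subset()` call works once a real data frame is passed.

```r
# Hypothetical stand-in for a function that happens to be called `count`
# (for example dplyr::count when dplyr is attached).
count <- function(x, ...) NULL

# Comparing a function with a number is an error; capture the message.
msg <- tryCatch(
  {
    count >= 500
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# With a data frame that has a `count` column, subset() looks the name up
# in the data frame first, so the comparison works as intended.
df <- data.frame(
  decade = seq(1930, 2010, by = 10),
  count  = c(505, 630, 806, 446, 469, 807, 1057, 1856, 2133)
)
inside  <- subset(df, count >= 500)  # rows whose label goes inside the bar
outside <- subset(df, count < 500)   # rows whose label goes outside the bar
print(outside$decade)
```

If this is what happened, passing the actual data frame (here `df`) to `subset()` inside `geom_text()` should be enough to get rid of the error.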

Related

Axis of a plot with quarters and year

I'm working with a quarterly time series in R and I want to make a graph like the following:
I have searched all over the internet but I haven't found an answer.
If you have a time series object like this:
d <- ts(c(265, 280, 288, 280, 278, 292, 298, 282), start = 2016, frequency = 4)
d
#> Qtr1 Qtr2 Qtr3 Qtr4
#> 2016 265 280 288 280
#> 2017 278 292 298 282
Then you could get a reasonable replica of the plot by doing:
library(ggplot2)
data.frame(date = time(d), SALES = d) |>
within(year <- floor(date)) |>
within(Quarter <- paste0("Q", date %% 1 * 4 + 1)) |>
ggplot(aes(interaction(Quarter, year), SALES, group = 1)) +
geom_line(linewidth = 2, color = "#497dba") +
scale_y_continuous("PRICE (millions)", labels = scales::dollar,
breaks = 24:31 * 10, limits = c(240, 310)) +
scale_x_discrete(NULL, labels = ~sub("\\.", "\n", .x)) +
theme_classic(base_size = 20)
Created on 2023-02-19 with reprex v2.0.2
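The quarter and year labels in the pipeline above are derived arithmetically from `time(d)`. As a small alternative sketch using only base R, `cycle()` returns the position of each observation within the year, which avoids the modulo step. The variable names `quarter`, `year` and `axis_labels` are illustrative and not part of the original answer.

```r
# Same quarterly series as in the answer above.
d <- ts(c(265, 280, 288, 280, 278, 292, 298, 282), start = 2016, frequency = 4)

# cycle() gives the quarter index (1 to 4) of each observation,
# and the integer part of time() gives the year.
quarter <- paste0("Q", as.integer(cycle(d)))
year    <- as.integer(floor(as.numeric(time(d))))

# Two-line axis labels: quarter on top, year underneath.
axis_labels <- paste(quarter, year, sep = "\n")
print(data.frame(quarter, year, sales = as.numeric(d)))
```

Labels built this way could be mapped to the x aesthetic directly, as a factor with its levels in time order, instead of rewriting the `interaction()` labels afterwards.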

How to position a label text at the outside end of horizontal bar graph (ggplot) when label is unrelated to fill?

I have a dataset that looks like the one below. I am making a horizontal bar chart with ggplot that plots the number of students (filled by status), but I want the label (the total students who completed the work) to be at the end of the bar, so that you can see the total n and also the % completed. The code I am currently using is below. I'm trying to figure out how to set hjust based on the rate rather than the number of students. Do I need to add new or different information to the data frame to make this easier, or can I use something with hjust to put the label at the outside end of the bar?
  State Number of Students  Rate Denominator Status        Label
1 CT                  4500 0.471        8500 Completed     47.1%
2 CT                  4000 0.471        8500 Not Completed <NA>
3 OK                  4375 0.653        6700 Completed     65.3%
4 OK                  2325 0.653        6700 Not Completed <NA>
5 TX                  5040 0.70         7200 Completed     70.0%
6 TX                  2160 0.70         7200 Not Completed <NA>
ec <- ggplot(data, aes(y=reorder(State, Rate), x= `Number of Students`, fill = Status)) +
geom_bar(stat = "identity") +
scale_fill_brewer(palette = "Dark2") +
ggtitle("Completed Rates by State")+
ylab("State") +
geom_text(aes(label=label))
df <- data.frame(state = c('CT', 'CT',
'OK', 'OK',
'TX', 'TX'),
numberOfStudents = c(4500, 4000,
4375, 2325,
5040, 2160),
denominator = c(8500, 8500,
6700, 6700,
7200, 7200),
rate = c( 0.471,
0.471,
0.653,
0.653,
0.70, 0.70),
status = c('Completed', 'Not Completed',
'Completed','Not Completed',
'Completed', 'Not Completed'),
label = c('47.1%', NA, '65.3%', NA, '70.0%', NA)
)
ggplot() +
geom_bar(data = df,
mapping = aes(y=reorder(state, rate),
x= numberOfStudents,
fill = status),
stat = "identity") +
scale_fill_brewer(palette = "Dark2") +
ggtitle("Completed Rates by State")+
ylab("State") +
xlim(0, 9000)+
geom_text(data = df,
mapping = aes(y = state,
x = denominator,
label=label),
hjust = 0)
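In the answer above the label is placed at `x = denominator`, which is the full length of the stacked bar for each state. If no such column were available, the same position could be derived from the stacked values themselves. The following base R sketch assumes the `df` layout from the answer and adds a hypothetical `total` column. It only prepares the data and leaves the plotting code unchanged.

```r
df <- data.frame(
  state            = c("CT", "CT", "OK", "OK", "TX", "TX"),
  numberOfStudents = c(4500, 4000, 4375, 2325, 5040, 2160),
  status           = rep(c("Completed", "Not Completed"), times = 3),
  label            = c("47.1%", NA, "65.3%", NA, "70.0%", NA)
)

# Total bar length per state = sum of the stacked segments.
df$total <- ave(df$numberOfStudents, df$state, FUN = sum)

# One row per state that carries a label; `total` is where the stacked bar
# ends, so it can be mapped to x in geom_text() together with hjust = 0.
labels <- df[!is.na(df$label), c("state", "total", "label")]
print(labels)
```

Passing `labels` as the `data` of `geom_text()` would also avoid drawing the `NA` rows, which ggplot2 otherwise drops with a warning.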

changing the bins in stat_sum for overlapping points

I am plotting the count of samples found at each site, so the more samples, the bigger the plot point should be. The counts range from 1 to 2500. By default, the legend bins the data into a range that I would like to extend at the lower end: it starts at 500, but I would like it to start at 100, without becoming so detailed that I end up with more than 10 bins.
I have tried to understand scale_size_binned, as I feel this is where it would work, but when I try to set the limits or the range, the legend shows an extra item for a smaller and for a larger count, both without numbers: a smaller circle above the 500 with no value and a larger circle below the 2500 with no value. It also puts tick marks in the legend between values. In reality, all but 4 values are under 500, so I'm not even sure why the scale starts at 500 by default. The range is 12 to 2683 and the mean is 339.
Here is some of my data:
# A tibble: 33 x 3
# Groups: Site [20]
Site Sample n
<fct> <chr> <int>
1 1(20A) 20A 279
2 1(20C) 20C 99
3 2(20C) 20C 158
4 3(25C) 25C 170
5 4(25C) 25C 117
6 5(20B) 20B 72
7 6(20F) 20F 369
8 7(19D) 19D 218
9 8(20E) 20E 1044
10 9(20F) 20F 427
Here is the code I've used:
library(ggplot2)
library(RColorBrewer)
nb.cols <- 20
mycolors <- colorRampPalette(brewer.pal(12, "Paired"))(nb.cols) # trying to find a palette with 33 colors
ggplot(data, aes(x = Sample, y = Site)) +
stat_sum(aes(size = n, color= Site)) +
scale_color_manual(values = mycolors)+
guides(color = "none")+
labs(x = "Sample", y = "Site", size = "Found")+
theme_classic(base_size=14, base_family='serif')
We could adapt the size legend with the scale_size_area() function and its breaks argument.
I have adapted the code a little: I removed stat_sum() and mapped size and color in the geom_point() aesthetics instead.
For your original data, use: scale_size_area(breaks = c(100, 500, 1000, 1500, 2000, 2500)) +
library(tidyverse)
ggplot(df, aes(x = Sample, y = Site)) +
geom_point(aes(color = factor(n), size = n)) +
scale_size_area(breaks = seq(100,1000,100)) +
scale_color_manual(values = mycolors) +
guides(color = "none")+
labs(x = "Sample", y = "Site", size = "Found")+
theme_classic(base_size=14, base_family='serif')
#TarJae
Thank you for your answer. I also figured out that another way is:
scale_size_continuous(breaks = c(25, 50, 100, 500, 1000, 2500),
labels = c(25, 50, 100, 500, 1000, 2500)) +
With this I do not need scale_size_area.
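Why the default legend starts at 500 can be illustrated with base R's `pretty()`, which picks evenly spaced round numbers over a range. This is only an analogy: ggplot2 computes its breaks with its own algorithm from the scales package, so treat the output as a rough illustration rather than ggplot2's exact behavior. The range 12 to 2683 is taken from the question, while `custom_breaks` is a hypothetical choice.

```r
# Range of the counts as reported in the question.
n_range <- c(12, 2683)

# Round, evenly spaced candidate breaks over that range.
candidates <- pretty(n_range)

# Only breaks that fall inside the data range can appear in a legend.
default_like <- candidates[candidates >= n_range[1] & candidates <= n_range[2]]
print(default_like)

# Hand-picked breaks that add detail at the low end, where most counts are,
# while staying below ten legend entries.
custom_breaks <- c(100, 250, 500, 1000, 1500, 2000, 2500)
print(length(custom_breaks))
```

A vector like `custom_breaks` can be passed as `breaks =` to `scale_size_area()` or `scale_size_continuous()`, as in the answers above.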

Adjusting the secondary y axis using ggplot

I am trying to graph two different datasets, reconstructed temperatures (10-16) and charcoal data (0-140), with two different time series values, using ggplot. Is this possible?
I used this code (see below) but unfortunately it produced a plot (see below) that limits the variability of the temperature reconstruction. Is there a way to adjust the y axis so we can see more variability in the temperature record?
Thank you very much for your support.
R code
df <- data.frame(Charfiretempdata$AGETEMPS, Charfiretempdata$FIREAGE, Charfiretempdata$Comp2TEMPS,Charfiretempdata$Char.Acc.Rate..Char...cm.2.yr.1.)
ggplot(df) +
geom_col(mapping = aes(x = Charfiretempdata$FIREAGE,
y = Charfiretempdata$Char.Acc.Rate..Char...cm.2.yr.1. * 16/150), size = 2, color = "darkblue",
fill = "white") +
geom_line(mapping = aes(x = Charfiretempdata$AGETEMPS, y = Charfiretempdata$Comp2TEMPS)) +
geom_point(mapping = aes(x = Charfiretempdata$AGETEMPS, y = Charfiretempdata$Comp2TEMPS), size
= 3, shape = 21, fill = "white")+
scale_y_continuous(
name = expression("Temperature ("~degree~"C)"),
sec.axis = sec_axis(~ . * 150/16 , name = "Charcoal (mm)"))
R plot
I created random sample data that shares similar characteristics with your data.
library(dplyr)
library(ggplot2)
set.seed(282930)
df <- tibble(x_axis = c(1400, 1500, 1600, 2000, 2001, 2002, 2003, 2004, 2005, 2006,
2007, 2008, 2009, 2010, 2011, 2012, 2013, 2015, 2016, 2017),
y_axis_1 = runif(20, min = 10, max = 16),
y_axis_2 = runif(20, min = 0, max = 150))
Here is the df
> df
# A tibble: 20 x 3
x_axis y_axis_1 y_axis_2
<dbl> <dbl> <dbl>
1 1400 15.7 5.28
2 1500 11.8 141.
3 1600 14.5 149.
4 2000 11.6 121.
5 2001 15.6 37.3
6 2002 15.0 72.5
7 2003 10.7 130.
8 2004 15.4 84.7
9 2005 11.5 118.
10 2006 10.4 17.4
11 2007 11.3 124.
12 2008 13.6 22.6
13 2009 13.0 14.5
14 2010 15.9 142.
15 2011 12.3 103.
16 2012 10.3 131.
17 2013 12.6 93.6
18 2015 14.6 12.4
19 2016 11.4 27.9
20 2017 15.3 116.
Here is a ggplot similar to yours but with a different axis adjustment:
ggplot(df,
       # Both series share the same x-axis, so the shared aesthetic can be
       # defined once in the main ggplot() call.
       aes(x = x_axis)) +
  geom_col(mapping =
             # Rescale the 2nd-axis values to a span of 10 and add 10, because
             # the primary axis will start at 10 instead of 0.
             aes(y = (y_axis_2 * 10 / 150) + 10),
           # size is the width of the border. Because of the nature of your
           # data, each column is very thin (one tick on the x-axis), so the
           # inner fill is covered by the dark blue border.
           size = 2, color = "darkblue",
           # The fill is not really useful here as you cannot see it.
           fill = "white") +
  geom_line(mapping = aes(y = y_axis_1)) +
  geom_point(mapping = aes(y = y_axis_1), size = 3, shape = 21, fill = "white") +
  # Start the primary axis at 10 instead of 0 to zoom in on the temperatures.
  # expand takes a logical; FALSE removes the padding around the limits.
  coord_cartesian(ylim = c(10, 20), expand = FALSE) +
  scale_y_continuous(
    name = expression("Temperature ("~degree~"C)"),
    # The labels of the second axis are calculated from the first axis.
    # As the first axis starts at 10, the formula has to subtract 10 before
    # multiplying back by 15. I keep 150 / 10 so it is clearly the reverse of
    # the transformation applied to the 2nd-axis values above.
    sec.axis = sec_axis(~ (. - 10) * 150 / 10, name = "Charcoal (mm)"))
Here is the sample output plot
Even with the adjusted y-axis, we can hardly see the temperature at the end of the data because there are many more data points there. If you don't need every data point at the end, you could keep only every 10th year: the data spans about 600 years, so that much detail is not needed at the end. If you do need the detail, graph that time frame separately.
Filter the data at the end to keep only every 10th year:
ggplot(df %>% filter(x_axis <= 2000 | x_axis %% 10 == 0),
aes(x = x_axis)) +
# similar code to above but I use geom_bar instead
geom_bar(mapping =
aes(y = (y_axis_2 * 10 / 150) + 10),
stat = "identity", size = 2, color = "darkblue",
fill = "white") +
geom_line(mapping = aes(y = y_axis_1)) +
geom_point(mapping = aes(y = y_axis_1), size
= 3, shape = 21, fill = "white")+
scale_y_continuous(
name = expression("Temperature ("~degree~"C)"),
sec.axis = sec_axis(~ (. - 10) * 150/10 , name = "Charcoal (mm)")) +
coord_cartesian(ylim = c(10, 20), expand = FALSE)
(As you can see, with fewer data points we start to see the fill, because the plot has more space.)
Zoom in on the end of the data:
ggplot(df %>% filter(x_axis >= 2000),
aes(x = x_axis)) +
# similar code to above but I use geom_bar instead
geom_bar(mapping =
aes(y = (y_axis_2 * 10 / 150) + 10),
stat = "identity", size = 2, color = "darkblue",
fill = "white") +
geom_line(mapping = aes(y = y_axis_1)) +
geom_point(mapping = aes(y = y_axis_1), size
= 3, shape = 21, fill = "white")+
scale_y_continuous(
name = expression("Temperature ("~degree~"C)"),
sec.axis = sec_axis(~ (. - 10) * 150/10 , name = "Charcoal (mm)")) +
coord_cartesian(ylim = c(10, 20), expand = FALSE)
(Now we can see both the dark blue border and the white fill inside.)
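The secondary-axis trick used in the plots above is a pair of linear maps: one squeezes the charcoal values into the range of the temperature axis before plotting, and its inverse is handed to `sec_axis()` so that the right-hand axis shows the original units. Here is a minimal sketch of that pair, with hypothetical helper names `to_primary()` and `to_secondary()` and the ranges (0 to 150 mapped onto 10 to 20) taken from the answer:

```r
# Map a value from the secondary range `from` onto the primary axis range `to`.
to_primary <- function(y2, from = c(0, 150), to = c(10, 20)) {
  (y2 - from[1]) * diff(to) / diff(from) + to[1]
}

# Inverse map: a primary axis position back to secondary units.
to_secondary <- function(y1, from = c(0, 150), to = c(10, 20)) {
  (y1 - to[1]) * diff(from) / diff(to) + from[1]
}

charcoal   <- c(0, 75, 150)
on_primary <- to_primary(charcoal)
print(on_primary)
print(to_secondary(on_primary))
```

With the defaults these reduce to the expressions in the answer, `y_axis_2 * 10 / 150 + 10` and `(. - 10) * 150 / 10`. Changing `to` to `c(10, 16)` would instead match the temperature range mentioned in the question.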

Draw group bar plot without reshaping data

I have the following data.table:
Golds Bronzes Silvers Country
1: 930 639 728 USA
2: 247 320 284 GER
3: 192 234 212 FRA
and I want to draw a grouped bar plot with the country on the x-axis and the number of medals on the y-axis. For each country, the graph should have 3 bars indicating Gold, Silver and Bronze. Is there a way to do that with ggplot and without melting the data?
The standard barplot function accepts a matrix of heights:
barplot(as.matrix(x[, 1:3]), beside = TRUE,
legend.text = x$Country)
Update: To plot it the other way around you can transpose the matrix:
barplot(t(as.matrix(x[, 1:3])),
beside = TRUE,
names.arg = x$Country,
legend.text = names(x)[1:3])
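The calls above assume an object `x` that holds the medal table. Here is a self-contained sketch of the transposed variant, with the data reconstructed from the question's table as a plain data frame (an assumption, since the original is a data.table) and drawn into a temporary PDF so that it also runs non-interactively:

```r
# Medal counts from the question, as a plain data frame (assumed name: x).
x <- data.frame(
  Golds   = c(930, 247, 192),
  Bronzes = c(639, 320, 234),
  Silvers = c(728, 284, 212),
  Country = c("USA", "GER", "FRA")
)

# barplot() draws one group of bars per matrix column, so put the medal
# types in rows and the countries in columns.
heights <- t(as.matrix(x[, c("Golds", "Silvers", "Bronzes")]))
colnames(heights) <- x$Country

# Draw into a temporary file so the example does not need a screen device.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
mids <- barplot(heights, beside = TRUE, legend.text = rownames(heights))
invisible(dev.off())
```

With `beside = TRUE`, `barplot()` returns the bar midpoints as a matrix with the same shape as `heights`, which can be handy for adding text labels afterwards.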
Create a data frame:
test <- data.frame(Country = c("USA", "GER", "FRA"),
Golds = c(930, 247, 192),
Bronzes = c(639, 320, 234),
Silvers = c(728, 284, 212))
Plot in one step:
library(tidyr)
library(ggplot2)
test %>%
gather(key = "award", value = "number", -Country) %>%
ggplot(aes(x = Country, y = number, color = award, fill = award)) +
geom_col(position = "dodge")
