Remove some of the X axis labels in ggplot bar chart - r

I have the following code for a stacked bar graph for the period 1970-2020, which is the X-axis label.
The graph is generated from a dataframe with 3 columns and 51 rows representing various years.
Year Active New
1970 1 1
......
2020 268 60
df %>%
mutate(Old = Active - New) %>%
select(-Active) %>%
pivot_longer(-Year, names_to = "Type", values_to = "Count") %>%
ggplot() +
geom_col(aes(x = Year, y = Count, fill = forcats::fct_rev(Type)))+
ggtitle("example graph")+
scale_fill_discrete(name="Cases",
breaks=c("Old", "New"),
labels=c("Ongoing", "New"))
As my X-axis is not readable, I would like to remove most of the Years in the X-axis, and keep eg. 1970, 1980, 1990, 2000, 2020 - at their corresponding positions. I'm not sure that scale_x_discrete can do this?

The obvious answer in this specific case is to convert Year to a numeric variable, which will make the breaks pretty by default.
This sample data allows us to run your code and reproduce your issue:
set.seed(1)
df <- data.frame(Year = factor(1970:2020),
Active = cumsum(rnorm(51, 4, 2)),
New = cumsum(rnorm(51, 1, 1)))
Using your exact plotting code produces a similar plot with unreadable axis labels:
df %>%
mutate(Old = Active - New) %>%
select(-Active) %>%
pivot_longer(-Year, names_to = "Type", values_to = "Count") %>%
ggplot() +
geom_col(aes(x = Year, y = Count, fill = forcats::fct_rev(Type)))+
ggtitle("example graph")+
scale_fill_discrete(name="Cases",
breaks=c("Old", "New"),
labels=c("Ongoing", "New"))
But if we simply convert Year to numeric values, we get the same plot with pretty breaks:
df %>%
mutate(Old = Active - New) %>%
select(-Active) %>%
pivot_longer(-Year, names_to = "Type", values_to = "Count") %>%
ggplot() +
geom_col(aes(x = as.numeric(as.character(Year)), y = Count,
fill = forcats::fct_rev(Type)))+
ggtitle("example graph")+
xlab("Year") +
scale_fill_discrete(name="Cases",
breaks=c("Old", "New"),
labels=c("Ongoing", "New"))
If for whatever reason it needs to be a factor, you can leave specific years blank using a labeling function.
df %>%
mutate(Old = Active - New) %>%
select(-Active) %>%
pivot_longer(-Year, names_to = "Type", values_to = "Count") %>%
ggplot() +
geom_col(aes(x = Year, y = Count, fill = forcats::fct_rev(Type)))+
ggtitle("example graph")+
scale_fill_discrete(name="Cases",
breaks=c("Old", "New"),
labels=c("Ongoing", "New")) +
scale_x_discrete(labels = function(x) ifelse(as.numeric(x) %% 10, "", x)) +
theme(axis.ticks.length.x = unit(0, "mm"))
Created on 2022-08-19 with reprex v2.0.2

Related

x-axis starting value for diverging plot

How can I change the "x-axis starting value" from the diverging bar chart below (extracted from here), so that the vertical axis is set at 25 instead of 0. And therefore the bars are drawn from 25 and not 0.
For instance, I want this chart:
To look like this:
EDIT
It it not the label I want to change, it is how the data is plotted. My apologies if I wasn't clear. See example below:
Another example to make it clear:
You can provide computed labels to an (x-)scale via scale_x_continuous(labels = function (x) x + 25).
If you also want to change the data, you’ll first need to offset the x-values by the equivalent amount (in the opposite direction):
Example:
df = tibble(Color = c('red', 'green', 'blue'), Divergence = c(5, 10, -5))
offset = 2
df %>%
mutate(Divergence = Divergence - offset) %>%
ggplot() +
aes(x = Divergence, y = Color) +
geom_col() +
scale_x_continuous(labels = function (x) x + offset)
I'm still not 100% clear on your intended outcome but you can "shift" your data by adding/subtracting 25 from each value, e.g.
Original plot:
library(tidyverse)
library(gapminder)
set.seed(123)
gapminder_subset <- gapminder %>%
pivot_longer(-c(country, continent, year)) %>%
filter(year == "1997" | year == "2007") %>%
select(-continent) %>%
filter(name == "gdpPercap") %>%
pivot_wider(names_from = year) %>%
select(-name) %>%
mutate(gdp_change = ((`2007` - `1997`) / `1997`) * 100) %>%
sample_n(15)
ggplot(data = gapminder_subset,
aes(x = country, y = gdp_change)) +
geom_bar(stat = "identity") +
coord_flip()
subtract 25:
library(tidyverse)
library(gapminder)
set.seed(123)
gapminder_subset <- gapminder %>%
pivot_longer(-c(country, continent, year)) %>%
filter(year == "1997" | year == "2007") %>%
select(-continent) %>%
filter(name == "gdpPercap") %>%
pivot_wider(names_from = year) %>%
select(-name) %>%
mutate(gdp_change = ((`2007` - `1997`) / `1997`) * 100) %>%
sample_n(15)
ggplot(data = gapminder_subset,
aes(x = country, y = gdp_change)) +
geom_bar(stat = "identity") +
coord_flip()
If you combine that with my original relabelling I think that's the solution:
ggplot(data = gapminder_subset,
aes(x = country, y = gdp_change - 25)) +
geom_bar(stat = "identity") +
coord_flip() +
scale_y_continuous(breaks = c(-25, 0, 25, 50),
labels = c(0, 25, 50, 75))
The answers that existed at the time that I'm writing this are suggesting to change the data or to change the label. Here, I'm proposing to change neither the data nor the labels, and instead just change where the starting position of a bar is.
First, for reproducibility, I took #jared_mamrot's approach for the data subset.
library(gapminder)
library(tidyverse)
set.seed(123)
gapminder_subset <- gapminder %>%
pivot_longer(-c(country, continent, year)) %>%
filter(year == "1997" | year == "2007") %>%
select(-continent) %>%
filter(name == "gdpPercap") %>%
pivot_wider(names_from = year) %>%
select(-name) %>%
mutate(gdp_change = ((`2007` - `1997`) / `1997`) * 100) %>%
sample_n(15)
Then, you can set xmin = after_scale(25). You'll get a warning that xmin doesn't exists, but it does exist after the bars are reparameterised to rectangles in the ggplot2 internals (which is after the x-scale has seen the data to determine limits). This effectively changes the position where bars start.
ggplot(gapminder_subset,
aes(gdp_change, country)) +
geom_col(aes(xmin = after_scale(25)))
#> Warning: Ignoring unknown aesthetics: xmin
Created on 2021-06-28 by the reprex package (v1.0.0)

displaying data as a line in charts

df <- read.csv('https://raw.githubusercontent.com/ulklc/covid19-
timeseries/master/countryReport/raw/rawReport.csv')
df$countryName = as.character(df$countryName)
I processed the dataset.
Can we show the patient and population charts of the continents as separate line charts on the same chart?
as output;
''date region confirmed
''2020/01/03 europa 850
The data in the output I created are examples. The data in the example are not real.
Here's an approach with dplyr, tidyr and ggplot:
library(dplyr)
library(tidyr)
library(ggplot2)
df %>%
group_by(region, day) %>%
dplyr::summarize(confirmed = sum(confirmed),
recovered = sum(recovered),
death = sum(death)) %>%
pivot_longer(cols = c("confirmed","recovered","death"), names_to = "condition") %>%
ggplot(aes(x= as.Date(day), y = value, group = region, color = region)) +
geom_line() +
facet_grid(rows = vars(condition), scales = "free_y") +
labs(x = "Date", y = "Number of Individuals")

GGplot order legend using last values on x-axis

I have some time series data plotted using ggplot. I'd like the legend, which appears to the right of the plot, to be in the same order as the line on the most recent date/value on the plot's x-axis. I tried using the case_when function, but I'm obviously using it wrong. Here is an example.
df <- tibble(
x = runif(100),
y = runif(100),
z = runif(100),
year = sample(seq(1900, 2010, 10), 100, T)
) %>%
gather(variable, value,-year) %>%
group_by(year, variable) %>%
summarise(mean = mean(value))
df %>%
ggplot(aes(year, mean, color = variable)) +
geom_line()
## does not work
df %>%
mutate(variable = fct_reorder(variable, case_when(mean ~ year == 2010)))
ggplot(aes(year, mean, color = variable)) +
geom_line()
We may add one extra line
ungroup() %>% mutate(variable = fct_reorder(variable, mean, tail, n = 1, .desc = TRUE))
before plotting, or use
df %>%
mutate(variable = fct_reorder(variable, mean, tail, n = 1, .desc = TRUE)) %>%
ggplot(aes(year, mean, color = variable)) +
geom_line()
In this way we look at the last values of mean and reorder variable accordingly.
There's another way without adding a new column using fct_reorder2():
library(tidyverse)
df %>%
ggplot(aes(year, mean, color = fct_reorder2(variable, year, mean))) +
geom_line() +
labs(color = "variable")
Although it's not recommendable in your case, to order the legend based on the first (earliest) values in your plot you can set
df %>%
ggplot(aes(year, mean, color = fct_reorder2(variable, year, mean, .fun = first2))) +
geom_line() +
labs(color = "variable")
The default is .fun = last2 (see also https://forcats.tidyverse.org/reference/fct_reorder.html)

bar chart of row freq ggplot2

I have the following data:
dataf <- read.table(text = "index,group,taxa1,taxa2,taxa3,total
s1,g1,2,5,3,10
s2,g1,3,4,3,10
s3,g2,1,2,7,10
s4,g2,0,4,6,10", header = T, sep = ",")
I'm trying to make a stacked bar plot of the frequences of the data so that it counts across the row (not down a column) for each index (s1,s2,s3,s4) and then for each group (g1,g2) of each taxa. I'm only able to figure out how to graph the species of one taxa but not all three stacked on each other.
Here are some examples of what I'm trying to make:
These were made on google sheets so they don't look like ggplot but it would be easier to make in r with ggplot2 because the real data set is larger.
You would need to reshape the data.
Here is my solution (broken down by plot)
For first plot
library(tidyverse)
##For first plot
prepare_data_1 <- dataf %>% select(index, taxa1:taxa3) %>%
gather(taxa,value, -index) %>%
mutate(index = str_trim(index)) %>%
group_by(index) %>% mutate(prop = value/sum(value))
##Plot 1
prepare_data_1 %>%
ggplot(aes(x = index, y = prop, fill = fct_rev(taxa))) + geom_col()
For second plot
##For second plot
prepare_data_2 <- dataf %>% select(group, taxa1:taxa3) %>%
gather(taxa,value, -group) %>%
mutate(group = str_trim(group)) %>%
group_by(group) %>% mutate(prop = value/sum(value))
##Plot 2
prepare_data_2 %>%
ggplot(aes(x = group, y = prop, fill = fct_rev(taxa))) + geom_col()
##You need to reshape data before doing that.
dfm = melt(dataf, id.vars=c("index","group"),
measure.vars=c("taxa1","taxa2","taxa3"),
variable.name="variable", value.name="values")
ggplot(dfm, aes(x = index, y = values, group = variable)) +
geom_col(aes(fill=variable)) +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.25)) +
geom_text(aes(label = values), position = position_stack(vjust = .5), size = 3) + theme_gray()

R dplyr group, ungroup, top_n and ggplot

I have an object with several values including cities, states, year and number of murders. I use dplyr to group it by city and calculate the total murders over all years for the top 10 cities like this:
MurderNb_reshaped2 %>%
select(city, state, Year, Murders) %>%
group_by(city) %>%
summarise(total = sum(Murders)) %>%
top_n(10, total) %>%
ggplot(aes(x = Year, y = Murders, fill = "red")) +
geom_histogram(stat = "identity") +
facet_wrap(~city)
I would like to plot this for only the top ten cities, but 'x = year' is not found because it has been grouped by city. Can anyone explain how I can accomplish this?
EDIT: this the original source data https://interactive.guim.co.uk/2017/feb/09/gva-data/UCR-1985-2015.csv
And here is my code:
Deaths <- read.csv("UCR-1985-2015.csv", stringsAsFactors = F)
MurderRate <- Deaths[, -c(5:35)]
MurderNb <- Deaths[, -c(36:66)]
colnames(MurderNb) <- gsub("X", "", colnames(MurderNb))
colnames(MurderNb) <- gsub("_raw_murder_num", "", colnames(MurderNb))
MurderNb_reshaped <- melt(MurderNb, id = c("city", "Agency", "state", "state_short"))
colnames(MurderNb_reshaped) <- c("city", "Agency", "state", "state_short", "Year", "Murders")
MurderNb_reshaped2 <- MurderNb_reshaped
MurderNb_reshaped2 %>%
select(city, state, Year, Murders) %>%
group_by(city) %>%
summarise(total = sum(Murders)) %>%
top_n(10, total) %>%
ggplot(aes(x = Year, y = Murders, fill = "red")) +
geom_bar(stat = "identity") +
facet_wrap(~city)
Ok there were a couple minor issue. This should do the trick:
#this gives you the top cities
topCities <- MurderNb_reshaped2 %>%
select(city, state, Year, Murders) %>%
group_by(city) %>%
summarise(total = sum(Murders)) %>%
top_n(10, total)
#you then need to filter your original data to be only the data for the top cities
MurderNb_reshaped2 <- filter(MurderNb_reshaped2, city %in% topCities$city)
ggplot(data = MurderNb_reshaped2, aes(x = Year, y = Murders, fill = "red")) +
geom_bar(stat = "identity") +
facet_wrap(~city)

Resources