geom_bar overlapping labels - r

For simplicity, let's suppose we have a data frame like this:
# A
1 1
2 2
3 2
4 2
5 3
We have a categorical variable "A" with 3 possible values (1, 2, 3), and I'm trying this code:
ggplot(df, aes(x="", y=df$A, fill=A))+
geom_bar(width = 1, stat = "identity")
The problem is that the labels are overlapping. Also, I want to change the labels for 1, 2, 3 to x, y, z.
Here is a picture of what is happening:
And here is a link to the actual data that I'm using:
https://a.uguu.se/anKhhyEv5b7W_Data.csv

Your graph does not correspond to the sample of data you are showing, so it is hard to be sure that the structure of your real data is actually the same.
Using a random example, I get the following plot:
df <- data.frame(A = sample(1:3,20, replace = TRUE))
library(ggplot2)
ggplot(df, aes(x="A", y=A, fill=as.factor(A)))+
geom_bar(width = 1, stat = "identity") +
scale_fill_discrete(labels = c("x","y","z"))
EDIT: Using data provided by the OP
Using your data, you should get the following plot:
ggplot(df, aes(x = "A",y = A, fill = as.factor(A)))+
geom_col()
Or, if you want the count of each individual value of A, you can do:
library(dplyr)
library(ggplot2)
df %>% group_by(A) %>% count() %>%
ggplot(aes(x = "A", y = n, fill = as.factor(A)))+
geom_col()
Is this what you are looking for?
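A minimal sketch of the relabelling step on the five-row sample from the question, assuming the goal is one bar per category rather than a single stacked bar. The column name A_label is hypothetical; it is only introduced here for illustration. Recoding A as a factor makes ggplot treat it as discrete, so the legend shows x, y, z instead of a continuous colour bar.

```r
library(ggplot2)

# The five-row sample from the question
df <- data.frame(A = c(1, 2, 2, 2, 3))

# Recode 1/2/3 as a factor labelled x/y/z (A_label is a hypothetical name)
df$A_label <- factor(df$A, levels = 1:3, labels = c("x", "y", "z"))

# geom_bar() with its default stat counts the rows in each category
p <- ggplot(df, aes(x = A_label, fill = A_label)) +
  geom_bar() +
  labs(x = "A", fill = "A")

# Counts per label, as drawn by the bars
counts <- table(df$A_label)
print(counts)
```

Because the labels live in the data, no scale_fill_discrete(labels = ...) call is needed and the axis and legend stay in sync.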

Related

R ggplot: Modifying aesthetics of individual lines without recreating entire color palette

In ggplot, is there any simple way of overriding the line attributes of one or a few groups without having to specify the entire color/line palette via scale_*_manual()?
In the example below, I basically want to make all the boot_* lines gray and skinny, while all other lines retain the default colors and widths.
I know there are a lot of brute-force ways of doing this with some combination of a) creating auxiliary variables in the data frame, based on the string pattern, that will serve as my color/size group, b) generating the plot below, extracting all the color-layer info, and filling out an entire scale_color_manual() and scale_size_manual() map, and c) replacing the 'boot_*' values with "grey".
Are there any versatile shortcuts here?
library(dplyr)
library(tidyr)    # for pivot_longer()
library(ggplot2)
set.seed(231)
df=tibble(time=c(1:5), actual=2*time+3, estimate = actual+rnorm(length(actual)))
for(i in 1:8){
df[paste('boot_', i, sep='')] = df$estimate + rnorm(nrow(df))
}
> head(df) %>% data.frame
# time actual estimate boot_1 boot_2 boot_3 boot_4 boot_5 boot_6 boot_7
# 1 1 5 4.466898 4.684295 4.240585 4.786520 5.904332 4.862498 2.092772 4.595850
# 2 2 7 4.688336 4.751258 6.074914 5.694181 3.445036 4.639329 4.548511 5.453597
# 3 3 9 8.045802 7.167972 6.858666 7.519752 7.721405 7.801243 10.156436 9.521482
# 4 4 11 11.262516 11.826206 10.682760 11.137814 11.252465 11.452442 11.925339 11.754248
# 5 5 13 12.526643 12.492315 13.927974 14.176896 11.924183 12.950479 11.257865 13.430229
# boot_8
# 1 3.987001
# 2 3.813539
# 3 7.549984
# 4 11.482360
# 5 11.645106
# Melt for ggplot compatibility
df_long = df %>%
pivot_longer(cols=(-time))
head(df_long) %>% data.frame
# time name value
# 1 1 actual 5.000000
# 2 1 estimate 4.466898
# 3 1 boot_1 4.684295
# 4 1 boot_2 4.240585
# 5 1 boot_3 4.786520
# 6 1 boot_4 5.904332
## The basic ggplot
df_long %>%
ggplot(aes(x=time, y=value, color=name)) + geom_line()
You could just use the first four characters of name for the colour aesthetic (using substr), and the full name as a group aesthetic. It's a bit hacky but it's short, effective, and all gets done in the plotting code without extra data wrangling, post-hoc changes or a long vector of colour mappings.
df_long %>%
ggplot(aes(x = time, y = value, color = substr(name, 1, 4), group = name)) +
geom_line() +
scale_color_manual(labels = c("actual", "boot", "estimate"),
values = c("orange", "gray", "blue3"), name = "name")
An alternative is using filtering to have two sets of lines: one coloured, and one merely grouped. This has the benefit that you don't need to add any scale calls at all:
df_long %>%
filter(!grepl("boot", name)) %>%
ggplot(aes(x = time, y = value, color = name)) +
geom_line(data = filter(df_long, grepl("boot", name)),
aes(group = name), color = "gray", size = 0.3) +
geom_line()
EDIT
It's pretty difficult to specify an aesthetic mapping for only one or a few groups while leaving the others at their default values. However, it is possible using ggnewscale. Here we only have to specify the color of the boot group:
library(ggnewscale)
df_long %>%
filter(!grepl("boot", name)) %>%
ggplot(aes(x = time, y = value)) +
new_scale_color() +
geom_line(aes(color = name)) +
scale_color_discrete(name = "Variable") +
new_scale_color() +
geom_line(data = filter(df_long, grepl("boot", name)),
aes(group = name, color = "boot"), size = 0.3) +
scale_color_manual(values = "gray", name = "") +
theme(legend.margin = margin(-28, 10, 0, 0))
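For comparison, the manual colour map the question calls brute force can be built programmatically in a few lines. This is only a sketch under stated assumptions: df_long below is a stand-in with random values (only the series names matter), scales::hue_pal() is used to generate ggplot2-style default hues, and scale_linewidth_manual() assumes ggplot2 3.4.0 or later. Note that the hues are the defaults for two series, not the ones ggplot would pick if all ten names shared one palette.

```r
library(ggplot2)
library(scales)  # hue_pal() generates ggplot2-style default discrete hues

# Stand-in for the question's df_long; the values are random placeholders
set.seed(231)
series <- c("actual", "estimate", paste0("boot_", 1:8))
df_long <- data.frame(
  time  = rep(1:5, times = length(series)),
  name  = rep(series, each = 5),
  value = rnorm(5 * length(series))
)

is_boot <- grepl("^boot_", series)
others  <- sort(series[!is_boot])

# Default-style hues for the non-boot series, one grey for every boot_* series
colour_map <- c(
  setNames(hue_pal()(length(others)), others),
  setNames(rep("grey70", sum(is_boot)), series[is_boot])
)

# Thin lines for boot_* series, thicker lines for everything else
width_map <- setNames(ifelse(is_boot, 0.3, 0.8), series)

p <- ggplot(df_long, aes(time, value, colour = name, linewidth = name)) +
  geom_line() +
  # breaks = others keeps the boot_* series out of the legend
  scale_colour_manual(values = colour_map, breaks = others) +
  scale_linewidth_manual(values = width_map, guide = "none")
```

The map is derived from the names in the data, so adding more boot_* columns does not require touching the plotting code.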

How to make a dual axis in ggplot R

I have made a time series plot of total count data for 4 different species. As you can see, the results for sharksucker have a much higher count than the other 3 species. To see the trends of the other 3 species they need to be plotted separately (or on a smaller y axis). However, I have a figure limit in my master's paper, so I was trying to create a dual-axis plot or split the y axis into two. Does anyone know of a way I could do this?
library(tidyverse)
library(readxl)   # read_xlsx() is not attached by library(tidyverse)
library(reshape2)
dat <- read_xlsx("ReefPA.xlsx")
dat1 <- dat
dat1$Date <- format(dat1$Date, "%Y/%m")
plot_dat <- dat1 %>%
group_by(Date) %>%
summarise(Sharksucker_Remora = sum(Sharksucker_Remora)) %>%
melt("Date") %>%
filter(Date > '2018-01-01') %>%
arrange(Date)
names(plot_dat) <- c("Date", "Species", "Count")
ggplot(data = plot_dat) +
geom_line(mapping = aes(x = Date, y = Count, group = Species, colour = Species)) +
stat_smooth(method=lm, aes(x = Date, y = Count, group = Species, colour = Species)) +
scale_colour_manual(values=c(Golden_Trevally="goldenrod2", Red_Snapper="firebrick2", Sharksucker_Remora="darkolivegreen3", Juvenile_Remora="aquamarine2")) +
xlab("Date") +
ylab("Total Presence Per Month") +
theme(legend.title = element_blank()) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
The thing is, the problem you're trying to solve doesn't seem like a 2nd Y axis issue. The problem here is of relative scale of the species. You might want to think of something like standardizing the initial species presence to 100 and showing growth or decline from there.
Another option would be faceting by species.
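Both suggestions can be sketched as follows. The data frame toy is made up purely for illustration (the real ReefPA.xlsx is not available here), and the column names Date, Species and Count are assumed to match the question's plot_dat. The first plot indexes each species to 100 at its first month; the second facets by species with a free y axis, which still counts as a single figure.

```r
library(dplyr)
library(ggplot2)

# Made-up counts for two species on very different scales (illustration only)
toy <- data.frame(
  Date    = rep(seq(as.Date("2018-01-01"), by = "month", length.out = 6), times = 2),
  Species = rep(c("Sharksucker_Remora", "Red_Snapper"), each = 6),
  Count   = c(120, 150, 140, 160, 180, 170, 4, 6, 5, 3, 7, 8)
)

# Option 1: index every species to 100 at its first observation
toy_indexed <- toy %>%
  group_by(Species) %>%
  arrange(Date, .by_group = TRUE) %>%
  mutate(Index = 100 * Count / first(Count)) %>%
  ungroup()

p_index <- ggplot(toy_indexed, aes(Date, Index, colour = Species)) +
  geom_line() +
  ylab("Presence relative to first month (= 100)")

# Option 2: one panel per species, each with its own y axis
p_facet <- ggplot(toy, aes(Date, Count)) +
  geom_line() +
  facet_wrap(~ Species, scales = "free_y", ncol = 1)
```

Indexing assumes the first observation of every species is non-zero; a species that starts at 0 would need a different baseline.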

programmatically setting individual axis limits in facets

I need help setting the individual x-axis limits on different facets, as described below.
A programmatic approach is preferred since I will apply the same template to different data sets.
- The first two facets should have the same x-axis limits (to have comparable bars).
- The last facet's (performance) limits should be between 0 and 1, since it is calculated as a percentage.
I have seen this and some other related questions but couldn't apply them to my data.
Thanks in advance.
library(tidyverse)  # provides %>%, mutate(), gather() and ggplot()
df <-
data.frame(
call_reason = c("a","b","c","d"),
all_records = c(100,200,300,400),
problematic_records = c(80,60,100,80))
df <- df %>% mutate(performance = round(problematic_records/all_records, 2))
df
call_reason all_records problematic_records performance
a 100 80 0.80
b 200 60 0.30
c 300 100 0.33
d 400 80 0.20
df %>%
gather(key = facet_group, value = value, -call_reason) %>%
mutate(facet_group = factor(facet_group,
levels=c('all_records','problematic_records','performance'))) %>%
ggplot(aes(x=call_reason, y=value)) +
geom_bar(stat="identity") +
coord_flip() +
facet_grid(. ~ facet_group)
Here is one way to go about it with facet_grid(scales = "free_x") in combination with geom_blank(). In the code below, df is your data as it stands just before being piped into ggplot, that is, after the gather() and mutate() steps.
ggplot(df, aes(x=call_reason, y=value)) +
# geom_col is equivalent to geom_bar(stat = "identity")
geom_col() +
# geom_blank includes data for position scale training, but is not rendered
geom_blank(data = data.frame(
# value for first two facets is max, last facet is 1
value = c(rep(max(df$value), 2), 1),
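# dummy category; levels() assumes call_reason is a factor
# (since R 4.0.0 data.frame() keeps strings as character, so convert it first)
call_reason = levels(df$call_reason)[1],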
# distribute over facets
facet_group = levels(df$facet_group)
)) +
coord_flip() +
# scales are set to "free_x" to have them vary independently
# it doesn't really, since we've set a geom_blank
facet_grid(. ~ facet_group, scales = "free_x")
As long as your column names remain the same, this should work.
EDIT:
To reorder the call_reason variable, you could add the following in your pipe that goes into ggplot:
df %>%
gather(key = facet_group, value = value, -call_reason) %>%
mutate(facet_group = factor(facet_group,
levels=c('all_records','problematic_records','performance')),
# In particular the following bit:
call_reason = factor(call_reason, levels(call_reason)[order(value[facet_group == "performance"])]))
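A self-contained sketch of the same geom_blank() idea, written so it does not depend on call_reason being a factor. Several things here are assumptions rather than part of the answer above: pivot_longer() replaces gather(), the helper names facet_levels, count_max and limits are hypothetical, and the bars are made horizontal by putting value on x (ggplot2 3.3.0 or later) instead of using coord_flip(), so that scales = "free_x" acts directly on the value axis.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

df <- data.frame(
  call_reason = c("a", "b", "c", "d"),
  all_records = c(100, 200, 300, 400),
  problematic_records = c(80, 60, 100, 80)
) %>%
  mutate(performance = round(problematic_records / all_records, 2))

facet_levels <- c("all_records", "problematic_records", "performance")

df_long <- df %>%
  pivot_longer(-call_reason, names_to = "facet_group", values_to = "value") %>%
  mutate(facet_group = factor(facet_group, levels = facet_levels))

# Shared upper limit for the two count facets, computed from the data
count_max <- max(df_long$value[df_long$facet_group != "performance"])

# One invisible point per facet that pins the upper end of its value axis
limits <- data.frame(
  call_reason = df$call_reason[1],
  facet_group = factor(facet_levels, levels = facet_levels),
  value       = c(count_max, count_max, 1)
)

p <- ggplot(df_long, aes(x = value, y = call_reason)) +
  geom_col() +
  geom_blank(data = limits) +
  facet_grid(. ~ facet_group, scales = "free_x")
```

Only facet_levels needs editing when the template is applied to a data set with different column names.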

Change the boxplot background based on x-variable (ggplot2)

I want to change the background of a boxplot based on x-variables. My code is very simple:
ggplot(data = df, aes(x = variable, y = value)) +
geom_boxplot()
So, I have 17 x-variables and I generate 17 boxplots in the same picture. I want to change to grey the background of boxplots 1 to 4 and 11 to 14. I don't know how to do that.
Thanks.
You need a grouping factor for this. In the example below I created a new column (tales) in df and mapped it to the fill of the boxes.
library(tidyverse)
df <- data.frame(variable = rep(base::LETTERS[1:17], 5),
value = runif(17*5, 0, 100))
df <- df %>%
dplyr::mutate(tales = rep(c(rep("x", 4), rep("y", 11-4), rep("w", 17-11)), 5))
ggplot(data = df, aes(x = variable, y = value)) +
geom_boxplot(aes(fill = tales))
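The code above colours the boxes themselves. If the panel background behind boxes 1 to 4 and 11 to 14 should be grey instead, one possible sketch is to draw rectangles under the boxplots. This is an alternative, not the method of the answer above; the bands data frame is a hypothetical helper. On a discrete axis the categories sit at integer positions 1, 2, 3, ..., so a band covering boxes 1 to 4 runs from 0.5 to 4.5.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(
  variable = rep(LETTERS[1:17], 5),
  value    = runif(17 * 5, 0, 100)
)

# Bands behind boxes 1-4 and 11-14, in discrete-axis positions
bands <- data.frame(xmin = c(0.5, 10.5), xmax = c(4.5, 14.5))

p <- ggplot(df, aes(x = variable, y = value)) +
  # An invisible first layer so the x scale is set up as discrete
  # before the numeric rectangle coordinates are added
  geom_blank() +
  geom_rect(
    data = bands,
    aes(xmin = xmin, xmax = xmax, ymin = -Inf, ymax = Inf),
    inherit.aes = FALSE, fill = "grey80"
  ) +
  geom_boxplot()
```

The rectangle layer comes before geom_boxplot() so the boxes are drawn on top of the shading.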

Some bars don't reorder in ggplot

My dataframe:
data <- data.frame(commodity = c("A", "A", "B", "C", "C", "D"),
cost = c(1809065, 348456, 203686, 5966690, 172805, 3176424))
data
commodity cost
1 A 1809065
2 A 348456
3 B 203686
4 C 5966690
5 C 172805
6 D 3176424
Next I plot a barplot with reorder:
library(tidyverse)
data %>%
ggplot(aes(x = reorder(factor(commodity), cost), y = cost)) +
geom_bar(stat = "identity", fill = "steelblue3")
What happens next is that most bars are ordered just like I want, but a few aren't. Here's an image of my problematic plot:
You can try:
library(tidyverse)
data %>%
ggplot(aes(x = reorder(commodity, cost, sum), y = cost)) +
geom_col(fill = "steelblue3")
Change the default summary function of reorder from mean to sum. The order then matches the stacked bar heights that ggplot draws, since bars for the same commodity are stacked and their total height is the sum. Note that geom_col is preferred over geom_bar(stat = "identity"). If you need decreasing order, try reorder(commodity, -cost, sum) or supply your own function such as function(x) -sum(x).
By default, reorder() orders by the mean value of each group, as explained in its help page. Jimbou's solution is better, but you could also do this in a different way by aggregating the data before plotting and using geom_col instead:
data %>%
group_by(commodity) %>%
summarise(cost = sum(cost)) %>%
ggplot(aes(x = reorder(factor(commodity), cost), y = cost)) +
geom_col(fill = "steelblue3")
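The difference between the two summary functions can be checked without plotting by inspecting the factor levels that reorder() produces for the question's data. A small sketch in base R; the variable names by_mean, by_sum and by_sum_desc are only for illustration.

```r
data <- data.frame(
  commodity = c("A", "A", "B", "C", "C", "D"),
  cost      = c(1809065, 348456, 203686, 5966690, 172805, 3176424)
)

# Default: levels ordered by the mean cost per commodity
by_mean <- levels(reorder(data$commodity, data$cost))

# Ordered by the total cost per commodity, which is what the stacked bars show
by_sum <- levels(reorder(data$commodity, data$cost, sum))

# Decreasing total cost: negate the values before summing
by_sum_desc <- levels(reorder(data$commodity, -data$cost, sum))

print(by_mean)      # "B" "A" "C" "D"
print(by_sum)       # "B" "A" "D" "C"
print(by_sum_desc)  # "C" "D" "A" "B"
```

C and D swap places because C's mean (about 3.07 million) is below D's single value (about 3.18 million), while C's total (about 6.14 million) is above it.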
