R: barplot {base} complete missing integer values on x axis

I have a barplot of counts per year. Year 2006 is missing from my dataset, but I want to display it as 0 on the x axis. I think that should be possible by converting df$year to a factor and setting the factor levels, following the example in "R - how to make barplot plot zeros for missing values over the data range?", but I can't make it work.
df<-data.frame(year = c(2005,2007,2008, 2009),
area = c(10,20,30,15))
barplot(df$area)
My attempt, which does not work:
barplot(df$area,
names.arg = factor(df$year,
levels = 2005:2009))

We can merge with a full sequence of years, replace the NA elements with 0, and then draw the barplot:
df1 <- merge(data.frame(year = min(df$year):max(df$year)), df, all.x=TRUE)
df1$area[is.na(df1$area)] <- 0
barplot(setNames(df1$area, df1$year))
This can also be done with tidyverse
library(tidyverse)
df %>%
complete(year = min(year):max(year), fill = list(area = 0)) %>%
ggplot(., aes(year, area)) +
geom_bar(stat = 'identity') +
theme_bw()
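A further base R variant, as a sketch: the attempt in the question fails because names.arg only labels the bars that already exist, so the vector of heights itself needs an entry for 2006. Assuming there is one row per year, match() can align the areas to the full range of years. The names all_years, area_full and heights are illustrative and not from the original posts.

```r
df <- data.frame(year = c(2005, 2007, 2008, 2009),
                 area = c(10, 20, 30, 15))

# full range of years that should appear on the x axis
all_years <- min(df$year):max(df$year)

# look up each year in df; years without a row yield NA
area_full <- df$area[match(all_years, df$year)]
area_full[is.na(area_full)] <- 0

# named vector: the names become the bar labels
heights <- setNames(area_full, all_years)
barplot(heights)
```

This avoids the merge step, at the cost of assuming that each year occurs at most once in df.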

Related

Arrange weekdays starting on Sunday

How can I arrange weekdays, starting on Sunday, in R? I got the weekdays using the weekdays() function, but the days appear in a seemingly random order (image attached) and I can't find a way to sort them. I tried the arrange function, but I guess it only works with numeric values. A bar chart looks very weird starting on Friday. This is what the code looks like:
my_dataset <- my_dataset %>%
mutate(weekDay = weekdays(Date))
my_dataset %>%
group_by(weekDay) %>%
summarise(mean_steps = mean(TotalSteps)) %>%
ggplot(aes(x = weekDay, y = mean_steps)) +
geom_bar(stat = "identity")
Thanks!
I tried the arrange function, but I guess it only works with numeric values.
Your weekDay vector is probably of class character, which ggplot arranges in alphabetical order. The solution is to convert this character vector into a factor.
There are several ways to get the x-axis into the order you would like to see. All of them involve converting weekDay into a factor.
To stay close to your example, I first created a data frame with weekdays and some data. Both are generated randomly, so a seed is set to make the code reproducible.
One method is to create the data frame of summaries and then define weekDay in it as a factor with explicit levels.
This can also be done within the ggplot call when creating the aesthetics.
library(tidyverse)
set.seed(111)
myData <- data.frame(
weekDay = sample(weekdays(Sys.Date() + 0:6), 100, replace = TRUE),
TotalSteps = sample(1000:8000, 100)
)
myData %>%
group_by(weekDay) %>%
summarise(mean_steps = mean(TotalSteps)) -> DF # new data.frame
# The following defines weekDay as a factor and also sets
# the sequence of factor levels. ggplot uses this sequence
# to construct the x-axis. The level names must match what
# weekdays() returns in your locale (the original answer
# used the German names "Sonntag", "Montag", ...).
DF$weekDay <- factor(DF$weekDay, levels = c(
  "Sunday", "Monday",
  "Tuesday", "Wednesday",
  "Thursday", "Friday",
  "Saturday"
))
ggplot(DF, aes(x = weekDay, y = mean_steps)) +
  geom_bar(stat = "identity") +
  labs(x = "")
# The factor can also be defined within the ggplot call.
myData %>%
  group_by(weekDay) %>%
  summarise(mean_steps = mean(TotalSteps)) %>%
  ggplot(aes(x = factor(weekDay, levels = c(
    "Sunday", "Monday",
    "Tuesday", "Wednesday",
    "Thursday", "Friday",
    "Saturday"
  )), y = mean_steps)) +
  geom_bar(stat = "identity") +
  labs(x = "")
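The names returned by weekdays() depend on the locale, so hard-coded level names can silently turn into NA on another machine. One possible workaround, sketched here with base R only, is to derive the levels from a date that is known to be a Sunday (2023-01-01). The names sunday_first, dates and weekDay are illustrative, and the random dates merely stand in for real data.

```r
# 2023-01-01 was a Sunday, so this yields the seven weekday names,
# Sunday first, in whatever language the current locale uses
sunday_first <- weekdays(as.Date("2023-01-01") + 0:6)

set.seed(111)
dates <- Sys.Date() + sample(0:27, 100, replace = TRUE)

# factor with explicit levels: tables and plots follow this order
weekDay <- factor(weekdays(dates), levels = sunday_first)

table(weekDay)
```

The same sunday_first vector could be passed as the levels argument in either of the two factor() calls above.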

Iteratively plotting all columns in ggplot

I have a dataframe of temperatures where each column represents a year from 1996 to 2015 and the rows are daily values from 1-Jul to 31-Oct:
head(df)
[![Dataframe head][1]][1]
I am trying to create a line plot with x = DAYS and y = temp per year. When I use DAYS in the loop, either with aes() or aes_string(), it doesn't produce anything:
iterator <- c(colnames(df))[-1]
g <- ggplot(df, aes_string(x = 'DAY'))
for (i in iterator){
g <- g+ geom_line(aes_string(y=i))
}
print(g)
So I added an index column, which is just the integers from 1 to 123. Now the same code plots a bunch of lines, but they look very strange:
df$index <- c(1:123)
iterator <- c(colnames(df))[-1]
iterator <- iterator[-21]
g <- ggplot(df, aes_string(x = 'index'))
for (i in iterator){
g <- g+ geom_line(aes_string(y=i))
}
print(g)
[![Final plot][2]][2]
As you can see, I get one line per column name, and all the column names are stacked above each other. This has compressed the vertical axis so much that the variations in temperature are not visible. I want my y-axis to go from 50 to 100, with one line per column, all on the same scale. How do I do that?
[1]: https://i.stack.imgur.com/ruF11.png
[2]: https://i.stack.imgur.com/gAvMe.png
I agree with Andrew's solution. Just a minor change: you have to remove the "df" on the 3rd line, as you already declared it at the beginning.
df %>%
tidyr::pivot_longer(!DAYS, names_to = "column", values_to = "temp") %>%
ggplot(aes(x = DAYS, y = temp, group = column)) +
geom_line()
I think you could rearrange your data frame, e.g. using the tidyr package, so that you have a data frame with "year", "day" and "temp" columns:
library(ggplot2)
library(tidyr)
year1 = c(5,6,4,5)
year2 = c(6,5,5,6)
year3 = c(3,4,3,4)
date = c("a", "b", "c", "d")
data = data.frame(date, year1, year2, year3)
data2 = gather(data , "year", "temp", -date)
Then, you can easily plot the temperature per year.
ggplot(data2, aes(x = date, y = temp, group = year, color = year))+
geom_path()
If you're doing something with loops in R, especially with ggplot2, you are probably doing something wrong. I'm not 100% sure why you're looping at all, when you probably want to do something more like,
df %>%
tidyr::pivot_longer(!DAYS, names_to = "column", values_to = "temp") %>%
ggplot(df, aes(x = day, y = temp, group = column)) +
geom_line()
but without a reprex / data set I can't be sure if that's what you want.
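Since the question includes no reproducible data, here is a minimal sketch of the reshaping approach with made-up temperatures. The data frame temps, its columns Y1996 to Y1998 and all values are invented for illustration. coord_cartesian(ylim = c(50, 100)) is one way to restrict the visible y range to the 50 to 100 window the question asks for.

```r
library(tidyr)
library(ggplot2)

set.seed(1)
# hypothetical wide data: one column per year, one row per day
temps <- data.frame(
  DAYS  = 1:10,
  Y1996 = runif(10, 60, 90),
  Y1997 = runif(10, 60, 90),
  Y1998 = runif(10, 60, 90)
)

# long format: one row per (day, year) combination
long <- pivot_longer(temps, !DAYS, names_to = "year", values_to = "temp")

# one line per year, all sharing the same y scale
p <- ggplot(long, aes(x = DAYS, y = temp, colour = year, group = year)) +
  geom_line() +
  coord_cartesian(ylim = c(50, 100))
print(p)
```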

programmatically setting individual axis limits in facets

I need help setting individual x-axis limits on different facets, as described below.
A programmatic approach is preferred, since I will apply the same template to different data sets.
The first two facets should have the same x-axis limits (to have comparable bars).
The last facet's (performance) limits should be between 0 and 1, since it is calculated as a percentage.
I have seen this and some other related questions, but couldn't apply them to my data.
Thanks in advance.
library(tidyverse) # provides %>%, mutate(), gather() and ggplot()
df <-
data.frame(
call_reason = c("a","b","c","d"),
all_records = c(100,200,300,400),
problematic_records = c(80,60,100,80))
df <- df %>% mutate(performance = round(problematic_records/all_records, 2))
df
call_reason all_records problematic_records performance
a 100 80 0.80
b 200 60 0.30
c 300 100 0.33
d 400 80 0.20
df %>%
gather(key = facet_group, value = value, -call_reason) %>%
mutate(facet_group = factor(facet_group,
levels=c('all_records','problematic_records','performance'))) %>%
ggplot(aes(x=call_reason, y=value)) +
geom_bar(stat="identity") +
coord_flip() +
facet_grid(. ~ facet_group)
So here is one way to go about it with facet_grid(scales = "free_x"), in combination with a geom_blank(). Consider df to be your df at the moment before piping it into ggplot.
ggplot(df, aes(x=call_reason, y=value)) +
# geom_col is equivalent to geom_bar(stat = "identity")
geom_col() +
# geom_blank includes data for position scale training, but is not rendered
geom_blank(data = data.frame(
# value for first two facets is max, last facet is 1
value = c(rep(max(df$value), 2), 1),
# dummy category: any existing call_reason value will do
# (works whether call_reason is a character vector or a factor)
call_reason = df$call_reason[1],
# distribute over facets
facet_group = levels(df$facet_group)
)) +
coord_flip() +
# scales are set to "free_x" so that each facet can have its own limits;
# the first two facets still match, because geom_blank gives them the same maximum
facet_grid(. ~ facet_group, scales = "free_x")
As long as your column names remain the same, this should work.
EDIT:
To reorder the call_reason variable, you could add the following in your pipe that goes into ggplot:
df %>%
gather(key = facet_group, value = value, -call_reason) %>%
mutate(facet_group = factor(facet_group,
levels=c('all_records','problematic_records','performance')),
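# In particular the following bit, which orders call_reason by its performance value
# (unique() is used so that this also works when call_reason is a character vector):
call_reason = factor(call_reason, unique(call_reason)[order(value[facet_group == "performance"])]))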
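For reference, a self-contained sketch of the geom_blank idea. It is a variant, not the answer's exact code: instead of coord_flip() it maps value to x directly, which assumes ggplot2 3.3.0 or later (where geom_col detects the horizontal orientation), and it uses pivot_longer() in place of gather(). The names groups, long and limits are illustrative.

```r
library(tidyr)
library(ggplot2)

df <- data.frame(
  call_reason = c("a", "b", "c", "d"),
  all_records = c(100, 200, 300, 400),
  problematic_records = c(80, 60, 100, 80)
)
df$performance <- round(df$problematic_records / df$all_records, 2)

groups <- c("all_records", "problematic_records", "performance")
long <- pivot_longer(df, -call_reason,
                     names_to = "facet_group", values_to = "value")
long$facet_group <- factor(long$facet_group, levels = groups)

# one invisible row per facet that carries the upper limit for that facet:
# the overall maximum for the two count facets, 1 for performance
limits <- data.frame(
  call_reason = long$call_reason[1],
  facet_group = factor(groups, levels = groups),
  value = c(rep(max(long$value), 2), 1)
)

p <- ggplot(long, aes(x = value, y = call_reason)) +
  geom_col() +
  geom_blank(data = limits) +   # trains the x scale, draws nothing
  facet_grid(. ~ facet_group, scales = "free_x")
print(p)
```

Because limits is computed from the data, the same template should carry over to other data sets with the same three columns.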

Plotting a factor column against two columns in one bar plot with ggplot

I want to make a bar plot using ggplot where one column is a factor of 0/1 and the other two columns represent count
my dataset looks as follows:
id in out
0 30036 547148
1 213176 23902
I managed to plot id against in and out separately, but can't figure out how to merge both plots so that 0 has two bars (in/out) representing the counts, and the same for 1.
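# The backticks keep id from being parsed as a number, so it is read as a factor.
# read.table() also renames the reserved word "in" to "in." (check.names = TRUE).
df <- read.table(header = TRUE, stringsAsFactors = TRUE,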
text = "id in out
`0` 30036 547148
`1` 213176 23902")
library(tidyverse)
df %>%
gather(direction, value, -id) %>%
ggplot(aes(id, value, fill = direction)) +
geom_col(position = "dodge")
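The same reshaping can be sketched with pivot_longer(), which the tidyr documentation recommends over gather() for new code. Here the data frame is built directly instead of parsed from text. The names counts and long are illustrative, and check.names = FALSE is used so that the column can keep the name "in".

```r
library(tidyr)
library(ggplot2)

# check.names = FALSE keeps the column name "in", which is a reserved word
counts <- data.frame(
  id = factor(c(0, 1)),
  `in` = c(30036, 213176),
  out = c(547148, 23902),
  check.names = FALSE
)

# long format: one row per (id, direction) pair
long <- pivot_longer(counts, -id, names_to = "direction", values_to = "value")

# dodge places the in/out bars side by side for each id
p <- ggplot(long, aes(x = id, y = value, fill = direction)) +
  geom_col(position = "dodge")
print(p)
```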

ggplot2 - geom_line of cumulative counts of factor levels

I want to plot the cumulative counts of level OK of factor X (*), over time (column Date). I am not sure what is the best strategy, whether or not I should create a new data frame with a summary column, or if there is a ggplot2 way of doing this.
Sample data
DF <- data.frame(
Date = as.Date(c("2018-01-01", "2018-01-01", "2018-02-01", "2018-03-01", "2018-03-01", "2018-04-01") ),
X = factor(rep("OK", 6), levels = c("OK", "NOK")),
Group = factor(c(rep("A", 4), "B", "B"))
)
DF <- rbind(DF, list(as.Date("2018-02-01"), factor("NOK"), "A"))
From similar questions I tried this:
ggplot(DF, aes(Date, col = Group)) + geom_line(stat='bin')
Using stat='count' (as in the answer to this question) is even worse:
ggplot(DF, aes(Date, col = Group)) + geom_line(stat='count')
which shows the counts for factor levels (*), but not the accumulation over time.
Desperate measure - count with table
I tried creating a new data frame with counts using table like this:
cum <- as.data.frame(table(DF$Date, DF$Group))
ggplot(cum, aes(Var1, cumsum(Freq), col = Var2, group = Var2)) +
geom_line()
Is there a way to do this with ggplot2? Do I need to create a new column with cumsum? If so, how should I cumsum the factor levels, by date?
(*) Note: I could just filter the data frame to keep only the intended level with DF[DF$X == "OK", ], but I am sure someone can find a smarter solution.
One option using dplyr and ggplot2:
library(dplyr)
library(ggplot2)
DF %>% group_by(Group) %>%
arrange(Date) %>%
mutate(Value = cumsum(X=="OK")) %>%
ggplot(aes(Date, y=Value, group = Group, col = Group)) + geom_line()
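To check what the cumulative column contains, the same computation can be sketched in base R with ave(). This is an alternative to the dplyr pipeline above, not part of the original answer. The sample data from the question is rebuilt in a single data.frame() call.

```r
DF <- data.frame(
  Date = as.Date(c("2018-01-01", "2018-01-01", "2018-02-01", "2018-03-01",
                   "2018-03-01", "2018-04-01", "2018-02-01")),
  X = factor(c(rep("OK", 6), "NOK"), levels = c("OK", "NOK")),
  Group = factor(c(rep("A", 4), "B", "B", "A"))
)

# sort by date so that the running total follows time
DF <- DF[order(DF$Date), ]

# 1 for "OK", 0 otherwise, accumulated separately within each group
DF$Value <- ave(as.integer(DF$X == "OK"), DF$Group, FUN = cumsum)

print(DF)
```

The NOK row adds 0, so the count for group A stays flat at that date. The resulting Value column can be mapped to y in the same way as in the ggplot call above.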
