ggplot2 - geom_line of cumulative counts of factor levels - r

I want to plot the cumulative counts of level OK of factor X (*), over time (column Date). I am not sure what is the best strategy, whether or not I should create a new data frame with a summary column, or if there is a ggplot2 way of doing this.
Sample data
DF <- data.frame(
Date = as.Date(c("2018-01-01", "2018-01-01", "2018-02-01", "2018-03-01", "2018-03-01", "2018-04-01") ),
X = factor(rep("OK", 6), levels = c("OK", "NOK")),
Group = factor(c(rep("A", 4), "B", "B"))
)
DF <- rbind(DF, list(as.Date("2018-02-01"), factor("NOK"), "A"))
From similar questions I tried this:
ggplot(DF, aes(Date, col = Group)) + geom_line(stat='bin')
Using stat='count' (as the answer to this question) is even worse:
ggplot(DF, aes(Date, col = Group)) + geom_line(stat='count')
which shows the counts for factor levels (*), but not the accumulation over time.
Desperate measure - count with table
I tried creating a new data frame with counts using table like this:
cum <- as.data.frame(table(DF$Date, DF$Group))
ggplot(cum, aes(Var1, cumsum(Freq), col = Var2, group = Var2)) +
geom_line()
Is there a way to do this with ggplot2? Do I need to create a new column with cumsum? If so, how should I cumsum the factor levels, by date?
(*) Note: I could just filter the data frame to keep only the intended level with DF[DF$X == "OK", ], but I am sure someone can find a smarter solution.

One option, using dplyr to compute the running count per group before plotting with ggplot2:
library(dplyr)
library(ggplot2)
DF %>% group_by(Group) %>%
arrange(Date) %>%
mutate(Value = cumsum(X=="OK")) %>%
ggplot(aes(Date, y=Value, group = Group, col = Group)) + geom_line()
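The same running count can be sketched without dplyr. This is an alternative, not part of the answer above: it assumes base R's ave() is acceptable for the per-group cumsum, and it rebuilds the question's sample data with the NOK row included directly so that the block runs on its own.

```r
library(ggplot2)

# Sample data from the question, with the NOK row included directly
DF <- data.frame(
  Date = as.Date(c("2018-01-01", "2018-01-01", "2018-02-01", "2018-03-01",
                   "2018-03-01", "2018-04-01", "2018-02-01")),
  X = factor(c(rep("OK", 6), "NOK"), levels = c("OK", "NOK")),
  Group = factor(c(rep("A", 4), "B", "B", "A"))
)

# Sort by date so the running total accumulates in time order
DF <- DF[order(DF$Date), ]

# Running count of "OK" rows within each Group (NOK rows add 0)
DF$Value <- ave(as.integer(DF$X == "OK"), DF$Group, FUN = cumsum)

p <- ggplot(DF, aes(Date, Value, col = Group, group = Group)) +
  geom_line()
print(p)
```

Because NOK rows contribute 0 rather than being dropped, the line for a group stays flat on dates where only NOK was recorded.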

Related

Arrange weekdays starting on Sunday

Hi everyone!
How can I arrange weekdays, starting on Sunday, in R? I got the weekdays using the weekdays() function, but the days appear in a seemingly random order (image attached) and I can't find a way to sort them. I tried the arrange function, but I guess it only works with numeric values. A bar chart looks very weird starting on Friday. This is what the code looks like:
my_dataset <- my_dataset %>%
mutate(weekDay = weekdays(Date))
my_dataset %>%
group_by(weekDay) %>%
summarise(mean_steps = mean(TotalSteps)) %>%
ggplot(aes(x = weekDay, y = mean_steps))+
geom_bar(stat = "identity")
Thanks!
"I tried the arrange function, but I guess it only works with numeric values."
Your weekDay vector is probably of class character, which ggplot arranges in alphabetical order. The solution is to convert this character vector into a factor.
There are several ways to get the x-axis in the order you would like to see. All of them involve converting weekDay into a factor.
To stay close to your example, I first created a data frame with weekdays and some data. Both are generated randomly, so a seed is set to make the code reproducible.
One method is to create the data frame of summaries and then define weekDay in it as a factor with explicit levels.
This can also be done within the ggplot call when creating the aesthetics.
library(tidyverse)
set.seed(111)
myData <- data.frame(
weekDay = sample(weekdays(Sys.Date() + 0:6), 100, replace = TRUE),
TotalSteps = sample(1000:8000, 100)
)
myData %>%
group_by(weekDay) %>%
summarise(mean_steps = mean(TotalSteps)) -> DF # new data.frame
# The following defines weekDay as a factor and sets the order of
# its levels, which ggplot uses to construct the x-axis.
# The level names must match what weekdays() returns in your locale
# (German here); values that match no level become NA.
DF$weekDay <- factor(DF$weekDay, levels = c(
"Sonntag", "Montag",
"Dienstag", "Mittwoch",
"Donnerstag", "Freitag",
"Samstag"
))
ggplot(DF, aes(x = weekDay, y = mean_steps)) +
geom_bar(stat = "identity") +
labs(x="")
# the factor can also be defined within the ggplot-call
myData %>%
group_by(weekDay) %>%
summarise(mean_steps = mean(TotalSteps)) %>%
ggplot(aes(x = factor(weekDay, levels = c(
"Sonntag", "Montag",
"Dienstag", "Mittwoch",
"Donnerstag", "Freitag",
"Samstag"
)), y = mean_steps)) +
geom_bar(stat = "identity") +
labs(x="")
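The hard-coded level names above only work in a German locale. A minimal, locale-independent sketch is to derive the levels from weekdays() itself, starting from a date known to be a Sunday. The name week_levels and the use of aggregate() are choices made for this illustration, not part of the answer above.

```r
library(ggplot2)

# 2023-01-01 was a Sunday, so this gives Sunday..Saturday
# spelled the way the current locale spells them
week_levels <- weekdays(as.Date("2023-01-01") + 0:6)

set.seed(111)
myData <- data.frame(
  weekDay = sample(week_levels, 100, replace = TRUE),
  TotalSteps = sample(1000:8000, 100)
)

# Mean steps per weekday (base R equivalent of group_by + summarise)
means <- aggregate(TotalSteps ~ weekDay, data = myData, FUN = mean)

# Fix the level order so the x-axis starts on Sunday
means$weekDay <- factor(means$weekDay, levels = week_levels)

p <- ggplot(means, aes(x = weekDay, y = TotalSteps)) +
  geom_col() +
  labs(x = "", y = "mean steps")
print(p)
```

Because the levels come from the same function that produced the data, no value can fall outside the level set and turn into NA.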

Iteratively plotting all columns in ggplot

I have a dataframe of temperatures where each column represents a year from 1996 to 2015 and the rows are daily data from 1-Jul to 31-Oct:
head(df)
[![Dataframe head][1]][1]
I am trying to create a line plot with x = DAYS and y = temperature, with one line per year. When I use DAYS in the loop, either with aes() or aes_string(), it doesn't produce anything:
iterator <- c(colnames(df))[-1]
g <- ggplot(df, aes_string(x = 'DAY'))
for (i in iterator){
g <- g+ geom_line(aes_string(y=i))
}
print(g)
So I added an index column, which is just the integers from 1 to 123. Now the same code plots a bunch of lines, but they look very strange:
df$index <- c(1:123)
iterator <- c(colnames(df))[-1]
iterator <- iterator[-21]
g <- ggplot(df, aes_string(x = 'index'))
for (i in iterator){
g <- g+ geom_line(aes_string(y=i))
}
print(g)
[![Final plot][2]][2]
As you can see, I get one line per column name, and all the column names are stacked above each other. This has compressed the vertical axis so much that the variations in temperature are not visible. I would like the y-axis to go from 50 to 100, with one line per column, all drawn on the same scale. How do I do that?
[1]: https://i.stack.imgur.com/ruF11.png
[2]: https://i.stack.imgur.com/gAvMe.png
I agree with Andrew's solution, with one minor change: you have to remove the "df" on the 3rd line, because it is already supplied at the start of the pipe.
df %>%
tidyr::pivot_longer(!DAYS, names_to = "column", values_to = "temp") %>%
ggplot(aes(x = DAYS, y = temp, group = column)) +
geom_line()
I think you could rearrange your data frame, e.g. using the tidyr package, so that you have a data frame with "year", "day" and "temp" columns:
library(ggplot2)
library(tidyr)
year1 = c(5,6,4,5)
year2 = c(6,5,5,6)
year3 = c(3,4,3,4)
date = c("a", "b", "c", "d")
data = data.frame(date, year1, year2, year3)
data2 = gather(data , "year", "temp", -date)
Then, you can easily plot the temperature per year.
ggplot(data2, aes(x = date, y = temp, group = year, color = year))+
geom_path()
If you're doing something with loops in R, especially with ggplot2, you are probably doing something wrong. I'm not 100% sure why you're looping at all, when you probably want to do something more like,
df %>%
tidyr::pivot_longer(!DAYS, names_to = "column", values_to = "temp") %>%
ggplot(df, aes(x = day, y = temp, group = column)) +
geom_line()
but without a reprex / data set I can't be sure if that's what you want.
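Since the question comes without data, here is a minimal sketch of the reshape-then-plot idea on made-up values. The data frame temps, its three year columns and the uniform random temperatures are all assumptions for illustration; only the DAYS column name and the 123 rows (1 July to 31 October) follow the question.

```r
library(tidyr)
library(ggplot2)

set.seed(1)
# Hypothetical wide data: one row per day, one column per year
temps <- data.frame(
  DAYS = 1:123,
  `1996` = runif(123, 50, 100),
  `1997` = runif(123, 50, 100),
  `1998` = runif(123, 50, 100),
  check.names = FALSE  # keep the numeric-looking column names as they are
)

# Long format: one row per (day, year) pair
long <- pivot_longer(temps, !DAYS, names_to = "year", values_to = "temp")

# A single geom_line() draws one line per year, all on a shared y-axis
p <- ggplot(long, aes(x = DAYS, y = temp, colour = year, group = year)) +
  geom_line()
print(p)
```

With the data in long format, every year shares one continuous y scale, which avoids the stacked axis labels described in the question.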

Using ggplot to group in two different ways

I have data that looks kinda like this:
id = rep(1:33,3)
year = rep(1:3,33)
group = sample(c(1:3),99, replace=T)
test_result = sample(c(TRUE,FALSE), size=99, replace = T)
df = data.frame(id, year, group, test_result)
df$year = as.factor(year)
df$group = as.factor(group)
My goal is to visualize it so that I can see how group number and year relate to test_result.
df %>%
group_by(id,year) %>%
summarize(x=sum(test_result)) %>%
ggplot() +
geom_histogram(aes(fill = year,
x = x),
binwidth = 1,
position='dodge') +
theme_minimal()
gets me almost all the way there. What I want is to be able to add something like facet_wrap(group~.) to the end of this to show how these change by group but obviously group is not part of the aggregated dataframe.
Right now my best solution is just to show multiple plots like
df %>% filter(group==1) %>% # Replace group number here
group_by(id,year) %>%
summarize(x=sum(test_result)) %>%
ggplot() +
geom_histogram(aes(fill = year,
x = x),
binwidth = 1,
position='dodge') +
theme_minimal()
but I'd love to figure out how to put them all in one figure and I'm wondering if maybe the way to do that is to put more of the grouping logic into ggplot?
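One possible direction, offered as a sketch and not as an answer from the original thread, is to keep group in the aggregation so that the column is still available for faceting. Note that this changes what is summed: x becomes the number of TRUE results per id, year and group, which may or may not be the intended quantity. The seed and the name agg are assumptions added for this example.

```r
library(dplyr)
library(ggplot2)

set.seed(42)
df <- data.frame(
  id = rep(1:33, 3),
  year = factor(rep(1:3, 33)),
  group = factor(sample(1:3, 99, replace = TRUE)),
  test_result = sample(c(TRUE, FALSE), size = 99, replace = TRUE)
)

# Keeping group in group_by() means it survives into the summary
agg <- df %>%
  group_by(id, year, group) %>%
  summarize(x = sum(test_result), .groups = "drop")

# One panel per group, bars dodged by year as in the original plot
p <- ggplot(agg) +
  geom_histogram(aes(x = x, fill = year),
                 binwidth = 1,
                 position = "dodge") +
  facet_wrap(~ group) +
  theme_minimal()
print(p)
```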

How to plot multiple facets histogram with ggplot in r?

I have a dataframe structured like this:
Elem  Category  SEZa  SEZb  SEZc
A     ONE       1     3     4
B     TWO       4     5     6
I want to plot three histograms in three different facets (SEZa, SEZb, SEZc) with ggplot, where the x values are the category values (ONE and TWO) and the y values are the numbers in columns SEZa, SEZb and SEZc.
How can I do this? Thank you for your suggestions!
Assuming df is your data.frame, I would first convert it from wide format to long format:
new_df <- reshape2::melt(df, id.vars = c("Elem", "Category"))
And then make the plot using geom_col() instead of geom_histogram() because it seems you've precomputed the y-values and wouldn't need ggplot to calculate these values for you.
ggplot(new_df, aes(x = Category, y = value, fill = Elem)) +
geom_col() +
facet_grid(variable ~ .)
I think that what you are looking for is something like this:
library(ggplot2)
library(reshape2)
df <- data.frame(Category = c("One", "Two"),
SEZa = c(1, 4),
SEZb = c(3, 5),
SEZc = c(4, 6))
df <- melt(df)
ggplot(df, aes(x = Category, y = value)) +
geom_col(aes(fill = variable)) +
facet_grid(variable ~ .)
My inspiration was:
http://felixfan.github.io/stacking-plots-same-x/

R: barplot {base} complete missing integer values on x axis

I have a barplot of counts per year. Year 2006 is missing from my dataset, but I want to display it as 0 on the x axis. I think that should be possible by converting df$year to a factor and setting the factor levels, following an example here: R - how to make barplot plot zeros for missing values over the data range? But I can't make it work.
df<-data.frame(year = c(2005,2007,2008, 2009),
area = c(10,20,30,15))
barplot(df$area)
My attempt, which does not work:
barplot(df$area,
names.arg = factor(df$year,
levels = 2005:2009))
We can merge with a data frame containing the full range of years, replace the NA elements with 0, and then draw the barplot:
df1 <- merge(data.frame(year = min(df$year):max(df$year)), df, all.x=TRUE)
df1$area[is.na(df1$area)] <- 0
barplot(setNames(df1$area, df1$year))
This can also be done with tidyverse
library(tidyverse)
df %>%
complete(year = min(year):max(year), fill = list(area = 0)) %>%
ggplot(., aes(year, area)) +
geom_bar(stat = 'identity') +
theme_bw()
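The factor idea from the question can also be made to work in base R. The attempt there only relabelled the four existing bars, because names.arg changes labels and does not add bars. A minimal sketch, assuming tapply() is acceptable for the aggregation (the names yr and totals are made up for this example): aggregate over a factor with the full set of levels, so the missing year gets its own empty cell.

```r
df <- data.frame(year = c(2005, 2007, 2008, 2009),
                 area = c(10, 20, 30, 15))

# Factor with every year in the range as a level, including 2006
yr <- factor(df$year, levels = 2005:2009)

# tapply() returns one cell per level; levels with no data come back as NA
totals <- tapply(df$area, yr, sum)
totals[is.na(totals)] <- 0

# The names of totals become the bar labels
barplot(totals)
```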
