Include number of missing values in ggplot - r

I'm using ggplot2 to plot different time series (one for Alice, one for Bob, one for Eve), which have a different number of missing values.
require('ggplot2')
df3 <- data.frame(name=c(rep("Alice",10),rep("Bob",10),rep("Eve",10)),value=c(seq(1,10), seq(4,13), seq(5,14)), time=rep(seq(1,10),3))
df3$value[c(3,4,15,16,17,22,23,24,25)]<- NA
ggplot(data=df3, aes(time, value)) +
geom_line() +
geom_point() + facet_wrap(~ name, nrow=1)
I'd like to have the count of NAs displayed in each of the plots, e.g. as an overlay of a number (2 for Alice, 3 for Bob, 4 for Eve). Is there an elegant way to do this?

As @MLavoie suggested in the comments, generate a new data frame for the text labels, then work with that. This should work for your purposes:
require('ggplot2')
require('dplyr')
df3 <- data.frame(name=c(rep("Alice",10),rep("Bob",10),rep("Eve",10)),value=c(seq(1,10), seq(4,13), seq(5,14)), time=rep(seq(1,10),3))
df3$value[c(3,4,15,16,17,22,23,24,25)]<- NA
NAdf<-df3 %>%
group_by(name) %>%
summarise(ycoor=mean(value, na.rm=TRUE),
xcoor=mean(time, na.rm=TRUE),
num_NA=sum(is.na(value)))
ggplot(data=df3, aes(time, value)) +
geom_line() +
geom_point() +
geom_text(data=NAdf, aes(x=xcoor, y=ycoor, label=paste(num_NA,"for",name))) +
facet_wrap(~ name, nrow=1)
HTH
Updated
In response to the comment below: generally I find placing text labels into a faceted plot fairly finicky. In your example you could simply define the x and y coordinates as 5,5 for all panels like this:
NAdf<-df3 %>%
group_by(name) %>%
summarise(ycoor=5,
xcoor=5,
num_NA=sum(is.na(value)))
Then you could plot using the same code as before:
ggplot(data=df3, aes(time, value)) +
geom_line() +
geom_point() +
geom_text(data=NAdf, aes(x=xcoor, y=ycoor, label=paste(num_NA,"for",name))) +
facet_wrap(~ name, nrow=1)
The issue with this is that it isn't a generalized solution. In practice, though, I find you need to fiddle with your geom_text plotting coordinates each and every time to get it just right. Truth be told, @Sam Dickson's solution is very elegant for this particular problem.
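One way to reduce the coordinate fiddling described above is to pin the label to a panel corner with infinite coordinates, which ggplot2 maps to the panel edges regardless of the data range. The following is a minimal sketch, not part of the original answer: the name na_counts and the hjust/vjust offsets are illustrative assumptions you would likely tune.

```r
library(ggplot2)

# Same example data as in the question
df3 <- data.frame(
  name  = rep(c("Alice", "Bob", "Eve"), each = 10),
  value = c(1:10, 4:13, 5:14),
  time  = rep(1:10, 3)
)
df3$value[c(3, 4, 15, 16, 17, 22, 23, 24, 25)] <- NA

# One row per facet with the number of missing values (base R only)
cnt <- tapply(is.na(df3$value), df3$name, sum)
na_counts <- data.frame(name = names(cnt), num_NA = as.vector(cnt))

p <- ggplot(df3, aes(time, value)) +
  geom_line() +
  geom_point() +
  # x = -Inf, y = Inf pins the text to the top-left corner of every panel;
  # hjust/vjust nudge it inside the panel (values are assumptions)
  geom_text(data = na_counts,
            aes(x = -Inf, y = Inf, label = paste("NA =", num_NA)),
            hjust = -0.2, vjust = 1.5, inherit.aes = FALSE) +
  facet_wrap(~ name, nrow = 1)

print(p)
```

Because the position is tied to the panel edge rather than to data values, the same code should keep working if the series change range.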

One option is to add the count to the variable used in the faceting:
df3$NAs <- ave(df3$value, df3$name, FUN=function(x) sum(is.na(x)))
df3$name1 <- paste0(df3$name,' (NA = ',df3$NAs,')')
ggplot(data=df3, aes(time, value)) +
geom_line() +
geom_point() + facet_wrap(~ name1, nrow=1)

Related

Combine scale_x_upset with scale_y_break

I made an upset plot using the ggupset package and added a break to the y axis with scale_y_break from the ggbreak package.
However, when I add scale_y_break, the combination matrix under the bar plot disappears.
Is there a way to combine the combination matrix of the plot made without scale_y_break with the bar plot portion of a plot made with scale_y_break? I can't seem to be able to access the grobs of these plots or use any other workaround. If anyone could help, I would greatly appreciate it!
Example with scale_x_upset and scale_y_break:
df = tidy_movies %>% distinct(title, year, length, .keep_all=TRUE)
ggplot(df, aes(x=Genres)) + geom_bar() + scale_x_upset(n_intersections = 20)+ scale_y_break(breaks = c(750,1000))
I would like to combine the barplot portion of the plot created with:
df = tidy_movies %>% distinct(title, year, length, .keep_all=TRUE)
ggplot(df, aes(x=Genres)) + geom_bar() + scale_x_upset(n_intersections = 20)+ scale_y_break(breaks = c(750,1000))
with the combination matrix portion of the plot made with:
df = tidy_movies %>% distinct(title, year, length, .keep_all=TRUE)
ggplot(df, aes(x=Genres)) + geom_bar() + scale_x_upset(n_intersections = 20)
Thanks!

Box plot with ggplot2 using data from read.table

I am plotting a box plot that shows the height of students. However, I am unsure what to use as x and y. I have only measurements, so one should be height and the other the number of students that have that height.
x=N, y=Height
My code:
# Library
library(ggplot2)
library(tidyverse)
# 1. Read data (comma separated)
data = read.table(text = "184,180,183,184,184,160,173",
sep=",",stringsAsFactors=F, na.strings="unknown")
# 2. Print table
print(data)
# 3. Plot box plot
data %>%
pivot_longer(cols = everything()) %>%
ggplot(aes(x=value, y=value)) +
geom_boxplot() +
theme_classic() +
xlab("Students") +
ylab("Height") +
ggtitle("Height of students")
I think the best plot to represent a vector of data is a histogram. However, you could use a boxplot by creating a dummy factor that groups your observations, i.e.:
data %>%
pivot_longer(cols = everything()) %>%
mutate(type="student") %>%
ggplot(aes(x=type, y=value)) +
geom_boxplot() +
theme_classic() +
xlab("Students") +
ylab("Height") +
ggtitle("Height of students")
If you want a histogram (I think much better for your situation), you don't need the dummy factor and you could do something like:
data %>%
pivot_longer(cols = everything()) %>%
ggplot(aes(x=value)) +
geom_histogram() +
theme_classic() +
xlab("Height") +
ylab("Number of students") +
ggtitle("Height of students")
To use a boxplot correctly, you have to have one categorical variable and one continuous variable. Put the categorical one (e.g. male, female, etc.) on the x-axis and the continuous one on the y-axis (height in your case).

ggplot for each column in a data

I am missing some basics in R.
How do I make a plot for each column in a data frame?
I have tried making plots for each column separately. I was wondering if there was an easier way?
library(dplyr)
library(ggplot2)
data(economics)
#scatter plots
ggplot(economics,aes(x=pop,y=pce))+
geom_point()
ggplot(economics,aes(x=pop,y=psavert))+
geom_point()
ggplot(economics,aes(x=pop,y=uempmed))+
geom_point()
ggplot(economics,aes(x=pop,y=unemploy))+
geom_point()
#boxplots
ggplot(economics,aes(y=pce))+
geom_boxplot()
ggplot(economics,aes(y=pop))+
geom_boxplot()
ggplot(economics,aes(y=psavert))+
geom_boxplot()
ggplot(economics,aes(y=uempmed))+
geom_boxplot()
ggplot(economics,aes(y=unemploy))+
geom_boxplot()
All I'm looking for is one 2*2 grid of box plots and one 2*2 grid of scatter plots with ggplot2. I understand there is facet grid, which I have failed to understand how to implement. (I believe this can be achieved easily with par(mfrow()) and base R plots.) I saw an approach somewhere else that involved widening the data, which I didn't understand.
In cases like this the solution is almost always to reshape the data from wide to long format.
economics %>%
select(-date) %>%
tidyr::gather(variable, value, -pop) %>%
ggplot(aes(x = pop, y = value)) +
geom_point(size = 0.5) +
facet_wrap(~ variable, scales = "free_y")
economics %>%
tidyr::gather(variable, value, -date) %>%
ggplot(aes(y = value)) +
geom_boxplot() +
facet_wrap(~ variable, scales = "free_y")
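In current tidyr, gather() is superseded by pivot_longer(), so the same wide-to-long reshape can also be written as below. This is a sketch under that assumption (tidyr 1.0.0 or later); the name long and the explicit ncol = 2, which forces the requested 2*2 layout, are illustrative choices.

```r
library(ggplot2)
library(dplyr)
library(tidyr)

# Reshape: every column except pop becomes a (variable, value) pair
long <- economics %>%
  select(-date) %>%
  pivot_longer(cols = -pop, names_to = "variable", values_to = "value")

# Four variables, so facet_wrap with ncol = 2 gives a 2*2 grid
p <- ggplot(long, aes(x = pop, y = value)) +
  geom_point(size = 0.5) +
  facet_wrap(~ variable, ncol = 2, scales = "free_y")

print(p)
```

Each original row contributes one long-format row per reshaped column, which is what lets a single facet_wrap call replace the four separate plots.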

plotly::ggplotly is breaking for certain facet-wrap data

I have a problem combining plotly::ggplotly (v4.7.1) with facet_wrap (v3.0.0) that I can't seem to generalise but is reproducible with a particular dataset (summary metrics for a set of tweets):
require(tidyverse)
require(plotly)
d = read_csv('https://gist.githubusercontent.com/geotheory/21c4eacbf38ed397f7cf984f8d92e931/raw/9148df79326f53a66a8cc363241a440752487357/data.csv')
d = d %>% mutate(key = fct_reorder(key, n)) # order the bars
p = ggplot(d, aes(key, n)) + geom_bar(stat='identity') +
facet_wrap(~ set, scales='free', nrow=1) +
labs(x=NULL, y=NULL) + coord_flip()
print(p)
Enter ggplotly:
print(ggplotly(p))
This seems to relate to the combination of nrow=1 and scales='free' arguments. Any ideas about the cause?

Plot including one categorical variable and two numeric variables

How can I show the values of AverageTime and AverageCost for their corresponding type on a graph? The scales of the variables are different, since one is an average time and the other is an average cost. I want to use Type as x, with y referring to the values of AverageTime and AverageCost. (In this case, I will have two line plots in just one graph.)
Type<-c("a","b","c","d","e","f","g","h","i","j","k")
AverageTime<-c(12,14,66,123,14,33,44,55,55,6,66)
AverageCost<-c(100,10000,400,20000,500000,5000,700,800,400000,500,120000)
df<-data.frame(Type,AverageTime,AverageCost)
This could be done using facet_wrap and scales="free_y" like so:
library(tidyr)
library(dplyr)
library(ggplot2)
df %>%
mutate(AverageCost=as.numeric(AverageCost), AverageTime=as.numeric(AverageTime)) %>%
gather(variable, value, -Type) %>%
ggplot(aes(x=Type, y=value, colour=variable, group=variable)) +
geom_line() +
facet_wrap(~variable, scales="free_y")
There you can compare the two lines even though they are on different scales.
HTH
# install.packages("ggplot2", dependencies = TRUE)
library(ggplot2)
p <- ggplot(df, aes(AverageTime, AverageCost, colour=Type)) + geom_point()
p + geom_abline()
Showing both lines in the same plot will be hard since they are on different scales. You also need to convert AverageTime and AverageCost into numeric variables.
library(ggplot2)
library(reshape2)
library(dplyr) # provides %>%, group_by() and summarise() used below
To be able to plot both lines in one graph and take the average of the two, you need to do some reshaping.
df_ag <- melt(df, id.vars=c("Type"))
df_ag_sb <- df_ag %>% group_by(Type, variable) %>% summarise(meanx = mean(as.numeric(value), na.rm=TRUE))
ggplot(df_ag_sb, aes(x=Type, y=as.numeric(meanx), color=variable, group=variable)) + geom_line()
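If both lines really have to share one panel despite the different scales, another option is a secondary y axis. The sketch below is not from the answers above: it rescales AverageCost onto the AverageTime range with a factor k (an assumed choice, here the ratio of the two maxima) and uses sec_axis() to label the right-hand axis in the original cost units.

```r
library(ggplot2)

df <- data.frame(
  Type        = c("a","b","c","d","e","f","g","h","i","j","k"),
  AverageTime = c(12,14,66,123,14,33,44,55,55,6,66),
  AverageCost = c(100,10000,400,20000,500000,5000,700,800,400000,500,120000)
)

# Scaling factor between the two axes (assumption: match the maxima)
k <- max(df$AverageCost) / max(df$AverageTime)

p <- ggplot(df, aes(x = Type, group = 1)) +
  geom_line(aes(y = AverageTime, colour = "AverageTime")) +
  # Cost is drawn on the time scale; the secondary axis undoes the scaling
  geom_line(aes(y = AverageCost / k, colour = "AverageCost")) +
  scale_y_continuous(name = "AverageTime",
                     sec.axis = sec_axis(~ . * k, name = "AverageCost")) +
  labs(colour = NULL)

print(p)
```

Dual axes can be misleading because the apparent crossing points depend entirely on k, so the faceted version is often the safer default.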
