How to normalize data series to start value = 0 in R?

I have a dataset similar to this:
library(ggplot2)
data(economics_long)
economics_long$date2 <- as.numeric(economics_long$date) + 915
ggplot(economics_long, aes(date2, value01, colour = variable)) +
geom_line()
Which gives the following plot:
Now I would like to normalize it to the start value of the green line (or the mean), so that all variables start at the same point on the y-axis. Similar to this:
Thanks for any help.

You could subtract the starting value of each vector depending on variable-value using by().
library(ggplot2)
l <- by(economics_long, economics_long$variable, function(x)
within(x, value01.n <- value01 - value01[1]))
dat <- do.call(rbind, l)
ggplot(dat, aes(date2, value01.n, colour = variable)) +
geom_line()

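Use group_by() and mutate() to shift each variable by its initial y-value.
library(tidyverse)
data(economics_long)
economics_long %>%
# data() reloads the original dataset, so recreate date2 from the question
mutate(date2 = as.numeric(date) + 915) %>%
group_by(variable) %>%
mutate(value_shifted = value01 - value01[1]) %>%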
ungroup() %>%
ggplot(aes(date2, value_shifted, colour = variable)) +
geom_line()
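
The same shift can also be sketched in base R with ave(), which applies a function within each group and returns a vector aligned with the original rows. The toy data frame and the column names value_shifted and value_centred below are made up for illustration, and the sketch assumes the rows are already ordered by date within each variable.

```r
# Toy long-format data: two series observed at three dates (hypothetical values)
toy <- data.frame(
  variable = rep(c("a", "b"), each = 3),
  date2    = rep(1:3, times = 2),
  value01  = c(0.2, 0.5, 0.4, 0.7, 0.6, 0.9)
)

# Subtract each series' first value so that every series starts at 0
toy$value_shifted <- toy$value01 -
  ave(toy$value01, toy$variable, FUN = function(v) v[1])

# Variant: centre each series on its own mean (ave() uses mean by default)
toy$value_centred <- toy$value01 - ave(toy$value01, toy$variable)
```

Either new column can then be mapped to y in ggplot() in place of value01.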

Related

How can you plot `geom_point()` with `facet_wrap()` using per-group row number as x?

Is there a way to plot geom_point() so that it implicitly uses the row number as x in a facet? Just like plot(y) but also for multiple facets.
The following fails with Error: geom_point requires the following missing aesthetics: x:
df = data.frame(y = rnorm(60), group = rep(c("A", "B", "C"), 20))
ggplot(df, aes(y = y)) +
geom_point() +
facet_wrap(~group)
Naturally, you can do it using something like the following, but it is quite cumbersome.
library(dplyr)
df = df %>%
group_by(group) %>%
mutate(row = row_number())
ggplot(df, aes(x = row, y = y)) +
geom_point() +
facet_wrap(~group)
You can try this:
ggplot(df, aes(x=seq(y),y = y))+geom_point() + facet_wrap(~group)
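That way you avoid creating a separate index variable. Note, however, that seq(y) is evaluated over the whole data frame rather than within each facet, so x runs from 1 to nrow(df) instead of restarting at 1 in every panel.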
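
To get an index that restarts at 1 in every facet without adding a column, one option is to compute it inside aes() with base R's ave(). This is a minimal sketch, assuming the grouping column is called group as in the question; the axis label is made up.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(y = rnorm(60), group = rep(c("A", "B", "C"), 20))

# ave() applies seq_along() within each group, so the index restarts per facet
p <- ggplot(df, aes(x = ave(y, group, FUN = seq_along), y = y)) +
  geom_point() +
  facet_wrap(~group) +
  labs(x = "row within group")

p
```

The trade-off is readability: the explicit group_by() plus mutate(row = row_number()) version is longer but makes the index visible in the data.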

Rearrange facet_wrap plots based on the points in the subplot

I would like to rearrange the facet_wrap plots in a better way.
library(ggplot2)
set.seed(123)
freq <- sample(1:10, 20, replace = T)
labels <- sample(LETTERS, 20)
value <- paste("i",1:13,sep='')
lab <- rep(unlist(lapply(1:length(freq), function(x) rep(labels[x],freq[x]))),2)
ival <- rep(unlist(lapply(1:length(freq), function(x) value[1:freq[x]])),2)
n <- sum(freq)  # rows per type; computed so the code does not depend on the RNG version
df <- data.frame(lab, ival, type=rep(c('Type1','Type2'), each=n), val=runif(2*n,0,1))
ggplot(df, aes(x=ival, y=val, col = type, group = type)) +
geom_line() +
geom_point(aes(x=ival, y=val)) +
facet_wrap( ~lab, ncol=3) +
theme(axis.text.x=element_text(angle=45, vjust=0.3)) +
scale_x_discrete(limits=paste('i',1:13,sep=''))
It results in the below plot:
Is there any way to rearrange the plots based on their frequency? Some of the lab frequencies (the number of points per type) are very low (1-3). I would like to order the facet_wrap panels by frequency instead of by label. One advantage is a smaller plotting area and a clearer picture of the data.
Can it be done by computing the frequencies on the fly and passing them to facet_wrap? Or should it be done separately, using dplyr to split the data into low-, medium-, and high-frequency sets of plots?
Here is one idea. We can use dplyr to calculate the number of each group in lab and use fct_reorder from forcats to reorder the factor level.
library(dplyr)
library(forcats)
df2 <- df %>%
group_by(lab) %>%
mutate(N = n()) %>%
ungroup() %>%
mutate(lab = fct_reorder(lab, N))
ggplot(df2, aes(x=ival, y=val, col = type, group = type)) +
geom_line() +
geom_point(aes(x=ival, y=val)) +
facet_wrap( ~lab, ncol=3) +
theme(axis.text.x=element_text(angle=45, vjust=0.3)) +
scale_x_discrete(limits=paste('i',1:13,sep=''))
Set .desc = TRUE when using fct_reorder if you want to reverse the factor levels.
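
The reordering idea does not depend on forcats: facet_wrap follows the factor's level order, so any way of setting the levels by frequency works. A base R sketch, using a small made-up label vector rather than the question's data:

```r
# Hypothetical labels with clearly different frequencies
lab <- c(rep("A", 5), rep("B", 1), rep("C", 3))

# table() counts each label; sort() orders the counts ascending
counts <- sort(table(lab))

# Use the sorted names as factor levels so facets follow frequency order
lab_by_freq <- factor(lab, levels = names(counts))

levels(lab_by_freq)  # "B" "C" "A"
```

Passing decreasing = TRUE to sort() puts the most frequent label first instead.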

ggplot2 automatically removes missing values but does not rescale axes

The code below shows that ggplot2 automatically removes the 2nd observation, yet still keeps the y-axis range from 1 to 1000. How can I make ggplot2 scale the axis appropriately without hard-coding the range myself?
df <- data.frame(x = c(1, NA),
y = c(1, 1000))
ggplot(df) + geom_point(aes(x, y))
How about removing rows with missing values in x before plotting?
library(dplyr)
df %>%
filter(!is.na(x)) %>%
ggplot() +
geom_point(aes(x, y))
Or use na.omit(), which drops rows with a missing value in any column:
df %>%
na.omit() %>%
ggplot() +
geom_point(aes(x, y))
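
When the data frame has columns that are not plotted, it may be preferable to test only the plotted ones. A base R sketch with complete.cases(); the extra column note is hypothetical and only there to show the difference from na.omit():

```r
df <- data.frame(x = c(1, NA, 3), y = c(1, 1000, 2), note = c(NA, "a", "b"))

# Keep rows where both plotted columns are present; other columns may hold NA
plot_df <- df[complete.cases(df[, c("x", "y")]), ]

nrow(plot_df)        # 2 rows survive
nrow(na.omit(df))    # only 1 row survives, because note is NA in row 1
range(plot_df$y)     # the y scale is now trained on these values only
```

plot_df can then be passed to ggplot() exactly like the filtered data above.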

How to do an association plot in ggplot2?

I have a table with two categorical values and I want to visualise their association; the number of times that they are found together in the same row.
For instance, let's take this data frame:
d <-data.frame(cbind(sample(1:5,100,replace=T), sample(1:10,100,replace=T)))
How can I generate a heatmap like this:
Where the colour of each square represents the number of times that X1 and X2 are found in a given combination.
It would be even better to know how to plot this as a dot plot instead, where the size of each dot represents the number of times the combination of X1 and X2 occurs.
If you can guide me how to do this on ggplot2 or any other way in R, it would be really helpful.
Thanks!
Here's how I would do it:
library(ggplot2)
library(dplyr)
set.seed(123)
d <-data.frame(x = sample(1:5,100,replace=T), y = sample(1:10,100,replace=T))
d_sum <- d %>%
group_by(x, y) %>%
summarise(count = n())
For the heatmap:
ggplot(d_sum, aes(x, y)) +
geom_tile(aes(fill = count))
For the dotplot:
ggplot(d_sum, aes(x, y)) +
geom_point(aes(size = count))
An alternative using count(), shown on a larger 20 x 20 grid:
library(ggplot2)
library(dplyr)
library(scales)
set.seed(123)
d <-data.frame(x = sample(1:20,1000,replace=T), y = sample(1:20,1000,replace=T))
d %>% count(x, y) %>% ggplot(aes(x, y, fill = n)) +
geom_tile() +
scale_x_continuous(breaks=1:20)+
scale_y_continuous(breaks=1:20)+
scale_fill_gradient2(low='white', mid='steelblue', high='red') +
guides(fill=guide_legend("Count")) +
theme_bw() +  # apply the complete theme first, otherwise it overrides the tweak below
theme(axis.text.x = element_text(angle = 90, hjust = 1))
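
For the dot plot, ggplot2 also offers geom_count(), which tallies identical (x, y) pairs itself, so the raw data can be plotted without a summarising step. A minimal sketch on the question's 5 x 10 layout; max_size = 10 is an arbitrary choice:

```r
library(ggplot2)

set.seed(123)
d <- data.frame(x = sample(1:5, 100, replace = TRUE),
                y = sample(1:10, 100, replace = TRUE))

# geom_count() counts overlapping points and maps the count to point size
p <- ggplot(d, aes(x, y)) +
  geom_count() +
  scale_size_area(max_size = 10)

# The same tallies as a plain table, handy for checking against the plot
counts <- as.data.frame(table(x = d$x, y = d$y), responseName = "count")

p
```

scale_size_area() makes the dot area proportional to the count, which tends to be easier to compare by eye than a radius mapping.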

Pass column names to a function

How can I turn this ggplot() call into a function? I can't figure out how to get R to recognize the column names I want to pass to the function. I've come across several similar sounding questions, but I've not had success adapting ideas. See here for substitute().
# setup
library(dplyr)
library(ggplot2)
set.seed(205)
dat = data.frame(t=rep(1:2, each=10),
pairs=rep(1:10,2),
value=rnorm(20))
# working example
ggplot(dat %>% group_by(pairs) %>%
mutate(slope = (value[t==2] - value[t==1])/(2-1)),
aes(t, value, group=pairs, colour=slope > 0)) +
geom_point() +
geom_line() +
stat_summary(fun.y=mean,geom="line",lwd=2,aes(group=1))
# attempt at turning into a function
plotFun <- function(df, groupBy, dv, time) {
groupBy2 <- substitute(groupBy)
dv2 <- substitute(dv)
time2 <- substitute(time)
ggplot(df %>% group_by(groupBy2) %>%
mutate(slope = (dv2[time2==2] - dv2[time2==1])/(2-1)),
aes(time2, dv2, group=groupBy2, colour=slope > 0)) +
geom_point() +
geom_line() +
stat_summary(fun.y=mean,geom="line",lwd=2,aes(group=1))
}
# error time
plotFun(dat, pairs, value, t)
Update
I took @joran's advice to look at this answer, and here's what I came up with:
library(dplyr)
library(ggplot2)
library(lazyeval)
plotFun <- function(df, groupBy, dv, time) {
ggplot(df %>% group_by_(groupBy) %>%
mutate_(slope = interp(~(dv2[time2==2] - dv2[time2==1])/(2-1),
dv2=as.name(dv),
time2=as.name(time))),
aes(time, dv, group=groupBy, colour=slope > 0)) +
geom_point() +
geom_line() +
stat_summary(fun.y=mean,geom="line",lwd=2,aes(group=1))
}
plotFun(dat, "pairs", "value", "t")
The code runs but the plot is not correct:
geom_path: Each group consists of only one observation. Do you need to
adjust the group aesthetic?
Here's the working solution informed by all of the commenters:
# setup
library(dplyr)
library(ggplot2)
library(lazyeval)
set.seed(205)
dat = data.frame(t=rep(1:2, each=10),
pairs=rep(1:10,2),
value=rnorm(20))
# function
plotFun <- function(df, groupBy, dv, time) {
ggplot(df %>% group_by_(groupBy) %>%
mutate_(slope = interp(~(dv2[time2==2] - dv2[time2==1])/(2-1),
dv2=as.name(dv),
time2=as.name(time))),
aes_string(time, dv, group = groupBy,
colour = 'slope > 0')) +
geom_point() +
geom_line() +
stat_summary(fun.y=mean,geom="line",lwd=2,aes(group=1))
}
# plot
plotFun(dat, "pairs", "value", "t")
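
In recent releases group_by_(), mutate_(), aes_string() and the fun.y argument are deprecated. The sketch below shows how the same function might look with tidy evaluation's embrace operator ({{ }}), so that column names can be passed unquoted as in the first attempt. It assumes dplyr 1.0 or later and ggplot2 3.4 or later (for linewidth); the names plot_fun, group_col and df_slope are made up here.

```r
library(dplyr)
library(ggplot2)

set.seed(205)
dat <- data.frame(t = rep(1:2, each = 10),
                  pairs = rep(1:10, 2),
                  value = rnorm(20))

plot_fun <- function(df, group_col, dv, time) {
  # {{ }} forwards the unquoted column names into dplyr and aes()
  df_slope <- df %>%
    group_by({{ group_col }}) %>%
    mutate(slope = {{ dv }}[{{ time }} == 2] - {{ dv }}[{{ time }} == 1]) %>%
    ungroup()

  ggplot(df_slope, aes({{ time }}, {{ dv }},
                       group = {{ group_col }}, colour = slope > 0)) +
    geom_point() +
    geom_line() +
    # overall mean line across all pairs
    stat_summary(fun = mean, geom = "line", linewidth = 2, aes(group = 1))
}

p <- plot_fun(dat, pairs, value, t)
p
```

As in the original, the slope calculation assumes exactly one observation per group at time 1 and one at time 2.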
