ggplot2-line plotting with TIME series and multi-spline - r

The theme of this question is simple, but it is driving me crazy:
1. How do I use melt()?
2. How do I draw multiple lines in a single image?
Here is my raw data:
a 4.17125 41.33875 29.674375 8.551875 5.5
b 4.101875 29.49875 50.191875 13.780625 4.90375
c 3.1575 29.621875 78.411875 25.174375 7.8012
Q1:
I learned from the post "Plotting two variables as lines using ggplot2 on the same graph" how to draw multiple lines for multiple variables, like this:
The following code produces the plot above. However, the x-axis is actually a time series.
library(reshape2)  # melt() is provided by reshape2
library(ggplot2)
df <- read.delim("~/Desktop/df.b", header = FALSE)
colnames(df) <- c("sample", 0, 15, 30, 60, 120)
df2 <- melt(df, id = "sample")
ggplot(data = df2, aes(x = variable, y = value, group = sample, colour = sample)) + geom_line() + geom_point()
I would like 0, 15, 30, 60 and 120 to be treated as real numbers on a time axis rather than as category names. I also tried the following, but it failed.
row.names(df)<-df$sample
df<-df[,-1]
df<-as.matrix(df)
df2 <- data.frame(sample = factor(rep(row.names(df), each = 5)),
                  Time = factor(rep(c(0, 15, 30, 60, 120), 3)),
                  Values = c(df[1, ], df[2, ], df[3, ]))
ggplot(data = df2, aes(x = Time, y = Values, group = sample, colour = sample)) +
  geom_line() +
  geom_point()
Looking forward to your help.
Q2:
I've learnt that the following script adds a spline() curve for a single line. How can I apply spline() to all three lines in a single image?
n <-10
d <- data.frame(x =1:n, y = rnorm(n))
ggplot(d,aes(x,y))+ geom_point()+geom_line(data=data.frame(spline(d, n=n*10)))

Your variable column is a factor (you can verify by calling str(df2)). Just convert it back to numeric:
df2$variable <- as.numeric(as.character(df2$variable))
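The as.character() step matters: calling as.numeric() directly on a factor returns the internal level codes rather than the labels. A minimal sketch, using a hypothetical factor `time_f` that stands in for df2$variable after melt():

```r
# Hypothetical stand-in for df2$variable after melt(): time points stored as a factor
time_f <- factor(c("0", "15", "30", "60", "120"),
                 levels = c("0", "15", "30", "60", "120"))

# Converting directly returns the internal level codes (1 to 5), not the times
codes <- as.numeric(time_f)

# Converting through character recovers the numeric time values
times <- as.numeric(as.character(time_f))

print(codes)  # 1 2 3 4 5
print(times)  # 0 15 30 60 120
```

Once the column is numeric, ggplot2 uses a continuous x scale, so the points are spaced by elapsed time instead of evenly.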
For your other question, you might want to stick with using geom_smooth or stat_smooth, something like this:
p <- ggplot(data = df2, aes(x=variable, y= value, group = sample, colour=sample)) +
geom_line() +
geom_point()
library(splines)
p + geom_smooth(aes(group = sample),method = "lm",formula = y~bs(x),se = FALSE)
which draws one smooth fitted curve per sample on top of the lines and points.
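If an interpolating spline through the observed points is wanted (as with spline() in the question) rather than a fitted regression spline, one option is to compute spline() separately for each sample and draw the result with geom_line(). A sketch under that assumption; `spline_by_sample` is a name introduced here, and the data frame re-creates the question's values with a numeric time column:

```r
library(ggplot2)

# The question's data in long format, with time already numeric
df2 <- data.frame(
  sample   = rep(c("a", "b", "c"), each = 5),
  variable = rep(c(0, 15, 30, 60, 120), times = 3),
  value    = c(4.17125, 41.33875, 29.674375, 8.551875, 5.5,
               4.101875, 29.49875, 50.191875, 13.780625, 4.90375,
               3.1575, 29.621875, 78.411875, 25.174375, 7.8012)
)

# Interpolate each sample separately, then stack the results
spline_by_sample <- do.call(rbind, lapply(split(df2, df2$sample), function(d) {
  s <- spline(d$variable, d$value, n = 100)
  data.frame(sample = d$sample[1], variable = s$x, value = s$y)
}))

# Observed points plus one interpolated curve per sample
p <- ggplot(df2, aes(variable, value, colour = sample)) +
  geom_point() +
  geom_line(data = spline_by_sample)
print(p)
```

Because the colour mapping is inherited from ggplot(), each interpolated curve takes the colour of its sample without any loop over layers.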

Related

Why does R behave differently when parsing parameters of plotting?

I am attempting to plot multiple time series variables on a single line chart using ggplot. I am using a data.frame that contains n time series variables and a column of time periods. Essentially, I want to loop through the data.frame and add exactly n geom_line layers to a single chart.
Initially I tried the following code, where:
df = data.frame containing n time series variables and 1 column of time periods
wid = n (number of time series variables)
p <- ggplot() +
scale_color_manual(values=c(colours[1:wid]))
for (i in 1:wid) {
p <- p + geom_line(aes(x=df$Time, y=df[,i], color=var.lab[i]))
}
ggplotly(p)
However, this only produces a plot of the final time series variable in the data.frame. I then investigated further and found that the following two sets of code produce completely different results:
p <- ggplot() +
scale_color_manual(values=c(colours[1:wid]))
i = 1
p = p + geom_line(aes(x=df$Time, y=df[,i], color=var.lab[i]))
i = 2
p = p + geom_line(aes(x=df$Time, y=df[,i], color=var.lab[i]))
i = 3
p = p + geom_line(aes(x=df$Time, y=df[,i], color=var.lab[i]))
ggplotly(p)
Plot produced by code above
p <- ggplot() +
scale_color_manual(values=c(colours[1:wid]))
p = p + geom_line(aes(x=df$Time, y=df[,1], color=var.lab[1]))
p = p + geom_line(aes(x=df$Time, y=df[,2], color=var.lab[2]))
p = p + geom_line(aes(x=df$Time, y=df[,3], color=var.lab[3]))
ggplotly(p)
Plot produced by code above
In my mind, these two sets of code are identical, so could anyone explain why they produce such different results?
I know this could probably be done quite easily using autoplot, but I am more interested in the behavior of these two snippets of code.
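Adding one geom_line per column in a loop is a hack and not idiomatic ggplot. Your loop fails because aes() does not evaluate its arguments until the plot is drawn, by which time i holds its final value, so every layer refers to the same column. To make the loop work, I'd use aes_string, which takes the column name as a string that is evaluated immediately. But it's still a hack.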
df <- data.frame(Time = 1:20,
Var1 = rnorm(20),
Var2 = rnorm(20, mean = 0.5),
Var3 = rnorm(20, mean = 0.8))
vars <- paste0("Var", 1:3)
col_vec <- RColorBrewer::brewer.pal(3, "Accent")
library(ggplot2)
p <- ggplot(df, aes(Time))
for (i in 1:length(vars)) {
p <- p + geom_line(aes_string(y = vars[i]), color = col_vec[i], lwd = 1)
}
p + labs(y = "value")
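aes_string() has been soft-deprecated since ggplot2 3.0.0. A sketch of the same per-column approach using the .data pronoun instead: building the layers inside lapply() gives each layer its own function environment, so the column index is fixed per layer. The colours and the `layers` name are illustrative choices, not part of the original answer:

```r
library(ggplot2)

set.seed(42)  # reproducible example data
df <- data.frame(Time = 1:20,
                 Var1 = rnorm(20),
                 Var2 = rnorm(20, mean = 0.5),
                 Var3 = rnorm(20, mean = 0.8))
vars <- paste0("Var", 1:3)
col_vec <- c("steelblue", "darkorange", "forestgreen")  # arbitrary colours

# One layer per column; i is local to each function call,
# so the lazily evaluated mapping still sees the right index
layers <- lapply(seq_along(vars), function(i) {
  geom_line(aes(y = .data[[vars[i]]]), colour = col_vec[i])
})

# A list of layers can be added to a plot in one step
p <- ggplot(df, aes(Time)) + layers + labs(y = "value")
print(p)
```

This keeps the wide data layout, so it shares the main drawback of the loop: the colours are set by hand and no legend is produced.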
How to do it properly
To build this plot properly, pivot the data first so that each aesthetic (aes) is mapped to a variable in your data frame. That means the data frame needs a single variable to map to color. Hence, we use pivot_longer and plot again:
library(tidyr)
df_melt <- pivot_longer(df, cols = Var1:Var3, names_to = "var")
ggplot(df_melt, aes(Time, value, color = var)) +
geom_line(lwd = 1) +
scale_color_manual(values = col_vec)

ggplot2, y limits on geom_bar with faceting

In the following example, with free y scales the maximum of each scale adjusts as expected. However, how can I get the minimum values to adjust as well? At the moment both facets start at 0, when I really want the upper facet to run from about 99 to 100 and the lower facet from about 900 to 1000.
library(ggplot2)
n = 100
df = rbind(data.frame(x = 1:n,y = runif(n,min=99,max=100),variable="First"),
data.frame(x = 1:n,y = runif(n,min=900,max=1000),variable="Second"))
ggplot(data=df,aes(x,y,fill=variable)) +
geom_bar(stat='identity') +
facet_grid(variable~.,scales='free')
You could use geom_linerange rather than geom_bar. A general way to do this is to first find the min of y for each value of variable and then merge the minimums with the original data. Code would look like:
library(ggplot2)
min_y <- aggregate(y ~ variable, data=df, min)
sp <- ggplot(data=merge(df, min_y, by="variable", suffixes = c("","min")),
aes(x, colour=variable)) +
geom_linerange(aes(ymin=ymin, ymax=y), size=1.3) +
facet_grid(variable ~ .,scales='free')
plot(sp)
Plot looks like:
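The merge() call above relies on a small trick: both data frames have a y column, so suffixes = c("", "min") leaves the original column as y and renames the per-facet minimum to ymin. A sketch that checks this on data generated as in the question (`merged` is a name introduced here, and the seed is added only for reproducibility):

```r
set.seed(1)  # reproducible example data
n <- 100
df <- rbind(data.frame(x = 1:n, y = runif(n, min = 99, max = 100), variable = "First"),
            data.frame(x = 1:n, y = runif(n, min = 900, max = 1000), variable = "Second"))

# Minimum of y within each facet
min_y <- aggregate(y ~ variable, data = df, min)

# Both inputs have a y column: the suffixes turn them into y and ymin
merged <- merge(df, min_y, by = "variable", suffixes = c("", "min"))

print(names(merged))
print(unique(merged[, c("variable", "ymin")]))
```

Each row then carries its facet's minimum, which is what geom_linerange needs for its ymin aesthetic.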

ggplot2: adding lines in a loop and retaining colour mappings

When running the following two pieces of code, I unexpectedly get different results. I need to add lines in a loop as in EX2, but all lines end up having the same colour. Why is this?
EX1
economics2 <- economics
economics2$unemploy <- economics$unemploy + 1000
economics3 <- economics
economics3$unemploy <- economics$unemploy + 2000
economics4 <- economics
economics4$unemploy <- economics$unemploy + 3000
b <- ggplot() +
geom_line(aes(x = date, y = unemploy, colour = as.character(1)), data=economics2) +
geom_line(aes(x = date, y = unemploy, colour = as.character(2)), data=economics3) +
geom_line(aes(x = date, y = unemploy, colour = as.character(3)), data=economics4)
print(b)
EX2
#economics2, economics3, economics4 are reused from EX1.
b <- ggplot()
econ <- list(economics2, economics3, economics4)
for(i in 1:3){
b <- b + geom_line(aes(x = date, y = unemploy, colour = as.character(i)), data=econ[[i]])
}
print(b)
This is not a good way to use ggplot. Try this way:
econ <- list(e1=economics2, e2=economics3, e3=economics4)
df <- cbind(cat=rep(names(econ),sapply(econ,nrow)),do.call(rbind,econ))
ggplot(df, aes(date,unemploy, color=cat)) + geom_line()
This puts your three versions of economics into a single data.frame, in long format (all the data in 1 column, with a second column, cat in this example, identifying the source). Once you've done that, ggplot takes care of everything else. No loops.
The specific reason your loop failed, as pointed out in the comment, is that using aes(...) stores the expression in the ggplot object, and that expression is evaluated when you call print(...). At that point i is 3.
Note that this does not apply to the data=... argument, so you could have done something like this:
b=ggplot()
for(i in 1:3){
b <- b + geom_line(aes(x=date,y=unemploy,colour=cat),
data=cbind(cat=as.character(i),econ[[i]]))
}
print(b)
But, this is still the wrong way to use ggplot.
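The delayed evaluation described above can be checked with layer_data(), which builds the plot and returns the data each layer would draw. The sketch below also shows one possible way to keep the loop: the !! operator, which aes() has supported since ggplot2 3.0.0, inserts the current value into the mapping when the layer is created. The helper `layer_colours()` and the names `b_lazy` and `b_eager` are introduced here for illustration:

```r
library(ggplot2)

# Three shifted copies of the economics data, as in the question
econ <- lapply(c(1000, 2000, 3000), function(offset) {
  e <- economics
  e$unemploy <- e$unemploy + offset
  e
})

# Lazy version: the expression as.character(i) is stored, not its value
b_lazy <- ggplot()
for (i in 1:3) {
  b_lazy <- b_lazy +
    geom_line(aes(x = date, y = unemploy, colour = as.character(i)), data = econ[[i]])
}

# Eager version: !! evaluates as.character(i) when the layer is created
b_eager <- ggplot()
for (i in 1:3) {
  b_eager <- b_eager +
    geom_line(aes(x = date, y = unemploy, colour = !!as.character(i)), data = econ[[i]])
}

# Colour actually used by each of the three layers once the plot is built
layer_colours <- function(plot) {
  vapply(1:3, function(k) unique(layer_data(plot, k)$colour), character(1))
}

print(layer_colours(b_lazy))   # the same colour three times
print(layer_colours(b_eager))  # three different colours
```

The long-format approach is still the simpler choice; this only makes the evaluation order visible.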

Violin Plot (geom_violin) with aggregated values

I would like to create violin plots with aggregated data. My data has a category column, a value column and a count column:
data <- data.frame(category = rep(LETTERS[1:3],3),
value = c(1,1,1,2,2,2,3,3,3),
count = c(3,2,1,1,2,3,2,1,3))
If I create a simple violin plot it looks like this:
plot <- ggplot(data, aes(x = category, y = value)) + geom_violin()
plot
That is not what I wanted. A solution would be to reshape the dataframe by multiplying the rows of each category-value combination. The problem is that my counts go up to millions which takes hours to be plotted! :-(
Is there a solution with my data?
Thanks in advance!
You can supply a weight aesthetic, which is used when the density for each violin is computed.
plot2 <- ggplot(data, aes(x = category, y = value, weight = count)) + geom_violin()
plot2
You will get warning messages that the weights do not add to one, but that is ok. See here for similar/related discussion.
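To see why a weight can replace row replication, the sketch below compares a weighted kernel density estimate with one computed on the expanded data, using base R's density(). It assumes the violin's density estimate treats weights the way stats::density() does with normalised weights; the fixed bandwidth of 0.5 is an arbitrary choice made so that both estimates use the same smoothing:

```r
# Category A from the question: values 1, 2, 3 observed 3, 1 and 2 times
value <- c(1, 2, 3)
count <- c(3, 1, 2)

# Weighted estimate on the aggregated data (weights normalised to sum to 1)
d_weighted <- density(value, weights = count / sum(count), bw = 0.5)

# Unweighted estimate on the expanded data: 1, 1, 1, 2, 3, 3
d_expanded <- density(rep(value, times = count), bw = 0.5)

# Both estimates are evaluated on the same grid and should agree
print(all.equal(d_weighted$x, d_expanded$x))
print(all.equal(d_weighted$y, d_expanded$y))
```

The weighted version works on three rows regardless of how large the counts are, which is the point when counts run into the millions.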
Using stat="identity" and specifying a violinwidth aesthetic appears to work, although I had to put in a fudge factor:
ggplot(data, aes(x = category, y = value)) +
geom_violin(stat="identity",aes(violinwidth=0.2*count))

How can a line be overlaid on a bar plot using ggplot2?

I'm looking for a way to plot a bar chart containing two different series, hide the bars for one of the series and instead have a line (smooth if possible) go through the top of where bars for the hidden series would have been (similar to how one might overlay a freq polynomial on a histogram). I've tried the example below but appear to be running into two problems.
First, I need to summarize (total) the data by group, and second, I'd like to convert one of the series (df2) to a line.
df <- data.frame(grp=c("A","A","B","B","C","C"),val=c(1,1,2,2,3,3))
df2 <- data.frame(grp=c("A","A","B","B","C","C"),val=c(1,4,3,5,1,2))
ggplot(df, aes(x=grp, y=val)) +
geom_bar(stat="identity", alpha=0.75) +
geom_bar(data=df2, aes(x=grp, y=val), stat="identity", position="dodge")
You can get group totals in many ways. One of them is
with(df, tapply(val, grp, sum))
For simplicity, you can combine bar and line data into a single dataset.
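# Build the groups from the distinct values so this also works when grp is a character column (the default since R 4.0)
df_all <- data.frame(grp = factor(sort(unique(df$grp))))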
df_all$bar_heights <- with(df, tapply(val, grp, sum))
df_all$line_y <- with(df2, tapply(val, grp, sum))
Bar charts use a categorical x-axis. To overlay a line you will need to convert the axis to be numeric.
ggplot(df_all) +
geom_bar(aes(x = grp, weight = bar_heights)) +
geom_line(aes(x = as.numeric(grp), y = line_y))
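The question also asked for a smooth line if possible. A self-contained sketch of the steps above that draws an interpolating spline through the totals of the hidden series; the use of spline() and the `smooth_line` name are additions made here, not part of the original answer:

```r
library(ggplot2)

df  <- data.frame(grp = c("A", "A", "B", "B", "C", "C"), val = c(1, 1, 2, 2, 3, 3))
df2 <- data.frame(grp = c("A", "A", "B", "B", "C", "C"), val = c(1, 4, 3, 5, 1, 2))

# Group totals for the bars (df) and for the line (df2)
bar_heights <- tapply(df$val, df$grp, sum)
line_y      <- tapply(df2$val, df2$grp, sum)

df_all <- data.frame(grp         = factor(names(bar_heights)),
                     bar_heights = as.vector(bar_heights),
                     line_y      = as.vector(line_y))

# Interpolating spline through the line totals, on the numeric positions 1, 2, 3
smooth_line <- as.data.frame(spline(as.numeric(df_all$grp), df_all$line_y, n = 50))

p <- ggplot(df_all) +
  geom_bar(aes(x = grp, weight = bar_heights), alpha = 0.75) +
  geom_line(data = smooth_line, aes(x = x, y = y), colour = "blue")
print(p)
```

With only three groups the spline is a rough guide at best; with more categories the curve follows the bar tops more closely.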
Perhaps your sample data aren't representative of the real data you are working with, but there are no lines to be drawn for df2. There is only one value for each x and y value. Here's a modified version of your df2 with enough data points to construct lines:
df <- data.frame(grp=c("A","A","B","B","C","C"),val=c(1,2,3,1,2,3))
df2 <- data.frame(grp=c("A","A","B","B","C","C"),val=c(1,4,3,5,0,2))
p <- ggplot(df, aes(x=grp, y=val))
p <- p + geom_bar(stat="identity", alpha=0.75)
p + geom_line(data=df2, aes(x=grp, y=val), colour="blue")
Alternatively, if your example data above is correct, you can plot this information as a point with geom_point(data = df2, aes(x = grp, y = val), colour = "red", size = 6). You can obviously change the color and size to your liking.
EDIT: In response to comment
I'm not entirely sure what the visual for a freq polynomial over a histogram is supposed to look like. Are the x-values supposed to be connected to one another? Secondly, you keep referring to wanting lines, but your code shows geom_bar(), which I assume isn't what you want? If you want lines, use geom_line(). If the two assumptions above are correct, then here's an approach to do that:
library(plyr)  # provides ddply()
# First, summarise df2 by group
df3 <- ddply(df2, .(grp), summarise, total = sum(val))
> df3
  grp total
1   A     5
2   B     8
3   C     3
# Second, plot df3 as a line while treating the grp variable as numeric
# (factor() makes the conversion work even when grp is a character column)
p <- ggplot(df, aes(x=grp, y=val))
p <- p + geom_bar(alpha=0.75, stat = "identity")
p + geom_line(data=df3, aes(x=as.numeric(factor(grp)), y=total), colour = "red")

Resources