Adding individual lines to facet_wrapped plot

Adding individual lines to facet_wrapped plot - r

I'm trying to use facet_wrap to plot a bunch of dataset as scatterpoints, each with an individual line in them indicating when a specific event happened. However, I haven't been able to get the lines to show up individually in each plot, but rather they all show up in all of the plots. After melting, the data looks like:
names(data) = c("Date","ID", "event_date", "variable", "value")
where I want each plot to be a scatter plot of value ~ Date, and each plot divided up by ID with a vertical line appearing at each "event_date" that shows up for each ID. My best efforts have gotten me to:
p <- qplot(Date, value, data=dat, colour=variable)
p <- p + geom_vline(xintercept=as.numeric(dat$event_date))
p + facet_wrap(~ID)
Which works perfectly except for all of the vertical lines showing up in every subplot. Any suggestions? Reading through the documentation hasn't gotten me anywhere yet.

ggplot(dat, aes(Date,value))+ geom_point() + geom_vline(data=dat,aes(xintercept=as.numeric(event_date))) + facet_wrap(~ID)
Is how I do the same thing using facet_grid(). I'm pretty sure it will work for facet_wrap as well.

Related

What is the purpose of using facet_grid(variable ~ .) instead of just using facet_wrap?

So I'm self-teaching myself R right now using this online resource: "https://r4ds.had.co.nz/data-visualisation.html#facets"
This particular section is going over the use of facet_wrap and facet_grid. It's clear to me that facet_grid is primarily used when wanting to visualize a plot along two additional dimensions, rather than just one. What I don't understand is why you can use facet_grid(.~variable) or facet_grid(variable~.) to basically achieve the same result as facet_wrap. Putting a "." in place of a variable results in just not faceting along the row or column dimension, or in other words showing 1 additional variable just as facet_wrap would do.
If anyone can shed some light on this, thank you!

If you use facet_grid, the facets will always be in one row/column. They will never wrap to make a rectangle. But really if you just have one variable with few levels, it doesn't much matter.
You can also see that facet_grid(.~variable) and facet_grid(variable~.) will put the facet labels in different places (row headings vs column headings)
mg <- ggplot(mtcars, aes(x = mpg, y = wt)) + geom_point()
mg + facet_grid(vs~ .) + labs(title="facet_grid(vs~ .)"),
mg + facet_grid(.~ vs) + labs(title="facet_grid(.~ vs)")
So in the most simple of cases, there's nothing that different between them. The main reason to use facet_grid is to have a single, common axis for all facets so you can easily scan across all panels to make a direct comparison of data.

Actually, the same result is not produced all the time...
The number of facets which appear across the graphs pane is fixed with facet_grid (always the number of unique values in the variable) where as facet_wrap, like its name suggests, wraps the facets around the graphics pane. In this way the functions only result in the same graph when the number of facets produced is small.
Both facet_grid and facet_wrap take their arguments in the form row~columns, and nowdays we don't need to use the dot with facet_grid.
In order to compare their differences let's add a new variable with 8 unqiue values to the mtcars data set:
library(tidyverse)
mtcars$example <- rep(1:8, length.out = 32)
ggplot()+
geom_point(data = mtcars, aes(x = mpg, y = wt))+
facet_grid(~example, labeller = label_both)
Which results in a cluttered plot:
Compared to:
ggplot()+
geom_point(data = mtcars, aes(x = mpg, y = wt))+
facet_wrap(~example, labeller = label_both)
Which results in:

Why does changing the label mess up my plot?

I have recently been playing around with various plot types using fictitious data to get my head around how I could display various pieces of information. One plot type that is gaining popularity is the so called individual differences dot plot which shows the change in each subjects score pre-post. The plot is fairly easy to produce, but my issue is that when I go to change the labels using either the labs or xlab ylab functions in ggplot, the plot itself becomes messed up. Below I have attached the fictitious data, the code used and the results.
Data
df<- data.frame(Participant<- c(rep(1:10,2)), Score<- c(rnorm(20,100,5)), Session<- c(1,1,1,1,1,1,1,1,1,1, 2,2,2,2,2,2,2,2,2,2))
colnames(df) <- c("Participant", "Score", "Session")
Code for plot
p<- ggplot(df, aes(x=df$Session, y=df$Score, colour=df$Participant))+ geom_point()+
geom_line(group=df$Participant)+
theme_classic()
Plot
Individual difference plot
My dilemma is that anytime I try to change the label names, the plot messes up as per below.
Problem
p + xlab("Session") + ylab("Score")
Plot after relabelling
The same thing happens if I try the labs function i.e, p + labs(x= "Session", y= "Score"). You can see that the labels themselves do actually change, but for some reason this messes up the actual plot. Does any have any ideas as to what could be going wrong here?

The issue appears to be the grouping is undone when the label functions are called. Instead, issue the grouping as an aesthetic mapping:
library(dplyr); library(ggplot)
df %>% mutate(across(c(Session,Participant),factor)) -> df
p <- ggplot(df, aes(x=Session, y=Score, colour=Participant))+ geom_point()+
geom_line(aes(group=Participant))+
theme_classic()
p + xlab("Session") + ylab("Score")
I suspect this is probably a bug.

Geom_area plot doesn't fill the area between the lines

I want to make an area plot with ggplot(mpg, aes(x=year,y=hwy, fill=manufacturer)) + geom_area(), but I get this:
I'm realy new in R world, can anyone explain why it does not fill the area between the lines? Thanks!

First of all, there's nothing wrong with your code. It's working as intended and you are correct in the syntax required to do what you are looking to do.
Why don't you get the area geom to plot correctly, then? Simple answer is that you don't have enough points to draw a proper line between your x values for all of the aesthetics (manufacturers). Try the geom_point plot and you'll see what I mean:
ggplot(mpg, aes(x=year,y=hwy)) + geom_point(aes(color=manufacturer))
You need a different dataset. Here's a dummy one that is simply two lines with different slopes. It works as expected because each of the aesthetics has y values which span the x labels:
# dummy dataset
df <- data.frame(
x=rep(1:10,2),
y=c(seq(1,10,length.out=10), seq(1,5,length.out=10)),
z=c(rep('A',10), rep('B', 10))
)
# plot
ggplot(df, aes(x,y)) + geom_area(aes(fill=z))

Unable to correctly add labels to a list of ggplot graphs

I am trying to assemble a multipanel boxplot with ggplot.
To have a general structure I am generating a list of plots and plotting them. I also want to add letters reporting significance groups for each boxplot.
Everything works fine, except for the fact that all the boxplots show the letters computed during the last iteration of the loop.
I post below an example in which I just try to add letters reporting the loop iteration number, and as you can see instead of reporting "Plot 1" for the first loop and "Plot 2" for the second it always plots the second.
The code I used is the following:
library(ggplot2)
library(gridExtra)
mydata<-data.frame(values=c(1,4,5,6,4,2,4,7,3,4,5,6,4,4,2,1,3,6,4,1,2,5,4,3,4,2,1,3,4,2),group=c(rep("A",15),rep("B",15)))
mydata2<-data.frame(values=c(2,6,5,6,7,2,5,7,3,4,5,6,4,4,2,1,3,6,4,1,2,5,4,3,1,2,3,3,4,7),group=c(rep("A",15),rep("B",15)))
myp<-list()
for(aaa in 1:2)
{
if(aaa==1) mydata<-mydata else mydata<-mydata2
myp[[aaa]]<-ggplot(mydata, aes(x=group, y=values)) +
geom_boxplot(outlier.shape=NA) + #avoid plotting outliers twice
geom_jitter(position=position_jitter(width=.1, height=0)) +
geom_text(aes(x=1, y=max(values)-0.05*max(values),label=paste("Plot",aaa))) +
geom_text(aes(x=2, y=max(values)-0.05*max(values),label=paste("Plot",aaa)))
}
do.call(grid.arrange,myp)
What am I doing wrong? It looks like the used of do.call with grid.arrange creates problems with the geom_text (but not with the plot, which is different in the two loops).
I would prefer NOT to manually write all the plot functions, since I have at lest three multipanel plots each on with 4 boxplots.

I'm not entirely sure what goes wrong with geom_text, but everything works if you use annotate instead (which should be used exactly for this purpose).
for(aaa in 1:2){
print(aaa)
if(aaa==1) df<-mydata else df<-mydata2
myp[[aaa]]<-ggplot(df, aes(x=group, y=values)) +
geom_boxplot(outlier.shape=NA) + #avoid plotting outliers twice
geom_jitter(position=position_jitter(width=.1, height=0)) +
annotate("text", x=1, y=max(df$values)-0.05*max(df$values),label=paste("Plot",aaa)) +
annotate("text", x=2, y=max(df$values)-0.05*max(df$values),label=paste("Plot",aaa))
}

R incorrect y-axis in ggplots geom_bar()

I have a dataframe with Wikipedia edits, with information about the number of edit for the user (1st edit, 2nd edit and so on), the timestamp when the edit was made, and how many words were added.
In the actual dataset, I have up to 20.000 edits per user and in some edits, they add up to 30.000 words.
However, here is a downloadable small example dataset to exemplify my problem. The header looks like this:
I am trying to plot the distribution of added words across the Edit Progression and across time. If I use the regular R barplot, i works just like expected:
barplot(UserFrame3$NoOfAdds,UserFrame3$EditNo)
But I want to do it in ggplot for nicer graphics and more customizing options.
If I plot this as a scatterplot, I get the same result:
ggplot(data = UserFrame3, aes(x = UserFrame3$EditNo, y = UserFrame3$NoOfAdds)) + geom_point(size = 0.1)
Same for a linegraph:
ggplot(data = UserFrame3, aes(x = UserFrame3$EditNo, y = UserFrame3$NoOfAdds)) +geom_line(size = 0.1)
But when I try to plot it as a bargraph in ggplot, I get this result:
ggplot(data = UserFrame3, aes(x = UserFrame3$EditNo, y = UserFrame3$NoOfAdds)) + geom_bar(stat = "identity", position = "dodge")
There appear to be a lot more holes on the X-axis and the maximum is nowhere close to where it should be (y = 317).
I suspect that ggplot somehow groups the bars and uses means instead of the actual values despite the "dodge" parameter? How can I avoid this? and how would I go about plotting the time progression as a bargraph aswell without ggplot averaging over multiple edits?

You should expect more x-axis "holes" using bars as compared with lines. Lines connect the zero values together, bars do not.
I used geom_col with your data download, it looks as expected:
UserFrame3 %>%
ggplot(aes(EditNo, NoOfAdds)) + geom_col()