Arranging data for two facet R line plot - r

I am trying to make a two-facet line plot like this example. My problem is arranging the data so that the desired variable appears on the x-axis. Here is the small data set I want to use.
Study,Cat,Dim1,Dim2,Dim3,Dim4
Study1,PK,-3.00,0.99,-0.86,0.46
Study1,US,-4.67,0.76,1.01,0.45
Study2,FL,-2.856,4.15,1.554,0.765
Study2,FL,-8.668,5.907,3.795,4.754
I tried to use the following code to draw line graph from this data frame.
plot1 <- ggplot(data = dims, aes(x = Cat, y = Dim1, group = Study)) +
geom_line() +
geom_point() +
facet_wrap(~Study)
As is clear, I can only use one value column to draw the lines. I want to put Dim1, Dim2, Dim3 and Dim4 on the x-axis, which I cannot do with the data arranged this way. (I tried c(Dim1, Dim2, Dim3, Dim4) with no luck.)
Probably the solution is to transpose the table, but then I cannot reproduce the categorization for the facets (Study in the table above) and the colour (Cat in the table above). Any ideas how to solve this issue?

You can try this:
library(tidyr)
library(dplyr)
library(ggplot2)
gather(dims, variable, value, -Study, -Cat) %>%
ggplot(aes(x=variable, y=value, group=Cat, col=Cat)) +
geom_point() + geom_line() + facet_wrap(~Study)

The solution was quite easy. I just had to rearrange the data into long format, with one row per Study, Cat and dimension, like this:
Study,Cat,Dim,Value
Study1,PK,Dim1,-3
Study1,PK,Dim2,0.99
Study1,PK,Dim3,-0.86
Study1,PK,Dim4,0.46
Study1,US,Dim1,-4.67
Study1,US,Dim2,0.76
Study1,US,Dim3,1.01
Study1,US,Dim4,0.45
Study2,FL,Dim1,-2.856
Study2,FL,Dim2,4.15
Study2,FL,Dim3,1.554
Study2,FL,Dim4,0.765
Study2,FL,Dim1,-8.668
Study2,FL,Dim2,5.907
Study2,FL,Dim3,3.795
Study2,FL,Dim4,4.754
After that, R produced the desired result with this code.
plot1 <- ggplot(data=dims, aes(x=Dim, y=Value, colour=Cat, group=Cat)) + geom_line()+ geom_point() + facet_wrap(~Study)
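The rearranged table above does not have to be typed by hand. As a rough sketch, assuming a tidyr version that provides pivot_longer() (1.0.0 or later), the wide table can be reshaped in one step. The Series column is a made-up helper added here only because Study2 contains two FL rows; without it, grouping by Cat would join both rows into a single line.

```r
library(tidyr)
library(ggplot2)

# The wide data from the question
dims <- read.csv(text = "Study,Cat,Dim1,Dim2,Dim3,Dim4
Study1,PK,-3.00,0.99,-0.86,0.46
Study1,US,-4.67,0.76,1.01,0.45
Study2,FL,-2.856,4.15,1.554,0.765
Study2,FL,-8.668,5.907,3.795,4.754")

# Hypothetical helper column: one id per original row, so that the
# two FL rows in Study2 are drawn as separate lines
dims$Series <- seq_len(nrow(dims))

# Wide to long: Dim1..Dim4 become a name column (Dim) and a value column (Value)
long <- pivot_longer(dims, cols = Dim1:Dim4,
                     names_to = "Dim", values_to = "Value")

plot1 <- ggplot(long, aes(x = Dim, y = Value, colour = Cat, group = Series)) +
  geom_line() +
  geom_point() +
  facet_wrap(~Study)
```

Printing plot1 draws the figure. If every Study/Cat pair were unique, group = Cat would be enough and the helper column could be dropped.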

Related

Why is my ggplot2 bar graph not displaying?

I'm trying to plot bar graphs in ggplot2 and running into an issue.
Starting with these variables:
PalList <- c(9, 9009, 906609, 99000099)
PalList1 <- as_tibble(PalList)
Index <- c(1,2,3,4)
PalPlotList <- cbind(Index, PalList)
PPL <- as_tibble(PalPlotList)
and loading the tidyverse library(tidyverse), I tried plotting like this:
PPL %>%
ggplot(aes(x=PalList)) +
geom_bar()
It doesn't matter whether I'm accessing PPL or PalList, I'm still ending up with this (axes and labels may change, but not the chart area):
Even this still gave a blank plot, only now in classic styling:
ggplot(PalList1, aes(value)) +
geom_bar() +
theme_classic()
If I try barplot(PalList), I get an expected result. But I want the control of ggplot. Any suggestions on how to fix this?
One option is to specify both x and y in aes(), create the geom_bar with stat = 'identity' so the bar heights come from the data instead of from counts, and change the x-axis tick labels:
library(ggplot2)
ggplot(PPL, aes(x = Index, y = PalList)) +
geom_bar(stat = 'identity') +
scale_x_continuous(breaks = Index, labels = PalList)
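As for why the original attempts look empty, a likely explanation is that geom_bar() counts rows per x value and draws the bars on a continuous axis. With x values running from 9 to 99000099, each bar is only a tiny fraction of the axis wide and is effectively invisible. A minimal sketch of an alternative, assuming the values should be treated as categories rather than positions, is to convert them to a factor and use geom_col():

```r
library(ggplot2)

# Same values as in the question, built as a plain data frame
PPL <- data.frame(Index = 1:4,
                  PalList = c(9, 9009, 906609, 99000099))

# factor() makes the x-axis discrete, so every bar gets equal width;
# geom_col() uses the y values as bar heights (same as stat = "identity")
p <- ggplot(PPL, aes(x = factor(PalList), y = PalList)) +
  geom_col() +
  labs(x = "PalList")
```

This avoids having to set breaks and labels by hand, at the cost of losing the numeric spacing between the values on the x-axis.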

plotting multiple geom-vline in a graph

I am trying to plot two geom_vline() layers in a graph.
The code below works fine for one vertical line:
x=1:7
y=1:7
df1 = data.frame(x=x,y=y)
vertical.lines <- c(2.5)
ggplot(df1,aes(x=x, y=y)) +
geom_line()+
geom_vline(aes(xintercept = vertical.lines))
However, when I add the second desired vertical line by changing the definition to
vertical.lines <- c(2.5,4)
I get the error:
Error: Aesthetics must be either length 1 or the same as the data (7): xintercept
How do I fix that?
Just remove aes() when you use + geom_vline:
ggplot(df1,aes(x=x, y=y)) +
geom_line()+
geom_vline(xintercept = vertical.lines)
It's not working because anything inside the second aes() is treated as an aesthetic and is evaluated against the layer's data, which here is the 7-row df1 inherited from ggplot(). It has to do with the grammar of ggplot.
You should see + geom_vline as a layer of annotation on the graph, unlike + geom_point or + geom_line, which map data to the plot. (See here how they are in two different sections).
All aesthetics need to have either length 1 or the same length as the data, as the error tells you. Values passed outside aes(), as in the annotation above, can have a different length.
Data:
x=1:7
y=1:7
df1 = data.frame(x=x,y=y)
vertical.lines <- c(2.5,4)
ggplot(df1, aes(x = x, y = y)) +
geom_line() +
sapply(vertical.lines, function(xint) geom_vline(aes(xintercept = xint)))
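If keeping aes() is preferred, another way to satisfy the length rule is to give geom_vline() its own data frame, so the aesthetic is as long as that layer's data. This is only a sketch; the names vlines and xint are invented for the example.

```r
library(ggplot2)

df1 <- data.frame(x = 1:7, y = 1:7)

# Hypothetical helper data frame: one row per vertical line
vlines <- data.frame(xint = c(2.5, 4))

# The vline layer uses vlines as its data, so xintercept has length 2,
# which matches that layer's data instead of the 7 rows of df1
p <- ggplot(df1, aes(x = x, y = y)) +
  geom_line() +
  geom_vline(data = vlines, aes(xintercept = xint))
```

This form is handy when the lines should also carry a colour or a facet variable, since those can be added as further columns of the helper data frame.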

How to reduce binwidth in geom_bar for one single bar?

I'm trying to get a side-by-side bar plot using ggplot's geom_bar(). Here's some sample data I made up for replication purposes:
dat <- data.frame("x"=c(rep(c(1,2,3,4,5),5)),
"by"=c(NA,0,0,0,0,NA,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1))
I want to plot "x" grouped by "by". Because I don't need to plot NA values, I filtered for !is.na(by):
library(dplyr)
dat <- filter(dat, !is.na(by))
Now for the plot:
library(ggplot2)
library(ggthemes)  # provides theme_tufte()
ggplot(dat, aes(x=x, fill=as.factor(by))) + geom_bar(position="dodge") + theme_tufte()
This returns almost what I need. Unfortunately, the first bar looks really weird, because its binwidth is twice as large as the others (due to the fact that there are no zeros in "by" for "x"==1).
Is there a way to reduce the binwidth for the first bar back to "normal"?
You could also do it like this. Precalculate the table and use geom_col.
ggplot(as.data.frame(table(dat)), aes(x = x, y = Freq, fill = by)) +
theme_bw() +
geom_col(position = "dodge")
Never mind, I just figured out that you can manipulate the binwidth argument using an ifelse statement.
...geom_bar(..., binwidth = ifelse("by"==1 & is.na("x"), .5, 1)))
So if you play around with this, it will work. At least it worked for me.
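Note that "by"==1 and is.na("x") in the snippet above test the literal strings "by" and "x", not the columns, so that ifelse() always evaluates to 1. A different route, sketched here on the assumption that the installed ggplot2 supports the preserve argument of position_dodge(), is to ask the dodge to keep single bars at the width of one dodged bar:

```r
library(ggplot2)

# Sample data from the question, with the NA rows dropped
dat <- data.frame(x  = rep(1:5, 5),
                  by = c(NA, 0, 0, 0, 0, NA, 0, 0, 0, 0, rep(1, 15)))
dat <- dat[!is.na(dat$by), ]

# preserve = "single" keeps every bar as wide as one bar in a full group,
# instead of letting a lone bar fill the whole slot
p <- ggplot(dat, aes(x = x, fill = factor(by))) +
  geom_bar(position = position_dodge(preserve = "single"))
```

With this setting the lone bar at x == 1 is drawn at the same width as the paired bars. Its horizontal placement inside the slot may differ from the table-based approach above, which keeps an empty space for the missing group.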

How to format the scatterplots of data series in R

I have been struggling to create a decent-looking scatterplot in R. I didn't think it would be so difficult.
After some research, it seemed to me that ggplot would be a good choice, since it allows plenty of formatting. However, I'm struggling to understand how it works.
I'd like to create a scatterplot of two data series, displaying the points with two different colours, and perhaps different shapes, and a legend with series names.
Here is my attempt, based on this:
year1 <- mpg[which(mpg$year==1999),]
year2 <- mpg[which(mpg$year==2008),]
ggplot() +
geom_point(data = year1, aes(x=cty,y=hwy,color="yellow")) +
geom_point(data = year2, aes(x=cty,y=hwy,color="green")) +
xlab('cty') +
ylab('hwy')
Now, this looks almost OK, but with non-matching colors (unless I suddenly became color-blind). Why is that?
Also, how can I add series names and change symbol shapes?
Don't build 2 different dataframes:
df <- mpg[which(mpg$year%in%c(1999,2008)),]
df$year<-as.factor(df$year)
ggplot() +
geom_point(data = df, aes(x=cty,y=hwy,color=year,shape=year)) +
xlab('cty') +
ylab('hwy')+
scale_color_manual(values=c("green","yellow"))+
scale_shape_manual(values=c(2,8))+
guides(colour = guide_legend("Year"),
shape = guide_legend("Year"))
This will work with the way you currently have it set-up:
ggplot() +
geom_point(data = year1, aes(x=cty,y=hwy), col = "yellow", shape=1) +
geom_point(data = year2, aes(x=cty,y=hwy), col="green", shape=2) +
xlab('cty') +
ylab('hwy')
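The colours in the question do not match because a constant placed inside aes(), such as color="yellow", is treated as a data value (a category that happens to be called "yellow") and is then assigned a colour from the default palette. Moving it outside aes(), as above, fixes the colour but drops the legend. A sketch that keeps the two data frames and still gets a legend, assuming the labels "1999" and "2008" are acceptable series names, is to map a label inside aes() and assign the colours with a manual scale:

```r
library(ggplot2)

year1 <- subset(mpg, year == 1999)
year2 <- subset(mpg, year == 2008)

# The strings inside aes() are series labels, not colours;
# scale_color_manual() then maps each label to the colour wanted
p <- ggplot(mapping = aes(x = cty, y = hwy)) +
  geom_point(data = year1, aes(color = "1999")) +
  geom_point(data = year2, aes(color = "2008")) +
  scale_color_manual(name = "Year",
                     values = c("1999" = "yellow", "2008" = "green"))
```

The same idea extends to shapes by adding shape = "1999" and shape = "2008" inside the two aes() calls together with a scale_shape_manual() that uses the same name.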
You want:
library(ggplot2)
ggplot(mpg, aes(cty, hwy, color=as.factor(year)))+geom_point()

plotting the whole data within each facet using facet_wrap and ggplot2

I am trying to plot line graphs with facet_wrap, one facet per dataset. What I would love to have is all the datasets drawn in the background of each facet, in light grey or with some transparency.
df <- data.frame(id=rep(letters[1:5], each=10),
x=seq(10),
y=runif(50))
ggplot(df, aes(x,y, group=id)) +
geom_line() +
facet_wrap(~ id)
This graph is how far I get, but I would love to have all the other missing 4 lines in each graph as well... In any way I try to use facet_wrap, I get only the data of a single line.
What I would expect is something like this for each facet.
ggplot(df, aes(x,y, group=id)) +
geom_line() +
geom_line(data=df[1:10,], aes(x,y, group=id), size=5)
Here's another approach:
First add a new column identical to id:
df$id2 <- df$id
Then add another geom_line based on the df without the original id column:
ggplot(df, aes(x,y, group=id)) +
geom_line(data=df[,2:4], aes(x=x, y=y, group=id2), colour="grey") +
geom_line() +
facet_wrap(~ id)
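The trick works because a layer whose data does not contain the faceting variable is drawn in every panel. A small self-contained sketch of the same idea follows; the name bg and the seed are arbitrary choices made for this example, and the background data frame is built explicitly instead of by column position.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed so the random example is repeatable
df <- data.frame(id = rep(letters[1:5], each = 10),
                 x  = seq(10),
                 y  = runif(50))

# Background copy WITHOUT the facet column `id`; the grouping lives in `id2`
bg <- data.frame(x = df$x, y = df$y, id2 = df$id)

p <- ggplot(df, aes(x, y, group = id)) +
  geom_line(data = bg, aes(group = id2), colour = "grey80") +  # all lines, every panel
  geom_line(colour = "black") +                                # this panel's own line
  facet_wrap(~ id)
```

Drawing the grey layer first keeps the highlighted line on top in each facet.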
Here is an approach. It might not be suitable for larger datasets, as we replicate the data number_of_facets-times.
First, we do some data-wrangling to create this desired dataframe.
df$obs_id <- 1:nrow(df) #unique ID for each observation
#new data with unique ID's and 'true' facets
df2 <- expand.grid(true_facet=unique(df$id), obs_id=1:nrow(df))
#merge them
dat <- merge(df,df2,by="obs_id",all=TRUE)
Then, we create a flag defining the 'true' faceted variable, and to discern background from foreground.
dat$col_flag <- dat$true_facet == dat$id
Now, plotting is easy. I've used geom_line twice instead of scales, as that was easier than to try to fix the ordering (would lead to black being plotted below grey).
p1 <- ggplot(dat, aes(x=x,y=y, group=id))+
geom_line(color="grey")+
geom_line(data=dat[dat$col_flag,],size=2,color="black")+
facet_wrap(~true_facet)
