Making multi-histogram in ggplot, not recognizing grouping - r

I'm trying to make a stack of histograms (or a ridgeplot) so I can compare distributions at certain timepoints in my observations.
I used this source for the histogram, and this for the ridge plots.
However, I cannot figure out how to set up my code to make either a stacked histogram or a ridgeline plot of length (L) by week, so that I can compare the L distributions at different weeks. I have tried the fill option in ggplot (which in the example seems to produce automatic color differences for the weeks because it is inside aes()?) and other "stacks" using the y= argument, but haven't had much success, probably because of the way my data is set up. If anyone can help me figure out how to make multiple histograms by week, that would be useful!
Thanks!
library(ggplot2)
#fake data
L = rnorm(110, mean=10, sd=2)  # 110 values, so L matches the length of t
t = c(rep.int(7,10), rep.int(14,20), rep.int(21,30), rep.int(28,20), rep.int(31,20), rep.int(36,10))
fake = data.frame(L, t)
#subset data into weeks for convenience
dayofweek = seq(7,120,7)
fake2 = subset(fake, t %in% dayofweek)
fake2$week <- floor(fake2$t/7)
#Plots, basic code
ggplot(fake2, aes(x=L, fill=week)) +
  geom_histogram()

I tried facet_grid before, but for some reason facet_wrap at least separated the graphs correctly, AND magically made the color fill work:
ggplot(fake2, aes(x=L, fill = week)) +
  geom_histogram() +
  facet_wrap(~week)
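A likely cause, offered as a hedged reading rather than a confirmed diagnosis: week is numeric, so ggplot2 treats fill = week as a continuous scale and does not split the data into one histogram per week. Inside facet_wrap() each panel holds a single week, so the fill value is constant within a panel and survives the binning, which would explain why the colors suddenly appeared. Converting week to a factor should give one group per week. The sketch below rebuilds the question's fake data (with L given 110 values so it matches t), then draws overlaid histograms, vertically stacked panels with facet_grid(), and, only if the optional ggridges package is installed, a ridgeline plot. The column name week_f, the seed, and bins = 20 are illustrative choices, not from the question.

```r
library(ggplot2)

# Rebuild the question's fake data; L gets 110 values so it matches t
set.seed(1)
t <- c(rep.int(7, 10), rep.int(14, 20), rep.int(21, 30),
       rep.int(28, 20), rep.int(31, 20), rep.int(36, 10))
fake <- data.frame(L = rnorm(length(t), mean = 10, sd = 2), t = t)

# Keep whole-week days only and label each row with its week number
fake2 <- subset(fake, t %in% seq(7, 120, 7))
fake2$week <- floor(fake2$t / 7)
# Discrete (factor) version of week: this is what defines one group per week
fake2$week_f <- factor(fake2$week)

# Overlaid histograms: one semi-transparent histogram per week
p_overlay <- ggplot(fake2, aes(x = L, fill = week_f)) +
  geom_histogram(position = "identity", alpha = 0.5, bins = 20)

# "Stack" of histograms: one row per week, all sharing the same x axis
p_stack <- ggplot(fake2, aes(x = L, fill = week_f)) +
  geom_histogram(bins = 20) +
  facet_grid(rows = vars(week_f))

print(p_overlay)
print(p_stack)

# Optional ridgeline version, drawn only when ggridges is installed
if (requireNamespace("ggridges", quietly = TRUE)) {
  p_ridges <- ggplot(fake2, aes(x = L, y = week_f, fill = week_f)) +
    ggridges::geom_density_ridges(alpha = 0.7)
  print(p_ridges)
}
```

The overlaid version makes it easy to compare shapes directly, while the facet_grid() version keeps each week's histogram readable when the distributions overlap heavily.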

Related

R - Bar Plot with transparency based on values?

I have a dataset myData which contains x and y values for various Samples. I can create a line plot for a dataset which contains a few Samples with the following pseudocode, and it is a good way to represent this data:
library(tidyr)    # gather() and the %>% pipe
library(ggplot2)
myData <- data.frame(x = 290:450, X52241 = c(..., ..., ...), X75123 = c(..., ..., ...))
myData <- myData %>% gather(Sample, y, -x)
ggplot(myData, aes(x, y)) + geom_line(aes(color=Sample))
Which generates:
This turns into a spaghetti plot when I add many more Samples, which makes the information hard to understand, so I want to represent the "hills" of each sample in another way. Preferably, I would like to represent the data as a series of stacked bars, one for each myData$Sample, with transparency inversely related to what is in myData$y. I've tried to represent that data in Photoshop (badly) here:
Is there a way to do this? Creating faceted plots using facet_wrap() or facet_grid() doesn't give me what I want (far too many Samples). I would also be open to stacked ridgeline plots using ggridges, but I am not understanding how I would be able to convert absolute values to a stat(density) value needed to plot those.
Any suggestions?
Thanks to u/Joris for the helpful suggestion! Since I did not find this question elsewhere, I'll go ahead and post the pretty simple solution to my question here for others to find.
Basically, I needed to apply the alpha aesthetic via aes(alpha=y, ...). In theory, this works with any geom. I tried geom_col(), which worked, but the best solution was to use geom_segment(), since all my "bars" were going to be the same length. Also note that I had to "slice" the segments into one-unit pieces (from x to x-1) to avoid overplotting problems similar to those found here, here, and here.
ggplot(myData, aes(x, Sample)) +
  geom_segment(aes(x=x, xend=x-1, y=Sample, yend=Sample, alpha=y), color='blue3', linewidth=14)  # linewidth needs ggplot2 >= 3.4.0; use size= on older versions
That gives us the nice gradient:
Since the maximum y values are not the same for both lines, to "match" the intensity I normalized the data (myDataNorm) and made the same plot. In my particular case, I preferred bars without a gradient that showed a hard edge for the maximum values of y. Here was one solution:
ggplot(myDataNorm, aes(x, Sample)) +
  geom_segment(aes(x=x, xend=x-1, y=Sample, yend=Sample, alpha=ifelse(y>0.9,1,0)), color='blue3', linewidth=14) +
  theme(legend.position='none')
Better, but I did not like the faint-colored areas that were left. The final code is what gave me something that perfectly captured what I was looking for. I simply moved the ifelse() statement to the x and xend aesthetics, so that only the slices with high enough y values are drawn; the rest collapse to zero-length segments at x=290, where my data "starts". There are probably more elegant ways to combine those x and xend terms, but whatever:
ggplot(myDataNorm, aes(x, Sample)) +
  geom_segment(aes(
    x=ifelse(y>0.9,x,290), xend=ifelse(y>0.9,x-1,290),
    y=Sample, yend=Sample), color='blue3', linewidth=14) +
  xlim(290,400)  # needed to show entire scale
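The answer does not include runnable data or show how myDataNorm was built, so this sketch fills both gaps with stated assumptions: two hypothetical Gaussian-shaped samples named after the question's columns, and a normalization that divides each sample's y by that sample's own maximum. Instead of the ifelse() trick, it passes only the rows with y > 0.9 to geom_segment(), which is one possibly tidier way to get the hard-edged bars. It uses pivot_longer(), the current tidyr replacement for gather(), and linewidth (ggplot2 3.4.0 or later; older versions use size).

```r
library(ggplot2)
library(tidyr)

# Hypothetical wide data: two samples with one "hill" each over x = 290:450
x <- 290:450
myData <- data.frame(
  x = x,
  X52241 = 5 * exp(-((x - 340) / 15)^2),
  X75123 = 2 * exp(-((x - 390) / 20)^2)
)

# Wide to long: one row per (x, Sample) pair, as gather(Sample, y, -x) does
myData <- pivot_longer(myData, -x, names_to = "Sample", values_to = "y")

# Assumed normalization: scale each sample so its own maximum is 1
myDataNorm <- myData
myDataNorm$y <- ave(myData$y, myData$Sample, FUN = function(v) v / max(v))

# Gradient strip: opacity follows the normalized y value
p_gradient <- ggplot(myDataNorm) +
  geom_segment(aes(x = x, xend = x - 1, y = Sample, yend = Sample, alpha = y),
               color = "blue3", linewidth = 14)

# Hard-edged strip: draw only the one-unit slices where y > 0.9
p_peaks <- ggplot(myDataNorm) +
  geom_segment(data = subset(myDataNorm, y > 0.9),
               aes(x = x, xend = x - 1, y = Sample, yend = Sample),
               color = "blue3", linewidth = 14) +
  scale_x_continuous(limits = c(290, 450))  # keep the full x range visible
```

Print p_gradient or p_peaks to view either version. Because filtering creates no zero-length placeholder segments, the x range has to be set explicitly, which is the same job xlim() does in the answer.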

Reordering data based on a column in [r] to order x-value items from lowest to highest y-values in ggplot

I have a dataframe that I want to reorder to make a ggplot so I can easily see which items have the highest and lowest values in them. In my case, I've grouped the data into two groups, and it'd be nice to have a visual representation of which group tends to score higher. Based on this question I came up with:
library(ggplot2)
cor.data<- read.csv("https://dl.dropbox.com/s/p4uy6uf1vhe8yzs/cor.data.csv?dl=0",stringsAsFactors = F)
cor.data.sorted = cor.data[with(cor.data,order(r.val,pic)),] #<-- line that doesn't seem to be working
ggplot(cor.data.sorted,aes(x=pic,y=r.val,size=df.val,color=exp)) + geom_point()
which produces this:
I've tried quite a few variants to reorder the data, and I feel like this should be pretty simple to achieve. To clarify, if I had successfully reorganised the data, the y-values would increase from left to right along the x-axis. So maybe I'm focusing on the wrong part of the code to achieve this in a ggplot figure?
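The suspicion in that last sentence appears to be right, and it can be checked directly: on a discrete axis, ggplot2 places the categories in the order of the factor levels (alphabetical for a character column), not in row order, so sorting the data frame alone does not move the points. The small data frame below is a hypothetical stand-in for cor.data.

```r
# Hypothetical stand-in for cor.data: picture labels and their r values
d <- data.frame(pic = c("b", "c", "a"), r.val = c(0.9, 0.1, 0.5),
                stringsAsFactors = FALSE)

# Sorting the rows changes the row order...
d_sorted <- d[order(d$r.val), ]
print(d_sorted$pic)                  # "c" "a" "b"

# ...but a discrete axis follows the factor levels, which stay alphabetical
print(levels(factor(d_sorted$pic)))  # "a" "b" "c"
```

So the ordering has to be expressed in the factor levels of the x variable rather than in the row order.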
You could do something like this?
library(tidyverse)
cor.data %>%
  # assumes each pic value appears only once; factor() errors on duplicated levels
  mutate(pic = factor(pic, levels = as.character(pic)[order(r.val)])) %>%
  ggplot(aes(x = pic, y = r.val, size = df.val, color = exp)) + geom_point()
This obviously still needs some polishing to deal with the x axis label clutter etc.
Rather than try to order the data before creating the plot, I can reorder the data at the time of writing the plot:
library(ggplot2)
cor.data <- read.csv("https://dl.dropbox.com/s/p4uy6uf1vhe8yzs/cor.data.csv?dl=0", stringsAsFactors = FALSE)
cor.data.sorted = cor.data[with(cor.data, order(r.val, pic)), ]  #<-- controls the order in which points are drawn, for a (slightly) more readable plot
ggplot(cor.data.sorted, aes(x = reorder(pic, r.val), y = r.val, size = df.val, color = exp)) + geom_point()
to create
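reorder(pic, r.val) in the answer above works because it returns pic as a factor whose levels are sorted by a summary of r.val within each level (the mean by default), and the discrete axis follows those levels. Negating the value reverses the order, and when a label appears more than once the FUN argument chooses the summary. The data below are a hypothetical stand-in for cor.data.

```r
# Hypothetical stand-in: label "a" appears twice
d <- data.frame(pic = c("a", "b", "c", "a"),
                r.val = c(0.2, 0.5, 0.1, 0.6))

# Levels ordered by mean r.val per label: c (0.1), a (0.4), b (0.5)
levels(reorder(d$pic, d$r.val))

# Highest first: negate the value being summarized, giving b, a, c
levels(reorder(d$pic, -d$r.val))

# Order by the largest value per label instead of the mean: c, b, a
levels(reorder(d$pic, d$r.val, FUN = max))
```

Any of these expressions can be used directly as the x aesthetic, as in aes(x = reorder(pic, -r.val), ...).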

Multiple line plot using ggplot2

I am trying to emulate a ggplot of multiple lines which works as follows:
library(ggplot2)
set.seed(45)
df <- data.frame(x=c(1,2,3,4,5,1,2,3,4,5,3,4,5), val=sample(1:100, 13),
                 variable=rep(paste0("category", 1:3), times=c(5,5,3)))
ggplot(data = df, aes(x=x, y=val)) + geom_line(aes(colour=variable))
I can get this simple example to work, however on a much larger data set I am following the same steps but it is not working.
ncurrencies = 6
dates = c(BTC$Date, BCH$Date, LTC$Date, ETH$Date, XRP$Date, XVG$Date)
opens = c(BTC$Open, BCH$Open, LTC$Open, ETH$Open, XRP$Open, XVG$Open)
categories = rep(paste0("categories", 1:ncurrencies),
                 times=c(nrow(BTC), nrow(BCH), nrow(LTC), nrow(ETH), nrow(XRP), nrow(XVG)))
df = data.frame(dates, opens, categories)
# Plot - Not correct.
ggplot(data=df, aes(x=dates, y=opens)) +
  geom_line(aes(colour=categories))
As you can see, the different points are discretised and the y-axis is strange. I am guessing this is a rookie error but I have been going round in circles for a while. Can anyone see it?
P.S. I don't think I can upload the data here as it would be too much code. However, the dataframe is in the same format as the practice example and the categories match up correctly to the x and y data. Therefore I believe it is the way I am defining ggplot - I am relatively new to R.
Thank you Markus and Jan, yes you are correct. df$opens was a factor and changing it to a numeric solved the problem.
opens = as.numeric(c(BTC$Open, BCH$Open, LTC$Open, ETH$Open, XRP$Open, XVG$Open))
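Two hedged follow-ups, since the real currency data frames are not shown. First, the per-currency frames can be stacked from a named list, so the category labels come from the list names instead of row counts typed by hand. Second, as.numeric() is only safe on character input: if the Open columns are themselves factors, as.numeric() returns the internal level codes rather than the prices, and as.numeric(as.character(x)) is the usual fix. The two tiny data frames, their values, and the "%Y-%m-%d" date format are assumptions for illustration.

```r
# Hypothetical stand-ins for two of the currency data frames, read in as factors
BTC <- data.frame(Date = c("2018-01-01", "2018-01-02"), Open = c("7.0", "10.5"),
                  stringsAsFactors = TRUE)
ETH <- data.frame(Date = c("2018-01-01", "2018-01-02", "2018-01-03"),
                  Open = c("3.2", "4.1", "3.9"), stringsAsFactors = TRUE)

coins <- list(BTC = BTC, ETH = ETH)

# Stack the frames; the list names become the category labels
df <- do.call(rbind, Map(function(d, name) {
  data.frame(dates = as.Date(as.character(d$Date), format = "%Y-%m-%d"),
             # factor -> character -> numeric keeps the actual values
             opens = as.numeric(as.character(d$Open)),
             categories = name)
}, coins, names(coins)))

str(df)

# The pitfall: as.numeric() on a factor returns level codes, not values
f <- factor(c("7.0", "10.5", "3.2"))
as.numeric(f)                 # 3 1 2 (codes of the alphabetically sorted levels)
as.numeric(as.character(f))   # 7.0 10.5 3.2
```

The resulting df can be passed straight to the question's ggplot() call, with a Date x axis and a numeric y axis.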

ggplot boxplots with scatterplot overlay (same variables)

I'm an undergrad researcher and I've been teaching myself R over the past few months. I just started trying ggplot, and have run into some trouble. I've made a series of boxplots looking at the depth of fish at different acoustic receiver stations. I'd like to add a scatterplot that shows the depths of the receiver stations. This is what I have so far:
library(ggplot2)
data <- read.csv(".....MPS.csv", header=TRUE)
df <- data.frame(f1=factor(data$Tagging.location),
                 f2=factor(data$Station), data$Detection.depth)
df2 <- data.frame(f2=factor(data$Station), data$depth)
df$f1f2 <- interaction(df$f1, df$f2)
# give.n is a helper function that is not defined in this post
plot1 <- ggplot(aes(y = data$Detection.depth, x = f2, fill = f1), data = df) +
  geom_boxplot() + stat_summary(fun.data = give.n, geom = "text",
                                position = position_dodge(width = 0.75), size = 3)
plot1 + xlab("MPS Station") + ylab("Depth(m)") +
  theme(legend.title=element_blank()) + scale_y_reverse() +
  coord_cartesian(ylim=c(150, -10))
plot2 <- ggplot(aes(y=data$depth, x=f2), data=df2) + geom_point()
plot2 + scale_y_reverse() + coord_cartesian(ylim=c(150,-10)) +
  xlab("MPS Station") + ylab("Depth (m)")
Unfortunately, since I'm a new user in this forum, I'm not allowed to upload images of these two plots. My x-axis is "Stations" (which has 12 options) and my y-axis is "Depth" (0-150 m). The boxplots are colour-coded by tagging site (which has 2 options). The depths are coming from two different columns in my spreadsheet, and they cannot be combined into one.
My goal is to combine those two plots by adding "plot2" (Station depth scatterplot) to "plot1" boxplots (Detection depths). They are both looking at the same variables (depth and station), and must use the same y-axis scale.
I think I could figure out a messy workaround if I were using the R base program, but I would like to learn ggplot properly, if possible. Any help is greatly appreciated!
Update: I was confused by the language used in the original post, and wrote a slightly more complicated answer than necessary. Here is the cleaned up version.
Step 1: Setting up. Here, we make sure the depth values in both data frames have the same variable name (for readability).
df <- data.frame(f1=factor(data$Tagging.location), f2=factor(data$Station), depth=data$Detection.depth)
df2 <- data.frame(f2=factor(data$Station), depth=data$depth)
Step 2: Now you can plot this with the ggplot function and split the data by using the `col=f1` argument. We'll plot the detection data separately, since that requires a boxplot, and then we'll plot the depths of the stations with colored points (assuming each station only has one depth). We give each layer its own data by passing data= inside the geom functions, instead of specifying the data inside the main ggplot function. It should look something like this:
ggplot() +
  geom_boxplot(data=df, aes(x=f2, y=depth, col=f1)) +
  geom_point(data=df2, aes(x=f2, y=depth), colour="blue") +
  scale_y_reverse()
In this plot example, we use boxplots to represent the detection data and color those boxplots by tagging location (f1). The station depths, however, are plotted separately as points in a single fixed color, so we will be able to see them clearly in relation to the boxplots.
You should be able to adjust the plot from here to suit your needs.
I've created some dummy data and loaded it into the chart to show you what it would look like. Keep in mind that this is purely random data and doesn't really make sense.
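Neither the question's data nor the answer's dummy data are shown, so here is a self-contained sketch with invented numbers (12 stations and two tagging sites, matching the description; column names follow the answer's Step 1). It also includes a stand-in for the question's give.n helper, which the question uses but never defines; this version, which labels each box with its sample size at the median, is an assumption rather than the asker's function.

```r
library(ggplot2)

set.seed(42)
stations <- paste0("S", 1:12)

# Hypothetical detection depths: 20 detections per station and tagging site
df <- expand.grid(f1 = c("Site A", "Site B"), f2 = stations, rep = 1:20)
df$f2 <- factor(df$f2, levels = stations)
df$depth <- runif(nrow(df), min = 0, max = 150)

# Hypothetical station depths: one value per station
df2 <- data.frame(f2 = factor(stations, levels = stations),
                  depth = runif(length(stations), min = 20, max = 150))

# Assumed helper: place the number of observations at the median
give.n <- function(x) c(y = median(x), label = length(x))

p <- ggplot() +
  geom_boxplot(data = df, aes(x = f2, y = depth, col = f1)) +
  # dodge width 0.75 matches the default dodging of the boxplots
  stat_summary(data = df, aes(x = f2, y = depth, col = f1),
               fun.data = give.n, geom = "text", size = 3,
               position = position_dodge(width = 0.75)) +
  geom_point(data = df2, aes(x = f2, y = depth), colour = "blue") +
  scale_y_reverse() +
  xlab("MPS Station") + ylab("Depth (m)") +
  theme(legend.title = element_blank())
print(p)
```

Because every layer carries its own data argument, the two depth columns never need to be merged; they only share the f2 station factor and the reversed y scale.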

Is it possible to create 3 series (2 lines and one point) faceted plot in ggplot?

I am trying to port code I wrote with R's base graphics package to ggplot2.
The graph I obtained using the basic graphics package is as follows:
I was wondering whether this type of graph is possible to create in ggplot2. I think this kind of graph could be built from separate panels, but I was wondering whether it is possible to use faceting for this kind of plot. The major difficulty I encountered is that the maximum and minimum series have the same length, whereas the observed data are not continuous and are recorded at quite different intervals.
Any thoughts on arranging the data for this type of plot would be very helpful. Thank you so much.
Jdbaba,
From your comments, you mentioned that you'd like the geom_point layer to show just the point symbol in the legend. As far as I know, this is not yet directly supported in ggplot2. However, there's a fix/work-around given by @Aniko in this post. It's a bit tricky but brilliant! And it works great. Here's a version that I tried out. Hope it is what you expected.
library(ggplot2)
# bind both your data.frames
df <- rbind(tempcal, tempobs)
p <- ggplot(data = df, aes(x = time, y = data, colour = group1,
linetype = group1, shape = group1))
p <- p + geom_line() + geom_point()
p <- p + scale_shape_manual("", values=c(NA, NA, 19))
p <- p + scale_linetype_manual("", values=c(1,1,0))
p <- p + scale_colour_manual("", values=c("#F0E442", "#0072B2", "#D55E00"))
p <- p + facet_wrap(~ id, ncol = 1)
p
The idea is to first create a plot with all the necessary attributes set in the aesthetics section, plot what you want, and then change the settings manually with scale_*_manual. You can unset lines with a 0 in scale_linetype_manual, for example. Similarly, you can unset points for the line series using NA in scale_shape_manual. Here, the first two values are for group1 = maximum and minimum and the last is for observed (the values follow the factor levels of group1, which are alphabetical by default). So we set the shape to NA for maximum and minimum, and the linetype to 0 for observed.
And this is the plot:
Solution found:
Thanks to Arun and Andrie
Just in case somebody needs the solution to this sort of problem.
The code I used was as follows:
library(ggplot2)
tempcal <- read.csv("temp data ggplot.csv", header=TRUE)
tempobs <- read.csv("temp data observed ggplot.csv", header=TRUE)
p <- ggplot(tempcal, aes(x=time, y=data)) +
  geom_line(aes(colour=group1)) +
  geom_point(data=tempobs, aes(colour=group1)) +
  facet_wrap(~id)
p
The datasets used were https://www.dropbox.com/s/95sdo0n3gvk71o7/temp%20data%20observed%20ggplot.csv
https://www.dropbox.com/s/4opftofvvsueh5c/temp%20data%20ggplot.csv
The plot obtained was as follows:
Jdbaba
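The Dropbox files may no longer be available, so this sketch invents a small tempcal/tempobs pair with the column names used above (time, data, group1, id); the values, the two site ids, and the sampling times are all assumptions. It follows the manual-scale approach from the answer and checks the point that approach relies on: the values vectors are matched to the levels of group1, which are alphabetical by default (maximum, minimum, observed).

```r
library(ggplot2)

set.seed(7)

# Hypothetical simulated envelope: maximum and minimum at every time step, two sites
tempcal <- expand.grid(time = 1:30, group1 = c("maximum", "minimum"),
                       id = c("site 1", "site 2"), stringsAsFactors = FALSE)
tempcal$data <- ifelse(tempcal$group1 == "maximum", 25, 15) + sin(tempcal$time / 5)

# Hypothetical observations: sparse and irregular in time
tempobs <- expand.grid(time = c(2, 9, 17, 26), group1 = "observed",
                       id = c("site 1", "site 2"), stringsAsFactors = FALSE)
tempobs$data <- runif(nrow(tempobs), min = 15, max = 25)

df <- rbind(tempcal, tempobs)
print(levels(factor(df$group1)))  # "maximum" "minimum" "observed"

p <- ggplot(df, aes(x = time, y = data, colour = group1,
                    linetype = group1, shape = group1)) +
  geom_line() +
  geom_point(na.rm = TRUE) +                               # drop NA shapes quietly
  scale_shape_manual("", values = c(NA, NA, 19)) +         # points only for observed
  scale_linetype_manual("", values = c(1, 1, 0)) +         # no line for observed
  scale_colour_manual("", values = c("#F0E442", "#0072B2", "#D55E00")) +
  facet_wrap(~ id, ncol = 1)
print(p)
```

If group1 were a factor with a different level order, the three values vectors would need to be reordered to match, or given as named vectors such as c(maximum = NA, minimum = NA, observed = 19).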
