Partially "free_y" facet_wrap with ggplot (R)

My goal is to produce a column graph showing different element concentrations.
There is a very wide range, so I want to customise the scale of my faceted graph into 3 groups.
That way the graphs can show the variation between samples for each element and still be comparable between elements,
so ideally I would have 3 different scales, one each for Groups 1, 2 and 3, in the graph below.
This is the code to make the above graph:
library(ggplot2)
library(scales)   # provides comma() for the y-axis labels

ggplot(binded) +
  aes(y = mean,
      x = sample,
      group = id) +
  geom_col(aes(fill = element)) +
  geom_errorbar(aes(ymin = mean - sd,
                    ymax = mean + sd)) +
  facet_wrap(rang ~ element) +   # one panel per (rang, element) combination
  scale_x_continuous(breaks = seq(1, 15, by = 1),
                     name = "Sample ID") +
  scale_y_continuous(name = "Elemental Conc. (mg/kg)", labels = comma) +
  theme(legend.position = "none")
The data used is below.
If I switch the faceting to facet_wrap(rang~element, scales = "free_y"), then I get:
Is there any way to make the scales free only within each group of rang?
I suspect I'm going to have to just create 3 separate graphs.
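For what it's worth, a single faceted plot can probably get close to this without separate graphs. A common trick is to keep scales = "free_y" but add an invisible geom_blank() layer that pushes the top of every panel in a rang group up to that group's largest mean + sd. Panels within a group then share a y range, while the three groups still differ. The sketch below uses made-up data in the shape of binded (the element names, group assignments and values are all hypothetical); group_max and blank_df are helper names introduced here.

```r
library(ggplot2)

# Hypothetical stand-in for the question's `binded` data (values are random)
set.seed(1)
binded <- expand.grid(sample = 1:15,
                      element = c("Ca", "K", "Mg", "Fe", "Zn", "Cu"))
binded$rang <- ifelse(binded$element %in% c("Ca", "K"), "Group 1",
               ifelse(binded$element %in% c("Mg", "Fe"), "Group 2", "Group 3"))
# Concentrations differ by orders of magnitude between groups
group_scale <- c("Group 1" = 1e4, "Group 2" = 1e2, "Group 3" = 1)
binded$mean <- runif(nrow(binded), 0.5, 1) * unname(group_scale[binded$rang])
binded$sd   <- binded$mean * 0.1

# Largest bar + error bar within each rang group
binded$group_max <- ave(binded$mean + binded$sd, binded$rang, FUN = max)

# One invisible point per panel, placed at its group's maximum
blank_df <- unique(binded[, c("rang", "element", "group_max")])

p <- ggplot(binded, aes(x = sample, y = mean)) +
  geom_col(aes(fill = element)) +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd)) +
  # geom_blank() draws nothing but still trains each panel's y scale
  geom_blank(data = blank_df, aes(x = 1, y = group_max), inherit.aes = FALSE) +
  facet_wrap(rang ~ element, scales = "free_y") +
  theme(legend.position = "none")
p
```

The catch is that every panel still draws its own y axis, so the scales only match visually within a group rather than being drawn once per group.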

Thanks to Danlooo for suggesting patchwork: using that package to combine 3 separate graphs, plus another one for the y-axis label, proved successful.
I produced several graphs with the original code, each from a data frame filtered to a different concentration range, and then used the following patchwork code to produce the final graph:
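library(patchwork)
# p1, p2, p3: one plot per concentration group; p4 is presumably the narrow y-axis-label plot
p5 <- (p1 | p2) / p3 + plot_layout(heights = c(1, 2))   # p1 beside p2, p3 below and twice as tall
(p4 + p5) + plot_layout(widths = c(1, 25))             # label plot takes 1/26 of the width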
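If the number of groups changes, the per-group plots can also be generated instead of written out by hand: split the data by rang, build one plot per group, and let patchwork stack them. This is a sketch assuming the same hypothetical binded layout (random values; plot_group is a helper name made up here). Within each group the y scale is fixed, so elements in the same group are directly comparable.

```r
library(ggplot2)
library(patchwork)

# Hypothetical data in the same shape as `binded` (random values)
set.seed(1)
binded <- expand.grid(sample = 1:15,
                      element = c("Ca", "K", "Mg", "Fe", "Zn", "Cu"))
binded$rang <- ifelse(binded$element %in% c("Ca", "K"), "Group 1",
               ifelse(binded$element %in% c("Mg", "Fe"), "Group 2", "Group 3"))
group_scale <- c("Group 1" = 1e4, "Group 2" = 1e2, "Group 3" = 1)
binded$mean <- runif(nrow(binded), 0.5, 1) * unname(group_scale[binded$rang])
binded$sd   <- binded$mean * 0.1

# One plot per rang group; facet_wrap's default fixed scales are shared inside it
plot_group <- function(d) {
  ggplot(d, aes(x = sample, y = mean)) +
    geom_col(aes(fill = element)) +
    geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd)) +
    facet_wrap(~element, nrow = 1) +   # unused element levels are dropped
    labs(title = unique(d$rang), x = "Sample ID", y = NULL) +
    theme(legend.position = "none")
}

plots <- lapply(split(binded, binded$rang), plot_group)

# Stack the group plots vertically
combined <- wrap_plots(plots, ncol = 1)
combined
```

A shared y-axis label plot like p4 can then be placed beside combined with the same widths trick as above.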

Related

Changing datastructure to create correct bar graph in ggplot

I would like to make a graph in R that I managed to make in Excel. It is a bar graph with species on the x-axis and the log number of observations on the y-axis. My current data structure in R is not suitable (I think) for this graph, but I do not know how to change it (in a smart way).
I have (amongst others) a column 'camera_site' (site1, site2, ...), 'species' (agouti, paca, ...) and 'count' (1, 2, ...), with about 50,000 observations.
I tried making a data frame with a column 'species' (with 18 species) and a column with 'log(total observations)' for each species (see data frame), but then I can only make a point graph.
This is how I would like the graph to look:
[Image: desired graph made in Excel]
Your data seems to be in the correct format from what I can tell from your screenshot.
The minimum amount of code you would need to get a plot like that would be the following, assuming your data.frame is called df:
ggplot(df, aes(VRM_species, log_obs_count_vrm)) +
geom_col()
Many people intuitively try geom_bar(), but geom_col() is equivalent to geom_bar(stat = "identity"), which you would use if you've pre-computed observations and don't need ggplot to do the counting for you.
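Since the question starts from raw rows (camera_site, species, count), the pre-computing step might look like the following in base R. The data here is a small hypothetical stand-in (three species, random counts), and taking log10 of the summed count per species is an assumption about what "log number of observations" means; use log() instead if the natural log is wanted.

```r
library(ggplot2)

# Hypothetical raw observations in the shape described in the question
set.seed(42)
obs <- data.frame(
  camera_site = sample(c("site1", "site2", "site3"), 500, replace = TRUE),
  species     = sample(c("agouti", "paca", "ocelot"), 500, replace = TRUE),
  count       = sample(1:3, 500, replace = TRUE)
)

# Pre-compute: total count per species, then its base-10 log
totals <- aggregate(count ~ species, data = obs, FUN = sum)
totals$log_total <- log10(totals$count)

# geom_col() draws the pre-computed heights as they are (no counting by ggplot)
p <- ggplot(totals, aes(species, log_total)) +
  geom_col()
p
```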
But you could probably decorate the plot a bit better with some additions:
ggplot(df, aes(VRM_species, log_obs_count_vrm)) +
  geom_col() +
  scale_x_discrete(name = "Species") +
  # expand = c(mult_lower, add_lower, mult_upper, add_upper): bars start at 0, 10% headroom on top
  scale_y_continuous(name = expression("Log"[10]*" Observations"),
                     expand = c(0, 0, 0.1, 0)) +
  theme(axis.text.x = element_text(angle = 90))
Of course, you could customize the theme any way you like.
Cheers

compare boxplots with a single value

I want to compare the distribution of several variables (here X1 and X2) with a single value (here bm). The issue is that these variables are too many (about a dozen) to use a single boxplot.
Additionally, the levels are too different to use one plot, so I need to use facets to keep things organised:
However, with this plot my benchmark category (bm), which is a single value in X1 and X2, does not appear in X1 and seems to have several values in X2. I want it to be only the green line, as it is in the first plot. Any ideas why it changes? Is there a good workaround? I tried the options of facet_wrap/facet_grid, but none of them delivered the right result.
I also tried combining a bar plot of bm and three empty categories with the boxplot. But firstly it looked terrible, and secondly it got similarly messed up in the faceting. Basically, any workaround would help.
Below is the code to create the minimal example displayed here:
# Creating some sample data & loading libraries
library(ggplot2)
library(RColorBrewer)
set.seed(10111)
x=matrix(rnorm(40),20,2)
y=rep(c(-1,1),c(10,10))
x[y==1,]=x[y==1,]+1
x[,2]=x[,2]+20
df=data.frame(x,y)
# creating a benchmark point
benchmark=data.frame(y=rep("bm",2),key=c("X1","X2"),value=c(-0.216936,20.526312))
# melting the data frame, rbinding it with the benchmark
test_dat=rbind(tidyr::gather(df,key,value,-y),benchmark)
# Creating a plot
p_box <- ggplot(data = test_dat, aes(x=key, y=value,color=as.factor(test_dat$y))) +
geom_boxplot() + scale_color_manual(name="Cluster",values=brewer.pal(8,"Set1"))
# The first line delivers the first plot, the second line the second plot
p_box
p_box + facet_wrap(~key,scales = "free",drop = FALSE) + theme(legend.position = "bottom")
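The problem lies only in the use of test_dat$y inside the colour aes(). Never use $ inside aes(): the expression is evaluated as a standalone vector rather than as a column of the data ggplot works with internally, so when faceting reorders the rows into panels, the colours no longer line up with the right observations. Map the bare column instead: aes(..., color = as.factor(y)).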
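Applied to the question's own example, the fix is just mapping the bare column y. The sketch below rebuilds the same sample data; to keep it self-contained, the tidyr::gather() step is replaced by an equivalent base R reshape, and scale_color_brewer(palette = "Set1") stands in for brewer.pal(8, "Set1") (it should give the same first three colours). With this mapping each panel should get three boxes, with bm drawn as a single line.

```r
library(ggplot2)

# Same sample data as in the question
set.seed(10111)
x <- matrix(rnorm(40), 20, 2)
y <- rep(c(-1, 1), c(10, 10))
x[y == 1, ] <- x[y == 1, ] + 1
x[, 2] <- x[, 2] + 20
df <- data.frame(x, y)
benchmark <- data.frame(y = rep("bm", 2), key = c("X1", "X2"),
                        value = c(-0.216936, 20.526312))

# Base R equivalent of tidyr::gather(df, key, value, -y)
long <- data.frame(y = rep(df$y, 2),
                   key = rep(c("X1", "X2"), each = nrow(df)),
                   value = c(df$X1, df$X2))
test_dat <- rbind(long, benchmark)   # y becomes character: "-1", "1", "bm"

# Map the column name `y`, not test_dat$y, so each colour stays with its row
p_box <- ggplot(test_dat, aes(x = key, y = value, color = as.factor(y))) +
  geom_boxplot() +
  scale_color_brewer(name = "Cluster", palette = "Set1") +
  facet_wrap(~key, scales = "free") +
  theme(legend.position = "bottom")
p_box
```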
Anyway, I think your plot would improve if you used a geom_hline for the benchmark instead of hacking in a single-value boxplot:
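library(ggplot2)
library(RColorBrewer)
# df and benchmark are the objects created in the question's code
ggplot(tidyr::gather(df, key, value, -y)) +
  geom_boxplot(aes(x = key, y = value, color = as.factor(y))) +
  # benchmark has a `key` column, so each line lands in its own facet
  # (linewidth needs ggplot2 >= 3.4.0; use size = 1 on older versions)
  geom_hline(data = benchmark, aes(yintercept = value), color = '#4DAF4A', linewidth = 1) +
  scale_color_manual(name = "Cluster", values = brewer.pal(8, "Set1")) +
  facet_wrap(~key, scales = "free", drop = FALSE) +
  theme(legend.position = "bottom")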

plot subset of proportion data in ggplot2 facet, maintain original proportion information

I am interested in focusing on two factor variables in a faceted 35-panel grid of stacked bar charts, generating just a 10-panel plot.
The code for the original plot (which works) is as follows:
library(ggplot2)
library(scales)   # percent_format()

# Stacking order follows the factor levels of Region
ggplot(region, aes(Year, Em_sum/1000000, fill = Region)) +
  geom_bar(position = "fill", stat = "identity") +   # "fill" rescales each bar to 100%
  scale_fill_brewer(palette = "Set1") +
  guides(fill = guide_legend(reverse = TRUE)) +
  scale_y_continuous(labels = percent_format()) +
  ylab("Proportion of Global FLW Emissions (%)") +
  scale_x_discrete(breaks = seq(1961, 2011, 5)) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  facet_grid(Group ~ Stage, scales = "free")
This code produces a 35-panel faceted grid of stacked bar charts. What I would like to do, in effect, is show in each panel data from only two regions (instead of the original seven), while keeping the proportions these two regions have in the full data set. The result would be a 10-panel grid on a 0-100% scale, but the two regions would not fill 100% of a facet, only their own share. The remaining white space in each panel would be the combined proportion of the regions not included.
Implementing the solution from the answer to a seemingly similar question (Subset and ggplot2), I changed the first line of my code as follows:
ggplot(subset(region, Region %in% c("NAm.Oceania", "Indus.Asia"), aes(Year, Em_sum/1000000, fill=Region, order=Region))) +
I get the following error:
Error in x[j] : invalid subscript type 'list'
Simplifying the code as follows (in line with the suggested solution above) produces a 100% stacked bar chart of just the two desired regions, but loses all the information from the other five regions; i.e. each bar is filled to 100% rather than to the lower value that is the proportion of the two desired regions relative to all seven.
ggplot(subset(region, Region %in% c("NAm.Oceania", "Indus.Asia"))) +
geom_bar(aes(Year, Em_sum/1000000, fill=Region, order=Region), position='fill', stat='identity')
My dataset is 232k rows - if an extract of that dataset would be useful, please suggest how I could provide it.
Thanks!
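Two things seem to be going on here. The error comes from a misplaced parenthesis: aes(...) ends up inside subset(), where it is taken as the select argument, which is why x[j] complains about a list. The bigger issue is ordering: position = "fill" always rescales each bar to 100%, so the shares have to be computed on the full data before subsetting and then stacked as they are. A minimal sketch, assuming a small hypothetical region data frame with the same column names (three regions, made-up Em_sum values); share and kept are names introduced here.

```r
library(ggplot2)
library(scales)

# Hypothetical data: 3 regions x 2 groups x 2 stages x 3 years
set.seed(7)
region <- expand.grid(Year   = factor(2000:2002),
                      Region = c("NAm.Oceania", "Indus.Asia", "Other"),
                      Group  = c("G1", "G2"),
                      Stage  = c("S1", "S2"))
region$Em_sum <- runif(nrow(region), 1e6, 5e6)

# 1. Share of each region within its Year x Group x Stage bar, using ALL regions
region$share <- region$Em_sum /
  ave(region$Em_sum, region$Year, region$Group, region$Stage, FUN = sum)

# 2. Only now drop the regions that should not be shown
kept <- subset(region, Region %in% c("NAm.Oceania", "Indus.Asia"))

# 3. Stack the pre-computed shares; position = "stack" keeps the gap to 100%
p <- ggplot(kept, aes(Year, share, fill = Region)) +
  geom_col(position = "stack") +
  scale_fill_brewer(palette = "Set1") +
  scale_y_continuous(labels = percent_format(), limits = c(0, 1)) +
  facet_grid(Group ~ Stage)
p
```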

Keep same scale in different graphs ggplot2

I want to create 3 graphs in ggplot2 as follows:
ggplot(observbest,aes(x=factor(iteration),y=bottles,colour=Team ,group=Team)) + geom_line() + scale_colour_gradientn(colours=rainbow(16))
ggplot(observmedium,aes(x=factor(iteration),y=bottles,colour=Team ,group=Team)) + geom_line() + scale_colour_gradientn(colours=rainbow(16))
ggplot(observweak,aes(x=factor(iteration),y=bottles,colour=Team ,group=Team)) + geom_line() + scale_colour_gradientn(colours=rainbow(16))
That is, three graphs displaying the same thing but for a different dataset each time. I want to compare them, so I want their y axes fixed to the same scale with the same margins on all graphs, something that currently doesn't happen automatically.
Any suggestion?
Thanks
It sounds like a facet_wrap on all the observations, combined into a single dataframe, might be what you're looking for. E.g.
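library(ggplot2)

# Stack the three datasets, tagging each row with its source
observ <- rbind(
  transform(observbest,   category = "best"),
  transform(observmedium, category = "medium"),
  transform(observweak,   category = "weak")
)

# One panel per dataset; facet_wrap shares the y scale across panels by default
ggplot(observ, aes(x = factor(iteration), y = bottles, colour = Team, group = Team)) +
  geom_line() +
  scale_colour_gradientn(colours = rainbow(16)) +
  facet_wrap(~category)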
Add + ylim(min_value, max_value) to each graph, using the same values for all three.
Another option would be to merge the three datasets with an id variable identifying which value is in which dataset, and then plot the three of them together, differentiating them by linetype for instance.
Use scale_y_continuous() with the same limits argument in every graph to define a common y axis and make them all easily comparable.
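To make the "same scale" suggestions concrete: compute one range from all three datasets and pass it to each plot. A sketch with hypothetical stand-in data (make_obs, datasets, y_lim and plot_obs are illustrative names, not from the question):

```r
library(ggplot2)

# Hypothetical stand-ins for observbest / observmedium / observweak
set.seed(3)
make_obs <- function(top) data.frame(
  iteration = rep(1:10, times = 4),
  Team      = rep(1:4, each = 10),
  bottles   = round(runif(40, 0, top))
)
datasets <- list(best = make_obs(100), medium = make_obs(60), weak = make_obs(30))

# One shared y range computed from all three datasets
y_lim <- range(unlist(lapply(datasets, `[[`, "bottles")))

plot_obs <- function(d) {
  ggplot(d, aes(x = factor(iteration), y = bottles, colour = Team, group = Team)) +
    geom_line() +
    scale_colour_gradientn(colours = rainbow(16)) +
    scale_y_continuous(limits = y_lim)   # identical limits give identical y axes
}
plots <- lapply(datasets, plot_obs)
```

Setting limits through scale_y_continuous() (or ylim()) drops any point outside them; since y_lim spans all three datasets, nothing is lost here. If zooming without dropping data is ever needed, coord_cartesian(ylim = y_lim) is the alternative.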

ggplot boxplots with scatterplot overlay (same variables)

I'm an undergrad researcher and I've been teaching myself R over the past few months. I just started trying ggplot, and have run into some trouble. I've made a series of boxplots looking at the depth of fish at different acoustic receiver stations. I'd like to add a scatterplot that shows the depths of the receiver stations. This is what I have so far:
data <- read.csv(".....MPS.csv", header=TRUE)
df <- data.frame(f1=factor(data$Tagging.location),
f2=factor(data$Station),data$Detection.depth)
df2 <- data.frame(f2=factor(data$Station), data$depth)
df$f1f2 <- interaction(df$f1, df$f2)
plot1 <- ggplot(aes(y = data$Detection.depth, x = f2, fill = f1), data = df) +
geom_boxplot() + stat_summary(fun.data = give.n, geom = "text",
position = position_dodge(height = 0, width = 0.75), size = 3)
plot1+xlab("MPS Station") + ylab("Depth(m)") +
theme(legend.title=element_blank()) + scale_y_reverse() +
coord_cartesian(ylim=c(150, -10))
plot2 <- ggplot(aes(y=data$depth, x=f2), data=df2) + geom_point()
plot2+scale_y_reverse() + coord_cartesian(ylim=c(150,-10)) +
xlab("MPS Station") + ylab("Depth (m)")
Unfortunately, since I'm a new user in this forum, I'm not allowed to upload images of these two plots. My x-axis is "Stations" (which has 12 options) and my y-axis is "Depth" (0-150 m). The boxplots are colour-coded by tagging site (which has 2 options). The depths are coming from two different columns in my spreadsheet, and they cannot be combined into one.
My goal is to combine those two plots by adding "plot2" (station depth scatterplot) to the "plot1" boxplots (detection depths). They both look at the same variables (depth and station) and must have the same y-axis scale.
I think I could figure out a messy workaround if I were using the R base program, but I would like to learn ggplot properly, if possible. Any help is greatly appreciated!
Update: I was confused by the language used in the original post, and wrote a slightly more complicated answer than necessary. Here is the cleaned up version.
Step 1: Setting up. Here, we make sure the depth values in both data frames have the same variable name (for readability).
df <- data.frame(f1=factor(data$Tagging.location), f2=factor(data$Station), depth=data$Detection.depth)
df2 <- data.frame(f2=factor(data$Station), depth=data$depth)
Step 2: Now you can plot this with the ggplot() function and split the data using the `col = f1` argument. We'll plot the detection data with a boxplot, and then plot the depths of the stations as coloured points (assuming each station has only one depth). We specify the two layers by passing the data inside the geom functions, instead of inside the main ggplot() call. It should look something like this:
ggplot()+geom_boxplot(data=df, aes(x=f2, y=depth, col=f1)) + geom_point(data=df2, aes(x=f2, y=depth), colour="blue") + scale_y_reverse()
In this plot example, we use boxplots to represent the detection data and color those boxplots by the site label. The stations, however, we plot separately using a specific color of points, so we will be able to see them clearly in relation to the boxplots.
You should be able to adjust the plot from here to suit your needs.
I've created some dummy data and loaded it into the chart to show you what it would look like. Keep in mind that this is purely random data and doesn't really make sense.
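A self-contained version with dummy data might look like this. Everything in it is hypothetical (station names, site labels, depths). Both layers share one discrete x scale, so each station's point lands on the same x position as its boxes; giving df$f2 and df2$f2 the same explicit levels mainly keeps the station order as S1 to S12 rather than alphabetical.

```r
library(ggplot2)

# Hypothetical dummy data standing in for MPS.csv
set.seed(5)
stations <- paste0("S", 1:12)
df <- data.frame(
  f1    = factor(sample(c("Site A", "Site B"), 600, replace = TRUE)),  # tagging location
  f2    = factor(sample(stations, 600, replace = TRUE), levels = stations),
  depth = runif(600, 0, 150)                                          # detection depth (m)
)
df2 <- data.frame(f2    = factor(stations, levels = stations),
                  depth = runif(12, 20, 150))                         # one depth per station

p <- ggplot() +
  geom_boxplot(data = df, aes(x = f2, y = depth, col = f1)) +        # dodged by site
  geom_point(data = df2, aes(x = f2, y = depth), colour = "blue", size = 3) +
  scale_y_reverse() +                                                 # depth increases downwards
  labs(x = "MPS Station", y = "Depth (m)", colour = NULL)
p
```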
