Nested facet plot with ggplot2

I have a nested factor: several "Family" levels are contained within each level of the factor "Order". I would like to create something like
facet_grid(Family / Order ~ .)
instead of the current
facet_grid(Family + Order ~ .)
Basically, I want ONE strip for every Order, with the strips for all the families in that Order next to it. I know that facet_grid(Family / Order ~ .) is not currently possible, but how would I achieve this effect? Could it be done with theme()? Thank you so much. --SB
I should have specified above that both Family and Order are factors. The B values are per Species, and each Species belongs to one Family and one Order. Here is the code for my plot:
p <- ggplot(models, aes(B, Species)) + geom_point() +
  facet_grid(Family + Order ~ ., scales = "free", space = "free")
Here is some sample data:
structure(list(Species = c("Acanthocyclops robustus", "Acroperus harpae",
"Alona affinis", "Ascaphus truei", "Bosmina longirostris"), Intercept = c(-36.1182388331068,
-27.2140776216155, -25.7920464721491, -39.2233884219763, -31.4301301084581
), B = c(0.919397836908493, 0.716601987210452, 0.685455190113372,
1.04159758611351, 0.81077051300147), Bconf = c(0.407917065756464,
0.181611850119198, 0.254101713856315, 0.708582768458448, 0.234313394549538
), Order = c("Cyclopoida", "Diplostraca", "Diplostraca", "Anura",
"Diplostraca"), Family = c("Cyclopidae", "Chydoridae", "Chydoridae",
"Leiopelmatidae", "Bosminidae")), .Names = c("Species", "Intercept",
"B", "Bconf", "Order", "Family"), row.names = c(NA, 5L), class = "data.frame")

Neither facet_grid nor facet_wrap will build the graphic you are trying to build. You can, however, build a list of plots, one per Order, and arrange them with gridExtra::grid.arrange. Here is an example:
library(ggplot2)
library(gridExtra)
library(dplyr)
dat <-
structure(list(Species = c("Acanthocyclops robustus", "Acroperus harpae",
"Alona affinis", "Ascaphus truei", "Bosmina longirostris"), Intercept = c(-36.1182388331068,
-27.2140776216155, -25.7920464721491, -39.2233884219763, -31.4301301084581
), B = c(0.919397836908493, 0.716601987210452, 0.685455190113372,
1.04159758611351, 0.81077051300147), Bconf = c(0.407917065756464,
0.181611850119198, 0.254101713856315, 0.708582768458448, 0.234313394549538
), Order = c("Cyclopoida", "Diplostraca", "Diplostraca", "Anura",
"Diplostraca"), Family = c("Cyclopidae", "Chydoridae", "Chydoridae",
"Leiopelmatidae", "Bosminidae")), .Names = c("Species", "Intercept",
"B", "Bconf", "Order", "Family"), row.names = c(NA, 5L), class = "data.frame")
dat
# A ggplot object with NO data. Omit the order from the facet_grid call
g <-
ggplot() +
aes(Species, B) +
geom_point() +
facet_grid(. ~ Family,
scales = "free", space = "free") +
ylim(range(dat$B)) +
xlab("")
# Build a separate plot for each Order, titled with the Order name
plots <-
lapply(unique(dat$Order), function(o) {
g %+% dplyr::filter(dat, Order == o) + ggtitle(o)
})
# Convert to grobs and stack them in a single column via gridExtra::grid.arrange
grid.arrange(grobs = lapply(plots, ggplotGrob), ncol = 1)

Here's a simple solution: add a variable to your data (below, Family_collapsed) that recodes the inner factor to integer ranks within each level of the outer factor, so that interaction(Family_collapsed, Order) has the same set of levels as Family. The column strips then show the rank numbers (1, 2, ...) rather than the family names, so if someone can figure out a quick way to fill in the labels I'll edit it into my answer.
library(ggplot2)
library(gridExtra)
library(dplyr)
dat <-
structure(list(Species = c("Acanthocyclops robustus", "Acroperus harpae",
"Alona affinis", "Ascaphus truei", "Bosmina longirostris"),
Intercept = c(-36.1182388331068, -27.2140776216155, -25.7920464721491,
-39.2233884219763, -31.4301301084581),
B = c(0.919397836908493, 0.716601987210452, 0.685455190113372,
1.04159758611351, 0.81077051300147),
Bconf = c(0.407917065756464,
0.181611850119198, 0.254101713856315, 0.708582768458448, 0.234313394549538
),
Order = c("Cyclopoida", "Diplostraca", "Diplostraca", "Anura",
"Diplostraca"),
Family = c("Cyclopidae", "Chydoridae", "Chydoridae",
"Leiopelmatidae", "Bosminidae")),
.Names = c("Species", "Intercept",
"B", "Bconf", "Order", "Family"), row.names = c(NA, 5L), class = "data.frame")
# Map each value of x to its integer rank among the sorted unique values
replace_with_int_rank <- function(x) as.numeric(as.factor(x))
# Re-number the inner factor separately within each level of the outer factor
collapse_nested_factor <- function(inner, outer) {
ave(as.character(inner), outer, FUN = replace_with_int_rank)
}
dat$Family_collapsed <- collapse_nested_factor(inner = dat$Family, outer = dat$Order)
p <- ggplot(dat) + geom_point(aes(B, Species)) +
facet_grid(Order ~ Family_collapsed, scales = "free")
p
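To see what the collapse step produces without drawing anything, the same ave() call can be run on just the Family and Order columns of the sample data. This is a standalone base-R sketch; the pairs table at the end is only an illustrative lookup (collapsed index plus Order back to Family), not part of the answer above.

```r
# Family and Order columns copied from the question's sample data
Family <- c("Cyclopidae", "Chydoridae", "Chydoridae", "Leiopelmatidae", "Bosminidae")
Order  <- c("Cyclopoida", "Diplostraca", "Diplostraca", "Anura", "Diplostraca")

# Rank each family within its own order. ave() keeps the type of its first
# argument, so the ranks come back as character strings "1", "2", ...
Family_collapsed <- ave(Family, Order, FUN = function(x) as.numeric(as.factor(x)))
print(Family_collapsed)

# Each (collapsed index, Order) pair should identify exactly one Family
pairs <- unique(data.frame(Family, Family_collapsed, Order))
stopifnot(!anyDuplicated(pairs[c("Family_collapsed", "Order")]))

# Illustrative lookup table, sorted by Order and then by collapsed index
print(pairs[order(pairs$Order, pairs$Family_collapsed), ])
```

Within Diplostraca, Bosminidae sorts before Chydoridae, so it gets index 1 and Chydoridae gets 2; single-family orders always get 1.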

Related

Changing boxplot width (measuring multiple categorical variables) for categorical conditions with missing data

As a preliminary disclaimer, I am still very new to R (this is the first analysis I've performed independently), and am hoping this is a reproducible example.
I have a dataset measuring the d.13.C and d.18.O values of various enamel samples through time and space. I want to represent trends within Families across space and time. I have a boxplot I generated in ggplot2 that does this, but I'm running into a few problems:
d %>%
mutate(across(Member, factor, levels = c("UpperBurgi", "KBS", "Okote"))) %>%
mutate(across(Dep_context, factor, levels = c("Lacustrine", "Deltaic", "Fluvial "))) %>%
ggplot(aes(x = Member, y = d.13.C)) +
geom_boxplot(aes(x = Member, y = d.13.C, col = Dep_context, fill = Dep_context), alpha = 0.5, lwd = 1) +
facet_wrap(~Family) +
scale_fill_brewer(palette = "Dark2") +
scale_color_brewer(palette = "Dark2") +
theme_bw()
It produces something like this:
Since my data are not evenly distributed (not every depositional context is represented in each geologic member in each family), the boxplots for the depositional environments end up with different widths. I would like them all to be the same width, regardless of whether data are present (e.g., equal to the width of the ones for Bovidae in the KBS Member).
I've tried adjusting width = in the geom_boxplot call, I've tried using theme() to change aspects of the grid, and I've tried drop = FALSE, but none of these changed anything. I've also tried faceting by member and depositional environment, but that did not look as appealing and seemed clunkier. Is there a way to accomplish this, or is faceting the way to go?
My data frame is below. Note: it's a subset, since the full output was too long.
dput(head(d))
structure(list(CA = c("6", "1", "104", "105", "6A", "6A"), Member = c("KBS",
"Okote", "KBS", "KBS", "KBS", "KBS"), Dep_context = c("Deltaic",
"Fluvial ", "Fluvial ", "Fluvial ", "Deltaic", "Deltaic"), Family = c("Equidae",
"Equidae", "Equidae", "Equidae", "Equidae", "Equidae"), Tribe = c("",
"", "", "", "", ""), Genus = c("Equus", "Equus", "Equus", "Equus",
"Equus", "Equus"), d.13.C = c(-0.3, -0.7, 0.7, -0.9, -0.1, -0.8
), d.18.O = c(0, 1.6, 4, 2.6, 1.8, 0.2), Age.range = c("1.87-1.56",
"1.56-1.38", "1.87-1.56", "1.87-1.56", "1.87-1.56", "1.87-1.56"
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))

You could use position_dodge2 with preserve = "single" to keep the boxplot width the same across different groups like this:
library(ggplot2)
library(dplyr)
d %>%
mutate(Member = factor(Member, levels = c("UpperBurgi", "KBS", "Okote")),
Dep_context = factor(Dep_context, levels = c("Lacustrine", "Deltaic", "Fluvial "))) %>%
ggplot(aes(x = Member, y = d.13.C)) +
geom_boxplot(aes(x = Member, y = d.13.C, col = Dep_context, fill = Dep_context), alpha = 0.5, lwd = 1,
position = position_dodge2(preserve = "single")) +
facet_wrap(~Family) +
scale_fill_brewer(palette = "Dark2") +
scale_color_brewer(palette = "Dark2") +
theme_bw()
Created on 2023-02-08 with reprex v2.0.2
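Why the widths differ in the first place: geom_boxplot() dodges with position_dodge2(), whose default preserve = "total" divides each Member position among the Dep_context groups actually present there, so a Member with a single group gets one wide box. The base-R sketch below counts the groups per Member on the dput(head(d)) subset (only the Member and Dep_context columns are copied). Per the ggplot2 documentation for position_dodge2(), preserve = "single" instead sizes every box like one box at the most crowded position, which is why all boxes end up the same width; it is worth checking this against your installed ggplot2 version.

```r
# Member and Dep_context copied from the question's dput(head(d)).
# Note the trailing space in "Fluvial ": it must match the factor level exactly.
Member <- c("KBS", "Okote", "KBS", "KBS", "KBS", "KBS")
Dep_context <- c("Deltaic", "Fluvial ", "Fluvial ", "Fluvial ", "Deltaic", "Deltaic")

member_f <- factor(Member, levels = c("UpperBurgi", "KBS", "Okote"))
context_f <- factor(Dep_context, levels = c("Lacustrine", "Deltaic", "Fluvial "))

# Samples per Member x Dep_context cell; unused combinations show up as 0
counts <- table(member_f, context_f)
print(counts)

# How many boxes get dodged side by side at each Member position
boxes_per_member <- rowSums(counts > 0)
print(boxes_per_member)
```

In this subset KBS has two boxes and Okote only one, so with the default setting the Okote box would be drawn twice as wide as each KBS box.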

Heatmaps for a matrix with ones and zeros using R

My sample data is basically a matrix with person names as row names and several columns for each of these rows. The data contains only zeros and ones. I would like to visualize it using a heatmap (red for 0s and green for 1s, or any other color coding). How do I accomplish this using R? You can show me using any example dataset with just ones and zeros (binary values).

Just another approach using ggplot
library(ggplot2)
library(reshape2)
df <- structure(list(people = structure(c(2L, 1L), .Label = c("Dwayne", "LeBron"), class = "factor"),
G = c(1L, 0L),
MIN = c(1L, 0L),
PTS = c(0L, 1L),
FGM = c(0L,0L),
FGA = c(0L,0L),
FGP = c(1L,1L)),
.Names = c("people", "G", "MIN", "PTS", "FGM", "FGA", "FGP"),
class = "data.frame",
row.names = c(NA, -2L))
# One row per (people, variable) cell
df.m <- melt(df, id.vars = "people")
# Treat 0/1 as two discrete categories: red for 0, green for 1
p <- ggplot(df.m, aes(variable, people)) +
geom_tile(aes(fill = factor(value)), colour = "black") +
scale_fill_manual(values = c("0" = "red", "1" = "green"), name = "value")
print(p)
Adapted from this tutorial.
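If you would rather avoid the reshape2 dependency, the wide-to-long step can also be done in base R with as.table(). The sketch below assumes the data is already a numeric 0/1 matrix with row and column names (m is a hypothetical stand-in holding the same values as df above). Storing the colour name in the data and using scale_fill_identity() makes the 0 = red, 1 = green mapping explicit; the plotting part only runs if ggplot2 is installed.

```r
# Hypothetical 0/1 matrix with the same values as df in the answer above
m <- matrix(c(1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1), nrow = 2,
            dimnames = list(people = c("LeBron", "Dwayne"),
                            stat = c("G", "MIN", "PTS", "FGM", "FGA", "FGP")))

# Wide matrix -> long data frame with columns people, stat, value
long <- as.data.frame(as.table(m), responseName = "value")

# Colour per cell: red for 0, green for 1
long$fill <- ifelse(long$value == 1, "green", "red")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  # scale_fill_identity() uses the colour strings in 'fill' as-is
  p <- ggplot2::ggplot(long, ggplot2::aes(stat, people, fill = fill)) +
    ggplot2::geom_tile(colour = "black") +
    ggplot2::scale_fill_identity()
  # print(p) to draw the heatmap
}
```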

With highcharter:
library(highcharter)
library(tidyr)
library(dplyr)
df <- data.frame(name = c("Dwayne", "James"), G = c(1, 0), MIN = c(1, 0), PTS = c(0, 1),
FGM = c(0, 0), FGA = c(0, 0), FGP = c(1, 1))
# Reshape to one row per (name, variable) pair, keeping name as the identifier
data <- tidyr::gather(df, variable, value, -name)
hchart(data, "heatmap", hcaes(x = variable, y = name, value = value)) %>%
hc_colorAxis(stops = color_stops(2, c("red","green")))
UPDATE:
To make the chart taller, add hc_size(height = 800) for a height of 800, or compute the height from the number of rows, like this:
x<-50
hg<-length(unique(data$name))*x+100
hchart(data, "heatmap", hcaes(x = variable, y = name, value = value)) %>%
hc_colorAxis(stops = color_stops(2, c("red","green")))%>%
hc_size(height = hg)
Here each distinct name adds x = 50 pixels of height on top of a 100-pixel base; change x to adjust.

This answer uses plotly, so I'm adding it as a separate answer. It uses the same df as the highcharter answer, with the names in the first column.
library(plotly)
# x: the stat columns; y: the names in the first column; z: the 0/1 values
p <- plot_ly(x = colnames(df)[-1], y = df[, 1], z = as.matrix(df[-1]),
colors = colorRamp(c("red", "green")), type = "heatmap")
p
This takes less code than the ggplot2 version to get the output.
Hope this helps!

Why does ggplot switch from a discrete to a continuous legend in this multiple-line plot?

Example df:
xnom <- seq(0,80,by=20)
x1 <- xnom+rnorm(5,0,2)
x2 <- x1*.9
x3 <- x2*.9
S1 <- seq(0,1,by=.25)
S2 <- S1*1.3
S3 <- S2*1.3
df <- data.frame(xnom,x1,x2,x3,S1,S2,S3)
I want to make two different plots: one where each response S1, S2, S3 is plotted against the predictor xnom, and another where each response Si is plotted against the corresponding predictor xi. In both cases, I want to plot a line of a different color for each response, and the legend must summarize the colors of the three responses. To this end, I wrote the following function:
makeplot <- function(df,xvec){
library(ggplot2)
if (length(xvec)==1) {
p <- ggplot(data=df, aes_string(x = xvec))
p <- p + geom_line(aes(y = S1, color = "1")) +
geom_point(aes(y = S1, color = "1")) +
geom_line(aes(y = S2, color = "2")) +
geom_point(aes(y = S2, color = "2")) +
geom_line(aes(y = S3, color = "3")) +
geom_point(aes(y = S3, color = "3"))
} else {
p <- ggplot(data=df)
p <- p + geom_line(aes_string(x = xvec[1], y = "S1", color = "1")) +
geom_point(aes_string(x = xvec[1], y = "S1", color = "1")) +
geom_line(aes_string(x = xvec[2], y = "S2", color = "2")) +
geom_point(aes_string(x = xvec[2], y = "S2", color = "2")) +
geom_line(aes_string(x = xvec[3], y = "S3", color = "3")) +
geom_point(aes_string(x = xvec[3] , y = "S3", color = "3"))
}
p <- p + labs(color = "Section")
print(p)
}
In the single predictor case, it worked fine:
makeplot(df,"x1")
ggplot makes a discrete scale legend, which looks great. However, when I match each response to the corresponding predictor, ggplot for some reason switches to a continuous scale:
makeplot(df,c("x1","x2","x3"))
This looks ugly: a Section 2.5 makes no sense in my case. Why is this happening, and how could I avoid it? I'm afraid it may be related to aes_string. However, I need some way to manage variable predictor names in my function, because all this is part of a larger code in which predictor names can change at runtime.

To formalize the suggestions made by @RichardTelford and @DeltaIV: is there a reason the following could not be used instead?
Note that the double melt is less than ideal (I know there is a better way, but I am blanking on it at the moment), and that the axis and legend titles come straight from the column names set in melt() instead of from xlab(), ylab(), and a key name.
library(ggplot2)
library(dplyr)
library(reshape2)
melt(df, id.vars = c("xnom")
, measure.vars = paste0("S",1:3)
, variable.name = "Section"
, value.name = "Response") %>%
mutate(Section = gsub("^S","",Section)) %>%
ggplot(aes(x = xnom
, y = Response
, col = Section)) +
geom_point() +
geom_line()
melt(df, id.vars = c(paste0("x",1:3))
, measure.vars = paste0("S",1:3)
, variable.name = "Section"
, value.name = "Response") %>%
melt(id.vars = c("Section","Response")
, measure.vars = c(paste0("x",1:3))
, value.name = "Predictor Value"
, variable.name = "Predictor") %>%
mutate(Section = gsub("^S","",Section)) %>%
ggplot(aes(x = `Predictor Value`
, y = Response
, col = Section)) +
geom_point() +
geom_line() +
facet_wrap(~Predictor)
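As for why makeplot() switches scales: aes_string() parses each string as R code, so color = "1" becomes the numeric constant 1, and numeric constants give a continuous colour scale. In the single-predictor branch, aes(color = "1") keeps "1" as a character string, hence the discrete legend. Quoting inside the string (color = "'1'") would keep it a string. The sketch below checks that parsing behaviour and shows a hypothetical makeplot_fixed() that avoids aes_string() (deprecated in recent ggplot2) by using the .data pronoun, assuming ggplot2 3.0.0 or later. Unlike the faceted reshaping above, it pairs each Si with its own xi. set.seed() is added only to make the example data reproducible.

```r
# Example data in the shape of the question's df
set.seed(1)
xnom <- seq(0, 80, by = 20)
x1 <- xnom + rnorm(5, 0, 2); x2 <- x1 * 0.9; x3 <- x2 * 0.9
S1 <- seq(0, 1, by = 0.25); S2 <- S1 * 1.3; S3 <- S2 * 1.3
df <- data.frame(xnom, x1, x2, x3, S1, S2, S3)

# Root cause: parsing "1" yields the number 1; parsing "'1'" yields a string
stopifnot(is.numeric(parse(text = "1")[[1]]))
stopifnot(is.character(parse(text = "'1'")[[1]]))

# Hypothetical rewrite: .data[[name]] looks up a column by its name, and the
# colour is a character constant, so the legend stays discrete.
# xvec may hold one predictor name (shared) or three (one per response).
makeplot_fixed <- function(df, xvec) {
  xvec <- rep_len(xvec, 3)
  layers <- lapply(1:3, function(i) {
    force(i)  # each layer keeps its own i
    mapping <- ggplot2::aes(x = .data[[xvec[i]]], y = .data[[paste0("S", i)]],
                            colour = as.character(i))
    list(ggplot2::geom_line(mapping), ggplot2::geom_point(mapping))
  })
  ggplot2::ggplot(df) + layers + ggplot2::labs(colour = "Section")
}

if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- makeplot_fixed(df, c("x1", "x2", "x3"))  # print(p) to draw it
}
```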

ggplot and the geom_text()

I am new to R and ggplot2. I have searched a lot for this but could not find a solution.
Sample observation1 observation2 observation3 percentage
sample1_A 163453473 131232689 61984186 30.6236955883
Sample1_B 170151351 137202212 59242536 26.8866816109
sample2_A 194102849 162112484 89158170 40.4183031852
sample2_B 170642240 141888123 79925652 41.7493687378
sample3_A 192858504 161227348 90532447 41.8068248626
sample3_B 177174787 147412720 81523935 40.5463120438
sample4_A 199232380 174656081 118115358 55.6409038531
sample4_B 211128931 186848929 123552556 54.7201927527
sample5_A 186039420 152618196 87012356 40.9656544833
sample5_B 145855252 118225865 66265976 39.5744515254
sample6_A 211165202 186625116 112710053 48.5457722338
sample6_B 220522502 193191927 114882014 47.238670909
I am planning to plot a bar plot with ggplot2. I want to plot the first three columns as a bar plot "dodge" and label the observation3 bar with the percentage. I could plot the bars as below but I could not use geom_text() to add the label.
data1 <- read.table("readStats.txt", header=T)
data1.long <- melt(data1)
ggplot(data1.long[1:36,], aes(data1.long$Sample[1:36],y=data1.long$value[1:36], fill=data1.long$variable[1:36])) + geom_bar(stat="identity", width=0.5, position="dodge")

Transform data1 to long form with the observation columns as the measure variables and the Sample and percentage columns as the id variables. Compute the maximum value, mx, to be used to place the percentages. Then build the plot. Note that geom_bar uses data1.long but geom_text uses data1. We have colored the percentage text the same color as the observation3 bars. (See this post for how to specify default colors.) Both layers inherit aes(x = Sample) but use different y and other aesthetics. Optionally, we clean up the x-axis labels by removing all lower-case letters and underscores from data1$Sample.
library(ggplot2)
library(reshape2)
data1.long <- melt(data1, measure.vars = 2:4) # cols 2:4 are observation1, ..., observation3
mx <- max(data1.long$value) # maximum observation value
ggplot(data1.long, aes(x = Sample, y = value)) +
geom_bar(aes(fill = variable), stat = "identity", width = 0.5, position = "dodge") +
geom_text(aes(y = mx, label = paste0(round(percentage), "%")), data = data1,
col = "#619CFF", vjust = -0.5) +
scale_x_discrete(labels = gsub("[a-z_]", "", data1$Sample))
Note: We used the following data, in which Sample1_B was changed to sample1_B (lower-case s):
Lines <- "Sample observation1 observation2 observation3 percentage
sample1_A 163453473 131232689 61984186 30.6236955883
sample1_B 170151351 137202212 59242536 26.8866816109
sample2_A 194102849 162112484 89158170 40.4183031852
sample2_B 170642240 141888123 79925652 41.7493687378
sample3_A 192858504 161227348 90532447 41.8068248626
sample3_B 177174787 147412720 81523935 40.5463120438
sample4_A 199232380 174656081 118115358 55.6409038531
sample4_B 211128931 186848929 123552556 54.7201927527
sample5_A 186039420 152618196 87012356 40.9656544833
sample5_B 145855252 118225865 66265976 39.5744515254
sample6_A 211165202 186625116 112710053 48.5457722338
sample6_B 220522502 193191927 114882014 47.238670909"
data1 <- read.table(text = Lines, header = TRUE)
UPDATE: minor improvements

It might be that G. Grothendieck's answer is a better solution, but here's my suggestion (code below)
# install.packages("ggplot2", dependencies = TRUE)
require(ggplot2)
df <- structure(list(Sample = structure(1:12, .Label = c("sample1_A",
"Sample1_B", "sample2_A", "sample2_B", "sample3_A", "sample3_B",
"sample4_A", "sample4_B", "sample5_A", "sample5_B", "sample6_A",
"sample6_B"), class = "factor"), observation1 = c(163453473L,
170151351L, 194102849L, 170642240L, 192858504L, 177174787L, 199232380L,
211128931L, 186039420L, 145855252L, 211165202L, 220522502L),
observation2 = c(131232689L, 137202212L, 162112484L, 141888123L,
161227348L, 147412720L, 174656081L, 186848929L, 152618196L,
118225865L, 186625116L, 193191927L), observation3 = c(61984186L,
59242536L, 89158170L, 79925652L, 90532447L, 81523935L, 118115358L,
123552556L, 87012356L, 66265976L, 112710053L, 114882014L),
percentage = c(30.6236955883, 26.8866816109, 40.4183031852,
41.7493687378, 41.8068248626, 40.5463120438, 55.6409038531,
54.7201927527, 40.9656544833, 39.5744515254, 48.5457722338,
47.238670909)), .Names = c("Sample", "observation1", "observation2",
"observation3", "percentage"), class = "data.frame", row.names = c(NA,
-12L))
# install.packages("reshape2", dependencies = TRUE)
require(reshape2)
# Keep percentage as an id variable so it survives the melt
data1.long <- melt(df, id.vars = c("Sample", "percentage"),
measure.vars = c("observation1", "observation2", "observation3"))
# Format the percentage as a label, then blank it for observation1 and observation2
data1.long$percentage <- paste0(round(data1.long$percentage, 2), "%")
data1.long$percentage[data1.long$variable != "observation3"] <- ""
ggplot(data1.long, aes(x = Sample, y = value, fill = variable)) +
geom_bar(stat = "identity", width = 0.5, position = "dodge") +
geom_text(aes(label = percentage), vjust = 2.10, size = 2, hjust = -.06, angle = 90)
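A variation on the same idea, sketched on the first three rows of the data: build the long table in base R with stack(), keep a label only on the observation3 rows, and give geom_text() the same position_dodge(width = 0.5) as the bars, so each label is dodged onto its own bar instead of being nudged with hjust/vjust. The text layer inherits fill = ind, which gives it the same grouping as the bars. geom_col() is shorthand for geom_bar(stat = "identity"); the ggplot2 part only runs if the package is installed.

```r
# First three rows of the question's data
data1 <- read.table(header = TRUE, text = "
Sample observation1 observation2 observation3 percentage
sample1_A 163453473 131232689 61984186 30.6236955883
sample1_B 170151351 137202212 59242536 26.8866816109
sample2_A 194102849 162112484 89158170 40.4183031852
")
obs_cols <- c("observation1", "observation2", "observation3")

# Long format: stack() returns 'values' and 'ind' (all observation1 rows,
# then observation2, then observation3), so Sample and percentage are
# repeated once per observation column to line up with them
long <- data.frame(Sample = rep(data1$Sample, length(obs_cols)),
                   percentage = rep(data1$percentage, length(obs_cols)),
                   stack(data1[obs_cols]))

# Label only the observation3 bars
long$label <- ifelse(long$ind == "observation3",
                     paste0(round(long$percentage), "%"), "")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  dodge <- ggplot2::position_dodge(width = 0.5)
  p <- ggplot2::ggplot(long, ggplot2::aes(x = Sample, y = values, fill = ind)) +
    ggplot2::geom_col(width = 0.5, position = dodge) +
    ggplot2::geom_text(ggplot2::aes(label = label), position = dodge,
                       vjust = -0.3, size = 3)
  # print(p) to draw the plot
}
```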

loess method fails on data frame due to multiple series not having enough data points

I have a data frame like this:
dput(xx)
structure(list(TimeStamp = structure(c(15705, 15706), class = "Date"),
Host = c("Host1", "Host2"), OS = structure(c(1L, 1L), .Label = "solaris", class = "factor"),
ID = structure(c(1L, 1L), .Label = "1234", class = "factor"),
Class = structure(c(1L, 1L), .Label = "Processor", class = "factor"),
Stat = structure(c(1L, 1L), .Label = "CPU", class = "factor"),
Instance = structure(c(1L, 1L), .Label = c("_Total", "CPU0",
"CPU1", "CPU10", "CPU11", "CPU12", "CPU13", "CPU14", "CPU15",
"CPU16", "CPU17", "CPU18", "CPU19", "CPU2", "CPU20", "CPU21",
"CPU22", "CPU23", "CPU3", "CPU4", "CPU5", "CPU6", "CPU7",
"CPU8", "CPU9"), class = "factor"), Average = c(4.39009345794392,
5.3152972972973), Min = c(3.35, -0.01), Max = c(5.15, 72.31
)), .Names = c("TimeStamp", "Host", "OS", "ID", "Class",
"Stat", "Instance", "Average", "Min", "Max"), row.names = c(NA,
-2L), class = "data.frame")
The data frame is huge and contains many hosts. The problem is that when a host, like the ones above, does not have enough data points, the following ggplot fails, complaining that there are not enough data points to draw the graph.
ggplot(xx, aes(TimeStamp, Max, group = Host, colour = Host)) + geom_point() + geom_smooth(method = "loess")
How can I check whether a particular Host in this data frame has more than 10 data points and, if so, use method = "loess"? If a Host has fewer than 10 data points, I want to use method = "lm".

It was tricky to find, but it seems to be possible:
# for reproducibility
set.seed(42)
# The idea is to first split the data to < 10 and >= 10 points
# I use data.table for that
require(data.table)
dt <- data.frame(Host = rep(paste("Host", 1:10, sep=""), sample(1:20, 10)),
stringsAsFactors = FALSE)
dt <- transform(dt, x=sample(1:nrow(dt)), y = 15*(1:nrow(dt)))
dt <- data.table(dt, key="Host")
dt1 <- dt[, .SD[.N >= 10], by = Host]
dt2 <- dt[, .SD[.N < 10], by = Host]
# on to plotting now
require(ggplot2)
# Now, dt1 has all Hosts with >= 10 observations and dt2 the other way round
# plot now for dt1
p <- ggplot(data=dt1, aes(x = x, y = y, group = Host)) + geom_line() +
geom_smooth(method="loess", se=T)
# plot geom_line for dt2 by telling the data and aes
# The TRICKY part: add geom_smooth by telling data=dt2
p <- p + geom_line(data = dt2, aes(x=x, y=y, group = Host)) +
geom_smooth(data = dt2, method="lm", se=T)
p
(This is a contrived example, but it gives you the idea.)
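The same split can be made in base R without data.table, which answers the "how can I check" part directly: count the rows per Host with ave(), then subset. The data below are hypothetical (Host1 with 12 points, Host2 with 5), and the 10-point threshold follows the question. Passing formula = y ~ x explicitly only silences ggplot2's formula message; the plotting part runs only if ggplot2 is installed.

```r
set.seed(42)
# Hypothetical data: Host1 has 12 points, Host2 only 5
xx <- data.frame(Host = rep(c("Host1", "Host2"), times = c(12, 5)),
                 x = c(1:12, 1:5))
xx$y <- 2 * xx$x + rnorm(nrow(xx))

# For every row, the number of rows that share its Host
n_per_host <- ave(seq_len(nrow(xx)), xx$Host, FUN = length)
enough <- n_per_host >= 10

big <- xx[enough, ]     # hosts with at least 10 points: fit loess
small <- xx[!enough, ]  # hosts with fewer points: fit lm
print(table(xx$Host))   # quick check of the counts per host

if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(xx, ggplot2::aes(x, y, colour = Host)) +
    ggplot2::geom_point() +
    ggplot2::geom_smooth(data = big, method = "loess", formula = y ~ x) +
    ggplot2::geom_smooth(data = small, method = "lm", formula = y ~ x)
  # print(p) to draw the plot
}
```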

Adding to Arun's excellent answer, I think you also need to distinguish the two fits visually, e.g. a solid line for loess and a dashed line for lm:
p <- ggplot(data=dt1, aes(x = x, y = y, group = Host)) + geom_line() +
geom_smooth(method='loess', linetype='solid', se=T)
p <- p + geom_line(data = dt2, aes(x=x, y=y, group = Host)) +
geom_smooth(data = dt2, method='lm', linetype='dashed', se=T)

The warning messages can be prevented by combining the data and setting the span parameter of geom_smooth (a larger span makes each local fit use more of the data). For example:
data <- rbind(dt1, dt2)
p <- ggplot(data = data, aes(x = x, y = y, group = Host)) + geom_line() +
geom_smooth(method = 'loess', span = 1.4, se = TRUE)
If the warnings remain, try different values of the span parameter.
