Plots not showing in ggplot2 with geom_bar - r

I am trying to plot stacked bars using ggplot() and geom_bar()
Sample data (Titanic Kaggle question):
PassengerId Survived Age
1 0 25
2 1 20
3 1 40
4 0 10
I am trying to show stacked bars of survival and death for each age range (I have divided age into bins). The plot is not visible when I execute the command, and when I add the print() function I get this error:
Error: No layers in plot
Please tell me if there is anything I am missing here.
breaks <- seq(min(train$Age), max(train$Age), 10)
p <- ggplot(train, aes(x=train$Age, y=length(train$PassengerId)), xlab = "age", ylab = "count", main = "survival",
fill = Survived + geom_bar(stat = "identity", bin = breaks))
print(p)
"train" is object in which I stored the data.

Apart from Pascal's hint, you may want to:
Use a factor for the fill column (Survived)
Use different geom and aes() parameters, as suggested
Code:
library(ggplot2)
train <- data.frame(PassengerId = 1:4, Survived = factor(c(0,1,1,0)), Age = c(25, 20, 40, 10))
BIN.WIDTH <- 10
my.breaks <- seq(from = min(train$Age), to = max(train$Age) + BIN.WIDTH, by = BIN.WIDTH)
# geom_bar() only counts distinct x values in current ggplot2; geom_histogram() accepts breaks
ggplot(train, aes(Age, fill = Survived)) + geom_histogram(breaks = my.breaks)

There are a few issues here:
First you should remove the rows where age is NA, otherwise you can't create a sequence.
train<-train[!is.na(train$Age),]
Then you should change your y value to Survived. Why did you use length(train$PassengerId)? It is a single number (the row count), so it doesn't display anything useful.
What @Pascal mentioned is also correct: you have to put + geom_bar(...) outside the ggplot() call.
You also need to add the axis labels and title differently in ggplot.
This is the complete working code:
train<-train[!is.na(train$Age),]
breaks <- seq(min(train$Age), max(train$Age), 10)
p <- ggplot(train, aes(x = Age, y = Survived, fill = factor(Survived))) +
  geom_bar(stat = "identity")  # fill belongs inside aes(); geom_bar() has no bin argument
p <- p + scale_x_continuous(breaks = breaks)  # put x-axis ticks at the 10-year breaks
p <- p + labs(x = "age", y = "count", title = "survival", fill = "Survived")
print(p)
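For reference, the points from both answers can be combined into one short sketch. It assumes the toy data frame built from the question's sample rows, and the names bin_width, age_breaks and AgeBin are made up for illustration. Ages are binned explicitly with cut(), the layer is added with + outside ggplot(), and fill is mapped inside aes():

```r
library(ggplot2)

# Toy data mirroring the question's sample rows
train <- data.frame(
  PassengerId = 1:4,
  Survived    = c(0, 1, 1, 0),
  Age         = c(25, 20, 40, 10)
)

# Bin ages into 10-year ranges; right = FALSE makes each bin [lower, upper)
bin_width <- 10
age_breaks <- seq(floor(min(train$Age) / bin_width) * bin_width,
                  max(train$Age) + bin_width,
                  by = bin_width)
train$AgeBin <- cut(train$Age, breaks = age_breaks, right = FALSE)

# The layer is added with `+` outside ggplot(); fill is mapped inside aes()
p <- ggplot(train, aes(x = AgeBin, fill = factor(Survived))) +
  geom_bar() +
  labs(x = "age", y = "count", fill = "Survived", title = "survival")
print(p)
```

Because geom_bar() counts rows itself, no y aesthetic is needed here: each bar is the number of passengers in that age bin, split by survival.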

Related

adding a line to a ggplot boxplot

I'm struggling with ggplot2 and have been looking for a solution online for several hours. Maybe one of you can help me? I have a data set that looks like this (several hundred observations):
Y-AXIS X-AXIS SUBJECT
2.2796598 F1 1
0.9118639 F1 2
2.7111228 F3 3
2.7111228 F2 4
2.2796598 F4 5
2.3876401 F10 6
... ... ...
The Y-AXIS is a continuous value larger than 0 (the upper limit can vary from data set to data set, but is typically < 100). X-AXIS is a categorical variable with 10 levels. SUBJECT refers to an individual and, across the entire data set, each individual has exactly 10 observations, exactly 1 for each level of the categorical variable.
To generate a box plot, I used ggplot like this:
plot1 <- ggplot(longdata,
aes(x = X_axis, y = Y_axis)) +
geom_boxplot() +
ylim(0, 12.5) +
stat_summary(fun = "mean", geom = "point", shape = 2, size = 3, color = "purple")
That results in the boxplot I have in mind. You can check out the result here if you like: boxplot
So far so good. What I want to do next, and hopefully someone can help me, is this: for one specific SUBJECT, I want to plot a line through their 10 scores in the same figure, on top of the boxplot. An example of what I have in mind can be found here: boxplot with data of one subject as a line. In this case, I simply assumed that the outliers belong to the same case. This is just an assumption. The data of an individual case can also look like this: boxplot with data of a second subject as a line
Additional tips on how to customize that line (colour, thickness, etc.) would also be appreciated. Many thanks!
library(ggplot2)
It is always a good idea to add a reproducible example of your data; you can always simulate what you need.
set.seed(123)
simulated_data <- data.frame(
  subject = rep(1:10, each = 10),
  xaxis = rep(paste0('F', 1:10), times = 10),
  yaxis = runif(100, 0, 100)
)
In ggplot each geom can take its own data argument, so for your line just use a subset of your original data, limited to the desired subject. Colors and other visual elements of the line are set as plain arguments to geom_line():
ggplot() +
  geom_boxplot(data = simulated_data, aes(xaxis, yaxis)) +
  geom_line(
    data = simulated_data[simulated_data$subject == 1,],
    aes(xaxis, yaxis),
    color = 'red',
    linetype = 2,
    size = 1,
    group = 1
  )
Created on 2022-10-14 with reprex v2.0.2
library(ggplot2)
library(dplyr)
# Simulate some data absent a reproducible example
testData <- data.frame(
y = runif(300,0,100),
x = as.factor(paste0("F",rep(1:10,times=30))),
SUBJECT = as.factor(rep(1:30, each = 10))
)
# Copy your plot with my own data + ylimits
plot1 <- ggplot(testData,
aes(x = x, y = y)) +
geom_boxplot() +
ylim(0, 100) +
stat_summary(fun = "mean", geom = "point", shape = 2, size = 3, color = "purple")
# add the geom_line for subject 1
plot1 +
geom_line(data = filter(testData, SUBJECT == 1),
mapping = aes(x=x, y=y, group = SUBJECT))
My answer is very similar to Johan Rosa's but his doesn't use additional packages and makes the aesthetic options for the geom_line much more apparent - I'd follow his example if I were you!
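A small addition to both answers, as a sketch under assumptions: the data below is simulated (sim, lvl and one_subject are illustrative names), and the y range of 0 to 12.5 is borrowed from the question's ylim(). Storing the x variable as a factor with explicit levels keeps F10 after F9 instead of between F1 and F2, and color and linetype are set as fixed arguments outside aes():

```r
library(ggplot2)

# Hypothetical data in the question's shape: 10 levels, one value per subject and level
set.seed(123)
lvl <- paste0("F", 1:10)
sim <- data.frame(
  subject = rep(1:10, each = 10),
  xaxis   = factor(rep(lvl, times = 10), levels = lvl),  # keep F1..F10 order
  yaxis   = runif(100, 0, 12.5)
)

# The 10 observations of the subject to highlight
one_subject <- sim[sim$subject == 1, ]

ggplot(sim, aes(xaxis, yaxis)) +
  geom_boxplot() +
  # group tells geom_line() to connect the points across the discrete x-axis
  geom_line(data = one_subject, aes(group = subject),
            color = "red", linetype = 2)
```

Without the explicit levels, a character or as.factor() column would be ordered alphabetically, and the line would zigzag through the levels in that order.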

generating a manhattan plot with ggplot

I've been trying to generate a Manhattan plot using ggplot, which I finally got to work. However, I cannot get the points to be colored by chromosome, despite having tried several different examples I've seen online. I'm attaching my code and the resulting plot below. Can anyone see why the code is failing to color points by chromosome?
library(tidyverse)
library(vroom)
# threshold to drop really small -log10 p values so I don't have to plot millions of uninformative points. Just setting to 0 since I'm running for a small subset
min_p <- 0.0
# reading in data to brassica_df2, converting to data frame, removing characters from AvsDD p value column, converting to numeric, filtering by AvsDD (p value)
brassica_df2 <- vroom("manhattan_practice_data.txt", col_names = c("chromosome", "position", "num_SNPs", "prop_SNPs_coverage", "min_coverage", "AvsDD", "AvsWD", "DDvsWD"))
brassica_df2 <- as.data.frame(brassica_df2)
brassica_df2$AvsDD <- gsub("1:2=","",as.character(brassica_df2$AvsDD))
brassica_df2$AvsDD <- as.numeric(brassica_df2$AvsDD)
brassica_df2 <- filter(brassica_df2, AvsDD > min_p)
# setting significance threshold
sig_cut <- -log10(1)
# setting ylim for graph
ylim <- (max(brassica_df2$AvsDD) + 2)
# setting up labels for x axis
axisdf <- as.data.frame(brassica_df2 %>% group_by(chromosome) %>% summarize(center=( max(position) + min(position) ) / 2 ))
# making manhattan plot of statistically significant SNP shifts
manhplot <- ggplot(data = filter(brassica_df2, AvsDD > sig_cut), aes(x=position, y=AvsDD), color=as.factor(chromosome)) +
geom_point(alpha = 0.8) +
scale_x_continuous(label = axisdf$chromosome, breaks= axisdf$center) +
scale_color_manual(values = rep(c("#276FBF", "#183059"), unique(length(axisdf$chromosome)))) +
geom_hline(yintercept = sig_cut, lty = 2) +
ylab("-log10 p value") +
ylim(c(0,ylim)) +
theme_classic() +
theme(legend.position = "n")
print(manhplot)
I think you just need to move your color=... argument inside the call to aes():
ggplot(
data = filter(brassica_df2, AvsDD > sig_cut),
aes(x=position, y=AvsDD),
color=as.factor(chromosome))
becomes...
ggplot(
data = filter(brassica_df2, AvsDD > sig_cut),
aes(x=position, y=AvsDD, color=as.factor(chromosome)))
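To see the difference in isolation, here is a minimal sketch on made-up data (toy, chrom_levels and chrom_cols are hypothetical stand-ins for the question's objects). With color mapped inside aes(), scale_color_manual() has a scale to act on. The sketch also uses legend.position = "none", which is the documented value for hiding the legend:

```r
library(ggplot2)

# Hypothetical toy data standing in for the question's brassica_df2
set.seed(1)
toy <- data.frame(
  chromosome = rep(1:4, each = 25),
  position   = seq_len(100),
  AvsDD      = runif(100, 0, 5)
)

# Alternate two colours across chromosomes, one value per factor level
chrom_levels <- levels(as.factor(toy$chromosome))
chrom_cols <- rep(c("#276FBF", "#183059"), length.out = length(chrom_levels))

# colour is mapped inside aes(), so the manual colour scale is actually used
p <- ggplot(toy, aes(x = position, y = AvsDD, color = as.factor(chromosome))) +
  geom_point(alpha = 0.8) +
  scale_color_manual(values = chrom_cols) +
  theme_classic() +
  theme(legend.position = "none")
print(p)
```

rep(..., length.out = ...) yields exactly one colour per chromosome, alternating between the two shades.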

Histogram with discontinuous x-axis

I need to create a histogram in R. I have added a picture to show the desired result. I have tried both ggplot2 and the base function hist. I used this code (ggplot) to get the basic histogram, but I would like to set the x-axis as shown in the figure (exactly the same values). Can someone tell me how to do that?
My input file dataLig2 contains a list of objects, each associated with a value (N..of.similar..Glob.Sum...0.83..ligandable.pockets). I need to plot the frequencies of all the reported values. The lowest value is 1 and the highest is 28. There are no values from 16 to 27, so I would like to skip this range in my plot.
Example of input file:
Object;N..of.similar..Glob.Sum...0.83..ligandable.pockets
1b47_A_001;3
4re2_B_003;1
657w_H_004;13
1gtr_A_003;28
...
my script:
ggplot(dataLig2, aes(dataLig2$N..of.similar..Glob.Sum...0.83..ligandable.pockets, fill = group)) + geom_histogram(color="black") +
scale_fill_manual(values = c("1-5" = "olivedrab1",
"6-10" = "limegreen",
"11-28" = "green4"))
Can you also suggest a script with the hist base function to get the same graph (with spaced bars as in the figure shown)? Thank you!
Using ggplot, convert x to a factor, label the missing numbers as "...", and tell the scale to keep unused levels. See the example:
library(ggplot2)
# reproducible example data
# where 8 and 9 is missing
set.seed(1); d <- data.frame(x = sample(c(1:7, 10), 100, replace = TRUE))
# add missing 8 and 9 as labels
d$x1 <- factor(d$x, levels = 1:10, labels = c(1:7, "...", "...", 10))
#compare
cowplot::plot_grid(
ggplot(d, aes(x)) +
geom_bar() +
ggtitle("before") +
scale_x_continuous(breaks = 1:10),
ggplot(d, aes(x = x1)) +
geom_bar() +
scale_x_discrete(drop = FALSE) +
ggtitle("after"))
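The question also asked for a base R version. One possible sketch, assuming made-up values in place of the real pocket counts (vals, lvls and counts are illustrative names): hist() needs a continuous axis, so this tabulates a factor that contains a "..." placeholder level and draws it with barplot(), which leaves a gap between bars by default:

```r
# Hypothetical values standing in for the pocket counts: 1-15 and 28, nothing in 16-27
set.seed(1)
vals <- sample(c(1:15, 28), 200, replace = TRUE)

# Factor levels: every value that can occur plus one placeholder for the skipped range
lvls <- c(1:15, "...", 28)
counts <- table(factor(vals, levels = lvls))

# One bar per level, including the empty "..." slot; colours follow the question's groups
barplot(counts, xlab = "value", ylab = "frequency",
        col = c(rep("olivedrab1", 5), rep("limegreen", 5), rep("green4", 7)))
```

The seven green4 entries cover the levels 11 to 15, the placeholder, and 28, matching the "11-28" group in the question's script.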

Highlight positions without data in facet_wrap ggplot

When facetting barplots in ggplot the x-axis includes all factor levels. However, not all levels may be present in each group. In addition, zero values may be present, so from the barplot alone it is not possible to distinguish between x-axis values with no data and those with zero y-values. Consider the following example:
library(tidyverse)
set.seed(43)
site <- c("A","B","C","D","E") %>% sample(20, replace=T) %>% sort()
year <- c("2010","2011","2012","2013","2014","2010","2011","2012","2013","2014","2010","2012","2013","2014","2010","2011","2012","2014","2012","2014")
isZero = rbinom(n = 20, size = 1, prob = 0.40)
value <- ifelse(isZero==1, 0, rnorm(20,10,3)) %>% round(0)
df <- data.frame(site,year,value)
ggplot(df, aes(x=year, y=value)) +
geom_bar(stat="identity") +
facet_wrap(~site)
This is fish census data, where not all sites were fished in all years, and sometimes no fish were caught. Hence the need to differentiate between the two situations. For example, there was no catch at site C in 2010 and it was not fished in 2011, and the reader cannot tell the difference. I would like to add something like "no data" to the plot for 2011. Maybe it is possible to fill in the rows where data is missing, generate another column with the desired text, and then include this via geom_text?
So here is an example of your proposed method:
# Tabulate sites vs year, take zero entries
tab <- table(df$site, df$year)
idx <- which(tab == 0, arr.ind = T)
# Build new data.frame
missing <- data.frame(site = rownames(tab)[idx[, "row"]],
year = colnames(tab)[idx[, "col"]],
value = 1,
label = "N.D.") # For 'no data'
ggplot(df, aes(year, value)) +
geom_col() +
geom_text(data = missing, aes(label = label)) +
facet_wrap(~site)
Alternatively, you could also let the facets omit unused x-axis values:
ggplot(df, aes(x=year, y=value)) +
geom_bar(stat="identity") +
facet_wrap(~site, scales = "free_x")
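Another way to build the "no data" labels, closer to the question's idea of filling in the missing rows, might look like the sketch below. It uses a small made-up census instead of the question's df, and full and no_data are illustrative names. Every site/year combination is generated with expand.grid() and left-joined to the observed rows, so unsurveyed combinations show up as NA while true zero catches stay 0:

```r
library(ggplot2)

# Small hypothetical census: site B was not fished in 2011; site A caught nothing in 2011
df <- data.frame(
  site  = c("A", "A", "A", "B", "B"),
  year  = c("2010", "2011", "2012", "2010", "2012"),
  value = c(4, 0, 7, 3, 5)
)

# All site/year combinations, left-joined to the observed rows;
# combinations that were never surveyed get value = NA
full <- merge(expand.grid(site = unique(df$site), year = unique(df$year),
                          stringsAsFactors = FALSE),
              df, all.x = TRUE)
no_data <- full[is.na(full$value), ]
no_data$label <- "N.D."

# Bars for observed data, a text label where no survey took place
ggplot(df, aes(year, value)) +
  geom_col() +
  geom_text(data = no_data, aes(y = 0.5, label = label)) +
  facet_wrap(~site)
```

Only the unsurveyed combination (site B, 2011) receives a label; the zero catch at site A in 2011 remains an unlabeled empty bar position.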

How to use sec_axis() for discrete data in ggplot2 R?

I have discrete data that looks like this:
height <- c(1,2,3,4,5,6,7,8)
weight <- c(100,200,300,400,500,600,700,800)
person <- c("Jack","Jim","Jill","Tess","Jack","Jim","Jill","Tess")
set <- c(1,1,1,1,2,2,2,2)
dat <- data.frame(set,person,height,weight)
I'm trying to plot a graph with the same x-axis (person) and 2 different y-axes (weight and height). All the examples I find either plot the secondary axis (sec_axis) or plot discrete data using base plots.
Is there an easy way to use sec_axis for discrete data in ggplot2?
Edit: Someone in the comments suggested I try the suggested reply. However, I run into this error now
Here is my current code:
p1 <- ggplot(data = dat, aes(x = person, y = weight)) +
geom_point(color = "red") + facet_wrap(~set, scales="free")
p2 <- p1 + scale_y_continuous("height",sec_axis(~.*1.2, name="height"))
p2
I get the error: Error in x < range[1] :
comparison (3) is possible only for atomic and list types
Alternately, now I have modified the example to match this example posted.
p <- ggplot(dat, aes(x = person))
p <- p + geom_line(aes(y = height, colour = "Height"))
# adding the relative weight data, transformed to match roughly the range of the height
p <- p + geom_line(aes(y = weight/100, colour = "Weight"))
# now adding the secondary axis, following the example in the help file ?scale_y_continuous
# and, very important, reverting the above transformation
p <- p + scale_y_continuous(sec.axis = sec_axis(~.*100, name = "Relative weight [%]"))
# modifying colours and theme options
p <- p + scale_colour_manual(values = c("blue", "red"))
p <- p + labs(y = "Height [inches]",
x = "Person",
colour = "Parameter")
p <- p + theme(legend.position = c(0.8, 0.9))+ facet_wrap(~set, scales="free")
p
I get an error that says
"geom_path: Each group consists of only one observation. Do you need to
adjust the group aesthetic?"
I get the template, but no points get plotted
R function arguments are matched by position if argument names are not specified explicitly. As mentioned by @Z.Lin in the comments, you need sec.axis= before your sec_axis function to indicate that you are feeding this function into the sec.axis argument of scale_y_continuous. If you don't do that, it will be fed into the second argument of scale_y_continuous, which is breaks=. The error message is thus related to you not feeding in an acceptable data type for the breaks argument:
p1 <- ggplot(data = dat, aes(x = person, y = weight)) +
geom_point(color = "red") + facet_wrap(~set, scales="free")
p2 <- p1 + scale_y_continuous("weight", sec.axis = sec_axis(~.*1.2, name="height"))
p2
The first argument (name=) of scale_y_continuous is for the first y scale, whereas the sec.axis= argument is for the second y scale. I changed your first y scale name to correct that.
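The positional-matching behaviour can be reproduced without ggplot2. In this sketch toy_scale() is a hypothetical function that only mimics the order of the name and breaks arguments; it is not the real scale_y_continuous():

```r
# Hypothetical stand-in: name first, breaks second, sec.axis later in the list
toy_scale <- function(name = "y", breaks = NULL, sec.axis = NULL) {
  list(name = name, breaks = breaks, sec.axis = sec.axis)
}

# Unnamed: the second value is matched by position and lands in `breaks`
by_position <- toy_scale("height", "secondary")

# Named: the value reaches `sec.axis` regardless of its position in the call
by_name <- toy_scale("weight", sec.axis = "secondary")

str(by_position)
str(by_name)
```

In the first call sec.axis stays NULL and breaks receives the value, which is the same mix-up that produced the error in the question.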

Resources