Line graphs in R: help with legend order

I am currently plotting data from an Excel sheet using R, and I am having a problem with the legend. Here is the picture: https://i.stack.imgur.com/Key98.jpg As you can see, the legend entries appear as PP1, PP10, PP15, PP3, PP30, PP5. I have been trying to get them in numerical order: PP1, PP3, PP5, PP10, PP15, PP30. I am not sure how to fix this, as I am very new to R coding. Any help would be greatly appreciated! This is how my Excel sheet is formatted: https://i.stack.imgur.com/OfNaY.jpg Here is my code:
library("dplyr")
install.packages("ggplot2")
library("ggplot2")
install.packages("tidyverse")
library("tidyverse")
install.packages('reshape')
library('reshape')
# import data
NPPdata <- read.csv("C:\\Users\\rrami\\Desktop\\R-Data\\NPPdata.csv", header = TRUE)
ggplot(NPPdata , aes(x = N_Gradient, y=Values, colour = Group))+
geom_errorbar(aes(ymin=Values-Stdvalue, ymax=Values+Stdvalue), lwd =1.2)+
geom_line(lwd=1.5)+
ggtitle("Year 1 MONO Phrag [Branch Prob 0.1]")+
theme(plot.title = element_text(hjust =0.5)) +
labs(x = "N-Gradient", y ="INV%")+
theme(axis.text.x = element_text(size = 14), axis.title.x = element_text(size = 16),
axis.text.y = element_text(size = 14), axis.title.y = element_text(size = 16))

I've made an example with "iris". In the second figure, scale_fill_discrete() is used to change the order of the legend entries.
library (tidyverse)
data(iris)
figure_1 <- iris %>%
gather(key = floral_components, value = values, -Species) %>%
ggplot(aes(x = floral_components, y = values, fill = Species)) +
geom_bar(stat='identity') +
labs(x = "Floral Components",
y = "Values",
fill = "Species")
figure_2 <- iris %>%
gather(key = floral_components, value = values, -Species) %>%
ggplot(aes(x = floral_components, y = values, fill = Species)) +
geom_bar(stat='identity') +
labs(x = "Floral Components",
y = "Values",
fill = "Species") +
# limits sets the legend order; labels alone would only rename the entries
scale_fill_discrete(limits = c("versicolor", "virginica", "setosa"))
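For the line plot in the question, the legend is sorted alphabetically because Group is read in as text, so "PP10" comes before "PP3". A minimal sketch of one way around this is to turn Group into a factor whose levels are sorted by the number after "PP". The data frame below is a made-up stand-in for NPPdata (the real file is not available), and pp_levels is an illustrative name.

```r
library(ggplot2)

# Hypothetical stand-in for the NPPdata.csv contents (values are invented)
NPPdata <- data.frame(
  N_Gradient = rep(1:3, times = 6),
  Values     = seq(10, 95, by = 5),
  Group      = rep(c("PP1", "PP10", "PP15", "PP3", "PP30", "PP5"), each = 3)
)

# Sort the group names by their numeric part instead of alphabetically
groups    <- unique(NPPdata$Group)
pp_levels <- groups[order(as.numeric(sub("^PP", "", groups)))]

# A factor with explicit levels controls the legend order in ggplot2
NPPdata$Group <- factor(NPPdata$Group, levels = pp_levels)

p <- ggplot(NPPdata, aes(x = N_Gradient, y = Values, colour = Group)) +
  geom_line()
print(levels(NPPdata$Group))
```

Because the order is stored in the data rather than in the scale, the same order applies to any other plot built from NPPdata.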

Related

Create a split violin plot with paired points and proper orientation

With ggplot2, I can create a violin plot with overlapping points, and paired points can be connected using geom_line().
library(datasets)
library(ggplot2)
library(dplyr)
iris_edit <- iris %>% group_by(Species) %>%
mutate(paired = seq(1:length(Species))) %>%
filter(Species %in% c("setosa","versicolor"))
ggplot(data = iris_edit,
mapping = aes(x = Species, y = Sepal.Length, fill = Species)) +
geom_violin() +
geom_line(mapping = aes(group = paired),
position = position_dodge(0.1),
alpha = 0.3) +
geom_point(mapping = aes(fill = Species, group = paired),
size = 1.5, shape = 21,
position = position_dodge(0.1)) +
theme_classic() +
theme(legend.position = "none",
axis.text.x = element_text(size = 15),
axis.title.y = element_text(size = 15),
axis.title.x = element_blank(),
axis.text.y = element_text(size = 10))
The see package includes the geom_violindot() function to plot a halved violin plot alongside its constituent points. I've found this function helpful when plotting a large number of points so that the violin is not obscured.
library(see)
ggplot(data = iris_edit,
mapping = aes(x = Species, y = Sepal.Length, fill = Species)) +
geom_violindot(dots_size = 0.8,
position_dots = position_dodge(0.1)) +
theme_classic() +
theme(legend.position = "none",
axis.text.x = element_text(size = 15),
axis.title.y = element_text(size = 15),
axis.title.x = element_blank(),
axis.text.y = element_text(size = 10))
Now, I would like to add geom_line() to geom_violindot() in order to connect paired points, as in the first image. Ideally, I would like the points to be inside and the violins to be outside so that the lines do not intersect the violins. geom_violindot() includes the flip argument, which takes a numeric vector specifying the geoms to be flipped.
ggplot(data = iris_edit,
mapping = aes(x = Species, y = Sepal.Length, fill = Species)) +
geom_violindot(dots_size = 0.8,
position_dots = position_dodge(0.1),
flip = c(1)) +
geom_line(mapping = aes(group = paired),
alpha = 0.3,
position = position_dodge(0.1)) +
theme_classic() +
theme(legend.position = "none",
axis.text.x = element_text(size = 15),
axis.title.y = element_text(size = 15),
axis.title.x = element_blank(),
axis.text.y = element_text(size = 10))
As you can see, invoking flip inverts the violin half, but not the corresponding points. The see documentation does not seem to address this.
Questions
How can you create a geom_violindot() plot with paired points, such that the points and the lines connecting them are "sandwiched" in between the violin halves? I suspect there is a solution that uses David Robinson's GeomFlatViolin function, though I haven't been able to figure it out.
In the last figure, note that the lines are askew relative to the points they connect. What position adjustment function should be supplied to the position_dots and position arguments so that the points and lines are properly aligned?
I'm not sure about using geom_violindot() from the see package, but you could combine geom_half_violin() and geom_half_dotplot() from the gghalves package, subsetting the data to specify the orientation:
library(gghalves)
ggplot(data = iris_edit[iris_edit$Species == "setosa",],
mapping = aes(x = Species, y = Sepal.Length, fill = Species)) +
geom_half_violin(side = "l") +
geom_half_dotplot(stackdir = "up") +
geom_half_violin(data = iris_edit[iris_edit$Species == "versicolor",],
aes(x = Species, y = Sepal.Length, fill = Species), side = "r")+
geom_half_dotplot(data = iris_edit[iris_edit$Species == "versicolor",],
aes(x = Species, y = Sepal.Length, fill = Species),stackdir = "down") +
geom_line(data = iris_edit, mapping = aes(group = paired),
alpha = 0.3)
As a note, the paired lines won't align exactly with the dots, because the dotplot bins each observation and then stacks the dots outward along a line. The paired lines only follow the x-value defined in aes, not the position of each dot within its stack.
As per comment - this is not a direct answer to your question, but I believe that you might not get the most convincing visualisation when using the "slope graph" optic. This becomes quickly convoluted (so many dots/ lines overlapping) and the message gets lost.
To show change between paired observations (treatment 1 versus treatment 2), you can also (and I think: better) use a scatter plot. You can show each observation and the change becomes immediately clear. To make it more intuitive, you can add a line of equality.
I don't think you need to show the estimated distribution (left plot), but if you want to show this, you could make use of a two-dimensional density estimation, with geom_density2d (right plot)
library(tidyverse)
## patchwork only for demo purpose
library(patchwork)
iris_edit <- iris %>% group_by(Species) %>%
## use seq_along instead
mutate(paired = seq_along(Species)) %>%
filter(Species %in% c("setosa","versicolor")) %>%
## some more modifications
select(paired, Species, Sepal.Length) %>%
pivot_wider(names_from = Species, values_from = Sepal.Length)
lims <- c(0, 10)
p1 <-
ggplot(data = iris_edit, aes(setosa, versicolor)) +
geom_abline(intercept = 0, slope = 1, lty = 2) +
geom_point(alpha = .7, stroke = 0, size = 2) +
cowplot::theme_minimal_grid() +
coord_equal(xlim = lims, ylim = lims) +
labs(x = "Treatment 1", y = "Treatment 2")
p2 <-
ggplot(data = iris_edit, aes(setosa, versicolor)) +
geom_abline(intercept = 0, slope = 1, lty = 2) +
geom_density2d(color = "Grey") +
geom_point(alpha = .7, stroke = 0, size = 2) +
cowplot::theme_minimal_grid() +
coord_equal(xlim = lims, ylim = lims) +
labs(x = "Treatment 1", y = "Treatment 2")
p1+ p2
Created on 2021-12-18 by the reprex package (v2.0.1)

plot time series and NA values in R

I have been searching for missing values visualization in R and even though there are many nice options, I haven't found a code to get exactly what I need.
My data frame (df) is
data.frame(
stringsAsFactors = FALSE,
Date = c("01/01/2000","02/01/2000",
"03/01/2000","04/01/2000","05/01/2000","06/01/2000",
"07/01/2000","08/01/2000","09/01/2000"),
Site.1 = c(NA,0.952101337,0.066766616,
0.77279551,0.715427011,NA,NA,NA,0.925705179),
Site.2 = c(0.85847963,0.663818831,NA,NA,
0.568488712,0.002833073,0.349365844,0.652482654,
0.334879886),
Site.3 = c(0.139854891,0.057024999,
0.297705256,0.914754178,NA,0.14108163,0.282896932,
0.823245136,0.153609705),
Site.4 = c(0.758317946,0.284147119,
0.756356853,NA,NA,0.313465424,NA,0.013689324,0.654615632)
) -> df
And I would like to get a plot similar to the following:
Taking into account that my actual data consists of 51 Sites and around 9,000 dates
You can try with something like this:
library(tidyr)
library(dplyr)
library(ggplot2)
df %>%
# from wide to long
pivot_longer(!Date, names_to = "sites", values_to = "value") %>%
# add a column of ones and NAs that follows your data
mutate(fake = ifelse(is.na(value),NA, 1),
sites = as.factor(sites)) %>%
# plot it
ggplot(aes(x = Date, y = reorder(sites,desc(sites)), color = fake, group = sites)) +
# line size
geom_line( size = 2) +
# some aesthetics
ylab('sites') +
scale_color_continuous(high="black",na.value="white") +
theme(legend.position = 'none',
panel.background = element_rect(fill ='white'),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
That said, I prefer something simpler like this:
df %>%
pivot_longer(!Date, names_to = "sites", values_to = "value") %>%
ggplot(aes(x = Date, y = sites, fill =value)) +
geom_tile() +
theme_light() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
This answer is similar to the first one, but uses NA values to interrupt the lines.
library(ggplot2)
library(tidyr)
library(stringr)
df %>% pivot_longer(., cols = 2:5) %>%
mutate(present = !is.na(value)) %>%
mutate(height = as.numeric(str_remove(name, "Site.")) * present) %>% mutate(value2 = case_when(!is.na(value) ~ height)) %>%
ggplot(aes(Date, value2, group = name)) +
geom_line() +
theme(legend.position = "none") +
scale_x_discrete(guide = guide_axis(angle = 90))
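With around 9,000 dates, a character Date column is treated as a discrete axis, so every date gets its own tick label. A minimal sketch, assuming the dates are day/month/year (the question does not say), converts the column with as.Date() and draws missing values as white tiles. The names df_small, long, and missing are illustrative and not from the posts above.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Small stand-in for the question's df (same layout, fewer rows and sites)
df_small <- data.frame(
  Date   = c("01/01/2000", "02/01/2000", "03/01/2000"),
  Site.1 = c(NA, 0.95, 0.07),
  Site.2 = c(0.86, NA, 0.57)
)

long <- df_small %>%
  # assumption: dates are day/month/year
  mutate(Date = as.Date(Date, format = "%d/%m/%Y")) %>%
  pivot_longer(!Date, names_to = "site", values_to = "value") %>%
  mutate(missing = is.na(value))

# A continuous date axis chooses a readable number of breaks by itself
p <- ggplot(long, aes(x = Date, y = site, fill = missing)) +
  geom_tile() +
  scale_fill_manual(values = c(`FALSE` = "black", `TRUE` = "white")) +
  scale_x_date(date_labels = "%Y-%m-%d") +
  theme_light()
```

If the real dates are month/day/year, the format string would need to be "%m/%d/%Y" instead.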

How to make a bar chart with multiple columns from an Excel file?

How can I make a bar chart using barplot() or ggplot() from an Excel file that has 83 columns?
I need to plot every column that has a value > 0 in each row. (Each column represents a gene function, and I need to know how many functions there are in each cluster.)
I was trying this, but it didn't work:
ggplot(x, aes(x=Cluster, y=value, fill=variable)) +
geom_bar(stat="bin", position="dodge") +
theme_bw() +
ylab("Funções no cluster") +
xlab("Cluster") +
scale_fill_brewer(palette="Blues")
Link to the excel:
https://github.com/annabmarques/GenesCorazon/blob/master/AllclusPathwayEDIT.xlsx
What about a heatmap? A rough example:
library(dplyr)
library(tidyr)
library(ggplot2)
library(openxlsx)
data <- read.xlsx("AllclusPathwayEDIT.xlsx")
data <- data %>%
mutate(cluster_nr = row_number()) %>%
pivot_longer(cols = -c(Cluster, cluster_nr),
names_to = "observations",
values_to = "value") %>%
mutate(value = as.factor(value))
ggplot(data, aes(x = cluster_nr, y = observations, fill = value)) +
geom_tile() +
scale_fill_brewer(palette = "Blues")
Given the large number of observations consider breaking this up into multiple charts.
It's difficult to understand exactly what you're trying to do. Is this what you're trying to achieve?
#install.packages("readxl")
library(tidyverse)
library(readxl)
read_excel("AllclusPathwayEDIT.xlsx") %>%
pivot_longer(!Cluster, names_to = "gene_counts", values_to = "count") %>%
mutate(Cluster = as.factor(Cluster)) %>%
ggplot(aes(x = Cluster, y = count, fill = gene_counts)) +
geom_bar(position="stack", stat = "identity") +
theme(legend.position = "right",
legend.key.size = unit(0.4,"line"),
legend.text = element_text(size = 7),
legend.title = element_blank()) +
guides(fill = guide_legend(ncol = 1))
ggsave(filename = "example.pdf", height = 20, width = 35, units = "cm")
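If the goal is literally "how many functions are there in each cluster", one possible reading is to count, per cluster, the columns with a value greater than zero and plot that count. A minimal sketch under that assumption follows. The data frame x and its fun_* columns are invented for illustration, since the linked spreadsheet is not reproduced here.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Made-up stand-in for the spreadsheet: one row per cluster, one column per function
x <- data.frame(
  Cluster = c("c1", "c2", "c3"),
  fun_a   = c(2, 0, 1),
  fun_b   = c(0, 0, 3),
  fun_c   = c(1, 4, 0)
)

counts <- x %>%
  pivot_longer(!Cluster, names_to = "func", values_to = "value") %>%
  filter(value > 0) %>%                   # keep only functions present in the cluster
  count(Cluster, name = "n_functions")    # number of such functions per cluster

p <- ggplot(counts, aes(x = Cluster, y = n_functions)) +
  geom_col() +
  labs(x = "Cluster", y = "Functions in cluster")
```

geom_col() plots the precomputed counts directly, which avoids the stat = "bin" setting used in the question.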

Reordering the Barplots in ggplot2 in R

I have a dataframe through which I plot a bar plot through ggplot2 in R.
library(dplyr)
library(ggplot2)
library(reshape2)
Dataset<- c("MO", "IP", "MP","CC")
GPP <- c(1, 3, 4,3)
NPP<-c(4,3,5,2)
df <- data.frame(Dataset,GPP,NPP)
df.m<-melt(df)
ggplot(df.m, aes(Dataset, value, fill = variable)) +
geom_bar(stat="identity", position = "dodge")
my_se <- df.m %>%
group_by(Dataset) %>%
summarise(n=n(),
sd=sd(value),
se=sd/sqrt(n))
df.m %>%
left_join(my_se) %>%
ggplot(aes(x = Dataset, y = value, fill = variable)) +
geom_bar(stat="identity", position = "dodge")+
geom_errorbar(aes(x=Dataset, ymin=value-se, ymax=value+se), width=0.4, position = position_dodge(.9))+
scale_fill_manual(labels = c("GPP", "NPP"),values=cbp1)+
theme(legend.text=element_text(size=11),axis.text.y=element_text(size=11.5),
axis.text.x=element_text(size=11.5),axis.title.x = element_text(size = 12), axis.title.y = element_text(size = 12))+
theme_bw()+theme(legend.title =element_blank())+
labs(y= fn, x = "")
When my bar graph is plotted, the order of the bars is
I would like to rearrange the bars in order : MO, IP, MP, CC (not alphabetically).
Help would be appreciated.
You need to set your factor levels explicitly or R will pick an order for them.
In the case of characters R will pick alphabetical order. Since you want a non-alphabetical order you'll need to set levels inside of factor at some point before plotting (there are several places where you could do it).
df <- data.frame(Dataset = factor(Dataset, levels = c("MO", "IP", "MP", "CC")), GPP, NPP)
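As a small illustration of the point about levels, the sketch below compares the default level order with an explicit one. Using unique(Dataset) is a shortcut that works here because the vector is already in the desired order. The names alphabetical and as_given are illustrative.

```r
Dataset <- c("MO", "IP", "MP", "CC")

# Default: levels are sorted, which is what puts the bars in alphabetical order
alphabetical <- levels(factor(Dataset))

# Explicit levels: keep the order in which the values first appear
as_given <- levels(factor(Dataset, levels = unique(Dataset)))

print(alphabetical)
print(as_given)
```

Note that df.m has to be rebuilt with melt(df) after df is changed, otherwise the plot still uses the old character column.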
Try this (I changed the colors because cbp1 is not present):
df.m %>%
left_join(my_se) %>%
ggplot(aes(x = factor(Dataset,levels=c('MO', 'IP', 'MP', 'CC')), y = value, fill = variable)) +
geom_bar(stat="identity", position = "dodge")+
geom_errorbar(aes(x=Dataset, ymin=value-se, ymax=value+se), width=0.4, position = position_dodge(.9))+
scale_fill_manual(labels = c("GPP", "NPP"),values=c('pink','cyan'))+
theme(legend.text=element_text(size=11),axis.text.y=element_text(size=11.5),
axis.text.x=element_text(size=11.5),axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12))+
theme_bw()+theme(legend.title =element_blank())+
labs(y= "fn", x = "")

How to create such a figure using ggplot2 in R?

I have a matrix with many zero elements. The column names are labeled on the horizontal axis. I'd like to show the nonzero elements explicitly as deviations from a vertical line for each column.
So how should I construct a figure such as the example using ggplot2?
Example data can be generated as follows:
set.seed(2018)
N <- 5
p <- 40
dat <- matrix(0.0, nrow=p, ncol=N)
dat[2:7, 1] <- 4*rnorm(6)
dat[4:12, 2] <- 2.6*rnorm(9)
dat[25:33, 3] <- 2.1*rnorm(9)
dat[19:26, 4] <- 3.3*rnorm(8)
dat[33:38, 5] <- 2.9*rnorm(6)
colnames(dat) <- letters[1:5]
print(dat)
Here is another option using facet_wrap and geom_col with theme_minimal.
library(tidyverse)
dat %>%
as.data.frame() %>%
rowid_to_column("row") %>%
gather(key, value, -row) %>%
ggplot(aes(x = row, y = value, fill = key)) +
geom_col() +
facet_wrap(~ key, ncol = ncol(dat)) +
coord_flip() +
theme_minimal()
To further increase the aesthetic similarity to the plot in your original post we can
move the facet strips to the bottom,
rotate strip labels,
add "zero lines" in matching colours,
remove the fill legend, and
get rid of the x & y axis ticks/labels/title.
library(tidyverse)
dat %>%
as.data.frame() %>%
rowid_to_column("row") %>%
gather(key, value, -row) %>%
ggplot(aes(x = row, y = value, fill = key)) +
geom_col() +
geom_hline(data = dat %>%
as.data.frame() %>%
gather(key, value) %>%
count(key) %>%
mutate(y = 0),
aes(yintercept = y, colour = key), show.legend = F) +
facet_wrap(~ key, ncol = ncol(dat), strip.position = "bottom") +
coord_flip() +
guides(fill = "none") +
theme_minimal() +
theme(
strip.text.x = element_text(angle = 45),
axis.title = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank())
It would be much easier if you provided some sample data. Since there was none, I had to create my own, so there is no guarantee that this will work for your purpose.
set.seed(123)
# creating some random sample data
df <- data.frame(id = rep(1:100, each = 3),
x = rnorm(300),
# group must be a factor: as.numeric() and levels() are applied to it below
group = factor(rep(letters[1:3], each = 100)),
bias = sample(0:1, 300, replace = T, prob = c(0.7, 0.3)))
# introducing bias
df$bias <- df$bias*rnorm(nrow(df))
# calculate lower/upper bias for errorbar
df$biaslow <- apply(data.frame(df$bias), 1, function(x){min(0, x)})
df$biasupp <- apply(data.frame(df$bias), 1, function(x){max(0, x)})
Then I used a bit of a hack to draw the groups far enough apart that they do not overlap: based on group, I shifted the bias variable and also the lower and upper bias.
# I want to print groups in sufficient distance
df$bias <- as.numeric(df$group)*5 + df$bias
df$biaslow <- as.numeric(df$group)*5 + df$biaslow
df$biasupp <- as.numeric(df$group)*5 + df$biasupp
And now it is possible to plot it:
library(ggplot2)
ggplot(df, aes(x = x, col = group)) +
geom_errorbar(aes(ymin = biaslow, ymax = biasupp), width = 0) +
coord_flip() +
geom_hline(aes(yintercept = 5, col = "a")) +
geom_hline(aes(yintercept = 10, col = "b")) +
geom_hline(aes(yintercept = 15, col = "c")) +
theme(legend.position = "none") +
scale_y_continuous(breaks = c(5, 10, 15), labels = letters[1:3])
EDIT:
To incorporate special design you can add
theme_bw() +
theme(axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
axis.title.y = element_blank(),
axis.text.x = element_text(angle = 45, vjust = 0.5, hjust = 1),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank())
to your plot.
EDIT2:
To draw several horizontal lines, you can create a separate data set:
df2 <- data.frame(int = unique(as.numeric(df$group)*5),
gr = levels(df$group))
And use
geom_hline(data = df2, aes(yintercept = int, col = gr))
instead of copy/pasting geom_hline for each group level.
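Putting the pieces of this answer together, a compact self-contained sketch might look as follows. It assumes group is stored as a factor (in R 4.0 and later, data.frame() no longer converts strings to factors by default, so as.numeric() and levels() would not work on a plain character column). It also uses pmin()/pmax() in place of the apply() calls and a smaller made-up sample, with offset as an illustrative helper name.

```r
library(ggplot2)

set.seed(123)
# Smaller made-up sample than in the answer above; group is an explicit factor
df <- data.frame(
  x     = rnorm(30),
  group = factor(rep(letters[1:3], each = 10)),
  bias  = rnorm(30)
)

# Shift each group to its own baseline (5, 10, 15) so the groups do not overlap
offset     <- as.numeric(df$group) * 5
df$biaslow <- offset + pmin(0, df$bias)
df$biasupp <- offset + pmax(0, df$bias)

# One row per group: baseline position and group label
df2 <- data.frame(int = seq_along(levels(df$group)) * 5,
                  gr  = levels(df$group))

p <- ggplot(df, aes(x = x, colour = group)) +
  geom_errorbar(aes(ymin = biaslow, ymax = biasupp), width = 0) +
  geom_hline(data = df2, aes(yintercept = int, colour = gr)) +
  coord_flip() +
  theme(legend.position = "none") +
  scale_y_continuous(breaks = df2$int, labels = df2$gr)
```

Driving both the baselines and the axis breaks from df2 means adding a group only requires more data, not more geom_hline() calls.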

Resources