How to set x and y limits to the same values in R?

This is some data I made. I have two data frames with two variables each.
var1 <- (1:10)*(rnorm(10,2,0.1))
var2 <- (6:15)*(rnorm(10,1,0.1))
df1 <- as.data.frame(cbind(var1,var2))
var3 <- (1:10)*(rnorm(10,3,0.1))
var4 <- (6:15)*(rnorm(10,1.5,0.1))
df2 <- as.data.frame(cbind(var3,var4))
This loop plots the first variable of df1 against the first variable of df2, and likewise the second variable of df1 against the second variable of df2.
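library(ggplot2)
library(gridExtra)
plot_list <- list()
for (i in 1:ncol(df1)) {
  # column i of df1 on the x axis, column i of df2 on the y axis
  p <- ggplot(data.frame(x = df1[, i], y = df2[, i]), aes(x = x, y = y)) +
    geom_point()
  plot_list[[i]] <- p
}
do.call("grid.arrange", c(plot_list[1:2], ncol = 1))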
And this is the plot I got.
So far so good. But I would like x and y within each plot to have the same limits, based on the overall min and max. For example, in the top plot both x and y should go from ~5 to ~30, and in the bottom plot both x and y should go from ~6 to ~24. I could set the limits manually, but I need to do this for many plots.
Is there any way to set the x and y limits for each plot based on the min and max observed on either axis?
Thanks for the help.

In general, I’d suggest that the data for each plot should be in its own data.frame. Having a single data.frame and using facets is an option, but facets make it difficult to specify different limits for each plot. I’ve therefore gone with a grid.arrange solution similar to yours.
library(ggplot2)
library(purrr)
var1 <- (1:10)*(rnorm(10,2,0.1))
var2 <- (6:15)*(rnorm(10,1,0.1))
var3 <- (1:10)*(rnorm(10,3,0.1))
var4 <- (6:15)*(rnorm(10,1.5,0.1))
df1 <- data.frame(x = var1, y = var3)
df2 <- data.frame(x = var2, y = var4)
plots <- map(
  list(df1, df2),
  function(data) {
    # one shared range covering both x and y
    lims <- range(c(data$x, data$y))
    ggplot(data, aes(x, y)) +
      geom_point() +
      coord_fixed(xlim = lims, ylim = lims)
  })
gridExtra::grid.arrange(grobs = plots, nrow = 2)
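The key step is computing one range over both variables and using it for both axes. If you work in base graphics rather than ggplot2, the same idea can be wrapped in a small helper. This is only a sketch; `shared_limits()` and `plot_square()` are hypothetical names, not functions from any package.

```r
# Hypothetical helper: one (min, max) pair covering both variables,
# ignoring missing values
shared_limits <- function(x, y) {
  range(c(x, y), na.rm = TRUE)
}

# Hypothetical base-graphics wrapper: same limits on both axes and a
# 1:1 aspect ratio (roughly the base R counterpart of coord_fixed())
plot_square <- function(x, y, ...) {
  lims <- shared_limits(x, y)
  plot(x, y, xlim = lims, ylim = lims, asp = 1, ...)
}

lims <- shared_limits(c(5, 12, 30), c(7, 9, 24))
print(lims)
```

With the answer's data frames you would call, for example, `plot_square(df1$x, df1$y)` once per plot.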

Related

Add one legend with all variables for combined graphs

I'm trying to plot two graphs side by side with one common legend that includes all the variables from both graphs (some variables differ between the graphs).
Here's a mock example of what I've been attempting:
#make relative abundance values for n rows
makeData <- function(n){
  x <- runif(n, 0, 1)
  x / sum(x)  # return proportions that sum to 1
}
#make random matrices filled with relative abundance values
makeDF <- function(col, rw){
df <- matrix(ncol=col, nrow=rw)
for(i in 1:ncol(df)){
df[,i] <- makeData(nrow(df))
}
return(df)
}
#create df1 and assign col names
df1 <- makeDF(4, 5)
colSums(df1) #verify relative abundance values = 1
df1 <- as.data.frame(df1)
colnames(df1) <- c("taxa","s1", "s2", "s3")
df1$taxa <- c("ASV1", "ASV2", "ASV3", "ASV4", "ASV5")
#repeat for df2
df2 <- makeDF(4,5)
df2 <- as.data.frame(df2)
colnames(df2) <- c("taxa","s1", "s2", "s3")
df2$taxa <- c("ASV1", "ASV5", "ASV6", "ASV7", "ASV8")
# convert wide data format to long format -- for plotting
library(reshape2)
makeLong <- function(df){
df.long <- melt(df, id.vars="taxa",
measure.vars=grep("s\\d+", names(df), value=TRUE),
variable.name="sample",
value.name="value")
return(df.long)
}
df1 <- makeLong(df1)
df2 <- makeLong(df2)
#generate distinct colours for each asv
taxas <- union(df1$taxa, df2$taxa)
library("RColorBrewer")
qual_col_pals = brewer.pal.info[brewer.pal.info$category == 'qual',]
colpals <- qual_col_pals[c("Set1", "Dark2", "Set3"),] #select colour palettes
col_vector = unlist(mapply(brewer.pal, colpals$maxcolors, rownames(colpals)))
taxa.col=sample(col_vector, length(taxas))
names(taxa.col) <- taxas
# plot using ggplot
library(ggplot2)
plotdf2 <- ggplot(df2, aes(x=sample, y=value, fill=taxa)) +
geom_bar(stat="identity")+
scale_fill_manual("ASV", values = taxa.col)
plotdf1 <- ggplot(df1, aes(x=sample, y=value, fill=taxa)) +
geom_bar(stat="identity")+
scale_fill_manual("ASV", values = taxa.col)
#combine plots to one figure and merge legend
library(ggpubr)
ggpubr::ggarrange(plotdf1, plotdf2, ncol=2, nrow=1, common.legend = T, legend="bottom")
(if you have suggestions on how to generate better mock data, by all means!)
When I run my code, I am able to get the two graphs in one figure, but the legend does not incorporate all variables from both plots:
I ideally would like to avoid having repeat variables in the legend, such as:
From what I've found online, a common legend only works when the variables are the same in both graphs, but in my case some variables are shared and some are not.
Thanks for any help!
Maybe this is what you are looking for:
Convert the taxa column in both data frames to a factor with levels = taxas, so that each factor includes all levels from both datasets.
Add argument drop=FALSE to both scale_fill_manual to prevent dropping of unused factor levels.
Note: I only added the relevant parts of the code and set the seed to 42 at the beginning of the script.
set.seed(42)
df1$taxa <- factor(df1$taxa, taxas)
df2$taxa <- factor(df2$taxa, taxas)
# plot using ggplot
library(ggplot2)
plotdf2 <- ggplot(df2, aes(x=sample, y=value, fill=taxa)) +
geom_bar(stat="identity") +
scale_fill_manual("ASV", values = taxa.col, drop = FALSE)
plotdf1 <- ggplot(df1, aes(x=sample, y=value, fill=taxa)) +
geom_bar(stat="identity")+
scale_fill_manual("ASV", values = taxa.col, drop = FALSE)
#combine plots to one figure and merge legend
library(ggpubr)
ggpubr::ggarrange(plotdf1, plotdf2, ncol=2, nrow=1, common.legend = T, legend="bottom")
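Why this works: a factor keeps its full set of levels even when some of them do not occur in the data, and `drop = FALSE` tells the fill scale not to discard those unused levels. The base R sketch below uses the ASV names from the question (hard-coded, not the random abundance data) to show the effect of `union()` plus `factor(..., levels = ...)`; `droplevels()` roughly mimics what the legend shows when unused levels are dropped.

```r
# Taxa present in each dataset (names taken from the question)
taxa1 <- c("ASV1", "ASV2", "ASV3", "ASV4", "ASV5")
taxa2 <- c("ASV1", "ASV5", "ASV6", "ASV7", "ASV8")

# All taxa from both datasets, each listed once
taxas <- union(taxa1, taxa2)

# Give both factors the same full set of levels
f1 <- factor(taxa1, levels = taxas)
f2 <- factor(taxa2, levels = taxas)

# Unused levels are kept, with a count of zero
counts1 <- table(f1)
print(counts1)

# Dropping unused levels (the default behaviour of a discrete scale)
# leaves only the five taxa actually present in the first dataset
print(levels(droplevels(f1)))
```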

How to graph “before and after” measures using ggplot with connecting lines?

I’m new to R. I want to create a “before-and-after” scatterplot with connecting lines to illustrate the different power outputs before and after a training intervention.
I want something like the graph in the picture.
Example
Sample Data
x <- c(0,1,0,1,0,1,0,1) # 0=before, 1=after
y <- c(1001,1030,900,950,1040,1020,1010,1000) #Power output
Group <- c(0,0,0,0,1,1,1,1) # 0=Control 1=Experimental
id <- c(1,1,2,2,3,3,4,4) # id = per individual
df <- data.frame(x,y,Group,id)
Many thanks
x <- c(0,1,0,1,0,1,0,1) # 0=before, 1=after
y <- c(1001,1030,900,950,1040,1020,1010,1000) #Power output
Group <- c(0,0,0,0,1,1,1,1) # 0=Control 1=Experimental
id <- c(1,1,2,2,3,3,4,4) # id = per individual
df <- data.frame(
x = x,
y = y,
Group = as.factor(Group), # factors give discrete colours and shapes
id = as.factor(id) # one line per individual via group = id
)
library(ggplot2)
ggplot(df) +
aes(x,y, color = Group, shape = Group, group = id)+
geom_point()+
geom_line()
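The important part is `group = id`, which draws one line per individual. For comparison, here is a base R sketch of the same idea: reshape the measurements into a matrix with one row per time point and one column per individual, then let `matplot()` draw one line per column. `plot_before_after()` is a hypothetical helper name, and colouring by `Group + 1` simply picks the first two palette colours.

```r
# Sample data from the question
x     <- c(0, 1, 0, 1, 0, 1, 0, 1)                        # 0 = before, 1 = after
y     <- c(1001, 1030, 900, 950, 1040, 1020, 1010, 1000)  # power output
Group <- c(0, 0, 0, 0, 1, 1, 1, 1)                        # 0 = control, 1 = experimental
id    <- c(1, 1, 2, 2, 3, 3, 4, 4)                        # one id per individual

# Rows = time point, columns = individual. Each cell holds exactly one
# measurement, so sum() just returns that value.
wide <- tapply(y, list(time = x, id = id), sum)

# Group of each individual (one value per column of `wide`)
group_of_id <- tapply(Group, id, function(g) g[1])

# Hypothetical helper: matplot() draws one line per column, i.e. one
# line per individual, much like group = id in ggplot2
plot_before_after <- function(wide, groups) {
  matplot(c(0, 1), wide, type = "b", pch = 19, lty = 1,
          col = groups + 1, xaxt = "n",
          xlab = "", ylab = "Power output")
  axis(1, at = c(0, 1), labels = c("Before", "After"))
}
```

Calling `plot_before_after(wide, group_of_id)` then draws the four connected before/after pairs.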

R: How to get a scatter plot from matrix data with discrete x axis

I'm pretty new at R and coding so I don't know how to explain it well on this site but I couldn't find a better forum to ask.
Basically I have a 6x6 matrix with each row being a discrete gene and each column being a sample.
I want the genes on the x-axis and the sample values on the y-axis, so that each gene has its 6 samples plotted above it at their respective values.
I have this matrix in Excel, and when I highlight it and plot it, it gives me exactly what I want.
But trying to replicate it in R gives me a giant lattice plot at best.
I've tried boxplot(), scatterchart(), plot(), and ggplot().
I'm assuming I have to alter my matrix but I don't know how.
This may help:
library(tidyverse)
gene <- c("a", "b", "c", "d", "e", "f")
x1 <- c(1,2,3,4,5,6)
x2 <- c(2,3,4,5,-6,7)
x3 <- c(3,4,5,6,7,8)
x4 <- c(4,-5,6,7,8,9)
x5 <- c(9,8,7,6,5,4)
x6 <- c(5,4,3,2,-1,0)
df <- data.frame(gene, x1, x2, x3, x4, x5, x6) #creates data.frame
as_tibble(df) # convenient way to check data.frame values and column format types
df <- df %>% pivot_longer(x1:x6, names_to = "sample", values_to = "observation") # here's the conversion to long format
as_tibble(df) #watch df change
#example plots
p1 <- ggplot(df, aes(x = gene, y = observation, color = sample)) + geom_point()
p1
p2 <- ggplot(df, aes(x = gene, y = observation, group = sample, color = sample)) +
geom_line()
p2
p3 <- p2 + geom_point()
p3
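If you would rather not load the tidyverse, the wide-to-long conversion can also be done in base R. This sketch reuses the example values from the answer above; `stack()` turns the sample columns into a `values` column and an `ind` column (the source column name), and `plot_genes()` is a hypothetical helper that draws a base R strip chart.

```r
# Example values from the answer above, one row per gene
wide_df <- data.frame(
  gene = c("a", "b", "c", "d", "e", "f"),
  x1 = c(1, 2, 3, 4, 5, 6),
  x2 = c(2, 3, 4, 5, -6, 7),
  x3 = c(3, 4, 5, 6, 7, 8),
  x4 = c(4, -5, 6, 7, 8, 9),
  x5 = c(9, 8, 7, 6, 5, 4),
  x6 = c(5, 4, 3, 2, -1, 0)
)

# stack() goes column by column: all six genes for x1, then for x2, ...
long <- stack(wide_df[, -1])
names(long) <- c("observation", "sample")

# So the gene labels repeat once per sample column
long$gene <- rep(wide_df$gene, times = ncol(wide_df) - 1)

# Hypothetical helper: one column of points per gene
plot_genes <- function(long) {
  stripchart(observation ~ gene, data = long, vertical = TRUE, pch = 19)
}
```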
This is easy to solve. If your matrix is 6x6 with one gene per row and one observation per column (thus six observations per gene), first convert it to long format (36 values); with such a simple layout this can be done with unlist(). Then plot those values against a vector of numbers representing the genes:
# Here I make some dummy data - a 6x6 matrix of random numbers:
df1 <- matrix(rnorm(36,0,1), ncol = 6)
# To help show which way the data unlists, and make the
# genes different, I add 4 to gene 1:
df1[1,] <- df1[1,] + 4
#### TL;DR - HERE IS THE SOLUTION ####
# Then plot it, using rep to make the x-axis data vector
plot(x = rep(1:6, times = 6), y = unlist(df1))
To improve readability, add axis labels:
# With axis labels
plot(x = rep(1:6, times = 6), y = unlist(df1),
xlab = 'Gene', ylab = 'Value')
You could also use ggplot with geom_point or geom_jitter, e.g.:
library(ggplot2)
ggplot() +
geom_jitter(mapping = aes(x = rep(1:6, times = 6), y = as.numeric(unlist(data.frame(df1)))))
Note that you can also create a "jitter" effect in base R by adding rnorm() noise to the x values; the last argument of rnorm() (the standard deviation) controls the amount of jittering:
plot(x = rep(1:6, times = 6) + rnorm(36, 0, 0.05), y = unlist(df1), xlab = 'Gene', ylab = 'Value')
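Why `rep(1:6, times = 6)` lines up with the flattened matrix: R stores a matrix column by column, so flattening it gives all six genes for sample 1, then all six for sample 2, and so on. `row()` returns the row index of every element in that same order, which makes the pairing explicit. A small check on a fresh random matrix like the one above (the seed is only there to make it reproducible):

```r
set.seed(1)
m <- matrix(rnorm(36, 0, 1), ncol = 6)  # rows = genes, columns = samples

# Row (gene) index of every element, in storage (column-major) order
gene_index <- as.vector(row(m))

# Identical to the hand-written x vector used above
same_as_rep <- identical(gene_index, rep(1:6, times = 6))
print(same_as_rep)

# Flattening keeps the same order, so element k of `values`
# belongs to gene gene_index[k]
values <- as.vector(m)
```

Plotting `plot(gene_index, values, xlab = 'Gene', ylab = 'Value')` therefore gives the same picture as the rep() version, without hard-coding the number of genes or samples.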

"Dotplot" visualisation with factors

I am not sure how to approach this. I want to create a "dotplot" style plot in R from a data frame of categorical variables (factors) such that for each column of the df I plot a column of dots, each coloured according to its factor level. For example,
my_df <- cbind(c('sheep','sheep','cow','cow','horse'),c('sheep','sheep','sheep','sheep',NA),c('sheep','cow','cow','cow','cow'))
I then want to end up with a 3 x 5 grid of dots, each coloured according to sheep/cow/horse (well, one missing because of the NA).
Do you mean something like this:
my_df <- cbind(c('sheep','sheep','cow','cow','horse'),
c('sheep','sheep','sheep','sheep',NA),
c('sheep','cow','cow','cow','cow'))
df <- data.frame(my_df) # convert the matrix to a data.frame
df$id <- row.names(df) # add a row id
library(reshape2)
melt_df <- melt(df, 'id') # reshape to long format: one row per cell
library(ggplot2) # now the plot
p <- ggplot(melt_df, aes(x = variable, fill = value))
p + geom_dotplot(stackgroups = TRUE, binwidth = 0.3, binpositions = "all")
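The same 3 x 5 grid can also be described directly by cell coordinates: each element of the matrix gets its column number, its row number and its category. This base R sketch builds that table with `col()` and `row()`; `plot_dot_grid()` is a hypothetical helper, and it relies on base graphics leaving points with colour `NA` undrawn, which is how the missing cell drops out.

```r
# Data from the question (NA rather than <NA>)
my_df <- cbind(c('sheep','sheep','cow','cow','horse'),
               c('sheep','sheep','sheep','sheep',NA),
               c('sheep','cow','cow','cow','cow'))

# One row per cell: its column, its row, and its category
cells <- data.frame(
  column = as.vector(col(my_df)),
  row    = as.vector(row(my_df)),
  animal = factor(as.vector(my_df))
)

# Hypothetical helper: one dot per cell, coloured by category
plot_dot_grid <- function(cells) {
  plot(cells$column, cells$row, col = as.integer(cells$animal),
       pch = 19, cex = 3, xlab = "Column", ylab = "Row",
       xlim = c(0.5, 3.5), ylim = c(0.5, 5.5))
}
```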

How to make one bar-chart from different data-frames with same format?

I have three different data frames that have the same format, and I cannot combine them because each one represents a different data source. I would like to show the percentage of one variable for the different data frames in one bar chart.
I can get a bar chart for column 1 of one data frame by using:
ggplot(baseline, aes(x = c1)) +
geom_bar(aes(y = (..count..)/sum(..count..)),fill="blue",colour="blue") +
geom_text(aes(y = ((..count..)/sum(..count..)), label=scales::percent((..count..)/sum(..count..))), stat = "count")
I want output similar to this plot (except that I am showing the percentage of each category), where race would be the names of the different data frames and factor would be the values of column 1 of the data frames.
I do not use ggplot2 but here is an illustration of how to accomplish what you want. It will be easiest to add a column to your data.frames indicating the source of each data.frame. Then calculate whatever metric you want, by source, then plot. Alternatively, you could calculate the metrics first, then combine the data.frames.
library(RColorBrewer)
library(data.table)
set.seed(1234)
make_data <- function() {
n <- sample(5:10, 1)
data.frame(id = rep(c("A", "B", "C"), each = n),
vals = c(rnorm(n, 5, 1), rnorm(n, 10, 1), rnorm(n, 15, 1)))
}
df1 <- make_data()
df2 <- make_data()
df3 <- make_data()
df4 <- make_data()
df1$src <- "source1"
df2$src <- "source2"
df3$src <- "source3"
df4$src <- "source4"
dat <- do.call(rbind, list(df1, df2, df3, df4))
dat <- as.data.table(dat)
res <- dat[ , mean(vals), by = list(id, src)][order(id)]
barplot(height = res$V1, col = rep(brewer.pal(4, "Set1"), 3))
EDIT
Here is the ggplot2 code provided by Sumedh:
library(ggplot2)
ggplot(res, aes(x = id, y = V1, fill = src)) +
geom_bar(stat = "identity", position = "dodge")
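The question asks for the percentage of each category rather than a mean. Here is a base R sketch of the same recipe (tag each data frame with its source, compute the metric per source, then plot). `baseline` is the name used in the question, but its contents here, and the `followup` and `other` data frames, are made up for illustration.

```r
# Three made-up data frames with the same format
baseline <- data.frame(c1 = c("a", "b", "b", "c"))
followup <- data.frame(c1 = c("a", "a", "b", "c", "c"))
other    <- data.frame(c1 = c("b", "c", "c"))
dfs <- list(baseline = baseline, followup = followup, other = other)

# Tag every row with the name of the data frame it came from, then stack
tagged <- Map(function(d, nm) transform(d, src = nm), dfs, names(dfs))
combined <- do.call(rbind, tagged)

# Share of each c1 category within each source (each column sums to 1)
pct <- prop.table(table(combined$c1, combined$src), margin = 2)
print(round(100 * pct, 1))
```

`barplot(100 * pct, beside = TRUE, legend.text = TRUE, ylab = "%")` then draws one group of bars per data frame, with one bar per category. Because the percentages are computed within each source, data frames of different sizes remain comparable.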
