Transform the data and create histogram with facets in R

I have 400 columns with dynamic names (t_namelist1_namelist2). There are 20 names in each namelist1 and namelist2. I have to create a histogram with facets 20 by 20, with labels -
along row - namelist1
along col - namelist2
Can someone please show how to transform the data using pivot_longer(), splitting on the _ that separates the three parts t, namelist1 and namelist2?
In the sample problem, I have a tibble with 4 columns, and I want to create 4 individual histograms in 2 by 2 facets with labels -
along row - a and b
along col - x and y
Thanks
library(tidyverse)
t_a_x <- rnorm(100)
t_b_x <- rnorm(100)
t_a_y <- rnorm(100)
t_b_y <- rnorm(100)
tbl <- tibble(t_a_x, t_a_y, t_b_x, t_b_y)
# create a histogram in 2 by 2 facets with labels -
# along row - a and b
# along col - x and y

The first chunk of code below rearranges the example data into a three-column data frame, with columns "ab", "xy", and the value ("t"); it does this by splitting each original column name at "_". Then you can plot and facet based on "xy" and "ab".
# Rearrange table by separating the column names by "_" using pivot_longer()
tbl_formatted <- tbl %>% pivot_longer(everything(),
names_to = c(".value", "ab", "xy"),
names_sep = c("_")
)
# Plot one histogram per ab/xy combination, with ab along rows and xy along columns
tbl_formatted %>% ggplot(aes(x = t)) +
geom_histogram(bins = 30) +
facet_grid(ab ~ xy)

This is a very basic version of the plot; you can customize it with colors and more.
To get it, you need to properly rearrange your dataframe:
library(tidyverse)
tbl <- tibble(value = c(t_a_x, t_b_x, t_a_y, t_b_y),
lab1 = rep(c("a", "b", "a", "b"), each = 100),
lab2 = rep(c("x", "x", "y", "y"), each = 100))
ggplot(tbl) +
geom_histogram(aes(value), binwidth = 0.5) +
facet_grid(lab1~lab2)
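For the full 400-column case, here is a minimal sketch of the same idea. It assumes every column is named t_<name1>_<name2> and that the names themselves contain no underscore. The name lists (g1 to g3, s1 to s4) and the random data are made up for illustration; an NA entry in names_to drops the leading "t" part of each column name.

```r
library(tidyr)
library(ggplot2)

set.seed(42)
# Hypothetical name lists; in the real data there would be 20 of each
names1 <- paste0("g", 1:3)
names2 <- paste0("s", 1:4)

# Build column names of the form t_<name1>_<name2>
col_names <- as.vector(outer(names1, names2,
                             function(a, b) paste("t", a, b, sep = "_")))
wide <- as.data.frame(matrix(rnorm(50 * length(col_names)), nrow = 50))
names(wide) <- col_names

# Split each column name at "_"; NA discards the first piece ("t")
long <- pivot_longer(wide, everything(),
                     names_to = c(NA, "namelist1", "namelist2"),
                     names_sep = "_",
                     values_to = "value")

# namelist1 labels the rows, namelist2 labels the columns
p <- ggplot(long, aes(x = value)) +
  geom_histogram(bins = 30) +
  facet_grid(namelist1 ~ namelist2)
p
```

With 20 names in each list this produces a 20 by 20 grid, so a large output device is probably needed for the panels to stay readable.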

Related

Converting basic r barplot to ggplot

I've currently got a barplot that has a few basic parameters. However, I'm looking to try and convert this into ggplot. The extra parameters don't matter too much; the main problem that I'm having is that I'm trying to plot the sum of various columns, but I'm unable to transpose it correctly as t(data) doesn't seem to work. Here's what I've got so far:
## Subset of indicators
indicators <- clean_data[c(8, 12, 14:23)]
## Get sum of columns
indicator_sums <- colSums(indicators, na.rm = TRUE)
### Transpose for ggplot
(empty)
## Make bar plot
barplot(indicator_sums, ylim=range(pretty(c(0, indicator_sums))), cex.axis=0.75,cex.lab=0.8, cex.names=0.7, col='magenta', las=2, ylab = 'Offences Recorded Using Indicator')
You may try
library(dplyr)
library(reshape2)
library(tibble) # provides rownames_to_column()
library(ggplot2)
dummy <- data.frame(
A = c(1:20),
B = rnorm(20, 10, 4),
C = runif(20, 19,30),
D = sample(10:40, 20, replace = TRUE)
)
barplot(colSums(dummy))
dummy %>%
colSums %>%
melt %>%
rownames_to_column %>%
ggplot(aes(x = rowname, y = value)) +
geom_col()
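A possible alternative that avoids reshape2: colSums() returns a named vector, so no transpose is needed; the names and sums can be placed directly into a two-column data frame. This is only a sketch. The data frame dummy and the column names indicator and total are made up for illustration, and the styling loosely mirrors the original barplot() call.

```r
library(ggplot2)

# Small illustrative data set with one missing value
dummy <- data.frame(A = 1:20,
                    B = rep(2, 20),
                    C = c(NA, rep(3, 19)))

# Named numeric vector of column sums
sums <- colSums(dummy, na.rm = TRUE)

# One row per original column: its name and its sum
sums_df <- data.frame(indicator = names(sums), total = unname(sums))

p <- ggplot(sums_df, aes(x = indicator, y = total)) +
  geom_col(fill = "magenta") +
  labs(x = NULL, y = "Offences Recorded Using Indicator") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
p
```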

R: How to get a scatter plot from matrix data with discrete x axis

I'm pretty new at R and coding so I don't know how to explain it well on this site but I couldn't find a better forum to ask.
Basically I have a 6x6 matrix with each row being a discrete gene and each column being a sample.
I want the genes as the x-axis and the y-axis being the values of the samples, so that each gene will have its 6 samples above at their respective value.
I have this matrix in Excel and when I highlight it and plot it it gives me exactly what I want.
But trying to reproduce it in R gives me a giant lattice plot at best.
I've tried boxplot(), scatterchart(), plot(), and ggplot().
I'm assuming I have to alter my matrix but I don't know how.
this may help:
library(tidyverse)
gene <- c("a", "b", "c", "d", "e", "f")
x1 <- c(1,2,3,4,5,6)
x2 <- c(2,3,4,5,-6,7)
x3 <- c(3,4,5,6,7,8)
x4 <- c(4,-5,6,7,8,9)
x5 <- c(9,8,7,6,5,4)
x6 <- c(5,4,3,2,-1,0)
df <- data.frame(gene, x1, x2, x3, x4, x5, x6) #creates data.frame
as_tibble(df) # convenient way to check data.frame values and column format types
df <- df %>% gather(sample, observation, 2:7) # here's the conversion to long format
as_tibble(df) #watch df change
#example plots
p1 <- ggplot(df, aes(x = gene, y = observation, color = sample)) + geom_point()
p1
p2 <- ggplot(df, aes(x = gene, y = observation, group = sample, color = sample)) +
geom_line()
p2
p3 <- p2 + geom_point()
p3
This is easy to solve. If your matrix is 6x6 with one gene per row and one observation per column (thus six observations per gene), you first need to put it in long format (36 values). With such a simple layout this can be done using unlist, and the result is then plotted against a vector of numbers representing the genes:
# Here I make some dummy data - a 6x6 matrix of random numbers:
df1 <- matrix(rnorm(36,0,1), ncol = 6)
# To help show which way the data unlists, and make the
# genes different, I add 4 to gene 1:
df1[1,] <- df1[1,] + 4
#### TL;DR - HERE IS THE SOLUTION ####
# Then plot it, using rep to make the x-axis data vector
plot(x = rep(1:6, times = 6), y = unlist(df1))
To improve the readability add axis labels:
# With axis labels
plot(x = rep(1:6, times = 6), y = unlist(df1),
xlab = 'Gene', ylab = 'Value')
You could also use ggplot with geom_point or geom_jitter, e.g.:
ggplot() +
geom_jitter(mapping = aes(x = rep(1:6, times = 6), y = as.numeric(unlist(data.frame(df1)))))
Note that you can also create a "jitter" effect in base R using rnorm() on the x values, tweaking the amount of jittering with the last argument of the rnorm() function:
plot(x = rep(1:6, times = 6) + rnorm(36, 0, 0.05), y = unlist(df1), xlab = 'Gene', ylab = 'Value')
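If the matrix has row and column names, the long format can keep them, so the x-axis shows gene names rather than numbers. A minimal sketch, assuming a 6x6 matrix m whose dimnames (gene1 to gene6, sample1 to sample6) are invented here for illustration:

```r
library(ggplot2)

set.seed(1)
# Illustrative matrix: one gene per row, one sample per column
m <- matrix(rnorm(36), nrow = 6,
            dimnames = list(paste0("gene", 1:6), paste0("sample", 1:6)))

# row(m) and col(m) give the row/column index of every cell, in the same
# column-by-column order that as.vector(m) uses
long <- data.frame(gene   = rownames(m)[row(m)],
                   sample = colnames(m)[col(m)],
                   value  = as.vector(m))

# Discrete x-axis: each gene has its six sample values stacked above it
p <- ggplot(long, aes(x = gene, y = value, colour = sample)) +
  geom_point()
p
```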

Using lapply to make a plot with factors from a list of data frames

I generated a data frame with two factors, an x-variable, and a y-variable:
set.seed(1)
abc.df <- data.frame(col1 = rep(c("a", "b", "c"), 1000), col2 = rep(1:4, 750),
col3 = rnorm(3000), col4 = rnorm(3000, 2))
names(abc.df) <- c("factor1", "factor2", "q", "value")
abc.df$factor1 <- as.factor(abc.df$factor1)
abc.df$factor2 <- as.factor(abc.df$factor2)
abc_list <- split(abc.df, abc.df$factor1)
I want to generate three plots, one for each letter (a, b, and c), with the number factors distinguished by color, but my attempt generated an error message:
par(mfrow = c(1, 3))
lapply(abc_list, function(x) {plot(abc_list[[x]]$q, abc_list[[x]]$value,
col = factor2)})
Error in abc_list[[x]] : invalid subscript type 'list'
How can I make the series of plots with the right syntax?
When you use lapply to loop through a list, the argument passed to the function is the element of the list, not its index, so you cannot use abc_list[[x]] to subset. Instead, just use x directly. For your case:
par(mfrow = c(1, 3))
lapply(abc_list, function(x) {plot(x$q, x$value, col = x$factor2)})
There is another convenient way to get this kind of plot using ggplot2 package with facet_grid, where you don't need to split your original data frame into lists but specify the factor1 as a variable to layout your plots:
library(ggplot2)
ggplot(abc.df, aes(x = q, y = value, col = factor2)) + geom_point() + facet_grid(.~factor1)
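If the name of each list element is also needed, for example as a plot title, one option is to loop over names(abc_list) and subset inside the function. The sketch below rebuilds the data from the question; the sizes vector is only there to show that lapply-style functions receive whole data frames, and it is not part of the original code.

```r
set.seed(1)
abc.df <- data.frame(factor1 = factor(rep(c("a", "b", "c"), 1000)),
                     factor2 = factor(rep(1:4, 750)),
                     q = rnorm(3000),
                     value = rnorm(3000, 2))
abc_list <- split(abc.df, abc.df$factor1)

# Each element handed to the function is a data frame, not an index
sizes <- vapply(abc_list, nrow, integer(1))

par(mfrow = c(1, 3))
# Loop over the names so each panel can be titled with its letter
invisible(lapply(names(abc_list), function(nm) {
  d <- abc_list[[nm]]
  plot(d$q, d$value, col = as.integer(d$factor2), main = nm,
       xlab = "q", ylab = "value")
}))
```

Wrapping the call in invisible() suppresses the list of NULLs that lapply would otherwise print.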

Multiplot of multiplots in ggplot2

I recently discovered the multiplot function from the Rmisc package to produce stacked plots using ggplot2 plots/objects. What I am trying to do now is to create a multiplot of multiplots. Unfortunately, unlike the ggplot function, multiplot does not produce objects, so my issue cannot be resolved by simply nesting multiplot.
I will create a dataframe to make my point clear. In my dataframe named df, I have 3 columns: period, group and value. A certain value is recorded for each of 3 groups over 10 periods. (Note: I don't use a seed number below despite the use of the sample function because the focus is not numerical, it is graphical)
# Create a data frame for illustration purposes
df <- data.frame(period = rep(1:10, 3),
group = rep(LETTERS[1:3], each = 10),
value = sample(100, 30, replace = TRUE))
I then add a fourth column to df, which is the exponential transformation of the value column.
df$exp.value = exp(df$value)
I would like to create stacked plots allowing me to compare the values in each group to their exponential counterparts.
# Split dataframe by group
df_split <- split(df, df$group)
# Plots of values in each group
plots <- lapply(df_split, function(i){
ggplot(data = i, aes(x = period, y = value)) + geom_line()
})
# Plots of exponentiated values in each group
plots_exp <- lapply(df_split, function(i){
ggplot(data = i, aes(x = period, y = exp.value)) + geom_line()
})
plots and plots_exp are both lists of 3 elements each containing ggplot objects. The first element of each list corresponds to group A, the second element corresponds to group B and the third element corresponds to group C.
In order to compare each group's values to the exponential values, I can use the multiplot function. Following is an example with group A:
multiplot(plots[[1]], plots_exp[[1]], cols = 1)
How can I create a grid which will include the multiplot above as well as the ones for groups B and C? As if the code included ... + facet_grid(. ~ group)?
We can use cowplot package:
library(cowplot)
plot_grid(plots[[1]], plots_exp[[1]],
plots[[2]], plots_exp[[2]],
plots[[3]], plots_exp[[3]],
labels = c("A", "A", "B", "B", "C", "C"),
ncol = 1, align = "v")
We can output to a pdf looping through plots and plots_exp list objects. Every page will contain 2 plots. This is a better option when we have a lot of groups:
pdf("myPlots.pdf")
for (i in seq_along(plots)) {
# print() draws each page explicitly, which also works when the script is sourced
print(plot_grid(plots[[i]], plots_exp[[i]], ncol = 1, align = "v"))
}
dev.off()
Another option is to prepare the data for ggplot and use facet as usual:
library(dplyr)
library(tidyr)
library(ggplot2)
gather(df, valueType, value, -c(group, period)) %>%
mutate(myGroup = paste(group, valueType)) %>%
ggplot(aes(period, value)) +
geom_line() +
facet_grid(myGroup ~ ., scales = "free_y")
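A variation on the faceting approach that comes closer to the requested layout puts the groups in columns and the two value types in rows. This is a sketch using tidyr's pivot_longer(); the column names valueType and y are chosen here for illustration, and the values are fixed instead of sampled so the result is reproducible.

```r
library(tidyr)
library(ggplot2)

df <- data.frame(period = rep(1:10, 3),
                 group = rep(LETTERS[1:3], each = 10),
                 value = rep(1:10, 3))
df$exp.value <- exp(df$value)

# Stack value and exp.value into one column, tagged by valueType
long <- pivot_longer(df, c(value, exp.value),
                     names_to = "valueType", values_to = "y")

# One column per group, one row per value type, with independent y scales per row
p <- ggplot(long, aes(x = period, y = y)) +
  geom_line() +
  facet_grid(valueType ~ group, scales = "free_y")
p
```

The rows appear in alphabetical order of valueType; converting that column to a factor with explicit levels would change the order.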

creating histogram bins in r

I have this code.
a = c("a", 1)
b = c("b",2)
c = c('c',3)
d = c('d',4)
e = c('e',5)
z = data.frame(a,b,c,d,e)
hist = hist(as.numeric(z[2,]))
I am trying to have a histogram such that the bins would be a,b,c,d,e
and the freq values would be 1,2,3,4,5.
However, it gives me an empty screen(no bins at all for histogram model)
You are plotting the factor levels of each column for row 2, which is in this case always 1.
When creating the data frame, add stringsAsFactors=FALSE to avoid converting the numbers to factors. This should work:
z = data.frame(a,b,c,d,e,stringsAsFactors=FALSE)
hist(as.numeric(z[2,]))
Perhaps this would work for you: it creates a data frame with the x elements being the letters a through e, and the y elements being the numbers 1 through 5. It then renders a histogram and tells ggplot not to perform any binning.
library(ggplot2)
tmp <- data.frame(x = letters[1:5], y = 1:5)
ggplot(tmp, aes(x = x, y = y)) + geom_histogram(stat = "identity")
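Two points from the answers above can be illustrated in base R. First, as.numeric() on a factor returns the internal level code, not the label. Second, when the bar heights are already known, a bar chart is arguably a closer fit than hist(). This is a sketch; the named vector counts is made up to match the question.

```r
# A factor holding the label "5" has a single level, so its code is 1
f <- factor("5")
as.numeric(f)                # 1, the level code
as.numeric(as.character(f))  # 5, the label converted to a number

# Known heights with named bars: barplot() draws one bar per name
counts <- c(a = 1, b = 2, c = 3, d = 4, e = 5)
mids <- barplot(counts, xlab = "Bin", ylab = "Frequency")
```

barplot() returns the x positions of the bar midpoints, which can be useful for adding labels later.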
