Generate heatmap in R (multiple independent variables)

There are a few similar questions but they are not asking what I am looking for.
I have gene expression data with multiple independent variables, and I want to visualize it as a heatmap in R. I have not been able to show all three variables together on the heatmap. Below is the example code:
species <- rep(c("st", "rt"), each = 18)
life <- rep(c("5d", "15d", "45d"), 2, each = 6)
concentration <- rep(c("c1", "c2", "c3"), 6, each = 2)
gene <- rep(c("gene1", "gene2"), 18)  # 18 repeats, so all vectors have length 36
response <- runif(36, -4, 4)
data1 <- data.frame(species, life, concentration, gene, response)
I am open to use any package. Please see below image which is from a different dataset. I wish to visualize my data in a similar manner.
[Image: example_data_visualized]
Many thanks in advance!

I am not sure which of the variables in your code correspond to which of the dimensions in your chart but, using the ggplot2 package, it's quite easy to do it:
library(ggplot2)
# One tile per life/concentration pair, one panel per species/gene pair
ggplot(data1, aes(x = factor(life, levels = c("5d", "15d", "45d")),
                  y = concentration,
                  fill = response)) +
  geom_tile() +
  facet_wrap(~species + gene, nrow = 1) +
  # guide = "none" hides the legend (guide = FALSE is deprecated)
  scale_fill_gradient(low = "red", high = "green", guide = "none") +
  scale_x_discrete(name = "life")
Of course, you can adjust the titles, labels, colours, etc. as needed.
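As one possible adjustment, here is a minimal sketch that puts species and gene on separate facet axes and uses a diverging colour scale centred at zero, which may suit a response that ranges from -4 to 4. The data are rebuilt with expand.grid() so the block runs on its own; the seed and the blue/white/red colours are my own assumptions, not part of the question.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed, only so the example is reproducible

# One row per species/life/concentration/gene combination (2 * 3 * 3 * 2 = 36)
data1 <- expand.grid(
  gene = c("gene1", "gene2"),
  concentration = c("c1", "c2", "c3"),
  life = c("5d", "15d", "45d"),
  species = c("st", "rt"),
  stringsAsFactors = FALSE
)
data1$response <- runif(nrow(data1), -4, 4)

# Keep the life stages in chronological rather than alphabetical order
data1$life <- factor(data1$life, levels = c("5d", "15d", "45d"))

p <- ggplot(data1, aes(x = life, y = concentration, fill = response)) +
  geom_tile(colour = "white") +
  facet_grid(gene ~ species) +
  # Diverging scale: negative values blue, zero white, positive values red
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       midpoint = 0, limits = c(-4, 4)) +
  labs(x = "life", y = "concentration", fill = "response")
p
```

With facet_grid(), genes become rows and species become columns, so each of the four variables gets its own visual channel (x, y, facet row, facet column) and the fill carries the response.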

R colour code plot by rownames for principal component analysis

I am attempting to complete a principal component analysis on a set of data containing columns of numeric data.
Assuming a dataset like this (in reality I have a pre-configured data frame; this one is for reproducibility):
v1 <- c(1,2,3,4,5,6,7)
v2 <- c(3,6,2,5,2,4,9)
v3 <- c(6,1,4,2,3,7,5)
dataset <-data.frame(v1,v2,v3)
row.names(dataset) <-c('New York', 'Seattle', 'Washington DC', 'Dallas', 'Chicago','Los Angeles','Minneapolis')
I have run my principal component analysis and successfully plotted it:
pca=prcomp(dataset,scale=TRUE)
plot(pca$x[,1], pca$x[,2],
xlab="First PC",ylab="Second PC")
text(pca$x[,1], pca$x[,2],cex=0.7,pos=3,col="darkgrey")
What I want to do however is colour code my data points based on the city, which is the row names of my dataset. I also want to use these cities (i.e. rownames) as labels.
I've tried the following, but neither attempt has worked:
## attempt 1 - I get row labels, but no chart
plot(pca$x[,1], pca$x[,2],col=rownames(dataset),pch=rownames(dataset),
xlab="First PC",ylab="Second PC")
text(pca$x[,1], pca$x[,2],labels=rownames(dataset),cex=0.7,pos=3,col="darkgrey")
## attempt 2
datasetwithcity = tibble::rownames_to_column(dataset, var = "city")  # rownames_to_column() comes from tibble
head(datasetwithcity)
OnlyCities=datasetwithcity[,1]
OnlyCities
# this didn't work:
City_Labels=as.numeric(OnlyCities)
head(City_Labels)
# gets city labels, but loses points and no colour
plot(pca$x[,1], pca$x[,2],col=City_Labels,pch=City_Labels,
xlab="First PC",ylab="Second PC")
text(pca$x[,1], pca$x[,2],labels=rownames(dataset),
cex=0.7,pos=3,col="darkgrey")
There are many different ways to do this.
In base R, you could do:
plot(pca$x[,1], pca$x[,2],
xlab="First PC",ylab="Second PC", col = seq(nrow(pca$x)),
xlim = c(-2.5, 2.5), ylim = c(-2, 2))
text(pca$x[,1], pca$x[,2],cex=0.7,pos=3,col="darkgrey")
text(x = pca$x[,1], y = pca$x[,2], labels = rownames(pca$x), pos = 1)
Personally, I think the resulting aesthetics are nicer (and easier to change to suit your needs) with ggplot. The code is also a bit easier to read once you get used to the syntax.
library(ggplot2)
df <- as.data.frame(pca$x)
df$city <- rownames(df)
ggplot(df, aes(PC1, PC2, color = city)) +
geom_point(size = 3) +
geom_text(aes(label = city) , vjust = 2) +
lims(x = c(-2.5, 2.5), y = c(-2, 2)) +
theme_bw() +
theme(legend.position = "none")
Created on 2021-10-28 by the reprex package (v2.0.0)
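As a side note on the attempts in the question: base graphics read a character vector passed to col as colour names, and as.numeric() cannot parse city names, so it returns NA. A minimal sketch of one way around this, assuming you want one palette colour per city, is to convert the names to integer codes through factor(). The variable names cities, bad_codes and city_codes are my own.

```r
# Data and PCA as in the question
v1 <- c(1, 2, 3, 4, 5, 6, 7)
v2 <- c(3, 6, 2, 5, 2, 4, 9)
v3 <- c(6, 1, 4, 2, 3, 7, 5)
dataset <- data.frame(v1, v2, v3)
row.names(dataset) <- c("New York", "Seattle", "Washington DC", "Dallas",
                        "Chicago", "Los Angeles", "Minneapolis")
pca <- prcomp(dataset, scale. = TRUE)

cities <- rownames(dataset)

# as.numeric() on text that is not a number gives NA (with a warning)
bad_codes <- suppressWarnings(as.numeric(cities))

# factor() assigns each distinct city a level; as.integer() returns its index
city_codes <- as.integer(factor(cities))

# Integer codes index into the current palette(), giving one colour per city
plot(pca$x[, 1], pca$x[, 2], col = city_codes, pch = 19,
     xlab = "First PC", ylab = "Second PC")
text(pca$x[, 1], pca$x[, 2], labels = cities, col = city_codes,
     cex = 0.7, pos = 3)
```

The default palette has only eight colours, so with more cities than that the colours would start to repeat; in that case a longer palette (for example from rainbow() or hcl.colors()) could be indexed with the same codes.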

How do I plot the Variable Importance of my trained rpart decision tree model?

I trained a model using rpart and I want to generate a plot displaying the Variable Importance for the variables it used for the decision tree, but I cannot figure out how.
I was able to extract the Variable Importance. I've tried ggplot, but none of the information shows up. I tried using the plot() function on it, but it only gives me a flat graph. I also tried plot.default, which is a little better but still not what I want.
Here's rpart model training:
argIDCART = rpart(Argument ~ .,
data = trainSparse,
method = "class")
Got the variable importance into a data frame.
argPlot <- as.data.frame(argIDCART$variable.importance)
Here is a section of what that prints:
argIDCART$variable.importance
noth 23.339346
humanitarian 16.584430
council 13.140252
law 11.347241
presid 11.231916
treati 9.945111
support 8.670958
I'd like to plot a graph that shows the variable/feature name and its numerical importance. I just can't get it to do that. It appears to only have one column. I tried separating them using the separate function, but can't do that either.
ggplot(argPlot, aes(x = "variable importance", y = "feature"))
Just prints blank.
The other plots look really bad.
plot.default(argPlot)
Looks like it plots the points, but doesn't put the variable name.
Since no reproducible example was provided, I based my answer on the kyphosis dataset that ships with rpart, using ggplot2 and a few other tidyverse packages for data manipulation.
library(rpart)
library(tidyverse)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
df <- data.frame(imp = fit$variable.importance)
# Move the variable names from row names into a column, ordered by importance
df2 <- df %>%
  tibble::rownames_to_column() %>%
  dplyr::rename("variable" = rowname) %>%
  dplyr::arrange(imp) %>%
  dplyr::mutate(variable = forcats::fct_inorder(variable))
# Bar chart
ggplot2::ggplot(df2) +
  geom_col(aes(x = variable, y = imp),
           col = "black", show.legend = FALSE) +
  coord_flip() +
  scale_fill_grey() +
  theme_bw()
# Lollipop chart
ggplot2::ggplot(df2) +
  geom_segment(aes(x = variable, y = 0, xend = variable, yend = imp),
               size = 1.5, alpha = 0.7) +
  geom_point(aes(x = variable, y = imp, col = variable),
             size = 4, show.legend = FALSE) +
  coord_flip() +
  theme_bw()
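To connect this back to the numbers in the question, here is a minimal sketch that types in the printed importance values by hand (so it runs without the original model) and builds an explicit two-column data frame. The names vi and vi_df are my own. The attempt in the question printed blank for two reasons: aes(x = "variable importance", y = "feature") maps constant strings rather than columns, and no geom layer was added.

```r
library(ggplot2)

# Values copied from the printout in the question
vi <- c(noth = 23.339346, humanitarian = 16.584430, council = 13.140252,
        law = 11.347241, presid = 11.231916, treati = 9.945111,
        support = 8.670958)

# as.data.frame(vi) keeps the names only as row names (a single column);
# building both columns explicitly makes them available to aes()
vi_df <- data.frame(feature = names(vi), importance = unname(vi))

p <- ggplot(vi_df, aes(x = reorder(feature, importance), y = importance)) +
  geom_col() +
  coord_flip() +
  labs(x = "feature", y = "variable importance")
p
```

With a fitted model, the same idea should apply by replacing the hand-typed vector with argIDCART$variable.importance, which is a named numeric vector.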
If you want to see the variable names, it may be best to use them as the labels on the x-axis.
plot(argIDCART$variable.importance, xlab="variable",
ylab="Importance", xaxt = "n", pch=20)
axis(1, at=1:7, labels=row.names(argIDCART))
(You may need to resize the window to see the labels properly.)
If you have a lot of variables, you may want to rotate the variable names so that they do not overlap.
par(mar=c(7,4,3,2))
plot(argIDCART$variable.importance, xlab="variable",
ylab="Importance", xaxt = "n", pch=20)
axis(1, at=1:7, labels=row.names(argIDCART), las=2)
Data
argIDCART = read.table(text="variable.importance
noth 23.339346
humanitarian 16.584430
council 13.140252
law 11.347241
presid 11.231916
treati 9.945111
support 8.670958",
header=TRUE)

How to create separate facets for different measurements with tidyverse?

I am a novice programmer looking to plot highly grouped variables. Specifically, I am trying to plot a variable that is grouped by 5 other variables. Below is example data similar to what I am working with.
library(ggplot2)
library(tibble)
set.seed(42)
mydf <- tibble(
grp = rep(c('A', 'B'), length.out = 32, each = 16),
sex = rep(c('M', 'F'), length.out = 32, each = 8),
cond = rep(c('Wet', 'Dry'), length.out = 32, each = 4),
measure = rep(c('Temperature', 'Volume'), length.out = 32, each = 2),
kind = rep(c('Experimental', 'Control'), length.out = 32, each = 1),
value = rnorm(32) * 100,
)
ggplot(mydf, aes(x = grp, y = value, col = cond)) +
geom_point() +
facet_wrap(sex~measure + kind)
However, the output is quite messy. Would it be possible to create separate faceted plots for each measurement? What would be a proper way to graph this type of data?
Thank you
For ease of comparison, I would facet on no more than two variables. I would also use facet_grid() rather than facet_wrap() in such cases, as I think it's just easier to keep track of the different facet dimensions if they are on separate axes.
In your case, you want to distinguish measurements across 5 binary variables: grp, sex, cond, measure, and kind.
With "grp" on the x-axis, "cond" distinguished by colour, and two of the remaining three ("sex" and "measure") on facets, we'll need to introduce another aesthetic parameter to distinguish the last variable ("kind").
In this case, since there aren't too many points to plot, I suggest shape.
ggplot(mydf, aes(x = grp, y = value,
color = cond,
shape = kind)) +
geom_point(size = 5, stroke = 2) +
facet_grid(sex~measure) +
scale_shape_manual(values = c("Control" = 4, "Experimental" = 16),
breaks = c("Experimental", "Control"))
The use of a filled shape vs an un-filled shape makes Experimental points visually distinct from Control points. You can check out other shape options here.
Note that if there are many different values in your grouping variables (e.g. 5 categories along the x-axis, 6 different colours, 20 facet combinations, etc.), or many points within each facet, the plot will look very busy, and you may want to split into separate plots rather than keep everything together.
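If you do decide to split by measurement, one way to sketch it is to split the data frame on "measure" and build one plot per piece. This is only an illustration under my own assumptions: it uses a plain data.frame instead of a tibble, and the names by_measure and plots are made up for the example.

```r
library(ggplot2)

set.seed(42)
mydf <- data.frame(
  grp     = rep(c("A", "B"), each = 16, length.out = 32),
  sex     = rep(c("M", "F"), each = 8, length.out = 32),
  cond    = rep(c("Wet", "Dry"), each = 4, length.out = 32),
  measure = rep(c("Temperature", "Volume"), each = 2, length.out = 32),
  kind    = rep(c("Experimental", "Control"), each = 1, length.out = 32),
  value   = rnorm(32) * 100
)

# One data frame per measurement, then one plot per data frame
by_measure <- split(mydf, mydf$measure)
plots <- lapply(names(by_measure), function(m) {
  ggplot(by_measure[[m]],
         aes(x = grp, y = value, colour = cond, shape = kind)) +
    geom_point(size = 3) +
    facet_grid(sex ~ .) +
    ggtitle(m)
})
names(plots) <- names(by_measure)

# Show one of the plots; each has its own y-axis range
plots[["Volume"]]
```

Separate plots let each measurement have its own y-axis scale, which can matter when the units differ (temperature versus volume). A similar effect within a single figure is available through the scales = "free_y" argument of facet_grid() or facet_wrap().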

How to add a trendline to a boxplot of counts(y axis) and ids(x axis) when x axis is ordered

df1 <- data.frame(a=c(1,4,7),
b=c(3, 5, 6),
c=c(1, 1, 4),
d=c(2 ,6 ,3))
df2<-data.frame(id=c("a","f","f","b","b","c","c","c","d","d"),
var=c(12,20,15,18,10,30,5,8,5,5))
mediorder <- with(df2, reorder(id, -var, median))
boxplot(var~mediorder, data = df2)
fc = levels(as.factor(mediorder))
ndf1= df1[,intersect(fc, colnames(df1))]
ln<-lm( #confused here
boxplot(ndf1)
abline(ln)
I have the above boxplot (ndf1) with an x-axis ordered according to medians from another data frame, and I would like to add a trendline to it.
I am confused since it doesn't have an x and y variable to refer to, just columns with counts. Also the ordering is causing me problems.
EDITED for clarification...
I am building on the question here: How to match an ordered list (e.g., levels(as.factor(x)) ) to another dataframe in which only some columns match?
All I would like to do is fit a trend line to ndf1
Something like this should do. It's fairly easy using ggplot2. However, your data and question are a bit confusing: for example, some factors have only a single distinct value (a has one data point, d has two identical ones). Is this what you want?
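# Use the median-based ordering for the x-axis
df2$id <- factor(df2$id, levels = levels(mediorder))
library(ggplot2)
ggplot(data = df2, aes(x = id, y = var)) +
  geom_boxplot() +
  # group = 1 fits a single line across all boxes instead of one per id
  geom_smooth(method = "lm", aes(group = 1), se = FALSE)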
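For comparison, here is a rough base R sketch of the same idea, applied to df2 rather than ndf1. boxplot() draws the boxes at x positions 1, 2, 3, ... in the order of the factor levels, so regressing on the integer level codes gives a line that abline() can draw on top. The column name pos and the object fit are my own, and treating ordered categories as equally spaced numbers is an assumption you may or may not be comfortable with.

```r
df2 <- data.frame(id  = c("a", "f", "f", "b", "b", "c", "c", "c", "d", "d"),
                  var = c(12, 20, 15, 18, 10, 30, 5, 8, 5, 5))

# Order the ids by decreasing median of var, as in the question
mediorder <- with(df2, reorder(id, -var, median))

# Integer position of each observation's box on the x-axis
df2$pos <- as.integer(mediorder)

boxplot(var ~ mediorder, data = df2)

# Simple linear trend across box positions
fit <- lm(var ~ pos, data = df2)
abline(fit, col = "red")
```

Because the boxes are sorted by decreasing median, the fitted slope comes out negative here.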

Subsetting ggplot2 graph using facet_grid()

I am trying to get individual trajectories and fitted trajectory per group across repeated measurements.
Toy data below:
set.seed(124)
ID <- factor(rep(1:21, times = 3))
Group <- rep(c("A", "B", "C"), times = 21)
score <- rnorm(63, 25, 3)
session <- rep(c("s1","s2", "s3"), each = 21)
df <- data.frame(ID, Group, session, score)
Now plot trajectories across the three repeated measures for each individual and derive a fitted slope for the whole sample.
c <- ggplot(df, aes(x = session, y = score, group = ID, colour = ID)) +
geom_smooth(method = "lm", se = FALSE) +
stat_smooth(aes(group = 1), se = FALSE, method = "lm", color = "red")
c
Now I want to break this plot up into three plots by group. There is the long way, where you subset the data frame by group and draw three separate graphs. However, I would like to do it all in one graph, same as above, except separated by group. I tried:
c + facet_grid(.~Group)
But it comes out blank. Something is missing here and I don't know what it is.
