pheatmap column cluster annotation - r

I have the following code for generating a heat map with the pheatmap package. I can create the heat map, but I am having a very hard time annotating the columns. I have 9 cell types, and each cell type has 3 replicates.
Here is the data. I have labeled the samples 01_01, 01_02, 01_03 through 09_01, 09_02, 09_03.
I would like the columns annotated as "A", "B", "C", "D", "E", "F", "G", "H", "I" (one letter per cell type). [Image: pheatmap with row and column clustering]
I tried my best to annotate the column clusters but wasn't successful. I would really appreciate it if someone could explain how to customize the column annotation.
Data: my data file
library(readxl)
library(pheatmap)
library(tibble)
library(dplyr)
# read the file and use the Metabolite column as row names
# (the original piped version was never assigned back to test)
test <- read_excel("~test.xlsx", 1) %>%
  as.data.frame() %>%
  column_to_rownames("Metabolite")
# margins= removed: it is a gplots::heatmap.2 argument, not a pheatmap one
pheatmap(test,
         cluster_rows = TRUE, cluster_cols = TRUE,
         breaks = NA, scale = "none", legend = TRUE,
         color = colorRampPalette(c("navy", "white", "firebrick3"))(50),
         fontsize_row = 4, fontsize_col = 4,
         cellwidth = 5, cellheight = 5)
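A minimal sketch of one common approach, assuming the column names of test are exactly the sample IDs 01_01 ... 09_03: pheatmap's annotation_col argument takes a data frame whose row names match the heat map's column names, with one column per annotation track. The random matrix is a hypothetical stand-in for the Excel data, and the track name CellType is an arbitrary choice.

```r
# Hypothetical stand-in for the Excel data: 20 metabolites x 27 samples,
# with columns named 01_01, 01_02, 01_03, ..., 09_03
set.seed(1)
sample_ids <- sprintf("%02d_%02d", rep(1:9, each = 3), rep(1:3, times = 9))
test <- matrix(rnorm(20 * 27), nrow = 20,
               dimnames = list(paste0("metabolite_", 1:20), sample_ids))

# Map the cell-type prefix ("01" ... "09") to the letters A ... I
cell_type <- LETTERS[as.integer(substr(colnames(test), 1, 2))]

# annotation_col: one row per heat-map column; row names must match colnames(test)
annotation_col <- data.frame(
  CellType = factor(cell_type, levels = LETTERS[1:9]),
  row.names = colnames(test)
)

# Draw only if pheatmap is installed
if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(test,
                     cluster_rows = TRUE, cluster_cols = TRUE,
                     annotation_col = annotation_col,
                     color = colorRampPalette(c("navy", "white", "firebrick3"))(50),
                     fontsize_row = 4, fontsize_col = 4)
}
```

Because the annotation is matched by row name rather than position, it stays attached to the right samples after column clustering reorders them. If the goal is instead to replace the column labels themselves, pheatmap also has a labels_col argument (e.g. labels_col = cell_type).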

Related

VegaLite.jl ignoring order of rows

I am trying to plot a sorted bar chart in Julia using VegaLite.jl on the DataFrame sorted by the chosen key, but the bar chart completely ignores the order of the DataFrame. I've also tried to specify order on y in different ways, but that resulted in nothing (neither an error nor any effect on the sorting order).
Toy data and plotting:
using DataFrames
using VegaLite
df2 = DataFrame(weight=rand(10), col_name=["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"])
sort!(df2, [:weight], rev=true)
df2 |> @vlplot() + @vlplot(:bar, x=:col_name, y=:weight) + @vlplot(mark={:rule, strokeDash=[2,2], size=2}, y={datum=0.5})
How to specify the order of the bar chart so that it actually works?

Renaming values in R after binning with cut()

I had a column of numeric values that I binned using cut(). Each value has now been replaced by the interval it fell into, written in bracket notation, e.g. [0,140] meaning between 0 and 140 inclusive.
The problem is that these labels are lengthy, and for larger values they switch to exponent notation, which makes them even longer and makes the graph illegible. According to typeof() the column is still stored as integers, but I can't figure out how to rename the values the way I would with a factor. When I tried factor() with the labels parameter, I got an error saying that sort.list() only works on atomic vectors.
As an example, here's essentially what I tried on my dataset, except with the built-in iris dataset:
data(iris)
iris[1] <- cut(iris[[1]], 10, include.lowest=TRUE)
iris[1] <- factor(iris[1], labels = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j"))
It returns the error:
Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?
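A minimal sketch of two ways around this, using the same iris example. The error arises because iris[1] is a one-column data frame (a list), which factor() cannot sort; iris[[1]] extracts the underlying vector. cut() also accepts a labels argument, so short names can be assigned while binning. The name sl_bins is illustrative only.

```r
data(iris)

# Option 1: give short labels directly in cut(); 10 breaks -> 10 labels
sl_bins <- cut(iris$Sepal.Length, breaks = 10, include.lowest = TRUE,
               labels = letters[1:10])

# Option 2: bin first, then rename the levels of the resulting factor.
# iris[[1]] is a vector, whereas iris[1] is a data frame (a list),
# which is what triggers the "'x' must be atomic" error in factor().
iris$Sepal.Length <- cut(iris[[1]], 10, include.lowest = TRUE)
levels(iris$Sepal.Length) <- letters[1:10]
```

typeof() reports "integer" for any factor because the levels are stored as integer codes, which is why the column looked like integers.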

How to select columns from R dataframe?

I know we can extract specific columns from an R data frame with df[, c("A","B","E")]. However, there are so many columns I want to pick out that I cannot type them one by one. I have a data frame B that contains all the column headers I want to extract from data frame A. How can I extract columns from data frame A based on the headers in B?
I tried A[, B[, 1]] but got the error "incorrect number of dimensions". I got the same error when I tried to print B[, 1].
The output of dput(B) is:
c("A", "B", "C", "D", "E", "F", "G")
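Judging from the dput() output, B is a plain character vector rather than a data frame, which would explain the "incorrect number of dimensions" error: a vector has no columns, so B[, 1] fails. A minimal sketch, with hypothetical stand-ins for A and B, indexes A with B directly:

```r
# Hypothetical stand-ins: data frame A with columns A..J, and B as shown by dput(B)
A <- as.data.frame(matrix(1:20, nrow = 2, dimnames = list(NULL, LETTERS[1:10])))
B <- c("A", "B", "C", "D", "E", "F", "G")

# B is a character vector, so use it directly as the column index
selected <- A[, B]

# If some names in B might be absent from A, keep only the ones that exist;
# drop = FALSE keeps a data frame even when only one column matches
selected_safe <- A[, intersect(B, names(A)), drop = FALSE]
```

With dplyr, the equivalent would be select(A, all_of(B)).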

R - show only levels used in a subset of data frame

I have a rather large data frame with a factor that has a lot of levels (more than 4,000). I have another column in the same data frame that I'm using as a reference, and what I'd like to find is a subset of the levels whenever this reference column is NA.
The first step I'm using is subsetrows <- which(is.na(mydata$reference)) but after that I'm stuck. I want something like levels(mydata[subsetrows,mydata$factor]) but unfortunately, this command shows me all the levels and not just the ones existing in subsetrows. I suppose I could create a new vector outside of my data frame of only my subset rows and then drop any unused levels, but is there any easier/cleaner way to do this, possibly without copying my data outside the data frame?
As an example of what I want returned, if my data frame has factor levels from A to Z, but in my subset only P, R and Y appear, I want something that returns the levels P, R and Y.
You can certainly accomplish this with base functions. But my personal preference is to use dplyr with chained operations such as this:
library(dplyr)
d %>%
filter(is.na(ref)) %>%
select(field) %>%
distinct()
Example data:
d <- data.frame(
field = c("A", "B", "C", "A", "B", "C"),
ref = c(NA, "a", "b", NA, "c", NA)
)
I modified a suggestion from Marat in the comments to use unique(), which returns the correct levels.
Solution:
subsetrows <- which(is.na(mydata$reference))
unique(as.character(mydata$factor[subsetrows]))
While I like learning new packages and functions, this solution seems better at this point since it's more compact and easier for me to understand if I need to revisit this code at some distant point in the future.
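Another base R option, sketched here with hypothetical example data, is droplevels(), which removes the levels that no longer occur after subsetting. Unlike unique(as.character(...)), it keeps the result in the factor's level order rather than order of appearance.

```r
# Hypothetical example data: a factor with levels A-Z and a reference column
mydata <- data.frame(
  factor = factor(c("Y", "P", "R", "B", "C"), levels = LETTERS),
  reference = c(NA, NA, NA, 1, 2)
)

subsetrows <- which(is.na(mydata$reference))

# droplevels() discards levels absent from the subset
used_levels <- levels(droplevels(mydata$factor[subsetrows]))
print(used_levels)
```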

How do I generate a boxplot using the original data order (not alphabetical)?

I am new to R. I've made a boxplot of my data, but R is currently sorting the factor levels alphabetically. How do I keep the original order of my data? This is my code:
boxplot(MS~Code,data=Input)
I have 40 variables that I want to boxplot in the same order as the original data frame lists them. I've read that I may be able to set sort.names=FALSE to keep the original order, but I don't understand where that piece of code would go.
Is there a way to redefine my Input before it goes into boxplot?
Thank you.
Re-create the factor with its levels in the order you want (line 3 below):
data(InsectSprays)
data <- InsectSprays
data$spray <- factor(data$spray, c("B", "C", "D", "E", "F", "G", "A"))
boxplot(count ~ spray, data = data, col = "lightgray")
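The answer above is 98% of the way there. To keep the order in which the codes first appear in the data frame, without typing the levels by hand, pass levels = unique(Code):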
set.seed(1)
# original order is E - A
Input <- data.frame(Code=rep(rev(LETTERS[1:5]),each=5),
MS=rnorm(25,sample(1:5,5)))
boxplot(MS~Code,data=Input) # plots alphabetically
Input$Code <- with(Input,factor(Code,levels=unique(Code)))
boxplot(MS~Code,data=Input) # plots in original order
