PCA analysis using the FactoMineR and factoextra packages in R

library(factoextra)
library(FactoMineR)
res.pca = PCA(clin.oc[,1:14], graph = TRUE)
scree = fviz_screeplot(res.pca, ncp=5, addlabels=TRUE, size=0.5, title='Scree plot')
loadings = plot(res.pca, choix = "var", title='Loadings')
n = fviz_contrib(res.pca, choice = "var", axes = 1, top = 5, title='Variable contributions')
scores = fviz_pca_ind(res.pca, label="none", habillage=as.factor(clin), title='scores')
ggarrange(scree, loadings, n, scores)
The ggarrange function won't work; R says it is not found.

I think the error is already in res.pca, because your 'identifier' variable is of type character. Take that column out (a guess: PCA(clin.oc[,2:14], …)).
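Separately from the character column, "not found" for ggarrange usually means the package that exports it has not been attached: ggarrange() comes from ggpubr, not from factoextra or FactoMineR. The following is a minimal sketch under stated assumptions, not the asker's actual pipeline: it uses the decathlon2 example data shipped with factoextra (the asker's clin.oc is not available), restricts the PCA to the ten numeric event columns, and uses fviz_pca_var() for the loadings panel.

```r
library(FactoMineR)
library(factoextra)
library(ggpubr)  # ggarrange() is exported by ggpubr

# Assumption: decathlon2 stands in for the asker's clin.oc data.
data("decathlon2", package = "factoextra")

# Only numeric columns go into the PCA (columns 1:10 are the events).
res.pca <- PCA(decathlon2[, 1:10], graph = FALSE)

scree    <- fviz_screeplot(res.pca, ncp = 5, addlabels = TRUE)
loadings <- fviz_pca_var(res.pca)
contrib  <- fviz_contrib(res.pca, choice = "var", axes = 1, top = 5)
scores   <- fviz_pca_ind(res.pca, label = "none",
                         habillage = decathlon2$Competition)

# Arrange the four ggplot objects on a 2 x 2 grid.
arranged <- ggarrange(scree, loadings, contrib, scores, ncol = 2, nrow = 2)
arranged
```

Setting graph = FALSE keeps PCA() from drawing its own plots, so only the arranged grid is shown.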

Related

Heatmap of Gene intensity values in R

I have data that look like this:
Gene    HBEC-KT-01   HBEC-KT-02   HBEC-KT-03   HBEC-KT-04   HBEC-KT-05   Primarycells-02  Primarycells-03  Primarycells-04  Primarycells-05
BPIFB1  15726000000  15294000000  15294000000  14741000000  22427000000  87308000000      2.00E+11         1.04E+11         1.51E+11
LCN2    18040000000  26444000000  28869000000  30337000000  10966000000  62388000000      54007000000      56797000000      38414000000
C3      2.52E+11     2.26E+11     1.80E+11     1.80E+11     1.78E+11     46480000000      1.16E+11         69398000000      78766000000
MUC5AC  15647000     8353200      12617000     12221000     29908000     40893000000      79830000000      28130000000      69147000000
MUC5B   965190000    693910000    779970000    716110000    1479700000   38979000000      90175000000      41764000000      50535000000
ANXA2   14705000000  18721000000  21592000000  18904000000  22657000000  28163000000      24282000000      21708000000      16528000000
I want to make a heatmap like the following using R. I am following a paper, which states: "Heat maps were generated with the ‘pheatmap’ package76, where correlation clustering distance row was applied". Here is their heatmap.
I want the same result and am trying to build it by following tutorials, but I am new to R.
Here is my code.
library(pheatmap)
library(RColorBrewer)  # provides brewer.pal()
df <- read.delim("R.txt", header = TRUE, row.names = "Gene")
df_matrix <- data.matrix(df)
pheatmap(df_matrix,
         main = "Heatmap of Extracellular Genes",
         color = colorRampPalette(rev(brewer.pal(n = 10, name = "RdYlBu")))(10),
         cluster_cols = FALSE,
         show_rownames = FALSE,
         fontsize_col = 10,
         cellwidth = 40)
This is what I get.
When I try clustering, I get this error:
pheatmap(
  mat = df_matrix,
  scale = "row",
  cluster_column = F,
  show_rownames = TRUE,
  drop_levels = TRUE,
  fontsize = 5,
  clustering_method = "complete",
  main = "Hierachical Cluster Analysis"
)
Error in hclust(d, method = method) :
NA/NaN/Inf in foreign function call (arg 10)
Can someone help me with the code?
You can normalize the data with scale to achieve a more uniform coloring. Here, the mean expression is set to 0 for each sample. Genes expressed below average have a negative z-score:
library(tidyverse)
library(pheatmap)
data <- tribble(
~Gene, ~`HBEC-KT-01`, ~`HBEC-KT-02`, ~`HBEC-KT-03`, ~`HBEC-KT-04`, ~`HBEC-KT-05`, ~`Primarycells-03`, ~`Primarycells-04`, ~`Primarycells-05`,
"BPIFB1", 1.5726e+10, 1.5294e+10, 1.5294e+10, 1.4741e+10, 2.2427e+10, 2e+11, 1.04e+11, 1.51e+11,
"LCN2", 1.804e+10, 2.6444e+10, 2.8869e+10, 3.0337e+10, 1.0966e+10, 5.4007e+10, 5.6797e+10, 3.8414e+10,
"C3", 2.52e+11, 2.26e+11, 1.8e+11, 1.8e+11, 1.78e+11, 1.16e+11, 6.9398e+10, 7.8766e+10,
"MUC5AC", 15647000, 8353200, 12617000, 12221000, 29908000, 7.983e+10, 2.813e+10, 6.9147e+10,
"MUC5B", 965190000, 693910000, 779970000, 716110000, 1479700000, 9.0175e+10, 4.1764e+10, 5.0535e+10,
"ANXA2", 1.4705e+10, 1.8721e+10, 2.1592e+10, 1.8904e+10, 2.2657e+10, 2.4282e+10, 2.1708e+10, 1.6528e+10
)
data %>%
  mutate(across(where(is.numeric), scale)) %>%
  column_to_rownames("Gene") %>%
  pheatmap(
    scale = "row",
    cluster_cols = FALSE,  # pheatmap's argument is cluster_cols, not cluster_column
    show_rownames = FALSE,
    show_colnames = TRUE,
    treeheight_col = 0,
    drop_levels = TRUE,
    fontsize = 5,
    clustering_method = "complete",
    main = "Hierarchical Cluster Analysis (z-score)"
  )
Created on 2021-09-26 by the reprex package (v2.0.1)
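The paper quoted in the question mentions correlation distance on rows, which pheatmap exposes as clustering_distance_rows = "correlation". The hclust error generally appears when the distance matrix contains NA, NaN or Inf, for instance when a row is constant (row scaling then divides by zero) or already holds missing values. Here is a minimal sketch, assuming a log10 transform is acceptable for these intensities (the question does not say so) and using only three genes and four samples copied from the question:

```r
library(pheatmap)

# Subset of the intensities from the question (three genes, four samples).
mat <- matrix(
  c(1.5726e10, 1.5294e10, 2.00e11,  1.04e11,
    2.52e11,   2.26e11,   1.16e11,  6.9398e10,
    1.5647e7,  8.3532e6,  7.983e10, 2.813e10),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("BPIFB1", "C3", "MUC5AC"),
    c("HBEC-KT-01", "HBEC-KT-02", "Primarycells-03", "Primarycells-04")
  )
)

# Assumption: compress the dynamic range with log10 before scaling.
log_mat <- log10(mat)

# Keep only rows that are fully finite and not constant, so that row
# scaling and the correlation distance cannot produce NA/NaN/Inf.
keep <- apply(log_mat, 1, function(x) all(is.finite(x)) && sd(x) > 0)
log_mat <- log_mat[keep, , drop = FALSE]

pheatmap(
  log_mat,
  scale = "row",
  clustering_distance_rows = "correlation",
  cluster_cols = FALSE,
  main = "Row-scaled log10 intensities"
)
```

If rows are dropped by the filter on real data, inspecting them (for example with which(!keep)) usually shows where the non-finite values come from.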

Is there a way to specify number of kmeans clusters to return in heatmaply

I would like to return a specific number of clusters for my interactive heatmap from heatmaply like I can do with pheatmap and the kmeans_k = argument. Is there a way to do this with heatmaply?
If I have a large matrix and do not define the number of clusters to return with heatmaply, it takes too long to calculate the heatmap or I will get the error: 'vector memory exhausted(limit reached?)'.
library(pheatmap)
data(mtcars)
mat <- as.matrix(mtcars)
pheatmap(
  mtcars,
  border_color = "grey20",
  main = "",
  show_rownames = TRUE,
  show_colnames = TRUE,
  kmeans_k = 30,
  cluster_rows = FALSE,
  cluster_cols = FALSE
)
You want to use the k_col and/or k_row arguments.
You can see examples in the vignette, but just a simple example:
library("heatmaply")
heatmaply(mtcars, k_col = 2, k_row = 4)
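If the goal is to shrink a large matrix before plotting, in the spirit of pheatmap's kmeans_k, a different approach from k_row is to run kmeans() yourself and pass only the cluster centers to heatmaply. This is a sketch under assumptions: the columns are standardised first, and 5 clusters is an arbitrary choice for illustration.

```r
library(heatmaply)

# Assumption: standardise columns so no single variable dominates k-means.
mat <- scale(as.matrix(mtcars))

set.seed(1)                      # k-means starts from random centers
km <- kmeans(mat, centers = 5)   # 5 is an arbitrary illustrative choice

# One row per cluster; label each row with its cluster size.
centers <- km$centers
rownames(centers) <- paste0("cluster ", seq_len(nrow(centers)),
                            " (n=", km$size, ")")

# The heatmap now has 5 rows instead of one per observation.
p <- heatmaply(centers)
p
```

Because only the centers are plotted, the dendrogram and distance computations scale with the number of clusters rather than the number of rows.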

Not able to get the proper plots in plot window. One plot missing. May someone tell why?

I am using the par() function to partition the plot window to accommodate three plots, but one plot is always missing. I cannot figure out why this is happening. Below is my R script function:
answer_3<-function(){
  set.seed(1)
  a <- runif(100)
  set.seed(1)
  b <- runif(100)
  A<- as.data.frame(A)
  B <- as.data.frame(b)
  AB<-cbind(A,B)
  AB_pear<-cor(x = AB,y=NULL,method = "pearson")
  AB_spear<-cor(x = AB,y=NULL,method = "spearman")
  AB_kend<-cor(x = AB,y=NULL,method = "kendall")
  par(mfcol=c(3,1), mar=c(1,1,1,1))
  #sp <-plot(AB[,1],AB[,2],main = "Plot of random uniform distribution",xlab = "uniformly distributed points_a",ylab = "uniformly distributed points_b")
  library(corrplot)
  Pear_p<-corrplot(corr = AB_pear,method = "circle",title = "Pearson correlogram")
  Spear_p<-corrplot(corr = AB_spear,method = "circle",title = "Spearman correlogram")
  Kend_p<-corrplot(corr = AB_kendall,method = "circle",title = "Kendall correlogram")
  c(Pear_p,Spear_p,Kend_p)
}
The Kend_p is always missing from the plot window. What am I doing wrong?
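Reading the script, one likely culprit is the third corrplot() call: it passes AB_kendall, whereas the matrix was stored as AB_kend, so R would stop with an "object not found" error before the third panel is drawn. The line A <- as.data.frame(A) has the same kind of problem, since only lowercase a is defined inside the function. A corrected sketch follows; the different seed for b and the mar value passed to corrplot() (to leave room for the titles) are my assumptions, not part of the original script.

```r
library(corrplot)

answer_3 <- function() {
  set.seed(1)
  a <- runif(100)
  set.seed(2)  # assumption: a different seed, so a and b are not identical
  b <- runif(100)
  AB <- data.frame(a = a, b = b)

  # One correlation matrix per method, stored under consistent names.
  methods <- c(Pearson = "pearson", Spearman = "spearman", Kendall = "kendall")
  cors <- lapply(methods, function(m) cor(AB, method = m))

  # Three stacked panels; restore the previous layout on exit.
  old_par <- par(mfcol = c(3, 1))
  on.exit(par(old_par))

  for (nm in names(cors)) {
    corrplot(cors[[nm]], method = "circle",
             title = paste(nm, "correlogram"),
             mar = c(0, 0, 2, 0))  # top margin so the title is not clipped
  }
  invisible(cors)
}

cors <- answer_3()
```

Looping over a named list avoids the mismatch between the name a matrix is stored under and the name passed to the plotting call.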

K-Means plot resulting unreal points in R

I'm trying to plot a K-Means cluster to analyze different categories of products based on their inventory average and sold quantity.
All values are non-negative and of the same measurement unit.
I don't know what I did wrong, but the results contain points with negative values. In fact, I believe none of the points in the plot are actual points from my data.
Here is my code:
reduced_dataset = dataset[1:20, 4:5]
# Using the elbow method to find the optimal number of clusters
wcss = vector()
for (i in 1:10) wcss[i] = sum(kmeans(reduced_dataset, i)$withinss)
plot(1:10,
     wcss,
     type = 'b',
     main = paste('The Elbow Method'),
     xlab = 'Number of clusters',
     ylab = 'WCSS')
# As a result, number of clusters should be 2
# Fitting K-Means to the dataset
kmeans = kmeans(x = reduced_dataset, centers = 2)
y_kmeans = kmeans$cluster
# Visualising the clusters
library(cluster)
clusplot(reduced_dataset,
         y_kmeans,
         lines = 0,
         shade = TRUE,
         color = TRUE,
         labels = 2,
         plotchar = FALSE,
         span = TRUE,
         main = paste('Clusters of categories - NOT ON SALE'),
         xlab = 'Average Sold Quantity',
         ylab = 'Average Inventory')
dput(reduced_dataset):
structure(list(Avg_Sold_No_Promo = c(0.255722695, 1.139983236,
0.458651842, 0.784966698, 1.642746914, 0.115264798, 7.50338696,
0.487603306, 1.023373984, 0.956099815, 1.505901506, 0.253837072,
0.834963325, 0.880898876, 6.527699531, 11.54054054, 3.44077135,
0.750182882, 0.251033058, 1.875698324), Avg_Inventory_No_Promo =
c(6.068672335,
22.57865326, 9.00694927, 11.56137012, 28.47530864, 7.485981308,
170.9064352, 11.07438017, 22.80792683, 40.63863216, 41.73463573,
10.87603306, 35.87408313, 46.09213483, 185.5671362, 315.6015693,
165.1129477, 78.18032187, 9.65857438, 198.4385475)), .Names =
c("Avg_Sold_No_Promo",
"Avg_Inventory_No_Promo"), row.names = c(NA, 20L), class = "data.frame")
Can someone please help me?
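The clusplot function does this automatically: it runs a principal component analysis (PCA) on the data and plots the first two components instead of the original variables.
The points are therefore component scores, which are centered and can be negative, and that is also why the plot includes the line stating how much of the variability the two components explain.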
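To see the clusters in the original units, the data can be plotted directly and coloured by cluster. The sketch below rebuilds the data frame from the dput output in the question; prcomp() is used only to illustrate that principal component scores are centered (the exact PCA routine clusplot calls internally may differ), and the seed is an arbitrary choice.

```r
reduced_dataset <- data.frame(
  Avg_Sold_No_Promo = c(0.255722695, 1.139983236, 0.458651842, 0.784966698,
    1.642746914, 0.115264798, 7.50338696, 0.487603306, 1.023373984,
    0.956099815, 1.505901506, 0.253837072, 0.834963325, 0.880898876,
    6.527699531, 11.54054054, 3.44077135, 0.750182882, 0.251033058,
    1.875698324),
  Avg_Inventory_No_Promo = c(6.068672335, 22.57865326, 9.00694927,
    11.56137012, 28.47530864, 7.485981308, 170.9064352, 11.07438017,
    22.80792683, 40.63863216, 41.73463573, 10.87603306, 35.87408313,
    46.09213483, 185.5671362, 315.6015693, 165.1129477, 78.18032187,
    9.65857438, 198.4385475)
)

# PCA scores are centered, so some are negative even though the data are not.
scores <- prcomp(reduced_dataset)$x
range(scores[, 1])

set.seed(42)  # arbitrary seed; k-means starts from random centers
fit <- kmeans(reduced_dataset, centers = 2)

# Plot the real values, coloured by cluster, with the centers marked.
plot(reduced_dataset$Avg_Sold_No_Promo,
     reduced_dataset$Avg_Inventory_No_Promo,
     col = fit$cluster, pch = 19,
     xlab = "Average Sold Quantity", ylab = "Average Inventory",
     main = "Clusters of categories - NOT ON SALE")
points(fit$centers, pch = 4, cex = 2, lwd = 2)
```

With only two variables there is no dimensionality to reduce, so a plain scatter plot loses nothing compared with the PCA projection.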

Adding Labels in Scientific Notation to Forest Plots Using the metafor package

So I'm doing a meta-analysis using the metafor package in R. I am preparing figures for publication in a scientific journal and I would like to add p-values to my forest plots, but in scientific notation formatted as x10-04 rather than the standard e-04.
However, the ilab argument of the forest function does not accept expression objects, only vectors.
Here is an example:
library(metafor)
data(dat.bcg)
## REM
res <- rma(ai = tpos, bi = tneg, ci = cpos, di = cneg, data = dat.bcg,
           measure = "RR",
           slab = paste(author, year, sep = ", "), method = "REML")
# Made-up p-values
set.seed(513)
p.vals <- runif(nrow(dat.bcg), 1e-6, 0.02)
# Format p-values so that only those below 0.01 use scientific notation
p.vals <- ifelse(p.vals < 0.01,
                 format(p.vals, digits = 3, scientific = TRUE, trim = TRUE),
                 format(round(p.vals, 2), nsmall = 2, trim = TRUE))
## Forest plot
forest(res, ilab = p.vals, ilab.xpos = 3, order = "obs", xlab = "Relative Risk")
I want the scientific notation of the p-values to be formatted as x10-04.
All the answers to similar questions that I've seen suggest using expression(), but that gives Error in cbind(ilab) : cannot create a matrix from type 'expression', which makes sense because the help file for forest specifies that the ilab argument should be a vector.
Any ideas on how I can either fix this or work around it?
A hacky solution is to edit a copy of the plotting function:
forest.rma <- edit(forest.rma)
Go to line 574 (the exact line number may differ between metafor versions) and change
## line 574
text(ilab.xpos[l], rows, ilab[, l], pos = ilab.pos[l],
to
text(ilab.xpos[l], rows, parse(text = ilab[, l]), pos = ilab.pos[l],
Then rewrite the p-values as plotmath strings and plot:
p.vals <- gsub('e(.*)', '~x~10^{"\\1"}', p.vals)
forest(res, ilab = p.vals, ilab.xpos = 3, order = "obs", xlab = "Relative Risk")
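The key step in this workaround is turning a character label such as 1.23e-04 into a plotmath expression that text() can draw. Here is a standalone base R sketch of that step, independent of metafor. The p-values are made up, and the %*% operator (rendered by plotmath as a multiplication sign) is my substitution for the literal x used in the answer.

```r
# Made-up p-values for illustration only.
p <- c(1.234e-04, 0.02)

# Same formatting rule as in the question: scientific below 0.01.
lab <- ifelse(p < 0.01,
              format(p, digits = 3, scientific = TRUE),
              format(round(p, 2), nsmall = 2))

# Rewrite "1.23e-04" as the plotmath string "1.23 %*% 10^-4";
# labels without an exponent are left unchanged.
plotmath <- sub("e([+-]?)0*([0-9]+)", " %*% 10^\\1\\2", lab)

# parse() turns the strings into an expression vector that text() accepts.
exprs <- parse(text = plotmath)

plot.new()
text(x = 0.5, y = c(0.7, 0.3), labels = exprs, cex = 1.5)
```

The regular expression also strips the leading zero of the exponent, so the label reads 10 to the power of -4 rather than -04.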
