I am using the R programming language. I am trying to follow this tutorial over here: http://www.semspirit.com/artificial-intelligence/machine-learning/regression/support-vector-regression/support-vector-regression-in-r/
For the famous Iris dataset, I am trying to plot the 3D decision surface of a random forest model (using t-SNE dimensions):
library(Rtsne)
library(dplyr)
library(ggplot2)
library(plotly)
library(caret)
library(randomForest)
#data
a = iris
a <- unique(a)
#create two species just to make things easier
s <- c("a","b")
species<- sample(s , 149, replace=TRUE, prob=c(0.3, 0.7))
a$species = species
a$species = as.factor(a$species)
#split data into train/test, and then random forest
index = createDataPartition(a$species, p=0.7, list = FALSE)
train = a[index,]
test = a[-index,]
rf = randomForest(species ~ ., data=train, ntree=50, mtry=2)
#have the model predict the test set
pred = predict(rf, test, type = "prob")
labels = factor(ifelse(pred[,2] > 0.5, "b", "a"), levels = levels(test$species)) #column 2 of pred is P(species == "b")
confusionMatrix(labels, test$species)
#tsne algorithm
tsne_obj_3 <- Rtsne(test[,1:4], perplexity=1, dims=3) #only the four numeric measurements (column 6 is the added species factor)
df_m2 <- as.data.frame(tsne_obj_3$Y)
df_m2$labels = test$species
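Turning the probability matrix returned by predict(..., type = "prob") into labels by column name avoids hard-coding which column belongs to which class, which is easy to get backwards with a 0.5 threshold. A minimal base-R sketch; the helper prob_to_labels() and the toy matrix are hypothetical:

```r
# Hypothetical helper: map each row of a class-probability matrix
# (one column per class, as returned by predict(rf, test, type = "prob"))
# to the class with the highest probability.
prob_to_labels <- function(prob) {
  winner <- max.col(prob, ties.method = "first")  # column index of each row's maximum
  factor(colnames(prob)[winner], levels = colnames(prob))
}

# Toy probabilities for two rows and the classes "a" and "b"
pred_toy <- matrix(c(0.8, 0.2,
                     0.3, 0.7),
                   ncol = 2, byrow = TRUE,
                   dimnames = list(NULL, c("a", "b")))
labels_toy <- prob_to_labels(pred_toy)
print(labels_toy)
```

With the question's objects this would be labels <- prob_to_labels(pred). Keeping levels = colnames(prob) means confusionMatrix() still sees both levels even if every prediction falls into one class.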
From here, I am trying to plot the 3D decision surface (http://www.semspirit.com/artificial-intelligence/machine-learning/regression/support-vector-regression/support-vector-regression-in-r/):
axis_1 = df_m2$V1
axis_2 = df_m2$V2
axis_3 = df_m2$V3
plot_ly(x=as.vector(axis_1),y=as.vector(axis_2),z=axis_3, type="scatter3d", mode="markers", name = "Obs", marker = list(size = 3)) %>%
add_trace(x=as.vector(axis_1),y=as.vector(axis_2),z=df_m2$labels, type = "mesh3d", name = "Preds")
But I am getting the following warnings:
2: In RColorBrewer::brewer.pal(N, "Set2") :
minimal value for n is 3, returning requested palette with 3 different levels
3: 'mesh3d' objects don't have these attributes: 'mode', 'marker'
Valid attributes include:
'type', 'visible', 'legendgroup', 'name', 'uid', 'ids', 'customdata', 'meta', 'hoverlabel', 'stream', 'uirevision', 'x', 'y', 'z', 'i', 'j', 'k', 'text', 'hovertext', 'hovertemplate', 'delaunayaxis', 'alphahull', 'intensity', 'intensitymode', 'color', 'vertexcolor', 'facecolor', 'cauto', 'cmin', 'cmax', 'cmid', 'colorscale', 'autocolorscale', 'reversescale', 'showscale', 'colorbar', 'coloraxis', 'opacity', 'flatshading', 'contour', 'lightposition', 'lighting', 'hoverinfo', 'showlegend', 'xcalendar', 'ycalendar', 'zcalendar', 'scene', 'idssrc', 'customdatasrc', 'metasrc', 'xsrc', 'ysrc', 'zsrc', 'isrc', 'jsrc', 'ksrc', 'textsrc', 'hovertextsrc', 'hovertemplatesrc', 'intensitysrc', 'vertexcolorsrc', 'facecolorsrc', 'hoverinfosrc', 'key', 'set', 'frame', 'transforms', '_isNestedKey', '_isSimpleKey', '_isGraticule', '_bbox'
A 3D plot is produced, but the 3D plane is completely gone.
Can someone please tell me what I am doing wrong?
I am also trying to make it so that, when you move the mouse over a point, it displays that point's values of a$Sepal.Length, a$Sepal.Width, a$Petal.Length, a$Petal.Width and a$Species.
Thanks
When you called add_trace(), z was not assigned correctly: the labels are a factor and won't plot as a surface. You need to plot the predicted probabilities instead, as z = df_m2$pred (this column first has to be added to df_m2 from pred).
There are multiple ways to fix the issues with the mesh plot, but the easiest is to use add_mesh() instead of add_trace().
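df_m2$pred <- pred[, 2] #predicted probability of class "b" for each test row
plot_ly(x=as.vector(axis_1),
y=as.vector(axis_2),
z=axis_3,
type="scatter3d",
mode="markers",
name = "Obs",
marker = list(size = 3)) %>%
add_mesh(x=as.vector(axis_1),
y=as.vector(axis_2),
z=df_m2$pred,
name = "Preds",
inherit = FALSE) #add_mesh() already sets type = "mesh3d"; inherit = FALSE stops mode/marker from plot_ly() being passed on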
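The answer does not cover the hover labels the question also asks about. One way is to build one label string per point and pass it as text together with hoverinfo = "text". A minimal sketch of the string-building part in base R; make_hover_text() is a hypothetical helper name, and it assumes the data frame keeps the original iris column names:

```r
# Hypothetical helper: one hover label per row; plotly renders "<br>" as a line break.
make_hover_text <- function(df) {
  sprintf(paste("Sepal.Length: %.1f", "Sepal.Width: %.1f",
                "Petal.Length: %.1f", "Petal.Width: %.1f",
                "Species: %s", sep = "<br>"),
          df$Sepal.Length, df$Sepal.Width,
          df$Petal.Length, df$Petal.Width,
          as.character(df$Species))
}

hover <- make_hover_text(head(iris, 3))
cat(hover[1], "\n")
```

The labels have to line up with the plotted points, so they would be built from test (the rows that went into t-SNE), e.g. plot_ly(x = axis_1, y = axis_2, z = axis_3, type = "scatter3d", mode = "markers", text = make_hover_text(test), hoverinfo = "text").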
library(factoextra)
library(FactoMineR)
res.pca = PCA(clin.oc[,1:14], graph = TRUE)
scree = fviz_screeplot(res.pca, ncp=5, addlabels=TRUE, size=0.5, title='Scree plot')
loadings = plot(res.pca, choix = "var", title='Loadings')
n = fviz_contrib(res.pca, choice = "var", axes = 1, top = 5, title='Variable contributions')
scores = fviz_pca_ind(res.pca, label="none", habillage=as.factor(clin),
title='scores')
ggarrange(scree, loadings, n, scores)
And the ggarrange() function won't work; R says it is not found.
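I think the error already starts in res.pca, because your 'identifier' variable is of type character. Take it out (my guess: PCA(clin.oc[,2:14], …)). Separately, ggarrange() is not part of FactoMineR or factoextra; it comes from the ggpubr package, so you need library(ggpubr) before calling it.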
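Dropping non-numeric columns programmatically avoids guessing the column range. A minimal sketch with a hypothetical toy data frame (clin_toy); base R's prcomp() stands in for FactoMineR::PCA() here only so the example runs without extra packages:

```r
# Keep only numeric columns so an ID/character column never reaches the PCA
keep_numeric <- function(df) {
  df[, vapply(df, is.numeric, logical(1)), drop = FALSE]
}

# Hypothetical toy data: a character identifier plus two measurements
clin_toy <- data.frame(id = c("p1", "p2", "p3", "p4"),
                       x  = c(1, 2, 3, 5),
                       y  = c(2, 4, 7, 8))
pca_input <- keep_numeric(clin_toy)
pca <- prcomp(pca_input, scale. = TRUE)
summary(pca)
```

With the real data the same filter would presumably be PCA(keep_numeric(clin.oc[, 1:14]), graph = TRUE).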
There are examples available of changing marker size in plotly objects created with vistime() (e.g. as below), but I am having trouble figuring out how to do the same thing for a ggplot2 object made with gg_vistime(). Ideally I would like to be able to specify both the marker size and the label font size. Is there a way to do this?
library(vistime)
dat <- data.frame(event = 1:4, start = c("2019-01-01", "2019-01-10"))
p <- vistime(dat)
# step 1: transform into a list
pp <- plotly::plotly_build(p)
# step 2: loop over pp$x$data and set the marker size of every "markers" trace to 20px
for(i in seq_along(pp$x$data)){
if(identical(pp$x$data[[i]]$mode, "markers")) pp$x$data[[i]]$marker$size <- 20
}
# or, using purrr:
# marker_idx <- which(purrr::map_chr(pp$x$data, "mode") == "markers")
# for(i in marker_idx) pp$x$data[[i]]$marker$size <- 20
# pp
pp
You can use the following code to specify the textfont size too, like this:
library(vistime)
dat <- data.frame(event = 1:4, start = c("2019-01-01", "2019-01-10"))
p <- vistime(dat)
# step 1: transform into a list
pp <- plotly::plotly_build(p)
# step 2: loop over pp$x$data; set the font size of text traces to 28 and the marker size of marker traces to 20
pp$x$data <- lapply(pp$x$data, function(x){
if(identical(x$mode, "text")){
x$textfont$size <- 28
}else if(identical(x$mode, "markers")){
x$marker$size <- 20 # the size lives under marker, not at the top level of the trace
}
x
})
pp
Created on 2022-07-27 by the reprex package (v2.0.1)
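The same loop can be wrapped in a small function and checked on plain lists, without building a plot. Using identical() instead of == also means a trace that has no mode does not trigger an "argument is of length zero" error. A sketch; set_trace_sizes() and the toy traces are hypothetical:

```r
# Hypothetical helper: set marker/text sizes on a list of plotly trace lists,
# such as pp$x$data after plotly::plotly_build().
set_trace_sizes <- function(traces, marker_size = 20, text_size = 28) {
  lapply(traces, function(tr) {
    if (identical(tr$mode, "markers")) tr$marker$size <- marker_size
    if (identical(tr$mode, "text"))    tr$textfont$size <- text_size
    tr  # other traces (e.g. "lines", or no mode at all) come back unchanged
  })
}

# Toy traces standing in for pp$x$data
traces <- list(list(mode = "markers"), list(mode = "text"), list(mode = "lines"), list())
out <- set_trace_sizes(traces)
str(out)
```

On a real vistime plot this would be pp$x$data <- set_trace_sizes(pp$x$data).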
I found separate solutions for the marker size and the label size, because gg_vistime() uses geom_text_repel() for the labels, which doesn't store a size parameter in aes.
library(vistime)
dat <- data.frame(event = 1:4, start = c("2019-01-01", "2019-01-10"))
# do not include labels in the original plot, add them later to control appearance
p <- gg_vistime(dat,
show_labels = FALSE)
# adjust the marker size by directly accessing the aes_params
p$layers[[3]]$aes_params$size <- 20
# add a new geom_text_repel layer for the labels where you can specify appearance
p +
ggrepel::geom_text_repel(data=p$layers[[3]]$data,
label=p$layers[[3]]$data$label,
size =10,
color="black")
I have data that look like this:
Gene    HBEC-KT-01   HBEC-KT-02   HBEC-KT-03   HBEC-KT-04   HBEC-KT-05   Primarycells-02  Primarycells-03  Primarycells-04  Primarycells-05
BPIFB1  15726000000  15294000000  15294000000  14741000000  22427000000  87308000000      2.00E+11         1.04E+11         1.51E+11
LCN2    18040000000  26444000000  28869000000  30337000000  10966000000  62388000000      54007000000      56797000000      38414000000
C3      2.52E+11     2.26E+11     1.80E+11     1.80E+11     1.78E+11     46480000000      1.16E+11         69398000000      78766000000
MUC5AC  15647000     8353200      12617000     12221000     29908000     40893000000      79830000000      28130000000      69147000000
MUC5B   965190000    693910000    779970000    716110000    1479700000   38979000000      90175000000      41764000000      50535000000
ANXA2   14705000000  18721000000  21592000000  18904000000  22657000000  28163000000      24282000000      21708000000      16528000000
I want to make a heatmap like the following in R. I am following a paper which says: "Heat maps were generated with the ‘pheatmap’ package [76], where correlation clustering distance row was applied". Here is their heatmap.
I want to make the same kind of plot, and I am trying to do it in R by following tutorials, but I am new to the R language and know nothing about R.
Here is my code.
library(pheatmap) #pheatmap()
library(RColorBrewer) #brewer.pal()
df <- read.delim("R.txt", header=TRUE, row.names="Gene")
df_matrix <- data.matrix(df)
pheatmap(df_matrix,
main = "Heatmap of Extracellular Genes",
color = colorRampPalette(rev(brewer.pal(n = 10, name = "RdYlBu")))(10),
cluster_cols = FALSE,
show_rownames = FALSE,
fontsize_col = 10,
cellwidth = 40
)
This is what I get.
When I try to use clustering, I get this error:
pheatmap(
mat = df_matrix,
scale = "row",
cluster_cols = FALSE,
show_rownames = TRUE,
drop_levels = TRUE,
fontsize = 5,
clustering_method = "complete",
main = "Hierarchical Cluster Analysis"
)
Error in hclust(d, method = method) :
NA/NaN/Inf in foreign function call (arg 10)
Can someone help me with the code?
You can normalize the data using scale() to achieve a more uniform coloring. Here, the mean expression is set to 0 for each sample, so genes expressed below the average have a negative z-score:
library(tidyverse)
library(pheatmap)
data <- tribble(
~Gene, ~`HBEC-KT-01`, ~`HBEC-KT-02`, ~`HBEC-KT-03`, ~`HBEC-KT-04`, ~`HBEC-KT-05`, ~`Primarycells-03`, ~`Primarycells-04`, ~`Primarycells-05`,
"BPIFB1", 1.5726e+10, 1.5294e+10, 1.5294e+10, 1.4741e+10, 2.2427e+10, 2e+11, 1.04e+11, 1.51e+11,
"LCN2", 1.804e+10, 2.6444e+10, 2.8869e+10, 3.0337e+10, 1.0966e+10, 5.4007e+10, 5.6797e+10, 3.8414e+10,
"C3", 2.52e+11, 2.26e+11, 1.8e+11, 1.8e+11, 1.78e+11, 1.16e+11, 6.9398e+10, 7.8766e+10,
"MUC5AC", 15647000, 8353200, 12617000, 12221000, 29908000, 7.983e+10, 2.813e+10, 6.9147e+10,
"MUC5B", 965190000, 693910000, 779970000, 716110000, 1479700000, 9.0175e+10, 4.1764e+10, 5.0535e+10,
"ANXA2", 1.4705e+10, 1.8721e+10, 2.1592e+10, 1.8904e+10, 2.2657e+10, 2.4282e+10, 2.1708e+10, 1.6528e+10
)
data %>%
mutate(across(where(is.numeric), scale)) %>%
column_to_rownames("Gene") %>%
pheatmap(
scale = "row",
cluster_cols = FALSE,
show_rownames = FALSE,
show_colnames = TRUE,
treeheight_col = 0,
drop_levels = TRUE,
fontsize = 5,
clustering_method = "complete",
main = "Hierarchical Cluster Analysis (z-score)"
)
Created on 2021-09-26 by the reprex package (v2.0.1)
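To see what scale = "row" and clustering_distance_rows = "correlation" (the setting the paper describes) do, here is a small base-R sketch on a hypothetical 3-gene matrix. My understanding is that pheatmap's row scaling is (x - row mean) / row sd and its correlation distance is 1 - cor between rows; treat that as an assumption. The sketch also shows one likely source of the hclust() error in the question: a row with zero variance scales to NaN, and NaN/NA values break hclust().

```r
# Row z-score, as applied by scale = "row": (x - row mean) / row sd
row_zscore <- function(m) t(scale(t(m)))

# Correlation distance between rows: 1 - Pearson correlation
row_cor_dist <- function(m) as.dist(1 - cor(t(m)))

# Hypothetical toy expression matrix (genes in rows, samples in columns)
m <- rbind(g1 = c(1, 2, 3),
           g2 = c(2, 4, 6),   # same profile as g1, different magnitude
           g3 = c(3, 2, 1))   # opposite profile
z  <- row_zscore(m)
d  <- row_cor_dist(m)
hc <- hclust(d, method = "complete")

# A constant row has sd = 0, so its z-scores are NaN
z_const <- row_zscore(rbind(flat = c(5, 5, 5), g1 = c(1, 2, 3)))
print(z)
print(z_const)
```

On the real data the corresponding call would be something like pheatmap(df_matrix, scale = "row", clustering_distance_rows = "correlation", cluster_cols = FALSE); dropping zero-variance rows first, e.g. df_matrix[apply(df_matrix, 1, sd) > 0, ], avoids the NaN rows.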
I am learning about the "kohonen" package in R for the purpose of making Self-Organizing Maps (SOM, also called Kohonen Networks, a type of Machine Learning algorithm). I am following this R language tutorial over here: https://www.rpubs.com/loveb/som
I tried to create my own data (this time with both "factor" and "numeric" variables) and run the SOM algorithm (this time using the "supersom()" function instead):
#load libraries and adjust colors
library(kohonen) #fitting SOMs
library(ggplot2) #plots
library(RColorBrewer) #colors, using predefined palettes
contrast <- c("#FA4925", "#22693E", "#D4D40F", "#2C4382", "#F0F0F0", "#3D3D3D") #my own, contrasting pairs
cols <- brewer.pal(10, "Paired")
#create and format data
a =rnorm(1000,10,10)
b = rnorm(1000,10,5)
c = rnorm(1000,5,5)
d = rnorm(1000,5,10)
e <- sample( LETTERS[1:4], 100 , replace=TRUE, prob=c(0.25, 0.25, 0.25, 0.25) )
f <- sample( LETTERS[1:5], 100 , replace=TRUE, prob=c(0.2, 0.2, 0.2, 0.2, 0.2) )
g <- sample( LETTERS[1:2], 100 , replace=TRUE, prob=c(0.5, 0.5) )
data = data.frame(a,b,c,d,e,f,g)
data$e = as.factor(data$e)
data$f = as.factor(data$f)
data$g = as.factor(data$g)
cols <- 1:4
data[cols] <- scale(data[cols])
#som model
som <- supersom(data= as.list(data), grid = somgrid(10,10, "hexagonal"),
dist.fct = "euclidean", keep.data = TRUE)
From here, I was able to successfully make some of the basic plots:
#plots
#pretty gradient colors
colour1 <- tricolor(som$grid)
colour4 <- tricolor(som$grid, phi = c(pi/8, 6, -pi/6), offset = 0.1)
plot(som, type="changes")
plot(som, type="count")
plot(som, type="quality", shape = "straight")
plot(som, type="dist.neighbours", palette.name=grey.colors, shape = "straight")
However, the problem arises when I try to make individual plots for each variable:
#error
var <- 1 #define the variable to plot
plot(som, type = "property", property = getCodes(som)[,var], main=colnames(getCodes(som))[var], palette.name=terrain.colors)
var <- 6 #define the variable to plot
plot(som, type = "property", property = getCodes(som)[,var], main=colnames(getCodes(som))[var], palette.name=terrain.colors)
This produces an error: "Error: Incorrect Number of Dimensions"
A similar problem ("NAs introduced by coercion") occurs when I try to cluster the SOM network:
#cluster (error)
set.seed(33) #for reproducibility
fit_kmeans <- kmeans(data, 3) #3 clusters are used, as indicated by the wss development.
cl_assignmentk <- fit_kmeans$cluster[data$unit.classif]
par(mfrow=c(1,1))
plot(som, type="mapping", bg = rgb(colour4), shape = "straight", border = "grey",col=contrast)
add.cluster.boundaries(som, fit_kmeans$cluster, lwd = 3, lty = 2, col=contrast[4])
Can someone please tell me what I am doing wrong?
Thanks
Sources: https://www.rdocumentation.org/packages/kohonen/versions/2.0.5/topics/supersom
Here getCodes() returns a list (one codebook matrix per data layer), so you have to treat it like one.
Calling getCodes(som) gives a list of 7 items named a to g, so you should select items from the list with either $ or [[]].
For example:
plot(som, type = "property", property = getCodes(som)[[1]], main=names(getCodes(som))[1], palette.name=terrain.colors)
or
plot(som, type = "property", property = getCodes(som)$a, main="a", palette.name=terrain.colors)
or
plot(som, type = "property", property = getCodes(som)[["a"]], main="a", palette.name=terrain.colors)
If you want to set the variable before calling plot(), you can do it like this:
var <- 1
plot(som, type = "property", property = getCodes(som)[[var]], main=names(getCodes(som))[var], palette.name=terrain.colors)
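The difference between [ , ] and [[ ]] can be reproduced without fitting a SOM. A minimal sketch with a hypothetical list standing in for getCodes(som); that a factor layer's codebook has one column per level is my understanding of how supersom encodes factors, not something shown in the question:

```r
# Hypothetical stand-in for getCodes(som): one codebook matrix per layer
# (4 SOM units). The factor layer "e" has one column per level.
codes <- list(
  a = matrix(c(0.1, 0.5, 0.9, 0.3), ncol = 1),
  e = matrix(c(0.7, 0.2, 0.1, 0.4,
               0.3, 0.8, 0.9, 0.6), ncol = 2,
             dimnames = list(NULL, c("A", "B")))
)

# Matrix-style indexing on a list fails with the error from the question
err <- tryCatch(codes[, 1], error = conditionMessage)

# List-style indexing works
layer_1 <- codes[[1]]             # same as codes$a
name_1  <- names(codes)[1]
level_A <- codes[["e"]][, "A"]    # a single level of a factor layer
print(err)
```

For the question's var <- 6 (layer f, a factor), this presumably means passing a single column, e.g. getCodes(som)[[6]][, 1], as the property.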
Regarding kmeans()
kmeans() needs a numeric matrix, or an object that can be coerced into one. Your data contains factors (categorical data), which cannot be meaningfully coerced to numbers, so either drop the factors or use another method.
To drop the factors:
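#cluster the SOM units (not the raw rows), using only the numeric layers a-d
set.seed(33) #for reproducibility
codes_num <- do.call(cbind, getCodes(som)[1:4]) #one row per SOM unit
fit_kmeans <- kmeans(codes_num, 3) #3 clusters are used, as indicated by the wss development.
cl_assignmentk <- fit_kmeans$cluster[som$unit.classif] #cluster of each observation, via its best-matching unit
par(mfrow=c(1,1))
plot(som, type="mapping", bg = rgb(colour4), shape = "straight", border = "grey",col=contrast)
add.cluster.boundaries(som, fit_kmeans$cluster, lwd = 3, lty = 2, col=contrast[4]) #expects one cluster label per unit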
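add.cluster.boundaries() draws boundaries between SOM units, so it needs one cluster label per unit; per-observation clusters then come from indexing with each observation's winning unit (som$unit.classif). A minimal base-R sketch of that indexing, using hypothetical toy codebook vectors and unit assignments instead of a fitted SOM:

```r
set.seed(33)  # for reproducibility

# Hypothetical stand-ins: 6 SOM units x 2 numeric codebook columns
# (units 1-3 near 0, units 4-6 near 5) ...
codes <- rbind(matrix(rnorm(6, mean = 0, sd = 0.1), ncol = 2),
               matrix(rnorm(6, mean = 5, sd = 0.1), ncol = 2))
# ... and the best-matching unit of each of 10 observations
unit_classif <- c(1, 2, 3, 4, 5, 6, 1, 4, 2, 5)

fit <- kmeans(codes, centers = 2, nstart = 10)
unit_cluster <- fit$cluster                # one label per SOM unit
obs_cluster  <- unit_cluster[unit_classif] # one label per observation
print(unit_cluster)
print(obs_cluster)
```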
Edit:
Alternatively, you can select the codebook directly in getCodes() with the idx argument:
plot(som, type = "property", property = getCodes(som, idx = 1), main="a", palette.name=terrain.colors)
I recently switched from ggplot to rCharts and have a fairly simple question about the labels.
Sample data
library(data.table)
library(rCharts)
data_1 <- data.table(Filter = c('Filter 1', 'Filter 2'),
Amount = c(100, 50))
data_2 <- data.table(Filter = c('Filter 1'),
Amount = c(100))
Plots
hPlot(Amount ~ Filter, data = data_1, type = 'bar', group.na = 'NA\'s')
hPlot(Amount ~ Filter, data = data_2, type = 'bar', group.na = 'NA\'s')
Question:
Why do we see the correct label in the first plot, but only the first letter of the label in the second plot? This issue always occurs when the number of rows = 1 (as it is in data_2).
Does anyone have a quick fix or workaround?