Highlight single terms within a word cloud? - r

Is it possible to highlight single words within a word cloud using 'wordcloud' or 'wordcloud2'? Does one have to add another column to the data frame as ordering factor?
I couldn't find any simple solution.
Here's what I've done:
wordcloud(text_process$words[1:n.words],
text_process$frequency[1:n.words],
scale = c(18, 0.5),
colors = c("#666666", "#3E6AA0") [factor(text_process$matches[1:n.words])],
use.r.layout = FALSE,
rot.per = 0.2,
random.order = FALSE, ordered.colors=TRUE)
I had to introduce a criterion (called 'matches') in the data frame 'text_process' that indicates the color. I was wondering whether there's a simpler way of highlighting specific words...

# Not Tested
library(randomcoloR)
cols <- randomColor(length(unique(text_process$words[1:n.words])), luminosity = "dark")
match_value <- match("HighlightThisWord", text_process$words[1:n.words])
cols[match_value]<-"orange"
wordcloud(text_process$words[1:n.words],
text_process$frequency[1:n.words],
scale = c(18, 0.5),
colors = cols,
use.r.layout = FALSE,
rot.per = 0.2,
random.order = FALSE, ordered.colors=TRUE)
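A minimal sketch of the same idea without random colours: build the colour vector directly from a set of words to highlight, so no extra column is needed in the data frame. The words, frequencies and the highlight vector below are made-up stand-ins for text_process, and the wordcloud() call only runs if the package is installed.

```r
# Hypothetical stand-in for text_process (not the asker's data)
text_process <- data.frame(
  words     = c("data", "model", "plot", "cloud", "colour"),
  frequency = c(50, 40, 30, 20, 10),
  stringsAsFactors = FALSE
)
highlight <- c("plot", "cloud")  # words to emphasise (assumption)

# One colour per word, in the same order as the words
cols <- ifelse(text_process$words %in% highlight, "#3E6AA0", "#666666")

if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(text_process$words,
                       text_process$frequency,
                       colors = cols,
                       ordered.colors = TRUE,  # match cols to words by position
                       random.order = FALSE)
}
```

With ordered.colors = TRUE the colour vector has to be exactly as long as the word vector, which ifelse() guarantees here.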

Related

Remove residual variance paths of y-indicators in semPaths (package: semPlot)

I am using the following semPlot::semPaths code to plot a lavaan model:
semPaths(object = fit,
what = "stand",
nCharNodes = 0,
layout = "tree",
fade = FALSE,
exoCov = FALSE,
residuals = FALSE,
intercepts = FALSE,
rotation = 2,
sizeMan = 5,
sizeLat = 10,
edge.label.cex = 1.5,
minimum=0.16,
allVars = FALSE,
equalizeManifests = FALSE,
fixedStyle = 2)
And this is the figure from the semPaths output:
I want to remove all the red paths on the y-indicators (i.e., red lines on the right-hand side of the model). What argument can I add to remove them? Thanks.
I have tried adding various arguments to the semPaths() call based on the R documentation and previous Stack Overflow questions. These arguments remove various other paths and numbers, but not the paths I want to remove. Most of them are included in the current code.

GGplot is refusing to change the colors of my row annotation

There are some issues with my ggplot chart that I can't seem to fix.
# as you may guess from the file name, I have provided this matrix below
vis.matrix <- read.csv("csvfileprovidedbelow.csv")
# setting up annotation_row
cell_df <- data.frame ("Cells" = c(rep("Putative Engram Cell", 10), rep("Random Cell",10))
)
rownames(cell_df) <- rownames(vis.matrix)
cell_df$Cells <- as.factor(cell_df$Cells)
#setting up colors
newCols <- colorRampPalette(grDevices::rainbow(length(unique(cell_df$Cells))))
annoCol <- c("2AFE00", "ACACAC") # green and grey
names(annoCol) <- levels(cell_df$Cells)
annoCol <- list(category = annoCol)
color=colorRampPalette(c("navy", "white", "red"))(50)
#plotting
pheatmap(vis.matrix,cluster_rows = F, cluster_cols=F, annotation_row = cell_df,
annotation_names_col = F, scale = "column", color = color,
annotation_row_colors = annoCol,
show_rownames = F)
Result
For some reason the Cells are not the colors I selected; you can look those colors up here: https://www.color-hex.com/
I don't know why ggplot is ignoring the input I'm giving it. I would also like to remove the word "Cells" beside the color bars on the graph; it's unnecessary, since the legend already explains what it is.
Variables as CSV for reproduction (copy and paste!)
vis.matrix is here:
"","LINGO1","ARC","INHBA","BDNF","MAPK4","ADGRL3","PTGS2","CHGB","BRINP1","KCNK1"
"P57_CATCGGGCATGTCGAT",-0.368245729284319,3.47987283505039,2.94634318740768,5.57309275653906,1.28904872906168,5.3650511213102,-0.368245729284319,2.25850383984707,4.60363764575367,-0.368245729284319
"P57_GAAGCAGGTAAAGGAG",-0.384074162377759,4.36118508997518,3.70326968156081,4.89874111968957,1.65959775959153,4.36118508997518,-0.384074162377759,-0.384074162377759,4.89874111968957,2.85506919772029
"P57_TGACTTTTCTTTACAC",-0.357194851773428,2.40812492004642,3.13225019258772,5.67855340720666,-0.357194851773428,3.13225019258772,-0.357194851773428,4.87697271476829,1.38752767040715,-0.357194851773428
"P57_CTAGAGTGTCCGACGT",1.50110424640379,3.34315724311024,2.57863617381809,6.67240079339861,3.34315724311024,3.93616585502151,-0.340948750302666,1.50110424640379,5.77821885172796,3.34315724311024
"P57_CCTTACGTCCAAGTAC",-0.381478022176755,4.73256922534426,2.17554560158375,6.70465771162764,1.23182426263886,3.36449387848259,-0.381478022176755,2.17554560158375,4.45842883227008,3.36449387848259
"P57_ATCCGAAGTGTGACCC",2.60172319423431,1.50562420175544,-0.36816940232616,5.57161579079479,1.50562420175544,3.37941780583703,-0.36816940232616,3.37941780583703,4.47551679831591,3.98264461101114
"P57_TCCACACAGCTCCTCT",-0.364903374339472,2.59101007342497,2.59101007342497,5.23001785519025,-0.364903374339472,3.36504411201368,-0.364903374339472,1.5000703688371,1.5000703688371,-0.364903374339472
"P57_CTGAAGTGTGCTTCTC",-0.384690873645543,3.35025193111807,2.83241374986762,4.71429931551947,3.35025193111807,3.35025193111807,-0.384690873645543,3.35025193111807,2.16480422093696,2.16480422093696
"P57_CTGATAGAGAATCTCC",1.6886646742164,2.87694996247181,-0.342722443403036,7.39148929746973,1.6886646742164,5.75143890945527,-0.342722443403036,5.75143890945527,4.37401237658979,-0.342722443403036
"P57_GGAGCAACATACAGCT",-0.351186802480077,1.4651606822983,1.4651606822983,5.40649850082577,-0.351186802480077,4.34400333395122,-0.351186802480077,1.4651606822983,5.09785565185506,1.4651606822983
"A57_CGTCTACCAGACGCAA",-0.229651158962319,-0.229651158962319,-0.229651158962319,-0.229651158962319,-0.229651158962319,3.72717582194343,-0.229651158962319,-0.229651158962319,-0.229651158962319,-0.229651158962319
"P57_GTTCGGGCAATGGACG",-0.269219507178484,-0.269219507178484,-0.269219507178484,-0.269219507178484,-0.269219507178484,4.26241026631276,-0.269219507178484,-0.269219507178484,-0.269219507178484,-0.269219507178484
"P56_GGTATTGTCATGTCTT",-0.294887130864939,-0.294887130864939,-0.294887130864939,-0.294887130864939,-0.294887130864939,5.06808977241301,-0.294887130864939,-0.294887130864939,-0.294887130864939,-0.294887130864939
"A67_AAATGCCAGATAGTCA",4.03836820795661,-0.211281061058977,-0.211281061058977,-0.211281061058977,-0.211281061058977,-0.211281061058977,-0.211281061058977,-0.211281061058977,-0.211281061058977,-0.211281061058977
"P76_CCCTGATAGAGGACTC",-0.507269585219581,-0.507269585219581,-0.507269585219581,1.90264065061749,-0.507269585219581,4.86614536666517,-0.507269585219581,1.40253909173334,-0.507269585219581,0.697685532698955
"P56_GATCGATTCCGTCAAA",2.00727896845415,-0.313514850319463,-0.313514850319463,2.00727896845415,-0.313514850319463,3.36485632434217,-0.313514850319463,-0.313514850319463,-0.313514850319463,-0.313514850319463
"P57_GCTGCAGCATAGGATA",2.32839123926114,-0.289105834618761,-0.289105834618761,-0.289105834618761,-0.289105834618761,2.32839123926114,-0.289105834618761,-0.289105834618761,-0.289105834618761,4.94588831314104
"P82_AGGATAACATAGGTTC",1.39699437520094,-0.501641808549684,0.696264250985952,1.39699437520094,-0.501641808549684,4.49353661848721,-0.501641808549684,-0.501641808549684,1.89417031052159,-0.501641808549684
"P82_CCAAGCGTCCGGCTTT",-0.328980171926236,-0.328980171926236,-0.328980171926236,4.08682708745919,-0.328980171926236,1.87892345776647,-0.328980171926236,-0.328980171926236,4.08682708745919,-0.328980171926236
"P57_CAGCGACCATGTCCTC",-0.316475979591103,-0.316475979591103,-0.316475979591103,2.18079240270816,-0.316475979591103,6.13886914288907,-0.316475979591103,2.18079240270816,-0.316475979591103,4.67806078500742
pheatmap is not ggplot. It is drawn using grid graphics.
Anyway, you would pass the color specification as follows:
pheatmap(vis.matrix, cluster_rows = F, cluster_cols=F, annotation_row = cell_df,
annotation_names_col = F, scale = "column", color = color,
annotation_colors = list(Cells = c("Putative Engram Cell" = "#2AFE00",
"Random Cell" = "#ACACAC")),
show_rownames = F)
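To spell out what changed, as far as I can tell: the hex codes need a leading "#", the documented argument is annotation_colors (annotation_row_colors is not a pheatmap argument), and the list element must be named after the annotation column (Cells). The label beside the color bar should be controlled by annotation_names_row rather than annotation_names_col. A small sketch with made-up data (toy_mat is a hypothetical stand-in for vis.matrix); the plot is only drawn if pheatmap is installed:

```r
set.seed(1)
# Hypothetical 20 x 4 matrix standing in for vis.matrix
toy_mat <- matrix(rnorm(80), nrow = 20,
                  dimnames = list(paste0("cell", 1:20), paste0("gene", 1:4)))

cell_df <- data.frame(
  Cells = factor(rep(c("Putative Engram Cell", "Random Cell"), each = 10)),
  row.names = rownames(toy_mat)
)

# List element named after the annotation column; colors named after its levels
anno_colors <- list(Cells = c("Putative Engram Cell" = "#2AFE00",
                              "Random Cell"          = "#ACACAC"))

if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(toy_mat,
                     cluster_rows = FALSE, cluster_cols = FALSE,
                     annotation_row = cell_df,
                     annotation_colors = anno_colors,
                     annotation_names_row = FALSE,  # hides the "Cells" label
                     show_rownames = FALSE)
}
```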

Venn diagram (4 sets) with R: problem with percentages

I am trying to draw a Venn diagram with four logical variables. I have tried many different R packages, but I ran into problems with each of them. So far, I have achieved the best result with the ggvenn package. The problem, however, is that it shows the percentages of the intersections relative to the observations included in the diagram, rather than relative to all observations in the data.
Below is an example Venn diagram and its code to illustrate the problem. So my question is: is there some way to display the percentages relative to the total number of observations in the data? For instance, in the diagram below the intersection of ABCD consists of 45 observations, so the correct proportion would be 4.5% (i.e. 45/1000) instead of 4.7%.
I would really appreciate it if someone could help me out with this.
library(ggvenn)
library(tibble)  # provides tibble()
a <- sample(x = c(TRUE, TRUE, FALSE), size = 1000, replace = TRUE)
b <- sample(x = c(TRUE,FALSE, FALSE), size = 1000, replace = TRUE)
c <- sample(x = c(TRUE, FALSE, FALSE, FALSE), size = 1000, replace = TRUE)
d <- sample(x = c(TRUE, TRUE, TRUE, FALSE), size = 1000, replace = TRUE)
df <- tibble(values = c(1:1000), A=a, B=b, C=c, D=d)
ggvenn(df,
fill_color = c("black", "grey70", "grey80", "grey90"),
show_percentage = TRUE,
digits = 1,
text_size = 2.5)
A quick fix for this problem would be to use the latest GitHub version of {ggvenn}:
remove.packages("ggvenn")
devtools::install_github("yanlinlin82/ggvenn")
Then, by default, there will also be a percentage for the observations that are outside A/B/C/D, so the other percentages are computed relative to all observations. This gives you the percentages you're looking for.
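If you want to check the numbers by hand, the share of each region relative to all rows can be computed in base R. This is only a verification sketch, independent of ggvenn; the seed is an arbitrary choice to make the example reproducible, and the object names are my own.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
n <- 1000
df <- data.frame(
  A = sample(c(TRUE, TRUE, FALSE), n, replace = TRUE),
  B = sample(c(TRUE, FALSE, FALSE), n, replace = TRUE),
  C = sample(c(TRUE, FALSE, FALSE, FALSE), n, replace = TRUE),
  D = sample(c(TRUE, TRUE, TRUE, FALSE), n, replace = TRUE)
)

# Percentage of ALL rows that fall into the A & B & C & D intersection
pct_abcd <- 100 * mean(df$A & df$B & df$C & df$D)

# Percentage of all rows that belong to none of the four sets
pct_none <- 100 * mean(!(df$A | df$B | df$C | df$D))

# Full breakdown: one cell per membership combination, as % of all rows
region_pct <- 100 * prop.table(table(df))

round(pct_abcd, 1)
```

For example, 45 rows in the ABCD intersection out of 1000 give 4.5, which is the value asked for in the question.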

Add a gap in heatmap with pheatmap package

I made the heatmap using the code below:
library(pheatmap)
library(dplyr)
data = data.frame(matrix(runif(10*10), ncol=10))
data$sample = rep(c("tumour", "normal"), 5)
data$subject.ID = paste('Subject', 1:10)
data = data %>% arrange(sample)
# for row annotation
my_sample_col = data %>% select(sample)
rownames(my_sample_col) = data$subject.ID
# data matrix
mat = as.matrix(data %>% select(-sample, -subject.ID))
rownames(mat) = data$subject.ID
pheatmap(mat,
scale='row',
annotation_row = my_sample_col,
annotation_names_row=F,
cluster_rows = FALSE,
cluster_cols = FALSE,
show_colnames = FALSE,
show_rownames = FALSE)
I want to put a gap between row 5 and row 6, to separate the heatmap according to my row annotation.
In the pheatmap function, the argument gaps_row seems to do the job. Its documentation says:
vector of row indices that show where to put gaps into heatmap. Used only if the rows are not clustered.
I'm not sure how to use it. Can someone help me with this? Thanks a lot.
I would recommend using the ComplexHeatmap package (website; Gu et al., 2016). You can install it with devtools::install_github("jokergoo/ComplexHeatmap").
It offers more functionality, but you also have to invest more time (e.g., in row annotation and matrix scaling).
library(ComplexHeatmap)
# Create annotation for rows
my_sample_col_ano <- rowAnnotation(sample = my_sample_col$sample,
show_annotation_name = FALSE)
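# Scale original matrix row-wise
matS <- t(apply(mat, 1, scale))
# apply() drops the column names; restore them so column_order works below
colnames(matS) <- colnames(mat)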
# Plot heatmap
Heatmap(matS,
# Remove name from fill legend
name = "",
# Keep original row/col order
row_order = rownames(matS), column_order = colnames(matS),
# Add left annotation (legend with tumor/normal)
left_annotation = my_sample_col_ano,
# ACTUAL SPLIT by sample group
row_split = my_sample_col$sample,
show_row_names = FALSE, show_column_names = FALSE,
show_row_dend = FALSE, show_column_dend = FALSE,
row_title = NULL)
If you want to stay with the original pheatmap, pass gaps_row the row index after which the gap should appear, which here equals the size of the first group (i.e., normal):
pheatmap(mat,
scale='row',
gaps_row = 5,
annotation_row = my_sample_col,
annotation_names_row=F,
cluster_rows = FALSE,
cluster_cols = FALSE,
show_colnames = FALSE,
show_rownames = FALSE)
If you have more than two groups, then instead of hardcoding a numeric value for gaps_row (i.e., gaps_row = 5) you can pass this snippet: head(as.numeric(cumsum(table(my_sample_col$sample))), -1).
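To see what that snippet returns, here is a small worked example with three groups. The annotation below is made up; note that table() orders the groups alphabetically, so the snippet assumes the rows are sorted by group in that same order (as arrange(sample) does). The rle() variant is an alternative I am suggesting, not part of the original snippet; it follows the actual row order instead.

```r
# Hypothetical annotation with three groups, rows sorted alphabetically by group
my_sample_col <- data.frame(
  sample = rep(c("metastasis", "normal", "tumour"), times = c(3, 4, 3))
)

# Group sizes -> cumulative row index of each group's last row;
# drop the final value because no gap is needed after the last group
gaps <- head(as.numeric(cumsum(table(my_sample_col$sample))), -1)

# Alternative that uses run lengths in the actual row order
gaps_rle <- head(cumsum(rle(as.character(my_sample_col$sample))$lengths), -1)

gaps      # 3 7: gaps after row 3 and after row 7
gaps_rle  # same positions
```

Either vector can then be passed as gaps_row = gaps in the pheatmap() call above.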

Neither Color Key nor Main Title appear (properly) on heat map (heatmap.2 in R)

I'm learning how to use R, and for an exercise I'm using an experimental ExpressionSet from Bioconductor: http://bioconductor.org/packages/release/data/experiment/html/leukemiasEset.html
The problem is simple: my heat map doesn't display the color key, and the main title appears chopped off on the left side. I had given up on the missing color key after searching the web without finding a solution. But then I added the title, and since it doesn't display properly either, I thought that maybe both problems have the same cause (unknown to me)...
Here is my code:
difexp <- exprs(leukemiasEset)[c(which(adjPval < 0.01)),c(1:12, 25:36)]
heatmap.2(difexp,
trace = "none",
cexCol = 0.6,
ColSideColors = as.character(as.numeric(factor(leukemiasEset$LeukemiaType[c(1:12, 25:36)]))),
main = "Differentially expressed genes in ALL and CLL samples",
cex.main = 1.5,
key = TRUE,
keysize = 1.5,
density.info = "histogram")
Where:
pvalues <- c()
for(i in 1:nrow(exprs(leukemiasEset))) {
R <- t.test(exprs(leukemiasEset)[i, leukemiasEset$LeukemiaType == "ALL"],
exprs(leukemiasEset)[i, leukemiasEset$LeukemiaType == "CLL"],
var.equal = TRUE)
pvalues <- c(pvalues, R$p.value)
}
adjPval <- p.adjust(pvalues, method = "fdr")
And this is what it looks like:
Since I am a beginner, I think this is probably an "easy peasy" for the experts... thank you very much in advance!
Use "\n" to insert a newline in your title and put half of it on a second line.
heatmap.2(difexp,
trace = "none",
cexCol = 0.6,
ColSideColors = as.character(as.numeric(factor(leukemiasEset$LeukemiaType[c(1:12, 25:36)]))),
main = "Differentially expressed genes\nin ALL and CLL samples",
cex.main = 1.5,
key = TRUE,
keysize = 1.5,
density.info = "histogram")
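If you would rather not place the line break by hand, base R's strwrap() can do it. This is just a sketch; the width of 30 is an arbitrary choice you would tune to your plot size.

```r
title <- "Differentially expressed genes in ALL and CLL samples"

# Wrap into lines shorter than 30 characters and join them with newlines
wrapped <- paste(strwrap(title, width = 30), collapse = "\n")

cat(wrapped, "\n")
```

You would then pass main = wrapped to heatmap.2().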
