ggplot2 heatmap : how to preserve the label order? - r

I'm trying to plot heatmap in ggplot2 using csv data following casbon's solution in
http://biostar.stackexchange.com/questions/921/how-to-draw-a-csv-data-file-as-a-heatmap-using-numpy-and-matplotlib
the problem is x-label try to re-sort itself. For example, if I swap label COG0002 and COG0001 in that example data, the x-label still come out in sort order (cog0001, cog0002, cog0003.... cog0008).
Is there anyway to prevent this ? I want to it to be ordered as in csv file
thanks
pp

If I recall, when calling factor(x) with the default levels argument, the levels are set as levels = sort(unique(x)).
You can override this action by setting levels = unique(x).
For example:
set.seed(1)
x = sample(letters, 100, replace = TRUE)
head(x, 5)
[1] "g" "j" "o" "x" "f"
levels(factor(x))
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[20] "t" "u" "v" "w" "x" "y" "z"
levels(factor(x, levels = unique(x)))
[1] "g" "j" "o" "x" "f" "y" "r" "q" "b" "e" "u" "m" "s" "z" "d" "k" "a" "w" "i"
[20] "p" "v" "c" "n" "t" "l" "h"
You can see that setting levels = unique(x) preserves the order of occurrence in the data.

If you want to keep the order directly from the csv file :
foomelt$COG <- factor(foomelt$COG, levels = unique(as.character(foo[[1]])))

Did you try reordering factor levels before plotting?
e.g.
foomelt$COG = factor(foomelt$COG,levels(foomelt$COG)[c(2,1,3:8)])
(I can't try it right now, so I can't be sure that it works)

Related

Sampling alternating blocks of character strings (words) from two fixed lists in R

I am attempting to create a 5000 word vector composed of 500 blocks of 10 words. One block is drawn from sampling with replacement from a fixed list of animals, and this block is to alternate with a fixed list of foods. The following code yields one iteration of what I need:
anim<- data.frame(cbind(stim=list.sample(animals$WORD, 10, replace=T), cond="animal"))
food <- data.frame(cbind(stim=list.sample(foods$WORD, 10, replace=T), cond="food"))
both <- data.frame(rbind(anim, food))
This yields output as follows:
I just cannot figure out how to repeat this procedure 499 more times to create the total vector I need -- I will be running semantic distances between clusters to determine whether I can autosegment the boundaries between foods and animals. I attempted a repeat loop to no avail
Thanks for any ideas!
Since you did not provide any reproducible data, we will assume that LETTERS are food and letters are animals. This line of code generates the vector you specified. Here we are only using batches of 5 to illustrate the process:
result <- as.vector(replicate(5, c(sample(LETTERS, 5, replace=TRUE), sample(letters, 5, replace=TRUE))))
result
# [1] "H" "O" "T" "K" "J" "m" "c" "s" "u" "c" "P" "Y" "V" "U" "Y" "p" "u" "q" "k" "l" "B" "H" "U" "F" "K" "h" "v" "g"
# [29] "c" "d" "X" "F" "R" "N" "U" "v" "t" "u" "q" "x" "N" "E" "G" "Q" "L" "d" "a" "v" "e" "a"

Print vowels from the vector

I need to execute the vowels from the LETTERS R build-in vector
"A", "E", etc.
> LETTERS
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "U"
"V" "W" "X"
[25] "Y" "Z"
Maybe, someone knows how to do it with if() or other functions. Thank you in advance.
Looks like you need extract vowels, does this work:
> vowels <- c('A','E','I','O','U')
> LETTERS[sapply(vowels, function(ch) grep(ch, LETTERS))]
[1] "A" "E" "I" "O" "U"
>

Matching between a vector and multiple vectors in a list in R

I have a list of vectors such as:
>list
[[1]]
[1] "a" "m" "l" "s" "t" "o"
[[2]]
[1] "a" "y" "o" "t" "e"
[[3]]
[1] "n" "a" "s" "i" "d"
I want to find the matches between each of them and the remaining (i.e. between the 1st and the other 2, the 2nd and the other 2, and so on) and keep the couple with the highest number of matches. I could do it with a "for" loop and intersect by couples. For example
for (i in 2:3) { intersect(list[[1]],list[[i]]) }
and then save the output into a vector or some other structure. However, this seems so inefficient to me (given than rather than 3 I have thousands) and I am wondering if R has some built-in function to do that in a clever way.
So the question would be:
Is there a way to look for matches of one vector to a list of vectors without the explicit use of a "for" loop?
I don't believe there is a built-in function for this. The best you could try is something like:
lsts <- lapply(1:5, function(x) sample(letters, 10)) # make some data (see below)
maxcomb <- which.max(apply(combs <- combn(length(lsts), 2), 2,
function(ix) length(intersect(lsts[[ix[1]]], lsts[[ix[2]]]))))
lsts <- lsts[combs[, maxcomb]]
# [[1]]
# [1] "m" "v" "x" "d" "a" "g" "r" "b" "s" "t"
# [[2]]
# [1] "w" "v" "t" "i" "d" "p" "l" "e" "s" "x"
A dump of the original:
[[1]]
[1] "z" "r" "j" "h" "e" "m" "w" "u" "q" "f"
[[2]]
[1] "m" "v" "x" "d" "a" "g" "r" "b" "s" "t"
[[3]]
[1] "w" "v" "t" "i" "d" "p" "l" "e" "s" "x"
[[4]]
[1] "c" "o" "t" "j" "d" "g" "u" "k" "w" "h"
[[5]]
[1] "f" "g" "q" "y" "d" "e" "n" "s" "w" "i"
datal <- list (a=c(2,2,1,2),
b=c(2,2,2,4,3),
c=c(1,2,3,4))
# all possible combinations
combs <- combn(length(datal), 2)
# split into list
combs <- split(combs, rep(1:ncol(combs), each = nrow(combs)))
# calculate length of intersection for every combination
intersections_length <- sapply(combs, function(y) {
length(intersect(datal[[y[1]]],datal[[y[2]]]))
}
)
# What lists have biggest intersection
combs[which(intersections_length == max(intersections_length))]

Letter "y" comes after "i" when sorting alphabetically

When using function sort(x), where x is a character, the letter "y" jumps into the middle, right after letter "i":
> letters
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t"
[21] "u" "v" "w" "x" "y" "z"
> sort(letters)
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "y" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[21] "t" "u" "v" "w" "x" "z"
The reason may be that I am located in Lithuania, and this is "lithuanian-like" sorting of letters, but I need normal sorting. How do I change the sorting method back to normal inside R code?
I'm using R 2.15.2 on Win7.
You need to change the locale that R is running in. Either do that for your entire Windows install (which seems suboptimal) or within the R sessions via:
Sys.setlocale("LC_COLLATE", "C")
You can use any other valid locale string in place of "C" there, but that should get you back to the sort order for letters you want.
Read ?locales for more.
I suppose it is worth noting the sister function Sys.getlocale(), which queries the current setting of a locale parameter. Hence you could do
(locCol <- Sys.getlocale("LC_COLLATE"))
Sys.setlocale("LC_COLLATE", "lt_LT")
sort(letters)
Sys.setlocale("LC_COLLATE", locCol)
sort(letters)
Sys.getlocale("LC_COLLATE")
## giving:
> (locCol <- Sys.getlocale("LC_COLLATE"))
[1] "en_GB.UTF-8"
> Sys.setlocale("LC_COLLATE", "lt_LT")
[1] "lt_LT"
> sort(letters)
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "y" "j" "k" "l" "m" "n"
[16] "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "z"
> Sys.setlocale("LC_COLLATE", locCol)
[1] "en_GB.UTF-8"
> sort(letters)
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o"
[16] "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
> Sys.getlocale("LC_COLLATE")
[1] "en_GB.UTF-8"
which of course is what #Hadley's Answer shows with_collate() doing somewhat more succinctly once you have devtools installed.
If you want to do this temporarily, devtools provides the with_collate function:
library(devtools)
with_collate("C", sort(letters))
# [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
# [20] "t" "u" "v" "w" "x" "y" "z"
with_collate("lt_LT", sort(letters))
# [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "y" "j" "k" "l" "m" "n" "o" "p" "q" "r"
# [20] "s" "t" "u" "v" "w" "x" "z"

Is there a way to reverse the letters of cld in multcomp package?

cld makes a compact letters display of the differences. The greatest different mean gets an "a" the second a "b" and so on. However I want the least mean to get an "a", ie get the letters in a ascending order insted of a descending order.
Here is a reproducible example from the help:
data(warpbreaks)
amod <- aov(breaks ~ tension, data = warpbreaks)
tuk <- glht(amod, linfct = mcp(tension = "Tukey"))
tuk.cld <- cld(tuk)
tuk.cld
I have submitted a contribution to the multcomp package. Now the decreasing flag control the order of the letters. Setting it to TRUE will reverse the order.
data(warpbreaks)
amod <- aov(breaks ~ tension, data = warpbreaks)
tuk <- glht(amod, linfct = mcp(tension = "Tukey"))
tuk.cld <- cld(tuk)
tuk.cld
tuk.cld_dec <- cld(tuk, decreasing = TRUE)
tuk.cld_dec
I'm not familiar w/ the multcomp package, but I noticed that
tuk.cld$mcletters$aLetters
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[20] "t" "u" "v" "w" "x" "y" "z" "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L"
[ 39] "M" "N" "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
Which suggest to me that there's a command switch in cld() that lets you choose whatever set of identifiers you want. E.g.
rev.lets<-rev(c(letters,LETTERS))

Resources