Sort qualitative variable with groups and keeping the indexes - r

I have a variable made up of 6 different letters. I need to sort it and obtain the indexes for each of the 6 groups, so that I can sort a dataset according to this qualitative variable.
Here's the variable:
data = c("H", "H", "A", "A", "B", "R", "E", "B", "E", "B", "A", "E",
"R", "R", "I", "B", "I", "I", "H", "A", "E", "I", "B", "I", "H",
"B", "R", "E", "B", "R", "H", "R", "I", "A", "B", "E", "A", "E",
"I", "H", "A", "E", "I", "H", "R", "H", "A", "R")
If I sort this, I only get the alphabetical order:
data_idx = sort(data, index.return = TRUE)
How can I obtain these indexes and reorder this variable?

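Because sort returns a list when index.return = TRUE, we can extract the index vector with either $ or [[:
sort(data, index.return = TRUE)$ix
Another option is order, which returns the sorting indexes directly:
order(data)
If we instead need a group index for each element (groups numbered by first appearance):
match(data, unique(data))
Or, to get the positions that belong to each letter as a list:
split(seq_along(data), data)
Or, with ave, a running count within each letter:
ave(seq_along(data), data, FUN = seq_along)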
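To see what these calls return, here is a minimal sketch on a shortened vector. The data frame df and its value column are hypothetical, added only to show how the index reorders a dataset.

```r
# Shortened example vector (same idea as the 6-letter variable above)
data <- c("H", "A", "B", "A", "H", "B")

# Sorting index: positions of the elements in alphabetical order
idx <- order(data)
idx            # 2 4 3 6 1 5
data[idx]      # "A" "A" "B" "B" "H" "H"

# Positions grouped by letter
groups <- split(seq_along(data), data)
groups$A       # 2 4

# Group number for each element, in order of first appearance
grp_id <- match(data, unique(data))
grp_id         # 1 2 3 2 1 3

# Hypothetical dataset reordered by the qualitative variable
df <- data.frame(letter = data, value = 1:6, stringsAsFactors = FALSE)
df_sorted <- df[idx, ]
```

The same idx can be reused to reorder any other object that has one row or element per entry of data.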

Related

Color parts of x-axis characters [duplicate]

Edit: This question addresses how to color only subsets of the x-axis labels. This is not a duplicate question.
I have made the x-axis labels to represent a nucleotide sequence, and I would like to add color to different sections of the nucleotides. How?
Thanks.
ggplot(data = miRNA3) +
geom_line(mapping = aes(x = Position, y = Count_combined)) +
scale_y_continuous(breaks = seq(0, 120, 10)) +
ylab("Count") +
scale_x_continuous(breaks=1:150, labels=c("T", "G", "A", "T", "G", "T", "C", "C", "G", "T", "G", "T", "C", "C", "A", "C", "T", "C", "G", "T", "T", "G", "T", "T", "T", "T", "C", "A", "A", "C", "T", "T", "C", "T", "T", "C", "C", "C", "G", "C", "A", "A", "T", "T", "T", "A", "C", "C", "T", "T", "C", "A", "T", "G", "G", "T", "T", "A", "A", "A", "C", "A", "A", "T", "A", "A", "A", "T", "C", "A", "G", "C", "T", "A", "A", "G", "G", "T", "A", "T", "G", "G", "A", "C", "A", "C", "T", "G", "T", "A", "A", "C", "T", "A", "C", "T", "C", "T", "G", "A", "A", "G", "G", "T", "A", "A", "G", "T", "T", "G", "C", "G", "A", "G", "A", "G", "G", "A", "A", "G", "T", "T", "T", "C", "A", "A", "G", "T", "A", "G", "C", "A", "T", "T", "G", "G", "A", "T", "T", "C", "G", "G", "A", "C", "G", "T", "T", "A", "T", "G"), expand = c(0, 0)) +
xlab("Supercontig_1.4289:xxx-xxx") +
theme(panel.grid.minor.x=element_blank(),
panel.grid.major.x=element_blank(),
panel.grid.minor.y=element_blank())
Edit: I would like to make something like this (see the letters on the x-axis):
library(ggplot2)
df = data.frame(x = 1:4, y = 1:4)
my_labs = c("G", "A", "A", "T")
my_cols = c("red", "blue", "blue", "chartreuse")
ggplot(df, aes(x, y)) + geom_point() +
scale_x_continuous(breaks = 1:4, labels = my_labs) +
theme(axis.text.x = element_text(color = my_cols))
I had no idea this was possible until I saw @UnivStudent's comment. Pretty cool!
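For a long sequence, the colour vector does not have to be typed out label by label. A minimal sketch, assuming the sections are contiguous runs of known length (the section lengths and colours here are hypothetical):

```r
# First ten labels of the nucleotide sequence from the question
labs <- c("T", "G", "A", "T", "G", "T", "C", "C", "G", "T")

# Hypothetical sections: positions 1-3, 4-7 and 8-10
section_lengths <- c(3, 4, 3)
section_cols    <- c("red", "black", "blue")

# One colour per label, repeated across each section
label_cols <- rep(section_cols, times = section_lengths)
label_cols
```

label_cols can then be passed as theme(axis.text.x = element_text(color = label_cols)), as in the example above. Recent ggplot2 versions warn that vectorized input to element_text is not officially supported, so this may change in future releases.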

How to order data.frame in my specific 'vector' order in R language?

I have a data.frame shown below:
To analyse the relationship between those 10 features and disorder propensity, I need to sort the data.frame by my own amino acid order, which is stored in a vector like this: c("L", "I", "V", "Y", "C", "F", "R", "W", "M", "H", "N", "T", "G", "D", "Q", "A", "K", "S", "P", "E")
I tried properties[aa == c("L", "I", "V", "Y", "C", "F", "R", "W", "M", "H", "N", "T", "G", "D", "Q", "A", "K", "S", "P", "E"), ], which doesn't seem to work for me.
What's the right way to sort the data.frame in my 'vector' order?
You can make your column aa a factor and give the factor levels in the correct order. The factor can then be sorted according to the levels. Look at this example:
my_order <- c("X", "Y", "Z", "A", "B") # defines the order
test <- c("A", "B", "Y", "Z", "Z", "A", "X", "X", "B") # a normal character vector
test2 <- factor(test, levels = my_order) # convert it to factor and specify the levels
test2 # original order unchanged
test2[order(test2)] # ordered by custom order
Note that you must specify all occurring factor levels: any value that is missing from levels becomes NA in the factor.
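Applied to a data frame, the same idea might look like the sketch below. The amino acid order is the one from the question; the small properties data frame and its score column are hypothetical stand-ins, since the real data is not shown.

```r
# Custom amino acid order from the question
aa_order <- c("L", "I", "V", "Y", "C", "F", "R", "W", "M", "H",
              "N", "T", "G", "D", "Q", "A", "K", "S", "P", "E")

# Hypothetical stand-in for the real data frame
properties <- data.frame(aa = c("A", "L", "E", "V", "I"),
                         score = 1:5,
                         stringsAsFactors = FALSE)

# Convert the column to a factor with the custom level order,
# then sort the rows by that factor
properties$aa <- factor(properties$aa, levels = aa_order)
sorted <- properties[order(properties$aa), ]
as.character(sorted$aa)   # "L" "I" "V" "A" "E"
```

If converting the column is undesirable, properties[order(match(properties$aa, aa_order)), ] gives the same row order without changing its type.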

Replacement with vectors

I have a vector with all consonants, and I want every consonant in a given character vector to be replaced with a "C". Assume my data is x below:
x <- c("abacate", "papel", "importante")
v <- c("a", "e", "i", "o", "u")
c <- c("b", "c", "d", "f", "g", "h", "j", "k", "l", "m", "n", "p", "q", "r", "s", "t", "v", "w", "x", "y", "z")
find <- c
replace <- "C"
found <- match(x, find)
ifelse(is.na(found), x, replace[found])
This is not working. Could anybody tell me what the problem is and how I can fix it?
Thanks
Regular expressions (gsub) are far more flexible in general, but for this particular problem you can also use the chartr function, which is likely to run faster:
old <- c("b", "c", "d", "f", "g", "h", "j", "k", "l", "m", "n",
"p", "q", "r", "s", "t", "v", "w", "x", "y", "z")
new <- rep("C", length(old))
chartr(paste(old, collapse = ""),
paste(new, collapse = ""), x)
Use gsub to replace the letters in a character vector:
c <- c("b", "c", "d", "f", "g", "h", "j", "k", "l", "m", "n", "p", "q", "r", "s", "t", "v", "w", "x", "y", "z")
consonants = paste(c("[", c, "]"), collapse="")
replaced = gsub(consonants, "C", x)
consonants becomes a regular expression, [bcdfghjklmnpqrstvwxyz], that means "any letter inside the brackets."
One of the reasons your code wasn't working is that match doesn't look for strings within other strings, it only looks for exact matches. For example:
> match(c("a", "b"), "a")
[1] 1 NA
> match(c("a", "b"), "apple")
[1] NA NA
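If you prefer to stay close to the original matching idea, one option is to split each word into single characters first, so that exact matching applies letter by letter. A minimal sketch (the helper name replace_consonants is made up for illustration):

```r
x <- c("abacate", "papel", "importante")
consonants <- c("b", "c", "d", "f", "g", "h", "j", "k", "l", "m", "n",
                "p", "q", "r", "s", "t", "v", "w", "x", "y", "z")

# Hypothetical helper: split one word into letters, replace the
# letters that are consonants, and glue the word back together
replace_consonants <- function(word) {
  chars <- strsplit(word, "")[[1]]
  chars[chars %in% consonants] <- "C"
  paste(chars, collapse = "")
}

result <- vapply(x, replace_consonants, character(1), USE.NAMES = FALSE)
result   # "aCaCaCe" "CaCeC" "iCCoCCaCCe"
```

This is more verbose than gsub or chartr, but it makes explicit why match on whole words returned NA.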

tkplot in latex via knitr and igraph

This may be a wild, strange dream. I dreamt that I could put a tkplot from igraph inside a LaTeX document via knitr. I know Yihui is known for animation stuff, so I thought maybe this is possible. A Google search didn't show what I was after, so here's a non-working attempt:
\documentclass[a4paper]{scrartcl}
\begin{document}
<<setup, include=FALSE, cache=FALSE>>=
library(igraph)
@
<<network>>=
edges <- structure(c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J",
"A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "A", "B", "C",
"D", "E", "F", "G", "H", "I", "J", "E", "G", "G", "F", "H", "G",
"D", "J", "J", "D", "B", "C", "D", "I", "I", "H", "A", "B", "G",
"I", "F", "D", "F", "J", "D", "B", "E", "E", "A", "E"), .Dim = c(30L,
2L), .Dimnames = list(NULL, c("person", "choice")))
g <- graph.data.frame(edges, directed=TRUE)
tkplot(g)
@
\end{document}
OK, a quick and dirty answer:
\documentclass{article}
\begin{document}
<<setup, include=FALSE, cache=FALSE>>=
library(igraph)
library(tcltk)
knit_hooks$set(igraph = function(before, options, envir) {
  if (before) return()
  path = knitr:::fig_path('.eps')
  tkpostscript(igraph:::.tkplot.get(options$igraph)$canvas,
               file = path)
  sprintf('\\includegraphics{%s}', path)
})
@
<<network, igraph=1>>=
edges <- structure(c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J",
"A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "A", "B", "C",
"D", "E", "F", "G", "H", "I", "J", "E", "G", "G", "F", "H", "G",
"D", "J", "J", "D", "B", "C", "D", "I", "I", "H", "A", "B", "G",
"I", "F", "D", "F", "J", "D", "B", "E", "E", "A", "E"), .Dim = c(30L,
2L), .Dimnames = list(NULL, c("person", "choice")))
g <- graph.data.frame(edges, directed=TRUE)
tkplot(g)
@
\end{document}
Feel free to polish it with hook_plot_custom.

How can I partition a vector?

How can I build a function
slice(x, n)
which would return a list of vectors where each vector except maybe the last has size n, i.e.
slice(letters, 10)
would return
list(c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j"),
c("k", "l", "m", "n", "o", "p", "q", "r", "s", "t"),
c("u", "v", "w", "x", "y", "z"))
?
slice <- function(x, n) {
  N <- length(x)
  lapply(seq(1, N, n), function(i) x[i:min(i + n - 1, N)])
}
You can use the split function:
split(letters, as.integer((seq_along(letters) - 1) / 10))
If you want to make this into a new function:
slice <- function(x, n) split(x, as.integer((seq_along(x) - 1) / n))
slice(letters, 10)
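Note that split names the list elements after the group labels ("0", "1", "2" above). If an unnamed list like the one in the question is wanted, a small variation is sketched below; the use of ceiling and unname is a choice made here, not part of the answers above.

```r
# Variation on the split approach: group 1 holds elements 1..n,
# group 2 holds elements n+1..2n, and so on
slice <- function(x, n) {
  stopifnot(n >= 1)
  groups <- ceiling(seq_along(x) / n)
  unname(split(x, groups))   # drop the "1", "2", ... names
}

chunks <- slice(letters, 10)
lengths(chunks)   # 10 10 6
chunks[[3]]       # "u" "v" "w" "x" "y" "z"
```

Unlike the seq-based version, this also returns an empty list for a zero-length input instead of raising an error.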
