How to order a data.frame in my specific 'vector' order in R?

I have a data.frame, shown below:
In order to analyse the relationship between those 10 features and disorder propensity, I need to sort the data.frame by my amino acid order, which is stored in a vector like this: c("L", "I", "V", "Y", "C", "F", "R", "W", "M", "H", "N", "T", "G", "D", "Q", "A", "K", "S", "P", "E")
I tried properties[aa == c("L", "I", "V", "Y", "C", "F", "R", "W", "M", "H", "N", "T", "G", "D", "Q", "A", "K", "S", "P", "E"), ], which doesn't seem to work for me.
What's the right way to sort the data.frame in my 'vector' order?

You can make your column aa a factor and give the factor levels in the correct order. The factor can then be sorted according to the levels. Look at this example:
my_order <- c("X", "Y", "Z", "A", "B") # defines the order
test <- c("A", "B", "Y", "Z", "Z", "A", "X", "X", "B") # a normal character vector
test2 <- factor(test, levels = my_order) # convert it to factor and specify the levels
test2 # original order unchanged
test2[order(test2)] # ordered by custom order
Note that you must specify every occurring value as a factor level: any value missing from levels becomes NA, and order() puts NA entries last.
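Applied to the data frame from the question, the same idea might look like the sketch below. The contents of properties are not shown in the question, so the data frame here is a hypothetical stand-in with a made-up feature1 column; only the column name aa and the order vector come from the question.

```r
# Custom order taken from the question.
aa_order <- c("L", "I", "V", "Y", "C", "F", "R", "W", "M", "H",
              "N", "T", "G", "D", "Q", "A", "K", "S", "P", "E")

# Hypothetical stand-in for the real data: one row per amino acid,
# initially in alphabetical order, with a made-up numeric feature.
properties <- data.frame(
  aa = sort(aa_order),
  feature1 = seq_along(aa_order),
  stringsAsFactors = FALSE
)

# Option 1: order by a factor whose levels carry the custom order.
sorted_by_factor <- properties[order(factor(properties$aa, levels = aa_order)), ]

# Option 2: order by each value's position in the custom vector.
sorted_by_match <- properties[order(match(properties$aa, aa_order)), ]
```

Both options reorder the rows without changing the type of the aa column. The attempt in the question, aa == c(...), compares the two vectors element by element, so it selects rows rather than reordering them.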

Related

An ifelse statement that checks a variable for a letter

I apologize if this is basic. I am trying to perform hex-to-dec and dec-to-hex conversions on a variable in a data frame in R. I need to "sort" the values in my data frame into two groups: character strings that contain a letter and those that do not (i.e. whether they are in hex or in dec).
My solution is to create a new variable with mutate and an ifelse statement, but with my code below no character string is recognized as containing a letter.
df$PITnumF contains this:
3D91BF15B9C2D,
985120013429805
My attempt to mutate/ifelse
mutate(df, h2df = ifelse(df$PITnumF %in% c("A", "B", "C", "D", "E", "F", "G", "H"
, "I", "J", "K", "L", "M", "N", "O", "P",
"Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z"),
(.hex.to.dec(df$PITnumF)), (.dec.to.hex(df$PITnumF))))
Thank you for your time.
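One likely cause is that %in% compares whole strings against the set, so "3D91BF15B9C2D" is never equal to a single letter such as "A". A minimal sketch of a substring test with grepl follows, using only the two sample values from the question. The labels "hex" and "dec" are placeholders for the question's own conversion functions, which are not called here. Note that a hex string made up only of digits would not be detected this way.

```r
# The two sample values from the question.
PITnumF <- c("3D91BF15B9C2D", "985120013429805")

# %in% compares whole strings, so neither value equals a single letter.
exact_match <- PITnumF %in% LETTERS

# grepl() looks for the pattern anywhere inside each string.
has_letter <- grepl("[A-Za-z]", PITnumF)

# Label each value; the conversion functions from the question
# would take the place of these placeholder labels.
kind <- ifelse(has_letter, "hex", "dec")
```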

Sort qualitative variable with groups and keeping the indexes

I have a variable composed of 6 different letters. I need to sort it and obtain 6 different indexes, so that I can sort a dataset according to this qualitative variable.
Here's the variable:
data = c("H", "H", "A", "A", "B", "R", "E", "B", "E", "B", "A", "E",
"R", "R", "I", "B", "I", "I", "H", "A", "E", "I", "B", "I", "H",
"B", "R", "E", "B", "R", "H", "R", "I", "A", "B", "E", "A", "E",
"I", "H", "A", "E", "I", "H", "R", "H", "A", "R")
If I sort this, I obtain only the alphabetical order:
data_idx = sort(data, index.return = TRUE)
How can I obtain these indexes and reorder this variable?
We can extract the index with either $ or [[, because sort returns a list when we use index.return = TRUE:
sort(data, index.return = TRUE)$ix
Another option is order:
order(data)
If we need a group index for each element, numbered by first appearance:
match(data, unique(data))
Or maybe the positions of each letter, as a list:
split(seq_along(data), data)
Or, with ave, a running count within each letter:
ave(seq_along(data), data, FUN = seq_along)
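To show how such indexes reorder a dataset, here is a small sketch. The shortened vector and the dataset data frame (with its letter and value columns) are hypothetical, since the question does not show the actual dataset.

```r
# A shortened, hypothetical version of the variable from the question.
data <- c("H", "H", "A", "B", "R", "E", "B", "I")

# Hypothetical dataset with one row per element of `data`.
dataset <- data.frame(letter = data, value = seq_along(data),
                      stringsAsFactors = FALSE)

# Permutation that sorts the variable alphabetically; ties keep their original order.
idx <- order(data)
sorted_dataset <- dataset[idx, ]

# Row positions of each letter, one list element per distinct letter.
positions <- split(seq_along(data), data)
```

Here positions$B holds the rows where "B" occurs, so dataset[positions$B, ] would pull out just that group.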

Variables order for ggplot

My dataframe:
Variable <- sample(-9:10)
Levels<-rep(c("N", "A", "L","B", "O", "C", "U", "R", "E", "Y" ),times=2)
ID<-rep(c("WT", "KO"), each=10)
df <- data.frame(Variable, Levels, ID)
I run ggplot and I get this:
If I add these two lines
df$ID = factor(df$ID, c("WT", "KO"))
df$Levels = factor(df$Levels, c("N", "A", "L", "B", "O", "C", "U", "R", "E", "Y"))
I can get this
But there must be a way to do this without entering the levels manually.
Just create your initial data frame with the correct factor, i.e.
df = data.frame(Variable, Levels = factor(Levels, levels = unique(Levels)), ID)
The unique function helpfully returns the values in order of first appearance. Alternatively,
levels = c("N", "A", "L", "B", "O", "C", "U", "R", "E", "Y")
Levels = factor(rep(levels, times = 2), levels = levels)
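As a quick check of that approach, the sketch below rebuilds the data frame from the question and inspects the resulting levels. The set.seed call is an addition made only so the sketch is reproducible, and applying the same trick to ID is an assumption about what the asker wants.

```r
set.seed(1)  # assumption: fixed seed, only for reproducibility
Variable <- sample(-9:10)
Levels <- rep(c("N", "A", "L", "B", "O", "C", "U", "R", "E", "Y"), times = 2)
ID <- rep(c("WT", "KO"), each = 10)

# Levels follow order of first appearance instead of alphabetical order.
df <- data.frame(
  Variable,
  Levels = factor(Levels, levels = unique(Levels)),
  ID = factor(ID, levels = unique(ID))
)

levels(df$Levels)
levels(df$ID)
```

Because ggplot2 orders discrete axes and facets by factor levels, a plot built from this df should follow the N, A, L, ... and WT, KO order without any levels being typed by hand.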

Replacement with vectors

I have a vector with all consonants and I want every single consonant to be replaced with a "C" in a given character vector. Assume my data is x below:
x <- c("abacate", "papel", "importante")
v <- c("a", "e", "i", "o", "u")
c <- c("b", "c", "d", "f", "g", "h", "j", "k", "l", "m", "n", "p", "q", "r", "s", "t", "v", "w", "x", "y", "z")
find <- c
replace <- "C"
found <- match(x, find)
ifelse(is.na(found), x, replace[found])
This is not working. Could anybody tell me what the problem is and how I can fix it?
Thanks
Regular expressions (gsub) are far more flexible in general, but for this particular problem you can also use the chartr function, which is likely to run faster:
old <- c("b", "c", "d", "f", "g", "h", "j", "k", "l", "m", "n",
"p", "q", "r", "s", "t", "v", "w", "x", "y", "z")
new <- rep("C", length(old))
chartr(paste(old, collapse = ""),
paste(new, collapse = ""), x)
Use gsub to replace the letters in a character vector:
c <- c("b", "c", "d", "f", "g", "h", "j", "k", "l", "m", "n", "p", "q", "r", "s", "t", "v", "w", "x", "y", "z")
consonants = paste(c("[", c, "]"), collapse="")
replaced = gsub(consonants, "C", x)
consonants becomes a regular expression, [bcdfghjklmnpqrstvwxyz], which means "any letter inside the brackets."
One of the reasons your code wasn't working is that match doesn't look for strings within other strings; it only looks for exact matches. For example:
> match(c("a", "b"), "a")
[1] 1 NA
> match(c("a", "b"), "apple")
[1] NA NA
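Putting the two answers side by side, the following sketch runs both approaches on the vector from the question. The name consonant_chars is introduced here for illustration, to avoid calling a vector c, which is easy to confuse with the c() function.

```r
x <- c("abacate", "papel", "importante")
consonant_chars <- c("b", "c", "d", "f", "g", "h", "j", "k", "l", "m", "n",
                     "p", "q", "r", "s", "t", "v", "w", "x", "y", "z")

# chartr(): translate each consonant to "C", character by character.
old <- paste(consonant_chars, collapse = "")
new <- strrep("C", length(consonant_chars))
via_chartr <- chartr(old, new, x)

# gsub(): replace anything matching the character class [bcdf...z].
pattern <- paste0("[", old, "]")
via_gsub <- gsub(pattern, "C", x)

via_chartr
# [1] "aCaCaCe"    "CaCeC"      "iCCoCCaCCe"
```

Both calls give the same result for this input. Uppercase consonants are left untouched by either version.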

How can I partition a vector?

How can I build a function
slice(x, n)
which would return a list of vectors where each vector except maybe the last has size n, i.e.
slice(letters, 10)
would return
list(c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j"),
c("k", "l", "m", "n", "o", "p", "q", "r", "s", "t"),
c("u", "v", "w", "x", "y", "z"))
?
slice <- function(x, n) {
  N <- length(x)
  lapply(seq(1, N, n), function(i) x[i:min(i + n - 1, N)])
}
You can use the split function:
split(letters, as.integer((seq_along(letters) - 1) / 10))
If you want to make this into a new function:
slice <- function(x, n) split(x, as.integer((seq_along(x) - 1) / n))
slice(letters, 10)
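Note that split returns a named list (here with names "0", "1" and "2"), whereas the list() in the question is unnamed. A possible variant, sketched below, computes the chunk number with ceiling and drops the names. It is an alternative formulation, not the code from the answer above.

```r
# Variant of the split() approach: ceiling(i / n) is the chunk number of element i.
slice <- function(x, n) {
  chunks <- split(x, ceiling(seq_along(x) / n))
  unname(chunks)  # drop the "1", "2", ... names to match the list() in the question
}

parts <- slice(letters, 10)
lengths(parts)
# [1] 10 10  6
```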
