Variables order for ggplot - r

My dataframe:
Variable <- sample(-9:10)
Levels<-rep(c("N", "A", "L","B", "O", "C", "U", "R", "E", "Y" ),times=2)
ID<-rep(c("WT", "KO"), each=10)
df <- data.frame(Variable, Levels, ID)
I run ggplot and I get this:
If I add these two lines
df$ID=factor(df$ID, c("WT","KO"))
df$Levels=factor(df$Levels, c("N", "A", "L","B", "O", "C", "U", "R", "E", "Y" ))
I can get this
But there must be a way to do this without entering the levels manually.

Just create your initial data frame with the factor levels already in the right order, i.e.
df = data.frame(Variable, Levels = factor(Levels, levels = unique(Levels)), ID)
unique() returns values in order of first appearance, so the levels keep the order they have in the data. Alternatively, define the levels once and reuse them:
lev = c("N", "A", "L", "B", "O", "C", "U", "R", "E", "Y")
Levels = factor(rep(lev, times = 2), levels = lev)
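As a minimal sketch of the unique() approach, assuming the data frame from the question (set.seed is added here only to make the sample reproducible), both grouping columns can be converted in place so that ggplot follows the order of first appearance:

```r
set.seed(1)  # assumption: only here so the sketch is reproducible
Variable <- sample(-9:10)
Levels <- rep(c("N", "A", "L", "B", "O", "C", "U", "R", "E", "Y"), times = 2)
ID <- rep(c("WT", "KO"), each = 10)
df <- data.frame(Variable, Levels, ID, stringsAsFactors = FALSE)

# Factor levels follow the order in which values first appear in the data
df$Levels <- factor(df$Levels, levels = unique(df$Levels))
df$ID <- factor(df$ID, levels = unique(df$ID))

levels(df$Levels)  # "N" "A" "L" "B" "O" "C" "U" "R" "E" "Y"
levels(df$ID)      # "WT" "KO"
```

This only helps when the data already arrive in the order you want to plot; otherwise the levels still have to be given explicitly.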

Related

An ifelse statement that checks a variable for a letter

I apologize if this is basic. I am trying to apply hex-to-dec and dec-to-hex functions to a variable in a data frame in R. I need to split the values into two groups: character strings that contain a letter (hex) and those that do not (dec).
My solution is to create a new variable with mutate and an ifelse statement, but with my code below no character string is recognized as containing a letter.
df$PITnumF contains this:
3D91BF15B9C2D,
985120013429805
My attempt to mutate/ifelse
mutate(df, h2df = ifelse(df$PITnumF %in% c("A", "B", "C", "D", "E", "F", "G", "H"
, "I", "J", "K", "L", "M", "N", "O", "P",
"Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z"),
(.hex.to.dec(df$PITnumF)), (.dec.to.hex(df$PITnumF))))
Thank you for your time.
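A minimal sketch of why the test above never fires, assuming the two sample values shown: %in% compares whole strings against the set, so a 13-character ID is never equal to a single letter. A pattern test with grepl looks inside each string instead. The name is_hex is hypothetical, and the conversion step is left out on purpose.

```r
PITnumF <- c("3D91BF15B9C2D", "985120013429805")

# %in% asks "is the whole string one of these letters?", which is never true here
PITnumF %in% LETTERS          # FALSE FALSE

# grepl asks "does the string contain a hex letter anywhere?"
is_hex <- grepl("[A-Fa-f]", PITnumF)
is_hex                        # TRUE FALSE
```

One caveat with this sketch: a hex string that happens to contain only digits would be classified as decimal, so the flag is only as reliable as that assumption about the data.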

Sort qualitative variable with groups and keeping the indexes

I have a variable made up of 6 different letters. I need to sort it and obtain 6 different indexes, so that I can sort a dataset according to this qualitative variable.
Here's the variable:
data = c("H", "H", "A", "A", "B", "R", "E", "B", "E", "B", "A", "E",
"R", "R", "I", "B", "I", "I", "H", "A", "E", "I", "B", "I", "H",
"B", "R", "E", "B", "R", "H", "R", "I", "A", "B", "E", "A", "E",
"I", "H", "A", "E", "I", "H", "R", "H", "A", "R")
If I sort this, I only get the alphabetical order:
data_idx = sort(data, index.return = TRUE)
How can I obtain these indexes and reorder this variable?
Because sort returns a list when index.return = TRUE, we can extract the indexes with either $ or [[:
sort(data, index.return = TRUE)$ix
Another option is order:
order(data)
If we need a group index for each element, numbered by first appearance:
match(data, unique(data))
Or, to get the positions of each letter as a list:
split(seq_along(data), data)
Or, with ave, a running count within each letter:
ave(seq_along(data), data, FUN = seq_along)
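To make the difference between these options concrete, here is a small sketch on a shortened, hypothetical vector x (not the full data from the question):

```r
x <- c("H", "A", "B", "A", "H")

# Positions that would put x in alphabetical order (ties keep original order)
ord <- order(x)                                       # 2 4 3 1 5
x[ord]                                                # "A" "A" "B" "H" "H"

# Group number per element, numbered by first appearance (H = 1, A = 2, B = 3)
grp <- match(x, unique(x))                            # 1 2 3 2 1

# Positions of each letter, as a named list
pos <- split(seq_along(x), x)                         # $A 2 4  $B 3  $H 1 5

# Running count of each letter as it recurs
running <- ave(seq_along(x), x, FUN = seq_along)      # 1 1 1 2 2
```

order is the one to use for reordering a whole dataset, e.g. dataset[order(dataset$column), ], where dataset and column stand in for your own names.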

Stacked barplot with percentage in R ggplot2 for categorical variables from scratch

I've installed ggplot2 3.1.0 on my PC. My raw data look like (short example of the long data frame):
structure(list(genotype = c("A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "C", "C", "C", "C", "C", "C", "C"),
type = c("M", "M", "M", "M", "M", "M", "M", "M", "M", "M",
"M", "M", "R", "R", "R", "R", "R", "R", "R")), row.names = c(NA,
19L), class = "data.frame")
I used this code to obtain the following plot:
library(ggplot2)
ggplot(df_test,aes(x=factor(genotype),fill=factor(type)))+geom_bar(stat="count")+xlab("Genotype")
But now I need to replace the counts with percentages, to show that 100% of the observations with genotype A have type M, and likewise that 100% of the observations with genotype C have type R. Following another post, I tried:
ggplot(df,aes(type,x=genotype,fill=type)+geom_bar(stat="identity")+xlab("Genotype")+scales::percent)
But got the error:
Error in aes(y = type, x = genotype, fill = type) + geom_bar(stat = "identity") + :
non-numeric argument to binary operator
Could you please help me to fix the error?
Is this it?
library(tidyverse)
df<-data.frame(type=c(rep("A",5),rep("C",5)),genotype=c(rep("M",5),rep("R",5)))
df %>%
mutate_if(is.character,as.factor) %>%
ggplot(aes(genotype,fill=type))+geom_bar(position="fill")+
theme_minimal()+
scale_y_continuous(labels=scales::percent_format())
I used the solution provided by NelsonGon, but made it a bit shorter. I also try to avoid loading extra packages where possible. The code that worked for me was:
ggplot(df_test,aes(x=factor(genotype),fill=factor(type)))+geom_bar(position="fill")+xlab("Genotype")+scale_y_continuous(labels=scales::percent_format())
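For checking the numbers behind the plot, the shares that position="fill" draws can be computed in base R. This is a sketch that rebuilds the 19-row sample from the question; the names counts and shares are hypothetical.

```r
# Rebuild the sample data: 12 rows of genotype A / type M, 7 rows of C / R
df_test <- data.frame(
  genotype = rep(c("A", "C"), times = c(12, 7)),
  type     = rep(c("M", "R"), times = c(12, 7)),
  stringsAsFactors = FALSE
)

counts <- table(df_test$type, df_test$genotype)  # rows: type, columns: genotype
shares <- prop.table(counts, margin = 2)         # each genotype column sums to 1

round(100 * shares)  # percentage of each type within each genotype
```

With this sample every genotype maps to a single type, so each column contains one 100 and one 0, which matches the fully filled bars in the plot.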

How to order data.frame in my specific 'vector' order in R language?

I have a data.frame shown below:
In order to analyse the relationship between those 10 features and disorder propensity, I need to sort the data.frame in my amino acid order, which is stored in a vector like this c("L", "I", "V", "Y", "C", "F", "R", "W", "M", "H", "N", "T", "G", "D", "Q", "A", "K", "S", "P", "E")
I tried this properties[aa == c("L", "I", "V", "Y", "C", "F", "R", "W", "M", "H", "N", "T", "G", "D", "Q", "A", "K", "S", "P", "E"), ] which doesn't seem to work for me.
What's the right way to sort the data.frame in my 'vector' order?
You can make your column aa a factor and give the factor levels in the correct order. The factor can then be sorted according to the levels. Look at this example:
my_order <- c("X", "Y", "Z", "A", "B") # defines the order
test <- c("A", "B", "Y", "Z", "Z", "A", "X", "X", "B") # a normal character vector
test2 <- factor(test, levels = my_order) # convert it to factor and specify the levels
test2 # original order unchanged
test2[order(test2)] # ordered by custom order
Note that you must specify every value that occurs in the data as a level: any value missing from levels becomes NA.
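Applied to a data frame, the same idea can be sketched as follows. The data frame here is a hypothetical four-row stand-in for properties, with a shortened order vector and an invented value column; match gives each row its position in the custom order, and order turns those positions into a row permutation.

```r
aa_order <- c("L", "I", "V", "Y")  # shortened, hypothetical version of the full order
properties <- data.frame(
  aa    = c("V", "L", "Y", "I"),
  value = c(3, 1, 4, 2),           # hypothetical feature column
  stringsAsFactors = FALSE
)

# Rank each row by where its aa sits in aa_order, then reorder the rows
sorted <- properties[order(match(properties$aa, aa_order)), ]
sorted$aa     # "L" "I" "V" "Y"

# A value that is not listed in the levels turns into NA
f <- factor(c("L", "Q"), levels = aa_order)
is.na(f)      # FALSE TRUE
```

The original attempt with aa == c(...) compares the two vectors element by element, which tests equality at each position rather than sorting anything.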

Replacement with vectors

I have a vector with all consonants and I want every single consonant in my data to be replaced with a "C". Assume my data is x below:
x <- c("abacate", "papel", "importante")
v <- c("a", "e", "i", "o", "u")
c <- c("b", "c", "d", "f", "g", "h", "j", "k", "l", "m", "n", "p", "q", "r", "s", "t", "v", "w", "x", "y", "z")
find <- c
replace <- "C"
found <- match(x, find)
ifelse(is.na(found), x, replace[found])
This is not working. Could anybody tell me what the problem is and how I can fix it?
Thanks
Regular expressions (gsub) are far more flexible in general, but for that particular problem you can also use the chartr function which will run faster:
old <- c("b", "c", "d", "f", "g", "h", "j", "k", "l", "m", "n",
"p", "q", "r", "s", "t", "v", "w", "x", "y", "z")
new <- rep("C", length(old))
chartr(paste(old, collapse = ""),
paste(new, collapse = ""), x)
Use gsub to replace the letters in a character vector:
c <- c("b", "c", "d", "f", "g", "h", "j", "k", "l", "m", "n", "p", "q", "r", "s", "t", "v", "w", "x", "y", "z")
consonants = paste(c("[", c, "]"), collapse="")
replaced = gsub(consonants, "C", x)
consonants becomes a regular expression, [bcdfghjklmnpqrstvwxyz], that means "any letter inside the brackets."
One of the reasons your code wasn't working is that match doesn't look for strings within other strings, it only looks for exact matches. For example:
> match(c("a", "b"), "a")
[1] 1 NA
> match(c("a", "b"), "apple")
[1] NA NA
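Putting the two answers side by side on the x from the question, a short sketch (the names by_gsub and by_chartr are hypothetical) shows that both approaches give the same result:

```r
x <- c("abacate", "papel", "importante")
old <- "bcdfghjklmnpqrstvwxyz"

# Regular expression: replace any character inside the brackets with "C"
by_gsub <- gsub(paste0("[", old, "]"), "C", x)

# chartr: map each consonant to "C"; the replacement must be as long as old
by_chartr <- chartr(old, strrep("C", nchar(old)), x)

by_gsub                        # "aCaCaCe" "CaCeC" "iCCoCCaCCe"
identical(by_gsub, by_chartr)  # TRUE
```

Both calls are case-sensitive as written, so uppercase consonants in the input would be left unchanged.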
