Stacked barplot with percentage in R ggplot2 for categorical variables from scratch

I've installed ggplot2 3.1.0 on my PC. My raw data look like (short example of the long data frame):
structure(list(genotype = c("A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "C", "C", "C", "C", "C", "C", "C"),
type = c("M", "M", "M", "M", "M", "M", "M", "M", "M", "M",
"M", "M", "R", "R", "R", "R", "R", "R", "R")), row.names = c(NA,
19L), class = "data.frame")
I used this code to obtain the following plot:
library(ggplot2)
ggplot(df_test,aes(x=factor(genotype),fill=factor(type)))+geom_bar(stat="count")+xlab("Genotype")
But now I need to replace the count with a percentage, to show that 100% of the observations with genotype A have type M, and likewise that 100% of the observations with genotype C have type R. I tried to follow another post on the subject.
My code was:
ggplot(df,aes(type,x=genotype,fill=type)+geom_bar(stat="identity")+xlab("Genotype")+scales::percent)
But got the error:
Error in aes(y = type, x = genotype, fill = type) + geom_bar(stat = "identity") + :
non-numeric argument to binary operator
Could you please help me to fix the error?

Is this it?
library(tidyverse)
df<-data.frame(type=c(rep("A",5),rep("C",5)),genotype=c(rep("M",5),rep("R",5)))
df %>%
mutate_if(is.character,as.factor) %>%
ggplot(aes(genotype,fill=type))+geom_bar(position="fill")+
theme_minimal()+
scale_y_continuous(labels=scales::percent_format())

I used the solution provided by NelsonGon, but made it a bit shorter. Besides, I always try to avoid third-party libraries where possible. So the code that worked for me was:
ggplot(df_test,aes(x=factor(genotype),fill=factor(type)))+geom_bar(position="fill")+xlab("Genotype")+scale_y_continuous(labels=scales::percent_format())
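As a quick sanity check on what position="fill" displays, the per-genotype shares can also be computed in base R. This is only a sketch: df_test is rebuilt here from the counts in the question (12 rows of A/M, 7 rows of C/R), and the names counts and pct are made up for illustration. As an aside, the original error appears to come from the closing parenthesis of ggplot( sitting at the very end of the line, so aes(...) + geom_bar(...) is evaluated as a single argument and + is not defined between those objects; scales::percent is also a function, not something that can be added to a plot.

```r
# Rebuild the example data from the question: 12 rows A/M and 7 rows C/R
df_test <- data.frame(
  genotype = rep(c("A", "C"), times = c(12, 7)),
  type     = rep(c("M", "R"), times = c(12, 7))
)

# Count each genotype/type pair, then convert each row (genotype) to percentages
counts <- table(df_test$genotype, df_test$type)
pct <- prop.table(counts, margin = 1) * 100  # 'pct' is an illustrative name

print(pct)
```

Each row of pct sums to 100, which is what every bar in the filled plot represents.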

Related

Perform combinatorics with words that have repeated letters with Gtools package

Good morning, I need your help because I am solving the following problem:
What is the number of words we can write with the letters of “ELECTROENCEFALOGRAMA”? There are 20 letters, of which L, C, O and R are repeated twice, A three times and E four times.
I know that this is a combinatorics problem, and I am approaching it as follows using RStudio and the gtools package:
palabra_A <- c("E", "L", "E", "C", "T", "R", "O", "E", "N", "C", "E", "F",
"A", "L", "O", "G", "R", "A", "M", "A")
combinatoria_A <- combinations(20, 20, palabra_A, repeats.allowed = F)
The problem is that I get the following error, which I have not been able to solve:
> combinatoria_A <- combinations(20, 20, palabra_A, repeats.allowed = F)
Error in combinations(20, 20, palabra_A, repeats.allowed = F) :
too few different elements
Please help me to resolve this issue.
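gtools::combinations() appears to stop here because the vector holds only 11 distinct letters while n = 20 is requested. Counting distinct words is also a permutation-of-a-multiset problem rather than a combinations one, so one possible approach, sketched with base R only, is the multinomial formula n! / (k1! * k2! * ...), where the k's are the letter counts. The names conteo and n_palabras are illustrative. Note that table() reports three A's in the vector as typed.

```r
palabra_A <- c("E", "L", "E", "C", "T", "R", "O", "E", "N", "C", "E", "F",
               "A", "L", "O", "G", "R", "A", "M", "A")

# How many times each letter occurs in the word
conteo <- table(palabra_A)

# Distinct arrangements of a multiset: n! / (k1! * k2! * ... * km!)
n_palabras <- factorial(length(palabra_A)) / prod(factorial(as.vector(conteo)))

print(conteo)
print(format(n_palabras, big.mark = ",", scientific = FALSE))
```

The result is on the order of 10^15, which is also why enumerating every arrangement with a permutation function would not be practical.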

An ifelse statement that checks a variable for a letter

I apologize if this is basic. I am trying to perform hex to dec and dec to hex conversions on a variable in a data frame in R. I need to "sort" my data frame into two groups: values whose character string contains a letter, and values whose string does not (i.e. whether they are in hex or in dec).
My solution is to create new variables with mutate and an ifelse statement, but with my code below it appears to not recognize that any character string contains a letter.
df$PITnumF contains this:
3D91BF15B9C2D,
985120013429805
My attempt to mutate/ifelse
mutate(df, h2df = ifelse(df$PITnumF %in% c("A", "B", "C", "D", "E", "F", "G", "H"
, "I", "J", "K", "L", "M", "N", "O", "P",
"Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z"),
(.hex.to.dec(df$PITnumF)), (.dec.to.hex(df$PITnumF))))
Thank you for your time.
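%in% compares each whole string against the set, so "3D91BF15B9C2D" never equals a single letter. One way to test whether a string contains a letter is a regular expression with grepl(). A minimal sketch, assuming PITnumF is a character vector; the conversion helpers from the question are left out, the comma in the sample is treated as a separator rather than data, and has_letter and id_format are illustrative names. Keep in mind this is a heuristic: a hex value made only of digits would be classed as decimal.

```r
# Sample values from the question
PITnumF <- c("3D91BF15B9C2D", "985120013429805")

# TRUE when the string contains at least one letter anywhere in it
has_letter <- grepl("[A-Za-z]", PITnumF)

# Label each value; the actual conversions could be applied per group instead
id_format <- ifelse(has_letter, "hex", "dec")

print(data.frame(PITnumF, has_letter, id_format))
```

Inside mutate(), the columns can be referenced directly (PITnumF rather than df$PITnumF).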

Using different colors for a logical variable on a graph in R

I am trying to create a graph that shows the x and y axes but changes the color of each point depending on whether it is associated with a "T" or an "F". I created a simple data frame because the one I'm using has over 19000 variables.
age<- rnorm(20, 6, 2)
prediction<- rnorm(20, 17, 4)
logic<- c("T", "F", "F", "F", "T", "F", "T", "T", "T", "F", "T", "F", "F", "F", "T", "F", "T", "T", "T", "F")
data.frame(age,prediction,logic)
plot(age, prediction)
So, on my graph I want the points to be one color if the answer was "T" and another color if the answer was "F".
I've looked all over and tried multiple techniques with ggplot2 and subsetting but so far nothing has worked. Any ideas?
This method will work for ggplot2
age<- rnorm(20, 6, 2)
prediction<- rnorm(20, 17, 4)
logic<- c("T", "F", "F", "F", "T", "F", "T", "T", "T", "F", "T", "F", "F", "F", "T", "F", "T", "T", "T", "F")
df <- data.frame(age,prediction,logic) #ggplot likes data frames
library(ggplot2)
ggplot(data = df, aes(x = age, y = prediction, colour = logic)) + geom_point()
If on the other hand, you'd like to keep this in base graphics, you can use the following to make the plot and then the legend.
plot(age, prediction, col = as.factor(logic))
legend("topright", legend = levels(as.factor(logic)), col = 1:2, pch = 1) # levels sort as "F" (colour 1), "T" (colour 2)

How to order data.frame in my specific 'vector' order in R language?

I have a data.frame shown below:
In order to analyse the relationship between those 10 features and disorder propensity, I need to sort the data.frame in my amino acid order, which is stored in a vector like this: c("L", "I", "V", "Y", "C", "F", "R", "W", "M", "H", "N", "T", "G", "D", "Q", "A", "K", "S", "P", "E")
I tried this properties[aa == c("L", "I", "V", "Y", "C", "F", "R", "W", "M", "H", "N", "T", "G", "D", "Q", "A", "K", "S", "P", "E"), ] which doesn't seem to work for me.
What's the right way to sort the data.frame in my 'vector' order?
You can make your column aa a factor and give the factor levels in the correct order. The factor can then be sorted according to the levels. Look at this example:
my_order <- c("X", "Y", "Z", "A", "B") # defines the order
test <- c("A", "B", "Y", "Z", "Z", "A", "X", "X", "B") # a normal character vector
test2 <- factor(test, levels = my_order) # convert it to factor and specify the levels
test2 # original order unchanged
test2[order(test2)] # ordered by custom order
Note that you must list every value that occurs as a level: any value missing from levels becomes NA in the factor.
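Applied to a data frame, the same idea can be written in one line. A sketch, assuming the data frame is called properties and its amino acid column is aa as in the question; the score column and the name aa_order are invented so the example runs on its own.

```r
aa_order <- c("L", "I", "V", "Y", "C", "F", "R", "W", "M", "H",
              "N", "T", "G", "D", "Q", "A", "K", "S", "P", "E")

# Hypothetical stand-in for the real data: one row per amino acid, alphabetical
properties <- data.frame(aa = sort(aa_order), score = seq_along(aa_order))

# Order rows by each value's position among the custom levels
sorted <- properties[order(factor(properties$aa, levels = aa_order)), ]

print(head(sorted))
```

order(match(properties$aa, aa_order)) gives the same row order without creating a factor.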

Variables order for ggplot

My dataframe:
Variable <- sample(-9:10)
Levels<-rep(c("N", "A", "L","B", "O", "C", "U", "R", "E", "Y" ),times=2)
ID<-rep(c("WT", "KO"), each=10)
df <- data.frame(Variable, Levels, ID)
I run ggplot and I get this:
If I add these two lines
df$ID=factor(df$ID, c("WT","KO"))
df$Levels=factor(df$Levels, c("N", "A", "L","B", "O", "C", "U", "R", "E", "Y" ))
I can get this
But there must be a way to do this without entering the levels manually.
Just create your initial data frame with the correct factor, i.e.
df = data.frame(Variable, Levels = factor(Levels, levels=unique(Levels)), ID)
unique() returns values in order of first appearance, so the levels follow the order in the data. Alternatively,
levels = c("N", "A", "L","B", "O", "C", "U", "R", "E", "Y" )
Levels = factor(rep(levels, times=2), levels)
