ggplot2 adding custom legend when plotting two lines from subset of columns - r

I've looked all over Stack Overflow and other sites to fix my code but can't see what's wrong. I am trying to plot two lines on the same ggplot graph, each drawn from a subset of a different column. For example, I have 8 rows, of which the first four are M (male) and the last four are F (female). I have two columns of data and one column for condition (factor).
ModelMF <- data.frame(ProbGender, ProbCond, ProbMF, Act_pct)
where:
ProbGender ProbCond ProbMF Act_pct
M 0 .75 .71
M 10 .67 .69
M 20 .61 .54
M 30 .81 .77
F 0 .88 .82
F 10 .73 .71
F 20 .67 .71
F 30 .60 .63
I have tried the following but I keep getting errors (see below):
ggplot(data = ModelMF, aes(x = ProbCond)) + geom_line(data =
ModelMF[ModelMF$ProbGender=="M",], aes(y=ProbMF), color = 'col1') +
geom_point(data = ModelMF[ModelMF$ProbGender=="M",], aes(y = ProbMF)) +
geom_line(data = ModelMF[ModelMF$ProbGender=="M",], aes(y=Act_pct), color =
'col2') + geom_point(data = ModelMF[ModelMF$ProbGender=="M",], aes(y =
Act_pct)) + scale_color_manual(values = c('col1' = 'darkblue', 'col2' ='lightblue'))
Preferably I would like to be able to create a custom legend that lets me map the colors as I've attempted to do using scale_color_manual, but I get the following error:
Error in grDevices::col2rgb(colour, TRUE) : invalid color name 'col1'
I'm not sure if it is due to the fact that I'm subsetting data within the df or something else I'm just missing? Also if I add the female lines I assume I can simply follow the same procedure?
Thanks in advance.
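The error message comes from color = 'col1' sitting outside aes(): there it is read as a literal colour name, and 'col1' is not one. A minimal sketch of one possible fix, assuming the goal is a two-entry legend for the male rows: move the key into aes() so that scale_color_manual() can map it. The data frame is rebuilt from the table in the question; the legend title "Series" is my own placeholder.

```r
library(ggplot2)

# Data rebuilt from the table in the question
ModelMF <- data.frame(
  ProbGender = rep(c("M", "F"), each = 4),
  ProbCond   = rep(c(0, 10, 20, 30), times = 2),
  ProbMF     = c(.75, .67, .61, .81, .88, .73, .67, .60),
  Act_pct    = c(.71, .69, .54, .77, .82, .71, .71, .63)
)

# Subset once and reuse it in every layer
males <- ModelMF[ModelMF$ProbGender == "M", ]

# color is now inside aes(), so 'col1'/'col2' are legend keys, not colour names
p <- ggplot(males, aes(x = ProbCond)) +
  geom_line(aes(y = ProbMF, color = "col1")) +
  geom_point(aes(y = ProbMF, color = "col1")) +
  geom_line(aes(y = Act_pct, color = "col2")) +
  geom_point(aes(y = Act_pct, color = "col2")) +
  scale_color_manual(name = "Series",  # placeholder legend title
                     values = c(col1 = "darkblue", col2 = "lightblue"))
```

The female lines should work the same way: add layers on the F subset with two more keys (say 'col3' and 'col4', hypothetical names) and extend values accordingly.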

Related

R Plot Bar graph transposed dataframe

I'm trying to plot the following data frame as a bar plot, where the values for the filteredprovince column are listed in a separate column (n).
Usually ggplot and other plotting functions work on a horizontal data frame, and after several searches I have not found a way to plot this "transposed" version of the data frame.
Each cluster should group its bars, and within each cluster I would like to plot each filteredprovince based on the value of the n column.
Thank you for the support.
d <- read.table(text=
" cluster PROVINCIA n filteredprovince
1 1 08 765 08
2 1 28 665 28
3 1 41 440 41
4 1 11 437 11
5 1 46 276 46
6 1 18 229 18
7 1 35 181 other
8 1 29 170 other
9 1 33 165 other
10 1 38 153 other ", header=TRUE,stringsAsFactors = FALSE)
UPDATE
Thanks to the suggestion in the comments I almost achieved the desired format:
ggplot(tab_s, aes(x = cluster, y = n, fill = factor(filteredprovince))) + geom_col()
Is there any way to show percentages instead of frequencies on the Y axis?
If I understand correctly, you're trying to use geom_bar(), which gives you problems because by default it counts rows (much like a histogram), but you have already computed that summary yourself.
(If you had provided code which you have tried so far I would not have to guess)
In that case you can use geom_col() instead.
ggplot(d, aes(x = filteredprovince, y = n, fill = factor(PROVINCIA))) + geom_col()
Alternatively, you can change the default stat of geom_bar() from "count" to "identity"
ggplot(d, aes(x = filteredprovince, y = n, fill = factor(PROVINCIA))) +
geom_bar(stat = "identity")
See this SO question for what a stat is
EDIT: Update in response to OP's update:
To display percentages, you will have to modify the data itself.
Just divide n by the sum of all n and multiply by 100.
d$percentage <- d$n / sum(d$n) * 100
ggplot(d, aes(x = cluster, y = percentage, fill = factor(filteredprovince))) + geom_col()
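If the real data has more than one cluster, dividing by sum(d$n) gives each row's share of the grand total, not of its own cluster. A hedged sketch of a per-cluster percentage using base R's ave(); the data frame is retyped from the question (which only shows cluster 1), and the axis label is my own choice.

```r
library(ggplot2)

# Data retyped from the question; only cluster 1 is shown there
d <- data.frame(
  cluster = 1,
  PROVINCIA = c("08", "28", "41", "11", "46", "18", "35", "29", "33", "38"),
  n = c(765, 665, 440, 437, 276, 229, 181, 170, 165, 153),
  filteredprovince = c("08", "28", "41", "11", "46", "18",
                       "other", "other", "other", "other"),
  stringsAsFactors = FALSE
)

# Share of each row within its own cluster, in percent
d$percentage <- 100 * d$n / ave(d$n, d$cluster, FUN = sum)

p <- ggplot(d, aes(x = factor(cluster), y = percentage,
                   fill = factor(filteredprovince))) +
  geom_col() +
  labs(x = "cluster", y = "% of cluster total", fill = "filteredprovince")
```

With a single cluster this gives the same numbers as d$n / sum(d$n) * 100.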
I'm not sure I perfectly understand, but if the problem is the orientation of your dataframe, you can transpose it with t(data) where data is your dataframe.

Color points if ID in vector in ggplot2

I have imported data in this form:
Sample1 Sample2 Identity
1 2 chr11-50-T
3 4 chr11-200-A
v <- read.table("myfile", header = TRUE)
I have a vector that looks like this:
x <- c(50,100)
Leaving out some other aesthetic settings, I am plotting column 1 vs column 2, labeled with column 3.
p <- ggplot(v, aes(x=Sample1, y=Sample2, alpha=0.5, label=Identity)) +
geom_point() +
geom_text_repel(aes(label=ifelse(Sample2>0.007 | Sample1>0.007, as.character(Identity), '')))
I would like to somehow indicate those points that contain a number in their ID, found within the vector x. I was thinking this could be done with color, but it doesn't really matter to me as long as there is a difference between the two types of points.
So for instance if the points containing a number in x were to be colored red, the first point would be red because it has 50 in the ID and the second point would not be, because 200 is not a value in x.
You could add a TRUE/FALSE column and map it to color. I removed label = ... from the top-level aes(), since geom_point() does not use it and geom_text_repel() sets its own label. Also note that everything is semi-transparent because you use aes(alpha = 0.5):
library(ggrepel)
library(ggplot2)
vafs$col <- grepl(paste0(x,collapse = "|"), vafs$Identity)
p <- ggplot(vafs, aes(x=Sample1, y=Sample2, alpha=0.5, color = col)) +
geom_point() +
geom_text_repel(aes(label=ifelse(Sample2>0.007 |Sample1>0.007 ,as.character(Identity),'')))
I came up with the following solution:
vafs<-read.table(text="Sample1 Sample2 Identity
1 2 chr11-50-T
3 4 chr11-200-A", header=T)
vec <- c(50,100)
vafs$vec<- sapply(vafs$Identity, FUN=function(x)
ifelse(length(grep(pattern=paste(vec,collapse="|"), x))>0,1,0))
vafs$vec <- as.factor(vafs$vec)
ggplot(vafs, aes(x=Sample1, y=Sample2, label=Identity, col=vec)) + geom_point(alpha=0.5)
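Both answers above match the numbers as regular-expression substrings, so 50 would also match an ID such as chr11-500-T. A sketch of an exact match, assuming every ID has the form chromosome-position-base as in the two example rows: split on "-" and compare the middle field with %in%. The column name in_x is my own.

```r
library(ggplot2)

vafs <- data.frame(
  Sample1 = c(1, 3),
  Sample2 = c(2, 4),
  Identity = c("chr11-50-T", "chr11-200-A"),
  stringsAsFactors = FALSE
)
x <- c(50, 100)

# Assumes IDs look like "chr11-50-T": take the second "-"-separated field
pos <- as.numeric(sapply(strsplit(vafs$Identity, "-", fixed = TRUE), `[`, 2))
vafs$in_x <- pos %in% x

p <- ggplot(vafs, aes(x = Sample1, y = Sample2, color = in_x)) +
  geom_point(alpha = 0.5) +
  scale_color_manual(values = c(`TRUE` = "red", `FALSE` = "black"))
```

If the real IDs do not follow that pattern, the grepl() approach is the safer starting point.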

Legends not showing up properly in heatmap with ggplot2

I am trying to make a heatmap of normalized read abundance values with geom_tile in ggplot2 based on the example code here. My current code produces a heatmap for the desired ranges, but for some reason only 4 out of the 7 ranges are shown in heatmap and I cannot figure out what is the issue. When I followed the example in the original link it worked fine, so I must have changed something incorrectly in my code. Can anyone please help me to identify the error in my code that is causing this?
I want to have the following color scheme:
-Inf < value <= 0 -> white
0 < value <= 1 -> yellow
1 < value <= 10 -> orange
10 < value <= 100 -> darkorange2
100 < value <= 1000 -> red
1000 < value <= 10000 -> red3
10000 < value <= 32000 -> red4
Here is my code:
#re-order the labels in the order of appearance in the data frame
df$label <- factor(df$X1, as.character(df$X1))
# make the cuts
df$value1 <- cut(df$value, breaks=c(-Inf,0,1,10,100,1000,10000,32000), right = TRUE)
ggplot(data = df, aes(x = label, y = X2)) + geom_tile(aes(fill=value1), colour= "black") + scale_fill_manual(breaks=c("(-Inf,0]", "(0,1]", "(1,10]", "(10,100]", "(100,1000]", "(1000,10000]", "(10000,32000]"),values =c("white","yellow","orange","darkorange2","red","red3","red4"))
here is a preview of my data (actual data has 228 rows featuring reads per million values for 38 IDs in 6 different experiments):
head(df)
X1 X2 value label value1
1 merged_read_17785-997_aka_156_aka_21 RPM.MT1 91.783028 merged_read_17785-997_aka_156_aka_21 (10,100]
2 merged_read_133362-79_aka_156_aka_21 RPM.MT1 6.403467 merged_read_133362-79_aka_156_aka_21 (1,10]
3 merged_read_147828-69_aka_156_aka_20 RPM.MT1 4.268978 merged_read_147828-69_aka_156_aka_20 (1,10]
4 merged_read_162443-60_aka_156_aka_21 RPM.MT1 0.000000 merged_read_162443-60_aka_156_aka_21 (-Inf,0]
5 merged_read_262156-32_aka_156_aka_21 RPM.MT1 5.691971 merged_read_262156-32_aka_156_aka_21 (1,10]
6 merged_read_22905-759_aka_159_aka_21 RPM.MT1 140.164780 merged_read_22905-759_aka_159_aka_21 (100,1e+03]
And here is the plot that I get from the above data:
I think I figured this out: if I take out the breaks argument from scale_fill_manual, then all legend entries are shown:
ggplot(data = df, aes(x = label, y = X2)) + geom_tile(aes(fill=value1), colour= "black") + scale_fill_manual(values =c("white","yellow","orange","darkorange2","red","red3","red4"))
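A likely reason the breaks argument hid three entries: cut() formats its labels with dig.lab = 3 by default, so the upper levels come out as "(100,1e+03]" and so on (visible in row 6 of the head(df) output), which do not match the strings "(100,1000]" etc. passed to breaks. A sketch of one way around it, raising dig.lab so that the labels are written out in full; the toy data frame is made up for illustration.

```r
library(ggplot2)

breaks <- c(-Inf, 0, 1, 10, 100, 1000, 10000, 32000)
cols <- c("white", "yellow", "orange", "darkorange2", "red", "red3", "red4")

# Hypothetical toy data standing in for the real 228-row data frame
toy <- data.frame(
  label = c("a", "b", "c", "d"),
  X2 = "RPM.MT1",
  value = c(0, 6.4, 140.2, 2500)
)

# Default labels switch to scientific notation for the larger breaks
default_levels <- levels(cut(toy$value, breaks = breaks, right = TRUE))

# dig.lab = 5 keeps 1000, 10000 and 32000 written out in full
toy$value1 <- cut(toy$value, breaks = breaks, right = TRUE, dig.lab = 5)
fixed_levels <- levels(toy$value1)

# Name the colours by level so the mapping does not depend on order;
# drop = FALSE keeps unused ranges in the legend
p <- ggplot(toy, aes(x = label, y = X2)) +
  geom_tile(aes(fill = value1), colour = "black") +
  scale_fill_manual(values = setNames(cols, fixed_levels), drop = FALSE)
```

Passing explicit labels to cut() would work as well; the point is only that the level names and the strings given to the scale have to agree.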

add custom legend in ggplot2

I'm working with ggplot2 to generate some geom_line plots, which I've already generated from another data.frame. That data frame is not important here, except that it contains the same id values as the following one.
I have this data frame called df:
id X Y total
1 3214 6786 10000
2 4530 5470 10000
3 2567 7433 10000
4 1267 8733 10000
5 2456 7544 10000
6 6532 6532 10000
7 5642 4358 10000
What I want to do is create a custom legend that shows, for a specific id, the percentage of X and Y on the geom_line plot for that same id. So basically, for the geom_line plot of e.g. id=1, draw the percentages for that id on the plot.
I've tried to use geom_text, but the problem is that it prints everything on top of each other in one place, so I cannot read any of it.
How can this be done?
EDIT
olddf dataframe is something like that:
id pos X Y Z
1
1.....
1
2
3
4
3 ......
.
.
This is the code that I've tried:
for(i in df$id)
{
test = subset(olddf, id==i)
mdata <- melt(test, id=c("pos","id"))
pl = ggplot() + geom_line(data=mdata, aes(x=pos, y=value, color=variable)) + geom_text(data=df, aes(x=6000, y=0.1, label=(X*total)/100), size=5)
}
The answer (as discussed in chat) is quite straightforward:
Change geom_text(data = df, ...) to geom_text(data = df[df$id == i, ], ...)
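A sketch of the loop with that change applied, under a few assumptions of mine: olddf is replaced by a small made-up data frame (the question elides its contents), melt() is taken from reshape2, the label is computed as X / total * 100 so that it reads as a percentage, and the plots are collected in a list because the original loop overwrites pl on every pass.

```r
library(ggplot2)
library(reshape2)

# First two rows of df from the question
df <- data.frame(id = 1:2, X = c(3214, 4530), Y = c(6786, 5470), total = 10000)

# Hypothetical stand-in for olddf; the real values are not shown in the question
olddf <- data.frame(
  id  = rep(1:2, each = 3),
  pos = rep(c(1000, 5000, 9000), times = 2),
  X   = c(0.2, 0.3, 0.4, 0.1, 0.5, 0.3),
  Y   = c(0.5, 0.4, 0.3, 0.6, 0.2, 0.4),
  Z   = c(0.3, 0.3, 0.3, 0.3, 0.3, 0.3)
)

plots <- list()
for (i in df$id) {
  mdata <- melt(subset(olddf, id == i), id.vars = c("pos", "id"))

  # Only the row for the current id, so a single label is drawn per plot
  row_i <- df[df$id == i, ]
  row_i$lab <- sprintf("X: %.1f%%, Y: %.1f%%",
                       100 * row_i$X / row_i$total,
                       100 * row_i$Y / row_i$total)

  plots[[as.character(i)]] <- ggplot() +
    geom_line(data = mdata, aes(x = pos, y = value, color = variable)) +
    geom_text(data = row_i, aes(x = 6000, y = 0.1, label = lab), size = 5)
}
```

Each plot can then be shown with print(plots[["1"]]) and so on.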

ggplot function select multiple subset

I'm not an expert with the ggplot2 package, and I have a subset selection problem.
Here is the code that produces this kind of graph:
g <- ggplot(merged_data,aes_string(x=Order,fill=var.y)) +
scale_y_continuous(expand = c(0.05,0)) +
xlab(paste("Order","Total number of sequences",sep=" - ")) +
ggtitle(main.str) +
geom_bar(position="fill",
subset = .(Order != ""),
width=0.6,hjust =0)+
geom_text(stat="bin",
subset = .(Order != ""),
color="black", hjust=1, vjust = 0.5, size=2,
aes_string(fill=NULL,x = Order,y = "0", label="..count.."))+
coord_flip()
For geom_bar and geom_text I select a subset of the data that removes empty names:
subset = .(eval(parse(text=var.x)) != "")
This is a simple example with only 2 bars.
Here is the data:
Collector<- c("BK","YE_LD","BK","JB","JB",
"BK","BK","BK","JB","YE_LD")
Order<-c("A","B","B","B","A",
"B","B","A","B","B")
data <- data.frame(Order,Collector)
Now I want to add a cutoff to my subset, so that only the categories with a minimum number of counts are shown.
So if I set cutoff = 4, I should get only the bar at the bottom, which has 7 counts; the bar at the top, with 3 counts, should not appear.
I have no idea how to do this.
Thanks for your help.
You could create a subset of the data and use this new object in ggplot. The following command will remove all Order conditions with less than four data points:
subset(data, Order %in% names(which(table(Order) >= 4)))
Order Collector
2 B YE_LD
3 B BK
4 B JB
6 B BK
7 B BK
9 B JB
10 B YE_LD
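To connect this back to the plot, here is a sketch that applies the cutoff before calling ggplot(), using the example data from the question. Filling by Collector is my guess at what var.y stands for, and the geom_text layer is left out for brevity.

```r
library(ggplot2)

Collector <- c("BK", "YE_LD", "BK", "JB", "JB",
               "BK", "BK", "BK", "JB", "YE_LD")
Order <- c("A", "B", "B", "B", "A",
           "B", "B", "A", "B", "B")
data <- data.frame(Order, Collector)

cutoff <- 4

# Keep only the Order values that occur at least `cutoff` times
counts <- table(data$Order)
kept <- data[data$Order %in% names(counts[counts >= cutoff]), ]

p <- ggplot(kept, aes(x = Order, fill = Collector)) +
  geom_bar(position = "fill", width = 0.6) +
  coord_flip()
```

Filtering the data first also avoids the subset = .() argument, which newer ggplot2 versions no longer accept in layers.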
