Mosaic plot (vcd package) - position of legend - r

I'm trying to make a mosaic plot with the vcd package, and I'm having a hard time understanding how to configure some settings of the plot.
library(vcd)
library(RColorBrewer)
mydf <- structure(list(A=structure(c(7L,6L,7L,6L,7L,1L,5L,4L,7L,6L,6L,6L,6L,6L,
3L,6L,6L,6L,5L,3L),
.Label=c("a","b","c","d","e","f","g","h","i"),
class="factor"),
B=structure(c(3L,2L,1L,1L,3L,3L,3L,3L,2L,3L,3L,1L,3L,
3L,3L,3L,3L,3L,3L,3L),
.Label=c("a","b","c"),
class="factor")),
.Names=c("A","B"),
row.names=c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L),
class="data.frame")
mosaic( ~ A + B, data=mydf, highlighting="A",
highlighting_fill=brewer.pal(9, "Set3"))
The legend for the different levels of the A variable is at the top of the plot, which is not very helpful since category "a" of variable B does not have all those levels. I would like the legend at the bottom, together with the category that has all the levels in the legend.

From ?labeling I learned there were several "behind the scenes" functions that accept arguments from mosaic and tried a couple of changes:
?labeling
I believe this is closer to what you were hoping for:
mosaic( ~ A + B, data=mydf, highlighting="A",
highlighting_fill=brewer.pal(9, "Set3"),
labeling_args=list(tl_labels =c(TRUE, FALSE) ) )
This sets the row labels to the bottom and uses the lower cell locations for placement. (There is still overlap of 'h' and 'i', but that can be resolved, whereas you had overlap of a-e before.)
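A quick cross-tabulation may help show why the bottom is the better place for the labels. This is a minimal base R sketch that rebuilds the two factors from the question's mydf; the names A, B and tab are only for illustration.

```r
# Rebuild the two factors from the question's data frame
A <- factor(c("g", "f", "g", "f", "g", "a", "e", "d", "g", "f",
              "f", "f", "f", "f", "c", "f", "f", "f", "e", "c"),
            levels = letters[1:9])
B <- factor(c("c", "b", "a", "a", "c", "c", "c", "c", "b", "c",
              "c", "a", "c", "c", "c", "c", "c", "c", "c", "c"),
            levels = c("a", "b", "c"))

# Count how often each level of A occurs within each category of B
tab <- table(A = A, B = B)
print(tab)

# Number of distinct A levels observed in each B category
print(colSums(tab > 0))

# Levels of A that never occur at all
print(names(which(rowSums(tab) == 0)))
```

Category "c" holds far more levels of A than "a" or "b". The levels that never occur produce empty cells, which is probably why the 'h' and 'i' labels still collide.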


How to add a edges between component of a graph in igraph R

I have a graph containing 4 components. Now, I want to add an edge among all components based on the size of the membership.
For example, the following graph contains 4 components.
First, I will connect all components with only one edge and take the edge randomly. I can do it using this code
graph1 <- graph_from_data_frame(g, directed = FALSE)
E(graph1)$weight <- g$new_ssp
cl <- components(graph1)
graph2 <- with(
stack(membership(cl)),
add.edges(
graph1,
c(combn(sapply(split(ind, values), sample, size = 1), 2)),
weight = runif(choose(cl$no, 2))
)
)
Secondly, I want to add an edge between component-1 and component-2 only, while the remaining components from the previous graph stay unchanged in the new graph.
For example, after adding an edge between component-1 and component-2, the new graph will contain 3 components: the 1st (component-1 and component-2 joined into one component by the added edge), the 2nd (component-3 from the main graph), and the 3rd (component-4 from the main graph). I can do it using this code
dg <- decompose.graph(graph1)
graph3 <- (dg[[1]] %u% dg[[2]])
component_subgraph_1 <- components(graph3)
graph2 <- with(
stack(membership(component_subgraph_1)),
add.edges(
graph1,
c(combn(sapply(split(ind, values), sample, size = 1), 2)),
weight = 0.01))
Figure:
The same applies to all other combinations: component-1 and component-3, component-1 and component-4, component-2 and component-3, component-2 and component-4, and component-3 and component-4.
However, it is not feasible to write this code and manually change dg[[1]], dg[[2]], and so on. Moreover, my actual dataset contains many components, so in practice this is impossible.
Any idea how I can do this automatically?
Actually, I have a scoring function (like the shortest path), so I want to check the score after connecting all components, after connecting only 2 components, after connecting only 3 components, and so on. Something like a greedy algorithm.
Reproducible Data:
g <- structure(list(query = structure(c(1L, 1L, 1L, 2L, 2L, 3L, 4L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("ID_00104",
"ID_00136", "ID_00169", "ID_00178", "ID_00180"), class = "factor"),
target = structure(c(16L, 19L, 20L, 1L, 9L, 9L, 6L, 11L,
13L, 15L, 4L, 8L, 10L, 14L, 2L, 3L, 5L, 7L, 12L, 17L, 18L
), .Label = c("ID_00169", "ID_00288", "ID_00324", "ID_00394",
"ID_00663", "ID_00790", "ID_00846", "ID_00860", "ID_00910", "ID_00959",
"ID_01013", "ID_01047", "ID_01130", "ID_01222", "ID_01260", "ID_06663",
"ID_06781", "ID_06786", "ID_06791", "ID_09099"), class = "factor"),
new_ssp = c(0.654172560113154, 0.919096895578551, 0.925821596244131,
0.860406091370558, 0.746376811594203, 0.767195767195767,
0.830379746835443, 0.661577608142494, 0.707520891364902,
0.908193484698914, 0.657118786857624, 0.687664041994751,
0.68586387434555, 0.874513618677043, 0.836646499567848, 0.618361836183618,
0.684163701067616, 0.914728682170543, 0.876297577854671,
0.732707087959009, 0.773116438356164)), row.names = c(NA,
-21L), class = "data.frame")
Thanks in advance.
You are actually close to what you want already. Perhaps the code below could help you
out <- with(
stack(membership(cl)),
lapply(
combn(split(ind, values), 2, simplify = FALSE),
function(x) {
add.edges(
graph1,
c(combn(sapply(x, sample, size = 1), 2)),
weight = 0.01
)
}
)
)
and then you can run
sapply(out, plot)
to visualize all the combinations.
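The pairing step in the answer can be tried without igraph. The following is a minimal base R sketch on a made-up membership vector (memb, comps, pairs and new_edges are hypothetical names, not part of the question's data); it only illustrates how combn enumerates pairs of components and how one random vertex is drawn from each.

```r
# Hypothetical membership vector: vertex name -> component id
memb <- c(v1 = 1, v2 = 1, v3 = 2, v4 = 2, v5 = 3, v6 = 4)

# One character vector of vertex names per component
comps <- split(names(memb), memb)

# Every unordered pair of components (choose(4, 2) = 6 pairs)
pairs <- combn(comps, 2, simplify = FALSE)

# For each pair, draw one random vertex from each of the two components
set.seed(42)  # only to make the sketch reproducible
new_edges <- t(sapply(pairs, function(p) sapply(p, sample, size = 1)))
print(new_edges)
```

Each row of new_edges is one candidate edge joining two different components. Replacing 2 with k in combn would enumerate every subset of k components, which may be a starting point for the greedy scoring described in the question.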

How to make a lineplot with specific values out of a dataframe

I have a df as follows:
Variable Value
G1_temp_0 37.9
G1_temp_5 37.95333333
G1_temp_10 37.98333333
G1_temp_15 38.18666667
G1_temp_20 38.30526316
G1_temp_25 38.33529412
G1_mean_Q1 38.03666667
G1_mean_Q2 38.08666667
G1_mean_Q3 38.01
G1_mean_Q4 38.2
G2_temp_0 37.9
G2_temp_5 37.95333333
G2_temp_10 37.98333333
G2_temp_15 38.18666667
G2_temp_20 38.30526316
G2_temp_25 38.33529412
G2_mean_Q1 38.53666667
G2_mean_Q2 38.68666667
G2_mean_Q3 38.61
G2_mean_Q4 38.71
I would like to make a line plot with two lines that reflect the values "G1_mean_Q1 - G1_mean_Q4" and "G2_mean_Q1 - G2_mean_Q4".
In the end it should more or less look like this, the x axis should represent the different variables:
The main problem I have is, how to get a basic line plot with this df.
I've tried something like this,
ggplot(df, aes(x = c(1:4), y = Value) + geom_line()
but I have always some errors. It would be great if someone could help me. Thanks
Please post your data with dput(data) next time. It makes it easier to read your data into R.
You need to tell ggplot which are the groups. You can do this with aes(group = Sample). For this purpose, you need to restructure your dataframe a bit and separate the Variable into different columns.
library(tidyverse)
dat <- structure(list(Variable = structure(c(5L, 10L, 6L, 7L, 8L, 9L,
1L, 2L, 3L, 4L, 15L, 20L, 16L, 17L, 18L, 19L, 11L, 12L, 13L,
14L), .Label = c("G1_mean_Q1", "G1_mean_Q2", "G1_mean_Q3", "G1_mean_Q4",
"G1_temp_0", "G1_temp_10", "G1_temp_15", "G1_temp_20", "G1_temp_25",
"G1_temp_5", "G2_mean_Q1", "G2_mean_Q2", "G2_mean_Q3", "G2_mean_Q4",
"G2_temp_0", "G2_temp_10", "G2_temp_15", "G2_temp_20", "G2_temp_25",
"G2_temp_5"), class = "factor"), Value = c(37.9, 37.95333333,
37.98333333, 38.18666667, 38.30526316, 38.33529412, 38.03666667,
38.08666667, 38.01, 38.2, 37.9, 37.95333333, 37.98333333, 38.18666667,
38.30526316, 38.33529412, 38.53666667, 38.68666667, 38.61, 38.71
)), class = "data.frame", row.names = c(NA, -20L))
dat <- dat %>%
filter(str_detect(Variable, "mean")) %>%
separate(Variable, into = c("Sample", "mean", "time"), sep = "_")
g <- ggplot(data=dat, aes(x=time, y=Value, group=Sample)) +
geom_line(aes(colour=Sample))
g
Created on 2020-07-20 by the reprex package (v0.3.0)
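For readers who want to see what the filter and separate steps do, here is a rough base R equivalent on a subset of the question's values. It is a sketch only; the names means and parts are illustrative and not part of the answer.

```r
# A few rows from the question, enough to show the reshaping
dat <- data.frame(
  Variable = c("G1_temp_0", "G1_mean_Q1", "G1_mean_Q2",
               "G2_temp_0", "G2_mean_Q1", "G2_mean_Q2"),
  Value = c(37.9, 38.03666667, 38.08666667,
            37.9, 38.53666667, 38.68666667),
  stringsAsFactors = FALSE
)

# Keep only the rows whose name contains "mean"
means <- dat[grepl("mean", dat$Variable), ]

# Split "G1_mean_Q1" into its three parts at each underscore
parts <- do.call(rbind, strsplit(means$Variable, "_", fixed = TRUE))
means$Sample <- parts[, 1]  # "G1" or "G2": the grouping variable
means$time <- parts[, 3]    # "Q1", "Q2", ...: the x axis
print(means)
```

With Sample and time as separate columns, each line in the plot corresponds to one value of Sample.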

How to part diverging bar plots in R

Hi I am relatively new in R / ggplot2 and I would like to ask for some advice on how to create a plot that looks like this:
Explanation: A diverging bar plot showing biological functions with genes that have increased expression (yellow) pointing towards the right, as well as genes with reduced expression (purple) pointing towards the left. The length of the bars represent the number of differentially expressed genes, and color intensity vary according to their p-values.
Note that the x-axis must be 'positive' in both directions.
(In published literature on gene expression experimental studies, bars that point towards the left represent genes that have reduced expression, and right to show genes that have increased expression. The purpose of the graph is not to show the "magnitude" of change (which would give rise to positive and negative values). Instead, we are trying to plot the NUMBER of genes that have changes of expression, therefore cannot be negative)
I have tried ggplot2 but failed completely to reproduce the graph that is shown.
Here is the data which I am trying to plot:
> dput(sample)
structure(list(Name = structure(c(15L, 19L, 5L, 11L, 8L, 6L,
16L, 13L, 17L, 1L, 3L, 2L, 14L, 18L, 7L, 12L, 10L, 9L, 4L, 20L
), .Label = c("Actin synthesis", "Adaptive immunity", "Antigen presentation",
"Autophagy", "Cell cycle", "Cell division", "Cell polarity",
"DNA repair", "Eye development", "Lipid metabolism", "Phosphorylation",
"Protein metabolism", "Protein translation", "Proteolysis", "Replication",
"Signaling", "Sumoylation", "Trafficking", "Transcription", "Translational initiation"
), class = "factor"), Trend_in_AE = structure(c(2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), .Label = c("Down", "Up"), class = "factor"), Count = c(171L,
201L, 38L, 63L, 63L, 47L, 22L, 33L, 20L, 16L, 16L, 7L, 10L, 4L,
13L, 15L, 5L, 7L, 9L, 7L), PValue = c(1.38e-08, 1.22e-06, 1.79e-06,
2.89e-06, 0.000122, 0.000123, 0.00036, 0.000682, 0.001030253,
0.001623939, 7.76e-05, 0.000149, 0.000734, 0.001307039, 0.00292414,
0.003347556, 0.00360096, 0.004006781, 0.007330264, 0.010083734
)), .Names = c("Name", "Trend_in_AE", "Count", "PValue"), class = "data.frame", row.names = c(NA,
-20L))
Thank you very much for your help and suggestions; this really helps with my learning.
My own humble attempt was this:
table <- read.delim("file.txt", header = T, sep = "\t")
library(ggplot2)
ggplot(aes(x=Number, y=Names)) +
geom_bar(stat="identity",position="identity") +
xlab("number of genes") +
ylab("Name"))
The result was an error message regarding the aes.
Although this is not exactly what you are looking for, the following should get you started. @Genoa, as the expression goes, "there are no free lunches". So in this spirit, as @dww has rightly pointed out, show "some effort"!
# create dummy data
df <- data.frame(x = letters,y = runif(26))
# compute normalized occurence for letter
df$normalize_occurence <- round((df$y - mean(df$y))/sd(df$y), 2)
# categorise the occurence
df$category<- ifelse(df$normalize_occurence >0, "high","low")
# check summary statistic
summary(df)
x y normalize_occurence
a : 1 Min. :0.00394 Min. :-1.8000000
b : 1 1st Qu.:0.31010 1st Qu.:-0.6900000
c : 1 Median :0.47881 Median :-0.0800000
d : 1 Mean :0.50126 Mean : 0.0007692
e : 1 3rd Qu.:0.70286 3rd Qu.: 0.7325000
f : 1 Max. :0.93091 Max. : 1.5600000
(Other):20
category
Length:26
Class :character
Mode :character
ggplot(df,aes(x = x,y = normalize_occurence)) +
geom_bar(aes(fill = category),stat = "identity") +
labs(title= "Diverging Bars")+
coord_flip()
@ddw and @Ashish are right - there's a lot in this question. It's also not clear how ggplot "failed" in reproducing the figure, and that would help understand what you're struggling with.
The key to ggplot is that pretty much everything that you want to include in the plotting should be included in the data. Adding a few variables to your table to help with putting bars in the right direction will get you a long way toward what you want. Make the variables that are actually negative ("down" values) negative, and they'll plot that way:
r_sample$Count2 <- ifelse(r_sample$Trend_in_AE=="Down",r_sample$Count*-1,r_sample$Count)
r_sample$PValue2 <- ifelse(r_sample$Trend_in_AE=="Down",r_sample$PValue*-1,r_sample$PValue)
Then reorder your "Name" so that it plots according to the new PValue2 variable:
r_sample$Name <- factor(r_sample$Name, r_sample$Name[order(r_sample$PValue2)], ordered=T)
Lastly, you'll want to left-justify some labels and right-justify others, so make that a variable now:
r_sample$just <- ifelse(r_sample$Trend_in_AE=="Down",0,1)
Then some fairly minimal plot code gets you quite close to what you want:
ggplot(r_sample, aes(x=Name, y=Count2, fill=PValue2)) +
geom_bar(stat="identity") +
scale_y_continuous("Number of Differently Regulated Genes", position="top", limits=c(-100,225), labels=c(100,0,100,200)) +
scale_x_discrete("", labels=NULL) +
scale_fill_gradient2(low="blue", mid="light grey", high="yellow", midpoint=0) +
coord_flip() +
theme_minimal() +
geom_text(aes(x=Name, y=0, label=Name), hjust=r_sample$just)
You can explore the theme commands on the ggplot2 help page to figure out the rest of the formatting.
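The sign trick and the axis labels can be checked in isolation. This is a small sketch using four counts taken from the question's data; trend, count, signed, breaks and axis_labels are illustrative names.

```r
# Two "Up" and two "Down" rows from the question's data
trend <- c("Up", "Up", "Down", "Down")
count <- c(171L, 201L, 16L, 7L)

# "Down" counts become negative so their bars point to the left
signed <- ifelse(trend == "Down", -count, count)
print(signed)

# The axis still shows positive numbers on both sides of zero
breaks <- c(-100, 0, 100, 200)
axis_labels <- abs(breaks)
print(axis_labels)
```

Since scale_y_continuous also accepts a function for labels, passing labels = abs is one way to avoid hard-coding the label vector.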

Ordering of factor variables [duplicate]

I am calling the ggplot function
ggplot(data,aes(x,y,fill=category))+geom_bar(stat="identity")
The result is a barplot with bars filled by various colours corresponding to category. However the ordering of the colours is not consistent from bar to bar. Say there is pink, green and blue. Some bars go pink,green,blue from bottom to top and some go green,pink,blue. I don't see any obvious pattern.
How are these orderings chosen? How can I change it? At the very least, how can I make ggplot choose a consistent ordering?
The classes of x, y and category are integer, numeric and factor, respectively. If I make category an ordered factor, it does not change this behavior.
Anyone know how to fix this?
Reproducible example:
dput(data)
structure(list(mon = c(9L, 10L, 11L, 10L, 8L, 7L, 7L, 11L, 9L,
10L, 12L, 11L, 7L, 12L, 8L, 12L, 9L, 7L, 9L, 10L, 10L, 8L, 12L,
7L, 11L, 10L, 8L, 7L, 11L, 12L, 12L, 9L, 9L, 7L, 7L, 12L, 12L,
9L, 9L, 8L), gclass = structure(c(9L, 1L, 8L, 6L, 4L, 4L, 3L,
6L, 2L, 4L, 1L, 1L, 5L, 7L, 1L, 6L, 8L, 6L, 4L, 7L, 8L, 7L, 9L,
8L, 3L, 5L, 9L, 2L, 7L, 3L, 5L, 5L, 7L, 7L, 9L, 2L, 4L, 1L, 3L,
8L), .Label = c("Down-Down", "Down-Stable", "Down-Up", "Stable-Down",
"Stable-Stable", "Stable-Up", "Up-Down", "Up-Stable", "Up-Up"
), class = c("ordered", "factor")), NG = c(222614.67, 9998.17,
351162.2, 37357.95, 4140.48, 1878.57, 553.86, 40012.25, 766.52,
15733.36, 90676.2, 45000.29, 0, 375699.84, 2424.21, 93094.21,
120547.69, 291.33, 1536.38, 167352.21, 160347.01, 26851.47, 725689.06,
4500.55, 10644.54, 75132.98, 42676.41, 267.65, 392277.64, 33854.26,
384754.67, 7195.93, 88974.2, 20665.79, 7185.69, 45059.64, 60576.96,
3564.53, 1262.39, 9394.15)), .Names = c("mon", "gclass", "NG"
), row.names = c(NA, -40L), class = "data.frame")
ggplot(data,aes(mon,NG,fill=gclass))+geom_bar(stat="identity")
Starting in ggplot2_2.0.0, the order aesthetic is no longer available. To get a graph with the stacks ordered by fill color, you can simply order the dataset by the grouping variable you want to order by.
I often use arrange from dplyr for this. Here I'm ordering the dataset by the fill factor within the ggplot call rather than creating an ordered dataset but either will work fine.
library(dplyr)
ggplot(arrange(data, gclass), aes(mon, NG, fill = gclass)) +
geom_bar(stat = "identity")
This is easily done in base R, of course, using the classic order with the extract brackets:
ggplot(data[order(data$gclass), ], aes(mon, NG, fill = gclass)) +
geom_bar(stat = "identity")
With the resulting plot in both cases now in the desired order:
ggplot2_2.2.0 update
In ggplot2_2.2.0, fill order is based on the order of the factor levels. The default order will plot the first level at the top of the stack instead of the bottom.
If you want the first level at the bottom of the stack you can use reverse = TRUE in position_stack. Note you can also use geom_col as shortcut for geom_bar(stat = "identity").
ggplot(data, aes(mon, NG, fill = gclass)) +
geom_col(position = position_stack(reverse = TRUE))
You need to specify the order aesthetic as well.
ggplot(data,aes(mon,NG,fill=gclass,order=gclass))+
geom_bar(stat="identity")
This may or may not be a bug.
To order, you must use the levels parameter and specify the order explicitly, like this:
data$gclass
(data$gclass2 <- factor(data$gclass,levels=sample(levels(data$gclass)))) # Note the difference in the order of the factor levels
ggplot(data,aes(mon,NG,fill=gclass2))+geom_bar(stat="identity")
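To see what changing the levels does, here is a small base R sketch with a few of the gclass labels. The vectors gclass and gclass2 are made up for illustration: the values stay the same, and only the level order and the underlying integer codes change.

```r
gclass <- factor(c("Up-Up", "Down-Down", "Stable-Up", "Down-Down"))
print(levels(gclass))  # default order: sorted alphabetically

# Same values, but with an explicit level order
gclass2 <- factor(gclass, levels = c("Up-Up", "Stable-Up", "Down-Down"))
print(levels(gclass2))

# The labels are unchanged; only the integer codes differ
print(as.character(gclass2))
print(as.integer(gclass2))
```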
You can change the colour using the scale_fill_ functions. For example:
ggplot(dd,aes(mon,NG,fill=gclass)) +
geom_bar(stat="identity") +
scale_fill_brewer(palette="Blues")
To get consistent ordering in the bars, you need to order the data frame:
dd = dd[with(dd, order(gclass, -NG)), ]
In order to change the ordering of legend, alter the gclass factor. So something like:
dd$gclass= factor(dd$gclass,levels=sort(levels(dd$gclass), TRUE))
Since this exchange shows up first for "factor fill order", I will add one more solution, which I believe is a bit more straightforward and doesn't require altering your underlying data.
ggplot(data,aes(x,y,fill=factor(category, levels = c("Down-Down", "Down-Stable", "Down-Up", "Stable-Down", "Stable-Stable", "Stable-Up", "Up-Down", "Up-Stable", "Up-Up")))) +
geom_col(position = position_stack(reverse = FALSE))
Or, as I prefer, I first create a vector of levels to simplify the code later and make it easier to edit:
v_factor_levels <- c("Down-Down", "Down-Stable", "Down-Up", "Stable-Down", "Stable-Stable", "Stable-Up", "Up-Down", "Up-Stable", "Up-Up")
ggplot(data,aes(x,y,fill=factor(category, levels = v_factor_levels))) +
geom_col(position = position_stack(reverse = FALSE))
You don't need the reverse position element within geom_col(); I keep it as a reminder in case I want to reverse, but you could simplify further by removing it.
Building on @aosmith's answer, another way to order the bars, which I found slightly more intuitive, is:
ggplot(data, aes(x=mon, y=reorder(NG,gclass), fill = gclass)) +
geom_bar(stat = "identity")
The beauty of the reorder function from the base stats package is that you can call it as reorder(x, X, FUN), wherein the levels of x are reordered according to FUN (mean by default; sum, median, etc. also work) applied to the values of X within each level.
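As a small check of how reorder behaves, the sketch below uses made-up vectors f and x (not the question's data) and reorders the levels of a factor by the per-level sum of a numeric vector.

```r
f <- factor(c("a", "b", "c", "a"))
x <- c(5, 1, 3, 4)

# Per-level sums: a = 9, b = 1, c = 3, so the levels become b, c, a
f_by_sum <- reorder(f, x, FUN = sum)
print(levels(f_by_sum))
```

Note that the first argument is the variable whose levels get reordered and the second supplies the values to order by.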

ggplot: better presentation of barplot

I have a small data frame DF which consists of two columns, X=Type and Y=Cost. I want to draw a barplot of the cost for each type. I have managed to do that; however, I'm seeking a better presentation of the barplot. I have three issues which, if addressed, I think will satisfy my requirements:
1) Since the X-axis text for each type is long, I rotated the labels by 45 degrees. I tried abbreviating them, but the result was unreadable.
2) Instead of color, I was trying to use fill patterns/textures in ggplot, which turns out not to be possible according to Hadley: fill patterns
Is there any way to make the plot readable in case of black/white printing?
3) I'm wondering if there is a way to focus on one of the "Type" categories, i.e. make it bold and give it a special color to draw the eye to it. For example, I want to make the "Other" bar look different from the others.
I have tried out my own ideas; however, I'm totally open to redesigning the graph. Any suggestions?
Here is the data (I used the dput command):
structure(list(Type = structure(c(6L, 8L, 7L, 9L, 10L, 15L, 11L,
17L, 3L, 16L, 5L, 19L, 4L, 14L, 2L, 18L, 13L, 1L, 12L), .Label = c("Backup Hardware ",
"data or network control", "Email exchange server/policy", "Instant messaging control",
"Login/system administrators/privilage", "Machine A", "Machine A with Software and Camera",
"Machine A without Software", "Machine B", "Machine B without RAM and CD ROM",
"Managment analyses software ", "Other", "Password and security",
"public web application availability", "Software for backup",
"System access by employees ", "Telecom and Harware", "Web site update",
"wireless network access ponits"), class = "factor"), Cost = structure(c(4L,
3L, 15L, 13L, 11L, 7L, 2L, 1L, 19L, 16L, 14L, 12L, 10L, 9L, 8L,
6L, 5L, 17L, 18L), .Label = c("$1,292,312", "$1,888,810", "$11,117,200",
"$14,391,580", "$161,210", "$182,500", "$2,145,900", "$250,000",
"$270,500", "$298,810", "$3,452,010", "$449,001", "$6,034,000",
"$621,710", "$7,642,660", "$700,000", "$85,100", "$885,000",
"$923,700"), class = "factor")), .Names = c("Type", "Cost"), class = "data.frame", row.names = c(NA,
-19L))
Here is my code in R:
p<- ggplot(data = DF, aes(x=Type, y=Cost)) +
geom_bar(aes(fill=Type),stat="identity") +
geom_line()+
scale_x_discrete (name="Type")+
scale_y_discrete(name="Cost")+
theme(axis.text.x = element_text(colour="black",size=11,face="bold")) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))+
labs(fill=expression(paste("Type of study\n")))
print(p)
Here are some starting points for your plot:
1) First, convert the variable Cost from factor to numeric and name it Cost2:
DF$Cost2<-as.numeric(gsub("[^0-9]", "",DF$Cost))
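The reason for going through gsub() rather than calling as.numeric() directly is that a factor converts to its integer level codes. Here is a short sketch with three of the question's cost strings; cost, wrong and right are illustrative names.

```r
cost <- factor(c("$14,391,580", "$85,100", "$1,292,312"))

# Direct conversion returns the level codes, not the amounts
wrong <- as.numeric(cost)
print(wrong)

# Dropping every non-digit character first yields the real values
right <- as.numeric(gsub("[^0-9]", "", cost))
print(right)
```

This pattern would also remove a decimal point, so it is only suitable for whole-dollar amounts like the ones in the question.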
2) Convert your plot to grey scale using scale_fill_manual(): here all bars are grey except the bar for Other, which is black. With scale_y_continuous(), format the y-axis values as dollars again using labels=dollar (for this you need to load the scales library). To make the Other label on the x axis bold while the others stay plain, provide the face= argument of element_text() for axis.text.x inside theme() with a vector as long as the number of levels: 1 for every level and 2 for the level Other.
library(scales)
ggplot(data = DF, aes(x=Type, y=Cost2)) +
geom_bar(aes(fill=Type),stat="identity",show.legend=FALSE) +
theme(axis.text.x = element_text(angle = 45, hjust = 1,
face=(as.numeric(levels(DF$Type)=="Other") + 1)))+
scale_y_continuous(labels=dollar)+
scale_fill_manual(values=c("grey43","black")[as.numeric(levels(DF$Type)=="Other")+1])
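The indexing trick used for both the fill colours and the font face can be checked on its own. The sketch below uses three shortened level names as a stand-in for levels(DF$Type); type_levels, is_other, fill_values and face_values are illustrative names.

```r
# Stand-in for levels(DF$Type)
type_levels <- c("Backup Hardware", "Other", "Web site update")

# TRUE only for the level to highlight
is_other <- type_levels == "Other"

# FALSE/TRUE become 0/1; adding 1 gives an index of 1 or 2
fill_values <- c("grey43", "black")[as.numeric(is_other) + 1]
face_values <- as.numeric(is_other) + 1  # 1 = plain, 2 = bold

print(fill_values)
print(face_values)
```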
