How to plot diverging bar plots in R

Hi, I am relatively new to R / ggplot2 and would like some advice on how to create a plot that looks like this:
Explanation: A diverging bar plot showing biological functions, with genes that have increased expression (yellow) pointing to the right and genes with reduced expression (purple) pointing to the left. The length of each bar represents the number of differentially expressed genes, and the color intensity varies according to the p-value.
Note that the x-axis must be 'positive' in both directions.
(In published gene expression studies, bars pointing to the left represent genes with reduced expression, and bars pointing to the right represent genes with increased expression. The purpose of the graph is not to show the "magnitude" of change, which would give rise to positive and negative values. Instead, we are plotting the NUMBER of genes whose expression changes, which therefore cannot be negative.)
I have tried ggplot2 but completely failed to reproduce the graph shown.
Here is the data I am trying to plot:
> dput(sample)
structure(list(Name = structure(c(15L, 19L, 5L, 11L, 8L, 6L,
16L, 13L, 17L, 1L, 3L, 2L, 14L, 18L, 7L, 12L, 10L, 9L, 4L, 20L
), .Label = c("Actin synthesis", "Adaptive immunity", "Antigen presentation",
"Autophagy", "Cell cycle", "Cell division", "Cell polarity",
"DNA repair", "Eye development", "Lipid metabolism", "Phosphorylation",
"Protein metabolism", "Protein translation", "Proteolysis", "Replication",
"Signaling", "Sumoylation", "Trafficking", "Transcription", "Translational initiation"
), class = "factor"), Trend_in_AE = structure(c(2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), .Label = c("Down", "Up"), class = "factor"), Count = c(171L,
201L, 38L, 63L, 63L, 47L, 22L, 33L, 20L, 16L, 16L, 7L, 10L, 4L,
13L, 15L, 5L, 7L, 9L, 7L), PValue = c(1.38e-08, 1.22e-06, 1.79e-06,
2.89e-06, 0.000122, 0.000123, 0.00036, 0.000682, 0.001030253,
0.001623939, 7.76e-05, 0.000149, 0.000734, 0.001307039, 0.00292414,
0.003347556, 0.00360096, 0.004006781, 0.007330264, 0.010083734
)), .Names = c("Name", "Trend_in_AE", "Count", "PValue"), class = "data.frame", row.names = c(NA,
-20L))
Thank you very much for your help and suggestions; this really helps with my learning.
My own humble attempt was this:
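table <- read.delim("file.txt", header = TRUE, sep = "\t")
library(ggplot2)
# horizontal bars: counts on x, names on y (needs ggplot2 >= 3.3.0)
ggplot(table, aes(x = Count, y = Name)) +
geom_bar(stat = "identity", position = "identity") +
xlab("number of genes") +
ylab("Name")
The result was an error message regarding aes. The original call was missing the data argument (table), referred to columns called Number and Names (in the data they are Count and Name), and had an extra closing parenthesis after ylab(); the code above corrects those three problems.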

Although this is not exactly what you are looking for, the following should get you started. @Genoa, as the expression goes, "there are no free lunches", so in that spirit, as @dww has rightly pointed out, show "some effort"!
# create dummy data
df <- data.frame(x = letters,y = runif(26))
# compute the normalized (z-score) value for each letter
df$normalize_occurence <- round((df$y - mean(df$y))/sd(df$y), 2)
# categorise the occurence
df$category<- ifelse(df$normalize_occurence >0, "high","low")
# check summary statistic
summary(df)
       x            y           normalize_occurence   category
 a      : 1   Min.   :0.00394   Min.   :-1.8000000   Length:26
 b      : 1   1st Qu.:0.31010   1st Qu.:-0.6900000   Class :character
 c      : 1   Median :0.47881   Median :-0.0800000   Mode  :character
 d      : 1   Mean   :0.50126   Mean   : 0.0007692
 e      : 1   3rd Qu.:0.70286   3rd Qu.: 0.7325000
 f      : 1   Max.   :0.93091   Max.   : 1.5600000
 (Other):20
ggplot(df,aes(x = x,y = normalize_occurence)) +
geom_bar(aes(fill = category),stat = "identity") +
labs(title= "Diverging Bars")+
coord_flip()

@dww and @Ashish are right: there's a lot in this question. It's also not clear how ggplot "failed" to reproduce the figure, and knowing that would help us understand what you're struggling with.
The key to ggplot is that pretty much everything you want in the plot should be in the data. Adding a few variables to your table (called r_sample here) that put the bars in the right direction will get you a long way toward what you want. Make the values for the "Down" genes negative, and they will plot to the left:
r_sample$Count2 <- ifelse(r_sample$Trend_in_AE=="Down",r_sample$Count*-1,r_sample$Count)
r_sample$PValue2 <- ifelse(r_sample$Trend_in_AE=="Down",r_sample$PValue*-1,r_sample$PValue)
Then reorder your "Name" so that it plots according to the new PValue2 variable:
r_sample$Name <- factor(r_sample$Name, r_sample$Name[order(r_sample$PValue2)], ordered=T)
Lastly, you'll want to left-justify some labels and right-justify others, so make that a variable now:
r_sample$just <- ifelse(r_sample$Trend_in_AE=="Down",0,1)
Then some fairly minimal plot code gets you quite close to what you want:
ggplot(r_sample, aes(x=Name, y=Count2, fill=PValue2)) +
geom_bar(stat="identity") +
scale_y_continuous("Number of Differentially Regulated Genes", position="top", limits=c(-100,225), labels=abs) +
scale_x_discrete("", labels=NULL) +
scale_fill_gradient2(low="blue", mid="lightgrey", high="yellow", midpoint=0) +
coord_flip() +
theme_minimal() +
geom_text(aes(x=Name, y=0, label=Name, hjust=just))
You can explore the theme commands on the ggplot2 help page to figure out the rest of the formatting.
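A minimal sketch that goes a little further toward the figure described in the question: it keeps the signed-count trick from the answer above, but fills bars by direction (purple for Down, yellow for Up) and maps -log10(p) to transparency, so more significant bars look more intense. The data frame name genes, the columns signed_count and neg_log_p, and the specific colours and alpha range are illustrative choices, not taken from the question's figure.

```r
library(ggplot2)

# The question's data (dput output), stored under the illustrative name `genes`
genes <- structure(list(Name = structure(c(15L, 19L, 5L, 11L, 8L, 6L,
16L, 13L, 17L, 1L, 3L, 2L, 14L, 18L, 7L, 12L, 10L, 9L, 4L, 20L
), .Label = c("Actin synthesis", "Adaptive immunity", "Antigen presentation",
"Autophagy", "Cell cycle", "Cell division", "Cell polarity",
"DNA repair", "Eye development", "Lipid metabolism", "Phosphorylation",
"Protein metabolism", "Protein translation", "Proteolysis", "Replication",
"Signaling", "Sumoylation", "Trafficking", "Transcription", "Translational initiation"
), class = "factor"), Trend_in_AE = structure(c(2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), .Label = c("Down", "Up"), class = "factor"), Count = c(171L,
201L, 38L, 63L, 63L, 47L, 22L, 33L, 20L, 16L, 16L, 7L, 10L, 4L,
13L, 15L, 5L, 7L, 9L, 7L), PValue = c(1.38e-08, 1.22e-06, 1.79e-06,
2.89e-06, 0.000122, 0.000123, 0.00036, 0.000682, 0.001030253,
0.001623939, 7.76e-05, 0.000149, 0.000734, 0.001307039, 0.00292414,
0.003347556, 0.00360096, 0.004006781, 0.007330264, 0.010083734
)), .Names = c("Name", "Trend_in_AE", "Count", "PValue"), class = "data.frame", row.names = c(NA,
-20L))

# Down counts become negative so those bars point left
genes$signed_count <- ifelse(genes$Trend_in_AE == "Down", -genes$Count, genes$Count)

# Order categories by signed count; after coord_flip() the largest Up bar is on top
genes$Name <- reorder(genes$Name, genes$signed_count)

# Smaller p-value -> larger -log10(p) -> more opaque bar
genes$neg_log_p <- -log10(genes$PValue)

p <- ggplot(genes, aes(x = Name, y = signed_count,
                       fill = Trend_in_AE, alpha = neg_log_p)) +
  geom_col() +
  geom_hline(yintercept = 0) +
  scale_fill_manual(values = c(Down = "purple", Up = "gold"), name = "Trend") +
  scale_alpha_continuous(range = c(0.3, 1), name = "-log10(p)") +
  # print tick labels as absolute values so both directions read as counts
  scale_y_continuous(labels = abs, name = "Number of genes") +
  coord_flip() +
  labs(x = NULL) +
  theme_minimal()

print(p)
```

Because labels = abs only changes how the tick labels are printed, the Down bars still sit at negative positions but the axis reads as positive counts in both directions, which is what the question asks for.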

Related

How to add edges between components of a graph in igraph (R)

I have a graph containing 4 components. Now I want to add edges between the components, based on component membership.
For example, the following graph contains 4 components.
First, I connect every pair of components with a single edge whose endpoints are chosen at random. I can do this with the following code:
library(igraph)
graph1 <- graph_from_data_frame(g, directed = FALSE)
E(graph1)$weight <- g$new_ssp
cl <- components(graph1)
graph2 <- with(
stack(membership(cl)),
add.edges(
graph1,
c(combn(sapply(split(ind, values), sample, size = 1), 2)),
weight = runif(choose(cl$no, 2))
)
)
Second, I want to add an edge only between component 1 and component 2, while the remaining components stay as they were in the original graph.
For example, after adding an edge between component 1 and component 2, the new graph will contain 3 components: (1) components 1 and 2, joined into a single component by the added edge, (2) component 3 from the original graph, and (3) component 4 from the original graph. I can do this with the following code:
dg <- decompose.graph(graph1)
graph3 <- (dg[[1]] %u% dg[[2]])
component_subgraph_1 <- components(graph3)
graph2 <- with(
stack(membership(component_subgraph_1)),
add.edges(
graph1,
c(combn(sapply(split(ind, values), sample, size = 1), 2)),
weight = 0.01))
Figure:
I want to do the same for every pair of components: 1 and 3, 1 and 4, 2 and 3, 2 and 4, and 3 and 4.
However, it is not feasible to write this code by hand and manually change dg[[1]], dg[[2]], and so on. Moreover, my actual dataset contains many components, so in practice this is impossible.
Any idea how I can do this automatically?
I also have a scoring function (similar to shortest path). I want to compute the score after connecting all components, after connecting only 2 components, after connecting only 3 components, and so on, somewhat like a greedy algorithm.
Reproducible Data:
g <- structure(list(query = structure(c(1L, 1L, 1L, 2L, 2L, 3L, 4L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("ID_00104",
"ID_00136", "ID_00169", "ID_00178", "ID_00180"), class = "factor"),
target = structure(c(16L, 19L, 20L, 1L, 9L, 9L, 6L, 11L,
13L, 15L, 4L, 8L, 10L, 14L, 2L, 3L, 5L, 7L, 12L, 17L, 18L
), .Label = c("ID_00169", "ID_00288", "ID_00324", "ID_00394",
"ID_00663", "ID_00790", "ID_00846", "ID_00860", "ID_00910", "ID_00959",
"ID_01013", "ID_01047", "ID_01130", "ID_01222", "ID_01260", "ID_06663",
"ID_06781", "ID_06786", "ID_06791", "ID_09099"), class = "factor"),
new_ssp = c(0.654172560113154, 0.919096895578551, 0.925821596244131,
0.860406091370558, 0.746376811594203, 0.767195767195767,
0.830379746835443, 0.661577608142494, 0.707520891364902,
0.908193484698914, 0.657118786857624, 0.687664041994751,
0.68586387434555, 0.874513618677043, 0.836646499567848, 0.618361836183618,
0.684163701067616, 0.914728682170543, 0.876297577854671,
0.732707087959009, 0.773116438356164)), row.names = c(NA,
-21L), class = "data.frame")
Thanks in advance.
You are already close to what you want. Perhaps the code below could help:
out <- with(
stack(membership(cl)),
lapply(
combn(split(ind, values), 2, simplify = FALSE),
function(x) {
add.edges(
graph1,
c(combn(sapply(x, sample, size = 1), 2)),
weight = 0.01
)
}
)
)
and then you can run
sapply(out, plot)
to visualize all the combinations.
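For the scoring part of the question, here is a sketch that generalizes the answer from pairs to subsets of any size k: for each subset of components it joins those components with one random edge per pair, counts the resulting components, and scores the graph. It uses a small toy graph instead of the question's g, and score_graph() (mean unweighted shortest-path length over connected pairs) is only a hypothetical stand-in for your own scoring function; connect_components() is likewise an illustrative helper name.

```r
library(igraph)

# Toy stand-in for graph1: 4 components {A,B,C}, {D,E}, {F,G,H}, {I,J}
g0 <- graph_from_literal(A-B, B-C, D-E, F-G-H, I-J)

# Join the selected components with one random edge per pair of components
connect_components <- function(graph, comp_ids) {
  memb <- components(graph)$membership
  # one randomly chosen vertex id per selected component
  reps <- vapply(comp_ids, function(k) {
    v <- which(memb == k)
    unname(v[sample.int(length(v), 1)])
  }, integer(1))
  # combn() gives all pairs as matrix columns; c() flattens them into an edge list
  add_edges(graph, c(combn(reps, 2)))
}

# Hypothetical score: mean shortest-path length over connected vertex pairs
# (weights = NA forces unweighted distances)
score_graph <- function(graph) {
  d <- distances(graph, weights = NA)
  mean(d[upper.tri(d) & is.finite(d)])
}

set.seed(1)
comp_ids <- seq_len(count_components(g0))
rows <- list()
for (k in 2:length(comp_ids)) {
  for (subset in combn(comp_ids, k, simplify = FALSE)) {
    g_new <- connect_components(g0, subset)
    rows[[length(rows) + 1]] <- data.frame(
      joined = paste(subset, collapse = "-"),
      k = k,
      n_components = count_components(g_new),
      score = score_graph(g_new)
    )
  }
}
res <- do.call(rbind, rows)
print(res[order(res$score), ])
```

This enumerates every subset exhaustively, so the number of graphs grows roughly as 2^n with the number of components; for many components, a greedy search could reuse the same two helpers but only extend the best-scoring subset found so far.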

How to manage 4 different parameters in ggplot

I have the following problem: I want to create a plot with ggplot showing the relationship between two variables, microplastic quantification in mussels (denoted MP) and lipofuscin accumulation (denoted Lip), for different treatment groups as a function of exposure time.
My data look like this:
And here is my code:
ggplot(Catrv_all,aes(Lip,MP,color=treatment))+
geom_smooth(method="lm", se=FALSE)+
geom_point(size = 2)+
theme(legend.position = "bottom")+
theme(plot.title = element_text(hjust = 0.5))+
labs(x = "Lipofuscin accumulation [% area]",
y = "Microplastic quantification [% area]",
title = "Lipofuscin accumulation vs. Microplastic quantification")
The plot looks like this:
I noticed that ggplot obviously does not order the values by exposure time; for example, it does not start with the value for 0 h.
My question is: how can I tell ggplot to order the MP and Lip values by exposure time? Should I create a second x-axis? If so, how can I do that in ggplot?
I have seen many discussions on SO saying that it is difficult to create a second x/y axis in ggplot, but I don't know how else to visualize my data.
Update: I followed the advice of sconfluentus and found a very interesting answer by Ben Bolker in the following post:
How can I plot with 2 different y-axes?
I adapted the provided code:
## add extra space to right margin of plot within frame
par(mar=c(5, 4, 4, 6) + 0.1)
## split data set for treatment groups
MP_Ko<-Catrv_all$MP[1:8]
exp<-Catrv_all$expTime[1:8]
Lip_Ko<-Catrv_all$Lip[1:8]
## Plot first set of data and draw its axis
plot(exp, MP_Ko, pch=16, axes=FALSE, xlab="", ylab="",
type="b",col="black", main="Microplastic quantification vs. Lipofuscin accumulation in Controls")
axis(2,col="black",las=1) ## las=1 makes horizontal labels
mtext("Microplastic quantification [% area]",side=2,line=2.5)
box()
## Allow a second plot on the same graph
par(new=TRUE)
## Plot the second plot and put axis scale on right
plot(exp,Lip_Ko, pch=15, xlab="", ylab="",
axes=FALSE, type="b", col="red")
## a little farther out (line=4) to make room for labels
mtext("Lipofuscin accumulation [% area]",side=4,col="red",line=4)
axis(4, col="red",col.axis="red",las=1)
## Draw the time axis
axis(1,pretty(range(Catrv_all$expTime, 672)))
mtext("Time (Hours)",side=1,col="black",line=2.5)
## Add Legend
legend("topright",legend=c("Microplastic quantification","Lipofuscin accumulation"),
text.col=c("black","red"),pch=c(16,15),col=c("black","red"))
... and got the following plot:
Time consuming, but this approach was very helpful.
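For comparison, the same controls plot can be sketched in ggplot2 with sec_axis(), which draws a secondary y-axis as a fixed transformation of the primary one. Lip is divided by a scale factor k so both series share the MP axis, and the secondary axis multiplies it back. The data are the first 8 rows (treatment "Co") of the dput output below; the name ctrl and the choice of k (ratio of the maxima) are assumptions, and any constant that puts both series on a similar range would work.

```r
library(ggplot2)

# Control group (treatment "Co"), first 8 rows of Catrv_all
ctrl <- data.frame(
  expTime = c(0, 3, 6, 24, 96, 168, 336, 672),
  MP  = c(0.056481655, 0.098508038, 0.097108112, 0.056848278,
          0.082198187, 0.052261369, 0.022911461, 0.023901656),
  Lip = c(9.848221569, 11.62875399, 9.530378924, 12.67745734,
          14.14610784, 11.44140636, 11.55310567, 12.37321851)
)

# Scale factor mapping Lip onto the MP range (assumed: ratio of maxima)
k <- max(ctrl$Lip) / max(ctrl$MP)

p <- ggplot(ctrl, aes(x = expTime)) +
  geom_line(aes(y = MP, colour = "Microplastic quantification")) +
  geom_point(aes(y = MP, colour = "Microplastic quantification"), shape = 16) +
  geom_line(aes(y = Lip / k, colour = "Lipofuscin accumulation")) +
  geom_point(aes(y = Lip / k, colour = "Lipofuscin accumulation"), shape = 15) +
  scale_y_continuous(
    name = "Microplastic quantification [% area]",
    # secondary axis undoes the scaling so it shows Lip in its own units
    sec.axis = sec_axis(~ . * k, name = "Lipofuscin accumulation [% area]")
  ) +
  scale_colour_manual(values = c("Microplastic quantification" = "black",
                                 "Lipofuscin accumulation" = "red")) +
  labs(x = "Time (Hours)", colour = NULL,
       title = "Microplastic quantification vs. Lipofuscin accumulation in Controls") +
  theme(legend.position = "top")

print(p)
```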
Thank you all for your advice! I have now run dput(Catrv_all), and here is the output for my data:
structure(list(treatment = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L), .Label = c("Co", "CoP", "HDPE"), class = "factor"),
expTime = c(0L, 3L, 6L, 24L, 96L, 168L, 336L, 672L, 0L, 3L,
6L, 24L, 96L, 168L, 336L, 672L, 0L, 3L, 6L, 24L, 96L, 168L,
336L, 672L), MP = c(0.056481655, 0.098508038, 0.097108112,
0.056848278, 0.082198187, 0.052261369, 0.022911461, 0.023901656,
0.056481655, 0.124866733, 0.125732967, 0.07986102, 0.071233133,
0.128376543, 0.331948, 0.121689155, 0.056481655, 0.186735799,
0.137477095, 0.41251914, 0.093364945, 0.085760245, 0.249371764,
0.187693319), Lip = c(9.848221569, 11.62875399, 9.530378924,
12.67745734, 14.14610784, 11.44140636, 11.55310567, 12.37321851,
9.848221569, 8.889567938, 12.5142123, 13.79770638, 11.26698845,
14.67064904, 14.56027915, 15.24772977, 9.848221569, 12.22424265,
13.05104725, 12.96830215, 12.10175574, 14.66505958, 13.67550035,
11.65168387), Cat = c(6.681571728, 7.321681629, 4.939885929,
7.73812502, 6.85066487, 9.317238053, 8.309505248, 9.33338377,
6.681571728, 7.517468479, 7.151607966, 9.074518192, 6.350614893,
9.749092742, 9.335634354, 11.43658695, 6.681571728, 6.164473371,
9.416062149, 9.19813927, 8.041328941, 8.736550013, 9.788258534,
10.55471537), CI = c(120.5252336, 110.1709456, 112.9077575,
110.9032308, 101.0274926, 101.1970679, 107.1464111, 97.42950278,
120.5252336, 101.7284063, 132.6162567, 108.7251954, 107.2199383,
102.9096767, 100.9637646, 101.6655302, 120.5252336, 102.1888777,
111.9139996, 113.7840225, 104.4767637, 103.1984161, 96.67797683,
95.59369834)), .Names = c("treatment", "expTime", "MP", "Lip",
"Cat", "CI"), class = "data.frame", row.names = c(NA, -24L))
Hopefully this helps to reproduce my code.
Back to my question: yes, I would also like to show exposure time on one of the axes, if possible. Second, I want to show a kind of time series (from 0 h to 672 h) of the behaviour of both MP and Lip for all treatment groups. So my first idea was: y-axis: MP, bottom x-axis: Lip, top x-axis: exposure time, plotting the values for all treatment groups in exposure-time order (from 0 to 672), with a trend line. In short, I want visual evidence that MP behaviour over time led to changes in lipofuscin accumulation in the different treatment groups.
@Jake Kaupp: I am not sure how to use facet_wrap in ggplot. Could you explain that a bit more, please?
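Regarding facet_wrap, a hedged sketch using the treatment, expTime, MP and Lip columns from the dput output above. The first plot shows MP and Lip against exposure time in separate panels with their own y-scales; the second keeps Lip on x and MP on y but joins each treatment's points in exposure-time order and labels them with the time. The key point is that geom_line() connects points in order of x, whereas geom_path() connects them in row order, so sorting by expTime first gives a time trajectory without a second x-axis. The object names long, p_time, by_time and p_path are illustrative.

```r
library(ggplot2)

# treatment, expTime, MP and Lip columns from dput(Catrv_all)
Catrv_all <- data.frame(
  treatment = factor(rep(c("Co", "HDPE", "CoP"), each = 8),
                     levels = c("Co", "CoP", "HDPE")),
  expTime = rep(c(0, 3, 6, 24, 96, 168, 336, 672), times = 3),
  MP = c(0.056481655, 0.098508038, 0.097108112, 0.056848278, 0.082198187,
         0.052261369, 0.022911461, 0.023901656, 0.056481655, 0.124866733,
         0.125732967, 0.07986102, 0.071233133, 0.128376543, 0.331948,
         0.121689155, 0.056481655, 0.186735799, 0.137477095, 0.41251914,
         0.093364945, 0.085760245, 0.249371764, 0.187693319),
  Lip = c(9.848221569, 11.62875399, 9.530378924, 12.67745734, 14.14610784,
          11.44140636, 11.55310567, 12.37321851, 9.848221569, 8.889567938,
          12.5142123, 13.79770638, 11.26698845, 14.67064904, 14.56027915,
          15.24772977, 9.848221569, 12.22424265, 13.05104725, 12.96830215,
          12.10175574, 14.66505958, 13.67550035, 11.65168387)
)

# Long format: one row per treatment x time x measured variable
long <- rbind(
  data.frame(Catrv_all[c("treatment", "expTime")], variable = "MP", value = Catrv_all$MP),
  data.frame(Catrv_all[c("treatment", "expTime")], variable = "Lip", value = Catrv_all$Lip)
)

# Time series: geom_line() joins points in order of expTime
p_time <- ggplot(long, aes(expTime, value, colour = treatment)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ variable, ncol = 1, scales = "free_y") +
  labs(x = "Exposure time [h]", y = "% area")

# MP vs Lip: sort by time, then geom_path() joins points in row (time) order
by_time <- Catrv_all[order(Catrv_all$treatment, Catrv_all$expTime), ]
p_path <- ggplot(by_time, aes(Lip, MP, colour = treatment)) +
  geom_path() +
  geom_point() +
  geom_text(aes(label = expTime), vjust = -0.7, size = 3, show.legend = FALSE) +
  facet_wrap(~ treatment) +
  labs(x = "Lipofuscin accumulation [% area]",
       y = "Microplastic quantification [% area]")

print(p_time)
print(p_path)
```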

Add symbol on top of ggplot2 boxplots to indicate value of variable

Working with the following subset of a much larger dataset,
ex <- structure(list(transect_id = c(1L, 1L, 1L, 1L, 1L, 15L, 15L,
15L, 15L, 15L, 15L), number_f = c(2L, 2L, 2L, 2L, 2L, 0L, 0L,
0L, 0L, 0L, 0L), years_f = c(1L, 1L, 1L, 1L, 1L, 6L, 6L, 6L,
6L, 6L, 6L), b = c(5.036625862, 6.468666553, 8.028989792, 4.168409348,
5.790089607, 10.67796993, 9.371051788, 10.54364777, 6.904324532,
7.203606129, 9.1611166)), .Names = c("transect_id", "number_f",
"years_f", "b"), class = "data.frame", row.names = c(1L, 2L,
3L, 4L, 5L, 2045L, 2046L, 2047L, 2048L, 2049L, 2050L))
I've plotted the distributions of "b" for each of the groups indicated by "transect_id" and have colored them by "number_f", which I do here:
ggplot(aes(x=reorder(transect_id, b, FUN=median), y=b), data=ex) + geom_boxplot(aes(fill=as.factor(number_f))) + xlab('Transect ID')
What I need to do for each "transect_id" group is stack symbols (asterisks or some other symbol) on top of each boxplot to indicate the value of "years_f" for that "transect_id". In the data subset above, "years_f" is 1 and 6 for transect_ids 1 and 15, respectively. I'd like to see something like this, which I mocked up manually.
Also keep in mind that the dataset I'm working with is very large, so I'll need a loop or some other way of doing this automatically. I also welcome other ideas for indicating the value of "years_f" that would not overburden the figure as much as stacked symbols, which will be a particular problem for larger values of "years_f".
Try adding
annotate('text', x = c(1, 2), y = 3, label = paste0('Year_F =', unique(ex$years_f)))
to the end of your plot like so:
ggplot(aes(x=reorder(transect_id, b, FUN=median), y=b), data=ex) +
geom_boxplot(aes(fill=as.factor(number_f))) + xlab('Transect ID')+
annotate('text', x = c(1, 2), y = 3, label = paste0('Year_F =', unique(ex$years_f)))
To use it on a bigger dataset you would have to edit the x and y arguments, but this might be a decent alternative. One possibility for the y coordinate is something like 0.9 * min(ex$b).
Edit, in response to your comment:
You could first count how many levels of transect_id there are, to specify x:
len.levels <- length(levels(as.factor(ex$transect_id)))
then create a summary table of the unique years_f value for each transect_id:
sum.table <- aggregate(years_f~reorder(ex$transect_id, ex$b, median),
data = ex, FUN = unique)
reorder(ex$transect_id, ex$b, median) years_f
1 1 1
2 15 6
and then plot as follows:
ggplot(aes(x=reorder(transect_id, b, FUN=median), y=b), data=ex) +
geom_boxplot(aes(fill=as.factor(number_f))) + xlab('Transect ID')+
annotate('text', x = 1:len.levels, y = .9 * min(ex$b),
label = paste0('Year_F =', sum.table[,2]))
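An alternative to annotate() that avoids computing x positions by hand: build a small label table with one row per transect and pass it to geom_text(). Because the label layer uses the same reordered factor as the boxplots, the labels stay aligned with their boxes however many transects there are. This sketch assumes years_f is constant within each transect_id, as in the subset shown; the column name transect and the table lab are illustrative.

```r
library(ggplot2)

ex <- data.frame(
  transect_id = c(1, 1, 1, 1, 1, 15, 15, 15, 15, 15, 15),
  number_f = c(2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0),
  years_f = c(1, 1, 1, 1, 1, 6, 6, 6, 6, 6, 6),
  b = c(5.036625862, 6.468666553, 8.028989792, 4.168409348, 5.790089607,
        10.67796993, 9.371051788, 10.54364777, 6.904324532, 7.203606129,
        9.1611166)
)

# Order transects by median b once and reuse the same factor in every layer
ex$transect <- reorder(factor(ex$transect_id), ex$b, FUN = median)

# One row per transect: its years_f value and the top of its box (max b)
lab <- aggregate(b ~ transect + years_f, data = ex, FUN = max)

p <- ggplot(ex, aes(x = transect, y = b)) +
  geom_boxplot(aes(fill = factor(number_f))) +
  # inherits x = transect and y = b from the label table
  geom_text(data = lab, aes(label = paste0("years_f = ", years_f)),
            vjust = -0.5) +
  labs(x = "Transect ID", fill = "number_f")

print(p)
```

A single text label per box stays readable for large years_f values, which stacked symbols would not.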

Resize/manually enter breaks on colorbar guide of geom_tile AND replace y-axis labels

I am revisiting an issue I ran into about a year ago. I would like my 'colourbar' guide to be displayed effectively on a log scale, so that the takeaway when looking at it is that increasingly darker shades of blue reflect greater significance.
With the following code, I generate the image below:
pz <- ggplot(dat.m, aes(x=variable,y=Category)) +
geom_tile(aes(fill=value)) +
xlab(NULL) + ylab(NULL) +
scale_fill_gradientn(colours=c("#000066","#0000FF","#DDDDDD","white"),
values=c(0,0.05,0.050000000000001,1.0),
breaks=c(0, 0.000001, 0.01, 0.05, 1),
guide = "colourbar") +
theme_bw()+
theme(panel.background = element_blank(),
panel.border = element_blank(),
axis.ticks.x = element_blank(),
axis.ticks.y = element_blank()) +
theme(legend.position="top",
legend.text = element_text(angle=45),
axis.text.x = element_text(angle=45)
)
Or, I can display it as a "legend" as opposed to a "colourbar":
But what I really desire is something like this:
I have tried adding trans="log" (scale_fill_gradientn(trans="log")), but there are lots of zeros in my data, which causes a problem. If you have any ideas, they would be greatly appreciated!
Previous wording:
I am trying to make a heatmap of p-values for different samples for various categorizations. There are two things I would like to modify on this plot:
I would like to adjust the legend of my geom_tile plot to emphasize the lower end of the legend scale while still maintaining the full spectrum of the gradient - similar to how it would look if it were a log scale. So essentially the white to blue transition from 1.0-0.05 and the blue to darkblue transition from 0.05-0.00 will be approximately equal in size. Is there a way that I can manually adjust the colorbar guide?
I would like to replace the y-axis names so that I can remove my "empty" row labels. Note that the Categories are simply represented as letters here, but in my real data set they are long names. I inserted "dummy" rows of data to split the categorizations into chunks, and ordered the tiles within each block from most significant to not significant. I am sure there is a better solution, but this is what I came up with after many failed attempts at other ideas I found on Stack Overflow! I have tried labeling them with scale_y_discrete, but the labels get jumbled with the ordering described above.
Help with either of these issues will be much appreciated!
Here is a sample dataset:
dput(dat.m)
structure(list(Category = structure(c(12L, 11L, 10L, 9L, 8L,
7L, 6L, 5L, 4L, 3L, 2L, 1L, 12L, 11L, 10L, 9L, 8L, 7L, 6L, 5L,
4L, 3L, 2L, 1L, 12L, 11L, 10L, 9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L,
1L), class = "factor", .Label = c("j", "i", "empty2", "h", "empty1",
"g", "f", "e", "d", "c", "b", "a")), variable = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L), .Label = c("b2", "c1", "c2"), class = "factor"),
value = c(7.40214650772221e-06, 0.0075828339, 0.1825924627,
0.0384381317, 0.0440256659, 0.3659284985, 0.9777569144, 1,
0.0075828339, 1, 0.2193606406, 0.3659284985, 0.0004289756,
0.0011541045, 0.0004289756, 0.4400885491, 0.6121402215, 0.6724032426,
0.2735924085, 1, 0.018824582, 1, 0.4386503891, 0.4249526456,
1.05094571578633e-05, 0.0027216795, 0.715979827, 0.0050376405,
0.7473334763, 0.9053300832, 1, 1, 0.0015392848, 1, 0.039679469,
0.0950327519)), .Names = c("Category", "variable", "value"
), row.names = c(NA, -36L), class = "data.frame")
And here is my code:
col_blue <- c("#FFFFFF","#000099","#000066","#000033")
ggplot(dat.m, aes(x=variable,y=Category)) +
geom_tile(aes(fill=value)) +
xlab(NULL) + ylab(NULL) +
scale_fill_gradientn(colours=col_blue, values=c(1,0.05,0.01,0),guide="colorbar") +
theme_mary(base_size=12)
UPDATE:
So now I have modified the code as such with the following results. I am getting closer to what I hope to achieve but I would like to play with the proportions of the colourbar to show the gradient from 0.05-0.0 a bit more clearly.
col_blue <- c("#FFFFFF","#000099","#000066","#000033")
ggplot(dat.m, aes(x=variable,y=Category)) +
geom_tile(aes(fill=value)) +
xlab(NULL) + ylab(NULL) +
scale_fill_gradientn(colours=col_blue, values=c(1,0.05,0.01,0), guide=FALSE) +
scale_colour_gradientn(guide = "colourbar", limits = c(0,1),breaks=c(1,0.05,0.01,0),values=c(1,0.05,0.01,0),colours=c("#FFFFFF","#000099","#000066","#000033"))
We can tell scale_fill_gradientn not to display a guide with guide = "none" (older code often uses guide = FALSE, which newer ggplot2 versions deprecate), then manually add our own with limits set to c(0,0.1) (or whatever range you want).
ggplot(dat.m, aes(x=variable,y=Category)) +
geom_tile(aes(fill=value)) +
xlab(NULL) +
ylab(NULL) +
scale_fill_gradientn(colours=col_blue, values=c(1,0.05,0.01,0), guide="none") +
scale_colour_gradientn(guide = "colorbar", limits = c(0,0.1), colours=col_blue)
As for your second point, why not just remove the "empty" rows from the source data before plotting?
For the empty rows, simply modify the data to drop them before (or as) you plot, e.g.: ggplot(dat.m[!grepl("^empty", dat.m$Category), ], aes(<etc>...))
For the legend, you can override the aesthetics just for the legend. Here is one example; adjust it to your taste: + guides(fill=guide_legend(override.aes=list(alpha=1)))
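Another option for the log-like colourbar, sketched on the dat.m data from the question: map -log10(p) to fill instead of the raw p-value, and clamp p-values below a floor (here 1e-6, an arbitrary choice) so zeros do not become infinite. The breaks are then placed at -log10 of the p-values you want to show and labelled with the original p-values. For the spacer rows, a labels function on scale_y_discrete blanks out the "empty" labels while keeping the gaps, as an alternative to dropping those rows. The names floor_p, neglog_p and p_breaks are illustrative.

```r
library(ggplot2)

# The question's dat.m, rebuilt from its dput() output
dat.m <- data.frame(
  Category = factor(rep(c("a", "b", "c", "d", "e", "f", "g", "empty1",
                          "h", "empty2", "i", "j"), times = 3),
                    levels = c("j", "i", "empty2", "h", "empty1", "g",
                               "f", "e", "d", "c", "b", "a")),
  variable = factor(rep(c("b2", "c1", "c2"), each = 12)),
  value = c(7.40214650772221e-06, 0.0075828339, 0.1825924627, 0.0384381317,
            0.0440256659, 0.3659284985, 0.9777569144, 1, 0.0075828339, 1,
            0.2193606406, 0.3659284985,
            0.0004289756, 0.0011541045, 0.0004289756, 0.4400885491,
            0.6121402215, 0.6724032426, 0.2735924085, 1, 0.018824582, 1,
            0.4386503891, 0.4249526456,
            1.05094571578633e-05, 0.0027216795, 0.715979827, 0.0050376405,
            0.7473334763, 0.9053300832, 1, 1, 0.0015392848, 1, 0.039679469,
            0.0950327519)
)

# Clamp p-values at an assumed floor so that p = 0 maps to a finite value
floor_p <- 1e-6
dat.m$neglog_p <- -log10(pmax(dat.m$value, floor_p))

# p-values to show on the colourbar, placed on the -log10 scale
p_breaks <- c(1, 0.05, 0.01, floor_p)

p <- ggplot(dat.m, aes(x = variable, y = Category)) +
  geom_tile(aes(fill = neglog_p)) +
  # darker blue = larger -log10(p) = more significant
  scale_fill_gradient(low = "white", high = "#000066",
                      limits = c(0, -log10(floor_p)),
                      breaks = -log10(p_breaks),
                      labels = c("1", "0.05", "0.01", "<= 1e-06"),
                      name = "p-value") +
  # keep the spacer rows but print no label for them
  scale_y_discrete(labels = function(x) ifelse(grepl("^empty", x), "", x)) +
  labs(x = NULL, y = NULL) +
  theme_bw()

print(p)
```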

ggplot: better presentation of barplot

I have a small data frame DF with two columns, Type (x) and Cost (y). I want to draw a bar plot of the cost of each type. I have managed to do that, but I am looking for a better presentation. There are three issues which, if solved, would satisfy my requirements:
1) Since the x-axis text for each type is long, I rotated it by 45 degrees. I tried abbreviations, but they were unreadable!
2) Instead of color, I tried to use fill patterns/textures in ggplot, which, according to Hadley, is not possible (see "fill patterns").
Is there any way to make the plot readable when printed in black and white?
3) Is there a way to highlight one of the "Type" categories, i.e. make it bold and give it a special color to draw the eye to it? For example, I want the "Other" bar to look different from the rest.
I have tried my own ideas, but I'm completely open to redesigning the graph. Any suggestions?
Here is the data, from the dput command:
DF <- structure(list(Type = structure(c(6L, 8L, 7L, 9L, 10L, 15L, 11L,
17L, 3L, 16L, 5L, 19L, 4L, 14L, 2L, 18L, 13L, 1L, 12L), .Label = c("Backup Hardware ",
"data or network control", "Email exchange server/policy", "Instant messaging control",
"Login/system administrators/privilage", "Machine A", "Machine A with Software and Camera",
"Machine A without Software", "Machine B", "Machine B without RAM and CD ROM",
"Managment analyses software ", "Other", "Password and security",
"public web application availability", "Software for backup",
"System access by employees ", "Telecom and Harware", "Web site update",
"wireless network access ponits"), class = "factor"), Cost = structure(c(4L,
3L, 15L, 13L, 11L, 7L, 2L, 1L, 19L, 16L, 14L, 12L, 10L, 9L, 8L,
6L, 5L, 17L, 18L), .Label = c("$1,292,312", "$1,888,810", "$11,117,200",
"$14,391,580", "$161,210", "$182,500", "$2,145,900", "$250,000",
"$270,500", "$298,810", "$3,452,010", "$449,001", "$6,034,000",
"$621,710", "$7,642,660", "$700,000", "$85,100", "$885,000",
"$923,700"), class = "factor")), .Names = c("Type", "Cost"), class = "data.frame", row.names = c(NA,
-19L))
Here is my R code:
p<- ggplot(data = DF, aes(x=Type, y=Cost)) +
geom_bar(aes(fill=Type),stat="identity") +
geom_line()+
scale_x_discrete (name="Type")+
scale_y_discrete(name="Cost")+
theme(axis.text.x = element_text(colour="black",size=11,face="bold")) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))+
labs(fill=expression(paste("Type of study\n")))
print(p)
Here are some starting points for your plot:
1) First, convert the variable Cost from a factor to numeric (removing everything except digits) and store it as Cost2:
DF$Cost2<-as.numeric(gsub("[^0-9]", "",DF$Cost))
2) Convert the plot to greyscale with scale_fill_manual(): here all bars are grey except the bar for Other, which is black. scale_y_continuous(labels=dollar) shows the y-axis values as dollars again (this needs the scales package). To make the Other label on the x-axis bold while the others stay plain, pass face= inside theme(axis.text.x = element_text(...)) a vector with one value per level: 1 (plain) for all levels and 2 (bold) for Other. Note that newer ggplot2 versions warn that vectorized input to element_text() is not officially supported.
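library(ggplot2)
library(scales)
# show_guide was renamed show.legend in ggplot2 2.0.0
ggplot(data = DF, aes(x=Type, y=Cost2)) +
geom_bar(aes(fill=Type),stat="identity",show.legend=FALSE) +
# face: 1 = plain for every level, 2 = bold for "Other"
theme(axis.text.x = element_text(angle = 45, hjust = 1,
face=(as.numeric(levels(DF$Type)=="Other") + 1)))+
scale_y_continuous(labels=dollar)+
# one fill per level, in level order: grey43, except black for "Other"
scale_fill_manual(values=c("grey43","black")[as.numeric(levels(DF$Type)=="Other")+1])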
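A sketch of a redesign addressing all three issues, assuming DF from the dput output in the question: horizontal bars (via coord_flip()) so long labels need no rotation, bars sorted by cost, greyscale fill for black-and-white printing, and a single contrasting fill for Other instead of a bold label, which sidesteps the vectorized element_text() input. The column names Cost2 and highlight and the grey shade are illustrative choices.

```r
library(ggplot2)
library(scales)

DF <- structure(list(Type = structure(c(6L, 8L, 7L, 9L, 10L, 15L, 11L,
17L, 3L, 16L, 5L, 19L, 4L, 14L, 2L, 18L, 13L, 1L, 12L), .Label = c("Backup Hardware ",
"data or network control", "Email exchange server/policy", "Instant messaging control",
"Login/system administrators/privilage", "Machine A", "Machine A with Software and Camera",
"Machine A without Software", "Machine B", "Machine B without RAM and CD ROM",
"Managment analyses software ", "Other", "Password and security",
"public web application availability", "Software for backup",
"System access by employees ", "Telecom and Harware", "Web site update",
"wireless network access ponits"), class = "factor"), Cost = structure(c(4L,
3L, 15L, 13L, 11L, 7L, 2L, 1L, 19L, 16L, 14L, 12L, 10L, 9L, 8L,
6L, 5L, 17L, 18L), .Label = c("$1,292,312", "$1,888,810", "$11,117,200",
"$14,391,580", "$161,210", "$182,500", "$2,145,900", "$250,000",
"$270,500", "$298,810", "$3,452,010", "$449,001", "$6,034,000",
"$621,710", "$7,642,660", "$700,000", "$85,100", "$885,000",
"$923,700"), class = "factor")), .Names = c("Type", "Cost"), class = "data.frame", row.names = c(NA,
-19L))

# Strip "$" and "," and convert to numbers (same idea as the answer above)
DF$Cost2 <- as.numeric(gsub("[^0-9]", "", DF$Cost))
# TRUE only for the category to highlight
DF$highlight <- DF$Type == "Other"
# Trim stray spaces in the labels, then order the bars by cost
DF$Type <- reorder(trimws(as.character(DF$Type)), DF$Cost2)

p <- ggplot(DF, aes(x = Type, y = Cost2, fill = highlight)) +
  geom_col(show.legend = FALSE) +
  scale_fill_manual(values = c("FALSE" = "grey70", "TRUE" = "black")) +
  scale_y_continuous(labels = dollar) +
  # horizontal bars: long category names are readable without rotation
  coord_flip() +
  labs(x = NULL, y = "Cost") +
  theme_bw()

print(p)
```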
