R, ggplot bar, all bars same width? [duplicate]

This question already has answers here:
Don't drop zero count: dodged barplot
(6 answers)
Closed 6 years ago.
I'm trying to find a ggplot-specific workaround so that I can generate bar plots in which all the bars are the same width. I know the widths differ because I am "missing values" and a bar widens to fill the space left by a blank. BUT I'm working with very large data sets, and using reshape to make the data wide and then inserting placeholder values to eliminate blanks is not something I want to do.
Test data:
df<-data.frame(tax=c("type1","type1","type1","type1","type2","type2"),Gene=c("a","b","c","c","a","b"),logFC=c(-2,-4,2,1,3,-1))
ggplot code, which gives me an extra wide bar for "c"
bar<-ggplot(df, aes(x=Gene, order=Gene,y=logFC,fill=tax))+ geom_bar(stat="identity",position="dodge")
Any suggestions that don't require me to change any values in the input df?
This question is not a duplicate. I am looking for an ALTERNATIVE solution to what has been given before. Previous solutions DO NOT WORK. I cannot simply dcast (with fill=0) and re-melt my data frame (trust me, I've been trying this for weeks).
I am looking for a ggplot specific answer.

I think the bar for c stays wide because c has two type1 rows and no type2 row, so no second group shares its slot.
If you use facet_wrap, the bars keep the same width within each panel:
ggplot(df, aes(x=Gene, y=logFC, fill = tax))+
geom_bar(stat = "identity", position="dodge", width=.5) +
facet_wrap(~tax)
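If facets are not wanted, another ggplot-only option is to keep every bar at single-bar width with position_dodge2(preserve = "single"). The following is a minimal sketch using the test data from the question; it assumes ggplot2 3.0.0 or later, and geom_col() is used here as shorthand for geom_bar(stat = "identity").

```r
library(ggplot2)

# Test data from the question, left unchanged
df <- data.frame(
  tax   = c("type1", "type1", "type1", "type1", "type2", "type2"),
  Gene  = c("a", "b", "c", "c", "a", "b"),
  logFC = c(-2, -4, 2, 1, 3, -1)
)

# preserve = "single" sizes each bar as if every Gene had the maximum
# number of bars, instead of stretching bars to fill empty slots
p <- ggplot(df, aes(x = Gene, y = logFC, fill = tax)) +
  geom_col(position = position_dodge2(preserve = "single"))

print(p)
```

Because both c rows belong to type1, position_dodge2() should place them side by side rather than on top of each other, and no placeholder rows are added to df.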

Related

Is there a way to manually set order of bar plot in R ggplot2? [duplicate]

This question already has answers here:
Order Bars in ggplot2 bar graph
(16 answers)
Closed 6 months ago.
I'm doing some data analysis for an organization and I'm trying to change the order of the bars in a bar chart. Right now, it is ordered alphabetically but I want to set it to the order of days in a week. I have tried using levels and factor, but I don't think it is working possibly because I am reading the data in from an excel file and each column is the sum of all values of that day. Is there a way to do this through ggplot2 without editing the original excel file?
Basically, do something like ggplot(df,aes(x=factor(V1,levels=unique(V1)),y=V3,fill=V2)) (or wrap the factor in fct_rev to reverse the order):
library(ggplot2)
library(forcats) # provides fct_rev
t=read.table("https://pastebin.com/raw/GyEiXxNs",row.names=1)
t=t[,c(4,1,3,2)]
colnames(t)=paste0("V",1:ncol(t)) # prevent ggplot from reordering the bars for each column
t=t[order(as.matrix(t)%*%(1:ncol(t))^2),] # reorder rows so that rows with a high percentage of the first column are placed first
w2l=function(x)data.frame(V1=rownames(x)[row(x)],V2=colnames(x)[col(x)],V3=unname(c(unlist(x))))
t2=w2l(t) # wide to long
lab=round(100*t2$V3)
lab[lab<=1]="" # don't display labels for 0% or 1%
ggplot(t2,aes(x=fct_rev(factor(V1,levels=unique(V1))),y=V3,fill=V2))+
geom_bar(stat="identity",width=1,position=position_fill(reverse=T),size=.1,color="gray10")+
geom_text(aes(label=lab),position=position_stack(vjust=.5,reverse=T),size=3.5)+
coord_flip()+
scale_x_discrete(expand=c(0,0))+
scale_y_continuous(expand=c(0,0))+
scale_fill_manual(values=colorspace::hex(colorspace::HSV(head(seq(0,360,length.out=ncol(t)+1),-1),.5,1)))+
# ggh4x::force_panelsizes(cols=unit(3,"in"))+ # make bars always 3 inches wide
theme(
axis.text=element_text(color="black",size=11),
axis.text.x=element_blank(),
axis.ticks=element_blank(),
axis.title=element_blank(),
legend.position="none",
panel.border=element_rect(color="gray10",fill=NA,size=.2)
)
ggsave("1.png",width=4.5,height=.25*nrow(t)+.3,limitsize=F)
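For the weekday ordering asked about in the question, the same factor idea can be sketched more directly. This is a minimal example, assuming a hypothetical data frame named visits with columns day and total (neither name comes from the original post); the Excel file itself is not touched.

```r
library(ggplot2)

# Hypothetical data standing in for the per-day sums read from Excel
visits <- data.frame(
  day   = c("Friday", "Monday", "Saturday", "Sunday",
            "Thursday", "Tuesday", "Wednesday"),
  total = c(12, 30, 8, 5, 21, 27, 19)
)

# Set the factor levels explicitly so ggplot2 does not fall back to
# alphabetical order on the x-axis
week_order <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")
visits$day <- factor(visits$day, levels = week_order)

p <- ggplot(visits, aes(x = day, y = total)) +
  geom_col()

print(p)
```

ggplot2 draws discrete axes in factor-level order, so setting the levels once on the column is usually enough. If a level is misspelled relative to the data, the corresponding rows become NA, which is worth checking when the values come from a spreadsheet.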

How to prepare my data for spaghetti plots [duplicate]

This question already has answers here:
Plot multiple lines in one graph [duplicate]
(3 answers)
Closed 2 years ago.
I would like to create a spaghetti plot similar to this one here.
Unfortunately my data looks like this.
I have 11 columns that have NA's, so I remove them with
neuron1 <- drop_na(neuron)
Then I have a data table with 13 columns and 169 rows. My goal is to display the expression of each gene across these 169 rows. Basically I would only need the "area" on the x-axis and the 11 genes on the y-axis. I am able to plot the data, but only when selecting the genes specifically, e.g. with this code:
ggplot(neuron1, aes(area)) +
geom_line(aes(y=MAP2, group=1)) +
geom_line(aes(y=REEP1, group=1, color="red"))
It would be okay to repeat this 11 times but I have some datasets with more genes so it would really be nice to be able to group them properly and then run a short code.
Thank you very much in advance!
Too long for a comment. Lacking your data, this code may be a bit of a shot in the dark.
Try something like this:
library(tidyverse)
yourdata %>%
  # stack every gene column into key/value pairs; area and region stay as identifiers
  pivot_longer(cols = c(-area, -region), names_to = "key", values_to = "value") %>%
  ggplot(aes(area, value)) +
  geom_line(aes(group = key))
I'm not sure how well this will work with area on the x-axis, because it's a categorical variable (so I'm also not sure geom_line is the right choice for the visualisation).
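Since the original data is not available, here is a minimal self-contained sketch of the same reshaping step on made-up values. The data frame neuron_toy and its numbers are hypothetical; only the column names area, region, MAP2 and REEP1 are taken from the post.

```r
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for the asker's data: one row per area, one column per gene
neuron_toy <- data.frame(
  area   = c("A1", "A2", "A3", "A4"),
  region = c("r1", "r1", "r2", "r2"),
  MAP2   = c(2.1, 3.4, 1.8, 2.9),
  REEP1  = c(0.5, 1.2, 0.9, 1.6)
)

# Wide to long: every column except area and region becomes a gene/expression pair
long <- pivot_longer(neuron_toy, cols = -c(area, region),
                     names_to = "gene", values_to = "expression")

# One line per gene; group is required because area is categorical
p <- ggplot(long, aes(x = area, y = expression, group = gene, colour = gene)) +
  geom_line()

print(p)
```

Adding more gene columns to the wide data needs no change to the plotting code, since they are all collected by the cols = -c(area, region) selection.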

R, ggplot2: reverse alphabetical order [duplicate]

This question already has answers here:
Order Bars in ggplot2 bar graph
(16 answers)
Closed 6 years ago.
I use ggplot2 to create a graph using
dat <- data.frame(xx=c("IND","AUS","USA"), yy=c(1,5,2))
ggplot(data=dat, aes(x=reorder(xx,xx), y=yy))
and this nicely sorts my x-axis alphabetically. However, I want to sort the string variable xx in reverse alphabetical order but cannot seem to get it. While reorder(yy,-yy) can sort my numeric variable, reorder(xx,-xx) does not work.
How about:
ggplot(data=dat, aes(x=forcats::fct_rev(reorder(xx,xx)), y=yy))
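If you would rather not depend on forcats, a base R sketch of the same idea is to set the factor levels in descending order before plotting. This assumes the dat data frame from the question; geom_col() is added only so that the example draws something.

```r
library(ggplot2)

dat <- data.frame(xx = c("IND", "AUS", "USA"), yy = c(1, 5, 2))

# Reverse-alphabetical levels; ggplot2 draws the x-axis in level order
dat$xx <- factor(dat$xx, levels = sort(unique(dat$xx), decreasing = TRUE))

p <- ggplot(dat, aes(x = xx, y = yy)) +
  geom_col()

print(p)
```

reorder(xx, -xx) fails because the minus sign is not defined for character or factor values, whereas sorting the level names works on strings directly.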

ggplot2 stacked bar graph using rows as datapoints [duplicate]

This question already has an answer here:
Grouping & Visualizing cumulative features in R
(1 answer)
Closed 6 years ago.
I have a set of data that I would like to plot like this:
Now this is plotted using LibreOffice Calc in Ubuntu. I have tried to do this in R using the following code:
ggplot(DATA, aes(x="Samples", y="Count", fill=factor(Sample1)))+geom_bar(stat="identity")
This does not give me a stacked bar graph for each sample, but rather one single graph. I have had a similar question, that used a different dataframe, that was answered here. However, in this problem I don't have just one sample, but information for at least three. In LibreOffice Calc or Excel I can choose the stacked bar graph option and then choose to use rows as the data series. How can I achieve a similar graph in ggplot2?
Here is the dataframe/object for which I am trying to produce the graph:
Aminoacid Sequence,Sample1,Sample2,Sample3
Sequence 1,16,10,33
Sequence 2,2,2,7
Sequence 3,1,1,6
Sequence 4,4,1,1
Sequence 5,1,2,4
Sequence 6,4,3,14
Sequence 7,2,2,2
Sequence 8,8,5,12
Sequence 9,1,3,17
Sequence 10,7,1,4
Sequence 11,1,1,1
Sequence 12,1,1,2
Sequence 13,1,1,1
Sequence 14,1,2,2
Sequence 15,5,4,7
Sequence 16,3,1,8
Sequence 17,7,5,20
Sequence 18,3,3,21
Sequence 19,2,1,5
Sequence 20,1,1,1
Sequence 21,2,2,5
Sequence 22,1,1,3
Sequence 23,4,2,9
Sequence 24,2,1,1
Sequence 25,4,4,3
Sequence 26,4,1,3
I copied the content of a .csv file, is that reproducible enough? It worked for me to just use read.csv(.file) in R.
Edit:
Thank you for redirecting me to another post with a very similar problem, I did not find that before. That post brought me a lot closer to the solution. I had to change the code just a little to fit my problem, but here is the solution:
library(reshape2) # provides melt
library(ggplot2)
df <- read.csv("example.csv")
df2 <- melt(df, id="Aminoacid.Sequence") # melt the data frame that was just read in
ggplot(df2, aes(x=variable, y=value, fill=Aminoacid.Sequence))+geom_bar(stat="identity")
Using variable on the x-axis makes one bar for each sample (Sample1-Sample3 in the example). Using y=value puts the value in each cell for that sample on the y-axis. And most importantly, using fill=Aminoacid.Sequence stacks the values for each sequence on top of each other, giving me the same graph as seen in the screenshot above!
Thank you for your help!
Try something along the following lines:
library(reshape2)
df <- melt(DATA) # you probably need to adjust the id.vars here...
ggplot(df, aes(x=variable, y=value)) + geom_bar(stat="identity")
Note that you need to adjust the ggplot and the melt code somewhat, but since you haven't provided sample data, no one can provide the actual code necessary. The above provides the basic approach on how to deal with these multiple columns representing your samples, though. melt will "stack" the columns on top of each other, and create a column with the old variable name. This you can then use as x for ggplot.
Note that if you have other data in the data frame as well, melt will also stack these. For that reason you will need to adjust the commands to fit your data.
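To illustrate that last point, here is a minimal sketch on a made-up data frame. The Length column is a hypothetical extra numeric column that is not in the asker's data; naming both id.vars and measure.vars keeps it out of the stacked result.

```r
library(reshape2)

toy <- data.frame(
  Aminoacid.Sequence = c("Sequence 1", "Sequence 2"),
  Sample1 = c(16, 2),
  Sample2 = c(10, 2),
  Length  = c(12, 15)  # hypothetical extra column that should not be stacked
)

# Only the columns named in measure.vars are stacked; Length is left out
long <- melt(toy,
             id.vars      = "Aminoacid.Sequence",
             measure.vars = c("Sample1", "Sample2"))

print(long)
```

Without measure.vars, melt() would treat every column that is not an id variable as a measurement, so Length would end up in the value column alongside the sample counts.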
Edit: using your data:
library(reshape2)
library(ggplot2)
### reading your data:
# df <- read.table(file="clipboard", header=T, sep=",")
df2 <- melt(df)
head(df2)
  Aminoacid.Sequence variable value
1         Sequence 1   preDLI    16
2         Sequence 2   preDLI     2
3         Sequence 3   preDLI     1
4         Sequence 4   preDLI     4
5         Sequence 5   preDLI     1
6         Sequence 6   preDLI     4
This can be used as in:
ggplot(df2, aes(x=variable, y=value, fill=Aminoacid.Sequence)) + geom_bar(stat="identity")
I am sure you want to change some details of the graph, such as the colors, but this should answer your initial question.

How do I display only selected items in a ggplot2 legend? [duplicate]

This question already has an answer here:
Remove legend entries for some factors levels
(1 answer)
Closed 7 years ago.
I'm creating a stacked bar plot of relative abundance data, but I only want to display the ten most abundant organisms in the legend. How do I do this? I have no idea where to begin and haven't found any answers online.
Here's the plot with a full legend:
Thanks.
As pointed out by user20650 in the comments, the answer is to pass a list of the selected items to the breaks= argument in scale_fill_manual():
scale_fill_manual(breaks=list,values=colpal)
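A minimal self-contained sketch of that approach follows. The data frame abund, its column names, and the rainbow palette are all hypothetical stand-ins, since the original data and colpal were not posted.

```r
library(ggplot2)

# Hypothetical relative-abundance data: 15 organisms in 2 samples
set.seed(1)
organisms <- paste0("org", 1:15)
abund <- data.frame(
  sample    = rep(c("S1", "S2"), each = 15),
  organism  = rep(organisms, times = 2),
  abundance = runif(30)
)

# Ten organisms with the highest total abundance across samples
totals <- tapply(abund$abundance, abund$organism, sum)
top10  <- names(sort(totals, decreasing = TRUE))[1:10]

# Named palette so every organism keeps a colour even if hidden from the legend
colpal <- setNames(grDevices::rainbow(length(organisms)), organisms)

p <- ggplot(abund, aes(x = sample, y = abundance, fill = organism)) +
  geom_col(position = "fill") +
  scale_fill_manual(values = colpal, breaks = top10)

print(p)
```

All fifteen organisms are still drawn in the bars; breaks only controls which of them get a legend entry.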
