Changing plotting order of points in R / ggplot2 - r

I have the following code to plot a large dataset (450k) in ggplot2:
x <- ggplot() +
  geom_point(data=data_Male, aes(x=a, y=b), color="Turquoise", position=position_jitter(w=0.2, h=1), alpha=0.1, size=.5, show.legend=TRUE) +
  geom_point(data=data_Female, aes(x=a, y=b), color="#FF9999", position=position_jitter(w=0.2, h=1), alpha=0.1, size=.5, show.legend=TRUE) +
  theme_bw()
x <- x + geom_smooth(data=data_Male, aes(x=a, y=b, alpha="Male"), method="lm", colour="Blue", linetype=1, se=T) +
  geom_smooth(data=data_Female, aes(x=a, y=b, alpha="Female"), method="lm", colour="Dark Red", linetype=5, se=T) +
  geom_smooth(data=data_All, aes(x=a, y=b, alpha="All"), method="lm", colour="Black", linetype=3, se=T) +
  scale_fill_discrete(name="Key", labels=c("Female","Male","All")) +
  scale_colour_discrete(name="Plot Colour", labels=c("Female","Male","All")) +
  scale_alpha_manual(name="Key",
    values=c(1,1,1),
    breaks=c("Female","Male","All"),
    guide=guide_legend(override.aes=list(linetype=c(5,1,3), name="Key",
      shape=c(16,16,NA),
      color=c("Dark Red","Blue","Black"),
      fill=c("#FF9999","Turquoise",NA))))
How can I change the order in which points are plotted? I have seen answered questions here dealing with a single dataframe but I am working with several dataframes so I cannot re-order the rows or ask ggplot to plot by certain criteria from within the dataframe. You can see an example of the kind of problem that this causes in the attached picture: the Female points are plotted on top of the Male points. Ideally I would like to be able to plot all the points in a random order, so that one "cloud" of points is not plotted on top of the other, obscuring it (N.B. the image shown doesn't include the "All" line).
Any help would be appreciated. Thank you.

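I believe this is not possible while the points live in separate data frames, because each geom_point() layer is drawn in full before the next one. The following should work, though:
Combine the two data frames into a single data frame, df (for example with rbind()). The new data frame will be ordered with all the rows from one sex followed by all the rows from the other.
You can then shuffle the rows of the new data frame:
set.seed(42)
rows <- sample(nrow(df))
male_female_mixed <- df[rows, ]
Then you can plot male_female_mixed in a single geom_point() layer.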
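The steps above could look roughly like the sketch below. The simulated data_Male and data_Female, the added sex column, and the colour mapping are assumptions for illustration; the question does not show the real data.

```r
library(ggplot2)

# Hypothetical stand-ins for the asker's data_Male and data_Female
set.seed(1)
data_Male   <- data.frame(a = sample(1:5, 200, replace = TRUE), b = rnorm(200, 50, 10))
data_Female <- data.frame(a = sample(1:5, 200, replace = TRUE), b = rnorm(200, 45, 10))

# Record which data frame each row came from (assumed column name: sex)
data_Male$sex   <- "Male"
data_Female$sex <- "Female"
df <- rbind(data_Male, data_Female)

# Shuffle the rows so neither group is drawn entirely on top of the other
set.seed(42)
male_female_mixed <- df[sample(nrow(df)), ]

# One layer: points are drawn in row order, so the two colours interleave
p <- ggplot(male_female_mixed, aes(x = a, y = b, colour = sex)) +
  geom_point(position = position_jitter(width = 0.2, height = 1),
             alpha = 0.1, size = 0.5) +
  scale_colour_manual(values = c(Male = "turquoise", Female = "#FF9999")) +
  theme_bw()
```

Printing p draws the plot. The geom_smooth() layers from the question can still be added on top with their own data arguments.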

Related

Multiple boxplots in one graph, R

I'm working with a dataset where I have one continuous variable (V1) and want to see how that variable differs depending on demographics such as sex, age group etc.
I would like to do one graph that contains multiple boxplots, so that V1 is on the y-axis and all my demographic variables (sex, age groups etc.) are on the x-axis with their corresponding p-values. Does anyone know how to do this in R?
I've added two photos to illustrate my dataset and the output I want.
Thanks!
Output example
Data example
It would be nice to have the actual data and the code you already have, so we can reproduce what you have and work out what you want. That being said, this link might be what you are looking for:
https://statisticsglobe.com/draw-multiple-boxplots-in-one-graph-in-r#example-2-drawing-multiple-boxplots-using-ggplot2-package
Scroll down about half way to Example 4: Drawing Multiple Boxplots for Each Group Side-by-Side
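Since no data was posted, here is only a rough sketch of one way to get this kind of figure with ggplot2. The data frame dat, its columns sex and age_group, and the choice of a Kruskal-Wallis test for the p-values are all assumptions made for illustration, not something taken from the question.

```r
library(ggplot2)

# Simulated stand-in for the asker's data (assumed column names)
set.seed(1)
dat <- data.frame(
  V1        = rnorm(120),
  sex       = sample(c("F", "M"), 120, replace = TRUE),
  age_group = sample(c("18-34", "35-54", "55+"), 120, replace = TRUE)
)
demo_vars <- c("sex", "age_group")

# Stack the demographic variables into long format: one row per (observation, variable)
long <- do.call(rbind, lapply(demo_vars, function(v) {
  data.frame(V1 = dat$V1, variable = v, level = as.character(dat[[v]]))
}))

# One p-value per demographic variable (Kruskal-Wallis is an assumed choice of test)
pvals <- sapply(demo_vars, function(v) {
  kruskal.test(x = dat$V1, g = factor(dat[[v]]))$p.value
})

# Put the p-value into the panel title of each variable
long$panel <- sprintf("%s (p = %.3f)", long$variable, pvals[long$variable])

p <- ggplot(long, aes(x = level, y = V1)) +
  geom_boxplot() +
  facet_wrap(~ panel, scales = "free_x") +
  labs(x = NULL)
```

Each demographic variable gets its own panel with its levels on the x-axis, and all panels share the V1 axis.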

How to create a genome-wide reads density map in R (for a bacterial genome)

I have a data frame (pLog) containing the number of reads per nucleotide for a ChIP-seq experiment done on an E. coli genome (4.6 Mb). I want to plot the chromosomal position on the x-axis and the number of reads on the y-axis. To make it easier, I binned the data in windows of 100 bp. That gives a data frame of 46,259 rows and 2 columns. One column is named "position" and holds the chromosomal position at which each bin starts (1, 101, 201, ...), and the other column is named "values" and contains the number of reads found in that bin, e.g. (210, 511, 315, ...). I have been using ggplot for all my analysis and I would like to use it for this plot, if possible.
I am trying for the graph to look something like this:
but I haven't been able to plot it.
This is what my data looks like
I tried
ggplot(pLog,aes(position))+
geom_histogram(binwidth=50)
ggsave("file.jpg")
And this is what it looks like :(
Many thanks!
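geom_histogram() counts how many rows fall into each bin of position, so it ignores the read counts you already have. Map the read counts to y and try geom_line() instead:
library(ggplot2)
# simulated example: 1,000 bins of 100 bp; the count column is called value here
pLog=data.frame(position=seq(1,100000,by=100),
value=rnbinom(1000,mu=100,size=20))
ggplot(pLog,aes(x=position,y=value))+geom_line(alpha=0.7,col="steelblue")
Most likely you will need to play around with this to get the visualization you need.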
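One thing that may be worth trying when a line over tens of thousands of bins looks too noisy is to sum the counts into wider windows and draw a filled area. This is only a sketch on simulated data; the 1 kb window size and the window_start column are arbitrary assumptions.

```r
library(ggplot2)

# Simulated stand-in: 1,000 bins of 100 bp
set.seed(1)
pLog <- data.frame(position = seq(1, 100000, by = 100),
                   value    = rnbinom(1000, mu = 100, size = 20))

# Assign each 100 bp bin to the 1 kb window that contains it
window <- 1000
pLog$window_start <- (pLog$position - 1) %/% window * window + 1

# Sum the read counts within each window
coarse <- aggregate(value ~ window_start, data = pLog, FUN = sum)

p <- ggplot(coarse, aes(x = window_start, y = value)) +
  geom_area(fill = "steelblue", alpha = 0.7) +
  labs(x = "Chromosomal position (bp)", y = "Reads per 1 kb window")
```

The total number of reads is unchanged by the aggregation; only the resolution along the x-axis is reduced.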

Remove factors from boxplot that have no data

I have a data frame with two factors: Peel - either "Standard" or "Delay" and Wafer - a number of a wafer but which I want as a factor:
**Peel** **Wafer**
Standard 122
Standard 123
Delay 124
Delay 125
(sorry I am trying to post real data but it seems to come out in a dodgy format)
When I boxplot my data for a variable against both factors, I get gaps on the x axis where there is no data:
boxplot(Von.fwd~Wafer*Peel, data=df, las=2)
I have tried posting an image but apparently I need 10 reputation to do this.
The data is missing because it doesn't exist. I just don't want the plot to show the gap. I have looked at droplevels(), but I don't want to drop either of my factors, just certain combinations of the factors.
Is there a way to tell R not to plot crossed factors where there is no data?
Many thanks
Pete
One option is to combine Peel and Wafer into a new factor like so (assuming your data.frame is called df):
df$NewFactor<-paste(df$Peel,df$Wafer)
df$NewFactor<-factor(df$NewFactor)
That will give you each combination that actually occurs as a level of the new factor, with no levels for the missing combinations. Then you can use NewFactor in your boxplot call, e.g. boxplot(Von.fwd~NewFactor, data=df, las=2).
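As a self-contained illustration of the approach above, here is a small sketch on made-up data. The Von.fwd values are simulated, and the interaction(..., drop = TRUE) line is an alternative from base R that was not part of the original answer.

```r
# Made-up data: each wafer belongs to exactly one Peel group
set.seed(1)
df <- data.frame(
  Peel    = factor(rep(c("Standard", "Standard", "Delay", "Delay"), each = 5)),
  Wafer   = factor(rep(c(122, 123, 124, 125), each = 5)),
  Von.fwd = rnorm(20)
)

# Approach from the answer: paste the two factors into one
df$NewFactor <- factor(paste(df$Peel, df$Wafer))

# Alternative: interaction() with drop = TRUE discards unused combinations
df$NewFactor2 <- interaction(df$Wafer, df$Peel, drop = TRUE)

# Only the 4 combinations that exist are plotted, instead of 4 x 2 = 8 slots
boxplot(Von.fwd ~ NewFactor, data = df, las = 2)
```

Both new factors have four levels here, one per wafer, so the boxplot has no empty positions.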

R ggplot2 - convert row records to vertical values and use in geom_polygon

I have some items that have different eligibility criteria - specifically in this example two variables each with a min and max the values are allowed to take. I would like to see the coverage of the products by plotting rectangles for each product on a chart that shows the area between the mins and maxs.
How would you go about
1. converting the records most elegantly to the form required by geom_polygon(), and
2. ensuring the shapes produced appear as rectangles?
Example
library(data.table)
library(ggplot2)
df<-data.table(Product=letters[1:10], minX=1:10, maxX=5:14, minY= 10:1, maxY=14:5)
df.t<-data.table(rbind( df[,list(Product,X=minX,Y=minY)],
df[,list(Product,X=minX,Y=maxY)],
df[,list(Product,X=maxX,Y=minY)],
df[,list(Product,X=maxX,Y=maxY)]))[
order(Product,X,Y)]
ggplot(df.t,aes(x=X,y=Y,group=Product,fill=Product))+geom_polygon()
NB In this reduced example there are only two criteria, however I have a range of criteria columns and would not want to repeat the exercise above for different combinations.
Use your original data frame df and then geom_rect() as you already have minimal and maximal values for the x and y.
ggplot(df,aes(xmin=minX,xmax=maxX,ymin=minY,ymax=maxY,fill=Product))+geom_rect()
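If geom_polygon() is needed for some other reason, the shapes in the question probably fail to be rectangles because sorting by X and Y visits the corners in a zig-zag order. A sketch of one way around that, walking the corners around the perimeter, is below; the corner column is a helper introduced here for illustration, and a plain data.frame is used instead of data.table.

```r
library(ggplot2)

df <- data.frame(Product = letters[1:10], minX = 1:10, maxX = 5:14,
                 minY = 10:1, maxY = 14:5)

# List the four corners in perimeter order:
# bottom-left, top-left, top-right, bottom-right
corners <- rbind(
  data.frame(Product = df$Product, X = df$minX, Y = df$minY, corner = 1),
  data.frame(Product = df$Product, X = df$minX, Y = df$maxY, corner = 2),
  data.frame(Product = df$Product, X = df$maxX, Y = df$maxY, corner = 3),
  data.frame(Product = df$Product, X = df$maxX, Y = df$minY, corner = 4)
)

# Keep each product's corners together and in perimeter order
corners <- corners[order(corners$Product, corners$corner), ]

p <- ggplot(corners, aes(x = X, y = Y, group = Product, fill = Product)) +
  geom_polygon(alpha = 0.4)
```

geom_polygon() connects the vertices in the order of the rows, so this ordering yields the same rectangles that geom_rect() draws directly.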

R - zoo plots customization

I have data in a zoo object which has multiple columns.
Now I want to plot four of those columns: two in the same graph, and the other two in a second graph below the first.
To be more precise, I have been able to plot the four of them one below the other, each in its own panel.
But I want the first two in the same plot and the last two in the next plot.
You could try adding nc=2 to your plot command; nc sets the number of columns in which the panels are laid out.
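Another option that may be closer to what is asked is the screens argument of plot.zoo(), which assigns each series to a panel. The sketch below uses a simulated zoo object z with hypothetical column names A to D, since the original data was not posted.

```r
library(zoo)

# Simulated zoo object with four columns (hypothetical names)
set.seed(1)
z <- zoo(matrix(rnorm(40), ncol = 4,
                dimnames = list(NULL, c("A", "B", "C", "D"))),
         order.by = as.Date("2020-01-01") + 0:9)

# Panel number for each column: A and B share panel 1, C and D share panel 2
screens <- c(1, 1, 2, 2)

# Colours distinguish the two series within each panel
plot(z, screens = screens, col = c("black", "red", "black", "red"))
```

The two panels are stacked vertically by default, which matches the "one graph below the other" layout described in the question.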
