Combining dotplots in R

I'm trying to combine two plots into one plot in R.
My code looks like this:
#----------------------------------------------------------------------------------------#
# RING data: Mikkel
#----------------------------------------------------------------------------------------#
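# Load packages
library(reshape2)   # melt()
library(lattice)    # dotplot()
library(gridExtra)  # grid.arrange()
# Set working directory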
setwd("/Users/mikkelastrup/Dropbox/Master/RING R")
#### Read data & Converting factors ####
dat <- read.table("R SUM kopi.txt", header=TRUE)
str(dat)
dat$Vial <- as.factor(dat$Vial)
dat$Line <- as.factor(dat$Line)
dat$rep <- as.factor(dat$rep)
dat$fly <- as.factor(dat$fly)
str(dat)
mtdata <- droplevels(dat[dat$Line=="20",])
mt1data <- droplevels(mtdata[mtdata$rep=="1",])
tdata <- melt(mt1data, id=c("rep","Conc","Sex","Line","Vial", "fly"))
tdata$variable <- as.factor(tdata$variable)
tfdata <- droplevels(tdata[tdata$Sex=="f",])
tmdata <- droplevels(tdata[tdata$Sex=="m",])
####Plotting####
d1 <- dotplot(tfdata$value~tdata$variable|tdata$Conc,
main="Y Position over time Line 20 Female",
xlab="Time", ylab="mm above bottom")
d2 <- dotplot(tmdata$value~tdata$variable|tdata$Conc,
main="Y Position over time Line 20 Male",
xlab="Time", ylab="mm above bottom")
grid.arrange(d1,d2,ncol=2)
That draws the female and male dotplots side by side.
I'm trying to combine them into one plot, with different colors for males and females. I have tried writing both into a single dotplot call, separated by a comma or wrapped in parentheses, but that doesn't work. When I don't split the data and use tdata instead of tfdata and tmdata, all the dots come out in the same color. I'm open to suggestions, including another package or another way of plotting the data that still looks somewhat like this, since I'm new to R.

All you need to do is use the group parameter:
dotplot(value~variable|Conc, group=Sex, data=tdata,
main="Y Position over time Line 20 All",
xlab="Time", ylab="mm above bottom")
Also, don't use the $ notation inside these functions. Notice that you're taking value from tfdata but variable and Conc from tdata. This is a problem because tdata has twice as many rows as tfdata! Instead, use the data argument to specify which data frame the variables come from.
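A self-contained sketch of the same idea, using a small hypothetical data frame `toy` in place of tdata (the column names mirror the question; the values are random and for illustration only). It spells out the full lattice argument name `groups` and adds `auto.key`, lattice's option for drawing a legend, so the two colours are labelled:

```r
library(lattice)

# Hypothetical toy data in the same long format as tdata
# (one row per time point, concentration, sex and fly); random values only
set.seed(1)
toy <- expand.grid(variable = factor(c("t1", "t2", "t3")),
                   Conc = factor(c("0", "10")),
                   Sex = factor(c("f", "m")),
                   fly = factor(1:5))
toy$value <- runif(nrow(toy), min = 0, max = 50)

# groups = Sex draws the sexes in different colours within each Conc panel;
# auto.key adds a legend. Every variable is looked up in `toy` via data=,
# so all of them have the same length by construction.
p <- dotplot(value ~ variable | Conc, groups = Sex, data = toy,
             auto.key = list(columns = 2),
             main = "Y position over time (toy data)",
             xlab = "Time", ylab = "mm above bottom")
print(p)
```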

Related

Set common y axis limits from a list of ggplots

I am running a function that returns a custom ggplot from input data (it is in fact a plot with several layers on it). I run the function over several different input datasets and obtain a list of ggplots.
I want to create a grid with these plots to compare them, but they all have different y axes.
I guess what I have to do is extract the maximum and minimum y-axis limits from the ggplot list and apply those to each plot in the list.
How can I do that? I guess it's done with ggplot_build(). Something like this:
test = ggplot_build(plot_list[[1]])
> test$layout$panel_scales_x
[[1]]
<ScaleContinuousPosition>
Range:
Limits: 0 -- 1
I am not familiar with the structure of a ggplot_build object, and maybe this one in particular is not a standard one, as it comes from a "custom" ggplot.
For reference, these plots are created with the gseaplot2 function from the enrichplot package.
I don't know how to "upload" an R object, but if that would help, let me know how to do it.
Thanks!
Edit after comments (thanks for your suggestions!):
Here is an example of a gseaplot2 plot. GSEA stands for Gene Set Enrichment Analysis, a technique used in genomic studies. The gseaplot2 function calculates a running enrichment score, plots it, and adds a bar plot at the bottom.
and here is the grid I create to compare the plots generated from different data:
I would like to have a common scale for the "Running Enrichment Score" part.
I guess I could try to recreate the gseaplot2 function and input all of the datasets and then create the grid by facet_wrap, but I was wondering if there was an easy way of extracting parameters from a plot list.
As a reproducible example (from the enrichplot package):
library(clusterProfiler)
library(enrichplot)  # gseaplot2()
library(magrittr)    # %>%
data(geneList, package="DOSE")
gene <- names(geneList)[abs(geneList) > 2]
wpgmtfile <- system.file("extdata/wikipathways-20180810-gmt-Homo_sapiens.gmt", package="clusterProfiler")
wp2gene <- read.gmt(wpgmtfile)
wp2gene <- wp2gene %>% tidyr::separate(term, c("name","version","wpid","org"), "%")
wpid2gene <- wp2gene %>% dplyr::select(wpid, gene) #TERM2GENE
wpid2name <- wp2gene %>% dplyr::select(wpid, name) #TERM2NAME
ewp2 <- GSEA(geneList, TERM2GENE = wpid2gene, TERM2NAME = wpid2name, verbose=FALSE)
gseaplot2(ewp2, geneSetID=1, subplots=1:2)
And this is how I generate the plot list (probably there is a much more elegant way):
library(ggpubr)  # ggarrange()
plot_list = list()
for(i in 1:3) {
fig_i = gseaplot2(ewp2,
geneSetID=i,
subplots=1:2)
plot_list[[i]] = fig_i
}
ggarrange(plotlist=plot_list)
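The approach described in the question (collect each plot's y range, take the overall minimum and maximum, apply it to every plot) can be sketched as follows. This assumes each list element is an ordinary single-panel ggplot; gseaplot2() with several subplots may return a composite object, and getting at its running-score panel is not covered here. The toy `plot_list` below is hypothetical. `layer_scales()` is the ggplot2 helper that builds a plot and returns its trained x and y scales:

```r
library(ggplot2)

# Hypothetical list of ordinary single-panel ggplots whose y ranges differ
plot_list <- lapply(1:3, function(k) {
  ggplot(data.frame(x = 1:10, y = (1:10) * k), aes(x, y)) + geom_line()
})

# Data range of the trained y scale in each plot (first panel)
y_ranges <- lapply(plot_list, function(p) layer_scales(p)$y$range$range)

# Overall minimum and maximum across all plots
common_ylim <- range(unlist(y_ranges))

# Apply the shared limits; coord_cartesian() zooms without dropping data
plot_list_common <- lapply(plot_list, function(p) {
  p + coord_cartesian(ylim = common_ylim)
})
```

The resulting `plot_list_common` can then be passed to `ggarrange(plotlist = ...)` as in the question.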

Scatter plot in R doesn't use the x values in the variable indicated in the plot statement

I am trying to make a scatter plot in R between two numeric variables, and it uses the observation number as the x variable. This is the problem I'm trying to fix: I would like to have a scatter plot that uses the values of the x variable I indicated in the plot statement.
Yes, both the X variable and the Y variable are numeric.
I've attached a screenshot showing the data setup (Galton height data), the fact that the father and son variables are both numeric, and the resulting plot.
Here's the code that sets up the data and runs the scatter plot:
#install.packages("dplyr")
library('dplyr')
#tidyverse is name of package used for class
library(tidyverse)
remove.packages('HistData')
install.packages('HistData')
library(HistData)
data("GaltonFamilies")
childNum <- galton_heights[,6]
gender <- galton_heights[,8]
#Different code to get son height
#If we wanted to follow the lesson exactly, we would
#use the following
son_data <- GaltonFamilies[GaltonFamilies$gender == "male" & GaltonFamilies$childNum == 1,]
son <- son_data$childHeight
#Now we can compare the oldest child's height (if they happen to be male) with that of the father:
GaltonFamilies %>% summarize(mean(father), sd(father), mean(son), sd(son))
GaltonFamilies$father2 <- as.numeric(GaltonFamilies$father)
#galton_heights$father <- as.numeric(levels(galton_heights$father))[galton_heights$father]
plot(GaltonFamilies$father,GaltonFamilies$son)
plot(GaltonFamilies$father2, GaltonFamilies$son, main="Scatterplot Example",
xlab="Father ", ylab="Son ")
Edit: the filter statement creating son_data wasn't working when I ran the above code fresh. I don't know why. I've replaced it with a way to get son_data without the filter.
son_data <- GaltonFamilies[GaltonFamilies$gender == "male" & GaltonFamilies$childNum == 1,]
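There is no column GaltonFamilies$son, so it evaluates to NULL, and plot(GaltonFamilies$father, NULL) plots the father heights against their index (the observation number). The son heights are in son_data$childHeight (also stored in the vector son). See also: Random data added when using `plot` in R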
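A minimal sketch of a working version, assuming the intended pairing is each first-born son's childHeight against his father's height, both taken from the same rows of son_data (the subset used in the question):

```r
library(HistData)
data("GaltonFamilies")

# GaltonFamilies has no column called "son", so $ quietly returns NULL
print(is.null(GaltonFamilies$son))  # TRUE

# First-born sons, selected as in the question
son_data <- GaltonFamilies[GaltonFamilies$gender == "male" &
                             GaltonFamilies$childNum == 1, ]

# Take x and y from the same data frame so each father is paired with his son
plot(son_data$father, son_data$childHeight, main = "Scatterplot Example",
     xlab = "Father", ylab = "Son")
```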

R code for customising a vegan plot

I am attempting to produce an NMDS plot in vegan, but really struggling with the code. I am trying to display the site points and species points differently, with the site points coloured according to treatment. Both lines work individually, but I cannot work out how to combine these two lines of code into one line to form one graph. I am using ordipointlabel to prevent overlap. These are the two lines of code I want to combine into one.
ordipointlabel(NMDS10, scaling=2, display="species", select=sel)
ordipointlabel(NMDS10,display="sites", col=c(rep("darkgreen",4),rep("blue4",4)),cex=0.75)
You can access the ordipointlabel object directly and change it however you like. Here is an example:
library(vegan)
data(dune)
NMDS10 <- metaMDS(dune[1:8, ])
# draw once to a null device just to obtain the ordipointlabel object
pdf(file = NULL)
y <- ordipointlabel(NMDS10, display=c("sites", "species"))
dev.off()
# select sites & species to keep
sel <- unlist(dimnames(dune[1:8, ]))[-(20:ncol(dune))]
# modify the ordipointlabel object directly
y$points <- y$points[rownames(y$points) %in% sel, ]
# colour everything red, then recolour the first 8 points (the sites) by treatment
y$args$pcol[] <- rep("red", length(y$args$pcol))
y$args$pcol[1:8] <- c(rep("darkgreen", 4), rep("blue4", 4))
y$par$cex <- 0.75
plot(y)

How to adjust x labels in R boxplot

This is my code to create a boxplot in R that has 4 boxplots in one.
psnr_x265_256 <- c(39.998,39.998, 40.766, 38.507,38.224,40.666,38.329,40.218,44.746,38.222)
psnr_x264_256 <- c(39.653, 38.106,37.794,36.13,36.808,41.991,36.718,39.26,46.071,36.677)
psnr_xvid_256 <- c(33.04564,33.207269,32.715427,32.104696,30.445141,33.135261,32.669766, 31.657039,31.53103,31.585865)
psnr_mpeg2_256 <- c(32.4198,32.055051,31.424819,30.560274,30.740421,32.484694, 32.512268,32.04659,32.345848, 31)
all_errors = cbind(psnr_x265_256, psnr_x264_256, psnr_xvid_256,psnr_mpeg2_256)
modes = cbind(rep("PSNR",10))
journal_linear_data <-data.frame(psnr_x265_256, psnr_x264_256, psnr_xvid_256,psnr_mpeg2_256)
yvars <- c("psnr_x265_256","psnr_x264_256","psnr_xvid_256","psnr_mpeg2_256")
xvars <- c("x265","x264","xvid","mpeg2")
bmp(filename="boxplot_PSNR_256.bmp")
boxplot(journal_linear_data[,yvars], xlab=xvars, ylab="PSNR")
dev.off()
This is the image I get.
I want the x axis to show the corresponding label under each boxplot: "x265", "x264", "xvid", "mpeg2".
Do you have any idea how to fix this?
There are multiple ways of changing the labels for your boxplot variables. Probably the simplest way is changing the column names of your data frame:
colnames(journal_linear_data) <- c("x265","x264","xvid","mpeg2")
Even simpler: you could do this right at the creation of your data frame too:
journal_linear_data <- data.frame(x265=psnr_x265_256, x264=psnr_x264_256, xvid=psnr_xvid_256, mpeg2=psnr_mpeg2_256)
If your labels are not shown or overlap because there is too little space, try rotating the x labels with the las parameter, e.g. las=2 or las=3.
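Another way, which leaves the data frame untouched, is boxplot()'s names argument: it sets the label under each box, whereas xlab is only the single axis title, which is why passing the vector to xlab did not label the boxes. A sketch using the question's data; the axis title "Codec" is an assumed label:

```r
# Data copied from the question
psnr_x265_256 <- c(39.998, 39.998, 40.766, 38.507, 38.224, 40.666, 38.329, 40.218, 44.746, 38.222)
psnr_x264_256 <- c(39.653, 38.106, 37.794, 36.13, 36.808, 41.991, 36.718, 39.26, 46.071, 36.677)
psnr_xvid_256 <- c(33.04564, 33.207269, 32.715427, 32.104696, 30.445141,
                   33.135261, 32.669766, 31.657039, 31.53103, 31.585865)
psnr_mpeg2_256 <- c(32.4198, 32.055051, 31.424819, 30.560274, 30.740421,
                    32.484694, 32.512268, 32.04659, 32.345848, 31)
journal_linear_data <- data.frame(psnr_x265_256, psnr_x264_256,
                                  psnr_xvid_256, psnr_mpeg2_256)
yvars <- c("psnr_x265_256", "psnr_x264_256", "psnr_xvid_256", "psnr_mpeg2_256")
xvars <- c("x265", "x264", "xvid", "mpeg2")

# names= labels each box; xlab is the single x-axis title;
# las = 2 turns the box labels perpendicular to the axis
b <- boxplot(journal_linear_data[, yvars], names = xvars,
             xlab = "Codec", ylab = "PSNR", las = 2)
```

boxplot() also returns (invisibly) a summary whose names element holds the labels it used, here the four codec names.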

Add to ggplot with element of different length

I'm new to ggplot2 and I'm trying to figure out how I can add a line to an already existing plot I created. The original plot, which is the cumulative distribution of a column of data T1 from a data frame x, has about 100,000 elements in it. I have successfully plotted this using ggplot2 and stat_ecdf() with the code I posted below. Now I want to add another line using a set of (x,y) coordinates, but when I try this using geom_line() I get the error message:
Error in data.frame(x = c(0, 7.85398574631245e-07, 3.14159923334398e-06, :
arguments imply differing number of rows: 1001, 100000
Here's the code I'm trying to use:
library(ggplot2)
set.seed(42)
x <- data.frame(T1=rchisq(100000,1))
ps <- seq(0,1,.001)
ts <- .5*qchisq(ps,1) #50:50 mixture of chi-square (df=1) and 0
p <- ggplot(x,aes(T1)) + stat_ecdf() + geom_line(aes(ts,ps))
That's what produces the error from above. Now here's the code using base graphics that I used to use but that I am now trying to move away from:
plot(ecdf(x$T1),xlab="T1",ylab="Cum. Prob.",xlim=c(0,4),ylim=c(0,1),main="Empirical vs. Theoretical Distribution of T1")
lines(ts,ps)
I've seen some other posts about adding lines in general, but what I haven't seen is how to add a line when the two originating vectors are not of the same length. (Note: I don't want to just use 100,000 (x,y) coordinates.)
As a bonus, is there an easy way, similar to using abline, to add a drop line on a ggplot2 graph?
Any advice would be much appreciated.
ggplot works with data frames, so you need to put ts and ps into their own data frame and then pass that extra data frame to geom_line via its data argument:
library(ggplot2)
set.seed(42)
x <- data.frame(T1=rchisq(100000,1))
ps <- seq(0,1,.001)
ts <- .5*qchisq(ps,1) #50:50 mixture of chi-square (df=1) and 0
tpdf <- data.frame(ts=ts,ps=ps)
p <- ggplot(x,aes(T1)) + stat_ecdf() + geom_line(data=tpdf, aes(ts,ps))
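For the bonus question, geom_segment() can draw a drop line. A sketch, assuming "drop line" means dashed segments from a point on the theoretical curve down to the x axis and across to the y axis; the point used here (ps = 0.95 on the ts/ps curve) and the data frame name `drop` are hypothetical choices for illustration. For full-width reference lines, the abline-style equivalents are geom_abline(), geom_hline() and geom_vline().

```r
library(ggplot2)

set.seed(42)
x <- data.frame(T1 = rchisq(100000, 1))
ps <- seq(0, 1, .001)
ts <- .5 * qchisq(ps, 1)
tpdf <- data.frame(ts = ts, ps = ps)

# Hypothetical point on the theoretical curve at which to drop lines
drop <- data.frame(qx = 0.5 * qchisq(0.95, 1), qy = 0.95)

p <- ggplot(x, aes(T1)) +
  stat_ecdf() +
  geom_line(data = tpdf, aes(ts, ps)) +
  # vertical drop from the curve to the x axis
  geom_segment(data = drop, aes(x = qx, xend = qx, y = 0, yend = qy),
               linetype = "dashed") +
  # horizontal drop from the curve to the y axis
  geom_segment(data = drop, aes(x = 0, xend = qx, y = qy, yend = qy),
               linetype = "dashed") +
  # same x range as the base-graphics version (xlim = c(0, 4))
  coord_cartesian(xlim = c(0, 4))
print(p)
```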
