Looping and Saving Scatterplots in R - r

I have a data frame that consists of 2 columns and 3110 rows. The X column is constant, whereas the Y column changes in each row. I am looking to create a loop that will generate a scatter plot for each row and save the scatter plots to my desktop.
The original code that I would use to create one scatter plot is:
X <- Abundances$s__Coprobacillus_cateniformis
Y <- Abundances$Gene1
plot(X, Y, main = "Species Vs Gene Expression",
xlab = "s__Coprobacillus_cateniformis", ylab = "Gene1",
pch = 19, frame = FALSE)
So, the X variable is a species name, and will stay constant. The Y variable is a gene name, and will change for each of the 3110 plots. I am using the percentage abundances for the gene expression and the species from another data frame called "Abundances".
A short snippet of my data looks like this. It has 2 columns, one called Predictor and one called Response:
Response <- c("ENSG00000000005.5", "ENSG00000001167.10", "ENSG00000001617.7", "ENSG00000003393.10", "ENSG00000004142.7")
Predictor <- c("s__Coprobacillus_cateniformis", "s__Coprobacillus_cateniformis", "s__Coprobacillus_cateniformis", "s__Coprobacillus_cateniformis", "s__Coprobacillus_cateniformis" )
If anyone could help me generate a loop that creates a scatter plot for each individual gene (on the Y axis) against the species (on the X axis), and then immediately saves these plots to my desktop, that would be great!
Thanks.

It's impossible to test without a sample from Abundances, but I think this is on the right track. The key thing to note is that $ doesn't work with a column name stored in a string, but [[ does: Abundances$Gene1 is the same as Abundances[["Gene1"]], which is the same as col = "Gene1"; Abundances[[col]].
for(i in seq_along(Response)) {
png(filename = paste0("plot_", Response[i], ".png"))
X <- Abundances[[Predictor[i]]]
Y <- Abundances[[Response[i]]]
plot(X, Y, main = "Species Vs Gene Expression",
xlab = Predictor[i], ylab = Response[i],
pch = 19, frame = FALSE)
dev.off()
}
If you want the plots on your desktop, set that as the working directory or put the path to your desktop as part of the filename.
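As a minimal sketch of the second option, each filename can be built with file.path(). The simulated Abundances data frame and the out_dir folder below are assumptions made purely for illustration; point out_dir at your own desktop folder instead.

```r
# Simulated stand-in for the real Abundances data frame (assumption).
set.seed(42)
Abundances <- data.frame(
  s__Coprobacillus_cateniformis = runif(10),
  ENSG00000000005.5 = runif(10),
  ENSG00000001167.10 = runif(10),
  check.names = FALSE
)
Response <- c("ENSG00000000005.5", "ENSG00000001167.10")
Predictor <- rep("s__Coprobacillus_cateniformis", length(Response))

# Hypothetical output folder; on a real machine this could be something
# like file.path(path.expand("~"), "Desktop").
out_dir <- file.path(tempdir(), "scatterplots")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

for (i in seq_along(Response)) {
  # One PNG per gene, written into out_dir
  png(filename = file.path(out_dir, paste0("plot_", Response[i], ".png")))
  plot(Abundances[[Predictor[i]]], Abundances[[Response[i]]],
       main = "Species Vs Gene Expression",
       xlab = Predictor[i], ylab = Response[i],
       pch = 19, frame = FALSE)
  dev.off()
}

saved <- list.files(out_dir, pattern = "\\.png$")
```

file.path() inserts the correct separator for the platform, so the same loop should work on Windows, macOS and Linux.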

Related

Select argument doesn't work on cca objects

I created an object of class cca in vegan and now I am trying to tidy up the triplot. However, I seemingly can't use the select argument to only show specified items.
My code looks like this:
data("varechem")
data("varespec")
ord <- cca(varespec ~ Al+S, varechem)
plot(ord, type = "n")
text(ord, display = "sites", select = c("18", "21"))
I want only the two specified sites (18 and 21) to appear in the plot, but when I run the code nothing happens. I do not even get an error message.
I'm really stuck, but I am fairly certain that this bit of code is correct. Can someone help me?
I can't recall now, but I don't think the intention was to allow "names" to select which rows of the scores should be selected. The documentation speaks of select being a logical vector, or indices of the scores to be selected. By indices it was meant numeric indices, not rownames.
The example fails because select is also used to subset the labels character vector of values to be plotted in text(), and this labels character vector is not named. Using a character vector to subset another vector requires that the other vector be named.
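The subsetting behaviour described here can be seen in base R without vegan. This is only a sketch: site_labels is a made-up vector standing in for the internal labels vector.

```r
# Hypothetical, unnamed character vector of labels
site_labels <- c("18", "15", "24", "21")

# Character subsetting looks up *names*; there are none, so this yields NA
by_name_unnamed <- site_labels[c("18", "21")]

# Numeric indices work regardless of names
take <- which(site_labels %in% c("18", "21"))
by_index <- site_labels[take]

# Once the vector is named, character subsetting works as well
names(site_labels) <- site_labels
by_name_named <- site_labels[c("18", "21")]
```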
Your example works if you do:
data("varechem")
data("varespec")
ord <- cca(varespec ~ Al + S, varechem)
plot(ord, type = "n")
take <- which(rownames(varechem) %in% c("18", "21"))
# or
# take <- rownames(varechem) %in% c("18", "21")
text(ord, display = "sites", select = take)
I'll have a think about whether it will be simple to support the use case of your example.
The following code probably gives the result you want to achieve:
First, create an object to store the blank CCA1-CCA2 plot
p1 = plot(ord, type = "n")
Find and then save the coordinates of the sites 18 and 21
p1$sites[c("18", "21"),]
# CCA1 CCA2
#18 0.3496725 -1.334061
#21 -0.8617759 -1.588855
site18 = p1$sites["18",]
site21 = p1$sites["21",]
Overlay the blank CCA1-CCA2 plot with the points of site 18 and 21. Setting different colors to different points might be a good idea.
points(p1$sites[c("18", "21"),], pch = 19, col = c("blue", "red"))
Showing labels might be informative.
text(x = site18[1], y = site18[2] + 0.3, labels = "site 18")
text(x = site21[1], y = site21[2] + 0.3, labels = "site 21")
Here is the resulting plot.

How to create a plot which shows objects on y-axis and number on x-axis

I have a lot of data which I want to plot in a special way, but I don't know how to do this in R.
The input is a csv file containing several columns. The columns I want to plot are A and D.
A contains text and D contains numbers. The same text can appear several times in column A, but that does not matter.
In the end I want to get a plot which shows the following:
I actually have no idea how to plot this.
I've tried: plot(data1$COLUMND,data1$COLUMNA,xlab = "COLUMND", ylab = "COLUMNA"); but the result is that the text in column A is replaced by a number, so the axis is labelled 0-3 in this case.
I also tried to change the labels with the labels command, but this led to the problem that the labels were in ascending order, whereas the data in the column are not (in my example above they are, but not in my real data). Therefore R should replace 0 with the corresponding text from column A.
For this I used the methods shown in the Quick-R guide, but they did not work as desired and replaced the entries with null.
You have to do two steps.
1) Make a list of vectors. Every vector is named after a unique element of column A and contains the corresponding values from column D.
2) Use the stripchart() function with this list.
My code approach:
## your data
data <- data.frame(A = c("AAA", "AAB", "AAC", "AAA", "AAE", "AAC"),
B = rep(12.3),
C = rep(20160729),
D = c(100,80,10,0,5,20))
## empty list to fill in the following loop
## (named values_by_group to avoid masking the base function list())
values_by_group <- list()
## get the values in column D for every unique value in column A
## and add them to the list
for (i in unique(data$A)) values_by_group[[i]] <- data$D[data$A == i]
## plot the list
stripchart(values_by_group,
xlab = "Column D", ylab = "Column A",
pch = 16, col = "red")
The result:
Stripchart
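As a possible shortcut, the list of vectors from step 1 can also be built with split() instead of a loop. This is a sketch on the same toy data; the name groups is an assumption introduced here.

```r
# Same toy data as in the answer above
data <- data.frame(A = c("AAA", "AAB", "AAC", "AAA", "AAE", "AAC"),
                   D = c(100, 80, 10, 0, 5, 20))

# split() returns a named list with one vector of D values per value of A
groups <- split(data$D, data$A)

# stripchart() accepts the list directly
stripchart(groups,
           xlab = "Column D", ylab = "Column A",
           pch = 16, col = "red")
```

Note that split() orders the groups by factor level (alphabetically for character data), whereas the loop keeps the order of first appearance.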
Have you tried using the axis function?
First, note that "AAD" was not in the sample data that you provided. We have to tell R about the values in Column A and how we want them to be ordered:
data1 <- data.frame(A=c('AAA', 'AAB', 'AAC', 'AAA', 'AAE', 'AAC'),
D=c(100, 80, 10, 0, 5, 20))
data1$A <- factor(data1$A, levels=paste0('AA',LETTERS[1:5]))
Now we can plot. We tell R to leave out the Y-axis for now (using the yaxt argument); we'll add the axis labels manually later.
par(mar=c(6,6,4,2)) # Set margins for plot
plot(data1$D, data1$A, xlab = "Column D", ylab = "", yaxt="n", las=1)
Finally we add in the Y-axis labels, using the actual values instead of factor levels (i.e. the numbers).
axis(2, at=1:length(levels(data1$A)), labels=levels(data1$A), las=2)
mtext("Column A", side=2, line=1, las=2, at=3.2)
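To see why the axis showed numbers in the first place, it may help to look at what a factor stores. The sketch below uses the same sample values; the variable names f, codes and labs are assumptions for illustration.

```r
f <- factor(c("AAA", "AAB", "AAC", "AAA", "AAE", "AAC"),
            levels = paste0("AA", LETTERS[1:5]))

# plot() uses the underlying integer codes for the positions ...
codes <- as.integer(f)

# ... while the text lives in the levels, which is what axis() is given
labs <- levels(f)

# Mapping the codes back through the levels recovers the original text
recovered <- labs[codes]
```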

How to obtain x values for 2 data sets in a line plot where y == 80?

I have two separate .csv files that I have loaded into R (version 3.1.2), and produced a simple plot with data from both files using the plot() function as below:
plot(db1[ ,2],db1[ ,5], type = "l", xlab = "area", ylab = "represented", main = "title", frame.plot = FALSE, col = 'blue', pch = 20)
lines(db2[ ,3], db2[ ,7], col="red", pch = 20)
abline(h=80,col= 'black',lty=2)
This is the plot:
What I would like to do is obtain the value of x for each of the two data sets, where y == 80. The values do not exist in the dataset - I would need to interpolate them. An example dataset can be found on my Google Drive here.
Searching through the literature, I can see that I can use identify or locator and use the mouse to find the values, but I would like a more accurate value than these functions can provide. Is there something I can add to the plot code to obtain specific values for x where y == 80? Any assistance would be greatly appreciated.
Since both the x- and y-values come directly from df, you can simply subset your data frame.
# Blue plot
df[df[, 5] == 80, 2]
# Red plot
df[df[, 7] == 80, 3]
For the blue line, this gets the value of the second column of df, i.e. df[, 2] such that the fifth column, df[, 5] is equal to 80. Likewise for the red line.
The syntax for subsetting a data frame by row is as follows:
df[<row subsetting>, <column selection>]
The row subsetting can be a vector of indices to select, or a logical vector that's TRUE at the rows we wish to select. In this case we're using the latter. The column selection is simply a vector of columns to return, and in this case we're just getting a single column.
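If no row has y exactly equal to 80, as the question describes, the logical subsetting returns nothing. One possible alternative is linear interpolation with approx(), swapping the roles of x and y. The numbers below are made up, and the sketch assumes y increases monotonically around the crossing so that the answer is unique.

```r
# Made-up data standing in for two columns of one data set
x <- c(0, 10, 20, 30)
y <- c(0, 50, 90, 100)

# Interpolate x as a function of y and evaluate it at y = 80
x_at_80 <- approx(x = y, y = x, xout = 80)$y

plot(x, y, type = "l", col = "blue")
abline(h = 80, lty = 2)
abline(v = x_at_80, lty = 3)
```

Here the crossing lies between (10, 50) and (20, 90), so the interpolated value is 10 + 10 * (80 - 50) / (90 - 50) = 17.5.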

quantile plot, two data - issues with fitting the line in R

I am trying to plot p-values from two different data frames and compare them to the normal distribution in a Q-Q plot in R.
Here is the code that I am using:
## Taking values from 1st dataframe to plot
Rlogp = -log10(trialR$PVAL)
Rindex <- seq(1, nrow(trialR))
Runi <- Rindex/nrow(trialR)
Rloguni <- -log10(Runi)
## Taking values from 2nd dataframe to plot on existing plot
Nlogp = -log10(trialN$PVAL)
Nlogp = sort(Nlogp)
Nindex <- seq(1, nrow(trialN))
Nuni <- Nindex/nrow(trialN)
Nloguni <- -log10(Nuni)
Nloguni <- sort(Nloguni)
qqplot(Rloguni, Rlogp, xlim=range(0,6), ylim=range(0,6), col=rgb(100,0,0,50,maxColorValue=255), pch=19, lwd=2, bty="l",xlab ="", ylab ="")
qqline(Rloguni, Rlogp,distribution=qnorm, lty="dashed")
par(new=TRUE, cex.main=4.8, col.axis="white")
plot(Nloguni, Nlogp, xlim=range(0,6), ylim=range(0,6), col=rgb(0,0,100,50,maxColorValue=255), pch=19, lwd=2, bty="l",xlab ="", ylab ="")
The code plots the graph, but I am not sure about the qqline, as it seems a bit offset. Can someone tell me if I am doing this correctly, or if there is something to change?
The target plot will look something like this, without the third data value.
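One possible reading of the offset: qqline() draws a line through the quartiles of a sample against normal quantiles, while the code above compares -log10 p-values with a uniform expectation. Under that assumption, the usual reference is the identity line y = x. The sketch below uses simulated p-values in place of trialR$PVAL, so all names and values are illustrative only.

```r
set.seed(1)
pvals <- runif(1000)   # simulated stand-in for trialR$PVAL (assumption)
n <- length(pvals)

# Smallest p-value pairs with the smallest expected quantile
observed <- -log10(sort(pvals))
expected <- -log10(seq_len(n) / n)

plot(expected, observed, pch = 19, bty = "l",
     col = rgb(100, 0, 0, 50, maxColorValue = 255),
     xlab = "Expected -log10(p)", ylab = "Observed -log10(p)")
abline(0, 1, lty = "dashed")   # identity line instead of qqline()
```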

How to annotate subplots with ggplot from rpy2?

I'm using Rpy2 to plot dataframes with ggplot2. I make the following plot:
p = ggplot2.ggplot(iris) + \
ggplot2.geom_point(ggplot2.aes_string(x="Sepal.Length", y="Sepal.Width")) + \
ggplot2.facet_wrap(Formula("~Species"))
p.plot()
r["dev.off"]()
I'd like to annotate each subplot with some statistics about the plot. For example, I'd like to compute the correlation between each x/y subplot and place it on the top right corner of the plot. How can this be done? Ideally I'd like to convert the dataframe from R to a Python object, compute the correlations and then project them onto the scatters. The following conversion does not work, but this is how I'm trying to do it:
# This does not work
#iris_df = pandas.DataFrame({"Sepal.Length": rpy2.robjects.default_ri2py(iris.rx("Sepal.Length")),
# "Sepal.Width": rpy2.robjects.default_ri2py(iris.rx("Sepal.Width")),
# "Species": rpy2.robjects.default_ri2py(iris.rx("Species"))})
# So we access iris using R to compute the correlation
x = iris_py.rx("Sepal.Length")
y = iris_py.rx("Sepal.Width")
# compute r.cor(x, y) and divide up by Species
# Assume we get a vector of length Species saying what the
# correlation is for each Species' Petal Length/Width
p = ggplot2.ggplot(iris) + \
ggplot2.geom_point(ggplot2.aes_string(x="Sepal.Length", y="Sepal.Width")) + \
ggplot2.facet_wrap(Formula("~Species")) + \
# ...
# How to project correlation?
p.plot()
r["dev.off"]()
But assuming I could actually access the R dataframe from Python, how could I plot these correlations? thanks.
The solution is to create a dataframe with a label for each sample plotted. The dataframe's column should match the corresponding column name of the dataframe with the original data. Then this can be plotted with:
p += ggplot2.geom_text(data=labels_df, mapping=ggplot2.aes_string(x="1", y="1", label="labels"))
where labels_df is the dataframe containing the labels and labels is the column name of labels_df with the labels to be plotted. (1,1) in this case will be the coordinate position of the label in each subplot.
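A minimal sketch of how such a labels data frame could be built, shown in plain R rather than through rpy2. It computes one correlation per species with base functions; the column name labels and the "r = ..." text format are assumptions chosen to match the call above.

```r
# One Sepal.Length / Sepal.Width correlation per species
cors <- sapply(split(iris, iris$Species),
               function(d) cor(d$Sepal.Length, d$Sepal.Width))

# The Species column matches the faceting variable, so each label
# lands in the corresponding subplot
labels_df <- data.frame(
  Species = factor(names(cors), levels = levels(iris$Species)),
  labels = sprintf("r = %.2f", cors),
  row.names = NULL
)
```

The fixed position (1, 1) may fall outside the plotted range for this data, so the x and y used for the text will probably need adjusting.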
I found that @user248237dfsf's answer didn't work for me. ggplot got confused between the data frame I was plotting and the data frame I was using for labels.
Instead, I used
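import rpy2.robjects as robjects

# Look up the ggplot2 package environment via R's as.environment();
# this assumes ggplot2 is already loaded (e.g. through rpy2.robjects.lib.ggplot2)
ggplot2_env = robjects.baseenv['as.environment']('package:ggplot2')

class GBaseObject(robjects.RObject):
    @classmethod
    def new(*args, **kwargs):
        args_list = list(args)
        cls = args_list.pop(0)
        res = cls(cls._constructor(*args_list, **kwargs))
        return res

class Annotate(GBaseObject):
    _constructor = ggplot2_env['annotate']

annotate = Annotate.new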
Now, I have something that works just like the standard annotate.
annotate(geom = "text", x = 1, y = 1, label = "MPC")
One minor comment: I don't know if this will work with faceting.

Resources