Plotting subsets of an AffyRNAdeg {affy} object with plotAffyRNAdeg?

library(affy)
microarrays <- ReadAffy() # 98 CEL files are read into the same object
RNAdeg <- AffyRNAdeg(microarrays)
Now I want to plot subsets of RNAdeg
plotAffyRNAdeg(RNAdeg[.......?]) # What can I do?
I've tried various 'for' loops without success.
If line colors are specified, plotAffyRNAdeg plots only the first n samples, where n is the number of colors given, but I haven't found a way to use that effectively. For example, the call below plots the first through sixth samples (the first six .CEL files read in by ReadAffy()):
plotAffyRNAdeg(RNAdeg,col=c(2,2,2,3,3,3))

OK, I found one way: run AffyRNAdeg() on subsets of the object that holds the CEL files, store the results in a list of lists organized by experiment, then plot the list elements. Maybe there is an easier way, but this worked (I'm quite new to R).
library(affy)
library(RColorBrewer)
> sampleNames(ARTHwoundMA[,11:14])
[1] "GSE18960_05_GSM469416_trt_rep2.CEL" "GSE18960_06_GSM469418_trt_rep3.CEL"
[3] "GSE5525_GSM128715_ctrl12h.CEL" "GSE5525_GSM128716_ctrl24h.CEL"
# RNA DEG
# Indices to subset by experiment
cel_names <- substr(sampleNames(ARTHwoundMA),1,7)
unique_exp <- unique(substr(sampleNames(ARTHwoundMA),1,7))
exp_ind <- list()
for (i in 1:length(unique_exp))
{
tempvec <- vector()
for (j in 1:length(cel_names))
{
if (cel_names[j]==unique_exp[i])
{
tempvec <- append(tempvec,j)
}
}
exp_ind[[(length(exp_ind)+1)]] <- tempvec
}
# Calculating
RNAdeg_exp <- list()
for(i in 1:length(exp_ind))
{
RNAdeg_exp[[i]] <- AffyRNAdeg(ARTHwoundMA[,exp_ind[[i]]])
}
# Plotting (colors are set per experiment inside the loop)
pdf(file="C:\\R working directory\\TEST\\RNAdeg_plots.pdf")
for(i in 1:length(exp_ind))
{
par(bg="gray")
colors <- colorRampPalette(rev(brewer.pal(9, "Reds")))(length(exp_ind[[i]]))
plotAffyRNAdeg(RNAdeg_exp[[i]], col=colors)
plot.new()
legend("topleft", lty=1, lwd=2,col=colors,
legend=paste(sampleNames(ARTHwoundMA[,exp_ind[[i]]])))
}
dev.off()
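The nested loops that build exp_ind can probably be replaced by a single split() call. A minimal sketch, assuming (as in the code above) that the experiment ID is the first seven characters of each sample name; the four sample names are the ones shown in the output above and stand in for sampleNames(ARTHwoundMA):

```r
# Stand-in for sampleNames(ARTHwoundMA), taken from the output shown earlier
sample_names <- c("GSE18960_05_GSM469416_trt_rep2.CEL",
                  "GSE18960_06_GSM469418_trt_rep3.CEL",
                  "GSE5525_GSM128715_ctrl12h.CEL",
                  "GSE5525_GSM128716_ctrl24h.CEL")

# Experiment ID = first seven characters, as in the code above
cel_names <- substr(sample_names, 1, 7)

# factor(..., levels = unique(...)) keeps experiments in order of first
# appearance; split() then groups the column indices by experiment
exp_ind <- split(seq_along(cel_names),
                 factor(cel_names, levels = unique(cel_names)))
```

Each element can then be passed to AffyRNAdeg(ARTHwoundMA[, exp_ind[[i]]]) as in the loop above. Unlike the original list, this one is named by experiment.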

Related

How to combine data generated in a for loop into a single scatter plot?

I have this code that calculates p-values at certain named markers in a file (object gts). The for loop produces the scatter plots for each trait that is defined in pts.
pts <- read.csv('data_rqtl_phenotypes.csv',as.is=T)
gts <- read.csv('data_rqtl_genotypes.csv',as.is=T)
gts <- gts[,c(1,grep('2B',gts[1,]))]### use only 1B
gts <- gts[,!apply(gts[2,],1,duplicated)]### remove duplicated
dim(gts)### 93 229
map <- gts[1:2,]### salvage map
gts <- gts[-c(1:2),]
gts[gts=='-'] <- NA
testdfr <- merge(pts,gts,by='id')
testdfr[1:5,1:10]
pdf('SingleMarkerAnalysis_ParW471_2021_10_21.pdf')
for( trait in colnames(pts)[-1])
{
print(trait)
pValueL <- c()
for(marker in colnames(gts))
{
model <- try(lm(testdfr[,trait]~testdfr[,marker]))
{
if(any("try-error"%in%class(model)))
print('not enough data for a test')
else
{
modelsum <- summary(model)
coeffs <- modelsum$coefficient
print(coeffs)
pValue=coeffs[grep('[AB]$',row.names(coeffs)),grep('Pr',colnames(coeffs))]
print(pValue)
if(length(pValue)>0)
pValueL[marker] <- pValue
}
}
}
plot.default(map[2,names(pValueL)],-sapply(pValueL,log10),main=trait,xlab='Chr 1B [cM]', ylab='-log10(pValue)',ylim=c(0,7))
}
dev.off()
However, I want to take the data produced in each iteration and plot it all on the same scatter plot, so that I can colour each trait differently and see how they compare when overlaid. I'm not sure how to get the loop to save each iteration's output to a new object so that they can be plotted together. If anyone knows how this can be done, it would be much appreciated!
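One possible approach, sketched with simulated data rather than the files above: store each trait's p-values in a named list inside the loop, then draw everything after the loop with one plot() call for the first trait and points() for the rest. The trait names, marker positions and p-values below are invented for illustration.

```r
set.seed(42)

# Simulated stand-ins: 20 marker positions (cM) and three hypothetical traits
positions <- sort(runif(20, 0, 150))
traits <- c("height", "yield", "heading")

# Collect one vector of p-values per trait instead of plotting inside the loop
pValues <- list()
for (trait in traits) {
  pValues[[trait]] <- runif(length(positions))  # placeholder for the lm() p-values
}

# Transform once, then plot all traits on one set of axes, one colour per trait
scores <- lapply(pValues, function(p) -log10(p))
cols <- seq_along(traits) + 1
plot(positions, scores[[1]], col = cols[1], pch = 16,
     ylim = c(0, max(unlist(scores))),
     xlab = "Position [cM]", ylab = "-log10(pValue)")
for (k in seq_along(traits)[-1]) {
  points(positions, scores[[k]], col = cols[k], pch = 16)  # add to existing plot
}
legend("topright", legend = traits, col = cols, pch = 16)
```

Keeping the results in a list also makes it easy to build a single data frame later if a different plotting package is preferred.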

Remove outliers by condition from list of data frames

I am trying to write a function that removes multiple outliers, identified via Cook's distance, from a list of data frames.
There are some problems at the moment:
Can I formulate Part 1 as a function? I tried several things that did not work out. I want to use several different variables for the lm, so it would be great if I could use column numbers and the regular expression syntax of data frames as input arguments.
Part 2: the file names of the plots are not correct. The first observation in each data frame from the list is used as the file name. How can I correct this?
Part 3: the data frames without the outliers are not created. The function ends after the message is printed. I can't find my mistake.
data(iris)
iris.lst <- split(iris[, 1:2], iris$Species)
new_names <- c(paste0(unlist(levels(iris$Species)),"_data"))
for (i in 1:length(iris.lst)) {
assign(new_names[i], iris.lst[[i]])
}
# Part 1: Then cooks distances
fit <- lapply(mget(ls(pattern = "_data")),
function(x) lm(x[,1] ~ x[,2], data = x)) # each data frame has only two columns
cooksd <-lapply(fit,cooks.distance)
# Part 2: Plot each data frame with suspected outlier
plots <- function(x){
jpeg(file=paste0(names(x),".jpeg")) # file names are numbers
#par(mfrow=c(2,1))
plot(x, pch="*", cex=2, main="Influential cases by Cooks distance") # plot cook's distance
abline(h = 3*mean(x, na.rm=T), col="red") # add cutoff line
text(x=1:length(x)+1, y=x, labels=ifelse(x > 3*mean(x, na.rm=T),
names(x),""), col="red")
dev.off()
}
myplots <- lapply(cooksd, plots)
# Part 3: give me new data frames without influential cases
show_influential_cases <- function(x){
# invisible(cooksd[["n_OG"]] <- lapply(cooksd, length)
influential <- lapply(x,function(x) names(x)[x > 3*mean(x, na.rm=T)])
test <- as.data.frame(unlist(influential))[,1]
test <- as.numeric(test)
}
tested <- show_influential_cases(result)
cleaned_data <- add_new[-tested,] # removing outliers by indexing
Could someone please help me to improve my code?
Many thanks,
Nadine
In general, it is not good practice to create multiple data frames in the global environment. Lists are usually a better option because they are easier to manage.
Part 1 -
You can combine multiple steps in one lapply call. Here we apply the lm and cooks.distance functions together in the same lapply call.
master_data <- split(iris[, 1:2], iris$Species)
data <- lapply(master_data, function(x) {
cooks.distance(lm(Sepal.Length ~ Sepal.Width, data = x))
})
new_names <- paste0(levels(iris$Species),"_data")
names(data) <- new_names
Part 2 -
lapply does not have access to the names of the list, so pass them separately and use Map to call the plots function.
plots <- function(x, y){
jpeg(file=paste0(y,".jpeg"))
plot(x, pch="*", cex=2, main="Influential cases by Cooks distance")
abline(h = 3*mean(x, na.rm=T), col="red") # add cutoff line
text(x=1:length(x)+1,y=x,labels=ifelse(x > 3*mean(x, na.rm=T),names(x),""), col="red") # label influential points by row name
dev.off()
}
Map(plots, data, names(data))
Part 3 -
I am not exactly clear about how you want to perform Part 3, so for now the function below takes the Cook's distances and the data separately and returns each data frame without its influential rows.
remove_influential_cases <- function(x, y){
inds <- x > 3*mean(x, na.rm=TRUE)
y[!inds, ]
}
result <- Map(remove_influential_cases, data, master_data)
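A quick way to check the result is to compare row counts before and after filtering. The sketch below repeats the steps above on iris in one self-contained piece; the cutoff of 3 times the mean Cook's distance is the one used in the question, not a universal rule, and the argument names d and df are chosen here for readability.

```r
# Split the first two iris columns by species
master_data <- split(iris[, 1:2], iris$Species)

# Cook's distance per species
cooksd <- lapply(master_data, function(x) {
  cooks.distance(lm(Sepal.Length ~ Sepal.Width, data = x))
})

# Drop rows whose Cook's distance exceeds 3 times the group mean
remove_influential_cases <- function(d, df) {
  df[!(d > 3 * mean(d, na.rm = TRUE)), ]
}
cleaned <- Map(remove_influential_cases, cooksd, master_data)

# Rows before and after filtering, one column per species
rows <- rbind(before = sapply(master_data, nrow),
              after  = sapply(cleaned, nrow))
print(rows)
```

Because Map takes its names from the first list, the cleaned data frames stay labelled by species.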

How to save multiple ggplot charts in loop for using grid.arrange

I have a for loop that creates a different ggplot for a different set of parameters each time through the loop. Right now I am printing N different charts one at a time. I would like to save them so I can use grid.arrange to put them all on one page. This doesn't work:
p <- vector(length = N)
for(i in 1:N)
p[i] <- ggplot( ........
...
...
grid.arrange(p[1], p[2], .. p[N], nrow = 4)
Is there a way to save the plots for later plotting a grid of plots on a page outside the loop, or is there a way to set up the grid specification before the loop and produce the gridded plot on the fly as the loop is executed (e.g., the way par is used with plot)?
You rarely need an explicit for loop for this in R; lapply() usually does the job. In a single step:
do.call(
  grid.arrange,
  lapply(data, function(f) {
    ggplot(f, ...)
  })
)
EDIT:
If you want to store the list for later plotting:
plot_objects <- lapply(data, function(f) {
ggplot(f, ...)
})
do.call(grid.arrange, plot_objects)
This can be solved by initializing a list, rather than an atomic vector, to store the plot objects:
p <- vector('list', N)
for(i in seq_len(N)) {
p[[i]] <- ggplot(...)
}
grid.arrange(grobs = p, nrow = 4) # pass the whole list instead of p[[1]], p[[2]], ..., p[[N]]
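The original attempt fails because vector(length = N) creates a logical vector, which cannot hold plot objects. The sketch below shows the difference using lm fits as stand-ins for ggplot objects (an assumption made only so the example runs without extra packages); the same pattern applies to any complex object.

```r
N <- 3

# vector(length = N) defaults to mode "logical": an atomic vector
p_atomic <- vector(length = N)
is.logical(p_atomic)   # TRUE

# vector("list", N) creates a list with N empty (NULL) slots
p <- vector("list", N)

# Store one complex object per slot with [[ ]]; lm fits stand in for plots
for (i in seq_len(N)) {
  p[[i]] <- lm(mpg ~ wt, data = mtcars[seq_len(10 * i), ])
}

# Each slot now holds a complete object that can be used after the loop
sapply(p, class)
```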

Way to progressively overlap line plots in R

I have a for loop from which I call a function grapher() which extracts certain columns from a dataframe (position and w, both continuous variables) and plots them. My code changes the Y variable (called w here) each time it runs and so I'd like to plot it as an overlay progressively. If I run the grapher() function 4 times for example, I'd like to have 4 plots where the first plot has only 1 line, and the 4th has all 4 overlain on each other (as different colours).
I've already tried points() as suggested in other posts, but for some reason it only generates a new graph.
grapher <- function(){
position.2L <- data[data$V1=='2L', 'V2']
w.2L <- data[data$V1=='2L', 'w']
plot(position.2L, w.2L)
points(position.2L, w.2L, col='green')
}
# example of my for loop #
for (t in 1:200){
#code here changes the 'w' variable each iteration of 't'
if (t%%50==0){
grapher()
}
}
Not knowing any details about your situation, I can only assume something like this might be applicable.
# Example data set
d <- data.frame(V1=rep(1:2, each=6), V2=rep(1:6, 2), w=rep(1:6, each=2))
# Prepare the matrix we will write to.
n <- 200
m <- matrix(d$w, nrow(d), n)
# Loop progressively adding more noise to the data
set.seed(1)
for (i in 2:n) {
m[,i] <- m[,i-1] + rnorm(nrow(d), 0, 0.05)
}
# We can now plot the matrix, selecting the relevant rows and columns
matplot(m[d$V1 == 1, seq(1, n, by=50)], type="o", pch=16, lty=1)
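As for why points() seemed to start a new graph: grapher() calls plot() every time it runs, and plot() always begins a fresh figure. If the goal is the sequence described in the question (the first plot with one line, the last with all of them), one option is to save a snapshot of w at each checkpoint and redraw, using plot() for the first snapshot and lines() for the rest. A rough sketch with simulated data; plot_overlays is a hypothetical helper, not part of the code above.

```r
# Draw one plot per checkpoint; plot k overlays snapshots 1..k
plot_overlays <- function(x, snapshots, cols = seq_along(snapshots)) {
  ylim <- range(unlist(snapshots))            # common y-axis for all plots
  for (k in seq_along(snapshots)) {
    plot(x, snapshots[[1]], type = "l", col = cols[1], ylim = ylim,
         xlab = "position", ylab = "w", main = paste("Snapshots 1 to", k))
    for (j in seq_len(k)[-1]) {
      lines(x, snapshots[[j]], col = cols[j])  # add to the current plot
    }
  }
  invisible(length(snapshots))                 # number of plots drawn
}

# Simulated example: w drifts over 200 iterations and is saved every 50
set.seed(1)
position <- 1:20
w <- sin(position / 3)
snapshots <- list()
for (t in 1:200) {
  w <- w + rnorm(length(w), 0, 0.02)
  if (t %% 50 == 0) snapshots[[length(snapshots) + 1]] <- w
}
n_plots <- plot_overlays(position, snapshots)
```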

Plotting multiple trajectories from combined .csv file as different colours

I have a combined .csv file containing coordinates for multiple trajectories.
I would like to plot these trajectories in R on the same graph with each line having a different colour (preferably using a loop). How do I do this?
Please supply a minimal working example for your work.
But in general: if you have loaded your .csv into R using read.csv or a similar method and arranged your data in a data frame or matrix, it's a matter of looping over the desired dimension with lines.
Example:
simdata <- function()
{
set.seed(1234)
data <- matrix(data=NA,nrow=10,ncol=100)
for(i in 1:10) data[i,] <- dnorm(1:100,runif(1,1,100),runif(1,5,20))
return(data)
}
Matrix <- simdata()
cols <- colorRampPalette(c("blue","red"))(10) #generate ramping colors
plot(NULL,xlim=c(0,ncol(Matrix)),ylim=range(Matrix)) #setup empty plot window
for(i in 1:nrow(Matrix)) lines(Matrix[i,],col=cols[i]) #plot
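For data read from a combined .csv, the same idea applies after splitting by trajectory. A minimal sketch, assuming the file has one row per point with columns id, x and y (hypothetical names, since the question does not give the layout). A small data frame is built in place of the read.csv() call so the example runs on its own.

```r
# Stand-in for a read.csv() call on the combined file:
# three trajectories of 25 points each, in long format
set.seed(1)
traj <- data.frame(
  id = rep(c("a", "b", "c"), each = 25),
  x  = unlist(lapply(1:3, function(i) cumsum(rnorm(25)))),
  y  = unlist(lapply(1:3, function(i) cumsum(rnorm(25))))
)

# One data frame per trajectory, one colour per trajectory
by_id <- split(traj, traj$id)
cols <- colorRampPalette(c("blue", "red"))(length(by_id))

# Empty plot sized to all points, then one line per trajectory
plot(NULL, xlim = range(traj$x), ylim = range(traj$y), xlab = "x", ylab = "y")
for (i in seq_along(by_id)) {
  lines(by_id[[i]]$x, by_id[[i]]$y, col = cols[i])
}
legend("topleft", legend = names(by_id), col = cols, lty = 1)
```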

Resources