plotting in for loop with R using package eHOF - r

I want to run a number (~100) of eHOF models using the R package eHOF, produce the graphs that this package can make of the models, and save a jpeg file of each one. I am trying to use a for loop to do this quickly, but I cannot get it to make the graphs. I don't see anything in the RStudio plots window, and the jpeg files I produce are empty (as an aside, I am also not building the names for the jpeg files correctly in the loop).
Outside a loop there is no problem producing these plots: if, for example, I fit a model with modSp <- HOF(Sp, ...), then plot(modSp) produces the desired graph. But within the loop, nothing is produced, and I get several messages of the sort:
1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
2: In plot.window(...) : "boxp" is not a graphical parameter
3: In plot.window(...) : "las.h" is not a graphical parameter
4: In plot.window(...) : "onlybest" is not a graphical parameter
5: In plot.window(...) : "para" is not a graphical parameter
Background: I am using R version 3.1.0 (2014-04-10) in RStudio, and package eHOF, on Windows 7.
My code is as follows:
species<- read.csv("F:/Thesis_projects/Chapter4_climateChange/HOF/259species.csv")
species<-as.data.frame(species)
enviro<-read.csv("F:/Thesis_projects/Chapter4_climateChange/HOF/EnvironmentalData.csv")
enviro<-as.data.frame(enviro)
species_enviro<-merge(enviro, species, all.x=FALSE)
HOF_Sp<-species_enviro[,23:25]
GDD<-species_enviro[,19]
library(eHOF)
SpeciesCodes<-c("ACPE","ACRU2","ACSP2")
Modx<-NULL
for (Spp in seq_along(SpeciesCodes)){
Modx[[Spp]]<-HOF(HOF_Sp[[Spp]],GDD, M=1,family=binomial, bootstrap=2, freq.limit = 100)
jpeg(filename = (paste(("GDD_responsecurve_",SpeciesCodes[[Spp]],".jpg"),sep="")),
width =8.3, height = 8.3, units = "cm", pointsize = 8, bg="white", res = 800)
print(plot((paste(c(Modx[[Spp]]))), boxp = TRUE,
las.h = 1, onlybest = TRUE, para = TRUE,
gam.se = FALSE, newdata = NULL, lwd=1, leg = TRUE, add=FALSE,
xlabel="Growing degree days", ylab="Probability"))
dev.off()
}
My Data looks like this:
> head(GDD)
[1] 996.1681 996.1681 962.0662 962.0662 945.7007 945.7007
(there are lots of 1's in the species data too, just not in the first few rows).
> head(HOF_Sp)
ACMI2 ACPA ACPE
1 0 0 0
2 0 0 0
3 0 0 0
4 0 0 0
5 0 0 0
6 0 0 0
Any advice at all would be very appreciated! I think it is an issue with the plot function in the loop being mistaken for the generic plot function of R. If I didn't provide enough information, I will be happy to edit my question.

There are several things going on here:
1. Your SpeciesCodes vector has codes different from the column names in HOF_Sp.
2. seq_along(...) returns the index of each element in its argument, not the element itself.
3. The plot method for HOF is invoked when an object of class HOF is passed to plot(...). But you are passing paste(c(...)) of the model, which is a character vector, so the default plot method is used instead. That is where the "NAs introduced by coercion" and "is not a graphical parameter" warnings come from.
4. Your example is not reproducible because it does not run with the data you provided (did you try to run it?). Specifically, the sample of HOF_Sp is degenerate because there are no non-zero values.
It's impossible for me to test this code because of (4) above, but try this:
SpeciesCodes <- c("ACPE","ACRU2","ACSP2")
for (Spp in SpeciesCodes) {
  # fit the model for this species column
  model <- HOF(HOF_Sp[[Spp]], GDD, M=1, family=binomial, bootstrap=2, freq.limit = 100)
  # paste0() joins the pieces of the file name without a separator
  jpeg(filename = paste0("GDD_responsecurve_", Spp, ".jpg"),
       width = 8.3, height = 8.3, units = "cm", pointsize = 8, bg = "white", res = 800)
  # pass the HOF object itself so the HOF plot method is used
  plot(model, boxp = TRUE,
       las.h = 1, onlybest = TRUE, para = TRUE,
       gam.se = FALSE, newdata = NULL, lwd = 1, leg = TRUE, add = FALSE,
       xlabel = "Growing degree days", ylab = "Probability")
  dev.off()
}
Note that this invokes the vector version of HOF(...), which I gather is what you want.
This does not solve the problem in (1) above (species codes do not match columns names), but other than that it should work.
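The points about looping, column names and file names can be checked without eHOF. The sketch below uses a made-up data frame (toy_data) and a stand-in classed object (toy_model); both are hypothetical and only illustrate base R behaviour, not the eHOF API.

```r
# Hypothetical stand-ins for the asker's objects (not real data)
species_codes <- c("ACPE", "ACRU2", "ACSP2")
toy_data <- data.frame(ACMI2 = c(0, 1, 0), ACPA = c(1, 0, 0), ACPE = c(0, 0, 1))

# seq_along() gives positions; looping over the vector itself gives the codes
positions <- seq_along(species_codes)

# Codes with no matching column would make toy_data[[code]] return NULL
missing_codes <- setdiff(species_codes, names(toy_data))

# paste0() builds one file name per code, with no separator
file_names <- paste0("GDD_responsecurve_", species_codes, ".jpg")

# Wrapping a classed object in paste(c(...)) leaves only a character vector,
# so plot() can no longer dispatch on the original class
toy_model <- structure(list(coef = 1:3), class = "toy_model")
pasted <- paste(c(toy_model))

print(positions)
print(missing_codes)
print(file_names)
print(class(pasted))
```

Running a check like setdiff() before the loop makes the mismatch in point (1) visible up front, instead of surfacing later as a failure inside the model fit.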

Related

Error in if (ncol(spc1$amp) > ncol(spc2$amp)) { : argument is of length zero

I am using warbleR in R to do some acoustic analyses. As freq_range couldn't detect all the bottom frequencies very well, I created a data frame manually with all the right bottom frequencies, loaded this into R and turned it into a selection table. track_freq_contour, compare.methods and freq_DTW all work fine, although freq_DTW does give a warning message:
Warning message: In (0:(n - 1)) * f : NAs produced by integer overflow
However, if I try to run the function cross_correlation, I get the following error:
Error in if (ncol(spc1$amp) > ncol(spc2$amp)) { :
argument is of length zero
I do not get this error with a selection table with the bottom and top frequency added with the freq_range function in R instead of manually. What could be the issue here? The selection tables both look similar:
This is the selection table partly made by R through freq_range:
And this is the one with the bottom frequencies added manually (which has more sound files than the one before):
This is part of the code I use:
#Comparing methods for quantitative analysis of signal structure
compare.methods(X = stnew, flim = c(0.6,2.5), bp = c(0.6,2.5), methods = c("XCORR", "dfDTW"))
#Measure acoustic parameters with spectro_analysis
paramsnew <- spectro_analysis(stnew, bp = c(0.6,2), threshold = 20)
write.csv(paramsnew, "new_acoustic_parameters.csv", row.names = FALSE)
#Remove parameters derived from fundamental frequency
paramsnew <- paramsnew[, grep("fun|peakf", colnames(paramsnew), invert = TRUE)]
#Dynamic time warping
dm <- freq_DTW(stnew, length.out = 30, flim = c(0.6,2), bp = c(0.6,2), wl = 300, img = TRUE)
str(dm)
#Spectrographic cross-correlation
xcnew <- cross_correlation(stnew, wl = 300, na.rm = FALSE)
str(xc)
Any idea what I'm doing wrong?

Limiting decimals of AUC to be printed with ROC curve in pROC

I have successfully created a plot with multiple ROC curves of different prediction models in the same plot and the numerical value of the AUC and CI is printed nicely on the side.
However, it makes no sense for my project to have these printed with three decimals. I need the rounded version with two decimals, and I have not been able to convince RStudio to print it this way.
(To clarify, it says AUC: 0.708 (0.661-0.754) and I need it to be 0.71 (0.66-0.75).)
I tried:
print(ROC_rfhipfx60X1MOmospugholt, digits = max(3, getOption("digits") -3 ), call= TRUE)
which returns the call (seemingly successful at first):
Call:
roc.default(response = hipfx_pugely_and_holt_predictions_20200820$`1MO`, predictor = hipfx_pugely_and_holt_predictions_20200820$`PUGELY RISK %`, ci = TRUE, plot = TRUE, print.auc = TRUE, col = "red", lwd = 4, print.auc.y = 0.3, legacy.axes = TRUE, add = TRUE)
Data: hipfx_pugely_and_holt_predictions_20200820$`PUGELY RISK %` in 121 controls (hipfx_pugely_and_holt_predictions_20200820$`1MO` no) > 1050 cases (hipfx_pugely_and_holt_predictions_20200820$`1MO` yes).
Area under the curve: 0.7078
95% CI: 0.6615-0.7542 (DeLong)
>
And instead of creating a new plot it returns (same message gets printed again if I scroll through my plots and return to the one that is not working):
Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection
In addition: Warning message:
In readChar(con, 5L, useBytes = TRUE) :
cannot open compressed file '/var/folders/22/nxnj3khn76q9hz0khppd11_40000gn/T/RtmpXrawRS/rs-graphics-7de8e907-4a56-4944-b16e-aeb4054ee835/2473fad9-4618-46c1-9014-fbcb61050b97.snapshot', probable reason 'No such file or directory'
Graphics error: Plot rendering error
Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection
The digits argument to print controls the number of significant digits. In this case it matches the number of decimals, as we are between 0 and 1. So you can use digits = 2, like so:
> library(pROC)
> data(aSAH)
> roc_curve <- roc(aSAH$outcome, aSAH$ndka, ci = TRUE)
Setting levels: control = Good, case = Poor
Setting direction: controls < cases
> print(roc_curve, digits = 2)
Call:
roc.default(response = aSAH$outcome, predictor = aSAH$ndka, ci = TRUE)
Data: aSAH$ndka in 72 controls (aSAH$outcome Good) < 41 cases (aSAH$outcome Poor).
Area under the curve: 0.61
95% CI: 0.5-0.72 (DeLong)
For plots I suggest you use the plot function separately. You should customize the print.auc.pattern argument, to plot the numbers with two decimals (this time it's not the number of significant digits):
plot(roc_curve, print.auc = TRUE, col = "red", lwd = 4, print.auc.y = 0.3, legacy.axes = TRUE, print.auc.pattern = "%.2f (%.2f-%.2f)")
When you get plot errors from RStudio, re-creating the plot from scratch will usually solve the problem.
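The difference between significant digits and fixed decimals can be seen with base R alone. This is a minimal sketch: the AUC and CI numbers are the ones printed in the question, while small_value is a hypothetical number added only to show where the two notions diverge. The pattern is the same kind of sprintf-style string that is passed to print.auc.pattern above.

```r
# Values copied from the question's printed output
auc_values <- c(auc = 0.7078, lower = 0.6615, upper = 0.7542)

# Fixed number of decimals, as in a "%.2f (%.2f-%.2f)" pattern
label <- sprintf("AUC: %.2f (%.2f-%.2f)",
                 auc_values[["auc"]], auc_values[["lower"]], auc_values[["upper"]])

# Hypothetical value below 0.1: significant digits and decimals now differ
small_value <- 0.0712
two_significant <- format(small_value, digits = 2)  # keeps two significant digits
two_decimals <- sprintf("%.2f", small_value)        # keeps two decimal places

print(label)
print(two_significant)
print(two_decimals)
```

For values between 0.1 and 1 the two approaches give the same result, which is why digits = 2 happens to work for the printed AUC.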

DoHeatmap function Seurat - Error in dataframe: arguments imply differing number of rows

I'm trying to use the DoHeatmap function in Seurat to show expression of a number of genes across some defined clusters.
B_cells is my Seurat object.
tfs <- c("PRDM1", "PAX5", "BACH2")
DoHeatmap(B_cells, features=tfs)
I'm getting this error back;
Error in data.frame(group = sort(x = group.use), x = x.divs) :
arguments imply differing number of rows: 10411, 0
When I look at the number of rows and columns in the Seurat object;
nrow(B_cells) = 19651
ncol(B_cells) = 10151
Sorry if this is a silly question but I've been stuck on it for a while now.
edit traceback():
3: stop(gettextf("arguments imply differing number of rows: %s",
paste(unique(nrows), collapse = ", ")), domain = NA)
2: data.frame(group = sort(x = group.use), x = x.divs)
1: DoHeatmap(B_cells, features = genes)
The source code for the DoHeatmap() function can be found at https://github.com/satijalab/seurat/blob/develop/R/visualization.R. The traceback() shows line 363 of visualization.R is causing the error:
if (label) {
x.max <- max(pbuild$layout$panel_params[[1]]$x.range)
# Attempt to pull xdivs from x.major in ggplot2 < 3.3.0; if NULL, pull from the >= 3.3.0 slot
x.divs <- pbuild$layout$panel_params[[1]]$x.major %||% pbuild$layout$panel_params[[1]]$x$break_positions()
x <- data.frame(group = sort(x = group.use), x = x.divs)
...
}
As a workaround to bypass the error try:
DoHeatmap(B_cells, features=tfs, label=FALSE)
I had a similar error. It turns out that there was a problem with my cluster labeling, where one of my clusters ended up with an empty label (""). I found it when I asked for DimPlot with label=T, and one of the clusters did not have a label. When I went back and re-labeled the clusters correctly, the DoHeatmap error disappeared.
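The error message itself comes from base R's data.frame(), which refuses to combine a column of one length with a zero-length column. The sketch below reproduces that message and shows one way to look for an empty label. The cluster_labels vector is hypothetical, and nothing here calls Seurat; how the labels are stored in a given Seurat object is not covered.

```r
# Hypothetical cluster labels; one of them is an empty string
cluster_labels <- factor(c("B naive", "", "Plasma", "B naive"))

# An empty level is easy to miss when printed, so test for it explicitly
empty_found <- any(levels(cluster_labels) == "")

# Same shape of call as in the traceback: a full-length column next to an empty one
err_msg <- tryCatch(
  {
    data.frame(group = sort(cluster_labels), x = numeric(0))
    NA_character_
  },
  error = function(e) conditionMessage(e)
)

print(empty_found)
print(err_msg)
```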

Creating a scatterplot for each value of the first column in an r dataframe

Using the file below, I am trying to create 2 scatterplots. One scatterplot compares the 2nd and 3rd columns when the first column is equal to "coat", and the second scatterplot compares the 2nd and 3rd columns when the first column is equal to "hat".
file.txt
clothing,freq,temp
coat,0.3,10
coat,0.9,0
coat,0.1,20
hat,0.5,20
hat,0.3,15
hat,0.1,5
This is the script I have written
script.R
rates = read.csv("file.txt")
for(i in unique(rates[1])){
plot(unlist(rates[2])[rates[1] == toString(i)],unlist(rates[3])[rates[1] == toString(i)])
}
I receive this error when running it
Error in plot.window(...) : need finite 'xlim' values
Calls: plot -> plot.default -> localWindow -> plot.window
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf
Execution halted
The script works if I replace "toString(i)" with "hat", but then it can obviously only make one of the scatterplots.
EDIT
I edited my script slightly. It creates a graph for the first iteration through the loop but not for any iteration after the first.
This is my script
rates = read.csv("file.txt")
for(i in unique(rates[,1])){
plot(unlist(rates[2])[rates[1] == toString(i)],unlist(rates[3])[rates[1] == toString(i)])
file.rename("Rplots.pdf", paste(i,".pdf",sep=""))
}
This is what happens when I execute the script
name#server:/directory> ./script.R
Warning message:
In file.rename("Rplots.pdf", paste(i, ".pdf", sep = "")) :
cannot rename file 'Rplots.pdf' to 'hat.pdf', reason 'No such file or directory'
name#server:/directory> ls
coat.pdf file.txt script.R*
try this:
rates = read.table("file.txt",sep=',',header=TRUE)
cloth_type<-unique(rates[,1])
for (i in 1:length(cloth_type)){
dev.new()
index_included=which(rates[,1]==cloth_type[i])
plot(rates[index_included,2],rates[index_included,3],main=cloth_type[i],
xlab="freq ", ylab="temp ", pch=19)
}
Maybe the dplyr package would be helpful.
To install the package:
install.packages('dplyr')
Then you can use the filter function to generate a separate data frame for each clothing type:
library('dplyr')
rates <- read.csv("file.txt")
cloathTypes <- unique(rates$clothing)
for(cloath in cloathTypes){
d <- filter(rates, clothing == cloath)
plot(d$freq, d$temp, xlab = 'Freq', ylab='Temp', main=cloath)
}
I think your issue is arising from calling unique() on a data.frame, which produces another data.frame rather than a vector to iterate over. Provided your global options import strings as factors, you should be able to output the plots side-by-side as follows:
## input data
rates = data.frame(clothing = c(rep("coat", 3), rep("hat", 3)),
freq = c(0.3, 0.9, 0.1, 0.5, 0.3, 0.1),
temp = c(10, 0, 20, 20, 15, 5),
stringsAsFactors = TRUE) # needed from R 4.0.0 on, where strings are no longer factors by default
## store original plotting parameters
op = par(no.readonly = TRUE)
## modify plotting parameters to produce side-by-side plots
par(mfrow = c(1, 2))
## output plots
for(i in levels(rates[,1])){
plot(rates[,2][rates[,1] == i], rates[,3][rates[,1] == i])
}
## reset plotting pars
par(op)
If you want to produce separate plots just remove the par lines.
You can do this pretty easily with ggplot
Your data as a data.frame
df <- data.frame(clothing=c(rep("coat",3),rep("hat",3)),
freq=c(0.3,0.9,0.1,0.5,0.3,0.1),
temp=c(10,0,20,20,15,5),
stringsAsFactors=F)
Plotting freq on x, temp on y, and coloring points by clothing
library(ggplot2)
ggplot(df, aes(freq, temp, colour=clothing)) +
geom_point()
Change for(i in unique(rates[1])) to for(i in unique(rates[,1])) and add dev.new() into the for loop
rates = read.csv("file.txt")
for(i in unique(rates[,1])){
dev.new()
plot(unlist(rates[2])[rates[1] == toString(i)],unlist(rates[3])[rates[1] == toString(i)])
file.rename("Rplots.pdf", paste(i,".pdf",sep=""))
}
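To see why the original loop fails, and to avoid renaming Rplots.pdf at all, here is a minimal base R sketch. It reads the same six rows inline and opens one PDF device per clothing type. Writing to tempdir() and the variable names are assumptions made for illustration.

```r
# Same rows as file.txt, read inline so the sketch is self-contained
rates <- read.csv(text = "clothing,freq,temp
coat,0.3,10
coat,0.9,0
coat,0.1,20
hat,0.5,20
hat,0.3,15
hat,0.1,5")

# rates[1] is a one-column data frame, so unique() returns a data frame too;
# a for loop over it visits the whole column once instead of each value
still_a_data_frame <- is.data.frame(unique(rates[1]))

clothing_types <- unique(as.character(rates[, 1]))
out_dir <- tempdir()  # assumption: write somewhere disposable

for (item in clothing_types) {
  rows <- rates[rates$clothing == item, ]
  # open a named device per plot instead of renaming Rplots.pdf afterwards
  pdf(file.path(out_dir, paste0(item, ".pdf")))
  plot(rows$freq, rows$temp, main = item, xlab = "freq", ylab = "temp")
  invisible(dev.off())
}

print(still_a_data_frame)
print(clothing_types)
```

Closing each device with dev.off() inside the loop is what guarantees that every file is written before the next one is started.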

How can I combine several heatmaps using R in a single figure

I have created 36 heatmaps with the function pheatmap, and I want to display them in a single figure. I tried using the function par(), but it did not work, and I do not know why. Could someone tell me what I should do? Thank you very much. This is my code:
require(graphics);require(grDevices);library("pheatmap", lib.loc="D:/Program Files/R/R-3.1.1/library");library(gplots)
filenames<-list.files("D:/Project/bladder cancer/heatmap0829/heatmap/"); # list all of the files in the folder
filename2<-strtrim(filenames,nchar(filenames)-4); # all of the filenames without extension names
par(mfrow=c(18,2)) #divide the graphics windows into a 18x2 matrix
for(i in 1:length(filename2)){
rt<-read.table(paste("D:/Project/bladder cancer/heatmap0829/heatmap/",filenames[i],sep = ""), header = T, sep = '\t') # Import the data with the ith file name
size=dim(rt) # the dimensions of the data frame
cw=400/size[1] #the width of the cell in the heatmap
rt<-log10(rt)
x <- t(data.matrix(rt))
pheatmap(x,color=greenred(256),main=filename2[i],cluster_rows = F, cluster_cols = T,cellwidth = cw, cellheight = 60,border_color =F,fontsize = 8,fontsize_col = 15)}
This is one dataset
ScaBER 5637
1 1.010001e+02
1.341186e+00 2.505067e+01
1.669456e+01 8.834190e+01
7.141351e+00 3.897474e+01
1.585592e+04 5.858210e+04
1 3.137979e+01
1.498863e+01 7.694948e+01
1.115443e+02 3.642917e+02
1.157677e+01 5.036716e+01
4.926492e+02 8.642784e+03
3.047117e+00 1.872154e+01
I have 36 txt files like this, but I can not put all of them here
"ScaBER 5637" is the column name of this dataset
See this previous answer: Histogram, error: Error in plot.new() : figure margins too large
par(mfcol=c(3,12), oma=c(1,1,0,0), mar=c(1,1,1,0), tcl=-0.1, mgp=c(0,0,0))
for(i in 1:36){
plot(runif(2), runif(2), type="l")
}
dev.off()
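The snippet above calls dev.off() without showing which device was opened. A minimal sketch that opens a PDF device explicitly and draws the 36 panels into one page could look like this. The file name and page size are assumptions. Note that this layout applies to base graphics calls such as plot(); pheatmap draws with the grid system, so par() settings are not expected to rearrange its output.

```r
# Assumed output location and page size, chosen so each panel is about 2 x 2 inches
out_file <- file.path(tempdir(), "panels.pdf")
pdf(out_file, width = 24, height = 6)

# 3 rows by 12 columns, filled column by column, with small margins
par(mfcol = c(3, 12), oma = c(1, 1, 0, 0), mar = c(1, 1, 1, 0),
    tcl = -0.1, mgp = c(0, 0, 0))

for (i in 1:36) {
  plot(runif(2), runif(2), type = "l", main = i)
}

# close the device so the file is actually written
invisible(dev.off())

print(file.exists(out_file))
```

A page that is too small for the requested grid is what triggers the "figure margins too large" error named in the linked answer, which is why the page size here is set generously.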
