I am currently running a stacked species distribution model through a linux cluster using the following code:
library(SSDM)
setwd("/home/nikhail1")
Env <- load_var(path = getwd(), files = NULL, format = c(".grd", ".tif",
".asc",
".sdat", ".rst", ".nc",
".envi", ".bil", ".img"), categorical = "af_anthrome.asc",
Norm = TRUE, tmp = TRUE, verbose = TRUE, GUI = FALSE)
Occurrences <- load_occ(path = getwd(), Env, file =
"Final_African_Bird_occurrence_rarefied_points.csv",
Xcol = "decimallon", Ycol = "decimallat", Spcol =
"species", GeoRes = FALSE,
sep = ",", verbose = TRUE, GUI = FALSE)
head(Occurrences)
warnings()
SSDM <- stack_modelling(c("GLM", "GAM", "MARS", "GBM", "RF", "CTA",
"MAXENT", "ANN", "SVM"), Occurrences, Env, Xcol = "decimallon",
Ycol = "decimallat", Pcol = NULL, Spcol = "species", rep = 1,
name = "Stack", save = TRUE, path = getwd(), PA = NULL,
cv = "holdout", cv.param = c(0.75, 1), thresh = 1001,
axes.metric = "Pearson", uncertainty = TRUE, tmp = TRUE,
ensemble.metric = c("AUC", "Kappa", "sensitivity",
"specificity"), ensemble.thresh = c(0.75, 0.75, 0.75, 0.75), weight = TRUE,
method = "bSSDM", metric = "SES", range = NULL,
endemism = NULL, verbose = TRUE, GUI = FALSE, cores = 200)
save.stack(SSDM, name = "Bird", path = getwd(),
verbose = TRUE, GUI = FALSE)
I receive the following error message when trying to run my analyses:
Error in socketConnection("localhost", port = port, server = TRUE, blocking
= TRUE, :
all connections are in use
Calls: stack_modelling ... makePSOCKcluster -> newPSOCKnode ->
socketConnection
How do i increase the maximum number of connections? Can i do this within the SSDM package as parallel is built in. Do I have to apply a specific function from another package to ensure that my job runs smoothly across clusters?
Thank you for you help,
Nikhail
The maximum number of open connections you can have in R is 125. To increase the number of connections you can have open at the same time, you need to rebuild R from source. See
https://github.com/HenrikBengtsson/Wishlist-for-R/issues/28
Related
I am running the following command for GSEA analysis in R
gse <- gseGO(geneList=gene_list,
ont ="ALL",
keyType = "SYMBOL",
minGSSize = 3,
maxGSSize = 800,
pvalueCutoff = 0.05,
verbose = TRUE,
OrgDb = org.Hs.eg.db,
pAdjustMethod = "none")
It created an object gse as follows:
How do I create this manually with my own interested terms?
I am using RevoScaleR rxDataStep function in RStudio to write data to a SQL table. The process works fine yet I am unable to save the output from the reportProgress argument when set to any value, 2 or 3. The results show up in the Console but I can't figure out where the objects exist so I can save them for future use. How can I save the progress outputs?
out <- rxDataStep(inData = data, outFile = obj, varsToKeep = NULL, varsToDrop = NULL,
rowSelection = NULL, transforms = NULL, transformObjects = NULL,
transformFunc = NULL, transformVars = NULL,
transformPackages = NULL, transformEnvir = NULL,
append = "rows", overwrite = TRUE, rowVarName = NULL,
removeMissingsOnRead = FALSE, removeMissings = FALSE,
computeLowHigh = TRUE, maxRowsByCols = 50000000,
rowsPerRead = -1, startRow = 1, numRows = -1,
returnTransformObjects = F,
blocksPerRead = 10000,
reportProgress = 2)
This is what shows up in console:
Rows Processed: 100Total Rows written: 100, Total time: 0.384
Maybe this is a more general R console question, how to save any message from the console which is not available otherwise
Hello I'm a new bioinformatician so bear with me please!
I'm using the gprofiler2 to run GO/KEGG analysis in R using emacs/ess and I want to add a title to the table it offers:
publish_gosttable(gostres, highlight_terms = gostres$result[c(1:2,10,120),],
use_colors = TRUE,
show_columns = c("source", "term_name", "term_size", "intersection_size"),
filename = NULL)
I have tried the title(), tab_header() function but I can't seem to be able to add a title. My question is if there is some other function or package that would allow me to add it instead of having to do it manually.
The code so far
GOresult <- gost(
geneid1up$gene,
organism = "hsapiens",
ordered_query = FALSE,
multi_query = FALSE,
significant = TRUE,
exclude_iea = FALSE,
measure_underrepresentation = FALSE,
evcodes = FALSE,
user_threshold = 0.05,
correction_method = "gSCS",
domain_scope = "annotated",
custom_bg = NULL,
numeric_ns = "",
sources = c("GO:BP","GO:MF","GO:CC","KEGG"),
as_short_link = FALSE)
GOresult1 <- as.data.frame(GOresult$result)
GOresult1$minuslog10pval <- -log10(GOresult1$p_value)
names(GOresult1)[15] <- "-log10(pval)"
GOresult2 <- GOresult1[order(GOresult1$p_value, decreasing=F),]
plot1 <- publish_gosttable(GOresult2, highlight_terms = GOresult2[c(1:20),],
use_colors = FALSE,
show_columns = c("source", "term_name", "term_size", "intersection_size","-log10(pval)"),
filename = NULL)
Does this do the job?
library(gprofiler2)
library(ggplot2)
gostres <- gost(query = c("X:1000:1000000", "rs17396340", "GO:0005005", "ENSG00000156103", "NLRP1"),
organism = "hsapiens", ordered_query = FALSE,
multi_query = FALSE, significant = TRUE, exclude_iea = FALSE,
measure_underrepresentation = FALSE, evcodes = FALSE,
user_threshold = 0.05, correction_method = "g_SCS",
domain_scope = "annotated", custom_bg = NULL,
numeric_ns = "", sources = NULL, as_short_link = FALSE)
publish_gosttable(gostres, highlight_terms = gostres$result[c(1:2,10,120),],
use_colors = TRUE,
show_columns = c("source", "term_name", "term_size", "intersection_size"),
filename = NULL)+
ggtitle('Your Title')
Result:
The trick is that the plot is a ggplot objet. Therefore you can add the title using +ggtitle('Your Title') after your plot code (as in my example)
I am trying to run the Monocle3 function find_gene_modules() on a cell_data_set (cds) but am getting a variety of errors in this. I have not had any other issues before this. I am working with an imported Seurat object. My first error came back stating that the number of rows were not the same between my cds and cds#preprocess_aux$gene_loadings values. I took a look and it seems my gene loadings were a list under cds#preprocess_aux#listData$gene_loadings. I then ran the following code to make a dataframe version of the gene loadings:
test <- seurat#assays$RNA#counts#Dimnames[[1]]
test <- as.data.frame(test)
cds#preprocess_aux$gene_loadings <- test
rownames(cds#preprocess_aux$gene_loadings) <- cds#preprocess_aux$gene_loadings[,1]
Which created a cds#preprocess_aux$gene_loadings dataframe with the same number of rows and row names as my cds. This resolved my original error but now led to a new error being thrown from uwot as:
15:34:02 UMAP embedding parameters a = 1.577 b = 0.8951
Error in uwot(X = X, n_neighbors = n_neighbors, n_components = n_components, :
No numeric columns found
Running traceback() produces the following information.
> traceback()
4: stop("No numeric columns found")
3: uwot(X = X, n_neighbors = n_neighbors, n_components = n_components,
metric = metric, n_epochs = n_epochs, alpha = learning_rate,
scale = scale, init = init, init_sdev = init_sdev, spread = spread,
min_dist = min_dist, set_op_mix_ratio = set_op_mix_ratio,
local_connectivity = local_connectivity, bandwidth = bandwidth,
gamma = repulsion_strength, negative_sample_rate = negative_sample_rate,
a = a, b = b, nn_method = nn_method, n_trees = n_trees, search_k = search_k,
method = "umap", approx_pow = approx_pow, n_threads = n_threads,
n_sgd_threads = n_sgd_threads, grain_size = grain_size, y = y,
target_n_neighbors = target_n_neighbors, target_weight = target_weight,
target_metric = target_metric, pca = pca, pca_center = pca_center,
pca_method = pca_method, pcg_rand = pcg_rand, fast_sgd = fast_sgd,
ret_model = ret_model || "model" %in% ret_extra, ret_nn = ret_nn ||
"nn" %in% ret_extra, ret_fgraph = "fgraph" %in% ret_extra,
batch = batch, opt_args = opt_args, epoch_callback = epoch_callback,
tmpdir = tempdir(), verbose = verbose)
2: uwot::umap(as.matrix(preprocess_mat), n_components = max_components,
metric = umap.metric, min_dist = umap.min_dist, n_neighbors = umap.n_neighbors,
fast_sgd = umap.fast_sgd, n_threads = cores, verbose = verbose,
nn_method = umap.nn_method, ...)
1: find_gene_modules(cds[pr_deg_ids, ], reduction_method = "UMAP",
max_components = 2, umap.metric = "cosine", umap.min_dist = 0.1,
umap.n_neighbors = 15L, umap.fast_sgd = FALSE, umap.nn_method = "annoy",
k = 20, leiden_iter = 1, partition_qval = 0.05, weight = FALSE,
resolution = 0.001, random_seed = 0L, cores = 1, verbose = T)
I really have no idea what I am doing wrong or how to proceed from here. Does anyone with experience with uwot know where my error is coming from? Really appreciate the help!
I am currently running an ensemble niche model analyses through a Linux cluster in a CentOs6 environment. The package I am using is SSDM. My code is as follows:
Env <- load_var(path = getwd(), files = NULL, format = c(".grd", ".tif", ".asc",
".sdat", ".rst", ".nc", ".envi", ".bil", ".img"), categorical = "af_anthrome.asc",
Norm = TRUE, tmp = TRUE, verbose = TRUE, GUI = FALSE)
Env
head(Env)
warnings()
Occurrences <- load_occ(path = getwd(), Env, file =
"Final_African_Bird_occurrence_rarefied_points.txt",
Xcol = "decimallon", Ycol = "decimallat", Spcol =
"species", GeoRes = FALSE,
sep = ",", verbose = TRUE, GUI = FALSE)
head(Occurrences)
warnings()
SSDM <- stack_modelling(c("GLM", "GAM", "MARS", "GBM", "RF", "CTA",
"MAXENT", "ANN", "SVM"), Occurrences, Env, Xcol = "decimallon",
Ycol = "decimallat", Pcol = NULL, Spcol = "species", rep
= 1,
name = "Stack", save = TRUE, path = getwd(), PA = NULL,
cv = "holdout", cv.param = c(0.75, 1), thresh = 1001,
axes.metric = "Pearson", uncertainty = TRUE, tmp = TRUE,
ensemble.metric = c("AUC", "Kappa", "sensitivity", "specificity"), ensemble.thresh = c(0.75, 0.75, 0.75, 0.75), weight = TRUE,
method = "bSSDM", metric = "SES", range = NULL,
endemism = NULL, verbose = TRUE, GUI = FALSE, cores = 125)
save.stack(SSDM, name = "Bird", path = getwd(),
verbose = TRUE, GUI = FALSE)
When running the stack_modelling function I get this Error message:
Error in checkForRemoteErrors(val) :
125 nodes produced errors; first error: comparison of these types is not
implemented
Calls: stack_modelling ... clusterApply -> staticClusterApply ->
checkForRemoteErrors
In addition: Warning message:
In stack_modelling(c("GLM", "GAM", "MARS", "GBM", "RF", "CTA", "MAXENT", :
It seems you attributed more cores than your CPU have !
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode ->
unserialize
In addition: Warning message:
In eval(e, x, parent.frame()) :
Incompatible methods ("Ops.data.frame", "Ops.factor") for "=="
Execution halted
I understand that I may have attributed more cores than I have access to but this same error message crops up when I use a fraction of the cores. I am not entirely sure what this error message is trying to tell me or how to fix it as I am new to working on a cluster. Is it a problem with parallel processing of the data? Is there a line of code which can help me fix this issue?
Thanks