RevoScaleR Save reportProgress Output - r

I am using RevoScaleR rxDataStep function in RStudio to write data to a SQL table. The process works fine yet I am unable to save the output from the reportProgress argument when set to any value, 2 or 3. The results show up in the Console but I can't figure out where the objects exist so I can save them for future use. How can I save the progress outputs?
out <- rxDataStep(inData = data, outFile = obj, varsToKeep = NULL, varsToDrop = NULL,
rowSelection = NULL, transforms = NULL, transformObjects = NULL,
transformFunc = NULL, transformVars = NULL,
transformPackages = NULL, transformEnvir = NULL,
append = "rows", overwrite = TRUE, rowVarName = NULL,
removeMissingsOnRead = FALSE, removeMissings = FALSE,
computeLowHigh = TRUE, maxRowsByCols = 50000000,
rowsPerRead = -1, startRow = 1, numRows = -1,
returnTransformObjects = F,
blocksPerRead = 10000,
reportProgress = 2)
This is what shows up in console:
Rows Processed: 100Total Rows written: 100, Total time: 0.384
Maybe this is a more general R console question, how to save any message from the console which is not available otherwise

Related

Add a title to gprofiler2 gosttable in R

Hello I'm a new bioinformatician so bear with me please!
I'm using the gprofiler2 to run GO/KEGG analysis in R using emacs/ess and I want to add a title to the table it offers:
publish_gosttable(gostres, highlight_terms = gostres$result[c(1:2,10,120),],
use_colors = TRUE,
show_columns = c("source", "term_name", "term_size", "intersection_size"),
filename = NULL)
I have tried the title(), tab_header() function but I can't seem to be able to add a title. My question is if there is some other function or package that would allow me to add it instead of having to do it manually.
The code so far
GOresult <- gost(
geneid1up$gene,
organism = "hsapiens",
ordered_query = FALSE,
multi_query = FALSE,
significant = TRUE,
exclude_iea = FALSE,
measure_underrepresentation = FALSE,
evcodes = FALSE,
user_threshold = 0.05,
correction_method = "gSCS",
domain_scope = "annotated",
custom_bg = NULL,
numeric_ns = "",
sources = c("GO:BP","GO:MF","GO:CC","KEGG"),
as_short_link = FALSE)
GOresult1 <- as.data.frame(GOresult$result)
GOresult1$minuslog10pval <- -log10(GOresult1$p_value)
names(GOresult1)[15] <- "-log10(pval)"
GOresult2 <- GOresult1[order(GOresult1$p_value, decreasing=F),]
plot1 <- publish_gosttable(GOresult2, highlight_terms = GOresult2[c(1:20),],
use_colors = FALSE,
show_columns = c("source", "term_name", "term_size", "intersection_size","-log10(pval)"),
filename = NULL)
Does this do the job?
library(gprofiler2)
library(ggplot2)
gostres <- gost(query = c("X:1000:1000000", "rs17396340", "GO:0005005", "ENSG00000156103", "NLRP1"),
organism = "hsapiens", ordered_query = FALSE,
multi_query = FALSE, significant = TRUE, exclude_iea = FALSE,
measure_underrepresentation = FALSE, evcodes = FALSE,
user_threshold = 0.05, correction_method = "g_SCS",
domain_scope = "annotated", custom_bg = NULL,
numeric_ns = "", sources = NULL, as_short_link = FALSE)
publish_gosttable(gostres, highlight_terms = gostres$result[c(1:2,10,120),],
use_colors = TRUE,
show_columns = c("source", "term_name", "term_size", "intersection_size"),
filename = NULL)+
ggtitle('Your Title')
Result:
The trick is that the plot is a ggplot objet. Therefore you can add the title using +ggtitle('Your Title') after your plot code (as in my example)

uwot is throwing an error running the Monocle3 R package's "find_gene_module()" function, likely as an issue with how my data is formatted

I am trying to run the Monocle3 function find_gene_modules() on a cell_data_set (cds) but am getting a variety of errors in this. I have not had any other issues before this. I am working with an imported Seurat object. My first error came back stating that the number of rows were not the same between my cds and cds#preprocess_aux$gene_loadings values. I took a look and it seems my gene loadings were a list under cds#preprocess_aux#listData$gene_loadings. I then ran the following code to make a dataframe version of the gene loadings:
test <- seurat#assays$RNA#counts#Dimnames[[1]]
test <- as.data.frame(test)
cds#preprocess_aux$gene_loadings <- test
rownames(cds#preprocess_aux$gene_loadings) <- cds#preprocess_aux$gene_loadings[,1]
Which created a cds#preprocess_aux$gene_loadings dataframe with the same number of rows and row names as my cds. This resolved my original error but now led to a new error being thrown from uwot as:
15:34:02 UMAP embedding parameters a = 1.577 b = 0.8951
Error in uwot(X = X, n_neighbors = n_neighbors, n_components = n_components, :
No numeric columns found
Running traceback() produces the following information.
> traceback()
4: stop("No numeric columns found")
3: uwot(X = X, n_neighbors = n_neighbors, n_components = n_components,
metric = metric, n_epochs = n_epochs, alpha = learning_rate,
scale = scale, init = init, init_sdev = init_sdev, spread = spread,
min_dist = min_dist, set_op_mix_ratio = set_op_mix_ratio,
local_connectivity = local_connectivity, bandwidth = bandwidth,
gamma = repulsion_strength, negative_sample_rate = negative_sample_rate,
a = a, b = b, nn_method = nn_method, n_trees = n_trees, search_k = search_k,
method = "umap", approx_pow = approx_pow, n_threads = n_threads,
n_sgd_threads = n_sgd_threads, grain_size = grain_size, y = y,
target_n_neighbors = target_n_neighbors, target_weight = target_weight,
target_metric = target_metric, pca = pca, pca_center = pca_center,
pca_method = pca_method, pcg_rand = pcg_rand, fast_sgd = fast_sgd,
ret_model = ret_model || "model" %in% ret_extra, ret_nn = ret_nn ||
"nn" %in% ret_extra, ret_fgraph = "fgraph" %in% ret_extra,
batch = batch, opt_args = opt_args, epoch_callback = epoch_callback,
tmpdir = tempdir(), verbose = verbose)
2: uwot::umap(as.matrix(preprocess_mat), n_components = max_components,
metric = umap.metric, min_dist = umap.min_dist, n_neighbors = umap.n_neighbors,
fast_sgd = umap.fast_sgd, n_threads = cores, verbose = verbose,
nn_method = umap.nn_method, ...)
1: find_gene_modules(cds[pr_deg_ids, ], reduction_method = "UMAP",
max_components = 2, umap.metric = "cosine", umap.min_dist = 0.1,
umap.n_neighbors = 15L, umap.fast_sgd = FALSE, umap.nn_method = "annoy",
k = 20, leiden_iter = 1, partition_qval = 0.05, weight = FALSE,
resolution = 0.001, random_seed = 0L, cores = 1, verbose = T)
I really have no idea what I am doing wrong or how to proceed from here. Does anyone with experience with uwot know where my error is coming from? Really appreciate the help!

Only Table in rpivotTable

I'm using the rpivotTable package in Shiny application and I'd like to have only the choice of 'Table' for the users (no charts)
The RenderName argument is only used to choose the default display...
output$pivot <- renderRpivotTable(
rpivotTable(iris,
rendererName = "Table" )
)
Many thanks in advance !
There are multiple issues here.
you can specify renderers via the anonymos renderers argument in rpivotTable(). I have the JS code form here.
however, there is a bug when only selecting one option. In this case, rpivotTable() wraps the argument in a list again (see the Map() call in the original function code) and the forwarding to JS fails.
Therefore, I accounted for this issue and extended the function a bit. Play around with aggregators/renderers to see how it behaves differently to the original rpivotTable() function.
# define own function
my_rpivotTable <- function (data, rows = NULL, cols = NULL, aggregatorName = NULL,
vals = NULL, rendererName = NULL, sorter = NULL, exclusions = NULL,
inclusions = NULL, locale = "en", subtotals = FALSE, ...,
width = 800, height = 600, elementId = NULL)
{
if (length(intersect(class(data), c("data.frame", "data.table",
"table", "structable", "ftable"))) == 0) {
stop("data should be a data.frame, data.table, or table",
call. = F)
}
if (length(intersect(c("table", "structable", "ftable"),
class(data))) > 0)
data <- as.data.frame(data)
params <- list(rows = rows, cols = cols, aggregatorName = aggregatorName,
vals = vals, rendererName = rendererName, sorter = sorter,
...)
params <- Map(function(p) {
# added to the class check -------------------------------------------------
if (length(p) == 1 && class(p[[1]]) != "JS_EVAL") {
p = list(p)
}
return(p)
}, params)
par <- list(exclusions = exclusions, inclusions = inclusions)
params <- c(params, par)
params <- Filter(Negate(is.null), params)
x <- list(data = data, params = params, locale = locale,
subtotals = subtotals)
htmlwidgets::createWidget(name = "rpivotTable", x, width = width,
height = height, elementId = elementId, package = "rpivotTable")
}
# create the pivot table
my_rpivotTable(
expand.grid(LETTERS, 1:3),
aggregatorName = "Count",
aggregators = list(Sum = htmlwidgets::JS('$.pivotUtilities.aggregators["Sum"]'),
Count = htmlwidgets::JS('$.pivotUtilities.aggregators["Count"]')),
rendererName = "fancyTable",
renderers = list(fancyTable = htmlwidgets::JS('$.pivotUtilities.renderers["Table"]'))
)

Unable to start Selenium session in R

I've been trying to set up a Selenium session with Rwebdriver, but so far with no success. I believe I have initiated a Selenium server, but whenever I run
`> Rwebdriver::start_session(root = "http://localhost:4444/wd/hub/")`
I get the following error:
Error in serverDetails$value[[1]] : subscript out of bounds
After I set options(error = recover) to run a debug, I get the following result:
function (root = NULL, browser = "firefox", javascriptEnabled = TRUE,
takesScreenshot = TRUE, handlesAlerts = TRUE, databaseEnabled = TRUE,
cssSelectorsEnabled = TRUE)
{
server <- list(desiredCapabilities = list(browserName = browser,
javascriptEnabled = javascriptEnabled, takesScreenshot = takesScreenshot,
handlesAlerts = handlesAlerts, databaseEnabled = databaseEnabled,
cssSelectorsEnabled = cssSelectorsEnabled))
newSession <- getURL(paste0(root, "session"), customrequest = "POST",
httpheader = c(`Content-Type` = "application/json;charset=UTF-8"),
postfields = toJSON(server))
serverDetails <- fromJSON(rawToChar(getURLContent(paste0(root,
"sessions"), binary = TRUE)))
sessionList <- list(time = Sys.time(), sessionURL = paste0(root,
"session/", serverDetails$value[[1]]$id))
class(sessionList) <- "RSelenium"
print("Started new session. sessionList created.")
seleniumSession <<- sessionList
}
The issue is at paste0(root, "session/", serverDetails$value[[1]]$id)), and true enough, whenever I print serverDetails$value, all that appears is list(). Does anyone know how to fix the issue? Thank you for your attention and patience in advance.

Error in socketConnection in SSDM (from R)

I am currently running a stacked species distribution model through a linux cluster using the following code:
library(SSDM)
setwd("/home/nikhail1")
Env <- load_var(path = getwd(), files = NULL, format = c(".grd", ".tif",
".asc",
".sdat", ".rst", ".nc",
".envi", ".bil", ".img"), categorical = "af_anthrome.asc",
Norm = TRUE, tmp = TRUE, verbose = TRUE, GUI = FALSE)
Occurrences <- load_occ(path = getwd(), Env, file =
"Final_African_Bird_occurrence_rarefied_points.csv",
Xcol = "decimallon", Ycol = "decimallat", Spcol =
"species", GeoRes = FALSE,
sep = ",", verbose = TRUE, GUI = FALSE)
head(Occurrences)
warnings()
SSDM <- stack_modelling(c("GLM", "GAM", "MARS", "GBM", "RF", "CTA",
"MAXENT", "ANN", "SVM"), Occurrences, Env, Xcol = "decimallon",
Ycol = "decimallat", Pcol = NULL, Spcol = "species", rep = 1,
name = "Stack", save = TRUE, path = getwd(), PA = NULL,
cv = "holdout", cv.param = c(0.75, 1), thresh = 1001,
axes.metric = "Pearson", uncertainty = TRUE, tmp = TRUE,
ensemble.metric = c("AUC", "Kappa", "sensitivity",
"specificity"), ensemble.thresh = c(0.75, 0.75, 0.75, 0.75), weight = TRUE,
method = "bSSDM", metric = "SES", range = NULL,
endemism = NULL, verbose = TRUE, GUI = FALSE, cores = 200)
save.stack(SSDM, name = "Bird", path = getwd(),
verbose = TRUE, GUI = FALSE)
I receive the following error message when trying to run my analyses:
Error in socketConnection("localhost", port = port, server = TRUE, blocking
= TRUE, :
all connections are in use
Calls: stack_modelling ... makePSOCKcluster -> newPSOCKnode ->
socketConnection
How do i increase the maximum number of connections? Can i do this within the SSDM package as parallel is built in. Do I have to apply a specific function from another package to ensure that my job runs smoothly across clusters?
Thank you for you help,
Nikhail
The maximum number of open connections you can have in R is 125. To increase the number of connections you can have open at the same time, you need to rebuild R from source. See
https://github.com/HenrikBengtsson/Wishlist-for-R/issues/28

Resources