Warning message from ggfortify::ggbiplot - r

I am trying to use ggbiplot from the ggfortify package. It seems to work fine, but I get the following warning messages:
mdl <- pls::plsr(mpg ~ ., data = mtcars, scale = T)
scrs <- data.frame(pls::scores(mdl)[])
loads <- data.frame(pls::loadings(mdl)[])
ggfortify::ggbiplot(scrs, loads,
label.label = rownames(scrs), asp = 1, label = T, label.size = 3,
loadings = T, loadings.label = T, loadings.label.label = rownames(loads))
Warning messages:
1: In if (value %in% columns) { :
the condition has length > 1 and only the first element will be used
2: In if (value %in% columns) { :
the condition has length > 1 and only the first element will be used
Have I taken a wrong step, or is this a bug?

According to the ggbiplot documentation, the label.label= parameter expects the name of the column from which to pull the labels; it does not expect a vector of labels. The same goes for loadings.label.label=. (ggplot and most tidyverse functions don't handle rownames very well, so it is better to make them a proper column.)
scrs$ID <- rownames(scrs)
loads$ID <- rownames(loads)
ggfortify::ggbiplot(scrs, loads,
label.label = "ID", asp = 1, label = T, label.size = 3,
loadings = T, loadings.label = T, loadings.label.label = "ID")
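The warning itself comes from an if() condition receiving a vector instead of a single TRUE/FALSE. A minimal base R sketch of that mechanism, assuming the check inside ggbiplot resembles the value %in% columns expression quoted in the warning (the variable names below are illustrative, not ggfortify internals):

```r
# Illustrative only: mimics the shape of the check named in the warning.
columns <- colnames(mtcars)

# Passing a vector of labels yields one TRUE/FALSE per label ...
value_vec <- rownames(mtcars)
cond_vec <- value_vec %in% columns
length(cond_vec)   # one element per row of mtcars

# ... whereas a single column name yields a length-one condition, as if() expects.
mtcars_id <- mtcars
mtcars_id$ID <- rownames(mtcars_id)
cond_one <- "ID" %in% colnames(mtcars_id)
length(cond_one)
```

From R 4.2.0 onward, a condition of length greater than one in if() is an error rather than a warning, so the original call may fail outright on newer R versions.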


uwot throws an error when running the Monocle3 R package's find_gene_modules() function, likely because of how my data is formatted

I am trying to run the Monocle3 function find_gene_modules() on a cell_data_set (cds) but am getting a variety of errors in this. I have not had any other issues before this. I am working with an imported Seurat object. My first error came back stating that the number of rows were not the same between my cds and cds@preprocess_aux$gene_loadings values. I took a look and it seems my gene loadings were a list under cds@preprocess_aux@listData$gene_loadings. I then ran the following code to make a dataframe version of the gene loadings:
test <- seurat@assays$RNA@counts@Dimnames[[1]]
test <- as.data.frame(test)
cds@preprocess_aux$gene_loadings <- test
rownames(cds@preprocess_aux$gene_loadings) <- cds@preprocess_aux$gene_loadings[,1]
Which created a cds@preprocess_aux$gene_loadings dataframe with the same number of rows and row names as my cds. This resolved my original error but now led to a new error being thrown from uwot as:
15:34:02 UMAP embedding parameters a = 1.577 b = 0.8951
Error in uwot(X = X, n_neighbors = n_neighbors, n_components = n_components, :
No numeric columns found
Running traceback() produces the following information.
> traceback()
4: stop("No numeric columns found")
3: uwot(X = X, n_neighbors = n_neighbors, n_components = n_components,
metric = metric, n_epochs = n_epochs, alpha = learning_rate,
scale = scale, init = init, init_sdev = init_sdev, spread = spread,
min_dist = min_dist, set_op_mix_ratio = set_op_mix_ratio,
local_connectivity = local_connectivity, bandwidth = bandwidth,
gamma = repulsion_strength, negative_sample_rate = negative_sample_rate,
a = a, b = b, nn_method = nn_method, n_trees = n_trees, search_k = search_k,
method = "umap", approx_pow = approx_pow, n_threads = n_threads,
n_sgd_threads = n_sgd_threads, grain_size = grain_size, y = y,
target_n_neighbors = target_n_neighbors, target_weight = target_weight,
target_metric = target_metric, pca = pca, pca_center = pca_center,
pca_method = pca_method, pcg_rand = pcg_rand, fast_sgd = fast_sgd,
ret_model = ret_model || "model" %in% ret_extra, ret_nn = ret_nn ||
"nn" %in% ret_extra, ret_fgraph = "fgraph" %in% ret_extra,
batch = batch, opt_args = opt_args, epoch_callback = epoch_callback,
tmpdir = tempdir(), verbose = verbose)
2: uwot::umap(as.matrix(preprocess_mat), n_components = max_components,
metric = umap.metric, min_dist = umap.min_dist, n_neighbors = umap.n_neighbors,
fast_sgd = umap.fast_sgd, n_threads = cores, verbose = verbose,
nn_method = umap.nn_method, ...)
1: find_gene_modules(cds[pr_deg_ids, ], reduction_method = "UMAP",
max_components = 2, umap.metric = "cosine", umap.min_dist = 0.1,
umap.n_neighbors = 15L, umap.fast_sgd = FALSE, umap.nn_method = "annoy",
k = 20, leiden_iter = 1, partition_qval = 0.05, weight = FALSE,
resolution = 0.001, random_seed = 0L, cores = 1, verbose = T)
I really have no idea what I am doing wrong or how to proceed from here. Does anyone with experience with uwot know where my error is coming from? Really appreciate the help!
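One reading of the traceback, offered as an assumption rather than a confirmed diagnosis: the replacement gene_loadings data frame built above holds only gene names, so it contains character data and nothing numeric for UMAP to embed. The base R sketch below uses made-up gene names to show what such an object looks like; it does not call Monocle3 or uwot.

```r
# Hypothetical gene names standing in for the Seurat Dimnames[[1]] vector
gene_names <- c("GeneA", "GeneB", "GeneC")

# Same construction as in the question: one column of names, rownames = genes
loadings_df <- as.data.frame(gene_names)
rownames(loadings_df) <- loadings_df[, 1]

# No column is numeric, and the matrix form is not numeric either
numeric_cols <- vapply(loadings_df, is.numeric, logical(1))
any(numeric_cols)
is.numeric(as.matrix(loadings_df))
```

If that is indeed the cause, gene_loadings would need to hold numeric loadings with one row per gene, rather than the gene names themselves.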

Topic label of each document in LDA model using textmineR

I'm using textmineR to fit a LDA model to documents similar to https://cran.r-project.org/web/packages/textmineR/vignettes/c_topic_modeling.html. Is it possible to get the topic label for each document in the data set?
library(textmineR)
data(nih_sample)
# create a document term matrix
dtm <- CreateDtm(doc_vec = nih_sample$ABSTRACT_TEXT,
                 doc_names = nih_sample$APPLICATION_ID,
                 ngram_window = c(1, 2),
                 stopword_vec = c(stopwords::stopwords("en"), stopwords::stopwords(source = "smart")),
                 lower = TRUE, remove_punctuation = TRUE, remove_numbers = TRUE,
                 verbose = FALSE, cpus = 2)
dtm <- dtm[, colSums(dtm) > 2]
set.seed(123)
model <- FitLdaModel(dtm = dtm, k = 20, iterations = 200, burnin = 180,
                     alpha = 0.1, beta = 0.05, optimize_alpha = TRUE,
                     calc_likelihood = TRUE, calc_coherence = TRUE, calc_r2 = TRUE, cpus = 2)
Then I add the labels to the model:
model$labels <- LabelTopics(assignments = model$theta > 0.05, dtm = dtm, M = 1)
Now I want the topic label for each of the 100 documents in nih_sample$ABSTRACT_TEXT.
Are you looking to label each document by the label of its most prevalent topic? If so, this is how you could do it:
# convert labels to a data frame so we can merge
label_df <- data.frame(topic = rownames(model$labels), label = model$labels, stringsAsFactors = FALSE)
# get the top topic for each document
top_topics <- apply(model$theta, 1, function(x) names(x)[which.max(x)][1])
# convert the top topics for each document so we can merge
top_topics <- data.frame(document = names(top_topics), top_topic = top_topics, stringsAsFactors = FALSE)
# merge together. Now each document has a label from its top topic
top_topics <- merge(top_topics, label_df, by.x = "top_topic", by.y = "topic", all.x = TRUE)
This kind of throws away some information that you'd get from LDA though. One advantage of LDA is that each document can have more than one topic. Another is that we can see how much of each topic is in that document. You can see that here with:
# set the plot margins to see the labels on the bottom
par(mar = c(8.1,4.1,4.1,2.1))
# barplot the first document's topic distribution with labels
barplot(model$theta[1,], names.arg = model$labels, las = 2)
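To keep every topic that clears a prevalence cutoff rather than only the top one, something along these lines could work. This is a sketch on a small made-up theta matrix with made-up labels (not output from the model above), and the 0.2 cutoff is an arbitrary assumption.

```r
# Toy document-topic matrix: rows are documents, columns are topics
theta <- matrix(c(0.70, 0.25, 0.05,
                  0.10, 0.45, 0.45),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("doc_1", "doc_2"), c("t_1", "t_2", "t_3")))

# Made-up labels, one per topic
labels <- c(t_1 = "imaging", t_2 = "genetics", t_3 = "clinical trials")

cutoff <- 0.2  # arbitrary threshold for "this topic is present"

# Long table with one row per (document, topic) pair above the cutoff
hits <- which(theta > cutoff, arr.ind = TRUE)
doc_topics <- data.frame(document   = rownames(theta)[hits[, "row"]],
                         topic      = colnames(theta)[hits[, "col"]],
                         prevalence = theta[hits],
                         stringsAsFactors = FALSE)
doc_topics$label <- unname(labels[doc_topics$topic])

# Most prevalent topic first within each document
doc_topics <- doc_topics[order(doc_topics$document, -doc_topics$prevalence), ]
```

With a fitted model, model$theta and the first column of model$labels would take the place of the toy objects.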

Skip empty panel using lattice package, R programming

I want to skip an empty panel using the lattice package in R.
set.seed(1)
df1 <- data.frame("treatment" = c(rep("A",16),rep("B",16),rep("C",16)),
"disease_type" = c(rep("1",8),rep("2",8)),
"days_after_application" = rep(c(rep("10-24",4),rep("24-48",4)),6),
"severity" = rnorm(48, mean = 80, sd = 5))
df1[(df1$disease_type == "2" & df1$days_after_application == "24-48"),"severity"] <- NA
library(lattice)
figure1 <- bwplot(treatment~severity|days_after_application+disease_type,
data = df1,layout = c(2,2),
strip = strip.custom(strip.names = TRUE))
jpeg("figure1.jpeg")
print(figure1)
dev.off()
Here is what I get
My question is: how can I remove/skip the empty panel in the top right WITHOUT changing the layout?
I have tried the following code, but it doesn't work.
figure2 <- bwplot(treatment~severity|days_after_application+disease_type,
data = df1,layout = c(2,2),
strip = strip.custom(strip.names = TRUE),
skip = c(FALSE,FALSE,FALSE,TRUE))
jpeg("figure2.jpeg")
print(figure2)
dev.off()
Here is what I got
I also tried the following code, but it is not what I want, since I do want two-level strips.
df1[(df1$disease_type == "2" & df1$days_after_application == "24-48"),] <- NA
bwplot(treatment~severity|interaction(days_after_application,disease_type),
data = df1,layout = c(2,2),
strip = strip.custom(strip.names = TRUE))
Thank you!
I got help from a professor at Temple University.
Here is his solution:
figure4 <- bwplot(treatment~severity|days_after_application+disease_type,
data = df1,layout = c(2,2),
strip = strip.custom(strip.names = TRUE),
skip = c(FALSE,FALSE,FALSE,TRUE),
scales=list(alternating=FALSE), ## keep x-scale on bottom
between=list(x=1, y=1)) ## space between panels
pdf("figure4%03d.pdf",onefile = FALSE) ## write each page to its own numbered file
print(figure4)
dev.off()

Insert a blank column in dataframe

I would like to insert a blank column in between "Delta = delta" and "Card = vars" in the dataframe below. I would also like to sort the output by the column "Model_Avg_Error" in the dataframe as well.
df = data.frame(Card = vars, Model_Avg_Error = model_error, Forecast = forecasts, Delta = delta, ,Card = vars, Model_Avg_Error = model_error,
Forecast = forecasts, Delta = delta)
# save
write.csv(df, file = file.path(proj_path, "output.csv"), row.names = F)
This was the error received from above:
Error in data.frame(Card = vars, Model_Avg_Error = model_error, Forecast = forecasts, :
argument is missing, with no default
You can add your blank column, re-order, and sort using the code below. Note that the re-ordered and sorted results must be assigned back to df to take effect:
df$blankVar <- NA #blank column
df <- df[c("Card", "blankVar", "Model_Avg_Error", "Forecast", "Delta")] #re-ordering columns by name
df <- df[order(df$Model_Avg_Error),] #sorting by Model_Avg_Error
Here's a general way to add a new, blank column, using add_column() from tibble (called directly, so no pipe operator is needed):
library(tibble)
# Adds after the second column
add_column(iris, new_col = NA, .after = 2)
# Adds after a specific column (in this case, after Sepal.Width)
add_column(iris, new_col = NA, .after = "Sepal.Width")
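Putting the base R steps together on a small made-up data frame (the values below are placeholders, not the asker's data, and the output path is a temporary file chosen for illustration):

```r
# Placeholder data standing in for vars, model_error, forecasts, and delta
df <- data.frame(Card = c("A", "B", "C"),
                 Model_Avg_Error = c(0.30, 0.10, 0.20),
                 Forecast = c(105, 98, 101),
                 Delta = c(5, -2, 1),
                 stringsAsFactors = FALSE)

df$blankVar <- NA                                                        # blank column
df <- df[c("Card", "blankVar", "Model_Avg_Error", "Forecast", "Delta")]  # position it by name
df <- df[order(df$Model_Avg_Error), ]                                    # sort ascending by error

# na = "" writes the blank column as empty cells rather than the text "NA"
out_file <- tempfile(fileext = ".csv")
write.csv(df, file = out_file, row.names = FALSE, na = "")
```

Assigning the result back to df at each step is what makes the re-ordering and sorting stick before the file is written.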

PerformanceAnalytics charts.RollingRegression plots initial window values. How do I make it not do that?

I am using the PerformanceAnalytics package to analyze some monthly returns. The charts.RollingRegression should plot the n-month rolling regression against some benchmark.
The data is just six return series from April 2008 to December 2014, which I am trying to regress against SPY.
indexReturns <- read.table("quantIndices.csv", stringsAsFactors = FALSE, sep = ",", fill = TRUE, row.names = 1, header=TRUE)
hfIndexReturns <- read.table("quantHFIndices.csv", stringsAsFactors = FALSE, sep = ",", fill = TRUE, row.names = 1, header=TRUE)
peerReturns <- read.table("quantPeers.csv", stringsAsFactors = FALSE, sep = ",", fill = TRUE, row.names = 1, header=TRUE)
splits <- as.data.frame(strsplit(rownames(indexReturns), "/"))
rownames(indexReturns) <- unname(sapply(splits, function(x) paste0(x[3], "-", x[1], "-", x[2])))
splits <- as.data.frame(strsplit(rownames(peerReturns), "/"))
rownames(peerReturns) <- unname(sapply(splits, function(x) paste0(x[3], "-", x[1], "-", x[2])))
Ret <- xts(peerReturns, order.by = as.Date(row.names(peerReturns)))
Rb <- xts(indexReturns, order.by = as.Date(row.names(indexReturns)))
charts.RollingRegression(Ret, Rb[,2, drop = FALSE], Rf = 0.001, na.pad = TRUE)
This produces the following chart:
I would like it to omit the "meaningless" first 12 months, but there is no documentation on how to do this, and every other depiction of this chart I can find looks like this:
Looking at the source, in the main meat of the function, I see:
for (column.a in 1:columns.a) {
for (column.b in 1:columns.b) {
merged.assets = merge(Ra.excess[, column.a, drop = FALSE],
Rb.excess[, column.b, drop = FALSE])
if (attribute == "Alpha")
column.result = rollapply(na.omit(merged.assets),
width = width, FUN = function(x) lm(x[, 1,
drop = FALSE] ~ x[, 2, drop = FALSE])$coefficients[1],
by = 1, by.column = FALSE, fill = na.pad, align = "right")
if (attribute == "Beta")
column.result = rollapply(na.omit(merged.assets),
width = width, FUN = function(x) lm(x[, 1,
drop = FALSE] ~ x[, 2, drop = FALSE])$coefficients[2],
by = 1, by.column = FALSE, fill = na.pad, align = "right")
if (attribute == "R-Squared")
column.result = rollapply(na.omit(merged.assets),
width = width, FUN = function(x) summary(lm(x[,
1, drop = FALSE] ~ x[, 2, drop = FALSE]))$r.squared,
by = 1, by.column = FALSE, align = "right")
column.result.tmp = xts(column.result)
colnames(column.result.tmp) = paste(columnnames.a[column.a],
columnnames.b[column.b], sep = " to ")
column.result = xts(column.result.tmp, order.by = time(column.result))
if (column.a == 1 & column.b == 1)
Result.calc = column.result
else Result.calc = merge(Result.calc, column.result)
}
}
And we can see that no fill = na.pad is passed to rollapply in the final "R-Squared" branch, which produces the kind of graph I would expect to see for the first two charts as well. I would like to fix this, but I cannot edit the package code. I tried using assignInNamespace; the call appears to succeed, but the function code in the package does not change. I would also like to remove the leading blank space in the graphs. If you know how to edit this, or know any workarounds, please let me know. (And thanks! You guys are gods!)
OH! And PS - Why the heck is my version of the package seemingly the only one that has this problem??? Why don't my graphs look right by default?
EDIT: This is not the only piece of code from this package which is suspect. I keep having things break and not work as it seems to be documented (Error in R[, nc] - coredata(Rf) : non-numeric argument to binary operator seems to happen about every other function call.) Anyone have any suggestions for better packages for this type stuff?
The subsetting of the data should be done prior to passing to the charts.RollingRegression function. The mighty xts provides this functionality:
charts.RollingRegression(Ret["2009-04::",], Rb["2009-04::",2, drop = FALSE], Rf = 0.001, na.pad = TRUE)
You can read more about how to subset with xts by looking at the help page in R via ?subset.xts.
Let's break this down a bit:
charts.RollingRegression is just a wrapper that calculates the rolling beta and then plots it.
Here is an example with the rolling alpha and beta:
require(PerformanceAnalytics)
data(managers)
capm_xts = xts(matrix(nrow=nrow(managers),ncol=2),order.by=index(managers))
for(i in 12:nrow(managers)){
capm_xts[i,] = coef(lm(managers[(i-11):i,1]~managers[(i-11):i,4]))
}
colnames(capm_xts) = c('alpha','beta')
chart.TimeSeries(capm_xts[12:nrow(capm_xts),2])
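The same idea in base R only, as a sketch on simulated returns (the series, the 12-month width, and all variable names are assumptions for illustration): compute the coefficient once a full window is available, then drop the leading NA values so that a chart would start at the first meaningful estimate.

```r
set.seed(42)
n     <- 36   # number of simulated monthly observations
width <- 12   # rolling window length in months

# Simulated benchmark and asset returns (not real data)
bench <- rnorm(n, mean = 0.005, sd = 0.04)
asset <- 0.002 + 1.2 * bench + rnorm(n, mean = 0, sd = 0.01)

# Rolling beta: NA until a full window of observations exists
beta <- rep(NA_real_, n)
for (i in width:n) {
  idx <- (i - width + 1):i
  beta[i] <- coef(lm(asset[idx] ~ bench[idx]))[2]
}

# Drop the initial window so only meaningful estimates remain
beta_trimmed <- beta[!is.na(beta)]
```

Passing beta_trimmed to plot(..., type = "l") would then draw the series without the empty leading stretch.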
