Problem in plotCountDepth R function. How to solve it? - r

I'm dealing with a data frame that I called GBM, which contains single-cell measurements. I'm relying on the SCnorm package to handle the normalization and to run a preliminary check of my data, using the plotCountDepth function.
This is my pipeline :
sce <- SingleCellExperiment::SingleCellExperiment(assays = list('counts' = GBM))
sce <- plotCountDepth(Data = sce,
Conditions = Label,
FilterCellProportion = .1,
NCores = 3)
I do not really understand why I keep getting this error:
Error in colSums(Data[, which(Conditions == Levels[x])]) :
'x' must be an array of at least two dimensions
even though I'm applying the same criteria I find in the Bioconductor documentation.
For more information: Label is a vector of the same dimension as GBM, which is a G x S matrix, and it contains a series of labels to distinguish each cell group.
Thank you in advance.
PS: GBM is a matrix whose columns are named after the various cells, while the rows are of course the genes.

As the vignette states:
Data: can be a matrix of single-cell expression with cells where rows
are genes and columns are samples. Gene names should not be a column
in this matrix, but should be assigned to rownames(Data).
Below I provide a minimal working example; I suggest you check whether you specified the rownames correctly:
library(SingleCellExperiment)
library(SCnorm)
GBM = matrix(rpois(10000,20),ncol=50)
rownames(GBM) = paste0("Gene",1:200)
colnames(GBM) = paste0("Sample",1:50)
Label=rep(c("X","Y"),each=25)
sce <- SingleCellExperiment(assays = list('counts' = GBM))
The function works, but it is not very well written: it prints the ggplot object and gives you no way of storing it:
plt <- plotCountDepth(Data = sce,Conditions = Label,
FilterCellProportion = .1,NCores = 3)
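The error in the question comes from colSums receiving something that is no longer a matrix. A minimal base R sketch of one way this can happen, assuming a condition label that matches only a single column (the toy matrix and labels are made up, and this is not SCnorm's own code):

```r
# Toy count matrix: 4 genes x 5 cells (made-up values).
GBM <- matrix(1:20, nrow = 4,
              dimnames = list(paste0("Gene", 1:4), paste0("Cell", 1:5)))

# Hypothetical labels: condition "X" has only one cell.
Label <- c("X", "Y", "Y", "Y", "Y")

# Sanity checks worth running before plotCountDepth:
labels_match_columns <- length(Label) == ncol(GBM)  # one label per cell?
group_sizes <- table(Label)                         # cells per condition

# Selecting a single column silently drops the matrix to a plain vector ...
one_cell <- GBM[, which(Label == "X")]
dropped <- is.null(dim(one_cell))

# ... and colSums() refuses anything with fewer than two dimensions.
res <- tryCatch(colSums(one_cell), error = function(e) conditionMessage(e))

# With drop = FALSE the selection stays a one-column matrix.
kept <- GBM[, which(Label == "X"), drop = FALSE]
col_total <- colSums(kept)
```

If group_sizes shows a condition with a single cell, or labels_match_columns is FALSE, the Conditions vector is the first thing to fix.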

Related

Error when creating a data frame: only 0's may be mixed with negative subscripts

I have the following variables: CFNAIdiff (first differenced), and HOUSTgr, INDPROgr, UMCSENTgr, and UNRATEgr (which are growth rates). I want to build an AR model and I am trying to construct a data frame in the following way:
dataframe <- data.frame(y = INDPROgr[2:T], INDPROgr = INDPROgr[1:(T-1)],
HOUSTgr = HOUSTgr[1:(T-1)], UMCSENTgr = UMCSENTgr[1:(T-1)],
UNRATEgr = UNRATEgr[1:(T-1)], CFNAIdiff = CFNAIdiff[1:(T-1)])
However, I encounter the following problem:
Error in INDPROgr[1:(T - 1)] :
only 0's may be mixed with negative subscripts
What am I specifying wrong?
The error says that you are trying to subset with both positive and negative indices at the same time. Let's make a simple example:
dat <- data.frame(A = LETTERS[1:10], B = 1:10)
We can subset the data.frame in this example using standard methods as you are doing in your own code
dat[0:3,]
which returns the first 3 rows. The index 0 selects nothing, so on its own it returns a data frame with zero rows (which is different from a row of NAs):
dat[0,]
Now suppose we subset by a variable, say T, and by mistake it is 0 or negative. If we also ask for specific rows with positive indices, we get an error. R does this to avoid ambiguous requests such as
dat[c(-1,1),]
which is technically asking for the entire data frame minus the first row, but also including the first row, equivalent to rbind(dat[-1,], dat[1,]).
So if we have a function or script that subsets the way your script does
dataframe<- data.frame( y = INDPROgr[2:T],
INDPROgr = INDPROgr[1:(T-1)],
HOUSTgr = HOUSTgr[1:(T-1)],
UMCSENTgr = UMCSENTgr[1:(T-1)],
UNRATEgr = UNRATEgr[1:(T-1)],
CFNAIdiff = CFNAIdiff[1:(T-1)])
R will return an error if T is 0, because then T-1 = -1 and you are subsetting with 1:(-1), which is c(1, 0, -1). The same happens if T itself is negative.
I therefore suggest checking whether T becomes negative or zero somewhere in your code.
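The failure can be reproduced on a plain vector. The sketch below uses a hypothetical variable n_obs in place of T (T is also R's built-in shorthand for TRUE, so a separate name is easier to reason about), and lagged_index is an illustrative helper, not part of any package:

```r
x <- c(10, 20, 30, 40, 50)

# Hypothetical stand-in for T that has wrongly ended up as 0.
n_obs <- 0
idx <- 1:(n_obs - 1)   # counts downwards: 1, 0, -1

# Mixing the positive index 1 with the negative index -1 is an error.
res <- tryCatch(x[idx], error = function(e) e)
failed <- inherits(res, "error")

# A small guard that fails early with a clear message instead.
lagged_index <- function(n) {
  stopifnot(is.numeric(n), length(n) == 1, n >= 2)
  seq_len(n - 1)
}

ok_idx <- lagged_index(length(x))  # 1 2 3 4
lagged <- x[ok_idx]
```

Deriving the length from the data itself, for example with length(INDPROgr), instead of relying on a separately maintained T, is one way to keep the two from drifting apart.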

Storing p-values from multiple 2 sample t-tests in R

Good evening,
I'm working on a class project and I am trying to run multiple unpaired 2 sample t-tests and then store their p-values so that I can work with just the p-values later.
Below is the code I have been trying:
pVals_1Beta <-vector("numeric", length = nrow(group1_Y_Beta))
for (i in 1:nrow(group1_Y_Beta)) {
pVals_1Beta[i] <- t.test(x = group1_Y_Beta$values[i,],
y = group1_N_Beta$values[i,],
paired = FALSE,
var.equal =FALSE,
conf.level = 0.95)$p.value
}
where group1_Y_Beta and group1_N_Beta each have two columns (values and ind) and about 110312 rows. I want to run an unpaired t-test comparing the two groups' values and store all 110312 p-values. When I try running this I get:
Error in group1_Y_Beta$values[i, ] : incorrect number of dimensions
Any help on how to tweak my code to get it to work would be greatly appreciated.
Thanks, Liz
Since group1_N_Beta and group1_Y_Beta are 2D objects, you need both (1) a row and (2) a column identifier to obtain a specific cell's value. But since you have already selected the column by name with the $ notation, the result is a one-dimensional vector, so you only need to provide one number (or a vector of numbers) to complete the query. Replace [i,] ("ith row, all columns") with [i].
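A small sketch of the indexing difference, using a made-up three-row data frame with the same column names as in the question:

```r
# Made-up stand-in for group1_Y_Beta: two columns, values and ind.
df <- data.frame(values = c(0.10, 0.40, 0.35),
                 ind    = factor(c("s1", "s1", "s2")))

# `$` returns a plain vector, which has no dim attribute.
v <- df$values
is_vector <- is.null(dim(v))

# Two-index subsetting on a vector is what raises the error.
res <- tryCatch(v[2, ], error = function(e) conditionMessage(e))

# Either give one index to the vector, or two indices to the data frame.
by_vector <- v[2]
by_frame  <- df[2, "values"]
```

Note that this only removes the indexing error. t.test needs more than one observation per group, so passing a single value as x and as y will still fail for a different reason; the loop would have to hand t.test a vector of values for each group.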

Can't get 'plotweb' in the bipartite package to work (R)

I am trying to visualise a bipartite network using the bipartite package in R. My data consists of 4 columns in a spreadsheet. The columns contain 1) plant species names, 2) bee species names, 3) site, and 4) interaction frequency. I first read the data into R from a CSV file, then convert it to a web using the helper function frame2webs. When I then try to visualise the network with plotweb() I get the error message:
Error in web[rind, cind, drop = FALSE] : incorrect number of dimensions
My code looks like this:
library(bipartite)
bee <- read.csv('TestFile.csv')
bees <- as.data.frame(bee)
BeeWeb <- frame2webs(bees, type.out = "array")
plotweb(BeeWeb)
I've also tried:
BeeWeb <- frame2webs(bees,
varnames = c("higher","lower","webID","freq"),
type.out = "array")
Please help! I am new to R and am struggling to make this work. Cheers!
Not sure what your data look like, but this happens to me when I have a single factor level in either the "higher" or "lower" column, type.out is "list", and emptylist is TRUE.
This is due to a problem in empty, a function that frame2webs only calls when type.out is "list" and emptylist is TRUE. empty finds the dimensions of your data using NROW and NCOL, which interpret a single row of input as a vertical vector. When there's only one factor level in "lower" or "higher", the input to empty is a one-row array. empty interprets this row as a column, hence the 'incorrect number of dimensions' error.
Two simple workarounds:
Set type.out to "array"
Set emptylist to FALSE
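The NROW/NCOL behaviour described above can be seen in base R alone. This sketch assumes the one-row web has lost its dim attribute by the time it is measured, which is an assumption about what happens inside bipartite rather than something taken from its source; the web values are made up:

```r
# A web with a single "lower" level: one row, three columns (made-up counts).
web <- matrix(c(3, 0, 5), nrow = 1,
              dimnames = list("PlantA", c("Bee1", "Bee2", "Bee3")))

# Extracting the only row drops it to a plain vector.
flat <- web[1, ]

# NROW/NCOL treat a vector as a single column, so the row is read as 3 x 1.
n_rows <- NROW(flat)
n_cols <- NCOL(flat)

# Two-index subsetting on that vector then fails.
res <- tryCatch(flat[1, 1:3, drop = FALSE],
                error = function(e) conditionMessage(e))

# Keeping the dimensions avoids the problem.
kept <- web[1, , drop = FALSE]
```

Checking dim() on the object passed to plotweb, and confirming that both the "higher" and "lower" columns have more than one level, is a quick way to see whether this case applies.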

xgb.DMatrix Error: The length of labels must equal to the number of rows in the input data

I am using xgboost in R.
I created the xgb matrix fine using a matrix as input, but when I reduce the number of columns in the matrix data, I receive an error.
This works:
> dim(ctt1)
[1] 6401 5901
> xgbmat1 <- xgb.DMatrix(
Matrix(data.matrix(ctt1)),
label = as.matrix(as.numeric(data$V2)) - 1
)
This does not:
> dim(ctt1[,nr])
[1] 6401 1048
xgbmat1 <- xgb.DMatrix(
Matrix(data.matrix(ctt1[,nr])),
label = as.matrix(as.numeric(data$V2)) - 1)
Error in xgb.setinfo(dmat, names(p), p[[1]]) :
The length of labels must equal to the number of rows in the input data
In my case I fixed this error by changing the assignment of the labels:
labels <- df_train$target_feature
It turns out that after removing some columns, some rows contain only 0s and cannot contribute to the model.
For sparse matrices, the xgboost R interface uses the CSC format creation method. The problem currently is that this method automatically determines the number of rows from the existing non-sparse values, and any completely sparse rows at the end are not counted. A similar loss of completely sparse columns at the end can happen with the CSR sparse format. For more details see xgboost issue #1223 and also Wikipedia on the sparse matrix formats.
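To check whether this applies to your data, you can look for rows that become entirely zero after the column subset. A base R sketch with a made-up matrix and a hypothetical column selection nr (it does not call xgboost):

```r
# Made-up 4 x 3 matrix; the last row is non-zero only in column 3.
ctt <- matrix(c(1, 0, 0,
                0, 2, 0,
                4, 0, 1,
                0, 0, 3),
              nrow = 4, byrow = TRUE)
labels <- c(0, 1, 1, 0)

# Hypothetical column selection that removes column 3.
nr <- c(1, 2)
sub <- ctt[, nr, drop = FALSE]

# Rows with no non-zero entry left after the subset.
zero_rows <- which(rowSums(sub != 0) == 0)

# Trailing all-zero rows are the ones that can be lost in a sparse conversion.
trailing_zero <- length(zero_rows) > 0 && max(zero_rows) == nrow(sub)

# The dense matrix itself still has as many rows as there are labels.
same_length <- nrow(sub) == length(labels)
```

If trailing_zero is TRUE, passing a dense matrix, or dropping those rows together with their labels, are two options to consider.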
The proper way to create the DMatrix is like this:
xgtrain <- xgb.DMatrix(data = as.matrix(X_train[,-5]), label = X_train$item_cnt_month)
Drop the label column from the data argument and take the label from the same data set. In my case item_cnt_month is column five, so I drop it when building the matrix and refer to the same data set for the label.
Before splitting your data, you need to turn it into a data frame.
For example:
data <- read.csv(...)
data = as.data.frame(data)
Now you can set up your train and test data for use in sparse.model.matrix and xgb.DMatrix.

simba-package R: Comparing mean similarity between subsets of data with missing values

I am trying to compare mean similarity between 3 subsets of data using the com.sim function (simba-package), but I’m having trouble getting the function to ignore missing values and correctly run the analysis.
Some background on my data and what I’ve done so far: My data is binary, but unlike the kinds of data for which the function is written, I am working with skeletal remains, which are typically incomplete and fragmented. Thus, ~10% of my data matrix has missing values.
When I run this command in R
com.sim(mydata, subs, simil = "jaccard", binary = TRUE, permutations = 1000, alpha = 0.05, bonfc = TRUE)
I get the following error message:
Error in diffmean(as.numeric(sim(veg[subs == (comb[x, 1]), ], method = simil)), :
There are NA values. Consider setting na.rm accordingly
I subsequently modified the code of the function to the following (modification in bold):
if (binary) {
tmp <- lapply(c(1:nrow(comb)), function(x) diffmean(as.numeric(sim(veg[subs ==
(comb[x, 1]), ], method = simil,)), as.numeric(sim(veg[subs ==
(comb[x, 2]), ], method = simil, )), na.rm = TRUE))
Now, the function runs, but it is excluding all cases with at least 1 missing value (which is nearly half the data set!!). It seems that it is deleting cases w/ NA listwise, whereas I’d prefer pairwise deletion so that similarity coefficients can still be calculated between cases with missing values (but just excluding the variables with NA from the calculation). Is there any way to accomplish this within com.sim? I know other functions such as simil (proxy-package) can handle missing values when calculating a matrix of Jaccard coefficients, but it seems that the sim functions in simba weren’t built this way.
I have zero coding experience (is it obvious?), so I would appreciate any help or advice on options to pursue!
Thank you very much, and please let me know if I can provide additional information.
Best,
Matt
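For reference, the pairwise deletion described in the question can be sketched in base R. This is only an illustration of the idea for a single pair of cases, with a hypothetical helper jaccard_pairwise and made-up data; it is not part of simba and does not plug into com.sim:

```r
# Jaccard similarity between two binary vectors, using only the
# variables that are observed (non-NA) in both cases.
jaccard_pairwise <- function(x, y) {
  ok <- !is.na(x) & !is.na(y)
  both   <- sum(x[ok] == 1 & y[ok] == 1)  # present in both cases
  either <- sum(x[ok] == 1 | y[ok] == 1)  # present in at least one case
  if (either == 0) NA_real_ else both / either
}

# Two made-up cases with missing values in different variables.
case_a <- c(1, 0, NA, 1, 1)
case_b <- c(1, 1, 1, NA, 0)

sim_ab <- jaccard_pairwise(case_a, case_b)  # uses variables 1, 2 and 5 only
```

Here variables 3 and 4 are skipped for this pair only, so both cases still contribute to the comparison instead of being removed from the data set.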
