R: Error in `row.names<-.data.frame`

Following this other question (Get p-value about contrast hypothesis for rectangular matrix), I am trying to run the code below in R, but the line:
colnames(posmat) <- "pos_c1"
causes an error when summary() is called:
Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique value when setting 'row.names': ‘Pos’
Does anybody know why this error comes up?
Here is the MWE:
library(lme4)
library(lmerTest)
library(corpcor)
database <- data.frame(
Clos=factor(c(4,4,1,4,4,3,2,1,2,1,2,2,4,3,1,2,1,4,1,3,2,2,4,4,4,4,2,1,4,2,2,1,4,2,4,2,1,4,4,3)),
Pos=factor(c(2,4,1,2,5,6,7,2,2,2,5,6,3,3,3,8,5,3,4,2,1,4,3,3,2,6,1,8,3,7,5,7,8,3,6,6,1,6,3,7)),
RF=c(8,6,2,9,7,1,7,6,3,4,6,4,5,2,5,5,3,4,1,3,1,2,3,1,2,2,3,1,8,5,2,2,7,1,9,4,5,6,4,2),
Score=c(4,3,3,5,4,3,2,4,5,2,2,3,3,4,4,4,3,2,3,3,5,4,3,4,4,2,3,4,3,4,1,2,2,2,3,4,5,3,1,2)
)
clos_c1 = c(0,0,-1,1)
clos_c2 = c(0,-1,0,1)
clos_c3 = c(-1,0,0,1)
closmat.temp = rbind(constant = 1/4,clos_c1,clos_c2,clos_c3)
closmat = solve(closmat.temp)
closmat = closmat[, -1]
closmat
pos_c1 = c(1/2,1/2,-1/6,-1/6,-1/6,-1/6,-1/6,-1/6)
posmat.temp = rbind(pos_c1)
posmat = pseudoinverse(posmat.temp)
colnames(posmat) <- "pos_c1"
contrasts(database$Clos) = closmat
contrasts(database$Pos) = posmat
model = lmer(Score~Clos+Pos+(1|RF), data = database, REML = TRUE)
summary(model)

The problem is that when you fit the model, the contrast matrix contrasts(database$Pos) has seven columns but only one of them is named. You can see this by printing your model object: six coefficients all show up with the bare name "Pos", and those duplicate names are what breaks summary(). Just add the line
colnames(contrasts(database$Pos)) <- c("pos1","pos2","pos3","pos4","pos5","pos6","pos7")
right after the assignment contrasts(database$Pos) <- posmat and your code will work. Feel free to use whatever column names you require.
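To see what summary() is tripping over, you can inspect the contrast matrix's column names directly; this is just a quick check, assuming the objects from the MWE above are still in your session:
# before the fix: seven columns, but only the first carries a name ("pos_c1");
# the remaining six are empty, which is what produces the duplicate "Pos" terms
colnames(contrasts(database$Pos))
# after renaming, every column has a distinct label
colnames(contrasts(database$Pos)) <- c("pos1","pos2","pos3","pos4","pos5","pos6","pos7")
colnames(contrasts(database$Pos))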
The whole corrected code is then:
library(lme4)
library(lmerTest)
library(corpcor)
database <- data.frame(
Clos=factor(c(4,4,1,4,4,3,2,1,2,1,2,2,4,3,1,2,1,4,1,3,2,2,4,4,4,4,2,1,4,2,2,1,4,2,4,2,1,4,4,3)),
Pos=factor(c(2,4,1,2,5,6,7,2,2,2,5,6,3,3,3,8,5,3,4,2,1,4,3,3,2,6,1,8,3,7,5,7,8,3,6,6,1,6,3,7)),
RF=c(8,6,2,9,7,1,7,6,3,4,6,4,5,2,5,5,3,4,1,3,1,2,3,1,2,2,3,1,8,5,2,2,7,1,9,4,5,6,4,2),
Score=c(4,3,3,5,4,3,2,4,5,2,2,3,3,4,4,4,3,2,3,3,5,4,3,4,4,2,3,4,3,4,1,2,2,2,3,4,5,3,1,2)
)
clos_c1 = c(0,0,-1,1)
clos_c2 = c(0,-1,0,1)
clos_c3 = c(-1,0,0,1)
closmat.temp = rbind(constant = 1/4,clos_c1,clos_c2,clos_c3)
closmat = solve(closmat.temp)
closmat = closmat[, -1]
closmat
pos_c1 = c(1/2,1/2,-1/6,-1/6,-1/6,-1/6,-1/6,-1/6)
posmat.temp = rbind(pos_c1)
posmat <- pseudoinverse(posmat.temp)
colnames(posmat) <- "pos_c1"
contrasts(database$Clos) <- closmat
contrasts(database$Pos) <- posmat
##NEW LINE
colnames(contrasts(database$Pos))<-c("pos1","pos2","pos3","pos4","pos5","pos6","pos7")
model <- lmer(Score~Clos+Pos+(1|RF), data = database, REML = TRUE)
summary(model)
I hope it helps. Cheers!

Related

How do I resolve an integration error in Seurat?

I am new to Seurat, and am trying to run an integrated analysis of two different single-nuclei RNAseq datasets. I have been following the Seurat tutorial on integrated analysis (https://satijalab.org/seurat/articles/integration_introduction.html) to guide me, but when I ran the last line of code, I got an error.
# Loading required libraries
library(Seurat)
library(cowplot)
library(patchwork)
# Set up the Seurat Object
vgat.data <- Read10X(data.dir = "~/Desktop/VGAT Viral Data 1/")
vglut.data <- Read10X(data.dir = "~/Desktop/VGLUT3 Viral/")
# Initialize the Seurat object with the raw (non-normalized data)
vgat <- CreateSeuratObject(counts = vgat.data, project = "VGAT/VGLUT Integration", min.cells = 3, min.features = 200)
vglut <- CreateSeuratObject(counts = vglut.data, project = "VGAT/VGLUT Integration", min.cells = 3, min.features = 200)
# Merging the datasets
vgat <- AddMetaData(vgat, metadata = "VGAT", col.name = "Cell")
vglut <- AddMetaData(vglut, metadata = "VGLUT", col.name = "Cell")
merged <- merge(vgat, y = vglut, add.cell.ids = c("VGAT", "VGLUT"), project = "VGAT/VGLUT Integration")
# Split the dataset into a list of two seurat objects (vgat and vglut)
merged.list <- SplitObject(merged, split.by = "Cell")
# Normalize and Identify variable features for each dataset independently
merged.list <lapply(X = merged.list, FUN = function(x) {
x <- NormalizeData(x)
x <- FindVariableFeatures(x, selection.method = "vst", nFeatures = 2000)
})
After running the last line of code, I get the following error: Error in merged.list < lapply(X = merged.list, FUN = function(x) { :
comparison of these types is not implemented
I was wondering if anyone is familiar with Seurat and knows how I can troubleshoot this error. Any help would be greatly appreciated.
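For what it's worth, the error message itself points at the most likely culprit: the line merged.list <lapply(...) uses "<" (a comparison) where the assignment arrow "<-" was presumably intended, so R tries to compare the existing list with the list returned by lapply() and fails. A minimal sketch of that one line corrected, everything else as posted (note also that the Seurat argument is spelled nfeatures, not nFeatures):
# assignment ("<-"), not comparison ("<")
merged.list <- lapply(X = merged.list, FUN = function(x) {
  x <- NormalizeData(x)
  x <- FindVariableFeatures(x, selection.method = "vst", nfeatures = 2000)
})
Whether the rest of the integration workflow then runs cleanly is a separate question, but this should clear the reported error.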

Create a multivariate matrix in tidymodels recipes::recipe()

I am trying to do k-fold cross-validation on a model that predicts the joint distribution of the proportion of tree species basal area from satellite imagery. This requires the DirichletReg::DirichReg() function, which in turn requires that the response variables be prepared as a matrix using the DirichletReg::DR_data() function. I originally tried to accomplish this with the caret:: package, but I found out that caret:: does not support multivariate responses. I have since tried to implement this in the tidymodels:: suite of packages. Following the documentation on how to register a new model with parsnip:: (I appreciate Max Kuhn's vegetable humor), I created a "DREG" model and a "DR" engine. My registered model works when I simply call it on a single training dataset, but my goal is k-fold cross-validation using vfold_cv(), a workflow(), and fit_resamples(). With the code I currently have I get a warning message stating:
Warning message:
All models failed. See the `.notes` column.
Those notes state: Error in get(resp_char, environment(oformula)): object 'cbind(PSME, TSHE, ALRU2)' not found. This, I believe, is due to the use of DR_data() to preprocess the response variables into the format necessary for DirichletReg::DirichReg() to run properly. I think the solution involves getting this pre-processing to happen either in the recipe() call or in the set_fit() call when I register the model with parsnip::. I have tried the step_mutate() function when specifying the recipe, but that applies the function to each column separately rather than with the columns as joint inputs. This leads to the following error in the "notes" from the output of fit_resamples():
Must subset columns with a valid subscript vector.
Subscript has the wrong type `quosures`.
It must be numeric or character.
Is there a way to get the recipe to transform several columns into a DirichletRegData object using DR_data(), either with a step_*() function or via the pre= argument in set_fit() and set_pred()?
Below is my reproducible example:
##Loading Necessary Packages##
library(tidymodels)
library(DirichletReg)
##Creating Fake Data##
set.seed(88)#For reproducibility
#Response variables#
PSME_BA<-rnorm(100,50, 15)
TSHE_BA<-rnorm(100,40,12)
ALRU2_BA<-rnorm(100,20,0.5)
Total_BA<-PSME_BA+TSHE_BA+ALRU2_BA
#Predictor variables#
B1<-runif(100, 0, 2000)
B2<-runif(100, 0, 1800)
B3<-runif(100, 0, 3000)
#Dataset for modeling#
DF<-data.frame(PSME=PSME_BA/Total_BA, TSHE=TSHE_BA/Total_BA, ALRU2=ALRU2_BA/Total_BA,
B1=B1, B2=B2, B3=B3)
##Modeling the data using Dirichlet regression with repeated k-folds cross validation##
#Registering the model to parsnip::#
set_new_model("DREG")
set_model_mode(model="DREG", mode="regression")
set_model_engine("DREG", mode="regression", eng="DR")
set_dependency("DREG", eng="DR", pkg="DirichletReg")
set_model_arg(
model = "DREG",
eng = "DR",
parsnip = "param",
original = "model",
func = list(pkg = "DirichletReg", fun = "DirichReg"),
has_submodel = FALSE
)
DREG <-
function(mode = "regression", param = NULL) {
# Check for correct mode
if (mode != "regression") {
rlang::abort("`mode` should be 'regression'")
}
# Capture the arguments in quosures
args <- list(sub_classes = rlang::enquo(param))
# Save some empty slots for future parts of the specification
new_model_spec(
"DREG",
args=args,
eng_args = NULL,
mode = mode,
method = NULL,
engine = NULL
)
}
set_fit(
model = "DREG",
eng = "DR",
mode = "regression",
value = list(
interface = "formula",
protect = NULL,
func = c(pkg = "DirichletReg", fun = "DirichReg"),
defaults = list()
)
)
set_encoding(
model = "DREG",
eng = "DR",
mode = "regression",
options = list(
predictor_indicators = "none",
compute_intercept = TRUE,
remove_intercept = TRUE,
allow_sparse_x = FALSE
)
)
set_pred(
model = "DREG",
eng = "DR",
mode = "regression",
type = "numeric",
value = list(
pre = NULL,
post = NULL,
func = c(fun = "predict.DirichletRegModel"),
args =
list(
object = expr(object$fit),
newdata = expr(new_data),
type = "response"
)
)
)
##Running the Model##
DF$Y<-DR_data(DF[,c(1:3)]) #Preparing the response variables
dreg_spec<-DREG(param="alternative") %>%
set_engine("DR")
dreg_mod<-dreg_spec %>%
fit(Y~B1+B2+B3, data = DF)#Model works when simply run on single dataset
##Attempting Crossvalidation##
#First attempt - simply call Y as the response variable in the recipe#
kfolds<-vfold_cv(DF, v=10, repeats = 2)
rcp<-recipe(Y~B1+B2+B3, data=DF)
dreg_fit<- workflow() %>%
add_model(dreg_spec) %>%
add_recipe(rcp)
dreg_rsmpl<-dreg_fit %>%
fit_resamples(kfolds)#Throws warning about all models failing
#second attempt - use step_mutate_at()#
rcp<-recipe(~B1+B2+B3, data=DF) %>%
step_mutate_at(fn=DR_data, var=vars(PSME, TSHE, ALRU2))
dreg_fit<- workflow() %>%
add_model(dreg_spec) %>%
add_recipe(rcp)
dreg_rsmpl<-dreg_fit %>%
fit_resamples(kfolds)#Throws warning about all models failing
This works, but I'm not sure if it's what you were expecting.
First: getting the data set up for CV and DR_data()
I don't know of any package that has built what would essentially be a translation for CV and DirichletReg. Therefore, that part is manually done. You might be surprised to find it's not all that complicated.
Using the data you created and the modeling objects you created for tidymodels (those prefixed with set_), I created the CV structure that you were trying to use.
df1 <- data.frame(PSME = PSME_BA/Total_BA, TSHE = TSHE_BA/Total_BA,
ALRU2=ALRU2_BA/Total_BA, B1, B2, B3)
set.seed(88)
kDf2 <- kDf1 <- vfold_cv(df1, v=10, repeats = 2)
For each of the 20 resamples in kDf1 (kDf2 is kept as an untouched copy), I used DR_data to set the data up for the models.
# convert to DR_data (each folds and repeats)
df2 <- map(1:20,
.f = function(x){
in_ids = kDf1$splits[[x]]$in_id
dd <- kDf1$splits[[x]]$data[in_ids, ] # filter rows BEFORE DR_data
dd$Y <- DR_data(dd[, 1:3])
kDf1$splits[[x]]$data <<- dd
})
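A quick sanity check, assuming the loop above has run: each split's data should now hold only the filtered rows plus a Y column of class DirichletRegData.
# the response column added by DR_data() should be present in every split
class(kDf1$splits[[1]]$data$Y)
dim(kDf1$splits[[1]]$data)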
Because I'm not all that familiar with tidymodels, I next conducted the modeling using DirichReg directly. I then did it again with tidymodels and compared them. (The output is identical.)
DirichReg Models and summaries of the fits
set.seed(88)
# perform crossfold validation on Dirichlet Model
df2.fit <- map(1:20,
.f = function(x){
Rpt = kDf1$splits[[x]]$id$id
Fld = kDf1$splits[[x]]$id$id2
daf = kDf1$splits[[x]]$data
fit = DirichReg(Y ~ B1 + B2, daf)
list(Rept = Rpt, Fold = Fld, fit = fit)
})
# summary of each fitted model
fit.a <- map(1:20,
.f = function(x){
summary(df2.fit[[x]]$fit)
})
tidymodels and summaries of the fits (the code looks much the same, but there are a few differences; the output is the same, though)
# I'm not sure what 'alternative' is supposed to do here?
dreg_spec <- DREG(param="alternative") %>% # this is not model = alternative
set_engine("DR")
set.seed(88)
dfa.fit <- map(1:20,
.f = function(x){
Rpt = kDf1$splits[[x]]$id$id
Fld = kDf1$splits[[x]]$id$id2
daf = kDf1$splits[[x]]$data
fit = dreg_spec %>%
fit(Y ~ B1 + B2, data = daf)
list(Rept = Rpt, Fold = Fld, fit = fit)
})
afit.a <- map(1:20,
.f = function(x){
summary(dfa.fit[[x]]$fit$fit) # extra nest for parsnip
})
If you wanted to see the first model?
fit.a[[1]]
afit.a[[1]]
If you wanted the model with the lowest AIC?
# compare AIC, BIC, and likelihood?
# what do you perceive the best fit with?
fmin = min(unlist(map(1:20, ~fit.a[[.x]]$aic))) # dir
# find min AIC model number
paste0((map(1:20, ~ifelse(fit.a[[.x]]$aic == fmin, .x, ""))), collapse = "")
fit.a[[19]]
afit.a[[19]]
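As a side note, the AIC comparison above can be written a bit more directly with map_dbl() and which.min(); this is only an alternative to the paste0() trick, using the same fit.a objects:
# pull the AIC out of each of the 20 summaries and pick the smallest
aics <- map_dbl(1:20, ~ fit.a[[.x]]$aic)
best <- which.min(aics)
best
fit.a[[best]]
afit.a[[best]]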

randomForest model prediction error on new data

I am trying to train a randomForest model and use it to predict on new data, but I do not know why it is not working.
I read the data in from the clipboard using the following and build my RF model:
library(randomForest)
data <- read.table(file = "clipboard", strip.white = TRUE, sep = ',', header = FALSE, skip = 0)
y <- data[, 1]
x <- data[, 2]
rfmodel <- randomForest(y ~ x)
nd <- c(0.09, 0.9, 0.8)
predict(rfmodel, newdata = nd)
It throws an error when the new data nd does not match the size of the training data.
Any help would be great, since I do not know why the new data is not being predicted.
Data:
2.81,5.62
7.14,8.00
2.72,5.44
3.87,7.74
1.90,3.80
7.82,8.00
7.02,8.00
5.50,8.00
9.15,8.00
4.87,8.00
8.08,8.00
5.58,8.00
9.13,8.00
0.14,0.28
2.00,4.00
5.47,8.00
0.80,1.60
4.37,8.00
5.31,8.00
0.00,0.00
1.78,3.56
3.45,6.90
6.13,8.00
3.53,7.06
4.61,8.00
1.76,3.52
6.39,8.00
0.02,0.04
9.69,8.00
5.33,8.00
6.37,8.00
5.55,8.00
7.80,8.00
2.06,4.12
7.79,8.00
2.24,4.48
9.71,8.00
1.11,2.22
8.38,8.00
2.33,4.66
1.83,3.66
5.94,8.00
9.20,8.00
1.14,2.28
4.15,8.00
8.43,8.00
5.68,8.00
8.21,8.00
1.75,3.50
2.16,4.32
4.93,8.00
5.75,8.00
1.26,2.52
3.97,7.94
4.39,8.00
7.53,8.00
1.98,3.96
1.66,3.32
2.04,4.08
11.72,8.00
4.64,8.00
4.71,8.00
3.77,7.54
9.33,8.00
1.83,3.66
2.15,4.30
1.58,3.16
9.29,8.00
1.27,2.54
8.49,8.00
5.39,8.00
3.47,6.94
6.48,8.00
4.11,8.00
1.85,3.70
8.79,8.00
0.13,0.26
1.44,2.88
5.96,8.00
3.42,6.84
1.89,3.78
1.98,3.96
5.26,8.00
0.39,0.78
6.05,8.00
1.99,3.98
1.58,3.16
3.99,7.98
4.35,8.00
6.71,8.00
2.58,5.16
7.37,8.00
5.77,8.00
3.97,7.94
3.65,7.30
4.38,8.00
8.06,8.00
8.05,8.00
1.10,2.20
6.65,8.00
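For what it's worth, the likely cause here is not the length of nd but its type: because the forest was fitted with the formula interface (y ~ x), predict() looks for a column named x in newdata, and a bare numeric vector has no such column. A minimal sketch of the change, keeping the variable names from the question:
library(randomForest)
# newdata must be a data frame whose column names match the formula variables
nd <- data.frame(x = c(0.09, 0.9, 0.8))
predict(rfmodel, newdata = nd)
Fitting from a named data frame in the first place (e.g. names(data) <- c("y", "x"); randomForest(y ~ x, data = data)) avoids the ambiguity altogether.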

Error in colnames

Could anyone help me with a little problem?
When I plot the frontier I get the following message (see below for detail):
Error in `colnames<-`(`*tmp*`, value = c("targetRisk", "targetReturn")) :
attempt to set 'colnames' on an object with less than two dimensions
How can I solve this? Thanks a lot.
Portfolio construction & Optimisation
Assets: LUTAX, PFODX,BRGAX,GFAFX,NMSAX,EGINX,IPOYX,SCWFX,FGLDX,PAGEX
Getting monthly returns of the assets
library(quantmod)
library(tseries)
library(timeSeries)
LUTAX <- monthlyReturn((getSymbols("LUTAX",auto.assign=FALSE)[,4]),type = "arithmetic")
colnames(LUTAX) <- c("LUTAX")
PFODX <- monthlyReturn((getSymbols("PFODX",auto.assign=FALSE)[,4]),type = "arithmetic")
colnames(PFODX) <- c("PFODX")
BRGAX <- monthlyReturn((getSymbols("BRGAX",auto.assign=FALSE)[,4]),type = "arithmetic")
colnames(BRGAX) <- c("BRGAX")
GFAFX <- monthlyReturn((getSymbols("GFAFX",auto.assign=FALSE)[,4]),type = "arithmetic")
colnames(GFAFX) <- c("GFAFX")
NMSAX <- monthlyReturn((getSymbols("NMSAX",auto.assign=FALSE)[,4]),type = "arithmetic")
colnames(NMSAX) <- c("NMSAX")
EGINX <- monthlyReturn((getSymbols("EGINX",auto.assign=FALSE)[,4]),type = "arithmetic")
colnames(EGINX) <- c("EGINX")
IPOYX <- monthlyReturn((getSymbols("IPOYX",auto.assign=FALSE)[,4]),type = "arithmetic")
colnames(IPOYX) <- c("IPOYX")
SCWFX <- monthlyReturn((getSymbols("SCWFX",auto.assign=FALSE)[,4]),type = "arithmetic")
colnames(SCWFX) <- c("SCWFX")
FGLDX <- monthlyReturn((getSymbols("FGLDX",auto.assign=FALSE)[,4]),type = "arithmetic")
colnames(FGLDX) <- c("FGLDX")
PAGEX <- monthlyReturn((getSymbols("PAGEX",auto.assign=FALSE)[,4]),type = "arithmetic")
colnames(PAGEX) <- c("PAGEX")
Merging returns of the assets (excluding NA's)
portfolio_returns <- merge(LUTAX, PFODX,BRGAX,GFAFX,NMSAX,EGINX,IPOYX,SCWFX,FGLDX,PAGEX,all=F)
data <- as.timeSeries(portfolio_returns)
Portfolio optimisation
library(fPortfolio)
spec <- portfolioSpec()
setNFrontierPoints <- 25
setSolver(spec) <- "solveRquadprog"
constraints <- c("minW[1:1]=0.12","maxW[1:1]=0.18","minW[2:2]=0.12","maxW[2:2]=0.18",
"minW[3:3]=0.10","maxW[3:3]=0.15","minW[4:4]=0.08","maxW[4:4]=0.12",
"minW[5:5]=0.08","maxW[5:5]=0.12","minW[6:6]=0.05","maxW[6:6]=0.10",
"minW[7:7]=0.05","maxW[7:7]=0.10","minW[8:8]=0.08","maxW[8:8]=0.12",
"minW[9:9]=0.05","maxW[9:9]=0.10","minW[10:10]=0.08","maxW[10:10]=0.12",
"minsumW[c(1:1,2:2)]=0.27","maxsumW[c(1:1,2:2)]=0.33",
"minsumW[c(3:3,4:4,6:6,10:10)]=0.37","maxsumW[c(3:3,4:4,6:6,10:10)]=0.43",
"minsumW[c(5:5,7:7,8:8,9:9)]=0.27","maxsumW[c(5:5,7:7,8:8,9:9)]=0.33",
"maxsumW[c(1:1,2:2,3:3,4:4,5:5,6:6,7:7,8:8,9:9,10:10)]=1")
portfolioConstraints(data,spec,constraints)
frontier<- portfolioFrontier(data,spec,constraints)
print(frontier)
tailoredFrontierPlot(frontier)
After running the last command above I get the following message:
Error in `colnames<-`(`*tmp*`, value = c("targetRisk", "targetReturn")) :
attempt to set 'colnames' on an object with less than two dimensions
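Not necessarily the cause of that error, but one thing in the setup worth fixing: setNFrontierPoints <- 25 only creates an ordinary variable called setNFrontierPoints and never touches spec. fPortfolio configures the spec through replacement functions, in the same style as the setSolver() line, so the intended call is presumably:
# replacement-function form, analogous to setSolver(spec) <- "solveRquadprog"
setNFrontierPoints(spec) <- 25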

R : How to deal with "non-unique row.names given" in "predict.sarlm"

I'm facing an error in R trying to predict a sac model with new data. I'm using the following code:
# I've fitted a sac model and now I'm trying to predict for a new dataset:
>library(sp)      # coordinates()
>library(spdep)   # knearneigh(), knn2nb(), nbdists(), dnearneigh(), nb2listw()
>newdata = read.csv("location" , header=T , sep=",")
#Calculate the new matrix from this new data:
>cord <- coordinates(newdata[,c("X","Y")])
>k1 <- knn2nb(knearneigh(cord, k=5))
>all.linked <- max(unlist(nbdists(k1, cord)))
>d <- dnearneigh(cord, 0, all.linked)
>dlist <- nbdists(d, coordinates(cord))
>wn = nb2listw(d, glist=dlist, style="W")
#predict:
>prediction = predict.sarlm(sac_model, newdata = newdata, pred.type = "TC", wn )
Here's where the problem appears:
Error in mat2listw(W, row.names = region.id, style = style) : non-unique row.names given
Honestly, I can't see where the error is in the code.
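One guess, offered only as a possibility since it depends on data not shown here: with pred.type = "TC", predict.sarlm combines the estimation data and newdata when building the weights object, and mat2listw() requires the region ids (taken from row names) to be unique across both sets. If newdata reuses the default row names "1", "2", ... that the fitted data also carries, the "non-unique row.names given" error can appear. Giving newdata its own distinct row names before building the neighbours is a cheap thing to try (the "new_" prefix below is arbitrary):
# make the region ids of the new observations distinct from those of the fitted data
row.names(newdata) <- paste0("new_", seq_len(nrow(newdata)))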
