passing arguments to igraph function within sapply - r

I am new to network analysis and ERGMs. I have a list of 100 Erdős-Rényi random graphs, created with the code below:
set.seed(666)
gs <- list()
for (x in seq_len(100L)) {
gs[[x]] <- erdos.renyi.game(374, 0.0084, type = "gnp", directed = F)
E(gs[[x]])$weight <- sample(1:5, ecount(gs[[x]]), T)
}
Now I would like to calculate the mean path length between two nodes as well as the average clustering across these 100 models.
For the mean path length I used the following code:
random_mean_paths <- sapply(gs, igraph::mean_distance, 1:100)
mean(random_mean_paths)
However, if I try the same with igraph::transitivity, i.e.
random_mean_clus <- sapply(gs, igraph::transitivity, 1:100)
I get the error
Error in match.arg(arg = arg, choices = choices, several.ok = several.ok) :
'arg' must be of length 1
and when trying to resolve this error by setting type = "global", i.e.
random_mean_clus <- sapply(gs, igraph::transitivity(type = "global", 1:100))
I get the error argument "graph" is missing with no default
I cannot specify gs in the transitivity() function, since it is not an igraph object and I am stuck trying to pass the correct argument to this function.
Thanks in advance.

Either of
random_mean_clus <- sapply(gs, igraph::transitivity, type = "global", 1:100)
random_mean_clus <- sapply(gs, \(s) igraph::transitivity(s, type = "global", 1:100))
will solve the problem.
The first passes the named argument type = "global" after the function, so sapply forwards it to transitivity on every call. The second wraps the call in an anonymous function, written with the \(s) shorthand introduced in R 4.1.
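A minimal sketch of both forms on a smaller set of graphs. It assumes igraph 1.0 or later (sample_gnp is the newer name for erdos.renyi.game) and leaves out the trailing 1:100, since the vids argument it would fill is documented as ignored for the global type:

```r
library(igraph)

set.seed(666)
# Five small G(n, p) graphs stand in for the 100 graphs in the question.
gs <- lapply(seq_len(5), function(i) sample_gnp(50, 0.1))

# Form 1: arguments placed after the function are forwarded by sapply().
clus_dots <- sapply(gs, transitivity, type = "global")

# Form 2: an anonymous function makes the call explicit (\(g) needs R >= 4.1).
clus_anon <- sapply(gs, \(g) transitivity(g, type = "global"))

# Average clustering across the simulated graphs.
mean(clus_dots)
```

Both vectors hold one global clustering coefficient per graph, so either can be averaged the same way as random_mean_paths.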

Related

Can not inspect S4 object after modification

I am having problems with my S4 object res after I appended a list of values to it. The object was created with the DESeq2 package, as follows:
dds <- DESeqDataSetFromMatrix(countData = count.matrix,
colData = coldata,
design = ~ Condition)
dds <- DESeq(dds, test = "Wald")
res <- results(dds)
I did the following:
x <- qvalue(res@listData[["pvalue"]]) # calculate q-values from the p-values in S4 object 'res'
res@listData[["qval"]] <- x[["qvalues"]] # append the q-values to 'res' as a new column named "qval"
Now when I try to inspect the object with head() I get the following error:
> head(res)
Error in `rownames<-`(`*tmp*`, value = names(x)) :
invalid rownames length
The funny thing is that with View() I can inspect the S4 object in RStudio and see that adding the q-values worked. Does anyone know why this happens? Is there a way to avoid it?
To get the q-values into the results, you can do this instead:
library(qvalue)
library(DESeq2)
dds = makeExampleDESeqDataSet()
dds = DESeq(dds)
res = results(dds)
res$qvalue = qvalue(res$pvalue)$qvalues
I will follow up on why the error occurs; you need to look into how the object is constructed.
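As a general illustration of why writing to a slot with @ can leave an S4 object inconsistent, here is a toy sketch. The ToyTable class and its slots are hypothetical and are not DESeq2's actual layout; the point is only that a direct slot write does not run the class's validity check:

```r
library(methods)

# Toy class (hypothetical, not DESeq2's layout): two slots that must agree.
setClass("ToyTable",
  representation(cols = "list", labels = "character"),
  validity = function(object) {
    if (length(object@cols) != length(object@labels)) {
      "cols and labels differ in length"
    } else {
      TRUE
    }
  }
)

tab <- new("ToyTable", cols = list(a = 1:3), labels = "a")

# Direct slot write: accepted silently, but 'labels' is now out of date.
tab@cols[["b"]] <- 4:6

# The inconsistency only surfaces when something checks the object.
check <- tryCatch(validObject(tab), error = function(e) conditionMessage(e))
check
```

Assigning through the accessor the class provides (res$qvalue <- ... in the answer above) gives the class a chance to keep its other slots in step.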

Using K-Prototypes in R

I'd like to try k-prototypes clustering on my data, but when I use the code:
kpres <- kproto("name of data", "number of cluster", lambda = xx)
I get the following error:
Error in Ops.data.frame(x[, j], rep(protos[i, j], nrows)) : list of length 3994 not meaningful
Does somebody know how to fix it?
Thanks!!
You need to convert the data into a data frame first and then pass it to kproto.
For example:
data <- as.data.frame(data)
kpres <- kproto(data, "number of cluster", lambda = xx)
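A small sketch of the same steps on made-up data. It assumes the clustMixType package supplies kproto() and that the original object was something other than a plain data frame (a tibble, for example); the toy columns are invented for illustration:

```r
# Mixed-type toy data: kproto() works on numeric and factor columns.
set.seed(1)
toy <- data.frame(
  income = c(rnorm(20, 30), rnorm(20, 60)),
  region = factor(rep(c("north", "south"), each = 20))
)

# as.data.frame() strips any extra classes the object may carry.
toy <- as.data.frame(toy)

if (requireNamespace("clustMixType", quietly = TRUE)) {
  # k is a number, not a string; lambda is left at its default here.
  kpres <- clustMixType::kproto(toy, k = 2)
  print(table(kpres$cluster, toy$region))
}
```

The number of clusters is passed as a plain number, not as a quoted placeholder.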

How to debug "invalid subscript type 'list'" error in R (genalg package)

I am new to genetic algorithms and am trying a simple variable selection code based on the example on genalg package's documentation:
data(iris)
library(MASS)
X <- cbind(scale(iris[,1:4]), matrix(rnorm(36*150), 150, 36))
Y <- iris[,5]
iris.evaluate <- function(indices) {
result = 1
if (sum(indices) > 2) {
huhn <- lda(X[,indices==1], Y, CV=TRUE)$posterior
result = sum(Y != dimnames(huhn)[[2]][apply(huhn, 1,
function(x)
which(x == max(x)))]) / length(Y)
}
result
}
monitor <- function(obj) {
minEval = min(obj$evaluations);
plot(obj, type="hist");
}
woppa <- rbga.bin(size=40, mutationChance=0.05, zeroToOneRatio=10,
evalFunc=iris.evaluate, verbose=TRUE, monitorFunc=monitor)
The code works just fine on its own, but when I apply it to my own dataset, I get the following error:
X <- reducedScaledTrain[,-c(541,542)]
Y <- reducedScaledTrain[,542]
ga <- rbga.bin(size=540, mutationChance=0.05, zeroToOneRatio=10,
evalFunc=iris.evaluate, verbose=TRUE, monitorFunc=monitor)
Testing the sanity of parameters...
Not showing GA settings...
Starting with random values in the given domains...
Starting iteration 1
Calucating evaluation values... Error in dimnames(huhn)[[2]][apply(huhn, 1, function(x) which(x == max(x)))] :
invalid subscript type 'list'
I am trying to perform feature selection on 540 variables (I've eliminated the variables with 100% correlation) using LDA. I've tried transforming my data into numeric or list, but to no avail. I have also tried entering the line piece by piece, and the 'huhn' line works just fine with my data. Please help, I might be missing something...
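The thread has no answer for this question. One possible cause, offered as an assumption and not a confirmed diagnosis: apply() returns a list when the function returns vectors of different lengths, which happens with which(x == max(x)) whenever a row of posteriors has tied maxima. A base R sketch on a made-up matrix, with which.max() as a replacement that returns a single index for such rows:

```r
# Made-up posterior matrix; row 2 has a tie between the two classes.
post <- matrix(c(0.9, 0.1,
                 0.5, 0.5,
                 0.2, 0.8),
               ncol = 2, byrow = TRUE,
               dimnames = list(NULL, c("setosa", "virginica")))

# Row 2 yields two indices, so apply() cannot simplify and returns a list.
idx_list <- apply(post, 1, function(x) which(x == max(x)))
is.list(idx_list)

# which.max() keeps only the first maximum, so the result stays a vector.
idx_vec <- apply(post, 1, which.max)
colnames(post)[idx_vec]
```

Indexing dimnames(huhn)[[2]] with a list like idx_list would produce the "invalid subscript type 'list'" message, so checking is.list() on the apply() result is a quick way to test this guess on the real data.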

I do not understand error "object not found" inside the function

I have roughly this function:
plot_pca_models <- function(models, id) {
library(lattice)
splom(models, groups=id)
}
and I'm calling it like this:
plot_pca_models(data.pca, log$id)
which results in this error:
Error in eval(expr, envir, enclos) : object 'id' not found
when I call it without the wrapping function:
splom(data.pca, groups=log$id)
it raises this error:
Error in log$id : object of type 'special' is not subsettable
but when I do this:
id <- log$id
splom(models, groups=id)
it behaves as expected.
Please can anybody explain why it behaves like this and how to correct it? Thanks.
btw:
I'm aware of similar questions here, eg:
Help understand the error in a function I defined in R
Object not found error with ddply inside a function
Object disappears from namespace in function
but none of them helped me.
edit:
As requested, here is the full "plot_pca_models" function:
plot_pca_models <- function(data, id, sel=c(1:4), comp=1) {
# 'data' ... princomp objects
# 'id' ... list of samples id (classes)
# 'sel' ... list of models to compare
# 'comp' ... which pca component to compare
library(lattice)
models <- c()
models.size <- 1:length(data)
for(model in models.size) {
models <- c(models, list(data[[model]]$scores[,comp]))
}
names(models) <- 1:length(data)
models <- do.call(cbind, models[sel])
splom(models, groups=id)
}
edit2:
I've managed to make the problem reproducible.
require(lattice)
my.data <- data.frame(pca1 = rnorm(100), pca2 = rnorm(100), pca3 = rnorm(100))
my.id <- data.frame(id = sample(letters[1:4], 100, replace = TRUE))
plot_pca_models2 <- function(x, ajdi) {
splom(x, group = ajdi)
}
plot_pca_models2(x = my.data, ajdi = my.id$id)
which produces the same error as above.
The problem is that splom evaluates its groups argument in a nonstandard way. A quick fix is to rewrite your function so that it constructs the call with the appropriate syntax:
f <- function(data, id)
eval(substitute(splom(data, groups=.id), list(.id=id)))
# test it
ir <- iris[-5]
sp <- iris[, 5]
f(ir, sp)
log is a function in base R. Good practice is to not name objects after functions...it can create confusion. Type log$test into a clean R session and you'll see what's happening:
object of type 'special' is not subsettable
Here's a modification of Hong Oi's answer. First, I would recommend including id in the main data frame, i.e.:
my.data <- data.frame(pca1 = rnorm(100), pca2 = rnorm(100), pca3 = rnorm(100), id = sample(letters[1:4], 100, replace = TRUE))
.. and then
plot_pca_models2 <- function(x, ajdi) {
Call <- bquote(splom(x, group = x[[.(ajdi)]]))
eval(Call)
}
plot_pca_models2(x = my.data, ajdi = "id")
The cause of the confusion is the following line in lattice:::splom.formula:
groups <- eval(substitute(groups), data, environment(formula))
... whose only point is to be able to specify groups without quotation marks, that is,
# instead of
splom(DATA, groups="ID")
# you can now be much shorter, thanks to eval and substitute:
splom(DATA, groups=ID)
But of course, this makes splom (and other functions that use "nonstandard evaluation", e.g. subset) harder to use from within other functions, and it goes against the philosophy that is "mostly" followed in the rest of R.
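The mechanism can be reproduced without lattice. In the sketch below, nse_groups is a hypothetical stand-in for splom's handling of groups: it looks up the unevaluated expression in data and then in a fixed environment (globalenv() is used here as a simplification of environment(formula)), never in the calling function's frame:

```r
# Hypothetical stand-in for an NSE function such as splom().
nse_groups <- function(data, groups) {
  eval(substitute(groups), data, globalenv())
}

dat <- data.frame(pca1 = c(0.1, 0.2, 0.3))
grp <- c("a", "b", "a")

# substitute() sees the symbol 'ajdi', which exists only in the wrapper's frame.
wrap_bad <- function(x, ajdi) nse_groups(x, groups = ajdi)
bad <- tryCatch(wrap_bad(dat, grp), error = function(e) conditionMessage(e))
bad

# Building the call with the value already inserted avoids the lookup.
wrap_ok <- function(x, ajdi) {
  eval(substitute(nse_groups(x, groups = .g), list(.g = ajdi)))
}
ok <- wrap_ok(dat, grp)
ok
```

wrap_ok follows the same eval(substitute(...)) pattern as the quick fix given earlier in the thread.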

R: make pls calibration models from n number of subset and use them to predict different test sets

I am trying to apply a function I wrote that uses the 'pls' package to build a model and then use it to predict several test sets (nine in this case), returning the R2, RMSEP and prediction bias of each test set for n subsets selected from the data frame.
The function is:
cpo<-function(data,newdata1,newdata2,newdata3,newdata4,newdata5,newdata6,newdata7,newdata8,newdata9){
data.pls<-plsr(protein~.,8,data=data,validation="LOO")#making a pls model
newdata1.pred<-predict(data.pls,8,newdata=newdata1) #using the model to predict test sets
newdata2.pred<-predict(data.pls,8,newdata=newdata2)
newdata3.pred<-predict(data.pls,8,newdata=newdata3)
newdata4.pred<-predict(data.pls,8,newdata=newdata4)
newdata5.pred<-predict(data.pls,8,newdata=newdata5)
newdata6.pred<-predict(data.pls,8,newdata=newdata6)
newdata7.pred<-predict(data.pls,8,newdata=newdata7)
newdata8.pred<-predict(data.pls,8,newdata=newdata8)
newdata9.pred<-predict(data.pls,8,newdata=newdata9)
pred.bias1<-mean(newdata1.pred-newdata1[742]) #calculating the prediction bias
pred.bias2<-mean(newdata2.pred-newdata2[742])
pred.bias3<-mean(newdata3.pred-newdata3[742]) #[742] reference values in column742
pred.bias4<-mean(newdata4.pred-newdata4[742])
pred.bias5<-mean(newdata5.pred-newdata5[742])
pred.bias6<-mean(newdata6.pred-newdata6[742])
pred.bias7<-mean(newdata7.pred-newdata7[742])
pred.bias8<-mean(newdata8.pred-newdata8[742])
pred.bias9<-mean(newdata9.pred-newdata9[742])
r<-c(R2(data.pls,"train"),RMSEP(data.pls,"train"),pred.bias1,
pred.bias2,pred.bias3,pred.bias4,pred.bias5,pred.bias6,
pred.bias7,pred.bias8,pred.bias9)
return(r)
}
To select n subsets (based on an answer to my earlier question, "Select several subsets by taking different row interval and appy function to all subsets") and apply the cpo function to each subset, I tried the following.
Edited based on @Gavin's advice:
FOO3 <- function(data, nSubsets, nSkip){
outList <- vector("list", 11)
names(outList) <- c("R2train","RMSEPtrain", paste("bias", 1:9, sep = ""))
sub <- vector("list", length = nSubsets) # sub is the n number subsets created by selecting rows
names(sub) <- c( paste("sub", 1:nSubsets, sep = ""))
totRow <- nrow(data)
for (i in seq_len(nSubsets)) {
rowsToGrab <- seq(i, totRow, nSkip)
sub[[i]] <- data[rowsToGrab ,]
}
for(i in sub) { #for every subset in sub i want to apply cpo
outList[[i]] <- cpo(data=sub,newdata1=gag11p,newdata2=gag12p,newdata3=gag13p,
newdata4=gag21p,newdata5=gag22p,newdata6=gag23p,
newdata7=gag31p,newdata8=gag32p,newdata9=gag33p) #new data are test sets loaded in the workspace
}
return(outlist)
}
FOO3(GAGp,10,10)
When I try this I keep getting 'Error in eval(expr, envir, enclos) : object 'protein' not found'.
Protein is used in the plsr formula of cpo, and is in the data set.
I then tried to use the plsr function directly as seen below
FOO4 <- function(data, nSubsets, nSkip){
outList <- vector("list", 11)
names(outList) <- c("R2train","RMSEPtrain", paste("bias", 1:9, sep = ""))
sub <- vector("list", length = nSubsets)
names(sub) <- c( paste("sub", 1:nSubsets, sep = ""))
totRow <- nrow(data)
for (i in seq_len(nSubsets)) {
rowsToGrab <- seq(i, totRow, nSkip)
sub[[i]] <- data[rowsToGrab ,]
}
cal<-vector("list", length=nSubsets) #for each subset in sub make a pls model for protein
names(cal)<-c(paste("cal",1:nSubsets, sep=""))
for(i in sub) {
cal[[i]] <- plsr(protein~.,8,data=sub,validation="LOO")
}
return(outlist) # return is just used to end script and check if error still occurs
}
FOO4(gagpm,10,10)
When I tried this I get the same error 'Error in eval(expr, envir, enclos) : object 'protein' not found'.
Any advice on how to deal with this and make the function work will be much appreciated.
I suspect the problem is immediately at the start of FOO3():
FOO3 <- function(data, nSubsets, nSkip) {
outList <- vector("list", r <- c(R2(data.pls,"train"), RMSEP(data.pls,"train"),
pred.bias1, pred.bias2, pred.bias3, pred.bias4, pred.bias5,
pred.bias6, pred.bias7, pred.bias8, pred.bias9))
Not sure what you are trying to do when creating outList, but vector() has two arguments and you seem to be assigning to r a vector of numerics that you want R to use as the length argument to vector().
Here you are using the object data.pls and this doesn't exist yet - and never will in the frame of FOO3() - it is only ever created in cpo().
Your second loop looks totally wrong - you are not assigning the output from cpo() to anything. I suspect you wanted:
outList <- vector("list", 11)
names(outList) <- c("R2train","RMSEPtrain", paste("bias", 1:9, sep = ""))
....
for(i in subset) {
outList[[i]] <- cpo(....)
}
return(outList)
But that depends on what subset is etc. You also haven't got the syntax for this loop right. You have
for(i in(subset)) {
when it should be
for(i in subset) {
And subset and data aren't great names as these are common R functions and modelling arguments.
There are lots of problems with your code. Try to start simple and build up from there.
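Reading the code in the question, one likely source of the 'protein' not found error (an inference, not something the answer above states) is that the loop passes the whole list sub as data instead of a single subset. The sketch below reproduces that with lm() and mtcars standing in for plsr() and the asker's data, then loops over indices instead:

```r
# Two row subsets of mtcars stand in for the 'sub' list in the question.
subsets <- list(sub1 = mtcars[seq(1, 32, 2), ],
                sub2 = mtcars[seq(2, 32, 2), ])

# Passing the list itself: the formula variables are not elements of the list.
msg <- tryCatch(lm(mpg ~ wt, data = subsets),
                error = function(e) conditionMessage(e))
msg

# Looping over indices gives each fit one data frame and a valid output slot.
fits <- vector("list", length(subsets))
names(fits) <- names(subsets)
for (i in seq_along(subsets)) {
  fits[[i]] <- lm(mpg ~ wt, data = subsets[[i]])
}
sapply(fits, function(f) coef(f)[["wt"]])
```

With for (i in sub), i is a data frame and not a position, so it cannot be used as outList[[i]] either; seq_along() avoids both problems.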
I have managed to achieve what I wanted using the code below; if there is a better way of doing it (I'm sure there must be), I'm eager to learn. This function performs the following tasks:
1. select "n" number of subsets from a dataframe
2. For each subset created, a plsr model is made
3. Each plsr model is used to predict 9 test sets
4. For each prediction, the prediction bias is calculated
far5<- function(data, nSubsets, nSkip){
sub <- vector("list", length = nSubsets)
names(sub) <- c( paste("sub", 1:nSubsets, sep = ""))
totRow <- nrow(data)
for (i in seq_len(nSubsets)) {
rowsToGrab <- seq(i, totRow, nSkip)
sub[[i]] <- data[rowsToGrab ,]} #sub is the subsets created
mop<- lapply(sub,cpr2) #assigning output from cpr to mop
names(mop)<-c(paste("mop", mop, sep=""))
return(names(mop))
}
call: far5(data, nSubsets, nSkip)
The first part, selecting the subsets, is based on the answer to my question "Select several subsets by taking different row interval and appy function to all subsets".
I was then able to apply the function cpr2 to the subsets using lapply instead of the for loop used previously.
cpr2 is a modification of cpo, for which only data is supplied, and the new data to be predicted is used directly in the function as shown below.
cpr2<-function(data){
data.pls<-plsr(protein~.,8,data=data,validation="LOO") #make plsr model
gag11p.pred<-predict(data.pls,8,newdata=gag11p) #predict each test set
gag12p.pred<-predict(data.pls,8,newdata=gag12p)
gag13p.pred<-predict(data.pls,8,newdata=gag13p)
gag21p.pred<-predict(data.pls,8,newdata=gag21p)
gag22p.pred<-predict(data.pls,8,newdata=gag22p)
gag23p.pred<-predict(data.pls,8,newdata=gag23p)
gag31p.pred<-predict(data.pls,8,newdata=gag31p)
gag32p.pred<-predict(data.pls,8,newdata=gag32p)
gag33p.pred<-predict(data.pls,8,newdata=gag33p)
pred.bias1<-mean(gag11p.pred-gag11p[742]) #calculate prediction bias
pred.bias2<-mean(gag12p.pred-gag12p[742])
pred.bias3<-mean(gag13p.pred-gag13p[742])
pred.bias4<-mean(gag21p.pred-gag21p[742])
pred.bias5<-mean(gag22p.pred-gag22p[742])
pred.bias6<-mean(gag23p.pred-gag23p[742])
pred.bias7<-mean(gag31p.pred-gag31p[742])
pred.bias8<-mean(gag32p.pred-gag32p[742])
pred.bias9<-mean(gag33p.pred-gag33p[742])
r<-signif(c(pred.bias1,pred.bias2,pred.bias3,pred.bias4,pred.bias5,
pred.bias6,pred.bias7,pred.bias8,pred.bias9),2)
out<-c(R2(data.pls,"train",ncomp=8),RMSEP(data.pls,"train",ncomp=8),r)
return(out)
} # signif() rounds each prediction bias to 2 significant digits
call: cpr2(data)
I was able to use this to solve my problem. However, it was only practical to list the test sets out as I did because there were just nine of them. If there is a more general way to do this, I'm interested in learning.
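One way to generalise this is to keep the test sets in a named list and compute the bias for each with vapply(). The sketch below is an assumption-laden illustration, not the asker's code: prediction_bias is a hypothetical helper, lm() stands in for plsr(), and slices of mtcars stand in for gag11p to gag33p:

```r
# Hypothetical helper: mean(prediction - reference) for each test set in a list.
prediction_bias <- function(model, test_sets, response, ...) {
  vapply(
    test_sets,
    function(d) mean(predict(model, newdata = d, ...) - d[[response]]),
    numeric(1)
  )
}

# Stand-ins for the training data and the nine test sets.
train <- mtcars[1:20, ]
tests <- list(t1 = mtcars[21:24, ],
              t2 = mtcars[25:28, ],
              t3 = mtcars[29:32, ])

fit <- lm(mpg ~ wt + hp, data = train)
bias <- signif(prediction_bias(fit, tests, response = "mpg"), 2)
bias
```

With a plsr model the same helper could presumably be called with ncomp = 8 passed through ..., and with response = "protein" in place of the column position 742; the number of test sets then no longer matters.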
