I have a list that contains 24 TraMineR sequence objects. Now I want to calculate the Optimal Matching distances for each of these sequence objects (only within each object) and store them in a new list, which will then consist of 24 OM distance objects (distance matrices).
The dataset can be found here.
library(TraMineR)
sequences <- read.csv(file = "event-stream-20-l-m.csv", header = TRUE, nrows=10)
repo_names = colnames(sequences)
# 1. Loop across and define the 24 sequence objects & store them in sequence_objects
colpicks <- seq(10,240,by=10)
sequence_objects <- mapply(function(start,stop) seqdef(sequences[,start:stop]), colpicks- 9, colpicks)
# 2. Calculate the costs for OM distances within each object
costs <- mapply(seqsubm(sequence_objects, method="TRATE"))
# 3. Calculate the OM distance objects for each sequence object
sequences.om <- seqdist(sequence_objects, method="OM", indel=1, sm=costs, with.missing=FALSE, norm="maxdist")
Step (1) works fine, but when I progress to step (2), it tells me:
Error in seqsubm(sequence_objects, method = "TRATE") :
[!] data is NOT a sequence object, see seqdef function to create one
This is natural, because sequence_objects is not a sequence object, but a list of sequence objects.
How can I apply the seqsubm function to a list of sequence objects?
I'm not familiar with the TraMineR package, however it looks like you are trying to iterate over the elements of sequence_objects.
mapply is for iterating over multiple objects simultaneously.
lapply in contrast is for iterating over a single object.
Therefore, the following might work for you:
costs <- lapply(sequence_objects, seqsubm, method="TRATE")
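To make the difference concrete, here is a minimal base R sketch. It uses small matrices as stand-ins for the sequence objects, and `objs`, `weights`, `sums` and `scaled` are made-up names for illustration only. `lapply` walks one list and passes any extra arguments through to the function, while `Map` (a thin wrapper around `mapply` that never simplifies) walks two lists in parallel.

```r
# Toy stand-ins: three small matrices instead of 24 sequence objects
objs <- list(a = matrix(1:4, 2), b = matrix(1:6, 2), c = matrix(1:9, 3))

# lapply: one list in, one result per element.
# Extra arguments (here na.rm) are passed on to the function.
sums <- lapply(objs, colSums, na.rm = TRUE)

# Map: iterate over two lists in parallel, pairing element i of
# objs with element i of weights. The result is always a list.
weights <- list(1, 10, 100)
scaled <- Map(function(m, w) m * w, objs, weights)

str(sums)
```

The same pairing pattern would probably be what step (3) of the question needs, since each sequence object has to be matched with its own cost matrix, e.g. something along the lines of `Map(function(s, sm) seqdist(s, method = "OM", indel = 1, sm = sm), sequence_objects, costs)`. I have not run that against the dataset, so treat it as an untested suggestion.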
I am trying to calculate network indexes (clustering, modularity, edge density, degree, centrality etc.) from 1000 simulated null matrices using the igraph package in R. The data I'm using is mixed-species bird flock data, which I've used to generate the null matrices.
Here's the code:
## Construct null matrices ##
library(EcoSimR)
library(igraph)
# creating a 1000 empty matrices
fl_emp <- lapply(1:1000, function(i) data.frame())
# simulating 1000 matrices by randomization
fl_wp_n <- replicate(1000, sim5(fl_wp[,3:ncol(fl_wp)]),simplify = FALSE) #fl_wp is the raw data
#sim5 function is from the package 'EcoSimR'
for(i in 1:length(fl_emp))
{
fl_wp_ig <- graph_from_incidence_matrix(fl_wp_n[[i]]) #Creating new igraph object to convert the null matrices to igraph objects to calculate network indexes
fl_wp_cw <- cluster_walktrap(fl_wp_ig[[i]])
fl_wp_mod <- modularity(fl_wp_cw[[i]]) ##Network index, this does not work
}
Here's what the simulated matrices (fl_wp_n) look like:
[1]: https://i.stack.imgur.com/1Q0Na.png
It is basically a list of 1000 elements, where each element is a simulated 133x74 matrix where the rows represent flock ID and the columns represent Species ID.
This is the error I'm getting when I run the loop:
> for(i in 1:length(fl_emp))
+ {
+ fl_wp_ig <- graph_from_incidence_matrix(fl_wp_n[[i]])
+ fl_wp_cw <- cluster_walktrap(fl_wp_ig[[i]])
+ fl_wp_mod <- modularity(fl_wp_cw[[i]])
+ }
Error in cluster_walktrap(fl_wp_ig[[i]]) : Not a graph object!
It seems to be not recognizing fl_wp_ig as an igraph object. Any idea why?
Is there a better way to calculate indices for 1000 matrices in one loop?
Sorry if this is a dumb question, I'm new to igraph and R in general
Thanks a lot in advance!
If you have a look at the documentation for cluster_walktrap, you will see the function expects a graph object. As @Szabolcs pointed out, when you index fl_wp_ig[[i]] in the for-loop, you get back the vertices adjacent to vertex i, not the graph itself. You should only index fl_wp_n[[i]], because that is the list you are iterating over: each iteration takes one matrix from it, while fl_wp_ig and fl_wp_cw are single objects that should be passed whole.
So you could try:
list_outputs = list()
for(i in 1:length(fl_emp))
{
# fl_wp_n[[i]] gets 1 matrix each iteration. Output -> graph object
fl_wp_ig <- graph_from_incidence_matrix(fl_wp_n[[i]])
# Use the whole graph object fl_wp_ig
fl_wp_cw <- cluster_walktrap(fl_wp_ig)
# Use the whole fl_wp_cw output
fl_wp_mod <- modularity(fl_wp_cw)
# NOTE: the original loop did not store the result of each iteration;
# it overwrote fl_wp_mod every time.
# Create an empty list before the for-loop and fill it as you go
list_outputs = append(list_outputs, fl_wp_mod)
}
Also, if you find it difficult to see the whole picture, you could try to create a custom function and use apply methods instead of a for-loop.
# Custom function
cluster_modularity = function(incidence_matrix){
# takes one incidence matrix at a time
fl_wp_ig <- graph_from_incidence_matrix(incidence_matrix)
fl_wp_cw <- cluster_walktrap(fl_wp_ig)
# return the modularity explicitly so lapply collects it
modularity(fl_wp_cw)
}
# Iterate using lapply to store the outputs in a list - for example
list_outputs = lapply(fl_wp_n, cluster_modularity)
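Since each matrix yields a single number, it may be more convenient to collect the results in a numeric vector rather than a list. Below is a minimal base R sketch of that idea using `vapply`. It does not use igraph: `null_mats` is a made-up stand-in for fl_wp_n, and `fill_index` is a hypothetical placeholder for a function like cluster_modularity above.

```r
set.seed(1)

# Hypothetical stand-in for the null matrices: 10 small 0/1 matrices
null_mats <- replicate(10, matrix(rbinom(12, 1, 0.5), nrow = 3),
                       simplify = FALSE)

# Hypothetical placeholder index: proportion of filled cells.
# In the real analysis this would be the modularity function.
fill_index <- function(m) mean(m > 0)

# vapply checks that every call returns exactly one numeric value
# and gives back a plain numeric vector, one entry per matrix
index_values <- vapply(null_mats, fill_index, numeric(1))

summary(index_values)
```

A plain vector like this can go straight into `hist()` or `quantile()` when comparing the observed index against the null distribution.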
I am learning cross validation method.
In the lines below, the input and query are both a data frame.
my.knn <- get.knnx(input,query,k=2)
nn.index <- my.knn$nn.index
What does the second line mean? What will nn.index be?
my.knn is a list of variables. So nn.index is taking that value out of the list so you can work on it as a single variable.
EXAMPLE OF GETTING ELEMENTS OUT OF A LIST
stats <- list("mean" = 10, "data" = c(0, 10 ,20))
#just get the average out
my.average <- stats$mean
So a list can hold different kinds of results from your testing, and can mix variable types (integers, strings, vectors). The $ syntax takes one of the variables out of the list into a single variable.
If you type my.knn at the prompt you will see its contents with sections marked with $. This will help see what is in your list.
In the example:
> stats
$mean
[1] 10
$data
[1] 0 10 20
SPECIFICS ON FUNCTION
I looked at get.knnx function notes, assuming you are using FNN package, here http://www.inside-r.org/packages/cran/fnn/docs/get.knn:
The output is a list containing:
nn.index
an n x k matrix for the nearest neighbor indice(s).
nn.dist
an n x k matrix for the nearest neighbor Euclidean distances.
So you can see your function output list has these two variables - an index of the nearest neighbour, and the second is the distances.
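To see what such an index matrix looks like in practice, here is a small base R sketch. It is not the FNN implementation: `knnx_sketch` is a hypothetical brute-force stand-in that only mimics the shape of the documented output (a list with nn.index and nn.dist), and the toy `input` and `query` data frames are made up.

```r
# Hypothetical brute-force stand-in for get.knnx (illustration only)
knnx_sketch <- function(data, query, k) {
  data <- as.matrix(data)
  query <- as.matrix(query)
  n <- nrow(query)
  nn.index <- matrix(NA_integer_, n, k)
  nn.dist <- matrix(NA_real_, n, k)
  for (i in seq_len(n)) {
    # Euclidean distance from query row i to every row of data
    d <- sqrt(rowSums(sweep(data, 2, query[i, ])^2))
    nearest <- order(d)[seq_len(k)]
    nn.index[i, ] <- nearest
    nn.dist[i, ] <- d[nearest]
  }
  list(nn.index = nn.index, nn.dist = nn.dist)
}

input <- data.frame(x = c(0, 1, 5), y = c(0, 0, 0))
query <- data.frame(x = c(0.9, 4), y = c(0, 0))

my.knn <- knnx_sketch(input, query, k = 2)
nn.index <- my.knn$nn.index
nn.index
```

Row i of nn.index holds the row numbers in input of the k points closest to row i of query, nearest first. So nn.index[1, 1] is the row of input nearest to the first query point.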
Trust this helps.
I am running the code below to generate a list of TraMineR sequence objects. The dataset can be found [here][1].
library(TraMineR)
sequences <- read.csv(file = "event-stream-20-l-m.csv", header = TRUE, nrows=10)
repo_names = colnames(sequences)
# 1. Loop across and define the 24 sequence objects & store them in sequence_objects
colpicks <- seq(10,240,by=10)
sequence_objects <- mapply(function(start,stop) seqdef(sequences[,start:stop]),
colpicks - 9, colpicks)
However, if I run:
test <- sequence_objects[1]
seqdist(test, indel=1, with.missing=FALSE, norm="maxdist")
The error message I receive is:
Error: [!] data is not a state sequence object, use 'seqdef' function to create one
How can it be that the mapply using seqdef does not create a list of sequence objects?
mapply by default simplifies the return value.
As per the comment in the previous question, try including SIMPLIFY=FALSE in the mapply call.
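The effect is easy to reproduce without TraMineR. The sketch below uses a small made-up data frame `df` in place of the sequence data, with column slices in place of the seqdef call. With the default SIMPLIFY = TRUE the equal-sized pieces are collapsed into a matrix of list cells. With SIMPLIFY = FALSE each piece keeps its class.

```r
# Made-up data: 2 rows, 12 columns, cut into 3 blocks of 4 columns
df <- as.data.frame(matrix(1:24, nrow = 2))
ends <- seq(4, 12, by = 4)

# Default behaviour: results of equal length are simplified
simplified <- mapply(function(start, stop) df[, start:stop],
                     ends - 3, ends)

# SIMPLIFY = FALSE: a plain list, one data frame per block
kept <- mapply(function(start, stop) df[, start:stop],
               ends - 3, ends, SIMPLIFY = FALSE)

is.matrix(simplified)      # the pieces are no longer data frames
is.data.frame(kept[[1]])   # double brackets extract the element itself
is.data.frame(kept[1])     # single brackets return a list of length 1
```

Note the last two lines: even with SIMPLIFY = FALSE, `sequence_objects[1]` is still a list of length one, so the element passed to seqdist should be extracted with `sequence_objects[[1]]`.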
So I've created an object of 12 binary files. As part of the analysis that I want to do, I compare one of the 12 against the other 11, using functions to do some analysis.
i.e.
In loop one, object$1 is compared against object$2:12,
loop two, object$2 against object$1,3:12
...
loop 12, object$12 against object$1:11
I can do it on a small scale manually, by specifying the file names. But as it involves comparing all 12 against each other, and I have many groups of 12 files (250 files in total) to work on, how do I automate this?
The eventual output is a data frame, so I'd like that to be created in each loop too (with the relevant file name, like object$1.csv or something).
firstbatch <-bams[1:12] #bams is character vector of the files
bedfile <- "filename.bed"
my.counts <- getBamCounts(bed.file = bedfile, bam.files = firstbatch) #creates object
my.test <- firstbatch$1
my.ref.samples <- firstbatch$2...firstbatch$12
series of functions comparing $1 against 2:12
Maybe you could use this procedure:
a <- combn(12, 2) # matrix with one column per possible pair of samples
for (i in 1:dim(a)[2]) { # loops over all possible pairs
firstbatch[a[1, i]] # first sample name to compare
firstbatch[a[2, i]] # second sample name to compare against
...
}
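If what you need is each sample against all the remaining ones (rather than every pair), negative indexing does the job. The following is only a sketch under assumptions: the file names are made up, and `compare_one` is a hypothetical placeholder for your real series of functions. It returns a small data frame so that the results can be collected per sample.

```r
# Made-up file names standing in for bams[1:12]
firstbatch <- paste0("sample", 1:12, ".bam")

# Hypothetical placeholder for the real comparison functions
compare_one <- function(test, refs) {
  data.frame(test = test, n_refs = length(refs),
             stringsAsFactors = FALSE)
}

# For each position i: element i is the test sample,
# firstbatch[-i] is everything except element i (the references)
results <- lapply(seq_along(firstbatch), function(i) {
  compare_one(firstbatch[i], firstbatch[-i])
})
names(results) <- firstbatch

# Each data frame could then be written out under its own name, e.g.
# write.csv(results[[1]], paste0(names(results)[1], ".csv"))
```

To cover all 250 files, the same loop could be wrapped in an outer `lapply` over the batches, for instance after cutting bams into groups with `split`.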
I'd like to create a list of igraph objects, with the data used for each igraph object determined by another variable.
This is how I create a single igraph object:
netEdges <- NULL
for (idi in c("nom1", "nom2", "nom3")) {
netEdge <- net[c("id", idi)]
names(netEdge) <- c("id", "friendID")
netEdge$weight <- 1
netEdges <- rbind(netEdges, netEdge)
}
g <- graph.data.frame(netEdges, directed=TRUE)
For each unique value of net$community I'd like to make a new igraph object. Then I would like to calculate measures of centrality for each object and bring those measures back into my net dataset. Many thanks for your help!
Since the code you provide isn't completely reproducible, what follows is not guaranteed to run. It is intended as a guide for how to structure a real solution. If you provide example data that others can use to run your code, you will get better answers.
The simplest way to do this is probably to split net into a list with one element for each unique value of community and then apply your graph building code to each piece, storing the results for each piece in another list. There are several ways of doing this type of thing in R, one of which is to use lapply:
#Break net into pieces based on unique values of community
netSplit <- split(net,net$community)
#Define a function to apply to each element of netSplit
myFun <- function(dataPiece){
netEdges <- NULL
for (idi in c("nom1", "nom2", "nom3")) {
netEdge <- dataPiece[c("id", idi)]
names(netEdge) <- c("id", "friendID")
netEdge$weight <- 1
netEdges <- rbind(netEdges, netEdge)
}
g <- graph.data.frame(netEdges, directed=TRUE)
#This will return the graph itself; you could change the function
# to return other values calculated on the graph
g
}
#Apply your function to each subset (piece) of your data:
result <- lapply(netSplit,FUN = myFun)
If all has gone well, result should be a list containing a graph (or whatever you modified myFun to return) for each unique value of community. Other popular tools for doing similar tasks include ddply from the plyr package.
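For the last part of the question (bringing per-community measures back into net), one option is to have the function return the data piece with an extra column and then reassemble the pieces with `unsplit`. The sketch below runs in base R without igraph. The toy `net` data is made up, and the in-degree count computed with `table` is only a stand-in for whatever centrality measure you calculate on the graph.

```r
# Made-up example data: 5 people in 2 communities, one nomination each
net <- data.frame(
  id = c("a", "b", "c", "d", "e"),
  community = c(1, 1, 1, 2, 2),
  nom1 = c("b", "a", "a", "e", "d"),
  stringsAsFactors = FALSE
)

netSplit <- split(net, net$community)

# Stand-in for a centrality measure: how often each id is
# nominated within its own community (a simple in-degree)
addIndegree <- function(dataPiece) {
  counts <- table(factor(dataPiece$nom1, levels = dataPiece$id))
  dataPiece$indegree <- as.integer(counts)
  dataPiece
}

pieces <- lapply(netSplit, addIndegree)

# Put the pieces back together in the original row order
net2 <- unsplit(pieces, net$community)
net2
```

Because `unsplit` uses the same grouping variable that `split` did, the rows of net2 line up with the rows of net, with the new column attached.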