Using mapply to create sequence objects in R/TraMineR?

I am running the code below to generate a list of TraMineR sequence objects. The dataset can be found here.
library(TraMineR)
sequences <- read.csv(file = "event-stream-20-l-m.csv", header = TRUE, nrows=10)
repo_names = colnames(sequences)
# 1. Loop across and define the 24 sequence objects & store them in sequence_objects
colpicks <- seq(10,240,by=10)
sequence_objects <- mapply(function(start,stop) seqdef(sequences[,start:stop]),
colpicks - 9, colpicks)
However, if I run:
test <- sequence_objects[1]
seqdist(test, indel=1, with.missing=FALSE, norm="maxdist")
The error message I receive is:
Error: [!] data is not a state sequence object, use 'seqdef' function to create one
How can it be that the mapply using seqdef does not create a list of sequence objects?

By default, mapply simplifies its return value, so the result is not a plain list of sequence objects.
As per the comment on the previous question, try including SIMPLIFY=FALSE in the mapply call.
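The effect of simplification can be seen without TraMineR. The following is a minimal sketch in base R, assuming a toy data frame `df` (hypothetical values) in place of the sequence data and plain column subsetting in place of `seqdef`:

```r
# Toy data frame standing in for the sequence data (hypothetical values)
df <- as.data.frame(matrix(1:40, nrow = 2))   # 2 rows, 20 columns
ends <- seq(10, 20, by = 10)                  # last column of each block

# Default: the equal-length results are simplified into a matrix of mode list
simplified <- mapply(function(start, stop) df[, start:stop], ends - 9, ends)

# SIMPLIFY = FALSE: each result is kept intact as one list element
kept <- mapply(function(start, stop) df[, start:stop], ends - 9, ends,
               SIMPLIFY = FALSE)

is.matrix(simplified)       # the simplified result is no longer a list of data frames
is.data.frame(kept[[1]])    # each element is still a complete data frame
```

Note that `kept[1]` is still a list of length one; `kept[[1]]` extracts the object itself, so a call such as `seqdist` would presumably need `sequence_objects[[1]]` rather than `sequence_objects[1]`.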

Related

Output of a function in R change with the number of inputs

I am trying to run a function to download data from the USGS website using dataRetrieval package of R and a function I have created called getstreamflow. The code is the following:
siteNumber <- c("094985005","09498501","09489500","09489499","09498502")
Streamflow <- sapply(siteNumber, function(siteNumber) tryCatch(getstreamflow(siteNumber), error = function(e) message(paste("Error in station ", siteNumber))))
Streamflow <- Filter(NROW,Streamflow) #to delete empty data frames
I get the output I want, which is the one shown in the image below:
However, when I run the same code but increase the number of stations in the input siteNumber,
the output changes: instead of producing several data frames inside a list, it generates a separate list for each data frame.
Does anyone know why this happens? It is the same function; only the number of stations in siteNumber changes.
Based on the image shown in the updated question, each element of the list is itself nested in a list. We can extract the inner element (each wrapper has length 1) with [[1]] and then apply Filter:
out <- Filter(NROW, lapply(Streamflow, function(x) x[[1]]))
Because NROW falls back to length for a list, it returns 1 for each nested list of length 1, so every element passed the original Filter test. Also, in the previous step the OP uses sapply, which sometimes simplifies its output. Use lapply instead (or specify simplify = FALSE).
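One plausible way the shape can change with the inputs is sketched below. It assumes a hypothetical `get_data` function in place of `getstreamflow`, and returns `NULL` on error. When every call returns a result of the same length, sapply simplifies; when one call fails, the lengths differ and the result stays a list. lapply returns a list in both cases.

```r
# Hypothetical stand-in for getstreamflow(): fails for one station id
get_data <- function(id) {
  if (id == "bad") stop("no data")
  data.frame(site = id, flow = 1)
}
fetch <- function(id) tryCatch(get_data(id), error = function(e) NULL)

all_ok <- sapply(c("a", "b"), fetch)          # equal lengths: simplified to a matrix
mixed  <- sapply(c("a", "bad", "b"), fetch)   # unequal lengths: stays a list

# lapply never simplifies, so the shape does not depend on the inputs
stable <- Filter(NROW, lapply(c("a", "bad", "b"), fetch))
```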

How to get vector of index of list in R

I am trying to pass every element of a list to a function by index, for example set_of_paragraphs("Two sample t-test",list_new2[[1]], list_new2[[2]],list_new2[[3]],list_new2[[4]]) and so on.
library(ReporteRs)
list_new <- c("Text1","Text2","String","Another Text")
my_text <- letters[1:length(list_new)]
list_new1 <- paste0(my_text, list_new,sep="")
list_new2 <- lapply(list_new1, function(i) pot(substr(i,1,1),textProperties(color="blue",vertical.align="superscript"))+substring(i,2))
The function set_of_paragraphs works only when I write out every index of the list:
set_of_paragraphs("Two sample t-test",list_new2[[1]], list_new2[[2]],list_new2[[3]],list_new2[[4]])
When I try it the following way, set_of_paragraphs gives me an error:
Error in set_of_paragraphs(l, list_new2) :
set_of_paragraphs can only contains pot objects.
l <- list("Two sample t-test")
set_of_paragraphs(l,list_new2)
So the only way that works for me is to list them all, as in set_of_paragraphs("Two sample t-test",list_new2[[1]], list_new2[[2]],list_new2[[3]],list_new2[[4]]). The problem is that I have a great many elements. Is there any way to write a loop or use apply to access each index?
If you want to call a function with a list of parameters, you can use do.call. Try
l <- list("Two sample t-test")
do.call("set_of_paragraphs", c(l, list_new2))
This is the equivalent of
set_of_paragraphs(l[[1]], list_new2[[1]], list_new2[[2]], list_new2[[3]], ...)
(I cannot test this because that package seems to require Java, which I do not have installed.) Basically, you put all the parameters into one big list (here I use c() to join two lists).
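Since ReporteRs may not be available, here is a small sketch of the same do.call pattern using base R's `paste` as a stand-in for `set_of_paragraphs`; the list contents are made up for illustration:

```r
title <- list("Two sample t-test")
parts <- list("a", "b", "c")          # stand-in for list_new2

# Join the two lists into one argument list, adding a named argument as well
args <- c(title, parts, sep = " | ")
via_do_call <- do.call(paste, args)

# The call that do.call builds, written out by hand
by_hand <- paste(title[[1]], parts[[1]], parts[[2]], parts[[3]], sep = " | ")

identical(via_do_call, by_hand)
```

Named elements of the argument list are matched to named parameters of the function, which is why `sep` can travel in the same list.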

Is there a more efficient/clean approach to an eval(parse(paste0( set up?

Sometimes I have code which references a specific dataset based on some variable ID. I have then been creating lines of code using paste0, and then eval(parse(...)) that line to execute the code. This seems to be getting sloppy as the length of the code increases. Are there any cleaner ways to have dynamic data reference?
Example:
dataset <- "dataRef"
execute <- paste0("data.frame(", dataset, "$column1, ", dataset, "$column2)")
eval(parse(text = execute))
But now imagine a scenario where dataRef would be called for 1000 lines of code, and sometimes needs to be changed to dataRef2 or dataRefX.
Combining the comments of Jack Maney and G.Grothendieck:
It is better to store the data frames that you want to access by a variable in a named list. The list can be created from a vector of names using get:
mynames <- c('dataRef','dataRef2','dataRefX')
# or mynames <- paste0( 'dataRef', 1:10 )
# setNames keeps the names so that elements can be looked up by name later
mydfs <- setNames( lapply( mynames, get ), mynames )
Then your example becomes:
dataset <- 'dataRef'
mydfs[[dataset]][,c('column1','column2')]
Or you can process them all at once using lapply, sapply, or a loop:
mydfs2 <- lapply( mydfs, function(x) x[,c('column1','column2')] )
@G.Grothendieck has shown you how to use get and [ to turn a character value into the named object it refers to and then reference named elements within that object. I don't know what your code was intended to accomplish, since executing that code would only print values to the console; they would not have been assigned to a name and would have been garbage collected. Suppose you wanted to use three character values (an object name such as dataset, plus colname1 and colname2) and assign those columns to an object named by a fourth character value:
newname <- "newdf"
assign( newname, get(dataset)[ c(colname1, colname2) ] )
The lesson to learn is that assign and get take character values and access or create named objects, which can be either data objects or functions. Carl_Witthoft mentions do.call, which can construct function calls from character values.
do.call("data.frame", setNames(list( dfrm$x, dfrm$y), c('x2','y2') ) )
do.call("mean", dfrm[1])
# second argument must be a list of arguments to `mean`
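Both approaches can be tried end to end with made-up data. This sketch assumes two small data frames named `dataRef` and `dataRef2` (hypothetical contents) and uses `mget`, which returns a named list directly:

```r
# Hypothetical data frames standing in for the real datasets
dataRef  <- data.frame(column1 = 1:3,   column2 = 4:6,   other = 7:9)
dataRef2 <- data.frame(column1 = 10:12, column2 = 13:15, other = 16:18)

# Named list of data frames, looked up by a character value
mynames <- c("dataRef", "dataRef2")
mydfs <- mget(mynames)
dataset <- "dataRef2"
picked <- mydfs[[dataset]][, c("column1", "column2")]

# get/assign: read one object by name and create another by name
newname <- "newdf"
assign(newname, get(dataset)[c("column1", "column2")])
```

Switching to a different dataset then only requires changing the value of `dataset`, with no pasted-together code strings.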

Looping through sequence objects in a list?

I have a list that contains 24 TraMineR sequence objects. Now I want to calculate the Optimal Matching distances for each of these sequence objects (only within each object) and store it in a new list, now consisting of 24 OM distance objects (distance matrices).
The dataset can be found here.
library(TraMineR)
sequences <- read.csv(file = "event-stream-20-l-m.csv", header = TRUE, nrows=10)
repo_names = colnames(sequences)
# 1. Loop across and define the 24 sequence objects & store them in sequence_objects
colpicks <- seq(10,240,by=10)
sequence_objects <- mapply(function(start,stop) seqdef(sequences[,start:stop]), colpicks- 9, colpicks)
# 2. Calculate the costs for OM distances within each object
costs <- mapply(seqsubm(sequence_objects, method="TRATE"))
# 3. Calculate the OM distance objects for each sequence object
sequences.om <- seqdist(sequence_objects, method="OM", indel=1, sm=costs, with.missing=FALSE, norm="maxdist")
Step (1) works fine, but when I progress to step (2), it tells me:
Error in seqsubm(sequence_objects, method = "TRATE") :
[!] data is NOT a sequence object, see seqdef function to create one
This is natural, because sequence_objects is not a sequence object, but a list of sequence objects.
How can I apply the seqsubm function to a list of sequence objects?
I'm not familiar with the TraMineR package, however it looks like you are trying to iterate over the elements of sequence_objects.
mapply is for iterating over multiple objects simultaneously.
lapply in contrast is for iterating over a single object.
Therefore, the following might work for you:
costs <- lapply(sequence_objects, seqsubm, method="TRATE")
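The difference between the two functions can be sketched in base R, using small made-up matrices and `dist` in place of sequence objects and `seqsubm`. lapply walks one list and passes extra arguments through to the function; mapply walks two parallel lists, which is the pattern step (3) would presumably need in order to pair each object with its own cost matrix:

```r
# Made-up matrices standing in for the list of sequence objects
mats <- list(a = matrix(1:6, nrow = 3),
             b = matrix(c(0, 1, 5, 2, 2, 9), nrow = 3))

# lapply: one list; `method` is passed on to dist() for every element
dists <- lapply(mats, dist, method = "manhattan")

# mapply: element i of `dists` is paired with element i of `mats`
scaled <- mapply(function(d, m) as.matrix(d) / max(m), dists, mats,
                 SIMPLIFY = FALSE)
```

`SIMPLIFY = FALSE` keeps the result a list with one matrix per input object.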

make loop to create list of igraph objects in R

I'd like to create a list of igraph objects, with the data used for each igraph object determined by another variable.
This is how I create a single igraph object:
netEdges <- NULL
for (idi in c("nom1", "nom2", "nom3")) {
netEdge <- net[c("id", idi)]
names(netEdge) <- c("id", "friendID")
netEdge$weight <- 1
netEdges <- rbind(netEdges, netEdge)
}
g <- graph.data.frame(netEdges, directed=TRUE)
For each unique value of net$community I'd like to make a new igraph object. Then I would like to calculate measures of centrality for each object and bring those measures back into my net dataset. Many thanks for your help!
Since the code you provide isn't completely reproducible, what follows is not guaranteed to run. It is intended as a guide for how to structure a real solution. If you provide example data that others can use to run your code, you will get better answers.
The simplest way to do this is probably to split net into a list with one element for each unique value of community, then apply your graph-building code to each piece, storing the result for each piece in another list. There are several ways of doing this type of thing in R, one of which is to use lapply:
#Break net into pieces based on unique values of community
netSplit <- split(net,net$community)
#Define a function to apply to each element of netSplit
myFun <- function(dataPiece){
netEdges <- NULL
for (idi in c("nom1", "nom2", "nom3")) {
netEdge <- dataPiece[c("id", idi)]
names(netEdge) <- c("id", "friendID")
netEdge$weight <- 1
netEdges <- rbind(netEdges, netEdge)
}
g <- graph.data.frame(netEdges, directed=TRUE)
#This will return the graph itself; you could change the function
# to return other values calculated on the graph
g
}
#Apply your function to each subset (piece) of your data:
result <- lapply(netSplit,FUN = myFun)
If all has gone well, result should be a list containing a graph (or whatever you modified myFun to return) for each unique value of community. Other popular tools for doing similar tasks include ddply from the plyr package.
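The split/apply pattern, including the step of bringing a per-group result back into the original data, can be sketched without igraph. The data frame below is made up, and group size stands in for a centrality measure:

```r
# Made-up data standing in for `net`
net <- data.frame(id = 1:6,
                  community = c("x", "x", "y", "y", "y", "z"),
                  nom1 = c(2, 1, 4, 5, 3, 6))

# One data frame per community
netSplit <- split(net, net$community)

# One value per community (group size here, in place of a graph measure)
sizes <- vapply(netSplit, nrow, integer(1))

# Look the per-community value up by name for every row of the original data
net$group_size <- unname(sizes[as.character(net$community)])
```

For measures that differ per node rather than per community, the function applied to each piece could instead return a data frame keyed by id, to be merged back into net.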
