What is the correct term for an operation that forks a list (functional programming)?

Consider the problem where you have a list that needs to be split into multiple lists (buckets), given a function that takes an element and returns the index of its destination list (bucket). The output of the operation is a list of lists.
What's the correct name for this operation?

You can also call it partition.

One name would be grouping: the Scala function that does this is groupBy (though it returns a Map from discriminator keys to Lists instead of the list of lists you're asking for).
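In R, the closest built-in counterpart to groupBy is split(), which likewise returns a list keyed by group. The sketch below assumes the bucketing function returns an index from 1 to n for every element. partition_by_index is a hypothetical helper name. Converting the indices to a factor with explicit levels keeps empty buckets in the result, so the output is a plain list of exactly n buckets.

```r
# Hypothetical helper: split xs into n buckets using a per-element index function.
# Assumption: bucket_of(x) returns a number in 1..n for every element.
partition_by_index <- function(xs, bucket_of, n) {
  idx <- vapply(xs, bucket_of, numeric(1))
  # Explicit factor levels make split() return all n buckets, including empty ones
  unname(split(xs, factor(idx, levels = seq_len(n))))
}

buckets <- partition_by_index(c(5, 2, 9, 4, 7, 3), function(x) (x %% 3) + 1, 3)
# buckets[[1]] is c(9, 3), buckets[[2]] is c(4, 7), buckets[[3]] is c(5, 2)
```

Within each bucket, elements keep their original relative order.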

If your list is ordered and the function in question splits into multiple buckets of roughly equal size (for some notion of size), then it could be called quantiling.
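Assuming that "size" simply means element count, quantiling a sorted vector could be sketched in R as follows. quantile_buckets is a hypothetical name; the resulting bucket sizes differ by at most one element.

```r
# Hypothetical helper: split an already sorted vector into n buckets whose
# element counts differ by at most one (assumption: "size" = element count).
quantile_buckets <- function(sorted_xs, n) {
  idx <- ceiling(seq_along(sorted_xs) * n / length(sorted_xs))
  unname(split(sorted_xs, factor(idx, levels = seq_len(n))))
}

qb <- quantile_buckets(1:10, 3)
lengths(qb)  # 3 3 4
```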

Related

How to assign nested variables in a list into a new list or vector in R?

I have a nested list which I converted from a JSON file. This list contains user IDs (strings), and under each of the IDs there are other nested elements: integers, booleans, etc. I have sorted this list with list.sort by a specific integer field called score. What I really want is a list or vector that includes only these score values, preferably the top 100. I don't care about the user IDs or any other data point. How do I do this? I am completely new to R.
The lapply and sapply functions iterate over a list. Iterate with lapply (or sapply), using a function that returns the score value of each element.
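A minimal sketch, assuming each top-level element (one per user ID) is itself a list with an integer field named score. The user data below is made up for illustration. vapply extracts one score per element, and sorting before head() keeps the result correct even if the list was not already sorted in decreasing order. If the scores were parsed as doubles rather than integers, use numeric(1) instead of integer(1).

```r
# Made-up stand-in for the list parsed from JSON: one element per user ID
users <- list(
  u1 = list(score = 10L, active = TRUE),
  u2 = list(score = 42L, active = FALSE),
  u3 = list(score = 7L, active = TRUE)
)

# Pull out just the score of each element; unname() drops the user IDs
scores <- unname(vapply(users, function(u) u$score, integer(1)))

# Keep the 100 highest scores (fewer if there are fewer users)
top_scores <- head(sort(scores, decreasing = TRUE), 100)
top_scores  # 42 10 7
```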

Object modification only happens in list

I have put objects that I would like to edit in a list.
Say the names of the objects look like this:
name1_X
name1_Y
name2_X
name2_Y
There are different sets of these objects, stored in different lists, and each set uses slightly different names, like:
name1_P_X
name1_F1_X
name2_F2_Y
and so on.
So for every "name" there are six objects: for each of P, F1 and F2, one ending with X and one ending with Y. We have three lists (listbF_P, listbF_F1, listbF_F2), each containing objects that end with X and Y.
I edited the objects in the list like this (example for only one list):
for (i in seq_along(listbF_P)) {
  # S4 slots are accessed with @ (a # would start a comment)
  listbF_P[[i]]@first.year <- 1986
  listbF_P[[i]]@last.year <- 2005
  listbF_P[[i]]@year.aggregate.method <- "mean"
  listbF_P[[i]]@id <- makeFieldID(listbF_P[[i]])
}
When I check whether the changes were applied, they were, but only for the objects inside the list, not for the same objects outside it (the "unlisted" ones).
So if I call
listbF_P[[1]]@last.year
it returns
"2005"
But if I call
name1_X@last.year
it returns
"Inf"
The problem with this is that I want the edited objects in a different list later.
So I need either a way for the latter call to return "2005", or a way to search several lists for object names matching a pattern and put the matching objects into another list.
This is because the example above was made with multiple lists (listbF_P, listbF_F1, listbF_F2), and each of these lists contains objects whose names match "X" and objects whose names match "Y".
So basically I want two lists of edited objects, one matching the pattern "X" and the other matching the pattern "Y".
I would create the lists matching the desired patterns like this:
listbF_ALL_X <- mget(ls(pattern=".*_X$"))
listbF_ALL_Y <- mget(ls(pattern=".*_Y$"))
The first list would hence contain all objects ending with "X", e.g.:
name1_P_X
name1_F1_X
name1_F2_X
name2_P_X
[...]
and I would like these to be the objects I edited in the loop earlier.
But when I access an object from that list,
listbF_ALL_X[[1]]@last.year
again just returns
"Inf"
since it takes the objects out of the environment and not the list. But I want it to return the desired number that has been changed (e.g. "2005").
I hope my problem and the two possible ways of solving it are clear.
If something isn't, ask :)
Thanks for any input
Regards
In R, unlike in many other modern languages, (almost) all objects have value semantics: assigning an object to another name, or putting it into a list, logically creates a copy. You can’t have multiple names that refer to the same object (see below for caveats).
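To illustrate the copy behaviour, here is a minimal sketch in which a plain list stands in for the asker's S4 objects (an assumption; S4 objects are copied in the same way).

```r
# Assumption: a plain list stands in for the real S4 object
name1_X <- list(last.year = Inf)
lst <- list(name1_X = name1_X)   # the list stores a logical copy, not a reference
lst$name1_X$last.year <- 2005    # modifies only the copy inside the list
lst$name1_X$last.year            # 2005
name1_X$last.year                # still Inf
```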
But even if this were supported, your design looks confusing. Rather than having lots of related objects with different names, put your objects into nested lists or classes that logically relate them. That is, rather than objects with names name{1..10}_{P,F1,F2}_{X,Y}, you should have one list, name, in which you store nested lists or classes with named members P, F1 and F2, which in turn are objects with members named X and Y. Then you could access an object as, say, name[[1L]]$P$X (or name[[1L]]@P@X, if you’re using S4 objects with slots).
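A minimal sketch of that nested layout, again with plain lists standing in for the real objects; make_entry is a hypothetical constructor. Because all X objects live in one structure, collecting them no longer depends on ls() and the global environment.

```r
# Hypothetical constructor: one entry with sets P, F1, F2, each holding X and Y
# (assumption: plain lists stand in for the real objects)
make_entry <- function() {
  sets <- c("P", "F1", "F2")
  entry <- lapply(sets, function(s) list(X = list(last.year = Inf), Y = list(last.year = Inf)))
  names(entry) <- sets
  entry
}

name <- list(make_entry(), make_entry())
name[[1L]]$P$X$last.year <- 2005   # edits the object stored in the structure

# Gather every X object (from all names and all sets) into one flat list
all_X <- unlist(lapply(name, function(entry) lapply(entry, `[[`, "X")), recursive = FALSE)
length(all_X)           # 6
all_X[[1]]$last.year    # 2005
```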
Alternatively, you can use a more data-oriented approach and flatten all these nested objects into a table with corresponding columns P, F1, F2, X and Y. Which solution is more appropriate depends on your exact use case.
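One possible reading of the table approach is sketched below. The long-format layout (one row per object, with the set and the X/Y suffix stored as columns) is an assumption, not the only way to do it. The edit loop from the question becomes a single vectorized assignment, and selecting the X objects becomes a row filter.

```r
# Hypothetical long-format table: one row per object (assumed layout)
objs <- expand.grid(
  name = c("name1", "name2"),
  set  = c("P", "F1", "F2"),
  axis = c("X", "Y"),
  stringsAsFactors = FALSE
)
objs$last.year <- Inf

# Counterpart of the loop over listbF_P: update every row of set "P"
objs$last.year[objs$set == "P"] <- 2005

# Counterpart of listbF_ALL_X: all rows whose suffix is "X"
x_rows <- objs[objs$axis == "X", ]
nrow(x_rows)  # 6
```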
Now for the caveat: you can get reference semantics in R by using environments instead of regular objects. Assigning an environment to another name (or putting it into a list) does not copy it; both names then refer to the same environment object. However, these semantics are usually confusing because they run contrary to what R users expect, so they should be used with care. The ‘R6’ package provides an object system with reference semantics based on environments. For many purposes where reference semantics are indispensable, ‘R6’ is the right answer.
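A minimal sketch of those reference semantics using a plain base R environment (R6 objects build on the same mechanism); the object names mirror the question and are otherwise made up.

```r
# An environment is not copied when assigned or stored in a list
name1_X <- new.env()
name1_X$last.year <- Inf

lst <- list(name1_X = name1_X)   # the list holds a reference to the same environment
lst$name1_X$last.year <- 2005    # modifies the shared environment

name1_X$last.year                # 2005: the change is visible through both names
```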
I found another solution:
I modified this part:
listbF_ALL_X <- mget(ls(pattern=".*_X$"))
listbF_ALL_Y <- mget(ls(pattern=".*_Y$"))
so that it takes the objects from each list instead of from the environment:
listbF_ALL <- c(listbF_P, listbF_F1, listbF_F2)
listbF_ALL_X <- listbF_ALL[grepl("_X$", names(listbF_ALL))]
listbF_ALL_Y <- listbF_ALL[grepl("_Y$", names(listbF_ALL))]
It's not the prettiest way of doing it but it works and in my case it was the solution that required the least amount of change in my script.

Multiplying two lists of matrices in continuous form

I have two lists of matrices and I want to multiply the first element of the first list with the first element of the second list, and so on, without writing out every operation, since each list may contain a large number of elements (both lists have the same length).
This is what I mean:
colSums(R1*t(M1)), colSums(R2*t(M2)), ..., colSums(Rn*t(Mn))
Do I need to create an extra list?
First, though, I need to transpose the matrices in one of the lists before multiplying them. The results will then be used in further, simpler calculations.
I already tried using indexes and loops, but it doesn't work.
First I tried to transpose the matrices in one list like this (M is one list and R the other; M contains M1, M2, ..., Mn, and likewise for R):
for (i in 1:length(M)){Mt<-list(t(M[[i]]))}
but this only keeps the result for the last element.
The full operation looks like this:
cbind(colSums(R1*t(M1)), colSums(R2*t(M2)), ..., colSums(Rn*t(Mn)))
Any help with any of these steps would be useful.
You could use the rlist package.
The function
list.apply(.data, .fun, ...)
will apply a function to each list element.
You can find the documentation at https://cran.r-project.org/web/packages/rlist/rlist.pdf.
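Base R can also do this without an extra package. The sketch below uses Map (a wrapper around mapply, swapped in here rather than taken from the answer above) to walk both lists in parallel. The small matrices are made up, and R[[i]] is assumed to have the same dimensions as t(M[[i]]). It also shows why the loop in the question kept only the last result: Mt was overwritten on every iteration instead of being filled element by element.

```r
# Made-up example lists; assumption: R[[i]] has the same dimensions as t(M[[i]])
R <- list(matrix(1:4, nrow = 2), matrix(5:8, nrow = 2))
M <- list(matrix(1:4, nrow = 2), matrix(1, nrow = 2, ncol = 2))

# The loop from the question, fixed: store each transpose in Mt[[i]]
Mt <- vector("list", length(M))
for (i in seq_along(M)) Mt[[i]] <- t(M[[i]])

# The same thing without a loop
Mt2 <- lapply(M, t)

# Pair R[[i]] with t(M[[i]]) and compute colSums(R[[i]] * t(M[[i]])) for every i
cols <- Map(function(r, mt) colSums(r * mt), R, Mt2)

# Bind the resulting vectors as columns, like cbind(colSums(...), colSums(...), ...)
res <- do.call(cbind, cols)
res  # column 1: 7, 22; column 2: 11, 15
```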

How to remember which variables are in a list

I have a huge list in which I put different variables in order to apply the same function to all of them.
In the next step I want to apply specific functions to specific elements of the list, i.e. the function used differs from element to element.
How can I do this? My first idea was (see my other question, Reassign variables to elements of list) to split the list into the original variables again. This can be done.
But I was advised to keep the items in the list instead. My question is: how can I then access each variable quickly? One idea would be to set the names attribute of the list at the beginning, using a vector of the original variable names. However, typing list["name_x"] later on would be much more cumbersome than just typing name_x, assuming name_x is globally available.
What is the most efficient way to deal with my problem?
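A minimal sketch of the named-list approach the question describes, with made-up variables name_x and name_y. A second named list holds the function for each element, Map applies them pairwise, and with() lets the short names be typed inside one expression without copying the variables back into the global environment.

```r
# Made-up variables kept in one named list
vars <- list(name_x = 1:5, name_y = c(2.5, 3.5))

# Same function for every element
lengths_all <- lapply(vars, length)

# A different function per element, matched by name (funs is hypothetical)
funs <- list(name_x = sum, name_y = mean)
results <- Map(function(f, v) f(v), funs[names(vars)], vars)
results$name_x  # 15
results$name_y  # 3

# Short access: inside with(), the element names work like plain variables
with(vars, name_x * 2)
```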

Passing undetermined number of arguments in R to the order() function

I've gathered that the order function in R can be used to sort rows of a data frame/matrix by one or more columns of that object. The columns are passed as separate arguments to order, and order can handle a variable number of arguments.
I would like to sort a data frame by all its columns, but I don't know the names or the number of columns in the data frame beforehand. In Python, one can unpack a list of objects as the arguments to a function (e.g. zip(*mylist) is equivalent to zip(mylist[0], mylist[1], ...)). Is there a similar way to do so in R? It would be nice to "unpack" the columns of a matrix when I call order.
Is there another way in R to sort by multiple columns besides passing an arbitrary number of parameters?
More thoughts:
It seems like I cannot just package multiple unnamed items into a single object to pass to order. Nor can I think of a way to use a for loop, apply, or do.call to make arbitrary numbers of objects. There's something here: http://r.789695.n4.nabble.com/custom-sort-td888802.html.
Or... should I write a for loop to call order on each column, starting with the least priority one and ending with the column that would've been the first argument to order, reordering the rows each time and making sure that order sorts stably?
Thanks.
In Python, calling fun(*args, **kwargs) passes a sequence of positional arguments (args) and a dictionary of arguments to be matched by name (kwargs).
The equivalent call in R is do.call(fun, arglist). Unlike in Python, you can't mix regular and unpacked arguments (e.g. fun(a=1, *args)); instead, the list passed as the second argument to do.call can contain elements that are matched by name as well as by position (e.g. do.call(fun, list(2, x = 3))).
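A small sketch of do.call mixing positional and named arguments; minus is a hypothetical function used only for illustration.

```r
# Hypothetical two-argument function
minus <- function(x, y) x - y

# Same as minus(2, x = 3): x is matched by name, so the unnamed 2 goes to y
do.call(minus, list(2, x = 3))   # 1

# The argument list can be built at run time, much like *args in Python
args <- list(c(3, 1, 2))
do.call(sort, args)              # 1 2 3
```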
To complete the example: since data frames are lists of columns, you can simply call do.call(order, df) to order on all the columns in sequence (as long as none of the column names in your data frame match the formal arguments of order, such as 'na.last' and 'decreasing', or 'method' in current versions of R).
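Applied to the original question, a minimal sketch with a made-up data frame. Passing unname(as.list(df)) hands the columns to order() purely by position, which sidesteps the column-name caveat.

```r
# Made-up data frame; the number and names of columns are not known in advance
df <- data.frame(
  a = c(2, 1, 2, 1),
  b = c("y", "z", "x", "z"),
  c = c(3, 2, 1, 1)
)

# Unpack every column as a positional argument to order().
# unname() keeps a column called e.g. "decreasing" from being taken as an option.
ord <- do.call(order, unname(as.list(df)))
ord        # 4 2 3 1
df[ord, ]  # rows sorted by a, then b, then c
```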
