I've gathered that the order function in R can be used to sort rows of a data frame/matrix by one or more columns of that object. The columns are passed as separate arguments to order, and order can handle a variable number of arguments.
I would like to sort a data frame by all its columns, but I don't know the names or the number of columns in the data frame beforehand. In Python, one can unpack a list of objects as the arguments to a function (e.g. zip(*mylist) is equivalent to zip(mylist[0], mylist[1], ...)). Is there a similar way to do so in R? It would be nice to "unpack" the columns of a matrix when I call order.
Is there another way in R to sort by multiple columns besides passing an arbitrary number of parameters?
More thoughts:
It seems I can't just package multiple unnamed items into a single object to pass to order. Nor can I think of a way to use a for loop, apply, or do.call to pass an arbitrary number of objects. There's something here: http://r.789695.n4.nabble.com/custom-sort-td888802.html.
Or... should I write a for loop to call order on each column, starting with the least priority one and ending with the column that would've been the first argument to order, reordering the rows each time and making sure that order sorts stably?
Thanks.
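The loop idea from the question can be sketched as follows. This is a minimal illustration under stated assumptions, not code from any answer here; the function name sort_by_all_columns_loop and the example data are hypothetical. It relies on order() using a stable sort by default, so ties left by a later pass keep the order set up by the earlier passes.

```r
# Hypothetical helper: sort a data frame by all of its columns (first column
# has the highest priority) using repeated single-column sorts.
sort_by_all_columns_loop <- function(df) {
  # Go from the lowest-priority (last) column to the highest-priority (first);
  # order() is stable by default, so earlier passes break ties in later ones.
  for (j in rev(seq_along(df))) {
    df <- df[order(df[[j]]), , drop = FALSE]
  }
  df
}

# Hypothetical example data
df <- data.frame(x = c(2, 1, 2, 1), y = c(3, 5, 1, 5), z = c(9, 8, 7, 6))
sorted_loop <- sort_by_all_columns_loop(df)
sorted_loop
```

This gives the same row order as a single order() call over all columns; the single call is simpler, but the loop shows why stability matters.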
In Python, calling fun(*args, **kwargs) supplies the positional arguments from the sequence args and the arguments to be matched by name from the dict kwargs.
A similar call in R is do.call(fun, arglist). Unlike Python, R doesn't let you mix regular and unpacked arguments in the call itself (e.g. fun(a=1, *args)), but the list passed as the second argument to do.call can contain elements that are matched by name as well as by position (e.g. do.call(fun, list(2, x=3))).
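As a small illustration (the argument values are made up), positional and named arguments go into one list that do.call() spreads over the function's parameters. Where Python would write extra arguments next to *args, in R you can concatenate them onto the list with c() before calling:

```r
# Unnamed elements are matched by position, named ones by name.
args <- list("a", "b", sep = "-")
res <- do.call(paste, args)                 # same as paste("a", "b", sep = "-")

# "Regular" arguments alongside the unpacked ones: build one combined list.
res2 <- do.call(paste, c(list("x"), args))  # same as paste("x", "a", "b", sep = "-")

res   # "a-b"
res2  # "x-a-b"
```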
To complete the example, since data.frames inherit from lists, you can simply call do.call(order, df) to order on all the columns sequentially (as long as none of the names of the columns in your data.frame match the formal arguments of order, such as na.last, decreasing and, in recent R versions, method).
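A minimal sketch with a made-up data frame (df, g and v are hypothetical names). The second call shows one way to sidestep the name caveat: unname(as.list(df)) turns the data frame into a plain unnamed list, so every column is passed positionally into order()'s ... argument and none can be mistaken for na.last, decreasing or method.

```r
# Hypothetical example data frame
df <- data.frame(g = c("b", "a", "b", "a"), v = c(3, 2, 1, 2))

# Each column becomes a separate argument to order(); column 1 has top priority.
sorted <- df[do.call(order, df), ]

# Name-safe variant: strip the column names before passing them to order().
sorted_safe <- df[do.call(order, unname(as.list(df))), ]

sorted
```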
I have a nested list that I converted from a JSON file. The list is keyed by user IDs (strings), and under each ID there are other nested elements: integers, booleans, etc. I have sorted this list with list.sort by a specific integer field called score. What I really want is a list or a vector that contains only these score values, preferably the top 100. I don't care about the user IDs or any other data point. How do I do this? I am completely new to R.
The lapply and sapply functions iterate through a list. Iterate with lapply (or sapply, which simplifies the result to a vector), using a function that returns the score value of each component.
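A minimal sketch, assuming each user's entry is a list with a numeric score field; the users list below is a hypothetical stand-in for the data converted from JSON. Sorting before head() means the result doesn't depend on whether list.sort put the scores in ascending or descending order.

```r
# Hypothetical stand-in for the list converted from JSON:
# one element per user ID, each holding a `score` among other fields.
users <- list(
  u1 = list(score = 90, active = TRUE),
  u2 = list(score = 75, active = FALSE),
  u3 = list(score = 60, active = TRUE)
)

# sapply() applies the function to each element and simplifies to a vector;
# unname() drops the user IDs, which aren't needed.
scores <- unname(sapply(users, function(u) u$score))

# Highest scores first, keep at most 100 (head() returns fewer if needed).
top_scores <- head(sort(scores, decreasing = TRUE), 100)
top_scores
```

If some users lack a score, sapply() returns a list instead of a vector; vapply(users, function(u) u$score, numeric(1)) would fail loudly in that case instead.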
Is there a function in R that would let me combine/concatenate data frames when some variables are themselves lists or data frames? I've tried rbind(), rbindlist(), rbind.data.frame and bind_rows, and they all throw errors, e.g. duplicate 'row.names' are not allowed or Argument 4 can't be a list containing data frames.
After looking into it a bit, it seems that none of those functions support nested data-frames. Is there a function that would work for me? Or is there something (other than a for-loop that adds row by row) that I could do?
As a bit of background, I'm making API calls to a database that returns only 40 results at a time, so I loop over multiple calls, and I want to combine the results without losing any information. I am using jsonlite::fromJSON to convert to a df: could I/should I combine the info in JSON format first and then convert it to a df?
I need to subset a list that contains an array as well as a factor variable. Each slice of the array corresponds to a single individual, who is in turn associated with one level of a two-level factor variable (treatment).
list <- list(array = array(rnorm(250, 4, 1), c(5, 5, 10)), treatment = factor(rep(c(1, 2), 5)))
When subsetting several slices of the array (the first component of the list), I would typically use something like:
list$array[,,c(2,4,6)]
This returns the array slices at positions 2, 4 and 6. For the factor component of the list, however, this doesn't work, because a vector is subset with a single index; what you need there is:
list$treatment[c(2,4,6)]
In short: I need to subset the components of a list that have different classes (an array and a vector) by the same index positions.
You're treating your list of matrices as some kind of 3-dimensional object, but it's not.
Your list$matrices is itself a list, which means you can index it like any other list; it doesn't matter whether it holds matrices, numerics, plot objects, or anything else.
The data you provided as an example can just be indexed at one level, so list$matrices[c(2,4,6)] works fine.
And I don't really get your question about saving the indices in a numeric vector; what's stopping you from writing this?
indices <- c(2,4,6)
mysubset <- list(list$matrices[indices], list$treatment[indices])
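To illustrate the single-level indexing described above, here is a sketch with made-up data in the shape of the earlier version of the question (a list holding a list of matrices plus a treatment vector); my_data and the other names are hypothetical. Single brackets keep a list, while double brackets extract one element.

```r
set.seed(1)
# Hypothetical data: 10 individuals, one 5x5 matrix each, plus a treatment factor
my_data <- list(
  matrices  = lapply(1:10, function(i) matrix(rnorm(25), nrow = 5)),
  treatment = factor(rep(c(1, 2), 5))
)

indices <- c(2, 4, 6)
sub_matrices  <- my_data$matrices[indices]   # single brackets: a list of 3 matrices
second        <- my_data$matrices[[2]]       # double brackets: the 2nd matrix itself
sub_treatment <- my_data$treatment[indices]  # same indices on the treatment vector
```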
EDIT, adding new info for the edited question:
I see you actually have a 3-D array now. That's a bit odd, as there is no clear convention for what counts as a "component". From your question I understand that list$array[,,n] refers to the n-th individual, but from a pure code point of view there is no reason why something like list$array[n,,] couldn't refer to that.
Maybe you got the idea from other languages, but this is not really R-ish; your earlier example with a list of matrices made more sense to me. I think the most logical structure would have been a data.frame with columns matrix and treatment (conceptually close to a list holding a vector and a list of matrices, but it makes clearer to others what you have).
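A sketch of that data.frame layout, assuming slice [, , i] of the array belongs to individual i; arr, df and the id column are hypothetical. The matrices are stored in a list column, so one row index subsets the treatment and the matrices together.

```r
set.seed(1)
# Hypothetical 5 x 5 x 10 array: slice [, , i] belongs to individual i
arr <- array(rnorm(250, 4, 1), dim = c(5, 5, 10))

# One row per individual; the matrices go into a list column
df <- data.frame(id = 1:10, treatment = factor(rep(c(1, 2), 5)))
df$matrix <- lapply(seq_len(dim(arr)[3]), function(i) arr[, , i])

# A single row index now subsets treatment and matrices at the same time
sub <- df[c(2, 4, 6), ]
```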
But anyway, what is your desired output?
If it's just subsetting: with this structure there are no constraints on what the content could have been, so you have to tell R exactly what you want. No single operator subsets a vector and the 3rd index of an array at the same time. You have to tell R that you want to use the 3rd index for subsetting, and that you want to use the same index for subsetting the vector. That is basically the code you already have:
idx <- c(2,4,6)
output <- list(list$array[,,idx], list$treatment[idx])
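Wrapping those two lines in a function makes the shared index explicit. A minimal sketch; subset_individuals and dat are hypothetical names. Without drop = FALSE, selecting a single individual would collapse the array to a 5 x 5 matrix, which the function avoids.

```r
# Hypothetical helper: subset individuals (3rd array index) and treatment together
subset_individuals <- function(x, idx) {
  list(
    # drop = FALSE keeps a 3-D array even when idx selects a single slice
    array     = x$array[, , idx, drop = FALSE],
    treatment = x$treatment[idx]
  )
}

set.seed(1)
dat <- list(array = array(rnorm(250, 4, 1), c(5, 5, 10)),
            treatment = factor(rep(c(1, 2), 5)))
s3 <- subset_individuals(dat, c(2, 4, 6))
s1 <- subset_individuals(dat, 4)
dim(s1$array)  # 5 5 1
```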
The way you subset multiple matrices actually gives an error, because you supply extra dimensions even though you have already specified which sublist you are in. To subset the matrices for the given indices, use my_list[[1]][indices] or, equivalently, my_list$matrices[indices]. The same goes for the treatment: my_list[[2]][indices] or my_list$treatment[indices].
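A sketch of the error and the working alternatives, using a hypothetical my_list with the same shape (a list of matrices plus a treatment vector). A plain list has no dimensions, so three subscripts fail with "incorrect number of dimensions".

```r
# Hypothetical list: 10 small matrices plus a treatment vector
my_list <- list(
  matrices  = lapply(1:10, function(i) diag(i, 2)),
  treatment = rep(c(1, 2), 5)
)
indices <- c(2, 4, 6)

# Three subscripts on a list: R signals an error
err <- tryCatch(my_list$matrices[, , indices], error = function(e) e)

# These select the same three matrices
a <- my_list[[1]][indices]
b <- my_list$matrices[indices]

# And the treatment values for the same individuals
tr <- my_list[[2]][indices]
```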
OK, I'm stuck in a dumbness loop. I've read through the helpful ideas at How to sort a dataframe by column(s)?, but need one more hint. I'd like a function that takes a matrix with an arbitrary number of columns and sorts by all columns in sequence. E.g., for a matrix foo with N columns, the function should do the equivalent of foo[order(foo[,1], foo[,2], ..., foo[,N]), ]. I am happy to use a with or by construction, and if necessary define the colnames of my matrix, but I can't figure out how to automate the collection of arguments to order (or to with).
Or, I should say, I could build the entire bloody string with paste and then call it, but I'm sure there's a more straightforward way.
The most elegant (for certain values of "elegant") way would be to turn it into a data frame, and use do.call:
foo[do.call(order, as.data.frame(foo)), ]
This works because a data frame is just a list of variables with some associated attributes, and can be passed to functions expecting a list.
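The one-liner can be wrapped as a reusable function. A minimal sketch; sort_by_all_columns and the example matrix are hypothetical. A matrix without column names becomes a data frame with columns V1, V2, ..., which cannot clash with order()'s own arguments; drop = FALSE keeps a one-row matrix from collapsing to a vector.

```r
# Sort the rows of a matrix by all of its columns, first column first.
sort_by_all_columns <- function(m) {
  # as.data.frame() makes each column a list element; do.call() then passes
  # those elements to order() as separate arguments.
  m[do.call(order, as.data.frame(m)), , drop = FALSE]
}

foo <- matrix(c(2, 1, 2, 1,
                3, 5, 1, 5,
                9, 8, 7, 6), ncol = 3)
sorted_foo <- sort_by_all_columns(foo)
sorted_foo
```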
Consider the problem where you have a list that needs to be split into multiple lists (buckets) by a function that takes an element and returns the index of its destination list (bucket). The output of the operation is a list of lists.
What's the correct name for this operation?
You can also call it partition.
One name would be grouping: the Scala function that does this is groupBy (though it returns a Map from discriminator keys to Lists instead of the list of lists you're asking for).
If your list is ordered and the function in question splits into multiple buckets of roughly equal size (for some notion of size), then it could be called quantiling.
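In R, base split(x, f) does this grouping: it returns one element per level of f, and when x is a list each element is itself a list. A sketch, assuming the bucket function returns integer-like indices 1..n_buckets; partition_into_buckets and bucket_of are hypothetical names. Converting the indices to a factor with explicit levels keeps empty buckets, and unname() turns the named result into a plain list of lists.

```r
# Hypothetical bucket function: maps each element to a bucket index
bucket_of <- function(x) x %% 3 + 1

partition_into_buckets <- function(xs, bucket_of, n_buckets) {
  idx <- vapply(xs, bucket_of, numeric(1))
  # Explicit levels make split() return an (empty) entry for unused buckets
  unname(split(xs, factor(idx, levels = seq_len(n_buckets))))
}

xs <- list(1, 2, 3, 4, 5, 6, 7)
buckets <- partition_into_buckets(xs, bucket_of, n_buckets = 4)
```

Elements keep their original relative order within each bucket.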