Sort a data.frame by multiple columns whose names are contained in a single object?

I want to sort a data.frame by multiple columns, ideally using base R without any external packages (though if necessary, so be it). Having read How to sort a dataframe by column(s)?, I know I can accomplish this with the order() function as long as I either:
Know the explicit names of each of the columns.
Have a separate object representing each individual column by which to sort.
But what if I only have one vector containing multiple column names, of length that's unknown in advance?
Say the vector is called sortnames.
data[order(data[, sortnames]), ] won't work, because order() treats that as a single sorting argument.
data[order(data[, sortnames[1]], data[, sortnames[2]], ...), ] will work if and only if I specify the exact correct number of sortname values, which I won't know in advance.
Things I've looked at but not been totally happy with:
eval(parse(text=paste("data[with(data, order(", paste(sortnames, collapse=","), ")), ]"))). Maybe this is fine, but I've seen plenty of hate for using eval(), so asking for alternatives seemed worthwhile.
I may be able to use the Deducer library to do this with sortData(), but like I said, I'd rather avoid using external packages.
If I'm being too stubborn about not using external packages, let me know. I'll get over it. All ideas appreciated in advance!

You can use do.call:
data <- data.frame(a=rnorm(10), b=rnorm(10), c=rnorm(10))
sortnames <- c("a", "b")
data[do.call("order", data[sortnames]), ]
This trick is useful when you want to pass multiple arguments to a function and those arguments are already in a convenient named list.
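A minimal sketch of why this works, using made-up data: data[sortnames] is a list of columns, and do.call() hands each element to order() as a separate argument, so the call behaves as if every column had been written out by hand. The unname(as.list(...)) wrapper is an extra precaution that is not in the answer above; it keeps a column that happens to be called, say, "decreasing" from being matched to one of order()'s own arguments.

```r
# Illustrative data (hypothetical values, chosen so that ties in `a` exist)
data <- data.frame(a = c(2, 1, 2, 1),
                   b = c(4, 3, 1, 2),
                   c = c("w", "x", "y", "z"))
sortnames <- c("a", "b")

# Each selected column becomes one positional argument to order()
sorted <- data[do.call(order, unname(as.list(data[sortnames]))), ]

# The same sort with every column named explicitly
explicit <- data[order(data$a, data$b), ]

identical(sorted, explicit)
```

Because sortnames is an ordinary character vector, its length can change at run time without touching the sorting line.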

Related

Is there a way to apply plyr's count() function to every column individually?

Similar to this question but for R. I want to get a summary count of every value in each column of a data frame.
Currently, doing something like plyr::count(df[,1:10]) counts how many times each combination of values across a row occurs. Instead, I just want a quick way of printing out what values each column even contains. I know this can be done with C-style recursion, but I'm hoping for a more elegant/simpler solution.
You can use lapply:
lapply(df, plyr::count)
Alternatively, keeping everything in base R, you can use table with stack to get similar output:
lapply(df, function(x) stack(table(x)))
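As a rough sketch of the base R route on made-up data: lapply() visits each column in turn because a data frame is a list of columns. This version swaps stack() for as.data.frame() on the table, which also yields one small value/frequency data frame per column; the column label value is my own choice.

```r
# Illustrative data frame (hypothetical values)
df <- data.frame(x = c("a", "b", "a"), y = c(1, 1, 2))

# One frequency table per column, returned as a named list of data frames
counts <- lapply(df, function(col) as.data.frame(table(value = col)))

counts$x  # distinct values of column x with their frequencies
```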

How to automatize listing many elements within a command line in R? [duplicate]

Currently, I have multiple dataframes that share a base name and are numbered in sequence (foo1, foo2, foo3, foo4, foo5... etc). I am trying to create a large dataframe containing all the rows of the above dataframes with rbind(). Is there an elegant way to do it which would be the equivalent of rbind(foo1, foo2, foo3, foo4, foo5...)?
I have tried do.call(rbind, paste0("foo",i)) where i=c(1,2,3...) to no avail.
There is a solution mentioned here, which is:
do.matrix <- do.call(rbind, lapply( paste0("variable", 1:10) , get) )
However, the answer mysteriously says "That is the wrong way to handle related items. Better to use a list or dataframe, but you will probably find out why in due course."
Why would that be the wrong way to do this, and what would be the "right" way?
Thanks.
Always try to rigorously capture relations between related instances of data, or related data and methods, or related methods. This generally helps ease aggregate manipulation such as your rbind requirement.
For your case, you should have defined your related data.frames as a single list from the beginning:
foo <- list(data.frame(...), data.frame(...), ...)
And then your requirement could be satisfied thusly:
do.call(rbind, foo)
If it's too late for that, then the solution involving repeated calls to get(), as described in the article to which you linked, can do the job.
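A small sketch of the list-first approach, with invented contents: the data frames are created inside one list rather than as foo1, foo2, ..., so binding them needs no name lookup at all. The columns id and value are placeholders for whatever the real data holds.

```r
# Build the related data frames directly as elements of one list
foo <- lapply(1:3, function(i) data.frame(id = i, value = i * 10))

# do.call() passes every list element to rbind() as a separate argument
combined <- do.call(rbind, foo)

nrow(combined)
```

The same list can then be reused with lapply() or sapply() for any other per-data-frame operation, which is the main reason for keeping the pieces together.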

How to rbind matrices based on objects names?

I have several matrices that I would like to rbind into a single summary one. They are produced by different functions and share a pattern in their names.
What I want to do is to tell R to look for all the objects with that common pattern and then rbind them.
Assuming these matrices exist:
commonname.N1<-matrix(nrow=2,ncol=3)
commonname.N2<-matrix(nrow=2,ncol=3)
commonname.M1<-matrix(nrow=2,ncol=3)
I tried something like this to get them:
mats<-grep(x= ls(pos=1), pattern="commonname.", value=TRUE)
mats
[1] "commonname.N1" "commonname.N2" "commonname.M1"
What I can't figure out is how to tell rbind to use that as its argument. Basically I would like something that gives the same matrix as rbind(commonname.N1, commonname.N2, commonname.M1) would in this example.
I have tried things along the lines of
mats2<-toString(mats)
rbind(mats2)
but that just creates a matrix with the different objects as names.
A similar question was asked here, but:
mats<-as.list(mats)
do.call(what=rbind, args=as.list(mats))
doesn't do the job.
Sorry if there is something basic I'm missing somewhere, but I can't figure it out and I'm relatively new to R.
Use mget:
do.call(rbind,mget(mats))
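To spell out why this works where as.list(mats) did not: as.list() only produces a list of character strings, whereas mget() looks each name up and returns the objects themselves. A minimal sketch, with the matrices filled with distinct values so the result can be checked; the explicit envir = environment() is my addition so the lookup also works when the code runs inside a function.

```r
commonname.N1 <- matrix(1, nrow = 2, ncol = 3)
commonname.N2 <- matrix(2, nrow = 2, ncol = 3)
commonname.M1 <- matrix(3, nrow = 2, ncol = 3)

mats <- c("commonname.N1", "commonname.N2", "commonname.M1")

# mget() turns the vector of names into a list of the matrices themselves
bound <- do.call(rbind, mget(mats, envir = environment()))

# Reference result with every object written out by hand
by_hand <- rbind(commonname.N1, commonname.N2, commonname.M1)

dim(bound)
```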

How to order a matrix by all columns

Ok, I'm stuck in a dumbness loop. I've read thru the helpful ideas at How to sort a dataframe by column(s)?, but need one more hint. I'd like a function that takes a matrix with an arbitrary number of columns and sorts by all columns in sequence. E.g., for a matrix foo with N columns, it should do the equivalent of foo[order(foo[,1],foo[,2],...foo[,N]),]. I am happy to use a with or by construction, and if necessary define the colnames of my matrix, but I can't figure out how to automate the collection of arguments to order (or to with).
Or, I should say, I could build the entire bloody string with paste and then call it, but I'm sure there's a more straightforward way.
The most elegant (for certain values of "elegant") way would be to turn it into a data frame, and use do.call:
foo[do.call(order, as.data.frame(foo)), ]
This works because a data frame is just a list of variables with some associated attributes, and can be passed to functions expecting a list.
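A quick sketch on a made-up two-column matrix, where ties in the first column are broken by the second. The drop = FALSE is my addition, so that a one-row result stays a matrix.

```r
# Illustrative matrix: column 1 is 2,1,2,1 and column 2 is 9,8,7,6
foo <- matrix(c(2, 1, 2, 1,
                9, 8, 7, 6), ncol = 2)

# as.data.frame() splits the matrix into a list of columns for order()
sorted <- foo[do.call(order, as.data.frame(foo)), , drop = FALSE]

sorted
```

The same line works unchanged for any number of columns, since the argument list is built from the matrix itself.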

Remove values from a dataset based on a vector of those values

I have a dataset that looks like this, except it's much longer and with many more values:
dataset <- data.frame(grps = c("a","b","c","a","d","b","c","a","d","b","c","a"), response = c(1,4,2,6,4,7,8,9,4,5,0,3))
In R, I would like to remove all rows containing the values "b" or "c" using a vector of values to remove, i.e.
remove<-c("b","c")
The actual dataset is very long with many hundreds of values to remove, so removing values one-by-one would be very time consuming.
Try:
dataset[!(dataset$grps %in% remove),]
There's also subset:
subset(dataset, !(grps %in% remove))
... which is really just a wrapper around [ that lets you skip writing dataset$ over and over when there are multiple subset criteria. But, as the help page warns:
This is a convenience function intended for use interactively. For
programming it is better to use the standard subsetting functions like
‘[’, and in particular the non-standard evaluation of argument
‘subset’ can have unanticipated consequences.
I've never had any problems, but the majority of my R code is scripting for my own use with relatively static inputs.
2013-04-12
I have now had problems. If you're building a package for CRAN, R CMD check will throw a NOTE if you have used subset in this way in your code - it will wonder if grps is a global variable, even though subset is evaluating it within dataset's environment (not the global one). So if there's any possibility your code will end up in a package and you feel squeamish about NOTEs, stick with Rcoster's method.
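For code that might end up in a package, the [ version can be wrapped so that the column is passed as a string and nothing relies on non-standard evaluation. A minimal sketch on a shortened copy of the data; the helper name drop_rows_matching and its arguments are hypothetical.

```r
# Hypothetical helper: drop rows whose value in `column` appears in `values`
drop_rows_matching <- function(df, column, values) {
  # %in% gives one TRUE/FALSE per row; ! flips it to keep the non-matches
  df[!(df[[column]] %in% values), , drop = FALSE]
}

dataset <- data.frame(grps = c("a", "b", "c", "a", "d"),
                      response = c(1, 4, 2, 6, 4))
remove <- c("b", "c")

kept <- drop_rows_matching(dataset, "grps", remove)
kept
```

Since remove is just a vector, it can hold hundreds of values without the call changing.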
