I have an algorithm that takes data, sorts it, analyzes it, and then returns scores for the sorted data. However, the scores correspond to the sorted data, and I'd really like to return scores that correspond to the unsorted data. I figured there had to be some default R function that does this, but I've had no luck finding anything. Here's a MWE, and code I wrote that works but is really slow:
orig = rnorm(10)
ord = order(orig)
new = orig[ord]
reprod = sapply(1:length(orig), function(x)new[which(ord==x)] )
all(reprod==orig)
Are there any ways to "un-sort" data more efficiently?
What about just:
orig = rnorm(100000)
ord = order(orig)
new = orig[ord]
reprod = rep(0,length(new))
reprod[ord] = new
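For the scoring use case in the question, the same index assignment can carry the scores back to the original order. A minimal sketch, assuming a hypothetical scores_sorted vector that stands in for whatever the real analysis returns:

```r
set.seed(1)
orig <- rnorm(10)
ord <- order(orig)
sorted <- orig[ord]

# Hypothetical stand-in for the real analysis: one score per sorted value.
scores_sorted <- seq_along(sorted)

# Position ord[i] of the original data holds the i-th sorted value,
# so the i-th score is written back to position ord[i].
scores_orig <- numeric(length(orig))
scores_orig[ord] <- scores_sorted

# The same assignment applied to the data recovers the original vector.
reprod <- numeric(length(orig))
reprod[ord] <- sorted
stopifnot(identical(reprod, orig))
```

This is a single vectorised assignment, so it avoids the repeated which(ord == x) scan inside sapply.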
One nice thing about the ordering vector (the result of the order function) is that if you run order on it, you get the inverse permutation; in other words, it tells you how to unsort. Here is a quick example of what I think you are trying to do:
> orig <- rnorm(10)
> ord <- order(orig)
> score <- seq_along(orig)
> cbind(orig, score[ order(ord) ])
orig
[1,] -0.2429266384 4
[2,] 0.6346488818 8
[3,] 1.2956779160 9
[4,] -0.5563531517 3
[5,] 1.3299626650 10
[6,] -1.6062497717 1
[7,] -1.1444093167 2
[8,] -0.0004719915 5
[9,] 0.2734227278 7
[10,] 0.0357991850 6
You could also do something like:
new[ order(ord) ]
to see that it returns the sorted data to the original order.
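The claim that order(ord) undoes the sort can be checked directly. A small sketch (the variable name inv is just illustrative):

```r
set.seed(2)
orig <- rnorm(8)
ord <- order(orig)
inv <- order(ord)   # inverse permutation of ord

# Applying a permutation and then its inverse gives the identity.
stopifnot(identical(ord[inv], seq_along(ord)))
stopifnot(identical(inv[ord], seq_along(ord)))

# Sorting and then indexing with inv returns the original data.
stopifnot(identical(orig[ord][inv], orig))

# The inverse can also be built by index assignment, without a second sort.
inv2 <- integer(length(ord))
inv2[ord] <- seq_along(ord)
stopifnot(identical(inv, inv2))
```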
I want to ensure that the result of which(..., arr.ind = TRUE) is always ordered, specifically arranged ascending by (col, row). I do not see such a remark in the which function documentation, although it seems to be the case based on some experiments I made. How can I check or learn whether this is the case?
Example. When I run the code below, the output is a matrix in which the results are arranged ascending by (col, row) columns.
> set.seed(1)
> vals <- rnorm(10)
> valsall <- sample(as.numeric(replicate(10, vals)))
> mat <- matrix(valsall, 10, 10)
> which(mat == max(mat), arr.ind = TRUE)
row col
[1,] 1 1
[2,] 3 1
[3,] 1 2
[4,] 2 2
[5,] 10 2
[6,] 1 6
[7,] 2 8
[8,] 4 8
[9,] 1 9
[10,] 6 9
Part1:
This answers the part of your question about how to understand a function more deeply when the documentation is not enough, without going into the details of which() itself.
As which() is not a primitive function (primitives are implemented in C), i.e. it is an ordinary function written in R, we can check what's going on behind the scenes by printing the function itself. Note that backticks make it possible to print functions with reserved or non-syntactic names, e.g. `+`; they are optional in this example. This dense R code can be extremely tiresome to read, but I've found it very educational, and it does untie some mental knots every once in a while.
> print(`which`)
function (x, arr.ind = FALSE, useNames = TRUE)
{
wh <- .Internal(which(x))
if (arr.ind && !is.null(d <- dim(x)))
arrayInd(wh, d, dimnames(x), useNames = useNames)
else wh
}
<bytecode: 0x00000000058673e0>
<environment: namespace:base>
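Whether a function can be read this way can be checked before printing it. A short sketch using only base R helpers (is.primitive, formals, body):

```r
# which and arrayInd are closures, i.e. R-level functions with a readable body.
stopifnot(is.function(which), !is.primitive(which))
stopifnot(!is.primitive(arrayInd))

# `+` and sum are primitives; printing them shows no R body.
stopifnot(is.primitive(`+`), is.primitive(sum))

# For a closure, the argument list and the body can be inspected separately.
arg_names <- names(formals(which))
print(arg_names)
print(body(arrayInd))
```

Reading body(arrayInd) is the natural next step here, since which() delegates the conversion to (row, col) pairs to that function.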
Part2:
So after giving up on trying to understand the which and arrayInd functions in the way described above, I'm trying it with common sense. The most efficient way to check each value of a matrix/array that makes sense to me is to convert it, at some point, to a one-dimensional object. Coercing a matrix to an atomic vector, or any other reduction of dimensions, concatenates the columns in order (R stores matrices column by column), so to me it is natural that higher-level functions also follow this fundamental rule.
> testmat <- matrix(1:10, nrow = 2, ncol = 5)
> testmat
[,1] [,2] [,3] [,4] [,5]
[1,] 1 3 5 7 9
[2,] 2 4 6 8 10
> as.numeric(testmat)
[1] 1 2 3 4 5 6 7 8 9 10
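The same reasoning can be tested on the actual question. The sketch below is an empirical check on one small matrix, not a guarantee taken from the documentation: it confirms that the linear indices come back ascending and that the (row, col) pairs are those indices decoded column by column.

```r
mat <- matrix(c(1, 9, 3, 9,
                9, 2, 9, 4,
                5, 9, 6, 9), nrow = 4)

lin <- which(mat == 9)                  # linear (column-major) indices
idx <- which(mat == 9, arr.ind = TRUE)  # the same hits as (row, col) pairs

# Linear indices are returned in ascending order.
stopifnot(!is.unsorted(lin))

# Each (row, col) pair decodes back to the matching linear index.
rebuilt <- (idx[, "col"] - 1) * nrow(mat) + idx[, "row"]
stopifnot(all(rebuilt == lin))

# Hence the rows of idx are already sorted by (col, row).
stopifnot(identical(order(idx[, "col"], idx[, "row"]), seq_len(nrow(idx))))
```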
I found Hadley Wickham's Advanced R an extremely valuable resource in answering your question, especially the chapters about functions and data structures.
http://adv-r.had.co.nz/
I cannot seem to convert a list to a matrix. I load a .csv file using:
dat = read.csv("games.csv", header = TRUE)
> typeof(dat)
[1] "list"
But when I try to convert it into a numeric matrix using:
games = data.matrix(dat)
The entries' values are all changed for some reason. What is the problem?
While Nathaniel's solution worked for you, I think it's important to point out that you might need to adjust your perception of what is going on.
The typeof(dat) might be a list but the class is a data.frame.
This might help illustrate the difference:
# typeof() & class() of `pts` is `list`
# whereas typeof() `dat` in your example is `list` but
# class() of `dat` in your example is a `data.frame`
pts <- list(x = cars[,1], y = cars[,2])
as.matrix(pts)
## [,1]
## x Numeric,50
## y Numeric,50
head(as.matrix(data.frame(pts)))
## x y
## [1,] 4 2
## [2,] 4 10
## [3,] 7 4
## [4,] 7 22
## [5,] 8 16
## [6,] 9 10
Those are two substantially different outcomes from the `as.matrix()` function.
Just making sure you don't get disappointed by the outcome if you try this in a different context outside of read.csv.
Without any other information being provided, perhaps you might try:
games <- as.matrix(dat)
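As for why data.matrix() changed the values: one likely cause, assuming games.csv contains text columns or numbers that were read as factors (the default for read.csv before R 4.0.0), is that data.matrix() replaces factor and character columns with their integer level codes. The data frame below is made up for illustration:

```r
# Hypothetical stand-in for the CSV: scores were read as text, then as factors.
dat <- data.frame(team  = c("b", "a", "c"),
                  score = c("10", "7", "12"),
                  stringsAsFactors = TRUE)

m <- data.matrix(dat)
print(m[, "score"])   # level codes, not the values 10, 7, 12

# Converting via character first keeps the real numbers.
dat$score <- as.numeric(as.character(dat$score))
m2 <- data.matrix(dat)
print(m2[, "score"])
```

Checking str(dat) right after read.csv shows which columns are affected.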
This is related to a previous question:
Basic R, how to populate a vector with results from a function
But I thought I would post a new one because I have an additional requirement. Take this R code.
X <- matrix(stats::rnorm(100), ncol = 2)
hpts <- chull(X)
Y <- ifelse(X[,1] %in% X[hpts], 1, 0)
c(X,Y)
Z <- matrix(0,ncol = 1,nrow =50)
I want to add in an additional vector Z preserving the order of hpts. So if t(t(hpts)) looks like this:
[1,] 48
[2,] 27
[3,] 15
[4,] 13
[5,] 39
[6,] 2
[7,] 5
[8,] 50
then I want the 48th row of Z to have 1, the 27th row to have 2, the 15th row to have 3, etc. I have attempted to do this with a for loop, for (i in Y) {...}, but I have not had success. Any pointers would be appreciated.
This should do it:
Z[c(hpts),1] <- seq_along(hpts)
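A self-contained run of that assignment, with checks that each hull point gets its position in hpts and every other row stays 0 (the seed is arbitrary):

```r
set.seed(1)
X <- matrix(stats::rnorm(100), ncol = 2)
hpts <- grDevices::chull(X)          # row indices of the convex hull points

Z <- matrix(0, ncol = 1, nrow = nrow(X))
Z[hpts, 1] <- seq_along(hpts)        # row hpts[k] receives the value k

# Reading Z back in hull order gives 1, 2, 3, ...
stopifnot(all(Z[hpts, 1] == seq_along(hpts)))
# Rows that are not on the hull are untouched.
stopifnot(all(Z[-hpts, 1] == 0))
```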
(edit note: I changed the Title to "R: enumerate column combinations of a matrix", from "R grep: matching a matrix of strings to a list" to better reflect the solution)
I am trying to match a matrix of strings to a list, so that I can ultimately use the matrix as a map in later operations on a data.frame.
This first part works as intended, returning a list of all the possible pairs, triples and quad combinations (though perhaps this approach has created my bind?):
priceList <- data.frame(aaa = rnorm(100, 100, 10), bbb = rnorm(100, 100, 10),
ccc = rnorm(100, 100, 10), ddd = rnorm(100, 100, 10),
eee = rnorm(100, 100, 10), fff = rnorm(100, 100, 10),
ggg = rnorm(100, 100, 10))
getTrades <- function(dd, Maxleg=3)
{
nodes <- colnames(dd)
tradeList <- list()
for (i in 2:Maxleg){
tradeLeg <- paste0('legs',i)
tradeList[[tradeLeg]] <- combn(nodes, i)
}
return(tradeList)
}
tradeCombos <- getTrades(priceList, 4)
I'd now like to turn this list of possible combinations into trades. For example:
> tradeCombos[[1]][,1]
[1] "aaa" "bbb"
Needs to eventually become priceList[,2] - priceList[,1], and so forth.
I have tried a few approaches with grep and similar commands, and feel that I've come close with the following:
LocList <- sapply(tradeCombos[[1]], regexpr, colnames(priceList))
However the format is not quite suitable for the next step.
Ideally, LocList[1] would return something like: 1 2
Assuming that the tradeCombos[[1]][,1] == "aaa" "bbb".
Can someone please help?
__
With help from all of the answers below, I've now got:
colDiff <- function(x)
{
Reduce('-', rev(x))
}
getTrades <- function(dd, Maxleg=3)
{
tradeList <- list()
for (i in 2:Maxleg){
tradeLeg <- paste0('legs',i)
tradeLegsList <- combn(names(dd), i,
function(x) dd[x], simplify = FALSE)
nameMtx <- combn(names(dd), i)
names(tradeLegsList) <- apply(nameMtx, MARGIN=2,
FUN=function(x) paste(rev(x), collapse='*'))
tradeList[[tradeLeg]] <- lapply(tradeLegsList, colDiff)
}
return(tradeList)
}
tradeCombos <- getTrades(priceList, 4)
This retains the names of the constituent parts, and is everything I was trying to achieve.
Many thanks to all for the help.
Whoa... ignore everything below and jump to the update
As mentioned in my comment, you can just use combn. This solution doesn't take you to your very last step, but instead, creates a list of data.frames. From there, it is easy to use lapply to get to whatever your final step would be.
Here's the simplified function:
TradeCombos <- function(dd, MaxLeg) {
combos = combn(names(dd), MaxLeg)
apply(combos, 2, function(x) dd[x])
}
To use it, just specify your dataset and the number of combinations you're looking for.
TradeCombos(priceList, 3)
TradeCombos(priceList, 4)
Moving on: @mplourde has shown you how to use Reduce to successively subtract. A similar approach would be taken here:
cumDiff <- function(x) Reduce("-", rev(x))
lapply(TradeCombos(priceList, 3), cumDiff)
By keeping the output of the TradeCombos function as a list of data.frames, you'll be leaving more room for flexibility. For instance, if you wanted row sums, you can simply use lapply(TradeCombos(priceList, 3), rowSums); similar approaches can be taken for whatever function you want to apply.
Update
I'm not sure why @GSee didn't add this as an answer, but I think it's pretty awesome:
Get your list of data.frames as follows:
combn(names(priceList), 3, function(x) priceList[x], simplify = FALSE)
Advance as needed. (For example, using the cumDiff function we created: combn(names(priceList), 2, function(x) cumDiff(priceList[x]), simplify = FALSE).)
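A small worked version of the combn approach on made-up prices, so the results can be checked by hand. The names prices, pairs and spreads are illustrative, and the naming step is one possible way to label the list, not part of the original answer:

```r
prices <- data.frame(aaa = c(1, 2, 3),
                     bbb = c(10, 20, 30),
                     ccc = c(100, 200, 300))

# One data.frame per pair of columns.
pairs <- combn(names(prices), 2, function(x) prices[x], simplify = FALSE)

# Label each element with the columns it holds, e.g. "aaa-bbb".
names(pairs) <- vapply(pairs, function(d) paste(names(d), collapse = "-"),
                       character(1))

# Second column minus first column for every pair.
spreads <- lapply(pairs, function(d) d[[2]] - d[[1]])
print(spreads)
```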
This gets your eventual aim using lapply, apply, and Reduce.
lapply(tradeCombos,
function(combos)
apply(combos, MARGIN=2, FUN=function(combo) Reduce('-', priceList[rev(combo)])))
combo is a column from one of the combo matrices in tradeCombos. rev(combo) reverses the column so the last value is first. The R syntax for selecting a subset of columns from a data.frame is DF[col.names], so priceList[rev(combo)] is a subset of priceList with just the columns in combo, in reverse order. data.frames are actually just lists of columns, so any function that's designed to iterate over lists can be used to iterate over the columns in a data.frame. Reduce is one such function. Reduce takes a function (in this case the subtract function -) and a list of arguments and then successively calls the function on the arguments in the list with the results of the previous call, e.g., (((arg1 - arg2) - arg3) - arg4).
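The left-to-right behaviour of Reduce described above can be seen on a tiny data.frame (made-up values; accumulate = TRUE is used only to expose the intermediate results):

```r
df <- data.frame(a = c(1, 2), b = c(10, 20), c = c(100, 200))
combo <- c("a", "b", "c")

# Columns in reverse order: c, b, a. Reduce computes (c - b) - a.
res <- Reduce(`-`, df[rev(combo)])
print(res)

# Intermediate results: c, then c - b, then (c - b) - a.
steps <- Reduce(`-`, df[rev(combo)], accumulate = TRUE)
print(steps)
```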
You can rename the columns in tradeCombos, so that the final column names reflect their source, with:
tradeCombos <- lapply(tradeCombos,
function(combos) {
dimnames(combos)[[2]] <- apply(combos,
MARGIN=2,
FUN=function(combo) paste(rev(combo), collapse='-')
)
return(combos)
}
)
tradeCombos is a list with matrix elements. Therefore, tradeCombos[[1]] is a matrix for which apply is more suitable.
apply(tradeCombos[[1]],1,function(x) match(x,names(priceList)))
[,1] [,2]
[1,] 1 2
[2,] 1 3
[3,] 1 4
[4,] 1 5
[5,] 1 6
[6,] 1 7
[7,] 2 3
[8,] 2 4
[9,] 2 5
[10,] 2 6
[11,] 2 7
[12,] 3 4
[13,] 3 5
[14,] 3 6
[15,] 3 7
[16,] 4 5
[17,] 4 6
[18,] 4 7
[19,] 5 6
[20,] 5 7
[21,] 6 7
Incidentally, you can subset using the string form anyway, eg priceList[,"aaa"]
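A variant of the same lookup that keeps one combination per column, matching the layout of the combn output, is sketched below with a shortened, made-up set of column names. match() works on the whole matrix at once, so only the shape has to be restored:

```r
cols <- c("aaa", "bbb", "ccc", "ddd")
combos <- combn(cols, 2)                 # 2 x 6 character matrix

# Column positions of every name, reshaped to the layout of combos.
loc <- matrix(match(combos, cols), nrow = nrow(combos))
print(loc[, 1])   # positions for the first combination, "aaa" "bbb"

# Name-based and position-based subsetting select the same columns.
df <- as.data.frame(matrix(1:8, ncol = 4, dimnames = list(NULL, cols)))
stopifnot(identical(df[combos[, 2]], df[loc[, 2]]))
```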
This question is more advanced than the similar one here. Expected amount of chars about 20.
When I plot things in data.frame, I do it like:
# t1 is a df
> plot((q1*s1+q2*s2)/(s1+s2),data=t1)
but can I reuse this form for matrix?
[Finally working MVO, thanks!]
> M<-matrix(data=rnorm(30),ncol=2,dimnames=list(NULL,c('q1','q2')))
> plot(M)
> x=1:dim(M)[1]
> plot(x~q1/q2,data=data.frame(M),type='l')
You can use with for this:
with(data.frame(mymatrix), plot((q1*s1+q2*s2)/(s1+s2)))
Hope this helps.
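A self-contained sketch of the with() approach, assuming a matrix whose columns are named q1, q2, s1 and s2 as in the question's formula (the values are made up so the result can be checked by hand):

```r
M <- matrix(c(1, 2, 3, 4,    # q1
              2, 4, 6, 8,    # q2
              1, 1, 2, 2,    # s1
              3, 3, 1, 1),   # s2
            ncol = 4, dimnames = list(NULL, c("q1", "q2", "s1", "s2")))

# with() needs a list-like object, so convert the matrix first.
df <- as.data.frame(M)
wavg <- with(df, (q1 * s1 + q2 * s2) / (s1 + s2))
print(wavg)

pdf(NULL)                 # discard plot output; drop this line for interactive use
plot(wavg, type = "l")
invisible(dev.off())
```

Computing the expression first and plotting it afterwards makes it easy to inspect the values before drawing them.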
That sort of plotting (where you type formulas involving the dataframe's columns) is only available for data frames.
If colnames(mymatrix) are q1, s1, etc., then you can achieve the effect by doing:
plot( myformula, data=data.frame(mymatrix))
i.e., coerce the matrix to a dataframe and then use the formula.
Update
An example demonstrating this works:
# construct a matrix
> mymatrix <- array(runif(10*2),dim=c(10,2))
# give it column names X and Y
> colnames(mymatrix)<-c('X','Y')
> mymatrix
X Y
[1,] 0.07346608 0.81321578
[2,] 0.09525474 0.17852467
[3,] 0.81246522 0.45747972
[4,] 0.01286714 0.82517127
[5,] 0.77554012 0.87725725
[6,] 0.71908435 0.71628493
[7,] 0.13212848 0.67827601
[8,] 0.65993809 0.01650703
[9,] 0.11385161 0.99433644
[10,] 0.22750439 0.45611635
# plot Y vs X -- note you need to convert the matrix to a data frame first.
> plot(Y~X,data.frame(mymatrix))