I'm trying to learn how to use the apply() functions.
Suppose we have a 3 row, 2 column matrix of test <- matrix(c(1,2,3,4,5,6), ncol = 2), and we would like the maximum value of each element in the first column (1, 2, 3) to not exceed 2 for example, so we end up with a matrix of (1,2,2,4,5,6).
How would one write an apply() function to do this?
Here's my latest attempt: test1 <- apply(test[,1], 2, function(x) {if(x > 2){return(x = 2)} else {return(x)}})
We can use pmin() on the first column with 2 as the second argument. The 2 is recycled, so pmin() compares it with each element of the column and keeps the smaller value:
test[,1] <- pmin(test[,1], 2)
-output
> test
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 2 6
Note that apply() needs 'X' to be an array or matrix, i.e. an object with dimensions. When we subset a single column or row, the dimensions are dropped because drop = TRUE by default, so test[,1] is a plain vector and apply() fails on it.
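A minimal sketch of the dimension-dropping behaviour described above, using the same test matrix. The variable names col_vec, col_mat and capped are only illustrative:

```r
test <- matrix(c(1, 2, 3, 4, 5, 6), ncol = 2)

col_vec <- test[, 1]                # drop = TRUE by default: a plain vector
col_mat <- test[, 1, drop = FALSE]  # still a 3 x 1 matrix

dim(col_vec)  # NULL, so apply() has no margin to iterate over
dim(col_mat)  # 3 1

# With the dimensions kept, apply() over columns works again
capped <- apply(col_mat, 2, function(x) pmin(x, 2))
capped
```

In practice pmin(test[,1], 2) is still the shorter route; the apply() call is shown only to illustrate why drop = FALSE matters.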
If you really want to use the apply() function, I guess you're looking for something like this:
t(apply(test, 1, function(x) c(min(x[1], 2), x[2])))
## [,1] [,2]
## [1,] 1 4
## [2,] 2 5
## [3,] 2 6
But if you want my opinion, akrun's suggestion is definitely better.
Suppose we have a "test" matrix that looks like this: (1,2,3, 4,5,6, 7,8,9, 10,11,12) generated by running test <- matrix(1:12, ncol = 4). A simple 3 x 4 (rows x columns) matrix of numbers running from 1 to 12.
Now suppose we'd like to add a value of 1 to each element in each odd-numbered matrix column, so we end up with a matrix of the following values: (2,3,4, 4,5,6, 8,9,10, 10,11,12). How would we use an apply() function to do this?
Note that this is a simplified example. In the more complete code I'm working with, the matrix dynamically expands/contracts based on user inputs so I need an apply() function that counts the actual number of matrix columns, rather than using a fixed assumption of 4 columns per the above example. (And I'm not adding a value of 1 to the elements; I'm running the parallel minima function test[,1] <- pmin(test1[,1], 5) to say limit each value to a max of 5).
With my current limited understanding of the apply() family of functions, all I can so far do is apply(test, 2, function(x) {return(x+1)}) but this is adding a value of 1 to all elements in all columns rather than only the odd-numbered columns.
You may simply subset the input data frame to access only odd or even numbered columns. Consider:
test[c(TRUE, FALSE)] <- apply(test[c(TRUE, FALSE)], 2, function(x) f(x))
test[c(FALSE, TRUE)] <- apply(test[c(FALSE, TRUE)], 2, function(x) f(x))
This works because the recycling rules in R cause e.g. c(TRUE, FALSE) to be repeated as many times as needed to cover all columns in the input test data frame.
For a matrix, we need to use the drop=FALSE flag when subsetting the matrix in order to keep it in matrix form when using apply():
test <- matrix(1:12, ncol = 4)
test[,c(TRUE, FALSE)] <- apply(test[,c(TRUE, FALSE),drop=FALSE], 2, function(x) x+1)
test
[,1] [,2] [,3] [,4]
[1,] 2 4 8 10
[2,] 3 5 9 11
[3,] 4 6 10 12
^ ^ ... these columns incremented by 1
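The question mentions that the real task is capping values with pmin() rather than adding 1, on a matrix whose column count varies. A possible sketch under those assumptions follows; the cap of 5 comes from the question, while the name odd is just illustrative. Since pmin() is already vectorised over a matrix, apply() is not strictly needed here:

```r
test <- matrix(1:12, ncol = 4)

# Logical index of odd-numbered columns, computed from the actual column count
odd <- seq_len(ncol(test)) %% 2 == 1

# Cap every value in the odd-numbered columns at 5
test[, odd] <- pmin(test[, odd, drop = FALSE], 5)
test
```

Here column 1 (1, 2, 3) is unchanged, column 3 (7, 8, 9) becomes all 5s, and the even-numbered columns are untouched.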
You may use modulo %% 2.
odd <- !seq(ncol(test)) %% 2 == 0
test[, odd] <- apply(test[, odd], 2, function(x) {return(x + 1)})
# [,1] [,2] [,3] [,4]
# [1,] 2 4 8 10
# [2,] 3 5 9 11
# [3,] 4 6 10 12
I am trying to understand how the drop() function is used. The documentation says that a matrix or array can be the input object, but when I try it the size of the matrix does not change. Can someone explain its actual usage and how it works?
I am using R version 3.2.1. Code snippet:
data1 <- matrix(data=(1:10),nrow=1,ncol=1)
drop(data1)
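As a rough illustration (not taken from the original thread): drop() deletes only those dimensions of an array whose extent is 1, so it has a visible effect only when such a dimension exists. The objects m, a and full below are made up for the example:

```r
m <- matrix(1:10, nrow = 1)  # 1 x 10 matrix
dim(m)                       # 1 10
v <- drop(m)                 # the extent-1 row dimension is removed
dim(v)                       # NULL: v is now a plain vector of length 10

a <- array(1:6, dim = c(1, 2, 3))
dim(drop(a))                 # 2 3

full <- matrix(1:6, nrow = 2)
identical(drop(full), full)  # TRUE: no extent-1 dimension, nothing to drop
```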
R has factors, which are very cool (and somewhat analogous to labeled levels in Stata). Unfortunately, the factor list sticks around even if you remove some data such that no examples of a particular level still exist.
# Create some fake data
x <- as.factor(sample(head(colors()),100,replace=TRUE))
levels(x)
x <- x[x!="aliceblue"]
levels(x) # still the same levels
table(x) # even though one level has 0 entries!
The solution is simple: run factor() again:
x <- factor(x)
levels(x)
If you need to do this on many factors at once (as is the case with a data.frame containing several columns of factors), use drop.levels() from the gdata package:
library(gdata) # provides drop.levels()
x <- x[x!="antiquewhite1"]
df <- data.frame(a=x,b=x,c=x)
df <- drop.levels(df)
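Base R also ships droplevels(), which does the same job for a single factor or for every factor column of a data frame without an extra package. A small sketch with made-up colour data:

```r
x <- factor(c("red", "green", "blue", "green"))
df <- data.frame(a = x, b = x)

df <- df[df$a != "blue", ]  # remove all rows with the "blue" level
levels(df$a)                # "blue" "green" "red": unused level still listed

df <- droplevels(df)        # drops unused levels in every factor column
levels(df$a)                # "green" "red"
```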
An R matrix is a two-dimensional array. R has many operators and functions that make matrix handling very convenient.
Matrix assignment:
>A <- matrix(c(3,5,7,1,9,4),nrow=3,ncol=2,byrow=TRUE)
>A
[,1] [,2]
[1,] 3 5
[2,] 7 1
[3,] 9 4
Matrix row and column count:
>rA <- nrow(A)
>rA
[1] 3
>cA <- ncol(A)
>cA
[1] 2
The t() function returns the transpose of A:
>B <- t(A)
>B
[,1] [,2] [,3]
[1,] 3 7 9
[2,] 5 1 4
Element-wise multiplication (the * operator multiplies matching elements; the true matrix product uses %*%):
>C <- A * A
>C
[,1] [,2]
[1,] 9 25
[2,] 49 1
[3,] 81 16
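Note that * works element by element. For the linear-algebra product, R provides the %*% operator, which requires conformable dimensions. A brief sketch using the same A (the names elementwise and product are illustrative):

```r
A <- matrix(c(3, 5, 7, 1, 9, 4), nrow = 3, ncol = 2, byrow = TRUE)

elementwise <- A * A     # 3 x 2: each entry squared
product <- t(A) %*% A    # (2 x 3) %*% (3 x 2) gives a 2 x 2 matrix
product
#      [,1] [,2]
# [1,]  139   58
# [2,]   58   42

# A %*% A would fail here: a 3 x 2 matrix cannot be multiplied by a 3 x 2 matrix
```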
Matrix Addition:
>C <- A + A
>C
[,1] [,2]
[1,] 6 10
[2,] 14 2
[3,] 18 8
Matrix subtraction (-) and division (/) operations ... ...
Sometimes a matrix needs to be sorted by a specific column, which can be done with the order() function.
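As a quick in-memory sketch of the idea: order() returns the row positions that would sort a column, and indexing the matrix with those positions reorders whole rows. The names row_order, sorted and sorted_desc are illustrative:

```r
A <- matrix(c(3, 5, 7, 1, 9, 4), nrow = 3, ncol = 2, byrow = TRUE)

row_order <- order(A[, 2])  # positions that sort column 2 ascending: 2 3 1
sorted <- A[row_order, ]    # rows become (7,1), (9,4), (3,5)

# Descending order on the same column
sorted_desc <- A[order(A[, 2], decreasing = TRUE), ]
```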
The following is an example CSV file:
,t1,t2,t3,t4,t5,t6,t7,t8
r1,1,0,1,0,0,1,0,2
r2,1,2,5,1,2,1,2,1
r3,0,0,9,2,1,1,0,1
r4,0,0,2,1,2,0,0,0
r5,0,2,15,1,1,0,0,0
r6,2,2,3,1,1,1,0,0
r7,2,2,3,1,1,1,0,1
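The following R code reads the above file into a data frame (read.csv() returns a data frame, not a matrix), sorts it by its fourth column (t3, because the first column holds the row labels), and then writes it to an output file:
x <- read.csv("sortmatrix.csv", header = TRUE, sep = ",")
x <- x[order(x[, 4]), ]
write.table(x, file = "tp.txt", sep = ",")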
The result is:
"X","t1","t2","t3","t4","t5","t6","t7","t8"
"1","r1",1,0,1,0,0,1,0,2
"4","r4",0,0,2,1,2,0,0,0
"6","r6",2,2,3,1,1,1,0,0
"7","r7",2,2,3,1,1,1,0,1
"2","r2",1,2,5,1,2,1,2,1
"3","r3",0,0,9,2,1,1,0,1
"5","r5",0,2,15,1,1,0,0,0
DROP FUNCTION supports natively compiled, scalar user-defined functions.
Removes one or more user-defined functions from the current database.
To execute DROP FUNCTION, at a minimum, a user must have ALTER permission on the schema to which the function belongs, or CONTROL permission on the function.
DROP FUNCTION will fail if there are Transact-SQL functions or views in the database that reference this function and were created by using SCHEMABINDING, or if there are computed columns, CHECK constraints, or DEFAULT constraints that reference the function.
DROP FUNCTION will fail if there are computed columns that reference this function and have been indexed.
DROP FUNCTION { [ schema_name. ] function_name } [ ,...n ]
(edit note: I changed the Title to "R: enumerate column combinations of a matrix", from "R grep: matching a matrix of strings to a list" to better reflect the solution)
I am trying to match a matrix of strings to a list, so that I can ultimately use the matrix as a map in later operations on a data.frame.
This first part works as intended, returning a list of all the possible pairs, triples and quad combinations (though perhaps this approach has created my bind?):
priceList <- data.frame(aaa = rnorm(100, 100, 10), bbb = rnorm(100, 100, 10),
ccc = rnorm(100, 100, 10), ddd = rnorm(100, 100, 10),
eee = rnorm(100, 100, 10), fff = rnorm(100, 100, 10),
ggg = rnorm(100, 100, 10))
getTrades <- function(dd, Maxleg=3)
{
nodes <- colnames(dd)
tradeList <- list()
for (i in 2:Maxleg){
tradeLeg <- paste0('legs',i)
tradeList[[tradeLeg]] <- combn(nodes, i)
}
return(tradeList)
}
tradeCombos <- getTrades(priceList, 4)
I'd now like to turn this list of possible combinations into trades. For example:
> tradeCombos[[1]][,1]
[1] "aaa" "bbb"
Needs to eventually become priceList[,2] - priceList[,1], and so forth.
I have tried a few approaches with grep and similar commands, and feel that I've come close with the following:
LocList <- sapply(tradeCombos[[1]], regexpr, colnames(priceList))
However the format is not quite suitable for the next step.
Ideally, LocList[1] would return something like: 1 2
Assuming that the tradeCombos[[1]][,1] == "aaa" "bbb".
Can someone please help?
__
With help from all of the answers below, I've now got:
colDiff <- function(x)
{
Reduce('-', rev(x))
}
getTrades <- function(dd, Maxleg=3)
{
tradeList <- list()
for (i in 2:Maxleg){
tradeLeg <- paste0('legs',i)
tradeLegsList <- combn(names(dd), i,
function(x) dd[x], simplify = FALSE)
nameMtx <- combn(names(dd), i)
names(tradeLegsList) <- apply(nameMtx, MARGIN=2,
FUN=function(x) paste(rev(x), collapse='*'))
tradeList[[tradeLeg]] <- lapply(tradeLegsList, colDiff)
}
return(tradeList)
}
tradeCombos <- getTrades(priceList, 4)
This retains the names of the constituent parts, and is everything I was trying to achieve.
Many thanks to all for the help.
Whoa... ignore everything below and jump to the update
As mentioned in my comment, you can just use combn. This solution doesn't take you to your very last step, but instead, creates a list of data.frames. From there, it is easy to use lapply to get to whatever your final step would be.
Here's the simplified function:
TradeCombos <- function(dd, MaxLeg) {
combos = combn(names(dd), MaxLeg)
apply(combos, 2, function(x) dd[x])
}
To use it, just specify your dataset and the number of combinations you're looking for.
TradeCombos(priceList, 3)
TradeCombos(priceList, 4)
Moving on: @mplourde has shown you how to use Reduce to successively subtract. A similar approach would be taken here:
cumDiff <- function(x) Reduce("-", rev(x))
lapply(TradeCombos(priceList, 3), cumDiff)
By keeping the output of the TradeCombos function as a list of data.frames, you'll be leaving more room for flexibility. For instance, if you wanted row sums, you can simply use lapply(TradeCombos(priceList, 3), rowSums); similar approaches can be taken for whatever function you want to apply.
Update
I'm not sure why @GSee didn't add this as an answer, but I think it's pretty awesome:
Get your list of data.frames as follows:
combn(names(priceList), 3, function(x) priceList[x], simplify = FALSE)
Advance as needed. (For example, using the cumDiff function we created: combn(names(priceList), 2, function(x) cumDiff(priceList[x]), simplify = FALSE).)
This gets your eventual aim using lapply, apply, and Reduce.
lapply(tradeCombos,
function(combos)
apply(combos, MARGIN=2, FUN=function(combo) Reduce('-', priceList[rev(combo)])))
combo is a column from one of the combo matrices in tradeCombos. rev(combo) reverses the column so the last value is first. The R syntax for selecting a subset of columns from a data.frame is DF[col.names], so priceList[rev(combo)] is a subset of priceList with just the columns in combo, in reverse order. data.frames are actually just lists of columns, so any function that's designed to iterate over lists can be used to iterate over the columns in a data.frame. Reduce is one such function. Reduce takes a function (in this case the subtract function -) and a list of arguments and then successively calls the function on the arguments in the list with the results of the previous call, e.g., (((arg1 - arg2) - arg3) - arg4).
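To make the left-to-right folding concrete, here is a small sketch with made-up numbers and a toy data frame (prices, final, steps and diff_cols are illustrative names, not part of the original answer):

```r
# Reduce folds from the left: (10 - 3) - 2
final <- Reduce(`-`, c(10, 3, 2))                     # 5
steps <- Reduce(`-`, c(10, 3, 2), accumulate = TRUE)  # intermediate results: 10, 7, 5

# A data.frame is a list of columns, so the same fold works column-wise
prices <- data.frame(aaa = c(1, 2), bbb = c(10, 20), ccc = c(100, 200))
combo <- c("aaa", "bbb", "ccc")
diff_cols <- Reduce(`-`, prices[rev(combo)])          # ccc - bbb - aaa
diff_cols                                             # 89 178
```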
You can rename the columns in tradeCombos so that the final column names reflect their source with:
tradeCombos <- lapply(tradeCombos,
function(combos) {
dimnames(combos)[[2]] <- apply(combos,
MARGIN=2,
FUN=function(combo) paste(rev(combo), collapse='-')
)
return(combos)
}
)
tradeCombos is a list with matrix elements. Therefore, tradeCombos[[1]] is a matrix for which apply is more suitable.
apply(tradeCombos[[1]],1,function(x) match(x,names(priceList)))
[,1] [,2]
[1,] 1 2
[2,] 1 3
[3,] 1 4
[4,] 1 5
[5,] 1 6
[6,] 1 7
[7,] 2 3
[8,] 2 4
[9,] 2 5
[10,] 2 6
[11,] 2 7
[12,] 3 4
[13,] 3 5
[14,] 3 6
[15,] 3 7
[16,] 4 5
[17,] 4 6
[18,] 4 7
[19,] 5 6
[20,] 5 7
[21,] 6 7
Incidentally, you can subset using the string form anyway, e.g. priceList[,"aaa"]
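For illustration, the name-to-position lookup can be tried on a toy data frame. The prices data and the names pairs, idx and first_trade are assumptions made for this sketch, standing in for priceList and tradeCombos[[1]]:

```r
prices <- data.frame(aaa = c(1, 2), bbb = c(10, 20), ccc = c(100, 200))

pairs <- combn(names(prices), 2)  # 2 x 3 matrix: one pair of names per column

# match() returns the column position of each name; restore the 2 x 3 shape
idx <- matrix(match(pairs, names(prices)), nrow = nrow(pairs))
idx[, 1]                          # 1 2, i.e. "aaa" "bbb"

# Second leg minus first leg for the first pair (bbb - aaa)
first_trade <- prices[, idx[2, 1]] - prices[, idx[1, 1]]
first_trade                       # 9 18
```

If only the positions are needed, combn(seq_along(prices), 2) produces the same index matrix directly.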
I want to find the maximum value in each column for every 2 rows (say). How can I do that in R? For example, given
matrix(c(3,1,20,5,4,12,6,2,9,7,8,7), byrow=T, ncol=3)
I want the output like this
matrix(c(5,4,20,7,8,9), byrow=T, ncol=3)
Here is one way of doing it.
Define a vector that contains information about the groups you want. In this case, I use rep to repeat a sequence of numbers.
Then define a helper function to calculate the column maximum of an array — this is a simple apply of max.
Finally, use sapply with an anonymous function that applies colMax to each of your grouped array subsets.
The code:
x <- matrix(c(3,1,20,5,4,12,6,2,9,7,8,7), byrow=TRUE, ncol=3) # input matrix from the question
groups <- rep(1:2, each=2)
colMax <- function(x) apply(x, 2, max)
t(
sapply(unique(groups), function(i) colMax(x[which(groups==i), ]))
)
The results:
[,1] [,2] [,3]
[1,] 5 4 20
[2,] 7 8 9
As a one-liner:
t(sapply(seq(1,nrow(df1),by=2),function(i) apply(df1[seq(i,1+i),],2,max)))
Another option,
do.call(rbind, by(m, gl(nrow(m)/2, 2), function(x) apply(x, 2, max)))
apply(mat, 2, function(x) tapply(x, # work on each column
# group labels of the proper length, two rows per group: 1,1,2,2,3,3,4,4 ....
rep(1:(length(x)/2), each=2, len=length(x)),
max))
[,1] [,2] [,3]
1 5 4 20
2 7 8 9
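The answers above can be folded into one reusable helper. The following is only a sketch: block_col_max and its k argument are hypothetical names, and it assumes a numeric matrix. It builds the group labels from the row count, so a final block with fewer than k rows is still handled:

```r
# Column-wise maximum over consecutive blocks of k rows (hypothetical helper)
block_col_max <- function(m, k) {
  groups <- ceiling(seq_len(nrow(m)) / k)    # 1,1,2,2,... for k = 2
  blocks <- split(seq_len(nrow(m)), groups)  # row indices per block
  do.call(rbind, lapply(blocks, function(rows) {
    apply(m[rows, , drop = FALSE], 2, max)   # drop = FALSE keeps 1-row blocks as matrices
  }))
}

m <- matrix(c(3,1,20,5,4,12,6,2,9,7,8,7), byrow = TRUE, ncol = 3)
res <- block_col_max(m, 2)
res
#   [,1] [,2] [,3]
# 1    5    4   20
# 2    7    8    9
```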