Working with matrices in R

I'm working on code to construct an option pricing matrix. At the moment I only have the values on the main diagonal of a 4 x 4 matrix, and I want to use those diagonal values to fill in the lower triangle. For my matrix Omat, Omat[1,1] + Omat[2,2] gives the value for Omat[2,1], and Omat[2,2] + Omat[3,3] gives the value for Omat[3,2]. Those new values are then combined the same way: Omat[2,1] + Omat[3,2] gives the value for Omat[3,1].
My attempt:
Omat = diag(2, 4, 4)
Omat[j+i,j] <- Omat[i-1,j]+Omat[i,j+1]
Any ideas on how one could go about this?
What I currently have (a 4 x 4 matrix):
Omat
# 2 0 0 0
# 0 2 0 0
# 0 0 2 0
# 0 0 0 2
What I'm trying to create (also 4 x 4):
0 0 0 0
4 0 0 0
8 4 0 0
16 8 4 0

You could try calculating successive diagonals underneath the main diagonal. Code could look like:
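Omat = diag(2,4)
# i = distance below the main diagonal; fill one sub-diagonal at a time
for(i in 1:(nrow(Omat)-1)) {
  for( j in (i+1):nrow(Omat)) {
    # each entry = the entry to its right + the entry above it
    Omat[j,j-i] <- Omat[j,j-i+1] + Omat[j-1,j-i]
  }
}
diag(Omat) <- 0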

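I am probably missing something, but why not fill the lower triangle row by row:
Omat <- diag(2, 4)
for (i in 2:nrow(Omat)) {
  # go right to left so that Omat[i, j+1] is already filled in when it is needed
  for (j in (i-1):1) {
    Omat[i,j] <- Omat[i-1,j] + Omat[i,j+1]
  }
}
diag(Omat) <- 0
David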
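To check either loop against the example, the sub-diagonal idea from the first answer can be wrapped in a small function that accepts any vector of diagonal values. This is a sketch; the name fillLower is made up for illustration. With a constant diagonal c, each new entry is the sum of two equal entries from the previous sub-diagonal, so entry [i, j] works out to c * 2^(i - j), which gives an independent check.

```r
# Hypothetical helper: fill the strict lower triangle of an n x n matrix
# from diagonal values d, one sub-diagonal at a time, then zero the diagonal.
fillLower <- function(d) {
  n <- length(d)
  m <- diag(d, n)
  if (n > 1) {
    for (k in 1:(n - 1)) {          # k = distance below the main diagonal
      for (i in (k + 1):n) {
        j <- i - k
        # entry to the right plus entry above, both on sub-diagonal k - 1
        m[i, j] <- m[i, j + 1] + m[i - 1, j]
      }
    }
  }
  diag(m) <- 0
  m
}

M <- fillLower(rep(2, 4))
print(M)
# Closed-form check for a constant diagonal of 2: entry [i, j] = 2 * 2^(i - j)
all(M[lower.tri(M)] == (2 * 2^(row(M) - col(M)))[lower.tri(M)])
```

The same function works for a non-constant diagonal, e.g. fillLower(c(1, 2, 3)).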

Related

Populate Binary Matrix with Double For Loop in R

I'm working on populating a binary matrix based on values from a different table. I can create the matrix but am struggling with the looping needed to populate it. I think this is a pretty simple issue so I hope I can get some easy help.
Here's an example of my data:
start <- c(291, 291, 291, 702, 630, 768)
sequence <- c("chr9:103869456:103870456", "chr5:30823103:30824103", "chr11:49801703:49802703", "chr4:133865601:133866601", "chr12:55738034:55739034", "chr8:96569493:96570493")
motif <- c("ARI5B", "ARI5B", "ARI5B", "ATOH1", "EGR1", "EGR1")
df <- data.frame(start, sequence, motif)
I have created a character vector of the unique motif_start combinations like so:
x <- sprintf("%s_%d", df$motif, df$start)
x <- unique(x)
Next I create a binary matrix with the sequences as rows and the values from x as columns:
binmat <- matrix(0, nrow = length(df$sequence), ncol = length(x))
rownames(binmat) <- df$sequence
colnames(binmat) <- x
And now I'm stuck. I want to iterate through columns and rows and put a 1 in each position that has a match. For example, the first sequence is "chr9:103869456:103870456" and it has motif "ARI5B" at starting position 291, so it should get a 1 while the rest of the values in that row remain at 0. The output of this example should look like this:
ARI5B_291 ATOH1_702 EGR1_630 EGR1_768
chr9:103869456:103870456 1 0 0 0
chr5:30823103:30824103 1 0 0 0
chr11:49801703:49802703 1 0 0 0
chr4:133865601:133866601 0 1 0 0
chr12:55738034:55739034 0 0 1 0
chr8:96569493:96570493 0 0 0 1
But so far I am unsuccessful. I think I need a double for loop somewhere along these lines:
for (row in binmat){
  for (col in binmat){
    if (row && col %in% x){
      1
    } else {
      0
    }
  }
}
But all I get are 0s.
Thanks in advance!
Aren't you just looking for table here? You can get the result as a vectorized one-liner, without loops, by doing:
table(factor(df$sequence, levels = unique(df$sequence)), sprintf("%s_%d", df$motif, df$start))
ARI5B_291 ATOH1_702 EGR1_630 EGR1_768
chr9:103869456:103870456 1 0 0 0
chr5:30823103:30824103 1 0 0 0
chr11:49801703:49802703 1 0 0 0
chr4:133865601:133866601 0 1 0 0
chr12:55738034:55739034 0 0 1 0
chr8:96569493:96570493 0 0 0 1
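If you would rather fill a binmat like the one in the question than build a table, a loop is still unnecessary: R can index a matrix with a two-column character matrix whose entries are matched against the row and column names. A minimal sketch using the question's example data (seqs and key are illustrative names; rows are built from unique sequences so that a repeated sequence does not create duplicate row names):

```r
# Example data from the question
start <- c(291, 291, 291, 702, 630, 768)
sequence <- c("chr9:103869456:103870456", "chr5:30823103:30824103",
              "chr11:49801703:49802703", "chr4:133865601:133866601",
              "chr12:55738034:55739034", "chr8:96569493:96570493")
motif <- c("ARI5B", "ARI5B", "ARI5B", "ATOH1", "EGR1", "EGR1")
df <- data.frame(start, sequence, motif)

# as.character() guards against factor columns (stringsAsFactors = TRUE in R < 4.0)
seqs <- as.character(df$sequence)
key <- sprintf("%s_%d", as.character(df$motif), df$start)   # e.g. "ARI5B_291"

# Empty 0/1 matrix with one row per sequence and one column per motif_start
binmat <- matrix(0, nrow = length(unique(seqs)), ncol = length(unique(key)),
                 dimnames = list(unique(seqs), unique(key)))

# Each row of cbind(seqs, key) is a (row name, column name) pair to set to 1
binmat[cbind(seqs, key)] <- 1
binmat
```

Unlike table(), this sets matching cells to 1 rather than counting them.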

Biclustering in R

I want to apply biclustering to a binary matrix in R. There is a nice package called "biclust", but it does not do or display everything I want.
I have a binary matrix which looks like the following:
1 0 0 1 0 1 0
0 0 0 0 0 0 0
0 0 1 0 1 0 0
1 0 0 1 0 1 0
0 0 1 0 1 0 0
1 0 0 1 0 1 0
0 0 0 0 0 0 0
My goal is to bicluster (and display) it as follows (possibly colored):
1 1 1 0 0 0 0
1 1 1 0 0 0 0
1 1 1 0 0 0 0
0 0 0 1 1 0 0
0 0 0 1 1 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
Set up code:
# install.packages("biclust") (if necessary)
library("biclust")
testMatrix <- matrix(c(1,0,0,1,0,1,0,
0,0,0,0,0,0,0,
0,0,1,0,1,0,0,
1,0,0,1,0,1,0,
0,0,1,0,1,0,0,
1,0,0,1,0,1,0,
0,0,0,0,0,0,0),
nrow = 7,
ncol = 7,
byrow = TRUE)
I applied the biclust function of the "biclust" R package:
testCluster <- biclust(x = testMatrix, method=BCBimax())
and indeed I get the two clusters expected:
An object of class Biclust
call:
biclust(x = testMatrix, method = BCBimax())
Number of Clusters found: 2
First 2 Cluster sizes:
BC 1 BC 2
Number of Rows: 3 2
Number of Columns: 3 2
I can display the clusters separately with:
drawHeatmap(x = testMatrix, bicResult = testCluster, number = 1)
drawHeatmap(x = testMatrix, bicResult = testCluster, number = 2)
and I can display the entire clustered matrix (with one cluster in the upper-left corner) with:
drawHeatmap2(x = testMatrix, bicResult = testCluster, number = 1)
drawHeatmap2(x = testMatrix, bicResult = testCluster, number = 2)
So far so good, but I want:
(a) The display colors switched. Currently 1 is red and 0 is green.
(b) To see the row and column numbers of the original matrix. Currently drawHeatmap shows only the row and column numbers within the specific cluster, and drawHeatmap2 shows no row or column numbers at all.
(c) A nicely ordered clustered matrix. Currently only the cluster specified in drawHeatmap2 is shown in the upper-left corner, but I also want the other clusters ordered, running from the upper-left corner to the lower-right corner.
Are these changes possible (with the "biclust" package)? Or is it better to do it in another way with R?
Edit the drawHeatmap() function from the biclust package, for example with:
trace("drawHeatmap", edit = TRUE)
Change the following:
(a) Switch red and green: swap rvect and gvect in the call to rgb().
(b) Show the original row and column numbers instead of the new ones: change the axis labels to labels=bicCols and labels=bicRows.
Print the row numbers: add cat(bicRows) before the axis() call for the rows.
Save the row numbers to a file: add write(bicRows, file="FILENAME.txt") before the axis() call for the rows.
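The answer covers the colors and labels but not wish (c). As a base-R sketch (not part of biclust), the matrix can be reordered so that every bicluster forms a block along the diagonal, given cluster membership as logical matrices. The helper orderByBiclusters is hypothetical; its arguments use the same layout as the RowxNumber and NumberxCol slots of a Biclust object, and a row or column belonging to several clusters is placed with the first one. The memberships below are written out by hand, assuming the two clusters reported above are rows/columns 1, 4, 6 and 3, 5 (the only all-ones blocks of those sizes).

```r
# Hypothetical helper: reorder x so each bicluster forms a diagonal block.
# rowMember: rows x clusters; colMember: clusters x columns.
orderByBiclusters <- function(x, rowMember, colMember) {
  nClust <- ncol(rowMember)
  # keep the original row and column numbers as labels
  if (is.null(rownames(x))) rownames(x) <- seq_len(nrow(x))
  if (is.null(colnames(x))) colnames(x) <- seq_len(ncol(x))
  # first cluster each row (or column) belongs to; unclustered ones sort last
  firstCluster <- function(member) {
    apply(member, 1, function(v) {
      v <- as.logical(v)
      if (any(v)) which(v)[1] else nClust + 1
    })
  }
  rowKey <- firstCluster(rowMember)
  colKey <- firstCluster(t(colMember))
  x[order(rowKey), order(colKey), drop = FALSE]
}

testMatrix <- matrix(c(1,0,0,1,0,1,0,
                       0,0,0,0,0,0,0,
                       0,0,1,0,1,0,0,
                       1,0,0,1,0,1,0,
                       0,0,1,0,1,0,0,
                       1,0,0,1,0,1,0,
                       0,0,0,0,0,0,0),
                     nrow = 7, ncol = 7, byrow = TRUE)

# Hand-written memberships (assumption); with biclust you would pass
# testCluster@RowxNumber and testCluster@NumberxCol instead.
rowMember <- cbind(1:7 %in% c(1, 4, 6), 1:7 %in% c(3, 5))
colMember <- rbind(1:7 %in% c(1, 4, 6), 1:7 %in% c(3, 5))

ordered <- orderByBiclusters(testMatrix, rowMember, colMember)
ordered
```

The row and column names of the result are the original indices. To draw it with 0 in red and 1 in green, something like image(t(ordered[nrow(ordered):1, ]), col = c("red", "green")) should work, since image() maps the lowest value to the first color.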

transition matrix force ncol to equal nrows

I have created a transition matrix as a 'from cluster' (rows) 'to cluster' (columns) frequency. Think Markov chain.
Assume I have 5 'from' clusters but only 3 'to' clusters; then I get a 5*3 transition matrix. How do I force it to be a 5*5 transition matrix? In effect, how do I show the all-zero columns?
I'm after an elegant solution, as this will be applied to a much larger problem involving hundreds of clusters. I'm quite unfamiliar with R matrices, and the only way I know to force the number of columns to equal the number of rows and fill in zeros where there is no match is a for loop, which I suspect is not the best solution.
Example code:
# example data
cluster_before <- c(1,2,3,4,5)
cluster_after <- c(1,2,4,4,1)
# Table output
table(cluster_before,cluster_after)
# ncol does not = nrows. I want to rectify that
# I want output to look like this:
what_I_want <- matrix(
c(1,0,0,0,0,
0,1,0,0,0,
0,0,0,1,0,
0,0,0,1,0,
1,0,0,0,0),
byrow=TRUE,ncol=5
)
# Possible solution. But for loop can't be best solution?
empty_mat <- matrix(0,ncol=5,nrow=5)
matrix_to_update <- empty_mat
for (i in 1:length(cluster_before)) {
  val_before <- cluster_before[i]
  val_after <- cluster_after[i]
  matrix_to_update[val_before,val_after] <- matrix_to_update[val_before,val_after]+1
}
matrix_to_update
# What's the more elegant solution?
Thanks in advance for your help. It's much appreciated.
Make them factors and then table:
levs <- union(cluster_before, cluster_after)
table(factor(cluster_before,levs), factor(cluster_after,levs))
# 1 2 3 4 5
# 1 1 0 0 0 0
# 2 0 1 0 0 0
# 3 0 0 0 1 0
# 4 0 0 0 1 0
# 5 1 0 0 0 0
Another solution is to use matrix indexing:
what_I_want <- matrix(0,ncol=5,nrow=5)
what_I_want[cbind(cluster_before,cluster_after)] <- 1
print(what_I_want)
## [,1] [,2] [,3] [,4] [,5]
##[1,] 1 0 0 0 0
##[2,] 0 1 0 0 0
##[3,] 0 0 0 1 0
##[4,] 0 0 0 1 0
##[5,] 1 0 0 0 0
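The second line sets to 1 the elements whose row (cluster_before) and column (cluster_after) indices are given. Unlike the for loop and table(), it does not count repeated (before, after) pairs; such a cell is still just 1.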
Hope this helps.
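If counts are needed without table() or a loop, one option is to turn each (before, after) pair into its column-major linear index in a k x k matrix and count those indices with tabulate(). A sketch, assuming the clusters are labelled 1..k:

```r
cluster_before <- c(1, 2, 3, 4, 5)
cluster_after  <- c(1, 2, 4, 4, 1)

k <- max(cluster_before, cluster_after)   # assumes clusters are labelled 1..k
# Cell [before, after] of a k x k matrix has column-major index (after - 1) * k + before
idx <- (cluster_after - 1) * k + cluster_before
counts <- matrix(tabulate(idx, nbins = k * k), nrow = k, ncol = k)
counts
```

Repeated pairs are counted, like the loop in the question. Dividing each row by its sum, e.g. prop.table(counts, 1), would give Markov transition probabilities, though rows with no outgoing transitions then become NaN.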

Matrix of booleans based on quantile in R

I have a matrix whose columns are stock returns and whose rows are dates, which looks like this:
ES1.Index VG1.Index TY1.Comdty RX1.Comdty GC1.Comdty
1999-01-05 0.009828476 0.012405717 -0.003058466 -0.0003480884 -0.001723317
1999-01-06 0.021310816 0.027030061 0.001883240 0.0017392317 0.002425398
1999-01-07 -0.001952962 -0.016130850 -0.002826191 -0.0011591516 0.013425435
1999-01-08 0.007989946 -0.004071275 -0.005913678 0.0016224363 -0.001363540
I'd like a function that returns a matrix with the same row and column names, filled with 1s and 0s according to whether each observation, within its row, falls in a group bounded by two given quantiles.
For example, I might want to divide each row into 3 groups and put 1s for all observations in the 2nd group and 0s elsewhere. The result would look something like:
ES1.Index VG1.Index TY1.Comdty RX1.Comdty GC1.Comdty
1999-01-05 0 0 1 1 0
1999-01-06 1 0 0 1 0
1999-01-07 0 1 0 0 1
1999-01-08 0 0 1 0 1
(The 1s and 0s in my example are meant to be just a visual outcome, the numbers aren't accurate)
What is the least verbose way to do that?
Taking the intermediate steps of finding the quantiles and testing against them is not necessary. Only the ordinal properties of each vector matter.
# set bounds
lb = 1/3
ub = 2/3
# find within-row ranks of the returns matrix m, scaled to (0, 1]
p = t(apply(m,1,rank))/ncol(m)
# test ranks against bounds; +() turns TRUE/FALSE into 1/0
+( p >= lb & p <= ub )
ES1.Index VG1.Index TY1.Comdty RX1.Comdty GC1.Comdty
1999-01-05 0 0 0 1 1
1999-01-06 0 0 1 0 1
1999-01-07 1 0 1 0 0
1999-01-08 0 1 0 0 1
We can use apply with MARGIN = 1 to loop over the rows, cut each row vector at breaks given by its quantiles, and transpose the result.
t(apply(df1, 1, function(x) {
  x1 <- cut(x, breaks = quantile(x, seq(0, 1, 1/3)))
  +(levels(x1)[2] == x1 & !is.na(x1))
}))
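The question asks for a function, so the rank-based answer can be wrapped as one. A sketch with the hypothetical name inGroup: group g of n uses the half-open interval ((g - 1)/n, g/n] on the scaled ranks, so every entry falls in exactly one group. The closed bounds used in the first answer can, for other column counts, put a boundary entry in two adjacent groups.

```r
# Hypothetical helper: 1 where an entry's within-row rank falls in group
# `group` of `ngroups` equal-width rank groups, 0 elsewhere.
inGroup <- function(m, group, ngroups) {
  lb <- (group - 1) / ngroups
  ub <- group / ngroups
  p <- t(apply(m, 1, rank)) / ncol(m)   # scaled ranks in (0, 1]
  +(p > lb & p <= ub)                    # half-open interval, TRUE/FALSE -> 1/0
}

m <- matrix(c( 0.009828476,  0.012405717, -0.003058466, -0.0003480884, -0.001723317,
               0.021310816,  0.027030061,  0.001883240,  0.0017392317,  0.002425398,
              -0.001952962, -0.016130850, -0.002826191, -0.0011591516,  0.013425435,
               0.007989946, -0.004071275, -0.005913678,  0.0016224363, -0.001363540),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("1999-01-05", "1999-01-06", "1999-01-07", "1999-01-08"),
                            c("ES1.Index", "VG1.Index", "TY1.Comdty", "RX1.Comdty", "GC1.Comdty")))

inGroup(m, group = 2, ngroups = 3)
```

For this 5-column example it reproduces the output shown in the first answer.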

referencing indices in boot function

I have a large dataset (DF), a subset of which looks like this:
Site Event HardwareID Species Day1 Day2 Day3 Day4 Day5 Day6
1 1 16_11 x 0 0 0 0 0 0
1 1 29_11 y 0 0 6 2 0 1
1 1 36_11 d 0 0 0 0 0 1
1 1 41_11 y 0 0 2 4 1 1
1 1 41_11 x 0 0 0 0 0 1
1 1 58_11 a 0 0 1 0 0 0
1 1 62_11 y 0 0 0 1 0 0
1 1 62_11 z 0 0 0 0 0 0
1 1 62_11 x 0 0 0 0 0 1
2 1 40_AR b 0 0 0 0 0 0
2 1 12_11 z 0 0 1 0 0 0
I'd like to find the minimum number of HardwareIDs needed to record the most species in the shortest time. To do this I want to calculate species accumulation curves (which intrinsically incorporate the Day columns) for each HardwareID at each site, and bootstrap the HardwareID selection (that is, look at accumulation curves using two HardwareIDs, then 3, then 4, etc., at each site).
I have written a function to create species accumulation curves (using specaccum) for a subset of these, such as:
library(plyr)   # ddply(), numcolwise()
library(gdata)  # drop.levels()
library(vegan)  # specaccum()
Sites<-subset(DF,DF$Site==1)
samples<-function (x) {
  specurve_sample<-(ddply(Sites[,4:length(colnames(Sites))],"Species",numcolwise(sum)))
  specurve_sample<-specurve_sample[-1,]
  n<-specurve_sample$Species
  n<-drop.levels(n,reorder=FALSE)
  specurve_sample<-specurve_sample[,-1]
  specurve_sample <-t(specurve_sample)
  colnames(specurve_sample)<-n
  specurve_sample<-as.data.frame(specurve_sample)
  sample_k<-specaccum(specurve_sample)
  out<-rbind(sample_k$richness,sample_k$sd)
  outnames<-c("Richness","SD")
  st<-rep(Sites$Site[1],2)
  out<-as.data.frame(cbind(outnames,st,out))
  colnames(out)<-c("label","site","Days")
  out
}
The function works fine if I subset my data beforehand, but the bootstrapping part does not work. I know I need to write the function as function(x, j), but I cannot figure out where to use j inside it. Here is the rest of my code. Many thanks for any assistance. James
all_data<-c()
for (i in 1:length(unique(DF$Site))) {
  Sites<-subset(DF,DF$Site==i)
  boots<-boot(Sites,samples, strata=Sites$HardwareID,R=1000)
  all_data<-rbind(all_data,boots)
  all_data
}
One straightforward way to do this is to create a function of x and j (as you have started to do), and have the first line of that function identify the relevant bootstrap subset from the whole collection, bootsub <- x[j, ]. Then, you can refer to this subset, bootsub throughout the rest of the function, and you need not refer to j again.
In your case, you don't want your function to refer back to your original data frame, Sites. So everywhere that you have Sites in your function, change it to bootsub. For example:
samples <- function(x, j) {
  # take the bootstrap subset first, then use only bootsub below
  bootsub <- x[j, ]
  specurve_sample <- ddply(bootsub[, 4:length(colnames(bootsub))], "Species", numcolwise(sum))
  specurve_sample <- specurve_sample[-1, ]
  n <- specurve_sample$Species
  n <- drop.levels(n, reorder=FALSE)
  specurve_sample <- specurve_sample[, -1]
  specurve_sample <- t(specurve_sample)
  colnames(specurve_sample) <- n
  specurve_sample <- as.data.frame(specurve_sample)
  sample_k <- specaccum(specurve_sample)
  # boot() expects the statistic to return a numeric vector,
  # so return richness and SD rather than a labelled data frame
  c(sample_k$richness, sample_k$sd)
}
...
A follow-up to the first two comments on this answer. It's hard to troubleshoot without data, but this is my best guess. There may be an issue with your subset() call: you use i as an index into the unique sites in the for() loop, but then treat i as the site value in the call to subset(). Also, it is likely more efficient to call do.call() once after the for() loop than to call rbind() repeatedly inside it. Give this untested code a try.
library(boot)
# vector of unique sites
usite <- unique(DF$Site)
# empty list in which to put the bootstrap results
alldatlist <- vector("list", length(usite))
# loop through every site separately, save the bootstrap replicates ($t)
for (i in seq_along(usite)) {
  Sites <- subset(DF, DF$Site == usite[i])
  # strata passed as a factor so each HardwareID is a stratum
  alldatlist[[i]] <- boot(Sites, samples, strata = factor(Sites$HardwareID), R = 1000)$t
}
# combine the list of results into a single matrix
all_data <- do.call(rbind, alldatlist)
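A minimal, self-contained sketch of the function(x, j) pattern on hypothetical toy data (not the vegan pipeline above): boot() calls the statistic with the data and a vector of resampled row indices, so the function subsets first, works only on that subset, and returns a numeric vector. The data frame toy and the function meanCount are made up for illustration.

```r
library(boot)

# Hypothetical toy data: a count per row, grouped by a HardwareID-like label
toy <- data.frame(id = rep(c("A", "B", "C"), each = 4),
                  count = c(0, 1, 2, 1, 3, 0, 0, 2, 1, 1, 4, 0))

# boot() calls statistic(data, indices): subset with the indices first,
# then use only that subset, and return a numeric vector
meanCount <- function(x, j) {
  bootsub <- x[j, ]
  c(mean = mean(bootsub$count), total = sum(bootsub$count))
}

set.seed(1)
# strata: rows are resampled within each id, keeping group sizes fixed
res <- boot(toy, meanCount, R = 200, strata = factor(toy$id))
dim(res$t)   # one row per replicate, one column per returned statistic
res$t0       # the statistic on the original data
```

Each element of the returned vector becomes one column of res$t, which is why the statistic should not return a data frame.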
