Circumvent aggregation in for-loop R - r

I want to find the first index k of an array at which the running total up to k exceeds a given cutoff. In code, this looks as follows:
k <- 0
agg <- 0
while (agg < cutoff) {
k <- k +1
agg <- sum(array[1:k])
}
I was told there is a way to rewrite this without the loop, and that the which function would be helpful. I'm new to R and couldn't work out how. Any thoughts on this?

First we find the array of partial sums:
x <- 1:10
partial_sums <- Reduce('+', x, accumulate = TRUE)
partial_sums
[1] 1 3 6 10 15 21 28 36 45 55
Next we find the indices of all the elements of partial_sums that are bigger than cutoff:
cutoff <- 17
indices <- which(partial_sums > cutoff)
indices[1]
[1] 6
Please note that indices can be empty if no partial sum exceeds the cutoff, in which case indices[1] is NA.
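As a minimal sketch of the same idea, the partial sums can also be computed with base R's cumsum. The helper name first_exceeding is hypothetical and not part of the original answer; it returns NA when no partial sum exceeds the cutoff. Note that it uses a strict >, like the answer above, whereas the loop in the question stops as soon as the total is >= cutoff.

```r
# Hypothetical helper: first index k where sum(x[1:k]) > cutoff, or NA if none
first_exceeding <- function(x, cutoff) {
  hits <- which(cumsum(x) > cutoff)  # all indices whose running total exceeds the cutoff
  if (length(hits) == 0) NA_integer_ else hits[1]
}

first_exceeding(1:10, 17)   # 6, matching indices[1] above
first_exceeding(1:10, 100)  # NA, since the total of 1:10 is only 55
```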

You can use the following:
set.seed(123) # set the seed so the "random" numbers are reproducible (the exact draw can differ between R versions)
m1 <- matrix(sample(10), nrow = 5, ncol = 2) # create a matrix
m1
[,1] [,2]
[1,] 7 5
[2,] 4 2
[3,] 9 8
[4,] 1 6
[5,] 3 10
cutoff <- 5 #applying cutoff value
apply(m1,2,function(x){x<cutoff})#checking each column using apply instead of loop
OR:
which(m1 < cutoff) #results in the indices of m1 that comply to the condition <cutoff
[1] 2 4 5 7
EDIT
cutoff<-30# a new cutoff
v1 <- unlist(lapply(seq_along(m1), function(x) sum(m1[1:x]))) # running totals over the cells, in column-major order
which(v1 >= cutoff)[1] # index of the first occurrence

Related

Create n by n matrix with unique values from 1:n

I want to generate a random n by n matrix in R with discrete values ranging from 1 to n. The tricky part is that I want each value to be unique both within its row and within its column.
For example, if n=3 the matrix could look like:
1 2 3
2 3 1
3 1 2
or it could look like this:
2 3 1
1 2 3
3 1 2
Does anyone have an idea of how to generate this kind of matrix?
What you want is called a Latin square. Here's one function for generating them (from the Cookbook for R; see also here and a bunch of other search results online):
latinsquare <- function(len, reps=1, seed=NA, returnstrings=FALSE) {
# Save the old random seed and use the new one, if present
if (!is.na(seed)) {
if (exists(".Random.seed")) { saved.seed <- .Random.seed }
else { saved.seed <- NA }
set.seed(seed)
}
# This matrix will contain all the individual squares
allsq <- matrix(nrow=reps*len, ncol=len)
# Store a string id of each square if requested
if (returnstrings) { squareid <- vector(mode = "character", length = reps) }
# Get a random element from a vector (the built-in sample function annoyingly
# has different behavior if there's only one element in x)
sample1 <- function(x) {
if (length(x)==1) { return(x) }
else { return(sample(x,1)) }
}
# Generate each of n individual squares
for (n in 1:reps) {
# Generate an empty square
sq <- matrix(nrow=len, ncol=len)
# If we fill the square sequentially from top left, some latin squares
# are more probable than others. So we have to do it random order,
# all over the square.
# The rough procedure is:
# - randomly select a cell that is currently NA (call it the target cell)
# - find all the NA cells sharing the same row or column as the target
# - fill the target cell
# - fill the other cells sharing the row/col
# - If it ever is impossible to fill a cell because all the numbers
# are already used, then quit and start over with a new square.
# In short, it picks a random empty cell, fills it, then fills in the
# other empty cells in the "cross" in random order. If we went totally randomly
# (without the cross), the failure rate is much higher.
while (any(is.na(sq))) {
# Pick a random cell which is currently NA
k <- sample1(which(is.na(sq)))
i <- (k-1) %% len +1 # Get the row num
j <- floor((k-1) / len) +1 # Get the col num
# Find the other NA cells in the "cross" centered at i,j
sqrow <- sq[i,]
sqcol <- sq[,j]
# A matrix of coordinates of all the NA cells in the cross
openCell <-rbind( cbind(which(is.na(sqcol)), j),
cbind(i, which(is.na(sqrow))))
# Randomize fill order
openCell <- openCell[sample(nrow(openCell)),]
# Put center cell at top of list, so that it gets filled first
openCell <- rbind(c(i,j), openCell)
# There will now be three entries for the center cell, so remove duplicated entries
# Need to make sure it's a matrix -- otherwise, if there's just
# one row, it turns into a vector, which causes problems
openCell <- matrix(openCell[!duplicated(openCell),], ncol=2)
# Fill in the center of the cross, then the other open spaces in the cross
for (c in 1:nrow(openCell)) {
# The current cell to fill
ci <- openCell[c,1]
cj <- openCell[c,2]
# Get the numbers that are unused in the "cross" centered on i,j
freeNum <- which(!(1:len %in% c(sq[ci,], sq[,cj])))
# Fill in this location on the square
if (length(freeNum)>0) { sq[ci,cj] <- sample1(freeNum) }
else {
# Failed attempt - no available numbers
# Re-generate empty square
sq <- matrix(nrow=len, ncol=len)
# Break out of loop
break;
}
}
}
# Store the individual square into the matrix containing all squares
allsqrows <- ((n-1)*len) + 1:len
allsq[allsqrows,] <- sq
# Store a string representation of the square if requested. Each unique
# square has a unique string.
if (returnstrings) { squareid[n] <- paste(sq, collapse="") }
}
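# Restore the old random seed, if present. The seed lives in the global
# environment, so it has to be assigned there rather than inside the function.
if (!is.na(seed) && !identical(saved.seed, NA)) { assign(".Random.seed", saved.seed, envir = globalenv()) }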
if (returnstrings) { return(squareid) }
else { return(allsq) }
}
In the code below, mats is a list of such matrices. It uses r2dtable to generate N random n x n matrices whose elements are chosen from 0, 1, ..., n-1 and whose margins are each given by margin. It then keeps only those in which every column contains each of 0:(n-1) exactly once, and adds one to each matrix to give the result. The number of matrices returned varies, and as n gets larger you have to generate a huge number N of matrices to get just a few. When I tried n <- 3 below, mats was a list of 24 matrices out of 100, but with n <- 4 it only found 1 out of 100.
set.seed(123)
N <- 100 # no of tries
n <- 3 # rows of matrix (= # cols)
check <- function(x) all(apply(x, 2, sort) == seq_len(nrow(x))-1)
margin <- sum(seq_len(n))-n
margins <- rep(margin, n)
L <- r2dtable(N, r = margins, c = margins)
mats <- lapply(Filter(check, L), "+", 1)
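The check function above only inspects the columns, so it can be worth validating candidates independently. The following is a minimal sketch of such a validator; the name is_latin_square is hypothetical and it assumes the symbols are 1..n, as in the question.

```r
# Hypothetical validator: TRUE if every row and every column of m
# is a permutation of 1..n (assumes the symbols are 1..n)
is_latin_square <- function(m) {
  n <- nrow(m)
  if (ncol(m) != n) return(FALSE)
  perm_ok <- function(v) identical(sort(as.integer(v)), seq_len(n))
  all(apply(m, 1, perm_ok)) && all(apply(m, 2, perm_ok))
}

good <- matrix(c(1, 2, 3,
                 2, 3, 1,
                 3, 1, 2), nrow = 3, byrow = TRUE)
# Every column is a permutation of 1:4 and every row sums to 10,
# but the rows repeat values, so this is not a Latin square
bad <- rbind(c(1, 1, 4, 4),
             c(4, 4, 1, 1),
             c(2, 3, 2, 3),
             c(3, 2, 3, 2))

is_latin_square(good)  # TRUE
is_latin_square(bad)   # FALSE
```

The second example shows why testing rows as well as columns matters once n is 4 or more: matching margins and valid columns do not by themselves guarantee valid rows.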
Here is an attempt:
x <- c(1,2,3)
out <- NULL
for(i in 1:3){
y <- c(x[1 + (i+0) %% 3], x[1 + (i+1) %% 3], x[1 + (i+2) %% 3])
out <- rbind(out,y)
}
This gives:
> out
[,1] [,2] [,3]
y 2 3 1
y 3 1 2
y 1 2 3
For the general case:
n <- 4
x <- 1:n
out <- NULL
for(i in 1:n){
y <- x[1 + ((i+0:(n-1))%%n)]
out <- rbind(out,y)
}
If I'm not wrong this is the expected result:
> out
[,1] [,2] [,3] [,4]
y 2 3 4 1
y 3 4 1 2
y 4 1 2 3
y 1 2 3 4
Shorter:
n <- 4
x <- 1:n
vapply(x, function(i) x[1 + ((i+0:(n-1))%%n)], numeric(n))
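The cyclic construction above always produces the same square. One simple way to randomize it, sketched here under the assumption that uniform sampling over all Latin squares is not required, is to shuffle the rows and columns of the cyclic square. The function name random_cyclic_latin is hypothetical.

```r
# Hypothetical helper: build the cyclic Latin square, then shuffle rows and columns.
# Permuting rows or columns keeps every row and column a permutation of 1..n,
# but this does NOT sample uniformly from all Latin squares of order n.
random_cyclic_latin <- function(n) {
  base <- outer(seq_len(n), seq_len(n), function(i, j) (i + j - 2) %% n + 1)
  base[sample(n), sample(n), drop = FALSE]
}

set.seed(1)
m <- random_cyclic_latin(5)
m
```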
Here is one version that generates all possible rows for such a matrix (all n! permutations, so this gets expensive quickly as n grows) and then takes them one by one, restricting the selection to the valid choices each time:
n <- 9
allrows <- combinat::permn(n)
takerows <- function(taken, all) {
available <- rep(TRUE, length(all))
for(i in 1:nrow(taken)) {
available <- sapply(all, function(x) all((x-taken[i,])!=0)) & available
}
matrix(all[[which(available)[sample(sum(available), 1)]]], nrow=1)
}
magicMat <- takerows(matrix(rep(0, n), ncol=n), allrows)
for(i in 1:(n-1)) {
magicMat <- rbind(magicMat, takerows(magicMat, allrows))
}
> magicMat
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 5 3 1 4 2 8 6 7 9
[2,] 9 8 6 2 1 3 7 4 5
[3,] 4 5 7 8 9 2 3 6 1
[4,] 3 9 2 1 6 7 5 8 4
[5,] 1 6 5 3 8 4 2 9 7
[6,] 7 2 4 9 3 5 8 1 6
[7,] 6 4 8 5 7 1 9 3 2
[8,] 8 1 9 7 5 6 4 2 3
[9,] 2 7 3 6 4 9 1 5 8

Ranking and Counting Matrix Elements in R

I know there are similar questions but I couldn't find an answer to my question. I'm trying to rank elements in a matrix and then extract data of 5 highest elements.
Here is my attempt.
set.seed(20)
d<-matrix(rnorm(100),nrow=10,ncol=10)
start <- d[1,1]
high <- start
rowind <- 1
colind <- 1
for (i in 1:10) {
  for (j in 1:10) {
    if (start < d[i,j]) {
      start <- d[i,j] # update the running maximum
      high <- d[i,j]
      rowind <- i
      colind <- j
    }
  }
}
Although this gives me the data of the highest element, including row and column numbers, I can't think of a way to do the same for elements ranked from 2 to 5. I also tried
rank(d, ties.method="max")
But it wasn't helpful because it just spits out the rank in vector format.
What I ultimately want is a data frame (or any sort of table) that contains
rank, column name, row name, and the data(number) of highest 5 elements in matrix.
Edit
set.seed(20)
d<-matrix(rnorm(100),nrow=10,ncol=10)
d[1,2]<-5
d[2,1]<-5
d[1,3]<-4
d[3,1]<-4
Thanks for the answers. They worked perfectly for my purpose, but I'm running this code on a correlation chart, where every pair appears twice, so I want to count only one of the two numbers for ranking purposes. Is there any way to do this? Thanks.
Here's a very crude way:
DF = data.frame(row = c(row(d)), col = c(col(d)), v = c(d))
DF[order(DF$v, decreasing=TRUE), ][1:5, ]
row col v
91 1 10 2.208443
82 2 9 1.921899
3 3 1 1.785465
32 2 4 1.590146
33 3 4 1.556143
It would be nice to only have to partially sort, but in ?order, it looks like this option is only available for sort, not for order.
If the matrix has row and col names, it might be convenient to see them instead of numbers. Here's what I might do:
dimnames(d) <- list(letters[1:10], letters[1:10])
DF = data.frame(as.table(d))
DF[order(DF$Freq, decreasing=TRUE), ][1:5, ]
Var1 Var2 Freq
91 a j 2.208443
82 b i 1.921899
3 c a 1.785465
32 b d 1.590146
33 c d 1.556143
The column names don't make much sense here, unfortunately, but you can change them with names(DF) <- as usual.
Here is one option with Matrix
library(Matrix)
m1 <- summary(Matrix(d, sparse=TRUE))
head(m1[order(-m1[,3]),],5)
# i j x
#93 3 10 2.359634
#31 1 4 2.234804
#23 3 3 1.980956
#55 5 6 1.801341
#16 6 2 1.678989
Or use melt
library(reshape2)
m2 <- melt(d)
head(m2[order(-m2[,3]), ], 5)
Here is something quite simple in base R.
# set.seed(20)
# d <- matrix(rnorm(100), nrow = 10, ncol = 10)
d.rank <- matrix(rank(-d), nrow = 10, ncol = 10)
which(d.rank <= 5, arr.ind=TRUE)
row col
[1,] 3 1
[2,] 2 4
[3,] 3 4
[4,] 2 9
[5,] 1 10
d[d.rank <= 5]
[1] 1.785465 1.590146 1.556143 1.921899 2.208443
Results can (easily) be made clearer (see comment from Frank):
cbind(which(d.rank <= 5, arr.ind=TRUE), v = d[d.rank <= 5], rank = rank(-d[d.rank <= 5]))
row col v rank
[1,] 3 1 1.785465 3
[2,] 2 4 1.590146 4
[3,] 3 4 1.556143 5
[4,] 2 9 1.921899 2
[5,] 1 10 2.208443 1
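For the follow-up in the question's edit (a symmetric matrix where each pair appears twice), one possible sketch is to rank only the cells strictly above the diagonal, so each pair is counted once. This assumes the matrix really is symmetric, as a correlation matrix would be; the names keep and top5 are illustrative only.

```r
set.seed(20)
d <- matrix(rnorm(100), nrow = 10, ncol = 10)
d[1, 2] <- 5; d[2, 1] <- 5
d[1, 3] <- 4; d[3, 1] <- 4

# Keep only cells strictly above the diagonal (row < col),
# so each symmetric pair contributes a single value
keep <- upper.tri(d)
DF <- data.frame(row = row(d)[keep], col = col(d)[keep], v = d[keep])

# Sort by value and attach the rank of the five largest entries
top5 <- head(DF[order(DF$v, decreasing = TRUE), ], 5)
top5$rank <- seq_len(nrow(top5))
top5
```

With the values set in the edit, the first two rows of top5 are the pairs (1, 2) and (1, 3), each listed once.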

Nested While-Loop - R

So I have a table m, consisting of a random number of rows and columns. (can be any size)...
I want to do this calculation against each rows/columns totals:
r[i] * c[j] / n;
Where r <- rowSums(m);, c <- colSums(m); and n <- sum(m);
I can do it with a double for-loop but I'm hoping to implement it now using while loops.
I wasn't going to use while loops, but since the table size can differ, I figured it was wise to.
I'm storing each value as it's found in a test vector.
This is my attempt, but I'm messing up the indices:
while(i < nrow(m)){
while(j < ncol(m)){
test[i] <- r[i]*c[j] / n;
j=j+1;
i=i+1;
}
j=j+1;
i=i+1;
}
Any guidance to help me sort out my loops would be much appreciated. Thanks in advance.
update
See below for an example and expected result:
m <- t(matrix(c(28,48,10,114), nrow=2, ncol=2));
r <- rowSums(m); #76 124 (sum of rows)
c <- colSums(m); #38 162 (sum of cols)
n <- sum(m); #200 (sum of all cells)
test <- rep(0, times = length(m)); #empty vector/data frame
#inside while loops, calc for each r and c index:
test[1] <- 76 *38 /200 #first calc to test[i] where i=1
test[2] <- 124*38 /200
test[3] <- 76*162 /200
test[4] <- 124*162/200 #last calc to test[i] where i==length(m)
I would avoid using a for or while loop and do something like this instead:
> as.vector(outer(r,c, function(x,y) x*y/n))
[1] 14.44 23.56 61.56 100.44
No need to use a while loop. Vector operations are usually the better choice in R (and in other array-based languages): they make for clearer and typically faster code.
nrows<-sample(1:100,1) # a random number of rows
ncols<-sample(1:100,1) # a random number of columns
#create a matrix of random numbers with our random dimensions
m<-matrix(runif(nrows*ncols), nrow=nrows)
n<-sum(m)
#read up on outer: it applies the function to every pair of elements from the two vectors
#so you get every row sum r[i] multiplied with every column sum c[j], i.e. what your loop is doing
r<-outer(rowSums(m),colSums(m),function(x,y) x*y/n)
Hope this helps, let me know if you have any questions.
A more R-like solution would be to use expand.grid instead of a nested while loop:
Set-up:
> m <- matrix(1:12, 3, 4)
> m
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
> n <- sum(m)
> r <- rowSums(m)
> c <- colSums(m)
Now:
> test <- expand.grid(r,c)
> test
Var1 Var2
1 22 6
2 26 6
3 30 6
4 22 15
5 26 15
6 30 15
7 22 24
8 26 24
9 30 24
10 22 33
11 26 33
12 30 33
> test <- test[,1] * test[,2] / n
> test
[1] 1.692308 2.000000 2.307692 4.230769 5.000000 5.769231 6.769231
[8] 8.000000 9.230769 9.307692 11.000000 12.692308
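For completeness, the nested while loop the question asks about can be sketched as follows, using the 2 x 2 example from the question's update. The names rs and cs are assumptions used here in place of r and c, to avoid masking R's c() function. The inner counter is reset for every column, and the position in test is derived from both counters.

```r
m <- t(matrix(c(28, 48, 10, 114), nrow = 2, ncol = 2))
rs <- rowSums(m)   # 76 124
cs <- colSums(m)   # 38 162
n <- sum(m)        # 200
test <- numeric(length(m))

j <- 1
while (j <= ncol(m)) {
  i <- 1                       # restart the row counter for each column
  while (i <= nrow(m)) {
    # column-major position of cell (i, j) in the result vector
    test[(j - 1) * nrow(m) + i] <- rs[i] * cs[j] / n
    i <- i + 1
  }
  j <- j + 1
}
test
```

The result agrees with the outer-based answers, which remain the more idiomatic option.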

In R, can out of bounds indexing return NAs on matrices, like it does on vectors?

I would like an out of bounds subscript on a matrix in R to return NAs instead of an error, like it does on vectors.
> a <- 1:3
> a[1:4]
[1] 1 2 3 NA
> b <- matrix(1:9, 3, 3)
> b
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> b[1:4, 1]
Error: subscript out of bounds
>
So I would have liked it to return:
[1] 1 2 3 NA
Right now I am doing this with ifelse tests to see if the index variables exist in the rownames, but on large data structures this is taking quite a bit of time. Here is an example:
s <- split(factors, factors$date) # split so each date has its own list
names <- last(s)[[1]]$bond # names of bonds that we want
cdmat <- sapply(names, function(n)
sapply(s, function(x)
if(n %in% x$bond) x[x$bond == n, column] else NA))
where factors is an xts with about 250 000 rows. So it's taking about 15 seconds and that's too long for my application.
The reason this is important is that each list element I am applying this to has a different length, but I need to output a matrix with equal length columns as a result of the sapply. I don't want another list out with different length elements.
Actually I have just realised that if I take the column I want and turn it into a vector, this works perfectly. So:
> b[, 1][1:4]
[1] 1 2 3 NA
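Another option, sketched here as an assumption rather than anything stated in the original post, is to turn out-of-range row indices into NA before subsetting. R returns NA for NA subscripts on a matrix, so no error is raised. The helper name safe_rows is hypothetical.

```r
# Hypothetical helper: index rows i and column(s) j of a matrix,
# returning NA for row indices outside 1..nrow(mat)
safe_rows <- function(mat, i, j) {
  i[i < 1 | i > nrow(mat)] <- NA   # out-of-range rows become NA subscripts
  mat[i, j]
}

b <- matrix(1:9, 3, 3)
safe_rows(b, 1:4, 1)    # 1 2 3 NA
safe_rows(b, 1:4, 1:2)  # a 4 x 2 matrix whose last row is NA
```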

R - How to sum objects in a column between an interval defined by conditions on another column

This comes as an application of this question: Sum object in a column between an interval defined by another column
What I would like to know is how to adjust the answer if I want to sum the values in B over the rows where A[i+1]-A[i] is 0 or 1, or A[i]-A[i-1] is 0 or 1, where i is the row index. In other words, how do I sum the B values for rows whose A values are the same or differ by 1, without summing the same row twice?
I tried building a loop function, but I get stuck when using row indices with data frames.
Example:
If the following data frame is given
df
A B
[1,] 1 4
[2,] 1 3
[3,] 3 5
[4,] 3 7
[5,] 4 3
[6,] 5 2
What I want to obtain is the next data frame:
df
A B
[1,] 1 7
[2,] 3 15
[3,] 5 2
Moreover, if I have a large data frame like this:
df
chr start stop m n s
chr1 71533361 71533362 23 1 -
chr1 71533361 71533362 24 26 -
chr1 71533361 71533362 25 1 -
and I want my result to look like this (I chose the row for which the value in column m is max):
df
chr1 71533361 71533362 24 28 -
Try the following, assuming your original dataframe is df:
df2 <- df # working copy that gets consumed as rows are summed
z <- data.frame(A = numeric(0), B = numeric(0)) # output data frame
j <- 1 # output indexing variable
u <- unique(df$A) # unique vals of A
i <- u[1]
s <- TRUE # just for the while() loop
while (s) {
  z[j, ] <- c(i, sum(df2[df2$A %in% c(i-1, i, i+1), 2]))
  df2 <- df2[!df2$A %in% c(i-1, i, i+1), ]
  j <- j + 1 # index the output
  u <- u[!u %in% c(i-1, i, i+1)] # cleanup the u vector
  if (length(u) == 0) { # conditionally exit the loop
    s <- FALSE
  } else {
    i <- min(u) # reset value to sum by
  }
}
I know that's kind of messy code, but it's a sort of tough problem given all of the different indices.
I would create a for loop that tests whether A[i] - A[i-1] meets your criteria.
If it does, the loop adds B[i] to a running sum and moves on to the next row.
Because i just iterates through A once, nothing from B should be counted twice.
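A minimal sketch of the for loop described above, assuming the rows are already sorted by A and using the example data from the question. The names group and res are illustrative. Note that this version chains consecutive values, so on the example it merges A = 3, 3, 4, 5 into a single group (B = 17) instead of producing the question's expected rows (3, 15) and (5, 2); the while-loop answer above reproduces the expected output.

```r
df <- data.frame(A = c(1, 1, 3, 3, 4, 5), B = c(4, 3, 5, 7, 3, 2))

# Assign a group id to each row: stay in the current group while
# A[i] - A[i-1] is 0 or 1, otherwise start a new group
group <- integer(nrow(df))
group[1] <- 1L
for (i in 2:nrow(df)) {
  step <- df$A[i] - df$A[i - 1]
  group[i] <- if (step == 0 || step == 1) group[i - 1] else group[i - 1] + 1L
}

# One output row per group: the smallest A and the sum of B
res <- data.frame(A = as.vector(tapply(df$A, group, min)),
                  B = as.vector(tapply(df$B, group, sum)))
res
```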

Resources