Keep only observations from consecutive quarters - r

I'm currently looking at bank data for 9 consecutive quarters. I now want to only keep those banks for which I have data from all 9 quarters. Each bank has a unique certification ID. How can I filter using the ID and only keep banks with 9 consecutive observations?
Maybe a way to do this is to count how often a certification ID (cert) shows up and keep only the ones with 9 observations? So this is what I tried:
df <- (...)
a = rle(sort(df$cert))
b = data.frame(id=a$values, n=a$lengths)
c = subset(b, n==9)
I'm unsure if this is correct because I'm trying to reproduce the results of a research paper but the numbers don't match anymore after this step.

One option is n_distinct with group_by: group by 'id' (the bank identifier, cert in the question), check whether the number of distinct values in 'qtr' is 9, and keep the rows of those ids.
library(dplyr)
df %>%
group_by(id) %>%
filter(n_distinct(qtr) ==9)
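One possible reason a plain row count and the distinct-quarter check disagree is duplicated rows: a bank with a repeated quarter can reach 9 rows without covering all 9 quarters. The base R sketch below uses made-up data (the cert and qtr values are hypothetical, and this is only a guess at what might be going wrong in the question) to show the difference:

```r
# Hypothetical toy data: bank 1 has quarters 1-9, bank 2 has 9 rows but
# quarter 8 twice (quarter 9 is missing), bank 3 has only 5 quarters.
df <- data.frame(
  cert = c(rep(1, 9), rep(2, 9), rep(3, 5)),
  qtr  = c(1:9, 1:8, 8, 1:5)
)

# Plain row count per bank: banks 1 and 2 both have 9 rows
rows_per_bank <- table(df$cert)
by_rows <- names(rows_per_bank)[rows_per_bank == 9]

# Number of distinct quarters per bank, repeated for every row of that bank
n_qtr <- ave(df$qtr, df$cert, FUN = function(q) length(unique(q)))
complete <- df[n_qtr == 9, ]
by_quarters <- unique(complete$cert)
```

Here by_rows contains banks 1 and 2, while by_quarters contains only bank 1. Within a window of 9 quarters, 9 distinct quarters per bank also means the observations are consecutive.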

library(tidyverse)
df<-data.frame(id=rep(1:4,times=9),
qtr=rep(1:9,each=4))
df%>%
filter(id %in% (df%>%
count(id)%>%
filter(n>8)%>%.$id))

Here is a generated example. Use rowSums and !is.na to count the non-missing values in each row, and keep only the rows that have values in all 9 columns.
a[rowSums(!is.na(a))==9,]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 1 4 7 10 13 16 19 22 25
[2,] 3 6 9 12 15 18 21 24 27
The data used.
a <- matrix(1:27, ncol=9, nrow=3)
a[2,2] <- NA
a
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 1 4 7 10 13 16 19 22 25
[2,] 2 NA 8 11 14 17 20 23 26
[3,] 3 6 9 12 15 18 21 24 27
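For data in this wide layout (assumed here: one row per bank, one column per quarter), complete.cases is a shorter way to express the same filter. A minimal sketch on the same example matrix:

```r
a <- matrix(1:27, ncol = 9, nrow = 3)
a[2, 2] <- NA

# complete.cases() is TRUE for rows without any NA
kept <- a[complete.cases(a), ]
```

This keeps rows 1 and 3, the same rows as the rowSums version.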

Related

Efficiently reshuffling a long matrix into one consisting of column bound subblocks (of the original) in R

"I have a very long matrix, measuring 30^5 x 3 entries. It basically consists of subblocks of 10.000 30 x 3 matrices, stacked on top of one another. I want to efficiently "cbind" them, next to one another (without looping constructs), leading to a 30 x 30^4 matrix.
Just changing the matrix dimensions does not work, as R fills the new matrix per individual column.
I'm sure there is a very compact, superefficient way of doing this, and I'll slap myself on the forehead as soon as you fill me in on the obvious solution.
Thanks!"
"Just changing the matrix dimensions does not work, as R fills the new matrix per individual column."
```R
test <- matrix(c(1:18), 6, 3, byrow = FALSE)
>test
[,1] [,2] [,3]
[1,] 1 7 13
[2,] 2 8 14
[3,] 3 9 15
[4,] 4 10 16
[5,] 5 11 17
[6,] 6 12 18
dim(test) <- c(3,6)
>test
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 4 7 10 13 16
[2,] 2 5 8 11 14 17
[3,] 3 6 9 12 15 18
```
The output I'm looking for is:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 7 13 4 10 16
[2,] 2 8 14 5 11 17
[3,] 3 9 15 6 12 18
We can create a grouping variable to split the sequence of rows, subset the matrix by each group, and then cbind the pieces:
do.call(cbind, lapply(split(seq_len(nrow(test)),
as.integer(gl(nrow(test), 3, nrow(test)))), function(i) test[i,]))
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 1 7 13 4 10 16
#[2,] 2 8 14 5 11 17
#[3,] 3 9 15 6 12 18

Enhancing performance of matrix transformation in R

I have a little problem with a fast matrix transformation. The transformation has to be performed many times, so I'm looking for a fast way to do this. Imagine a given matrix A and an integer parameter L. Matrix A should be transformed into a new matrix newA with L rows and nrow(A)*ncol(A)/L columns. I take each block of L rows of A and place the blocks side by side as columns. A little example with two possible solutions to this problem, newA1 and newA2, where newA1 = newA2, should clarify my explanation:
A = matrix(1:24,6,4,byrow=T) #example matrix
L = 2 #number of rows for new matrix
newA1 = NULL
newA2 = matrix(0,L,nrow(A)*ncol(A)/L)
for(i in 1:(nrow(A)/L)){
newA1 = cbind(newA1,A[((i-1)*L+1):(i*L),]) #slower than newA2
newA2[1:L,((i-1)*ncol(A)+1):(i*ncol(A))] = A[((i-1)*L+1):(i*L),] #faster than newA1
}
The matrices look like this:
> A
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 5 6 7 8
[3,] 9 10 11 12
[4,] 13 14 15 16
[5,] 17 18 19 20
[6,] 21 22 23 24
> newA1
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 1 2 3 4 9 10 11 12 17 18 19 20
[2,] 5 6 7 8 13 14 15 16 21 22 23 24
L=3 also works in this example.
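One loop-free way to get the same layout is to view A as a three-dimensional array and swap two of its dimensions with aperm. The sketch below is only an illustration: rows_to_blocks is a hypothetical helper name, not something from the question, and whether it is actually faster for a given matrix size would need to be benchmarked.

```r
# Hypothetical helper: place consecutive blocks of L rows side by side
rows_to_blocks <- function(A, L) {
  stopifnot(nrow(A) %% L == 0)                       # L must divide nrow(A)
  # dimensions: [row within block, block, column of A]
  arr <- array(A, dim = c(L, nrow(A) / L, ncol(A)))
  # reorder to [row within block, column of A, block], then flatten to L rows
  matrix(aperm(arr, c(1, 3, 2)), nrow = L)
}

A <- matrix(1:24, 6, 4, byrow = TRUE)
newA3 <- rows_to_blocks(A, 2)
```

For the example matrix, newA3 equals newA1 and newA2 from the loop, and rows_to_blocks(A, 3) gives the L=3 case.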

Extracting Every Nth Element of A Matrix

I want to extract every nth element of each row in a matrix. Here is my code:
x <- matrix(1:16,nrow=2)
x
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 3 5 7 9 11 13 15
[2,] 2 4 6 8 10 12 14 16
I have tried:
sapply(x, function(l) x[seq(1,8,2)])
which clearly fails.
I want to pull every 2nd value from each row of "x". The desired output would be something like:
[,1] [,2] [,3] [,4]
[1,] 3 7 11 15
[2,] 4 8 12 16
You are overcomplicating it. This gives you what you need:
x[,seq(2, 8, 2)]
or, more generally
x[,seq(2, ncol(x), 2)]
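The sapply attempt fails because sapply(x, ...) visits each of the 16 elements of x in turn, and the function ignores its argument and returns x[seq(1,8,2)] every time. As an alternative to seq, column selection can also be written with a logical index. A small sketch (the variable names every_2nd, every_nth and n are just for illustration):

```r
x <- matrix(1:16, nrow = 2)

# A logical index is recycled across the 8 columns: drop, keep, drop, keep, ...
every_2nd <- x[, c(FALSE, TRUE)]

# Every nth column for an arbitrary n, here n = 4 (columns 4 and 8)
n <- 4
every_nth <- x[, seq_len(ncol(x)) %% n == 0]
```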

Pairwise calculation in r

I have been thinking about a problem I have but I don't know how to express the problem to even search for it. I'd be very thankful if you could explain it to me.
So, I have a data set with the following format:
10 6 4 4
10 6 4 4
7 6 4 4
I want to conduct a pairwise calculation in which each column is added to each of the others, one pair at a time. That is 1 with 2, 1 with 3, 1 with 4, 2 with 3, 2 with 4 and 3 with 4.
I thought of using a nested loop in R, which I had read about, and I started like this:
for (i in 1:r-1) { ## r the number of columns
for (j in (i+1):r) {
....
}
I am stuck at this stage; I don't know how to express in code what I need to do. I am sorry for posting unfinished code. Some advice on how I should go about it would be very welcome.
Thanks a lot in advance.
Use combn to create the "pairs":
(pairs <- combn(4,2))
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 1 1 2 2 3
[2,] 2 3 4 3 4 4
Then apply over the rows of your data and, for each row, sum the elements selected by each column of the pairs matrix:
dat <- matrix(c(10,10,7,6,6,6,4,4,4,4,4,4),ncol=4)
t(apply(dat, 1, function(x) apply(combn(4,2),2,function(y) sum(x[y]))))
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 16 14 14 10 10 8
[2,] 16 14 14 10 10 8
[3,] 13 11 11 10 10 8
You could slightly modify your loop:
d <- read.table(text='
10 6 4 4
10 6 4 4
7 6 4 4')
nc <- ncol(d)
r <- NULL
for (i in 1:nc) {
for (j in 1:nc) {
if (i < j) { # crucial condition
r <- cbind(r, d[, i] + d[, j]) # calculate new column and bind to calculated ones
}
}
}
r # print the result
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 16 14 14 10 10 8
[2,] 16 14 14 10 10 8
[3,] 13 11 11 10 10 8
Another application of combn but perhaps easier to understand:
apply(combn(ncol(dat),2), 2, function(x) rowSums(dat[,x]))
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 16 14 14 10 10 8
## [2,] 16 14 14 10 10 8
## [3,] 13 11 11 10 10 8
Here, the matrix dat is indexed by each column of the result of combn giving a matrix of two columns (the two columns to be summed). rowSums then does the arithmetic.
Because I really like the functional package, here is a slight variation on the above:
library(functional) # provides Compose() and Curry()
apply(combn(ncol(dat),2), 2, Compose(Curry(`[`, dat, i=seq(nrow(dat))), rowSums))
It should be noted that a combn approach is more flexible than using nested for loops for this sort of computation. In particular, it is easily adapted to any number of columns to sum:
f <- function(dat, num=2)
{
apply(combn(ncol(dat),num), 2, function(x) rowSums(dat[,x,drop=FALSE]))
}
This will give all combinations of num columns, and sum them:
f(dat, 1)
## [,1] [,2] [,3] [,4]
## [1,] 10 6 4 4
## [2,] 10 6 4 4
## [3,] 7 6 4 4
f(dat, 2)
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 16 14 14 10 10 8
## [2,] 16 14 14 10 10 8
## [3,] 13 11 11 10 10 8
f(dat, 3)
## [,1] [,2] [,3] [,4]
## [1,] 20 20 18 14
## [2,] 20 20 18 14
## [3,] 17 17 15 14
f(dat, 4)
## [,1]
## [1,] 24
## [2,] 24
## [3,] 21
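For the two-column case specifically, the pair indices from combn can also be used to index the matrix directly, which avoids apply altogether. This is a sketch of an alternative, not one of the answers above; the names idx and sums and the "i+j" column labels are made up for illustration:

```r
dat <- matrix(c(10, 10, 7, 6, 6, 6, 4, 4, 4, 4, 4, 4), ncol = 4)

idx <- combn(ncol(dat), 2)                  # one column pair per column of idx
sums <- dat[, idx[1, ]] + dat[, idx[2, ]]   # add the paired columns in one step

# Label each result column with the pair it came from, e.g. "1+2"
colnames(sums) <- paste(idx[1, ], idx[2, ], sep = "+")
```

The values match the output of f(dat, 2), with the added labels showing which columns were summed.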

How to index by 3's in a for loop R

I have a matrix that is 39 columns wide, and I want the row-wise average of the first three columns, then of the next three, etc., so I would have 13 columns in total once everything is done. Triple holds the indexes I would like to use, but it just makes a vector from 1:39.
Triple <- c(1:3, 4:6, 7:9, 10:12, 13:15, 16:18, 19:21, 22:24, 25:27, 28:30, 31:33, 34:36, 37:39)
AveFPKM <- matrix(nrow=54175, ncol=13)
for (i in 1:39){
Ave <- rowMeans(AllFPKM[,i:i+2])
AveFPKM[,i] <- Ave
i+2
}
Thanks for the help
With some specifying of dimensions and apply-ing, you can pretty easily get your result. Here's a smaller example:
test <- matrix(1:36,ncol=12)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
#[1,] 1 4 7 10 13 16 19 22 25 28 31 34
#[2,] 2 5 8 11 14 17 20 23 26 29 32 35
#[3,] 3 6 9 12 15 18 21 24 27 30 33 36
Now get the mean of each row in each block of three columns:
apply(structure(test,dim=c(3,3,4)),c(1,3),mean)
# [,1] [,2] [,3] [,4]
#[1,] 4 13 22 31
#[2,] 5 14 23 32
#[3,] 6 15 24 33
Or generally, assuming your number of columns is always exactly divisible by the group size:
grp.row.mean <- function(x,grpsize) {
apply(structure(x,dim=c(nrow(x),grpsize,ncol(x)/grpsize)),c(1,3),mean)
}
grp.row.mean(test,3)
Here's a solution using sapply, taking advantage of the fact we know the number of columns is exactly a multiple of 3:
sapply(1:13, function(x) {
i <- (x-1)*3 + 1 # Get the actual starting index
rowMeans(AllFPKM[,i:(i+2)]) # read from the source matrix, not the empty result matrix
})
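The loop in the question can also be repaired directly. Two things go wrong there: i:i+2 is parsed as (i:i)+2, which selects the single column i+2, and the stray i+2 line has no effect because for resets i on every iteration. A minimal corrected sketch, using a small made-up matrix in place of the real AllFPKM (the helper names starts and k are illustrative):

```r
# Hypothetical stand-in for AllFPKM: 4 rows, 9 columns
AllFPKM <- matrix(1:36, nrow = 4)

starts <- seq(1, ncol(AllFPKM), by = 3)   # first column of each group: 1, 4, 7
AveFPKM <- matrix(NA_real_, nrow = nrow(AllFPKM), ncol = length(starts))

for (k in seq_along(starts)) {
  i <- starts[k]
  # parentheses are needed: i:(i + 2), not i:i+2
  AveFPKM[, k] <- rowMeans(AllFPKM[, i:(i + 2)])
}
```

With the real 39-column matrix, starts would have 13 entries and AveFPKM would end up with 13 columns.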
