Count the number of appearances in a list, trick? - r

I have a matrix of 1s and 0s.
I would like to count, for each row, the number of times a run of 1,1 appears (where the 1s are not separated by a 0) and do the same for a run of 1,1,1. I have tried colSums, but it does not seem appropriate.
The matrix final is
t1 t2 t3 t4 t5 t6 t7
[1,] 0 0 0 0 1 1 0
[2,] 0 0 1 1 0 0 1
[3,] 1 1 0 0 0 0 0
[4,] 0 0 1 1 1 0 0
[5,] 0 0 1 1 0 0 0
[6,] 1 1 0 0 0 0 0
[7,] 0 0 0 0 0 0 1
[8,] 0 0 0 0 1 1 0
[9,] 1 1 0 0 1 1 0
[10,] 0 0 0 0 0 1 1
[11,] 1 1 0 0 0 0 0
[12,] 0 0 1 1 0 0 0
[13,] 0 0 0 0 0 0 0
[14,] 0 0 0 0 0 0 1
[15,] 0 0 0 0 0 0 0
Therefore, for the first row I would like to get one run of 1,1 and zero runs of 1,1,1. For row 4 I would like to get zero runs of 1,1 but one run of 1,1,1.
Can anyone tell me what's wrong with the following code for a run of 1,1?
occ <- matrix()
occ_temp <- matrix
for (j in 1:nrow(final)){
for (i in 2:7){
if (sum(final[j,i-1:i])==2){occ_temp[j,i-1]=1}
}
occ[j] <- sum(occ_temp)
}

We can loop through the rows with apply, compute the run-length encoding of each row with rle, extract the lengths of the runs whose value is 1, check which of them equal 'n1' and 'n2', and sum the matches. Here m1 is the matrix called final in the question.
n1 <- 2
n2 <- 3
res <- t(apply(m1, 1, FUN=function(x) {
x1 <- with(rle(x), lengths[!!values])
c(sum(x1==n1), sum(x1==n2))
}))
colnames(res) <- paste0("count", c(11, 111))
res
# count11 count111
# [1,] 1 0
# [2,] 1 0
# [3,] 1 0
# [4,] 0 1
# [5,] 1 0
# [6,] 1 0
# [7,] 0 0
# [8,] 1 0
# [9,] 2 0
#[10,] 1 0
#[11,] 1 0
#[12,] 1 0
#[13,] 0 0
#[14,] 0 0
#[15,] 0 0
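To make the rle step concrete, here is a minimal sketch on a single row (row 9 of the example matrix, typed in by hand). The variable names are illustrative only. It also shows one likely problem in the question's loop: in R, `:` binds tighter than `-`, so `i-1:i` means `i - (1:i)` rather than `(i-1):i`.

```r
# Row 9 of the example matrix: two separate runs of 1,1
x <- c(1, 1, 0, 0, 1, 1, 0)

r <- rle(x)
r$lengths  # 2 2 2 1
r$values   # 1 0 1 0

# Keep only the lengths of the runs made of 1s
run_lengths <- r$lengths[r$values == 1]

count11  <- sum(run_lengths == 2)  # runs of exactly two 1s
count111 <- sum(run_lengths == 3)  # runs of exactly three 1s

# Operator precedence: `:` is evaluated before `-`
i <- 3
wrong_idx <- i - 1:i    # 2 1 0
right_idx <- (i - 1):i  # 2 3
```

Note also that `occ_temp <- matrix` in the question assigns the function matrix itself rather than creating a matrix, so indexing into it cannot work.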

Related

Get all combinations of treatment & control in R [duplicate]

I am using the combn function in R to find the combinations of treatment and control; however, this function does not account for order.
Is there another function similar to combn that accounts for combinations in different orders?
This is what I hope to achieve in the end:
000111
001110
011100
....
(there should be 20 combinations, i.e. 6 choose 3)
Try the code below
> t(combn(6, 3, function(k) replace(rep(0, 6), k, 1)))
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 1 1 0 0 0
[2,] 1 1 0 1 0 0
[3,] 1 1 0 0 1 0
[4,] 1 1 0 0 0 1
[5,] 1 0 1 1 0 0
[6,] 1 0 1 0 1 0
[7,] 1 0 1 0 0 1
[8,] 1 0 0 1 1 0
[9,] 1 0 0 1 0 1
[10,] 1 0 0 0 1 1
[11,] 0 1 1 1 0 0
[12,] 0 1 1 0 1 0
[13,] 0 1 1 0 0 1
[14,] 0 1 0 1 1 0
[15,] 0 1 0 1 0 1
[16,] 0 1 0 0 1 1
[17,] 0 0 1 1 1 0
[18,] 0 0 1 1 0 1
[19,] 0 0 1 0 1 1
[20,] 0 0 0 1 1 1
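As a rough illustration of how the one-liner works, here is the same idea on a smaller case (4 choose 2). The names n, k and small are made up for this sketch. combn passes each set of chosen positions to the function, and replace writes a 1 at those positions of an all-zero vector.

```r
n <- 4  # total number of units (illustrative)
k <- 2  # number of treated units (illustrative)

# One row per assignment: 1 = treatment, 0 = control
small <- t(combn(n, k, function(idx) replace(rep(0, n), idx, 1)))

dim(small)      # choose(4, 2) rows, 4 columns
rowSums(small)  # every row contains exactly k ones
```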

Specific sequence creation in R

I want to create the following sequences in a smart way instead of hard-coding them:
'0-0-0-0-0-0'
'0-1-0-0-0-0'
'0-0-1-0-0-0'
'0-0-0-1-0-0'
'0-0-0-0-1-0'
'0-0-0-0-0-1'
'1-0-0-0-0-0'
'1-1-0-0-0-0'
'1-0-1-0-0-0'
'1-0-0-1-0-0'
'1-0-0-0-1-0'
'1-0-0-0-0-1'
'1-1-1-1-1-1'
'2-0-0-0-0-0'
'2-1-0-0-0-0'
'2-0-1-0-0-0'
'2-0-0-1-0-0'
'2-0-0-0-1-0'
'2-0-0-0-0-1'
'3-0-0-0-0-0'
'3-1-0-0-0-0'
'3-0-1-0-0-0'
'3-0-0-1-0-0'
'3-0-0-0-1-0'
'3-0-0-0-0-1'
'0-2-0-0-0-0'
'0-0-2-0-0-0'
'0-0-0-2-0-0'
'0-0-0-0-2-0'
'0-0-0-0-0-2'
and so on...
To elaborate on the pattern: I have 4 states {0,1,2,3} and I want all possible sequences of length 6 that start with any of the states and have at most one non-zero state in one of the remaining five positions.
Here's one method. I generate a simple description of each sequence, then build the sequences (and de-duplicate, which is needed because of the all-intermediate-0 items).
dd = expand.grid(first = 0:3, inter_value = 0:3, inter_position = 2:6)
result = t(apply(dd, 1, function(x) {
z = c(x["first"], rep(0L, 5))
z[x["inter_position"]] = x["inter_value"]
z
}))
result = result[!duplicated(result), ]
dim(result)
# [1] 64 6
head(result, 10)
# first
# [1,] 0 0 0 0 0 0
# [2,] 1 0 0 0 0 0
# [3,] 2 0 0 0 0 0
# [4,] 3 0 0 0 0 0
# [5,] 0 1 0 0 0 0
# [6,] 1 1 0 0 0 0
# [7,] 2 1 0 0 0 0
# [8,] 3 1 0 0 0 0
# [9,] 0 2 0 0 0 0
# [10,] 1 2 0 0 0 0
Getting the dashes:
apply(result, 1, paste, collapse = "-")
# [1] "0-0-0-0-0-0" "1-0-0-0-0-0" "2-0-0-0-0-0" "3-0-0-0-0-0" "0-1-0-0-0-0" "1-1-0-0-0-0" "2-1-0-0-0-0"
# [8] "3-1-0-0-0-0" "0-2-0-0-0-0" "1-2-0-0-0-0" "2-2-0-0-0-0" "3-2-0-0-0-0" "0-3-0-0-0-0" "1-3-0-0-0-0"
# [15] "2-3-0-0-0-0" "3-3-0-0-0-0" "0-0-1-0-0-0" "1-0-1-0-0-0" "2-0-1-0-0-0" "3-0-1-0-0-0" "0-0-2-0-0-0"
# [22] "1-0-2-0-0-0" "2-0-2-0-0-0" "3-0-2-0-0-0" "0-0-3-0-0-0" "1-0-3-0-0-0" "2-0-3-0-0-0" "3-0-3-0-0-0"
# [29] "0-0-0-1-0-0" "1-0-0-1-0-0" "2-0-0-1-0-0" "3-0-0-1-0-0" "0-0-0-2-0-0" "1-0-0-2-0-0" "2-0-0-2-0-0"
# [36] "3-0-0-2-0-0" "0-0-0-3-0-0" "1-0-0-3-0-0" "2-0-0-3-0-0" "3-0-0-3-0-0" "0-0-0-0-1-0" "1-0-0-0-1-0"
# [43] "2-0-0-0-1-0" "3-0-0-0-1-0" "0-0-0-0-2-0" "1-0-0-0-2-0" "2-0-0-0-2-0" "3-0-0-0-2-0" "0-0-0-0-3-0"
# [50] "1-0-0-0-3-0" "2-0-0-0-3-0" "3-0-0-0-3-0" "0-0-0-0-0-1" "1-0-0-0-0-1" "2-0-0-0-0-1" "3-0-0-0-0-1"
# [57] "0-0-0-0-0-2" "1-0-0-0-0-2" "2-0-0-0-0-2" "3-0-0-0-0-2" "0-0-0-0-0-3" "1-0-0-0-0-3" "2-0-0-0-0-3"
# [64] "3-0-0-0-0-3"
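As a sanity check on the count of 64, here is a sketch that rebuilds the sequences with matrix indexing instead of apply and compares the number of rows to a closed-form count. This is a variant written for illustration, not the code above; the name expected and the formula are my own reading of the pattern: each of the 4 first states pairs with either the all-zero tail (1 option) or one of 3 non-zero states in one of 5 positions.

```r
states <- 0:3
len <- 6

dd <- expand.grid(first = states, inter_value = states, inter_position = 2:len)

# Start from all zeros, then fill the first column and the single intermediate cell
result <- matrix(0L, nrow(dd), len)
result[, 1] <- dd$first
result[cbind(seq_len(nrow(dd)), dd$inter_position)] <- dd$inter_value

# Rows with inter_value == 0 coincide across positions, so de-duplicate
result <- unique(result)

# 4 first states * (1 all-zero tail + 3 values * 5 positions)
expected <- length(states) * (1 + (length(states) - 1) * (len - 1))
nrow(result) == expected
```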
Here's a general nested for-loop solution. Not the most efficient in the world, but gets the desired result (Note: You can change states and/or sequence_len and the sequences will be generated automatically):
states <- 0:3
states_len <- length(states)
sequence_len <- 6
sequence_mat <- matrix(0, states_len * ((states_len - 1) * (sequence_len - 1) + 1), sequence_len)
rw <- 1
for(ii in states){
for(jj in states){
for(kk in 2:sequence_len){
if(jj != 0){
rw = rw + 1
}
sequence_mat[rw, 1] <- ii
sequence_mat[rw, kk] <- jj
if(jj == rev(states)[1] && kk == sequence_len){
rw = rw + 1
}
}
}
}
Output:
> head(sequence_mat, 20)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 0 0 0 0 0
[2,] 0 1 0 0 0 0
[3,] 0 0 1 0 0 0
[4,] 0 0 0 1 0 0
[5,] 0 0 0 0 1 0
[6,] 0 0 0 0 0 1
[7,] 0 2 0 0 0 0
[8,] 0 0 2 0 0 0
[9,] 0 0 0 2 0 0
[10,] 0 0 0 0 2 0
[11,] 0 0 0 0 0 2
[12,] 0 3 0 0 0 0
[13,] 0 0 3 0 0 0
[14,] 0 0 0 3 0 0
[15,] 0 0 0 0 3 0
[16,] 0 0 0 0 0 3
[17,] 1 0 0 0 0 0
[18,] 1 1 0 0 0 0
[19,] 1 0 1 0 0 0
[20,] 1 0 0 1 0 0

How to form the matrix of logical '1' and '0' using two vectors and logical operators in r?

Here is MATLAB code that forms a matrix of logical values '0' and '1':
A=[1 2 3 4 5 6 7 8 9 10 ];
N = numel(A);
step = 2; % Set this to however many zeros you want to add each column
index = N:-step:1;
val = (1:N+step).' <= index;
which results in
val=
1 1 1 1 1
1 1 1 1 1
1 1 1 1 0
1 1 1 1 0
1 1 1 0 0
1 1 1 0 0
1 1 0 0 0
1 1 0 0 0
1 0 0 0 0
1 0 0 0 0
0 0 0 0 0
0 0 0 0 0
How can I do the same task in R, particularly the val = (1:N+step).' <= index step?
One option is
i <- seq_len(ncol(m1))
sapply(rev(i), function(.i) {
m1[,.i][sequence(.i *2)] <- 1
m1[,.i]
})
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 1 1 1 1
# [2,] 1 1 1 1 1
# [3,] 1 1 1 1 0
# [4,] 1 1 1 1 0
# [5,] 1 1 1 0 0
# [6,] 1 1 1 0 0
# [7,] 1 1 0 0 0
# [8,] 1 1 0 0 0
# [9,] 1 0 0 0 0
#[10,] 1 0 0 0 0
#[11,] 0 0 0 0 0
#[12,] 0 0 0 0 0
Or vectorize it
i1 <- rep(i, rev(2*i))
m1[cbind(ave(i1, i1, FUN = seq_along), i1)] <- 1
m1
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 1 1 1 1
# [2,] 1 1 1 1 1
# [3,] 1 1 1 1 0
# [4,] 1 1 1 1 0
# [5,] 1 1 1 0 0
# [6,] 1 1 1 0 0
# [7,] 1 1 0 0 0
# [8,] 1 1 0 0 0
# [9,] 1 0 0 0 0
#[10,] 1 0 0 0 0
#[11,] 0 0 0 0 0
#[12,] 0 0 0 0 0
Or another option without creating a matrix beforehand
n <- 5
i1 <- seq(10, 2, by = -2)
r1 <- c(rbind(i1, rev(i1)))
matrix(rep(rep(c(1, 0), n), r1), ncol = n)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 1 1 1 1
# [2,] 1 1 1 1 1
# [3,] 1 1 1 1 0
# [4,] 1 1 1 1 0
# [5,] 1 1 1 0 0
# [6,] 1 1 1 0 0
# [7,] 1 1 0 0 0
# [8,] 1 1 0 0 0
# [9,] 1 0 0 0 0
#[10,] 1 0 0 0 0
#[11,] 0 0 0 0 0
#[12,] 0 0 0 0 0
data
m1 <- matrix(0, 12, 5)
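For comparison, a fairly direct translation of the MATLAB line can be sketched with outer, which compares every row index against every entry of index. This is an alternative sketch, not one of the options above. One thing to watch: in MATLAB 1:N+step means 1:(N+step), whereas in R `:` binds tighter than `+`, so the parentheses have to be made explicit (here via seq_len).

```r
A <- 1:10
N <- length(A)
step <- 2  # number of extra zeros added per column

index <- seq(N, 1, by = -step)  # 10 8 6 4 2

# val[i, j] is TRUE when row number i is <= index[j]
val <- outer(seq_len(N + step), index, "<=")

# Convert the logical matrix to 0/1 if numbers are preferred
val_num <- val * 1
```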

Populate vector down or up with unique element value (like na.locf)

I have a large data frame in which each column contains a single flag from the set {-1,1}; all other values are zero. I want to fill the rest of each column, upwards or downwards, with a value determined by that flag. For example, given a vector representing one column, I have
v <- rep(0,15)
v[12] <- 1
#I'd want a function that is something like:
f <- function(v,flag){
for(i in 2:length(v)){ if(v[i-1]==flag) v[i] <- flag else v[i]<-v[i]}
v
}
> v
[1] 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
> f(v,1)
[1] 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1
The example works fine for filling forward some v and a flag 1. I'd also want to be able to fill backwards with 1 based on a -1 flag. The obvious solution that comes to mind is na.locf, except I can't get it to work with a 1 in the middle and filling forward and backwards. Even if I populate the 0 elements with NA, it will still not partially fill up or down based on a flag.
Are there any simple and fast vectorized functions that could do this with a matrix or zoo object populated with all zeros, except where there is one element with 1 or -1 in each column, telling it to fill down or up with 1s depending on the value?
edit: Thinking about it a bit more, I came up with a possible solution that, along with an illustration, (hopefully) makes it clearer what I want.
The overall goal is to create a mask for additions/deletions to a fund index, by date, that fills forwards for additions (+1) and backwards for removals (-1). That is also why I thought of na.locf right away. I'm still not sure this is the best approach, though. Any thoughts appreciated.
#generate random matrix of flags
v.mtx <- matrix(0,15,10)
for(i in 1:10){
v.mtx[sample(1:15,1),i] <- sample(c(-1,1),1)
}
fill.flag <- function(v) {
if(any(-1 %in% v)) {v[1:which(v!=0)] <- 1}
else
if(any(1 %in% v)) {v[which(v!=0):length(v)] <- 1}
v
}
> v.mtx
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0 0 0 0 0 1 0 0 0 0
[2,] 0 0 0 0 0 0 0 0 0 0
[3,] 0 0 0 0 0 0 0 0 0 0
[4,] 0 0 0 0 0 0 0 0 0 0
[5,] 0 0 0 0 0 0 0 0 0 0
[6,] 0 0 0 0 1 0 -1 0 0 0
[7,] 0 0 0 -1 0 0 0 0 0 0
[8,] 0 0 0 0 0 0 0 0 0 0
[9,] 0 0 0 0 0 0 0 1 0 -1
[10,] 0 0 0 0 0 0 0 0 -1 0
[11,] 0 0 0 0 0 0 0 0 0 0
[12,] 0 0 0 0 0 0 0 0 0 0
[13,] 0 0 1 0 0 0 0 0 0 0
[14,] 0 0 0 0 0 0 0 0 0 0
[15,] 1 -1 0 0 0 0 0 0 0 0
> apply(v.mtx,2,fill.flag)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0 1 0 1 0 1 1 0 1 1
[2,] 0 1 0 1 0 1 1 0 1 1
[3,] 0 1 0 1 0 1 1 0 1 1
[4,] 0 1 0 1 0 1 1 0 1 1
[5,] 0 1 0 1 0 1 1 0 1 1
[6,] 0 1 0 1 1 1 1 0 1 1
[7,] 0 1 0 1 1 1 0 0 1 1
[8,] 0 1 0 0 1 1 0 0 1 1
[9,] 0 1 0 0 1 1 0 1 1 1
[10,] 0 1 0 0 1 1 0 1 1 0
[11,] 0 1 0 0 1 1 0 1 0 0
[12,] 0 1 0 0 1 1 0 1 0 0
[13,] 0 1 1 0 1 1 0 1 0 0
[14,] 0 1 1 0 1 1 0 1 0 0
[15,] 1 1 1 0 1 1 0 1 0 0
As @G. Grothendieck commented, you can try cummax and cummin, i.e.
f1 <- function(x){
if(sum(x) == 1){
return(cummax(x))
}else{
return(rev(cummin(rev(x)))* -1)
}
}
#apply as usual
apply(v.mtx, 2, f1)
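To see what the two branches do, here is a small sketch on two hand-made vectors (up and down are illustrative names). It assumes, as in the question, that each column holds at most one non-zero flag.

```r
f1 <- function(x) {
  if (sum(x) == 1) {
    cummax(x)                   # +1 flag: fill forwards
  } else {
    rev(cummin(rev(x))) * -1    # -1 flag: fill backwards, then flip the sign
  }
}

up   <- c(0, 0, 1, 0, 0)
down <- c(0, 0, -1, 0, 0)

filled_up   <- f1(up)    # 0 0 1 1 1
filled_down <- f1(down)  # 1 1 1 0 0
```

A column of all zeros falls into the second branch and comes back as all zeros.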

Clustering of Count data

I am currently trying to find clusters in a data set that looks like this:
Dienstag 19 Mittwoch 20 Donnerstag 21 Freitag 22 Montag 25 Dienstag 26 Donnerstag 28
[1,] 0 0 0 0 0 0 NA
[2,] 0 0 0 0 0 0 NA
[3,] 0 0 0 0 0 0 NA
[4,] 0 0 0 0 1 0 NA
[5,] 1 0 1 1 1 1 NA
[6,] 0 0 0 0 0 0 NA
[7,] 4 0 1 0 2 1 NA
[8,] 0 1 2 1 0 2 NA
[9,] 0 0 1 0 0 0 NA
[10,] 1 0 0 0 0 1 0
[11,] 2 0 1 0 0 5 0
[12,] 1 0 0 0 0 1 1
[13,] 0 1 0 0 0 0 0
[14,] 0 0 1 0 4 1 0
Each entry is the number of times a user used an application on a given day (column) and hour (row).
I want to find patterns/clusters that relate usage to the hour, but I don't know how to go about it. It would be really helpful if you could suggest some methods.
There are statistical approaches to clustering as well, but here's a visual approach. I was lazy and used libraries I am familiar with, but this can likely be accomplished more efficiently with base tools.
## dat <- read.table(text=" Dienstag.19 Mittwoch.20 Donnerstag.21 Freitag.22 Montag.25 Dienstag.26 Donnerstag.28
## [1,] 0 0 0 0 0 0 NA
## [2,] 0 0 0 0 0 0 NA
## [3,] 0 0 0 0 0 0 NA
## [4,] 0 0 0 0 1 0 NA
## [5,] 1 0 1 1 1 1 NA
## [6,] 0 0 0 0 0 0 NA
## [7,] 4 0 1 0 2 1 NA
## [8,] 0 1 2 1 0 2 NA
## [9,] 0 0 1 0 0 0 NA
## [10,] 1 0 0 0 0 1 0
## [11,] 2 0 1 0 0 5 0
## [12,] 1 0 0 0 0 1 1
## [13,] 0 1 0 0 0 0 0
## [14,] 0 0 1 0 4 1 0", header=TRUE)
dat$hour <- factor(1:nrow(dat))
library(reshape2); library(qdap); library(ggplot2); library(plyr)
dat2 <- melt(dat)
dat2[, 2] <- beg2char(dat2[, 2], ".")
dat2 <- ddply(dat2, .(variable), transform,
rescale = scale(value))
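# Plot the rescaled counts from dat2 (the data frame built above) as a heat map
ggplot(dat2, aes(variable, hour)) + geom_tile(aes(fill=rescale)) +
scale_fill_gradient(low = "white", high = "red")
# Save the plot after it has been created
ggsave("heat.png")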
Most clustering algorithms assume continuous data. While you can of course "cast" integers to double values, the results will no longer be as meaningful as they would be for truly continuous values.
I like Tyler's visual approach. If there is a meaningful pattern, your brain's visual cortex is probably the best tool to discover it.
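For readers who still want a numerical starting point, here is a minimal base-R sketch of hierarchical clustering on the hours. Several things here are assumptions rather than part of the answers above: only the first six day columns are used (the seventh has missing values), the distance is plain Euclidean on raw counts, and k = 3 is an arbitrary choice. Given the caveat about count data, the grouping should be read as exploratory only.

```r
# First six day columns of the example, one row per hour (typed in by hand)
counts <- matrix(c(
  0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 1, 0,
  1, 0, 1, 1, 1, 1,
  0, 0, 0, 0, 0, 0,
  4, 0, 1, 0, 2, 1,
  0, 1, 2, 1, 0, 2,
  0, 0, 1, 0, 0, 0,
  1, 0, 0, 0, 0, 1,
  2, 0, 1, 0, 0, 5,
  1, 0, 0, 0, 0, 1,
  0, 1, 0, 0, 0, 0,
  0, 0, 1, 0, 4, 1
), ncol = 6, byrow = TRUE)

# Euclidean distance and complete linkage are the defaults (an assumption, not a recommendation)
hc <- hclust(dist(counts))

# Cut the tree into an arbitrary number of groups
groups <- cutree(hc, k = 3)

# Hours (row numbers) in each group
split(seq_len(nrow(counts)), groups)
```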
