Finding repeated patterns in R

Say I have a matrix of numbers between 0 and 100, with 100 rows and 5 columns, for instance:
1 5 10 15 3
2 15 3 8 27
1 22 34 45 35
28 27 32 3 8
......
I would like to find repeated "patterns" of numbers (mainly couples or triplets).
So in my example I would have the couple 3,15 appearing twice and the triplet 3, 8, 27 also appearing twice (I don't care about the order).
How would you implement that in R?
I would like to have couples and triplets separately and have their count.
thanks
nico

Here is one way. For each row of your 100-row matrix, find all pairs/triples of numbers (using combn) and do a frequency count (using table) of them. The pasteSort function defined below sorts a vector and pastes its elements into a single string, so the order within a pair or triple does not matter. We apply it to every pair/triple in each row and collect the results from the whole matrix before counting. Note that if a pair occurs twice within the same row, that also counts as a repeat.
> mtx <- matrix( c(1,5,10,15,3,
2, 15, 3, 8, 27,
1, 22, 34, 45, 35,
28, 27, 32, 3, 8), ncol=5, byrow=TRUE)
> pasteSort <- function( x ) do.call(paste, as.list( sort( x) ) )
> pairs <- c( apply(mtx, 1, function(row) apply( combn(row, 2), 2, pasteSort)) )
> pairFreqs <- table(pairs)
> pairFreqs[ pairFreqs > 1 ]
3 15 3 27 3 8 8 27
2 2 2 2
> triples <- c( apply(mtx, 1, function(row) apply( combn(row, 3), 2, pasteSort)) )
> tripleFreqs <- table( triples )
> tripleFreqs[ tripleFreqs > 1 ]
3 8 27
2
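The same steps can be wrapped in a small helper that works for any combination size. This is only a sketch: count_combos is a hypothetical name, not part of the answer above, and it sorts each row before calling combn so that every combination comes out in sorted order.

```r
# Hypothetical helper: count k-element combinations that occur more than once.
count_combos <- function(m, k) {
  combos <- unlist(lapply(seq_len(nrow(m)), function(i) {
    # Sorting the row first makes each combination order-independent
    combn(sort(m[i, ]), k, FUN = paste, collapse = " ")
  }))
  freqs <- table(combos)
  freqs[freqs > 1]
}

mtx <- matrix(c( 1,  5, 10, 15,  3,
                 2, 15,  3,  8, 27,
                 1, 22, 34, 45, 35,
                28, 27, 32,  3,  8), ncol = 5, byrow = TRUE)

pair_counts   <- count_combos(mtx, 2)
triple_counts <- count_combos(mtx, 3)
```

As in the original code, a combination that occurs twice within one row is counted twice.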

Related

Examine if a value is in an interval using R

Having the following vector:
t <- c(2, 6, 8, 20, 22, 30, 40, 45, 60)
I would like to find the values that fall between the following intervals:
g <- list(c(1,20), c(20, 40))
The desired output is:
1, 20 c(2, 6, 8)
20, 40 c(20, 22, 30)
Using the dplyr library, I do the following:
library(dplyr)
for(i in t){
for(h in g){
if(between(i, h[[1]], h[[2]])==TRUE){print(c(i, h[[1]], h[[2]]))}
}}
Is there a better way of doing this in R?

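We can loop over the list 'g' and, for each interval, build a logical vector by comparing 't' with the interval's first and second values (>= and <), then use it to subset 't'. The lower bound is inclusive and the upper bound exclusive, so 20 lands only in the second interval.
lapply(g, function(x) t[t >= x[1] & t < x[2]])
Output: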
[[1]]
[1] 2 6 8
[[2]]
[1] 20 22 30
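To get a result closer to the layout shown in the question, the list can be labelled with its intervals. A minimal sketch; the names vals and res are my own (vals is used instead of t to avoid masking base::t).

```r
vals <- c(2, 6, 8, 20, 22, 30, 40, 45, 60)
g <- list(c(1, 20), c(20, 40))

# Half-open intervals [lower, upper), same comparison as the lapply() call above
res <- lapply(g, function(x) vals[vals >= x[1] & vals < x[2]])

# Label each element with its interval, e.g. "1, 20"
names(res) <- vapply(g, paste, character(1), collapse = ", ")
res
```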

library(purrr)
library(dplyr)
# between() includes both endpoints, so 20 appears in both results
map(g, ~ keep(t, between(t, .[1], .[2])))
[[1]]
[1] 2 6 8 20
[[2]]
[1] 20 22 30 40

You may find findInterval() from base R useful:
g <- c(1, 20, 40)
t <- c(2, 6, 8, 20, 22, 30, 40, 45, 60)
findInterval(t, g)
#> [1] 1 1 1 2 2 2 3 3 3
So t[1], t[2] and t[3] are in the first interval, t[4], t[5] and
t[6] in the second, and t[7], t[8] and t[9] in the third (meaning that
these values are greater than or equal to the right end point of the second interval).
Values lower than 1 would be labelled 0:
t2 <- c(-1, 0, 2, 6, 8, 20, 22, 30, 40, 45, 60)
findInterval(t2, g)
#> [1] 0 0 1 1 1 2 2 2 3 3 3
You can save the result of findInterval() as e.g. y and use which(y==1) to find which entries correspond to the first interval.
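That last step might look like the sketch below. The variable names (vals, breaks, first, groups) are illustrative; split() is added here as one way to collect every interval at once.

```r
vals   <- c(2, 6, 8, 20, 22, 30, 40, 45, 60)
breaks <- c(1, 20, 40)

y <- findInterval(vals, breaks)

# Values whose interval index is 1, i.e. those in [1, 20)
first <- vals[which(y == 1)]

# One list element per interval index ("1", "2", "3")
groups <- split(vals, y)
```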

We can try cut + is.na like below
lapply(
g,
function(x) {
t[!is.na(cut(t, x, include.lowest = TRUE))]
}
)
which gives
[[1]]
[1] 2 6 8 20
[[2]]
[1] 20 22 30 40

Sum every x elements in a vector

I have a vector v like:
v <- c(1, 2, 46, 6, 3, 5, 67, 2, ..., 9)
I want to add every third number, so the result would be 1 + 6 + 67 + ...
Thank you!

I would suggest creating a sequence of indices from 1 to the length of your vector, in steps of the width you want (3 in this case), and then summing the elements at those indices:
#Data
v <- c(1, 2, 46, 6, 3, 5, 67, 2, 9)
#Seq
seqv <- seq(1,length(v),by = 3)
#Sum
sum(v[seqv])
Output:
[1] 74
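If the step or the starting position needs to vary, the same idea can be put in a function. This is a sketch under my own naming: sum_every and its start argument are hypothetical and not part of the answer above.

```r
# Hypothetical helper: sum every n-th element of v, beginning at index `start`.
sum_every <- function(v, n, start = 1) {
  if (start > length(v)) return(0)   # nothing to pick
  sum(v[seq(start, length(v), by = n)])
}

v <- c(1, 2, 46, 6, 3, 5, 67, 2, 9)
total <- sum_every(v, 3)             # 1 + 6 + 67

# Sums for each of the three possible starting positions
offsets <- vapply(1:3, function(s) sum_every(v, 3, s), numeric(1))
```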

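You could create a sequence of indices in steps of three, use it to index the vector v, and then sum the result. Note that the sequence below stops at 9, so the tenth element is left out; seq(1, length(v), by = 3) would cover the whole vector.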
v <- 10:19
s <- seq(1,9, by=3)
> v
[1] 10 11 12 13 14 15 16 17 18 19
> s
[1] 1 4 7
> sum(v[s])
[1] 39

Split a vector in R depending on entries

I input a vector vec <- c(2, 3, 4, 8, 10, 12, 15, 19, 20, 23, 27, 28, 39, 47, 52, 60, 64, 75), and the size of the intervals that I want to break the vector entries into.
In this example I want to break this into 9 different vectors based on the size of each entry.
In my case I want vector number 1 to hold the entries in the interval [1,9], vector 2 the entries in [10,18], etc.
In other words:
vec1: 2 3 4 8
vec2: 10 12 15
vec3: 19 20 23 27
etc...
I have tried using the split function but I do not know how to set a ratio that will work.

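Maybe the following will do what you want. The breaks run up to the next multiple of 9 at or above max(vec), so the largest values (75 here) are not dropped as NA.
breaks <- seq(0, ceiling(max(vec) / 9) * 9, by = 9)
f <- cut(vec, breaks, include.lowest = TRUE)
sp <- split(vec, f)
sp <- sp[sapply(sp, function(x) length(x) != 0)]
sp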

Use integer division %/% to return a vector of which group each value belongs in. Then split into separate vectors. Use (vec-1) to be "inclusive", i.e. 27 goes with group 3, not group 4.
split(vec,(vec-1) %/% 9)
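The groups returned by that call are named "0", "1", "2", and so on. A sketch of one way to label them with their intervals instead; width, grp, lab and parts are names I introduce for illustration, and the ordering relies on vec being sorted as in the question.

```r
vec <- c(2, 3, 4, 8, 10, 12, 15, 19, 20, 23, 27, 28, 39, 47, 52, 60, 64, 75)
width <- 9

grp <- (vec - 1) %/% width   # 0 for 1..9, 1 for 10..18, ...
lab <- paste0("[", grp * width + 1, ",", (grp + 1) * width, "]")

# factor() with explicit levels keeps the groups in increasing order
# (this relies on vec being sorted, as it is here)
parts <- split(vec, factor(lab, levels = unique(lab)))
```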
Edit:
Another way, using dplyr and cut, which explicitly tags each interval (the first 11 rows of the result are shown):
library(dplyr)
df2 <- data.frame(vec = vec)
df2 %>% mutate(interval = cut(vec, breaks = seq(0, ((max(vec) %/% 9) + 1) * 9, 9), include.lowest = TRUE, right = TRUE))
vec interval
1 2 [0,9]
2 3 [0,9]
3 4 [0,9]
4 8 [0,9]
5 10 (9,18]
6 12 (9,18]
7 15 (9,18]
8 19 (18,27]
9 20 (18,27]
10 23 (18,27]
11 27 (18,27]

Maybe this, using keep from purrr with one predicate per interval:
library(purrr)
vec <- c(2, 3, 4, 8, 10 ,12, 15 ,19, 20, 23, 27, 28, 39, 47, 52, 60, 64, 75)
vec1 <- keep(vec, function(x) x >= 1 & (x) <= 9)
vec2 <- keep(vec, function(x) x >= 10 & (x) <= 18)

R Multiplying a list of lists with a vector

I have a dataframe with 1 column consisting of 10 lists, each with a varying number of elements. I also have a vector of 10 integers.
I want to take the "sumproduct" of each of the 10 lists with its corresponding vector value, and end up with 10 values.
Value 1 = sumproduct(First list, First vector value)
Value 2 = sumproduct(Second list, Second vector value)
etc...
Final_Answer <- c(Value 1, Value 2, ... , Value 10)
I have a function that generates the dataframe containing lists of numbers representing years. The dataframe is constructed using a loop that generates each value and row-binds it to the dataframe.
Time_Function <- function(Maturity) {
  # Count, Start_Date and the initial Time data frame are assumed to be defined elsewhere
  for (i in 0:Count) {
    x <- as.numeric((as.Date(as.Date(Maturity) - i * 365) - Start_Date) / 365)
    Time <- rbind(Time, data.frame(x))
  }
  return(Time)
}
The result is this:
http://pastebin.com/J6phR2hv
http://i.imgur.com/Sf4mpA5.png
If my vector looks like [1,2,3,4...,10], I want the output to be:
Final Answer = [(1*1.1342466 + 1*0.6342466 + 1* 0.1342466), (2*1.3835616 + 2*0.8835616 + 2*0.3835616), ... , ( ... +10*0.0630137)]

Assuming you want to multiply each value in the list by the respective scalar and then add it all up, here is one way to do it.
list1 <- mapply(rep, 1:10, 10:1)
vec1 <- 1:10
df <- data.frame( I(list1), vec1)
df
list1 vec1
1 1, 1, 1,.... 1
2 2, 2, 2,.... 2
3 3, 3, 3,.... 3
4 4, 4, 4,.... 4
5 5, 5, 5,.... 5
6 6, 6, 6,.... 6
7 7, 7, 7, 7 7
8 8, 8, 8 8
9 9, 9 9
10 10 10
mapply(df$list1, df$vec1, FUN = function(x, y) {y* sum(x)})
[1] 10 36 72 112 150 180 196 192 162 100
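Applied to the figures quoted in the question, the same pattern might look like this. Only the first two lists are written out there, so this sketch uses just those two; year_lists, weights and final_answer are names I chose for illustration.

```r
# First two lists and weights taken from the example in the question
year_lists <- list(c(1.1342466, 0.6342466, 0.1342466),
                   c(1.3835616, 0.8835616, 0.3835616))
weights <- c(1, 2)

# Multiply each list by its weight and add up the products
final_answer <- mapply(function(x, w) sum(w * x), year_lists, weights)
```

Since the weight is a single number per list, sum(w * x) and w * sum(x) give the same value.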

Multiplication of several vectors

I have 10 vectors (v_1 to v_10) and I need all of them multiplied by another vector v_mult (i.e. v_1*v_mult, v_2*v_mult etc.). How do I solve this problem within a for-loop? I'm restricted to a loop solution (which I cannot find) because it is part of a larger analysis.
v_10<-c(2, 3, 5, 8)
v_20<-c(3, 9, 0, 1)
v_30<-c(15, 9, 6, 0)
v_40<-c(4, 9, 6, 1)
v_50<-c(1, 7, 3, 9)
v_60<-c(5, 9, 5, 1)
v_70<-c(5, 8, 2, 6)
v_80<-c(5, 8, 1, 6)
v_90<-c(5, 0, 1, 6)
v_10<-c(2, 8, 1, 0)
v_mult<-c(8, 5, 1, 9)

Those vectors should all be together in a matrix:
vlist <- mget(ls(pattern = "^v_[[:digit:]]+$"))
m <- do.call(cbind, vlist)
m * v_mult
# v_10 v_20 v_30 v_40 v_50 v_60 v_70 v_80 v_90
#[1,] 16 24 120 32 8 40 40 40 40
#[2,] 40 45 45 45 35 45 40 40 0
#[3,] 1 0 6 6 3 5 2 1 1
#[4,] 0 9 0 9 81 9 54 54 54
You can of course extract each vector from the matrix using column subsetting, e.g., m[, "v_10"] or m[, 1].
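Since the question asks for a for-loop, here is a minimal sketch of one. It keeps the vectors in a named list rather than as separate objects; vlist and products are illustrative names, only two of the vectors are included, and v_10 uses the first definition given in the question.

```r
v_mult <- c(8, 5, 1, 9)

# Two of the vectors from the question, kept in a named list
vlist <- list(v_10 = c(2, 3, 5, 8), v_20 = c(3, 9, 0, 1))

# Pre-allocate the result list, then fill it one vector at a time
products <- vector("list", length(vlist))
names(products) <- names(vlist)
for (nm in names(vlist)) {
  products[[nm]] <- vlist[[nm]] * v_mult
}
```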

We can get all the vector objects in a list using mget and multiply each element of the list by 'v_mult' using Map. This assumes v_10 through v_100 all exist (the question as posted defines v_10 twice, so the last vector was presumably meant to be v_100).
Map('*',mget(paste('v', seq(10, 100, by=10), sep="_")), list(v_mult))
Or use set from data.table, which would be very fast as it avoids the overhead of [.data.table.
library(data.table)
DT <- setDT(mget(paste('v', seq(10, 100, by=10), sep="_")))
for(j in seq_along(DT)){
set(DT, i=NULL, j= j, value= DT[[j]]*v_mult)
}
