Multiple loops on a matrix in R

I have the following data:
coordinate <- read.table(text = " 18.915 13.462 31.598
17.898 14.453 32.160
18.220 15.420 32.853
19.208 12.313 32.573
20.393 11.524 32.110
20.344 10.809 31.085
21.595 16.610 29.912")
amnumber <- c(1,1,2,3,3,3,4)
atname <-as.data.frame( c("A","B","A","C","D","C","H"),stringsAsFactors = F)
library(geometry)
tri <- delaunayn(coordinate)
tri
[,1] [,2] [,3] [,4]
[1,] 1 3 7 2
[2,] 4 1 6 2
[3,] 4 1 3 2
[4,] 4 1 3 7
[5,] 5 4 3 7
[6,] 5 1 6 7
[7,] 5 4 1 7
[8,] 5 4 1 6
I want to run nested loops over the tri matrix so that each value in a row is paired with every value after it in that row. For example, value 1 in the first row is related to 3, 7 and 2, so the output matrix should contain 1 at those index pairs; then value 3 in the first row is related to 7 and 2, and so on. The result should be a matrix that contains only 0 and 1 values. To this end I wrote the loops below:
for (k in 1:nrow(tri)){
for (i in 1:4){
for (j in i+1){
c <- abs(amnumber[tri[k,i]]-amnumber[tri[k,j]])
if (c>=1){
if (!((atname[tri[k,i],]%in%"N")&&(atname[tri[k,j],]%in%"C")&&(c%in%1)||
(atname[tri[k,i],]%in%"C")&&(atname[tri[k,j],]%in%"N")&&(c%in%1))){
d <- sqrt(sum((coordinate[tri[k,i],]-coordinate[tri[k,j],])^2))
if (d<=tridist){
adj_tri[tri[k,i],tri[k,j]] <- 1
adj_tri[tri[k,j],tri[k,i]] <- 1
adj_tri[is.na(adj_tri)] <- 0
}
}
}
}
}
}
But it did not work and raised an error. The i index runs up to the number of columns in the tri matrix, and I think the problem is in the third loop, but I could not fix it. Any help would be appreciated.
Besides, this is too slow. Could you please help me rewrite it with lapply to speed it up?
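One possible fix, offered as a sketch rather than a tested answer: the inner loop `for (j in i+1)` visits a single neighbour and asks for column 5 of tri when i is 4, which would explain a subscript error. The version below hard-codes tri exactly as printed above so that it runs without the geometry package, and replaces the three loops with vectorised indexing over the six column pairs from combn(4, 2). The cutoff tridist = 5, the zero-initialised adj_tri, and the use of a plain character vector for atname are assumptions, since the question does not show them.

```r
# tri as printed in the question (hard-coded so no package is needed)
tri <- matrix(c(1, 3, 7, 2,
                4, 1, 6, 2,
                4, 1, 3, 2,
                4, 1, 3, 7,
                5, 4, 3, 7,
                5, 1, 6, 7,
                5, 4, 1, 7,
                5, 4, 1, 6), ncol = 4, byrow = TRUE)
coordinate <- matrix(c(18.915, 13.462, 31.598,
                       17.898, 14.453, 32.160,
                       18.220, 15.420, 32.853,
                       19.208, 12.313, 32.573,
                       20.393, 11.524, 32.110,
                       20.344, 10.809, 31.085,
                       21.595, 16.610, 29.912), ncol = 3, byrow = TRUE)
amnumber <- c(1, 1, 2, 3, 3, 3, 4)
atname   <- c("A", "B", "A", "C", "D", "C", "H")  # assumption: plain vector
tridist  <- 5                                     # assumption: cutoff not given

# All 6 unordered column pairs within a 4-column row
pairs <- combn(4, 2)
a <- as.vector(tri[, pairs[1, ]])  # first point of every pair, all rows
b <- as.vector(tri[, pairs[2, ]])  # second point, aligned with a

gap <- abs(amnumber[a] - amnumber[b])
# Same exclusion as the original if(): N-C or C-N pairs with gap exactly 1
excluded <- gap == 1 &
  ((atname[a] == "N" & atname[b] == "C") |
   (atname[a] == "C" & atname[b] == "N"))
d <- sqrt(rowSums((coordinate[a, ] - coordinate[b, ])^2))
keep <- gap >= 1 & !excluded & d <= tridist

# Fill both triangles of the adjacency matrix in one step each
n <- nrow(coordinate)
adj_tri <- matrix(0, n, n)
adj_tri[cbind(a[keep], b[keep])] <- 1
adj_tri[cbind(b[keep], a[keep])] <- 1
```

If the loop form is preferred, changing the inner loops to `for (i in 1:3)` and `for (j in (i+1):4)` covers the same six pairs per row.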

Related

How to identify all possible permutations of a time series according to order of permutation

I am trying to figure out a way to translate a financial time series into a symbolic time series that accounts for all "meaningful" permutations according to a given order (in R):
Example:
Given a Time Series: ts= c(1,2,3,4,5)
If Order=2 I would like to extract the following patterns:
1) 1 1 (ts[i]==ts[i+1])
2) 1 2 (ts[i]<ts[i+1])
3) 2 1 (ts[i]>ts[i+1])
(pattern 2 2 is redundant since equality is accounted for via pattern 1 1)
If Order=3 I would like to extract the following patterns:
1) 1 2 3 (ts[i]<ts[i+1]<ts[i+2])
2) 1 2 2 (ts[i]<ts[i+1]==ts[i+2])
3) 1 2 1 (ts[i]<ts[i+1]>ts[i+2])
4) 2 2 3 (ts[i]==ts[i+1]<ts[i+2])
5) 2 2 2 (ts[i]==ts[i+1]==ts[i+2])
6) 2 2 1 (ts[i]==ts[i+1]>ts[i+2])
7) 3 2 1 (ts[i]>ts[i+1]>ts[i+2])
8) 3 2 2 (ts[i]>ts[i+1]==ts[i+2])
9) 3 2 3 (ts[i]>ts[i+1]<ts[i+2])
What I am looking for is a scalable (in terms of Order being 2,3,4,5 and so on) and automated (function-wise) way to do this.
I have been struggling with packages such as "permute", "gtools" and "combinat", but to no avail. I think what I seek is a special case of permutations. Can anyone help me with this problem?
My quest started from reading papers on "Permutation Entropy"; a Google Scholar search will turn up the relevant bibliography for anyone interested.
Try this:
library(zoo)
ts <- c(1,3,2,4,5,4,3,3,2)
rollapply(ts, 2, rank, ties='min')
[,1] [,2]
[1,] 1 2
[2,] 2 1
[3,] 1 2
[4,] 1 2
[5,] 2 1
[6,] 2 1
[7,] 1 1
[8,] 2 1
When order = 3:
rollapply(ts, 3, rank, ties='min')
[,1] [,2] [,3]
[1,] 1 3 2
[2,] 2 1 3
[3,] 1 2 3
[4,] 1 3 1
[5,] 3 2 1
[6,] 3 1 1
[7,] 2 2 1
That's not quite what you want but it is close. The main issue is seen in the first two rows where you don't wish to distinguish the ranks of the first and third values when both are higher or lower than the middle observation. Here is a fix.
z <- rollapply(ts, 3, rank, ties='min')
lohilo <- z[,1] < z[,2] & z[,3] < z[,2]
hilohi <- z[,1] > z[,2] & z[,3] > z[,2]
z[lohilo,] <- rep(c(1,2,1),rep(sum(lohilo),3))
z[hilohi,] <- rep(c(2,1,2),rep(sum(hilohi),3))
z
[,1] [,2] [,3]
[1,] 1 2 1
[2,] 2 1 2
[3,] 1 2 3
[4,] 1 2 1
[5,] 3 2 1
[6,] 3 1 1
[7,] 2 2 1
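For orders above 3, patching individual rank patterns by hand becomes tedious. A different technique from the rank-based one above, sketched under the assumption that only the direction of each consecutive step matters (which matches the comparisons listed in the question), is to encode every window by the signs of its successive differences. A window of a given order then has order - 1 signs, giving 3 patterns for order 2 and 9 for order 3. The helper name sign_patterns is hypothetical.

```r
# Encode each window of `order` values by the direction of its steps:
# -1 = down, 0 = equal, +1 = up. Base R only.
sign_patterns <- function(x, order) {
  steps <- sign(diff(x))
  # embed() returns one row per window, most recent value first
  w <- embed(steps, order - 1)
  # reverse the columns to restore chronological order
  w[, (order - 1):1, drop = FALSE]
}

x <- c(1, 3, 2, 4, 5, 4, 3, 3, 2)
p <- sign_patterns(x, 3)
# the first window is 1, 3, 2: up then down, so p[1, ] is 1 -1
```

Each row can be collapsed to a single label if needed, for example with `apply(p, 1, paste, collapse = " ")`.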
Permutations of a time series are computed within the function below, specifically made for permutation entropy (Source):
# Function to compute the ordinal patterns for a given time series.
# Input (2 arguments. Null arguments are not valid)
# x = Given time series (type=numeric vector)
# dim = Embedding dimension (type=numeric)
# Commonly used value of dim ranges from 3 to 7
# Output is a numeric vector of size=(dim)!
ordinal_pattern<-function(x,dim){
# Generate ordinal numbers to assign. For example if dim =3, then
# ordinal number=0,1,2
ordinal_numbers<-seq(0,(dim-1),by=1)
# Compute all possible permutations of the ordinal numbers.
# Maximum size of possible_pattern=dim!
possible_pattern<-(combinat::permn(ordinal_numbers))
# Initialize result. Result is the output.
result<-0
result[1:length(possible_pattern)]<-0
# Loop for computation of ordinal pattern
for(i in 1:(length(x)-(dim-1))){
temp<-x[i:(i+(dim-1))]
tempseq<-seq(0,dim-1,by=1)
tempdata<-data.frame(temp,tempseq)
tempdata<-tempdata[order(temp),]
for(j in 1: length(possible_pattern)){
if (all(possible_pattern[[j]]==tempdata$tempseq)){
result[j]<-result[j]+1
}
}
}
return(result)
}

Enhancing speed / vectorization of for loop including sample-function R

I am looking for a fast way to create a matrix with integer values that have a certain probability to be chosen. Given a vector L=c(3,4,2) and a probability vector Prob=c(0.4,0.35,0.25,0.1,0.25,0.4,0.25,0.6,0.4) with sum(L) elements, I want to choose, for example, an element between 1:L[1] = 1:3 with probability Prob[1:L[1]] = c(0.4,0.35,0.25). This should be performed over all elements of L for several times determined by the parameter rows and be stored into a matrix named POP.
My solution is very slow because of the two for-loops, and I am looking for a solution with much better performance through vectorization or other techniques.
My solution to this problem looks as follows:
L = c(3,4,2)
L_cum = c(0,cumsum(L)) #vector to call vector sections from Prob
Prob = c(0.4,0.35,0.25,0.1,0.25,0.4,0.25,0.6,0.4) #probability vector for sum(L) elements
rows = 5 #number of rows of matrix POP
POP = matrix(0,rows,length(L))
for(i in 1:rows){
for(j in 1:length(L)){
POP[i,j] = sample(1:L[j],1,prob=Prob[(L_cum[j]+1):L_cum[j+1]])
}
}
I'd just try:
set.seed(1234)
#set the number of extractions
n<-10
vapply(split(Prob,rep(seq_along(L),L)),
function(x) sample(length(x),n,replace=TRUE,prob=x),
integer(n))
# 1 2 3
# [1,] 1 4 1
# [2,] 2 2 1
# [3,] 2 3 1
# [4,] 2 1 1
# [5,] 3 3 1
# [6,] 2 4 2
# [7,] 1 3 1
# [8,] 1 3 2
# [9,] 2 3 2
#[10,] 2 3 1

Getting all possible two column subsets

I am a relative newbie to R and I am now very close to being finished with a rather long script with many thanks to everyone who helped me thus far at various steps. I have another point I am stuck on. I have simplified the issue to this:
Dataset1
ax ay
1 3
2 4
Dataset2
bx by
5 7
6 8
A <- dataset1
B <- dataset2
a <- 2 #number of columns
b <- 1:2
(my datasets will vary in number of columns and so I need to be able to vary this factor)
I want this answer in any order (i.e. all possible combinations of two columns one from each of the two datasets) like this or equivalent.
[[1]]
1 5
2 6
[[2]]
1 7
2 8
[[3]]
3 5
4 6
[[4]]
3 7
4 8
But I am not getting it.
I tried a bunch of things and the closest to what I want was with this:
i <- 1
for( i in 1:a )
{
e <- lapply(B, function(x) as.data.frame(cbind(A, x)))
print(e)
i <- i+1
}
Close, yes. I can take the answer and do some fiddling and subsetting, but it's not right and there must be an easy way to do this. I have not seen anything like this in my online searches. Any help much appreciated.
Does something like this work for you?
Dataset1 <- data.frame(ax=1:2,ay=3:4)
Dataset2 <- data.frame(bx=5:6,by=7:8)
apply(
expand.grid(seq_along(Dataset1),seq_along(Dataset2)),
1,
function(x) cbind(Dataset1[x[1]],Dataset2[x[2]])
)
Result:
[[1]]
ax bx
1 1 5
2 2 6
[[2]]
ay bx
1 3 5
2 4 6
[[3]]
ax by
1 1 7
2 2 8
[[4]]
ay by
1 3 7
2 4 8
I think the easiest way to do this is very similar to what you tried: use two explicit loops. However, there are still some things I would do differently:
Pre allocate the list space
Use an explicit counter
Use drop=FALSE
Then you can do the following.
A <- read.table(text = "ax ay
1 3
2 4", header = TRUE)
B <- read.table(text = "bx by
5 7
6 8", header = TRUE)
out <- vector("list", length = ncol(A) * ncol(B))
counter <- 1
for (i in 1:ncol(A)) {
for (j in 1:ncol(B)) {
out[[counter]] <- cbind(A[,i, drop = FALSE], B[,j, drop = FALSE])
counter <- counter + 1
}
}
out
## [[1]]
## ax bx
## 1 1 5
## 2 2 6
##
## [[2]]
## ax by
## 1 1 7
## 2 2 8
##
## [[3]]
## ay bx
## 1 3 5
## 2 4 6
##
## [[4]]
## ay by
## 1 3 7
## 2 4 8
If I understand the question, I think you can use combn to select the columns you want. For instance, if you wanted all combinations of 8 columns taken 2 at a time, you could do:
combn(1:8, 2)
Which gives (in part for readability):
combn(1:8,2)[,c(1:5, 15:18)]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 1 1 1 1 1 3 3 3 3
[2,] 2 3 4 5 6 5 6 7 8
So then columns of this matrix can be used as the indices you want.
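Since combn draws both columns from one pool, it also produces pairs that come from the same dataset. A sketch of one way to keep only the cross-dataset pairs, using the example data from the question (the names AB, idx and cross are introduced here for illustration):

```r
A <- data.frame(ax = 1:2, ay = 3:4)
B <- data.frame(bx = 5:6, by = 7:8)

# Put the columns of both datasets side by side, A first
AB <- cbind(A, B)

# All pairs of column positions; within each pair the first is smaller
idx <- combn(ncol(AB), 2)

# Keep pairs whose first column is from A and whose second is from B
cross <- idx[, idx[1, ] <= ncol(A) & idx[2, ] > ncol(A), drop = FALSE]

# One two-column data frame per remaining pair
out <- lapply(seq_len(ncol(cross)),
              function(k) AB[, cross[, k], drop = FALSE])
```

This yields ncol(A) * ncol(B) data frames, the same set as the loop and expand.grid answers above.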

How can I extract matrix elements corresponding to column list

This seems like it should be very simple to do with an apply function, but I find myself struggling with it.
I have a matrix (dataframe ok also) of data:
u <- matrix(sample(seq(4),20,T),5,4)
u
[,1] [,2] [,3] [,4]
[1,] 1 2 4 2
[2,] 4 3 2 2
[3,] 3 3 3 1
[4,] 3 2 4 4
[5,] 4 1 3 4
Suppose I want to use the elements of column j (like INDIRECT in Excel) to select the corresponding column value from each row.
For example, given j = 3:
row 1 has the value 4 in column 3, so it would return 2 (row 1, column 4)
row 2 has the value 2 in column 3, so it would return 3 (row 2, column 2)
...
row 5 has the value 3 in column 3, so it would return 3 (row 5, column 3)
I end up with a vector of those values, v <- c(2,3,...,3)
You can use matrix indexing:
i <- seq_len(nrow(u))
j <- u[, 3]
u[cbind(i, j)]
I think the following also works:
sapply(1:nrow(u), function(i) u[i,u[i,3]])
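As a check, here is the matrix-indexing approach applied to the u printed in the question. The matrix is typed in by hand because sample() would give different values on each run.

```r
# u exactly as printed in the question
u <- matrix(c(1, 2, 4, 2,
              4, 3, 2, 2,
              3, 3, 3, 1,
              3, 2, 4, 4,
              4, 1, 3, 4), nrow = 5, byrow = TRUE)

# Each row of idx is a (row, column) pair; column 3 supplies the column
idx <- cbind(seq_len(nrow(u)), u[, 3])
v <- u[idx]
v
# 2 3 3 4 3
```

Indexing with a two-column matrix picks one element per row of idx, so no loop or sapply call is needed.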

How to transform a list of user ratings into a matrix in R

I am working on a collaborative filtering problem, and I am having problems reshaping my raw data into a user-rating matrix. I am given a rating database with columns 'movie', 'user' and 'rating'. From this database, I would like to obtain a matrix of size #users x #movies, where each row indicates a user's ratings.
Here is a minimal working example:
# given this:
ratingDB <- data.frame(rbind(c(1,1,1),c(1,2,NA),c(1,3,0), c(2,1,1), c(2,2,1), c(2,3,0),
c(3,1,NA), c(3,2,NA), c(3,3,1)))
names(ratingDB) <- c('user', 'movie', 'liked')
#how do I get this?
userRating <- matrix(data = rbind(c(1,NA,0), c(1,1,0), c(NA,NA,1)), nrow=3)
I can solve the problem using two for loops, but this of course doesn't scale well. Can anybody help me with a vectorized solution?
This can be done without any loop. It works with the function matrix:
# sort the 'liked' values (this is not necessary for the example data)
vec <- with(ratingDB, liked[order(user, movie)])
# create a matrix
matrix(vec, nrow = length(unique(ratingDB$user)), byrow = TRUE)
[,1] [,2] [,3]
[1,] 1 NA 0
[2,] 1 1 0
[3,] NA NA 1
This will transform the vector stored in ratingDB$liked to a matrix. The argument byrow = TRUE allows arranging the data in rows (the default is by columns).
Update: What to do if the NA cases are not in the data frame?
(see comment by @steffen)
First, remove the rows containing NA:
subDB <- ratingDB[complete.cases(ratingDB), ]
user movie liked
1 1 1 1
3 1 3 0
4 2 1 1
5 2 2 1
6 2 3 0
9 3 3 1
The full data frame can be reconstructed. The function expand.grid is used to generate all combinations of user and movie:
full <- setNames(with(subDB, expand.grid(sort(unique(user)), sort(unique(movie)))),
c("user", "movie"))
user movie
1 1 1
2 2 1
3 3 1
4 1 2
5 2 2
6 3 2
7 1 3
8 2 3
9 3 3
Now, the information of the sub data frame subDB and the full combination data frame full can be combined with the merge function:
ratingDB_2 <- merge(full, subDB, all = TRUE)
user movie liked
1 1 1 1
2 1 2 NA
3 1 3 0
4 2 1 1
5 2 2 1
6 2 3 0
7 3 1 NA
8 3 2 NA
9 3 3 1
The result is identical to the original data frame ratingDB. Hence, the same procedure can be applied to transform it into a matrix of liked values:
matrix(ratingDB_2$liked, nrow = length(unique(ratingDB_2$user)), byrow = TRUE)
[,1] [,2] [,3]
[1,] 1 NA 0
[2,] 1 1 0
[3,] NA NA 1
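An alternative sketch, assuming the same situation where the NA rows are absent from the data: instead of rebuilding the full data frame with expand.grid and merge, allocate a matrix of NA and fill it by matrix indexing. This is a different technique from the one above, and the names users and movies are introduced here for illustration.

```r
ratingDB <- data.frame(user  = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                       movie = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
                       liked = c(1, NA, 0, 1, 1, 0, NA, NA, 1))
subDB <- ratingDB[complete.cases(ratingDB), ]

users  <- sort(unique(subDB$user))
movies <- sort(unique(subDB$movie))

# Start from all NA, so missing user/movie pairs need no special handling
userRating <- matrix(NA_real_, nrow = length(users), ncol = length(movies),
                     dimnames = list(as.character(users),
                                     as.character(movies)))

# match() converts ids to row and column positions
userRating[cbind(match(subDB$user, users),
                 match(subDB$movie, movies))] <- subDB$liked
```

This writes only the ratings that exist and does not depend on the order of the rows in subDB.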
