Extracting alternating sequence from vector in R

I have data that looks like the following:
A= c(0,0,0,-1,0,0,0,1,1,1,0,0,-1,0,0,-1,-1,1,1,1,-1,0,0,0,-1,0,0,-1,-1,1,1,0,0,0,0,1,-1)
The goal is to extract alternating -1s and 1s. I want to make a function where the input vector contains 0,1, and -1. The output ideally spits out all the 0s and alternating -1s and 1s.
For instance, the desired output for the above example is:
B= c(0,0,0,-1,0,0,0,1,0,0,0,0,-1,0,0,0,0,1,0,0,-1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,-1)
The two 1s at the 9th and 10th positions of A are turned to 0 because we keep only the first 1 or -1 that appears. The -1s at the 16th and 17th positions of A are turned to 0 for the same reason.
Anyone have a good idea for making such a function?

Identify positions of nonzero values:
w = which(A != 0)
For each run of similar values, in A[w], take the position of the first:
library(data.table)
wkeep = tapply(w, rleid(A[w]), FUN = function(x) x[1])
Set all other values to zero:
# following @alexis_laz's approach
B = numeric(length(A))
B[ wkeep ] = A[ wkeep ]
This way, you don't have to make comparisons in a loop, which R is slow at, I think.
rleid comes from data.table. With base R, you can make wkeep with @alexis_laz's suggestion:
wkeep = w[c(TRUE, A[w][-1L] != A[w][-length(w)])]
Or write your own rleid, as in Josh's answer.
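The run-based idea above (look only at the nonzero values and keep the first element of each run) can also be sketched outside R. The following is a minimal Python illustration, not the answerer's code; the function name keep_run_starts is hypothetical.

```python
def keep_run_starts(values):
    """Keep the first element of each run of equal nonzero values; zero the rest.

    Zeros are ignored when deciding where a run starts, mirroring A[w] in the R answer.
    """
    out = [0] * len(values)
    previous = None  # last nonzero value seen so far
    for i, v in enumerate(values):
        if v != 0:
            if v != previous:
                out[i] = v  # first element of a new run
            previous = v
    return out


A = [0, 0, 0, -1, 0, 0, 0, 1, 1, 1, 0, 0, -1, 0, 0, -1, -1, 1, 1, 1, -1,
     0, 0, 0, -1, 0, 0, -1, -1, 1, 1, 0, 0, 0, 0, 1, -1]
B = keep_run_starts(A)
print(B)
```

On the example vector from the question this reproduces the desired output B.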

This is really just a Reification of GWarius's pseudo-code. (I already had a structure but logic that was failing.)
last1 <- -A[which(A != 0)[1]]  # the opposite of the first non-zero item
for (i in seq_along(A)) {
  if (last1 == 1 && A[i] == -1) {
    last1 <- -1
  } else if (last1 == -1 && A[i] == 1) {
    last1 <- 1
  } else {
    A[i] <- 0
  }
}
A
[1] 0 0 0 -1 0 0 0 1 0 0 0 0 -1 0 0 0 0 1 0 0 -1 0 0
[24] 0 0 0 0 0 0 1 0 0 0 0 0 0 -1
> identical(A, B)
[1] TRUE

You have to walk through the whole array and use a flag variable to track whether the last value you kept was 1 or -1.
A possible pseudo-code algorithm:
flag = -(first nonzero value of a)
i = 0
while i < length(a):
    if flag == 1 && a[i] == -1:
        b[i] = a[i]
        flag = -1
    else if flag == -1 && a[i] == 1:
        b[i] = a[i]
        flag = 1
    else:
        b[i] = 0
    i++
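As a runnable counterpart to the pseudo-code, here is a small Python sketch of the flag approach. It assumes the flag starts as the opposite of the first nonzero value (as in the R loop earlier in the thread); the name alternate_with_flag is made up for this example.

```python
def alternate_with_flag(a):
    """Zero out every entry except the alternating -1/1 values, using a flag."""
    nonzero = [v for v in a if v != 0]
    # Assumption: start with the opposite of the first nonzero value,
    # so that the first nonzero entry is always kept.
    flag = -nonzero[0] if nonzero else 0
    b = []
    for v in a:
        if v != 0 and v == -flag:
            b.append(v)  # value alternates with the last kept one
            flag = v
        else:
            b.append(0)
    return b


print(alternate_with_flag([0, 1, 1, -1, -1, 0, 1]))  # [0, 1, 0, -1, 0, 0, 1]
```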

Related

Create boolean vector of length n with k true values well dispersed

The problem is to create boolean vector of length n with k true entries (and n-k false entries) well dispersed in the vector.
If k = 5 and n = 8 manually created solutions are [1 0 1 1 0 1 0 1] or [1 0 1 0 1 0 1 1] etc.
An example of a vector with entries that are not well dispersed would be [1 1 1 1 1 0 0 0].
A possible criterion for "well-dispersedness" is having alternating blocks of zeros and ones of roughly the same length - specifically with one-blocks of size floor(n/k) or floor(n/k) + 1 and zero-blocks of size floor(n/(n-k)) or floor(n/(n-k)) + 1.
How to create such a vector?
Take the simplest implementation of Bresenham's algorithm and simulate drawing a line segment with end coordinates (0,0)-(ones,zeros). This is just an error-propagation approach.
When the algorithm generates a change of the X-coordinate (an X-step), it corresponds to a 1 entry; a Y-step corresponds to a 0 entry.
def Distribute(ones, zeros):
    leng = ones + zeros
    err = leng // 2
    res = []
    for i in range(0, leng):
        err = err - ones
        if err < 0:
            res.append(1)
            err = err + leng
        else:
            res.append(0)
    print(res)

Distribute(5,3)
[1, 0, 1, 0, 1, 1, 0, 1]
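A closely related way to get the same kind of spread, offered here only as an alternative sketch and not as the answerer's method, is to mark position i with a 1 whenever floor((i+1)*k/n) is larger than floor(i*k/n). The function name spread is hypothetical, and the sketch assumes 0 <= k <= n.

```python
def spread(k, n):
    """Return a list of n bits containing exactly k ones (assumes 0 <= k <= n).

    Bit i is 1 when the running quota floor((i + 1) * k / n) steps up by one.
    """
    return [((i + 1) * k) // n - (i * k) // n for i in range(n)]


bits = spread(5, 8)
print(bits)  # [0, 1, 0, 1, 1, 0, 1, 1]
```

The differences telescope, so the total number of ones is always k; the pattern differs from the Bresenham output only by where the first one lands.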

If-Else loop vs. Ifelse(); how are they different and how to give the same result

I want to write an if-else loop that gives the same result as:
K = ifelse(arg1 < arg2,1,2), which results in:
K = {1,2,1,1,2,2,1,...}
I was trying to do this:
if (arg1 < arg2) {
K = 1;
if (arg1 > arg2) {
K = 2;
}
}
But this gives me a warning that the condition has length > 1 and only the first element will be used.
I was actually hoping to use if-else, but I am having a hard time implementing it.
You are getting that warning because length(arg1) is greater than 1 and if can handle only one value at a time, so even if you pass it the whole of arg1 it will by default use only the first value, i.e. arg1[1].
Something like this should work
arg1 <- 10:1
arg2 <- 5:14
K <- numeric(length = length(arg1))
for (i in seq_along(arg1)) {
  if (arg1[i] < arg2[i])
    K[i] = 1
  else
    K[i] = 2
}
K
#[1] 2 2 2 1 1 1 1 1 1 1
which gives the same output as ifelse:
ifelse(arg1 < arg2, 1, 2)
#[1] 2 2 2 1 1 1 1 1 1 1
Make sure that length of arg1 is same as that of arg2.
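For comparison, the same element-by-element rule can be sketched in Python (used here purely for illustration; arg1, arg2 and K mirror the R names above). Pairing the elements with zip makes the "one value at a time" requirement explicit.

```python
arg1 = list(range(10, 0, -1))  # 10, 9, ..., 1   (R: 10:1)
arg2 = list(range(5, 15))      # 5, 6, ..., 14   (R: 5:14)

# One scalar comparison per pair of elements, like the for loop above.
K = [1 if a < b else 2 for a, b in zip(arg1, arg2)]
print(K)  # [2, 2, 2, 1, 1, 1, 1, 1, 1, 1]
```

Note that zip silently stops at the shorter input, so the equal-length caveat applies here as well.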

Counting frequency of runs of consecutive 1s in an array

I have defined an array arr=[0 0 1 1 1 0 0 0 0 1 1 0 1 1 1].
I would like to count the continuous repetition and store in another array.
For example above, the first repetition is 3 of 1s. So I assume the length of this continuous repetition is 3.
Therefore, the array should be like arr[length]=1. The 1 means that this length has been encountered once. The final output should be
arr[3]=2 //it means length of repetition with 3 of 1s has been encountered twice
arr[2]=1 //length of repetition with 2 has been encountered once.
Below is my code progress so far.
err=0;
no_err=0;
flag=0;
arr=[0 0 1 1 1 0 0 0 0 1 1 0 1 1 1] //assume 15
for x=1:15;
val=arr(x);//access array values
if(val==0)//if no error
flag=0; //indicates no error
elseif(val==1) //if there is an error
flag=1; //indicates error
end
if(flag==0)
no_err=no_err+1;//counter of no error
err=0; //reset error to zero
elseif(flag==1)//
err=err+1;//
tmperr=err;//will keep updating with latest err count length
end
end
Some of the logic is reasonable, but this part is strange:
if(flag==0)
....
elseif(flag==1)
This could be simply else. More importantly, the state of flag could have been changed by the earlier part of the loop, which will render the logic of the second if statement invalid.
I suggest considering four scenarios, focusing on counting only repetitions of 1s:
flag is 0 and value is 0: do nothing (no need to have a clause for this)
flag is 1 and value is 1: increment the count of consecutive 1s
flag is 0 and value is 1: set the flag to 1, set the count of consecutive 1s to 1.
flag is 1 and value is 0: set the flag to 0, record the count of 1s in the output array.
Since the second bullet item does not change the state of flag, it can be dealt with first; the two others are considered within if/elseif construction.
Also, the end of array should be ending the current run of 1s. Instead of writing a separate clause for this, it's easier to pad the given array with 0.
err=0
flag=0
arr=[0 0 1 1 1 0 0 0 0 1 1 0 1 1 1]
output = zeros(arr)
paddedarr = [arr 0]
for x=1:length(paddedarr)
val=paddedarr(x)
if (val==1 & flag==1)
err=err+1
end
if (val==1 & flag==0)
flag = 1
err = 1
elseif (val==0 & flag==1)
output(err) = output(err)+1
flag = 0
err = 0
end
end
Output:
0. 1. 2. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
Note that I used length command instead of assuming the length of array.
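The same count can be cross-checked with a short Python sketch that groups consecutive equal values instead of tracking a flag. This is an illustration under the assumption that only runs of 1s matter; run_length_counts is a hypothetical name.

```python
from collections import Counter
from itertools import groupby


def run_length_counts(arr, target=1):
    """Map each run length of `target` to the number of times it occurs."""
    return Counter(
        len(list(group)) for value, group in groupby(arr) if value == target
    )


arr = [0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1]
counts = run_length_counts(arr)
print(counts[2], counts[3])  # 1 2
```

Because groupby closes the last group at the end of the input, no zero padding is needed in this version.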

R: Creating a matrix with loops/apply (Original code Fortran)

Any help appreciated, I have been struggling with this problem far too long today, and I hope a fresh pair of eyes and set of braincells can help. Suggestions on how to make the code more efficient will also be greatly appreciated.
I am in the process of rewriting a program from Fortran into R. The eventual matrix, once all the data comes in, will be bigger than 1000x1000.
The first element of the code looked like this:
allocate (S(nrecords))
do i=1,nrecords
S(i)=ZZ(i,i)
end do
which in R simply became S <- diag(ZZ) (nrecords in the example data = 10).
The example dataset I am using consists of a 10x10 matrix ZZ:
167315 136626 138035 150376 137080 136561 139467 137161 151010 140947
136626 171188 139660 138286 138161 138709 139713 138422 138138 140265
138035 139660 170362 138202 138643 138168 140629 139121 137675 139288
150376 138286 138202 167354 138025 138029 140168 137797 144110 139955
137080 138161 138643 138025 168606 144637 140715 138636 142043 141936
136561 138709 138168 138029 144637 167756 140256 138348 140914 152011
139467 139713 140629 140168 140715 140256 172119 141704 140553 140769
137161 138422 139121 137797 138636 138348 141704 169635 137902 138752
151010 138138 137675 144110 142043 140914 140553 137902 169823 142444
140947 140265 139288 139955 141936 152011 140769 138752 142444 173183
so S is a vector containing the diagonal values.
I am stuck in translating this Fortran element though:
allocate(D(nrecords,nrecords))
sumD=0
do i=1,nrecords
do j=1,nrecords
D(i,j)=S(i)+S(j)-2*ZZ(i,j)
sumD=sumD+D(i,j)
end do
end do
deallocate(ZZ)
sumD=sumD/(nrecords*nrecords)
I know that at the end of the day I am supposed to end up with another 10x10 matrix, where D[1,1] will equal 0 and D[1,2] will be 65251. But between reading up on for-loops, apply(), sapply() and tapply() I am rather lost and confused.
This is another element that has already been translated, and I wanted to base the fortran translation on this, but I think I have been staring at it too long, and I strongly suspect that there is a more efficient answer:
n <-6
sumA <- 0
for (i in 1:n) {
for (j in 1:n) {
sumA <- sumA+A[i,j]
}
}
sumA2 <- 0
for (i in 1:n) {
for (j in 1:n) {
sumA2 <- sumA2+A[i,j]^2
}
}
with the corresponding fortran:
sumA2=0.0;sumA=0.0
do i=1,nrecords
do j=1,nrecords
if(A(i,j) > 0.0) then
sumA2=sumA2+(A(i,j)*A(i,j))
sumA=sumA+A(i,j)
end if
end do
end do
sumMMA=0.0;sumZZ=0.0
do i=1,nrecords
do j=1,nrecords
sumMMA=sumMMA+(ZZ(i,j)*A(i,j))
sumZZ=sumZZ+ZZ(i,j) !this will not work using the sum(ZZ) function
end do
end do
Matrix A is simply
1 0 0 0 0 0 0 0 0 0
0 0.75 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 0.5 0 0 0 0
0 0 0 0 0 0 0.75 0 0 0
0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 1
Thanks in advance!
The purpose of the apply functions is to improve readability. If you don't understand them you don't need to use them. They are more or less wrappers for for loops. In your case, you can almost translate your code verbatim.
R
nrecords <- 10
ZZ <- as.matrix(read.table(header=F, text='
167315 136626 138035 150376 137080 136561 139467 137161 151010 140947
136626 171188 139660 138286 138161 138709 139713 138422 138138 140265
138035 139660 170362 138202 138643 138168 140629 139121 137675 139288
150376 138286 138202 167354 138025 138029 140168 137797 144110 139955
137080 138161 138643 138025 168606 144637 140715 138636 142043 141936
136561 138709 138168 138029 144637 167756 140256 138348 140914 152011
139467 139713 140629 140168 140715 140256 172119 141704 140553 140769
137161 138422 139121 137797 138636 138348 141704 169635 137902 138752
151010 138138 137675 144110 142043 140914 140553 137902 169823 142444
140947 140265 139288 139955 141936 152011 140769 138752 142444 173183
'))
S <- diag(ZZ)
Fortran
allocate(D(nrecords,nrecords))
sumD=0
do i=1,nrecords
do j=1,nrecords
D(i,j)=S(i)+S(j)-2*ZZ(i,j)
sumD=sumD+D(i,j)
end do
end do
deallocate(ZZ)
sumD=sumD/(nrecords*nrecords)
R
D <- matrix(0, nrecords, nrecords)
sumD = 0
for(i in 1:nrecords){
for(j in 1:nrecords){
D[i,j] = S[i] + S[j] - 2*ZZ[i,j]
sumD = sumD + D[i,j]
}
}
sumD = sumD/(nrecords*nrecords)
Fortran
do i=1,nrecords
do j=1,nrecords
if(A(i,j) > 0.0) then
sumA2=sumA2+(A(i,j)*A(i,j))
sumA=sumA+A(i,j)
end if
end do
end do
sumMMA=0.0;sumZZ=0.0
do i=1,nrecords
do j=1,nrecords
sumMMA=sumMMA+(ZZ(i,j)*A(i,j))
sumZZ=sumZZ+ZZ(i,j) !this will not work using the sum(ZZ) function
end do
end do
R
A <- matrix(0, nrecords, nrecords)
diag(A) <- c(1,.75,1,1,1,.5,.75,1,1,1)
sumA2 = 0
sumA = 0
for(i in 1:nrecords){
for(j in 1:nrecords){
if(A[i,j] > 0){
sumA2=sumA2+(A[i,j]*A[i,j])
sumA = sumA+A[i,j]
}
}
}
sumMMA=0
sumZZ=0
for(i in 1:nrecords){
for(j in 1:nrecords){
sumMMA=sumMMA+(ZZ[i,j]*A[i,j])
sumZZ=sumZZ+ZZ[i,j]
}
}
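As a quick sanity check of the formula D(i,j) = S(i) + S(j) - 2*ZZ(i,j), here is a small Python sketch using only the top-left 3x3 corner of the example ZZ (an assumption made to keep the example short). It confirms the values the question expects for the first row.

```python
# Top-left 3x3 corner of the example matrix ZZ (subset chosen for brevity).
ZZ = [
    [167315, 136626, 138035],
    [136626, 171188, 139660],
    [138035, 139660, 170362],
]
n = len(ZZ)

S = [ZZ[i][i] for i in range(n)]  # diagonal, like diag(ZZ) in R
D = [[S[i] + S[j] - 2 * ZZ[i][j] for j in range(n)] for i in range(n)]
mean_D = sum(sum(row) for row in D) / (n * n)  # sumD / (nrecords * nrecords)

print(D[0][0], D[0][1])  # 0 65251
```

In R the double loop could probably be collapsed along the same lines with outer(S, S, "+") - 2 * ZZ followed by mean(D), which tends to matter once the matrix grows beyond 1000x1000.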

Repeat a loop until it satisfies a specific condition

Can anybody help me with this? Suppose "p" is totally exogenous and follows a uniform distribution. Then I want to generate "z", which is a TRUE(=1) or FALSE(=0) dummy, with the property that the sum of each three consecutive elements (1-3, 4-6, 7-9,..., 58-60) in "z" is greater than 0.
For example, if I get a "z" like {1 0 0 1 1 0 0 0 0 0 1 0...}, I hope to repeat the loop again ( since sum(z[7:9])=0 ) to draw a different "error" until I get a new "z" like {1 1 0 0 0 1 0 1 0 1 0 0...} where all summations for each three elements are greater than 0. The code I use is as follows. Where am I wrong?
set.seed(005)
p<-runif(60, 0, 1)
for (i in 1:20) {
repeat {
error= -0.2*log((1/runif(60, 0, 1))-1) # a random component
z=(p<0.5+error) # TRUE/FALSE condition
z=replace(z, z==TRUE, 1) # replace z to 1 if z is true, else z=0
if (sum(z[(3*i-2):(3*i)])>0) {break}
}
}
Your for loop generates a new z for every i. I don't think that's what you're trying to do. From what I can understand, you're trying to generate a new z and then use a for loop with the counter i to check for sums of three consecutive elements. If so, then you need to have one loop to generate new zs, and then another one inside this loop which checks for the sum of three consecutive elements.
I think this does what you want. But when I run it it seems unlikely that you will get a satisfactory z soon.
set.seed(005)
p<-runif(60, 0, 1)
invalidentriesexist =1
while(invalidentriesexist == 1) {
error = -0.2*log((1/runif(60, 0, 1))-1) # a random component
z=(p<0.5+error) # TRUE/FALSE condition
z=replace(z, z==TRUE, 1) # replace z with 1 if z is TRUE
z=replace(z, z==FALSE, 0) # replace z with 0 if z is FALSE
invalidentriesexist = 0
i = 1
while ( i <=20 & invalidentriesexist == 0 ) {
invalidentriesexist = 0
if (sum(z[((3*i)-2):(3*i)])==0) {invalidentriesexist = 1}
cat(i,'\n')
cat(invalidentriesexist,'\n')
cat(paste(z,collapse = ","),'\n')
cat(z[((3*i)-2):(3*i)],'\n\n')
i = i + 1
}
}
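The structure described above (redraw the whole z, then check every block of three) is a rejection loop. Below is a minimal Python sketch of that structure, not a translation of the R code: the helper names, the seed handling and the max_tries cap are assumptions added for illustration. The noise uses 0.2*log(u/(1-u)), which is algebraically the same as -0.2*log(1/u - 1).

```python
import math
import random


def draw_z(p, rng):
    """Draw one candidate z: z[i] = 1 if p[i] < 0.5 + error, else 0."""
    z = []
    for p_i in p:
        u = rng.random()
        while u == 0.0:  # avoid log(0); random() returns values in [0, 1)
            u = rng.random()
        error = 0.2 * math.log(u / (1.0 - u))  # same as -0.2*log(1/u - 1)
        z.append(1 if p_i < 0.5 + error else 0)
    return z


def all_triples_nonzero(z):
    """True if every consecutive block of three elements sums to more than 0."""
    return all(sum(z[i:i + 3]) > 0 for i in range(0, len(z), 3))


def sample_valid_z(p, rng, max_tries=100000):
    """Redraw the whole z until every block of three passes the check."""
    for _ in range(max_tries):
        z = draw_z(p, rng)
        if all_triples_nonzero(z):
            return z
    raise RuntimeError("no valid z found within max_tries")


rng = random.Random(5)                 # hypothetical seed, not R's set.seed(005)
p = [rng.random() for _ in range(60)]  # p stays fixed across redraws
z = sample_valid_z(p, rng)
print(z)
```

Keeping the check in its own function separates "generate a candidate" from "accept or reject it", which is the separation the original loop was missing.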

Resources