I need to compare element i with all previous elements (i-1, i-2, ...): if element i is greater than all of them, return 1; otherwise return 0.
data <- c(10.3,14.3,7.7,15.8,14.4,16.7,15.3,20.2,17.1,7.7,15.3,16.3,19.9,14.4,18.7,20.7)
The result of comparing should be the following.
0 1 0 1 0 1 0 1 0 0 0 0 0 0 0 1
Here's one standard way:
as.integer(cummax(data) == data)
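The first element is 1 here instead of the OP's preferred 0, but that is easy to tweak (for example, by setting the first element of the result to 0 afterwards). Note also that, with this approach, a value tied with the earlier maximum gets 1 as well.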
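For illustration, the same rule can be sketched in Python rather than R (new_record_flags is a hypothetical name). This sketch follows the question's convention: the first element gets 0 and the comparison is strict, so ties with the running maximum also give 0.

```python
from typing import List, Optional, Sequence


def new_record_flags(values: Sequence[float]) -> List[int]:
    """Return 1 where a value is strictly greater than every earlier value, else 0.

    The first value has no predecessors, so it gets 0 (the question's convention;
    the cummax answer gives it 1). Ties with the running maximum also give 0.
    """
    flags: List[int] = []
    running_max: Optional[float] = None  # largest value seen so far
    for x in values:
        flags.append(1 if running_max is not None and x > running_max else 0)
        running_max = x if running_max is None else max(running_max, x)
    return flags


data = [10.3, 14.3, 7.7, 15.8, 14.4, 16.7, 15.3, 20.2,
        17.1, 7.7, 15.3, 16.3, 19.9, 14.4, 18.7, 20.7]
print(new_record_flags(data))  # [0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1]
```

The running maximum makes this a single O(N) pass, which is the same idea that cummax vectorizes in R.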
Given a binary array of size N
e.g. A[1:N] = 1 0 0 1 0 1 1 1
A new array of size N-1 is created by taking the XOR of each pair of consecutive elements.
A'[1:N-1] = 1 0 1 1 1 0 0
Repeat this operation until one element is left.
1 0 0 1 0 1 1 1
1 0 1 1 1 0 0
1 1 0 0 1 0
0 1 0 1 1
1 1 1 0
0 0 1
0 1
1
I want to find the last element left (0 or 1).
One can find the answer by performing the operation repeatedly, but this approach takes O(N^2) time. Is there a more efficient way to solve the problem?
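As a baseline, the repeated-reduction approach from the question can be sketched in Python (the language of the answer below). The names reduce_once and xor_reduction_naive are hypothetical, and the input is assumed to be a non-empty list of 0s and 1s.

```python
from typing import List


def reduce_once(a: List[int]) -> List[int]:
    # XOR each pair of neighbours: N elements -> N - 1 elements
    return [x ^ y for x, y in zip(a, a[1:])]


def xor_reduction_naive(a: List[int]) -> int:
    # Apply the step N - 1 times: O(N^2) XOR operations in total
    row = list(a)
    while len(row) > 1:
        row = reduce_once(row)
    return row[0]


print(reduce_once([1, 0, 0, 1, 0, 1, 1, 1]))          # [1, 0, 1, 1, 1, 0, 0]
print(xor_reduction_naive([1, 0, 0, 1, 0, 1, 1, 1]))  # 1
```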
There's a very efficient solution to this problem, which needs just a few lines of code, but it's rather complicated to explain. I'll have a go, anyway.
Suppose you need to reduce a list of, say, 6 numbers that are all zero except for one element. By symmetry, there are just three cases to consider:
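Case 1 ('1' in the first position):
1 0 0 0 0 0
1 0 0 0 0
1 0 0 0
1 0 0
1 0
1
Case 2 ('1' in the second position):
0 1 0 0 0 0
1 1 0 0 0
0 1 0 0
1 1 0
0 1
1
Case 3 ('1' in the third position):
0 0 1 0 0 0
0 1 1 0 0
1 0 1 0
1 1 1
0 0
0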
In the first case, a single '1' at the edge doesn't really do anything much. It basically just stays put. But in the other two cases, more elements of the list get involved and the situation is more complex. A '1' in the second element of the list produces a result of '1', but a '1' in the third element produces a result of '0'. Is there a simple rule that explains this behaviour?
Yes, there is. Take a look at this:
Row 0: 1
Row 1: 1 1
Row 2: 1 2 1
Row 3: 1 3 3 1
Row 4: 1 4 6 4 1
Row 5: 1 5 10 10 5 1
I'm sure you've seen this before. It's Pascal's triangle, where each row is obtained by adding adjacent elements taken from the row above. The larger numbers in the middle of the triangle reflect the fact that these numbers are obtained by adding together values drawn from a broader subset of the preceding rows.
Notice that in Row 5, the two numbers in the middle are both even, while the other numbers are all odd. This exactly matches the behaviour of the three examples shown above: the XOR of an even number of '1's is 0, and the XOR of an odd number of '1's is 1.
To make things clearer, let's just consider the parity of the numbers in this triangle (i.e., '1' for odd numbers, '0' for even numbers):
Row 0: 1
Row 1: 1 1
Row 2: 1 0 1
Row 3: 1 1 1 1
Row 4: 1 0 0 0 1
Row 5: 1 1 0 0 1 1
This is actually called a Sierpinski triangle. A zero in this triangle tells us that it doesn't matter whether your list has a '1' or a '0' in that position: it has no effect on the result, because if you wrote out the expression for the final result in terms of all the initial values in your list, this element would appear an even number of times.
Take a look at Row 4, for example. Every element is zero except at the extreme edges. That means if your list has 5 elements, the end result depends only on the first and last elements in the list. (The same applies to any list where the number of elements is one more than a power of 2.)
The rows of the Sierpinski triangle are easy to calculate. As mentioned in oeis.org/A047999:
Lucas's Theorem is that T(n,k) = 1 if and only if the 1's in the binary expansion of k are a subset of the 1's in the binary expansion of n; or equivalently, k AND NOT n is zero, where AND and NOT are bitwise operators.
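The bit test can be checked against exact binomial coefficients. A small Python sketch, assuming Python 3.8+ for math.comb; binomial_is_odd is a hypothetical helper name:

```python
from math import comb


def binomial_is_odd(n: int, k: int) -> bool:
    # Lucas's theorem mod 2: C(n, k) is odd iff every 1-bit of k is also a
    # 1-bit of n, i.e. (k AND NOT n) == 0, for non-negative k
    return (k & ~n) == 0


# Cross-check against the exact binomial coefficients for the first 32 rows
for n in range(32):
    for k in range(n + 1):
        assert binomial_is_odd(n, k) == (comb(n, k) % 2 == 1)

# Row 4 of the Sierpinski triangle: only the two edges are odd
print([int(binomial_is_odd(4, k)) for k in range(5)])  # [1, 0, 0, 0, 1]
```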
So, after that long-winded explanation, here's my code:
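def xor_reduction(a):
    n, r = len(a), 0                # n elements -> row n-1 of the triangle
    for k in range(n):
        b = 0 if k & -n > 0 else 1  # parity of C(n-1, k), by Lucas's theorem
        r ^= b & a[k]               # XOR in a[k] only if C(n-1, k) is odd
    return r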
assert xor_reduction([1, 0, 0, 1, 0, 1, 1, 1]) == 1
I said it was short. In case you're wondering, the 4th line has k & -n (k AND minus n) instead of k & ~n (k AND not n) because n in this function is the number of elements in the list, which is one more than the row number, and ~(n-1) is the same thing as -n (in Python, at least).
When dealing with recursive equations in mathematics, it is common to write equations that hold over some range k = 1,...,d with the implicit convention that if d < 1 then the set of equations is considered to be empty. When programming in R I would like to be able to write for loops in the same way as a mathematical statement (e.g., a recursive equation) so that it interprets a range with upper bound lower than the lower bound as being empty. This would ensure that the syntax of the algorithm mimics the syntax of the mathematical statement on which it is based.
Unfortunately, R does not interpret the for loop in this way, and so this commonly leads to errors when you program your loops in a way that mimics the underlying mathematics. For example, consider a simple function where we create a vector of zeros with length n and then change the first d values to ones using a loop over the elements in the range k = 1,...,d. If we input d < 1 into this function we would like the function to recognise that the loop is intended to be empty, so that we would get a vector of all zeros. However, using a standard for loop we get the following:
#Define a function using a recursive pattern
MY_FUNC <- function(n,d) {
OBJECT <- rep(0, n);
for (k in 1:d) { OBJECT[k] <- 1 }
OBJECT }
#Generate some values of the function
MY_FUNC(10,4);
[1] 1 1 1 1 0 0 0 0 0 0
MY_FUNC(10,1);
[1] 1 0 0 0 0 0 0 0 0 0
MY_FUNC(10,0);
[1] 1 0 0 0 0 0 0 0 0 0
#Not what we wanted
MY_FUNC(10,-2);
[1] 1 1 1 1 1 1 1 1 1 1
#Not what we wanted
My question: Is there any function in R that performs loops like a for loop, but interprets the loop as empty if the upper bound is lower than the lower bound? If there is no existing function, is there a way to program R to read loops this way?
Please note: I am not seeking answers that simply re-write this example function in a way that removes the loop. I am aware that this can be done in this specific case, but my goal is to get the loop working more generally. This example is shown only to give a clear view of the phenomenon I am dealing with.
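For contrast, Python's range already follows the mathematical convention: range(1, d + 1) is empty whenever d < 1. A hypothetical Python analogue of MY_FUNC (my_func, with the 1-based k shifted to 0-based list indices) therefore needs no special case. This sketch assumes d <= n, since assigning past the end of a Python list raises IndexError rather than extending it.

```python
from typing import List


def my_func(n: int, d: int) -> List[int]:
    """Python analogue of MY_FUNC: set positions 1..d (1-based) to 1."""
    obj = [0] * n
    # range(1, d + 1) is empty whenever d < 1, so the loop body never runs:
    # the "empty set of equations" convention from the question
    for k in range(1, d + 1):
        obj[k - 1] = 1  # shift the 1-based index k to Python's 0-based list
    return obj


print(my_func(10, 4))   # [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
print(my_func(10, -2))  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```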
There is imho no generic for-loop doing what you want, but you could easily get this behaviour by adding
if (d < 1) break
as the first statement inside the loop, so that the loop exits before doing anything when the range is meant to be empty.
EDIT
If you don't want to return an error when negative input is given, you can combine pmax with seq_len:
MY_FUNC <- function(n,d) {
OBJECT <- rep(0, n);
for (k in seq_len(pmax(0, d))) { OBJECT[k] <- 1 }
OBJECT
}
MY_FUNC(10, 4)
#[1] 1 1 1 1 0 0 0 0 0 0
MY_FUNC(10, 1)
#[1] 1 0 0 0 0 0 0 0 0 0
MY_FUNC(10, 0)
#[1] 0 0 0 0 0 0 0 0 0 0
MY_FUNC(10, -2)
#[1] 0 0 0 0 0 0 0 0 0 0
Previous Answer
Prefer seq_len(d) over 1:d; it takes care of the d = 0 case (a negative d raises an error, as shown below):
MY_FUNC <- function(n,d) {
OBJECT <- rep(0, n);
for (k in seq_len(d)) { OBJECT[k] <- 1 }
OBJECT
}
MY_FUNC(10, 4)
#[1] 1 1 1 1 0 0 0 0 0 0
MY_FUNC(10, 1)
#[1] 1 0 0 0 0 0 0 0 0 0
MY_FUNC(10, 0)
#[1] 0 0 0 0 0 0 0 0 0 0
MY_FUNC(10, -2)
Error in seq_len(d) : argument must be coercible to non-negative integer
The function can be vectorized
MY_FUNC <- function(n,d) {
rep(c(1, 0), c(d, n -d))
}
MY_FUNC(10, 4)
#[1] 1 1 1 1 0 0 0 0 0 0
MY_FUNC(10, 1)
#[1] 1 0 0 0 0 0 0 0 0 0
MY_FUNC(10, 0)
#[1] 0 0 0 0 0 0 0 0 0 0
MY_FUNC(10, -2)
Error in rep(c(1, 0), c(d, n - d)) : invalid 'times' argument
I’m working in R and am trying to find a way to refer to the previous cell within a vector when that vector belongs to a data frame. By previous cell, I’m essentially hoping for a “lag” command of some sort so that I can compare one cell to the cell previous. As an example, I have these data:
A <- c(1,0,0,0,1,0,0)
B <- c(1,1,1,1,1,0,0)
AB_df <- cbind (A,B)
What I want is, for a given cell in a given row, to return 1 if that cell’s value is less than the previous cell’s value in the same column vector, and 0 otherwise. For this example, the new columns would be called “A-flag” and “B-flag”, as below.
A B A-flag B-flag
1 1 0 0
0 1 1 0
0 1 0 0
0 1 0 0
1 1 0 0
0 0 1 1
0 0 0 0
Any suggestions for syntax that can do this? Ideally, to just create a new column variable into an existing data-frame.
Here is one solution using the dplyr package and its lag function:
library(dplyr)
AB_df <- data.frame(A = A, B = B)
AB_df %>% mutate(A.flag = ifelse(A < lag(A, default = 0), 1, 0),
B.flag = ifelse(B < lag(B, default = 0), 1, 0))
A B A.flag B.flag
1 1 1 0 0
2 0 1 1 0
3 0 1 0 0
4 0 1 0 0
5 1 1 0 0
6 0 0 1 1
7 0 0 0 0
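The same lag-and-compare logic, sketched in Python for illustration (drop_flags is a hypothetical name). The first row is compared with a default of 0, mirroring lag(..., default = 0) above.

```python
from typing import List, Sequence


def drop_flags(values: Sequence[int], default: int = 0) -> List[int]:
    """1 where a value is smaller than the value in the previous row, else 0.

    The first row is compared with `default`, like lag(x, default = 0).
    """
    previous = [default] + list(values[:-1])  # the column lagged by one row
    return [int(cur < prev) for cur, prev in zip(values, previous)]


A = [1, 0, 0, 0, 1, 0, 0]
B = [1, 1, 1, 1, 1, 0, 0]
print(drop_flags(A))  # [0, 1, 0, 0, 0, 1, 0]
print(drop_flags(B))  # [0, 0, 0, 0, 0, 1, 0]
```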
I have a series of true/false data (coded as 0/1); e.g., it looks like it could be generated by rbinom(n, 1, .1). I want a column that gives the number of rows since the last true, so the resulting data would look like this:
true/false gap
0 0
0 0
1 0
0 1
0 2
1 0
1 0
0 1
What is an efficient way to go from true/false to gap? (In practice this will be done on a large dataset with many different ids.)
DF <- read.table(text="true/false gap
0 0
0 0
1 0
0 1
0 2
1 0
1 0
0 1", header=TRUE)
DF$gap2 <- sequence(rle(DF$true.false)$lengths) * #create a sequence for each run length
(1 - DF$true.false) * #multiply with 0 for all 1s
(cumsum(DF$true.false) != 0L) #multiply with zero for the leading zeros
# true.false gap gap2
#1 0 0 0
#2 0 0 0
#3 1 0 0
#4 0 1 1
#5 0 2 2
#6 1 0 0
#7 1 0 0
#8 0 1 1
The cumsum part might not be the most efficient for large vectors. Something like
if (DF$true.false[1] == 0) DF$gap2[seq_len(rle(DF$true.false)$lengths[1])] <- 0
might be an alternative to the cumsum factor (and, of course, the rle result could be stored temporarily to avoid computing it twice).
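To make the three factors of the rle one-liner concrete, here is a hedged Python translation (gap_via_runs is a hypothetical name): itertools.groupby plays the role of rle, and itertools.accumulate the role of cumsum.

```python
from itertools import accumulate, groupby
from typing import List, Sequence


def gap_via_runs(flags: Sequence[int]) -> List[int]:
    # sequence(rle(x)$lengths): 1, 2, ..., length within each run of equal values
    within_run = [i for _, run in groupby(flags) for i in range(1, len(list(run)) + 1)]
    # cumsum(x) != 0: False only for the leading 0s before the first 1
    after_first_one = [c != 0 for c in accumulate(flags)]
    # multiply the three factors elementwise, as in the R one-liner
    return [r * (1 - f) * a for r, f, a in zip(within_run, flags, after_first_one)]


print(gap_via_runs([0, 0, 1, 0, 0, 1, 1, 0]))  # [0, 0, 0, 1, 2, 0, 0, 1]
```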
OK, let me put this in an answer.
1) No-brainer method
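data['gap'] <- 0
seen <- data[1, 'true/false'] == 1   # has a 1 appeared yet?
for (i in 2:nrow(data)) {
  if (data[i, 'true/false'] == 1) {
    seen <- TRUE                     # a 1 resets the gap (it stays 0)
  } else if (seen) {
    data[i, 'gap'] <- data[i-1, 'gap'] + 1
  }                                  # leading 0s before the first 1 keep gap 0
}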
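2) Without the if check
data['gap'] <- 0
seen <- data[1, 'true/false'] == 1
for (i in 2:nrow(data)) {
  seen <- seen | data[i, 'true/false'] == 1   # TRUE once a 1 has appeared
  # (1 - x) zeroes the gap on a 1; seen zeroes the leading run of 0s
  data[i, 'gap'] <- (data[i-1, 'gap'] + 1) * (1 - data[i, 'true/false']) * seen
}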
I really don't know which is faster: both do roughly the same amount of reading from data, but (1) has an if statement, and I don't know how fast that is compared with a single multiplication.
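Both loops implement the same single-pass rule. Here is a Python sketch of that rule which also restarts the count whenever an optional ids sequence changes; this is an assumption about how the many ids in the question would be handled (rows sorted so each id is one contiguous block), and gap_since_last_true is a hypothetical name.

```python
from typing import Hashable, List, Optional, Sequence


def gap_since_last_true(flags: Sequence[int],
                        ids: Optional[Sequence[Hashable]] = None) -> List[int]:
    """Number of rows since the most recent 1, restarting at every new id.

    Rows that are 1 themselves, and 0s before the first 1 of an id, get 0,
    matching the expected output in the question. Rows are assumed to be
    sorted so that each id forms one contiguous block.
    """
    gaps: List[int] = []
    gap = 0
    seen = False                # has a 1 appeared yet in the current id?
    prev_id: object = object()  # sentinel that never equals a real id
    for i, flag in enumerate(flags):
        current_id = ids[i] if ids is not None else None
        if current_id != prev_id:  # first row, or a new id: reset the state
            gap, seen, prev_id = 0, False, current_id
        if flag:
            gap, seen = 0, True
        elif seen:
            gap += 1
        gaps.append(gap)
    return gaps


print(gap_since_last_true([0, 0, 1, 0, 0, 1, 1, 0]))  # [0, 0, 0, 1, 2, 0, 0, 1]
```

Keeping the state in two scalars means each row is touched once, so the cost is linear in the number of rows.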
I have a nested list, returned by a function, in which the top-level element names are repeated as the names of the sub-elements:
$`1`
$`1`$`1`
[1] 0 0 0 0 0 0 0 1 0
$`1`$`2`
[1] 0 0 0 0 0 0 0 0 0
$`2`
$`2`$`1`
[1] 0 0 0 1 1 0 0 0 0
$`2`$`2`
[1] 0 1 0 0 0 1 0 0 0
Is there a way to use an apply function (or something else) to extract the vectors whose element and sub-element names match, e.g. $1$1 and $2$2? I have a huge list (4000 elements with 4000 sub-elements), so efficiency is a must.
Alternatively, I have found a way out of this mess using melt(), but it is too time-consuming for a data set of this size. If anyone knows how to replicate its effect (a data frame with three columns: one for the element name, one for the sub-element name and one for the vector), that would also work.
This is a way to get a list of the vectors you want, where dat is your nested list:
lapply(names(dat), function(x) dat[[x]][[x]])
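The efficiency comes from doing one lookup per top-level element instead of expanding all 4000 by 4000 entries. The same idea with a Python dict of dicts, as a sketch (matching_entries is a hypothetical name); it skips any element that has no sub-element of the same name.

```python
from typing import Dict, List


def matching_entries(nested: Dict[str, Dict[str, List[int]]]) -> Dict[str, List[int]]:
    # One membership test and one lookup per top-level name, so the cost grows
    # with the number of top-level elements, not elements times sub-elements
    return {name: inner[name] for name, inner in nested.items() if name in inner}


dat = {
    "1": {"1": [0, 0, 0, 0, 0, 0, 0, 1, 0], "2": [0, 0, 0, 0, 0, 0, 0, 0, 0]},
    "2": {"1": [0, 0, 0, 1, 1, 0, 0, 0, 0], "2": [0, 1, 0, 0, 0, 1, 0, 0, 0]},
}
print(matching_entries(dat))
# {'1': [0, 0, 0, 0, 0, 0, 0, 1, 0], '2': [0, 1, 0, 0, 0, 1, 0, 0, 0]}
```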
In a data frame:
do.call("rbind",
lapply(names(dat),
function(x) data.frame(element = x,
subelement = x,
values = dat[[x]][[x]])
)
)
You can unlist the list without recursion to remove the top-level list structure, and then use regex-assisted subsetting on the names of the result. The pattern is anchored with ^ and $ so that names such as 11.1 or 1.11 are not picked up by mistake.
l <- list(`1`=list(`1`=rpois(6,1),`2`=rep(0,6)),`2`=list(`1`=rep(0,6),`2`=rpois(6,1)))
l2 <- unlist(l, recursive = FALSE)
l2[grepl("^([0-9]+)[.]\\1$", names(l2))]
$`1.1`
[1] 2 0 2 4 1 0
$`2.2`
[1] 0 0 0 2 1 0