I have one vector, for example x = c(0,0,0,1,1,2,3,4,5,6).
I want to write out all zeros, all ones, and then all numbers divisible by 2.
The output would look like this: 0 0 0 1 1 2 4 6
I don't know how to select the zeros and ones as well, because so far I only have which(x %% 2 == 0). Can anyone help?
We can use %% together with |:
x[x%%2==0 | x==1]
#[1] 0 0 0 1 1 2 4 6
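For comparison, the same selection can be sketched in Python (my addition, not from the original answer; a list comprehension stands in for R's logical indexing, and the vector x is taken from the question):

```python
# Vector from the question
x = [0, 0, 0, 1, 1, 2, 3, 4, 5, 6]

# Keep every value that is even (this includes the zeros) or equal to 1,
# mirroring the R condition x %% 2 == 0 | x == 1
result = [v for v in x if v % 2 == 0 or v == 1]
print(result)  # [0, 0, 0, 1, 1, 2, 4, 6]
```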
Given a binary array of size N
e.g. A[1:N] = 1 0 0 1 0 1 1 1
A new array of size N-1 will be created by taking the XOR of each pair of consecutive elements.
A'[1:N-1] = 1 0 1 1 1 0 0
Repeat this operation until one element is left.
1 0 0 1 0 1 1 1
1 0 1 1 1 0 0
1 1 0 0 1 0
0 1 0 1 1
1 1 1 0
0 0 1
0 1
1
I want to find the last element left (0 or 1)
One can find the answer by performing the operation repeatedly, but that approach takes O(N*N) time. Is there a way to solve the problem more efficiently?
There's a very efficient solution to this problem, which needs just a few lines of code, but it's rather complicated to explain. I'll have a go, anyway.
Suppose you need to reduce a list of, say, 6 numbers that are all zero except for one element. By symmetry, there are just three cases to consider:
Case 1 ('1' in the first position):
1 0 0 0 0 0
1 0 0 0 0
1 0 0 0
1 0 0
1 0
1

Case 2 ('1' in the second position):
0 1 0 0 0 0
1 1 0 0 0
0 1 0 0
1 1 0
0 1
1

Case 3 ('1' in the third position):
0 0 1 0 0 0
0 1 1 0 0
1 0 1 0
1 1 1
0 0
0
In the first case, a single '1' at the edge doesn't really do anything much. It basically just stays put. But in the other two cases, more elements of the list get involved and the situation is more complex. A '1' in the second element of the list produces a result of '1', but a '1' in the third element produces a result of '0'. Is there a simple rule that explains this behaviour?
Yes, there is. Take a look at this:
Row 0: 1
Row 1: 1 1
Row 2: 1 2 1
Row 3: 1 3 3 1
Row 4: 1 4 6 4 1
Row 5: 1 5 10 10 5 1
I'm sure you've seen this before. It's Pascal's triangle, where each row is obtained by adding adjacent elements taken from the row above. The larger numbers in the middle of the triangle reflect the fact that these numbers are obtained by adding together values drawn from a broader subset of the preceding rows.
Notice that in Row 5, the two numbers in the middle are both even, while the other numbers are all odd. This exactly matches the behaviour of the three examples shown above; the XOR product of an even number of '1's is zero, and the XOR product of an odd number of '1's is '1'.
To make things clearer, let's just consider the parity of the numbers in this triangle (i.e., '1' for odd numbers, '0' for even numbers):
Row 0: 1
Row 1: 1 1
Row 2: 1 0 1
Row 3: 1 1 1 1
Row 4: 1 0 0 0 1
Row 5: 1 1 0 0 1 1
This pattern is known as the Sierpiński triangle. Wherever a zero appears in this triangle, it tells us that a '1' or a '0' in that position of your list makes no difference to the final result: if you wrote out the expression for the final value in terms of all the initial values, this element would appear an even number of times.
Take a look at Row 4, for example. Every element is zero except at the extreme edges. That means if your list has 5 elements, the end result depends only on the first and last elements in the list. (The same applies to any list where the number of elements is one more than a power of 2.)
The rows of the Sierpinski triangle are easy to calculate. As mentioned in oeis.org/A047999:
Lucas's Theorem is that T(n,k) = 1 if and only if the 1's in the binary expansion of k are a subset of the 1's in the binary expansion of n; or equivalently, k AND NOT n is zero, where AND and NOT are bitwise operators.
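A quick way to see Lucas's theorem in action (a Python sketch, my addition rather than part of the original answer): a parity row of the triangle can be generated directly from the bitwise test, and it matches Row 5 above.

```python
def parity_row(n):
    # T(n, k) is odd exactly when the set bits of k are a subset of the
    # set bits of n, i.e. when k AND NOT n is zero (Lucas's theorem)
    return [1 if k & ~n == 0 else 0 for k in range(n + 1)]

print(parity_row(5))  # [1, 1, 0, 0, 1, 1] -- matches Row 5
```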
So, after that long-winded explanation, here's my code:
def xor_reduction(a):
    n, r = len(a), 0
    for k in range(n):
        b = 0 if k & -n > 0 else 1
        r ^= b & a.pop()
    return r

assert xor_reduction([1, 0, 0, 1, 0, 1, 1, 1]) == 1
I said it was short. In case you're wondering, the 4th line has k & -n (k AND minus n) instead of k & ~n (k AND not n) because n in this function is the number of elements in the list, which is one more than the row number, and ~(n-1) is the same thing as -n (in Python, at least).
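As a sanity check (my addition, not part of the original answer), the shortcut can be compared against the naive O(N²) reduction on random lists. The function is repeated here so the snippet runs on its own:

```python
import random

def xor_reduce_naive(a):
    # Repeatedly XOR consecutive pairs until one element remains
    while len(a) > 1:
        a = [a[i] ^ a[i + 1] for i in range(len(a) - 1)]
    return a[0]

def xor_reduction(a):
    n, r = len(a), 0
    for k in range(n):
        b = 0 if k & -n > 0 else 1
        r ^= b & a.pop()
    return r

# Check the shortcut against the brute-force reduction
for _ in range(100):
    lst = [random.randint(0, 1) for _ in range(random.randint(1, 20))]
    assert xor_reduction(list(lst)) == xor_reduce_naive(list(lst))
```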
I have a data frame (data) that I would like to run a for loop through. I want to label the output as either 0 or 1 based on whether each record in the data frame is equal to (==) or not equal to (!=) the previous record. At the moment I just get a list of 1's, but having checked print(p), the for loop is looping. Any help much appreciated. Thanks.
data <- data.frame(movementObjects$Movement)
for (p in seq(1, nrow(data)-1)){
  if (p == p-1)
    changes <- 0
  else {
    if (p != p-1)
      changes <- 1
    print(changes)
  }
}
Sorry, I'm new to R. Here is an example of one of the datasets, but this one is a string; I have numerical ones too. All I want is a data frame output labelled according to whether each 'movement' is equal to or not equal to the previous one.
movementObjects.Movement
1 left
2 forward
3 forward
4 non-moving
5 non-moving
6 non-moving
7 left
8 non-moving
9 right
10 forward
11 non-moving
12 non-moving
13 non-moving
I guess like this:
1 1
2 0
3 1
4 0
5 0
6 1
7 1
8 1
9 1
If I understand correctly what you want, it is (changing your data.frame name to df and the column names to movement for clarity):
`+`(tail(df$movement, -1) != head(df$movement, -1))
#[1] 1 0 1 0 0 1 1 1 1 1 0 0
Explanation:
tail(x, -1) keeps only the last (n-1) elements (n being the total number of elements), while head(x, -1) keeps the first (n-1). Comparing these two shifted vectors flags each position where consecutive values differ and, finally, the unary + converts the logical result into numeric.
Note: As mentioned by @ColonelBeauvel, you can index with -1 to avoid the use of tail:
`+`(df$movement[-1] != head(df$movement, -1))
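The same shifted-comparison idea can be sketched in Python for readers outside R (my addition; the movement values are taken from the question):

```python
movements = ["left", "forward", "forward", "non-moving", "non-moving",
             "non-moving", "left", "non-moving", "right", "forward",
             "non-moving", "non-moving", "non-moving"]

# Compare each element with its predecessor; 1 marks a change, 0 no change
changes = [int(a != b) for a, b in zip(movements[1:], movements[:-1])]
print(changes)  # [1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0]
```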
EDIT, if you really want to know how to modify your script to make your for loop work:
changes <- c()
for (p in 2:nrow(df)){
if (df[p, 1] == df[p-1, 1]) changes <- c(changes, 0) else changes <- c(changes, 1)
}
print(changes)
# [1] 1 0 1 0 0 1 1 1 1 1 0 0
In statistics, count data can be used for many purposes. I currently have a large data column (around 600 million rows) called "A". It looks something like this:
A
0
0
0
1
0
1
1
0
0
0
0
1
0
1
0
Here, A is just a bunch of 0's and 1's with no pattern. The 1's represent a "hit". I would like to keep a counter (starting at 1 instead of 0) that keeps track of how many hits have occurred, plus 1.
A Counter
0 1
0 1
0 1
1 2
0 2
1 3
1 4
0 4
0 4
0 4
0 4
1 5
0 5
1 6
0 6
I have come up with a for-loop that is:
for(i in 1:nrow(A)){
  Counter[i+1] <- df[i,5] + df[i+1,4]
}
However, the entire loop takes forever at 600 million rows. Does anyone know a good fix? This seems simple but I just can't think of it. Any tips would be greatly appreciated. Thanks!
You want to calculate the cumulative sum:
Counter <- cumsum(A) + 1
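The same running count can be sketched in Python with itertools.accumulate (my addition, using the example column A from the question):

```python
from itertools import accumulate

A = [0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0]

# Cumulative sum of hits, offset by 1 so the counter starts at 1
counter = [s + 1 for s in accumulate(A)]
print(counter)  # [1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 4, 5, 5, 6, 6]
```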
How about not creating the counter afterwards, but maintaining it as each hit/no-hit is saved? Retrieve the last value and set the counter based on it. For performance, you could even keep the running value in a separate record.
I have a table of several columns, with values from 1 to 8. The columns have different lengths, so I have filled them with NAs at the end. I would like to transform each column of the data so I get something like this for each column:
1 2 3 4 5 6 7 8
0-25 1 0 0 0 0 1 0 2
25-50 5 1 2 0 0 0 0 1
50-75 12 2 2 3 0 1 1 1
75-100 3 25 1 1 1 0 0 0
where the row names are percentages of the actual length of the original column (i.e. without the NAs), the column names are the original 1 to 8 values, and the new values are the number of occurrences of the original values in each percentage range. Any ideas will be appreciated.
PS: I realize that my original message was very confusing. The data I want to transform contain a number of columns from time series like this:
1
1
8
1
3
4
1
5
1
6
2
7
1
NA
NA
and I need to calculate the frequency of occurrences of each value (1 to 8) in the 0-25%, 25-50%, etc. portions of the series. Joris' answer is very useful. I can work on it. Thanks!
Given the lack of some information, I can offer you this :
Say 0 is no occurrence and 1 is an occurrence. Then you can use the following little script to get the results for one column. Wrap it in a function, apply it over the columns, and you get what you need.
x <- c(1,0,0,1,1,0,1,0,0,0,1,0,1,1,1,NA,NA,NA,NA,NA,NA)
prop <- which(x==1) / sum(!is.na(x))*100
result <- cut(prop,breaks=c(0,25,50,75,100))
table(result)
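For illustration, here is the same bucketing logic sketched in Python (my addition, using the example vector from the answer; the quartile edges follow R's cut, with intervals open on the left and closed on the right):

```python
x = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1,
     None, None, None, None, None, None]

n = sum(v is not None for v in x)   # length without the NAs
counts = [0, 0, 0, 0]               # bins (0,25], (25,50], (50,75], (75,100]

for i, v in enumerate(x, start=1):
    if v == 1:
        pct = i / n * 100           # position as % of the non-NA length
        # Assign to the first bin whose right edge is >= pct,
        # mimicking R's cut(prop, breaks=c(0,25,50,75,100))
        for b, edge in enumerate((25, 50, 75, 100)):
            if pct <= edge:
                counts[b] += 1
                break

print(counts)  # [1, 3, 1, 3]
```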
This is making me feel dumb, but I am trying to produce a single vector/df/list/etc (anything but a matrix) concatenating two factors. Here's the scenario. I have a 100k line dataset. I used the top half to predict the bottom half and vice versa using knn. So now I have 2 objects created by knn predict().
> head(pred11)
[1] 0 0 0 0 0 0
Levels: 0 1
> head(pred12)
[1] 0 1 1 0 0 0
Levels: 0 1
> class(pred11)
[1] "factor"
> class(pred12)
[1] "factor"
Here's where my problem starts:
> pred13 <- rbind(pred11, pred12)
> class(pred13)
[1] "matrix"
There are 2 problems. First, it changes the 0's and 1's to 1's and 2's, and second, it creates a huge matrix that eats all my memory. I've tried messing with as.numeric(), data.frame(), etc., but can't get it to just combine the 2 50k factors into 1 100k one. Any suggestions?
#James presented one way, I'll chip in with another (shorter):
set.seed(42)
x1 <- factor(sample(0:1,10,replace=T))
x2 <- factor(sample(0:1,10,replace=T))
unlist(list(x1,x2))
# [1] 1 1 0 1 1 1 1 0 1 1 0 1 1 0 0 1 1 0 0 1
#Levels: 0 1
...This might seem a bit like magic, but unlist has special support for factors for this particular purpose! All elements in the list must be factors for this to work.
rbind will create a 2 x 50000 matrix in your case, which isn't what you want; c is the correct function to combine 2 vectors into a single longer vector. However, when you use rbind or c on a factor, it operates on the underlying integer codes that map to the levels. In general you need to convert to character before combining and refactoring:
x1 <- factor(sample(0:1,10,replace=T))
x2 <- factor(sample(0:1,10,replace=T))
factor(c(as.character(x1),as.character(x2)))
[1] 1 1 1 0 1 1 0 1 0 0 0 1 1 1 1 1 1 0 0 0
Levels: 0 1