What's the most elegant way to create this series of numbers? - math

I have N items in a collection and I am assigning values starting from 1, going down to 0 in the "center" of the list, and then going back up to 1 linearly.
So if you have 5 items:
0 1 2 3 4
1 0.5 0 0.5 1
For 6 items, 2 items in the center would have the same value 0.
0 1 2 3 4 5
1 0.5 0 0 0.5 1
Right now I have a bunch of if statements checking for index and then deciding whether the value should go up or down from 1. But it seems too messy.
Is there an elegant way to create such a series of numbers (particularly without if statements if possible)?

If N >= 3 is odd, then
f(x) = fabs(2*x-N+1)/(N-1)
If N >= 4 is even, then
f(x) = (fabs(2*x-N+1) - 1)/(N-2)
To get totally rid of if-statements, you can write this as
f(x) = (fabs(2*x-N+1) + (N%2) - 1)/(N-2 + (N%2))
which works for even and odd values of N >= 3.
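
As a quick check of the combined formula, here is a minimal Python sketch. The function name `ramp` is made up for illustration, and `abs` stands in for `fabs`:

```python
def ramp(n):
    """Values going 1 -> 0 -> 1 across n items (n >= 3), branch-free formula."""
    odd = n % 2  # 1 for odd n, 0 for even n
    return [(abs(2 * x - n + 1) + odd - 1) / (n - 2 + odd) for x in range(n)]

print(ramp(5))  # [1.0, 0.5, 0.0, 0.5, 1.0]
print(ramp(6))  # [1.0, 0.5, 0.0, 0.0, 0.5, 1.0]
```

For n < 3 the denominator can become zero, so those sizes would still need separate handling.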

Related

Repetitively taking XOR of consecutive elements

Given a binary array of size N
e.g. A[1:N] = 1 0 0 1 0 1 1 1
A new array of size N-1 will be created by taking XOR of 2 consecutive elements.
A'[1:N-1] = 1 0 1 1 1 0 0
Repeat this operation until one element is left.
1 0 0 1 0 1 1 1
1 0 1 1 1 0 0
1 1 0 0 1 0
0 1 0 1 1
1 1 1 0
0 0 1
0 1
1
I want to find the last element left (0 or 1).
One can find the answer by repeatedly performing the operation, but this approach takes O(N*N) time. Is there a way to solve the problem more efficiently?

There's a very efficient solution to this problem, which needs just a few lines of code, but it's rather complicated to explain. I'll have a go, anyway.
Suppose you need to reduce a list of, say, 6 numbers that are all zero except for one element. By symmetry, there are just three cases to consider:
1 0 0 0 0 0     0 1 0 0 0 0     0 0 1 0 0 0
 1 0 0 0 0       1 1 0 0 0       0 1 1 0 0
  1 0 0 0         0 1 0 0         1 0 1 0
   1 0 0           1 1 0           1 1 1
    1 0             0 1             0 0
     1               1               0
In the first case, a single '1' at the edge doesn't really do anything much. It basically just stays put. But in the other two cases, more elements of the list get involved and the situation is more complex. A '1' in the second element of the list produces a result of '1', but a '1' in the third element produces a result of '0'. Is there a simple rule that explains this behaviour?
Yes, there is. Take a look at this:
Row 0: 1
Row 1: 1 1
Row 2: 1 2 1
Row 3: 1 3 3 1
Row 4: 1 4 6 4 1
Row 5: 1 5 10 10 5 1
I'm sure you've seen this before. It's Pascal's triangle, where each row is obtained by adding adjacent elements taken from the row above. The larger numbers in the middle of the triangle reflect the fact that these numbers are obtained by adding together values drawn from a broader subset of the preceding rows.
Notice that in Row 5, the two numbers in the middle are both even, while the other numbers are all odd. This exactly matches the behaviour of the three examples shown above; the XOR product of an even number of '1's is zero, and the XOR product of an odd number of '1's is '1'.
To make things clearer, let's just consider the parity of the numbers in this triangle (i.e., '1' for odd numbers, '0' for even numbers):
Row 0: 1
Row 1: 1 1
Row 2: 1 0 1
Row 3: 1 1 1 1
Row 4: 1 0 0 0 1
Row 5: 1 1 0 0 1 1
This is actually called a Sierpinski triangle. Where a zero appears in this triangle, it tells us that it doesn't matter if your list has a '1' or a '0' in this position; it will have no effect on the resulting value because if you wrote out the expression showing the value of the final result in terms of all the initial values in your list, this element would appear an even number of times.
Take a look at Row 4, for example. Every element is zero except at the extreme edges. That means if your list has 5 elements, the end result depends only on the first and last elements in the list. (The same applies to any list where the number of elements is one more than a power of 2.)
The rows of the Sierpinski triangle are easy to calculate. As mentioned in oeis.org/A047999:
Lucas's Theorem is that T(n,k) = 1 if and only if the 1's in the binary expansion of k are a subset of the 1's in the binary expansion of n; or equivalently, k AND NOT n is zero, where AND and NOT are bitwise operators.
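
As a sanity check of that rule, here is a minimal Python sketch that rebuilds the parity rows with the bitwise test and compares them against the binomial coefficients themselves. The helper name `pascal_parity` is my own, and `math.comb` needs Python 3.8 or later:

```python
from math import comb

def pascal_parity(n, k):
    """Parity of C(n, k): 1 if and only if (k AND NOT n) is zero."""
    return 1 if (k & ~n) == 0 else 0

for n in range(6):
    row = [pascal_parity(n, k) for k in range(n + 1)]
    # Compare with the parity of the real binomial coefficients.
    assert row == [comb(n, k) % 2 for k in range(n + 1)]
    print(f"Row {n}:", *row)
```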
So, after that long-winded explanation, here's my code:
def xor_reduction(a):
    n, r = len(a), 0
    for k in range(n):
        b = 0 if k & -n > 0 else 1
        r ^= b & a.pop()
    return r

assert xor_reduction([1, 0, 0, 1, 0, 1, 1, 1]) == 1
I said it was short. In case you're wondering, the 4th line has k & -n (k AND minus n) instead of k & ~n (k AND not n) because n in this function is the number of elements in the list, which is one more than the row number, and ~(n-1) is the same thing as -n (in Python, at least).
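
If you want to convince yourself that the shortcut agrees with the repeated operation, a sketch along these lines should do. Both function names are made up for this check, and the fast version is a variant that leaves the input list untouched instead of popping from it:

```python
import random

def xor_reduction_bruteforce(a):
    """O(N*N) reference: XOR neighbouring elements until one value is left."""
    row = list(a)
    while len(row) > 1:
        row = [x ^ y for x, y in zip(row, row[1:])]
    return row[0]

def xor_reduction_fast(a):
    """O(N): XOR only the elements whose Pascal-triangle coefficient is odd."""
    m = len(a) - 1  # row number of the triangle
    r = 0
    for k, bit in enumerate(a):
        if (k & ~m) == 0:  # C(m, k) is odd
            r ^= bit
    return r

# Compare both versions on random binary lists of various lengths.
random.seed(0)
for _ in range(200):
    bits = [random.randint(0, 1) for _ in range(random.randint(1, 40))]
    assert xor_reduction_fast(bits) == xor_reduction_bruteforce(bits)
```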

Add index to runs of positive or negative values of certain length

I have a data frame that contains 100,000 rows. It looks like this:
Value
1
2
-1
-2
0
3
4
-1
3
I want to create an extra column (column B) that consists of 0s and 1s.
It is 0 by default, but once there are 5 consecutive data points that are all positive or all negative, it should be 1. This only applies while the run is unbroken (e.g. when the run is positive and a negative number appears, the count starts again).
Value B
1 0
2 0
1 0
2 0
2 1
3 1
4 1
-1 0
3 0
I tried different loops, but they didn't work. I also tried converting the whole data frame to a list and looping over the list, without success.

Here's an approach that uses the rollmean function from the zoo package.
set.seed(1000)
df = data.frame(Value = sample(-9:9,1000,replace=T))
sign = sign(df$Value)
library(zoo)
rolling = rollmean(sign,k=5,fill=0,align="right")
df$B = as.numeric(abs(rolling) == 1)
I generated 1000 values with positive and negative sets.
Extract the sign of the values - this will be -1 for negative, 1 for positive and 0 for 0
Calculate the right aligned rolling mean of 5 values (it will average x[1:5], x[2:6], ...). This will be 1 or -1 if all the values in a row are positive or negative (respectively)
Take the absolute value and store the comparison against 1. This is a logical vector that turns into 0s and 1s based on your conditions.
Note: there's no need for loops. This can all be vectorised (once we have the rolling mean calculated).

This will work. It is not the most efficient way to do it, but the logic is pretty transparent: just check whether there's only one unique sign (i.e. +, -, or 0) for each sequence of five adjacent rows:
dat <- data.frame(Value=c(1,2,1,2,2,3,4,-1,3))
dat$new_col <- NA
dat$new_col[1:4] <- 0
for (x in 5:nrow(dat)){
  if (length(unique(sign(dat$Value[(x-4):x])))==1){
    dat$new_col[x] <- 1
  } else {
    dat$new_col[x] <- 0
  }
}

Use the cumsum(...diff(...) <condition>) idiom to create a grouping variable, and ave to calculate the indices within each group.
d$B2 <- ave(d$Value, cumsum(c(0, diff(sign(d$Value)) != 0)), FUN = function(x){
  as.integer(seq_along(x) > 4)})
# Value B B2
# 1 1 0 0
# 2 2 0 0
# 3 1 0 0
# 4 2 0 0
# 5 2 1 1
# 6 3 1 1
# 7 4 1 1
# 8 -1 0 0
# 9 3 0 0
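
For comparison, the same "group by runs of equal sign, then index within each run" idea can be sketched in Python with only the standard library. This is an illustration of the technique rather than a translation of any answer above, and the names `sign` and `flag_long_runs` are made up:

```python
from itertools import groupby

def sign(v):
    """-1 for negative, 0 for zero, 1 for positive."""
    return (v > 0) - (v < 0)

def flag_long_runs(values, min_len=5):
    """1 from the min_len-th element of each same-sign run onward, else 0."""
    flags = []
    for _, run in groupby(values, key=sign):
        for position, _ in enumerate(run, start=1):
            flags.append(1 if position >= min_len else 0)
    return flags

print(flag_long_runs([1, 2, 1, 2, 2, 3, 4, -1, 3]))
# [0, 0, 0, 0, 1, 1, 1, 0, 0]
```

Like the ave version, this treats a run of zeros as a run of its own, so five zeros in a row would also be flagged. Filter on the sign if that is not wanted.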

generate choice switching matrix by group for many groups

I want to calculate the choice switching probability by group first (user in the code below). Then I will average the group-level probabilities to get a total probability. I have tens of thousands of groups, so I need the code to be fast. My code is a for loop, which takes more than 10 minutes to run. I applied the same logic in Excel, and it takes only a few seconds.
The switching probability from choice m to choice n for a particular user is defined as the share of observations whose choice is n at period t and m at period t-1.
My original code first tags the first and last purchase with a for loop, and then uses another for loop to build the switching matrix. I am only able to create the switching matrix for the whole data set, not by group. Even so, it is still very slow. Adding user would make it even slower.
t<-c(1,2,1,1,2,3,4,5)
user<-c('A','A','B' ,'C','C','C','C','C')
choice<-c(1,1,2,1,2,1,3,3)
dt<-data.frame(t,user,choice)
t user choice
1 A 1
2 A 1
1 B 2
1 C 1
2 C 2
3 C 1
4 C 3
5 C 3
# **step one** create a second choice column for later construction of the switching matrix
# Label first purchase and last purchase as zero
for (i in 1:nrow(dt))
{ ifelse (dt$user[i+1]==dt$user[i],dt$newcol[i+1]<-0,dt$newcol[i+1]<-1) }
# **step two** create switching matrix
# switching.m is an empty matrix with the size of total choice: 3x3 here
length(unique(dt$user))
total.choice<-3
switching.m<-matrix(0,nrow=total.choice,ncol=total.choice)
for (i in 1:total.choice)
{
  for(j in 1:total.choice)
  {
    if(length(nrow(switching.m[switching.m[,1]==i& switching.m[,2]==j,])!=0))
    {switching.m[i,j]=nrow(dt[dt[,1]==i&dt[,2]==j,])}
    else {switching.m[i,j]<0}
  }
}
The desired output for a particular user/group looks like this. The output should have the same matrix size even if the user never makes a particular choice.
# take user C
#output for switching matrix
second choice
first 1 2 3
1 0 1 1
2 1 0 0
3 0 0 1
#output for switching probability
second choice
first 1 2 3
1 0 0.5 0.5
2 1 0 0
3 0 0 1

We could use table and prop.table after splitting by 'user'
lst <- lapply(split(dt, dt$user), function(x)
table(factor(x$choice, levels= 1:3), factor(c(x$choice[-1], NA), levels=1:3)))
As mentioned by @nicola, it is more compact to split the 'choice' column by 'user'
lst <- lapply(split(dt$choice, dt$user), function(x)
table(factor(x, levels = 1:3), factor(c(x[-1], NA), levels = 1:3)))
lst$C
# 1 2 3
#1 0 1 1
#2 1 0 0
#3 0 0 1
prb <- lapply(lst, prop.table, 1)
prb$C
# 1 2 3
# 1 0.0 0.5 0.5
# 2 1.0 0.0 0.0
# 3 0.0 0.0 1.0
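
The same per-user counting can be sketched in plain Python in a single pass over the data. This is a hedged illustration only: `switching_matrices` and `row_shares` are made-up names, the rows are assumed to be sorted by t within each user (as in the example), and rows with no transitions are left as zeros, whereas prop.table would give NaN for them.

```python
def switching_matrices(users, choices, n_choices=3):
    """Count transitions choice[t-1] -> choice[t] separately for each user."""
    matrices = {}
    previous = {}  # last choice seen for each user
    for user, choice in zip(users, choices):
        # Every user gets a full n_choices x n_choices matrix, even if empty.
        m = matrices.setdefault(user, [[0] * n_choices for _ in range(n_choices)])
        if user in previous:
            m[previous[user] - 1][choice - 1] += 1
        previous[user] = choice
    return matrices

def row_shares(matrix):
    """Divide each row by its total; all-zero rows stay as zeros."""
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in matrix]

user = ['A', 'A', 'B', 'C', 'C', 'C', 'C', 'C']
choice = [1, 1, 2, 1, 2, 1, 3, 3]
mats = switching_matrices(user, choice)
print(mats['C'])              # [[0, 1, 1], [1, 0, 0], [0, 0, 1]]
print(row_shares(mats['C']))  # [[0.0, 0.5, 0.5], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
```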

Write all zeros from vector in R

I have a vector, for example x=c(0,0,0,1,1,2,3,4,5,6).
I want to write out all zeros, all ones, and then all numbers divisible by 2.
The output would look like this: 0 0 0 1 1 2 4 6
I don't know how to write out the zeros and ones, because I then use which(x %% 2 == 0). Can anyone help?

We can try with %% and use |
x[x%%2==0 | x==1]
#[1] 0 0 0 1 1 2 4 6
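
The same filter, written as a Python list comprehension purely for comparison (the variable name `kept` is arbitrary):

```python
x = [0, 0, 0, 1, 1, 2, 3, 4, 5, 6]

# Keep the even numbers (which includes the zeros) and the ones.
kept = [v for v in x if v % 2 == 0 or v == 1]
print(kept)  # [0, 0, 0, 1, 1, 2, 4, 6]
```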

for loop where criteria is last data record in r

I have a data frame (data) and would like to run a for loop through it. I want to label the output as either 0 or 1 based on whether the next instance (p) in the data frame is equal to (==) or not equal to (!=) the previous record. At the moment I just get a list of 1s, although print(p) shows that the for loop is looping. Any help much appreciated. Thanks.
data <- data.frame(movementObjects$Movement)
for (p in seq(1, nrow(data)-1)){
  if (p == p-1)
    changes <- 0
  else {
    if (p != p-1)
      changes <- 1
    print(changes)
  }
}
Sorry, I'm new to R. Here is an example of one of the datasets. This one contains strings, but I have numerical ones too. All I want is a data frame output labelled according to whether each 'movement' is equal or not equal to the previous one.
movementObjects.Movement
1 left
2 forward
3 forward
4 non-moving
5 non-moving
6 non-moving
7 left
8 non-moving
9 right
10 forward
11 non-moving
12 non-moving
13 non-moving
I guess like this:
1 1
2 0
3 1
4 0
5 0
6 1
7 1
8 1
9 1

If I understand correctly what you want, it is (changing your data.frame name to df and the column names to movement for clarity):
`+`(tail(df$movement, -1) != head(df$movement, -1))
#[1] 1 0 1 0 0 1 1 1 1 1 0 0
Explanation:
tail(x, -1) drops the first element and keeps the last (n-1) elements (n being the total number of elements), while head(x, -1) keeps the first (n-1) elements. Comparing those 2 vectors finds the positions where consecutive values differ and, finally, + converts the logical result into numeric.
Note: As mentioned by @ColonelBeauvel, you can index df$movement with -1 to avoid the use of tail:
`+`(df$movement[-1] != head(df$movement, -1))
EDIT, if you really want to know how to modify your script to make your for loop work:
changes <- c()
for (p in 2:nrow(df)){
  if (df[p, 1] == df[p-1, 1]) changes <- c(changes, 0) else changes <- c(changes, 1)
}
print(changes)
# [1] 1 0 1 0 0 1 1 1 1 1 0 0
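
For readers who think more easily in Python, the same "compare each element with the one before it" idea can be sketched with zip. The function name `changes` is reused here only for illustration, and the data is the example from the question:

```python
def changes(values):
    """1 where an element differs from the one before it, else 0."""
    return [int(cur != prev) for prev, cur in zip(values, values[1:])]

movement = ["left", "forward", "forward", "non-moving", "non-moving",
            "non-moving", "left", "non-moving", "right", "forward",
            "non-moving", "non-moving", "non-moving"]
print(changes(movement))  # [1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0]
```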
