Cumulative Products in Kusto - azure-data-explorer

I have data with times (column t) and values (column v). I want to create a new column, call it p, which is the product of all v's from the first t up to the current row's t value. The row_cumsum() function does this for addition (a running sum), but I need a running product.
I tried using extend p = v * prev(p, 1, 1), but Kusto doesn't recognize the p column in prev() because it is being created in the same statement.
If the input is:
datatable(t:int, v:int)
[
1, 1,
2, 1,
3, 2,
4, 3,
5, 3,
6, 2
]
I want the output to be:
datatable(t:int, v:int, p:int)
[
1, 1, 1,
2, 1, 1,
3, 2, 2,
4, 3, 6,
5, 3, 18,
6, 2, 36
]

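You can use log10() and row_cumsum() (cumulative sum). Because log10(a * b) = log10(a) + log10(b), a running sum of the logarithms converted back with exp10() gives the running product. This only works when every v is positive, and exp10() returns a floating-point value: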
datatable(t:int, v:int)
[
1, 1,
2, 1,
3, 2,
4, 3,
5, 3,
6, 2,
]
| order by t asc
| extend l = log10(v)
| extend cumsum = row_cumsum(l)
// exp10() returns a real number; round it before converting back to int
| project t, v, p = toint(round(exp10(cumsum)))
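To show why the log trick works and where it can drift, here is a small Python sketch. It uses Python rather than KQL so it runs standalone. The list `v` is simply the v column from the question, ordered by t. The sketch compares an exact running product with the log-sum-then-exp10 route the query above uses.

```python
import math
from itertools import accumulate
from operator import mul

# The v column from the question, already ordered by t.
v = [1, 1, 2, 3, 3, 2]

# Exact running product: what p should contain.
exact = list(accumulate(v, mul))

# The answer's route: running sum of log10(v), then 10 raised to that sum.
# Only valid when every v > 0 (math.log10 raises ValueError for 0 or negatives).
via_logs = [10 ** s for s in accumulate(math.log10(x) for x in v)]

print(exact)                          # [1, 1, 2, 6, 18, 36]
print([round(p) for p in via_logs])   # same values once rounded
```

The unrounded via_logs values can differ from the exact products in the last few bits. That is the reason to round before converting back to an integer.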

Related

How do I generate a polychoric correlation matrix in R-psych

I am trying to generate a polychoric correlation matrix with the R psych package for a 227 x 6 data table, which I have called nepr. After importing the data from an Excel spreadsheet, I entered this code:
nepr=as.data.frame(nepr)
attach(nepr)
library(psych)
out=polychoric(nepr)
neprpoly=out$rho
print(neprpoly,digits=2)
generates the following error message:
>Error in if (any(lower > upper)) stop("lower>upper integration limits"): missing value where TRUE/FALSE needed
>In addition: warning messages:
>1. In polychoric(nepr): The items do not have an equal number of response alternatives, global set to FALSE.
>2. In qnorm(cumsum(rsum)[-length(rsum)]): NaNs produced
I expected this code to produce a polychoric correlation matrix from the data frame nepr, and I don't know how to interpret or act on the error messages I received.
Can anyone suggest what changes I need to make to the code to address the error messages?
A sample of the dataset is as follows:
structure(list(Balance = c(4, 4, 5, 5, 3, 4, 3, 4, 2, 2, 2, 5,
2, 2, 2, 2, 1, 2, 4, 1), Earth = c(4, 5, 5, 5, 5, 5, 5, 4, 4,
4, 4, 5, 3, 4, 4, 2, 5, 4, 5, 5), Plants = c(2, 2, 2, 3, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 2, 2, 4), Modify = c(2, 2, 1,
1, 2, 2, 2, 2, 4, 2, 4, 2, 4, 2, 2, 2, 2, 2, 2, 2), Growth =
c(2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 4, 1, 4, 2, 2, 4, 4, 4, 1, 2),
Mankind = c(2, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 2, 2, 2, 2, 1,
1, 1, 2)), row.names = c(NA,20L), class = "data.frame")
The data consists of inputs of Likert scale rankings (ranked 1-5) to the items 'Balance', 'Earth', 'Plants', 'Modify', 'Growth', and 'Mankind'. There are no missing values in any cells of the 227 row x 6 item matrix; Balance, Plants, & Growth all contain the values 1-5; Earth contains the values 2-5 (no ranking of 1 recorded); Mankind contains the values 1-4 (no ranking of 5 recorded). When I ran the original data set (before reversing the valence of the last 3 columns) I was able to get a polychoric matrix with no problems even though the data contained the Earth data as it appears in the nepr data set. I assume that it is not uncommon to have similar data sets from surveys where variables do not necessarily contain the full range of response values.
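One way to see what the first warning is reacting to is to list the response categories each item actually uses. The Python sketch below does this for the 20-row sample above only. The full 227-row data set is not shown, so its category counts may differ. It is a diagnostic, not a fix for the error.

```python
# Sketch: which Likert categories does each item use in the 20-row sample above?
sample = {
    "Balance": [4, 4, 5, 5, 3, 4, 3, 4, 2, 2, 2, 5, 2, 2, 2, 2, 1, 2, 4, 1],
    "Earth":   [4, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 5, 3, 4, 4, 2, 5, 4, 5, 5],
    "Plants":  [2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 2, 2, 4],
    "Modify":  [2, 2, 1, 1, 2, 2, 2, 2, 4, 2, 4, 2, 4, 2, 2, 2, 2, 2, 2, 2],
    "Growth":  [2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 4, 1, 4, 2, 2, 4, 4, 4, 1, 2],
    "Mankind": [2, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 2, 2, 2, 2, 1, 1, 1, 2],
}

categories = {item: sorted(set(values)) for item, values in sample.items()}
for item, cats in categories.items():
    print(f"{item:<8} {len(cats)} categories: {cats}")

# More than one distinct count means unequal numbers of response alternatives.
print("equal numbers of alternatives:", len({len(c) for c in categories.values()}) == 1)
```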

Split a vector after every element of a predefined set has occurred

I have to do the following:
I have a vector, let's say
x <- c(1, 1, 2, 3, 3, 3, 4, 4, 5, 5, 3, 2, 11, 1, 3, 3, 4, 1)
I need the remainder of the vector after each of 1, 2, 3 and 4 has occurred at least once.
So the new subset vector would contain only 4, 5, 5, 3, 2, 11, 1, 3, 3, 4, 1.
I need a relatively simple way to do this. It might be possible with an if and a while loop with breaks, but I am struggling to come up with a solution.
Is there a simple (even mathematical) way to do this in R?
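Use sapply to find where each predefined number first occurs, then drop everything up to the largest of those positions (this assumes every value in 1:4 occurs in x):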
x[-seq(max(sapply(1:4, function(y) which(x == y)[1])))]
# [1] 4 5 5 3 2 11 1 3 3 4 1
Data
x <- c(1, 1, 2, 3, 3, 3, 4, 4, 5, 5, 3, 2, 11, 1, 3, 3, 4, 1)
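The question mentions a loop with breaks. A sketch of that single-pass version in Python follows. remainder_after_all is a hypothetical name. Returning an empty list when some required value never appears is an assumption; the R one-liner would stop with an error in that case instead.

```python
def remainder_after_all(x, required):
    """Elements after the position where every value in `required` has been
    seen at least once; [] if some value never appears (assumed behaviour)."""
    missing = set(required)
    for i, value in enumerate(x):
        missing.discard(value)
        if not missing:          # all required values seen: break out here
            return x[i + 1:]
    return []


x = [1, 1, 2, 3, 3, 3, 4, 4, 5, 5, 3, 2, 11, 1, 3, 3, 4, 1]
print(remainder_after_all(x, [1, 2, 3, 4]))  # [4, 5, 5, 3, 2, 11, 1, 3, 3, 4, 1]
```

It stops at the first point where the set of unseen values is empty, so it never scans past the split.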
You can use run length encoding for this
x = c(1, 1, 2, 3, 3, 3, 4, 4, 5, 5, 3, 2, 11, 1, 3, 3, 4, 1)
encoded = rle(x)
# Pick the first location of 1, 2, 3, and 4
# Then find the max index location
indices = c(which(encoded$values == 1)[1],
which(encoded$values == 2)[1],
which(encoded$values == 3)[1],
which(encoded$values == 4)[1])
index = max(indices)
# Run `index` holds the latest first occurrence; it starts at
# cumsum(encoded$lengths)[index-1] + 1, so the remainder starts one element later
reqd_index = cumsum(encoded$lengths)[index-1] + 2
# Print final split value
x[reqd_index:length(x)]
The result is as follows
> x[reqd_index:length(x)]
[1] 4 5 5 3 2 11 1 3 3 4 1

Returning index of vector

I have a vector that looks like this:
c(1,1,1,1,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,5..)
I want to get the indices where the element changes, i.e. (1,5,9,...)
I know how to do it with a for loop, but I am looking for a faster way because my vector is very large.
Thanks,
Try
which(c(TRUE,diff(v1)!=0))
Or, if each value forms a single contiguous run:
match(unique(v1), v1)
Or if the vector is sorted
head(c(1, findInterval(unique(v1), v1)+1),-1)
data
v1 <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4,
4, 4, 5, 5, 5, 5, 5)
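For comparison, the diff idea carries over directly to Python. The sketch below uses the hypothetical name run_starts and returns 1-based indices to match R:

```python
def run_starts(v):
    """1-based positions where a new run of equal values begins,
    like which(c(TRUE, diff(v1) != 0)) in R."""
    return [i + 1 for i in range(len(v)) if i == 0 or v[i] != v[i - 1]]


v1 = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4,
      4, 4, 5, 5, 5, 5, 5]
print(run_starts(v1))  # [1, 5, 9, 15, 23]
```

Unlike match(unique(v1), v1), this also reports a change when a value reappears later in a separate run.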
Another fun approach:
v1 <- c(1, 1, 2, 3, 4, 4, 5, 6, 7, 7, 7, 8)
head(c(1, cumsum(rle(v1)$lengths) + 1), -1)
Or if you have magrittr then it can become
library(magrittr)
v1 %>%
rle %>%
.$lengths %>%
cumsum %>%
add(1) %>%
c(1, .) %>%
head(-1)
Result: 1 3 4 5 7 8 9 12
Might look weird but it's fun to think that through :)
Explanation: cumsum(rle(v1)$lengths) gets you almost all the way there, but it'll give you the index of where a sequence ends rather than where the next sequence starts, so that's why we add one to each element, append the index 1, and remove the last element.
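The run-length version can be sketched in Python with itertools.groupby standing in for rle(), so that each line mirrors one step of the explanation above. run_starts_from_lengths is a hypothetical name.

```python
from itertools import accumulate, groupby


def run_starts_from_lengths(v):
    lengths = [len(list(run)) for _, run in groupby(v)]  # rle(v1)$lengths
    ends = list(accumulate(lengths))                      # cumsum(...): where each run ends
    return ([1] + [e + 1 for e in ends])[:-1]             # head(c(1, ends + 1), -1)


v1 = [1, 1, 2, 3, 4, 4, 5, 6, 7, 7, 7, 8]
print(run_starts_from_lengths(v1))  # [1, 3, 4, 5, 7, 8, 9, 12]
```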

merge table in R

I have the two tables below:
subj <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
gamble <- c(1, 2, 3, 1, 2, 3, 1, 2, 3)
ev <- c(4, 5, 6, 4, 5, 6, 4, 5, 6)
table1 <- data.frame(subj, gamble, ev)
subj2 <- c(1, 2, 3)
gamble2 <- c(1, 3, 2)
table2 <- data.frame(subj2, gamble2)
I want to merge the two tables so that, for each subject, only the table1 row whose gamble matches that subject's gamble in table2 is kept. The expected output is:
subj gamble ev
   1      1  4
   2      3  6
   3      2  5
You are looking for merge
merge(table1, table2, by.x=c("subj", "gamble"), by.y=c("subj2", "gamble2"), all=FALSE, sort=TRUE)
edited as per Ananda's helpful observation
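To show what merge() is doing here, the same two-key match can be sketched in Python without a data-frame library. It keeps each table1 row whose (subj, gamble) pair appears in table2. Tuples stand in for data-frame rows. table2 has no other columns and its pairs are unique, so this inner join behaves like a filter.

```python
# Rows as (subj, gamble, ev) and (subj2, gamble2) tuples, copied from the question.
table1 = [(1, 1, 4), (1, 2, 5), (1, 3, 6),
          (2, 1, 4), (2, 2, 5), (2, 3, 6),
          (3, 1, 4), (3, 2, 5), (3, 3, 6)]
table2 = [(1, 1), (2, 3), (3, 2)]

# Inner join on (subj, gamble) == (subj2, gamble2), sorted like sort=TRUE.
keys = set(table2)
merged = sorted(row for row in table1 if (row[0], row[1]) in keys)
print(merged)  # [(1, 1, 4), (2, 3, 6), (3, 2, 5)]
```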

How can you find the polynomial for a decimated LFSR?

I know that if you decimate the sequence generated by a linear feedback shift register (LFSR), you get a new sequence and a new polynomial. For example, if you sample every fifth element of the sequence generated by an LFSR with polynomial x^4 + x + 1, you get the sequence generated by x^2 + x + 1. I can find the second polynomial (x^2 + x + 1) by brute force, which is fine for low-order polynomials. However, for higher-order polynomials, brute force takes an unreasonable amount of time.
So the question is: is it possible to find the decimated polynomial analytically?
I recently read this article and thought of it when I saw your question; I hope it helps.
Given a primitive polynomial over GF(q), one can obtain another primitive polynomial by decimating an LFSR sequence obtained from the initial polynomial. This is demonstrated in the Magma code below.
K := GF(7);
C := PrimitivePolynomial(K, 2);
C;
D^2 + 6*D + 3
In order to generate an LFSR sequence, we must first multiply this polynomial by a suitable constant so that the trailing coefficient becomes 1.
C := C * Coefficient(C,0)^-1;
C;
5*D^2 + 2*D + 1
We are now able to generate an LFSR sequence of length 7^2 - 1 = 48. The initial state can be anything other than [0, 0].
t := LFSRSequence (C, [K| 1,1], 48);
t;
[ 1, 1, 0, 2, 3, 5, 3, 4, 5, 5, 0, 3, 1, 4, 1, 6, 4, 4, 0, 1, 5, 6, 5, 2, 6, 6,
0, 5, 4, 2, 4, 3, 2, 2, 0, 4, 6, 3, 6, 1, 3, 3, 0, 6, 2, 1, 2, 5 ]
We decimate the sequence by a value d having the property gcd(d, 48)=1.
t := Decimation(t, 1, 5);
t;
[ 1, 5, 0, 6, 5, 6, 4, 4, 3, 1, 0, 4, 1, 4, 5, 5, 2, 3, 0, 5, 3, 5, 1, 1, 6, 2,
0, 1, 2, 1, 3, 3, 4, 6, 0, 3, 6, 3, 2, 2, 5, 4, 0, 2, 4, 2, 6, 6 ]
B := BerlekampMassey(t);
B;
3*D^2 + 5*D + 1
To get the corresponding primitive polynomial, we multiply by a constant to make it monic.
B := B * Coefficient(B, 2)^-1;
B;
D^2 + 4*D + 5
IsPrimitive(B);
true
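For readers without Magma, the same three steps (generate, decimate, run Berlekamp-Massey) can be sketched in Python over GF(p). This is an illustrative port, not Magma's implementation. lfsr_sequence, decimate, berlekamp_massey and make_monic are hypothetical helper names. The recurrence s[n] = -(c1*s[n-1] + ... + cL*s[n-L]) mod p is the convention the sequence above satisfies (e.g. 0 = -(2*1 + 5*1) mod 7). Berlekamp-Massey needs only 2L consecutive terms of a sequence with linear complexity L, which is what makes it practical where brute force is not.

```python
# Polynomials are coefficient lists, lowest degree first: [1, 2, 5] is 1 + 2*D + 5*D^2.

def lfsr_sequence(conn, state, length, p):
    """Terms of the LFSR with connection polynomial conn = [1, c1, ..., cL]."""
    L = len(conn) - 1
    s = list(state)
    while len(s) < length:
        n = len(s)
        s.append(-sum(conn[i] * s[n - i] for i in range(1, L + 1)) % p)
    return s[:length]


def decimate(seq, start, d):
    """Every d-th term from 1-based position `start`; seq is treated as one full
    period (indices wrap around), so the result has len(seq) terms."""
    N = len(seq)
    return [seq[(start - 1 + k * d) % N] for k in range(N)]


def berlekamp_massey(s, p):
    """Shortest connection polynomial [1, c1, ..., cL] over GF(p) generating s."""
    C, B = [1], [1]
    L, m, b = 0, 1, 1
    for n in range(len(s)):
        # Discrepancy between s[n] and what the current LFSR predicts.
        d = s[n]
        for i in range(1, L + 1):
            d = (d + C[i] * s[n - i]) % p
        if d == 0:
            m += 1
            continue
        coef = d * pow(b, p - 2, p) % p          # d / b in GF(p)
        T = C[:]
        C = C + [0] * max(0, len(B) + m - len(C))
        for i, bi in enumerate(B):               # C(D) -= coef * D^m * B(D)
            C[i + m] = (C[i + m] - coef * bi) % p
        if 2 * L <= n:
            L, B, b, m = n + 1 - L, T, d, 1
        else:
            m += 1
    return (C + [0] * (L + 1))[:L + 1]


def make_monic(poly, p):
    """Scale so the highest-degree coefficient is 1."""
    inv = pow(poly[-1], p - 2, p)
    return [c * inv % p for c in poly]


p = 7
t = lfsr_sequence([1, 2, 5], [1, 1], 48, p)   # 5*D^2 + 2*D + 1, initial state [1, 1]
u = decimate(t, 1, 5)                          # gcd(5, 48) = 1
B = berlekamp_massey(u, p)                     # 1 + 5*D + 3*D^2
print(B, make_monic(B, p))                     # monic form: 5 + 4*D + D^2
```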
From these notes: "The decimation by n > 0 of an m-sequence c, denoted c[n], has a period equal to N/gcd(N, n); if it is not the all-zero sequence, its generator polynomial ĝ(x) has roots that are nth powers of the roots of g(x)."
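The quoted result gives the analytic route the question asks for. If α is a root of g(x), the decimated sequence is generated by the minimal polynomial of α^n. That polynomial is the product of (x + β) over the conjugates β = α^n, α^(2n), α^(4n), ... (over GF(2), minus equals plus). Below is a Python sketch for the binary case. It rests on a few assumptions. First, g is irreducible of degree m ≥ 2. Second, g is written as the characteristic polynomial; if your LFSR uses the reciprocal (connection) form, reverse the coefficient lists on the way in and out. Third, decimated_polynomial, gf_mul and gf_pow are only illustrative names. The work is polynomial in m rather than exponential, unlike a brute-force search.

```python
# Field elements and g are ints used as bit vectors (bit i = coefficient of x^i).

def gf_mul(a, b, g, m):
    """Product of a and b in GF(2^m) = GF(2)[x] / (g)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> m) & 1:      # degree reached m: reduce modulo g
            a ^= g
    return r


def gf_pow(a, e, g, m):
    """a**e in GF(2^m) by square-and-multiply."""
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a, g, m)
        a = gf_mul(a, a, g, m)
        e >>= 1
    return r


def decimated_polynomial(g, m, n):
    """Coefficients (lowest degree first) of the minimal polynomial of alpha^n,
    where alpha = x is a root of g."""
    beta = gf_pow(0b10, n, g, m)
    # Conjugates of beta over GF(2): beta, beta^2, beta^4, ... until they repeat.
    conjugates = []
    c = beta
    while c not in conjugates:
        conjugates.append(c)
        c = gf_mul(c, c, g, m)
    # Multiply out the product of (X + c) over the conjugates.
    poly = [1]
    for c in conjugates:
        out = [0] * (len(poly) + 1)
        for i, coef in enumerate(poly):
            out[i + 1] ^= coef                  # coef * X
            out[i] ^= gf_mul(coef, c, g, m)     # coef * c
        poly = out
    # The product over a full conjugate set has coefficients in GF(2).
    assert all(coef in (0, 1) for coef in poly)
    return poly


g = 0b10011                               # x^4 + x + 1
print(decimated_polynomial(g, 4, 5))      # [1, 1, 1], i.e. x^2 + x + 1
```

With n = 5 this reproduces the x^4 + x + 1 → x^2 + x + 1 example from the question.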
