R - Create new variable using the difference between lagged values

The problem: I need to create a new variable (eventWindowTime) in R that is based on two columns: obstaclePresent (1 = yes, 0 = no) and timeOnTask (recorded continuously).
The dataset: I have data that was collected continuously (to fractions of a second) from several participants as they performed a task. At various points, participants encountered one or more obstacles. I would like to create obstacle event windows that range from -5 s (5 s before the obstacle) to +20 s (20 s after the obstacle).
Additional challenges:
Some event windows are overlapping
Some timestamps have multiple measurements (so I can't simply use rep() to assign the values -5 to 20 relative to the first row where obstaclePresent == 1)
Things I've tried:
The way I would typically approach this is to use ifelse() or case_when() together with lag() to:
set eventWindowTime to 0 when obstaclePresent == 1 & lag(obstaclePresent) == 0;
increment the lagged eventWindowTime value by the difference in timeOnTask between the two rows when obstaclePresent == 1 & lag(obstaclePresent) == 1;
then backfill the negative seconds in a second step.
However, R does not seem to hold the lagged values in memory, and I keep getting this error:
Error in vec_slice():
! x must be a vector, not NULL.
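The two forward steps described above need a value carried from one row to the next, which a single vectorized case_when() call cannot provide because the column it refers to does not exist until the call has finished. As a minimal sketch (written in Python for illustration; the function name forward_window_time is hypothetical), an explicit loop carries that state. In R the same idea could be written with a for loop or with Reduce(..., accumulate = TRUE):

```python
def forward_window_time(times, present):
    """Carry a running offset row by row, following the two lag()-based steps."""
    out = []
    prev_present, prev_time, prev_value = 0, None, None
    for t, p in zip(times, present):
        if p == 1 and prev_present == 0:
            value = 0.0  # obstacle onset: the window clock starts at 0
        elif p == 1 and prev_present == 1:
            value = prev_value + (t - prev_time)  # add the time elapsed since the previous row
        else:
            value = None  # no obstacle: left for the backfill step
        out.append(value)
        prev_present, prev_time, prev_value = p, t, value
    return out


# Data from the question's reproducible example (mre)
times = [723.2, 723.2, 723.3, 723.3, 723.3, 723.4, 723.4, 723.5, 723.5, 723.6,
         723.6, 723.6, 723.7, 723.7, 723.7, 723.8, 723.9, 723.9, 723.9, 724,
         724, 724, 724, 724.1, 724.1, 724.2, 724.2, 724.2, 724.3, 724.3,
         724.3, 724.4, 724.4, 724.5, 724.5, 724.6, 724.6, 724.6, 724.7, 724.7,
         724.8]
present = [0] * 20 + [1] * 21
window = forward_window_time(times, present)
print(window[20], window[-1])
```

Because each step adds the current time minus the previous time, the running value telescopes to the row's time minus the onset time, so the row-by-row accumulation is only needed if you want to mirror the original logic exactly.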
Here's a small subset of code and a file which can be used to reproduce the problem:
library(dplyr)  # provides case_when() and lag()

mre <- data.frame(Sub = rep(1, 41), Time = c(723.2, 723.2, 723.3, 723.3, 723.3, 723.4, 723.4, 723.5, 723.5, 723.6, 723.6, 723.6, 723.7, 723.7, 723.7, 723.8, 723.9, 723.9, 723.9, 724, 724, 724, 724, 724.1, 724.1, 724.2, 724.2, 724.2, 724.3, 724.3, 724.3, 724.4, 724.4, 724.5, 724.5, 724.6, 724.6, 724.6, 724.7, 724.7, 724.8), obstaclePresent = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1))

# Self-referencing attempt: obstacleEventWindow does not exist yet, so
# lag(mre$obstacleEventWindow) is lag(NULL), which raises the vec_slice() error
mre$obstacleEventWindow <- case_when(
  mre$obstaclePresent != lag(mre$obstaclePresent, 1) & mre$obstaclePresent == 1 ~ 0,
  mre$obstaclePresent == lag(mre$obstaclePresent, 1) ~ lag(mre$obstacleEventWindow) + mre$Time - lag(mre$Time, 1),
  TRUE ~ 0
)
To be clear, I understand that the case_when() call is self-referencing. I've worked with other programs where a column is populated on the fly and you can reference lagged cells without issue. That doesn't work here, and I'm at a loss as to what to do instead.
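Rather than building the column recursively, the offset can be computed directly: find each onset (a row where obstaclePresent changes from 0 to 1) and subtract its time from every row's time, keeping offsets between -5 and +20 s. The Python sketch below makes two assumptions: rows are sorted by time within one participant, and when windows overlap, a row is assigned to whichever onset it is closest to (one reasonable rule among several; keeping every overlapping window in long format is another). The function name and parameters are illustrative:

```python
def event_window_time(times, present, before=5.0, after=20.0):
    """Offset (s) of each row from the closest obstacle onset, or None outside all windows."""
    # Onsets: rows where obstaclePresent switches to 1 (or the first row, if it starts at 1)
    onsets = [times[i] for i, p in enumerate(present)
              if p == 1 and (i == 0 or present[i - 1] == 0)]
    result = []
    for t in times:
        # Offsets from every onset whose [-before, +after] window contains this row
        offsets = [t - o for o in onsets if -before <= t - o <= after]
        # Assumption: overlapping windows are resolved by the smallest absolute offset
        result.append(min(offsets, key=abs) if offsets else None)
    return result


# Two onsets (t = 0 and t = 10) whose windows overlap, plus a row outside both
print(event_window_time([0.0, 6.0, 10.0, 12.0, 40.0], [1, 0, 1, 1, 0]))
# [0.0, -4.0, 0.0, 2.0, None]
```

Duplicate timestamps need no special handling, since rows with the same time get the same offset. The function would be applied separately per participant; in dplyr the same idea would map to a vectorized subtraction inside group_by(Sub) rather than a lag() recursion.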

Related

Math problem trying to make a progress bar in Project Zomboid

I am making a mod for Project Zomboid and can't figure out a math problem. The value I am getting ranges from 0 to 1, and I want my progress bar to start at the maximum width and shrink as the value increases.
The first bar was easy because I got a value between 100 and 0, so how do I do this with a value that starts at 0?
I tried searching for this on Google, but I am really bad at math and could not find an answer.
function panel:render()
    self:drawRectBorder(30, 30, self:getWidth() - 1, 50, 1.0, 1.0, 1.0, 1.0);
    --print((bt_core.player:getBodyDamage():getOverallBodyHealth() / 100) * self:getWidth());
    self:drawRect(31, 31, (bt_core.player:getBodyDamage():getOverallBodyHealth() / 100) * self:getWidth() - 3, 48, 1, 0, 0, 1);
    self:drawRectBorder(30, 110, self:getWidth() - 1, 50, 1.0, 1.0, 1.0, 1.0);
    print(bt_core.player:getStats():getFatigue());
    if bt_core.player:getStats():getFatigue() == 0 then
        self:drawRect(31, 111, self:getWidth() - 3, 48, 1, 0, 0, 1);
    else
        self:drawRect(32, 111, bt_core.player:getStats():getFatigue() / (self:getWidth() - 3), 48, 1, 0, 0, 1);
    end
end
To get a variable in the range 100..0 from a variable in the range 0..1, you can use y = 100 - x*100.
So you have a value 0..1 and you want to map it to 100..0.
Multiplying your value by 100 gives you 0..100.
To invert this, subtract the result from 100: 100 - 0 is 100, and 100 - 100 is 0.
local newVal = 100 - val * 100
or
local newVal = 100 * (1-val)
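The same mapping written as a reusable function, with Python used only for illustration; the clamping step is an added assumption, not part of the answers above. Applied to the question's render function, this would presumably mean passing (1 - fatigue) * (self:getWidth() - 3) as the width argument to drawRect, assuming getFatigue() really returns a value between 0 and 1:

```python
def bar_width(value, max_width):
    """Width that starts at max_width when value is 0 and shrinks to 0 when value is 1."""
    value = min(max(value, 0.0), 1.0)  # clamp, in case the reported value drifts outside 0..1
    return max_width * (1 - value)


print(bar_width(0.0, 200))   # 200.0
print(bar_width(0.25, 200))  # 150.0
print(bar_width(1.0, 200))   # 0.0
```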

Solving for possible combinations of variables that add up to a certain number or range of numbers

Good evening all, this is my first question and I am hoping someone on here might be able to at least point me in a direction.
I am trying to figure out how to optimally stack pallets in a new storage facility. I need to configure the racking ahead of time in order to accept different sized pallets.
I am thinking of using between 3-6 different pallet height openings, say 105", 100", 84", 78", 72" and 66".
What I need to do is figure out every possible combination of these pallet heights that will have the top of the top beam at, say, 439".
An example of a combination would be (1) 105" pallet, (1) 100" pallet and (3) 78" pallets.
Another example would be (1) 105" pallet, (1) 100" pallet, (1) 84" pallet, (1) 78" pallet and (1) 72" pallet.
Obviously there are many such combinations, and I need to find them all.
I'm wondering whether this is possible with Excel. I just discovered "Solver" but haven't quite figured it out yet.
Any input would be greatly appreciated. I am kind of running in circles here...
Using a bit of Python and the constraint solver https://pypi.org/project/python-constraint/:
import constraint
h = [105, 100, 84, 78, 72] # heights
total = 439
n = len(h) # number of different heights
# max number of pallets that can fit
Max = int(max([total/h[i] for i in range(n)]))
problem = constraint.Problem()
problem.addVariables( [f"h{j}" for j in h], range(Max+1) )
problem.addConstraint(constraint.MaxSumConstraint(total,h))
s = problem.getSolutions()
print(f"number of solutions:{len(s)}")
print(s)
Output:
number of solutions:194
[{'h100': 4, 'h105': 0, 'h72': 0, 'h78': 0, 'h84': 0},
{'h100': 3, 'h105': 1, 'h72': 0, 'h78': 0, 'h84': 0},
{'h100': 3, 'h105': 0, 'h72': 1, 'h78': 0, 'h84': 0},
...
{'h100': 0, 'h105': 0, 'h78': 0, 'h84': 0, 'h72': 1},
{'h100': 0, 'h105': 0, 'h78': 0, 'h84': 0, 'h72': 0}]
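MaxSumConstraint accepts every combination whose weighted total is at most 439 (which is why the all-zero assignment appears in the output). If the top beam has to land at exactly 439", python-constraint's ExactSumConstraint can be used instead. Alternatively, as a dependency-free sketch, a short recursive enumeration works. The function name pallet_combinations is illustrative, and the height list below includes the 66" opening from the question:

```python
def pallet_combinations(heights, total):
    """Return every tuple of counts (one per height) whose weighted sum is exactly total."""
    results = []

    def search(i, remaining, counts):
        if i == len(heights):
            if remaining == 0:
                results.append(tuple(counts))
            return
        # Try every count of heights[i] that still fits in the remaining height
        for k in range(remaining // heights[i] + 1):
            counts.append(k)
            search(i + 1, remaining - k * heights[i], counts)
            counts.pop()

    search(0, total, [])
    return results


heights = [105, 100, 84, 78, 72, 66]
solutions = pallet_combinations(heights, 439)
print(f"number of solutions: {len(solutions)}")
for s in solutions:
    print(dict(zip(heights, s)))
```

Both examples from the question, (1) 105" + (1) 100" + (3) 78" and one each of 105", 100", 84", 78" and 72", appear among the results.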

Most common term in a vector - PARI/GP

I feel like I'm being really stupid here, as I would have thought there's already a simple command in PARI, or that it would be simple to write one, but I simply cannot figure this out.
Given a vector, say V, which will have duplicate entries, how can one determine what the most common entry is?
For example, say we have:
V = [ 0, 1, 2, 2, 3, 4, 6, 8, 8, 8 ]
I want something which would return the value 8.
I'm aware of things like vecsearch, but I can't see how that could be tweaked to make this work.
Very closely related to this, I want the result to be the most common non-zero entry, and some vectors I look at will have 0 as the most common entry. E.g., for V = [ 0, 0, 0, 0, 3, 3, 5 ], whatever I execute should return 3.
I tried writing up something which would remove all zero terms, but again struggled.
The thing I have tried in particular is:
rem( v ) = {
  my( c );
  while( c = vecsearch( v, 0 ); #c, v = vecextract( v, "^c" ) ); v
}
but vecextract doesn't seem to like this setup.
If you can ensure that all elements lie within some fixed range, then it is enough to do a counting sort with PARI/GP code like this:
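counts_for(v: t_VEC, lower: t_INT, upper: t_INT) = {
  my(counts = vector(1+upper-lower));
  \\ counting sort: tally each value at offset v[i]-lower
  for(i=1, #v, counts[1+v[i]-lower]++);
  \\ pair each value (shifted back by lower) with its count
  vector(#counts, i, [i-1+lower, counts[i]])
};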
V1 = [0, 1, 2, 2, 3, 4, 6, 8, 8, 8];
vecsort(counts_for(V1, 0, 8), [2], 4)[1][1]
> 8
V2 = [0, 0, 0, 0, 3, 3, 5];
vecsort(counts_for(V2, 0, 5), [2], 4)[1][1]
> 0
For convenience, you can also implement the following shortcut:
counts_for1(v: t_VEC) = {
  counts_for(v, vecmin(v), vecmax(v))
};
most_frequent(v: t_VEC) = {
  my(counts=counts_for1(v));
  vecsort(counts, [2], 4)[1][1]
};
most_frequent(V1)
> 8
most_frequent(V2)
> 0
The function matreduce provides this in a more general setting: applied to a vector of objects, it returns a 2-column matrix whose first column contains the distinct objects and the second their multiplicity in the vector. (The function has a more general form that takes the union of multisets.)
Because the rows of M are sorted by object rather than by multiplicity, pick the row whose multiplicity is largest:
most_frequent(v) = my(M = matreduce(v), i); vecmax(M[, 2], &i); M[i, 1];
most_frequent_non0(v) =
{ my(M = matreduce(v), [n] = matsize(M), c = M[, 2], i);
  \\ give entries equal to 0 a multiplicity of 0 so any non-zero entry beats them
  for (j = 1, n, if (M[j, 1] == 0, c[j] = 0));
  vecmax(c, &i); M[i, 1];
}
? most_frequent([ 0, 1, 2, 2, 3, 4, 6, 8, 8, 8 ])
%1 = 8
? most_frequent([x, x, Mod(1,3), [], [], []])
%2 = []
? most_frequent_non0([ 0, 0, 0, 0, 3, 3, 5 ])
%3 = 3
? most_frequent_non0([x, x, Mod(1,3), [], [], []])
%4 = x
The first function will error out if fed an empty vector, and the second one returns an entry equal to 0 if there are no non-zero entries. The second function tests for "0" using the == 0 test (and we famously have [] == 0 in GP); for a more rigorous semantic, use === 0 in that test instead.
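For comparison, the same task in Python is a one-liner around collections.Counter. This is a sketch outside PARI/GP; the skip_zero parameter is an illustrative addition covering the "most common non-zero entry" variant. Counter.most_common orders elements with equal counts by first appearance, so ties go to the value seen first:

```python
from collections import Counter


def most_frequent(v, skip_zero=False):
    """Most common entry of v; ties go to the value encountered first."""
    counts = Counter(x for x in v if not (skip_zero and x == 0))
    if not counts:
        raise ValueError("no entries to count")
    return counts.most_common(1)[0][0]


print(most_frequent([0, 1, 2, 2, 3, 4, 6, 8, 8, 8]))         # 8
print(most_frequent([0, 0, 0, 0, 3, 3, 5]))                  # 0
print(most_frequent([0, 0, 0, 0, 3, 3, 5], skip_zero=True))  # 3
```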

How can I get `True` from `[1, 1, 0, 0, 0] == [0, 0, 1, 1, 0]` in Python?

Example:
I have a solution list a:
a = [1, 1, 0, 0, 0]
and input lists bs:
b1 = [1, 1, 0, 0, 0]
b2 = [0, 1, 1, 0, 0]
b3 = [0, 0, 1, 1, 0]
...
bn = [1, 0, 0, 0, 1]
If I compare a to any of b1, b2, ..., bn, I expect to get True from the comparison. Of course, this simple expression will not work:
if a == b:
...
because in Python two lists are equal only if they contain the same elements in the same order.
Is there any elegant math that I can easily implement in a programming language? I am thinking about building some kind of hash function, but I'm not sure how.
Notes: 1) This can easily be implemented with a for loop, but I need something more robust. 2) This may also be related to the problem in this post: Cyclic group
A simple solution could be to normalize the a and b values by sorting them:
a_original = [5, 2, 3, 1, 4]
a_formatted = sorted(a_original)
Then, you can just use the formatted variables. A simple "for" loop can be used to format all of your variables.
Hope this helps!
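If "equal" is meant as "equal up to a cyclic shift" (every b in the question is a rotation of a, and the question mentions cyclic groups), sorting is a looser test: it also matches [1, 0, 1, 0, 0], which is not a rotation of a. One sketch, which also yields the hashable key the question asks about, uses the lexicographically smallest rotation as a canonical form. The function names are illustrative:

```python
def canonical_rotation(xs):
    """Lexicographically smallest rotation of xs, as a hashable tuple."""
    xs = tuple(xs)
    if not xs:
        return xs
    return min(xs[i:] + xs[:i] for i in range(len(xs)))


def same_up_to_rotation(a, b):
    return len(a) == len(b) and canonical_rotation(a) == canonical_rotation(b)


a = [1, 1, 0, 0, 0]
bs = [[1, 1, 0, 0, 0], [0, 1, 1, 0, 0], [0, 0, 1, 1, 0], [1, 0, 0, 0, 1]]
print([same_up_to_rotation(a, b) for b in bs])   # [True, True, True, True]
print(same_up_to_rotation(a, [1, 0, 1, 0, 0]))   # False (sorting alone would say True)
```

Because canonical_rotation returns a tuple, it can serve as a dict key or set member to group many inputs in one pass. This version is O(n²) per list, which is fine for short lists; Booth's algorithm finds the minimal rotation in linear time if that ever matters.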

Mathematica: part assignment

I'm trying to implement an algorithm to build a decision tree from a dataset.
I wrote a function that calculates the information gain between a subset and a particular partition. I then try all possible partitions and want to choose the "best" one, i.e. the one with the lowest entropy.
This procedure must be recursive: after the first iteration, it needs to work for every subset of the partition obtained in the previous step.
These are the data:
X = {{1, 0, 1, 1}, {1, 1, 1, 1}, {0, 1, 1, 1}, {1, 1, 1, 0}, {1, 1, 0, 0}}
Xfin[0]=X
This is the function: for every subset of the partition, it tries all possible partitions and calculates the IG. Then it selects the partition with IGMAX:
Partizioneottimale[X_, n_] :=
 For[l = 1, l <= Length[Flatten[X[n], n - 1]], l++,
  For[v = 1, v <= m, v++,
   If[IG[X[n][[l]], Partizione[X[n][[l]], v]] == IGMAX[X[n][[l]]],
    X[n + 1][[l]] := Partizione[X[n][[l]], v]]]]
then I call it:
Partizioneottimale[Xfin, 0]
and it works fine for the first one:
Xfin[1]
{{{1, 0, 1, 1}, {1, 1, 1, 1}, {0, 1, 1, 1}, {1, 1, 1, 0}}, {{1, 1, 0, 0}}}
That is the partition with lowest entropy.
But it doesn't work for the next ones:
Partizioneottimale[Xfin, 1]
SetDelayed::setps: Xfin[1+1] in the part assignment is not a symbol.
Does anybody have an idea how to solve this?
Thanks
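Without unraveling all your logic, a simple fix is this. Part assignment (x[[l]] = ... or x[[l]] := ...) only works when x is a symbol that holds a value; Xfin[n + 1] is an indexed expression (a definition attached to Xfin), not a symbol. So build the new level in a separate variable and assign it to X[n + 1] in one step: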
Partizioneottimale[X_, n_] := (
  xnp1 = Table[Null, {Length[Flatten[X[n], n - 1]]}];
  For[l = 1, l <= Length[Flatten[X[n], n - 1]], l++,
   For[v = 1, v <= m, v++,
    If[IG[X[n][[l]], Partizione[X[n][[l]], v]] == IGMAX[X[n][[l]]],
     xnp1[[l]] = Partizione[X[n][[l]], v]]]];
  X[n + 1] = xnp1; )
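The same pattern (build the whole next level first, then store it once instead of assigning into part of an indexed expression) can be sketched in Python. The question's IG, IGMAX and Partizione functions are not shown, so this sketch hypothetically assumes that the last column is the class label and that a partition means grouping rows by the value of one of the other columns, chosen to minimize the weighted entropy of the label. Unlike the Mathematica code, each level here is a flat list of subsets:

```python
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def split(rows, attr):
    """Group rows by the value in column attr (groups in order of first appearance)."""
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r)
    return list(groups.values())


def best_split(rows):
    """Split on the non-label column that minimizes the weighted label entropy."""
    def weighted_entropy(attr):
        return sum(len(g) / len(rows) * entropy([r[-1] for r in g])
                   for g in split(rows, attr))
    best = min(range(len(rows[0]) - 1), key=weighted_entropy)
    return split(rows, best)


X = [[1, 0, 1, 1], [1, 1, 1, 1], [0, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0]]
levels = {0: [X]}
for n in range(2):
    next_level = []  # build the new level completely before storing it
    for subset in levels[n]:
        if len({r[-1] for r in subset}) == 1:
            next_level.append(subset)  # pure subset: nothing left to split
        else:
            next_level.extend(best_split(subset))
    levels[n + 1] = next_level  # single assignment, analogous to X[n + 1] = xnp1

print(levels[1])
```

Under these assumptions, the first level reproduces the partition shown in the question: the four rows with 1 in the third column, and {1, 1, 0, 0} on its own.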
