I am trying to calculate velocity from acceleration in an R data frame, where the first velocity value is fixed at 0. I would like to use v = u + a*t to fill velocity[2:nrow(trial.data)], where t is a constant 0.002. The initial data frame looks like this:
trial.data <- data.table("acceleration" = sample(-5:5,5), "velocity" = c(0))
acceleration velocity
1 0 0
2 5 0
3 -1 0
4 3 0
5 4 0
I have tried using lag from the second row onwards; however, this leaves row 2 at zero, puts the correct value in row 3, and the following values are also incorrect.
trial.data$velocity[2:nrow(trial.data)] =
(lag(trial.data$velocity,default=0)) + trial.data$acceleration * 0.002
acceleration velocity
1 0 0.000
2 5 0.000
3 -1 0.010
4 3 -0.002
5 4 0.006
Velocity is accumulated acceleration, so use cumsum:
trial.data <- data.table("acceleration" = c(0,5,-1,3,4))
u <- 0 #starting velocity
velocity <- c(u,u+cumsum(trial.data$acceleration)*0.002)
trial.data$velocity <- velocity[-length(velocity)]
Output:
> trial.data
acceleration velocity
1: 0 0.000
2: 5 0.000
3: -1 0.010
4: 3 0.008
5: 4 0.014
Note that the velocity vector had a final element (which happens to be 0.022) that was dropped when reading it into the data table, since otherwise the columns would be of unequal length. The above code starts with u = 0, but u could be changed to any other starting velocity and the code would still work as intended.
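For example, a minimal sketch with a non-zero starting velocity (u = 2 here is just an illustrative value, not from the original question):
library(data.table)
trial.data <- data.table("acceleration" = c(0,5,-1,3,4))
u <- 2 # illustrative starting velocity
velocity <- c(u, u + cumsum(trial.data$acceleration) * 0.002)
trial.data$velocity <- velocity[-length(velocity)] # drop the surplus final element
The velocity column then starts at 2 and accumulates from there in the same way.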
Given a bunch of 2D points and a polygon, I want to evaluate which points are on the boundary of the polygon, and which are strictly inside/outside of the polygon.
The 2D points are:
> grp2
x2 y2
1 -5.233762 1.6213203
2 -1.107843 -7.9349705
3 4.918313 8.9073019
4 7.109651 -3.9571781
5 7.304966 -4.3280168
6 6.080564 -3.5817545
7 8.382685 0.4638735
8 6.812215 6.1610483
9 -4.773094 -3.4260797
10 -3.269638 1.1299852
and the vertices of the polygon are:
> dfC
px py
1 7.304966 -4.3280167
2 8.382685 0.4638735
3 6.812215 6.1610483
4 5.854366 7.5499780
5 2.385478 7.0895268
6 -5.233762 1.6213203
7 -4.773094 -3.4260797
8 -1.107843 -7.9349705
A plot of the situation (not shown here) makes it clear that there are 3 points inside the polygon, 1 point outside, and 6 points on the edge (as is also evident from the data points).
Now I am using point.in.polygon to estimate this. According to the documentation of package sp, this should return 'integer array; values are: 0: point is strictly exterior to pol; 1: point is strictly interior to pol; 2: point lies on the relative interior of an edge of pol; 3: point is a vertex of pol.'
But my code is not able to detect the points which are vertices of the polygon:
> point.in.polygon(grp2$x2,grp2$y2,dfC$px,dfC$py)
[1] 0 0 0 1 0 1 0 0 0 1
How can I resolve this problem?
The points are not equal. For example, grp2$x2[1] == -5.23376158438623 while dfC$px[6] == -5.23376157160271. As the comments suggest, you will have more luck if you round the values:
grp3 <- round(grp2, 3)
dfC3 <- round(dfC, 3)
point.in.polygon(grp3$x2,grp3$y2,dfC3$px,dfC3$py)
# [1] 3 3 0 1 3 1 3 3 3 1
Now the rounded values match:
grp3[1, ]
# x2 y2
# 1 -5.234 1.621
dfC3[6, ]
# px py
# 6 -5.234 1.621
Changing the number of decimals to 4 or 5 gives the same results as 3. For the floating point numbers to compare equal without rounding, they would have to match exactly over all 14 decimal places shown above.
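As an alternative to rounding, you could flag the vertex matches yourself with a tolerance-based comparison and only rely on point.in.polygon for the interior/exterior test. A minimal sketch, where the tolerance of 1e-6 is an arbitrary choice rather than anything prescribed by sp:
# TRUE for each point of grp2 that lies within tol of some vertex of dfC
tol <- 1e-6
on.vertex <- sapply(seq_len(nrow(grp2)), function(i) {
  any(abs(grp2$x2[i] - dfC$px) < tol & abs(grp2$y2[i] - dfC$py) < tol)
})
on.vertex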
I have a data frame and I want to generate a new column that holds the result of a calculation based on the row before. Additionally, the calculation has some conditions.
The data frame consists of energy production = p, energy consumption = c, energy from the grid = g, and energy stored in the battery = s.
My goal is to calculate the usage of a battery in a PV system. When the modules produce more than is needed, the battery gets charged, and otherwise it gets discharged. When the battery doesn't have enough energy, the grid delivers the remaining energy.
So in the first line the battery gets charged because I produce more than I need. In line 5 I need more energy than I produce, so the battery gets discharged, and so on.
One row is one hour, so row n+1 is based on the energy demand and supply of row n.
### Old:
n p c g
1 2 1 0
2 3 1 0
3 4 3 0
4 3 5 2
5 5 8 3
6 2 1 0
### New:
n p c g s
1 2 1 0 1
2 3 1 0 3
3 4 3 0 4
4 3 5 0 2
5 5 8 1 0
6 2 1 0 1
When I use your code the result is like this:
First column - c
Second column - p
Third column - g
Fourth column - s
The battery gets charged, but the discharge process does not match what is expected. The battery holds 2.3801 energy and the demand in n+1 is 0.875, so the result should be 2.3801 - 0.875 = 1.5051. This process should end when s = 0.
I don't understand why your code works for the rest of the data.
I found a solution here which works very well for my problem.
My battery is floored at 0 and capped at 16 kWh, so I just added the pmin call:
mutate(result = accumulate(production-consumw1, ~ pmin(16,pmax(0, .x + .y)), .init = 0)[-1])
Thanks for your help!
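For reference, a fuller self-contained sketch of that accumulate approach; the column names production and consumw1 come from the snippet above, while the sample numbers are simply the p and c values from the example tables:
library(dplyr)
library(purrr)
df <- data.frame(production = c(2, 3, 4, 3, 5, 2),
                 consumw1   = c(1, 1, 3, 5, 8, 1))
df <- df %>%
  mutate(
    # running battery state: add each hour's surplus/deficit,
    # floored at 0 (empty) and capped at 16 kWh (full)
    s = accumulate(production - consumw1,
                   ~ pmin(16, pmax(0, .x + .y)),
                   .init = 0)[-1]
  )
df
This reproduces the s column of the "New" table above (1, 3, 4, 2, 0, 1).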
I have a SAS dataset with 3 columns: a FirmIndex, a ProductIndex, and a third column called PrChange. Within each FirmIndex & ProductIndex group I want to count how many PrChange values are different from . and from 0, and add that count to a column called Number. Then I want to divide that column Number by the number of observations within each group which are not missing (.).
Below is an example of the dataset and the desired output.
data prod;
input firmindex productindex PrChange Number Fract;
cards;
1 1 . 1 0.250
1 1 0.00 1 0.250
1 1 0.00 1 0.250
1 1 -0.40 1 0.250
1 1 0.00 1 0.250
1 2 . 2 1.000
1 2 1.00 2 1.000
1 2 0.30 2 1.000
1 3 . 4 0.800
1 3 0.70 4 0.800
1 3 1.00 4 0.800
1 3 0.70 4 0.800
1 3 0.00 4 0.800
1 3 -0.30 4 0.800
1 4 . 5 1.000
1 4 0.20 5 1.000
1 4 -1.00 5 1.000
1 4 -0.90 5 1.000
1 4 -0.50 5 1.000
1 4 1.00 5 1.000
2 1 . 2 1.000
2 1 0.30 2 1.000
2 1 -0.50 2 1.000
2 2 . 5 0.714
2 2 0.30 5 0.714
2 2 0.10 5 0.714
2 2 0.00 5 0.714
2 2 0.00 5 0.714
2 2 0.80 5 0.714
2 2 -0.20 5 0.714
2 2 0.40 5 0.714
2 3 . 1 1.000
2 3 0.60 1 1.000
2 4 . 5 0.714
2 4 -1.00 5 0.714
2 4 0.80 5 0.714
2 4 -0.20 5 0.714
2 4 0.00 5 0.714
2 4 0.00 5 0.714
2 4 -0.70 5 0.714
2 4 0.90 5 0.714
2 5 . 3 1.000
2 5 0.90 3 1.000
2 5 -0.70 3 1.000
2 5 -0.50 3 1.000
;
run;
Here is what I tried to generate the column Number, but it is not working:
data work.prod;
set work.prod;
by firmindex productindex;
if first.productindex or first.firmindex then sum = 0;
else if PrChange ne 0 and PrChange ne .;
sum = sum + 1;
run;
Your problem here is that you need the number to divide by before you process the rows of data. This is where SAS is different from Excel: SAS is row-based, meaning it takes your code and runs it against each row of data (more or less) one at a time, rather than dynamically looking at every cell from every other cell the way Excel does. Much faster and more efficient, but less flexible for things like this.
Your particular question begs for a DoW loop. This takes over the normal data step loop and performs its own loop, twice: once to calculate the number/fract values, then once again to copy those onto every row of the BY group. Note I only check last.productindex; first/last flags are always set on the second BY variable whenever they are true for the first BY variable.
Here we do the first loop once for the first set of values (the first 5 records), then we re-loop through the same 5 records; then the same for the next 3, and so on. Each time, the two loops read the same number of rows, so they always stay in sync.
data want;
do _n_ = 1 by 1 until (last.productIndex);
set have;
by firmindex productindex;
number_denom = sum(number_Denom,not missing(PrChange));
number = sum(number, not (PrChange in (.,0)));
end;
fract = number/number_denom;
do _n_ = 1 by 1 until (last.productIndex);
set have;
by firmindex productindex;
output;
end;
run;
I'm going to give the best IML answer that I'm able to give; Rick or someone else more IML-savvy can probably do better. In R or another matrix language I think this would be much easier, but I don't have the IML chops to do it without looping; maybe it's possible.
proc iml;
use have;
read all var _all_ into h;
u = h[uniqueby(h,1:2), 1:2]; *generate the "unique" categories for the first two columns;
v = j(nrow(h),5); *generate a matrix to save this into;
v[,1:3] = h; *start it out with the first three columns of the dataset;
do i = 1 to nrow(u); *iterate over the unique category matrix;
number = ncol(loc(h[loc((h[,1:2] = u[i,1:2])[,#]),3]));
*the inner LOC produces a two column 1/0 matrix with match 1 / nomatch 0 for each col
then reduce to 1 column via subscript reduction product, to get correct 1/0 match vector
the outer LOC takes the rows of h from that (so rows of h matching u), then returns nonzero/nonmissing
which then ncol summarizes into a count;
fract_denom = ncol(loc(h[loc((h[,1:2] = u[i,1:2])[,#]),3] ^= .));
*similar, but here we have to verify they are not missing explicitly, considering 0 valid;
v[loc((v[,1:2] = u[i,1:2])[,#]),4] = number; *assign to col4 of V;
v[loc((v[,1:2] = u[i,1:2])[,#]),5] = number/fract_denom; *assign to col5 of V;
end;
print v;
quit;
This uses the unique-loc method, more or less, with some modifications; there is probably an easier way to get the matches.
A SQL in SAS solution - Parfait's is probably the better one overall, but SAS's willingness to remerge makes the SASsy solution a bit simpler.
proc sql;
create table want as
select firmindex, productindex, prchange,
sum (not (prchange in (0,.))) as number,
calculated number / (sum ( not missing(prchange))) as fract
from have
group by firmindex, productindex;
quit;
SAS will do the grouping/counting/etc. and then merge back to the original dataset with no problem, skipping the need for correlated subqueries. NOT standard SQL, but quite common in SAS nonetheless.
Consider proc sql using conditional CASE WHEN correlated subqueries:
proc sql;
create table ProdChangeCount as
SELECT p.firmindex, p.productindex,
(SELECT SUM(CASE WHEN sub.PrChange ^= . AND sub.PrChange ^= 0 THEN 1 ELSE 0 END)
FROM Prod sub
WHERE sub.firmindex = p.firmindex
AND sub.productindex = p.productindex) AS Number,
CALCULATED Number /
(SELECT Count(*)
FROM Prod sub
WHERE sub.PrChange ^= .
AND sub.firmindex = p.firmindex
AND sub.productindex = p.productindex) AS Frac
FROM Prod p;
quit;
The following datasheet is from an Excel file:
Part A B C D E F G H I J K L
XXX 0 1 1 2 0 1 2 3 1 2 1 0
YYY 0 1 2 2 0 30 1 1 0 1 10 0
....
So, I want to display those parts that contain outliers, using the logic
[median - t * MAD, median + t * MAD]
How can I code this in R, as a function, for a large amount of data?
You would want to calculate robust Z-scores based on median and MAD (median of absolute deviations) instead of non-robust standard mean and SD. Then assess your data using Z, with Z=0 meaning on median, Z=1 one MAD out, etc.
Let's assume we have the following data, where one set is outliers:
df <- rbind( data.frame(tag='normal', res=rnorm(1000)*2.71), data.frame(tag='outlier', res=rnorm(20)*42))
then Z it:
df$z <- with(df, (res - median(res))/mad(res))
that gives us something like this:
> head(df)
tag res z
1 normal -3.097 -1.0532
2 normal -0.650 -0.1890
3 normal 1.200 0.4645
4 normal 1.866 0.6996
5 normal -6.280 -2.1774
6 normal 1.682 0.6346
Then cut it into Z-bands, e.g.:
df$band <- cut(df$z, breaks=c(-99,-3,-1,1,3,99))
That can be analyzed in a straightforward way:
> addmargins(xtabs(~band+tag, df))
tag
band normal outlier Sum
(-99,-3] 1 9 10
(-3,-1] 137 0 137
(-1,1] 719 2 721
(1,3] 143 1 144
(3,99] 0 8 8
Sum 1000 20 1020
As can be seen, the points with the largest Z values (those in the (-99,-3] and (3,99] bands) are overwhelmingly the ones from the outlier group.
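If you want the literal interval test from the question, [median - t * MAD, median + t * MAD], applied per part across the wide Excel-style rows, a minimal sketch could look like the following. The data frame name parts, its layout (a Part column followed by numeric measurement columns), and t = 3 are assumptions for illustration. Note that R's mad() applies a consistency constant of 1.4826 by default; use mad(x, constant = 1) if you want the raw median absolute deviation.
t <- 3
has.outlier <- apply(parts[, -1], 1, function(x) {
  m <- median(x)
  d <- mad(x)
  any(x < m - t * d | x > m + t * d)
})
parts[has.outlier, ] # parts containing at least one outlier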
I have 2 data sets. The first data set has a vector of p-values from 0.5 to 0.001 and the corresponding threshold that meets each p-value. For example, for 0.05 the value is 13: any value greater than 13 has a p-value of < 0.05. This data set contains all the thresholds I'm interested in. Like so:
V1 V2
1 0.500 10
2 0.200 11
3 0.100 12
4 0.050 13
5 0.010 14
6 0.001 15
The 2nd data set is just one long list of values. I need to write an R script that counts the number of values in this set that exceed each threshold. For example, count how many values in the 2nd data set exceed 13 (and therefore have a p-value of < 0.05), and do this for each threshold value.
Here are the first 15 values of the 2nd data set (1000 total):
1 11.100816
2 8.779858
3 10.510090
4 9.503772
5 9.392222
6 10.285920
7 8.317523
8 10.007738
9 11.021283
10 9.964725
11 9.081947
12 11.253643
13 10.896120
14 10.272814
15 10.282408
A construct which will help you:
length( which( data$V1 > 3 & data$V2 <0.05 ) )
Assuming dat1 and dat2 both have a V2 column, something like this:
colSums(outer(dat2$V2, setNames(dat1$V2, dat1$V2), ">"))
# 10 11 12 13 14 15
# 9 3 0 0 0 0
(reads as follows: 9 items have a value greater than 10, 3 items have a value greater than 11, etc.)
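If you would rather attach those counts back onto the threshold table, an equivalent sapply-based sketch (still assuming both data sets have a V2 column, as above):
dat1$count <- sapply(dat1$V2, function(th) sum(dat2$V2 > th))
dat1
Each element of dat1$count is the number of values in dat2$V2 strictly greater than the corresponding threshold.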