I am new to R and I am trying to fit a nonlinear correlation of the form below. I tried a script in R, but it is not working and returns the error message "singular gradient matrix at initial parameter estimates". Can someone please help me with the right R script to estimate the updated correlation coefficients from the new data set? The data set consists of three variables, X, Y, and Z, and I would like to estimate Z = f(X, Y).
Thank You
Equation to fit:
z = a + b*x + c*y + d*x^2 + e*y^2 + f*x*y + g*x^3 + h*y^3 + i*x*y^2 + j*x^2*y
a 0.065119008
b -0.002506607
c 0.004586821
d 3.73635E-05
e 8.41116E-07
f -1.7902E-05
g -1.28967E-07
h -1.04123E-10
i -2.40641E-09
j 4.42138E-08
X | Y | Z
_______ | _______ | _______
60 | 100 | 0.41994
60 | 200 | 0.79807
60 | 300 | 1.18778
60 | 400 | 1.58945
60 | 500 | 2.00336
60 | 600 | 2.42971
60 | 700 | 2.86858
60 | 800 | 3.31989
60 | 900 | 3.78335
60 | 1000 | 4.25842
60 | 1100 | 4.74429
60 | 1200 | 5.23983
60 | 1300 | 5.74359
60 | 1400 | 6.25381
60 | 1500 | 6.76844
60 | 1600 | 7.28523
60 | 1700 | 7.80179
60 | 1800 | 8.31574
60 | 1900 | 8.82475
60 | 2000 | 9.32668
80 | 100 | 0.40357
80 | 200 | 0.76552
80 | 300 | 1.13711
80 | 400 | 1.5185
80 | 500 | 1.90979
80 | 600 | 2.311
80 | 700 | 2.72205
80 | 800 | 3.14274
80 | 900 | 3.57269
80 | 1000 | 4.01141
80 | 1100 | 4.45817
80 | 1200 | 4.91207
80 | 1300 | 5.37202
80 | 1400 | 5.83674
80 | 1500 | 6.30477
80 | 1600 | 6.77453
80 | 1700 | 7.24438
80 | 1800 | 7.71262
80 | 1900 | 8.17761
80 | 2000 | 8.63777
100 | 100 | 0.38847
100 | 200 | 0.73573
100 | 300 | 1.09104
100 | 400 | 1.45447
100 | 500 | 1.82598
100 | 600 | 2.20551
100 | 700 | 2.59287
100 | 800 | 2.9878
100 | 900 | 3.38993
100 | 1000 | 3.79877
100 | 1100 | 4.21372
100 | 1200 | 4.63401
100 | 1300 | 5.0588
100 | 1400 | 5.48709
100 | 1500 | 5.91781
100 | 1600 | 6.3498
100 | 1700 | 6.78184
100 | 1800 | 7.21271
100 | 1900 | 7.64119
100 | 2000 | 8.06612
120 | 100 | 0.37451
120 | 200 | 0.70832
120 | 300 | 1.04892
120 | 400 | 1.39627
120 | 500 | 1.7503
120 | 600 | 2.11085
120 | 700 | 2.47771
120 | 800 | 2.85059
120 | 900 | 3.22913
120 | 1000 | 3.61287
120 | 1100 | 4.00129
120 | 1200 | 4.39376
120 | 1300 | 4.78958
120 | 1400 | 5.18797
120 | 1500 | 5.58809
120 | 1600 | 5.98905
120 | 1700 | 6.38994
120 | 1800 | 6.78981
120 | 1900 | 7.18777
120 | 2000 | 7.58291
140 | 100 | 0.36155
140 | 200 | 0.683
140 | 300 | 1.01021
140 | 400 | 1.34307
140 | 500 | 1.68148
140 | 600 | 2.02523
140 | 700 | 2.37411
140 | 800 | 2.72783
140 | 900 | 3.08602
140 | 1000 | 3.4483
140 | 1100 | 3.81418
140 | 1200 | 4.18314
140 | 1300 | 4.55459
140 | 1400 | 4.9279
140 | 1500 | 5.3024
140 | 1600 | 5.67739
140 | 1700 | 6.05216
140 | 1800 | 6.42596
140 | 1900 | 6.7981
140 | 2000 | 7.16787
160 | 100 | 0.34948
160 | 200 | 0.65953
160 | 300 | 0.97447
160 | 400 | 1.29419
160 | 500 | 1.61852
160 | 600 | 1.94728
160 | 700 | 2.28022
160 | 800 | 2.61706
160 | 900 | 2.95748
160 | 1000 | 3.3011
160 | 1100 | 3.64752
160 | 1200 | 3.99628
160 | 1300 | 4.34688
160 | 1400 | 4.6988
160 | 1500 | 5.05149
160 | 1600 | 5.40438
160 | 1700 | 5.7569
160 | 1800 | 6.10847
160 | 1900 | 6.4585
160 | 2000 | 6.80647
180 | 100 | 0.33822
180 | 200 | 0.6377
180 | 300 | 0.94137
180 | 400 | 1.24907
180 | 500 | 1.56064
180 | 600 | 1.87588
180 | 700 | 2.19455
180 | 800 | 2.51639
180 | 900 | 2.84109
180 | 1000 | 3.16833
180 | 1100 | 3.49772
180 | 1200 | 3.82888
180 | 1300 | 4.16138
180 | 1400 | 4.49478
180 | 1500 | 4.82863
180 | 1600 | 5.16245
180 | 1700 | 5.49577
180 | 1800 | 5.82812
180 | 1900 | 6.15903
180 | 2000 | 6.48806
200 | 100 | 0.32767
200 | 200 | 0.61734
200 | 300 | 0.91058
200 | 400 | 1.20725
200 | 500 | 1.50717
200 | 600 | 1.81015
200 | 700 | 2.11596
200 | 800 | 2.42434
200 | 900 | 2.73502
200 | 1000 | 3.04768
200 | 1100 | 3.36202
200 | 1200 | 3.67767
200 | 1300 | 3.99427
200 | 1400 | 4.31145
200 | 1500 | 4.62882
200 | 1600 | 4.94597
200 | 1700 | 5.26253
200 | 1800 | 5.57809
200 | 1900 | 5.89227
200 | 2000 | 6.2047
I'm not entirely sure what it is you would like to do, or why google was unsatisfactory, but maybe something along these lines will give you an idea:
x <- rep(c(60,80,100,160,200), each = 10)
y <- c(seq(from = 100, to = 2000, length.out = 25),seq(1800, 200, length.out = 25))
z <- rnorm(50, 6)
df <- data.frame(x,y,z)
mod <- lm(z ~ 1 + x + y + I(x^2) + I(y^2) + I(x*y) + I(x^3) + I(y^3) + I(x*y^2) + I(x^2*y), data = df)
summary(mod)
summary(mod)$adj
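If the goal is to re-estimate the coefficients a through j from the posted data, note that the model is linear in its coefficients, so lm() fits it directly and no nls() starting values (the source of the "singular gradient" error) are needed. A minimal sketch, assuming the table above has been read into a data frame dat with columns X, Y and Z (e.g. via read.table()):
fit <- lm(Z ~ X + Y + I(X^2) + I(Y^2) + I(X*Y) +
              I(X^3) + I(Y^3) + I(X*Y^2) + I(X^2*Y), data = dat)
# updated coefficients, named a through j in the order of the equation
coefs <- setNames(coef(fit), letters[1:10])
coefs
# goodness of fit and a quick check of fitted vs. observed Z
summary(fit)$adj.r.squared
head(cbind(observed = dat$Z, fitted = fitted(fit)))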
Thanks to @langtang, I was able to calculate the buy-and-hold return around the event date for each company (Calculating Buy and Hold return around event date per ID in R). But now I am facing a new problem.
Below is the data I currently have.
+----+------------+-------+------------+------------+----------------------+
| ID | Date | Price | EventDate | Market Cap | BuyAndHoldIndividual |
+----+------------+-------+------------+------------+----------------------+
| 1 | 2011-03-06 | 10 | NA | 109 | NA |
| 1 | 2011-03-07 | 9 | NA | 107 | -0.10000 |
| 1 | 2011-03-08 | 12 | NA | 109 | 0.20000 |
| 1 | 2011-03-09 | 14 | NA | 107 | 0.40000 |
| 1 | 2011-03-10 | 15 | NA | 101 | 0.50000 |
| 1 | 2011-03-11 | 17 | NA | 101 | 0.70000 |
| 1 | 2011-03-12 | 12 | 2011-03-12 | 110 | 0.20000 |
| 1 | 2011-03-13 | 14 | NA | 110 | 0.40000 |
| 1 | 2011-03-14 | 17 | NA | 100 | 0.70000 |
| 1 | 2011-03-15 | 14 | NA | 101 | 0.40000 |
| 1 | 2011-03-16 | 17 | NA | 107 | 0.70000 |
| 1 | 2011-03-17 | 16 | NA | 104 | 0.60000 |
| 1 | 2011-03-18 | 15 | NA | 104 | NA |
| 1 | 2011-03-19 | 16 | NA | 102 | 0.06667 |
| 1 | 2011-03-20 | 17 | NA | 107 | 0.13333 |
| 1 | 2011-03-21 | 18 | NA | 104 | 0.20000 |
| 1 | 2011-03-22 | 11 | NA | 105 | -0.26667 |
| 1 | 2011-03-23 | 15 | NA | 100 | 0.00000 |
| 1 | 2011-03-24 | 12 | 2011-03-24 | 110 | -0.20000 |
| 1 | 2011-03-25 | 13 | NA | 110 | -0.13333 |
| 1 | 2011-03-26 | 15 | NA | 107 | 0.00000 |
| 2 | 2011-03-12 | 48 | NA | 300 | NA |
| 2 | 2011-03-13 | 49 | NA | 300 | NA |
| 2 | 2011-03-14 | 50 | NA | 290 | NA |
| 2 | 2011-03-15 | 57 | NA | 296 | 0.14000 |
| 2 | 2011-03-16 | 60 | NA | 297 | 0.20000 |
| 2 | 2011-03-17 | 49 | NA | 296 | -0.02000 |
| 2 | 2011-03-18 | 64 | NA | 299 | 0.28000 |
| 2 | 2011-03-19 | 63 | NA | 292 | 0.26000 |
| 2 | 2011-03-20 | 67 | 2011-03-20 | 290 | 0.34000 |
| 2 | 2011-03-21 | 70 | NA | 299 | 0.40000 |
| 2 | 2011-03-22 | 58 | NA | 295 | 0.16000 |
| 2 | 2011-03-23 | 65 | NA | 290 | 0.30000 |
| 2 | 2011-03-24 | 57 | NA | 296 | 0.14000 |
| 2 | 2011-03-25 | 55 | NA | 299 | 0.10000 |
| 2 | 2011-03-26 | 57 | NA | 299 | NA |
| 2 | 2011-03-27 | 60 | NA | 300 | NA |
| 3 | 2011-03-18 | 5 | NA | 54 | NA |
| 3 | 2011-03-19 | 10 | NA | 50 | NA |
| 3 | 2011-03-20 | 7 | NA | 53 | NA |
| 3 | 2011-03-21 | 8 | NA | 53 | NA |
| 3 | 2011-03-22 | 7 | NA | 50 | NA |
| 3 | 2011-03-23 | 8 | NA | 51 | 0.14286 |
| 3 | 2011-03-24 | 7 | NA | 52 | 0.00000 |
| 3 | 2011-03-25 | 6 | NA | 55 | -0.14286 |
| 3 | 2011-03-26 | 9 | NA | 54 | 0.28571 |
| 3 | 2011-03-27 | 9 | NA | 55 | 0.28571 |
| 3 | 2011-03-28 | 9 | 2011-03-28 | 50 | 0.28571 |
| 3 | 2011-03-29 | 6 | NA | 52 | -0.14286 |
| 3 | 2011-03-30 | 6 | NA | 53 | -0.14286 |
| 3 | 2011-03-31 | 4 | NA | 50 | -0.42857 |
| 3 | 2011-04-01 | 5 | NA | 50 | -0.28571 |
| 3 | 2011-04-02 | 8 | NA | 55 | 0.00000 |
| 3 | 2011-04-03 | 9 | NA | 55 | NA |
+----+------------+-------+------------+------------+----------------------+
This time, I would like to add a new column called BuyAndHoldWeightedMarket, holding the market-cap-weighted average buy-and-hold return for the IDs active in the -5 to +5 day window around each event date. For example, for ID = 1, starting from 2011-03-19, BuyAndHoldWeightedMarket is the sum product of (each ID's price on day t / each ID's price on EventDate - 6, minus 1) and each ID's market cap on that day, divided by the sum of the market caps of those IDs on that day.
Please check the picture below for the details; the equations are listed for each of the colored blocks.
Please note that for the uppermost BuyAndHoldWeightedMarket block, IDs 2 and 3 are not involved because they begin later than 2011-03-06. For the third block (grey area), the weighted return only includes IDs 1 and 2 because ID 3 begins later than 2011-03-14. For the last block (mixed colors), the first four rows use all three IDs, the blue area uses only IDs 2 and 3 because ID 1 ends on 2011-03-26, and the yellow block uses only ID 3 because IDs 1 and 2 end before 2011-03-28.
Eventually, I would like to get a nice data table that looks as below.
+----+------------+-------+------------+------------+----------------------+--------------------------+
| ID | Date | Price | EventDate | Market Cap | BuyAndHoldIndividual | BuyAndHoldWeightedMarket |
+----+------------+-------+------------+------------+----------------------+--------------------------+
| 1 | 2011-03-06 | 10 | NA | 109 | NA | NA |
| 1 | 2011-03-07 | 9 | NA | 107 | -0.10000 | -0.10000 |
| 1 | 2011-03-08 | 12 | NA | 109 | 0.20000 | 0.20000 |
| 1 | 2011-03-09 | 14 | NA | 107 | 0.40000 | 0.40000 |
| 1 | 2011-03-10 | 15 | NA | 101 | 0.50000 | 0.50000 |
| 1 | 2011-03-11 | 17 | NA | 101 | 0.70000 | 0.70000 |
| 1 | 2011-03-12 | 12 | 2011-03-12 | 110 | 0.20000 | 0.20000 |
| 1 | 2011-03-13 | 14 | NA | 110 | 0.40000 | 0.40000 |
| 1 | 2011-03-14 | 17 | NA | 100 | 0.70000 | 0.70000 |
| 1 | 2011-03-15 | 14 | NA | 101 | 0.40000 | 0.40000 |
| 1 | 2011-03-16 | 17 | NA | 107 | 0.70000 | 0.70000 |
| 1 | 2011-03-17 | 16 | NA | 104 | 0.60000 | 0.60000 |
| 1 | 2011-03-18 | 15 | NA | 104 | NA | NA |
| 1 | 2011-03-19 | 16 | NA | 102 | 0.06667 | 0.11765 |
| 1 | 2011-03-20 | 17 | NA | 107 | 0.13333 | 0.10902 |
| 1 | 2011-03-21 | 18 | NA | 104 | 0.20000 | 0.17682 |
| 1 | 2011-03-22 | 11 | NA | 105 | -0.26667 | -0.07924 |
| 1 | 2011-03-23 | 15 | NA | 100 | 0.00000 | 0.07966 |
| 1 | 2011-03-24 | 12 | 2011-03-24 | 110 | -0.20000 | -0.07331 |
| 1 | 2011-03-25 | 13 | NA | 110 | -0.13333 | -0.09852 |
| 1 | 2011-03-26 | 15 | NA | 107 | 0.00000 | 0.02282 |
| 2 | 2011-03-12 | 48 | NA | 300 | NA | NA |
| 2 | 2011-03-13 | 49 | NA | 300 | NA | NA |
| 2 | 2011-03-14 | 50 | NA | 290 | NA | NA |
| 2 | 2011-03-15 | 57 | NA | 296 | 0.14000 | 0.059487331 |
| 2 | 2011-03-16 | 60 | NA | 297 | 0.20000 | 0.147029703 |
| 2 | 2011-03-17 | 49 | NA | 296 | -0.02000 | -0.030094118 |
| 2 | 2011-03-18 | 64 | NA | 299 | 0.28000 | 0.177381404 |
| 2 | 2011-03-19 | 63 | NA | 292 | 0.26000 | 0.177461929 |
| 2 | 2011-03-20 | 67 | 2011-03-20 | 290 | 0.34000 | 0.24836272 |
| 2 | 2011-03-21 | 70 | NA | 299 | 0.40000 | 0.311954459 |
| 2 | 2011-03-22 | 58 | NA | 295 | 0.16000 | 0.025352941 |
| 2 | 2011-03-23 | 65 | NA | 290 | 0.30000 | 0.192911011 |
| 2 | 2011-03-24 | 57 | NA | 296 | 0.14000 | 0.022381918 |
| 2 | 2011-03-25 | 55 | NA | 299 | 0.10000 | 0.009823098 |
| 2 | 2011-03-26 | 57 | NA | 299 | NA | NA |
| 2 | 2011-03-27 | 60 | NA | 300 | NA | NA |
| 3 | 2011-03-18 | 5 | NA | 54 | NA | NA |
| 3 | 2011-03-19 | 10 | NA | 50 | NA | NA |
| 3 | 2011-03-20 | 7 | NA | 53 | NA | NA |
| 3 | 2011-03-21 | 8 | NA | 53 | NA | NA |
| 3 | 2011-03-22 | 7 | NA | 50 | NA | NA |
| 3 | 2011-03-23 | 8 | NA | 51 | 0.14286 | 0.178343199 |
| 3 | 2011-03-24 | 7 | NA | 52 | 0.00000 | 0.010691161 |
| 3 | 2011-03-25 | 6 | NA | 55 | -0.14286 | -0.007160905 |
| 3 | 2011-03-26 | 9 | NA | 54 | 0.28571 | 0.106918456 |
| 3 | 2011-03-27 | 9 | NA | 55 | 0.28571 | 0.073405953 |
| 3 | 2011-03-28 | 9 | 2011-03-28 | 50 | 0.28571 | 0.285714286 |
| 3 | 2011-03-29 | 6 | NA | 52 | -0.14286 | -0.142857143 |
| 3 | 2011-03-30 | 6 | NA | 53 | -0.14286 | -0.142857143 |
| 3 | 2011-03-31 | 4 | NA | 50 | -0.42857 | -0.428571429 |
| 3 | 2011-04-01 | 5 | NA | 50 | -0.28571 | -0.285714286 |
| 3 | 2011-04-02 | 8 | NA | 55 | 0.00000 | 0.142857143 |
| 3 | 2011-04-03 | 9 | NA | 55 | NA | NA |
+----+------------+-------+------------+------------+----------------------+--------------------------+
So far I have tried the following code, with the help of the previous question, but I am having a hard time figuring out how to calculate the weighted buy-and-hold return when the event window starts on a different date for each ID.
#choose rows with no NA in event date and only show ID and event date
events = unique(df[!is.na(EventDate),.(ID,EventDate)])
#helper column
#:= is defined for use in j only. It adds or updates or removes column(s) by reference.
#It makes no copies of any part of memory at all.
events[, eDate:=EventDate]
#make temporary lower and upper window-boundary columns
df[, `:=`(s=Date-6, e=Date+6)]
#non-equi match
bhr = events[df, on=.(ID, EventDate>=s, EventDate<=e), nomatch=0]
#Generate the BuyHoldReturn column, by ID and EventDate
bhr2 = bhr[, .(Date, BuyHoldReturnM1 = c(NA, (Price[-1]/Price[1] - 1) * MarketCap[-1])), by = .(ID, eDate)]
#merge back to get the full data
bhr3 = bhr2[df, on = .(ID, Date), .(ID, Date, Price, EventDate = i.EventDate, BuyHoldReturnM1)]
I would be grateful if you could help.
Thank you very much in advance!
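For what it is worth, the weighting step itself reduces to a market-cap-weighted mean of the individual returns across the IDs active on each calendar day. A rough data.table sketch of only that step, assuming a hypothetical helper table bhr that already holds one row per active ID and Date inside the event window, with columns Date, ID, Ret (the individual buy-and-hold return) and MarketCap; the non-equi join attempted above would still be needed to build that window:
library(data.table)
# cap-weighted mean of the individual returns per calendar day
weighted <- bhr[!is.na(Ret),
                .(BuyAndHoldWeightedMarket = sum(Ret * MarketCap) / sum(MarketCap)),
                by = Date]
# attach the daily weighted figure back onto the per-ID rows
out <- merge(bhr, weighted, by = "Date", all.x = TRUE)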
I have a table in a MariaDB 10.3.27 database that looks like this:
+----+------------+---------------+-----------------+
| id | channel_id | timestamp | value |
+----+------------+---------------+-----------------+
| 1 | 2 | 1623669600000 | 2882.4449252449 |
| 2 | 1 | 1623669600000 | 295.46914369742 |
| 3 | 2 | 1623669630000 | 2874.46365243 |
| 4 | 1 | 1623669630000 | 295.68124546516 |
| 5 | 2 | 1623669660000 | 2874.9638893452 |
| 6 | 1 | 1623669660000 | 295.69561247521 |
| 7 | 2 | 1623669690000 | 2878.7120274678 |
and I want to have a result like this:
+------+-------+-------+
| hour | valhh | valwp |
+------+-------+-------+
| 0 | 419 | 115 |
| 1 | 419 | 115 |
| 2 | 419 | 115 |
| 3 | 419 | 115 |
| 4 | 419 | 115 |
| 5 | 419 | 115 |
| 6 | 419 | 115 |
| 7 | 419 | 115 |
| 8 | 419 | 115 |
| 9 | 419 | 115 |
| 10 | 419 | 115 |
| 11 | 419 | 115 |
| 12 | 419 | 115 |
| 13 | 419 | 115 |
| 14 | 419 | 115 |
| 15 | 419 | 115 |
| 16 | 419 | 115 |
| 17 | 419 | 115 |
| 18 | 419 | 115 |
| 19 | 419 | 115 |
| 20 | 419 | 115 |
| 21 | 419 | 115 |
| 22 | 419 | 115 |
| 23 | 419 | 115 |
+------+-------+-------+
but with valhh (valwp) being, for each hour of the day, the average across all days of the values where channel_id is 1 (2), rather than the overall average. So far, I've tried:
select h.hour, hh.valhh, wp.valwp from
(select hour(from_unixtime(timestamp/1000)) as hour from data) h,
(select hour(from_unixtime(timestamp/1000)) as hour, cast(avg(value) as integer) as valhh from data where channel_id = 1) hh,
(select hour(from_unixtime(timestamp/1000)) as hour, cast(avg(value) as integer) as valwp from data where channel_id = 2) wp group by h.hour;
which gives the table layout above, but with the overall average of each channel in every row (419 and 115) instead of the per-hour averages.
I can get what I want by querying the channels separately, i.e.:
select hour(from_unixtime(timestamp/1000)) as hour, cast(avg(value) as integer) as value from data where channel_id = 1 group by hour;
gives
+------+-------+
| hour | value |
+------+-------+
| 0 | 326 |
| 1 | 145 |
| 2 | 411 |
| 3 | 142 |
| 4 | 143 |
| 5 | 171 |
| 6 | 160 |
| 7 | 487 |
| 8 | 408 |
| 9 | 186 |
| 10 | 214 |
| 11 | 199 |
| 12 | 942 |
| 13 | 521 |
| 14 | 196 |
| 15 | 247 |
| 16 | 364 |
| 17 | 252 |
| 18 | 392 |
| 19 | 916 |
| 20 | 1024 |
| 21 | 1524 |
| 22 | 561 |
| 23 | 249 |
+------+-------+
but I want to have both channels in one result set as separate columns.
How would I do that?
Thanks!
After a steep learning curve I think I figured it out:
select
hh.hour, hh.valuehh, wp.valuewp
from
(select
hour(from_unixtime(timestamp/1000)) as hour,
cast(avg(value) as integer) as valuehh
from data
where channel_id=1
group by hour) hh
inner join
(select
hour(from_unixtime(timestamp/1000)) as hour,
cast(avg(value) as integer) as valuewp
from data
where channel_id=2
group by hour) wp
on hh.hour = wp.hour;
gives
+------+---------+---------+
| hour | valuehh | valuewp |
+------+---------+---------+
| 0 | 300 | 38 |
| 1 | 162 | 275 |
| 2 | 338 | 668 |
| 3 | 166 | 38 |
| 4 | 152 | 38 |
| 5 | 176 | 37 |
| 6 | 174 | 38 |
| 7 | 488 | 36 |
| 8 | 553 | 37 |
| 9 | 198 | 36 |
| 10 | 214 | 38 |
| 11 | 199 | 612 |
| 12 | 942 | 40 |
| 13 | 521 | 99 |
| 14 | 187 | 38 |
| 15 | 209 | 38 |
| 16 | 287 | 39 |
| 17 | 667 | 37 |
| 18 | 615 | 39 |
| 19 | 854 | 199 |
| 20 | 1074 | 44 |
| 21 | 1470 | 178 |
| 22 | 665 | 37 |
| 23 | 235 | 38 |
+------+---------+---------+
I have a sample table which looks somewhat like this:
| Date | Vendor_Id | Requisitioner | Amount |
|------------|:---------:|--------------:|--------|
| 1/17/2019 | 98 | John | 2405 |
| 4/30/2019 | 1320 | Dave | 1420 |
| 11/29/2018 | 3887 | Michele | 596 |
| 11/29/2018 | 3887 | Michele | 960 |
| 11/29/2018 | 3887 | Michele | 1158 |
| 9/21/2018 | 4919 | James | 857 |
| 10/25/2018 | 4919 | Paul | 1162 |
| 10/26/2018 | 4919 | Echo | 726 |
| 10/26/2018 | 4919 | Echo | 726 |
| 10/29/2018 | 4919 | Andrew | 532 |
| 10/29/2018 | 4919 | Andrew | 532 |
| 11/12/2018 | 4919 | Carlos | 954 |
| 5/21/2018 | 2111 | June | 3580 |
| 5/23/2018 | 7420 | Justin | 224 |
| 5/24/2018 | 1187 | Sylvia | 3442 |
| 5/25/2018 | 1187 | Sylvia | 4167 |
| 5/30/2018 | 3456 | Ama | 4580 |
For each requisitioner and vendor id, I need to find the difference in days between consecutive dates, so that the result looks like this:
| Date | Vendor_Id | Requisitioner | Amount | Date_Diff |
|------------|:---------:|--------------:|--------|-----------|
| 1/17/2019 | 98 | John | 2405 | NA |
| 4/30/2019 | 1320 | Dave | 1420 | 103 |
| 11/29/2018 | 3887 | Michele | 596 | NA |
| 11/29/2018 | 3887 | Michele | 960 | 0 |
| 11/29/2018 | 3887 | Michele | 1158 | 0 |
| 9/21/2018 | 4919 | James | 857 | NA |
| 10/25/2018 | 4919 | Paul | 1162 | NA |
| 10/26/2018 | 4919 | Paul | 726 | 1 |
| 10/26/2018 | 4919 | Paul | 726 | 0 |
| 10/29/2018 | 4919 | Paul | 532 | 3 |
| 10/29/2018 | 4919 | Paul | 532 | 0 |
| 11/12/2018 | 4917 | Carlos | 954 | NA |
| 5/21/2018 | 2111 | Justin | 3580 | NA |
| 5/23/2018 | 7420 | Justin | 224 | 2 |
| 5/24/2018 | 1187 | Sylvia | 3442 | NA |
| 5/25/2018 | 1187 | Sylvia | 4167 | 1 |
| 5/30/2018 | 3456 | Ama | 4580 | NA |
Now, within each requisitioner and vendor id, I need to create a subset of the rows where the date difference is <= 3 days and the sum of the amounts is > 5000. The final output should look like this:
| Date | Vendor_Id | Requisitioner | Amount | Date_Diff |
|-----------|:---------:|--------------:|--------|-----------|
| 5/24/2018 | 1187 | Sylvia | 3442 | NA |
| 5/25/2018 | 1187 | Sylvia | 4167 | 1 |
Initially, when I tried working with date difference, I used the following code:
df = df %>% mutate(diffdate = difftime(Date, lag(Date, 1)))
However, the differences don't make sense: they are huge numbers such as 86400. I first ran the code when the 'Date' field was of type POSIXct; after converting it to the 'Date' class, the differences were still the same huge numbers.
Also, is it possible to group the date differences based on requisitioners and vendor id's as mentioned in the 2nd table above?
EDIT:
I'm coming across a new challenge now. In the problem set, I need to filter out the values whose date differences are less than 3 days. Let us assume that the table with date difference appears something like this:
| MasterCalendarDate | Vendor_Id | Requisitioner | Amount | diffdate |
|--------------------|:---------:|--------------:|--------|----------|
| 1/17/2019 | 98 | John | 2405 | #N/A |
| 4/30/2019 | 1320 | Dave | 1420 | 103 |
| 11/29/2018 | 3887 | Michele | 596 | #N/A |
| 11/29/2018 | 3887 | Michele | 960 | 0 |
| 11/29/2018 | 3887 | Michele | 1158 | 0 |
| 9/21/2018 | 4919 | Paul | 857 | #N/A |
| 10/25/2018 | 4919 | Paul | 1162 | 34 |
| 10/26/2018 | 4919 | Paul | 726 | 1 |
| 10/26/2018 | 4919 | Paul | 726 | 0 |
When we look at the requisitioner 'Paul', the date diff between 9/21/2018 and 10/25/2018 is 34 and between that of 10/25/2018 and 10/26/2018 is 1 day. However, when I filter the data for date difference <=3 days, I miss out on 10/25/2018 because of 34 days difference. I have multiple such occurrences. How can I fix it?
I think you need to convert your date variable using as.Date(), then you can compute the lagged time difference using difftime().
# create toy data frame
df <- data.frame(date=as.Date(paste(sample(2018:2019,100,T),
sample(1:12,100,T),
sample(1:28,100,T),sep = '-')),
req=sample(letters[1:10],100,T),
amount=sample(100:10000,100,T))
# compute lagged time difference in days -- diff output is numeric
df %>% arrange(req,date) %>% group_by(req) %>%
mutate(diff=as.numeric(difftime(date,lag(date),units='days')))
# as above plus filtering based on time difference and amount
df %>% arrange(req,date) %>% group_by(req) %>%
mutate(diff=as.numeric(difftime(date,lag(date),units='days'))) %>%
filter(diff<10 | is.na(diff), amount>5000)
# A tibble: 8 x 4
# Groups: req [7]
date req amount diff
<date> <fct> <int> <dbl>
1 2018-05-13 a 9062 NA
2 2019-05-07 b 9946 2
3 2018-02-03 e 5697 NA
4 2018-03-12 g 7093 NA
5 2019-05-16 g 5631 3
6 2018-03-06 h 7114 6
7 2018-08-12 i 5151 6
8 2018-04-03 j 7738 8
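Regarding the EDIT: filtering on the lagged difference alone drops the first row of each run, which is why 10/25/2018 is lost. One possible sketch, reusing the toy df from above (a real data set would also group by Vendor_Id): form run ids with cumsum() so that consecutive rows no more than 3 days apart share an id, then keep whole runs of two or more rows whose total amount exceeds 5000.
library(dplyr)
df %>%
  arrange(req, date) %>%
  group_by(req) %>%
  mutate(diff    = as.numeric(difftime(date, lag(date), units = 'days')),
         cluster = cumsum(is.na(diff) | diff > 3)) %>%  # new run whenever the gap exceeds 3 days
  group_by(req, cluster) %>%
  filter(sum(amount) > 5000, n() > 1) %>%               # keep runs of 2+ rows totalling over 5000
  ungroup()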
I am trying to find the symbol with the smallest difference, but I don't know what to do after finding the difference in order to compare them.
I have this set:
+------+------+-------------+-------------+--------------------+------+--------+
| clid | cust | Min | Max | Difference | Qty | symbol |
+------+------+-------------+-------------+--------------------+------+--------+
| 102 | C6 | 11.8 | 12.72 | 0.9199999999999999 | 1500 | GE |
| 110 | C3 | 44 | 48.099998 | 4.099997999999999 | 2000 | INTC |
| 115 | C4 | 1755.25 | 1889.650024 | 134.40002400000003 | 2000 | AMZN |
| 121 | C9 | 28.25 | 30.27 | 2.0199999999999996 | 1500 | BAC |
| 130 | C7 | 8.48753 | 9.096588 | 0.609058000000001 | 5000 | F |
| 175 | C3 | 6.41 | 7.71 | 1.2999999999999998 | 1500 | SBS |
| 204 | C5 | 6.41 | 7.56 | 1.1499999999999995 | 5000 | SBS |
| 208 | C2 | 1782.170044 | 2004.359985 | 222.1899410000001 | 5000 | AMZN |
| 224 | C10 | 153.350006 | 162.429993 | 9.079986999999988 | 1500 | FB |
| 269 | C6 | 355.980011 | 392.299988 | 36.319976999999994 | 2000 | BA |
+------+------+-------------+-------------+--------------------+------+--------+
So far I have this query:
select d.clid,
d.cust,
MIN(f.fillPx) as Min,
MAX(f.fillPx) as Max,
MAX(f.fillPx)-MIN(f.fillPx) as Difference,
d.Qty,
d.symbol
from orders d
inner join mp f on d.clid=f.clid
group by f.clid
having SUM(f.fillQty) < d.Qty
order by d.clid;
What am I missing so that I can compare the min and max and get the symbol with the smallest difference?
mp table:
+------+------+--------+------+------+---------+-------------+--------+
| clid | cust | symbol | side | oQty | fillQty | fillPx | execid |
+------+------+--------+------+------+---------+-------------+--------+
| 123 | C2 | SBS | SELL | 5000 | 273 | 7.37 | 1 |
| 157 | C9 | C | SELL | 1500 | 167 | 69.709999 | 2 |
| 254 | C9 | GE | SELL | 5000 | 440 | 13.28 | 3 |
| 208 | C2 | AMZN | SELL | 5000 | 714 | 1864.420044 | 4 |
| 102 | C6 | GE | SELL | 1500 | 136 | 12.32 | 5 |
| 160 | C7 | INTC | SELL | 1500 | 267 | 44.5 | 6 |
| 145 | C10 | GE | SELL | 5000 | 330 | 13.28 | 7 |
| 208 | C2 | AMZN | SELL | 5000 | 1190 | 1788.609985 | 8 |
| 161 | C1 | C | SELL | 1500 | 135 | 72.620003 | 9 |
| 181 | C5 | FCX | BUY | 1500 | 84 | 12.721739 | 10 |
orders table:
+------+------+--------+------+------+
| cust | side | symbol | qty | clid |
+------+------+--------+------+------+
| C1 | SELL | C | 1500 | 161 |
| C9 | SELL | INTC | 2000 | 231 |
| C10 | SELL | BMY | 1500 | 215 |
| C1 | BUY | SBS | 2000 | 243 |
| C4 | BUY | AMZN | 2000 | 226 |
| C10 | BUY | C | 1500 | 211 |
If you want one symbol, you can use order by and limit:
select d.clid,
d.cust,
MIN(f.fillPx) as Min,
MAX(f.fillPx) as Max,
MAX(f.fillPx)-MIN(f.fillPx) as Difference,
d.Qty,
d.symbol
from orders d join
mp f
on d.clid = f.clid
group by d.clid, d.cust, d.Qty, d.symbol
having SUM(f.fillQty) < d.Qty
order by difference
limit 1;
Notice that I added the rest of the unaggregated columns to the group by.