Get the latest month rows within a year? - aggregate-functions

I want to get the latest data from a table in ABAP.
Here is an example from the table CKMLCR:
MANDT KALNR        BDATJ POPER UNTPER CURTP PEINH VPRSV STPRS PVPRS WAERS ...
100   000100000000 2020  007   000    10    1     S     1.00  0.00  JPY   ...
100   000100000000 2020  007   000    30    1     S     1.00  0.00  JPY   ...
100   000100000000 2020  007   000    31    1     S     1.00  0.00  JPY   ...
100   000100000000 2020  008   000    10    1     S     1.00  0.00  JPY   ...
100   000100000000 2020  008   000    30    1     S     1.00  0.00  JPY   ...
100   000100000000 2020  008   000    31    1     S     1.00  0.00  JPY   ...
100   000199999999 2020  007   000    10    1     S     20.00 0.00  EUR   ...
100   000199999999 2020  007   000    30    1     S     25.00 0.00  EUR   ...
100   000199999999 2020  007   000    31    1     S     20.00 0.00  EUR   ...
I want to get the latest data for each KALNR, so my output table should contain the following rows:
MANDT KALNR        BDATJ POPER UNTPER CURTP PEINH VPRSV STPRS PVPRS WAERS ...
100   000100000000 2020  008   000    10    1     S     1.00  0.00  JPY   ...
100   000100000000 2020  008   000    30    1     S     1.00  0.00  JPY   ...
100   000100000000 2020  008   000    31    1     S     1.00  0.00  JPY   ...
100   000199999999 2020  007   000    10    1     S     20.00 0.00  EUR   ...
100   000199999999 2020  007   000    30    1     S     25.00 0.00  EUR   ...
100   000199999999 2020  007   000    31    1     S     20.00 0.00  EUR   ...
My program should take the year as a selection parameter:
PARAMETERS: bdatj TYPE ckmlcr-bdatj DEFAULT sy-datum+0(4) OBLIGATORY.
and should use the highest period (POPER) for each cost estimate number (KALNR).
What is the easiest way to achieve this? Because there is a lot of data, it would be best to get the already filtered data directly from the SQL SELECT on the table.
This is the SQL statement that reads the data, without any filtering for the latest period yet:
SELECT * FROM ckmlcr INTO TABLE @DATA(ckmlcr_single)
  WHERE kalnr = @<ckmlcr_line>-kalnr
    AND bdatj = @bdatj.

Learn how to use subqueries; a correlated subquery does this in a single SELECT:
SELECT kalnr, bdatj, poper, untper, curtp, peinh, vprsv, stprs, pvprs, waers
  FROM ckmlcr AS cr
  INTO TABLE @DATA(ckmlcr_single)
  WHERE bdatj = @bdatj
    AND poper = ( SELECT MAX( poper ) FROM ckmlcr
                  WHERE kalnr = cr~kalnr AND bdatj = cr~bdatj ).
P.S. Get into the habit of listing the selected fields explicitly instead of using an asterisk; it will serve you well in the future.
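To make the semantics of the correlated subquery concrete, here is a minimal Python model of it, run over a reduced version of the sample rows from the question. This is an illustration only, not ABAP: the tuples keep just a few CKMLCR columns, and the function name latest_period_rows is hypothetical. POPER is kept as a zero-padded string, which compares correctly as long as all values have the same width, as NUMC values do.

```python
# Sample rows from the question, reduced to (kalnr, bdatj, poper, curtp, stprs, waers).
ROWS = [
    ("000100000000", 2020, "007", "10", "1.00", "JPY"),
    ("000100000000", 2020, "007", "30", "1.00", "JPY"),
    ("000100000000", 2020, "007", "31", "1.00", "JPY"),
    ("000100000000", 2020, "008", "10", "1.00", "JPY"),
    ("000100000000", 2020, "008", "30", "1.00", "JPY"),
    ("000100000000", 2020, "008", "31", "1.00", "JPY"),
    ("000199999999", 2020, "007", "10", "20.00", "EUR"),
    ("000199999999", 2020, "007", "30", "25.00", "EUR"),
    ("000199999999", 2020, "007", "31", "20.00", "EUR"),
]


def latest_period_rows(rows, year):
    """Model of:
    WHERE bdatj = year
      AND poper = ( SELECT MAX( poper ) FROM ckmlcr
                    WHERE kalnr = cr~kalnr AND bdatj = cr~bdatj )
    """
    result = []
    for row in rows:
        kalnr, bdatj, poper = row[0], row[1], row[2]
        if bdatj != year:
            continue  # outer WHERE bdatj = @bdatj
        # Correlated subquery: re-evaluated for each outer row with its kalnr/bdatj.
        max_poper = max(r[2] for r in rows if r[0] == kalnr and r[1] == bdatj)
        if poper == max_poper:
            result.append(row)
    return result


for row in latest_period_rows(ROWS, 2020):
    print(row)
```

The inner max(...) is recomputed for every outer row, which is what the subquery expresses logically; the database is free to evaluate it more efficiently.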

What worked for me is:
SELECT * FROM ckmlcr INTO TABLE @DATA(ckmlcr_single)
  WHERE kalnr = @<ckmlcr_line>-kalnr
    AND bdatj = @bdatj
    AND poper = ( SELECT MAX( poper ) FROM ckmlcr
                  WHERE kalnr = @<ckmlcr_line>-kalnr
                    AND bdatj = @bdatj ).
This solution only works for a single KALNR, not for the whole table as requested above.
This means my program currently looks like this:
" get each distinct kalnr for the year
SELECT DISTINCT kalnr FROM ckmlcr INTO TABLE @DATA(data_ckmlcr)
  WHERE bdatj = @bdatj
  ORDER BY kalnr.
LOOP AT data_ckmlcr ASSIGNING FIELD-SYMBOL(<ckmlcr_line>).
  " get the latest data for this kalnr
  SELECT * FROM ckmlcr INTO TABLE @DATA(ckmlcr_single)
    WHERE kalnr = @<ckmlcr_line>-kalnr
      AND bdatj = @bdatj
      AND poper = ( SELECT MAX( poper ) FROM ckmlcr
                    WHERE kalnr = @<ckmlcr_line>-kalnr
                      AND bdatj = @bdatj ).
  [...]
ENDLOOP.
Because of the SELECT inside the loop, this does not perform well...
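If the subquery is not an option, a common alternative to a SELECT inside a loop is to read all rows for the year with one SELECT and then keep the highest period per KALNR in memory. The Python sketch below shows that two-pass idea with a dictionary; the rows and the helper name keep_latest_period are hypothetical, and an ABAP version would use an internal table (for example a hashed table keyed by KALNR) instead.

```python
from collections import defaultdict

# Hypothetical rows as (kalnr, poper, curtp), already restricted to one year,
# i.e. the result of a single SELECT ... WHERE bdatj = @bdatj.
ROWS = [
    ("000100000000", "007", "10"),
    ("000100000000", "008", "10"),
    ("000100000000", "008", "30"),
    ("000199999999", "007", "10"),
    ("000199999999", "007", "30"),
]


def keep_latest_period(rows):
    # Pass 1: highest period per cost estimate number ("" sorts below any period).
    max_poper = defaultdict(str)
    for kalnr, poper, _ in rows:
        if poper > max_poper[kalnr]:
            max_poper[kalnr] = poper
    # Pass 2: keep only the rows that belong to that highest period.
    return [row for row in rows if row[1] == max_poper[row[0]]]


print(keep_latest_period(ROWS))
```

Each row is touched twice, instead of issuing one database round trip per KALNR.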

Related

Subsetting only positive values of specific column in a list

I have the following code to download option chain data and create a new list containing only the puts data (only_puts_list):
library(quantmod)
Symbols<-c ("AA","AAL","AAOI","ABBV","ABC","ABNB")
Options.20221111 <- lapply(Symbols, getOptionChain)
names(Options.20221111) <- Symbols
only_puts_list <- lapply(Options.20221111, function(x) x$puts)
Now I'd like to subset only_puts_list into a new list (new_list1) that keeps only the rows with a positive value in the ChgPct column.
I guess lapply should work, but how do I apply it so that only positive values of the ChgPct column are kept?
We could use subset while looping over the list with lapply:
new_list1 <- lapply(only_puts_list, subset, subset = ChgPct > 0)
If we check the output, most of the returned list elements have 0 rows, as there were no positive observations in 'ChgPct'. We could then use Filter to keep only the elements that have at least one row:
new_list1_sub <- Filter(nrow, new_list1)
Output:
new_list1_sub
$ABBV
ContractID ConractSize Currency Expiration Strike Last Chg ChgPct Bid Ask Vol OI LastTradeTime IV ITM
31 ABBV221202P00155000 REGULAR USD 2022-12-02 155.0 0.66 0.1100000 20.00000 0.56 0.66 70 480 2022-11-29 13:10:43 0.2690503 FALSE
32 ABBV221202P00157500 REGULAR USD 2022-12-02 157.5 1.49 0.2400000 19.20000 1.41 1.51 544 383 2022-11-29 13:17:43 0.2627027 FALSE
33 ABBV221202P00160000 REGULAR USD 2022-12-02 160.0 3.05 0.4300001 16.41222 2.79 2.99 34 308 2022-11-29 12:07:54 0.2692944 TRUE
34 ABBV221202P00162500 REGULAR USD 2022-12-02 162.5 4.95 1.6499999 50.00000 4.80 5.05 6 28 2022-11-29 13:26:10 0.3017648 TRUE
$ABC
ContractID ConractSize Currency Expiration Strike Last Chg ChgPct Bid Ask Vol OI LastTradeTime IV ITM
18 ABC221202P00165000 REGULAR USD 2022-12-02 165 1.05 0.1999999 23.5294 0.6 0.8 3 111 2022-11-29 09:51:47 0.2710034 FALSE
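The same subset-then-drop-empty pattern can be sketched in Python on toy rows (the values are illustrative, not real option data, and the "AA" rows are made up): the dictionary comprehension plays the role of lapply(..., subset, subset = ChgPct > 0), and the final filter mirrors Filter(nrow, ...), which keeps an element only if its row count is non-zero.

```python
# Hypothetical stand-in for only_puts_list: symbol -> list of rows (dicts).
only_puts = {
    "AA": [{"Strike": 40.0, "ChgPct": 0.0}, {"Strike": 42.5, "ChgPct": -3.1}],
    "ABBV": [{"Strike": 155.0, "ChgPct": 20.0}, {"Strike": 150.0, "ChgPct": -5.0}],
    "ABC": [{"Strike": 165.0, "ChgPct": 23.5294}],
}

# Like lapply(only_puts_list, subset, subset = ChgPct > 0): filter rows per element.
new_list1 = {sym: [r for r in rows if r["ChgPct"] > 0] for sym, rows in only_puts.items()}

# Like Filter(nrow, new_list1): drop elements that ended up with zero rows.
new_list1_sub = {sym: rows for sym, rows in new_list1.items() if rows}

print(sorted(new_list1_sub))
```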

gnuplot: plot multiple lines from single file and make title at end from column and key title manually

I'm quite new to gnuplot, so maybe my question has an obvious answer. Please excuse me if this is too noobish.
I have the following data:
20 500 1.0
30 500 0.95
40 500 0.85
50 500 0.7
60 500 0.5

20 1000 1.1
30 1000 1.05
40 1000 0.95
50 1000 0.8
60 1000 0.6

20 1500 1.2
30 1500 1.15
40 1500 1.05
50 1500 0.9
60 1500 0.7

20 2000 1.26
30 2000 1.22
40 2000 1.13
50 2000 0.99
60 2000 0.79

20 2500 1.33
30 2500 1.29
40 2500 1.21
50 2500 1.06
60 2500 0.88
Plotting this as a surface worked fine. Now I would like to plot this as 5 separate lines (using 1:3) and have the 2nd column as 'title at end' for each of the lines.
I tried
plot "demo.dat" using 1:3:2 with lines title columnhead(2) at end
but this only labels the last line, incorrectly, with 500 and ignores all the others. It also sets 500 as the title in the key box (which I would like to set to a different string). Is that possible, or do I have to split the blocks into several files as suggested in How to plot single/multiple lines depending on values in a column with GNUPlot?
You may try in two steps:
plot 'demo.dat' u 1:3 notitle with lines, \
'demo.dat' u 1:3:2 every ::4::4 notitle with labels
First, plot the data.
Then add a label at the last point of each block (each block has 5 points, numbered 0 to 4), or at the second-to-last point by replacing ::4::4 with ::3::3.
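To illustrate which points `every ::4::4` picks, here is a small Python sketch that splits the data at blank lines and takes the point with index 4 from each block, returning (x, y, label) as `using 1:3:2` would. It assumes, as the answer does, that the blocks are separated by single blank lines; split_blocks and every_point are hypothetical helpers, not part of gnuplot.

```python
DATA = """\
20 500 1.0
30 500 0.95
40 500 0.85
50 500 0.7
60 500 0.5

20 1000 1.1
30 1000 1.05
40 1000 0.95
50 1000 0.8
60 1000 0.6
"""


def split_blocks(lines):
    """Group whitespace-separated rows into blocks delimited by blank lines."""
    blocks, current = [], []
    for line in lines:
        if line.strip():
            current.append(line.split())
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def every_point(lines, index):
    """Point number `index` (0-based) of each block, as (col1, col3, col2)."""
    picked = []
    for block in split_blocks(lines):
        if index < len(block):
            x, label, y = block[index]
            picked.append((float(x), float(y), label))
    return picked


print(every_point(DATA.splitlines(), 4))
```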
Here is another, rather general solution. It requires neither a different data format nor splitting the data into several files.
Of course, it would be easier if the data were split into subblocks by two empty lines; however, the OP's data is separated only by single empty lines. How can this be handled without modifying the data outside gnuplot?
With the following solution you don't have to know in advance how many subblocks your data has or how many data points each subblock contains.
Furthermore:
different colors for the subblocks
value of column 2 as label at the end
some other text can be placed in the legend/key
Edit:
each subblock can have different number of datapoints
works with gnuplot>=4.6.0
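The core trick in the script that follows (remember the previous point, and emit it as a label when the block index changes) can be modelled in Python. This is only a sketch of the idea, assuming blocks are separated by blank lines; last_point_labels is a hypothetical name, and the final block has to be flushed separately, just as the script appends the last point after the first plot command.

```python
DATA = """\
10 500 1.05
20 500 1.0
60 500 0.5

20 1000 1.1
50 1000 0.8

20 1500 1.2
80 1500 0.30
"""


def last_point_labels(lines):
    """Return (x, y, label) for the last point of every blank-line-separated block."""
    labels = []
    prev = None        # (x, y, label) of the previous data line
    prev_block = None  # block number of the previous data line
    block = 0          # incremented at every blank line
    for line in lines:
        if not line.strip():
            block += 1
            continue
        x, label, y = line.split()
        if prev is not None and block != prev_block:
            labels.append(prev)  # block changed: the previous point ended a block
        prev, prev_block = (float(x), float(y), label), block
    if prev is not None:
        labels.append(prev)      # flush the final block
    return labels


print(last_point_labels(DATA.splitlines()))
```

Because the function only iterates over lines, it also accepts an open file object, and blocks of different lengths need no special handling.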
Data: SO32603146.dat
10 500 1.05
20 500 1.0
30 500 0.95
40 500 0.85
50 500 0.7
60 500 0.5

20 1000 1.1
30 1000 1.05
40 1000 0.95
50 1000 0.8

20 1500 1.2
30 1500 1.15
40 1500 1.05
50 1500 0.9
60 1500 0.7
70 1500 0.51
80 1500 0.30

20 2000 1.26
30 2000 1.22
40 2000 1.13
50 2000 0.99
60 2000 0.79
70 2000 0.61

20 2500 1.33
30 2500 1.29
40 2500 1.21
50 2500 1.06
60 2500 0.88
Script: (works with gnuplot>=4.6.0, March 2012)
### plotting some subblock data with label at the end
reset
FILE = "SO32603146.dat"
set rmargin 7
set key at graph 1.0,0.95 noautotitle
addPosLabel(colX,colY,colL) = (x0=x1,x1=column(colX), y0=y1,y1=column(colY), \
L0=L1,L1=strcol(colL), b0=b1,b1=column(-1),b0!=b1 ? \
myLabels = myLabels.sprintf(" %g %g %s",x0,y0,L0) : 0, column(colY))
x1 = y1 = b1 = NaN
L1 = myLabels = ''
PosX(i) = real(word(myLabels,int(i*3+3)))
PosY(i) = real(word(myLabels,int(i*3+4)))
Label(i) = word(myLabels,int(i*3+5))
plot FILE u 1:(addPosLabel(1,3,2)):-1 w lp pt 7 lc var, \
myLabels=myLabels.sprintf(" %g %g %s",x1,y1,L1), \
'+' u (x0=int($0*3+5),PosX($0)):(PosY($0)):(word(myLabels,x0)) \
every ::::(words(myLabels)-2)/3-1 w labels left offset 1,0 ti "Some other text\n for the legend"
### end of script
Result:

Format of time series data with regression variables

I am new to R and am attempting an analysis with the tslm() function.
Sample data in csv format:
UnitSales GDP GDPPerCap CPI PropInvIndex DispIncTopDecile TransCommSecDecile CivilVehOwn Urban AutoFin
2000 1 1198243.4 949 81.62 4984 10643 618 1609 36.22 0
2001 2 1324337.8 1042 81.38 6344 14219 782 1802 37.66 0
2002 3 1453827.558 1135 80.8 7790.9223 18995.9 991.2 2053.17 39.08978381 0
2003 4 1640958.735 1274 81.83 10153.8009 21837.3 1106 2382.93 40.53022975 0
2004 5 1931644.33 1409 85.02 13158.2516 25377.2 1274.2 2693.71 41.76000862 0
2005 6 2256902.591 1731 86.56 15909.2471 28773.1 1590.3 3160 42.98999663 0
2006 7 2712950.885 2069 87.83 19422.9174 31967.3 1801 3697.3531 44.34301016 0
2007 9 3494055.942 2651 92.02 25288.8373 36784.5 2467.7 4358.355 45.8892446 0.1
2008 11 4521827.271 3414 97.45 31203.1942 43613.8 2632.9 5099.6094 46.98950317 0.12
2009 13 4990233.519 3749 96.76 36241.808 46826.1 3181.9 6280.6086 48.34170101 0.14
2010 15 5930502.27 4433 100 48259.403 51431.6 3630.6 7801.8259 49.94966105 0.16
2011 18 7321891.955 5447 105.45 61796.8858 58841.9 3963 9356.3163 51.27027127 0.18
2012 21 8229490.03 6093 108.22 71803.7869 63824.2 4304.1 10933.0912 52.57008656 0.22
I load the data and then attempt to run:
testc <- tslm(UnitSales~GPD+trend, data=lm0015c)
A simple attempt to model UnitSales from the GDP variable plus the trend.
I get the following error:
Error in tslm(UnitSales ~ GPD + trend, data = lm0015c) :
Not time series data
How do I designate the data as a time series?
You can create a time series version of your data directly with the ts() function (here d is your data frame):
yourGreatData <- ts(d[,2:length(d)], start = 2000, end = 2012)
I discovered that an easier solution is simply to coerce your data frame to a time series:
testc <- tslm(UnitSales~GDP+trend, data=as.ts(lm0015c))

if statement and mutate

EMPLTOT_N FIRMTOT average min
12289593 4511051 5 1
26841282 1074459 55 10
15867437 81243 300 100
6060684 8761 750 500
52366969 8910 1000 1000
137003 47573 5 1
226987 10372 55 10
81011 507 300 100
23379 52 750 500
13698 42 1000 1000
67014 20397 5 1
My data looks like the data above. I want to create a new column emp with the mutate function such that:
emp = average * FIRMTOT if EMPLTOT_N / FIRMTOT < min
and emp = EMPLTOT_N if EMPLTOT_N / FIRMTOT > min
In your sample data EMPLTOT_N / FIRMTOT is less than min in only two rows (23379 / 52 and 13698 / 42), but this should work:
df <- read.table(text = "EMPLTOT_N FIRMTOT average min
12289593 4511051 5 1
26841282 1074459 55 10
15867437 81243 300 100
6060684 8761 750 500
52366969 8910 1000 1000
137003 47573 5 1
226987 10372 55 10
81011 507 300 100
23379 52 750 500
13698 42 1000 1000
67014 20397 5 1", header = TRUE)
library('dplyr')
mutate(df, emp = ifelse(EMPLTOT_N / FIRMTOT < min, average * FIRMTOT, EMPLTOT_N))
In the above, if EMPLTOT_N / FIRMTOT == min, emp gets the value of EMPLTOT_N, since you didn't specify what should happen in that case.
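As a cross-check, here is the same rule applied in plain Python to the sample rows from the question; emp is a hypothetical helper that mirrors the ifelse call. It shows which rows take the average * FIRMTOT branch.

```python
# Sample rows from the question: (EMPLTOT_N, FIRMTOT, average, min)
ROWS = [
    (12289593, 4511051, 5, 1),
    (26841282, 1074459, 55, 10),
    (15867437, 81243, 300, 100),
    (6060684, 8761, 750, 500),
    (52366969, 8910, 1000, 1000),
    (137003, 47573, 5, 1),
    (226987, 10372, 55, 10),
    (81011, 507, 300, 100),
    (23379, 52, 750, 500),
    (13698, 42, 1000, 1000),
    (67014, 20397, 5, 1),
]


def emp(empltot_n, firmtot, average, minimum):
    # Same rule as ifelse(EMPLTOT_N / FIRMTOT < min, average * FIRMTOT, EMPLTOT_N):
    # the "equal to min" case falls through to EMPLTOT_N.
    if empltot_n / firmtot < minimum:
        return average * firmtot
    return empltot_n


for row in ROWS:
    print(row, "->", emp(*row))
```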

Plotting different columns on the same file using boxes

I have a file that looks like this:
$ cat myfile.dat
1 8 32 19230 1.186 3.985
1 8 64 9620 0.600 7.877
1 8 128 4810 0.312 15.136
1 8 256 2410 0.226 20.927
1 8 512 1210 0.172 27.708
1 8 1024 610 0.135 35.582
1 8 2048 310 0.121 40.172
1 8 4096 160 0.117 43.141
1 8 8192 80 0.112 44.770
.....
2 8 16384 300 0.692 6.816
2 8 32768 150 0.686 6.877
2 8 65536 80 0.853 5.904
2 10 320 7830 1.041 4.575
2 10 640 3920 0.919 5.189
2 10 1280 1960 0.828 5.757
2 10 2560 980 0.773 6.167
2 10 5120 490 0.746 6.391
2 10 10240 250 0.748 6.507
2 10 20480 130 0.770 6.567
....
3 18 8192 10 1.311 12.759
3 20 32 650 1.631 3.978
3 20 64 330 0.838 7.863
3 20 128 170 0.483 14.046
3 20 256 90 0.508 14.160
3 20 512 50 0.559 14.283
3 20 1024 30 0.665 14.405
3 20 2048 20 0.865 14.782
3 20 4096 10 0.856 14.932
3 20 8192 10 1.704 14.998
As you can see, there are many ways of plotting this information, depending on which column we want as the x axis. One way I would like to plot it is the 6th column against the 1st:
p "myfile.dat" u 1:6
My main question is whether there is a way to plot these as solid boxes, since we are only interested in the peak value reached and not in the frequency or density of the points.
Gnuplot has the smooth option, which can be used e.g. as smooth frequency to sum all y values for the same x value. Unfortunately there is no smooth maximum, which you would need here, but you can emulate it with a bit of trickery in the using statement.
reset
xval = -1000
max(x, y) = (x > y ? x : y)
maxval = 0
colnum = 6
set boxwidth 0.2
plot 'myfile.dat' using (val = column(colnum), $1):\
(maxval_prev = (xval == $1 ? maxval : 0), \
maxval = (xval == $1 ? max(maxval, val) : val),\
xval = $1, \
(maxval > maxval_prev ? maxval-maxval_prev : 0)\
) \
smooth frequency lw 3 with boxes t 'maximum values'
Each using entry can consist of several assignments separated by commas; the value of the last expression is the one that is used.
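When a new x value appears, the variables are reset. This works because rows with the same x value are contiguous in the file (the using expressions are evaluated row by row in file order), and smooth frequency then sums the per-row increments for each x value.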
If the current value is bigger than the stored maximum, the difference between the current value and the stored maximum is added, so the increments for each x value add up to its maximum. In principle, this could cause numerical errors due to repeated adding and subtracting, but judging from your sample data and given the resolution of the plot, this shouldn't be a problem.
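The telescoping-sum trick can be checked with a short Python model: per row it emits the same increment as the using expression, and a dictionary stands in for smooth frequency's per-x summation. It assumes rows with the same x are contiguous, as in myfile.dat; the function name is hypothetical and the sample points are a subset of columns 1 and 6 from the question.

```python
from collections import defaultdict


def smooth_frequency_max(points):
    """Sum per-row increments per x so that each sum equals max(y) for that x."""
    xval = None
    maxval = 0.0
    sums = defaultdict(float)  # stands in for smooth frequency's per-x sum
    for x, val in points:
        # Same logic as the using expression: reset at a new x, else track the max.
        maxval_prev = maxval if x == xval else 0.0
        maxval = max(maxval, val) if x == xval else val
        xval = x
        sums[x] += maxval - maxval_prev if maxval > maxval_prev else 0.0
    return dict(sums)


points = [
    (1, 3.985), (1, 7.877), (1, 15.136), (1, 44.770), (1, 43.141),
    (2, 6.816), (2, 6.877), (2, 5.904), (2, 4.575),
    (3, 12.759), (3, 3.978), (3, 14.998),
]
print(smooth_frequency_max(points))
```

The sums match the true maxima only up to floating-point rounding, which is the numerical caveat mentioned above.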
The result for your data is:
You could search for the maximum and plot only that, but this is probably easier, even though it draws many boxes on top of one another:
plot "myfile.dat" using 1:6:(.1) with boxes fillstyle solid
