I have a report which has two groups. Group B always has only two values. I want to show the difference between the totals of Item Type 01 and Item Type 02 in the Group B footer (total of Type 01 minus total of Type 02).
Please help me achieve this. I tried a few formulas, but none of them works for me.
                      Month01  Month02
Group A
  Group B
    Item Type 01
      ab                 10       10
      ac                 20       30
      ad                 30       30
      **Total**          60       70
    Item Type 02
      ab                 10       20
      ac                 10       15
      ad                 20        5
      **Total**          40       30
    **Difference         20       40**
I want something like this
NumberVar sum01 := 0;
Numbervar sum02 := 0;
GroupName ({DataTable1.IncomeType}) = Type 01
Then
sum01 := Sum ({DataTable1.Month01}, {DataTable1.IncomeType})
if
GroupName ({DataTable1.IncomeType}) = Type 02
Then
sum02 := Sum ({DataTable1.Month01}, {DataTable1.IncomeType})
sum01 - sum02
I know this isn't correct. I used it to explain my question for you as much as possible.
I really appreciate your guidance.
You can do this using arrays.
Use two arrays to store the Item Type subtotals for Month1 and Month2, then retrieve them in the Group B footer and take the difference.
Create a formula @Month1Array and place it in the Item Type group footer, after the Month1 summary:
Shared Numbervar array x;
x:=x+sum(Month1,Item GRoup);
1;
Create a formula @Month2Array and place it in the Item Type group footer, after the Month2 summary:
Shared Numbervar array y;
y:=y+sum(Month2,Item GRoup);
1;
Now, in the footer where you want to see the difference, add the formulas below.
Create a formula @Month1:
Shared Numbervar array x;
x[1]-x[2]
Create a formula @Month2:
Shared Numbervar array y;
y[1]-y[2]
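The bookkeeping behind the array approach can be sketched outside Crystal Reports. The Python below is only an illustration of the logic, not Crystal syntax: the list names are made up, and the subtotals are hard-coded from the "Total" rows of the question's example layout.

```python
# Subtotals as they would be collected in each Item Type group footer.
# Values are the "Total" rows from the question's example layout.
month1_totals = []  # plays the role of the shared array x
month2_totals = []  # plays the role of the shared array y

for type_total_m1, type_total_m2 in [(60, 70), (40, 30)]:
    month1_totals.append(type_total_m1)
    month2_totals.append(type_total_m2)

# Group B footer: Type 01 total minus Type 02 total, per month.
# (Python lists are 0-based; Crystal arrays are 1-based.)
diff_month1 = month1_totals[0] - month1_totals[1]
diff_month2 = month2_totals[0] - month2_totals[1]

print(diff_month1, diff_month2)

# The lists must be emptied before the next Group B starts, otherwise
# entries 0 and 1 keep pointing at the first group's totals.
month1_totals.clear()
month2_totals.clear()
```

The last step matters in the report as well: if the arrays are never reset, every later Group B footer would keep reading the first group's two subtotals.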
Working on an economic optimization problem with pyomo, I would like to add a constraint that prevents the product of the commodity quantity and its price from going below zero (< 0), i.e. to avoid negative revenue. All the data are in a dataframe, and I can't set up a constraint like:
def positive_revenue(model, t):
    return model.P * model.C >=0
model.positive_rev = Constraint(model.T, rule=positive_revenue)
The system returns the error that the price is a scalar and it cannot process it. Indeed the price is set as such in the model:
model.T = Set(doc='quarter of year', initialize=df.quarter.tolist(), ordered=True)
model.P = Param(initialize=df.price.tolist(), doc='Price for each quarter')
##while the commodity is:
model.C = Var(model.T, domain=NonNegativeReals)
I just would like to apply that for each timestep (quarter of hour here) that:
price(t) * model.C(t) >=0
Can someone help me to spot the issue ? Thanks
Here is more information:
df dataframe:
         time_stamp           price  Status  imbalance
quarter
0        2021-01-01 00:00:00  64.84  Final   16
1        2021-01-01 00:15:00  13.96  Final   38
2        2021-01-01 00:30:00  12.40  Final   46
index = quarter from 0 till 35049, so it is ok
Here is the df.info()
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   time_stamp  35040 non-null  datetime64[ns]
 1   price       35040 non-null  float64
 2   Status      35040 non-null  object
 3   imbalance   35040 non-null  int64
I changed to_list() to to_dict() in model.T, but I am still facing the same issue:
KeyError: "Cannot treat the scalar component 'P' as an indexed component", raised at the point where the model's sets, parameters and variables are defined.
Here is the constraint where the system issues the error:
def revenue_positive(model,t):
    for t in model.T:
        return (model.C[t] * model.P[t]) >= 0
model.positive_revenue = Constraint(model.T,rule=revenue_positive)
Can't figure it out...any idea ?
UPDATE
The model works after dropping an unfortunate 'quarter' column somewhere, after I renamed the index as quarter.
It runs, but I still get negative revenues, so the constraint does not seem to be working at present. Here is how it is written:
def revenue_positive(model,t):
    for t in model.T:
        return (model.C[t] * model.P[t]) >= 0
model.positive_revenue = Constraint(model.T,rule=revenue_positive)
What am I missing here ? Thanks for help, just beginning
Welcome to the site.
The problem you appear to be having is that you are not building your model parameter model.P as an indexed component. I believe you want it to be indexed by your set model.T.
When you make an indexed param in pyomo, you need to initialize it with some key:value pairing, like a Python dictionary. You can make that from your data frame by re-indexing it so that the quarter labels are the index values.
Caution: the construction you have for model.T and this approach both assume there are no duplicates in the quarter names.
If you have duplicates (or get a warning), then you'll need to do something else. If the quarter labels are unique, you can do this:
import pandas as pd
import pyomo.environ as pyo
df = pd.DataFrame({'qtr':['Q5', 'Q6', 'Q7'], 'price':[12.80, 11.50, 8.12]})
df.set_index('qtr', inplace=True)
print(df)
m = pyo.ConcreteModel()
m.T = pyo.Set(initialize=df.index.to_list())
m.price = pyo.Param(m.T, initialize=df['price'].to_dict())
m.pprint()
which should get you:
     price
qtr
Q5   12.80
Q6   11.50
Q7    8.12
1 Set Declarations
    T : Size=1, Index=None, Ordered=Insertion
        Key  : Dimen : Domain : Size : Members
        None :     1 :    Any :    3 : {'Q5', 'Q6', 'Q7'}
1 Param Declarations
    price : Size=3, Index=T, Domain=Any, Default=None, Mutable=False
        Key : Value
         Q5 :  12.8
         Q6 :  11.5
         Q7 :  8.12
2 Declarations: T price
edit for clarity...
NOTE:
The first argument when you create a pyomo parameter is the indexing set. If it is not provided, pyomo assumes the parameter is a scalar. You are missing the set, as shown in my example and highlighted with the arrow here: :)
                    |
                    |
                    |
                    V
m.price = pyo.Param(m.T, initialize=df['price'].to_dict())
Also note, you will need to initialize model.P with a dictionary as I have in the example, not a list.
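To illustrate the key:value point without requiring pyomo, here is a plain-Python sketch. The functions `rule_per_index` and `rule_with_loop` are hypothetical stand-ins for a constraint rule that gets called once per index, and the three prices are made up, with one negative on purpose. It also shows why a `return` inside a `for` loop, as in the rule from the question's update, only ever looks at the first index.

```python
# Hypothetical data: quarter label -> price, the shape df['price'].to_dict() gives.
quarters = ['Q5', 'Q6', 'Q7']
price = dict(zip(quarters, [12.80, -3.50, 8.12]))
quantity = {q: 1.0 for q in quarters}  # stand-in for the variable values

def rule_per_index(t):
    # Uses the index it was called with: one check per quarter.
    return quantity[t] * price[t] >= 0

def rule_with_loop(t):
    # The loop rebinds t, and `return` exits on the first pass,
    # so every call checks only the first quarter.
    for t in quarters:
        return quantity[t] * price[t] >= 0

per_index = {t: rule_per_index(t) for t in quarters}
with_loop = {t: rule_with_loop(t) for t in quarters}
print(per_index)
print(with_loop)
```

In this sketch the per-index version flags the quarter with the negative price, while the looped version reports the same answer for every quarter because it never gets past the first one.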
I am trying to run a loop where I count the total in each file under the variable _merge, and then count certain outcomes of _merge, such as _merge=1 and so on. I then want to calculate percentages by dividing each instance of _merge by the total under _merge.
Below is my code:
/*define local list*/
local ward_names B C D E FN FS GS HE
/*loop for each dbase*/
foreach file of local ward_names {
use "../../../cleaning/sra/output/`file'_ward_CTS_Merged.dta", clear
count if _merge
local ward_count=r(N)
count if _merge==1
local count_master=r(N)
count if _merge==2
local count_using=r(N)
count if _merge==3
local count_match=r(N)
clear
set obs 1
g ward_count='ward_count'
g count_master=`count_master'
g count_using=`count_using'
g count_match=`count_match'
g ward= "`file'"
save "../temp/`file'_collapsed_diagnostics.dta", replace
clear
The code was running fine until I tried to add the total count for each ward file:
g ward_count='ward_count'
'ward_count' invalid name
Is this a syntax error or something more severe?
You need to open a local macro reference with a backtick (`) rather than an apostrophe ('); the closing quote stays an apostrophe:
generate ward_count = `ward_count'
EDIT:
As per @NickCox's recommendation, you can improve your code by using the tabulate command with its matcell() option to get the counts all at once:
tabulate _merge, matcell(A)
                 _merge |      Freq.     Percent        Cum.
------------------------+-----------------------------------
        master only (1) |          1       16.67       16.67
            matched (3) |          5       83.33      100.00
------------------------+-----------------------------------
                  Total |          6      100.00
matrix list A
A[2,1]
    c1
r1   1
r2   5
So you could then do the following:
generate count_master = A[1,1]
generate count_match = A[2,1]
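The percentages the question asks for are each count divided by the total. A minimal sketch of that arithmetic in Python (not Stata), using the six observations from the tabulate output above; the `merge_codes` list is an illustrative reconstruction of that data:

```python
from collections import Counter

# One master-only (1) and five matched (3) observations, as in the table.
merge_codes = [1, 3, 3, 3, 3, 3]

counts = Counter(merge_codes)
total = sum(counts.values())

# Percentage of each _merge code relative to the ward total.
percentages = {code: round(100 * n / total, 2) for code, n in counts.items()}
print(total, percentages)
```

The rounded results line up with the Percent column that tabulate already prints.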
I am examining prescription patterns within a large EHR dataset. The data is structured so that we are given several key bits of information, such as patient_num, encounter_num, ordering_date, medication, age_event (age at event) etc. Example below:
Patient_num  enc_num  ordering_date  medication     age_event
1111         888888   07NOV2008      Wellbutrin     48
1111         876578   11MAY2011      Bupropion      50
2222         999999   08DEC2009      Amitriptyline  32
2222         999999   08DEC2009      Escitalopram   32
3333         656463   12APR2007      Imipramine     44
3333         643211   21DEC2008      Zoloft         45
3333         543213   02FEB2009      Fluoxetine     45
Currently I have the dataset sorted by patient_id then by ordering_date so that I can see what each individual was prescribed during their encounters in a longitudinal fashion. For now, I am most concerned with the prescription(s) that were made during their first visit. I wrote some code to count the number of prescriptions and had originally restricted later analyses to RX = 1, but as we can see, that doesn't work for people with multiple scripts on the same encounter (Patient 2222).
data pt_meds_;
set pt_meds;
by patient_num;
if first.patient_num then RX = 1;
else RX + 1;
run;
Patient_num  enc_num  ordering_date  medication     age_event  RX
1111         888888   07NOV2008      Wellbutrin     48         1
1111         876578   11MAY2011      Bupropion      50         2
2222         999999   08DEC2009      Amitriptyline  32         1
2222         999999   08DEC2009      Escitalopram   32         2
3333         656463   12APR2007      Imipramine     44         1
3333         643211   21DEC2008      Zoloft         45         2
3333         543213   02FEB2009      Fluoxetine     45         3
I think it would be more appropriate to recode the encounter numbers into a new variable in a style similar to the RX variable: each encounter is numbered 1-n, and the number repeats if multiple scripts are made in the same encounter, as below:
Patient_num  enc_num  ordering_date  medication     age_event  RX  Enc_
1111         888888   07NOV2008      Wellbutrin     48         1   1
1111         876578   11MAY2011      Bupropion      50         2   2
2222         999999   08DEC2009      Amitriptyline  32         1   1
2222         999999   08DEC2009      Escitalopram   32         2   1
3333         656463   12APR2007      Imipramine     44         1   1
3333         643211   21DEC2008      Zoloft         45         2   2
3333         543213   02FEB2009      Fluoxetine     45         3   3
From what I have seen, this could be possible with a variant of the above code using 2 BY groups (patient_num & enc_num), but I can't seem to get it. I think the first. / last. codes require sorting, but if I am to sort by enc_num, they won't be in chronological order because the encounter numbers are generated by the system and depend on all other encounters going in at that time.
I tried the following code (using ordering_date instead, because it's already sorted properly), but everything under Enc_ comes out as 1. I'm sure my logic is all wrong. Any thoughts?
data pt_meds_test;
set pt_meds_;
by patient_num ordering_date;
if first.patient_num;
if first.ordering_date then enc_ = 1;
else enc_ + 1;
run;
First
FIRST./LAST. flags don't require sorting if the data is already properly ordered or if you use NOTSORTED in your BY statement. If the BY variable is not properly ordered, the BY statement raises an error and the step stops executing at the first out-of-order observation. Like this:
data class;
set sashelp.class;
by age;
first = first.age;
last = last.age;
run;
ERROR: BY variables are not properly sorted on data set SASHELP.CLASS.
Name=Alfred Sex=M Age=14 Height=69 Weight=112.5 FIRST.Age=1 LAST.Age=1 first=. last=. _ERROR_=1 _N_=1
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 2 observations read from the data set SASHELP.CLASS.
Try this code to see exactly how the FIRST./LAST. flags work:
data pt_meds_test;
set pt_meds_;
by patient_num ordering_date;
fp = first.patient_num;
lp = last.patient_num;
fo = first.ordering_date;
lo = last.ordering_date;
run;
Second
Those conditions work differently than you think:
if expression;
If the expression is true, execution continues with the statements after the if.
Otherwise SAS returns to the beginning of the data step without the implicit output, so the observation is not written to the output data set. In your step, if first.patient_num; therefore drops every observation except the first one for each patient, which is why enc_ is always 1.
In most cases an if without then is equivalent to where. However:
where works faster, but it is limited to variables that come from the data set you are reading
if can be used with any type of expression, including calculated fields
More info: IF Statement, Subsetting
Third
I think the lag() function can be your answer.
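data pt_meds_test;
set pt_meds_;
by patient_num;
retain enc_;
/* call lag() on every observation so its queue stays in step with the data */
prev_patient_num = lag(patient_num);
prev_ordering_date = lag(ordering_date);
/* restart the counter at each new patient */
if first.patient_num then enc_ = 1;
/* same patient, new ordering date: next encounter */
else if patient_num = prev_patient_num and ordering_date ne prev_ordering_date then enc_ + 1;
run;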
With the lag() function you can look at the value a variable had on the previous observation and compare it with the current one.
But be careful: lag() doesn't actually look up the value from the previous observation. It takes the current value of the variable and stores it in a FIFO queue of size 1; on the next call it retrieves the stored value from the queue and puts the new value there. If lag() is called conditionally, it therefore returns the value from the last time it was called, not from the previous observation.
More info: LAG Function
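The counting rule in that data step (restart at a new patient, increment when the ordering date changes) can be checked against the question's sample rows with a short Python sketch. This is only an illustration of the logic, not SAS; the function name `number_encounters` and the ISO-formatted dates are assumptions made for the example.

```python
# Rows are (patient_num, ordering_date, medication), already sorted
# by patient and then by date, as in the question.
rows = [
    (1111, "2008-11-07", "Wellbutrin"),
    (1111, "2011-05-11", "Bupropion"),
    (2222, "2009-12-08", "Amitriptyline"),
    (2222, "2009-12-08", "Escitalopram"),
    (3333, "2007-04-12", "Imipramine"),
    (3333, "2008-12-21", "Zoloft"),
    (3333, "2009-02-02", "Fluoxetine"),
]

def number_encounters(rows):
    """Return one encounter index per row (1-n within each patient)."""
    result = []
    prev_patient = prev_date = None
    enc = 0
    for patient, date, _ in rows:
        if patient != prev_patient:
            enc = 1      # first row of a new patient
        elif date != prev_date:
            enc += 1     # same patient, new ordering date
        result.append(enc)
        prev_patient, prev_date = patient, date
    return result

enc = number_encounters(rows)
print(enc)
```

Note that, like the SAS version, this treats two prescriptions on the same date as one encounter, which is the stand-in the question chose for enc_num.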
I'm not sure if this hurts the rest of your analysis, but what about just
proc freq data=pt_meds noprint;
tables patient_num*ordering_date / out=pt_meds_freq;
run;
data pt_meds_freq2;
set pt_meds_freq;
by patient_num ordering_date;
if first.patient_num;
run;
I'm using a hash table to store some values. Here are the details:
There will be roughly 1M items to store (not known before, so no perfect-hash possible).
Table is 10M large.
Hash function is MurMurHash3.
I did some tests and storing 1M values I get 350,000 collisions and 30 elements at the most-colliding hash table's slot.
Are these results good?
Would it make sense to implement binary search for the lists that get created at colliding hash-table slots?
What's your advice to improve performance?
EDIT: Here is my code
var
HashList: array [0..10000000 - 1] of Integer;
for I := 0 to High(HashList) do
HashList[I] := 0;
for I := 1 to 1000000 do
begin
Y := MurmurHash3(UIntToStr(I));
Y := Y mod Length(HashList);
Inc(HashList[Y]);
if HashList[Y] > 1 then
Inc(TotalCollisionsCount);
if HashList[Y] > MostCollidingSlotItemCount then
MostCollidingSlotItemCount := HashList[Y];
end;
Writeln('Total: ' + IntToStr(TotalCollisionsCount) + ' Max: ' + IntToStr(MostCollidingSlotItemCount));
Here is the result I get:
Total: 48169 Max: 5
Am I missing something?
This is what you get when you put 1M items randomly into 10M cells
calendar_size=10000000 nperson = 1000000
E/cell|  Ncell   |   frac   |  Nelem   |   frac   |h/cell|  hops   | Cumhops
------+----------+----------+----------+----------+------+---------+--------
    0:   9048262  (0.904826)         0  (0.000000)      0         0        0
    1:    905064  (0.090506)    905064  (0.905064)      1    905064   905064
    2:     45136  (0.004514)     90272  (0.090272)      3    135408  1040472
    3:      1488  (0.000149)      4464  (0.004464)      6      8928  1049400
    4:        50  (0.000005)       200  (0.000200)     10       500  1049900
------+----------+----------+----------+----------+------+---------+--------
    5:  10000000               1000000               1.049900         1049900
The left column is the number of items in a cell; the second is the number of cells holding that many items. The last row holds the totals: 10M cells, 1M items, and on average about 1.05 hops to find a stored item.
WRT the binary search: for chains this short (maximum chain length 4 here, and most chains have length 1), linear search outperforms binary search. The crossover point is probably somewhere between 10 and 100 elements.
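The shape of that table can be cross-checked analytically. The sketch below assumes the hash spreads keys uniformly and uses the Poisson approximation for the number of items per cell; both are assumptions of this example, not something the simulation above relies on.

```python
import math

cells = 10_000_000
items = 1_000_000
lam = items / cells  # average items per cell (load factor)

# Expected number of cells holding exactly k items (Poisson approximation).
expected_cells = {
    k: cells * math.exp(-lam) * lam**k / math.factorial(k)
    for k in range(6)
}

# An insert "collides" when its cell is already occupied, so the expected
# collision count is items minus the number of non-empty cells.
occupied = cells - expected_cells[0]
expected_collisions = items - occupied

for k, n in expected_cells.items():
    print(k, round(n))
print("collisions:", round(expected_collisions))
```

Under these assumptions the expected collision count comes out near 48,000, the same order as the "Total: 48169" reported by the test program in the question, and the per-cell counts fall off as quickly as in the simulated table.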
I have a matrix as follows
> y
1 2 3
1 0.8802216 1.2277843 0.6875047
2 0.9381081 1.3189847 0.2046542
3 1.3245534 0.8221709 0.4630722
4 1.2006974 0.8890464 0.6710844
5 1.2344071 0.8354292 0.7259998
6 1.1670665 0.9214787 0.6826173
7 0.9670581 1.1070461 0.7742342
8 0.8867365 1.2160533 0.7024281
9 0.8235792 1.4424190 0.2030302
10 0.8821301 1.0541099 1.2279813
11 1.1958634 0.9708839 0.4297043
12 1.3542734 0.7747481 0.5119648
13 0.4385487 0.3588158 4.9167998
14 0.8530141 1.3578511 0.3698620
15 0.9651803 0.8426226 1.6132899
16 0.8854192 1.2272616 0.6715839
17 0.7779642 0.8132233 2.3386331
18 0.9936722 1.1629110 0.5083558
19 1.1235897 1.0018480 0.5764672
20 0.7887222 1.3101684 0.7373181
21 2.2276176 0.0000000 0.0000000
I found one clue, but it only gives a single position within the whole matrix:
n<-read.table(file.choose(),header=T)
y<-n[,c("1","2","3")]
my.number=1.12270420185886
z<-abs(y-my.number)==min(abs(y-my.number))
which(z)
[1] 19
I want to find at least the 5 closest values, with their row and column numbers too; in other words, I want the 5 closest single values from the matrix together with their positions.
I don't know what language it is; is it R?
In a procedural language, I would save all values to a map (val, (pos)) = (val (row, col); example (0.880..-> (1, 1)), then sort by value.
Then iterate over i<-pos (1 to map.size-5), and get the diff (pos (i), pos (i+5)), search for the minimum (diff), get the values and their position then.
Here is a solution in Scala:
val matrix = """1 0.8802216 1.2277843 0.6875047
2 0.9381081 1.3189847 0.2046542
3 1.3245534 0.8221709 0.4630722
4 1.2006974 0.8890464 0.6710844
5 1.2344071 0.8354292 0.7259998
6 1.1670665 0.9214787 0.6826173
7 0.9670581 1.1070461 0.7742342
8 0.8867365 1.2160533 0.7024281
9 0.8235792 1.4424190 0.2030302
10 0.8821301 1.0541099 1.2279813
11 1.1958634 0.9708839 0.4297043
12 1.3542734 0.7747481 0.5119648
13 0.4385487 0.3588158 4.9167998
14 0.8530141 1.3578511 0.3698620
15 0.9651803 0.8426226 1.6132899
16 0.8854192 1.2272616 0.6715839
17 0.7779642 0.8132233 2.3386331
18 0.9936722 1.1629110 0.5083558
19 1.1235897 1.0018480 0.5764672
20 0.7887222 1.3101684 0.7373181
21 2.2276176 0.0000000 0.0000000"""
// split block of text into lines
val lines=matrix.split ("\n")
// split lines into words (one or more spaces as separator)
val rows = lines.map (l => l.split (" +"))
// remove the index from the beginning (1, 2, ... 21) and
// transform values from Strings to double numbers
// triples is: Array(Array(0.8802216, 1.2277843, 0.6875047), Array(0.9381081, 1.3189847, 0.2046542),
val triples = rows.map (_.tail).map(triple=> triple.map (_.toDouble))
// generate an own index for the rows and columns
// elems is: elems: Array[Array[(Double, (Int, Int))]] = Array(Array((0.8802216,(0,0)), (1.2277843,(0,1)), (0.6875047,(0,2))), Array((0.9381081,(1,0)), ...
val elems = triples.zipWithIndex.map {t=> t._1.zipWithIndex.map (vc=> (vc._1 -> (t._2, vc._2)))}
// flatten into one array of (value, (row, col)) pairs and sort by value
// sorted = Array((0.0,(20,1)), (0.0,(20,2)), (0.2030302,(8,2)), (0.2046542,(1,2)),
val sorted = elems.flatten.sortBy (e => e._1)
// delta5 = List(0.3588158, 0.369862, 0.2266741, 0.2338945, 0.10425639999999997, 0.1384938,
val delta5 = sorted.sliding (5, 1).map (q => q(4)._1-q(0)._1).toList
val minindex = delta5.indexOf (delta5.min) // minindex: Int = 29, delta5.min = 0.008824799999999966
// we found the smallest interval of 5 values beginning at 29:
(minindex until minindex + 5).map (sorted (_))
res568: scala.collection.immutable.IndexedSeq[(Double, (Int, Int))] =
Vector( (0.8802216,(0,0)),
(0.8821301,(9,0)),
(0.8854192,(15,0)),
(0.8867365,(7,0)),
(0.8890464,(3,1)))
Since Scala counts rows from 0 to 20 and columns from 0 to 2, whereas your indexes run from 1 to 21 and 1 to 3 respectively, you have to add (1,1) to each of the positions: (1,1), (10,1), and so on.
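The Scala code finds the five values that lie closest to each other. If the goal is instead the five values nearest to my.number, as the R snippet in the question suggests, a sketch in Python could look like the following. The function name `closest_cells` is made up for this example, and only the first three rows of the matrix are used to keep it short.

```python
import heapq

def closest_cells(matrix, target, k=5):
    """Return the k (value, row, col) triples nearest to target, 1-based."""
    cells = (
        (value, r, c)
        for r, row in enumerate(matrix, start=1)
        for c, value in enumerate(row, start=1)
    )
    return heapq.nsmallest(k, cells, key=lambda cell: abs(cell[0] - target))

# First three rows of the matrix from the question.
y = [
    [0.8802216, 1.2277843, 0.6875047],
    [0.9381081, 1.3189847, 0.2046542],
    [1.3245534, 0.8221709, 0.4630722],
]

nearest = closest_cells(y, 1.12270420185886)
for value, row, col in nearest:
    print(row, col, value)
```

Rows and columns are numbered from 1 here, so the positions can be read directly against the matrix as printed in the question.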