Gnuplot: data normalization - plot

I have several time-based datasets which are of very different scales, e.g.
[set 1]
2010-01-01 10
2010-02-01 12
2010-03-01 13
2010-04-01 19
…
[set 2]
2010-01-01 920
2010-02-01 997
2010-03-01 1010
2010-04-01 1043
…
I'd like to plot the relative growth of both since 2010-01-01. To put both curves on the same graph I have to normalize them. So I basically need to pick the first Y value and use it as a weight:
plot "./set1" using 1:($2/10), "./set2" using 1:($2/920)
But I want to do it automatically instead of hard-coding 10 and 920 as dividers. I don't even need the max value of the second column, I just want to pick the first value or, better, a value for a given date.
So my question: is there a way to parametrize the value of a given column that corresponds to a given value of the given X column (where X is a time axis)? Something like
plot "./set1" using 1:($2/$2($1="2010-01-01")), "./set2" using 1:($2/$2($1="2010-01-01"))
where $2($1="2010-01-01") is the feature I'm looking for.

Picking the first value is quite easy. Simply remember its value and divide all data values by it:
ref = 0
plot "./set1" using 1:(ref = ($0 == 0 ? $2 : ref), $2/ref),\
"./set2" using 1:(ref = ($0 == 0 ? $2 : ref), $2/ref)
Using the value at a given date is more involved:
Using an external tool (awk)
ref1 = system('awk ''$1 == "2010-01-01" { print $2; exit; }'' set1')
ref2 = system('awk ''$1 == "2010-01-01" { print $2; exit; }'' set2')
plot "./set1" using 1:($2/ref1), "./set2" using 1:($2/ref2)
Using gnuplot
You can use gnuplot's stats command to pick the desired value, but take care to apply all time settings only after the stats call:
a) String comparison
stats "./set1" using (strcol(1) eq "2010-01-01" ? $2 : 1/0)
ref1 = STATS_max
...
set timefmt ...
set xdata time
...
plot ...
b) Compare the actual time value (works like this only since version 5.0):
reftime = strptime("%Y-%m-%d", "2010-01-01")
stats "./set1" using (timecolumn(1, "%Y-%m-%d") == reftime ? $2 : 1/0)
ref1 = STATS_max
...
set timefmt ...
set xdata time
...
plot ...
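Putting the pieces together, a minimal end-to-end sketch of variant (b) for the two files from the question might look like this (gnuplot >= 5.0 assumed; the line styles, titles and x-axis format are illustrative choices, not from the question):
reftime = strptime("%Y-%m-%d", "2010-01-01")
stats "./set1" using (timecolumn(1, "%Y-%m-%d") == reftime ? $2 : 1/0) nooutput
ref1 = STATS_max
stats "./set2" using (timecolumn(1, "%Y-%m-%d") == reftime ? $2 : 1/0) nooutput
ref2 = STATS_max
# time settings only after the stats calls
set xdata time
set timefmt "%Y-%m-%d"
set format x "%Y-%m"
plot "./set1" using 1:($2/ref1) with lines title "set 1", \
     "./set2" using 1:($2/ref2) with lines title "set 2"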

Related

Constraint issue with pyomo involving a scalar

Working on an economic optimization problem with Pyomo, I would like to add a constraint that prevents the product of the commodity quantity and its price from going below zero (< 0), i.e. a negative revenue. All the data are in a dataframe and I can't set up a constraint like:
def positive_revenue(model, t):
    return model.P * model.C >= 0
model.positive_rev = Constraint(model.T, rule=positive_revenue)
The system returns an error saying that the price is a scalar and it cannot process it. Indeed, the price is set as such in the model:
model.T = Set(doc='quarter of year', initialize=df.quarter.tolist(), ordered=True)
model.P = Param(initialize=df.price.tolist(), doc='Price for each quarter')
while the commodity is:
model.C = Var(model.T, domain=NonNegativeReals)
I would just like to enforce, for each timestep (a quarter of an hour here), that:
price(t) * model.C(t) >= 0
Can someone help me spot the issue? Thanks.
Here is more information:
df dataframe:
         time_stamp           price  Status  imbalance
quarter
0        2021-01-01 00:00:00  64.84  Final   16
1        2021-01-01 00:15:00  13.96  Final   38
2        2021-01-01 00:30:00  12.40  Final   46
index = quarter, from 0 till 35049, so it is ok
Here is the df.info()
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   time_stamp  35040 non-null  datetime64[ns]
 1   price       35040 non-null  float64
 2   Status      35040 non-null  object
 3   imbalance   35040 non-null  int64
I changed to_list() to to_dict() in model.T but am still facing the same issue:
KeyError: "Cannot treat the scalar component 'P' as an indexed component" at the time model.T is defined in the model parameter, set and variables.
Here is the constraint where the system issues the error:
def revenue_positive(model, t):
    for t in model.T:
        return (model.C[t] * model.P[t]) >= 0
model.positive_revenue = Constraint(model.T, rule=revenue_positive)
Can't figure it out...any idea ?
UPDATE
The model works after dropping an unfortunate 'quarter' column somewhere... after I renamed the index as quarter.
It runs, but I still get negative revenues, so the constraint does not seem to be working at present. Here is how it is written:
def revenue_positive(model, t):
    for t in model.T:
        return (model.C[t] * model.P[t]) >= 0
model.positive_revenue = Constraint(model.T, rule=revenue_positive)
What am I missing here? Thanks for the help, I'm just beginning.
Welcome to the site.
The problem you appear to be having is that you are not building your model parameter model.P as an indexed component. I believe you likely want it to be indexed by your set model.T.
When you make an indexed Param in Pyomo you need to initialize it with some key:value pairing, like a Python dictionary. You can make that from your dataframe by re-indexing it so that the quarter labels are the index values.
Caution: The construction you have for model.T and this assume there are no duplicates in the quarter names.
If you have duplicates (or get a warning) then you'll need to do something else. If the quarter labels are unique you can do this:
import pandas as pd
import pyomo.environ as pyo
df = pd.DataFrame({'qtr':['Q5', 'Q6', 'Q7'], 'price':[12.80, 11.50, 8.12]})
df.set_index('qtr', inplace=True)
print(df)
m = pyo.ConcreteModel()
m.T = pyo.Set(initialize=df.index.to_list())
m.price = pyo.Param(m.T, initialize=df['price'].to_dict())
m.pprint()
which should get you:
     price
qtr
Q5   12.80
Q6   11.50
Q7    8.12

1 Set Declarations
    T : Size=1, Index=None, Ordered=Insertion
        Key  : Dimen : Domain : Size : Members
        None :     1 :    Any :    3 : {'Q5', 'Q6', 'Q7'}

1 Param Declarations
    price : Size=3, Index=T, Domain=Any, Default=None, Mutable=False
        Key : Value
         Q5 :  12.8
         Q6 :  11.5
         Q7 :  8.12

2 Declarations: T price
edit for clarity...
NOTE:
The first argument when you create a pyomo parameter is the indexing set. If this is not provided, pyomo assumes that it is a scalar. You are missing the set as shown in my example and highlighted with arrow here: :)
|
|
|
V
m.price = pyo.Param(m.T, initialize=df['price'].to_dict())
Also note, you will need to initialize model.P with a dictionary as I have in the example, not a list.
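For what it's worth, here is a minimal sketch (reusing the hypothetical names from the example above rather than the asker's full model) showing the indexed Param together with a per-quarter constraint; note that a Pyomo constraint rule is called once per index and should return a single expression, not loop over model.T itself:
import pandas as pd
import pyomo.environ as pyo

# hypothetical data, mirroring the example above
df = pd.DataFrame({'qtr': ['Q5', 'Q6', 'Q7'], 'price': [12.80, 11.50, 8.12]})
df.set_index('qtr', inplace=True)

m = pyo.ConcreteModel()
m.T = pyo.Set(initialize=df.index.to_list())
m.price = pyo.Param(m.T, initialize=df['price'].to_dict())   # indexed by m.T
m.C = pyo.Var(m.T, domain=pyo.NonNegativeReals)

# the rule receives one t at a time; return one expression for that t
def revenue_positive(m, t):
    return m.C[t] * m.price[t] >= 0

m.positive_revenue = pyo.Constraint(m.T, rule=revenue_positive)
m.pprint()
With the Param indexed like this, the "Cannot treat the scalar component" error should no longer appear, since m.price[t] is now a valid indexed lookup.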

Using condition in columns of data frame to generate a vector in R

I have the following array:
  Year Month Day Hour
1    1     1   1    0
2    1     1   1    3
... etc.
I wrote a function which I then tried to vectorize using apply in order to run the calculation on a row-by-row basis, but it doesn't work due to the booleans:
day_in_season <- function(tarr){
  # first month in season
  if((tarr$month==12) || (tarr$month==3) || (tarr$month==6) || (tarr$month==9)){
    d = tarr$day
  # second month in season
  } else if((tarr$month==1) || (tarr$month==4)){
    d = 31 + tarr$day
  } else if((tarr$month==7) || (tarr$month==10)){
    d = 30 + tarr$day
  # third month in season
  } else if((tarr$month==2)){
    d = 62 + tarr$day
  } else {
    d = 61 + tarr$day
  }
  h = tarr$hour/24
  d = d + h
  return(d)
}
I tried
apply(tdjf,1,day_in_season)
but it raised this exception:
Error in tarr$month : $ operator is invalid for atomic vectors
(I already knew about this potential pitfall, but that's why I wanted to use apply in the first place!)
The only way I can currently get it to work is if I do this:
days <- c()
for (x in 1:nrow(tdjf)){
  d <- day_in_season(tdjf[x,])
  days <- append(days, d)
}
If there were only a few values, I'd throw up my hands and just use the for loop, efficiency be damned, but I have over 15,000 rows and that's just one dataset. I know that there has to be a way to make it work.
To vectorize your code, use ifelse() and | instead of ||:
ifelse((tarr$month==12) | (tarr$month==3) | (tarr$month==6) | (tarr$month==9),
       tarr$day,
       ifelse((tarr$month==1) | (tarr$month==4),
              31 + tarr$day,
              ifelse((tarr$month==7) | (tarr$month==10),
                     30 + tarr$day,
                     ifelse(tarr$month==2,
                            62 + tarr$day,
                            61 + tarr$day)))) + tarr$hour/24
You might be surprised at how quickly a well-constructed for loop can run. If designed well, it has about the same efficiency as an apply statement.
The proper for loop in your case is
tdjf$days <- vector("numeric", nrow(tdjf))
for (x in seq_along(tdjf$days)){
  tdjf$days[x] <- day_in_season(tdjf[x,])
}
If you really want to go the apply route, I would recommend rewriting your function to take three arguments -- month, day, and hour -- and passing those three columns into mapply.
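A rough sketch of that last suggestion (assuming the columns in tdjf are named Month, Day and Hour as in the preview above):
# day_in_season rewritten to take plain month/day/hour values instead of a data frame row
day_in_season <- function(month, day, hour) {
  if (month %in% c(12, 3, 6, 9)) {        # first month in season
    d <- day
  } else if (month %in% c(1, 4)) {        # second month, after a 31-day first month
    d <- 31 + day
  } else if (month %in% c(7, 10)) {       # second month, after a 30-day first month
    d <- 30 + day
  } else if (month == 2) {                # third month of DJF
    d <- 62 + day
  } else {                                # third month of the other seasons
    d <- 61 + day
  }
  d + hour / 24
}

tdjf$days <- mapply(day_in_season, tdjf$Month, tdjf$Day, tdjf$Hour)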

To calculate Moving/Rolling back Weekly (7 days) Sum:

Please help me calculate a moving/rolling-back weekly sum of Amount ($4), Distributor-wise and rolling Date-wise ($2).
I want to set variables like
RollingStartDate == 01/05/2015, RollingInterval == 7 and RollingEndDate == 08/05/2015
For example:
1st May 2015 Rolling 7 Days data set would be from 01/05/2015 to 25/04/2015
2nd May 2015 Rolling 7 Days data set would be from 02/05/2015 to 26/04/2015
....................................................................
7th May 2015 Rolling 7 Days data set would be from 07/05/2015 to 01/05/2015
8th May 2015 Rolling 7 Days data set would be from 08/05/2015 to 02/05/2015
Input.csv
Des,Date,Distributor,Amount,Loc
aaa,25/04/2015,abc123,25,bbb
aaa,25/04/2015,xyz456,75,bbb
aaa,26/04/2015,xyz456,50,bbb
aaa,27/04/2015,abc123,250,bbb
aaa,27/04/2015,abc123,100,bbb
aaa,29/04/2015,xyz456,50,bbb
aaa,30/04/2015,abc123,25,bbb
aaa,01/05/2015,xyz456,75,bbb
aaa,01/05/2015,abc123,50,bbb
aaa,02/05/2015,abc123,25,bbb
aaa,02/05/2015,xyz456,75,bbb
aaa,04/05/2015,abc123,30,bbb
aaa,04/05/2015,xyz456,35,bbb
aaa,05/05/2015,xyz456,12,bbb
aaa,06/05/2015,abc123,32,bbb
aaa,06/05/2015,xyz456,43,bbb
aaa,07/05/2015,xyz456,87,bbb
aaa,08/05/2015,abc123,58,bbb
aaa,08/05/2015,xyz456,98,bbb
Example: 8th May 2015 Rolling 7 Days data set would be from 08/05/2015 to 02/05/2015
aaa,02/05/2015,abc123,25,bbb
aaa,02/05/2015,xyz456,75,bbb
aaa,04/05/2015,abc123,30,bbb
aaa,04/05/2015,xyz456,35,bbb
aaa,05/05/2015,xyz456,12,bbb
aaa,06/05/2015,abc123,32,bbb
aaa,06/05/2015,xyz456,43,bbb
aaa,07/05/2015,xyz456,87,bbb
aaa,08/05/2015,abc123,58,bbb
aaa,08/05/2015,xyz456,98,bbb
Output for 8th May 2015 Rolling 7 Days data set
RollingDate,Distributor,Amount
08/05/2015,abc123,145
08/05/2015,xyz456,350
I am able to obtain the above output from this command:
awk -F, '{key=$3; b[key]=b[key]+$4} END {for (i in b) print i","b[i]}'
Kindly suggest how to derive the weekly split-up data sets and then sum them.
Desired Output:
RollingDate,Distributor,Amount
01/05/2015,abc123,450
01/05/2015,xyz456,250
02/05/2015,abc123,450
02/05/2015,xyz456,250
03/05/2015,abc123,450
03/05/2015,xyz456,200
04/05/2015,abc123,130
04/05/2015,xyz456,235
05/05/2015,abc123,130
05/05/2015,xyz456,247
06/05/2015,abc123,162
06/05/2015,xyz456,240
07/05/2015,abc123,137
07/05/2015,xyz456,327
08/05/2015,abc123,145
08/05/2015,xyz456,350
Edit#1
1.
The logic is to find the sum of Amount billed to the distributor over a 7-day range, i.e. if I need to calculate the sum for 1st May then I need to consider the line items from 1st May, 30th Apr, 29th Apr, 28th Apr, 27th Apr, 26th Apr and 25th Apr. It is equivalent to 1st May minus 6 days back; likewise the 2nd May rolling date runs from 2nd May back to 26th Apr (2nd May minus 6 days back).
2.
Date format is DD/MM/YYYY - 02/05/2015 is 2nd May
Since the file contains 2 to 3 months of details, I don't want to select the first date (25/04/2015) from the file and then do the minus-6-days-back analysis; hence "RollingStartDate" tells from which date onwards to consider the data, and "RollingInterval" allows the analysis to roll back "7 days", "14 days" or "30 days" (monthly).
"RollingEndDate" helps in case the actual file contains data for future dates; in this case line items dated 09th or 15th May would need to be excluded.
Here's a solution that just excludes dates that don't have 7 days before them instead of requiring a specific start/stop range:
$ cat tst.awk
BEGIN { FS=OFS=","; window=(window?window:7); secsPerDay=24*60*60 }
NR==1 { print "RollingDate", $3, $4; next }
{
    endSecs = mktime(gensub(/(..)\/(..)\/(....)/,"\\3 \\2 \\1 0 0 0","",$2))
    if (begSecs=="") {
        begSecs = endSecs + ((window-1) * secsPerDay)
    }
    amount[endSecs][$3] += $4
    dists[$3]
}
END {
    for (currSecs=begSecs; currSecs<=endSecs; currSecs+=secsPerDay) {
        for (dayNr=1; dayNr<=window; dayNr++) {
            rollSecs = currSecs - ((dayNr-1) * secsPerDay)
            for (dist in dists) {
                sum[dist] += (rollSecs in amount ? amount[rollSecs][dist] : 0)
            }
        }
        for (dist in dists) {
            print strftime("%d/%m/%Y",currSecs), dist, sum[dist]
            delete sum[dist]
        }
    }
}
$ awk -f tst.awk file
RollingDate,Distributor,Amount
01/05/2015,xyz456,250
01/05/2015,abc123,450
02/05/2015,xyz456,250
02/05/2015,abc123,450
03/05/2015,xyz456,200
03/05/2015,abc123,450
04/05/2015,xyz456,235
04/05/2015,abc123,130
05/05/2015,xyz456,247
05/05/2015,abc123,130
06/05/2015,xyz456,240
06/05/2015,abc123,162
07/05/2015,xyz456,327
07/05/2015,abc123,137
08/05/2015,xyz456,350
08/05/2015,abc123,145
To use some different window size than 7 days, just set it on the command line:
$ awk -v window=5 -f tst.awk file
RollingDate,Distributor,Amount
29/04/2015,xyz456,175
29/04/2015,abc123,375
30/04/2015,xyz456,100
30/04/2015,abc123,375
01/05/2015,xyz456,125
01/05/2015,abc123,425
02/05/2015,xyz456,200
02/05/2015,abc123,100
03/05/2015,xyz456,200
03/05/2015,abc123,100
04/05/2015,xyz456,185
04/05/2015,abc123,130
05/05/2015,xyz456,197
05/05/2015,abc123,105
06/05/2015,xyz456,165
06/05/2015,abc123,87
07/05/2015,xyz456,177
07/05/2015,abc123,62
08/05/2015,xyz456,275
08/05/2015,abc123,120
The above uses GNU awk for true 2D arrays and time functions. Hopefully it's clear enough that you can make any modifications you need to include/exclude specific date ranges.

gnuplot Heatmap with label and score from different columns data

I have created heatmap graphs using gnuplot.
I have data.dat:
        avail    reli     perf
stop    181 20   121 10   34 20
jitter   18 20    17 20   13 20
limp     12 20     5 30   20 20
and gnuplot script:
set term pos eps font 20
unset key
set nocbtics
set cblabel "Score"
set cbtics scale 0
set cbrange [ 0.00000 : 110.00000 ] noreverse nowriteback
set palette defined ( 0.0 "#FFFFFF",\
1 "#FFCCCC",\
20.2 "#FF9999 ",\
30.3 "#FF6666",\
40.4 "#FF3333",\
50.5 "#FF0000",\
60.6 "#CC0000",\
70.7 "#C00000",\
80.8 "#B00000",\
90.9 "#990000",\
100.0 "#A00000")
set title "Faults"
set ylabel "Hardware Faults"
set xlabel "Aspects"
set size 1, 0.5
set output 'c11.eps'
YTICS="`awk 'BEGIN{getline}{printf "%s ",$1}' 'data2.dat'`"
XTICS="`head -1 'data2.dat'`"
set for [i=1:words(XTICS)] xtics ( word(XTICS,i) i-1 )
set for [i=1:words(YTICS)] ytics ( word(YTICS,i) i-1 )
plot "<awk '{$1=\"\"}1' 'data2.dat' | sed '1 d'" matrix w image, '' matrix using 1:2:($3==0 ? " " : sprintf("%.1d",$3)) with labels
#######^ replace the first field with nothing
################################## ^ delete first line
My output is: (heatmap image omitted; here I have the range 1-20, 30-39, ..., 100 or more.)
Now I have 2 values for every x/y combination, e.g. stop and avail have (181 and 20): the 181 is the count and the 20 is a percentage. I want to create graphs whose colors are based on the percentages and whose labels come from the counts.
I have some experience creating graphs using for loops and modulo arithmetic to select the data, but here I have no idea how to create such a graph. Any suggestions for creating this? Thanks!
You can use every to skip columns.
plot ... every 2 only uses every second column, which is what you can use for the labels. For the colors, you must start with the second column (numbered with 1), and you need every 2::1.
Following are the relevant changes only to your script:
set for [i=1:words(XTICS)] xtics ( word(XTICS,i) 2*i-1 )
plot "<awk '{$1=\"\"}1' 'data2.dat' | sed '1 d'" matrix every 2::1 w image, \
'' matrix using ($1+1):2:(sprintf('%d', $3)) every 2 with labels
The result with gnuplot 4.6.5 is: (image omitted)

Large grouped data plotting

I have a large amount of data to plot, and I'm trying to use gnuplot. The data is a sorted array of around 80000 elements. By simply using
plot "myData.txt" using 1:2 with linespoints linetype 1 pointtype 1
I get the output, but it takes time to render and the points are often cluttered, with occasional gaps. To address the second issue, I thought of a bar chart: each of the entries
would correspond to a bar. However, I'm not sure how to achieve this. I would like some space between consecutive bars, but I don't expect it would be visible. What would be your suggestion for plotting the data?
Due to the large data volume, I guess it's best to group the data.
Note that my data looks like
1 11041.9
2 11041.9
3 9521.07
4 9521.07
5 9520.07
6 9519.07
7 9018.07
...
I would like to plot the data in groups of 3, i.e., the first vertical line should start at 9521.07 (the minimum of points 1, 2 and 3) and end at 11041.9 (their maximum). The second vertical line should consider the following 3 points (4, 5 and 6) and start at 9519.07 and end at 9521.07, and so on.
Could this be achieved with gnuplot, given the data file as illustrated? If so, I would appreciate if someone posts a set of commands I should use.
To reduce the number of points gnuplot actually draws, you can use the every keyword, e.g.
plot "myData.txt" using 1:2 with linespoints linetype 1 pointtype 1 every 100
will plot every 100th data point.
I am not sure if it's possible to do what you want (plotting vertical lines) elegantly within gnuplot, but here is my solution (assuming a UNIX-y environment). First make an awk script called sort.awk:
BEGIN { RS = "" }
{
    # the next two lines handle the case where
    # there are not three lines in a record
    xval = $1 + 1
    ymin = ymax = $2

    # find y minimum
    if ($2 <= $4 && $2 <= $6)
        ymin = $2
    else if ($4 <= $2 && $4 <= $6 && $4 != "")
        ymin = $4
    else if ($6 <= $2 && $6 <= $4 && $6 != "")
        ymin = $6

    # find y maximum
    if ($2 >= $4 && $2 >= $6)
        ymax = $2
    else if ($4 >= $2 && $4 >= $6)
        ymax = $4
    else if ($6 >= $2 && $6 >= $4)
        ymax = $6

    # print the formatted line
    print ($1+1) " " ymin " " ymin " " ymax " " ymax
}
Now this gnuplot script will call it:
set terminal postscript enhanced color
set output 'plot.eps'
set boxwidth 3
set style fill solid
plot "<sed 'n;n;G;' myData.txt | awk -f sort.awk" with candlesticks title 'pretty data'
It's not pretty but it works. sed adds a blank line every 3 lines, and awk formats the output for the candlesticks style. You can also try embedding the awk script in the gnuplot script.
You can do something like that (it'll be easiest on Unix). You will need to insert a blank line every third line -- I don't see any way around that. If you're on Unix, the command
awk 'NR % 3 == 0 {print ""} 1' myfile
should do it. ( see How do I insert a blank line every n lines using awk? )
Of course, you could (and probably should) pack that straight into your gnuplot file.
So, all said and done, you'd have something like this:
xval(x) = int(x)/3   # return the x position on the plot
plot "< awk 'NR % 3 == 0 {print \"\"} 1' datafile" using (xval($1)):2 with lines
