Get value by date - datetime

I have a data frame df:
PRICE
2004-03-19 36.250000
2004-03-20 36.237500
2004-03-21 36.225000
2004-03-22 36.212500
etc...
The index is of type:
DatetimeIndex(['2004-03-19', '2004-03-20', '2004-03-21', ...],
dtype='datetime64[ns]', length=1691, freq='D')
I want to retrieve the PRICE at a certain day using df[datetime.date(2004,3,19)]. This is what pandas does:
KeyError: datetime.date(2004, 3, 19)
The following works, but that can't be the way it is supposed to work:
df[df.index.isin(pd.DatetimeIndex([datetime.date(2004,3,19)]))].PRICE.values[0]

The problem here is that df[key] with a single key selects a column, not a row, so pandas searches the column labels for datetime.date(2004, 3, 19) and raises KeyError. Row lookups by label go through loc.
You can use loc with a DatetimeIndex:
print(df.loc[pd.DatetimeIndex(['2004-3-19'])])
PRICE
2004-03-19 36.25
Or you can use loc with the string 2004-3-19 converted by to_datetime and reduced to a date:
print(df.loc[pd.to_datetime('2004-3-19').date()])
PRICE 36.25
Name: 2004-03-19 00:00:00, dtype: float64
If you need the value of PRICE:
print(df.loc[pd.DatetimeIndex(['2004-3-19']), 'PRICE'])
2004-03-19 36.25
Name: PRICE, dtype: float64
print(df.loc[pd.DatetimeIndex(['2004-3-19']), 'PRICE'].values[0])
36.25
print(df.loc[pd.to_datetime('2004-3-19').date(), 'PRICE'])
36.25
A full datetime with the time included also matches the DatetimeIndex:
print(df.loc[pd.to_datetime('2004-3-19 00:00:00')])
PRICE 36.25
Name: 2004-03-19 00:00:00, dtype: float64
print(df.loc[pd.to_datetime('2004-3-19 00:00:00'), 'PRICE'])
36.25

Your index appears to hold timestamps, whereas you are trying to match them against datetime.date objects.
Rather than trying to retrieve the price via df[datetime.date(2004,3,19)], I would simply recommend df['2004-3-19']. Recent pandas versions no longer accept this row-selecting form of df[...], so use df.loc['2004-3-19'] there.
If you are intent on using datetime.date values, you should first convert the index:
df.index = [d.date() for d in df.index]
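The lookups discussed in this thread can be sketched end to end as follows. This is a minimal example, assuming a small frame rebuilt from the four rows shown in the question; the variable names are illustrative only.

```python
import datetime

import pandas as pd

# Rebuild a small frame like the one in the question (daily DatetimeIndex).
idx = pd.date_range("2004-03-19", periods=4, freq="D")
df = pd.DataFrame({"PRICE": [36.25, 36.2375, 36.225, 36.2125]}, index=idx)

# Convert the datetime.date to a Timestamp, then look the row up with loc.
day = datetime.date(2004, 3, 19)
price_by_date = df.loc[pd.Timestamp(day), "PRICE"]

# A date string goes through loc as well.
price_by_string = df.loc["2004-03-19", "PRICE"]

print(price_by_date, price_by_string)
```

Converting the date with pd.Timestamp keeps the index untouched, which may be preferable to rewriting df.index when the frame is also used for time-based slicing.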

Related

Constraint issue with pyomo involving a scalar

I am working on an economic optimization problem with pyomo and would like to add a constraint that prevents the product of the commodity quantity and its price from going below zero (< 0), i.e. to avoid negative revenue. All the data are in a dataframe, and I can't set up a constraint like:
def positive_revenue(model, t)
return model.P * model.C >=0
model.positive_rev = Constraint(model.T, rule=positive_revenue)
The system returns an error saying that the price is a scalar and cannot be processed. Indeed, the price is set up like this in the model:
model.T = Set(doc='quarter of year', initialize=df.quarter.tolist(), ordered=True)
model.P = Param(initialize=df.price.tolist(), doc='Price for each quarter')
##while the commodity is:
model.C = Var(model.T, domain=NonNegativeReals)
I would just like to require, for each timestep (a quarter of an hour here), that:
price(t) * model.C(t) >=0
Can someone help me spot the issue? Thanks
Here is more information:
df dataframe:
df time_stamp price Status imbalance
quarter
0 2021-01-01 00:00:00 64.84 Final 16
1 2021-01-01 00:15:00 13.96 Final 38
2 2021-01-01 00:30:00 12.40 Final 46
index = quarter from 0 till 35049, so it is ok
Here is the df.info()
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 time_stamp 35040 non-null datetime64[ns]
1 price 35040 non-null float64
2 Status 35040 non-null object
3 imbalance 35040 non-null int64
I changed to_list() to to_dict() in model.T but I am still facing the same issue:
KeyError: "Cannot treat the scalar component 'P' as an indexed component" at the time model.T is defined in the model parameter, set and variables.
Here is the constraint where the system issues the error:
def revenue_positive(model,t):
for t in model.T:
return (model.C[t] * model.P[t]) >= 0
model.positive_revenue = Constraint(model.T,rule=revenue_positive)
I can't figure it out... any idea?
UPDATE
The model works after dropping an unfortunate 'quarter' column that was left somewhere after I renamed the index to quarter.
It runs, but I still get negative revenues, so the constraint does not seem to be working at present. Here is how it is written:
def revenue_positive(model,t):
for t in model.T:
return (model.C[t] * model.P[t]) >= 0
model.positive_revenue = Constraint(model.T,rule=revenue_positive)
What am I missing here? Thanks for the help, I am just beginning.
Welcome to the site.
The problem you appear to be having is that you are not building your model parameter model.P as an indexed component. I believe you likely want it to be indexed by your set model.T.
When you make indexed params in pyomo you need to initialize it with some key:value pairing, like a python dictionary. You can make that from your data frame by re-indexing your data frame so that the quarter labels are the index values.
Caution: The construction you have for model.T and this assume there are no duplicates in the quarter names.
If you have duplicates (or get a warning) then you'll need to do something else. If the quarter labels are unique you can do this:
import pandas as pd
import pyomo.environ as pyo
df = pd.DataFrame({'qtr':['Q5', 'Q6', 'Q7'], 'price':[12.80, 11.50, 8.12]})
df.set_index('qtr', inplace=True)
print(df)
m = pyo.ConcreteModel()
m.T = pyo.Set(initialize=df.index.to_list())
m.price = pyo.Param(m.T, initialize=df['price'].to_dict())
m.pprint()
which should get you:
price
qtr
Q5 12.80
Q6 11.50
Q7 8.12
1 Set Declarations
T : Size=1, Index=None, Ordered=Insertion
Key : Dimen : Domain : Size : Members
None : 1 : Any : 3 : {'Q5', 'Q6', 'Q7'}
1 Param Declarations
price : Size=3, Index=T, Domain=Any, Default=None, Mutable=False
Key : Value
Q5 : 12.8
Q6 : 11.5
Q7 : 8.12
2 Declarations: T price
edit for clarity...
NOTE:
The first argument when you create a pyomo parameter is the indexing set. If this is not provided, pyomo assumes that it is a scalar. You are missing the set as shown in my example and highlighted with arrow here: :)
                     |
                     |
                     |
                     V
m.price = pyo.Param(m.T, initialize=df['price'].to_dict())
Also note, you will need to initialize model.P with a dictionary as I have in the example, not a list.
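On the follow-up in the UPDATE (the constraint runs but seems to have no effect): a rule for an indexed constraint is called once per member of the indexing set, so a for loop with a return inside the rule always returns the expression for the first member. The plain-Python sketch below emulates that calling pattern without Pyomo; the price values, including the negative one, are hypothetical, and the strings only stand in for constraint expressions.

```python
# Hypothetical data: three quarter-hour steps, one with a negative price.
T = [0, 1, 2]
price = {0: 64.84, 1: -13.96, 2: 12.40}


def looping_rule(t):
    # Mirrors the rule in the question: the loop variable shadows the
    # argument t, and return exits on the first pass of the loop.
    for t in T:
        return f"C[{t}] * {price[t]} >= 0"


def per_index_rule(t):
    # Builds one expression for the index that was passed in.
    return f"C[{t}] * {price[t]} >= 0"


# Emulate an indexed constraint: the rule is called once per index.
looped = [looping_rule(t) for t in T]
per_index = [per_index_rule(t) for t in T]

print(looped)     # the same expression for step 0, three times
print(per_index)  # one distinct expression per step
```

In the Pyomo model this would presumably reduce the rule body to a single line, return model.C[t] * model.P[t] >= 0, once model.P is indexed by model.T. Note that with a nonnegative C[t] and a negative price, such a constraint can only be satisfied by C[t] = 0.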

format_datetime() in Kusto for datetime with minutes and seconds as 00

The Azure Data Explorer documentation lists a lot of supported formats, but not the one that I am looking for.
What I need is to format the datetime like "yyyy-MM-dd HH" and set the minutes and seconds to 0.
Input datetime
2020-04-21T17:44:27.6825985Z
expected results
2020-04-21 17:00:00
You can use bin() to round down to the hour, and if you still need to remove the datetime parts smaller than seconds, you can use substring() (or format_datetime()), e.g.:
print d = datetime(2020-04-21T17:44:27.6825985Z)
| extend h = bin(d, 1h)
| extend h2 = substring(h, 0, 19)
If you always just want the rest to be 0, can you just use string concatenation?
let d = datetime(2020-04-21T17:44:27.6825985Z);
print strcat(format_datetime(d, "yyyy-MM-dd HH"), ":00:00")
The above code will give you the result of
2020-04-21 17:00:00
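For comparison, the same round-down-to-the-hour idea can be sketched outside Kusto with Python's standard library. This is only an analogue of bin(d, 1h) followed by formatting, not Kusto code, and the fractional seconds are truncated to microseconds because Python's datetime carries no finer resolution.

```python
from datetime import datetime, timezone

# The input from the question, truncated to microsecond precision.
d = datetime(2020, 4, 21, 17, 44, 27, 682598, tzinfo=timezone.utc)

# Round down to the start of the hour, like bin(d, 1h).
hour_start = d.replace(minute=0, second=0, microsecond=0)

# Format without sub-second parts.
formatted = hour_start.strftime("%Y-%m-%d %H:%M:%S")
print(formatted)  # 2020-04-21 17:00:00
```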

Pyspark: Convert String Datetime in 12 hour Clock to Date time with 24 hour clock (Time Zone Change)

Edit: Apologies, the sample data frame is a little off. Below is the corrected sample dataframe I'm trying to convert:
Timestamp (CST)
12/8/2018 05:23 PM
11/29/2018 10:20 PM
I tried the following code based on recommendation below but got null values returned.
df = df.withColumn('Timestamp (CST)_2', from_unixtime(unix_timestamp(col(('Timestamp (CST)')), "yyyy/MM/dd hh:mm:ss aa"), "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"))
df = df.withColumn("Timestamp (CST)_3", F.to_timestamp(F.col("Timestamp (CST)_2")))
--------------------------------------------------------------------------------
I have a field called "Timestamp (CST)" that is a string. It is in Central Standard Time.
Timestamp (CST)
2018-11-21T5:28:56 PM
2018-11-21T5:29:16 PM
How do I create a new column that takes "Timestamp (CST)" and change it to UTC and convert it to a datetime with the time stamp on the 24 hour clock?
Below is my desired table and I would like the datatype to be timestamp:
Timestamp (CST)_2
2018-11-21T17:28:56.000Z
2018-11-21T17:29:16.000Z
I tried the following code but all the results came back null:
df = df.withColumn("Timestamp (CST)_2", to_timestamp("Timestamp (CST)", "yyyy/MM/dd h:mm p"))
Firstly, import from_unixtime, unix_timestamp and col using
from pyspark.sql.functions import from_unixtime, unix_timestamp, col
Then, reconstructing your scenario in a DataFrame df_time
>>> cols = ['Timestamp (CST)']
>>> vals = [
... ('2018-11-21T5:28:56 PM',),
... ('2018-11-21T5:29:16 PM',)]
>>> df_time = spark.createDataFrame(vals, cols)
>>> df_time.show(2, False)
+---------------------+
|Timestamp (CST) |
+---------------------+
|2018-11-21T5:28:56 PM|
|2018-11-21T5:29:16 PM|
+---------------------+
Then, my approach would be
>>> df_time_twenfour = df_time.withColumn('Timestamp (CST)', \
... from_unixtime(unix_timestamp(col(('Timestamp (CST)')), "yyyy-MM-dd'T'hh:mm:ss aa"), "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"))
>>> df_time_twenfour.show(2, False)
+------------------------+
|Timestamp (CST) |
+------------------------+
|2018-11-21T17:28:56.000Z|
|2018-11-21T17:29:16.000Z|
+------------------------+
Notes
If you want the time in 24-hour format, use HH instead of hh.
Since the input has a PM marker, the aa in yyyy-MM-dd'T'hh:mm:ss aa is there to parse it.
Your input string contains a literal T, so the pattern has to include it, quoted as 'T', as in the format above.
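The notes above come down to one rule: the parse pattern has to mirror the input string piece by piece. The sketch below shows the same rule with Python's datetime.strptime rather than Spark, where %I and %p play the role of hh and the AM/PM marker and %H plays the role of HH. The helper to_24h is a hypothetical name, and the trailing Z is a literal suffix as in the answer, so no time zone conversion takes place.

```python
from datetime import datetime


def to_24h(text, pattern):
    # Parse with a 12-hour pattern, then print on the 24-hour clock.
    parsed = datetime.strptime(text, pattern)
    return parsed.strftime("%Y-%m-%dT%H:%M:%S.000Z")


# Original sample: ISO-like date, literal T, 12-hour time with seconds.
iso_style = to_24h("2018-11-21T5:28:56 PM", "%Y-%m-%dT%I:%M:%S %p")

# Edited sample: month/day/year, no seconds, so the pattern changes too.
us_style = to_24h("12/8/2018 05:23 PM", "%m/%d/%Y %I:%M %p")

# A pattern that does not mirror the string fails; Python raises here,
# whereas Spark returned null in the question.
try:
    to_24h("12/8/2018 05:23 PM", "%Y/%m/%d %I:%M:%S %p")
    mismatch_detected = False
except ValueError:
    mismatch_detected = True

print(iso_style, us_style, mismatch_detected)
```

By the same reasoning, the edited sample in the question would presumably need a Spark pattern along the lines of "M/d/yyyy hh:mm a" rather than one starting with yyyy/MM/dd; that pattern is an untested assumption.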
The option aa mentioned in @pyy4917's answer may raise an error on newer Spark versions, whose datetime parser accepts only a single a. To fix it, replace aa with a.
The full code is below:
df_time_twenfour = df_time.withColumn('Timestamp (CST)',
    from_unixtime(unix_timestamp(col('Timestamp (CST)'),
        "yyyy-MM-dd'T'hh:mm:ss a"), "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"))

R sprintf in sqldf's like

I would like to run a looping query in R using sqldf that selects all non-NULL values of the X.1 variable with the date "11/12/2015" at 9 AM. Example:
StartDate X.1
11/12/2015 09:14 A
11/12/2015 09:36
11/12/2015 09:54 A
The date is held in a variable that is generated by another query:
nullob<-0
dayminnull<-as.numeric(sqldf("SELECT substr(Min(StartDate),1,03)as hari from testes")) # this produce "11/12/2015"
for (i in 1 : 12){
dday<-mdy(dayminnull)+days(i) #go to next day
sqlsql <- sprintf("SELECT count([X.1]) FROM testes where StartDate like '% \%s 09: %'", dday)
x[i]<-sqldf(sqlsql)
nullob<-nullob+x[i]
}
It fails with the error: Error in sprintf("SELECT count([X.1]) FROM testes WHERE StartDate like '%%s 09%'", :
unrecognised format specification '%'
Please help. Thank you in advance.
It's not super clear in the documentation, but a % followed by a %, that is %%, is the way to tell sprintf to use a literal %. We can test this fairly easily:
sprintf("%% %s %%", "hi")
[1] "% hi %"
For your query string, this should work:
sprintf("SELECT count([X.1]) FROM testes where StartDate like '%% %s 09: %%'", dday)
From ?sprintf:
The string fmt contains normal characters, which are passed through to
the output string, and also conversion specifications which operate on
the arguments provided through .... The allowed conversion
specifications start with a % and end with one of the letters in the
set aAdifeEgGosxX%. These letters denote the following types:
... [Documentation on aAdifeEgGosxX]
%: Literal % (none of the extra formatting characters given below are permitted in this case).
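The same escaping rule applies to printf-style formatting in other languages. As a small illustration, here is the query from this answer built with Python's % operator; the date value is a hypothetical stand-in for dday.

```python
# Hypothetical stand-in for the dday value computed in the loop.
dday = "11/13/2015"

# %% produces a literal % (the SQL LIKE wildcard); %s is filled from dday.
query = (
    "SELECT count([X.1]) FROM testes "
    "where StartDate like '%% %s 09: %%'" % dday
)
print(query)
```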

correct sum of hours in access

I have two columns in an access 2010 database with some calculated field:
time_from time_until calculated_field(time_until-time_from)
10:45 15:00 4:15
13:15 16:00 2:45
11:10 16:00 4:50
08:00 15:00 7:00
08:00 23:00 15:00
So far, so good: the calculated field does its job and tells me the total hours and minutes.
Now I need the sum of the calculated field.
I put in an expression builder: =Sum([time_until]-[time_from])
I guess the total should be 33:50, but it gives me 9:50. Why is this happening? Is there a way to fix this?
Update:
When I put it like this:
=Format(Sum([vrijeme_do]-[vrijeme_od])*24)
I get a decimal number, which I suppose is correct.
For example, 25 hours and 30 minutes is shown as 25,5.
But how do I format this 25,5 to look like 25:30?
As @Arvo mentioned in his comment, this is a formatting problem. Your expected result for the sum of calculated_field is 33:50. However, that sum is a Date/Time value, and since the number of hours is greater than 24, the day portion of the Date/Time is advanced by 1 and the remainder 9:50 is displayed as the time. Apparently your total is formatted to display only the time portion; the day portion is not displayed.
But the actual Date/Time value for the sum of calculated_field is #12/31/1899 09:50#. You can use a custom function to display that value in your desired format:
? duration_hhnn(#12/31/1899 09:50#)
33:50
This is the function:
Public Function duration_hhnn(ByVal pInput As Date) As String
Dim lngDays As Long
Dim lngMinutes As Long
Dim lngHours As Long
Dim strReturn As String
lngDays = Int(pInput)
lngHours = Hour(pInput)
lngMinutes = Minute(pInput)
lngHours = lngHours + (lngDays * 24)
strReturn = lngHours & ":" & Format(lngMinutes, "00")
duration_hhnn = strReturn
End Function
Note the function returns a string value so you can't do further date arithmetic on it directly.
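The arithmetic behind this answer can be checked with a short sketch in Python, using the five rows from the question. It is not Access code; duration_hhmm and parse_hhmm are hypothetical helper names, and timedelta plays the role of the Date/Time sum by carrying whole days separately from the time of day.

```python
from datetime import timedelta


def parse_hhmm(text):
    # Turn "10:45" into a timedelta measured from midnight.
    hours, minutes = map(int, text.split(":"))
    return timedelta(hours=hours, minutes=minutes)


def duration_hhmm(total):
    # Fold whole days back into the hour count instead of dropping them.
    total_minutes = int(total.total_seconds()) // 60
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}:{minutes:02d}"


rows = [("10:45", "15:00"), ("13:15", "16:00"), ("11:10", "16:00"),
        ("08:00", "15:00"), ("08:00", "23:00")]

total = sum((parse_hhmm(end) - parse_hhmm(start) for start, end in rows),
            timedelta())

print(total.days)            # 1 whole day, the part the time format hides
print(duration_hhmm(total))  # 33:50
print(duration_hhmm(timedelta(hours=25.5)))  # 25:30
```

The last line covers the decimal case from the update: 25,5 hours corresponds to 25:30.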
Similar to the answer from @HansUp, it can be done without VBA code like so:
Format(24 * Int(SUM(elapsed_time)) + Hour(SUM(elapsed_time)), "0") & ":" & Format(SUM(elapsed_time), "Nn")
I guess you are trying to show the total in a text box? The correct expression would be =SUM([calculated_field_name]).
