System.DateTime formatting in F#


I have the following F# code:
let mutable argNum = 0
let cmdArgs = System.Environment.GetCommandLineArgs()
for arg in cmdArgs do
    printfn "arg %d : %s" argNum arg
    match argNum with
    | 1 -> pmID <- System.Int32.Parse arg
    | 2 -> startDate <- System.DateTime.Parse arg
    | 3 -> endDate <- System.DateTime.Parse arg
    | _ -> ()
    argNum <- argNum + 1
For the date parameters, the argument comes in the form "1-1-2011", i.e. "M-D-YYYY".
When I write the dates into the XML serializer, I get the following format:
1/1/2011 12:00:00 AM
I'd like to remove the time portion completely. What's the best way to do this?

What about this one?
(DateTime.Parse arg).ToString("MM-dd-yyyy")

By default, DateTime instances are serialized using the "dateTime" data type. You can change this by annotating the member to use "date" instead, which serializes it in the format YYYY-MM-DD. If that doesn't work for you, serialize a string that holds the format of your choice instead.
You can set the serialization attribute like this:
full datetime: [<XmlAttribute("start-date", DataType = "dateTime")>]
just the date part: [<XmlAttribute("start-date", DataType = "date")>]
Example:
open System
open System.Xml.Serialization

[<Serializable>]
type DateTest() =
    let mutable startDate = DateTime.Now
    [<XmlAttribute("start-date", DataType = "date")>]
    member x.StartDate with get() = startDate and set v = startDate <- v

Related

unpivot (wide to long) with dynamic column names

I need a function wide_to_long that turns a wide table into a long table and that accepts an argument id_vars for which the values have to be repeated (see example).
Sample input
let T_wide = datatable(name: string, timestamp: datetime, A: int, B: int) [
    'abc','2022-01-01 12:00:00',1,2,
    'def','2022-01-01 13:00:00',3,4
];
Desired output
Calling wide_to_long(T_wide, dynamic(['name', 'timestamp'])) should produce the following table.
let T_long = datatable(name: string, timestamp: datetime, variable: string, value: int) [
    'abc','2022-01-01 12:00:00','A',1,
    'abc','2022-01-01 12:00:00','B',2,
    'def','2022-01-01 13:00:00','A',3,
    'def','2022-01-01 13:00:00','B',4
];
Attempt
I've come pretty far with the following code.
let wide_to_long = (T:(*), id_vars: dynamic) {
    // get names of keys to remove later
    let all_columns = toscalar(T | getschema | summarize make_list(ColumnName));
    let remove = set_difference(all_columns, id_vars);
    // expand columns not contained in id_vars
    T
    | extend packed1 = pack_all()
    | extend packed1 = bag_remove_keys(packed1, id_vars)
    | mv-expand kind=array packed1
    | extend variable = packed1[0], value = packed1[1]
    // remove unwanted columns
    | project packed2 = pack_all()
    | project packed2 = bag_remove_keys(packed2, remove)
    | evaluate bag_unpack(packed2)
    | project-away packed1
};
The problems are that the solution feels clunky (is there a better way?) and the columns in the result are ordered randomly. The second issue is minor, but annoying.

how to convert string of mapping to mapping in pyspark

I have a CSV file that looks like this (it was saved from PySpark output):
name_value
"[quality1 -> good, quality2 -> OK, quality3 -> bad]"
"[quality1 -> good, quality2 -> excellent]"
How can I use PySpark to read this CSV file and convert the name_value column into a map type?

Something like the following:
data = {}
line = '[quality1 -> good, quality2 -> OK, quality3 -> bad]'
parts = line[1:-1].split(',')
for part in parts:
    k, v = part.split('->')
    data[k.strip()] = v.strip()
print(data)
Output:
{'quality1': 'good', 'quality2': 'OK', 'quality3': 'bad'}

A combination of split and regexp_replace cuts the string into key-value pairs. In a second step, each key-value pair is transformed first into a struct and then into a map element:
from pyspark.sql import functions as F

df = spark.read.option("header", "true").csv(...)
# Strip the brackets, then split on a comma plus optional whitespace so that keys carry no leading space
df1 = df.withColumn("name_value", F.split(F.regexp_replace("name_value", "[\\[\\]]", ""), ",\\s*")) \
    .withColumn("name_value", F.map_from_entries(F.expr("""transform(name_value, e -> (regexp_extract(e, '^(.*) ->',1),regexp_extract(e, '-> (.*)$',1)))""")))
df1 has now the schema
root
|-- name_value: map (nullable = true)
| |-- key: string
| |-- value: string (valueContainsNull = true)
and contains the same data as the original CSV file.
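To see what the two steps do to a single cell without starting a Spark session, the same logic can be mirrored in plain Python. This is only an illustrative sketch: parse_map_string is a hypothetical helper name, and it splits on a comma plus optional whitespace so that keys do not keep a leading space.

```python
import re


def parse_map_string(cell: str) -> dict:
    """Hypothetical helper: turn '[k1 -> v1, k2 -> v2]' into a dict."""
    # Step 1: drop the surrounding brackets, then split into 'key -> value' entries.
    entries = re.split(r",\s*", re.sub(r"[\[\]]", "", cell))
    result = {}
    for entry in entries:
        # Step 2: the same patterns as the regexp_extract calls in the Spark version.
        key = re.search(r"^(.*) ->", entry)
        value = re.search(r"-> (.*)$", entry)
        if key and value:
            result[key.group(1)] = value.group(1)
    return result


print(parse_map_string("[quality1 -> good, quality2 -> OK, quality3 -> bad]"))
# {'quality1': 'good', 'quality2': 'OK', 'quality3': 'bad'}
```

Entries that do not match the 'key -> value' shape are skipped rather than raising, which is a design choice of this sketch and not something the Spark version does.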

Converting str into datetime and fetching all data from db

I tried to write a tasks.loop() that checks for muted users who need to be unmuted, but I ran into a few problems. I can't use fetchall(), because it gives me this error:
toremove = muteremove[2]
IndexError: list index out of range
If I use fetchone(), it probably fetches only one user every 10 seconds. How can I fetch all the rows every 10 seconds to unmute users?
Also, if I use fetchone(), it says that it can't convert str into a datetime.datetime object. How can I fix this?
@tasks.loop(seconds=10)
async def muted_user_check(self):
    self.cur.execute(f"SELECT userId, guildId, expiredAt FROM mutedlist")
    muteremove = self.cur.fetchall()
    if muteremove is None:
        print("No user to unmute :D")
    if muteremove is not None:
        toremove = muteremove[2]
        timenow = datetime.utcnow()
        if timenow > toremove:
            self.cur.execute(f"DELETE FROM mutedlist WHERE guildId = {muteremove[1]} and userId = {muteremove[0]}")

To convert a string into a datetime object, you can use the strptime() method:
from datetime import datetime
def convert(date, format):
    return datetime.strptime(date, format)
[input] convert('22/08/2020', '%d/%m/%Y')
[output] 2020-08-22 00:00:00
The output will be a datetime object that you can format with the strftime() method like so:
#Example
from datetime import datetime
now = datetime.now() #now will be a datetime object
now.strftime('%d/%m/%Y - %H:%M:%S') # DD/MM/YYYY - hours:minutes:seconds
Here's a list of some formats:
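%A → Weekday name (%a for the abbreviation, %w for the number)
%-d → Day of the month without zero padding (1, 2, 3, 4, ...); this is platform-specific, %d is the portable zero-padded form
%B → Month name (%b for the abbreviation, %-m for the number; %-m is platform-specific, %m is the portable zero-padded form)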
%I → Hour (12h clock)
%p → AM or PM
%H → Hour (24h clock)
%M → Minutes
%S → Seconds
%f → Microseconds
%c → Local date and time representation
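As a quick check of how these directives combine, here is a minimal round-trip sketch. The sample timestamp is made up for illustration, and only portable directives are used.

```python
from datetime import datetime

# Parse a string into a datetime using an explicit format.
parsed = datetime.strptime("22/08/2020 14:05:09", "%d/%m/%Y %H:%M:%S")

# Format it back with portable directives only (zero-padded day and month).
iso_like = parsed.strftime("%Y-%m-%d %H:%M:%S")
time_only = parsed.strftime("%H:%M")

print(iso_like)   # 2020-08-22 14:05:09
print(time_only)  # 14:05
```

The format passed to strptime() has to match the input string exactly, separators included; otherwise it raises ValueError.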
Using your code, it would be:
@tasks.loop(seconds=10)
async def muted_user_check(self):
    self.cur.execute("SELECT * FROM mutedlist")
    mute_list = self.cur.fetchall()
    if not mute_list:
        print("No user to unmute :D")
    else:
        timeNow = datetime.utcnow()
        for mute in mute_list:
            # mute[3] is assumed to hold the expiry time as text
            muteExpire = datetime.strptime(mute[3], '%Y-%m-%d %H:%M:%S')
            if timeNow > muteExpire:
                self.cur.execute("DELETE FROM mutedlist WHERE guildId=? AND userId=?", (mute[0], mute[1]))
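The fetchall() behaviour behind the original IndexError can be tried outside the bot with an in-memory database. The sketch below assumes the sqlite3 module (suggested by the ? placeholders) and a three-column table with expiredAt stored as text; the function name remove_expired_mutes and the sample rows are hypothetical.

```python
import sqlite3
from datetime import datetime


def remove_expired_mutes(conn: sqlite3.Connection, now: datetime) -> list:
    """Hypothetical helper: delete expired rows, return the (userId, guildId) pairs removed."""
    cur = conn.cursor()
    cur.execute("SELECT userId, guildId, expiredAt FROM mutedlist")
    removed = []
    # fetchall() returns a list of row tuples (an empty list when there are no rows),
    # so each row has to be unpacked inside a loop instead of indexing the list itself.
    for user_id, guild_id, expired_at in cur.fetchall():
        # expiredAt is assumed to be stored as text like '2021-01-01 12:00:00'.
        if now > datetime.strptime(expired_at, "%Y-%m-%d %H:%M:%S"):
            cur.execute(
                "DELETE FROM mutedlist WHERE guildId = ? AND userId = ?",
                (guild_id, user_id),
            )
            removed.append((user_id, guild_id))
    conn.commit()
    return removed


# Demo with an in-memory database and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mutedlist (userId INTEGER, guildId INTEGER, expiredAt TEXT)")
conn.executemany(
    "INSERT INTO mutedlist VALUES (?, ?, ?)",
    [(1, 10, "2021-01-01 12:00:00"), (2, 10, "2999-01-01 00:00:00")],
)
print(remove_expired_mutes(conn, datetime(2022, 1, 1)))  # [(1, 10)]
```

Passing the values as query parameters, rather than formatting them into the SQL string, also avoids SQL injection.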

How to: Run a user defined function for a range of (date) values

So let’s say I want to test a function that finds outliers over past data. I’d love to end up with a table that looks like this:
Time Outliers_At_Time
<somedate> 0
<somedate + interval> 1
The function looks like this:
let OutliersAt = (TheDate:datetime) {
    <… outputs zero or a positive integer>
}
My instinct would be to do something like this:
let SomeDates = range AtTime from ago(10d) to now() step 10m;
SomeDates | extend NumOutliers = OutliersAt (AtTime)
… but that gives me this error message:
Error Semantic error: '' has the following semantic error: Unresolved
reference binding: 'AtTime'. clientRequestId:
KustoWebV2;1ea28ba0-12f1-4a52-95e7-975db3310f59
Suggestions?

If you are looking to find outliers, there is a built-in function in Kusto for that:
https://learn.microsoft.com/en-us/azure/kusto/query/series-outliersfunction
Example:
let _data =
    range Timestamp from ago(7d) to now() step 1min
    | extend Value=case(rand(1000)==10, 1200.0, rand(100));
//
_data
| make-series AvgValue=avg(Value) default=0 on Timestamp in range(ago(7d), now(), 5min)
| extend outliers=series_outliers(AvgValue)
| render timechart
If the question is about the general way to pass parameters to user-defined functions,
see more info here:
https://learn.microsoft.com/en-us/azure/kusto/query/functions/user-defined-functions
In particular, you can pass a series into a user-defined function (e.g. to get statistics):
let OutliersAt = (_serie:dynamic) {
    let stats = series_stats_dynamic(_serie);
    todouble(stats.max_idx) >= 0
};
let _data =
    range Timestamp from ago(7d) to now() step 1min
    | extend Value=case(rand(1000)==10, 1200.0, rand(100));
//
_data
| make-series AvgValue=avg(Value) default=0 on Timestamp in range(ago(7d), now(), 5min)
| extend outliers=series_outliers(AvgValue)
| project hasOutliers=OutliersAt(outliers)

MomentJS using 'm' to add a month gives unexpected result

I'm using Moment.js and for some (probably basic) reason I'm not getting the result I'm expecting:
let date = moment("1995-01-25");
date.add(2, 'm');
console.log(date.month()); // Expected 2, outputs 0

You have to use an uppercase M to add months; lowercase m stands for minutes (see the add docs):
Key | Shorthand
-------------------
months | M
minutes | m
Here is a working sample:
let date = moment("1995-01-25");
date.add(2, 'm');
console.log(date.month());
console.log(date.format()); //1995-01-25T00:02:00
let date2 = moment("1995-01-25");
date2.add(2, 'M');
console.log(date2.month()); // 2
console.log(date2.format()); // 1995-03-25T00:00:00
<script src="https://cdnjs.cloudflare.com/ajax/libs/moment.js/2.18.1/moment.min.js"></script>