Parse the string into a list of tuples - functional-programming

I am looking for a piece of code in F# that can parse this type of string:
"x=1,y=42,A=[1,3,4,8]"
into a list of tuples that looks like this:
[("x",1);("y",42);("A",1);("A",3);("A",4);("A",8)]
Thanks in advance :)

You can quite nicely solve this using the FParsec parser combinator library. This is manageable using regular expressions, but it's not very elegant. Parser combinators make it very clear what the grammar of the inputs that you can handle is. You can also easily add other features like whitespace.
The following actually produces a list of string * Value pairs where Value is a new data type, corresponding to the possible right-hand-sides in the input:
type Value = Int of int | List of int list
Now, you can do the parsing using the following:
open FParsec

let ident = identifier (IdentifierOptions())
let rhs =
    // Right-hand-side is either an integer...
    ( pint32 |>> Int ) <|>
    // Or a list [ .. ] of integers separated by ','
    ( pchar '[' >>. (sepBy pint32 (pchar ',')) .>> pchar ']' |>> List )
let tuple =
    // A single tuple is an identifier = right-hand-side
    ident .>> pchar '=' .>>. rhs
let p =
    // The input is a comma separated list of tuples
    sepBy tuple (pchar ',')
run p "x=1,y=42,A=[1,3,4,8]"
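If you need exactly the flat list of tuples from the question, the parsed pairs can be expanded afterwards. The following is only a sketch: flatten and parsed are names made up for illustration, and parsed is a hand-written stand-in for the parser result rather than actual FParsec output, so the snippet runs without the library.

```fsharp
// Same shape as the Value type defined above.
type Value =
    | Int of int
    | List of int list

// Expand each (name, value) pair into one tuple per integer.
let flatten (pairs: (string * Value) list) : (string * int) list =
    [ for (name, value) in pairs do
        match value with
        | Int n -> yield (name, n)
        | List ns ->
            for n in ns do
                yield (name, n) ]

// Hand-written stand-in for the list of pairs the parser is expected to produce.
let parsed = [ ("x", Int 1); ("y", Int 42); ("A", List [ 1; 3; 4; 8 ]) ]

printfn "%A" (flatten parsed)
```

A list expression is used here instead of List.collect only to keep the example short; either works.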

Sometimes a regex with named groups makes for readable code, even if the regex itself is not readable.
(?<id>\w+)=((\[((?<list>(\d+))*,?\s*)*\])|(?<number>\d+))
This reads: Identifier = [Number followed by comma or space, zero or more] | Number
open System.Text.RegularExpressions

let parse input =
    [
        // verbatim string, so the backslashes reach the regex engine unchanged
        let regex = Regex(@"(?<id>\w+)=((\[((?<list>(\d+))*,?\s*)*\])|(?<number>\d+))")
        let matches = regex.Matches input
        for (expr : Match) in matches do
            let group name = expr.Groups.[string name]
            let id = group "id"
            let list = group "list"
            let number = group "number"
            if list.Success then
                // one capture per list element
                for (capture : Capture) in list.Captures do
                    yield (id.Value, int capture.Value)
            else if number.Success then
                yield (id.Value, int number.Value)
    ]
Test
let input = "var1=1, var2=2, list=[1, 2, 3, 4], single=[1], empty=[], bad=[,,], bad=var"
printfn "%A" (parse input)
Output
[("var1", 1); ("var2", 2); ("list", 1); ("list", 2); ("list", 3); ("list", 4); ("single", 1)]

It's quite advisable to follow the approach outlined in Tomas Petricek's answer, employing the established FParsec parser combinator library.
For educational purposes, you might want to roll your own parser combinator, and for this endeavor Scott W.'s blog ("Understanding parser combinators", and "Building a useful set of parser combinators") contains valuable information.
The parsing looks quite similar:
// parse a list of integers enclosed in brackets and separated by ','
let plist = pchar '[' >>. sepBy1 pint (pchar ',') .>> pchar ']'
// parser for the right hand side, singleton integer or a list of integers
let intOrList = pint |>> (fun x -> [x]) <|> plist
// projection for generation of string * integer tuples
let ungroup p =
    p |>> List.collect (fun (key, xs) -> xs |> List.map (fun x -> key, x))
// parser for an input of zero or more string value pairs separated by ','
let parser =
    sepBy (letters .>> pchar '=' .>>. intOrList) (pchar ',')
    |> ungroup
"x=1,y=42,A=[1,3,4,8]"
|> run parser
// val it : ((String * int) list * string) option =
//   Some ([("x", 1); ("y", 42); ("A", 1); ("A", 3); ("A", 4); ("A", 8)], "")
This simple grammar still requires 15 or so parser combinators. Another difference is that, for simplicity's sake, the Parser type has been modeled on F#'s Option type: a failed parse is simply None.
type Parser<'T,'U> = Parser of ('T -> ('U * 'T) option)
let run (Parser f1) x = // run the parser with input
    f1 x
let returnP arg = // lift a value to a Parser
    Parser (fun x -> Some(arg, x))
let (>>=) (Parser f1) f = // apply parser-producing function
    Parser(f1 >> Option.bind (fun (a, b) -> run (f a) b))
let (|>>) p f = // apply function to value inside Parser
    p >>= (f >> returnP)
let (.>>.) p1 p2 = // andThen combinator
    p1 >>= fun r1 ->
    p2 >>= fun r2 ->
    returnP (r1, r2)
let (.>>) p1 p2 = // andThen, but keep first value only
    (p1 .>>. p2) |>> fst
let (>>.) p1 p2 = // andThen, keep second value only
    (p1 .>>. p2) |>> snd
let pchar c = // parse a single character
    Parser (fun s ->
        if String.length s > 0 && s.[0] = c then Some(c, s.[1..])
        else None )
let (<|>) (Parser f1) (Parser f2) = // orElse combinator
    Parser(fun arg ->
        match f1 arg with None -> f2 arg | res -> res )
let choice parsers = // choose any of a list of combinators
    List.reduce (<|>) parsers
let anyOf = // choose any of a list of characters
    List.map pchar >> choice
let many (Parser f) = // matches zero or more occurrences
    let rec aux input =
        match f input with
        | None -> [], input
        | Some (x, rest1) ->
            let xs, rest2 = aux rest1
            x::xs, rest2
    Parser (fun arg -> Some(aux arg))
let many1 p = // matches one or more occurrences of p
    p >>= fun x ->
    many p >>= fun xs ->
    returnP (x::xs)
let stringP p = // converts list of characters to string
    p |>> (fun xs -> System.String(List.toArray xs))
let letters = // matches one or more letters
    many1 (anyOf ['A'..'Z'] <|> anyOf ['a'..'z']) |> stringP
let pint = // matches an integer
    many1 (anyOf ['0'..'9']) |> stringP |>> int
let sepBy1 p sep = // matches p one or more times, separated by sep
    p .>>. many (sep >>. p) |>> (fun (x,xs) -> x::xs)
let sepBy p sep = // matches p zero or more times, separated by sep
    sepBy1 p sep <|> returnP []

Try this:
open System.Text.RegularExpressions
let input = "x=1,y=42,A=[1,3,4,8]"
Regex.Split(input,",(?=[A-Za-z])") //output: [|"x=1"; "y=42"; "A=[1,3,4,8]"|]
|> Array.collect (fun x ->
    let l,v = Regex.Split(x,"=") |> fun t -> Array.head t,Array.last t //label and value
    Regex.Split(v,",") |> Array.map (fun x -> l,Regex.Replace(x,"\[|\]","") |> int))
|> List.ofArray

Related

Removing from a list of a tuples that contains an empty element in the second projection F#

I need to learn the right way to do pattern matching on Pair types:
let pairToBeFiltered = Ok ([(1,[]);(2,[3;4]);(5,[6;7;8]);(9,[]);(10,[])])
let filterEmpty (pair: int * int list) =
match pair with
| (x,y) when y <> [] -> (x,y) //This gives error because of incomplete pattern matching!
let filtering = List.map(filterEmpty) pairToBeFiltered
Desired output:
Ok([(2,[3;4]);(5,[6;7;8])])
This should do it:
let pairsToBeFiltered = Ok ([(1,[]);(2,[3;4]);(5,[6;7;8]);(9,[]);(10,[])])
let filterEmpty pairs =
    List.where (fun (_, y) -> y <> []) pairs // pattern match on pair occurs here
let filtering : Result<_, string> =
    pairsToBeFiltered
    |> Result.map filterEmpty
printfn "%A" filtering // Ok [(2, [3; 4]); (5, [6; 7; 8])]
There are a number of issues here:
For clarity, I modified filterEmpty so it processes the entire list, rather than a single pair. This is where we apply the filtering function, List.where, using pattern matching. (In your code, note that List.map with a match expression doesn't filter anything.)
Since your list is wrapped in a Result, you need to unwrap it via Result.map in order to process it. (Since you didn't specify a 'TError type, I assumed string to pacify the compiler.)
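If you would rather keep an explicit match expression, one option is List.choose, which maps and filters in a single pass: return Some for the pairs to keep and None for the ones to drop. This is a sketch on the bare list, without the Result wrapper; the names pairs and nonEmpty are only for illustration.

```fsharp
let pairs = [ (1, []); (2, [ 3; 4 ]); (5, [ 6; 7; 8 ]); (9, []); (10, []) ]

// List.choose keeps the payload of every Some and discards every None,
// so the match below is complete and still acts as a filter.
let nonEmpty =
    pairs
    |> List.choose (fun pair ->
        match pair with
        | (_, []) -> None
        | (x, ys) -> Some (x, ys))

printfn "%A" nonEmpty
```

To apply it to the wrapped value, the same function can be passed to Result.map, just like filterEmpty above.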
Three more versions:
(* using match statement *)
module Version1 =
    let pairsToBeFiltered : Result<_, string> =
        Ok [(1,[]);(2,[3;4]);(5,[6;7;8]);(9,[]);(10,[])]
    let myWhere (pair : int * List<int>) =
        match pair with
        | _, [] -> false
        | _, _ -> true
    let myFilter l0 = l0 |> Result.map (List.filter myWhere)
    let result = pairsToBeFiltered |> myFilter
(* using lambda functions and List.isEmpty *)
module Version2 =
    let pairsToBeFiltered : Result<_, string> =
        Ok [(1,[]);(2,[3;4]);(5,[6;7;8]);(9,[]);(10,[])]
    let myFilter l0 =
        l0
        |> Result.map (fun l1 ->
            l1 |> List.filter (fun (_, l2) ->
                l2 |> List.isEmpty |> not))
    let result = pairsToBeFiltered |> myFilter
(* shortening Version2 (point free style - take care, can be confusing) *)
module Version3 =
    let pairsToBeFiltered : Result<_, string> =
        Ok [(1,[]);(2,[3;4]);(5,[6;7;8]);(9,[]);(10,[])]
    let myFilter = Result.map (List.filter (snd >> List.isEmpty >> not))
    let result = pairsToBeFiltered |> myFilter

How can I determine the json path to a field within a record without actually hard coding the path?

I would like to work with the following type
type RecordPath<'a,'b> = {
    Get: 'a -> 'b
    Path:string
}
Its purpose is to define a getter for going from record type 'a to some field within 'a of type 'b. It also gives the path to that field for the json representation of the record.
For example, consider the following fields.
type DateWithoutTimeBecauseWeirdlyDotnetDoesNotHaveThisConcept = {
    Year:uint
    Month:uint
    Day:uint
}
type Person = {
    FullName:string
    PassportNumber:string
    BirthDate:DateWithoutTimeBecauseWeirdlyDotnetDoesNotHaveThisConcept
}
type Team = {
    TeamName:string
    TeamMembers:Person list
}
An example RecordPath might be
let birthYearPath = {
    Get = fun (team:Team) -> team.TeamMembers |> List.map (fun p -> p.BirthDate.Year)
    Path = "$.TeamMember[*].BirthDate.Year" //using mariadb format for json path
}
Is there some way of letting a library user create this record without ever actually needing to specify the string explicitly? Ideally there is some strongly typed way for the user to specify the fields involved. Maybe some kind of clever use of reflection?
It just occurred to me that with a language that supports macros, this would be possible. But can it be done in F#?
PS: I notice that I left out the s in "TeamMembers" in the path. This is the kind of thing I want to guard against to make it easier on the user.
As you noted in the comments, F# has a quotation mechanism that lets you do this. You can create quotations explicitly using the <@ ... @> notation or implicitly using a somewhat more elegant automatic quoting mechanism. The quotations are fairly close representations of the F# code, so converting them to the desired path format is not going to be easy, but I think it can be done.
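To give a feel for what a quotation looks like as data, here is a minimal sketch using the explicit <@ ... @> form. The record Pet, its field Nickname and the helper propertyName are hypothetical names used only for this illustration; they are not part of your code.

```fsharp
open Microsoft.FSharp.Quotations
open Microsoft.FSharp.Quotations.Patterns

// Hypothetical record, used only for this illustration.
type Pet = { Nickname: string }

// An explicit quotation: the lambda is captured as an expression tree.
let quoted : Expr<Pet -> string> = <@ fun (pet: Pet) -> pet.Nickname @>

// Read the name of the accessed property back out of the tree.
let propertyName (e: Expr) =
    match e with
    | Lambda (_, PropertyGet (Some (Var _), prop, [])) -> prop.Name
    | other -> failwithf "Unexpected expression: %A" other

printfn "%s" (propertyName (quoted :> Expr))
```

A record field access shows up as a PropertyGet node, which is why the property name can be recovered without writing it as a string.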
I tried to get this to work at least for your small example. First, I needed a helper function that does two transformations on the code and turns:
let x = e1 in e2 into e2[x <- e1] (using the notation e2[x <- e1] to mean a substitution, i.e. expression e2 with all occurrences of x replaced by e1)
e1 |> fun x -> e2 into e2[x <- e1]
This is all I needed for your example, but it's likely you'll need a few more cases:
open Microsoft.FSharp.Quotations
let rec simplify dict e =
    let e' = simplifyOne dict e
    if e' <> e then simplify dict e' else e'
and simplifyOne dict = function
    | Patterns.Call(None, op, [e; Patterns.Lambda(v, body)])
            when op.Name = "op_PipeRight" ->
        simplify (Map.add v e dict) body
    | Patterns.Let(v, e, body) -> simplify (Map.add v e dict) body
    | ExprShape.ShapeVar(v) when Map.containsKey v dict -> dict.[v]
    | ExprShape.ShapeVar(v) -> Expr.Var(v)
    | ExprShape.ShapeLambda(v, e) -> Expr.Lambda(v, simplify dict e)
    | ExprShape.ShapeCombination(o, es) ->
        ExprShape.RebuildShapeCombination(o, List.map (simplify dict) es)
With this pre-processing, I managed to write an extractPath function like this:
let rec extractPath var = function
    | Patterns.Call(None, op, [Patterns.Lambda(v, body); inst]) when op.Name = "Map" ->
        extractPath var inst + "[*]." + extractPath v.Name body
    | Patterns.PropertyGet(Some(Patterns.Var v), p, []) when v.Name = var -> p.Name
    | Patterns.PropertyGet(Some e, p, []) -> extractPath var e + "." + p.Name
    | e -> failwithf "Unexpected expression: %A" e
This looks for (1) a call to map function, (2) a property access on a variable that represents the data source and (3) a property access where the instance has some more property accesses.
The following now works for your small example (but probably for nothing else!)
type Path =
    static member Make([<ReflectedDefinition(true)>] f:Expr<'T -> 'R>) =
        match f with
        | Patterns.WithValue(f, _, Patterns.Lambda(v, body)) ->
            { Get = f :?> 'T -> 'R
              Path = "$." + extractPath v.Name (simplify Map.empty body) }
        | _ -> failwith "Unexpected argument"
Path.Make(fun (team:Team) -> team.TeamMembers |> List.map (fun p -> p.BirthDate.Year))
The way I solved this is
let jsonPath userExpr =
    let rec innerLoop expr state =
        match expr with
        |Patterns.Lambda(_, body) ->
            innerLoop body state
        |Patterns.PropertyGet(Some parent, propInfo, []) ->
            sprintf ".%s%s" propInfo.Name state |> innerLoop parent
        |Patterns.Call (None, _, expr1::[Patterns.Let (v, expr2, _)]) when v.Name = "mapping"->
            let parentPath = innerLoop expr1 "[*]"
            let childPath = innerLoop expr2 ""
            parentPath + childPath
        |ExprShape.ShapeVar x ->
            state
        |_ ->
            failwithf "Unsupported expression: %A" expr
    innerLoop userExpr "" |> sprintf "$%s"
type Path =
    static member Make([<ReflectedDefinition(true)>] f:Expr<'T -> 'R>) =
        match f with
        |Patterns.WithValue(f, _, expr) ->
            let path = jsonPath expr
            {
                Get = f :?> 'T -> 'R
                Path = path
            }
        | _ -> failwith "Unexpected argument"
Caveat: I don't know enough about these techniques to tell if Tomas' answer performs better in some edge cases than mine.

Mutually recursive let bindings

I'm trying to implement a parser that looks something like this:
open System
type ParseResult<'a> =
    {
        Result : Option<'a>;
        Rest : string
    }
let Fail = fun input -> { Result = None; Rest = input }
let Return a = fun input -> { Result = Some a; Rest = input }
let ThenBind p f =
    fun input ->
        let r = p input
        match r.Result with
        | None -> { Result = None; Rest = input } // Recreate the result since p returns a ParseResult<'a>
        | _ -> (f r.Result) r.Rest
let Then p1 p2 = ThenBind p1 (fun r -> p2)
let Or p1 p2 =
    fun input ->
        let r = p1 input
        match r.Result with
        | None -> p2 input
        | _ -> r
let rec Chainl1Helper a p op =
    Or
    <| ThenBind op (fun f ->
        ThenBind p (fun y ->
            Chainl1Helper (f.Value a y.Value) p op))
    <| Return a
let Chainl1 p op = ThenBind p (fun x -> Chainl1Helper x.Value p op)
let rec Chainr1 p op =
    ThenBind p (fun x ->
        Or
            (ThenBind op (fun f ->
                ThenBind (Chainr1 p op) (fun y ->
                    Return (f.Value x.Value y.Value))))
            (Return x.Value))
let Next = fun input ->
    match input with
    | null -> { Result = None; Rest = input }
    | "" -> { Result = None; Rest = input }
    | _ -> { Result = Some input.[0]; Rest = input.[1..] } // consume the first character
let Sat predicate = ThenBind Next (fun n -> if predicate n.Value then Return n.Value else Fail)
let Digit = ThenBind (Sat Char.IsDigit) (fun c -> Return <| float c.Value)
let rec NatHelper i =
    Or
        (ThenBind Digit (fun x ->
            NatHelper (float 10 * i + x.Value) ))
        (Return i)
let Nat = ThenBind Digit (fun d -> NatHelper d.Value)
let LiteralChar c = Sat (fun x -> x = c)
let rec Literal input token =
    match input with
    | "" -> Return token
    | _ -> Then (LiteralChar input.[0]) (Literal input.[1..] token)
let AddSub =
    Or
    <| ThenBind (LiteralChar '+') (fun c -> Return (+))
    <| ThenBind (LiteralChar '-') (fun c -> Return (-))
let MulDiv =
    Or
    <| ThenBind (LiteralChar '*') (fun c -> Return (*))
    <| ThenBind (LiteralChar '/') (fun c -> Return (/))
let Exp = ThenBind (LiteralChar '^') (fun c -> Return ( ** ))
let rec Expression = Chainl1 Term AddSub
and Term = Chainl1 Factor MulDiv
and Factor = Chainr1 Part Exp
and Part = Or Nat Paren
and Paren =
    Then
    <| LiteralChar '('
    <| ThenBind Expression (fun e ->
        Then (LiteralChar ')') (Return e.Value))
The last functions are mutually recursive in their definitions. Expression's definition depends on Term, which depends on Factor, which depends on Part, which depends on Paren, which depends on Expression.
When I try to compile this, I get an error about mutually recursive definitions with the suggestion to make Expression lazy or a function. I tried both of those, and I get a cryptic InvalidOperationException with both that says something about ValueFactory attempting to access the Value property.
In general, F# lets you use let rec .. and .. not just for defining mutually recursive functions, but also for defining mutually recursive values. This means that you might be able to write something like this:
let rec Expression = Chainl1 Term AddSub
and Paren =
    Then
    <| LiteralChar '('
    <| ThenBind Expression (fun e ->
        Then (LiteralChar ')') (Return e.Value))
and Part = Or Nat Paren
and Factor = Chainr1 Part Exp
and Term = Chainl1 Factor MulDiv
However, this only works if the computation is not evaluated immediately (because then the recursive definition would not make sense). This very much depends on the library you're using here (or on the rest of your code). But you can try the above and see if that works; if not, you'll need to provide more details.
EDIT: In the updated example, there is an immediate loop in your recursive definition. You need to delay some part of the definition using fun _ -> ... so that not everything needs to be evaluated at once. In your example, you can do that by replacing Then with ThenBind in the definition of Paren:
let rec Expression = Chainl1 Term AddSub
and Term = Chainl1 Factor MulDiv
and Factor = Chainr1 Part Exp
and Part = Or Nat Paren
and Paren =
    ThenBind
        (LiteralChar '(')
        (fun _ -> ThenBind Expression (fun e ->
            Then (LiteralChar ')') (Return e.Value)))
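The same idea of delaying evaluation can be shown on a much smaller, self-contained example. This is only a sketch with its own simplified parser type, not your ParseResult code; the names Parser, pchar, bind, ret, orElse and depth are chosen for the illustration. The recursive parser is written as a function of its input, so the recursive reference is only evaluated once some input arrives.

```fsharp
// A parser takes the input and returns a value plus the remaining input.
type Parser<'a> = string -> ('a * string) option

let pchar (c: char) : Parser<char> =
    fun input ->
        if input.Length > 0 && input.[0] = c then Some (c, input.Substring(1))
        else None

let ret (x: 'a) : Parser<'a> = fun input -> Some (x, input)

let bind (p: Parser<'a>) (f: 'a -> Parser<'b>) : Parser<'b> =
    fun input ->
        match p input with
        | Some (a, rest) -> f a rest
        | None -> None

let orElse (p1: Parser<'a>) (p2: Parser<'a>) : Parser<'a> =
    fun input ->
        match p1 input with
        | Some _ as result -> result
        | None -> p2 input

// Nesting depth of strings such as "(())".
// Because depth takes its input as a parameter, the combinators that
// mention depth are only built when the function is called, so there is
// no immediate loop in the definition.
let rec depth (input: string) : (int * string) option =
    let nested =
        bind (pchar '(') (fun _ ->
            bind depth (fun d ->
                bind (pchar ')') (fun _ -> ret (d + 1))))
    orElse nested (ret 0) input

printfn "%A" (depth "(())")
```

Writing depth as a plain value, without the input parameter, would force the right-hand side to be evaluated while depth itself is still being defined, which is the situation the compiler warns about.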

Printing a list of lists in OCaml

So I am trying to print a list of lists that would look like this:
[0;0;0;0;0];
[0;0;0;0;0];
[0;0;1;0;0];
[0;0;0;0;0];
I can use as many functions as necessary, but only one function may use a print function. Here is what I have so far:
let rec rowToString(row) =
if (row == []) then []
else string_of_int(List.hd row) :: ";" :: rowToString(List.tl row);;
let rec pp_my_image s =
print_list(rowToString(List.hd s)) :: pp_my_image(List.tl s);;
I know this is wrong, but I can't figure out a way to do it.
Here is one way to do it:
let rec rowToString r =
  match r with
  | [] -> ""
  | h :: [] -> string_of_int h
  | h :: t -> string_of_int h ^ ";" ^ (rowToString t)
let rec imageToString i =
  match i with
  | [] -> ""
  | h :: t -> "[" ^ (rowToString h) ^ "];\n" ^ (imageToString t)
let pp_my_image s =
  print_string (imageToString s)
The rowToString function will create a string with the items in each inner list. Notice that case h :: [] is separated so that a semicolon is not added after the last item.
The imageToString function will create a string for each inner list with a call to rowToString. It will surround the result of each string with brackets and add a semicolon and newline to the end.
pp_my_image will simply convert the image to a string and print the result.

Graph with sets as vertices

I have a tiny grammar represented as a variant type term with strings that are tokens/part of tokens (type term).
Given expressions from the grammar, I collect all strings from the expressions and pack them into sets (function vars). Finally, I want to create a graph with these sets as vertices (lines 48-49).
For some reason, the graph created this way does not recognise sets containing the same variables and creates multiple vertices with the same content. I don't really understand why this is happening.
Here is a minimal working example with this behaviour:
(* demo.ml *)
type term =
| Var of string
| List of term list * string option
| Tuple of term list
module SSet = Set.Make(
struct
let compare = String.compare
type t = string
end)
let rec vars = function
| Var v -> SSet.singleton v
| List (x, tail) ->
let tl = match tail with
| None -> SSet.empty
| Some var -> SSet.singleton var in
SSet.union tl (List.fold_left SSet.union SSet.empty (List.map vars x))
| Tuple x -> List.fold_left SSet.union SSet.empty (List.map vars x)
module Node = struct
type t = SSet.t
let compare = SSet.compare
let equal = SSet.equal
let hash = Hashtbl.hash
end
module G = Graph.Imperative.Digraph.ConcreteBidirectional(Node)
(* dot output for the graph for illustration purposes *)
module Dot = Graph.Graphviz.Dot(struct
include G
let edge_attributes _ = []
let default_edge_attributes _ = []
let get_subgraph _ = None
let vertex_attributes _ = []
let vertex_name v = Printf.sprintf "{%s}" (String.concat ", " (SSet.elements v))
let default_vertex_attributes _ = []
let graph_attributes _ = []
end)
let _ =
(* creation of two terms *)
let a, b = List ([Var "a"], Some "b"), Tuple [Var "a"; Var "b"] in
(* get strings from terms packed into sets *)
let avars, bvars = vars a, vars b in
let g = G.create () in
G.add_edge g avars bvars;
Printf.printf "The content is the same: [%s] [%s]\n"
(String.concat ", " (SSet.elements avars))
(String.concat ", " (SSet.elements bvars));
Printf.printf "compare/equal output: %d %b\n"
(SSet.compare avars bvars)
(SSet.equal avars bvars);
Printf.printf "Hash values are different: %d %d\n"
(Hashtbl.hash avars) (Hashtbl.hash bvars);
Dot.fprint_graph Format.str_formatter g;
Printf.printf "Graph representation:\n%s" (Format.flush_str_formatter ())
In order to compile, type ocamlc -c -I +ocamlgraph demo.ml; ocamlc -I +ocamlgraph graph.cma demo.cmo. When the program is executed you get this output:
The content is the same: [a, b] [a, b]
compare/equal output: 0 true
Hash values are different: 814436103 1017954833
Graph representation:
digraph G {
{a, b};
{a, b};
{a, b} -> {a, b};
{a, b} -> {a, b};
}
To sum up, I am curious why there are non-equal hash values for sets and two identical vertices are created in the graph, despite the fact these sets are equal by all other means.
I suspect the general answer is that OCaml's built-in hashing is based on rather physical properties of a value, while set equality is a more abstract notion. If you represent sets as ordered binary trees, there are many trees that represent the same set (as is well known). These will be equal as sets but might very well hash to different values.
If you want hashing to work for sets, you might have to supply your own function.
As Jeffrey pointed out, it seems that the problem is in the definition of the hash function that is part of Node module.
Changing it to let hash x = Hashtbl.hash (SSet.elements x) fixed the issue.
