How to split a string on spaces in Clean?

How to split a string on spaces in Clean? - functional-programming

I'm a newbie with functional programming and Clean. I want to split a string on whitespace, like the words function in Haskell.
words :: String -> [String]
input: "my separated list "
output: ["my","separated","list"]
This is the definition in Haskell:
words :: String -> [String]
words s = case dropWhile {-partain:Char.-}isSpace s of
"" -> []
s' -> w : words s''
where (w, s'') =
break {-partain:Char.-}isSpace s'
But Clean doesn't have break, and I dont know what it means, and how to implement it in Clean:
s' -> w : words s''
where (w, s'')

As the StdEnvApi document advises you should convert the String to a list to use the StdList API functions (section 6, page 20).
This results in something like this:
splitString :: String -> [String]
splitString x = [foldr (+++) "" i\\i<- splitString` (fromString x)]
where
splitString` :: [String] -> [[String]]
splitString` x = let (p, n) = span ((<>) " ") x in
if (isEmpty n) [p] [p:splitString` (tl n)]

Related

A function that compare a two lists of string

I am a new at F# and i try to do this task:
Make a function compare : string list -> string list -> int that takes two string lists and returns: -1, 0 or 1
Please help. I spend a lot of time, and i can not understand how to implement this task.

Given the task I assume what your professor wants to teach you with this exercise. I'll try to give you a starting point without
Confusing you
Presenting a 'done-deal' solution
I assume the goal of this task is to work with recursive functions and pattern matching to element-wise compare their elements. It could looks somewhat like this here
open System
let aList = [ "Apple"; "Banana"; "Coconut" ]
let bList = [ "Apple"; "Banana"; "Coconut" ]
let cList = [ "Apple"; "Zebra" ]
let rec doSomething f (a : string list) (b : string list) =
match (a, b) with
| ([], []) ->
printfn "Both are empty"
| (x::xs, []) ->
printfn "A has elements (we can unpack the first element as x and the rest as xs) and B is empty"
| ([], x::xs) ->
printfn "A is empty and B has elements (we can unpack the first element as x and the rest as xs)"
| (x::xs, y::ys) ->
f x y
printfn "Both A and B have elements. We can unpack them as the first elements x and y and their respective tails xs and ys"
doSomething f xs ys
let isItTheSame (a : string) (b : string) =
if String.Equals(a, b) then
printfn "%s is equals to %s" a b
else
printfn "%s is not equals to %s" a b
doSomething isItTheSame aList bList
doSomething isItTheSame aList cList
The example has three different lists, two of them being equal and one of them being different. The doSomething function takes a function (string -> string -> unit) and two lists of strings.
Within the function you see a pattern match as well as a recursive call of doSomething in the last match block. The signatures aren't exactly what you need and you might want to think about how to change the parametrization for cases where you don't want to stop the recursion (the last match block - if the strings are equal you want to keep on comparing, right?).
Just take the code and try it out in FSI. I'm confident, that you'll find the solution 🙂

In F# many collections are comparable if their element type is:
let s1 = [ "a"; "b" ]
let s2 = [ "foo"; "bar" ]
compare s1 s2 // -5
let f1 = [ (fun () -> 1); fun () -> 2 ]
let f2 = [ (fun () -> 3); fun () -> 42 ]
// compare f1 f2 (* error FS0001: The type '(unit -> int)' does not support the 'comparison' constraint. *)
so
let slcomp (s1 : string list) s2 = compare s1 s2 |> sign
Posting for reference as the original question is answered already.

Parse the string into a list of tuples

I am looking for a piece of code in F# that can parse this type of string:
"x=1,y=42,A=[1,3,4,8]"
into a list of tuples that looks like this:
[("x",1);("y",42);("A",1);("A",3);("A",4);("A",8)]
Thanks in advance :)

You can quite nicely solve this using the FParsec parser combinator library. This is manageable using regular expressions, but it's not very elegant. Parser combinators make it very clear what the grammar of the inputs that you can handle is. You can also easily add other features like whitespace.
The following actually produces a list of string * Value pairs where Value is a new data type, corresponding to the possible right-hand-sides in the input:
type Value = Int of int | List of int list
Now, you can do the parsing using the following:
let ident = identifier (IdentifierOptions())
let rhs =
// Right-hand-side is either an integer...
( pint32 |>> Int ) <|>
// Or a list [ .. ] of integers separated by ','
( pchar '[' >>. (sepBy pint32 (pchar ',')) .>> pchar ']' |>> List )
let tuple =
// A single tuple is an identifier = right-hand-side
ident .>> pchar '=' .>>. rhs
let p =
// The input is a comma separated list of tuples
sepBy tuple (pchar ',')
run p "x=1,y=42,A=[1,3,4,8]"

Sometimes a named regex makes for readable code, even if not the regex.
(?<id>\w+)=((\[((?<list>(\d+))*,?\s*)*\])|(?<number>\d+))
This reads: Identifier = [Number followed by comma or space, zero or more] | Number
let parse input =
[
let regex = Regex("(?<id>\w+)=((\[((?<list>(\d+))*,?\s*)*\])|(?<number>\d+))")
let matches = regex.Matches input
for (expr : Match) in matches do
let group name = expr.Groups.[string name]
let id = group "id"
let list = group "list"
let number = group "number"
if list.Success then
for (capture : Capture) in list.Captures do
yield (id.Value, int capture.Value)
else if number.Success then
yield (id.Value, int number.Value)
]
Test
let input = "var1=1, var2=2, list=[1, 2, 3, 4], single=[1], empty=[], bad=[,,], bad=var"
printfn "%A" (parse input)
Output
[("var1", 1); ("var2", 2); ("list", 1); ("list", 2); ("list", 3); ("list", 4); "single", 1)]

It's quite advisable to follow the approach outlined by Tomas Petricek's answer, employing the established FParsec parser combinator library.
For educational purposes, you might want to roll your own parser combinator, and for this endeavor Scott W.'s blog ("Understanding parser combinators", and "Building a useful set of parser combinators") contains valuable information.
The parsing looks quite similar:
// parse a list of integers enclosed in brackets and separated by ','
let plist = pchar '[' >>. sepBy1 pint (pchar ',') .>> pchar ']'
// parser for the right hand side, singleton integer or a list of integers
let intOrList = pint |>> (fun x -> [x]) <|> plist
// projection for generation of string * integer tuples
let ungroup p =
p |>> List.collect (fun (key, xs) -> xs |> List.map (fun x -> key, x))
// parser for an input of zero or more string value pairs separated by ','
let parser =
sepBy (letters .>> pchar '=' .>>. intOrList) (pchar ',')
|> ungroup
"x=1,y=42,A=[1,3,4,8]"
|> run parser
// val it : ((String * int) list * string) option =
// Some ([("x", 1); ("y", 42); ("A", 1); ("A", 3); ("A", 4); ("A", 8)], "")
This simple grammar still requires 15 or so parser combinators. Another difference is that for simplicity's sake the Parser type has been modeled on FSharp's Option type.
type Parser<'T,'U> = Parser of ('T -> ('U * 'T) option)
let run (Parser f1) x = // run the parser with input
f1 x
let returnP arg = // lift a value to a Parser
Parser (fun x -> Some(arg, x))
let (>>=) (Parser f1) f = // apply parser-producing function
Parser(f1 >> Option.bind (fun (a, b) -> run (f a) b))
let (|>>) p f = // apply function to value inside Parser
p >>= (f >> returnP)
let (.>>.) p1 p2 = // andThen combinator
p1 >>= fun r1 ->
p2 >>= fun r2 ->
returnP (r1, r2)
let (.>>) p1 p2 = // andThen, but keep first value only
(p1 .>>. p2) |>> fst
let (>>.) p1 p2 = // andThen, keep second value only
(p1 .>>. p2) |>> snd
let pchar c = // parse a single character
Parser (fun s ->
if String.length s > 0 && s.[0] = c then Some(c, s.[1..])
else None )
let (<|>) (Parser f1) (Parser f2) = // orElse combinator
Parser(fun arg ->
match f1 arg with None -> f2 arg | res -> res )
let choice parsers = // choose any of a list of combinators
List.reduce (<|>) parsers
let anyOf = // choose any of a list of characters
List.map pchar >> choice
let many (Parser f) = // matches zero or more occurrences
let rec aux input =
match f input with
| None -> [], input
| Some (x, rest1) ->
let xs, rest2 = aux rest1
x::xs, rest2
Parser (fun arg -> Some(aux arg))
let many1 p = // matches one or more occurrences of p
p >>= fun x ->
many p >>= fun xs ->
returnP (x::xs)
let stringP p = // converts list of characters to string
p |>> (fun xs -> System.String(List.toArray xs))
let letters = // matches one or more letters
many1 (anyOf ['A'..'Z'] <|> anyOf ['a'..'z']) |> stringP
let pint = // matches an integer
many1 (anyOf ['0'..'9']) |> stringP |>> int
let sepBy1 p sep = // matches p one or more times, separated by sep
p .>>. many (sep >>. p) |>> (fun (x,xs) -> x::xs)
let sepBy p sep = // matches p zero or more times, separated by sep
sepBy1 p sep <|> returnP []

Try this:
open System.Text.RegularExpressions
let input = "x=1,y=42,A=[1,3,4,8]"
Regex.Split(input,",(?=[A-Za-z])") //output: [|"x=1"; "y=42"; "A=[1,3,4,8]"|]
|> Array.collect (fun x ->
let l,v = Regex.Split(x,"=") |> fun t -> Array.head t,Array.last t //label and value
Regex.Split(v,",") |> Array.map (fun x -> l,Regex.Replace(x,"\[|\]","") |> int))
|> List.ofArray

Printing a list of lists in OCaml

So I am trying to print a list of lists that would look like this:
[0;0;0;0;0];
[0;0;0;0;0];
[0;0;1;0;0];
[0;0;0;0;0];
I can use as many functions as necessary, but only one function may use a print function. Here is what I have so far:
let rec rowToString(row) =
if (row == []) then []
else string_of_int(List.hd row) :: ";" :: rowToString(List.tl row);;
let rec pp_my_image s =
print_list(rowToString(List.hd s)) :: pp_my_image(List.tl s);;
I know this is wrong, but I can't figure out a way to do it.

Here is one way to do it:
let rec rowToString r =
match r with
| [] -> ""
| h :: [] -> string_of_int h
| h :: t -> string_of_int h ^ ";" ^ (rowToString t)
let rec imageToString i =
match i with
| [] -> ""
| h :: t -> "[" ^ (rowToString h) ^ "];\n" ^ (imageToString t)
let pp_my_image s =
print_string (imageToString s)
The rowToString function will create a string with the items in each inner list. Notice that case h :: [] is separated so that a semicolon is not added after the last item.
The imageToString function will create a string for each inner list with a call to rowToString. It will surround the result of each string with brackets and add a semicolon and newline to the end.
pp_my_image will simply convert the image to a string and print the result.

Graph with sets as vertices

I have a tiny grammar represented as a variant type term with strings that are tokens/part of tokens (type term).
Given expressions from the grammar, I am collecting all strings from expressions and pack them into sets (function vars). Finally, I want to create some graph with these sets as vertices (lines 48-49).
For some reason, the graph created in the such sophisticated way does not recognise sets containing same variables and creates multiple vertices with the same content. I don't really understand why this is happening.
Here is minimal working example with this behaviour:
(* demo.ml *)
type term =
| Var of string
| List of term list * string option
| Tuple of term list
module SSet = Set.Make(
struct
let compare = String.compare
type t = string
end)
let rec vars = function
| Var v -> SSet.singleton v
| List (x, tail) ->
let tl = match tail with
| None -> SSet.empty
| Some var -> SSet.singleton var in
SSet.union tl (List.fold_left SSet.union SSet.empty (List.map vars x))
| Tuple x -> List.fold_left SSet.union SSet.empty (List.map vars x)
module Node = struct
type t = SSet.t
let compare = SSet.compare
let equal = SSet.equal
let hash = Hashtbl.hash
end
module G = Graph.Imperative.Digraph.ConcreteBidirectional(Node)
(* dot output for the graph for illustration purposes *)
module Dot = Graph.Graphviz.Dot(struct
include G
let edge_attributes _ = []
let default_edge_attributes _ = []
let get_subgraph _ = None
let vertex_attributes _ = []
let vertex_name v = Printf.sprintf "{%s}" (String.concat ", " (SSet.elements v))
let default_vertex_attributes _ = []
let graph_attributes _ = []
end)
let _ =
(* creation of two terms *)
let a, b = List ([Var "a"], Some "b"), Tuple [Var "a"; Var "b"] in
(* get strings from terms packed into sets *)
let avars, bvars = vars a, vars b in
let g = G.create () in
G.add_edge g avars bvars;
Printf.printf "The content is the same: [%s] [%s]\n"
(String.concat ", " (SSet.elements avars))
(String.concat ", " (SSet.elements bvars));
Printf.printf "compare/equal output: %d %b\n"
(SSet.compare avars bvars)
(SSet.equal avars bvars);
Printf.printf "Hash values are different: %d %d\n"
(Hashtbl.hash avars) (Hashtbl.hash bvars);
Dot.fprint_graph Format.str_formatter g;
Printf.printf "Graph representation:\n%s" (Format.flush_str_formatter ())
In order to compile, type ocamlc -c -I +ocamlgraph demo.ml; ocamlc -I +ocamlgraph graph.cma demo.cmo. When the program is executed you get this output:
The content is the same: [a, b] [a, b]
compare/equal output: 0 true
Hash values are different: 814436103 1017954833
Graph representation:
digraph G {
{a, b};
{a, b};
{a, b} -> {a, b};
{a, b} -> {a, b};
}
To sum up, I am curious why there are non-equal hash values for sets and two identical vertices are created in the graph, despite the fact these sets are equal by all other means.

I suspect the general answer is that OCaml's built-in hashing is based on rather physical properties of a value, while set equality is a more abstract notion. If you represent sets as ordered binary trees, there are many trees that represent the same set (as is well known). These will be equal as sets but might very well hash to different values.
If you want hashing to work for sets, you might have to supply your own function.

As Jeffrey pointed out, it seems that the problem is in the definition of the hash function that is part of Node module.
Changing it to let hash x = Hashtbl.hash (SSet.elements x) fixed the issue.

functional programming with less recursion?

I am currently doing reasonably well in functional programming using F#. I tend, however, to do a lot of programming using recursion, when it seems that there are better idioms in the F#/functional programming community. So in the spirit of learning, is there a better/more idiomatic way of writing the function below without recursion?
let rec convert line =
if line.[0..1] = " " then
match convert line.[2..] with
| (i, subline) -> (i+1, subline)
else
(0, line)
with results such as:
> convert "asdf";;
val it : int * string = (0, "asdf")
> convert " asdf";;
val it : int * string = (1, "asdf")
> convert " asdf";;
val it : int * string = (3, "asdf")

Recursion is the basic mechanism for writing loops in functional languages, so if you need to iterate over characters (as you do in your sample), then recursion is what you need.
If you want to improve your code, then you should probably avoid using line.[2..] because that is going to be inefficient (strings are not designed for this kind of processing). It is better to convert the string to a list and then process it:
let convert (line:string) =
let rec loop acc line =
match line with
| ' '::' '::rest -> loop (acc + 1) rest
| _ -> (acc, line)
loop 0 (List.ofSeq line)
You can use various functions from the standard library to implement this in a more shorter way, but they are usually recursive too (you just do not see the recursion!), so I think using functions like Seq.unfold and Seq.fold is still recursive (and it looks way more complex than your code).
A more concise approach using standard libraries is to use the TrimLeft method (see comments), or using standard F# library functions, do something like this:
let convert (line:string) =
// Count the number of spaces at the beginning
let spaces = line |> Seq.takeWhile (fun c -> c = ' ') |> Seq.length
// Divide by two - we want to count & skip two-spaces only
let count = spaces / 2
// Get substring starting after all removed two-spaces
count, line.[(count * 2) ..]
EDIT Regarding the performance of string vs. list processing, the problem is that slicing allocates a new string (because that is how strings are represented on the .NET platform), while slicing a list just changes a reference. Here is a simple test:
let rec countList n s =
match s with
| x::xs -> countList (n + 1) xs
| _ -> n
let rec countString n (s:string) =
if s.Length = 0 then n
else countString (n + 1) (s.[1 ..])
let l = [ for i in 1 .. 10000 -> 'x' ]
let s = new System.String('x', 10000)
#time
for i in 0 .. 100 do countList 0 l |> ignore // 0.002 sec (on my machine)
for i in 0 .. 100 do countString 0 s |> ignore // 5.720 sec (on my machine)

Because you traverse the string in a non-uniform way, a recursive solution is much more suitable in this example. I would rewrite your tail-recursive solution for readability as follows:
let convert (line: string) =
let rec loop i line =
match line.[0..1] with
| " " -> loop (i+1) line.[2..]
| _ -> i, line
loop 0 line
Since you asked, here is a (bizarre) non-recursive solution :).
let convert (line: string) =
(0, line) |> Seq.unfold (fun (i, line) ->
let subline = line.[2..]
match line.[0..1] with
| " " -> Some((i+1, subline), (i+1, subline))
| _ -> None)
|> Seq.fold (fun _ x -> x) (0, line)

Using tail recursion, it can be written as
let rec convert_ acc line =
if line.[0..1] <> " " then
(acc, line)
else
convert_ (acc + 1) line.[2..]
let convert = convert_ 0
still looking for a non-recursive answer, though.

Here's a faster way to write your function -- it checks the characters explicitly instead of using string slicing (which, as Tomas said, is slow); it's also tail-recursive. Finally, it uses a StringBuilder to create the "filtered" string, which will provide better performance once your input string reaches a decent length (though it'd be a bit slower for very small strings due to the overhead of creating the StringBuilder).
let convert' str =
let strLen = String.length str
let sb = System.Text.StringBuilder strLen
let rec convertRec (count, idx) =
match strLen - idx with
| 0 ->
count, sb.ToString ()
| 1 ->
// Append the last character in the string to the StringBuilder.
sb.Append str.[idx] |> ignore
convertRec (count, idx + 1)
| _ ->
if str.[idx] = ' ' && str.[idx + 1] = ' ' then
convertRec (count + 1, idx + 2)
else
sb.Append str.[idx] |> ignore
convertRec (count, idx + 1)
// Call the internal, recursive implementation.
convertRec (0, 0)

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

How to split a string on spaces in Clean? - functional-programming

Related

A function that compare a two lists of string

Parse the string into a list of tuples

Printing a list of lists in OCaml

Graph with sets as vertices

functional programming with less recursion?

Categories

Resources