To learn some Prolog (I'm using GNU Prolog) and grok its parsing abilities, I am starting by writing a Lisp (or S-expression, if I'm being exact) tokenizer, which given a set of tokens like ['(', 'f', 'o', 'o', ')'] should produce ['(', 'foo', ')']. It's not working as expected, which is why I'm here! I thought my thought process shined through in my pseudocode:
tokenize([current | rest], buffer, tokens):
if current is '(' or ')',
Tokenize the rest,
And the output will be the current token buffer,
Plus the parenthesis and the rest.
if current is ' ',
Tokenize the rest with a clean buffer,
And the output will be the buffer plus the rest.
if the tail is empty,
The output will be a one-element list containing the buffer.
otherwise,
Add the current character to the buffer,
And the output will be the rest tokenized, with a bigger buffer.
I translated that to Prolog like this:
tokenize([Char | Chars], Buffer, Tokens) :-
((Char = '(' ; Char = ')') ->
tokenize(Chars, '', Tail_Tokens),
Tokens is [Buffer, Char | Tail_Tokens];
Char = ' ' ->
tokenize(Chars, '', Tail_Tokens),
Tokens is [Buffer | Tail_Tokens];
Chars = [] -> Tokens is [Buffer];
atom_concat(Buffer, Char, New_Buffer),
tokenize(Chars, New_Buffer, Tokens)).
print_tokens([]) :- write('.').
print_tokens([T | N]) :- write(T), write(', '), print_tokens(N).
main :-
% tokenize(['(', 'f', 'o', 'o', '(', 'b', 'a', 'r', ')', 'b', 'a', 'z', ')'], '', Tokens),
tokenize(['(', 'f', 'o', 'o', ')'], '', Tokens),
print_tokens(Tokens).
When running the result, below, like this: gprolog --consult-file lisp_parser.pl it just tells me no. I traced main, and it gave me the stack trace below. I do not understand why tokenize fails for the empty case. I see that the buffer is empty since it was cleared with the previous ')', but even if Tokens is empty at that point in time, wouldn't Tokens accumulate a larger result recursively? Can someone who is good with Prolog give me a few tips here?
| ?- main.
no
| ?- trace.
The debugger will first creep -- showing everything (trace)
(1 ms) yes
{trace}
| ?- main.
1 1 Call: main ?
2 2 Call: tokenize(['(',f,o,o,')'],'',_353) ?
3 3 Call: tokenize([f,o,o,')'],'',_378) ?
4 4 Call: atom_concat('',f,_403) ?
4 4 Exit: atom_concat('',f,f) ?
5 4 Call: tokenize([o,o,')'],f,_429) ?
6 5 Call: atom_concat(f,o,_454) ?
6 5 Exit: atom_concat(f,o,fo) ?
7 5 Call: tokenize([o,')'],fo,_480) ?
8 6 Call: atom_concat(fo,o,_505) ?
8 6 Exit: atom_concat(fo,o,foo) ?
9 6 Call: tokenize([')'],foo,_531) ?
10 7 Call: tokenize([],'',_556) ?
10 7 Fail: tokenize([],'',_544) ?
9 6 Fail: tokenize([')'],foo,_519) ?
7 5 Fail: tokenize([o,')'],fo,_468) ?
5 4 Fail: tokenize([o,o,')'],f,_417) ?
3 3 Fail: tokenize([f,o,o,')'],'',_366) ?
2 2 Fail: tokenize(['(',f,o,o,')'],'',_341) ?
1 1 Fail: main ?
(1 ms) no
{trace}
| ?-
How about this. I think that's what you want to do, but let's use Definite Clause Grammars (which are just horn clauses with :- replaced by --> and two elided arguments holding the input character list and remaining character list. An example DCG rule:
rule(X) --> [c], another_rule(X), {predicate(X)}.
List processing rule rule//1 says: When you find character c in the input list, then continue list processing with another_rule//1, and when that worked out, call predicate(X) as normal.
Then:
% If we encounter a separator symbol '(' or ')', we commit to the
% clause using '!' (no point trying anything else, in particular
% not the clause for "other characters", tokenize the rest of the list,
% and when we have done that decide whether 'MaybeToken', which is
% "part of the leftmost token after '(' or ')'", should be retained.
% it is dropped if it is empty. The caller is then given an empty
% "part of the leftmost token" and the list of tokens, with '(' or ')'
% prepended: "tokenize('', [ '(' | MoreTokens] ) -->"
tokenize('', [ '(' | MoreTokens] ) -->
['('],
!,
tokenize(MaybeToken,Tokens),
{drop_empty(MaybeToken,Tokens,MoreTokens)}.
tokenize('',[')'|MoreTokens]) -->
[')'],
!,
tokenize(MaybeToken,Tokens),
{drop_empty(MaybeToken,Tokens,MoreTokens)}.
% No more characters in the input list (that's what '--> []' says).
% We succeed, with an empty token list and an empty buffer fro the
% leftmost token.
tokenize('',[]) --> [].
% If we find a 'Ch' that is not '(' or ')', then tokenize
% more of the list via 'tokenize(MaybeToken,Tokens)'. On
% returns 'MaybeToken' is a piece of the leftmost token found
% in that list, so we have to stick 'Ch' onto its start.
tokenize(LargerMaybeToken,Tokens) -->
[Ch],
tokenize(MaybeToken,Tokens),
{atom_concat(Ch,MaybeToken,LargerMaybeToken)}.
% ---
% This drops an empty "MaybeToken". If "MaybeToken" is
% *not* empty, it is actually a token and prepended to the list "Tokens"
% ---
drop_empty('',Tokens,Tokens) :- !.
drop_empty(MaybeToken,Tokens,[MaybeToken|Tokens]).
% -----------------
% Call the DCG using phrase/2
% -----------------
tokenize(Text,Result) :-
phrase( tokenize(MaybeToken,Tokens), Text ),
drop_empty(MaybeToken,Tokens,Result),!.
And so:
?- tokenize([h,e,l,l,o],R).
R = [hello].
?- tokenize([h,e,l,'(',l,')',o],R).
R = [hel,(,l,),o].
?- tokenize([h,e,l,'(',l,l,')',o],R).
R = [hel,(,ll,),o].
I think in GNU Prolog, the notation `hello` generates [h,e,l,l,o] directly.
I do not understand why tokenize fails for the empty case.
The reason anything fails in Prolog is because there is no clause that makes it true. If your only clause for tokenize is of the form tokenize([Char | Chars], ...), then no call of the form tokenize([], ...) will ever be able to match this clause, and since there are no other clauses, the call will fail.
So you need to add such a clause. But first:
:- set_prolog_flag(double_quotes, chars).
This allows you to write ['(', f, o, o, ')'] as "foo".
Also, you must plan for the case where the input is completely empty, or other cases where you must maybe emit a token for the buffer, but only if it is not '' (since there should be no '' tokens littering the result).
finish_buffer(Tokens, Buffer, TokensMaybeWithBuffer) :-
( Buffer = ''
-> TokensMaybeWithBuffer = Tokens
; TokensMaybeWithBuffer = [Buffer | Tokens] ).
For example:
?- finish_buffer(MyTokens, '', TokensMaybeWithBuffer).
MyTokens = TokensMaybeWithBuffer.
?- finish_buffer(MyTokens, 'foo', TokensMaybeWithBuffer).
TokensMaybeWithBuffer = [foo|MyTokens].
Note that you can prepend the buffer to the list of tokens, even if you don't yet know what that list of tokens is! This is the power of logical variables. The rest of the code uses this technique as well.
So, the case for the empty input:
tokenize([], Buffer, Tokens) :-
finish_buffer([], Buffer, Tokens).
For example:
?- tokenize([], '', Tokens).
Tokens = [].
?- tokenize([], 'foo', Tokens).
Tokens = [foo].
And the remaining cases:
tokenize([Parenthesis | Chars], Buffer, TokensWithParenthesis) :-
( Parenthesis = '('
; Parenthesis = ')' ),
finish_buffer([Parenthesis | Tokens], Buffer, TokensWithParenthesis),
tokenize(Chars, '', Tokens).
tokenize([' ' | Chars], Buffer, TokensWithBuffer) :-
finish_buffer(Tokens, Buffer, TokensWithBuffer),
tokenize(Chars, '', Tokens).
tokenize([Char | Chars], Buffer, Tokens) :-
Char \= '(',
Char \= ')',
Char \= ' ',
atom_concat(Buffer, Char, NewBuffer),
tokenize(Chars, NewBuffer, Tokens).
Note how I used separate clauses for the separate cases. This makes the code more readable, but it does have the drawback compared to (... -> ... ; ...) that the last clause must exclude characters handled by previous clauses. Once you have your code in this shape, and you're happy that it works, you can transform it into a form using (... -> ... ; ...) if you really want to.
Examples:
?- tokenize("(foo)", '', Tokens).
Tokens = ['(', foo, ')'] ;
false.
?- tokenize(" (foo)", '', Tokens).
Tokens = ['(', foo, ')'] ;
false.
?- tokenize("(foo(bar)baz)", '', Tokens).
Tokens = ['(', foo, '(', bar, ')', baz, ')'] ;
false.
Finally, and very importantly, the is operator is meant only for evaluation of arithmetic expressions. It will throw an exception when you apply it to anything that is not arithmetic. Unification is different from the evaluation of arithmetic expression. Unification is written as =.
?- X is 2 + 2.
X = 4.
?- X = 2 + 2.
X = 2+2.
?- X is [a, b, c].
ERROR: Arithmetic: `[a,b,c]' is not a function
ERROR: In:
ERROR: [20] throw(error(type_error(evaluable,...),_3362))
ERROR: [17] arithmetic:expand_function([a,b|...],_3400,_3402) at /usr/lib/swi-prolog/library/arithmetic.pl:175
ERROR: [16] arithmetic:math_goal_expansion(_3450 is [a|...],_3446) at /usr/lib/swi-prolog/library/arithmetic.pl:147
ERROR: [14] '$expand':call_goal_expansion([system- ...],_3512 is [a|...],_3492,_3494,_3496) at /usr/lib/swi-prolog/boot/expand.pl:863
ERROR: [13] '$expand':expand_goal(_3566 is [a|...],_3552,_3554,_3556,user,[system- ...],_3562) at /usr/lib/swi-prolog/boot/expand.pl:524
ERROR: [12] setup_call_catcher_cleanup('$expand':'$set_source_module'(user,user),'$expand':expand_goal(...,_3640,_3642,_3644,user,...,_3650),_3614,'$expand':'$set_source_module'(user)) at /usr/lib/swi-prolog/boot/init.pl:443
ERROR: [8] '$expand':expand_goal(user:(_3706 is ...),_3692,user:_3714,_3696) at /usr/lib/swi-prolog/boot/expand.pl:458
ERROR: [6] setup_call_catcher_cleanup('$toplevel':'$set_source_module'(user,user),'$toplevel':expand_goal(...,...),_3742,'$toplevel':'$set_source_module'(user)) at /usr/lib/swi-prolog/boot/init.pl:443
ERROR:
ERROR: Note: some frames are missing due to last-call optimization.
ERROR: Re-run your program in debug mode (:- debug.) to get more detail.
^ Call: (14) call('$expand':'$set_source_module'(user)) ? abort
% Execution Aborted
?- X = [a, b, c].
X = [a, b, c].
Related
I'm studying Standard ML and one of the exercices I have to do is to write a function called opPairs that receives a list of tuples of type int, and returns a list with the sum of each pair.
Example:
input: opPairs [(1, 2), (3, 4)]
output: val it = [3, 7]
These were my attempts, which are not compiling:
ATTEMPT 1
type T0 = int * int;
fun opPairs ((h:TO)::t) = let val aux =(#1 h + #2 h) in
aux::(opPairs(t))
end;
The error message is:
Error: unbound type constructor: TO
Error: operator and operand don't agree [type mismatch]
operator domain: {1:'Y; 'Z}
operand: [E]
in expression:
(fn {1=1,...} => 1) h
ATTEMPT 2
fun opPairs2 l = map (fn x => #1 x + #2 x ) l;
The error message is: Error: unresolved flex record (need to know the names of ALL the fields
in this context)
type: {1:[+ ty], 2:[+ ty]; 'Z}
The first attempt has a typo: type T0 is defined, where 0 is zero, but then type TO is referenced in the pattern, where O is the letter O. This gets rid of the "operand and operator do not agree" error, but there is a further problem. The pattern ((h:T0)::t) does not match an empty list, so there is a "match nonexhaustive" warning with the corrected type identifier. This manifests as an exception when the function is used, because the code needs to match an empty list when it reaches the end of the input.
The second attempt needs to use a type for the tuples. This is because the tuple accessor #n needs to know the type of the tuple it accesses. To fix this problem, provide the type of the tuple argument to the anonymous function:
fun opPairs2 l = map (fn x:T0 => #1 x + #2 x) l;
But, really it is bad practice to use #1, #2, etc. to access tuple fields; use pattern matching instead. Here is a cleaner approach, more like the first attempt, but taking full advantage of pattern matching:
fun opPairs nil = nil
| opPairs ((a, b)::cs) = (a + b)::(opPairs cs);
Here, opPairs returns an empty list when the input is an empty list, otherwise pattern matching provides the field values a and b to be added and consed recursively onto the output. When the last tuple is reached, cs is the empty list, and opPairs cs is then also the empty list: the individual tuple sums are then consed onto this empty list to create the output list.
To extend on exnihilo's answer, once you have achieved familiarity with the type of solution that uses explicit recursion and pattern matching (opPairs ((a, b)::cs) = ...), you can begin to generalise the solution using list combinators:
val opPairs = map op+
Can someone please explain the: "description of g"? How can f1 takes unit and returns an int & the rest i'm confused about too!!
(* Description of g:
* g takes f1: unit -> int, f2: string -> int and p: pattern, and returns
* an int. f1 and f2 are used to specify what number to be returned for
* each Wildcard and Variable in p respectively. The return value is the
* sum of all those numbers for all the patterns wrapped in p.
*)
datatype pattern = Wildcard
| Variable of string
| UnitP
| ConstP of int
| TupleP of pattern list
| ConstructorP of string * pattern
datatype valu = Const of int
| Unit
| Tuple of valu list
| Constructor of string * valu
fun g f1 f2 p =
let
val r = g f1 f2
in
case p of
Wildcard => f1 ()
| Variable x => f2 x
| TupleP ps => List.foldl (fn (p,i) => (r p) + i) 0 ps
| ConstructorP (_,p) => r p
| _ => 0
end
Wildcard matches everything and produces the empty list of bindings.
Variable s matches any value v and produces the one-element list holding (s,v).
UnitP matches only Unit and produces the empty list of bindings.
ConstP 17 matches only Const 17 and produces the empty list of bindings (and similarly for other integers).
TupleP ps matches a value of the form Tuple vs if ps and vs have the same length and for all i, the i-th element of ps matches the i-th element of vs. The list of bindings produced is all the lists from the nested pattern matches appended together.
ConstructorP(s1,p) matches Constructor(s2,v) if s1 and s2 are the same string (you can compare them with =) and p matches v. The list of bindings produced is the list from the nested pattern match. We call the strings s1 and s2 the constructor name.
Nothing else matches.
Can someone please explain the: "description of g"? How can f1 takes unit and returns an int & the rest i'm confused about too!!
The function g has type (unit → int) → (string → int) → pattern → int, so it takes three (curried) parameters of which two are functions and one is a pattern.
The parameters f1 and f2 must either be deterministic functions that always return the same constant, or functions with side-effects that can return an arbitrary integer / string, respectively, determined by external sources.
Since the comment speaks of "what number to be returned for each Wildcard and Variable", it sounds more likely that the f1 should return different numbers at different times (and I'm not sure what number refers to in the case of f2!). One definition might be this:
local
val counter = ref 0
in
fun uniqueInt () = !counter before counter := !counter + 1
fun uniqueString () = "s" ^ Int.toString (uniqueInt ())
end
Although this is just a guess. This definition only works up to Int.maxInt.
The comment describes g's return value as
[...] the sum of all those numbers for all the patterns wrapped in p.
Since the numbers are not ascribed any meaning, it doesn't seem like g serves any practical purpose but to compare the output of an arbitrarily given set of f1 and f2 against an arbitrary test that isn't given.
Catch-all patterns are often bad:
...
| _ => 0
Nothing else matches.
The reason is that if you extend pattern with additional types of patterns, the compiler will not notify you of a missing pattern in the function g; the catch-all will erroneously imply meaning for cases that are possibly yet undefined.
I'm completely new to Prolog and have a hard time understanding its unification system. My problem is as follows:
I have a Constraint integer, a Source array and a Target array (both of them are arrays of integers). I'd like to filter the elements in the Source array such that the remaining elements have a counterpart in the Target array so that the difference between the element in the Source array and the element in the Target array is exactly Constraint.
Example: Constraint = 1, Source = [1, 5, 7], Target = [2, 3, 4]. FilteredSource should be [1, 5], because abs(1 - 2) = 1 and abs(5 - 4) = 1. There's no 6 or 8 in Target so 7 is no good.
Depending on the arguments I use I either get wrong results or the query get's in an infinite loop.
So far, I have come up with this:
filterByDistanceConstraint(_Constraint, [], _Target, _).
filterByDistanceConstraint(Constraint, [SourceHead|SourceTail], Target, FilteredSource) :-
filterByDistanceConstraint(Constraint, SourceTail, Target, NewFilteredSource),
(
passesDistanceConstraint(Constraint, SourceHead, Target)
->
append(NewFilteredSource, [SourceHead], FilteredSource),
filterByDistanceConstraint(Constraint, SourceTail, Target, NewFilteredSource)
;
filterByDistanceConstraint(Constraint, SourceTail, Target, FilteredSource)
).
The passesDistanceConstraint part seems to work ok but I'll include it here for reference:
passesDistanceConstraint(_Constraint, _SourceHead, []) :-
fail.
passesDistanceConstraint(Constraint, SourceHead, [TargetHead|TargetTail]) :-
Distance is TargetHead - SourceHead,
( abs(Distance) =\= Constraint %test failed
-> passesDistanceConstraint(Constraint, SourceHead, TargetTail)
; write('Distance constraint passed for '), write(SourceHead), nl
).
I'm quite sure this is a trivial problem but I can't figure it out and I'm pulling my hair off. Thanks for any help!
This should work
filterByDistanceConstraint(_Constraint, [], _Target, []).
filterByDistanceConstraint(Constraint, [SourceHead|SourceTail], Target, FilteredSource) :-
( passesDistanceConstraint(Constraint, SourceHead, Target)
-> FilteredSource = [SourceHead|NewFilteredSource]
; FilteredSource = NewFilteredSource
),
filterByDistanceConstraint(Constraint, SourceTail, Target, NewFilteredSource).
Note the [] instead of _ in the first rule, and that there is only one recursive call in the second.
The 'filter' programming pattern is implemented in SWI-Prolog by include/3 or exclude/3, but to write in plain Prolog, since member/2 can act as an 'enumerator':
filterByDistanceConstraint(Constraint, Source, Target, FilteredSource) :-
findall(E, (member(E, Source), member(C, Target), abs(E - C) =:= Constraint), FilteredSource).
I have this code which finds double quotation marks and converts the inside of those quotation marks into a string. It manages to find the first quotation mark but fails to find the second so: "this" would be "this . How do I get it I can get this function to find the full string.
Maybe this is too obvious:
if (ch = #"\"") then SOME(String(x ^ "\""))
I do not really understand your code: you return the string just after the first occurence of the quotation mark, but this string has been built with the characters that you've found before it. Moreover, why do you return SOME(Error) instead of NONE?
You need to use a boolean variable to know when the first quotation mark has been seen and to stop when the second one is found. So I would write something like this:
fun parseString x inStr quote =
case (TextIO.input1 inStr, quote) of
(NONE, _) => NONE
| (SOME #"\"", true) => SOME x
| (SOME #"\"", false) => parseString x inStr true
| (SOME ch, true) => parseString (x ^ (String.str ch)) inStr quote
| (SOME _ , false) => parseString x inStr quote;
and initialize quote with false.
let _ as s = "abc" in s ^ "def"
So how should understand this?
I guess it is some kind of let pattern = expression thing?
First, what's the meaning/purpose/logic of let pattern = expression?
Also, in pattern matching, I know there is pattern as identifier usage, in let _ as s = "abc" in s ^ "def", _ is pattern, but behind as, it is an expression s = "abc" in s ^ "def", not an identifier, right?
edit:
finally, how about this: (fun (1 | 2) as i -> i + 1) 2, is this correct?
I know it is wrong, but why? fun pattern -> expression is allowed, right?
I really got lost here.
The grouping is let (_ as s) = "abc" -- which is just a convoluted way of saying let s = "abc", because as with a wildcard pattern _ in front is pretty much useless.
The expression let pattern = expr1 in expr2 is pretty central to OCaml. If the pattern is just a name, it lets you name an expression. This is like a local variable in other language. If the pattern is more complicated, it lets you destructure expr1, i.e., it lets you give names to its components.
In your expression, behind as is just an identifier: s. I suspect your confusion all comes down to this one thing. The expression can be parenthesized as:
let (_ as s) = "abc" in s ^ "def"
as Andreas Rossberg shows.
Your final example is correct if you add some parentheses. The compiler/toplevel rightly complains that your function is partial; i.e., it doesn't know what to do with most ints,
only with 1 and 2.
Edit: here's a session that shows how to add the parentheses to your final example:
$ ocaml
OCaml version 4.00.0
# (fun (1 | 2) as i -> i + 1) 2;;
Error: Syntax error
# (fun ((1 | 2) as i) -> i + 1) 2;;
Warning 8: this pattern-matching is not exhaustive.
Here is an example of a value that is not matched:
0
- : int = 3
#
Edit 2: here's a session that shows how to remove the warning by specifying an exhaustive set of patterns.
$ ocaml
OCaml version 4.00.0
# (function ((1|2) as i) -> i + 1 | _ -> -1) 2;;
- : int = 3
# (function ((1|2) as i) -> i + 1 | _ -> -1) 3;;
- : int = -1
#