This really falls under the purview of "Don't DO that!", but...
I wrote this to see what would happen.
circit <- function(x = deparse(substitute(y)), y = deparse(substitute(x))) {
    return(list(x = x, y = y))
}
Two examples:
> circit()
$x
[1] "deparse(substitute(x))"
$y
[1] "deparse(substitute(y))"
> circit(3)
$x
[1] 3
$y
[1] "3"
Notice the subtle swap of "x" and "y" in the output.
I can't follow the logic, so can someone explain how the argument parser handles this absurd pair of default inputs? (The second case is easy to follow.)
The key thing to understand/remember is that formal arguments are promise objects, and that substitute() has special rules for how it evaluates promise objects. As explained in ?substitute, it returns their expression slot, not their value:
Substitution takes place by examining each component of the parse
tree as follows: If it is not a bound symbol in ‘env’, it is
unchanged. If it is a promise object, i.e., a formal argument to
a function or explicitly created using ‘delayedAssign()’, the
expression slot of the promise replaces the symbol. If it is an
ordinary variable, its value is substituted, unless ‘env’ is
‘.GlobalEnv’ in which case the symbol is left unchanged.
To make this clearer, it might help to walk through the process in detail. In the first case, you call circit() with no supplied arguments, so circit() uses the default values of both x= and y=.
For x, that means its value is gotten by evaluating deparse(substitute(y)). The symbol y in that expression is matched by the formal argument y, a promise object. substitute() replaces the symbol y with its expression slot, which holds the expression deparse(substitute(x)). Deparsing that expression returns the text string "deparse(substitute(x))", which is what gets assigned to x's value slot.
Likewise, the value of y is gotten by evaluating the expression deparse(substitute(x)). The symbol x is matched by the formal argument x, a promise object. Even though x now has a value of its own, its expression slot still holds deparse(substitute(y)), so that's what substitute(x) returns. As a result, deparse(substitute(x)) returns the string "deparse(substitute(y))".
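To make the two rules concrete, here is a toy Python model of the walkthrough. It is a sketch under stated assumptions, not how R actually implements promises: the Promise class, the deparse_substitute() helper, and the use of None to mean "argument not supplied" are all invented for illustration. Each promise keeps its expression text apart from its lazily computed value, and the substitute step reads only the expression.

```python
from typing import Any, Callable, Dict, Optional


class Promise:
    """Toy stand-in for an R promise: an expression slot plus a lazily forced value."""

    def __init__(self, expr: str, thunk: Callable[[], Any]) -> None:
        self.expr = expr        # expression slot: the unevaluated code, as text
        self._thunk = thunk     # computes the value the first time it is needed
        self._forced = False
        self._value: Any = None

    def force(self) -> Any:
        if not self._forced:
            self._value = self._thunk()
            self._forced = True
        return self._value


def circit(x: Optional[Any] = None) -> Dict[str, Any]:
    """Mimics circit(); None stands for a missing argument (an assumption of this model)."""
    env: Dict[str, Promise] = {}

    def deparse_substitute(sym: str) -> str:
        # substitute() on a formal argument returns its expression slot,
        # never its value; deparse() turns that expression into a string.
        return env[sym].expr

    if x is None:
        env["x"] = Promise("deparse(substitute(y))", lambda: deparse_substitute("y"))
    else:
        # a supplied argument's expression is the caller's code, e.g. 3
        env["x"] = Promise(repr(x), lambda: x)
    env["y"] = Promise("deparse(substitute(x))", lambda: deparse_substitute("x"))
    # list(x = x, y = y) forces both promises
    return {"x": env["x"].force(), "y": env["y"].force()}


print(circit())   # {'x': 'deparse(substitute(x))', 'y': 'deparse(substitute(y))'}
print(circit(3))  # {'x': 3, 'y': '3'}
```

Running it reproduces both outputs from the question, including the swap: forcing x reads y's expression slot, and forcing y reads x's.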
Trying to get into Julia after learning Python, and I'm stumbling over some seemingly easy things. I'd like to have a function that takes strings as arguments but uses one of those arguments as a regular expression to search for something. So:
function patterncount(string::ASCIIString, kmer::ASCIIString)
    numpatterns = eachmatch(kmer, string, true)
    count(numpatterns)
end
There are a couple of problems with this. First, eachmatch expects a Regex object as the first argument, and I can't figure out how to convert a string into one. In Python I'd do r"{0}".format(kmer); is there something similar?
Second, I clearly don't understand how the count function works (from the docs):
count(p, itr) → Integer
Count the number of elements in itr for which predicate p returns true.
But I can't seem to figure out what the predicate is for just counting how many things are in an iterator. I can make a simple counter loop, but I figure that has to be built in. I just can't find it (tried the docs, tried searching SO... no luck).
Edit: I also tried numpatterns = eachmatch(r"$kmer", string, true) - no go.
To convert a string to a regex, call the Regex function on the string.
Typically, to get the length of an iterator you can use the length function. However, that won't work in this case: the eachmatch function returns an object of type Base.RegexMatchIterator, which doesn't have a length method. So you can use count, as you thought. Its first argument (the predicate) should be a one-argument function that returns true or false depending on whether you want to count a particular item of your iterator. Here that function can simply be the anonymous function x->true, because we want to count every x in the RegexMatchIterator.
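Since the question comes from a Python background, the same counting idea can be sketched there: re.finditer() likewise returns an iterator that has no length, so you count by consuming it. The count() helper below is a hypothetical stand-in written to mirror Julia's count(p, itr); it is not a Python built-in.

```python
import re
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def count(pred: Callable[[T], bool], itr: Iterable[T]) -> int:
    """Number of elements of itr for which pred returns True (like Julia's count)."""
    return sum(1 for item in itr if pred(item))


matches = re.finditer("an", "banana")  # an iterator: len(matches) raises TypeError
print(count(lambda m: True, matches))  # 2; the always-true predicate counts everything
```

The always-true predicate plays the same role as x->true: it turns "count the elements that satisfy p" into "count all elements".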
So, given that info, I would write your function like this:
patterncount(s::ASCIIString, kmer::ASCIIString) =
    count(x->true, eachmatch(Regex(kmer), s, true))
EDIT: I also changed the name of the first argument from string to s, because string is a Julia function. Nothing terrible would have happened if we had left that argument name unchanged in this example, but it is usually good practice not to give variables the same name as a built-in function.
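Python's re module has no counterpart to the overlap argument of eachmatch, so a common workaround, sketched below, wraps the pattern in a zero-width lookahead. pattern_count() is a hypothetical name, and like Regex(kmer) it treats kmer as a regular expression, so pass re.escape(kmer) if the k-mer might contain special characters.

```python
import re


def pattern_count(s: str, kmer: str) -> int:
    # (?=...) matches without consuming text, so after each hit the search
    # restarts one character later and overlapping k-mers are counted.
    lookahead = re.compile("(?=" + kmer + ")")
    # finditer returns an iterator with no len(), so count by consuming it
    return sum(1 for _ in lookahead.finditer(s))


print(pattern_count("GCGCG", "GCG"))    # 2 (matches at positions 0 and 2)
print(len(re.findall("GCG", "GCGCG")))  # 1 without the lookahead
```

Because each lookahead match is empty, the scan advances one character after every hit, which is what makes overlapping occurrences visible.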
I'm trying to implement a sliding window algorithm for matching words in a text file. I come from a procedural background and my first attempt to do this in a functional language like Erlang seems to require time O(n^2) (or even more). How would one do this in a functional language?
-module(test).
-export([readText/1,patternCount/2,main/0]).
readText(FileName) ->
    {ok,File} = file:read_file(FileName),
    unicode:characters_to_list(File).
patternCount(Text,Pattern) ->
    patternCount_(Text,Pattern,string:len(Pattern),0).
patternCount_(Text,Pattern,PatternLength,Count) ->
    case string:len(Text) < PatternLength of
        true -> Count;
        false ->
            case string:equal(string:substr(Text,1,PatternLength),Pattern) of
                true ->
                    patternCount_(string:substr(Text,2),Pattern,PatternLength,Count+1);
                false ->
                    patternCount_(string:substr(Text,2),Pattern,PatternLength,Count)
            end
    end.
main() ->
    test:patternCount(test:readText("file.txt"),"hello").
Your question is a bit too broad, since it asks about implementing this algorithm in functional languages but how best to do that is language-dependent. My answer therefore focuses on Erlang, given your example code.
First, note that there's no need to have separate patternCount and patternCount_ functions. Instead, you can have multiple patternCount functions with different arities, as well as multiple clauses of the same arity. Let's start by rewriting your functions to take that into account, and also replace calls to string:len/1 with the length/1 built-in function:
patternCount(Text,Pattern) ->
    patternCount(Text,Pattern,length(Pattern),0).
patternCount(Text,Pattern,PatternLength,Count) ->
    case length(Text) < PatternLength of
        true -> Count;
        false ->
            case string:equal(string:substr(Text,1,PatternLength),Pattern) of
                true ->
                    patternCount(string:substr(Text,2),Pattern,PatternLength,Count+1);
                false ->
                    patternCount(string:substr(Text,2),Pattern,PatternLength,Count)
            end
    end.
Next, the multi-level indentation in the patternCount/4 function is a "code smell" indicating it can be done better. Let's split that function into multiple clauses:
patternCount(Text,_Pattern,PatternLength,Count) when length(Text) < PatternLength ->
    Count;
patternCount(Text,Pattern,PatternLength,Count) ->
    case string:equal(string:substr(Text,1,PatternLength),Pattern) of
        true ->
            patternCount(string:substr(Text,2),Pattern,PatternLength,Count+1);
        false ->
            patternCount(string:substr(Text,2),Pattern,PatternLength,Count)
    end.
The first clause uses a guard to detect that no more matches are possible, while the second clause looks for matches. Now let's refactor the second clause to use Erlang's built-in matching. We want to advance through the input text one element at a time, just as the original code does, but we also want to detect matches as we do so. Let's perform the matches in our function head, like this:
patternCount(_Text,[]) -> 0;
patternCount(Text,Pattern) ->
    patternCount(Text,Pattern,Pattern,Text,0).
patternCount(_Text,[],Pattern,[_|StartTail],Count) ->
    patternCount(StartTail,Pattern,Pattern,StartTail,Count+1);
patternCount([C|TextTail],[C|PatternTail],Pattern,Start,Count) ->
    patternCount(TextTail,PatternTail,Pattern,Start,Count);
patternCount(_Text,_PatternTail,_Pattern,[],Count) ->
    Count;
patternCount(_Text,_PatternTail,Pattern,[_|StartTail],Count) ->
    patternCount(StartTail,Pattern,Pattern,StartTail,Count).
First, note that the patternCount/5 clauses differ from patternCount/4 in two ways: we now pass Pattern as both the second and third arguments, using the second for matching and the third to keep the original pattern, and we replace PatternLength with Start, which holds the input text at the position where the current match attempt began. Note also that we added a new clause at the very top to check for an empty Pattern and just return 0 in that case.
The four patternCount/5 clauses are tried in order at runtime, but let's look at the second clause first, then the last clause, then the first and third:
In the second clause, we write the first and second arguments in [Head|Tail] list notation, which means Head is the first element of the list and Tail is the rest of the list. We use the same variable C for the head of both lists, so this clause matches only if the first elements of both lists are equal. That means we have a potential match in progress, so we recursively call patternCount/5 passing the tails of the lists as the first two arguments and leaving Pattern and Start unchanged. Passing the tails allows us to advance through both the input text and the pattern an element at a time, checking for matching elements.
In the last clause, the heads of the first two arguments do not match, or the input text has run out; if they did match, the runtime would have executed the second clause, not this one. This means the current match attempt has failed, so we no longer care about the first two arguments, which is why they have underscore-prefixed names. We recursively call patternCount/5, passing the tail of Start as both the input text and the new Start, and the full Pattern as the second argument. The next attempt therefore begins one element after the failed attempt began, not where it stopped; restarting where it stopped would miss matches that begin inside a failed attempt, such as "ab" in "aab".
In the first clause, the second argument is the empty list, which means we've gotten here by successfully matching the full Pattern, element by element. So we increment the match count and, just as in the last clause, start a new attempt one element after Start. Because we restart after the beginning of the match rather than after its end, overlapping matches are counted, as they are in your original code.
The third clause handles the end of the input: when Start is the empty list, there are no positions left at which a match could begin, so we return Count.
Finally, note that this version never calls length/1. Because length/1 walks the entire list, calling it at every position, as string:len/1 does in your original code and length/1 does in the earlier rewrites, makes the search O(n^2) in the length of the text. This version compares at most m elements at each start position, so it is O(n*m) for a pattern of length m.
Try it! Here's the full revised module:
-module(test).
-export([read_text/1,pattern_count/2,main/0]).
read_text(FileName) ->
    {ok,File} = file:read_file(FileName),
    unicode:characters_to_list(File).
pattern_count(_Text,[]) -> 0;
pattern_count(Text,Pattern) ->
    pattern_count(Text,Pattern,Pattern,Text,0).
%% Whole pattern matched: count it and retry one element after Start.
pattern_count(_Text,[],Pattern,[_|StartTail],Count) ->
    pattern_count(StartTail,Pattern,Pattern,StartTail,Count+1);
%% Heads are equal: keep matching the rest of the pattern.
pattern_count([C|TextTail],[C|PatternTail],Pattern,Start,Count) ->
    pattern_count(TextTail,PatternTail,Pattern,Start,Count);
%% No start positions left: done.
pattern_count(_Text,_PatternTail,_Pattern,[],Count) ->
    Count;
%% Mismatch or text exhausted: retry one element after Start.
pattern_count(_Text,_PatternTail,Pattern,[_|StartTail],Count) ->
    pattern_count(StartTail,Pattern,Pattern,StartTail,Count).
main() ->
    pattern_count(read_text("file.txt"),"hello").
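For comparison with a procedural language, here is the same sliding-window count sketched in Python (pattern_count is just an illustrative name). It tries a match at every start position, compares element by element, and always begins the next attempt one position after the previous one began, so overlapping matches are counted as in the question's code. Lengths are computed once up front rather than at every step, so the work is O(n*m) for a text of length n and a pattern of length m.

```python
from typing import Sequence


def pattern_count(text: Sequence, pattern: Sequence) -> int:
    """Count (possibly overlapping) occurrences of pattern in text."""
    if not pattern:
        return 0                        # empty pattern: report no matches
    n, m = len(text), len(pattern)      # computed once, not at every step
    count = 0
    for start in range(n - m + 1):      # every position where a match could begin
        i = 0
        while i < m and text[start + i] == pattern[i]:
            i += 1                      # elements equal: advance text and pattern together
        if i == m:                      # consumed the whole pattern: full match
            count += 1
        # matched or not, the next attempt begins at start + 1
    return count


print(pattern_count("aab", "ab"))  # 1
print(pattern_count("aaa", "aa"))  # 2
```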
A few final recommendations:
Searching through text element by element is slower than necessary. You should have a look at the Boyer-Moore algorithm and related algorithms to see ways of advancing through text in larger chunks. For example, Boyer-Moore compares the pattern against the text starting from the pattern's last element; if the text element aligned with that position does not occur in the pattern at all, it can advance through the text by the full length of the pattern.
You might also want to look into using Erlang binaries rather than lists, as they are more compact memory-wise and they allow matching more than just their first elements. For example, if Text is the input text as a binary and Pattern is the pattern as a binary, and assuming the size of Text is equal to or greater than the size of Pattern, this code attempts to match the whole pattern at the front of Text, and otherwise skips a single byte:
case Text of
    <<Pattern:PatternLength/binary, TextTail/binary>> ->
        patternCount(TextTail,Pattern,PatternLength,Count+1);
    <<_, TextTail/binary>> ->
        patternCount(TextTail,Pattern,PatternLength,Count)
end.
Note that this code snippet reverts to using patternCount/4, since we no longer need the extra Pattern argument to work through element by element. The guard in the first patternCount/4 clause should then use byte_size/1 instead of length/1, which also makes it a constant-time check.
As shown in the full revised module, when calling functions in the same module, you don't need the module prefix. See the simplified main/0 function.
As shown in the full revised module, conventional Erlang style does not use mixed case function names like patternCount. Most Erlang programmers would use pattern_count instead.
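The skip-ahead idea behind the Boyer-Moore recommendation can be sketched with Boyer-Moore-Horspool, a simplified variant that keeps only the bad-character shift. This is an illustrative Python sketch, not an Erlang implementation, and horspool_count() is a hypothetical name. It compares right to left from the end of the pattern, then shifts according to the text character aligned with the pattern's last position: by the full pattern length when that character does not appear anywhere in the pattern before its last position.

```python
def horspool_count(text: str, pattern: str) -> int:
    """Count overlapping occurrences of pattern using Boyer-Moore-Horspool shifts."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return 0
    # Shift table: for each character in pattern[:-1], the distance from its
    # last occurrence there to the end of the pattern. Characters missing
    # from the table allow a shift by the whole pattern length.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    count = 0
    pos = 0  # index in text aligned with pattern[0]
    while pos <= n - m:
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1  # compare right to left
        if j < 0:
            count += 1
        # shift by the text character under the pattern's last position
        pos += shift.get(text[pos + m - 1], m)
    return count


print(horspool_count("say hello, hello", "hello"))  # 2
```

Shifting by the table entry after a full match, rather than by the pattern length, keeps overlapping matches, so horspool_count("aaa", "aa") is 2.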
Here's a snippet:
translate("a", "4").
translate("m", "/\\/\\").
tol33t([], []).
tol33t([Upper|UpperTail], [Lower|LowerTail]) :-
    translate([Upper], [Lower]),
    tol33t(UpperTail, LowerTail).
Basically, what I want to do is look up a letter in the table, get its translation, and add that to the new list.
What I have works if the translation is a single character, but I'm not sure how to append the new list of characters to the old one.
Example input:
tol33t("was", L).
It will be put through like this:
tol33t([119,97,115], L).
Now that should come back as:
[92,47,92,47]++[52]++[53] or [92,47,92,47,52,53]
The problem is that I don't know how to append it like that.
Consider these modifications to tol33t/2:
tol33t([], []).
tol33t([Code|Codes], Remainder) :-
    translate([Code], Translation), !,
    tol33t(Codes, Rest),
    append(Translation, Rest, Remainder).
tol33t([Code|Codes], [Code|Remainder]) :-
    tol33t(Codes, Remainder).
The first clause is the base case.
The second clause succeeds iff there is a translation for the current Code via translate/2, as a list of characters of arbitrary length (Translation; note that you had [Lower] instead, which restricted results to lists of length 1). The cut (!) after the translation check commits to this clause; we then find the Rest of the solution recursively and append the Translation to its front to produce the Remainder.
The third clause is executed iff there was no translation for the current Code, i.e., the call to translate/2 in the second clause failed. In that case we return the Code unchanged and compute the rest.
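The same translate-or-pass-through logic can be sketched in Python over lists of character codes, with comments mapping each branch to a clause. The table is an assumption: only the "a" and "m" entries come from the snippet, the "w" and "s" entries are inferred from the expected output for "was", and to_l33t() is a hypothetical name. Because a dictionary holds at most one translation per character, there is nothing to backtrack into, which matches the committed behavior the cut gives the Prolog version.

```python
from typing import Dict, List

# Hypothetical translation table. "a" and "m" come from the question's snippet;
# "w" and "s" are inferred from the expected output for "was".
TABLE: Dict[str, str] = {"a": "4", "m": "/\\/\\", "w": "\\/\\/", "s": "5"}
CODES: Dict[int, List[int]] = {ord(k): [ord(c) for c in v] for k, v in TABLE.items()}


def to_l33t(codes: List[int]) -> List[int]:
    result: List[int] = []              # empty input yields [] (the base case)
    for code in codes:
        translation = CODES.get(code)
        if translation is not None:
            result.extend(translation)  # second clause: append the whole translation
        else:
            result.append(code)         # third clause: pass the code through
    return result


print(to_l33t([ord(c) for c in "was"]))  # [92, 47, 92, 47, 52, 53]
```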
EDIT:
If you'd rather not use the cut (!) explicitly, the second and third clauses can be combined using if-then-else (->):
tol33t([Code|Codes], Remainder) :-
    tol33t(Codes, Rest),
    (   translate([Code], Translation)
    ->  append(Translation, Rest, Remainder)
    ;   Remainder = [Code|Rest]
    ).
This (unoptimized) version checks, at every Code in the character list, whether translate/2 succeeds; if so, the Translation is appended to the Rest, otherwise the Code is passed through unchanged. It has the same semantics as the implementation above, in that it commits to a solution (i.e., simulates a cut) once the condition of -> (the call to translate/2) succeeds. Note that this commitment is strictly necessary in both implementations; without it, the program would backtrack into solutions in which a Code is left untranslated even though an applicable translate/2 clause exists.