assoc strings in doseq result in empty map - dictionary

I am learning Clojure by solving problems, and I'm stuck on one of them: I have to find the top five strings in a log file.
Here is what I've got so far:
(ns topfive
(:import (java.io BufferedReader FileReader)))
(defn extract-query [line]
(.substring line (+ (.lastIndexOf line "=") 1) (.lastIndexOf line "]")))
(defn process-file [file-name, queries]
(with-open [rdr (BufferedReader. (FileReader. file-name))]
(doseq [line (line-seq rdr)]
(assoc queries (extract-query line) (inc (get queries (extract-query line) 0))))))
(process-file "in" {})
My problem is that queries does not contain anything. I've already checked that extract-query returns the string I want. I thought this might have something to do with the language itself: I've read that Clojure has immutability at the language level, but this still does not seem like a good explanation to me.
Could you suggest something about what I am doing wrong?

Clojure does have immutability at a low level, and hash-maps are immutable. So assoc doesn't mutate a map in-place, it creates a new map with an updated item in it, and returns the new map. You're calling assoc over and over, but discarding the results.
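A minimal sketch of that behaviour, using a small hypothetical map rather than the log data from the question. The names counts, updated and doseq-result are made up for illustration:

```clojure
;; Hypothetical sample data, not taken from the log file in the question.
(def counts {"clojure" 1})

;; assoc returns a new map; `counts` itself is left untouched.
(def updated (assoc counts "lisp" 1))

(println counts)   ; the original map, still one entry
(println updated)  ; the new map, two entries

;; doseq evaluates its body for side effects and always returns nil,
;; so every map built by assoc in here is thrown away.
(def doseq-result
  (doseq [word ["a" "b"]]
    (assoc counts word 1)))

(println doseq-result)
```

This is the same situation as in process-file: each assoc call produces a fresh map that nothing holds on to.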
One fix is to use reduce instead of doseq. doseq iterates over a seq and does something to each item, but doesn't accumulate any results. So it should be used mostly for things that have side effects, e.g. printing to screen or file. reduce similarly iterates over a seq, but it does accumulate results.
(defn process-file [file-name, queries]
(with-open [rdr (BufferedReader. (FileReader. file-name))]
(reduce (fn [queries, line]
(assoc queries (extract-query line) (inc (get queries (extract-query line) 0))))
queries
(line-seq rdr))))
You could do a few things to simplify this a bit further. There's no need for a queries parameter to process-file, since it's always going to be an empty map to begin with. Your assoc line can be written more concisely using update-in and fnil; this also lets us avoid calling extract-query twice per line. You can replace all the calls to the Java Reader classes with the Clojure wrapper reader in clojure.java.io. You can replace your calls to substring with a regular expression; regex is more concise, but for large inputs your version might perform faster. You could also replace the anonymous function in my example with a sugary reader macro version using #(), though it's starting to look a bit noisy at this point, so I'd probably use let to make it read a bit better.
(ns topfive
(:require [clojure.java [io :as io]]))
(defn extract-query [line]
(nth (re-find #"query=([^]]+)" line) 1))
(defn process-file [file-name]
(with-open [rdr (io/reader file-name)]
(reduce #(let [search-term (extract-query %2)]
(update-in %1 [search-term] (fnil inc 0)))
{}
(line-seq rdr))))
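To see what update-in and fnil are doing in isolation, here is a small sketch on a hypothetical list of search terms instead of a file. The names safe-inc and term-counts are illustrative only:

```clojure
;; (fnil inc 0) wraps inc so that a nil argument is replaced by 0 first.
(def safe-inc (fnil inc 0))

(safe-inc nil) ; nil is treated as 0, so this is 1
(safe-inc 41)  ; non-nil arguments pass straight through, so this is 42

;; Counting hypothetical search terms the same way process-file does:
;; a missing key yields nil, which (fnil inc 0) turns into a count of 1.
(def term-counts
  (reduce (fn [counts term]
            (update-in counts [term] (fnil inc 0)))
          {}
          ["cats" "dogs" "cats"]))

(println term-counts)
```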

In addition to Brian's excellent answer: the threading macro may improve readability:
(ns stackoverflow
(:use [clojure.string :only [split]]
[clojure.java.io :only [reader]]))
(->> (reader "input.txt")
(line-seq)
(map #(last (split % #"=")))
(frequencies))
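Since the original goal was the top five strings, one possible way to get from a frequency map to a ranking is to sort the entries by count. This is only a sketch; top-n is a hypothetical helper name and sample is made-up data:

```clojure
(defn top-n
  "Returns the n most frequent items of coll as [item count] pairs,
  highest count first. top-n is a hypothetical helper, not a core function."
  [n coll]
  (->> coll
       (frequencies)     ; map of item -> number of occurrences
       (sort-by val >)   ; entries ordered by descending count
       (take n)))

;; Made-up sample input standing in for the extracted query strings.
(def sample ["a" "b" "a" "c" "a" "b"])

(println (top-n 2 sample))
```

For the log file, (top-n 5 ...) would be applied to the sequence of extracted queries. Items with equal counts come out in no particular guaranteed order.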

Related

Define a constant array of struct known at compilation-time

In my program I have constant strings, the values are known at compilation time. For each offset there are currently 2 associated strings. I first wrote the following code:
(eval-when (:compile-toplevel :load-toplevel :execute) ;; BLOCK-1
(defstruct test-struct
str-1
str-2))
(eval-when (:compile-toplevel) ;; BLOCK-2
(defparameter +GLOBAL-VECTOR-CONSTANT+ nil) ;; ITEM-1
(let ((vector (make-array 10
:initial-element (make-test-struct)
:element-type 'test-struct)))
(setf (test-struct-str-1 (aref vector 0)) "test-0-1")
(setf (test-struct-str-2 (aref vector 0)) "test-0-2")
(setf +GLOBAL-VECTOR-CONSTANT+ vector)))
(format t "[~A]~%" (test-struct-str-1 (elt +GLOBAL-VECTOR-CONSTANT+ 0)))
(format t "[~A]~%" (test-struct-str-2 (elt +GLOBAL-VECTOR-CONSTANT+ 0)))
This seems to work as it returns the following:
[test-0-1]
[test-0-2]
In BLOCK-1 the struct containing the data is defined, for compile-time, load-time and execute-time. In BLOCK-2, the code which create a vector and sets the values is executed, at compile-time.
But I have the following concerns:
This code seems unnecessarily verbose
The strings are stored in a structure
I need to manually set the offset of each value ((aref vector 0), (aref vector 1), etc).
When I set ITEM-1 inside BLOCK-1 instead of BLOCK-2 I get an error in SBCL which I don't understand
What is the idiomatic way to define complex constants in Common Lisp?
It's not really clear what you want to do from your question.
First important note: your code is seriously broken. It's broken because you define +global-vector-constant+ only at compile time but refer to it later than that. If you compile this file and then load that compiled file into a cold image you will get errors.
It is absolutely critical when dealing with things like this to make sure that your code will compile in a cold Lisp. One of the classic problems with resident environments (which CL isn't really, compared to the way Interlisp-D was for instance) is to end up with systems which you can't cold build: I'm pretty sure I worked for several years with an Interlisp-D sysout that no-one knew how to cold build any more.
If what you want is an object (an array, for instance) whose initial value is computed at compile time and then treated as a literal, then the answer to that is, in general, a macro: macros are exactly functions which do their work at compile time, and so a macro can expand to a literal. In addition it must be the case that the object you want to be a literal is externalizable (which means 'can be dumped in compiled files') and anything involved in it is known about at compile time. Instances of some classes are externalizable by default, those of some other classes can be made externalizable by user code, and some are not externalizable at all (for instance functions).
In quite a lot of simple cases, like the one you gave, if I understand it, you don't really need a macro, and in fact you can almost always get away without one, although it may make your code easier to understand if you do use one.
Here is a simple case: many arrays are externalizable if their elements are
(defparameter *my-strings*
#(("0-l" . "0-r")
("1-l" . "1-r")))
This means that *my-strings* will be bound to a literal array of conses of strings.
A more interesting case is when the elements are, for instance structures. Well, structures are also externalizable, so we can do that. And in fact it's quite possible, still, to avoid a macro, although it now becomes a bit noisy.
(eval-when (:compile-toplevel :load-toplevel :execute)
(defstruct foo
l
r))
(defparameter *my-strings*
#(#s(foo :l "0-l" :r "0-r")
#s(foo :l "1-l" :r "1-r")))
Note that the following won't work:
(defstruct foo
l
r)
(defparameter *my-strings*
#(#s(foo :l "0-l" :r "0-r")
#s(foo :l "1-l" :r "1-r")))
It won't work because, at compile time, you are trying to externalize instances of a structure which is not yet defined (but it probably will work if the Lisp is not cold, and you might even be able to reload the compiled file you made that way). Again, in this case you can avoid the eval-when in a larger system by ensuring that the file which defines the foo structure is compiled and loaded before the file with the defparameter is loaded.
And even in more complex cases you can avoid using a macro. For instance for many sorts of objects which are normally not externalizable you can teach the system how to externalize them, and then splice the object in as a literal using #.:
(eval-when (:compile-toplevel :load-toplevel :execute)
;; Again, this would be in its own file in a bigger system
(defclass string-table-wrapper ()
((strings)
(nstrings :initform 0)))
(defmethod initialize-instance :after ((w string-table-wrapper)
&key (strings '()))
(let ((l (length strings)))
(when l
(with-slots ((s strings) (n nstrings)) w
(setf s (make-array l :initial-contents strings)
n l)))))
(defmethod make-load-form ((w string-table-wrapper) &optional environment)
(make-load-form-saving-slots w :slot-names '(strings nstrings)
:environment environment))
) ;eval-when
(defgeneric get-string (from n)
(:method ((from string-table-wrapper) (n fixnum))
(with-slots (strings nstrings) from
(assert (< -1 n nstrings )
(n)
"bad index")
(aref strings n))))
(defparameter *my-strings*
#.(make-instance 'string-table-wrapper
:strings '("foo" "bar")))
Note that, of course, although the value of *my-strings* is a literal, code is run to reconstruct this object at load time. But that is always the case: it's just that in this case you had to define what code needed to run. Instead of using make-load-form-saving-slots you could have done this yourself, for instance by something like this:
(defmethod make-load-form ((w string-table-wrapper) &optional environment)
(declare (ignore environment))
(if (slot-boundp w 'strings)
(values
`(make-instance ',(class-of w))
`(setf (slot-value ,w 'strings)
',(slot-value w 'strings)
(slot-value ,w 'nstrings)
,(slot-value w 'nstrings)))
`(make-instance ',(class-of w))))
But make-load-form-saving-slots is much easier.
Here is an example where a macro does perhaps at least make reading the code easier.
Let's assume you have a function which reads an array of strings from a file, for instance this:
(defun file-lines->svector (file)
;; Needs CL-PPCRE
(with-open-file (in file)
(loop
with ltw = (load-time-value
(create-scanner '(:alternation
(:sequence
:start-anchor
(:greedy-repetition 1 nil
:whitespace-char-class))
(:sequence
(:greedy-repetition 1 nil
:whitespace-char-class)
:end-anchor)))
t)
for nlines upfrom 0
for line = (read-line in nil)
while line
collect (regex-replace-all ltw line "") into lines
finally (return (make-array nlines :initial-contents lines)))))
Then, if this function is available at macroexpansion time, you could write this macro:
(defmacro file-strings-literal (file)
(check-type file (or string pathname) "pathname designator")
(file-lines->svector file))
And now we can create a literal vector of strings:
(defparameter *fl* (file-strings-literal "/tmp/x"))
However you could perfectly well instead do this:
(defparameter *fl* #.(file-lines->svector "/tmp/x"))
Which will do the same thing, but slightly earlier (at read time, rather than at macroexpansion/compile time). So this is gaining nothing really.
But you could also do this:
(defmacro define-stringtable (name file &optional (doc nil docp))
`(defparameter ,name ,(file-lines->svector file)
,@(if docp (list doc) nil)))
And now your code reads like
(define-stringtable *st* "my-stringtable.dat")
And that actually is a significant improvement.
Finally note that in file-lines->svector that load-time-value is used to create the scanner exactly once, at load time, which is a related trick.
First of all, your let code can be simplified to
(defparameter +global-vector-constant+
(let ((vector ...))
...
vector))
Second, you can also do
(defparameter +global-vector-constant+
(make-array 10 :element-type 'test-struct :initial-contents
(cons (make-test-struct :str-1 "test-0-1" :str-2 "test-0-2")
(loop :repeat 9 :collect (make-test-struct)))))
Note that the benefit of :element-type 'test-struct is generally limited to code self-documentation (see upgraded-array-element-type)

Clojure: Update map inside a method

I have a use case where I want to update one of my map type variables inside a method call. To demonstrate here is a code snippet,
(defn func [mymap]
(conj mymap [1 2])) ;update mymap variable here such that changes are persistent after the method returns
(let [mymap {}
_ (func mymap)] (println mymap))
which outputs {} because I think a new map is created with the conj function. How do I update the mymap variable in func such that the output of the above program will be {1 2}?
If it is not possible in Clojure, how are such use cases handled in general?
Many choices. Simplest is to rebind the mymap variable. Consider:
(ns tst.demo.core
(:use tupelo.core tupelo.test))
(defn doit [mymap]
(into mymap {:b 42}))
(dotest
(let [m {:a 1}
m2 (doit m)]
(spyx m2)))
m2 => {:a 1, :b 42}
and we get what we expect.
Update the code to reuse the name m:
(dotest
(let [m {:a 1}
m (doit m)]
(spyx m)))
m => {:a 1, :b 42}
Here the 2nd usage of m creates a separate variable that shadows the first m. This works great and people do it accidentally all the time without even realizing it.
If you want to copy the behavior of Java, you need a Clojure atom to create a mutable storage location.
(dotest
(let [m-atom (atom {:a 1})]
(swap! m-atom doit)
(spyx @m-atom)))
(deref m-atom) => {:a 1, :b 42}
Here swap! applies the function doit to the contents
of m-atom, then puts the result back as the new contents.
We need @m-atom or (deref m-atom) to pull out the contents of the atom for printing.
The above convenience functions can be found here. I also have some great documentation references in this template project. Be especially sure to study the Clojure Cheatsheet daily.
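For readers without the tupelo helpers (dotest, spyx), a rough equivalent of the atom example in plain Clojure might look like the sketch below. It reuses doit and m-atom from the answer and uses println in place of spyx:

```clojure
(defn doit [mymap]
  (into mymap {:b 42}))

;; An atom is a mutable reference that holds an immutable value.
(def m-atom (atom {:a 1}))

;; swap! calls (doit current-value) and stores the result in the atom.
(swap! m-atom doit)

;; @m-atom is reader shorthand for (deref m-atom).
(println @m-atom)
```

The map itself is never modified; the atom is simply pointed at the new map that doit returns.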
Clojure uses immutable data types by default. This means, you cannot
mutate the data in place like you are used to from many other
programming languages.
The functional approach here is to use the result from the conj (the
last statement inside a defn is its return value).
(let [mymap {}
result (func mymap)]
(println result))
The longer you can keep up with pure functions on immutable data the
easier your life will be; reasoning about your programs and testing
them becomes a lot easier.
There is of course the option to use mutable data classes from Java, but
don't use them unless you really have to.
And since nearly all programs need some state, there are also atoms.
I only mention this here because, short of def everywhere, atoms
everywhere are the next "trap" beginners run into.

How do I map over a list of async channels in the order they exist in a list?

I'm having trouble returning the values from core.async channels in the browser in the order they were created (as opposed to the order at which they return a value). The channels themselves are returned from mapping cljs-http.client/get over a list of URLs.
If I bind the results manually in a let block then I can return the results in the order of the channels "by hand", but this is obviously a problem when I don't know how many channels exist.
(let [response-channels (map #(http/get "http://date.jsontest.com" {:with-credentials? false}) (range 3))]
; Response is now three channels generated by http/get:
;(#object[cljs.core.async.impl.channels.ManyToManyChannel]
; #object[cljs.core.async.impl.channels.ManyToManyChannel]
; #object[cljs.core.async.impl.channels.ManyToManyChannel])
; If I want the results back in the guaranteed order that I made them, I can do this:
(go (let [response1 (<! (nth response-channels 0))
response2 (<! (nth response-channels 1))
response3 (<! (nth response-channels 2))]
(println "This works as expected:" response1 response2 response3))))
But if I try to map <! over the channels instead of binding to them individually then I just get the list of channels instead of their values.
(let [response-channels (map #(http/get "http://date.jsontest.com" {:with-credentials? false}) (range 3))]
(let [responses (into [] (map (fn [c] (go (<! c))) response-channels))]
(println "This just returns the channels:" responses)
; This is still just a vec of many-to-many channels
; [#object[cljs.core.async.impl.channels.ManyToManyChannel]
; #object[cljs.core.async.impl.channels.ManyToManyChannel]
; #object[cljs.core.async.impl.channels.ManyToManyChannel]]
)
)
I suspect it's a problem with the location of the go block, however I can't move it outside of the anonymous function without an error that I'm using <! outside of a go block.
This doesn't work:
(into [] (go (map <! response-channels)))
And neither does this:
(go (let [responses (into [] (map <! response-channels))]))
I also tried merging the channels via async/merge and then using async/reduce to conjoin the values but results are in the order of when the requests were fulfilled, not the order of the channels being merged.
Can anyone shed some light on retrieving values from a list of channels in the order the channels exist in the list?
In Clojure you could do (map <!! response-channels), but that's not possible in ClojureScript. What's even more important is that it's discouraged to use map, or lazy operations in general, for side effects (check out this blog post to see why). The reason your code doesn't yield the results you're expecting is the (nested) use of fn within the go block (see this answer):
By [the Clojure go-block] stops translation at function boundaries, I mean this: the go block takes its body and translates it into a state-machine. Each call to <! >! or alts! (and a few others) are considered state machine transitions where the execution of the block can pause. At each of those points the machine is turned into a callback and attached to the channel. When this macro reaches a fn form it stops translating. So you can only make calls to <! from inside a go block, not inside a function inside a code block.
I'm not quite sure, but when you have a look at (source map) you'll see that it invokes fn directly as well as via other functions (such as lazy-seq), which is probably why (go (map <! response-channels)) doesn't work.
Anyway, how about doseq:
(go (doseq [c response-channels]
(println (<! c))))
This will respect the order within response-channels.
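If the values are needed as a collection rather than printed, one option is a loop inside a single go block that takes from each channel in turn, so that every <! sits directly in the go body and not inside a nested fn. The sketch below is written for JVM Clojure with the core.async library on the classpath so it can be tried at a REPL; collect-in-order and delayed-channels are hypothetical names, and the timeout-based channels merely stand in for the http/get responses:

```clojure
(ns ordered-demo
  (:require [clojure.core.async :refer [go <! <!! timeout]]))

(defn collect-in-order
  "Returns a channel that yields a vector holding one value taken from
  each channel in chans, in the order the channels appear in chans."
  [chans]
  (go
    (loop [remaining chans
           acc []]
      (if (seq remaining)
        ;; <! is called directly in the go body, never inside a nested fn.
        (let [v (<! (first remaining))]
          (recur (rest remaining) (conj acc v)))
        acc))))

;; Stand-ins for the response channels: the first channel is the slowest.
(def delayed-channels
  [(go (<! (timeout 30)) :first)
   (go (<! (timeout 20)) :second)
   (go (<! (timeout 10)) :third)])

;; <!! blocks the calling thread and is JVM-only; in ClojureScript the
;; result channel would be consumed with <! inside another go block.
(println (<!! (collect-in-order delayed-channels)))
```

The requests still run concurrently, because all channels exist before the loop starts; only the order of consumption is fixed.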

Clojure fn name leaking outside its scope when compiled ahead-of-time

I want to generate named functions with fn and return them from the macro, I tried the following example:
(defmacro getfn
[namestr children]
`(fn fn-name# []
(println "Recursing" ~namestr)
(doall (map (fn [child#] (child#)) ~children))))
(def foo (getfn "foo" []))
(def bar (getfn "bar" [foo]))
(defn -main [& args]
(bar))
The resulting output is usually as expected:
Recursing bar
Recursing foo
However, when I run this compiled ahead-of-time (AOT) I get:
Recursing bar
Recursing bar
...
Recursing bar
Recursing bar
Exception in thread "main" java.lang.StackOverflowError
I find it pretty strange that bar keeps calling itself instead of foo, the only sensible reason for this is for the generated symbol fn-name# to leak outside its scope. Is this a bug in Clojure or intended behaviour?
Update: For clarity, I should mention that removing the fn-name# symbol and making the function anonymous fixes this problem. However, in my actual code I need to call it recursively sometimes, so naming it is necessary.
One solution I have for this problem is to use gensym to get a new symbol for each version of the macro, this would work by modifying the getfn as follows:
(defmacro getfn
[namestr children]
(let [fn-name (gensym)]
`(fn ~fn-name []
(println "Recursing" ~namestr)
(doall (map (fn [child#] (child#)) ~children)))))
This feels a bit unnecessary since by definition the fn name should be relevant only inside its own scope.
Update: Just tested with alpha releases and it seems Clojure 1.7.0-alpha3 and later work without this hack, Clojure 1.7.0-alpha2 and earlier are broken. Using this workaround is probably ok until stable version of 1.7.0 is released, unless someone can think of something better.

Error with define in Racket

I just discovered Racket a few days ago, and I'm trying to get more comfortable with it by writing a little script that generates images to represent source code using #lang slideshow.
I know that when programming in a functional paradigm it's good practice to create almost all your variables with let, but I find that it introduces too many levels of nesting and that Racket's let has an overcomplicated API which requires superfluous parentheses. I'm sure this is to remove ambiguity when using let in more powerful ways, but for my purposes it's just an annoyance. Consequently, I'm creating all my variables with define, and writing blocks with begin if I need to (such as in the body of an if statement).
The problem is that I've repeatedly been getting what seem to be very mysterious errors. I'm sure I'm just making some silly beginner's mistake, being new to the language, but I really can't seem to find the source of the complaint.
Here's the offending code:
(define sub-code (foldr ht-append (rectangle 0 0) (map internal-style (rest code))))
although what we're defining sub-code to seems pretty irrelevant. If I replace it with
(define sub-code '())
I receive the same error. DrRacket is saying that define is being used in an expression context. I understand what this error would normally mean, i.e. that it would be raised when you write code like (print (define x 10)), but I can't see what would trigger it here.
If it helps, this define is at the beginning of a begin block, inside an if statement
(if (list? code)
(begin
(define sub-code '())
; a few more define statements and finally an expression ))
The specific error message DrRacket is printing is
define: not allowed in an expression context in: (define sub-code (quote ()))
I thought maybe define isn't allowed in begin blocks, but I checked the docs and one of the examples for begin is
(begin
(define x 10)
x)
So I don't really know what to do. Thanks in advance!
Definitions are allowed in a 'body' context, like in lambda and let among others. The consequent and alternate clauses of if are not body contexts; they are expression contexts and therefore definitions are not allowed.
begin is special: begin in a body context allows definitions, but begin in an expression context forbids definitions. Your case falls into the latter.
For example:
(define (foo . args) #| body context |#)
(define foo (lambda args #| body context |#))
(define (foo . args)
(let (...)
#| body context |#))
Syntactic keywords that require expressions: if, cond, case, and, or, when, unless, do, begin. Check out the formal syntax in any Scheme report (r{4,5,6,7}rs); look for <body>, <sequence>, <command>, and <expression>.
Also, if you need a body context in an expression, just wrap a let syntactic form, as such:
(if test
(let ()
(define foo 'foo)
(list foo foo))
alternate)
As GoZoner explained, you can't use define in an expression context.
What could you do instead?
Use let:
(if (list? code)
(let ([x '()])
x)
...
Or it would work with an "empty" let and define:
(if (list? code)
(let ()
(define x '())
x)
...
But that's a bit silly.
Or use cond and define:
(cond [(list? code)
(define x '())
x]
...
This last way -- using cond and define -- is closest to what the current Racket style guide recommends.
Here's more details, from the Racket docs.
The different contexts are required because macros must expand differently, depending on which language forms are allowed.
As others have said, definitions are not allowed in expression contexts ("expr ..." in the docs), but are ok in other contexts.
In other doc entries, "body ..." indicates an internal-definition context (guide, reference), for example in lambda bodies, and "form ..." indicates all non-expression contexts, like in the docs for begin.
Or you could wrap the expressions in (begin), e.g.:
(begin
(define x 10)
(define y 100)
(define z 1000))
