Converting an org.w3c.dom.NodeList to a Clojure ISeq - collections

I am trying to get a handle on the new defprotocol, reify, etc.
I have an org.w3c.dom.NodeList returned from an XPath call and I would like to "convert" it to an ISeq.
In Scala, I implemented an implicit conversion method:
implicit def nodeList2Traversable(nodeList: NodeList): Traversable[Node] = {
  new Traversable[Node] {
    def foreach[A](process: (Node) => A) {
      for (index <- 0 until nodeList.getLength) {
        process(nodeList.item(index))
      }
    }
  }
}
NodeList includes methods int getLength() and Node item(int index).
How do I do the equivalent in Clojure? I expect that I will need to use defprotocol. What functions do I need to define to create a seq?
If I do a simple, naive, conversion to a list using loop and recur, I will end up with a non-lazy structure.

Most of Clojure's sequence-processing functions return lazy seqs, including map and range:
(defn node-list-seq [^org.w3c.dom.NodeList node-list]
  (map (fn [index] (.item node-list index))
       (range (.getLength node-list))))
Note that the type hint for NodeList above isn't required, but it lets the compiler avoid reflective method calls, which improves performance.
Now you can use that function like so:
(map #(.getLocalName %) (node-list-seq your-node-list))
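To try this without an XPath result at hand, here is a minimal sketch that parses a small XML string and walks the children of the root element. The helper parse-xml-string and the sample document are made up for illustration; node-list-seq is the function from above.

```clojure
(import '[javax.xml.parsers DocumentBuilderFactory]
        '[java.io ByteArrayInputStream])

;; Same function as above: a lazy seq over the items of a NodeList.
(defn node-list-seq [^org.w3c.dom.NodeList node-list]
  (map (fn [index] (.item node-list index))
       (range (.getLength node-list))))

;; Hypothetical helper: parse an XML string into a DOM Document.
(defn parse-xml-string [^String xml]
  (let [builder (.newDocumentBuilder (DocumentBuilderFactory/newInstance))]
    (.parse builder (ByteArrayInputStream. (.getBytes xml "UTF-8")))))

;; Made-up sample document with three child elements and no whitespace.
(def sample-doc (parse-xml-string "<root><a/><b/><c/></root>"))

;; Lazy seq of the child element names.
(def child-names
  (map #(.getNodeName %)
       (node-list-seq (.getChildNodes (.getDocumentElement sample-doc)))))
```

Nothing touches the NodeList until child-names is consumed, for example by printing it or calling doall on it.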

Use a for comprehension; these yield lazy sequences.
Here's the code for you. I've taken the time to make it runnable on the command line; you only need to replace the name of the parsed XML file.
Caveat 1: avoid def-ing your variables. Use local variables instead.
Caveat 2: this is the Java API for XML, so the objects are mutable; since you have a lazy sequence, if any changes happen to the mutable DOM tree while you're iterating, you might hit unpleasant race conditions.
Caveat 3: even though this is a lazy structure, the whole DOM tree is already in memory anyway (I'm not really sure about this last comment, though. I think the API tries to defer reading the tree in memory until needed, but, no guarantees). So if you run into trouble with big XML documents, try to avoid the DOM approach.
(require '[clojure.java.io :as io])
(import [javax.xml.parsers DocumentBuilderFactory])
(import [org.xml.sax InputSource])
(def dbf (DocumentBuilderFactory/newInstance))
(doto dbf
  (.setValidating false)
  (.setNamespaceAware true)
  (.setIgnoringElementContentWhitespace true))
(def builder (.newDocumentBuilder dbf))
(def doc (.parse builder (InputSource. (io/reader "C:/workspace/myproject/pom.xml"))))
(defn lazy-child-list [element]
  (let [nodelist (.getChildNodes element)
        len (.getLength nodelist)]
    (for [i (range len)]
      (.item nodelist i))))
;; To print the children of an element
(-> doc
    (.getDocumentElement)
    (lazy-child-list)
    (println))
;; Prints clojure.lang.LazySeq
(-> doc
    (.getDocumentElement)
    (lazy-child-list)
    (class)
    (println))

Related

Accessing parent object fields/methods in Clojure

I have a java.awt.Frame that is a descendent of java.awt.Component. I'm trying to get the peer field of the Component, or else call .getPeer on it.
(def f (new Frame "AWT test"))
(. f setSize 400 400)
(. f setLayout (new GridLayout 3 1))
(class f) ;;java.awt.Frame
(supers (class f)) ;; #{java.awt.Container java.io.Serializable java.awt.Window java.awt.image.ImageObserver java.awt.Component java.awt.MenuContainer java.lang.Object javax.accessibility.Accessible}
I can see that Component is a superclass, but can't figure out how to access it.
(filter #(instance? java.awt.Component %) (supers (class f))) ;; () - it returns empty
Yes, I know getPeer is deprecated. I'll likely need to do some reflection work after I get the Component. And I already have the requisite add-opens in play.
You just call (.getPeer f), like any other method. No fancy business is required to call a public method, whether declared in this class or in a superclass. Of course, this only works if you're using a version of Java old enough to support this method.
I still don't have a clear answer of how to filter by the class, but I did find a way to get the peer component:
(def acc (AWTAccessor/getComponentAccessor))
(.getPeer acc f);; #object[sun.awt.X11.XFramePeer 0x2fa3dc81 "sun.awt.X11.XFramePeer#2fa3dc81(7600007)"]
That's good enough for now.
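On the open question of filtering by class: supers returns a set of java.lang.Class objects, and a Class object is never an instance of the class it names, which is why the instance? filter comes back empty. A small sketch of the difference follows, using java.util.ArrayList as a stand-in for the Frame so that it runs without AWT; the var names are made up for illustration.

```clojure
;; java.util.ArrayList stands in for java.awt.Frame here.
(def list-supers (supers java.util.ArrayList))

;; Each element is a java.lang.Class, not an AbstractList, so this is empty.
(def via-instance
  (filter #(instance? java.util.AbstractList %) list-supers))

;; Comparing the Class objects themselves finds the superclass.
(def via-equality
  (filter #(= java.util.AbstractList %) list-supers))

;; isa? answers the subtype question directly.
(def subtype? (isa? java.util.ArrayList java.util.AbstractList))
```

None of this is needed just to call an inherited public method; it only explains why the filter in the question returned ().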

Clojure's disappearing reflection warnings

A simple reflection warning example:
lein repl
user=> (set! *warn-on-reflection* true)
true
user=> (eval '(fn [x] (.length x)))
Reflection warning, NO_SOURCE_PATH:1:16 - reference to field length can't be resolved.
#object[user$eval2009$fn__2010 0x487ba4b8 "user$eval2009$fn__2010#487ba4b8"]
I want to make this into a function. But where do reflection warnings go?
//clojure/compile.java 63
RT.errPrintWriter()
.format("Reflection warning, %s:%d:%d - reference to field %s can't be resolved.\n",
SOURCE_PATH.deref(), line, column, fieldName);
//clojure/RT.java 269
public static PrintWriter errPrintWriter(){
Writer w = (Writer) ERR.deref();
//clojure/RT.java 188
final static public Var ERR =
Var.intern(CLOJURE_NS, Symbol.intern("*err*"),
new PrintWriter(new OutputStreamWriter(System.err), true)).setDynamic();
OK, so they go to System.err. Let's capture its output:
(import '[java.io PipedInputStream PipedOutputStream PrintStream])
(def pipe-in (PipedInputStream.))
(def pipe-out (PipedOutputStream. pipe-in))
(System/setErr (PrintStream. pipe-out))
(defn reflection-check [fn-code]
  (binding [*warn-on-reflection* true]
    (let [x (eval fn-code)
          ;_ (.println (System/err) "foo") ; This correctly makes us return "foo".
          n (.available pipe-in)
          ^bytes b (make-array Byte/TYPE n)
          _ (.read pipe-in b)
          s (apply str (mapv char b))]
      s)))
However, calling it gives no warning, and no flushing seems to be useful:
(println "Reflection check:" (reflection-check '(fn [x] (.length x)))) ; no warning.
How can I extract the reflection warning?
You have correctly discovered how *err* is initialized, but since vars are rebindable this is no guarantee about its current value. The REPL often rebinds it to something else, e.g. a socket. If you want to redirect it yourself, you should simply rebind *err* to a Writer of your choosing.
Really I'm not sure your approach would work even if *err* were never rebound. The Clojure runtime has captured a pointer to the original value of System.err, and then you ask the Java runtime to use a new value for System.err. Clojure certainly won't know about this new value. Does the JRE maintain an extra level of indirection to allow it to do these swaps behind the scenes even for people who have already captured System.err? Maybe, but if so it's not documented.
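A minimal sketch of rebinding *err* as suggested above, assuming a java.io.StringWriter is an acceptable target. The function name reflection-warnings is made up for illustration.

```clojure
;; Hypothetical helper: evaluate a form with reflection warnings enabled
;; and return whatever the compiler wrote to *err* as a string.
(defn reflection-warnings [form]
  (let [w (java.io.StringWriter.)]
    (binding [*err* w
              *warn-on-reflection* true]
      (eval form))
    (str w)))

;; Unhinted call: the compiler cannot resolve .length, so it warns.
(def unhinted (reflection-warnings '(fn [x] (.length x))))

;; Hinted call: no reflection is needed, so nothing is written.
(def hinted (reflection-warnings '(fn [^String x] (.length x))))
```

Because the warning is written while the form is compiled, the eval has to happen inside the binding.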
I ran into a similar problem a while back and created some helper functions modelled on with-out-str. Here is a solution to your problem:
(ns tst.demo.core
  (:use tupelo.core tupelo.test))
(defn reflection-check
  [fn-code]
  (let [err-str (with-err-str
                  (binding [*warn-on-reflection* true]
                    (eval fn-code)))]
    (spyx err-str)))
(dotest
  (reflection-check (quote (fn [x] (.length x)))))
with result:
-------------------------------
Clojure 1.10.1 Java 14
-------------------------------
err-str => "Reflection warning, /tmp/form-init3884945788481466752.clj:12:36
- reference to field length can't be resolved.\n"
Note that binding and let forms can be in either order and still work.
Here is the source code:
(defmacro with-err-str
  "Evaluates exprs in a context in which *err* is bound to a fresh
  StringWriter. Returns the string created by any nested printing
  calls."
  [& body]
  `(let [s# (new java.io.StringWriter)]
     (binding [*err* s#]
       ~@body
       (str s#))))
If you need to capture the Java System.err stream, it is different:
(defmacro with-system-err-str
  "Evaluates exprs in a context in which JVM System/err is bound to a fresh
  PrintStream. Returns the string created by any nested printing calls."
  [& body]
  `(let [baos# (ByteArrayOutputStream.)
         ps# (PrintStream. baos#)]
     (System/setErr ps#)
     ~@body
     (System/setErr System/err)
     (.close ps#)
     (.toString baos#)))
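One thing to watch in the macro as quoted: by the time (System/setErr System/err) runs, System/err already refers to the replacement stream, so the original stream is not put back. A variant that remembers the original and restores it in a finally block might look like the sketch below. The name with-restored-system-err-str is made up for illustration and is not part of any library.

```clojure
;; Hypothetical variant: capture JVM System/err output as a string and
;; put the original stream back afterwards, even if body throws.
(defmacro with-restored-system-err-str
  [& body]
  `(let [original# System/err
         baos#     (java.io.ByteArrayOutputStream.)
         ps#       (java.io.PrintStream. baos# true)]
     (System/setErr ps#)
     (try
       ~@body
       (.flush ps#)
       (str baos#)
       (finally
         (System/setErr original#)))))
```

Usage: (with-restored-system-err-str (.println System/err "boom")) returns a string containing "boom", and System/err is the original stream again afterwards.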
See the docs here.
There are 5 variants (plus clojure.core/with-out-str):
with-err-str
with-system-out-str
with-system-err-str
discarding-system-out
discarding-system-err
Source code is here.

Clojure map outside scope

I am trying to save data into a collection of some sort, but the program that I have is saving everything into a separate map. I want to make it one map.
(defn readFile []
  (map (fn [line] (clojure.string/split line #";"))
       (with-open [rdr (reader "C:/Users/Rohil/Desktop/textfile.txt")]
         (doseq [[idx line] (map-indexed vector (line-seq rdr))]
           (if (.contains line "201609")
             (if (not (.contains line "TBA"))
               (println (assoc table
                               :code (nth (clojure.string/split line #";") 3)
                               :instructor (nth (clojure.string/split line #";") 19))))))))))
Any help will be appreciated.
Looks like you are adapting to Clojure :-) I went through the same process. Hang on, it will be worth it!
First: it is important to realize that map will save the result of the function into a new collection. Like cfrick mentions, println returns nil and assoc does not change a map.
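A tiny sketch of those two points. The empty starting table and the sample values are assumptions for illustration, since the question does not show how table is defined.

```clojure
;; Assumed starting point: an empty map named like the one in the question.
(def table {})

;; assoc returns a new map; table itself is not modified.
(def updated (assoc table :code "CS101" :instructor "Smith"))

;; println prints its argument and returns nil, so its result is useless as data.
(def printed (println updated))
```

After this runs, table is still {}, updated holds both keys, and printed is nil.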
I'm guessing a bit here at what you are trying to do: you want a collection of maps, where every map has two keys, like so:
[
 {:code 1 :instructor "blah"}
 {:code 2 :instructor "boo"}
]
You need these values to come from a file, but you only want to keep the lines that contain "201609" but not "TBA".
First some general remarks:
You probably want to split this function into smaller parts. One could be the check for lines (contains "201609" but not "TBA"), another could read the file...
I know it is the title of your question, but most likely there is a better way than changing a global variable. Maybe you could make the function readFile return the table?
Try to pass in arguments to your function instead of hard-coding values such as the file name.
I'm not sure what you are trying to do with the line (doseq [[... Please give us more context there. I will ignore it.
Here is a possible solution:
(ns test
  (:require [clojure.string :as s]
            [clojure.java.io :as io]))
(defn line-filter [include exclude line]
  (and (not (s/includes? line exclude))
       (s/includes? line include)))
(defn process-line [line]
  (let [line-parts (s/split line #";")
        code (nth line-parts 3)
        instructor (nth line-parts 19)]
    {:code code :instructor instructor}))
(defn read-file [file-name]
  (s/split (slurp (io/resource file-name)) #"\n"))
(defn parse-lines [lines]
  (map process-line lines))
(defn read-file-and-parse
  "This function will read a file, process the lines, and output a collection of maps"
  [filename search-for exclude]
  (parse-lines
    (filter #(line-filter search-for exclude %)
            (read-file filename))))
You can now call this function like this: (read-file-and-parse "test.txt" "201609" "TBA")
If you want to add the result of this function into your table, you can use concat. But again, this will return a new version of your list (with new entries added) and not change the one you defined earlier.
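For instance, with made-up data (the var names and rows here are only for illustration):

```clojure
;; Assumed existing rows and newly parsed rows.
(def existing-rows [{:code "1" :instructor "blah"}])
(def parsed-rows [{:code "2" :instructor "boo"}])

;; concat returns a new lazy seq holding both; existing-rows is unchanged.
(def combined (concat existing-rows parsed-rows))

;; into does the same but keeps the result a vector.
(def combined-vec (into existing-rows parsed-rows))
```

Either way you have to use the returned value; existing-rows still holds one entry afterwards.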
Welcome to functional programming :-)))

Recursive Clojure function not recurring when called from last place in the calling function

This is my caller
(resolveEntity [r entity-id]
  (println "resolve" entity-id)
  (recursive-get r entity-id)
  (cache entity-id))
Called function is
(defn recursive-get [r entity-id]
  (println "recursive" entity-id)
  (let [e (f (merge {} (-> r :conns first d/db (d/entity entity-id))))]
    (alter-var-root #'cache assoc entity-id e)
    (for [[k v] e]
      (if (:db/isComponent (k components))
        (if (not= (class v) Long)
          (map #(recursive-get r %) v)
          (recursive-get r v))))))
The called function is called just once. If I remove the last line in the caller (cache entity-id), then it recurs every time that I want it to, but I need to return something else (cache entity-id).
I tested similar but simpler code (a recursive function not called at the tail of a calling function) in the REPL and it worked, so I am left banging my head against the table.
You have been bitten by a Lazy-Bug!
If you remove the last line, then the return value of the function is the result of (recursive-get r entity-id), which the REPL then iterates through so it can print it. The act of printing each value causes each entry in the lazy collection to be evaluated. When you put another line after that, the lazy sequence returned by recursive-get is ignored: nothing reads the entries, so they remain in the unrealized lazy state forever and the computation never happens.
To fix this wrap it in a call to dorun:
(dorun (recursive-get r entity-id))
Or if you need to save the result then use doall instead.
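The effect is easy to reproduce in isolation. The sketch below uses made-up names (calls, visit, lazy-visit-all and the two callers) and a counter to show when the body of a for actually runs.

```clojure
;; Counts how many times visit has really been executed.
(def calls (atom 0))

(defn visit [n]
  (swap! calls inc)
  n)

;; Returns a lazy seq; visit runs only when the seq is consumed.
(defn lazy-visit-all [coll]
  (for [x coll] (visit x)))

;; The lazy seq is created and thrown away, so visit never runs.
(defn ignoring-caller []
  (lazy-visit-all [1 2 3])
  :done)

;; dorun walks the seq for its side effects and discards the values.
(defn forcing-caller []
  (dorun (lazy-visit-all [1 2 3]))
  :done)
```

Note that dorun and doall only walk the sequence they are given. If the elements are themselves lazy sequences, as with the nested map and recursive-get calls in the question, those need to be forced as well, for example by using doseq or run! for the side-effecting traversal.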

Why does binding affect the type of my map?

I was playing around in the REPL and I got some weird behavior:
Clojure 1.4.0
user=> (type {:a 1})
clojure.lang.PersistentArrayMap
user=> (def x {:a 1})
#'user/x
user=> (type x)
clojure.lang.PersistentHashMap
I thought that all small literal maps were instances of PersistentArrayMap, but apparently that's not the case if it's been bound with def. Why would using def cause Clojure to choose a different representation for my little map? I know it's probably just some strange implementation detail, but I'm curious.
This question made me dig into the Clojure source code. I just spent a few hours putting print statements in the source in order to figure this out.
It turns out the two map expressions are evaluated through different code paths
(type {:a 1}) causes Java byte-code to be emitted and run. The emitted code uses clojure.lang.RT.map() to construct the map, which returns a PersistentArrayMap for small maps:
static public IPersistentMap map(Object... init){
    if(init == null)
        return PersistentArrayMap.EMPTY;
    else if(init.length <= PersistentArrayMap.HASHTABLE_THRESHOLD)
        return PersistentArrayMap.createWithCheck(init);
    return PersistentHashMap.createWithCheck(init);
}
When evaluating (def x {:a 1}), at least from the REPL, there's no byte-code emitted. The constant map is parsed as a PersistentHashMap in clojure.lang.Compiler$MapExpr.parse(), which returns it wrapped in a ConstantExpr:
else if(constant)
    {
    IPersistentMap m = PersistentHashMap.EMPTY;
    for(int i=0;i<keyvals.length();i+= 2)
        {
        m = m.assoc(((LiteralExpr)keyvals.nth(i)).val(), ((LiteralExpr)keyvals.nth(i+1)).val());
        }
    //System.err.println("Constant: " + m);
    return new ConstantExpr(m);
    }
The def expression, when evaluated, binds the value of the ConstantExpr created above, which, as said, is a PersistentHashMap.
So why is it implemented this way?
I don't know. It could be simple oversight or the PersistentArrayMap optimization may not really be worth it.
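Whichever representation you end up with, it makes no difference to equality or lookup. A small sketch that picks the representation explicitly with array-map and hash-map instead of relying on how a literal happens to be compiled (the var names are made up for illustration):

```clojure
;; Explicitly built small maps, one of each representation.
(def small-array (array-map :a 1))
(def small-hash (hash-map :a 1))

;; A map with many entries is stored as a hash map.
(def large (into {} (map (fn [i] [i i]) (range 20))))

;; The concrete classes differ, but the maps compare equal and behave the same.
(def same? (= small-array small-hash))
```

So code should depend only on the map abstraction, not on the concrete class reported by type.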

Resources