Question about Puppeteer xpath element evaluation - web-scraping

I am working on some puppeteer code and ran into an issue that I'm looking to better understand.
When working with XPaths I'm having trouble wrapping my head around why this code works:
await page.goto(url)
//identify element with absolute xpath then click
const b = (await page.$x("<absolute XPath>"))[0]
//Works!
b.click()
//But this won't work
const b = (await page.$x("<absolute XPath>"))
b.click()
//And this won't work
b.click()
const b = await page.$x("<absolute XPath>")
Why does wrapping the await page.$x call in parentheses work? What's happening there?
When I remove the parentheses, remove the array index, or make any other change, I get the standard errors, for example "click is not a function of b". I was under the impression that I'm evaluating an absolute XPath, i.e. a single element. Where does the array come from? And what happens when I wrap the entire expression in parentheses and specify the array index, so that it evaluates straight to the element I want with no issues?
I would appreciate any insight or links to the right information. I hope it's not something I missed in the docs.

It is not the parentheses but the [0] that changes everything.
An XPath can match many elements, so page.$x always returns a list/array, even if it finds only one element or nothing, and you have to use [0] to get the first element from the list. You can use another index to get another element (if the list is longer). (Of course, the list can also be empty; then [0] gives undefined, and calling click() on it raises an error.)
You can't do list.click(). You have to do first = list[0] and then first.click(). Or you can use forEach to click every element in the list.
The parentheses are needed only to execute the operations in the correct order: first await, then [0].
Without the parentheses, [0] is applied first (to the Promise returned by page.$x, which gives undefined) and await runs afterwards.
So you could rewrite it on more lines
const all_results = await page.$x("<XPath>")
const first = all_results[0]
first.click()
or write shorter
const all_results = await page.$x("<XPath>")
all_results[0].click()
or
const first = (await page.$x("<XPath>"))[0]
first.click()
If you use console.log(b), you will see what this variable really holds in each version.
BTW:
The same applies to CSS selectors: page.$$("<CSS selector>") also returns a list/array (page.$("<CSS selector>") returns only the first matching element, or null).
EDIT:
As @ggorlen mentioned in a comment, you can assign the first element using array destructuring on the left side
const [first] = await page.$x("<XPath>")
first.click()
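To see the operator ordering without launching a browser, here is a minimal sketch. fakeXPathQuery is a made-up stand-in for page.$x (it only mimics "returns a Promise of an array"), so treat it as an illustration of the await/[0] ordering rather than of Puppeteer itself.

```javascript
// Hypothetical stand-in for page.$x: resolves to an array of "handles".
function fakeXPathQuery() {
  return Promise.resolve([{ click: () => "clicked" }]);
}

async function demo() {
  // Parentheses: await the Promise first, then index the resulting array.
  const first = (await fakeXPathQuery())[0];

  // No parentheses: [0] is read from the Promise object itself (undefined),
  // and awaiting undefined just yields undefined.
  const wrong = await fakeXPathQuery()[0];

  // No index at all: the awaited value is the whole array, which has no click().
  const list = await fakeXPathQuery();

  // Array destructuring picks the first element without needing parentheses.
  const [alsoFirst] = await fakeXPathQuery();

  return {
    first: first.click(),
    wrong,
    listHasClick: typeof list.click === "function",
    alsoFirst: alsoFirst.click(),
  };
}

demo().then((result) => console.log(result));
```

In this sketch only the parenthesised and the destructured versions end up holding something with a click() method.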

Related

Get array's first item as object in TinkerPop3 Gremlin query and JanusGraph

I faced this issue during a migration of gremlin queries from v2 to v3.
V2-way: inE().has(some condition).outV().map().toList()[0] will return an object. This is wrapped in transform{label: it./etc/} step.
V3-way, still WIP: inE().has(some condition).outV().fold() will return an array. This is wrapped in project(...).by(...) step.
V3 works fine, I just have to unwrap an item from the array manually. I wonder if there is a more sane approach (anyway, this feels like a non-graph-friendly step).
Environment: JanusGraph, TinkerPop3+. For v2: Titan graph db and TinkerPop2+.
Update: V3 query sample
inE('edge1').
  has('cond1').outV(). // one vertex left
  project('items', 'count'). // pagination
    by(
      order().
        by('field1', decr).
      project('vertex_itself', 'vertex2', 'vertices3').
        by(identity()).
        by(outE('edge2').has('type', 'type1').limit(1).inV().fold()). // now this is empty array or single-element array, can we return element itself?
        by(inE('edge2').has('type', 'type2').outV().fold()).
      fold()).
    by(count())
Desired result shape:
[{
  items: [
    {vertex_itself: Object, vertex2: Object/null/empty, vertices3: Array},
    {}...
  ],
  count: Number,
}]
Problem: vertex2 property is always an array, empty or single-element.
Expected: vertex2 to be object or null/empty.
Update 2: it turns out my query is not finished yet: it returns many objects when the has('cond1').outV() step does not yield a single element, e.g. [{items, count}, {items, count}...]
It looks like your main issue is getting a single item from the traversal.
You can do this with next(), which retrieves the next element in the current traversal iteration:
inE().has(some condition).outV().next()
The structure of the returned result is, I think, implementation specific. E.g. in JavaScript, you can access the item with the value property:
const result = await inE().has(some condition).outV().next();
const item = result.value;
I may not fully understand, but it sounds like from this:
inE().has(some condition).outV().fold()
you want to just grab the first vertex you come across. If that's right, then is there a reason to fold() at all? Maybe just do:
inE().has(some condition).outV().limit(1)
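If the project(...).by(...fold()) shape from the question is kept, the manual unwrapping it mentions can also be done on the client once the result has arrived. The following is only a sketch in plain JavaScript: unwrapSingle and the sample result object are hypothetical, and the ids are invented to mirror the desired result shape.

```javascript
// Hypothetical helper: turn a folded 0-or-1 element array into the element or null.
function unwrapSingle(folded) {
  return folded.length > 0 ? folded[0] : null;
}

// Hypothetical result in the shape produced by project('items', 'count').
const result = [{
  items: [
    { vertex_itself: { id: 1 }, vertex2: [{ id: 7 }], vertices3: [] },
    { vertex_itself: { id: 2 }, vertex2: [], vertices3: [{ id: 9 }] },
  ],
  count: 2,
}];

// Build a new result where vertex2 is an object or null instead of an array.
const reshaped = result.map((page) => ({
  ...page,
  items: page.items.map((item) => ({
    ...item,
    vertex2: unwrapSingle(item.vertex2),
  })),
}));

console.log(JSON.stringify(reshaped, null, 2));
```

This leaves vertices3 as an array and only changes vertex2, which is the property the question wants as an object or null.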

complex reduce sample unclear how the reduce works

I am starting with the complex reduce sample.
I have trimmed it down to a single chart, and I am trying to understand how the reduce works.
I have added comments to the code that were not in the example, describing what I think is happening based on how I read the docs.
function groupArrayAdd(keyfn) {
    var bisect = d3.bisector(keyfn); //set the bisector value function
    //elements is the group that we are reducing, item is the current item
    //this is the reduce function being supplied to the reduce call on the group runAvgGroup for add below
    return function(elements, item) {
        //get the position of the key value for this element in the sorted array and put it there
        var pos = bisect.right(elements, keyfn(item));
        elements.splice(pos, 0, item);
        return elements;
    };
}
function groupArrayRemove(keyfn) {
    var bisect = d3.bisector(keyfn); //set the bisector value function
    //elements is the group that we are reducing, item is the current item
    //this is the reduce function being supplied to the reduce call on the group runAvgGroup for remove below
    return function(elements, item) {
        //get the position of the key value for this element in the sorted array and splice it out
        var pos = bisect.left(elements, keyfn(item));
        if(keyfn(elements[pos])===keyfn(item))
            elements.splice(pos, 1);
        return elements;
    };
}
function groupArrayInit() {
    //for each key found by the key function return this array?
    return []; //the result array for where the data is being inserted in sorted order?
}
I am not quite sure my understanding of how this works is right. Some of the magic isn't showing itself. Am I correct that elements is the group the reduce function is being called on? Also, how is the array in groupArrayInit() being indirectly populated?
Part of me feels that the functions supplied to the reduce call are really array.map functions, not array.reduce functions, but I just can't quite put my finger on why. Having read the docs, I am just not making a connection here.
Any help would be appreciated.
Also, have I missed Pens/Fiddles that are created for all these examples? For instance, for this one:
http://dc-js.github.io/dc.js/examples/complex-reduce.html which is where I started with this, but I had to download the CSV and manually convert it to JSON.
--------------Update
I added some print statements to try to clarify how the add function is working
function groupArrayAdd(keyfn) {
    var bisect = d3.bisector(keyfn); //set the bisector value function
    //elements is the group that we are reducing, item is the current item
    //this is the reduce function being supplied to the reduce call on the group runAvgGroup for add below
    return function(elements, item) {
        console.log("---Start Elements and Item and keyfn(item)----")
        console.log(elements) //elements grouped by run?
        console.log(item) //not seeing the pattern on what this is on each run
        console.log(keyfn(item))
        console.log("---End----")
        //get the position of the key value for this element in the sorted array and put it there
        var pos = bisect.right(elements, keyfn(item));
        elements.splice(pos, 0, item);
        return elements;
    };
}
and to print out the group's contents
console.log("RunAvgGroup")
console.log(runAvgGroup.top(Infinity))
which results in output (the screenshot is not preserved here) that appears to be incorrect, because the values are not sorted by key (the run number).
And looking at the results of the print statements doesn't seem to help either.
This looks basically right to me. The issues are just conceptual.
Crossfilter’s group.reduce is not exactly like either Array.reduce or Array.map. Instead, group.reduce defines methods for handling adding new records to a group or removing records from a group. So it is conceptually similar to an incremental Array.reduce that supports a reversal operation. This allows filters to be applied and removed.
Group.top returns your list of groups. The value property of these groups should be the elements value that your reduce functions return. The key of the group is the value returned by your group accessor (defined in the dimension.group call that creates your group) or your dimension accessor if you didn’t define a group accessor. Reduce functions work only on the group values and do not have direct access to the group key.
So check those values in the group.top output and hopefully you’ll see the lists of elements you expect.
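To make the add/remove/init roles concrete, here is a minimal plain-JavaScript sketch. It is not Crossfilter's implementation: makeGroup, addRecord, removeRecord, the two bisect helpers (standing in for d3.bisector) and the sample records with run and speed fields are all made up for illustration. The point is only that init creates one value per group key, and add/remove then update that value one record at a time.

```javascript
// Simple stand-ins for d3.bisector(keyfn).right / .left on a sorted array.
function bisectRight(elements, key, keyfn) {
  let pos = 0;
  while (pos < elements.length && keyfn(elements[pos]) <= key) pos++;
  return pos;
}
function bisectLeft(elements, key, keyfn) {
  let pos = 0;
  while (pos < elements.length && keyfn(elements[pos]) < key) pos++;
  return pos;
}

// Hypothetical miniature of a group: one value per group key,
// maintained incrementally through add / remove / init.
function makeGroup(groupKey, add, remove, init) {
  const values = new Map();
  return {
    addRecord(record) {
      const k = groupKey(record);
      if (!values.has(k)) values.set(k, init()); // init runs once per new key
      values.set(k, add(values.get(k), record));
    },
    removeRecord(record) {
      const k = groupKey(record);
      values.set(k, remove(values.get(k), record));
    },
    all() {
      // List of { key, value } pairs ordered by key.
      return [...values.entries()]
        .sort((a, b) => a[0] - b[0])
        .map(([key, value]) => ({ key, value }));
    },
  };
}

const keyfn = (d) => d.speed; // sort order inside each group's array
const group = makeGroup(
  (d) => d.run, // group key accessor
  (elements, item) => {
    elements.splice(bisectRight(elements, keyfn(item), keyfn), 0, item);
    return elements;
  },
  (elements, item) => {
    const pos = bisectLeft(elements, keyfn(item), keyfn);
    if (pos < elements.length && keyfn(elements[pos]) === keyfn(item)) {
      elements.splice(pos, 1);
    }
    return elements;
  },
  () => []
);

const records = [
  { run: 1, speed: 900 },
  { run: 2, speed: 800 },
  { run: 1, speed: 850 },
];
records.forEach((r) => group.addRecord(r));
const afterAdd = JSON.stringify(group.all());

// Simulate a filter excluding the first record: the remove function runs.
group.removeRecord(records[0]);
const afterRemove = JSON.stringify(group.all());

console.log(afterAdd);
console.log(afterRemove);
```

Here elements is always the current value of one group (the array that init returned for that key), never the whole group list, which is why the reduce functions never see the group key.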

Using Rascal MAP

I am trying to create an empty map that will then be populated within a for loop. I am not sure how to proceed in Rascal. For testing purposes, I tried:
rascal>map[int, list[int]] x;
ok
Though, when I try to populate "x" using:
rascal>x += (1, [1,2,3])
>>>>>>>;
>>>>>>>;
^ Parse error here
I got a parse error.
To start, it would be best to assign it an initial value. You don't have to do this at the console, but this is required if you declare the variable inside a script. Also, if you are going to use +=, it has to already have an assigned value.
rascal>map[int,list[int]] x = ( );
map[int, list[int]]: ()
Then, when you are adding items into the map, the key and the value are separated by a :, not by a ,, so you want something like this instead:
rascal>x += ( 1 : [1,2,3]);
map[int, list[int]]: (1:[1,2,3])
rascal>x[1];
list[int]: [1,2,3]
An easier way to do this is to use similar notation to the lookup shown just above:
rascal>x[1] = [1,2,3];
map[int, list[int]]: (1:[1,2,3])
Generally, if you are just setting the value for one key, or are assigning keys inside a loop, x[key] = value is better; += is better if you are adding two existing maps together and saving the result into one of them.
I also like this solution sometimes, where instead of joining maps you just update the value of a certain key:
m = ();
for (...whatever...) {
m[key]?[] += [1,2,3];
}
In this code, when the key is not yet present in the map, then it starts with the [] empty list and then concatenates [1,2,3] to it, or if the key is present already, let's say it's already at [1,2,3], then this will create [1,2,3,1,2,3] at the specific key in the map.

SELECT COUNT(*) doesn't work in QML

I'm trying to get the number of records with QML LocalStorage, which uses SQLite. Consider this snippet:
function f() {
    var db = LocalStorage.openDatabaseSync(...)
    db.transaction(
        function(tx) {
            var b = tx.executeSql("SELECT * FROM t")
            console.log(b.rows.length)
            var c = tx.executeSql("SELECT COUNT(*) FROM t")
            console.log(JSON.stringify(c))
        }
    )
}
The output is:
qml: 3
qml: {"rowsAffected":0,"insertId":"","rows":{}}
What am I doing wrong that the SELECT COUNT(*) doesn't output anything?
EDIT: rows only seems empty in the second command. Calling
console.log(JSON.stringify(c.rows.item(0)))
gives
qml: {"COUNT(*)":3}
Two questions now:
Why is rows shown as empty?
How can I access the property inside c.rows.item(0)?
To access the items, you have to use:
b.rows.item(i)
Where i is the index of the item you want to get (in your first example, i ranges over 0, 1, 2 because you have 3 items; in the second one it is 0, and you can query it as c.rows.item(0)).
The rows field appears empty, and that is a valid result: the items are not part of the rows field itself (you have to use a method to get them, and as far as I know that method could also be a memento that completely encloses the response data), and the item method is probably defined as non-enumerable (I cannot verify it, I'm on the beach and it's quite difficult to explore the Qt code now :-)). You can safely rely on the length property to know whether any rows were returned, so you can iterate over them to print them out. I did something like that in a project of mine and it works fine.
The properties inside item(0) have the same names as the columns in the query. I suggest rewriting that query as:
select count(*) as cnt from t
Then, you can get the count as:
c.rows.item(0).cnt
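The "rows looks empty" effect can be reproduced in plain JavaScript with a mock object. This is only a sketch of the guess above, not Qt's actual implementation: makeRows is a hypothetical helper that exposes length and item as non-enumerable properties, which is enough for JSON.stringify to print rows as {} while item(i) still works.

```javascript
// Hypothetical stand-in for the rows object returned by tx.executeSql.
function makeRows(records) {
  const rows = {};
  // Non-enumerable properties are skipped by JSON.stringify.
  Object.defineProperty(rows, "length", {
    value: records.length,
    enumerable: false,
  });
  Object.defineProperty(rows, "item", {
    value: (i) => records[i],
    enumerable: false,
  });
  return rows;
}

// Mimics the result of "select count(*) as cnt from t" on a 3-row table.
const c = { rowsAffected: 0, insertId: "", rows: makeRows([{ cnt: 3 }]) };

const serialized = JSON.stringify(c); // rows is serialized as {}
const count = c.rows.item(0).cnt; // the data is still reachable through item()

// Iterating relies on length and item(i), as suggested above.
const seen = [];
for (let i = 0; i < c.rows.length; i++) {
  seen.push(c.rows.item(i));
}

console.log(serialized);
console.log(count);
```

A column name that is not a valid identifier, such as COUNT(*) without an alias, can still be read with bracket notation, e.g. c.rows.item(0)["COUNT(*)"].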

Traverse a tree recursively and return an immutable list of all values found? Possible?

I wonder if this is even possible at all, as the question suggests.
My problem is that I cannot seem to grasp how to handle the fact that a given input value can have multiple children. The problem is easily solved by using the mutable SortedSet variable as shown below. But I would really like to find out whether this problem can be solved with pure recursion and the creation of new immutable lists or similar. I hope my question is clear. I fear I'm ignorant of the easy conclusion that it's not possible.
As you can see below, the if branch will return a list but the else branch will return a list of lists. So the code below is not in a working state.
let someSet = new System.Collections.Generic.SortedSet<string>()
let rec children(value:string,listSoFar) =
    printfn "ID: %A" value
    someSet.Add(value) |> ignore // works fine, of course
    let newList = List.append listSoFar [value]
    if(not (hasChildren(value))) then
        newList
    else
        let tmpCollection = database.GetCollection<Collection>("Collection")
        let tmpQuery = Query.EQ("Field",BsonValue.Create(value))
        let tmpRes = tmpCollection.Find(tmpQuery)
        [ for child in tmpRes do
            yield children(child.Value,newList) ]
let resultList = children("aParentStartValue",[])
//Or do i need to use someSet values?
Unless the tree is very deeply nested (in which case, this would be inefficient), you can write the code as a recursive F# sequence expression that generates elements using yield and yield!:
let rec children (value:string) = seq {
    // Produce the current value as the next element of the sequence
    yield value
    if hasChildren value then
        // If it has children, then get all the children
        let tmpCollection = database.GetCollection<Collection>("Collection")
        let tmpQuery = Query.EQ("Field",BsonValue.Create(value))
        let tmpRes = tmpCollection.Find(tmpQuery)
        // For each child, generate all its sub-children recursively
        // and return all such elements as part of this sequence using 'yield!'
        for child in tmpRes do
            yield! children child.Value }
// Using 'List.ofSeq' to fully evaluate the lazy sequence
let resultList = List.ofSeq (children "aParentStartValue")
If the tree is more deeply nested, then the situation is a bit more difficult. When iterating over all the children, you'd need to pass the list collected so far to the first child, get the result and then pass the resulting list to the next child (using something like List.fold). But the above is clean and should work in most cases.
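The accumulator-threading idea can be sketched as follows. This is a JavaScript rendering rather than F#, with Array.prototype.reduce playing the role of List.fold, and childrenOf is a hypothetical in-memory tree standing in for the database lookup. No list is ever mutated: each step builds a new array and hands it to the next child.

```javascript
// Hypothetical in-memory tree standing in for the database query.
const childrenOf = {
  root: ["a", "b"],
  a: ["a1", "a2"],
  b: [],
  a1: [],
  a2: [],
};

// Returns a new array containing listSoFar, then value, then all of
// value's descendants. listSoFar itself is never modified.
function collect(value, listSoFar) {
  const withValue = [...listSoFar, value];
  // Thread the accumulated list through the children, one after another:
  // the result for one child becomes the input for the next.
  return childrenOf[value].reduce(
    (acc, child) => collect(child, acc),
    withValue
  );
}

const start = [];
const resultList = collect("root", start);

console.log(resultList);
```

Every call returns a flat list, so the "list of lists" problem from the question does not arise, and the start array is still empty afterwards.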
