xquery using fn:count function on sequence - xquery

I am updating a file that returns a sequence of figure elements from a search. The count function works just fine to count the total number of items returned:
let $total-count := fn:count($result)
However, I also need to count the number of figure elements that contain video elements as well as those that contain graphic elements, but these statements do not work:
let $vid-count := fn:count($result//video)
let $graph-count := fn:count($result//graphic)
Any ideas?

You're counting the video (or graphic) elements contained inside all of the $result elements, not the figure elements themselves. To count the elements that contain another element, use a predicate instead of an axis step:
let $vid-count := fn:count($result[.//video])
let $graph-count := fn:count($result[.//graphic])
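The difference between the two counts can be sketched outside XQuery as well. The following is a minimal Python illustration using the standard library's ElementTree; the sample document and the helper names count_descendants and count_containing are assumptions made up for this sketch, not part of the original search code.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for the search result.
doc = ET.fromstring("""
<results>
  <figure><video/><video/></figure>
  <figure><graphic/></figure>
  <figure><video/><graphic/></figure>
  <figure/>
</results>
""")
result = doc.findall("figure")


def count_descendants(figures, tag):
    """Analogue of count($result//tag): counts the matching descendants."""
    return sum(len(fig.findall(".//" + tag)) for fig in figures)


def count_containing(figures, tag):
    """Analogue of count($result[.//tag]): counts figures that contain a match."""
    # 'is not None' matters: an element without children is falsy in ElementTree.
    return sum(1 for fig in figures if fig.find(".//" + tag) is not None)


print(len(result))                         # total number of figures
print(count_descendants(result, "video"))  # number of video elements
print(count_containing(result, "video"))   # number of figures containing a video
```

With this sample, the first figure holds two video elements, so the descendant count is larger than the number of figures containing a video, which is the discrepancy described in the question.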

Related

Filtering data, comma vs not comma

I have the following code
#abnormal return
exp.ret <- lm((RET-rf)~mkt.rf+smb+hml, data=tesla[tesla$period=="estimation.period",])
tesla$abn.ret <- (tesla$RET-tesla$rf)-predict(exp.ret,tesla)
#CAR during event window
CAR <- sum(tesla$abn.ret[tesla$period=="event.period",])
The first section runs fine, but the second gives this error:
Error in tesla$abn.ret[tesla$period == "event.period", ] :
  incorrect number of dimensions
I know that the solution is to remove the last comma:
#CAR during event window
CAR <- sum(tesla$abn.ret[tesla$period=="event.period"])
I'm just wondering what the right pedagogical way of understanding this is: why do I need a trailing comma in some cases but not in others when I'm filtering for only part of the data frame?
$, [[ ]] and [ ] have different meanings.
In short:
$ and [[ ]] extract one column of a data frame or one item of a list.
Extracting from a data frame gives a vector, while extracting from a list gives an object of the same class as the extracted item, which can be a data frame, another list, etc.
It's important to note that $ doesn't accept a column index (only a column name) and that you cannot put two column names or indices after $ or inside [[ ]].
[ ] slices a data frame or a list, selecting one or more elements.
The class of the output is generally the same as that of the original variable.
If you slice a list using [ ], the output is a list; if you slice a data frame, the output is a data frame, except that selecting a single column with [rows, column] drops the result to a vector by default.
In your specific case, you used $ to extract a column, which gives a vector. Then you tried to slice that vector using [ , ], but a vector has only one dimension, so an error was raised. You should slice the vector using [ ] (the output will be a vector) or [[ ]] (the output will be a vector of length 1).
Possible ways to subset tesla as you wish:
tesla$abn.ret[tesla$period == "event.period"]
tesla[["abn.ret"]][tesla$period == "event.period"]
tesla[tesla$period == "event.period", "abn.ret"]
You would achieve the same result using tesla[["period"]] instead of tesla$period.
For some extra details/examples, refer to An introduction to R, published by CRAN.
I hope it helped you somehow..!
tesla$abn.ret is one-dimensional. Each comma separates a dimension, so yours implies 2 dimensions.
Alternatively you could run
tesla[tesla$period=="event.period", "abn.ret"]
And get the same results, since tesla is 2-d.
If you look at the documentation with the command ?'[', you will find that by default x[i, j] drops dimensions where possible (drop = TRUE), which is why selecting a single column of a data frame returns a vector.
If you want to disable the dropping of the dimension, you have to write x[i, j, drop = FALSE] explicitly.

Regex in R match specified words when they all (two or more) occur in whatever order within certain distance in particular line

I have a double challenge.
First, I want to match lines that contain two (or eventually more) specified words within a certain distance of each other, in whatever order.
Using lookarounds I manage to select lines matching two or more words, regardless of the order in which they occur. I can also easily add more words to be found in the same line, so this approach extends without much effort when more words must occur. The disadvantage is that I can't specify the maximum distance between them.
^(?=.*\bjohn)(?=.*\bjack).*$
By using the pipe operator I can spell out both orders in which the terms may occur, as well as the accepted distance between them, but when more words have to be matched the code becomes lengthy and error-prone.
jack.{0,100}john|john.{0,100}jack
Is there a way to combine the respective advantages of both approaches in one regular expression?
Second, ideally I would like only 'jack' and 'john' to be selected in the line, not the whole line.
Is there a possibility to do this all at once?
For this case, you have to use the second approach, but it can't be done conveniently with a hand-written regex alone. You have to use the tools of the host language, such as paste in R, to build a regex of the second form programmatically.
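In Python, I would build the long regex like this. Note that the gap has to be written as .{0,100} (with the leading dot); a bare {0,100} would only repeat the preceding letter.
>>> def create_reg(lis):
...     out = []
...     for first, gap, second in lis:
...         # emit both orders: first..second and second..first
...         out.append(first + gap + second + '|' + second + gap + first)
...     return '(?:' + '|'.join(out) + ')'
...
>>> lst = [('john', '.{0,100}', 'jack'), ('foo', '.{0,100}', 'bar')]
>>> create_reg(lst)
'(?:john.{0,100}jack|jack.{0,100}john|foo.{0,100}bar|bar.{0,100}foo)'
>>>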
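The same idea can be extended to more than two words and to returning only the words rather than the whole line. The sketch below is one possible way to do that; the function names build_proximity_regex and matched_words, the use of \b word boundaries, and the lazy gap quantifier are assumptions of this example rather than something the question specifies.

```python
import re
from itertools import permutations


def build_proximity_regex(words, max_gap=100):
    """Build one alternation covering every order of the given words.

    Consecutive words may be separated by at most max_gap characters.
    """
    gap = ".{0,%d}?" % max_gap  # lazy, so the shortest span is preferred
    alternatives = []
    for order in permutations(words):
        parts = [r"\b" + re.escape(w) + r"\b" for w in order]
        alternatives.append(gap.join(parts))
    return re.compile("(?:" + "|".join(alternatives) + ")")


def matched_words(line, words, max_gap=100):
    """Return only the searched words, in order of appearance, or []."""
    match = build_proximity_regex(words, max_gap).search(line)
    if match is None:
        return []
    word_re = re.compile("|".join(r"\b" + re.escape(w) + r"\b" for w in words))
    return word_re.findall(match.group(0))


print(matched_words("then jack met john today", ["john", "jack"]))
print(matched_words("john " + "x" * 200 + " jack", ["john", "jack"]))
```

The number of alternatives grows factorially with the number of words, so this is practical only for a handful of terms. The second call returns an empty list because the two names are more than 100 characters apart.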

Merge sort, the recursion part

After studying the merge sort for a couple of days, I understand it conceptually, but there is one thing that I don't get.
What I get:
1.) It takes a list, for example an array of numbers, splits it in half, sorts the two halves, and in the end merges them together.
2.) Because it's a recursive algorithm, it uses recursion to do that.
So the split of the mentioned array looks like this:
It splits the array until there is only one item in each list, at which point each list is considered sorted. That is where the merge step comes in.
Which should look like this:
What I don't get is, how does the recursion "know" after it splits all the lists to only one item in a list, to get back up the recursion tree? How does something that has a left and right side become the left side after it merges?
The thing that bothers me is this. I've taken a snapshot of the code from the interactivepython page.
How does the code get, after we have lefthalf = 2 and righthalf = 1, to the code that's shown in the picture, where lefthalf = [1,2] and righthalf = [4,3], without going back to the recursion that would divide what we have just merged?
Tnx,
Tom
Once the list only contains one element, each pair of leaves are sorted and joined. Then you can traverse through the list and find out where the next pair should be inserted. The recursion "knows" nothing about going back up the recursion tree, rather it is the act of sorting and joining that has this effect.
The "recursion" does of course know nothing of that sort. It is the code that uses the recursion, which looks like this (a bit simplified):
sort list = merge (sort left_half) (sort right_half)
  where
    (left_half, right_half) = split list
Here you see that the "recursion" (i.e. the recursive invocations of sort) don't need to "know" anything. Their only job is to deliver a sorted list, array or whatever.
To put it differently: If we have merge satisfying the following invariant:
1. `merge`, given two sorted lists, will return a sorted list.
then we can write mergesort easily like outlined above. What is left to do in sort is to handle the easy cases: empty list, singleton and list with two elements.
If you are talking about odd-length sublists, then it depends on the implementation.
It either puts the bigger sub list on the left every single time, or it puts it on the right every single time.
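The point made in the answers above, that the recursion does not need to "know" anything, can be seen in a small Python sketch. This is not the interactivepython code from the question; the function names merge and merge_sort are chosen here for illustration, and this version returns new lists instead of sorting in place.

```python
def merge(left, right):
    """Merge two already sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of these still has items left; they are already sorted.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort(items):
    """Return a sorted copy of items."""
    if len(items) <= 1:
        # Base case: nothing to split, so simply return to the caller.
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])    # finishes completely before the next line runs
    right = merge_sort(items[mid:])
    return merge(left, right)         # the caller receives one sorted list


print(merge_sort([2, 1, 4, 3]))
```

Each call simply returns to whoever called it. When the call for [2, 1] returns [1, 2], execution resumes in the call for [2, 1, 4, 3] on the line after `left = ...`, and the right half [4, 3] has not been touched yet. Nothing is split again because no line of code asks for it.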

Checking last element in a boost::fusion::for_each loop

I want to know if there is a way to check for the last element in a fusion for_each loop (in order to apply special code for this case)
Edit : Maybe a better question should be :
I have played with fusion::for_each; now I want to apply code to each element of a fusion sequence, with special code (special code does not mean "extra code" but different code) for the last element. Maybe I should use iterators (an example, please)?
Some ideas:
1) use boost::fusion::fold, count your way through, and perform your special handling on the last element
2) if all types in the tuple are distinct, match on the type to determine the last one
3) include some sort of marker for the last element on which you can match
4) use 'prior(end(v))' to reach and manipulate the last element once the for_each processing is complete

Stopping a large number of zeros being printed (not scientific notation)

What I'm trying to achieve is to have all printed numbers display at maximum 7 digits. Here are examples of what I want printed:
0.000000 (versus the actual number which is 0.000000000029481.....)
0.299180 (versus the actual number which is 0.299180291884922.....)
I've had success with the latter types of numbers by using options(scipen=99999) and options(digits=6). However, the former example will always print a huge number of zeros followed by five non-zero digits. How do I stop this from occurring and achieve my desired result? I also do not want scientific notation.
I want this to apply to ALL printed numbers in EVERY context. For example if I have some matrix, call it A, and I print this matrix, I want every element to just be 6-7 digits. I want this to be automatic for every print in every context; just like using options(digits=6) and options(scipen=99999) makes it automatic for every context.
You can define a new print method for the type you wish to print. For example, if all your numbers are doubles, you can create
print.double <- function(x, ...) print(sprintf("%.6f", x), quote = FALSE)
Now, when you call print() on a double (or a vector of doubles), the function print.double() will be called instead of print.default().
You may have to create similar functions print.integer(), print.complex(), etc., depending on the types you need to print.
To return to the default print method, simply remove the function with rm(print.double).
Are all your numbers < 1? You could try a simple sprintf( "%.6f", x ). Otherwise you could try wrapping things to sprintf based on the number of digits; check ?sprintf for other details.
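The effect of the "%.6f" format on the two numbers from the question can be checked quickly. The sketch below uses Python rather than R, on the assumption that its % formatting follows the same C printf conventions as sprintf; the helper names fixed6 and format_matrix are made up for this illustration.

```python
def fixed6(x):
    """Format a number with exactly six digits after the decimal point."""
    return "%.6f" % x


def format_matrix(rows):
    """Apply the same fixed format to every element of a nested list."""
    return [[fixed6(value) for value in row] for row in rows]


print(fixed6(0.000000000029481))
print(fixed6(0.299180291884922))
print(format_matrix([[0.5, 0.000000000029481], [1.25, 0.299180291884922]]))
```

Fixed-point formatting never switches to scientific notation, so very small values simply round to 0.000000.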
