XQuery count appearance of one element across all direct-parent nodes


In XML file:
<article>
<title>This is book AAAA</title>
<author>A</author>
<author>B</author>
</article>
<article>
<title>This is book BBB</title>
<author>A</author>
<author>C</author>
</article>
I need to use XQuery to output an author's name if he/she appears in more than one <article>. In this case, author A should be output. Please note that one article can have multiple authors.
How should I write the XQuery?

The typical way to do this is to iterate over the distinct values of the entity you want to look up and then filter the query to your constraints using where:
let $authors := $data//article/author
for $author in distinct-values($authors)
where (count($authors[. = $author]) gt 1)
return $author
However, for large amounts of data distinct-values() may not perform well, and implementation-specific methods of getting unique values may be required (e.g. using indexes).
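For readers who want to check the counting logic outside an XQuery processor, here is a minimal Python sketch of the same idea. It is only an illustration: the <root> wrapper and the function name are assumptions added so that the sample parses as one document. Unlike the query above, which counts author elements, this sketch counts each author at most once per article.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Sample data from the question, wrapped in a hypothetical <root> element
# so that it parses as a single well-formed document.
XML = """<root>
<article>
<title>This is book AAAA</title>
<author>A</author>
<author>B</author>
</article>
<article>
<title>This is book BBB</title>
<author>A</author>
<author>C</author>
</article>
</root>"""


def authors_in_multiple_articles(xml_text):
    """Return the sorted names of authors that appear in more than one article."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for article in root.iter("article"):
        # A set counts each author at most once per article.
        counts.update({a.text for a in article.findall("author")})
    return sorted(name for name, n in counts.items() if n > 1)


print(authors_in_multiple_articles(XML))
```

With the sample data only author A is listed in both articles.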

Related

element-attribute-range-query fetching result but element-attribute-value-query is not fetching any result

I wanted to fetch the document which have the particular element attribute value.
So, I tried the cts:element-attribute-value-query but I didn't get any result. But the same element attribute value, I am able to get using cts:element-attribute-range-query.
Here the sample snippet used.
let $s-query := cts:element-attribute-range-query(xs:QName("tit:title"),xs:QName("name"),"=",
"SampleTitle",
("collation=http://marklogic.com/collation/codepoint"))
let $s-query := cts:element-attribute-value-query(xs:QName("tit:title"),xs:QName("name"),
"SampleTitle",
())
return cts:search(fn:doc(),($s-query))
The problem with range-query is it needs the range index. I have hundreds of DB's in multiple hosts. I need to create range indexes on each DB.
What could be the problem with attribute-value-query?

I found the issue after some research.
The result document is actually a French-language document. It has the following structure (this is a sample):
<doc xml:lang="fr:CA" xmlns:tit="title">
<tit:title name="SampleTitle"/>
</doc>
cts:element-attribute-value-query is a language-dependent query. To get French-language results, the language needs to be specified in the options, as follows.
cts:element-attribute-value-query(xs:QName("tit:title"),xs:QName("name"), "SampleTitle",("lang=fr"))
cts:element-attribute-range-query, on the other hand, doesn't require the language option.
Thanks for the effort.

DBpedia: Get a list of Chinese universities and their addresses to populate a Google map?

I'm trying to get a list of Chinese universities and their addresses, the minimum being the city/town name. I will use these addresses to populate a Google map (fiddle here).
I saw interesting code such as:
SELECT ?resource ?value
WHERE {
?resource a <http://dbpedia.org/class/yago/CitiesAndTownsInDenmark> .
?resource <http://dbpedia.org/property/populationTotal> ?value .
FILTER (?value > 100000)
}
ORDER BY ?resource ?value
Since CitiesAndTownsInChina doesn't work:
1. Where can I find the exact name of the class I'm targeting? and
2. Where can I find DBpedia's operator manual?
Note: I'm a very active user on Wikipedia and I'm well aware of all the data available there, but the DBpedia ontology/syntax/keywords are quite hard to get.
Personal note: queries on http://dbpedia.org/snorql/ , http://dbpedia.org/sparql/ , http://querybuilder.dbpedia.org/

(Expanding on my reply to How to find cities with more than X population in a certain country)
CitiesAndTownsInDenmark exists because people use the category http://en.wikipedia.org/wiki/Category:Cities_and_towns_in_Denmark in Wikipedia. Wikipedia categories are pretty loose and as a result there's a lot of variation in style, so even if a useful category exists the name may not be guessable.
In addition categories are maintained manually, and may not be consistently applied.
A good place to start is looking at the data. Visiting http://dbpedia.org/page/Beijing I see yago:MetropolitanAreasOfChina which seems promising, but if you follow that link you'll see it's not well populated.
As a consequence, avoid relying on the existence of such categories and instead query directly for populated places in a country. This information comes from Wikipedia infoboxes, which are much more consistent than categories. Taking Beijing as an exemplar again I found:
select ?s {
?s a <http://dbpedia.org/ontology/PopulatedPlace> ;
<http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/China>
}
(The relevant properties and values for my query were found by copying link location in the Beijing page)
with the result:
"http://dbpedia.org/resource/Hulunbuir"
"http://dbpedia.org/resource/Guangzhou"
"http://dbpedia.org/resource/Chongqing"
"http://dbpedia.org/resource/Kuqa_County"
"http://dbpedia.org/resource/Changzhou"
... nearly 3000 results ...
You'll notice that position is encoded multiple times (geo:lat and long, georss:point, various dbpprop:latd longd things) and that, oddly, there seem to be two values. You can either simply deal with the multiple values in whichever format you prefer, or try picking just one using GROUP BY and SAMPLE.
As for a manual, almost everything I know of is an academic paper, and not very useful. However, the data is reasonably self-documenting.
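If you would rather deal with the multiple position values on the client side, the "pick just one per resource" idea behind GROUP BY and SAMPLE can be sketched in Python as below. This is only an illustration under assumptions: the function name, the row layout and the example.org URIs are hypothetical, and the coordinates are placeholder values, not DBpedia data.

```python
def sample_one_per_resource(rows):
    """Keep the first (lat, long) pair seen for each resource URI.

    rows is an iterable of (resource, lat, long) tuples, e.g. query results
    in which the same resource shows up once per position value.
    """
    picked = {}
    for resource, lat, lon in rows:
        # setdefault leaves an existing entry untouched, so the first pair wins.
        picked.setdefault(resource, (lat, lon))
    return picked


# Placeholder rows (hypothetical URIs and values), only to show the data shape.
rows = [
    ("http://example.org/CityA", 10.0, 20.0),
    ("http://example.org/CityA", 10.1, 20.1),
    ("http://example.org/CityB", 30.0, 40.0),
]
print(sample_one_per_resource(rows))
```

Like SAMPLE, this makes no promise about which of the duplicate values is the better one; it only guarantees a single value per resource.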

For your first question:
You can see the possible classes by querying one member of your intended set of entities (e.g. Shanghai).
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?type WHERE {
<http://dbpedia.org/resource/Shanghai> rdf:type ?type.
FILTER regex(str(?type), ".*China", "i").
} LIMIT 100
which gives this result:
dbpedia:class/yago/MetropolitanAreasOfChina [http]
dbpedia:class/yago/PortCitiesAndTownsInChina [http]
dbpedia:class/yago/MunicipalitiesOfThePeople'sRepuBlicOfChina [http]
dbpedia:class/yago/PopulatedCoastalPlacesInChina [http]
These are CamelCase versions of the categories that you will find at the bottom of Wikipedia pages. I was fooled for a while by the erroneous capitalization of RepuBlic, and finally saw that this class contains only 4 cities, so it is of limited use to you.
So I would propose going with @user205512's answer and getting the cities by linking two properties.
For your second question:
I would advise you to search/ask on http://answers.semanticweb.com
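To make the CamelCase remark concrete, here is a small Python sketch that turns a Wikipedia category name into a candidate class name. The helper name is hypothetical and the conversion rule is an assumption inferred from the examples above, so a generated name (such as CitiesAndTownsInChina from the question) may simply not exist in DBpedia.

```python
def category_to_class_name(category):
    """Turn a category name like 'Cities_and_towns_in_Denmark' into CamelCase.

    Only the first letter of each word is upper-cased; the rest is left as-is.
    """
    words = category.replace("_", " ").split()
    return "".join(word[:1].upper() + word[1:] for word in words)


print(category_to_class_name("Cities_and_towns_in_Denmark"))
```

Treat the result as a guess to try in a query, and fall back on inspecting an actual member of the set as shown above when the guess returns nothing.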

Drupal - Views. Setting a filter programmatically

I hope this is not a stupid question; I have been searching for most of the day!
I have a Content Type (Documents) which simply contains a title, file and a category. The category value is required and is 'powered' by Taxonomy.
I now wish to create a view which will display these documents grouped and titled by the taxonomy term.
Using my limited Drupal knowledge I intend to iterate through the relevant term IDs (using taxonomy_get_tree($vid)) and then render each view accordingly.
To do this I have been hoping to use this snippet.
$view = views_get_view('documents');
$view->set_display($display_id);
$filter = $view->get_item($display_id, 'filter', 'field_dl_category');
$filter['value']['value'] = $filter_value;
$view->set_item($display_id, 'filter', 'field_dl_category', $filter);
$viewsoutput = $view->render();
But this is not working; when I query the value of the $filter ($view->get_item($display_id, 'filter', 'field_dl_category')) I get null returned.
Might this be that my filter name is not the same as the CCK field name?
I am using Drupal 7.
Any help much appreciated, I am running out of ideas (and time).

I finally managed to get this working but I took a slightly different approach.
I changed my view and added the relevant contextual filter and then used this function views_embed_view to get at my required results.
In case it helps, this is my solution:
$display_id = 'default';
$vid = 7;
$terms = taxonomy_get_tree($vid);
foreach($terms As $term){
$content = views_embed_view('documents', $display_id, $term->tid);
//now we see if any content has been provided
if(trim($content) != ''){
print "<h3>" . $term->name . "</h3>";
print $content;
}
}
In my case trim($content) returns '' when there is no data because the view template has been edited; this might not be the case for everyone.
I am a very new Drupal developer, so I'm sure there are much better ways of doing this. If so, please do post.

I am going to go ahead and assume that you want to show, using Views, a list of document nodes grouped by the category that they have been tagged with.
There are two (of maybe more) ways by which you can do this in Views 3:
(a) Choose a display style that allows you to select a Grouping field. (You could try the table style that ships with Views by default). Suppose you have properly related the node table to the taxonomy_term_data table through a Views relationship, you could choose taxonomy_term_data.name as the grouping field.
Note that this grouping is done just before the view is rendered. So, your query would just have to select a flat list of (content, tag) pairs.
(b) You could also make use of the Attachment display type to achieve something similar. Show the used categories first in a list view clicking on which will show a page (attachment) with all documents tagged in that chosen category.
To understand how to do (a) or (b), turn on the advanced_help module (which is not a Views requisite but is recommended) first.
For (a), read the section on Grouping in styles i.e. views/help/style-grouping.html and
For (b), read the section on Attachment display i.e. views/help/display-attachment.html
A couple of things about your approach:
(a) It will show all terms from that vocabulary irrespective of whether or not they were used to tag at least one document.
(b) views_embed_view() will also return NULL if the user currently viewing the page does not have access to the view. So, ensure that you catch that case.

Here's an alternative:
$view = views_get_view('view_machine_name');
$view->init_display('default');
$view->display_handler->display->display_options['filters']['your_filter_name']['default_value'] = 'your_value';
$view->is_cacheable = FALSE;
$view->execute();
print $view->render();
I know you can probably set this using some convoluted method and obviously that would be better. But if you just want a quick and dirty straight access without messing around this will get you there.

XQuery on MarkLogic using OR

This is a newbie MarkLogic question. Imagine an xml structure like this, a condensation of my real business problem:
<Person id="1">
<Name>Bob</Name>
<City>Oakland</City>
<Phone>2122931022</Phone>
<Phone>3123032902</Phone>
</Person>
Note that a document can and will have multiple Phone elements.
I have a requirement to return information from EVERY document that has a Phone element that matches ANY of a list of phone numbers. The list may have a couple of dozen phone numbers in it.
I have tried this:
let $a := cts:word-query("3738494044")
let $b := cts:word-query("2373839383")
let $c := cts:word-query("3933849383")
let $or := cts:or-query( ($a, $b, $c) )
return cts:search(/Person/Phone, $or)
which does the query properly, but it returns a sequence of Phone elements inside a Results element. My goal is instead to return all the Name and City elements along with the id attribute from the Person element, for every matching document. Example:
<results>
<match id="18" phone="2123339494" name="bob" city="oakland"/>
<match id="22" phone="3940594844" name="mary" city="denver"/>
etc...
</results>
So I think I need some form of cts:search that allows both this boolean capability but also allows me to specify what part of each document gets returned. At that point then I could further process the result with XPATH. I need to do this efficiently so for example I think it would NOT be efficient to return a list of document uri's and then query for each document in a loop. Thanks!

Your approach is not as bad as you might think. There are only a few changes necessary to make it work as you like.
First of all, you are better off using cts:element-value-query instead of cts:word-query. It will allow you to limit the searched values to a specific element. It performs best when you add an element range index for that element, but it is not required. It can rely on the always present word index as well.
Secondly, there is no need for the cts:or-query. Both cts:word-query and cts:element-value-query functions (as well as all other related functions) accept multiple search strings as one sequence argument. They are automatically treated as or-query.
Thirdly, the phone numbers are your 'primary key' in the result, so returning a list of all matching Phone elements is the way to go. You just need to realize that the resulting Phone elements are still aware of where they came from. You can easily use XPath to navigate to the parent and siblings.
Fourthly, there is nothing against looping over the search results. It may sound a bit weird, but it doesn't cost much extra performance. Actually, it is pretty much negligible, in MarkLogic Server that is. Most performance could be lost when you try to return many results (more than several thousand), in which case most time is lost in serializing it all. And if it is likely you will have to handle lots of search results, it is wise to start using pagination straight away.
To get what you ask, you could use the following code:
<results>{
for $phone in
cts:search(
doc()/Person/Phone,
cts:element-value-query(
xs:QName("Phone"),
("3738494044", "2373839383", "3933849383")
)
)
return
<match id="{data($phone/../@id)}" phone="{data($phone)}" name="{data($phone/../Name)}" city="{data($phone/../City)}"/>
}</results>
Best of luck.

Here's what I would do:
let $numbers := ("3738494044", "2373839383", "3933849383")
return
<results>{
for $person in cts:search(/Person, cts:element-value-query(xs:QName("Phone"),$numbers))
return
<match id="{data($person/@id)}" name="{data($person/Name)}" city="{data($person/City)}">
{
for $phone in $person/Phone[cts:contains(.,$numbers)]
return element phone {$phone}
}
</match>
}</results>
First, there's an implicit OR when passing multiple values into word-query and value-query and their cousins, and this query is more efficiently resolved from the indexes, so do this when you can.
Second, an individual might match on more than one phone number, so you need that additional inner loop to effectively group by individual.
I would not create a range index for this - no need, and it isn't necessarily faster. There are indexes for element values by default, so you can leverage those with element-value-query.
You could do all of this with the SearchAPI and a little XSLT. That would make it easy to start combining names and numbers and other conditions in a single query.
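The "group by individual" shape of the result can be tried out with a plain Python sketch like the one below. It only illustrates the grouping logic: it scans in-memory XML strings instead of using MarkLogic's indexes, and the function name and the dictionary layout are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def match_people(xml_docs, numbers):
    """Return one record per Person that has at least one wanted phone number."""
    wanted = set(numbers)  # membership in the set plays the role of the implicit OR
    results = []
    for text in xml_docs:
        person = ET.fromstring(text)
        # Inner loop: collect every matching phone of this person.
        phones = [p.text for p in person.findall("Phone") if p.text in wanted]
        if phones:
            results.append({
                "id": person.get("id"),
                "name": person.findtext("Name"),
                "city": person.findtext("City"),
                "phones": phones,
            })
    return results


# Sample document from the question.
doc = """<Person id="1">
<Name>Bob</Name>
<City>Oakland</City>
<Phone>2122931022</Phone>
<Phone>3123032902</Phone>
</Person>"""

print(match_people([doc], ["3123032902", "3738494044"]))
```

Each person appears once in the output, with all of his or her matching numbers attached, which mirrors the nested phone elements in the query above.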

How to display the total amount of posts?

What's the best way to display in a block the total amount of posts and comments of my entire drupal website?
thanks

The quick and dirty way:
Ensure that you have PHP filter installed and available to you. Create a block with the php code
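<?php
// db_query() returns a result resource here, so db_result() is needed to read the count.
$ncount = db_result(db_query("SELECT COUNT(nid) FROM {node} WHERE status = %d", 1));
// Use the comment module's COMMENT_PUBLISHED constant rather than a hard-coded 1.
$ccount = db_result(db_query("SELECT COUNT(cid) FROM {comments} WHERE status = %d", COMMENT_PUBLISHED));
print "Nodes: " . $ncount . "<br />";
print "Comments: " . $ccount;
?>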

One option is to use a View with the block display type. Views Calc can do the summing for you (http://drupal.org/project/views_calc).
Honestly, I think you will find it easier and possibly more performant to create a Statistics content type with CCK integer fields to store the initial values for the amount of each piece of information you need. Then configure the Rules module to increment/decrement the fields when you add or remove content/comments.
A third option I have not personally explored is the Statistics Pro module (http://drupal.org/project/statspro), which says it is Views-compatible.

Use Views GroupBy module ( http://drupal.org/project/views_groupby ). You can specify the filters (e.g. you want to count nodes of particular type only) and so on. It will count the nodes for you.
If your view type is comment, then a similar count can be done on comments.
