Getting a list of all authors who have more than 3 works - DBpedia SPARQL - bigdata

I am trying to get a list of all the authors who have more than 3 works (in DBpedia).
My example can be run on: http://dbpedia.org/sparql
Base query:
select (count(?work) as ?totalWork), ?author
Where
{
?work dbo:author ?author.
}
GROUP BY ?author
This gives me the total number of works for each author. But when I try to filter the result to show only the authors who have more than 3 works, I get an error.
I tried both the FILTER keyword and the HAVING keyword.
Using FILTER
select (count(?work) as ?tw), ?author
Where
{
?work dbo:author ?author.
FILTER (?work > 3).
}
GROUP BY ?author
error: Virtuoso 22023 Error VECDT: SR066: Unsupported case in CONVERT (INTEGER -> IRI_ID)
Using HAVING keyword
select (count(?work) as ?tw), ?author
Where
{
?work dbo:author ?author.
}
GROUP BY ?author
HAVING (?tw > 3)
Virtuoso 37000 Error SP031: SPARQL compiler: Variable ?tw is used in the result set outside aggregate and not mentioned in GROUP BY clause

Using HAVING is correct, but the aggregate cannot be referenced indirectly through its alias ?tw here; repeat the aggregate expression in the HAVING clause instead.
This one works:
SELECT (count(?work) as ?tw) ?author
WHERE
{
?work dbo:author ?author.
}
GROUP BY ?author
HAVING (count(?work) > 3)

HAVING (?tw > 3) is correct SPARQL. HAVING is applied after the assignments made by the SELECT expressions, so ?tw is visible, and before projection. The algebra for the query is:
(prefix ((dbo: <http://dbpedia.org/ontology/>))
  (project (?tw ?author)
    (filter (> ?tw 3)
      (extend ((?tw ?.0))
        (group (?author) ((?.0 (count ?work)))
          (bgp (triple ?work dbo:author ?author)))))))
where ?.0 is the internal variable that holds the result of count.
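The order of operations discussed in these answers (group, aggregate, then filter on the aggregate) can be mimicked in plain Python. This is only an illustrative sketch: the (work, author) pairs are made-up sample data rather than DBpedia results, and the function name is hypothetical.

```python
from collections import Counter

# Hypothetical (work, author) pairs standing in for "?work dbo:author ?author"
triples = [
    ("w1", "alice"), ("w2", "alice"), ("w3", "alice"), ("w4", "alice"),
    ("w5", "bob"), ("w6", "bob"),
    ("w7", "carol"), ("w8", "carol"), ("w9", "carol"),
]

def authors_with_more_than(triples, threshold):
    # GROUP BY ?author with count(?work)
    counts = Counter(author for _work, author in triples)
    # HAVING (count(?work) > threshold): the filter sees groups, not single rows
    return {author: n for author, n in counts.items() if n > threshold}

result = authors_with_more_than(triples, 3)
print(result)
```

A row-level filter such as FILTER (?work > 3) has nothing comparable here: it would compare each work IRI itself with 3, which is why it cannot express a condition on the count.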

Related

MDX error trying to compare one hierarchy level to another one

I have an MDX issue that I really don't understand, with a 5-level hierarchy "SEGMENTATION": AFFAIRE / NIVEAU 1 / NIVEAU 2 / NIVEAU 3 / NIVEAU 4
I want to compare the weight of each sub-level of "NIVEAU 1" to its "NIVEAU 1" ancestor.
For instance, for each "NIVEAU 3" member I want to know its contribution to its "NIVEAU 1".
I've tried a bunch of things, but nothing works properly. I can't figure out the trick and am stuck with:
WITH MEMBER [Measures].[TEST] AS'
iif(ISEMPTY(([Segmentation].[Niveau1], [Measures].[Total])) OR ([Segmentation].[Niveau1],[Measures].[Total]) = 0
, NULL
,[Measures].[Total] / ([Segmentation].[Niveau1], [Measures].[Total])
)'
SELECT NON EMPTY { [Measures].[TEST],[Measures].[Total]} ON COLUMNS
, NON EMPTY { [Segmentation].[Niveau2]}
ON ROWS FROM ( SELECT ( { [Segmentation].[Niveau1].&[8589934592]&[1|DESC111] } ) ON COLUMNS FROM [CUBE]) // Only one "Niveau 1" focus
And I get :
<Niveau 2>    TEST      Total
SF - C...     #Error    25143658
SF - M...     #Error    1638913,5
ZZZ ...       #Error    90468628
#Error : The EqualTo function expects a string or numeric expression for argument 1. A tuple set expression was used.
The expected result is :
<Niveau 2>    TEST      Total
SF - C...     21,44%    25143658
SF - M...     1,40%     1638913,5
ZZZ ...       77,16%    90468628
21,44% = 25143658/(25143658+1638913,5+90468628)
What's wrong with my MDX?
Is there a mistake in the dimension or hierarchy setup?
Tuples are written as comma-separated lists of members. What you have, [Segmentation].[Niveau1], is a level rather than a member.
Try
[Segmentation].CurrentMember.Parent
instead of
[Segmentation].[Niveau1]
in your measure definition.
[EDIT] As mentioned in a comment, the goal is a solution that works on all levels. The solution is to use
Ancestor( [Segmentation].CurrentMember, [Segmentation].[Niveau1] )
in the Tuple used in the custom measure definition.
Thanks to nsousa, I'm now using :
WITH MEMBER [Measures].[Total Niveau1] AS'
iif([Segmentation].CURRENTMEMBER.level.ordinal>=2
,(Ancestor([Segmentation].CurrentMember,[Segmentation].[Niveau1] ),[Measures].[Total])
,([Segmentation].CURRENTMEMBER, [Measures].[Total])
)
'
MEMBER [Measures].[TEST] AS'
DIVIDE([Measures].[Societe],[Measures].[Total Niveau1])
',FORMAT_STRING = 'Percent'
SELECT NON EMPTY { [Measures].[TEST],[Measures].[Societe],[Measures].[Total]} ON COLUMNS
, NON EMPTY { [Segmentation].[Niveau3]}
ON ROWS FROM [CUBE]
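The expected percentages from the question can be checked with a few lines of Python. This is just an arithmetic sketch: the dictionary copies the three Total values shown in the question, the parent total is assumed to be their sum (as in the question's own formula), and the function name is hypothetical.

```python
# Total values copied from the question's result table
totals = {
    "SF - C...": 25143658,
    "SF - M...": 1638913.5,
    "ZZZ ...": 90468628,
}

def share_of_parent(totals):
    # The parent ("Niveau 1") total is assumed to be the sum of its children
    parent_total = sum(totals.values())
    if parent_total == 0:
        # Same guard as the iif(... = 0, NULL, ...) in the question
        return {name: None for name in totals}
    return {name: value / parent_total for name, value in totals.items()}

shares = share_of_parent(totals)
for name, share in shares.items():
    print(f"{name}: {share:.2%}")
```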

topK function over the Distributed engine in ClickHouse returns only 10 records

I'm running the following query
select topK(30)(Country) from distributed_table
Note: distributed_table's engine is Distributed.
Even though there are over 100 possible "Country" values, the query returns only 10.
Also, when I run it on the local table, I get more than 10 results.
Have I missed some crucial configuration?
It looks like the problem occurs when the intermediate results from the shards are merged into the final result.
Let's check the result from each shard, using the distributed_group_by_no_merge setting to disable the merging of the intermediate results:
select any(_shard_num), topK(30)(Country)
from distributed_table
SETTINGS distributed_group_by_no_merge = 1
On each shard, the topK function works correctly, so as a workaround you can combine the intermediate results manually:
SELECT arrayDistinct(
         arrayMap(x -> x.1,
           /* sort values by their position in each shard's result (an approximation of frequency order) */
           arraySort(x -> x.2,
             /* convert an array of arrays to a flat array */
             flatten(
               /* group results from shards into one array */
               groupArray(
                 /* pair each value with its index number */
                 arrayMap((x, index) -> (x, index), shard_result, arrayEnumerate(shard_result))))))) ordered_value
FROM (
    SELECT topK(30)(Country) AS shard_result
    FROM distributed_table
    SETTINGS distributed_group_by_no_merge = 1)
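To make the merging step easier to follow, here is a rough Python equivalent of the workaround query. The per-shard lists are invented sample data, and the function name is hypothetical; it mirrors the roles of arrayEnumerate, flatten, arraySort and arrayDistinct rather than calling any ClickHouse API.

```python
def merge_shard_topk(shard_results):
    # Pair each value with its 1-based position inside its shard,
    # and flatten all shards into one list (arrayEnumerate + flatten)
    ranked = [
        (value, rank)
        for shard in shard_results
        for rank, value in enumerate(shard, start=1)
    ]
    # Sort by position; Python's sort is stable, so ties keep shard order
    ranked.sort(key=lambda pair: pair[1])
    # Keep only the first occurrence of each value (arrayDistinct)
    merged = []
    seen = set()
    for value, _rank in ranked:
        if value not in seen:
            seen.add(value)
            merged.append(value)
    return merged

# Hypothetical topK results from two shards
shards = [["US", "DE", "FR"], ["DE", "US", "JP"]]
merged = merge_shard_topk(shards)
print(merged)
```

Note that this orders values by their position within each shard, not by a true global frequency, so the combined order is only approximate.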

Update dictionary keys inside a list using the map function - Python

I have a phone book stored as a list of dictionaries, where the number is the key and the country is the value. I want to update each key by adding a country code based on its country value. I tried to use the map function for this:
print('**Example: Update phone book to add Country code using map function** ')
user=[{'952-201-3787':'US'},{'952-201-5984':'US'},{'9871299':'BD'},{'01632 960513':'UK'}]
#A function that takes a dictionary as arg, not list. List is the outer part
def add_Country_Code(aDict):
    for k,v in aDict.items():
        if(v == 'US'):
            aDict[( '1+'+k)]=aDict.pop(k)
        if(v == 'UK'):
            aDict[( '044+'+k)]=aDict.pop(k)
        if (v == 'BD'):
            aDict[('001+'+k)] =aDict.pop(k)
    return aDict
new_user=list(map(add_Country_Code,user))
print(new_user)
This works only partially when I run it. Output below:
[{'1+952-201-3787': 'US'}, {'1+1+1+952-201-5984': 'US'}, {'001+9871299': 'BD'}, {'044+01632 960513': 'UK'}]
Notice that the 2nd US number has 2 additional 1s. What is causing that? How can I fix it? Thanks a lot.
Issue
You are mutating a dict while iterating over it. Don't do this. The Pythonic convention would be:
1. Make a new_dict = {}
2. While iterating over the input a_dict, assign new items to new_dict.
3. Return the new_dict.
In other words, create new things rather than changing old ones; mutating the input is likely the source of your woes.
Some notes
Use lowercase with underscores when defining variable and function names (see PEP 8).
Look up values rather than changing the input dict, e.g. a_dict[k] rather than a_dict.pop(k).
Indent with four spaces per level (see PEP 8).
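The steps above can be sketched as follows. This is a minimal example rather than the only way to do it: the prefix table COUNTRY_PREFIXES and the function name add_country_code are hypothetical, and the prefixes are simply copied from the question's code as-is.

```python
# Hypothetical lookup table; prefixes copied unchanged from the question
COUNTRY_PREFIXES = {"US": "1+", "UK": "044+", "BD": "001+"}

def add_country_code(phone_book):
    """Return a new dict whose keys carry the country prefix."""
    new_book = {}
    for number, country in phone_book.items():
        # Unknown countries get no prefix (an assumption for this sketch)
        prefix = COUNTRY_PREFIXES.get(country, "")
        new_book[prefix + number] = country
    return new_book

users = [
    {'952-201-3787': 'US'},
    {'952-201-5984': 'US'},
    {'9871299': 'BD'},
    {'01632 960513': 'UK'},
]
new_users = list(map(add_country_code, users))
print(new_users)
```

Because the input dicts are never modified, the loop always iterates over a stable set of keys, and the original users list stays intact.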

LINQ: Get all members whose LAST order failed

I'm learning LINQ, and I'm trying to figure out how to get all members with the last order failed (each member can have many orders). For efficiency reasons I'd like to do it all in LINQ before putting it into a list, if possible.
So far I believe this is the right way to get all the recently joined members with a failed order (cutoffDate is the current date minus 10 days).
var failedOrders =
from m in context.Members
from o in context.Orders
where m.DateJoined > cutoffDate
where o.Status == Failed
select m;
I expect I need to use Last or LastOrDefault, or possibly I need to use
orderby o.OrderNumber descending
and then get the First or FirstOrDefault as suggested in this stackoverflow answer.
Note that I want to look at ONLY the last order for a given member and see if that has failed (NOT just find last failed order).
Normally you would write something like:
var failedOrders = from m in context.Members
                   where m.DateJoined > cutoffDate
                   select new
                   {
                       Member = m,
                       LastOrder = m.Orders.OrderByDescending(x => x.OrderNumber).FirstOrDefault()
                   } into mlo
                   // no need for null checks here, because the query is done db-side
                   where mlo.LastOrder.Status == Failed
                   select mlo; // or select mlo.Member to have only the member
This works if there is a Members.Orders relationship.
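The underlying logic (take each member's last order first, then test its status) is language-independent. Below is a small in-memory Python sketch of that idea; the Order class, its fields, the status strings and the sample data are all hypothetical, and the DateJoined cutoff is left out for brevity.

```python
from dataclasses import dataclass

@dataclass
class Order:
    member_id: int
    order_number: int
    status: str

def members_with_failed_last_order(orders):
    # Step 1: keep only the order with the highest order_number per member
    last_order = {}
    for order in orders:
        current = last_order.get(order.member_id)
        if current is None or order.order_number > current.order_number:
            last_order[order.member_id] = order
    # Step 2: test the status of that last order only
    return sorted(
        member_id
        for member_id, order in last_order.items()
        if order.status == "Failed"
    )

# Hypothetical sample data
orders = [
    Order(1, 1, "Failed"), Order(1, 2, "Completed"),
    Order(2, 3, "Completed"), Order(2, 4, "Failed"),
    Order(3, 5, "Failed"),
]
failed_members = members_with_failed_last_order(orders)
print(failed_members)
```

Member 1 has a failed order but is not returned, because its last order completed; this is the difference between "last order failed" and "last failed order".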

MDX - distinct count

I was following this article:
http://msdn.microsoft.com/en-us/library/aa902637%28v=sql.80%29.aspx
and my query for distinct count looks like this:
Count(CrossJoin({[Measures].[Submission Count]}, [Submission].[PK Submission].Members), ExcludeEmpty)
It always returns 1 more than it should (for example, it returns 27 instead of 26).
In the same article there is this query (which is supposed to solve this problem):
Count(CrossJoin( {[Sales]},
Descendants([Customers].CurrentMember, [Customer Names])),
ExcludeEmpty)
But I can't get it to work. I've tried these two, but the first one doesn't work (error: I have to explicitly define a level), while the second one always returns 1 or 0:
Count(CrossJoin( {[Measures].[Submission Count]},
Descendants([Submission].CurrentMember, [Submission].[PK Submission])),
ExcludeEmpty)
Count(CrossJoin( {[Measures].[Submission Count]},
Descendants([Submission].[PK Submission].CurrentMember, [Submission].[PK Submission])),
ExcludeEmpty)
Any idea what I am doing wrong?
Thanks!
The first query returns "1 more than it should" because the [Submission].[PK Submission].Members set also includes the All member.
If you refer to the [PK Submission] level instead of all the members of the [PK Submission] hierarchy, it doesn't include the All member.
So, the following returns what you're expecting:
Count( CrossJoin( { [Measures].[Submission Count] }
, { [Submission].[PK Submission].[PK Submission] })
, ExcludeEmpty)

Resources