Accurate document counts in terms aggregations with Elasticsearch

I need exact counts from a terms aggregation, and I've seen that Elasticsearch is not always accurate. Is there another solution to overcome this constraint?

You can set size to 0 in the aggregation query:
{
  "aggs": {
    "products": {
      "terms": {
        "field": "product",
        "size": 0
      }
    }
  }
}
But as per the documentation:
It is possible to not limit the number of terms that are returned by setting size to 0. Don't use this on high-cardinality fields as this will kill both your CPU, since terms need to be returned sorted, and your network.
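The inaccuracy comes from how the terms aggregation is computed: each shard returns only its own top terms, and the coordinating node merges those partial lists, so a term that misses the top list on one shard loses part of its count. A minimal sketch of that merge in pure Python (the data and shard count are made up for illustration):

```python
from collections import Counter

# Hypothetical documents spread across two shards.
shard_docs = [
    ["a", "a", "a", "b", "b", "c"],   # shard 1
    ["b", "b", "b", "c", "c", "a"],   # shard 2
]

def terms_agg(shards, shard_size):
    """Approximate merge: each shard reports only its top `shard_size` terms."""
    merged = Counter()
    for docs in shards:
        top = Counter(docs).most_common(shard_size)
        merged.update(dict(top))
    return merged

true_counts = Counter(t for docs in shard_docs for t in docs)
approx = terms_agg(shard_docs, shard_size=2)

# "c" misses shard 1's top-2 list, so its merged count is too low.
print(true_counts["c"])  # 3
print(approx["c"])       # 2
```

Raising the per-shard list size (which is what size: 0 effectively did) removes the error, at the CPU and network cost the documentation warns about.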

Related

Querying WordPress with meta queries from Gatsby

I'm trying to fetch data from my WordPress backend using a meta query. I'm using this plugin:
https://www.wpgraphql.com/extenstion-plugins/wpgraphql-meta-query/
I can run my query in the GraphiQL IDE in WordPress, but not in Gatsby's GraphiQL tool.
I get this error:
Unknown argument "where" on field "Query.allWpPage"
Query:
query test {
  allWpPage(
    where: {metaQuery: {
      relation: OR,
      metaArray: [
        {
          key: "some_value",
          value: null,
          compare: EQUAL_TO
        },
        {
          key: "some_value",
          value: "536",
          compare: EQUAL_TO
        }
      ]
    }}
  ) {
    edges {
      node {
        id
        uri
      }
    }
  }
}
I've tried deleting the cache directory and rebuilding, didn't help.
And just to clarify, I have no problems running other queries and getting ACL-data and what not. The only problem I have (right now) is exposing the where argument to Gatsby.
The where filter is restricted in Gatsby. There is a detailed list of comparators in the docs; in short, they are:
eq (equals)
ne (not equals)
in (includes)
nin (not includes)
lt, lte, gt, gte (less than, less than or equal, greater than, greater than or equal, respectively)
regex, glob (regular expression, glob pattern)
elemMatch (element matches)
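To make the comparator semantics concrete, here's a rough pure-Python sketch of how a few of them behave (illustrative only, not Gatsby's actual implementation):

```python
import re

def matches(value, comparators):
    """Apply Gatsby-style comparators to a single field value."""
    checks = {
        "eq":    lambda v, arg: v == arg,
        "ne":    lambda v, arg: v != arg,
        "in":    lambda v, arg: v in arg,
        "nin":   lambda v, arg: v not in arg,
        "lt":    lambda v, arg: v < arg,
        "gte":   lambda v, arg: v >= arg,
        "regex": lambda v, arg: re.search(arg, v) is not None,
    }
    return all(checks[op](value, arg) for op, arg in comparators.items())

pages = [{"uri": "/about"}, {"uri": ""}, {"uri": "/contact"}]
# Equivalent of filter: {uri: {ne: ""}}
non_empty = [p for p in pages if matches(p["uri"], {"ne": ""})]
print(non_empty)  # [{'uri': '/about'}, {'uri': '/contact'}]
```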
On the other hand, there is a list of filters available. In your case, filter is what you are looking for. Your final query should look like:
query test {
  allWpPage(
    filter: {uri: {ne: ""}}
  ) {
    edges {
      node {
        id
        uri
      }
    }
  }
}
Of course, adapt the filter to your needs; elemMatch may also work for you.
You will need to add a condition for each property of the object you're trying to match.
Why is where restricted?
Because it came from Sift, a library Gatsby used to support MongoDB-style queries, in which where is available. Since Gatsby 2.23.0 (June 2020) this library is no longer used. More details at History and Sift:
For a long time Gatsby used the Sift library, through which you can use MongoDB queries in JavaScript. Unfortunately Sift did not align with how Gatsby used it, and so a custom system was written to slowly replace it. This system was called "fast filters" and as of gatsby#2.23.0 (June 2020) the Sift library is no longer used.

DynamoDB/Amplify non-negative field and field validation on mutations

I am new to AWS in general; I am building a relatively simple application with Amplify, but I've used Google Firebase before. My question is: is there a way to constrain a field to be non-negative? I have an application that does transactions and I don't want my balance to go negative. I just need a simple error/exception. Is it possible to set a field constraint in DynamoDB that says "this field should be >= 0"?
I also checked if it was possible to do it in the VTL Amplify-generated resolver of my GraphQL mutation, and indeed it is possible to set some constraints, but somehow it allows the operation and fails on the next one (when the balance in the DB is already < 0), as if it checks the condition before the update. I tried something like "current_balance - transaction >= 0" but I couldn't get it to work.
So it seems that the only way is to create a custom lambda resolver that does the various checks before submitting the mutation to DynamoDB. I haven't tried it yet but I don't understand how I can do a check on the current balance (stored in the DB) without doing a query.
More generally, is it even possible to validate fields (even with simple assertions like non-negativity) on Amplify/DynamoDB? Would moving to another DB like Aurora help?
Thanks for your help.
DynamoDB supports conditional updates, which apply an update only when a given condition is met. You can set the condition current_balance >= cost for your update.
However, the negative balance is not the main problem. What you should address is how to prevent other requests from updating the same current_balance at the same time, or in short, race conditions on current_balance. In order to deal with that, you also need a conditional update whose condition is "current_balance = initial_balance". The initial_balance is, I guess, what you get from DynamoDB at the very beginning of the purchase process.
Sample VTL code
#set( $remaining_balance = $initial_balance - $transaction_cost )
#if( $remaining_balance < 0 )
  $util.error("Insufficient balance")
#end
{
  "version": "2018-05-29",
  "operation": "UpdateItem",
  "key": { <your-dynamodb-key> },
  "update": {
    "expression": "SET current_balance = :remaining_balance",
    "expressionValues": {
      ":remaining_balance": $util.dynamodb.toNumberJson($remaining_balance)
    }
  },
  "condition": {
    "expression": "current_balance = :initial_balance",
    "expressionValues": {
      ":initial_balance": $util.dynamodb.toNumberJson($initial_balance)
    }
  }
}
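The same compare-and-set idea can be sketched outside of VTL. Here is a rough pure-Python simulation of an optimistic conditional update (the dict stands in for the DynamoDB item; names follow the answer above):

```python
class ConditionFailed(Exception):
    """Raised when the conditional update's check does not hold."""

# Stands in for the DynamoDB item.
item = {"current_balance": 100}

def conditional_update(item, initial_balance, cost):
    """Apply the debit only if the stored balance still equals what we read."""
    remaining = initial_balance - cost
    if remaining < 0:
        raise ConditionFailed("Insufficient balance")
    # The condition expression: current_balance = :initial_balance
    if item["current_balance"] != initial_balance:
        raise ConditionFailed("Balance changed since it was read (race)")
    item["current_balance"] = remaining

initial = item["current_balance"]      # read at start of the purchase
conditional_update(item, initial, 30)  # succeeds, balance becomes 70

try:
    # A second request that still holds the stale initial read of 100.
    conditional_update(item, 100, 30)
except ConditionFailed as e:
    print(e)  # Balance changed since it was read (race)
```

In real DynamoDB the check and the write happen atomically on the server, which is exactly what makes this safe against concurrent requests.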

In a reference node with only a list of ids, do we only have to use 'true' as the value?

So I'm planning a data structure like this for a food/potluck event where people can bring or reserve food.
{
  "potLuckEvents": {
    "eventID1": {
      "cake": {
        "userID1": 2,
        "userID2": 3
      },
      "burger": {
        "userID3": 1,
        "userID4": 2
      }
    }
  }
}
So in this example, userID1 will be bringing 2 servings of cake to the event. All examples I have seen in the documentation were using something like:
"userID1" : true
I'm wondering if there is a specific reason for these lists of IDs to only have true as the value. Can I use a non-true value in this case?
==========
Extra:
I'm also thinking of using an int value for the status of an event invitation:
{
  "potLuckEvents": {
    "eventID1": {
      "attendees": {
        "userID1": -1,
        "userID2": 0,
        "userID3": 1
      }
    }
  }
}
In this example:
userID1 declined the invitation
userID2 did not accept or decline the invitation yet
userID3 accepted the invitation
Is this another good use case for int values instead of 'true'?
The value of a node can be any valid JSON value type.
The reason you see true more often than other values is that when you're creating an index node (essentially a list of "foreign keys") there isn't any real value to store. But since a node without a value is deleted immediately, the convention is to use true as the value.
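Storing ints is therefore fine, and a consumer can simply read them as numbers. A quick sketch of tallying servings per dish from the structure in the question (pure Python, illustrative):

```python
# Mirrors the potLuckEvents structure from the question.
pot_luck_events = {
    "eventID1": {
        "cake":   {"userID1": 2, "userID2": 3},
        "burger": {"userID3": 1, "userID4": 2},
    }
}

# Total servings per dish for one event.
totals = {
    dish: sum(servings.values())
    for dish, servings in pot_luck_events["eventID1"].items()
}
print(totals)  # {'cake': 5, 'burger': 3}
```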

Different priority based on user in Firebase

I have data which looks like this:
"-JnbxaJp3rgsIeM2O0EN" : {
"Name":"Bill"
},
"-yryexaJp3rgsIeM2O0EN" : {
"Name":"Jill"
},
"-6yrhxaJp3rgsIeM2O0EN" : {
"Name":"John"
},
"-gn643Jp3rgsIeM2O0EN" : {
"Name":"Jack"
}
When a user is logged in with id simplelogin:5, I want to order the output based on their sort preferences. So say, for example, user simplelogin:5 previously set his order to Jack, Jill, John, Bill and simplelogin:1 set their order to Bill, John, Jack, Jill.
I know I can set priority, but that's a priority for the data as a whole and it isn't tied to a user; this is shared data which needs a custom order per user.
I was thinking of setting up something like this:
users: [
  {
    "uid": "simplelogin:1",
    "nameOrder": ["-gn643Jp3rgsIeM2O0EN", "-yryexaJp3rgsIeM2O0EN", etc.]
  }
];
But it seems like there should be a better way, and even if I could generate a list like that, I'm not sure how to sort the output to follow the order in the nameOrder entry.
First you need to modify your main list with a persistent sort id. I used a numeric value but you can use whatever value you prefer.
"-JnbxaJp3rgsIeM2O0EN" : {
  "Name": "Bill",
  "nameIndex": 0
},
"-yryexaJp3rgsIeM2O0EN" : {
  "Name": "Jill",
  "nameIndex": 1
},
"-6yrhxaJp3rgsIeM2O0EN" : {
  "Name": "John",
  "nameIndex": 2
},
"-gn643Jp3rgsIeM2O0EN" : {
  "Name": "Jack",
  "nameIndex": 3
}
Then store each user's sort preference:
"simplelogin:1" : {
  "nameOrder": "0,2,3,1"
},
"simplelogin:5" : {
  "nameOrder": "3,1,2,0"
}
Now when you read the name list, use the array index saved in nameOrder to display results.
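A rough sketch of applying that ordering on the client (pure Python, illustrative; the data mirrors the answer above):

```python
names = {
    "-JnbxaJp3rgsIeM2O0EN": {"Name": "Bill", "nameIndex": 0},
    "-yryexaJp3rgsIeM2O0EN": {"Name": "Jill", "nameIndex": 1},
    "-6yrhxaJp3rgsIeM2O0EN": {"Name": "John", "nameIndex": 2},
    "-gn643Jp3rgsIeM2O0EN": {"Name": "Jack", "nameIndex": 3},
}
users = {
    "simplelogin:1": {"nameOrder": "0,2,3,1"},
    "simplelogin:5": {"nameOrder": "3,1,2,0"},
}

def sorted_names(uid):
    """Order the shared name list by this user's saved nameOrder."""
    order = [int(i) for i in users[uid]["nameOrder"].split(",")]
    by_index = {v["nameIndex"]: v["Name"] for v in names.values()}
    return [by_index[i] for i in order]

print(sorted_names("simplelogin:5"))  # ['Jack', 'Jill', 'John', 'Bill']
```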

Firebase indexing on huge lists (100000+ items)

I'm migrating my relational database to Firebase. In general, I have a planner for workers. They can add an item ('appointment') to their schedule. I've read the Firebase documentation and found a section on indexing.
So I've created following structure (date = YYYYMMDD and time = HHMMSS):
{
  "appointments": {
    "id1": { "date": "20141207", "time": "170000", "worker": "worker1" },
    "id2": { "date": "20141208", "time": "170000", "worker": "worker1" }
  }
}
I've added an index for date, time and worker, to be able to query data like this (e.g. fetch all appointments for today):
curl -X GET 'https://myapp.firebaseio.com/appointments.json?orderBy="date"&equalTo="20141207"'
This works as expected and does the job well. The problem is, the number of appointments can grow exponentially (about a year from now, there could be 100000+ appointments). Is it a good approach to use these indexes? Another option would be to store the date and time also separately, like this:
{
  "20141207": {
    "170000": { "id1": true }
  },
  "20141208": {
    "170000": { "id2": true }
  }
}
This would ensure that appointments can be fetched per day very fast. Or is Firebase able to handle this using indexes alone?
The number of records in the path won't be an issue; Firebase is a scalable, real-time back end that handles hundreds of thousands of concurrent connections and millions of nodes. Querying should be fast. This is the point of an index and, like all things Firebase, must meet our standards of speed and excellence.
Be sure to read about '.indexOn' and to implement this in your security rules:
{
"rules": {
"appointments": {
".indexOn": ["date", "time", "worker"]
}
}
}
Also, your real limitation here will be the bandwidth of transferring data over the tubes, so be sure to limit your results in some manner and paginate:
curl -X GET 'https://myapp.firebaseio.com/appointments.json?orderBy="date"&equalTo="20141207"&limitToFirst=100'
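Client-side, that pagination loop can be sketched roughly like this (pure Python, illustrative; the in-memory list stands in for successive REST responses, and a real Firebase client would page with startAt/limitToFirst rather than an offset):

```python
# Appointments already filtered to one date and ordered by the index.
appointments = [{"id": f"id{i}", "time": f"{(8 + i % 10):02d}0000"} for i in range(250)]

def fetch_page(items, limit, offset):
    """Mimics limitToFirst-style paging over an ordered result set."""
    return items[offset:offset + limit]

page_size = 100
pages = []
offset = 0
while True:
    page = fetch_page(appointments, page_size, offset)
    if not page:
        break
    pages.append(page)
    offset += page_size

print(len(pages), len(pages[-1]))  # 3 50
```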