Quicksight - distinct count of a value while ignoring filters

I need to compare an initial distinct count of values (with no filters applied) vs a distinct count of the same values after some filters have been applied.
I've searched a lot and can't find how to do this. Not sure if level aware calculation will work since I don't need to do a countOver.

If you want to do that in a visual, filters can be scoped to apply only to selected visuals rather than to the whole analysis, so you could exclude the visual that shows the initial distinct count from the filter. See the QuickSight documentation on the scope of filters.
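Depending on your use case, a level-aware calculation evaluated at the PRE_FILTER level may also work, since PRE_FILTER aggregations are computed before the analysis filters are applied. A minimal sketch for a calculated field, using a placeholder field name {customer_id} and an empty partition list:

distinctCountOver({customer_id}, [], PRE_FILTER)

Placing that next to a regular distinct_count({customer_id}) measure would let you compare the unfiltered distinct count against the filtered one.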

Related

Order by Gremlin (on AWS Neptune) descending puts 0 at the top

I have a Neptune Gremlin query that should order vertices by the number of times they've been saved by other users in descending order. It works perfectly for vertices where the property value is > 0, but for some reason puts the vertices where the property is equal to zero at the top.
When adding the vertex, the property is created without quotes (so not a string), and I am able to sum on the property when I increment it in other scenarios, so they should all be numbers. When ordering in ascending order it works as expected too (zero values come up first and then ordering is correct).
Has anyone seen this before or knows why it might be happening? I don't want to have to pre-filter out zero values.
The relevant part of my query is below (it behaves the same way, with the incorrect ordering, though the full query returns some extra results that aren't relevant to this question). I have also attached an image of the full query and its results.
g.V().hasLabel('trip').order().by('numSaves', desc)
[Image: query and results]
I was able to reproduce the issue thanks to the very helpful additional information. In the near term, the workaround of using fold().unfold() will work as it causes a different code path through the query engine to be taken. I will update this answer when more information is available. The issue seems to be related to the sum step. Another workaround that worked for me is to use a sack to do the "add one". Not a very elegant query but it does seem to avoid the order problem.
g.V("some-id").
property(single, "numSaves",
sack(assign).by('numSaves').
sack(sum).by(constant(1)).sack())
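For reference, a rough sketch of how the fold().unfold() workaround mentioned above could be applied to the ordering query from the question (the extra steps force a different code path through the query engine):

g.V().hasLabel('trip').fold().unfold().order().by('numSaves', desc)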
UPDATED July 29th 2021:
An Amazon Neptune update (1.0.5.0) was just released that contains a fix for this issue.

Gremlin Vertex id vs label: advantages?

I am designing a graph and see examples where several vertices share a similar label, such as 'user'. When the unique value is known, one can assign it to the vertex's id and look it up as:
g.V('person').has('id','unique-value'). ...
Or assign that unique value as a label, and reference it that way.
g.V('unique-value'). ...
Is there a particular reason, such as performance, not to use unique values (an id, essentially) as a label? What is the best strategy here?
Your question and your Gremlin examples don't quite align. I think that you mean to compare:
g.V().hasLabel('person').has(T.id,'unique-value')
and
g.V('unique-value')
Note my corrections in that first Gremlin statement. V() does not take a vertex label as an argument; it can only take a vertex id or a Vertex object. Also, the actual vertex identifier must be referenced by T.id and not 'id', the latter being a reference to a user-defined property named "id". T.id is what gets returned from g.V().id(). We often refer to T.id as just id, and I will do so going forward.
With that straightened out: there is no need to do hasLabel('person') if you have the id handy, so the two examples above return the same value. Most graph databases would likely optimize away the label filter and just use the id for the lookup, so I wouldn't expect a difference in performance, but for readability I'd stick with V('unique-value').
Your question specifically asked about using a unique label as a way to identify a vertex, so I will also address that. A label is not meant for unique identification of a graph element. It is meant to categorize groups of elements. Aside from that convention, I think there are a number of technical reasons not to do that. Some graphs have limits on the number of labels you can have so that could be a problem depending on your graph provider. At the very least, you reduce the portability of your code by doing that. I think it would impact performance as label lookups are not going to be as fast as id lookups (especially as you scale the graph up in size).
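To make the distinction concrete, here is a small sketch; the label, id, and property values are made up, and it assumes your graph provider allows user-supplied ids:

g.addV('person').property(T.id, 'person-123').property('name', 'Alice')

g.V('person-123')                                  // direct id lookup (preferred)
g.V().hasLabel('person').has(T.id, 'person-123')   // same result, with a redundant label filter
g.V().hasLabel('person').has('id', 'person-123')   // matches only a user-defined property named "id"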

Indexing only individual values in property arrays (instead of indexing every combination of those values) in Google datastore

The data model I am planning would have a few property "fields" in place, including a "category/tags" property, which would be a list/array containing many tags.
I'm planning on querying on one category at a time. I am not interested in indexing which entities have combinations of categories, just individual categories.
To be clear, I am NOT referring to simply excluding a particular property from indexing.
Bonus Question:
It seems Google Datastore doesn't like "monotonically increasing" property values (i.e. timestamps), presumably because they create hotspots on the machines while forming indexes. So would just storing the current calendar date help? I could see that making even more of a "hotspot", since every entity for 24 hours would have the same index value for that property. Is there some way of storing data about when each entity was recorded?
Indeed, one should encounter no issues creating a built-in index, as the other reply here notes. Still, properties with array values can behave in surprising ways. When more than one filter is applied to such a property, all of the filter conditions must be satisfied by at least one of the array's individual values for the entity to match the query; this does not apply to equality filters, where different values may satisfy different filters.
Sort order is also unusual: the first value seen in the index determines an entity's sort order.
I don't think a property index (a.k.a. built-in index) on an array property creates entries for the various value combinations. I believe each value in the array is indexed individually. For example, if you have a Book with two tags, the index will have two entries, one for each tag; adding another book with three tags would add 3 more entries to the Tags index. This index allows you to query for books based on a single tag as well as multiple tags.
The "combination of values" that you mentioned happens if you create a composite index containing more than one Array type (e.g. Authors and Tags of a Book), and all/most books have multiple authors and multiple tags.
You should not have any issues creating a builtin index on your Category/Tag.
On your other question about indexing an entity created/modified timestamp, I do see that the Best Practices say to avoid indexing such a property:
"Do not index properties with monotonically increasing values (such as a NOW() timestamp). Maintaining such an index could lead to hotspots that impact Cloud Datastore latency for applications with high read and write rates."
Not sure what the alternative would be. If you don't need to query or sort on the timestamp, you are fine storing it as long as you exclude that property from indexing.
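As a rough illustration of both points, here is a minimal sketch using the @google-cloud/datastore Node.js client; the kind, property names, and tag values are made up for the example:

import {Datastore} from '@google-cloud/datastore';

const datastore = new Datastore();

async function saveBook() {
  await datastore.save({
    key: datastore.key('Book'),
    // store the timestamp but keep it out of the indexes to avoid the hotspot issue
    excludeFromIndexes: ['createdAt'],
    data: {
      tags: ['scifi', 'space'],   // array property: one built-in index entry per value
      createdAt: new Date(),
    },
  });
}

async function booksWithTag(tag: string) {
  // an equality filter on an array property matches entities where any element equals the value
  const query = datastore.createQuery('Book').filter('tags', '=', tag);
  const [books] = await datastore.runQuery(query);
  return books;
}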

How many points in an InfluxDB measurement?

Since there is no way to delete points by field values in InfluxDB, I'd like to get a count of all the points, use SELECT INTO to copy everything except the points with unwanted values into a second measurement, and then get a count of that second measurement.
However,
SELECT COUNT(*) FROM measurement1
returns an array of counts for each field and tag, which doesn't tell me how many data points there are total.
It seems there is currently no way to do this without knowing the name of a field that is present in all points.
Although time is always present in all points, it is unfortunately not possible to do count(time) at the moment either.
This issue addresses the problem, but it is closed and a bit outdated. Someone should open a new one because the problem is still there.
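If you do know of one field that exists in every point, counting on that field gives the total number of points. A sketch, where "value" is a placeholder field name:

SELECT COUNT("value") FROM measurement1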
Use this command:
SHOW SERIES CARDINALITY
Note that this works at the series (tag set) level; it returns the number of unique series rather than the number of individual points.

Why is this so in Crossfilter?

The Crossfilter documentation states the following:
a grouping intersects the crossfilter's current filters, except for the associated dimension's filter. Thus, group methods consider only records that satisfy every filter except this dimension's filter. So, if the crossfilter of payments is filtered by type and total, then group by total only observes the filter by type.
What is the reasoning behind that and what is the way around it?
The reason is that Crossfilter is designed for filtering on coordinated views. In this scenario, you are usually filtering on a dimension that is visualized, and you want to see the other dimensions change based on your filter. But the dimension the filter is defined on should stay constant, partly because updating it would be redundant (the filter mechanism is usually already displayed visually) and partly because you don't want the dimension's values to jump around while you are trying to filter on them.
In any case, to get around it you can define two identical dimensions on the same attribute. Use one dimension for filtering and the other for grouping. This way, as far as Crossfilter is concerned, your filtering dimension and grouping dimensions are separate.
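A minimal sketch of that two-dimension workaround, assuming the crossfilter2 package and a made-up payments array with a total field:

import crossfilter from 'crossfilter2';

const payments = [{ total: 90 }, { total: 150 }, { total: 300 }];
const cf = crossfilter(payments);

// two dimensions over the same attribute
const totalFilterDim = cf.dimension(d => d.total); // used only for filtering
const totalGroupDim = cf.dimension(d => d.total);  // used only for grouping
const totalsByValue = totalGroupDim.group();

// a filter on the first dimension is reflected in the second dimension's group
const range: [number, number] = [100, 400];
totalFilterDim.filterRange(range);
console.log(totalsByValue.all()); // only totals in [100, 400) keep non-zero counts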
