Since there is no way to delete points by field value in InfluxDB, I'd like to get a count of all the points, use SELECT INTO to copy everything except the points with unwanted values into a second measurement, and then get a count of that second measurement.
However,
SELECT COUNT(*) FROM measurement1
returns an array of counts for each field and tag, which doesn't tell me how many data points there are in total.
It seems there is currently no way to do this without knowing the name of a field that is present in all points.
Although time is always present in all points, it is unfortunately not possible to do COUNT(time) for now, either.
This issue addresses the problem, but it is closed and a bit outdated. Someone should open a new one because the problem is still there.
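In the meantime, if you do know a field that appears in every point, counting that field gives the total. Here is a minimal sketch with the influxdb Python client, where the field name value and the measurement measurement1 are assumptions standing in for your own schema:
from influxdb import InfluxDBClient  # pip install influxdb (InfluxDB 1.x client)

client = InfluxDBClient(host="localhost", port=8086, database="mydb")
# COUNT on a field that every point carries yields the total point count.
result = client.query('SELECT COUNT("value") FROM "measurement1"')
for point in result.get_points():
    print(point["count"])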
Use this command:
SHOW SERIES CARDINALITY
It works at the tag level: it returns the number of distinct series (unique tag set combinations), not the number of points.
I have a Neptune Gremlin query that should order vertices by the number of times they've been saved by other users in descending order. It works perfectly for vertices where the property value is > 0, but for some reason puts the vertices where the property is equal to zero at the top.
When adding the vertex, the property is created without quotes (so not a string), and I am able to sum on the property when I increment it in other scenarios, so they should all be numbers. When ordering in ascending order it works as expected too (zero values come up first and then ordering is correct).
Has anyone seen this before or knows why it might be happening? I don't want to have to pre-filter out zero values.
The relevant part of my query is the following (it acts the same way, with the incorrect ordering, but the full query has some extra results that aren't relevant to this question; I have attached an image of the full query I'm using with the results it gives):
g.V().hasLabel('trip').order().by('numSaves', desc)
Query and results (image)
I was able to reproduce the issue thanks to the very helpful additional information. The issue seems to be related to the sum step. In the near term, the workaround of using fold().unfold() will work, as it causes a different code path through the query engine to be taken. I will update this answer when more information is available.
Another workaround that worked for me is to use a sack to do the "add one". It's not a very elegant query, but it does seem to avoid the ordering problem.
g.V("some-id").
property(single, "numSaves",
sack(assign).by('numSaves').
sack(sum).by(constant(1)).sack())
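For reference, here is roughly what the fold().unfold() workaround could look like from Python with gremlinpython; the endpoint is a placeholder, and the label and property names follow the question:
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.traversal import Order

# Placeholder endpoint; substitute your own Neptune cluster address.
conn = DriverRemoteConnection('wss://your-neptune-endpoint:8182/gremlin', 'g')
g = traversal().withRemote(conn)

# fold().unfold() leaves the result set unchanged, but forces a different
# code path through the query engine, which avoided the bad ordering.
trips = (g.V().hasLabel('trip')
          .fold().unfold()
          .order().by('numSaves', Order.desc)
          .valueMap('numSaves')
          .toList())
conn.close()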
UPDATED July 29th 2021:
An Amazon Neptune update (1.0.5.0) was just released that contains a fix for this issue.
I am a Tableau newbie and am trying to see if this is possible. I have 2 separate data sources where the same employees are listed, one for closed cases and the other for open cases. These data sources share some of the same columns, but for the most part they are different.
Is it possible to aggregate the case count for each employee on the closed and open data sources into a single column? For instance, if an employee has 50 closed cases and 23 open cases, I want it to show 73 for them.
I attempted to play around with joins/unions, but these didn't work properly and duplicated the data most of the time.
I think this is a great chance to leverage blends.
I have created a workbook with the Sample Superstore Excel dataset. This dataset has three sheets. I'll use the Orders and Returns sheets to demonstrate how we can calculate the net orders using blends.
The dataset I'm using can be found here.
Start by connecting to both the Orders and Returns separately. Once done with this step you should see the two data sources at the top of your data pane.
In this example, I'll calculate the Net Returns by Category. In your case, you're after the Total Cases by Employee, so just imagine Employee in place of Category.
Next, drag Category from the Orders data source onto the view, then select the Returns data source and click the chain icon to blend on Order ID.
You will need a common column between the two tables in order to blend.
Once blended, I'll go back to the primary data source (indicated by the blue check mark) and create the Net Orders calculation.
This calculation uses dot notation, similar to what you might see in SQL, to reference the other table; here it would be something like SUM([Number of Records]) - SUM([Returns].[Number of Records]), where the bracketed name of the secondary source will match whatever it's called in your data pane.
To double check that our calculation is working properly, we can drag the components of this calculation onto the view and do the math.
Of course, once you are satisfied you can remove all but your blended calculation.
Blending isn't ideal in most cases, but you could try it. Bring in each data source separately and "join" them within your workbook on Employee, or ideally an Employee_id: click the little chain icon once you have them both loaded and are on a worksheet tab. Then you can sum the counts by employee. Blending sometimes presents issues with calculated fields across the two data sources, but this is what I would try first.
We are new to DynamoDB and struggling with what seems like it would be a simple task.
It is not actually related to stocks (it's about recording machine results over time) but the stock example is the simplest I can think of that illustrates the goal and problems we're facing.
The two query scenarios are:
All historical values of given stock symbol <= We think we have this figured out
The latest value of all stock symbols <= We do not have a good solution here!
Assume that updates are not synchronized, e.g. the moment of the last update record for TSLA may be different than for AMZN.
The three attributes are just { Symbol, Moment, Value }. We could make Symbol the hash key and Moment the range key, and we believe that way we could achieve the first query easily and efficiently.
We also assume we could get the latest value for a single, specified Symbol following https://stackoverflow.com/a/12008398
The SQL solution for getting the latest value for each Symbol would look a lot like https://stackoverflow.com/a/6841644
But... we can't come up with anything efficient for DynamoDB.
Is it possible to do this without either retrieving everything or making multiple round trips?
The best idea we have so far is to somehow use update triggers or streams to track the latest record per Symbol and essentially keep that cached. That could be in a separate table, or in the same table with extra info like a column IsLatestForMachineKey (effectively a bool). With every insert, you'd grab the row where IsLatestForMachineKey = 1, compare the Moments, and if the inserted record is newer, set its flag to 1 and the older one's to 0.
This is starting to feel complicated enough that I question whether we're taking the right approach at all, or maybe DynamoDB itself is a bad fit for this, even though the use case seems so simple and common.
There is a way that is fairly straightforward, in my opinion.
Rather than using a GSI, just use two tables with (almost) the exact same schema. The hash key of both should be symbol. They should both have moment and value. Pick one of the tables to be stocks-current and the other to be stocks-historical. stocks-current has no range key. stocks-historical uses moment as a range key.
Whenever you write an item, write it to both tables. If you need strong consistency between the two tables, use the TransactWriteItems api.
If your data might arrive out of order, you can add a ConditionExpression to prevent newer data in stocks-current from being overwritten by out of order data.
The read operations are pretty straightforward, but I'll state them anyway. To get the latest value for everything, scan the stocks-current table. To get the historical data for a stock, query the stocks-historical table using its symbol as the hash key, with no range key condition.
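A minimal boto3 sketch of this design; the table, key, and attribute names are assumptions that follow the layout described above:
import boto3
from decimal import Decimal
from botocore.exceptions import ClientError

dynamodb = boto3.resource("dynamodb")
historical = dynamodb.Table("stocks-historical")
current = dynamodb.Table("stocks-current")

def record_value(symbol, moment, value):
    item = {"symbol": symbol, "moment": moment, "value": Decimal(str(value))}
    # Every data point lands in the historical table.
    historical.put_item(Item=item)
    # The current table only accepts the write if it is not older than what
    # is already stored, so out-of-order arrivals cannot clobber newer data.
    try:
        current.put_item(
            Item=item,
            ConditionExpression="attribute_not_exists(#m) OR #m <= :m",
            ExpressionAttributeNames={"#m": "moment"},
            ExpressionAttributeValues={":m": moment},
        )
    except ClientError as e:
        if e.response["Error"]["Code"] != "ConditionalCheckFailedException":
            raise  # a failed condition just means older data arrived late

# "Latest value of every symbol" is then a scan of the small current table:
# latest = current.scan()["Items"]
If the two writes must be atomic, TransactWriteItems works as described above, with the caveat that a failed condition cancels the whole transaction, including the historical write.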
I need to mark the first (earliest) 10 entries in my table as '1'. How can I do this?
I read about the query.setRange(0, 10) option. I just want to confirm: does this setRange option, without any sort applied, retrieve the earliest 10 entities in a table?
I don't want to save any date and time to measure this.
If you have some other simpler way to do this, please advise.
Thanks,
Karthick.
You should never count on a database (no matter which one) to sort results without explicitly requesting a specific order. Even if datastore returns the earliest entities, it might change in the future and your software will stop working.
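For illustration, the dependable pattern is to store an explicit attribute to order by, even though you'd rather avoid it. A sketch with the App Engine Python ndb client, where the model and property names are made up:
from google.appengine.ext import ndb

class Entry(ndb.Model):
    marked = ndb.BooleanProperty(default=False)
    created = ndb.DateTimeProperty(auto_now_add=True)  # set once, on insert

def mark_earliest_ten():
    # Explicit ordering: no reliance on whatever order the datastore
    # happens to return entities in today.
    earliest = Entry.query().order(Entry.created).fetch(10)
    for entry in earliest:
        entry.marked = True
    ndb.put_multi(earliest)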
I'm fairly new to Tableau, and I'm struggling to build some routines that could be easily implemented in Excel (though it would take forever for big sets of data).
So here is the deal, consider a dataset with the following fields:
int [id_order] -> id of the sales order (deepest level, there are only unique entries of id_order)
int [id_client] -> as I want to know who bought it
date [purchase_date] -> when the customer bought the product
What I want to know is, for each order, when was the last time (if ever) that client bought something. In other words: what is the highest purchase_date for that client that is smaller than the current purchase_date?
In Excel, the approach is simple (but again, not efficient): an array formula along the lines of
{=MAX(IF(id_client=B1,IF(purchase_date<C1,purchase_date)))}
Is there a way to do this kind of calculation in Tableau?
You can do this in Tableau using table calculations. They take a little time to learn to use well, but they are very powerful and flexible. I posted a sample Tableau workbook for a similar question in an answer to the SO question Find first time a condition is met.
Your situation is similar, but with the extra complication that you want to repeat the analysis for each client id, so you might want to try a recursive approach using the Previous_Value() function instead of the approach used in that example - though I'm not certain that previous_value() will fit your situation.
Still, it might be helpful to download the example workbook I mentioned to get an idea how table calculations can address similar problems.
Just to register the solution, in case someone has the same question.
So, basically, the solution I found uses a table calculation, which is not evaluated until it's used on a sheet (and only in the context of that sheet). That's a little limiting, so what I do is create a sheet with all the fields I need (plus whatever the table calculation requires), then export the data (to mdb) and connect to this new file.
So, for my example, the right table calculation is (let's name it last_order_date):
LOOKUP(MAX([purchase_date]),-1)
Explanation: the MAX() is necessary because LOOKUP() (like all table calculations) does not work with the data directly, only with aggregations. You can use SUM, AVG, MAX, ATTR, whatever suits you. Since in my case there is only one matching row, any of these functions will do just fine and return the same value.
The -1 indicates that I'm looking for the element immediately before the current entry (of the table, as you define it). If it were FIRST(), it would go for the first entry of the table, and LAST() would go for the last.
Now, I have to put it on a sheet. So I'll bring the fields id_client, id_order, purchase_date and last_order_date.
Then I have to define the parameters of my table calculation last_order_date (Edit Table Calculation). I go to Compute using and choose Advanced. There I set Partitioning: id_client, and put everything else under Addressing. What does that do? It means Tableau will build a temporary partition for each id_client, and the table calculation will operate within each of those partitions.
Additionally, I Sort by the field purchase_date, Max (again because of the aggregation requirement), ascending, to guarantee the entries are in chronological order.
Now, for each entry, the calculation looks at that client's partition and picks the purchase_date immediately before the current entry (the one being assessed), which is exactly what I need.
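For comparison, here is the same per-client "previous purchase date" logic as a pandas sketch; the CSV file name is a placeholder, and the column names follow the question:
import pandas as pd

orders = pd.read_csv("orders.csv", parse_dates=["purchase_date"])
orders = orders.sort_values(["id_client", "purchase_date"])
# Within each client (the partition), shift(1) takes the row immediately
# before the current one, i.e. what LOOKUP(MAX([purchase_date]), -1) does.
orders["last_order_date"] = orders.groupby("id_client")["purchase_date"].shift(1)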
To avoid spending Tableau processing on visualization, I often put all the fields on Detail (and leave nothing on screen) and use a bar chart (it's good because it still lets me see the data). Then I export to mdb and connect to it again. Unfortunately, Tableau doesn't export directly to tde.