Cassandra collection tombstones - collections

I have created a table with a collection. Inserted a record and took sstabledump of it and seeing there is range tombstone for it in the sstable. Does this tombstone ever get removed? Also when I run sstablemetadata on the only sstable, it shows "Estimated droppable tombstones" as 0.5", Similarly it shows one record with epoch time as insert time for - "Estimated tombstone drop times: 1548384720: 1". Does it mean that when I do sstablemetadata on a table having collections, the estimated droppable tombstone ratio and drop times values are not true and dependable values due to collection/list range tombstones?
CREATE TABLE ks.nmtest (
reservation_id text,
order_id text,
c1 int,
order_details map<text, text>,
PRIMARY KEY (reservation_id, order_id)
) WITH CLUSTERING ORDER BY (order_id ASC)
user#cqlsh:ks> insert into nmtest (reservation_id , order_id , c1, order_details ) values('3','3',3,{'key':'value'});
user#cqlsh:ks> select * from nmtest ;
reservation_id | order_id | c1 | order_details
----------------+----------+----+------------------
3 | 3 | 3 | {'key': 'value'}
(1 rows)
[root#localhost nmtest-e1302500201d11e983bb693c02c04c62]# sstabledump mc-5-big-Data.db
WARN 02:52:19,596 memtable_cleanup_threshold has been deprecated and should be removed from cassandra.yaml
[
{
"partition" : {
"key" : [ "3" ],
"position" : 0
},
"rows" : [
{
"type" : "row",
"position" : 41,
"clustering" : [ "3" ],
"liveness_info" : { "tstamp" : "2019-01-25T02:51:13.574409Z" },
"cells" : [
{ "name" : "c1", "value" : 3 },
{ "name" : "order_details", "deletion_info" : { "marked_deleted" : "2019-01-25T02:51:13.574408Z", "local_delete_time" : "2019-01-25T02:51:13Z" } },
{ "name" : "order_details", "path" : [ "key" ], "value" : "value" }
]
}
]
}
SSTable: /data/data/ks/nmtest-e1302500201d11e983bb693c02c04c62/mc-5-big
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Bloom Filter FP chance: 0.010000
Minimum timestamp: 1548384673574408
Maximum timestamp: 1548384673574409
SSTable min local deletion time: 1548384673
SSTable max local deletion time: 2147483647
Compressor: org.apache.cassandra.io.compress.LZ4Compressor
Compression ratio: 1.0714285714285714
TTL min: 0
TTL max: 0
First token: -155496620801056360 (key=3)
Last token: -155496620801056360 (key=3)
minClustringValues: [3]
maxClustringValues: [3]
Estimated droppable tombstones: 0.5
SSTable Level: 0
Repaired at: 0
Replay positions covered: {CommitLogPosition(segmentId=1548382769966, position=6243201)=CommitLogPosition(segmentId=1548382769966, position=6433666)}
totalColumnsSet: 2
totalRows: 1
Estimated tombstone drop times:
1548384720: 1
Another quuestion was on the nodetool tablestats output - what does slice refer to in cassandra?
Average live cells per slice (last five minutes): 1.0
Maximum live cells per slice (last five minutes): 1
Average tombstones per slice (last five minutes): 1.0
Maximum tombstones per slice (last five minutes): 1
Dropped Mutations: 0

sstablemetadata does not have the information about your table that is not held within the sstable as it is not guaranteed to be run on system that has Cassandra running, and even if it was its very complex to be able to know how to pull the schema information from it.
Since the gc_grace_seconds is a table parameter and not in the metadata it defaults to assuming a 0 gc grace so the droppable times listed in that histogram will be more a histogram of the tombstone creation times by default. If you know your gc grace you can add it as a -g parameter to your sstablemetadata call. like:
sstablemetadata -g 864000 mc-5-big-Data.db
see http://cassandra.apache.org/doc/latest/tools/sstable/sstablemetadata.html for information on the tools output.
With collections it's just normal range tombstone with all that it entails. They are used to prevent the requirement of a read-before-write when overwriting the value of a multicell collection.

Related

Parsing an array of objects, do some math with their index

I have a large json which is actually a concatenated array of objects from several configuration files. I would like to use them to bring up a menue in a bash script. To make the menue easier to read, the json array contains special objects that would trigger a line break. In the end, the user picks the index of the array.
A simplified json looks like this:
[
{
"index" : 0,
"value" : "one a"
},
{
"index" : 3,
"value" : "two a"
},
{
"value" : ""
},
{
"index" : 2,
"value" : "three a"
},
{
"value" : ""
},
{
"index" : 1,
"value" : "one b"
},
{
"index" : 3,
"value" : "two b"
},
{
"index" : 2,
"value" : "three b"
}
]
All values with a come from the first file, all bs from the second file. The entries with an empty value are line breaks.
What I got so far, after hours of researching, is this:
jq --raw-output 'to_entries[] | "\(.key + 1). \(.value.value) (\(.value.index))"' test.json
Which produces this out of the above data:
1. one a (0)
2. two a (3)
3. (null)
4. three a (2)
5. (null)
6. one b (1)
7. two b (3)
8. three b (2)
Now the user would type 8 to work with the three b.
What I need, however, is this:
1. one a (0)
2. two a (3)
3. three a (2)
4. one b (1)
5. two b (3)
6. three b (2)
So the user would need to type 6 to do the same.
Any idea welcome!
Using foreach to count would be one way:
foreach .[] as {$index, $value} (0;
if $value != "" then . + 1 else . end;
if $value != "" then "\(.). \($value) (\($index))" else "" end
)
1. one a (0)
2. two a (3)
3. three a (2)
4. one b (1)
5. two b (3)
6. three b (2)
Demo

Set multiple threshold on a log based kusto query

I have set up a log-based alert in Microsoft Azure. The deployment of the alerts done via ARM template.
Where you can input your query and set threshold like below.
"triggerThresholdOperator": {
"value": "GreaterThan"
},
"triggerThreshold": {
"value": 0
},
"frequencyInMinutes": {
"value":15
},
"timeWindowInMinutes": {
"value": 15
},
"severityLevel": {
"value": "0"
},
"appInsightsQuery": {
"value": "exceptions\r\n| where A_ != '2000' \r\n| where A_ != '4000' \r\n| where A_ != '3000' "
}
As far as I understand we can only set threshold once ON an entire query.
Questions: I have multiple statements in my query which I am excluding since it's just a noise. But now I want to set a threshold on value 3000 to 5 and also want to set a time-window to 30 in the same query. meaning only exclude 3000 when it occurs 5 times in the last 30 minutes(when query get run).
exceptions
| where A_ != '2000'
| where A_ != '4000'
| where A_ != '3000'
I am pretty sure that I can't set a threshold like this in the query and the only workaround is to create a new alert just for value 3000 and set a threshold in ARM template. I haven't found any heavy threshold/time filters in Aure. Is there any way I can set multiple thresholds and time filters in a single query? which is again getting checked by different threshold and time filetrs in the ARM template.
Thanks.
I don't fully understand your question.
But for your time window question you could do something like
exceptions
| summarize count() by A_, bin(TimeGenerated, 30m)
That way you will get a count of A_ in blocks of 30 minutes.
Another way would be to do:
let Materialized = materialize(
exceptions
| summarize Count=count(A_) by bin(TimeGenerated, 30m)
); 
Materialized | where Count == 10
But then again it all depends on what you would like to achieve
You can easily set that in the query and fire based on the aggregate result.
exceptions
| where timestamp > ago(30m)
| summarize count2000 = countif(A_ == '2000'), count3000 = countif(A_ == '3000'), count4000 = countif(A_ == '4000')
| where count2000 > 5 or count3000 > 3 or count4000 > 4
If the number of results is greater than one than the aggregate condition applies.

Grafana: How to have the duration for a selected period

I can't find the correct mathematical formula to compute a SLA (availability) with Grafana:
I have graph to show the duration of downtime for each days:
From this, i would like to compute the SLA (eg: 99,5%).
On the graph for the selected period (Last 7 days) i can to have this data:
71258 is the sum of duration of downtime in second. I have this with summarize(1day, max, false)
I need to have the sum of duration of time for the selected period (here 7 days = 604800second). But how ?
If i have this last data, after i will do :
(100% * 604800) / 71258 = X %
100% - X % = My SLA!!
My question is: Which formula use to have the duration for a selected period in Grafana ?
One of the database you can run behind Grafana, is Axibase Time Series Database (ATSD). It provides built-in aggregation functions that can perform SLA-type calculations, for example, to compute % of the period when the value exceeded the threshold.
THRESHOLD_COUNT - number of violations in the period
THRESHOLD_DURATION - cumulative duration of the violations
THRESHOLD_PERCENT - duration divided by period
In your example, that would be THRESHOLD_PERCENT.
Here's a sample SLA report for Amazon Web Services instance: https://apps.axibase.com/chartlab/0aa34311/6/. THRESHOLD_PERCENT is visualized on the top chart.
The API request looks as follows:
{
"queries": [{
"startDate": "2016-02-22T00:00:00Z",
"endDate": "2016-02-23T00:00:00Z",
"timeFormat": "iso",
"entity": "nurswgvml007",
"metric": "app.response_time",
"aggregate": {
"types": [
"THRESHOLD_COUNT",
"THRESHOLD_DURATION",
"THRESHOLD_PERCENT"
],
"period": {
"count": 1,
"unit": "HOUR"
},
"threshold": {
"max": 200
}
}
}]
}
ATSD driver: https://github.com/grafana/grafana-plugins/tree/master/datasources/atsd
Disclosure: I work for Axibase.

How to use the function "table:get" (table extension) when 2 keys are required?

I have a file .txt with 3 columns: ID-polygon-1, ID-polygon-2 and distance.
When I import my file into Netlogo, I obtain 3 lists [[list1][list2][list3]] which corresponds with the 3 columns.
I used table:from-list list to create a table with the content of 3 lists.
I obtain {{table: [[1 1] [67 518] [815 127]]}} (The table displays the first two lines of my dataset).
For example, I would like to get the value of distance (list3) between ID-polygon-1 = 1 (list1) and ID-polygon-2 = 67 (list1), that is, 815.
How can I use table:get table key when I have need of 2 keys (ID-polygon-1 and ID-polygon-2) ?
Thanks very much your help.
Using table:from-list will not help you there: it expects "a list of two element lists, or pairs" where the "the first element in the pair is the key and the second element is the value." That's not what you have in your original list.
Furthermore, NetLogo tables (and associative arrays in general) cannot have two keys. They are always just key-value pairs. Nothing prevents the value from being another table, however, and in your case, that is what you need: a table of tables!
There is no primitive to build that directly, however. You will need to build it yourself:
extensions [ table ]
globals [ t ]
to setup
let lists [
[ 1 1 ] ; ID-polygon-1 column
[ 67 518 ] ; ID-polygon-2 column
[ 815 127 ] ; distance column
]
set t table:make
foreach n-values length first lists [ ? ] [
let id1 item ? (item 0 lists)
let id2 item ? (item 1 lists)
let dist item ? (item 2 lists)
if not table:has-key? t id1 [
table:put t id1 table:make
]
table:put (table:get t id1) id2 dist
]
end
Here is what you get when you print the resulting table:
{{table: [[1 {{table: [[67 815] [518 127]]}}]]}}
And here is a small reporter to make it convenient to get a distance from the table:
to-report get-dist [ id1 id2 ]
report table:get (table:get t id1) id2
end
Using get-dist 1 67 will give the 815 result you were looking for.

Two index with one value in a lua table

I am very new to lua and my plan is to create a table. This table (I call it test) has 200 entries - each entry has the same subentries (In this example the subentries money and age):
This is a sort of pseudocode:
table test = {
Entry 1: money=5 age=32
Entry 2: money=-5 age=14
...
Entry 200: money=999 age=72
}
How can I write this in lua ? Is there a possibility ? The other way would be, that I write each subentry as a single table:
table money = { }
table age = { }
But for me, this isn't a nice way, so maybe you can help me.
Edit:
This question Table inside a table is related, but I cannot write this 200x.
Try this syntax:
test = {
{ money = 5, age = 32 },
{ money = -5, age = 14 },
...
{ money = 999, age = 72 }
}
Examples of use:
-- money of the second entry:
print(test[2].money) -- prints "-5"
-- age of the last entry:
print(test[200].age) -- prints "72"
You can also turn the problem on it's side, and have 2 sequences in test: money and age where each entry has the same index in both arrays.
test = {
money ={1000,100,0,50},
age={40,30,20,25}
}
This will have better performance since you only have the overhead of 3 tables instead of n+1 tables, where n is the number of entries.
Anyway you have to enter your data one way or another. What you'd typically do is make use some easily parsed format like CSV, XML, ... and convert that to a table. Like this:
s=[[
1000 40
100 30
0 20
50 25]]
test ={ money={},age={}}
n=1
for balance,age in s:gmatch('([%d.]+)%s+([%d.]+)') do
test.money[n],test.age[n]=balance,age
n=n+1
end
You mean you do not want to write "money" and "age" 200x?
There are several solutions but you could write something like:
local test0 = {
5, 32,
-5, 14,
...
}
local test = {}
for i=1,#test0/2 do
test[i] = {money = test0[2*i-1], age = test0[2*i]}
end
Otherwise you could always use metatables and create a class that behaves exactly like you want.

Resources