How do I undo an ingestion in Azure Data Explorer (Kusto)?

Context: I'm following this guide: https://learn.microsoft.com/en-us/azure/kusto/api/netfx/kusto-ingest-client-examples
I'm using IngestFromStorageAsync. I see that the result has an IngestionSourceId (a GUID), but I don't know what to do with it (this is not the extent ID).
I was assuming that you could use this ID to remove all the records that were imported...
Does anyone know how to undo an ingestion?
Currently, I'm using .show cluster extents to show the extent ids, then I call .drop extent [id]. Is this the right way to undo an ingestion?

"undo"ing an ingestion is essentially dropping the data that was ingested.
dropping data can be done at the resolution of extents (data shards), and extents can get merged with one another at any given moment (e.g. straight after data was ingested).
if you know there's a chance you'll want to drop the data you've just ingested (and you can't fix the ingestion pipeline that leads to those "erroneous"(?) ingestions), one direction you could follow would be to use extent tags, to be able to identify the extents that were created as part of your ingestion, then drop them.
more information can be found here: https://learn.microsoft.com/en-us/azure/kusto/management/extents-overview.
if you do choose to use tags for this purpose (and can't avoid the situations where you need to "undo" your ingestions), please make sure you read the "performance notes" in that doc.
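A minimal sketch of that flow (the table name and tag value here are placeholders; the drop-by-query form of .drop extents is covered in the doc linked above):

.ingest into table MyTable ... with (tags = '["ingest-by:batch-2023-01-01"]')

// later, to "undo" that ingestion, drop exactly the extents carrying the tag:
.drop extents <| .show table MyTable extents where tags has 'ingest-by:batch-2023-01-01'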
Excerpt from documentation link:
'ingest-by:' extent tags
Tags that start with an ingest-by: prefix can be used to ensure that data is only ingested once. You can set an ingestIfNotExists property that prevents the data from being ingested if there already exists an extent with this specific ingest-by: tag.
The values for both tags and ingestIfNotExists are arrays of strings, serialized as JSON.
The following example ingests data only once. The 2nd and 3rd commands do nothing:
.ingest ... with (tags = '["ingest-by:2016-02-17"]')
.ingest ... with (ingestIfNotExists = '["2016-02-17"]')
.ingest ... with (ingestIfNotExists = '["2016-02-17"]', tags = '["ingest-by:2016-02-17"]')
[!NOTE]
Generally, an ingest command is likely to include both an ingest-by: tag and an ingestIfNotExists property, set to the same value, as shown in the 3rd command above.
[!WARNING]
Overusing ingest-by tags isn't recommended. If the pipeline feeding Kusto is known to have data duplications, we recommend that you resolve these duplications as much as possible before ingesting the data into Kusto. Attempting to set a unique ingest-by tag for each ingestion call can severely impact performance. If such tags aren't required for some period of time after the data is ingested, we recommend that you drop the extent tags. To drop the tags automatically, you can set an extent tags retention policy.
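As an illustration of that last point (a hedged sketch; the table name and retention period are invented, and the exact policy JSON is described in the extent tags retention policy doc), a policy that removes ingest-by: tags a day after ingestion could look like:

.alter table MyTable policy extent_tags_retention ```[{"TagPrefix": "ingest-by:", "RetentionPeriod": "1.00:00:00"}]```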

Related

PeopleCode to load from CSV file and split 1 field into multiple columns

I am not familiar with Application Engine or PeopleCode but inherited this project when someone left. Seems simple but I'm not sure how to approach it.
I have to load a CSV file that has 5 fields. The last field has multiple values separated by a comma and it is qualified with quotes.
file example:
ID , YEAR, VALUE1 , VALUE2, CODE
87778, 2022, processed, none , 100,40
93332, 2022, processed, none , 60
76633, 2022, error , none , 55,35,9
I have created a File Layout definition and set the qualifier, and I can load the file into a staging table, but now I want to split the last column (CODE) into individual codes.
I have created 2 PeopleTools Record definitions with a parent/child relationship:
a parent Record definition with ID, YEAR, VALUE1, VALUE2, and
a child Record definition with ID, YEAR, CODE.
I have found that I can use the PeopleCode split function to break the CODE column out into an array containing each value in an element. I'm not sure what the best way to structure the program is though.
Is the staging table necessary?
Or can I use the split function as I read the CSV file in and update the parent/child tables?
Or do I need to keep the staging table, read out the fields for the parent record and move them to the permanent table, and then do the same for the child after using the split function and looping through the array?
Just looking for some guidance so my first AE project is not a mess.
IMO, there are always multiple ways to achieve the same thing (especially in AE); we choose one based on our requirements and efficiency.
For the staging table: in your case, you can skip the staging table unless you expect to load a huge data set every time or want to do parallel processing. In other words, keep a staging table if loading takes a long time and you don't want to risk it failing due to other errors.
You can even achieve this whole thing in one PeopleCode action without a staging table.
Or:
Load the data into the staging table and commit.
Loop through the data from the staging table in AE (keeping the current row in the state record).
Do the transformation as required in a PeopleCode action (a sketch of the split step follows below).
Insert the data into the necessary tables.
Update a status field in the staging table; this may come in handy for analysis of any issue in production.
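A minimal sketch of the split step in PeopleCode (the record and field names MY_CHILD_TBL, ID, YEAR, and CODE are made up for illustration; substitute your own):

/* Assumes &codeString holds the CODE field of the current CSV row, */
/* and &id / &year hold its key values.                             */
Local array of string &codes;
Local Record &child;
Local integer &i;

&codes = Split(&codeString, ",");
For &i = 1 To &codes.Len
   &child = CreateRecord(Record.MY_CHILD_TBL);
   &child.ID.Value = &id;
   &child.YEAR.Value = &year;
   &child.CODE.Value = &codes[&i];
   &child.Insert();
End-For;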

Function of Rows, Rowsets in PeopleCode

I'm trying to get a better understanding of what Rows and Rowsets are used for in PeopleCode? I've read through PeopleBooks and still don't feel like I have a good understanding. I'm looking to get more understanding of these as it pertains to Application Engine programs. Perhaps walking through an example may help. Here are some specific questions I have:
I understand that Rowset, Row, Record, and Field objects are used to access component buffer data, but is this still the case for standalone Application Engine programs run via Process Scheduler?
What would be the need or advantage of using these as opposed to using SQL objects/functions (CreateSQL, SQLExec, etc.)? I often see AE programs where a rowset is created and populated via the Fill method with a SQL WHERE clause, and I don't quite understand why SQL was not used instead.
I've seen in PeopleBooks that a Row object in a component scroll is a row; how does a component scroll relate to a row? I've also seen references to rows having different scroll levels. Is this just a way of grouping and nesting related data?
After you have created a rowset with CreateRowset, what are typical uses of it in the program afterwards? How would you perform logic (If, Then, Else, etc.) on data retrieved by the rowset, or use it to update data?
I appreciate any insight you can share.
You can still use Rowsets, Rows, Records and Fields in standalone Application Engine programs. Application Engine programs do not have component buffer data, as they are not running within the context of a component, so to use these objects you need to populate them with built-in methods like Fill() on a rowset or SelectByKey() on a record.
The advantage of using rowsets over SQL is that they make CRUD easier: there are built-in methods for selecting, updating, inserting and deleting. Additionally, you don't have to declare a large number of variables for multiple fields, as you would with a SQL object. Another advantage is that when you Fill a rowset, the data is read into memory, whereas if you looped through a SQL object, the SQL cursor would stay open longer. The rowset, row, record and field objects also have many other useful methods, such as ExecuteEdits (validation) or copying from one rowset/row/record to another.
This question is a bit less clear to me, but I'll try to explain. If you have a page, it has a level 0 row. That row can contain multiple level 1 rowsets, and each row under those can in turn contain level 2 rowsets:

Level0
 ├── Level1
 │     ├── Level2
 │     └── Level2
 └── Level1
       ├── Level2
       └── Level2

If one of your level 1 rowsets had 3 rows, then you would find 3 rows in the Rowset object associated with that level 1. Not sure I explained this well enough to answer what you need; please clarify if I can provide more info.
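For concreteness, in a component context you could navigate those levels like this (a hedged sketch; the scroll record names MY_LEVEL1_REC and MY_LEVEL2_REC are invented):

/* Level 0 has one row; drill into its level 1 scroll, then into level 2. */
Local Rowset &level1, &level2;

&level1 = GetLevel0()(1).GetRowset(Scroll.MY_LEVEL1_REC);
&level2 = &level1(1).GetRowset(Scroll.MY_LEVEL2_REC);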
Typically after I create a rowset, I loop through it, access the record on each row, and do some processing with it. In the example below, I loop through all locked accounts, prefix their description with LOCKED, and then update the database.
Local boolean &updateResult;
Local integer &i;
Local Record &lockedAccount;
Local Rowset &lockedAccounts;

&lockedAccounts = CreateRowset(Record.PSOPRDEFN);
&lockedAccounts.Fill("WHERE ACCTLOCK = 1");

For &i = 1 To &lockedAccounts.ActiveRowCount
   &lockedAccount = &lockedAccounts(&i).PSOPRDEFN;
   If Left(&lockedAccount.OPRDEFNDESCR.Value, 6) <> "LOCKED" Then
      &lockedAccount.OPRDEFNDESCR.Value = "LOCKED " | &lockedAccount.OPRDEFNDESCR.Value;
      &updateResult = &lockedAccount.Update();
      If Not &updateResult Then
         /* Error handle failed update */
      End-If;
   End-If;
End-For;

Is it possible to filter the list of fields when outputting a Full Dataset?

I have a DataTable that I'm passing to a FlexCel report. It contains a variable number of columns, so I'm using the Full Dataset feature (e.g. <#table_name.*>).
However, only a subset of the fields are dynamically generated (I have a variable number of attachments). The column name for each attachment field starts with a common word (e.g. "Attachment0", "Attachment1", etc).
What I would like to do is output the known finite set of fields and then the variable number of attachments. It would be nice if I could write something like <#table_name.Attachment*> (and <#table_name.Attachment**>). Is there any way in FlexCel Reports I can achieve the same result?
A side benefit to such a solution means that I could keep the formatting for the known/finite set of fields.
Update
I added placeholder columns to the document, each with a <#delete column> tag, so that the unwanted columns/data are removed.
Although this works, it's not ideal. For example, if I want to see how the columns fit in the page width (in print preview), I need to hide the placeholder columns, and then I have to remember to un-hide them again so other developers can see/understand my handiwork.
It would be much more straightforward if I could filter the fields before they're output to the document.
I realised there's an alternative way around this problem: I broke the data up into two sets, <#table_name.*> and <#table_name_attachments.*>.
The fixed set of fields is in the first table and the variable set of fields (all the "Attachment*" fields) is in the second. When the report is run, I place them next to each other (in the same order) in the same worksheet, so I have two table ranges, "_table_name_" and "_table_name_attachments_", on the one sheet.
Now I'm able to run my print preview without hiding/re-showing the columns-to-be-deleted. I've also eliminated human error; it was all too easy to accidentally set the wrong number of placeholder/delete columns.
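A rough sketch of that split in C# (FlexCelReport.AddTable and Run are the standard FlexCel report entry points; the variable, table and file names here are invented for illustration):

using System.Data;
using FlexCel.Report;

// Split the source DataTable into a fixed part and an "Attachment*" part.
DataTable fixedPart = source.Copy();
DataTable attachmentPart = source.Copy();
foreach (DataColumn col in source.Columns)
{
    if (col.ColumnName.StartsWith("Attachment"))
        fixedPart.Columns.Remove(col.ColumnName);
    else
        attachmentPart.Columns.Remove(col.ColumnName);
}

// Register both tables so <#table_name.*> and <#table_name_attachments.*>
// each expand over their own range in the template.
FlexCelReport report = new FlexCelReport();
report.AddTable("table_name", fixedPart);
report.AddTable("table_name_attachments", attachmentPart);
report.Run("template.xlsx", "output.xlsx");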

element-attribute-range-query fetching result but element-attribute-value-query is not fetching any result

I wanted to fetch the documents which have a particular element attribute value.
I tried cts:element-attribute-value-query, but I didn't get any result. However, I am able to get the same element attribute value using cts:element-attribute-range-query.
Here is the sample snippet used:
let $s-query := cts:element-attribute-range-query(
                  xs:QName("tit:title"), xs:QName("name"), "=",
                  "SampleTitle",
                  ("collation=http://marklogic.com/collation/codepoint"))
let $s-query := cts:element-attribute-value-query(
                  xs:QName("tit:title"), xs:QName("name"),
                  "SampleTitle",
                  ())
return cts:search(fn:doc(), ($s-query))
The problem with the range query is that it needs a range index, and I have hundreds of DBs across multiple hosts, so I would need to create range indexes on each DB.
What could be the problem with the attribute value query?
I found the issue after a bit of research.
The result document is actually a French-language document, structured like the following sample:
<doc xml:lang="fr-CA" xmlns:tit="title">
<tit:title name="SampleTitle"/>
</doc>
cts:element-attribute-value-query is a language-dependent query. To get French-language results, the language needs to be specified in the options, as follows:
cts:element-attribute-value-query(xs:QName("tit:title"), xs:QName("name"), "SampleTitle", ("lang=fr"))
cts:element-attribute-range-query, on the other hand, doesn't require the language option.
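So the full search from the original snippet becomes (same QNames and value as above):

cts:search(fn:doc(),
  cts:element-attribute-value-query(xs:QName("tit:title"), xs:QName("name"),
    "SampleTitle", ("lang=fr")))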
Thanks for the effort.

Qt TableView column and cell colors

I've got a dialog with a QTableView that uses a QSqlTableModel and a QSortFilterProxyModel, reading from a SQL database. I want to change the color of columns 3, 4 and 5.
I am using the following code:
ui->tableView->model()->setData(
ui->tableView->model()->index(1,2),
QVariant(QBrush(Qt::red)),
Qt::BackgroundRole);
I have been searching for a solution for 4 days already, and still nothing. Please tell me what to do. Whatever it is, I just need some new source code, some other way, or an edit to my piece of code.
The problem is that neither the proxy model nor the QSqlTableModel does anything with the background role. So you set it, but if you cared to check the result returned by setData, you'd notice that it's false: what you're doing is a no-op.
Just think about it: an SQL database generally has no way of storing an attribute like the background color together with the other data in a given field. Similarly, the proxy model is only there to sort the data; it doesn't give you any extra storage.
What you need to do is insert a custom proxy between the table model and the sort/filter proxy. That proxy needs to store such extended attributes for you. Then it'll work.
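A minimal sketch of such a proxy, assuming Qt 5 (the class name and the QHash-based storage are illustrative choices, not the only way to do it):

#include <QIdentityProxyModel>
#include <QHash>
#include <QPair>
#include <QVariant>

// Identity proxy that stores per-cell background colors itself, since the
// underlying QSqlTableModel discards Qt::BackgroundRole.
class BackgroundProxyModel : public QIdentityProxyModel
{
public:
    using QIdentityProxyModel::QIdentityProxyModel;

    QVariant data(const QModelIndex &index, int role = Qt::DisplayRole) const override
    {
        if (role == Qt::BackgroundRole) {
            const auto it = m_backgrounds.constFind({index.row(), index.column()});
            if (it != m_backgrounds.constEnd())
                return *it;
        }
        return QIdentityProxyModel::data(index, role);
    }

    bool setData(const QModelIndex &index, const QVariant &value, int role = Qt::EditRole) override
    {
        if (role == Qt::BackgroundRole) {
            m_backgrounds[{index.row(), index.column()}] = value;
            emit dataChanged(index, index, {Qt::BackgroundRole});
            return true;
        }
        return QIdentityProxyModel::setData(index, value, role);
    }

private:
    QHash<QPair<int, int>, QVariant> m_backgrounds;
};

Chain it between the SQL model and the sort proxy (sqlModel -> BackgroundProxyModel -> QSortFilterProxyModel -> tableView); your original setData call with Qt::BackgroundRole will then return true and the view will paint the cell.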
