Currently we are using Azure Databricks as the transformation layer, and the transformed data is loaded to Cosmos DB through the connector.
Scenario:
We have 2 source files.
The 1st file contains name, age.
The 2nd file contains name, state, country.
In Cosmos, I have created a collection using id as the Partition Key.
In Databricks, I am loading these 2 files as DataFrames and creating temp tables to query the content.
I am querying the content of the first file [ select name as id, name, age from file ] and loading it into the Cosmos collection.
From the second file, I am using [ select name as id, state, country ] and loading into the same collection, expecting the content from the second file to be added to the same document based on the id field.
The issue here is that when I load the content from the second file, the attribute 'age' from the first file gets deleted and only id, name, state, country is seen in the Cosmos document. This is happening because I am using UPSERT in Databricks to load to Cosmos.
When I change the UPSERT to INSERT or UPDATE, it throws an error which says 'Resource with id already exists'.
Databricks Connection to Cosmos:
val configMap = Map(
  "Endpoint" -> "https://",
  "Masterkey" -> "",
  "Database" -> "ods",
  "Collection" -> "tval",
  "preferredRegions" -> "West US",
  "upsert" -> "true")
val config = com.microsoft.azure.cosmosdb.spark.config.Config(configMap)
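For reference, the write itself presumably looks something like the following (a sketch assuming the azure-cosmosdb-spark connector's save API; df stands for the DataFrame produced by the SELECT above):

import com.microsoft.azure.cosmosdb.spark.CosmosDBSpark

// With "upsert" -> "true", each incoming row replaces the entire Cosmos
// document that has the same id, which is why attributes written earlier
// disappear when the second file is loaded.
CosmosDBSpark.save(df, config)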
Is there a way to insert the attributes from the second file without deleting the attributes that are already present? I am not using a JOIN operation, as it doesn't fit the use case.
From a vague memory of doing this, you need to set the id attribute on your DataFrame to match between the two datasets.
If you omit this field, Cosmos generates a new record - which is what is happening for you.
So if df1 and df2 both have id=1 on the first record, then the first write will insert it and the second will update it.
But if they are the same record, then joining in Spark will be far more efficient.
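Since an upsert replaces the whole document, the simplest fix is to merge the two files before writing, so each document is written once with all of its attributes. A minimal sketch, assuming the same temp-table setup and the azure-cosmosdb-spark connector (df1, df2 and config are the objects from the question; the view names are illustrative):

// Register the two source DataFrames as temp views.
df1.createOrReplaceTempView("file1")   // name, age
df2.createOrReplaceTempView("file2")   // name, state, country

// One row per person, carrying the attributes from both files.
val merged = spark.sql("""
  SELECT f1.name AS id, f1.name, f1.age, f2.state, f2.country
  FROM file1 f1
  LEFT JOIN file2 f2 ON f1.name = f2.name
""")

// A single upsert now writes the complete document.
com.microsoft.azure.cosmosdb.spark.CosmosDBSpark.save(merged, config)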
Related
I have a requirement to upsert data from a REST API to Cosmos DB and also maintain the item-level TTL for a particular time interval.
I have used the ADF Copy activity to copy the data, but for TTL I used an additional custom column at the source side with a hardcoded value of 30.
I noticed that the time interval (seconds) is being written as a string instead of an integer, and it therefore fails with the below error.
Details
Failure happened on 'Sink' side. ErrorCode=UserErrorDocumentDBWriteError,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Documents failed to import due to invalid documents which violate some of Cosmos DB constraints: 1) Document size shouldn't exceeds 2MB; 2) Document's 'id' property must be string if any, and must not include the following charaters: '/', '\', '?', '#'; 3) Document's 'ttl' property must not be non-digital type if any.,Source=Microsoft.DataTransfer.DocumentDbManagement,'
TTL mapping between the custom column and Cosmos DB:
When I use ttl1 instead of ttl, it succeeds and the value is stored as a string.
Any suggestions, please?
Yes, that's the issue with additional columns in the Copy activity. Even if you set it to int, it will be changed to a string at the source.
A possible workaround is to create a Cosmos DB trigger in an Azure Function and add the 'ttl' there.
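The Azure Function approach isn't shown here, but as an alternative illustration of the same point (the ttl must reach Cosmos as an integer), here is a sketch of writing the documents with an integer ttl column from Spark instead of ADF, reusing the connector config from the first question; sourceDf and the 30-second value are placeholders:

import org.apache.spark.sql.functions.lit
import com.microsoft.azure.cosmosdb.spark.CosmosDBSpark
import com.microsoft.azure.cosmosdb.spark.config.Config

// lit(30) produces an integer column, so Cosmos receives "ttl": 30
// rather than "ttl": "30" and the per-item TTL is honored.
val withTtl = sourceDf.withColumn("ttl", lit(30))
CosmosDBSpark.save(withTtl, Config(configMap))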
At the first event I get data like below:
{
  "product_name": "hamam",
  "quantity": "100"
}
At the second event I get data like below:
{
  "product_name": "hamam",
  "quantity": "70"
}
Here I want to update the values in Cosmos DB. How can I do it?
ASA supports the upsert feature for Cosmos DB if your data contains a unique document id (your sample data does not seem to have one). Please see this paragraph about upserts in ASA for Cosmos DB.
An excerpt is below:
Stream Analytics integration with Azure Cosmos DB allows you to insert or update records in your container based on a given Document ID column.
If the incoming JSON document has an existing ID field, that field is automatically used as the Document ID column in Cosmos DB and any subsequent writes are handled as such, leading to one of these situations:
unique IDs lead to insert
duplicate IDs and 'Document ID' set to 'ID' leads to upsert
duplicate IDs and 'Document ID' not set leads to error, after the first document
I am using Microsoft.Azure.CosmosDB.BulkExecutor.IBulkExecutor.BulkImportAsync to insert documents as a batch. I have implemented unique key constraints for my Cosmos DB collection. If any of the input documents violates a constraint, the entire bulk import operation fails, throwing a DocumentClientException. Is this expected behaviour? Or is there a way to handle the exceptions for the failed documents and make sure the valid documents are inserted?
First of all, thanks to the Microsoft documentation, which explains solid scenarios on this issue:
https://learn.microsoft.com/en-us/azure/data-factory/connector-troubleshoot-guide
This error appears when we define a unique key in addition to the default id field defined by Cosmos. One possible reason is duplication of rows for the unique key in the dataset. Another possible reason is that the delta dataset we are about to load contains some unique keys which are already present in the existing Cosmos dataset.
For regular batch jobs there could be updates to an existing unique key itself, but we cannot update an existing unique key through the batch process, because each record goes into Cosmos as a new record with a new 'id' field value. Cosmos updates an existing record only on the same id field, not on the unique key.
Workaround: since the unique key is already unique for every row across the entire collection, we can use the unique value itself as the 'id' field as well. Now, if we have any updates to additional fields apart from the unique key, we can apply them, because the 'id' field for the respective unique key will also be the same.
In SQL terms:
SELECT <unique_key_field> AS id, <unique_key_field>, field1, field2 FROM <table_name>
Is there any way I can find labels which are not used in D365 FO (labels which don't have references)?
The cross references are stored in the database DYNAMICSXREFDB. You can use a SQL query to generate a list of labels that have no references.
This query uses two tables in the database:
Names holds an entry for each object in the application that can be referenced.
The Path field of the table holds the name of the object (e.g. /Labels/#FormRunConfiguration:ViewDefaultLabel is the path of the ViewDefaultLabel in the FormRunConfiguration label file).
The Id field is used to reference a record of this table in other tables.
References holds the actual references that connect the objects.
The SourceId field contains the Id of the Names record of the object that references another object, which is identified by the TargetId field.
The actual query could look like this:
SELECT LabelObjects.Path AS UnusedLabel
FROM [dbo].[Names] AS LabelObjects
WHERE LabelObjects.Path LIKE '/Labels/%'
AND NOT EXISTS
(SELECT *
FROM [dbo].[References] AS LabelReferences
WHERE LabelReferences.TargetId = LabelObjects.Id)
Make sure to compile the application to update the cross reference data. Otherwise the query might give you wrong results. When I run this query on a version 10.0.3 PU27 environment, it returns one standard label as a result.
I use Navicat and this command to create a temp table in SQLite:
create temp table search as select * from documents
Then when I try to query:
select * from search
I got:
no such table: temp.sqlite_master
or:
no such table
The table doesn't appear in the table list either, but when I try to create it again I get:
table search already exists
What is the problem? Is it from Navicat?
Your CREATE statement looks correct to me. When you create a temp table, it is deleted when you close the connection used to create it. Are you closing the connection after you create the table and then opening it again when you send the query?
If not, can you include your query statement too?
It looks like a bug in the SQLite DLL shipped with Navicat. Testing it somewhere else worked OK.
The SQLite documentation says this about CREATE TABLE:
If a database name is specified, it must be either "main", "temp", or the name of an attached database. In this case the new table is created in the named database. If the "TEMP" or "TEMPORARY" keyword occurs between the "CREATE" and "TABLE" then the new table is created in the temp database. It is an error to specify both a database name and the TEMP or TEMPORARY keyword, unless the database name is "temp". If no database name is specified and the TEMP keyword is not present then the table is created in the main database.
Maybe you should access the table via the temp prefix, like this: temp.search.
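To illustrate both points (the temp prefix and the per-connection lifetime of temp tables), here is a small sketch outside Navicat, assuming the sqlite-jdbc driver and a documents.db file that contains the documents table:

import java.sql.DriverManager

val conn = DriverManager.getConnection("jdbc:sqlite:documents.db")
val stmt = conn.createStatement()

// Temp tables only exist for the connection that created them.
stmt.execute("CREATE TEMP TABLE search AS SELECT * FROM documents")

// Query it on the same connection, optionally via the temp prefix.
val rs = stmt.executeQuery("SELECT COUNT(*) FROM temp.search")
while (rs.next()) println(rs.getInt(1))

conn.close()   // closing the connection drops the temp table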