I'm trying to publish a Data Factory solution with this ADF DataLakeAnalyticsU-SQL pipeline activity, following the Azure step-by-step doc (https://learn.microsoft.com/en-us/azure/data-factory/data-factory-usql-activity).
{
"type": "DataLakeAnalyticsU-SQL",
"typeProperties": {
"scriptPath": "\\scripts\\111_risk_index.usql",
"scriptLinkedService": "PremiumAzureDataLakeStoreLinkedService",
"degreeOfParallelism": 3,
"priority": 100,
"parameters": {
"in": "/DF_INPUT/Consodata_Prelios_consegna_230617.txt",
"out": "/DF_OUTPUT/111_Analytics.txt"
}
},
"inputs": [
{
"name": "PremiumDataLakeStoreLocation"
}
],
"outputs": [
{
"name": "PremiumDataLakeStoreLocation"
}
],
"policy": {
"timeout": "06:00:00",
"concurrency": 1,
"executionPriorityOrder": "NewestFirst",
"retry": 1
},
"scheduler": {
"frequency": "Minute",
"interval": 15
},
"name": "ConsodataFilesProcessing",
"linkedServiceName": "PremiumAzureDataLakeAnalyticsLinkedService"
}
During publishing I got this error:
25/07/2017 18:51:59- Publishing Project 'Premium.DataFactory'....
25/07/2017 18:51:59- Validating 6 json files
25/07/2017 18:52:15- Publishing Project 'Premium.DataFactory' to Data
Factory 'premium-df'
25/07/2017 18:52:15- Value cannot be null.
Parameter name: value
Trying to figure out what could be wrong with the project, it turned out that the issue lies in the activity's "typeProperties" options shown above, specifically the scriptPath and scriptLinkedService attributes. The doc says:
scriptPath: Path to folder that contains the U-SQL script. Name of the file
is case-sensitive.
scriptLinkedService: Linked service that links the storage that contains the
script to the data factory
Publishing the project without them (using a hard-coded script) completes successfully. The problem is that I can't figure out what exactly to put into them. I tried several path combinations. The only thing I know is that the script file must be referenced locally in the solution as a dependency.
The script linked service needs to be Blob Storage, not Data Lake Storage.
Ignore the publishing error; it's misleading.
Have a linked service in your solution to an Azure Storage account, referred to in the 'scriptLinkedService' attribute. Then, in the 'scriptPath' attribute, reference the blob container + path.
For example:
"typeProperties": {
"scriptPath": "datafactorysupportingfiles/CreateDimensions - Daily.usql",
"scriptLinkedService": "BlobStore",
"degreeOfParallelism": 2,
"priority": 7
},
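For reference, the linked service named in 'scriptLinkedService' is just an Azure Storage linked service. A minimal sketch (the name "BlobStore" and the connection string placeholders here are illustrative, not taken from the original project):
{
    "name": "BlobStore",
    "properties": {
        "type": "AzureStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=<accountname>;AccountKey=<accountkey>"
        }
    }
}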
Hope this helps.
P.S. Double-check the case sensitivity of attribute names. It can also throw unhelpful errors.
Related
I am using a Copy Activity in my Data Factory (V2) to query Cosmos DB (NoSQL/SQL API). I have a WHERE clause that builds a datetime from parts using the DateTimeFromParts function. This query works fine when I execute it in the Cosmos DB Data Explorer query window, but when I use the same query from my copy activity I get the following error:
"message":"'DateTimeFromParts' is not a recognized built-in function name."}]}
ActivityId: ac322e36-73b2-4d54-a840-6a55e456e15e, documentdb-dotnet-sdk/2.5.1 Host/64-bit
I am trying to convert a string attribute like '20221231' (which translates to Dec 31, 2022) to a date so I can compare it with the current date. I use DateTimeFromParts to build the date. Is there another way to convert '20221231' to a valid date?
Select * from c where
DateTimeFromParts(StringToNumber(LEFT(c.userDate, 4)), StringToNumber(SUBSTRING(c.userDate,4, 2)), StringToNumber(RIGHT(c.userDate, 2))) < GetCurrentDateTime()
I suspect the error might be because the documentdb-dotnet-sdk is an old version. Is there a way to specify which SDK to use in the activity?
I tried to repro this and got the same error.
Instead of converting the userDate column with the DateTimeFromParts function, try converting the output of GetCurrentDateTime() to the userDate column's format. GetCurrentDateTime() returns an ISO 8601 string such as 2022-12-31T00:00:00.0000000Z, so taking its first 10 characters and removing the hyphens yields a yyyyMMdd string that can be compared with userDate directly.
Workaround query:
SELECT * FROM c
where c.userDate <
replace(left(GetCurrentDateTime(),10),'-','')
Input data
[
{
"id": "1",
"userDate": "20221231"
},
{
"id": "2",
"userDate": "20211231",
}
]
Output data
[
{
"id": "2",
"userDate": "20211231"
}
]
Apologies for the slow reply here. Holidays slowed getting an answer for this.
There is a workaround that allows you to use the .NET SDK v3, which then gives you access to the DateTimeFromParts() system function released in .NET SDK v3.13.0.
Option 1: Use AAD authentication (i.e., a Service Principal, or a System- or User-assigned Managed Identity) for the Linked Service object in ADF to Cosmos DB. This will automatically pick up the .NET SDK v3.
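For illustration, a service-principal-based Cosmos DB linked service could look roughly like the sketch below. The property names follow my recollection of the ADF Cosmos DB connector and the placeholder values are mine, so treat the JSON that the ADF UI actually generates as authoritative:
{
    "name": "<CosmosDbAad>",
    "type": "Microsoft.DataFactory/factories/linkedservices",
    "properties": {
        "annotations": [],
        "type": "CosmosDb",
        "typeProperties": {
            "accountEndpoint": "<https://sample.documents.azure.com:443/>",
            "database": "<db>",
            "servicePrincipalId": "<application (client) id>",
            "servicePrincipalCredentialType": "ServicePrincipalKey",
            "servicePrincipalCredential": {
                "type": "SecureString",
                "value": "<client secret>"
            },
            "tenant": "<tenant id>"
        }
    }
}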
Option 2: Modify the linked service template. First, click on Manage in the ADF designer, then click on Linked Services, select the connection and click the {} to open the JSON template; you can then set useV3 to true. Here is an example:
{
"name": "<CosmosDbV3>",
"type": "Microsoft.DataFactory/factories/linkedservices",
"properties": {
"annotations": [],
"type": "CosmosDb",
"typeProperties": {
"useV3": true,
"accountEndpoint": "<https://sample.documents.azure.com:443/>",
"database": "<db>",
"accountKey": {
"type": "SecureString",
"value": "<account key>"
}
}
}
}
Is there any way to use the reference function in an ARM parameter file? I understand the following can be used to retrieve the instrumentation key of an App Insights instance, but this doesn't seem to work in a parameter file.
"[reference('microsoft.insights/components/web-app-name-01', '2015-05-01').InstrumentationKey]"
I currently set a long list of environment variables using an array from a parameter file and need to add the dynamic App Insights instrumentation key to that list of variables.
Unfortunately, no.
The reference function only works at runtime. It can't be used in the parameters or variables sections because both are resolved during the initial parsing phase of the template.
Here is an excerpt from the docs and also how to use reference correctly:
You can't use the reference function in the variables section of the template. The reference function derives its value from the resource's runtime state. However, variables are resolved during the initial parsing of the template. Construct values that need the reference function directly in the resources or outputs section of the template.
Not in a param file... it's possible to simulate what you want by nesting a deployment, if that's an option. So your param file can contain the resourceId of the insights resource and then a nested deployment can make the reference call - but TBH, probably easier to fetch the key as a pipeline step (or similar).
{
"$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"insightsResourceId": {
"type": "string",
"defaultValue": "'microsoft.insights/components/web-app-name-01'"
}
},
"resources": [
{
"apiVersion": "2018-02-01",
"type": "Microsoft.Resources/deployments",
"name": "nestedDeployment",
"properties": {
"mode": "Incremental",
"parameters": {
"instrumentationKey": {
"value": "[reference(parameters('insightsResourceId'), '2015-05-01').InstrumentationKey]"
}
},
"template": {
// original template goes here
}
}
}
]
}
Way 1
Use the reference function in your parameter file for resources that are already deployed in another template. For that you have to pass the apiVersion parameter; refer to the Microsoft docs. The expression looks like this:
"value": "[reference(resourceId(variables('<AppInsightsResourceGroup>'),'Microsoft.Insights/components', variables('<ApplicationInsightsName>')), '2015-05-01', 'Full').properties.InstrumentationKey]"
You need to change the property that you are referencing from '.InstrumentationKey' to '.properties.InstrumentationKey'.
Refer to Kwill's answer for more information.
Way 2
Get the content of the parameter file into a PowerShell object using
$ParameterObject = Get-Content -Raw ./ParameterFileName.json | ConvertFrom-Json
Update the parameter values using
#Assign the parameter value
$ParameterObject.parameters.<KeyName>.value = "your dynamic value"
Pass $ParameterObject to the -TemplateParameterObject parameter (see the sketch below).
Refer here
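A minimal end-to-end sketch of Way 2 (this assumes the Az PowerShell modules; the resource group, file names, and the instrumentationKey parameter are illustrative and must match your own parameter file):
# Read the parameter file into an object
$ParameterObject = Get-Content -Raw ./azuredeploy.parameters.json | ConvertFrom-Json

# Fetch the dynamic value, e.g. the App Insights instrumentation key
$appInsights = Get-AzApplicationInsights -ResourceGroupName 'my-rg' -Name 'web-app-name-01'
$ParameterObject.parameters.instrumentationKey.value = $appInsights.InstrumentationKey

# -TemplateParameterObject expects a flat name -> value hashtable
$templateParams = @{}
foreach ($p in $ParameterObject.parameters.PSObject.Properties) {
    $templateParams[$p.Name] = $p.Value.value
}

New-AzResourceGroupDeployment -ResourceGroupName 'my-rg' `
    -TemplateFile ./azuredeploy.json `
    -TemplateParameterObject $templateParams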
Way 3
You have to add or modify the parameter file values using PowerShell or another language (like Python, C#, ...). After changing the parameter file, try to deploy it.
The Amplify docs here say that we can configure a Lambda function as a DynamoDB trigger by running **amplify add function** and selecting the "Lambda Trigger" option, but when I run "amplify add function" (with Python selected as the runtime language) I am not getting the Lambda Trigger option; I'm only getting the "Serverless function" and "Lambda layer" options.
Please help me resolve this issue so I can access this feature.
docs snapshot - showing 4 options
my CLI snapshot - showing only 2 options
I know it works for a NodeJS-runtime Lambda, but I want this option for a Python Lambda as well.
I just followed these steps with Amplify CLI version 4.50.2.
To create a Lambda function that is triggered by changes to a DynamoDB table, you can use the following command-line actions, which the CLI walks you through after entering the command below:
amplify add function
Select which capability you want to add:
❯ Lambda function (serverless function)
Provide an AWS Lambda function name:
<YourFunctionsName>
Choose the runtime that you want to use:
> NodeJS # IMPORTANT: Must be NodeJS as of now, you can change this later by manually editing ...-cloudformation-template.json file inside function directory
Choose the function template you want to use
> Lambda Trigger
What event source do you want to associate with the lambda trigger
> Amazon DynamoDB Stream
Choose a DynamoDB event source option
>Use API category graphql #model backend DynamoDB table(s) in the current Amplify project
Choose the graphql #model(s)
<Select any models (using spacebar) you want to trigger the function after editing>
Do you want to configure advanced settings?
Y # IMPORTANT: If you are using a dynamodb event source based on a table defined by graphql schema, you will need to give this function read access to the api resource that contains the graphql schema that defines the table that drives the event
Do you want to access other resources in this project from your Lambda function?
y # See above, select your api that contains the data model and make sure that the function has at least read access.
After this, the other options (layer, call scheduling) are up to you.
After creating the function via the above CLI options, you can change the "Runtime" field inside the -cloudformation-template.json file in the function directory, e.g. if you want a Python Lambda function, change the runtime to "python3.8". You will also need to create a file called index.py inside your function's directory with a handler(event, context) function. See the example below:
import json

def handler(event, context):
    print("Triggered via DynamoDB")
    print(event)
    return json.dumps({'status_code': 200, "message": "Received from DynamoDB"})
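And the corresponding runtime edit in the function's -cloudformation-template.json might look roughly like this (a sketch showing only the fields you change on the generated AWS::Lambda::Function resource, which Amplify names LambdaFunction; everything else stays as generated):
"LambdaFunction": {
    "Type": "AWS::Lambda::Function",
    "Properties": {
        "Handler": "index.handler",
        "Runtime": "python3.8"
    }
}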
After making these edits, you can run amplify push and, if you open your function in the management console online, it should show an attached DynamoDB stream.
Doesn't appear to be available anymore in the CLI codebase - see Supported-service.json deleted and replaced by supported-services.ts
https://github.com/aws-amplify/amplify-cli/commit/607ae21287941805f44ea8a9b78dd12d16d71f85#diff-a0fd8c5607fd81977cb4745b9af3af2c6649ded748991bf9968a7d782b000c6b
https://github.com/aws-amplify/amplify-cli/commits/4e974007d95c894ab4108a2dff8d5996e7e3ce25/packages/amplify-category-function/src/provider-utils/supported-services.ts
Select NodeJS and you will be able to see the Lambda Trigger option.
Just add the following to {YOUR_FUNCTION_NAME}-cloudformation-template.json; remember to replace (YOUR_TABLE_NAME) with your table name.
"LambdaTriggerPolicyPurchase": {
"DependsOn": [
"LambdaExecutionRole"
],
"Type": "AWS::IAM::Policy",
"Properties": {
"PolicyName": "amplify-lambda-execution-policy-Purchase",
"Roles": [
{
"Ref": "LambdaExecutionRole"
}
],
"PolicyDocument": {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"dynamodb:DescribeStream",
"dynamodb:GetRecords",
"dynamodb:GetShardIterator",
"dynamodb:ListStreams"
],
"Resource": {
"Fn::ImportValue": {
"Fn::Sub": "${apilanguageGraphQLAPIIdOutput}:GetAtt:(YOUR_TABLE_NAME):StreamArn"
}
}
}
]
}
}
},
"LambdaEventSourceMappingPurchase": {
"Type": "AWS::Lambda::EventSourceMapping",
"DependsOn": [
"LambdaTriggerPolicyPurchase",
"LambdaExecutionRole"
],
"Properties": {
"BatchSize": 100,
"Enabled": true,
"EventSourceArn": {
"Fn::ImportValue": {
"Fn::Sub": "${apilanguageGraphQLAPIIdOutput}:GetAtt:(YOUR_TABLE_NAME):StreamArn"
}
},
"FunctionName": {
"Fn::GetAtt": [
"LambdaFunction",
"Arn"
]
},
"StartingPosition": "LATEST"
}
},
I got these by creating a dummy function using the template that shows up after you choose NodeJS and comparing its -cloudformation-template.json with my own function's.
I have upgraded to the latest Cloud Endpoints 2.0 as well as the endpoints_proto_datastore to its latest commit. When I now try to generate the API discovery doc I get the following error messages:
Method user.update specifies path parameters but you are not using a ResourceContainer This will fail in future releases; please switch to using ResourceContainer as soon as possible
Method position.update specifies path parameters but you are not using a ResourceContainer This will fail in future releases; please switch to using ResourceContainer as soon as possible
The only two available endpoints are the following two methods which should update the User and the Position model:
@User.method(name='user.update', path='users/{id}', http_method='PUT')
def UserUpdate(self, user):
""" Update an user resource. """
user.put()
return user
@Position.method(name='position.update', path='positions/{id}', http_method='PUT')
def PositionUpdate(self, position):
""" Update a position resource. """
position.put()
return position
Before upgrading to Cloud Endpoints 2.0 everything worked fine. But now, if I take a look at the generated discovery file, both endpoints have a ProtorpcMessagesCombinedContainer in their request, and the combined container itself is defined with the properties of the Position model!
This is how both methods' request attributes are defined:
"request": {
"$ref": "ProtorpcMessagesCombinedContainer",
"parameterName": "resource"
},
And this is the definition of the combined container (which has the properties of the Position model):
"ProtorpcMessagesCombinedContainer": {
"id": "ProtorpcMessagesCombinedContainer",
"type": "object",
"properties": {
"displayName": {
"type": "string"
},
"shortName": {
"type": "string"
}
}
},
Has anyone else had this issue with GAE and Cloud Endpoints 2.0?
What am I doing wrong? Normally endpoints-proto-datastore should handle the ResourceContainer and the method's path parameters. Also, endpoints-proto-datastore hasn't been updated in years ... I really don't know where the error comes from.
Thanks for your help!
** UPDATE **
Thanks to Alfred Fuller for pointing out that I need to create a manual index for this query.
Unfortunately, using the JSON API, from a .NET application, there does not appear to be an officially supported way of doing so. In fact, there does not officially appear to be a way to do this at all from an app outside of App Engine, which is strange since the Cloud Datastore API was designed to allow access to the Datastore outside of App Engine.
The closest hack I could find was to POST the index definition using RPC to http://appengine.google.com/api/datastore/index/add. Can someone give me the raw spec for how to do this exactly (i.e. URL parameters, what exactly should the body look like, etc), perhaps using Fiddler to inspect the call made by appcfg.cmd?
** ORIGINAL QUESTION **
According to the docs, "a query can combine equality (EQUAL) filters for different properties, along with one or more inequality filters on a single property".
However, this query fails:
{
"query": {
"kinds": [
{
"name": "CodeProse.Pogo.Tests.TestPerson"
}
],
"filter": {
"compositeFilter": {
"operator": "and",
"filters": [
{
"propertyFilter": {
"operator": "equal",
"property": {
"name": "DepartmentCode"
},
"value": {
"integerValue": "123"
}
}
},
{
"propertyFilter": {
"operator": "greaterThan",
"property": {
"name": "HourlyRate"
},
"value": {
"doubleValue": 50
}
}
},
{
"propertyFilter": {
"operator": "lessThan",
"property": {
"name": "HourlyRate"
},
"value": {
"doubleValue": 100
}
}
}
]
}
}
}
}
with the following response:
{
"error": {
"errors": [
{
"domain": "global",
"reason": "FAILED_PRECONDITION",
"message": "no matching index found.",
"locationType": "header",
"location": "If-Match"
}
],
"code": 412,
"message": "no matching index found."
}
}
The JSON API does not yet support local index generation, but we've documented a process that you can follow to generate the xml definition of the index at https://developers.google.com/datastore/docs/tools/indexconfig#Datastore_Manual_index_configuration
Please give this a shot and let us know if it doesn't work.
This is a temporary solution that we hope to replace with automatic local index generation as soon as we can.
The error "no matching index found." indicates that an index needs to be added for the query to work. See the auto index generation documentation.
In this case you need an index with the properties DepartmentCode and HourlyRate (in that order).
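Expressed in index.yaml form, for example, that index would look roughly like this (the kind name is taken from the query in the question):
indexes:
- kind: CodeProse.Pogo.Tests.TestPerson
  properties:
  - name: DepartmentCode
  - name: HourlyRate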
For gcloud-node I fixed it with these 3 links:
https://github.com/GoogleCloudPlatform/gcloud-node/issues/369
https://github.com/GoogleCloudPlatform/gcloud-node/blob/master/system-test/data/index.yaml
and most important link:
https://cloud.google.com/appengine/docs/python/config/indexconfig#Python_About_index_yaml to write your index.yaml file
As explained in the last link, an index is what allows complex queries to run faster by storing the query's result set in an index. When you get "no matching index found" it means that you tried to run a complex query involving ordering or filtering. So to make your query work, you need to create the index in the Google Datastore indexes by manually writing a config file that defines the indexes representing the query you are trying to run. Here is how to fix it:
Create an index.yaml file in a folder named, for example, indexes in your app directory by following the directives for the Python config file: https://cloud.google.com/appengine/docs/python/config/indexconfig#Python_About_index_yaml or get inspiration from the gcloud-node tests at https://github.com/GoogleCloudPlatform/gcloud-node/blob/master/system-test/data/index.yaml
Create the indexes from the config file with this command:
gcloud preview datastore create-indexes indexes/index.yaml
see https://cloud.google.com/sdk/gcloud/reference/preview/datastore/create-indexes
Wait for the indexes to start serving in your Developer Console under Cloud Datastore > Indexes; the interface should display "Serving" once the index is built.
Once it is serving, your query should work.
For example for this query:
var q = ds.createQuery('project')
.filter('tags =', category)
.order('-date');
index.yaml looks like:
indexes:
- kind: project
  ancestor: no
  properties:
  - name: tags
  - name: date
    direction: desc
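Note: on current Cloud SDK releases the preview command has graduated, so the equivalent of the create-indexes step above should be:
gcloud datastore indexes create indexes/index.yaml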
Try not to order the result. After removing orderby(), it worked for me.