I have the KQL query below, which returns the right result when run in Log Analytics. I have now moved my logs to a storage account and created an ADX external table to query the same logs with Kusto. However, the same query does not work there and needs some modification. I would appreciate advice on what changes I should make to the existing query to get the same result.
In Log Analytics this works:
AzureDiagnostics
| where Category == 'kube-audit'
| where TimeGenerated between (datetime("$querystart") .. datetime("$queryend"))
| where (strlen(log_s) >= 32000
and not(log_s contains "aksService")
and not(log_s contains "system:serviceaccount:crossplane-system:crossplane")
and not(log_s contains "system:serviceaccount:elastic-system:elastic-operator")
and not(log_s contains "system:serviceaccount:internal-services:cert-manager-cainjector")
and not(log_s contains "system:serviceaccount:internal-services:spinnaker")
and not(log_s contains "system:serviceaccount:kube-system:daemon-set-controller")
and not(log_s contains "system:serviceaccount:kube-system:deployment-controller")
and not(log_s contains "system:serviceaccount:kube-system:endpoint-controller")
and not(log_s contains "system:serviceaccount:kube-system:node-controller")
and not(log_s contains "system:serviceaccount:kube-system:replicaset-controller")
and not(log_s contains "system:serviceaccount:kube-system:statefulset-controller"))
or strlen(log_s) < 32000
| extend op = parse_json(log_s)
| where not(tostring(op.verb) in ("list", "get", "watch"))
| where not(tostring(op.user.username) hasprefix "system:")
| where not(tostring(op.user.username) in ("hcpService", "aksService", "aksProblemDetector", "readinessChecker", "nodeclient", "masterclient"))
| where substring(tostring(op.responseStatus.code), 0, 1) == "2"
| where not(tostring(op.requestURI) in ("/apis/authorization.k8s.io/v1/selfsubjectaccessreviews"))
| extend user = op.user.username
| extend decision = tostring(parse_json(tostring(op.annotations)).["authorization.k8s.io/decision"])
| extend requestURI = tostring(op.requestURI)
| extend name = tostring(parse_json(tostring(op.objectRef)).name)
| extend namespace = tostring(parse_json(tostring(op.objectRef)).namespace)
| extend verb = tostring(op.verb)
| project TimeGenerated, SubscriptionId, ResourceId, namespace, name, requestURI, verb, decision, ['user']
| order by TimeGenerated asc
and the output in Log Analytics for query
AzureDiagnostics
| where Category == 'kube-audit'
After exporting to a storage account and creating an external table in ADX over it, I don't see the same schema. A kube-audit record in the ADX external table looks something like this:
"operationName": Microsoft.ContainerService/managedClusters/diagnosticLogs/Read,
"category": kube-audit,
"ccpNamespace": 5c40f,
"resourceId": /SUBSCRIPTIONS/53AEB/RESOURCEGROUPS/AZURE-DEV/PROVIDERS/MICROSOFT.CONTAINERSERVICE/MANAGEDCLUSTERS/AZURE-DEV,
"properties": {
"log": "{\"kind\":\"Event\",\"apiVersion\":\"audit.k8s.io/v1\",\"level\":\"Request\",\"auditID\":\"d80ca0b72-75eaf\",\"stage\":\"ResponseComplete\",\"requestURI\":\"/apis/apps/v1/namespaces/events/deployments/api/scale\",\"verb\":\"get\",\"user\":{\"username\":\"system:serviceaccount:kube-system:horizontal-pod-autoscaler\",\"uid\":\"d5d7-ba1cfb172033\",\"groups\":[\"system:serviceaccounts\",\"system:serviceaccounts:kube-system\",\"system:authenticated\"]},\"sourceIPs\":[\"100.11.11.0\"],\"userAgent\":\"kube-controller-manager/v1.22.6 (linux/amd64) kubernetes/0795921/system:serviceaccount:kube-system:horizontal-pod-autoscaler\",\"objectRef\":{\"resource\":\"deployments\",\"namespace\":\"events\",\"name\":\"api\",\"apiGroup\":\"apps\",\"apiVersion\":\"v1\",\"subresource\":\"scale\"},\"responseStatus\":{\"metadata\":{},\"code\":200},\"requestReceivedTimestamp\":\"2022-05-23T13:44:59.985416Z\",\"stageTimestamp\":\"2022-05-23T13:45:00.002107Z\",\"annotations\":{\"authorization.k8s.io/decision\":\"allow\",\"authorization.k8s.io/reason\":\"RBAC: allowed by ClusterRoleBinding \\\"system:controller:horizontal-pod-autoscaler\\\" of ClusterRole \\\"system:controller:horizontal-pod-autoscaler\\\" to ServiceAccount \\\"horizontal-pod-autoscaler/cfxyz\\\"\"}}\n",
"stream": "stdout",
"pod": "kube-apiserver-7d-q6v"
},
"time": 2022-05-23T13:45:00Z,
"Cloud": AzureCloud,
"Environment": prod,
"UnderlayClass": hcp-underlay,
"UnderlayName": hcp-underlay-norteurope-cx-624,
External table schema:
"TableName": logsKube,
"Schema": operationName:string,category:string,ccpNamespace:string,resourceId:string,properties:dynamic,['time']:datetime,Cloud:string,Environment:string,UnderlayClass:string,UnderlayName:string,
"DatabaseName": logsstorage,
"Folder": ,
"DocString": ,
How can I run the above query in ADX to get the result?
Create the external table manually, using the original columns' names.
Create and alter Azure Storage external tables
It should be something like this:
.create-or-alter external table logsKube (TenantId:string,TimeGenerated:datetime,ResourceId:string,Category:string,ResourceGroup:string,SubscriptionId:string,ResourceProvider:string,Resource:string,ResourceType:string,OperationName:string,ResultType:string,CorrelationId:string,ResultDescription:string,Tenant_g:string,JobId_g:string,RunbookName_s:string,StreamType_s:string,Caller_s:string,requestUri_s:string,Level:string,DurationMs:string,CallerIPAddress:string,OperationVersion:string,ResultSignature:string,id_s:string,status_s:string,LogicalServerName_s:string,Message:string,clientInfo_s:string,httpStatusCode_d:string,identity_claim_appid_g:string,identity_claim_http_schemas_microsoft_com_identity_claims_objectidentifier_g:string,userAgent_s:string,ruleName_s:string,identity_claim_http_schemas_xmlsoap_org_ws_2005_05_identity_claims_upn_s:string,systemId_g:string,isAccessPolicyMatch_b:string,EventName_s:string,httpMethod_s:string,subnetId_s:string,type_s:string,instanceId_s:string,macAddress_s:string,vnetResourceGuid_g:string,direction_s:string,subnetPrefix_s:string,primaryIPv4Address_s:string,conditions_sourcePortRange_s:string,priority_d:string,conditions_destinationPortRange_s:string,conditions_destinationIP_s:string,conditions_None_s:string,conditions_sourceIP_s:string,httpVersion_s:string,matchedConnections_d:string,startTime_t:string,endTime_t:string,DatabaseName_s:string,clientIP_s:string,host_s:string,requestQuery_s:string,sslEnabled_s:string,clientPort_d:string,httpStatus_d:string,receivedBytes_d:string,sentBytes_d:string,timeTaken_d:string,resultDescription_ErrorJobs_s:string,resultDescription_ChildJobs_s:string,identity_claim_http_schemas_microsoft_com_identity_claims_scope_s:string,workflowId_s:string,resource_location_s:string,resource_workflowId_g:string,resource_resourceGroupName_s:string,resource_subscriptionId_g:string,resource_runId_s:string,resource_workflowName_s:string,_schema_s:string,correlation_clientTrackingId_s:string,properties_sku_Family_s
:string,properties_sku_Name_s:string,properties_tenantId_g:string,properties_enabledForDeployment_b:string,code_s:string,resultDescription_Summary_MachineId_s:string,resultDescription_Summary_ScheduleName_s:string,resultDescription_Summary_Status_s:string,resultDescription_Summary_StatusDescription_s:string,resultDescription_Summary_MachineName_s:string,resultDescription_Summary_TotalUpdatesInstalled_d:string,resultDescription_Summary_RebootRequired_b:string,resultDescription_Summary_TotalUpdatesFailed_d:string,resultDescription_Summary_InstallPercentage_d:string,resultDescription_Summary_StartDateTimeUtc_t:string,resource_triggerName_s:string,resultDescription_Summary_InitialRequiredUpdatesCount_d:string,properties_enabledForTemplateDeployment_b:string,resultDescription_Summary_EndDateTimeUtc_s:string,resultDescription_Summary_DurationInMinutes_s:string,resource_originRunId_s:string,properties_enabledForDiskEncryption_b:string,resource_actionName_s:string,correlation_actionTrackingId_g:string,resultDescription_Summary_EndDateTimeUtc_t:string,resultDescription_Summary_DurationInMinutes_d:string,conditions_protocols_s:string,identity_claim_ipaddr_s:string,ElasticPoolName_s:string,identity_claim_http_schemas_microsoft_com_claims_authnmethodsreferences_s:string,RunOn_s:string,query_hash_s:string,SourceSystem:string,MG:string,ManagementGroupName:string,Computer:string,RawData:string,certificatePolicyProperties_certificateProperties_subject_s:string,certificatePolicyProperties_certificateProperties_validityInMonths_d:string,certificatePolicyProperties_keyProperties_type_s:string,certificatePolicyProperties_keyProperties_size_d:string,certificatePolicyProperties_keyProperties_export_b:string,certificatePolicyProperties_secretProperties_type_s:string,certificatePolicyProperties_certificateIssuerProperties_name_s:string,error_state_d:string,location_s:string,Tenant_s:string,RecoveryJobDestination_s:string,RecoveryJobRPLocation_s:string,RecoveryLocationType_s:string,upstream
SourcePort_s:string,ProtectedContainerOSType_s:string,ProtectedContainerOSVersion_s:string,GatewayManagerVersion_s:string,targetResources_CertificateName_s:string,displayResourceId_s:string,executionClusterType_s:string,clientResponseTime_d:string,targetResources_NodeConfigurationName_s:string,targetResources_NodeId_g:string,targetResources_CredentialId_g:string,targetResources_CredentialName_s:string,targetResources_DscConfigurationName_s:string,targetResources_VariableId_g:string,targetResources_VariableName_s:string,targetResources_RunbookId_g:string,targetResources_RunbookName_s:string,targetResources_ModuleId_g:string,targetResources_ModuleName_s:string,targetResources_ScheduleId_g:string,targetResources_ScheduleName_s:string,clientInfo_TenantId_g:string,clientInfo_Issuer_s:string,clientInfo_ObjectId_g:string,clientInfo_AppId_g:string,targetResources_JobScheduleId_g:string,targetResources_JobName_s:string,clientInfo_IpAddress_s:string,clientInfo_PrincipalName_s:string,clientInfo_ClientRequestId_g:string,targetResources_Resource_s:string,targetResources_JobId_g:string,targetResources_JobName_g:string,clusterType_s:string,identity_claim_upn_s:string,DataCenterName_s:string,identity_claim_scp_s:string,identity_claim_unique_name_s:string,identity_claim_amr_s:string,identity_claim_oid_g:string,identity_claim_home_oid_g:string,removedAccessPolicy_Permissions_storage_s:string,replicationHealthErrors_s:string,eventGridEventProperties_topic_s:string,eventGridEventProperties_subject_s:string,eventGridEventProperties_eventType_s:string,eventGridEventProperties_eventTime_t:string,eventGridEventProperties_data_Id_s:string,eventGridEventProperties_data_VaultName_s:string,eventGridEventProperties_data_ObjectType_s:string,eventGridEventProperties_data_ObjectName_s:string,eventGridEventProperties_data_Version_s:string,eventGridEventProperties_dataVersion_s:string,properties_networkAcls_bypass_s:string,properties_networkAcls_defaultAction_s:string,properties_softDeleteRetentionI
nDays_d:string,error_number_d:string,Severity:string,user_defined_b:string,state_d:string,PolicyUniqueId_s:string,ProtectedContainerName_g:string,identity_claim_http_schemas_xmlsoap_org_ws_2005_05_identity_claims_name_s:string,retryHistory_s:string,network_s:string,nexthop_s:string,locprf_s:string,weight_s:string,path_s:string,addressfamily_s:string,ClientOperationId_g:string,CorrelationRequestId_g:string,Region_s:string,ScaleUnit_s:string,ActivityId_g:string,EventTimeString_s:string,EventProperties_s:string,SKU_s:string,virtual_core_count_s:string,avg_cpu_percent_s:string,reserved_storage_mb_s:string,storage_space_used_mb_s:string,io_requests_s:string,io_bytes_read_s:string,io_bytes_written_s:string,timeOfOccurence_t:string,eventType_s:string,description_s:string,healthErrors_s:string,logId_g:string,removedAccessPolicy_TenantId_g:string,removedAccessPolicy_ObjectId_g:string,removedAccessPolicy_Permissions_keys_s:string,removedAccessPolicy_Permissions_secrets_s:string,removedAccessPolicy_Permissions_certificates_s:string,addedAccessPolicy_TenantId_g:string,addedAccessPolicy_ObjectId_g:string,addedAccessPolicy_Permissions_keys_s:string,addedAccessPolicy_Permissions_secrets_s:string,addedAccessPolicy_Permissions_certificates_s:string,addedAccessPolicy_Permissions_storage_s:string,properties_enableSoftDelete_b:string,JobOperationSubType_s:string,DataTransferredInMB_s:string,ProtectedInstanceCount_s:string,StorageConsumedInMBs_s:string,StorageType_s:string,StorageName_s:string,OldestRecoveryPointTime_s:string,OldestRecoveryPointLocation_s:string,LatestRecoveryPointTime_s:string,LatestRecoveryPointLocation_s:string,BackupItemFrontEndSize_s:string,StorageUniqueId_s:string,AlertConsolidationStatus_s:string,CountOfAlertsConsolidated_s:string,AlertRaisedOn_s:string,AlertCode_s:string,RecommendedAction_s:string,AlertUniqueId_s:string,AlertType_s:string,AlertStatus_s:string,AlertOccurrenceDateTime_s:string,AlertSeverity_s:string,TelemetryProperties_s:string,AdHocOrScheduledJob
_s:string,affectedResourceId_s:string,JobUniqueId_g:string,JobOperation_s:string,JobStatus_s:string,JobFailureCode_s:string,JobStartDateTime_s:string,JobDurationInSecs_s:string,RecoveryJobRPDateTime_s:string,affectedResourceName_s:string,affectedResourceId_g:string,affectedResourceType_s:string,logId_d:string,DeploymentUnit_s:string,CloudStorageInBytes_s:string,ProtectedInstances_s:string,trustedService_s:string,OptionName_s:string,OptionDesiredState_s:string,OptionActualState_s:string,OptionDisableReason_s:string,IsDisabledBySystem_d:string,DatabaseDesiredMode_s:string,DatabaseActualMode_s:string,RegisteredContainerId_s:string,ProtectedServerType_s:string,ProtectedServerFriendlyName_s:string,BackupManagementServerUniqueId_s:string,BackupItemId_s:string,ProtectedServerName_s:string,ProtectionState_s:string,ProtectedServerUniqueId_s:string,exec_type_d:string,wait_category_s:string,total_query_wait_time_ms_d:string,max_query_wait_time_ms_d:string,is_parameterizable_s:string,statement_type_s:string,statement_key_hash_s:string,query_param_type_d:string,interval_start_time_d:string,interval_end_time_d:string,logical_io_writes_d:string,max_logical_io_writes_d:string,physical_io_reads_d:string,max_physical_io_reads_d:string,logical_io_reads_d:string,max_logical_io_reads_d:string,execution_type_d:string,count_executions_d:string,cpu_time_d:string,max_cpu_time_d:string,dop_d:string,max_dop_d:string,rowcount_d:string,max_rowcount_d:string,query_max_used_memory_d:string,max_query_max_used_memory_d:string,duration_d:string,max_duration_d:string,num_physical_io_reads_d:string,max_num_physical_io_reads_d:string,log_bytes_used_d:string,max_log_bytes_used_d:string,query_id_d:string,plan_id_d:string,query_plan_hash_s:string,statement_sql_handle_s:string,tags_displayName_s:string,error_code_s:string,error_message_s:string,start_utc_date_t:string,end_utc_date_t:string,wait_type_s:string,delta_max_wait_time_ms_d:string,delta_signal_wait_time_ms_d:string,delta_wait_time_ms_d:string,delt
a_waiting_tasks_count_d:string,LogBackupFrequency_s:string,LogBackupRetentionDuration_s:string,PolicyTimeZone_s:string,PolicyName_s:string,BackupFrequency_s:string,BackupTimes_s:string,BackupDaysOfTheWeek_s:string,DailyRetentionDuration_s:string,DailyRetentionTimes_s:string,ProtectedContainerFriendlyName_s:string,ProtectedContainerWorkloadType_s:string,ProtectedContainerName_s:string,ProtectedContainerProtectionState_s:string,ProtectedContainerLocation_s:string,ProtectedContainerType_s:string,listenerName_s:string,backendPoolName_s:string,backendSettingName_s:string,originalRequestUriWithArgs_s:string,transactionId_g:string,sslCipher_s:string,sslProtocol_s:string,sslClientVerify_s:string,sslClientCertificateFingerprint_s:string,sslClientCertificateIssuerName_s:string,serverRouted_s:string,serverStatus_s:string,serverResponseLatency_s:string,originalHost_s:string,EndpointName_s:string,Status_s:string,NodeId_g:string,NodeName_s:string,NodeComplianceStatus_s:string,DscReportId_g:string,DscReportStatus_s:string,LastSeenTime_t:string,ReportStartTime_t:string,ReportEndTime_t:string,ConfigurationMode_s:string,HostName_s:string,NumberOfResources_d:string,IPAddress:string,DscResourceId_s:string,DscResourceName_s:string,DscResourceStatus_s:string,DscModuleName_s:string,DscModuleVersion_s:string,DscConfigurationName_s:string,DscResourceDuration_d:string,ErrorCode_s:string,ErrorMessage_s:string,BackupItemProtectionState_s:string,BackupItemAppVersion_s:string,BackupItemUniqueId_s:string,BackupItemName_s:string,BackupItemFriendlyName_s:string,BackupItemType_s:string,BackupManagementType_s:string,ProtectedContainerUniqueId_s:string,PolicyUniqueId_g:string,timeStamp_t:string,lastRecoveryPoint_t:string,latestAppConsistentRecoveryPoint_t:string,replicatingDisksCount_d:string,uploadRPOInSeconds_d:string,uploadRPOUpdateTime_t:string,processedRPOInSeconds_d:string,processedRPOUpdateTime_t:string,EventId_d:string,VaultUniqueId_s:string,VaultName_s:string,AzureDataCenter_s:string,VaultTag
s_s:string,ResourceGroupName_s:string,StorageReplicationType_s:string,SchemaVersion_s:string,State_s:string,InstanceName_s:string,Value_s:string,ProviderName_s:string,TaskName_s:string,agentVersion_s:string,recoveryRegion_s:string,multiVmGroupId_g:string,multiVmGroupName_s:string,multiVmGroupCreateOption_s:string,recoveryNetworkId_s:string,lastHeartbeat_t:string,multiVmSyncStatus_s:string,targetVmNicDetails_s:string,recoveryServicesProviderId_g:string,replicationHealth_s:string,failoverHealth_s:string,name_s:string,id_g:string,primaryFabricName_s:string,recoveryFabricName_s:string,primaryFabricType_s:string,recoveryFabricType_s:string,primaryContainerName_s:string,recoveryContainerName_s:string,protectionState_s:string,activeLocation_s:string,policyName_s:string,replicationProviderName_s:string,osFamily_s:string,initialReplicationProgressPercentage_d:string,itemType_s:string,failoverHealthErrors_s:string,rpoInSeconds_d:string,lastRpoCalculatedTime_t:string,version_s:string,attrs_s:string,containerID_s:string,ccpNamespace_s:string,log_s:string,stream_s:string,pod_s:string,Cloud_s:string,Environment_s:string,UnderlayClass_s:string,UnderlayName_s:string,msg_s:string,AdditionalFields:string,Type:string,_ResourceId:string)
kind=storage
dataformat=csv
(
h@'abfss://filesystem@storageaccount.dfs.core.windows.net/path;secretKey'
)
with (includeHeaders=all)
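To make the schema difference concrete, here is a small Python sketch (not Kusto) of how the exported record shape lines up with the Log Analytics column names the original query expects. The mapping in `to_log_analytics_row` and the helper names are my own assumptions for illustration, based only on the sample record in the question. The `keep` function approximates the verb, user, status and URI filters, and leaves out the `strlen(log_s)` pre-filter.

```python
import json

# Hypothetical mapping: exported field -> Log Analytics column name.
def to_log_analytics_row(record):
    props = record.get("properties") or {}
    return {
        "TimeGenerated": record.get("time"),
        "Category": record.get("category"),
        "ResourceId": record.get("resourceId"),
        "log_s": props.get("log"),        # nested under "properties" in the export
        "stream_s": props.get("stream"),
        "pod_s": props.get("pod"),
    }

EXCLUDED_VERBS = {"list", "get", "watch"}
EXCLUDED_USERS = {"hcpService", "aksService", "aksProblemDetector",
                  "readinessChecker", "nodeclient", "masterclient"}

def keep(row):
    """Approximate the audit-event filters of the original query."""
    op = json.loads(row["log_s"])  # log_s is a JSON string, so parse it first
    username = op.get("user", {}).get("username", "")
    code = str(op.get("responseStatus", {}).get("code", ""))
    return (
        op.get("verb") not in EXCLUDED_VERBS
        and not username.startswith("system:")
        and username not in EXCLUDED_USERS
        and code.startswith("2")
        and op.get("requestURI")
        != "/apis/authorization.k8s.io/v1/selfsubjectaccessreviews"
    )

# Sample shaped like the exported record in the question (log shortened).
sample = {
    "category": "kube-audit",
    "resourceId": "/SUBSCRIPTIONS/53AEB/RESOURCEGROUPS/AZURE-DEV/PROVIDERS/"
                  "MICROSOFT.CONTAINERSERVICE/MANAGEDCLUSTERS/AZURE-DEV",
    "time": "2022-05-23T13:45:00Z",
    "properties": {
        "log": json.dumps({
            "verb": "get",
            "requestURI": "/apis/apps/v1/namespaces/events/deployments/api/scale",
            "user": {"username":
                     "system:serviceaccount:kube-system:horizontal-pod-autoscaler"},
            "responseStatus": {"code": 200},
        }),
        "stream": "stdout",
        "pod": "kube-apiserver-7d-q6v",
    },
}

row = to_log_analytics_row(sample)
print(row["Category"], keep(row))
```

The point of the sketch is that `log_s` in Log Analytics corresponds to the `log` string nested inside `properties` in the export, so whichever way the external table is defined, the query has to read the audit JSON from there.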
vehicles --> accounts --> organizations <-- users
We have the above graph structure, where vehicles, accounts, organizations and users are vertex labels and the arrows indicate the edge direction.
Consider the following number of vertices:
organizations = 1
accounts per organization = 2
vehicles per account = 5000
users per organization = 100
Our requirement is, given two vertex IDs, to find the set of all users and vehicles that satisfy the above graph.
For example, if I have vertex1 = accounts:1 and vertex2 = organizations:1, find the set of users and vehicles that are part of these two vertices.
We have the following query:
g.V('accounts:1').outE().otherV().hasId('organizations:1')
.V('accounts:1').inE().otherV().as('B')
.V('organizations:1').inE().otherV().as('A')
.select('A', 'B')
While this works, the query takes ~3.5 seconds to complete. We know that there are going to be 500000 traversers for this query.
Is there a better way to do this?
Thanks for the help.
Edit #1: Attaching the query's profile API response
Optimized Traversal
===================
Neptune steps:
[
NeptuneGraphQueryStep(VertexId)#[A, B] {
JoinGroupNode {
JoinGroupNode {
PatternNode[(?1=<accounts:1>, <lifestate>, "ACTIVE", ?) . project ?1 .], {estimatedCardinality=1799504, expectedTotalOutput=1, indexTime=0, joinTime=1, numSearches=1, actualTotalOutput=1}
PatternNode[(?1, ?5, ?3=<organizations:1>, ?6) . project ?1,?6,?3 . IsEdgeIdFilter(?6) .], {estimatedCardinality=102, expectedTotalOutput=1, indexTime=0, joinTime=0, numSearches=1, actualTotalOutput=1}
PatternNode[(?6, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=3341886, expectedTotalOutput=1, indexTime=0, joinTime=2, numSearches=1, actualTotalOutput=1}
PatternNode[(?3, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=1, indexTime=0, joinTime=2, numSearches=1, actualTotalOutput=1}
}, finishers=[dedup(?3)]
PatternNode[(?8=<organizations:1>, <~label>, ?9, <~>) . project distinct ?8 .], {estimatedCardinality=INFINITY, expectedTotalOutput=1, indexTime=0, joinTime=0, numSearches=1, actualTotalOutput=1}
PatternNode[(?8, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=1, indexTime=0, joinTime=0, numSearches=1, actualTotalOutput=1}
PatternNode[(?10, ?12, ?8, ?13) . project ?8,?13,?10 . IsEdgeIdFilter(?13) .], {estimatedCardinality=INFINITY, expectedTotalOutput=102, indexTime=0, joinTime=1, numSearches=1, actualTotalOutput=102}
PatternNode[(?13, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=3341886, expectedTotalOutput=102, indexTime=0, joinTime=128, numSearches=102, actualTotalOutput=102}
PatternNode[(?13, <role>, "admin", ?) . project ask .], {estimatedCardinality=113376, expectedTotalOutput=100, indexTime=1, joinTime=6, numSearches=102, actualTotalOutput=100}
PatternNode[(?10, <~label>, ?14=<users>, <~>) . project ask .], {estimatedCardinality=2326404, expectedTotalOutput=100, indexTime=0, joinTime=128, numSearches=100, actualTotalOutput=100}
PatternNode[(?10, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=100, indexTime=0, joinTime=83, numSearches=100, actualTotalOutput=100}
PatternNode[(?10, <~label>, ?15=<users>, <~>) . project ?10 .], {estimatedCardinality=2326404, expectedTotalOutput=100, indexTime=1, joinTime=1, numSearches=1, actualTotalOutput=100}
PatternNode[(?16=<accounts:1>, <~label>, ?17, <~>) . project distinct ?16 .], {estimatedCardinality=INFINITY, expectedTotalOutput=100, indexTime=0, joinTime=1, numSearches=1, actualTotalOutput=100}
PatternNode[(?16, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=100, indexTime=0, joinTime=0, numSearches=1, actualTotalOutput=100}
PatternNode[(?18, ?20, ?16, ?21) . project ?16,?21,?18 . IsEdgeIdFilter(?21) .], {estimatedCardinality=INFINITY, expectedTotalOutput=1000, indexTime=0, joinTime=119, numSearches=1, actualTotalOutput=500000}
PatternNode[(?21, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=3341886, expectedTotalOutput=1000, indexTime=194, joinTime=142, numSearches=5000, actualTotalOutput=500000}
PatternNode[(?18, <~label>, ?22=<vehicles>, <~>) . project ask .], {estimatedCardinality=238260, expectedTotalOutput=1000, indexTime=183, joinTime=499, numSearches=5000, actualTotalOutput=500000}
PatternNode[(?18, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=1000, indexTime=193, joinTime=858, numSearches=5000, actualTotalOutput=500000}
PatternNode[(?18, <~label>, ?23=<vehicles>, <~>) . project ?18 .], {estimatedCardinality=238260, indexTime=360, joinTime=1372, numSearches=500}
}, annotations={path=[Vertex(?1):GraphStep, Edge(?6):VertexStep, Vertex(?3):EdgeOtherVertexStep, Vertex(?8):GraphStep, Edge(?13):VertexStep, Vertex(?10):EdgeOtherVertexStep, VertexId(?10):IdStep#[A], Vertex(?16):GraphStep, Edge(?21):VertexStep, Vertex(?18):EdgeOtherVertexStep, VertexId(?18):IdStep#[B]], joinStats=true, optimizationTime=329, maxVarId=24, executionTime=6279}
},
NeptuneTraverserConverterStep
]
+ not converted into Neptune steps: [SelectStep(last,[A, B])]
WARNING: >> SelectStep(last,[A, B]) << (or one of its children) is not supported natively yet
Physical Pipeline
=================
NeptuneGraphQueryStep#[A, B]
|-- StartOp
|-- JoinGroupOp
|-- JoinGroupOp
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?1=<accounts:1>, <lifestate>, "ACTIVE", ?) . project ?1 .], {estimatedCardinality=1799504, expectedTotalOutput=1})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?1, ?5, ?3=<organizations:1>, ?6) . project ?1,?6,?3 . IsEdgeIdFilter(?6) .], {estimatedCardinality=102, expectedTotalOutput=1})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?6, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=3341886, expectedTotalOutput=1})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?3, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=1})
|-- FilterOp
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?8=<organizations:1>, <~label>, ?9, <~>) . project distinct ?8 .], {estimatedCardinality=INFINITY, expectedTotalOutput=1})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?8, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=1})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?10, ?12, ?8, ?13) . project ?8,?13,?10 . IsEdgeIdFilter(?13) .], {estimatedCardinality=INFINITY, expectedTotalOutput=102})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?13, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=3341886, expectedTotalOutput=102})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?13, <role>, "admin", ?) . project ask .], {estimatedCardinality=113376, expectedTotalOutput=100})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?10, <~label>, ?14=<users>, <~>) . project ask .], {estimatedCardinality=2326404, expectedTotalOutput=100})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?10, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=100})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?10, <~label>, ?15=<users>, <~>) . project ?10 .], {estimatedCardinality=2326404, expectedTotalOutput=100})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?16=<accounts:1>, <~label>, ?17, <~>) . project distinct ?16 .], {estimatedCardinality=INFINITY, expectedTotalOutput=100})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?16, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=100})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?18, ?20, ?16, ?21) . project ?16,?21,?18 . IsEdgeIdFilter(?21) .], {estimatedCardinality=INFINITY, expectedTotalOutput=1000})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?21, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=3341886, expectedTotalOutput=1000})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?18, <~label>, ?22=<vehicles>, <~>) . project ask .], {estimatedCardinality=238260, expectedTotalOutput=1000})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?18, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=1000})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?18, <~label>, ?23=<vehicles>, <~>) . project ?18 .], {estimatedCardinality=238260})
Runtime (ms)
============
Query Execution: 6283.262
Serialization: 2120.104
Traversal Metrics
=================
Step Count Traversers Time (ms) % Dur
-------------------------------------------------------------------------------------------------------------
NeptuneGraphQueryStep(VertexId)#[A, obje... 500000 500000 2502.636 41.43
NeptuneTraverserConverterStep 500000 500000 2580.098 42.71
SelectStep(last,[A, B]) 500000 500000 958.328 15.86
>TOTAL - - 6041.062 -
Predicates
==========
# of predicates: 37
WARNING: reverse traversal with no edge label(s) - .in() / .both() may impact query performance
Results
=======
Count: 500000
Output: <Removed for space>
Response serializer: application/vnd.gremlin-v3.0+gryo
Response size (bytes): 64,000,045
Index Operations
================
Query execution:
# of statement index ops: 15915
# of unique statement index ops: 15915
Duplication ratio: 1.0
# of terms materialized: 0
Serialization:
# of statement index ops: 0
# of terms materialized: 0
If possible, always provide edge labels on traversal steps like in() and out(). Also, you do not need to specify inE().otherV() unless you need data from the edge; in() will suffice. As a first step I would try:
g.V('accounts:1').out(<labels>).hasId('organizations:1')
.V('accounts:1').in(<labels>).as('B')
.V('organizations:1').in(<labels>).as('A')
.select('A', 'B')
Where <labels> will be of the form in('works-with','knows').
Using edge labels, especially on the in steps can help a lot in some cases. I would start there as a first step. There are other rewrites that can be tried but this is a good first step.
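As a side note on where the 500000 traversers in the question come from: the profile shows 100 users reaching step A and 500000 results in total, which matches every user being paired with every one of the 5000 vehicles by select('A', 'B'). The Python sketch below only illustrates that arithmetic. The ID strings are made up, and it assumes the A and B branches are independent of each other, as they appear to be in the query.

```python
from itertools import product

# Made-up IDs matching the counts given in the question.
users = [f"users:{i}" for i in range(100)]          # users per organization
vehicles = [f"vehicles:{i}" for i in range(5000)]   # vehicles per account

# select('A', 'B') over two independent branches behaves like a cross product.
pairs = list(product(users, vehicles))
print(len(pairs))                   # number of (A, B) rows

# Returning the two sets separately would need far fewer results.
print(len(users) + len(vehicles))   # number of distinct vertices involved
```

If the pairing itself is not needed, the number of results that have to be serialized is much smaller than the number of pairs, which may be worth considering alongside the edge-label change.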