I am new to DynamoDB. I have two tables:
country
city
I wish to join both tables via the country_id primary key and foreign key. So can I do this in DynamoDB?
Amazon DynamoDB is a NoSQL database. This means that traditional relational database concepts such as JOINs and Foreign Keys are not available.
Your application would be responsible for "joining" tables. That is, you would need to read values from both tables and determine a relationship between them within your application. DynamoDB is not able to do this for you.
Alternatively, you could use a system such as Amazon EMR, which provides Hadoop. You could use Hadoop and Hive to access a DynamoDB table by using HiveQL (which is similar to SQL). However, this is a very slow operation and might be too complex for your particular requirement.
As John said, Amazon DynamoDB is a NoSQL database, so you can't do this in a single query. You can, of course, fetch the data with multiple queries. Below is the DynamoDB table schema (which is also the table creation code).
import { CreateTableInput } from 'aws-sdk/clients/dynamodb';
import AWS from 'aws-sdk'

const paramCountryTable: CreateTableInput = {
  BillingMode: 'PAY_PER_REQUEST', // on-demand billing, chosen just for this example
  TableName: 'Country',
  AttributeDefinitions: [
    { AttributeName: 'country_id', AttributeType: 'S' },
  ],
  KeySchema: [
    { AttributeName: 'country_id', KeyType: 'HASH' },
  ],
}

const paramCityTable: CreateTableInput = {
  BillingMode: 'PAY_PER_REQUEST', // on-demand billing, chosen just for this example
  TableName: 'City',
  AttributeDefinitions: [
    { AttributeName: 'city_id', AttributeType: 'S' },
    { AttributeName: 'country_id', AttributeType: 'S' },
  ],
  KeySchema: [
    { AttributeName: 'city_id', KeyType: 'HASH' },
  ],
  // The index lets us look up cities by country_id
  GlobalSecondaryIndexes: [{
    Projection: { ProjectionType: 'ALL' },
    IndexName: 'IndexCountryId',
    KeySchema: [
      { AttributeName: 'country_id', KeyType: 'HASH' },
    ],
  }],
}

const run = async () => {
  const dd = new AWS.DynamoDB({ apiVersion: '2012-08-10' })
  await dd.createTable(paramCountryTable).promise()
  await dd.createTable(paramCityTable).promise()
}

if (require.main === module) run()
Then you can get what you want with several requests, as below.
const ddc = new AWS.DynamoDB.DocumentClient({ apiVersion: '2012-08-10' })

const getCountryByCityId = async (city_id: string) => {
  const resultCity = await ddc.get({ TableName: 'City', Key: { city_id } }).promise()
  if (!resultCity.Item) return null
  const { country_id } = resultCity.Item
  const resultCountry = await ddc.get({ TableName: 'Country', Key: { country_id } }).promise()
  return resultCountry.Item
}

const getCityArrByCountryId = async (country_id: string) => {
  const result = await ddc.query({
    TableName: 'City',
    IndexName: 'IndexCountryId',
    ExpressionAttributeValues: { ":country_id": country_id },
    KeyConditionExpression: "country_id = :country_id",
  }).promise()
  return result.Items
}
Note that this is not a recommended use case for NoSQL.
Can DynamoDB query for items whose list contains an object matching some attribute?
My data format:
[{
  pk,
  sk,
  gsi1pk: 'USER',
  gsi1sk,
  list: [
    {
      id,
      type, // admin, moderator, user
      name,
    }
  ]
}]
Can we do something like this to find items whose list contains an element with type 'admin'?
await Ddb.query({
  IndexName: 'GSI1',
  KeyConditionExpression: 'gsi1pk = :gsi1pk',
  FilterExpression: 'contains(list, :list)',
  ExpressionAttributeValues: {
    ':gsi1pk': 'USER',
    ':list': { type: 'admin' }
  },
});
This does not work. :(
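As far as I understand, `contains` on a list attribute only matches when an element equals the operand in full, so `{ type: 'admin' }` would not match an element that also carries `id` and `name`. One possible workaround, offered as a sketch under that assumption and not as a definitive answer, is to run the key-condition query without the filter and then filter the returned items in application code. The `Member` and `UserRecord` types and the `filterByMemberType` helper are hypothetical names used only for illustration.

```typescript
// Hypothetical shapes mirroring the data format in the question.
interface Member { id: string; type: string; name: string }
interface UserRecord { pk: string; sk: string; list: Member[] }

// Keep only the records whose `list` has at least one member of the given type.
function filterByMemberType(items: UserRecord[], type: string): UserRecord[] {
  return items.filter((item) => item.list.some((m) => m.type === type));
}

// In-memory data standing in for the `Items` array of a query response.
const sample: UserRecord[] = [
  { pk: 'a', sk: '1', list: [{ id: 'u1', type: 'admin', name: 'Ann' }] },
  { pk: 'b', sk: '1', list: [{ id: 'u2', type: 'user', name: 'Bob' }] },
];

const admins = filterByMemberType(sample, 'admin');
console.log(admins.map((r) => r.pk));
```

Filtering this way still reads every item under the partition key, so it trades read cost for flexibility.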
I am new to DynamoDB, and I came across sparse indexes. I think they fit what I need, but I am not completely sure how to implement them. I have a table with a Post entity that has the following fields:
post_id <string> | user_id <string> | tags <string[]> | public <boolean>| other post data attributes...
The queries that I need to do are:
Get all posts that are marked public
Get posts filtered by tags
Get all posts filtered by user, both public and not public
Get a single post
For getting all public posts, I think a sparse index could work: I could set the public attribute only on the entities that are marked public. But what does the query look like then?
I am not even sure I have set up the DB correctly. I am using the Serverless Framework, and this is what I have come up with, but I'm not sure it is good.
PostsDynamoDBTable:
  Type: 'AWS::DynamoDB::Table'
  Properties:
    AttributeDefinitions:
      - AttributeName: postId
        AttributeType: S
      - AttributeName: userId
        AttributeType: S
      - AttributeName: createdAt
        AttributeType: S
    KeySchema:
      - AttributeName: postId
        KeyType: HASH
      - AttributeName: userId
        KeyType: RANGE
    BillingMode: PAY_PER_REQUEST
    TableName: ${self:provider.environment.POSTS_TABLE}
    GlobalSecondaryIndexes:
      - IndexName: ${self:provider.environment.USER_ID_INDEX}
        KeySchema:
          - AttributeName: userId
            KeyType: HASH
          - AttributeName: public
            KeyType: RANGE
        Projection:
          ProjectionType: ALL
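A sparse index could work here. If you only want a subset of attributes, use ProjectionType: INCLUDE to project specific non-key attributes into the index. It's important to note that the only attributes you can read in a query against the index are the key attributes plus the ones you explicitly include.
Firstly, you'd need to declare every attribute used as an index key in your AttributeDefinitions. Only key attributes belong there: CloudFormation rejects a template whose AttributeDefinitions list attributes that no key schema uses, so projected non-key attributes are not declared.
Say, for example, that one of the attributes you want projected is userName, and the index is keyed on userId and public.
You'd want to add the missing key attribute (key attributes must be of type S, N, or B, so public is declared as a string here):
- AttributeName: public
  AttributeType: S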
Then, in the GlobalSecondaryIndexes block:
GlobalSecondaryIndexes:
  - IndexName: byUserIdAndPublic # You can name this whatever you'd like
    KeySchema:
      - AttributeName: userId
        KeyType: HASH
      - AttributeName: public
        KeyType: RANGE
    Projection:
      NonKeyAttributes:
        - userName
      ProjectionType: INCLUDE
Then you simply query that index: the response includes userName, and the only change to the query is specifying the index name.
If you need all attributes for each post, use ProjectionType: ALL and remove the NonKeyAttributes block.
Here's an example Node.js method that filters a user's posts by public. The value is passed in as isPublic; since index key attributes must be strings, numbers, or binary, it should be stored as a string such as 'true' and not as a boolean:
// `public` is a reserved word in strict-mode JavaScript, so the parameter is named isPublic
const listByUserIdAndPublic = async (userId, isPublic) => {
  const params = {
    TableName: tableName,
    IndexName: 'byUserIdAndPublic',
    KeyConditionExpression: '#public = :public AND #userId = :userId',
    ExpressionAttributeNames: {
      '#userId': 'userId',
      '#public': 'public'
    },
    ExpressionAttributeValues: {
      ':public': isPublic,
      ':userId': userId
    },
  };
  const response = await documentClient.query(params).promise();
  return response.Items;
};
Based on your question:
Get all posts that are marked public
I think you'd want to use a full index (ProjectionType: ALL), because I'd imagine you want all attributes of a post to be indexed.
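For the sparse-index case the question asks about, a query could look roughly like the sketch below. It assumes a hypothetical GSI named `publicPosts` whose partition key is a string attribute `publicFlag`, written with the value `'PUBLIC'` only on public posts; neither name comes from the original template. The function only builds the parameter object, which would then be passed to `documentClient.query`.

```typescript
// Minimal local type for the parameters used here (not the SDK's own type).
interface QueryParams {
  TableName: string;
  IndexName: string;
  KeyConditionExpression: string;
  ExpressionAttributeNames: Record<string, string>;
  ExpressionAttributeValues: Record<string, string>;
}

// Build a query against the assumed sparse index.
// 'publicPosts', 'publicFlag' and 'PUBLIC' are illustrative assumptions.
function buildPublicPostsQuery(tableName: string): QueryParams {
  return {
    TableName: tableName,
    IndexName: 'publicPosts',
    KeyConditionExpression: '#flag = :flag',
    ExpressionAttributeNames: { '#flag': 'publicFlag' },
    ExpressionAttributeValues: { ':flag': 'PUBLIC' },
  };
}

const publicPostsParams = buildPublicPostsQuery('Posts');
console.log(publicPostsParams.KeyConditionExpression);
```

Items that lack the index key attribute are not written to the index, so private posts never appear in it and no filter expression is needed.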
I'm using localstack and awslocal to run the local AWS services inside a docker container.
Localstack is configured with a docker-compose.yml file.
Tests are written using Jest and the AWS SDK for Node.js (DynamoDB client).
'use strict';

const { dbClient, db } = require('../../adapters/dynamodb');

const dbSchema = {
  TableName: 'DUMMY',
  KeySchema: [
    {
      AttributeName: 'dummyId',
      KeyType: 'HASH',
    },
  ],
  AttributeDefinitions: [
    {
      AttributeName: 'dummyId',
      AttributeType: 'S',
    },
  ],
  ProvisionedThroughput: {
    ReadCapacityUnits: 100,
    WriteCapacityUnits: 100,
  },
};

it('create DUMMY table', async () => {
  const { TableNames } = await db.listTables().promise();
  if (!TableNames.includes('DUMMY')) {
    const { TableDescription } = await db.createTable(dbSchema).promise();
    expect(TableDescription.TableName).toEqual('DUMMY');
  }
});

it('write/read to DUMMY table', async () => {
  await dbClient
    .put({
      TableName: 'DUMMY',
      Item: {
        dummyId: '123456',
        selectedIds: ['1', '2', '3'],
      },
    })
    .promise();
  const { Item } = await dbClient
    .get({
      TableName: 'DUMMY',
      Key: {
        dummyId: '123456',
      },
    })
    .promise();
  expect(Item.dummyId).toEqual('123456');
  expect(Item.selectedIds).toEqual(['1', '2', '3']);
});

it('delete DUMMY table', async () => {
  const { TableNames } = await db.listTables().promise();
  if (TableNames.includes('DUMMY')) {
    const result = await db.deleteTable({ TableName: 'DUMMY' }).promise();
    expect(result.TableDescription.TableName).toEqual('DUMMY');
  }
}, 25000);
Jest fails on the last test, and the DynamoDB SDK throws a ResourceNotFoundException.
I tried using the awslocal cli to replicate this issue outside of jest and the same thing happens (used official aws docs for cli command examples):
awslocal dynamodb create-table --table-name MusicCollection --attribute-definitions AttributeName=Artist,AttributeType=S AttributeName=SongTitle,AttributeType=S --key-schema AttributeName=Artist,KeyType=HASH AttributeName=SongTitle,KeyType=RANGE --provisioned-throughput ReadCapacityUnits=1,WriteCapacityUnits=1 -> works!
awslocal dynamodb list-tables -> works! Logs out the list of existing tables:
{"TableNames": ["MusicCollection"]}
awslocal dynamodb delete-table --table-name MusicCollection
...and I get the same exception:
An error occurred (ResourceNotFoundException) when calling the DeleteTable operation: Cannot do operations on a non-existent table
And now the best part: even though the exception is thrown, the table does get deleted.
Honestly, it's hard to come up with an explanation here, but I feel that there is something simple that I'm missing! Any help is appreciated <3
UPDATE:
It's not 100% confirmed yet, but the DynamoDB docs describe only deleteItem as idempotent, not deleteTable, which could be the issue here.
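If the goal is simply to make test teardown robust against this behaviour, one option is to treat a `ResourceNotFoundException` from the delete call as "already gone". This is a workaround sketch, not an explanation of what localstack is doing. `deleteTableIfExists` and the injected `deleteFn` are hypothetical names, and the error is assumed to carry its name in a `code` property, as AWS SDK v2 errors do.

```typescript
// The delete operation is injected so the helper can be exercised without AWS access.
type DeleteFn = (tableName: string) => Promise<unknown>;

// Resolves to true if the delete call succeeded, false if the table was
// reported as missing; any other error is rethrown.
async function deleteTableIfExists(deleteFn: DeleteFn, tableName: string): Promise<boolean> {
  try {
    await deleteFn(tableName);
    return true;
  } catch (err) {
    const code = (err as { code?: string }).code;
    if (code === 'ResourceNotFoundException') return false;
    throw err;
  }
}

// With the SDK client from the tests above, usage might look like:
// await deleteTableIfExists((t) => db.deleteTable({ TableName: t }).promise(), 'DUMMY');
```

Swallowing the error hides the underlying inconsistency, so it is probably best kept to test cleanup code.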
This is the context:
I have a GCP Function that must query Datastore for some data and return an array to the client.
The Problem:
The function returns no data when I use datetime filters in my code. However, when I run the equivalent query in the GCP Datastore query console, I get plenty of rows back.
Technical data:
Datastore GQL:
select * from KIND where recordDate >= DATETIME ("2018-10-10T10:10:00.000000+03:00") and recordDate <= DATETIME ("2018-10-11T10:10:00.999999+03:00")
(It works on GCP Datastore console)
GCP Functions Code:
query = datastore
  .createQuery(kind)
  .filter('recordDate', '>=', dateFrom)
  .filter('recordDate', '<=', dateTo);
console.log(query);
datastore.runQuery(query, (err, entities) => {
  console.log(err);
  console.log(entities);
});
(runQuery() always passes null as the err argument and an empty array as the entities argument.)
The help I need:
Can anybody show me a working example of a query that returns entities using datetime filters?
Formats I tried for the dateFrom and dateTo variables:
DATETIME ("2018-10-10T10:10:00.000000+03:00")
DATETIME ("2018-10-10 10:10:00")
"2018-10-10T10:10:00.000000+03:00"
'2018-10-10T10:10:00.000000+03:00'
DATETIME ("2018-10-10")
"2018-10-10"
DATE ("2018-10-10")
DATE ('2018-10-10')
DATETIME (2018-10-10T10:10:00.000000+03:00)
And none of them work :(
UPDATE (2018-11-19):
I printed the query before calling runQuery and got this:
(I replaced sensitive data with dots.)
{
"textPayload": "Query {\n scope: \n Datastore {\n clients_: Map {},\n datastore: [Circular],\n namespace: undefined,\n projectId: '................',\n defaultBaseUrl_: 'datastore.googleapis.com',\n baseUrl_: 'datastore.googleapis.com',\n options: \n { libName: 'gccl',\n libVersion: '2.0.0',\n scopes: [Array],\n servicePath: 'datastore.googleapis.com',\n port: 443,\n projectId: 'c..........' },\n auth: \n GoogleAuth {\n checkIsGCE: undefined,\n jsonContent: null,\n cachedCredential: null,\n _cachedProjectId: 'c..........',\n keyFilename: undefined,\n scopes: [Array] } },\n namespace: null,\n kinds: [ '....KIND......' ],\n filters: \n [ { name: 'recordDate', op: '>', val: 2018-10-10T00:00:00.000Z },\n { name: 'recordDate', op: '<', val: 2018-10-12T23:59:59.000Z } ],\n orders: [],\n groupByVal: [],\n selectVal: [],\n startVal: null,\n endVal: null,\n limitVal: 20,\n offsetVal: -1 }",
"insertId": "............................098...",
"resource": {
"type": "cloud_function",
"labels": {
"region": "us-central1",
"function_name": "...................-get-search",
"project_id": "............."
}
},
"timestamp": "2018-11-19T21:19:46.737Z",
"severity": "INFO",
"labels": {
"execution_id": "792s.....lp"
},
"logName": "projects/......./logs/cloudfunctions.googleapis.com%2Fcloud-functions",
"trace": "projects/........../traces/4a457.......",
"receiveTimestamp": "2018-11-19T21:19:52.852569373Z"
}
And the Functions Code is:
query = datastore
  .createQuery(kind)
  .filter('recordDate', '>', new Date(dateFrom))
  .filter('recordDate', '<', new Date(dateTo))
  .limit(20);
console.log(query);
var test = datastore.runQuery(query, (err, entities) => {
  console.log(err);
  console.log(entities);
  entities.forEach(entity => {
    console.log(entity);
  });
  return {
    entities: entities,
    err: err
  };
});
console.log(test);
When using the client libraries to filter or sort query results on datetime properties, use the language's native datetime representation, not strings or GQL constructs.
For Node.js in particular, which you appear to be using, that means Date objects. Here's an example from Restrictions on queries:
const query = datastore
  .createQuery('Task')
  .filter('created', '>', new Date('1990-01-01T00:00:00z'))
  .filter('created', '<', new Date('2000-12-31T23:59:59z'));
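To connect this to the question: ISO 8601 strings like the ones tried there can be converted with `new Date(...)` before being passed to `.filter()`. The sketch below uses a hypothetical helper named `toDateRange` to do the conversion and reject unparseable input. It relies only on the built-in `Date` and makes no Datastore calls, and the input strings are shortened to second precision for the example.

```typescript
// Convert two ISO 8601 strings into Date objects suitable for query filters.
// Throws if either string cannot be parsed.
function toDateRange(from: string, to: string): { dateFrom: Date; dateTo: Date } {
  const dateFrom = new Date(from);
  const dateTo = new Date(to);
  if (isNaN(dateFrom.getTime()) || isNaN(dateTo.getTime())) {
    throw new Error('Unparseable date input: ' + from + ' / ' + to);
  }
  return { dateFrom: dateFrom, dateTo: dateTo };
}

const dateRange = toDateRange('2018-10-10T10:10:00+03:00', '2018-10-11T10:10:00+03:00');
console.log(dateRange.dateFrom.toISOString()); // 2018-10-10T07:10:00.000Z
```

The resulting `dateFrom` and `dateTo` values would then go into the two `.filter('recordDate', ...)` calls in place of the raw strings.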
I am using the Serverless framework to setup the below table:
provider:
  name: aws
  runtime: nodejs6.10
  stage: dev
  region: ap-southeast-1
  environment:
    DYNAMODB_TABLE: "Itinerary-${self:provider.stage}"
    DYNAMODB_COUNTRY_INDEX: country_index
  iamRoleStatements:
    - Effect: Allow
      Action:
        - dynamodb:Query
        - dynamodb:Scan
        - dynamodb:GetItem
        - dynamodb:PutItem
        - dynamodb:UpdateItem
        - dynamodb:DeleteItem
      Resource: "arn:aws:dynamodb:${opt:region, self:provider.region}:*:table/${self:provider.environment.DYNAMODB_TABLE}"
    - Effect: Allow
      Action:
        - dynamodb:Query
        - dynamodb:Scan
        - dynamodb:GetItem
        - dynamodb:PutItem
        - dynamodb:UpdateItem
        - dynamodb:DeleteItem
      Resource: "arn:aws:dynamodb:${opt:region, self:provider.region}:*:table/${self:provider.environment.DYNAMODB_TABLE}/index/${self:provider.environment.DYNAMODB_COUNTRY_INDEX}"
resources:
  Resources:
    ItineraryTable:
      Type: 'AWS::DynamoDB::Table'
      DeletionPolicy: Retain
      Properties:
        AttributeDefinitions:
          -
            AttributeName: ID
            AttributeType: S
          -
            AttributeName: Country
            AttributeType: S
        KeySchema:
          -
            AttributeName: ID
            KeyType: HASH
        GlobalSecondaryIndexes:
          -
            IndexName: ${self:provider.environment.DYNAMODB_COUNTRY_INDEX}
            KeySchema:
              -
                AttributeName: Country
                KeyType: HASH
            Projection:
              ProjectionType: ALL
            ProvisionedThroughput:
              ReadCapacityUnits: 1
              WriteCapacityUnits: 1
        ProvisionedThroughput:
          ReadCapacityUnits: 1
          WriteCapacityUnits: 1
        TableName: ${self:provider.environment.DYNAMODB_TABLE}
Now, I am able to do a "get" to retrieve items from the DB by ID. However, if I try to do a "query" on the GSI, it returns nothing. There is no failure, but I never get data back.
This is what my query looks like:
const dynamoDb = new AWS.DynamoDB.DocumentClient();
var params = {
  TableName: process.env.DYNAMODB_TABLE, // maps back to the serverless config variable above
  IndexName: process.env.DYNAMODB_COUNTRY_INDEX, // maps back to the serverless config variable above
  KeyConditionExpression: "Country=:con",
  ExpressionAttributeValues: { ":con": "Portugal" }
};
dynamoDb.query(params, (error, result) => {
  // handle potential errors
  if (error) {
    console.error(error);
    callback(null, {
      statusCode: error.statusCode || 501,
      headers: { 'Content-Type': 'text/plain' },
      body: 'Couldn\'t fetch the Itineraries' + error,
    });
    return;
  }
  var json_result = JSON.stringify(result.Item);
});
I should add here that I can't get data if I do a filterless "scan" as well. If I try to search for items by index (or otherwise) on the AWS Dynamo web portal, I get results though.
I cannot figure out what it is that's going wrong here. Any light that someone can shed would be much appreciated.
OK, so I figured out the problem. There was a key statement missing from the question I posted (because I thought it wasn't relevant), and it turned out to be the problem.
I was just stringifying the results from the query using:
var json_result = JSON.stringify(result.Item);
The above works for a "get", but for a query, it needs to be result.Items:
var json_result = JSON.stringify(result.Items);
Silly on my part. Thank you for the help!
P.S. I have added the statement to the original question to be clear.
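For reference, the difference comes down to the response shape: a "get" returns a single `Item`, while "query" and "scan" return an `Items` array. The sketch below models that with hypothetical local types (`GetOutput`, `QueryOutput`) instead of the SDK's own, so it runs without AWS access; the helper names are also made up for illustration.

```typescript
// Simplified stand-ins for the DocumentClient response shapes.
interface GetOutput { Item?: Record<string, unknown> }
interface QueryOutput { Items?: Record<string, unknown>[] }

// Serialize the single item returned by a get (null when nothing was found).
function itemToJson(result: GetOutput): string {
  return JSON.stringify(result.Item || null);
}

// Serialize the array returned by a query or scan (empty array when absent).
function itemsToJson(result: QueryOutput): string {
  return JSON.stringify(result.Items || []);
}

const queryResult: QueryOutput = { Items: [{ ID: '1', Country: 'Portugal' }] };
console.log(itemsToJson(queryResult));
```

Reading `Item` from a query result yields `undefined`, which `JSON.stringify` turns into no output at all, matching the "no failure, no data" symptom described above.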