DynamoDb streams, just get new updates since - amazon-dynamodb

I'm trying to work with DynamoDB Streams using the example code shown in this article. I've modified it to run in a basic Spring Boot app (from Initializr), against an existing DynamoDB table that has streams enabled. Everything appears to work; however, I'm not seeing any new updates.
This particular table gets a bulk update once per day at a specific time, and it may receive some minor changes now and then during the day. It's these minor updates I'm trying to monitor. When I run the application I can see the records from the bulk update, but if, while my application is running, I use the AWS Console to modify, create, or delete a record, I don't see any output.
I'm using:
Spring Boot: 2.3.9.RELEASE
amazon-kinesis-client:1.14.2
Java 11
Running on Mac Catalina (though that shouldn't matter)
In my test application I did the following:
package com.test.dynamodb_streams_test_kcl.service;

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBStreams;
import com.amazonaws.services.dynamodbv2.model.*;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;

import javax.annotation.PostConstruct;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.util.List;

@Slf4j
@Service
@RequiredArgsConstructor
public class LowLevelKclProcessor {

    private static final String dynamoDbTableName = "global-items";

    private final AmazonDynamoDB dynamoDB;
    private final AmazonDynamoDBStreams dynamoDBStreams;
    private final ZonedDateTime startTime = ZonedDateTime.now();

    @PostConstruct
    public void initialize() {
        log.info("Describing table={}", dynamoDbTableName);
        DescribeTableResult itemTableDescription = dynamoDB.describeTable(dynamoDbTableName);
        log.info("Got description");
        String itemTableStreamArn = itemTableDescription.getTable().getLatestStreamArn();
        log.info("Got stream arn ({}) for table={} tableArn={}", itemTableStreamArn,
                itemTableDescription.getTable().getTableName(), itemTableDescription.getTable().getTableArn());

        // Get all the shard IDs from the stream. Note that DescribeStream returns
        // the shard IDs one page at a time.
        String lastEvaluatedShardId = null;
        do {
            DescribeStreamResult describeStreamResult = dynamoDBStreams.describeStream(
                    new DescribeStreamRequest()
                            .withStreamArn(itemTableStreamArn)
                            .withExclusiveStartShardId(lastEvaluatedShardId));
            List<Shard> shards = describeStreamResult.getStreamDescription().getShards();

            // Process each shard on this page
            for (Shard shard : shards) {
                String shardId = shard.getShardId();
                System.out.println("Shard: " + shard);

                // Get an iterator for the current shard
                GetShardIteratorRequest getShardIteratorRequest = new GetShardIteratorRequest()
                        .withStreamArn(itemTableStreamArn)
                        .withShardId(shardId)
                        .withShardIteratorType(ShardIteratorType.LATEST);
                GetShardIteratorResult getShardIteratorResult =
                        dynamoDBStreams.getShardIterator(getShardIteratorRequest);
                String currentShardIter = getShardIteratorResult.getShardIterator();

                // Shard iterator is not null until the Shard is sealed (marked as READ_ONLY).
                // To prevent running the loop until the Shard is sealed, which will be on average
                // 4 hours, we process only the items that were written into DynamoDB and then exit.
                int processedRecordCount = 0;
                while (currentShardIter != null && processedRecordCount < 100) {
                    System.out.println("    Shard iterator: " + currentShardIter.substring(380));

                    // Use the shard iterator to read the stream records
                    GetRecordsResult getRecordsResult = dynamoDBStreams.getRecords(new GetRecordsRequest()
                            .withShardIterator(currentShardIter));
                    List<Record> records = getRecordsResult.getRecords();
                    for (Record record : records) {
                        // I set a breakpoint on the line below, but it was never hit after the bulk update info
                        if (startTime.isBefore(ZonedDateTime.ofInstant(record.getDynamodb()
                                .getApproximateCreationDateTime().toInstant(), ZoneId.systemDefault()))) {
                            System.out.println("        " + record.getDynamodb());
                        }
                    }
                    processedRecordCount += records.size();
                    currentShardIter = getRecordsResult.getNextShardIterator();
                }
            }

            // If LastEvaluatedShardId is set, then there is
            // at least one more page of shard IDs to retrieve
            lastEvaluatedShardId = describeStreamResult.getStreamDescription().getLastEvaluatedShardId();
        } while (lastEvaluatedShardId != null);
    }
}

Note that your test is based on the low-level DynamoDB Streams API, not on the Kinesis Client Library (KCL), so it's normal to have some tricky technical details to deal with.
Your test application has some similarities with the example given in the doc, but it has issues:
When I run the application I can see the records from the bulk update
ShardIteratorType.LATEST will not pick up records written before the test started (it starts reading just after the most recent stream record in the shard).
So I will assume that the iterator type was different (e.g. TRIM_HORIZON) and was changed to LATEST later during your tests.
The main issue is that your application polls the shards sequentially, and it will block on the first shard until it finds 100 new records there (because of the LATEST iterator type).
So, you may not see the new minor changes while the test is running if they belong to a different shard.
Solutions:
1- Poll the shards in parallel using threads (a rough sketch follows after this list).
2- Filter returned shards using the sequence number of the last logged record, and try to guess the shard that may contain minor changes.
3- Dangerous, and I'm not sure it works :)
In a test table, and if your data model allows this: close the current stream, and enable a new one, then make sure that all your writes belong to one partition. In the majority of cases, table partitions have a one-to-one relationship with active shards. Theoretically, you have only one active shard to deal with.
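For illustration, here is a rough sketch of option 1, reusing the dynamoDBStreams client, itemTableStreamArn, and shards list from your code; the pool size and the one-second sleep are arbitrary assumptions, not a definitive implementation:

// Sketch of option 1: give each shard its own polling thread so a quiet shard
// cannot block the others (requires java.util.concurrent.ExecutorService / Executors).
ExecutorService pool = Executors.newFixedThreadPool(4);
for (Shard shard : shards) {
    pool.submit(() -> {
        GetShardIteratorResult iterResult = dynamoDBStreams.getShardIterator(
                new GetShardIteratorRequest()
                        .withStreamArn(itemTableStreamArn)
                        .withShardId(shard.getShardId())
                        .withShardIteratorType(ShardIteratorType.LATEST));
        String iter = iterResult.getShardIterator();
        while (iter != null && !Thread.currentThread().isInterrupted()) {
            GetRecordsResult result = dynamoDBStreams.getRecords(
                    new GetRecordsRequest().withShardIterator(iter));
            result.getRecords().forEach(r -> System.out.println(r.getDynamodb()));
            iter = result.getNextShardIterator();
            try {
                Thread.sleep(1000); // avoid hammering GetRecords on an idle shard
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    });
}

With this layout, a record written to any active shard shows up as soon as that shard's thread polls again, instead of waiting for the sequential loop to reach it.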

Related

How can I optimize this function to get all values in a Redis JSON database?

My function
public IQueryable<T> getAllPositions<T>(RedisDbs redisDbKey)
{
    List<T> positions = new List<T>();
    List<string> keys = new List<string>();
    foreach (var key in _redisServer.Keys((int)redisDbKey))
    {
        keys.Add(key.ToString());
    }
    var sportEventRet = _redis.GetDatabase((int)redisDbKey).JsonMultiGetAsync(keys.ToArray());
    foreach (var sportEvent in sportEventRet.Result)
    {
        var redisValue = (RedisValue)sportEvent;
        if (!redisValue.IsNull)
        {
            var positionEntity = JsonConvert.DeserializeObject<T>(redisValue, jsonSerializerSettings);
            positions.Add(positionEntity);
        }
    }
    return positions.AsQueryable();
}
Called as
IQueryable<IPosition> union = redisClient.getAllPositions<Position>(RedisDbs.POSITIONDB);
Where Position is a simple model with just a few simple properties, and RedisDbs is just an enum mapping to an int for a specific database. With both this application and the RedisJSON instance running locally on a high-performance server, it takes two seconds for this function to return from a database with 20k JSON values in it. This is unacceptable for my specific use case; I need it done in at most 1 second, preferably under 600 ms. Are there any optimizations I could make to this?
I'm convinced the problem is with the KEYS command.
Here is what the redis.io documentation says about the KEYS command:
Warning: consider KEYS as a command that should only be used in production environments with extreme care. It may ruin performance when it is executed against large databases. This command is intended for debugging and special operations, such as changing your keyspace layout. Don't use KEYS in your regular application code.
You can maintain the list of your JSON keys yourself and use it in your function instead of calling the KEYS command; a minimal sketch of the idea follows.
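The question uses StackExchange.Redis from C#, but the idea is client-agnostic. Purely for illustration, here is a minimal sketch in Java with Jedis, with an assumed index-set name positions:keys: every write also records its key in a Redis SET, and reads use SMEMBERS plus MGET instead of KEYS.

import redis.clients.jedis.Jedis;

import java.util.List;
import java.util.Set;

public class KeyIndexSketch {

    private static final String KEY_INDEX = "positions:keys"; // assumed name for the index set

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // On every write, also add the key to the index set.
            jedis.set("position:1", "{\"id\":1}");
            jedis.sadd(KEY_INDEX, "position:1");

            // To load everything, read the index instead of scanning the keyspace with KEYS.
            Set<String> keys = jedis.smembers(KEY_INDEX);
            List<String> values = jedis.mget(keys.toArray(new String[0]));
            System.out.println(values.size() + " values loaded without KEYS");
        }
    }
}

Remember to remove the key from the index set whenever you delete the value, so the index stays in sync with the actual keyspace.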

Record/Revert DynamoDB state for integration testing

I want to write integration tests for my API Gateway, which uses DynamoDB as its backend. I was wondering if there is a method/framework/library which provides the flexibility to record DynamoDB state before the tests and revert it back to the original state after the tests?
Ideally, I want something which can keep track of the changes made in DynamoDB since the beginning of the tests and revert all those changes once the tests are completed.
I use DynamoDB Local in my test environment, instead of running tests against DynamoDB directly. This saves costs and time. I use a test framework (RSpec) where I can delete anything stored in the database after a test is run.
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBLocal.html
If you need to run tests against a real DynamoDB table, look into DynamoDB streams + AWS Lambda. You can write a Lambda function that is triggered on item changes from your table. That function can, for example, store a record of the change in another table. Once your test is done, it can kick off a second Lambda function which goes through the change table and reverts each change in your original table.
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.Lambda.html
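As a rough illustration of that idea (not a definitive implementation), a stream-triggered Lambda in Java could log each change into a separate table; the test-change-log table name is an assumption, it records only the event type and keys rather than full before/after images, and it assumes the aws-lambda-java-events dependency is available:

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.DynamodbEvent;

import java.util.HashMap;
import java.util.Map;

// Hypothetical change recorder: stores the event type and the affected keys of every
// stream record in a separate "test-change-log" table, which a clean-up step can read
// afterwards to revert the original table.
public class ChangeRecorder implements RequestHandler<DynamodbEvent, Void> {

    private final AmazonDynamoDB dynamoDb = AmazonDynamoDBClientBuilder.defaultClient();

    @Override
    public Void handleRequest(DynamodbEvent event, Context context) {
        event.getRecords().forEach(record -> {
            Map<String, AttributeValue> logItem = new HashMap<>();
            logItem.put("eventId", new AttributeValue(record.getEventID()));
            logItem.put("eventName", new AttributeValue(record.getEventName())); // INSERT / MODIFY / REMOVE
            logItem.put("keys", new AttributeValue(record.getDynamodb().getKeys().toString()));
            dynamoDb.putItem("test-change-log", logItem); // table name is an assumption
        });
        return null;
    }
}

Whether you store full old images or just the keys depends on how faithful the revert step needs to be.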
As readyornot recommended, I am also using DynamoDBLocal for integration tests. I implemented it with the following approach:
Add the DynamoDBLocal dependency to your dependency management (in my case Gradle: testCompile 'com.amazonaws:DynamoDBLocal:1.11.86').
The DynamoDBLocal server needs some native files; add them to the test resources. You will find them in the extracted files of the lib (sqlite4java-win32-x86.dll, libsqlite4java-linux-i386.so, etc.).
In the @Before setup of the JUnit class, set the Java library path to the location of the native libs you placed in step 2:
SQLite.setLibraryPath("src/test/resources/location/of/native");
Also in the setup method, start the DynamoDB Local server in inMemory mode, so you don't need to delete any records after the test finishes:
final String[] localArgs = { "-inMemory" };
DynamoDBProxyServer server = ServerRunner.createServerFromCommandLineArgs(localArgs);
server.start();

ddb = AmazonDynamoDBClientBuilder
        .standard()
        .withEndpointConfiguration(new EndpointConfiguration("http://localhost:8000", "local"))
        .withCredentials(new AWSStaticCredentialsProvider(new BasicAWSCredentials("test", "password")))
        .build();

table = createTable(ddb, TABLE_NAME, HASH_KEY_NAME, SORT_KEY_NAME);
Create the table like this:
private CreateTableResult createTable(AmazonDynamoDB ddb, String tableName, String hashKeyName,
        String sortKeyName)
{
    List<AttributeDefinition> attributeDefinitions = new ArrayList<AttributeDefinition>();
    attributeDefinitions.add(new AttributeDefinition(hashKeyName, ScalarAttributeType.S));
    attributeDefinitions.add(new AttributeDefinition(sortKeyName, ScalarAttributeType.S));

    List<KeySchemaElement> ks = new ArrayList<KeySchemaElement>();
    ks.add(new KeySchemaElement(hashKeyName, KeyType.HASH));
    ks.add(new KeySchemaElement(sortKeyName, KeyType.RANGE));

    ProvisionedThroughput provisionedthroughput = new ProvisionedThroughput(1000L, 1000L);

    CreateTableRequest request =
            new CreateTableRequest()
                    .withTableName(tableName)
                    .withAttributeDefinitions(attributeDefinitions)
                    .withKeySchema(ks)
                    .withProvisionedThroughput(provisionedthroughput);

    return ddb.createTable(request);
}
And here is an example of a test method that checks the table metadata:
@Test
public void createTableTest()
{
    TableDescription tableDesc = table.getTableDescription();
    assertEquals(TABLE_NAME, tableDesc.getTableName());
    assertEquals("[{AttributeName: " + HASH_KEY_NAME + ",KeyType: HASH}, {AttributeName: "
            + SORT_KEY_NAME + ",KeyType: RANGE}]",
            tableDesc.getKeySchema().toString());
    assertEquals("[{AttributeName: " + HASH_KEY_NAME + ",AttributeType: S}, {AttributeName: "
            + SORT_KEY_NAME + ",AttributeType: S}]",
            tableDesc.getAttributeDefinitions().toString());
    assertEquals(Long.valueOf(1000L), tableDesc.getProvisionedThroughput().getReadCapacityUnits());
    assertEquals(Long.valueOf(1000L), tableDesc.getProvisionedThroughput().getWriteCapacityUnits());
    assertEquals("ACTIVE", tableDesc.getTableStatus());
    assertEquals("arn:aws:dynamodb:ddblocal:000000000000:table/" + TABLE_NAME, tableDesc.getTableArn());

    ListTablesResult tables = ddb.listTables();
    assertEquals(1, tables.getTableNames().size());
}
Implement an @After method so it deletes the table and shuts down the server:
@After
public void tearDown()
{
    ddb.deleteTable(TABLE_NAME);
    ddb.shutdown();
}

WCF Transaction with multiple inserts

When creating a user, entries are required in multiple tables. I am trying to create a transaction that creates a new entry in one table and then passes the new entity ID to the parent table, and so on. The error I am getting is:
The transaction manager has disabled its support for remote/network transactions. (Exception from HRESULT: 0x8004D024)
I believe this is caused by creating multiple connections within a single TransactionScope, but I am unsure on what the best/most efficient way of doing this is.
[OperationBehavior(TransactionScopeRequired = true)]
public int CreateUser(CreateUserData createData)
{
    // Create a new family group and get the ID
    var familyGroupId = createData.FamilyGroupId ?? CreateFamilyGroup();

    // Create the APUser and get the Id
    var apUserId = CreateAPUser(createData.UserId, familyGroupId);

    // Create the institution user and get the Id
    var institutionUserId = CreateInsUser(apUserId, createData.AlternateId, createData.InstitutionId);

    // Create the investigator group user and return the Id
    return AddUserToGroup(createData.InvestigatorGroupId, institutionUserId);
}
This is an example of one of the function calls; all the others follow the same format.
public int CreateFamilyGroup(string familyGroupName)
{
    var familyRepo = _FamilyRepo ?? new FamilyGroupRepository();
    var familyGroup = new FamilyGroup() { CreationDate = DateTime.Now };
    return familyRepo.AddFamilyGroup(familyGroup);
}
And the repository call for this is as follows
public int AddFamilyGroup(FamilyGroup familyGroup)
{
    using (var context = new GameDbContext())
    {
        var newGroup = context.FamilyGroups.Add(familyGroup);
        context.SaveChanges();
        return newGroup.FamilyGroupId;
    }
}
I believe this is caused by creating multiple connections within a single TransactionScope
Yes, that is the problem. It does not really matter how you avoid it, as long as you avoid it. A common approach is to have one connection and one EF context per WCF request, so you need to find a way to pass that EF context along.
The method AddFamilyGroup illustrates a common anti-pattern with EF: you are using EF as a CRUD facility, when it's supposed to be more like a live object graph connected to the database. The entire WCF request should share the same EF context. If you move in that direction, the problem goes away.

What is the best approach to insert large record sets returned from a web service into SQLite

I'm using an async web service and the async framework in WinRT (Win8) to get large record sets (1,000 to 5,000 rows) from a remote MS SQL Server.
I want to know:
1) What is the best approach to inserting large record sets into SQLite?
2) Using a rollback transaction means starting all over again if there is a connection error. The method below inserts whatever it receives, and I can update the data later if the records are not complete. Is this a good approach?
3) Is there a better way to enhance my solution below?
This foreach statement handles each record in the result returned from the async web service:
foreach (WebServiceList _List in IList)
{
    InsertNewItems(_List.No, _List.Description, _List.Unit_Price, _List.Base_Unit_of_Measure);
}

private void InsertNewItems(string ItemNo, string ItemName, decimal ItemPrice, string ItemBUoM)
{
    // Look up the item first so existing rows are updated instead of duplicated.
    var existingItem = db2.Table<Item>().Where(c => c.No == ItemNo).SingleOrDefault();
    if (existingItem != null)
    {
        existingItem.No = ItemNo;
        existingItem.Description = ItemName;
        existingItem.Unit_Price = ItemPrice;
        existingItem.BaseUnitofMeasure = ItemBUoM;
        int success = db2.Update(existingItem);
    }
    else
    {
        int success = db2.Insert(new Item()
        {
            No = ItemNo,
            Description = ItemName,
            Unit_Price = ItemPrice,
            Base_Unit_of_Measure = ItemBUoM
        });
    }
}
You should use RunInTransaction from sqlite-net. The documentation for it says,
Executes action within a (possibly nested) transaction by wrapping it in a SAVEPOINT. If an exception occurs the whole transaction is rolled back, not just the current savepoint. The exception is rethrown.
using (var db = new SQLiteConnection(DbPath))
{
    db.RunInTransaction(() =>
    {
        db.InsertOrReplace(MyObj);
    });
}
Wiki article for Transactions at GitHub
The most important performance aspect for bulk inserts is to use a single transaction. If you want to handle aborts, I suggest that you feed the data in sufficiently large parts and restart from that point on next time. An SQL transaction either finishes completely or rolls back completely, so unless the input data changes between two runs, there should be no need to do an insert-or-update.
See, for example, here for a discussion of SQLite bulk insert performance using different methods.
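The question's code uses sqlite-net, but just to make the single-transaction point concrete, here is a minimal sketch of the same principle using plain JDBC with the sqlite-jdbc driver; the table and column names are assumed to match the Item model above, and the loop is a stand-in for the web service result.

import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Single-transaction bulk insert: one commit for the whole batch instead of one per row.
public class BulkInsertSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:items.db")) {
            conn.setAutoCommit(false); // open one explicit transaction
            String sql = "INSERT OR REPLACE INTO Item "
                    + "(No, Description, Unit_Price, Base_Unit_of_Measure) VALUES (?, ?, ?, ?)";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                for (int i = 0; i < 5000; i++) { // stand-in for the web service result
                    ps.setString(1, "ITEM-" + i);
                    ps.setString(2, "Description " + i);
                    ps.setBigDecimal(3, BigDecimal.valueOf(9.99));
                    ps.setString(4, "PCS");
                    ps.addBatch();
                }
                ps.executeBatch();
            }
            conn.commit(); // everything lands, or nothing does
        }
    }
}

The INSERT OR REPLACE mirrors the insert-or-update intent of the original method while still keeping everything inside one transaction.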

How do I create a unit test that updates a record in a database in ASP.NET

How do I create a unit test that updates a record in a database in ASP.NET?
While technically we wouldn't call this a 'unit test' but an 'integration test' (as Oded explained), you can do this by using a unit testing framework such as MSTest (part of Visual Studio 2008/2010 Professional) or one of the freely available unit testing frameworks, such as NUnit.
However, testing an ASP.NET web project is usually pretty hard, especially when you've put all your logic inside web pages. The best thing to do is to extract all your business logic into a separate layer (usually a separate project within your solution) and call that logic from within your web pages. But perhaps you've already got this separation, which would be great.
This way you can also call that logic from within your tests. For integration tests, it is best to have a separate test database. A test database must contain a known (and stable) set of data, or be completely empty. Do not use a copy of your production database, because when its data changes, your tests might suddenly fail. You should also make sure that all changes made in the database by an integration test are rolled back; otherwise the data in your test database is constantly changing, which could cause your tests to fail unexpectedly.
I always use the TransactionScope in my integration tests (and never in my production code). This ensures that all data will be rolled back. Here is an example of what such an integration test might look like, while using MSTest:
[TestClass]
public class CustomerMovedCommandTests
{
    // This test checks whether the Execute method of the
    // CustomerMovedCommand class in the business layer
    // makes the expected changes in the database.
    [TestMethod]
    public void Execute_WithValidAddress_Succeeds()
    {
        using (new TransactionScope())
        {
            // Arrange
            int custId = 100;

            using (var db = ContextFactory.CreateContext())
            {
                // Insert customer 100 into the test database.
                db.Customers.InsertOnSubmit(new Customer()
                {
                    Id = custId, City = "London", Country = "UK"
                });
                db.SubmitChanges();
            }

            string expectedCity = "New York";
            string expectedCountry = "USA";

            var command = new CustomerMovedCommand();
            command.CustomerId = custId;
            command.NewAddress = new Address()
            {
                City = expectedCity, Country = expectedCountry
            };

            // Act
            command.Execute();

            // Assert
            using (var db = ContextFactory.CreateContext())
            {
                var customer = db.Customers.Single(c => c.Id == custId);
                Assert.AreEqual(expectedCity, customer.City);
                Assert.AreEqual(expectedCountry, customer.Country);
            }
        } // Dispose rolls back everything.
    }
}
I hope this helps, but next time, please be a little more specific in your question.
