I am working on a simple database procedure in Kotlin using Room, and I can't explain why the process is so slow, mostly on the Android Studio emulator.
The table I am working on is this:
#Entity(tableName = "folders_items_table", indices = arrayOf(Index(value = ["folder_name"]), Index(value = ["item_id"])))
data class FoldersItems(
#PrimaryKey(autoGenerate = true)
var uid: Long = 0L,
#ColumnInfo(name = "folder_name")
var folder_name: String = "",
#ColumnInfo(name = "item_id")
var item_id: String = ""
)
And what I am just trying to do is this: checking if a combination folder/item is already present, insert a new record. If not, ignore it. on the emulator, it takes up to 7-8 seconds to insert 100 records. On a real device, it is much faster, but still, it takes around 3-4 seconds which is not acceptable for just 100 records. It looks like the "insert" query is particularly slow.
Here is the procedure that makes what I have just described (inside a coroutine):
val vsmFoldersItems = FoldersItems()
items.forEach{
val itmCk = database.checkFolderItem(item.folder_name, it)
if (itmCk == 0L) {
val newFolderItemHere = vsmFoldersItems.copy(
folder_name = item.folder_name,
item_id = it
)
database.insertFolderItems(newFolderItemHere)
}
}
the variable "items" is an array of Strings.
Here is the DAO definitions of the above-called functions:
#Query("SELECT uid FROM folders_items_table WHERE folder_name = :folder AND item_id = :item")
fun checkFolderItem(folder: String, item: String): Long
#Insert
suspend fun insertFolderItems(item: FoldersItems)
Placing the loop inside a single transaction should significantly reduce the time taken.
The reason is that each transaction (by default each SQL statement that makes a change to the database) will result in a disk write. So that's 100 disk writes for your loop.
If you begin a transaction before the loop and then set the transaction successful when the loop is completed and then end the transaction a single disk write is required.
What I am unsure of is exactly how to do this when using a suspended function (not that familiar with Kotlin).
As such I'd suggest either dropping the suspend or having another Dao for use within loops.
Then have something like :-
val vsmFoldersItems = FoldersItems()
your_RoomDatabase.beginTransaction()
items.forEach{
val itmCk = database.checkFolderItem(item.folder_name, it)
if (itmCk == 0L) {
val newFolderItemHere = vsmFoldersItems.copy(
folder_name = item.folder_name,
item_id = it
)
database.insertFolderItems(newFolderItemHere)
}
}
your_RoomDatabase.setTransactionSuccessful() //<<<<<<< IF NOT set then ALL updates will be rolled back
your_RoomDatabase.endTransaction()
You may wish to refer to:-
https://developer.android.com/reference/androidx/room/RoomDatabase
You may wish to especially refer to runInTransaction
Related
Realm allows you to receive the results of a query in sorted order.
let realm = try! Realm()
let dogs = realm.objects(Dog.self)
let dogsSorted = dogs.sorted(byKeyPath: "name", ascending: false)
I ran this test to see how quickly realm returns sorted data
import Foundation
import RealmSwift
class TestModel: Object {
#Persisted(indexed: true) var value: Int = 0
}
class RealmSortTest {
let documentCount = 1000000
var smallestValue: TestModel = TestModel()
func writeData() {
let realm = try! Realm()
var documents: [TestModel] = []
for _ in 0 ... documentCount {
let newDoc = TestModel()
newDoc.value = Int.random(in: 0 ... Int.max)
documents.append(newDoc)
}
try! realm.write {
realm.deleteAll()
realm.add(documents)
}
}
func readData() {
let realm = try! Realm()
let sortedResults = realm.objects(TestModel.self).sorted(byKeyPath: "value")
let start = Date()
self.smallestValue = sortedResults[0]
let end = Date()
let delta = end.timeIntervalSinceReferenceDate - start.timeIntervalSinceReferenceDate
print("Time Taken: \(delta)")
}
func updateSmallestValue() {
let realm = try! Realm()
let sortedResults = realm.objects(TestModel.self).sorted(byKeyPath: "value")
smallestValue = sortedResults[0]
print("Originally loaded smallest value: \(smallestValue.value)")
let newSmallestValue = TestModel()
newSmallestValue.value = smallestValue.value - 1
try! realm.write {
realm.add(newSmallestValue)
}
print("Originally loaded smallest value after write: \(smallestValue.value)")
let readStart = Date()
smallestValue = sortedResults[0]
let readEnd = Date()
let readDelta = readEnd.timeIntervalSinceReferenceDate - readStart.timeIntervalSinceReferenceDate
print("Reloaded smallest value \(smallestValue.value)")
print("Time Taken to reload the smallest value: \(readDelta)")
}
}
With documentCount = 100000, readData() output:
Time taken to load smallest value: 0.48901796340942383
and updateData() output:
Originally loaded smallest value: 2075613243102
Originally loaded smallest value after write: 2075613243102
Reloaded smallest value 2075613243101
Time taken to reload the smallest value: 0.4624580144882202
With documentCount = 1000000, readData() output:
Time taken to load smallest value: 4.807577967643738
and updateData() output:
Originally loaded smallest value: 4004790407680
Originally loaded smallest value after write: 4004790407680
Reloaded smallest value 4004790407679
Time taken to reload the smallest value: 5.2308430671691895
The time taken to retrieve the first document from a sorted result set is scaling with the number of documents stored in realm rather than the number of documents being retrieved. This indicates to me that realm is sorting all of the documents at query time rather than when the documents are being written. Is there a way to index your data so that you can quickly retrieve a small number of sorted documents?
Edit:
Following discussion in the comments, I updated the code to load only the smallest value from the sorted collection.
Edit 2
I updated the code to observe the results as suggested in the comments.
import Foundation
import RealmSwift
class TestModel: Object {
#Persisted(indexed: true) var value: Int = 0
}
class RealmSortTest {
let documentCount = 1000000
var smallestValue: TestModel = TestModel()
var storedResults: Results<TestModel> = (try! Realm()).objects(TestModel.self).sorted(byKeyPath: "value")
var resultsToken: NotificationToken? = nil
func writeData() {
let realm = try! Realm()
var documents: [TestModel] = []
for _ in 0 ... documentCount {
let newDoc = TestModel()
newDoc.value = Int.random(in: 0 ... Int.max)
documents.append(newDoc)
}
try! realm.write {
realm.deleteAll()
realm.add(documents)
}
}
func observeData() {
let realm = try! Realm()
print("Loading Data")
let startTime = Date()
self.storedResults = realm.objects(TestModel.self).sorted(byKeyPath: "value")
self.resultsToken = self.storedResults.observe { changes in
let observationTime = Date().timeIntervalSince(startTime)
print("Time to first observation: \(observationTime)")
let firstTenElementsSlice = self.storedResults[0..<10]
let elementsArray = Array(firstTenElementsSlice) //print this if you want to see the elements
elementsArray.forEach { print($0.value) }
let moreElapsed = Date().timeIntervalSince(startTime)
print("Time to printed elements: \(moreElapsed)")
}
}
}
and I got the following output
Loading Data
Time to first observation: 5.252112984657288
3792614823099
56006949537408
Time to printed elements: 5.253015995025635
Reading the data with an observer did not reduce the time taken to read the data.
At this time it appears that Realm sorts data when it is accessed rather than when it is written, and there is not a way to have Realm sort data at write time. This means that accessing sorted data scales with the number of documents in the database rather than the number of documents being accessed.
The actual time taken to access the data varies by use case and platform.
dogs and dogsSorted are Realm Results Collection object that essentially contains pointers to the underlying data, not the data itself.
Defining a sort order does NOT load all of the objects and they remain lazy - only loading as needed, which is one of the huge benefits to Realm; giant datasets can be used without worrying about overloading memory.
It's also one of the reasons that Realm Results objects always reflect the current state of the data of the underlying data; that data can change many times and what you see in your app Results vars (and Realm Collections in general) will always show the updated data.
As a side node, at this time working with Realm Collection objects with Swift High Level functions causes that data to load into memory - so don't do that. Sort, Filter etc with Realm functions and everything stays lazy and memory friendly.
Indexing is a trade off; on one hand it can improve the performance of certain queries like an equality ( "name == 'Spot'" ) but on the other hand it can slow down write performance. Additionally, adding indexes takes up a bit more space.
Generally speaking, indexing is best for specific use cases; maybe in a situation were you doing some kind of type ahead autofill where performance is critical. We have several apps with very large datasets (Gb's) and nothing is indexed because the performance advantage received is offset by slower writes, which are done frequently. I suggest starting without indexing.
EDIT:
Going to update the answer based on additional discussion.
First and foremost, copying data from one object to another is not a measure of database loading performance. The real objective here is the user experience and/or being able to access that data - from the time the user expects to see the data to when it's shown. So let's provide some code to demonstrate general performance:
We'll first start with a similar model to what the OP used
class TestModel: Object {
#Persisted(indexed: true) var value: Int = 0
convenience init(withIndex: Int) {
self.init()
self.value = withIndex
}
}
Then define a couple of vars to hold the Results from disk and a notification token which allows us to know when that data is available to be displayed to the user. And then lastly a var to hold the time of when the loading starts
var modelResults: Results<TestModel>!
var modelsToken: NotificationToken?
var startTime = Date()
Here's the function that writes lots of data. The objectCount var will be changed from 10,000 objects on the first run to 1,000,000 objects on the second. Note this is bad coding as I am creating a million objects in memory so don't do this; for demonstration purposes only.
func writeLotsOfData() {
let realm = try! Realm()
let objectCount = 1000000
autoreleasepool {
var testModelArray = [TestModel]()
for _ in 0..<objectCount {
let m = TestModel(withIndex: Int.random(in: 0 ... Int.max))
testModelArray.append(m)
}
try! realm.write {
realm.add(testModelArray)
}
print("data written: \(testModelArray.count) objects")
}
}
and then finally the function that loads those objects from realm and outputs when the data is available to be shown to the user. Note they are sorted per the original question - and in fact will maintain their sort as data is added and changed! Pretty cool stuff.
func loadBigData() {
let realm = try! Realm()
print("Loading Data")
self.startTime = Date()
self.modelResults = realm.objects(TestModel.self).sorted(byKeyPath: "value")
self.modelsToken = self.modelResults?.observe { changes in
let elapsed = Date().timeIntervalSince(self.startTime)
print("Load completed of \(self.modelResults.count) objects - elapsed time of \(elapsed)")
}
}
and the results. Two runs, one with 10,000 objects and one with 1,000,000 objects
data written: 10000 objects
Loading Data
Load completed of 10000 objects - elapsed time of 0.0059670209884643555
data written: 1000000 objects
Loading Data
Load completed of 1000000 objects - elapsed time of 0.6800119876861572
There are three things to note
A Realm Notification object fires an event when the data has
completed loading, and also when there are additional changes. We are
leveraging that to notify the app when the data has completed loading
and is available to be used - shown to the user for example.
We are lazily loading all of the objects! At no point are we going
to run into a memory overloading issue. Once the objects have loaded
into the results, they are then freely available to be shown to the
user or processed in whatever way is needed. Super important to work
with Realm objects in a Realm way when working with large datasets.
Generally speaking, if it's 10 objects well, no problem tossing
them into an array, but when there are 1 Million objects - let Realm
do it's lazy job.
The app is protected using the above code and techniques. There
could be 10 objects or 1,000,000 objects and the memory impact is
minimal.
EDIT 2
(see comment to the OP's question for more info about this edit)
Per a request fromt the OP, they wanted to see the same exercise with printed values and times. Here's the updated code
self.modelsToken = self.modelResults?.observe { changes in
let elapsed = Date().timeIntervalSince(self.startTime)
print("Load completed of \(self.modelResults.count) objects - elapsed time of \(elapsed)")
print("print first 10 object values")
let firstTenElementsSlice = self.modelResults[0..<10]
let elementsArray = Array(firstTenElementsSlice) //print this if you want to see the elements
elementsArray.forEach { print($0.value)}
let moreElapsed = Date().timeIntervalSince(self.startTime)
print("Printing of 10 elements completed: \(moreElapsed)")
}
and then the output
Loading Data
Load completed of 1000000 objects - elapsed time of 0.6730009317398071
print first 10 object values
12264243738520
17242140785413
29611477414437
31558144830373
32913160803785
45399774467128
61700529799916
63929929449365
73833938586206
81739195218861
Printing of 10 elements completed: 0.6745189428329468
I Have the following model
class Process: Object {
#objc dynamic var processID:Int = 1
let steps = List<Step>()
}
class Step: Object {
#objc private dynamic var stepCode: Int = 0
#objc dynamic var stepDateUTC: Date? = nil
var stepType: ProcessStepType {
get {
return ProcessStepType(rawValue: stepCode) ?? .created
}
set {
stepCode = newValue.rawValue
}
}
}
enum ProcessStepType: Int { // to review - real value
case created = 0
case scheduled = 1
case processing = 2
case paused = 3
case finished = 4
}
A process can start, processing , paused , resume (to be in step processing again), pause , resume again,etc. the current step is the one with the latest stepDateUTC
I am trying to get all Processes, having for last step ,a step of stepType processing "processing ", ie. where for the last stepDate, stepCode is 2 .
I came with the following predicate... which doesn't work. Any idea of the right perform to perform such query ?
my best trial is the one. Is it possible to get to this result via one realm query .
let processes = realm.objects(Process.self).filter(NSPredicate(format: "ANY steps.stepCode = 2 AND NOT (ANY steps.stepCode = 4)")
let ongoingprocesses = processes.filter(){$0.steps.sorted(byKeyPath: "stepDateUTC", ascending: false).first!.stepType == .processing}
what I hoped would work
NSPredicate(format: "steps[LAST].stepCode = \(TicketStepType.processing.rawValue)")
I understand [LAST] is not supported by realm (as per the cheatsheet). but is there anyway around I could achieve my goal through a realm query?
There are a few ways to approach this and it doesn't appear the date property is relevant because lists are stored in sequential order (as long as they are not altered), so the last element in the List was added last.
This first piece of code will filter for processes where the last element is 'processing'. I coded this long-handed so the flow is more understandable.
let results = realm.objects(Process.self).filter { p in
let lastIndex = p.steps.count - 1
let step = p.steps[lastIndex]
let type = step.stepType
if type == .processing {
return true
}
return false
}
Note that Realm objects are lazily loaded - which means thousands of objects have a low memory impact. By filtering using Swift, the objects are filtered in memory so the impact is more significant.
The second piece of code is what I would suggest as it makes filtering much simpler, but would require a slight change to the Process model.
class Process: Object {
#objc dynamic var processID:Int = 1
let stepHistory = List<Step>() //RENAMED: the history of the steps
#objc dynamic var name = ""
//ADDED: new property tracks current step
#objc dynamic var current_step = ProcessStepType.created.index
}
My thought here is that the Process model keeps a 'history' of steps that have occurred so far, and then what the current_step is.
I also modified the ProcessStepType enum to make it more filterable friendly.
enum ProcessStepType: Int { // to review - real value
case created = 0
case scheduled = 1
case processing = 2
case paused = 3
case finished = 4
//this is used when filtering
var index: Int {
switch self {
case .created:
return 0
case .scheduled:
return 1
case .processing:
return 2
case .paused:
return 3
case .finished:
return 4
}
}
}
Then to return all processes where the last step in the list is 'processing' here's the filter
let results2 = realm.objects(Process.self).filter("current_step == %#", ProcessStepType.processing.index)
The final thought is to add some code to the Process model so when a step is added to the list, the current_step var is also updated. Coding that is left to the OP.
After reading the documentation, I'm having a hard time conceptualizing the change feed. Let's take the code from the documentation below. The second change feed is picking up the changes from the last time it was run via the checkpoints. Let's say it is being used to create summary data and there was an issue and it needed to be re-run from a prior time. I don't understand the following:
How to specify a particular time the checkpoint should start. I understand I can save the checkpoint dictionary and use that for each run, but how do you get the changes from X time to maybe rerun some summary data
Secondly, let's say we are rerunning some summary data and we save the last checkpoint used for each summarized data so we know where that one left off. How does one know that a record is in or before that checkpoint?
Code that runs from collection beginning and then from last checkpoint:
Dictionary < string, string > checkpoints = await GetChanges(client, collection, new Dictionary < string, string > ());
await client.CreateDocumentAsync(collection, new DeviceReading {
DeviceId = "xsensr-201", MetricType = "Temperature", Unit = "Celsius", MetricValue = 1000
});
await client.CreateDocumentAsync(collection, new DeviceReading {
DeviceId = "xsensr-212", MetricType = "Pressure", Unit = "psi", MetricValue = 1000
});
// Returns only the two documents created above.
checkpoints = await GetChanges(client, collection, checkpoints);
//
private async Task < Dictionary < string, string >> GetChanges(
DocumentClient client,
string collection,
Dictionary < string, string > checkpoints) {
List < PartitionKeyRange > partitionKeyRanges = new List < PartitionKeyRange > ();
FeedResponse < PartitionKeyRange > pkRangesResponse;
do {
pkRangesResponse = await client.ReadPartitionKeyRangeFeedAsync(collection);
partitionKeyRanges.AddRange(pkRangesResponse);
}
while (pkRangesResponse.ResponseContinuation != null);
foreach(PartitionKeyRange pkRange in partitionKeyRanges) {
string continuation = null;
checkpoints.TryGetValue(pkRange.Id, out continuation);
IDocumentQuery < Document > query = client.CreateDocumentChangeFeedQuery(
collection,
new ChangeFeedOptions {
PartitionKeyRangeId = pkRange.Id,
StartFromBeginning = true,
RequestContinuation = continuation,
MaxItemCount = 1
});
while (query.HasMoreResults) {
FeedResponse < DeviceReading > readChangesResponse = query.ExecuteNextAsync < DeviceReading > ().Result;
foreach(DeviceReading changedDocument in readChangesResponse) {
Console.WriteLine(changedDocument.Id);
}
checkpoints[pkRange.Id] = readChangesResponse.ResponseContinuation;
}
}
return checkpoints;
}
DocumentDB supports check-pointing only by the logical timestamp returned by the server. If you would like to retrieve all changes from X minutes ago, you would have to "remember" the logical timestamp corresponding to the clock time (ETag returned for the collection in the REST API, ResponseContinuation in the SDK), then use that to retrieve changes.
Change feed uses logical time in place of clock time because it can be different across various servers/partitions. If you would like to see change feed support based on clock time (with some caveats on skew), please propose/upvote at https://feedback.azure.com/forums/263030-documentdb/.
To save the last checkpoint per partition key/document, you can just save the corresponding version of the batch in which it was last seen (ETag returned for the collection in the REST API, ResponseContinuation in the SDK), like Fred suggested in his answer.
How to specify a particular time the checkpoint should start.
You could try to provide a logical version/ETag (such as 95488) instead of providing a null value as RequestContinuation property of ChangeFeedOptions.
I am trying to use a DynamoDB table to store this data:
DartsPlayerInsultTable
CustomerId String
PlayerId String
PlayerInsult String
Using the method (concept, not code) described here:
https://java.awsblog.com/post/Tx3GYZEVGO924K4/The-DynamoDBMapper-Local-Secondary-Indexes-and-You
here:
http://mobile.awsblog.com/post/TxTCW7KW8BGZAF/Amazon-DynamoDB-on-Mobile-Part-4-Local-Secondary-Indexes
and here:
http://labs.journwe.com/2013/12/15/dynamodb-secondary-indexes/comment-page-1/#comment-116
I want to have multiple insult records per customer-player.
CustomerId is my Hash Key
PlayerId is my Range Key
and I a trying to use PlayerInsult in a key so that
a second PlayerInsult value inserts a second record
rather than replacing the existing one.
Have tried both Global and Secondary indexes for this,
but if I try to add a row with a new insult, it still
replaces the insult with the same customer-player key
rather than adding a new one.
Any suggestions on the best approach to use for this is
DynanoDB? Do I need to create a hybrid column for a range-key?
Trying to keep this simple...
class func createDartsPlayerInsultTable() -> BFTask {
let dynamoDB = AWSDynamoDB.defaultDynamoDB()
let hashKeyAttributeDefinition = AWSDynamoDBAttributeDefinition()
hashKeyAttributeDefinition.attributeName = "CustomerId"
hashKeyAttributeDefinition.attributeType = AWSDynamoDBScalarAttributeType.S
let hashKeySchemaElement = AWSDynamoDBKeySchemaElement()
hashKeySchemaElement.attributeName = "CustomerId"
hashKeySchemaElement.keyType = AWSDynamoDBKeyType.Hash
let rangeKeyAttributeDefinition = AWSDynamoDBAttributeDefinition()
rangeKeyAttributeDefinition.attributeName = "PlayerId"
rangeKeyAttributeDefinition.attributeType = AWSDynamoDBScalarAttributeType.S
let rangeKeySchemaElement = AWSDynamoDBKeySchemaElement()
rangeKeySchemaElement.attributeName = "PlayerId"
rangeKeySchemaElement.keyType = AWSDynamoDBKeyType.Range
/*
let indexRangeKeyAttributeDefinition = AWSDynamoDBAttributeDefinition()
indexRangeKeyAttributeDefinition.attributeName = "PlayerInsult"
indexRangeKeyAttributeDefinition.attributeType = AWSDynamoDBScalarAttributeType.S
let rangeKeySchemaElement = AWSDynamoDBKeySchemaElement()
rangeKeySchemaElement.attributeName = "PlayerId"
rangeKeySchemaElement.keyType = AWSDynamoDBKeyType.Range
let indexRangeKeyElement = AWSDynamoDBKeySchemaElement()
indexRangeKeyElement.attributeName = "PlayerInsult"
indexRangeKeyElement.keyType = AWSDynamoDBIndexRangeKeyType.
*/
//Add non-key attributes
let playerInsultAttrDef = AWSDynamoDBAttributeDefinition()
playerInsultAttrDef.attributeName = "PlayerInsult"
playerInsultAttrDef.attributeType = AWSDynamoDBScalarAttributeType.S
let provisionedThroughput = AWSDynamoDBProvisionedThroughput()
provisionedThroughput.readCapacityUnits = 5
provisionedThroughput.writeCapacityUnits = 5
// CREATE GLOBAL SECONDARY INDEX
/*
let gsi = AWSDynamoDBGlobalSecondaryIndex()
let gsiArray = NSMutableArray()
let gsiHashKeySchema = AWSDynamoDBKeySchemaElement()
gsiHashKeySchema.attributeName = "PlayerId"
gsiHashKeySchema.keyType = AWSDynamoDBKeyType.Hash
let gsiRangeKeySchema = AWSDynamoDBKeySchemaElement()
gsiRangeKeySchema.attributeName = "PlayerInsult"
gsiRangeKeySchema.keyType = AWSDynamoDBKeyType.Range
let gsiProjection = AWSDynamoDBProjection()
gsiProjection.projectionType = AWSDynamoDBProjectionType.All;
gsi.keySchema = [gsiHashKeySchema,gsiRangeKeySchema];
gsi.indexName = "PlayerInsult";
gsi.projection = gsiProjection;
gsi.provisionedThroughput = provisionedThroughput;
gsiArray .addObject(gsi)
*/
// CREATE LOCAL SECONDARY INDEX
let lsi = AWSDynamoDBLocalSecondaryIndex()
let lsiArray = NSMutableArray()
let lsiHashKeySchema = AWSDynamoDBKeySchemaElement()
lsiHashKeySchema.attributeName = "CustomerId"
lsiHashKeySchema.keyType = AWSDynamoDBKeyType.Hash
let lsiRangeKeySchema = AWSDynamoDBKeySchemaElement()
lsiRangeKeySchema.attributeName = "PlayerInsult"
lsiRangeKeySchema.keyType = AWSDynamoDBKeyType.Range
let lsiProjection = AWSDynamoDBProjection()
lsiProjection.projectionType = AWSDynamoDBProjectionType.All;
lsi.keySchema = [lsiHashKeySchema,lsiRangeKeySchema];
lsi.indexName = "PlayerInsult";
lsi.projection = lsiProjection;
//lsi.provisionedThroughput = provisionedThroughput;
lsiArray .addObject(lsi)
//Create TableInput
let createTableInput = AWSDynamoDBCreateTableInput()
createTableInput.tableName = DartsPlayerInsultTableName;
createTableInput.attributeDefinitions = [hashKeyAttributeDefinition, rangeKeyAttributeDefinition, playerInsultAttrDef]
//createTableInput.attributeDefinitions = [hashKeyAttributeDefinition, rangeKeyAttributeDefinition]
createTableInput.keySchema = [hashKeySchemaElement, rangeKeySchemaElement]
createTableInput.provisionedThroughput = provisionedThroughput
//createTableInput.globalSecondaryIndexes = gsiArray as [AnyObject]
createTableInput.localSecondaryIndexes = lsiArray as [AnyObject]
return dynamoDB.createTable(createTableInput).continueWithSuccessBlock({ (var task:BFTask!) -> AnyObject! in
if ((task.result) != nil) {
// Wait for up to 4 minutes until the table becomes ACTIVE.
let describeTableInput = AWSDynamoDBDescribeTableInput()
describeTableInput.tableName = DartsPlayerInsultTableName;
task = dynamoDB.describeTable(describeTableInput)
for var i = 0; i < 16; i++ {
task = task.continueWithSuccessBlock({ (task:BFTask!) -> AnyObject! in
let describeTableOutput:AWSDynamoDBDescribeTableOutput = task.result as! AWSDynamoDBDescribeTableOutput
let tableStatus = describeTableOutput.table.tableStatus
if tableStatus == AWSDynamoDBTableStatus.Active {
return task
}
sleep(15)
return dynamoDB .describeTable(describeTableInput)
})
}
}
return task
})
}
Putting this as an answer and not another comment in case it gets long...
It sounds like the average user's insults might fit into a single record. With the disclaimer that I know absolutely nothing about swift, this might at least be something relatively simple. Keep your customer and player keys. Before you persist the insults, turn the whole list into one big string using whatever version of join("|") swift has. When you fetch the record, do a split("|") to get your list back. (Just be a little judicious with your choice of separators, I'm only using "|" as an example, you don't want to choose something that might appear in an insult...)
There's going to be that one user with enough insults to take you over the 400kb object limit. Set a max list size constant in your code -- when you turn your lists into strings to persist them to dynamo, check the player's list length against that limit. If you exceed it, break your list into chunks of that size and use hash and range keys like ("foo", "bar"), ("foo", "bar1"), ("foo", "bar2"), etc. Yes, the first one does not have a bucket number at the end...
When you query for the data, just do a straight query first and assume you'll be in the good case (just "foo" and "bar", no other buckets). When you unpack that first list, check its length. If it's equal to your max list size constant, you know that you got a "bad" user and need to do a range query. That second one can use the hash key "foo" and the range "bar" to "bar9999". You will fetch back all those buckets with that range query. Unpack and concatenate all the lists.
This is a little gory, but it should also ultimately be straight ahead to code up. Hopefully it's still simple enough to hook into the patterns you mentioned.
What I decided to do was make a conventional dynamodb table with just one hash key, but the new hash key is a combined string of:
CustomerId + "|" + PlayerId
It is not too hard to maintain synchrony between players and insults tables because once a player is inserted into the player table, modifying the player name results in a new row being inserted. Thus, insults do not need to be modified if the player name changes. You only need to cleanup insults if a player is deleted.
This update behavior is just the way dynamodb works if you make Player name a hash key, which I did to insure they were unique.
I am having an issue with ODAC (Oracle Data Access Components), Entity Framework 4.3.1, and expression trees. We have a legacy database (don't we all?) that we are mapping in Entity Framework. The table has millions of records and over one hundred columns (sad face).
Here is an example query on an indexed column:
int myId = 2;
var matchingRecord = context.MyLargeTable.Where(v=>v.Id == myId).ToList(); //Super slow (5+ minutes, sometimes Out of Memory exception)
int myId = 2;
Expression<Func<bool>> myLambda = v => v.Id == myId; //Shouldn't this work now?
var matchingRecord = context.MyLargeTable.Where(myLambda).ToList(); //Still super slow (5+ minutes, sometimes Out of Memory exception)
var elementName = Expression.Parameter(typeof(LargeTable), "v");
var propertyName = Expression.Parameter(elementName, "Id");
var constantValue = Expression.Constant(myId);
var comparisonMethod = Expression.Call(
propertyName,
typeof(int).GetMethod("Equals", new[] { typeof(int) }),
constantValue
)
var finalTree = Expression.Lambda<Func<LargeTable, bool>>(comparisonMethod, elementName);
var matchingRecord = context.MyLargeTable.Where(finalTree).ToList(); //Super fast
I've read things like this that explain the different between Func<> and Expression> and how Expression> actually gets passed to the database for the query and that's why it is faster.
http://www.fascinatedwithsoftware.com/blog/post/2011/12/02/Falling-in-Love-with-LINQ-Part-7-Expressions-and-Funcs.aspx - Whole thing is good, but if in a rush, just read the section titled “Unintended Consequences” for the main takeaway
http://fascinatedwithsoftware.com/blog/post/2012/01/10/More-on-Expression-vs-Func-with-Entity-Framework.aspx
Why would you use Expression<Func<T>> rather than Func<T>? - No set of links is complete without a corresponding SO question
My question is this: Are people really sitting there constructing expression trees using Expression.* classes? Any query beyond simple comparisons get really complicated and is almost impossible to read. What am I missing about passing the Expression> to the database? Who do I go punch in the face for this manually constructed expression tree solution? Oracle? EF? What am I missing?