Does DynamoDB write broken items if items are put too fast? - amazon-dynamodb

I'm seeing a weird phenomenon when putting items to DynamoDB.
It seems that if I put items too fast, DynamoDB can't write the whole item to the table (the stored item looks broken: it has only some of the attributes, and some of them have weird values).
I'm using the AWS JavaScript SDK to put the items. No errors show up and everything seems to work fine, but when I check the data in the web console, some of the inserted items are broken. Is this related to write capacity units? (No error tells me that write capacity units are the cause.) I can confirm that my write capacity usage peaked at about 60/min, and the capacity setting is "on-demand".
I tried slowing down the puts to one per second with exactly the same data, and the data was inserted correctly.
Does anyone know why this happens and how to fix it?

The answer is no: if DynamoDB decides to throttle your requests because you exceeded your provisioned capacity, or exceeded its own hardware's capacity, or for any other reason, it will refuse whole requests, or in the case of BatchWriteItem perform some of the writes and not others (and the response tells you which ones were not processed). DynamoDB will never write part of a request or corrupt parts of an attribute.
If you are seeing that, the most likely culprit is a bug in your own code that does the write. Maybe your code is not thread-safe: if it prepares two items for writing concurrently, the preparation code has a data race and a corrupt item gets written. Of course, it is also possible that DynamoDB has a bug causing this, but it can't be a bug as simple as "writing more than 60 items a minute causes corruption". If that were the case, everyone would have run into it.
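As an illustration of the kind of client-side bug meant here, the following is a minimal sketch in Python (not the asker's JavaScript). It assumes a client that queues writes and serializes them later. FakeTable, put_item_later and flush are hypothetical stand-ins invented for this example and are not part of any AWS SDK. The buggy writer reuses one mutable dict for every record, so every queued write ends up pointing at the last record's data.

```python
from typing import Any, Dict, List


class FakeTable:
    """Hypothetical stand-in for a client that sends queued writes later."""

    def __init__(self) -> None:
        self.pending: List[Dict[str, Any]] = []
        self.stored: List[Dict[str, Any]] = []

    def put_item_later(self, item: Dict[str, Any]) -> None:
        # Keeps a reference to the caller's dict; nothing is copied here.
        self.pending.append(item)

    def flush(self) -> None:
        # Items are only serialized ("sent") at this point.
        self.stored.extend(dict(item) for item in self.pending)
        self.pending.clear()


def write_with_shared_buffer(table: FakeTable, records: List[Dict[str, Any]]) -> None:
    """Buggy: one dict is reused, so queued writes see later mutations."""
    item: Dict[str, Any] = {}
    for record in records:
        item.clear()
        item.update(record)
        table.put_item_later(item)
    table.flush()


def write_with_fresh_items(table: FakeTable, records: List[Dict[str, Any]]) -> None:
    """Fixed: every queued write owns its own copy of the data."""
    for record in records:
        table.put_item_later(dict(record))
    table.flush()
```

In this toy model, flushing after every single put would also hide the bug, which would be consistent with the observation that one write per second produced correct data. Whether the asker's code has this particular shape is an assumption.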

Related

DynamoDB Read Capacity way over the limit but only on one table

I have 6 DynamoDB tables set up as data sources for my iOS app. They all get called from the client about the same number of times when someone uses the app, but for some reason one table is getting crazy spikes in reads, going way over its read capacity and causing a provisioned-throughput-exceeded error for that table.
The app only has a very small number of users in general (below 100).
The offending table metrics look like this:
But all the other tables look like this:
Any thoughts on what might be causing this? The client-side code for making requests is the same for every table, and the tables all seem to be set up the same way in the API using Lambda and API Gateway. What am I missing?

How to defragment/consolidate partially filled sqlite pages in order to reclaim space?

I am designing a little server application with a local data store, and sqlite3 seems to be the way to manage the persistent data. But I am worried about malicious users who know the internal logic and might trick the server into creating (and subsequently deleting) lots of records in such a way that a few valid records remain in each data page. The database size might explode quite soon.
The documentation and recommendations like https://blogs.gnome.org/jnelson/2015/01/06/sqlite-vacuum-and-auto_vacuum/ imply that even auto_vacuum=incremental would not help me in this scenario, because it is only effective for released pages, not for used pages with internal gaps (i.e. fragmentation).
Is there a good way to tell sqlite to consolidate such data on-the-fly?
A VACUUM operation is not an option because of the long-lived global DB lock.
SQLite will automatically merge an almost empty page with its neighbors, which helps reduce the kind of fragmentation you describe.
From an email from D Richard Hipp on the sqlite mailing list:
Once a sufficient number of rows are removed from a page, and the free space on that page gets to be a substantial fraction of the total space for the page, then the page is merged with adjacent pages, freeing up a whole page for reuse. But as doing this reorganization is expensive, it is deferred until a lot of free space accumulates on the page. (The exact thresholds for when a rebalance occurs are written down some place, but they do not come immediately to my mind, as the whole mechanism just works and we haven't touched it in about 15 years.)
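The merging described in the quote can be observed with a small experiment. The sketch below uses Python's built-in sqlite3 module. The table name, row size and row counts are arbitrary choices for illustration. It fills a table, deletes all but every tenth row, and then reads PRAGMA page_count and PRAGMA freelist_count to see whether whole pages were released for reuse.

```python
import os
import sqlite3
import tempfile


def page_stats_after_sparse_delete(row_count: int = 10000, keep_every: int = 10) -> dict:
    """Insert rows, delete all but every keep_every-th one, report page usage."""
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(os.path.join(tmp, "demo.db"))
        try:
            # Set before the first table is created so that it takes effect.
            conn.execute("PRAGMA auto_vacuum = NONE")
            conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, payload TEXT)")
            with conn:  # commits the inserts
                conn.executemany(
                    "INSERT INTO records (id, payload) VALUES (?, ?)",
                    ((i, "x" * 100) for i in range(row_count)),
                )
            pages_before = conn.execute("PRAGMA page_count").fetchone()[0]
            with conn:  # commits the delete
                conn.execute("DELETE FROM records WHERE id % ? != 0", (keep_every,))
            return {
                "rows": conn.execute("SELECT count(*) FROM records").fetchone()[0],
                "pages_before": pages_before,
                "pages_after": conn.execute("PRAGMA page_count").fetchone()[0],
                "free_pages": conn.execute("PRAGMA freelist_count").fetchone()[0],
            }
        finally:
            conn.close()
```

A non-zero free_pages value means that sparsely filled pages were merged and the emptied pages were put on the freelist. Without auto_vacuum the file itself does not shrink, but later inserts can reuse those pages.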

Can't see SQLite database changes on a database open by multiple processes

I have a process that opens a database using sqlite3_open and sets journal mode to WAL.
Another process uses sqlite3_open to open that same database. Everything seems to work, but the problem is that the second process does not seem to see changes made by the first process. I am trying to fetch the count, or the rowids, and they stay the same.
I am sure that database is being updated, because refreshing using SQLiteDatabaseBrowser shows the changes.
I tried multiple ways of opening databases, and multiple ways of querying, but no luck so far. What am I missing? Thanks!
Transactions are used to isolate connections from each other, especially to make changes visible only after a transaction has completed.
So for changes to be visible, the writing connection must end its transaction, and the reading connection must not have started its own transaction before that. (When using automatic transactions, ensure that statements are reset or finalized.)
I figured out what the problem was, and as usual in cases where things make no sense, the mistake was on my side. The problem is subtle, however, so it is worth mentioning.
I was calling sqlite3_reset on cached prepared statements lazily, that is, just before reusing a prepared statement rather than immediately after I was done executing it. The problem is that with this pattern there is always a prepared statement pending reset. Apparently, the reset is necessary to be able to see updates to the database (probably some mutex is being held).
Thanks for your help.
EDIT: after sleeping on it this behavior actually makes sense. Updates should not be visible during the time of prepared statement execution, otherwise it might never be done or accurate.
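The visibility rule from the answers above can be reproduced with two connections to one WAL database. This is a minimal sketch using Python's sqlite3 module rather than the C API from the question. The table name is made up, and an explicit BEGIN stands in for the statement that was left un-reset (both keep a read transaction open).

```python
import os
import sqlite3
import tempfile


def observe_snapshot_isolation() -> list:
    """Return the row counts a reader sees before and after ending its transaction."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "shared.db")
        # isolation_level=None: the module opens no transactions implicitly.
        writer = sqlite3.connect(path, isolation_level=None)
        reader = sqlite3.connect(path, isolation_level=None)
        try:
            writer.execute("PRAGMA journal_mode=WAL").fetchall()
            writer.execute("CREATE TABLE events (id INTEGER PRIMARY KEY)")
            writer.execute("INSERT INTO events DEFAULT VALUES")

            def count() -> int:
                # fetchall() runs the statement to completion, so it is reset.
                return reader.execute("SELECT count(*) FROM events").fetchall()[0][0]

            seen = []
            reader.execute("BEGIN")
            seen.append(count())  # the read transaction (snapshot) starts here
            writer.execute("INSERT INTO events DEFAULT VALUES")  # commits at once
            seen.append(count())  # same snapshot, the new row is not visible
            reader.execute("COMMIT")
            seen.append(count())  # fresh snapshot, the new row is visible
            return seen
        finally:
            reader.close()
            writer.close()
```

The reader only picks up the writer's committed row once its own read transaction has ended, which matches the EDIT above.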

Riak: are my 2is broken?

We're having some weird things happening with a cleanup cronjob and Riak:
The objects we store (postboxes) have a 2i for the modification date (which is a Unix timestamp).
There's a cronjob that runs frequently and deletes all postboxes that have not been modified within 180 days. However, we've found evidence that some (very few) postboxes that were modified within the last three days were deleted by this cronjob.
After reviewing and debugging every line of code several times over, I am confident that this is not a problem in the cronjob.
I also traced back all delete calls to that bucket - and no one else is deleting objects there.
Of course I also checked with Riak to read the postboxes with r=ALL: they're definitely gone. (and they are stored with w=QUORUM)
I also checked the logs: updating the post boxes did succeed (there were no errors reported back from the write operations)
This leaves me with two possible causes for this:
riak loses data (which I am not willing to believe that easily)
the secondary indexes are corrupt and queries to them return wrong keys
So my questions are:
Can 2is actually break?
Is it possible to verify that?
Am I missing something completely different?
Cheers,
Matthias
Secondary index queries in Riak are coverage queries, which means that they will only use one of the stored replicas, and not perform a quorum read.
As you are writing with w=QUORUM, it is possible that one (or more) of the replicas may not get updated if you have n_val set to 3 or higher while the operation still is deemed successful. If this is the one selected for the coverage query, you could end up deleting based on the old value. In order to avoid this, you will need to perform updates with w=ALL.
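The failure mode described above can be walked through with a toy model. The sketch below is plain Python and does not use or imitate Riak's client API. ToyCluster, its methods and the replica numbering are invented for illustration, and the model simply assumes that one replica misses an update.

```python
from typing import Dict, List

DAY = 86400  # seconds


class ToyCluster:
    """Toy model of n_val replicas; not Riak's API."""

    def __init__(self, n_val: int = 3) -> None:
        self.replicas: List[Dict[str, int]] = [{} for _ in range(n_val)]

    def write(self, key: str, modified: int, reachable: List[int], w: int) -> bool:
        """Store the timestamp on reachable replicas; succeed if at least w acked."""
        for index in reachable:
            self.replicas[index][key] = modified
        return len(reachable) >= w

    def index_query(self, replica: int, older_than: int) -> List[str]:
        """Coverage-style query: consults one replica only, no quorum."""
        return sorted(k for k, ts in self.replicas[replica].items() if ts < older_than)


def stale_delete_candidates(w: int) -> List[str]:
    """Keys a cleanup job would delete if the query hits the lagging replica."""
    now = 1000 * DAY
    cluster = ToyCluster(n_val=3)
    cluster.write("postbox-1", now - 200 * DAY, reachable=[0, 1, 2], w=w)
    # Assumption: replica 2 misses the update made one day ago.
    acked = cluster.write("postbox-1", now - 1 * DAY, reachable=[0, 1], w=w)
    if not acked:
        # With w=ALL the writer sees the failure and can retry later.
        cluster.write("postbox-1", now - 1 * DAY, reachable=[0, 1, 2], w=w)
    return cluster.index_query(replica=2, older_than=now - 180 * DAY)
```

With w=2 the update is reported as successful although replica 2 still holds the 200-day-old timestamp, so a query served by that replica returns the key as a delete candidate. With w=3 the writer learns about the missed replica and, in this model, retries.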

ASP.NET/SQL 2008 Performance issue

We've developed a system with a search screen that looks a little something like this:
(source: nsourceservices.com)
As you can see, there is some fairly serious search functionality. You can use any combination of statuses, channels, languages, campaign types, and then narrow it down by name and so on as well.
Then, once you've searched and the leads pop up at the bottom, you can sort the headers.
The query uses ROW_NUMBER() to do a paging scheme, so we only return something like 70 rows at a time.
The Problem
Even though we're only returning 70 rows, an awful lot of IO and sorting is going on. This makes sense of course.
This has always caused some minor spikes to the Disk Queue. It started slowing down more when we hit 3 million leads, and now that we're getting closer to 5, the Disk Queue pegs for up to a second or two straight sometimes.
That would actually still be workable, but this system has another area with a time-sensitive process (let's say for simplicity that it's a web service) that needs to serve up responses very quickly or it will cause a timeout on the other end. The Disk Queue spikes are causing that part to bog down, which is causing timeouts downstream. The end result is actually dropped phone calls in our automated VoiceXML-based IVR, and that's very bad for us.
What We've Tried
We've tried:
Maintenance tasks that reduce the number of leads in the system to the bare minimum.
Added the obvious indexes to help.
Ran the index tuning wizard in profiler and applied most of its suggestions. One of them was going to more or less reproduce the entire table inside an index so I tweaked it by hand to do a bit less than that.
Added more RAM to the server. It was a little low but now it always has something like 8 gigs idle, and the SQL server is configured to use no more than 8 gigs, however it never uses more than 2 or 3. I found that odd. Why isn't it just putting the whole table in RAM? It's only 5 million leads and there's plenty of room.
Pored over query execution plans. I can see that at this point the indexes seem to be mostly doing their job; about 90% of the work is happening during the sorting stage.
Considered partitioning the Leads table out to a different physical drive, but we don't have the resources for that, and it seems like it shouldn't be necessary.
In Closing...
Part of me feels like the server should be able to handle this. Five million records is not so many given the power of that server, which is a decent quad core with 16 gigs of RAM. However, I can see how the sorting part is causing millions of rows to be touched just to return a handful.
So what have you done in situations like this? My instinct is that we should maybe slash some functionality, but if there's a way to keep this intact that will save me a war with the business unit.
Thanks in advance!
Database bottlenecks can frequently be improved by improving your SQL queries. Without knowing what those look like, consider creating an operational data store or a data warehouse that you populate on a scheduled basis.
Sometimes flattening out your complex relational databases is the way to go. It can make queries run significantly faster, and make it a lot easier to optimize your queries, since the model is very flat. That may also make it easier to determine if you need to scale your database server up or out. A capacity and growth analysis may help to make that call.
Transactional/highly normalized databases are not usually as scalable as an ODS or data warehouse.
Edit: Your ORM may also support optimizations of its own that are worth looking into, rather than only looking at how to optimize the queries it sends to your database. Bypassing your ORM altogether for the reports could be one way to get full control over your queries and gain better performance.
Consider how your ORM is creating the queries.
If you're having poor search performance perhaps you could try using stored procedures to return your results and, if necessary, multiple stored procedures specifically tailored to which search criteria are in use.
Determine which ad-hoc queries will most likely be run, or limit the search criteria with stored procedures. Can you summarize data? Treat this app like a data warehouse.
Create indexes on each column involved in the search to avoid table scans.
Create fragments on expressions.
Periodically reorg the data and update statistics as more leads are loaded.
Put the temporary files created by queries (result sets) in ramdisk.
Consider migrating to a high-performance RDBMS engine like Informix OnLine.
Initiate another thread to start displaying N rows from the result set while the query continues to execute.
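The indexing suggestion above can be tried out on a small scale. The sketch below uses SQLite through Python's sqlite3 module purely as a stand-in, since the question is about SQL Server 2008, whose optimizer and plan output differ. The leads table, its columns and the index name are hypothetical. It asks the query planner how it would run a filtered, sorted, 70-row page with and without an index that covers both the filter column and the sort column.

```python
import sqlite3


def plan_for_paged_search(with_index: bool) -> str:
    """Return SQLite's query plan for a filtered, sorted, 70-row page."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE leads ("
            "id INTEGER PRIMARY KEY, status TEXT, name TEXT, created INTEGER)"
        )
        if with_index:
            # Filter column first, then the sort column.
            conn.execute("CREATE INDEX leads_status_created ON leads (status, created)")
        rows = conn.execute(
            "EXPLAIN QUERY PLAN "
            "SELECT id, name FROM leads WHERE status = ? ORDER BY created LIMIT 70",
            ("open",),
        ).fetchall()
        # The last column of each plan row is the human-readable detail.
        return " | ".join(str(row[-1]) for row in rows)
    finally:
        conn.close()
```

Without the index the plan contains a temporary B-tree for the ORDER BY, meaning every matching row is sorted before the first 70 are returned. With the index the rows are read in sorted order and the separate sort step disappears. Whether a comparable index removes the sort in the asker's SQL Server plans would need to be checked against the real execution plans.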
