AWS DynamoDB: there's a lot of literature around scan vs. query, using the proper GSI vs. LSI, etc., but nothing around how to actually analyze/profile whether your current query usage patterns are 'ideal' or 'less than ideal' (other than manually reviewing each one).
Are there any practices, dashboards, or logging I may be missing that would help collect which queries may be less than optimal across a wide range of Java apps, with enough info to tackle the problem (i.e. more than just 'it's slow, go figure it out')? :-)
Thanks!
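To make the kind of per-query signal I'm after concrete, here's roughly the sort of thing I mean (a boto3 sketch for brevity, even though our services are Java; the table, key, and metric window are invented): per-request consumed capacity plus the per-table CloudWatch metrics. What I'm missing is how to collect something like this systematically across many apps.

```python
import boto3
from boto3.dynamodb.conditions import Key
from datetime import datetime, timedelta, timezone

# Per-request: ask DynamoDB to report what each query actually costs.
table = boto3.resource("dynamodb").Table("Orders")          # hypothetical table
resp = table.query(
    KeyConditionExpression=Key("customer_id").eq("C123"),   # hypothetical key
    ReturnConsumedCapacity="TOTAL",
)
# Logging RCUs vs. items returned makes expensive access patterns stand out.
print(resp["ConsumedCapacity"]["CapacityUnits"], resp["Count"])

# Per-table: CloudWatch already records consumed capacity over time.
cw = boto3.client("cloudwatch")
now = datetime.now(timezone.utc)
stats = cw.get_metric_statistics(
    Namespace="AWS/DynamoDB",
    MetricName="ConsumedReadCapacityUnits",
    Dimensions=[{"Name": "TableName", "Value": "Orders"}],
    StartTime=now - timedelta(days=1),
    EndTime=now,
    Period=3600,
    Statistics=["Sum"],
)
print(stats["Datapoints"])
```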
After watching a few videos regarding DynamoDB and its best practices, I decided to give it a try; however, I can't help but feel that what I'm doing may be an anti-pattern. As I understand it, the best practice is to leverage as few tables as possible while also taking advantage of GSIs to do some of the 'heavy' lifting. Unfortunately, I'm working with a use case that doesn't actually have strictly defined access patterns yet, since we're still in early development.
Some early access patterns that we may see are:
Retrieve the number of wins for a particular game: rock paper scissors, boxing, etc. [1 quick lookup]
Retrieve the amount of coins a user has. [1 quick lookup]
Retrieve all the items that someone has purchased (don't care about date). [Not sure?]
Possibly retrieve all the attributes associated with a user (rps wins, box wins, coins, etc). [I genuinely don't know.]
Additionally, there may be cases where we need to complete two operations together. For example, if the user wins a particular game they may receive "coins". Effectively, we'll need to add coins to the user's "coins" attribute and update their number of wins for that game.
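For what it's worth, if both counters end up living on the same user item, I'm imagining that pair of operations as a single update rather than two writes. A boto3 sketch of the idea (table name, key schema, and attribute names are just guesses at what our model might look like):

```python
import boto3

table = boto3.resource("dynamodb").Table("Users")   # hypothetical table name

# One write: bump the coin balance and the per-game win count together.
table.update_item(
    Key={"user_id": "u123"},                         # hypothetical key schema
    UpdateExpression="ADD coins :c, rps_wins :w",    # ADD creates the attributes if missing
    ExpressionAttributeValues={":c": 25, ":w": 1},
)
```

If the counters end up on separate items, a transactional write would be the equivalent, but that's beyond this sketch.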
Do you think I should revisit this strategy? Additionally, we'll probably start creating 'logs' associated with various games and each individual play.
Designing a DynamoDB data model without fully understanding your application's access patterns is the anti-pattern.
Take the time to define your entities (Users, Games, Orders, etc.), their relationships to one another, and your application's key access patterns. This can be hard work when you are just getting started, but it's absolutely critical to do when working with DynamoDB. How else can we (or you, or anybody) evaluate whether or not you're using DDB correctly?
When I first worked with DDB, I approached the process in a similar way you are describing. I was used to working with SQL databases, where I could define a few tables and rely on the magic of SQL to support my access patterns as my understanding of the application access patterns evolved. I quickly realized this was not going to work if I wanted to use DynamoDB!
Instead, I started from the front-end of my application. I sketched out the different pages in my app and nailed down the most important concepts in my application. Granted, I may not have covered all the access patterns in my application, but the exercise certainly nailed down the minimal access patterns I'd need to have a usable app.
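To make that concrete, the output of the exercise can be as small as a list of access patterns next to the key condition that serves each one. A hypothetical sketch against the access patterns listed in the question (the single-table layout and key names here are mine, purely for illustration, not a recommendation):

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("game-app")   # hypothetical single table

# Assumed item layout (illustration only):
#   PK = "USER#<id>", SK = "PROFILE"          -> coins and other profile attributes
#   PK = "USER#<id>", SK = "WINS#<game>"      -> win count per game
#   PK = "USER#<id>", SK = "PURCHASE#<item>"  -> one item per purchase

# "Number of wins for rock paper scissors" -> one GetItem
table.get_item(Key={"PK": "USER#u123", "SK": "WINS#rps"})

# "Amount of coins a user has" -> one GetItem
table.get_item(Key={"PK": "USER#u123", "SK": "PROFILE"})

# "All the items someone has purchased" -> one Query on a sort-key prefix
table.query(
    KeyConditionExpression=Key("PK").eq("USER#u123") & Key("SK").begins_with("PURCHASE#")
)

# "All the attributes associated with a user" -> Query the whole partition
table.query(KeyConditionExpression=Key("PK").eq("USER#u123"))
```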
If you need to rapidly prototype your application to get a better understanding of your access patterns, consider using the skills you and your team already have. If you already understand data modeling with SQL databases, go with that for now. You can always revisit DynamoDB once you have a better understanding of your access patterns and determine that your application can benefit from using a NoSQL database.
I've researched what I can about SQLite and UnQLite, but there are still a few things that haven't quite been answered yet. UnQLite appears to have been released within the past few years, which would account for the lack of benchmarks. "Performance" (read/write speed, querying, avg. database size before significant slowdown, etc.) comparisons may be somewhat apples-to-oranges here.
From all that I have seen, the two have relatively few differences, the main one being that SQLite is a relational database whereas UnQLite is a key-value and document (via Jx9) database. They're both portable, cross-platform, and 32/64-bit friendly, and both allow a single writer alongside multiple readers. Very little can be found in the way of UnQLite benchmarks, while SQLite has quite a few, with different implementations across various (scripting) languages. SQLite's performance varies across in-memory databases, indexed data, and read/write modes with varying data sizes, but overall it appears quick and reliable.
What I can find on UnQLite is unreliable and confusing, and I cannot seem to find anything helpful. What read/write speeds does UnQLite seem to peak at? What languages are (not) recommended when using UnQLite? What are some known disadvantages and bugs?
If it helps at all to explain my interest: I'm developing a network utility that will be reading and processing packets, with hot-swapping between network interfaces. Since the connections can (though it's unlikely) reach speeds of up to 1 Gbps, there will be a lot of raw data being written out to a database. It's still in the early stages of development and I'm having to find a way to balance out performance. There are a lot of factors, such as missed packets, how large each write is, how quickly it can process and move data, how much organization will be required, how many tables will be needed, whether I can implement multiprocessing, how reliant each database is on HDD speeds, and so on. My data will need tables, but whether or not I have to store them relationally is still up in the air. Seeing how the two stack up with their own pros and cons (aside from the usual KVP vs. relational debate) may push me towards either one or, if I'm crazy enough, a mix of both.
I've done a bit of fooling around with UnQLite using Python bindings I wrote. The bindings use Cython and are quite fast.
What I've found from my experimentation is that UnQLite's key/value APIs are pretty damn fast, comparable to other DBMs. Things slow down a bit when you start using Jx9 and the document store, though.
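For reference, the kind of usage I was timing looks roughly like this with the `unqlite` Python package (the file name, keys, and fields below are arbitrary):

```python
from unqlite import UnQLite   # the Cython-based bindings mentioned above

db = UnQLite("demo.udb")      # arbitrary file; UnQLite() gives an in-memory database

# Plain key/value API -- this is the fast path, comparable to other DBM-style stores.
db["user:1:name"] = "alice"
print(db["user:1:name"])

# Document store (Jx9 under the hood) -- noticeably slower in my tests.
users = db.collection("users")
users.create()                # create the collection if it doesn't exist yet
users.store({"name": "alice", "score": 10})
print(users.all())
```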
Basically depends on what you need...
If you want SQL and ad-hoc querying, I'd suggest using SQLite. It is plenty fast and quite flexible (see the sketch after this list).
If you want just keys and values, I'd use something like leveldb or rocksdb.
If you want a lightweight JSON document store, or key/value with a bit "extra", then UnQLite may be a good fit.
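To give the SQLite suggestion some shape for the packet-logging use case above: the usual knobs are WAL mode, batching many inserts into one transaction, and indexing only what you actually filter on. A rough sketch (schema and numbers invented):

```python
import sqlite3

conn = sqlite3.connect("packets.db")
conn.execute("PRAGMA journal_mode=WAL")      # readers don't block the single writer
conn.execute("PRAGMA synchronous=NORMAL")    # trade a little durability for write speed

conn.execute("""CREATE TABLE IF NOT EXISTS packets (
                    ts REAL, iface TEXT, length INTEGER, payload BLOB)""")
conn.execute("CREATE INDEX IF NOT EXISTS idx_packets_ts ON packets(ts)")

# Commit captured packets in batches rather than one transaction per row.
batch = [(1700000000.0 + i, "eth0", 60, b"\x00" * 60) for i in range(1000)]
with conn:   # one transaction for the whole batch
    conn.executemany("INSERT INTO packets VALUES (?, ?, ?, ?)", batch)
```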
We have a collection that contains about 30K entities. When we query for a subset using a UUID of another entity in another collection, there is quite a delay (5-10 secs on avg.). Is there a way to optimize this? Would creating connections be faster? We are assessing Apigee as a potential back-end for millions of entities, so this is a huge problem for us. Any recommendations would be appreciated!
Cheers
Using connections is much more efficient (generally) than using queries. With queries, it is the number of search terms that can be a problem, not the number of entities in the collection. Large collection sizes shouldn't really impact performance. Please share an example query if you can - maybe we can help optimize?
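For illustration, the difference between the two approaches looks roughly like this over the REST API (sketched with Python's `requests`; the org, app, collection, relationship, and field names are all placeholders, and please check the API BaaS docs for the exact connection paths):

```python
import requests

BASE = "https://api.usergrid.com/my-org/my-app"       # placeholder org/app
AUTH = {"Authorization": "Bearer <access-token>"}     # placeholder token

# Query approach: filter one collection by a UUID stored on each entity.
requests.get(f"{BASE}/reviews",
             params={"ql": "select * where product_uuid = 'abc-123'"},  # placeholder field
             headers=AUTH)

# Connection approach: link the entities once, then just walk the connection.
requests.post(f"{BASE}/products/abc-123/reviews/review-uuid-1", headers=AUTH)  # create connection
requests.get(f"{BASE}/products/abc-123/reviews", headers=AUTH)                 # fetch connected entities
```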
Also, we are just putting the finishing touches on a major upgrade that will significantly improve performance of the API BaaS product. Ask your contact at Apigee for more info.
One last thing to consider is that our developer offering may not have the same performance as our paid products. Apigee strives to offer awesome performance on the developer tier, but thanks to the architecture and SLAs of our paid offerings, we can definitely cater to your needs and make sure you have the performance required.
I have an ASP.NET web application (.NET 2008) using MS SQL Server 2005. I want to increase the performance of the web site. Does anyone know of an article containing steps to do that, step by step, both in SQL (indexes, etc.) and in the code?
Performance tuning is a very specific process. I don't know of any articles that discuss directly how to achieve this, but I can give you a brief overview of the steps I follow when I need to improve performance of an application/website.
Profile.
Start by gathering performance data. At the end of the tuning process you will need some numbers to compare to actually prove you have made a difference. This means you need to choose some specific processes that you monitor and record their performance and throughput.
For example, on your site you might record how long a login takes. You need to keep this very narrow. Pick a specific action that you want to record and time it. (Use a tool to do the timing, or put some Stopwatch code in your app to report times.) Also, don't just run it once; run it multiple times. Try to ensure you know the whole environment setup so you can duplicate it again at the end.
Try to make this as close to your production environment as possible. Make sure your code is compiled in release mode, and running on real separate servers, not just all on one box etc.
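The rest of this answer is .NET-centric, but the measurement idea is language-agnostic. A minimal sketch of the kind of repeated, narrow timing I mean (Python here purely for brevity; a Stopwatch version has exactly the same shape, and `login_action` is hypothetical):

```python
import time
import statistics

def measure(action, runs=20):
    """Time one narrow action several times and report a few robust numbers."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        action()
        samples.append(time.perf_counter() - start)
    return {"min": min(samples),
            "median": statistics.median(samples),
            "max": max(samples)}

# e.g. measure(login_action, runs=50), where login_action drives a real
# login against an environment as close to production as you can get.
```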
Instrument.
Now that you know what action you want to improve, and you have a target time to beat, you can instrument your code. This means injecting (manually or automatically) extra code that times each method call, or each line, and records times and/or memory usage right down the call stack.
There are lots of tools out there that can help you with this and automate some of it (Microsoft's CLR Profiler (free), Redgate ANTS (commercial), the higher editions of Visual Studio have profiling built in, and loads more). But you don't have to use automated tools; it's perfectly acceptable to just use the Stopwatch class to time each block of your code. What you are looking for is a bottleneck. The likelihood is that you will find a high proportion of the overall time is spent in a very small bit of code.
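If you go the manual route, even something this small gives you a per-block breakdown that usually makes the bottleneck obvious (again a Python sketch of the Stopwatch-per-block idea; the block names and sleeps are stand-ins for real work):

```python
import time
from collections import defaultdict
from contextlib import contextmanager

totals = defaultdict(float)

@contextmanager
def timed(block_name):
    """Accumulate wall-clock time per named block of code."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[block_name] += time.perf_counter() - start

# Wrap the suspicious blocks, run the scenario, then look at the totals.
with timed("load_user"):
    time.sleep(0.01)    # stand-in for the real work
with timed("render_page"):
    time.sleep(0.03)

print(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))
```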
Tune.
Now you have some timing data, you can start tuning.
There are two approaches to consider here. Firstly, take an overall perspective: consider whether you need to redesign the whole call stack. Are you repeating something unnecessarily? Or are you just doing something you don't need to?
Secondly, now that you have an idea of where your bottleneck is, you can try to figure out ways to improve that bit of code. I can't offer much advice here, because it depends on what your bottleneck is, but look to optimise it. Perhaps you need to cache data so you don't have to loop over it twice. Or batch up SQL calls so you only make one. Or tighten your query filters so you return less data.
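As one concrete example of the "batch up SQL calls" point: turn a query-per-item loop into a single round trip. Sketched against SQLite so it runs anywhere, but the shape is identical on SQL Server (table and data invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob"), (3, "cat")])

wanted = [1, 3]

# Before: one round trip per id.
slow = [conn.execute("SELECT name FROM users WHERE id = ?", (i,)).fetchone()
        for i in wanted]

# After: one round trip for the whole batch.
placeholders = ",".join("?" for _ in wanted)
fast = conn.execute(f"SELECT id, name FROM users WHERE id IN ({placeholders})",
                    wanted).fetchall()
print(slow, fast)
```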
Re-profile.
This is the most important step that people often miss out. Once you have tuned your code, you absolutely must re-profile it in the same environment that you ran your initial profiling in. It is very common to make minor tweaks that you think might improve performance and actually end up degrading it because of some unknown way that the CLR handles something. This is much more common in managed languages because you often don't know exactly what is going on under the covers.
Now just repeat as necessary.
If you are likely to be performance tuning often I find it good to have a whole batch of automated performance tests that I can run that check the performance and throughput of various different activities. This way I can run these with every release and record performance changes each release. It also means that I can check that after a performance tuning session I know I haven't made the performance of some other area any worse.
When you are profiling, don't always just think about the time to run a single action. Also consider profiling under load, with lots of users logged in. Sometimes apps perform great when there's just one user connected, but when they hit a certain number of users the whole thing suddenly grinds to a halt, perhaps because they are spending more time context switching or swapping memory in and out to disk. If it's throughput you want to improve, you need to figure out what is causing the limit on throughput.
Finally. Check out this huge MSDN article on Improving .NET Application Performance and Scalability. Specifically, you might want to look at chapter 6 and chapter 17.
I think the best we can do from here is give you some pointers:
query less data from the sql server (caching, appropriate query filters)
write better queries (indexing, joins, paging, etc.; see the sketch below)
minimise any inappropriate blockages such as locks between different requests
make sure session-state hasn't exploded in size
use bigger metal / more metal
use appropriate looping code etc
But to stress: from here, anything is guesswork. You need to profile to find the general area of the suckage, and then profile more to isolate the specific area(s); but start by looking at:
sql trace between web-server and sql-server
network trace between web-server and client (both directions)
cache / state servers if appropriate
CPU / memory utilisation on the web-server
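To make the "query less data / paging" pointers above concrete: push the filter and the page size into the query itself rather than pulling whole result sets back to the web server. A rough sketch (SQLite so it runs as-is; on SQL Server 2005 the paging part is typically done with ROW_NUMBER() rather than LIMIT/OFFSET, and the table is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany("INSERT INTO orders(customer, total) VALUES (?, ?)",
                 [("ann", i * 1.5) for i in range(500)])

page, page_size = 3, 20

# Filter and page on the database side; return only what the page needs.
rows = conn.execute(
    """SELECT id, total FROM orders
       WHERE customer = ?
       ORDER BY id
       LIMIT ? OFFSET ?""",
    ("ann", page_size, (page - 1) * page_size),
).fetchall()
print(len(rows), rows[:3])
```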
I think first of all you have to find your bottlenecks and then try to improve those.
This helps you focus your effort exactly where you have a serious problem.
In addition, you need to improve your connection handling to the DB, for example by using a lazy/singleton pattern for the connection, and by creating batch requests instead of single requests.
That helps you reduce the number of DB connections and round trips.
Check your caching and use suitable loop structures.
Another thing is to use appropriate types; for example, if you need an int, don't create a long, and so on.
At the end, you can use a profiler (especially on the SQL side) and check whether your queries are implemented as well as possible.
I have considered SQLite, but from what I've read, it is very unstable at sizes bigger than 2 GB. I need a database that in theory can grow up to 10 GB.
It would be best if it were stand-alone, since that is easier to roll out for non-techie users, instead of having the extra step of installing something like MySQL, which would most likely require assistance.
Any recommendations?
SQLite should handle your file sizes just fine. The only caveat worth mentioning is that SQLite is not suitable for highly concurrent environments, since the entire database file is exclusively locked during writes.
So if you are writing an application that needs to handle several users concurrently, a better choice would be PostgreSQL.
I believe SQLite will actually work fine for you with large databases, especially if you index them appropriately. Considering SQLite's popularity it seems unlikely that it would have fundamental bugs.
I would suggest that you revisit the decision to rule out SQLite, and you might try to compensate for the selection bias of negative reports. That is, people tend to publicize bug reports, not non-bug reports, and if SQLite were the most popular embedded database then you might expect to see more negative experiences than with less popular packages even if it were superior.
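One cheap way to check that it's indexing, and not SQLite itself, that decides how a multi-GB file behaves: look at the query plan for the queries you care about before and after adding an index. For example, with the standard sqlite3 module (schema invented):

```python
import sqlite3

conn = sqlite3.connect("big.db")   # hypothetical multi-gigabyte database
conn.execute("CREATE TABLE IF NOT EXISTS records "
             "(id INTEGER PRIMARY KEY, customer TEXT, created REAL)")

# Without an index on `customer`, the plan reports a full table scan.
print(conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM records WHERE customer = ?", ("ann",)
).fetchall())

conn.execute("CREATE INDEX IF NOT EXISTS idx_records_customer ON records(customer)")

# With the index, the plan should report a search using idx_records_customer.
print(conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM records WHERE customer = ?", ("ann",)
).fetchall())
```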