I have a CDE dashboard developed with Pentaho CE 6.0.1 with 4 line charts. My plugins are updated.
Our dataset is a SQL Server database with millions of records (nearly 80 million).
The BA Server and the database both run in a cloud environment, and the queries themselves complete in a few seconds when run directly against the database engine.
The dashboard performance is very poor; it takes almost 2 minutes to fully load. I tried removing 3 of the charts and leaving only the main one, but the load times didn't change at all.
Do you think I need a big data approach for this amount of data?
Could you please give me some performance tuning tips to improve this behavior?
Thanks a lot for any help!
With SQL Server you will face this problem. Workarounds: you can increase the RAM, or you can enable caching so that only the first load is slow and subsequent loads are smooth, but again there is no concrete solution.
Either you have to change to a columnar database like HP Vertica, or you can make changes at the architecture level: create a data warehouse and store only the necessary data.
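A minimal sketch of that architecture-level change, assuming a hypothetical fact table and a dashboard that filters by date and region (none of these table or column names come from the original setup): pre-aggregate the 80 million rows into a small summary table that the CDE data sources query instead.

```sql
-- One-time setup: a small summary table for the dashboard to query.
-- Table and column names are illustrative assumptions.
CREATE TABLE dbo.SalesDailySummary (
    SaleDate    date          NOT NULL,
    Region      varchar(50)   NOT NULL,
    TotalAmount decimal(18,2) NOT NULL,
    TxnCount    int           NOT NULL,
    CONSTRAINT PK_SalesDailySummary PRIMARY KEY (SaleDate, Region)
);

-- Nightly refresh (e.g. a SQL Server Agent job): rebuild the summary from the
-- large fact table so the dashboard never scans the 80-million-row table.
TRUNCATE TABLE dbo.SalesDailySummary;
INSERT INTO dbo.SalesDailySummary (SaleDate, Region, TotalAmount, TxnCount)
SELECT CAST(SaleDateTime AS date), Region, SUM(Amount), COUNT(*)
FROM dbo.SalesFact
GROUP BY CAST(SaleDateTime AS date), Region;
```

The charts then read a few thousand pre-aggregated rows instead of scanning millions on every load.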
I am using the PT859 and PT858 tools, and while importing DB file data into the database, my VM is taking double the usual time. I am using a VM with the configuration below:
Windows Server 2016 Standard
Processor: Intel(R) Xeon(R) CPU E5-2699C v4 @ 2.20GHz 2.19 GHz
RAM: 15GB
Please guide me if any of you have gone through similar problems.
Thanks in advance.
Data Mover scripts that take a long time to execute have many parameters to look into for better performance.
Generate the AWR report at the time of the script execution and have it reviewed by the DBAs. They will suggest the best actions to take.
Check the ASH report while the script is running.
ADDM is self-explanatory; the recommendations are given along with the improvement you would achieve after making the changes.
There are some general points you could look into; there are many other parameters as well, like:
1. The script must be written properly. Bad scripting causes delays as well.
2. Use indexes where possible.
3. Check whether the database itself is slow or performing well. If database performance is already low, queries may take longer.
4. While your query is running, check parameters such as disk I/O, swap utilization, and memory and CPU utilization of the database server. These should not hit their maximums.
5. Generate the execution plan of the query (see the sketch after this list); you will see which part of the query is taking the longest and can take action accordingly.
6. Run the script during off-business hours and check how long it takes. If it takes less time, there must be load on the database when you normally run the scripts (assuming it's a production database).
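For points 2 and 5, a minimal sketch in Oracle SQL (AWR, ASH, and ADDM imply an Oracle database behind the Data Mover scripts); the table, column, and index names are made up for illustration:

```sql
-- Capture and display the execution plan for a slow statement.
EXPLAIN PLAN FOR
SELECT * FROM staging_rows WHERE batch_id = 42;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- If the plan shows a full table scan filtered on a selective column,
-- an index on that column may help.
CREATE INDEX staging_rows_batch_ix ON staging_rows (batch_id);
```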
We have a system running Windows Server 2008 R2 x64 and SQL Server 2008 R2 x64 with SSRS installed/configured. This is a shared reporting server used by a large number of people, with some fairly large, inefficient databases (400-500 GB of data, ish), and these users use the system to generate ad-hoc reports based off a reporting model that sits on top of the aforementioned databases. Note that the users are using NTLM to log on and identify themselves for running reports.
Most reports are quick, but if you are running a report for 1 or 2 years' worth of data, they can take a while to return (5 minutes, ish). This is fine for most users; however, some of the users are stuck behind a proxy, which has a connection timeout set at 2 minutes. As SSRS 2008 R2 does not seem to send back a "keep-alive" signal (confirmed via Wireshark), when running one of these long reports the proxy server thinks the connection has died, so it just gives up and kills the connection. This gives the user a 401 or 503 error and obviously cancels the report (the incorrect error is a known bug in SSRS which Microsoft refuses to fix).
We're getting a lot of flak from the users about this, even though it's not really our issue... so I am looking for a creative solution.
So far I have come up with:
1) Discovering some as-yet-unknown setting for SSRS that can make it keep the connection alive.
2) Installing our own proxy between the users and our reports server, one which WILL send a keep-alive back (not sure this will work, and it's a bit hacky; just thinking creatively!)
3) Re-writing our reports databases to be more efficient (yes, this is the best solution, but also incredibly expensive)
4) Asking the experts :) :)
We have a call booked in with Microsoft Support to see if they can help - but can any experts on Stack help out? I appreciate that this may be a better question for server fault (and I may post it there) but it's a development question too really :)
Thanks!
A few things:
A. For SSRS overall, on its service:
I personally use a keep-alive service, as I believe the default recycle is 12 hours for the SSRS server. I use a tool someone turned me onto called 'VisualCron' that can run many task processes automatically. You can also just make a call from a WCF service or similar. Basically, I know the first report from a user for the day is generally slow. Usually you need to hit http:// (servername)/ReportServer to keep it alive.
B. For caching report-level items:
If this does not help, I would suggest caching datasets when possible. Some people have data that must be up to the moment, but for a lot of people that is not the case. You can create a shared dataset in SSRS and then cache it on a schedule. So if you have domain-like tables that only need to be updated once in a blue moon, put them there. The same goes for data that is loaded nightly or in batches. If you are a transaction-based shop that needs up-to-the-moment data this may not help, but for batch-based businesses it can help tremendously.
You can also cache the reports' data as a continuation of this. Under the 'Manage' drop-down for a report on the /Reports landing page, you can set the data to run under a specific schedule. You can also set a snapshot, which is an extension of this: it executes on a schedule with some default parameters set and is a copy of the report as of when it was run.
You mention ASP.NET, so I am not certain how much of this will work if you are doing it all through a site you are setting up internally as a pass-through. But you could also email or save files on a schedule through SSRS's subscription service.
C. Change how you store your data for reporting.
You can create a 'Report Warehouse' of selected item-level values from your queries. Create a small database that holds just a few recent years of data and only certain fields from certain tables. Then index it to death and report off of that. In my experience this method will fly in terms of performance, but it does take the extra overhead of setting it up. Generally most companies will whine about this, but it often takes a single day to set up; then you create one SQL Server Agent job or SSIS package that does it all nightly and you don't worry about it. I like this method as I know my data is not being reported off of production and is isolated.
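A minimal sketch of that nightly refresh, assuming hypothetical source and warehouse databases and column names (nothing here comes from the original system): copy only recent rows and only the columns the reports need, then index them.

```sql
-- One-time setup: "index it to death" for the common report filters.
CREATE INDEX IX_OrdersSlim_OrderDate ON ReportWarehouse.dbo.OrdersSlim (OrderDate);
CREATE INDEX IX_OrdersSlim_Customer  ON ReportWarehouse.dbo.OrdersSlim (CustomerID, OrderDate);

-- Nightly job: reload the slim copy with just the last two years and only the
-- handful of columns the reports actually use.
TRUNCATE TABLE ReportWarehouse.dbo.OrdersSlim;

INSERT INTO ReportWarehouse.dbo.OrdersSlim (OrderID, CustomerID, OrderDate, Amount)
SELECT o.OrderID, o.CustomerID, o.OrderDate, o.Amount
FROM Production.dbo.Orders AS o
WHERE o.OrderDate >= DATEADD(YEAR, -2, GETDATE());
```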
I currently have a problem where the performance of my database is impacted by several million rows of updates that run (it takes more or less 3 days, so we usually run them over a weekend).
However, since the site is live, search performance is impacted. A query that normally takes 3 seconds to pull 1.3 million records and page through them sometimes exceeds SQL Server's default timeout values. This obviously creates a user experience no one wants (or can afford) to have happen.
My question now: if I set up replication from the Master to a Slave on the same server, would I be able to point the website to the Slave and avoid that performance impact? Or would it just duplicate the same problem, since the Master will push any updates through to the Slave anyway?
I don't think replication is going to help you here, it is only going to make things on the source system worse IMHO.
Is it possible you can make a static copy of the data for the users running queries while the updates are going on? For reporting solutions that don't need to be up-to-the-minute, I've done this in several cases using two schemas - one that holds the static versions of the tables for querying, and one where the work is being done; when the work is done, switch. I go into this methodology in a little more detail here: What is the best way to refresh a rollup table under load?
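A minimal sketch of that two-schema switch, with assumed schema and table names (live, staging, holding, and Orders are placeholders, not from the original answer): rebuild the table in a staging schema while the updates run, then swap it in with metadata-only transfers.

```sql
-- Assumes the Orders table exists in both the live and staging schemas, and an
-- empty holding schema is available for the swap. All names are illustrative.
BEGIN TRANSACTION;
    ALTER SCHEMA holding TRANSFER live.Orders;     -- move the current copy aside
    ALTER SCHEMA live    TRANSFER staging.Orders;  -- promote the freshly built copy
    ALTER SCHEMA staging TRANSFER holding.Orders;  -- recycle the old copy for next time
COMMIT TRANSACTION;
```

Because the transfers are metadata operations, queries against the live schema only need to block briefly at switch time rather than for the whole update window.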
Perhaps another thought is to make your updates more efficient, such that they don't take 3 days? Do you only do this on long weekends?
The question is: what is the nature of your site? Do users use it just to search, or does it do CRUD operations? If it is just for search and report generation, then I agree with @Aaron. You can have a separate database just for reporting purposes; you can even use log shipping to automatically update your reporting database at a very brief interval.
Is it possible that users can change data at the same time as records are being updated by your update process? In that case, you will have to update your primary database using your update job, and then update the primary database again with the changes made by users against the slave database.
I am working with an application where we store our client data in separate SQL databases for each client. So far this has worked great; there was even a case where some bad code selected the wrong customer IDs from the database, and since the only data in the database belonged to that client, the damage was not as bad as it could have been. My concern is about the number of databases you can realistically have on a SQL Server.
Is there any additional overhead for each new database you create? Will we eventually hit a wall where we have just too many databases on one server? The SQL Server specs say you can have something like 32,000 databases, but is that realistic? Does anyone have a large number of databases on one server, and what problems did you encounter?
Thanks,
Frank
The upper limits are
disk space
memory
maintenance
Examples:
Rebuilding indexes for 32k databases? When?
If 10% of 32k databases each has an active set of 100MB of data in memory at one time, you're already at 320GB of target server memory
knowing what DB you're connected to
...
The effective limit depends on load, usage, database size etc.
Edit: And bandwidth, as Wyatt Barnett mentioned... I forgot about the network, the bottleneck everyone forgets about...
The biggest problem with multiple databases is keeping them all in sync as you make schema changes. As for the realistic number of databases you can have and still have the system work well, as usual, it depends. It depends on how powerful the server is and how large the databases are. Likely you would want multiple servers at some point, not just because it will be faster for your clients but because it will put fewer clients at risk at one time if something happens to a server. At what point that is, only your company can decide. Certainly if you start getting a lot of timeouts, another server might be indicated (or fixing your poor queries might also do it). Big clients will often pay a premium to be on a separate server, so consider that in your pricing. We had one client so paranoid about their data that we had to have a separate server that was not even co-located with the other servers. They paid big bucks for that, as we had to rent extra space.
ISPs routinely have one database server that is shared by hundreds or thousands of databases.
Architecturally, this is the right call in general. You've seen the first huge advantage--oftentimes, damage can be limited to a single client and you have near zero risk of a client getting into another client's data. But you are missing the other big advantage--you don't have to keep all the clients on the same database server. When you do get big enough that your server is suffering, you can offload clients onto another box entirely with minimal effort.
I'd also bet you'll run out of bandwidth to manage the databases before your server runs out of steam to handle more databases . . .
What you are really asking about is scalability. Setting up 32,000 databases on one server is possible, though probably not advantageous and not recommended.
Read - http://www.sql-server-performance.com/articles/clustering/massive_scalability_p1.aspx
I know this is an old thread, but it's the same structure we've had in place for the past 2 years, and we currently run 1,768 databases over 3 servers.
We have the following setup (not including mirrors and so on):
2 web farm servers and 4 content servers
A SQL instance just for a master database of customers, which is queried by ID when they access their webpage to get the server/instance and database name their data resides on. This is then stored in the authentication ticket.
3 SQL servers to host the customer databases, with load spread on creation based on the current total number of learners that exist within all databases on each server (quickly calculated from the license number field in the master database; see the sketch after this list).
On each SQL server there is a smaller master database set up which contains shared static data used by all clients, allowing smaller client databases and quicker updating of the content.
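A minimal sketch of that creation-time load spreading, assuming a hypothetical Customers table in the master database with ServerName and LicenseNumber columns (the names are placeholders, not the actual schema):

```sql
-- Pick the SQL server currently carrying the fewest licensed learners;
-- the new customer database gets created there.
SELECT TOP (1) ServerName
FROM dbo.Customers
GROUP BY ServerName
ORDER BY SUM(LicenseNumber) ASC;
```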
The biggest thing, as mentioned above, is keeping the database structures synchronised! For this I ended up programming a small .NET Windows Forms tool that looks up all customers in the master database; you paste in the code to execute, and it loops through, getting each database's location and executing the SQL you pasted.
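The core of that tool can be sketched in plain T-SQL; the Customers table, its columns, and the example ALTER are all assumptions for illustration (and this pass only covers databases on the server it runs on, so each of the 3 servers would need its own run):

```sql
-- Loop through every customer database registered in the master database and
-- run the same change script against each one.
DECLARE @db sysname, @sql nvarchar(max);

DECLARE customer_dbs CURSOR FOR
    SELECT DatabaseName FROM dbo.Customers WHERE ServerName = @@SERVERNAME;

OPEN customer_dbs;
FETCH NEXT FROM customer_dbs INTO @db;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = N'USE ' + QUOTENAME(@db) + N';
        ALTER TABLE dbo.Learners ADD LastLoginDate datetime NULL;';  -- the pasted change
    EXEC (@sql);
    FETCH NEXT FROM customer_dbs INTO @db;
END

CLOSE customer_dbs;
DEALLOCATE customer_dbs;
```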
Creating new customers also caused some issues for us, so I ended up programming a management system for our sales people. It creates a new database based on a backup of an inactive "blank" database, so we always have the latest DB without needing to re-script the entire database creation script. It then inserts the customer details into the master database, with the location where the database was created, and migrates any old data from an old version of our software. All this is done on a separate instance before moving, which reduces any SQL locks.
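Restoring the blank template as a new customer database might look roughly like this; the database name, file names, paths, and master-table columns are invented for the sketch:

```sql
-- Create the new customer database from the maintained "blank" template backup.
RESTORE DATABASE [Customer_1769]
FROM DISK = N'D:\Backups\Blank_Template.bak'
WITH MOVE N'Blank_Data' TO N'D:\Data\Customer_1769.mdf',
     MOVE N'Blank_Log'  TO N'D:\Logs\Customer_1769.ldf',
     RECOVERY;

-- Register it in the master customer database so logins can be routed to it.
INSERT INTO dbo.Customers (CustomerName, ServerName, DatabaseName, LicenseNumber)
VALUES (N'New Customer Ltd', N'SQL02', N'Customer_1769', 250);
```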
We are now moving to a single database for the next version of our software, as database redundancy is near impossible with so many databases! This is a huge thing to consider: SQL creates a couple of waiting tasks per database to mirror your data, and once you start multiplying the databases it gets out of hand; the system becomes almost solely tasked with synchronising and can lock up due to the sheer number of threads. See page 30 of the Microsoft document below:
SQLCAT's Guide to High Availability Disaster Recovery.pdf
I do, however, have doubts about moving to a single database, due to some of the concerns mentioned above, such as constantly checking in every single procedure that the current customer has access only to their own data, and the fact that one little issue will now affect every single customer, such as table indexing and so on. Also, at the minute our customers are spread over 3 servers, but a single database will mean that yes, we have redundancy, but if the error is within the database rather than a server going down, then that's every single customer down, not just 1 customer database.
All in all, it depends what you're doing and whether you want the redundancy; for me, the redundancy is now key, and everything else in a perfect world shouldn't happen (such as an error which causes problems within the database for everyone). We only started off expecting a hundred or so to move to the system from the old self-hosted software, and that quickly turned into 200, 500, 1,000, 1,500... We now have over 750,000 users using our system each year, and in August/September we have over 15,000 concurrent users online (expecting to hit 20,000 this year).
Hope that this is of help to someone along the line :-)
Regards
Liam
I run a medium-sized website on an ASP.NET platform, using MS SQL Server to store the data.
My current site stats are:
~ 6000 Page Views a day
~ 10 tables in the SQL server with around 1000 rows per table
~ 4 queries per page served
The hosting machine has 1GB RAM
I expect by the end of 2009 to hit around:
~ 20,000 page views
~ 10 tables and around 4000 rows per table
~ 5 queries per page served
My question is: should I plan for scalability right now? Will the machine hold up until the end of the year with the expected stats?
I know my description is very top level and does not provide insight into the kind of queries etc. But just wanted to know what your gut instinct tells you?
Thanks!
You should always plan for scalability. When to put resources into doing the actual scaling is usually the tough guess.
Will the machine hold up until the end of the year
Way too little information to answer this. If a page request takes 30 CPU seconds to process due to massive interaction with a legacy enterprise application through the four queries per page, then there's no way. If it's taking minuscule fractions of a second to serve some static content stored in the cache, and your queries are only executed every half hour to refresh the content, then you're good until 2020 at the rate of traffic growth you describe.
My guess is that you're somewhere closer to the latter scenario. 20,000 page hits a day is not really a ton of traffic, but you'll need to benchmark your page and server performance at some point so that you can make the calculations you need.
Things to look at for scaling your site when it is time:
Output Caching
Optimizing Viewstate
Using Ajax where appropriate
Session optimization
Request, script, css and html minification
Two years ago I saw a relatively new (for two years ago) laptop running IIS and serving up 1100 to 1200 simple dynamic page requests per second. It had been set up by a consulting firm whose business was optimizing ASP.Net websites, but it goes to show you how much you can do.
Essentially, by the end of 2009, you expect to do 100,000 SQL queries per day. This is about 1.157 queries per second.
I am making the assumption that your configuration is "normal" (i.e. you're not doing something funky and these are pretty straightforward SELECT, UPDATE, INSERT, etc), and that your server is running RAID disks.
At 4,000 rows per table this is nothing to SQL server. You should be just fine. If you wanted to be proactive about it, put another stick of RAM in the server and bring it up to at least 2GB, that way IIS and SQL have plenty of memory (SQL will certainly take advantage of it).
The hosting machine? Does this mean that you have IIS and SQL installed on the same box, or IIS on your host machine with a dedicated SQL Server provided by your hosting company? Either way, I would suggest starting to look at how you might implement a caching layer to minimize the hits (where possible) to the database. Once this is PLANNED (not necessarily implemented), I would then start to look at how you might build a caching layer around your output (things built in ASP.NET).

If you see a clear and easy path to building caching layers, then this is a quick and easy way to start minimizing requests to the database and work on your web server. I suggest that this cache layer be flexible (read: not using anything provided by .NET!). Currently I still suggest using MemCached Win32. You can install it on your one hosted local box easily and configure your cache layer to use local resources (add memory... 1GB is not enough).

Then, if you find that you really need to squeeze every little bit of performance out of your system, splurge for a second box. Split your cache between your current box and the new box (allowing you to keep more in cache). This will give you some room (and time) to grow. Offloading more to cache should help address any future spikes, and with the second box you can now also focus on making your site work in a farmed environment. If you are using local session, push that into your cache layer so that a request from one box or another won't matter (standard session is local to the box that it is managed on).
This is a huge subject...so without real details this is all speculation of course! You might be just right for adding better and more hardware to the existing installation.
Have you tried setting up a quick performance test using sample data? 20,000 page views is less than one/sec (assuming even distribution over 8 hours), which is pretty minimal given your small tables. Assuming you're not sending a ton of data with each page view (i.e. a data table with all 1000 rows from one of your tables), you are likely OK.
You may need to increase RAM, but other than running a performance test I wouldn't worry too much about performance right now.
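A rough sketch of such a performance test at the projected scale; the table, columns, and query are hypothetical placeholders, not the site's actual schema:

```sql
-- Hypothetical test table, sized roughly like one of the real tables.
CREATE TABLE dbo.TestArticles (
    ArticleID int           NOT NULL PRIMARY KEY,
    Title     varchar(100)  NOT NULL,
    Body      varchar(2000) NOT NULL
);

-- Fill it with ~4,000 rows of roughly 2 KB each (the projected year-end size).
;WITH n AS (
    SELECT TOP (4000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS i
    FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b
)
INSERT INTO dbo.TestArticles (ArticleID, Title, Body)
SELECT i,
       'Sample title ' + CAST(i AS varchar(10)),
       REPLICATE('x', 2000)   -- pad each row to roughly 2 KB
FROM n;

-- Time one of the queries a page would actually run.
SET STATISTICS TIME ON;
SELECT TOP (20) ArticleID, Title FROM dbo.TestArticles ORDER BY ArticleID DESC;
SET STATISTICS TIME OFF;
```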
I don't think the load you are describing would be too much of a problem for most machines. Of course it doesn't just depend on the few metrics you outlined but also on query complexity, page size, and a heap of other things.
If you worry about scalability do some load testing and see how your site handles, say 10000 page views per hour (about 3 views per second). It's mostly always good to plan ahead as long as you plan for probable scenarios.
Guts say: given 10 tables with 4,000 rows each, and assuming about 2KB of data per row, that's only 80MB for the entire database. Easily cached within the memory available. Assuming everything else about the application is equally simple, you should be able to easily serve hundreds of pages per second.
Engineers say: If you want to know, stress test your application.