Oracle PeopleSoft - Data Mover script taking too much time during database import

I am using PT859 and PT858 tools, and while importing the DB file data into the database, my VM is taking double the usual time. The VM has the following configuration:
Windows Server 2016 Standard
Processor: Intel(R) Xeon(R) CPU E5-2699C v4 @ 2.20GHz 2.19 GHz
RAM: 15GB
Please guide me if any of you have run into similar problems.
Thanks in advance.

When Data Mover scripts take a long time to execute, there are many parameters to look into to improve their performance.
Generate an AWR report for the time window of the script execution and get it reviewed with the DBAs; they will suggest the best actions to take.
Check the ASH report while the script is running.
ADDM is self-explanatory: the recommendations are given along with the improvement you would achieve after making the changes.
There are some general points you could look into; there are many other parameters as well:
1. The script must be written properly; bad scripting causes delays as well.
2. Use indexes where possible.
3. Check whether the database itself is slow or performing well. If database performance is already poor, queries will take longer.
4. While your query is running, check parameters such as disk I/O, swap utilization, and the memory and CPU utilization of the database server; these should not be hitting their maximums.
5. Generate the execution plan of the query to see which part is taking the most time, and take action accordingly (see the sketch after this list).
6. Run the script during off-business hours and check how long it takes. If it takes less time, there must be load on the database when you normally run the scripts (assuming it's a production database).
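For point 5, a minimal sketch of generating an execution plan on the Oracle side (the table and predicate below are placeholders for whichever statement in the import is slow):
-- Hypothetical slow statement from the import; substitute the real one.
EXPLAIN PLAN FOR
SELECT * FROM PS_SOME_TABLE WHERE EMPLID = '12345';
-- Display the plan the optimizer chose.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
The AWR report mentioned above is typically generated from SQL*Plus with @?/rdbms/admin/awrrpt.sql, choosing snapshots that cover the import window.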

Related

What can I do to speed up my load test using NBomber? (VS LT reaches 250 RPS easily; NBomber maxes out at 25 RPS)

We've been using Visual Studio Load Test to exercise our .NET Framework 4.7.2 telemetry client, where we can set up the load test to post metrics to our RabbitMQ at a rate of about 250 metrics per second. Recently, we've had to migrate our telemetry client to .NET Core and need to run load testing to verify that it can still post metrics at the same rate. Now, Visual Studio Load Test (VSLT) is being deprecated and has no support for .NET Core, so we've had to look at something like NBomber to use in place of VSLT.
With regards to NBomber, there doesn't seem to be enough documentation or support available; I've tried everything I know and cannot get NBomber to post more than 25 metrics per second. At the same time, I'm seeing 100% CPU usage.
Does anyone have any insight to share? Thanks in advance for your help,
Tien
Turns out, my logic was bad. A senior developer and friend shared some insights with me: I was initializing a telemetry client for each posting of a metric. This was the cause of the high CPU consumption and was preventing me from reaching the performance I was expecting. I'm in the process of re-coding my test(s) so that NBomber can be used to initialize 250 telemetry clients posting a minimum of 1MM metrics within an hour. I ran a fix yesterday that posted 17K metrics within 56 seconds with just 1 telemetry client, or a rate of about 300 RPS. I thought VS LT was awesome, but I'm thinking NBomber is quite impressive.
Cheers to Load Testing with NBomber!!
Tien
If a single instance of NBomber is consuming 100% of the CPU and not producing the necessary load, you will need to set up another machine and run NBomber in distributed cluster mode.
Why do you need a cluster?
You have reached the point where the capacity of one node is not enough to create the relevant load.
You want to delegate running multiple scenarios to different nodes. For example, you want to test the database by sending read and write queries in parallel; in this case, one node can send inserts and another one can send read queries.
You want to simulate a real production load that requires several nodes to participate. For example, you may have one node that periodically writes data to the Kafka broker and two nodes that constantly read this data from the Redis cache.
Also, it seems that Microsoft recommends using Apache JMeter™, so it might be worth giving it a try. JMeter is capable of sending messages to various MQ implementations, and its documentation is more concise; see, for example, Building a JMS Topic Test Plan.

Snowpipe vs Airflow for continuous data loading into Snowflake

I have a question related to Snowflake. In my current role, I am planning to migrate data from ADLS (Azure Data Lake Storage) to Snowflake.
I am currently looking at two options:
Creating a Snowpipe to load the updated data
Creating an Airflow job for the same
I am still trying to understand which will be the better approach and what the pros and cons of each choice are.
It depends on what you are trying to do as part of this migration. If it is a plain-vanilla (no transformations, no complex validations) as-is migration of data from ADLS to Snowflake, then you may be fine with Snowpipe (but please also check whether your scenario is better suited to Snowpipe or bulk COPY - https://docs.snowflake.com/en/user-guide/data-load-snowpipe-intro.html#recommended-load-file-size).
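For the plain-vanilla case, a minimal Snowpipe sketch might look like the following (the database, schema, stage, notification integration, target table, and file format are all assumptions for an ADLS auto-ingest setup):
-- Assumed objects: mydb.raw.events (target table), @mydb.raw.adls_stage
-- (external stage on ADLS) and an Azure Event Grid notification integration.
CREATE OR REPLACE PIPE mydb.raw.adls_events_pipe
  AUTO_INGEST = TRUE
  INTEGRATION = 'ADLS_NOTIFICATION_INT'
AS
COPY INTO mydb.raw.events
FROM @mydb.raw.adls_stage/events/
FILE_FORMAT = (TYPE = 'JSON');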
If you have many steps before you move the data to Snowflake, and there is a chance you may need to change your workflow in the future, it is better to use Airflow, which will give you more flexibility. In one of my migrations I used Airflow, and in the other one CONTROL-M.
You'll be able to load higher volumes of data with lower latency if you use Snowpipe instead of Airflow. It'll also be easier to manage Snowpipe in my opinion.
Airflow is a batch scheduler, and using it to schedule anything that runs more frequently than every 5 minutes becomes painful to manage. Also, you'll have to manage the scaling yourself with Airflow. Snowpipe is a serverless option that can scale up and down based on the volume it sees, and you're going to see your data land within about 2 minutes.
The only thing that should restrict your usage of Snowpipe is cost, although you may find that Snowpipe ends up being cheaper in the long run once you consider that you'd need someone to manage your Airflow pipelines too.
There are a few considerations. Snowpipe can only run a single copy command, which has some limitations of its own, and Snowpipe imposes further restrictions on top of that (see the Usage Notes). The main pain is that it does not support PURGE = TRUE | FALSE (i.e. automatic purging while loading); the docs say:
Note that you can manually remove files from an internal (i.e. Snowflake) stage (after they’ve been loaded) using the REMOVE command.
Regrettably, the Snowflake docs are famously vague, with an ambiguous, colloquial writing style. While they say you 'can' remove the files manually, in reality any user using Snowpipe as advertised for "continuous fast ingestion" must remove the files, or suffer the performance and cost impact of the copy command having to skip a very large number of files that have already been loaded. The docs on the cost and performance of "table directories" (which are implicit to stages) treat 1M files as a lot of files. By way of an official example, the default pipe flush time for the Snowflake Kafka connector's Snowpipe integration is 120s, so assuming data ingests continually and you create one file per flush, you will hit 1M files in 2 years. Yet using Snowpipe is supposed to imply low latency. If you were to lower the flush to 30s, you may hit the 1M file mark in about half a year.
If you want a fully automated process with no manual intervention, this could mean that after you have pushed files into a stage and invoked the pipe, you need logic to poll the API to learn which files were eventually loaded; your logic can then remove the loaded files. The official Snowpipe Java example code has logic that pushes files and then polls the API to check when the files are eventually loaded. The Snowflake Kafka connector also polls to check which files the pipe has eventually, asynchronously completed. Alternatively, you might write an Airflow job to ls @the_stage and look for files whose last_modified is older than some safe threshold, and then rm @the_stage/path/file.gz for the older files.
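A minimal sketch of that manual cleanup (the stage name and paths are placeholders):
-- List what is currently sitting in the stage, with size and last_modified.
LIST @the_stage/path/;
-- After confirming via the pipe's load history that the files were loaded,
-- remove them individually or by pattern.
REMOVE @the_stage/path/file.gz;
REMOVE @the_stage/path/ PATTERN='.*[.]gz';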
The next limitation is that the copy command is a "copy into your_table" command that can only target a single table. You can, however, do advanced transformations using SQL in the copy command, as sketched below.
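A sketch of a transforming copy that still targets a single table (the table, stage, and column names are placeholders):
COPY INTO mydb.raw.events (event_id, payload, loaded_at)
FROM (
    SELECT $1:id::NUMBER,      -- pull a field out of the staged JSON
           $1,                 -- keep the full document as VARIANT
           CURRENT_TIMESTAMP()
    FROM @mydb.raw.adls_stage/events/
)
FILE_FORMAT = (TYPE = 'JSON');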
Another thing to consider is that neither latency nor throughput is guaranteed with Snowpipe. The documentation very clearly says you should measure the latency yourself. It would be a complete "free lunch" if Snowpipe, which runs on shared infrastructure to reduce your costs, ran instantly and as fast as if you were paying for hot warehouses. It is reasonable to assume a higher tail latency when using shared "on-demand" infrastructure (i.e. a small percentage of invocations will have a high delay).
You have no control over the size of the warehouse used by Snowpipe. This will affect the performance of any SQL transforms used in the copy command. In contrast, if you run the load from Airflow, you assign a warehouse to run the copy command, and you can assign as big a warehouse as you need for your transforms.
A final consideration is that to use Snowpipe you need to make a Snowflake API call. That is significantly more complex code to write than making a regular database connection to load data into a stage. For example, the regular Snowflake JDBC database connection has advanced methods that make it efficient to stream data into stages without having to write OAuth code to call the Snowflake API.
Be very clear that if you read the Snowpipe documentation carefully, you will see that Snowpipe is simply a restricted COPY INTO command running on shared infrastructure that is eventually run at some point; whereas you yourself can run a full copy command as part of a more complex SQL script on a warehouse that you can size and suspend (sketched below). If you can live with the restrictions of Snowpipe, can figure out how to remove the files in the stage yourself, and can accept that tail latency is likely to be higher and throughput lower than when paying for a dedicated warehouse, then it could be a good fit.
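For contrast, a minimal sketch of what an Airflow task would execute on your own warehouse (warehouse, table, and stage names are placeholders):
-- Run on a warehouse you size and suspend yourself; unlike Snowpipe,
-- a regular COPY can purge the files it has loaded.
USE WAREHOUSE load_wh;
COPY INTO mydb.raw.events
FROM @mydb.raw.adls_stage/events/
FILE_FORMAT = (TYPE = 'JSON')
PURGE = TRUE;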

100% CPU Usage in ASP.Net

After deploying a new version of a hybrid asp.net web application, Framework 4.5.1, IIS 7.5, we immediately noticed that CPU usage was spiking to 100%.
I followed CPU spike debugging using DebugDiag as described in this article: http://www.iis.net/learn/troubleshoot/performance-issues/troubleshooting-high-cpu-in-an-iis-7x-application-pool
I now have my report, and every one of the threads identified as a high-CPU-usage problem looks like this, with varying thread numbers:
Thread 1576 - .SNIReadSyncOverAsync(SNI_ConnWrapper*, SNI_Packet**, Int32)
I'm guessing this means the culprit is a LINQ to SQL call. The application uses a lot of LINQ to SQL. Unfortunately the DebugDiag report gives no clue as to which LINQ to SQL call is causing the difficulty.
Is there any way to use the information in the DebugDiag report to identify the SQL Server calls that cause the high CPU usage?
We never did find an answer to the question. I was hoping for an answer that would tell us what we could add to the performance monitor data collection to see the actual SQL that was being passed by the threads that were spiking CPU.
Instead we ran SQL Server performance monitor, duly filtered to cover only traffic from the web application, for about a minute. We dumped all the data collected into a table, then examined statement start and end times to identify statements that were taking an inordinate amount of time. From this collection of sluggish statements we identified the SQL call that was spiking CPU.
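For reference, a minimal sketch of the kind of duration query we ran against the collected trace data (this assumes the trace was written to a file and loaded with fn_trace_gettable; the path is a placeholder):
SELECT TOP 20
    TextData, DatabaseName, StartTime, EndTime,
    Duration / 1000 AS DurationMs   -- trace Duration is reported in microseconds
FROM sys.fn_trace_gettable('C:\Traces\webapp.trc', DEFAULT)
WHERE TextData IS NOT NULL
ORDER BY Duration DESC;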
Oddly enough, the SQL call (selecting the results of an inline table-valued function) takes 2-3 seconds to complete, but most of that time is taken by SQL Server resetting the connection (sp_reset_connection). The call itself returns in less than a millisecond, and when we execute the same function in SSMS using identical parameters it also executes in less than a millisecond. However, this will be the topic of a separate question.

Limit database usage of a website

To start off - I have 2 separate websites and a database (IIS 7.5, ASP.NET and SQL Server 2008, using Linq-To-SQL for database access).
I have a separate administrative website that sometimes, during use, needs to trigger long-running operations (more than 10 seconds) on the database. The problem is that those operations cause the SQL Server process to hit 100% CPU, and then the other, main customer website can't access the database promptly; there are delays in accessing the database.
I am OK with those administrative operations taking 2x or 4x or nx longer, since they are lower priority.
I've tried using the CPU Limit setting on the AppPool in IIS, but that doesn't help, as the w3wp.exe process never uses much CPU... rather, it's sqlservr.exe. Thanks in advance for your suggestions!
If your admin queries are consuming all the CPU on the box, there are almost certainly some tuning opportunities there - likely some indexing optimizations.
Until you have time to invest in those, and until you get your Resource Governor configuration settled, you can simply reduce their impact to a single CPU, which may provide short-term symptom relief, by adding the MAXDOP hint to your admin query:
OPTION (MAXDOP 1);
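For example, a hypothetical admin query with the hint appended (the table and filter are placeholders):
SELECT CustomerId, SUM(Amount) AS Total
FROM dbo.AdminAuditLog
WHERE LoggedAt >= DATEADD(DAY, -30, GETDATE())
GROUP BY CustomerId
OPTION (MAXDOP 1);   -- limits this statement to a single CPU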
Yes, it might make you feel a little dirty, but the rest of your CPUs will be freed up to work on your more important queries.
The real answer is to tune your admin queries. Just because it's OK that they run long does not mean it's good for your server or the experience of your users. You'll never be able to completely isolate them from the effects of other queries running on the box, especially if you are experiencing high CPU that is compensating for slow I/O. I/O does not have any knobs in Resource Governor - you can only control CPU and memory, and even those not completely.
Sounds like you want to look into Resource Governor, which is built into SQL Server as of SQL Server 2008. The BOL link below should get you started.
http://msdn.microsoft.com/en-us/library/bb933866(v=SQL.100).aspx
Essentially, you can throttle CPU and memory usage for the resource pools and workload groups you define. This throttling only kicks in when the server is under load. Be aware that you cannot control disk I/O utilization. If the process in your admin database is I/O bound and your other DBs share drives, you will inevitably still see performance issues, and moving databases to separate spindles or query tuning will be necessary.
Here is an example of a classifier function that ensures the login you define is throttled by the desired resource pool via its workload group:
/* Classifier function */
CREATE FUNCTION dbo.rgov_classifier_db ()
RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @rgWorkloadGrp sysname;
    IF SUSER_SNAME() = 'adminWebsiteDB'
        SET @rgWorkloadGrp = 'workloadGroupName';
    ELSE
        SET @rgWorkloadGrp = 'defaultWorkloadGroupName';
    RETURN @rgWorkloadGrp;
END;
GO
/* Register the function with Resource Governor and then start Resource Governor. */
ALTER RESOURCE GOVERNOR
WITH (CLASSIFIER_FUNCTION = dbo.rgov_classifier_db);
GO
ALTER RESOURCE GOVERNOR RECONFIGURE;
GO
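The classifier above assumes the workload group and the resource pool behind it already exist. A minimal sketch of creating them (the names and limits are placeholders to adjust for your environment):
CREATE RESOURCE POOL adminPool
    WITH (MAX_CPU_PERCENT = 25, MAX_MEMORY_PERCENT = 25);
GO
CREATE WORKLOAD GROUP workloadGroupName
    USING adminPool;
GO
ALTER RESOURCE GOVERNOR RECONFIGURE;
GO
Note that MAX_CPU_PERCENT is only enforced when there is CPU contention, which matches the "throttling only kicks in when the server is under load" behaviour described above.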

Performance logging tips

I am developing a large data-collecting ASP.NET application / Windows service pair that uses Microsoft SQL Server 2005 through LINQ to SQL.
Performance is always the issue.
Currently the application is divided into multiple larger processing parts, each logging the duration of its work. This is not detailed enough and does not help us much. It would be nice to have some database tables that contain statistics the application collected about its own behavior.
What logging tips and data structures do you recommend to spot the parts that cause performance problems?
Edit:
Mostly I am looking for parts of the application that can cripple the whole system when excessively used. There are peaks during the day when some parts of the application are under heavy load. Some advanced logging would help me isolate the parts that need more attention and optimizing.
Don't use logging for this; use performance counters instead. The runtime impact of performance counters is minor, and you can simply have them always on. To collect and monitor the performance data, you can rely on the existing performance counter infrastructure (perfmon.exe, logman.exe, relog.exe, etc.).
I personally use XML and XSLT to generate the counters. I can then decorate all my code with performance counters tracking functions being run, average call duration, number of executions, rate of executions per second, and so on. A good choice of counters will give an immediate, accurate performance picture much faster than logging can. While logging can give more insight into certain event paths (i.e. the order of events that led to a certain state), logging can seldom be 'always on', as the impact on performance is significant - not only on raw performance but, most importantly, on concurrency, since most existing logging infrastructures add contention.
This is not a job for logging. It's a job for a profiler.
Try one of these:
JetBrains' dotTrace - http://www.jetbrains.com/profiler/index.html
Red-Gate ANTS - http://www.red-gate.com/products/ants_profiler/index.htm
Automated QA's AQTime - http://www.automatedqa.com/products/aqtime/index.asp
While I haven't (yet) tried it for myself, it may be worth looking at Gibraltar which can be used with PostSharp to put declarative performance logging into your code.
When dealing with problems like this, I try not to add any extra headache by manually adding logging/tracing and timing to the application itself. If all you want is to tune the application, then I suggest getting a profiler, which will show you which areas of the code are an issue. I recommend Red Gate's ANTS Profiler.
Now, if you want to collect statistics for monitoring or trending purposes, then a profiler is not the right tool. I have had success using performance counters, which let many third-party tools pull the performance information out of the application.
So what are you trying to do: solve performance problems, or monitor so that you catch a performance problem before it becomes severe?
EDIT
Based on your comment, I would look at using performance counters around critical sections of code, timing how long it takes to complete an operation. Then you can use the built-in performance monitoring tools, or any number of third-party tools, to monitor and trend the stats.
SQL Server keeps track of some things for you, so try running some of these queries on your system:
Wayback Machine: Uncover Hidden Data to Optimize Application Performance
Here is an example from the link:
--Identifying Most Costly Queries by I/O
SELECT TOP 10
     [Average IO] = (total_logical_reads + total_logical_writes) / qs.execution_count
    ,[Total IO] = (total_logical_reads + total_logical_writes)
    ,[Execution count] = qs.execution_count
    ,[Individual Query] = SUBSTRING(qt.text, qs.statement_start_offset/2,
        (CASE WHEN qs.statement_end_offset = -1
              THEN LEN(CONVERT(NVARCHAR(MAX), qt.text)) * 2
              ELSE qs.statement_end_offset
         END - qs.statement_start_offset)/2)
    ,[Parent Query] = qt.text
    ,DatabaseName = DB_NAME(qt.dbid)
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
ORDER BY [Average IO] DESC;
The link contains many more queries, including ones for costly missing indexes, logically fragmented indexes, identifying the queries that execute most often, etc. A sketch of that last one is below.
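A hedged sketch of identifying the most frequently executed queries, based on the same DMVs as the example above (not copied verbatim from the article):
--Identifying Queries that Execute Most Often
SELECT TOP 10
     [Execution count] = qs.execution_count
    ,[Individual Query] = SUBSTRING(qt.text, qs.statement_start_offset/2,
        (CASE WHEN qs.statement_end_offset = -1
              THEN LEN(CONVERT(NVARCHAR(MAX), qt.text)) * 2
              ELSE qs.statement_end_offset
         END - qs.statement_start_offset)/2)
    ,[Parent Query] = qt.text
    ,DatabaseName = DB_NAME(qt.dbid)
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
ORDER BY qs.execution_count DESC;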
I would start by diagnosing the real cause of the perf issue: is it CPU, memory, disk, or I/O? This can be identified with a few perfmon counters.
For example, LINQ to SQL uses synchronous I/O, which could be a big bottleneck for scalability. Because it uses sync I/O, Windows threads get blocked and requests end up waiting. This is a usual suspect and might not be the cause in your case. There is an MSDN article on how sync I/O can affect scalability.
If CPU is the issue, then the next question is: is the application CPU-bound? If so, you could use one of the profilers mentioned above. Also look at the time-spent-in-GC perfmon counter; that is another usual suspect.
