We are using the load and unload utilities (FastLoad, MultiLoad, FastExport, etc.) with the default number of sessions. We have now increased the number of AMPs, so how can we determine the optimal settings for utility sessions? What are the criteria to calculate this? Is it a DBA task or an ETL developer task?
When it comes to export, we have the following property options which affect the concurrency of the export, either to storage directly or to an external table (documentation link):
distribution
distributed
spread
concurrency
query_fanout_nodes_percent
Say I tweak these options and increase/decrease concurrency based on shards or nodes. Is there any Kusto command that will allow me to see exactly how many of these parallel export threads (whether per_shard, per_node, or some percentage) are running? The command .show operation details doesn't show these details; it just shows how many separate export commands were issued by the client, not the related parallelization details.
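For context, this is roughly how I issue the export today (a minimal sketch using the azure-kusto-data Python client; the cluster URI, database, and table names are placeholders, and the with(...) values are just illustrative examples of the properties listed above):

```python
# Minimal sketch: issue an async export with explicit concurrency-related
# properties. Cluster URI, database and table names are placeholders;
# the property values are illustrative only.
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

kcsb = KustoConnectionStringBuilder.with_aad_device_authentication(
    "https://mycluster.westeurope.kusto.windows.net"
)
client = KustoClient(kcsb)

export_command = """
.export async to table MyExternalTable
with (distribution = "per_shard", concurrency = 8)
<| MySourceTable
"""

# Control commands go through execute_mgmt; the async export returns an operation id.
response = client.execute_mgmt("MyDatabase", export_command)
print(response.primary_results[0])
```

From the client side all I get back is that one operation id, which is why I'm asking how to observe the per_shard/per_node parallelism behind it.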
As it stands now, the system does not provide any additional information about the threads used in an export operation, in the same way that this information is not available for queries.
Can you add to your question the benefit of having such information? Is it to track the progress of the command? In any case, if this is something that you feel is missing from the service, please open a new item or vote for an existing item in the Azure Data Explorer user voice.
We want to set up a DR server for IBM BPM Standard 8.5.7 and are planning to use the Prod DB (Oracle), so that if for some reason the Prod BPM environment becomes unavailable we can use the Prod DB data in the DR IBM BPM. Is this possible? What factors need to be considered for this?
At present we take a snapshot of the Prod DB and use this DB snapshot for COB. All servers are started, but when we open the Process Admin console we don't see the "Installed App" option or the menus on the left side to manage users. It seems the DR BPM admin ID does not have the required roles to get these details.
First of all, I'd like to point you to the article below:
Disaster recovery guidance for IBM Business Process Manager
Please note the difference between configuration data and runtime data as defined in this article. Since some configuration data resides in the profile folders of your servers, not in the database, it's not enough to just move a snapshot of the production database to DR. You must also synchronise the configuration data on your file system. This is probably why you can't use your DR BPM as you expected: you moved the runtime data to DR, but you're missing the configuration data.
As for your question on what configurations are possible and what factors to consider, unfortunately the answer is not so simple, as you have many alternatives.
The article above highlights which factors to consider for your DR design. There are seven different alternatives for the DR topology. These are evaluated according to the above-mentioned factors, and their advantages/disadvantages are explained. You must choose one of them according to your specific requirements and resource availability.
I use ASP.NET MVC and SQL Server. The query is in my repository class. Sometimes the query executes in 10 seconds, sometimes in 3 minutes! Why? I used SQL Server Profiler, but I really don't understand what the cause could be or how I can find it.
Query:
SELECT
[Extent1].[Id] AS [Id],
[Extent1].[FirstAddressId] AS [FirstAddressId],
[Extent1].[SecondAddressId] AS [SecondAddressId],
[Extent1].[Distance] AS [Distance],
[Extent1].[JsonRoute] AS [JsonRoute]
FROM [dbo].[AddressXAddressDistances] AS [Extent1]
Check your query plan. Just run your SELECT statement in SQL Server Management Studio to obtain the actual query plan. More info is here: Query plan.
If the plans are the same but the response time differs significantly between calls, then the issue is probably locks at the DB level (or heavy concurrent workloads). I mean, for instance, an incorrect transaction isolation level, or reports running in the meantime that consume too many resources (or generate locks "because of something" to ensure some data consistency enforced by some developer).
Many factors influence performance (including the memory available at the moment of query execution).
You can also run a few queries to analyze the quality of your statistics (or just update all of them using EXEC sp_updatestats), and analyze the fragmentation of your indexes. This is a guess, but in my experience locks, outdated stats, or fragmented indexes can force SQL Server to choose a very inefficient query plan.
Some info on active locks: Active locks on table
Additional info 1:
If you are the only user of this DB and it's on your local machine (you use SQL Server Express), the issue is less likely to be locks than other problems. Try to open the SQL Server error log. It's available in SQL Server Management Studio in the tree on the left side, under your engine instance, at Management/SQL Server Logs/Current. Do you see any unusual info there? Also try to review the system log (using the Event Viewer app); in case of hardware problems you should see some info there too. By the way: how many rows do you have in the table? Also review the behavior of your disks in Process Explorer or Performance Monitor. If the disk queue length is too big, it can be the main source of the problem (in that case, look at which apps stress the disk)...
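If you prefer to run these basic checks from code instead of Management Studio, here is a rough Python/pyodbc sketch (the driver and connection string are assumptions for illustration; only the table name is taken from your query):

```python
# Rough sketch: row count, index fragmentation and a statistics refresh via pyodbc.
# Connection string and driver name are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=MyDb;Trusted_Connection=yes;",
    autocommit=True,
)
cur = conn.cursor()

# How many rows are in the table?
cur.execute("SELECT COUNT(*) FROM dbo.AddressXAddressDistances")
print("rows:", cur.fetchone()[0])

# Fragmentation of the table's indexes (standard DMV check).
cur.execute("""
    SELECT i.name, ips.avg_fragmentation_in_percent, ips.page_count
    FROM sys.dm_db_index_physical_stats(
             DB_ID(), OBJECT_ID('dbo.AddressXAddressDistances'),
             NULL, NULL, 'LIMITED') AS ips
    JOIN sys.indexes AS i
      ON i.object_id = ips.object_id AND i.index_id = ips.index_id
""")
for name, fragmentation, pages in cur.fetchall():
    print(name, fragmentation, pages)

# Refresh all statistics in the database (same effect as running it in SSMS).
cur.execute("EXEC sp_updatestats")
```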
More info on locks:
SELECT
[spid] = session_Id
, ecid
, [blockedBy] = blocking_session_id
, [database] = DB_NAME(sp.dbid)
, [user] = nt_username
, [status] = er.status
, [wait] = wait_type
, [current stmt] =
SUBSTRING (
qt.text,
er.statement_start_offset/2 + 1,
(CASE
WHEN er.statement_end_offset = -1 THEN DATALENGTH(qt.text)
ELSE er.statement_end_offset
END - er.statement_start_offset)/2 + 1)
,[current batch] = qt.text
, reads
, logical_reads
, cpu
, [time elapsed (min)] = DATEDIFF(mi, start_time, GETDATE())
, program = program_name
, hostname
--, nt_domain
, start_time
, qt.objectid
FROM sys.dm_exec_requests er
INNER JOIN sys.sysprocesses sp ON er.session_id = sp.spid
CROSS APPLY sys.dm_exec_sql_text(er.sql_handle)as qt
WHERE session_Id > 50 -- Ignore system spids.
AND session_Id NOT IN (@@SPID) -- Ignore this current statement.
ORDER BY 1, 2
GO
Before you waste any more time on this, you should realize that something like the time a query takes in development is essentially meaningless. In development, you're running a single-threaded web server in IIS Express, which means that you've also got VS running, sitting on roughly 2-4 GB of RAM. Together with that, you're running a SQL Server instance, that's fighting the system for both RAM and hard drive time. You haven't given any specs of your system, but if you also happen to be sporting a consumer-class 5400 or 7200 RPM platter-style drive rather than an SSD, that's going to severely impact performance as well. Then, we haven't even got into what else might be running on this system. Photoshop? Outlook? Your favorite playlists of MP3s decoding in the background? What's Windows doing? It might be downloading/applying updates, indexing your drive for search, etc. None of that applies any more when you move into production (or at least shouldn't). In production, you should have a dedicated server with 4-8 GB of RAM and an SSD or enterprise-class 15,000+ RPM platter drive devoted just to SQL Server, so it can spit out query results at lightning speeds.
Long and short, if you want to gauge the website/query performance of your application, you need to deploy it to a facsimile of what you'll be running in production. There, you can pound the hell out of it and get some real data you can actually do something with. Trying to profile your app in development is just a total waste of time.
I have about 100 GB of data in BigQuery, and I'm fairly new to using data analysis tools. I want to grab about 3000 extracts for different queries, using a programmatic series of SQL queries, and then run some statistical analysis to compare kurtosis across those extracts.
Right now my workflow is as follows:
running on my local machine, use BigQuery Python client APIs to grab the data extracts and save them locally
running on my local machine, run kurtosis analysis over the extracts using scipy
The second one of these works fine, but it's pretty slow and painful to save all 3000 data extracts locally (network timeouts, etc).
Is there a better way of doing this? Basically I'm wondering if there's some kind of cloud tool where I could quickly run the calls to get the 3000 extracts, then run the Python to do the kurtosis analysis.
I had a look at https://cloud.google.com/bigquery/third-party-tools but I'm not sure if any of those do what I need.
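For reference, my current per-extract step looks roughly like this (a sketch using the google-cloud-bigquery and scipy packages; the project, table, and column names are placeholders, and each of the ~3000 extract queries follows this shape):

```python
# Sketch of one extract + kurtosis step (project/table/column are placeholders).
from google.cloud import bigquery
from scipy.stats import kurtosis

client = bigquery.Client(project="my-project")

def kurtosis_for(sql: str) -> float:
    """Run one extract query and return the excess kurtosis of its 'value' column."""
    df = client.query(sql).to_dataframe()
    return kurtosis(df["value"])  # Fisher definition: a normal distribution -> 0.0

k = kurtosis_for(
    "SELECT value FROM `my-project.my_dataset.my_table` WHERE segment = 'a'"
)
print(k)
```

So what I'm really after is somewhere to run this loop close to BigQuery instead of pulling every extract down to my laptop.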
So far Cloud Datalab is your best option
https://cloud.google.com/datalab/
It is in beta, so some surprises are possible.
Datalab is built on top of the Jupyter/IPython option below and runs entirely in the cloud.
Another option is Jupyter/IPython Notebook
http://jupyter-notebook-beginner-guide.readthedocs.org/en/latest/
Our data science team started with the second option long ago with great success and is now moving toward Datalab.
For the rest of the business (prod, BI, ops, sales, marketing, etc.), though, we had to build our own workflow/orchestration tool, as nothing available was found good or relevant enough.
Two easy ways:
1: If your issue is the network, as you say, use a Google Compute Engine machine to do the analysis, in the same zone as your BigQuery tables (US, EU, etc.). It will not have network issues getting data from BigQuery and will be super fast.
The machine will only cost you for the minutes you use it. Save a snapshot of your machine to reuse the setup anytime (the snapshot also has a monthly cost, but much lower than keeping the machine up).
2: Use Google Cloud Datalab (beta as of Dec. 2015), which supports BigQuery sources and gives you all the tools you need to do the analysis and later share it with others:
https://cloud.google.com/datalab/
from their docs: "Cloud Datalab is built on Jupyter (formerly IPython), which boasts a thriving ecosystem of modules and a robust knowledge base. Cloud Datalab enables analysis of your data on Google BigQuery, Google Compute Engine, and Google Cloud Storage using Python, SQL, and JavaScript (for BigQuery user-defined functions)."
You can check out Cooladata
It allows you to query BQ tables as external data sources.
What you can do is either schedule your queries and export the results to Google Storage, where you can pick up from there, or use the powerful built-in reporting tool to answer your 3000 queries.
It will also provide you all the BI tools you will need for your business.
Currently I'm working on a project with OpenDS. I have to upload more than 200k entries into OpenDS, but unfortunately it fails at random times once the load exceeds about 10k-15k entries.
When I google that particular error (alert ID 9896233: JE Database Environment corresponding to backend id userRoot is corrupt. Restart the Directory Server to reopen the Environment), it seems like the OpenDS backend DB [Berkeley DB] is not that reliable when adding a massive number of entries. How can I plug a reliable commercial or open-source relational DB [Oracle/H2] into OpenDS? Is it just configuration, or do I have to change the OpenDS code?
First, you should be aware that Oracle has pulled the plug on the OpenDS project and it is now completely stalled. Development continues as open source in the OpenDJ project: http://opendj.forgerock.org.
That said, I believe there is a problem with your environment. When I was still working on OpenDS, our basic stress test was importing and running a very high load against 10 million users. 200K entries is not a massive number. My daily OpenDJ tests on my laptop are done with 100K to 1M entries. We have customers running OpenDJ in production with more than 20M entries, growing 40% every 6 months!
Berkeley DB has proved to be very scalable and reliable.
Things you might want to check: what is the maximum number of files that can be opened by a single process on your machine? Linux defaults to 1024, and that limit can be easy to hit with OpenDS or OpenDJ. Are you using a local filesystem? Berkeley DB is not supported on networked filesystems such as NFS or other NAS.
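If you want to check that limit from the account that runs the server, here is a quick sketch (on Linux, running ulimit -n from the same shell gives the same answer):

```python
# Quick check of the per-process open-file limit (Unix/Linux only).
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open files: soft={soft}, hard={hard}")  # 1024 is the usual Linux default
```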
Finally, check the logs/errors file and your systems log. Chances are that one of them will have a message containing the root cause of the problem (most likely logs/errors).
Kind regards,
Ludovic Poitou
ForgeRock - Product Manager for OpenDJ