Same U-SQL job that used to take 30 minutes now takes almost 4 hours - why? - u-sql

I have an ADF pipeline which runs U-SQL jobs with 1 AU. It is scheduled to run once daily. It normally took around 30 minutes to complete, but these days the jobs are taking 3 - 4 hours. I can see that more vertices are now being allocated to the job. To compare, I re-ran the same old job. You can see the differences below.
1) First job -
https://cmndatadevdl01.azuredatalakeanalytics.net/jobLink/a3071c07-4b90-4f17-8dab-ba16764d9165
It runs with 5,815 vertices on 1 AU and completes in 28 minutes.
2) Second job -
https://cmndatadevdl01.azuredatalakeanalytics.net/Jobs/07e41502-3785-4f87-97d0-7682d544864b?api-version=2015-10-01-preview
I ran the same job as above with 5 AUs to save time, and it completed in 46 minutes. It is the same code, but it uses 42,330 vertices - why?
3) Third job -
https://cmndatadevdl01.azuredatalakeanalytics.net/jobLink/c0037de7-6ba4-4aa5-9938-c7ba17b5edeb
This is almost the same job with slightly different input, but it takes 42,173 vertices and completes in 4.4 hours with 1 AU.
I think there is something wrong with the Azure Data Lake Analytics account these days. I have been facing this issue for a couple of days; about a week ago everything was working fine. Please help me resolve this issue.

Unfortunately, we do not have access to your job links.
The number of vertices depends, among other things, primarily on the number of files and their size and - if you use tables - on the number of distributions and partitions, and - often overlooked but very important - on the number of table fragments you may have generated while inserting into the tables.
Can you tell us more about your data?
Are you querying files? What format (e.g., JSON, CSV, etc.)? Did they grow in number or size?
Are you querying tables? Are you inserting into them frequently? When did you last rebuild the table or table partition?
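If tables are in the picture, fragmentation is the usual culprit: every INSERT statement adds another fragment, and a heavily fragmented table drives the vertex count up. A minimal U-SQL sketch of compacting it, assuming a hypothetical table dbo.MyTable (your table name will differ):

    // Merge the small fragments created by frequent INSERTs back into larger extents.
    ALTER TABLE dbo.MyTable REBUILD;

Rebuilding the table (or a single partition via REBUILD PARTITION) typically brings the vertex count, and with it the job time, back down.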

Related

Azure Time Series Insights Gen2 slower than preview?

We have a couple of environments still running the Time Series Insights Preview version. It is really fast and we are really satisfied with it. However, new environments seem a lot slower with the official release. Warm path extraction is a lot slower, but still doable, while cold path extraction becomes unbearable.
EDIT: We need to add &storeType=WarmStore if we want to query warm data. Cool! This works really fast again! The question about the cold store still persists:
It is hard to compare the different environments because the datasets are not exactly the same, but for our new environment we have about 4.5 TB of sensor data imported into TSI.
The following screenshot shows a query that tries to retrieve one minute of data for one device (each device only sends data every 10 seconds) in the far past of 2018. However, the server returns the call after 30 seconds with a continuation token, saying it couldn't retrieve all 6 values in time. Sometimes it manages to return all 6 values, but it still takes 30 seconds.
My internet download speed while performing the query was over 80 Mb per second, so that shouldn't be an issue either.
Is this something we should be worried about in the new release?
Please submit a support ticket through the Azure portal with all of these details and the product team will investigate.

How exactly to read CosmosDB Metrics charts

I am trying to get my mind around the metrics charts in the Azure portal for CosmosDB and I find them a bit confusing.
For example, I get charts like this:
What confuses me in particular is how to read the combination of charts 1 and 3.
Chart 1 shows a spike of roughly 100 RU. That would mean that with 4 times more, requests would start being throttled.
On the other hand, chart 3 suggests that there is still a lot of capacity left until the provisioned 400 RU limit is hit.
So what should be concluded here about when the first throttled request will occur? At 3x more than the spike, or at ~100x more, as chart 3 suggests?
Graph 3 shows the average, which is pretty flat. Graph 1 shows the actual RU/s consumed. It looks as though you had a temporary spike in RU consumption - perhaps even a single query. Throttling is performed on a per-second basis. To answer your question: if you had 3x more consumption in a single second, you'd be throttled.

Best retention practice using Graphite

I have been a happy user of Graphite+Grafana for a few months now and I have been advocating it around my firm.
My approach has been to measure data of interest, collect it into 1-minute or 5-minute buckets, and send that information to Graphite. I was recently contacted by a group that processes quotes (billions a day!), and their approach has been to write a log line each time their applications process 1 million quotes. The problem is that the interval between two log lines can be highly erratic, from 1 second to a few hours.
The dilemma is: should I set my retention policy to 1-second buckets so that I can see all the measurements associated with spikes, or should I use, say, 1-minute buckets so that the number of data points to be stored and later queried is much more manageable? FYI, when I set it to 1 second, showing the data for 8 or 10 charts over a few days brought the system (or at least my browser) to a crawl because of the number of data points (mostly NULL) being pushed from Graphite to Grafana.
Here's my retention policy: 1s:10d,1m:36d,5m:180d
Alternatively, is there a way to configure Grafana+Graphite to only retrieve non-NULL data points?
What do you recommend?
You can always specify a shorter retention period for the 1s metrics, so when you show a longer range Graphite will only send you the coarser level.
For example, you can specify: 1s:2d, 1m:7d, 5m:180d
This way, if you show a range more than 2 days in the past you will get 1m resolution (and so on), which won't bring your browser to a crawl, while you will still be able to inspect spikes from the last 2 days.
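In Graphite these retentions live in storage-schemas.conf. A sketch of what such a rule could look like, assuming the quote metrics arrive under a hypothetical quotes. prefix (the section names and pattern are placeholders):

    # High resolution only for the erratic quote-processing metrics, and only for 2 days.
    [quotes]
    pattern = ^quotes\.
    retentions = 1s:2d,1m:7d,5m:180d

    # Everything else keeps the coarser scheme.
    [default]
    pattern = .*
    retentions = 1m:36d,5m:180d

Note that changing retentions only affects newly created metrics; existing Whisper files have to be resized (for example with whisper-resize.py) before they pick up the new scheme.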

ASP.net / VB.net / SQL Server : change variables without page request

I'm stuck building my own simple browser game.
My program: you can upgrade your tools, which allows you to gain more points per hour.
My problem:
For example, a user logs in and upgrades his tools from level 0 to 1, which doubles the amount of points gained. But the upgrade takes 2 hours to complete. I don't expect my user to be online for 2 hours, so I save the time he was last seen in an SQL table. When the 2 hours have passed, the points gained need to be doubled, but it's very possible that the user doesn't visit the page for another 10 hours. My current program keeps adding 1 point per hour until the user visits the page, so in this case he'd have 12 points. But the rate needs to double after 2 hours, so he should have 22 points.
Another, maybe simpler, example is a maximum number of points. Let's say the max is 10 points, but the user stays offline for 15 hours, which means he'd earn 15 points at a rate of 1 pt/hr.
I don't have any functional code yet because I want to know if something like this is actually possible, and how, for example, CityVille (Facebook) does it.
Now my question:
Can anyone give me a tip or some info on how to get started, or at least tell me the name of what I'm searching for? I've tried googling things like "offline database interactions" or "changing variables without user request" but nothing useful comes up.
Thanks in advance,
BlaDrzz.
You can schedule jobs with SQL Server. These jobs can run at whatever frequency you like (a sketch of what such a job could execute follows below the link).
http://technet.microsoft.com/en-us/library/ms191439.aspx
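A minimal sketch of the T-SQL such a scheduled job could run every hour, assuming a hypothetical Players table with Points, PointsPerHour, MaxPoints and LastAccrual columns (none of these names come from the question; they are just an illustration):

    -- Accrue points for the whole hours elapsed since the last run, capped at MaxPoints.
    UPDATE dbo.Players
    SET Points = CASE
                     WHEN Points + PointsPerHour * DATEDIFF(HOUR, LastAccrual, GETUTCDATE()) > MaxPoints
                         THEN MaxPoints
                     ELSE Points + PointsPerHour * DATEDIFF(HOUR, LastAccrual, GETUTCDATE())
                 END,
        LastAccrual = GETUTCDATE()
    WHERE DATEDIFF(HOUR, LastAccrual, GETUTCDATE()) >= 1;

Because the cap is applied inside the UPDATE, an offline player can never overshoot MaxPoints, which covers the second example in the question; a similar job (or an extra step in this one) can raise PointsPerHour once an upgrade's finish time has passed.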

How to update SQL Server database every 1 minute?

I have a SQL Server database which contains stock market quotes and other related data.
This database needs to be updated at a regular interval, say every 1 minute.
My question is:
How do I get stock quotes every 1 minute and update the database?
I really appreciate your help.
Thanks!
You know, you've seriously put the question the wrong way round. Like "I have a car, a Mercedes coupe - how can I find the best road from A to B?" Totally unrelated to the car.
Same with your question - this is not a SQL or even an ASP.NET question to start with. The solution is independent of both the SQL Server used and your web technology. Your main question is:
How do I get stock quotes every 1 minute and update the database?
Here we go. I assume you (a) talk of US stocks and (b) mean all of them, not a handful. 1 minute is too small an interval to make scanning sites like yahoo.com feasible - the main problem here is that there are thousands of stocks (actually more in the tens of thousands), and you don't want to go scraping thousands of Yahoo pages per minute.
At the same time, an end-retail-user data feed provider will not work. They support X symbols at a time, with X typically in the low hundreds, sometimes upgradable to 500 or so.
If you need stock data every minute, for all US stocks, then this is technically identical to "real-time prices", which ends up costing money. In addition, you need a commercial higher-end data feed, of which I know... one. Sorry. Costs are going to be near or at four digits, without (!) publication rights.
And that is NxCore - they have a data offer covering US stocks (all exchanges) in real time, a complete feed with all corrections etc. Native and C# wrapper APIs, so you can take the real-time data feed, update your current prices in memory and write them out to SQL Server every minute. Preferably not from ASP.NET (a baaaaad choice for something that should run 24/7 without interruption unless you do heavy setup changes etc.) but from an installed Windows service. Takes some bandwidth - no real idea how much (I am getting 4 exchanges from them, but no stocks, only the CME Group futures: CME, CBOT, NYMEX and COMEX).
Note that with this setup you can go faster, too, but if you go fully real time you need a serious server. We are talking about a billion updates or so per day...
An end-user SQL Server setup (i.e. little RAM and a few slow disks) won't work.
Too expensive? There are plenty of data feeds around for a lower price, but they will not give you "stocks" as in "all of them", just "a selection".
If you are OK with non-real-time data - i.e. pulling stuff down at the end of the day - eoddata.com has a decent offer. You could also then pull things in via an ASP.NET page, but again... you will not have the data during the day, just - well - after the close. The smallest granularity is 1 minute. Republication rights are again a no - but you can probably talk to them.
This isn't really SQL Server specific; a typical solution is that you run a process that polls an external source (a web service or the like) at regular intervals and uses this information to update the database. You can either implement this as a simple command-line program that gets executed every minute from the task scheduler, or you can make it a Windows service that sleeps most of the time and only wakes up once a minute to do its processing. Once you have that, writing to the database is as usual.
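A minimal C# sketch of such a poller, assuming a hypothetical quote endpoint and a Quotes table; the URL, connection string, symbol and response format are placeholders, not a real feed:

    // Quote poller sketch: run it every minute from Task Scheduler,
    // or wrap the body in a loop inside a Windows service.
    using System;
    using System.Data.SqlClient;
    using System.Globalization;
    using System.Net.Http;
    using System.Threading.Tasks;

    class QuotePoller
    {
        // Placeholders - substitute your real feed and database.
        const string QuoteUrl = "https://example.com/api/quote?symbol=MSFT";
        const string ConnString = "Server=.;Database=Market;Integrated Security=true";

        static async Task Main()
        {
            using var http = new HttpClient();
            // Assume the feed returns a plain-text price such as "123.45".
            string body = await http.GetStringAsync(QuoteUrl);
            decimal price = decimal.Parse(body, CultureInfo.InvariantCulture);

            using var conn = new SqlConnection(ConnString);
            await conn.OpenAsync();
            using var cmd = new SqlCommand(
                "INSERT INTO dbo.Quotes (Symbol, Price, QuotedAtUtc) VALUES (@s, @p, SYSUTCDATETIME())",
                conn);
            cmd.Parameters.AddWithValue("@s", "MSFT");
            cmd.Parameters.AddWithValue("@p", price);
            await cmd.ExecuteNonQueryAsync();
        }
    }

Scheduled every minute this stays simple; if you later move to a real-time feed like the NxCore one described above, the same INSERT can sit behind an in-memory price cache inside a Windows service.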
