Google Apps Script running slow due to for loop - rss

I have a Google Apps Script that takes a spreadsheet and loops over the rows, getting the values column by column and generating an RSS feed.
I have some performance issues, and I think it's due to the for loop and querying that many values.
Any insights on how to optimize this? Thanks!
http://pastebin.com/EPN5EPAx

Calling getCell and setValue over and over again is probably what is slowing it down so much. Each time you call setValue() it makes a new IO call, which is slow. It's best to load and save your data all in one fell swoop.
For example, load all the values from the range at the start with:
var values = range.getValues();
Then iterate through the resulting two-dimensional array (instead of getCell(i, 2) use values[i - 1][1]).
When you need to change a value use:
values[i][j] = newValue;
Then when you're finished call:
range.setValues(values);
This way you minimize the IO calls to two: one to load at the start and one to save changes at the end.
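A minimal sketch of the pattern, assuming the feed data sits on the active sheet with a header row (the column layout below is made up, since the original script isn't shown here):

var sheet = SpreadsheetApp.getActiveSheet();   // assumption: the feed data is on the active sheet
var range = sheet.getDataRange();
var values = range.getValues();                // one IO call to read everything

var items = [];
for (var i = 1; i < values.length; i++) {      // skip the header row
  var row = values[i];
  items.push('<item><title>' + row[0] + '</title><link>' + row[1] + '</link></item>');
  values[i][2] = 'published';                  // any edits happen in memory only
}

range.setValues(values);                       // one IO call to write everything back
var rss = '<rss version="2.0"><channel>' + items.join('') + '</channel></rss>';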

Related

Kusto: How to query large tables as chunks to export data?

How can I structure a Kusto query such that I can query a large table (and download it) while avoiding the memory issues and result-set truncation described here: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/concepts/querylimits#limit-on-result-set-size-result-truncation
set notruncation; only works insofar as the Kusto cluster does not run out of memory, which in my case it does.
I did not find the answers to "How can I query a large result set in Kusto explorer?" helpful.
What I have tried:
Using the .export command, which fails for me, and it is unclear why. Perhaps you need to be a cluster admin to run such a command? https://learn.microsoft.com/en-us/azure/data-explorer/kusto/management/data-export/export-data-to-storage
Cycling through row numbers, but when run n times the results are not the same, so you do not get the right answer, like so:
let start = 3000000;
let end = 4000000;
table
| serialize rn = row_number()
| where rn between(start..end)
| project col_interest;
"set notruncation" is not primarily for preventing an Out-Of-Memory error, but to avoid transferring too much data over-the-wire for an un-suspected client that perhaps ran a query without a filter.
".export" into a co-located (same datacenter) storage account, using a simple format like "TSV" (without compression) has yielded the best results in my experience (billions of records/Terabytes of data in extremely fast periods of time compared to using the same client you would use for normal queries).
What was the error when using ".export"? The syntax is pretty simple, test with a few rows first:
.export to tsv (
h#"https://SANAME.blob.core.windows.net/CONTAINER/PATH;SAKEY"
) with (
includeHeaders="all"
)
<| simple QUERY | limit 5
You don't want to overload the cluster by simultaneously running an inefficient query (like serializing a large table, per your example) and moving the result in a single dump over the wire to your client.
Try optimizing the query first using the Kusto Explorer client's "Query analyzer" until the CPU and/or memory usage are as low as possible (ideally 100% cache hit rate; you can scale up the cluster temporarily to fit the dataset in memory as well).
You can also run the query in batches (try time filters first, since this is a time-series engine) and save each batch into an "output" table (using ".set-or-append"). This way you split the load: the cluster first processes the dataset, and then you export the full "output" table into external storage.
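A rough sketch of one such batch (MyTable, OutputTable, and the Timestamp column are placeholders for your own schema):

.set-or-append OutputTable <|
    // each run appends one time slice; repeat with the next window until the table is covered
    MyTable
    | where Timestamp between (datetime(2023-01-01) .. datetime(2023-01-08))
    | project col_interest

Once all batches have run, a single .export of OutputTable moves the already-processed data out.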
If for some reason you absolutely must use the same client to run the query and consume the (large) result, try using database cursors instead of serializing the whole table. It's the same idea, but pre-calculated, so you can use a "limit XX", where "XX" is the largest batch you can move over the wire to your client, and run the same query over and over, moving the cursor forward each time, until the whole dataset has been transferred.
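A rough illustration of the cursor functions (a sketch only; TableName and the stored cursor value are placeholders, and database cursors require the IngestionTime policy to be enabled on the table):

// one-off: capture the current database cursor value and store it client-side
print Cursor = cursor_current()

// each subsequent run: pull only what the previously stored cursor has not covered yet
TableName
| where cursor_after('<previously stored cursor value>')
| project col_interest
| limit 1000000   // "XX": the largest batch your client can move over the wire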

AnalysisServices: Cannot query internal supporting structures for column because they are not processed. Please refresh or recalculate the table

I'm getting the following error when trying to connect Power BI to my tabular model in AS:
AnalysisServices: Cannot query internal supporting structures for column 'table'[column] because they are not processed. Please refresh or recalculate the table 'table'
It is not a calculated column and the connection seems to work fine on the local copy. I would appreciate any help with this!
This would depend on how you are processing the data within your model. If you have just done a Process Data, then the accompanying meta objects such as relationships have not yet been built.
Every column of data that you load needs to also be processed in this way regardless of whether it is a calculated column or not.
This can be achieved by running a Process Recalc on the database, or by loading your tables or table partitions with a Process Full/Process Default rather than just a Process Data; those options automatically run a Process Recalc once the data is loaded.
If you have a lot of calculated columns and tables that result in a Process Recalc taking a long time, you will need to factor this in to your refreshes and model design.
If you run a Process Recalc on your database or a Process Full/Process Default on your table now, you will no longer have those errors in Power BI.
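For example, a database-level Process Recalc can be issued from SSMS as a TMSL refresh command (a sketch; "MyTabularDatabase" is a placeholder for your database name):

{
  "refresh": {
    "type": "calculate",
    "objects": [
      { "database": "MyTabularDatabase" }
    ]
  }
}

Using "full" instead of "calculate" corresponds to a Process Full of the listed objects.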
More in-depth discussion of this can be found here: http://bifuture.blogspot.com/2017/02/ssas-processing-tabular-model.html

Object data source with business objects slow?

In my project I have a page with some RDLC graphs. They used to run on some stored procedures and an XSD. I would pass in a string of the IDs my results should include to restrict the data set. I had to change this because I started running into the 1000-character limit on object data set parameters.
I updated my graphs to run on a list of business objects instead, and the page now loads significantly slower than before. By significantly slower I mean page loads take about a minute.
Does anybody know if object data sources are known to be slow when pulling business objects? If not, is there a good way to track down what exactly is causing the issue? I put breakpoints in the method that actually retrieves the business objects, before and after it gets them; that method doesn't seem to be the cause of the slowdown.
I did some more testing and it seems that the dang thing just runs significantly slower when binding to business objects instead of a DataTable.
When I was binding my List<BusinessObject> to the ReportViewer, the page took 1 minute 9 seconds to load.
When I had my business logic use the same function that returns the List, build a DataTable from it with only the columns required for the report, and then bind that DataTable to the report, the page loaded in 20 seconds.
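Roughly, that DataTable-building step looked like the sketch below (the Employee properties, GetEmployees(), the reportViewer control, and the "DataSet1" dataset name are stand-ins, not the original code):

// Build a narrow DataTable so the ReportViewer doesn't have to reflect
// over the whole business-object graph.
var table = new DataTable("ReportData");
table.Columns.Add("Id", typeof(int));
table.Columns.Add("Name", typeof(string));
table.Columns.Add("HireDate", typeof(DateTime));

foreach (var employee in GetEmployees())   // the existing method that returns List<BusinessObject>
{
    table.Rows.Add(employee.Id, employee.Name, employee.HireDate);
}

reportViewer.LocalReport.DataSources.Clear();
reportViewer.LocalReport.DataSources.Add(new ReportDataSource("DataSet1", table));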
Are you using SELECT *? If so, try selecting each field individually if you aren't using the entire table. That will help a bit.
@William: I experienced the same problem. I noticed, though, that when I flatten the business object, the report runs significantly faster. You don't even have to map the business object to a new flattened one; you can simply set the nested objects to null. I.e.:
foreach (var employee in employees)
{
    employee.Department = null;
    employee.Job = null;
}
It seems that the report writer does some weird things traversing the object graph.
This seems to be the case only in VS 2010. VS 2008 doesn't seem to suffer the same problems.

persistent data store between calls in plr

I have a web application that talks to R using plr when doing adaptive testing.
I need to find a way to store static data persistently between calls.
I have one expensive calculation that creates an item bank, and then a lot of cheap ones that get the next item after each response submission. However, currently I can't find a way to store the result of the expensive calculation persistently.
Putting it into the db seems to be a lot of overhead.
library(catR)
data(tcals)
itembank <- createItemBank(tcals)  # this is the expensive call
nextItem(itembank, 0) # item 63 is selected
I tried to save and load the result like this, but it doesn't seem to work: the result of the second NOTICE is just 'itembank'.
save(itembank, file="pltrial.Rdata")
pg.thrownotice(itembank)
aaa=load("pltrial.Rdata")
pg.thrownotice(aaa)
I tried saving and loading the workspace as well, but didn't succeed with that either.
Any idea how to do this?
The load function loads objects directly into your workspace. You don't have to assign the return value (which is just a character vector of the names of the objects loaded, as you discovered). If you run ls() after loading, you should find your itembank object sitting there.
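In other words, something like this should work (a sketch; pg.thrownotice() is the PL/R helper already used in the question):

save(itembank, file = "pltrial.Rdata")

load("pltrial.Rdata")              # re-creates `itembank` under its original name
pg.thrownotice(class(itembank))    # now refers to the restored object, not just its name
nextItem(itembank, 0)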

Accessing Process global variables in scriptTask (jbpm5)

I created a simple process definition in jBPM5 with just a single script task. I want a global variable, say count, that is static in the sense that the same value is shared across process instances; however, it is not a constant, and each instance can update it, say increment it, in the first task of the process. From the script task I want to do this modification (increment) and print the value to stdout. How do I do this?
System.out.println(count);
kcontext.setVariable("count", count + 1);
I found the answer myself after some research: we need to use kcontext.getKnowledgeRuntime().setVariable() and .getVariable() for setting and getting a 'static' variable that is shared across process instances. However, it leads to another question in my mind: what would happen if the script task that uses setVariable is called simultaneously by multiple instances? Thanks @KrisV! Without your help I would not have been able to come to this. :)
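For reference, the script task body would then look something like this (a sketch based only on the method names mentioned above; whether getKnowledgeRuntime() exposes exactly these calls depends on your jBPM5 version):

// in the script task (dialect "java"): read, print, and update the shared session-level variable
Object current = kcontext.getKnowledgeRuntime().getVariable("count");
int count = (current == null) ? 0 : (Integer) current;
System.out.println("count = " + count);
kcontext.getKnowledgeRuntime().setVariable("count", count + 1);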
