I have 2 questions on the case recorder.
1- I am not sure how to restart an optimization from where the recorder left off. I can read in the case recorder's sql file with a case reader, but I can't see how that can be fed back into the Problem to restart.
2- This may just be due to my lack of knowledge in Python, but how can one access the iteration number from within an OpenMDAO component? (One way is to read the sql file that is constantly being updated, but there should be a more efficient way.)
You can load a case back in via the load_case method on the problem.
See the docs for it here.
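A minimal sketch of what that restart can look like (assuming a recent OpenMDAO where CaseReader.list_cases/get_case are available; the recorder filename and the 'driver' source are placeholders). Note that load_case restores the recorded values into the model; running the driver again then starts a fresh optimization from that point rather than resuming the optimizer's internal state.

    import openmdao.api as om

    # Rebuild the problem exactly as in the original run (model, driver, recorder options, ...)
    prob = om.Problem()
    # ... add subsystems, design vars, objective, constraints, driver ...
    prob.setup()

    # Read the recorded cases and pull out the last driver iteration
    cr = om.CaseReader('cases.sql')             # placeholder filename
    last_case_id = cr.list_cases('driver')[-1]  # assumes the driver was a recorded source
    case = cr.get_case(last_case_id)

    # Push the saved inputs/outputs back into the model and continue optimizing from there
    prob.load_case(case)
    prob.run_driver()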
I'm not completely sure what you mean by accessing the iteration count, but if you just want to know the number of times your components are called, you can add a counter to them yourself (see the sketch below).
There is no programmatic API for accessing the iteration count in OpenMDAO as of version 2.3.
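A minimal sketch of that counter idea (the component and the exec_count attribute are just placeholders, not an OpenMDAO API):

    import openmdao.api as om

    class TwoTimesComp(om.ExplicitComponent):
        def setup(self):
            self.add_input('x', val=0.0)
            self.add_output('y', val=0.0)
            self.exec_count = 0               # hand-rolled execution counter

        def compute(self, inputs, outputs):
            self.exec_count += 1              # bumped on every evaluation of this component
            outputs['y'] = 2.0 * inputs['x']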
My JSR352 batch job needs to read from a database and then, depending on the result, flow down one of two pathways, each of which involves some more if/else scenarios. I wonder what the pros and cons would be of writing a single step with a large batchlet versus several steps consisting of smaller batchlets. This job does not involve chunk steps with a chunk size larger than 1, as it needs to persist the read result immediately (in case there is any) before proceeding to other logic. The job will be run using Control-M; I wonder if using multiple smaller steps provides more control points.
From that description, I'd suggest the following.
Benefits of more, fine-grained steps
1. Restart
After a job failure, the default behavior on restart is to begin executing at the step where the previous job execution failed. So breaking the job up into more steps allows you to avoid writing the logic to resume where you left off and avoid re-processing, and may save execution time in the process.
2. Reuse
By encapsulating a discrete function as its own batchlet, you can potentially compose other steps in other jobs (or even later in this job) implemented with this same batchlet.
3. Extract logic into XML
By moving the conditional flow into transition elements (e.g. <next on="RC1" to="step3"/>, etc.) in the job definition XML (JSL), you can introduce changes at a standard control point without having to go into the Java source and find the right place. A sketch of what that routing can look like is below.
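For illustration, a JSL sketch of that kind of routing (the step ids, batchlet refs, and exit statuses are made up; the exit status returned by the reader batchlet drives which path runs):

    <job id="readAndRoute" xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="1.0">
        <step id="readFromDb">
            <batchlet ref="dbReaderBatchlet"/>
            <next on="PATH_A" to="processPathA"/>
            <next on="PATH_B" to="processPathB"/>
            <fail on="*"/>
        </step>
        <step id="processPathA">
            <batchlet ref="pathABatchlet"/>
        </step>
        <step id="processPathB">
            <batchlet ref="pathBBatchlet"/>
        </step>
    </job>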
Final Thoughts
You'll have to decide if those benefits are worth it for your case.
One more thought
I wouldn't automatically rule out the chunk step just because you are using a 1-item chunk, if you can still find benefits from the checkpointing or even possibly the skip/retry. (But that's probably a separate question.)
I'm relatively new to JavaFX and have written a small applet which launches a number of (typically between 3 and 10) sub-processes. Each process has a dedicated tab displaying current status and a large TextArea where the process output is appended to. For simplicity all tabs are generated on startup.
Each line of sub-process output is appended to its TextArea from a background thread with
javafx.application.Platform.runLater(() -> logTextArea.appendText(line));
The applet works fine when workloads on the sub-processes are low to moderate (not many logs), but it starts to freeze when the sub-processes are heavily used and generate a decent amount of logging output (a good few hundred lines per second in total).
I looked into binding the TextArea to the output, but my understanding is that it effectively calls Platform.runLater() as well, so there would still be hundreds of calls to the JavaFX application thread per second.
Batching logging outputs isn't an ideal solution either because I'd like to keep the displayed log as real-time as possible.
The only solution which I think might solve the problem seems to be dynamic loading of individual tabs. This would definitely prevent unnecessary calls to update logging textareas that aren't currently visible, but before I go ahead to make the changes, I'd like to get some helpful advice from you here. Thanks!
Thanks for all your suggestions. Finally got around to implementing a fix today.
The issue was fixed by using a buffer coupled with a secondary check on elapsed time (flush after at most 20 lines or 100 ms), as sketched below.
I also implemented rolling output to limit the displayed process output to 1,000 lines.
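For anyone curious, a rough sketch of that pattern (class and field names are illustrative, not the actual code): lines are buffered off the FX thread, and a single runLater call appends a whole chunk once either threshold is hit.

    import javafx.application.Platform;
    import javafx.scene.control.TextArea;
    import java.util.ArrayList;
    import java.util.List;

    /** Buffers log lines and flushes them to a TextArea in batches. */
    class BufferedLogAppender {
        private static final int MAX_LINES = 20;     // flush after this many buffered lines...
        private static final long MAX_AGE_MS = 100;  // ...or once this much time has passed

        private final TextArea target;
        private final List<String> buffer = new ArrayList<>();
        private long lastFlush = System.currentTimeMillis();

        BufferedLogAppender(TextArea target) { this.target = target; }

        /** Called from the background thread that reads the sub-process output. */
        synchronized void append(String line) {
            buffer.add(line);
            long now = System.currentTimeMillis();
            if (buffer.size() >= MAX_LINES || now - lastFlush >= MAX_AGE_MS) {
                String chunk = String.join("\n", buffer) + "\n";
                buffer.clear();
                lastFlush = now;
                // one runLater per chunk instead of one per line
                Platform.runLater(() -> target.appendText(chunk));
            }
        }
    }

A real version would also want a periodic flush (e.g. from a Timeline) so buffered lines still appear when a sub-process goes quiet.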
Thanks again for your invaluable contribution!
I have code that generates thumbnails from JPEGs. It pulls an image from S3 and then generates the thumbs.
One in about every 3000 files ends up looking like this. It happens in batches. The high res looks like this and they're all resized down to low res. It does not fail on resize. I can go to my S3 bucket and see that the original file is indeed intact.
I had this code written in Ruby and ported it all over to Clojure hoping it would just fix my issue, but it's still happening.
What would result in a JPEG that looks like this?
I'm using standard image copying code like so
(with-open [in  (clojure.java.io/input-stream uri)
            out (clojure.java.io/output-stream file)]
  (clojure.java.io/copy in out))
Would there be any way to detect the transfer didn't go well in clojure? Imagemagick? Any other command line tool?
My guess is it is one of two possible issues (you know your code, so you can probably rule one out quickly):
You are running out of memory. If the whole batch of processing is happening at once, the first few are probably not being released until the whole process is completed.
You are running out of time. You may be reaching your maximum execution time for the script.
Implementing some logging as the batches are processed could tell you when the issue happens and what the overall state is at that moment.
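On the "any way to detect the transfer didn't go well in Clojure" part, a couple of cheap post-copy checks are possible (a sketch under the assumption that the corruption shows up as a truncated file; the function names are made up). Comparing the number of bytes written against the S3 object's Content-Length is the most direct check; the sketch below instead looks for the JPEG end-of-image marker and attempts an actual decode:

    (ns thumbs.verify
      (:require [clojure.java.io :as io])
      (:import [javax.imageio ImageIO]
               [java.io RandomAccessFile]))

    ;; A well-formed JPEG ends with the EOI marker 0xFF 0xD9; a truncated download usually won't.
    (defn ends-with-eoi? [f]
      (with-open [raf (RandomAccessFile. (io/file f) "r")]
        (let [len (.length raf)]
          (when (>= len 2)
            (.seek raf (- len 2))
            (and (= 0xFF (.read raf))
                 (= 0xD9 (.read raf)))))))

    ;; ImageIO returns nil (or throws) when it can't decode the file.
    (defn decodes? [f]
      (try
        (some? (ImageIO/read (io/file f)))
        (catch Exception _ false)))

    (defn looks-like-good-jpeg? [f]
      (and (ends-with-eoi? f) (decodes? f)))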
I'm working on a graph containing about 50 million nodes and 40 million relationships.
I need to update every node.
I'm trying to set a new label to these nodes, but it's taking too long.
The label applies to all 50 million nodes, so the operation never ends.
After some research, I found out that Neo4j treats this operation as a single transaction (I don't know if optimistic or not), keeping the changes uncommitted until the end (which will never happen in this fashion).
I'm currently using Neo4j 2.1.4, which has a feature called "USING PERIODIC COMMIT" (already present in earlier versions). Unfortunately, this feature is coupled to the "LOAD CSV" feature, and not available to every cypher command.
The cypher is quite simple:
match n set n:Person;
I decided to use a workaround, and make some sort of block update, as follows:
match n
where not n:Person
with n
limit 500000
set n:Person;
It's ugly, but I couldn't come up with a better solution yet.
Here are some of my confs:
== neo4j.properties =========
neostore.nodestore.db.mapped_memory=250M
neostore.relationshipstore.db.mapped_memory=500M
neostore.propertystore.db.mapped_memory=900M
neostore.propertystore.db.strings.mapped_memory=1300M
neostore.propertystore.db.arrays.mapped_memory=1300M
keep_logical_logs=false
node_auto_indexing=true
node_keys_indexable=name_autocomplete,document
relationship_auto_indexing=true
relationship_keys_indexable=role
execution_guard_enabled=true
cache_type=weak
=============================
== neo4j-server.properties ==
org.neo4j.server.webserver.limit.executiontime=20000
org.neo4j.server.webserver.maxthreads=200
=============================
The hardware spec is:
RAM: 24GB
PROC: Intel(R) Xeon(R) X5650 @ 2.67GHz, 32 cores
HDD1: 1.2TB
In this environment, each block update of 500000 nodes took from 200 to 400 seconds. I think this is because every node satisfies the query at the start, but as the updates take place, more nodes need to be scanned to find the unlabeled ones (but again, it's a hunch).
So what's the best course of action whenever an operation needs to touch every node in the graph?
Any help towards a better solution to this will be appreciated!
Thanks in advance.
The most performant way to achieve this is using the batch inserter API. You might use the following recipe:
take a look at http://localhost:7474/webadmin and note the "node count". In fact it's not the number of nodes; it's more the highest node id in use. We'll need that later on.
make sure to cleanly shut down your graph database.
take a backup copy of your graph.db directory.
write a short piece of Java/Groovy/(whatever JVM language you prefer) code that performs the following tasks (a Java sketch follows after these steps):
open your graph.db folder using the batch inserter api
in a loop from 0 to <node count> (from the step above), check if the node with the given id exists; if so, grab its current labels, add the new label to the list, and use setNodeLabels to write it back.
make sure you call shutdown on the batch inserter
start up your Neo4j instance again
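A rough Java sketch of that program against the Neo4j 2.x batch inserter API (the store path and the highest node id are placeholders you'd fill in from the steps above):

    import org.neo4j.graphdb.DynamicLabel;
    import org.neo4j.graphdb.Label;
    import org.neo4j.unsafe.batchinsert.BatchInserter;
    import org.neo4j.unsafe.batchinsert.BatchInserters;

    import java.util.ArrayList;
    import java.util.List;

    public class AddLabelToAllNodes {
        public static void main(String[] args) throws Exception {
            long highestNodeId = 50_000_000L;              // the "node count" noted in webadmin
            Label person = DynamicLabel.label("Person");

            // open the store directly - the server must be shut down first
            BatchInserter inserter = BatchInserters.inserter("/path/to/graph.db");
            try {
                for (long id = 0; id <= highestNodeId; id++) {
                    if (!inserter.nodeExists(id)) {
                        continue;                          // node ids can have gaps
                    }
                    List<Label> labels = new ArrayList<>();
                    for (Label l : inserter.getNodeLabels(id)) {
                        labels.add(l);
                    }
                    labels.add(person);
                    inserter.setNodeLabels(id, labels.toArray(new Label[labels.size()]));
                }
            } finally {
                inserter.shutdown();                       // flushes everything to the store
            }
        }
    }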
Hi,
We are getting timeouts in our ASP.NET application. We are using SQL Server 2005 as the DB.
The queries run very fast in Query Analyzer. However, when we check the time through the Profiler, it shows a time that is many times more than what we get in Query Analyzer.
(Parameter sniffing is not the cause.)
Any help is much appreciated
thanks
We are on a SAN
Cleared the counters. The new counters are
ASYNC_NETWORK_IO 540 9812 375 78
WRITELOG 70 1828 328 0
The timeout happens only on a particular SP with a particular set of params. If we change the params and access the app, it works fine. We ran the Profiler and found that the SP's BatchCompleted event comes up in the Profiler only after the timeout happens on the ASP.NET side. If we restart the server, everything works fine.
If we remove the plan from the cache, the app works fine. However, we have taken parameter sniffing into consideration in the SP. What else could be the reason?
If I were to take a guess, I would assume that the background database load from the webserver is escalating locks and causing the whole thing to slow down. Then you take a large-ish query, run it, and that causes lock (and resource) contention.
I see this ALL THE TIME with companies complaining of performance problems with their client-server applications when going from one SQL server to a cluster. In the web-world, we get those issues much earlier.
The solution (most times) to lock issues is one of the following:
* Refactor your queries to work better (storing SCOPE_IDENTITY() in a variable instead of calling it 5 times, for example, as sketched below).
* Use the NOLOCK hint everywhere it makes sense, i.e. wherever dirty reads are acceptable.
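For the SCOPE_IDENTITY() point, a small T-SQL sketch (the table and column names are made up): capture the value once into a variable instead of re-calling the function, and scope NOLOCK to reads where dirty data is acceptable.

    DECLARE @NewOrderId INT;

    INSERT INTO dbo.Orders (CustomerId, CreatedOn)
    VALUES (@CustomerId, GETDATE());

    SET @NewOrderId = SCOPE_IDENTITY();      -- capture once, reuse below

    SELECT ol.*
    FROM dbo.OrderLines ol WITH (NOLOCK)     -- dirty reads must be acceptable here
    WHERE ol.OrderId = @NewOrderId;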
EDIT:
Also, try viewing the server with the new 2008 SQL Management Studio 'Activity Monitor'. You can find it by right-clicking on your server and selecting 'Activity Monitor'.
Go to the Processes section and look at how many processes are 'waiting'. Your wait time should be near 0. If you see a lot of stuff under 'Wait Type', post a screenshot and I can give you an idea of what the next step is.
Go to the Resource Waits section and see what the numbers look like there. Your waiters should always be near-0.
And 'Recent Expensive Queries' is awesome to look at to find out what you can do to improve your general performance.
Edit #2:
How much slower is it? Your SAN seems to be taking up about 10 seconds' worth, but if you are talking 20 seconds vs. 360 seconds, then that would not be relevant, and there are no waits for locks, so I guess I am drawing a blank. If the difference is between 1 second and 10 seconds, then it seems to be network related.
Run the following script to create this stored proc:
CREATE PROC [dbo].[dba_SearchCachedPlans]
@StringToSearchFor VARCHAR(255)
AS
/*----------------------------------------------------------------------
Purpose: Inspects cached plans for a given string.
------------------------------------------------------------------------
Parameters: @StringToSearchFor - string to search for e.g. '%<MissingIndexes>%'.
Revision History:
03/06/2008 Ian_Stirk@yahoo.com Initial version
Example Usage:
1. exec dba_SearchCachedPlans '%<MissingIndexes>%'
2. exec dba_SearchCachedPlans '%<ColumnsWithNoStatistics>%'
3. exec dba_SearchCachedPlans '%<TableScan%'
4. exec dba_SearchCachedPlans '%CREATE PROC%MessageWrite%'
-----------------------------------------------------------------------*/
BEGIN
-- Do not lock anything, and do not get held up by any locks.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
SELECT TOP 100
st.TEXT AS [SQL],
cp.cacheobjtype,
cp.objtype,
DB_NAME(st.dbid) AS [DatabaseName],
cp.usecounts AS [Plan usage],
qp.query_plan
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
WHERE CAST(qp.query_plan AS NVARCHAR(MAX)) LIKE @StringToSearchFor
ORDER BY cp.usecounts DESC
END
Then execute:
exec dba_SearchCachedPlans '%<MissingIndexes>%'
And see if you are missing any recommended indexes.
When SQL Server creates a plan, it saves it along with any recommended indexes. Just click on the query_plan column text to show the graphical plan. At the top there will be recommended indexes you should implement.
I don't have the answer for you, because I'm not a guru. But I do remember reading on some SQL blogs recently that SQL 2008 has some extra things you can add to the query/stored procedure so it calculates things differently. One thing you could try searching for is 'hints'. Also, how SQL uses the current 'statistics' makes a difference too; look that up. And the execution plan is only generated for the first run; if that plan doesn't work with different parameter values (because there would be a vast difference in what would be searched/returned), it can present this behavior, I think.
Sorry I can't be more helpful. I'm just getting my feet wet with SQL Server performance at this level. I bet if you asked someone like Brent Ozar he could point you in the right direction.
I've had this exact same issue a couple of times before. It seemed to happen to me when a particular user was on the site when it was deployed. When that user would run certain stored procedures with their ID, it would time out. When others would run it, or I would run it from the DB, it would run in no time. We had our DBAs watch everything they could, and they never had an answer. In the end, everything was fixed whenever I re-deployed the site and the user was not already logged in.
I've had similar issues, and in my case it had to do with the SP recompiling. Specifically, it was my use of temp tables vs. table variables.