Persistent data store between calls in PL/R

I have a web application that talks to R using PL/R when doing adaptive testing.
I need to find a way to store static data persistently between calls.
I have one expensive calculation that creates an item bank, then a lot of cheap ones that get the next item after each response submission. However, currently I can't find a way to store the result of the expensive calculation persistently.
Putting it into the db seems to be a lot of overhead.
library(catR)
data(tcals)
itembank <- createItemBank(tcals) # this is the expensive call
nextItem(itembank, 0) # item 63 is selected
I tried to save and load the result, like this, but it doesn't seem to work: the result of the second NOTICE is just the string 'itembank'.
save(itembank, file="pltrial.Rdata")
pg.thrownotice(itembank)
aaa=load("pltrial.Rdata")
pg.thrownotice(aaa)
I tried saving and loading the workspace as well, but didn't succeed with that either.
Any idea how to do this?

The load function loads objects directly into your workspace. You don't have to assign its return value (which is just a character vector of the names of the objects loaded, as you discovered). If you do ls() after loading, you should find your itembank object sitting there; so after load("pltrial.Rdata"), pass itembank itself (not aaa) to pg.thrownotice.

Related

How To Use Flux Stores

Most examples of Flux use a todo or chat example. In all those examples, the data set you are storing is somewhat small and can be kept locally, so I'm not exactly sure if my planned use of stores falls in line with the Flux "way".
The way I intend to use stores is somewhat like ORM repositories: a way to access data in multiple ways and persist data to the data service, whatever that might be.
Let's say I am building a project management system. I would probably have methods like these for data retrieval:
getIssueById
getIssuesByProject
getIssuesByAssignedUser
getIssueComments
getIssueCommentById
etc...
I would also have methods like this for persisting data to the data service:
addIssue
updateIssue
removeIssue
addIssueComment
etc...
The one main thing I would not do is locally store any issue data (and, for that matter, most store data that relates to a data store). Most of the data is important to have fresh, because maybe the issue status has updated since I last retrieved it. All my data retrieval methods would probably always make an API request for the latest data.
Is this against the Flux "way"? Are there any issues with going about Flux in this way?
I wouldn't get too hung up on the term "store". You need to create application state in some way if you want your components to render something. If you need to clear that state every time a different request is made, no problem. Here's how things would flow with getIssueById(), as an example:
component calls store.getIssueById(id)
the store returns an empty object, since the issue isn't in its cache
the store calls action.fetchIssue(id)
component renders empty state
server responds with issue data and calls action.receiveIssue(data)
store caches that data and dispatches a change event
component responds to event by calling store.getIssueById(id)
the issue data is returned
component renders data
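The flow above can be sketched as a minimal store in plain JavaScript. This is an illustrative sketch, not any particular Flux library's API; names like `createIssueStore` and the stub server are invented here, and a real fetch would be asynchronous (the sketch models that by returning an empty object on the first call even though the stub responds immediately):

```javascript
// Minimal sketch of the getIssueById flow (illustrative, no Flux library).
function createIssueStore(fetchFromServer) {
  const cache = {};      // only the most recent server responses live here
  const listeners = [];

  const store = {
    // Steps 1-4: return what we have; kick off a fetch on a cache miss.
    getIssueById(id) {
      if (!(id in cache)) {
        fetchFromServer(id, (data) => store.receiveIssue(data));
        return {};       // component renders the empty state for now
      }
      return cache[id];
    },
    // Steps 5-6: cache the response and dispatch a change event.
    receiveIssue(issue) {
      cache[issue.id] = issue;
      listeners.forEach((fn) => fn());
    },
    // Components subscribe and re-read the store when notified (steps 7-9).
    subscribe(fn) {
      listeners.push(fn);
    },
  };
  return store;
}

// Usage: a stub "server" responds immediately, so the second read is a hit.
const server = (id, respond) => respond({ id, title: "Issue " + id });
const store = createIssueStore(server);
const first = store.getIssueById(42);   // {} on the miss; fetch was fired
const second = store.getIssueById(42);  // cached issue after receiveIssue ran
```

The point of the sketch is the shape of the cycle, not the cache itself: the store can throw its cache away whenever freshness matters, and the same read/fetch/notify loop still works.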
Persisting changes would be similar, with only the most recent server response being held in the store.
user interaction in component triggers action.updateIssue(modifiedIssue)
store handles action, sending changes to server
server responds with updated issue and calls action.receiveIssue(data)
...and so on with the last 4 steps from above.
As you can see, it's not really about modeling your data, just controlling how it comes and goes.

Object data source with business objects slow?

In my project I have a page with some RDLC graphs. They used to run on some stored procedures and an XSD. I would pass a string of the IDs my results should include, to restrict my data set. I had to change this because I started running into the 1000-character limit on object data source parameters.
I updated my graphs to run on a list of business objects instead and it seems that the page loads significantly slower than before. By significantly slower I mean page loads take about a minute now.
Does anybody know if object data sources are known to run slow when pulling business objects? If not is there a good way to track down what exactly is causing the issue? I put break points in my method that actually retrieves the business objects before and after it gets them; that method doesn't seem to be the cause of the slowdown.
I did some more testing and it seems that the dang thing just runs significantly slower when binding to business objects instead of a data table.
When I was binding my List<BusinessObject> to the ReportViewer, the page took 1 minute 9 seconds to load.
When I had my business logic use the same function that returns the List, build a DataTable from it with only the columns the report requires, and bind that DataTable to the report, the page loads in 20 seconds.
Are you using select *? If so try selecting each field individually if you aren't using the entire table. That will help a bit.
@William: I experienced the same problem. I noticed, though, that when I flatten the business object, the report runs significantly faster. You don't even have to map the business object to a new flattened one; you can simply set the nested objects to null, i.e.:
foreach (var employee in employees)
{
    employee.Department = null;
    employee.Job = null;
}
It seems that the report writer does some weird things traversing the object graph.
This seems to be the case only in VS 2010; VS 2008 doesn't seem to suffer the same problem.

Generate and serve a file in the server on demand. Best way to do it without consume too much resources?

My application has to export the result of a stored procedure to .csv format. Basically, the client performs a query, and he can see the results on a paged grid, if it contains what he wants, then he clicks on a "Export to CSV" button and he downloads the whole thing.
The server will have to run a stored procedure that will return the full result without paging, create the file and return it to the user.
The result file could be very large, so I'm wondering what the best way is to create this file on the server on demand and serve it to the client without blowing up the server's memory or resources.
The easiest way: call the stored procedure with LINQ, create a stream, and iterate over the result collection, creating a line in the file per collection item.
Problem 1: Does deferred execution apply to LINQ to stored procedures as well? (I mean, will .NET try to create a collection with all the items of the result set in memory, or will it give me the results item by item if I iterate instead of calling .ToArray()?)
Problem 2: Is that stream kept in RAM until I call .Dispose()/.Close()?
The not-so-easy way: call the stored procedure with an IDataReader and, for each line, write directly to the HTTP response stream. It looks like a good approach; as long as I write to the response as I read, memory is not blown up.
Is it really worth it?
I hope I have explained myself correctly.
Thanks in advance.
Writing to a stream is the way to go, as it will roughly consume no more than the current "record" and its associated memory. That stream can be a FileStream (if you create a file), the ASP.NET response stream (if you write directly to the web), or any other useful stream.
The advantage of creating a file (using the FileStream) is being able to cache the data to serve the same request over and over. Depending on your needs, this can be a real benefit. You must come up with an intelligent algorithm to determine the file path and name from the input; this will be the cache key. Once you have a file, you can use the TransmitFile API, which leverages the Windows kernel cache and is in general very efficient. You can also play with HTTP client caches (headers like If-Modified-Since, etc.), so the next time the client requests the same information, you may return a Not Modified (HTTP 304 status code) response. The disadvantage of using cache files is that you will need to manage these files: disk space, expiration, and so on.
Now, LINQ or IDataReader should not change much in terms of perf or memory consumption, provided you don't use LINQ methods that materialize the whole data set (exhaust the stream) or a big part of it. That means you will need to avoid ToArray(), ToList(), and other such methods, and concentrate only on "streamed" methods (enumerations, skips, and the like).
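The question is about .NET, but the "write as you read" idea is language-neutral. Here is a sketch in JavaScript, with the row source stubbed by a generator (standing in for an IDataReader) and the response stream stubbed by a write callback; the function names are invented for illustration:

```javascript
// Sketch of streaming CSV: one row is in memory at a time, and each row is
// written out before the next one is read.

function* fetchRows() {
  // Stand-in for iterating a data reader: yields one record at a time
  // instead of materializing the whole result set.
  for (let i = 1; i <= 3; i++) {
    yield { id: i, name: "item" + i };
  }
}

function csvEscape(value) {
  const s = String(value);
  // Quote fields containing commas, quotes, or newlines, doubling quotes.
  return /[",\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
}

function streamCsv(rows, write, columns) {
  write(columns.map(csvEscape).join(",") + "\n");            // header line
  for (const row of rows) {                                  // row by row
    write(columns.map((c) => csvEscape(row[c])).join(",") + "\n");
  }
}

// Usage: collect chunks the way an HTTP response stream would receive them.
const chunks = [];
streamCsv(fetchRows(), (s) => chunks.push(s), ["id", "name"]);
// chunks joined: "id,name\n1,item1\n2,item2\n3,item3\n"
```

In the real application `write` would be the HTTP response stream's write method, so the full file never exists in memory on the server.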
I know I'm late to the game here, but theoretically, how many records are we talking about? I saw 5000 being thrown around, and if it's around there, that shouldn't be a problem for your server.
Answering the easiest way:
It does unless you specify otherwise (you disable lazy loading).
Not sure I get what you're asking here. Are you referring to a StreamReader you'd be using for creating the file, or to the DataContext you are using to call the SP? I believe the DataContext will clean up for you after you're done (it's always good practice to close anyway). A StreamReader or the like will need its Dispose method run to be removed from memory.
That being said, when dealing with file exports I've had success in the past building the table (CSV) programmatically (via iteration), then sending the structured data as an HTTP response with the type specified in the header; the not-so-easy way, as you so eloquently stated :). Here's a question that asks how to do that with CSV:
Response Content type as CSV
"The server will have to run a stored procedure that will return the full result without paging..."
Perhaps not, but I believe you'll need Silverlight...
You can set up a web service or controller that allows you to retrieve data "by the page" (much like just calling a 'paging' service using GridView or other repeater). You can make async calls from silverlight to get each "page" of data until completed, then use the SaveFileDialog to save to the harddisk.
Hope this helps.
What you're talking about isn't really deferred execution, but limiting the results of a query. When you say objectCollection.Take(10), the SQL that is generated when you iterate the enumerable only takes the top 10 results of that query.
That being said, a stored procedure will return whatever results you are passing back, whether it's 5 or 5000 rows of data. Performing a .Take() on the results won't limit what the database returns.
Because of this, my recommendation (if possible for your scenario), is to add paging parameters to your stored procedure (page number, page size). This way, you will only be returning the results you plan to consume. Then when you want the full list for your CSV, you can either pass a large page size, or have NULL values mean "Select all".
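The paging idea above can be sketched generically. In SQL Server the slicing would happen inside the stored procedure (e.g. with ROW_NUMBER() or OFFSET/FETCH); the sketch below just simulates the contract over an in-memory array, with a null page size meaning "select all", as suggested:

```javascript
// Sketch of paged retrieval: only the requested slice is returned,
// mirroring pageNumber/pageSize parameters on a stored procedure.
function getPage(allRows, pageNumber, pageSize) {
  if (pageSize == null) return allRows.slice();   // NULL => select all
  const start = (pageNumber - 1) * pageSize;      // pages are 1-based
  return allRows.slice(start, start + pageSize);
}

// Usage: the grid asks for small pages; the CSV export asks for everything.
const rows = Array.from({ length: 95 }, (_, i) => ({ id: i + 1 }));
const gridPage = getPage(rows, 2, 10);    // rows 11..20 for the paged grid
const exportAll = getPage(rows, 1, null); // full result for the CSV export
```

The benefit is that the grid and the export share one data path: the same procedure serves both, and the database only ever materializes what the caller plans to consume.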

ASP.NET data caching design

I have a method in my BLL that interacts with the database and retrieves data based on defined criteria.
The returned data is a collection of FAQ objects which is defined as follows:
FAQID,
FAQContent,
AnswerContent
I would like to cache the returned data to minimize the DB interaction.
Now, based on the user selected option, I have to return either of the below:
ShowAll: all data.
ShowAnsweredOnly: faqList.Where(Answercontent != null)
ShowUnansweredOnly: faqList.Where(AnswerContent == null)
My Question:
Should I cache only the full data set returned from the DB (e.g. FAQ_ALL) and filter the other faqList modes from that cache item (interacting with the DB just once and filtering the cached data for the other two modes)? Or should I have 3 cache items, FAQ_ALL, FAQ_ANSWERED, and FAQ_UNANSWERED (interacting with the database once per mode, i.e. 3 times), and return the matching cache item for each mode?
I'd be pleased if anyone tells me about pros/cons of each approach.
Food for thought.
How many records are you caching, how big are the tables?
How much mid-tier resources can be reserved for caching?
How much data of each type exists?
How fast will filtering on the client side be?
How often does the data change?
how often is it changed by the same application instance?
how often is it changed by other applications or server side jobs?
What is your cache invalidation policy?
What happens if you return stale data?
Can you/Should you leverage active cache invalidation, like SqlDependency or LinqToCache?
If the dataset is large, then filtering on the client side will be slow and you'll need to cache two separate results (no need for a third if ALL is the union of the other two). If the data changes often, then caching will frequently return stale items without proactive cache invalidation in place. Active cache invalidation is achievable in the mid-tier if you control all the update paths and there is only one mid-tier application instance, but becomes really hard if either of those prerequisites is not satisfied.
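The "cache once, filter per mode" option from the question can be sketched as follows. The names (`createFaqCache`, `loadAllFromDb`) are illustrative, and the sketch ignores expiration and invalidation, which the answers above rightly flag as the hard part:

```javascript
// Sketch of option 1: a single cached entry (FAQ_ALL) fills on first use,
// and the answered/unanswered views are derived from it by filtering.
function createFaqCache(loadAllFromDb) {
  let all = null;        // the one cached result
  let dbCalls = 0;       // counts DB interactions, to show the trade-off

  function getAll() {
    if (all === null) {  // only the first call touches the database
      dbCalls++;
      all = loadAllFromDb();
    }
    return all;
  }

  return {
    showAll: () => getAll(),
    showAnsweredOnly: () => getAll().filter((f) => f.answerContent != null),
    showUnansweredOnly: () => getAll().filter((f) => f.answerContent == null),
    dbCallCount: () => dbCalls,
  };
}

// Usage: three different modes, one database interaction.
const db = () => [
  { faqId: 1, answerContent: "Use the cache." },
  { faqId: 2, answerContent: null },
];
const cache = createFaqCache(db);
cache.showAll();
cache.showAnsweredOnly();
cache.showUnansweredOnly();
// cache.dbCallCount() === 1
```

Option 2 (three cache entries) would swap the filters for three separate loaders, trading two extra DB calls for no per-request filtering cost; which wins depends on the data sizes and change rates listed above.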
It basically depends how volatile the data is, how much of it there is, and how often it's accessed.
For example, if the answered data didn't change much, then you'd be safe caching it for a while; but if the unanswered data changed a lot (and more often), then your caching needs might be different. If this were the case, it's unlikely that caching it all as one dataset would be the best option.
It's not all bad though - if the discrepancy isn't too huge, then you might be OK caching the lot.
The other point to think about is how the data is related. If the FAQ items toggle between answered and unanswered then it'd make sense to cache the base data as one - otherwise the items would be split where you wanted it together.
Alternatively, work with the data in-memory and treat the database as an add-on...
What do I mean? Well, typically the user will hit "save", which invokes code that saves to the DB; when the next user comes along, they invoke a call which gets the data out of the DB. In terms of design, the DB is a first-class citizen: everything has to go through it before anyone else gets a look-in. The alternative is to base the design around data held in-memory (by the BLL) and then saved (perhaps asynchronously) to the DB. This removes the DB as a bottleneck but gives you a new set of problems, like what happens if the database connection goes down or the server dies with data only in memory?
Pros and Cons
Getting all the data in one call might be faster (by making fewer calls).
Getting all the data at once if it's related makes sense.
Granularity: data that is related and has a similar "cachability" can be cached together, otherwise you might want to keep them in separate cache partitions.

I’m not sure whether lazy load pattern is useful here

I’m currently reading a book on website programming, and the author mentions that he will code DLL objects to use the lazy load pattern. I think I conceptually understand the lazy load pattern somewhat, but I’m not sure I understand its usefulness in the way the author implemented it.
BTW, here I’m not asking about the usefulness of the lazy load pattern in general, but whether it is useful in the way this particular book implements it:
1) Anyway, when a DLL object is created, a DB query is performed (via the DAL), which retrieves data from various columns and uses it to populate the properties of our DLL object. Since one of the fields (call it "L") may contain quite a substantial amount of text, the author decided to retrieve that field only when that property is read for the first time.
A) In our situation, what exactly did we gain by applying the lazy load pattern? Just lower memory usage?
B) But on the other hand, doesn’t the way the author implemented the lazy load pattern cause the CPU to do more work and thus take longer to complete? If L is retrieved separately from the other fields, our application has to make an additional call to SQL Server in order to retrieve "L", while without the lazy load pattern only one call to SQL Server would be needed, since we would get all the fields at once.
BTW – I realize that the lazy load pattern may be extremely beneficial in situations where retrieving a particular piece of data would require heavy computation, but that’s not the case in the above example.
Thanks
It makes sense if the DLL objects can be used without the L field (most of the time). If that is the case, your program can work with the available data while waiting for L to load. If L is always needed, then the pattern just increases complexity. I do not think it will significantly slow things down, especially if loading L takes more time than anything else. But that is just a guess: write it both with and without lazy loading, then see which is better.
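The book's pattern can be sketched generically: the cheap columns come from the initial query, and the expensive field triggers its own query only on first read. This is an illustrative JavaScript sketch (the .NET version would use a property getter the same way); `makeRecord` and `loadL` are invented names, and the counter makes the extra round trip from question B visible:

```javascript
// Sketch of lazily loading one expensive field "L": reading the property
// the first time runs the second query; never reading it skips that cost.
function makeRecord(cheapFields, loadL) {
  let l = null;
  let loaded = false;

  return {
    ...cheapFields,
    // First read pays the extra round trip (question B); a record whose
    // .l is never read avoids transferring the large text (question A).
    get l() {
      if (!loaded) {
        l = loadL();     // the additional call to the database
        loaded = true;
      }
      return l;
    },
  };
}

// Usage: count round trips to see the trade-off.
let trips = 0;
const record = makeRecord({ id: 7 }, () => { trips++; return "big text"; });
// trips === 0 here: no extra query until .l is actually read
const text = record.l;   // triggers the lazy query
// trips === 1, and it stays at 1 on further reads
```

This makes the trade-off in questions A and B concrete: the pattern wins exactly when a large fraction of records never have `.l` read, and loses (one extra round trip per record) when every record does.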
I think that this is pretty useful when applied to the correct columns. For instance let's say that you have a table in your database, Customers, and in that table you have a column CustomerPublicProfile, which is a text column which can be pretty big.
If you have a screen (let's call it Customers.aspx) which displays a list of the customers (but not the CustomerPublicProfile column), then you should try and avoid populating that column.
For instance, if your Customers.aspx page shows 50 customers at a time, you shouldn't have to get the CustomerPublicProfile column for each customer. If the user decides to drill down into a specific customer, then you would go and get the CustomerPublicProfile column.
About B: yes, this does make N extra calls, where N is the number of customers the user decided to drill down into. But the advantage is that you saved a lot of unneeded overhead by skipping the column in the first place. Specifically, you avoided getting M-N values of the CustomerPublicProfile column, where M is the number of customers retrieved on the Customers.aspx page to begin with.
If in your scenario M has a value close to N then it is not worth it. But in the situation I described M is usually much larger than N so it makes sense.
Sayed Ibrahim Hashimi
I had a situation like this recently where I was storing large binary objects in the database. I certainly didn't want these loading into the DLL object every time it was initialised, especially when the object was part of a collection. So there are cases when lazy loading a field would make sense. However, I don't think there is any general rule you can follow - you know your data and how it will be accessed. If you think it's more efficient to make one trip to the database and use a little more memory, then that is what you should do.
