Sharing in memory data in RStudio Server - r

I am trying to determine if I am able to keep data in-memory with RStudio to be used by multiple sessions, or for the session to at least be preserved. Searching for information about the existence/nonexistence of this feature has proven to be challenging.
The test is this:
In a session with RStudio create a variable and assign a value to it.
In another session run a script that refers to that variable.
If the variable is assigned a value then the script will work; otherwise it fails with "Error: object variable not found".
Is it possible to make a cross-session variable in RStudio Server that will work with this procedure without engaging file I/O? Or is it simply unavailable as a server function?

Unfortunately this is not possible because of the way R itself is designed.
Each R session has its own private memory space which contains values and data (the global environment for that session, etc.).
In order to create a cross-session variable, the R sessions would have to share memory, and they would also have to coordinate access to that memory, so that (for instance) if one session was changing the value of the variable, the other session could not read the value until the first session was done changing it. This sort of coordination mechanism just doesn't exist in R.
If you want to do this, there are a couple of workarounds:
Keep your data in a place that both sessions can read and write to safely, for instance in a database, or
Use file I/O, as you mentioned, which isn't too hard with an .RData file: when you wish to publish data to other sessions, write the private variables to an R data file (using e.g. save), and when the other session wishes to synchronize, it can load the data into its own private space (using e.g. load).

Related

What is the difference between Sys.getenv() and getOption() in R?

I'm wondering about the difference between the R environment and R options. Specifically, what is the difference between the settings held in this place:
Sys.getenv()
vs. this place
getOption()
When calling each of them, I receive quite different settings but I can't figure out what the difference is on a conceptual level. What is each of these intended for? When deciding in which location to let the user store their API keys, which one is the better (safer, faster, more logical) place to store them in? Does either of these settings persist after closing R?

Storing large object to InProc session rather than reloading on every page

This is my first post/question so please let me know if/how I can improve it. I found similar questions but nothing quite covered this.
When you store to InProc session you're just storing a reference to the data. So, if I have a public property foo, and I store it in Session("foo") = foo, then I haven't really taken up any additional memory (aside from the 32/64 bits used by the pointer)?
In my case, we are currently reloading foo on every page of our website, so if I were to instead store it in session, then it should be taking the same amount of space, but not needing to reload on every page. I'm seeing a lot of people say not to store large objects in session, but if that large object already exists, what difference does it make to have a pointer to it? Of course I would remove the object from session the moment it was no longer needed.
The data we are trying to store is an object specific to the user's current work, but not user data. As an analogy, say the user was a car dealer, and he is looking at all the data for a particular customer. We have multiple pages for this customer, and we want to keep all the customer info loaded on each page. All the customer data is stored in a single XML data column in a SQL table, which we parse on every page.
We have tried binary serialization instead of parsing xml, so we could store with session in state server mode, but we found the performance to actually be worse.
We are running on a single web server.
First off, no. When you store something in the session state, all the data required to store that object is consumed by the website process(es). Just because .NET treats variables like references doesn't mean it actually uses less memory than a no-GC language. It just means that copying that variable around is done efficiently without using reference operators or pointers.
Your question is a bit vague, but you have a few options for persisting data:
1) Send the data to the client as JSON and store it on the browser if it should be per-user and is needed more on the client side than the server side. You can then send pieces of the data with different requests if you need to (put it in hidden fields if you have to use ASPX web forms).
2) Store it in the session state if it is a small bit of per user data.
3) Store it in the ASP.NET cache if it is large and common to all users, see here (https://msdn.microsoft.com/en-us/library/6hbbsfk6.aspx).
4) If it is large, user-specific, and used primarily on the server, then you have more of a performance problem. You should see if you can break out any user-specific stuff from static stuff. If you do that and it's still large, then a database may not be a bad solution. If you are already using DB calls in your application, then looking up this data on every request won't cause too much overhead and you won't have to regenerate it from scratch (you should only do this if the data takes a considerable time to generate, as a DB call could be slower than just regenerating the data itself). I recommend writing some sort of middleware (HttpModule or OwinMiddleware) that uses whatever user identity you use for auth to look up the data and then set it on the HttpContext.Current.Items collection. This way the data is usable for the entire request and you can add logic in the middleware to figure out when to set it.
I would think that having a large chunk of user-specific data would be a red flag as user data should just be a list of what the user can/can't do and what their preferences are.
If this is static data then it's super simple. The application cache is what you want. The only complication would be if you have multiple servers that need synced data.

Get time data last changed in redis using rredis

I have some data stored in Redis which I am accessing through R using the package rredis. Is it possible to find the time at which a piece of data stored in Redis (in my particular example a hash) last changed, from within the R terminal?
Thanks
No.
Redis does not maintain the timestamp of a key's last update. It is possible, however, to implement this type of behavior "manually" by adding this logic to your application's code.
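The "manual" approach above can be sketched as follows. This is only a minimal illustration in Python, not rredis code: the helper names, the ":last_modified" key suffix, and the FakeRedis stand-in are all assumptions made up for the example. The stand-in exists so the sketch runs without a server; the helpers only rely on hset, set and get, which a real client also provides.

```python
import time


class FakeRedis:
    """Tiny in-memory stand-in so the sketch runs without a server (hypothetical)."""

    def __init__(self):
        self.hashes = {}
        self.strings = {}

    def hset(self, name, key, value):
        self.hashes.setdefault(name, {})[key] = value

    def set(self, name, value):
        self.strings[name] = value

    def get(self, name):
        return self.strings.get(name)


def hset_with_timestamp(client, name, field, value, now=None):
    """Write a hash field and record the write time under a companion key."""
    stamp = time.time() if now is None else now
    client.hset(name, field, value)
    # The ":last_modified" suffix is an assumed naming convention.
    client.set(name + ":last_modified", stamp)
    return stamp


def last_modified(client, name):
    """Return the recorded write time, or None if no helper write happened."""
    raw = client.get(name + ":last_modified")
    return None if raw is None else float(raw)


client = FakeRedis()
hset_with_timestamp(client, "user:1", "email", "a@example.com", now=1700000000.0)
print(last_modified(client, "user:1"))
```

The timestamp is only as reliable as the discipline of the writers: any code that changes the hash without going through the helper leaves the companion key stale. Against a real server the two writes are separate commands, so wrapping them in a transaction would be worth considering.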

What is the best way to store tagged value into a content in Plone: zope.annotation or setattr

This is a common case for a developer. You want to add a feature to every content type of your website that needs to store data. It is a kind of metadata, not configuration data.
I see two solutions:
zope.annotation
setattr: add attribute to the persistent object
I don't really know why, but as of Plone 2.5 it was considered good practice to use zope.annotation, and now it no longer seems to be the preferred way to store additional data. For example, plone.uuid uses setattr to store the unique ID.
Either one, depending on how often your data changes and how big it is. It's all about how the ZODB stores this information.
Storing data directly on the object using setattr means that data is stored in the persistent record for that object. If that's a big object that means there is a big transaction going to take place for that write.
Storing data in a zope.annotation annotation means you get a separate persistent record for each annotation entry, so any changes to your data will result in a smaller transaction. But if you want to access this data often, that extra persistent record will need to be loaded, on top of all the other persistent records. It'll take a slot in your ZODB cache, your ZEO server or RelStorage server will need to serve it, etc.
plone.uuid uses setattr because the UUID is generally generated only once for a given object, usually at the time the object is being created anyway. It is also a piece of data that is accessed often and is quite small. So, by putting it directly on the object itself, it'll be loaded as soon as you load that object, no extra trips to the ZODB required, and it'll only be changed once in its lifetime.
Note: the above presumes that the annotations are stored with the AttributeAnnotations adapter, which is the most common method and the default for Plone content.
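To make the write-size trade-off concrete, here is a toy model in Python. It does not import ZODB or zope.annotation; it only assumes, as a simplification, that the cost of a write is roughly the size of the pickled record being rewritten. The Content class, the record_size helper and the "my.package.tag" key are hypothetical names for the illustration.

```python
import pickle


class Content:
    """Stand-in for a content object with a large body (hypothetical)."""

    def __init__(self, body):
        self.body = body


def record_size(state):
    """Approximate the size of one stored record by pickling its state."""
    return len(pickle.dumps(state))


big = Content("x" * 100_000)

# Option 1: setattr. The tag lives in the object's own record, so changing
# the tag means rewriting that whole record, body included.
setattr(big, "tag", "reviewed")
size_setattr_write = record_size(vars(big))

# Option 2: a separate record. The tag lives in its own small mapping, so
# changing it rewrites only that mapping (but reading it is an extra load).
annotations = {"my.package.tag": "reviewed"}
size_annotation_write = record_size(annotations)

print(size_setattr_write > size_annotation_write)  # True
```

For a small, write-once value on an object that is being written anyway (the plone.uuid case), the first option costs nothing extra; for a value that changes often on a large object, the second keeps each write small.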

Pitfall of storing dataTable(10,000 rows) in a session variable?

Suppose my DataTable contains 10,000 rows. What are the pitfalls of storing the DataTable in a session variable? I want to use it until a new row has been added.
What type of session mode should I use?
Don't do that; it's not recommended. It hurts performance if your server is low on memory and busy.
Before proceeding with this you need to consider:
Does your server have memory available?
How busy is your server?
Will the data you are going to put in session be shared across multiple user requests?
Why do you want to store that much data in a session? If you are doing this for pagination (in a data grid, I assume), then you should reconsider your design.
If you need the data throughout the user's session, then storing it in Session is good. But in your case you can go for either ViewState or Cache. Storing it in ViewState is the better choice.
ViewState["Table"] = ds.Tables[0];
To retrieve it:
// The stored object is a DataTable, so cast back to DataTable, not DataSet.
DataTable dt = (DataTable)ViewState["Table"];
If a session variable is read before it has been assigned, or after the current session has timed out, the read returns null, and using the result without a null check throws a NullReferenceException.
There is no easy way to track the usage of session variables, their key names, their data types, the approximate memory being used per session, etc.
But do remember one thing: putting a DataSet inside ViewState will increase the page size and slow down page loading.
The most efficient way to do what you ask is to convert the dataset that was returned by the DB into a generic List and then to use the session to hold the List. A List will use less memory than a DataSet in session.
Firstly, do you really need that much data stored at the session level?
Could you possibly store a bit more at the application level (cache objects)?
If you have a farm, that is, more than one IIS server, then you will have to use either an in-memory cache or a SQL cache.
With a large blob, the in-memory cache would be faster.
If it's all on a single computer, keeping it in memory on that server would work.
I would seriously think about refactoring the problem.
Can you introduce paging or some other form of data window in order to reduce the amount of data?
With a dataset that big, I really don't think you should store it in session or ViewState. Storing it in SQL session state will have a huge read and write overhead too. Storing it in in-memory session will significantly affect your worker process memory usage. Storing it in ViewState is even worse, as it will make page loading extremely slow due to the page size and the time taken to encode and decode the ViewState.
If you really have to do it, of all the choices I would go for in-memory session and pay the price. Otherwise I think you should consider refactoring your code so that you don't have to cache a DataTable of that size, and pull out data a set/page at a time as you need it. Alternatively, if you have a layered architecture, have your application code cache it; when a request is made, the application code can extract and return the record/subset to the UI when needed.
