Shiny apps in R: how to structure them correctly - r

I created my first Shiny app, which runs perfectly in my laptop.
However, I need to submit it to my professor and I want to make sure he will be able to run it.
I have a UI file, a server file, a global file and a process file.
The process file stores the data preparation.
The global file reads two RDS files which are the datasets that I use in the server.
Where should my libraries be loaded? For example, the app does not run without leaflet, how can I ensure the libraries are run automatically?
My RDS files are saved to my local drive, which means that my professor will need to change the path in order to use them, how can avoid this?
Shall I put the UI, server and global into one R script or is it ok to have them onto two different scripts?
Thank you!

As you describe your current set up, the most obvious place to load your libraries is at the start of your global file. (Or at the start of app.R if you move to a single file configuration.) Though not exactly what the reprex package is designed for, you could probably use reprex to make sure that your code is reproducible and independent of anything you may have overlooked. (You've already identified the obvious issue with data files.) Look here for more information on reprex.
Indeed. This is a problem. If you must load the data from files, you need to find a way of providing them to your professor. Telling him to edit your code by hand is not a good way to start. Exactly how to do this depends on the infrastucture your institution has set up, so it's difficult to advise you what to do. Do you not have a shared area for your course? Ask your fellow students - or even your professor.
Whether you bundle the app in a single file or in seperate files is really a matter of choice. I don't think there's a wroing way and a right way. For me, the deciding factor is usually the size of the app. Large apps get several files. For small apps, separation is just an unnecessary complication. Another factor - that doesn't seem to be an issue here - is whether I see the app as a potential front end for some methodology that might be worthy of its own package. In that case, I develop the app as a front end, and the package itself as separate entities.

Question 1 and 3 depend on criteria for code quality like performance and maintainability. When the code grows, there will come the point when it is more difficult to handle all code in one file. Once it grows even further you will encounter the point when you need split the app into modules to keep your code easy to maintain.
Regarding libraries I advise declaring them at the entrance point of an app (though, basically, that is a matter of taste and style). That way, you provide maximum clarity regarding the dependencies of the app. Again, if the app becomes very large and not all parts of the app rely on the same packages, it could improve performance and maintainability that each part loads the packages as required. It could give you a performance advantage when you do not need to load all packages at once. However, that is probably only true for very large apps.
However, since this all seems to be an exercise at university, I doubt that your app will reach higher levels of complexity.
Question 2: In a shiny app you can provide fileInput widget. This SO question shows you how.

Related

What is the best practice for transferring objects across R projects?

I would like to use R objects (e.g., cleaned data) generated in one git-versioned R project in another git-versioned R project.
Specifically, I have multiple git-versioned R projects (that hold drake plans) that do various things for my thesis experiments (e.g., generate materials, import and clean data, generate reports/articles).
The experiment-specific projects should ideally be:
Connectable - so that I can get objects (mainly data and materials) that I generated in these projects into another git-versioned R project that generates my thesis report.
Self-contained - so that I can use them in other non-thesis projects (such as presentations, reports, and journal manuscripts). When sharing such projects, I'd ideally like not to need to share a monolithic thesis project.
Versioned - so that their use in different projects can be independent (e.g., if I make changes to the data cleaning for a manuscript after submitting the thesis, I still want the thesis to be reproducible as it was originally compiled).
At the moment I can see three ways of doing this:
Re-create the data cleaning process
But: this involves copy/paste, which I'd like to avoid, especially if things change upstream.
Access the relevant scripts/functions by changing the working directory
But: even if I used here it seems that this would introduce poor reproducibility.
Make the source projects into packages and make the objects I want to "export" into exported data (as per the data section of Hadley's R packages guide)
But: I'd like to avoid the unnecessary metadata, artefacts, and noise (e.g., see Miles McBain's "Project as an R package: An okay idea") if I can.
Is there any other way of doing this?
Edit: I tried #landau's suggestion of using a single drake plan, which worked well for a while, until (similar to #vrognas' case) I ended up with too many sub-projects (e.g., conference presentations and manuscripts) that relied on the same objects. Therefore, I added some clarifications above to my intentions with the question.
My first recommendation is to use a single drake plan to unite the stages of the overall project that need to share data. drake is designed to handle a lot of moving parts this way, and it will be more seamless when it comes to drake's decisions about what to rerun downstream. But if you really do need different plans in different sub-projects that share data, you can track each shared dataset as a file_out() file in one plan and track it with file_in() in another plan.
upstream_plan <- drake_plan(
export_file = write_csv(dataset, file_out("exported_data/dataset.csv"))
)
downstream_plan <- drake_plan(
dataset = read_csv(file_in("../upstream_project/exported_data/dataset.csv"))
)
You fundamentally misunderstood Miles McBain’s critique. He isn’t saying that you shouldn’t write reusable code nor that you shouldn’t use packages. He’s saying that you shouldn’t use packages for everything. But reusable code (i.e. code that you want to reuse) absolutely belongs in packages (or, better, modules), which can then be used in multiple projects.
That being said, first off, pay attention to Will Landau’s advice.
Secondly, you can make your RStudio projects configurable such that they can load data based on paths given in a configuration. Once that’s accomplished, nothing speaks against hard-coding paths to data in different projects inside that config file.
I am in a similar situation. I have many projects that are spawned from one raw dataset. Previously, when the project was young and small, I had it all in one version controlled project. This got out of hand as more sub-projects were spawned and my git history got cluttered from working on projects in parallel. This could be to my lack of skills with git. My folder structure looked something like this:
project/.git
project/main/
project/sub-project_1/
project/sub-project_2/
project/sub-project_n/
I contemplated having each project in its own git branch, but then I could not access them simultaneously. If I had to change something to the main dataset (eg I might have not cleaned some parts) then project 1 could become outdated and nonfunctional. Once I had finished project 1, I would have liked it to be isolated and contained for reproducibility. This is easier to achieve if the projects are separated. I don't think a drake/targets plan would solve this?
I also looked briefly into having the projects as git submodules but it seemed to add too much complexity. Again, my git ignorance might shine through here.
My current solution is to have the main data as an R-package, and each sub-project as a separate git-versioned folder (they are actually packages as well, but this is not necessary). This way I can load in a specific version of the data (using renv for package versions).
My folder structure now looks something like this:
main/.git
sub-project_1/.git
sub-project_2/.git
sub-project_n/.git
And inside each sub-project, I call library(main) to load the cleaned data. Within each sub-project, a drake/targets plan could be used.

Static data storage on server-side

Why some data on server-side are still stored in DBC files, not in SQL-DB? In particular - spells (spells.dbc). What for?
We have a lot of bugs in spells and it's very hard to understand what's wrong with spell, but it's harder to find it spell...
Spells, Talents, achievements, etc... Are mostly found in DBC files because that is the way Blizzard did it back in the day. It's true that in 2019 this is a pretty outdated way to work indeed. Databases are getting stronger and more versatile and having hard-coded data is proving to be hard to work with. Hell, DBCs aren't really that heavy anyways and the reason why we haven't made this change yet is that... We have no other reason other than it being a task that takes a bit of time and It is monotonous to do.
We are aware that Trinity core has already made this change but they have far more contributors than we do if that serves as an excuse!
Nonetheless, this is already in our to-do list if you check the issue tracker at the main repository.
While It's true that we can't really edit DBC files because we would lose all the progress when re-extracted or lost the files, however, we can modify spells in a C++ file called SpellMgr.
There we have a function called SpellMgr::LoadDbcDataCorrections().
The main problem while doing this change is that we have to modify the core to support this change, and the function above contains a lot of corrections. Would need intense testing to make sure nothing is screwed up in the process.
In here by altering bits you can remove or add certain properties to the desired spells instead of touching the hard coded dbc files.
If you want an example, in this link, I have changed an Archimonde spell to have no cast time.
NOTE:
In this line, the commentary about damage can be miss leading but that's because I made a mistake and I haven't finished this pull request yet as of 18/04/2019.
The work has been started, notably by Kaev. I think at least 3 DBCs are now useless server side (but probably still needed client side, they are called DataBaseClient for a reason) like item.dbc.
Also, the original philosophy (for ALL cores, not just AC) was that we would not touch DBC because we don't do custom modifications, so there was no interest in having them server side.
But we wanted to change this and started to make them available directly in the DB, if you wish to help with that, it would be nice!
Why?
Because when emulation started, dbc fields were 90% unknown. So, developers created a parser for them that just required few code changes to support new fields as soon as their functionality was discovered.
Now that we've discovered 90% of required dbc fields and we've also created some great conversion tools for DBC<->SQL, it's just a matter of "effort".
SQL conversion is useful to avoid using of client data on server (you can totally overwrite them if you don't want to go against EULA) or just extends/customize them.
Here you are the issue about DBC->SQL conversion: https://github.com/azerothcore/azerothcore-wotlk/issues/584

Workflow for maintaining different versions of a Shiny app

Lately I have been making several very similar Shiny apps for different clients and hosting them on shinyapps.io.
Each app has a different title, different data, some differences in branding etc. but otherwise the code is very similar.
I'm having trouble maintaining these apps. When find and fix a bug I currently have to go through 5 different apps and make the change each time.
Does anyone have good suggestions on how to handle this? Git branches? I know the best solution would be to have one app and upload different data, but that's not possible unfortunately.
I'd like to keep using shinyapps.io, but I'm open to hosting the apps somewhere else if it makes my workflow better.
As I wrote in the comment shinyModules() will help you: https://shiny.rstudio.com/articles/modules.html
Shiny modules are to shiny functions, like ordinary functions are to repeating code.
Or to put it differently:
Repeating code --> function
repeating shiny function --> shiny module
As the documentation is a bit complicated here and there, i wrote a simplified example here:
Create a reactive function outside the shiny app.
You could store all the shiny modules in a file modules.R and add a global.R script to each of the apps that loads the modules (source("../modules.R"). Then you only have to update the functions within modules.R. That change of structure might take a while in the beginning. But, i think in the long run it pays off for more complex apps.
I ended up making a library that contained most of the code I needed for the apps, as suggested by Adam Spannbauer in the comments.
It's not perfect; I still have some duplication and I have to have the library on GitHub so that it will work with shinyapps.io. However, it's a big improvement on what I was doing previously.

Whether to write in "ui.R + server.R" or "app.R"

We can write our Shiny code in two separate files, "ui.R" and "server.R", alternatively we can write both the modules in a single file "app.R" and call the function shinyApp()
Is there any benefit regarding performance with either of the two approaches or we should choose one based on whether we want concise code or differentiated one?
They achieve the same thing. I usually like to write my real apps, that have lots of code and are complex, as two separate files to separate the logic and make it more maintainable. But when dealing with tiny apps for demo purposes or when posting an app to Stack Overflow or anywhere else, I find it's more reproducible and easier to have one statement (the app.R) approach.
Personal preference, really.
I think that app.R is better, but it's better to include your source files as the UI and server respectively, with source("file.R", local=TRUE). This way, you can separate the app into more than just 2 files while having an "overall" view of the app through the main file, like a main.cpp file in C++.

How do you use Excel server-side?

A client wants to "Web-enable" a spreadsheet calculation -- the user to specify the values of certain cells, then show them the resulting values in other cells.
(They do NOT want to show the user a "spreadsheet-like" interface. This is not a UI question.)
They have a huge spreadsheet with lots of calculations over many, many sheets. But, in the end, only two things matter -- (1) you put numbers in a couple cells on one sheet, and (2) you get corresponding numbers off a couple cells in another sheet. The rest of it is a black box.
I want to present a UI to the user to enter the numbers they want, then I'd like to programatically open the Excel file, set the numbers, tell it to re-calc, and read the result out.
Is this possible/advisable? Is there a commercial component that makes this easier? Are their pitfalls I'm not considering?
(I know I can use Office Automation to do this, but I know it's not recommended to do that server-side, since it tries to run in the context of a user, etc.)
A lot of people are saying I need to recreate the formulas in code. However, this would be staggeringly complex.
It is possible, but not advisable (and officially unsupported).
You can interact with Excel through COM or the .NET Primary Interop Assemblies, but this is meant to be a client-side process.
On the server side, no display or desktop is available and any unexpected dialog boxes (for example) will make your web app hang – your app will behave flaky.
Also, attaching an Excel process to each request isn't exactly a low-resource approach.
Working out the black box and re-implementing it in a proper programming language is clearly the better (as in "more reliable and faster") option.
Related reading: KB257757: Considerations for server-side Automation of Office
You definitely don't want to be using interop on the server side, it's bad enough using it as a kludge on the client side.
I can see two options:
Figure out the spreadsheet logic. This may benefit you in the long term by making the business logic a known quantity, and in the short term you may find that there are actually bugs in the spreadsheet (I have encountered tons of monster spreadsheets used for years that turn out to have simple bugs in them - everyone just assumed the answers must be right)
Evaluate SpreadSheetGear.NET, which is basically a replacement for interop that does it all without Excel (it replicates a huge chunk of Excel's non-visual logic and IO in .NET)
Although this is certainly possible using ASP.NET, it's very inadvisable. It's un-scalable and prone to concurrency errors.
Your best bet is to analyze the spreadsheet calculations and duplicate them. Now, granted, your business is not going to like the time it takes to do this, but it will (presumably) give them a more usable system.
Alternatively, you can simply serve up the spreadsheet to users from your website, in which case you do almost nothing.
Edit: If your stakeholders really insist on using Excel server-side, I suggest you take a good hard look at Excel Services as #John Saunders suggests. It may not get you everything you want, but it'll get you quite a bit, and should solve some of the issues you'll end up with trying to do it server-side with ASP.NET.
That's not to say that it's a panacea; your mileage will certainly vary. And Sharepoint isn't exactly cheap to buy or maintain. In fact, short-term costs could easily be dwarfed by long-term costs if you go the Sharepoint route--but it might the best option to fit a requirement.
I still suggest you push back in favor of coding all of your logic in a separate .NET module. That way you can use it both server-side and client-side. Excel can easily pass calculations to a COM object, and you can very easily publish your .NET library as COM objects. In the end, you'd have a much more maintainable and usable architecture.
Neglecting the discussion whether it makes sense to manipulate an excel sheet on the server-side, one way to perform this would probably look like adopting the
Microsoft.Office.Interop.Excel.dll
Using this library, you can tell Excel to open a Spreadsheet, change and read the contents from .NET. I have used the library in a WinForm application, and I guess that it can also be used from ASP.NET.
Still, consider the concurrency problems already mentioned... However, if the sheet is accessed unfrequently, why not...
The simplest way to do this might be to:
Upload the Excel workbook to Google Docs -- this is very clean, in my experience
Use the Google Spreadsheets Data API to update the data and return the numbers.
Here's a link to get you started on this, if you want to go that direction:
http://code.google.com/apis/spreadsheets/overview.html
Let me be more adamant than others have been: do not use Excel server-side. It is intended to be used as a desktop application, meaning it is not intended to be used from random different threads, possibly multiple threads at a time. You're better off writing your own spreadsheet than trying to use Excel (or any other Office desktop product) form a server.
This is one of the reasons that Excel Services exists. A quick search on MSDN turned up this link: http://blogs.msdn.com/excel/archive/category/11361.aspx. That's a category list, so contains a list of blog posts on the subject. See also Microsoft.Office.Excel.Server.WebServices Namespace.
It sounds like you're talking that the user has the spreadsheet open on their local system, and you want a web site to manipulate that local spreadsheet?
If that's the case, you can't really do that. Even Office automation won't help, unless you want to require them to upload the sheet to the server and download a new altered version.
What you can do is create a web service to do the calculations and add some vba or vsto code to the Excel sheet to talk to that service.

Resources