This is the source code of my Shiny app, which redraws polygons for more than 350 towns in Taiwan whenever any input in the UI changes. The towns' values change with every input, so there is little opportunity to use leafletProxy. I am now having performance issues, especially on Shiny Server.
You may try running the app locally: the map shows up roughly 10 seconds after an option is changed in the UI. However, the app deployed on Google Compute Engine or on shinyapps.io takes much longer (around 30 seconds) to draw the map, not only when the app initializes but also every time an input changes. Besides, Shiny Server frequently disconnects during computations like this:
When that disconnection happens, /var/log/shiny-server.log tells me:
[INFO] shiny-server - Error getting worker: Error: The application
exited during initialization.
This has never happened locally.
It doesn't make any sense to me. How is it possible that my laptop beats the servers? My laptop is a MacBook Air (Early 2015) with just a 1.6 GHz Intel Core i5 and 8 GB of 1600 MHz DDR3, whereas the VM on Google Compute Engine performs so badly even though it has 4 vCPUs and 15 GB of RAM.
How can I find out why performance is worse on Shiny Server, or how should I refactor my code?
Possibly related: Leaflet R performance issues with large map
Firstly, preprocessing has no place inside the Shiny application. Why repeat work every time someone uses the app when it can be done once and the saved product loaded instead?
I'd have a look at the following steps:
Remove anything that can be done once, externally (e.g. lines 12-37 of your app code)
Simplify the polygons to make the file smaller (faster loading; do this once and load the saved product)
Anything you generate repeatedly (labels etc.), generate once, save in a list (e.g. metadata.rds), and read in once to reference (see the sketch after this list)
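A minimal sketch of that one-off preprocessing script, run outside the app (file names, the column name, and the simplification ratio are all illustrative; I'm assuming the sf and rmapshaper packages):

    # preprocess.R -- run once, outside the Shiny app
    library(sf)          # read the polygon shapefile
    library(rmapshaper)  # topology-preserving polygon simplification

    towns <- st_read("towns.shp")                   # placeholder input file
    towns_small <- ms_simplify(towns, keep = 0.05)  # keep ~5% of the vertices

    # pre-build anything the app would otherwise rebuild on every input change
    metadata <- list(labels = sprintf("Town: %s", towns_small$name))

    saveRDS(towns_small, "towns_simplified.rds")
    saveRDS(metadata, "metadata.rds")

The app then only needs readRDS("towns_simplified.rds") and readRDS("metadata.rds") once at startup, instead of repeating this work in every session.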
Sometimes an app can appear to run faster locally because you don't actually restart the session while developing; on the server, Shiny is essentially kickstarting a session for each user (more or less).
Related
I'm having a lot of trouble working with RStudio on a new PC. I could not find a solution searching the web.
When RStudio is running, it constantly eats up memory until it becomes unworkable. If I work on an existing project, it takes half an hour to an hour to become impossible to work with. If I start a new project without loading any objects or packages, just writing scripts without even running them, it takes longer to reach that point, but it still does.
When I first start the program, the Task Manager already shows memory usage of 950-1000 MB (sometimes more), and as I work it climbs to about 6000 MB, at which point it is impossible to work with, as every action is delayed and 'stuck'. For comparison, on my old PC the Task Manager shows 100-150 MB while I work. When I click "Memory Usage Report" within RStudio, the "used by session" figure is very small and the "used by system" figure is almost at the maximum, yet RStudio is the only thing taking up the system memory on the PC.
Things I tried: installing older versions of both R and RStudio, pausing my anti-virus program, changing compatibility mode, setting zoom to 100%. It feels like RStudio is continuously running something in the background, as the memory usage keeps growing (quite quickly). But maybe it is something else entirely.
I am currently using the latest versions of R and RStudio (4.1.2 and 2021.09.0-351) on a PC with an Intel i7 processor, 64-bit, 16 GB of RAM, and Windows 10.
What should I look for at this point?
On Windows, there are several typical memory and CPU issues with RStudio. In this answer I explain how the RStudio interface itself uses memory and CPU as soon as you open a project (e.g., when RStudio shows you some .Rmd files). The memory/CPU cost of the computation itself is not covered (i.e., performance issues when executing a line of code are out of scope).
When working on 'long' .Rmd files within RStudio on Windows, CPU and/or memory usage sometimes gets very high and increases progressively (e.g., because of a process named QtWebEngineProcess). To solve the problems caused by long .Rmd files loaded in an RStudio session, you should:
Pay attention to which RStudio process consumes memory while it scans your code (i.e., disable or enable features in RStudio's Global Options menu). For example, try disabling inline display (Tools => Global Options => R Markdown => Show equation and image preview => Never). This post put me on the track of considering that memory/CPU leaks are sometimes due to RStudio itself, not the data or the code.
Set up a bookdown project in order to split your large .Rmd file into several smaller ones. See here.
As a bonus step, check whether any loaded packages conflict, using the command tidyverse_conflicts() (see the one-liner below); but that is already a 'computing problem' (not covered here).
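For reference, that conflict check is a one-liner once the tidyverse is attached:

    library(tidyverse)
    tidyverse_conflicts()  # lists functions that mask, or are masked by, other attached packages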
A few days ago I noticed R was using 34% of the CPU when I had no code running. I noticed it again today and I can't figure out why. If I restart R, CPU usage returns to normal, then after 20 minutes or so it ramps up again.
I have a scheduled task that downloads a small file once a week using R, and another that uses wget in Ubuntu (WSL). It might be that the constant CPU usage only happens after I download covid-related data from a GitHub repository (link below). Is there a way to see whether this is hijacking resources? If it is, other people should know about it.
I don't think it's a Windows task-reporting error, since my temperatures are what I would expect for constant 34% CPU usage (~56 °C).
Is this a security issue? Is there a way to see what R is doing? I'm sure there is a way to inspect this properly, but I don't know where to begin. GlassWire hasn't reported any unusual activity.
From the Windows 10 Event Viewer, I've noticed a lot of these recently but don't quite know how to read them:
The application-specific permission settings do not grant Local Activation permission for the COM Server application with CLSID {8BC3F05E-D86B-11D0-A075-00C04FB68820} and APPID {8BC3F05E-D86B-11D0-A075-00C04FB68820} to the user redacted SID (S-1-5-21-1564340199-2159526144-420669435-1001) from address LocalHost (Using LRPC) running in the application container Unavailable SID (S-1-15-2-181400768-2433568983-420332673-1010565321-2203959890-2191200666-700592917). This security permission can be modified using the Component Services administrative tool.
Edit: CPU usage seems to be positively correlated with how long R has been open.
Given the information you provided, it looks like RStudio (not R) is using a lot of resources. R and RStudio are two very different things. These kinds of issues are very difficult to investigate, as one needs to be able to reproduce them on another computer. One thing you can do is raise the issue with the RStudio team on GitHub.
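One way to separate the two in Task Manager terms (my suggestion, not something from this thread) is the ps package, which can report on the R process itself:

    # run inside the R session whose CPU usage looks suspicious
    library(ps)
    p <- ps_handle()   # handle to this very R process
    ps_name(p)         # which binary this is, e.g. "Rterm.exe" vs RStudio's "rsession.exe"
    ps_cpu_times(p)    # CPU seconds consumed by R itself, not by the RStudio IDE

If R's own CPU time stays flat while Task Manager keeps climbing, the usage is coming from RStudio's processes rather than from your session.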
I am running an application that takes quite a while to load: it has to read 7 GB of data, which takes about 220 seconds when run locally. However, the deployed app disconnects from the server after roughly 120 seconds, before it can finish loading.
I don't know exactly what I can put here, since the log doesn't show anything. If there is anywhere I can grab more information from, or if this is a known issue with an easy fix, I would love to know!
Are you using shinyapps.io? The free tier only allows you to use 1 GB of RAM; loading 7 GB of data will definitely crash the server.
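A quick local sanity check before deploying (object and file names here are placeholders): the on-disk size and the loaded size can differ a lot, so measure what the data actually occupies in RAM.

    dat <- readRDS("bigdata.rds")            # placeholder for whatever you load
    print(object.size(dat), units = "GB")    # in-memory size of that one object
    # total footprint of everything in the workspace, in MB:
    sum(sapply(mget(ls()), object.size)) / 1024^2

If that total is anywhere near the instance's RAM limit, the app will be killed on the server even though it runs fine locally.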
My Shiny app works fine locally, but when I deploy it on the server (this app), it throws the following error:
I looked at the logs and it says "out of memory".
I checked, and shinyapps.io allows memory usage of up to 1 GB for free accounts.
But when I looked at my app's memory usage, it hardly touches even 500 MB.
I admit that the CSV file I am trying to access is 1.3 GB, which may well be the culprit.
But I am not convinced, because data sets worth analysing are often large and not limited to 2 or 50 MB.
I would appreciate some expert advice and, if possible, a workaround.
Some confidential data is stored on a server and accessible to researchers via remote access.
Researchers can log in via some (I think Cisco) remote client and share virtual machines on the same host.
A 64-bit version of Windows runs on the virtual machine.
The system appears to be optimized for Stata; I'm among the first to use the data with R. There is no RStudio installed on the client, just RGui 3.0.2.
And here's my problem: the data is saved in the Stata format (.dta), and I need to open it in R. At the moment I am doing:
library(foreign)  # read.dta() comes from the foreign package
read.dta(fileName, convert.factors = FALSE)[fields]
Loading a smaller file (around 200 MB) takes 1-2 minutes. However, loading the main file (3-4 GB) takes very long, longer than my patience lasted. During that time, the R GUI stops responding.
I can test my code on my own machine (OS X, RStudio) on a smaller data sample, which works fine. Is this
because of OS X + RStudio, or only
because of the size of the file?
A colleague uses Stata on a similar file in their environment, and it works fine for them.
What can I do to improve the situation? Possible solutions I came up with:
Load the data into R differently (perhaps there is a way that doesn't require all this memory; see the sketch after this list). I also have access to Stata, so if all else fails I could prepare the data in Stata, for example slice it into smaller pieces and reassemble it in R
Ask them to allocate more memory to my user on the VM (if that is indeed the issue)
Ask them to install RStudio (even if that's not faster, perhaps it's less prone to crashes)
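A sketch of the first option, assuming the haven package can be installed on that VM (R 3.0.2 may be too old for a current haven, so treat this as illustrative; fileName and fields are the same as above):

    # install.packages("haven")  # if installing packages is allowed on the VM
    library(haven)
    dat <- read_dta(fileName, col_select = tidyselect::all_of(fields))

In my experience haven reads .dta files considerably faster than foreign::read.dta, and col_select avoids ever materialising the columns you don't need.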
Certainly the size of the file is a prime factor, but the machine and its configuration might be, too. It's hard to tell without more information. You need a 64-bit operating system and a 64-bit version of R.
I don't imagine that RStudio will help or hinder the process.
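The 64-bit requirement is easy to verify from the R console (base R only; memory.limit() exists on Windows builds of R prior to 4.2):

    R.version$arch            # "x86_64" on a 64-bit build of R
    .Machine$sizeof.pointer   # 8 on 64-bit R, 4 on 32-bit
    memory.limit()            # Windows only: the current memory cap in MB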
If the process scales linearly, your big-data case will take (120 seconds) × (4096 MB / 200 MB) ≈ 2458 seconds, or roughly 40 minutes. Is that how long you waited?
The process might not be linear.
Was the processor making progress? If you checked CPU and memory, was the process still running? Was it doing a lot of page swaps?
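If you'd rather extrapolate from a measurement than from patience, time the small read and scale it up (smallFile is a placeholder):

    t <- system.time(
      read.dta(smallFile, convert.factors = FALSE)  # the ~200 MB file
    )
    t["elapsed"] * (4096 / 200)   # naive linear estimate for the 4 GB file

Keep in mind the caveat above: the scaling may well be worse than linear once the machine starts swapping.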