I have a Shiny app that selects a subset of observations from a large dataframe and then renders R Markdown reports against each observation in that subset, zipping all of these reports at the end and downloading the zip file.
When the subset is small (e.g. fewer than 10 reports), everything works fine, but a network timeout occurs once rendering all the reports in the background takes more than a certain amount of time (e.g. in some cases more than 100 reports need to be rendered).
I have tried editing the config file to set app_init_timeout = 3600 and app_idle_timeout = 3600, but this does not seem to have any effect on the problem.
Any ideas?
I solved this problem by separating the report creation from the download. I used eventReactive to handle the report creation and the zipping of the files, and then made the downloadHandler conditional on the existence of the zip file, so that it only appeared once the zip file was ready.
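Roughly, the pattern looked like this (a sketch with illustrative names: selected_obs() stands in for the real reactive subset, and report.Rmd is assumed to accept an obs parameter):

    library(shiny)

    ui <- fluidPage(
      actionButton("make_reports", "Generate reports"),
      # The download button is only rendered once the zip file exists
      uiOutput("download_ui")
    )

    server <- function(input, output, session) {

      # Stand-in for the real reactive subset of observations
      selected_obs <- reactive(head(mtcars, 3))

      # Render one report per observation, then zip them all
      zip_path <- eventReactive(input$make_reports, {
        out_dir <- file.path(tempdir(), "reports")
        dir.create(out_dir, showWarnings = FALSE)

        report_files <- vapply(seq_len(nrow(selected_obs())), function(i) {
          out_file <- file.path(out_dir, paste0("report_", i, ".pdf"))
          rmarkdown::render("report.Rmd",
                            output_file = out_file,
                            params = list(obs = selected_obs()[i, ]),
                            envir = new.env())
          out_file
        }, character(1))

        zip_file <- file.path(tempdir(), "reports.zip")
        zip::zipr(zip_file, files = report_files)
        zip_file
      })

      # The download handler only appears once the zip file is ready
      output$download_ui <- renderUI({
        req(file.exists(zip_path()))
        downloadButton("download_zip", "Download reports")
      })

      output$download_zip <- downloadHandler(
        filename = "reports.zip",
        content = function(file) file.copy(zip_path(), file)
      )
    }

    shinyApp(ui, server)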
Hi all, complete ADF newbie here. I have a strange issue with Data Factory and, surprisingly, I can't see that anyone else has experienced this same issue.
To summarize:
I have set up a basic copy activity from blob to an Azure SQL database with no transformation steps
I have set up a trigger based on a wildcard name, i.e. any files loaded to blob that start with IDT* will be copied to the database
I have loaded a few files to a specific location in Azure Blob
The trigger is activated
As soon as it looks like it all works, a quick check of the record count shows that the same files have been imported multiple times
I have analysed what is happening: basically, when I load my files to blob, they don't technically arrive at exactly the same time. So when file 1 hits the blob, the wildcard search is triggered and it finds one file. Then when the second file hits the blob some milliseconds later, the wildcard search is triggered again and this time it processes two files (the first and the second).
The problem keeps compounding based on the number of files loaded.
I have tried multiple things to get this fixed to no avail, because fundamentally it is behaving "correctly".
I have tried:
Deleting the file once it has been processed, but again, due to the millisecond issue, the file is technically still there and can still be processed
Adding a loop to process one file at a time, deleting each file (identified by file name in the blob) before the next is loaded, but this hasn't worked (and I can't explain why)
Limiting ADF to only 1 concurrent connection, which reduces the number of duplicates but unfortunately still produces them
Putting a wait timer at the start of the copy activity, but this causes a resource locking issue: I get an error saying that multiple waits are causing the process to fail
Trying a combination of 1, 2 and 3, which leaves me with an entirely different issue: the pipeline is trying to find file X, which no longer exists because it was deleted as part of step 2 above
I am really struggling with something that seems extremely basic, so I am sure I am overlooking something fundamental, as no one else seems to have this issue with ADF.
I'm currently working on an R Shiny App that utilizes googlesheets4 to pull in a large dataset from GoogleSheets upon app launch. Loading in this dataset to my app takes ~2 minutes, which stalls my entire app's load time.
The only visual in my app is based on this GoogleSheets data, so it is very dependent on this specific dataset. Once the dataset gets pulled into my app, it is filtered and therefore becomes much smaller (85,000 rows ---> 1,000 rows). This GoogleSheets data is updated every day, so I don't have the luxury of pre-downloading it once and storing it as a .csv forever.
I have tried two different fixes for this but have been unsuccessful... curious if anyone has any thoughts.
Have a separate app running. My first idea was to create a separate Shiny app entirely, that would have a sole purpose of pulling the GoogleSheets df once a day. Once it pulls it, it would conduct the necessary data cleaning to get it down to ~1,000 rows, and then would push the smaller df to a different GoogleSheet link. Then, my original app with the visual would just always reference that new GoogleSheet (which would take much less time to load in).
The problem I ran into here is that I couldn't figure out how to write a new GoogleSheets doc using googlesheets4. If anyone has any idea how to do that it would be much appreciated.
Temporarily delay the load in of the GoogleSheets data, and let visual populate first. My second idea was to have the code that pulls in the GoogleSheets df be delayed upon launch, letting my visual first populate (using old data) and then have the GoogleSheets pull happen. Once the pull is complete, have the visual re-populate with the updated data.
I couldn't figure out the best/right way to make this happen. I tried messing around with Sys.sleep() and futures/promises but couldn't get things to work correctly.
Curious if anyone has any thoughts on my 2 different approaches, or if there's a better approach I'm just not considering...
Thanks!
There is a function called write_sheet that allows you to write data to a google sheet. Does that work for you?
googlesheets4::write_sheet(data = your_data,
ss = spread_sheet_identifier,
sheet = "name_of_sheet_to_write_in")
If you only want to append something without deleting everything in the sheet, the function is sheet_append
googlesheets4::sheet_append(data = your_data,
ss = spread_sheet_identifier,
sheet = "name_of_sheet_to_write_in")
I'm not sure you can store the credentials in a safe way, but couldn't you use GitHub Actions? Or alternatively a cron job on your local computer?
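A scheduled script along these lines might work. The sheet IDs, the filter condition and the service-account key path below are placeholders:

    # Standalone script to run once a day (cron / GitHub Actions):
    # read the large sheet, filter it down, and overwrite a small sheet
    # that the Shiny app reads instead.
    library(googlesheets4)
    library(dplyr)

    # Non-interactive auth, e.g. with a service-account key
    gs4_auth(path = "service-account-key.json")

    big_sheet_id   <- "YOUR_LARGE_SHEET_ID"
    small_sheet_id <- "YOUR_SMALL_SHEET_ID"

    big_df <- read_sheet(big_sheet_id)

    # Whatever cleaning gets 85,000 rows down to ~1,000
    small_df <- big_df %>%
      filter(some_column == "some_value")

    # Overwrite the small sheet so the app only ever loads ~1,000 rows
    write_sheet(small_df, ss = small_sheet_id, sheet = "filtered")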
I do not know how to illustrate this with a simple example. The problem is this:
I generate and display a flextable in a Shiny app and want to place it in a PDF. The only available method is to convert the flextable object to a PNG and then place the PNG in the PDF. This works fine, except users are reporting strange results - getting a report with a table that looks nothing like the one displayed in the app. I suspect that occasionally users are executing reports so close together in time that the last saved PNG is grabbed, but it was saved by another user.
The PNG files (there are three) are placed in the app directory, which I believe is not isolated from one user session to another. In the PDF I cannot use relative paths, so I cannot save them to a different directory.
Any suggestions?
Have you tried naming the images with a unique key, such as a per-report number getting the images named something like chart_0153927_01.png instead of chart_01.png for report #0153927? Or something like a millisecond/microsecond timestamp set once at the start of the session to reduce collisions?
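For example, a rough sketch assuming the table is written out with flextable::save_as_image() and the file name is built from Shiny's session token (all names here are illustrative):

    # Inside the Shiny server function: build a unique file name per
    # session and per report so concurrent users never overwrite each
    # other's PNGs. tempdir() is shared by sessions in the same R
    # process, so uniqueness comes from the file name itself.
    unique_png <- function(report_id, chart_no, session) {
      file.path(tempdir(),
                sprintf("chart_%s_%s_%02d.png",
                        report_id, session$token, chart_no))
    }

    # When building the report (ft is the flextable shown in the app):
    png_path <- unique_png(report_id = "0153927", chart_no = 1, session = session)
    flextable::save_as_image(ft, path = png_path)

    # Then pass the absolute path to the R Markdown render, e.g.
    # rmarkdown::render("report.Rmd", params = list(table_png = png_path))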
I had a small but important R file that I had been working on for a few days.
I created and uploaded a list of about 1,000 IDs to SQL Server the other day, and today I was repeating the process with a different type of ID. I frequently save the file, and after having added a couple of lines and saved, I ran the sqlSave() statement to upload the new IDs.
RStudio promptly converted all of my code to gibberish and froze (see screenshot).
After letting it try to finish for several minutes I closed RStudio and reopened it. It automatically re-opened my untitled text files where I had a little working code, but didn't open my main code file.
When I tried to open it, I was informed that the file is 55 megabytes and thus too large to open. Indeed, I confirmed that it really is 55 MB now, and when I open it in an external text editor I see the same gibberish as in the screenshot.
Is there any hope of recovering my code?
I suppose low memory must be to blame. The object and command I was executing at the time were not resource intensive; however, a few minutes before that I did retrieve an overly large dataframe from SQL Server.
You overwrote your code with a binary representation of your objects with this line:
save.image('jive.R')
save.image saves your R objects, not your R script file. To save your script, you can just click File -> Save. To save your objects, point save.image at a separate file instead.
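For reference, the two operations look like this: workspace images go to an .RData file, never to a .R script.

    # Save every object in the workspace to an .RData file
    save.image("workspace.RData")

    # Restore those objects later
    load("workspace.RData")

    # Your script is an ordinary text file: save it from the editor
    # (File -> Save), not with save.image().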
I have an R script that creates multiple scripts and submits them simultaneously to a computer cluster. After all of these scripts have completed and their output has been written to the respective folders, I would like to automatically launch another R script that works on these outputs.
I haven't been able to figure out whether there is a way to do this in R: the function 'wait' is not what I want, since the scripts are submitted as different jobs and each of them completes and writes its output file at a different time, but I actually want to run the subsequent script only after all of the outputs appear.
One way I thought of is to count the files that have been created and, if the correct number of output files is there, submit the next script. However, to do this I guess I would have to keep a script open that checks for the presence of the files every now and then, and I am not sure whether this is a good idea, since it probably takes a day or more for the first scripts to complete.
Can you please help me find a solution?
Thank you very much for your help
-fra
I think you are looking at this the wrong way:
Not an R problem at all; R just happens to be the client of your batch job.
This is an issue that queue / batch processors can address on your cluster.
Worst case, you could just wait/sleep in a shell (or R script) until a 'final condition reached' file has been touched.
Inter-dependencies can be expressed with make too.
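If you do end up polling from R, a minimal sketch might look like this (the output directory, file pattern and expected count are placeholders):

    # Poll until all expected output files exist, then launch the next script
    output_dir     <- "results"        # where the cluster jobs write their output
    expected_files <- 100              # number of jobs submitted
    check_interval <- 600              # seconds between checks (10 minutes)

    repeat {
      done <- length(list.files(output_dir, pattern = "\\.out$"))
      if (done >= expected_files) break
      Sys.sleep(check_interval)
    }

    # All outputs are present: run the downstream script
    source("postprocess.R")
    # or run it as a separate process:
    # system("Rscript postprocess.R")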