I am working on a Shiny app that creates a live report based on the data in a Google spreadsheet. I only want to read the whole gsheet again if there are new values in it.
For this, it would be best if I could get the last update time of the gsheet and then decide, based on that, whether I need to read in the data again.
I know that registering the gsheet gives the last update time, but registering takes more time than reading in the whole table if there are only a few values in the table.
[Image: microbenchmark comparison of registering the sheet vs. reading in the whole table]
Is there a way to get only the time of the last update without registering a gsheet again in R?
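Ideally I'd end up with something like the sketch below, where only the Drive file metadata is fetched and the full read happens only when the sheet has actually changed. I'm using the googledrive package's drive_get() purely as an illustration of the kind of lightweight metadata call I have in mind; I don't know whether it is actually faster than registering, and "your-sheet-id", last_seen and read_whole_sheet() are placeholders.
library(googledrive)

# Sketch only: last_seen is wherever the previously observed timestamp is
# cached (e.g. a reactiveVal in the app), and read_whole_sheet() stands in
# for the existing register-and-read code.
meta <- drive_get(id = "your-sheet-id")
last_modified <- meta$drive_resource[[1]]$modifiedTime  # RFC 3339 string, so string comparison works

if (is.null(last_seen) || last_modified > last_seen) {
  dat <- read_whole_sheet()
  last_seen <- last_modified
}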
I'm getting the following error when trying to connect Power BI to my tabular model in AS:
AnalysisServices: Cannot query internal supporting structures for column 'table'[column] because they are not processed. Please refresh or recalculate the table 'table'
It is not a calculated column and the connection seems to work fine on the local copy. I would appreciate any help with this!
This would depend on how you are processing the data within your model. If you have just done a Process Data, then the accompanying metadata objects, such as relationships, have not yet been built.
Every column of data that you load also needs to be processed in this way, regardless of whether it is a calculated column or not.
This can be achieved by running a Process Recalc on the database, or by loading your tables or table partitions with a Process Full/Process Default rather than just a Process Data; those options automatically run the Process Recalc once the data is loaded.
If you have a lot of calculated columns and tables that make a Process Recalc take a long time, you will need to factor this into your refreshes and model design.
If you run a Process Recalc on your database or a Process Full/Process Default on your table now, you will no longer have those errors in Power BI.
A more in-depth discussion of this can be found here: http://bifuture.blogspot.com/2017/02/ssas-processing-tabular-model.html
I'm currently working on an R Shiny app that uses googlesheets4 to pull in a large dataset from GoogleSheets upon app launch. Loading this dataset into my app takes ~2 minutes, which stalls my entire app's load time.
The only visual in my app is based on this GoogleSheets data, so it is very dependent on this specific dataset. Once the dataset gets pulled into my app, it is filtered down and becomes much smaller (~85,000 rows to ~1,000 rows). This GoogleSheets data is updated every day, so I don't have the luxury of pre-downloading it once and storing it as a .csv forever.
There are two different fixes for this that I have tried, but without success so far; curious if anyone has any thoughts.
Have a separate app running. My first idea was to create a separate Shiny app entirely, whose sole purpose would be to pull the GoogleSheets df once a day. Once it pulled it, it would do the necessary data cleaning to get it down to ~1,000 rows, and then push the smaller df to a different GoogleSheet link. Then, my original app with the visual would just always reference that new GoogleSheet (which would take much less time to load in).
The problem I ran into here is that I couldn't figure out how to write a new GoogleSheets doc using googlesheets4. If anyone has any idea how to do that it would be much appreciated.
Temporarily delay the load-in of the GoogleSheets data, and let the visual populate first. My second idea was to have the code that pulls in the GoogleSheets df be delayed upon launch, letting my visual populate first (using old data), and then have the GoogleSheets pull happen. Once the pull is complete, the visual would re-populate with the updated data.
I couldn't figure out the best/right way to make this happen. I tried messing around with Sys.sleep() and futures/promises but couldn't get things to work correctly.
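Roughly, the pattern I was aiming for looks like the sketch below. I haven't got this working, so treat it as the idea rather than a solution: read_cached_data() and fetch_gsheet_data() stand in for my actual loading code, and fetch_gsheet_data() would also need to handle googlesheets4 auth inside the background session.
library(shiny)
library(promises)
library(future)
plan(multisession)

server <- function(input, output, session) {
  dat <- reactiveVal(read_cached_data())   # show the old data immediately

  # kick off the slow GoogleSheets pull in a background R session
  p <- future_promise(fetch_gsheet_data())
  then(p, onFulfilled = function(fresh) {
    dat(fresh)                             # the visual re-renders when this updates
    saveRDS(fresh, "cache.rds")            # refresh the cache for the next launch
  })

  output$my_plot <- renderPlot({
    plot(dat())
  })
}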
Curious if anyone has any thoughts on my 2 different approaches, or if there's a better approach I'm just not considering...
Thanks!
There is a function called write_sheet that allows you to write data to a Google Sheet; does that work for you?
googlesheets4::write_sheet(data  = your_data,
                           ss    = spread_sheet_identifier,
                           sheet = "name_of_sheet_to_write_in")
If you only want to append something without deleting everything already in the sheet, the function is sheet_append:
googlesheets4::sheet_append(data  = your_data,
                            ss    = spread_sheet_identifier,
                            sheet = "name_of_sheet_to_write_in")
I'm not sure you can store the credentials in a safe way, but couldn't you use GitHub Actions? Or alternatively a cron job on your local computer?
I am making an R app that keeps track of the remaining vacation days of employees. On first launch it creates new tables, and afterwards all the important tables and variables are stored in reactiveValues. Once I close my app and re-open it, I don't want to create the tables from scratch again; I want to reuse the data that was previously stored in those global variables as well. Is this possible?
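Something along these lines is roughly what I have in mind, assuming a local .rds file is an acceptable place to persist state between sessions; create_initial_tables() is just a stand-in for my table-building code and I haven't tried this yet.
library(shiny)

server <- function(input, output, session) {
  # reload the last saved state if there is one, otherwise build from scratch
  state <- if (file.exists("vacation_state.rds")) {
    do.call(reactiveValues, readRDS("vacation_state.rds"))
  } else {
    reactiveValues(vacation = create_initial_tables())
  }

  # snapshot everything back to disk when the session ends
  onStop(function() {
    saveRDS(isolate(reactiveValuesToList(state)), "vacation_state.rds")
  })
}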
I'm getting daily exports of Google Analytics data into BigQuery and these form the basis for our main reporting dataset.
Over time I need to add new columns for additional things we use to enrich the data, for example a mapping from URL to 'reporting category'.
This is easy to just add as a new column onto the processed tables (there are about 10 processing steps at the moment for all the enrichment we do).
The issue is if stakeholders then ask: can we add that new column to the historical data?
Currently I then need to rerun all the daily jobs, which is very slow and costly.
This is coming up frequently enough that I'm seriously thinking about redesigning my data pipelines to account for the fact that I often need to essentially drop and recreate ALL the data from time to time, whenever I need to add a new field or correct old dirty data.
I'm just wondering if there are better ways to:
Add a new column to an old table in BQ (I would be happy to do this by hand in these instances, since I can just join the new column on the GA [hit_key] I have defined, which is basically a row key).
(Less common) Update existing tables based on some WHERE condition.
Just wondering what best practices are, and whether anyone has had similar issues where you basically need to update a historical schema, and whether there are ways to do it without just dropping and recreating everything, which is essentially what I'm currently doing.
To be clearer on my current approach: I'm taking the [ga_sessions_yyyymmdd] table and making a series of [ga_data_prepN_yyyymmdd] tables, where at each step I either add new columns or reduce the data in some way. There are now 11 of these steps, and each time I'm taking all 100 or more columns along for the ride. This is what I'm going to try to design away from, as currently 90% of the columns at each stage don't even need to be touched and could just be joined back on at the end, maybe based on hit_key or something.
It's a little bit messy though to try and pick apart.
Adding new columns to the schema of the existing historical tables is possible, but the values for the newly added columns will be NULL. If you do need to populate values into these columns, probably the best approach is to use an UPDATE DML statement. More details on how to try it out are here: Does BigQuery support UPDATE, DELETE, and INSERT (SQL DML) statements?
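As a rough illustration, such a back-fill could be driven from R with the bigrquery package (or the statement can be run directly in the BigQuery console). The project, dataset, table and column names below are made up; it assumes a small mapping table keyed on the same hit_key, and that the new column has already been added to the target table's schema.
library(bigrquery)

sql <- "
UPDATE `my-project.ga_reporting.ga_data_prep1_20170101` t
SET reporting_category = m.reporting_category
FROM `my-project.ga_reporting.url_category_mapping` m
WHERE t.hit_key = m.hit_key
"

# runs the DML job under the given billing project and waits for it to finish
bq_project_query("my-project", sql)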
I have a database that is used to store transactional records; these records are created, and another process picks them up and then removes them. Occasionally this process breaks down and the number of records builds up. I want to set up a (semi-)automated way to monitor things, and as my tool set is limited and I have an R-shaped hammer, this looks like an R-shaped nail problem.
My plan is to write a short R script that will query the database via ODBC, and then write a single record with the datetime, the number of records in the query, and the datetime of the oldest record. I'll then have a separate script that will process the data file and produce some reports.
What's the best way to create my data file? At the moment my options are:
Load a dataframe, add the record and then resave it
Append a row to a text file (i.e. a csv file)
Any alternatives, or a recommendation?
I would be tempted by the second option because, from a semantic point of view, you don't need the old entries when writing the new ones, so there is no reason to reload all the data each time. It would be more time- and resource-consuming to do that.
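A minimal sketch of that append-only approach could look something like the following; the DSN, table and column names are placeholders for whatever your database actually uses.
library(DBI)
library(odbc)

con <- dbConnect(odbc::odbc(), dsn = "transactions_dsn")  # placeholder DSN

res <- dbGetQuery(con, "
  SELECT COUNT(*)     AS n_records,
         MIN(created) AS oldest_record
  FROM   pending_transactions
")
dbDisconnect(con)

new_row <- data.frame(
  check_time    = Sys.time(),
  n_records     = res$n_records,
  oldest_record = res$oldest_record
)

# append one row per run; write the header only when the file is first created
write.table(new_row, "transaction_monitor.csv", sep = ",", row.names = FALSE,
            col.names = !file.exists("transaction_monitor.csv"),
            append = file.exists("transaction_monitor.csv"))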