Easily view/browse free-text datasets in R/RStudio - alternatives to View()?

When working with survey data (10-30 columns, 100-10k rows, a mix of demographic columns like name, age, etc., and free-text responses up to nchar == 3000), View() isn't very useful, because it only displays the first 50 or so characters of lengthy strings (we can always widen the column, but this has practical limitations). AFAIK, increasing row height is not possible. So it is not easy to view free text inside RStudio unless it's in the console, which is not really designed for browsing through columns of long strings.
Is there any function like View() that displays data similarly but allows for resizing of row heights (to display >1 line of long strings), and perhaps some smarts to allow us to explore list columns in data.frames?
One idea is a function that takes a data.frame argument, writes it as a temp file, and starts a shiny app that displays the data. But something in native R (or built into RStudio) would probably be better than an ad hoc shiny app.
Note: I do know how to achieve this in markdown using kableExtra and similar packages that make nice Bootstrap tables. However, the goal is to reduce friction between coding in the script pane in RStudio and exploring the data, and I feel like moving code into an Rmd has potential but creates extra friction.

DT::datatable() provides the ability to view raw data in tables using all the features of the JavaScript DataTables library, either in RStudio's Viewer tab or separately in the browser of your choice. You can further fine-tune the display of your data to fit your needs using any of the features documented here: https://rstudio.github.io/DT/
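For example, a minimal sketch of this approach (the sample data frame is invented for illustration; datatable() and its options list are part of DT's documented interface):

    # Browse long free-text columns in the RStudio Viewer or a browser.
    library(DT)

    df <- data.frame(
      id       = 1:3,
      response = c(
        strrep("A long free-text survey response. ", 20),
        "Short answer.",
        strrep("Another lengthy comment that View() would truncate. ", 15)
      ),
      stringsAsFactors = FALSE
    )

    # Cells wrap long strings instead of truncating them; pageLength
    # controls how many rows are shown per page.
    datatable(df, rownames = FALSE, options = list(pageLength = 10))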

Related

Can I use a wrangled dataframe from one output in an RShiny server to continue wrangling in another?

I have multiple tabs (one for uploading files, and many others that show table and plot outputs). Every tab uses the same base dataframe. That base dataframe is influenced by the user input in the UI on the upload tab. Then, each output (tables and plots) is individually manipulated by the user in different tabs to change data displayed, types of plots, etc.
This is my first time building an RShiny app. The fileInput reactives directly influence the base dataframe, and I get errors if I try to do the data wrangling with these objects outside of an output in the server. Because of this, the build process has become cumbersome, since all changes I make to the base dataframe in my working output also need to be copied into all the others in the server.
Is there a better way to do this?
I have tried to use renderTable to export the wrangled dataframe into the UI and then use it as an input for the table and plot outputs on the server side, but that just caused more problems than it solved.
I apologize I do not have any code to show. My code is long (as a result of this problem), untidy (as a result of me), and will just take up too much space and offer no helpful context.
Thank you in advance!
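For what it's worth, the idiomatic fix in Shiny for this situation is a shared reactive() expression, wrangled once and called from every output. A minimal sketch, with all input/output names hypothetical:

    library(shiny)

    ui <- fluidPage(
      fileInput("upload", "Upload CSV"),
      tableOutput("summary_table"),
      plotOutput("main_plot")
    )

    server <- function(input, output, session) {
      # One source of truth: wrangle the upload once, outside any output.
      base_data <- reactive({
        req(input$upload)                      # wait until a file is supplied
        df <- read.csv(input$upload$datapath)
        # ... shared wrangling steps go here ...
        df
      })

      # Each output calls the reactive like a function; the wrangling
      # code is never duplicated.
      output$summary_table <- renderTable(head(base_data()))
      output$main_plot     <- renderPlot(plot(base_data()[[1]]))  # assumes a numeric first column
    }

    shinyApp(ui, server)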

Using two R scripts in a PowerBI dashboard where script B depends on script A

I am trying to translate some code that we previously used in a piece of software similar to PowerBI into a form that's compatible with PowerBI. One thing I need to do for that is to generate a model fit to some data and use that to display some data about the fit (in some further visual elements).
From a sequential point of view, this is trivial: generate an object, then work on that object and print some data. But from what I understand about PowerBI, this kind of interdependency between R scripts / visual elements (generate an object, then hand that object to other procedures to generate further output) is not intended, and since I need to use several visual elements, all of which depend on the output of the first, I have no idea how to work this out.
"I need to use several visual elements, and all of them depend on the output of the first"
Then the data needs to be created in Power Query and loaded into the data model. You can run R in Power Query to generate the data, and then visualize it with regular Power BI visuals or the R visual.
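A minimal sketch of what such a "Run R script" step in Power Query might look like (the model and column names are assumptions; Power Query passes the step's input table in as a data frame named dataset, and any data frame left in the environment can be selected as the step's output):

    # Fit the model once in Power Query...
    fit <- lm(price ~ weight, data = dataset)   # assumed column names

    # ...and emit a plain table of fit results that downstream visuals
    # (native Power BI visuals or the R visual) can consume from the
    # data model.
    fit_summary <- data.frame(
      term     = names(coef(fit)),
      estimate = unname(coef(fit))
    )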

How to make a nice-looking table in base R (not markdown)

I've been looking for an hour, but everything I can find about how to make a nice-looking table out of a data frame mentions that it's for R Markdown, HTML, or LaTeX.
Is it not possible to make a nice-looking table in base R?
plot(x, y) makes a graph.
Is there no function like: printTable(df)?
Broadly speaking, there is not much you can do beyond what a normal base::print gives you. You could try to bend the plot function into plotting values from selected cells of a data frame, but that would be very onerous to develop and impractical in light of the currently available and maintained solutions. There are a number of packages that let you achieve what you need. For instance, you can try formattable by renkun-ken.
Example
For a simple example you can try formattable::formattable(mtcars[1:10,])
Creating Images
For a solution that creates images from tables, have a look at this discussion. As discussed in the linked answer, if you insist on generating a static image you can use the grid.table function offered via gridExtra, e.g. gridExtra::grid.table(mtcars[1:5, ]).
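A minimal sketch of the static-image route (the file name and dimensions are arbitrary; grid.table draws to the active graphics device):

    library(gridExtra)

    png("mtcars_table.png", width = 800, height = 200)
    grid.table(mtcars[1:5, ])   # draws the table onto the open PNG device
    dev.off()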
You may be interested in the flextable package, which is very easy to use, with multiple options to create nice tables.
You can also produce Word, PDF, or HTML output.
I invite you to check the manual: https://cran.r-project.org/web/packages/flextable/vignettes/overview.html
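A minimal sketch (flextable(), autofit(), and the save_as_*() helpers are part of flextable's documented API):

    library(flextable)

    ft <- autofit(flextable(head(mtcars)))   # size columns to their content
    ft                                       # prints in the RStudio Viewer

    # Export to whichever output type you need:
    # save_as_docx(ft, path = "table.docx")
    # save_as_html(ft, path = "table.html")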

How do I compare R Markdown outputs with a previous version?

I have a large R Markdown file with many different outputs. The dataset is still being collected, and I often reknit the file to get an update including the most recent data. I would like to automatically see what has changed from the last time without needing to page through the entire output.
A) Is there an easier strategy than writing code to extract all the values from the output and formatting a side-by-side presentation myself?
B) The output includes several figures. I would like to compare these as well, but I would be happy with a solution that only compares numbers.
C) I would also be satisfied with a function or package that saves a defined subset of variables and lets me compare them to the values of variables saved with the same name in the past.
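For option C, one possible pattern is to save a named list of tracked values on each run and diff it against the previous run's snapshot, e.g. with the waldo package. A minimal sketch, with hypothetical tracked values:

    library(waldo)

    snapshot <- list(
      n_rows   = nrow(survey_df),           # hypothetical tracked values
      mean_age = mean(survey_df$age)
    )

    old_path <- "snapshot_previous.rds"
    if (file.exists(old_path)) {
      previous <- readRDS(old_path)
      print(compare(previous, snapshot))    # reports only what changed
    }
    saveRDS(snapshot, old_path)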

Shiny - Efficient way to use ggplot2(boxplot) & a 'reactive' subset function

I have a dataset with > 1000K rows and 5 columns (material and price being the relevant columns).
I have written a 'reactive' Shiny app which uses ggplot2 to create a boxplot of the price of the various materials.
E.g. the user selects 4-5 materials from a list and then Shiny creates a boxplot of the price of each material:
Price spread of: Made of Cotton, Made of Paper, Made of Wood
It also creates a material-combination plot of the pricing spread of the combination of all the materials.
E.g. a boxplot of:
Price spread of: Made of Cotton & Paper & Wood
It is working relatively quickly for the sample dataset (~5000 rows) but I am worried about scaling it effectively.
The dataset is static, so I looked at the following solutions:
1. Calculate the quartile ranges of the various materials (data <- summary(data)) and then use googleVis to create a candlestick chart. However, I run into problems when trying to calculate the material-combination plot: there are over 100 materials, so calculating all the possible combinations offline is not feasible.
2. Calculate the quartile ranges of the various materials (data <- summary(data)) and then create a matrix which stores the row number of the summary data (min, median, max, 1st & 3rd quartile) for each material. I can then use some rough calculations to establish the summary() data for the material-combination plot, and then plot using googleVis. However, I have little experience with this type of calculation in Shiny.
Can anyone suggest the most robust and scalable way to calculate & boxplot reactive subsets using Shiny?
I understand this is a question about method rather than code, but I am new to the capabilities of R and am still digesting the different class capabilities, and don't want to 'miss a trick', so to speak.
As always thanks!
Please see below for methods reviewed.
Quartile Clustering: A quartile based technique for Generating Meaningful Clusters
http://arxiv.org/ftp/arxiv/papers/1203/1203.4157.pdf
Conditionally subsetting and calculating a new variable in dataframe in shiny
If you really have a dataset with more than 1000K rows (i.e. 1M), it is probably in a flat file or in a database. You can always do some precalculation, store the result in a database table, and have the Shiny app query that table instead of loading everything into R every time someone opens the app.
I have built several Shiny apps for internal use, and the lesson I have learned is: before you build your app, think carefully about how to minimize the calculations R has to do while still delivering the information to the app user. Some of our data is 10 billion+ rows, and a Hive query takes more than an hour, so I ended up precalculating the results and putting the job on a crontab to update the result table every midnight.
I would prefer maybe your method 2, or storing the precalculation in a MySQL database (maybe with a Python script updating the table once a day if you need some near-real-time feature later).
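A minimal sketch of the precalculation idea (the table and column names are made up; ggplot2's geom_boxplot accepts precomputed statistics via stat = "identity"):

    library(dplyr)
    library(ggplot2)

    # Run once offline and store the small summary table (e.g. in a database):
    material_stats <- raw_prices %>%            # `raw_prices` is hypothetical
      group_by(material) %>%
      summarise(
        ymin   = min(price),
        lower  = quantile(price, 0.25),
        middle = median(price),
        upper  = quantile(price, 0.75),
        ymax   = max(price)
      )

    # In the app, draw boxplots straight from the precomputed statistics
    # instead of from the full 1M-row dataset:
    ggplot(material_stats,
           aes(x = material, ymin = ymin, lower = lower,
               middle = middle, upper = upper, ymax = ymax)) +
      geom_boxplot(stat = "identity")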
