How do I compare R Markdown outputs with a previous version?

I have a large R Markdown file with many different outputs. The dataset is still being collected, and I often reknit the file to get an update including the most recent data. I would like to automatically see what has changed from the last time without needing to page through the entire output.
A) Is there an easier strategy than writing code to extract all the values from the output and formatting a side-by-side presentation myself?
B) The output includes several figures. I would like to compare these as well, but I would be happy with a solution that only compares numbers.
C) I would also be satisfied with a function or package that saves a defined subset of variables and lets me compare them to the values of variables saved with the same name in the past.
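One low-friction pattern for option C is to save a named list of key values at the end of the Rmd and diff it against the snapshot from the previous knit, for example with the waldo package. This is a minimal sketch, not an established package API; the snapshot file name and the tracked values below are placeholders:

results <- list(
  n_obs    = nrow(mtcars),    # placeholders for your real summary values
  mean_mpg = mean(mtcars$mpg)
)
snap_file <- "last_run.rds"
if (file.exists(snap_file)) {
  previous <- readRDS(snap_file)
  # waldo::compare() prints a readable diff of any two R objects;
  # base all.equal(previous, results) is a dependency-free alternative
  print(waldo::compare(previous, results))
}
saveRDS(results, snap_file)   # becomes the baseline for the next knit

For B), the same idea works at the file level: keep the previous knit's figure files and flag any whose contents changed (e.g. via tools::md5sum()), though that only tells you that a figure changed, not how.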

Related

Easily view/browse free-text datasets in R/RStudio - alternatives to View()?

When working with survey data (10-30 columns, 100-10k rows, a mix of demographic columns like name, age, etc., and free-text responses up to nchar == 3000), View() isn't so useful, because it only displays the first 50 or so characters of lengthy strings (we can always widen the column, but this has practical limitations). AFAIK, increasing row height is not possible. So it is not easy to view free text inside RStudio unless it's in the console, which isn't really designed for browsing through columns of long strings.
Is there any function like View() that displays data similarly but allows for resizing of row heights (to display >1 line of long strings), and perhaps some smarts to allow us to explore list columns in data.frames?
One idea is a function that takes a data.frame argument, writes it as a temp file, and starts a shiny app that displays the data. But something in native R (or built into RStudio) would probably be better than an ad hoc shiny app.
Note: I do know how to achieve this in R Markdown using kableExtra and similar packages that make nice Bootstrap tables. However, the goal is to reduce friction between coding in the script pane in RStudio and exploring the data, and I feel like moving code into an Rmd has potential but creates extra friction.
DT::datatable() provides the ability to view raw data in tables using all the features of the JavaScript DataTables library, either in RStudio's Viewer tab or separately in the browser of your choice. You can further fine-tune the display of your data to fit your needs using any of the features documented here: https://rstudio.github.io/DT/
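A minimal sketch (the data frame is made up):

library(DT)
df <- data.frame(
  name     = c("A", "B"),
  response = c(strrep("a very long free-text answer ", 40),
               strrep("another lengthy response ", 40))
)
datatable(df)   # opens in the Viewer pane; long strings wrap over multiple lines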

Non-programmer, ASCII file data extract (can I even learn to code?)

As the title says, I'm not a programmer. I've tried R before, got very confused and abandoned it. I'm a physician, and I do all my statistics either with SPSS or Excel. I'd like to learn some coding for when I get into problems like this:
I have an ASCII file that I'd like to extract data from. The fields are contained within columns of variable width. 90% of the file is useless to me. For example, the fields I'm interested in extracting are encoded in columns 00645-00649, 03315-03319, etc. I'd like to get this into a format I can run stats on in SPSS/Excel. Should I be looking to use R, Python, something else, or am I totally beyond hope?
Thanks in advance.
It's impossible to say for certain given only the information here, but the DATA LIST command in SPSS may well allow you to read the data into SPSS directly from the current file. If you can specify the column locations of the desired variables, you can specify those on that command, and SPSS will simply skip over the unnamed columns.
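If you do want to try R for this, here is a minimal sketch using the readr package. The file name and field names are placeholders; the column positions are the ones given in the question:

library(readr)
fields <- fwf_positions(
  start     = c(645, 3315),
  end       = c(649, 3319),
  col_names = c("field_a", "field_b")   # hypothetical field names
)
dat <- read_fwf("mydata.txt", col_positions = fields)   # reads only those columns
write.csv(dat, "extract.csv", row.names = FALSE)        # open this in SPSS/Excel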

Using `data()` for time series objects in R

I apologise if this question has been asked already (I haven't been able to find it). I was under the impression that I could access datasets in R using data(), for example, from the datasets package. However, this doesn't work for time series objects. Are there other examples where this is not the case? (And why?)
data("ldeaths") # no dice
ts("ldeaths") # works
(However, this works for data("austres"), which is also a time-series object).
The data function is designed to load package data sets and all their attributes, time series or otherwise.
I think the issue you're having is that there is no stand-alone data set called ldeaths in the datasets package. ldeaths exists as one of three data sets in the UKLungDeaths data set; the other two are fdeaths and mdeaths.
The following should lazily load all data sets.
data(UKLungDeaths)
Then, typing ldeaths in the console or using it as an argument in some function will load it.
str(ldeaths)
While it is uncommon for package authors to include multiple objects in one data set, it does happen. This line from the data function documentation gives a 'heads up' about this:
"For each given data set, the first two types (‘.R’ or ‘.r’, and ‘.RData’ or ‘.rda’ files) can create several variables in the load environment, which might all be named differently from the data set"
That is the case here: while there are three time series objects contained in the data set, none of them is named UKLungDeaths.
This happens when the package author uses the save function to write multiple R objects to an external file. In the wild, I've seen folks use the save function to bundle a description file with the data set, although this would not be the proper way to document something in a full-on package. If you're really curious, go read the documentation on the save function.
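For illustration, the author would have created such a file with something like this (the path is hypothetical), and loading it restores all three objects, none of which shares a name with the file:

save(ldeaths, fdeaths, mdeaths, file = "data/UKLungDeaths.rda")
# load("data/UKLungDeaths.rda") brings back ldeaths, fdeaths and mdeaths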
Justin

Convention for R function to read a file and return a collection of objects

I would like to find out what the "R way" would be to let users do the following with R: I have a file that can contain the data of one or more analysis runs of some other software. My R package should provide additional ways to calculate statistics or produce plots for those analyses. So the first step a user has to do is read in the file (with one or more analyses), then select an analysis and work with it.
An analysis is uniquely identified by two names (an analysis name and an analysis type, where the type should later correspond to an S3 class).
What I am not sure about is how to best represent the collection of analyses that is returned when reading in the file: should this be an object, or simply a list of lists (since there are two IDs for identifying an analysis, the outer list could be indexed by name and the inner by type)? Using a list feels very low-level and clumsy, though.
If the read function returns a special kind of container object, what would be a good method to access one of the contained objects based on name and type?
There are probably many ways to do this, but since I have only just started writing R code that others will eventually use, I am not sure how best to follow existing R conventions when designing this.
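For concreteness, here is one rough sketch of the container-object option, assuming nothing beyond the description above (all names in it, such as read_analyses and get_analysis, are hypothetical):

read_analyses <- function(path) {
  runs <- list()   # placeholder: in reality, parsed from `path`
  structure(list(analyses = runs), class = "analysis_collection")
}

get_analysis <- function(x, name, type) {
  stopifnot(inherits(x, "analysis_collection"))
  hits <- Filter(function(a) a$name == name && a$type == type, x$analyses)
  if (length(hits) == 0) stop("no analysis with that name and type")
  hits[[1]]
}

A plain list of lists would work too; the container mainly buys you a place to hang print/summary methods and input checking.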

Tableau to R connection - SCRIPT_REAL returning rounded fraction numbers

I'm pretty new to Tableau but have a lot of experience with R. Every time I use SCRIPT_REAL to call an R function based on Tableau aggregates, I get back a number that seems to be the closest fraction approximation. For example, if raw R gives me .741312, Tableau will spit out .777778, and so on. Does anyone have experience with this issue?
I'm pretty sure this is an aggregation issue.
From the Tableau and R Integration post by Jonathan Drummey on their community site:
Using Every Row of Data - Disaggregated Data

For accurate results for the R functions, sometimes those R functions need to be called with every row in the underlying data. There are two solutions to this:

Disaggregate the measures using Analysis->Aggregate Measures->Off. This doesn't actually cause the measures to stop their aggregations; instead it tells Tableau to return every row in the data without aggregating by the dimensions on the view (which gives the wanted effect). Using this with R scripts can get the desired results, but can cause problems for views where we want to have R work on the non-aggregated data and then display the data with some level of aggregation.

The second solution deals with this situation: add a dimension such as a unique Row ID to the view, and set the Compute Using (addressing) of the R script to be along that dimension. If we're doing some sort of aggregation with R, then we might need to reduce the number of values returned by filtering them out with something like:

IF FIRST()==0 THEN SCRIPT_REAL('insert R script here') END

If we need to then perform additional aggregations on that data, we can do so with table calculations with the appropriate Compute Usings that take into account the increased level of detail in the view.
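A toy R illustration (made-up data) of why the aggregation level matters: the same statistic computed on every row versus on group-level aggregates gives different numbers, which is exactly the kind of discrepancy described in the question:

set.seed(1)
x <- rnorm(100)
g <- rep(1:10, each = 10)   # a grouping, like the dimensions on a Tableau view
y <- x + rnorm(100)
cor(x, y)                                     # statistic over every underlying row
cor(tapply(x, g, mean), tapply(y, g, mean))   # same statistic over group means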
