Reading LabVIEW TDMS files with R

As part of a transition from MATLAB to R, I am trying to figure out how to use R to read TDMS files created with National Instruments LabVIEW. TDMS is a fairly complex binary file format (http://www.ni.com/white-paper/5696/en/).
Add-ons exist for Excel and OpenOffice (http://www.ni.com/white-paper/3727/en/), and I could build something in LabVIEW to do the conversion, but I am looking for a solution that lets me read the TDMS files directly into R. This would allow us to test R for certain data processing requirements without changing anything earlier in the data acquisition process. A simple process would also lower the barrier for others trying out R for this purpose.
Does anyone have experience with reading TDMS files directly into R that they could share?
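
For a sense of what "fairly complex binary format" means in practice: every TDMS segment starts with a 28-byte lead-in described in the NI white paper, and only after that come the metadata and raw-data sections that make parsing hard. Below is a minimal base-R sketch that reads just that lead-in (field layout per the white paper; "mydata.tdms" is a placeholder path):

    # Minimal sketch: read the 28-byte lead-in of the first TDMS segment.
    # Field layout follows the NI TDMS white paper; all values are little-endian.
    read_uint64_le <- function(con) {
      # base R has no 64-bit integer type, so read 8 raw bytes and combine
      # them into a double (exact for values below 2^53)
      bytes <- readBin(con, "raw", n = 8)
      sum(as.numeric(bytes) * 256^(0:7))
    }

    con <- file("mydata.tdms", "rb")          # placeholder path
    tag        <- readChar(con, 4, useBytes = TRUE)                     # "TDSm" magic tag
    toc        <- readBin(con, "integer", size = 4, endian = "little")  # table-of-contents bit mask
    version    <- readBin(con, "integer", size = 4, endian = "little")  # 4712 or 4713
    seg_offset <- read_uint64_le(con)   # bytes from end of lead-in to the next segment
    raw_offset <- read_uint64_le(con)   # bytes from end of lead-in to this segment's raw data
    close(con)

Turning the metadata and raw-data blocks that follow into usable channels is the real work, which is why a dedicated reader (see the answers below) is attractive.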

It is far from supporting the full TDMS specification, but I have started a port of the Python npTDMS package to R here: https://github.com/msuefishlab/tdmsreader, and it has been tested out in the context of a Shiny app here.

You don't say whether you need to automate the reading of these files with R or just convert the data manually. I'm assuming you or your colleagues don't have access to LabVIEW yourselves; otherwise you could just create a LabVIEW tool to do the conversion (and build it as a standalone application or DLL, if you have the Professional Development System or Application Builder). You could then run the built app from your R code by passing parameters on the command line, as sketched below.
The document at your first link refers to (a) add-ins for OpenOffice Calc and for Excel, which should work for a manual conversion and which you might be able to automate using those programs' respective macro languages, and (b) a C DLL for reading TDMS. Would it be possible for you to use one of those?
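
If you do end up with a LabVIEW-built command-line converter (or the C DLL wrapped in one), driving it from R is straightforward. A hedged sketch, where tdms2csv and its argument order are placeholders for whatever tool you actually build; only system2() and read.csv() are standard R:

    # Hypothetical workflow: call an external TDMS-to-CSV converter, then read
    # the result into R. "tdms2csv" and its arguments are placeholders.
    convert_and_read <- function(tdms_path) {
      csv_path <- sub("\\.tdms$", ".csv", tdms_path)
      status <- system2("tdms2csv", args = c(tdms_path, csv_path))
      if (status != 0) stop("conversion failed for ", tdms_path)
      read.csv(csv_path)
    }

    # dat <- convert_and_read("run_001.tdms")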

Related

How to capture the names of all files read and written by an R script?

In a project, many separate scripts are executed, and they read and write files for one another. It has become quite confusing which file comes from where. Of course, this is bad software design, but that is how it has grown over a long time.
Now I would like to execute all scripts in their proper order and capture which files are read and which are written by each script.
Is there, e.g., a way to monitor and log the input and output of the R process while the script is running (from within R)? Or any other ideas for a solution?
I am running R under Windows 10.
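
One low-tech option (a sketch, not from the original thread): trace() the I/O functions your scripts actually use and log their file argument each time they are called. This only covers the functions you list explicitly and assumes they are called with file paths rather than connections; write.csv, for instance, is covered indirectly because it calls write.table:

    # Sketch: log the `file` argument of selected I/O functions while the
    # scripts run. Extend io_funs to whatever readers/writers your scripts use.
    io_funs <- c("read.csv", "write.table", "readRDS", "saveRDS")

    for (f in io_funs) {
      trace(f,
            tracer = bquote(cat(.(f), ":", file, "\n",
                                file = "io_log.txt", append = TRUE)),
            print  = FALSE)
    }

    source("script1.R")   # placeholders: run your scripts in their proper order
    source("script2.R")

    for (f in io_funs) untrace(f)

An alternative on Windows is to watch the R process with an external tool such as Process Monitor, but the trace() route stays inside R, as you asked.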

How do I unit test functions in my R package that interact with the file system?

I'm working on an R package at work. My package has gotten large enough that I've decided I need some form of repeatable testing. I settled upon using testthat and mockery. I'm not a developer, so this is the first time I'm writing tests at this level.
I deal with a lot of data files and it's very convenient to have functions in my package to help locate files. These functions interact with the file system via calls to dir. For example,
Data from one event can be split over multiple files. If I have file datafile_2017.10.20_12.00.00, I have a function that can find the next file that is part of the same event, i.e. datafile_2017.10.20_12.05.00.
My question is this: what is the best way to test functions like this? My intuition is to avoid using actual files stored somewhere else in my repository, because that can fail for a number of reasons, e.g. different paths or different repo states between systems. I searched around, and it looks like other languages have mocking libraries that allow for mocking directory structures. I haven't found anything like that for R (except for testthatsomemore, but it was removed from CRAN sometime in 2016).
Is there an R package that allows for mocking directory structures? Or am I wrong to move away from storing small test files in my repo?
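
One pattern that needs neither a mocking library nor files committed to the repo (a sketch, with find_next_file() standing in for the hypothetical function described above): build the directory structure in a temporary directory inside each test and clean it up afterwards.

    # Sketch of a testthat test that fabricates its own directory structure.
    # find_next_file() is the hypothetical function described in the question.
    library(testthat)

    test_that("find_next_file returns the next file of the same event", {
      tmp <- file.path(tempdir(), "event_test")
      dir.create(tmp, showWarnings = FALSE)
      on.exit(unlink(tmp, recursive = TRUE), add = TRUE)

      files <- c("datafile_2017.10.20_12.00.00",
                 "datafile_2017.10.20_12.05.00",
                 "datafile_2017.10.21_08.00.00")
      file.create(file.path(tmp, files))

      expect_equal(
        basename(find_next_file(file.path(tmp, "datafile_2017.10.20_12.00.00"))),
        "datafile_2017.10.20_12.05.00"
      )
    })

The withr package also provides helpers such as local_tempdir() that handle the cleanup for you.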

Upload Saved ML Model in R (local) to Azure Machine Learning Studio

I am trying to reduce my development headaches for creating an ML web service on Azure ML Studio. One of the things that struck me was whether we can just upload .rda files in the workbench and load them via an R script (like in the figure below).
But the uploaded file can't be connected directly to the R Script block. There is another way to do it, using a zip (this works for uploading packages that aren't available in Azure's R directories), but I couldn't find any resource on how to access the .rda file inside the .zip.
I have two options here: make the .zip work, or find any other workaround where I can directly use my .rda model. If someone could guide me on how to go forward, I would appreciate it.
Note: currently I'm creating models via the "Create R Model" block, training them, and saving them so that I can use them to make a predictive web service. But for models like random forest, I'm not sure how the randomness affects the resulting models (the local and Azure versions are different, and setting a seed isn't very helpful either). I'm a bit tight on schedule, and Azure ML seems boxed in for iterating and automating the ML workflow (or maybe I'm doing it wrong).
Here is an example of uploading a .rda file for scoring:
https://gallery.cortanaintelligence.com/Experiment/Womens-Health-Risk-Assessment-using-the-XGBoost-classification-algorithm-1
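
For the zip route specifically: as far as I can tell, a zip connected to the Script Bundle input of an Execute R Script module is extracted under src/ in the module's working directory, so the .rda can be loaded from there. A sketch of the module body (model.rda and the object name fit are placeholders):

    # Sketch of an Execute R Script module body in Azure ML Studio.
    # Assumes the zip on the Script Bundle port contains model.rda, which holds
    # a fitted model object named `fit` (both names are placeholders).
    dataset <- maml.mapInputPort(1)    # data arriving on input port 1

    load("src/model.rda")              # zip contents are extracted under src/

    dataset$prediction <- predict(fit, newdata = dataset)

    maml.mapOutputPort("dataset")      # hand the scored data frame to the output port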

Running an R script on Hadoop and MapReduce

I have an R script that does some processing on a bunch of tweets, and I would like to use the same script on the same data saved in a Hadoop file system (HDFS). According to this Hortonworks tutorial I could use R code with data from HDFS, but it is not quite clear to me.
Can I use the very same R script, taking advantage of the MapReduce paradigm, by using Revolution R? Should I change my code, or is there a way to execute the same functions optimized for a Hadoop architecture?
My wish would be to write my code in a standard R IDE like RStudio and then run it, or most of it, on my cloud services (such as Microsoft Azure) with MapReduce underneath.
Yes, you can run the same R script across different data platforms, from Hadoop to Spark to Teradata and SQL Server, by using an environment-specific compute context.
The following two links should help you get started with Revolution R / Microsoft R Server on Hadoop:
https://msdn.microsoft.com/en-us/microsoft-r/scaler-hadoop-getting-started
https://github.com/Azure/Azure-MachineLearning-DataScience/blob/master/Misc/MicrosoftR/Samples/NYCTaxi/NYC2013_MRS_LinearBinary.Rmd
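
The basic pattern with RevoScaleR (the package behind Revolution R / Microsoft R Server) is to point the file system at HDFS and switch the compute context to Hadoop; the rx* functions then run as MapReduce jobs, while ordinary R functions still run locally. A rough sketch with placeholder paths, assuming you are working on an edge node of the cluster:

    # Rough sketch of switching RevoScaleR to a Hadoop compute context.
    # Paths and options are placeholders; see the linked walkthroughs for the
    # connection settings your cluster actually needs.
    library(RevoScaleR)

    hdfsFS <- RxHdfsFileSystem()
    cc     <- RxHadoopMR(consoleOutput = TRUE)   # add ssh/share options if submitting remotely
    rxSetComputeContext(cc)

    tweets <- RxTextData(file = "/user/me/tweets.csv", fileSystem = hdfsFS)

    # rx* functions (and rxExec for arbitrary code) now execute on the cluster;
    # plain R functions in an existing script are not parallelised automatically.
    rxSummary(~ ., data = tweets)

    rxSetComputeContext("local")                 # switch back for interactive work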

Apache log file format analysis with R

I am trying to analyze web log files with R. I am comfortable dealing with dates and bytes, wherever numeric data is present, but I fail to deal with the strings.
From the log file (a log file in CSV format), I want to find a particular user (with the help of IP address and user agent) and their total spending on the web page.
There are numerous libraries to do this kind of analysis, although I could find none in R. A Google search for "parse apache logfile" yielded a library in Perl, and "python parse apache logfile" yields the Scratchy library. Both rely on regular expressions to parse the contents of the file.
From here there are two ways to deal with the Apache logfile:
Call Perl or Python from R, either using a direct link or using a system call (this is simpler).
Take the idea from the Perl or Python library and use it to implement R versions of the functions (a minimal sketch of that approach is given below). This will take more time.
You refer to a CSV file, but I think the libraries above work on the original Apache log text file, so I'd use those and not your CSV file.
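
Here is a minimal sketch of the second option, a regex parser for the default Apache combined LogFormat written in base R (access.log is a placeholder path; adjust the pattern if your LogFormat differs):

    # Sketch: parse the Apache combined log format with base R only.
    parse_apache_log <- function(path) {
      lines <- readLines(path, warn = FALSE)
      pattern <- paste0(
        '^(\\S+) (\\S+) (\\S+) \\[([^]]+)\\] ',   # host, ident, authuser, timestamp
        '"([^"]*)" (\\d{3}) (\\S+)',              # request, status, bytes
        '(?: "([^"]*)" "([^"]*)")?'               # referer, user agent
      )
      m  <- regmatches(lines, regexec(pattern, lines, perl = TRUE))
      ok <- lengths(m) > 0                        # drop lines that did not match
      df <- as.data.frame(do.call(rbind, m[ok])[, -1, drop = FALSE],
                          stringsAsFactors = FALSE)
      names(df) <- c("host", "ident", "authuser", "timestamp",
                     "request", "status", "bytes", "referer", "agent")
      df$status <- as.integer(df$status)
      df$bytes  <- suppressWarnings(as.integer(df$bytes))   # "-" becomes NA
      df$time   <- strptime(df$timestamp, "%d/%b/%Y:%H:%M:%S %z")
      df
    }

    # logs <- parse_apache_log("access.log")
    # aggregate(bytes ~ host + agent, data = logs, FUN = sum)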
In addition, this SO post mentions an answer by #doug (profile) in which he states that he has created some functions to visualize Apache logfile data parsed with Python. Maybe you could send him a message or mail and see if he is willing to share the code.
Logfile analysis in R is an interesting topic we have had here before; you can find our discussion right here. That discussion might also help you adjust to the SO etiquette in order to get better feedback (not to take anything away from yours, Paul).
