How can I use the R arctools package on a data set that contains multiple subjects?

I want to use the activity_stats function (and others) on a data set that has several dozen subjects. Based on the documentation, it looks like I have to make a separate data frame for each subject, and then run the functions on each individual data frame. Is that the case?
https://github.com/martakarass/arctools#using-arctools-package-to-compute-physical-activity-summaries
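One possible approach, sketched here under assumptions, is to keep all subjects in one long data frame and let R do the splitting. The column names `subject_id` and `acc` and the helper `summarise_subject()` are hypothetical stand-ins; with arctools installed, the helper's body could instead call `activity_stats()` on that subject's activity counts and timestamps.

```r
# Toy long-format data: one row per minute, several subjects stacked together.
# Column names (subject_id, acc) are assumptions for this sketch.
set.seed(1)
dat <- data.frame(
  subject_id = rep(c("s01", "s02", "s03"), each = 1440),
  acc        = rpois(3 * 1440, lambda = 50)
)

# Hypothetical stand-in for a per-subject summary function that returns
# a one-row data frame for the rows of a single subject.
summarise_subject <- function(d) {
  data.frame(
    n_minutes = nrow(d),
    mean_acc  = mean(d$acc)
  )
}

# Split by subject, summarise each piece, then stack the one-row results.
by_subject <- split(dat, dat$subject_id)
res_list   <- lapply(by_subject, summarise_subject)
res        <- do.call(rbind, res_list)
res$subject_id <- names(res_list)
rownames(res)  <- NULL
res
```

This avoids creating a separate named data frame per subject by hand: `split()` produces the per-subject pieces and `lapply()` runs the same function on each.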

Related

Comparing two lists in R

I have two nearly identical data sets, but each contains some values the other doesn't, and I'm trying to compare them in R. I want to create a list of the observations that are not shared between the two, but I'm struggling with how to do this. I'm relatively new to R.
You could try the arsenal package:
install.packages("arsenal")
library(arsenal)
captureVariable <- summary(arsenal::comparedf(list1,list2))
captureVariable[["diffs.byvar.table"]]
There are some other helpful outputs that will be captured by captureVariable if that particular table doesn't suit your needs.
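If installing a package is not an option, a rough base R alternative (not the arsenal method above) is to build one key per row and keep the rows whose key is missing from the other data set. The data frames `df1` and `df2` are made-up examples, and the sketch assumes both have the same columns in the same order.

```r
# Made-up example data: rows (2, "b") and (3, "c") are shared.
df1 <- data.frame(id = c(1, 2, 3, 4), value = c("a", "b", "c", "d"))
df2 <- data.frame(id = c(2, 3, 4, 5), value = c("b", "c", "x", "e"))

# Collapse each row into a single string key built from all columns.
key1 <- do.call(paste, c(df1, sep = "\r"))
key2 <- do.call(paste, c(df2, sep = "\r"))

# Rows present in one data frame but not in the other.
only_in_df1 <- df1[!key1 %in% key2, ]
only_in_df2 <- df2[!key2 %in% key1, ]

only_in_df1
only_in_df2
```

Because the key is a pasted string, this compares values by their printed form, so it is best suited to small tables with simple column types.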

How to manage more than one dataset - Machine Learning Azure

Is there any module that accepts more than one dataset for processing?
For instance, "Split Data", "Edit Metadata" and "Select Columns in Dataset" do not accept more than one dataset as input.
This is what I did:
There are several numeric and categorical variables in my model. I used the "Convert to Indicator Values" module to create dummy variables for my data. How do I combine the indicator variables and numeric variables into one dataset so that I can split the data for my model?
As of now, I'm doing data wrangling in Python and moving the datasets to Azure MLS for modeling. Ideally, I would do the data wrangling in Azure MLS as well.
I am looking for one module that consolidates both the categorical binned variables and the numeric variables in Azure MLS.
Yup, there are several modules receiving multiple datasets - Add Columns, Apply SQL Transformation, Execute Python Script, to name a few.
Not sure why you need them for indicator values though. Assuming you're talking about a Train/Test split, I would just split the data after invoking the "Convert to Indicator Values" module.
To add to the above answer: you can also use Execute R Script, or Join Data if the datasets have common keys.

Using `data()` for time series objects in R

I apologise if this question has been asked already (I haven't been able to find it). I was under the impression that I could access datasets in R using data(), for example, from the datasets package. However, this doesn't work for time series objects. Are there other examples where this is not the case? (And why?)
data("ldeaths") # no dice
ts("ldeaths") # works
(However, this works for data("austres"), which is also a time-series object).
The data function is designed to load package data sets and all their attributes, time series or otherwise.
I think the issue you're having is that there is no stand-alone data set called ldeaths in the datasets package. ldeaths does exist as one of three objects in the UKLungDeaths data set. The other two are fdeaths and mdeaths.
The following should lazily load all data sets.
data(UKLungDeaths)
Then, typing ldeaths in the console or using it as an argument in some function will load it.
str(ldeaths)
While it is uncommon for package authors to include multiple objects in one data set, it does happen. This line from the data function documentation gives a 'heads up' about this:
"For each given data set, the first two types (‘.R’ or ‘.r’, and ‘.RData’ or ‘.rda’ files) can create several variables in the load environment, which might all be named differently from the data set"
That is the case here, as while there are three time series objects contained in the data set, not one of them is named UKLungDeaths.
This happens when the package author uses the save function to write multiple R objects to a single external file. In the wild, I've seen folks use the save function to bundle a description file with the data set, although this would not be the proper way to document something in a full-on package. If you're really curious, go read the documentation on the save function.
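To see which objects a data set name actually brings in, one option is to load it into a fresh environment and list the contents. This is only a small sketch; the environment name `e` is arbitrary.

```r
# Load the UKLungDeaths data set into its own environment
# so the objects it creates can be listed in isolation.
e <- new.env()
data("UKLungDeaths", package = "datasets", envir = e)

# Names of the objects created by the data set.
ls(e)

# Each object is an ordinary time series.
class(e$ldeaths)
```

The listing should show the three series described above and no object named UKLungDeaths.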
Justin

Adding multiple random forest models into a single data frame or data table in R

I am training multiple 'treebag' models in R. I loop through a data set, and in each iteration I define a specific subset based on a feature in the set and train on that subset. I could save each result to disk, but I was hoping to save all the models to a single data frame or data table. I am not sure if this is at all possible. The data frame/table could have columns of several classes (numeric and character), but I would also like to add each completed model.
To start, is it even possible to assign multiple models to a single column, where each model is assigned to a different row in a data frame or data table?
Any ideas on how this could work are greatly appreciated.
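A data frame column can be a list, so storing one model per row is possible in principle. The sketch below uses `lm()` on subsets of the built-in `mtcars` data as a stand-in for the 'treebag' models, purely to keep the example free of extra packages; the column names `cyl`, `n` and `model` are illustrative.

```r
# One subset per value of a feature (here: number of cylinders).
subsets <- split(mtcars, mtcars$cyl)

# Fit one model per subset. lm() stands in for the treebag training call.
models <- lapply(subsets, function(d) lm(mpg ~ wt, data = d))

# Ordinary columns describing each subset.
results <- data.frame(
  cyl = names(subsets),
  n   = vapply(subsets, nrow, integer(1)),
  stringsAsFactors = FALSE
)

# List column: each row holds the fitted model for that subset.
results$model <- models

# Retrieve a model by row and use it as usual.
fit_6 <- results$model[[which(results$cyl == "6")]]
coef(fit_6)
```

Use `[[` rather than `[` when pulling a model out of the list column, since `[` returns a one-element list instead of the model itself.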

Visualizing Data time series using the zoo package

I am loading time series data using the read.zoo function. I noticed that when a time series is loaded with the zoo package, it is not displayed as a data frame, and when clicked on it is displayed as shown in the picture.
One cannot discern what the data looks like from this, whereas data read with read.csv/read.table is labeled as a data.frame and displayed neatly when clicked on. I know I can simply use the View(data) command, but this is very cumbersome. I am sorry to be picky, but it would be nice to simply click on the data and have it displayed with the appropriate columns and rows.
I also noticed that when I generate variables from the data set, the new variables are never attached to the data set in which they were created, so I must use the data=merge(data,newvariable) command to combine them with the initial data.
Are there any techniques that can be employed to fix these two issues?
