How can I use a projection instead of raw columns

I have a dataset with columns for:
number of items (an integer value)
weight of the items (a fractional value)
a product category
a start time
an end time
From this dataset I want to train a model that, given a number of items, a weight and a category, can predict a duration (end time - start time).
How can I transform the data or set my label column to the duration, so that I get an EstimatorChain that I can call Fit on with an IDataView that I've loaded from CSV?

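You can use the ML.NET CustomMapping transform to calculate the duration and expose it as the Label column.
The CustomMapping documentation contains an example of how to use it.
Action<Data, Data> mapping =
    (input, output) => output.Label = (float)(input.End - input.Start).TotalSeconds;
where Data would be your data model that contains a float Label property in addition to the other properties. Subtracting two DateTime values yields a TimeSpan, so it has to be converted to a number (seconds here) before it can serve as a regression label.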
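To show how the mapping might slot into a full pipeline, here is a minimal sketch. The class names (ShipmentRow, TimeWindow, DurationLabel), the file name data.csv, the column order, the choice of minutes as the unit and the Sdca trainer are all assumptions made for illustration and are not taken from the question. It passes null as the contract name, which ML.NET accepts as long as the model does not need to be saved.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical row type; the column indices are assumed, adjust them to the CSV.
public class ShipmentRow
{
    [LoadColumn(0)] public float ItemCount { get; set; }
    [LoadColumn(1)] public float Weight { get; set; }
    [LoadColumn(2)] public string Category { get; set; }
    [LoadColumn(3)] public DateTime Start { get; set; }
    [LoadColumn(4)] public DateTime End { get; set; }
}

// Input view for the custom mapping: only the columns it reads.
public class TimeWindow
{
    public DateTime Start { get; set; }
    public DateTime End { get; set; }
}

// Output of the custom mapping: the new Label column.
public class DurationLabel
{
    public float Label { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var ml = new MLContext(seed: 0);

        // "data.csv" is a placeholder path.
        IDataView data = ml.Data.LoadFromTextFile<ShipmentRow>(
            "data.csv", hasHeader: true, separatorChar: ',');

        // Project Start/End into a numeric duration (minutes, an arbitrary choice).
        Action<TimeWindow, DurationLabel> toDuration =
            (input, output) => output.Label = (float)(input.End - input.Start).TotalMinutes;

        // contractName: null means the resulting model cannot be saved to disk.
        var pipeline = ml.Transforms.CustomMapping(toDuration, contractName: null)
            .Append(ml.Transforms.Categorical.OneHotEncoding("CategoryEncoded", "Category"))
            .Append(ml.Transforms.Concatenate("Features", "ItemCount", "Weight", "CategoryEncoded"))
            .Append(ml.Regression.Trainers.Sdca(labelColumnName: "Label", featureColumnName: "Features"));

        // pipeline is an EstimatorChain, so Fit can be called on the loaded IDataView.
        ITransformer model = pipeline.Fit(data);
    }
}
```

If the trained model has to be saved and reloaded, the mapping would instead need to live in a class deriving from CustomMappingFactory with a real contract name.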

Related

Calculate mean value of date dependent variable for a certain end and start date taking into consideration groups in R

I have two CSV data sets. The first one is monthly data of a certain variable (PM_2.5) given by spatial constraints (latitude and longitude), which can be seen as the place variables. The second data frame contains different start and end dates for the observation. Those are also given under the same spatial constraints and for each individual v1. You can see the data structure in the pictures.
I want to sum all observations of (PM_2.5) for one individual (ID) over the observation period (start date to end date) given the constraint that the geospatial identification (latitude, longitude) is the same.
Thanks a lot for your help.
Best,
Luise

Can I create factor groups based on repeating pattern in a time series in R?

I have time series data with a repeating pattern that I wish to label/group as separate factor levels each instance it occurs. The example below uses a sequence of 1:100, but my real data is based on a time series: day-of-the-year. I've used a conditional test in my data to identify each instance that the pattern is present - this is simulated with a simple TRUE/FALSE column below (In my real data I'm looking for each instance where a trend line goes outside a particular threshold (standard-deviation)).
From the data below, is there a way to group each set of 'TRUE' values? Perhaps a way to identify the start/stop of each TRUE sequence (rows 26:50, and 76:100) and somehow create two separate group factor levels?
I've attached an image below to illustrate my end goal of trying to identify areas of a curve that exceed a particular boundary.
data = data.frame(order = seq(1,100),
test = rep(c('False','True'), each = 25))
plot(data, col = data.table::rleid(data[,2]))

Assigning a Value to All Points in a List

I have a set of true/false data I need to prepare for a chi-squared analysis in R. Currently it's organized by time of day in several lists. What would be the best way to add a variable to each of these lists for time of day, fill in each list's points with the time they were collected, then combine them into one table?

ggplot using filter for one data

My data contains 29 different IDs, each with sub-IDs that have a measurement value, an upper bound, a lower bound and a timestamp.
I am trying to create a ggplot with points where I can filter for one ID and plot its sub-IDs with the measurement value, lower bound and upper bound.
I am using the code below, which I got with help from Stack Overflow, but could anyone show me how to incorporate the filter function here?
My data looks like this, as a sample:
data<- data.frame(c(ST_10,ST_11,ST_10,ST_10,ST_11,ST_10) , c(M1,M3,M2,M5,M7,M9), c(0.3,0.5,1.98,198),c(0.2,0.4,1.98,199),c(0.1,0.3,1.0,190) )
This is just a sample; I am using the code below.
ggplot(data)+
geom_jitter(aes(x=Day.ID., y=Measurement.Value, col=as.factor(Day.ID.)))+
geom_line(aes(x=Day.ID., y=LSL.ID), colour='orange')+
geom_line(aes(x=Day.ID., y=USL.ID), colour='orange')+
facet_wrap(~DESC)
In this code, data is the name of my data frame, Day is the timestamp, LSL is the lower bound, USL is the upper bound, and DESC is the sub-ID.

How to code a numeric field in r by a set of labels

I have a large data frame with around 190000 rows. The data frame has a label column storing 12 nominal categories. I want to change the weight column value of each row based on the label value of that row. For example, if the label of a row is "Res", I want to change its weight field value to 0.5 and if it is "Condo", I want to change its weight value to 2.
I know it is easy to implement this with if/else statements, but given the number of rows the processing takes too long. I wanted to use cut(), but it seems that cut() categorizes by intervals, not by nominal categories. I would appreciate any suggestion that reduces the processing time.
