How to compute frequency of categorical variables based on a condition

Good afternoon,
Assume we have the following dataset from UCI:
ballons=structure(list(YELLOW = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("PURPLE",
"YELLOW"), class = "factor"), SMALL = structure(c(2L, 2L, 2L,
2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L
), .Label = c("LARGE", "SMALL"), class = "factor"), STRETCH = structure(c(2L,
2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L,
1L, 1L), .Label = c("DIP", "STRETCH"), class = "factor"), ADULT = structure(c(1L,
2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L,
1L, 2L), .Label = c("ADULT", "CHILD"), class = "factor"), T = c(TRUE,
FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE,
FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE)), class = "data.frame", row.names = c(NA,
-19L))
# output :
YELLOW SMALL STRETCH ADULT T
1 YELLOW SMALL STRETCH ADULT TRUE
2 YELLOW SMALL STRETCH CHILD FALSE
3 YELLOW SMALL DIP ADULT FALSE
4 YELLOW SMALL DIP CHILD FALSE
5 YELLOW LARGE STRETCH ADULT TRUE
6 YELLOW LARGE STRETCH ADULT TRUE
7 YELLOW LARGE STRETCH CHILD FALSE
8 YELLOW LARGE DIP ADULT FALSE
9 YELLOW LARGE DIP CHILD FALSE
10 PURPLE SMALL STRETCH ADULT TRUE
11 PURPLE SMALL STRETCH ADULT TRUE
12 PURPLE SMALL STRETCH CHILD FALSE
13 PURPLE SMALL DIP ADULT FALSE
14 PURPLE SMALL DIP CHILD FALSE
15 PURPLE LARGE STRETCH ADULT TRUE
16 PURPLE LARGE STRETCH ADULT TRUE
17 PURPLE LARGE STRETCH CHILD FALSE
18 PURPLE LARGE DIP ADULT FALSE
19 PURPLE LARGE DIP CHILD FALSE
Assume also that I applied a clustering algorithm and obtained a result like the following:
clusterss=data.frame(index=1:19,class=c(1,2,3,3,3,2,3,1,2,3,3,2,2,3,2,2,1,1,2))
> clusterss
index class
1 1 1
2 2 2
3 3 3
4 4 3
5 5 3
6 6 2
7 7 3
8 8 1
9 9 2
10 10 3
11 11 3
12 12 2
13 13 2
14 14 3
15 15 2
16 16 2
17 17 1
18 18 1
19 19 2
Here the index variable is the row number in ballons, and class is the cluster that row was assigned to.
I know that we can compute the frequencies of all categorical variables with:
> sapply(ballons,table)
y1 y2 y3 y4 y5
PURPLE 10 10 8 11 12
YELLOW 9 9 11 8 7
However, I need to compute this for each cluster independently. That is, for each class I need to select its associated observations and then compute the frequencies. For example, with class = 1:
# Expected results for the first cluster: class == 1
library(dplyr)  # filter() below is dplyr::filter
result1 <- filter(clusterss, class == 1)
sapply(ballons[result1[,1],],table)
y1 y2 y3 y4 y5
PURPLE 2 3 2 3 3
YELLOW 2 1 2 1 1
# Expected results for the second cluster : class == 2
result2 <- filter(clusterss, class == 2)
sapply(ballons[result2[,1],],table)
y1 y2 y3 y4 y5
PURPLE 5 5 3 4 5
YELLOW 3 3 5 4 3
# Expected results for the third cluster : class == 3
result3 <- filter(clusterss, class == 3)
sapply(ballons[result3[,1],],table)
y1 y2 y3 y4 y5
PURPLE 3 2 3 4 4
YELLOW 4 5 4 3 3
I'm looking for an efficient way to obtain such results (maybe with the select function of dplyr).
Thank you for your help!

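You can pass a second grouping vector, here clusterss$class, to table. Each column is then cross-tabulated against the cluster, and sapply flattens each 2 x 3 table column-wise, so rows 1-2 belong to cluster 1, rows 3-4 to cluster 2 and rows 5-6 to cluster 3: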
sapply(ballons,table, clusterss$class)
#lapply(ballons,table, clusterss$class) #Alternative
# YELLOW SMALL STRETCH ADULT T
#[1,] 2 3 2 3 3
#[2,] 2 1 2 1 1
#[3,] 5 5 3 4 5
#[4,] 3 3 5 4 3
#[5,] 3 2 3 4 4
#[6,] 4 5 4 3 3
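
If the flattened matrix is hard to read, one option is to swap sapply for lapply, which keeps a separate level-by-cluster table for each column. The sketch below uses a small made-up data frame (toy_df and toy_cluster are hypothetical names, not the balloons data) so the counts are easy to verify by eye:

```r
# Hypothetical toy data: two categorical columns and a cluster label per row
toy_df <- data.frame(
  colour = factor(c("PURPLE", "YELLOW", "YELLOW", "PURPLE", "YELLOW")),
  size   = factor(c("SMALL", "SMALL", "LARGE", "LARGE", "LARGE"))
)
toy_cluster <- c(1, 1, 2, 2, 2)

# lapply passes toy_cluster as a second argument to table(),
# giving one two-way table (level x cluster) per column
per_column <- lapply(toy_df, table, toy_cluster)

# Rows are the factor levels, columns are the clusters
print(per_column$colour)
print(per_column$size)
```

Each element of per_column can be indexed by level and cluster, for example per_column$colour["YELLOW", "2"].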

Related

time series plot for missing data

I have some sequence event data for which I want to plot the trend of missingness on value across time. Example below:
id time value
1 aa122 1 1
2 aa2142 1 1
3 aa4341 1 1
4 bb132 1 2
5 bb2181 2 1
6 bb3242 2 3
7 bb3321 2 NA
8 cc122 2 1
9 cc2151 2 2
10 cc3241 3 1
11 dd161 3 3
12 dd2152 3 NA
13 dd3282 3 NA
14 ee162 3 1
15 ee2201 4 2
16 ee3331 4 NA
17 ff1102 4 NA
18 ff2141 4 NA
19 ff3232 5 1
20 gg142 5 3
21 gg2192 5 NA
22 gg3311 5 NA
23 gg4362 5 NA
24 ii111 5 NA
The number of NAs is supposed to increase over time (the behaviors are fading). How do I plot the NAs across time?

I think this is what you're looking for: you want to see how many NAs appear over time. Assuming this is correct, if each time is a group, you can count the number of NAs that appear in each group.
data:
df <- structure(list(id = structure(1:24, .Label = c("aa122", "aa2142",
"aa4341", "bb132", "bb2181", "bb3242", "bb3321", "cc122", "cc2151",
"cc3241", "dd161", "dd2152", "dd3282", "ee162", "ee2201", "ee3331",
"ff1102", "ff2141", "ff3232", "gg142", "gg2192", "gg3311", "gg4362",
"ii111"), class = "factor"), time = c(1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L,
5L, 5L), value = c(1L, 1L, 1L, 2L, 1L, 3L, NA, 1L, 2L, 1L, 3L,
NA, NA, 1L, 2L, NA, NA, NA, 1L, 3L, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA,
-24L))
library(tidyverse)
library(ggplot2)
df %>%
group_by(time) %>%
summarise(sumNA = sum(is.na(value)))
# A tibble: 5 × 2
time sumNA
<int> <int>
1 1 0
2 2 1
3 3 2
4 4 3
5 5 4
You can then plot this using ggplot2
df %>%
group_by(time) %>%
summarise(sumNA = sum(is.na(value))) %>%
ggplot(aes(x=time)) +
geom_line(aes(y=sumNA))
As you can see, as time increases, the number of NAs also increases.
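
Because the groups differ in size (between 4 and 6 rows per time point in the example), the share of missing values per time point may be more informative than the raw count. A minimal base-R sketch, using a small hypothetical data frame toy rather than the data above:

```r
# Hypothetical toy data: 'value' is missing more often at later times
toy <- data.frame(
  time  = c(1, 1, 2, 2, 2, 3, 3),
  value = c(1, 2, NA, 1, 3, NA, NA)
)

# The mean of a logical vector is the proportion of TRUE values,
# so this gives the share of NAs within each time point
na_share <- tapply(is.na(toy$value), toy$time, mean)

# Plot the share of missing values against time
plot(as.numeric(names(na_share)), na_share, type = "b",
     xlab = "time", ylab = "share of NA in value")
```

Replacing mean with sum in the tapply call would give the same counts as the summarise step above.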

Find all numbers in range with local min and global max

I have a dataframe testData which is made up of many unique ids. My objective is to identify whether or not each id contains all of the possible integers in the range of month, yday, and week, where the min is the minimum value for that id and the max is the max value in the entire column.
Please note this is different from the related question here
In other words, if an id has all possible values in the range in month, then it should receive a t. For example, under month where id = 1, the min value is 2 and the max value for the whole column is 5, so id 1 should receive a t because the values 2, 3, 4, and 5 are all present. Where id = 2, however, there are only the values 1, 2, 4, and 5, so the 3 was skipped and therefore id 2 should receive an f.
So far, I have a formula that takes all the values in the entire range of the column (but NOT the min value per id):
library(data.table)
setDT(testData)
output <- testData[, .(month = all(unique(testData$month) %in% .SD$month),
                       yday  = all(unique(testData$yday) %in% .SD$yday),
                       week  = all(unique(testData$week) %in% .SD$week)),
                   by = id]
Any idea how I could integrate min where min is the minimum value per id and max is the maximum value in the range?
> testData
id month yday week
1 1 2 1 1
2 3 1 2 1
3 4 1 3 1
4 2 1 4 1
5 3 3 5 2
6 4 3 6 3
7 2 2 7 1
8 3 1 8 3
9 1 2 9 2
10 5 4 10 3
11 3 2 11 1
12 4 4 12 1
13 5 4 13 2
14 1 3 14 3
15 1 4 15 1
16 1 5 16 2
17 2 4 17 3
18 2 5 18 1
19 5 5 19 1
> dput(testData)
structure(list(id = c(1L, 3L, 4L, 2L, 3L, 4L, 2L, 3L, 1L, 5L,
3L, 4L, 5L, 1L, 1L, 1L, 2L, 2L, 5L), month = c(2L, 1L, 1L, 1L,
3L, 3L, 2L, 1L, 2L, 4L, 2L, 4L, 4L, 3L, 4L, 5L, 4L, 5L, 5L),
yday = 1:19, week = c(1L, 1L, 1L, 1L, 2L, 3L, 1L, 3L, 2L,
3L, 1L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 1L)), .Names = c("id",
"month", "yday", "week"), class = "data.frame", row.names = c(NA,
-19L))
In the end, the output should look like this:
> output
id month yday week
1 1 t f t
2 2 f f f
3 3 f f t
4 4 f f f
5 5 t f t

Using dplyr you can group by id and then check whether all elements of the range are among the values present for each group. Note that min(month) gives the min within the current id group, whereas max(testData$month) gives the max for the whole column.
library(dplyr)
tD2 <- testData %>% group_by(id) %>%
summarise(month=all(min(month):max(testData$month) %in% month),
yday=all(min(yday):max(testData$yday) %in% yday),
week=all(min(week):max(testData$week) %in% week))
tD2
# A tibble: 5 × 4
id month yday week
<int> <lgl> <lgl> <lgl>
1 1 TRUE FALSE TRUE
2 2 FALSE FALSE FALSE
3 3 FALSE FALSE TRUE
4 4 FALSE FALSE FALSE
5 5 TRUE FALSE TRUE
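
The check itself can be isolated in a small helper, which may make the logic easier to test on its own. This is only a sketch: covers_range is a hypothetical name, and the vectors below reproduce just the month values for id 1 and id 2 from the question.

```r
# Hypothetical helper: TRUE if x contains every integer from its own
# minimum up to a maximum supplied from outside the group
covers_range <- function(x, global_max) {
  all(seq(min(x), global_max) %in% x)
}

# Month values for id 1 and id 2, taken from the question's data
month <- c(2, 2, 3, 4, 5, 1, 2, 4, 5)
id    <- c(1, 1, 1, 1, 1, 2, 2, 2, 2)

# Apply the helper per id; the global max is computed once on the full column
res <- tapply(month, id, covers_range, global_max = max(month))
print(res)
```

Here id 1 covers 2 through 5 and passes, while id 2 is missing 3 and fails, matching the month column of the expected output.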

Tidy data.frame with repeated column names

I have a program that gives me data in this format
toy
file_path Condition Trial.Num A B C ID A B C ID A B C ID
1 root/some.extension Baseline 1 2 3 5 car 2 1 7 bike 4 9 0 plane
2 root/thing.extension Baseline 2 3 6 45 car 5 4 4 bike 9 5 4 plane
3 root/else.extension Baseline 3 4 4 6 car 7 5 4 bike 68 7 56 plane
4 root/uniquely.extension Treatment 1 5 3 7 car 1 7 37 bike 9 8 7 plane
5 root/defined.extension Treatment 2 6 7 3 car 4 6 8 bike 9 0 8 plane
My goal is to tidy the data into a format with unique column names, so that it is at least easier to finish tidying with reshape:
tidy_toy
file_path Condition Trial.Num A B C ID
1 root/some.extension Baseline 1 2 3 5 car
2 root/thing.extension Baseline 2 3 6 45 car
3 root/else.extension Baseline 3 4 4 6 car
4 root/uniquely.extension Treatment 1 5 3 7 car
5 root/defined.extension Treatment 2 6 7 3 car
6 root/some.extension Baseline 1 2 1 7 bike
7 root/thing.extension Baseline 2 5 4 4 bike
8 root/else.extension Baseline 3 7 5 4 bike
9 root/uniquely.extension Treatment 1 1 7 37 bike
10 root/defined.extension Treatment 2 4 6 8 bike
11 root/some.extension Baseline 1 4 9 0 plane
12 root/thing.extension Baseline 2 9 5 4 plane
13 root/else.extension Baseline 3 68 7 56 plane
14 root/uniquely.extension Treatment 1 9 8 7 plane
15 root/defined.extension Treatment 2 9 0 8 plane
If I try to melt toy directly it doesn't work, because only the first ID column gets used for id.vars (hence everything gets tagged as car) and columns with identical names get dropped.
Here's the dput of both tables:
structure(list(file_path = structure(c(3L, 4L, 2L, 5L, 1L), .Label = c("root/defined.extension",
"root/else.extension", "root/some.extension", "root/thing.extension",
"root/uniquely.extension"), class = "factor"), Condition = structure(c(1L,
1L, 1L, 2L, 2L), .Label = c("Baseline", "Treatment"), class = "factor"),
Trial.Num = c(1L, 2L, 3L, 1L, 2L), A = 2:6, B = c(3L, 6L,
4L, 3L, 7L), C = c(5L, 45L, 6L, 7L, 3L), ID = structure(c(1L,
1L, 1L, 1L, 1L), .Label = "car", class = "factor"), A = c(2L,
5L, 7L, 1L, 4L), B = c(1L, 4L, 5L, 7L, 6L), C = c(7L, 4L,
4L, 37L, 8L), ID = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "bike", class = "factor"),
A = c(4L, 9L, 68L, 9L, 9L), B = c(9L, 5L, 7L, 8L, 0L), C = c(0L,
4L, 56L, 7L, 8L), ID = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "plane", class = "factor")), .Names = c("file_path",
"Condition", "Trial.Num", "A", "B", "C", "ID", "A", "B", "C",
"ID", "A", "B", "C", "ID"), class = "data.frame", row.names = c(NA,
-5L))
structure(list(file_path = structure(c(3L, 4L, 2L, 5L, 1L, 3L,
4L, 2L, 5L, 1L, 3L, 4L, 2L, 5L, 1L), .Label = c("root/defined.extension",
"root/else.extension", "root/some.extension", "root/thing.extension",
"root/uniquely.extension"), class = "factor"), Condition = structure(c(1L,
1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L), .Label = c("Baseline",
"Treatment"), class = "factor"), Trial.Num = c(1L, 2L, 3L, 1L,
2L, 1L, 2L, 3L, 1L, 2L, 1L, 2L, 3L, 1L, 2L), A = c(2L, 3L, 4L,
5L, 6L, 2L, 5L, 7L, 1L, 4L, 4L, 9L, 68L, 9L, 9L), B = c(3L, 6L,
4L, 3L, 7L, 1L, 4L, 5L, 7L, 6L, 9L, 5L, 7L, 8L, 0L), C = c(5L,
45L, 6L, 7L, 3L, 7L, 4L, 4L, 37L, 8L, 0L, 4L, 56L, 7L, 8L), ID = structure(c(2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L), .Label = c("bike",
"car", "plane"), class = "factor")), .Names = c("file_path",
"Condition", "Trial.Num", "A", "B", "C", "ID"), class = "data.frame", row.names = c(NA,
-15L))

You can use the make.unique function to create unique column names. After that you can use melt from the data.table package, which can create multiple value columns based on patterns in the column names:
# make the column names unique
names(toy) <- make.unique(names(toy))
# let the 'Condition' column start with a small letter 'c'
# so it won't be detected by the patterns argument from melt
names(toy)[2] <- tolower(names(toy)[2])
# load the 'data.table' package
library(data.table)
# tidy the data into long format
tidy_toy <- melt(setDT(toy),
measure.vars = patterns('^A','^B','^C','^ID'),
value.name = c('A','B','C','ID'))
which gives:
> tidy_toy
file_path condition Trial.Num variable A B C ID
1: root/some.extension Baseline 1 1 2 3 5 car
2: root/thing.extension Baseline 2 1 3 6 45 car
3: root/else.extension Baseline 3 1 4 4 6 car
4: root/uniquely.extension Treatment 1 1 5 3 7 car
5: root/defined.extension Treatment 2 1 6 7 3 car
6: root/some.extension Baseline 1 2 2 1 7 bike
7: root/thing.extension Baseline 2 2 5 4 4 bike
8: root/else.extension Baseline 3 2 7 5 4 bike
9: root/uniquely.extension Treatment 1 2 1 7 37 bike
10: root/defined.extension Treatment 2 2 4 6 8 bike
11: root/some.extension Baseline 1 3 4 9 0 plane
12: root/thing.extension Baseline 2 3 9 5 4 plane
13: root/else.extension Baseline 3 3 68 7 56 plane
14: root/uniquely.extension Treatment 1 3 9 8 7 plane
15: root/defined.extension Treatment 2 3 9 0 8 plane
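
To see why the renaming steps matter, it may help to look at what make.unique and the '^C' pattern do to a plain character vector. The sketch below uses a shortened, hypothetical set of column names (nms) rather than the full toy data:

```r
# Shortened, hypothetical column names with repeats
nms <- c("file_path", "Condition", "Trial.Num",
         "A", "C", "ID", "A", "C", "ID")

# make.unique leaves the first occurrence alone and appends .1, .2, ...
# to later duplicates
unique_nms <- make.unique(nms)
print(unique_nms)

# The pattern '^C' also matches "Condition" (position 2)
c_cols_naive <- grep("^C", unique_nms)

# After lower-casing that one name, only the real C columns match
unique_nms[2] <- tolower(unique_nms[2])
c_cols_fixed <- grep("^C", unique_nms)

print(c_cols_naive)
print(c_cols_fixed)
```

This is why the answer lower-cases 'Condition' before calling melt with patterns.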
Another option is to use a list of column-indexes for measure.vars:
tidy_toy <- melt(setDT(toy),
measure.vars = list(c(4,8,12), c(5,9,13), c(6,10,14), c(7,11,15)),
value.name = c('A','B','C','ID'))
Making the column-names unique isn't necessary then.
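
The index lists do not have to be typed by hand. As a rough sketch (nms here is a stand-in for names(toy), and cols is a hypothetical variable name), the positions of the repeated names can be derived with base R:

```r
# Stand-in for names(toy): three id columns, then A/B/C/ID repeated 3 times
nms <- c("file_path", "Condition", "Trial.Num",
         rep(c("A", "B", "C", "ID"), times = 3))

# Names that occur more than once
dup <- unique(nms[duplicated(nms)])

# Positions of those repeated columns, grouped by name
idx  <- which(nms %in% dup)
cols <- split(idx, nms[idx])
print(cols)
```

The resulting list has one integer vector per repeated name, in the same shape as the list passed to measure.vars above.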
A more complicated method that creates names that are better distinguishable by the patterns argument:
# select the names that are not unique
tt <- table(names(toy))
idx <- which(names(toy) %in% names(tt)[tt > 1])
nms <- names(toy)[idx]
# make them unique
names(toy)[idx] <- paste(nms,
rep(seq(length(nms) / length(names(tt)[tt > 1])),
each = length(names(tt)[tt > 1])),
sep = '.')
# your columnnames are now unique:
> names(toy)
[1] "file_path" "Condition" "Trial.Num" "A.1" "B.1" "C.1" "ID.1" "A.2"
[9] "B.2" "C.2" "ID.2" "A.3" "B.3" "C.3" "ID.3"
# tidy the data into long format
tidy_toy <- melt(setDT(toy),
measure.vars = patterns('^A\\.\\d','^B\\.\\d','^C\\.\\d','^ID\\.\\d'),
value.name = c('A','B','C','ID'))
which will give the same end-result.
As mentioned in the comments, the janitor package can be helpful for this problem as well. Its clean_names() function works similarly to make.unique. See here for an explanation.

With tidyverse we can do:
library(tidyverse)
toy %>%
repair_names(sep="_") %>%
pivot_longer(-(1:3),names_to = c(".value","id"), names_sep="_") %>%
select(-id)
#> # A tibble: 15 x 7
#> file_path Condition Trial.Num A B C ID
#> <fct> <fct> <int> <int> <int> <int> <fct>
#> 1 root/some.extension Baseline 1 2 3 5 car
#> 2 root/some.extension Baseline 1 2 1 7 bike
#> 3 root/some.extension Baseline 1 4 9 0 plane
#> 4 root/thing.extension Baseline 2 3 6 45 car
#> 5 root/thing.extension Baseline 2 5 4 4 bike
#> 6 root/thing.extension Baseline 2 9 5 4 plane
#> 7 root/else.extension Baseline 3 4 4 6 car
#> 8 root/else.extension Baseline 3 7 5 4 bike
#> 9 root/else.extension Baseline 3 68 7 56 plane
#> 10 root/uniquely.extension Treatment 1 5 3 7 car
#> 11 root/uniquely.extension Treatment 1 1 7 37 bike
#> 12 root/uniquely.extension Treatment 1 9 8 7 plane
#> 13 root/defined.extension Treatment 2 6 7 3 car
#> 14 root/defined.extension Treatment 2 4 6 8 bike
#> 15 root/defined.extension Treatment 2 9 0 8 plane
#> Warning message:
#> Expected 2 pieces. Missing pieces filled with `NA` in 4 rows [1, 2, 3, 4].

Create n data sets from one data set without repetition using stratified sampling

I have a data set train with, say, 500 rows. I would like to get a data frame with n columns, each containing 500/n values (row numbers, with no repetition across columns), based on stratified sampling of a column in train, say train$y.
I have tried the following, but it returns duplicate values:
library(caret)
n <- 10 # I want to divide my data set in to 10 parts
data_partition <- createDataPartition(y = train$y, times = 10,
p = 1/n, list = F)
To summarize with an example:
Suppose I have a data set train with 100 rows, and one of its columns is train$y (value 0 or 1). I would like to get 10 data sets of 10 rows each from train, stratified on train$y, such that no row appears in any of the other 9 data sets.
Example input:
ID x y
1 1 0
2 2 0
3 3 1
4 1 1
5 2 1
6 4 1
7 4 0
8 4 1
9 3 1
10 1 1
11 2 1
12 3 0
13 4 1
14 5 1
15 6 1
16 10 1
17 9 1
18 3 0
19 7 0
20 8 1
Expected output (the first 4 columns, with the details of each set shown alongside):
ID x y sample set 1 set 2 set 3
1 1 0 set 2 ID x y ID x y ID x y
2 2 0 set 3 8 4 1 11 2 1 17 9 1
3 3 1 set 3 9 3 1 12 3 0 5 2 1
4 1 1 set 3 10 1 1 13 4 1 6 4 1
5 2 1 set 3 18 3 0 1 1 0 7 4 0
6 4 1 set 3 19 7 0 14 5 1 2 2 0
7 4 0 set 3 20 8 1 15 6 1 3 3 1
8 4 1 set 1 16 10 1 4 1 1
9 3 1 set 1
10 1 1 set 1
11 2 1 set 2
12 3 0 set 2
13 4 1 set 2
14 5 1 set 2
15 6 1 set 2
16 10 1 set 2
17 9 1 set 3
18 3 0 set 1
19 7 0 set 1
20 8 1 set 1
In the above example the input is ID, x and y. I would like to get the column sample, which I can use to split the data into those 3 tables (shown to the right) whenever I want.
Note that y has 14 ones and 6 zeros, a ratio of 70:30, and the output sets have approximately the same ratio.
Sample dataset in a copy/run friendly format:
data <- structure(list(ID = 1:20, x = c(1L, 2L, 3L, 1L, 2L, 4L, 4L, 4L,
3L, 1L, 2L, 3L, 4L, 5L, 6L, 10L, 9L, 3L, 7L, 8L), y = c(0L, 0L,
1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L,
0L, 1L)), .Names = c("ID", "x", "y"), class = "data.frame", row.names = c(NA,
-20L))

It can be done using the caret package. Try the code below:
# Creating the dataset
data <- structure(list(ID = 1:20, x = c(1L, 2L, 3L, 1L, 2L, 4L, 4L, 4L,
3L, 1L, 2L, 3L, 4L, 5L, 6L, 10L, 9L, 3L, 7L, 8L), y = c(0L, 0L,
1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L,
0L, 1L)), .Names = c("ID", "x", "y"), class = "data.frame", row.names = c(NA, -20L))
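# Solution
library(caret)
# With list = FALSE, createFolds returns one fold number per row,
# so every row ends up in exactly one of the k sets
k <- createFolds(data$y, k = 3, list = FALSE)
# Cross-tabulate fold against y to check the class balance per fold
addmargins(table(k, data$y))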
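
For readers who prefer not to depend on caret, the same idea can be sketched in base R: shuffle the rows within each class and deal them out to the sets in turn. This is an illustrative alternative, not what createFolds does internally; n_sets and fold are hypothetical names, and y is the y column from the question.

```r
set.seed(42)  # only so the shuffle is reproducible

# y column from the question's example data
y <- c(0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1)
n_sets <- 3

fold <- integer(length(y))
for (cls in unique(y)) {
  rows <- which(y == cls)
  # Shuffle the rows of this class (indexing avoids the sample() pitfall
  # when a class has a single row)
  rows <- rows[sample.int(length(rows))]
  # Deal the shuffled rows out to sets 1, 2, 3, 1, 2, 3, ...
  fold[rows] <- rep_len(seq_len(n_sets), length(rows))
}

# Every row has exactly one set; class counts per set differ by at most 1
print(table(fold, y))
```

The fold vector plays the role of the sample column in the expected output, so split(data, fold) would give the separate data sets.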

Identifying 24hour periods in GPS data

I would like to identify sequential 24 hour periods in GPS data. I have a datetime column that is numerical (ex: 41422.29), and I know each whole number is a day. I know how to get the day (just round), but my schedule does not follow calendar days. Instead, I would like to identify all of the rows that are within 24 hours of the first row, and then continue from there. I cannot use a count of rows, as 24 hours is not divided into equal increments.
This is my logic so far, though it doesn't get me where I need to be:
for (i in 1:length(example)){
base<-round(example$DT_LMT[i], digits=0)
if(example$DT_LMT[i]<=base+1) {
example$DaySeq<-base
}
else {
base+1
}
}
I have a dummy data set example, with the kind of thing I would like:
structure(list(ID = 1:19, DT_LMT = c(41423.62517, 41423.79236,
41423.95868, 41424.12534, 41424.29203, 41424.45888, 41424.62535,
41424.79186, 41424.95852, 41425.12502, 41425.29185, 41425.75016,
41425.79201, 41425.83352, 41425.87534, 41425.91744, 41425.95868,
41426.00105, 41426.04257), NEED = c(1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L)), .Names = c("ID",
"DT_LMT", "NEED"), class = "data.frame", row.names = c(NA, -19L
))

Here is one approach, assuming df is the data given in your question. I created a new variable, need, which I believe is your desired outcome.
transform(df, need = trunc(DT_LMT - DT_LMT[1]) + 1)
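
The arithmetic behind that one-liner can be checked on a few of the timestamps from the question. In this sketch dt is a hand-picked subset of DT_LMT, and period is a hypothetical name for the result:

```r
# A subset of the DT_LMT values from the question (units are days)
dt <- c(41423.62517, 41424.45888, 41424.62535,
        41425.29185, 41425.75016, 41426.04257)

# Days elapsed since the first fix, truncated to whole days, plus 1
# so that the first 24 hour window is numbered 1
period <- trunc(dt - dt[1]) + 1
print(period)
```

The windows are anchored to the first timestamp, so a fix taken 1.00018 days after the start already falls into period 2, in line with the NEED column.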

I would add 1 to the first value and then filter the data frame.
data<-data.frame(ID = 1:19, DT_LMT = c(41423.62517, 41423.79236,
41423.95868, 41424.12534, 41424.29203, 41424.45888, 41424.62535,
41424.79186, 41424.95852, 41425.12502, 41425.29185, 41425.75016,
41425.79201, 41425.83352, 41425.87534, 41425.91744, 41425.95868,
41426.00105, 41426.04257), NEED = c(1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L))
data[data$DT_LMT<=data$DT_LMT[1]+1,]
Output:
ID DT_LMT NEED
1 1 41423.63 1
2 2 41423.79 1
3 3 41423.96 1
4 4 41424.13 1
5 5 41424.29 1
6 6 41424.46 1
If you want to split the data into a list by 24 hour period:
split(data, floor(data$DT_LMT - data$DT_LMT[1]))
Output:
$`0`
ID DT_LMT NEED
1 1 41423.63 1
2 2 41423.79 1
3 3 41423.96 1
4 4 41424.13 1
5 5 41424.29 1
6 6 41424.46 1
$`1`
ID DT_LMT NEED
7 7 41424.63 2
8 8 41424.79 2
9 9 41424.96 2
10 10 41425.13 2
11 11 41425.29 2
$`2`
ID DT_LMT NEED
12 12 41425.75 3
13 13 41425.79 3
14 14 41425.83 3
15 15 41425.88 3
16 16 41425.92 3
17 17 41425.96 3
18 18 41426.00 3
19 19 41426.04 3
To add a column with the day (computed directly on the vector, so the result is a numeric column rather than a list):
data$day <- floor(data$DT_LMT - data$DT_LMT[1]) + 1
