Related
I want to model whether I can respond to an event. This depends on when I last responded to an event. I need to take these conditions into account:
Events can occur throughout the day, but I can only respond between 7 am and 11 pm.
Time between events can vary, but time between responses must be at least 90 min.
In other words, you can only respond to a new event if your last response was at least 90 min ago.
Importantly, I don't want a 1 whenever the time between events is > 90 min; I only want a 1 when the time between an event and the last response is > 90 min.
structure(list(event_day = c(0L, 0L, 0L, 0L, 0L, 0L),
               event_hr = c(1, 8, 9, 9, 10, 12),
               event_minute = c(41L, 25L, 22L, 41L, 26L, 1L),
               onset_time = c(101, 505, 562, 581, 626, 721)),
          row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))
onset_time is the time since the start of the model, in minutes. We would like to have a respond_col with 1 when we can respond and 0 when we cannot. For these 6 rows, respond_col should come out as 0, 1, 0, 0, 1, 1.
This sums up what I want to do but I don’t know how to code this:
If the difference in onset_time since the last 1 in respond_col is > 90, put a 1 in respond_col; otherwise put a 0.
Hope you can help me!
This takes a few data modifications and a for loop, plus two libraries: hms and tidyverse. I added rows to your data frame to test some of the conditions you mentioned.
library(hms)
library(tidyverse)
dat <- read.table(header = T, text = "
event_day event_hr event_minute onset_time
1 0 1 41 101
2 0 8 25 505
3 0 9 22 562
4 0 9 41 581
5 0 10 26 626
6 0 12 1 721")
# add rows for testing
dat <- do.call("rbind",
list(dat, c(0, 12, 59, 721 + 58),
c(0, 14, 20, 721 + 58 + 21 + 60),
c(0, 23, 5, 860 + 45 + 8 * 60),
c(1, 7, 5, 860 + 45 + 16 * 60))) %>% as.data.frame()
# event_day event_hr event_minute onset_time
# 1 0 1 41 101
# 2 0 8 25 505
# 3 0 9 22 562
# 4 0 9 41 581
# 5 0 10 26 626
# 6 0 12 1 721
# 7 0 12 59 779
# 8 0 14 20 860
# 9 0 23 5 1385
# 10 1 7 5 1865
The next step requires a vector that stores the time-of-day bounds (7 am and 11 pm) and the following changes to dat: a column with the time differences, a flag indicating whether the event time meets the 7 am - 11 pm criterion, and two columns initialized to 0: accumulated time and response. Both of those columns are filled in by the for loop. The function hms() is from the hms library.
these <- c(hms(0, 0, 7), hms(0, 0, 23)) # day constraints
(dat1 <- dat %>% mutate(
time = c(0, diff(onset_time)), # 0 for first row, then the rest
time_avail = between(hms(hours = event_hr, minutes = event_minute),
these[1], these[2]),
# accumulated time since last reset; whether response is warranted (conditions met)
accum_time = 0, response = 0))
# event_day event_hr event_minute onset_time time time_avail accum_time response
# 1 0 1 41 101 0 FALSE 0 0
# 2 0 8 25 505 404 TRUE 0 0
# 3 0 9 22 562 57 TRUE 0 0
# 4 0 9 41 581 19 TRUE 0 0
# 5 0 10 26 626 45 TRUE 0 0
# 6 0 12 1 721 95 TRUE 0 0
# 7 0 12 59 779 58 TRUE 0 0
# 8 0 14 20 860 81 TRUE 0 0
# 9 0 23 5 1385 525 FALSE 0 0
# 10 1 7 5 1865 480 TRUE 0 0
For the for loop, I'm using a boolean flag, reset, that marks when the cumulative time should restart.
reset <- FALSE  # flag: cumulative time restarts on the next row
for (j in seq_len(nrow(dat1))) {
  if (j == 1 || reset) {      # first row, or the row after a response
    dat1$accum_time[j] <- dat1$time[j]
    reset <- FALSE
  } else {                    # otherwise keep accumulating
    dat1$accum_time[j] <- dat1$accum_time[j - 1] + dat1$time[j]
  }
  # respond (and trigger the reset) when more than 90 min have
  # accumulated and the event falls within the 7 am - 11 pm window
  if (dat1$accum_time[j] > 90 && dat1$time_avail[j]) {
    dat1$response[j] <- 1
    reset <- TRUE
  }
}
dat1
# event_day event_hr event_minute onset_time time time_avail accum_time response
# 1 0 1 41 101 0 FALSE 0 0
# 2 0 8 25 505 404 TRUE 404 1
# 3 0 9 22 562 57 TRUE 57 0
# 4 0 9 41 581 19 TRUE 76 0
# 5 0 10 26 626 45 TRUE 121 1
# 6 0 12 1 721 95 TRUE 95 1
# 7 0 12 59 779 58 TRUE 58 0
# 8 0 14 20 860 81 TRUE 139 1
# 9 0 23 5 1385 525 FALSE 525 0
# 10 1 7 5 1865 480 TRUE 1005 1
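If you'd rather avoid the explicit flag, the same recurrence can be written with purrr::accumulate2() (purrr is loaded with the tidyverse). This is just a sketch of an equivalent formulation, not a requirement: the accumulated time restarts at the current gap whenever the previous row fired a response.
acc <- accumulate2(
  dat1$time[-1],                 # gaps for rows 2..n
  dat1$time_avail[-nrow(dat1)],  # availability of the *previous* row
  .init = dat1$time[1],
  .f = function(prev, gap, prev_avail) {
    # if the previous row responded (prev > 90 within hours), restart the clock
    if (prev > 90 && prev_avail) gap else prev + gap
  })
dat1$accum_time <- unlist(acc)  # unlist() in case your purrr version returns a list
dat1$response <- as.integer(dat1$accum_time > 90 & dat1$time_avail)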
Let me know if you have any questions.
I have a vector in a dataframe in R which is a time series that oscillates between 0 and 100.
I want to create a new column/vector in R that will be a series of 1s and 0s: it becomes 1 when the time series drops below 10 and stays 1 until the series reaches 80, after which it goes back to 0. So there is a path dependency in the problem I am trying to solve.
Something like:
DataFrame %>% mutate(BinaryIndicator = ....)
I think a picture will be the easiest way to show what I am aiming for. Here is a link to an example of what I would like to create. Any help would be sincerely appreciated.
Since each row's value depends on the previous row's (already updated) value, I think a rolling-window operation is appropriate; zoo does this well.
dat <- data.frame(x=rep(c(60, 50, 40, 35, 30, 25, 20, 15, 10.2, 9, 2, 3, 9, 40, 72, 81, 90), 2))
dat$binary <- cumsum(zoo::rollapply(dat$x, 2, function(a) {
  if (length(a) < 2) return(0)              # partial first window: no change
  if (a[1] >= 10 && a[2] < 10) return(1)    # crossed below 10: switch on
  if (a[1] < 80 && a[2] >= 80) return(-1)   # reached 80: switch off
  return(0)                                 # otherwise keep the current state
}, partial = TRUE, align = "right"))
dat
# x binary
# 1 60.0 0
# 2 50.0 0
# 3 40.0 0
# 4 35.0 0
# 5 30.0 0
# 6 25.0 0
# 7 20.0 0
# 8 15.0 0
# 9 10.2 0
# 10 9.0 1
# 11 2.0 1
# 12 3.0 1
# 13 9.0 1
# 14 40.0 1
# 15 72.0 1
# 16 81.0 0
# 17 90.0 0
# 18 60.0 0
# 19 50.0 0
# 20 40.0 0
# 21 35.0 0
# 22 30.0 0
# 23 25.0 0
# 24 20.0 0
# 25 15.0 0
# 26 10.2 0
# 27 9.0 1
# 28 2.0 1
# 29 3.0 1
# 30 9.0 1
# 31 40.0 1
# 32 72.0 1
# 33 81.0 0
# 34 90.0 0
(I wonder if the internal logic can be simplified some.)
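One possible simplification, sketched here with zoo::na.locf(): mark only the definite states (1 below 10, 0 at 80 or above, NA in between) and carry the last known state forward. On this data it reproduces the output above, and it also avoids double-counting if the series dips below 10 twice before ever reaching 80.
# 1 when below 10, 0 once 80 is reached, NA means 'keep the prior state'
sig <- ifelse(dat$x < 10, 1L, ifelse(dat$x >= 80, 0L, NA_integer_))
dat$binary2 <- zoo::na.locf(sig, na.rm = FALSE)  # carry the last known state forward
dat$binary2[is.na(dat$binary2)] <- 0L            # rows before any crossing start at 0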
I have a data set where participants were assigned to different groups and completed the same tests. I know I can use the aggregate function to identify the mean and sd but I cannot figure out how to find the outliers in these groups.
df <- read.table(header=T, sep=",", text="id, group, test1, test2
1, 0, 57, 82
2, 0, 77, 80
3, 0, 67, 90
4, 0, 15, 70
5, 0, 58, 72
6, 1, 18, 44
7, 1, 44, 44
8, 1, 18, 46
9, 1, 20, 44
10, 1, 14, 38")
I like the format of this code but do not know how to change it in order to identify outliers for each group on each test.
Also, I want outliers to be defined as anything more than 2 standard deviations from the mean rather than 3. Can I set that within this code too?
##to get outliers on test1 if groups were combined
badexample <- boxplot(df$test1, plot=F)$out
which(df$test1 %in% badexample)
This would work if I wanted the outliers of both groups together on test1 but I want to separate by group.
Output should contain:
outliers for group 0 on test1
outliers for group 0 on test2
outliers for group 1 on test1
outliers for group 1 on test2
You can write a function to compute the outliers and then call it with ave.
outlier <- function(x, SD = 2){
  mu <- mean(x)
  sigma <- sd(x)
  x < mu - SD * sigma | x > mu + SD * sigma  # TRUE when beyond SD sigmas from the mean
}
with(df, ave(test1, group, FUN = outlier))
# [1] 0 0 0 0 0 0 0 0 0 0
with(df, ave(test2, group, FUN = outlier))
# [1] 0 0 0 0 0 0 0 0 0 0
To have new columns in df with these results, assign in the usual way.
df$out1 <- with(df, ave(test1, group, FUN = outlier))
df$out2 <- with(df, ave(test2, group, FUN = outlier))
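A hypothetical follow-up: because ave() fills a numeric column, the logical flags are coerced to 0/1, so you can subset with them directly. (Both results are empty for this sample, since no score is more than 2 group SDs from its group mean.)
df[df$out1 == 1, ]  # group-wise outliers on test1 (none here)
df[df$out2 == 1, ]  # group-wise outliers on test2 (none here)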
An option, using data.table:
library(data.table)
df <- read.table(header=T, sep=",", text="id, group, test1, test2
1, 0, 57, 82
2, 0, 77, 80
3, 0, 67, 90
4, 0, 15, 70
5, 0, 58, 72
6, 1, 18, 44
7, 1, 44, 44
8, 1, 18, 46
9, 1, 20, 44
10, 1, 14, 38")
DT <- as.data.table(df)
DT[, `:=`(mean1 = mean(test1), sd1 = sd(test1), mean2 = mean(test2), sd2 = sd(test2)), by = "group"]
DT[, `:=`(outlier1 = abs(test1-mean1)>2*sd1, outlier2 = abs(test2-mean2)>2*sd2)]
DT
# id group test1 test2 mean1 sd1 mean2 sd2 outlier1 outlier2
# 1: 1 0 57 82 54.8 23.66854 78.8 8.074652 FALSE FALSE
# 2: 2 0 77 80 54.8 23.66854 78.8 8.074652 FALSE FALSE
# 3: 3 0 67 90 54.8 23.66854 78.8 8.074652 FALSE FALSE
# 4: 4 0 15 70 54.8 23.66854 78.8 8.074652 FALSE FALSE
# 5: 5 0 58 72 54.8 23.66854 78.8 8.074652 FALSE FALSE
# 6: 6 1 18 44 22.8 12.04990 43.2 3.033150 FALSE FALSE
# 7: 7 1 44 44 22.8 12.04990 43.2 3.033150 FALSE FALSE
# 8: 8 1 18 46 22.8 12.04990 43.2 3.033150 FALSE FALSE
# 9: 9 1 20 44 22.8 12.04990 43.2 3.033150 FALSE FALSE
# 10: 10 1 14 38 22.8 12.04990 43.2 3.033150 FALSE FALSE
Here's a way with dplyr, grouping by group so the means and SDs are computed within each group:
df %>%
  group_by(group) %>%
  mutate_at(
    vars(starts_with("test")),
    list(outlier = ~ abs(. - mean(.)) > 2 * sd(.))
  ) %>%
  as.data.frame()
id group test1 test2 test1_outlier test2_outlier
1 1 0 57 82 FALSE FALSE
2 2 0 77 80 FALSE FALSE
3 3 0 67 90 FALSE FALSE
4 4 0 15 70 FALSE FALSE
5 5 0 58 72 FALSE FALSE
6 6 1 18 44 FALSE FALSE
7 7 1 44 44 FALSE FALSE
8 8 1 18 46 FALSE FALSE
9 9 1 20 44 FALSE FALSE
10 10 1 14 38 FALSE FALSE
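And to print the four lists the question asks for, here is a base-R sketch using by() (group 0 first, then group 1; all four lists are empty for this sample):
by(df[c("test1", "test2")], df$group, function(d)
  lapply(d, function(x) x[abs(x - mean(x)) > 2 * sd(x)]))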
I have the following simple linear programming model that I am solving with the linprog package in R:
install.packages("linprog")
library(linprog)
function_opt <- c(8, 13, 9, 8, 9, 11, 12, 10, 7, 8, 10, 9)
names(function_opt) <- c("F1A1","F1A2","F1A3","F1A4","F2A1","F2A2","F2A3","F2A4","F3A1","F3A2","F3A3","F3A4")
##Order: 3 factory capacities, 4 customer demands
cons_indep_term <- c(60, 70, 80, 75, 45, 40, 50)
names(cons_indep_term) <- c("F1","F2","F3","A1","A2","A3","A4")
r1 <- c(1,1,1,1,0,0,0,0,0,0,0,0)
r2 <- c(0,0,0,0,1,1,1,1,0,0,0,0)
r3 <- c(0,0,0,0,0,0,0,0,1,1,1,1)
r4 <- c(1,0,0,0,1,0,0,0,1,0,0,0)
r5 <- c(0,1,0,0,0,1,0,0,0,1,0,0)
r6 <- c(0,0,1,0,0,0,1,0,0,0,1,0)
r7 <- c(0,0,0,1,0,0,0,1,0,0,0,1)
cons_coef <- rbind(r1,r2,r3,r4,r5,r6,r7)
res <- solveLP(function_opt, cons_indep_term, cons_coef, maximum=FALSE, const.dir = c("<=","<=","<=",">=",">=",">=",">="))
print (res)
The sixth constraint requires that the sum of the FxA3 variables be at least 40. However, the solution comes out as:
Results of Linear Programming / Linear Optimization
Objective function (Minimum): 1355
Iterations in phase 1: 6
Iterations in phase 2: 3
Solution
opt
F1A1 10
F1A2 0
F1A3 0
F1A4 50
F2A1 30
F2A2 0
F2A3 0
F2A4 0
F3A1 35
F3A2 45
F3A3 0
F3A4 0
Basic Variables
opt
F1A1 10
F1A4 50
F2A1 30
F3A1 35
F3A2 45
S F2 40
S A3 40
Constraints
actual dir bvec free dual dual.reg
F1 60 <= 60 0 1 10
F2 30 <= 70 40 0 40
F3 80 <= 80 0 2 35
A1 75 >= 75 0 9 40
A2 45 >= 45 0 10 35
A3 80 >= 40 40 0 40
A4 50 >= 50 0 9 10
All Variables (including slack variables)
opt cvec min.c max.c marg marg.reg
F1A1 10 8 -9 9 NA NA
F1A2 0 13 99 77 4 10
F1A3 0 9 99 77 10 10
F1A4 50 8 -17 9 NA NA
F2A1 30 9 -10 10 NA NA
F2A2 0 11 99 77 1 30
F2A3 0 12 99 77 12 40
F2A4 0 10 99 77 1 30
F3A1 35 7 -8 9 NA NA
F3A2 45 8 -18 9 NA NA
F3A3 0 10 99 77 12 35
F3A4 0 9 99 77 2 35
S F1 0 0 -1 Inf 1 10
S F2 40 0 NA 1 0 NA
S F3 0 0 -2 Inf 2 35
S A1 0 0 -9 Inf 9 40
S A2 0 0 -10 Inf 10 35
S A3 40 0 NA 10 0 NA
S A4 0 0 -9 Inf 9 10
All three FxA3 variables are set to 0, meaning the sixth constraint is violated. What is the problem? I have triple-checked everything but still have no idea.
This is... very weird, and I can't spot any issue in your code. Since this is such a simple LP, you might think about filing a bug report with the package maintainers.
That being said, you should be able to get unblocked by using the lpSolve package, which has a nearly identical interface (and is, in fact, used by the linprog package you asked about):
library(lpSolve)
mod2 = lp(direction = "min",
objective.in = function_opt,
const.mat = cons_coef,
const.dir = c("<=","<=","<=",">=",">=",">=",">="),
const.rhs = cons_indep_term)
setNames(mod2$solution, names(function_opt))
# F1A1 F1A2 F1A3 F1A4 F2A1 F2A2 F2A3 F2A4 F3A1 F3A2 F3A3 F3A4
# 0 0 40 20 40 0 0 30 35 45 0 0
mod2$objval
# [1] 1785
In the optimal solution, we have F1A3 = 40, F2A3 = 0, and F3A3 = 0, so the sixth constraint is met.
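As a quick sanity check (a sketch reusing the objects defined above), you can evaluate the left-hand side of every constraint at the lpSolve optimum:
drop(cons_coef %*% mod2$solution)  # one value per constraint row
# r1 r2 r3 r4 r5 r6 r7
# 60 70 80 75 45 40 50
Each capacity row stays at or below 60/70/80 and each demand row meets 75/45/40/50 exactly.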
What's the right way to do this in R:
for(row in 1:10)
{
counts[row] <- length(otherData[otherData[["some property"]] == otherList[row],])
}
In other words: put into each row of a new object (matrix, data.frame, whatever) the count of rows in another object (matrix, data.frame, whatever) that equal the corresponding entry in some other list (speaking abstractly, not literally a list object)?
E.g. say x = otherData is
a b c
d 1 2 3
e 1 3 4
f 2 5 6
g 1 5 3
And say the "otherList" is the first column of x, so I want to count how many of x's rows have a first-column entry equal to each of 1, 2, 3, etc.
So I want counts to be
3,
1,
0,
(0s for as long as this counts list goes)
Note it's more important that I be able to select out that data subset than that I get its length; I need to use the subset for other computations as well. Again, I want to select it out row by row and have the output of whatever computations I do stored in the corresponding row of the results matrix (in this case counts).
I can obviously do this with a for loop, but what's the clever way to skip the loop?
Apologies if this duplicates another question. This seems like a very basic question, but I'm not sure what terms to search for. This question seems similar and would work for getting lengths, though I'm not clear on how to apply it in the general case.
EDIT
Here's an example. We select certain rows of x (here x is like otherData in my description above) that satisfy some row-dependent condition, in this case having a first-column entry equal to row; the point is that "== row" could be replaced with any condition on row, e.g. "<= otherlist[row] - 2" etc.
> x
condition value
1 2 25
2 9 72
3 41 60
4 41 61
5 25 38
6 41 10
7 41 43
8 41 26
9 41 46
10 12 263
11 26 136
12 24 107
13 9 70
14 12 62
15 12 136
16 34 44
17 12 53
18 32 14
19 32 148
20 4 34
> results = 0*1:20
> results
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> for(row in 1:20) {
+ results[row] = length(x[x[["condition"]]==row,2]) }
> results
[1] 0 1 0 1 0 0 0 0 2 0 0 4 0 0 0 0 0 0 0 0
Edited:
sapply( 1:20, function(z) sum(x[["condition"]] == z) )
#[1] 0 1 0 1 0 0 0 0 2 0 0 4 0 0 0 0 0 0 0 0
You would be able to substitute a different logical test and the sum would be the number of qualifying rows. (I was never able to figure out why you were using column number 2.) If you were hoping to select out a subset of rows that met a condition (which your example was not illustrating) then you could use this:
x[ x[,1] == test, ]  # e.g.
> x[ x$condition == 9, ]
condition value
2 9 72
13 9 70
Or if you only wanted the column 'value' that corresponds to the tested 'condition' column, then use:
> x[ x[['condition']] == 9, "value" ]
[1] 72 70
If you want to apply functions to selected (disjoint) subsets of x, and you can create a factor variable as long as the data frame, then you can use aggregate or by to process the split-up subsets (see the aggregate() sketch after the next example). If you want to use the sapply formalism above, here's an example that computes the separate means for subsets of "values" for rows whose rownames appear in "condition":
> sapply( rownames(x), function(z) mean( x[x[["condition"]] == z , "value"]) )
[1] NaN 25.0 NaN 34.0 NaN NaN NaN NaN 71.0 NaN NaN 128.5 NaN NaN NaN NaN
[17] NaN NaN NaN NaN
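For comparison, here is a sketch of the aggregate() route mentioned above; unlike the sapply() call, it drops the empty groups instead of returning NaN:
aggregate(value ~ condition, data = x, FUN = mean)
#    condition value
# 1          2  25.0
# 2          4  34.0
# 3          9  71.0
# 4         12 128.5
# 5         24 107.0
# 6         25  38.0
# 7         26 136.0
# 8         32  81.0
# 9         34  44.0
# 10        41  41.0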
What about table?
table(factor(x[, 1], x[1, ]))
#
# 1 2 3
# 3 1 0
Update
Using the second x table in your question, same solution:
table(factor(x$condition, rownames(x)))
#
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
# 0 1 0 1 0 0 0 0 2 0 0 4 0 0 0 0 0 0 0 0
Also, try match:
match(x$condition, rownames(x))
# [1] 2 9 NA NA NA NA NA NA NA 12 NA NA 9 12 12 NA 12 NA NA 4
table(match(x$condition, rownames(x)))
#
# 2 4 9 12
# 1 1 2 4
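A related base-R shortcut (a sketch building on the match() idea): tabulate() counts the matched positions directly, reproducing the results vector from the question.
tabulate(match(x$condition, rownames(x)), nbins = nrow(x))
# [1] 0 1 0 1 0 0 0 0 2 0 0 4 0 0 0 0 0 0 0 0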
> a <- c(seq(1,10))
> a
[1] 1 2 3 4 5 6 7 8 9 10
> d <- cbind(a,a)
> d
a a
[1,] 1 1
[2,] 2 2
[3,] 3 3
[4,] 4 4
[5,] 5 5
[6,] 6 6
[7,] 7 7
[8,] 8 8
[9,] 9 9
[10,] 10 10
> d[,2]
[1] 1 2 3 4 5 6 7 8 9 10
> d[,2] <- d[,1]*2
> d
a a
[1,] 1 2
[2,] 2 4
[3,] 3 6
[4,] 4 8
[5,] 5 10
[6,] 6 12
[7,] 7 14
[8,] 8 16
[9,] 9 18
[10,] 10 20