R - Insert a True every nth row based on previous rows - r

Test data Frame:
a<-data.frame(True_False = c(T,F,F,F,F,T,F,T,T,T,F,F,F,F,F,F,F,F))
True_False
1 TRUE
2 FALSE
3 FALSE
4 FALSE
5 FALSE
6 TRUE
7 FALSE
8 TRUE
9 TRUE
10 TRUE
11 FALSE
12 FALSE
13 FALSE
14 FALSE
15 FALSE
16 FALSE
17 FALSE
18 FALSE
Using this, I would like to edit this column, or make a new one, so that there is a True at least once every three rows. That means checking the current row: if it is False and the previous two rows are also False, make it a True; otherwise leave it as it is. Using zoo's rollapply together with dplyr, I get close.
library(zoo)
library(tidyverse)
b<-a%>%
rename(Input = True_False)%>%
mutate(Roll = ifelse(rollapplyr(Input,3,sum, partial = T) == 0,T,Input))
b$Desired<-c(T,F,F,T,F,T,F,T,T,T,F,F,T,F,F,T,F,F)
Input Roll Desired
1 TRUE TRUE TRUE
2 FALSE FALSE FALSE
3 FALSE FALSE FALSE
4 FALSE TRUE TRUE
5 FALSE TRUE FALSE
6 TRUE TRUE TRUE
7 FALSE FALSE FALSE
8 TRUE TRUE TRUE
9 TRUE TRUE TRUE
10 TRUE TRUE TRUE
11 FALSE FALSE FALSE
12 FALSE FALSE FALSE
13 FALSE TRUE TRUE
14 FALSE TRUE FALSE
15 FALSE TRUE FALSE
16 FALSE TRUE TRUE
17 FALSE TRUE FALSE
18 FALSE TRUE FALSE
Essentially my issue is that it will rollapply the sum to the whole column, and then add the Trues after. Thus, we have Trues that are not necessary. So is there a way I can do this in which the True is applied before going to the next row? I assume I need to use an apply of some sort, but that is an area I'm not familiar with, and even reading the documentation I'm not sure how to do this directly.

Due to the fact that you need to update your vector on the fly to process further operations, I'd say a simple for-loop is the way to go:
for(i in 3:nrow(a)){
a$True_False[i] <- ifelse(sum(a$True_False[(i-2):i]) == 0, T, a$True_False[i])
}
> a
True_False
1 TRUE
2 FALSE
3 FALSE
4 TRUE
5 FALSE
6 TRUE
7 FALSE
8 TRUE
9 TRUE
10 TRUE
11 FALSE
12 FALSE
13 TRUE
14 FALSE
15 FALSE
16 TRUE
17 FALSE
18 FALSE
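The same loop can be wrapped in a small helper so that the window length is not hard-coded. This is only a sketch: the name fill_every_n and its n argument are made up here for illustration and are not part of the answer above.

```r
# Hypothetical helper: walk the vector once and set x[i] to TRUE whenever
# the last n values (including x[i]) are all FALSE. Because x is updated
# in place, later windows see the TRUEs inserted earlier.
fill_every_n <- function(x, n = 3) {
  if (length(x) >= n) {
    for (i in n:length(x)) {
      if (!any(x[(i - n + 1):i])) x[i] <- TRUE
    }
  }
  x
}

x <- c(T,F,F,F,F,T,F,T,T,T,F,F,F,F,F,F,F,F)
fill_every_n(x)
```

On the test vector this should set rows 4, 13 and 16 to TRUE, the same rows the loop above changes.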

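Looks like you need something like this. Here is one approach (not the cleanest). Note that it relies on the condition z%%3==1, so it reproduces the desired output for this test data but is not a general solution: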
a<-data.frame(True_False = c(T,F,F,F,F,T,F,T,T,T,F,F,F,F,F,F,F,F))
a$Desired<-NA
a$Desired[4:nrow(a)]<-sapply(4:nrow(a),function(z){
if(z%%3==1 & a$True_False[z]==F & a$True_False[z-1]==F & a$True_False[z-2]==F){a$True_False[z]<-T}else{a$True_False[z]}
})
a$Desired[1:3]<-a$True_False[1:3]

Define an update function f and run it through Reduce.
f <- function(x, i) {
if (i >= 3 && all(!x[seq(to = i, length = 3)])) x[i] <- TRUE
x
}
transform(a, new = Reduce(f, init = True_False, seq_along(True_False)))
giving:
True_False new
1 TRUE TRUE
2 FALSE FALSE
3 FALSE FALSE
4 FALSE TRUE
5 FALSE FALSE
6 TRUE TRUE
7 FALSE FALSE
8 TRUE TRUE
9 TRUE TRUE
10 TRUE TRUE
11 FALSE FALSE
12 FALSE FALSE
13 FALSE TRUE
14 FALSE FALSE
15 FALSE FALSE
16 FALSE TRUE
17 FALSE FALSE
18 FALSE FALSE

Related

Select entries from data.frame, based on column names and values stored in a list

I have a data.frame similar to this:
mydf=data.frame(LETTERS=LETTERS, rev_letters=rev(letters), var1=c(rep('a',10),rep('b',10),rep('c',6)), value=1:26)
> head(mydf)
LETTERS rev_letters var1 value
1 A z a 1
2 B y a 2
3 C x a 3
4 D w a 4
5 E v a 5
6 F u a 6
I want to select the row indexes that correspond to the columns and values stored in a list, like this one:
mylist=list(LETTERS=c('A','M','X'), var1='b')
> mylist
$LETTERS
[1] "A" "M" "X"
$var1
[1] "b"
I would like to do something like the following, but for all columns and values at once:
> which(mydf[,names(mylist)[1]] %in% mylist[[1]])
[1] 1 13 24
... or even better as a TRUE/FALSE variable:
> mydf[,names(mylist)[1]] %in% mylist[[1]]
[1] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
[25] FALSE FALSE
The idea is to end up with a single variable of all the indexes for all the columns and values in the list; in the example above, the result would be:
> indexes
[1] 1 11 12 13 14 15 16 17 18 19 20 24
... or the TRUE/FALSE counterpart:
> indexes
[1] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE
[13] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE TRUE
[25] FALSE FALSE
Thanks!
With %in% + sapply:
mydf=data.frame(LETTERS=LETTERS, rev_letters=rev(letters), var1=c(rep('a',10),rep('b',10),rep('c',6)), value=1:26)
mylist = list(LETTERS = c('A','M','X'), var1 = 'b')
rowSums(sapply(names(mylist), function(x) mydf[[x]] %in% mylist[[x]])) != 0
# [1] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#[11] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#[21] FALSE FALSE FALSE TRUE FALSE FALSE
which(rowSums(sapply(names(mylist), function(x) mydf[[x]] %in% mylist[[x]])) != 0)
#[1] 1 11 12 13 14 15 16 17 18 19 20 24
Loop through names and use which:
sort(unique(unlist(sapply(names(mylist), function(i){
which(mydf[, i] %in% mylist[[ i ]])
}))))
# [1] 1 11 12 13 14 15 16 17 18 19 20 24
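Another way to combine the per-column tests, sketched here as an alternative and not taken from the answers above, is to build one logical vector per list entry with Map and then OR them together with Reduce. The intermediate name hits is made up for the example.

```r
mydf <- data.frame(LETTERS = LETTERS, rev_letters = rev(letters),
                   var1 = c(rep('a', 10), rep('b', 10), rep('c', 6)),
                   value = 1:26)
mylist <- list(LETTERS = c('A', 'M', 'X'), var1 = 'b')

# One logical vector per list entry: is the row's value among the wanted ones?
hits <- Map(function(col, vals) mydf[[col]] %in% vals, names(mylist), mylist)

# A row is selected if any of the per-column tests is TRUE
indexes <- Reduce(`|`, hits)
which(indexes)
```

Replacing `|` with `&` would instead keep only the rows that match every entry of the list.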

Turns thousands of dummy variables into multinomial variable

I have a dataframe of the following sort:
a<-c('q','w')
b<-c(T,T)
d<-c(F,F)
.e<-c(T,F)
.f<-c(F,F)
.g<-c(F,T)
h<-c(F,F)
i<-c(F,T)
j<-c(T,T)
df<-data.frame(a,b,d,.e,.f,.g,h,i,j)
a b d .e .f .g h i j
1 q TRUE FALSE TRUE FALSE FALSE FALSE FALSE TRUE
2 w TRUE FALSE FALSE FALSE TRUE FALSE TRUE TRUE
I want to turn all variables whose names start with a period into a single multinomial variable called Index, so that each row's Index holds the name (without the period) of the variable that is TRUE in that row:
df$Index<-c('e','g')
a b d .e .f .g h i j Index
1 q TRUE FALSE TRUE FALSE FALSE FALSE FALSE TRUE e
2 w TRUE FALSE FALSE FALSE TRUE FALSE TRUE TRUE g
Although many rows can have a T for any given period-initial variable, each row can be T for only ONE period-initial variable.
If it were just a few items I'd do an ifelse statement:
df$Index <- ifelse(df$`_10000`, '10000',...
But there are 12000 of these. The names for all dummy variables in the real data begin with underscores (the toy example above uses periods instead), so I feel like there must be a better way. In pseudocode I would say something like:
for every row:
    for every column beginning with '_':
        if value == T:
            assign the name of the column without '_' to a Column 'Index'
Thanks in advance
Sample data:
df <- cbind(a = letters[1:10], b = LETTERS[1:10],
data.frame(diag(10) == 1))
names(df)[-(1:2)] <- paste0("_", 1:10)
set.seed(42)
df <- df[sample(nrow(df)),]
head(df,3)
# a b _1 _2 _3 _4 _5 _6 _7 _8 _9 _10
# 1 a A TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# 5 e E FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
# 10 j J FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
Execution:
df$Index <- apply(subset(df, select = grepl("^_", names(df))), 1,
function(z) which(z)[1])
df
# a b _1 _2 _3 _4 _5 _6 _7 _8 _9 _10 Index
# 1 a A TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 1
# 5 e E FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE 5
# 10 j J FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE 10
# 8 h H FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE 8
# 2 b B FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 2
# 4 d D FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE 4
# 6 f F FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE 6
# 9 i I FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE 9
# 7 g G FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE 7
# 3 c C FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 3
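If there is more than one TRUE in a row of _-columns, the first one found is used and the remainder are silently ignored. If there are none, Index will be NA for that row. Note that Index holds the position of the TRUE column among the _-columns, which coincides with the column name here only because the columns are named _1 to _10 in order.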
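If the column name itself is wanted (as in the question's pseudocode), one possible sketch is to take the position of the first TRUE with max.col and look the name up. The toy columns _e, _f, _g and the helper names dummy_cols, m and first_true are assumptions made for this example only.

```r
# Toy data with underscore-prefixed dummy columns (made up for illustration);
# check.names = FALSE keeps the leading underscores in the column names
df <- data.frame(a = c("q", "w", "x"),
                 `_e` = c(TRUE, FALSE, FALSE),
                 `_f` = c(FALSE, FALSE, FALSE),
                 `_g` = c(FALSE, TRUE, FALSE),
                 check.names = FALSE)

dummy_cols <- grep("^_", names(df), value = TRUE)
m <- as.matrix(df[dummy_cols]) * 1              # logical -> 0/1 numeric matrix

# Position of the first TRUE in each row; rows with no TRUE become NA
first_true <- max.col(m, ties.method = "first")
first_true[rowSums(m) == 0] <- NA

# Look up the column name and strip the leading underscore
df$Index <- sub("^_", "", dummy_cols[first_true])
df
```

This avoids the row-wise apply, which may matter with 12000 dummy columns, though that has not been measured here.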

Function for mutating, tagging, or identifying records surrounding a condition by a given window

Given a data.frame with some type of a flag or identifier column, I would like to be able to flag the surrounding (leading and lagging) records by some time window parameter, n. So given:
df <- data.frame(
id = letters[1:26],
flag = FALSE
)
df$flag[10] <- TRUE
df$flag[17] <- TRUE
I would like to write something like:
flag_surrounding <- function(flag, n) {
# should flag surrounding -n to +n records with condition flag
}
# expected results for n = 2, n = 1...
df
# id flag flag_n2 flag_n1
# 1 a FALSE FALSE FALSE
# 2 b FALSE FALSE FALSE
# 3 c FALSE FALSE FALSE
# 4 d FALSE FALSE FALSE
# 5 e FALSE FALSE FALSE
# 6 f FALSE FALSE FALSE
# 7 g FALSE FALSE FALSE
# 8 h FALSE TRUE FALSE
# 9 i FALSE TRUE TRUE
# 10 j TRUE TRUE TRUE
# 11 k FALSE TRUE TRUE
# 12 l FALSE TRUE FALSE
# 13 m FALSE FALSE FALSE
# 14 n FALSE FALSE FALSE
# 15 o FALSE TRUE FALSE
# 16 p FALSE TRUE TRUE
# 17 q TRUE TRUE TRUE
# 18 r FALSE TRUE TRUE
# 19 s FALSE TRUE FALSE
# 20 t FALSE FALSE FALSE
# 21 u FALSE FALSE FALSE
# 22 v FALSE FALSE FALSE
# 23 w FALSE FALSE FALSE
# 24 x FALSE FALSE FALSE
# 25 y FALSE FALSE FALSE
# 26 z FALSE FALSE FALSE
I started writing some things using dplyr::lead and dplyr::lag and variants with cumsum, but I felt like this is already in a package somewhere, but couldn't find it quickly (and not really sure how to phrase this as a question for googling) - maybe someone has better recall than me :)
The following does the trick (using ideas from this post), but feels a bit clunky and error prone. I'd be curious to get other approaches/techniques and/or something more robust from a package.
library(dplyr)
flag_surrounding <- function(flag, n) {
as.logical(cumsum(lead(flag, n, default = FALSE)) - cumsum(lag(flag, n + 1, default = FALSE)))
}
df %>%
mutate(flag_n2 = flag_surrounding(flag, 2),
flag_n1 = flag_surrounding(flag, 1))
Here's a simple solution in base:
set.seed(4)
df <- data.frame(
id = letters[1:26],
flag = as.logical(rbinom(n = 26, size = 1, prob = 0.1))
)
lead_lag_flag = function(x, n) {
flagged = which(x)
to_flag = sapply(flagged, function(z) (z - n):(z + n))
to_flag = pmax(0, to_flag)
to_flag = pmin(length(x), to_flag)
to_flag = unique(to_flag)
new_flag = rep(FALSE, length(x))
new_flag[to_flag] = TRUE
return(new_flag)
}
df$flag_n1 = lead_lag_flag(df$flag, 1)
df$flag_n2 = lead_lag_flag(df$flag, 2)
df
# id flag flag_n1 flag_n2
# 1 a FALSE FALSE FALSE
# 2 b FALSE FALSE FALSE
# 3 c FALSE FALSE FALSE
# 4 d FALSE FALSE FALSE
# 5 e FALSE FALSE FALSE
# 6 f FALSE FALSE TRUE
# 7 g FALSE TRUE TRUE
# 8 h TRUE TRUE TRUE
# 9 i TRUE TRUE TRUE
# 10 j FALSE TRUE TRUE
# 11 k FALSE FALSE TRUE
# 12 l FALSE FALSE TRUE
# 13 m FALSE TRUE TRUE
# 14 n TRUE TRUE TRUE
# 15 o FALSE TRUE TRUE
# 16 p FALSE TRUE TRUE
# 17 q TRUE TRUE TRUE
# 18 r FALSE TRUE TRUE
# 19 s TRUE TRUE TRUE
# 20 t FALSE TRUE TRUE
# 21 u FALSE TRUE TRUE
# 22 v TRUE TRUE TRUE
# 23 w FALSE TRUE TRUE
# 24 x FALSE FALSE TRUE
# 25 y FALSE FALSE FALSE
# 26 z FALSE FALSE FALSE
Another base alternative:
n <- 1
nm <- paste0("flag", n)
i <- -n:n
df[ , nm] <- FALSE
ix <- rep(which(df$flag), each = length(i)) + i
ix <- ix[ix > 0 & ix <= nrow(df)]
df[ix, nm] <- TRUE
df
# id flag flag1
# 1 a FALSE FALSE
# 2 b FALSE FALSE
# 3 c FALSE FALSE
# 4 d FALSE FALSE
# 5 e FALSE FALSE
# 6 f FALSE FALSE
# 7 g FALSE FALSE
# 8 h FALSE FALSE
# 9 i FALSE TRUE
# 10 j TRUE TRUE
# 11 k FALSE TRUE
# 12 l FALSE FALSE
# 13 m FALSE FALSE
# 14 n FALSE FALSE
# 15 o FALSE FALSE
# 16 p FALSE TRUE
# 17 q TRUE TRUE
# 18 r FALSE TRUE
# 19 s FALSE FALSE
# 20 t FALSE FALSE
# 21 u FALSE FALSE
# 22 v FALSE FALSE
# 23 w FALSE FALSE
# 24 x FALSE FALSE
# 25 y FALSE FALSE
# 26 z FALSE FALSE
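The index-shifting idea used in the base answers can also be put into the flag_surrounding(flag, n) form that the question asks for. The following is a sketch of that idea using outer, not a tested replacement for the versions above.

```r
# Flag every position within n records of a TRUE in `flag`
flag_surrounding <- function(flag, n) {
  # every flagged position shifted by each offset in -n..n
  idx <- outer(which(flag), -n:n, "+")
  # drop positions that fall off either end of the vector
  idx <- idx[idx >= 1 & idx <= length(flag)]
  out <- logical(length(flag))
  out[idx] <- TRUE
  out
}

flag <- rep(FALSE, 26)
flag[c(10, 17)] <- TRUE
which(flag_surrounding(flag, 2))
which(flag_surrounding(flag, 1))
```

With the question's data this should flag rows 8 to 12 and 15 to 19 for n = 2, and rows 9 to 11 and 16 to 18 for n = 1. A vector with no TRUE at all simply comes back as all FALSE.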

R: Generate new data frame column based on a mapping of multiple (logical) columns

Clarification of 'map' or 'ordering' at bottom of post
Imagine we have a data frame with several logical columns, and a 'map' which, for specific combinations of those logical columns, gives a value.
What is the best/most efficient way to compute the value associated with each row of the data frame?
I have three possible solutions below: ifelse(), merge() and table(). I'd appreciate any comments or alternative solutions.
[Apologies, a rather long post]
Consider the following example data frame:
# Generate example
#N <- 15
#Data <- data.frame(A=sample(c(FALSE,TRUE),N,TRUE,c(8,2)),
# B=sample(c(FALSE,TRUE),N,TRUE,c(6,4)),
# C=sample(c(FALSE,TRUE),N,TRUE,c(7,3)),
# D=sample(c(FALSE,TRUE),N,TRUE,c(7,3)))
# Specific example used in this question
Data <- structure(list(A = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE,
FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE),
B = c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE,
FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE), C = c(FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE,
FALSE, TRUE, FALSE, FALSE, FALSE), D = c(TRUE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE,
FALSE, TRUE, FALSE)), .Names = c("A", "B", "C", "D"),
class = "data.frame", row.names = c(NA,-15L))
A B C D
1 FALSE FALSE FALSE TRUE
2 FALSE FALSE FALSE FALSE
3 FALSE TRUE FALSE FALSE
4 TRUE FALSE FALSE FALSE
5 FALSE FALSE FALSE FALSE
6 FALSE TRUE FALSE FALSE
7 FALSE TRUE FALSE FALSE
8 FALSE FALSE FALSE FALSE
9 FALSE FALSE FALSE FALSE
10 TRUE FALSE TRUE TRUE
11 FALSE TRUE FALSE TRUE
12 FALSE FALSE TRUE FALSE
13 FALSE TRUE FALSE FALSE
14 FALSE FALSE FALSE TRUE
15 FALSE FALSE FALSE FALSE
Combined with the following map:
# A -> B -> C
# \_ D
### To clarify, if someone has both B & D TRUE (with C FALSE), D is higher than B
### i.e. there can be no ties
This defines an ordering of the logical columns. The final value I want is the 'highest' column within each row. Such that, if column C is true we return C always. We only return "D" if C is FALSE and D is true.
The naive way to do this would be nested ifelse statements:
Data$Highest <- with(Data, ifelse( C, "C",
ifelse( D, "D",
ifelse( B, "B",
ifelse( A, "A", "none")
)
)
)
)
But that code is difficult to read/maintain and gets very complicated for complex orderings with many columns.
I can quickly generate a mapping from the column combinations to the desired output:
Map <- expand.grid( lapply( lapply( Data[c("A","B","C","D")], unique ), sort ) )
Map$Value <- factor(NA, levels=c("A","B","C","D","none"))
Map$Value[which(Map$A)] <- "A"
Map$Value[which(Map$B)] <- "B"
Map$Value[which(Map$D)] <- "D"
Map$Value[which(Map$C)] <- "C"
Map$Value[which(is.na(Map$Value))] <- "none"
A B C D Value
1 FALSE FALSE FALSE FALSE none
2 TRUE FALSE FALSE FALSE A
3 FALSE TRUE FALSE FALSE B
4 TRUE TRUE FALSE FALSE B
5 FALSE FALSE TRUE FALSE C
6 TRUE FALSE TRUE FALSE C
7 FALSE TRUE TRUE FALSE C
8 TRUE TRUE TRUE FALSE C
9 FALSE FALSE FALSE TRUE D
10 TRUE FALSE FALSE TRUE D
11 FALSE TRUE FALSE TRUE D
12 TRUE TRUE FALSE TRUE D
13 FALSE FALSE TRUE TRUE C
14 TRUE FALSE TRUE TRUE C
15 FALSE TRUE TRUE TRUE C
16 TRUE TRUE TRUE TRUE C
Which can be used with merge():
merge( Data, Map, by=c("A","B","C","D"), all.y=FALSE )
A B C D Highest Value
1 FALSE FALSE FALSE FALSE none none
2 FALSE FALSE FALSE FALSE none none
3 FALSE FALSE FALSE FALSE none none
4 FALSE FALSE FALSE FALSE none none
5 FALSE FALSE FALSE FALSE none none
6 FALSE FALSE FALSE TRUE D D
7 FALSE FALSE FALSE TRUE D D
8 FALSE FALSE TRUE FALSE C C
9 FALSE TRUE FALSE FALSE B B
10 FALSE TRUE FALSE FALSE B B
11 FALSE TRUE FALSE FALSE B B
12 FALSE TRUE FALSE FALSE B B
13 FALSE TRUE FALSE TRUE D D
14 TRUE FALSE FALSE FALSE A A
15 TRUE FALSE TRUE TRUE C C
However, the merge() function does not currently preserve the row order. There are ways round this though.
My final idea was to use a 4-dimensional table with character entries corresponding to the map:
Map2 <- table( lapply( Data[c("A","B","C","D")], unique ) )
Map2[] <- "none"
Map2["TRUE",,,] <- "A"
Map2[,"TRUE",,] <- "B"
Map2[,,,"TRUE"] <- "D"
Map2[,,"TRUE",] <- "C"
But I find the above lines unclear (perhaps there is a better way to make the table? I thought it would be possible to turn Map into Map2, but I couldn't see how).
We then use matrix-indexing to pull out the corresponding value:
BOB <- as.matrix(Data[c("A","B","C","D")])
cBOB <- matrix(as.character(BOB),nrow=NROW(BOB),ncol=NCOL(BOB),dimnames=dimnames(BOB))
Data$Alt.Highest <- Map2[cBOB]
A B C D Highest Alt.Highest
1 FALSE FALSE FALSE TRUE D D
2 FALSE FALSE FALSE FALSE none none
3 FALSE TRUE FALSE FALSE B B
4 TRUE FALSE FALSE FALSE A A
5 FALSE FALSE FALSE FALSE none none
6 FALSE TRUE FALSE FALSE B B
7 FALSE TRUE FALSE FALSE B B
8 FALSE FALSE FALSE FALSE none none
9 FALSE FALSE FALSE FALSE none none
10 TRUE FALSE TRUE TRUE C C
11 FALSE TRUE FALSE TRUE D D
12 FALSE FALSE TRUE FALSE C C
13 FALSE TRUE FALSE FALSE B B
14 FALSE FALSE FALSE TRUE D D
15 FALSE FALSE FALSE FALSE none none
So in summary, is there a better way to achieve this 'mapping' type operation and any thoughts on the efficiency of these methods?
For the application I'm interested in, I have nine columns and an ordering chart with three branches to apply to 3000 rows. Essentially I am trying to construct a factor based on an awkward data storage format. So clarity of code is my first priority, with speed/memory efficiency my second.
Thanks in advance.
P.S. Suggestions for amending the question title also welcome.
Clarification
The real application involves a questionnaire with 9 questions asking whether the respondent has achieved a given education/qualification level. These are binary yes/no responses.
What we want is to generate a new variable 'highest qualification achieved'.
The problem is that the 9 levels don't form a simple stack. For example, professional qualifications can be achieved without going to university (especially in older respondents).
We have designed a 'map' or 'ordering' such that, for every combination of responses, we have a 'highest qualification' (this order is subjective, hence the desire to make it simple to implement alternative orders).
# So given the nine responses: A, B, C, D, E, F, G, H, I
# we define an ordering as:
# D > C > B > A
# F > E
# E > A
# E == B
# I > H
# H == B
# G == B
# which has a set of order relationships. There is equality in this example
#      A -> B -> C -> D
#       \_  E -> F
#       \_  H -> I
#       \_  G
# 0    1    2    3    4
# We could then have five levels in our final 'highest' ordered factor: none, 1, 2, 3, 4
# Or we could decide to add more levels to break certain ties.
The R question is: given an ordering (and a rule for what to do with ties) that maps combinations of the logical columns to a 'highest achieved' value, how best to implement this in R?
I think I might not understand your concept of 'ordering'. If it is the case that no ties are allowed, and you know exactly how each letter compares to all others, that means that there is a strict ordering, that can be broken down into a simple vector from highest to lowest. If this isn't true, then maybe you could give a more difficult example. If it is true, then you could code this quite easily like:
order<-c('C','D','B','A')
reordered.Data<-Data[order]
Data$max<-
c(order,'none')[apply(reordered.Data,1,function(x) min(which(c(x,TRUE))))]
# A B C D max
# 1 FALSE FALSE FALSE TRUE D
# 2 FALSE FALSE FALSE FALSE none
# 3 FALSE TRUE FALSE FALSE B
# 4 TRUE FALSE FALSE FALSE A
# 5 FALSE FALSE FALSE FALSE none
# 6 FALSE TRUE FALSE FALSE B
# 7 FALSE TRUE FALSE FALSE B
# 8 FALSE FALSE FALSE FALSE none
# 9 FALSE FALSE FALSE FALSE none
# 10 TRUE FALSE TRUE TRUE C
# 11 FALSE TRUE FALSE TRUE D
# 12 FALSE FALSE TRUE FALSE C
# 13 FALSE TRUE FALSE FALSE B
# 14 FALSE FALSE FALSE TRUE D
# 15 FALSE FALSE FALSE FALSE none
I think I now understand your concept of 'ordering'. However, I think that you can safely ignore it at first. For example, G is the same level as B. But G and B will never be compared; you can only have one of {B,E,H,G}. So, as long as each "branch" is in the correct order, it won't matter. If you provided some sample data for your new branching, I could test this, but try something like this:
order<-c('D','C','F','I','B','E','H','G','A')
levs<-c(4,3,3,3,2,2,2,2,1)
names(levs)<-order
reordered.Data<-Data[order]
Data$max<-
c(order,'none')[apply(reordered.Data,1,function(x) min(which(c(x,TRUE))))]
Data$lev<-levs[Data$max]
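Since no sample data was given for the nine-column case, here is a small made-up data set to try the idea on. The three rows of toy are hypothetical responses, the vector is called ord to avoid masking base::order, and the level 0 for 'none' is an assumption based on the level row in the question.

```r
# Priority from highest to lowest, and the level attached to each column
ord  <- c("D", "C", "F", "I", "B", "E", "H", "G", "A")
levs <- c(D = 4, C = 3, F = 3, I = 3, B = 2, E = 2, H = 2, G = 2, A = 1,
          none = 0)

# Hypothetical responses: columns A..I, all FALSE to start with
toy <- data.frame(matrix(FALSE, nrow = 3, ncol = 9,
                         dimnames = list(NULL, LETTERS[1:9])))
toy[1, c("A", "B")] <- TRUE        # highest is B
toy[2, c("A", "E", "F")] <- TRUE   # highest is F
                                   # row 3 has no qualification at all

# For each row, position of the first TRUE in priority order;
# the extra TRUE appended at the end stands for "none"
m <- as.matrix(toy[ord])
first <- apply(m, 1, function(x) min(which(c(x, TRUE))))

toy$max <- c(ord, "none")[first]
toy$lev <- unname(levs[toy$max])
toy[c("max", "lev")]
```

Changing the ordering then only means editing ord and levs, which keeps alternative orderings easy to try.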
Here's a data.table approach:
require(data.table)
DT <- data.table(Data)
valord <- c('none','A','B','D','C')
DT[,val:={
vals <- c('none'=TRUE,unlist(.SD))[valord]
names(vals)[max(which(vals))]
},by=1:nrow(DT)]
The result is
A B C D val
1: FALSE FALSE FALSE TRUE D
2: FALSE FALSE FALSE FALSE none
3: FALSE TRUE FALSE FALSE B
4: TRUE FALSE FALSE FALSE A
5: FALSE FALSE FALSE FALSE none
6: FALSE TRUE FALSE FALSE B
7: FALSE TRUE FALSE FALSE B
8: FALSE FALSE FALSE FALSE none
9: FALSE FALSE FALSE FALSE none
10: TRUE FALSE TRUE TRUE C
11: FALSE TRUE FALSE TRUE D
12: FALSE FALSE TRUE FALSE C
13: FALSE TRUE FALSE FALSE B
14: FALSE FALSE FALSE TRUE D
15: FALSE FALSE FALSE FALSE none
If you run
class(DT) # [1] "data.table" "data.frame"
you'll see that DT is also a data.frame, like your "Data," so the same functions can be applied to it.

find the biggest change in a time series

I have a time series in R
e.g.
[1] 0.2 0.6 0.4 -0.2 -0.1 0.3 0.8 0.7
How can I find the biggest change in the series? (From point 4 to point 7, the biggest change = 1.)
How can I find out where a change of e.g. 1 occurs? (Again from point 4 (= -0.2) to point 7 (= 0.8).)
To calculate the distance matrix for a set of points, you can use the dist function. After that it is just a matter of selecting the point pair with the highest distance between them. In code:
m = as.matrix(dist(runif(10)))
m == max(m)
1 2 3 4 5 6 7 8 9 10
1 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
2 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
3 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
4 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
5 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
6 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
7 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
8 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
9 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
10 FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
which(m == max(m), arr.ind = TRUE)[1,]
row col
10 6
You can use expand.grid here.
x <- c(0.2, 0.6, 0.4, -0.2, -0.1, 0.3, 0.8, 0.7) # the series from the question
exg <- expand.grid(x, x)
exg[apply(exg, 1, diff) == VALUE.TO.FIND, ] # notice the ', ' (comma-and-space)
Var1 Var2
52 -0.2 0.8
where VALUE.TO.FIND is whichever specific value you are searching for
If instead you want to find the maximum distance:
dist <- apply(exg, 1, diff)
exg[dist == max(dist), ]
To get the biggest change in a list, just iterate through it and get the max and min values. Then compare them. It's in O(n) time. It's dirt simple.
To find a certain change is a little more complex. Don't know why you'd want it, but it's still possible. One way would be to call the first function you just wrote with every combination of start index and end indexes of the list. That's a little more computationally complex, but it's the simplest way of implementing it. Then when you get the change from position 1 to 2, you can check to see if it's what you want, if not, 1-3. Eventually you'll get to n-1 to n, and if that's not the change you're looking for, then it's not in the set.
This method will be in O(n^2).
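Both ideas can be sketched in R on the series from the question. This assumes "change" means the difference between any two points regardless of which comes first; the names target, d and hits are made up for the example, and a small tolerance is used instead of == because the values are floating point.

```r
x <- c(0.2, 0.6, 0.4, -0.2, -0.1, 0.3, 0.8, 0.7)

# Biggest change: a single pass to locate the minimum and the maximum
i_min <- which.min(x)
i_max <- which.max(x)
biggest_change <- x[i_max] - x[i_min]

# A specific change: compare every pair of points, O(n^2)
target <- 1
d <- outer(x, x, "-")                    # d[i, j] is x[i] - x[j]
hits <- which(abs(d - target) < 1e-8, arr.ind = TRUE)
hits                                     # row = end point, col = start point
```

For this series the minimum is at point 4 and the maximum at point 7, and the only pair with a change of 1 is the same one, from point 4 to point 7.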
