Creating binary operator in R

I need some help on creating a special kind of subtraction.
I have a data table x and I must subtract two columns, say a and b.
However, either column may not exist.
If a column does not exist, its value in the subtraction should be set to zero.
So far, I have approached this problem by trying to define a new subtraction operator, %-%
Thus, for example, if x = data.table(a = 5, b = 2), then a %-% b should be 3, whereas a %-% d should be 5.
I have tried to define this subtraction operator as shown below. However, for some reason, my subtraction always yields zero! Can anyone help me understand what I am doing wrong and how I can correct my code?
library(data.table)
x = data.table(a = floor(10 * runif(5)), b = floor(10 * runif(5)), c =floor(10 * runif(5)))
`%-%` <- function(e1,e2, DT = x){
ifelse(is.numeric(substitute(e1, DT)), e1 <- substitute(e1, DT), e1 <- 0)
ifelse(is.numeric(substitute(e2, DT)), e2 <- substitute(e2, DT), e2 <- 0)
return(e1 - e2)
}
x[, d := a %-% b]
x
x[, d := a %-% d]
x
Many thanks!
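One way to see why the result is always zero: inside the function, substitute(e1, DT) looks for the symbol e1 (not a) among the columns of DT, finds nothing, and returns the symbol unchanged, so is.numeric() is FALSE for both operands and both are set to 0. A minimal base R sketch, using a plain list as a stand-in for the data.table (the helper name f is hypothetical):

```r
# Plain list standing in for the data.table from the question
DT <- list(a = 5, b = 2)

# Hypothetical helper mirroring the substitute() call in the question
f <- function(e1, DT) substitute(e1, DT)

res <- f(a, DT)
class(res)       # "name": the unevaluated symbol e1, not the values of column a
is.numeric(res)  # FALSE, so the ifelse() call ends up assigning 0
```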

We can create a function with intersect for passing the column names into .SDcols, then Reduce by subtracting the corresponding rows of each column in .SD (Subset of Data.table)
f1 <- function(dat, .x, .y) intersect(names(dat), c(.x, .y))
x[, d := Reduce('-', .SD), .SDcols = f1(x, 'a', 'b')]
x[, e := Reduce(`-`, .SD), .SDcols = f1(x, 'a', 'f')]
x
# a b c d e
#1: 7 0 8 7 7
#2: 3 6 4 -3 3
#3: 9 9 8 0 9
#4: 3 6 2 -3 3
#5: 0 2 3 -2 0
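To make the intersect/Reduce step concrete, here is a small base R sketch on a plain list (the object names are made up for illustration). It also shows two caveats that seem worth knowing: intersect keeps the order of the existing column names, and a lone surviving column is returned as is, so the result is only a true "missing means zero" subtraction when the first operand exists and the operands are given in column order.

```r
cols <- list(a = c(7, 3, 9), b = c(0, 6, 9))

# Only "a" survives the intersection because "f" is not a column
present <- intersect(names(cols), c("a", "f"))
only_a <- Reduce(`-`, cols[present])   # same as cols$a

# Caveat 1: if only the second operand exists, it is returned as is, not negated
present2 <- intersect(names(cols), c("f", "b"))
only_b <- Reduce(`-`, cols[present2])  # cols$b, whereas f - b would be -cols$b

# Caveat 2: the order follows names(cols), so asking for b - a still gives a - b
swapped <- intersect(names(cols), c("b", "a"))  # "a" "b"
```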
Or, if we want to adapt the OP's function to accept unquoted arguments, use enquo to capture each argument as a quosure and quo_name to convert it to a string. Create an intersection vector from the column names of the dataset, and use - in the Reduce
library(dplyr)
`%-%` <- function(e1,e2, DT){
e1 <- quo_name(enquo(e1))
e2 <- quo_name(enquo(e2))
nm1 <- intersect(names(DT), c(e1, e2))
DT[, Reduce(`-`, .SD), .SDcols = nm1]
}
x[, d := `%-%`(a, b, .SD)]
x[, e := `%-%`(a, f, .SD)]
data
x <- structure(list(a = c(7L, 3L, 9L, 3L, 0L), b = c(0L, 6L, 9L, 6L,
2L), c = c(8L, 4L, 8L, 2L, 3L)), .Names = c("a", "b", "c"), row.names = c("1:",
"2:", "3:", "4:", "5:"), class = "data.frame")
setDT(x)

`%-%`=function(a,b){
DT=eval(sys.status()$sys.calls[[2]][[2]])
a=substitute(a)
b=substitute(b)
stopifnot(is.name(a),is.name(b),is.data.table(DT))
a=deparse(a)
b=deparse(b)
d=numeric(nrow(DT))
a=if(!exists(a,DT)) d else get(a,DT)
b=if(!exists(b,DT)) d else get(b,DT)
a-b
}
set.seed(5)
x = data.table(a = floor(10 * runif(5)), b = floor(10 * runif(5)), c =floor(10 * runif(5)))
x
a b c
1: 2 7 2
2: 6 5 4
3: 9 8 3
4: 2 9 5
5: 1 1 2
x[,a%-%b]
[1] -5 1 1 -7 0
x[,a%-%f] # f is treated as a column of zeros since it does not exist:
[1] 2 6 9 2 1
Or you can just do:
x[,c("d","e","f"):=.(a%-%b,a%-%h,g%-%h)]
x
a b c d e f
1: 2 7 2 -5 2 0
2: 6 5 4 1 6 0
3: 9 8 3 1 9 0
4: 2 9 5 -7 2 0
5: 1 1 2 0 1 0
This function is written to work on a data.table only. For example:
setDF(x)[,a%-%b]
Error: is.data.table(DT) is not TRUE
setDT(x)[,a%-%b]
[1] -5 1 1 -7 0
EDIT: This answer gives the correct value with regard to the order. (Most of the answers given below do not pass this test)
setDT(x)[,a%-%b]#Column subtract another
[1] -5 1 1 -7 0
setDT(x)[,b%-%a]#Reversing the order
[1] 5 -1 -1 7 0
setDT(x)[,b%-%b]#Column Subtract itself
[1] 0 0 0 0 0
setDT(x)[,a%-%f]#Column subtract a non-existing column
[1] 2 6 9 2 1
setDT(x)[,f%-%a]#a non-existing column subtract an existing column
[1] -2 -6 -9 -2 -1
x[,g%-%f] #subtract two non-existing columns
[1] 0 0 0 0 0

IIUC, you can try it this way. We use the exists function to check whether each column is available in the data.
# helper function
do_sub <- function(df, col1 = 'a', col2='b')
{
ans <- integer()
if (exists(col1, df) && exists(col2, df)){
ans <- append(ans, df[[col1]] - df[[col2]])
} else if (exists(col1, df)){
ans <- append(ans, df[[col1]] - 0)
} else {
ans <- append(ans, 0 - df[[col2]])
}
return (ans)
}
# compute new columns
df[, d := do_sub(.SD, col1 = 'a', col2 = 'b')]
df[, e := do_sub(.SD, col1 = 'a', col2 = 'f')]
print(df)
a b c d e
1: 7 0 8 7 7
2: 3 6 4 -3 3
3: 9 9 8 0 9
4: 3 6 2 -3 3
5: 0 2 3 -2 0
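The helper above does not cover the case where neither column exists. A minimal sketch of a variant that treats every missing column as zeros, so all four combinations work; col_or_zero and sub0 are hypothetical names, and a plain data.frame is assumed here:

```r
# Hypothetical helper: the column if present, otherwise a vector of zeros
col_or_zero <- function(df, col) {
  if (col %in% names(df)) df[[col]] else numeric(nrow(df))
}

# Hypothetical wrapper: subtraction that treats missing columns as zero
sub0 <- function(df, col1, col2) col_or_zero(df, col1) - col_or_zero(df, col2)

df <- data.frame(a = c(7, 3, 9), b = c(0, 6, 9))
sub0(df, "a", "b")  # a - b
sub0(df, "f", "a")  # -a, since f does not exist
sub0(df, "g", "f")  # all zeros, one per row
```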

Related

Counting turns of a factor and adding number to the factor

I have some time course data of three people having a conversation. Among other stuff like pitch and intensity, I have the timing information and who was speaking at this point - Person L, Person R, Person B or no one ("0"). A short example looks like this where t is the time in seconds and s is the speaker information:
> t = 1:10
> s = c("L", "0", "L", "0", "R", "B", "R", "0", "0", "L")
> data.frame(t,s)
t s
1 1 L
2 2 0
3 3 L
4 4 0
5 5 R
6 6 B
7 7 R
8 8 0
9 9 0
10 10 L
I would like to add information about the speech turns to the data. One turn is one person speaking including pauses until someone else starts speaking. In the specific example above the goal is the following:
t s goal
1 1 L L1
2 2 0 L1
3 3 L L1
4 4 0 L1
5 5 R R1
6 6 B B1
7 7 R R2
8 8 0 R2
9 9 0 R2
10 10 L L2
I know how to do this with a for loop, however, my data has 600000 rows so that would be super slow. Does anyone have an idea how one could accomplish something like this?
A lot of functions, but quite straightforward:
replace "0" with NA and fill each NA with the most recent preceding non-NA value.
create a row_number row
group_by s and create groups based on consecutive row_numbers (this is the main function!)
paste s and gp to the desired value.
library(tidyverse)
data.frame(t, s) %>%
mutate(snew = na_if(s, "0"),
rown = row_number()) %>%
fill(snew) %>%
group_by(snew) %>%
mutate(gp = cumsum(c(TRUE, diff(rown) > 1)), .keep = "unused") %>%
ungroup() %>%
mutate(goal = paste0(snew, gp), .keep = "unused")
t s goal
1 1 L L1
2 2 0 L1
3 3 L L1
4 4 0 L1
5 5 R R1
6 6 B B1
7 7 R R2
8 8 0 R2
9 9 0 R2
10 10 L L2
Your key vector of interest:
s <- c("L", "0", "L", "0", "R", "B", "R", "0", "0", "L")
A base R solution:
## fill "0" using vectorized "last observation carried forward"
zero <- which(s == "0")
logi <- c(TRUE, diff(zero) > 1)
s[zero] <- rep(s[zero[logi] - 1], tabulate(cumsum(logi)))
## generate numeric ID
ID <- with(rle(s), rep(ave(values, values, FUN = seq_along), lengths))
## final `paste0`
paste0(s, ID)
#[1] "L1" "L1" "L1" "L1" "R1" "B1" "R2" "R2" "R2" "L2"
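Since the base R solution is compact, here is the same logic unpacked with the intermediate values as comments. This is only a walk-through sketch; the names donors, run_len and turn are introduced for illustration, and it assumes the first element of s is not "0" (otherwise there is nothing to carry forward).

```r
s <- c("L", "0", "L", "0", "R", "B", "R", "0", "0", "L")

# Step 1: carry the last speaker forward over runs of "0"
zero <- which(s == "0")              # 2 4 8 9
logi <- c(TRUE, diff(zero) > 1)      # TRUE at the start of each run of zeros
donors <- s[zero[logi] - 1]          # "L" "L" "R": value just before each run
run_len <- tabulate(cumsum(logi))    # 1 1 2: length of each run of zeros
s[zero] <- rep(donors, run_len)

# Step 2: number the runs separately for each speaker
r <- rle(s)                          # values L R B R L, lengths 4 1 1 3 1
turn <- ave(seq_along(r$values), r$values, FUN = seq_along)  # 1 1 1 2 2
goal <- paste0(s, rep(turn, r$lengths))
```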
Using data.table:
library(data.table)
chnalocf = \(x) x[nafill(replace(seq_along(x), is.na(x), NA), "locf")]
setDT(df)
df[, s2 := chnalocf(replace(s, s == "0", NA))
][, tmp := rleid(s2)
][, goal := paste0(s2, rleid(tmp)), by = s2
][, !c("s2", "tmp")]
# t s goal
# <int> <char> <char>
# 1: 1 L L1
# 2: 2 0 L1
# 3: 3 L L1
# 4: 4 0 L1
# 5: 5 R R1
# 6: 6 B B1
# 7: 7 R R2
# 8: 8 0 R2
# 9: 9 0 R2
# 10: 10 L L2
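The chnalocf helper fills an integer index rather than the character vector itself, presumably because nafill is meant for numeric input (as far as I know). A small sketch of that index trick on its own, with a made-up vector:

```r
library(data.table)

x <- c("L", NA, "L", NA, "R", NA, NA)

# Positions of the non-missing entries, with NA where x is missing
idx <- replace(seq_along(x), is.na(x), NA)   # 1 NA 3 NA 5 NA NA

# Carry the last known position forward, then index back into x
filled <- nafill(idx, type = "locf")          # 1 1 3 3 5 5 5
x[filled]
```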

Dispatch values in list column to separate columns

I have a data.table with a list column "c":
df <- data.table(a = 1:3, c = list(1L, 1:2, 1:3))
df
a c
1: 1 1
2: 2 1,2
3: 3 1,2,3
I want to create separate columns for the values in "c".
I create a set of new columns F_1, F_2, F_3:
mmax <- max(df$a)
flux <- paste("F", 1:mmax, sep = "_")
df[, (flux) := 0]
df
a c F_1 F_2 F_3
1: 1 1 0 0 0
2: 2 1,2 0 0 0
3: 3 1,2,3 0 0 0
I want to dispatch values in "c" to columns F_1, F_2, F_3 like this:
df
a c F_1 F_2 F_3
1: 1 1 1 0 0
2: 2 1,2 1 2 0
3: 3 1,2,3 1 2 3
What I have tried:
comp_vect <- function(vec, mmax){
vec <- vec %>% unlist()
n <- length(vec)
answr <- c(vec, rep(0, l = mmax -n))
}
df[ , ..flux := mapply(comp_vect, c, mmax)]
The expected data.table is :
> df
a c F_1 F_2 F_3
1: 1 1 1 0 0
2: 2 1,2 1 2 0
3: 3 1,2,3 1 2 3
I followed a radically different approach. I rbinded the list column and then dcasted it, obtaining the desired result. Last part is to set the names.
library(data.table)
df <- data.table(a = 1:3, d = list(1L, c(1L, 2L), c(1L, 2L, 3L)))
df2 <- df[, rbind(d), by = a][, dcast(.SD, a ~ V1, fill = 0)]
setnames(df2, 2:4, flux)[]
a F_1 F_2 F_3
1: 1 1 0 0
2: 2 1 2 0
3: 3 1 2 3
where flux is the variable of names that you defined in your question.
Please note that I avoided using the column name c, as it may be confused with the function c().
Solution:
for(idx in seq(max(sapply(df$c, length)))){ # maximum number of values according to all the elements of the list
set(x = df,
i = NULL,
j = paste0("F_",idx), # column's name
value = sapply(df$c, function(x){
if(is.na(x[idx])){
return(0) # 0 instead of NA
} else {
return(x[idx])
}
})
)
}
Explanation:
We can extract the values from a list like this:
sapply(df$c, function(ll) return(ll[1])) # first value
[1] 1 1 1
sapply(df$c, function(ll) return(ll[2])) # second value
[1] NA 2 2
sapply(df$c, function(ll) return(ll[3])) # third value
[1] NA NA 3
We see that if there is no value at a given position, we get an NA.
We need an iterator to extract all values at the position idx. For that, we'll find the number of values in each element of df$c (the list) and keep the maximum.
max(sapply(df$c, length))
[1] 3
If we want zeros instead of NAs, we need to create a function in the sapply to convert them :
vec <- c(NA, 5, 1, NA)
> sapply(vec, function(x) if(is.na(x)) return(0) else return(x))
[1] 0 5 1 0
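The zero-padding idea can also be sketched without an explicit loop, by padding every list element to the same length and stacking the results into a matrix. This is only an illustration in base R; vals and padded are made-up names, and the list mirrors column "c" from the question.

```r
vals <- list(1L, 1:2, 1:3)           # same contents as column "c" in the question
n <- max(lengths(vals))              # longest element, here 3

# Pad each element with zeros to length n; vapply returns one column per element,
# so transpose to get one row per element
padded <- t(vapply(vals, function(v) c(v, rep(0L, n - length(v))), integer(n)))
colnames(padded) <- paste0("F_", seq_len(n))
padded
```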

ifelse function group in group in R

I have data set
ID <- c(1,1,2,2,2,2,3,3,3,3,3,4,4,4)
Eval <- c("A","A","B","B","A","A","A","A","B","B","A","A","A","B")
med <- c("c","d","k","k","h","h","c","d","h","h","h","c","h","k")
df <- data.frame(ID,Eval,med)
> df
ID Eval med
1 1 A c
2 1 A d
3 2 B k
4 2 B k
5 2 A h
6 2 A h
7 3 A c
8 3 A d
9 3 B h
10 3 B h
11 3 A h
12 4 A c
13 4 A h
14 4 B k
I try to create variable x and y, group by ID and Eval. For each ID, if Eval = A, and med = "h" or "k", I set x = 1, other wise x = 0, if Eval = B and med = "h" or "k", I set y = 1, other wise y = 0. My current approach gives the right answer, but I don't like it and it does not seem very elegant:
df <- data.table(df)
setDT(df)[, count := uniqueN(med) , by = .(ID,Eval)]
setDT(df)[Eval == "A", x:= ifelse(count == 1 & med %in% c("k","h"),1,0), by=ID]
setDT(df)[Eval == "B", y:= ifelse(count == 1 & med %in% c("k","h"),1,0), by=ID]
ID Eval med count x y
1: 1 A c 2 0 NA
2: 1 A d 2 0 NA
3: 2 B k 1 NA 1
4: 2 B k 1 NA 1
5: 2 A h 1 1 NA
6: 2 A h 1 1 NA
7: 3 A c 3 0 NA
8: 3 A d 3 0 NA
9: 3 B h 1 NA 1
10: 3 B h 1 NA 1
11: 3 A h 3 0 NA
12: 4 A c 2 0 NA
13: 4 A h 2 0 NA
14: 4 B k 1 NA 1
Then I need to collapse the rows to get one row per unique ID. I don't know how to collapse rows; any ideas?
The output
ID x y
1 0 0
2 1 1
3 0 1
4 0 1
We create the 'x' and 'y' variables grouped by 'ID', avoiding the NA elements by directly coercing the logical vector to binary (as.integer)
df[, x := as.integer(Eval == "A" & count ==1 & med %in% c("h", "k")) , by = ID]
and similarly for 'y'
df[, y := as.integer(Eval == "B" & count ==1 & med %in% c("h", "k")) , by = ID]
and summarise it, using any after grouping by "ID"
df[, lapply(.SD, function(x) as.integer(any(x))) , ID, .SDcols = x:y]
# ID x y
#1: 1 0 0
#2: 2 1 1
#3: 3 0 1
#4: 4 0 1
If we need a compact approach, instead of assigning (:=), we summarise the output grouped by "ID", "Eval" based on the conditions and then, grouped by 'ID', we check if there are any TRUE values in 'x' and 'y' by looping over the columns specified in .SDcols.
setDT(df)[, if(any(uniqueN(med)==1 & med %in% c("h", "k"))) {
.(x= Eval=="A", y= Eval == "B") } else .(x=FALSE, y=FALSE),
by = .(ID, Eval)][, lapply(.SD, any) , by = ID, .SDcols = x:y]
# ID x y
#1: 1 FALSE FALSE
#2: 2 TRUE TRUE
#3: 3 FALSE TRUE
#4: 4 FALSE TRUE
If needed, we can convert to binary similar to the approach shown in the first solution.
The OP's goal...
"I try to create variable x and y, group by ID and Eval. For each ID, if Eval = A, and med = "h" or "k", I set x = 1, other wise x = 0, if Eval = B and med = "h" or "k", I set y = 1, other wise y = 0. [...] Then I need to collapse the row to get unique ID"
can be simplified to...
For each ID and Eval, flag if all med values are h or all med values are k.
setDT(df) # only do this once
df[, all(med=="k") | all(med=="h"), by=.(ID,Eval)][, dcast(.SD, ID ~ Eval, fun=any)]
ID A B
1: 1 FALSE FALSE
2: 2 TRUE TRUE
3: 3 FALSE TRUE
4: 4 FALSE TRUE
To see what dcast is doing, read ?dcast and try running just the first part on its own, df[, all(med=="k") | all(med=="h"), by=.(ID,Eval)].
The change to use x and y instead of A and B is straightforward but ill-advised (since unnecessary renaming can be confusing and lead to extra work when there are new Eval values); and ditto the change for 1/0 instead of TRUE/FALSE (since the values captured are actually boolean).
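For comparison, the simplified rule (for each ID and Eval, flag whether all med values are h or all are k) can be sketched in base R with tapply. This is an illustration rather than a replacement for the data.table code; flag is a made-up name, and the data is copied from the question.

```r
ID   <- c(1,1,2,2,2,2,3,3,3,3,3,4,4,4)
Eval <- c("A","A","B","B","A","A","A","A","B","B","A","A","A","B")
med  <- c("c","d","k","k","h","h","c","d","h","h","h","c","h","k")

# One cell per (ID, Eval): TRUE when every med in the group is "h" or every one is "k"
flag <- tapply(med, list(ID, Eval), function(m) all(m == "h") || all(m == "k"))

# Combinations that never occur (ID 1 has no "B" rows) come back as NA
flag[is.na(flag)] <- FALSE
flag
```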
Here is my dplyr solution since I find it more readable than data.table.
library(dplyr)
df %>%
group_by(ID, Eval) %>%
mutate(
count = length(unique(med)),
x = ifelse(Eval == "A" &
count == 1 & med %in% c("h", "k"), 1, 0),
y = ifelse(Eval == "B" &
count == 1 & med %in% c("h", "k"), 1, 0)
) %>%
group_by(ID) %>%
summarise(x1 = max(unique(x)),
y1 = max(unique(y)))
A one liner solution for collapsing the rows of your result :
df[,lapply(.SD,function(i) {ifelse(1 %in% i,ifelse(!0 %in% i,1,0),0)}),.SDcols=x:y,by=ID]
ID x y
1: 1 0 0
2: 2 1 1
3: 3 0 1
4: 4 0 1

data.table: “group counter” for a specific combination of columns

I would like to add a counter column to a data frame based on sets of identical rows. To do this, I used the data.table package. In my case, rows need to be compared using the combination of columns "z" AND ("x" OR "y").
I tested:
DF[ , Index := .GRP, by = c("x","y","z") ]
but the result is the combination of "z" AND "x" AND "y".
How can I have the combination of "z" AND ("x" OR "y") ?
Here is a data example:
DF = data.frame(x=c("a","a","a","b","c","d","e","f","f"), y=c(1,3,2,8,8,4,4,6,0), z=c("M","M","M","F","F","M","M","F","F"))
DF <- data.table(DF)
I would like to have this output:
> DF
x y z Index
1: a 1 M 1
2: a 3 M 1
3: a 2 M 1
4: b 8 F 2
5: c 8 F 2
6: d 4 M 3
7: e 4 M 3
8: f 6 F 4
9: f 0 F 4
The new group starts if the value for z is changing or the values both for x and y are changing.
Try this example.
require(data.table)
DF <- data.table(x = c("a","a","a","b","c","d","e","f","f"),
y = c(1,3,2,8,8,4,4,6,0),
z=c("M","M","M","F","F","M","M","F","F"))
# The functions to compare if value is not equal with the previous value
is.not.eq.with.lag <- function(x) c(T, tail(x, -1) != head(x, -1))
DF[, x1 := is.not.eq.with.lag(x)]
DF[, y1 := is.not.eq.with.lag(y)]
DF[, z1 := is.not.eq.with.lag(z)]
DF
DF[, Index := cumsum(z1 | (x1 & y1))]
DF
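The same rule can be checked on plain vectors in base R, which makes the resulting index easy to inspect. A minimal sketch using the example data from the question; changed is a hypothetical helper equivalent to the lag comparison above.

```r
x <- c("a","a","a","b","c","d","e","f","f")
y <- c(1,3,2,8,8,4,4,6,0)
z <- c("M","M","M","F","F","M","M","F","F")

# TRUE where a value differs from the one in the previous row (first row is TRUE)
changed <- function(v) c(TRUE, v[-1] != v[-length(v)])

# A new group starts when z changes, or when x and y both change
Index <- cumsum(changed(z) | (changed(x) & changed(y)))
Index
# [1] 1 1 1 2 2 3 3 4 4
```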
I know a lot of people warn against for loops in R, but in this instance I think it is a very direct way of approaching the problem. Plus, the result isn't growing in size, so performance isn't a large concern. The for loop approach would be:
dt$grp <- rep(NA,nrow(dt))
for (i in 1:nrow(dt)){
if (i == 1){
dt$grp[i] = 1
}
else {
if(dt$z[i-1] == dt$z[i] & (dt$x[i-1] == dt$x[i] | dt$y[i-1] == dt$y[i])){
dt$grp[i] = dt$grp[i-1]
}else{
dt$grp[i] = dt$grp[i-1] + 1
}
}
}
Trying this on OPs original problem, the result is:
DF = data.frame(x=c("a","a","a","b","c","d","e","f","f"), y=c(1,3,2,8,8,4,4,6,0), z=c("M","M","M","F","F","M","M","F","F"))
dt <- data.table(DF)
dt$grp <- rep(NA,nrow(dt))
for (i in 1:nrow(dt)){
if (i == 1){
dt$grp[i] = 1
}
else {
if(dt$z[i-1] == dt$z[i] & (dt$x[i-1] == dt$x[i] | dt$y[i-1] == dt$y[i])){
dt$grp[i] = dt$grp[i-1]
}else{
dt$grp[i] = dt$grp[i-1] + 1
}
}
}
dt
x y z grp
1: a 1 M 1
2: a 3 M 1
3: a 2 M 1
4: b 8 F 2
5: c 8 F 2
6: d 4 M 3
7: e 4 M 3
8: f 6 F 4
9: f 0 F 4
Trying this on the data.table in @Frank's comment gives the expected result as well:
dt<-data.table(x = c("b", "a", "a"), y = c(1, 1, 2), z = c("F", "F", "F"))
dt$grp <- rep(NA,nrow(dt))
for (i in 1:nrow(dt)){
if (i == 1){
dt$grp[i] = 1
}
else {
if(dt$z[i-1] == dt$z[i] & (dt$x[i-1] == dt$x[i] | dt$y[i-1] == dt$y[i])){
dt$grp[i] = dt$grp[i-1]
}else{
dt$grp[i] = dt$grp[i-1] + 1
}
}
}
dt
x y z grp
1: b 1 F 1
2: a 1 F 1
3: a 2 F 1
EDITED TO ADD: This solution is in some ways a more verbose version of the one advocated by djhurio above. I think it shows what is happening a bit more so I'll leave it.
I think this is a task easier to do if it is broken down a little bit. The below code creates TWO indices at first, one for changes in x (nested in z) and one for changes in y (nested in z). We then find the first row from each of these indices. Taking the cumulative sum of the case where both FIRST.x and FIRST.y is true should give your desired index.
library(data.table)
dt_example <- data.table(x = c("a","a","a","b","c","d","e","f","f"),
y = c(1,3,2,8,8,4,4,6,0),
z = c("M","M","M","F","F","M","M","F","F"))
dt_example[,Index_x := .GRP,by = c("z","x")]
dt_example[,Index_y := .GRP,by = c("z","y")]
dt_example[,FIRST.x := !duplicated(Index_x)]
dt_example[,FIRST.y := !duplicated(Index_y)]
dt_example[,Index := cumsum(FIRST.x & FIRST.y)]
dt_example
x y z Index_x Index_y FIRST.x FIRST.y Index
1: a 1 M 1 1 TRUE TRUE 1
2: a 3 M 1 2 FALSE TRUE 1
3: a 2 M 1 3 FALSE TRUE 1
4: b 8 F 2 4 TRUE TRUE 2
5: c 8 F 3 4 TRUE FALSE 2
6: d 4 M 4 5 TRUE TRUE 3
7: e 4 M 5 5 TRUE FALSE 3
8: f 6 F 6 6 TRUE TRUE 4
9: f 0 F 6 7 FALSE TRUE 4
This approach looks for changes in x & z | y & z. The extra columns are left in the data.table to show the calculations.
DF[, c("Ix", "Iy", "Iz", "dx", "dy", "min.change", "Index") :=
#Create index of values based on consecutive order
list(ix <- rleid(x), iy <- rleid(y), iz <- rleid(z),
#Determine if combinations of x+z OR y+z change
ix1 <- c(0, diff(rleid(ix+iz))),
iy1 <- c(0, diff(rleid(iy+iz))),
#Either combination is constant (no change)?
change <- pmin(ix1, iy1),
#New index based on change
cumsum(change) + 1
)]
x y z Ix Iy Iz dx dy min.change Index
1: a 1 M 1 1 1 0 0 0 1
2: a 3 M 1 2 1 0 1 0 1
3: a 2 M 1 3 1 0 1 0 1
4: b 8 F 2 4 2 1 1 1 2
5: c 8 F 3 4 2 1 0 0 2
6: d 4 M 4 5 3 1 1 1 3
7: e 4 M 5 5 3 1 0 0 3
8: f 6 F 6 6 4 1 1 1 4
9: f 0 F 6 7 4 0 1 0 4

converting multiple rows into a single row based on specific conditions

Can you please suggest how to implement the following in R.
I have a table as given below.
ID object value
1 a 3
2 a 2
3 b 3
4 a 1
5 a 2
6 b 2
7 a 1
8 b 1
I would like to get the following table
ID object values
1 a 3, 2, 1
2 a 2, 1
4 a 1
5 a 2, 1
7 a 1
3 b 3, 2, 1
6 b 2, 1
8 b 1
In other words, for each object each row value is appended with the next observed values till the value reaches 1.
Thanks a lot for helping.
Bikas
It is not altogether clear whether
the data will always be ordered decreasing by value
and whether the values should be output in decreasing order
In any event, I would use the data.table library. Assuming your table is a data.frame, df, I would do the following:
library(data.table)
setDT(df)
df[value >= 1][order(value, decreasing = TRUE), .(values = paste(value, collapse = ", ")), by = .(ID, value)]
What this is doing is:
initializing your data.frame as a data.table
using only rows with value >= 1
ordering the data
grouping by ID and value
pasting value together
Using a modified dataset with 2nd row value as 4
res <- unsplit(lapply(split(df, df$object), function(x) {
x$value <- sapply(seq_len(nrow(x)), function(i) {
i1 <- i:nrow(x)
indx <- which(x$value[i1]==1)[1]
paste(x$value[i1[seq(indx)]], collapse=",")
})
x}),
df$object)
res[order(res$object),]
# ID object value
#1 1 a 3,4,1
#2 2 a 4,1
#4 4 a 1
#5 5 a 2,1
#7 7 a 1
#3 3 b 3,2,1
#6 6 b 2,1
#8 8 b 1
Also, using data.table
library(data.table)
setDT(df)[, N:=1:.N, by=object][,
values:=unlist(lapply(N, function(i) {
val <- value[i:.N]
paste(val[1:which(val==1)[1]], collapse=",")
})), keyby=object][,-(3:4), with=FALSE]
# ID object values
#1: 1 a 3,4,1
#2: 2 a 4,1
#3: 4 a 1
#4: 5 a 2,1
#5: 7 a 1
#6: 3 b 3,2,1
#7: 6 b 2,1
#8: 8 b 1
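The core step in both versions is "take the values from row i up to and including the next 1". A small base R sketch of just that step on a single vector; upto_first_one is a hypothetical helper, and it assumes a 1 occurs at or after every starting position.

```r
value <- c(3, 4, 1, 2, 1)  # values of object "a" in the modified data

# Hypothetical helper: values from position i up to and including the next 1.
# Assumes a 1 occurs at or after position i; otherwise which() finds nothing.
upto_first_one <- function(v, i) {
  rest <- v[i:length(v)]
  stop_at <- which(rest == 1)[1]
  paste(rest[seq_len(stop_at)], collapse = ",")
}

out <- vapply(seq_along(value), function(i) upto_first_one(value, i), character(1))
out
# [1] "3,4,1" "4,1"   "1"     "2,1"   "1"
```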
Update
If you need the sequence up to the minimum value, you could replace which(x$value[i1]==1)[1] with which(x$value[i1]==min(x$value))[1]. For example, using the first code as a function.
f1 <- function(dat){
lst <- split(dat, dat$object)
lst2 <- lapply(lst, function(x) {
x$value <- sapply(seq_len(nrow(x)), function(i) {
i1 <- i:nrow(x)
indx <- which(x$value[i1]== min(x$value))[1]
paste(x$value[i1[seq(indx)]], collapse=",")
})
x})
res <- unsplit(lst2, dat$object)
res[order(res$object),]
}
f1(df)
# ID object value
#1 1 a 3,4,1
#2 2 a 4,1
#4 4 a 1
#5 5 a 2,1
#7 7 a 1
#3 3 b 3,2,1
#6 6 b 2,1
#8 8 b 1
If I change all the 1 values to 2
df$value[df$value==1] <- 2
f1(df)
# ID object value
#1 1 a 3,4,2
#2 2 a 4,2
#4 4 a 2
#5 5 a 2
#7 7 a 2
#3 3 b 3,2
#6 6 b 2
#8 8 b 2
data
df <- structure(list(ID = 1:8, object = c("a", "a", "b", "a", "a",
"b", "a", "b"), value = c(3L, 4L, 3L, 1L, 2L, 2L, 1L, 1L)), .Names = c("ID",
"object", "value"), class = "data.frame", row.names = c(NA, -8L
))
