R data.table: selecting the previous row within group blocks

I have the following example data frame.
id value
a 3
a 4
a 8
b 9
b 8
I want to convert it so that I can calculate differences in the column "value" between successive rows. The expected result is:
id value prevValue
a 3 0
a 4 3
a 8 4
b 9 0
b 8 9
Notice that within each group I want prevValue to start with 0, with each subsequent value taken from the previous row. I tried the following:
x = x[,list(
prevValue = c(0,value[1:(.N-1)])
),by=id]
but no luck.
Thanks in advance.

Use negative indexing, something like:
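library(data.table)  # x must be a data.table, e.g. converted with setDT(x)
x[, prevValue := c(0, value[-.N]), by = id]  # lag within each id, 0 for the first row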
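A quick base R check of why value[-.N] is safer than value[1:(.N-1)]: for a group with a single row, 1:(.N-1) becomes 1:0, i.e. c(1, 0), and R silently drops the 0 index, so the "all but the last" slice returns one element instead of none. In the sketch below, v and n are illustrative stand-ins for a one-row group's value and .N.

```r
v <- 5           # stand-in for `value` in a group that has one row
n <- length(v)   # stand-in for .N, here 1

v[1:(n - 1)]     # 1:0 is c(1, 0); the 0 index is dropped, so this returns 5
v[-n]            # negative index drops the last element: numeric(0)
c(0, v[-n])      # 0, exactly one entry for a one-row group
```

With value[1:(.N-1)], such a group would yield two entries (0 and the value) for one row; the negative index always yields exactly .N - 1 elements.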

Without data.table:
with(dat,ave(value,id,FUN=function(x) c(0,head(x,-1))))
[1] 0 3 4 0 9
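data.table also provides a lag helper, shift(). A minimal sketch, assuming the example data is rebuilt as a data.table named x: type = "lag" moves value down one row within each id group, and fill = 0 supplies the leading 0 instead of the default NA.

```r
library(data.table)

# The example data from the question, rebuilt as a data.table
x <- data.table(id = c("a", "a", "a", "b", "b"), value = c(3, 4, 8, 9, 8))

# Lag `value` by one row inside each id; the first row of each group gets 0
x[, prevValue := shift(value, n = 1L, fill = 0, type = "lag"), by = id]
x
```

The within-group difference the question is after is then value - prevValue; note that the first row of each group then equals value itself, because its previous value was filled with 0.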

Related

For each row, return the column indices of a specific number

Hi, suppose I have a matrix containing only 0s and 1s, and I want to find where the 1s are in each row. Each row can contain more than one 1.
For example, I have:
set.seed(444)
m3 <- matrix(round(runif(8*8)), 8,8)
For the first row, columns 2, 3 and 8 are 1, and I want code that reports either the column names or the column indices. Note that the number of 1s can differ from row to row.
Any suggestions would be much appreciated.
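We can use which() with arr.ind = TRUE, which returns the row and column index of every match as a two-column matrix; ordering the column indices by row then lists them row by row: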
out <- which(m3 ==1, arr.ind = TRUE)
out[,2][order(out[,1])]
[1] 2 3 8 3 5 3 4 8 7 4 6 7 1 3 4 6 1 4 5 6 7 2 4 7 8
To get the column names instead, use the same index on colnames() (this requires the matrix to have column names; m3 here has none, so the result is NULL):
colnames(m3)[out[,2][order(out[,1])]]
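The answer above returns one flat vector, so the row boundaries are lost. Since each row can hold a different number of 1s, a list with one element per row may be easier to work with. A minimal sketch (cols_per_row is an illustrative name):

```r
set.seed(444)
m3 <- matrix(round(runif(8 * 8)), 8, 8)

# One integer vector of column indices per row; rows may differ in length,
# and a row without any 1 gives integer(0) instead of being dropped
cols_per_row <- lapply(seq_len(nrow(m3)), function(i) which(m3[i, ] == 1))
cols_per_row[[1]]
```

unlist(cols_per_row) gives back the same flat vector as the which(..., arr.ind = TRUE) approach.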

Use if-else function on data frame with multiple values

I have a data frame that contains multiple values in each spot, like this:
library(plyr)  # provides ddply() and summarize()
ID<-c(1,1,1,2,2,2,2,3,3,4,4,4,5,6,6)
W<-c(29,72,32,33,34,44,42,78,32,42,18,26,10,34,39)
df1<-data.frame(ID, W)
df<-ddply(df1, .(ID), summarize,
X=paste(unique(W),collapse=","))
ID X
1 1 29,72,32
2 2 33,34,44,42
3 3 78,32
4 4 42,18,26
5 5 10
6 6 34,39
I am trying to generate another column using an if-else function so that every ID that has an X value greater than 70 will show a 1, and all others will show a 0, like this:
ID X Y
1 1 29,72,32 1
2 2 33,34,44,42 0
3 3 78,32 1
4 4 42,18,26 0
5 5 10 0
6 6 34,39 0
This is the code that I tried:
df$Y <- ifelse(df$X>=70, 1, 0)
But it doesn't work; it only seems to put the first value of each spot through the function:
ID X Y
1 1 29,72,32 0
2 2 33,34,44,42 0
3 3 78,32 1
4 4 42,18,26 0
5 5 10 0
6 6 34,39 0
It worked fine on my one column that has only one value per spot. Is there a way to get the ifelse function to evaluate every value in each spot and assign a 1 if any of them meets the condition?
Thank you, I'm sorry that I do not know a lot of R vocabulary yet.
Because X is a string, we can split it at each "," to create a list of vectors, loop over the list with map_int(), and check whether any of the values, once converted to numeric, is greater than 70:
library(dplyr)
library(purrr)
df %>%
mutate(Y = map_int(strsplit(X, ","), ~ +(any(as.numeric(.x) > 70))))
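The same idea works in base R without dplyr or purrr. A minimal sketch, rebuilding df from the table shown in the question (has_big is an illustrative name):

```r
df <- data.frame(
  ID = 1:6,
  X = c("29,72,32", "33,34,44,42", "78,32", "42,18,26", "10", "34,39"),
  stringsAsFactors = FALSE
)

# Split each string at the commas, convert the pieces to numbers, and test
# whether any of them exceeds 70; vapply() enforces one logical per row
has_big <- vapply(strsplit(df$X, ","), function(v) any(as.numeric(v) > 70), logical(1))
df$Y <- as.integer(has_big)
df
```

If df1 is still available, the flag could also be computed from the numeric W column before the values are pasted together, which avoids re-parsing the strings.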

How do I identify the first zero in a group of ordered columns?

I'm trying to format a dataset for use in some survival analysis models. Each row is a school, and the time-varying columns are the total number of students enrolled in the school that year. Say the data frame looks like this (there are time-invariant columns as well):
Name total.89 total.90 total.91 total.92
a 8 6 4 0
b 1 2 4 9
c 7 9 0 0
d 2 0 0 0
I'd like to create a new column indicating when the school "died," i.e., the first column in which a zero appears. Ultimately I'd like this column to be "years since 1989", and I can rename columns accordingly.
More generally: for a series of time-ordered columns, how do I identify the first column in which a given value occurs?
Here's a base R approach to get a column with the first zero (x = 0) or NA if there isn't one:
data$died <- apply(data[, -1], 1, match, x = 0)
data
# Name total.89 total.90 total.91 total.92 died
# 1 a 8 6 4 0 4
# 2 b 1 2 4 9 NA
# 3 c 7 9 0 0 3
# 4 d 2 0 0 0 2
Here is an option using max.col with rowSums
df1$died <- max.col(!df1[-1], "first") * NA^!rowSums(!df1[-1])
df1$died
#[1] 4 NA 3 2
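To cover the general question (any target value) and the "years since 1989" goal, the match() approach can be wrapped in a small helper and the column position translated back into a year. A sketch under two assumptions: the time-varying columns are named total.89, total.90, ... in chronological order, and first_col_with() and died_since_1989 are hypothetical names chosen for illustration.

```r
# Hypothetical helper: position of the first column equal to `value`, NA if none
first_col_with <- function(d, value) {
  unname(apply(as.matrix(d), 1, function(row) match(value, row)))
}

data <- data.frame(
  Name = c("a", "b", "c", "d"),
  total.89 = c(8, 1, 7, 2),
  total.90 = c(6, 2, 9, 0),
  total.91 = c(4, 4, 0, 0),
  total.92 = c(0, 9, 0, 0)
)

# Select the time-varying columns by name so other columns are ignored
time_cols <- grep("^total\\.", names(data))
pos <- first_col_with(data[time_cols], 0)

# Read the year suffix (89, 90, ...) of the matching column; NA stays NA
year <- as.numeric(sub("^total\\.", "", names(data)[time_cols][pos]))
data$died_since_1989 <- year - 89
data
```

Selecting the columns by a name pattern, rather than with data[, -1], keeps any time-invariant columns out of the search.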

Sequential counting with input from more than one variable in R

I want to create a column of sequential values that depends on two other columns in the data frame: the count should increase by one whenever Team changes (between 1 and 2) or Event is "x". Any help would be appreciated! See the example below:
Team Event Value
1 1 a 1
2 1 a 1
3 2 a 2
4 2 x 3
5 2 a 3
6 1 a 4
7 1 x 5
8 1 a 5
9 2 x 6
10 2 a 6
This will do it...
df$Value <- cumsum(df$Event=="x" | c(1, diff(df$Team))!=0)
It takes the cumulative sum of a logical vector (each TRUE counts as 1) that is TRUE wherever Event == "x" or Team differs from the previous row. A 1 is prepended to the diff() result so it has the same length as the original column; this also makes the count start at 1 on the first row.
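diff() only works when Team is numeric. A variant that compares each Team value with the one in the row above, so it also works for character or factor teams, is sketched below; the data frame is rebuilt from the example table and team_changed is an illustrative name.

```r
df <- data.frame(
  Team  = c(1, 1, 2, 2, 2, 1, 1, 1, 2, 2),
  Event = c("a", "a", "a", "x", "a", "a", "x", "a", "x", "a"),
  stringsAsFactors = FALSE
)

# TRUE for the first row, then TRUE wherever Team differs from the row above
team_changed <- c(TRUE, df$Team[-1] != df$Team[-nrow(df)])

# Each TRUE (a team change or an "x" event) bumps the running count by one
df$Value <- cumsum(team_changed | df$Event == "x")
df
```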

R - Select rows where at least X columns match a condition

I am trying to select the rows where at least 4 of the columns have the same value. So far I have tried apply(), and I can get the rows where any or every column matches a given value:
team.composition[apply(team.composition, 1, function(X) any(as.numeric(X) == 1)),]
This is an example of my table
member.1 member.2 member.3 member.4 member.5
1 3 8 5 3
2 3 2 2 2
7 4 8 8 3
1 8 8 8 8
What I would like is to return the second row (2,3,2,2,2) and the fourth row (1,8,8,8,8).
Any suggestions? Thanks
Try
df1[apply(df1, 1,function(x) any(table(x)>=4)),]
Or
library(reshape2)
df1[!!rowSums(table(melt(as.matrix(df1))[-2])>=4),]
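Both answers build a frequency table per row. A sketch of a different technique that avoids table() and loops over columns instead of rows: for each column j, count how many columns in the same row equal column j, and keep rows whose largest count reaches the threshold. rows_with_n_equal() is a hypothetical helper, and the sketch assumes the data contain no NA values (a comparison with NA would propagate NA into the counts).

```r
# Hypothetical helper: keep rows in which some value appears in at least n columns
rows_with_n_equal <- function(d, n) {
  m <- as.matrix(d)
  # For each column j: per row, the number of columns equal to m[, j]
  counts <- lapply(seq_len(ncol(m)), function(j) rowSums(m == m[, j]))
  # Row-wise maximum = frequency of the most common value in that row
  most_common <- Reduce(pmax, counts)
  d[most_common >= n, , drop = FALSE]
}

# The example table from the question
df1 <- data.frame(
  member.1 = c(1, 2, 7, 1),
  member.2 = c(3, 3, 4, 8),
  member.3 = c(8, 2, 8, 8),
  member.4 = c(5, 2, 8, 8),
  member.5 = c(3, 2, 3, 8)
)
rows_with_n_equal(df1, 4)
```

The work grows with the number of columns squared, but each step is vectorised over all rows at once, which tends to suit tables with many rows and few columns.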
