Subsequent row summing in dataframe object - r

I would like to do subsequent row summing of a columnvalue and put the result into a new columnvariable without deleting any row by another columnvalue .
Below is some R-code and an example that does the trick and hopefully illustrates my question. I was wondering if there is a more elegant way to do since the for loop will be time consuming in my actual object.
Thanks for any feedback.
As an example dataframe:
MyDf <- data.frame(ID = c(1,1,1,2,2,2), Y = 1:6)
MyDf$FIRST <- c(1,0,0,1,0,0)
MyDf.2 <- MyDf
MyDf.2$Y2 <- c(1,3,6,4,9,15)
The purpose of this is so that I can write code that calculates Y2 in MyDf.2 above for each ID, separately.
This is what I came up with and, it does the trick. (Calculating a TEST column in MyDf that has to be equal to Y2 cin MyDf.2)
MyDf$TEST <- NA
for(i in 1:length(MyDf$Y)){
MyDf[i,]$TEST <- ifelse(MyDf[i,]$FIRST == 1, MyDf[i,]$Y,MyDf[i,]$Y + MyDf[i-1,]$TEST)
}
MyDf
ID Y FIRST TEST
1 1 1 1 1
2 1 2 0 3
3 1 3 0 6
4 2 4 1 4
5 2 5 0 9
6 2 6 0 15
MyDf.2
ID Y FIRST Y2
1 1 1 1 1
2 1 2 0 3
3 1 3 0 6
4 2 4 1 4
5 2 5 0 9
6 2 6 0 15

You need ave and cumsum to get the column you want. transform is just to modify your existing data.frame.
> MyDf <- transform(MyDf, TEST=ave(Y, ID, FUN=cumsum))
ID Y FIRST TEST
1 1 1 1 1
2 1 2 0 3
3 1 3 0 6
4 2 4 1 4
5 2 5 0 9
6 2 6 0 15

Related

Putting back a missing column from a data.frame into a list of dta.frames

My LIST of data.frames below is made from my data. However, this LIST is missing the scale column which is available in the original data.
I was wondering how to put back the missing scale column into LIST to achive my DESIRED_LIST?
Reproducible data and code are below.
m3="
scale study outcome time ES bar
2 1 1 0 1 8
2 1 2 0 2 7
1 2 1 0 3 6
1 2 1 1 4 5
2 3 1 0 5 4
2 3 1 1 6 3
1 4 1 0 7 2
1 4 2 0 8 1"
data <- read.table(text = m3, h=T)
LIST <- list(data.frame(study=c(3,3) ,outcome=c(1,1) ,time=0:1),
data.frame(study=c(1,1) ,outcome=c(1,2) ,time=c(0,0)),
data.frame(study=c(2,2,4,4),outcome=c(1,1,1,2),time=c(0,1,0,0)))
DESIRED_LIST <- list(data.frame(scale=c(2,2) ,study=c(3,3) ,outcome=c(1,1) ,time=0:1),
data.frame(scale=c(2,2) ,study=c(1,1) ,outcome=c(1,2) ,time=c(0,0)),
data.frame(scale=c(1,1,1,1),study=c(2,2,4,4),outcome=c(1,1,1,2),time=c(0,1,0,0)))
In base R, you could do:
lapply(LITS, \(x)merge(x, data)[names(data)])

Add group to data frame using monotonically increasing numbers

I've got a data frame that looks like this (the real data is much larger and more complicated):
df.test = data.frame(
sample = c("a","a","a","a","a","a","b","b"),
day = c(0,1,2,0,1,3,0,2),
value = rnorm(8)
)
sample day value
1 a 0 -1.11182146
2 a 1 0.65679637
3 a 2 0.03652325
4 a 0 -0.95351736
5 a 1 0.16094840
6 a 3 0.06829702
7 b 0 0.33705141
8 b 2 0.24579603
The data frame is organized by experiments but the experiment ids are missed. The same sample can be used in different experiment, but I know that in a single experiment the days start from 0 and are monotonically increasing.
How can I add the experiment ids that can be a numbers {1, 2, ...}?
So the resulted data frame will be
sample day value exp
1 a 0 -1.11182146 1
2 a 1 0.65679637 1
3 a 2 0.03652325 1
4 a 0 -0.95351736 2
5 a 1 0.16094840 2
6 a 3 0.06829702 2
7 b 0 0.33705141 3
8 b 2 0.24579603 3
I would appreciate any help, especially with a tidy/dplyr solution.
As indicated in the comments, you can do this with cumsum:
df.test %>% mutate(exp = cumsum(day == 0))
## sample day value exp
## 1 a 0 0.09300394 1
## 2 a 1 0.85322925 1
## 3 a 2 -0.25167313 1
## 4 a 0 -0.14811243 2
## 5 a 1 -1.86789014 2
## 6 a 3 0.45983987 2
## 7 b 0 2.81199150 3
## 8 b 2 0.31951634 3
You can use diff :
library(dplyr)
df.test %>% mutate(exp = cumsum(c(TRUE, diff(day) < 0)))
# sample day value exp
#1 a 0 -0.3382010 1
#2 a 1 2.2241041 1
#3 a 2 2.2202612 1
#4 a 0 1.0359635 2
#5 a 1 0.4134727 2
#6 a 3 1.0144439 2
#7 b 0 -0.1292119 3
#8 b 2 -0.1191505 3

detect missings (NA or 0) in data frame

i want to create a new variable in a data frame that contains information about the other variables.
I have got a large data frame. To keep it short let's say:
a <- c(1,0,2,3)
b <- c(3,0,1,1)
c <- c(2,0,2,2)
d <- c(4,1,1,1)
(df <- data.frame(a,b,c,d) )
a b c d
1 1 3 2 4
2 0 0 0 1
3 2 1 2 1
4 3 1 2 1
Aim: Create a new variable that informs me if one person (row) has cero reports (or missings / NA) either in the variables a+b or in the variables c+d.
a b c d x
1 1 3 2 4 1
2 0 0 0 1 NA
3 2 1 2 1 1
4 3 1 2 1 1
As i have a large data frame i was thinking about the use of df[1:2] and df[3:4] so that i do not need to type every variable name. But i am not sure which is the best way to implement it. Maybe dplyr has a nice option?
df$x <- ifelse(rowSums(df), 1, NA)
EDIT: Answer to the updated question:
df$x <- ifelse(rowSums(df[1:2])&rowSums(df[3:4]), 1, NA)
gives,
a b c d x
1 1 3 2 4 1
2 0 0 0 1 NA
3 2 1 2 1 1
4 3 1 2 1 1

subtract first value from each subset of dataframe

I want to subtract the smallest value in each subset of a data frame from each value in that subset i.e.
A <- c(1,3,5,6,4,5,6,7,10)
B <- rep(1:4, length.out=length(A))
df <- data.frame(A, B)
df <- df[order(B),]
Subtracting would give me:
A B
1 0 1
2 3 1
3 9 1
4 0 2
5 2 2
6 0 3
7 1 3
8 0 4
9 1 4
I think the output you show is not correct. In any case, from what you explain, I think this is what you want. This uses ave base function:
within(df, { A <- ave(A, B, FUN=function(x) x-min(x))})
A B
1 0 1
5 3 1
9 9 1
2 0 2
6 2 2
3 0 3
7 1 3
4 0 4
8 1 4
Of course there are other alternatives such as plyr and data.table.
Echoing Arun's comment above, I think your expected output might be off. In any event, you should be able to use can use tapply to calculate subsets and then use match to line those subsets up with the original values:
subs <- tapply(df$A, df$B, min)
df$A <- df$A - subs[match(df$B, names(subs))]
df
A B
1 0 1
5 3 1
9 9 1
2 0 2
6 2 2
3 0 3
7 1 3
4 0 4
8 1 4

In R, how can I make a running count of runs?

Suppose I have an R dataframe that looks like this, where end.group signifies the end of a unique group of observations:
x <- data.frame(end.group=c(0,0,1,0,0,1,1,0,0,0,1,1,1,0,1))
I want to return the following, where group.count is a running count of the number of observations in a group, and group is a unique identifier for each group, in number order. Can anyone help me with a piece of R code to do this?
end.group group.count group
0 1 1
0 2 1
1 3 1
0 1 2
0 2 2
1 3 2
1 1 3
0 1 4
0 2 4
0 3 4
1 4 4
1 1 5
1 1 6
0 1 7
1 2 7
You can create group by using cumsum and rev. You need rev because you have the end points of the groups.
x <- data.frame(end.group=c(0,0,1,0,0,1,1,0,0,0,1,1,1,0,1))
# create groups
x$group <- rev(cumsum(rev(x$end.group)))
# re-number groups from smallest to largest
x$group <- abs(x$group-max(x$group)-1)
Now you can use ave to create group.count.
x$group.count <- ave(x$end.group, x$group, FUN=seq_along)
x <- data.frame(end.group=c(0,0,1,0,0,1,1,0,0,0,1,1,1,0,1))
ends <- which(as.logical(x$end.group))
ends2 <- c(ends[1],diff(ends))
transform(x, group.count=unlist(sapply(ends2,seq)), group=rep(seq(length(ends)),times=ends2))
end.group group.count group
1 0 1 1
2 0 2 1
3 1 3 1
4 0 1 2
5 0 2 2
6 1 3 2
7 1 1 3
8 0 1 4
9 0 2 4
10 0 3 4
11 1 4 4
12 1 1 5
13 1 1 6
14 0 1 7
15 1 2 7

Resources