Calculation in Dataframe keeping first row as reference - r

The first row of my data frame holds reference values that should be added to each of the rows below it, for any number of columns.
Data
A B C D
3 5 1 2
1 4 5 3
2 2 2 4
3 1 3 1
4 3 1 2
The calculation is as follows:
In column A the reference value 3 is added to 1, 2, 3 and 4; in column B the reference value 5 is added to 4, 2, 1 and 3; in column C the reference value 1 is added to 5, 2, 3 and 1; and so on for all n columns.
1 + 3 4 + 5 5 + 1 3 + 2
2 + 3 2 + 5 2 + 1 4 + 2
3 + 3 1 + 5 3 + 1 1 + 2
4 + 3 3 + 5 1 + 1 2 + 2
Expected output:
A B C D
4 9 6 5
5 7 3 6
6 6 4 3
7 8 2 4
Please help. Thanks.

Maybe just this:
c(mydf[1, ]) + mydf[-1, ]
## A B C D
## 2 4 9 6 5
## 3 5 7 3 6
## 4 6 6 4 3
## 5 7 8 2 4
Starting data.frame:
mydf <- structure(list(A = c(3L, 1L, 2L, 3L, 4L), B = c(5L, 4L, 2L, 1L,
3L), C = c(1L, 5L, 2L, 3L, 1L), D = c(2L, 3L, 4L, 1L, 2L)), .Names = c("A",
"B", "C", "D"), row.names = c(NA, 5L), class = "data.frame")

We can do
(df1[1,][col(df1)] + df1)[-1,]
# A B C D
#2 4 9 6 5
#3 5 7 3 6
#4 6 6 4 3
#5 7 8 2 4

If you are trying to replace the values in your initial dataframe with the new values, you could do the following:
df <- data.frame(c(3,1,2,3,4), c(5,4,2,1,3), c(1,5,2,3,1), c(2,3,4,1,2))
names(df) <- c("A", "B", "C", "D")
for (i in 2:nrow(df)) {
  for (j in 1:ncol(df)) {
    df[i, j] <- df[1, j] + df[i, j]
  }
}
This could probably be vectorized to run faster, which would help with a very large dataframe, but the loop works as a quick and dirty solution.
Output:
A B C D
1 3 5 1 2
2 4 9 6 5
3 5 7 3 6
4 6 6 4 3
5 7 8 2 4
Hope this is helpful!
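As a follow-up to the remark about vectorizing, here is a minimal sketch using base R's sweep(), assuming all columns are numeric. The names ref and result are only illustrative, and unlike the loop it returns a new data frame without the reference row instead of overwriting df.

```r
# Same data as in the question
df <- data.frame(A = c(3, 1, 2, 3, 4),
                 B = c(5, 4, 2, 1, 3),
                 C = c(1, 5, 2, 3, 1),
                 D = c(2, 3, 4, 1, 2))

# Reference values: the first row as a plain named numeric vector
ref <- unlist(df[1, ])

# MARGIN = 2 pairs ref[j] with column j, so "+" adds each reference
# value to every remaining row of its own column
result <- sweep(df[-1, ], MARGIN = 2, STATS = ref, FUN = "+")
result
##   A B C D
## 2 4 9 6 5
## 3 5 7 3 6
## 4 6 6 4 3
## 5 7 8 2 4
```

This avoids the element-by-element assignment, which is usually the slow part of the double loop on a large data frame.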

Related

taking sum of rows in R based on conditions

I have a data in this format
ColA ColB ColC
A 2 1
A 1 1
B 3 2
B 5 2
C 2 3
C 5 3
A 1 1
A 3 1
B 7 2
B 1 2
I want to get a new column with the sum of the rows of ColB, something like this:
ColA ColB ColC ColD
A 2 1 3
A 1 1 3
B 3 2 8
B 5 2 8
C 2 3 7
C 5 3 7
A 1 1 4
A 3 1 4
B 7 2 8
B 1 2 8
Thanks much for your help!
I tried
df$ColD <- with(df, sum(ColB[ColC == 1]))
It seems to me that you want ColD to have the sum of ColB for each consecutive group defined by the values in ColA. In which case, we may do:
library(dplyr)
df %>%
  mutate(group = data.table::rleid(ColA)) %>%
  group_by(group) %>%
  mutate(ColD = sum(ColB)) %>%
  ungroup() %>%
  select(-group)
#> # A tibble: 10 x 4
#> ColA ColB ColC ColD
#> <chr> <int> <int> <int>
#> 1 A 2 1 3
#> 2 A 1 1 3
#> 3 B 3 2 8
#> 4 B 5 2 8
#> 5 C 2 3 7
#> 6 C 5 3 7
#> 7 A 1 1 4
#> 8 A 3 1 4
#> 9 B 7 2 8
#> 10 B 1 2 8
This, at any rate, is the same as the expected output.
Created on 2023-01-16 with reprex v2.0.2
Data from question in reproducible format
df <- structure(list(ColA = c("A", "A", "B", "B", "C", "C", "A", "A",
"B", "B"), ColB = c(2L, 1L, 3L, 5L, 2L, 5L, 1L, 3L, 7L, 1L),
ColC = c(1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L, 2L, 2L)),
class = "data.frame", row.names = c(NA, -10L))
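Base R (this relies on ColA holding single uppercase letters, because it uses match(df$ColA, LETTERS)):
df$ColD <- ave(
  df$ColB,
  # a new group starts wherever the letter changes from one row to the next
  cumsum(c(1, abs(diff(match(df$ColA, LETTERS))))),
  FUN = sum
)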
ColA ColB ColC ColD
1 A 2 1 3
2 A 1 1 3
3 B 3 2 8
4 B 5 2 8
5 C 2 3 7
6 C 5 3 7
7 A 1 1 4
8 A 3 1 4
9 B 7 2 8
10 B 1 2 8
A base solution:
df |>
transform(ColD = ave(ColB, with(rle(ColA), rep(seq_along(values), lengths)), FUN = sum))
ColA ColB ColC ColD
1 A 2 1 3
2 A 1 1 3
3 B 3 2 8
4 B 5 2 8
5 C 2 3 7
6 C 5 3 7
7 A 1 1 4
8 A 3 1 4
9 B 7 2 8
10 B 1 2 8
Another base solution using ave.
df$ColD <- ave(df$ColB, c(0, cumsum(diff(df$ColC) != 0)), FUN=sum) #Using ColC
#df$ColD <- ave(df$ColB, c(0, cumsum(df$ColA[-nrow(df)] != df$ColA[-1])), FUN=sum) #Using ColA
df
# ColA ColB ColC ColD
#1 A 2 1 3
#2 A 1 1 3
#3 B 3 2 8
#4 B 5 2 8
#5 C 2 3 7
#6 C 5 3 7
#7 A 1 1 4
#8 A 3 1 4
#9 B 7 2 8
#10 B 1 2 8
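All of the answers above rest on the same idea: give every run of consecutive equal ColA values its own number, then sum ColB within each number. The attempt in the question, sum(ColB[ColC == 1]), instead returns a single total that R recycles into every row. A small sketch that makes the intermediate run numbering visible (the names runs and run_id are only illustrative):

```r
df <- data.frame(ColA = c("A", "A", "B", "B", "C", "C", "A", "A", "B", "B"),
                 ColB = c(2L, 1L, 3L, 5L, 2L, 5L, 1L, 3L, 7L, 1L))

# rle() describes ColA as runs: values A, B, C, A, B, each of length 2
runs <- rle(df$ColA)

# Number the runs 1..5 and repeat each number over the rows of its run
run_id <- rep(seq_along(runs$lengths), times = runs$lengths)
run_id
# 1 1 2 2 3 3 4 4 5 5

# Sum ColB within each run and spread the total back over its rows
df$ColD <- ave(df$ColB, run_id, FUN = sum)
df
```

Because the second run of A gets number 4 rather than 1, its rows are summed separately from the first run of A.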

R Create multiple rows from 1 row based on presence of values in certain columns

I have a data frame that looks like the following:
ID Date Participant_1 Participant_2 Participant_3 Covariate_1 Covariate_2 Covariate_3
1  9/1  A             B             NA            16          2           1
2  5/4  B             NA            NA            4           2           2
3  6/3  C             A             B             8           3           6
4  2/8  A             NA            NA            7           8           4
5  9/3  C             A             NA            7           1           3
I need to expand this data frame so that a row is present for all of the participants present at each event "ID", with the date and all other variables in all the created rows. The multiple participant columns would now only be one column for participant. The output would therefore be:
ID Date Participant Covariate_1 Covariate_2 Covariate_3
1 9/1 A 16 2 1
1 9/1 B 16 2 1
2 5/4 B 4 2 2
3 6/3 C 8 3 6
3 6/3 A 8 3 6
3 6/3 B 8 3 6
4 2/8 A 7 8 4
5 9/3 C 7 1 3
5 9/3 A 7 1 3
Is there a way to do this efficiently? Perhaps with a pivot function?
We can use pivot_longer and then some formatting
library(tidyr)
library(dplyr) # select() and relocate() come from dplyr
df %>%
  pivot_longer(starts_with("Participant"), values_to = "Participant") %>%
  select(-name) %>%
  relocate(Participant, .before = Covariate_1) %>%
  drop_na()
# A tibble: 9 × 6
ID Date Participant Covariate_1 Covariate_2 Covariate_3
<int> <chr> <chr> <int> <int> <int>
1 1 9/1 A 16 2 1
2 1 9/1 B 16 2 1
3 2 5/4 B 4 2 2
4 3 6/3 C 8 3 6
5 3 6/3 A 8 3 6
6 3 6/3 B 8 3 6
7 4 2/8 A 7 8 4
8 5 9/3 C 7 1 3
9 5 9/3 A 7 1 3
Here's the example data used:
df <- structure(list(ID = 1:5, Date = c("9/1", "5/4", "6/3", "2/8",
"9/3"), Participant_1 = c("A", "B", "C", "A", "C"), Participant_2 = c("B",
NA, "A", NA, "A"), Participant_3 = c(NA, NA, "B", NA, NA), Covariate_1 = c(16L,
4L, 8L, 7L, 7L), Covariate_2 = c(2L, 2L, 3L, 8L, 1L), Covariate_3 = c(1L,
2L, 6L, 4L, 3L)), class = "data.frame", row.names = c(NA, -5L
))
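If adding packages is not an option, the same reshaping can be sketched in base R. This is only one possible approach, assuming the participant columns all start with "Participant_" and the covariate columns with "Covariate_"; the names part_cols, cov_cols, pieces and long are illustrative.

```r
df <- data.frame(
  ID = 1:5,
  Date = c("9/1", "5/4", "6/3", "2/8", "9/3"),
  Participant_1 = c("A", "B", "C", "A", "C"),
  Participant_2 = c("B", NA, "A", NA, "A"),
  Participant_3 = c(NA, NA, "B", NA, NA),
  Covariate_1 = c(16L, 4L, 8L, 7L, 7L),
  Covariate_2 = c(2L, 2L, 3L, 8L, 1L),
  Covariate_3 = c(1L, 2L, 6L, 4L, 3L),
  stringsAsFactors = FALSE
)

part_cols <- grep("^Participant_", names(df), value = TRUE)
cov_cols  <- grep("^Covariate_", names(df), value = TRUE)

# One copy of the event columns per participant column,
# each paired with that column's values
pieces <- lapply(part_cols, function(p) {
  data.frame(df[c("ID", "Date")], Participant = df[[p]], df[cov_cols],
             stringsAsFactors = FALSE)
})
long <- do.call(rbind, pieces)

# Drop the empty slots, then restore event order; order() leaves ties in
# their existing order, so Participant_1 stays ahead of Participant_2
long <- long[!is.na(long$Participant), ]
long <- long[order(long$ID), ]
rownames(long) <- NULL
long
```

The result has the same nine rows as the pivot_longer version, with Participant placed between Date and the covariates.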

How to merge datasets with repeated measures

I have three datasets that I want to merge/join.
These examples only include the first participants; I have 25 in total.
df1
ID Grup pretest
1 1 A 2
2 1 A 1
3 1 A 3
4 2 B NA
5 2 B 1
6 2 B 3
7 3 A 2
8 3 A 1
9 3 A NA
10 4 B 2
11 4 B 1
12 4 B 3
df2 (this is missing one ID (5))
ID Grup posttest
1 1 A NA
2 1 A 5
3 1 A 4
4 2 B 2
5 2 B 4
6 2 B 3
7 3 A 5
8 3 A 6
9 3 A 3
10 6 B 4
11 6 B 2
12 6 B NA
Updated
df3 (this has 5 measurements per ID)
ID Grup traning
1 1 A 2
2 1 A 6
3 1 A 3
4 1 A NA
5 1 A 1
6 2 B 3
7 2 B 4
8 2 B 1
9 2 B NA
10 2 B 2
11 3 A 1
12 3 A 3
I’ve been trying merge() and full_join() but both end up creating duplicates that I don’t want.
It does not treat each row as a separate measurement; instead it creates 9 rows for every ID value.
New <- merge(df1, df2, by= 'ID')
New <- full_join(df1, df2, by = "ID")
Setting all = TRUE doesn’t help.
I need the dataset to look like this
ID Grup pretest posttest traning
1 1 A 2 NA 3
2 1 A 1 5 4
3 1 A 3 4 4
4 1 A NA NA 4
5 1 A NA NA 3
6 2 B 3 3 NA
7 2 B 2 5 3
8 2 B NA 6 2
9 2 B NA NA 5
10 2 B NA NA 4
11 3 A 1 2 3
12 3 A 3 3 4
Since you are relying on the order of the frames, you can simply use cbind()
cbind(df1, df2[, 3, drop = FALSE])
Output:
ID Grup pretest posttest
1 1 A 2 NA
2 1 A 1 5
3 1 A 3 4
4 2 B NA 2
5 2 B 1 4
6 2 B 3 3
7 3 A 2 5
8 3 A 1 6
9 3 A NA 3
10 4 B 2 4
11 4 B 1 2
12 4 B 3 NA
You can add a helper column iid to separate the entries.
df1 <- cbind(iid = 1:nrow(df1), df1)
df2 <- cbind(iid = 1:nrow(df2), df2)
With dplyr
library(dplyr)
left_join(df1, df2, c("iid", "ID", "Grup"))[,-1]
ID Grup pretest posttest
1 1 A 2 NA
2 1 A 1 5
3 1 A 3 4
4 2 B NA 2
5 2 B 1 4
6 2 B 3 3
7 3 A 2 5
8 3 A 1 6
9 3 A NA 3
10 4 B 2 4
11 4 B 1 2
12 4 B 3 NA
With base R merge
merge(df1, df2, c("iid", "ID", "Grup"))[,-1]
ID Grup pretest posttest
1 1 A 2 NA
2 4 B 2 4
3 4 B 1 2
4 4 B 3 NA
5 1 A 1 5
6 1 A 3 4
7 2 B NA 2
8 2 B 1 4
9 2 B 3 3
10 3 A 2 5
11 3 A 1 6
12 3 A NA 3
Data
df1 <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L,
4L, 4L), Grup = c("A", "A", "A", "B", "B", "B", "A", "A", "A",
"B", "B", "B"), pretest = c(2L, 1L, 3L, NA, 1L, 3L, 2L, 1L, NA,
2L, 1L, 3L)), class = "data.frame", row.names = c("1", "2", "3",
"4", "5", "6", "7", "8", "9", "10", "11", "12"))
df2 <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L,
4L, 4L), Grup = c("A", "A", "A", "B", "B", "B", "A", "A", "A",
"B", "B", "B"), posttest = c(NA, 5L, 4L, 2L, 4L, 3L, 5L, 6L,
3L, 4L, 2L, NA)), class = "data.frame", row.names = c("1", "2",
"3", "4", "5", "6", "7", "8", "9", "10", "11", "12"))
Another option is joining by row names, i.e. row numbers:
library(tibble)
library(dplyr)
left_join(rownames_to_column(df1), df2 %>% rownames_to_column() , by="rowname") %>%
select(ID = ID.x, Grup = Grup.x, pretest, posttest)
ID Grup pretest posttest
1 1 A 2 NA
2 1 A 1 5
3 1 A 3 4
4 2 B NA 2
5 2 B 1 4
6 2 B 3 3
7 3 A 2 5
8 3 A 1 6
9 3 A NA 3
10 4 B 2 4
11 4 B 1 2
12 4 B 3 NA
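The answers above rely on both frames having the same number of rows in the same order, which does not hold for df3 with its 5 measurements per ID. A possible workaround, sketched here under the assumption that measurements within an ID should be paired by position, is to number the rows within each ID and join on that number as well. The column name obs is hypothetical and only the first two IDs are shown.

```r
df1 <- data.frame(ID = c(1L, 1L, 1L, 2L, 2L, 2L),
                  Grup = c("A", "A", "A", "B", "B", "B"),
                  pretest = c(2L, 1L, 3L, NA, 1L, 3L))
df3 <- data.frame(ID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
                  Grup = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B"),
                  traning = c(2L, 6L, 3L, NA, 1L, 3L, 4L, 1L, NA, 2L))

# Number the measurements within each ID: 1, 2, 3, ...
df1$obs <- ave(seq_along(df1$ID), df1$ID, FUN = seq_along)
df3$obs <- ave(seq_along(df3$ID), df3$ID, FUN = seq_along)

# ID + Grup + obs identifies a single row in each frame, so no row is
# matched more than once; all = TRUE keeps measurements found in one frame only
merged <- merge(df1, df3, by = c("ID", "Grup", "obs"), all = TRUE)
merged <- merged[order(merged$ID, merged$obs), ]
merged
```

Joining on ID alone pairs every row of an ID in one frame with every row of that ID in the other, which is where the 9 rows per ID come from. With the extra key, the 4th and 5th measurements of each ID simply get NA for pretest.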

Is there any R code to repeat a same value for multiple rows?

My df looks like this now.
A B C
1 3 .
1 6 .
1 9 .
2 1 .
2 2 .
2 5 .
3 9 .
3 3 .
3 2 .
Below is the data frame I am trying to create.
Variable A identifies individuals (user ID), and each individual has three rows.
Each individual has different values of variable B, but must have a single value of variable C.
How can I repeat the same value of C on all three rows of each individual?
A B C
1 3 1
1 6 1
1 9 1
2 1 3
2 2 3
2 5 3
3 9 8
3 3 8
3 2 8
We can just use rep in base R, since the number of repeats is already known to be 3
df$C <- rep(c(1, 3, 8), each = 3)
df
# A B C
#1 1 3 1
#2 1 6 1
#3 1 9 1
#4 2 1 3
#5 2 2 3
#6 2 5 3
#7 3 9 8
#8 3 3 8
#9 3 2 8
Or another option is to use 'A' as integer index which would also work when there are unequal lengths
df$C <- c(1, 3, 8)[df$A]
If the values in 'A' are not sequential or not numeric, use a named vector for the lookup instead
df$C <- setNames(c(1, 3, 8), unique(df$A))[as.character(df$A)]
data
df <- data.frame(A = rep(1:3, each = 3), B = c(3, 6, 9, 1, 2, 5, 9, 3, 2))
You could use an assignment matrix and match it with your A column.
am <- matrix(c(1, 1,
2, 3,
3, 8), byrow=TRUE, ncol=2)
dat$C <- am[match(dat$A, am[,1]), 2]
dat
# A B C
# 1 1 3 1
# 2 1 6 1
# 3 1 9 1
# 4 2 1 3
# 5 2 2 3
# 6 2 5 3
# 7 3 9 8
# 8 3 3 8
# 9 3 2 8
Data:
dat <- structure(list(A = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), B = c(3L,
6L, 9L, 1L, 2L, 5L, 9L, 3L, 2L)), row.names = c(NA, -9L), class = "data.frame")
The solution by @akrun is the most efficient so far. Here is another base R solution, which also works when the groups defined by df$A have unequal sizes:
v <- c(1,3,8)
df <- do.call(rbind,lapply(seq_along(v), function(k) cbind(split(df,df$A)[[k]],C=v[k])))
such that
> df
A B C
1 1 3 1
2 1 6 1
3 1 9 1
4 2 1 3
5 2 2 3
6 2 5 3
7 3 9 8
8 3 3 8
9 3 2 8
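For IDs that are neither sequential nor numeric, the matching idea from the answers above can also be written with a small lookup data frame. This is a sketch on made-up data: the IDs "u02", "u07", "u15" and the name lookup are hypothetical.

```r
# Hypothetical data: character IDs and groups of unequal size
df <- data.frame(A = c("u07", "u07", "u02", "u02", "u02", "u15"),
                 B = c(3, 6, 1, 2, 5, 9),
                 stringsAsFactors = FALSE)

# One row per individual with the value of C it should receive
lookup <- data.frame(A = c("u02", "u07", "u15"),
                     C = c(3, 1, 8),
                     stringsAsFactors = FALSE)

# match() finds, for each row of df, the position of its ID in the lookup
df$C <- lookup$C[match(df$A, lookup$A)]
df
```

An ID that is missing from the lookup gets NA in C, which makes gaps easy to spot.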

How to find the last occurrence of a certain observation in grouped data in R?

I have data that is grouped using dplyr in R. I would like to find the last occurrence of observations ('B') equal to or greater than 1 (1, 2, 3 or 4) in each group ('A'), in terms of the 'day' they occurred. I would like the value of 'day' for each group to be given in a new column.
For example, given the following sample of data, grouped by A (this has been simplified, my data is actually grouped by 3 variables):
A B day
a 2 1
a 2 2
a 1 5
a 0 8
b 3 1
b 3 4
b 3 6
b 0 7
b 0 9
c 1 2
c 1 3
c 1 4
I would like to achieve the following:
A B day last
a 2 1 5
a 2 2 5
a 1 5 5
a 0 8 5
b 3 1 6
b 3 4 6
b 3 6 6
b 0 7 6
b 0 9 6
c 1 2 4
c 1 3 4
c 1 4 4
I hope this makes sense, thank you all very much for your help! I have thoroughly searched for my answer online but couldn't find anything. However, if I have accidentally duplicated a question then I apologise.
We can try
library(data.table)
setDT(df1)[, last := day[tail(which(B>=1),1)] , A]
df1
# A B day last
# 1: a 2 1 5
# 2: a 2 2 5
# 3: a 1 5 5
# 4: a 0 8 5
# 5: b 3 1 6
# 6: b 3 4 6
# 7: b 3 6 6
# 8: b 0 7 6
# 9: b 0 9 6
#10: c 1 2 4
#11: c 1 3 4
#12: c 1 4 4
Or using dplyr
library(dplyr)
df1 %>%
  group_by(A) %>%
  mutate(last = day[max(which(B >= 1))])
Or use the last function from dplyr (as @docendo discimus suggested)
df1 %>%
  group_by(A) %>%
  mutate(last = last(day[B >= 1]))
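For the second question (the day of the row immediately after the last non-zero B in each group, or NA if every B in the group is non-zero),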
setDT(df1)[, dayafter:= if(all(!!B)) NA_integer_ else
day[max(which(B!=0))+1L] , A]
# A B day dayafter
# 1: a 2 1 8
# 2: a 2 2 8
# 3: a 1 5 8
# 4: a 0 8 8
# 5: b 3 1 7
# 6: b 3 4 7
# 7: b 3 6 7
# 8: b 0 7 7
# 9: b 0 9 7
#10: c 1 2 NA
#11: c 1 3 NA
#12: c 1 4 NA
Here is a solution that does not require loading external packages:
df <- structure(list(A = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 3L), .Label = c("a", "b", "c"), class = "factor"),
B = c(2L, 2L, 1L, 0L, 3L, 3L, 3L, 0L, 0L, 1L, 1L, 1L), day = c(1L,
2L, 5L, 8L, 1L, 4L, 6L, 7L, 9L, 2L, 3L, 4L)), .Names = c("A",
"B", "day"), class = "data.frame", row.names = c(NA, -12L))
x <- split(df, df$A, drop = TRUE)
tp <- lapply(x, function(k) {
  tmp <- k[k$B > 0, ]
  k$last <- tmp$day[length(tmp$day)]
  k
})
do.call(rbind, tp)
A B day last
#a.1 a 2 1 5
#a.2 a 2 2 5
#a.3 a 1 5 5
#a.4 a 0 8 5
#b.5 b 3 1 6
#b.6 b 3 4 6
#b.7 b 3 6 6
#b.8 b 0 7 6
#b.9 b 0 9 6
#c.10 c 1 2 4
#c.11 c 1 3 4
#c.12 c 1 4 4
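One case the sample data does not cover is a group where B never reaches 1. A base R sketch that returns NA for such a group, assuming rows are already ordered by day within each group (the group "d" and the helper last_day are made up for illustration):

```r
# Hypothetical data: group "d" has no observation with B >= 1
df <- data.frame(A   = c("a", "a", "a", "a", "b", "b", "b", "d", "d"),
                 B   = c(2L, 2L, 1L, 0L, 3L, 0L, 0L, 0L, 0L),
                 day = c(1L, 2L, 5L, 8L, 1L, 4L, 6L, 2L, 3L))

# Given the row numbers of one group, return the day of the last row
# with B >= 1, or NA when the group has no such row
last_day <- function(rows) {
  hits <- rows[df$B[rows] >= 1]
  if (length(hits) == 0L) NA_integer_ else df$day[max(hits)]
}

# ave() passes each group's row numbers to last_day() and
# recycles the single result over the rows of that group
df$last <- ave(seq_len(nrow(df)), df$A, FUN = last_day)
df
```

Here group a gets 5, group b gets 1 and group d gets NA, so the empty case never reaches max() on an empty vector.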
