How can I insert values from one data frame into another data frame - r

Data similar to mine:
dat1<-read.table (text=" ID Rat Garden Class Time1 Time2 Time3
1 12 12 0 15 16 20
1 13 0 1 NA NA NA
2 13 11 0 18 12 16
2 9 0 1 NA NA NA
1 6 13 0 17 14 14
1 7 0 2 NA NA NA
2 4 14 0 17 16 12
2 3 0 2 NA NA NA
", header=TRUE)
dat2<-read.table (text=" ID Value1 Value2
1 6 7
2 5 4
", header=TRUE)
I want to insert the values of dat2 into the Time1 column of dat1, in the rows where the Class column is 1 or 2.
I would like to get the following outcome:
ID Rat Garden Class Time1 Time2 Time3
1 12 12 0 15 16 20
1 13 0 1 6
2 13 11 0 18 12 16
2 9 0 1 5
1 6 13 0 17 14 14
1 7 0 2 7
2 4 14 0 17 16 12
2 3 0 2 4

We may group by 'ID' and replace the NA entries of 'Time1' with the unlisted 'Value' columns of 'dat2', taken from the row whose ID matches the current group:
library(dplyr)
dat1 %>%
group_by(ID) %>%
mutate(Time1 = replace(Time1, is.na(Time1),
unlist(dat2[-1][dat2$ID == cur_group()$ID,]))) %>%
ungroup()
-output
# A tibble: 8 × 7
ID Rat Garden Class Time1 Time2 Time3
<int> <int> <int> <int> <int> <int> <int>
1 1 12 12 0 15 16 20
2 1 13 0 1 6 NA NA
3 2 13 11 0 18 12 16
4 2 9 0 1 5 NA NA
5 1 6 13 0 17 14 14
6 1 7 0 2 7 NA NA
7 2 4 14 0 17 16 12
8 2 3 0 2 4 NA NA
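The same fill can be sketched in base R with a plain loop over the IDs. This is only an illustration: it assumes, as in the example data, that each ID has exactly as many NA rows in Time1 as dat2 has Value columns, and the name filled is made up for this sketch.

```r
dat1 <- read.table(text = "ID Rat Garden Class Time1 Time2 Time3
1 12 12 0 15 16 20
1 13 0 1 NA NA NA
2 13 11 0 18 12 16
2 9 0 1 NA NA NA
1 6 13 0 17 14 14
1 7 0 2 NA NA NA
2 4 14 0 17 16 12
2 3 0 2 NA NA NA", header = TRUE)
dat2 <- read.table(text = "ID Value1 Value2
1 6 7
2 5 4", header = TRUE)

filled <- dat1  # hypothetical name; work on a copy of dat1
for (id in dat2$ID) {
  # rows of this ID whose Time1 is missing, in their original order
  rows <- which(filled$ID == id & is.na(filled$Time1))
  # Value1, Value2, ... of this ID as a plain vector
  vals <- unlist(dat2[dat2$ID == id, -1], use.names = FALSE)
  # assumption: length(rows) == length(vals)
  filled$Time1[rows] <- vals
}
filled$Time1
```

Only Time1 is touched, so Time2 and Time3 keep their NA values, as in the output above.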

Here is a wild ride:
First we pull the values from dat2 as a vector.
Then we interleave an NA after each value, so that the vector is as long as dat1 has rows, and
finally we use coalesce after cbind:
library(dplyr)
library(tidyr)
vector <- dat2 %>%
pivot_longer(-ID) %>%
arrange(name) %>%
pull(value)
col_x <- c(sapply(vector, c, rep(NA, 1)))
cbind(dat1, col_x) %>%
mutate(col_x = lag(col_x)) %>%
mutate(Time1= coalesce(Time1, col_x), .keep="unused")
ID Rat Garden Class Time1 Time2 Time3
1 1 12 12 0 15 16 20
2 1 13 0 1 6 NA NA
3 2 13 11 0 18 12 16
4 2 9 0 1 5 NA NA
5 1 6 13 0 17 14 14
6 1 7 0 2 7 NA NA
7 2 4 14 0 17 16 12
8 2 3 0 2 4 NA NA
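To see why the interleaving and the lag line up, it may help to look at the intermediate vectors on their own. A small base R sketch, assuming the values have already been pulled in the order 6, 5, 7, 4 (the names vals and shifted are made up here):

```r
vals <- c(6L, 5L, 7L, 4L)            # what pull(value) returns for the example

# put one NA after each value: 6 NA 5 NA 7 NA 4 NA
col_x <- c(sapply(vals, c, NA))

# shift everything down by one row, which is what lag() does
shifted <- c(NA, head(col_x, -1))    # NA 6 NA 5 NA 7 NA 4

Time1 <- c(15L, NA, 18L, NA, 17L, NA, 17L, NA)
# keep Time1 where present, otherwise take the shifted value (like coalesce())
ifelse(is.na(Time1), shifted, Time1)
```

This depends entirely on the row order of dat1 (every NA row directly follows a complete row), so it is more fragile than matching on ID.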

Related

How to update a value in a specific column in R

Here is a part of the sample data :
dat<-read.table (text=" ID Time B1 T1 Q1 W1 M1
1 12 12 0 12 11 9
1 13 0 1 NA NA NA
2 10 12 0 6 7 8
2 14 0 1 NA NA NA
1 16 16A 0 1 2 4
1 14 0 1 NA NA NA
2 14 16A 0 5 6 7
2 7 0 1 NA NA NA
1 7 20 0 5 8 0
1 7 0 1 NA NA NA
2 9 20 0 7 8 1
2 9 0 1 NA NA NA
", header=TRUE)
I want to update the value 1 in column T1 for repeated IDs: the first repeat of an ID should get the value 1, the second repeat the value 2, the third repeat the value 3, and so on. I also want to replace NA with blank cells. Here is the expected outcome:
ID Time B1 T1 Q1 W1 M1
1 12 12 0 12 11 9
1 13 0 1
2 10 12 0 6 7 8
2 14 0 1
1 16 16A 0 1 2 4
1 14 0 2
2 14 16A 0 5 6 7
2 7 0 2
1 7 20 0 5 8 0
1 7 0 3
2 9 20 0 7 8 1
2 9 0 3
You could use ifelse() inside across() with a cumsum() per group, like this:
library(dplyr)
dat %>%
group_by(ID, B1) %>%
mutate(across(T1, ~ ifelse(.x == 1, cumsum(.x), T1)))
#> # A tibble: 12 × 7
#> # Groups: ID, B1 [8]
#> ID Time B1 T1 Q1 W1 M1
#> <int> <int> <chr> <int> <int> <int> <int>
#> 1 1 12 12 0 12 11 9
#> 2 1 13 0 1 NA NA NA
#> 3 2 10 12 0 6 7 8
#> 4 2 14 0 1 NA NA NA
#> 5 1 16 16A 0 1 2 4
#> 6 1 14 0 2 NA NA NA
#> 7 2 14 16A 0 5 6 7
#> 8 2 7 0 2 NA NA NA
#> 9 1 7 20 0 5 8 0
#> 10 1 7 0 3 NA NA NA
#> 11 2 9 20 0 7 8 1
#> 12 2 9 0 3 NA NA NA
Created on 2023-01-14 with reprex v2.0.2
With data.table
library(data.table)
setDT(dat)[T1 ==1, T1 := cumsum(T1), .(ID, B1)]
-output
> dat
ID Time B1 T1 Q1 W1 M1
1: 1 12 12 0 12 11 9
2: 1 13 0 1 NA NA NA
3: 2 10 12 0 6 7 8
4: 2 14 0 1 NA NA NA
5: 1 16 16A 0 1 2 4
6: 1 14 0 2 NA NA NA
7: 2 14 16A 0 5 6 7
8: 2 7 0 2 NA NA NA
9: 1 7 20 0 5 8 0
10: 1 7 0 3 NA NA NA
11: 2 9 20 0 7 8 1
12: 2 9 0 3 NA NA NA
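For comparison, a base R sketch of the same running count using ave(), followed by one possible way to get the blank cells asked for in the question. The blank step is an assumption about what is wanted: it turns every column into character, so it is probably best kept for display or export only. The names out and out_blank are made up.

```r
dat <- read.table(text = "ID Time B1 T1 Q1 W1 M1
1 12 12 0 12 11 9
1 13 0 1 NA NA NA
2 10 12 0 6 7 8
2 14 0 1 NA NA NA
1 16 16A 0 1 2 4
1 14 0 1 NA NA NA
2 14 16A 0 5 6 7
2 7 0 1 NA NA NA
1 7 20 0 5 8 0
1 7 0 1 NA NA NA
2 9 20 0 7 8 1
2 9 0 1 NA NA NA", header = TRUE)

out <- dat
# running sum of T1 within each (ID, B1) group; groups of zeros stay zero
out$T1 <- ave(dat$T1, dat$ID, dat$B1, FUN = cumsum)

# display copy: NA becomes "", everything else becomes text
out_blank <- out
out_blank[] <- lapply(out, function(col) ifelse(is.na(col), "", as.character(col)))
out_blank
```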

Completing a sequence of integers by group with tidyverse in R

Given a dataset that contains a grouping variable and an incomplete column of integers (it contains NAs), where the first and last integers vary by group (and may themselves be NA) and the length of each group varies, how might one fill in the NA values by completing the sequence?
The following dataset may be used as an example:
library(dplyr)
set.seed(5112021)
dat1 <- bind_rows(data.frame(Group=1,Seq=(3:20)),
data.frame(Group=2,Seq=(-1:25))) %>%
mutate(rn = rnorm(45,mean=0.5,sd=1),
Seq = ifelse(rn < 0.4,NA,Seq)) %>%
select(-rn) %>%
group_by(Group) %>%
mutate(Seq = ifelse(Seq==-1,NA,Seq))
dat1
Group Seq
1 1 NA
2 1 NA
3 1 NA
4 1 6
5 1 7
6 1 8
7 1 NA
8 1 10
9 1 11
10 1 NA
11 1 13
12 1 NA
13 1 15
14 1 NA
15 1 NA
16 1 NA
17 1 NA
18 1 20
19 2 NA
20 2 0
21 2 NA
22 2 2
23 2 3
24 2 NA
25 2 5
26 2 6
27 2 7
28 2 8
29 2 NA
30 2 10
31 2 NA
32 2 12
33 2 NA
34 2 NA
35 2 NA
36 2 16
37 2 17
38 2 NA
39 2 NA
40 2 NA
41 2 NA
42 2 22
43 2 NA
44 2 NA
45 2 NA
One way to do this could be to make use of row numbers by group (since they are a sequence of integers): calculate the difference between the non-missing values and the row number (which is the same value for every non-missing row), and then add that value back to the row number.
For example:
dat2 <- dat1 %>%
group_by(Group) %>%
mutate(rn = row_number(),
diff = mean(Seq-rn,na.rm=T)) %>%
mutate(New_Seq = rn+diff) %>%
select(-rn,-diff)
dat2
Group Seq New_Seq
1 1 NA 3
2 1 NA 4
3 1 NA 5
4 1 6 6
5 1 7 7
6 1 8 8
7 1 NA 9
8 1 10 10
9 1 11 11
10 1 NA 12
11 1 13 13
12 1 NA 14
13 1 15 15
14 1 NA 16
15 1 NA 17
16 1 NA 18
17 1 NA 19
18 1 20 20
19 2 NA -1
20 2 0 0
21 2 NA 1
22 2 2 2
23 2 3 3
24 2 NA 4
25 2 5 5
26 2 6 6
27 2 7 7
28 2 8 8
29 2 NA 9
30 2 10 10
31 2 NA 11
32 2 12 12
33 2 NA 13
34 2 NA 14
35 2 NA 15
36 2 16 16
37 2 17 17
38 2 NA 18
39 2 NA 19
40 2 NA 20
41 2 NA 21
42 2 22 22
43 2 NA 23
44 2 NA 24
45 2 NA 25
While this works, it doesn't seem very elegant and may be slow for very large datasets with many grouping variables. I'm curious whether there is a more 'Tidyverse' way to do this.
You could do something like:
dat1 %>%
group_by(Group) %>%
mutate(newseq = seq_along(Group) + (first(na.omit(Seq)) - sum(cumall(is.na(Seq)))) - 1) %>%
ungroup()
Or
dat1 %>%
group_by(Group) %>%
mutate(newseq = seq(first(na.omit(Seq)) - sum(cumall(is.na(Seq))), length.out = n())) %>%
ungroup()
Or
dat1 %>%
group_by(Group) %>%
mutate(newseq = 0:(n() - 1) + (first(na.omit(Seq)) - sum(cumall(is.na(Seq))))) %>%
ungroup()
All these do the same thing: shift the start of the sequence by the difference of the first non-NA value and the number of NAs before it.
Output
Group Seq newseq
<int> <int> <dbl>
1 1 NA 3
2 1 NA 4
3 1 NA 5
4 1 6 6
5 1 7 7
6 1 8 8
7 1 NA 9
8 1 10 10
9 1 11 11
10 1 NA 12
# ... with 35 more rows
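The shift described above can also be written as a small base R helper and applied per group with ave(). This is a sketch under the same assumption as the question (the observed values are consecutive integers in row order); complete_seq and toy are hypothetical names.

```r
complete_seq <- function(x) {
  i <- which(!is.na(x))[1]      # position of the first observed value
  if (is.na(i)) return(x)       # group is all NA: nothing to anchor on
  seq_along(x) + (x[i] - i)     # shift the positions by the anchor offset
}

toy <- data.frame(Group = c(1, 1, 1, 1, 2, 2, 2),
                  Seq   = c(NA, NA, 5, NA, NA, 0, NA))
toy$newseq <- ave(toy$Seq, toy$Group, FUN = complete_seq)
toy$newseq
```

For group 1 the first observed value 5 sits at position 3, so the offset is 2 and the sequence becomes 3, 4, 5, 6; for group 2 the offset is -2, giving -1, 0, 1.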
First create a row number, then take the maximum difference between Seq and the row number, and add it back to the row number:
dat1 %>%
group_by(Group) %>%
mutate(rn = row_number(),
Seq = rn + max(Seq - rn, na.rm = TRUE)) %>%
ungroup() %>%
select(-rn)
Output:
Group Seq
<dbl> <int>
1 1 3
2 1 4
3 1 5
4 1 6
5 1 7
6 1 8
7 1 9
8 1 10
9 1 11
10 1 12
11 1 13
12 1 14
13 1 15
14 1 16
15 1 17
16 1 18
17 1 19
18 1 20
19 2 -1
20 2 0
21 2 1
22 2 2
23 2 3
24 2 4
25 2 5
26 2 6
27 2 7
28 2 8
29 2 9
30 2 10
31 2 11
32 2 12
33 2 13
34 2 14
35 2 15
36 2 16
37 2 17
38 2 18
39 2 19
40 2 20
# … with 5 more rows

Sorting a specific range of column names in dplyr

I have a data frame and wish to sort specific columns alphabetically in dplyr. I know I can use the code below to sort all columns, but I would like to sort only columns C, B and A alphabetically. I tried using the across function, as I would effectively like to select columns C:A, but this did not work.
df <- data.frame(1:16)
df$Testinfo1 <- 1
df$Band <- 1
df$Alpha <- 1
df$C <- c(10,12,14,16,10,12,14,16,10,12,14,16,10,12,14,16)
df$B <- c(10,0,0,0,12,12,12,12,0,14,NA_real_,14,16,16,16,16)
df$A <- c(1,1,1,1,1,1,1,1,1,1,1,14,NA_real_,NA_real_,NA_real_,16)
df
df %>%
select(sort(names(.)))
A Alpha B Band C Testinfo1 X1.16
1: 1 1 10 1 10 1 1
2: 1 1 0 1 12 1 2
3: 1 1 0 1 14 1 3
4: 1 1 0 1 16 1 4
5: 1 1 12 1 10 1 5
6: 1 1 12 1 12 1 6
7: 1 1 12 1 14 1 7
8: 1 1 12 1 16 1 8
9: 1 1 0 1 10 1 9
10: 1 1 14 1 12 1 10
11: 1 1 NA 1 14 1 11
12: 14 1 14 1 16 1 12
13: NA 1 16 1 10 1 13
14: NA 1 16 1 12 1 14
15: NA 1 16 1 14 1 15
16: 16 1 16 1 16 1 16
My desired output is below:
X1.16 Testinfo1 Band Alpha A B C
1: 1 1 1 1 1 10 10
2: 2 1 1 1 1 0 12
3: 3 1 1 1 1 0 14
4: 4 1 1 1 1 0 16
5: 5 1 1 1 1 12 10
6: 6 1 1 1 1 12 12
7: 7 1 1 1 1 12 14
8: 8 1 1 1 1 12 16
9: 9 1 1 1 1 0 10
10: 10 1 1 1 1 14 12
11: 11 1 1 1 1 NA 14
12: 12 1 1 1 14 14 16
13: 13 1 1 1 NA 16 10
14: 14 1 1 1 NA 16 12
15: 15 1 1 1 NA 16 14
16: 16 1 1 1 16 16 16
You can use relocate() (from dplyr 1.0.0 onwards):
library(dplyr)
vars <- c("C", "B", "A")
df %>%
relocate(all_of(sort(vars)), .after = last_col())
If you are passing a character vector of names, you should wrap it in all_of(), which will error if any variables are missing, or any_of(), which won't.
You can do
sortcols <- c("A","B","C")
library(dplyr)
df %>%
select(-all_of(sortcols), all_of(sort(sortcols)))
The -all_of(sortcols) part selects everything except the columns you want to sort, and all_of(sort(sortcols)) then puts those columns after them in alphabetical order.
A base R option for a case which may or may not exist: the columns that you want to sort are not at the end of the data frame.
We add a new column D whose position you don't want to change.
df$D <- 1:16
cols_to_sort <- c('A', 'B', 'C')
inds <- match(cols_to_sort, names(df))
cols <- seq_along(df)
cols[cols %in% inds] <- inds
df[cols]
# X1.16 Testinfo1 Band Alpha A B C D
#1 1 1 1 1 1 10 10 1
#2 2 1 1 1 1 0 12 2
#3 3 1 1 1 1 0 14 3
#4 4 1 1 1 1 0 16 4
#5 5 1 1 1 1 12 10 5
#6 6 1 1 1 1 12 12 6
#7 7 1 1 1 1 12 14 7
#8 8 1 1 1 1 12 16 8
#9 9 1 1 1 1 0 10 9
#10 10 1 1 1 1 14 12 10
#11 11 1 1 1 1 NA 14 11
#12 12 1 1 1 14 14 16 12
#13 13 1 1 1 NA 16 10 13
#14 14 1 1 1 NA 16 12 14
#15 15 1 1 1 NA 16 14 15
#16 16 1 1 1 16 16 16 16
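The index trick above can be wrapped in a small helper that also copes with an unsorted cols vector and with target columns that are not next to each other. This is a sketch, not a tested utility; sort_cols_in_place and toy are made-up names, and it assumes every name in cols exists in the data frame.

```r
sort_cols_in_place <- function(df, cols) {
  pos <- sort(match(cols, names(df)))        # slots currently held by `cols`
  ord <- seq_along(df)
  ord[pos] <- match(sort(cols), names(df))   # fill those slots alphabetically
  df[ord]
}

toy <- data.frame(id = 1:2, C = 3:4, x = 5:6, B = 7:8, A = 9:10)
names(sort_cols_in_place(toy, c("C", "B", "A")))
```

All other columns (id and x here) stay exactly where they were; only the slots occupied by the named columns are reshuffled.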

R apply function to groups within data frame adding result as additional column

Here is the code for my example dataset.
df = data.frame("group" =c(rep(1,5),rep(1,6),rep(2,4),rep(2,3)), "time" = c(rep(NA,5),seq(1,6),rep(NA,4),seq(1,3)), "p" = seq(1,18) )
group time p
1 1 NA 1
2 1 NA 2
3 1 NA 3
4 1 NA 4
5 1 NA 5
6 1 1 6
7 1 2 7
8 1 3 8
9 1 4 9
10 1 5 10
11 1 6 11
12 2 NA 12
13 2 NA 13
14 2 NA 14
15 2 NA 15
16 2 1 16
17 2 2 17
18 2 3 18
I would like to figure out how to apply a function by group to only the values that have a time, and then append the result as a new column in the data frame. Here is the example function I would like to apply.
pfunc <- function(p){
p+5
}
The output I am hoping to obtain would look as follows.
group time p new_p
1 1 NA 1 NA
2 1 NA 2 NA
3 1 NA 3 NA
4 1 NA 4 NA
5 1 NA 5 NA
6 1 1 6 11
7 1 2 7 12
8 1 3 8 13
9 1 4 9 14
10 1 5 10 15
11 1 6 11 16
12 2 NA 12 NA
13 2 NA 13 NA
14 2 NA 14 NA
15 2 NA 15 NA
16 2 1 16 21
17 2 2 17 22
18 2 3 18 23
You can try this:
library(dplyr)
df %>% group_by(group) %>%
mutate(pnew=ifelse(is.na(time),time,time+5))
# A tibble: 18 x 4
# Groups: group [2]
group time p pnew
<dbl> <int> <int> <dbl>
1 1 NA 1 NA
2 1 NA 2 NA
3 1 NA 3 NA
4 1 NA 4 NA
5 1 NA 5 NA
6 1 1 6 6
7 1 2 7 7
8 1 3 8 8
9 1 4 9 9
10 1 5 10 10
11 1 6 11 11
12 2 NA 12 NA
13 2 NA 13 NA
14 2 NA 14 NA
15 2 NA 15 NA
16 2 1 16 6
17 2 2 17 7
18 2 3 18 8
Update
You can use this function:
increase <- function(data,n)
{
data %>% group_by(group) %>%
mutate(pnew=ifelse(is.na(time),time,time+n)) -> result
return(result)
}
increase(df,n = 10)
# A tibble: 18 x 4
# Groups: group [2]
group time p pnew
<dbl> <int> <int> <dbl>
1 1 NA 1 NA
2 1 NA 2 NA
3 1 NA 3 NA
4 1 NA 4 NA
5 1 NA 5 NA
6 1 1 6 11
7 1 2 7 12
8 1 3 8 13
9 1 4 9 14
10 1 5 10 15
11 1 6 11 16
12 2 NA 12 NA
13 2 NA 13 NA
14 2 NA 14 NA
15 2 NA 15 NA
16 2 1 16 11
17 2 2 17 12
18 2 3 18 13
Update 2
I hope this helps:
df %>% group_by(group) %>% rowwise() %>% mutate(pnew=ifelse(is.na(time),NA,pfunc(time)))
# A tibble: 18 x 4
# Rowwise: group
group time p pnew
<dbl> <int> <int> <dbl>
1 1 NA 1 NA
2 1 NA 2 NA
3 1 NA 3 NA
4 1 NA 4 NA
5 1 NA 5 NA
6 1 1 6 6
7 1 2 7 7
8 1 3 8 8
9 1 4 9 9
10 1 5 10 10
11 1 6 11 11
12 2 NA 12 NA
13 2 NA 13 NA
14 2 NA 14 NA
15 2 NA 15 NA
16 2 1 16 6
17 2 2 17 7
18 2 3 18 8
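The expected output in the question corresponds to applying pfunc to p, by group, only in the rows where time is present. A base R sketch of that reading, using ave() for the per-group call (has_time is a made-up helper name, and the as.numeric() is just there to keep the result type unambiguous):

```r
df <- data.frame(group = c(rep(1, 5), rep(1, 6), rep(2, 4), rep(2, 3)),
                 time  = c(rep(NA, 5), seq(1, 6), rep(NA, 4), seq(1, 3)),
                 p     = seq(1, 18))

pfunc <- function(p) {
  p + 5
}

has_time <- !is.na(df$time)      # rows that should receive a result
df$new_p <- NA_real_             # rows without a time stay NA
df$new_p[has_time] <- ave(as.numeric(df$p[has_time]),
                          df$group[has_time],
                          FUN = pfunc)
df
```

For a function like p + 5 the grouping makes no difference, but the same pattern also works when pfunc depends on the whole group (for example, centering p on the group mean).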

R tidyverse: create groups based on index column

I have this tibble
# Data
set.seed(1)
x <- tibble(values = round(rnorm(20, 10, 10), 0),
index = c(0,0,1,1,1,0,1,0,1,1,1,1,1,1,0,
1,1,0,0,0))
x
#> # A tibble: 20 x 2
#> values index
#> <dbl> <dbl>
#> 1 4 0
#> 2 12 0
#> 3 2 1
#> 4 26 1
#> 5 13 1
#> 6 2 0
#> 7 15 1
#> 8 17 0
#> 9 16 1
#> 10 7 1
#> 11 25 1
#> 12 14 1
#> 13 4 1
#> 14 -12 1
#> 15 21 0
#> 16 10 1
#> 17 10 1
#> 18 19 0
#> 19 18 0
#> 20 16 0
I'd like to create groups from the runs of consecutive ones in the index column. The final aim is to compute the sum of values for each group.
The expected tibble is something like:
# A tibble: 20 x 3
values index group
<dbl> <dbl> <chr>
1 4 0 NA
2 12 0 NA
3 2 1 A
4 26 1 A
5 13 1 A
6 2 0 NA
7 15 1 B
8 17 0 NA
9 16 1 C
10 7 1 C
11 25 1 C
12 14 1 C
13 4 1 C
14 -12 1 C
15 21 0 NA
16 10 1 D
17 10 1 D
18 19 0 NA
19 18 0 NA
20 16 0 NA
Thank you in advance for your advice.
You could use cumsum() on runs identified by rle(), replacing the values where index is zero with NA. If there are more than 26 IDs it will need a minor modification.
library(dplyr)
x2 <- x %>%
mutate(id = LETTERS[replace(with(rle(index),
rep(cumsum(values), lengths)), index == 0, NA)])
Giving:
# A tibble: 20 x 3
values index id
<dbl> <dbl> <chr>
1 4 0 NA
2 12 0 NA
3 2 1 A
4 26 1 A
5 13 1 A
6 2 0 NA
7 15 1 B
8 17 0 NA
9 16 1 C
10 7 1 C
11 25 1 C
12 14 1 C
13 4 1 C
14 -12 1 C
15 21 0 NA
16 10 1 D
17 10 1 D
18 19 0 NA
19 18 0 NA
20 16 0 NA
To sum the values:
x2 %>%
group_by(id) %>%
summarise(sv = sum(values))
# A tibble: 5 x 2
id sv
* <chr> <dbl>
1 A 41
2 B 15
3 C 54
4 D 20
5 NA 109
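If there may be more than 26 runs, one simple variation is to keep the run number as an integer instead of mapping it to LETTERS. A base R sketch, with the values copied from the tibble printed in the question (starts, run_id and sums are made-up names):

```r
values <- c(4, 12, 2, 26, 13, 2, 15, 17, 16, 7,
            25, 14, 4, -12, 21, 10, 10, 19, 18, 16)
index  <- c(0, 0, 1, 1, 1, 0, 1, 0, 1, 1,
            1, 1, 1, 1, 0, 1, 1, 0, 0, 0)

# TRUE on the first row of every run of ones
starts <- index == 1 & c(0, head(index, -1)) == 0

# number the runs; rows with index 0 get NA
run_id <- ifelse(index == 1, cumsum(starts), NA)

# tapply() drops the NA group, so only the runs of ones are summed
sums <- tapply(values, run_id, sum)
sums
```

The sums per run are 41, 15, 54 and 20, matching groups A to D above.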
An option with data.table
library(data.table)
setDT(x)[, group := LETTERS[as.integer(factor((NA^!index) *rleid(index)))]]
x
# values index group
# 1: 4 0 <NA>
# 2: 12 0 <NA>
# 3: 2 1 A
# 4: 26 1 A
# 5: 13 1 A
# 6: 2 0 <NA>
# 7: 15 1 B
# 8: 17 0 <NA>
# 9: 16 1 C
#10: 7 1 C
#11: 25 1 C
#12: 14 1 C
#13: 4 1 C
#14: -12 1 C
#15: 21 0 <NA>
#16: 10 1 D
#17: 10 1 D
#18: 19 0 <NA>
#19: 18 0 <NA>
#20: 16 0 <NA>
Or similar logic in dplyr (rleid() comes from data.table, so it is called with its namespace here)
library(dplyr)
x %>%
mutate(group = LETTERS[as.integer(factor((NA^!index) * data.table::rleid(index)))])
# A tibble: 20 x 3
# values index group
# <dbl> <dbl> <chr>
# 1 4 0 <NA>
# 2 12 0 <NA>
# 3 2 1 A
# 4 26 1 A
# 5 13 1 A
# 6 2 0 <NA>
# 7 15 1 B
# 8 17 0 <NA>
# 9 16 1 C
#10 7 1 C
#11 25 1 C
#12 14 1 C
#13 4 1 C
#14 -12 1 C
#15 21 0 <NA>
#16 10 1 D
#17 10 1 D
#18 19 0 <NA>
#19 18 0 <NA>
#20 16 0 <NA>
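The NA^!index part is terse, so here is a small base R sketch of what each piece evaluates to. The rle() line is only a stand-in for data.table's rleid(), and mask, run and grp are made-up names.

```r
index <- c(0, 1, 1, 0, 1)

# !index is TRUE where index is 0; NA^TRUE is NA and NA^FALSE is 1
mask <- NA^(!index)                  # NA 1 1 NA 1

# run counter that increases whenever index changes (stand-in for rleid())
run <- with(rle(index), rep(seq_along(lengths), lengths))   # 1 2 2 3 4

# runs of zeros are wiped out by the NA; factor() renumbers the rest 1, 2, ...
grp <- as.integer(factor(mask * run))
LETTERS[grp]
```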

Resources