Retrieving a column value in R by subsetting

I have this dataframe (df):
df <- data.frame(Data1 = c(1,3),
Data2 = c(3,9),
Data3 = c(7,2),
Data1Status = c(1,4),
Data2Status = c(2,5),
Data3Status = c(3,6),
NumberOfMaxValue = c(3,2))
Data1 Data2 Data3 Data1Status Data2Status Data3Status NumberOfMaxValue
1 3 7 1 2 3 3
3 9 2 4 5 6 2
And I want to get this new column:
Data1 Data2 Data3 Data1Status Data2Status Data3Status NumberOfMaxValue DataMaxStatus
1 3 7 1 2 3 3 3
3 9 2 4 5 6 2 5
I tried something like this:
DataMaxStatus = df[, as.numeric(df$NumberOfMaxValue) + 3], but it didn't work.
EDIT/EXPLANATION:
NumberOfMaxValue is the number (1, 2 or 3) of the Data column that holds the largest value.
DataMaxStatus is the status of the largest value among Data1, Data2 and Data3.

We can get the corresponding Status value by creating a matrix of row/column index to subset from Status columns.
cols <- grep('Status', names(df))
df$DataMaxStatus <- df[cols][cbind(1:nrow(df), df$NumberOfMaxValue)]
df
# Data1 Data2 Data3 Data1Status Data2Status Data3Status NumberOfMaxValue DataMaxStatus
#1 1 3 7 1 2 3 3 3
#2 3 9 2 4 5 6 2 5
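To see why the attempt in the question fails while the matrix index works, here is a minimal base R sketch on the same data. The names attempt, status, idx and max_status are introduced here for illustration only.

```r
df <- data.frame(Data1 = c(1, 3), Data2 = c(3, 9), Data3 = c(7, 2),
                 Data1Status = c(1, 4), Data2Status = c(2, 5),
                 Data3Status = c(3, 6), NumberOfMaxValue = c(3, 2))

# Indexing columns with a plain vector selects whole columns (here 6 and 5),
# so the result is a 2 x 2 data frame rather than one value per row.
attempt <- df[, df$NumberOfMaxValue + 3]
dim(attempt)

# A two-column matrix of (row, column) pairs picks exactly one cell per row.
status <- as.matrix(df[grep("Status", names(df))])
idx <- cbind(seq_len(nrow(df)), df$NumberOfMaxValue)
max_status <- status[idx]
max_status
```

Each row of idx names one cell, so the result has one element per row of df.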

Related

is it possible to filter rows of one dataframe based on another dataframe?

I have these two dataframes:
df_node <- data.frame( id= c("a","b","c","d","e","f","g","h","i"),
group= c(1,1,1,2,2,2,3,3,3))
df_link <- data.frame(from = c("a","d","f","i","b"),
to = c("d","f","i","b","h"))
I would like to delete the rows of df_node whose id does not appear anywhere in the second dataframe.
Here is a basic way to do that:
df_node <- data.frame( id= c("a","b","c","d","e","f","g","h","i"),
group= c(1,1,1,2,2,2,3,3,3))
df_link <- data.frame(from = c("a","d","f","i","b"),
to = c("d","f","i","b","h"))
library(dplyr)
df_result <- df_node%>%
filter(id%in%c(df_link$from,df_link$to))
df_result
# > df_result
# id group
# 1 a 1
# 2 b 1
# 3 d 2
# 4 f 2
# 5 h 3
# 6 i 3
We could use a semi_join:
library(dplyr)
df_node |>
semi_join(tibble(id = c(df_link$from, df_link$to)))
Output:
id group
1 a 1
2 b 1
3 d 2
4 f 2
5 h 3
6 i 3
Here is a one-liner with base R:
df_node[df_node$id %in% unlist(df_link),]
id group
1 a 1
2 b 1
4 d 2
6 f 2
8 h 3
9 i 3
But you could also use a join:
library(dplyr)
df_uniqueID <- data.frame(id = unique(c(df_link$from,df_link$to)) )
right_join(df_node,df_uniqueID)
Joining, by = "id"
id group
1 a 1
2 b 1
3 d 2
4 f 2
5 h 3
6 i 3
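The base R output above keeps the original row names (1, 2, 4, 6, 8, 9), while the dplyr outputs are renumbered. A small sketch of how the two can be reconciled, assuming you want consecutive row numbers; used_ids and kept are illustrative names:

```r
df_node <- data.frame(id = c("a","b","c","d","e","f","g","h","i"),
                      group = c(1,1,1,2,2,2,3,3,3),
                      stringsAsFactors = FALSE)
df_link <- data.frame(from = c("a","d","f","i","b"),
                      to = c("d","f","i","b","h"),
                      stringsAsFactors = FALSE)

# ids that occur anywhere in df_link
used_ids <- unique(c(df_link$from, df_link$to))

# logical subsetting keeps the row names of the rows that survive
kept <- df_node[df_node$id %in% used_ids, ]
rownames(kept)

# reset the row names to 1..n to match the dplyr output
rownames(kept) <- NULL
kept
```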

arranging columns based on numeric values in r

I need to arrange column names based on numbering.
Here is a short version of my dataset.
df <- data.frame(id = c(1,2,3),
raw_score = c(10,20,30),
a = c(1,1,1),
b = c(2,3,4),
c = c(4,6,7))
names(df) <- c("id","raw_score","2.2","2.3","2.1")
> df
id raw_score 2.2 2.3 2.1
1 1 10 1 2 4
2 2 20 1 3 6
3 3 30 1 4 7
How can I arrange the columns so that the numbered ones are in ascending order, as shown below?
> df
id raw_score 2.1 2.2 2.3
1 1 10 4 1 2
2 2 20 6 1 3
3 3 30 7 1 4
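Maybe something like this, which sorts the numbered column names numerically and keeps id and raw_score first:
library(dplyr)
df <- df %>% select(id, raw_score, all_of(stringr::str_sort(names(df)[-(1:2)], numeric = TRUE)))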
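If you would rather avoid extra packages, the same reordering can be sketched in base R. This assumes every column other than id and raw_score has a name that as.numeric() can parse; fixed_cols and num_cols are illustrative names.

```r
df <- data.frame(id = c(1, 2, 3), raw_score = c(10, 20, 30),
                 a = c(1, 1, 1), b = c(2, 3, 4), c = c(4, 6, 7))
names(df) <- c("id", "raw_score", "2.2", "2.3", "2.1")

fixed_cols <- c("id", "raw_score")
num_cols <- setdiff(names(df), fixed_cols)

# order by numeric value rather than alphabetically
num_cols <- num_cols[order(as.numeric(num_cols))]

df <- df[c(fixed_cols, num_cols)]
names(df)
```

One caveat: as.numeric() reads a section-style name such as "2.10" as 2.1, so this sketch only suits names that are meant as plain decimal numbers.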

Efficient recoding of numeric variables into a factor in a data.frame

When recoding the values of a numeric variable like var1 below into character values, there is sometimes an easy pattern. For example, suppose the numeric values 1:4 in var1 need to be recoded as LETTERS[27-(4:1)], i.e. "W", "X", "Y", "Z", respectively.
In such situations, is it possible to avoid writing var1 = recode(var1, `1`="W", `2`="X", `3`="Y", `4`="Z") and instead loop the recoding?
library(tidyverse)
(dat <- data.frame(var1 = rep(1:4,2), id = 1:8))
mutate(dat, var1 = recode(var1,`1`="W",`2`="X",`3`="Y",`4`="Z")) # This works but can we
# loop it as well?
We can use a vectorized approach, no loops necessary. tail and base subsetting with [ will do the trick here.
library(dplyr)
dat %>% mutate(var1=tail(LETTERS, max(var1))[var1] %>% as.factor)
var1 id
1 W 1
2 X 2
3 Y 3
4 Z 4
5 W 5
6 X 6
7 Y 7
8 Z 8
data
dat <- data.frame(var1 = rep(1:4,2), id = 1:8)
data2
dat2 <- data.frame(var1 = c(2,1,3,1,4:1), id = 1:8)
var1 id
1 2 1
2 1 2
3 3 3
4 1 4
5 4 5
6 3 6
7 2 7
8 1 8
output2
var1 id
1 X 1
2 W 2
3 Y 3
4 W 4
5 Z 5
6 Y 6
7 X 7
8 W 8
You can use -
library(dplyr)
dat %>% mutate(var1 = LETTERS[length(LETTERS)-max(var1) + var1])
# var1 id
#1 W 1
#2 X 2
#3 Y 3
#4 Z 4
#5 W 5
#6 X 6
#7 Y 7
#8 Z 8
You can also just use the labels argument of factor():
library(dplyr)
dat <- data.frame(var1 = rep(1:4,2), id = 1:8) %>%
mutate(var1 = factor(var1, labels = tail(LETTERS, 4)))
dat
var1 id
1 W 1
2 X 2
3 Y 3
4 Z 4
5 W 5
6 X 6
7 Y 7
8 Z 8
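One thing to watch with the labels approach: if some value never occurs in the data, factor() finds fewer levels than labels and stops with an error. A minimal sketch of a more defensive call, assuming the full set of codes is known to be 1:4; dat3 is a made-up example, not data from the question.

```r
# value 3 never occurs in this made-up example
dat3 <- data.frame(var1 = c(1, 2, 4, 4), id = 1:4)

# stating the levels explicitly keeps the code-to-label mapping fixed
dat3$var1 <- factor(dat3$var1, levels = 1:4, labels = tail(LETTERS, 4))

as.character(dat3$var1)
levels(dat3$var1)
```

With levels given, 4 still maps to "Z" even though 3 is absent, and "Y" is kept as an unused level.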

Merging data frames so that values in one data frame are inserted in matching row numbers in another

I want to change the format of a dataset in a certain way. Say I have a list of data indicating when and how many times participants attended counselling sessions. They could attend a maximum of three sessions at any time within a twelve-week period. Say their data is recorded like so:
set.seed(01234)
df1 <- data.frame(id = rep(LETTERS[1:4], each = 3),
session = rep(paste0("session", 1:3), length.out = 12),
week1 = c(sort(sample(1:12, 3, replace = F)),
sort(sample(1:12, 3, replace = F)),
sort(sample(1:12, 3, replace = F)),
sort(sample(1:12, 3, replace = F))))
df1$week1[c(3,8,9,12)] <- NA # insert some NAs representing sessions that weren't attended
And the dataset looks like this
# id session week1
# 1 A session1 2
# 2 A session2 7
# 3 A session3 NA
# 4 B session1 7
# 5 B session2 8
# 6 B session3 10
# 7 C session1 1
# 8 C session2 NA
# 9 C session3 NA
# 10 D session1 6
# 11 D session2 7
# 12 D session3 NA
But I want a long dataset where each person has a row for each of the twelve weeks they could have attended, like so
df2 <- data.frame(id = rep(LETTERS[1:4], each = 12),
week2 = rep(1:12, times = 4))
So participant A's data looks like this
df2[1:12,]
# id week2
# 1 A 1
# 2 A 2
# 3 A 3
# 4 A 4
# 5 A 5
# 6 A 6
# 7 A 7
# 8 A 8
# 9 A 9
# 10 A 10
# 11 A 11
# 12 A 12
I would like to merge the two somehow so that the numbers in the week1 column of df1 are matched to their appropriate row in df2, ideally something like this (example is participant A only)
data.frame(id = rep("A", 12),
week = 1:12,
attended = c(0,1,0,0,0,0,1,0,0,0,0,0))
# id week attended
# 1 A 1 0
# 2 A 2 1
# 3 A 3 0
# 4 A 4 0
# 5 A 5 0
# 6 A 6 0
# 7 A 7 1
# 8 A 8 0
# 9 A 9 0
# 10 A 10 0
# 11 A 11 0
# 12 A 12 0
One approach utilizing a merge:
# merge the 2 dataframes
names(df2)[2] <- "week"
names(df1)[3] <- "week"
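df <- merge(df2, df1, by = c("id", "week"), all.x = TRUE)
# replace 'session' with 1s and 0s
df$session <- as.integer(!is.na(df$session))
Another option in base R skips the merge. It uses the original column names week1 and week2, so run it on the data as first defined, before the renaming above:
do.call(rbind, lapply(split(df2, df2$id), function(x){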
x$attended = as.integer(x$week2 %in% df1$week1[df1$id == x$id[1]])
x
}))
You could expand the original data.frame using tidyr::complete so you don't need to merge, just define week1 as a factor with the correct number of levels:
library(dplyr)
library(tidyr)
df1 %>%
group_by(id) %>%
mutate(week1 = factor(week1, levels = 1:12),
session = !is.na(session)) %>%
complete(week1, fill = list(session = 0))
# A tibble: 52 x 3
# Groups: id [4]
id week1 session
<fct> <fct> <dbl>
1 A 1 0
2 A 2 1
3 A 3 0
4 A 4 0
5 A 5 0
6 A 6 0
7 A 7 1
8 A 8 0
9 A 9 0
10 A 10 0
# ... with 42 more rows
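For comparison, here is a base R sketch that builds the 0/1 indicator by matching on a combined id/week key. The week values are typed in from the printed dataset in the question so that the example does not depend on set.seed(); attended_keys is an illustrative name.

```r
df1 <- data.frame(id = rep(LETTERS[1:4], each = 3),
                  session = rep(paste0("session", 1:3), times = 4),
                  week1 = c(2, 7, NA, 7, 8, 10, 1, NA, NA, 6, 7, NA),
                  stringsAsFactors = FALSE)

# one row per participant and week; week varies fastest
df2 <- expand.grid(week = 1:12, id = LETTERS[1:4],
                   stringsAsFactors = FALSE)[, c("id", "week")]

# keys such as "A 2" for every session that was actually attended
attended_keys <- paste(df1$id, df1$week1)[!is.na(df1$week1)]

# a row of df2 counts as attended if its key is among them
df2$attended <- as.integer(paste(df2$id, df2$week) %in% attended_keys)
df2[df2$id == "A", ]
```

Dropping the NA weeks before matching means unattended sessions never produce a key, so the result has exactly 12 rows per participant.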

Dynamically copy dataframe columns with suffix in R

I am looking for a way to copy a column "col1" x times and appending each of these copies with one of x strings from a character vector. Example:
df <- data.frame(col1 = c(1,2,3,4,5))
suffix <- c("a", "b", "c")
resulting in:
df_suffix <- data.frame(col1 = c(1,2,3,4,5), col1_a = c(1,2,3,4,5), col1_b = c(1,2,3,4,5), col1_c = c(1,2,3,4,5))
col1 col1_a col1_b col1_c
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
You can use paste() to create the new columns inside df, and assign them the values of the first column:
df[,paste(names(df), suffix, sep = "_")] <- df[,1]
# col1 col1_a col1_b col1_c
#1 1 1 1 1
#2 2 2 2 2
#3 3 3 3 3
#4 4 4 4 4
#5 5 5 5 5
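Note that paste(names(df), suffix, sep = "_") relies on df having a single column; with more columns the names and suffixes would be paired up element by element. A small sketch that names the source column explicitly instead, under the assumption that only col1 should be copied; new_names is an illustrative name.

```r
df <- data.frame(col1 = c(1, 2, 3, 4, 5))
suffix <- c("a", "b", "c")

# build the target names from the one column we want to copy
new_names <- paste("col1", suffix, sep = "_")

# add one copy of col1 per new name
for (nm in new_names) {
  df[[nm]] <- df[["col1"]]
}
names(df)
```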
