How to add row and column to a dataframe of different length? - r

I have two dataframes of different length:
Headers <- data.frame(x = paste0("x", 1:4), y = 1:4)
Dataset <- data.frame(H = c(20, 10, 11, 8, 10), W = c(30, 20, 30, 10, 6))
Headers
x y
1 x1 1
2 x2 2
3 x3 3
4 x4 4
Dataset
H W
1 20 30
2 10 20
3 11 30
4 8 10
5 10 6
I need to convert column 'x' from 'Headers' to header, and column 'y' to corresponding values, and then bind to 'Dataset':
H W x1 x2 x3 x4
20 30 1 2 3 4
10 20 1 2 3 4
11 30 1 2 3 4
8 10 1 2 3 4
10 6 1 2 3 4
Here is the code which I tried:
H <- t(Headers)
Dataset <- cbind(H, Dataset)
names(H) <- NULL
Dataset <- qpcR:::cbind.na(H, Dataset)
Any help will be appreciated. Thanks.

Transpose 'y' and repeat to the desired number of rows. Set column names to 'x'.
cbind(Dataset, `colnames<-`(t(Headers$y)[rep(1, nrow(Dataset)), ], Headers$x))
H W x1 x2 x3 x4
1 20 30 1 2 3 4
2 10 20 1 2 3 4
3 11 30 1 2 3 4
4 8 10 1 2 3 4
5 10 6 1 2 3 4
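For readers who find the one-liner dense, the same idea can be spelled out step by step. This is only a sketch in base R; the intermediate names one_row and repeated are made up for illustration and do not come from the answer above.

```r
Headers <- data.frame(x = paste0("x", 1:4), y = 1:4)
Dataset <- data.frame(H = c(20, 10, 11, 8, 10), W = c(30, 20, 30, 10, 6))

# Put the y values into a single row (a 1 x 4 matrix)
one_row <- matrix(Headers$y, nrow = 1)

# Repeat that row once per row of Dataset; drop = FALSE keeps it a matrix
repeated <- one_row[rep(1, nrow(Dataset)), , drop = FALSE]

# Use the x values as column names (as.character guards against factors)
colnames(repeated) <- as.character(Headers$x)

result <- cbind(Dataset, repeated)
result
```

Because the number of repetitions comes from nrow(Dataset), the two data frames do not need to have the same length.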

A data.table approach:
library(data.table)
# data.table's dcast expects a data.table, so convert Headers first
cbind(Dataset, as.data.frame(dcast(as.data.table(Headers), . ~ x, value.var = "y"))[, -1])
Output:
H W x1 x2 x3 x4
1 20 30 1 2 3 4
2 10 20 1 2 3 4
3 11 30 1 2 3 4
4 8 10 1 2 3 4
5 10 6 1 2 3 4
A tidyverse approach:
library(tidyverse)
Headers %>%
rownames_to_column %>%
spread(x, y) %>%
summarise_all(~ first(na.omit(.))) %>%
cbind(Dataset, .) %>% select(-rowname)
Output:
H W x1 x2 x3 x4
1 20 30 1 2 3 4
2 10 20 1 2 3 4
3 11 30 1 2 3 4
4 8 10 1 2 3 4
5 10 6 1 2 3 4

You could also go with base R (setNames gives the new columns the names from 'x'):
cbind(Dataset, setNames(data.frame(matrix(rep(Headers$y, each = nrow(Dataset)), nrow = nrow(Dataset))), Headers$x))

Related

How to count the number of occurrences of a given value for each row?

I'm sure this is a really easy fix but I can't seem to find the answer... I am trying to create a column at the end of my dataframe that is a sum of the number of times a specific value (say "1") appears across that row. So for example, if I started with the following dataframe:
id <- 1:6
X1 <- c(5,1,7,8,1,5)
X2 <- c(5,0,0,2,3,7)
X3 <- c(6,2,3,4,1,7)
X4 <- c(1,1,5,2,1,7)
df <- data.frame(id,X1,X2,X3,X4)
id X1 X2 X3 X4
1 1 5 5 6 1
2 2 1 0 1 1
3 3 7 0 3 5
4 4 8 2 4 2
5 5 1 3 2 1
6 6 5 7 7 7
and I was trying to identify how many times the value "1" appears across that row, I would want the output to look like this:
id X1 X2 X3 X4 one_appears
1 1 5 5 6 1 2
2 2 1 0 1 1 3
3 3 7 0 3 5 0
4 4 8 2 4 2 0
5 5 1 3 2 1 2
6 6 5 7 7 7 0
Thanks very much in advance!
library(tidyverse)
df %>%
mutate(
one = rowSums(across(everything(), ~ .x == 1))
)
# A tibble: 6 × 6
id X1 X2 X3 X4 one
<int> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 5 5 6 1 2
2 2 1 0 2 1 2
3 3 7 0 3 5 0
4 4 8 2 4 2 0
5 5 1 3 1 1 3
6 6 5 7 7 7 0
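EDIT: everything() also checks the id column, so an id of 1 is counted as well. To count only in the X columns, restrict the selection, for example with starts_with() or a column range: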
df %>%
mutate(
one = rowSums(across(starts_with("X"), ~ .x == 1))
)
df %>%
mutate(
one = rowSums(across(X1:X4, ~ .x == 1))
)
We can use rowSums on a logical matrix
df$one_appears <- rowSums(df == 1, na.rm = TRUE)
-output
> df
id X1 X2 X3 X4 one_appears
1 1 5 5 6 1 2
2 2 1 0 1 1 3
3 3 7 0 3 5 0
4 4 8 2 4 2 0
5 5 1 3 2 1 2
6 6 5 7 7 7 0
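Note that df == 1 compares every column, id included, so a row whose id is 1 gets one extra count. Below is a minimal sketch, assuming the data from the question's code and assuming that only the columns whose names start with X should be checked. The names all_cols, x_cols and is_one are illustrative.

```r
df <- data.frame(
  id = 1:6,
  X1 = c(5, 1, 7, 8, 1, 5),
  X2 = c(5, 0, 0, 2, 3, 7),
  X3 = c(6, 2, 3, 4, 1, 7),
  X4 = c(1, 1, 5, 2, 1, 7)
)

# Counting over all columns: id == 1 in the first row is counted too
all_cols <- rowSums(df == 1, na.rm = TRUE)

# Counting over the X columns only
x_cols <- grep("^X", names(df), value = TRUE)
is_one <- df[x_cols] == 1          # logical matrix, TRUE where the value is 1
df$one_appears <- rowSums(is_one, na.rm = TRUE)

df
```

The first row shows the difference: it contains a single 1 among X1 to X4, but the count over all columns is 2 because id is also 1.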
Another option using apply with sum:
id <- c(1:6)
X1 <- c(5,1,7,8,1,5)
X2 <- c(5,0,0,2,3,7)
X3 <- c(6,2,3,4,1,7)
X4 <- c(1,1,5,2,1,7)
df <- data.frame(id,X1,X2,X3,X4)
df$one_appear = apply(df, 1, \(x) sum(x == 1))
df
#> id X1 X2 X3 X4 one_appear
#> 1 1 5 5 6 1 2
#> 2 2 1 0 2 1 2
#> 3 3 7 0 3 5 0
#> 4 4 8 2 4 2 0
#> 5 5 1 3 1 1 3
#> 6 6 5 7 7 7 0
Created on 2023-01-18 with reprex v2.0.2
This may not be the best approach, but it is an alternative I tried, so I thought I would share it.
Code:
library(dplyr)
X1 <- c(5,1,7,8,1,5)
X2 <- c(5,0,0,2,3,7)
X3 <- c(6,2,3,4,1,7)
X4 <- c(1,1,5,2,1,7)
df <- data.frame(X1,X2,X3,X4) %>% rowwise %>%
mutate(across(starts_with('X'), function(x) ifelse(x==1,1,NA), .names = 'Y_{col}'),
one_appears=sum(across(starts_with('Y')), na.rm = TRUE)
)

How can I create rank variables for each other variables in R?

Hello dear community members.
I'm trying to create ranking variables for certain variables in R. For example I want to transform this data frame
> df
X1 X2 X3 X4 X5
1 1 4 7 3 2
2 2 5 8 4 3
3 3 6 3 5 4
4 4 1 2 6 5
5 5 2 1 7 6
into
> df
X1 X2 X3 X4 X5 x1_rank x2_rank x3_rank
1 1 4 7 3 2 3 2 1
2 2 5 8 4 3 3 2 1
3 3 6 3 5 4 3 1 3
4 4 1 2 6 5 1 3 2
5 5 2 1 7 6 1 2 3
like this (select X1~X3, and make ranking variables between them).
I tried this code
for (i in 1:nrow(df)) {
df_rank <- df[i, ] %>%
dplyr::select(X1, X2, X3, X4) %>%
base::rank()
}
I imagine I can solve this with a for loop, but I'm a beginner in R and I do not understand why this doesn't work.
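One way to achieve this is to rank the negated values, so that the largest value gets rank 1, and to control how ties are handled with the ties.method argument of rank() (abbreviated to ties below).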
df <- tibble::tribble(
~x1, ~x2, ~x3, ~x4, ~x5,
1,4,7,3,2,
2,5,8,4,3,
3,6,3,5,4,
4,1,2,6,5,
5,2,1,7,6
)
library(magrittr)
df %>%
cbind(
t(apply(-df[,1:3], 1, rank, ties = "min")) %>% {colnames(.) <- paste0(colnames(.), "_rank"); .}
)
x1 x2 x3 x4 x5 x1_rank x2_rank x3_rank
1 1 4 7 3 2 3 2 1
2 2 5 8 4 3 3 2 1
3 3 6 3 5 4 2 1 2
4 4 1 2 6 5 1 3 2
5 5 2 1 7 6 1 2 3
As to why your code does not work: the for loop does not return anything. It only overwrites the variable df_rank on every iteration, so nothing is ever attached to df. To fix it, you could create an object before the loop, fill in one row per iteration, and finally bind that object to the original data.
m <- matrix(ncol = 3, nrow = 5)
for (i in 1:nrow(df)) {
m[i,] <- -df[i, ] %>%
dplyr::select(x1, x2, x3) %>%
base::rank(ties = "min")
}
colnames(m) <- paste0(names(df)[1:3], "_rank")
df %>% bind_cols(m)
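One detail worth checking: in the question's desired output the tied values in row 3 both get rank 3, whereas ties = "min" gives them rank 2. A sketch in base R that appears to reproduce the desired output, assuming the tied values should share the largest rank (ties.method = "max"); the names cols, ranks and result are illustrative:

```r
df <- data.frame(
  X1 = c(1, 2, 3, 4, 5),
  X2 = c(4, 5, 6, 1, 2),
  X3 = c(7, 8, 3, 2, 1),
  X4 = c(3, 4, 5, 6, 7),
  X5 = c(2, 3, 4, 5, 6)
)

cols <- c("X1", "X2", "X3")

# Rank the negated values row by row so the largest value gets rank 1;
# apply() returns one column per row, hence the transpose
ranks <- t(apply(-as.matrix(df[cols]), 1, rank, ties.method = "max"))
colnames(ranks) <- paste0(tolower(cols), "_rank")

result <- cbind(df, ranks)
result
```

Which ties.method is right depends on how ties should be reported; "min" and "max" only differ in rows that contain equal values.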

derive multiple columns from multiple columns in r

Consider the data below. We would like to derive the variables z1, z2 and z3 as x1*y1, x2*y2 and x3*y3.
Could you please help me achieve this in R?
x1 <- c(1,2,3,4,5,6)
x2 <- c(2,3,4,5,6,7)
x3 <- c(3,4,5,6,7,8)
x4 <- c('A','B','C','D','E','F')
y1 <- c(1,2,3,4,5,6)
y2 <- c(2,3,4,5,6,7)
y3 <- c(3,4,5,6,7,8)
testa <- data.frame(x1,x2,x3,x4,y1,y2,y3)
Assuming the integrity of your structure and naming conventions, you can select the x and y variables, multiply them together as a group, and then assign the result back to z.
var_i <- 1:3
testa[paste0("z", var_i)] <- testa[paste0("x", var_i)] * testa[paste0("y", var_i)]
x1 x2 x3 x4 y1 y2 y3 z1 z2 z3
1 1 2 3 A 1 2 3 1 4 9
2 2 3 4 B 2 3 4 4 9 16
3 3 4 5 C 3 4 5 9 16 25
4 4 5 6 D 4 5 6 16 25 36
5 5 6 7 E 5 6 7 25 36 49
6 6 7 8 F 6 7 8 36 49 64
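If the index 1:3 should not be hard-coded, the suffixes can be read from the column names instead. This is a sketch under the assumption that every y column is named "y" followed by digits and has a matching x column; the name suffixes is illustrative.

```r
testa <- data.frame(
  x1 = c(1, 2, 3, 4, 5, 6),
  x2 = c(2, 3, 4, 5, 6, 7),
  x3 = c(3, 4, 5, 6, 7, 8),
  x4 = c("A", "B", "C", "D", "E", "F"),
  y1 = c(1, 2, 3, 4, 5, 6),
  y2 = c(2, 3, 4, 5, 6, 7),
  y3 = c(3, 4, 5, 6, 7, 8)
)

# Take the numeric suffixes from the y columns: "1", "2", "3"
# (x4 is skipped because there is no y4)
suffixes <- sub("^y", "", grep("^y[0-9]+$", names(testa), value = TRUE))

# Build each z column from the x and y columns sharing a suffix
for (s in suffixes) {
  testa[[paste0("z", s)]] <- testa[[paste0("x", s)]] * testa[[paste0("y", s)]]
}

testa
```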
If we want to do this automatically, a tidyverse option is
library(dplyr)
library(stringr)
testa <- testa %>%
mutate(across(x1:x3, ~ .x * get(str_replace(cur_column(), "x",
"y")), .names = "{str_replace(.col, 'x', 'z')}"))
-output
testa
x1 x2 x3 x4 y1 y2 y3 z1 z2 z3
1 1 2 3 A 1 2 3 1 4 9
2 2 3 4 B 2 3 4 4 9 16
3 3 4 5 C 3 4 5 9 16 25
4 4 5 6 D 4 5 6 16 25 36
5 5 6 7 E 5 6 7 25 36 49
6 6 7 8 F 6 7 8 36 49 64

Get all combinations of a variable and their corresponding values in a grouped data set

My data looks like this:
mydata <- data.frame(id = c(1,1,1,2,2,3,3,3,3),
subid = c(1,2,3,1,2,1,2,3,4),
time = c(16, 18, 20, 10, 11, 7, 9, 10, 11))
id subid time
1 1 1 16
2 1 2 18
3 1 3 20
4 2 1 10
5 2 2 11
6 3 1 7
7 3 2 9
8 3 3 10
9 3 4 11
My goal is to transform the data to:
newdata <- data.frame(id = c(1,1,1,2,3,3,3,3,3,3),
subid.1 = c(1,1,2,1,1,1,1,2,2,3),
subid.2 = c(2,3,3,2,2,3,4,3,4,4),
time.1 = c(16,16,18,10,7,7,7,9,9,10),
time.2 = c(18,20,20,11,9,10,11,10,11,11))
id subid.1 subid.2 time.1 time.2
1 1 1 2 16 18
2 1 1 3 16 20
3 1 2 3 18 20
4 2 1 2 10 11
5 3 1 2 7 9
6 3 1 3 7 10
7 3 1 4 7 11
8 3 2 3 9 10
9 3 2 4 9 11
10 3 3 4 10 11
So it's not a simple reshape from long-to-wide procedure: The idea is, within groups defined by id, to take all possible combinations of
subid's and their corresponding time values, and get those into a wide format.
I know I can get all possible combinations using, for example gtools::combinations. The first group consists of 3 rows, so
gtools::combinations(n=3, r=2)
gives me the matrix of the new subid.1 and subid.2 pair for group id==1:
[,1] [,2]
[1,] 1 2
[2,] 1 3
[3,] 2 3
But then I don't know how to proceed (neither to reshape the group with id==1 to this format, nor how to do that separately for each group). Thank you!
With base R:
subset(merge(mydata, mydata, by="id", suffixes=c(".1",".2")), subid.1 < subid.2)
# id subid.1 time.1 subid.2 time.2
# 1 1 1 16 2 18
# 2 1 1 16 3 20
# 3 1 2 18 3 20
# 4 2 1 10 2 11
# 5 3 1 7 2 9
# 6 3 1 7 3 10
# 7 3 1 7 4 11
# 8 3 2 9 3 10
# 9 3 2 9 4 11
# 10 3 3 10 4 11
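The self-join relies on subid being ordered within each id. An alternative sketch that follows the combinations idea from the question, but uses combn from base R on row positions within each group. It assumes every id has at least two rows; pairs_for_group is a hypothetical helper name.

```r
mydata <- data.frame(
  id    = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  subid = c(1, 2, 3, 1, 2, 1, 2, 3, 4),
  time  = c(16, 18, 20, 10, 11, 7, 9, 10, 11)
)

# Build all row pairs for one group and look up subid and time for each side
pairs_for_group <- function(g) {
  idx <- combn(nrow(g), 2)  # 2 x k matrix; each column is one pair of rows
  data.frame(
    id      = g$id[idx[1, ]],
    subid.1 = g$subid[idx[1, ]],
    subid.2 = g$subid[idx[2, ]],
    time.1  = g$time[idx[1, ]],
    time.2  = g$time[idx[2, ]]
  )
}

# Apply the helper to each id and stack the results
newdata <- do.call(rbind, lapply(split(mydata, mydata$id), pairs_for_group))
rownames(newdata) <- NULL
newdata
```

The columns come out in the order shown in the question's target data frame, so no reordering step is needed afterwards.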
dplyr version:
mydata %>% inner_join(.,.,by="id",suffix=c(".1",".2")) %>% filter(subid.1 < subid.2)
data.table version:
library(data.table)
setDT(mydata)
mydata[mydata, on="id", allow.cartesian=TRUE][subid < i.subid]
# id subid time i.subid i.time
# 1: 1 1 16 2 18
# 2: 1 1 16 3 20
# 3: 1 2 18 3 20
# 4: 2 1 10 2 11
# 5: 3 1 7 2 9
# 6: 3 1 7 3 10
# 7: 3 2 9 3 10
# 8: 3 1 7 4 11
# 9: 3 2 9 4 11
# 10: 3 3 10 4 11
Or, to get your column names right (though it kills the fun of a short solution :)):
merge(mydata, mydata, by="id", suffixes=c(".1",".2"), allow.cartesian=TRUE)[subid.1 < subid.2]
Forgot to state that I came up with this rather lame 4-step solution:
step1 <- lapply(unique(mydata$id), function(x) {
nrows <- nrow(mydata[which(mydata$id == x), ])
combos <- gtools::combinations(n=nrows, r=2)
return(as.data.frame(cbind(x, combos)))
})
step2 <- dplyr::bind_rows(step1)
step3a <- merge(step2, mydata, by.x = c("x", "V2"), by.y = c("id", "subid"))
step3b <- merge(step3a, mydata, by.x = c("x", "V3"), by.y = c("id", "subid"))
step4 <- step3b[, c(1, 3, 2, 4, 5)]
names(step4) <- c("id", "subid.1", "subid.2", "time.1", "time.2")
It's ugly but works.
Using the data.table-package:
library(data.table)
setDT(mydata)[, .(subid = c(t(combn(subid, 2)))), by = id
][, grp := rep(1:2, each = .N/2), by = id
][mydata, on = .(id, subid), time := time
][, dcast(.SD, id + rowid(grp) ~ grp, value.var = list('subid','time'), sep = '.')]
which gives you:
id grp subid.1 subid.2 time.1 time.2
1: 1 1 1 2 16 18
2: 1 2 1 3 16 20
3: 1 3 2 3 18 20
4: 2 4 1 2 10 11
5: 3 5 1 2 7 9
6: 3 6 1 3 7 10
7: 3 7 1 4 7 11
8: 3 8 2 3 9 10
9: 3 9 2 4 9 11
10: 3 10 3 4 10 11

Change variable value-- repeated IDs

I've this data set
id <- c(0,0,1,1,2,2,3,3,4,4)
gender <- c("m","m","f","f","f","f","m","m","m","m")
x1 <-c(1,1,1,1,2,2,3,3,10,10)
x2 <- c(3,7,5,6,9,15,10,15,12,20)
alldata <- data.frame(id,gender,x1,x2)
which looks like:
id gender x1 x2
0 m 1 3
0 m 1 7
1 f 1 5
1 f 1 6
2 f 2 9
2 f 2 15
3 m 3 10
3 m 3 15
4 m 10 12
4 m 10 20
Notice that for each unique id the x1 values are identical, but the x2 values differ. I need to sort the data by id and x2 (from smallest to largest),
and then, for each unique id, set x1 of the second record to x2 of the first record.
The data would look like:
id gender x1 x2
0 m 1 3
0 m 3 7
1 f 1 5
1 f 5 6
2 f 2 9
2 f 9 15
3 m 3 10
3 m 10 15
4 m 10 12
4 m 12 20
I found this easier using data.table
> library(data.table)
> dt = data.table(alldata)
> setkey(dt, id, x2) #sort the data
The next line says: within each id, keep the first value of x1, then fill the remaining rows with the x2 values shifted down by one row; [1:.N] trims the result to the group size.
> dt[,x1 := c(x1[1], x2)[1:.N],keyby=id]
> dt
id gender x1 x2
1: 0 m 1 3
2: 0 m 3 7
3: 1 f 1 5
4: 1 f 5 6
5: 2 f 2 9
6: 2 f 9 15
7: 3 m 3 10
8: 3 m 10 15
9: 4 m 10 12
10: 4 m 12 20
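The same "shift x2 down by one row within each id" idea can be sketched in base R with ave. This assumes x2 contains no missing values, since NA is used here to mark the first row of each group; prev_x2 is an illustrative name.

```r
id     <- c(0, 0, 1, 1, 2, 2, 3, 3, 4, 4)
gender <- c("m", "m", "f", "f", "f", "f", "m", "m", "m", "m")
x1     <- c(1, 1, 1, 1, 2, 2, 3, 3, 10, 10)
x2     <- c(3, 7, 5, 6, 9, 15, 10, 15, 12, 20)
alldata <- data.frame(id, gender, x1, x2)

# Sort by id, then by x2 within id
alldata <- alldata[order(alldata$id, alldata$x2), ]

# For each id, the previous row's x2 (NA on the first row of the group)
prev_x2 <- ave(alldata$x2, alldata$id,
               FUN = function(v) c(NA, head(v, -1)))

# Keep x1 on the first row of each group, otherwise use the previous x2
alldata$x1 <- ifelse(is.na(prev_x2), alldata$x1, prev_x2)
alldata
```

Like the data.table line, this does not depend on each id having exactly two records.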
Here's another possible solution using seq to select every other record (this assumes exactly two records per id):
alldata <- alldata[order(alldata$id, alldata$x2),]
alldata$x1[seq(2, length(alldata$x1), 2)] <- alldata$x2[seq(1, length(alldata$x2) - 1, 2)]
Here is a dplyr solution.
library(dplyr)
arrange(alldata,id,x2) %>%
group_by(id) %>%
mutate(x1= c(first(x1), first(x2)))
Source: local data frame [10 x 4]
Groups: id
id gender x1 x2
1 0 m 1 3
2 0 m 3 7
3 1 f 1 5
4 1 f 5 6
5 2 f 2 9
6 2 f 9 15
7 3 m 3 10
8 3 m 10 15
9 4 m 10 12
10 4 m 12 20
`rownames<-`(do.call(rbind,by(alldata,alldata$id,function(g) { o <- order(g$x2); g$x1[o[2]] <- g$x2[o[1]]; g; })),NULL);
## id gender x1 x2
## 1 0 m 1 3
## 2 0 m 3 7
## 3 1 f 1 5
## 4 1 f 5 6
## 5 2 f 2 9
## 6 2 f 9 15
## 7 3 m 3 10
## 8 3 m 10 15
## 9 4 m 10 12
## 10 4 m 12 20
