Rename columns in nested lists and row bind - r

I have a list of data frames. I'd like to first rename some columns and then row bind the data frames, keeping only some of the columns.
In the example below, I'd like to rename column A to a in the second data frame and w to x in the third, then row bind all three data frames, selecting only columns a and x.
Data:
df <- list(structure(list(a = 1:3,
x = c(-1.99, -1.11, -0.34),
y = c("C", "B", "A")), .Names = c("a", "x", "y"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -3L)), structure(list(a = 1:3, x = c(-0.44, -1.07, -0.23)), .Names = c("A", "x"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -3L)),
structure(list(a = 1:3, x = c(-0.62, -0.60, -0.06),
y = c(3L, 2L, 1L)), .Names = c("a", "w", "y"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -3L)))
List structure:
> lapply(df, names)
[[1]]
[1] "a" "x" "y"
[[2]]
[1] "A" "x"
[[3]]
[1] "a" "w" "y"
Then row bind them:
library(plyr)
df2 <- ldply(df, data.frame)

Using purrr (map_df), dplyr (rename, select) and magrittr (%<>%, %>%):
library(purrr)
library(dplyr)
library(magrittr)
df[[2]] %<>% rename(.,a = A)
df[[3]] %<>% rename(.,x = w)
df %>% map_df(. %>% select("a","x"))
# # A tibble: 9 x 2
# a x
# <int> <dbl>
# 1 1 -1.99
# 2 2 -1.11
# 3 3 -0.34
# 4 1 -0.44
# 5 2 -1.07
# 6 3 -0.23
# 7 1 -0.62
# 8 2 -0.60
# 9 3 -0.06
Or in base R:
names(df[[2]])[names(df[[2]]) == "A"] <- "a"
names(df[[3]])[names(df[[3]]) == "w"] <- "x"
do.call(rbind,lapply(df,"[",c("a","x")))

You could achieve that with:
library(plyr)
df <- lapply(df, function(x) plyr::rename(x, c("A" = "a", "w" = "x"), warn_missing = FALSE))
df2 <- ldply(lapply(df, function(x) {x[,c("a","x")]}), data.frame)
Output:
a x
1 1 -1.99
2 2 -1.11
3 3 -0.34
4 1 -0.44
5 2 -1.07
6 3 -0.23
7 1 -0.62
8 2 -0.60
9 3 -0.06
Hope this helps.

Another idea could be to create a named vector v with the replacement values, loop over your list, rename if there is a match and select the desired columns.
v <- c("a" = "A", "x" = "w")
map_df(df, .f = ~ rename_if(
.x,
.p = names(.x) %in% v,
.f = funs(stringi::stri_replace_all_fixed(., v, names(v), vectorize_all = FALSE))) %>%
select(names(v))
)
Which gives:
## A tibble: 9 x 2
# a x
# <int> <dbl>
#1 1 -1.99
#2 2 -1.11
#3 3 -0.34
#4 1 -0.44
#5 2 -1.07
#6 3 -0.23
#7 1 -0.62
#8 2 -0.60
#9 3 -0.06
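The named-vector idea above can also be sketched without funs(), which is deprecated in recent dplyr releases. This is a minimal sketch assuming dplyr 1.0 or later, where rename() accepts a named character vector (new name = old name) wrapped in any_of(); the data is rebuilt with tibble() only to keep the example self-contained, and res is a name introduced here for illustration.

```r
library(dplyr)
library(purrr)

# Same data as in the question, rebuilt so the example runs on its own
df <- list(
  tibble(a = 1:3, x = c(-1.99, -1.11, -0.34), y = c("C", "B", "A")),
  tibble(A = 1:3, x = c(-0.44, -1.07, -0.23)),
  tibble(a = 1:3, w = c(-0.62, -0.60, -0.06), y = c(3L, 2L, 1L))
)

# Lookup vector: names are the new column names, values are the old ones
v <- c(a = "A", x = "w")

res <- df %>%
  # any_of() renames only the old names that exist in each data frame
  map(~ .x %>% rename(any_of(v)) %>% select(all_of(names(v)))) %>%
  bind_rows()

res
```

Because any_of() silently skips names that are absent, the same call works on every element of the list, including the first one, which needs no renaming.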

Related

How to iteratively mutate every column in a tibble based on an operation with other tibble?

I have the following data frame:
library(tidyverse)
dat <- structure(list(residue = c("A", "R", "N"), PP1 = c(-0.96, 0.8,
0.82), KF2 = c(-1.67, 1.27, -0.07)), row.names = c(NA, -3L), class = c("tbl_df",
"tbl", "data.frame"))
It looks like this:
> dat
# A tibble: 3 × 3
residue PP1 KF2
<chr> <dbl> <dbl>
1 A -0.96 -1.67
2 R 0.8 1.27
3 N 0.82 -0.07
What I want to do is multiply every column other than residue by the weight of the matching residue in this tibble:
weight_dat <-structure(list(residue = c("A", "N", "R"), weight = c(2, 1, 2
)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-3L))
> weight_dat
# A tibble: 3 × 2
residue weight
<chr> <dbl>
1 A 2
2 N 1
3 R 2
Resulting in
residue PP1 KF2
1 A (-0.96*2)=-1.92 (-1.67*2) = -3.34
2 R (0.8*2)=1.6 (1.27*2) = 2.54
3 N (0.82*1)=0.82 (-0.07*1) = -0.07
In reality, dat has 3 rows and thousands of columns.
With match + *:
w <- weight_dat$weight[match(dat$residue, weight_dat$residue)]
cbind(dat[1], dat[-1] * w)
residue PP1 KF2
1 A -1.92 -3.34
2 R 1.60 2.54
3 N 0.82 -0.07
dplyr option:
library(dplyr)
dat %>%
mutate(across(-1, ~ .x * weight_dat$weight[match(dat$residue, weight_dat$residue)]))
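Since the two tibbles list the residues in a different order, the weights have to be aligned by residue before multiplying. As an alternative to match(), the alignment can be sketched with a join. This is a minimal sketch assuming dat has no column already called weight; res is a name introduced here for illustration.

```r
library(dplyr)

dat <- tibble(
  residue = c("A", "R", "N"),
  PP1 = c(-0.96, 0.8, 0.82),
  KF2 = c(-1.67, 1.27, -0.07)
)
weight_dat <- tibble(residue = c("A", "N", "R"), weight = c(2, 1, 2))

res <- dat %>%
  # attach the weight that belongs to each residue, whatever the row order
  left_join(weight_dat, by = "residue") %>%
  # multiply every value column by the row's weight
  mutate(across(-c(residue, weight), ~ .x * weight)) %>%
  # drop the helper column again
  select(-weight)

res
```

The join keeps the row order of dat, so the result lines up with the original tibble.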

Verifying whether at least two columns have the same value in a specific row

I have a data set and I want to check whether all my variables have unique values in a specific row.
Let's say I want to analyze row D.
My data:
Name F S T
A 1 2 3
B 2 3 4
C 3 4 5
D 4 5 6
> TRUE (because all three variables have unique values in row D)
Second example
Name F S T
A 1 2 3
B 2 3 4
C 3 4 5
D 4 5 4
> FALSE (because F and T have the same value in row D)
In base R do
f1 <- function(dat, ind) {
tmp <- unlist(dat[ind, -1])
length(unique(tmp)) == length(tmp)
}
Testing:
> f1(df, 4)
[1] TRUE
> f1(df1, 4)
[1] FALSE
data
df <- structure(list(Name = c("A", "B", "C", "D"), F = 1:4, S = 2:5,
T = 3:6), class = "data.frame", row.names = c(NA, -4L))
df1 <- structure(list(Name = c("A", "B", "C", "D"), F = 1:4, S = 2:5,
T = c(3L, 4L, 5L, 4L)), class = "data.frame", row.names = c(NA,
-4L))
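If the check is needed for every row rather than a single index, the same idea can be sketched with apply() and anyDuplicated(). The helper name row_all_unique is hypothetical, and the sketch assumes the first column holds the row label and all remaining columns are the values to compare.

```r
df <- data.frame(Name = c("A", "B", "C", "D"), F = 1:4, S = 2:5, T = 3:6)
df1 <- data.frame(Name = c("A", "B", "C", "D"), F = 1:4, S = 2:5,
                  T = c(3L, 4L, 5L, 4L))

# Hypothetical helper: TRUE for each row whose values (all columns
# except the first) contain no duplicates
row_all_unique <- function(dat) {
  vals <- as.matrix(dat[-1])
  apply(vals, 1, function(r) anyDuplicated(r) == 0L)
}

res <- setNames(row_all_unique(df), df$Name)
res1 <- setNames(row_all_unique(df1), df1$Name)

res1[["D"]]
```

Indexing the result by name, as in res1[["D"]], then gives the answer for any single row.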
You can use dplyr for this, restricting the check to the row of interest:
library(dplyr)
df %>%
filter(Name == "D") %>%
select(-Name) %>%
unlist() %>%
n_distinct() == ncol(df) - 1

Merging data frame and filling missing values [duplicate]

I want to merge the following 3 data frames and fill the missing values with -1. I think I should use the function merge(), but I don't know exactly how to do it.
> df1
Letter Values1
1 A 1
2 B 2
3 C 3
> df2
Letter Values2
1 A 0
2 C 5
3 D 9
> df3
Letter Values3
1 A -1
2 D 5
3 B -1
The desired output would be:
Letter Values1 Values2 Values3
1 A 1 0 -1
2 B 2 -1 -1 # fill missing values with -1
3 C 3 5 -1
4 D -1 9 5
Data:
> dput(df1)
structure(list(Letter = structure(1:3, .Label = c("A", "B", "C"
), class = "factor"), Values1 = c(1, 2, 3)), class = "data.frame", row.names = c(NA,
-3L))
> dput(df2)
structure(list(Letter = structure(1:3, .Label = c("A", "C", "D"
), class = "factor"), Values2 = c(0, 5, 9)), class = "data.frame", row.names = c(NA,
-3L))
> dput(df3)
structure(list(Letter = structure(c(1L, 3L, 2L), .Label = c("A",
"B", "D"), class = "factor"), Values3 = c(-1, 5, -1)), class = "data.frame", row.names = c(NA,
-3L))
You can get data frames in a list and use merge with Reduce. Missing values in the new dataframe can be replaced with -1.
new_df <- Reduce(function(x, y) merge(x, y, all = TRUE), list(df1, df2, df3))
new_df[is.na(new_df)] <- -1
new_df
# Letter Values1 Values2 Values3
#1 A 1 0 -1
#2 B 2 -1 -1
#3 C 3 5 -1
#4 D -1 9 5
A tidyverse way with the same logic (replace_na() comes from tidyr):
library(dplyr)
library(purrr)
library(tidyr)
list(df1, df2, df3) %>%
reduce(full_join, by = "Letter") %>%
mutate(across(where(is.numeric), ~ replace_na(.x, -1)))
Here's a dplyr solution (replace_na() comes from tidyr):
df1 %>%
full_join(df2, by = "Letter") %>%
full_join(df3, by = "Letter") %>%
mutate_if(is.numeric, function(x) replace_na(x, -1))
output:
Letter Values1 Values2 Values3
<chr> <dbl> <dbl> <dbl>
1 A 1 0 -1
2 B 2 -1 -1
3 C 3 5 -1
4 D -1 9 5

How to calculate the correlation between one row and the remaining rows of a data frame

I have data like this:
name col1 col2 col3
1 a 43.78 43.80 43.14
2 b 43.84 43.40 42.85
3 c 37.92 37.64 37.54
4 d 31.72 31.62 31.74
Let's call it df:
df<-structure(list(name = structure(1:4, .Label = c("a", "b", "c",
"d"), class = "factor"), col1 = c(43.78, 43.84, 37.92, 31.72),
col2 = c(43.8, 43.4, 37.64, 31.62), col3 = c(43.14, 42.85,
37.54, 31.74)), class = "data.frame", row.names = c(NA, -4L
))
Now I want to calculate the R2 and adjusted R2 between row d and the other rows.
If I want to see all combinations, I can do the following for the correlation:
out <- cor(t(df[, -1]))
out[upper.tri(out, diag = TRUE)] <- NA
rownames(out) <- colnames(out) <- df$name
out <- na.omit(reshape::melt(t(out)))
out <- out[ order(out$X1, out$X2), ]
which gives me this:
X1 X2 value
5 a b 0.8841255
9 a c 0.6842705
13 a d -0.6491118
10 b c 0.9457125
14 b d -0.2184630
15 c d 0.1105508
But I only want the values between row d and the rest, and I want both the correlation coefficient and the adjusted R2.
It's easier if you transpose your data frame first. After that, use purrr::map and broom::tidy to get the job done:
library(tidyverse)
df <- structure(list(name = structure(1:4, .Label = c("a", "b", "c",
"d"), class = "factor"), col1 = c(43.78, 43.84, 37.92, 31.72),
col2 = c(43.8, 43.4, 37.64, 31.62), col3 = c(43.14, 42.85,
37.54, 31.74)), class = "data.frame", row.names = c(NA, -4L
))
# transpose df
df_transpose <- df %>%
gather(variable, value, -name) %>%
spread(name, value) %>%
select(-variable)
# loop through columns, apply `cor` vs 'd' column
colnames(df_transpose) %>%
set_names() %>%
map(~ cor(df_transpose[, .x], df_transpose[, 'd'])) %>%
map_dfr(., broom::tidy, .id = "var")
#> # A tibble: 4 x 2
#> var x
#> <chr> <dbl>
#> 1 a -0.649
#> 2 b -0.218
#> 3 c 0.111
#> 4 d 1
Created on 2019-03-15 by the reprex package (v0.2.1.9000)
If I understand you right, you want the correlation between d and every single remaining column.
(M <- t(as.matrix(`rownames<-`(df1[-1], df1$name))))
# a b c d
# col1 43.78 43.84 37.92 31.72
# col2 43.80 43.40 37.64 31.62
# col3 43.14 42.85 37.54 31.74
Due to vectorization we can calculate the correlation between d and the remainder very easily:
out <- t(cor(M[, 4], M[, -4]))
For a regression with a single predictor, the R2 is just the square of the correlation (Ref.), which we can cbind to the correlations.
`colnames<-`(cbind(out, out^2), c("cor", "r2"))
# cor r2
# a -0.6491118 0.42134617
# b -0.2184630 0.04772607
# c 0.1105508 0.01222148
(Note: In case you're wondering about the `colnames<-` form, you may want to read "Advanced R: 6.8.4 Replacement functions".)
Data
df1 <- structure(list(name = structure(1:4, .Label = c("a", "b", "c",
"d"), class = "factor"), col1 = c(43.78, 43.84, 37.92, 31.72),
col2 = c(43.8, 43.4, 37.64, 31.62), col3 = c(43.14, 42.85,
37.54, 31.74)), class = "data.frame", row.names = c(NA, -4L
))
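The question also asks for the adjusted R2, which the answer above does not compute. A minimal sketch, assuming the usual adjustment 1 - (1 - R2) * (n - 1) / (n - p - 1) with n = 3 observations (col1 to col3) and p = 1 predictor; the names r, r2, adj_r2 and res are introduced here for illustration.

```r
df1 <- data.frame(
  name = c("a", "b", "c", "d"),
  col1 = c(43.78, 43.84, 37.92, 31.72),
  col2 = c(43.8, 43.4, 37.64, 31.62),
  col3 = c(43.14, 42.85, 37.54, 31.74)
)

# Transpose so that each original row becomes a column
M <- t(as.matrix(df1[-1]))
colnames(M) <- df1$name

# Correlation of d with every other column, as a named vector
r <- drop(cor(M[, "d"], M[, c("a", "b", "c")]))

n <- nrow(M)  # number of observations per series
p <- 1        # one predictor in each pairwise regression

r2 <- r^2
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

res <- cbind(cor = r, r2 = r2, adj_r2 = adj_r2)
res
```

With only three observations per series the adjustment is severe (it reduces to 1 - 2 * (1 - R2) here), so the adjusted values are negative and should be read with caution.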

Add the index of list to bind_rows?

I have this data:
dat=list(structure(list(Group.1 = structure(3:4, .Label = c("A","B", "C", "D", "E", "F"), class = "factor"), Pr1 = c(65, 75)), row.names = c(NA, -2L), class = "data.frame"),NULL, structure(list( Group.1 = structure(3:4, .Label = c("A","B", "C", "D", "E", "F"), class = "factor"), Pr1 = c(81,4)), row.names = c(NA,-2L), class = "data.frame"))
I want to combine the list using bind_rows(dat) while keeping the index number as a variable.
The output should include a type column holding the list index ([[1]] and [[3]]):
type Group.1 Pr1
1 1 C 65
2 1 D 75
3 3 C 81
4 3 D 4
data.table solution
Use rbindlist() from the data.table package, which has built-in id support that respects NULL elements.
library(data.table)
rbindlist( dat, idcol = TRUE )
.id Group.1 Pr1
1: 1 C 65
2: 1 D 75
3: 3 C 81
4: 3 D 4
dplyr - partial solution
bind_rows also has ID-support, but it 'skips' empty elements...
bind_rows( dat, .id = "id" )
id Group.1 Pr1
1 1 C 65
2 1 D 75
3 2 C 81
4 2 D 4
Note that the ID of the third element from dat becomes 2, and not 3.
According to the documentation of bind_rows(), you can supply a column name to the .id argument. When you apply bind_rows() to a list of data frames, the names of the list are used as the values of the identifier column. [EDIT] But there is a problem mentioned by @Wimpel: the list has no names.
names(dat)
NULL
However, supplying names to the list does the trick:
names(dat) <- seq_along(dat)
names(dat)
[1] "1" "2" "3"
bind_rows(dat, .id = "type")
type Group.1 Pr1
1 1 C 65
2 1 D 75
3 3 C 81
4 3 D 4
Or in one line, if you prefer:
bind_rows(setNames(dat, seq_along(dat)), .id = "type")

Resources