I am trying to run rcorr as part of a function over multiple data frames, extracting p-values for each test, but I am receiving NA values when piping into rcorr.
For example, if I create a matrix and run rcorr on it, extracting the p-value table with $P and the p-value with [2], it works:
library(Hmisc)
library(magrittr)
mt <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), ncol=2)
rcorr(mt, type="pearson")$P[2]
[1] 0
But if I try to pipe this, I only receive NAs.
mt %>% rcorr(., type="pearson")$P[2]
[1] NA NA
mt %>% rcorr(., type="pearson")$P
Error in .$rcorr(., type = "pearson") :
3 arguments passed to '$' which requires 2
Can someone explain to me why this doesn't work, or give a workaround? Ideally I don't want to have to create variables for each of my matrices before running rcorr.
Thanks in advance.
Solution
(mt %>% rcorr(type = "pearson"))$P[2]
# [1] 0
Explanation
Notice that both
mt %>% rcorr(., type = "pearson")
and
mt %>% rcorr(type = "pearson")
work as expected. The problem is that you append $ and [ to the second call, and these are themselves function calls that wrap rcorr(), so the pipe sees them as the outermost call. For instance,
s <- function(x) c(1, 1 + x)
1 %>% s
# [1] 1 2
works as expected, but
1 %>% s[1]
# Error in .[s, 1] : incorrect number of dimensions
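doesn't return 1, because the pipe inserts the left-hand side as the first argument of the outermost call, which here is `[` rather than s (as the error message shows, it evaluates .[s, 1]).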
Now
1 %>% s(x = .)[1]
# Error in .[s(x = .), 1] : incorrect number of dimensions
just as yours
mt %>% rcorr(., type = "pearson")$P[2]
# [1] NA NA
is trickier. Notice that it can be rewritten as
mt %>% `[`(`$`(rcorr(., type = "pearson"), "P"), 2)
# [1] NA NA
So, now it becomes clear that the latter doesn't work because it basically is
`[`(mt, `$`(rcorr(mt, type = "pearson"), "P"), 2)
# [1] NA NA
which, when deciphered, is
mt[rcorr(mt, type = "pearson")$P, 2]
# [1] NA NA
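To see where the two NAs come from without loading Hmisc, here is a minimal base R sketch. The matrix P below is a hand-built stand-in (an assumption, not real rcorr output) that mimics the shape reported in the question: NA on the diagonal and 0 off the diagonal.

```r
# Same matrix as in the question
mt <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), ncol = 2)

# Hand-built stand-in for rcorr(mt)$P (assumption): NA on the diagonal,
# 0 off the diagonal
P <- matrix(c(NA, 0, 0, NA), ncol = 2)

# What the piped expression ends up evaluating: P is used as a row index
# into mt. The two NAs index to NA and the two zeros select nothing.
res <- mt[P, 2]
print(res)

# What was intended: the second element of P itself
intended <- P[2]
print(intended)
```

Wrapping the pipe in parentheses, as in the solution above, makes $P[2] apply to the result of the pipe instead of becoming part of the piped call.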
A tidy solution, at least I hope!
library(dplyr)
library(broom)
library(Hmisc)
mtcars[, 5:6] %>%
as.matrix()%>%
rcorr()%>%
tidy() %>%
select(estimate)
A simple solution using %$% from magrittr:
library(Hmisc)
library(magrittr)
mt <- matrix(1:10, ncol=2)
mt %>% rcorr(type="pearson") %$% P[2]
[1] 0
Related
I have to calculate the number of missing values per observation in a data set. As there are several variables across multiple time periods, I thought it best to try a function to keep my syntax clean. The first part of looking up the number of missing values works fine:
data$NMISS <- data %>%
select('x1':'x4') %>%
apply(1, function(x) sum(is.na(x)))
But when I try to turn it into a function I get "Error in select():! NA/NaN argument"
library(dplyr)
library(tidyverse)
data <- data.frame(x1 = c(NA, 1, 5, 1),
x2 = c(7, 1, 1, 5),
x3 = c(9, NA, 4, 9),
x4 = c(3, 4, 1, 2))
NMISSfunc <- function (dataFrame,variables) {
dataFrame %>% select(variables) %>%
apply(1, function(x) sum(is.na(x)))
}
data$NMISS2 <- NMISSfunc(data,'x1':'x4')
I think it doesn't like the : in the range as it will accept c('x1','x2','x3','x4') instead of 'x1':'x4'
Some of the ranges are over twenty columns so listing them doesn't really provide a solution to keep the syntax neat.
Any suggestions?
You are right that you can't use "x1":"x4", as this isn't a valid use of the : operator on character strings. To get this to work in a tidyverse style, your variables argument needs to be selectively unquoted inside select. Fortunately, the tidyverse has the curly-curly notation {{variables}} for handling exactly this situation:
NMISSfunc <- function (dataFrame, variables) {
dataFrame %>%
select({{variables}}) %>%
apply(1, function(x) sum(is.na(x)))
}
Now we can use x1:x4 (without quotes) and the function works as expected:
NMISSfunc(data, x1:x4)
#> [1] 1 1 0 0
Created on 2022-12-13 with reprex v2.0.2
Why not simply,
data %>%
mutate(NMISS = rowSums(is.na(select(., x1:x4))))
x1 x2 x3 x4 NMISS
1 NA 7 9 3 1
2 1 1 NA 4 1
3 5 1 4 1 0
4 1 5 9 2 0
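If you would rather keep passing the range endpoints as quoted strings, a base R sketch along these lines may also do. The helper name nmiss_range and its arguments first and last are hypothetical, introduced here only for illustration; it assumes both names exist in the data frame.

```r
data <- data.frame(x1 = c(NA, 1, 5, 1),
                   x2 = c(7, 1, 1, 5),
                   x3 = c(9, NA, 4, 9),
                   x4 = c(3, 4, 1, 2))

# Hypothetical helper: count NAs per row across the columns from
# `first` to `last` (both given as strings, both assumed to exist)
nmiss_range <- function(dataFrame, first, last) {
  # Translate the two names into column positions, then build the range
  cols <- match(first, names(dataFrame)):match(last, names(dataFrame))
  rowSums(is.na(dataFrame[, cols, drop = FALSE]))
}

res <- nmiss_range(data, "x1", "x4")
print(res)
```

The : operator works here because it is applied to the integer positions returned by match(), not to the strings themselves.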
I want to sort my data frame based on a column that I pass to dplyr's arrange function with its position. This works as long as I'm using the "old" tidyverse/magrittr pipe operator. However, changing it to the new R pipe returns an error:
df <- data.frame(x = c(3, 4, 1, 5),
y = 1:4)
# Works
df %>%
arrange(.[1])
x y
1 1 3
2 3 1
3 4 2
4 5 4
# Throws error
df |>
arrange(.[1])
Error:
! arrange() failed at implicit mutate() step.
Problem with `mutate()` column `..1`.
i `..1 = .[1]`.
x object '.' not found
Run `rlang::last_error()` to see where the error occurred.
How can I still arrange by column position when using the new R pipe?
I realize that the |> operator does not accept the "." as an argument, but I still don't know how else I could address the data then.
Update:
This seems to work, but wondering if there is something more straightforward:
df |>
arrange(cur_data() |> select(1))
You can pass a lambda function (following a suggestion by @Martin Morgan in the comments to specify the column's position instead of its name):
df <- data.frame(x = c(3, 4, 1, 5),
y = 1:4)
df |>
(\(z) arrange(z, z[[1]]))()
# x y
# 1 1 3
# 2 3 1
# 3 4 2
# 4 5 4
With order, this looks okay:
df |>
(\(z) z[order(z[,1]), ])()
x y
3 1 3
1 3 1
2 4 2
4 5 4
|> does not support the dot placeholder, but dplyr verbs do support cur_data() (note that cur_data() is deprecated as of dplyr 1.1.0 in favor of pick()).
# 1
df |> arrange(cur_data()[1])
Another possibility is the Bizarro pipe which is not really a pipe but does look like one and uses only base R.
# 2
df ->.; arrange(., .[1])
or any of these work-arounds
# 3
arrange1 <- function(.) arrange(., .[1])
df |> arrange1()
# 4
df |> (function(.) arrange(., .[1]))()
# 5
df |> list() |> setNames(".") |> with(arrange(., .[1]))
# 6
with. <- function(data, expr, ...) {
eval(substitute(expr), list(. = data), enclos = parent.frame())
}
df |> with.(arrange(., .[1]))
# these hard code variable names so are not directly comparable
# but can be used if that is ok
# 7
df |> arrange(x)
# 8
df |> with(arrange(data.frame(x, y), x))
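For completeness, here is a small base R sketch of the named-helper idea (work-around 3) that avoids dplyr entirely. The function name sort_by_position is hypothetical, and the native pipe requires R 4.1 or later.

```r
df <- data.frame(x = c(3, 4, 1, 5),
                 y = 1:4)

# Hypothetical helper: sort a data frame by the column at position `pos`
sort_by_position <- function(data, pos) {
  data[order(data[[pos]]), , drop = FALSE]
}

# The native pipe passes df as the first argument, so no dot is needed
sorted <- df |> sort_by_position(1)
print(sorted)
```

Because the data frame only appears once in the call, there is nothing for a placeholder to do, which is why this form works with |>.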
I have a data.frame that looks like
df <- data.frame(P1 = c("ATG","GTA","GGG","GGG"), P2 = c("TGG","GAT","GGG","GCG"))
I want to convert each DNA codon to an amino-acid using the below function (but any translate option is viable), and output an identical data.frame but with single letter amino-acids rather than codons:
library(Biostrings)
library(seqinr)
translate_R <- function(x)
{
translate(s2c(as.character(x)))
}
It works for individual elements of the data.frame
> translate_R(df[1,1])
[1] "M"
But trying to apply this to the whole data.frame isn't working. What am I missing? I don't understand why there is an error, as googling how to do this suggests it should work. Missing something fundamental I guess.
> df[] <- lapply(df, translate_R)
Error in seq.default(from = frame + 1, to = frame + l, by = 3) :
wrong sign in 'by' argument
In addition: Warning message:
In s2c(as.character(x)) :
Error in seq.default(from = frame + 1, to = frame + l, by = 3) :
wrong sign in 'by' argument
Your translate_R function is expecting a single value, but it's getting a vector. You can fix this by passing in individual values.
In other words, iterate over columns of df with an outer apply and then over values in each column with an inner apply.
Here's how to do it with base R:
data.frame(lapply(df, function(x) sapply(x, translate_R)))
And here's a tidyverse version with map:
library(tidyverse)
df %>% mutate(across(everything(), ~map(., translate_R)))
In both cases, the output is:
P1 P2
1 M W
2 V D
3 G G
4 G A
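The nested-apply pattern can be tried without Biostrings or seqinr. The sketch below swaps in a tiny hand-written lookup table (codon_table, an assumption covering only the six codons in the example) and a hypothetical translate_one function that, like translate_R, only accepts a single value.

```r
df <- data.frame(P1 = c("ATG", "GTA", "GGG", "GGG"),
                 P2 = c("TGG", "GAT", "GGG", "GCG"),
                 stringsAsFactors = FALSE)

# Stand-in lookup table (assumption): only the codons used above
codon_table <- c(ATG = "M", GTA = "V", GGG = "G",
                 TGG = "W", GAT = "D", GCG = "A")

# Hypothetical scalar-only translator, mimicking translate_R's limitation
translate_one <- function(codon) {
  stopifnot(length(codon) == 1)
  codon_table[[codon]]
}

# Outer lapply walks the columns, inner vapply walks the values
result <- data.frame(
  lapply(df, function(col) {
    vapply(col, translate_one, character(1), USE.NAMES = FALSE)
  }),
  stringsAsFactors = FALSE
)
print(result)
```

Calling translate_one(df$P1) directly would fail the length check, which is the same kind of mismatch that breaks lapply(df, translate_R).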
Another potential tidyverse solution is to use the "rowwise" tidyverse function:
library(tidyverse)
library(Biostrings)
library(seqinr)
translate_R <- function(x) {
translate(s2c(as.character(x)))
}
df <- data.frame(P1 = c("ATG","GTA","GGG","GGG"), P2 = c("TGG","GAT","GGG","GCG"))
df %>%
rowwise() %>%
mutate(across(everything(), ~ translate_R(.x)))
#> # A tibble: 4 x 2
#> # Rowwise:
#> P1 P2
#> <chr> <chr>
#> 1 M W
#> 2 V D
#> 3 G G
#> 4 G A
Created on 2021-07-21 by the reprex package (v2.0.0)
I just asked a question about generating multiple columns at once with dplyr, and I'm a bonehead and oversimplified the problem and have another question. I'd like to find a dplyr method for dynamically generating columns based on other columns.
cols <- c("x", "y")
foo <- c("a", "b")
bar <- c("c", "d")
df <- data.frame(a = 1, b = 2, c = 10, d = 20)
df[cols] <- df[foo] * df[bar]
In my first iteration of the question, I included only one set of previously defined columns, so the following worked:
df %>%
mutate_at(vars(foo), list(new = ~ . * 5)) %>%
rename_at(vars(matches('new')), ~ c('x', 'y'))
However, as the first few lines of code suggest, I would like to instead multiply two existing columns together, and am unable to figure out how to do this. I have tried:
df %>%
mutate_at(c(vars(foo), vars(bar)),
function(x,y) {x * y})
which returns the error:
Error in (function (x, y) : argument "y" is missing, with no default
Is it possible to reference multiple sets of columns to be used on each other with mutate_at?
Well as you want to work with two columns, I think purrr::map2 is the function to work with:
library(purrr)
library(dplyr)
map2(foo, bar, ~ df[[.x]] * df[[.y]]) %>%
set_names(cols) %>%
bind_cols(df, .)
#> a b c d x y
#> 1 1 2 10 20 10 40
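The same pairwise idea can be sketched in base R with Map(), which walks foo and bar in parallel just as map2 does. This is only an illustration of the pattern, not a dplyr replacement.

```r
cols <- c("x", "y")
foo <- c("a", "b")
bar <- c("c", "d")
df <- data.frame(a = 1, b = 2, c = 10, d = 20)

# Multiply the i-th column of foo by the i-th column of bar
products <- Map(function(f, b) df[[f]] * df[[b]], foo, bar)

# Name the new columns and attach them to the original data
result <- cbind(df, as.data.frame(setNames(products, cols)))
print(result)
```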
I have a large matrix that holds the distances between two sets of zip codes (calculated with the geosphere package). I would like to run a function that finds all zip code pairings that are <= x distance away from each other and creates a list of them. The data looks like this:
      91423 92231 94321
90034 3     4.5   2.25
93201 3.75  2.5   1.5
94501 2     6     0.5
So if I ran the function to extract all zip code pairings that are <2 miles away I would end up with these zip codes:
94321
94321
93201
94501
The goal is basically to identify all adjacent zip codes in the US to a list of zip codes I have. If there is a better way to do this I am open to suggestions.
Perhaps something like the following. It will be slow, but it should work.
# distance is the cutoff and is assumed to be defined beforehand, e.g. distance <- 2
hold.zips <- NULL
for (i in seq_len(nrow(data))) {
  for (j in seq_len(ncol(data))) {
    if (data[i, j] < distance) {
      # the row zip comes from rownames, the column zip from colnames
      hold.zips <- rbind(hold.zips, c(rownames(data)[i], colnames(data)[j]))
    }
  }
}
This should work. Gives a nice list as output (calling your data x):
rn = rownames(x)
apply(x, 2, function(z) rn[z < 2])
# $`91423`
# character(0)
#
# $`92231`
# character(0)
#
# $`94321`
# [1] "93201" "94501"
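If explicit pairs are more convenient than a list, a base R sketch using which() with arr.ind = TRUE might look like this. The matrix is rebuilt from the example data in the question, and the output column names row_zip and col_zip are chosen here only for illustration.

```r
# Example distance matrix from the question
x <- matrix(c(3, 3.75, 2, 4.5, 2.5, 6, 2.25, 1.5, 0.5), nrow = 3, ncol = 3)
rownames(x) <- c("90034", "93201", "94501")
colnames(x) <- c("91423", "92231", "94321")

# Row/column positions of every cell below the cutoff
idx <- which(x < 2, arr.ind = TRUE)

# Turn the positions back into zip codes, keeping the distance alongside
pairs <- data.frame(row_zip = rownames(x)[idx[, "row"]],
                    col_zip = colnames(x)[idx[, "col"]],
                    distance = x[idx],
                    stringsAsFactors = FALSE)
print(pairs)
```

Note that the comparison is strict, so the 90034/91423... style cells that equal the cutoff exactly (such as the 2 for 94501 and 91423) are not included; use <= if they should be.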
Here is the Tidyverse solution:
library(dplyr)
library(tidyr)
# your data
dat <- matrix(c(3,3.75,2,4.5,2.5,6,2.25,1.5,0.5), nrow = 3, ncol = 3)
rownames(dat) <- c(90034, 93201, 94501)
colnames(dat) <- c(91423, 92231, 94321)
# tidyverse solution
r <- rownames(dat)
dat_tidy <- dat %>%
as_tibble() %>%
mutate(x = r) %>%
select(x, everything()) %>%
gather(key = y,
value = distance,
-x) %>%
filter(distance < 2)
print(dat_tidy)
# note: if your matrix is a symmetric matrix then
# to remove duplicates, filter would be:
# filter(x < y,
# distance < 2)