Passing arguments to function inside sapply - r

Let's say that we have a list of numbers and we apply a function,
for example the mean function, to each element of the list:
l <- list(a = 1:10, b = 11:20)
l.mean <- sapply(l, mean)
l.mean # it works
But what if the list consists of strings and we want to paste them:
ll <- list(a=c("1", "2"), b=c("3", "4"))
ll.paste <- sapply(ll, as.call(list(paste, ll, sep = ", ")))
ll.paste # it does not work
The output I expect should be something like this:
# 1, 2
# 3, 4

We need the collapse argument of paste.
unname(sapply(ll, paste, collapse=', '))
toString is a wrapper function for paste(., collapse=', '):
unname(sapply(ll, toString))
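The reason this works is that sapply forwards any extra arguments to the function it applies, so collapse reaches paste for every list element. A minimal sketch, reusing ll from the question; the anonymous-function and vapply forms are only equivalent spellings shown for comparison:

```r
ll <- list(a = c("1", "2"), b = c("3", "4"))

# Extra arguments after the function are passed on to paste() for each element
res1 <- sapply(ll, paste, collapse = ", ")

# Equivalent spelling with an explicit anonymous function
res2 <- sapply(ll, function(x) paste(x, collapse = ", "))

# vapply() does the same but checks that each result is a single string
res3 <- vapply(ll, toString, character(1))

print(unname(res1))
```

The failing attempt in the question passed a call object where sapply expects a function; passing the function itself plus its extra arguments is the usual pattern.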

Related

Replace letter in array in specific way

I have these vectors:
texts=c("AANAAA","NNAAAA","AAAAAA", "NAAANN")
letter=c("C","C","G","T","P","D")
I am trying to replace each N in the elements of texts with a character from the second vector, letter, chosen by position.
Take "AANAAA": the N at the third position should be replaced by the third element of letter, i.e. G.
My expected output is:
texts=c("AAGAAA","CCAAAA","AAAAAA", "CAAAPD")
I tried gsub and a for loop:
for (i in 1:6) {
for (j in 1:4)
gsub("N",letter[i],texts[j][i])
}
but it didn't work.
You can do this in base R without a loop. gregexpr returns a list holding the positions of the matches in each element of texts. We then replace the characters at those positions with the corresponding characters in letter using regmatches<-.
m <- gregexpr("N", texts, fixed = TRUE)
regmatches(texts, m) <- lapply(m, \(i, x) x[i], letter)
Output
> texts
[1] "AAGAAA" "CCAAAA" "AAAAAA" "CAAAPD"
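To make the intermediate steps of this answer visible, here is a small sketch that prints nothing new but stores each stage in its own variable. The names positions and replacements are introduced only for illustration:

```r
texts  <- c("AANAAA", "NNAAAA", "AAAAAA", "NAAANN")
letter <- c("C", "C", "G", "T", "P", "D")

# One integer vector of match positions per string; -1 means "no match"
m <- gregexpr("N", texts, fixed = TRUE)
positions <- lapply(m, as.integer)

# Look up the replacement letters at those positions
replacements <- lapply(m, function(i) letter[i])

# Assign the replacements back into the matched positions
regmatches(texts, m) <- replacements
print(texts)
```

For the string without any N the position is -1, and since there is nothing to replace in it the string comes back unchanged.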
Using loops, split texts into letters and replace "N" with corresponding letter:
sapply(strsplit(texts, ""), function(i){
ix <- which(i == "N")
i[ ix ] <- letter[ ix ]
paste(i, collapse = "")
})
# [1] "AAGAAA" "CCAAAA" "AAAAAA" "CAAAPD"
A solution based on tidyverse/purrr:
library(tidyverse)
texts=c("AANAAA","NNAAAA","AAAAAA", "NAAANN")
letter=c("C","C","G","T","P","D")
texts %>%
map(~ str_split(.x, "") %>% unlist %>%
map2_chr(letter, ~ if_else(.x == "N", .y, .x))) %>%
map_chr(~ str_c(.x, collapse = ""))
#> [1] "AAGAAA" "CCAAAA" "AAAAAA" "CAAAPD"

Applying a self-made formatting function over a data frame in R

I am using R and I need to format the numbers within a data frame, notably by imposing the number of digits before the decimal separator as well as after it. E.g. 3.56 must become "0003,56000".
So I built my own function:
format <- function(x, nbr_before_comma, nbr_after_comma){
x= round(x, nbr_after_comma)
x = toString(x)
l = strsplit(x, "[.]")[[1]]
#print(l)
#print(nchar(l[2]))
before_comma = paste0(strrep("0",nbr_before_comma - nchar(l[1])),l[1])
after_comma = ifelse(length(l) > 1,
paste0(l[2],strrep("0",nbr_after_comma - nchar(l[2]))),
strrep("0", nbr_after_comma))
res = paste0(before_comma, ",", after_comma)
return(res)
}
Trying this on a single number will work. Now I am trying to apply this to a dataframe. Let's take the toy example:
df <- data.frame("a" = c(2.5,3.56,4.5))
I define more precisely what I want:
format44 <- function(x){
return(format(x,4,4))
}
I have tried several possibilities:
df[] <- lapply(df, format44)
with dplyr:
df <- df %>%
mutate(a = format44(a))
and finally:
df["a"] <- lapply(df["a"],format44)
None of them works. In fact, I get the same output every time:
a
1 0002,5, 3
2 0002,5, 3
3 0002,5, 3
Any idea what the problem is?
Use sprintf and then translate the decimal points to commas:
before <- after <- 4
fmt <- sprintf("%%0%d.%df", before + after + 1, after)
transform(df, a = chartr(".", ",", sprintf(fmt, a)))
giving:
a
1 0002,5000
2 0003,5600
3 0004,5000
or writing this with dplyr:
library(dplyr)
before <- after <- 4
df %>%
mutate(a = "%%0%d.%df" %>%
sprintf(before + after + 1, after) %>%
sprintf(a) %>%
chartr(".", ",", .))
giving:
a
1 0002,5000
2 0003,5600
3 0004,5000
In this case, mapply suits you better:
df$b <- mapply(format44, df$a)
You do not even need the format44 wrapper. You can use:
df$c <- mapply(format, df$a, 4,4)
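The identical rows in the question come from toString, which joins a whole vector into one string before strsplit ever sees it; that is why calling the function one element at a time helps. A short sketch of that effect, followed by a vectorised helper built on the sprintf approach from the earlier answer (pad_number is a hypothetical name, not part of any package):

```r
a <- c(2.5, 3.56, 4.5)

# toString() joins all elements, so the custom function only ever sees one string
collapsed <- toString(a)
print(collapsed)

# Hypothetical helper: sprintf() is vectorised, so no apply-style loop is needed
pad_number <- function(x, before = 4, after = 4) {
  # Total width = digits before + decimal point + digits after
  fmt <- sprintf("%%0%d.%df", before + after + 1, after)
  chartr(".", ",", sprintf(fmt, x))
}

df <- data.frame(a = a)
df$a <- pad_number(df$a)
print(df)
```

Giving the helper its own name also avoids masking base R's format, which the function in the question does.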

Flipping two sides of string

I need to prepare a certain dataset for analysis. What I have is a table with column names (obviously). The column names are as follows (sample colnames):
"X99_NORM", "X101_NORM", "X76_110_T02_09747", "X30_NORM"
(this is a vector, for those not familiar with R's colnames() function)
Now, what I want is simply to flip the values before and after the underscore. For example, X99_NORM becomes NORM_X99. Note that I want this only for the column names which contain NORM in their name.
Some other base R options
1)
Use sub to switch the beginning and end - we can make use of capturing groups here.
x <- sub(pattern = "(^X\\d+)_(NORM$)", replacement = "\\2_\\1", x = x)
Result
x
# [1] "NORM_X99" "NORM_X101" "X76_110_T02_09747" "NORM_X30"
2)
A regex-free approach that might be more efficient using chartr, dirname and paste. But we need to get the indices of the columns that contain "NORM" first
idx <- grep(x = x, pattern = "NORM", fixed = TRUE)
x[idx] <- paste0("NORM_", dirname(chartr("_", "/", x[idx])))
x
data
x <- c("X99_NORM", "X101_NORM", "X76_110_T02_09747", "X30_NORM")
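Option 1 above can be applied directly to the column names of a data frame. A minimal sketch, assuming a toy data frame dat whose values are made up for illustration:

```r
# Toy data frame with the column names from the question (values are made up)
dat <- data.frame(X99_NORM = 1, X101_NORM = 2, X76_110_T02_09747 = 3, X30_NORM = 4)

# Group 1 captures the "X<digits>" prefix, group 2 the trailing "NORM";
# names that do not match the whole pattern are returned unchanged
colnames(dat) <- sub("^(X\\d+)_(NORM)$", "\\2_\\1", colnames(dat))
print(colnames(dat))
```

Because sub leaves non-matching strings alone, there is no need to select the NORM columns first.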
x = c("X99_NORM", "X101_NORM", "X76_110_T02_09747", "X30_NORM")
replace(x,
grepl("NORM", x),
sapply(strsplit(x[grepl("NORM", x)], "_"), function(x){
paste(rev(x), collapse = "_")
}))
#[1] "NORM_X99" "NORM_X101" "X76_110_T02_09747" "NORM_X30"
A tidyverse solution with stringr:
library(tidyverse)
library(stringr)
my_data <- tibble(column = c("X99_NORM", "X101_NORM", "X76_110_T02_09747", "X30_NORM"))
my_data %>%
filter(str_detect(column, "NORM")) %>%
mutate(column_2 = paste0("NORM", "_", str_extract(column, ".+(?=_)"))) %>%
select(column_2)
# A tibble: 3 x 1
column_2
<chr>
1 NORM_X99
2 NORM_X101
3 NORM_X30

Selecting multiple columns using Regular Expressions

I have variables with names such as r1a r3c r5e r7g r9i r11k r13g r15i etc. I am trying to select the variables whose names start with r5 through r12 and create a data frame from them in R.
The best code that I could write to get this done is,
data %>% select(grep("r[5-9][^0-9]" , names(data), value = TRUE ),
grep("r1[0-2]", names(data), value = TRUE))
Given that my experience with regular expressions spans a day, I was wondering if anyone could help me write better and more compact code for this!
Here's a regex that gets all the columns at once:
data %>% select(grep("r([5-9]|1[0-2])", names(data), value = TRUE))
The vertical bar represents an 'or'.
As the comments have pointed out, this will fail for items such as r51, and can also be shortened. Instead, you will need a slightly longer regex:
data %>% select(matches("r([5-9]|1[0-2])([^0-9]|$)"))
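The difference between the two patterns is easy to check on a plain character vector. In this sketch the name r51x is a hypothetical extra column added to trigger the failure, and the leading ^ is an addition made here on the assumption that every name begins with r:

```r
nms <- c("r1a", "r3c", "r5e", "r7g", "r9i", "r11k", "r13g", "r15i", "r51x")

# Short pattern: "r5" followed by anything, so "r51x" is wrongly kept
loose <- grepl("r([5-9]|1[0-2])", nms)

# Longer pattern: the number must be followed by a non-digit or the end
strict <- grepl("^r([5-9]|1[0-2])([^0-9]|$)", nms)

print(nms[loose])
print(nms[strict])
```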
Suppose that in the code below x represents your names(data). Then the following will do what you want.
# The names of 'data'
x <- scan(what = character(), text = "r1a r3c r5e r7g r9i r11k r13g r15i")
y <- unlist(strsplit(x, "[[:alpha:]]"))
y <- as.numeric(y[sapply(y, `!=`, "")])
x[y > 4]
#[1] "r5e" "r7g" "r9i" "r11k" "r13g" "r15i"
EDIT.
You can generalize the code above into a function. It takes three arguments: the first is the vector of variable names, and the second and third are the lower and upper limits of the numbers you want to keep.
var_names <- function(x, from = 1, to = Inf){
y <- unlist(strsplit(x, "[[:alpha:]]"))
y <- as.integer(y[sapply(y, `!=`, "")])
x[from <= y & y <= to]
}
var_names(x, 5)
#[1] "r5e" "r7g" "r9i" "r11k" "r13g" "r15i"
Remove the non-digits, scan in the remainder and check whether each number is in 5:12:
DF <- data.frame(r1a=1, r3c=2, r5e=3, r7g=4, r9i=5, r11k=6, r13g=7, r15i=8) # test data
DF[scan(text = gsub("\\D", "", names(DF)), quiet = TRUE) %in% 5:12]
## r5e r7g r9i r11k
## 1 3 4 5 6
Using magrittr it could also be written like this:
library(magrittr)
DF %>% .[scan(text = gsub("\\D", "", names(.)), quiet = TRUE) %in% 5:12]
## r5e r7g r9i r11k
## 1 3 4 5 6

How can I apply a function to a column of a data frame using lapply?

How can I rewrite the for loop in the following piece of code using lapply?
transactions <- read.table(file = file("stdin"), header = FALSE, stringsAsFactors = FALSE)
for (i in 1:nrow(transactions)) {
transactions[i,1] <- paste(sort(unlist(strsplit(transactions[i,1], ","))), collapse = ",")
}
If you find it easier to work with some input data, use the following as the contents of stdin:
a,b
b,c,a
a,b,c
b,a,c
a,b,c,d
a,d,b,c
# Sample data
n <- 10
d <- data.frame(
a = unlist( lapply(
1:n,
function (u) { paste( sample(LETTERS, 5), collapse="," ) }
)),
b = 1:n,
stringsAsFactors = FALSE
)
# Sort the lists
d[,1] <- unlist(lapply(
strsplit(d[,1], ","), # List with the data to process
function (u) { paste(sort(u), collapse=",") } # Function to apply to each element
))
It looks like you want to sort the individual comma-separated components of transactions[, 1].
transactions[, 1] <- sapply(lapply(strsplit(transactions[, 1], ","),
sort),
paste, collapse=",")
If I understand your code correctly, you want to replace the values in column 1 of transactions with their new values.
Since you used header = FALSE, I'll assume that the name of column 1 is V1. In which case, you do not need either a loop or lapply(), since the operation can be vectorized like this:
transactions$V1 <- paste(sort(unlist(strsplit(transactions$V1, ","))),
collapse = ",")
EDIT: I have no idea whether the paste/sort/unlist/strsplit works since I can't see the original data. My point is that you don't need loops or apply to transform a data frame column.
EDIT: OK, I get what the code is supposed to do now and yes, the above will not work because of the unlist(). But I'm sure there's a vectorized solution...will edit if I find one.
EDIT: Right: the best I can come up with is a double sapply(). One to split and sort transactions$V1, another to paste it back together. It's ugly. Here it is:
transactions$V1 <- sapply(sapply(strsplit(transactions$V1, ","), sort),
function(x) paste(x, collapse = ","))
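One caveat with nesting sapply this way: when every row happens to contain the same number of items, the inner sapply simplifies its result to a matrix and the outer call then pastes single cells instead of whole rows. A sketch of a variant that avoids this by using lapply for the split-and-sort step and vapply for the paste step, with the sample input from the question typed in by hand:

```r
# Sample input from the question, stored in a one-column data frame
transactions <- data.frame(
  V1 = c("a,b", "b,c,a", "a,b,c", "b,a,c", "a,b,c,d", "a,d,b,c"),
  stringsAsFactors = FALSE
)

# lapply() always returns a list, even when every row has the same item count
sorted <- lapply(strsplit(transactions$V1, ",", fixed = TRUE), sort)

# vapply() pastes each sorted vector back together and enforces one string per row
transactions$V1 <- vapply(sorted, paste, character(1), collapse = ",")
print(transactions$V1)
```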
