Creating functions in R

I want to create a function that takes a character value x. If the value starts with the letter "A", the function should return "Apple", and so forth.
test_df <- function(x){
if (input 'a75'){ "apple75" }
else if (input "d21"){ "dragonfruit21" }
}

Please check the code below.
Code:
library(stringr)
test_df <- function(x){
  if (str_detect(x, '^a')) {
    return('apple75')
  } else if (str_detect(x, '^d')) {
    return('dragonfruit21')
  } else {
    return('no input match')
  }
}
test_df('a75')
Output:
[1] "apple75"

You can try something like this:
test_df <- function(x){
x <- unlist(strsplit(x, ""))
string <- if(x[1] == "a"){
paste0("apple", paste0(x[-1], collapse = ""))
} else{
if(x[1] == "d") {
paste0("dragonfruit", paste0(x[-1], collapse = ""))
}
}
return(string)
}
> test_df('a75')
[1] "apple75"
> test_df('d21')
[1] "dragonfruit21"
You can even use Vectorize and give a vector as input:
test_df_v <- Vectorize(test_df)
test_df_v(c('a75','d21', 'aZ45'))
a75 d21 aZ45
"apple75" "dragonfruit21" "appleZ45"
Now you have a starting point; anything else you want to add should be easier from here.

test_fun <- function(x){
x <- tolower(x) # Ensure the input is all lowercase
if(startsWith(x, 'a')) "apple75"
else if (startsWith(x, 'd')) "dragonfruit21"
}
test_fun('A')
[1] "apple75"

One way that might be easier to keep track of the options is to create a lookup table within the function and select from it:
lookup_fruit <- function(x){
  # named lookup_fruit rather than startsWith, to avoid masking base::startsWith()
  firstletter <- substr(tolower(x), 1, 1)
  lookup <- data.frame(firstletter = c("a", "b", "c", "d"),
                       associated = c("apple75", "Banana69", "Carrot18", "dragonfruit21"),
                       stringsAsFactors = FALSE)
  return(lookup$associated[which(lookup$firstletter == firstletter)])
}
lookup_fruit("apple")
#[1] "apple75"
lookup_fruit("drnking")
#[1] "dragonfruit21"
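The same lookup idea can also be sketched with a named character vector. This version keeps the rest of the input (as in the 'a75' to "apple75" example from the question) and works on a whole vector at once. The names fruit_prefix and label_by_prefix are hypothetical, and returning NA for an unmatched first letter is an assumption, not something the question specifies.

```r
# Hypothetical mapping from first letter to replacement prefix
fruit_prefix <- c(a = "apple", d = "dragonfruit")

label_by_prefix <- function(x) {
  first  <- tolower(substr(x, 1, 1))   # first character, case-insensitive
  rest   <- substring(x, 2)            # everything after the first character
  prefix <- unname(fruit_prefix[first])
  # Assumption: inputs with no matching first letter give NA
  ifelse(is.na(prefix), NA_character_, paste0(prefix, rest))
}

label_by_prefix(c("a75", "d21", "aZ45", "x9"))
# returns: "apple75", "dragonfruit21", "appleZ45", NA
```

Adding a new fruit then only means adding one element to fruit_prefix.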

Related

Improve if else statement

I have the following function in R. It is working fine; however, I think there must be a better way to write this function.
values <- c("a","b")
print <- function(values){
size <- length(values)
if (size == 1) {
final <- values[1]
}else if(size == 2){
final <- paste0(values[2], " and ", values[1])
}else if(size == 3){
final <- paste0(values[3], " and ",values[2], " and ", values[1])
}
return(final)
}
print(values)
The user can change the size of values, so if they choose values <- c("a","b", "c") the function runs the last condition. However, the last condition is in part equal to the second condition plus something new. Is it possible to write an if statement, or something along those lines, that reuses the previous condition? Something like:
values <- c("a","b", "c")
print <- function(values){
size <- length(values)
if (size == 1) {
final <- values[1]
}else if(size == 2){
final <- paste0(values[2], " and ", final )
}else if(size == 3){
final <- paste0(values[3], " and ",final )
}
return(final)
}
print(values)
Try this, which reverses the order of the input vector and pastes "and" between:
newfun <- function(x){
ifelse(length(x)>1, paste(rev(x), collapse = " and "), x)
}
Output:
newfun(letters[1])
# [1] "a"
newfun(letters[1:2])
# [1] "b and a"
# and so on...
newfun(letters[1:5])
# [1] "e and d and c and b and a"
Testing this against your function to see if it is identical:
all.equal(print(letters[1:3]),
newfun(letters[1:3]))
# [1] TRUE
I would also strongly caution against giving user-defined functions names that already exist in R (e.g. print() is already a function in R).
Another way, which sorts the values in decreasing order (this matches a reversal only when the input is already sorted in increasing order):
reverse_print <- function(values) paste(values[order(values, decreasing = TRUE)], collapse = " and ")
reverse_print(c("a", "b"))
#[1] "b and a"
reverse_print(c("a", "b", "c", "d"))
#[1] "d and c and b and a"
However, if your main objective is to create a function that recursively builds on the previous conditions, one way of achieving it is a direct recursive function, in which the function calls itself (please see @G.Chan's comment for reference). Unfortunately, I failed to create such a function for your case; my attempt produced "Error: C stack usage 15927520 is too close to the limit". This kind of error is relatively common in recursive functions, as discussed here.
Instead of creating a direct recursive function, I would suggest using while together with a decrementing index, as follows:
revprint <- function(values) {
  size <- length(values)
  # print the elements from last to second, each followed by " and "
  while (size > 1) {
    cat(values[size], " and ", sep = "")
    size <- size - 1
  }
  # print the first element last, with no trailing " and "
  cat(values[1])
}
revprint("a")
# a
revprint(c("a", "b", "c", "d"))
# d and c and b and a
If the input (a character vector) has more than one element, this function prints its last element followed by " and ", then decrements the index and repeats with the element before it. Once only the first element remains, it is printed without a trailing " and ".
Because this function uses cat, the result is displayed on the console but is not returned, so it cannot be assigned directly to an object. To assign it to an object, you can use capture.output():
out <- capture.output(revprint(c("a", "b", "c", "d")))
out
#[1] "d and c and b and a"
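For comparison, the direct recursion mentioned above can be sketched as follows. This is only a minimal illustration, not the attempt that produced the C stack error: the name rev_join is hypothetical, and the key point is that each call passes a strictly shorter vector, so the recursion always reaches the base case.

```r
# Hypothetical recursive helper: joins the elements in reverse order with " and "
rev_join <- function(values) {
  n <- length(values)
  # Base case: zero or one element, nothing left to prepend
  if (n <= 1) return(paste(values, collapse = ""))
  # Recursive case: last element, then the result for the remaining elements
  paste0(values[n], " and ", rev_join(values[-n]))
}

rev_join("a")
# returns: "a"
rev_join(c("a", "b", "c", "d"))
# returns: "d and c and b and a"
```

Unlike the cat-based version, this returns a string, so the result can be assigned directly.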

Convert time doesn't work on a dataframe column

I have some data from an event producer. In a "created_at" column I have mixed types of datetime values:
some NA, some ISO 8601-like, and some POSIX timestamps with and without milliseconds.
I built a function that should take care of everything, meaning it leaves NA and ISO 8601 values as they are and converts POSIX timestamps to ISO 8601.
library(anytime)
convert_time <- function(x) {
nb_char = nchar(x)
if (is.na(x)) return(x)
else if (nb_char == 10 | nb_char == 13) {
num_x = as.numeric(x)
if (nb_char == 13) {
num_x = round(num_x / 1000, 0)
}
return(anytime(num_x))
}
return(x)
}
If I pass one problematic value:
convert_time("1613488656")
"2021-02-16 15:17:36 UTC"
Works well!
Now
df_offer2$created_at = df_offer2$created_at %>% sapply(convert_time)
I still have the problematic values.
Any tips here ?
I would suggest the following small changes...
convert_time <- function(x) {
nb_char = nchar(x)
if (is.na(x)) return(x)
else if (nb_char == 10 | nb_char == 13) {
num_x = as.numeric(x)
if (nb_char == 13) {
num_x = round(num_x / 1000, 0)
}
return(num_x) #remove anytime from here
}
return(x)
}
df_offer2$created_at = df_offer2$created_at %>%
sapply(convert_time) %>% anytime() #put it back in at this point
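Two things that have worked for me:
library(dplyr)
col1 <- seq(from = 1, to = 10)
col2 <- rep("1613488656", 10)
df <- data.frame(cbind(col1, col2))
colnames(df) <- c("index", "created_at")
df <- df %>%
  mutate(converted = convert_time(df$created_at))
Alternatively:
col1 <- seq(from = 1, to = 10)
col2 <- rep("1613488656", 10)
df <- data.frame(cbind(col1, col2))
colnames(df) <- c("index", "created_at")
df$created_at <- convert_time(df$created_at)
Both spit out warnings but appear to make the correction properly. The warnings arise because convert_time() is not vectorised: its if conditions receive the whole column at once. Since R 4.2.0, an if condition of length greater than one is an error rather than a warning.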
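A vectorised variant avoids calling if on a whole column. The sketch below uses base R only (as.POSIXct and format) in place of anytime, which is a swapped-in technique and not the original approach. It assumes that epoch timestamps arrive as strings of exactly 10 digits (seconds) or 13 digits (milliseconds), and that the desired output is a UTC ISO 8601 string. The name convert_time_vec is hypothetical.

```r
# Hypothetical vectorised helper; leaves NA and non-epoch strings untouched
convert_time_vec <- function(x) {
  x <- as.character(x)
  out <- x
  # Assumption: epoch values are exactly 10 (seconds) or 13 (milliseconds) digits
  is_epoch <- !is.na(x) & grepl("^[0-9]{10}$|^[0-9]{13}$", x)
  secs <- as.numeric(x[is_epoch])
  is_ms <- nchar(x[is_epoch]) == 13
  secs[is_ms] <- round(secs[is_ms] / 1000)
  out[is_epoch] <- format(as.POSIXct(secs, origin = "1970-01-01", tz = "UTC"),
                          "%Y-%m-%dT%H:%M:%SZ")
  out
}

convert_time_vec(c("1613488656", "1613488656123", NA, "2021-02-16T15:17:36Z"))
# returns: "2021-02-16T15:17:36Z", "2021-02-16T15:17:36Z", NA, "2021-02-16T15:17:36Z"
```

Because the function takes and returns a whole vector, it can be assigned to the column directly, e.g. df_offer2$created_at <- convert_time_vec(df_offer2$created_at), without sapply.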

In R, is it possible to call a function in a text paragraph and remove the breaks?

I am trying to write a function for non R users for report writing in R markdown.
The function calls unicode for macron characters.
CODE:
library(stringr)
library(Unicode)
library(htmltools)
library(cat)
mac <- function(x){
if (x == "a") {
result <- "\u101"
} else if (x == "A") {
result <- "\u100"
} else if (x == "e") {
result <- "\u113"
} else if (x == "E") {
result <- "\u112"
} else if (x == "i") {
result <- "\u12b"
} else if (x == "I") {
result <- "\u12a"
} else if (x == "o") {
result <- "\u14d"
} else if (x == "O") {
result <- "\u14c"
} else if (x == "u") {
result <- "\u16b"
} else if (x == "U") {
result <- "\u16a"
} else (print("Entry not recognised"))
result = paste0(result, sep = "")
return(result)
# return(p(paste0(result, sep = "")))
}
I have tried:
# gsub("[\r\n]", "", result)
# str_replace_all(x, "[\r\n]" , "")
Without any success - I realise this is because there are no spaces around the output of the function to remove.
As an example, I want this:
p('Something',mac("a"),'nd something with a macron')
To read:
Something ānd something with a macron.
You're getting multiple lines because you're passing a list to p().
If you wrap the text in paste0 the output should all be on one line.
Input:
p(paste0('Something ',mac("a"),'nd something with a macron'))
Output:
<p>Something ānd something with a macron</p>
Which displays as:
Something ānd something with a macron
This can be wrapped in a single function:
p <- function(...) htmltools::p(paste0(...))
If you anticipate users trying to pass lists to p() then you could add something to handle those exceptions.
Full code with example use:
library(stringr)
library(Unicode)
library(htmltools)
library(cat)
mac <- function(x){
if (x == "a") {
result <- "\u101"
} else if (x == "A") {
result <- "\u100"
} else if (x == "e") {
result <- "\u113"
} else if (x == "E") {
result <- "\u112"
} else if (x == "i") {
result <- "\u12b"
} else if (x == "I") {
result <- "\u12a"
} else if (x == "o") {
result <- "\u14d"
} else if (x == "O") {
result <- "\u14c"
} else if (x == "u") {
result <- "\u16b"
} else if (x == "U") {
result <- "\u16a"
} else (print("Entry not recognised"))
result = paste0(result, sep = "")
return(result)
# return(p(paste0(result, sep = "")))
}
# wrap input in paste0() to create a string then pass to p()
p <- function(...) htmltools::p(paste0(...))
# example use
p('Something ',mac("a"),'nd something with a macron')
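As a side note, the long if/else chain in mac() can be replaced by a named lookup vector. In mac() as written, an unrecognised entry prints a message and then fails because result is never assigned; the sketch below stops with an explicit error instead. The names macrons and mac2 are hypothetical, and the code points are the same ones used in mac().

```r
# Hypothetical lookup table: plain vowel -> vowel with macron
macrons <- c(a = "\u0101", A = "\u0100", e = "\u0113", E = "\u0112",
             i = "\u012b", I = "\u012a", o = "\u014d", O = "\u014c",
             u = "\u016b", U = "\u016a")

mac2 <- function(x) {
  # Fail early with a clear message for anything not in the table
  if (!x %in% names(macrons)) stop("Entry not recognised: ", x)
  unname(macrons[x])
}

paste0("Something ", mac2("a"), "nd something with a macron")
```

The result is a single string, so it can be passed to htmltools::p() in the same way as the paste0() call shown earlier.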

Replace multiple characters, by index, in a string quickly

I'm trying to quickly replace multiple characters in a string with another character such as *
For example, I have a string such as:
string = "abcdefghij"
I also have a vector of indexes that indicate where I would like to replace letters in the above string with another character.
string_indexes_replaced = c(1, 4, 6, 9)
Desired output:
"*bc*e*gh*j"
What I've done
I've tried a very novice like approach of splitting the characters up into a list, replacing the characters with *, then collapsing the list back into the desired string, as shown below:
library(dplyr)
library(stringi)
string%>%
strsplit(split = "")%>%
lapply(function(x) replace(x, string_indexes_replaced, rep("*", length(string_indexes_replaced))))%>%
lapply(stri_flatten)%>%
unlist(use.names = FALSE)
which outputs
"*bc*e*gh*j"
but it is clear that there should be something simpler and faster than what I've posted above. Is there anything simpler & quicker than what I've demonstrated here?
In base R, besides the substring() and for-loop method shown by @akrun, you can use utf8ToInt() and intToUtf8():
v <- utf8ToInt(string)
v[string_indexes_replaced ] <- utf8ToInt("*")
res <- intToUtf8(v)
which gives
> res
[1] "*bc*e*gh*j"
We can use substring:
v1 <- c(1, 4, 6, 9)
for(i in seq_along(v1)) substring(string, v1[i], v1[i]) <- "*"
string
#[1] "*bc*e*gh*j"
As we are using stringi, another option is
library(stringi)
stri_sub_all(string, from = v1, length = 1) <- "*"
string
#[1] "*bc*e*gh*j"
A simple recursive solution. Its time efficiency should be about the same as iteration (a for loop). The benefit is that there are no side effects (the assignment to the integer vector ks is local to the function), so the whole computation can be treated as a functional abstraction and fed to other parts of a bigger program. This helps to modularize the code.
# multi-replace for character vector input with length greater than 1
multi_replace_v <- function(v, r, ks) {
ks <- as.integer(ks)
if (length(ks) == 0) {
v
} else if (length(ks) == 1) {
if (ks[[1]] > length(v) | ks[[1]] < 1) {
stop("Invalid parameter: ks=", as.character(ks[[1]]), ". Valid range: 1-", as.character(length(v)))
} else if (ks[[1]] == 1) {
c(r, v[-1])
} else if (ks[[1]] == length(v)) {
c(v[-length(v)], r)
} else {
c(v[1:(ks[[1]]-1)], r, v[(ks[[1]]+1):length(v)])
}
} else {
multi_replace_v(multi_replace_v(v, r, ks[[1]]), r, ks[-1])
}
}
# multi-replace for input of single string character vector
multi_replace_s <- function(s, r, ks) paste0(multi_replace_v(unlist(strsplit(s, '')), r, ks), collapse = '')
# multi-replace for both single string and long vector input
multi_replace <- function(v_or_s, r, ks) {
if (length(v_or_s) == 1) {
multi_replace_s(v_or_s, r, ks)
} else if (length(v_or_s) > 1) {
multi_replace_v(v_or_s, r, ks)
} else {
NULL
}
}
# Example
> multi_replace('abcdefghij', "*", c(1,4,6,9))
[1] "*bc*e*gh*j"
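For a single string, the same side-effect-free behaviour, including the range check, can also be sketched without recursion, because R's vector indexing replaces all positions at once. The name replace_at is hypothetical; this is an illustration and not a benchmarked alternative.

```r
# Hypothetical helper: replace the characters of s at positions ks with r
replace_at <- function(s, ks, r = "*") {
  chars <- strsplit(s, "")[[1]]   # split into single characters
  ks <- as.integer(ks)
  if (any(ks < 1L | ks > length(chars))) {
    stop("Invalid parameter: ks must be within 1-", length(chars))
  }
  chars[ks] <- r                  # vectorised replacement, no loop needed
  paste(chars, collapse = "")
}

replace_at("abcdefghij", c(1, 4, 6, 9))
# returns: "*bc*e*gh*j"
```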

How to test when condition returns numeric(0) in R

Let's imagine you would like to construct a simple test based on the condition setdiff(input, 1:9).
How can I construct an
if isnotempty(setdiff(input, 1:9)) stop ("not valid")
statement which stops execution when input is c(3, 12) but continues when input is c(2,5,7) say?
Many thanks,
Bertie
You could use ?length:
isEmpty <- function(x) {
return(length(x)==0)
}
input <- c(3, 12);
if (!isEmpty(setdiff(input, 1:9))) {
stop ("not valid")
}
Here's another option: identical(x, numeric(0)). Here's an example (I basically took everything from sgibb's answer and replaced the key line, as I'm lazy):
isEmpty <- function(x) {
return(identical(x, numeric(0)))
}
input <- c(3, 12)
if (!isEmpty(setdiff(input, 1:9))) {
stop ("not valid")
}
I used the following functions:
# 1. Check if 'integer(0)'
is.integer0 <- function(x) {
is.integer(x) && length(x) == 0L
}
# 2. Check if 'numeric(0)'
is.numeric0 <- function(x) {
identical(x, numeric(0))
}
# 3. Check if 'integer0' or 'numeric0'
is.int_num_0 <- function(x) {
is.integer0(x) || is.numeric0(x)
}
Hope it will be useful.
This does not answer the question in your subject line, but I think it is a better approach for what you are trying to achieve:
if(!all(input %in% 1:9)) stop("not valid")
I liked the isEmpty function. I propose the following, just in case you pass a vector or a list:
isEmpty <- function(x) {
s <- sapply(x, function(y) length(y) == 0, simplify = TRUE)
return(s)
}
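One caveat when choosing between the length-based and identical-based checks above: setdiff() keeps the type of its first argument, so an integer input yields integer(0), which is not identical to numeric(0). The sketch below illustrates this and wraps the length-based check in a small validator; the name validate_input and its allowed argument are hypothetical.

```r
# setdiff() preserves the type of its first argument
identical(setdiff(c(2, 5, 7), 1:9), numeric(0))   # TRUE: double input
identical(setdiff(2:5, 1:9), numeric(0))          # FALSE: result is integer(0)
length(setdiff(2:5, 1:9)) == 0                    # TRUE for either type

# Hypothetical validator built on the length-based check
validate_input <- function(input, allowed = 1:9) {
  extra <- setdiff(input, allowed)
  if (length(extra) > 0) {
    stop("not valid: ", paste(extra, collapse = ", "))
  }
  invisible(TRUE)
}

validate_input(c(2, 5, 7))     # passes silently
# validate_input(c(3, 12))     # would stop with "not valid: 12"
```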
