Keep last n characters of cells in a function in R


Consider the following data.frame:
df <- setNames(data.frame(rep("text_2010"),rep(1,5)), c("id", "value"))
I only want to keep the 4 last characters of the cells in the column "id". Therefore, I can use the following code:
df$id <- substr(df$id,nchar(df$id)-3,nchar(df$id))
However, I want to create a function that does the same. Therefore, I create the following function and apply it:
testfunction <- function(x) {
x$id <- substr(x$id,nchar(x$id)-3,nchar(x$id))
}
df <- testfunction(df)
But I do not get the same result. Why is that?

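Add return(x) to your function so that it returns the modified data frame. As written, the function's value is the result of its last expression, the assignment, rather than x.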
testfunction <- function(x) {
x$id <- substr(x$id,nchar(x$id)-3,nchar(x$id))
return(x)
}
df <- testfunction(df)
However, an explicit return statement is not always needed (although it is clearer to have one). By default, R returns the value of the last expression evaluated in the function, so here you can also do
testfunction <- function(x) {
transform(x, id = substring(id, nchar(id)-3))
}
df <- testfunction(df)
which should work the same.
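To see why the version without return(x) behaves differently, here is a minimal sketch comparing the two. The names no_return and with_return are made up for this illustration, and stringsAsFactors = FALSE is added on the assumption that the code may run on an R version older than 4.0.

```r
# Same data as in the question; stringsAsFactors = FALSE keeps id as character
# on R versions before 4.0 as well.
df <- setNames(data.frame(rep("text_2010"), rep(1, 5), stringsAsFactors = FALSE),
               c("id", "value"))

# The question's function: its last expression is an assignment, so the
# function's value is the assigned vector, not the data frame.
no_return <- function(x) {
  x$id <- substr(x$id, nchar(x$id) - 3, nchar(x$id))
}

# The answer's function: the modified data frame is returned explicitly.
with_return <- function(x) {
  x$id <- substr(x$id, nchar(x$id) - 3, nchar(x$id))
  return(x)
}

res_a <- no_return(df)    # a character vector of the shortened ids
res_b <- with_return(df)  # a data frame with columns id and value

class(res_a)
class(res_b)
```

So df <- testfunction(df) with the original function overwrites df with a plain vector, which is why the result differs.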

We can also create a function that takes an argument n, so that the number of characters to keep is not hard-coded, and constructs a regex pattern to be used with sub:
testfunction <- function(x, n) {
pat <- sprintf(".*(%s)$", strrep(".", n))
x$id <- sub(pat, "\\1", x$id)
return(x)
}
Testing:
testfunction(df, n = 4)
# id value
#1 2010 1
#2 2010 1
#3 2010 1
#4 2010 1
#5 2010 1
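For comparison, the same parameterised idea can be sketched with substr instead of a regex. The function name keep_last_n and its col argument are hypothetical, and the second row of the sample data is added only to show what happens when a string is shorter than n.

```r
# Keep the last n characters of one column (col) of a data frame.
keep_last_n <- function(x, n, col = "id") {
  values <- as.character(x[[col]])
  len <- nchar(values)
  # substr() treats a start position below 1 as 1, so strings shorter
  # than n are returned unchanged.
  x[[col]] <- substr(values, len - n + 1, len)
  x
}

df <- data.frame(id = c("text_2010", "ab"), value = 1:2,
                 stringsAsFactors = FALSE)
keep_last_n(df, 4)$id
# "2010" "ab"
```

Passing the column name as an argument is a design choice for this sketch; the answers above hard-code id.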

Base R solution attempting to mirror Excel's RIGHT() function:
# Function to extract the right n characters from each element of a provided vector:
right <- function(char_vec, n = 1){
# Check if vector provided isn't of type character:
if(!is.character(char_vec)){
# Coerce it, if not: char_vec => character vector
char_vec <- vapply(char_vec, as.character, "character")
}
# Store the number of characters in each element of the provided vector:
# num_chars => integer vector
num_chars <- nchar(char_vec)
# Return the rightmost n characters of each element: => character vector
return(substr(char_vec, (num_chars + 1) - n, num_chars))
}
# Application:
right(df$id, 4)
Data:
df <- setNames(data.frame(rep("text_2010"),rep(1,5)), c("id", "value"))

Related

How to write a function with an unspecified number of arguments where the arguments are column names

I am trying to write a function with an unspecified number of arguments using ... but I am running into issues where those arguments are column names. As a simple example, if I want a function that takes a data frame and uses within() to make a new column that is several other columns pasted together, I would intuitively write it as
example.fun <- function(input,...){
res <- within(input,pasted <- paste(...))
res}
where input is a data frame and ... specifies column names. This gives an error saying that the column names cannot be found (they are treated as objects). e.g.
df <- data.frame(x = c(1,2),y=c("a","b"))
example.fun(df,x,y)
This returns "Error in paste(...) : object 'x' not found "
I can use attach() and detach() within the function as a work around,
example.fun2 <- function(input,...){
attach(input)
res <- within(input,pasted <- paste(...))
detach(input)
res}
This works, but it's clunky and runs into issues if there happens to be an object in the global environment that is called the same thing as a column name, so it's not my preference.
What is the correct way to do this?
Thanks

1) Wrap the code in eval(substitute(...code...)) like this:
example.fun <- function(data, ...) {
eval(substitute(within(data, pasted <- paste(...))))
}
# test
df <- data.frame(x = c(1, 2), y = c("a", "b"))
example.fun(df, x, y)
## x y pasted
## 1 1 a 1 a
## 2 2 b 2 b
1a) A variation of that would be:
example.fun.2 <- function(data, ...) {
data.frame(data, pasted = eval(substitute(paste(...)), data))
}
example.fun.2(df, x, y)
2) Another possibility is to convert each argument to a character string and then use indexing.
example.fun.3 <- function(data, ...) {
vnames <- sapply(substitute(list(...))[-1], deparse)
data.frame(data, pasted = do.call("paste", data[vnames]))
}
example.fun.3(df, x, y)
3) Other possibilities are to change the design of the function and pass the variable names as a formula or character vector.
example.fun.4 <- function(data, formula) {
data.frame(data, pasted = do.call("paste", get_all_vars(formula, data)))
}
example.fun.4(df, ~ x + y)
example.fun.5 <- function(data, vnames) {
data.frame(data, pasted = do.call("paste", data[vnames]))
}
example.fun.5(df, c("x", "y"))
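The substitute-based approaches above rely on capturing the arguments unevaluated. A small sketch of that mechanism, using a hypothetical helper show_call that is not part of the answers:

```r
# substitute() replaces ... with the expressions the caller typed,
# without evaluating them.
show_call <- function(...) substitute(paste(...))

cl <- show_call(x, y)
cl  # the unevaluated call paste(x, y)

# Evaluating the captured call with the data frame as the environment
# makes x and y resolve to its columns.
df <- data.frame(x = c(1, 2), y = c("a", "b"), stringsAsFactors = FALSE)
pasted <- eval(cl, df)
pasted
# "1 a" "2 b"
```

This is why example.fun(df, x, y) no longer fails with "object 'x' not found": the names are looked up in the data, not in the calling environment.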

Create a generalized function in R

a<-c(0,1,1,0)
b<-c(1,0,0,0)
c<-c(0,0,0,1)
binary_subset<-function(a){
a_seq = lapply(a, seq, 0) # keep 0s as 0, make 1s c(1, 0)
subset=do.call(expand.grid, a_seq)
colnames(subset)=(1:length(a))
return(subset)
}
test_fun<-function(a,b,c,d){
list <- list(a,b,c,d)
interactions_abc<-do.call("rbind",lapply(list, binary_subset))
interactions_no_duplicate<-unique(interactions_abc[1:length(a)])
rownames(interactions_no_duplicate)=1:nrow(interactions_no_duplicate)
interactions_no_duplicate
}
>test_fun(a,b,c,d)
Error in test_fun(a, b, c, d) : object 'd' not found
I am trying to write a function where the input is not fixed. I have defined the function for 4 binary vectors. If I input 3 binary vectors, I am getting an error because the 4th vector is missing. This will work only if I input 4 binary vectors.
How can I fix this? That is, if I input 2 or 3 vectors, the function should produce the corresponding output for just those vectors instead of failing because the remaining arguments are missing.

Maybe you can use ... for the function arguments, e.g.,
test_fun <- function(...) {
list <- list(...)
interactions_abc <- do.call("rbind", lapply(list, binary_subset))
interactions_no_duplicate <- unique(interactions_abc[1:length(list[[1]])])
rownames(interactions_no_duplicate) <- 1:nrow(interactions_no_duplicate)
interactions_no_duplicate
}
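As a quick check of the ... version, here is a self-contained sketch that calls it with two and with three vectors. The internal variable names are changed slightly (for example vecs instead of list, to avoid shadowing the base function); the logic is meant to be the same as above.

```r
a <- c(0, 1, 1, 0)
b <- c(1, 0, 0, 0)
c <- c(0, 0, 0, 1)

# Every 1 becomes c(1, 0) and every 0 stays 0; expand.grid builds all combinations.
binary_subset <- function(v) {
  grid <- do.call(expand.grid, lapply(v, seq, 0))
  colnames(grid) <- seq_along(v)
  grid
}

# Accepts any number of binary vectors of equal length.
test_fun <- function(...) {
  vecs <- list(...)
  combined <- do.call("rbind", lapply(vecs, binary_subset))
  result <- unique(combined)
  rownames(result) <- seq_len(nrow(result))
  result
}

res2 <- test_fun(a, b)     # two inputs
res3 <- test_fun(a, b, c)  # three inputs
nrow(res2)
# 5
nrow(res3)
# 6
```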

Consider passing in a list as the sole argument, built by retrieving all numeric vectors from the global environment with eapply and Filter. The functions below are refactored into single expressions, so the { and } are optional.
a <- c(0,1,1,0)
b <- c(1,0,0,0)
c <- c(0,0,0,1)
binary_subset <- function(x) {
setNames(do.call(expand.grid, lapply(x, seq, 0)), seq_along(x))
}
test_fun <- function(mylist) {
data.frame(unique(do.call("rbind", lapply(mylist, binary_subset))),
row.names = NULL, check.names = FALSE)
}
vecs <- Filter(is.numeric, eapply(.GlobalEnv, identity))
test_fun(vecs)

The loop in my R function appears to be running twice

I need to add rows to a data frame. I have many files with many rows so I have converted the code to a function. When I go through each element of the code it works fine. When I wrap everything in a function each row from my first loop gets added twice.
My code looks for a string (xx or x). If xx is present, it replaces the xx with the numbers 00-99 and 0-9 (one row for each number). If x is present, it replaces it with the numbers 0-9.
Create DF
a <- c("1.x", "2.xx", "3.1")
b <- c("single", "double", "nothing")
df <- data.frame(a, b, stringsAsFactors = FALSE)
names(df) <- c("code", "desc")
My dataframe
code desc
1 1.x single
2 2.xx double
3 3.1 nothing
My function
newdf <- function(df){
# If I run through my code chunk by chunk it works as I want it.
df$expanded <- 0 # a variable to let me know if the loop was run on the row
emp <- function(){ # This function creates empty vectors for my loop
assign("codes", c(), envir = .GlobalEnv)
assign("desc", c(), envir = .GlobalEnv)
assign("expanded", c(), envir = .GlobalEnv)
}
emp()
# I want to expand xx with numbers 00 - 99 and 0 - 9.
#Note: 2.0 is different than 2.00
# Identifies the rows to be expanded
xd <- grep("xx", df$code)
# I used chr vs. numeric so I wouldn't lose the trailing zero
# Create a vector to loop through
tens <- formatC(c(0:99)); tens <- tens[11:100]
ones <- c("00","01","02","03","04","05","06","07","08","09")
single <- as.character(c(0:9))
exp <- c(single, ones, tens)
# This loop appears to run twice when I run the function: newdf(df)
# Each row is there twice: 2.00, 2.00, 2.01 2.01...
# It runs as I want it to if I just highlight the code.
for (i in xd){
for (n in exp) {
codes <- c(codes, gsub("xx", n, df$code[i])) #expanding the number
desc <- c(desc, df$desc[i]) # repeating the description
expanded <- c(expanded, 1) # assigning 1 to indicate the row has been expanded
}
}
# Binds the df with the new expansion
df <- df[-xd, ]
df <- rbind(as.matrix(df),cbind(codes,desc,expanded))
df <- as.data.frame(df, stringsAsFactors = FALSE)
# Empties the vector to begin another expansion
emp()
xs <- grep("x", df$code) # This is for the single digit expansion
# Expands the single digits. This part of the code works fine inside the function.
for (i in xs){
for (n in 0:9) {
codes <- c(codes, gsub("x", n, df$code[i]))
desc <- c(desc, df$desc[i])
expanded <- c(expanded, 1)
}
}
df <- df[-xs,]
df <- rbind(as.matrix(df), cbind(codes,desc,expanded))
df <- as.data.frame(df, stringsAsFactors = FALSE)
assign("out", df, envir = .GlobalEnv) # This is how I view my dataframe after I run the function.
}
Calling my function
newdf(df)
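One likely explanation for the doubled rows: inside newdf, the line codes <- c(codes, ...) creates local copies of codes, desc and expanded, while emp() resets only the global ones. The local vectors therefore still hold the xx expansion when the second rbind runs, so those rows are appended again. When the code is run line by line at the top level, everything is global and emp() really does empty the vectors. Below is a sketch that avoids global state altogether. The name expand_codes and its structure are assumptions for illustration, not the asker's code, and the row order differs from the original.

```r
expand_codes <- function(df) {
  two_digit <- c(as.character(0:9), sprintf("%02d", 0:99))  # fills for "xx"
  one_digit <- as.character(0:9)                             # fills for "x"

  # Expand a single row into one or more rows.
  expand_row <- function(code, desc) {
    if (grepl("xx", code, fixed = TRUE)) {
      pattern <- "xx"; fills <- two_digit
    } else if (grepl("x", code, fixed = TRUE)) {
      pattern <- "x"; fills <- one_digit
    } else {
      # Nothing to expand: keep the row and flag it with 0.
      return(data.frame(code = code, desc = desc, expanded = 0,
                        stringsAsFactors = FALSE))
    }
    new_codes <- vapply(fills, function(f) sub(pattern, f, code, fixed = TRUE),
                        character(1), USE.NAMES = FALSE)
    data.frame(code = new_codes, desc = desc, expanded = 1,
               stringsAsFactors = FALSE)
  }

  pieces <- unname(Map(expand_row, df$code, df$desc))
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out  # returned to the caller instead of being assigned into .GlobalEnv
}

df <- data.frame(code = c("1.x", "2.xx", "3.1"),
                 desc = c("single", "double", "nothing"),
                 stringsAsFactors = FALSE)
out <- expand_codes(df)
nrow(out)
# 121
```

Returning the result and assigning it with out <- expand_codes(df) replaces the assign("out", ...) call in the original.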

R function doesn't return value

I'm working on a Kaggle Kernel relating to FIFA 19 data (https://www.kaggle.com/karangadiya/fifa19) and trying to create a function that adds up the numbers in a column.
The column has values like 88+2 (class - character)
The desired result would be 90 (class - integer)
I tried to create a function in order to transform such multiple columns
add_fun <- function(x){
a <- strsplit(x, "\\+")
for (i in 1:length(a)){
a[[i]] <- as.numeric(a[[i]])
}
for (i in 1:length(a)){
a[[i]] <- a[[i]][1] + a[[i]][2]
}
x <- as.numeric(unlist(a))
}
This works perfectly fine when I manually transform each column but the function won't return the desired results. Can someone sort this out?

Read the CSV data into df, then extract the 4 required columns using
dff <- df[, c("LS","ST", "RS","LW")]
def_fun <- function(x){
# x holds a single cell such as "88+2": split it on "+" and sum the parts
parts <- strsplit(x, '\\+')[[1]]
return(sum(as.numeric(parts)))
}
Then apply the operations on the required columns
for (i in 1: ncol(dff)){
dff[i] <- apply(dff[i], 1, FUN = def_fun)
}
You can cbind this data frame with the original one and drop the original columns.
I hope it proves helpful.
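A vectorised alternative can be sketched as follows. It processes a whole column at once and returns integers, as the question asks. The function name add_parts and the sample values are made up for this illustration; only the column names LS and ST come from the answer above.

```r
# Sum the parts of strings such as "88+2", one result per element.
add_parts <- function(x) {
  parts <- strsplit(as.character(x), "+", fixed = TRUE)
  vapply(parts, function(p) sum(as.integer(p)), integer(1))
}

# Hypothetical sample data standing in for the Kaggle columns.
dff <- data.frame(LS = c("88+2", "70+3"),
                  ST = c("85+2", "64+1"),
                  stringsAsFactors = FALSE)

# dff[] keeps the data frame structure while replacing every column.
dff[] <- lapply(dff, add_parts)
dff$LS
# 90 73
```

Because the last expression is the vapply call rather than an assignment, the result is returned visibly. In the question's add_fun the final line is x <- ..., so the value comes back invisibly and nothing is printed unless the result is assigned or wrapped in print().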

Using R to loop through vector and copy some sequences to data.frame

I want to search through a vector for the sequence of strings "hello" "world". When I find this sequence, I want to copy it, including the 10 elements before and after, as a row in a data.frame to which I'll apply further analysis.
My problem: I get an error "new column would leave holes after existing columns". I'm new to coding, so I'm not sure how to manipulate data.frames. Maybe I need to create rows in the loop?
This is what I have:
df = data.frame()
i <- 1
for(n in 1:length(v))
{
if(v[n] == 'hello' & v[n+1] == 'world')
{
df[i,n-11:n+11] <- v[n-10:n+11]
i <- i+1
}
}
Thanks!

Maybe this helps:
indx <- which(v1[-length(v1)]=='hello'& v1[-1]=='world')
lst <- Map(function(x,y) {s1 <- seq(x,y)
v1[s1[s1>0 & s1 <= length(v1)]]}, indx-10, indx+11)
len <- max(sapply(lst, length))
d1 <- as.data.frame(do.call(rbind,lapply(lst, `length<-`, len)))
Data:
set.seed(496)
v1 <- sample(c(letters[1:3], 'hello', 'world'), 100, replace=TRUE)
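One further point about the loop in the question: the colon operator binds more tightly than + and -, so an index such as n-10:n+11 is not the window it appears to be. A minimal sketch with an arbitrary position n = 15 (the variable names are illustrative only):

```r
n <- 15

# Parsed as n - (10:n) + 11, which gives a short decreasing sequence.
without_parens <- n - 10:n + 11

# Parentheses give the intended window: 10 elements before position n
# through 11 elements after it.
with_parens <- (n - 10):(n + 11)

without_parens
# 16 15 14 13 12 11
length(with_parens)
# 22
```

The mismatched index lengths on the two sides of the assignment in the question are consistent with this parsing, which may be what triggers the "holes" error.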