How to use lapply for string replacement - R

I am trying to use lapply to replace part of a string in several data.frames contained in a list. When I attempt this, the whole data.frame is replaced, rather than just the string inside it.
A reproducible example:
a <- list( a = data.frame(Date = c("1900-08-31"), Val = 1000),
b = data.frame(Date = c("1900-08-31"), Val = 1000) )
lapply(a, function(x){
gsub(".{2}$","01",x$Date)
})
I expect the Date column of both a and b to become "1900-08-01". Instead, a and b themselves are replaced with "1900-08-01".

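Your anonymous function returns the result of gsub(), which is a character vector, so lapply() collects those vectors instead of the modified data frames. Assign the result back to the Date column and return the data frame: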
lapply(a, function(x){
x$Date <- gsub(".{2}$","01",x$Date)
return(x)
})
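Two variants of the same fix, sketched here as alternatives rather than taken from the answer above. transform() returns each modified data frame directly, and sub() is enough because the anchored pattern can only match once. The second variant assumes Date always holds ISO "YYYY-MM-DD" strings, so it can be parsed with as.Date() and formatted to the first day of the month. The names res1 and res2 are only illustrative.

```r
a <- list(a = data.frame(Date = "1900-08-31", Val = 1000, stringsAsFactors = FALSE),
          b = data.frame(Date = "1900-08-31", Val = 1000, stringsAsFactors = FALSE))

# Variant 1: same regex, but transform() hands back the whole modified data frame
res1 <- lapply(a, transform, Date = sub(".{2}$", "01", Date))

# Variant 2 (assumes ISO dates): parse, then format to the first of the month
res2 <- lapply(a, function(x) {
  x$Date <- format(as.Date(x$Date), "%Y-%m-01")
  x
})

res1$a$Date  # "1900-08-01"
res2$b$Date  # "1900-08-01"
```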

Related

Keep last n characters of cells in a function in R

Consider the following data.frame:
df <- setNames(data.frame(rep("text_2010"),rep(1,5)), c("id", "value"))
I only want to keep the last 4 characters of the cells in the column "id". I can do that with the following code:
df$id <- substr(df$id,nchar(df$id)-3,nchar(df$id))
However, I want to create a function that does the same. Therefore, I create the following function and apply it:
testfunction <- function(x) {
x$id <- substr(x$id,nchar(x$id)-3,nchar(x$id))
}
df <- testfunction(df)
But I do not get the same result. Why is that?
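Add return(x) at the end of your function to return the modified object. Without it, the last expression is the assignment x$id <- ..., and the value of an assignment is its right-hand side (returned invisibly), so df is overwritten with the substring vector instead of the data frame.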
testfunction <- function(x) {
x$id <- substr(x$id,nchar(x$id)-3,nchar(x$id))
return(x)
}
df <- testfunction(df)
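A small sketch of why the first version fails; f_no_return and f_return are illustrative names. A function that ends in an assignment returns the assigned value (here the substring vector), while the second one returns the data frame. stringsAsFactors = FALSE is added as an assumption so the example behaves the same on R versions before 4.0, where nchar() would reject a factor column.

```r
df <- setNames(data.frame(rep("text_2010", 5), rep(1, 5), stringsAsFactors = FALSE),
               c("id", "value"))

# Last expression is an assignment: its value (the right-hand side) is returned
f_no_return <- function(x) {
  x$id <- substr(x$id, nchar(x$id) - 3, nchar(x$id))
}

# Last expression is x itself: the modified data frame is returned
f_return <- function(x) {
  x$id <- substr(x$id, nchar(x$id) - 3, nchar(x$id))
  x
}

out1 <- f_no_return(df)
out2 <- f_return(df)
class(out1)  # "character"
class(out2)  # "data.frame"
```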
You don't always need an explicit return() (some prefer one for readability): R returns the value of the last evaluated expression, so here you can also write
testfunction <- function(x) {
transform(x, id = substring(id, nchar(id)-3))
}
df <- testfunction(df)
which should work the same.
We can also make n an argument, so the function is not hard-coded to 4 characters, and build a regex pattern for sub() from it:
testfunction <- function(x, n) {
pat <- sprintf(".*(%s)$", strrep(".", n))
x$id <- sub(pat, "\\1", x$id)
return(x)
}
Testing:
testfunction(df, n = 4)
# id value
#1 2010 1
#2 2010 1
#3 2010 1
#4 2010 1
#5 2010 1
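To make the regex approach concrete, this checks the pattern built for n = 4 and one edge case ("ab" is just a made-up input shorter than n). When a string has fewer than n characters the pattern cannot match, so sub() returns it unchanged.

```r
n <- 4
# ".*" is greedy, leaving exactly the last n characters for the capture group
pat <- sprintf(".*(%s)$", strrep(".", n))
pat                           # ".*(....)$"
sub(pat, "\\1", "text_2010")  # "2010"
# Shorter than n: no match, so the input comes back unchanged
sub(pat, "\\1", "ab")         # "ab"
```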
Base R solution attempting to mirror Excel's RIGHT() function:
# Function to extract the right n characters from each element of a provided vector:
right <- function(char_vec, n = 1){
  # Coerce to character if needed (e.g. factors or numbers): char_vec => character vector
  if(!is.character(char_vec)){
    char_vec <- as.character(char_vec)
  }
  # Store the number of characters in each element of the provided vector:
  # num_chars => integer vector
  num_chars <- nchar(char_vec)
  # Return the right-hand n characters of each string: character vector
  return(substr(char_vec, (num_chars + 1) - n, num_chars))
}
# Application:
right(df$id, 4)
Data:
df <- setNames(data.frame(rep("text_2010"),rep(1,5)), c("id", "value"))

R Convert loop into function

I would like to clean up my code a bit and start to use more functions for my everyday computations (where I would normally use for loops). I have an example of a for loop that I would like to make into a function. The problem I am having is how to step through the constraint vectors without a loop. Here's what I mean:
## represents spectral data
set.seed(11)
df <- data.frame(Sample = 1:100, replicate(1000, sample(0:1000, 100, rep = TRUE)))
## feature ranges by column number
frm <- c(438,563,953,963)
to <- c(548,803,1000,993)
nm <- c("WL890", "WL1080", "WL1400", "WL1375")
WL.ps <- list()
for (i in 1:length(frm)){
## finds the minimum value within the range constraints and returns the corresponding column name
WL <- colnames(df[frm[i]:to[i]])[apply(df[frm[i]:to[i]],1,which.min)]
WL.ps[[i]] <- WL
}
new.df <- data.frame(WL.ps)
colnames(new.df) <- nm
The part I'm having trouble with is iterating through the values of the frm and to vectors. How does one go from frm[1] to frm[2] and so on in a function (apply or otherwise)?
Any advice would be greatly appreciated.
Thank you.
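You could write a function that returns the column name of the minimum value in each row for a given range of columns. I used max.col() on the negated data instead of apply(df, 1, which.min), since max.col() works on the whole matrix at once and is usually faster than apply(). Note that max.col() breaks ties at random by default; ties.method = "first" matches which.min(), which returns the first minimum.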
apply_fun <- function(data, x, y) {
  cols <- x:y
  # Negate so the row maximum is the row minimum; "first" mirrors which.min()
  names(data[cols])[max.col(-data[cols], ties.method = "first")]
}
Apply this function to each pair of frm and to values using Map():
WL.ps <- Map(apply_fun, frm, to, MoreArgs = list(data = df))
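A minimal end-to-end sketch, assuming the data from the question: it builds new.df from the Map() result with the column names in nm and compares the result against the original loop. WL.loop and rng are illustrative names, and replace = TRUE is written out in place of the partially matched rep = TRUE.

```r
set.seed(11)
df <- data.frame(Sample = 1:100,
                 replicate(1000, sample(0:1000, 100, replace = TRUE)))
frm <- c(438, 563, 953, 963)
to  <- c(548, 803, 1000, 993)
nm  <- c("WL890", "WL1080", "WL1400", "WL1375")

# Column name of the row minimum within columns x:y of data
apply_fun <- function(data, x, y) {
  cols <- x:y
  names(data[cols])[max.col(-data[cols], ties.method = "first")]
}

# One call per (frm[i], to[i]) pair; data is held fixed via MoreArgs
WL.ps <- Map(apply_fun, frm, to, MoreArgs = list(data = df))
new.df <- setNames(data.frame(WL.ps, stringsAsFactors = FALSE), nm)

# Reference result from the original loop, for comparison
WL.loop <- list()
for (i in seq_along(frm)) {
  rng <- frm[i]:to[i]
  WL.loop[[i]] <- colnames(df[rng])[apply(df[rng], 1, which.min)]
}
```

With ties.method = "first" both approaches pick the same column in every row, including rows where the minimum occurs more than once.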

Apply Function to Specific Column in R List

I have seen many questions pretty similar to mine, but none of the answers I've seen have actually solved what I'm trying to do. I have a list of data frames, and I'm trying to apply the digest() function to the same column in each data frame in my list. A couple of the answers I've seen on SO to this have been:
dflist <- list(data.frame(number = 1:10, name = 1:10),
data.frame(number = 2:15, name = 1:14))
dflist <- lapply(dflist, function(x){
x$name <- digest(x$name, algo = "sha256")
return(x)
})
#OR this
dflist <- lapply(dflist, function(x) {
x %>% mutate_each(funs(digest(.,algo = "sha256")), "name")
})
Both of these give the same output - which is simply every row in the name column having the same exact value. The digest() function works but only returns the value of the first row, in every row.
I've also tried:
dflist <- lapply(dflist, function(x) {
digest(x[,"name"], algo = "sha256")
})
But this just returns only the first value from each data frame in the list.
Any advice would be much appreciated!
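digest() is not vectorized: it serializes its whole argument and returns a single hash for it, which is then recycled to every row. Wrap it in Vectorize() so each element is hashed separately: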
dflist1 <- lapply(dflist, function(x) {
x$name <- Vectorize(digest::digest)(x$name, algo = "sha256")
x
})
Or use it in transform():
dflist1 <- lapply(dflist, transform, name = Vectorize(digest::digest)(name, algo = "sha256"))
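An alternative sketch with vapply() instead of Vectorize(), assuming the digest package is installed; hash_each is a hypothetical helper name. vapply() calls digest() once per element and checks that every call returns exactly one string.

```r
library(digest)  # assumes the digest package is installed

dflist <- list(data.frame(number = 1:10, name = 1:10),
               data.frame(number = 2:15, name = 1:14))

# Hypothetical helper: hash each element of v separately
hash_each <- function(v, algo = "sha256") {
  vapply(v, digest, character(1), algo = algo, USE.NAMES = FALSE)
}

dflist2 <- lapply(dflist, function(x) {
  x$name <- hash_each(x$name)
  x
})
```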

Is there a way to return two separate lists from one function?

I have a data frame which looks like this
value <- c(1:1000)
group <- c(1:5)
df <- data.frame(value,group)
And I want to use this function on my data frame
myfun <- function(){
wz1 <- df[sample(nrow(df), size = 300, replace = FALSE),]
wz2 <- df[sample(nrow(df), size = 10, replace = FALSE),]
wz3 <- df[sample(nrow(df), size = 100, replace = FALSE),]
wz4 <- df[sample(nrow(df), size = 40, replace = FALSE),]
wz5 <- df[sample(nrow(df), size = 50, replace = FALSE),]
wza <- rbind(wz1,wz2, wz3, wz4, wz5)
wza_sum <- aggregate(wza, by = list(group_ID=wza$group), FUN = sum)
return(list(wza = wza,wza_sum = wza_sum))
}
Right now I am returning one list which includes wza and wza_sum.
Is there a way to return two separate lists, one containing wza and the other containing wza_sum?
The aggregate() function needs to be in myfun() because I want to replicate myfun() 100 times using
dfx <- replicate(100, myfun(), simplify = FALSE)
A function takes one set of inputs and returns exactly one object. That object can be a list, which is how several results are returned together. Consider this simple example:
myfunction <- function(x) {
  x     # evaluated, then discarded
  x^2   # last expression: this value is returned
}
Unless you call return() earlier (which you usually don't), the value of the last expression is returned. In fact, if you try to return two objects, e.g. return(1, 2), you are met with
Error in return(1, 2) : multi-argument returns are not permitted
That is why the solution proposed by @StupidWolf in the comments is the most appropriate one: return a single list, e.g. return(list(wza = list(wza), wza_sum = list(wza_sum))). You then split the lists apart as a post-processing step if needed.
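A sketch of that post-processing, assuming myfun() returns a plain list(wza = wza, wza_sum = wza_sum) and is replicated with simplify = FALSE. Each replicate is then a two-element list, so extracting one component from every replicate gives the two separate lists. myfun() is rewritten compactly here with the same sample sizes; wza_list and wza_sum_list are illustrative names.

```r
value <- 1:1000
group <- 1:5
df <- data.frame(value, group)

# Same sampling as the question's myfun(): five independent samples of
# 300, 10, 100, 40 and 50 rows, stacked and then summed by group
myfun <- function() {
  sizes <- c(300, 10, 100, 40, 50)
  wza <- do.call(rbind, lapply(sizes, function(s) df[sample(nrow(df), size = s), ]))
  wza_sum <- aggregate(wza, by = list(group_ID = wza$group), FUN = sum)
  list(wza = wza, wza_sum = wza_sum)
}

set.seed(42)
dfx <- replicate(100, myfun(), simplify = FALSE)

# Pull each component out of every replicate into its own list of 100
wza_list     <- lapply(dfx, `[[`, "wza")
wza_sum_list <- lapply(dfx, `[[`, "wza_sum")
```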

How to pass variables into split()?

I want to run split() in a for loop, but when I build the data.frame name with paste() and pass it in, split() just works on that text instead of the data.frame. The idea is to split CMPD_DF_1, CMPD_DF_2, etc. based on CMPD_DF_1[5], CMPD_DF_2[5], etc. How do I pass in the data.frame and not a string?
for (i in 1:10) {
split(paste("CMPD_DF", i, sep = "_"),
paste(paste("CMPD_DF", i, sep = "_"), "[5]", sep=""))
}
You can put your data frames in a list and then use lapply(). This assumes the column you are splitting on is the same (here, the second column) in each data frame.
d1 <- data.frame(x =1:10, y = rep(letters[1:2], each = 5))
d2 <- d1
l <- list(d1,d2)
myFun <- function(x){
return(split(x,x[,2]))
}
lapply(l,myFun)
And here's a way to do this using mapply that will allow for different splitting columns in each data frame. You just pre-specify the columns in a separate list and pass them to mapply:
l <- list(d1,d2)
splitColumns <- list("y","y")
myFun2 <- function(x,col){
return(split(x,x[,col]))
}
mapply(myFun2,l,splitColumns,SIMPLIFY = FALSE)
Your code doesn't work because you're not passing a data.frame to split(). You're passing a character vector that contains the name of your data.frame. Looking the object up by name with get() works, but it's not very R-like; @joran's answer is preferable.
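res <- list()
for (i in 1:10) {
  dfname <- paste("CMPD_DF", i, sep = "_")
  # get() fetches the object by name; store the result, since a for loop neither prints nor keeps it
  res[[dfname]] <- split(get(dfname), get(dfname)[5])
}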
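A sketch of the list-based approach when the data frames already exist as separate objects named CMPD_DF_1, CMPD_DF_2, and so on: mget() fetches several objects by name into a named list, which lapply() can then split. The two small data frames below are made up so the example runs; their fifth column, grp, is the splitting column.

```r
# Made-up stand-ins for the question's data frames; column 5 is the split key
CMPD_DF_1 <- data.frame(a = 1:4, b = 4:1, c = 0, d = 0, grp = c("x", "y", "x", "y"))
CMPD_DF_2 <- data.frame(a = 1:3, b = 3:1, c = 0, d = 0, grp = c("p", "p", "q"))

# Collect the objects by name into one named list
dfs <- mget(paste("CMPD_DF", 1:2, sep = "_"))

# Split every data frame by its fifth column
split_list <- lapply(dfs, function(d) split(d, d[[5]]))

names(split_list)            # "CMPD_DF_1" "CMPD_DF_2"
names(split_list$CMPD_DF_1)  # "x" "y"
```

Keeping the data frames in a named list from the start avoids both paste() and get() entirely.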
