R split function - do not include grouping variable in new dataset

R split function - do not include grouping variable in new dataset - r

I usually like to use lapply() instead of a for loop:
lx <- split( x, x$hr) #with the next step being lapply( lx, function( x) ...)).
But now each element of lx includes the column hr, which is inefficient because that information is already in names( lx).
So now I must do:
lx <- lapply( lx, function( X) select( X, -hr))
(An alternative is:
HR <- unique( x$hr)
lx <- select( lx, -hr)
lx <- split( x, HR)
)
The whole point of lapply() over a for loop is to be efficient so these extra lines bother me. It seems like such a common use case, and my experience has shown that usually R has something more efficient, or I'm missing something.
Can this be achieved in a single function call or one-liner?
EDIT: Specific Example
DF <- data.frame( A = 1:2, B = 2:3, C = 3:4)
DF <- split( DF, factor( DF$A)) # but each list element still contains the column A which is
# redundant (because the names() of the list element equals A
# as well), so I have to write the following line if I want
# to be efficient especially with large datasets
DF <- lapply( DF, function( x) select( x, -A)) # I hate always writing this line!

Remove the split column first:
split(DF[-1], DF[[1]])
or
split(subset(DF, select = -A), DF$A)
Update: Added last line.

Related

R Convert loop into function

I would like to clean up my code a bit and start to use more functions for my everyday computations (where I would normally use for loops). I have an example of a for loop that I would like to make into a function. The problem I am having is in how to step through the constraint vectors without a loop. Here's what I mean;
## represents spectral data
set.seed(11)
df <- data.frame(Sample = 1:100, replicate(1000, sample(0:1000, 100, rep = TRUE)))
## feature ranges by column number
frm <- c(438,563,953,963)
to <- c(548,803,1000,993)
nm <- c("WL890", "WL1080", "WL1400", "WL1375")
WL.ps <- list()
for (i in 1:length(frm)){
## finds the minimum value within the range constraints and returns the corresponding column name
WL <- colnames(df[frm[i]:to[i]])[apply(df[frm[i]:to[i]],1,which.min)]
WL.ps[[i]] <- WL
}
new.df <- data.frame(WL.ps)
colnames(new.df) <- nm
The part where I iterate through the 'frm' and 'to' vector values is what I'm having trouble with. How does one go from frm[1] to frm[2].. so-on in a function (apply or otherwise)?
Any advice would be greatly appreciated.
Thank you.

You could write a function which returns column name of minimum value in each row for a particular range of columns. I have used max.col instead of apply(df, 1, which.min) to get minimum value in a row since max.col would be efficient compared to apply.
apply_fun <- function(data, x, y) {
cols <- x:y
names(data[cols])[max.col(-data[cols])]
}
Apply this function using Map :
WL.ps <- Map(apply_fun, frm, to, MoreArgs = list(data = df))

How to call a different value for each element in a list in R

I have a list with 29 data frames.
I am trying to do a simple transformation with ifelse(), that looks something like this:
with(df, ifelse(col1 > x, col1 <- col1-y, col1<-col1+y))
The one thing I can't seem to get is how to change that x and y value so that a different value is used for each data frame in the list.
Here's a quick reproducible example of what I've got so far .. but I want to call different values for x and y from a data frame (e.g. info)
df.1 <- data.frame("df"=rep(c(1), times=4),"length"=c(10:7))
df.2 <- data.frame("df"=rep(c(2),times=4),"length"=c(8:11))
df.3 <- data.frame("df"=rep(c(3),times=4),"length"=c(9:12))
list <- list(df.1,df.2,df.3)
info <- data.frame(x=rep(c(8.5,9.5,10.5)), y=rep(c(1,1.5,2)))
# using static number for x & y but wanting these to be grabbed from the above df and change
# for each list
x <- 8
y <- 1
lapply(list, function(df) {
df <- with(df, ifelse(length > x,
length <- length-y,
length <- length+y)) })
Any and all help/insight is appreciated!
Edited to add clarification:
I would like the rows to match up with lists.
E.g. Row 1 in Info (x=8.5, y=1) is used in the function and applied just to the first data frame in the list (df.1).

When you need to pass more than one value to lapply, you must use mapply instead.
mapply(
function(df, x, y) {
#print("df")
#print(df)
#print("x")
#print(x)
#print("y")
#print(y)
with(df, ifelse(length > x, length <- length - x, length <- length + y))
},
list,
info$x,
info$y
)
I've left some debugging in the code which can enabled in case you want to see how it works.

Applying functions to vector elements to make rows in a new dataframe

I've got an interesting problem and have no idea where to begin -- in fact, I wasn't even sure how to title the question! What I want to do is apply functions to elements of a dataframe and use these to make new rows in a new dataframe. For example, suppose we have a dataframe df1 that gives some X and Y data for various States:
df1 <- data.frame(State=c("AL","AK"), X=c(1,3), y=c(2,4))
What I would like to do is start with the first state AL, and make a new dataframe df2with 3 rows, where the new values of df2$X are calculated using 3 different functions to give, for example: df1$X, df1$X - 1, and df1$X + 1. Likewise, I want to do a similar thing for new values of df2$Y, which in this example are calculated as df1$Y, df1$Y * 0.5, and df1$Y * 0.5.
Then, I would proceed to the next State. The end result should be:
df2 <- data.frame(State=c("AL", "AL","AL","AK","AK","AK"),
X=c(1,0,2,3,2,4), y=c(2,1,1,4,2,2))
Does anyone know how i might approach this? I have no idea where to even begin... I can imagine some kind of for loop, but I'm hoping there's a more elegant approach in R.

base R solution:
funcs.X <- list(function(x) x, function(x) x-1, function(x) x+1)
funcs.y <- list(function(y) y, function(y) y*0.5, function(y) y*0.5)
apply.funcs <- function(funcs,x) as.vector(t(sapply(funcs, function(f) f(x))))
d <- data.frame(State = rep(df1$State,each=length(funcs.X)),
X = apply.funcs(funcs.X, df1$X),
y = apply.funcs(funcs.y, df1$y)
)
identical(d,df2)
# [1] TRUE

You could try
library(data.table)
res <- setDT(df1)[,list(X=c(X, X-1, X+1), y=c(y,y*0.5, y*0.5)) , State]
all.equal(setDF(res), df2, check.attributes=FALSE)
#[1] TRUE

How can I do a t.test on an entire data.frame and extract the p-values?

My dataset looks something like this:
a <- rnorm(2)
b <- rnorm(2)-3
x <- rnorm(13)
y <- rnorm(2)-1
z <- rnorm(2)-2
eg <- expand.grid(a,b,x,y,z)
treatment <- c(rep(1, 2), rep(0,3))
eg <- data.frame(t(eg))
row.names(eg) <- NULL
eg <- cbind(treatment, eg)
What I need to do is run t-tests on each column, comparing the treatment =1 group to the treatment=0 group. I'd like to then have a vector of p-values. I've tried (several versions of) doing this through a loop, but I continue to receive the same error message: "undefined columns selected." Here's my code currently:
p.values <- c(rep(NA, 208))
for (i in 2:209) {
x <- data.frame(eg[eg$treatment==1][,i][1:2])
y <- data.frame(eg[eg$treatment==0][,i][3:5])
value <- t.test(x=x, y=y)['p.value']
p.values[i] <- value
}
I added the data.frame() after reading someone mention that for loops only loop through dataframes, but it didn't change anything. I am sure there is an easier way to do this, perhaps by using something in the apply family? Does anyone have any suggestions? Thanks so much!

A couple of options, both using sapply:
sapply(
eg[-1], function(x) t.test(x[eg$treatment==1],x[eg$treatment==0])[["p.value"]]
)
Or looping over the names instead:
sapply(
names(eg[-1]),
function(x) t.test(as.formula(paste(x,"~ treatment")),data=eg)[["p.value"]]
)
Or even mapply:
mapply(function(x,y) t.test(x ~ y,data=cbind(x,y))[["p.value"]], eg[-1], eg[1])

R df process each column by a different function provided in a list of functions

I guess my problem is very simple, but I could not find the solution in web yet.
I would like to modify a data frame with a set of functions.
The functions are defined in a list. They may have more than one argument, but one arg is always the value found on the related column in a df.
I used build in BOD data set just for convinience. The list could be this:
funs <- list(
fn1 = function(x) x+1,
fn2 = function(x) x-1
)
The function call could look like this:
searchedFunc(BOD, funs)
So after modificatin Time column values are added by 1 and demand column values are subtracted by one.

You can use sapply to be more flexible
funs <- list(
fn1 = function(x) x+1,
fn2 = function(x) x-1
)
searchedFunc <- function(df, fns) {
sapply(seq(along.with=fns), function(i) fns[[i]](df[, i]))
}
searchedFunc(BOD, funs)
Hope it helps,
alex

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

R split function - do not include grouping variable in new dataset - r

Remove the split column first: split(DF[-1], DF[[1]]) or split(subset(DF, select = -A), DF$A) Update: Added last line.

Related

R Convert loop into function

How to call a different value for each element in a list in R

Applying functions to vector elements to make rows in a new dataframe

How can I do a t.test on an entire data.frame and extract the p-values?

R df process each column by a different function provided in a list of functions

Categories

Resources