R function for creating, naming and lagging variables

I have some data like so:
a <- c(1, 2, 9, 18, 6, 45)
b <- c(12, 3, 34, 89, 108, 44)
c <- c(0.5, 3.3, 2.4, 5, 13,2)
df <- data.frame(a, b,c)
I need to create a function to lag many variables at once for a very large time series analysis with dozens of variables, without typing each one out. In short, I would like to create the variables a.lag1, b.lag1 and c.lag1 and add them to the original df specified above. I figure the best way to do so is a custom function, something along the lines of:
lag.fn <- function(x) {
  assign(paste(x, "lag1", sep = "."), lag(x, n = 1L))
  return(assign(paste(x, "lag1", sep = ".")))
}
The desired output is:
a.lag1 <- c(NA, 1, 2, 9, 18, 6, 45)
b.lag1 <- c(NA, 12, 3, 34, 89, 108, 44)
c.lag1 <- c(NA, 0.5, 3.3, 2.4, 5, 13, 2)
However, I don't get what I am looking for. Should I change the environment to the global environment? I would like to be able to use cbind to add the new variables to the original df. Thanks.

This is easy with dplyr. Avoid naming a data frame df, since it can be confused with the function of the same name (stats::df, the F density). I'm using df1 here.
library(dplyr)
df1 <- df1 %>%
mutate(a.lag1 = lag(a),
b.lag1 = lag(b),
c.lag1 = lag(c))

The data frame statement in the question is invalid since a, b and c are not the same length. What you can do is create a zoo series. Note that the lag specified in lag.zoo can be a vector of lags as in the second example below.
library(zoo)
z <- merge(a = zoo(a), b = zoo(b), c = zoo(c))
lag(z, -1) # lag all columns
lag(z, 0:-1) # each column and its lag

We can use mutate_all with a named list of functions (funs() is deprecated as of dplyr 0.8.0):
library(dplyr)
df %>%
  mutate_all(list(lag = lag))

If everything else fails, you can use a simple base R function:
my_lag <- function(x, steps = 1) {
c(rep(NA, steps), x[1:(length(x) - steps)])
}
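To connect the base R helper above back to the original goal (lagging every column and binding the results to the data frame), here is a minimal sketch. It assumes the data from the question, stored under the name df1 as one of the answers suggests, and uses seq_len() inside my_lag as a small safety tweak; the ".lag1" suffix follows the naming asked for in the question.

```r
# Base R lag helper, as in the answer above (seq_len avoids 1:0 surprises)
my_lag <- function(x, steps = 1) {
  c(rep(NA, steps), x[seq_len(length(x) - steps)])
}

# Data from the question, stored as df1 to avoid clashing with stats::df
df1 <- data.frame(a = c(1, 2, 9, 18, 6, 45),
                  b = c(12, 3, 34, 89, 108, 44),
                  c = c(0.5, 3.3, 2.4, 5, 13, 2))

# Lag every column, rename with a ".lag1" suffix, then bind to the original
lagged <- as.data.frame(lapply(df1, my_lag))
names(lagged) <- paste(names(df1), "lag1", sep = ".")
df1 <- cbind(df1, lagged)
```

Because lapply() walks over all columns, the same three lines should work unchanged for dozens of variables.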


How to create list of functions with multiple parameters from dataframes in R?

Long time reader, first time poster. I have not found any previous questions about my current problem. I would like to create multiple linear functions, which I can later apply to variables. I have a data frame of slopes, df_slope, and a data frame of constants, df_constant.
Dummy data:
df_slope <- data.frame(var1 = c(1, 2, 3,4,5), var2 = c(2,3,4,5,6), var3 = c(-1, 1, 0, -10, 1))
df_constant<- data.frame(var1 = c(3, 4, 6,7,9), var2 = c(2,3,4,5,6), var3 = c(-1, 7, 8, 0, -1))
I would like to construct functions such as
myfunc <- function(slope, constant, trvalue){
result <- trvalue*slope+constant
return(result)}
where the slope and constant values are
slope<- df_slope[i,j]
constant<- df_constant[i,j]
I have tried many approaches, for example creating a data frame of functions with a for loop:
myfunc_all<-data.frame()
for(i in 1:5){
for(j in 1:3){
myfunc_all[i,j]<-function (x){ x*df_slope[i,j]+df_constant[i,j] }
full_func[[i]][j]<- func_full
}
}
without success. The slope and constant values are paired, so df_slope[i,j] goes with df_constant[i,j]. The desired end result would be some kind of data frame from which I can call a function by giving it the coordinates, for example like this:
myfunc_all[i,j]
but any form would be great. For example
myfunc_all[2,1]
in our case would be
function (x){ x*2+4 }
which I can apply to different x values. I hope my problem is clear.
You have a slight problem with lazy evaluation and variable scope when you use a for loop to build functions (see here for more info). It's a bit safer to use something like mapply, which will create closures for you. Try:
myfunc_all <- with(expand.grid(1:5, 1:3), mapply(function(i, j) {
function(x) {
x*df_slope[i,j]+df_constant[i,j]
}
},Var1, Var2))
dim(myfunc_all) <- c(5,3)
This will create an array-like object (a list with dimensions). The only difference is that you need to use double brackets to extract the function. For example:
myfunc_all[[2,1]](0)
# [1] 4
myfunc_all[[5,3]](0)
# [1] -1
Alternatively, you can write a function that returns a function. That would look like:
myfunc_all <- (function(slopes, constants) {
function(i, j)
function(x) x*slopes[i,j]+constants[i,j]
})(df_slope, df_constant)
Then, rather than using brackets, you call the function with parentheses:
myfunc_all(2,1)(0)
# [1] 4
myfunc_all(5,3)(0)
# [1] -1
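To illustrate the scoping issue with for loops mentioned above, here is a minimal sketch on a toy example (the names fs_loop, fs_local and k are made up for illustration). Functions created directly in a loop all look up the same loop variable when they are eventually called, whereas local() gives each function its own captured copy.

```r
# Functions built directly in a for loop share one environment, so each of
# them sees whatever value i holds when the function is finally called.
fs_loop <- list()
for (i in 1:3) {
  fs_loop[[i]] <- function(x) x + i
}
res_loop <- fs_loop[[1]](0)   # uses i = 3, the value left after the loop

# local() creates a fresh environment per iteration and copies i into k,
# so each function keeps the value it was built with.
fs_local <- list()
for (i in 1:3) {
  fs_local[[i]] <- local({
    k <- i
    function(x) x + k
  })
}
res_local <- fs_local[[1]](0)  # uses k = 1
```

The mapply and function-factory approaches above achieve the same capture without the explicit local() call.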
df_slope <- data.frame(var1 = c(1, 2, 3,4,5), var2 = c(2,3,4,5,6), var3 = c(-1, 1, 0, -10, 1))
df_constant<- data.frame(var1 = c(3, 4, 6,7,9), var2 = c(2,3,4,5,6), var3 = c(-1, 7, 8, 0, -1))
functions = vector(mode = "list", length = nrow(df_slope))
for (i in 1:nrow(df_slope)) {
  # index by row i and column j so each slope is paired with its constant
  functions[[i]] = function(i, j, x) { df_slope[i, j]*x + df_constant[i, j] }
}
f = function(i, j, x) {
  functions[[i]](i, j, x)
}
f(1, 1, 1:10)
f(3, 2, 5:10)

Problem with row-wise operation in base R

I have a problem performing row-wise operations with the apply function in R. I want to calculate the distance between two points:
d <- function(x,y){
length <- norm(x-y,type="2")
as.numeric(length)
}
The coordinates are given by two dataframes:
start <- data.frame(
a = c(7, 5, 17, 1),
b = c(5, 17, 1, 2))
stop <- data.frame(
b = c(5, 17, 1, 2),
c = c(17, 1, 2, 1))
My goal is to calculate the successive distances given by the start and stop coordinates. I would like it to work like this:
d(start[1,], stop[1,])
d(start[2,], stop[2,])
d(start[3,], stop[3,])
etc...
I have tried:
apply(X = start, MARGIN = 1, FUN = d, y = stop)
which gave some strange results. Can you please help me find the proper solution? I know how to do this with the dplyr rowwise() function, but I want to use base R only.
Can you also explain why I got such strange results with apply()?
Loop over the sequence of rows and apply d to each pair of rows:
sapply(seq_len(nrow(start)), function(i) d(start[i,], stop[i,]))
# [1] 12.165525 20.000000 16.031220 1.414214
Or, if we want to use apply, combine the two data frames with cbind and then subset each row by index:
apply(cbind(start, stop), 1, FUN = function(x) d(x[1:2], x[3:4]))
# [1] 12.165525 20.000000 16.031220 1.414214
Or use dapply from the collapse package for efficiency:
library(collapse)
dapply(cbind(start, stop), MARGIN = 1, parallel = TRUE,
FUN = function(x) d(x[1:2], x[3:4]))
# [1] 12.165525 20.000000 16.031220 1.414214
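As for the strange apply() results in the question: apply() hands each row of start to d as a plain vector x, while y stays the whole stop data frame, so each call appears to compare one start point against the entire stop table rather than its matching row. A fully vectorized base R alternative is sketched below. It assumes plain Euclidean distance (which is what norm(type = "2") gives for a single row), and the name stop_pts is a hypothetical replacement for stop, chosen only to avoid masking base::stop().

```r
start <- data.frame(a = c(7, 5, 17, 1),
                    b = c(5, 17, 1, 2))
stop_pts <- data.frame(b = c(5, 17, 1, 2),
                       c = c(17, 1, 2, 1))

# Row-wise Euclidean distance without any explicit loop:
# subtract coordinate matrices, square, sum across each row, take the root
diffs <- as.matrix(start) - as.matrix(stop_pts)
dists <- sqrt(rowSums(diffs^2))
```

This avoids calling d once per row, which may matter for large inputs.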

How to refer to unnamed object in R

I want to perform a simple task in R. I want to call a method on an object which has not been assigned to any variable yet.
Like this:
a <- c(5, 2, 11, 3)
b <- order(a, decreasing = TRUE)[1:floor(0.1 * length(.))]
So I guess I would like to find out what to pass to the length function here. I know that I can do it like this:
a <- c(5, 2, 11, 3)
b <- order(a, decreasing = TRUE)
b <- b[1:floor(0.1 * length(b))]
But I wanted to make it like I wrote above.
As far as I know, there is no built-in way that is more efficient than the base code:
a <- c(5, 2, 11, 3)
b <- order(a, decreasing = TRUE)
b[1:floor(0.1 * length(b))]
However, you can achieve something similar to what you are asking for with magrittr, dplyr or similar packages, which allow piping calls. This would look like:
library(magrittr)
a <- c(5, 2, 11, 3)
c <- a %>% order(., decreasing = TRUE) %>% .[1:floor(0.1 * length(.))]
identical(b[1:floor(0.1 * length(b))], c)
# [1] TRUE
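For comparison, here is a sketch of two base R ways to avoid naming the intermediate result. It relies on the fact that order(a) has the same length as a, so length(a) can stand in for the length of the unnamed object; the 20-element vector and the names top and top2 are made up for illustration.

```r
# Illustrative vector of 20 values, so that 10% of the length is 2
a <- c(5, 2, 11, 3, 8, 1, 9, 4, 7, 6, 15, 12, 14, 13, 10, 20, 18, 19, 17, 16)

# Option 1: order(a) is as long as a, so head() can take the count from a
top <- head(order(a, decreasing = TRUE), floor(0.1 * length(a)))

# Option 2: an anonymous function gives the intermediate result a local name
top2 <- (function(o) o[seq_len(floor(0.1 * length(o)))])(order(a, decreasing = TRUE))
```

Note that for the 4-element vector in the question floor(0.1 * 4) is 0, and 1:0 counts down, so the original code returns one index while head() and seq_len() return none.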

function writing in R

I have a dataframe(df) with only three columns showing ideal weight(x) in kgs (column1), age(y) in years (column 2) and gender(z) (column 3, boy coded 1 & girl coded 2) for school students. I want to write a function for getting what is ideal weight of a school student at given age and gender. My novice attempt is shown below:
idealwt<-function(age,gender){
age=df$y
gender=df$z
idealwt = df$x[age==df$y & gender==df$z]
return(idealwt)
}
However, the above function returns the whole vector instead of a specific value.
The problem in the OP's function comes from reassigning 'age' and 'gender', which are also the arguments of the function. In effect, we are comparing df$y == df$y and df$z == df$z, which is TRUE for every element, so the output is the whole vector. Instead, we can define the function without age = df$y and gender = df$z:
idealwt<-function(age,gender){
df$x[age==df$y & gender==df$z]
}
idealwt(12, 2)
#[1] 38 42
data
df <- data.frame(x = c(45, 38, 55, 33, 42), y = c(15, 12, 18, 14, 12),
z = c(1, 2, 1, 1, 2))
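As a variation on the fix above, the lookup table can be passed in explicitly instead of being read from a global df, which keeps the function usable with other tables. This is only a sketch: the names ideal_weight and weights are hypothetical, and the data are the example values from the answer.

```r
# Example table: x = ideal weight (kg), y = age (years), z = gender (1 = boy, 2 = girl)
weights <- data.frame(x = c(45, 38, 55, 33, 42),
                      y = c(15, 12, 18, 14, 12),
                      z = c(1, 2, 1, 1, 2))

# The arguments are compared against the columns and never overwritten
ideal_weight <- function(data, age, gender) {
  data$x[data$y == age & data$z == gender]
}

girls_12 <- ideal_weight(weights, 12, 2)
boys_18 <- ideal_weight(weights, 18, 1)
```

If no row matches, the function returns a zero-length vector, which the caller may want to check for.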

Randomizing a column in a list of dataframe

I want to have multiple copies of a data frame, each with a new randomization of one variable. My objective is to run multiple iterations of an analysis with a randomized value for one variable.
I've started by making a list of data frames, with copies of my original data frame:
a <- c(1, 2, 3, 4, 5)
b <- c(45, 34, 50, 100, 64)
test <- data.frame(a, b)
test2 <- lapply(1:2,function(x) test) #List of 2 dataframe, identical to test
I know about transform and sample, to randomize the values of a column:
test1 <- transform(test, a = sample(a))
I just cannot find how to apply it to the entire list of dataframes. I've tried this:
test3<- lapply(test2,function(i) sample(i[["a"]]))
But I lost the other variables. And this:
test3 <- lapply(test2,function(i) {transform(i, i[["a"]]==sample(i[["a"]]))})
But my variable is not randomized.
Several questions are similar to mine, but they didn't help me solve my problem:
Adding columns to each in a list of dataframes
Add a column in a list of data frames
You can try the following:
lapply(test2, function(df) {df$a <- sample(df$a); df})
Or, using transform:
lapply(test2, function(df) transform(df, a = sample(a)))
Or just
lapply(test2, transform, a = sample(a))
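If the list of identical copies is not needed for anything else, the copying and shuffling steps can be combined. The sketch below uses replicate() with simplify = FALSE to build the list directly. The seed, the number of copies, and the names n_copies and shuffled are arbitrary choices for illustration.

```r
a <- c(1, 2, 3, 4, 5)
b <- c(45, 34, 50, 100, 64)
test <- data.frame(a, b)

set.seed(1)    # arbitrary seed, only to make the sketch reproducible
n_copies <- 3  # hypothetical number of randomized copies

# Each evaluation of the expression draws a fresh permutation of column a
shuffled <- replicate(n_copies, transform(test, a = sample(a)), simplify = FALSE)
```

Each element of shuffled is a full data frame, so column b and any other variables are kept alongside the permuted a.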
Is there a reason you need them in separate lists?
This will give you 10 columns of randomized samples of a in different columns and then you could loop through the columns for your further analysis.
a <- c(1, 2, 3, 4, 5)
b <- c(45, 34, 50, 100, 64)
test <- data.frame(a, b)
for(i in 3:12){
  test[,i] <- sample(a)
}