Why are values converted to strings in the rollapply window?

More newbie questions... I am trying to understand why rollapply is turning all my columns to strings. Suppose I have this:
> df <- data.frame(col1=c(1,2,3,4),
col2=c("a","b","c","d"),
col3=c("!","#","#","$"),
stringsAsFactors = FALSE)
> v <- zoo(df, toupper(df$col2))
> v
col1 col2 col3
A 1 a !
B 2 b #
C 3 c #
D 4 d $
And then I run rollapply:
> rollapply(v, 2, by.column = F, function(x) {
+ sum(x[,"col1"])
+ })
Error in sum(x[, "col1"]) : invalid 'type' (character) of argument
Why is col1 now a character, and how do I fix it so that I get a slice of my original zoo object in each window?

I rolled my own rollapply function, based on some reading of other posts on SO. It passes FUN only the row indexes of each window into the data (i.e. the zoo object):
rollapply.list <- function(data, width, FUN) {
len <- NROW(data)
add <- rep(0:(len-width),each=width)
lst <- rep(1:(width),len-width+1)
seq.list <- split(lst+add, add)
lapply(seq.list, FUN)
}
and then apply the indexes to the original data like:
rollapply.list(data=v, width=2, FUN=function(x) {
slice <- v[x] #slice out indexes from the original zoo object
...
})
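As for why col1 becomes a character: a zoo object keeps its data in a matrix, and a matrix can hold only one type, so (as far as I can tell) the mixed-type data frame is coerced to character when the zoo object is created, before rollapply ever runs. Below is a minimal base-R sketch of that coercion, followed by one possible workaround that rolls over row indexes of the data frame itself. The names starts and window_sums are made up for illustration.

```r
df <- data.frame(col1 = c(1, 2, 3, 4),
                 col2 = c("a", "b", "c", "d"),
                 col3 = c("!", "#", "#", "$"),
                 stringsAsFactors = FALSE)

# A matrix holds a single type, so mixing numbers and strings yields character.
m <- as.matrix(df)
storage_type <- typeof(m)

# Workaround sketch: roll over row indexes and slice the data frame,
# which keeps each column's original type inside every window.
width <- 2
starts <- seq_len(nrow(df) - width + 1)
window_sums <- vapply(starts, function(s) {
  slice <- df[s:(s + width - 1), , drop = FALSE]
  sum(slice$col1)
}, numeric(1))
```

Keeping only the numeric columns in the zoo object, and looking up the other columns by index, should avoid the coercion as well.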

Related

I am having an issue adding vectors to a list

I have a list of matrices constructed by the following loops:
# Set up Row and Column Names for prediction coefficients.
rows = c("Intercept", "actsBreaks0", "actsBreaks1","actsBreaks2","actsBreaks3","actsBreaks4","actsBreaks5","actsBreaks6",
"actsBreaks7","actsBreaks8","actsBreaks9","tBreaks0","tBreaks1","tBreaks2","tBreaks3", "unitBreaks0", "unitBreaks1",
"unitBreaks2","unitBreaks3", "covgBreaks0","covgBreaks1","covgBreaks2","covgBreaks3","covgBreaks4","covgBreaks5",
"covgBreaks6","yearBreaks2016","yearBreaks2015","yearBreaks2014","yearBreaks2013","yearBreaks2011",
"yearBreaks2010","yearBreaks2009","yearBreaks2008","yearBreaks2007","yearBreaks2006","yearBreaks2005",
"yearBreaks2004","yearBreaks2003","yearBreaks2002","yearBreaks2001","yearBreaks2000","yearBreaks1999",
"yearBreaks1998","plugBump0","plugBump1","plugBump2","plugBump3")
cols = c("Value")
# Build Matrix for dummy coefficient values.
matrix1 <- matrix(c(1:48), nrow = 48, ncol = 1, byrow = TRUE, dimnames = list(rows,cols))
matrix1
# Extract each variable type into own matrix (i.e. all "actsBreaks{x}")
#
Beta_names <- list()
betabreaks <- unique(gsub("[0-9]*", "", rows))
for (bc in betabreaks)
{
Breaks <- grep(paste0(bc, "[0-9]*"), rows)
Beta_names[[bc]] <- matrix1[Breaks, ,drop = FALSE]
Beta_names[[bc]] <- data.matrix(unlist(Beta_names[[bc]])) #, byrow = TRUE)
}
# Set up matrices for excluded/test data
one_column <-c(1,1,2,3,3,3,4,4,4,4,4,5,6,9,9,8,7,5,4,7,7,8,2,0,10)
two_column <-c(1,1,2,3,3,3,4,4,4,4,4,5,6,9,9,8,7,5,4,7,7,8,3,0,10)
three_column <-c(1,1,2,3,3,3,4,4,4,4,4,5,6,9,9,8,7,5,4,7,7,8,4,10,0)
four_column <-c(1,1,2,3,3,3,4,4,4,4,4,5,6,9,9,8,7,5,4,7,7,8,5,0,10)
five_column <-c(1,1,2,3,3,3,4,4,4,4,4,5,6,9,9,8,7,5,4,7,7,8,6,0,10)
six_column <-c(1,1,2,3,3,3,4,4,4,4,4,5,6,9,9,8,7,5,4,7,7,8,7,0,10)
seven_column <-c(1,1,2,3,3,3,4,4,4,4,4,5,6,9,9,8,7,5,4,7,7,8,8,0,10)
eight_column <-c(1,1,2,3,3,3,4,4,4,4,4,5,6,9,9,8,7,5,4,7,7,8,9,0,10)
nine_column <-c(1,1,2,3,3,3,4,4,4,4,4,5,6,9,9,8,7,5,4,7,7,8,1,0,10)
ten_column <-c(1,1,2,3,3,3,4,4,4,4,4,5,6,9,9,8,7,5,4,7,7,8,0,0,10)
DF1 <- data.frame (one_column ,two_column ,three_column ,
four_column ,five_column ,six_column ,
seven_column ,eight_column ,nine_column ,
ten_column )
paralength <- 5
Xnames <- list()
datindc <- 1
while ( datindc <= paralength )
{
Xbreaks <- factor(DF1[[datindc]],levels=sort(unique.default(DF1[[datindc]]),decreasing=FALSE))
Xnames[[datindc]] <- data.frame(model.matrix(~Xbreaks -1), stringsAsFactors = FALSE)
datindc <- datindc + 1
}
#
Xlngth <- length(Xnames)
BtaXind <- 1
BetaiXi <- list()
while ( BtaXind <= Xlngth )
{
BetaiXi[[BtaXind]] <- (Beta_names[[BtaXind + 1]] * Xnames[[BtaXind]])
BtaXind <- (BtaXind + 1)
}
I need to add each of those matrices' rows to each other, which I am trying to do by turning each matrix into a vector:
BiXilngth <- length(BetaiXi)
BetaiXiTr <- list()
BtaiXiTrd <- 1
while (BtaiXiTrd <= BiXilngth)
{
Var1 <- c(t(BetaiXi[[BtaiXiTrd]]))
BetaiXiTr[BtaiXiTrd] <- Var1
BtaiXiTrd <- BtaiXiTrd + 1
}
and adding the vectors, effectively transposing the matrices. However, when I tried to convert the first matrix, BetaiXi[[1]], to a vector and add it to the list with the command BetaiXiTr[BtaiXiTrd] <- c(t(BetaiXi[[BtaiXiTrd]])), I got the following warning:
Warning message:
In BetaiXiTr[BtaiXiTrd] <- c(t(BetaiXi[[BtaiXiTrd]])) :
number of items to replace is not a multiple of replacement length
I then tried using unlist():
> BetaiXiTr[BtaiXiTrd] <-unlist(c(t(BetaiXi[[1]])))
Warning message:
In BetaiXiTr[BtaiXiTrd] <- unlist(c(t(BetaiXi[[1]]))) :
number of items to replace is not a multiple of replacement length
with the same result. Finally, I tried assigning the first vector to a variable, Var1 <- c(t(BetaiXi[[BtaiXiTrd]])), and then assigning that variable to the list, BetaiXiTr[BtaiXiTrd] <- Var1, with, yet again, the same warning:
Warning message:
In BetaiXiTr[BtaiXiTrd] <- Var1 :
number of items to replace is not a multiple of replacement length
I searched for the warning message to find out what exactly I was being warned about, but ended up more confused. Most people who reproduced or encountered the warning were trying to replace a vector of so many elements with a vector of fewer, while (to my understanding) I am simply trying to add a vector to a list. Am I going about this the wrong way?
I was using [ ] and [[ ]] incorrectly in BetaiXiTr[BtaiXiTrd]. It needs to be BetaiXiTr[[BtaiXiTrd]], which allows the vectors to be added.
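A small sketch of the difference, using a made-up three-element vector rather than the matrices from the question. Single-bracket assignment replaces a sub-list of the target, so a length-3 vector is treated as three replacement items for one slot. Double-bracket assignment replaces the element itself.

```r
vec <- c(10, 20, 30)

# Single brackets replace a sub-list: three values are offered for one slot,
# so R warns and keeps only the first value.
lst_single <- list()
suppressWarnings(lst_single[1] <- vec)

# Double brackets replace the element itself, so the whole vector is stored.
lst_double <- list()
lst_double[[1]] <- vec

# Single brackets also work if the vector is wrapped in a list first.
lst_wrapped <- list()
lst_wrapped[1] <- list(vec)
```

Without suppressWarnings, the first assignment produces the same "number of items to replace is not a multiple of replacement length" warning as in the question.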

Keep last n characters of cells in a function in R

Consider the following data.frame:
df <- setNames(data.frame(rep("text_2010"),rep(1,5)), c("id", "value"))
I only want to keep the 4 last characters of the cells in the column "id". Therefore, I can use the following code:
df$id <- substr(df$id,nchar(df$id)-3,nchar(df$id))
However, I want to create a function that does the same. Therefore, I create the following function and apply it:
testfunction <- function(x) {
x$id <- substr(x$id,nchar(x$id)-3,nchar(x$id))
}
df <- testfunction(df)
But I do not get the same result. Why is that?
Add return(x) in your function to return the changed object.
testfunction <- function(x) {
x$id <- substr(x$id,nchar(x$id)-3,nchar(x$id))
return(x)
}
df <- testfunction(df)
However, you don't always need an explicit return statement (although it is better to have one). By default, R returns the value of the last expression evaluated in the function, so here you can also do
testfunction <- function(x) {
transform(x, id = substring(id, nchar(id)-3))
}
df <- testfunction(df)
which should work the same.
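To see why the original function did not give the same result, note that its last expression is an assignment, and the value of an assignment is the value being assigned. The sketch below uses the hypothetical names no_return and with_return to contrast the two versions; stringsAsFactors = FALSE is set explicitly so that it behaves the same on older R versions.

```r
df <- data.frame(id = rep("text_2010", 5), value = rep(1, 5),
                 stringsAsFactors = FALSE)

# The last expression is an assignment, so the function returns (invisibly)
# the assigned character vector, not the modified data frame.
no_return <- function(x) {
  x$id <- substr(x$id, nchar(x$id) - 3, nchar(x$id))
}
out <- no_return(df)

# Ending with x makes the modified data frame the function's value.
with_return <- function(x) {
  x$id <- substr(x$id, nchar(x$id) - 3, nchar(x$id))
  x
}
fixed <- with_return(df)
```

Here out is a plain character vector of the trimmed ids, while fixed is still a data frame with both columns.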
We can also create a function that takes an argument n, so that the number of characters kept is not hard-coded, and constructs a regex pattern to be used with sub:
testfunction <- function(x, n) {
pat <- sprintf(".*(%s)$", strrep(".", n))
x$id <- sub(pat, "\\1", x$id)
return(x)
}
Testing:
testfunction(df, n = 4)
# id value
#1 2010 1
#2 2010 1
#3 2010 1
#4 2010 1
#5 2010 1
Base R solution attempting to mirror Excel's RIGHT() function:
# Function to extract the right n characters from each element of a provided vector:
right <- function(char_vec, n = 1){
# Check if vector provided isn't of type character:
if(!is.character(char_vec)){
# Coerce it, if not: char_vec => character vector
char_vec <- vapply(char_vec, as.character, "character")
}
# Store the number of characters in each element of the provided vector:
# num_chars => integer vector
num_chars <- nchar(char_vec)
# Return the right hand n characters of the string: character vector => Global Env()
return(substr(char_vec, (num_chars + 1) - n, num_chars))
}
# Application:
right(df$id, 4)
Data:
df <- setNames(data.frame(rep("text_2010"),rep(1,5)), c("id", "value"))

Apply concordance dataframe to zoo objects

I have a zoo object made of several time series, like this:
indices <- seq.Date(as.Date('2000-01-01'),as.Date('2005-01-30'),by="year")
a <- zoo(rnorm(5), order.by=indices)
b <- zoo(rnorm(5), order.by=indices)
c <- zoo(rnorm(5), order.by=indices)
ts_origin <- merge(a,b,c)
I would like to multiply each zoo series from ts_origin by a ratio contained in a dataframe, and put
the results in another zoo object (ts_final) that contains the time series d,e,f. In other words,
the dataframe is a concordance file between a,b,c and d,e,f, and the ratio would be applied this way:
ts_final$d = ts_origin$a * 10 ; ts_final$e = ts_origin$b * 100 ; ts_final$f = ts_origin$c * 1000.
df <- data.frame(original = c("a","b","c"),
final = c("d","e","f"),
ratio = c(10,100,1000))
indices <- seq.Date(as.Date('2000-01-01'),as.Date('2005-01-30'),by="year")
d <- zoo(, order.by=indices)
e <- zoo(, order.by=indices)
f <- zoo(, order.by=indices)
ts_final <- merge(d,e,f)
I'm not too sure what the best approach for this is. I was trying with the apply function, but couldn't make
it work... any help would be greatly appreciated!
1) Map/merge
Use Map to iterate over final, original and ratio, computing the required products and producing a list of zoo objects L. Note that Map takes the names from the first argument after fun. Then merge the list components to form the zoo object ts_final.
fun <- function(f, o, r) ts_origin[, o] * r
L <- with(df, Map(fun, final, original, ratio))
ts_final <- do.call("merge", L)
The result using the inputs shown in the Note at the end is this zoo object:
> ts_final
d e f
2000-01-01 -5.6047565 46.09162 400.7715
2001-01-01 -2.3017749 -126.50612 110.6827
2002-01-01 15.5870831 -68.68529 -555.8411
2003-01-01 0.7050839 -44.56620 1786.9131
2004-01-01 1.2928774 122.40818 497.8505
2005-01-01 17.1506499 35.98138 -1966.6172
2) sweep
Another approach is to sweep out the ratios setting the names appropriately giving the same result as in (1).
with(df, sweep(setNames(ts_origin[, original], final), 2, ratio, "*"))
3) rep
Set the names and multiply by ratio, repeating each ratio once per row of ts_origin so that it lines up with the column-major data, giving the same result as in (1).
nr <- nrow(ts_origin)
with(df, setNames(ts_origin[, original], final) * rep(ratio, each = nr))
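The sweep and rep approaches rely on the same fact: matrix data is stored column by column, so each ratio has to be repeated once per row of the data to line up with its column. Here is a sketch on a plain matrix rather than a zoo object, so that it runs without extra packages. The values in m are made up.

```r
# Six rows and three columns, mirroring the shape of ts_origin.
m <- matrix(seq_len(18), nrow = 6, dimnames = list(NULL, c("a", "b", "c")))
ratio <- c(10, 100, 1000)

# sweep multiplies column j by ratio[j].
by_sweep <- sweep(m, 2, ratio, "*")

# Equivalent: repeat each ratio once per row, then multiply elementwise.
by_rep <- m * rep(ratio, each = nrow(m))

same_result <- isTRUE(all.equal(by_sweep, by_rep))
```

If each were set to the number of ratios instead of the number of rows, the multiplication would still run but the ratios would land on the wrong cells.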
Note
We can define the input reproducibly like this:
set.seed(123)
tt <- as.Date(ISOdate(2000:2005, 1, 1))
m <- matrix(rnorm(6*3), 6, dimnames = list(NULL, c("a", "b", "c")))
ts_origin <- zoo(m, tt)
df <- data.frame(original = c("a","b","c"),
final = c("d","e","f"),
ratio = c(10,100,1000))
Here is a one-liner; the result initially carries the original column names rather than the final ones.
ts_final <- t(df$ratio * t(ts_origin))
ts_final
# a b c
#2000-01-01 -5.382213 -12.64773 -513.6408
#2001-01-01 -9.218280 -98.55123 -1826.6430
#2002-01-01 2.114663 -28.58910 290.8008
#2003-01-01 -3.576460 -23.47314 -166.5473
#2004-01-01 6.490508 -36.29317 -398.0389
#2005-01-01 -5.382213 -12.64773 -513.6408
Now assign final names.
colnames(ts_final) <- df$final

R: Function arguments and lapply nested in a function or called from external function with data.table

Still new to data.table and working with environments.
I have a data.table similar to this (although much larger):
mydt <- data.table(ID = c("a", "a", "a", "b", "b", "b"),
col1 = c(1, 2, 3, 4, 5, 6),
col2 = c(7, 8, 9, 10, 11, 12),
key = "ID")
I wrote a function that takes mydt, splits it into a list of data.tables by its key, and then, in each table of the list, multiplies the column named in one argument by the number supplied in another argument:
myfun <- function(data, constant, column) {
data <- split(x = data, by = key(data))
data <- lapply(data, function(i) {
i[ , (column) := get(column)*constant]
})
return(data)
}
x <- myfun(data = mydt, constant = 3, column = "col1")
x
$a
ID col1 col2
1: a 3 7
2: a 6 8
3: a 9 9
$b
ID col1 col2
1: b 12 10
2: b 15 11
3: b 18 12
If I understand the scoping rules in R correctly, lapply will look in the environment it was called in, find the column and constant provided as arguments to myfun, and use them.
However, the function passed to lapply is much longer and more complex than the one here and it will be used in other functions that do many other things than just splitting the data.table. This is why I would like to define this part as an external function that will be called within other functions. This is what I did:
split.dt <- function(data) {
split(data, by = key(data))
}
mult <- function(data) {
lapply(data, function(i) {
i[ , (column) := get(column)*constant]
})
}
myfun <- function(data, constant, column) {
data <- split.dt(data = data)
data <- mult(data = data)
}
x <- myfun(data = mydt, constant = 3, column = "col1")
An error is returned:
Error in eval(expr, envir, enclos) : object 'column' not found
What I tried was wrapping column like i[ , eval(column)] within the mult function, together with parent.frame() and parent.env(), without any success. In the end I reached a solution where I used sys.call to get the arguments passed to myfun in a list and use them in mult like this:
split.dt <- function(data) {
split(data, by = key(data))
}
mult <- function(data) {
supplied.col <- sys.call(which = -1)[["column"]]
supplied.constant <- sys.call(which = -1)[["constant"]]
lapply(data, function(i) {
i[ , eval(supplied.col) := get(supplied.col)*supplied.constant]
})
}
myfun <- function(data, constant, column) {
data <- split.dt(data = data)
data <- mult(data = data)
}
x <- myfun(data = mydt, constant = 3, column = "col1")
x
$a
ID col1 col2
1: a 3 7
2: a 6 8
3: a 9 9
$b
ID col1 col2
1: b 12 10
2: b 15 11
3: b 18 12
It does work, BUT I am not sure:
Is this the right or most efficient approach? What is the way to make mult look up the arguments supplied to myfun?
Will this work if the functions are wrapped in a package?
1) Just pass column and constant to mult as additional arguments.
mult <- function(data, constant, column) {
lapply(data, function(i) {
i[ , (column) := get(column)*constant]
})
}
myfun <- function(data, constant, column) {
data <- split.dt(data = data)
data <- mult(data, constant, column)
}
2) Alternately define mult as:
mult <- function(data, envir = parent.frame()) with(envir,
lapply(data, function(i) {
i[ , (column) := get(column)*constant]
})
)
2a) or
mult <- function(data, envir = parent.frame()) {
constant <- envir$constant
column <- envir$column
lapply(data, function(i) {
i[ , (column) := get(column)*constant]
})
}
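The reason the first version worked and the split-out version did not comes down to lexical scoping: a function looks up free variables in the environment where it was defined, not where it was called. Below is a base-R sketch without data.table. All function names in it (outer_lexical, helper, outer_external, helper_args) are hypothetical, and it assumes no object called constant exists in the global environment.

```r
# The inner function is defined inside outer_lexical, so it can see
# outer_lexical's argument 'constant'.
outer_lexical <- function(constant) {
  inner <- function(v) v * constant
  inner(2)
}

# helper is defined at top level, so it looks for 'constant' in its own
# defining environment, not in the function that happens to call it.
helper <- function(v) v * constant
outer_external <- function(constant) helper(2)
result_external <- tryCatch(outer_external(3), error = function(e) "error")

# Passing the value explicitly, as in option 1), avoids the problem.
helper_args <- function(v, constant) v * constant
outer_args <- function(constant) helper_args(2, constant)
```

Passing the arguments explicitly should also behave the same inside a package, since it does not depend on which frame happens to be the caller.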

Difference between lazy and substitute in R

I'm trying to use the lazyeval package for non-standard evaluation in R, but I am confused about the difference between substitute and lazy.
df <- data.frame(col1 = runif(10), col2 = runif(10))
> df
col1 col2
1 0.54959138 0.8926778
2 0.99857207 0.9649592
3 0.26451336 0.9243096
4 0.98755113 0.7155882
5 0.84257525 0.5918387
6 0.20692997 0.5875944
7 0.44383744 0.5839235
8 0.44014903 0.1006080
9 0.49835993 0.7637619
10 0.07162048 0.3155483
I first created a function to take a data frame and two column names and return a column that is the sum of the two columns. substitute and eval seem to work just fine.
SubSum <- function(data, x, y) {
exp <- substitute(x+y)
r <- eval(exp, data)
return(cbind(data, data.frame(sum=r)))
}
> SubSum(df, col1, col2)
col1 col2 sum
1 0.54959138 0.8926778 1.4422692
2 0.99857207 0.9649592 1.9635312
3 0.26451336 0.9243096 1.1888229
4 0.98755113 0.7155882 1.7031394
5 0.84257525 0.5918387 1.4344140
6 0.20692997 0.5875944 0.7945244
7 0.44383744 0.5839235 1.0277610
8 0.44014903 0.1006080 0.5407570
9 0.49835993 0.7637619 1.2621218
10 0.07162048 0.3155483 0.3871688
I then tried to create a function with lazy and lazy_eval, but it didn't work.
require(lazyeval)
LazySum <- function(data, x, y) {
exp <- lazy(x+y)
r <- lazy_eval(exp, data)
return(cbind(data, data.frame(sum=r)))
}
> LazySum(df, col1, col2)
Error in eval(expr, envir, enclos) : object 'col1' not found
My current answer
After some trial and error, this snippet seems to work.
LazySum <- function(data, x, y) {
exp <- interp(~x + y, x=lazy(x), y=lazy(y))
r <- lazy_eval(exp, data)
return(cbind(data, data.frame(sum=r)))
}
Basically I had to build the lazy expression myself using interp.
You were pretty close.
Read ?lazy, especially the examples, to understand the changes I made to your code:
require(lazyeval)
set.seed(357)
df <- data.frame(col1 = runif(10), col2 = runif(10))
LazySum <- function(data, sum=x+y) {
exp <- lazy(sum) #giving lazy a named argument
r <- lazy_eval(exp, data)
return(cbind(data, data.frame(sum=r)))
}
LazySum(df, col1+col2)
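For comparison, here is a base-R sketch (no lazyeval needed) of what substitute does in the SubSum version from the question: inside a function, it replaces the argument names x and y with the unevaluated expressions the caller supplied, and the resulting call can then be evaluated with the data frame as its context. The function name capture and the small data frame are made up for illustration.

```r
df <- data.frame(col1 = c(1, 2), col2 = c(10, 20))

# substitute swaps x and y for the caller's expressions, without evaluating them.
capture <- function(x, y) substitute(x + y)
expr <- capture(col1, col2)
expr_text <- deparse(expr)

# Evaluating the captured call with the data frame as context resolves
# col1 and col2 to its columns.
sums <- eval(expr, df)
```

Here expr_text shows the rebuilt call in terms of col1 and col2, which is why eval can find the columns in the data frame.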
