I have a time series with multiple columns, some have NAs in them, for example:
library(xts)
date.a<-seq(as.Date('2014-01-01'),as.Date('2014-02-01'),by = 2)
date.b<-seq(as.Date('2014-01-01'),as.Date('2014-02-15'),by = 3)
df.a <- data.frame(time=date.a, A=sin((1:16)*pi/8))
df.b <- data.frame(time=date.b, B=cos((1:16)*pi/8))
my.ts <- merge(xts(df.a$A,df.a$time),xts(df.b$B,df.b$time))
I'd like to apply a function to each of the rows, in particular:
prices2percreturns <- function(x){100*diff(x)/x}
I think that sapply should do the trick, but
sapply(my.ts, prices2percreturns)
gives
Error in array(r, dim = d, dimnames = if (!(is.null(n1 <- names(x[[1L]])) & :
length of 'dimnames' [1] not equal to array extent
I suspect that this is due to the NAs introduced by the merge, but maybe I'm just doing something wrong. Do I need to remove the NAs, or is there something wrong with the length of the vector returned by the function?
Per the comments, you don't actually want to apply the function to each row. Instead, you want to take advantage of R's vectorized arithmetic, i.e. you can simply do this:
100*diff(my.ts)/my.ts
If you do want to apply a function to each row of a matrix (which is what an xts object is), you can use apply with MARGIN=1. i.e. apply(my.ts, 1, myFUN).
sapply(my.ts, myFUN) would work like apply(my.ts, 2, myFUN) in this case -- applying a function to each column.
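As a small illustration of the two margins, here is a sketch on a plain matrix and a data frame (the objects m and d are made up for this example and are not the xts object from the question):

```r
# A 2 x 3 matrix: columns are (1,2), (3,4), (5,6)
m <- matrix(1:6, nrow = 2)

# MARGIN = 1 applies the function to each row
row_sums <- apply(m, 1, sum)

# MARGIN = 2 applies the function to each column
col_sums <- apply(m, 2, sum)

# sapply over a data frame iterates over its columns,
# so it gives the same values as apply(m, 2, sum)
d <- as.data.frame(m)
col_sums_sapply <- sapply(d, sum)
```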
Your diff(x) will be one element shorter than your x. Also, your returns will be based on the ending price; you want returns based on the starting price, not the end price. Here I change the function to reflect that and apply it to each column.
prices2percreturns <- function(x){100*diff(x)/x[-length(x)]}
prcRets = apply(my.ts, 2, prices2percreturns)
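To check the arithmetic, here is a minimal sketch on a small made-up price matrix (prices and pct_returns are hypothetical names, and plain numeric columns are assumed rather than an xts object):

```r
# Percentage return relative to the starting price of each period
pct_returns <- function(x) 100 * diff(x) / head(x, -1)

# Made-up prices for two assets over three periods
prices <- cbind(A = c(100, 110, 121), B = c(50, 55, 44))

# Apply per column; the result has one row fewer than prices
rets <- apply(prices, 2, pct_returns)
rets
```

Column A rises by 10% twice; column B rises by 10% and then falls by 20%.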
If I repeat this code
x<-1:6
n<-40
M<-200
y<-replicate(M,as.numeric(table(sample(x,n,1))))
str(y)
sometimes R decides to create a matrix and sometimes it creates a list. Can you explain why? How can I be sure that I get a matrix or a list?
If you choose M very small, for example 10, it will almost always create a matrix. If you choose M very large, for example 2000, it will create a list.
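You get a list when, in at least one replicate, not all the numbers in x are sampled: table() then returns fewer than six counts, the results have unequal lengths, and replicate() cannot simplify them into a matrix.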
You can always return a list by using simplify = FALSE.
y <- replicate(M, as.numeric(table(sample(x,n,TRUE))), simplify = FALSE)
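Also, you are using 1 to set the replace argument. It is better to use a logical value, i.e. TRUE.
To always return a matrix, we can do:
sapply(y, `[`, x)
This pads the shorter vectors with NAs at the end. Note that because as.numeric() drops the names, the NA lands in the last position rather than at the value that was never sampled.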
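Another option, offered here as a sketch rather than as part of the original answer, is to convert each sample to a factor with fixed levels so that table() always reports all six counts (zero for unsampled values). Every replicate then has the same length and replicate() can simplify to a matrix:

```r
set.seed(42)  # only to make the sketch reproducible
x <- 1:6
n <- 40
M <- 200

# factor(..., levels = x) forces table() to return a count for every value in x
y <- replicate(M, as.numeric(table(factor(sample(x, n, replace = TRUE), levels = x))))

dim(y)  # one row per value in x, one column per replicate
```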
Maybe this will help:
https://rafalab.github.io/dsbook/r-basics.html#data-types
The columns of a matrix must all have the same type and length.
The elements of a list can have different classes and lengths.
Try this:
x<-1
y<-2:7
z<-matrix(x,y)
z<-list(x,y)
In the first case you get a matrix with 2 rows and 1 column, because y is passed as the nrow argument and only its first element (2) is used.
In the second case you get a list with elements of different lengths.
Also, the str() function is very useful, and you can find the class of an object using the class() function.
I am converting the for-loops in my R model, which has multiple input datasets, to apply-style calls. In the for-loop I use the current loop value to retrieve values from other datasets. I would like to replicate this with an apply function (over the columns of a dataset), but I'm struggling to get the index of the current column inside apply so that I can retrieve the matching values from the other data.
The apply function passes each column to the function as its argument, which is fine. I've tried using colnames (after naming my columns by number) but have not had any joy. Below is an example dataset and for-loop showing what I'd like to achieve (simplified somewhat). The length of the vectors and the number of columns in the tabular dataset will always be equal.
iteration<-1:3
df <- data.frame("column1" = 6:10, "column2" = 12:16, "column3" = 31:35)
variable1<-rnorm(3,mean = 25)
variable2<-rnorm(3, mean = 0.21)
outcome<-numeric()
for (i in iteration) {
intermediate<-(mean(df[,i])*variable1[i])^variable2[i]
outcome<-c(outcome,intermediate)
}
outcome
The expected result is outcome above. Trying this with apply, what I imagine it to be is this:
apply(df, 2, function(x) (mean(x)*variable1[colnumber(x)])^variable2[colnumber(x)]
or perhaps
apply(df, 2, function(x) (mean(x)*variable1[x])^variable2[x])
but these two obviously do not work.
First-time user, so apologies for any etiquette issues, but I found the answer to my own problem using the purrr package. Maybe this helps someone else. Note that pmap() returns a list; pmap_dbl() returns a numeric vector like outcome.
library(purrr)
pmap(list(df, variable1, variable2), function(df, variable1, variable2) (mean(df)*variable1)^variable2)
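For readers who prefer to stay in base R, here is a sketch of two equivalent approaches: mapply, which walks the columns and both vectors in parallel, and sapply over the column index. The set.seed call and the names outcome_mapply and outcome_index are additions for this illustration only.

```r
set.seed(1)  # only to make the sketch reproducible
df <- data.frame(column1 = 6:10, column2 = 12:16, column3 = 31:35)
variable1 <- rnorm(3, mean = 25)
variable2 <- rnorm(3, mean = 0.21)

# Reference result from the original for-loop
outcome <- numeric(ncol(df))
for (i in seq_along(df)) {
  outcome[i] <- (mean(df[, i]) * variable1[i])^variable2[i]
}

# mapply passes the i-th column, variable1[i] and variable2[i] together
outcome_mapply <- mapply(function(col, v1, v2) (mean(col) * v1)^v2,
                         df, variable1, variable2)

# Alternatively, iterate over the column index and look everything up by i
outcome_index <- sapply(seq_along(df),
                        function(i) (mean(df[[i]]) * variable1[i])^variable2[i])
```

mapply keeps the column names on its result, so unname() may be needed when comparing it with the loop output.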
df is a frequency table, where the values in a were reported as many times as recorded in column x,y,z. I'm trying to convert the frequency table to the original data, so I use the rep() function.
How do I loop the rep() function to give me the original data for x, y, z without having to repeat the function several times like I did below?
Also, can I input the result into a data frame, bearing in mind that the output will have different column lengths:
a <- (1:10)
x <- (6:15)
y <- (11:20)
z <- (16:25)
df <- data.frame(a,x,y,z)
df
rep(df[,1], df[,2])
rep(df[,1], df[,3])
rep(df[,1], df[,4])
If you don't want to repeat the rep() call for each column, you can use an apply-family function. Note that you cannot store the result directly in a data.frame because the vectors have different lengths, but you can store it in a list and access the elements in much the same way as with a data.frame. Something like this works:
df2<-sapply(df[,2:4],function(x) rep(df[,1],x))
What this sapply call says is: for each column in df[,2:4], call rep(df[,1], x), where x is one of your columns (df[,2], df[,3], or df[,4]). Because the results have different lengths, sapply cannot simplify them and returns a named list.
The below code just makes sure the apply function is giving the same result as your original way.
identical(df2$x,rep(df[,1], df[,2]))
[1] TRUE
identical(df2$y,rep(df[,1], df[,3]))
[1] TRUE
identical(df2$z,rep(df[,1], df[,4]))
[1] TRUE
EDIT:
If you want it as a data.frame object you can do this:
res<-as.data.frame(sapply(df2, '[', seq(max(sapply(df2, length)))))
Note this introduces NAs into your data.frame so be careful!
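If the NA padding is a concern, one alternative (a sketch, not part of the original answer) is to keep the data in long format: build the list with lapply and then use stack() to get a two-column data frame with the expanded values and the name of the source column. The names expanded and long are made up for this example.

```r
a <- 1:10
df <- data.frame(a, x = 6:15, y = 11:20, z = 16:25)

# lapply always returns a list, regardless of the element lengths
expanded <- lapply(df[, 2:4], function(counts) rep(df$a, counts))

# stack() turns the named list into a long data frame:
# 'values' holds the expanded data, 'ind' the source column name
long <- stack(expanded)
head(long)
```

No padding is needed because each value occupies its own row.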
R newbie here.
I'm learning functions, and I have a problem running this:
newfunction = function(x) {
limit = ncol(x)
for(i in 1:limit){
if(anyNA(x[,i] == T)) {
x[,i] = NULL
}
}
}
newfunction(WBD_SA)
I get the error: Error in `[.data.frame`(x, , i) : undefined columns selected
I'm trying to remove all columns that have any NA values from my data set WBD_SA.
I know na.omit() removes rows with NA values, but I'm not sure if there is something similar for columns.
Any suggestions regarding packages/functions that can make this happen are also appreciated.
Cheers!
You are getting this error because you are iterating from 1 to limit, where limit is the number of columns at the start of the function, while dropping columns from the data.frame as you go. This means that if you drop even one column, ncol(x) will be less than limit before the loop ends, and x[,i] eventually refers to a column that no longer exists. I'll give you 3 alternatives that work:
iterate backward:
for(i in limit:1)
if(anyNA(x[,i] == TRUE))
x[,i] = NULL
With the above loop, the i'th column will always be in the same position as it was when the for loop started.
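iterate forward using a while loop, advancing i only when no column was dropped (otherwise the next column has shifted into position i and would be skipped):
i = 1
while(i <= ncol(x)){
  if(anyNA(x[,i])) {
    x[,i] = NULL
  } else {
    i = i + 1
  }
}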
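use the fact that data.frames are lists of columns, and use sapply to create a logical index that is TRUE for columns that contain a missing value and FALSE otherwise, like so:
columnHasMissingValue <- sapply(x, function(y) any(is.na(y)))
x <- x[, !columnHasMissingValue, drop = FALSE]
(lapply would return a list, which cannot be negated with ! or used directly as a column index.)
As long as you're learning about data.frames, it's useful to know that you can use negative indices to drop columns, like so:
x <- x[,-which(columnHasMissingValue)]
Be careful: if no column has a missing value, which() returns an empty vector and this drops every column.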
Note that the above solution is similar to the apply approach in user1362215's answer, which takes advantage of the fact that data.frames have two dimensions* so you can apply a function over the second margin (columns) like so:
good_cols = apply(x,# the object over which to apply the function
2,# apply the function over the second margin (columns)
function(x) # the function to apply
!any(is.na(x))
)
x = x[,good_cols]
* 2 dimensions means that the [ operator defined for the data.frame class takes 2 arguments that are interpreted as rows and columns indexes.
When you are iterating over the columns, using x[,i] = NULL removes the column, reducing the number of columns by 1. Unless i is the last column, this will produce errors for future values of i. You should instead do something like this
good_cols = apply(x,2,function(x) {!any(is.na(x))})
x = x[,good_cols]
apply(x,margin,function) applies function over the margin dimension (rows for the value of 1, columns for the value of 2; 3 or higher is possible with arrays) of x, which is more efficient than looping (and doesn't cause errors from changing x partway).
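Putting this together, a complete function might look like the sketch below. The name drop_na_columns and the sample data are made up for illustration; note that the function has to return the modified data frame, and the caller has to assign the result, because R does not modify the argument in place.

```r
# Return a copy of x without the columns that contain any NA
drop_na_columns <- function(x) {
  # TRUE for columns with no missing values
  keep <- vapply(x, function(col) !anyNA(col), logical(1))
  # drop = FALSE keeps a data.frame even if only one column remains
  x[, keep, drop = FALSE]
}

# Hypothetical sample data: column b contains an NA
sample_df <- data.frame(a = 1:3, b = c(1, NA, 3), c = c("u", "v", "w"))
cleaned <- drop_na_columns(sample_df)
names(cleaned)
```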
When I pass a row of a data frame to a function using apply, I lose the class information of the elements of that row. They all turn into 'character'. The following is a simple example. I want to add a couple of years to the 3 stooges' ages. When I try to add 2 to a value that had been numeric, R says "non-numeric argument to binary operator." How do I avoid this?
age = c(20, 30, 50)
who = c("Larry", "Curly", "Mo")
df = data.frame(who, age)
colnames(df) <- c( '_who_', '_age_')
dfunc <- function (er) {
print(er['_age_'])
print(er[2])
print(is.numeric(er[2]))
print(class(er[2]))
return (er[2] + 2)
}
a <- apply(df,1, dfunc)
Output follows:
_age_
"20"
_age_
"20"
[1] FALSE
[1] "character"
Error in er[2] + 2 : non-numeric argument to binary operator
apply only really works on matrices (which have the same type for all elements). When you run it on a data.frame, it simply calls as.matrix first.
The easiest way around this is to work on the numeric columns only:
# skips the first column
a <- apply(df[, -1, drop=FALSE],1, dfunc)
# Or in two steps:
m <- as.matrix(df[, -1, drop=FALSE])
a <- apply(m,1, dfunc)
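The drop=FALSE is needed to keep a one-column data.frame instead of getting a plain vector.
-1 means all but the first column; you could instead explicitly specify the columns you want, for example df[, c('foo', 'bar')]. Note that dropping a column shifts the positions, so dfunc should then refer to the value by name (er['_age_']) rather than by position (er[2]).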
UPDATE
If you want your function to access one full data.frame row at a time, there are (at least) two options:
# "loop" over the index and extract a row at a time
sapply(seq_len(nrow(df)), function(i) dfunc(df[i,]))
# Use split to produce a list where each element is a row
sapply(split(df, seq_len(nrow(df))), dfunc)
The first option is probably better for large data frames since it doesn't have to create a huge list structure upfront.
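The coercion described above, and a way to avoid row-wise apply altogether, can be sketched as follows. This uses simplified column names (who, age) instead of the question's '_who_' and '_age_', and the names older and labels are made up for the example.

```r
df <- data.frame(who = c("Larry", "Curly", "Mo"),
                 age = c(20, 30, 50),
                 stringsAsFactors = FALSE)

# apply() converts the data frame with as.matrix(); mixing character and
# numeric columns yields a character matrix, which is why the ages become strings
m <- as.matrix(df)
typeof(m)

# Working on the column directly keeps its type and needs no loop
older <- df$age + 2

# When a function needs several columns per row, mapply passes them in
# parallel, each with its original type
labels <- mapply(function(who, age) paste(who, "will be", age + 2),
                 df$who, df$age)
```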