I am downloading data from FRED with getSymbols. This creates an xts object whose data are stored as type integer for the series that I am downloading. I want these data to be of type/class double.
What is an idiomatic way of doing this?
library(quantmod)
getSymbols("GDPMC1", src = 'FRED', auto.assign = TRUE)
growthRate <-
function (x) {
stopifnot(length(x) > 1)
(x[2:length(x)] - x[-length(x)] )/ x[-length(x)]
}
stopifnot(growthRate(c(2,3,4)) == c(0.5 , 1/3 ))
realGDPGrowthRate <- growthRate(GDPMC1) ### zeros due to integer math
You can change the storage mode for GDPMC1 to "double" via:
storage.mode(GDPMC1) <- "double"
But that won't solve your problem because the issue isn't integer arithmetic. The issue is that xts/zoo align objects by index before performing any Ops methods (arithmetic, logical operations, etc), so your growthRate function will never work correctly on xts/zoo objects.
You can use quantmod's Delt function instead of writing your own.
realGDPGrowthRate <- Delt(GDPMC1)
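To see why index alignment produces zeros, here is a minimal sketch that uses base R's ts class as a stand-in, since Ops on ts objects also aligns on the time index. This is an analogy, not xts itself, and the variable names are made up for illustration. Stripping the index with as.numeric before calling a vector-based function avoids the alignment; the same should hold for an xts object, though that is not run here.

```r
# Stand-in series: base R "ts" with times 1, 2, 3 (not an xts object).
x <- ts(c(2, 3, 4))

later   <- window(x, start = 2)   # values 3, 4 at times 2, 3
earlier <- window(x, end = 2)     # values 2, 3 at times 1, 2

# Arithmetic aligns on time first, so only time 2 overlaps:
# (3 - 3) / 3, i.e. a single zero rather than two growth rates.
aligned <- (later - earlier) / earlier
as.numeric(aligned)

# Dropping the time index gives plain positional arithmetic.
v <- as.numeric(x)
plain <- diff(v) / head(v, -1)
plain
```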
Suppose I want to do something like:
mask_values <- function(x, mask) ifelse(mask, x, NA)
The purpose of this function is to take a vector and replace some of its values with NA based on the value of mask. However, this function doesn't guarantee that the return type is always the same as the input x. For example:
date_vec <- rep(lubridate::today(), 10)
my_mask <- rep(c(TRUE, FALSE), length.out = 10)
class(mask_values(date_vec, my_mask))
which yields "numeric" rather than the desired "Date". So I try switching to dplyr::if_else, which is supposed to preserve types:
mask_values <- function(x, mask) dplyr::if_else(mask, x, NA)
class(mask_values(date_vec, my_mask))
However, if_else also requires the input types to be the same as each other, and NA has type "logical", which means I get this error:
Error: `false` must be a `Date` object, not a logical vector.
So it seems that if I want to use if_else in order to preserve the input type, I need to be able to obtain an NA value with the same class as the input. Is there a reliable way to do this for any class? One possibility seems to be x[NA], but I'm not sure if that is a universal solution or if it just happens to work with the examples that I've tested. You can assume that the only classes that matter are "vector-like" classes for which NA values exist, such as Date and POSIXct, as well as all the basic R data types (logical, character, numeric, etc.).
Alternatively, is there another way to implement my mask_values function such that the return value always has the same type as x?
I recommend avoiding ifelse whenever possible. It is quite inefficient and as you have seen also quirky regarding what it returns (although that is well documented). I rarely use it and, if I do use it, only for interactive use and not programmatically.
The canonical and safe way of setting values to NA in base R is is.na<-. (Note that it supports logical and positional indexing. mask could also be a numeric vector.)
mask_values <- function(x, mask) {
is.na(x) <- mask
x
}
#or simply this:
#mask_values <- `is.na<-`
#i.e., `is.na<-` is already what you want.
class(mask_values(date_vec, my_mask))
#[1] "Date"
Alternatively, you can use simple subset-assignment. A bare NA is a logical value (it can be coerced to other types, and typed variants such as NA_real_ also exist). When you assign a logical value into a vector of another type, it is coerced to that vector's type, because "logical" is the lowest type in R's coercion hierarchy.
mask_values <- function(x, mask) {
x[mask] <- NA
x
}
class(mask_values(date_vec, my_mask))
#[1] "Date"
Btw., this subset-assignment is how the is.na<-.default method is defined.
I prefer doing subset-assignment explicitly in my code but occasionally the convenience function replace can be useful.
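As a minimal sketch of the replace variant, using Sys.Date() in place of lubridate::today() so that only base R is needed. The helper name keep_where_true is hypothetical. Note that the question's ifelse(mask, x, NA) keeps values where mask is TRUE, whereas assigning NA at mask blanks those positions, so matching the question's behaviour means negating the mask.

```r
# Example data (illustrative): ten copies of today's date, alternating mask.
date_vec <- rep(Sys.Date(), 10)
my_mask  <- rep(c(TRUE, FALSE), length.out = 10)

# replace(x, list, values) is shorthand for `x[list] <- values; x`.
masked <- replace(date_vec, my_mask, NA)
class(masked)   # "Date"

# Hypothetical helper: keep x where mask is TRUE, NA elsewhere,
# which mirrors ifelse(mask, x, NA) while preserving the class of x.
keep_where_true <- function(x, mask) replace(x, !mask, NA)
kept <- keep_where_true(date_vec, my_mask)

# Indexing with NA_integer_ gives a length-one NA of the same class as x.
na_like <- date_vec[NA_integer_]
class(na_like)  # "Date"
```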
There's a few questions here, I would be satisfied if any one of them was answered sufficiently well.
Background - what is the end goal?
I am interested in representing a date-range in R. Bare-minimum requirement is that we represent a start and end date, which can easily be done using a length-two date vector. Additionally, it would be nice to extend this object into a Class which further
supplies a name to each range (i.e. a character string)
enables the (easy) use of dplyr::between operator
Shortcomings of my previous approach
I've previously represented each range as a length-two date vector. The upside here is that I don't rely on any external dependencies and my data structure is so lightweight that it's not a hassle to program with. The downside is that I'm tired of having to access the beg and end of the date range via the [ operator and arguments 1 and 2 respectively (arguably less interpretable than if we had a class implementation).
Also, we ultimately deal with a sequence of date-ranges (i.e. a vector), and so abstracting away the DateRange is helpful before we start nesting data structures. I do not want to use a list of length-two date vectors nor do I wish to use a data.frame with two rows, each column being interpreted as a date-range.
Where have I looked?
I've looked at the lubridate package and have considered inheriting from its Interval class. The downside to starting with this inheritance is that I don't think S4 is necessary for my use case. I just need a few simple data attributes and a nice API for calling dplyr::between.
An ideal solution might just extend the lubridate::Interval class to hold a name, an end date (could be a method as this info is already stored in Interval via @start + @.Data), and extend dplyr::between to play nicely with said class.
What have I tried?
Here's a rough implementation of what I'm looking for:
# 3 key attributes: beg, end, and name.
MyInterval <- function(beg, end, name = NULL) {
if (is.character(beg)) beg <- as.Date(beg)
if (is.character(end)) end <- as.Date(end)
if (is.null(name)) name <- as.character(beg)
structure(.Data = list('beg' = beg, 'end' = end, 'name' = name), class = "MyInterval")
}
Now, I would like to be able to overload the between operator such that I may call it as follows: between(x, MyInterval), where we notice that dplyr::between(x, lo, hi) expects three arguments. To try and accomplish this, I've tried to set up type dispatching as follows:
between <- function(...) UseMethod('between')
between.MyInterval <- function(interval, x) {
if (is.character(x)) x <- as.Date(x)
dplyr::between(x, interval$beg, interval$end)
}
between.default <- function(x, lo, hi) dplyr::between(x, lo, hi)
The reason I chose to use ... in the prototype for between is that the order of arguments currently differ between between.MyInterval and between.default. Is there a better way to code this up? I believe the behavior is as desired (to within a first glance)
i <- MyInterval("2012-01-01", "2012-12-31")
between(i, "2012-02-01") # Dispatches to between.MyInterval. Returns TRUE as expected.
between(150, 100, 200) # Dispatches to dplyr::between. Good, we didn't break anything?
Thank you
Any criticisms are welcomed. I know that between is a function that doesn't do type-dispatching out of the box, and so implementing this myself raises a code smell.
A possibility is to use data.table's inrange-function.
First, let's make an interval:
my.interval <- function(beg, end) data.table(beg = as.Date(beg), end = as.Date(end))
mi <- my.interval("2012-01-01", "2012-12-31")
Now you can do:
> as.Date("2012-02-01") %inrange% mi
[1] TRUE
Or define your own inrange-function:
my.inrange <- function(x, intv) data.table::inrange(as.Date(x), intv$beg, intv$end)
With that you can do:
> my.inrange("2012-02-01", mi)
[1] TRUE
As @Frank commented, you can make an infix variant of my.inrange too:
`%my.inrange%` <- my.inrange
now you can use it in the following notation as well:
"2012-02-01" %my.inrange% mi
Which is similar to the infix notation of data.table's between and inrange functions.
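If you would rather avoid the data.table dependency, the same idea can be sketched in base R with a small S3 class that also carries a name. The names date_range and %in_range% are hypothetical, chosen only for this illustration, and the sketch assumes one range per object.

```r
# Hypothetical constructor: one date range with an optional name.
date_range <- function(beg, end, name = NULL) {
  beg <- as.Date(beg)
  end <- as.Date(end)
  stopifnot(length(beg) == 1, length(end) == 1, beg <= end)
  if (is.null(name)) name <- format(beg)
  structure(list(beg = beg, end = end, name = name), class = "date_range")
}

# Hypothetical infix operator: TRUE where x falls inside the closed range.
`%in_range%` <- function(x, rng) {
  stopifnot(inherits(rng, "date_range"))
  x <- as.Date(x)
  x >= rng$beg & x <= rng$end
}

r <- date_range("2012-01-01", "2012-12-31", name = "FY2012")
"2012-02-01" %in_range% r                      # TRUE
c("2011-12-31", "2012-06-15") %in_range% r     # FALSE TRUE
r$name                                         # "FY2012"
```

Accessing r$beg and r$end by name also avoids the positional [1] and [2] indexing mentioned in the question.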
I have a data frame of numerics, integers and strings. I would like to check which columns are integers, so I do:
raw<-read.csv('./rawcorpus.csv',head=F)
ints<-sapply(raw,is.integer)
However, this gives me all FALSE. So I have to make a little change:
nums<-sapply(raw,is.numeric)
ints2<-sapply(raw[,nums],function(col){return(!(sum(col%%1)==0))})
The second case works fine. My question is: what does the 'is.integer' function actually check?
By default, R stores numeric literals as double-precision floating point values, i.e. class "numeric". Three useful functions, class, typeof and storage.mode, will tell you how a value is stored. Try:
x <- 1
class(x)
typeof(x)
storage.mode(x)
If you want x to be the integer 1, add the suffix "L":
x <- 1L
class(x)
typeof(x)
storage.mode(x)
Or, you can cast numeric to integers by:
x <- as.integer(1)
class(x)
typeof(x)
storage.mode(x)
The is.integer function checks whether the storage mode is integer or not. Compare
is.integer(1)
is.integer(1L)
You should be aware that some functions return a double even where you might expect an integer. Given double input, round, floor, ceiling and the modulo operator %% all return doubles, even when the result is a whole number.
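For reference, a short sketch of what these calls report in a standard R session (base R only):

```r
typeof(1)              # "double"
typeof(1L)             # "integer"
typeof(as.integer(1))  # "integer"
class(1)               # "numeric"
storage.mode(1)        # "double"

is.integer(1)          # FALSE: stored as double
is.integer(1L)         # TRUE: stored as integer

# Whole-number results are still doubles when the input is double.
typeof(round(2.7))     # "double"
typeof(5 %% 2)         # "double"
typeof(5L %% 2L)       # "integer": both operands are integers
```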
From R documentation:
is.integer(x) does not test if x contains integer numbers! For that, use round, as in the function is.wholenumber(x) in the examples.
So is.integer(x) returns TRUE only when x is stored with integer type; it does not look at the values themselves. A column stored as double gives FALSE even if every value in it is a whole number, which may be what you are seeing in your first example.
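A minimal sketch of the difference on a small hand-built data frame (not your CSV). The helper is_whole is a hypothetical name, modelled on the is.wholenumber example in the documentation quoted above.

```r
# Illustrative data: double whole numbers, double fractions, integers, strings.
df <- data.frame(a = c(1, 2, 3),
                 b = c(1.5, 2, 3),
                 c = 1:3,
                 d = c("x", "y", "z"))

# Hypothetical helper: TRUE if a column is numeric and all values are whole
# numbers up to a small tolerance (missing values are ignored).
is_whole <- function(x, tol = .Machine$double.eps^0.5) {
  is.numeric(x) && all(abs(x - round(x)) < tol, na.rm = TRUE)
}

stored_as_int <- sapply(df, is.integer)  # a FALSE, b FALSE, c TRUE, d FALSE
whole_valued  <- sapply(df, is_whole)    # a TRUE,  b FALSE, c TRUE, d FALSE
```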
Hope that helps
Source: https://stat.ethz.ch/R-manual/R-devel/library/base/html/integer.html
This is a question about R's internals. I am curious if someone could explain how the following call works.
# Let's just work with part of the iris data
data(iris)
df <- iris[1:10, 1:4]
# Now the question
1 - df
Does R create another matrix of equivalent dimensions? Does it loop over all elements? How is R subtracting a matrix from an integer?
Note that your example is a data.frame and not a matrix. I will refer to the data.frame case.
An S3 method is dispatched by the Ops group generic (see methods("Ops")). The relevant method is Ops.data.frame. Here are some excerpts with comments added by me:
#create an unevaluated function call
FUN <- get(.Generic, envir = parent.frame(), mode = "function")
f <- if (unary)
quote(FUN(left))
else quote(FUN(left, right))
#...
#a lot of checking and preparations
#...
#loop over the columns, create the function input and evaluate the function call
for (j in seq_along(cn)) {
left <- if (!lscalar)
e1[[j]]
else e1
right <- if (!rscalar)
e2[[j]]
else e2
value[[j]] <- eval(f)
}
If the arguments to - are an integer vector and an integer matrix, both are treated as integer vectors, but .Primitive("-") preserves attributes, which includes the dim attribute of the matrix. See also help("-").
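A small sketch to check both cases from the console. It only compares results and does not reproduce the internals of Ops.data.frame.

```r
# Data frame case: the result matches applying the operation column by column.
df  <- iris[1:10, 1:4]
res <- 1 - df
class(res)                                   # "data.frame"
same_cols <- vapply(names(df),
                    function(nm) identical(res[[nm]], 1 - df[[nm]]),
                    logical(1))
all(same_cols)                               # TRUE

# Matrix case: element-wise arithmetic on the underlying vector,
# with the dim attribute carried over to the result.
m <- matrix(1:6, nrow = 2)
r <- 1L - m
dim(r)                                       # 2 3
identical(as.vector(r), 1L - 1:6)            # TRUE
```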
I want to create an S4 class in R that will allow me to access large datasets (in chunks) from the cloud (similar to the goals of the ff package). Right now I'm working with a toy example called "range.vec" (I don't want to deal with internet access yet), which stores a sequence of numbers like so:
setClass("range.vec",
representation(start = "numeric", #beginning num in sequence
end = "numeric", #last num in sequence
step = "numeric", #step size
chunk = "numeric", #cache a chunk here to save memory
chunkpos = "numeric"), #where does the chunk start in the overall vec
contains="numeric" #inherits methods from numeric
)
I want this class to inherit the methods from "numeric", but I want it to use these methods on the whole vector, not just the chunk that I'm storing. For example, I don't want to define my own method for 'mean', but I want 'mean' to get the mean of the whole vector by accessing it chunk by chunk, using length(), '[', '[[', and el() functions that I've defined. I've also defined a chunking function:
setGeneric("set.chunk", function(x,...) standardGeneric("set.chunk"))
setMethod("set.chunk", signature(x = "range.vec"),
function (x, chunksize=100, chunkpos=1) {
#This function extracts a chunk of data from the range.vec object.
begin <- x@start + (chunkpos - 1)*x@step
end <- x@start + (chunkpos + chunksize - 2)*x@step
data <- seq(begin, end, x@step) #calculate values in data chunk
#get rid of out-of-bounds values
data[data > x@end] <- NA
x@chunk <- data
x@chunkpos <- chunkpos
return(x)
})
When I try to call a method like 'mean', the function inherits correctly, and accesses my length function, but returns NA because I don't have any data stored in the .Data slot. Is there a way that I can use the .Data slot to point to my chunking function, or to tell the class to chunk numeric methods without defining every single method myself? I'm trying to avoid coding in C if I can. Any advice would be very helpful!
You could remove your chunk slot and replace it by numeric's .Data slot.
Little example:
## class definition
setClass("foo", representation(bar="numeric"), contains="numeric")
setGeneric("set.chunk", function(x, y, z) standardGeneric("set.chunk"))
setMethod("set.chunk",
signature(x="foo", y="numeric", z="numeric"),
function(x, y, z) {
## instead of x@chunk you could use numeric's .Data slot
x@.Data <- y
x@bar <- z
return(x)
})
a <- new("foo")
a <- set.chunk(a, 1:10, 4)
mean(a) # 5.5
Looks like there isn't a good way to do this within the class. The only solution I've found is to tell the user to loop through all of the chunks of data from the cloud and calculate as they go.
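For example, a mean can be accumulated chunk by chunk with a running sum and count. This is a minimal sketch for the toy arithmetic-sequence case only; chunked_mean is a hypothetical helper, and it assumes start, end and step describe an increasing sequence with a step that divides cleanly. A real version would fetch each chunk from the remote source instead of computing it.

```r
# Hypothetical helper: mean of seq(start, end, step) without ever holding
# more than `chunksize` values in memory at once.
chunked_mean <- function(start, end, step, chunksize = 100) {
  n_total <- floor((end - start) / step) + 1  # number of elements overall
  total <- 0
  count <- 0
  pos <- 1
  while (pos <= n_total) {
    last  <- min(pos + chunksize - 1, n_total)
    chunk <- start + (seq(pos, last) - 1) * step  # stand-in for a remote fetch
    total <- total + sum(chunk)
    count <- count + length(chunk)
    pos   <- last + 1
  }
  total / count
}

chunked_mean(1, 1000, 1)               # 500.5
chunked_mean(2, 11, 3, chunksize = 3)  # 6.5, from the values 2, 5, 8, 11
```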