This is a question about R's internals. I am curious whether someone could explain how the following call works.
# Let's just work with part of the iris data
data(iris)
df <- iris[1:10, 1:4]
# Now the question
1 - df
Does R create another matrix of equivalent dimensions? Does it loop over all elements? How is R subtracting a matrix from an integer?
Note that your example is a data.frame and not a matrix. I will refer to the data.frame case.
An S3 method is dispatched by the Ops group generic (see methods("Ops")). The relevant method is Ops.data.frame. Here are some excerpts with comments added by me:
# create an unevaluated function call
FUN <- get(.Generic, envir = parent.frame(), mode = "function")
f <- if (unary)
    quote(FUN(left))
else quote(FUN(left, right))
# ...
# a lot of checking and preparations
# ...
# loop over the columns, create the function input and evaluate the function call
for (j in seq_along(cn)) {
    left <- if (!lscalar)
        e1[[j]]
    else e1
    right <- if (!rscalar)
        e2[[j]]
    else e2
    value[[j]] <- eval(f)
}
If the arguments to - are an integer vector and an integer matrix, both are treated as integer vectors, but .Primitive("-") preserves attributes, which include the dim attribute of the matrix. See also help("-").
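To see both behaviours, here is a quick illustrative check (using a small slice of iris; the output comments are mine):
df <- iris[1:3, 1:2]
1 - df        # Ops.data.frame evaluates `-` column by column; the result is again a data.frame
m <- as.matrix(df)
str(1 - m)    # .Primitive("-") keeps dim (and dimnames), so this is still a 3 x 2 matrix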
Related
Function foo1 can subset (using subset()) a list of data.frames by one or more requested variables (e.g., by = ESL == 1 or by = ESL == 1 & type == 4).
However, I'm aware of the danger of using subset() in R. Thus, I wonder in foo1 below, what I can use instead of subset() to get the same output?
foo1 <- function(data, by){
  s <- substitute(by)
  L <- split(data, data$study.name); L[[1]] <- NULL
  lapply(L, function(x) do.call("subset", list(x, s)))  ## What to use instead of `subset`
                                                        ## to get the same output?
}
# EXAMPLE OF USE:
D <- read.csv("https://raw.githubusercontent.com/izeh/i/master/k.csv", header=TRUE) # DATA
foo1(D, ESL == 1)
You can compute on the language. Building on my answer to "Working with substitute after $ sign in R":
foo1 <- function(data, by){
  s <- substitute(by)                  # e.g. ESL == 1
  L <- split(data, data$study.name); L[[1]] <- NULL
  E <- quote(x$a)                      # template for a `$` call
  E[[3]] <- s[[2]]                     # replace `a` with the variable name: x$ESL
  s[[2]] <- E                          # splice it back: ESL == 1 becomes x$ESL == 1
  eval(bquote(lapply(L, function(x) x[.(s),])))
}
foo1(D, ESL == 1)
This gets more complex for arbitrary subset expressions. You'd need a recursive function that crawls the parse tree and inserts the calls to $ at the right places.
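For what it's worth, here is a rough, untested sketch of such a recursive crawler (the name qualify and its interface are made up for illustration):
# Wrap every symbol that matches a column name in a call to `$`
qualify <- function(expr, cols, dat = quote(x)) {
  if (is.symbol(expr)) {
    if (as.character(expr) %in% cols) return(call("$", dat, expr))
    return(expr)
  }
  if (is.call(expr)) {
    for (i in seq_along(expr)[-1]) {    # skip the function position
      expr[[i]] <- qualify(expr[[i]], cols, dat)
    }
  }
  expr
}
# Inside foo1 one could then do:
#   s <- qualify(substitute(by), names(data))
#   eval(bquote(lapply(L, function(x) x[.(s), ])))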
Personally, I'd just use package data.table where this is easier because you don't need $, i.e., you can just do eval(bquote(lapply(L, function(x) setDT(x)[.(s),]))) without changing s. OTOH, I wouldn't do this at all. There is really no reason to split before subsetting.
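Expanded into a full function, the data.table variant might look like this (a sketch only; it assumes data.table is installed and keeps the structure of foo1 from the question, with foo1_dt as an illustrative name):
library(data.table)
foo1_dt <- function(data, by){
  s <- substitute(by)
  L <- split(data, data$study.name); L[[1]] <- NULL
  # inside [.data.table the columns are visible directly, so `s` is used unchanged
  eval(bquote(lapply(L, function(x) setDT(x)[.(s), ])))
}
# foo1_dt(D, ESL == 1)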
I would guess (based on general knowledge and a quick skim of the answers to the "dangers of subset()" question) that the dangers of subset are intrinsic dangers of non-standard evaluation (NSE); if you want to be able to pass a generic expression and have it evaluated within the context of a data frame, I think you're more or less stuck with subset() or something like it.
If you were willing to use a more constrained form of expression, such as a pair var, vals (looking for cases where the variable named by the string var takes values in the vector vals), you could use
d[d[[var]] %in% vals, ]
Here var is a string, not a naked R symbol ("cyl" rather than cyl); it's unambiguous that you want to extract it from the data frame.
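For instance (purely illustrative, using the built-in mtcars data):
var <- "cyl"
vals <- c(4, 6)
mtcars[mtcars[[var]] %in% vals, ]   # rows of mtcars whose cyl value is 4 or 6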
You could extend this to a vector of variables and a list of vectors of values:
for (i in seq_along(vars)) {
  d <- d[d[[vars[i]]] %in% vals[[i]], ]
}
but if you want the full flexibility of expressions (e.g. to be able to use either ESL == 1 & type == 4 or ESL == 1 | type == 4, or inequalities based on numeric variables) I think you're stuck with an NSE-based approach.
It's conceivable that the new-ish "tidy eval" machinery (in the rlang package, documented in some detail here) would give you a slightly more principled approach, but I don't think the dangers will completely go away.
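For completeness, a minimal sketch of that tidy-eval route, assuming the rlang package is available (foo2 is just an illustrative name):
library(rlang)
foo2 <- function(data, by){
  by <- enquo(by)                    # capture the expression as a quosure
  L <- split(data, data$study.name); L[[1]] <- NULL
  lapply(L, function(x) x[eval_tidy(by, data = x), ])
}
# foo2(D, ESL == 1 & type == 4)
Note that, unlike subset(), plain logical indexing does not drop rows where the condition evaluates to NA.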
I want to calculate the log return of my data. I defined a function and want to load the data, but the system always says the second argument is missing. Otherwise it just calculates the log of the row number.
# read data
data <- read.csv(file = "E:/Lect-1-TradingTS.csv", header = TRUE)
mode(data)
p <- data["Price"]

# func1
func1 <- function(x1, x2)
{
  result <- log(x2) - log(x1)
  return(result)
}

# calculate log return
log_return <- vector(mode = "numeric", length = (nrow(data) - 1))
for (i in 2:nrow(p))
{
  log_return[i - 1] <- func1(p[(i - 1):i])
}
Error in func1(p[(i - 1):i]) : argument "x2" is missing, with no default
Your function func1 was defined to accept two arguments, but you are passing it a single argument: the vector p[(i-1):i], which has two elements but is still considered a single object. To fix this you need to pass two separate arguments, p[i-1] and p[i]. Alternatively, modify the definition of func1 to accept a two-element vector:
func1 <- function(v)
{
  x1 <- v[1]
  x2 <- v[2]
  result <- log(x2) - log(x1)
  return(result)
}
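A usage sketch with the loop from the question, assuming the prices are first extracted as a plain numeric vector:
p <- data[["Price"]]
log_return <- vector(mode = "numeric", length = length(p) - 1)
for (i in 2:length(p)) {
  log_return[i - 1] <- func1(p[(i - 1):i])   # a two-element slice of the price vector
}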
Thank you guys, all your answers inspired me. I think I found a solution:
log_return[i-1] <- func1(p[(i-1),"Price"],p[(i),"Price"])
Basically, you do not need a function for those calculations in R. R's vectorization comes in handy in these cases:
data <- read.csv(file = "E:/Lect-1-TradingTS.csv", header = TRUE)
mode(data)
p <- data[["Price"]]
logrets <- log(p[2:length(p)]) - log(p[1:(length(p) - 1)])
This vectorized computation will usually also greatly outperform any function you write "by hand".
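An equivalent, even shorter form uses base R's diff():
logrets <- diff(log(p))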
I am trying to develop my first package in R and I am facing some issues with "myclass" generic functions that I will try to describe.
Assume a data.frame X with n <- nrow(X) rows and K <- ncol(X) columns.
My main package function (too big to put in this post), let's say
fun1 <- function(X){
  # do stuff ...
  out <- list(index = idx, A = A, B = B, ...)  # idx is a character vector; "..." stands for further elements
  class(out) <- "myclass"
  return(out)
}
returns a list as its output. Then I have to use that output in a print.myclass function for the generic print method. However, in my print function I want to use the data frame X that was passed to my main function, without asking the user to provide it as an argument (i.e., print(out, X)) and without having it in my output list out (visible to the user, at least). Is there any way to do that? Thanks in advance!
I have a function where I'm trying to compare a data frame column to a reference table of type character. I have downloaded some data from the Norwegian central statistics office with popular first names. I want to add a column to my data frame which is basically a 1 or a 0 depending on whether the name appears in the list (1 being a boy, 0 being a girl). I'm getting the following error with the code below:
Error in match(x, table, nomatch = 0L) : object 'x' not found
The data frame is train; the reference data is male_names.
male_names <- read.csv("~/R/Functions_Practice/NO/BoysNames_Data.csv", sep = ";", as.is = TRUE)[, 1]

get.sex <- function(x, ref)
  for (i in ref)
  {
    if (x %in% ref)
    { return(1) }
  }

# set default for column
train$sex <- 2
# Update column if it appears in the names list
train$sex <- sapply(train$sex, FUN = get.sex(x, male_names))
I would then use the function to run the second file of girls' names against the table and set the flag to zero for each record where a match occurs.
Can anyone help?
When using sapply, you don't write a call with its arguments directly in the FUN parameter; you pass the function itself and supply any extra arguments separately:
train$sex <- sapply(train$sex, FUN = get.sex, ref = male_names)
It is implied that each element of train$sex is passed as the x argument, and all other parameters (in this case just ref) are passed after it and named explicitly.
Edit:
As joran noted, in this case sapply isn't particularly useful, and you can get the result in one line:
train$sex = (train$sex %in% male_names)*1
%in% can be used when the argument on the left is a vector, so you don't have to loop over it. Multiplying the result by one converts logical (boolean) values into integers. 1*TRUE yields 1, and 1*FALSE yields 0.
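A tiny example of both points (the names are purely illustrative):
c("Ola", "Kari") %in% c("Ola", "Per")         # TRUE FALSE
(c("Ola", "Kari") %in% c("Ola", "Per")) * 1   # 1 0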
I have a sublist of principal component rotation vectors computed by prcomp, where each list item is an N x 2 array (i.e., two column vectors), one per class.
Using those vectors, I'd like to project some similarly structured data: a list of classes, where each class item is an array with dimensions N x M x T, T being the number of trials.
My problem is, I can write simple vectorized functions with apply and its variants, but I'm having trouble generalizing this to apply that over each list.
Example data:
somedata <- list(array(rnorm(100),dim=c(5,4,5)),array(rnorm(100),dim=c(5,4,5)))
somevectors <- list(array(rnorm(10),dim=c(5,2)),array(rnorm(10),dim=c(5,2)))
Here is a simple example of the operation over each list element:
o.proj.1 <- apply(somedata[[1]], 3, function(x){
  t(somevectors[[1]]) %*% x
})  # returns an array where each projected trial is a column
I tried fitting this inside a call to lapply(), but didn't find much success:
lapply(somedata, y = somevectors, function(x, y){
  apply(x, 3, function(z){
    t(y) %*% z
  })
})
Error in t(y) %*% z : requires numeric/complex matrix/vector arguments
Basically my algorithm is to put the appropriate apply type (here lapply) around the more local function and remove the index that will be vectorized (here [[]]). What am I missing?
Of the *apply family of functions, mapply is the one to use when you want to loop simultaneously over two or more objects. Try:
o.proj <- mapply(function(x, y){
  apply(x, 3, function(z){
    t(y) %*% z
  })
}, somedata, somevectors, SIMPLIFY = FALSE)
I suppose you will want to use SIMPLIFY = FALSE to return a list, otherwise mapply will attempt to simplify your output into an array, a little like sapply does.
Also know that you can use Map as a shortcut for mapply(..., SIMPLIFY = FALSE).
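Written with Map, the same call becomes:
o.proj <- Map(function(x, y) {
  apply(x, 3, function(z) t(y) %*% z)
}, somedata, somevectors)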