I want to perform a simple task in R. I want to call a method on an object which has not been assigned to any variable yet.
Like this:
a <- c(5, 2, 11, 3)
b <- order(a, decreasing = TRUE)[1:floor(0.1 * length(.))]
So I guess I would like to find out what to pass to the length function here. I know that I can do it like this:
a <- c(5, 2, 11, 3)
b <- order(a, decreasing = TRUE)
b <- b[1:floor(0.1 * length(b))]
But I wanted to make it like I wrote above.
As far as I know, there is no built-in way to do this that is more efficient than the base code:
a <- c(5, 2, 11, 3)
b <- order(a, decreasing = TRUE)
b[1:floor(0.1 * length(b))]
However, you can achieve something similar to what you are asking with magrittr, dplyr, or similar packages that provide a pipe operator. That would look like this:
library(magrittr)  # provides %>% (dplyr re-exports it as well)
a <- c(5, 2, 11, 3)
c <- a %>% order(., decreasing = TRUE) %>% .[1:floor(0.1 * length(.))]
identical(b[1:floor(0.1 * length(b))],c)
[1] TRUE
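One caveat about the examples above: with the four-element vector, floor(0.1 * 4) is 0, so 1:0 counts down to c(1, 0) and the subset returns one element instead of none. The sketch below is one possible base R workaround, not the only one. The helper name top_idx and its frac argument are made up for illustration; seq_len() is used so that a count of zero yields an empty result.

```r
# Hypothetical helper: indices of the largest `frac` share of x.
top_idx <- function(x, frac = 0.1) {
  ord <- order(x, decreasing = TRUE)
  # seq_len(0) is empty, whereas 1:0 would be c(1, 0)
  ord[seq_len(floor(frac * length(ord)))]
}

a10 <- c(5, 2, 11, 3, 8, 7, 1, 9, 4, 6)
res_long  <- top_idx(a10)             # 3 (position of the largest value, 11)
res_short <- top_idx(c(5, 2, 11, 3))  # integer(0): 10% of 4 rounds down to 0

# The same idea without naming the intermediate object:
# an anonymous function is called directly on the result of order().
res_anon <- (function(o) o[seq_len(floor(0.1 * length(o)))])(
  order(a10, decreasing = TRUE)
)
```

The anonymous-function form is the closest base R equivalent to the `.` placeholder: the unnamed result of order() is bound to o only inside the call.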
Long time reader, first time poster. I have not found any previous questions about my current problem. I would like to create multiple linear functions which I can later apply to variables. I have a data frame of slopes, df_slope, and a data frame of constants, df_constant.
Dummy data:
df_slope <- data.frame(var1 = c(1, 2, 3,4,5), var2 = c(2,3,4,5,6), var3 = c(-1, 1, 0, -10, 1))
df_constant<- data.frame(var1 = c(3, 4, 6,7,9), var2 = c(2,3,4,5,6), var3 = c(-1, 7, 8, 0, -1))
I would like to construct functions such as
myfunc <- function(slope, constant, trvalue){
result <- trvalue*slope+constant
return(result)}
where the slope and constant values are
slope<- df_slope[i,j]
constant<- df_constant[i,j]
I have tried many approaches, for example creating a data frame of functions with a for loop:
myfunc_all<-data.frame()
for(i in 1:5){
for(j in 1:3){
myfunc_all[i,j]<-function (x){ x*df_slope[i,j]+df_constant[i,j] }
full_func[[i]][j]<- func_full
}
}
without success. The slope and constant values are paired up, so that df_slope[i,j] goes with df_constant[i,j]. The desired end result would be some kind of data frame from which I can call a function by giving its coordinates, for example like this:
myfunc_all[i,j]
but any form would be great. For example
myfunc_all[2,1]
in our case would be
function (x){ x*2+4 }
which I can apply to different x values. I hope my problem is clear.
You are running into lazy evaluation and variable scoping, which cause trouble when you build functions inside a for loop (see here for more info). It is safer to use something like mapply, which creates a separate closure for each pair of indices. Try
myfunc_all <- with(expand.grid(1:5, 1:3), mapply(function(i, j) {
function(x) {
x*df_slope[i,j]+df_constant[i,j]
}
},Var1, Var2))
dim(myfunc_all) <- c(5,3)
This will create an array-like object (a list with dimensions). The only difference from a regular matrix is that you need double brackets to extract a function. For example
myfunc_all[[2,1]](0)
# [1] 4
myfunc_all[[5,3]](0)
# [1] -1
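The scoping issue mentioned above can be seen in a small, self-contained sketch. The names adders, adders2 and make_adder are hypothetical and not part of the original code. A function defined in a for loop looks up the loop variable when it is called, not when it is defined, so every function ends up seeing the last value.

```r
# Functions built in a for loop share the loop variable k.
adders <- list()
for (k in 1:3) {
  adders[[k]] <- function(x) x + k
}
loop_result <- adders[[1]](10)   # 13, not 11: k is 3 once the loop has finished

# A factory function gives each closure its own k.
# force() evaluates the argument immediately instead of lazily.
make_adder <- function(k) {
  force(k)
  function(x) x + k
}
adders2 <- lapply(1:3, make_adder)
factory_result <- adders2[[1]](10)   # 11
```

This is the reason the mapply approach works: each call to the outer function has its own i and j.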
Alternatively, you can write a function that returns a function. That would look like
myfunc_all <- (function(slopes, constants) {
function(i, j)
function(x) x*slopes[i,j]+constants[i,j]
})(df_slope, df_constant)
Then, rather than using brackets, you call the function with parentheses.
myfunc_all(2,1)(0)
# [1] 4
myfunc_all(5,3)(0)
# [1] -1
df_slope <- data.frame(var1 = c(1, 2, 3,4,5), var2 = c(2,3,4,5,6), var3 = c(-1, 1, 0, -10, 1))
df_constant<- data.frame(var1 = c(3, 4, 6,7,9), var2 = c(2,3,4,5,6), var3 = c(-1, 7, 8, 0, -1))
functions = vector(mode = "list", length = nrow(df_slope))
for (i in 1:nrow(df_slope)) {
functions[[i]] = function(i,x) { df_slope[i]*x + df_constant[i]}
}
f = function(i, x) {
functions[[i]](i, x)
}
f(1, 1:10)
f(3, 5:10)
I have a problem performing row-wise operations with the apply function in R. I want to calculate the distance between two points:
d <- function(x,y){
length <- norm(x-y,type="2")
as.numeric(length)
}
The coordinates are given by two dataframes:
start <- data.frame(
a = c(7, 5, 17, 1),
b = c(5, 17, 1, 2))
stop <- data.frame(
b = c(5, 17, 1, 2),
c = c(17, 1, 2, 1))
My goal is to calculate successive distances given by the start and stop coordinates. I would like it to work like:
d(start[1,], stop[1,])
d(start[2,], stop[2,])
d(start[3,], stop[3,])
etc...
I have tried:
apply(X = start, MARGIN = 1, FUN = d, y = stop)
which gave some strange results. Can you please help me find the proper solution? I know how to do this with the dplyr rowwise() function, but I would like to use base R only.
Can you also explain why I received such strange results with apply()?
Loop over the row indices and apply d to each pair of rows:
sapply(seq_len(nrow(start)), function(i) d(start[i,], stop[i,]))
[1] 12.165525 20.000000 16.031220 1.414214
Or, if we want to use apply, combine the two data frames with cbind and then pick out the columns by index:
apply(cbind(start, stop), 1, FUN = function(x) d(x[1:2], x[3:4]))
[1] 12.165525 20.000000 16.031220 1.414214
Or we may use dapply from the collapse package for efficiency:
library(collapse)
dapply(cbind(start, stop), MARGIN = 1, parallel = TRUE,
FUN = function(x) d(x[1:2], x[3:4]))
[1] 12.165525 20.000000 16.031220 1.414214
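As for why the original apply() call misbehaved: apply() passes each row of start to d as a plain vector x, while the extra argument y = stop is passed through unchanged as the whole data frame. Each call therefore compares one row against all of stop, not against its matching row. For plain Euclidean distance there is also a vectorised base R route that avoids the row loop entirely. The sketch below is one such option; the names p_start, p_stop, diffs and dists are made up here, and the data frames are renamed only to avoid shadowing base::stop.

```r
# Same coordinates as in the question, under hypothetical names.
p_start <- data.frame(a = c(7, 5, 17, 1), b = c(5, 17, 1, 2))
p_stop  <- data.frame(b = c(5, 17, 1, 2), c = c(17, 1, 2, 1))

# Element-wise differences, row i of p_start against row i of p_stop.
diffs <- as.matrix(p_start) - as.matrix(p_stop)

# Euclidean norm of each row of differences.
dists <- sqrt(rowSums(diffs^2))
```

For these data, dists agrees with the sapply and apply results shown above.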
I am trying to complete a function that I may share with other users at some point. I would like it to have an argument that lets users choose between excluding all rows with missing values from every analysis and leaving the data as is, so that each component uses whatever data is available to it. I wonder if there is a standard way to do this or an R convention for it.
To show my point:
mydata <- data.frame(x = c(1, 2, 3, 4, 5, NA, 7),
y = c(2, NA, 4, 5, 6, 7, NA))
myfun <- function(data, na.omit = FALSE, ...) {
if (na.omit == TRUE) {
data <- na.omit(data)
}
# computing a lot of things
print(data)
}
myfun(data = mydata, na.omit = F)
myfun(data = mydata, na.omit = T)
Although it works fine now, I am still a little worried because na.omit is an existing R function. Should I rename this argument to something like na_omit or complete_set?
I have some data like so:
a <- c(1, 2, 9, 18, 6, 45)
b <- c(12, 3, 34, 89, 108, 44)
c <- c(0.5, 3.3, 2.4, 5, 13,2)
df <- data.frame(a, b,c)
I need to create a function that lags many variables at once for a very large time series analysis with dozens of variables, so I need to do this without typing them all out. In short, I would like to create variables a.lag1, b.lag1 and c.lag1 and add them to the original df specified above. I figure the best way to do so is by creating a custom function, something along the lines of:
lag.fn <- function(x) {
assign(paste(x, "lag1", sep = "."), lag(x, n = 1L)
return (assign(paste(x, "lag1", sep = ".")
}
The desired output is:
a.lag1 <- c(NA, 1, 2, 9, 18, 6, 45)
b.lag1 <- c(NA, 12, 3, 34, 89, 108, 44)
c.lag1 <- c(NA, 0.5, 3.3, 2.4, 5, 13, 2)
However, I don't get what I am looking for. Should I change the environment to the global environment? I would like to be able to use cbind to add the result to the original df. Thanks.
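This is easy with dplyr. Avoid naming a data frame df, since it can be confused with the function of the same name (stats::df, the F density). I'm using df1 instead.
library(dplyr)
df1 <- data.frame(a, b, c)  # the question's data under a different name
df1 <- df1 %>%
mutate(a.lag1 = lag(a),
b.lag1 = lag(b),
c.lag1 = lag(c))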
The data frame statement in the question is invalid since a, b and c are not the same length. What you can do is create a zoo series. Note that the lag specified in lag.zoo can be a vector of lags as in the second example below.
library(zoo)
z <- merge(a = zoo(a), b = zoo(b), c = zoo(c))
lag(z, -1) # lag all columns
lag(z, 0:-1) # each column and its lag
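We can use mutate_all, which adds a lagged copy of every column (named a_lag, b_lag and c_lag):
library(dplyr)
df %>%
mutate_all(list(lag = ~ lag(.)))  # list() of formulas replaces the deprecated funs()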
If everything else fails, you can use a simple base R function:
my_lag <- function(x, steps = 1) {
# seq_len() avoids the 1:0 pitfall when steps equals length(x)
c(rep(NA, steps), x[seq_len(length(x) - steps)])
}
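To lag every column without typing each one out, a base R helper like the one above can be applied over the columns with lapply and the result bound back onto the data frame. This is a minimal sketch under a few assumptions: my_lag is redefined here so the block is self-contained, the data frame is called df1, and the .lag1 suffix follows the naming in the question.

```r
df1 <- data.frame(
  a = c(1, 2, 9, 18, 6, 45),
  b = c(12, 3, 34, 89, 108, 44),
  c = c(0.5, 3.3, 2.4, 5, 13, 2)
)

# Shift x down by `steps` positions, padding the start with NA.
my_lag <- function(x, steps = 1) {
  c(rep(NA, steps), x[seq_len(length(x) - steps)])
}

# Lag each column, rename with a ".lag1" suffix, and bind to the original.
lagged <- as.data.frame(lapply(df1, my_lag))
names(lagged) <- paste(names(df1), "lag1", sep = ".")
df1 <- cbind(df1, lagged)
```

Because the lagged vectors keep the original length, cbind works directly and no assign() or change of environment is needed.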
I am working with R and using the expression sort(sample(1:60,6,replace=FALSE)) to generate 6 numbers between 1 and 60, without replacement...
I would like to create a loop using for statements that generates n different samples, using the logic above.
Any suggestions on how to build this loop?
Use replicate:
replicate(sort(sample(1:60, 6, replace = FALSE)), n = 1000)
The result is a matrix of size 6x1000, so each column is one sample.
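Since the question asks for an explicit for loop, the same result can be sketched that way as well. The variable names n and samples, the small n and the fixed seed are choices made only for this illustration.

```r
n <- 5          # number of samples to draw (kept small for illustration)
set.seed(1)     # only for reproducibility of the example

# Preallocate a 6 x n matrix; column i will hold the i-th sample.
samples <- matrix(NA_integer_, nrow = 6, ncol = n)
for (i in seq_len(n)) {
  samples[, i] <- sort(sample(1:60, 6, replace = FALSE))
}
```

As with replicate, each column is one sorted sample. Preallocating the matrix avoids growing an object inside the loop.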
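I assume you want independent random draws, in which case two samples may turn out equal. In case you want the samples to be unique, here is one attempt:
lottery <- function(n) {
S <- replicate(sort.int(sample(1:60, 6, replace = FALSE)), n = n)
# anyDuplicated() returns the index of the first duplicated column, or 0
while(d <- anyDuplicated(S, MARGIN = 2)) {
# drop the duplicate and draw a replacement sample
S <- cbind(S[, -d], sort.int(sample(1:60, 6, replace = FALSE)))
}
S
}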
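You can use the rerun function from purrr, which returns a list of samples; unique() then drops any duplicates, so the list may end up shorter than 1000. Note that rerun() is deprecated as of purrr 1.0.0.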
library(purrr)
rerun(.n = 1000, sort(sample(1:60, 6, replace = FALSE))) %>%
unique()