Sum function in R - r

I want to compute a simple sum, but not from 1 to the value that I pass to the sum function. Instead, I want it to work like summation in math: I have an expression containing a variable, I let that variable run over a range such as 1:4, and R is supposed to sum the values of the expression.
Like
y = function(x) x**2
sum(y(x),x=3:5) = 3^2+4^2+5^2
How do I do this in R?

You almost had it, just pass the 3:5 directly to y:
> y <- function(x) x**2
> sum(y(3:5))
[1] 50

You can create a custom function:
mysum <- function(f,vals) sum(f(vals))
mysum(y,3:5)
# [1] 50
While this is not standard in R, there are uses for passing function and arguments separately:
sapply(list(sqrt=sqrt,log=log,sin=sin),mysum,vals=1:3)
# sqrt log sin
# 4.146264 1.791759 1.891888

If your function doesn't accept a vector, then you'll need to use an apply function. In base R:
y <- function(x) x^2
sum(sapply(1:4, y))
or
sum(Vectorize(y)(1:4))
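To make the difference concrete, here is a minimal sketch with a function that genuinely does not accept a vector. The function y_scalar is a hypothetical example (it is not from the question); it uses a scalar `if`, so it has to be called once per element.

```r
# Hypothetical non-vectorized function: `if` only handles a single value
y_scalar <- function(x) if (x > 3) x^2 else 0

# sapply calls y_scalar once per element of 1:5
total_sapply <- sum(sapply(1:5, y_scalar))

# Vectorize wraps y_scalar so that it accepts a whole vector
total_vec <- sum(Vectorize(y_scalar)(1:5))

total_sapply  # 4^2 + 5^2 = 41
total_vec     # same result
```

Both forms loop over the elements internally, so they are a convenience rather than a speed-up.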

Assign the values to x beforehand and then sum the result of your function, like this:
y = function(x) x^2
x = 3:5
sum(y(x))

Related

Vectors with sigma notation (R)

I'm now learning R and have some difficulties while computing sigma notation. I know how to do the basic stuff like this:
summ <- 10:100
sum(summ^3 + 4 * summ^2)
But I don't know how to do the same when the expression involves variables other than i (for example x and y), or when there are two sigma notations in a row.
At first I thought it would work the same way as the simple sigma notation with only i's:
summ <- 1:10
sum((x^summ) / (y^summ))
But it shows an error that it is not a numeric argument.
Thank you in advance for your help.
For your second formula, you can define a function like the one below:
f <- function(x,y,n) sum((x/y)**(1:n))
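As a quick sanity check of this function, the sketch below calls it with hypothetical values (x = 2, y = 4, n = 10, chosen only for illustration) and compares the result with the closed form of a finite geometric series. Note that `**` is accepted by R as an alias for `^`.

```r
# Same function as above, written with ^ instead of **
f <- function(x, y, n) sum((x / y)^(1:n))

# Hypothetical inputs: ratio r = 2/4 = 0.5, ten terms
value <- f(2, 4, 10)

# Closed form of sum_{i=1}^{n} r^i = r * (1 - r^n) / (1 - r)
r <- 2 / 4
closed_form <- r * (1 - r^10) / (1 - r)

all.equal(value, closed_form)  # TRUE
```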
For your last formula, since i and j are independent, you can rewrite the double sum as a product of two sums (this takes a little algebra up front but simplifies the computation):
> sum((1:20)**2)*sum(1/(5+(1:10)**3))
[1] 886.0118
Otherwise, a straightforward translation from the formula could be using nested sapply
> sum(sapply(1:20,function(i) sapply(1:10, function(j) i**2/(5+j**3))))
[1] 886.0118
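A third option, sketched here as an alternative to the nested sapply, is outer(), which evaluates a vectorized function on every (i, j) pair and returns a matrix. The helper name term is an assumption introduced for illustration.

```r
i <- 1:20
j <- 1:10

# Hypothetical helper: the summand of the double sum (must be vectorized)
term <- function(i, j) i^2 / (5 + j^3)

# 20 x 10 matrix holding one entry per (i, j) pair
grid <- outer(i, j, term)
double_sum <- sum(grid)

# Product form from above, for comparison
product_form <- sum(i^2) * sum(1 / (5 + j^3))

all.equal(double_sum, product_form)  # TRUE
```

outer() only works when the function accepts vectors, which is the case here because the summand uses only arithmetic operators.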
This is basically the answer to the first question, where the variables x and y were undefined. Read them in first:
x <- readline(prompt = "Enter x: ")
y <- readline(prompt = "Enter y: ")
x <- as.integer(x)
y <- as.integer(y)
i = 1:10
answer <- sum((x^i) / (y^i))
answer

Rolling over function with 2 vector arguments

I want to apply a rolling window to a function that requires 2 vector arguments. Here is an example (that doesn't work) using data.table:
library(data.table)
df <- as.data.table(cbind.data.frame(x=1:100, y=101:200))
my_sum <- function(x, y) {
x <- log(x)
y <- x * y
return(x + y)
}
roll_df <- frollapply(df, 10, function(x, y) {
my_sum(x, y)})
It doesn't recognize the y column. Of course, the solution can use xts or some other package.
EDIT:
This is the real function I want to apply:
library(dpseg)
dpseg_roll <- function(time, price) {
p <- estimateP(x=time, y=price, plot=FALSE)
segs <- dpseg(time, price, jumps=jumps, P=p, type=type, store.matrix=TRUE)
slope_last <- segs$segments$slope[length(segs$segments$slope)]
return(slope_last)
}
With the runner package you can apply any function in a rolling window. The running window can also be created over the rows of a data.frame passed to the x argument. Let's focus on the simpler function my_sum. The f argument in runner accepts only one object (data in this case). I encourage you to put browser() in the function to debug row by row before you apply some fancy model to the subset (some algorithms require a minimum number of observations).
my_sum <- function(data) {
# browser()
x <- log(data$x)
y <- x * data$y
tail(x + y, 1) # return only one value
}
my_sum should return only one value, because runner computes a result for each row; if my_sum returned a vector, you would get a list.
Because runner is an independent function, you need to pass the data.table object to x. The best way to do this is to use x = .SD:
library(runner)
df[,
  new_col := runner(
    x = .SD,
    f = my_sum,
    k = 10
  )]
I have no idea what you are going to do with frollapply (mean or sum or something else?).
Assuming you are about to use rolling sum, here might be one example. I rewrote your function my_sum such that it applies to df directly.
my_sum <- function(...) {
v <- c(...)
x <- log(v[[1]])
y <- Reduce(`*`,v)
return(x + y)
}
roll_df <- frollapply(
my_sum(df),
10,
FUN = sum)
rollapply in zoo passes a zoo object to the function to be applied if coredata=FALSE is used. The zoo object is made up of a time and a value part so we can use the following if the x value represents ascending values (which I gather it does). Note that my_sum in the question returns a 10 element result if the two arguments are length 10 so out shown below is a 100 x 10 zoo object with the first 9 rows filled with NAs.
If you don't want the NAs omit fill=NA or if you want to apply the function to partial inputs at the beginning instead of fill=NA use partial=TRUE. If you only want one of the 10 elements, such as the last one, then use function(x) my_sum(time(x), coredata(x))[10] in place of the function shown or just use out[, 10].
fortify.zoo(out) can be used to turn a zoo object out to a data frame if you need the result in that form or use as.data.frame(out) if you want to drop the times. as.data.table(out) also works in a similar manner.
library(zoo)
z <- read.zoo(df) # df$x becomes the time part and df$y the value part
out <- rollapplyr(z, 10, function(u) my_sum(time(u), coredata(u)),
coredata = FALSE, fill = NA)
dim(out)
## [1] 100 10
Note that jumps and type are not defined in dpseg_roll.
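For the simpler my_sum from the question, a dependency-free rolling window can also be sketched in base R by looping over the window end positions and slicing both columns with the same index. This is only an illustration under assumptions: the window width k = 10, the choice to keep the last element of each window result, and the name roll_last are not from the question.

```r
df <- data.frame(x = 1:100, y = 101:200)

# Two-argument function from the question
my_sum <- function(x, y) {
  x <- log(x)
  y <- x * y
  x + y
}

k <- 10  # assumed window width

# For each row i, apply my_sum to rows (i - k + 1):i of both columns
roll_last <- sapply(seq_len(nrow(df)), function(i) {
  if (i < k) return(NA_real_)          # incomplete window at the start
  idx <- (i - k + 1):i
  tail(my_sum(df$x[idx], df$y[idx]), 1)  # keep one value per window
})

head(roll_last, 12)
```

This is slower than a dedicated rolling implementation, but it makes explicit that both vectors are windowed together.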

Assign single output for multiple-output function to new function

I have a function that gives me a single output which is however composed of two elements. Example for it would be:
example <- function(x){
sin <- sin(x)
cos <- cos(x)
output <- cbind(sin, cos)
return(output)
}
Now my idea is to plot separately sin and cos, each as functions of x. I would like to avoid writing a separate function in this context since the two objects are better to be calculated all at once.
If I try :
x_grid = seq(0,1,0,0.05)
plot(x_grid, sapply(x_grid, FUN = example[1]))
I get the following error message :
Error in example[1] : object of type 'closure' is not subsettable
How to proceed then? (notice that I use sapply because I need my function to deal with more than a single value of x in my real case).
If you're looking for a non-base graphics solution:
library(ggplot2)
example3 <- function(x){
data.frame(
x = x,
sin = sin(x),
cos = cos(x)
)
}
x_grid=seq(0,1,0.05)
ggplot(data = example3(x_grid),
aes(x=x)) +
geom_line(aes(y = sin), color = "blue") +
geom_line(aes(y = cos), color = "red")
Your function is vectorized so you can input a vector and extract each column by example(x_grid)[, "sin"] or example(x_grid)[, "cos"].
example(x_grid)
# sin cos
# [1,] 0.000000000 1.000000000
# [2,] 0.049979169 0.998750260
# [3,] 0.099833417 0.995004165
example(x_grid)[, "sin"]
# [1] 0.000000000 0.049979169 0.099833417 0.149438132 0.198669331
# [6] 0.247403959 0.295520207 0.342897807 0.389418342 0.434965534
Note: In this case, sapply is not recommended because the function itself has been vectorized. sapply will make it inefficient. Here is an illustration by benchmark:
library(microbenchmark)
bm <- microbenchmark(
basic = example(x_grid)[, 1],
sapply = sapply(x_grid, function(x) example(x)[1]),
times = 1000L
)
ggplot2::autoplot(bm)
If you want to plot both functions at once, matplot() can plot each column of a matrix.
x_grid <- seq(0, 10, 0.05)
matplot(x_grid, example(x_grid), type = "l")
There appears to be an extra argument in your call to seq:
x_grid <- seq(0, 1, 0.05)
A slight modification passes the variable to the function and then subsets the result:
plot(x_grid, sapply(x_grid, function(x) example(x)[1]))
Another approach is a function that returns a list, so that its result can be subset by name:
example2 <- function(x) {
within(list(), {
sin <- sin(x)
cos <- cos(x)
})
}
plot(x_grid, sapply(x_grid, function(x) example2(x)$sin))
Unless the example is simplified, the following works without sapply
plot(x_grid, example2(x_grid)$sin)
Plotting both results (naming x keeps x_grid on the horizontal axis)
lapply(example2(x_grid), plot, x = x_grid)
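Putting the pieces together for the original matrix-returning function, a minimal base-graphics sketch could look like the following. It evaluates the function once and then draws one plot per column; the variable name res and the axis labels are assumptions for illustration.

```r
# Function from the question: returns a two-column matrix
example <- function(x) {
  cbind(sin = sin(x), cos = cos(x))
}

x_grid <- seq(0, 1, 0.05)
res <- example(x_grid)  # evaluate once, reuse for both plots

# One plot per column, selected by column name
plot(x_grid, res[, "sin"], type = "l", ylab = "sin(x)")
plot(x_grid, res[, "cos"], type = "l", ylab = "cos(x)")
```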

"Sapply" function in R counterpart in MATLAB to convert a code from R to MATLAB

I want to convert the R code to MATLAB (not execute the R code from MATLAB).
The code in R is as follows:
data_set <- read.csv("lab01_data_set.csv")
# get x and y values
x <- data_set$x
y <- data_set$y
# get number of classes and number of samples
K <- max(y)
N <- length(y)
# calculate sample means
sample_means <- sapply(X = 1:K, FUN = function(c) {mean(x[y == c])})
# calculate sample deviations
sample_deviations <- sapply(X = 1:K, FUN = function(c) {sqrt(mean((x[y == c] - sample_means[c])^2))})
To implement it in MATLAB I write the following:
%% Reading Data
% read data into memory
X=readmatrix("lab01_data_set(ViaMatlab).csv");
% get x and y values
x_read=X(1,:);
y_read=X(2,:);
% get number of classes and number of samples
K = max(y_read);
N = length(y_read);
% Calculate sample mean - 1st method
% funct1 = @(c) mean(c);
% G1=findgroups(y_read);
% sample_mean=splitapply(funct1,x_read,G1)
% Calculate sample mean - 2nd method
for m=1:3
sample_mean(1,m)=mean(x_read(y_read == m));
end
sample_mean;
% Calculate sample deviation - 2nd method
for m=1:3
sample_mean=mean(x_read(y_read == m));
sample_deviation(1,m)=sqrt(mean((x_read(y_read == m)-sample_mean).^2));
sample_mean1(1,m)=sample_mean;
end
sample_deviation;
sample_mean1;
As you can see, I know how to use a for loop in MATLAB instead of sapply in R (the 2nd method in the code), but I do not know how to do it with a function (possibly splitapply or something else).
PS: I do not know how to upload the data, so sorry for that part.
The MATLAB equivalent to R sapply is arrayfun - and its relatives cellfun, structfun and varfun depending on what data type your input is.
For example, in R:
> sapply(1:3, function(x) x^2)
[1] 1 4 9
is equivalent to MATLAB:
>> arrayfun(@(x) x^2, 1:3)
ans =
1 4 9
Note that if the result of the function you pass to arrayfun, cellfun etc. doesn't have identical type or size for every input, you'll need to specify 'UniformOutput', false (the logical value, not the string 'false'); the results are then returned in a cell array.
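On the R side, the grouped computation from the question can also be sketched with tapply, which applies a function to each group in a single call and is conceptually close to the findgroups/splitapply pair in the commented-out MATLAB attempt. The data below are hypothetical stand-ins for lab01_data_set.csv, and sd_pop is an illustrative helper name.

```r
# Hypothetical data: x values with class labels y in 1..K
x <- c(1, 2, 3, 10, 12, 20)
y <- c(1, 1, 1, 2, 2, 3)
K <- max(y)

# Per-class means, as in the question
means_sapply <- sapply(1:K, function(c) mean(x[y == c]))

# Same per-class means in one grouped call
means_tapply <- tapply(x, y, mean)

# Hypothetical helper: deviation with denominator N, matching the question
sd_pop <- function(v) sqrt(mean((v - mean(v))^2))
devs_tapply <- tapply(x, y, sd_pop)

means_tapply
devs_tapply
```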

Creating correlation matrix p values [duplicate]

I can get correlation matrix using following commands:
> df<-data.frame(x=c(5,6,5,9,4,2,1,3,5,7),y=c(3.1,2.5,3.8,5.4,6.5,2.5,1.5,8.1,7.1,6.1),z=c(5,6,4,9,2,4,1,6,2,4))
> cor(df)
x y z
x 1.0000000 0.2923939 0.6566866
y 0.2923939 1.0000000 0.1167084
z 0.6566866 0.1167084 1.0000000
I can get individual p-values using command:
> cor.test(df$x, df$y)$p.value
[1] 0.4123234
How can I get a matrix of p-values for all these correlation coefficients? Thanks for your help.
You can also use the package Hmisc.
An example of how it works:
library(Hmisc)
mycor <- rcorr(as.matrix(df), type="pearson")
mycor$r shows the correlation matrix, mycor$P the matrix with the corresponding p-values.
This example calculates the p value for each of the column combinations. It is not an optimal solution (x-y and y-x p values are both calculated for example), but should provide some inspiration for you. The main trick is to use expand.grid to generate the combinations of columns, and use mapply to call cor.test on each combination:
col_combinations = expand.grid(names(df), names(df))
cor_test_wrapper = function(col_name1, col_name2, data_frame) {
cor.test(data_frame[[col_name1]], data_frame[[col_name2]])$p.value
}
p_vals = mapply(cor_test_wrapper,
col_name1 = col_combinations[[1]],
col_name2 = col_combinations[[2]],
MoreArgs = list(data_frame = df))
matrix(p_vals, 3, 3, dimnames = list(names(df), names(df)))
x y z
x 0.00000000 0.4123234 0.03914453
y 0.41232343 0.0000000 0.74814951
z 0.03914453 0.7481495 0.00000000
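To avoid testing each pair twice, one possible variation (a sketch, not part of the answer above) is to enumerate only the unordered pairs with combn and fill both halves of a symmetric matrix. The names p_mat and col_pairs are assumptions for illustration, and the diagonal is left as NA rather than 0.

```r
df <- data.frame(x = c(5, 6, 5, 9, 4, 2, 1, 3, 5, 7),
                 y = c(3.1, 2.5, 3.8, 5.4, 6.5, 2.5, 1.5, 8.1, 7.1, 6.1),
                 z = c(5, 6, 4, 9, 2, 4, 1, 6, 2, 4))

vars <- names(df)
p_mat <- matrix(NA_real_, length(vars), length(vars),
                dimnames = list(vars, vars))

# Each column of col_pairs is one unordered pair of column names
col_pairs <- combn(vars, 2)

for (k in seq_len(ncol(col_pairs))) {
  a <- col_pairs[1, k]
  b <- col_pairs[2, k]
  p <- cor.test(df[[a]], df[[b]])$p.value
  p_mat[a, b] <- p  # fill both triangles so the matrix is symmetric
  p_mat[b, a] <- p
}

p_mat
```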
One way is to use corr.test (notice the double r) from the package psych.
Or, if you're a fan of mapply and sapply, you could write your own function for this. Something like:
rrapply <- function(A, FUN, ...) mapply(function(a, B) lapply(B,
function(x) FUN(a, x, ...)), a = A, MoreArgs = list(B = A))
cor.tests <- rrapply(df, cor.test) # a matrix of cor.tests
apply(cor.tests, 1:2, function(x) x[[1]]$p.value) # and it's there
Now you can use the same logic to build a matrix of t-tests or, say, confidence intervals of correlations.

Resources