Vectors with sigma notation (R)

I'm learning R and am having difficulty computing sums written in sigma notation. I know how to handle the basic case:
summ <- 10:100
sum(summ^3 + 4 * summ^2)
But I don't know how to do the same when the expression involves variables other than the index i (for example x and y), or when there are two sigma signs in a row.
At first I thought I could do the same as in the simple case with only i:
summ <- 1:10
sum((x^summ) / (y^summ))
But I get an error saying that the argument is not numeric.
Thank you in advance for your help.

For your second formula, you can define a function like the one below:
f <- function(x,y,n) sum((x/y)**(1:n))
For your last formula, you can rewrite the expression as a product of two terms, since i and j are independent (this needs a math transformation as the first step, but it simplifies the computation):
> sum((1:20)**2)*sum(1/(5+(1:10)**3))
[1] 886.0118
Otherwise, a straightforward translation of the formula uses nested sapply:
> sum(sapply(1:20,function(i) sapply(1:10, function(j) i**2/(5+j**3))))
[1] 886.0118
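As a cross-check of the two approaches above, here is a minimal sketch that builds every (i, j) term with outer() and compares the double sum with the factored product. The variable names are illustrative only, and the summand is taken from the code in this answer.

```r
i <- 1:20
j <- 1:10

# outer() builds the 20 x 10 matrix whose (i, j) entry is i^2 / (5 + j^3)
terms_ij <- outer(i^2, 5 + j^3, "/")
double_sum <- sum(terms_ij)

# Because the summand factors as (i^2) * (1 / (5 + j^3)),
# the double sum equals the product of two single sums
product_form <- sum(i^2) * sum(1 / (5 + j^3))

all.equal(double_sum, product_form)
```

The factorisation only works because the summand separates into an i-only part and a j-only part; for a summand that mixes i and j, the outer() or nested sapply form is the one to use.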

This is essentially the answer to the first question, with the otherwise undefined variables x and y read from the user:
x <- readline(prompt = "Enter x: ")
y <- readline(prompt = "Enter y: ")
x <- as.integer(x)
y <- as.integer(y)
i = 1:10
answer <- sum((x^i) / (y^i))
answer
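The same sum can be wrapped in a function so it can be reused without prompting. The sketch below is an assumption-laden illustration: the helper names ratio_sum and ratio_sum_closed are made up here, and the closed form is the standard geometric series, which only applies when x / y is not 1. Note that readline() returns a character string, which is one likely source of a "non-numeric argument" error if the conversion step is skipped.

```r
# Hypothetical helper: sum of (x / y)^i for i = 1..n
ratio_sum <- function(x, y, n) {
  stopifnot(is.numeric(x), is.numeric(y), y != 0, n >= 1)
  sum((x / y)^seq_len(n))
}

# Closed form of the geometric series, valid when x / y != 1
ratio_sum_closed <- function(x, y, n) {
  r <- x / y
  r * (1 - r^n) / (1 - r)
}

ratio_sum(2, 3, 10)
ratio_sum_closed(2, 3, 10)
```

Both calls should agree up to floating-point error, which gives a quick sanity check on the vectorised version.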

Related

"Sapply" function in R counterpart in MATLAB to convert a code from R to MATLAB

I want to convert some R code to MATLAB (not execute the R code from MATLAB).
The code in R is as follows:
data_set <- read.csv("lab01_data_set.csv")
# get x and y values
x <- data_set$x
y <- data_set$y
# get number of classes and number of samples
K <- max(y)
N <- length(y)
# calculate sample means
sample_means <- sapply(X = 1:K, FUN = function(c) {mean(x[y == c])})
# calculate sample deviations
sample_deviations <- sapply(X = 1:K, FUN = function(c) {sqrt(mean((x[y == c] - sample_means[c])^2))})
To implement it in MATLAB I wrote the following:
%% Reading Data
% read data into memory
X=readmatrix("lab01_data_set(ViaMatlab).csv");
% get x and y values
x_read=X(1,:);
y_read=X(2,:);
% get number of classes and number of samples
K = max(y_read);
N = length(y_read);
% Calculate sample mean - 1st method
% funct1 = @(c) mean(c);
% G1=findgroups(y_read);
% sample_mean=splitapply(funct1,x_read,G1)
% Calculate sample mean - 2nd method
for m=1:K
sample_mean(1,m)=mean(x_read(y_read == m));
end
sample_mean;
% Calculate sample deviation - 2nd method
for m=1:K
sample_mean=mean(x_read(y_read == m));
sample_deviation(1,m)=sqrt(mean((x_read(y_read == m)-sample_mean).^2));
sample_mean1(1,m)=sample_mean;
end
sample_deviation;
sample_mean1;
As you can see, I understand how to use a for loop in MATLAB instead of sapply in R (the 2nd method in the code), but I do not know how to do it with a function (possibly splitapply or something else).
PS: I do not know how to upload the data, so sorry about that part.

The MATLAB equivalent to R sapply is arrayfun - and its relatives cellfun, structfun and varfun depending on what data type your input is.
For example, in R:
> sapply(1:3, function(x) x^2)
[1] 1 4 9
is equivalent to MATLAB:
>> arrayfun(@(x) x^2, 1:3)
ans =
1 4 9
Note that if the result of the function you pass to arrayfun, cellfun etc. doesn't have identical type or size for every input, you'll need to specify 'UniformOutput', false.
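R's sapply has a comparable distinction between uniform and non-uniform results, which may help when reading the R code being ported. A small sketch (the variable names are illustrative):

```r
# Same-length results: sapply() simplifies to a plain vector
squares <- sapply(1:3, function(x) x^2)

# Different-length results: sapply() falls back to a list,
# much like arrayfun(..., 'UniformOutput', false) returns a cell array
ragged <- sapply(1:3, function(x) seq_len(x))

# vapply() makes the expected shape explicit and errors on a mismatch
squares_checked <- vapply(1:3, function(x) x^2, numeric(1))
```

In the code from the question every call returns a single number, so the uniform case applies on both sides.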

Minimal number of coverage of big data lists

Following up on my previous question, I use the following code:
dist<-c('att1','att2','att3','att4','att5','att6')
p1<-c('att1','att5','att2')
p2<-c('att5','att1','att4')
p3<-c('att3','att4','att2')
p4<-c('att1','att2','att3')
p5<-c('att6')
....
p32<-c('att35','att34','att32')
In the real case there can be 1024 vectors.
I would like to find the p vectors whose union covers as many elements of dist as possible, using as few vectors as possible. In this case the solution would be p1, p3, p5. If there is no way to cover every element of dist, I want the maximal cover that uses the minimal number of p vectors.
N = 32
library(qdapTools)
library(dplyr)
library(data.table)
## generate matrix of attributes
attribute_matrix <- mtabulate(list(p1, p2, p3, p4, p5,...,p32))
library (bigmemory)
## generate matrix of attributes
grid_matrix <- do.call(CJ, rep(list(1:0), N)) %>% as.big.matrix
Error: cannot allocate vector of size 8.0 Gb
I tried an alternative way for it:
grid_matrix <- do.call(CJ, rep(list(1:0), N)) %>% as.data.frame
grid_matrix <- as.matrix (grid_matrix)
And still got the same error.
How can I fix it and use it for big data? I wanted to continue with:
colnames(grid_matrix) <- paste0("p", 1:N)
combin_all_element_present <- rowSums(grid_matrix %*% attribute_matrix > 0) %>% `==`(., ncol(attribute_matrix))
grid_matrix_sub <- grid_matrix[combin_all_element_present, ]
grid_matrix_sub[rowSums(grid_matrix_sub) == min(rowSums(grid_matrix_sub)), ]

This is known as a set covering problem. It can be solved using integer linear programming. Let x1, x2, ... be 0/1 variables (one for each p variable) and represent p1, p2, ... as 0/1 vectors P1, P2, ... and dist as a 0/1 vector D. Then the problem can be stated as:
min x1 + x2 + ... + x32
such that
P1 * x1 + P2 * x2 + ... + P32 * x32 >= D
which in R code is the following. First create a list p with the p vectors in sorted order. Use mixedsort so that p32 comes at the end instead of right after p3. Define attnames as the set of all att names in all the p vectors.
Then formulate the objective function (which equals the number of p's in the cover), the constraint matrix (consisting of the P vectors as columns) and the right hand side of the constraint equations (which is dist as a 0/1 vector). Finally run the integer linear program and convert the solution from a 0/1 vector to a vector of p names.
library(gtools)
library(lpSolve)
p <- mget(mixedsort(ls(pattern = "^p\\d+$")))
attnames <- mixedsort(unique(unlist(p)))
objective <- rep(1L, length(p))
const.mat <- sapply(p, function(x) attnames %in% x) + 0L
const.rhs <- (attnames %in% dist) + 0L
ans <- lp("min", objective, const.mat, ">=", const.rhs, all.bin = TRUE)
names(p)[ans$solution == 1L]
## [1] "p2" "p4" "p5"
The constraint matrix has a row for each attnames entry and a column for each p vector.
The solution produces the minimal covers of those attnames elements that are in dist. If every element of dist appears in at least one p vector then the solution will represent a cover of dist. If not, the solution will represent a cover of those att names in one or more p vectors that are also in dist; thus, this handles both cases discussed in the question. The uncovered elements of dist are:
setdiff(dist, attnames)
so if that is of zero length then the solution represents a complete cover of dist. If not the solution represents a cover of
intersect(dist, attnames)
The sorting done in the code is not strictly needed, but it may be easier to work with the various inputs to the optimization when the rows and columns of the constraint matrix are in a logical order.
Note: Run this code from the question before running the above code:
dist<-c('att1','att2','att3','att4','att5','att6')
p1<-c('att1','att5','att2')
p2<-c('att5','att1','att4')
p3<-c('att3','att4','att2')
p4<-c('att1','att2','att3')
p5<-c('att6')
p32<-c('att35','att34','att32')
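For a small input like the five-vector example, the result of the integer program can be sanity-checked by brute force in base R. This is only a sketch for verification: it enumerates subsets with combn(), so it is exponential in the number of p vectors and is not a substitute for the lp() approach on 32 or 1024 vectors. The list p and the names target and min_covers are introduced here for illustration.

```r
dist <- c("att1", "att2", "att3", "att4", "att5", "att6")
p <- list(
  p1 = c("att1", "att5", "att2"),
  p2 = c("att5", "att1", "att4"),
  p3 = c("att3", "att4", "att2"),
  p4 = c("att1", "att2", "att3"),
  p5 = c("att6")
)

# Only elements that appear in some p vector can be covered at all
target <- intersect(dist, unlist(p))

# Try subset sizes 1, 2, ... and stop at the first size that yields a cover
min_covers <- list()
for (k in seq_along(p)) {
  combos <- combn(names(p), k, simplify = FALSE)
  covers <- Filter(function(nm) all(target %in% unlist(p[nm])), combos)
  if (length(covers) > 0) {
    min_covers <- covers
    break
  }
}

min_covers
```

On this data the search stops at size 3 and finds several covers of that size, including both p1, p3, p5 from the question and p2, p4, p5 from the lp() output, so the two answers are equally minimal.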

The answer already provided is perfect but another approach could be the following:
dist<-c('att1','att2','att3','att4','att5','att6')
p1<-c('att1','att5','att2')
p2<-c('att5','att1','att4')
p3<-c('att3','att4','att2')
p4<-c('att1','att2','att3')
p5<-c('att6')
library(qdapTools)
library(data.table)
attribute_matrix <- mtabulate(list(p1, p2, p3, p4, p5))
minimal_sets <- function(superset, subsets_matrix, p){
  setDT(subsets_matrix)
  # removing the columns that are not in the superset
  updated_sub_matr <- subsets_matrix[, which(names(subsets_matrix) %in% superset), with = FALSE]
  # initializing counter for iterations and the subset selected
  subset_selected <- integer(0)
  counter <- p
  ## Loop until either we run out of iterations (counter = 0) or we find the solution
  while (counter > 0 && length(superset) > 0){
    ## find the row with the most matches with the superset we want to achieve
    max_index <- which.max(rowSums(updated_sub_matr))
    ## remove from the superset the entries that match that row, and from the subsets matrix those columns, as they don't contribute anymore
    superset <- superset[which(updated_sub_matr[max_index, ] == 0)]
    updated_sub_matr <- updated_sub_matr[, - which(updated_sub_matr[max_index, ] != 0), with = FALSE]
    counter <- counter - 1
    subset_selected <- c(subset_selected, max_index)
  }
  if (length(superset) > 0){
    print(paste0("No solution found, there are(is) ", length(superset), " element(s) left ", paste(superset, collapse = "-")))
  } else {
    print(paste0("Found a solution after ", p - counter, " iterations"))
  }
  print(paste0("Selected the following subsets: ", paste(subset_selected, collapse = "-")))
}
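You pass this function your superset (in this case dist), the attribute_matrix, and the maximum number p of subsets you want to allow; it prints the best solution it found and the number of iterations used. At each step it greedily picks the row that covers the most remaining elements, so in general the result is not guaranteed to be a minimal cover.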
> minimal_sets(dist, attribute_matrix, 1)
[1] "No solution found, there are(is) 3 element(s) left att3-att4-att6"
[1] "Selected the following subsets: 1"
> minimal_sets(dist, attribute_matrix, 3)
[1] "Found a solution after 3 iterations"
[1] "Selected the following subsets: 1-3-5"
> minimal_sets(dist, attribute_matrix, 5)
[1] "Found a solution after 3 iterations"
[1] "Selected the following subsets: 1-3-5"

Sum function in R

I want to compute a simple sum, but not from 1 up to the value I pass to the sum function. Instead I want it to work as it would in math: I have an expression in some variable, I let that variable run over a range such as 1:4, and R is supposed to sum the resulting values of the expression.
Like
y = function(x) x**2
sum(y(x),x=3:5) = 3^2+4^2+5^2
How do I do this in R?

You almost had it, just pass the 3:5 directly to y:
> y <- function(x) x**2
> sum(y(3:5))
[1] 50

You can create a custom function:
mysum <- function(f,vals) sum(f(vals))
mysum(y,3:5)
# [1] 50
While this is not standard in R, there are uses for passing function and arguments separately:
sapply(list(sqrt=sqrt,log=log,sin=sin),mysum,vals=1:3)
# sqrt log sin
# 4.146264 1.791759 1.891888

If your function doesn't accept a vector, then you'll need to use an apply function. In base R:
y <- function(x) x^2
sum(sapply(1:4, y))
or
sum(Vectorize(y)(1:4))
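The function y above is already vectorised, so the difference only shows up with a function that really does need one value at a time. A minimal sketch, using a made-up function y_scalar that relies on if():

```r
# A function that is NOT vectorized: if() only accepts a single TRUE/FALSE
y_scalar <- function(x) {
  if (x %% 2 == 0) x^2 else 0
}

# Apply it element by element, then sum
sum(sapply(3:5, y_scalar))

# Or wrap it once with Vectorize() and call it like a vectorized function
y_vec <- Vectorize(y_scalar)
sum(y_vec(3:5))
```

Only 4 is even in 3:5, so both sums come out as 16.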

Assign the values to x beforehand and then sum the result of your function, like this:
y = function(x) x^2
x = 3:5
sum(y(x))

Select an existing variable by using readline

I'd like to allow the user of my script to pick an existing object (a vector).
I thought something like this
...
message("Select a vector of y values")
nwd <- readLines(n = 1)
return(mean(nwd))
...
but the result is NA because nwd is treated as a character string.
How can I solve this?
Thanks.

A bit safer than eval(parse(...)):
x <- 1:10
message("Select a vector of y values")
nwd <- readLines(n = 1)
#input x
mean(get(nwd))
#[1] 5.5
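Since the name comes from user input, it may be worth checking it before calling get(). The sketch below is one possible way to do that; the helper name mean_of_named and the environment demo_env are hypothetical, and in the real script the name argument would come from readLines(n = 1).

```r
# Hypothetical helper: look up a numeric vector by name and return its mean
mean_of_named <- function(name, envir = parent.frame()) {
  if (!exists(name, envir = envir, inherits = FALSE)) {
    stop("No object called '", name, "' was found")
  }
  values <- get(name, envir = envir, inherits = FALSE)
  if (!is.numeric(values)) {
    stop("'", name, "' is not a numeric vector")
  }
  mean(values)
}

# Demonstration with an explicit environment instead of interactive input
demo_env <- new.env()
assign("x", 1:10, envir = demo_env)
mean_of_named("x", envir = demo_env)
```

Restricting the lookup with inherits = FALSE keeps the user from picking up unrelated objects such as base functions by typing their names.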

Symbolic derivatives on formulas

In R, I would like a way to take symbolic derivatives of the right hand side of formulas which may include interaction terms, squared terms, etc.
For example, I would like to be able to take the derivative of the right hand side of each of the following two [edit:three] formulas with respect to x:
y~x+I(x^2)
y~x:z
EDIT: y~x*z
I would like a function which, when each of the above three formulas are input, returns 1+2x, z, and 1+z, respectively.
I've tried the following:
f1<-y~x+I(x^2)
deriv(f1,"x")
## Error in deriv.formula(f1, "x") : Function 'I' is not in the derivatives table
f2<-y~x:z
deriv(f2,"x")
## Error in deriv.formula(f2, "x") : Function '`:`' is not in the derivatives table
Is there any way to force R to recognize I(x^2) (or, similarly, I(x*z), etc.) as x^2 (respectively, x*z), x:z as x*z (in the mathematical sense), and x*z (in the formula sense) as x+z+x*z (in the mathematical sense) for purposes of calculating the derivative?
Second, is there a way to take the output from deriv() and reshape it to look like the right hand side of a formula? In particular, I know that D() will alleviate this issue and generate output in the form I desire (though D() can't handle a formula as input), but what if I want to take derivatives with respect to multiple variables? I can work around this by applying D() over and over for each variable I'd like to take the derivative with respect to, but it would be nice to simply input a character string of all such variables and receive output suitable to be placed on the right hand side of a formula.
Thank you!

If you have a formula expression you can work with it using substitute():
substitute( x~x:z+x:y , list(`:`=as.name("*") ) )
x ~ x * z + x * y
And this lets you pass an expression object to substitute after it has been evaluated (which would otherwise not happen, since substitute does not evaluate its first argument):
form1 <- expression(x ~ x : z + x : y)
form2 <- do.call('substitute' , list(form1 , list(`:`=as.name("*") ) ))
form2
# expression(x ~ x * z + x * y)
This shows how to "reshape" the RHS so that y ~ x:z is handled like ~ x*z. A formula is stored as a list-like call in which the tilde operator is treated as a function, the LHS is the second element and the RHS is the third, that is (~ , <LHS>, <RHS>), so [[3]] extracts the RHS:
f2<-y~x:z
substar <- function(form) {
  do.call('substitute' , list(form , list(`:`=as.name("*") ) ))
}
f3 <- substar(f2)
deriv(f3[[3]],"x")
#----------------------
expression({
.value <- x * z
.grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
.grad[, "x"] <- z
attr(.value, "gradient") <- .grad
.value
})
If you want to work with expressions it may help to understand that they are organized like lists and that the operators are really Lisp-like functions:
> Z <- y~x+I(x^2)
> Z
y ~ x + I(x^2)
> Z[[1]]
`~`
> Z[[2]]
y
> Z[[3]]
x + I(x^2)
> Z[[3]][[1]]
`+`
> Z[[3]][[2]]
x
> Z[[3]][[3]]
I(x^2)
> Z[[3]][[3]][[1]]
I
> Z[[3]][[3]][[2]]
x^2
> Z[[3]][[3]][[2]][[1]]
`^`
If you want to see a function that will traverse an expression tree, the inimitable Gabor Grothendieck constructed one a few years ago on R-help: http://markmail.org/message/25lapzv54jc4wfwd?q=list:org%2Er-project%2Er-help+eval+substitute+expression
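Putting the pieces of this answer together, here is a rough sketch of a tree walker that strips I() and turns `:` into `*`, combined with terms() to expand the formula-sense x*z into x + z + x:z before differentiating. The function names to_math and d_rhs are made up for this illustration, and the sketch ignores the intercept and anything else that does not appear in the term labels.

```r
# Rewrite a call tree: I(expr) becomes expr, and `:` becomes `*`
to_math <- function(e) {
  if (!is.call(e)) return(e)
  if (identical(e[[1]], quote(I))) return(to_math(e[[2]]))
  if (identical(e[[1]], quote(`:`))) e[[1]] <- quote(`*`)
  as.call(lapply(as.list(e), to_math))
}

# Differentiate the right-hand side of a formula with respect to `var`
d_rhs <- function(f, var) {
  # terms() expands formula operators, e.g. "x", "z", "x:z" for y ~ x * z
  labels <- attr(terms(f), "term.labels")
  pieces <- lapply(labels, function(l) to_math(parse(text = l)[[1]]))
  rhs <- Reduce(function(a, b) call("+", a, b), pieces)
  D(rhs, var)
}

d_rhs(y ~ x + I(x^2), "x")
d_rhs(y ~ x:z, "x")
d_rhs(y ~ x * z, "x")
```

The results are unevaluated calls, so they can be inspected, evaluated with eval() at chosen values of x and z, or placed on the right hand side of a new formula.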

The help file of deriv (?deriv) says that the expr argument of the deriv function is "A expression or call or (except D) a formula with no lhs". So you can't include the left hand side of the equation in what you pass to it.
On the second part of the question, if I understood it correctly, you can do something like this: say your rhs is x^2+y^2 and you need the partial derivatives of this expression with respect to x and y:
myexp <- expression((x^2) + (y^2))
D.sc.x <- D(myexp, "x")
> D.sc.x
2 * x
D.sc.y <- D(myexp, "y")
> D.sc.y
2 * y
In one line:
lapply(as.list(c("x","y")),function(a)D(myexp,a))
[[1]]
2 * x
[[2]]
2 * y
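If the goal is to evaluate all partial derivatives at once rather than to print them, deriv() itself accepts a vector of variable names. The sketch below uses its function.arg argument to get back an ordinary R function; grad_fun and res are illustrative names.

```r
# deriv() accepts several variable names at once; function.arg = TRUE
# returns an R function instead of an unevaluated expression
grad_fun <- deriv(~ x^2 + y^2, c("x", "y"), function.arg = TRUE)

res <- grad_fun(1, 2)
res                       # the value of x^2 + y^2 at (1, 2)
attr(res, "gradient")     # one column per variable: 2 * x and 2 * y
```

This complements the lapply over D(): D() gives symbolic results that look like a formula's right hand side, while deriv() gives numeric gradients for all variables in one call.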
