Solving a conditional symbolic equation in R

Here are my vectors. Note that the vector m contains an unknown value x.
r=c(1, 3, 5)
p=c(5, 3, 1)
s=c(1, 3, 5)
m=c(x, x, x)
How can I solve the following conditional vector equation in R?
I'd like to find the value of x which makes the sum of elements in the vector of p*(s-m)*(ifelse((s-m)<0, r, 6-r)) zero.
sum(p*(s-m)*(ifelse((s-m)<0, r, 6-r)))=0
I was told that rSymPy might handle this but I don't think this works for the vector equation. Any thoughts or suggestions?
In fact, I was able to get the answer in Excel using "goal seek" but would like to get R commands for solving this.

Excel does numeric calculations, not symbolic, so we assume that a numeric solution is really what you want. In that case it can be solved numerically like this:
f <- function(x) sum(p*(s-x)*(ifelse((s-x)<0, r, 6-r)))^2
optimize(f, range(s))
giving:
$minimum
[1] 2.4667
$objective
[1] 3.1554e-30
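As an alternative sketch, the same root can be found without squaring by using base R's uniroot, assuming the sum changes sign over range(s) (it does for these vectors). The helper name g and the tolerance are choices made for this illustration, not part of the original answer.

```r
r <- c(1, 3, 5)
p <- c(5, 3, 1)
s <- c(1, 3, 5)

# Left-hand side of the equation as a function of the unknown x
# (g is a name made up for this sketch)
g <- function(x) sum(p * (s - x) * ifelse((s - x) < 0, r, 6 - r))

# uniroot needs g to have opposite signs at the interval ends
sol <- uniroot(g, interval = range(s), tol = 1e-9)
sol$root
```

Between 1 and 5 the sum reduces to 37 - 15x, so the root should agree with the optimize result of about 2.4667.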

Related

Implementing equations in R

I am new to R (also not too good at math) and I am trying to calculate this equation in R with some difficulties:
X is some integer data I have, with 550 samples.
Any help is appreciated since I am unsure how to do this. I think I have to use a for loop and the sum() function, but other than that I don't know.
R supports vectorisation, which means you very rarely need to implement for loops.
For example, you can solve your equation like so:
## I'm just making up a long numerical vector for x - obviously you can use anything
x <- 1:1000
solution <- sum(20/x)^0.5
Unless the brackets denote the integral, rather than the sum? In which case:
solution <- sum( (20/x)^0.5 )
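The two forms give different numbers, which a tiny made-up vector makes visible. This is only an illustration of where the exponent is applied; the values of x are invented.

```r
x <- c(1, 4)  # made-up values, chosen so the arithmetic is easy to follow

# Exponent applied after summing: sqrt(20/1 + 20/4) = sqrt(25)
root_of_sum <- sum(20 / x)^0.5

# Exponent applied to each term before summing: sqrt(20) + sqrt(5)
sum_of_roots <- sum((20 / x)^0.5)

root_of_sum
sum_of_roots
```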

R: Efficiently Calculate Deviations from the Mean Using Row Operations on a DF (Without Using a For Loop)

I am generating a very large data frame consisting of a large number of combinations of values. As such, my coding has to be as efficient as possible or else 1) I get errors like - R cannot allocate vector of size XX or 2) the calculations take forever.
I am at the point where I need to calculate r (in the example below r = 3) deviations from the mean for each sample (1 sample per row of the df) (labeled dev1 - dev3 in pic below):
These are my data in R:
I tried this (r is the number of values in each sample, here set to 3):
X2<-apply(X1[,1:r],1,function(x) x-X1$x.bar)
When I try this, I get:
I am guessing that this code is attempting to calculate the difference between each row of X1 (x) and the entire vector of X1$x.bar instead of 81 for the 1st row, 81.25 for the 2nd row, etc.
Once again, I can easily do this using for loops, but I'm assuming that is not the most efficient way.
Can someone please steer me in the right direction? Any assistance is appreciated.
Here is the whole code for the small sample version with r<-3. WARNING: This computes all possible combinations, so the df's get very large very quickly.
options(scipen = 999)
# Count the decimal places of each element of x (0 if there are none)
dp <- function(x) {
  dp1 <- nchar(sapply(strsplit(sub('0+$', '', as.character(format(x, scientific = FALSE))), ".",
                               fixed = TRUE), function(x) x[2]))
  ifelse(is.na(dp1), 0, dp1)
}
retain1<-function(x,minuni) length(unique(floor(x)))>=minuni
# =======================================================
r<-3
x0<-seq(80,120,.25)
X0<-data.frame(t(combn(x0,r)))
names(X0)<-paste("x",1:r,sep="")
X<-X0[apply(X0,1,retain1,minuni=r),]
rm(X0)
gc()
X$x.bar<-rowMeans(X)
dp1<-dp(X$x.bar)
X1<-X[dp1<=2,]
rm(X)
gc()
X2<-apply(X1[,1:r],1,function(x) x-X1$x.bar)
Because R is vectorized, you only need to subtract x.bar from x1, x2, x3 collectively:
devs <- X1[ , 1:3] - X1[ , 4]
X1devs <- cbind(X1, devs)
That's it...
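A small self-contained sketch of that subtraction, using a made-up two-row data frame in place of the real X1. Renaming the result to dev1 - dev3 is an addition for this illustration, so that cbind does not produce duplicate column names.

```r
# Made-up stand-in for X1: two samples of r = 3 values each
X1 <- data.frame(x1 = c(80, 80.25), x2 = c(81, 81.5), x3 = c(82, 82))
X1$x.bar <- rowMeans(X1[, 1:3])

# x.bar has one value per row, so it lines up with each column in turn
devs <- X1[, 1:3] - X1$x.bar
names(devs) <- paste0("dev", 1:3)  # naming chosen for this sketch

X1devs <- cbind(X1, devs)
X1devs
```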
I think you just got the margin wrong. In apply you're using 1 (row-wise), but you want to work column-wise, so use 2:
X2<-apply(X1[,1:r], 2, function(x) x-X1$x.bar)
But from what I found in a quick search, the apply family isn't faster than loops, only clearer. Check this post: Is R's apply family more than syntactic sugar?
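As a quick check, under the assumption of a made-up two-row stand-in for X1, the column-wise apply call and the plain vectorized subtraction should produce the same deviations:

```r
# Made-up stand-in for X1 (two samples, r = 3)
X1 <- data.frame(x1 = c(80, 80.25), x2 = c(81, 81.5), x3 = c(82, 82))
X1$x.bar <- rowMeans(X1[, 1:3])
r <- 3

# Column-wise apply: each column has x.bar subtracted element by element
by_apply <- apply(X1[, 1:r], 2, function(x) x - X1$x.bar)

# Plain vectorized subtraction, converted to a matrix for comparison
by_vector <- as.matrix(X1[, 1:r] - X1$x.bar)

all.equal(unname(by_apply), unname(by_vector))
```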

Is there an R function to create a Square sequence

I'm currently trying to grasp the basics of R.
One exercise I'm trying is creating "square" sequences, where each number in the sequence is raised to a power.
I'm trying to create a sequence such as:
(1,2,3,4,5) =
(1,2,9,64,625)
Is there a function for this in R?
The answer to this question is
(1:10) ^ (0:9)
I don't exactly understand what you want your desired output to be, but arrays are your friends. Almost anything you want to do here can be vectorized.
n <- 5
x <- seq_len(n)
x
y <- 0:(n-1)
y
z <- x^y
z

Tapply over matrix using matrix math

All,
I have the following code, and I'd like to generalize it to more clusters, i.e. C clusters. Is there a way to do this without a loop? Here, the rows of X correspond to variables x1, x2, and T is a linear transformation of X. Thanks.
X=matrix(c(2,3,4,5,6,7,8,9,2,3,4,5,6,7,8,9),2)
cluster=c(1,1,1,0,0,0,0,0)
T=matrix(c(1,2,2,1),2)
f<-function(x) max(eigen(t(x)%*%x)$values)
f(T%*%X[,cluster==0])+f(T%*%X[,cluster==1])
## [1] 1134.87
I was thinking of
sum(tapply(X,cluster,function(x) f(T%*%x)))
but I get this error, I think because tapply takes a vector vs matrix:
> sum(tapply(X,cluster,function(x) f(T%*%x)))
Error in tapply(X, cluster, function(x) f[x]) :
arguments must have same length
Here is an answer with a for loop. If you can find something without a loop, please let me know.
#
c=length(levels(factor(cluster)))
cluster=factor(cluster,labels=1:c)
s=0
for (i in 1:c){
s=s+f(T%*%X[,cluster==i])
}
s
## [1] 1134.872
Could try doing this via tapply
tapply(seq_len(ncol(X)), cluster, function(x) f(T%*%X[, x]))
# 0 1
# 3840.681 1238.826
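To get the single total the question asks for, the per-cluster values can be summed. The sketch below reuses the question's X, cluster, T and f; the drop = FALSE argument is an addition for this illustration, so that a cluster with only one column is still treated as a matrix.

```r
X <- matrix(c(2,3,4,5,6,7,8,9,2,3,4,5,6,7,8,9), 2)
cluster <- c(1,1,1,0,0,0,0,0)
T <- matrix(c(1,2,2,1), 2)
f <- function(x) max(eigen(t(x) %*% x)$values)

# Group the column indices by cluster, then apply f to each block of columns
per_cluster <- tapply(seq_len(ncol(X)), cluster,
                      function(idx) f(T %*% X[, idx, drop = FALSE]))

# Works for any number of clusters, not just two
total <- sum(per_cluster)
total
```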

why does sd in R return a vector for matrix input, and what can I do about it?

I am somewhat confused as to why the sd function in R returns an array for matrix input (I suppose to maintain backwards compatibility, it always will). This is very odd behaviour to me:
#3d input, same same
print(length(mean(array(rnorm(60),dim=c(3,4,5)))))
print(length(sd(array(rnorm(60),dim=c(3,4,5)))))
#1d input, same same
print(length(mean(array(rnorm(60),dim=c(60)))))
print(length(sd(array(rnorm(60),dim=c(60)))))
#2d input, different!
print(length(mean(array(rnorm(60),dim=c(12,5)))))
print(length(sd(array(rnorm(60),dim=c(12,5)))))
I get
[1] 1
[1] 1
[1] 1
[1] 1
[1] 1
[1] 5
That is, sd behaves differently from mean when the input is a 2-d array (and apparently only in that case!). Consider, then, this failed function to rescale each column of a k-dimensional array by the standard deviation:
re.scale <- function(x) {
#rescale by the standard deviation of each column
scales <- apply(x,2,sd)
ret.val <- sweep(x,2,scales,"/")
}
#this works just fine
x <- array(rnorm(60),dim=c(12,5))
y <- re.scale(x)
#this throws a warning
x <- array(rnorm(60),dim=c(3,4,5))
y <- re.scale(x)
Is there some other function to replace sd without this weird behavior? How would one write re.scale properly? Or a Z-score-by-column function?
It is behaving as documented in sd's help page. At the very top it announces:
"If x is a matrix or a data frame, a vector of the standard deviation of the columns is returned."
Note that it does not mention arrays in general, so only two-dimensional arrays (matrices) get the column-wise treatment. If you want to avoid this behavior, flatten the input to a vector with c():
sd( c(array(rnorm(60),dim=c(12,5))) )
# [1] 0.9505643
I see that you added a request for column z-scores. Try this for matrices:
colMeans(x)/apply(x, 2, sd) # apply(x, 2, sd) gives column SDs regardless of how sd treats matrices
And this for arrays (although the definition of a "column" may need clarification):
apply(x, 2:3, mean)/apply(x, 2:3, sd) # will generalize to higher dimensions
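One possible way to write re.scale so that it does not depend on how sd treats matrices is to flatten each slice before taking the standard deviation. This is a sketch, assuming a "column" means everything sharing the same index along the second dimension; the test array x3 is made up.

```r
re.scale <- function(x) {
  # Flatten each slice along dimension 2 so sd always sees a plain vector
  scales <- apply(x, 2, function(v) sd(as.vector(v)))
  # Divide every element by the scale of its dimension-2 index
  sweep(x, 2, scales, "/")
}

set.seed(1)
x3 <- array(rnorm(60), dim = c(3, 4, 5))  # made-up 3-d input
y3 <- re.scale(x3)

# After rescaling, each dimension-2 slice should have standard deviation 1
apply(y3, 2, function(v) sd(as.vector(v)))
```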
The actions of sd were changed:
1. version 2.13.2(2011-09-30) and earlier
> set.seed(1)
> sd(array(rnorm(60),dim=c(12,5)))
[1] 0.8107276 1.1234795 0.7925743 0.6186082 0.9464160
Description
This function computes the standard deviation of the values in x. If
na.rm is TRUE then missing values are removed before computation
proceeds.
If x is a matrix or a data frame, a vector of the standard
deviation of the columns is returned.
2. R version 2.14.0(2011-10-31) - 2.15.3(2013-03-01)
> set.seed(1)
> sd(array(rnorm(60),dim=c(12,5)))
[1] 0.8107276 1.1234795 0.7925743 0.6186082 0.9464160
WARNING:
sd(<matrix>) is deprecated.
Use apply(*, 2, sd) instead.
Details
Prior to R 2.14.0, sd(dfrm) worked directly for a data.frame
dfrm. This is now deprecated and you are expected to use sapply(dfrm,
sd).
3. R version 3.0.0 (2013-04-03) and later
> sd(array(rnorm(60),dim=c(12,5)))
[1] 0.8551688
>
(no warning)
