Calculating z-scores in R

I have a sample dataframe:
data<-data.frame(a=c(1,2,3),b=c(4,5,5),c=c(6,8,7),d=c(8,9,10))
I wish to calculate the z-scores for every row in the data frame, so I did:
scores<-apply(data,1,zscore)
I used the zscore function from
install.packages(c("R.basic"), contriburl="http://www.braju.com/R/repos/")
And obtained this
row.names V1 V2 V3
a -1.2558275 -1.2649111 -1.0883839
b -0.2511655 -0.3162278 -0.4186092
c 0.4186092 0.6324555 0.2511655
d 1.0883839 0.9486833 1.2558275
But when I try manually calculating the z score for the first row of the data frame I obtain the following values:
-1.45, -0.29, 0.4844, 1.25
Manually, for the first row, I calculated as follows:
1) Calculate the row mean (4.75) for the first row.
2) Subtract the row mean from each value (e.g. 1 - 4.75, 4 - 4.75, 6 - 4.75, 8 - 4.75).
3) Square each difference.
4) Add them up and divide by the number of values in row 1.
5) This gives the variance (answer = 6.685), from which I get the standard deviation (2.58) of the first row alone.
6) Then apply the z-score formula.

The zscore function, whatever it is, seems to be the same as scale in the base package.
apply(data, 1, scale)
## [,1] [,2] [,3]
## [1,] -1.2558275 -1.2649111 -1.0883839
## [2,] -0.2511655 -0.3162278 -0.4186092
## [3,] 0.4186092 0.6324555 0.2511655
## [4,] 1.0883839 0.9486833 1.2558275
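For each row of data (each column of the output above), it calculates (x - mean(x)) / sd(x). Note that sd() is the sample standard deviation, which divides by n - 1 rather than n. That is why these values differ from a manual calculation that divides by the number of values in the row.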
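To make the difference concrete, here is a minimal sketch in base R that computes the z-scores of the first row both ways. The names z_sample, sd_pop and z_pop are only for illustration and do not come from the original post.

```r
data <- data.frame(a = c(1, 2, 3), b = c(4, 5, 5), c = c(6, 8, 7), d = c(8, 9, 10))

# First row as a plain numeric vector: 1, 4, 6, 8
x <- unlist(data[1, ])

# What scale() does: sd() divides the sum of squares by n - 1
z_sample <- (x - mean(x)) / sd(x)

# The manual calculation from the question: divide the sum of squares by n
sd_pop <- sqrt(mean((x - mean(x))^2))
z_pop <- (x - mean(x)) / sd_pop

round(z_sample, 4)
round(z_pop, 4)
```

The first vector should agree with column V1 of the scale/zscore output, and the second with the hand-calculated values.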

Related

Optimization function applied to table of values in R

`values <- matrix(c(0.174,0.349,1.075,3.1424,0.173,0.346,1.038,3.114,0.171,0.343,1.03,3.09,0.17,0.34,1.02,3.06),ncol=4) `
I am attempting to maximize the total value for the dataset taking only one value from each row, and with associated costs for each column
subject to:
One value column used per row.
cost of each use of column 1 is 4
cost of each use of column 2 is 3
cost of each use of column 3 is 2
cost of each use of column 4 is 1
total cost <= 11
These are stand in values for a larger dataset. I need to be able to apply it directly to all the rows of a dataset.
I have been trying to use the lpSolve package, with no success.
`f.obj <- values
f.con <- c(4,3,2,1)
f.dir <- "<="
f.rhs <- 11
lp("max", f.obj, f.con, f.dir, f.rhs)`
I am getting a solution of "0"
I do not know how to model this in a way that chooses one value per row and then uses a different value in calculating the constraints.
Looks like the problem is as follows:
We have a matrix a[i,j] with values, and a vector c[j] with costs.
We want to select one value for each row such that:
a. total cost <= 11
b. total value is maximized
To develop a mathematical model, we introduce binary variables x[i,j] ∈ {0,1}. With this, we can write:
max sum((i,j), a[i,j]*x[i,j])
subject to
sum((i,j), c[j]*x[i,j]) <= 11
sum(j, x[i,j]) = 1 ∀i
x[i,j] ∈ {0,1}
To implement this in R, I use CVXR here.
#
# data
# A : values
# C : cost
#
A <- matrix(c(0.174,0.349,1.075,3.1424,0.173,0.346,1.038,3.114,0.171,0.343,1.03,3.09,0.17,0.34,1.02,3.06),ncol=4)
C <- c(4,3,2,1)
maxcost <- 11
#
# form a matrix cmat[i,j] indicating the cost of element i,j
#
cmat <- matrix(C,nrow=dim(A)[1],ncol=dim(A)[2],byrow=T)
#
# problem:
# pick one value from each row
# such that total value of selected cells is maximized
# and cost of selected cells is limited to maxcost
#
# model:
# max sum((i,j), a[i,j]*x[i,j])
# subject to
# sum((i,j), c[j]*x[i,j]) <= maxcost
# sum(j,x[i,j]) = 1 ∀i
# x[i,j] ∈ {0,1}
#
#
library(CVXR)
x = Variable(dim(A), name="x", boolean=T)
p <- Problem(Maximize(sum_entries(A*x)),
constraints=list(
sum_entries(cmat*x) <= maxcost,
sum_entries(x,axis=1) == 1
))
res <- solve(p,verbose=T)
res$status
res$value
res$getValue(x)*A
The output looks like:
> res$status
[1] "optimal"
> res$value
[1] 4.7304
> res$getValue(x)*A
[,1] [,2] [,3] [,4]
[1,] 0.0000 0 0.000 0.17
[2,] 0.0000 0 0.343 0.00
[3,] 1.0750 0 0.000 0.00
[4,] 3.1424 0 0.000 0.00
The description in the original post is not very precise. For instance, I assumed that we need to select precisely one cell from each row. If we just want "select at most one cell from each row", then replace
sum(j, x[i,j]) = 1 ∀i
by
sum(j, x[i,j]) <= 1 ∀i
As mentioned by Steve, the lpSolve package expects a single objective vector, not a matrix. You could reformulate the objective as maximize sum(rowSums(values * x)), subject to the constraints.
E.g., flatten the matrix to a vector and treat the problem as an integer optimization problem:
obj <- as.vector(values)
f.con <- rep(f.con, each = 4)
r <- lp('max', obj, matrix(f.con, nrow = 1), f.dir, f.rhs, int.vec = seq_along(obj))
#' Success: the objective function is 9.899925
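Note that the lp() call above constrains only the total cost and declares the variables as integer, so nothing in it forces exactly one pick per row. Below is a minimal sketch of how the full model could be passed to lpSolve, assuming binary variables via all.bin = TRUE and one equality constraint per row. The names cost_row and pick_one are mine, not from the original posts.

```r
library(lpSolve)

values <- matrix(c(0.174, 0.349, 1.075, 3.1424,
                   0.173, 0.346, 1.038, 3.114,
                   0.171, 0.343, 1.03,  3.09,
                   0.17,  0.34,  1.02,  3.06), ncol = 4)
cost <- c(4, 3, 2, 1)
n_row <- nrow(values)

# Flatten column by column, so variable k corresponds to as.vector(values)[k]
obj <- as.vector(values)

# Cost of each variable: column j of 'values' costs cost[j]
cost_row <- rep(cost, each = n_row)

# One 0/1 indicator row per row of 'values', marking the cells of that row
pick_one <- t(sapply(seq_len(n_row),
                     function(i) as.numeric(as.vector(row(values)) == i)))

con <- rbind(cost_row, pick_one)
dir <- c("<=", rep("=", n_row))
rhs <- c(11, rep(1, n_row))

res <- lp("max", obj, con, dir, rhs, all.bin = TRUE)
res$status                                   # 0 means a solution was found
res$objval
matrix(res$solution, nrow = n_row) * values  # selected cells, in matrix form
```

If this sketch is right, the objective should match the value reported by the CVXR answer. For "at most one cell per row", the "=" entries in dir would become "<=".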

When I concatenate in R am I creating a row or a column?

I concatenate the following:
ExampleConCat <- c(1, 1, 1, 0) and I have a 20x4 matrix (MatrixExample as below).
I can do matrix multiplication in Rstudio as below:
matrix.multipl <- MatrixExample %*% ExampleConCat
I get the below results:
# [,1]
# cycle_1 0.99019608
# cycle_2 0.96400149
# cycle_3 0.91064055
# cycle_4 0.83460040
# cycle_5 0.74478532
# cycle_6 0.64981877
# cycle_7 0.55637987
# cycle_8 0.46893791
# cycle_9 0.39005264
# cycle_10 0.32083829
# cycle_11 0.26141338
# cycle_12 0.21127026
# cycle_13 0.16955189
# cycle_14 0.13524509
# cycle_15 0.10730721
# cycle_16 0.08474320
# cycle_17 0.06664783
# cycle_18 0.05222437
# cycle_19 0.04078855
# cycle_20 0.03176356
My understanding is that:
To multiply an m×n matrix by an n×p matrix, the ns must be the same, and the result is an m×p matrix. https://www.mathsisfun.com/algebra/matrix-multiplying.html
So, the fact that it calculates at all indicates to me that the concatenation above creates a column, i.e. MatrixExample is a 20x4 matrix, thus ExampleConCat must be a 4x1 vector in order for these two to be multiplied by each other.
Or, are there different rules when one multiplies a vector by a matrix, and could you explain those to me simply?
I noticed that when I tried
matrix.multipl <- ExampleConCat %*% MatrixExample
I get the following:
Error in ExampleConCat %*% MatrixExample : non-conformable arguments
I would appreciate an explanation which reflects that I am new to R and newer still to matrix multiplication.
# MatrixExample:
# State A State B State C State D
# cycle_1 0.721453287 0.201845444 0.06689735 0.009803922
# cycle_2 0.520494846 0.262910628 0.18059602 0.035998510
# cycle_3 0.375512717 0.257831905 0.27729592 0.089359455
# cycle_4 0.270914884 0.225616773 0.33806874 0.165399604
# cycle_5 0.195452434 0.185784574 0.36354831 0.255214678
# cycle_6 0.141009801 0.147407084 0.36140189 0.350181229
# cycle_7 0.101731984 0.114117654 0.34053023 0.443620127
# cycle_8 0.073394875 0.086845747 0.30869729 0.531062087
# cycle_9 0.052950973 0.065278842 0.27182282 0.609947364
# cycle_10 0.038201654 0.048620213 0.23401643 0.679161707
# cycle_11 0.027560709 0.035963116 0.19788955 0.738586622
# cycle_12 0.019883764 0.026460490 0.16492601 0.788729740
# cycle_13 0.014345207 0.019389137 0.13581754 0.830448113
# cycle_14 0.010349397 0.014162175 0.11073351 0.864754914
# cycle_15 0.007466606 0.010318351 0.08952225 0.892692795
# cycle_16 0.005386808 0.007502899 0.07185350 0.915256795
# cycle_17 0.003886330 0.005447095 0.05731440 0.933352173
# cycle_18 0.002803806 0.003949642 0.04547092 0.947775632
# cycle_19 0.002022815 0.002860998 0.03590474 0.959211445
# cycle_20 0.001459366 0.002070768 0.02823342 0.968236444
If you check the help page help("%*%"), it briefly describes how the rules of matrix multiplication are applied to vectors:
Multiplies two matrices, if they are conformable. If one argument is a vector, it will be promoted to either a row or column matrix to make the two arguments conformable. If both are vectors of the same length, it will return the inner product (as a matrix).
Doing MatrixExample %*% ExampleConCat conforms to those rules, as you rightly pointed out: ExampleConCat is treated as a 4 by 1 matrix. But for ExampleConCat %*% MatrixExample the dimensions don't match, whether ExampleConCat is treated as 4 by 1 or as 1 by 4, because MatrixExample is 20 by 4.
The vector is converted to either a row or a column matrix, whichever makes the multiplication conformable. As an example, see below:
exm = c(1,1,1,0)
exm_matrix = matrix(rnorm(16),
ncol=4)
exm_matrix%*%exm
#> [,1]
#> [1,] 2.1098758
#> [2,] -1.4432619
#> [3,] -0.2540392
#> [4,] -0.4211889
exm%*%exm_matrix
#> [,1] [,2] [,3] [,4]
#> [1,] 1.161164 -0.3602107 -0.3883783 -1.580562
Created on 2021-07-02 by the reprex package (v0.3.0)
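To answer the title question directly: c() creates neither a row nor a column, just a plain vector without dimensions, and %*% decides how to treat it. A small sketch, assuming a hypothetical 2 x 4 matrix m in place of the 20 x 4 MatrixExample:

```r
v <- c(1, 1, 1, 0)
m <- matrix(1:8, nrow = 2)   # hypothetical 2 x 4 stand-in for MatrixExample

dim(v)                       # NULL: a plain vector has no dimensions
dim(m %*% v)                 # 2 1: v was treated as a 4 x 1 column

# With v on the left, neither a 1 x 4 row nor a 4 x 1 column conforms to a 2 x 4 matrix
res <- try(v %*% m, silent = TRUE)
inherits(res, "try-error")   # TRUE

# Transposing m to 4 x 2 lets v be treated as a 1 x 4 row
dim(v %*% t(m))              # 1 2
```

If you want to be explicit rather than rely on promotion, matrix(v, ncol = 1) or matrix(v, nrow = 1) fixes the orientation yourself.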

using cor.test function in R

If x is an n*m matrix, then cor(x) gives me an m*m matrix of correlations between each pair of columns.
How can I use the cor.test function on the n*m matrix to also get an m*m matrix of p-values?
There may be an existing function, but here's my version. p_cor_mat runs cor.test on each pair of columns in matrix x and records the p-value. These are then put into a square matrix and returned.
# Set seed
set.seed(42)
# Matrix of data
x <- matrix(runif(120), ncol = 4)
# Function for creating p value matrix
p_cor_mat <- function(x){
# All combinations of columns
colcom <- t(combn(1:ncol(x), 2))
# Calculate p values
p_vals <- apply(colcom, MARGIN = 1, function(i) cor.test(x[, i[1]], x[, i[2]])$p.value)
# Create matrix for result
p_mat <- diag(ncol(x))
# Fill upper & lower triangles
p_mat[colcom] <- p_mat[colcom[,2:1]] <- p_vals
# Return result
p_mat
}
# Test function
p_cor_mat(x)
#> [,1] [,2] [,3] [,4]
#> [1,] 1.0000000 0.4495713 0.9071164 0.8462530
#> [2,] 0.4495713 1.0000000 0.5960786 0.7093539
#> [3,] 0.9071164 0.5960786 1.0000000 0.7466226
#> [4,] 0.8462530 0.7093539 0.7466226 1.0000000
Created on 2019-03-06 by the reprex package (v0.2.1)
Please also see the cor.mtest() function in the corrplot package.
https://www.rdocumentation.org/packages/corrplot/versions/0.92/topics/cor.mtest

Generation of random variables

I have a problem with generating random variables in R.
I have to generate random variables $X_{ij}$ ($i=1,\dots,25$, $j=1,\dots,5$), where each $X_{ij}$ follows a binomial distribution,
$X_{ij} \sim \mathrm{Bin}(n_{ij}, p_{ij})$,
and I already know $n_{ij}$ and $p_{ij}$ for each index. How can I generate these random variables?
I don't know if it is useful, but I generated the $p_{ij}$ as random variables that follow a beta distribution (hence the $X_{ij}$ actually follow a beta-binomial distribution).
Let's say you had the following matrices for n and p:
(n <- matrix(4:7, nrow=2))
# [,1] [,2]
# [1,] 4 6
# [2,] 5 7
set.seed(144)
(p <- matrix(rbeta(4, 1, 2), nrow=2))
# [,1] [,2]
# [1,] 0.1582904 0.2794913
# [2,] 0.5176909 0.2889718
Now you can draw samples X_{ij} with something like:
set.seed(144)
matrix(apply(cbind(as.vector(n), as.vector(p)), 1, function(x) rbinom(1, x[1], x[2])), nrow=2)
# [,1] [,2]
# [1,] 0 2
# [2,] 2 2
The cbind part of this expression builds a 2-column matrix containing each (n, p) pairing and the apply part draws a single binomially distributed sample for each (n, p) pair, with the matrix part converting the resulting vector to a matrix.
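Since rbinom is vectorized over its size and prob arguments, the same draw can probably be written without apply. A minimal sketch for the 25 x 5 case, where the n and p matrices are made-up stand-ins for the ones you already have:

```r
set.seed(1)

# Hypothetical inputs: replace with your own 25 x 5 matrices of n_ij and p_ij
n <- matrix(sample(5:20, 25 * 5, replace = TRUE), nrow = 25)
p <- matrix(rbeta(25 * 5, 1, 2), nrow = 25)

# One draw per cell; matrices are read column by column, so positions line up
X <- matrix(rbinom(length(n), size = n, prob = p), nrow = nrow(n))

dim(X)
all(X >= 0 & X <= n)
```

Because both n and p are read in the same column-major order, X[i, j] is drawn from Bin(n[i, j], p[i, j]).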

How to calculate rolling bootstrapped values and confidence intervals in R

I am new to R and am trying to calculate the bootstrapped standard deviation (sd) and associated standard error within a 30 observation rolling window. The function below performs the rolling window appropriately if I just want sd. But when I add the bootstrap function using the boot package I get the error specified below. I gather that I am trying to store bootstrap results in a vector that isn't the correct size. Does anyone have any advice on how to store just the bootstrapped sd and associated stderror for each window in rows of a new matrix? The goal is to then plot the sd and associated 95% confidence intervals for each window along the timeseries. Thanks in advance for any help.
> head(data.srs)
LOGFISH
1 0.8274083
2 1.0853433
3 0.8049845
4 0.8912097
5 1.3514569
6 0.8694499
###Function to apply rolling window
rollWin <- function(timeSeries, windowLength)
{
data<-timeSeries
nOut <- length(data[, 1]) - windowLength + 1
out <- numeric(nOut)
if (length(data[,1]) >= windowLength)
{
for (i in 1:nOut)
{
sd.fun <- function(data,d)sd(data[d], na.rm = TRUE)
out[i] <- boot(data[i:(i + windowLength - 1), ], sd.fun, R=1000)
}
}
return (list(result=out))
}
###run rolling window function. ex. rollWin(data, windowlength)
a.temp<-rollWin(data.srs,30)
> warnings()
Warning messages:
1: In out[i] <- boot(data[i:(i + windowLength - 1), ], sd.fun, ... :
number of items to replace is not a multiple of replacement length
You can simplify it quite a lot. I am not familiar with the boot package, but we can roll a function along a vector using the rollapply function quite easily, and then we can make bootstrap samples using the replicate function:
# Create some data, 12 items long
r <- runif(12)
# [1] 0.44997964 0.27425412 0.07327872 0.68054759 0.33577348 0.49239478
# [7] 0.93421646 0.19633079 0.45144966 0.53673296 0.71813017 0.85270346
require(zoo)
# use rollapply to calculate a function along a moving window
# width is the width of the window
sds <- rollapply( r , width = 4 , by = 1 , sd )
#[1] 0.19736258 0.26592331 0.16770025 0.12585750 0.13730946 0.08488467
#[7] 0.16073722 0.22460430 0.22462168
# Now we use replicate to repeatedly evaluate a bootstrap sampling method
# 'n' is the number of replications (it is also reused as the window width here)
n <- 4
replicate( n , rollapply( r , width = n , function(x) sd( x[ sample(length(x) , replace = TRUE) ] ) ) )
# [,1] [,2] [,3] [,4]
# [1,] 0.17934073 0.1815371 0.11603320 0.2992379
# [2,] 0.03551822 0.2862702 0.18492837 0.2526193
# [3,] 0.09042535 0.2419768 0.13124738 0.1666012
# [4,] 0.17238705 0.1410475 0.18136178 0.2457248
# [5,] 0.32008385 0.1709326 0.32909368 0.2550859
# [6,] 0.30832533 0.1480320 0.02363968 0.1275594
# [7,] 0.23069951 0.1275594 0.25648052 0.3016909
# [8,] 0.11235170 0.2493055 0.26089969 0.3012610
# [9,] 0.16819174 0.2099518 0.18033502 0.0906986
Each column is one bootstrap replicate: a pass of rollapply that resamples the observations in each window with replacement before applying sd. Each row corresponds to one position of the window.
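If you would rather stay with the boot package, the warning in the question arises because out[i] is a single numeric slot while boot() returns a whole object. A rough sketch of one way to store only the summary numbers per window, assuming simulated data in place of data.srs$LOGFISH and a simple percentile interval (the column names and the choice of interval are mine):

```r
library(boot)

set.seed(1)
x <- rnorm(60)                 # hypothetical stand-in for data.srs$LOGFISH
window_length <- 30

# Statistic in the form boot() expects: data plus a vector of resampled indices
sd_fun <- function(d, idx) sd(d[idx], na.rm = TRUE)

n_out <- length(x) - window_length + 1
out <- matrix(NA_real_, nrow = n_out, ncol = 4,
              dimnames = list(NULL, c("sd", "se", "lower", "upper")))

for (i in seq_len(n_out)) {
  b <- boot(x[i:(i + window_length - 1)], sd_fun, R = 200)
  ci <- quantile(b$t, c(0.025, 0.975))   # percentile interval of the replicates
  out[i, ] <- c(b$t0, sd(b$t), ci)       # observed sd, bootstrap SE, 95% limits
}

head(out)
```

Each row of out then describes one window, which should be convenient for plotting the sd with its interval along the time series. R = 200 is only to keep the sketch fast; the question used R = 1000.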

Resources