Can I calculate z-score with R? [duplicate]

By reading Stack Overflow comments, I found that z-scores can be calculated with Python or Perl, but I have not come across anything for R yet. Did I miss it? Can it be done with R?
As defined at http://en.wikipedia.org/wiki/Standard_score:
z-score = (x - μ) / σ
where
x is the raw score to be standardized,
μ is the mean of the population, and
σ is the standard deviation of the population.
Are there R packages designed for this, and where can I find them? Or a similar package for normalization?

If x is a vector of raw scores, then scale(x) returns the standardized scores.
Or manually: (x - mean(x)) / sd(x)
Note that both use the sample standard deviation (sd() divides by n - 1) rather than the population standard deviation from the formula above; for large n the difference is negligible.
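A minimal sketch showing that the two approaches agree (the example vector is made up):
x <- c(10, 20, 30, 40, 50)
z1 <- scale(x)                 # returns a one-column matrix with centering/scaling attributes
z2 <- (x - mean(x)) / sd(x)    # plain numeric vector with the same values
all.equal(as.numeric(z1), z2)  # TRUE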

Defining a constant as a coefficient from an R regression [duplicate]

I am trying to define a constant as a coefficient from an R regression, i.e. trying to define elasticity as the ln_price coefficient (-1.64431 in my regression output).
reg <- lm(ln_q ~ ln_price, data=df)
elasticity <- reg[["coefficients","ln_price"]]
print(elasticity)
I get the error:
Error in reg[["coefficients", "ln_price"]] :
incorrect number of subscripts
Any help much appreciated! :)
elasticity <- reg$coefficients[names(reg$coefficients)=="ln_price"]
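The error occurs because reg is a list, and the two-subscript form x[[i, j]] only works for matrix-like objects such as data frames. Since coefficients is a named vector, you can index it by name; the coef() extractor is the idiomatic route. A minimal sketch reusing the names from the question:
reg <- lm(ln_q ~ ln_price, data = df)
elasticity <- coef(reg)["ln_price"]           # named vector element
elasticity <- unname(coef(reg)["ln_price"])   # drop the name to get a bare number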

How to get the "Proportion Var" line from PCA output using principal from package psych? [duplicate]

I would like to get the "Proportion Var" line from pca$loadings below as an object or vector, to use its values in the PCA graphic.
I did the following:
library(psych)
data(iris)
pca <- principal(iris[1:4], nfactors = 2, rotate = "varimax", scores = TRUE)
pca$loadings
How can I extract the proportion of variance?
Another way is to compute it manually, but first you need to extract all factors (i.e., all axes).
You should set nfactors to the number of variables you have, which in your example is 4. So it should look like this:
pca <- principal(iris[1:4], nfactors = 4, rotate = "varimax", scores = TRUE)
Then extract pca$loadings, after which you can compute the proportion of variance by taking the sum of squared loadings per component (RC in this case) and dividing it by the total sum of squared loadings, like this:
colSums(pca$loadings[ , ]^2)/sum(pca$loadings[ , ]^2)
This should give you the same information as the "Proportion Var" line in pca$loadings, albeit for all components (RC1 to RC4 in this case).
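A minimal sketch putting this together; note that the printed "Proportion Var" divides SS loadings by the number of variables, which coincides with the formula above when all components are extracted. In newer psych versions the same table may also be exposed directly as pca$Vaccounted (worth checking with str(pca)):
pca <- principal(iris[1:4], nfactors = 4, rotate = "varimax", scores = TRUE)
L <- pca$loadings[ , ]          # coerce the loadings object to a plain matrix
ss_load <- colSums(L^2)         # SS loadings per rotated component
prop_var <- ss_load / nrow(L)   # "Proportion Var" as in the printed table
prop_var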

Limit the range of forecast values for R forecast package [duplicate]

I am performing time series modeling and plotting the eventual forecasts using R's forecast package and the base plot() function.
The final forecasts are negative although this doesn't make sense for the data (the floor is 0). Is there a way for me to tell the forecast function to limit the y-value prediction?
After writing it out, I helped myself find the right search terms. Here is the answer:
http://robjhyndman.com/hyndsight/forecasting-within-limits/
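For reference, the post's approach for data with a floor of 0 is to model on the log scale (Box-Cox lambda = 0), so the back-transformed forecasts are always positive; it also gives a scaled-logit transform when both a floor and a ceiling are needed. A minimal sketch with made-up data:
library(forecast)
set.seed(42)
y <- ts(50 + cumsum(rnorm(100)), frequency = 12)  # made-up positive series
fit <- ets(y, lambda = 0)    # fit on the log scale
fc <- forecast(fit, h = 12)
plot(fc)                     # point forecasts and intervals stay above 0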

Finding best number of clusters knowing only dissimilarity matrix in R [duplicate]

I have a dissimilarity matrix and I want to run hierarchical clustering using that matrix as the only input, as I don't know the source data itself. For background, I aim to cluster elements using their mutual correlation as distance. Following the methodology indicated here, I'm using the correlation matrix to compute the dissimilarity matrix given to hclust as input. This is working fine.
My question is: how do I find the optimal number of clusters? Is there an index that can be computed by only knowing the dissimilarity matrix? The indices in NbClust require the source data to run - it is not enough to know the dissimilarity matrix. Is there any other method I can use in R?
From a quick look at the NbClust documentation, it appears possible to supply only the dissimilarity matrix and omit the original source data.
NbClust(data = NULL, diss = XYZ, distance = NULL, ...)
When the matrix is supplied (here referred to as XYZ), data and distance must be set to NULL, as stated in the function's Usage section. NbClust should then be able to produce the partition index you are after.
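A minimal sketch under the question's setup (cor_mat is an assumed, pre-existing correlation matrix); with diss supplied, only indices that can be computed from a dissimilarity alone are available, e.g. "silhouette":
library(NbClust)
d <- as.dist(1 - cor_mat)    # a common correlation-based dissimilarity; cor_mat is hypothetical
res <- NbClust(data = NULL, diss = d, distance = NULL,
               min.nc = 2, max.nc = 10,
               method = "average", index = "silhouette")
res$Best.nc                  # suggested number of clusters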

Using the normal distribution in R [duplicate]

I am using rnorm but the outputs I receive are sometimes negative. How do I create a restriction so that the outputs cannot be below 0? Example:
output = rnorm(1, 800/20, sqrt(800))
Why not abs(rnorm(1, 800/20, sqrt(800)))? rnorm was written to give numbers from a normal distribution. Perhaps you are looking for output from a truncated distribution instead; in that case, have a look at the truncnorm package.
library(truncnorm)
rtruncnorm(1, a = 0, b = Inf, mean = 800/20, sd = sqrt(800))
x <- seq(-20, 200, by = 0.01)
y <- dtruncnorm(x, a = 0, b = Inf, mean = 800/20, sd = sqrt(800))
plot(x, y, type = "l", main = "Density of a truncated normal distribution")
The Poisson distribution takes only non-negative integer values. Otherwise golbasche's solution seems perfect.
hist(rpois(100, 5))
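For what it's worth, abs(rnorm(...)) folds negative draws back into the positive range rather than discarding them, so it is not the same distribution as the truncated normal. A quick sketch to see the difference:
set.seed(1)
folded <- abs(rnorm(1e5, 800/20, sqrt(800)))
truncated <- rtruncnorm(1e5, a = 0, b = Inf, mean = 800/20, sd = sqrt(800))
c(mean(folded), mean(truncated))  # the means differ, so the distributions differ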
