distribution from percentages with R [closed]

I have the distribution of a parameter (natural gas mixture composition) expressed in percent. How can I determine the distribution and its parameters from such data (it should be a gamma, normal, or lognormal distribution) and generate random compositions based on those parameters in R?

This might be a better question for CrossValidated, but:
it is not generally a good idea to choose from among a range of possible distributions according to goodness of fit. Instead, you should choose according to the qualitative characteristics of your data, for example with a decision chart like this one:
[Chart: choosing a distribution from the qualitative characteristics of the data]
Frustratingly, this chart doesn't actually have the best choice for your data (composition, continuous, bounded between 0 and 1 [or 0 and 100]), which is a Beta distribution (although there are technical issues if you have values of exactly 0 or 100 in your sample).
In R:
## some arbitrary data
z <- c(2,8,40,45,56,58,70,89)
## fit (beta values must be in (0,1), not (0,100), so divide by 100)
(m <- MASS::fitdistr(z/100,"beta",start=list(shape1=1,shape2=1)))
## sample 1000 new values
z_new <- 100*rbeta(n=1000,shape1=m$estimate["shape1"],
                   shape2=m$estimate["shape2"])
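If the sample contains values of exactly 0 or 100, the fit above can break down: the Beta density at 0 or 1 is either zero or infinite unless a shape parameter is exactly 1. A minimal base-R sketch of one common workaround, under stated assumptions: squeeze the proportions into (0, 1) with the (y(n - 1) + 0.5)/n transform proposed by Smithson and Verkuilen (2006), then estimate the shapes by the method of moments rather than the maximum likelihood that fitdistr uses. The helpers squeeze01 and beta_mom and the data z0 are hypothetical names for illustration.

```r
## Hypothetical helper: squeeze proportions away from exactly 0 and 1 using
## the (y * (n - 1) + 0.5) / n transform (Smithson & Verkuilen, 2006)
squeeze01 <- function(y) {
  n <- length(y)
  (y * (n - 1) + 0.5) / n
}

## Hypothetical helper: method-of-moments Beta estimates (base R only).
## The shapes are positive only when var(x) < mean(x) * (1 - mean(x)).
beta_mom <- function(x) {
  m <- mean(x)
  v <- var(x)
  common <- m * (1 - m) / v - 1
  c(shape1 = m * common, shape2 = (1 - m) * common)
}

## some arbitrary data that includes the boundary values 0 and 100
z0 <- c(0, 8, 40, 45, 56, 58, 70, 100)
x <- squeeze01(z0 / 100)   # now strictly inside (0, 1)
p <- beta_mom(x)

## sample 1000 new values on the percent scale
set.seed(1)
z_new <- 100 * rbeta(n = 1000, shape1 = p[["shape1"]], shape2 = p[["shape2"]])
```

By construction the method-of-moments fit reproduces the sample mean exactly (shape1 / (shape1 + shape2) equals mean(x)), so it is a cheap cross-check on the maximum-likelihood estimates even when there are no boundary values.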

Related

normally distributed population, calculating in R the probability of negative or zero readings occurring [closed]

In R, how do you calculate the probability of negative or zero readings occurring?
μ and σ are given.
You can use the distribution function of the Gaussian distribution:
pnorm(0,μ,σ)
(assuming you are talking about a Gaussian distribution)
Edit:
pnorm is the cumulative distribution function (CDF) of the normal distribution, not a density. Its values lie between 0 and 1, and its value at x is the area under the Gaussian density curve from -Inf to x. So pnorm(0, μ, σ) is the area under the curve to the left of 0, which is the probability you are looking for: the probability that a value drawn from that Gaussian distribution is less than or equal to 0.
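A short sketch of the calculation, with hypothetical values μ = 5 and σ = 2 standing in for your own. It computes the probability directly, checks it against the standardised form and a Monte Carlo simulation, and shows the complementary upper tail via lower.tail = FALSE.

```r
## Hypothetical parameter values, for illustration only
mu <- 5
sigma <- 2

## P(X <= 0) for X ~ Normal(mu, sigma)
p_analytic <- pnorm(0, mean = mu, sd = sigma)

## Same probability after standardising: P(Z <= (0 - mu) / sigma)
p_std <- pnorm((0 - mu) / sigma)

## Monte Carlo check: fraction of simulated readings that are <= 0
set.seed(42)
readings <- rnorm(1e6, mean = mu, sd = sigma)
p_sim <- mean(readings <= 0)

## Probability of a strictly positive reading is the upper tail
p_positive <- pnorm(0, mean = mu, sd = sigma, lower.tail = FALSE)
```

With these values p_analytic is about 0.0062. For a continuous distribution the probability of a reading of exactly 0 is zero, so "negative or zero" and "negative" give the same number.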

Setting limits on an rnorm function in RStudio [closed]

Good evening,
Even though I know it will "destroy" an actual normal distribution, I need to set a maximum and a minimum on the values drawn with rnorm in R.
I'm using vulture survival rates to calculate population trends. Although I need the rates to fluctuate, survival rates logically can't be over 1 or under 0.
I tried doing it with if/else statements, but I think there should be a better way.
Thanks!
You could sample from a large normalized rnorm draw:
rbell <- function(n) {
  r <- rnorm(n * 1000)                       # large pool of standard normal draws
  sample((r - min(r)) / diff(range(r)), n)   # min-max rescale to [0, 1], then pick n
}
For example:
rbell(10)
#> [1] 0.5177806 0.5713479 0.5330545 0.5987649 0.3312775 0.5508946 0.3654235 0.3897417
#> [9] 0.1925600 0.6043243
hist(rbell(1000))
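This will always lie in the closed interval [0, 1]: the smallest of the n * 1000 pooled draws maps to 0 and the largest to 1. Note that the spread of the result shrinks as n grows, because the range of the pooled draws widens with n, so this is not a normal distribution with a mean and standard deviation you choose.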
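A different technique from the min-max rescaling above, sketched here under stated assumptions: sample from a normal distribution truncated to [0, 1] by inverse-CDF sampling. This keeps the mean and sd you choose as the parameters of the underlying normal and simply discards the probability outside the bounds. The helper rtnorm_inv and the survival-rate values (mean 0.9, sd 0.1) are hypothetical.

```r
## Hypothetical helper: n draws from a Normal(mean, sd) truncated to
## [lower, upper], via inverse-CDF sampling
rtnorm_inv <- function(n, mean, sd, lower = 0, upper = 1) {
  ## CDF values at the two bounds
  p_lo <- pnorm(lower, mean = mean, sd = sd)
  p_hi <- pnorm(upper, mean = mean, sd = sd)
  ## uniform draws between those CDF values, mapped back through the quantile function
  u <- runif(n, min = p_lo, max = p_hi)
  qnorm(u, mean = mean, sd = sd)
}

set.seed(123)
## hypothetical survival-rate parameters: mean 0.9, sd 0.1
surv <- rtnorm_inv(1000, mean = 0.9, sd = 0.1)
summary(surv)

## For contrast: pmin/pmax is the vectorised version of the if/else approach,
## but it piles probability mass exactly on 0 and 1
surv_clamped <- pmin(pmax(rnorm(1000, mean = 0.9, sd = 0.1), 0), 1)
```

Inverse-CDF sampling becomes numerically unreliable when both bounds lie far in the same tail (p_lo and p_hi both very close to 0 or to 1); rejection sampling is an alternative in that case.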

Multiple regression equations [closed]

Is there a way that I can run 100 different regressions together and get the output of all equations together in a table format?
Any software will work.
I need to find the growth rates of 100 commodities using a log-linear model, so I have 100 equations, each with ln(value of exports) as the dependent variable and time (0 to 30) as the independent variable.
Running each of the 100 regressions individually is a lot of manual work.
I just require the coefficients of t for all the 100 equations. Any way to shorten the time spent doing so?
For example, assuming you have a data frame commodity_data in R with each commodity as a different column:
n <- ncol(commodity_data)
logslopes <- numeric(n)
tvec <- 0:(nrow(commodity_data) - 1)   # time index 0, 1, ..., one value per row
for (i in seq_len(n)) {
  m <- lm(log(commodity_data[, i]) ~ tvec)
  slope <- coef(m)["tvec"]
  logslopes[i] <- slope
}
There are slicker ways of doing this, but this one should work fine.
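One of the slicker ways, sketched under the assumption that every column is strictly positive (so log() is defined) and that the rows are the time points 0 to 30: lm() accepts a matrix response and fits a separate least-squares regression for each column, so all the slopes come out of a single call. The commodity names and the simulated data below are hypothetical.

```r
## Hypothetical example data: 31 export values (t = 0..30) for 3 commodities,
## simulated with known log-linear growth rates
set.seed(1)
tvec <- 0:30
true_slopes <- c(wheat = 0.02, coffee = 0.05, tin = -0.01)
commodity_data <- as.data.frame(
  sapply(true_slopes, function(b) 100 * exp(b * tvec + rnorm(length(tvec), sd = 0.05)))
)

## One lm() call with a matrix response fits one regression per column
fit <- lm(log(as.matrix(commodity_data)) ~ tvec)

## Row "tvec" of the coefficient matrix holds every commodity's slope
logslopes <- coef(fit)["tvec", ]

## Optional: convert log-slopes to compound growth rates per period
growth <- exp(logslopes) - 1
round(cbind(logslopes, growth), 4)
```

The slopes are identical to those from the loop; the single call just avoids bookkeeping and keeps the column names attached to the results.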

How to find scaling factor for new data in R? [closed]

I developed a model for my fraud detection dataset that contains 100000 records.
I used 70% of the data as training data and 30% as testing data. Before fitting the final model on the training data, I scaled the data using scale=TRUE in R.
But I can't scale the prediction (i.e., testing) data alone.
How do I scale the new data?
If you want to scale the new vector (v2) using the centring and scaling parameters used to scale the original vector (v1) you can do:
v1 <- 1:10
v1_scl <- scale(v1)
v2 <- sample(20, 10)
v2_scl <- (v2 - attr(v1_scl, 'scaled:center')) / attr(v1_scl, 'scaled:scale')
or if you've used the default of centring v1 on its mean and scaling by its standard deviation, you can do:
v2_scl <- (v2 - mean(v1)) / sd(v1)
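The same idea extends to a whole matrix of features, which is the situation in the question (a training/test split). A minimal sketch, assuming numeric feature matrices: scale() accepts numeric vectors for center and scale, one value per column, so the attributes stored on the scaled training data can be passed straight back in. The objects train and test are hypothetical.

```r
## Hypothetical training and test feature matrices (3 numeric features)
set.seed(1)
train <- matrix(rnorm(70 * 3, mean = 10, sd = 2), ncol = 3)
test  <- matrix(rnorm(30 * 3, mean = 10, sd = 2), ncol = 3)

## Scale the training data; scale() stores the column means and SDs as attributes
train_scl <- scale(train)

## Apply the training centring and scaling to the test data
test_scl <- scale(test,
                  center = attr(train_scl, "scaled:center"),
                  scale  = attr(train_scl, "scaled:scale"))
```

A data frame with only numeric columns can be converted with as.matrix() first. The test data will not have exactly zero mean and unit SD after this, which is expected: it is being placed on the training data's scale.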

Calculate difference [closed]

I have 5 values, for example: 3, 11, 8, 5, 8.
I want to calculate the average difference between them.
If I had just two values, say 3 and 11, the difference would be 8.
But how do I do this when I have more values (for example five as in my example above)?
I cannot show the answer in detail because I cannot format math on this board. Please refer to the mathematics sub-board for math-related questions.
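I'm not exactly sure what you are after, but it might be the standard deviation.
The standard deviation measures how spread out the numbers are around their mean: roughly, it is the square root of the average squared deviation of each number from the mean (R's sd() divides by n - 1 rather than n).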
You might be looking for the variance or the standard deviation.
Variance: a measure of the dispersion of a set of data points around their mean. It is the expected value of the squared deviation from the mean.
Estimating the variance
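Since "average difference" can mean several things, here is a sketch computing a few candidate measures for the five example values; which one fits depends on what you need. It also checks a standard identity that links the answers above: the sample variance equals half the mean squared difference over all pairs of values.

```r
x <- c(3, 11, 8, 5, 8)

## Absolute differences between every pair of values (10 pairs for 5 values)
pair_diffs <- as.vector(dist(x))

mean(pair_diffs)        # mean absolute pairwise difference: 3.8
mean(abs(diff(x)))      # mean absolute difference between consecutive values: 4.25
mean(abs(x - mean(x)))  # mean absolute deviation from the mean: 2.4
var(x)                  # sample variance: 9.5
sd(x)                   # sample standard deviation, about 3.08

## Identity: the sample variance is half the mean squared pairwise difference
mean(pair_diffs^2) / 2  # 9.5
```

With only two values, 3 and 11, the mean absolute pairwise difference reduces to the plain difference of 8 from the question.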
