Is there a way that I can run 100 different regressions together and get the output of all equations together in a table format?
Any software will work.
I need to find growth rates of 100 commodities using a log-linear model, so I have 100 equations with the dependent variable being ln(value of exports) and the independent variable being time (0 to 30).
Running the regression individually for 100 equations is a lot of manual work.
I just require the coefficient of t for each of the 100 equations. Is there any way to shorten the time spent doing so?
For example, assuming you have a data frame commodity_data in R with each commodity as a different column:
n <- ncol(commodity_data)             # number of commodities (one per column)
logslopes <- numeric(n)               # will hold the coefficient of t for each commodity
tvec <- 0:(nrow(commodity_data) - 1)  # time index: 0, 1, ..., nrow - 1
for (i in 1:n) {
  m <- lm(log(commodity_data[, i]) ~ tvec)  # log-linear fit for commodity i
  logslopes[i] <- coef(m)["tvec"]           # keep only the growth-rate coefficient
}
There are slicker ways of doing this, but this one should work fine.
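For instance, one slicker alternative (a sketch assuming the same commodity_data data frame, with every column strictly positive) loops over the columns with sapply and returns a named vector of slopes:
tvec <- 0:(nrow(commodity_data) - 1)
logslopes <- sapply(commodity_data, function(col) coef(lm(log(col) ~ tvec))["tvec"])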
Good evening,
Even though I know it will "destroy" an actual normal distribution, I need to set a maximum and a minimum on values drawn with rnorm in R.
I'm using survival rates in vultures to calculate population trends, and although I need them to fluctuate, for logical reasons survival rates can't be over 1 or under 0.
I tried doing it with ifs and elses, but I think there should be a better way to do it.
Thanks!
You could sample from a large normalized rnorm draw:
rbell <- function(n) {
  r <- rnorm(n * 1000)                       # draw a large normal sample
  sample((r - min(r)) / diff(range(r)), n)   # rescale to [0, 1], then keep n values
}
For example:
rbell(10)
#> [1] 0.5177806 0.5713479 0.5330545 0.5987649 0.3312775 0.5508946 0.3654235 0.3897417
#> [9] 0.1925600 0.6043243
hist(rbell(1000))
This will always be curtailed to the interval (0, 1).
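An alternative, not from the original answer, is simple rejection sampling: keep the mean and standard deviation you actually want for the survival rate and redraw only the values that fall outside [0, 1]. A minimal sketch (the mean and sd shown are placeholder values):
rnorm_bounded <- function(n, mean = 0.9, sd = 0.05, lower = 0, upper = 1) {
  out <- rnorm(n, mean, sd)
  while (any(bad <- out < lower | out > upper)) {  # redraw any out-of-range values
    out[bad] <- rnorm(sum(bad), mean, sd)
  }
  out
}
hist(rnorm_bounded(1000))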
I would like to find a maximum economic stress scenario restricted by a limit on the Mahalanobis distance of that scenario. For this, I have to consider two functions in the optimization.
To make it easier, we can work with a simplified problem: we have a simple linear model y = a + b*x, for which I want to minimize sum((a + b*x - y)^2). But I also have, for example, the restriction that (a*b*5)/2 < 30.
Solving this with the Excel Solver is not a problem, but how do I do it in R?
You could try to incorporate the constraint into the objective function, like this:
# example data whose exact solution lies outside the constraint
x <- runif(100, 1, 10)
y <- 3 + 5*x + rnorm(100, mean=0, sd=.5)
# big but not too big
bigConst <- sum(y^2) * 100
# if the variables lie outside the feasible region, add bigConst
f <- function(par, x, y) {
  sum((par["a"] + par["b"]*x - y)^2) +
    if (par["a"]*par["b"] > 12) bigConst else 0   # (a*b*5)/2 < 30  <=>  a*b < 12
}
# simulated annealing can deal with non-continuous objective functions
sol <- optim(par=c(a=1, b=1), fn=f, method="SANN", x=x, y=y)
# this is how the result looks
plot(x, y)
abline(a=sol$par["a"], b=sol$par["b"])
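A possible variation (my sketch, not part of the original answer) is to replace the hard jump with a smooth quadratic penalty so that gradient-based methods such as BFGS can also be used; the penalty weight of 1e6 is an arbitrary assumption:
f_smooth <- function(par, x, y, weight = 1e6) {
  viol <- max(0, par["a"]*par["b"] - 12)                 # amount by which a*b exceeds 12
  sum((par["a"] + par["b"]*x - y)^2) + weight * viol^2   # penalise violations smoothly
}
sol2 <- optim(par=c(a=1, b=1), fn=f_smooth, method="BFGS", x=x, y=y)
sol2$par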
I have a distribution of a parameter (natural gas mixture composition) expressed in percent. How can I test such data to estimate the distribution parameters (it should be a gamma, normal, or lognormal distribution) and generate random compositions based on those parameters in R?
This might be a better question for CrossValidated, but:
it is not generally a good idea to choose from among a range of possible distributions according to goodness of fit. Instead, you should choose according to the qualitative characteristics of your data (the usual flow charts for picking a distribution walk through exactly these characteristics).
Frustratingly, those charts don't usually include the best choice for your data (composition, continuous, bounded between 0 and 1 [or 0 and 100]), which is a Beta distribution (although there are technical issues if you have values of exactly 0 or 100 in your sample).
In R:
## some arbitrary data (percent compositions)
z <- c(2, 8, 40, 45, 56, 58, 70, 89)
## fit (beta values must be in (0,1), not (0,100), so divide by 100)
(m <- MASS::fitdistr(z/100, "beta", start=list(shape1=1, shape2=1)))
## sample 1000 new values, back on the percent scale
z_new <- 100*rbeta(n=1000, shape1=m$estimate["shape1"],
                   shape2=m$estimate["shape2"])
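As an optional follow-up (my addition, assuming the same z vector and fitted object m are in scope), you can eyeball the fitted Beta density against the rescaled data:
hist(z/100, freq=FALSE, xlim=c(0, 1), main="Beta fit", xlab="proportion")
curve(dbeta(x, m$estimate["shape1"], m$estimate["shape2"]), add=TRUE)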
I developed a model for my fraud detection dataset that contains 100000 records.
In my dataset, I treated 70% of the data as training data and 30% as testing data. Before fitting the final model on the training data, I scaled it using scale=TRUE in R.
But I can't scale the prediction (i.e., testing) data on its own.
How do I scale the new data?
If you want to scale the new vector (v2) using the centring and scaling parameters used to scale the original vector (v1) you can do:
v1 <- 1:10            # original (training) vector
v1_scl <- scale(v1)   # scale() stores the centring/scaling parameters as attributes
v2 <- sample(20, 10)  # new (test) vector
v2_scl <- (v2 - attr(v1_scl, 'scaled:center')) / attr(v1_scl, 'scaled:scale')
or if you've used the default of centring v1 on its mean and scaling by its standard deviation, you can do:
v2_scl <- (v2 - mean(v1)) / sd(v1)
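The same idea extends to a whole training matrix or data frame; here is a brief sketch (the train and test objects are made-up placeholders, not your fraud data) that reuses the training attributes via scale()'s center and scale arguments:
train <- matrix(rnorm(200), ncol = 2)  # stand-in for the training features
test  <- matrix(rnorm(20),  ncol = 2)  # stand-in for the new/prediction features
train_scl <- scale(train)              # column-wise centring and scaling
test_scl  <- scale(test,
                   center = attr(train_scl, "scaled:center"),
                   scale  = attr(train_scl, "scaled:scale"))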
I have a dataset here with latitude, longitude, and salinity for an area. I have these data for three different cases: the first case is for normal flow conditions, the second is for high flow, and the third is for a water level rise.
I want to understand how these data can be used and what type of analysis can be done with them.
My data set is uploaded on https://www.dropbox.com/s/285iuyv6bugm48p/dataanalysisforthreetimes.csv
Some of the things that come to my mind are:
Finding the increase or decrease in salinity for each case, or even a pattern.
Mean salinity under the different conditions.
The code that I used to start in R is as follows:
mydata <- read.csv("dataanalysisforthreetimes.csv")
head(mydata)
library(reshape2)
data1 <- melt(mydata,"Lat","Long")
Would you suggest whether I can fit any linear model to my data? Any suggested techniques are highly appreciated.
I want to use R for the analysis. Can you suggest any reading as well?
Mean salinity for all three conditions:
data1 <- melt(mydata,id=c("Lat","Long"))
aggregate(value ~ variable, mean, data=data1)
# variable value
#1 Highflow 4.039384
#2 Levelrise 32.238867
#3 Normal 21.153334
That is how you get the mean for your conditions. As for linear models, you are probably best off googling "linear models with spatial autocorrelation in R" to get you started.
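As a very rough starting point before worrying about spatial autocorrelation (my sketch, not part of the original answer, and it ignores the spatial structure entirely), you could fit an ordinary linear model on the melted data, with the flow condition and location as predictors:
fit <- lm(value ~ variable + Lat + Long, data = data1)  # variable = flow condition
summary(fit)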