Multiple regressions with loop in loop in R

I want to run the following regressions. The problematic variable is EP, a dummy variable for which I must check different cases; z (length = 1000) is the threshold variable. I want to create 1000 different versions of EP from the z variable and save the coefficients. I use a loop inside a loop, but the results are completely wrong. The code runs without raising an error. The square brackets and parentheses are as in the code I run. The problem is that it is extremely slow: after two hours it is still running.
I reduced the sample by 99% and still did not get a result; the code ran without a problem.
I do not want anything special: just to run a different regression for each value of z and store the estimates. I cannot understand why it takes so long. Any idea?
for (k in 1:1000){
z<-u[k]
for (i in 1:length(dS)){
if (dS[i]>=z) {
EP[i]=1
} else {
EP[i]=0
}
fitT <- dynlm(dR ~ L(dR,1)+L(EN)+L(EP)+L(ΚΜ,1)
prob[[k]] <- summary(fitT)$coefficients[1, 2]
}

You don't have a closing } for the i-loop; you also don't have a closing ) for dynlm.
Note, you can really replace your i-loop by
EP <- as.integer(dS >= z)
Next time you ask a question, be clear and specific. What do you mean by "I use a loop in loop but the results are completely wrong"? Is there an error message, for example?
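As a rough illustration of both fixes together (balanced braces and the vectorised dummy), here is a minimal sketch on simulated data. It is not the asker's model: base R lm with manually lagged regressors stands in for dynlm, the names n, est and d are hypothetical, and it stores the coefficient on the lagged dummy rather than the [1, 2] entry of the coefficient table.

```r
set.seed(1)
n  <- 200
dS <- rnorm(n)                                   # simulated stand-in for dS
dR <- rnorm(n)                                   # simulated stand-in for dR
u  <- quantile(dS, probs = seq(0.1, 0.9, length.out = 9))  # candidate thresholds

est <- numeric(length(u))                        # preallocate the result vector
for (k in seq_along(u)) {
  EP <- as.integer(dS >= u[k])                   # vectorised dummy, no inner loop
  # Lag the regressors by hand: y[t] is explained by dR[t-1] and EP[t-1]
  d <- data.frame(y = dR[-1], dR_lag = dR[-n], EP_lag = EP[-n])
  fit <- lm(y ~ dR_lag + EP_lag, data = d)
  est[k] <- coef(fit)[["EP_lag"]]                # one estimate per threshold
}
est
```

With the inner loop gone, each threshold costs one vector comparison plus one regression, so the run time grows with the number of thresholds rather than with thresholds times observations.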

Related

How can I add a simulation counter to each simulation in parallel processing? R

I need to add a simulation counter to each simulation in my code using parallel processing. So for each simulation there should be an additional value stating this is "simulation x", the next simulation will be "simulation x+1" etc. which will be stored in an additional column. The problem is that when I attempt to add a counter with a for loop then the counter only stores one digit for each combination of beta, theta and delta; not for each iteration as well. i.e. the pseudo code to help visualise this attempted solution is:
counter<-1
start parallelisation{
function
counter<-counter+1
}
end parallelisation
I've created a very simplified version of my code; hopefully, if you can find a solution to this problem, I can apply the same solution to the more complex script. Note that I am using 20 cores; you will of course need to specify a reasonable number of cores based on your PC specifications. Below is the code:
library("parallel")
betavalues<-seq(from=50,to=150,length.out=3)
thetavalues<-seq(from=200,to=300,length.out=3)
deltavalues<-seq(from=20,to=140,length.out=3)
outputbind<-c()
iterations<-5
examplefunction<- function(i=NULL){
for (j in betavalues){
for(k in thetavalues){
for(l in deltavalues){
output<-data.frame(beta=j,theta=k,delta=l)
outputbind<-rbind(outputbind,output)
}
}
}
data<-data.frame(beta=outputbind$beta,theta=outputbind$theta,delta=outputbind$delta)
}
cl <- makeCluster(mc <- getOption("cl.cores", 20))
clusterExport(cl=cl, varlist=ls())
par_results <- parLapply(1:iterations, fun=examplefunction, cl=cl)
clusterEvalQ(cl,examplefunction)
stopCluster(cl)
data <- do.call("rbind", par_results)
To clarify, I wish to add an additional column to data that will state the simulation number.
This problem has been bugging me for weeks, and a solution would be immensely appreciated!
Edit: Adding a sequence of numbers based on the length of data post parallel processing is not a sufficient solution, as the length of each simulation will vary in the more complicated script. Therefore, the solution needs to be added within or prior to the object data being created.
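One possible approach, sketched here under assumptions rather than taken from an accepted answer: parLapply already passes each element of 1:iterations to the function as its first argument, so that value can serve as the simulation number and be attached inside the function, before the results are combined. The sketch uses 2 workers instead of 20, builds the parameter grid with expand.grid instead of nested loops, and passes the parameter vectors as extra arguments instead of exporting them; the column name simulation is hypothetical.

```r
library(parallel)

betavalues  <- seq(from = 50,  to = 150, length.out = 3)
thetavalues <- seq(from = 200, to = 300, length.out = 3)
deltavalues <- seq(from = 20,  to = 140, length.out = 3)
iterations  <- 5

# i is the element of seq_len(iterations) handed over by parLapply,
# so it identifies the simulation regardless of which worker runs it.
examplefunction <- function(i, betavalues, thetavalues, deltavalues) {
  grid <- expand.grid(delta = deltavalues, theta = thetavalues, beta = betavalues)
  data.frame(simulation = i, grid[, c("beta", "theta", "delta")])
}

cl <- makeCluster(2)  # assumption: 2 workers; adjust to the machine
par_results <- parLapply(cl, seq_len(iterations), examplefunction,
                         betavalues = betavalues,
                         thetavalues = thetavalues,
                         deltavalues = deltavalues)
stopCluster(cl)

data <- do.call(rbind, par_results)
```

Because the label is attached inside each call, it stays correct even when the number of rows differs between simulations.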

R: leave a dataset out in a loop

I've got a problem doing some calculations in R: I've got a large number of datasets A[i], and on each of those datasets I'm running some iterated calculations until the difference between two iterations becomes small enough. However, for one particular A[j], my calculations take way too long, so I suspect that the data doesn't fit my method too well and I want to leave it out.
So my question is: can I write a condition in my while loop, such that if a certain time period is exceeded, R just disregards that dataset and goes on to the next one? So does there exist something like this:
while (abs (a-b) > 0.01){
calculations
for (j in 1:n){
if (time > amount){results A[j] <- 0}
}
}
Thanks in advance!
Record Sys.time() before you enter the loop, then check the time again inside the loop to see how long the loop has been running.
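A minimal sketch of that idea, with a toy iteration standing in for the real calculations. The function name iterate_with_timeout, its arguments and the 0.2 second limit are all hypothetical; a dataset that exceeds the limit is marked NA here rather than 0, so it cannot be mistaken for a real result.

```r
iterate_with_timeout <- function(start_value, rate, tol = 0.01, max_seconds = 0.2) {
  t0 <- Sys.time()                       # time stamp taken before the loop
  a <- start_value
  b <- Inf
  while (abs(a - b) > tol) {
    elapsed <- as.numeric(difftime(Sys.time(), t0, units = "secs"))
    if (elapsed > max_seconds) {
      return(NA_real_)                   # give up on this dataset
    }
    b <- a
    a <- rate * a                        # placeholder for the real calculations
  }
  a
}

# rate = -1 oscillates forever, so the second case hits the time limit
rates   <- c(0.5, -1, 0.9)
results <- sapply(rates, function(r) iterate_with_timeout(1, r))
results
```

The outer sapply simply moves on to the next case once a call returns, so no extra bookkeeping is needed to skip the slow dataset.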

Trying to find mean of each column in a data set

Hello everyone, I am fairly new to R programming and I was wondering if someone could help me out. I was just playing with R and wanted to write a function that returns a vector of the means of each column in a data set that the user passes in as an argument. The problem is that I am trying to do it without mean or the apply functions, so I am computing it manually and feel I am very close to finishing. Could someone check it to see where I made an error?
Here is my code:
findMeans<- function(data)
{
meanVec <- numeric()
for(i in 1:6)
{
mean=0
for( j in 1:153)
{
value=0
count=0
if(is.na(data[j,i])==FALSE)
{
value= value + data[i,j]
count=count+1
}
else
{
value= value +0
}
}
mean =value/count
meanVec[i]<-mean
}
meanVec
}
and when I try to list the vector it just gives this
> meanVec
numeric(0)
could anyone possibly shed some light on what I am doing wrong?
If you're looking for function writing practice, and are already aware of the colMeans function, there's a couple errors I spotted.
1) I assume that when you're going from 1:6, you're going through each column in your data frame, and 1:153, you're going through each row. If this is accurate, your value=0 and count = 0 statements should be moved a level up, next to mean = 0. Otherwise, you're resetting the value to zero every row you go through, which won't do anything but report the last value it comes across.
2) In the line value= value + data[i,j], you need data[j,i] instead. You reversed the row and column values.
With those two changes, your function seems to work for a data set with 6 columns and 153 rows. For more practice, I'd recommend trying to find a way to generalize the function for any number of columns and rows.
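For reference, the two changes above, plus the suggested generalisation to any number of rows and columns, might look like the following sketch. The small data frame df is made up for illustration.

```r
findMeans <- function(data) {
  meanVec <- numeric(ncol(data))
  for (i in seq_len(ncol(data))) {        # one pass per column
    value <- 0                            # reset once per column, not per row
    count <- 0
    for (j in seq_len(nrow(data))) {      # one pass per row
      if (!is.na(data[j, i])) {           # row index first, column index second
        value <- value + data[j, i]
        count <- count + 1
      }
    }
    meanVec[i] <- value / count
  }
  meanVec
}

df <- data.frame(a = c(1, NA, 3), b = c(2, 4, 6))  # hypothetical test data
result <- findMeans(df)
result
```

Note that meanVec only exists inside the function, so the return value has to be captured (as in result above) rather than looked up by typing meanVec at the prompt.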

store results of for loop in unique objects

Here is a simple loop
for (i in seq(1,30)) {
mdl<-i
}
How do I get 30 mdl objects rather than just one mdl? (That is happening because, within the loop, the value from iteration i is replaced by the value from iteration i+1.) What I want is to have 30 mdl objects, perhaps with names like mdl1, mdl2, ..., mdl30.
I tried this:
for (i in seq(1,30)) {
mdli<-i
}
But if I type mdl1, it says mdl1 not found whereas typing mdli gives me the value of i=5
Thank you
You can define your storage variable beforehand without specifying how many values it will store. If you want a separate variable for each value, take a look at the paste function.
x<- NULL
for (i in 1:10){
x[i] <- i*2
}
*edit: The comment above is right. This is not the most efficient way, but I still use it when computation time is not an issue.
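Two hedged sketches of what the answer hints at. The first preallocates a named list, which is usually easier to work with afterwards; the second creates separate variables mdl1 ... mdl30 by combining paste0 with assign (assign is not mentioned in the answer and is added here as an assumption about what was intended).

```r
# Option 1: one preallocated list holding all 30 results
mdl <- vector("list", 30)
for (i in seq_len(30)) {
  mdl[[i]] <- i                          # replace i with the fitted model
}
names(mdl) <- paste0("mdl", seq_along(mdl))
mdl$mdl5

# Option 2: 30 separate variables named mdl1, mdl2, ..., mdl30
for (i in seq_len(30)) {
  assign(paste0("mdl", i), i)            # builds the name, then binds the value
}
mdl30
```

The list version keeps the results together, so they can be passed to lapply or sapply in one go.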

Error in "If" statement in R coding

This is my code:
#Start of my Code#
test1<-function(c,x){
high=0
low=0
samp=NULL
samp=sample(c,x)
for(i in 1:x){
if(samp[i]>1){high=high+1}
else if (samp[i]<0){low=low+1}}
c(high,low,mean(samp),var(samp),samp)
}
sim1 <-function(c,x){
replicate(nsim,{test1(c,x)})}
size=10
a<-sim1(overall,size)
listnormwor=NULL
countnormwor=0
meannormwor=NULL
for(i in 0:nsim-1){
if (a[1+(size+4)*i]+a[2+(size+4)*i]==0){
countnormwor=countnormwor +1
for (z in 5:(size+4)){
listnormwor=c(listnormwor, a[z+(size+4)*i])}
meannormwor=c(meannormwor,a[3+(size+4)*i])}
}
countnormwor
mean(meannormwor)
var(listnormwor)
Simply put, I want to say: if there are no outliers (indicated by 0 in the first and second values of every 14 data points), count the sample in a normal bucket and keep its values to calculate the variance and mean later.
But the problem is that it collects all values from a, and only at the very end does it provide the values I actually want.
For example, it must satisfy length(listnormwor) = 10 * countnormwor
But it gives me a ridiculous amount of data, and when I play around with the if statement, it says "missing value where TRUE/FALSE needed."
I'd suggest stepping through the code (sending each line to the interpreter) one line at a time. Inspect the value of variables by calling them in the interpreter. I bet this will lead you to the source of your problem. To start, create the values x and c inside the function then work from there. Instead of running the for loop, create your own index variable i. Again, the point is to work line by line and carefully check your expectations against the values that variables take at each point.
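Stepping through the loop header this way turns up one likely culprit, which the sketch below isolates: in R the colon operator binds more tightly than subtraction, so 0:nsim-1 is read as (0:nsim) - 1 and starts at -1. The vector a here is only a stand-in for the simulation output, and nsim is set to 3 for illustration.

```r
nsim <- 3
size <- 10
a <- seq_len((size + 4) * nsim)     # stand-in for the flattened simulation output

wrong <- 0:nsim - 1                 # parsed as (0:nsim) - 1, i.e. -1, 0, 1, 2
right <- 0:(nsim - 1)               # the intended 0, 1, 2

i <- wrong[1]                       # first pass of the original loop: i is -1
first_index <- 1 + (size + 4) * i   # evaluates to -13
# A negative index removes an element instead of selecting one, so this
# returns almost the whole vector rather than a single value.
n_selected <- length(a[first_index])
n_selected
```

With i equal to -1, every a[...] lookup in the loop body returns nearly all of a, which would explain both the excess data in listnormwor and a condition that is no longer a single TRUE or FALSE. Writing the header as for (i in 0:(nsim-1)) avoids that first pass.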
