Error in function: argument is of length zero in RStudio

deadcheck<-function(a,t){ #function to check if dead for specific age at a time age sending to function
roe<-which( birthmort$age[i]==fertmortc$min & fertmortc$max) #checks row in fertmortc(hart) to pick an age that meets min and max age requirements I think this could be wrong...
prob<-1-(((1-fertmortc$mortality[roe])^(1/365))^t) #finds the prob for the row that meets the above requirements
if(runif(1,0,1)<=prob) {d<-TRUE} else {d<-FALSE} #I have a row that has the probability of death every 7 days.
return(d) #outputs if dead
}
Background: I am creating an agent-based model, stored as a population in a dataframe, that simulates how tuberculosis spreads in a population. (I know that there are probably 10000 better ways of having done this.) I have thus far created a loop that populates my dataframe with people, ages, etc. I am now trying to create a function that looks up a chart listing the probability of death per year for each age bracket: 0-5, 5-10, 10-15, etc. (The math is in there because I want it to check who lives, dies, or makes babies every 7 days.) I have a similar function that checks who is pregnant, and it works. However, I can't for the life of me figure out why this function is not working. I keep getting the following error:
Error in if (runif(1, 0, 1) <= prob) { : argument is of length zero
I am unsure how to fix this.
I apologize in advance if this is a dumb question; I have been trying to teach myself to code over the last 4-5 months. If I asked this question in the wrong format or incorrectly, please let me know how to do so correctly.

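The value of prob has length zero. That means prob is empty in this case, for example
prob = NULL
or numeric(0), which is what you get when which() finds no matching row. Try altering your code to add
print(prob)
before the if statement so you can check the intermediate result.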
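To see how an empty value ends up in the if condition, here is a minimal sketch. The fertmortc table below is a made-up stand-in (the real one was not posted), and the age 99 is chosen on purpose so that no row matches.

```r
# Hypothetical stand-in for the asker's fertmortc chart (values are made up)
fertmortc <- data.frame(min = c(0, 5, 10),
                        max = c(5, 10, 15),
                        mortality = c(0.02, 0.01, 0.005))

# No bracket has min == 99, so which() returns integer(0)
roe <- which(99 == fertmortc$min)

# Indexing with integer(0) gives numeric(0), and the arithmetic keeps it empty
prob <- 1 - (((1 - fertmortc$mortality[roe])^(1/365))^7)
print(length(prob))

# if() needs a condition of length one, so an empty condition is an error
msg <- tryCatch(
  if (runif(1, 0, 1) <= prob) TRUE else FALSE,
  error = function(e) conditionMessage(e)
)
print(msg)
```

Checking length(prob) (or length(roe)) right before the if statement is usually the quickest way to confirm this is what is going on.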

As you suspected in your comments, the expression
birthmort$age[i]==fertmortc$min & fertmortc$max
is problematic. It evaluates the comparison birthmort$age[i]==fertmortc$min first, and then combines the result of that comparison with fertmortc$max using the and operator. This forms the and of a Boolean value and a number, which is unlikely to make much sense: R treats any nonzero number as TRUE, so the expression effectively only tests whether the age is exactly equal to a bracket's minimum.
Just guessing, you perhaps want:
birthmort$age[i] >= fertmortc$min & birthmort$age[i] <= fertmortc$max
I don't know if this will fix your problem -- you haven't given enough to test it. For optimal help, you should give a reproducible example. See this for how to do so in R
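For what it's worth, here is a sketch of how the whole function might look with a range comparison. Everything about the data is assumed: the fertmortc values are made up, the chart argument and the half-open brackets (so that age 5 falls only in 5-10) are my choices, and the function uses its argument a rather than the global birthmort$age[i].

```r
# Hypothetical stand-in for fertmortc: yearly mortality per age bracket (made-up values)
fertmortc <- data.frame(min = c(0, 5, 10),
                        max = c(5, 10, 15),
                        mortality = c(0.02, 0.01, 0.005))

# a = age in years, t = length of the time step in days
deadcheck <- function(a, t, chart = fertmortc) {
  # Assumption: brackets are half-open [min, max), so age 5 matches 5-10 only
  roe <- which(a >= chart$min & a < chart$max)
  if (length(roe) != 1) {
    stop("age ", a, " matches ", length(roe), " brackets")
  }
  # Convert the yearly death probability into a t-day probability
  prob <- 1 - (1 - chart$mortality[roe])^(t / 365)
  runif(1, 0, 1) <= prob
}

set.seed(1)
print(deadcheck(7, 7))
```

The explicit length check turns the confusing "argument is of length zero" into an error that names the age that failed to match.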

Related

ezANOVA not providing Greenhouse-Geisser corrected df though sphericity is violated

I've noticed that sometimes when I use ezANOVA from package ez I get columns stating what the Greenhouse-Geisser corrected df values are, and other times the tables with the sphericity corrections do not include the new df values, even though there are violations. For example, I just ran a 2-way repeated measures ANOVA, and my table output looks like this:
I wish I could give repeatable data, but I genuinely don't know why it does or doesn't do it sometimes. Does anyone else know? I'll show my code below in case there's something I'm missing regarding the actual ezANOVA function. I could do the Df values by hand, but haven't found a good resource online to show me how to correct them using the epsilon value and I unfortunately was never taught that.
ez::ezANOVA(data = IntA2, wid = rat, dv = numReinforcers, within = .(component, minute))
Edit: A very nice person on the internet has explained to me how to calculate the new df values by hand (multiplying the old df values by the GG epsilon, in case anyone else was wondering!), but I'm still unclear on why the function sometimes does it for you and other times does not.
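In case it helps anyone else, the by-hand correction can be sketched in a few lines. The helper name gg_correct_df and all of the numbers below are hypothetical; they are not taken from my ezANOVA output.

```r
# Multiply both degrees of freedom by the Greenhouse-Geisser epsilon
gg_correct_df <- function(df_num, df_den, epsilon) {
  c(df_num = epsilon * df_num, df_den = epsilon * df_den)
}

# Hypothetical values for illustration only
f_value <- 3.5
corrected <- gg_correct_df(df_num = 4, df_den = 36, epsilon = 0.62)
print(corrected)

# p-value of the same F statistic under the corrected df
p_corrected <- pf(f_value, corrected[["df_num"]], corrected[["df_den"]],
                  lower.tail = FALSE)
print(p_corrected)
```

The F value itself does not change; only the degrees of freedom used to look up the p-value do.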

Mean value for different groups

I am stuck with a 'for' loop and would greatly appreciate some help.
I have a dataframe called 'df' that includes the number of people per household (household_size), ranging from 0 (I replaced the missing values with a 0) to 8, as well as the number of cars (car).
My aim is to write a quick piece of code that computes the average number of cars for each household size.
I tried the following:
avg <- function(df){
i <- df$household_size
for (i in 0 : 8){
print(mean(df$car))
}
}
I'm pretty sure I'm missing something really basic here, but I don't know what.
Thanks everyone for your input.
I wouldn't have used a function for this. However, this is an exercise as part of an introductory coding with R module that specifically requires a for-loop.
Here is a solution that prints the mean for each household size using a for loop. Let me know if it works.
for(i in unique(df$household_size)){
print(paste(i,' : ',mean(df$car[df$household_size==i])))
}
As mentioned in a comment, I took away the function part because I don't see the point of having it. But if a function is mandatory, you can use lapply, which in my view behaves a bit like a for loop:
lapply(unique(df$household_size), function(i){
return(paste(i,' : ',mean(df$car[df$household_size==i])))
}
)
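If the exercise needs both a function and a for loop, the two can be combined. This is only a sketch on a toy dataframe that I made up; the column names household_size and car are taken from the question.

```r
# Toy data standing in for the asker's df (values are made up)
df <- data.frame(household_size = c(1, 1, 2, 2, 3),
                 car = c(0, 1, 1, 2, 2))

avg <- function(df) {
  sizes <- sort(unique(df$household_size))
  out <- numeric(length(sizes))
  names(out) <- sizes
  for (k in seq_along(sizes)) {
    # Keep only the rows for this household size before averaging
    out[k] <- mean(df$car[df$household_size == sizes[k]])
  }
  out
}

print(avg(df))
```

The key difference from the code in the question is the subsetting inside mean(); mean(df$car) on its own averages over every row regardless of the loop variable. Outside of the exercise, tapply(df$car, df$household_size, mean) gives the same numbers without a loop.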

nTrials must not be greater... issue on conjoint design

I'm trying to create a list of conjoint cards using R.
I have followed the professor's introduction with my own dataset, but I'm stuck on this issue and have no idea what causes it.
library(conjoint)
experiment<-expand.grid(
ServiceRange = c("RA", "Active", "Passive","Basic"),
IdentProce = c("high", "mid", "low"),
Fee = c(1000,500,100),
Firm = c("KorFin","KorComp","KorStrt", "ForComp")
)
print(experiment)
design=caFactorialDesign(data=experiment, type="orthogonal")
print(design)
At the "design" line, I keep getting the following error message:
Error in optFederov(~., data, nTrials = i, approximate = FALSE, nRepeats = 50) :
nTrials must not be greater than the number of rows in data
How do I address this issue?
You're getting this error because you have 144 rows in experiment, but the nTrials mentioned in the error gets bigger than 144. This causes an error for optFederov(), which is called inside caFactorialDesign(). The problem stems from the fact that your Fee column has relatively large values.
I'm not familiar with how the conjoint package is set up, but I can show you how to troubleshoot this error. You can read the conjoint documentation for more on how to select appropriate experimental data.
(Note that the example data in the documentation always has very low numeric values, usually values between 1-10. Compare that with your Fee vector, which has values up to 1000.)
You can see the source code for a function loaded into your RStudio namespace by highlighting the function name (e.g. caFactorialDesign) and hitting Command-Return (on a Mac - probably something similar on PC). You can also just look at the source code on GitHub.
The caFactorialDesign is implemented here. That link highlights the line (26) that is throwing the error for you:
temp.design<-optFederov(~., data, nTrials=i, approximate=FALSE, nRepeats=50)
Recall the error message:
nTrials must not be greater than the number of rows in data
You've passed in experiment as the data parameter, so nrow(experiment) will tell us what the upper limit on nTrials is:
nrow(experiment) # 144
We can actually just think of the error for this dataset as:
nTrials must not be greater than 144
Ok, so how is the value for nTrials determined? We can see nTrials is actually an argument to optFederov(), and its value is set as i - often a sign that there's a for-loop wrapping an operation. And in fact, that's what we see:
for (i in ca.number: profiles.number)
{
temp.design<-optFederov(~., data, nTrials=i, approximate=FALSE, nRepeats=50)
...
}
This tells us that optFederov() is going to get called for each value of i in the loop, which will start at ca.number and will go up to profiles.number (inclusive).
How are these two variables assigned? If we look a little higher up in the caFactorialDesign() definition, ca.number is defined on lines 5-9:
num <- data.frame(data.matrix(data))
vars.number<-length(num)
levels.number<-0
for (i in 1:length(num)) levels.number<-levels.number+max(num[i])
ca.number<-levels.number-vars.number+1
You can run these calculations outside of the function - just remember that data == experiment. So just change that first line to num <- data.frame(data.matrix(experiment)), and then run that chunk of code. You can see that ca.number == 1008!!
In other words, the very first value of i in the for-loop which calls optFederov() is already way bigger than the max limit: 1008 >> 144.
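Here is that check as a self-contained chunk, using the experiment definition from the question. It only reproduces the arithmetic from the lines quoted above (with the loop replaced by sum(sapply(...)), which I believe is equivalent here); it does not call the conjoint package.

```r
experiment <- expand.grid(
  ServiceRange = c("RA", "Active", "Passive", "Basic"),
  IdentProce = c("high", "mid", "low"),
  Fee = c(1000, 500, 100),
  Firm = c("KorFin", "KorComp", "KorStrt", "ForComp")
)

# Factors become integer codes (1..k); numeric columns keep their values
num <- data.frame(data.matrix(experiment))
vars.number <- length(num)

# Sum of the per-column maxima: 4 + 3 + 1000 + 4
levels.number <- sum(sapply(num, max))
ca.number <- levels.number - vars.number + 1

print(nrow(experiment))
print(ca.number)
```

The Fee column alone contributes 1000 to levels.number, which is what pushes ca.number past the number of rows.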
It's possible you can include these numeric values as factors or strings in your definition of experiment - I'm not sure if that is an appropriate way to do this analysis. But I hope it's clear that you won't be able to use such large values in caFactorialDesign(), unless you have a much larger number of total observations in your data.
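As a sketch of the factor idea, wrapping Fee in factor() makes data.matrix() see it as three level codes instead of the raw amounts. Again, this is an assumption about what the analysis should look like, and I have only checked the starting value of the loop, not the design that caFactorialDesign() would produce.

```r
experiment <- expand.grid(
  ServiceRange = c("RA", "Active", "Passive", "Basic"),
  IdentProce = c("high", "mid", "low"),
  Fee = factor(c(1000, 500, 100)),  # treated as 3 categories, not as amounts
  Firm = c("KorFin", "KorComp", "KorStrt", "ForComp")
)

num <- data.frame(data.matrix(experiment))

# Same arithmetic as in caFactorialDesign(): now 4 + 3 + 3 + 4 - 4 + 1
ca.number <- sum(sapply(num, max)) - length(num) + 1

print(ca.number)
print(ca.number <= nrow(experiment))
```

With this change the first nTrials value is well below the 144 rows, so this particular check should no longer fail.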

Matrix help: Finding average without the zeros

I'm creating a Monte Carlo model using R. My model creates matrices that are filled with either zeros or values that fall within the constraints. I'm running a couple hundred thousand n values through my model, and I want to find the average of the nonzero entries of the matrices that I've created. I'm guessing I can do something in the last section.
Thanks for the help!
Code:
n<-252500
PaidLoss_1<-numeric(n)
PaidLoss_2<-numeric(n)
PaidLoss_3<-numeric(n)
PaidLoss_4<-numeric(n)
PaidLoss_5<-numeric(n)
PaidLoss_6<-numeric(n)
PaidLoss_7<-numeric(n)
PaidLoss_8<-numeric(n)
PaidLoss_9<-numeric(n)
for(i in 1:n){
claim_type<-rmultinom(1,1,c(0.00166439057698873, 0.000810856947763742, 0.00183509730283373, 0.000725503584841243, 0.00405428473881871, 0.00725503584841243, 0.0100290201433936, 0.00529190850119495, 0.0103277569136224, 0.0096449300102424, 0.00375554796858996, 0.00806589279617617, 0.00776715602594742, 0.000768180266302492, 0.00405428473881871, 0.00226186411744623, 0.00354216456128371, 0.00277398429498122, 0.000682826903379993))
claim_type<-which(claim_type==1)
claim_Amanda<-runif(1, min=34115, max=2158707.51)
claim_Bob<-runif(1, min=16443, max=413150.50)
claim_Claire<-runif(1, min=30607.50, max=1341330.97)
claim_Doug<-runif(1, min=17554.20, max=969871)
if(claim_type==1){PaidLoss_1[i]<-1*claim_Amanda}
if(claim_type==2){PaidLoss_2[i]<-0*claim_Amanda}
if(claim_type==3){PaidLoss_3[i]<-1* claim_Bob}
if(claim_type==4){PaidLoss_4[i]<-0* claim_Bob}
if(claim_type==5){PaidLoss_5[i]<-1* claim_Claire}
if(claim_type==6){PaidLoss_6[i]<-0* claim_Claire}
}
PaidLoss1<-sum(PaidLoss_1)/2525
PaidLoss3<-sum(PaidLoss_3)/2525
PaidLoss5<-sum(PaidLoss_5)/2525
PaidLoss7<-sum(PaidLoss_7)/2525
partial output of my numeric matrix
First, let me make sure I've wrapped my head around what you want to do: you have several columns -- in your example, PaidLoss_1, ..., PaidLoss_9, which have many entries. Some of these entries are 0, and you'd like to take the average (within each column) of the entries that are not zero. Did I get that right?
If so:
Comment 1: At the very end of your code, you might want to avoid using sum and dividing by a number to get the mean you want. It obviously works, but it opens you up to a risk: if you ever change the value of n at the top, then in the best case scenario you have to edit several lines down below, and in the worst case scenario you forget to do that. So, I'd suggest something more like mean(PaidLoss_1) to get your mean.
Right now, you have n as 252500, and your denominator at the end is 2525, which has the effect of inflating your mean by a factor of 100. Maybe that's what you wanted; if so, I'd recommend mean(PaidLoss_1) * 100 for the same reasons as above.
Comment 2: You can do what you want via subsetting. Take a smaller example as a demonstration:
test <- c(10, 0, 10, 0, 10, 0)
mean(test) # gives 5
test!=0 # a vector of TRUE/FALSE for which are nonzero
test[test!=0] # the subset of test which we found to be nonzero
mean(test[test!=0]) # gives 10, the average of the nonzero entries
The middle three lines are just for demonstration; the only necessary lines to do what you want are the first (to declare the vector) and the last (to get the mean). So your code should be something like PaidLoss1 <- mean(PaidLoss_1[PaidLoss_1 != 0]), or perhaps that times 100.
Comment 3: You might consider organizing your data into a single structure. Instead of typing PaidLoss_1, PaidLoss_2, etc., it might make sense to store all of the PaidLoss vectors as columns of a matrix. You could then access elements with [ , ] indexing. This would clean up the code and save you a lot of typing; you could also use the apply() family of functions to avoid repeating the same command (such as the mean) for each column. A dataframe or another structure would work too, but having some structure would make your life easier.
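A small sketch of what the matrix version could look like. The matrix paid_loss and its values are made up for the demonstration, with one column per PaidLoss vector; nonzero_mean is just a helper name I picked.

```r
# Toy matrix: each column plays the role of one PaidLoss_k vector
paid_loss <- matrix(c(10, 0, 10, 0,
                       0, 0,  4, 8,
                       0, 3,  0, 0),
                    ncol = 3)
colnames(paid_loss) <- paste0("PaidLoss_", 1:3)

# Mean of the nonzero entries of a vector
nonzero_mean <- function(x) mean(x[x != 0])

# Apply it to every column (MARGIN = 2) in one call
print(apply(paid_loss, 2, nonzero_mean))
```

One thing to watch for: a column with no nonzero entries gives NaN, because the mean of an empty vector is undefined.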
(And to be super clear, your code is exactly what my code looked like when I first started writing in R. You can decide if it's worth pursuing some of that optimization; it probably just depends how much time you plan to eventually spend in R.)

Getting the values of some elements from their differences

I am currently working on a particular algorithm, but I have run into a problem that I'm not sure how to resolve. I would appreciate it if anyone could help me out.
There are some objects {O1, O2, O3, ...}, each of which has a value we don't know; we call these values {V1, V2, V3, ...}. There is also another sequence w (w1, w2, w3, ...) that gives the differences between consecutive values: w1 = v2 - v1, w2 = v3 - v2, w3 = v4 - v3, and so on. I'm wondering if there is any way to get the values v1, v2, v3, etc. without knowing the value of v1?
Looking forward to your replies,
Thanks.
Not in general. Knowing the differences between successive numbers in a list of numbers under-determines the set of numbers. This is particularly obvious in the case when w1 = w2 = w3 = ... = wk = 1. That would tell you that the vi are consecutive numbers, but nothing else could be inferred. You wouldn't be able to distinguish 3,4,5,6,7 from 10,11,12,13,14 (for example).
Having said that, it would of course be possible if you know one of the numbers, and the known number wouldn't need to be the first one. Knowing any single one of the numbers would suffice. Furthermore, knowing something like the sum of the vi would be sufficient since you could express the sum as a function of the unknown number v1 and solve the resulting equation.
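To make both cases concrete, here is a short sketch in R (the question didn't name a language, so that choice is mine). The differences and the known values are made up to match the example above. The idea is that the cumulative sums of the wi give each vi as an offset from v1, so any one extra equation pins down v1.

```r
w <- c(1, 1, 1, 1)            # differences: w1 = v2 - v1, w2 = v3 - v2, ...
offsets <- c(0, cumsum(w))    # offsets[i] = v_i - v_1

# Case 1: one value is known, here (hypothetically) v3 = 12
k <- 3
v_k <- 12
v_from_known <- (v_k - offsets[k]) + offsets
print(v_from_known)

# Case 2: only the sum is known, here (hypothetically) S = 25
# S = n * v1 + sum(offsets), so v1 = (S - sum(offsets)) / n
S <- 25
v1 <- (S - sum(offsets)) / length(offsets)
v_from_sum <- v1 + offsets
print(v_from_sum)
```

With the same differences, the first case recovers 10,11,12,13,14 and the second recovers 3,4,5,6,7, which shows that the differences alone could not tell the two lists apart.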
