My function works in R console but not in R script

I tried to write some functions to calculate ANOVA power and sample size using the non-centrality parameter.
There are some very good functions in R for this, but I wrote mine to learn and reproduce the line of thought from a biostatistics book...
Despite the math involved, my "nc" and "fpower" functions work well, and as expected:
nc <- function(diff,n,sd) {
nonc <- (diff^2/2)*(n/sd^2)
return(nonc)
}
fpower <- function(k,n,diff,sd,alpha=0.05) {
nonc <- nc(diff,n,sd)
dfn <- k - 1
dfd <- k*(n-1)
f1 <- qf(1-alpha,dfn,dfd)
f2 <- pf(f1,dfn,dfd,nonc)
return(1-f2)
}
However, my "fsample" just doesn't work as expected. It returns 2, the first n in the sequence.
fsample <- function(k,diff,sd,alpha=0.05,power=0.9){
for(n in 2:5000){
if ( fpower(k,n,sd,alpha) >= power) break
}
return(n)
}
But if I run this code by hand in the console, it works as expected!!
And it returns the right n value.
What's wrong?

You didn't pass the diff argument to fpower, so the arguments aren't in the order you think they are. They are matched by position, which means sd is taken as diff and alpha is taken as sd. fsample should be:
fsample <- function(k,diff,sd,alpha=0.05,power=0.9){
for(n in 2:5000){
if ( fpower(k,n,diff,sd,alpha) >= power) break
}
return(n)
}
Note that this wouldn't have been a problem if you had named the arguments when you called fpower because you would have received an error about diff being missing and not having a default value:
# this will error
fsample <- function(k,diff,sd,alpha=0.05,power=0.9){
for(n in 2:5000){
if ( fpower(k=k,n=n,sd=sd,alpha=alpha) >= power) break
}
return(n)
}
Also, you might want to avoid giving data objects the same name as functions (e.g. diff, sd, and power are also functions), otherwise you may confuse yourself.
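To make the positional mix-up concrete, here is a minimal sketch using the functions from the question. The input values (k = 3, n = 2, diff = 1, sd = 1) are illustrative assumptions, not values from the original post.

```r
# Functions as given in the question.
nc <- function(diff, n, sd) {
  (diff^2 / 2) * (n / sd^2)
}

fpower <- function(k, n, diff, sd, alpha = 0.05) {
  nonc <- nc(diff, n, sd)
  dfn <- k - 1
  dfd <- k * (n - 1)
  f1 <- qf(1 - alpha, dfn, dfd)
  1 - pf(f1, dfn, dfd, nonc)
}

# Positional call with diff left out (illustrative values):
# sd (1) lands in diff and alpha (0.05) lands in sd.
shifted  <- fpower(3, 2, 1, 0.05)
explicit <- fpower(k = 3, n = 2, diff = 1, sd = 0.05)
identical(shifted, explicit)

# Named call with diff left out: R reports the missing argument instead.
msg <- tryCatch(
  fpower(k = 3, n = 2, sd = 1, alpha = 0.05),
  error = function(e) conditionMessage(e)
)
msg
```

With a tiny value in the sd slot, the non-centrality parameter becomes very large, so the power threshold is already met at n = 2 and the loop breaks on its first iteration.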

Related

Using group_modify to apply function to grouped dataframe

I am trying to apply a function to each group of data in the main dataframe and I decided to use group_modify() (since it returns a dataframe as well). Here is my initial code:
max_conc_fx <- function(df) {
highest_conc <- 0
for (i in 1:nrow(df)) {
curr_time <- df$event_time[i]
within1hr <- filter(df, abs(event_time - curr_time) <= hours(1))
num_buyers <- length(unique(within1hr$userid))
curr_conc <- nrow(within1hr)/num_buyers
if (curr_conc > highest_conc) {
highest_conc <- curr_conc
}
}
mutate(df, highest_conc)
}
conc_data <- group_modify(data, max_conc_fx)
However, I keep getting this error message:
Error in as_group_map_function(.f) :
The function must accept at least two arguments. You can use ... to absorb unused components
After some trial and error, I rectified this by adding the argument "..." to my max_conc_fx() function, which leads to this code which works:
max_conc_fx <- function(df, ...) { # df is the rows of data for one shop
highest_conc <- 0
for (i in 1:nrow(df)) {
curr_time <- df$event_time[i]
within1hr <- filter(df, abs(event_time - curr_time) <= hours(1))
num_buyers <- length(unique(within1hr$userid))
curr_conc <- nrow(within1hr)/num_buyers
if (curr_conc > highest_conc) {
highest_conc <- curr_conc
}
}
mutate(df, highest_conc)
}
conc_data <- group_modify(data, max_conc_fx)
Can someone explain to me what the dots are actually for in this case? I understood them to be used for representing an arbitrary number of arguments or for passing on additional arguments to other functions, but I do not see both of these events happening here. Do let me know if I am missing out something or if you have a better solution for my code.
The dots don't do much in that case, but group_modify() requires them for a function like yours. The function you pass is run through a helper, as_group_map_function(), which checks that it has at least two formal arguments or, failing that, that ... is among them:
## dplyr/R/group_map.R (Lines 2-8)
as_group_map_function <- function(.f) {
.f <- rlang::as_function(.f)
if (length(form <- formals(.f)) < 2 && ! "..." %in% names(form)){
stop("The function must accept at least two arguments. You can use ... to absorb unused components")
}
.f
}
I'm not 100% sure why it is done, but based on a quick peek on the source code it looks like there is a point where they pass two arguments and ... to the 'converted' version of your function (technically there is no conversion that happens – the conversion only takes place if you pass a formula instead of a function...), so my best guess is that is the reason: it needs to have some way of dealing with at least two arguments — if it doesn't need them, then it needs ... to 'absorb' them, otherwise it would fail.
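The check can be reproduced in base R without dplyr. The sketch below is only an illustration: accepts_group_args() is a hypothetical helper that mirrors the condition quoted above, and the two data frames stand in for the group rows and the group key that group_modify() passes as the first and second arguments.

```r
# Hypothetical helper mirroring the arity check quoted above.
accepts_group_args <- function(f) {
  form <- formals(f)
  length(form) >= 2 || "..." %in% names(form)
}

one_arg   <- function(df) df
with_dots <- function(df, ...) df
two_args  <- function(df, key) df

accepts_group_args(one_arg)    # rejected by the check
accepts_group_args(with_dots)  # accepted: ... is present
accepts_group_args(two_args)   # accepted: two formal arguments

# Calling with two positional arguments, the way the group data and key arrive.
group_rows <- data.frame(x = 1:2)
group_key  <- data.frame(shop = "a")
with_dots(group_rows, group_key)   # the key is absorbed by ...

# The one-argument version fails with an "unused argument" error.
one_arg_result <- tryCatch(one_arg(group_rows, group_key), error = function(e) e)
```

If the key is useful (for example to label the output), declaring it explicitly as a second argument, as in two_args, is an alternative to absorbing it with the dots.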

User defined function - issue with return values

I regularly come up against the issue of how to categorise dataframes from a list of dataframes according to certain values within them (E.g. numeric, factor strings, etc). I am using a simplified version using vectors here.
After writing messy for loops for this task a bunch of times, I am trying to write a function to repeatedly solve the problem. The code below returns a subscripting error (given at the bottom), however I don't think this is a subscripting problem, but to do with my use of return.
As well as fixing this, I would be very grateful for any pointers on whether there are any cleaner / better ways to code this function.
library(plyr)
library(dplyr)
#dummy data
segmentvalues <- c('1_P', '2_B', '3_R', '4_M', '5_D', '6_L')
trialvec <- vector()
for (i in 1:length(segmentvalues)){
for (j in 1:20) {
trialvec[i*j] <- segmentvalues[i]
}
}
#vector categorisation
vcategorise <- function(categories, data) {
#categorises a vector into a list of vectors
#requires plyr and dplyr
assignment <- list()
catlength <- length(categories)
for (i in 1:length(catlength)){
for (j in 1:length(data)) {
if (any(contains(categories[i], ignore.case = TRUE,
as.vector(data[j])))) {
assignment[[i]][j] <- data[j]
}
}
}
return (assignment)
}
result <- vcategorise(categories = segmentvalues, data = trialvec)
Error in *tmp*[[i]] : subscript out of bounds
The assignment assignment[[i]][j] <- data[j] first has to read assignment[[i]] and then index into the result. Because assignment is still an empty list at that point, there is no element i to read, and [[ raises "subscript out of bounds". The root cause is that you haven't allocated the list to be the right size.
In any case, I don't think it is necessary for you to allocate a table. You are already using a flat indexing structure in your test data generation, so why not do the same with assignment and then set its dimensions afterwards?
Something like this, perhaps?
vcategorise <- function(categories, data) {
assignment <- vector("list", length = length(data) * length(categories))
n <- length(data)
for (i in 1:length(categories)){
for (j in 1:length(data)) {
assignment[(i-1)*n + j] <-
if (any(contains(categories[i],
ignore.case = TRUE,
as.vector(data[j])))) {
data[j]
} else {
NA
}
}
}
dim(assignment) <- c(length(data), length(categories))
assignment
}
It is not the prettiest code, but without fully understanding what you want to achieve, I don't know how to go further.
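If the goal is simply one list element per category holding the values that match it, a base R version may be enough. This is a sketch under that assumption, not a drop-in replacement for the code above: vcategorise_grepl() is a hypothetical name, matching is a case-insensitive literal substring test, and NA values are dropped.

```r
# Hypothetical alternative: one list element per category, holding the matching values.
vcategorise_grepl <- function(categories, data) {
  data <- data[!is.na(data)]
  matches <- lapply(categories, function(category) {
    # Lower-case both sides so the literal (fixed) match ignores case.
    data[grepl(tolower(category), tolower(data), fixed = TRUE)]
  })
  setNames(matches, categories)
}

segmentvalues <- c("1_P", "2_B", "3_R")
trialvec <- c("1_P", "2_B", "1_p", "3_R", NA, "2_B")
result <- vcategorise_grepl(segmentvalues, trialvec)
result[["1_P"]]
```

Because the result is built with lapply(), there is no list to pre-allocate and no nested index assignment that can go out of bounds.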

R for-loop iterating from central value out to extremes

I'm trying to improve the speed of my code, which is trying to optimise a value using 3 variables which have large ranges. The most likely output uses values in the middle of the ranges, so it is wasting time starting from the lowest possible value of each variable. I want to start from the middle value and iterate out! The actual problem has thousands of lines with numbers from 150-650. C,H and O limits will be defined somewhat based on the starting number, but will always be more likely at a central value in the defined range. Is there a way to define the for loop to work outwards like I want? The only, quite shabby, way I can think of is to simply redefine the value within the loop from a vector (e.g. 1=20, 2=21, 3=19, etc). See current code below:
set_error<-2.5
ct<-c(325.00214,325.00952,325.02004,325.02762,325.03535,325.03831,325.04588, 325.05641,325.06402,325.06766,325.07167,325.07454,325.10396)
FormFun<-function(x){
for(C in 1:40){
for(H in 1:80){
for(O in 1:40){
test_mass=C*12+H*1.007825+O*15.9949146-1.0072765
error<-1000000*abs(test_mass-x)/x
if(error<set_error){
result<-paste("C",C,"H",H,"O",O,sep ="")
return(result)
break;break;break;break
}
}
}
}
}
old_t <- Sys.time()
ct2<-lapply(ct,FormFun)
new_t <- Sys.time() - old_t # calculate difference
print(new_t)
Use vectorization and create a closure:
FormFun1_fac <- function(gr) {
force(gr) # evaluate the argument now so the closure captures the grid
function(x, set_error){
test_mass <- with(gr, C*12+H*1.007825+O*15.9949146-1.0072765)
error <- 1000000 * abs(test_mass - x) / x
ind <- which(error < set_error)[1]
if (is.na(ind)) return(NULL)
paste0("C", gr[ind, "C"],"H", gr[ind, "H"],"O", gr[ind, "O"])
}
}
FormFun1 <- FormFun1_fac(expand.grid(C = 1:40, H = 1:80, O = 1:40))
ct21 <- lapply(ct, FormFun1, set_error = set_error)
all.equal(ct2, ct21)
#[1] TRUE
This saves a grid of all combinations of C, H, O in the function environment and calculates the error for all combinations (which is fast in vectorized code). The first combination that passes the test is returned.
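If you still want the loop itself to run from the middle of a range outwards, as the question asks, one option is to reorder the sequence before looping. A minimal sketch, where center_out() is a hypothetical helper and not part of the answer above:

```r
# Hypothetical helper: values of lo:hi ordered by distance from the midpoint.
center_out <- function(lo, hi) {
  values <- lo:hi
  values[order(abs(values - (lo + hi) / 2))]
}

center_out(1, 5)
head(center_out(1, 40), 4)

# The reordered vector can be used directly in a for loop.
visited <- integer(0)
for (C in center_out(1, 5)) {
  visited <- c(visited, C)
}
```

The same reordering could be applied to the rows of the grid so that the first match found is the one closest to the centre, although the vectorised version already evaluates every combination at once.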

Calling user inputs prompted through one R function in a second R function

Thank you very much in advance for helping me out - I am new to R programming, and have got stuck with trying to use user-inputs accepted through one function within another. The second function is a simple pay calculator where the three variables are number of hours, per hour pay rate and the number of times the rate is multiplied once the number of working hours exceeds 180. I have written a first function called enterval through which I am asking the user to enter the above variables. In a second function called salary, I am trying to use enterval to accept the inputs before running the payout calculations. I am getting stuck because the second function, salary, is breaking when I come to an "if" condition, specifying if h > 180. I am sharing my code below. Thanks again for your kind assistance. I searched among previous answers but could not find a specific instance that fully answered my query - apologies if I missed out an appropriate previous response.
The error I am getting on running this code is "Error in h > 180 :
comparison (6) is possible only for atomic and list types"
enterval <- function() {
h <- (readline("Please enter number of hours: "))
h <- as.integer(h)
r <- (readline("Please enter applicable rate: "))
r <- as.integer(r)
m <- (readline("Please confirm your multiplier: "))
m <- as.integer(m)
}
salary <- function () {
enterval()
if (h > 180) {
totalpay <- (180*r) + ((h-180)*r*m)
}
else {
totalpay <- (h*r)
}
totalpay
}
I think that what you need is the function to be like this:
enterval <- function() {
h <- (readline("Please enter number of hours: "))
h <- as.numeric(h)
r <- (readline("Please enter applicable rate: "))
r <- as.numeric(r)
m <- (readline("Please confirm your multiplier: "))
m <- as.numeric(m)
list(h=h, r=r, m=m)
}
salary <- function () {
inputs <- enterval()
if (inputs$h > 180) {
totalpay <- (180*inputs$r) + ((inputs$h-180)*inputs$r*inputs$m)
}
else {
totalpay <- (inputs$h*inputs$r)
}
totalpay
}
Output:
> salary()
Please enter number of hours: 5
Please enter applicable rate: 0.5
Please confirm your multiplier: 2
[1] 2.5
In your question, enterval just returned the value stored in m, and even that was not saved anywhere (you did not assign it to a variable inside salary, so salary could not use it). An R function returns only the value of its last evaluated expression (or whatever is passed to return(), if used). In the function above I return a list with elements h, r and m.
Then I save that list to inputs, which can be used by salary. Elements in inputs can be accessed using the $ operator.
Also, as a small addition, when you say rate I believe it is a number between 0 and 1, so I changed as.integer to as.numeric, because as.integer drops the fractional part (it truncates toward zero). Feel free to change that back to as.integer if you do need an integer.
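A quick check of the difference between the two conversions on the kind of strings readline() returns (the input strings here are made-up examples):

```r
# readline() always returns a character string, so the conversion matters.
rate_int <- as.integer("0.5")    # fractional part is dropped
rate_num <- as.numeric("0.5")    # value is kept
neg_int  <- as.integer("-1.7")   # truncation is toward zero, not downward
c(rate_int, rate_num, neg_int)
```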
EDIT
Better and probably more advanced way of writing salary:
As per @RichardScriven's comment, a good way to avoid typing all the inputs$* variables is to use list2env like this:
salary <- function () {
inputs <- enterval()
list2env(inputs, environment())
if (h > 180) {
totalpay <- (180*r) + ((h-180)*r*m)
}
else {
totalpay <- (h*r)
}
totalpay
}
list2env will essentially create variables out of the list elements inside salary's environment, which are immediately accessible without needing to use inputs$*.
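Another way to avoid the inputs$ prefix, without creating variables in the function's environment, is with(). The sketch below is an assumption-laden variant rather than the code from the answer: make_inputs() is a hypothetical stand-in for enterval() that takes the values as arguments so the example runs without prompts, and salary_from() is a hypothetical name.

```r
# Hypothetical non-interactive stand-in for enterval(): same list shape, no prompts.
make_inputs <- function(h, r, m) list(h = h, r = r, m = m)

salary_from <- function(inputs) {
  # with() evaluates the expression using the list elements as variables.
  with(inputs, if (h > 180) (180 * r) + ((h - 180) * r * m) else h * r)
}

short_month <- salary_from(make_inputs(5, 0.5, 2))    # below the 180-hour threshold
long_month  <- salary_from(make_inputs(200, 10, 2))   # 20 hours paid at the multiplied rate
c(short_month, long_month)
```

Passing the inputs in as an argument also keeps the calculation separate from the prompting, which makes it easier to test.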
Variables assigned in R functions (as in many other programming languages) have limited scope: the m you assign in your function is only available within that function. If you want your variable to be available outside of the function, you have two major options:
Return the variable. This is the preferred option; it's much cleaner and is good programming practice for numerous reasons described in many Stack Overflow posts. An important thing to remember is that a function can only return one object.
Do a global assignment. This gives the variable global scope, so it is accessible within all functions. The code for this is m <<- 1 as opposed to m <- 1. This isn't recommended for a variety of reasons. See Global variables in R or Global and local variables in R for more on this subject.
Since you can only return one object, you might put all three values into a data frame or a list and return that. Though I would question whether you want the value entry done in a function. Additionally, if user input is central to your goal, R might not be the right language. That being said, the code below accomplishes what you're looking for:
enterval <- function() {
h <- (readline("Please enter number of hours: "))
h <- as.integer(h)
r <- (readline("Please enter applicable rate: "))
r <- as.integer(r)
m <- (readline("Please confirm your multiplier: "))
m <- as.integer(m)
salaryVariables <- data.frame("hours" = h, "rate" = r, "multiplier" = m)
return(salaryVariables)
}
salary <- function(salaryInfo) {
r <- salaryInfo$rate
h <- salaryInfo$hours
m <- salaryInfo$multiplier
if (h > 180) {
totalpay <- (180*r) + ((h-180)*r*m)
}
else {
totalpay <- (h*r)
}
return(totalpay)
}
mySalary <- enterval()
salary(mySalary)

Using self-defined functions in R to produce different random numbers Each time

I am trying to produce different uniform numbers using the Lehmer random number generator. I believe I have done this, but I have a problem producing different numbers each time I execute the function. Below is the code I am using; I explain the problem further underneath it.
MODULUS <- 2147483647
MULTIPLIER <- 48271
put_Seed <- function(x)
{
x <- (if ( x > 0)
{
x%%MODULUS
}
else
{
1000*as.numeric(Sys.time())
}
)
}
T_val <- function(Rand)
{
Q <- floor(MODULUS / MULTIPLIER)
R <- MODULUS%%MULTIPLIER;
floor(MULTIPLIER*(Rand%%Q) - R*(Rand/Q))
}
New_Random_Seed <- function(T_value_i)
{
Random_Seed <- (if (T_value_i > 0)
{
T_value_i;
}
else
{
T_value_i + MODULUS
})
}
Random <- function(New_Seed)
{
New_Seed/MODULUS
}
uniform_num <- function(a, b, r)
{
a + (b - a) * r
}
Random_Seed <- put_Seed(123456789)
uni_num <- function(k)
{
Random_Seed <- put_Seed(k)
T_value <- T_val(Random_Seed)
Random_Seed <- New_Random_Seed(T_value)
uniform_num(0, 1, Random(Random_Seed))
}
test1 <- uni_num(Random_Seed)
test2 <- uni_num(Random_Seed)
test3 <- uni_num(Random_Seed)
#Results
#test1 = 0.05380306
#test2 = 0.05380306
#test3 = 0.05380306
What I want is for Random_Seed to be updated each time I run the uni_num function, so that the uniform_num(0, 1, Random(Random_Seed)) line produces a new random uniform number between 0 and 1 on every call. The code works for one repetition, but if I call the function again, Random_Seed has not been updated, so the function produces the same random number as before. This is undesirable, as I wish to produce different random numbers each time by having Random_Seed updated after each repetition. Forgive me if there is a simple solution, but my head is wrecked from trying to find an answer. Cheers :)
In this function:
uni_num <- function(k)
{
Random_Seed <- put_Seed(k)
T_value <- T_val(Random_Seed)
Random_Seed <- New_Random_Seed(T_value)
uniform_num(0, 1, Random(Random_Seed))
}
The target of the assignment Random_Seed <- is in the environment of the function body, and not the global environment. Thus, when you make a second call, Random_Seed in the global environment has not been modified, and you get the same results.
To write to the global environment instead, use <<-:
Random_Seed <<- New_Random_Seed(T_value)
It looks like this is the only assignment that must be modified, as New_Random_Seed returns a value and doesn't require modification of this global object. In addition, the first assignment to Random_Seed in uni_num might as well be in the function body environment.
Note that it's bad form for a function to write to the global environment. There's always a better way. But this will work for your example code.
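One way to avoid writing to the global environment is to keep the seed inside a closure. The sketch below is a hedged illustration, not the asker's code: make_lehmer() is a hypothetical factory, and it uses the plain Lehmer recurrence state = (MULTIPLIER * state) %% MODULUS rather than the T_val decomposition from the question (the product stays well below 2^53, so it is exact in double precision).

```r
MODULUS <- 2147483647
MULTIPLIER <- 48271

# Hypothetical generator factory: the seed lives in the closure, not in the global environment.
make_lehmer <- function(seed) {
  state <- seed %% MODULUS
  function() {
    # <<- updates `state` in the enclosing function environment.
    state <<- (MULTIPLIER * state) %% MODULUS
    state / MODULUS
  }
}

next_uniform <- make_lehmer(123456789)
test1 <- next_uniform()
test2 <- next_uniform()
test3 <- next_uniform()
c(test1, test2, test3)
```

Here <<- still appears, but it modifies state in the environment created by make_lehmer(), so each generator carries its own seed and nothing in the global environment changes between calls.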
