How do I modify arguments inside a function? - r

I have a series of lines of code that replace the contents of an existing column based on the contents of another column (i.e. I am creating a categorical variable where the 'cut' function is not applicable). I am new to R and want to write a function that will perform this task on all data.frames without having to insert and customize 50 lines of code each time.
X is the data frame, Y is the categorical variable, and Z is the other (string) variable. This code works:
X$Y <- ""
X <- transform(X, Y=ifelse(Z=="Alameda",20,""))
... (many more lines)
For example I do:
d.f$loc <- ""
d.f <- transform(d.f, loc=ifelse(county=="Alameda",20,""))
# ... and so on
Now I want to do this for several dataframes and different columns instead of loc and county.
However, neither of these functions produces the desired results:
ab<-function(Y,Z,env=X) {
env$Y<-transform(env,Y=ifelse(Z=="Alameda",20,""))
...
}
abc<-function(X,Y,Z) {
X<-transform(X,Y=ifelse(Z=="Alameda",20,""))
...
}
Both of these functions run without error but do not alter the data frame X in any way. Am I doing something wrong in calling the environment or using a function within another function? It seems like a simple question and I would not post if I had not already spent 5+ hours trying to learn this. Thanks in advance!

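R effectively uses "call by value" for ordinary objects such as data frames: modifying an argument inside a function changes only a local copy. Only the return value goes back to the calling environment. See also: parameter passing mechanism in R.
You can do: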
ab <- function(X, Y, Z) {
X <- transform(X, Y=ifelse(Z=="Alameda",20,""))
...
return(X)
}
If your data frames are in a list L, you can do lapply(L, ab) or, if needed, lapply(L, ab, Y=..., Z=...). As a result you will get a list of the modified data frames. BTW: also have a look at with() and within(), e.g. X$Y <- with(X, ifelse(Z=="Alameda",20,""))
Implicitly returning the value
There is no need for an explicit call to return(...). You can return the value implicitly, relying on the fact that a function returns the value of its last evaluated expression:
ab <- function(X, Y, Z) {
X <- transform(X, Y=ifelse(Z=="Alameda",20,""))
...
X ### <<<<< last expression
}
Here is an example of how you can do it for your situation:
ab <- function(X, Y, Z) {
X[, Y] <- ifelse(X[,Z]>12,20,99)
# ...
X ### <<<<< last expression
}
B <- BOD # BOD is one of the dataframes which come with R
B <- ab(B, "loc", "demand") # assign the result back; ab() does not change B in place
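To tie this back to the county example, here is a minimal sketch of the same pattern with column names passed as strings. The helper name recode_county and the toy data frames are made up for illustration, and NA is used instead of "" so that the new column stays numeric (that is my choice, not something from the question).

```r
# Hypothetical helper: fill column `target` based on the values in column `source`.
recode_county <- function(df, target, source) {
  df[[target]] <- ifelse(df[[source]] == "Alameda", 20, NA)
  df  # last expression: the modified copy is the return value
}

d1 <- data.frame(county = c("Alameda", "Marin"), stringsAsFactors = FALSE)
d2 <- data.frame(county = c("Marin", "Alameda"), stringsAsFactors = FALSE)

# A single data frame: the result must be assigned, d1 itself is unchanged.
d1_new <- recode_county(d1, "loc", "county")

# Several data frames at once: lapply passes the extra arguments along.
frames <- lapply(list(a = d1, b = d2), recode_county,
                 target = "loc", source = "county")
```

Because only the return value leaves the function, d1 still has a single column afterwards, while d1_new and the elements of frames carry the new loc column.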

Related

Passing arguments for variables in a function

I am brand-new to R and trying to understand the basic syntax of functions.
In both of the functions f(x1) and g(x,1) below, I would like to generate y=2. Only the former works.
I'm familiar with the str_interp() and paste() functions, but those seem to work only in the context of strings, not variables. E.g., prefixnum <- str_interp("${prefix}${num}") doesn't solve the issue.
My motivation is that I'd like to call a function by specifying components of variable names. My background is in Stata, where placeholders are designated with a backtick and a tick (e.g., `prefix'`num'). I've consulted a few relevant resources, to no avail.
As an aside, I've read varying thoughts about whether variables should be prefixed with their data frame (e.g., df$var). What is the logic behind whether or not to follow this convention? Why does f(df$x1) work, but writing f(x1) and modifying the function to be y <- df$var*2 not work?
df <- data.frame(x1=1)
f <- function(var) {
y <- var*2
y
}
f(df$x1)
g <- function(prefix,num) {
y <- df$prefixnum*2 #where "prefixnum" is a placeholder of some sort
y
}
g(x,1)
It looks like you are trying to pass a column name as an argument to the function. You can paste prefix and num together to get the column name and use that to subset the data frame.
g <- function(data, prefix, num) {
y <- data[[paste0(prefix, num)]] *2
y
}
g(df,'x', 1)
#[1] 2
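As a rough illustration of why the $ version fails while [[ ]] works: $ takes its right-hand side literally as a name, whereas [[ ]] evaluates an expression that yields the name. The helper name double_col and the extra column x2 below are only for illustration.

```r
df <- data.frame(x1 = 1, x2 = 5)

# Hypothetical helper: look up a column whose name is assembled at run time.
double_col <- function(data, prefix, num) {
  col <- paste0(prefix, num)        # "x" and 1 are joined into "x1"
  stopifnot(col %in% names(data))   # fail early on a misspelled name
  data[[col]] * 2
}

double_col(df, "x", 1)

# `$` does not substitute anything: it looks for a column literally
# called "prefixnum", finds none, and returns NULL.
is.null(df$prefixnum)
```

This is also the practical argument for passing the data frame explicitly: the function then works on any data frame rather than on whatever df happens to exist in the global environment.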

Write a loop for my function in r

I am currently trying to write my first loop for lagged regressions on 30 variables. Variables are labeled as rx1, rx2, ..., rx30, and the data frame is called my_num_data.
I have created a loop that looks like this:
z <- zoo(my_num_data)
for (i in 1:30)
{dyn$lm(my_num_data$rx[i] ~ lag(my_num_data$rx[i], 1)
+ lag(my_num_data$rx[i], 2))
}
But I received an error message:
Error in model.frame.default(formula = dyn(my_num_data$rx[i] ~ lag(my_num_data$rx[i], :
invalid type (NULL) for variable 'my_num_data$rx[i]'
Can anyone tell me what the problem is with the loop?
Thanks!
This produces a list, L, whose ith component has the name of the ith column of z and whose content is the regression of the ith column of z on its first two lags. Lag is the same as lag except that the sign of argument k is reversed.
library(dyn)
z <- zoo(anscombe) # test input using builtin data.frame anscombe
Lag <- function(x, k) lag(x, -k)
L <- lapply(as.list(z), function(x) dyn$lm(x ~ Lag(x, 1:2)))
First, the function you're looking for may be dynlm() from the dynlm package, without the $ character (dyn$lm is only valid if the dyn package is loaded). Second, using $rx[i] doesn't concatenate rx and the contents of i; it selects the (single) element of $rx with index i. Try this (I don't have your data, so I can't test it on my machine):
results <- list()
for (i in 1:30) {
results[[i]] <- dynlm(my_num_data[,i] ~ lag(my_num_data[,i], 1)
+ lag(my_num_data[,i], 2))
}
and then list element results[[1]] will be the results from the first regression, and so on.
Note that this assumes your my_num_data data.frame ONLY consists of columns rx1, rx2, etc.
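If it helps to see the looping-over-columns idea without any extra package, here is a sketch in base R only. It swaps in embed() and plain lm() instead of dyn or dynlm, and uses the built-in anscombe data as a stand-in for my_num_data, so treat it as an illustration of the indexing rather than a drop-in replacement.

```r
dat <- anscombe  # built-in data frame, standing in for my_num_data

results <- list()
for (nm in names(dat)) {
  # embed() lines up each value with its two predecessors:
  # column 1 is x[t], column 2 is x[t-1], column 3 is x[t-2]
  lags <- embed(dat[[nm]], 3)
  results[[nm]] <- lm(lags[, 1] ~ lags[, 2] + lags[, 3])
}

names(results)
```

Looping over names(dat) rather than 1:30 means the results list is named after the columns, and the loop does not break if the number of columns changes.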
I am not super familiar with R, but it appears you are trying to increase the index of rx. Is rx a vector with values at different indices?
If not, the solution may be to build the column name as a string:
for (i in 1:30){
varName <- paste0("rx", i) # R concatenates strings with paste0(), not +
dyn$lm(my_num_data[[varName]] ~ lag(my_num_data[[varName]], 1)
+ lag(my_num_data[[varName]], 2))
}
Again, I may be way off here, as this is my first post and R is still pretty new to me.

R assign a list of values to a list of objects

Thank you for trying to help. I am happy to be corrected on all R misdemeanors.
I am not sure that I was entirely clear with my earlier post as below, so I will hope to clarify:
In the R console, my calls use source(...) to run a .R file.
Code within the .R file uses variables (e.g. for 'extracted info') ex1, ex2, ex3. These may hold strings or (a string of) numbers pulled from text.
In line with your guidance I've renamed my function to 'reset' (?reset indicates that no other occurrences are in scope). I'm passing both x and y from outside the function:
#send variables ex1, ex2, ex3 together with location, loc and parse, prs to be reset with 0
reset(x<-c(loc,prs,ex1,ex2,ex3),y<-rep(c(0),length(x))) #repeats 0 in y variable as many times as there are entries for x
reset<-function(x,y){
print(c("resetting ",x," with ", y))
if (length(x) == length(y)) {x <- y
print(paste(x,"=",y),sep="") #both x and y should now be equal (to y)
} else {
paste("list lengths differ: x=",length(x)," y=",length(y),sep="")
}
}
Now both x and y are 0 but ex1, ex2 and ex3 still contain the previous values
I would like ex1, ex2 and ex3 all to be 0 before they are used in a subsequent section of code, so they don't contaminate extracted data with previous values such as:
loc<-str_locate(data[i],"=")
prs<-str_locate(data[i],",")
#extract data from the end of loc to before the occurrence of prs
ex1<-str_sub(data[i],loc[2]+1,prs[1]-1)
#cleanup
#below is simplified for example;
#in reality I wish to send ex1:ex(n) to be reset with values val1:val(n)
The desired outcome would be that back in the Rconsole >ex1 should now return 0.
Hope you can understand my dilemma and possibly help.
Say my code uses some variables to hold data extracted from a string using Stringr str_sub. The variables are temporary in that I use the values to construct other strings then they should be freed up to be used in an upcoming test: i.e. if (test==true){extract<-str_sub(string, start, end)}
For a later test, I would like extract==0; simple enough, but I have a few of these and would like to do it in one fell swoop.
I've used a for loop, but if there is a simpler way, please identify this.
My attempt is using a function:
#For variables loc, prs, ex1 and ex2, set all values to 0
x<-assign(x<-c(loc, prs, ex1, ex2),y<-rep(c(0),length(x)))
#Function
assign <- function(x, y) {
if(length(x)==length(y)){
for (i in 1:length(x)){x[i]<-y[i]}
print(c("Assigned",x[i]))
return (x)
} else { print (c("list lengths differ: x=",length(x)," y=",length(y)))
}
}
The problem being that this returns x as 0, but the list of variables retain their values.
I'm a bit of a noob to both r and SO, so although I've benefitted from SO's bountiful advice on numerous occasions, this is my first question, so please be gentle. I have searched this issue, but have not found what I need in a few hours now. Hope you can help.
Beware of naming a function assign. There is already one in base-r and you will create confusion.
There are a couple of problems with your function besides its name. First, you do not need the for-loop to replace x by y, as this is a basic vectorized operation. Just use x <- y. Second, you should wrap your message in paste.
asgn <- function(x, y) {
if(length(x)==length(y)){
## This step is not needed; return(y) is better, as @Rick proposed in their now-deleted answer
## I am leaving it to show you how the for-loop is not needed
x<-y
return (x)
} else {
print (paste("list lengths differ: x=",length(x)," y=",length(y)))
return(x)
}
}
Then, there are a couple of problems with your function call. You use <- instead of = to specify the arguments. The two are only roughly synonymous when assigning variables; inside a function call, <- performs an assignment in the calling environment rather than naming an argument. Finally, you are trying to use x in the definition of y in the arguments (length(x)), which relies on an x that may not yet be defined, so R looks for x in the calling environment. Note also that length(3) is 1, not 3, so compute the length from the values themselves instead:
x<-asgn(x=c(loc, prs, ex1, ex2),y=rep(0,length(c(loc, prs, ex1, ex2))))
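Note that even the corrected call only changes x; ex1, ex2 and so on keep their old values, because c(...) copies their contents into a new vector. If the goal really is to reset the original variables, one possible sketch is below. The helper name reset_vars is hypothetical, and it relies on base R's assign() with the variable names passed as strings.

```r
# Hypothetical helper: set each named variable to `value` in the caller's environment.
reset_vars <- function(var_names, value = 0, envir = parent.frame()) {
  for (nm in var_names) assign(nm, value, envir = envir)
  invisible(var_names)
}

ex1 <- "abc"; ex2 <- "42"; ex3 <- "xyz"
reset_vars(c("ex1", "ex2", "ex3"))

# Alternative: keep the temporaries together in one named list.
ex <- list(ex1 = "abc", ex2 = "42", ex3 = "xyz")
ex[] <- 0  # replaces every element and keeps the names
```

The list version is usually easier to reason about, since nothing is modified behind the caller's back; the assign() version is closer to what the question asks for.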

How to run a function a bunch of times with different arguments in R?

I am trying to make a bunch of plots simultaneously using a function which I wrote in R to make one plot at a time. My function takes multiple arguments, including a FUN argument which specifies which plotting function should be used for each sample.
I would like to read a table/dataframe into R and then run the same function on each row, where each row specifies the arguments for that run.
My function looks like this:
printTiff <- function(FUN, sample, start, end) {
tiff(paste(sample,".tiff", sep=""), compression="lzw",width=892,height=495)
g <- FUN(sample,start,end)
dev.off()
}
I have a table with a column for FUN, sample, start and end, where each row should end up a different tiff. I've tried using do.call, but I can't seem to get it right. I have several hundred samples, so I'd like to avoid changing the arguments with each run.
Sample of table:
FUN sample start end
1 T7Plots sample343 27520508 27599746
2 C9Plots sample347 27522870 27565523
3 C9Plots sample345 27535342 27570585
Use mapply:
mapply(printTiff, table[,1], table[,2], table[,3] ,table[,4])
You can use match.fun to look up the function by its name, then you can use it.
printTiff <- function(FUN, sample, start, end) {
FUN <- match.fun(FUN)
paste(sample, FUN(start), end);
}
table <- read.table(header=TRUE, stringsAsFactors=FALSE, text="
FUN sample start end
T7Plots sample343 27520508 27599746
C9Plots sample347 27522870 27565523
C9Plots sample345 27535342 27570585
")
T7Plots <- `-`
C9Plots <- function(x) 1/x
Then we can use mapply, as @alexis_laz suggested:
mapply(printTiff, table[,1], table[,2], table[,3] ,table[,4])
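For a version that can be run without writing any files, here is a small sketch of the row-by-row dispatch. Instead of match.fun it looks the function up in a named list, which is a swapped-in alternative; the two plotting functions are hypothetical stand-ins that just return a string, and the names plotters, runs and run_row are made up for the example.

```r
# Hypothetical stand-ins for the real plotting functions.
T7Plots <- function(sample, start, end) paste("T7", sample, end - start)
C9Plots <- function(sample, start, end) paste("C9", sample, end - start)
plotters <- list(T7Plots = T7Plots, C9Plots = C9Plots)

runs <- data.frame(
  FUN    = c("T7Plots", "C9Plots"),
  sample = c("sample343", "sample347"),
  start  = c(27520508, 27522870),
  end    = c(27599746, 27565523),
  stringsAsFactors = FALSE
)

# One call per row: pick the function by name, then call it.
run_row <- function(fun_name, sample, start, end) {
  plotters[[fun_name]](sample, start, end)
}

out <- mapply(run_row, runs$FUN, runs$sample, runs$start, runs$end)
```

An explicit lookup table also means a typo in the FUN column fails with a clear error instead of silently picking up some other function of the same name.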

How to print the name of current row when using apply in R?

For example, I have a matrix k
> k
d e
a 1 3
b 2 4
I want to apply a function on k
> apply(k,MARGIN=1,function(p) {p+1})
a b
d 2 3
e 4 5
However, I also want to print the row name of the row being processed, so that I know which row the function is being applied to at that moment.
It might look like this:
apply(k,MARGIN=1,function(p) {print(rowname(p)); p+1})
But I really don't know how to do that in R.
Does anyone have any idea?
Here's a neat solution to what I think you're asking. (I've called the input matrix mat rather than k for clarity - in this example, mat has 2 columns and 10 rows, and the rows are named abc1 through to abc10.)
In the code below, the result out1 is the thing you wanted to calculate (the outcome of the apply command). The result out2 comes out identically to out1 except that it prints out the rownames that it is working on (I put in a delay of 0.3 seconds per row so you can see it really does do this - take this out when you want the code to run full speed obviously!)
The trick I came up with was to cbind the row numbers (1 to n) onto the left of mat (to create a matrix with one additional column), and then use this to refer back to the rownames of mat. Note the line x = y[-1] which means that the actual calculation within the function (here, adding 1) ignores the first column of row numbers, which means it's the same as the calculation done for out1. Whatever sort of calculation you want to perform on the rows can be done this way - just pretend that y never existed, and formulate your desired calculation using x. Hope this helps.
set.seed(1234)
mat = as.matrix(data.frame(x = rpois(10,4), y = rpois(10,4)))
rownames(mat) = paste("abc", 1:nrow(mat), sep="")
out1 = apply(mat,1,function(x) {x+1})
out2 = apply(cbind(seq_len(nrow(mat)),mat),1,
function(y) {
x = y[-1]
cat("Doing row:",rownames(mat)[y[1]],"\n")
Sys.sleep(0.3)
x+1
}
)
identical(out1,out2)
You can use a variable outside of the apply call to keep track of the row index and pass the row names as an extra argument to your function:
idx <- 1
apply(k, 1, function(p, rn) {print(rn[idx]); idx <<- idx + 1; p + 1}, rownames(k))
This should work. The cat() function is what you want to use when printing results during evaluation of a function. paste(), conversely, just returns a character vector but doesn't send it to the command window.
The solution below uses a counter created as a closure, allowing it to "remember" how many times the function has been run before. Note the use of the superassignment operator <<-, which modifies i in the enclosing environment. If you really want to understand what's going on here, I recommend reading through this wiki https://github.com/hadley/devtools/wiki/
Note there may be an easier way to do this; my solution assumes that there is no way to access the rownumber or rowname of a current row using typical means within an apply function. As previously mentioned, this would be no problem in a loop.
k <- matrix(c(1,2,3,4),ncol=2)
rownames(k) <- c("a","b")
colnames(k) <- c("d","e")
make.counter <- function(x){
i <- 0
function(){
i <<- i+1
i
}
}
counter1 <- make.counter()
apply(k,MARGIN=1,function(p){
current.row <- rownames(k)[counter1()]
cat(current.row,"\n")
return(p+1)
})
As far as I know you cannot do that with apply, but you could loop through the rownames of your data frame. Lame example:
lapply(rownames(mtcars), function(x) sprintf('The mpg of %s is %s.', x, mtcars[x, 1]))
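Applied to the matrix k from the question, the same idea might look like the sketch below: iterate over the row names with sapply and index the matrix inside the function, so the name is always at hand.

```r
k <- matrix(c(1, 2, 3, 4), ncol = 2,
            dimnames = list(c("a", "b"), c("d", "e")))

# Iterate over the row names, so the name is available inside the function.
out <- sapply(rownames(k), function(rn) {
  cat("Doing row:", rn, "\n")
  k[rn, ] + 1
})
```

Like apply(k, 1, ...), this returns one column per original row, so out has rows d and e and columns a and b.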

Resources