Using a function to receive a vectorized output - r

I want to make a function in terms of x and coef for multiple values of x so that the output is a vector, like I've tried here:
directpoly<-function(x,coef) {
for(n in length(coef)) {
total<-sum(coef*x^(0:(n-1)))
}
total
}
This works when I input one value for x and any vector for the coefficient values, but I want more than that. I want to input a certain amount of values for the coefficients, say c(5,9,-2), and have the function produce three different values, one for each input of x for, say, x<-2:4. So in that case I'd want output 15, 14, 9. Any ideas? I am new so all help is appreciated.
Edit: I took out an "<-" that I accidentally put in there. Sorry if that was any cause for confusion. Also what I want in the end is a function
P(x) = c1 + c2*x + ... + cn*x^(n-1)

Does this work?
directpoly <- function(x, coef) {
seqcoef <- seq_along(coef) - 1
sapply(x, function(z) sum(coef*z^seqcoef))
}
directpoly(2:4, c(5,9,-2))
# [1] 15 14 9
If so, the trick to solving this has two steps:
Determine what you want to do with each single value of x (a scalar, not a vector). In this case, any one of the following works:
sum(coef*x^(1:length(coef)-1))
sum(coef*x^(0:(length(coef)-1)))
sum(coef*x^(seq_along(coef)-1))
Because I'm eventually putting this into some loop/apply formulation, I don't need to recalculate the sequence each time, so I break it out:
seqcoef <- seq_along(coef) - 1
sum(coef*x^seqcoef)
Now that you know what to do with each x, map or apply over it:
sapply(x, function(z) ...)
where ... is what we determined above. For clearer code, many prefer to define this per-element function explicitly, so something like:
directpoly1 <- function(x, coef, seqcoef = seq_along(coef) - 1) {
sum(coef*x^seqcoef)
}
directpoly <- function(x, coef) {
seqcoef <- seq_along(coef) - 1
sapply(x, directpoly1, coef, seqcoef)
}
(I took a little more liberty with this version to enable running it explicitly with a scalar argument, primarily for unit-testing. It is not strictly necessary, so the function at the top of this answer should suffice.)
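As a hedged aside, the same polynomial can also be evaluated for every x at once by building a matrix of powers with outer() and multiplying it by the coefficients. This is only a sketch; the name directpoly_outer is made up for illustration and is not part of the answer above.

```r
# Hypothetical alternative: evaluate the polynomial for all x in one step.
directpoly_outer <- function(x, coef) {
  powers <- seq_along(coef) - 1          # exponents 0, 1, ..., n-1
  # outer() builds a length(x) by length(coef) matrix holding x^power
  # drop() turns the resulting one-column matrix back into a plain vector
  drop(outer(x, powers, "^") %*% coef)
}

directpoly_outer(2:4, c(5, 9, -2))
# 15, 14, 9
```

Whether this is preferable to sapply is a matter of taste; it trades the per-element function for a single matrix product.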

Related

How to use lapply with a condition in R to fit only one element each time

Suppose I have two vectors, and I would like my function to take only one value from each vector at a time and return the output. Then I would like another function to check the value from each run: if the output of the previous run is smaller than the new one, I would like my function to stop and return all the previous values. My original function is very complicated (estimation models), so I will provide a simpler example to explain my idea.
Suppose that I have these two vectors:
set.seed(123)
x <- rnorm(1:20)
y <- rnorm(1:20)
Then, I would like to write a function that takes only one value from each vector, multiplies them, and returns the output. Next, I would like the function to check whether the previous product is smaller than the new one. If so, it should stop and return all the previous products.
I tried the function below. However, it takes all the values at once and returns a list of the products. I was thinking about using lapply to process one element at a time, but I do not know how to work with the condition.
myfun <- function(x, y, n){
multi <- list()
for ( i in 1:n){
multi[[i]] <- x[[i]]*y[[i]]
}
return(multi)
}
myfun(x,y,10)
Here is another try
x <- rnorm(1:20)
y <- rnorm(1:20)
myfun <- function(x, y){
multi <- x*y
return(multi)
}
This is the first function. I would like to run it element by element, so that each time it returns only one multiplication result. Then another function (a wrapper function) checks the result: if the second output of the multiplication function is larger than the first one, stop; otherwise keep going.
I would like to write a function that takes only one value from each vector, multiplies them, and returns the output. Then I would like the function to check whether the previous product is smaller than the new one.
I would like the multiplication in a separate function, and then I would like to check its output. So I should have a wrapper function.
You can apply a for loop with a stopping condition, similar to what you have already:
# example input
set.seed(123)
x <- rnorm(1:20)
y <- rnorm(1:20)
# example function
f = function(xi, yi) xi*yi
# wrapper
stopifnot(length(x) == length(y))
res = vector(length(x), mode="list")
for (i in seq_along(x)){
res[[i]] = f(x[[i]], y[[i]])
if (i > 1L && res[[i]] > res[[i-1L]]) break
}
res[seq_len(i)]
Comments:
It is better to predefine the max length res might need (here, length(x)), rather than expanding it in the loop.
For this function (multiplication), there is no good reason to proceed elementwise. R's multiplication function is vectorized and fast.
You don't need to use a list-class output for this function, since it is returning doubles; res = double(length(x)) should also work.
You don't need to use list-style accessors for x, y and res unless lists are involved; res[i] = f(x[i], y[i]) should work, etc.
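For the multiplication example specifically, the second comment can be taken further: compute every product at once and then cut the result at the first increase. The sketch below assumes the function being wrapped is cheap and vectorized; the helper name stop_at_first_increase is invented here.

```r
# Hypothetical helper: products up to and including the first increase.
stop_at_first_increase <- function(x, y) {
  stopifnot(length(x) == length(y))
  p <- x * y                          # all products at once (vectorized)
  increases <- which(diff(p) > 0)     # positions j where p[j + 1] > p[j]
  if (length(increases) == 0L) {
    return(p)                         # never increased: keep everything
  }
  p[seq_len(increases[1L] + 1L)]      # same cut-off as the loop's break
}

stop_at_first_increase(c(1, 2, 0), c(1, 1, 1))
# 1, 2
```

If each evaluation is expensive (as with the estimation models mentioned in the question), the explicit loop is still the better fit, because it stops before computing the remaining values.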

Struggling creating a difference function

So I have a homework problem that I am really struggling to code in R.
This is the problem: Write a function difference() that takes a vector X as a parameter and returns a vector of the
difference between each element and the next element:
X[2]-X[1], X[3]-X[2], X[4]-X[3], etc.
Thus difference(c(5,2,9,4,8)) would return c(-3,7,-5,4)
And so far I have this:
difference<-function(X) {
for (i in X)
X.val<-X[i]-X[i-1]
return(X.val)
}
difference(c(5,2,9,4,8))
I can't seem to get the function to subtract X[2]-X[1], and it is returning one more number than it should when I run the function. Can anyone help me?
You're having a couple of problems with your code. Since this is homework, I'm not going to provide the correct code, but I'll help highlight where you're going wrong to help you get closer. The only reason I'm not providing the answer is because these are good learning experiences. If you comment with updated attempts, I'll continue to update my answer to guide you.
The issue is that you're using for (i in X), which will actually loop through the values of X and not its index. So, in your example, i will equal 5 and then 2 and then 9 and then 4 and then 8. If we start with i == 5, the code is doing this: X.val <- X[5] - X[5 - 1]. At this point you'd assign X.val to be 4 because X[5] is equal to 8 and X[4] is equal to 4. At the next iteration, i == 2. So this will set X.val to -3 because X[2] is 2 and X[1] is 5.
To fix this issue, you'd want to loop through the index of X instead. You can do this by using for (i in 1:length(X)) where length(X) will give you a number equal to the number of elements in X.
The next issue you've found is that you're getting one extra number. It's important to think about how many numbers you should have in your output and what this means in terms of where i should start. Hint: should you really be starting at 1?
Lastly, you overwrite X.val in each iteration. It surprises me that you were getting an extra number in your results: you should have received only NA, because the last value of i is 8 and X does not have 8 elements. Nevertheless, you'll need to rewrite your code so that you don't overwrite X.val, but instead append to it in each iteration.
I hope that helps.
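To make the values-versus-index point concrete without giving away the solution, here is a small sketch that only prints what the loop variable holds in each style of loop.

```r
X <- c(5, 2, 9, 4, 8)

# Looping over X itself: i takes the VALUES 5, 2, 9, 4, 8
for (i in X) cat("value:", i, "\n")

# Looping over the positions: i takes the INDICES 1, 2, 3, 4, 5
for (i in seq_along(X)) cat("index:", i, "\n")

# Indexing past the end of a vector gives NA rather than an error
X[8]
```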
UPDATE #1
As noted in the comments below, your code now looks like this:
difference <- function(X) {
for (i in 2:length(X)) {
X[i] <- X[i] - X[i-1]
}
return(X)
}
difference(c(5, 2, 9, 4, 8))
We are now very, very close to a final solution. We just need to address a quick problem.
The problem is that we're now overwriting our values of X, which is bad. Since our numbers, c(5,2,9,4,8), are passed into the function as the variable X, the line X[i] <- X[i] - X[i-1] will start to overwrite our values. So, stepping through one iteration at a time, we get the following:
Step 1:
i gets set to 2
X[2] is currently equal to 2
We then run the line X[i] <- X[i] - X[i-1], which gets evaluated like this: X[2] <- X[2] - X[1] --> X[2] <- 2 - 5 --> X[2] <- -3
X[2] is now set to -3
Step 2:
i gets set to 3
X[3] is currently equal to 9
We then run the line X[i] <- X[i] - X[i-1], which gets evaluated like this: X[3] <- X[3] - X[2] --> X[3] <- 9 - (-3) --> X[3] <- 12
X[3] is now set to 12
As you can see from the first two iterations, we're overwriting our X variable, which is directly impacting the differences we get when we run our function.
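Running the in-place loop to completion shows how far the result drifts from the intended differences. This sketch works on a copy named in_place (a name chosen here) so the original X stays available for comparison.

```r
X <- c(5, 2, 9, 4, 8)

in_place <- X
for (i in 2:length(in_place)) {
  # each step subtracts the ALREADY MODIFIED previous element
  in_place[i] <- in_place[i] - in_place[i - 1]
}

in_place
# 5, -3, 12, -8, 16   (the wanted differences were -3, 7, -5, 4)
```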
To solve this, we simply go back to using X.val, like we were before. Since this variable has no values, there's nothing to be overwritten. Our function now looks like this:
difference <- function(X) {
for (i in 2:length(X)) {
X.val[i] <- X[i] - X[i-1]
}
return(X.val)
}
Now, for each iteration, nothing is overwritten and our values of X stay intact. There are two problems that we're going to have, though. If we run this new code, we'll end up with an error telling us that X.val doesn't exist. Earlier, I told you that you can index a variable that you're making, which is true. We just have to tell R that the variable we're making is a variable first. There are several ways to do this, but the second best way to do it is to create a variable with the same class as our expected output. Since we know we want our output to be a vector of numbers, we can just make X.val a numeric vector. Our code now looks like this:
difference <- function(X) {
X.val <- numeric()
for (i in 2:length(X)) {
X.val[i] <- X[i] - X[i-1]
}
return(X.val)
}
Notice that the assignment of X.val happens before we enter the for loop. As an exercise, you should think about why that's the case and then try moving it inside of the for loop and seeing what happens.
So this solves our first problem. Try running the code and seeing what you get. You'll notice that the first element of the output is NA. Why might this be the case, and how can we fix it? Hint: it has to do with the value of i.
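The behaviour behind that NA can be seen in isolation. The sketch below uses a throwaway vector v (not part of the function above) and assigns to its second position only.

```r
v <- numeric()   # empty numeric vector, length 0
v[2] <- -3       # assigning past the end extends the vector

length(v)        # the vector now has two positions
is.na(v[1])      # position 1 was never assigned, so R filled it with NA
```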
UPDATE #2
So now that we have the correct answer, let's look at a couple of tips and tricks that are available thanks to R. R has some built-in features for working with vectors. To see this in action, run the following example:
a <- 1:10
b <- 11:20
a + b
a - b
a * b
a / b
As you can see, R will automatically perform what is called "element wise" operations for vectors. You'll notice that a - b is pretty similar to what we were trying to do here. The difference is that a and b are two different vectors and we were dealing with one vector at a time. So how do we set up our problem to work like this? Simple: we create two vectors.
x <- c(5, 2, 9, 4, 8)
y <- x[2:length(x)]
z <- x[1:(length(x)-1)]
y - z
You should notice that y - z now gives us the answer that we wanted from our function. We can apply that to our difference function like so:
difference <- function(X) {
y <- X[2:length(X)]
z <- X[1:(length(X)-1)]
return(y-z)
}
Using this trick, we no longer need a for loop, which can be incredibly slow in R, and instead use the vectorized operation, which is incredibly fast in R. As was stated in the comments, we can actually skip the step of assigning those values to y and z and instead directly return what we want:
difference <- function(X) {
return(X[2:length(X)] - X[1:(length(X)-1)])
}
We've now just successfully created a one-line function that does what we were hoping to do. Let's see if we can make it even cleaner. R comes with two functions that are very handy for looking at data: head() and tail(). head allows you to look at the first n number of elements and tail allows you to look at the last n number of elements. Let's see an example.
a <- 1:50
head(a) # defaults to 6 elements
tail(a) # defaults to 6 elements
head(a, n=20) # we can change how many elements to return
tail(a, n=20)
head(a, n=-1) # returns all but the last element
tail(a, n=-1) # returns all but the first element
Those last two are the most important for what we want to do. In our newest version of difference we were looking at X[2:length(X)], which is another way of saying "all elements in X except the first element". We were also looking at X[1:(length(X)-1)], which is another way of saying "all elements in X except the last element". Let's clean that up:
difference <- function(X) {
return(tail(X, -1) - head(X, -1))
}
As you can see, that's a much cleaner way of defining our function.
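For checking your work, base R also ships a diff() function that computes the same lagged differences. The sketch below compares it with the head()/tail() version; since the assignment asks you to write the function yourself, treat diff() as a reference to test against rather than as the answer.

```r
difference <- function(X) {
  tail(X, -1) - head(X, -1)
}

X <- c(5, 2, 9, 4, 8)
difference(X)                      # -3, 7, -5, 4
diff(X)                            # base R's built-in gives the same values
identical(difference(X), diff(X))
```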
So those are the tricks. Let's look at a couple of tips. The first is to drop the return from simple functions like this. R automatically returns the value of the last expression in a function; if that expression is an assignment, the value is returned invisibly, so nothing is printed. To see this in action, try running the two different functions:
difference_1 <- function(X) {
x.diff <- tail(X, -1) - head(X, -1)
}
difference_1(1:10)
difference_2 <- function(X) {
tail(X, -1) - head(X, -1)
}
difference_2(1:10)
In difference_1 you'll notice that nothing is printed. This is because the last expression is an assignment, whose value is returned invisibly. You could force it to return a visible value by adding return(x.diff) as the last line.
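A quick way to see what happens with difference_1 is to capture its result and to ask R whether the value was visible. This is a small sketch using base R's withVisible(); the variable name res is arbitrary.

```r
difference_1 <- function(X) {
  x.diff <- tail(X, -1) - head(X, -1)   # last expression is an assignment
}

res <- difference_1(1:10)   # nothing prints, but the value can be captured
res                         # nine 1s

# withVisible() reports whether a call's value would be auto-printed
withVisible(difference_1(1:10))$visible
```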
The next tip is something you won't need for a while, but it's important. Going back to the current version of difference that we have (the code you're using now, not anything I've mentioned in this update), we assign values to X.val, which causes it to "grow" over time. To see what this means, run the following code:
x.val <- numeric()
length(x.val)
x.val[1] <- 1
length(x.val)
x.val[2] <- 2
length(x.val)
You'll see that the length keeps growing. This is often a point of huge slowdowns in R code. The proper way to do this is to create x.val with a length equal to how big we need it. This is much, much faster and will save you some pains in the future. Here's how it would work:
difference <- function(X) {
x.val <- numeric(length=(length(X) - 1))
for (i in 2:length(X)) {
x.val[i-1] <- X[i] - X[i-1]
}
return(x.val)
}
In our current code, this doesn't make a real difference. But if you're dealing with very large data in the future, this can save you hours or even days of computing time.
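One caveat with the preallocated loop: 2:length(X) counts down (2, 1) when X has a single element. The variant below is a sketch that guards against that by looping over seq_len(); the name difference_prealloc is made up for this example.

```r
difference_prealloc <- function(X) {
  n_out <- max(length(X) - 1, 0)   # number of differences, never negative
  x.val <- numeric(n_out)          # preallocate the full result
  for (i in seq_len(n_out)) {      # seq_len(0) is empty, so short inputs skip the loop
    x.val[i] <- X[i + 1] - X[i]
  }
  x.val
}

difference_prealloc(c(5, 2, 9, 4, 8))   # -3, 7, -5, 4
difference_prealloc(5)                  # numeric(0)
```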
I hope this all helps you better understand some of the functionality in R. Good luck with everything!

Indexing variables in R

I am normally a maple user currently working with R, and I have a problem with correctly indexing variables.
Say I want to define 2 vectors, v1 and v2, and I want to call the nth element in v1. In maple this is easily done:
v[1]:=some vector,
and the nth element is then called by the command
v[1][n].
How can this be done in R? The actual problem is as follows:
I have a sequence M (say of length 10, indexed by k) of simulated negbin variables. For each of these simulated variables I want to construct a vector X of length M[k] with entries given by some formula. So I should end up with 10 different vectors, each of different length. My incorrect code looks like this
sims<-10
M<-rnegbin(sims, eks_2016_kasko*exp(-2.17173), 840.1746)
for(k in 1:sims){
x[k]<-rep(NA,M[k])
X[k]<-rep(NA,M[k])
for(i in 1:M[k]){x[k][i]<-runif(1,min=0,max=1)
if(x[k][i]>=0 & x[i]<=0.1056379){
X[k][i]<-rlnorm(1, 6.228244, 0.3565041)}
else{
X[k][i]<-rlnorm(1, 8.910837, 1.1890874)
}
}
}
The error appears to be that x[k] is not a valid name for a variable. Any way to make this work?
Thanks a lot :)
I've edited your R script slightly to get it working and make it reproducible. To do this I had to assume that eks_2016_kasko was an integer value of 10.
require(MASS)
sims<-10
# Because R is not zero-indexed, add one (a count of 0 would make 1:M[k] run over c(1, 0))
M<-rnegbin(sims, 10*exp(-2.17173), 840.1746) + 1
# Create a list
x <- list()
X <- list()
for(k in 1:sims){
x[[k]]<-rep(NA,M[k])
X[[k]]<-rep(NA,M[k])
for(i in 1:M[k]){
x[[k]][i]<-runif(1,min=0,max=1)
if(x[[k]][i]>=0 & x[[k]][i]<=0.1056379){
X[[k]][i]<-rlnorm(1, 6.228244, 0.3565041)}
else{
X[[k]][i]<-rlnorm(1, 8.910837, 1.1890874)
}
}
}
This will work, and I think it is what you were trying to do, BUT it is not great R code. I strongly recommend using the lapply family instead of for loops, and learning to use data.table and parallelisation if you need things to scale. Additionally, if you want to read more about indexing and subsetting in R, Hadley Wickham has a comprehensive breakdown here.
Hope this helps!
Let me start with a few remarks and then show you, how your problem can be solved using R.
In R, there is most of the time no need to use a for loop in order to assign several values to a vector. So, for example, to fill a vector of length 100 with uniformly distributed random variables, you do something like:
set.seed(1234)
x1 <- rep(NA, 100)
for (i in 1:100) {
x1[i] <- runif(1, 0, 1)
}
(set.seed() is used to set the random seed, such that you get the same result each time.) It is much simpler (and also much faster) to do this instead:
set.seed(1234)
x2 <- runif(100, 0, 1)
identical(x1, x2)
## [1] TRUE
As you see, results are identical.
The reason that x[k]<-rep(NA,M[k]) does not work is that indeed x[k] is not a valid variable name in R. [ is used for indexing, so x[k] extracts the element k from a vector x. Since you try to assign a vector of length larger than 1 to a single element, you get an error. What you probably want to use is a list, as you will see in the example below.
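The closest R counterpart to Maple's v[1][n] is a list indexed with [[ ]] followed by ordinary vector indexing. A minimal sketch with made-up values:

```r
v <- list()               # a list can hold vectors of different lengths
v[[1]] <- c(10, 20, 30)   # first vector
v[[2]] <- c(1.5, 2.5)     # second vector, different length

v[[1]][2]                 # second element of the first vector: 20
lengths(v)                # length of each stored vector: 3 and 2
```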
So here comes the code that I would use instead of what you proposed in your post. Note that I am not sure that I correctly understood what you intend to do, so I will also describe below what the code does. Let me know if this fits your intentions.
# define M
library(MASS)
eks_2016_kasko <- 486689.1
sims<-10
M<-rnegbin(sims, eks_2016_kasko*exp(-2.17173), 840.1746)
# define the function that calculates X for a single value from M
calculate_X <- function(m) {
x <- runif(m, min=0,max=1)
# x <= 0.1056379 selects the first lognormal, matching the if/else in the question
ifelse(x <= 0.1056379, rlnorm(m, 6.228244, 0.3565041),
rlnorm(m, 8.910837, 1.1890874))
}
# apply that function to each element of M
X <- lapply(M, calculate_X)
As you can see, there are no loops in that solution. I'll start to explain at the end:
lapply is used to apply a function (calculate_X) to each element of a list or vector (here it is the vector M). It returns a list. So, you can get, e.g. the third of the vectors with X[[3]] (note that [[ is used to extract elements from a list). And the contents of X[[3]] will be the result of calculate_X(M[3]).
The function calculate_X() does the following: It creates a vector of m uniformly distributed random values (remember that m runs over the elements of M) and stores that in x. Then it creates a vector X that contains log normally distributed random variables. The parameters of the distribution depend on the value x.
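To check that the result has the shape described, the sketch below runs the same idea on a small, made-up vector of counts instead of the rnegbin() draws, so it needs no extra packages. It assumes the threshold direction from the question's if/else (values at or below 0.1056379 use the first lognormal).

```r
calculate_X <- function(m) {
  x <- runif(m, min = 0, max = 1)
  ifelse(x <= 0.1056379,
         rlnorm(m, 6.228244, 0.3565041),
         rlnorm(m, 8.910837, 1.1890874))
}

set.seed(1)
M <- c(3, 0, 5)               # made-up counts standing in for the simulated M
X <- lapply(M, calculate_X)

lengths(X)                    # one vector per element of M, with lengths 3, 0, 5
X[[3]]                        # the vector built for M[3]
```

Note that a count of 0 simply yields an empty vector here, so no "+ 1" adjustment is needed with this approach.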

R assign a list of values to a list of objects

Thank you for trying to help. I am happy to be corrected on all R misdemeanors.
I am not sure that I was entirely clear with my earlier post as below, so I will hope to clarify:
In the R console, I use source() (etc.) to call a .R file.
Code within the .R file uses variables (e.g. for 'extracted info') ex1, ex2, ex3. These may hold strings or (strings of) numbers pulled from text.
In line with your guidance I've renamed my function to 'reset' (and ?reset indicates no other occurrences are in scope). I'm passing both x and y, which come from outside the function:
#send variables ex1, ex2, ex3 together with location, loc and parse, prs to be reset with 0
reset(x<-c(loc,prs,ex1,ex2,ex3),y<-rep(c(0),length(x))) #repeats 0 in y variable as many times as there are entries for x
reset<-function(x,y){
print(c("resetting ",x," with ", y))
if (length(x) == length(y)) {x <- y
print(paste(x,"=",y),sep="") #both x and y should now be equal (to y)
} else {
paste("list lengths differ: x=",length(x)," y=",length(y),sep="")
}
}
Now both x and y are 0 but ex1, ex2 and ex3 still contain the previous values
I would like ex1, ex2 and ex3 all to be 0 before they are used in a subsequent section of code, so they don't contaminate extracted data with previous values such as:
loc<-str_locate(data[i],"=")
prs<-str_locate(data[i],",")
#extract data from the end of loc to before the occurrence of prs
ex1<-str_sub(data[i],loc[2]+1,prs[1]-1)
#cleanup
#below is simplified for example;
#in reality I wish to send ex1:ex(n) to be reset with values val1:val(n)
The desired outcome would be that back in the Rconsole >ex1 should now return 0.
Hope you can understand my dilemma and possibly help.
Say my code uses some variables to hold data extracted from a string using Stringr str_sub. The variables are temporary in that I use the values to construct other strings then they should be freed up to be used in an upcoming test: i.e. if (test==true){extract<-str_sub(string, start, end)}
For a later test, I would like extract==0; simple enough, but I have a few of these and would like to do it in one fell swoop.
I've used a for loop, but if there is a simpler way, please identify this.
My attempt is using a function:
#For variables loc, prs, ex1 and x2, set all values to 0
x<-assign(x<-c(loc, prs, ex1, ex2),y<-rep(c(0),length(x)))
#Function
assign <- function(x, y) {
if(length(x)==length(y)){
for (i in 1:length(x)){x[i]<-y[i]}
print(c("Assigned",x[i]))
return (x)
} else { print (c("list lengths differ: x=",length(x)," y=",length(y)))
}
}
The problem being that this returns x as 0, but the list of variables retain their values.
I'm a bit of a noob to both r and SO, so although I've benefitted from SO's bountiful advice on numerous occasions, this is my first question, so please be gentle. I have searched this issue, but have not found what I need in a few hours now. Hope you can help.
Beware of naming a function assign. There is already one in base-r and you will create confusion.
There are a couple of problems with your function besides its name. First, you do not need the for-loop to replace x by y, as this is a basic vectorized operation. Just use x <- y. Second, you should wrap your message in paste.
asgn <- function(x, y) {
if(length(x)==length(y)){
## This step is not needed, return(y) is better as #Rick proposed in their now deleted answer
## I am leaving it to show you how the for-loop is not needed
x<-y
return (x)
} else {
print (paste("list lengths differ: x=",length(x)," y=",length(y)))
return(x)
}
}
Then, there are a couple of problems with your function call. You use <- instead of = to specify the arguments. The two are only somewhat synonymous for assigning variables; inside a function call, = names an argument, whereas <- performs an assignment in the calling environment. Finally, you are trying to use x in the definition of y in the arguments (length(x)), but this does not refer to the argument x: R looks for x in the parent environment. Note that length(3) is 1, not 3, so it is safer to build the vector first and take its length:
vars <- c(loc, prs, ex1, ex2)
x<-asgn(x=vars,y=rep(c(0),length(vars)))
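None of this changes ex1, ex2 and so on in the caller, because R passes copies of the values into the function. One possible way to reset several variables in one call is to pass their names as strings and rebind them with base R's assign(). The sketch below is only an illustration: reset_vars and its arguments are hypothetical names, and the starting values are made up.

```r
# Hypothetical helper: rebind each named variable in the caller's environment.
reset_vars <- function(var_names, value = 0, envir = parent.frame()) {
  force(envir)                                # fix the target environment up front
  for (nm in var_names) {
    base::assign(nm, value, envir = envir)    # rebinding by NAME, not by value
  }
  invisible(var_names)
}

ex1 <- "abc"; ex2 <- "12"; ex3 <- "xyz"
reset_vars(c("ex1", "ex2", "ex3"))
ex1   # now 0
```

Keeping the temporary values in a single named list and resetting that list is often tidier than rebinding separate variables, but that would mean restructuring the surrounding code.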

How to print the name of current row when using apply in R?

For example, I have a matrix k
> k
d e
a 1 3
b 2 4
I want to apply a function on k
> apply(k,MARGIN=1,function(p) {p+1})
a b
d 2 3
e 4 5
However, I also want to print the row name of the row being processed, so that I know which row the function is being applied to at that time.
It might look like this:
apply(k,MARGIN=1,function(p) {print(rowname(p)); p+1})
But I really don't know how to do that in R.
Does anyone have any idea?
Here's a neat solution to what I think you're asking. (I've called the input matrix mat rather than k for clarity - in this example, mat has 2 columns and 10 rows, and the rows are named abc1 through to abc10.)
In the code below, the result out1 is the thing you wanted to calculate (the outcome of the apply command). The result out2 comes out identically to out1 except that it prints out the rownames that it is working on (I put in a delay of 0.3 seconds per row so you can see it really does do this - take this out when you want the code to run full speed obviously!)
The trick I came up with was to cbind the row numbers (1 to n) onto the left of mat (to create a matrix with one additional column), and then use this to refer back to the rownames of mat. Note the line x = y[-1] which means that the actual calculation within the function (here, adding 1) ignores the first column of row numbers, which means it's the same as the calculation done for out1. Whatever sort of calculation you want to perform on the rows can be done this way - just pretend that y never existed, and formulate your desired calculation using x. Hope this helps.
set.seed(1234)
mat = as.matrix(data.frame(x = rpois(10,4), y = rpois(10,4)))
rownames(mat) = paste("abc", 1:nrow(mat), sep="")
out1 = apply(mat,1,function(x) {x+1})
out2 = apply(cbind(seq_len(nrow(mat)),mat),1,
function(y) {
x = y[-1]
cat("Doing row:",rownames(mat)[y[1]],"\n")
Sys.sleep(0.3)
x+1
}
)
identical(out1,out2)
You can use a variable outside of the apply call to keep track of the row index and pass the row names as an extra argument to your function:
idx <- 1
apply(k, 1, function(p, rn) {print(rn[idx]); idx <<- idx + 1; p + 1}, rownames(k))
This should work. The cat() function is what you want to use when printing results during evaluation of a function. paste(), conversely, just returns a character vector but doesn't send it to the command window.
The solution below uses a counter created as a closure, allowing it to "remember" how many times the function has been run before. Note the use of the global assign <<-. If you really want to understand what's going on here, I recommend reading through this wiki https://github.com/hadley/devtools/wiki/
Note there may be an easier way to do this; my solution assumes that there is no way to access the rownumber or rowname of a current row using typical means within an apply function. As previously mentioned, this would be no problem in a loop.
k <- matrix(c(1,2,3,4),ncol=2)
rownames(k) <- c("a","b")
colnames(k) <- c("d","e")
make.counter <- function(x){
i <- 0
function(){
i <<- i+1
i
}
}
counter1 <- make.counter()
apply(k,MARGIN=1,function(p){
current.row <- rownames(k)[counter1()]
cat(current.row,"\n")
return(p+1)
})
As far as I know you cannot do that with apply, but you could loop through the rownames of your data frame. Lame example:
lapply(rownames(mtcars), function(x) sprintf('The mpg of %s is %s.', x, mtcars[x, 1]))
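The same idea can be applied to the matrix k from the question: iterate over the row names with sapply and index the matrix by name inside the function. This is a sketch, not the only way to do it; the variable names rn and out are arbitrary.

```r
k <- matrix(c(1, 2, 3, 4), ncol = 2,
            dimnames = list(c("a", "b"), c("d", "e")))

out <- sapply(rownames(k), function(rn) {
  cat("Doing row:", rn, "\n")   # rn is the name of the row being processed
  k[rn, ] + 1                   # look the row up by name, then do the work
})

out   # columns a and b, rows d and e, like the apply() output in the question
```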
