Struggling creating a difference function


So I have a homework problem that I am really struggling to code in R.
This is the problem: Write a function difference() that takes a vector X as a parameter and returns a vector of the
difference between each element and the next element:
X[2]-X[1], X[3]-X[2], X[4]-X[3], etc.
Thus difference(c(5,2,9,4,8)) would return c(-3,7,-5,4)
And so far I have this:
difference<-function(X) {
for (i in X)
X.val<-X[i]-X[i-1]
return(X.val)
}
difference(c(5,2,9,4,8))
I can't seem to get the function to compute X[2]-X[1], and it returns one more number than it should when I run it. Can anyone help me?

You're having a couple of problems with your code. Since this is homework, I'm not going to provide the correct code, but I'll highlight where you're going wrong to help you get closer. The only reason I'm not providing the answer is that these are good learning experiences. If you comment with updated attempts, I'll keep updating my answer to guide you.
The issue is that you're using for (i in X), which will actually loop through the values of X and not its index. So, in your example, i will equal 5 and then 2 and then 9 and then 4 and then 8. If we start with i == 5, the code is doing this: X.val <- X[5] - X[5 - 1]. At this point you'd assign X.val to be 4 because X[5] is equal to 8 and X[4] is equal to 4. At the next iteration, i == 2. So this will set X.val to -3 because X[2] is 2 and X[1] is 5.
To fix this issue, you'd want to loop through the index of X instead. You can do this by using for (i in 1:length(X)) where length(X) will give you a number equal to the number of elements in X.
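To see the difference between looping over values and looping over positions, here is a small demonstration using the vector from the question. seq_along(x) is an alternative to 1:length(x) that also behaves sensibly for an empty vector (where 1:length(x) gives c(1, 0)):

```r
x <- c(5, 2, 9, 4, 8)

# for (i in x): i takes each VALUE of x in turn
seen_values <- c()
for (i in x) seen_values <- c(seen_values, i)
seen_values   # 5 2 9 4 8

# for (i in seq_along(x)): i takes each POSITION (index) of x
seen_indices <- c()
for (i in seq_along(x)) seen_indices <- c(seen_indices, i)
seen_indices  # 1 2 3 4 5
```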
The next issue you've found is that you're getting one extra number. It's important to think about how many numbers you should have in your output and what this means in terms of where i should start. Hint: should you really be starting at 1?
Lastly, you overwrite X.val in each iteration. It surprises me that you were getting an extra number in your results: you should have received only NA, because the last value of i is 8 and X has only 5 elements, so X[8] and X[7] are both NA. Nevertheless, you'll need to rewrite your code so that you don't overwrite X.val, but instead add to it in each iteration.
I hope that helps.
UPDATE #1
As noted in the comments below, your code now looks like this:
difference <- function(X) {
for (i in 2:length(X)) {
X[i] <- X[i] - X[i-1]
}
return(X)
}
difference(c(5, 2, 9, 4, 8))
We are now very, very close to a final solution. We just need to address a quick problem.
The problem is that we're now overwriting our values of X, which is bad. Since our numbers, c(5,2,9,4,8), are passed into the function as the variable X, the line X[i] <- X[i] - X[i-1] overwrites them as the loop runs. Stepping through one iteration at a time, we get the following:
Step 1:
i gets set to 2
X[2] is currently equal to 2
We then run the line X[i] <- X[i] - X[i-1], which gets evaluated like this: X[2] <- X[2] - X[1] --> X[2] <- 2 - 5 --> X[2] <- -3
X[2] is now set to -3
Step 2:
i gets set to 3
X[3] is currently equal to 9
We then run the line X[i] <- X[i] - X[i-1], which gets evaluated like this: X[3] <- X[3] - X[2] --> X[3] <- 9 - -3 --> X[3] <- 12
X[3] is now set to 12
As you can see from the first two iterations, we're overwriting our X variable, which is directly impacting the differences we get when we run our function.
To solve this, we simply go back to using X.val, like we were before. Since this variable has no values, there's nothing to be overwritten. Our function now looks like this:
difference <- function(X) {
for (i in 2:length(X)) {
X.val[i] <- X[i] - X[i-1]
}
return(X.val)
}
Now, for each iteration, nothing is overwritten and our values of X stay intact. There are two problems that we're going to have, though. If we run this new code, we'll get an error telling us that X.val doesn't exist (object 'X.val' not found). Earlier, I told you that you can index into a variable that you're creating, which is true; we just have to create the variable first. There are several ways to do this; a simple one (though not the best, as we'll see later) is to create an empty variable with the same class as our expected output. Since we want our output to be a vector of numbers, we can make X.val an empty numeric vector. Our code now looks like this:
difference <- function(X) {
X.val <- numeric()
for (i in 2:length(X)) {
X.val[i] <- X[i] - X[i-1]
}
return(X.val)
}
Notice that the assignment of X.val happens before we enter the for loop. As an exercise, you should think about why that's the case and then try moving it inside of the for loop and seeing what happens.
So this solves our first problem. Try running the code and see what you get. You'll notice that the first element of the output is NA. Why might this be the case, and how can we fix it? Hint: it has to do with the value of i.
UPDATE #2
So now that we have the correct answer, let's look at a couple of tips and tricks that R makes available. R can operate on whole vectors at once. To see this in action, run the following example:
a <- 1:10
b <- 11:20
a + b
a - b
a * b
a / b
As you can see, R automatically performs what are called "element-wise" operations on vectors. You'll notice that a - b is pretty similar to what we were trying to do here. The difference is that a and b are two different vectors, while we were dealing with one vector at a time. So how do we set up our problem to work like this? Simple: we create two vectors.
x <- c(5, 2, 9, 4, 8)
y <- x[2:length(x)]
z <- x[1:(length(x)-1)]
y - z
You should notice that y - z now gives us the answer that we wanted from our function. We can apply that to our difference function like so:
difference <- function(X) {
y <- X[2:length(X)]
z <- X[1:(length(X)-1)]
return(y-z)
}
Using this trick, we no longer need a for loop, which can be slow in R, and can instead use a vectorized operation, which is much faster. As was stated in the comments, we can skip assigning those values to y and z and instead directly return what we want:
difference <- function(X) {
return(X[2:length(X)] - X[1:(length(X)-1)])
}
We've now successfully created a one-line function that does what we were hoping to do. Let's see if we can make it even cleaner. R comes with two functions that are very handy for looking at data: head() and tail(). head() returns the first n elements and tail() returns the last n elements. Let's see an example.
a <- 1:50
head(a) # defaults to 6 elements
tail(a) # defaults to 6 elements
head(a, n=20) # we can change how many elements to return
tail(a, n=20)
head(a, n=-1) # returns all but the last element
tail(a, n=-1) # returns all but the first element
Those last two are the most important for what we want to do. In our newest version of difference we were looking at X[2:length(X)], which is another way of saying "all elements in X except the first element". We were also looking at X[1:(length(X)-1)], which is another way of saying "all elements in X except the last element". Let's clean that up:
difference <- function(X) {
return(tail(X, -1) - head(X, -1))
}
As you can see, that's a much cleaner way of defining our function.
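For what it's worth, base R already ships a diff() function that computes these lag-1 differences, so a quick way to check the head/tail version is to compare the two on the vector from the question:

```r
difference <- function(X) {
  tail(X, -1) - head(X, -1)   # all but the first minus all but the last
}

x <- c(5, 2, 9, 4, 8)
difference(x)                       # -3  7 -5  4
diff(x)                             # -3  7 -5  4
all.equal(difference(x), diff(x))   # TRUE
```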
So those are the tricks. Let's look at a couple of tips. The first is to drop return() from simple functions like this: R automatically returns the value of the last expression evaluated in a function. To see this in action, try running these two functions:
difference_1 <- function(X) {
x.diff <- tail(X, -1) - head(X, -1)
}
difference_1(1:10)
difference_2 <- function(X) {
tail(X, -1) - head(X, -1)
}
difference_2(1:10)
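In difference_1 you'll notice that nothing is printed. This is because the last expression is an assignment, and an assignment returns its value invisibly, so it is not auto-printed. Ending the function with return(x.diff), or simply with the expression itself as in difference_2, makes the result visible.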
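Strictly speaking, difference_1 does return its value, just invisibly. A quick way to see this is to capture or explicitly print the result:

```r
difference_1 <- function(X) {
  x.diff <- tail(X, -1) - head(X, -1)   # last expression is an assignment
}

difference_1(1:10)            # prints nothing: the value is invisible
result <- difference_1(1:10)  # but it is still returned...
result                        # ...so this shows 1 1 1 1 1 1 1 1 1
print(difference_1(1:10))     # print() also makes it visible
```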
The next tip is something you won't need for a while, but it's important. Going back to the current version of difference that we have (the code you're using now, not anything I've mentioned in this update), we assign values to X.val, which causes it to "grow" over time. To see what this means, run the following code:
x.val <- numeric()
length(x.val)
x.val[1] <- 1
length(x.val)
x.val[2] <- 2
length(x.val)
You'll see that the length keeps growing. This is often a source of major slowdowns in R code. The proper way to do this is to create x.val with the length we need up front. This is much faster and will save you some pain in the future. Here's how it would work:
difference <- function(X) {
x.val <- numeric(length=(length(X) - 1))
for (i in 2:length(X)) {
x.val[i-1] <- X[i] - X[i-1]
}
return(x.val)
}
In our current code, this doesn't make a real difference. But if you're dealing with very large data in the future, this can save you hours or even days of computing time.
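One more detail worth knowing: 2:length(X) counts down when X has only one element (2:1 is c(2, 1)), so the loop above returns NA instead of an empty vector for a length-1 input, and numeric(length = -1) fails for an empty input. A sketch of the same preallocated loop that uses seq_len() to sidestep both cases:

```r
difference <- function(X) {
  n_out <- max(length(X) - 1, 0)   # number of differences (never negative)
  x.val <- numeric(n_out)          # preallocate the result once
  for (i in seq_len(n_out)) {      # seq_len(0) runs zero iterations
    x.val[i] <- X[i + 1] - X[i]
  }
  x.val
}

difference(c(5, 2, 9, 4, 8))  # -3  7 -5  4
difference(5)                 # numeric(0)
```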
I hope this all helps you better understand some of the functionality in R. Good luck with everything!

Related

Problem with checking logical within for loop

Inspired by the LeetCode Two Sum challenge, I wanted to solve it in R. But while trying to solve it by brute force, I ran into an issue with my for loop.
The basic idea is: given a vector of integers, find the two integers in the vector that sum to a given target integer.
First I create 10000 integers:
set.seed(1234)
n_numbers <- 10000
nums <- sample(-10^4:10^4, n_numbers, replace = FALSE)
Then I use a for loop within a for loop to check every element against every other.
# ensure that it is actually solvable
target <- nums[11] + nums[111]
test <- 0
for (i in 1:(length(nums)-1)) {
for (j in 1:(length(nums)-1)) {
j <- j + 1
test <- nums[i] + nums[j]
if (test == target) {
print(i)
print(j)
break
}
}
}
My problem is that it starts wildly printing numbers before ever getting to the right condition of test == target. And I cannot seem to figure out why.

I think there are several issues with your code:
First, you don't have to increase j manually; the for statement does that for you. So if you want j to run from 2 to the end, just write:
for (j in 2:(length(nums)))
Second, break only exits the inner loop, so the outer loop keeps going. See the question "Breaking out of nested loops in R" for more information on that.
Third, there are several pairs of entries in nums whose sum equals target. Your if-condition therefore works as intended and prints every combination of nums[i] + nums[j] that equals target; because both loops run over (almost) all indices, the same pair can show up twice, once in each order.
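A minimal sketch combining the first two points: wrapping the search in a function (two_sum_brute is a hypothetical name) lets return() leave both loops at the first match, and starting j at i + 1 means no element is paired with itself and no pair is checked twice:

```r
two_sum_brute <- function(nums, target) {
  n <- length(nums)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {            # only pairs with j > i
      if (nums[i] + nums[j] == target) {
        return(c(i, j))               # exits both loops at once
      }
    }
  }
  NULL                                # no pair sums to target
}

two_sum_brute(c(2, 7, 11, 15), 9)     # 1 2
```

This is still O(n^2), so with 10000 numbers it can take a while in R.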

Finding the value of infinite sums in r

I'm very new to R and programming, so please bear with me :)
I am trying to use iteration to find the value of an infinite sum to the 4th decimal place, i.e. the point where the 4th decimal no longer changes. For example, with 1.4223, once the 3 no longer changes, the result to three decimal places is 1.422.
The link above shows an example of a similar problem that I am faced with. My question is how do I create a for-loop that goes to infinity and find the value where the 4th decimal point stops changing?
I have tried using while loops but I am not sure how to stop it from just looping forever. I need some if statement like below:
result <- 0
i <- 1
d <- 1e-4
while(TRUE)
{
result <- result + (1/(i^2))
if(abs(result) < d)
{
break
}
i <- i + 1
}
result

Here's an example: to do the infinite loop, use while(TRUE) {}, and as you suggested use an if clause and break to stop when necessary.
## example equation shown
## fun <- function(x,n) {
## (x-1)^(2*n)/(n*(2*n-1))
## }
## do it for f(x)=1/x^2 instead
## doesn't have any x-dependence, but leave it in anyway
fun <- function(x,n) {
1/n^2
}
n <- 1
## x <- 0.6
tol <- 1e-4
ans <- 0
while (TRUE) {
next_term <- fun(x,n)
ans <- ans + next_term
if (abs(next_term)<tol) break
n <- n+1
}
When run, this gives ans=1.635082, n=101.
R also has a rarely used repeat { } keyword, but while(TRUE) will probably be clearer to readers
there are more efficient ways to do this (e.g. calculating the numerator by multiplying it by (x-1)^2 each time)
it's generally a good idea to test for a maximum number of iterations as well so that you don't set up a truly infinite loop if your series doesn't converge or if you have a bug in your code
I haven't solved your exact problem (chose a smaller value of tol), but you should be able to adjust this to get an answer
as discussed in the answer to your previous question, this isn't guaranteed, but should generally be OK; you can check (I haven't) to be sure that the particular series you want to evaluate has well-behaved convergence
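Following the advice about capping the number of iterations, here is a sketch that wraps the same stopping rule in a function with a max_iter safeguard. The names sum_series, term_fn and max_iter are hypothetical; for 1/n^2 it should stop at the same n as the while(TRUE) loop above:

```r
sum_series <- function(term_fn, tol = 1e-4, max_iter = 1e6) {
  total <- 0
  for (n in seq_len(max_iter)) {
    term <- term_fn(n)
    total <- total + term
    if (abs(term) < tol) {                 # stop once a term is below tol
      return(list(sum = total, n = n))
    }
  }
  warning("series did not reach tol within max_iter terms")
  list(sum = total, n = max_iter)
}

res <- sum_series(function(n) 1 / n^2)
res$n    # 101
res$sum  # approximately 1.635082
```

Note that a small last term does not by itself mean the sum is accurate to tol: for 1/n^2 the neglected tail after n terms is roughly 1/n, which is why this result is still about 0.01 below pi^2/6.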

what is the most efficient way to find the most common value in a vector?

I'm trying to create a function to solve this puzzle:
An Arithmetic Progression is defined as one in which there is a constant difference between the consecutive terms of a given series of numbers. You are provided with consecutive elements of an Arithmetic Progression. There is however one hitch: exactly one term from the original series is missing from the set of numbers which have been given to you. The rest of the given series is the same as the original AP. Find the missing term.
You have to write the function findMissing(list), list will always be at least 3 numbers. The missing term will never be the first or last one.
The next section of code shows my attempt at this function. The site I'm on runs tests against the function, and all of them passed, i.e. the function returned the correct missing integer.
The problem I'm facing is a timeout error, because it takes too long to run all the tests. There are 102 tests, and the site says they take over 12 seconds to complete, which means the function isn't efficient enough.
My own timing tests in RStudio suggest the function runs in considerably less than 12 seconds, but regardless, I need to make it more efficient to complete the puzzle.
I asked on the site forum and someone said "Sorting is expensive, think of another way of doing it without it." I took this to mean I shouldn't be using the sort() function. Is this what they mean?
I've since found a few different ways of getting my_diff, which is currently calculated using sort(). All of them are even less efficient than the original.
Can anyone give me a more efficient way to find my_diff without sorting, or make other parts of the code more efficient? Apparently it's the sort() part that is inefficient.
find_missing <- function(sequence){
len <- length(sequence)
if(len > 3){
my_diff <- as.integer(names(sort(table(diff(sequence)), decreasing = TRUE))[1])
complete_seq <- seq(sequence[1], sequence[len], my_diff)
}else{
differences <- diff(sequence)
complete_seq_1 <- seq(sequence[1],sequence[len],differences[1])
complete_seq_2 <- seq(sequence[1],sequence[len],differences[2])
if(length(complete_seq_1) == 4){
complete_seq <- complete_seq_1
}else{
complete_seq <- complete_seq_2
}
}
complete_seq[!complete_seq %in% sequence]
}
Here are a couple of sample sequences to check the code works:
find_missing(c(1,3,5,9,11))
find_missing(c(1,5,7))
Here are some of the other things I tried instead of sort:
1:
library(pracma)
Mode(diff(sequence))
2:
library(dplyr)
(data.frame(diff_1 = diff(sequence)) %>%
group_by(diff_1) %>%
summarise(count = n()) %>%
ungroup() %>%
filter(count==max(count)))[1]
3:
MaxTable <- function(sequence, mult = FALSE) {
differences <- diff(sequence)
if (!is.factor(differences)) differences <- factor(differences)
A <- tabulate(differences)
if (isTRUE(mult)) {
as.integer(levels(differences)[A == max(A)])
}
else as.integer(levels(differences)[which.max(A)])
}

Here is one way to do this using seq(). Because exactly one term is missing, the complete sequence runs from the minimum to the maximum value of x and has length(x) + 1 terms; setdiff() then returns the term that is not in x.
find_missing <- function(x) {
setdiff(seq(min(x), max(x), length.out = length(x) + 1), x)
}
find_missing(c(1,3,5,9,11))
#[1] 7
find_missing(c(1,5,7))
#[1] 3

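This approach takes the diff() of the vector. Assuming the sequence is increasing, the gap where the term is missing is the one difference larger than the others (twice the common difference), so the missing term is the element before that gap plus the smallest difference.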
find_missing <- function(x) {
diffs <- diff(x)
x[which.max(diffs)] + min(diffs)
}
find_missing(c(1,3,5,9,11))
[1] 7
find_missing(c(1,5,7))
[1] 3

There is actually a simple formula for this, which will work even if your vector is not sorted...
find_missing <- function(x) {
(length(x) + 1) * (min(x) + max(x))/2 - sum(x)
}
find_missing(c(1,5,7))
[1] 3
find_missing(c(1,3,5,9,11,13,15))
[1] 7
find_missing(c(2,8,6))
[1] 4
It is based on the fact that the sum of the full series equals its average value, (min(x) + max(x))/2, times its length, length(x) + 1; subtracting sum(x) leaves the missing term.
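To check that the three answers agree, here is a sketch that runs them against randomly generated increasing integer progressions with one interior term removed. The names fm_seq, fm_diff, fm_formula, same and check_all are hypothetical; the bodies are copied from the answers above:

```r
fm_seq     <- function(x) setdiff(seq(min(x), max(x), length.out = length(x) + 1), x)
fm_diff    <- function(x) { d <- diff(x); x[which.max(d)] + min(d) }
fm_formula <- function(x) (length(x) + 1) * (min(x) + max(x)) / 2 - sum(x)

# TRUE only if a is a single value equal to b
same <- function(a, b) length(a) == 1 && a == b

check_all <- function(trials = 100) {
  for (trial in seq_len(trials)) {
    start <- sample(-50:50, 1)
    step  <- sample(1:5, 1)                        # increasing, integer steps
    full  <- seq(start, by = step, length.out = sample(4:20, 1))
    drop  <- sample(2:(length(full) - 1), 1)       # never the first or last term
    x     <- full[-drop]
    if (!(same(fm_seq(x), full[drop]) &&
          same(fm_diff(x), full[drop]) &&
          same(fm_formula(x), full[drop]))) return(FALSE)
  }
  TRUE
}

set.seed(42)
check_all(100)
```

Of the three, the formula version only needs min(), max() and sum(), so it makes a single pass over the data and does not depend on the input being sorted.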

Indexing variables in R

I am normally a maple user currently working with R, and I have a problem with correctly indexing variables.
Say I want to define 2 vectors, v1 and v2, and I want to call the nth element in v1. In maple this is easily done:
v[1]:=some vector,
and the nth element is then called by the command
v[1][n].
How can this be done in R? The actual problem is as follows:
I have a sequence M (say of length 10, indexed by k) of simulated negbin variables. For each of these simulated variables I want to construct a vector X of length M[k] with entries given by some formula. So I should end up with 10 different vectors, each of different length. My incorrect code looks like this
sims<-10
M<-rnegbin(sims, eks_2016_kasko*exp(-2.17173), 840.1746)
for(k in 1:sims){
x[k]<-rep(NA,M[k])
X[k]<-rep(NA,M[k])
for(i in 1:M[k]){x[k][i]<-runif(1,min=0,max=1)
if(x[k][i]>=0 & x[i]<=0.1056379){
X[k][i]<-rlnorm(1, 6.228244, 0.3565041)}
else{
X[k][i]<-rlnorm(1, 8.910837, 1.1890874)
}
}
}
The error appears to be that x[k] is not a valid name for a variable. Any way to make this work?
Thanks a lot :)

I've edited your R script slightly to get it working and make it reproducible. To do this I had to assume that eks_2016_kasko was an integer value of 10.
require(MASS)
sims<-10
# Add one so that no M[k] is 0 (1:0 would loop over c(1, 0))
M<-rnegbin(sims, 10*exp(-2.17173), 840.1746) + 1
# Create a list
x <- list()
X <- list()
for(k in 1:sims){
x[[k]]<-rep(NA,M[k])
X[[k]]<-rep(NA,M[k])
for(i in 1:M[k]){
x[[k]][i]<-runif(1,min=0,max=1)
if(x[[k]][i]>=0 & x[[k]][i]<=0.1056379){
X[[k]][i]<-rlnorm(1, 6.228244, 0.3565041)}
else{
X[[k]][i]<-rlnorm(1, 8.910837, 1.1890874)
}
}
}
This will work and I think is what you were trying to do, BUT it is not great R code. I strongly recommend using the lapply family instead of for loops, and learning data.table and parallelisation if you need things to scale. Additionally, if you want to read more about indexing and subsetting in R, Hadley Wickham has a comprehensive breakdown here.
Hope this helps!

Let me start with a few remarks and then show you, how your problem can be solved using R.
In R, there is usually no need to use a for loop to assign several values to a vector. For example, to fill a vector of length 100 with uniformly distributed random variables, you could write:
set.seed(1234)
x1 <- rep(NA, 100)
for (i in 1:100) {
x1[i] <- runif(1, 0, 1)
}
(set.seed() is used to set the random seed, such that you get the same result each time.) It is much simpler (and also much faster) to do this instead:
set.seed(1234)  # reset the seed so both versions draw the same numbers
x2 <- runif(100, 0, 1)
identical(x1, x2)
## [1] TRUE
As you see, results are identical.
The reason that x[k]<-rep(NA,M[k]) does not work is that x[k] is not a variable name in R: [ is the indexing operator, so x[k] refers to element k of an existing vector x. In your code x has not been defined yet, so R cannot find it; and even if it existed, assigning a vector longer than 1 to a single element would keep only the first value (with a warning). What you probably want is a list, as you will see in the example below.
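The Maple pattern v[1][n] maps onto a list in R: store each vector as a list element with [[ ]], then index into it with [ ]. A small sketch with made-up values:

```r
# A list can hold vectors of different lengths (values here are made up)
v <- list()
v[[1]] <- c(10, 20, 30)
v[[2]] <- c(1, 2)

v[[1]]      # the whole first vector: 10 20 30
v[[1]][2]   # its second element, like v[1][2] in Maple: 20
lengths(v)  # length of each element: 3 2
```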
So here comes the code that I would use instead of what you proposed in your post. Note that I am not sure that I correctly understood what you intend to do, so I will also describe below what the code does. Let me know if this fits your intentions.
# define M
library(MASS)
eks_2016_kasko <- 486689.1
sims<-10
M<-rnegbin(sims, eks_2016_kasko*exp(-2.17173), 840.1746)
# define the function that calculates X for a single value from M
calculate_X <- function(m) {
x <- runif(m, min=0,max=1)
X <- ifelse(x <= 0.1056379, rlnorm(m, 6.228244, 0.3565041),
rlnorm(m, 8.910837, 1.1890874))
X
}
# apply that function to each element of M
X <- lapply(M, calculate_X)
As you can see, there are no loops in that solution. I'll start to explain at the end:
lapply is used to apply a function (calculate_X) to each element of a list or vector (here it is the vector M). It returns a list. So, you can get, e.g. the third of the vectors with X[[3]] (note that [[ is used to extract elements from a list). And the contents of X[[3]] will be the result of calculate_X(M[3]).
The function calculate_X() does the following: It creates a vector of m uniformly distributed random values (remember that m runs over the elements of M) and stores that in x. Then it creates a vector X that contains log-normally distributed random variables. The parameters of the distribution depend on the corresponding value in x.

Explaining a for loop in R

I'm very new to R, and even newer to programming in R. I have the following question and its answer (which is not mine). I've been trying to understand why certain values are used and where they come from.
Question: Make the vector 3 5 7 9 11 13 15 17 with a for loop. Start with x=numeric() and fill this vector with the for loop
I know I have to create x=numeric() so I can fill it with the result obtained from the loop.
The answer from a classmate was:
x <- numeric()
for(i in 1:8){
if(i==1){ ## Why ==1 and not 0, or any other value
x[i] <- 3
}else{
x[i] <- x[i-1]+2 ### And why i-1
}
}
I'm having similar problems in questions like:
Make a for loop that adds the second element of a vector to the first, subtracts the third element from the result, adds the fourth again and so on for the entire length of the vector
So far, I created the vector and the empty vector
y = c(5, 10, 15, 20, 25, 30)
answer <- 0
And then, when I try to do the for loop, I get stuck here:
for(i in 1:length(y)){
if(i...){ ### ==1? ==0?
answer = y[i] ###and here I really don't know how to continue.
}else if()
}
Believe me when I tell you I've read several replies to questions here, like How to make a vector using a for loop, plus pages and pages about for loops, but I cannot figure out how to solve these (and other) problems.
I repeat, I'm very new, so I'm struggling trying to understand it. Any help would be much appreciated.

First, I will annotate the loop to answer what the loop is doing.
# Initialize the vector
x <- numeric()
for(i in 1:8){
# Initialize the first element of the vector, x[1]. Remember, R indexes start at 1, not 0.
if(i==1){
x[i] <- 3
} else {
# Define each additional element in terms of the previous one (x[i - 1]
# is the element of x before the current one).
x[i] <- x[i-1]+2
}
}
A better solution that uses a loop and grows the vector (as the instructions require) is something like this:
x <- numeric()
for(i in 1:8){
x[i] <- 2 * i + 1
}
This is still not a good way to do things because growing a vector inside a loop is very slow. To fix this, you can preallocate the vector by telling numeric the length of the vector you want:
x <- numeric(8)
for(i in 1:8) x[i] <- 2 * i + 1
The best way to solve this would be:
2 * 1:8 + 1
using vectorized operations.
To help you solve your other problem, I suggest writing out each step of the loop as a table. For example, for my solution, the table would be
i | x[i]
------------------
1 | 2 * 1 + 1 = 3
2 | 2 * 2 + 1 = 5
and so on. This will give you an idea of what the for loop is doing at each iteration.
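Applying the same step-by-step thinking to the alternating-sum exercise, here is one loop-based sketch. It assumes "adds the second element to the first, subtracts the third..." means elements at even positions are added and elements at odd positions after the first are subtracted:

```r
y <- c(5, 10, 15, 20, 25, 30)
answer <- y[1]                    # start from the first element
for (i in seq_along(y)[-1]) {     # i = 2, 3, ..., length(y)
  if (i %% 2 == 0) {
    answer <- answer + y[i]       # even positions are added
  } else {
    answer <- answer - y[i]       # odd positions (3, 5, ...) are subtracted
  }
}
answer  # 25, i.e. 5 + 10 - 15 + 20 - 25 + 30
```

Writing out the i | answer table for this loop, as suggested above, is a good way to check each step.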

This is intentionally not an answer because there are better ways to solve the alternating sign summation problem than a for-loop. I suppose there could be value in getting comfortable with for-loops but the vectorized approaches in R should be learned as well. R has "argument recycling" for many of its operations, including the "*" (multiplication) operation: Look at:
(1:10)*c(1,-1)
Then take an arbitrary vector, say vec and try:
sum( vec*c(1,-1) )
The more correct answer after looking at that result would be:
vec[1] + sum( vec[-1]*c(1,-1) )
Which has the educational advantage of illustrating R's negative indexing. Look up "argument recycling" in your documentation. The shorter objects are automagically duplicated/triplicated/however-many-needed-cated to match the length of the longest vector in the mathematical or logical expression.
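One caveat with the recycling trick: when vec[-1] has odd length (as with the 6-element y from the question), c(1, -1) does not divide it evenly and R emits a "longer object length is not a multiple of shorter object length" warning, although the result is still correct. A sketch that builds the sign vector to the exact length with rep_len() avoids the warning (alt_sum is a hypothetical name):

```r
alt_sum <- function(vec) {
  # +1, -1, +1, ... for vec[2], vec[3], vec[4], ...
  signs <- rep_len(c(1, -1), length(vec) - 1)
  vec[1] + sum(vec[-1] * signs)
}

alt_sum(c(5, 10, 15, 20, 25, 30))  # 25
```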
