I've been trying to create a very simple function. Essentially I want every element in t$c changed according to the if/else statement in my code, while the others stay the same. Here's my code:
set.seed(20)
x1=rnorm(100)
x2=rnorm(100)
x3=rnorm(100)
t=data.frame(a=x1,b=x1+x2,c=x1+x2+x3)
fun1=function(multi1,multi2)
{
v=t$c
s=c()
for (i in v)
{
if (i<0)
{
s[i]=i*multi1
}
else if(i>0)
{
s[i]=i*multi2
}
}
return(s)
}
fun1(multi1=0.5,multi2=2)
But it gave me just a few numbers. I suspect I made a silly mistake, but I couldn't figure it out.
tl;dr This operation can be vectorized. You can use the following method, assuming you want to leave values that are 0 or NA alone.
with(t, c * ifelse(c < 0, 0.5, ifelse(c > 0, 2, 1)))
If you want to include them on one side (e.g. the positive side), it's even simpler.
with(t, c * ifelse(c < 0, 0.5, 2))
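As a quick sanity check of the two one-liners, here is a minimal sketch on a made-up vector. The values in v are hypothetical, chosen only so that a negative, a zero, a positive and an NA all appear.

```r
# Hypothetical input: a negative, a zero, a positive and a missing value
v <- c(-2, 0, 3, NA)

# Three-way version: zeros are multiplied by 1, NAs stay NA
three_way <- v * ifelse(v < 0, 0.5, ifelse(v > 0, 2, 1))

# Two-way version: everything that is not negative is doubled
two_way <- v * ifelse(v < 0, 0.5, 2)

print(three_way)
print(two_way)
```

For this particular input the two versions agree, because doubling 0 still gives 0. They would only differ if the "leave alone" branch did something other than multiply.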
As far as your loop goes, you've got a few issues there.
First, you were indexing s by decimal values, which would likely cause errors in the calculations. This is also the reason why your result vector was so short. When you indexed in the loop, the indices were truncated to integer values, and since many of them were repeated, s ended up being very short.
The actual unique index length went something like this -
length(unique(as.integer(t$c)))
# [1] 9
And as a result you got, as a simple example,
s[c(1, 2, 1, 1)] <- something
Since 1 is repeated, only indices 1 and 2 were changed. This is what was happening in your loop. Further illustrated as
x <- 1:5
x[1.2]
# [1] 1
x[1.99]
# [1] 1
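The same truncation applies on assignment. The following is a small sketch with hypothetical index values (not taken from t$c) showing how three fractional indices end up filling only two slots:

```r
# Start with an empty numeric vector, as in the original loop
s <- numeric(0)

# Hypothetical fractional indices; R truncates them to 1, 2, 1
idx <- c(1.2, 2.7, 1.9)
s[idx] <- c(10, 20, 30)

print(s)          # only two elements exist
print(length(s))
```

Three values were assigned, but because two of the truncated indices collide, the result has length 2.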
Next, notice below that we have allocated the vector s. We can do that because we know the length of the resulting vector will be the same as v. This is the recommended, more efficient way rather than building the vector in the loop.
Moving on, I changed for(i in v) to for(i in seq_along(v)) to correct this. Now we are indexing with a sequence for i. Then we also need to index v in the same manner. Finally, we can assign s[i] <- if(... instead of assigning to the same index inside the if() statement.
Also note that you haven't accounted for 0 or any other values that may appear in v (like NA). I added a final else where we just leave those values alone. Change that as you see fit. Furthermore, instead of going to the global environment to get t$c, we can pass it as an argument and make this function more general (credit to @ShawnMehan for that suggestion). Here's the revised version:
fun1 <- function(vec, multi1, multi2) {
s <- vector("numeric", length(vec))
for (i in seq_along(vec)) {
s[i] <- if (vec[i] < 0) {
vec[i] * multi1
} else if(vec[i] > 0) {
vec[i] * multi2
} else {
vec[i]
}
}
return(s)
}
So now we have a length 100 result
x <- fun1(t$c, 0.5, 2)
str(x)
# num [1:100] 2.657 -0.949 7.423 -0.749 5.664 ...
I wrote this long explanation because I figure you are learning how to write a loop. In R though, we can vectorize this entire operation and put it into one line of code. The following line gives the same result as fun1(t$c, 0.5, 2).
with(t, c * ifelse(c < 0, 0.5, ifelse(c > 0, 2, 1)))
Thanks to @Frank for catching my calculation oversight.
Hopefully this all makes sense. Sometimes I don't do well with explanations and technical jargon. If there are any questions, please comment.
I need to write an algorithm that gives you any number n in base 3 in R.
So far I wrote that:
NameOfTheFunction <- function(n) { while (n != 0) {
{q<- n%/%3}
{r <- n%%3}
{return(r)}
q<- n } }
My problem is that I now need to stock every r in a vector. I've never done that and don't quite know how to handle it. I tried to find some things on the internet but I did not find anything really relevant to this particular situation.
After your function, use:
sapply(vector, FUN=function(n) return(NameOfTheFunction(n)))
What sapply does is, for a given vector of your choice, it will repeat the function NameOfTheFunction(n) using every element in your vector in place of n in the function. The result, in this case, will be a vector of every output from your vector.
For example:
vector <- c(10, 100, 1000, 10000)
NameOfTheFunction <- function(n) { while (n != 0) {
{q<- n%/%3}
{r <- n%%3}
{return(r)}
q<- n } }
sapply(vector, NameOfTheFunction)
[1] 1 1 1 1
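Note that the function above returns on the first pass through the loop, so only the last base-3 digit comes back. To keep every remainder, one option is to collect them in a vector inside the loop. This is a minimal sketch, not the original poster's code; to_base3 is a hypothetical name, and it assumes n is a non-negative whole number.

```r
# Hypothetical helper: returns the base-3 digits of n, most significant first.
# Assumes n is a non-negative whole number.
to_base3 <- function(n) {
  if (n == 0) return(0)
  digits <- numeric(0)
  while (n != 0) {
    digits <- c(n %% 3, digits)  # prepend the current remainder
    n <- n %/% 3                 # continue with the quotient
  }
  digits
}

print(to_base3(10))   # 10 = 1*9 + 0*3 + 1
print(to_base3(100))

# Applied to several numbers, the results have different lengths,
# so lapply() (which returns a list) fits better than sapply()
print(lapply(c(10, 100), to_base3))
```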
I have an elementary question that I sadly cannot figure out. I have a set of numeric vectors of 1s and 0s that are stored in the return variable below and whose sums are stored in the totals variable. I would like to check each of these individual vectors to see if there were consecutive zeroes in the result, and then return the total number of times this occurred. However, I'm quite rusty and/or bad at for loops and functions and cannot get this result. My latest attempt is below. Any suggestions are welcome; I appreciate the help.
set.seed(1)
return = ifelse(runif(10) <= 0.6, 1, 0)
totals = sapply(1:10, function (x) sum(ifelse(runif(10)<=0.6,1,0)))
sums = function (x) {
g = 0
for (i in 1:length(x)-1) {
sum(ifelse (x[i]+x[i+1]=0,1,0))
}
return (g)
}
Although this is not the most efficient way to do so (see akrun's answer), we can get your for loop to work:
sums=function (x)
{
g=0
# watch your brackets! 1:3-1 returns c(0,1,2), not c(1,2)!
for (i in 1:length(x)-1)
{
# To test for equality, use a double ==, rather than a single.
# also, your 'g' variable is not updated, which is what you want to do.
sum(ifelse (x[i]+x[i+1]=0,1,0))
}
return (g)
}
Corrected:
sums <-function(x)
{
g=0
for (i in 1:(length(x)-1))
{
g= g+ifelse(x[i]+x[i+1]==0,1,0)
}
return (g)
}
You can call your function by:
return=ifelse(runif(10)<=0.6,1,0)
sums(return)
Or to generate ten vectors with random 1's and 0's, and apply your function to them, you could do:
totals = lapply(1:10, function (x) ifelse(runif(10) <= 0.6, 1, 0))
sapply(totals,sums)
Hope this helps!
If we are looking for runs of consecutive 0's (i.e. runs longer than 1) and their lengths, then use rle:
with(rle(return), lengths[values==0 & lengths > 1])
#[1] 4
The return vector is
return
#[1] 1 1 1 0 1 0 0 0 0 1
Here we can see the run of 4 consecutive 0's, which shows that the answer matches the initial vector.
A for loop (incorrect answer just for the sake of answering)
sums <- function (x) {
g <- 0
for (i in tail(seq_along(x), -1)) {
if(x[i-1]==0 & x[i]==0) {
g <- g+1
}
}
g
}
sums(return)
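The loop and the rle() approach count different things: the loop counts adjacent pairs of zeros, while rle() reports runs of zeros. A minimal sketch of both, using the vector shown earlier written out by hand (x, zero_pairs and zero_runs are names chosen for this illustration):

```r
# The example vector from above, written out explicitly
x <- c(1, 1, 1, 0, 1, 0, 0, 0, 0, 1)

# Vectorized equivalent of the loop: compare each element with its successor
zero_pairs <- sum(head(x, -1) == 0 & tail(x, -1) == 0)

# Run-based view: lengths of the zero runs that are longer than 1
runs <- rle(x)
zero_runs <- runs$lengths[runs$values == 0 & runs$lengths > 1]

print(zero_pairs)         # number of adjacent zero pairs
print(zero_runs)          # length of each qualifying run
print(length(zero_runs))  # number of qualifying runs
```

A run of k zeros contains k - 1 adjacent pairs, which is why the two numbers differ for the same data.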
So I have a data frame in R where one column is a factor with a few levels, and I want to create a dummy variable for each level, but when I write a loop to do this I get an error.
So for example if the column is made up of various factors a, b, c and I want to code a dummy variable of 1 or 0 for each one, the code I have to create one is:
h = rep(0, nrow(data))
for (i in 1:nrow(data)) {
if (data[,1] == "a") {
h[i] = 1
} else {
h[i] = 0
}
}
cbind(data, h)
This gives me the error message "the condition has length > 1 and only the first element will be used". I have seen advice elsewhere on this site that I should write my own functions and avoid for loops, but I don't really understand a) how to solve this by writing a function (at least immediately) or b) the benefit of doing this as a function rather than with loops.
In the end I used ifelse to create each vector and then cbind to add it to the data frame, but an explanation would really be appreciated.
Change if (data[,1] == "a") { to if (data[i,1] == "a") {
Aakash is correct in pointing out the problem in your loop. Your test is
if (data[,1] == "a")
Since your test doesn't depend on i, it will be the same for every iteration. You could fix your loop like this:
h = rep(0, nrow(data))
for (i in 1:nrow(data)) {
  if (data[i, 1] == "a") {
    h[i] = 1
  } else {
    h[i] = 0
  }
}
We could even simplify, since h is initialized to 0, there is no need to set it to 0 in the else case, we can just move on:
for (i in 1:nrow(data)) {
  if (data[i, 1] == "a") {
    h[i] = 1
  }
}
A more substantial improvement would be to introduce vectorization. This will speed up your code and is usually easier to write once you get the hang of it. if can only check a single condition, but ifelse is vectorized: it takes a vector of tests, a vector of "if true" results, and a vector of "if false" results, and combines them:
h = ifelse(data[, 1] == "a", 1, 0)
With this, there is no need to initialize h before the statement, and we could add it directly to a data frame:
data$h = ifelse(data[, 1] == "a", 1, 0)
In this case, your test case and results are so simple, that we can do even better.
data[, 1] == "a" ## run this and look at the output
The above code is just a boolean vector of TRUE and FALSE. If we run as.numeric() on it TRUE values will be coerced to 1s and FALSE values will be coerced to 0s. So we can just do
data$h = as.numeric(data[, 1] == "a")
which will be even more efficient than ifelse.
This operation is so simple that there is no benefit in writing a function to do it.
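If you need a dummy column for every level rather than just "a", the same comparison can be repeated over the levels. This is a minimal sketch on made-up data; the data frame and its column name f are hypothetical stand-ins for your own.

```r
# Hypothetical data frame with a single factor column
data <- data.frame(f = factor(c("a", "b", "c", "a")))

# One 0/1 column per level; sapply() names the columns after the levels
dummies <- sapply(levels(data$f), function(lv) as.numeric(data$f == lv))

# Attach the dummy columns to the original data frame
data <- cbind(data, dummies)
print(data)
```

Each row has exactly one 1 across the dummy columns, which is a handy check that the levels were covered.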
Given an ordered vector vec <- c(1, 4, 6, 3, 2, 7), I want to compute for each element i of vec the weighted average of the previous elements where the weight is the inverse of the distance from the element i.
The function should proceed as following.
For the first element 1, should return NA (no previous element).
For the second element 4, should return 1.
For the third element 6, should return weighted.mean(x = c(1,4), w = c(1,2)).
For the fourth element 3, should return weighted.mean(x = c(1,4,6), w = c(1,2,3)).
The resulting vector result should be, with length(result) == length(vec), c(NA, 1, 3, 4.5, 3.9, 3.266667).
UPDATE:
I clearly mean without using a loop
result <- numeric()
for (i in 1:length(vec)) {
  if (i == 1) {
    result <- c(result, NA)
  } else {
    previous_elements <- vec[1:(i-1)]
    result <- c(result,
                weighted.mean(x = previous_elements, w = 1:length(previous_elements)))
  }
}
Here's a naive implementation. Create a function that does what you say; the only 'clever' thing is to use the function seq_len() instead of 1:i to generate the indexes
fun = function(i, vec)
weighted.mean(head(vec, i - 1), w=seq_len(i - 1))
and then use it in sapply
sapply(seq_along(vec), fun, vec)
This is good enough -- NaN as the first element, rather than NA, but that's easily corrected after the fact (or conceptually accepted as the right answer). It's also better than your solution, but still 'using a loop' -- the management of the result vector is done by sapply(), rather than in your loop where you have to manage it yourself. And in particular your 'copy and append' approach is very bad performance-wise, making a copy of the existing result each time through the loop. It's better to pre-allocate a result vector of the appropriate length result = numeric(length(vec)) and then fill it result[[i]] = ..., and better still to just let sapply() do the right thing for you!
The problem is that the naive implementation scales quadratically -- you make a pass along vec to process each element, and then for each element you make a second pass to calculate the weighted mean, so there are n (n - 1) / 2 calculations. So...
Take a look at weighted.mean
> stats:::weighted.mean.default
function (x, w, ..., na.rm = FALSE)
{
## SNIP -- edited for brevity
w <- as.double(w)
if (na.rm) {
i <- !is.na(x)
w <- w[i]
x <- x[i]
}
sum((x * w)[w != 0])/sum(w)
}
and use cumsum() instead of sum() to get the cumulative weights, rather than the individual weights, i.e., return a vector as long as x, where the ith element is the weighted mean up to that point
cumweighted.mean <- function(x, w) {
    ## handle NA values?
    w <- as.numeric(w)    # to avoid integer overflow
    ## no [w != 0] subsetting here: dropping elements would misalign
    ## the numerator with cumsum(w)
    cumsum(x * w) / cumsum(w)
}
You'd like something a little different
myweighted.mean <- function(x)
c(NA, cumweighted.mean(head(x, -1), head(seq_along(x), - 1)))
This makes a single pass through the data, so scales linearly (at least in theory).
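To check that the cumulative approach reproduces the expected result from the question, here is a self-contained sketch comparing it with the sapply() version. cum_wmean is a hypothetical helper name used only in this sketch, and it assumes all weights are non-zero.

```r
vec <- c(1, 4, 6, 3, 2, 7)

# Quadratic reference: one weighted.mean() call per position
slow <- sapply(seq_along(vec), function(i)
  weighted.mean(head(vec, i - 1), w = seq_len(i - 1)))

# Linear version: cumulative sums, then shifted by one position.
# cum_wmean is a hypothetical helper; assumes no zero weights.
cum_wmean <- function(x, w) {
  w <- as.numeric(w)
  cumsum(x * w) / cumsum(w)
}
prev <- head(vec, -1)
fast <- c(NA, cum_wmean(prev, seq_along(prev)))

print(fast)
print(all.equal(slow[-1], fast[-1]))  # first element differs only as NaN vs NA
```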
I followed the discussion over HERE and am curious why using <<- is frowned upon in R. What kind of confusion will it cause?
I also would like some tips on how I can avoid <<-. I use the following quite often. For example:
### Create dummy data frame of 10 x 10 integer matrix.
### Each cell contains a number that is between 1 to 6.
df <- do.call("rbind", lapply(1:10, function(i) sample(1:6, 10, replace = TRUE)))
What I want to achieve is to shift every number down by 1, i.e. all the 2s will become 1s, all the 3s will become 2s, etc. In general, every n becomes n-1. I achieve this with the following:
df.rescaled <- df
sapply(2:6, function(i) df.rescaled[df.rescaled == i] <<- i-1)
In this instance, how can I avoid <<-? Ideally I would want to be able to pipe the sapply results into another variable along the lines of:
df.rescaled <- sapply(...)
First point
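<<- is NOT simply the operator for assigning to a global variable. It looks for an existing variable of that name in the enclosing environments, starting with the nearest one, and assigns there; only if none is found does it assign in the global environment. So, for example, this can cause confusion: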
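f <- function() {
  a <- 2
  g <- function() {
    a <<- 3   # changes the `a` inside f(), not the global one
  }
  g()         # call g() so that the assignment actually runs
}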
then,
> a <- 1
> f()
> a # the global `a` is not affected
[1] 1
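The same lookup rule is what makes closures with private state work. The following is a minimal sketch (make_counter and the variable names are hypothetical, chosen for illustration) in which <<- updates a variable in the enclosing function's environment rather than creating a global:

```r
# Hypothetical counter factory: each call creates its own `count`
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1   # finds `count` in make_counter()'s environment
    count
  }
}

counter_a <- make_counter()
counter_b <- make_counter()

counter_a()
second_a <- counter_a()   # second call on the first counter
first_b  <- counter_b()   # first call on an independent counter

print(second_a)
print(first_b)
```

The two counters do not interfere with each other, because each <<- resolves to the count variable of its own enclosing environment.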
Second point
You can do that by using Reduce:
Reduce(function(a, b) {a[a==b] <- a[a==b]-1; a}, 2:6, df)
or apply
apply(df, c(1, 2), function(i) if(i >= 2) {i-1} else {i})
But simply, this is sufficient:
ifelse(df >= 2, df-1, df)
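To confirm that the three approaches give the same matrix, here is a self-contained sketch. The seed value is arbitrary and only there for reproducibility; by_reduce, by_apply and by_ifelse are names introduced for this comparison.

```r
set.seed(42)  # arbitrary seed, only so the example is reproducible
df <- do.call("rbind", lapply(1:10, function(i) sample(1:6, 10, replace = TRUE)))

# Reduce: start from df and decrement the 2s, then the 3s, and so on
by_reduce <- Reduce(function(a, b) { a[a == b] <- a[a == b] - 1; a }, 2:6, df)

# apply over every cell
by_apply <- apply(df, c(1, 2), function(i) if (i >= 2) i - 1 else i)

# fully vectorized
by_ifelse <- ifelse(df >= 2, df - 1, df)

print(all(by_reduce == by_ifelse))
print(all(by_apply == by_ifelse))
print(range(by_ifelse))
```

None of these needs <<-, because each one returns the rescaled matrix as a value that can be assigned to df.rescaled directly.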
You can think of <<- as global assignment (approximately, because as kohske points out it assigns to the top environment unless the variable name exists in a more proximal environment). Examples of why this is bad are here:
Examples of the perils of globals in R and Stata