R: populate a vector incrementally - r

I'm trying to write a short function that assigns values and populates incrementally a vector, based on values in another vector.
For instance, if I have a vector of binaries a = [0,1,1,0,1], I want to create a vector b of the same length as a, that assigns a value x if a[i]=0, or a value y if a[i]=1. So b = [0.4,0.6,0.6,0.4,0.6]
I have done this:
a<-sample(0:1,20,replace=T)
assign<-function(x){
c<-vector()
for (i in 1:length(x)){
ifelse (x[[i]]>0,b<-0.6,b<-0.4)
c[[length(c)+1]]=b}
return (b)
}
but then
assign(a)
only returns the first assignment. I assume I didn't nest the loop correctly?

As you state that your vector a is binary, you can turn it into a vector of indices and use that "property":
bfroma <- function(x) c(0.4, 0.6)[x+1]
a <- c(0, 1, 1, 0, 1)
bfroma(a)
#[1] 0.4 0.6 0.6 0.4 0.6

Some comments on your code:
ifelse (x[[i]] > 0, b <- 0.6, b <- 0.4) is not how ifelse is meant to be used (check ?ifelse again): it returns a value, so assign its result instead. Use b <- ifelse (x[[i]] > 0, 0.6, 0.4).
I think you want return(c) rather than return(b);
use a different function name, assign will mask R's built-in one.
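Putting the three comments together, a minimal sketch of the corrected loop could look like this. The function name b_from_a is hypothetical, and the result is preallocated rather than grown inside the loop:

```r
# Hypothetical name; avoids masking base::assign
b_from_a <- function(x) {
  out <- numeric(length(x))          # preallocate the result
  for (i in seq_along(x)) {
    out[i] <- if (x[i] > 0) 0.6 else 0.4
  }
  out                                # return the whole vector, not the last value
}

a <- c(0, 1, 1, 0, 1)
res <- b_from_a(a)
res
# [1] 0.4 0.6 0.6 0.4 0.6
```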
Anyway, I figured that the whole function can be replaced by
function (x) ifelse(x > 0, 0.6, 0.4)
or
function (x) {y <- rep(0.4, length(x)); y[x > 0] <- 0.6; y}
For your particular case, where the input vector is strictly 0-1 binary, we can do better. As Cath has already pointed out, indexing alone is enough:
function (x) c(0.4, 0.6)[x + 1L]
More generally, as long as x is discrete, we can use match to get position index and use fast replacement, too, but I will not elaborate on that here.
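A minimal sketch of the match idea, assuming hypothetical names map_values, keys and values (none of them come from the answer above):

```r
# Hypothetical helper: look up each element of x in keys and
# return the value stored at the same position in values
map_values <- function(x, keys, values) values[match(x, keys)]

a <- c(0, 1, 1, 0, 1)
b <- map_values(a, keys = c(0, 1), values = c(0.4, 0.6))
b
# [1] 0.4 0.6 0.6 0.4 0.6

# The keys do not have to be 0 and 1; unmatched inputs become NA
map_values(c("lo", "hi", "mid"), keys = c("lo", "hi"), values = c(0.4, 0.6))
# [1] 0.4 0.6  NA
```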

Related

Simple function in r

I've been trying to create a very simple function. Essentially I want every element in t$C changed according to the if then statement in my code, and others stay the same. So here's my code:
set.seed(20)
x1=rnorm(100)
x2=rnorm(100)
x3=rnorm(100)
t=data.frame(a=x1,b=x1+x2,c=x1+x2+x3)
fun1=function(multi1,multi2)
{
v=t$c
s=c()
for (i in v)
{
if (i<0)
{
s[i]=i*multi1
}
else if(i>0)
{
s[i]=i*multi2
}
}
return(s)
}
fun1(multi1=0.5,multi2=2)
But it gave me just a few numbers. I suspect I made a silly mistake, but I couldn't figure out what it was.
tl;dr This operation can be vectorized. You can use the following method, assuming you want to leave values that are 0 or NA alone.
with(t, c * ifelse(c < 0, 0.5, ifelse(c > 0, 2, 1)))
If you want to include them in one side (e.g. on the positive side), it's even simpler.
with(t, c * ifelse(c < 0, 0.5, 2))
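As a quick check on a small hypothetical vector v (not the data from the question), the two forms appear to agree even for 0 and NA, since zero times either factor is still zero and NA propagates through the multiplication:

```r
v <- c(-2, 0, 3, NA)   # hypothetical test values

nested <- v * ifelse(v < 0, 0.5, ifelse(v > 0, 2, 1))
simple <- v * ifelse(v < 0, 0.5, 2)

nested
# [1] -1  0  6 NA
identical(nested, simple)
# [1] TRUE
```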
As far as your loop goes, you've got a few issues there.
First, you were indexing s by the values of v themselves, which are decimals rather than positions. R truncates a fractional index to an integer before using it, so many different values landed on the same few positions and overwrote one another. This is the reason why your result vector was so short.
The actual unique index length went something like this -
length(unique(as.integer(t$c)))
# [1] 9
And as a result you got, as a simple example,
s[c(1, 2, 1, 1)] <- something
Since 1 is repeated, only indices 1 and 2 were changed. This is what was happening in your loop. Further illustrated as
x <- 1:5
x[1.2]
# [1] 1
x[1.99]
# [1] 1
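The same truncation applies on assignment. Here is a small sketch with hypothetical positive values, showing four loop iterations that end up filling only two slots:

```r
v <- c(2.7, 2.1, 1.5, 1.9)   # hypothetical values, all >= 1
s <- c()
for (i in v) s[i] <- i * 2   # the index i is truncated to 2, 2, 1, 1
s
# [1] 3.8 4.2
length(s)
# [1] 2
```

Each later write to position 1 or 2 replaces the earlier one, so only the last value per truncated index survives.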
Next, notice below that we have allocated the vector s. We can do that because we know the length of the resulting vector will be the same as v. This is the recommended, more efficient way rather than building the vector in the loop.
Moving on, I changed for(i in v) to for(i in seq_along(v)) to correct this. Now we are indexing with a sequence for i. Then we also need to index v in the same manner. Finally, we can assign s[i] <- if(... instead of assigning to the same index inside the if() statement.
Also note that you haven't accounted for 0 or any other values that may appear in v (like NA). I added branches that leave those values alone. Change that as you see necessary. Furthermore, instead of going to the global environment to get t$c, we can pass it as an argument and make this function more general (credit to @ShawnMehan for that suggestion). Here's the revised version:
fun1 <- function(vec, multi1, multi2) {
  # Preallocate the result; it has the same length as the input
  s <- vector("numeric", length(vec))
  for (i in seq_along(vec)) {
    s[i] <- if (is.na(vec[i])) {
      vec[i]  # leave NA alone; comparing NA inside if() would raise an error
    } else if (vec[i] < 0) {
      vec[i] * multi1
    } else if (vec[i] > 0) {
      vec[i] * multi2
    } else {
      vec[i]  # zero is left unchanged
    }
  }
  return(s)
}
So now we have a length 100 result
x <- fun1(t$c, 0.5, 2)
str(x)
# num [1:100] 2.657 -0.949 7.423 -0.749 5.664 ...
I wrote this long explanation because I figure you are learning how to write a loop. In R though, we can vectorize this entire operation and put it into one line of code. The following line gives the same result as fun1(t$c, 0.5, 2).
with(t, c * ifelse(c < 0, 0.5, ifelse(c > 0, 2, 1)))
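One way to convince yourself that the loop and the one-liner agree is to run both on the same data. This sketch rebuilds the question's data under the hypothetical name dat and inlines a compact version of the loop:

```r
set.seed(20)
x1 <- rnorm(100); x2 <- rnorm(100); x3 <- rnorm(100)
dat <- data.frame(a = x1, b = x1 + x2, c = x1 + x2 + x3)  # "dat" is a hypothetical name

# Loop version, indexing by position
loop_result <- numeric(nrow(dat))
for (i in seq_along(dat$c)) {
  v <- dat$c[i]
  loop_result[i] <- if (v < 0) v * 0.5 else if (v > 0) v * 2 else v
}

# Vectorized version
vec_result <- with(dat, c * ifelse(c < 0, 0.5, ifelse(c > 0, 2, 1)))

all.equal(loop_result, vec_result)
# [1] TRUE
```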
Thanks to @Frank for catching my calculation oversight.
Hopefully this all makes sense. Sometimes I don't do well with explanations and technical jargon. If there are any questions, please comment.

How can I address the values in a vector based on start and stop indexes from other vectors?

Let's say I have a vector full of zeros:
x <- rep(0, 100)
I want to set the values in certain ranges to 1:
starts <- seq(10, 90, 10)
stops <- starts + round(runif(length(starts), 1, 5))
I can do this with a for loop:
for(i in seq_along(starts)) x[starts[i]:stops[i]] <- 1
But I know this is frowned upon in R. How can I do this in a vectorized way, ideally without an external package?
You can use Map() to get all of the indices, Reduce(union, ...) to drop that list down to an atomic vector of the unique indices and then [<- or replace() to replace.
replace(x, Reduce(union, Map(":", starts, stops)), 1L)
Or
x[Reduce(union, Map(":", starts, stops))] <- 1L
Additionally, for() loops are not necessarily "frowned upon" in R. It depends on the situation. Many times for() loops turn out to be the most efficient route.
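To check that the loop and a vectorized replacement produce the same vector, here is a sketch with a fixed seed (the seed and the names x_loop and x_vec are assumptions). It uses unlist() instead of Reduce(union, ...); duplicate indices are harmless when every position gets the same value:

```r
set.seed(1)  # assumption: fixed seed so the random stops are reproducible
x <- rep(0, 100)
starts <- seq(10, 90, 10)
stops <- starts + round(runif(length(starts), 1, 5))

# Loop version
x_loop <- x
for (i in seq_along(starts)) x_loop[starts[i]:stops[i]] <- 1

# Vectorized version: build every range, flatten, replace once
x_vec <- replace(x, unlist(Map(":", starts, stops)), 1)

identical(x_loop, x_vec)
# [1] TRUE
```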
A solution that uses apply:
x[unlist(apply(cbind(starts, stops), 1, function(x) x[[1]]:x[[2]]))] <- 1
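Another answer, which hard-codes the ranges instead of using stops (it marks every position from 10 to 90 whose remainder modulo 10 is at most 5):
starts <- seq(10, 90, 1)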
change_index <- starts[starts %% 10 <= 5]
x[change_index] <- 1

return the index of a vector when the difference between the index and value satisfies a condition in r

I have been having trouble phrasing this question, so if anyone can edit it up to standard that would be great.
I have a vector that looks like this:
x <- c(1, 2, 5)
How do I return the last index where the difference between the value of the vector at that position and the position itself is 0?
In this case, I would like to have
2
since, for the third element, the difference between the value of the vector and its position is greater than 0:
x[3]-3
As a side note, this is part of a larger function, where the vector 'x' was built as a vector of values that satisfy a condition (being outside of a range). In this example, the vector 'x' was built as the indexes of the vector
y <- c(1, -0.544099347708607, 0.0330854828196116, 0.126862586350202, -0.189999318205021, 0.0709946572904202, -0.0290039765997793, 0.12201693346217, -0.120410983904152, 0.0974094609584081, -0.119147919464352, 0.0154264136176002, 0.115102403861495, -0.145980255860186, 0.116998886386955, -0.137041816761002, 0.114352714471954, 0.0228895094121642, -0.0679735427311049, 0.0350071153004831, -0.0145366468920295)
Which are outside of the range (-.18, .18)
plot.ts(y)
abline(h = 0.18)
abline(h = -0.18)
You can use the Position function:
Position(function(x) {x == 0}, x - 1:length(x), right=T)
See http://stat.ethz.ch/R-manual/R-devel/library/base/html/funprog.html for more functions.
Or as @Frank said below,
Position(`!`, x - 1:length(x), right=T)
This is because 0 is falsey and other numbers are truthy.
I think the simplest approach is to test equality, not test for the difference being zero:
tail(which(x==seq_along(x)),1)
# 2
Here is another approach:
index <- 1:length(x)
max(which(x - index == 0))
#[1] 2
Or as the other Frank points out, you could test for equality instead of the difference being 0.
max(which(x == index))
One can also try this
tail(which((1:length(x)-x)==0),1)
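For comparison, here is a sketch that rebuilds x from a shortened, rounded copy of y (the first six values only, an assumption made for brevity) and then shows how three of the approaches above behave when no position matches:

```r
# First six values of y from the question, rounded (assumption for brevity)
y <- c(1, -0.544, 0.033, 0.127, -0.190, 0.071)
x <- which(abs(y) > 0.18)      # indexes outside (-0.18, 0.18)
x
# [1] 1 2 5

last_match <- tail(which(x == seq_along(x)), 1)
last_match
# [1] 2
Position(function(d) d == 0, x - seq_along(x), right = TRUE)
# [1] 2

# Edge case: no element equals its position
x2 <- c(5, 6)
tail(which(x2 == seq_along(x2)), 1)                              # integer(0)
Position(function(d) d == 0, x2 - seq_along(x2), right = TRUE)   # NA
suppressWarnings(max(which(x2 == seq_along(x2))))                # -Inf (with a warning)
```

Which of the three empty-case results is most convenient depends on how the surrounding function handles a missing answer.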

R - classification the number - assign labels

How can I convert numeric data to strings in R, not by changing the data type but by classifying the values? Say I have 100 numbers between 0 and 1; if a number is > 0.5, I need to assign it the name "Good", otherwise "Bad".
You could try
nums <- seq(0,1, by = .01)
res <- c('Bad', 'Good')[(nums > 0.5)+1]
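To see why this works, it may help to break the index expression into steps on a small hypothetical vector: the comparison yields logicals, adding 1 coerces them to 1 or 2, and those numbers pick the label.

```r
nums <- c(0.2, 0.5, 0.8)   # hypothetical values

nums > 0.5
# [1] FALSE FALSE  TRUE
(nums > 0.5) + 1           # FALSE -> 1, TRUE -> 2
# [1] 1 1 2
res <- c("Bad", "Good")[(nums > 0.5) + 1]
res
# [1] "Bad"  "Bad"  "Good"
```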
Do you wish to do it using factors?
a=runif(100, 0, 1) > 0.5
b=factor(a, c(FALSE,TRUE), labels=c("Bad","Good"))
c=as.character(b)
Alternatively, if you just want to change the names in the vector, a, then:
a=runif(100, 0, 1) > 0.5
c=ifelse(a,"Good","Bad")
names(a)=c
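If more than two classes are ever needed, base R's cut() is another option. This swaps in a different technique from the answers above; the break points and the name grade are assumptions for illustration:

```r
nums <- c(0.1, 0.5, 0.51, 0.9)   # hypothetical values

# Intervals are (-Inf, 0.5] and (0.5, Inf), so exactly 0.5 counts as "Bad"
grade <- cut(nums, breaks = c(-Inf, 0.5, Inf), labels = c("Bad", "Good"))
as.character(grade)
# [1] "Bad"  "Bad"  "Good" "Good"
```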

Check if elements in a vector are drawn exclusively from another vector

I have an R list with numeric vectors of different lengths. Something like this.
l = list(a = c(0, 1, 2), b = c(0, 1), c = c(0, 1, NA), d = c(0, 1, 5))
I want to identify the vectors that have values of 0, 1, or NA and, therefore, can be converted to logical vectors. In the above example, I would identify vectors b and c.
To do this, I am attempting something like this.
is.logical.vector = lapply(l, FUNCTION_NAME)
But I'm not sure what function to use in place of FUNCTION_NAME (that's just a placeholder for illustrative purposes). I need something that can take a vector like allowed = c(0, 1, NA) and ensure that only the values in allowed are represented in the elements of a vector (like those in list l).
Do you know if such a function exists? Alternatively, do you know how I could construct such a function without an explicit for loop? Thank you in advance!
By the sounds of it, you are looking for a combination of all and %in%:
vapply(l, function(z) all(z %in% c(0, 1, NA)), logical(1L))
# a b c d
# FALSE TRUE TRUE FALSE
Alternatively, you can use lapply:
lapply(l, function(z) all(z %in% c(0, 1, NA)))
FYI, as.logical(5) or even as.logical(-5) also evaluate to TRUE, so your condition "therefore, can be converted to logical vectors" doesn't quite match what you actually seem to be asking for :-)
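A short sketch that wraps the check in a reusable function and then converts only the qualifying vectors. The names only_from and keep are hypothetical:

```r
l <- list(a = c(0, 1, 2), b = c(0, 1), c = c(0, 1, NA), d = c(0, 1, 5))

# Hypothetical helper: TRUE when every element of z is in allowed
only_from <- function(z, allowed = c(0, 1, NA)) all(z %in% allowed)

keep <- vapply(l, only_from, logical(1L))
names(l)[keep]
# [1] "b" "c"

converted <- lapply(l[keep], as.logical)
converted$c
# [1] FALSE  TRUE    NA

# as.logical() itself accepts any non-zero number
as.logical(c(5, -5, 0))
# [1]  TRUE  TRUE FALSE
```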
