I have an elementary question that I sadly cannot figure out. I have a set of numeric vectors of 1s and 0s, stored in the return variable below, whose sums are stored in the totals variable. I would like to check each of these vectors for consecutive zeroes and then return the total number of times this occurred. However, I'm quite rusty with (or just bad at) for loops and functions and cannot get this result. My latest attempt is below. Any suggestions are welcome; I appreciate the help.
set.seed(1)
return = ifelse(runif(10) <= 0.6, 1, 0)
totals = sapply(1:10, function (x) sum(ifelse(runif(10)<=0.6,1,0)))
sums = function (x) {
  g = 0
  for (i in 1:length(x)-1) {
    sum(ifelse (x[i]+x[i+1]=0,1,0))
  }
  return (g)
}
Although this is not the most efficient way to do so (see akrun's answer), we can get your for loop to work:
sums=function (x)
{
  g=0
  # watch your brackets! 1:3-1 returns c(0,1,2), not c(1,2)!
  for (i in 1:length(x)-1)
  {
    # To test for equality, use a double ==, rather than a single.
    # Also, your 'g' variable is never updated, but updating it is what you want to do.
    sum(ifelse (x[i]+x[i+1]=0,1,0))
  }
  return (g)
}
Corrected:
sums <- function(x)
{
  g = 0
  for (i in 1:(length(x)-1))
  {
    g = g + ifelse(x[i]+x[i+1]==0, 1, 0)
  }
  return (g)
}
You can call your function by:
return=ifelse(runif(10)<=0.6,1,0)
sums(return)
Or to generate ten vectors with random 1's and 0's, and apply your function to them, you could do:
totals = lapply(1:10, function (x) ifelse(runif(10) <= 0.6, 1, 0))
sapply(totals,sums)
Hope this helps!
If we are looking for runs of consecutive 0's (i.e. runs of length greater than 1) and their lengths, we can use rle:
with(rle(return), lengths[values==0 & lengths > 1])
#[1] 4
The return vector is
return
#[1] 1 1 1 0 1 0 0 0 0 1
Here we can see the run of 4 consecutive 0's, which confirms that the answer matches the initial vector.
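A for loop version, just for the sake of answering (note that it counts overlapping pairs of adjacent 0's, so it gives 3 for this vector rather than the run length of 4):
sums <- function (x) {
  g <- 0
  for (i in tail(seq_along(x), -1)) {
    # count each position where this element and the previous one are both 0
    if (x[i-1] == 0 && x[i] == 0) {
      g <- g + 1
    }
  }
  g
}
sums(return)
#[1] 3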
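For comparison, both interpretations of "consecutive zeroes" can be counted without an explicit loop. A minimal sketch, with the 1/0 vector printed above hard-coded as x (the names x, pairs and runs are ours): comparing each element with its predecessor counts overlapping pairs, as the loops above do, while rle counts runs.

```r
# The vector printed above, hard-coded so the sketch is self-contained
x <- c(1, 1, 1, 0, 1, 0, 0, 0, 0, 1)

# Overlapping pairs of adjacent zeros: compare x[2..n] with x[1..n-1]
pairs <- sum(x[-1] == 0 & x[-length(x)] == 0)

# Number of runs of zeros longer than 1, via run length encoding
r <- rle(x)
runs <- sum(r$values == 0 & r$lengths > 1)

pairs  # 3
runs   # 1
```

Which count you want depends on whether 0 0 0 0 should count as three occurrences (pairs) or one (a run).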
I'm new to R and I'm trying to see how many iterations are needed to fill a vector with the numbers 1 to 55 (no duplicates) from a random sample generated with runif.
At the moment, the vector has a lot of duplicates in it, and the number of iterations being returned is the size of the vector, so I'm not sure whether my logic is correct.
The aim of the if statement is to check whether the value from the sample already exists in the vector and, if it does, choose the next one. But I'm not sure that's correct, since the next number could also already exist in the vector. Any help would be much appreciated.
numbers=as.integer(runif(800, min=1, max=55)) ## my sample from runif
i=sample(numbers, 1)
## setting up my vector to store 55 unique values (1 to 55)
p=rep(0,55)
## my counters
j=0
n=1
## my while loop
while (p[n] %in% 0){
  ## if the sample value already exists in the vector, choose the next value from the sample
  if (numbers[n] %in% p) {
    p[n]=numbers[n+1]
  }
  else {
    p[n] = numbers[n]
  }
  n = n + 1
  j = j + 1
}
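I believe the following is what you want. Instead of a while loop over p, the while loop should search numbers for a value that is not yet in p. Note also that as.integer(runif(800, min=1, max=55)) truncates values in (1, 55) down to integers, so it can only produce 1 to 54, never 55; sample(55, 800, TRUE) draws integers from 1 to 55 directly.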
set.seed(2021) # make the results reproducible
numbers <- sample(55, 800, TRUE)
## setting up my vector to store 55 unique values (1 to 55)
p <- integer(55)
# assign the elements of p one by one
for(j in seq_along(p)){
  ## if the sample value already exists in the vector,
  ## choose the next value from the sample
  n <- 1
  while (numbers[n] %in% p) {
    n <- n + 1
  }
  if(n <= length(numbers)){
    p[j] <- numbers[n]
  }
}
j
#[1] 55
length(unique(p)) == length(p)
#[1] TRUE
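The loop above fills p, but j only counts the 55 slots, not how many random draws were needed. If the goal is to count draws until every value from 1 to 55 has appeared (the classic coupon collector setup), a minimal sketch is to draw one value at a time with sample() and stop once all values have been seen. The names seen, draws and target are ours, and the seed is arbitrary.

```r
set.seed(2021)           # arbitrary seed so the run is reproducible
target <- 55
seen <- logical(target)  # seen[k] becomes TRUE once value k has been drawn
p <- integer(0)          # distinct values, in the order they first appeared
draws <- 0
while (!all(seen)) {
  k <- sample(target, 1) # one uniform draw from 1..55
  draws <- draws + 1
  if (!seen[k]) {
    seen[k] <- TRUE
    p <- c(p, k)
  }
}
draws                    # total number of draws needed to see all 55 values
```

Because duplicates are simply skipped, p ends up holding each of 1 to 55 exactly once, and draws is the iteration count the question asks about.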
library(stringr)  # provides str_split
converter2 <- function(odds){
  if(grepl("/", odds) == T){
    x <- str_split(odds, "/")
    y <- as.numeric(x[[1]][1])
    z <- as.numeric(x[[1]][2])
    a <- (1 / ((y/z) + 1)) * 100
    return(a)
  }
  else{
    x <- as.numeric(odds)
    x <- 1/(x + 1)
    return(x*100)
  }
}
This is the code I have been using to create a function that converts a single fractional-odds string to a percentage (e.g. "7/2" returns 22.222). However, it doesn't work on a vector, giving the message:
1: In if (grepl("/", odds) == T) { :
the condition has length > 1 and only the first element will be used
Does anyone have a good way to fix this? I was thinking of using an ifelse statement but can't figure out what will work. Thanks.
I have been out of touch with R so bear with me.
sapply(vector, converter2)
This would apply the function on every element and return a vector.
Another option is:
Vectorize(converter2)(c("7/22","3/7"))
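If you would rather have the function handle a vector natively, the same arithmetic can be written with vectorized functions (grepl, sub, ifelse) instead of if, which is where the ifelse idea from the question fits. A minimal sketch: the name converter_vec is hypothetical, and it assumes every element is either "a/b" fractional odds or a plain number, mirroring the two branches of converter2.

```r
# Hypothetical vectorized rewrite of converter2 (the name converter_vec is ours).
# Assumes each element is either fractional odds "a/b" or a plain number.
converter_vec <- function(odds) {
  odds <- as.character(odds)
  is_frac <- grepl("/", odds, fixed = TRUE)
  num <- as.numeric(sub("/.*", "", odds))           # text before the slash
  den <- ifelse(is_frac,
                as.numeric(sub(".*/", "", odds)),   # text after the slash
                1)                                  # plain numbers: denominator 1
  100 / (num / den + 1)
}

converter_vec(c("7/2", "3/7", "4"))
```

This returns about 22.22, 70 and 20, the same values converter2 gives for each element individually.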
I need to write an algorithm in R that converts any number n to base 3.
So far I have written this:
NameOfTheFunction <- function(n) {
  while (n != 0) {
    {q<- n%/%3}
    {r <- n%%3}
    {return(r)}
    q<- n
  }
}
My problem is that I now need to store every r in a vector. I've never done that and don't quite know how to handle it. I searched online but did not find anything relevant to this particular situation.
After your function, use:
sapply(vector, FUN=function(n) return(NameOfTheFunction(n)))
sapply calls NameOfTheFunction(n) once for each element of the vector you pass in, substituting that element for n, and collects the outputs. The result, in this case, is a vector with one output per element.
For example:
vector <- c(10, 100, 1000, 10000)
NameOfTheFunction <- function(n) {
  while (n != 0) {
    {q<- n%/%3}
    {r <- n%%3}
    {return(r)}
    q<- n
  }
}
sapply(vector, NameOfTheFunction)
[1] 1 1 1 1
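Note that NameOfTheFunction returns inside the first pass of the loop, so each result above is only the last base-3 digit (n %% 3). A minimal sketch that actually stores every remainder in a vector, assuming n is a non-negative whole number (the name to_base3 is hypothetical):

```r
# Hypothetical helper: base-3 digits of a non-negative whole number n,
# most significant digit first, built by storing each remainder.
to_base3 <- function(n) {
  if (n == 0) return(0)
  digits <- integer(0)           # grows by one remainder per iteration
  while (n > 0) {
    digits <- c(n %% 3, digits)  # prepend, so the digits read left to right
    n <- n %/% 3                 # integer division moves to the next digit
  }
  digits
}

to_base3(10)                     # 1 0 1
lapply(c(10, 100), to_base3)     # one digit vector per input
```

lapply is used for the vector case because different inputs produce digit vectors of different lengths.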
I have a data frame in R where one column is a factor with a few levels, and I want to create a dummy variable for each level, but when I write a loop to do this I get an error.
For example, if the column contains the levels a, b and c and I want to code a 0/1 dummy variable for each one, the code I have to create one of them is:
h = rep(0, nrow(data))
for (i in 1:nrow(data)) {
  if (data[,1] == "a") {
    h[i] = 1
  } else {
    h[i] = 0
  }
}
cbind(data, h)
This gives me the error message "the condition has length > 1 and only the first element will be used". I have seen elsewhere on this site that I should try to write my own functions and avoid for loops, but I don't really understand a) how to solve this by writing a function (at least not immediately) or b) the benefit of using a function rather than a loop.
In the end I used ifelse to create each vector and then cbind to add it to the data frame, but an explanation would really be appreciated.
Change if (data[,1] == "a") { to if (data[i,1] == "a") {
Aakash is correct in pointing out the problem in your loop. Your test is
if (data[,1] == "a")
Since your test doesn't depend on i, it will be the same for every iteration. You could fix your loop like this:
h = rep(0, nrow(data))
for (i in 1:nrow(data)) {
  if (data[i, 1] == "a") {
    h[i] = 1
  } else {
    h[i] = 0
  }
}
We could even simplify: since h is initialized to 0, there is no need to set it to 0 in the else case, so we can drop it:
for (i in 1:nrow(data)) {
  if (data[i, 1] == "a") {
    h[i] = 1
  }
}
A more substantial improvement would be to use vectorization. This will speed up your code and is usually easier to write once you get the hang of it. if can only check a single condition, but ifelse is vectorized: it takes a vector of tests, a vector of "if true" results and a vector of "if false" results, and combines them element by element:
h = ifelse(data[, 1] == "a", 1, 0)
With this, there is no need to initialize h before the statement, and we could add it directly to a data frame:
data$h = ifelse(data[, 1] == "a", 1, 0)
In this case, your test and results are so simple that we can do even better:
data[, 1] == "a" ## run this and look at the output
The code above just returns a logical vector of TRUE and FALSE. If we call as.numeric() on it, TRUE values are coerced to 1s and FALSE values to 0s. So we can just do:
data$h = as.numeric(data[, 1] == "a")
which will be even more efficient than ifelse.
This operation is so simple that there is no benefit in writing a function to do it.
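The question mentions wanting a dummy for each level, not just for "a". The same as.numeric() comparison can simply be repeated once per level. A minimal sketch, using a small hypothetical data frame named data whose first column holds the levels; the is_ column prefix is our own choice.

```r
# Hypothetical example data: the first column holds the levels a, b and c
data <- data.frame(grp = c("a", "b", "c", "a"), stringsAsFactors = FALSE)

# One 0/1 dummy column per distinct value of the first column;
# each comparison is still vectorized, the loop only runs over the levels
for (lvl in sort(unique(data[, 1]))) {
  data[[paste0("is_", lvl)]] <- as.numeric(data[, 1] == lvl)
}

data
```

The loop runs only three times here (once per level) rather than once per row, so it keeps the speed benefit of the vectorized comparison.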
I've been trying to create a very simple function. Essentially, I want every element of t$c changed according to the if/else statement in my code, and everything else to stay the same. Here's my code:
set.seed(20)
x1=rnorm(100)
x2=rnorm(100)
x3=rnorm(100)
t=data.frame(a=x1,b=x1+x2,c=x1+x2+x3)
fun1=function(multi1,multi2)
{
  v=t$c
  s=c()
  for (i in v)
  {
    if (i<0)
    {
      s[i]=i*multi1
    }
    else if(i>0)
    {
      s[i]=i*multi2
    }
  }
  return(s)
}
fun1(multi1=0.5,multi2=2)
But it only returned a few numbers. I feel I might have made a silly mistake, but I can't figure it out.
tl;dr This operation can be vectorized. You can use the following method, assuming you want to leave values that are 0 or NA alone.
with(t, c * ifelse(c < 0, 0.5, ifelse(c > 0, 2, 1)))
If you want to include them on one side (e.g. the positive side), it's even simpler:
with(t, c * ifelse(c < 0, 0.5, 2))
As far as your loop goes, you've got a few issues there.
First, you were indexing s by decimal values, which is almost certainly not what you intended. This is also why your result vector was so short: when you indexed in the loop, the decimal indices were truncated to integers, and since many of them were repeated, s ended up very short.
The number of unique integer indices was:
length(unique(as.integer(t$c)))
# [1] 9
So you were effectively doing something like this simple example:
s[c(1, 2, 1, 1)] <- something
Since 1 is repeated, only indices 1 and 2 were changed. This is what was happening in your loop. Decimal indices are truncated, as this shows:
x <- 1:5
x[1.2]
# [1] 1
x[1.99]
# [1] 1
Next, notice that in the revised version below we preallocate the vector s. We can do that because we know the result will have the same length as v. Preallocating is the recommended approach and is more efficient than growing the vector inside the loop.
Moving on, I changed for(i in v) to for(i in seq_along(v)) to correct the indexing problem. Now i is an integer sequence of positions, so we also need to index v the same way (v[i] rather than i). Finally, we can assign s[i] <- if(... once, instead of assigning to the same index inside each branch of the if() statement.
Also note that you haven't accounted for 0 or any other values that may appear in v (like NA). I added a final else where we just leave those values alone. Change that as you see necessary. Furthermore, instead of going to the global environment to get t$c, we can pass it as an argument and make this function more general (credit to @ShawnMehan for that suggestion). Here's the revised version:
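fun1 <- function(vec, multi1, multi2) {
  s <- vector("numeric", length(vec))  # preallocate the result
  for (i in seq_along(vec)) {
    s[i] <- if (is.na(vec[i])) {
      vec[i]            # leave NA alone (if() cannot test an NA condition)
    } else if (vec[i] < 0) {
      vec[i] * multi1
    } else if (vec[i] > 0) {
      vec[i] * multi2
    } else {
      vec[i]            # leave 0 alone
    }
  }
  return(s)
}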
So now we have a length 100 result
x <- fun1(t$c, 0.5, 2)
str(x)
# num [1:100] 2.657 -0.949 7.423 -0.749 5.664 ...
I wrote this long explanation because I figure you are learning how to write a loop. In R though, we can vectorize this entire operation and put it into one line of code. The following line gives the same result as fun1(t$c, 0.5, 2).
with(t, c * ifelse(c < 0, 0.5, ifelse(c > 0, 2, 1)))
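Another vectorized option assigns through logical indices instead of nesting ifelse calls. A minimal sketch, using a small hypothetical vector v in place of t$c; 0 and NA are left unchanged, matching the nested ifelse version above.

```r
# Hypothetical example vector with negative, positive, zero and NA values
v <- c(-2, 3, 0, NA, -0.5)

out <- v                     # start from a copy, so 0 and NA stay as they are
neg <- !is.na(v) & v < 0     # positions to multiply by 0.5
pos <- !is.na(v) & v > 0     # positions to multiply by 2
out[neg] <- v[neg] * 0.5
out[pos] <- v[pos] * 2

out
```

The !is.na() guards keep NA out of the logical index vectors, so only the negative and positive positions are overwritten.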
Thanks to @Frank for catching my calculation oversight.
Hopefully this all makes sense. Sometimes I don't do well with explanations and technical jargon. If there are any questions, please comment.