Elementary problems with a loop in R

I am working with for loops in R.
I have a data frame which contains n columns.
I have to build a vector of length n where each element is 1 if the column is a double, 0 otherwise.
This is what I have tried:
y<-rep(0,dim.data.frame(datafr)[2])
attach(datafr)
x<-names(dat)
for (j in 1:length(x)){
  for(i in x){
    if(is.double(i)){
      y[j]<-1
    }else{
      y[j]<-0
    }
  }
}
However, it does not work: the returned vector y contains no 1s, only n 0s.

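The loop in the question always produces 0s because i runs over the column names, which are character strings, so is.double(i) is always FALSE; the test has to be applied to the columns themselves. All values in a data.frame column share the same class, so there is no need to check past the first value in a column.
A simple approach is to create the vector with sapply, which loops (in this case) over the columns of a data frame.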
datafr <- data.frame(a=1:5, b=1:5 + 0, d=letters[1:5])
sapply(datafr, is.double)
#     a     b     d
# FALSE  TRUE FALSE
If you must use a for loop, the same logic can be written out as:
y <- integer(ncol(datafr)) # defaults to 0
y
# [1] 0 0 0
for (j in seq_along(datafr)) {
  if (is.double(datafr[[j]])) {
    y[j] <- 1L
  }
}
y
# [1] 0 1 0
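The question asks for 1/0 values rather than TRUE/FALSE. A minimal sketch, assuming the same example datafr, coerces the logical result with as.integer(); vapply() is swapped in for sapply() here because it requires exactly one logical value per column.

```r
datafr <- data.frame(a = 1:5, b = 1:5 + 0, d = letters[1:5])

# vapply() checks that each column yields a single logical value;
# as.integer() maps TRUE/FALSE to 1/0 and drops the column names
y <- as.integer(vapply(datafr, is.double, logical(1)))
y
# [1] 0 1 0
```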

Related

How to create a matrix of all 2^n binary sequences of length n using recursion in R?

I know I can use expand.grid for this, but I am trying to learn actual programming. My goal is to take what I have below and use recursion to get all 2^n binary sequences of length n.
I can do this for n = 1, but I don't understand how I would use the same function in a recursive way to get the answer for higher dimensions.
Here it is for n = 1:
binseq <- function(n){
  binmat <- matrix(nrow = 2^n, ncol = n)
  r <- 0 #row counter
  for (i in 0:1) {
    r <- r + 1
    binmat[r,] <- i
  }
  return(binmat)
}
I know I probably have to use cbind in the return statement. My intuition says the return statement should be something like cbind(binseq(n-1), binseq(n)). But, honestly, I'm completely lost at this point.
The output should look like what expand.grid gives:
n = 5
expand.grid(replicate(n, 0:1, simplify = FALSE))
It should just be a matrix, since binmat is being filled recursively.
As requested in a comment (below), here is a limited implementation for binary sequences only:
eg.binary <- function(n, digits=0:1) {
  if (n <= 0) return(matrix(0,0,0))
  if (n == 1) return(matrix(digits, 2))
  x <- eg.binary(n-1, digits)  # pass digits on so custom digits are used at every level
  rbind(cbind(digits[1], x), cbind(digits[2], x))
}
After taking care of the edge case n <= 0, it treats n = 1 as the "base case" and then recursively obtains all (n-1)-digit binary strings and prepends each digit to each of them. The digits are prepended so that the binary strings end up in their usual lexicographic order, with the last column varying fastest. expand.grid produces the same rows, but it varies its first column fastest, so its rows come out in a different order.
Example:
eg.binary(3)
     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    1
[3,]    0    1    0
[4,]    0    1    1
[5,]    1    0    0
[6,]    1    0    1
[7,]    1    1    0
[8,]    1    1    1
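As a sanity check, the following sketch compares eg.binary() with expand.grid() for n = 4. It assumes the version of eg.binary() that passes digits on to the recursive call. Both produce the same 2^n rows, but expand.grid() varies its first column fastest, so its columns are reversed before the comparison.

```r
eg.binary <- function(n, digits = 0:1) {
  if (n <= 0) return(matrix(0, 0, 0))
  if (n == 1) return(matrix(digits, 2))
  x <- eg.binary(n - 1, digits)
  rbind(cbind(digits[1], x), cbind(digits[2], x))
}

n <- 4
a <- eg.binary(n)

# expand.grid() varies its first column fastest; reversing the column
# order makes the last column vary fastest, as in eg.binary()
b <- as.matrix(expand.grid(replicate(n, 0:1, simplify = FALSE)))[, n:1]

nrow(a) == 2^n                       # TRUE
all(dim(a) == dim(b)) && all(a == b) # TRUE
```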
A general explanation (with a more flexible solution) follows.
Distill the problem down to the basic operation of tacking the values of an array y onto the rows of a data frame X, associating a whole copy of X with each value (via cbind) and appending the whole lot (via rbind):
cross <- function(X, y) {
  do.call("rbind", lapply(y, function(z) cbind(X, z)))
}
For example,
cross(data.frame(A=1:2, b=letters[1:2]), c("X","Y"))
  A b z
1 1 a X
2 2 b X
3 1 a Y
4 2 b Y
(Let's worry about the column names later.)
The recursive solution for a list of such arrays y assumes you have already carried out these operations for all but the last element of the list. It has to start somewhere: the base case simply converts a single array into a one-column data frame. Thus:
eg_ <- function(y) {
  n <- length(y)
  if (n <= 1) {
    as.data.frame(y)
  } else {
    cross(eg_(y[-n]), y[[n]])
  }
}
Why the funny name? Because we might want to do some post-processing, such as giving the result nice names. Here's a fuller implementation:
eg <- function(y) {
  # (Define `eg_` here to keep it local to `eg` if you like)
  X <- eg_(y)
  names.default <- paste0("Var", seq.int(length(y)))
  if (is.null(names(y))) {
    colnames(X) <- names.default
  } else {
    colnames(X) <- ifelse(names(y)=="", names.default, names(y))
  }
  X
}
For example:
eg(replicate(3, 0:1, simplify=FALSE))
  Var1 Var2 Var3
1    0    0    0
2    1    0    0
3    0    1    0
4    1    1    0
5    0    0    1
6    1    0    1
7    0    1    1
8    1    1    1
eg(list(0:1, B=2:3))
  Var1 B
1    0 2
2    1 2
3    0 3
4    1 3
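A quick check that eg() reproduces expand.grid(), including for vectors of different lengths and a partially named list. The sketch repeats the three functions above so that it runs on its own; the test list y is made up for illustration.

```r
cross <- function(X, y) {
  do.call("rbind", lapply(y, function(z) cbind(X, z)))
}

eg_ <- function(y) {
  n <- length(y)
  if (n <= 1) {
    as.data.frame(y)
  } else {
    cross(eg_(y[-n]), y[[n]])
  }
}

eg <- function(y) {
  X <- eg_(y)
  names.default <- paste0("Var", seq.int(length(y)))
  if (is.null(names(y))) {
    colnames(X) <- names.default
  } else {
    colnames(X) <- ifelse(names(y) == "", names.default, names(y))
  }
  X
}

# Made-up test input: three vectors of different lengths, one of them named
y <- list(0:1, B = 2:3, 5:7)
X <- eg(y)
G <- expand.grid(y)

dim(X)                              # 12 rows, 3 columns
identical(names(X), names(G))       # TRUE: "Var1", "B", "Var3"
all(as.matrix(X) == as.matrix(G))   # TRUE: same rows in the same order
```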
Apparently this was the desired recursive code:
binseq <- function(n){
  if(n == 1){
    binmat <- matrix(c(0,1), nrow = 2, ncol = 1)
  }else if(n > 1){
    A <- binseq(n-1)
    B <- cbind(rep(0, nrow(A)), A)
    C <- cbind(rep(1, nrow(A)), A)
    binmat <- rbind(B,C)
  }
  return(binmat)
}
Basically, for n = 1 we create the one-column matrix with rows 0 and 1. For every n thereafter, we take the matrix for n - 1 and prepend a column of 0s to it and, separately, a column of 1s. Then we rbind the two matrices to get the final product. So I get what the algorithm is doing, but I don't really understand what the recursion is doing. For example, I don't understand the step from n = 2 to n = 3 based on the algorithm.
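One way to see what the recursion does is to carry out the n = 2 to n = 3 step by hand. binseq(3) first calls binseq(2), which in turn calls binseq(1), the base case; each call then builds its result from the matrix returned by the call below it. The sketch below redoes the last of those steps outside the function, using the same binseq() as above.

```r
binseq <- function(n){
  if(n == 1){
    binmat <- matrix(c(0,1), nrow = 2, ncol = 1)
  }else if(n > 1){
    A <- binseq(n-1)
    B <- cbind(rep(0, nrow(A)), A)
    C <- cbind(rep(1, nrow(A)), A)
    binmat <- rbind(B,C)
  }
  return(binmat)
}

# The step from n = 2 to n = 3, done by hand:
A <- binseq(2)    # 4 x 2 matrix, rows 00, 01, 10, 11 (all 2-digit sequences)
B <- cbind(0, A)  # a 0 prepended to each row: 000, 001, 010, 011
C <- cbind(1, A)  # a 1 prepended to each row: 100, 101, 110, 111
all(rbind(B, C) == binseq(3))  # TRUE: this is exactly what binseq(3) returns
```

Each level doubles the number of rows, so the 2 rows for n = 1 become 4 for n = 2 and 8 for n = 3.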

About missing value where TRUE/FALSE needed in R

I want to return the number of times in a string vector v that the element at the next index has more characters than the element at the current index.
Here's my code:
BiggerPairs <- function (v) {
  numberOfTimes <- 0
  for (i in 1:length(v)) {
    if((nchar(v[i+1])) > (nchar(v[i]))) {
      numberOfTimes <- numberOfTimes + 1
    }
  }
  return(numberOfTimes)
}
I get the error "missing value where TRUE/FALSE needed".
I do not know why this happens.
The error means that if() received a missing value (NA) where it expects a single TRUE or FALSE. There are two likely reasons for this:
1. You have NAs in your vector v (I suspect this is not the actual issue).
2. Your loop runs over 1:length(v), so on the last iteration (i = n, where n = length(v)) it compares nchar(v[n+1]) with nchar(v[n]). There is no v[n+1]: indexing past the end of a vector returns NA, so the comparison is NA and if() raises the error.
To remove NAs, try the following code:
v <- na.omit(v)
To improve your loop, try the following code:
for(i in 1:(length(v) -1)) {
  if(nchar(v[i + 1]) > nchar(v[i])) {
    numberOfTimes <- numberOfTimes + 1
  }
}
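Putting the pieces together, here is a sketch of the corrected function. It is a variation on the loop above: instead of 1:(length(v) - 1), it iterates over seq_along(v)[-1] and compares each element with the previous one, so the loop body never runs when v has fewer than two elements (1:(length(v) - 1) would produce c(1, 0) for a length-1 vector). NAs in v still have to be removed first, as shown above.

```r
BiggerPairs <- function(v) {
  numberOfTimes <- 0
  # seq_along(v)[-1] is 2, ..., length(v), and is empty when v has
  # fewer than 2 elements, so v is never indexed past its end
  for (i in seq_along(v)[-1]) {
    if (nchar(v[i]) > nchar(v[i - 1])) {
      numberOfTimes <- numberOfTimes + 1
    }
  }
  numberOfTimes
}

BiggerPairs(c("a", "abc", "ab", "abcd"))  # 2
BiggerPairs("a")                          # 0
```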
Here is some example dummy code.
# create 15 random numbers
set.seed(1)
v <- rnorm(15)
# accessing the 16th element produces an NA
v[16]
#[1] NA
# if we insert an NA, the comparison returns NA, which is what makes if() fail
v[10] <- NA
v[10] > v[9]
#[1] NA
# if we remove NAs and limit our loop to N-1, we get a fair comparison
v <- na.omit(v)
numberOfTimes <- 0
for(i in 1:(length(v) -1)) {
  if(nchar(v[i + 1]) > nchar(v[i])) {
    numberOfTimes <- numberOfTimes + 1
  }
}
numberOfTimes
#[1] 5
Is this what you're after? I don't think there is any need for a for loop.
I'm generating some sample data, since you don't provide any.
# Generate some sample data
set.seed(2017);
v <- sapply(sample(30, 10), function(x)
  paste(sample(letters, x, replace = TRUE), collapse = ""))
v;
#[1] "raalmkksyvqjytfxqibgwaifxqdc" "enopfcznbrutnwjq"
#[3] "thfzoxgjptsmec" "qrzrdwzj"
#[5] "knkydwnxgfdejcwqnovdv" "fxexzbfpampbadbyeypk"
#[7] "c" "jiukokceniv"
#[9] "qpfifsftlflxwgfhfbzzszl" "foltth"
The following vector marks with 1 each position in v whose entry has more characters than the previous entry.
# The following vector has the same length as v and
# returns 1 at the index position i where
# nchar(v[i]) > nchar(v[i-1])
idx <- c(0, diff(nchar(v)) > 0);
idx;
# [1] 0 0 0 0 1 0 0 1 1 0
If you're just interested in whether there is any entry with more characters than the previous entry, you can do this:
# If you just want to test if there is any position where
# nchar(v[i+1]) > nchar(v[i]) you can do
any(idx == 1);
#[1] TRUE
Or count the number of occurrences:
sum(idx);
#[1] 3

Create a vector in a loop for every pair of samples

I do a pairwise calculation between my samples and I want every pairwise calculation to be stored in a separate vector. For 3 comparisons, I have:
sample_12 <- vector(mode="numeric", length = 10)
sample_13 <- vector(mode="numeric", length = 10)
sample_23 <- vector(mode="numeric", length = 10)
Is there a way to create these vectors with the corresponding names in a loop, so that it works for any given number of samples?
I tried the following code, but I can't access the vectors outside the for loop. How can I solve this?
pop = 3
sample = vector(mode="numeric", length = 10)
for (i in 1:(pop - 1)) {
  for (j in (i + 1):pop) {
    name <- paste("sample",i,j, sep = "")
    name <- vector(mode="numeric", length = 10)
  }
}
You can use the "assign" function:
pop = 3
sample = vector(mode="numeric", length = 10)
pop_combos <- combn(pop, 2)
for (i in 1:ncol(pop_combos)) {
  name <- paste("sample_",
                pop_combos[,i][1],
                pop_combos[,i][2],
                sep="")
  assign(name, sample)
}
Outside the loop you can now access the vectors:
> sample_12
[1] 0 0 0 0 0 0 0 0 0 0
Use a list:
pop = 3
combinations = apply(combn(pop, m = 2), 2, paste, collapse = "_")
sample = replicate(n = length(combinations), numeric(10), simplify = FALSE)
names(sample) = combinations
sample
# $`1_2`
# [1] 0 0 0 0 0 0 0 0 0 0
#
# $`1_3`
# [1] 0 0 0 0 0 0 0 0 0 0
#
# $`2_3`
# [1] 0 0 0 0 0 0 0 0 0 0
You can then access each element of the list, e.g., sample[["1_3"]]. This scales up very easily and doesn't require pasting together names and using assign and get, which is just asking for hard-to-find bugs. You can use lapply or for loops to iterate over each item in the list trivially. Depending on your use case, it might make more sense to use the default simplify = TRUE inside replicate and keep it as a matrix or data frame. The only reason to use a list would be if some of the vectors needed to be different lengths.
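Following the suggestion to keep everything in a matrix, here is a sketch that uses the default simplify = TRUE, with one named column per pair. The calculation inside the loop (i * j) is only a placeholder for whatever pairwise computation you actually run, and the object is called samples here rather than sample.

```r
pop <- 3
pairs <- combn(pop, 2)  # one column per pair: (1,2), (1,3), (2,3)
combinations <- apply(pairs, 2, paste, collapse = "_")

# simplify = TRUE (the default) gives a 10 x 3 matrix, one column per pair
samples <- replicate(n = length(combinations), numeric(10))
colnames(samples) <- combinations

for (k in seq_len(ncol(pairs))) {
  i <- pairs[1, k]
  j <- pairs[2, k]
  # placeholder for the real pairwise calculation between samples i and j
  samples[, k] <- i * j
}

samples[, "2_3"]
# [1] 6 6 6 6 6 6 6 6 6 6
```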
Is something like this what you are looking for?
Suppose you store all the vectors as rows/columns of a numeric data frame df:
list.values <- list()
for (i in seq_len(nrow(df))) {
  for (j in seq_len(ncol(df))) {
    # difference between every element of df and the single element df[i, j]
    list.values[[paste(i, j, sep = "_")]] <- df - df[i, j]
  }
}
Now list.values contains one data frame per element [i, j] of df, holding the difference between every element of df and element [i, j].
E.g., if you want the differences between the whole data frame and element [2, 3], use View(list.values[["2_3"]]).

Ranks and identification of elements in R

I have two vectors with different elements, say x = c(1,3,4) and y = c(2,9).
I want a vector that, going through the sorted elements of both vectors, identifies the elements of x with 1 and those of y with 0, i.e.
(1,2,3,4,9) -----> (1,0,1,1,0)
How can I get the vector of zeros and ones (1,0,1,1,0) in R?
Thanks
The following option surely isn't the most efficient, but it's the simplest and most direct one:
a <- c(1, 2, 3, 4)
b <- c(5, 6, 7, 8)
f <- function(vec0, vec1, inp) {
  out <- rep(NA, length(inp))  # NA if an input element is in neither vector
  for (i in seq_along(inp)) {
    # inp[i] == vec0 is logical; sum() coerces it to 0/1, and if() treats
    # any nonzero sum as TRUE, i.e. inp[i] occurs in vec0
    if (sum(inp[i] == vec0)) out[i] <- 0
  }
  for (i in seq_along(inp)) {
    if (sum(inp[i] == vec1)) out[i] <- 1
  }
  return(out)
}
Works just fine:
> f(vec0=a,vec1=b,inp=c(1,6,4,8,2,4,8,7,10))
[1] 0 1 0 1 0 0 1 1 NA
First, define a function that does that:
blah <- function(vector,
                 x = c(1,3,4),
                 y = c(2,9)) {
  outVector <- rep(x = NA, times = length(vector))
  outVector[vector %in% x] <- 1
  outVector[vector %in% y] <- 0
  return(outVector)
}
Then you can use the function:
blah(vector = 1:9)
blah(vector = c(1,2,3,4,9))
You can also change the values of x and y:
blah(vector = 1:10,x = c(1:5*2), y = c((1:5*2)-1 ))
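For the exact input in the question, where the target vector is the sorted combination of x and y, a shorter sketch uses sort() and %in%. It assumes that x and y have no values in common.

```r
x <- c(1, 3, 4)
y <- c(2, 9)

z <- sort(c(x, y))             # 1 2 3 4 9
flag <- as.integer(z %in% x)   # 1 where the value came from x, else 0
flag
# [1] 1 0 1 1 0
```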

R: Remove the number of occurrences of values in one vector from another vector, but not all

Apologies for the confusing title, but I don't know how to express my problem otherwise. In R, I have the following problem which I want to solve:
x <- seq(1,1, length.out=10)
y <- seq(0,0, length.out=10)
z <- c(x, y)
p <- c(1,0,1,1,0,0)
How can I remove vector p from vector z so that a new vector i has three fewer occurrences of 1 and three fewer occurrences of 0? In other words, what do I have to do to arrive at the following result? The order of 1s and 0s in z should not matter (they might be in random order), and other numbers may be involved as well.
> i
[1] 1 1 1 1 1 1 1 0 0 0 0 0 0 0
Thanks in advance!
Similar to @VincentGuillemot's answer, but in a functional programming style. Uses the purrr package:
library(purrr)
i <- z
# map() is used for its side effect: <<- updates i outside the function
map(p, function(x) { i <<- i[-min(which(i == x))]})
> i
[1] 1 1 1 1 1 1 1 0 0 0 0 0 0 0
There might be numerous better ways to do it:
i <- z
for (val in p) {
  if (val %in% i) {
    i <- i[ - which(i==val)[1] ]
  }
}
Another solution, which I like better because it does not require a test (thanks to @Franck for the suggestion):
i <- z
# assumes every value of p occurs in i; unmatched values are not handled
for (val in p)
  i <- i[ - match(val, i, nomatch = integer(0) ) ]
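A side-effect-free variant of the same idea uses base R's Reduce() instead of a loop or <<-. drop_one() is a hypothetical helper name; it removes the first occurrence of a value and leaves the vector unchanged when the value is absent, so values of p that do not occur in z are simply skipped.

```r
x <- seq(1, 1, length.out = 10)
y <- seq(0, 0, length.out = 10)
z <- c(x, y)
p <- c(1, 0, 1, 1, 0, 0)

# hypothetical helper: drop the first occurrence of val from vec, if any
drop_one <- function(vec, val) {
  pos <- match(val, vec)  # first position of val, NA if absent
  if (is.na(pos)) vec else vec[-pos]
}

# Reduce() starts from z and applies drop_one() once per element of p
i <- Reduce(drop_one, p, z)
i
# [1] 1 1 1 1 1 1 1 0 0 0 0 0 0 0
```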
