R loop to list all the values of a variable - r

As an exercise, I have to create a vector of all the distinct values of a variable (a$dep).
This vector can be created with the code: unique(a$dep)
However, I need to create this vector using a for loop.
I wrote a loop that doesn't give the right result, but I don't understand where the problem is:
v<-vector()
for (i in seq_along(a$dep)){
v<-ifelse(a$dep[i] %in% v, v,c(v,a$dep[i]))
}
Thank you very much for your help!

Based on the description, if we only need the unique values, a plain if condition is sufficient: loop over the sequence of the 'dep' column and, if the element is not (!) %in% 'v', append that element to 'v' and update 'v' by assignment (<-).
v <- vector()
for(i in seq_along(a$dep)) {if(!a$dep[i] %in% v) v <- c(v, a$dep[i])}
The original loop fails because ifelse returns a result with the same length as its test. Here the test (a$dep[i] %in% v) always has length 1, so only the first element of 'yes' (v) or 'no' (c(v, a$dep[i])) is returned, and 'v' never grows beyond one element.
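The length behaviour of ifelse can be checked directly. This is a minimal sketch; the vector 'v' and the value "z" are made up for illustration and are not the question's data.

```r
# Illustrative values only; not taken from the question's data
v <- c("a", "b")

# The test has length 1, so only the first element of 'yes' is returned
res_yes <- ifelse(TRUE, v, "z")

# Same for 'no': c(v, "c") has length 3, but only its first element comes back
res_no <- ifelse(FALSE, v, c(v, "c"))

length(res_yes)  # 1
length(res_no)   # 1
```

So assigning the result of ifelse back to 'v' inside the loop keeps replacing 'v' with a single value.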
One option with ifelse is to initialize a vector 'v' with the same length as the 'dep' column, then, for each element, use ifelse to check whether it is already %in% 'v' (a length-1 TRUE/FALSE): if so, store a blank (yes: "", length 1); otherwise store the element of 'dep' itself (no: a$dep[i], length 1).
v <- character(nrow(a))
for(i in seq_along(a$dep)) v[i] <- ifelse(a$dep[i] %in% v, "", a$dep[i])
and then remove the blank elements
v[v != ""]
#[1] "a" "b" "c" "e"
ifelse is useful as a vectorized function, so calling it on one element at a time inside a loop is not optimal here.
data
a <- data.frame(dep = c('a', 'b', 'a', 'c', 'e', 'a'))
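As a quick check, the if-based loop can be compared against unique() on this data. This is only a sketch; stringsAsFactors = FALSE is my addition so that 'dep' is a character column regardless of the R version.

```r
# Same data as above; stringsAsFactors = FALSE is an added assumption
a <- data.frame(dep = c("a", "b", "a", "c", "e", "a"),
                stringsAsFactors = FALSE)

v <- vector()
for (i in seq_along(a$dep)) {
  # Append the element only if it has not been seen yet
  if (!a$dep[i] %in% v) v <- c(v, a$dep[i])
}

# Compare the loop result with the built-in
same_as_unique <- identical(v, unique(a$dep))
```

Here same_as_unique should be TRUE, since both approaches keep the first occurrence of each value in order.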

Related

Modifying a function to add an extra search for a specific value

With the following function, I search a list (my_list) and return the names of the elements that have any column whose name contains ".Positivedata"
# Function to give me the names of elements in a list with any column containing the string ".Positivedata" (works OK)
names(my_list)[sapply(my_list, function(x) any(grep(".Positivedata", names(x))))]
Now I would like to modify the function to return which of the names have a value of "4" in the ".Positivedata" column
Where do I put the "== 4" in the function?
names(my_list)[sapply(my_list, function(x) any(grep(".Positivedata", names(x))) )]
We loop through the list, subset the column matched by grep, check whether its values are equal to 4, and use which to get the positions
lapply(my_list, function(x) which(x[, grep("\\.Positivedata", names(x))] == 4))
If there are multiple columns, then it would be better to get the row/column index
lapply(my_list, function(x)
which(x[, grep("\\.Positivedata", names(x))] == 4, arr.ind = TRUE))
If we want the names of the .Positivedata columns that contain the value 4
sapply(my_list, function(x) {
  nm1 <- grep("\\.Positivedata", names(x), value = TRUE)
  nm1[sapply(x[nm1], function(col) any(col == 4))]
})
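If the goal is the names of the list elements themselves, the original one-liner can be adapted as sketched below. The contents of my_list here (df1, df2, df3 and their columns) are hypothetical, since the question does not show the data.

```r
# Hypothetical data; the real my_list is not shown in the question
my_list <- list(
  df1 = data.frame(id = 1:3, x.Positivedata = c(1, 4, 2)),
  df2 = data.frame(id = 1:3, y.Positivedata = c(0, 0, 1)),
  df3 = data.frame(id = 1:3, other = c(4, 4, 4))
)

has_four <- sapply(my_list, function(x) {
  # Positions of the columns whose name contains ".Positivedata"
  cols <- grep("\\.Positivedata", names(x))
  # TRUE only if such a column exists and holds at least one 4
  length(cols) > 0 && any(x[cols] == 4, na.rm = TRUE)
})

names(my_list)[has_four]
```

With this made-up data only "df1" is returned: df2 has a .Positivedata column without a 4, and df3 has 4s but no .Positivedata column.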

return a vector from a list based on a condition

I have a list
alist <- list(c(1,2,9),c(4,5,4),c(3,11,19))
and a constant
value <- 4
I want to return the vector from the list in which the first element of the vector equals the constant (i.e., (4,5,4)). I'd like to do this in base R. Can anyone help?
We can loop through the list with sapply, extract the first element, compare it with 'value' to get a logical vector and subset the 'alist' based on that
alist[sapply(alist, `[`, 1) == value]
Or with Filter
Filter(function(x) x[1] == value, alist)
If we use purrr
purrr::keep(alist, ~ .x[1] == value)
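Note that these approaches return a list of the matching elements rather than a bare vector. A minimal base R sketch for pulling out the vector itself, using the data from the question (the names firsts, matches and result are mine):

```r
alist <- list(c(1, 2, 9), c(4, 5, 4), c(3, 11, 19))
value <- 4

# First element of each vector; vapply enforces one numeric per element
firsts <- vapply(alist, function(x) x[1], numeric(1))

# Still a list: every element whose first value equals 'value'
matches <- alist[firsts == value]

# Extract the first match as a plain vector
result <- matches[[1]]
```

If nothing matches, matches is an empty list and matches[[1]] raises an error, so it may be worth checking length(matches) first.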

R: remove empty columns in a list within a list

How do I remove empty columns from a list within a list in R, when the columns are either "" or NA?
SAMPLE DATA:
x <- list( a = cars , b = ability.cov , d = mtcars )
x[[3]][2]<-""
So the second column of the third list element is now all "", and I wish to remove it from x
EDIT: The problem is that I do not know which columns in which list element are empty, so I need a general algorithm
I've tried the following which does not work for me:
x<-x[,colSums(x!= "") != 0 ]
To remove all columns with only value "" from the dataframes in the list you can do:
lapply(x, function(xi) xi[!sapply(xi, function(xii) all(xii==""))])
explanation:
If you have a vector xii, you can test it against ""; this gives a logical vector of the same length as xii.
all(...) returns TRUE only if all elements are TRUE.
sapply(xi, ...) computes this for each column of xi, giving TRUE or FALSE for each column.
xi[!sapply()] inverts the logical vector from sapply() and uses it as a column index for xi. Where the index is FALSE, the column is dropped from the result.
lapply(x, ...) applies this to each element of your original list.
Do not forget to store the result in an object! xnew <- lapply(...)
If you want to remove columns with only NA and "" as values:
lapply(x, function(xi) xi[!sapply(xi, function(xii) all(xii=="" | is.na(xii)))])
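The same idea can be wrapped in a small helper, which makes it easier to test. This is a sketch on made-up data; drop_empty and the columns p, q, r, s are hypothetical names, not from the question.

```r
# Made-up data: q is all "", r is all NA, s is only partly empty
x <- list(
  a = data.frame(p = 1:3, q = c("", "", ""), stringsAsFactors = FALSE),
  b = data.frame(r = c(NA, NA, NA), s = c("u", "", NA),
                 stringsAsFactors = FALSE)
)

# Hypothetical helper: drop columns whose values are all "" or NA
drop_empty <- function(df) {
  empty <- vapply(df, function(col) all(is.na(col) | col == ""), logical(1))
  df[!empty]
}

xnew <- lapply(x, drop_empty)

names(xnew$a)  # "p"
names(xnew$b)  # "s"
```

Putting is.na(col) inside the condition matters: without it, a column mixing "" and NA makes all() return NA, which cannot be used as a column index.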

Matching multiple column criteria in an R dataframe from integer vectors

I have a large dataframe, populated with 1's and 0's.
I have two integer vectors, "a" and "b" which relate to specific columns in the dataframe. No column reference in a exists in b, and vice versa (i.e. no intersect).
What I'm trying to do is generate a new column containing a flag when:
ANY of the columns in "a" are 1 (on a given row) and
ALL of the columns in "b" are 0 (on the same row)
I'm trying to do this by:
processed.tbl$flag <- ifelse(processed.tbl[, a] == 1 & processed.tbl[, b] ==0,
1, 0)
but I get an error of non-conformable arrays, presumably because it's trying to join the two table subsets. How do I do this correctly (in base R ideally)?
Thanks.
Okay; think I've found a way to do this, but please do add if there's a slicker way!
processed.tbl$flag[
  which(apply(processed.tbl[, b], MARGIN = 1, function(x) all(x == 0)) &
        apply(processed.tbl[, a], MARGIN = 1, function(y) any(y == 1)),
        arr.ind = FALSE)] <- 1
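A possibly slicker base R variant uses rowSums instead of apply. This is only a sketch on a made-up table; the columns c1 to c4 and the index vectors below are assumptions for illustration.

```r
# Made-up 0/1 table; the real processed.tbl is not shown in the question
processed.tbl <- data.frame(c1 = c(1, 0, 0, 1),
                            c2 = c(0, 0, 1, 0),
                            c3 = c(0, 0, 0, 1),
                            c4 = c(0, 1, 0, 0))
a <- c(1, 2)  # columns where ANY value must be 1
b <- c(3, 4)  # columns where ALL values must be 0

# drop = FALSE keeps a data frame even when a or b has length 1
any_a <- rowSums(processed.tbl[, a, drop = FALSE] == 1) > 0
all_b <- rowSums(processed.tbl[, b, drop = FALSE] == 0) == length(b)

# 1 where both conditions hold on the row, 0 otherwise
processed.tbl$flag <- as.integer(any_a & all_b)
```

This fills the whole column in one step (1 or 0 on every row), so it does not depend on a flag column existing beforehand.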

Replace all values of NULL in a list of List to something else/ list of list with varying lengths

I have a list of lists that I need to convert into the corresponding data frame.
The position of the rows is important so that I can link them to another item later on. I tried one approach where I make it into a dataframe, but I can't do that because of some rows being NULL.
first = c(1,2,3)
second = c(1,2)
problemrows = NULL
mylist = list(first,second,problemrows)
#mylist - should turn into a dataframe with each row being same order as the list of lists. NULL value would be a null row
library(plyr) # doesn't work because of NULLs above
t(plyr::rbind.fill.matrix(lapply(mylist, t)))
## help^^^ Ideally the row that doesn't work would just be a null row or the new dataframe would remove these NULL lists as rows and instead assign appropriate row#s to everything else to make up for what was skipped.
# other approach - change all the NULL list of lists to something like -999 which is another fix.
first = c(1,2,3)
second = c(1,2)
problemrows = NULL
mylist = list(first,second,problemrows)
mylist <- sapply(mylist, function(x) ifelse(x == "NULL", NA, x))
library(plyr) # doesn't work because of NULLs above
t(plyr::rbind.fill.matrix(lapply(mylist, t)))
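One possible base R approach, sketched under the assumption that a NULL entry should become a row of NA in its original position: test for NULL with is.null (comparing NULL with the string "NULL" gives a zero-length result, so ifelse has nothing to replace), then pad every element to the same length. The names filled, padded and df are mine.

```r
# Data from the question
mylist <- list(c(1, 2, 3), c(1, 2), NULL)

# Replace NULL elements with a single NA so they keep their position
filled <- lapply(mylist, function(x) if (is.null(x)) NA else x)

# Pad every element with NA up to the longest length
n <- max(lengths(filled))
padded <- lapply(filled, function(x) {
  length(x) <- n
  x
})

# One row per list element, in the original order
df <- as.data.frame(do.call(rbind, padded))
```

Row 3 of df is then all NA, and row 2 is padded with one NA. A sentinel such as -999 could be substituted for NA in the first step if that is preferred.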
