Selection of only existing combinations before the normalization operation in R

I'd like to normalize a variable only within the combinations of var1 and var2 that actually exist, using a for loop. In my example:
# Create my variables
var1<-c(rep(6,25),rep(7,5))
var2<-c(1,1,1,1,1,2,2,2,2,2,5,5,5,5,5,10,10,10,10,10,11,11,11,11,11,5,5,5,5,5)
var3<-rnorm(30)
# Create a data frame
mydf<-data.frame(var1,var2,var3)
str(mydf)
# Inspection by var1 and var2
table(mydf$var1,mydf$var2)
#     1  2  5 10 11
#  6  5  5  5  5  5
#  7  0  0  5  0  0
# I'd like to leave out the "0" combinations!
# My idea is to create a subset only for the combinations that have values, but if I do:
var1ID <- unique(mydf$var1)
var2ID <- unique(mydf$var2)
for(a in 1:length(var1ID)){
  for(b in 1:length(var2ID)){
    mydf_sub <- mydf[mydf$var1 == var1ID[a] & mydf$var2 == var2ID[b],]
    print(var1ID[a])
    print(var2ID[b])
    # Normalize function
    normalizevar <- function(x, na.rm = TRUE) {
      return((x - min(x))/(max(x) - min(x)))
    }
    print(normalizevar(mydf_sub$var3))
  }
}
# [1] 6
# [1] 1
# [1] 0.0000000 0.1235632 0.1541684 1.0000000 0.3910381
# [1] 6
# [1] 2
# [1] 0.7911505 0.0000000 0.6296866 1.0000000 0.1904835
# [1] 6
# [1] 5
# [1] 0.6571259 1.0000000 0.1402675 0.0000000 0.4068031
# [1] 6
# [1] 10
# [1] 0.7060784 0.0000000 1.0000000 0.4842629 0.9560127
# [1] 6
# [1] 11
# [1] 0.4096362 0.4831099 1.0000000 0.0000000 0.5492811
# [1] 7
# [1] 1
# numeric(0)
# [1] 7
# [1] 2
# numeric(0)
# [1] 7
# [1] 5
# [1] 0.6208451 0.3219927 1.0000000 0.4012007 0.0000000
# [1] 7
# [1] 10
# numeric(0)
# [1] 7
# [1] 11
# numeric(0)
Here I have a problem: I'd like output only for the combinations that exist, not numeric(0). Any help with my problem, or a dplyr approach to solving it?

Note that in the question, the normalizing function ignored its na.rm argument, so NAs (if any) were not removed.
# define the function at the beginning of the script,
# never in a loop
normalizevar <- function(x, na.rm = TRUE) {
(x- min(x, na.rm = na.rm))/(max(x, na.rm = na.rm)-min(x, na.rm = na.rm))
}
# make the results reproducible
set.seed(2021)
# Create my variables
var1 <- c(rep(6,25),rep(7,5))
var2 <- c(1,1,1,1,1,2,2,2,2,2,5,5,5,5,5,10,10,10,10,10,11,11,11,11,11,5,5,5,5,5)
var3 <- rnorm(30)
mydf <- data.frame(var1,var2,var3)
Base R solution
There is no need for nested loops; two (unnested) *apply loops will do it, in just 3 lines of code.
# create the groups of var1, var2
sp <- split(mydf, mydf[1:2])
# keep the sub-data.frames with more than zero rows
sp <- sp[sapply(sp, nrow) > 0]
# and normalize var3
lapply(sp, function(X) normalizevar(X$var3))
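split() also accepts drop = TRUE, which leaves the empty var1/var2 combinations out from the start, so the nrow() filter is not needed. A minimal sketch that rebuilds the data from the answer so it runs on its own (var2 is written with rep(), giving the same values); splitting var3 directly returns plain vectors instead of sub-data.frames:

```r
normalizevar <- function(x, na.rm = TRUE) {
  (x - min(x, na.rm = na.rm)) / (max(x, na.rm = na.rm) - min(x, na.rm = na.rm))
}

set.seed(2021)
mydf <- data.frame(
  var1 = c(rep(6, 25), rep(7, 5)),
  var2 = c(rep(c(1, 2, 5, 10, 11), each = 5), rep(5, 5)),
  var3 = rnorm(30)
)

# drop = TRUE makes split() skip var1/var2 combinations that have no rows
sp <- split(mydf$var3, mydf[c("var1", "var2")], drop = TRUE)

# normalize each existing group; names are like "6.1", "7.5", ...
res <- lapply(sp, normalizevar)
names(res)
```

Only the 6 combinations present in the data appear, so no numeric(0) results are produced.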
dplyr solution
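A dplyr solution could be the following. group_by() only creates groups for the var1/var2 combinations that occur in the data, so nothing is computed for the empty ones.
library(dplyr)
mydf %>%
  group_by(var1, var2) %>%
  mutate(new_var3 = normalizevar(var3))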
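If, like the dplyr version, you want to keep one row per observation without dplyr, base R's ave() does the same grouped mutate. A sketch, assuming the same normalizevar() and data as in the answer (repeated so it runs on its own). Building the group factor with interaction(..., drop = TRUE) keeps empty var1/var2 combinations out of it; passing var1 and var2 to ave() directly would hand numeric(0) to normalizevar() for those and trigger min()/max() warnings.

```r
normalizevar <- function(x, na.rm = TRUE) {
  (x - min(x, na.rm = na.rm)) / (max(x, na.rm = na.rm) - min(x, na.rm = na.rm))
}

set.seed(2021)
mydf <- data.frame(var1 = c(rep(6, 25), rep(7, 5)),
                   var2 = c(rep(c(1, 2, 5, 10, 11), each = 5), rep(5, 5)),
                   var3 = rnorm(30))

# one grouping factor containing only the combinations that occur
grp <- interaction(mydf$var1, mydf$var2, drop = TRUE)

# ave() applies FUN within each group and returns a vector aligned with the rows
mydf$new_var3 <- ave(mydf$var3, grp, FUN = normalizevar)
head(mydf)
```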

Related

Use mapply or lapply on a nested list

I want to apply a sample function to a nested list (I will call this list bb) and I also have a list of numbers (I will call this list k) to be supplied in the sample function. I would like each of the numbers in k to iterate through all the values of each list in bb. How to do this using mapply or lapply?
Here are the data:
k <- list(1,2,4,3) #this is the list of numbers to be supplied in the `sample.int` function
b1 <- list(c(1,2,3),c(2,3,4),c(3,4,5),c(4,5,6)) #The first list of bb
b2 <- list(c(1,2),c(2,3),c(3,4),c(4,5), c(5,6)) #The second list of bb
bb <- list(b1,b2) #This is list bb containing b1 and b2 whose values are to be iterated through
I wrote this mapply call, but it didn't give the expected outcome:
mapply(function(x, y) {
x[sample.int(y,y, replace = TRUE)]
}, bb,k, SIMPLIFY = FALSE)
This only returns 10 output values, but I would like each number in k to loop through all values of both lists in bb, so there should be 10*2 = 20 outputs. I might be using mapply the wrong way, so I would appreciate it if anyone could point me in the right direction!
outer is your friend. It's normally used to calculate the outer product of two vectors. Consider:
outer(1:3, 2:4)
## or, equivalently:
1:3 %o% 2:4
#      [,1] [,2] [,3]
# [1,]    2    3    4
# [2,]    4    6    8
# [3,]    6    9   12
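It also has a FUN= argument that defaults to "*", but it lets you calculate any function over the cross-wise combinations of x and y, i.e. x[1] with y[1], x[1] with y[2], ..., whereas mapply() only pairs them element-wise: x[1] with y[1], x[2] with y[2], .... outer() calls FUN once on the expanded vectors, so FUN must be vectorized, which is why it is wrapped in Vectorize() here. So let's do it: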
FUN <- Vectorize(function(x, y) x[sample.int(y, y)])
set.seed(42)
res <- outer(bb, k, FUN)
res
#      [,1]   [,2]   [,3]   [,4]
# [1,] List,1 List,2 List,4 List,3
# [2,] List,1 List,2 List,4 List,3
This result (a 2 x 4 matrix whose cells are lists) looks a little weird, but we can easily flatten it one level with unlist():
res <- unlist(res, recursive = FALSE)
Result
res
# [[1]]
# [1] 1 2 3
#
# [[2]]
# [1] 1 2
#
# [[3]]
# [1] 1 2 3
#
# [[4]]
# [1] 2 3 4
#
# [[5]]
# [1] 2 3
#
# [[6]]
# [1] 1 2
#
# [[7]]
# [1] 2 3 4
#
# [[8]]
# [1] 4 5 6
#
# [[9]]
# [1] 1 2 3
#
# [[10]]
# [1] 3 4 5
#
# [[11]]
# [1] 3 4
#
# [[12]]
# [1] 4 5
#
# [[13]]
# [1] 2 3
#
# [[14]]
# [1] 1 2
#
# [[15]]
# [1] 1 2 3
#
# [[16]]
# [1] 2 3 4
#
# [[17]]
# [1] 3 4 5
#
# [[18]]
# [1] 2 3
#
# [[19]]
# [1] 3 4
#
# [[20]]
# [1] 1 2
VoilĂ , 20 results.

Is there a way to recycle elements of the shorter list in purrr::map2 or purrr::walk2?

purrr does not seem to support recycling the elements of a vector when one of the two inputs is shorter (in purrr::map2 or purrr::walk2), unlike base R, where we just get a warning if the longer vector's length is not a multiple of the shorter one's.
Consider this toy example:
This works:
map2(1:3,4:6,sum)
#
#[[1]]
#[1] 5
#[[2]]
#[1] 7
#[[3]]
#[1] 9
And this doesn't work:
map2(1:3,4:9,sum)
Error: .x (3) and .y (6) are different lengths
I understand very well why this is not allowed, as it can make bugs very difficult to catch. But is there any way in purrr to force this to happen? Perhaps using some base R trick with purrr?
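You can put both vectors in a data frame and let data.frame() recycle the shorter one (this only works when the longer length is a multiple of the shorter one; otherwise data.frame() throws an error):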
input <- data.frame(a = 1:3, b = 4:9)
purrr::map2(input$a, input$b, sum)
This is by design in purrr (only inputs of length one are recycled), but you can use base R's Map():
Map(sum,1:3,4:9)
# [[1]]
# [1] 5
#
# [[2]]
# [1] 7
#
# [[3]]
# [1] 9
#
# [[4]]
# [1] 8
#
# [[5]]
# [1] 10
#
# [[6]]
# [1] 12
And here's how I would recycle with purrr if I had to:
library(purrr)
x <- 1:3
y <- 4:9
l <- max(length(y), length(x))
map2(rep(x, length.out = l), rep(y, length.out = l), sum)
# [[1]]
# [1] 5
#
# [[2]]
# [1] 7
#
# [[3]]
# [1] 9
#
# [[4]]
# [1] 8
#
# [[5]]
# [1] 10
#
# [[6]]
# [1] 12
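The rep() trick above can be wrapped in a small helper so it reads like a single call. A minimal sketch: map2_recycle is a hypothetical name, not a purrr function; it uses base R's rep_len() and, like base R's Map()/mapply(), warns when the lengths are not multiples of each other.

```r
library(purrr)

# hypothetical helper (not part of purrr): recycle both inputs to the
# longer length, then hand them to purrr::map2()
map2_recycle <- function(.x, .y, .f, ...) {
  if (length(.x) == 0 || length(.y) == 0) return(list())
  n <- max(length(.x), length(.y))
  # mimic base R, which warns when the lengths are not multiples
  if (n %% length(.x) != 0 || n %% length(.y) != 0) {
    warning("longer argument not a multiple of length of shorter")
  }
  map2(rep_len(.x, n), rep_len(.y, n), .f, ...)
}

res <- map2_recycle(1:3, 4:9, sum)
```

This gives the same six sums as the Map() call above.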

Make elements NA depending on a predicate function

How can I easily set elements of a list or vector to NA depending on a predicate?
I need it to be done in a single call, for smooth integration in dplyr::mutate() calls etc.
Expected output:
make_na(1:10,`>`,5)
# [1] 1 2 3 4 5 NA NA NA NA NA
my_list <- list(1,"a",NULL,character(0))
make_na(my_list, is.null)
# [[1]]
# [1] 1
#
# [[2]]
# [1] "a"
#
# [[3]]
# [1] NA
#
# [[4]]
# character(0)
Note:
I answered my own question since I have one solution figured out, but I'd be happy to get alternative solutions. Maybe this functionality already exists in base R or in a prominent package.
This was inspired by my frustration in my answer to this post.
We can build the following function:
make_na <- function(.x,.predicate,...) {
is.na(.x) <- sapply(.x,.predicate,...)
.x
}
Or, a bit better, leverage purrr's magic (including its formula shorthand for predicates):
make_na <- function(.x,.predicate,...) {
if (requireNamespace("purrr", quietly = TRUE)) {
is.na(.x) <- purrr::map_lgl(.x,.predicate,...)
} else {
if (inherits(.predicate, "formula"))
stop("Formulas aren't supported unless package 'purrr' is installed")
is.na(.x) <- sapply(.x,.predicate,...)
}
.x
}
This way we use purrr::map_lgl if the purrr package is installed, and sapply otherwise.
For the examples, a shorter purrr-only version is enough:
make_na <- function(.x,.predicate,...) {
is.na(.x) <- purrr::map_lgl(.x,.predicate,...)
.x
}
Some use cases:
make_na(1:10,`>`,5)
# [1] 1 2 3 4 5 NA NA NA NA NA
my_list <- list(1,"a",NULL,character(0))
make_na(my_list, is.null)
# [[1]]
# [1] 1
#
# [[2]]
# [1] "a"
#
# [[3]]
# [1] NA
#
# [[4]]
# character(0)
make_na(my_list, function(x) length(x)==0)
# [[1]]
# [1] 1
#
# [[2]]
# [1] "a"
#
# [[3]]
# [1] NA
#
# [[4]]
# [1] NA
If purrr is installed, we can use this short form:
make_na(my_list, ~length(.x)==0)
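Since alternatives in base R were asked for, here is a sketch using only base functions: vapply() evaluates the predicate on each element and insists on exactly one TRUE/FALSE per element, and replace() returns a modified copy, so the whole thing is one expression that fits inside mutate(). make_na_base is a hypothetical name; unlike the purrr version, it does not accept formulas.

```r
# hypothetical base R variant of make_na()
make_na_base <- function(.x, .predicate, ...) {
  # one logical per element; vapply() errors if the predicate returns anything else
  hit <- vapply(.x, .predicate, logical(1), ...)
  # replace() works on both atomic vectors and lists
  replace(.x, hit, NA)
}

make_na_base(1:10, `>`, 5)

my_list <- list(1, "a", NULL, character(0))
make_na_base(my_list, function(x) length(x) == 0)
```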

List all combinations of strings that together cover all given elements

Say I am given the following strings:
1:{a,b,c,t}
2:{b,c,d}
3:{a,c,d}
4:{a,t}
I want to make a program that will give me all different combinations of these strings, where each combination has to include each given letter.
So, for example, valid combinations of the strings above include {1&2, 1&3, 2&3&4, 1&2&3&4, 2&4}.
I was thinking of doing this with for loops, where the program would look at the first string, find which elements are missing, then work down through the list to find strings which have these letters. However, I think this idea would only find combinations of two strings, and it also requires listing all letters in the program, which seems very uneconomical.
I think something like this should work.
sets <- list(c('a', 'b', 'c', 't'),
c('b', 'c', 'd'),
c('a', 'c', 'd'),
c('a', 't'))
combinations <- lapply(2:length(sets),
function(x) combn(1:length(sets), x, simplify=FALSE))
combinations <- unlist(combinations, FALSE)
combinations
# [[1]]
# [1] 1 2
#
# [[2]]
# [1] 1 3
#
# [[3]]
# [1] 1 4
#
# [[4]]
# [1] 2 3
#
# [[5]]
# [1] 2 4
#
# [[6]]
# [1] 3 4
#
# [[7]]
# [1] 1 2 3
#
# [[8]]
# [1] 1 2 4
#
# [[9]]
# [1] 1 3 4
#
# [[10]]
# [1] 2 3 4
#
# [[11]]
# [1] 1 2 3 4
u <- unique(unlist(sets))
u
# [1] "a" "b" "c" "t" "d"
Filter(function(x) length(setdiff(u, unlist(sets[x]))) == 0, combinations)
# [[1]]
# [1] 1 2
#
# [[2]]
# [1] 1 3
#
# [[3]]
# [1] 2 4
#
# [[4]]
# [1] 1 2 3
#
# [[5]]
# [1] 1 2 4
#
# [[6]]
# [1] 1 3 4
#
# [[7]]
# [1] 2 3 4
#
# [[8]]
# [1] 1 2 3 4
As a start...
I'll edit this answer when I have time. The following result depends on the order of choice. I haven't figured out how to flatten the list yet; if I could, I would sort each result and then remove duplicates.
v = list(c("a","b","c","t"),c("b","c","d"),c("a","c","d"),c("a","t"))
allChars <- Reduce(union, v) # [1] "a" "b" "c" "t" "d"
charInList <- function(ch, li) which(sapply(li, function(vect) ch %in% vect))
locations <- sapply(allChars, function(ch) charInList(ch, v) )
# > locations
# $a
# [1] 1 3 4
#
# $b
# [1] 1 2
#
# $c
# [1] 1 2 3
#
# $t
# [1] 1 4
#
# $d
# [1] 2 3
findStillNeeded<-function(chosen){
haveChars <- Reduce(union, v[chosen])
stillNeed <- allChars[!allChars %in% haveChars]
if (length(stillNeed) == 0) return(chosen) # terminate if you don't need any more characters
return ( lapply(1:length(stillNeed), function(i) { #for each of the characters you still need
loc <- locations[[stillNeed[i]]] #find where the character is located
lapply(loc, function(j){
findStillNeeded(c(chosen, j)) # add this location to the choices; recursion stops once no more characters are needed
})
}) )
}
result<-lapply(1:length(v), function(i){
findStillNeeded(i)
})
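The remaining step described above (flatten the nested result, sort each choice, remove duplicates) could look like this sketch. It re-creates the answer's setup so it runs on its own, with lapply() instead of sapply() for locations so it is always a named list; flatten_leaves is a hypothetical helper name. Because the recursion stops as soon as every letter is covered, it never adds a set to an already complete cover, so for this data 1&2&3&4 is not produced.

```r
# setup and recursion, lightly condensed from the answer above
v <- list(c("a","b","c","t"), c("b","c","d"), c("a","c","d"), c("a","t"))
allChars <- Reduce(union, v)
locations <- lapply(setNames(allChars, allChars),
                    function(ch) which(sapply(v, function(s) ch %in% s)))
findStillNeeded <- function(chosen) {
  haveChars <- Reduce(union, v[chosen])
  stillNeed <- allChars[!allChars %in% haveChars]
  if (length(stillNeed) == 0) return(chosen)
  lapply(stillNeed, function(ch)
    lapply(locations[[ch]], function(j) findStillNeeded(c(chosen, j))))
}
result <- lapply(seq_along(v), findStillNeeded)

# hypothetical helper: collect the integer leaves of the nested list into a flat list
flatten_leaves <- function(x) {
  if (is.list(x)) do.call(c, lapply(x, flatten_leaves)) else list(x)
}

# sort each choice so that 1,2 and 2,1 compare equal, then drop duplicates
covers <- unique(lapply(flatten_leaves(result), sort))
covers
```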

How to find if the numbers are consecutive in R?

I have a vector of values:
c(1,2,3,4,5,8,9,10,13,14,15)
and I want to find the ranges where the numbers stop being consecutive. All I want as output is this:
(1,5)
(8,10)
(13,15)
I need to find the break points, and I need to do it in R.
Something like this?
x <- c(1:5, 8:10, 13:15) # example data
unname(tapply(x, cumsum(c(1, diff(x)) != 1), range))
# [[1]]
# [1] 1 5
#
# [[2]]
# [1] 8 10
#
# [[3]]
# [1] 13 15
Another example:
x <- c(1, 5, 10, 11:14, 20:21, 23)
unname(tapply(x, cumsum(c(1, diff(x)) != 1), range))
# [[1]]
# [1] 1 1
#
# [[2]]
# [1] 5 5
#
# [[3]]
# [1] 10 14
#
# [[4]]
# [1] 20 21
#
# [[5]]
# [1] 23 23
x <- c(1:5, 8:10, 13:15)
rr <- rle(x - seq_along(x))
rr$values <- seq_along(rr$values)
s <- split(x, inverse.rle(rr))
s
# $`1`
# [1] 1 2 3 4 5
#
# $`2`
# [1] 8 9 10
#
# $`3`
# [1] 13 14 15
## And then to get *literally* what you asked for:
cat(paste0("(", gsub(":", ",", sapply(s, deparse)), ")"), sep="\n")
# (1,5)
# (8,10)
# (13,15)
I published seqle(), which will do this for you in one line. You can load the package cgwtools or search SO for the code, as it's been posted a couple of times.
Assuming that you don't care about the exact output and are looking for the min and max of each range, you can use diff/cumsum/range as follows:
x <- c(1:5, 8:10, 13:15)
x. <- c(0, cumsum( diff(x)-1 ) )
lapply( split(x, x.), range )
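The break points themselves can also be read off directly: which(diff(x) != 1) gives the positions after which a run ends. A minimal sketch on the question's vector, which is stored as doubles (for doubles, the deparse() trick above would print c(...) rather than a:b):

```r
x <- c(1, 2, 3, 4, 5, 8, 9, 10, 13, 14, 15)   # the question's vector

# positions where the step to the next value is not 1 are the break points
brk <- which(diff(x) != 1)

# each run starts right after a break and ends at a break (or at the last value)
starts <- x[c(1, brk + 1)]
ends   <- x[c(brk, length(x))]

# format exactly as requested
cat(sprintf("(%g,%g)", starts, ends), sep = "\n")
```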
