How to find if the numbers are continuous in R?

I have a vector of values:
c(1,2,3,4,5,8,9,10,13,14,15)
I want to find the runs of consecutive numbers, that is, the points where the sequence becomes discontinuous. All I want as output is:
(1,5)
(8,10)
(13,15)
I need to find break points.
I need to do it in R.

Something like this?
x <- c(1:5, 8:10, 13:15) # example data
unname(tapply(x, cumsum(c(1, diff(x)) != 1), range))
# [[1]]
# [1] 1 5
#
# [[2]]
# [1] 8 10
#
# [[3]]
# [1] 13 15
Another example:
x <- c(1, 5, 10, 11:14, 20:21, 23)
unname(tapply(x, cumsum(c(1, diff(x)) != 1), range))
# [[1]]
# [1] 1 1
#
# [[2]]
# [1] 5 5
#
# [[3]]
# [1] 10 14
#
# [[4]]
# [1] 20 21
#
# [[5]]
# [1] 23 23
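The one-liner above packs three steps into a single expression. A minimal sketch that unpacks them, assuming x is sorted and holds whole numbers; the intermediate names (gaps, is_break, group, runs) are only illustrative:

```r
x <- c(1:5, 8:10, 13:15)

# Differences between neighbours; a leading 1 stands in for the first element
gaps <- c(1, diff(x))
# TRUE wherever the step from the previous value is not exactly 1
is_break <- gaps != 1
# Running count of breaks, so every consecutive run shares one group id
group <- cumsum(is_break)

# First and last value of every run
runs <- unname(tapply(x, group, range))
```

Printing group shows which run each element was assigned to, which is handy when the result is not what you expected.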

x <- c(1:5, 8:10, 13:15)
rr <- rle(x - seq_along(x))
rr$values <- seq_along(rr$values)
s <- split(x, inverse.rle(rr))
s
# $`1`
# [1] 1 2 3 4 5
#
# $`2`
# [1] 8 9 10
#
# $`3`
# [1] 13 14 15
## And then to get *literally* what you asked for:
cat(paste0("(", gsub(":", ",", sapply(s, deparse)), ")"), sep="\n")
# (1,5)
# (8,10)
# (13,15)
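The rle() approach relies on the fact that, within a run of consecutive integers, value minus position stays constant. A small sketch of the same idea that returns the bounds as a data frame, assuming x is strictly increasing; the names offset and bounds are made up for illustration:

```r
x <- c(1:5, 8:10, 13:15)

# Within a run of consecutive integers, value minus position is constant
offset <- x - seq_along(x)

# One row per run: first value and last value
bounds <- data.frame(
  start = as.vector(tapply(x, offset, min)),
  end   = as.vector(tapply(x, offset, max))
)
```

For unsorted input or input with duplicates the offsets can collide between different runs, so the rle() version is the safer choice there.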

I published seqle which will do this for you in one line. You can load the package cgwtools or search SO for the code, as it's been posted a couple times.

Assuming that you don't care about the exact output and are looking for the min and max of each range, you can use diff/cumsum/range as follows:
x <- c(1:5, 8:10, 13:15)
x. <- c(0, cumsum( diff(x)-1 ) )
lapply( split(x, x.), range )

Related

Selection of only existing combination before the normalization operation

I'd like to normalize a variable only for the combinations of var1 and var2 that actually exist, using a for loop. In my example:
# Create my variables
var1<-c(rep(6,25),rep(7,5))
var2<-c(1,1,1,1,1,2,2,2,2,2,5,5,5,5,5,10,10,10,10,10,11,11,11,11,11,5,5,5,5,5)
var3<-rnorm(30)
# Create a data frame
mydf<-data.frame(var1,var2,var3)
str(mydf)
# Inspection by var1 and var2
table(mydf$var1,mydf$var2)
# 1 2 5 10 11
#6 5 5 5 5 5
#7 0 0 5 0 0
# I'd like not considering "0" combinations!!
# My idea is create a subset just only for combinations that have values, but if I make:
var1ID <- unique(mydf$var1)
var2ID <- unique(mydf$var2)
for(a in 1:length(var1ID)){
for(b in 1:length(var2ID)){
mydf_sub <- mydf[mydf$var1 == var1ID[a] & mydf$var2 ==var2ID[b],]
print(var1ID[a])
print(var2ID[b])
# Normalize function
normalizevar <- function(x, na.rm = TRUE) {
return((x- min(x))/(max(x)-min(x)))
}
print(normalizevar(mydf_sub$var3))
}}
# [1] 6
# [1] 1
# [1] 0.0000000 0.1235632 0.1541684 1.0000000 0.3910381
# [1] 6
# [1] 2
# [1] 0.7911505 0.0000000 0.6296866 1.0000000 0.1904835
# [1] 6
# [1] 5
# [1] 0.6571259 1.0000000 0.1402675 0.0000000 0.4068031
# [1] 6
# [1] 10
# [1] 0.7060784 0.0000000 1.0000000 0.4842629 0.9560127
# [1] 6
# [1] 11
# [1] 0.4096362 0.4831099 1.0000000 0.0000000 0.5492811
# [1] 7
# [1] 1
# numeric(0)
# [1] 7
# [1] 2
# numeric(0)
# [1] 7
# [1] 5
# [1] 0.6208451 0.3219927 1.0000000 0.4012007 0.0000000
# [1] 7
# [1] 10
# numeric(0)
# [1] 7
# [1] 11
# numeric(0)
Here I have a problem because I'd like the output only for the combinations that exist, not numeric(0). Can anyone help with this, or suggest a dplyr approach to solving it?
Note that in the question, the normalizing function accepts na.rm but never passes it on to min() and max(), so NA's, if any, are not removed.
# define the function at the beginning of the script,
# never in a loop
normalizevar <- function(x, na.rm = TRUE) {
(x- min(x, na.rm = na.rm))/(max(x, na.rm = na.rm)-min(x, na.rm = na.rm))
}
# make the results reproducible
set.seed(2021)
# Create my variables
var1 <- c(rep(6,25),rep(7,5))
var2 <- c(1,1,1,1,1,2,2,2,2,2,5,5,5,5,5,10,10,10,10,10,11,11,11,11,11,5,5,5,5,5)
var3 <- rnorm(30)
mydf <- data.frame(var1,var2,var3)
Base R solution
There is no need for nested loops; two (unnested) *apply loops will do it, in just three lines of code.
# create the groups of var1, var2
sp <- split(mydf, mydf[1:2])
# keep the sub-data.frames with more than zero rows
sp <- sp[sapply(sp, nrow) > 0]
# and normalize var3
lapply(sp, function(X) normalizevar(X$var3))
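As a possible shortcut, split() has a drop argument that discards empty combinations up front, so the filtering step may not be needed. A sketch under that assumption, rebuilding the example data so that it runs on its own (normalizevar is rewritten with range() here purely for brevity):

```r
normalizevar <- function(x, na.rm = TRUE) {
  rng <- range(x, na.rm = na.rm)
  (x - rng[1]) / (rng[2] - rng[1])
}

set.seed(2021)
mydf <- data.frame(
  var1 = c(rep(6, 25), rep(7, 5)),
  var2 = c(rep(c(1, 2, 5, 10, 11), each = 5), rep(5, 5)),
  var3 = rnorm(30)
)

# drop = TRUE discards var1/var2 combinations that have no rows
sp <- split(mydf, mydf[c("var1", "var2")], drop = TRUE)
res <- lapply(sp, function(d) normalizevar(d$var3))
```

The result has one element per existing combination, named like "6.1" or "7.5".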
dplyr solution
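A dplyr solution could be the following. group_by() only forms groups for combinations that occur in the data, so no empty groups arise.
library(dplyr)
mydf %>%
  group_by(var1, var2) %>%
  mutate(new_var3 = normalizevar(var3))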

split list into lists each of length x

Simple problem, given a list:
main_list <- list(1:3,
4:6,
7:9,
10:12,
13:15)
main_list
# [[1]]
# [1] 1 2 3
# [[2]]
# [1] 4 5 6
# [[3]]
# [1] 7 8 9
# [[4]]
# [1] 10 11 12
# [[5]]
# [1] 13 14 15
I want to split the list into multiple lists where I break up the original one into lists each of length x. So if I said x = 2, I would get 3 lists of length 2, 2 and the leftover 1:
target <- list(list(1:3,
4:6),
list(7:9,
10:12),
list(13:15))
target
# [[1]]
# [[1]][[1]]
# [1] 1 2 3
# [[1]][[2]]
# [1] 4 5 6
# [[2]]
# [[2]][[1]]
# [1] 7 8 9
# [[2]][[2]]
# [1] 10 11 12
# [[3]]
# [[3]][[1]]
# [1] 13 14 15
Something like:
my_split <- function(listtest, x) {
split(listtest, c(1:x))
}
target <- my_split(main_list, 2)
Thanks
Here is an option with gl:
split(main_list, as.integer(gl(length(main_list), 2, length(main_list))))
It can be converted to a custom function
f1 <- function(lstA, n) {
  # number of elements in the input list
  l1 <- length(lstA)
  split(lstA, as.integer(gl(l1, n, l1)))
}
EDIT: no conditional logic needed. Just use split() with c() and rep():
my_split <- function(l, x){
l_length <- length(l)
l_div <- l_length / x
split(l, c(rep(seq_len(l_div), each = x), rep(ceiling(l_div), l_length %% x)))
}
my_split(main_list, 2)
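Another way to build the grouping vector, offered as a sketch rather than a replacement for the answers above: element i goes to chunk ceiling(i / x). The helper name chunk_list is made up for this example.

```r
main_list <- list(1:3, 4:6, 7:9, 10:12, 13:15)

# Hypothetical helper: position i goes to chunk ceiling(i / size)
chunk_list <- function(lst, size) {
  unname(split(lst, ceiling(seq_along(lst) / size)))
}

chunks <- chunk_list(main_list, 2)
```

Because the group ids come straight from the positions, the leftover elements need no special handling.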

simulate lapply / loop over a list of parameters

I would like to simulate the frequency and severity over a list of parameters.
Here is the case for the first item in the list:
data <- data.frame(
lamda = c(5, 2, 3),
meanlog = c(9, 10, 11),
sdlog = c(2, 2.1, 2.2))
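s <- 2  # number of simulations; undefined in the original question, value taken from the answer below
freq <- rpois(s, data$lamda[1])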
freqsev <- lapply(freq, function(k) rlnorm(k, data$meanlog[1], sdlog = data$sdlog[1]))
freq
freqsev
How do I set up a loop or an lapply statement to iterate over all the items in data (not just the first)?
Thanks.
We can use map (from the purrr package, part of the tidyverse package) as follows to create list columns. The contents are now stored in the freq and freqsev columns.
library(tidyverse)
set.seed(123)
s <- 2
data2 <- data %>%
mutate(freq = map(lamda, ~rpois(s, .x)),
freqsev = map(freq, ~map(.x, function(k) rlnorm(k, meanlog, sdlog))))
data2$freq
# [[1]]
# [1] 4 7
#
# [[2]]
# [1] 2 4
#
# [[3]]
# [1] 6 0
data2$freqsev
# [[1]]
# [[1]][[1]]
# [1] 9330.247 28897.323 2605520.369 20370.283
#
# [[1]][[2]]
# [1] 645.4047 5206.2183 22461.1778 93729.0634 46892.3129 144595.7492 10110.8606
#
#
# [[2]]
# [[2]][[1]]
# [1] 2665.955 938950.074
#
# [[2]][[2]]
# [1] 21931.9763 354.2858 280122.6952 3147.6681
#
#
# [[3]]
# [[3]][[1]]
# [1] 957.5257 13936.3063 6265.3530 1886.0077 5927.8540 1464.5081
#
# [[3]][[2]]
# numeric(0)
Update
Here is a way to replace values greater than or equal to 500 with 500.
data3 <- data2 %>%
mutate(capat500 = map(freqsev, ~map(.x, function(y) ifelse(y >= 500, 500, y))))
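One caveat about the mutate() call earlier in this answer: inside mutate(), meanlog and sdlog refer to the whole columns, so rlnorm() recycles the parameters of all three rows across each set of draws instead of using only the row being simulated. If per-row parameters are what is wanted (an assumption about the asker's intent), a base R sketch could look like this; sim_row is a hypothetical helper and s = 2 is simply carried over from above:

```r
data <- data.frame(
  lamda   = c(5, 2, 3),
  meanlog = c(9, 10, 11),
  sdlog   = c(2, 2.1, 2.2)
)

set.seed(123)
s <- 2  # number of simulated frequencies per row (assumed, as in the answer)

# Hypothetical helper: simulate frequency and severity for one parameter row
sim_row <- function(lamda, meanlog, sdlog) {
  freq <- rpois(s, lamda)
  sev  <- lapply(freq, function(k) rlnorm(k, meanlog, sdlog))
  list(freq = freq, freqsev = sev)
}

# Map() walks the three columns in parallel, one row at a time
sims <- Map(sim_row, data$lamda, data$meanlog, data$sdlog)

# Cap every severity at 500; pmin() is a vectorised alternative to ifelse()
capped <- lapply(sims, function(r) lapply(r$freqsev, pmin, 500))
```

The simulated numbers will differ from the output shown above, since the draws are made in a different order and with different parameters.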

is there a way I can recycle elements of the shorter list in purrr:: map2 or purrr::walk2?

purrr does not seem to support recycling the elements of a vector when one of the two inputs is shorter than the other (while using purrr::map2 or purrr::walk2). This is unlike base R, where we just get a warning if the length of the longer vector is not a multiple of the shorter one.
Consider this toy example:
This works:
map2(1:3,4:6,sum)
#
#[[1]]
#[1] 5
#[[2]]
#[1] 7
#[[3]]
#[1] 9
And this doesn't work:
map2(1:3,4:9,sum)
Error: .x (3) and .y (6) are different lengths
I understand very well why this is not allowed - as it can make catching bugs very difficult. But is there any way in purrr I can force this to happen? Perhaps using some base R trick with purrr?
You can put both vectors in a data frame and let data.frame() recycle the shorter one (this only works when the longer length is a multiple of the shorter one):
input <- data.frame(a = 1:3, b = 4:9)
purrr::map2(input$a, input$b, sum)
It's by design in purrr, but you can use Map:
Map(sum,1:3,4:9)
# [[1]]
# [1] 5
#
# [[2]]
# [1] 7
#
# [[3]]
# [1] 9
#
# [[4]]
# [1] 8
#
# [[5]]
# [1] 10
#
# [[6]]
# [1] 12
And here's how I would recycle if I had to:
x <- 1:3
y <- 4:9
l <- max(length(y), length(x))
map2(rep(x, length.out = l), rep(y, length.out = l), sum)
# [[1]]
# [1] 5
#
# [[2]]
# [1] 7
#
# [[3]]
# [1] 9
#
# [[4]]
# [1] 8
#
# [[5]]
# [1] 10
#
# [[6]]
# [1] 12
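The same idea can be wrapped in a small helper so it also covers lengths that are not multiples of each other. This is only a sketch: recycle_pair is a made-up name, and base Map() stands in for map2() to keep the example free of dependencies.

```r
# Hypothetical helper: stretch both inputs to the longer length
recycle_pair <- function(x, y) {
  n <- max(length(x), length(y))
  list(x = rep_len(x, n), y = rep_len(y, n))
}

p <- recycle_pair(1:3, 4:8)  # 5 is not a multiple of 3

# The equal-length vectors could now be passed to purrr::map2(p$x, p$y, sum);
# Map() is used here only so the sketch runs without purrr
res <- Map(sum, p$x, p$y)
```

Since both vectors already have the same length when they reach the mapping function, no recycling warning or length error is triggered.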

Splitting numeric vectors in R

If I have a vector, c(1,2,3,5,7,9,10,12)...and another vector c(3,7,10), how would I produce the following:
[[1]]
1,2,3
[[2]]
5,7
[[3]]
9,10
[[4]]
12
Notice how 3, 7 and 10 become the last number of each list element (except the last one), acting in a sense as breakpoints. I am sure there is a simple R function for this that I either don't know or have forgotten.
Here's one way using cut and split:
x <- c(1, 2, 3, 5, 7, 9, 10, 12)
y <- c(3, 7, 10)
split(x, cut(x, c(-Inf, y, Inf)))
#$`(-Inf,3]`
#[1] 1 2 3
#
#$`(3,7]`
#[1] 5 7
#
#$`(7,10]`
#[1] 9 10
#
#$`(10, Inf]`
#[1] 12
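You could also do this (include.lowest = TRUE keeps the minimum, 1, in the first interval):
split(x, cut(x, unique(c(y, range(x))), include.lowest = TRUE))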
## $`[1,3]`
## [1] 1 2 3
## $`(3,7]`
## [1] 5 7
## $`(7,10]`
## [1] 9 10
## $`(10,12]`
## [1] 12
Similar to @beginneR's answer, but using findInterval instead of cut:
split(x, findInterval(x, y + 1))
# $`0`
# [1] 1 2 3
#
# $`1`
# [1] 5 7
#
# $`2`
# [1] 9 10
#
# $`3`
# [1] 12
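The y + 1 shift works because the data are whole numbers. For non-integer data, one option (a sketch, not part of the original answers) is the left.open argument of findInterval, which makes each interval open on the left and closed on the right:

```r
x <- c(1, 2, 3, 5, 7, 9, 10, 12)
y <- c(3, 7, 10)

# left.open = TRUE makes each interval (lower, upper], so a value equal
# to a breakpoint stays in the group that the breakpoint closes
groups <- unname(split(x, findInterval(x, y, left.open = TRUE)))
```

This gives the same four groups as above, with 3, 7 and 10 as the last element of the first three.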
