I have a numeric vector and I need to get the intervals as a list of vectors.
I thought it was easy but I'm really struggling to find a good, simple way.
A bad, complex way would be to paste the vector and its lag, and then split the result.
Here is the working but ugly reprex:
library(tidyverse)
xx = c(1, 5, 10 ,15 ,20)
paste0(lag(xx), "-", xx-1) %>% str_split("-") #nevermind the first one, it cannot really make sense anyway
#> [[1]]
#> [1] "NA" "0"
#>
#> [[2]]
#> [1] "1" "4"
#>
#> [[3]]
#> [1] "5" "9"
#>
#> [[4]]
#> [1] "10" "14"
#>
#> [[5]]
#> [1] "15" "19"
Created on 2020-09-06 by the reprex package (v0.3.0)
Is there a cleaner way to do the same thing?
You can use Map:
Map(c, xx[-length(xx)], xx[-1] - 1)
#[[1]]
#[1] 1 4
#[[2]]
#[1] 5 9
#[[3]]
#[1] 10 14
#[[4]]
#[1] 15 19
We can also use lapply iterating over the length of the variable.
lapply(seq_along(xx[-1]), function(i) c(xx[i], xx[i+1] - 1))
We can use map2 from purrr
library(purrr)
map2(xx[-length(xx)], xx[-1] -1, c)
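All three answers pair each value with the next value minus one. A minimal base R sketch of that idea, using head() and tail() to build the two aligned vectors (the names starts, ends and intervals are only illustrative):

```r
xx <- c(1, 5, 10, 15, 20)

# every value except the last opens an interval
starts <- head(xx, -1)
# every value except the first, minus one, closes the previous interval
ends <- tail(xx, -1) - 1

# pair them up element by element into a list of length-2 vectors
intervals <- Map(c, starts, ends)
intervals
```

Because the values stay numeric, there is no need to paste and split strings, and the meaningless first "NA" interval never appears.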
> my_data <- "08,23,02.06.2022,5,7,THISPRODUCT,09.02.2022,yes,89,25"
> lengths(gregexpr(",", my_data))+1
[1] 10
I need to get each element individually. I tried with
> print(gregexpr(",", my_data))[[1]][1]
[[1]]
[1] 3 6 17 19 21 33 44 48 51
attr(,"match.length")
[1] 1 1 1 1 1 1 1 1 1
attr(,"index.type")
[1] "chars"
attr(,"useBytes")
[1] TRUE
[1] 3
But the first element of my_data is "08", and this displays 3. Can anyone give me the correct syntax to display every element?
library(tidyverse)
strings <- "08,23,02.06.2022,5,7,THISPRODUCT,09.02.2022,yes,89,25" %>%
str_split(pattern = ",") %>%
unlist()
strings[1]
#> [1] "08"
Created on 2022-06-29 by the reprex package (v2.0.1)
Let's try scan
> scan(text = my_data, what = "",sep = ",",quiet = TRUE)
[1] "08" "23" "02.06.2022" "5" "7"
[6] "THISPRODUCT" "09.02.2022" "yes" "89" "25"
Using lapply:
lapply(strsplit(my_data, ","), `[`)
Output:
[[1]]
[1] "08" "23" "02.06.2022" "5" "7" "THISPRODUCT" "09.02.2022" "yes"
[9] "89" "25"
You can simply do:
unlist(strsplit(my_data, split = ","))
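For context on why the original attempt printed 3: gregexpr returns the character positions of the matches (here, the commas), not the text between them. A short sketch contrasting the two, assuming the same my_data string as in the question (comma_pos and parts are illustrative names):

```r
my_data <- "08,23,02.06.2022,5,7,THISPRODUCT,09.02.2022,yes,89,25"

# gregexpr gives the positions of each comma in the string
comma_pos <- gregexpr(",", my_data)[[1]]

# strsplit gives the pieces between the commas; fixed = TRUE treats "," literally
parts <- strsplit(my_data, ",", fixed = TRUE)[[1]]

parts[1]  # first element
parts[6]  # sixth element
```

Indexing parts with single brackets then returns any individual element.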
I was pondering on this after having come across another question.
library(tidyverse)
set.seed(42)
df <- data.frame(x = cut(runif(100), c(0,25,75,125,175,225,299)))
tidyr::extract does a nice job splitting into groups defined by the regex:
df %>%
extract(x, c("start", "end"), "(\\d+),(\\d+)") %>% head
#> start end
#> 1 0 25
#> 2 0 25
#> 3 0 25
#> 4 0 25
#> 5 0 25
#> 6 0 25
This is the desired output on a character vector. I know I could write a new function, but I wondered whether something like this already exists.
x_chr <- as.character(df$x)
des_res <- str_split(str_extract(x_chr, "(\\d+),(\\d+)"), ",")
head(des_res)
#> [[1]]
#> [1] "0" "25"
#>
#> [[2]]
#> [1] "0" "25"
#>
#> [[3]]
#> [1] "0" "25"
#>
#> [[4]]
#> [1] "0" "25"
#>
#> [[5]]
#> [1] "0" "25"
#>
#> [[6]]
#> [1] "0" "25"
You can use strcapture in base R:
strcapture("(\\d+),(\\d+)", x_chr,
proto = list(start = numeric(), end = numeric()))
# start end
#1 0 25
#2 0 25
#3 0 25
#4 0 25
#5 0 25
#6 0 25
#...
#...
You can also use stringr::str_match:
stringr::str_match(x_chr, "(\\d+),(\\d+)")[, -1]
In the matrix returned by str_match, the first column holds the complete match and the subsequent columns hold the capture groups.
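If a list of vectors is wanted rather than a data frame or matrix, a base R sketch along the same lines is to combine regexec with regmatches and drop the full match from each result. The small x_chr below is a stand-in for the real vector, and matches and groups are illustrative names:

```r
# stand-in for as.character(df$x)
x_chr <- c("(0,25]", "(25,75]", "(175,225]")

# for each string: the full match followed by the two capture groups
matches <- regmatches(x_chr, regexec("(\\d+),(\\d+)", x_chr))

# drop the full match, keeping only the capture groups
groups <- lapply(matches, function(m) m[-1])
groups
```

Strings that do not match the pattern come back as empty character vectors, so it may be worth checking lengths(groups) before using the result.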
Suppose I have a data frame with two columns that indicate a direct relationship between the values in each row.
c2 <- c(2,5,7,8,10)
c1 <- c(1,3,2,7,5)
df <- data.frame(c1, c2)
Such that:
1 is related to 2 [1],
2 is related to 7 [3],
7 is related to 8 [4]
So I get a vector of the indexes 1,3, and 4
and then 3 is related to 5 [2],
and 5 is related to 10 [5]
so I get a vector of the indexes 2 and 5?
It hurts my brain.
This can be solved effectively with the igraph library:
library(igraph)
# components() supersedes the older clusters()
common_ids <- components(graph_from_data_frame(df, directed = FALSE))$membership
split(1:nrow(df), common_ids[match(df$c1, names(common_ids))])
$`1`
[1] 1 3 4
$`2`
[1] 2 5
If also members of the groups are of interest:
split(names(common_ids), common_ids)
$`1`
[1] "1" "2" "7" "8"
$`2`
[1] "3" "5" "10"
An option with igraph
lapply(
groups(components(graph_from_data_frame(df, directed = FALSE))),
function(x) Filter(Negate(is.na),match(x, as.character(df$c1)))
)
gives
$`1`
[1] 1 3 4
$`2`
[1] 2 5
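For anyone who cannot install igraph, the same grouping can be sketched in base R by repeatedly spreading the smallest row label to every row that shares a value. This is not what the answers above do; it is a simple label-propagation stand-in, the function name group_rows is hypothetical, and it is only meant for small data since each pass compares every row with every other row.

```r
c1 <- c(1, 3, 2, 7, 5)
c2 <- c(2, 5, 7, 8, 10)
df <- data.frame(c1, c2)

# hypothetical helper: group row indices whose values are linked, directly or indirectly
group_rows <- function(from, to) {
  label <- seq_along(from)            # start with one label per row
  repeat {
    old <- label
    for (i in seq_along(from)) {
      vals <- c(from[i], to[i])
      # rows that share at least one value with row i (row i included)
      linked <- from %in% vals | to %in% vals
      # give all of them the smallest label seen among them
      label[linked] <- min(label[linked])
    }
    if (identical(label, old)) break  # stop once a full pass changes nothing
  }
  unname(split(seq_along(from), label))
}

groups <- group_rows(df$c1, df$c2)
groups
```

On the example data this yields the row indices 1, 3, 4 in one group and 2, 5 in the other, matching the igraph output.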
How do I remove an element from a list in R?
Imagine this workflow:
# create list
my_list <- lapply(1:10, function(x) x)
# find which ones to exclude
my_list_boolean <- sapply(my_list, function(x) ifelse(x%%2>0,F,T))
# does not work like this!
my_list[[my_list_boolean]]
Is there a solution not having to use a for loop and create a big logic around my statement?
Just use [] and not [[]]
my_list <- lapply(1:10, function(x) x)
# find which ones to exclude
my_list_boolean <- sapply(my_list, function(x) ifelse(x%%2>0,F,T))
# single brackets accept a logical vector
my_list[my_list_boolean]
#> [[1]]
#> [1] 2
#>
#> [[2]]
#> [1] 4
#>
#> [[3]]
#> [1] 6
#>
#> [[4]]
#> [1] 8
#>
#> [[5]]
#> [1] 10
Created on 2018-11-03 by the reprex package (v0.2.1)
Single brackets select elements of the list with a logical vector and return a list, whereas double brackets ([[]]) extract the content of a single element.
Do you mean this?
my_list[my_list_boolean]
#[[1]]
#[1] 2
#
#[[2]]
#[1] 4
#
#[[3]]
#[1] 6
#
#[[4]]
#[1] 8
#
#[[5]]
#[1] 10
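Since the question asks about removing elements, it may help to see keeping and dropping side by side. A small sketch, assuming the same my_list as above (is_even, kept and removed are illustrative names); negating the logical vector drops the flagged elements, and Filter is a base R alternative that takes the predicate directly:

```r
my_list <- lapply(1:10, function(x) x)

# TRUE for the even elements
is_even <- sapply(my_list, function(x) x %% 2 == 0)

kept    <- my_list[is_even]    # keep the flagged elements
removed <- my_list[!is_even]   # drop the flagged elements instead

# same result as kept, without building the logical vector first
kept_via_filter <- Filter(function(x) x %% 2 == 0, my_list)
```

All three results are still lists, because single-bracket subsetting never unwraps the elements.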
I have a problem (maybe it is not that difficult, but I cannot figure it out):
I have a list (l) of 25 elements and I want to divide it randomly into 5 groups. The problem is that if I call sample(l, 5) five times, the samples are not unique. So basically, what I am looking for is to choose 5 elements, remove them from the list, and then sample again.
I hope someone has a solution. Thanks.
If you want Andrew's method as a function
sample2 <- function(x, sample.size){
split(x, sample(ceiling(seq_along(x)/sample.size)))
}
sample2(1:20, 5)
gives
$`1`
[1] 1 15 6 3 18
$`2`
[1] 11 7 5 10 14
$`3`
[1] 2 12 4 13 17
$`4`
[1] 19 16 20 8 9
Another method...
x <- 1:20
matrix(x[sample(seq_along(x),length(x))],ncol = 4)
Here we randomly reorder the vector by sampling index values, then put the result into a matrix so that each column is one group (with 20 elements and ncol = 4, that is four groups of five). You could also leave it as a vector, or make a list if you don't want your output as a matrix.
You could do something like this...
l <- as.list(LETTERS[1:25])
l2 <- split(l,rep(1:5,5)[sample(25)])
l2 #is then a list of five lists containing all elements of l...
$`1`
$`1`[[1]]
[1] "D"
$`1`[[2]]
[1] "I"
$`1`[[3]]
[1] "M"
$`1`[[4]]
[1] "W"
$`1`[[5]]
[1] "Y"
$`2`
$`2`[[1]]
[1] "C"
$`2`[[2]]
[1] "E"
$`2`[[3]]
[1] "H"
$`2`[[4]]
[1] "T"
$`2`[[5]]
[1] "X"
etc...
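The common idea in these answers is to shuffle once and then cut the shuffled result into blocks, which guarantees that no element is drawn twice. A minimal sketch of that for the 25-element list from the question, with a seed set only so the run is repeatable (shuffled and groups are illustrative names):

```r
set.seed(1)  # only for reproducibility of this example
l <- as.list(LETTERS[1:25])

# shuffle the whole list once, so nothing can be drawn twice
shuffled <- sample(l)

# cut the shuffled list into 5 consecutive blocks of 5
groups <- split(shuffled, rep(1:5, each = 5))

lengths(groups)
```

Every element of l ends up in exactly one of the five groups, which is the "sample, remove, sample again" behaviour asked for without any loop.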