Sadly I was unable to work out how to do this in R, but the idea seems simple.
What I want is a list of pairs of numbers covering a range, where each pair holds a start value and that value plus the step (the maximum length of a pair). In the end I should have something like:
somefun <- function(start, end, step){...}
l <- somefun (5, 30, 5)
l
#[[1]]
#[[1]][[1]]
#[1] 5
#
#[[1]][[2]]
#[1] 10
#
#[[2]]
#[[2]][[1]]
#[1] 11
#
#[[2]][[2]]
#[1] 16
#
#[[3]]
#[[3]][[1]]
#[1] 17
#
#[[3]][[2]]
#[1] 22
#
#[[4]]
#[[4]][[1]]
#[1] 23
#
#[[4]][[2]]
#[1] 28
#
#[[5]]
#[[5]][[1]]
#[1] 29
#
#[[5]][[2]]
#[1] 30
So, the final list should have the first start and the last end values, but the difference within each list shouldn't be larger than the step.
Also, I don't know if this is the best way, but my objective is to pass these values to lapply to build a plot using grid with gridExtra::grid.arrange
So the list should fit in this code
p_list = lapply(myRanges, function(a,b){
my_gg_function(myData[a:b], font=f)
})
do.call(gridExtra::grid.arrange, c(p_list, ncol=2))
Thanks in advance
How about this
somefun <- function(start, end, step){
starts <- seq(start, end, step+1)
ends <- pmin(starts + step, end)
mapply(list, starts, ends, SIMPLIFY = FALSE)
}
somefun(5, 30, 5)
We just use a basic seq() and trim as needed.
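As a rough sketch of how the result could be consumed: lapply hands the callback one list element at a time, so the function receives a single pair rather than separate a and b arguments. The example below assumes a made-up numeric vector myData as a stand-in for the real data, and returns the slice itself instead of calling my_gg_function.

```r
somefun <- function(start, end, step) {
  starts <- seq(start, end, step + 1)
  ends <- pmin(starts + step, end)
  mapply(list, starts, ends, SIMPLIFY = FALSE)
}

myRanges <- somefun(5, 30, 5)

# Hypothetical stand-in for the real data
myData <- seq(101, 130)

# Each element r is a list of two numbers: r[[1]] is the start, r[[2]] the end
slices <- lapply(myRanges, function(r) myData[r[[1]]:r[[2]]])

# Number of values in each slice
lengths(slices)
```

In the real code the body of the callback would become something like my_gg_function(myData[r[[1]]:r[[2]]], font = f).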
I have vector a and vector b:
vector_a <- c(3,2,2,2,2)
vector_b <- c("3a","3b","3c", "2a","2b", "2ab", "2ac","2aab","2aac","2aaab","2aaac")
I want to break up vector_b according to vector_a. So for example, I want to break up vector_b into a list of 5 elements, using the length of each element in vector_a.
list_a <- list(one = c("3a","3b","3c"),
two = c("2a","2b"),
three = c("2ab","2ac"),
four = c("2aab","2aac"),
five = c("2aaab","2aaac"))
My initial attempt involved looping over vector_a and trying to create a list that gives the indices I am trying to extract from vector_b. So something of this sort:
iteration_list <- vector(mode = "list", length = length(vector_a))
counter <- 1
for(i in seq_along(vector_a)) {
iteration_list[[i]] <- counter:vector_a[[i]]
counter <- counter + vector_a[[i]]
#Drawing a blank here as I see that the next iteration would go from 4:2
}
I'm drawing a blank on the iteration portion, and I believe there has to be a vectorized function or something much more intuitive for this than what I am achieving. Thanks in advance!
Here is an option with split. Create a grouping index by replicating the sequence along 'vector_a' by the values in 'vector_a', and use that to split 'vector_b'. If the names of the output need to be 'one' to 'five', use the english package
library(english)
out <- split(vector_b, rep(seq_along(vector_a), vector_a))
names(out) <- english(seq_along(out))
-output
out
#$one
#[1] "3a" "3b" "3c"
#$two
#[1] "2a" "2b"
#$three
#[1] "2ab" "2ac"
#$four
#[1] "2aab" "2aac"
#$five
#[1] "2aaab" "2aaac"
If we want to use a for loop, subset 'vector_b' by index: each range starts at 'counter' and ends at the ith element of 'vector_a' plus 'counter' minus 1, and the extracted elements are assigned to the ith element of 'iteration_list'
counter <- 1
for(i in seq_along(vector_a)) {
iteration_list[[i]] <- vector_b[counter:(vector_a[i] + counter-1)]
counter <- counter + vector_a[i]
}
-output
iteration_list
#[[1]]
#[1] "3a" "3b" "3c"
#[[2]]
#[1] "2a" "2b"
#[[3]]
#[1] "2ab" "2ac"
#[[4]]
#[1] "2aab" "2aac"
#[[5]]
#[1] "2aaab" "2aaac"
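The running counter in the loop can also be computed up front with cumsum, which gives the last index of every chunk in one step. This is only a sketch of the same idea and assumes every entry of vector_a is at least 1; the names starts, ends and chunks are made up for the example.

```r
vector_a <- c(3, 2, 2, 2, 2)
vector_b <- c("3a", "3b", "3c", "2a", "2b", "2ab", "2ac",
              "2aab", "2aac", "2aaab", "2aaac")

ends <- cumsum(vector_a)        # last index of each chunk: 3 5 7 9 11
starts <- ends - vector_a + 1   # first index of each chunk: 1 4 6 8 10

# Pair up each start with its end and extract that slice of vector_b
chunks <- Map(function(s, e) vector_b[s:e], starts, ends)
chunks
```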
We can use split in combination with cut/findInterval to divide data into groups based on position in vector_a.
split(vector_b, findInterval(seq_along(vector_b),
cumsum(vector_a), left.open = TRUE))
#$`0`
#[1] "3a" "3b" "3c"
#$`1`
#[1] "2a" "2b"
#$`2`
#[1] "2ab" "2ac"
#$`3`
#[1] "2aab" "2aac"
#$`4`
#[1] "2aaab" "2aaac"
I am looking to split a string into ngrams of 3 characters - e.g HelloWorld would become "Hel", "ell", "llo", "loW" etc
How would I achieve this using R?
In Python it would take a loop using the range function - e.g. [myString[i:i+3] for i in range(len(myString) - 2)]
Is there a neat way to loop through the letters of a string using stringr (or another suitable function/package) to tokenize the word into a vector?
e.g.
dfWords <- c("HelloWorld", "GoodbyeMoon", "HolaSun") %>%
data.frame()
names(dfWords)[1] = "Text"
I would like to generate a new column which would contain a vector of the tokenized Text variable (preferably using dplyr). This can then be split later into new columns.
For others who come here, as I did, looking for the R function equivalent to Python's range(): I have found the answer.
It is the seq() function. A few examples will be better than words. The usage is similar to Python's, except that seq() includes the end value and counts from 1 by default:
> seq(from = 1, to = 5, by = 1)
[1] 1 2 3 4 5
> seq(from = 1, to = 6, by = 2)
[1] 1 3 5
> seq(5)
[1] 1 2 3 4 5
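To make the comparison with Python concrete, here is a small sketch of the two differences that tend to matter when porting a loop: the end value is included, and an empty range needs seq_len() or seq_along() rather than the colon operator.

```r
# Python: list(range(1, 5)) gives 1, 2, 3, 4 (end excluded)
seq(1, 4)             # 1 2 3 4, the end value is written explicitly

# Python: list(range(0, 10, 3)) gives 0, 3, 6, 9
seq(0, 9, by = 3)     # 0 3 6 9

# For a possibly empty input, seq_len() returns an empty vector,
# whereas 1:0 counts down and yields c(1, 0)
n <- 0
length(seq_len(n))    # 0
length(1:n)           # 2
```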
In base R you could do something like this
ss <- "HelloWorld"
len <- 3
lapply(seq_len(nchar(ss) - len + 1), function(x) substr(ss, x, x + len - 1))
#[[1]]
#[1] "Hel"
#
#[[2]]
#[1] "ell"
#
#[[3]]
#[1] "llo"
#
#[[4]]
#[1] "loW"
#
#[[5]]
#[1] "oWo"
#
#[[6]]
#[1] "Wor"
#
#[[7]]
#[1] "orl"
#
#[[8]]
#[1] "rld"
Explanation: The approach is a basic sliding window method to extract substrings from ss. The return object is a list.
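If a character vector is preferred over a list, the same sliding window can be written with the vectorised substring function. This is a sketch rather than a drop-in replacement; char_ngrams is a made-up helper name, and it returns an empty vector when the string is shorter than the window.

```r
# Hypothetical helper: all substrings of length n from a single string s
char_ngrams <- function(s, n = 3) {
  k <- nchar(s) - n + 1             # number of windows
  if (k < 1) return(character(0))   # string shorter than the window
  i <- seq_len(k)                   # start position of each window
  substring(s, i, i + n - 1)
}

char_ngrams("HelloWorld")

# Applied to several words, giving one character vector per word
words <- c("HelloWorld", "GoodbyeMoon", "HolaSun")
lapply(words, char_ngrams)
```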
Another (sliding window) alternative could be zoo::rollapply with strsplit
library(zoo)
len <- 3
rollapply(unlist(strsplit(ss, "")), len, paste, collapse = "")
[1] "Hel" "ell" "llo" "loW" "oWo" "Wor" "orl" "rld"
In response to your comment/edit, here's a tidyverse option
# Sample data
df <- data.frame(words = c("HelloWorld", "GoodbyeMoon", "HolaSun"))
library(tidyverse)
library(zoo)
df %>% mutate(lst = map(str_split(words, ""), function(x) rollapply(x, len, paste, collapse = "")))
# words lst
#1 HelloWorld Hel, ell, llo, loW, oWo, Wor, orl, rld
#2 GoodbyeMoon Goo, ood, odb, dby, bye, yeM, eMo, Moo, oon
#3 HolaSun Hol, ola, laS, aSu, Sun
I have a vector made up of lists of length 10.
I have two other vectors storing their lower and upper quantiles.
Is there a way to extract the data between the quantile for each list of 10?
Basically I am looking to see how many of these have a specific number.
sims is the vector with the data
So far I have tried to use %in% (note: sims is the vector with lists):
for (i in 1:100){
a <- 80.0 %in% sims[[i]]
}
I was going to count how many of these are TRUE; however, this only returns FALSE and also doesn't guarantee that the value is in the range.
Is there an easier way than sorting each list, extracting the relevant data and then checking if it has the value?
Since you don't provide a sample dataset here is a reproducible example based on some sample data I generate
set.seed(2018)
lst <- replicate(4, sample(10), simplify = FALSE)
qrt <- lapply(lst, quantile, probs = c(0.25, 0.75))
Here I've generated the 25% and 75% quantiles for every vector in lst; the result is a list with as many elements as lst.
We can now use Map to select only those entries from the list elements that fall within the quantile range
Map(function(x, y) x[x >= y[1] & x <= y[2]], lst, qrt)
#[[1]]
#[1] 4 5 7 6
#
#[[2]]
#[1] 4 6 5 7
#
#[[3]]
#[1] 6 5 4 7
#
#[[4]]
#[1] 4 7 6 5
To count the number of elements within the quantile range
Map(function(x, y) sum(x >= y[1] & x <= y[2]), lst, qrt)
#[[1]]
#[1] 4
#
#[[2]]
#[1] 4
#
#[[3]]
#[1] 4
#
#[[4]]
#[1] 4
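To get closer to the original goal of counting how many vectors contain a specific value inside their quantile range, the same test can be combined with any(). The sketch below uses small hand-written vectors so the result is easy to check; target and hits are made-up names. Note that exact comparison with == or %in% can miss simulated decimal values because of floating-point rounding, which may (this is a guess) be why the attempt in the question only returned FALSE.

```r
lst <- list(c(1, 2, 3, 4, 5),
            c(10, 20, 30, 40, 50),
            c(3, 3, 3, 9, 1))
qrt <- lapply(lst, quantile, probs = c(0.25, 0.75))

target <- 3

# TRUE for each vector that holds the target between its 25% and 75% quantiles
hits <- mapply(function(x, q) any(x == target & x >= q[1] & x <= q[2]),
               lst, qrt)

sum(hits)   # number of vectors with the target inside the quantile range
```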
In R operators can also be expressed as a function call, e.g.
'<-'(b, 12)
for b <- 12.
Why does the following give an error:
'->'(12, b)
? (The code 12 -> b works as expected.)
Because operators are "translated" to functions by the parser and both left and right assignment are parsed to the <- function. There is no right assignment function.
e <- quote(b <- 12)
as.list(e)
#[[1]]
#`<-`
#
#[[2]]
#b
#
#[[3]]
#[1] 12
e <- quote(12 -> b)
as.list(e)
#[[1]]
#`<-`
#
#[[2]]
#b
#
#[[3]]
#[1] 12
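A quick way to confirm this from the console, as a small sketch: look up both names as objects, and deparse the right-assignment expression to see what the parser produced.

```r
exists("<-")               # TRUE: the assignment function exists
exists("->")               # FALSE: there is no object named `->`

# The parser has already rewritten the right assignment
deparse(quote(12 -> b))    # "b <- 12"

# Calling the existing function by name works as expected
`<-`(b, 12)
b                          # 12
```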
Given a list of matrices with different number of columns:
set.seed(123)
a <- replicate(5, matrix(runif(25*30), ncol=25) , simplify=FALSE)
b <- replicate(5, matrix(runif(30*30), ncol=30) , simplify=FALSE)
list.of.matrices <- c(a,b)
How can I apply functional programming principles (i.e. using the purrr package) to operate on a specific range of columns (i.e. 8th row, and from 2nd to the end of columns)?
map(list.of.matrices[8, 2:ncol(list.of.matrices)], mean)
The above returns:
Error in 2:ncol(list.of.matrices) : argument of length 0
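The attempt fails because list.of.matrices is a list, so ncol(list.of.matrices) is NULL and 2:NULL has nothing to count to; the indexing has to happen inside each matrix. map_dbl applies a function to every element and makes sure the returned values are doubles. ~ and . are shorthand for defining the function to apply, with . standing for the current matrix.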
library(purrr)
map_dbl(list.of.matrices, ~mean(.[8, 2:ncol(.)]))
[1] 0.4377532 0.5118923 0.5082115 0.4749039 0.4608980 0.4108388 0.4832585 0.4394764 0.4975212 0.4580137
The base R equivalent is
sapply(list.of.matrices, function(x) mean(x[8, 2:ncol(x)]))
[1] 0.4377532 0.5118923 0.5082115 0.4749039 0.4608980 0.4108388 0.4832585 0.4394764 0.4975212 0.4580137
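If the type guarantee of map_dbl is wanted without purrr, vapply is a possible base R counterpart, since it checks that every result is a single number. The sketch below uses two small hand-made matrices (mats is a made-up name) so the values can be verified by hand, and it assumes every matrix has at least two columns, because 2:ncol(m) would count backwards otherwise.

```r
# Two small matrices with different numbers of columns
mats <- list(matrix(1:20, nrow = 4),   # 4 x 5
             matrix(1:12, nrow = 4))   # 4 x 3

# Mean of row 2, from column 2 to the last column of each matrix
res <- vapply(mats, function(m) mean(m[2, 2:ncol(m)]), numeric(1))
res   # 12 8
```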
A base R solution using the Map function:
Map(function(x){mean(x[8,2:ncol(x)])},list.of.matrices)
#[[1]]
#[1] 0.4377532
#[[2]]
#[1] 0.5118923
#[[3]]
#[1] 0.5082115
#[[4]]
#[1] 0.4749039
#[[5]]
#[1] 0.460898
#[[6]]
#[1] 0.4108388
#[[7]]
#[1] 0.4832585
#[[8]]
#[1] 0.4394764
#[[9]]
#[1] 0.4975212
#[[10]]
#[1] 0.4580137