Shortening a long vector in R


Vectors a and b can be shortened with toString(width = 10) in base R, which returns a truncated string ending in ....
However, how can I make the shortened string end in ..., followed by the last element of the vector?
My desired output is shown below.
a <- 1:26
b <- LETTERS
toString(a, width = 10)
# [1] "1, 2, ...."
desired_output1 = "1,2,...,26"
toString(b, width = 10)
# [1] "A, B, ...."
desired_output2 = "A,B,...,Z"

You could just add the end on.
paste(toString(a, width = 10), a[length(a)], sep=", ")
[1] "1, 2, ...., 26"
paste(toString(b, width = 10), b[length(b)], sep=", ")
[1] "A, B, ...., Z"

After applying toString, we can use sub to replace the middle of the string with ... and gsub to strip the whitespace:
f1 <- function(vec, n = 2) {
gsub("\\s+", "",
sub(sprintf("^(([^,]+, ){%s}).*, ([^,]+)$", n), "\\1...,\\3", toString(vec)))
}
Testing:
> f1(a)
[1] "1,2,...,26"
> f1(b)
[1] "A,B,...,Z"
> f1(a, 3)
[1] "1,2,3,...,26"
> f1(b, 3)
[1] "A,B,C,...,Z"
> f1(a, 4)
[1] "1,2,3,4,...,26"
> f1(b, 4)
[1] "A,B,C,D,...,Z"

We could do it this way: create a function that extracts the first two elements and the last element of the vector and pastes them together:
my_func <- function(x) {
a <- paste(x[1:2], collapse=",")
b <- tail(x, n=1)
paste0(a,",...,",b)
}
my_func(a)
[1] "1,2,...,26"
my_func(b)
[1] "A,B,...,Z"
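The function above hard-codes two leading elements and assumes the vector is long enough to abbreviate. A minimal sketch that generalizes it is shown below; the name shorten_vec and the arguments n_show and sep are hypothetical, not part of the original answers.

```r
# Hypothetical helper: keep the first n_show elements and the last one,
# joined by sep, with "..." standing in for everything in between.
shorten_vec <- function(x, n_show = 2, sep = ",") {
  # Vectors too short to abbreviate are returned in full.
  if (length(x) <= n_show + 1) {
    return(paste(x, collapse = sep))
  }
  paste(c(head(x, n_show), "...", tail(x, 1)), collapse = sep)
}

shorten_vec(1:26)        # "1,2,...,26"
shorten_vec(LETTERS, 3)  # "A,B,C,...,Z"
shorten_vec(1:3)         # "1,2,3"
```

Returning short vectors unchanged avoids output such as "1,2,...,2" when there is nothing to elide.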

library(stringr)
a <- 1:26
b <- LETTERS
reduce_string <- function(x, n_show) {
str_c(x[1:n_show], collapse = ',') %>%
str_c(',...,', x[[length(x)]])
}
reduce_string(a, 2)
#> [1] "1,2,...,26"
Created on 2022-01-02 by the reprex package (v2.0.1)

Related

Use Recursion in R to Split a String Into Chunks

I am trying to wrap my head around the idea of recursion. My goal is to split a long string into multiple chunks of smaller strings of size n. However, when I apply my recursive R function, it does not return the desired number of chunks; it returns only two chunks followed by an empty string. I am sure there are other ways to do this, but I am trying to find a recursive solution. Any help is appreciated. Thanks in advance.
# Sample dataset
x <- paste0(rep(letters, 10000), collapse = "")
split_group <- function(x, n = 10) {
if (nchar(x) < n) {
return(x)
} else {
beginning <- substring(x, 1, n)
remaining <- substring(x, (n + 1), (n + 1) + (n - 1))
c(beginning, split_group(remaining, n))
}
}
split_group(x = x, n = 10)
# Returns: "abcdefghij" "klmnopqrst" ""

1) Use <= instead of < and fix remaining so that it keeps the whole rest of the string rather than only the next n characters.
split_group <- function(x, n = 10) {
if (nchar(x) <= n) x
else {
beginning <- substring(x, 1, n)
remaining <- substring(x, n + 1)
c(beginning, split_group(remaining, n))
}
}
x <- substring(paste(letters, collapse = ""), 1, 24)
split_group(x, 2)
## [1] "ab" "cd" "ef" "gh" "ij" "kl" "mn" "op" "qr" "st" "uv" "wx"
split_group(x, 5)
## [1] "abcde" "fghij" "klmno" "pqrst" "uvwx"
split_group(x, 6)
## [1] "abcdef" "ghijkl" "mnopqr" "stuvwx"
split_group(x, 10)
## [1] "abcdefghij" "klmnopqrst" "uvwx"
split_group(x, 23)
## [1] "abcdefghijklmnopqrstuvw" "x"
split_group(x, 24)
## [1] "abcdefghijklmnopqrstuvwx"
split_group(x, 25)
## [1] "abcdefghijklmnopqrstuvwx"
2) Here are some approaches without recursion. The first (gsubfn) is the shortest, but the second is the simplest and uses only base R. The third also uses only base R, and the fourth uses zoo.
library(gsubfn)
strapply(x, "(.{1,10})", simplify = c)
## [1] "abcdefghij" "klmnopqrst" "uvwx"
ix <- seq(1, nchar(x), 10)
substring(x, ix, ix + 10 - 1)
## [1] "abcdefghij" "klmnopqrst" "uvwx"
sapply(seq(1, nchar(x), 10), function(i) substring(x, i, i + 10 - 1))
## [1] "abcdefghij" "klmnopqrst" "uvwx"
library(zoo)
s <- strsplit(x, "")[[1]]
rollapply(s, 10, by = 10, paste0, collapse = "", partial = TRUE, align = "left")
## [1] "abcdefghij" "klmnopqrst" "uvwx"
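One caveat with the recursive version: each chunk adds a level of call nesting, so a very long input (the question's string has 260,000 characters) may run into R's limit on nested evaluations. The sketch below mirrors the recursive logic with a loop; the name split_group_iter is hypothetical.

```r
# Hypothetical loop-based counterpart of the recursive split_group().
split_group_iter <- function(x, n = 10) {
  chunks <- character(0)
  # Peel off n characters at a time while more than n remain.
  while (nchar(x) > n) {
    chunks <- c(chunks, substring(x, 1, n))
    x <- substring(x, n + 1)
  }
  # Whatever is left (at most n characters) is the final chunk.
  c(chunks, x)
}

x24 <- substring(paste(letters, collapse = ""), 1, 24)
split_group_iter(x24, 10)
## [1] "abcdefghij" "klmnopqrst" "uvwx"
```

This only illustrates the recursion-to-loop translation; for large inputs the vectorised substring() call shown above remains the more efficient choice.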

A base R option would be
x1 <- strsplit(x, "(?<=.{10})(?=.)", perl = TRUE)[[1]]
Output:
> head(x1, 10)
[1] "abcdefghij" "klmnopqrst" "uvwxyzabcd" "efghijklmn" "opqrstuvwx" "yzabcdefgh" "ijklmnopqr" "stuvwxyzab" "cdefghijkl" "mnopqrstuv"

How to add dashed lines between elements of a list?

I have a list like this:
x <- 1
y <- 2
z <- "something"
my_list <- list("x" = x, "y" = y, "z" = z)
> my_list
$x
[1] 1
$y
[1] 2
$z
[1] "something"
In reality my list is very long and includes large text elements, so I cannot easily tell the elements apart in the output. Therefore I want to print a dashed line after every element of the list, like this:
$x
[1] 1
-------------------------------------
$y
[1] 2
-------------------------------------
$z
[1] "something"
-------------------------------------

Something like this could work.
mylistprint <- function(x){
nn <- names(x)
ll <- length(x)
if (length(nn) != ll) {
nn <- paste("Component", seq.int(ll))
}
for (i in seq_len(ll)) {
cat(nn[i], ":\n")
print(x[[i]])
cat("\n")
cat(strrep("-", 25))
cat("\n")
}
invisible(x)
}
mylistprint(my_list)
The output of this would be:
x :
[1] 1
-------------------------
y :
[1] 2
-------------------------
z :
[1] "something"
-------------------------

Using mapply
Probably a nicer way to do this is using mapply, or at least it is much shorter.
fun1 <- function(x,y) cat(paste0('$', x), y,strrep("-", 25), sep = '\n')
x <- mapply(fun1, names(my_list), my_list)
This prints:
$x
1
-------------------------
$y
2
-------------------------
$z
something
-------------------------
Single line
x <- mapply(function(x,y) cat(paste0('$', x), y,strrep("-", 25), sep = '\n'), names(my_list), my_list)
Wrap it in a function if you want
print.list <- function(list) {
x <- mapply(function(x,y) cat(paste0('$', x), y,strrep("-", 25), sep = '\n'), names(list), list)
}
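Note that cat() is meant for atomic vectors, so the mapply version may not cope with elements such as data frames or nested lists. A minimal sketch that formats each element with print() instead is shown below; the function name list_with_rules and its width argument are hypothetical.

```r
# Hypothetical helper: print every list element followed by a dashed rule.
list_with_rules <- function(x, width = 25) {
  nms <- names(x)
  # Fall back to positions when the list has no names.
  if (is.null(nms)) nms <- seq_along(x)
  rule <- strrep("-", width)
  lines <- unlist(lapply(seq_along(x), function(i) {
    # capture.output(print(...)) keeps R's usual formatting for any object.
    c(paste0("$", nms[i]), capture.output(print(x[[i]])), rule)
  }))
  cat(lines, sep = "\n")
  invisible(lines)
}

list_with_rules(list(x = 1, y = 2, z = "something"))
```

Returning the lines invisibly makes the result easy to write to a file or inspect, while the call itself still prints to the console.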

As in my comments, you could run a for loop that prints each element of the list followed by a line of dashes, and wrap it in a function. For example:
lsprint <- function(lst){
# use the function argument rather than the global my_list
for (i in seq_along(lst)){
print(names(lst)[i])
print(lst[[i]])
print('--------------------')
}
}
lsprint(my_list)
Returns,
[1] "x"
[1] 1
[1] "--------------------"
[1] "y"
[1] 2
[1] "--------------------"
[1] "z"
[1] "something"
[1] "--------------------"
Edit: Added so you get the name

How to convert numbers in Base 2 to Base 4 in R

For instance, how can I convert the number '10010000110000011000011111011000' from base 2 to base 4?

Here is one approach that breaks up the string into units of length 2 and then looks up the corresponding base 4 for the pair:
convert <- c("00"="0","01"="1","10"="2","11"="3")
from2to4 <- function(s){
if(nchar(s) %% 2 == 1) s <- paste0('0',s)
n <- nchar(s)
bigrams <- sapply(seq(1,n,2),function(i) substr(s,i,i+1))
digits <- convert[bigrams]
paste0(digits, collapse = "")
}

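A one-liner approach, where a holds the binary string (of even length):
> a <- '10010000110000011000011111011000'
> paste(as.numeric(factor(substring(a,seq(1,nchar(a),2),seq(2,nchar(a),2)),levels=c("00","01","10","11")))-1,collapse="")
[1] "2100300120133120"
The explicit levels keep the mapping correct even when one of the four pairs does not occur in the input. There are multiple ways to split the string into pairs of digits; see "Chopping a string into a vector of fixed width character elements".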

Here are a couple of functions that are inverses of each other:
bin_to_base4 <- function(x){
x <- strsplit(x, '')
vapply(x, function(bits){
bits <- as.integer(bits)
paste(2 * bits[c(TRUE, FALSE)] + bits[c(FALSE, TRUE)], collapse = '')
}, character(1))
}
base4_to_bin <- function(x){
x <- strsplit(x, '')
vapply(x, function(quats){
quats <- as.integer(quats)
paste0(quats %/% 2, quats %% 2, collapse = '')
}, character(1))
}
x <- '10010000110000011000011111011000'
bin_to_base4(x)
#> [1] "2100300120133120"
base4_to_bin(bin_to_base4(x))
#> [1] "10010000110000011000011111011000"
...and they're vectorized!
base4_to_bin(bin_to_base4(c(x, x)))
#> [1] "10010000110000011000011111011000" "10010000110000011000011111011000"
For actual use, it would be a good idea to put in some sanity checks to ensure the input is actually in the appropriate base.
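A minimal sketch of such sanity checks is shown below. The wrapper name bin_to_base4_checked is hypothetical, and left-padding odd-length inputs with a zero is an assumption about the desired behaviour, not something the answer specifies.

```r
# Hypothetical validated variant of bin_to_base4().
bin_to_base4_checked <- function(x) {
  stopifnot(is.character(x))
  # Reject anything that is not a non-empty string of 0s and 1s.
  bad <- !grepl("^[01]+$", x)
  if (any(bad)) {
    stop("not a binary string: ", paste(x[bad], collapse = ", "))
  }
  # Assumption: pad odd-length inputs on the left so the bits pair up.
  odd <- nchar(x) %% 2 == 1
  x[odd] <- paste0("0", x[odd])
  vapply(strsplit(x, ""), function(bits) {
    bits <- as.integer(bits)
    paste(2 * bits[c(TRUE, FALSE)] + bits[c(FALSE, TRUE)], collapse = "")
  }, character(1))
}

bin_to_base4_checked("10010000110000011000011111011000")
#> [1] "2100300120133120"
bin_to_base4_checked("110")
#> [1] "12"
```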

Convert Base2 to Base10 first, then from Base10 to Base4
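A minimal sketch of that two-step route follows; the helper names bin_to_dec and dec_to_base4 are hypothetical. It accumulates the value as a double rather than using strtoi, on the assumption that a 32-bit input such as the one in the question does not fit in R's integer type; doubles stay exact only up to 2^53, so longer inputs need a different approach.

```r
# Hypothetical helper: binary string -> numeric value (exact up to 2^53).
bin_to_dec <- function(s) {
  bits <- as.numeric(strsplit(s, "")[[1]])
  Reduce(function(acc, b) acc * 2 + b, bits, 0)
}

# Hypothetical helper: non-negative whole number -> base 4 string.
dec_to_base4 <- function(n) {
  if (n == 0) return("0")
  digits <- numeric(0)
  # Repeated division by 4; remainders are the digits, least significant first.
  while (n > 0) {
    digits <- c(n %% 4, digits)
    n <- n %/% 4
  }
  paste(digits, collapse = "")
}

dec_to_base4(bin_to_dec("10010000110000011000011111011000"))
#> [1] "2100300120133120"
```

Unlike the pairwise lookup, this route drops leading zeros, since they do not survive the conversion to a number.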

Combining elements in a string vector with defined element size and accounting for not even sizes

Given is vector:
vec <- c(LETTERS[1:10])
I would like to be able to combine it in a following manner:
resA <- c("AB", "CD", "EF", "GH", "IJ")
resB <- c("ABCDEF","GHIJ")
where elements of the vector vec are merged together according to the desired size of each element in the resulting vector: 2 in the case of resA and 6 in the case of resB.
Desired solution characteristics
The solution should allow for flexibility with respect to the element sizes, i.e. I may want to have vectors with elements of size 2 or 20.
There may not be enough elements in the vector to fill the last chunk; in that case the last element should be shortened accordingly (as shown).
This shouldn't make a difference, but the solution should work on words as well.
Attempts
Initially, I was thinking of using something on the lines:
c(
paste0(vec[1:2], collapse = ""),
paste0(vec[3:4], collapse = ""),
paste0(vec[5:6], collapse = "")
# ...
)
but this would have to be adapted to jump through the remaining pairs/bigger groups of the vec and handle last group which often would be of a smaller size.

Here is what I came up with. Using Harlan's idea in this question, you can split the vector in different number of chunks. You also want to use your paste0() idea in lapply() here. Finally, you unlist a list.
unlist(lapply(split(vec, ceiling(seq_along(vec)/2)), function(x){paste0(x, collapse = "")}))
# 1 2 3 4 5
#"AB" "CD" "EF" "GH" "IJ"
unlist(lapply(split(vec, ceiling(seq_along(vec)/5)), function(x){paste0(x, collapse = "")}))
# 1 2
#"ABCDE" "FGHIJ"
unlist(lapply(split(vec, ceiling(seq_along(vec)/3)), function(x){paste0(x, collapse = "")}))
# 1 2 3 4
#"ABC" "DEF" "GHI" "J"

vec <- c(LETTERS[1:10])
f1 <- function(x, n){
f <- function(x) paste0(x, collapse = '')
regmatches(f(x), gregexpr(f(rep('.', n)), f(x)))[[1]]
}
f1(vec, 2)
# [1] "AB" "CD" "EF" "GH" "IJ"
or
f2 <- function(x, n)
apply(matrix(x, nrow = n), 2, paste0, collapse = '')
f2(vec, 5)
# [1] "ABCDE" "FGHIJ"
or
f3 <- function(x, n) {
f <- function(x) paste0(x, collapse = '')
strsplit(gsub(sprintf('(%s)', f(rep('.', n))), '\\1 ', f(x)), '\\s+')[[1]]
}
f3(vec, 4)
# [1] "ABCD" "EFGH" "IJ"
I would say the last is the best of these, since for the others n must be a factor of length(x) or you will get warnings or recycling.
Edit: a few more options
f4 <- function(x, n) {
f <- function(x) paste0(x, collapse = '')
Vectorize(substring, USE.NAMES = FALSE)(f(x), which((seq_along(x) %% n) == 1),
which((seq_along(x) %% n) == 0))
}
f4(vec, 2)
# [1] "AB" "CD" "EF" "GH" "IJ"
or
f5 <- function(x, n)
mapply(function(x) paste0(x, collapse = ''),
split(x, c(0, head(cumsum(rep_len(sequence(n), length(x)) %in% n), -1))),
USE.NAMES = FALSE)
f5(vec, 4)
# [1] "ABCD" "EFGH" "IJ"

Here is another way, working with the original array.
A side note: working with words is not straightforward, since there are at least two ways to understand it. You can either keep each word intact or collapse the words first and work with individual characters. The next function can deal with both options.
vec <- c(LETTERS[1:10])
vec2 <- c("AB","CDE","F","GHIJ")
cuts <- function(x, n, bychar = FALSE) {
if (bychar) x <- unlist(strsplit(paste0(x, collapse=""), ""))
ii <- seq_along(x)
li <- split(ii, ceiling(ii/n))
return(sapply(li, function(y) paste0(x[y], collapse="")))
}
cuts(vec2, 2, FALSE)
# 1 2
# "ABCDE" "FGHIJ"
cuts(vec2, 2, TRUE)
# 1 2 3 4 5
# "AB" "CD" "EF" "GH" "IJ"

Reassign values in a list without looping

test <- list(a = list("first"= 1, "second" = 2),
b = list("first" = 3, "second" = 4))
In the list above, I would like to reassign the "first" elements to equal, let's say, five. This for loop works:
for(temp in c("a", "b")) {
test[[temp]]$first <- 5
}
Is there a way to do the same using a vectorized operation (lapply, etc)? The following extracts the values, but I can't get them reassigned:
lapply(test, "[[", "first")

Here is a vectorised one-liner using unlist and relist:
relist((function(x) ifelse(grepl("first",names(x)),5,x))(unlist(test)),test)
$a
$a$first
[1] 5
$a$second
[1] 2
$b
$b$first
[1] 5
$b$second
[1] 4

You can do it like this:
test <- lapply(test, function(x) {x$first <- 5; x})
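The same idea can be written with base R's modifyList(), which replaces or adds named elements of a list. This is only a sketch of an alternative, with res as a hypothetical result name; it assumes every sub-list should receive the same new value.

```r
test <- list(a = list("first" = 1, "second" = 2),
             b = list("first" = 3, "second" = 4))

# lapply passes list(first = 5) as the second argument of modifyList(),
# so "first" is overwritten in every sub-list and "second" is left alone.
res <- lapply(test, modifyList, list(first = 5))

str(res)
```

Several elements can be changed in one pass by listing them together, for example list(first = 5, second = 0).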
