expand numbers with text separated by hyphen R - r

Following the question here
How to expand a given range of numbers to include all numbers separated by a dash
It worked well with "text+ number range", like in the example "Ballroom 1-3"
The code is this
expand.dash <- function(dashed) {
limits <- as.numeric(unlist(strsplit(dashed, '-')))
seq(limits[1], limits[2])
}
expand.ballrooms <- function(txt) {
str <- gsub('\\d+\\s?-\\s?\\d+', '%d', txt)
dashed_str <- gsub('[a-zA-Z ]+', '', txt)
sprintf(str, expand.dash(dashed_str))
}
>expand.ballrooms("Ballroom 1-3")
#[1] "Ballroom 1" "Ballroom 2" "Ballroom 3"
>expand.ballrooms("Ballroom 1 - 3")
#[1] "Ballroom 1" "Ballroom 2" "Ballroom 3"
But if the input is "text+number - text+number", for example "Ballroom1 - Ballroom3",
the original code doesn't work and returns
[1] "Ballroom1 - Ballroom3" "Ballroom1 - Ballroom3" "Ballroom1 - Ballroom3"
I reckon I would need to change this line
str <- gsub('\\d+\\s?-\\s?\\d+', '%d', txt)
But I can't really figure out how to make it work.
The result should still be "Ballroom 1" "Ballroom 2" "Ballroom 3".
Thank you!

Here is a base R way.
x <- "Ballroom1 - Ballroom3"
y <- gsub("[^[:digit:]\\-]", "", x)
expand.ballrooms(paste("Ballrooms", y))
#[1] "Ballrooms 1" "Ballrooms 2" "Ballrooms 3"

This should cover just about everything:
library(stringr)  # provides str_match()
expand.ballroom <- function(x) {
m <- str_match( x, "Ballroom *(\\d+) *- *(?:Ballroom)? *(\\d+)" )
return( paste( "Ballroom", m[,2]:m[,3] ) )
}
expand.ballroom( "Ballroom1 - Ballroom3" )
expand.ballroom( "Ballroom 1 - Ballroom 3" )
expand.ballroom( "Ballroom 1-3" )
expand.ballroom( "Ballroom1-3" )
expand.ballroom( "Ballroom 1 - 3" )
It produces:
> expand.ballroom( "Ballroom1 - Ballroom3" )
[1] "Ballroom 1" "Ballroom 2" "Ballroom 3"
> expand.ballroom( "Ballroom 1 - Ballroom 3" )
[1] "Ballroom 1" "Ballroom 2" "Ballroom 3"
> expand.ballroom( "Ballroom 1-3" )
[1] "Ballroom 1" "Ballroom 2" "Ballroom 3"
> expand.ballroom( "Ballroom1-3" )
[1] "Ballroom 1" "Ballroom 2" "Ballroom 3"
> expand.ballroom( "Ballroom 1 - 3" )
[1] "Ballroom 1" "Ballroom 2" "Ballroom 3"

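This extracts the first word (e.g. Ballroom) and the digits from the string. Because the format string is built as "Ballroom%d", the results have no space between the word and the number; paste " %d" instead if the spaced form "Ballroom 1" is needed.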
expand.dash <- function(dashed) {
limits <- as.numeric(unlist(strsplit(dashed, '-')))
seq(limits[1], limits[2])
}
expand.ballrooms <- function(txt) {
str <- paste0(gsub("([A-Za-z]+).*", "\\1", txt), "%d")
dashed_str <- gsub('[a-zA-Z ]+', '', txt)
sprintf(str, expand.dash(dashed_str))
}
> expand.ballrooms("Ballroom 1-3")
[1] "Ballroom1" "Ballroom2" "Ballroom3"
> expand.ballrooms("Ballroom 1 - 3")
[1] "Ballroom1" "Ballroom2" "Ballroom3"
> expand.ballrooms("Ballroom1 - Ballroom3")
[1] "Ballroom1" "Ballroom2" "Ballroom3"

Base R solution:
test1 <- "Ballroom 1 - 3"
test2 <- "Ballroom1 - Ballroom3"
expand.ballrooms <- function(string_to_expand){
y <- unlist(strsplit(string_to_expand, "\\s*\\-\\s*"))
prefix <- Filter(function(z){z!=""}, trimws(gsub("(\\d+)|\\-", "", y)))
nos <- trimws(gsub("\\D+", "", y))
res <- paste(prefix, eval(parse(text=paste0(nos, collapse = ":"))))
return(res)
}
expand.ballrooms(test1)
expand.ballrooms(test2)
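As a further base R variation on the answers above, here is a minimal sketch that pulls every run of digits out of the string with regmatches() and so avoids eval(parse()). The name expand_range is made up for illustration, and the sketch assumes the input starts with a single alphabetic word and contains exactly two numbers.

```r
# Hypothetical helper: expand "<word><n1> - [<word>]<n2>" into "<word> n1" ... "<word> n2".
expand_range <- function(txt) {
  # All runs of digits in the string, e.g. c("1", "3")
  nums <- as.integer(regmatches(txt, gregexpr("[0-9]+", txt))[[1]])
  # The leading alphabetic word, e.g. "Ballroom"
  prefix <- sub("^([A-Za-z]+).*$", "\\1", txt)
  paste(prefix, seq(nums[1], nums[2]))
}

expand_range("Ballroom1 - Ballroom3")
# [1] "Ballroom 1" "Ballroom 2" "Ballroom 3"
expand_range("Ballroom 1-3")
# [1] "Ballroom 1" "Ballroom 2" "Ballroom 3"
```

Taking the numbers and the prefix separately means the spacing around the dash and the repeated word do not matter.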


For-loop index iteration over date vector

I know this is probably a stupid question but why do I need to start from 2 instead of one to get this list to loop properly?
date <- seq(as.Date("2021-05-01"), as.Date("2021-10-01"), by="months")
num <- length(date)
for(i in 2:num-1){
print(paste0("i = ",i))
j = i+1
sd <- date[i]
ed <- date[i+1]
print(paste0("start: ",sd))
print(paste0("end: ",ed))
}
Output:
[1] "i = 1"
[1] "start: 2021-05-01"
[1] "end: 2021-06-01"
[1] "i = 2"
[1] "start: 2021-06-01"
[1] "end: 2021-07-01"
[1] "i = 3"
[1] "start: 2021-07-01"
[1] "end: 2021-08-01"
[1] "i = 4"
[1] "start: 2021-08-01"
[1] "end: 2021-09-01"
[1] "i = 5"
[1] "start: 2021-09-01"
[1] "end: 2021-10-01"
But when I start with: for(i in 1:num-1)
It doesn't find the first indexed item properly:
[1] "i = 0"
[1] "start: "
[1] "end: 2021-05-01"
[1] "i = 1"
[1] "start: 2021-05-01"
[1] "end: 2021-06-01"
[1] "i = 2"
[1] "start: 2021-06-01"
[1] "end: 2021-07-01"
[1] "i = 3"
[1] "start: 2021-07-01"
[1] "end: 2021-08-01"
[1] "i = 4"
[1] "start: 2021-08-01"
[1] "end: 2021-09-01"
[1] "i = 5"
[1] "start: 2021-09-01"
[1] "end: 2021-10-01"
OK, thanks to Andre Wildberg in the comments: I didn't realize there was an implied order of operations within the counter. The : operator is evaluated before the subtraction, so 1:num-1 means (1:num)-1, which starts at 0. What worked was using parentheses to make the desired order of operations explicit:
1:(num-1)
instead of
1:num-1
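The precedence issue can be checked directly in the console. The snippet below is a small sketch using num = 6 as an arbitrary example value; it also shows seq_len(), which is a common way to build such a counter without having to think about precedence.

```r
num <- 6  # arbitrary example length

a <- 1:num - 1        # parsed as (1:num) - 1
b <- 1:(num - 1)      # the intended counter
s <- seq_len(num - 1) # same values as b

a  # [1] 0 1 2 3 4 5
b  # [1] 1 2 3 4 5
s  # [1] 1 2 3 4 5

# seq_len() also copes with the empty case, where 1:0 would count down
length(seq_len(0))  # 0
length(1:0)         # 2
```

The same reasoning explains why 2:num-1 happened to work: it is (2:num)-1, which is 1 to num-1.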

string manipulation in matrix in R

I have a matrix like so
A = matrix(
c("2 (1-3)", "4 (2-6)", "3 (2-4)", "1 (0.5-1.5)", "5 (2.5-7.5)", "7 (5-9)"),
nrow=3,
ncol=2)
I want to replace all strings where the first number is less than 5 (i.e. "0", "1", "2", "3" or "4") with "< 5". It should be:
B = matrix(
c("< 5", "< 5", "< 5", "< 5", "5 (2.5-7.5)", "7 (5-9)"),
nrow=3,
ncol=2)
Any ideas?
Extract the first number, convert it to numeric, and replace the elements whose first number is less than 5 with "< 5".
A[as.numeric(sub('(\\d+).*', '\\1', A)) < 5] <- '< 5'
A
# [,1] [,2]
#[1,] "< 5" "< 5"
#[2,] "< 5" "5 (2.5-7.5)"
#[3,] "< 5" "7 (5-9)"
A shortcut to extract the first number and to convert it to numeric is using readr::parse_number.
A[readr::parse_number(A) < 5] <- '< 5'
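Use substr() to extract the first character of each matrix element. As long as that character is a digit, you can convert it to a number via as.numeric(). Note that this only looks at one character, so it assumes the leading number is a single digit.
A[as.numeric(substr(A,1,1))<5] <- "< 5"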
We don't need to extract and convert to numeric if there are only 5 options:
ie "0" or "1" or "2" or "3" or "4"
A[grep("^[0-4]", A)] <- "< 5"
Or
replace(A, grep("^[0-4]", A), "< 5")
Or
replace(A, grepl("^[0-4]", A), "< 5")
Result
# [,1] [,2]
# [1,] "< 5" "< 5"
# [2,] "< 5" "5 (2.5-7.5)"
# [3,] "< 5" "7 (5-9)"
1) read.table
Use read.table to get the first number in each cell giving vector firstNo. Then use replace to replace those cells with < 5.
The original input A is preserved which is generally desirable to make it easier to test and debug but if you prefer to overwrite it anyways then replace the left hand side of the second line of code with A.
No regular expressions and no packages are used.
firstNo <- read.table(text = A)[[1]]
B <- replace(A, firstNo < 5, "< 5")
B
giving:
[,1] [,2]
[1,] "< 5" "< 5"
[2,] "< 5" "5 (2.5-7.5)"
[3,] "< 5" "7 (5-9)"
Although not needed for the sample input in the question, if it is possible that the text after the left parenthesis is irregular then you might need to add the fill=TRUE or comment.char = "(" arguments to read.table.
2) gsubfn
gsubfn is like gsub except it inputs the capture groups in the regular expression, i.e. the parenthesized portions of the regular expression, into the function expressed in formula notation in the second argument and then replaces the match with the output of the function.
library(gsubfn)
B <- replace(A,
TRUE,
gsubfn("^(\\d) (.*)", ~ if (as.numeric(x) < 5) "< 5" else paste(x, y), A)
)
B
giving:
[,1] [,2]
[1,] "< 5" "< 5"
[2,] "< 5" "5 (2.5-7.5)"
[3,] "< 5" "7 (5-9)"
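Several of the answers above (substr(), "^[0-4]", and the "^(\\d) " pattern) rely on the leading number being a single digit. The following sketch, in base R, relaxes that by capturing the whole leading number, including an optional decimal part. The example matrix is invented for illustration and is not the matrix from the question.

```r
# Invented example with a two-digit and a decimal leading number
A2 <- matrix(
  c("2 (1-3)", "12 (10-14)", "0.5 (0.2-0.8)", "5 (2.5-7.5)"),
  nrow = 2)

# Capture the leading number (digits, optionally followed by .digits)
first_num <- as.numeric(sub("^([0-9]+(\\.[0-9]+)?).*$", "\\1", A2))

# replace() keeps the matrix dimensions of A2
B2 <- replace(A2, first_num < 5, "< 5")
B2
#      [,1]         [,2]
# [1,] "< 5"        "< 5"
# [2,] "12 (10-14)" "5 (2.5-7.5)"
```

With a single-character test, "12 (10-14)" would have been treated as 1 and wrongly replaced.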

Convert a dataframe to large character

I have a dataframe but need to convert it to a large character. Here is an example of the dataframe structure:
texts <- c("TEXT 1", "TEXT 2", "TEXT 3")
data <- data.frame(texts)
I need this structure:
[1] "TEXT 1" "TEXT 2" "TEXT 3"
I already tried using the as.character() function, but it does not work, as it collapses all the rows into a single string.
You can transpose and concatenate, i.e.
c(t(data))
#[1] "TEXT 1" "TEXT 2" "TEXT 3"
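For a data frame with a single column, as in the question, the column can also be pulled out directly. This is a minimal sketch of three equivalent base R options; stringsAsFactors = FALSE is set explicitly here as an assumption, so that the column is character rather than factor on older R versions.

```r
texts <- c("TEXT 1", "TEXT 2", "TEXT 3")
data <- data.frame(texts, stringsAsFactors = FALSE)

v1 <- data$texts                       # extract the column by name
v2 <- data[["texts"]]                  # same, with list-style indexing
v3 <- unlist(data, use.names = FALSE)  # flatten all columns into one vector

v1
# [1] "TEXT 1" "TEXT 2" "TEXT 3"
```

as.character() applied to the whole data frame works column by column, which is why it produced one string for the single column instead of one string per row.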

concatenate two lists of string in r

Here is my sample:
a = c("a","b","c")
b = c("1","2","3")
I need to concatenate a and b automatically. The result should be "a 1","a 2","a 3","b 1","b 2","b 3","c 1","c 2","c 3".
For now, I am using the paste function:
paste(a[1],b[1])
I need an automatic way to do this. Besides writing a loop, is there any easier way to achieve this?
c(outer(a, b, paste))
# [1] "a 1" "b 1" "c 1" "a 2" "b 2" "c 2" "a 3" "b 3" "c 3"
Other options are:
paste(rep(a, each = length(b)), b)
or:
with(expand.grid(b,a),paste(Var2,Var1))
You can do:
c(sapply(a, function(x) {paste(x,b)}))
[1] "a 1" "a 2" "a 3" "b 1" "b 2" "b 3" "c 1" "c 2" "c 3"
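The answers above produce the same nine strings but not always in the same order: outer() fills its result column by column, so flattening it varies a fastest. If the order from the question ("a 1", "a 2", "a 3", "b 1", ...) matters, one option is to transpose before flattening. This is a small sketch of that idea, not a claim about which answer is preferable.

```r
a <- c("a", "b", "c")
b <- c("1", "2", "3")

m <- outer(a, b, paste)  # m[i, j] is paste(a[i], b[j])

by_col <- c(m)     # column-major: a varies fastest
by_row <- c(t(m))  # row-major: b varies fastest, as in the question

by_col
# [1] "a 1" "b 1" "c 1" "a 2" "b 2" "c 2" "a 3" "b 3" "c 3"
by_row
# [1] "a 1" "a 2" "a 3" "b 1" "b 2" "b 3" "c 1" "c 2" "c 3"
```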

Filling matrix with array coordinates in R

I am trying to fill a matrix so that each element will be a string consisting of its coordinates (row, column).
i.e.
[ '1,1' '1,2' '1,3' ]
[ '2,1' '2,2' '2,3' ]
[ '3,1' '3,2' '3,3' ]
I have been able to do this with a square matrix but it is not robust if I vary the number of rows or columns.
This is what I have so far
#Works but only with a square matrix
x <- 20 #Number of rows
y <- 20 #Number of columns
samp <- 200 #Number of frames to sample
grid = matrix(data = NA,nrow = x,ncol = y)
for (iter_col in 1:y){
for (iter_row in 1:x){
grid[iter_col,iter_row] = paste(toString(iter_row),toString(iter_col),sep = ',')
}
}
I am using this to randomly sample a grid which I superimpose on images for a cell counting method. So I do not have any data yet. Not all of these grids will have equal numbers of rows and columns.
Can you help me make this more flexible? My background in R is a little lacking so the solution may be right in front of me...
Thanks!
Edit
My variables in grid[iter_col,iter_row] were in the wrong order. Once they were switched it works for matrices of varying dimensions.
Thanks G5W for catching that error.
Here's one way using sapply
rows = 4
columns = 5
sapply(1:columns, function(i) sapply(1:rows, function(j) paste(j,i,sep = ", ")))
# [,1] [,2] [,3] [,4] [,5]
#[1,] "1, 1" "1, 2" "1, 3" "1, 4" "1, 5"
#[2,] "2, 1" "2, 2" "2, 3" "2, 4" "2, 5"
#[3,] "3, 1" "3, 2" "3, 3" "3, 4" "3, 5"
#[4,] "4, 1" "4, 2" "4, 3" "4, 4" "4, 5"
I suspect this would be much faster:
matrix(paste0(rep(seq_len(rows), times=columns), ", ", rep(seq_len(columns), each=rows)), nrow = rows, ncol = columns)
[,1] [,2] [,3] [,4] [,5]
[1,] "1, 1" "1, 2" "1, 3" "1, 4" "1, 5"
[2,] "2, 1" "2, 2" "2, 3" "2, 4" "2, 5"
[3,] "3, 1" "3, 2" "3, 3" "3, 4" "3, 5"
[4,] "4, 1" "4, 2" "4, 3" "4, 4" "4, 5"
Or using col and row (as mentioned in the comments by @rawr)
grid[] <- paste0(row(grid), ", ", col(grid))
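The row()/col() approach can be wrapped so that it works for any number of rows and columns. The sketch below is one way to do that; the function name make_coord_grid and its argument names are invented for illustration, and it uses the "row,column" format without a space, as in the question.

```r
# Hypothetical helper: an n_rows x n_cols character matrix of "row,col" labels
make_coord_grid <- function(n_rows, n_cols) {
  grid <- matrix(NA_character_, nrow = n_rows, ncol = n_cols)
  # row(grid) and col(grid) give each cell's row and column index;
  # assigning into grid[] keeps the matrix dimensions
  grid[] <- paste(row(grid), col(grid), sep = ",")
  grid
}

g <- make_coord_grid(3, 5)
dim(g)   # [1] 3 5
g[2, 4]  # [1] "2,4"

# Randomly pick 4 cells from the grid, as in the sampling use case
picked <- sample(g, size = 4)
```

Because the labels are computed from the matrix itself, there is no loop index to get in the wrong order.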
