String manipulation in a matrix in R

I have a matrix like so
A = matrix(
c("2 (1-3)", "4 (2-6)", "3 (2-4)", "1 (0.5-1.5)", "5 (2.5-7.5)", "7 (5-9)"),
nrow=3,
ncol=2)
I want to replace all strings where the first number is less than 5 (i.e. "0", "1", "2", "3" or "4") with "< 5". The result should be:
B = matrix(
c("< 5", "< 5", "< 5", "< 5", "5 (2.5-7.5)", "7 (5-9)"),
nrow=3,
ncol=2)
Any ideas?

Extract the first number, convert it to numeric, and replace the elements whose number is less than 5 with "< 5".
A[as.numeric(sub('(\\d+).*', '\\1', A)) < 5] <- '< 5'
A
# [,1] [,2]
#[1,] "< 5" "< 5"
#[2,] "< 5" "5 (2.5-7.5)"
#[3,] "< 5" "7 (5-9)"
A shortcut to extract the first number and to convert it to numeric is using readr::parse_number.
A[readr::parse_number(A) < 5] <- '< 5'

Use substr() to extract the first character of each matrix element. As long as that character is a digit, you can convert it to a number via as.numeric():
A[as.numeric(substr(A, 1, 1)) < 5] <- "< 5"

We don't need to extract and convert to numeric if there are only five options for the first character, i.e. "0", "1", "2", "3" or "4":
A[grep("^[0-4]", A)] <- "< 5"
Or
replace(A, grep("^[0-4]", A), "< 5")
Or
# startsWith() takes a literal prefix, not a regex, so use grepl() for a logical index
replace(A, grepl("^[0-4]", A), "< 5")
Result
# [,1] [,2]
# [1,] "< 5" "< 5"
# [2,] "< 5" "5 (2.5-7.5)"
# [3,] "< 5" "7 (5-9)"
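One caveat, sketched under the assumption that the data may also contain leading numbers of 10 or more (the sample matrix does not): a test on the first character alone also matches an entry such as "12 (10-14)". The matrix A2 below is made up for illustration and is not from the question. Comparing the whole leading number avoids the problem.

```r
# Hypothetical data (not from the question) with multi-digit leading numbers
A2 <- matrix(c("2 (1-3)", "12 (10-14)", "45 (40-50)", "5 (2.5-7.5)"), nrow = 2)

# First-character test: also flags "12 ..." and "45 ..."
first_char_hit <- grepl("^[0-4]", A2)

# Compare the whole leading number (integer or decimal) instead
first_num <- as.numeric(sub("^(\\d+(\\.\\d+)?).*", "\\1", A2))
numeric_hit <- first_num < 5

# Only "2 (1-3)" is replaced
B2 <- replace(A2, numeric_hit, "< 5")
B2
```

If every leading number is known to be a single digit, the shorter first-character versions above are sufficient.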

1) read.table
Use read.table to get the first number in each cell giving vector firstNo. Then use replace to replace those cells with < 5.
The original input A is preserved, which is generally desirable because it makes testing and debugging easier. If you prefer to overwrite it anyway, replace the left-hand side of the second line of code with A.
No regular expressions and no packages are used.
firstNo <- read.table(text = A)[[1]]
B <- replace(A, firstNo < 5, "< 5")
B
giving:
[,1] [,2]
[1,] "< 5" "< 5"
[2,] "< 5" "5 (2.5-7.5)"
[3,] "< 5" "7 (5-9)"
Although not needed for the sample input in the question, if it is possible that the text after the left parenthesis is irregular then you might need to add the fill=TRUE or comment.char = "(" arguments to read.table.
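As a rough illustration of the comment.char option mentioned above, the sketch below uses a made-up matrix A3 (not from the question) whose cells have a varying number of whitespace-separated fields. Treating "(" as a comment character makes read.table ignore everything from the parenthesis onward.

```r
# Hypothetical input (not from the question): irregular text after "("
A3 <- matrix(c("2 (1-3)", "4 (2 - 6)", "3 (2-4) approx", "7 (5-9)"), nrow = 2)

# Everything from "(" to the end of the line is dropped,
# so only the leading number of each cell is read
firstNo <- read.table(text = A3, comment.char = "(")[[1]]

B3 <- replace(A3, firstNo < 5, "< 5")
B3
```

Without comment.char = "(" the rows of this input would have different numbers of fields, which is the situation the fill = TRUE argument is meant for.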
2) gsubfn
gsubfn is like gsub except it inputs the capture groups in the regular expression, i.e. the parenthesized portions of the regular expression, into the function expressed in formula notation in the second argument and then replaces the match with the output of the function.
library(gsubfn)
B <- replace(A,
TRUE,
gsubfn("^(\\d) (.*)", ~ if (as.numeric(x) < 5) "< 5" else paste(x, y), A)
)
B
giving:
[,1] [,2]
[1,] "< 5" "< 5"
[2,] "< 5" "5 (2.5-7.5)"
[3,] "< 5" "7 (5-9)"

Related

how to remove non alphabetic characters and columns from an csv file

I have a csv file that looks like this:
And in some portions the data in the columns is like this:
As you can see, because the "=" sign is present it wants to convert the entry into a formula, but what I need is the word, in this case "rama...
I extracted this term from a spam file and converted the data into a sparse matrix with R. So my question is: how can I get rid of the non-alphanumeric characters in this header in R, and then convert the result into a csv file again?
Thanks
If you want a literal answer, you could try using gsub to replace any entry having one or more non alphanumeric characters:
df <- data.frame(v1=c(1,2,3), v2=c("#NAME?", "two", "#NAME?"),
stringsAsFactors=FALSE)
df <- data.frame(sapply(df, function(x) gsub(".*[^A-Za-z0-9].*", "", x)))
df
  v1  v2
1  1
2  2 two
3  3
But the best/easiest thing to do here is probably to just fix your Excel formulas such that you catch these errors, and just display empty string, or some other sensible message. From what I can see, this is basically an Excel, not R, problem.
You can use gsub for that:
## A dummy matrix
example <- matrix(paste0("=", letters[1:9]),3,3)
# [,1] [,2] [,3]
#[1,] "=a" "=d" "=g"
#[2,] "=b" "=e" "=h"
#[3,] "=c" "=f" "=i"
You can remove the "=" by replacing it by "" in gsub
## Replacing the "=" by "" (nothing)
gsub("=", "", example)
# [,1] [,2] [,3]
#[1,] "a" "d" "g"
#[2,] "b" "e" "h"
#[3,] "c" "f" "i"
Or only in the first row (or in the column names, etc.)
## Removing the "=" in the first row only, keeping the rest of the matrix
example[1, ] <- gsub("=", "", example[1, ])
# [,1] [,2] [,3]
#[1,] "a" "d" "g"
#[2,] "=b" "=e" "=h"
#[3,] "=c" "=f" "=i"
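Since the question asks about the header specifically, the same gsub idea can be applied to the column names before writing the file back out. This is only a sketch: the data frame and its column names are made up, and tempfile() stands in for a real output path.

```r
# Hypothetical data frame whose column names carry stray symbols
df <- data.frame(a = 1:2, b = 3:4)
names(df) <- c("=word", "+free!")

# Drop every non-alphanumeric character from the header
names(df) <- gsub("[^[:alnum:]]", "", names(df))

# Write the cleaned table to csv; tempfile() is a placeholder path
out_file <- tempfile(fileext = ".csv")
write.csv(df, out_file, row.names = FALSE)

# Read it back to check the header
names(read.csv(out_file))
```

The character class [^[:alnum:]] matches anything that is not a letter or digit, so it also removes spaces and underscores. Widen the class if some of those should be kept.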

R substring based on Regular Expression

I have strings like:
myString = "2 word1 & 4 word2"
myString = "4 word2"
myString = "2 word1"
I would like to get the number before word1 and the number before word2:
number1 = 2
number2 = 4
How can I do this with a regular expression in R?
I tried something like this, but it only gets the first number:
gsub("([0-9]+).*", "\\1", myString)
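You may extract a specific number before a specific string using a regex with a lookahead (str_extract_all comes from the stringr package, and myString is taken to be a character vector holding the three strings):
> library(stringr)
> myString <- c("2 word1 & 4 word2", "4 word2", "2 word1")
> word1_res <- str_extract_all(myString, "\\d+(?=\\s*word1)")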
> word1_res
[[1]]
[1] "2"
[[2]]
character(0)
[[3]]
[1] "2"
The results for word2 can be retrieved similarly:
word2_res <- str_extract_all(myString, "\\d+(?=\\s*word2)")
Details
\d+ - 1 or more digits...
(?=\\s*word2) - if immediately followed with:
\s* - 0+ whitespaces
word2 - a literal word2 substring.
A base R equivalent is
regmatches(myString, gregexpr("\\d+(?=\\s*word1)", myString, perl=TRUE))
regmatches(myString, gregexpr("\\d+(?=\\s*word2)", myString, perl=TRUE))
An almost equivalent solution with sub would be
> sub(".*?(\\d+)\\s*word1.*|.*","\\1",myString)
[1] "2" "" "2"
> sub(".*?(\\d+)\\s*word2.*|.*","\\1",myString)
[1] "4" "4" ""
Note that this implies there is only one result per string, while str_extract_all will get all occurrences from the string.
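If one numeric value per string is what is wanted, the lookahead pattern can be wrapped in a small base R helper that returns NA where the word is absent. The function name num_before is made up for this sketch, and it assumes at most one match per string is of interest.

```r
myString <- c("2 word1 & 4 word2", "4 word2", "2 word1")

# Hypothetical helper: first number directly before `word`, NA if there is none
num_before <- function(x, word) {
  m <- regexpr(paste0("\\d+(?=\\s*", word, ")"), x, perl = TRUE)
  out <- rep(NA_real_, length(x))
  # regmatches() returns only the strings that matched, in order
  out[m > 0] <- as.numeric(regmatches(x, m))
  out
}

number1 <- num_before(myString, "word1")
number2 <- num_before(myString, "word2")
```

Here number1 is 2, NA, 2 and number2 is 4, 4, NA, which lines up with the empty strings in the sub output above.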
To extract any chunk of 1+ digits as a whole word, use a stringr solution with str_extract_all:
library(stringr)
str_extract_all(myString, "\\b\\d+\\b")
or a base R one with regmatches/gregexpr:
myString <- c("2 word1 & 4 word2", "4 word2", "2 word1")
regmatches(myString, gregexpr("\\b\\d+\\b", myString))
Output:
[[1]]
[1] "2" "4"
[[2]]
[1] "4"
[[3]]
[1] "2"
Details
\b - a word boundary
\d+ - 1 or more digits
\b - a word boundary.
Try:
myString = "2 word1 & 4 word2"
number1 = gsub("([0-9]+).*", "\\1", myString)
myString = "4 word2"
number2 = gsub("([0-9]+).*", "\\1", myString)
myString = "2 word1"
number3 = gsub("([0-9]+).*", "\\1", myString)
print(number1)
print(number2)
print(number3)
If you assign a string to myString three times, myString will only contain the last one.
This removes each occurrence of a letter or ampersand possibly followed by other non-space characters and then scans in what is left. The scan also converts them to numeric. No packages are used.
myString <- c("2 word1 & 4 word2", "4 word2", "2 word1")
lapply(myString, function(x) scan(text = gsub("[[:alpha:]&]\\S*", "", x), quiet = TRUE))
giving:
[[1]]
[1] 2 4
[[2]]
[1] 4
[[3]]
[1] 2

Filling matrix with array coordinates in R

I am trying to fill a matrix so that each element will be a string consisting of its coordinates (row, column).
i.e.
[ '1,1' '1,2' '1,3' ]
[ '2,1' '2,2' '2,3' ]
[ '3,1' '3,2' '3,3' ]
I have been able to do this with a square matrix but it is not robust if I vary the number of rows or columns.
This is what I have so far
#Works but only with a square matrix
x <- 20 #Number of rows
y <- 20 #Number of columns
samp <- 200 #Number of frames to sample
grid = matrix(data = NA,nrow = x,ncol = y)
for (iter_col in 1:y){
for (iter_row in 1:x){
grid[iter_col,iter_row] = paste(toString(iter_row),toString(iter_col),sep = ',')
}
}
I am using this to randomly sample a grid which I superimpose on images for a cell counting method. So I do not have any data yet. Not all of these grids will have equal numbers of rows and columns.
Can you help me make this more flexible? My background in R is a little lacking, so the solution may be right in front of me...
Thanks!
Edit
My variables in grid[iter_col,iter_row] were in the wrong order. Once they were switched it works for matrices of varying dimensions.
Thanks G5W for catching that error.
Here's one way using sapply
rows = 4
columns = 5
sapply(1:columns, function(i) sapply(1:rows, function(j) paste(j,i,sep = ", ")))
# [,1] [,2] [,3] [,4] [,5]
#[1,] "1, 1" "1, 2" "1, 3" "1, 4" "1, 5"
#[2,] "2, 1" "2, 2" "2, 3" "2, 4" "2, 5"
#[3,] "3, 1" "3, 2" "3, 3" "3, 4" "3, 5"
#[4,] "4, 1" "4, 2" "4, 3" "4, 4" "4, 5"
I suspect this would be much faster (output shown for x = 4 and y = 5):
matrix(paste0(rep(seq_len(x), times=y), ", ", rep(seq_len(y), each=x)), nrow = x, ncol = y)
[,1] [,2] [,3] [,4] [,5]
[1,] "1, 1" "1, 2" "1, 3" "1, 4" "1, 5"
[2,] "2, 1" "2, 2" "2, 3" "2, 4" "2, 5"
[3,] "3, 1" "3, 2" "3, 3" "3, 4" "3, 5"
[4,] "4, 1" "4, 2" "4, 3" "4, 4" "4, 5"
Or using col and row (as mentioned in the comments by @rawr)
grid[] <- paste0(row(grid), ", ", col(grid))
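Another base R option, offered as a sketch and not as a replacement for the answers above, is outer(), which builds the matrix directly from the row and column indices without needing a pre-allocated grid. The dimensions below are arbitrary and deliberately non-square.

```r
x <- 3  # number of rows
y <- 5  # number of columns

# outer() calls paste() on every row/column index pair and
# returns an x-by-y character matrix of "row,column" labels
grid <- outer(seq_len(x), seq_len(y), paste, sep = ",")
grid
```

Because the dimensions come from seq_len(x) and seq_len(y), rows and columns cannot be swapped by accident the way they can inside a nested loop.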

Merge two lists into single list containing single character vectors R

Okay I'm stumped, I know there are answers about merging lists, and my attempt builds on those answers, but they don't return a single char vector. I have a function that merges lists but the values are separate character vectors:
I don't want the characters as separate strings:
csc.list <- mapply(c, rep("CSC", 16), c(1:16), SIMPLIFY=FALSE)
$CSC
[1] "CSC" "1"
$CSC
[1] "CSC" "2"
...
I don't know how to combine the characters in rows with a weird heading:
csc.list <- mapply(unlist, c(mapply(c, rep("CSC", 16), c(1:16), SIMPLIFY=FALSE)))
CSC CSC CSC CSC CSC ...
[1,] "CSC" "CSC" "CSC" "CSC" "CSC" ...
[2,] "1" "2" "3" "4" "5" ...
Desired Result of two merged lists
c("CSC 1", "CSC 2", "CSC 3", "CSC 4", "CSC 5", ... , "CSC 16")
[1] "CSC 1" "CSC 2" "CSC 3" "CSC 4" "CSC 5" ... "CSC 16"
Bonus if your answer scales to merging more than two, i.e. n lists into single vector of merged characters:
csc.list <- mapply(c, rep("CSC", 16), c(1:16), rep(".R", 16), SIMPLIFY=FALSE)
lalalala <- f(csc.list)
Desired result of three merged lists
[1] "CSC 1.R" "CSC 2.R" ...
Are you looking for something like this?:
csc.list <- mapply(c, rep("CSC", 16), c(1:16), SIMPLIFY=FALSE)
#merge the list into a string
output<-sapply(csc.list, toString)
#remove the added commas
output<-gsub(",", "", output)
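This question is 3 years old at the time of writing. However, I am adding this answer as it might help some people. If you want all the pieces in a single character vector, look at the flatten_chr() function in the purrr package. Note that it returns one flat vector ("CSC", "1", "CSC", "2", ...) and does not paste the pieces into strings such as "CSC 1".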
csc.list <- mapply(c, rep("CSC", 16), c(1:16), SIMPLIFY=FALSE)
library(purrr)
flatten_chr(csc.list)
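For the pasted form shown in the desired result, a base R sketch is to collapse each list element with paste(). The variable names merged and merged3 are made up here. The three-part case is built directly with paste0(), without an intermediate list, since the desired "CSC 1.R" uses a space in one place and no separator in the other.

```r
csc.list <- mapply(c, rep("CSC", 16), c(1:16), SIMPLIFY = FALSE)

# Collapse the pieces of each list element into one string;
# unname() drops the repeated "CSC" names that mapply() attaches
merged <- unname(vapply(csc.list, paste, character(1), collapse = " "))

# Three parts with different separators, built without a list
merged3 <- paste0("CSC ", 1:16, ".R")
```

When the parts are plain vectors to begin with, paste("CSC", 1:16) gives the same result as merged in one step.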

Pasting two strings using paste function and its collapse argument

I am trying to paste two vectors
vector_1 <- c("a", "b")
vector_2 <- c("x", "y")
paste(vector_1, vector_2, collapse = " + ")
The output I get is
"a x + b y"
My desired output is
"a + b + x + y"
paste with more than one argument pastes the arguments together term-by-term.
> paste(c("a","b","c"),c("A","B","C"))
[1] "a A" "b B" "c C"
The result has the length of the longest vector, with the shorter arguments recycled. That enables things like this to work:
> paste("A",c("1","2","BBB"))
[1] "A 1" "A 2" "A BBB"
> paste(c("1","2","BBB"),"A")
[1] "1 A" "2 A" "BBB A"
Then sep is used within each element, and collapse is used to join the elements.
> paste(c("a","b","c"),c("A","B","C"))
[1] "a A" "b B" "c C"
> paste(c("a","b","c"),c("A","B","C"),sep="+")
[1] "a+A" "b+B" "c+C"
> paste(c("a","b","c"),c("A","B","C"),sep="+",collapse="#")
[1] "a+A#b+B#c+C"
Note that once you use collapse you get a single result rather than three.
You seem to not want to combine your two vectors element-wise, so you need to turn them into one vector, which you can do with c(), giving us the solution:
> c(vector_1, vector_2)
[1] "a" "b" "x" "y"
> paste(c(vector_1, vector_2), collapse=" + ")
[1] "a + b + x + y"
Note that sep isn't needed - you are just collapsing the individual elements into one string.
