Sum of numbers in a string separated by vertical bars in R

I have strings whose values are separated by a vertical bar, as shown below.
String1 <- "5|10|25|25|10|10|10|5"
String2 <- "5|10|25|25"
Is there a direct function to get the sum of the numbers in each string?
In this case it should be 100 for String1 and 65 for String2. I also have a character vector of such strings.
>chk
chk
1 5|10|25|25|10|10|10|5
2 5|55|20|5|5|5|5
3 6
4 Not Available
> sum(scan(text=gsub("\\Not Available\\b", "NA", chk$chk), sep="|", what = numeric(), quiet=TRUE), na.rm = TRUE)
[1] 206
The expected result is instead one sum per element:
[1] 100 100 6 NA

We can do a scan and then sum
sum(scan(text=String1, sep="|", what = numeric(), quiet=TRUE))
For multiple vectors, place them in a list and do the same operation
sapply(mget(paste0("String", 1:2)), function(x)
sum(scan(text=x, sep="|", what=numeric(), quiet=TRUE)))
# String1 String2
# 100 65
Another option is eval(parse( (not recommended, though) after replacing each | with +
eval(parse(text=gsub("[|]", "+", String1)))
#[1] 100
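Or, as @thelatemail mentioned in the comments, redefine | as + (using <-) and then do the eval(parse(.. Note that this masks the logical OR operator in the current environment until the binding is removed with rm("|").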
`|` <- `+`
eval(parse(text=String1))
#[1] 100
If we have a data.frame column of strings, it may be better to split each string on | into a list of vectors, convert the vectors to numeric (non-numeric elements are coerced to NA with a warning), and take the sum with na.rm=TRUE. Note that an element with no numeric parts, such as "Not Available", then sums to 0 rather than NA.
sapply(strsplit(as.character(chk$chk), "[|]"),
function(x) sum(as.numeric(x), na.rm=TRUE))
#[1] 100 100 6 0
NOTE: The as.character call is not needed if the 'chk' column is already of class character.
Otherwise, if we are using scan or eval(parse, it has to be applied to each element separately.
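The strsplit approach above gives 0 for "Not Available", while the question asks for NA. Here is a minimal sketch of one way to get that, assuming an element with no numeric parts at all should yield NA. The helper name sum_bar is made up for illustration.

```r
# Hypothetical helper: sum the "|"-separated numbers in each element,
# returning NA when an element contains no numeric parts.
sum_bar <- function(x) {
  parts <- strsplit(as.character(x), "|", fixed = TRUE)
  vapply(parts, function(p) {
    n <- suppressWarnings(as.numeric(p))  # non-numeric pieces become NA
    if (all(is.na(n))) NA_real_ else sum(n, na.rm = TRUE)
  }, numeric(1))
}

chk_vec <- c("5|10|25|25|10|10|10|5", "5|55|20|5|5|5|5", "6", "Not Available")
sum_bar(chk_vec)
#[1] 100 100   6  NA
```

Using fixed = TRUE avoids having to escape the | as a regular expression.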

We can extract all the numbers from the string and then sum them
library(stringr)
sum(as.numeric(unlist(str_match_all(String1, "[0-9]+"))))
#[1] 100
sum(as.numeric(unlist(str_match_all(String2, "[0-9]+"))))
#[1] 65
For multiple vectors we can keep them in a list
sapply(list(String1, String2), function(x)
sum(as.numeric(unlist(str_match_all(x, "[0-9]+")))))
#[1] 100 65
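The same extract-then-sum idea can be sketched without any packages, using regmatches and gregexpr. This is only an illustration. The function name extract_sum is hypothetical, and it assumes the numbers are non-negative integers, as in the question.

```r
# Hypothetical base R equivalent: pull out every run of digits, then sum per string.
extract_sum <- function(x) {
  digits <- regmatches(x, gregexpr("[0-9]+", x))
  # A string without any digits yields an empty vector, whose sum is 0.
  vapply(digits, function(d) sum(as.numeric(d)), numeric(1))
}

extract_sum(c("5|10|25|25|10|10|10|5", "5|10|25|25"))
#[1] 100  65
```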

Related

Is there a way in R to count the number of substrings in a string enclosed in square brackets, all substrings are separated by commas and are quoted?

['ax', 'byc', 'crm', 'dop']
This is a character string, and I want a count of all substrings, i.e. 4 as the output here. I want to do this for an entire column containing such strings.
We may use str_count
library(stringr)
str_count(str1, "\\w+")
[1] 4
Or we may extract the alphanumeric runs into a list and get the lengths
lengths(str_extract_all(str1, "[[:alnum:]]+"))
If it is a data.frame column, extract the column as a vector and apply str_count
str_count(df1$str1, "\\w+")
data
str1 <- "['ax', 'byc', 'crm', 'dop']"
df1 <- data.frame(str1)
Here are a few base R approaches. We use the 2 row input defined reproducibly in the Note at the end. No packages are used.
lengths(strsplit(DF$b, ","))
## [1] 4 4
nchar(gsub("[^,]", "", DF$b)) + 1
## [1] 4 4
count.fields(textConnection(DF$b), ",")
## [1] 4 4
Note
DF <- data.frame(a = 1:2, b = "['ax', 'byc', 'crm', 'dop']")
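One caveat with the comma-based counts: they report 1 for an empty list such as "[]", because a string with no commas still counts as one field. A rough sketch that counts the quoted items instead, assuming every item is wrapped in single quotes and contains no embedded quote (count_items is a made-up name):

```r
# Hypothetical helper: count the single-quoted items in each string.
count_items <- function(x) {
  lengths(regmatches(x, gregexpr("'[^']*'", x)))
}

count_items(c("['ax', 'byc', 'crm', 'dop']", "[]"))
#[1] 4 0
```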

R: Applying gsub to data frames returns NAs

I am trying to convert a data frame that contains numbers and blanks to numeric. Currently, numbers are in factor format and some have ",".
df <- data.frame(num1 = c("123,456,789", "1,234,567", "1,234", ""), num2 = c("","1,012","","202"))
df
         num1  num2
1 123,456,789
2   1,234,567 1,012
3       1,234
4               202
Remove "," and convert to numeric format:
df2 = as.numeric(gsub(",","",df))
Warning message:
NAs introduced by coercion
Interestingly, if I perform the same operation column by column, it works:
df$num1 = as.numeric(gsub(",","",df$num1))
df$num2 = as.numeric(gsub(",","",df$num2))
df
num1 num2
1 123456789 NA
2 1234567 1012
3 1234 NA
4 NA 202
My questions are: 1. What is the cause, and is there a way to avoid converting column by column, since the actual data frame has many more columns? 2. What is the best way to remove the NAs or replace them with 0s for later numeric operations? I know I can use gsub to do so, but I am wondering whether there is a better way.
We can use replace_na (from tidyr) after replacing the , with '' (str_replace_all)
library(dplyr)
library(stringr)
library(tidyr) # provides replace_na
df %>%
  mutate_all(list(~ str_replace_all(., ",", "") %>%
    as.numeric %>%
    replace_na(0)))
# num1 num2
#1 123456789 0
#2 1234567 1012
#3 1234 0
#4 0 202
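The issue is that gsub/sub works on a vector, as described in ?gsub. A data frame passed as x is coerced with as.character, which yields one string per column rather than one per value, so as.numeric then returns NA.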
x, text -
a character vector where matches are sought, or an object which can be coerced by as.character to a character vector. Long vectors are supported.
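To make the cause concrete, here is a small sketch of what the coercion does to the example data. The object names df_demo, flat and res are only for illustration, and stringsAsFactors = FALSE is assumed so that the columns are plain character vectors.

```r
df_demo <- data.frame(num1 = c("123,456,789", "1,234,567", "1,234", ""),
                      num2 = c("", "1,012", "", "202"),
                      stringsAsFactors = FALSE)

# as.character() on a data frame returns one deparsed string per column,
# not one string per cell.
flat <- as.character(df_demo)
length(flat)
#[1] 2

# Stripping commas from those two strings does not produce valid numbers.
res <- suppressWarnings(as.numeric(gsub(",", "", flat)))
res
#[1] NA NA
```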
We can loop over the columns, apply the gsub, and assign the output back to the original dataset
df[] <- lapply(df, function(x) as.numeric(gsub(",", "", x)))
df[is.na(df)] <- 0 # change the NA elements to 0

Sum number in a character string (R)

I have a vector that looks like :
numbers <- c("1/1/1", "1/0/2", "1/1/1/1", "2/0/1/1", "1/2/1")
(not always the same number of "/" character)
How can I create another vector with the sum of the numbers of each string?
Something like :
sum
3
3
4
4
4
One solution with strsplit and sapply:
sapply(strsplit(numbers, '/'), function(x) sum(as.numeric(x)))
#[1] 3 3 4 4 4
strsplit will split your strings on / (it doesn't matter how many /s you have). The output of strsplit is a list, so we iterate over it with sapply to calculate each sum.
What seems to me to be the most straightforward approach here is to convert your number strings to actual valid string arithmetic expressions, and then evaluate them in R using eval along with parse. Hence, the string 1/0/2 would become 1+0+2, and then we can simply evaluate that expression.
sapply(numbers, function(x) { eval(parse(text=gsub("/", "+", x))) })
1/1/1 1/0/2 1/1/1/1 2/0/1/1 1/2/1
3 3 4 4 4
1) strapply. strapply matches each string of digits using \\d+ and then applies as.numeric to it, returning a list with one vector of numbers per input string. We then apply sum to each of those vectors. This solution seems particularly short.
library(gsubfn)
sapply(strapply(numbers, "\\d+", as.numeric), sum)
## [1] 3 3 4 4 4
2) read.table. This applies sum(read.table(...)) to each string. It is a bit longer (but still only one line of code) and uses no packages.
sapply(numbers, function(x) sum(read.table(text = x, sep = "/")))
## 1/1/1 1/0/2 1/1/1/1 2/0/1/1 1/2/1
## 3 3 4 4 4
Add the USE.NAMES = FALSE argument to sapply if you don't want names on the output.
scan(textConnection(x), sep = "/", quiet = TRUE) could be used in place of read.table but is longer.
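For completeness, a sketch of the scan variant, here using scan's text argument instead of textConnection and vapply so the result type is fixed. Treat it as one possible spelling rather than the answer's own code. The name sums is illustrative.

```r
numbers <- c("1/1/1", "1/0/2", "1/1/1/1", "2/0/1/1", "1/2/1")

# scan() parses one string at a time, so we loop over the elements.
sums <- vapply(numbers,
               function(x) sum(scan(text = x, sep = "/", quiet = TRUE)),
               numeric(1), USE.NAMES = FALSE)
sums
#[1] 3 3 4 4 4
```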

Reverse only alphabetical patterns in a string in R

I'm trying to learn R and a sample problem is asking to only reverse part of a string that is in alphabetical order:
String: "abctextdefgtext"
StringNew: "cbatextgfedtext"
Is there a way to identify alphabetical patterns to do this?
Here is one approach with base R, based on the pattern shown in the example. We split the string into individual characters ('v1'), use match to find each character's position in the alphabet (letters), take the difference between adjacent positions and check whether it is equal to 1. The cumulative sum of the breaks gives a grouping variable ('i1') that is constant within each alphabetical run. We then reverse (rev) the characters within each group with ave and finally paste the characters together to get the expected output
v1 <- strsplit(str1, "")[[1]]
i1 <- cumsum(c(TRUE, diff(match(v1, letters)) != 1L))
paste(ave(v1, i1, FUN = rev), collapse="")
#[1] "cbatextgfedtext"
Or as @alexislaz mentioned in the comments
v1 = as.integer(charToRaw(str1))
rawToChar(as.raw(ave(v1, cumsum(c(TRUE, diff(v1) != 1L)), FUN = rev)))
#[1] "cbatextgfedtext"
EDIT:
1) A mistake was corrected based on @alexislaz's comments
2) Updated with another method suggested by @alexislaz in the comments
data
str1 <- "abctextdefgtext"
You could do this in base R
vec <- match(unlist(strsplit(s, "")), letters)
x <- c(0, which(diff(vec) != 1), length(vec))
newvec <- unlist(sapply(seq(length(x) - 1), function(i) rev(vec[(x[i]+1):x[i+1]])))
paste0(letters[newvec], collapse = "")
#[1] "cbatextgfedtext"
Where s <- "abctextdefgtext"
First you find the positions of each letter in the sequence of letters ([1] 1 2 3 20 5 24 20 4 5 6 7 20 5 24 20)
Having the positions in hand, you look for consecutive numbers and, when found, reverse that sequence. ([1] 3 2 1 20 5 24 20 7 6 5 4 20 5 24 20)
Finally, you get the letters back in the last line.
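The steps described in this thread can be wrapped into a small function that works on several strings at once. This is a sketch, not either answerer's code. The name reverse_runs is made up, and it assumes that only lowercase a-z count as alphabetical and that any other character simply breaks a run.

```r
# Hypothetical wrapper: reverse every run of consecutive alphabet letters.
reverse_runs <- function(s) {
  vapply(s, function(one) {
    ch <- strsplit(one, "")[[1]]
    if (length(ch) < 2) return(one)         # nothing to reverse
    d <- diff(match(ch, letters))           # NA for non-lowercase characters
    new_run <- is.na(d) | d != 1L           # TRUE where a run is broken
    run_id <- cumsum(c(TRUE, new_run))      # group id per character
    paste(ave(ch, run_id, FUN = rev), collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

reverse_runs(c("abctextdefgtext", "xyz"))
#[1] "cbatextgfedtext" "zyx"
```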

R : Extract a Specific Number out of a String

I have a vector as below
data <- c("6X75ML","24X37.5ML (KKK)", "6X2X75ML", "168X5CL (UUU)")
Here I want to extract the first number before the "X" for each element.
When there are two "X"s, as in "6X2X75ML", the product of the numbers (6 multiplied by 2, i.e. 12) should be returned.
expected output
6, 24, 12, 168
Thank you for the help...
Here's a possible solution using regular expressions :
data <- c("6X75ML","24X37.5ML (KKK)", "6X2X75ML", "168X5CL (UUU)")
# this regular expression finds any group of digits followed
# by an upper-case 'X' in each string and returns a list of the matches
tokens <- regmatches(data,gregexpr('[[:digit:]]+(?=X)',data,perl=TRUE))
res <- sapply(tokens,function(x)prod(as.numeric(x)))
> res
[1] 6 24 12 168
Here is a method using base R:
dataList <- strsplit(data, split="X")
sapply(dataList, function(x) Reduce("*", as.numeric(head(x, -1))))
[1] 6 24 12 168
strsplit breaks up each element of the vector along "X". The resulting list is fed to sapply, which then performs an operation on all but the final element of each vector in the list. The operation is to convert the elements to numeric and then multiply them. The final element is dropped using head(x, -1).
As @zheyuan-li comments, prod can fill in for Reduce and will probably be a bit faster:
sapply(dataList, function(x) prod(as.numeric(head(x, -1))))
[1] 6 24 12 168
We can also use str_extract_all
library(stringr)
sapply(str_extract_all(data, "\\d+(?=X)"), function(x) prod(as.numeric(x)))
#[1] 6 24 12 168
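The lookahead idea can also be written as a small package-free function. This is a sketch under the same assumptions as the answers in this thread (multipliers are integers immediately followed by an upper-case X), and the name pack_count is hypothetical.

```r
# Hypothetical helper: multiply every integer that is directly followed by "X".
pack_count <- function(x) {
  tokens <- regmatches(x, gregexpr("[0-9]+(?=X)", x, perl = TRUE))
  # Note: a string with no match gives prod(numeric(0)), which is 1.
  vapply(tokens, function(tok) prod(as.numeric(tok)), numeric(1))
}

data <- c("6X75ML", "24X37.5ML (KKK)", "6X2X75ML", "168X5CL (UUU)")
pack_count(data)
#[1]   6  24  12 168
```

Unlike splitting on "X", this version is not affected by an "X" that appears later in the text without digits before it.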
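Another base R option uses regexpr and substr:
ind <- regexpr("X", data)                    # position of the first "X"
val <- as.integer(substr(data, 1, ind - 1))  # number before the first "X"
data2 <- substring(data, ind + 1)            # remainder after the first "X"
ind2 <- regexpr("[0-9]+X", data2)            # a second "<digits>X" group, if any
has2 <- ind2 == 1                            # it must start the remainder
if (any(has2)) {
  val2 <- as.integer(substr(data2[has2], 1, attr(ind2, "match.length")[has2] - 1))
  val[has2] <- val[has2] * val2
}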
