Find numbers whose second character is 9 using R

I have a list of numbers and I want to find the numbers whose second character is 9. grep() finds any number that contains a 9, but I am looking for code that finds only the numbers whose second character is 9. So the code below returns:
p <- c(34405, 09098424, 6908347, 8900333, 453434)
grep(9, p)
[1] 1 2 3 4
I am looking for something that returns:
[1] 2 3 4
Thanks
Majran

We can use substr to extract the second character, compare it with "9" using ==, and wrap the comparison in which to get the numeric index.
which(substr(p,2,2)=="9")
#[1] 2 3 4
Another option is grep with the pattern ^.9, where ^ anchors the match at the start of the string and . matches any single character, so the 9 has to be the second character.
grep("^.9", p)
#[1] 2 3 4
NOTE: Here I am assuming that the OP's vector is of class character, because numeric elements cannot keep a leading 0.
data
p <- c("34405", "09098424", "6908347", "8900333", "453434")
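To see why the character assumption matters, here is a small sketch that runs both approaches on the character vector and then on a numeric copy. The names p_chr, p_num and the idx_* variables are only for illustration and are not part of the original post.

```r
# Character vector, as assumed in the answer above
p_chr <- c("34405", "09098424", "6908347", "8900333", "453434")

# Numeric copy: the leading zero of "09098424" is lost in the conversion
p_num <- as.numeric(p_chr)

# Both approaches agree on the character vector
idx_substr <- which(substr(p_chr, 2, 2) == "9")
idx_grep   <- grep("^.9", p_chr)
print(idx_substr)  # 2 3 4
print(idx_grep)    # 2 3 4

# On the numeric copy, element 2 becomes "9098424", whose second character is "0"
idx_num <- grep("^.9", p_num)
print(idx_num)     # 3 4
```

If the values really are identifiers with leading zeros, keeping them as character strings from the start avoids this difference.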

Related

How to calculate longest common substring anywhere in two strings

I am trying to calculate the longest exact common substring without gaps between a string and a vector of strings in R. How do I modify stringdist to return any common string anywhere in the two compared strings and return the distance?
Reproduce data:
string1 <- "whereiam"
vec1 <- c("firstiam","twoiswhereiaminthisvec","thisisthree","fouriamhere","fivewherehere")
The stringdist call I tried (it doesn't work for my purposes):
library(stringdist)
stringdistvec <- stringdist(string1,vec1,method="lcs")
[1] 8 14 13 11 11 #not calculating the lcs type I want
Desired result instead with explanation of matches:
#desired to work to get this result:
desired_stringdistvec <- c(3,8,1,3,5)
[1] 3 8 1 3 5
#match 1: iam (3 common substr)
#match 2: whereiam (8 common substr)
#match 3: i (one letter only)
#match 4: iam (3 common substr)
#match 5: where (5 common substr)
One approach might be to look at the transformation sequence produced by adist() and count the characters in the longest contiguous match:
trafos <- attr(adist(string1, vec1, counts = TRUE), "trafos")
sapply(gregexpr("M+", trafos), function(x) max(0, attr(x, "match.length")))
[1] 3 8 1 3 5
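As a cross-check, the longest common substring can also be found by exhaustive search. The following is a minimal brute-force sketch, not the method from the answer above; the function name lcs_length is hypothetical. It tries every substring of the first string and keeps the longest one that occurs literally in the second. Because it searches exhaustively, it can report longer matches than a single edit alignment exposes: for example, "whereiam" and "fouriamhere" share "here" (4 characters), and "whereiam" and "thisisthree" share "re" (2 characters).

```r
string1 <- "whereiam"
vec1 <- c("firstiam", "twoiswhereiaminthisvec", "thisisthree",
          "fouriamhere", "fivewherehere")

# Hypothetical helper: length of the longest substring of `a` found in `b`
lcs_length <- function(a, b) {
  n <- nchar(a)
  best <- 0L
  for (start in seq_len(n)) {
    for (end in seq(start, n)) {
      len <- end - start + 1L
      # Only test candidates that would improve on the current best
      if (len > best && grepl(substr(a, start, end), b, fixed = TRUE)) {
        best <- len
      }
    }
  }
  best
}

lcs_lengths <- sapply(vec1, function(v) lcs_length(string1, v), USE.NAMES = FALSE)
print(lcs_lengths)  # 3 8 2 4 5
```

The search is quadratic in the length of the first string, which is fine for short strings like these but would be slow for long inputs.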

How can I check for cluster patterns in a sequence of numbers and obtain the next value?

Given a set of sequences
seq1 <- c(3,3,3,7,7,7,4,4)
seq2 <- c(17,17,77,77,3)
seq3 <- c(5,5,23)
How can we create a function that checks each sequence for cluster patterns and predicts the next value of the sequence, which in this case would be 4, 3, and 23 respectively?
Edit: The sequence should first be checked for cluster patterns. If it does not contain this class of pattern, the sequence should be ignored or passed on to another function.
Edit 2: A pattern is defined by more than 1 of the same consecutive number, always grouped consistently, e.g. 1,1,1,2,2,2,3,3,3 is a pattern but 1,1,2,2,2,3,3 is not a pattern.
Here's a way with rle in base R. It checks whether all run lengths except the last are equal and, if so, repeats the last value until its run has the same length as the others:
rl <- rle(seq1)$lengths
# check if all run-lengths, except last, are equal
if(all(head(rl, -1) == rl[1])) {
c(seq1, rep(seq1[length(seq1)], diff(range(rl))))
} else {
# do something else
}
# [1] 3 3 3 7 7 7 4 4 4
The same approach applies for seq2 and seq3.
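The same idea can be wrapped in a function that returns only the predicted next value. This is a sketch under a few assumptions of mine: the name predict_next_in_cluster is hypothetical, NA is used to signal "no cluster pattern or no prediction", and a sequence whose last run is already complete yields NA because the next group's value cannot be known.

```r
# Hypothetical helper built on rle(); returns the next value or NA
predict_next_in_cluster <- function(x) {
  r <- rle(x)
  lens <- r$lengths
  n_runs <- length(lens)
  if (n_runs < 2) return(NA)          # need at least one complete run plus a last run
  size <- lens[1]
  complete <- lens[-n_runs]           # all runs except the last
  # A pattern needs runs longer than 1, all complete runs equal,
  # and a last run that does not exceed the common size
  if (size < 2 || any(complete != size) || lens[n_runs] > size) return(NA)
  if (lens[n_runs] < size) r$values[n_runs] else NA
}

seq1 <- c(3, 3, 3, 7, 7, 7, 4, 4)
seq2 <- c(17, 17, 77, 77, 3)
seq3 <- c(5, 5, 23)

print(predict_next_in_cluster(seq1))  # 4
print(predict_next_in_cluster(seq2))  # 3
print(predict_next_in_cluster(seq3))  # 23
print(predict_next_in_cluster(c(1, 1, 2, 2, 2, 3, 3)))  # NA, not a pattern
```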

How to remove only numbers from string

I have following dataframe in R
ID Village_Name
1 23
2 Name-23
3 34
4 Vasai2
5 23
I want to remove only the rows whose Village_Name consists entirely of numbers. My desired dataframe would be
ID Village_Name
1 Name-23
2 Vasai2
How can I do it in R?
We can use grepl to match one or more digits from the start (^) to the end ($) of the string and negate (!) the result, so that elements consisting only of digits become FALSE and all others TRUE
i1 <- !grepl("^[0-9]+$", df1$Village_Name)
df1[i1, ]
If the IDs should be renumbered as in the OP's desired output, it could also be
data.frame(ID = head(df1$ID, sum(i1)), Village_Name = df1$Village_Name[i1])
# ID Village_Name
#1 1 Name-23
#2 2 Vasai2
Another option is to convert the column to numeric, which turns non-numeric elements into NA (with a coercion warning), and then build the logical index with is.na
df1[is.na(as.numeric(df1$Village_Name)),]
Here is another option using sub:
df1[nchar(sub("\\d+", "", df1$Village_Name)) > 0, ]
The basic idea is to strip the first run of digits from each Village_Name entry and then check that at least one character remains, which implies that the entry is not entirely numeric.
In practice, though, I would probably go with the grepl option given by @akrun.
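For comparison, here is a sketch that rebuilds the example data and applies the grepl and as.numeric approaches side by side. The construction of df1 (a character column, stringsAsFactors = FALSE) is my assumption about the OP's data, and the variable names are illustrative. The last lines show one case where the two tests disagree: as.numeric accepts scientific notation such as "1e5", while the regular expression only accepts plain digits.

```r
# Assumed reconstruction of the OP's data
df1 <- data.frame(ID = 1:5,
                  Village_Name = c("23", "Name-23", "34", "Vasai2", "23"),
                  stringsAsFactors = FALSE)

# TRUE for rows to keep, i.e. names that are not made up only of digits
keep_regex   <- !grepl("^[0-9]+$", df1$Village_Name)
# as.numeric warns about the NAs it introduces, so the warning is silenced here
keep_numeric <- is.na(suppressWarnings(as.numeric(df1$Village_Name)))

kept <- df1[keep_regex, ]
print(kept$Village_Name)                  # "Name-23" "Vasai2"
print(identical(keep_regex, keep_numeric))  # TRUE for this data

# The two tests can differ on other inputs
edge <- "1e5"
print(!grepl("^[0-9]+$", edge))                   # TRUE: not all digits
print(is.na(suppressWarnings(as.numeric(edge))))  # FALSE: parses as a number
```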

Using if/else statement to insert a decimal for a column based on starting letter and string length of the row using R

I have a data frame "df" and want to apply if/else conditions to insert a decimal for the entire column "A"
A B
E0505 123
890 43
4505 56
Rules to apply:
If the code starts with "E" and length of the code is > 4: between character 4 and 5.
If length of the code is > 3 and the code doesn't start with "E": between character 3 and 4.
If length of the code is <= 3: return the code as such.
Final output:
A B
E050.5 123
890 43
450.5 56
I have tried this, but I am not sure how to include the condition where row starts with E or not.
ifelse(str_length(df$A)>3, as.character(paste0(substring(df$A, 1, 3),".", substring(df$A, 4))), as.character(df$A))
You can do this with sub and a regular expression:
df$A <- sub("((?:^E.|^[^E]).{2})(.+)", "\\1.\\2", df$A)
df
# A B
#1 E050.5 123
#2 890 43
#3 450.5 56
((?:^E.|^[^E]).{2})(.+) matches two kinds of strings:
case 1: the string starts with E followed by 4 or more characters, in which case the first 4 characters and the rest are captured as two separate groups and . is inserted between them;
case 2: the string does not start with E but has 4 or more characters, in which case the first 3 characters and the rest are captured as two separate groups and . is inserted between them.
Strings that start with E and have fewer than 5 characters in total, or that do not start with E and have fewer than 4 characters in total, are not matched and will not be modified.
To ignore the case of the leading E: df$A <- sub("((?:^[Ee].|^[^Ee]).{2})(.+)", "\\1.\\2", df$A).
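Since the question asked for if/else conditions, the three rules can also be written out explicitly with ifelse. This is a minimal sketch rather than the answer's method; the function name insert_decimal and the intermediate variable names are hypothetical.

```r
# Hypothetical helper that applies the three rules from the question
insert_decimal <- function(code) {
  code <- as.character(code)
  starts_e <- startsWith(code, "E")
  len <- nchar(code)
  # Split after character 4 for codes starting with "E", otherwise after character 3
  split_at <- ifelse(starts_e, 4L, 3L)
  # Codes that are not longer than the split position are returned unchanged
  needs_point <- len > split_at
  ifelse(needs_point,
         paste0(substring(code, 1, split_at), ".", substring(code, split_at + 1L, len)),
         code)
}

df <- data.frame(A = c("E0505", "890", "4505"), B = c(123, 43, 56),
                 stringsAsFactors = FALSE)
df$A <- insert_decimal(df$A)
print(df$A)  # "E050.5" "890" "450.5"
```

Spelling the rules out this way is longer than the regular expression, but each condition maps directly onto one line of the rule list.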

Sum number in a character string (R)

I have a vector that looks like :
numbers <- c("1/1/1", "1/0/2", "1/1/1/1", "2/0/1/1", "1/2/1")
(not always the same number of "/" character)
How can I create another vector with the sum of the numbers of each string?
Something like :
sum
3
3
4
4
4
One solution with strsplit and sapply:
sapply(strsplit(numbers, '/'), function(x) sum(as.numeric(x)))
#[1] 3 3 4 4 4
strsplit will split your strings on / (it doesn't matter how many /s you have). The output of strsplit is a list, so we iterate over it with sapply to calculate each sum.
What seems to me the most straightforward approach here is to convert your number strings into valid arithmetic expressions and then evaluate them in R using eval along with parse. The string 1/0/2 becomes 1+0+2, and we can then simply evaluate that expression.
sapply(numbers, function(x) { eval(parse(text=gsub("/", "+", x))) })
1/1/1 1/0/2 1/1/1/1 2/0/1/1 1/2/1
3 3 4 4 4
1) strapply: strapply matches each string of digits using \\d+ and then applies as.numeric to it, returning a list with one vector of numbers per input string. We then apply sum to each of those vectors. This solution seems particularly short.
library(gsubfn)
sapply(strapply(numbers, "\\d+", as.numeric), sum)
## [1] 3 3 4 4 4
2) read.table: This applies sum(read.table(...)) to each string. It is a bit longer (but still only one line of code) and uses no packages.
sapply(numbers, function(x) sum(read.table(text = x, sep = "/")))
## 1/1/1 1/0/2 1/1/1/1 2/0/1/1 1/2/1
## 3 3 4 4 4
Add the USE.NAMES = FALSE argument to sapply if you don't want names on the output.
scan(textConnection(x), sep = "/", quiet = TRUE) could be used in place of read.table but is longer.
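As a sketch of two of the package-free variants side by side: the first uses vapply instead of sapply so that the result is guaranteed to be a numeric vector, and the second uses scan with its text argument in place of a textConnection. The variable names sums_split and sums_scan are illustrative only.

```r
numbers <- c("1/1/1", "1/0/2", "1/1/1/1", "2/0/1/1", "1/2/1")

# strsplit with fixed = TRUE treats "/" as a literal string rather than a regex
sums_split <- vapply(strsplit(numbers, "/", fixed = TRUE),
                     function(parts) sum(as.numeric(parts)),
                     numeric(1))
print(sums_split)  # 3 3 4 4 4

# scan reads the numbers directly from each string
sums_scan <- sapply(numbers,
                    function(x) sum(scan(text = x, sep = "/", quiet = TRUE)),
                    USE.NAMES = FALSE)
print(sums_scan)   # 3 3 4 4 4
```

Both variants avoid eval(parse(...)), which is worth considering when the strings come from an untrusted source, since parse will evaluate whatever expression it is given.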

Resources