Getting last two digits of Sequence Date in R - r

I have a sequence of dates:
names <- format(seq.Date(as.Date("2012-11-01"), as.Date("2012-12-01"),
                         by = "months"), format = "%Y%m")
How can I get the last two digits? For example, the last two digits of names[1] should be 11.

With the stringr package you can simply use:
stringr::str_sub(string = names, start = -2, end = -1)

You could use substr():
names = substr(names, nchar(names)-1, nchar(names))
The result is:
[1] "11" "12"
Or as integer:
names = as.integer(substr(names, nchar(names)-1, nchar(names)))
Result:
[1] 11 12

There are dozens of ways. The simplest I could come up with is to take the remainder of integer division:
as.integer(names) %% 100
that returns:
[1] 11 12
Technically these are numbers. If you strictly require characters, apply as.character() to the result to convert the type.
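For comparison, here is a minimal sketch of the approaches above side by side. It also adds one option not mentioned in the answers, formatting the dates with "%m" directly, which only applies when the original Date objects are still available. The variable names are illustrative.

```r
# Rebuild the question's data, keeping the Date objects around.
dates <- seq.Date(as.Date("2012-11-01"), as.Date("2012-12-01"), by = "months")
names <- format(dates, format = "%Y%m")

# Character result: take the last two characters of each string.
last_two_chr <- substr(names, nchar(names) - 1, nchar(names))

# Numeric result: remainder after dividing by 100.
last_two_num <- as.integer(names) %% 100

# Assumption: the dates are still at hand, so the month can be formatted directly.
month_direct <- format(dates, "%m")

print(last_two_chr)
print(last_two_num)
print(month_direct)
```

Note that the modulo approach returns numbers, so a month such as "01" would come back as 1 rather than "01"; the two character-based approaches keep the leading zero.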

Related

Regex: Match first two digits of a four digit number

I have:
'30Jun2021'
I want to skip/remove the first two digits of the four digit number (or any other way of doing this):
'30Jun21'
I have tried:
^.{0,5}
https://regex101.com/r/hAJcdE/1
I have the first 5 characters but I have not figured out how to skip/remove the '20'
Manipulating datetimes is better using the dedicated date/time functions.
You can convert the variable to date and use format to get the output in any format.
x <- '30Jun2021'
format(as.Date(x, '%d%b%Y'), '%d%b%y')
#[1] "30Jun21"
You can also use lubridate::dmy(x) to convert x to date.
You don't even need regex for this. Just use substring operations:
x <- '30Jun2021'
paste0(substr(x, 1, 5), substr(x, 8, 9))
[1] "30Jun21"
Use sub
sub('\\d{2}(\\d{2})$', "\\1", x)
[1] "30Jun21"
or with str_remove
library(stringr)
str_remove(x, "\\d{2}(?=\\d{2}$)")
[1] "30Jun21"
data
x <- '30Jun2021'
You could also match the format of the string with 2 capture groups, where you would match the part that you want to omit and capture what you want to keep.
\b(\d+[A-Z][a-z]+)\d\d(\d\d)\b
sub("\\b(\\d+[A-Z][a-z]+)\\d\\d(\\d\\d)\\b", "\\1\\2", "30Jun2021")
Output
[1] "30Jun21"
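The regex and substring answers above are all vectorised, so they can be applied to a whole column at once. A small sketch, using a made-up vector of date strings (the values are hypothetical, not from the question):

```r
# Hypothetical input: day, three-letter month, four-digit year.
dates_chr <- c("30Jun2021", "01Jan1999", "15Aug2005")

# Regex: drop the two digits that precede the final two digits.
short_regex <- sub("\\d{2}(\\d{2})$", "\\1", dates_chr)

# Fixed positions: keep characters 1-5 and 8-9.
short_substr <- paste0(substr(dates_chr, 1, 5), substr(dates_chr, 8, 9))

print(short_regex)
print(short_substr)
```

The substring version assumes every string has exactly the same width (two-digit day, three-letter month), whereas the regex version only assumes the string ends in at least four digits.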

Finding the position of decimal point in an integer or a string

If I want to determine the position of the decimal point in a number, for example 524.79, where the decimal point is at position 4, which function or command should I use to get that position as output in R? I have tried using gregexpr as well as regexpr, but each time the output comes out to be 1.
This is what I did :
x <- 524.79
gregexpr(pattern = ".", "x")
The output looks like this:
[[1]]
[1] 1
attr(,"match.length")
[1] 1
attr(,"useBytes")
[1] TRUE
The . is a metacharacter which matches any character. To match the literal character, either escape it (\\.), place it inside square brackets ([.]), or use fixed = TRUE:
as.integer(gregexpr(pattern = ".", x, fixed = TRUE))
#[1] 4
Or a compact option is str_locate
library(stringr)
unname(str_locate(x, "[.]")[,1])
#[1] 4
The second issue in the OP's solution is quoting the object x. gregexpr then searches the one-character string "x", where . matches that single character at position 1.
data
x <- 524.79
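To make the difference concrete, the following sketch contrasts the unescaped pattern with the three ways of matching a literal dot described above. It uses regexpr (first match only) rather than gregexpr, and converts the number to a string explicitly; the variable names are illustrative.

```r
x <- 524.79
x_chr <- as.character(x)   # "524.79"

# Unescaped "." is a regex wildcard, so the very first character matches.
pos_wildcard <- as.integer(regexpr(".", x_chr))

# Three equivalent ways to match a literal dot.
pos_fixed   <- as.integer(regexpr(".", x_chr, fixed = TRUE))
pos_escaped <- as.integer(regexpr("\\.", x_chr))
pos_class   <- as.integer(regexpr("[.]", x_chr))

# regexpr() returns -1 when nothing matches, e.g. for a whole number.
pos_none <- as.integer(regexpr(".", "524", fixed = TRUE))

print(c(pos_wildcard, pos_fixed, pos_escaped, pos_class, pos_none))
```

Wrapping the result in as.integer() simply drops the match.length attributes that regexpr attaches.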
We could actually use a regex here:
x <- "524.79"
nchar(sub("(?<=\\.)\\d+", "", x, perl=TRUE))
[1] 4

Finding number of r's in the vector (Both R and r) before the first u

rquote <- "R's internals are irrefutably intriguing"
chars <- strsplit(rquote, split = "")[[1]]
In the code above, we need to find the number of r's (both R and r) in rquote before the first u.
You could use substrings.
## find position of first 'u'
u1 <- regexpr("u", rquote, fixed = TRUE)
## get count of all 'r' or 'R' before 'u1'
lengths(gregexpr("r", substr(rquote, 1, u1), ignore.case = TRUE))
# [1] 5
This follows what you ask for in the title of the post. If you want the count of all the "r", case insensitive, then simplify the above to
lengths(gregexpr("r", rquote, ignore.case = TRUE))
# [1] 6
Then there's always stringi
library(stringi)
## count before first 'u'
stri_count_regex(stri_sub(rquote, 1, stri_locate_first_regex(rquote, "u")[,1]), "r|R")
# [1] 5
## count all R or r
stri_count_regex(rquote, "r|R")
# [1] 6
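One caveat with the lengths(gregexpr(...)) idiom used above: when the pattern does not occur at all, gregexpr returns a single -1, so the length is 1 rather than 0. The sketch below wraps the count in a small helper that guards against this; count_matches is a hypothetical name, not a base R function.

```r
# Hypothetical helper: count regex matches in a single string,
# returning 0 (not 1) when there is no match.
count_matches <- function(pattern, text, ...) {
  m <- gregexpr(pattern, text, ...)[[1]]
  sum(m > 0)   # gregexpr() signals "no match" with -1
}

rquote <- "R's internals are irrefutably intriguing"

# All r's, case-insensitive.
n_all <- count_matches("r", rquote, ignore.case = TRUE)

# Only the r's strictly before the first 'u'.
u1 <- regexpr("u", rquote, fixed = TRUE)
n_before_u <- count_matches("r", substr(rquote, 1, u1 - 1), ignore.case = TRUE)

# A letter that never occurs: the helper gives 0, the bare idiom gives 1.
n_z_helper <- count_matches("z", rquote)
n_z_naive  <- lengths(gregexpr("z", rquote))

print(c(n_all, n_before_u, n_z_helper, n_z_naive))
```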
To get the number of R's before the first u, you need to make an intermediate step. (You probably don't need to. I'm sure akrun knows some incredibly cool regular expression to get the job done, but it won't be as easy to understand as this).
rquote <- "R's internals are irrefutably intriguing"
before_u <- gsub("u[[:print:]]+$", "", rquote)
length(stringr::str_extract_all(before_u, "(R|r)")[[1]])
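You may try this,
> library(stringr)
> length(str_extract_all(rquote, '[Rr]')[[1]])
[1] 6
To get the count of all r's before the first u (the (*SKIP)(*F) verbs need PCRE, which base R exposes through perl = TRUE; the perl() wrapper is only available in old stringr releases):
> length(gregexpr('u.*(*SKIP)(*F)|[Rr]', rquote, perl = TRUE)[[1]])
[1] 5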
EDIT: Just saw before the first u. In that case, we can get the position of the first 'u' from either which or match.
Then use grepl on 'chars' (the strsplit output from the OP's code) up to that position (ind), with ignore.case=TRUE, to get a logical index of the r's, and sum it.
ind <- which(chars=='u')[1]
Or
ind <- match('u', chars)
sum(grepl('r', chars[seq(ind)], ignore.case=TRUE))
#[1] 5
Or we can use two substitutions on the original string ('rquote'). The first removes everything from the first u to the end of the string (u.*$), and the second matches all characters except R and r ([^Rr]) and replaces them with ''. We can then use nchar to count the characters that remain.
nchar(gsub('[^Rr]', '', sub('u.*$', '', rquote)))
#[1] 5
Or, if we want to count the 'r' in the entire string, use gregexpr to get the positions of the matching characters in the original string ('rquote') and take the length:
length(gregexpr('[rR]', rquote)[[1]])
#[1] 6
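The character-vector approach above can be packaged into a function that also copes with strings containing no 'u' at all, where match() returns NA. This is only a sketch; count_r_before_u is a hypothetical name, and treating "no u" as "count the whole string" is an assumption about the desired behaviour.

```r
# Hypothetical helper: number of 'r'/'R' characters before the first 'u'.
count_r_before_u <- function(text) {
  chars <- strsplit(text, split = "")[[1]]
  ind <- match("u", chars)
  # Assumption: with no 'u' present, count over the entire string.
  if (is.na(ind)) ind <- length(chars) + 1L
  sum(tolower(chars[seq_len(ind - 1L)]) == "r")
}

rquote <- "R's internals are irrefutably intriguing"
n_quote <- count_r_before_u(rquote)
n_no_u  <- count_r_before_u("rare")   # no 'u' in this string
n_lead  <- count_r_before_u("u")      # nothing before the 'u'

print(c(n_quote, n_no_u, n_lead))
```

Using seq_len(ind - 1L) instead of seq(ind) avoids the surprise that seq(0) counts down as c(1, 0).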

find occurrence of string starting with a value in R

Is there a function for printing the total number of values contained in the dataset beginning with (a value)?
Consider this dataset of 7 version numbers:
df <- c("1.20", "3.1.20", "2.45", "1.10", "1.67.4.3", "5.200.1", "70.1.2.7")
I need to only print version numbers 1.x.
My output would be:
1.20, 1.10, 1.67.4.3
(because these are version numbers starting with "1."). I do not want to print 3.1.20 or 70.1.2.7 because they do not start with "1.", even though they contain "1." as a substring.
df <- c("1.20", "3.1.20", "2.45", "1.10", "1.67.4.3", "5.200.1", "70.1.2.7")
grep("^1\\.", df, value = TRUE)
Use the function substring inside brackets for subsetting:
df[substring(df, 1,2) == "1."]
Or:
sum(substr(df, 1, 2) == "1.")
[1] 3
And for the values themselves:
df[substr(df, 1, 2) == "1."]
[1] "1.20" "1.10" "1.67.4.3"
df[df<"2"]
#[1] "1.20" "1.10" "1.67.4.3"
Depending on your dataset (e.g., if there are version numbers with a leading zero), you might need to extend this to df[df<"2" & df>="1"]. Note that this is a string comparison, so it would also keep versions such as "10.3", which sort before "2" but do not start with "1.".
The total number of values starting with "1" can in this case be obtained with length(df[df<"2"]) (or length(df[df<"2" & df>="1"])).
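As a further option not mentioned in the answers, base R (3.3.0 and later) has startsWith(), which tests a literal prefix without any regex escaping. The sketch below uses it and also shows why the string-comparison trick needs care; the extra "10.3" entry is added here purely for illustration and is not part of the question's data.

```r
# Question's data plus one hypothetical extra entry, "10.3".
versions <- c("1.20", "3.1.20", "2.45", "1.10", "1.67.4.3",
              "5.200.1", "70.1.2.7", "10.3")

# Literal prefix test: TRUE only for strings beginning with "1."
is_v1 <- startsWith(versions, "1.")
v1_values <- versions[is_v1]
v1_count  <- sum(is_v1)

# String comparison also lets "10.3" through, because "1" sorts before "2".
by_comparison <- versions[versions < "2"]

print(v1_values)
print(v1_count)
print(by_comparison)
```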

Extract/Remove portion of an Integer or string with random digits/characters in R

Say I have an integer
x <- as.integer(442009)
or a character string
y <- "a10ba3m1"
How do I eliminate the last two digits/characters of an integer/string of any length in general?
substr returns substrings:
substr(x, 1, nchar(x)-2)
# [1] "4420"
substr(y, 1, nchar(y)-2)
# [1] "a10ba3"
If you know that the value is an integer, then you can just divide by 100 and convert back to integer (drop the decimal part). This is probably a little more efficient than converting it to a string then back.
> x <- as.integer(442009)
> floor(x/100)
[1] 4420
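A small variation on the division idea, offered as a sketch: R's integer-division operator %/% keeps the result as an integer when both operands are integers, whereas floor(x/100) returns a double. The handling of negative values shown here is an aside that the question does not ask about.

```r
x <- as.integer(442009)

# Integer division keeps the integer type.
trimmed <- x %/% 100L

# floor() of an ordinary division gives the same value, stored as a double.
trimmed_dbl <- floor(x / 100)

# For negative input, %/% rounds towards -Inf, while trunc() rounds towards 0.
neg_floor <- (-442009L) %/% 100L
neg_trunc <- trunc(-442009 / 100)

print(trimmed)
print(c(is.integer(trimmed), is.integer(trimmed_dbl)))
print(c(neg_floor, neg_trunc))
```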
If you just want to remove the last 2 characters of a string then substr works.
Or, here is a regular expression that does it as well (less efficiently than substr):
> y <- "a10ba3m1"
> sub("..$", "", y)
[1] "a10ba3"
If you want to remove the last 2 digits (not any character) from a string and the last 2 digits are not guaranteed to be in the last 2 positions, then here is a regular expression that works:
> sub("[0-9]?([^0-9]*)[0-9]([^0-9]*)$", "\\1\\2", y)
[1] "a10bam"
If you want to remove up to 2 digits that appear at the very end (but not if any non digits come after them) then use this regular expression:
> sub("[0-9]{1,2}$", "", y)
[1] "a10ba3m"
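To see how the patterns above differ, here is a sketch that applies two of them to a few inputs. Only "a10ba3m1" comes from the question; the other two strings are made up for illustration.

```r
# "a10ba3m1" is from the question; the other inputs are hypothetical.
y <- c("a10ba3m1", "abc42", "xyz")

# Drop the last two characters, whatever they are.
drop_last_two_chars <- sub("..$", "", y)

# Drop up to two digits, but only if they sit at the very end.
drop_trailing_digits <- sub("[0-9]{1,2}$", "", y)

print(drop_last_two_chars)
print(drop_trailing_digits)
```

The first pattern always shortens a string of two or more characters, while the second leaves a string untouched when it does not end in a digit.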
