Create a valid time from an integer with a built-in function in R

Is there a way to create 8:46:01 from the integer 84601 without using modulo operations in R? Something like format with a mask in other languages, e.g. format(84601, "HHMMSS")? Otherwise modulo division is needed, along with some messy formulas.

format(strptime("084601","%H%M%S"),"%H:%M:%S")
works, but you have to make sure that you have a two-digit hour, for example:
x <- "84601"
Put a zero in front of any 5-digit numeric strings:
xx <- gsub("([0-9]{5})","0\\1",x)
(or, as @Frank says in a comment, sprintf("%06d", x) will work if x is a number rather than a string, e.g. sprintf("%06d", 84601L) ...)
Convert:
format(strptime(xx,"%H%M%S"),"%H:%M:%S")
(If you don't call format(), you'll get a date-time object with the current date filled in.)
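Putting those steps together, here is a minimal vectorized sketch that accepts numbers directly. The name int_to_hms is hypothetical, and it assumes every input encodes a valid time of day as HHMMSS; invalid values such as 996000 come back as NA. Passing tz = "UTC" is a choice made here so that a daylight-saving gap in the local time zone cannot turn a valid time into NA.

```r
# Hypothetical helper: convert numbers like 84601 to "08:46:01".
# Assumes each value encodes a valid time of day as HHMMSS.
int_to_hms <- function(x) {
  # Zero-pad to six digits so the hour always has two digits
  padded <- sprintf("%06d", as.integer(x))
  # Parse as a time; the date part is filled in with today's date (in UTC)
  parsed <- strptime(padded, format = "%H%M%S", tz = "UTC")
  # Keep only the time-of-day part, as a string
  format(parsed, "%H:%M:%S")
}

int_to_hms(c(84601, 123456, 5))
```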

Just treat it as a string:
x <- 84601
# index from end in case of extra hours digit
y <- paste0(substr(x, 1, nchar(x)-4), ':',
substr(x, nchar(x)-3, nchar(x)-2), ':',
substr(x, nchar(x)-1, nchar(x)))
y
# [1] "8:46:01"
Or with regex:
y <- gsub('(.?.)(..)(..)', '\\1:\\2:\\3', x)
y
# [1] "8:46:01"
Or with format (formatting numbers, not time):
y <- format(x, big.mark = ':', big.interval = 2L)
y
# [1] "8:46:01"
If you need an actual time class, chron::times is nice:
chron::times(y)
# [1] 08:46:01

Related

Put left-padded zeros inside a string

I want to write a function which takes a character vector (including numbers) as input and left-pads zeros to the numbers in it. For example, this could be an input vector:
x<- c("abc124.kk", "77kk-tt", "r5mm")
x
[1] "abc124.kk" "77kk-tt" "r5mm"
Each string of the input vector contains only one number, but the numbers are all in different positions (some are at the end, some in the middle...).
I want the output to look like this:
"abc124.kk" "077kk-tt" "r005mm"
That means putting as many leading zeros in front of the number in each string as needed so that it has as many digits as the longest number.
But I want a function that does this for any string input, not only my example (the x vector).
I have already started extracting the numbers and letters and formatted the numbers the way I want them, but how can I put them back together in the right position?
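library(stringr)
my_function <- function(x){
  letters <- str_extract_all(x, "[a-z]+")
  numbers <- str_extract_all(x, "[0-9]+")
  digit_width <- max(nchar(unlist(numbers)))
  numbers_correct <- str_pad(unlist(numbers), width = digit_width, pad = "0")
  # still missing: putting numbers_correct back into x at the right positions
}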
And what if I have a vector which includes some strings without numbers? How can I exclude them and get them back without any changes?
For example, if the input were
y<- c("12ab", "cd", "ef345")
the numbers variable looks like this:
[[1]]
[1] "12"
[[2]]
character(0)
In this case I would want the output to look like this:
"012ab" "cd" "ef345"
An option would be to use gsubfn to capture the digits, convert them to numeric and then pass them to sprintf for formatting (the width of 3 is hard-coded here to match the longest number in the example):
library(gsubfn)
gsubfn("([0-9]+)", ~ sprintf("%03d", as.numeric(x)), x)
#[1] "abc124.kk" "077kk-tt" "r005mm"
x <- c("12ab", "cd", "ef345")
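s = gsub("\\D", "", x)   # digits of each string ("" if it has none)
n = nchar(s)             # number of digits in each string
max_n = max(n)           # digit count of the longest number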
sapply(seq_along(x), function(i){
if (n[i] < max_n) {
zeroes = paste(rep(0, max_n - n[i]), collapse = "")
gsub("\\d+", paste0(zeroes, s[i]), x[i])
} else {
x[i]
}
})
#[1] "012ab" "cd" "ef345"
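The gsubfn answer hard-codes a width of 3, and the sapply answer assumes at most one number per string. A base R sketch that computes the width from the data and pads every number in place with regmatches<- could look like this (pad_numbers is a hypothetical name; it assumes the numbers fit in an R integer):

```r
# Hypothetical helper: left-pad every run of digits to the width of the longest one.
pad_numbers <- function(x) {
  m <- gregexpr("[0-9]+", x)             # locate every run of digits
  nums <- regmatches(x, m)               # list of digit strings per element
  width <- max(nchar(unlist(nums)), 1L)  # width of the longest number
  fmt <- paste0("%0", width, "d")        # e.g. "%03d"
  # Strings without digits have character(0) here and stay unchanged
  regmatches(x, m) <- lapply(nums, function(n) sprintf(fmt, as.integer(n)))
  x
}

pad_numbers(c("abc124.kk", "77kk-tt", "r5mm"))
pad_numbers(c("12ab", "cd", "ef345"))
```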

Displaying only first 100 chars of character object

I have read in a corpus of text with the following command and assigned it to an object in R. Now I would like to display only the first 100 characters of the character object. How is this possible?
text <- readChar(fileName, file.info(fileName)$size)
> class(text)
[1] "character"
> nchar(text)
[1] 32460
Using substr ?
substr(text, 1, 100)
or even substring
substring(text, 1, 100)
substr(x, start, stop)
substring(text, first, last = 1000000L)
substr(x, start, stop) <- value
substring(text, first, last = 1000000L) <- value
Use these for extracting or replacing substrings in a character vector.
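substr() and substring() are not interchangeable: substr() returns one result per element of x, while substring() recycles first and last, so a single call can also cut one long string into consecutive pieces, for instance to page through the corpus 100 characters at a time. A small sketch on a toy string (the chunk size of 3 is arbitrary):

```r
s <- "abcdefghij"
size <- 3
starts <- seq(1, nchar(s), by = size)       # 1, 4, 7, 10
ends <- pmin(starts + size - 1, nchar(s))   # 3, 6, 9, 10
chunks <- substring(s, starts, ends)        # substring() recycles over starts/ends
chunks
# [1] "abc" "def" "ghi" "j"
```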

Adding leading 0s in R

I have a large data frame that is filled with characters such as:
x <- c("Y188","Y204" ,"Y221","EP121_1" ,"Y233" , "Y248" ,"Y268", "BB2","BB20",
"BB32" ,"BB044" ,"BB056" , "Y234" , "Y249" ,"Y271" ,"BB3", "BB21", "BB33",
"BB045","BB057" ,"Y236", "Y250", "Y272" , "BB4", "BB22" )
As you can see, certain tags such as BB20 only have two digits. I would like every element to have at least 3 digits, like this (the issue is only in the BB tags, if that helps):
Y188, Y204, Y221, EP121_1, Y233, Y248, Y268, BB002, BB020, BB032, BB044,
BB056, Y234, Y249, Y271, BB003, BB021, BB033, BB045, BB057, Y236, Y250,
Y272, BB004, BB022
I've looked into the sprintf and formatC functions but am still having no luck.
A forceful approach with a nested gsub call:
gsub("(.*[A-Z])(\\d{1}$)", "\\100\\2",
gsub("(.*[A-Z])(\\d{2}$)", "\\10\\2", x))
# [1] "Y188" "Y204" "Y221" "EP121_1" "Y233" "Y248" "Y268" "BB002" "BB020"
# [10] "BB032" "BB044" "BB056" "Y234" "Y249" "Y271" "BB003" "BB021" "BB033"
# [19] "BB045" "BB057" "Y236" "Y250" "Y272" "BB004" "BB022"
There is surely a more general way to do this, but for such a localized task, two simple sub() calls can be enough: add two leading zeros to one-digit numbers and one leading zero to two-digit numbers.
x <- sub("^BB(\\d{1})$","BB00\\1",x)
x <- sub("^BB(\\d{2})$","BB0\\1",x)
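This works, but has an edge case: all digits in a string are concatenated before padding, so a tag with digits in more than one place (e.g. "A1B2") would be rewritten as "AB012".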
# indicator for numeric of length less than three
num <- gsub("[^0-9]", "", x)
id <- nchar(num) < 3
# overwrite relevant values with the reformatted ones
x[id] <- paste0(gsub("[0-9]", "", x)[id],
formatC(as.numeric(num[id]), width = 3, flag = "0"))
[1] "Y188" "Y204" "Y221" "EP121_1" "Y233" "Y248" "Y268" "BB002" "BB020" "BB032"
[11] "BB044" "BB056" "Y234" "Y249" "Y271" "BB003" "BB021" "BB033" "BB045" "BB057"
[21] "Y236" "Y250" "Y272" "BB004" "BB022"
It can be done using the sprintf and gsub functions. This step extracts the numeric values and changes their format:
num=sprintf("%03d",as.numeric(gsub("[^[:digit:]]", "", x)))
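The next step pastes the reformatted numbers back onto the letters. Note that this rebuilds every element from its letters and digits only, so "EP121_1" becomes "EP1211":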
x=paste(gsub("[^[:alpha:]]", "", x),num,sep="")
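A sketch that touches only elements made of letters followed by digits (so "EP121_1" is left alone) and pads the trailing number to at least three digits. The pattern ^[A-Za-z]+[0-9]+$ is an assumption about what a "tag" looks like in this data, and the short x below is a subset of the question's vector:

```r
x <- c("Y188", "EP121_1", "BB2", "BB20", "BB044")
# Only elements that are letters followed by digits are reformatted
is_tag <- grepl("^[A-Za-z]+[0-9]+$", x)
prefix <- sub("[0-9]+$", "", x[is_tag])     # e.g. "BB"
digits <- sub("^[A-Za-z]+", "", x[is_tag])  # e.g. "2"
# formatC pads with zeros to width 3; longer numbers are kept as they are
x[is_tag] <- paste0(prefix, formatC(as.integer(digits), width = 3, flag = "0"))
x
# "Y188" "EP121_1" "BB002" "BB020" "BB044"
```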

Extract/Remove portion of an Integer or string with random digits/characters in R

Say I have an integer
x <- as.integer(442009)
or a character string
y <- "a10ba3m1"
How do I eliminate the last two digits/characters of an integer/string of any length in general?
substr returns substrings:
substr(x, 1, nchar(x)-2)
# [1] "4420"
substr(y, 1, nchar(y)-2)
# [1] "a10ba3"
If you know that the value is an integer, then you can just divide by 100 and drop the decimal part. This is probably a little more efficient than converting it to a string and back. Note that floor() returns a double; x %/% 100L keeps the integer type.
> x <- as.integer(442009)
> floor(x/100)
[1] 4420
If you just want to remove the last 2 characters of a string then substr works.
Or, here is a regular expression that does it as well (less efficiently than substr):
> y <- "a10ba3m1"
> sub("..$", "", y)
[1] "a10ba3"
If you want to remove the last 2 digits (not any character) from a string and the last 2 digits are not guaranteed to be in the last 2 positions, then here is a regular expression that works:
> sub("[0-9]?([^0-9]*)[0-9]([^0-9]*)$", "\\1\\2", y)
[1] "a10bam"
If you want to remove up to 2 digits that appear at the very end (but not if any non digits come after them) then use this regular expression:
> sub("[0-9]{1,2}$", "", y)
[1] "a10ba3m"
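The two main ideas above (integer division for integers, substr for strings) can be wrapped as small helpers. The names drop_last_digits and drop_last_chars and the default n = 2 are hypothetical:

```r
# Hypothetical helpers; n is the number of trailing digits/characters to drop.
drop_last_digits <- function(x, n = 2L) {
  # Integer division keeps an integer input as integer.
  # It floors, so negative inputs round down (-442009L gives -4421L).
  x %/% as.integer(10^n)
}
drop_last_chars <- function(x, n = 2L) {
  # substr is vectorized over x; strings shorter than n become ""
  substr(x, 1, nchar(x) - n)
}

drop_last_digits(442009L)    # 4420L
drop_last_chars("a10ba3m1")  # "a10ba3"
```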

Convert HH:MM:SS to hours (for more than 24 hours) in R

I would like to convert durations of more than 24 hours to decimal hours in R.
For example, I have a data frame which contains hours and minutes in the form [HH:MM]:
[1] "111:15" "221:15" "111:15" "221:15" "42:05"
I want them to be converted in hours like this:
"111.25" "221.25" "111.25" "221.25" "42.08333333"
The as.POSIXct() function works for general purposes, but not for more than 24 hours.
You can split the strings with strsplit and use sapply to transform all values.
vec <- c("111:15", "221:15", "111:15", "221:15", "42:05")
sapply(strsplit(vec, ":"), function(x) {
x <- as.numeric(x)
x[1] + x[2] / 60
})
The result:
[1] 111.25000 221.25000 111.25000 221.25000 42.08333
I would just parse the strings with a regex: grab the part before the :, then add the part after the : divided by 60.
> foo = c("111:15", "221:15", "111:15", "221:15", "42:05")
> foo
[1] "111:15" "221:15" "111:15" "221:15" "42:05"
> as.numeric(gsub("([^:]+).*", "\\1", foo)) + as.numeric(gsub(".*:([0-9]{2})$", "\\1", foo))/60
[1] 111.25000 221.25000 111.25000 221.25000 42.08333
Another possibility is a vectorized function such as:
FUN <- function(time){
hours <- sapply(time,FUN=function(x) as.numeric(strsplit(x,split=":")[[1]][1]))
minutes <- sapply(time,FUN=function(x) as.numeric(strsplit(x,split=":")[[1]][2]))
result <- hours+(minutes/60)
return(as.numeric(result))
}
Here strsplit extracts the hours and minutes, and the minutes are divided by 60 before being added to the hours.
You can then use the function like this:
FUN(c("111:15","221:15","111:15","221:15","42:05"))
[1] 111.25000 221.25000 111.25000 221.25000 42.08333
strapply: Here is a solution using strapply in the gsubfn package. It passes the text matched by each parenthesized subexpression (i.e. the hours and the minutes) to the function given as the third argument. The function can be specified using the usual R function notation, and strapply also supports a short form using a formula (used here), where the right-hand side of the formula is the function body and the left-hand side represents the arguments; if the left-hand side is omitted, the arguments default to the free variables of the right-hand side (here h and m). We suppose that the original character vector is ch.
library(gsubfn)
strapply(ch, "(\\d+):(\\d+)", ~ as.numeric(h) + as.numeric(m)/60, simplify = TRUE)
Numeric processing: Another way is to replace the : with a . and manipulate the result numerically into what we want:
num <- as.numeric(chartr(":", ".", ch))
trunc(num) + 100 * (num %% 1) / 60
sub: This is yet another approach:
h <- as.numeric(sub(":.*", "", ch))
m <- as.numeric(sub(".*:", "", ch))
h + m / 60
Each of the snippets above gives a numeric result, but we could wrap it in as.character(...) if a character result were desired.
read.table (this one returns a one-column matrix):
as.matrix(read.table(text = ch, sep = ":")) %*% c(1, 1/60)
eval/parse: This one manipulates each string into an R expression, which is then evaluated. It is short, but the use of eval is often frowned upon:
sapply(parse(text = sub(":", "+(1/60)*", ch)), eval)
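The question's title mentions HH:MM:SS, while the data are HH:MM. A sketch that handles both by dividing the i-th field by 60^(i-1) (hms_to_hours is a hypothetical name; it assumes every field is a plain non-negative number):

```r
# Hypothetical helper: "HH:MM" or "HH:MM:SS" (hours may exceed 24) to decimal hours.
hms_to_hours <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) {
    p <- as.numeric(p)
    # hours + minutes/60 (+ seconds/3600 when present)
    sum(p / 60^(seq_along(p) - 1))
  }, numeric(1))
}

hms_to_hours(c("111:15", "42:05", "100:30:36"))
```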

Resources