what is the effect of print function in sapply? - r

data <-c("001","002","103","119","129")
n1<- sapply(data,function(x){
x<-gsub(pattern="(\\d+)(\\d\\d)$","\\2",x)
if(gsub("(\\d)(\\d)","\\1",x)=="0")
x <- gsub("(\\d)(\\d)","\\2",x)
},USE.NAMES=FALSE)
n2<- sapply(data,function(x){
x<-gsub(pattern="(\\d+)(\\d\\d)$","\\2",x)
if(gsub("(\\d)(\\d)","\\1",x)=="0")
x <- gsub("(\\d)(\\d)","\\2",x)
print(x)},USE.NAMES=FALSE)
Why does n2 give the vector "1" "2" "3" "19" "29" while n1 does not? The only difference is that n2 has one extra line, print(x). What is the effect of the print function here?

What is happening here exactly is a little easier to spot when we apply some better indentation and add some spaces:
n2 <- sapply(data, function(x) {
  x <- gsub(pattern = "(\\d+)(\\d\\d)$", "\\2", x)
  if (gsub("(\\d)(\\d)", "\\1", x) == "0") x <- gsub("(\\d)(\\d)", "\\2", x)
  print(x)
}, USE.NAMES = FALSE)
If you do not use an explicit return statement, an R function returns the value of the last expression it evaluated. In the first case, when the if condition is FALSE, the final x <- assignment is skipped and the if statement itself evaluates to NULL, so NULL is returned. Adding print(x) both prints the value to the screen and makes it the function's return value, because print returns its argument (invisibly). This is why the second case always has a valid (non-NULL) return value.
Instead of print(x), I would use return(x), or simply x.
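To make this concrete, here is a minimal sketch. The names f, strip_leading and n3 are made up for illustration and do not appear in the question. It shows that an if without else evaluates to NULL when its condition is FALSE, and that ending the function with a bare x gives the expected vector without printing anything.

```r
# An `if` with a FALSE condition and no `else` evaluates to NULL,
# so a function that ends with such an `if` returns NULL.
f <- function(x) {
  if (x == "0") x <- "zero"
}
is.null(f("1"))
# [1] TRUE

data <- c("001", "002", "103", "119", "129")

# Same logic as in the question, but ending with a bare `x`
# so that every call returns a value.
strip_leading <- function(x) {
  x <- gsub("(\\d+)(\\d\\d)$", "\\2", x)
  if (gsub("(\\d)(\\d)", "\\1", x) == "0") x <- gsub("(\\d)(\\d)", "\\2", x)
  x
}

n3 <- sapply(data, strip_leading, USE.NAMES = FALSE)
n3
# [1] "1"  "2"  "3"  "19" "29"
```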

Related

put left padded zeros inside string

I want to write a function that takes a character vector (whose strings include numbers) as input and left-pads the numbers in it with zeros. For example, this could be an input vector:
x<- c("abc124.kk", "77kk-tt", "r5mm")
x
[1] "abc124.kk" "77kk-tt" "r5mm"
Each string of the input vector contains only one number, but the numbers are all in different positions (some are at the end, some in the middle...).
I want the output to look like this:
"abc124.kk" "077kk-tt" "r005mm"
That is, add as many leading zeros to the number in each string as needed so that it has as many digits as the longest number.
But I want a function that does this for any input vector of strings, not only my example (the x vector).
I have already started extracting the numbers and letters and formatted the numbers the way I want them, but how can I put them back together in the right position?
my_function<- function(x){
letters<- str_extract_all(x,"[a-z]+")
numbers<- str_extract_all(x, "[0-9]+")
digit_width<-max(nchar(numbers))
numbers_correct<- str_pad(numbers, width=digit_width, pad="0")
}
And what if I have a vector that includes some strings without numbers? How can I exclude them and get them back without any changes?
For example, if the input were
y<- c("12ab", "cd", "ef345")
the numbers variable looks like this:
[[1]]
[1] "12"
[[2]]
character(0)
In this case I would want the output at the end to look like this:
"012ab" "cd" "ef345"
An option would be to use gsubfn to capture the digits, convert them to numeric, and then pass them to sprintf for formatting:
library(gsubfn)
gsubfn("([0-9]+)", ~ sprintf("%03d", as.numeric(x)), x)
#[1] "abc124.kk" "077kk-tt" "r005mm"
x <- c("12ab", "cd", "ef345")
s = gsub("\\D", "", x)
n = nchar(s)
max_n = max(n)
sapply(seq_along(x), function(i) {
  if (n[i] < max_n) {
    zeroes = paste(rep(0, max_n - n[i]), collapse = "")
    gsub("\\d+", paste0(zeroes, s[i]), x[i])
  } else {
    x[i]
  }
})
#[1] "012ab" "cd" "ef345"
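For comparison, here is a base R sketch of the same idea without extra packages. It assumes, as the question states, at most one run of digits per string. The function name pad_numbers is made up for this example. It locates the first digit run with regexpr, pads it to the width of the longest run, and writes it back in place with the replacement form of regmatches.

```r
# Pad the (single) number in each string with leading zeros so that
# all numbers have the same width. Strings without digits are untouched.
pad_numbers <- function(x) {
  m <- regexpr("[0-9]+", x)        # position of the first digit run, -1 if none
  nums <- regmatches(x, m)         # only the runs that were actually found
  if (length(nums) == 0) return(x) # nothing to pad
  width <- max(nchar(nums))
  padded <- paste0(strrep("0", width - nchar(nums)), nums)
  regmatches(x, m) <- padded       # write the padded runs back in place
  x
}

pad_numbers(c("abc124.kk", "77kk-tt", "r5mm"))
# [1] "abc124.kk" "077kk-tt"  "r005mm"
pad_numbers(c("12ab", "cd", "ef345"))
# [1] "012ab" "cd"    "ef345"
```

Padding the digit strings directly, rather than converting to numeric and back, keeps any leading zeros that are already present.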

r identify and return string prior to last delimiter

I have found similar questions on stackoverflow, but not exactly what I'm looking for. I have a vector like:
x <- c('w/x/y/z', 'x/y/z', 'y/z')
I want to return a vector that gives me the string (in my real data, a longer string than 1 letter) that is before the last / and prior to any other /. So, the output for this would be:
y <- c('y', 'y', 'y')
I have tried using strsplit and tail to split by / and return the last 2 items, but don't know how to only get the second to last item.
spl <- strsplit(x, '/')
y <- sapply(lapply(spl, tail, 2), paste, collapse = '/')
y
[1] "y/z" "y/z" "y/z"
ifelse(grepl("/", x), sapply(strsplit(x, "/"), function(S) tail(S, 2)[1]), NA)
#[1] "y" "y" "y"
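The one-liner above can also be written as a small helper that indexes the second-to-last piece directly. This is only a sketch: the name second_to_last and the sep argument are made up for this example, and strings without a delimiter are assumed to map to NA, as in the ifelse version.

```r
# Return the piece just before the last delimiter, or NA when the
# string has fewer than two pieces.
second_to_last <- function(x, sep = "/") {
  parts <- strsplit(x, sep, fixed = TRUE)
  vapply(parts, function(p) {
    if (length(p) >= 2) p[length(p) - 1] else NA_character_
  }, character(1))
}

second_to_last(c("w/x/y/z", "x/y/z", "y/z", "z"))
# [1] "y" "y" "y" NA
```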

Time conversion to number

There is a vector with time values. How can I remove the colons and convert the text values to numeric values, i.e. from "10:01:02" (character) to 100102 (numeric)? All that I could find is presented below.
> x <- c("10:01:02", "11:01:02")
> strsplit(x, split = ":")
[[1]]
[1] "10" "01" "02"
[[2]]
[1] "11" "01" "02"
If you want to do everything in one line, you can use the destring() function from taRifx to remove everything that isn't a number and convert the result to numeric.
taRifx::destring(x)
This will also work if some of your data is formatted in a different way, such as "10-01-02", though you may have to set the value of keep.
destring("10-10-10", keep = "0-9")
And if you don't want to have to install the taRifx package you can define the destring() function locally.
destring <- function(x, keep = "0-9.-")
{
  return(as.numeric(gsub(paste("[^", keep, "]+", sep = ""),
                         "", x)))
}
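A short sketch of why keep matters, using a local destring() equivalent to the definition given in this answer (not the taRifx package itself). With the default keep = "0-9.-", hyphens are retained, so a value like "10-01-02" cannot be parsed as a number; restricting keep to digits fixes that.

```r
# Local definition, equivalent to the one in this answer
destring <- function(x, keep = "0-9.-") {
  as.numeric(gsub(paste0("[^", keep, "]+"), "", x))
}

destring(c("10:01:02", "11:01:02"))    # colons are dropped
# [1] 100102 110102

suppressWarnings(destring("10-01-02")) # hyphens are kept, so parsing fails
# [1] NA

destring("10-01-02", keep = "0-9")     # keep digits only
# [1] 100102
```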
We can use gsub to replace : with "". After that, use as.numeric to do the conversion.
x <- as.numeric(gsub(":", "", x, fixed = TRUE))
Or we can use the regex suggested by Soto
x <- as.numeric(gsub('\\D+', '', x))
Try with
x <- as.numeric(gsub(":", "", x))
and then to make sure
class(x)

How to split a string from right-to-left, like Python's rsplit()?

Suppose a vector:
xx.1 <- c("zz_ZZ_uu_d", "II_OO_d")
I want to get a new vector in which each string is split at the rightmost delimiter, and only once. The expected result would be:
c("zz_ZZ_uu", "d", "II_OO", "d").
It would be like Python's rsplit() function. My current idea is to reverse the string and split it with str_split() in stringr.
Any better solutions?
update
Here is my solution returning n splits, depending on stringr, stringi and purrr. It would be nice if someone provided a version with base functions.
rsplit <- function (x, s, n) {
  cc1 <- unlist(stringr::str_split(stringi::stri_reverse(x), s, n))
  cc2 <- rev(purrr::map_chr(cc1, stringi::stri_reverse))
  return(cc2)
}
Negative lookahead:
unlist(strsplit(xx.1, "_(?!.*_)", perl = TRUE))
# [1] "zz_ZZ_uu" "d" "II_OO" "d"
Where a(?!b) says to find such an a which is not followed by a b. In this case .*_ means that no matter how far (.*) there should not be any more _'s.
However, it seems to be not that easy to generalise this idea. First, note that it can be rewritten as positive lookahead with _(?=[^_]*$) (find _ followed by anything but _, here $ signifies the end of a string). Then a not very elegant generalisation would be
rsplit <- function(x, s, n) {
  p <- paste0("[^", s, "]*")
  rx <- paste0(s, "(?=", paste(rep(paste0(p, s), n - 1), collapse = ""), p, "$)")
  unlist(strsplit(x, rx, perl = TRUE))
}
vec <- c("a_b_c_d_e_f_g", "a_b")  # example input, inferred from the outputs shown
rsplit(vec, "_", 1)
# [1] "a_b_c_d_e_f" "g" "a" "b"
rsplit(vec, "_", 3)
# [1] "a_b_c_d" "e_f_g" "a_b"
where e.g. in case n=3 this function uses _(?=[^_]*_[^_]*_[^_]*$).
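To see how that pattern is assembled, here is a small sketch that isolates the regex-building step of the function above. The helper name build_rsplit_regex is made up for this example. It only rearranges the code already shown and assumes s is a single character with no special meaning inside a character class.

```r
# Build the lookahead pattern that matches the n-th separator from the right.
build_rsplit_regex <- function(s, n) {
  p <- paste0("[^", s, "]*")  # any run of non-separator characters
  paste0(s, "(?=", paste(rep(paste0(p, s), n - 1), collapse = ""), p, "$)")
}

build_rsplit_regex("_", 1)
# [1] "_(?=[^_]*$)"
build_rsplit_regex("_", 3)
# [1] "_(?=[^_]*_[^_]*_[^_]*$)"

strsplit("a_b_c_d_e_f_g", build_rsplit_regex("_", 3), perl = TRUE)[[1]]
# [1] "a_b_c_d" "e_f_g"
```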
Another two. In both I use "(.*)_(.*)" as the pattern to capture both parts of the string. Remember that * is greedy so the first (.*) will match as many characters as it can.
Here I use regexec to capture where your substrings start and end, and regmatches to reconstruct them:
unlist(lapply(regmatches(xx.1, regexec("(.*)_(.*)", xx.1)),
tail, -1))
And this one is a little less academic but easy to understand:
unlist(strsplit(sub("(.*)_(.*)", "\\1###\\2", xx.1), "###"))
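The effect of the greedy first group can be checked by extracting each capture group on its own. In this sketch the variable names left and right are made up for illustration.

```r
xx.1 <- c("zz_ZZ_uu_d", "II_OO_d")

# The first (.*) is greedy, so it swallows everything up to the LAST "_".
left  <- sub("(.*)_(.*)", "\\1", xx.1)
right <- sub("(.*)_(.*)", "\\2", xx.1)

left
# [1] "zz_ZZ_uu" "II_OO"
right
# [1] "d" "d"
```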
What about just pasting it back together after it's split?
rsplit <- function( x, s ) {
  # split the argument x (a single string), not a hard-coded value
  spl <- strsplit( x, s, fixed=TRUE )[[1]]
  res <- paste( spl[-length(spl)], collapse=s, sep="" )
  c( res, spl[length(spl)] )
}
> rsplit("zz_ZZ_uu_d", "_")
[1] "zz_ZZ_uu" "d"
I also thought about a very similar approach to that of Ari
> res <- lapply(strsplit(xx.1, "_"), function(x){
c(paste0(x[-length(x)], collapse="_" ), x[length(x)])
})
> unlist(res)
[1] "zz_ZZ_uu" "d" "II_OO" "d"
This gives exactly what you want and is the simplest approach:
require(stringr)
as.vector(t(str_match(xx.1, '(.*)_(.*)') [,-1]))
[1] "zz_ZZ_uu" "d" "II_OO" "d"
Explanation:
str_split() is not the droid you're looking for, because it only splits left-to-right, and splitting and then re-pasting all the (n-1) leftmost pieces is a total waste of time. So use str_match() with a regex that has two capture groups. Note that the first (.*)_ will greedily match everything up to the last occurrence of _, which is what you want. (This will fail if there isn't at least one _, and return NAs.)
str_match() returns a matrix where the first column is the entire string, and subsequent columns are individual capture groups. We don't want the first column, so drop it with [,-1]
as.vector() will unroll that matrix column-wise, which is not what you want, so we use t() to transpose it to unroll row-wise
str_match(string, pattern) is vectorized over both string and pattern, which is neat

Recording output from loops in R

I am doing an exercise in my R class, and I hope you can help. The task is to create my own script that determines whether or not a number is a palindrome. My idea was to create a repetition structure that records each digit in a number of any size, compares those digits in order, and then makes a call as to whether the number is a palindrome or not.
So far, I thought I could use the "for" command to break the number down, like this:
# Initialize
Number <- 242
Number
N <- nchar(Number)
N
# Find numbers and digits
if (Number == 0) {
print ("Number must be greater than 0")
}
if (Number < 0) {
print ("Number must be greater than 0")
}
for (i in 1:N) {
print (Number)
Digit <- Number %/% 10^(N-1)
print (Digit)
Number <- Number %% 10^(N-1)
N <- N-1
}
The problem, though, is that since this structure overwrites the variables in each loop, I cannot print all the digits out separately once the loop is done. Can I command R to print out and record the digits produced in each loop, so that they can be compared to each other downstream and used to assess whether the original number was a palindrome or not? Thanks for your help.
There are better ways of checking for palindrome-ness in R, for which you should see the other answers. For your specific problem of keeping track of things during a for loop, one approach is to make a vector that's as long as the for loop and assign to the ith element of the vector in the ith iteration of the loop.
Number <- 12345
N <- nchar(Number)
backwardsDigits <- numeric(N) ## a vector of numerics of length N
for (i in N:1) {
  backwardsDigits[i] <- Number %/% 10^(i-1)
  Number <- Number %% 10^(i-1)
}
backwardsDigits
all(backwardsDigits == rev(backwardsDigits))
You could use forwardsDigits instead by writing to forwardsDigits[N - i + 1] in the loop. You don't really need to print anything during the loop, though it can be helpful for debugging.
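A minimal sketch of the forwardsDigits variant mentioned above. It uses the same loop and changes only the index that is written to.

```r
Number <- 12345
N <- nchar(Number)
forwardsDigits <- numeric(N)  # a vector of numerics of length N

for (i in N:1) {
  # the most significant digit comes out first, so store it at position 1
  forwardsDigits[N - i + 1] <- Number %/% 10^(i - 1)
  Number <- Number %% 10^(i - 1)
}

forwardsDigits
# [1] 1 2 3 4 5
all(forwardsDigits == rev(forwardsDigits))
# [1] FALSE
```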
As @thelatemail suggested, there is another (perhaps more intuitive) way to do this.
First, let's convert the number 117712 to a string and split it up.
charsplit <- strsplit(as.character(117712), "")
[[1]]
[1] "1" "1" "7" "7" "1" "2"
Then, we'll take it out of list form and reverse it
revchar <- rev(unlist(charsplit))
[1] "2" "1" "7" "7" "1" "1"
Finally, we'll paste these together and convert them into a number:
palinum <- as.numeric(paste(revchar, collapse=""))
[1] 217711
We can then check if they're identical:
117712 == palinum
[1] FALSE
We can even write a function to do it for us.
is.palindrome <- function(number) {
  charsplit <- strsplit(as.character(number), "")
  revchar <- rev(unlist(charsplit))
  palinum <- as.numeric(paste(revchar, collapse=""))
  number == palinum
}
is.palindrome(117712)
[1] FALSE
is.palindrome(117711)
[1] TRUE
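A possible variation, sketched here under the assumption that the input is a single value whose as.character() form is plain digits: compare the characters directly instead of converting the reversed string back to a number. Large or round doubles can print in scientific notation, so pass those as strings. The name is_palindrome_chr is made up for this example.

```r
# Compare the digit characters with their reverse, without converting
# the reversed string back to a number.
is_palindrome_chr <- function(number) {
  chars <- strsplit(as.character(number), "")[[1]]
  identical(chars, rev(chars))
}

is_palindrome_chr(117712)
# [1] FALSE
is_palindrome_chr(117711)
# [1] TRUE
is_palindrome_chr("12321")
# [1] TRUE
```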
