odd behavior of print within mapply - r

I am seeing some unexpected behavior (to me anyway) when print() is included as a side effect in a function wrapped in mapply().
For example, this works as expected (and yes I know it's not how we add vectors):
mapply(function(i,j) i+j, i=1:3, j=4:6) # returns [1] 5 7 9
And so does this:
mapply(function(i,j) paste(i, "plus", j, "equals", i+j), i=1:3, j=4:6)
# returns [1] "1 plus 4 equals 5" "2 plus 5 equals 7" "3 plus 6 equals 9"
But this doesn't:
mapply(function(i,j) print(paste(i, "plus", j, "equals", i+j)), i=1:3, j=4:6)
# returns:
# [1] "1 plus 4 equals 5"
# [1] "2 plus 5 equals 7"
# [1] "3 plus 6 equals 9"
# [1] "1 plus 4 equals 5" "2 plus 5 equals 7" "3 plus 6 equals 9"
What's going on here? I haven't used mapply() in a while, so maybe this is a no-brainer... I'm using R version 3.4.0.

print both prints its argument and returns its value.
p <- print("abc")
# [1] "abc"
p
# [1] "abc"
So each element gets printed, then the vector of stuff gets returned (and printed). Try e.g. invisible(mapply(...)) or m <- mapply(...) for comparison.
FWIW cat() returns NULL ...
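A minimal sketch of the two options above, assuming a hypothetical helper add_msg() that only builds the string. It is meant as an illustration, not the only way to separate the side effect from the return value:

```r
# Hypothetical helper (not in the original post): builds the message only.
add_msg <- function(i, j) paste(i, "plus", j, "equals", i + j)

# print() shows each string as a side effect and returns it invisibly,
# so mapply() still collects the three values. Assigning the result
# (or wrapping the call in invisible()) suppresses the final auto-print.
m <- mapply(function(i, j) print(add_msg(i, j)), i = 1:3, j = 4:6)

# message() writes to stderr and returns NULL invisibly, so here the
# value to collect is returned explicitly on the last line.
m2 <- mapply(function(i, j) {
  message(add_msg(i, j))
  i + j
}, i = 1:3, j = 4:6)
```

With the assignment in place, only the three per-element lines appear on the console; the collected vector stays in m until you ask for it.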


Use regex to extract mixed fraction and text that may also contain mixed fractions with R (stringr)

Please see below for a sample of the data I am working with; I have 100K+ entries in total.
Note that the ... in the comments under UNIT is just to make the text fit. For example, the full UNIT text for the first item is 4- to 5-mm-diameter and for the fifth item is 3 1/2- to 4-inch-diameter, etc.
library(tidyverse)
#i QTY UNIT
parts <- c("6 4- to 5-mm-diameter plugs", #1 6 4- to...diameter
"6 large bricks", #2 6 large
"1 1/3 shipment concrete", #3 1.33 shipment
"1 (14- to 15-oz) gold bars", #4 1 (14- to 15-oz)
"16 3 1/2- to 4-inch-diameter caps", #5 16 3 1/2- to...eter
"1 1/2 tons sand", #6 1.5 tons
"2 1 1/4- to 3-inch diameter caps", #7 2 1 1/4- to...eter
"1/3 shipment cement") #8 .333 shipment
I've had some moderate success working from some of the answers on SO but I run into problems when the UNIT text also contains mixed fractions as in items 1 and 5:
# Goal: extract QTY as mixed frac
parts %>%
str_extract("(\\d+[\\/\\d[ ]?]*|\\d*)")
# i=1, 5 broken
#[1] "6 4" "6 " "1 1/3 " "1 " "16 3 1/2" "1 1/2 "
# Goal: extract UNIT word
parts %>%
str_extract("[[:graph:]]{3,11}|[- to ].{5,21}")
# all i with some problem
# [1] " 4- to 5-mm-diameter p" " large bricks" " 1/3 shipment concrete"
# [4] " (14- to 15-oz) gold b" " 3 1/2- to 4-inch-diam" " 1/2 tons sand"
My goal is to extract QTY and UNIT as shown in the code comments. From first to last entry, QTY should be 6, 6, 1 1/3, 1, 16, 1 1/2, 2, 1/3. I am also trying to pull out the UNIT text, which is abbreviated in the comments just so it fits in the code section. Here it is in full: 4- to 5-mm-diameter, large, shipment, (14- to 15-oz), 3 1/2- to 4-inch-diameter, tons, 1 1/4- to 3-inch diameter, shipment.
My intuition suggests I should do this in two steps but please let me know if there are better ways to achieve this.
Thank you.
edit: added a critical example number 8.
You may use
m <- str_match(parts, '^(\\d+(?:\\s+\\d+/\\d+)?|\\d+/\\d+)\\s+((?:\\d+(?:-?in(?:ch)?|")?\\s+)*\\S+(?:\\s+to\\s+(?:\\d+(?:-?in(?:ch)?|")?\\s+)*\\S+)?)')
qty <- m[,2]
# => [1] "6" "6" "1 1/3" "1" "16" "1 1/2" "2" "1/3"
unit <- m[,3]
# => [1] "4- to 5-mm-diameter" "large"
# [3] "shipment" "(14- to 15-oz)"
# [5] "3 1/2- to 4-inch-diameter" "tons"
# [7] "1 1/4- to 3-inch diameter" "shipment"
Details:
^ - start of string
(\d+(?:\s+\d+/\d+)?|\d+/\d+) - Group 1 (m[,2]): either one or more digits optionally followed by one or more whitespaces and a fraction (one or more digits, /, one or more digits), or just a fraction on its own
\s+ - one or more whitespaces
((?:\d+(?:-?in(?:ch)?|")?\s+)*\S+(?:\s+to\s+(?:\d+(?:-?in(?:ch)?|")?\s+)*\S+)?) - Group 2 (m[,3]):
  (?:\d+(?:-?in(?:ch)?|")?\s+)* - zero or more occurrences of one or more digits, followed with an optional " or an optional - plus in and an optional ch substring, and then one or more whitespaces
  \S+ - one or more chars other than whitespace (a "word")
  (?:\s+to\s+(?:\d+(?:-?in(?:ch)?|")?\s+)*\S+)? - an optional occurrence of:
    \s+to\s+ - to enclosed with one or more whitespaces
    (?:\d+(?:-?in(?:ch)?|")?\s+)* - see above
    \S+ - one or more chars other than whitespace.
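If the extracted quantities are needed as numbers (the 1.33, 1.5, .333 in the question's comments), a small base R helper could convert them. This is only a sketch: frac_to_num() is a hypothetical name, and it assumes every string looks like "6", "1/3", or "1 1/3".

```r
# Hypothetical helper: convert "6", "1/3" or "1 1/3" to a number.
# Assumes whitespace-separated tokens that are either integers or
# simple fractions of the form a/b.
frac_to_num <- function(x) {
  vapply(strsplit(x, "\\s+"), function(tokens) {
    values <- vapply(tokens, function(tok) {
      if (grepl("/", tok, fixed = TRUE)) {
        nd <- as.numeric(strsplit(tok, "/", fixed = TRUE)[[1]])
        nd[1] / nd[2]          # numerator / denominator
      } else {
        as.numeric(tok)        # whole-number part
      }
    }, numeric(1))
    sum(values)                # whole part plus fractional part
  }, numeric(1))
}

qty_num <- frac_to_num(c("6", "1 1/3", "1 1/2", "1/3"))
```

Strings that do not follow the assumed shape (for example an NA from a failed match) would need separate handling before calling the helper.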

grep and regex for R phone numbers

I would like to extract the phone numbers from a file using grep and regexpr in R. I know the numbers come in different forms, but I don't know how to write a pattern for each form. The numbers are written in these forms:
xxx-xxx-xxxx ,
(xxx)xxx-xxxx,
xxx xxx xxxx,
xxx.xxx.xxxx
Try this:
phones <- c("foo 111-111-1111 bar" , "(111)111-1111 quux", "who knows 111 111 1111", "111.111.1111 I do", "111)111-1111 should not work", "1111111111 ditto", "a 111-111-1111 b (222)222-2222 c")
re <- gregexpr("(\\(\\d{3}\\)|\\d{3}[-. ])\\d{3}[-. ]\\d{4}", phones)
regmatches(phones, re)
# [[1]]
# [1] "111-111-1111"
# [[2]]
# [1] "(111)111-1111"
# [[3]]
# [1] "111 111 1111"
# [[4]]
# [1] "111.111.1111"
# [[5]]
# character(0)
# [[6]]
# character(0)
# [[7]]
# [1] "111-111-1111" "(222)222-2222"
In the data, I provide a few examples with other text on both, either, and neither side, as well as two examples that should not match. (That is: a starter "test set", as you want to make sure you both match good examples and no-match bad examples.) The last one hopes to match multiple numbers in one string/sentence.
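One way to make that starter test set explicit is to write down the expected outcome for each string and check the pattern against it. A sketch, where the expected vector is my own reading of which examples should match:

```r
phones <- c("foo 111-111-1111 bar", "(111)111-1111 quux",
            "who knows 111 111 1111", "111.111.1111 I do",
            "111)111-1111 should not work", "1111111111 ditto",
            "a 111-111-1111 b (222)222-2222 c")
pattern <- "(\\(\\d{3}\\)|\\d{3}[-. ])\\d{3}[-. ]\\d{4}"

# Assumed expectations: strings 5 and 6 are the deliberate non-matches.
expected <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE)

# grepl() reports whether each string contains at least one match.
found <- grepl(pattern, phones)
stopifnot(identical(found, expected))
```

If the pattern is later loosened or tightened, the stopifnot() line fails as soon as a good example stops matching or a bad one starts to.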
gregexpr and regmatches are useful for finding and extracting or replacing regex-substrings within 1+ strings. For a "replace" example, one could do:
regmatches(phones, re) <- "GONE!"
phones
# [1] "foo GONE! bar" "GONE! quux"
# [3] "who knows GONE!" "GONE! I do"
# [5] "111)111-1111 should not work" "1111111111 ditto"
# [7] "a GONE! b GONE! c"
Obviously a contrived replacement, but certainly usable. Note though that the assignment form regmatches(phones, re) <- ... works by side effect: it modifies the phones vector itself instead of returning a new value. It's possible to get the result without modifying phones by calling the replacement function `regmatches<-` directly, but it is a little less intuitive:
phones # I reset it to the original value
# [1] "foo 111-111-1111 bar" "(111)111-1111 quux"
# [3] "who knows 111 111 1111" "111.111.1111 I do"
# [5] "111)111-1111 should not work" "1111111111 ditto"
# [7] "a 111-111-1111 b (222)222-2222 c"
`regmatches<-`(phones, re, value = "GONE!")
# [1] "foo GONE! bar" "GONE! quux"
# [3] "who knows GONE!" "GONE! I do"
# [5] "111)111-1111 should not work" "1111111111 ditto"
# [7] "a GONE! b GONE! c"
phones
# [1] "foo 111-111-1111 bar" "(111)111-1111 quux"
# [3] "who knows 111 111 1111" "111.111.1111 I do"
# [5] "111)111-1111 should not work" "1111111111 ditto"
# [7] "a 111-111-1111 b (222)222-2222 c"
Edit: scope-creep.
out <- unlist(Filter(length, regmatches(phones, re)))
out
# [1] "111-111-1111" "(111)111-1111" "111 111 1111" "111.111.1111" "111-111-1111"
# [6] "(222)222-2222"
gsub("[^0-9]", "", out)
# [1] "1111111111" "1111111111" "1111111111" "1111111111" "1111111111" "2222222222"
out <- gsub("[^0-9]", "", out)
sprintf("(%s)%s-%s", substr(out, 1, 3), substr(out, 4, 6), substr(out, 7, 10))
# [1] "(111)111-1111" "(111)111-1111" "(111)111-1111" "(111)111-1111" "(111)111-1111"
# [6] "(222)222-2222"

Format function output as a customised multiple lines string

I'm trying to write a function that prints its output in a simple format.
If I have already calculated the estimated values of the betas, how can I get the following result format?
Coefficients
-------------
Constant: 5.2
Beta1: 4
Beta2: 9
Beta3: 2
.
.
.
I tried cat() function but to use cat(), I have to write every line manually like:
cat("Coefficients","\n","-------------","\n","Constant: 5.2","\n","Beta1: 4",....)
Is there any way to make that simple result format?
If you have a vector of 10 results and you want to label them Beta1 to Beta10 you could do:
result = 10:1
b_order = 1:10
paste0("beta", b_order, ": ", result)
This gives:
[1] "beta1: 10" "beta2: 9" "beta3: 8" "beta4: 7" "beta5: 6" "beta6: 5" "beta7: 4" "beta8: 3" "beta9: 2" "beta10: 1"
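To get the multi-line layout from the question, the pasted labels can be passed to cat() with sep = "\n". A minimal sketch, assuming the coefficients sit in a numeric vector with the constant first (coefs and out_lines are made-up names, and the values are the ones shown in the question):

```r
# Assumed input: estimated coefficients, constant term first.
coefs <- c(5.2, 4, 9, 2)

# Build one label per coefficient: "Constant", then "Beta1", "Beta2", ...
labels <- c("Constant", paste0("Beta", seq_len(length(coefs) - 1)))

# Header lines followed by one "label: value" line per coefficient.
out_lines <- c("Coefficients", "-------------", paste0(labels, ": ", coefs))

# sep = "\n" puts each element on its own line; no manual "\n" needed.
cat(out_lines, sep = "\n")
```

Because the labels are generated from the length of coefs, the same code works for any number of betas.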

add trace/breakpoint while already in R's browser

Edit: for the record, the accepted answer has a significant drawback: it re-executes the first n lines of code in the function when re-debugged. This might be okay, but when those lines of code include side effects (e.g., database updates) and/or long-running calculations, it becomes obvious what is happening. I do not believe R provides the ability to do it "properly" (as some other languages do). Bummer.
Some debuggers allow you to dynamically add breakpoints while in the debugger. Is that functionality possible in R? An example:
quux <- function(..)
{ # line 1
"line 2"
"line 3"
"line 4"
"line 5"
"line 6"
}
trace("quux", tracer = browser, at = 3)
# [1] "quux"
quux()
# Tracing quux() step 3
# Called from: eval(expr, envir, enclos)
# Browse[1]>
# debug: [1] "line 3"
While debugging, I believe I want to jump ahead in the code. Imagine the function has a few hundred lines of code, and I'd prefer to not step through them.
I'd like to be able to do this, and jump from the current line to the next interesting line, but unfortunately it just continues out of the function.
# Browse[2]>
trace("quux", tracer = browser, at = 5)
# [1] "quux"
# Browse[2]>
c
# [1] "line 6"
# # (out of the debugger)
The trace call while in the debugger merely added the breakpoint to the original (global) function, as shown if I immediately call the function again:
quux()
# Tracing quux() step 5
# Called from: eval(expr, envir, enclos)
# Browse[1]>
# debug: [1] "line 5"
I tried setting both at once (at=c(3,5)) while inside the browser, but this just sets those lines for when I exit the debugger and call the function again.
I'm guessing this has to do with the function to which trace is attaching the breakpoint. Looking into trace (and .TraceWithMethods), I think I need to be setting where, but I cannot figure out how to get it to set a new breakpoint/trace on the in-debugging function.
(The larger picture is that I'm troubleshooting a function that is dealing with a kafka-led stream of data. My two options are currently (a) restart the function with the more appropriate trace, but this requires me to purge and restart the data stream as well; or (b) go line-by-line in the debugger, tedious when there are many hundreds of lines of code.)
This may be kind of a solution. First do as in your post:
> quux <- function(..)
+ { # line 1
+ x <- 1 # added for illustration
+ "line 3"
+ "line 4"
+ "line 5"
+ print(x) # added for illustration
+ "line 7"
+ "line 8"
+ }
>
> trace("quux", tracer = browser, at = 4)
[1] "quux"
> quux()
Tracing quux() step 4
Called from: eval(expr, p)
Browse[1]> n
debug: [1] "line 4"
Next, we do as follows in the debugger:
Browse[2]> this_func <- eval(match.call()[[1]]) # find out which function is called
Browse[2]> formals(this_func) <- list() # remove arguments
Browse[2]> body(this_func) <- body(this_func)[-(2:4)] # remove lines we have evaluated
Browse[2]> trace("this_func", tracer = browser,
+ at = 8 - 4 + 1) # at new line - old trace point
Tracing function "this_func" in package "base"
[1] "this_func"
Browse[2]> this_func # print for illustration
function ()
{
"line 5"
print(x)
"line 7"
"line 8"
}
Browse[2]> environment(this_func) <- environment() # change environment so x is present
Browse[2]> this_func() # call this_func
[1] 1
[1] "line 8"
The downside is that we end back at "line 5" in the original call to quux after we exit from the call to this_func. Further, we have to keep track of the last at value. We may be able to get this from another function?

how to sort out a nested list in R

The original data was a simple list named "data" like this
[1] "score: 10 / review 1 / ID 1
[2] "score: 9 / review 2 / ID 2
[3] "score: 8 / review 3 / ID 3
----
[30] "score: 7 / review 30 / ID&DATE: 30
In order to separate the scores, reviews, and ID&DATEs,
I first made it a matrix, and then split each entry on "/" using str_split from the stringr package.
The whole process went like this:
a1 <- readLines("data.txt")
a2 <- t(a1) # Matrix
a3 <- t(a2) # reversing rows and columns
b1 <- str_split(a3, "/") # split each entry on "/" (requires library(stringr))
here is the problem
b1 came out as a nested list like this.
[[1]]
[1] "score: 10"
[2] "review 1"
[3] "ID 1"
[[2]]
[1] "score: 9"
[2] "review 2"
[3] "ID 2"
[[3]]
[1] "score: 8"
[2] "review 3"
[3] "ID 3"
------
[[30]]
[1] "score: 7"
[2] "review 30"
[3] "ID 30"
I want to extract the values of [[1]][1], [[2]][1], [[3]][1], ... [[30]][1], [[n]][2], and [[n]][3] SEPARATELY, and make each one of them a dataframe.
Any clues?
The following would work for a particular type of nested list that looks like your data. Without a reproducible example, I don't know for sure:
# create nested list
temp <- list(a=c(list("score: 10"), "review 1", "ID 1"),
b=c("score: 9", "review 2", "ID 2"),
c=c("score: 8", "review 3","ID 3"))
# create data frame from this list
df <- data.frame(score=unlist(sapply(temp, function(i) i[1])),
review=unlist(sapply(temp, function(i) i[2])),
ID=unlist(sapply(temp, function(i) i[3])))
I use sapply to pull out elements from each list item. Then, unlist is applied to the output so that it becomes a vector. All of this output is wrapped in a data.frame. Note that you can rearrange the output so that the variables are arranged differently.
An even cleaner method, mentioned by @parfait, uses do.call and rbind:
# construct data.frame, rbinding each list item
df <- data.frame(do.call(rbind, temp))
# add the desired names
names(df) <- c('score', 'review', 'ID')
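For completeness, here is a sketch that starts from the raw lines rather than a hand-built list. It uses base strsplit() in place of str_split(), and the three sample strings are invented to mimic the question's data, so treat it as an assumption about the real file:

```r
# Made-up sample lines in the shape shown in the question.
raw <- c("score: 10 / review 1 / ID 1",
         "score: 9 / review 2 / ID 2",
         "score: 8 / review 3 / ID 3")

# Split each line on the literal " / " separator; the result is a list
# with one character vector of length 3 per line.
pieces <- strsplit(raw, " / ", fixed = TRUE)

# Stack the vectors row by row into a character matrix, then convert.
df <- as.data.frame(do.call(rbind, pieces), stringsAsFactors = FALSE)
names(df) <- c("score", "review", "ID")

# Each column can now be pulled out separately, e.g. df$score.
scores <- df$score
```

This assumes every line splits into exactly three pieces; lines with a different number of "/" separators would need to be checked first.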
