R's grepl in Julia

I am not trying to reinvent the wheel. Just looking for a function which searches a string or string vector and returns true for each element where the match is found. This is what I tried so far.
grepl(x::String, y) = length(search(x, y)) > 0
grepl(x::Vector{String}, y) = length.(search(x, y)) .> 0
grepl(x::Vector{AbstractString}, y) = length.(search(x, y)) .> 0
Example usage:
v = string.('a':'z')
x = rand(v, 100) .* rand(v, 100) .* rand(v, 100)
grepl(convert(Vector{String}, x), "z")
Well, this would be a working example if I could get my types to work properly. Basically I could use the return to select only elements which have "z" in them.

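Just use contains. (In Julia 0.7 through 1.4 this function was replaced by occursin(needle, haystack), with the argument order reversed; contains(haystack, needle) is available again from Julia 1.5.) On 0.6, you can use it directly with dot-broadcasting: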
julia> contains.(["foo","bar","baz"],"ba")
3-element BitArray{1}:
false
true
true
On 0.5, you can simply wrap the second argument in an array: contains.(["foo","bar","baz"],["ba"]).
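For comparison, the R behavior the question wants to replicate can be sketched in base R as follows. This is only a reference point, reusing the example strings from the answer above; the variable names words, hits and selected are illustrative, and fixed = TRUE is assumed so that the pattern is matched literally rather than as a regular expression.

```r
# Illustrative names; the strings are the ones used in the answer above.
words <- c("foo", "bar", "baz")

# grepl() returns one logical per element of the searched vector.
hits <- grepl("ba", words, fixed = TRUE)

# The logical vector can then be used to select the matching elements.
selected <- words[hits]
```

Here hits plays the same role as the BitArray returned by the broadcast call, and selected keeps only the elements that contain the substring.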

Related

In R, how to write a nested function that uses the arguments from the outer function?

f1<-function(x,y){
  f2<-function(a,b){
    print("f2")
    return(a+b)
  }
  f2(x,y)
  print("f1")
  return(x-y)
}
f1(8,5)
I wrote the code above to understand how a function defined inside another function is evaluated, instead of writing two separate functions. But I can't get the output of a+b (which is 13):
[1] "f2"
[1] "f1"
[1] 3
#[1] 13, this output is missing.
How should the code be corrected? Thank you.
*Additional question: when I write only x-y instead of return(x-y) on the last line of the function f1, I get the same output. Is writing just x-y bad practice, or is it acceptable?
-------------------------Update:
I just found a way to get all four outputs by changing the 4th line from return(a+b) to print(a+b),
or, to make it simpler, by using only the x, y arguments:
f1<-function(x,y) {
  f2<-function() {
    print("f2")
    print(x+y)
  }
  f2()
  print("f1")
  x-y
}
However, I still don't understand why using return(x+y) or simply x+y on the 4th line does not produce the output 13.
When an expression is on a line by itself it will automatically print if you do it at the R console but that does not happen if it is within a function or within an expression. Use cat or print for displaying.
To return two objects, return a list containing both of them, as shown below.
The value of the last line that is run in a function is returned so you rarely need return.
f1a <- function(x, y) {
  f2 <- function(a, b) {
    print("f2")
    a + b
  }
  print("f1")
  list(x - y, f2(x, y))
}
result <- f1a(8, 5)
## [1] "f1"
## [1] "f2"
result[[1]]
## [1] 3
result[[2]]
## [1] 13
result
## [[1]]
## [1] 3
##
## [[2]]
## [1] 13
Other things we could do would be to replace the list(...) line in the code above with one of the following. (The c versions would only be used if we knew that the arguments were always scalars.)
list(f1 = x - y, f2 = f2(x, y)) # named list
c(x - y, f2(x, y)) # 2 element numeric vector
c(f1 = x - y, f2 = f2(x, y)) # 2 element named numeric vector
cbind(f1 = x - y, f2 = f2(x, y)) # matrix w column names
data.frame(f1 = x - y, f2 = f2(x, y)) # data.frame
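The point about automatic printing can be checked directly with a minimal sketch. The function names quiet and loud are hypothetical and not part of the original answer; capture.output() is used only to record what each call displays.

```r
# Hypothetical helpers, used only to illustrate the printing rule.
quiet <- function(a, b) {
  a + b   # evaluated, but the value is discarded: nothing is displayed
  a - b   # last expression evaluated: this becomes the return value
}

loud <- function(a, b) {
  print(a + b)  # explicit print(), so the sum is displayed
  a - b
}

# capture.output() collects whatever each call prints while it runs.
quiet_out <- capture.output(quiet_val <- quiet(8, 5))
loud_out  <- capture.output(loud_val <- loud(8, 5))
```

Both functions return 3, but only loud displays the sum, because a bare expression inside a function body is not auto-printed the way it is at the console.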

How to insert an element at a specific position of an empty vector?

In R we can create an empty vector where it is possible to insert an element in any position of this vector.
Example:
> x <- c()
> x[1] = 10
> x[4] = 20
The final result is:
> x
[1] 10 NA NA 20
I would like to do something similar using Julia, but couldn't find a way to do this.
The “append” function does not do anything like that.
Could anyone help?
You need to do this in two steps:
First resize the vector or create a vector with an appropriate size.
Next set the elements accordingly.
Since you are coming from R I assume you want the vector to be initially filled with missing values. Here is the way to do this.
In my example I assume you want to store integers in the vector. Before both options load the Missings.jl package:
using Missings
Option 1. Start with an empty vector
julia> x = missings(Int, 0)
Union{Missing, Int64}[]
julia> resize!(x, 4)
4-element Vector{Union{Missing, Int64}}:
missing
missing
missing
missing
julia> x[1] = 10
10
julia> x[4] = 40
40
julia> x
4-element Vector{Union{Missing, Int64}}:
10
missing
missing
40
Option 2. Preallocate a vector
julia> x = missings(Int, 4)
4-element Vector{Union{Missing, Int64}}:
missing
missing
missing
missing
julia> x[1] = 10
10
julia> x[4] = 40
40
The reason why Julia does not resize the vectors automatically is for safety. Sometimes it would be useful, but most of the time if x is an empty vector and you write x[4] = 40 it is a bug in the code and Julia catches such cases.
EDIT
What you can do is:
function setvalue(vec::Vector, idx, val)
    @assert idx > 0
    if idx > length(vec)
        resize!(vec, idx)
    end
    vec[idx] = val
    return vec
end

outer reuses first element of X instead of doing its job

I have a two argument function that takes as its first input a triple of pairs of numbers in the form "(a, b)(c, d)(e, f)" (as a character string) and as second argument a pair of numbers (also written as a character string of the form "(a, b)") and outputs a logical that states if the pair (the second argument) is one of the three pairs in the triple (the first argument). I actually wrote two versions:
version1 <- function(x, y) { # x is a triple of pairs, y is a pair
  pairsfromthistriple <- paste(c("", "(", "("), strsplit(x, split = ")(", fixed = TRUE)[[1]], c(")", ")", ""), sep = "")
  y %in% pairsfromthistriple
}
version2 <- function(x, y) { # x is a triple of pairs, y is a pair
  y == substr(x, 1, 6) | y == substr(x, 7, 12) | y == substr(x, 13, 18)
}
I want to set this function loose on every triple of pairs from a vector of triples and every pair from some vector of pairs, using outer. Here I'll use the following very short vectors:
triples <- c("(1, 2)(3, 4)(5, 6)", "(1, 2)(3, 5)(4, 6)")
names(triples) <- triples
pairs <- c("(5, 6)", "(3, 5)")
names(pairs) <- pairs
So here we go:
test1 <- outer(X = triples, Y = pairs, FUN = version1)
test2 <- outer(X = triples, Y = pairs, FUN = version2)
test2 evaluates to exactly what you expect, but test1 gives nonsensical output:
> test1
                   (5, 6) (3, 5)
(1, 2)(3, 4)(5, 6)   TRUE  FALSE
(1, 2)(3, 5)(4, 6)   TRUE  FALSE
> test2
                   (5, 6) (3, 5)
(1, 2)(3, 4)(5, 6)   TRUE  FALSE
(1, 2)(3, 5)(4, 6)  FALSE   TRUE
The natural conclusion is that there is an error in version1, but it is not as simple as that. 'Manually' computing the terms in the matrix using version1 gives:
> version1(triples[1], pairs[1])
[1] TRUE
> version1(triples[1], pairs[2])
[1] FALSE
> version1(triples[2], pairs[1])
[1] FALSE
> version1(triples[2], pairs[2])
[1] TRUE
exactly as it should! So at least part of the fault is with the function outer. In fact what happens (in this small example it is not so clear, but this is very visible in larger examples) is that outer correctly computes the first row of its output matrix, but then copies this first row over and over to make up the subsequent rows. Obviously this is not what I want. If I only wanted to compute version1(x, y) for all y in some vector but just one single x, I would have used sapply rather than outer.
What is going on here?
Note this detail from the documentation for ?outer:
X and Y must be suitable arguments for FUN. Each will be extended by rep to length the products of the lengths of X and Y before FUN is called.
FUN is called with these two extended vectors as arguments (plus any arguments in ...). It must be a vectorized function (or the name of one) expecting at least two arguments and returning a value with the same length as the first (and the second).
Your version1 function is not vectorized properly like version2 is. You can see this by calling it on the original triples and pairs vectors: compared element by element, each pair occurs in its triple, so both results should be TRUE.
version1(triples, pairs)
#> [1] TRUE FALSE
version2(triples, pairs)
#> (5, 6) (3, 5)
#> TRUE TRUE
Your version1 function seems designed for use with apply(), because you retrieve a list from strsplit() but then just take the first element. If you want to maintain the approach of splitting the vector, then you would have to use the apply family of functions. Without using them, you are going to expand the triples or x vector into something much longer than y and you can't do element wise comparison.
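However, I would just use something very simple. stringr::str_detect is already vectorized over both string and pattern, so you can use it directly. (By default the pattern is interpreted as a regular expression, so the parentheses act as a group rather than as literal characters; the results are still correct here because the text inside the parentheses is what identifies each pair.)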
library(stringr)
outer(X = triples, Y = pairs, FUN = str_detect)
#> (5, 6) (3, 5)
#> (1, 2)(3, 4)(5, 6) TRUE FALSE
#> (1, 2)(3, 5)(4, 6) FALSE TRUE
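If the splitting approach of version1 is to be kept, one base R option is to wrap it in Vectorize() before handing it to outer. The sketch below repeats the question's definitions so that it runs on its own; the name test1_fixed is hypothetical, and this is just one possible way to apply the function pair by pair, not necessarily the fastest.

```r
# Definitions repeated from the question so the sketch is self-contained.
version1 <- function(x, y) { # x is a triple of pairs, y is a pair
  pairsfromthistriple <- paste(c("", "(", "("),
                               strsplit(x, split = ")(", fixed = TRUE)[[1]],
                               c(")", ")", ""), sep = "")
  y %in% pairsfromthistriple
}
triples <- c("(1, 2)(3, 4)(5, 6)", "(1, 2)(3, 5)(4, 6)")
names(triples) <- triples
pairs <- c("(5, 6)", "(3, 5)")
names(pairs) <- pairs

# Vectorize() builds a wrapper that calls version1 once per (x, y) pair,
# which is the kind of function outer() expects.
test1_fixed <- outer(X = triples, Y = pairs, FUN = Vectorize(version1))
```

With the wrapper, each cell of test1_fixed comes from a separate call with a single triple and a single pair, so it should agree with the manual calls shown in the question.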

R identical returning False for strings that are identical [duplicate]

I have a data frame consisting of identical strings, but the identical() function is returning false when I compare them?
Example:
df <- data.frame("x" = rep("a", times = 10),
                 "y" = rep("a", times = 10))
checkEquality <- function(x) {
  y = x[1]
  z = x[2]
  return(identical(y, z))
}
apply(df[1:2], 1, checkEquality)
This code returns a vector of FALSE when it should return a vector of TRUE. I have no idea what's going on here. Any help appreciated.
It's because they're not totally identical. Your function takes the data frame row by row and then compares the first two elements of each row. Since you use the single bracket operator [], you keep the column and row names:
x = df[1,]
x[1]
x
1 a
x[2]
y
1 a
While the value is the same, the column names are different so the two vectors are not identical.
If you use the double bracket notation [[]], then it will extract just that one element, dropping the row and column names and it should work:
checkEquality <- function(x) {
  y = x[[1]]
  z = x[[2]]
  return(identical(y, z))
}
apply(df, 1, checkEquality)
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
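To see the effect of the names in isolation, here is a minimal sketch. It assumes that a named character vector (the hypothetical row below) is a fair stand-in for one row as apply() passes it to the function; the other variable names are illustrative as well.

```r
# Hypothetical stand-in for a single row handed to the function by apply().
row <- c(x = "a", y = "a")

# Single brackets keep the element names, so the two pieces differ in names.
single <- identical(row[1], row[2])

# Double brackets drop the names and compare only the values.
double <- identical(row[[1]], row[[2]])

# Stripping the names explicitly has the same effect.
stripped <- identical(unname(row[1]), unname(row[2]))
```

Only the single-bracket comparison fails, which suggests the values were never the problem: identical() also compares attributes such as names.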
I haven't used identical() before, but have you tried ifelse()?
ifelse(col1==col2, 'TRUE', 'FALSE')

An Error in R: When I try to apply outer function:

Here is my code:
Step1: Define an inverse function which I will use later
inverse = function (f, lower = -100, upper = 100) {
function (y) uniroot((function (x) f(x) - y), lower = lower, upper = upper)[1]
}
Step2: Here are my functions and their inverses:
F1<-function(x,m1,l,s1,s2){l*pnorm((x-m1)/s1)+(1-l)*pnorm((x+m1)/s2)}
F1_inverse = inverse(function(x) F1(x,1,0.1,2,1) , -100, 100)
F2<-function(x,m2,l,s1,s2){l*pnorm((x-m2)/s1)+(1-l)*pnorm((x+m2)/s2)}
F2_inverse = inverse(function(x) F1(x,1,0.1,2,1) , -100, 100)
Step3: Here is my final function which combines the above functions (I am sure the function is correct):
copwnorm<-function(x,y,l,mu1,mu2,sd1,sd2) {
(l*dnorm(((F1_inverse(pnorm(x))$root-mu1)/sd1))*
dnorm(((F2_inverse(pnorm(y))$root-mu2)/sd1)))
}
Step4: I want to create a contour plot for the function in Step3:
x<-seq(-2,2,0.1)
y<-seq(-2,2,0.1)
z<-outer(x,y,copwnorm)
contour(x,y,z,xlab="x",ylab="y",nlevels=15)
This is where the problem comes in: when I try to apply outer(x,y,copwnorm), it gives me the error: invalid function value in 'zeroin'. How can I solve this problem?
I believe it is a very common misconception to assume that outer(x, y, FUN) calls the function parameter (FUN) once for each required pair x[i] and y[j]. Actually, outer calls FUN only once, after creating all possible pairs, combining every element of x with every element of y, in a manner similar to the function expand.grid.
I'll show that with an example: consider this function, which is a wrapper for the product and prints a message every time it's called:
f <- function(x,y)
{
  cat("f called with arguments: x =", capture.output(dput(x)), "y =", capture.output(dput(y)), "\n")
  x*y
}
This function is "naturally" vectorized, so we can call it with vector arguments:
> f(c(1,2), c(3,4))
f called with arguments: x = c(1, 2) y = c(3, 4)
[1] 3 8
Using outer:
> outer(c(1,2), c(3,4), f)
f called with arguments: x = c(1, 2, 1, 2) y = c(3, 3, 4, 4)
     [,1] [,2]
[1,]    3    4
[2,]    6    8
Notice the combinations generated.
If we can't guarantee that the function can handle vector arguments, there is a simple trick to ensure the function gets called only once for each pair in the combinations: Vectorize. This creates another function that calls the original function once for each element in the arguments:
> Vectorize(f)(c(1,2),c(3,4))
f called with arguments: x = 1 y = 3
f called with arguments: x = 2 y = 4
[1] 3 8
So we can make a "safe" outer with it:
> outer(c(1,2), c(3,4), Vectorize(f))
f called with arguments: x = 1 y = 3
f called with arguments: x = 2 y = 3
f called with arguments: x = 1 y = 4
f called with arguments: x = 2 y = 4
     [,1] [,2]
[1,]    3    4
[2,]    6    8
In this case, the results are the same because f was written in a vectorized way, i.e., because "*" is vectorized. But if your function is not written with this in mind, using it directly in outer may fail or (worse) may give wrong results.
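The failing case can be sketched with a function that is not vectorized because it relies on uniroot(), which solves for a single value at a time. The names inv_pnorm, g, xs, ys, direct and z are hypothetical and chosen only for illustration; this is a simplified stand-in, not the copwnorm function from the question.

```r
# Hypothetical scalar-only inverse: uniroot() handles one target value per call.
inv_pnorm <- function(p) {
  uniroot(function(q) pnorm(q) - p, lower = -10, upper = 10)$root
}

# g() works for single numbers but not for the long vectors outer() passes in.
g <- function(x, y) inv_pnorm(pnorm(x)) * y

xs <- c(-1, 0, 1)
ys <- c(1, 2)

# Passing g directly is expected to fail; the error message is captured as text.
direct <- tryCatch(outer(xs, ys, g), error = function(e) conditionMessage(e))

# Wrapping g in Vectorize() makes outer() call it once per (x, y) pair.
z <- outer(xs, ys, Vectorize(g))
```

Since inv_pnorm(pnorm(x)) is approximately x, z should be close to the plain product table outer(xs, ys), up to the numerical tolerance of uniroot(). The exact wording of the captured error may differ between R versions.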
