I have a lot of unclean data in the form:
abc
abc/def
abc/de
abc/d
abc/def/i j k
abc/def/i
abc/def/i j
This is just the part of the data I would like to change; it is part of a much bigger data set.
I would like to change all the elements to abc/def/i j k.
I have used the gsub() function as follows:
gsub('abc[a-z/]', 'abc/def/i j k', str)
output :
abc/def/i j k
abc/def/i j k/def
abc/def/i j k/de
abc/def/i j k/d
The problem is that it replaces every occurrence of the pattern.
The only approach that gave decent results was to hard-code all the possible options, like this:
gsub('abc$|abc/d$|abc/de$|abc/def/i$', 'abc/def/i j k', str)
However, this would not work if new data contains a variation that is not listed.
So I was wondering whether it is possible to get the result without hard-coding the patterns.
You may use
x <- c("abc", "abc/def","abc/de","abc/d","abc/def/i j k","abc/def/i","abc/def/i j")
sub("^(abc)(?:/[^/]*)?", "\\1/def", x)
## => [1] "abc/def" "abc/def" "abc/def" "abc/def"
## [5] "abc/def/i j k" "abc/def/i" "abc/def/i j"
Details:
^ - start of string
(abc) - Group 1: abc
(?:/[^/]*)? - an optional group matching a sequence of:
/ - a /
[^/]* - 0+ chars other than /
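For comparison, here is a minimal sketch of the same anchored substitution using Python's `re` module. Python is used only for illustration. The backreference is written `\1` in a raw string rather than R's `"\\1"`, and `count=1` mirrors R's `sub()`, which replaces only the first match.

```python
import re

# "abc" at the start of the string (group 1), optionally followed by
# one "/segment" that contains no further slash.
PATTERN = re.compile(r"^(abc)(?:/[^/]*)?")

def normalize(s: str) -> str:
    # Replace that prefix with "abc/def"; count=1 mimics R's sub().
    return PATTERN.sub(r"\1/def", s, count=1)

x = ["abc", "abc/def", "abc/de", "abc/d", "abc/def/i j k", "abc/def/i", "abc/def/i j"]
print([normalize(s) for s in x])
```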
I want to extract all substrings that begin with M and are terminated by a *.
Take the string below as an example:
vec<-c("SHVANSGYMGMTPRLGLESLLE*A*MIRVASQ")
which should ideally return:
MGMTPRLGLESLLE
MTPRLGLESLLE
I have tried the code below:
regmatches(vec, gregexpr('(?<=M).*?(?=\\*)', vec, perl=T))[[1]]
but this drops the first M and returns only the first substring rather than all of them:
"GMTPRLGLESLLE"
You can use
(?=(M[^*]*)\*)
Details:
(?= - start of a positive lookahead that matches a location that is immediately followed by:
(M[^*]*) - Group 1: M, zero or more chars other than a * char
\* - a * char
) - end of the lookahead.
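The same lookahead-with-capture trick carries over to Python's `re` module, sketched here for illustration. Because the whole match is zero-width, the engine tries every starting position, so overlapping substrings are found. When the pattern has exactly one group, `re.findall` returns that group's contents.

```python
import re

vec = "SHVANSGYMGMTPRLGLESLLE*A*MIRVASQ"

# Zero-width lookahead: group 1 captures "M" plus the following non-"*" chars,
# and the lookahead only succeeds if a "*" comes right after them.
matches = re.findall(r"(?=(M[^*]*)\*)", vec)
print(matches)  # ['MGMTPRLGLESLLE', 'MTPRLGLESLLE']
```

The trailing `MIRVASQ` is not returned because no `*` follows it.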
In R, using stringr:
library(stringr)
vec <- c("SHVANSGYMGMTPRLGLESLLE*A*MIRVASQ")
matches <- stringr::str_match_all(vec, "(?=(M[^*]*)\\*)")
unlist(lapply(matches, function(z) z[,2]))
## => [1] "MGMTPRLGLESLLE" "MTPRLGLESLLE"
If you prefer a base R solution (gregexec() requires R 4.1.0 or later):
vec <- c("SHVANSGYMGMTPRLGLESLLE*A*MIRVASQ")
matches <- regmatches(vec, gregexec("(?=(M[^*]*)\\*)", vec, perl=TRUE))
unlist(lapply(matches, tail, -1))
## => [1] "MGMTPRLGLESLLE" "MTPRLGLESLLE"
This could instead be done with a for loop over a character array converted from your string.
When you encounter an M, start concatenating characters into a new string until you encounter a *. When you do, push the new string onto an array of strings and start over from the first step, until you reach the end of the loop.
It's not as interesting as using a regex, but it's failsafe.
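Below is a minimal Python sketch of the loop idea. It is a variant of the steps above: instead of one running buffer, it starts a fresh scan at every M. A single buffer that resets only at * would return "MGMTPRLGLESLLE" but miss the overlapping "MTPRLGLESLLE" the question asks for. An M with no later * produces nothing.

```python
from typing import List

def m_to_star_substrings(s: str) -> List[str]:
    """Collect every substring that starts at an 'M' and ends just before the next '*'."""
    results = []
    for i, ch in enumerate(s):
        if ch == "M":
            end = s.find("*", i)  # position of the next '*', or -1 if there is none
            if end != -1:
                results.append(s[i:end])
    return results

print(m_to_star_substrings("SHVANSGYMGMTPRLGLESLLE*A*MIRVASQ"))
```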
It is not possible to use regular expressions here, because regular languages lack the memory states required for nested matches.
For example, stringr::str_extract_all("abaca", "a[^a]*a") only gives you aba, not the surrounding abaca.
The first M was dropped because (?<=M) is a positive lookbehind, which by definition is not part of the match; it only checks what comes just before it.
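Here is a short Python check of the lookbehind behaviour, for illustration only. `(?<=M)` asserts that an M precedes the match without consuming it, so the M is left out. Because `.*?` consumes characters, matches cannot overlap, which is why only one string comes back.

```python
import re

vec = "SHVANSGYMGMTPRLGLESLLE*A*MIRVASQ"

# Lookbehind: the M is checked but not included, and the scan resumes
# after the consumed match, so overlapping substrings are skipped.
print(re.findall(r"(?<=M).*?(?=\*)", vec))  # ['GMTPRLGLESLLE']

# Moving the M into the consumed part keeps it, but matches still do not overlap.
print(re.findall(r"M[^*]*(?=\*)", vec))  # ['MGMTPRLGLESLLE']
```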
I'm trying to understand two OCaml operators: @@ and |>
I understand that x |> f is just f(x), but why does it exist? I cannot see why. The same goes for @@, which, as I understand it, is just normal function application.
For example:
match get_ipv4_hlen_version buf |> version with
| 0x40 -> Ok buf
| n -> Error (Printf.sprintf "IPv4 presented with a packet that claims a different IP version: %x" n)
why not just write get_ipv4_hlen_version version buf?
What about
let options_len = nearest_4 @@ Cstruct.len t.options
Why not let options_len = nearest_4 Cstruct.len t.options?
I suppose it has to do with precedence. I recall something similar from Haskell, but I don't know Haskell; I just read about it somewhere.
How do I know the precedence of things?
If more context is needed, these two snippets come from https://github.com/mirage/mirage-tcpip/blob/master/src/ipv4/ipv4_packet.ml
The notational value of |> only appears if you have several nested function applications. Many people find this:
x |> f a |> g b c |> h d
easier to read than this:
h d (g b c (f a x))
because it's no longer necessary to match up the parentheses mentally, and because the operations are applied in left-to-right order (which is arguably natural for readers of English and other left-to-right languages).
If you are familiar with Unix command lines, it might help to think of the |> operator as similar to the Unix pipe operator |.
A lower-precedence function application operator like @@ also helps avoid parentheses (and the mental matching thereof). Many people find this:
f x @@ g a b @@ h c d
easier to read than this:
f x ((g a b) (h c d))
Your example for @@ is wrong. This
let options_len = nearest_4 @@ Cstruct.len t.options
is equivalent to this:
let options_len = nearest_4 (Cstruct.len t.options)
and is not equivalent to what you wrote.
The precedence of an operator is determined by its first character. This, in turn, is defined by the table in Section 7.7.1 of the OCaml manual.
(Granted, you need to read the text just before the table very carefully to see the rule for precedence.)
Update
Full disclosure: I never use |> or @@ in my own code. I have no problem with a few parentheses, and I generally use let to break a big expression down into smaller pieces.
The |> operator is very convenient. It is the equivalent of the pipe in the shell. It allows you to write code like this:
let make_string n =
  Array.init n float_of_int
  |> Array.map (fun x -> x -. 0.5 *. (float_of_int (n-1)))
  |> Array.map (fun x -> Printf.sprintf "-- %10.6f --" x)
  |> Array.to_list
  |> String.concat "\n"
in
print_endline (make_string 5)
(* Output:
--  -2.000000 --
--  -1.000000 --
--   0.000000 --
--   1.000000 --
--   2.000000 --
*)
In this example, each line starting with a |> takes the output of the previous transformation, so we can see the flow of data transformations, like in Bash when we write something like
ls | grep txt | sort | uniq
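Python has no `|>` operator. As a rough analogue, and not OCaml semantics, a small hypothetical `pipe` helper can thread a value through functions left to right. The sketch below uses it to reproduce the string built by `make_string` above.

```python
from functools import reduce
from typing import Any, Callable

def pipe(value: Any, *funcs: Callable[[Any], Any]) -> Any:
    """Hypothetical helper: pipe(x, f, g, h) == h(g(f(x))), like x |> f |> g |> h."""
    return reduce(lambda acc, f: f(acc), funcs, value)

def make_string(n: int) -> str:
    return pipe(
        [float(i) for i in range(n)],                 # Array.init n float_of_int
        lambda xs: [x - 0.5 * (n - 1) for x in xs],   # center the values around zero
        lambda xs: ["-- %10.6f --" % x for x in xs],  # format each value
        "\n".join,                                    # Array.to_list |> String.concat "\n"
    )

print(make_string(5))
```

Each stage reads in the order it runs, which is the readability argument made for `|>` above.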
The @@ operator is the "backwards pipe". It lets you remove parentheses that would otherwise make the code less readable. For example, take the case where we want to compute a chain of matrix products like E = A.B.C.D. You want the code to follow the mathematical formula, so you want to write the factors in the same order. If mm A B multiplies the matrices A and B, then we can write
let mat_E =
  mm mat_A @@ mm mat_B @@ mm mat_C mat_D
instead of
let mat_E =
  mm mat_A (mm mat_B (mm mat_C mat_D))
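To see why writing the factors right-nested still gives the intended product, here is a Python sketch. It uses hypothetical 2x2 integer matrices and a hypothetical `mm` helper. `mm(A, mm(B, mm(C, D)))` has the same shape as `mm mat_A (mm mat_B (mm mat_C mat_D))`, and matrix multiplication is associative, so it equals the left-to-right product.

```python
from functools import reduce
from typing import List

Matrix = List[List[int]]

def mm(a: Matrix, b: Matrix) -> Matrix:
    """Hypothetical helper: plain matrix product of a (n x m) and b (m x p)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [0, 2]]
D = [[1, 1], [0, 1]]

right_nested = mm(A, mm(B, mm(C, D)))     # A.(B.(C.D))
left_to_right = reduce(mm, [A, B, C, D])  # ((A.B).C).D
assert right_nested == left_to_right      # associativity: same product either way
print(right_nested)  # [[4, 6], [8, 14]]
```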
To split a number into digits in a given base, Julia has the digits() function:
julia> digits(36, base = 4)
3-element Array{Int64,1}:
0
1
2
What's the reverse operation? If you have an array of digits and the base, is there a built-in way to convert that to a number? I could print the array to a string and use parse(), but that sounds inefficient, and also wouldn't work for bases > 10.
The previous answers are correct, but there is also the matter of efficiency:
sum([x[k]*base^(k-1) for k=1:length(x)])
collects the numbers into an array before summing, which causes unnecessary allocations. Skip the brackets to get better performance:
sum(x[k]*base^(k-1) for k in 1:length(x))
This also allocates an array before summing: sum(d.*4 .^(0:(length(d)-1)))
If you really want good performance, though, write a loop and avoid repeated exponentiation:
function undigit(d; base=10)
    s = zero(eltype(d))
    mult = one(eltype(d))
    for val in d
        s += val * mult
        mult *= base
    end
    return s
end
This has one unnecessary extra multiplication; you could try to find a way to skip it. But in my tests the performance is 10-15x better than the other approaches, and it has zero allocations.
Edit: There's actually a slight risk to the type handling above. If the input vector and base have different integer types, you can get a type instability. This code should behave better:
function undigits(d; base=10)
    (s, b) = promote(zero(eltype(d)), base)
    mult = one(s)
    for val in d
        s += val * mult
        mult *= b
    end
    return s
end
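For readers outside Julia, here is the same running-multiplier loop in Python, as an illustrative sketch. Python integers have arbitrary precision, so the type-promotion concern from the edit does not arise. Digits are little-endian, matching `digits()`: the first element is the least significant.

```python
from typing import Sequence

def undigits(d: Sequence[int], base: int = 10) -> int:
    """Inverse of a little-endian digit list: d[0] is the least significant digit."""
    s = 0
    mult = 1
    for val in d:
        s += val * mult  # add this digit's contribution
        mult *= base     # the next digit is worth `base` times more
    return s

print(undigits([0, 1, 2], base=4))  # 36, the inverse of digits(36, base = 4)
```

Bases above 10 work without any string parsing, e.g. `undigits([5, 11], base=16)` gives 181.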
The answer seems to be written directly within the documentation of digits:
help?> digits
search: digits digits! ndigits isdigit isxdigit disable_sigint
digits([T<:Integer], n::Integer; base::T = 10, pad::Integer = 1)
Return an array with element type T (default Int) of the digits of n in the given base,
optionally padded with zeros to a specified size. More significant digits are at higher
indices, such that n == sum([digits[k]*base^(k-1) for k=1:length(digits)]).
So for your case this will work:
julia> d = digits(36, base = 4);
julia> sum([d[k]*4^(k-1) for k=1:length(d)])
36
And the above code can be shortened with the dot operator:
julia> sum(d.*4 .^(0:(length(d)-1)))
36
Using foldr and muladd for maximum conciseness and efficiency:
undigits(d; base = 10) = foldr((a, b) -> muladd(base, b, a), d, init=0)
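The `foldr`/`muladd` one-liner is Horner's rule: it evaluates d[1] + base*(d[2] + base*(...)) starting from the most significant digit. Python has no `foldr`, so the sketch below gets the same effect with `functools.reduce` over the reversed digits. It is an illustration, not a translation of Julia's semantics.

```python
from functools import reduce
from typing import Sequence

def undigits_horner(d: Sequence[int], base: int = 10) -> int:
    # Horner's rule: start from the most significant digit (the last element)
    # and repeatedly compute acc * base + digit.
    return reduce(lambda acc, digit: acc * base + digit, reversed(d), 0)

print(undigits_horner([0, 1, 2], base=4))  # 36
```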
I have a table "weather" into which I insert the weather conditions for a particular day. I can't seem to write a function that prints the contents of "weather" (see below for the things I've tried).
day = "Friday"
conditions = {"Sunny", "85", "windy"}
weather = {{}} --nested table
for k, v in pairs(conditions) do
    weather[day] = {[k]=v}
end
I've tried two things to print the weather table and neither work.
for k, v in pairs(weather) do
    print(k, v)
end
---- Output ---
1 table: 0x2542ae0
Friday table: 0x25431a0
This doesn't work either, but I thought it would:
for k, v in pairs(weather) do
    for l, w in pairs(v) do
        print(l, w)
    end
end
----Output----
3 windy
You are overwriting weather[day] in the first loop and so only the last value remains.
I think you want simply this, instead of that loop:
weather[day] = conditions
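The overwrite is easy to reproduce outside Lua. In this Python sketch, dicts stand in for Lua tables; it is an illustration, not Lua code. Assigning a fresh one-entry table on every iteration leaves only the last pair, while storing the whole conditions table keeps everything.

```python
# Lua's {"Sunny", "85", "windy"} is a table with integer keys 1..3.
conditions = {1: "Sunny", 2: "85", 3: "windy"}

# What the loop in the question does: every pass rebinds weather["Friday"]
# to a brand-new one-entry table, discarding the previous one.
overwritten = {}
for k, v in conditions.items():
    overwritten["Friday"] = {k: v}
print(overwritten)  # {'Friday': {3: 'windy'}}

# The answer's fix: store the whole conditions table once, then walk both levels.
weather = {"Friday": conditions}
for day, conds in weather.items():
    for k, v in conds.items():
        print(day, k, v)
```

Separately, the `1 table: ...` line in the question's first output comes from `weather = {{}}`, which stores an empty table at index 1. Starting from `weather = {}` avoids that extra entry.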
I have the image and the vector:
a = imread('Lena.tiff');
v = [0,2,5,8,10,12,15,20,25];
and this M-file:
function y = Funks(I, gama, c)
[m n] = size(I);
for i=1:m
    for j=1:n
        J(i, j) = (I(i, j) ^ gama) * c;
    end
end
y = J;
imshow(y);
When I try to run this:
f = Funks(a,v,2)
I am getting this error:
??? Error using ==> mpower
Integers can only be combined with integers of the same class, or scalar doubles.
Error in ==> Funks at 5
J(i, j) = (I(i, j) ^ gama) * c;
Can anybody help me with this, please?
The error is caused because you're trying to raise a number to a vector power. Translated (i.e. replacing formal arguments with actual arguments in the function call), it would be something like:
J(i, j) = (a(i, j) ^ [0,2,5,8,10,12,15,20,25]) * 2
Element-wise power .^ won't work either, because you would be trying to "stuff" a vector into a scalar container.
Later edit: If you want to apply each gamma to your image, maybe this loop is more intuitive (though not the most efficient):
a = imread('Lena.tiff'); % Pics or GTFO
v = [0,2,5,8,10,12,15,20,25]; % Gamma (ar)ray -- this will burn any picture
f = cell(1, numel(v)); % Prepare container for your results
for k=1:numel(v)
    f{k} = Funks(a, v(k), 2); % Save result from your function
end;
% (Afterwards you use cell array f for further processing)
Or you may take a look at the other (more efficient if maybe not clearer) solutions posted here.
Later(er?) edit: If your tiff file is CMYK, then the result of imread is an MxNx4 color matrix, which must be handled differently than usual (because it is 3-dimensional).
There are two approaches I would use:
1) arrayfun
results = arrayfun(@(i) I(:).^gama(i)*c,1:numel(gama),'UniformOutput',false);
J = cellfun(@(x) reshape(x,size(I)),results,'UniformOutput',false);
2) bsxfun
results = bsxfun(@power,I(:),gama)*c;
results = num2cell(results,1);
J = cellfun(@(x) reshape(x,size(I)),results,'UniformOutput',false);
What you're trying to do makes no sense mathematically. You're trying to assign a vector to a number. Your problem is not the MATLAB programming, it's in the definition of what you're trying to do.
If you're trying to produce several images J, each of which corresponds to a certain gamma applied to the image, you should do it as follows:
function J = Funks(I, gama, c)
[m n] = size(I);
% get the number of images to produce
k = length(gama);
% Pre-allocate the output
J = zeros(m,n,k);
for i=1:m
    for j=1:n
        J(i, j, :) = (I(i, j) .^ gama) * c;
    end
end
In the end you will get images J(:,:,1), J(:,:,2), etc.
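The same idea, one output plane per gamma value, can be sketched in Python. Nested lists stand in for a tiny grayscale image with hypothetical pixel values; a real image would normally go through an array library instead. Indices here are 0-based, unlike MATLAB's.

```python
from typing import List

def apply_gammas(image: List[List[float]], gammas: List[float], c: float) -> List[List[List[float]]]:
    """Return out[i][j][k] = (image[i][j] ** gammas[k]) * c, like J(i, j, :) above."""
    return [[[(pixel ** g) * c for g in gammas] for pixel in row] for row in image]

image = [[1.0, 2.0], [3.0, 4.0]]  # hypothetical 2x2 "image"
gammas = [0, 2, 5]
J = apply_gammas(image, gammas, 2)

# Plane k (J(:,:,k+1) in MATLAB terms) is the whole image for gammas[k].
plane1 = [[J[i][j][1] for j in range(2)] for i in range(2)]
print(plane1)  # [[2.0, 8.0], [18.0, 32.0]]
```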
If this is not what you want to do, then figure out your equations first.