I have a string which contains HTML code and an image. I need to get the value of the src attribute from that string. I try to use this code but it's not working
foreach (Match match in Regex.Matches(wordHTML, "<img.*?src=[\"'](.+?)[\"'].*?>", RegexOptions.IgnoreCase))
{
wordHTML = Regex.Replace(wordHTML, match.Groups[1].Value, "Temp/"+ match.Groups[1].Value);
}
my image path
<img width="165" height="138" src="636697542198949135.files/image002.jpg" v:shapes="Рисунок_x0020_7">
Julio's answer is a good one, but the next regex uses backreference in case the src has single or double quotes in it and also contemplates empty src's:
<img[^>]*?\ssrc=(["'])([^\1]*?)\1
The full src of the img (without quotes) will be group number 2 in the regular expression
I am try this expression and this work.
src=(?:\"|\')?(?<imgSrc>[^>]*[^/].(?:jpg|bmp|gif|png))(?:\"|\')?
Try with this:
<img\s+[^>]*\bsrc=["']([^"']+)["']
Demo
<img # literal '<img'
\s+ # one or more 'spaces'
[^>]* # 0 or more non-'>' character
\b # word boundary
src=["'] # literal src=
["'] # " or '
([^"']+) # capture: one or more non ' and " character
["'] # literal "
Try to specify pattern like this:
string pattern = #"<img\s+[^>]*\bsrc=[\"']([^\"']+)[\"']";
foreach (Match match in Regex.Matches(sentence, pattern))
Related
I have a dataframe with one column that represents the requests made by my users. A few examples look like this:
GET /enviro/html/tris/tris_overview.html
GET /./enviro/gif/emcilogo.gif
GET /docs/exposure/meta_exp.txt.html
GET /hrmd/
GET /icons/circle_logo_small.gif
I only want to select the last part of the string after the last "." in such a way that I return the pagetype of the string. The output of these lines should therefore be:
.html
.gif
.html
.gif
I tried doing this with sub but I only manage to select everything after the first "." example:
tring <- c("GET /enviro/html/tris/tris_overview.html", "GET /./enviro/gif/emcilogo.gif", "GET /docs/exposure/meta_exp.txt.html", "GET /hrmd/", "GET /icons/circle_logo_small.gif")
sub("^[^.]*", "", sapply(strsplit(tring, "\\s+"), `[`, 2))
this returns:
".html"
"./enviro/gif/emcilogo.gif"
".txt.html"
""
".gif"
I created the following gsub code that works for string containing two points:
gsub(pattern = ".*\\.", replacement = "", "GET /./enviro/gif/finds.gif", "\\s+")
this returns:
"gif"
However, I cant seem to come up with one gsub/sub that works for all possible input. It should read the string from right to left. Stop when it sees the first "." and return everything that was found after that "."
I am new to R and I can't come up with something that is doing this. Any help would be highly appreciated!
You can't change the string parsing direction with R regex. Instead, you may match all up to . and remove it, or match the . that has no . chars to the right of it till the end of string.
string <- c('GET /enviro/html/tris/tris_overview.html','GET /./enviro/gif/emcilogo.gif','GET /docs/exposure/meta_exp.txt.html','GET /hrmd/','GET /icons/circle_logo_small.gif')
res <- regmatches(string, regexec("\\.[^.]*$", string))
res[lengths(res)==0] <- ""
unlist(res)
Or
sub("^(.*(?=\\.)|.*)", "", string, perl=TRUE)
See the R online demo. Both return
[1] ".html" ".gif" ".html" "" ".gif"
Here, \.[^.]*$ matches a . and then any 0+ chars other than . till the end of string. The sub code used ^(.*(?=\\.)|.*) pattern that matches the start of string, then either any 0+ chars as many as possible till . without consuming the dot, or just matches any 0+ chars as many as possible, and replaces the match with an empty string.
See Regex 1 and Regex 2 demos.
Here is a regex-free solution:
sapply(
seq_along(a),
function(i) {
if (grepl("\\.", a[i])) tail(strsplit(a[i], "\\.")[[1]], 1) else ""
}
)
# [1] "html" "gif" "html" "" "gif"
I am using the following regular expression to find words and phrases in a document. (Have to use regular expression and have to use \b.)
\b (zoo|a & b|dummy)\b
When I try to find matches in the following string
going to the zoo with a & b
a & b doesn't get matched. However, if I remove the leading and following space from the string and regex, making both a&b, it matches, but I do need to those spaces.
Use \s for spaces
string strRegex = #"\b\s(zoo|a\s&\sb|dummy)\b";
Regex myRegex = new Regex(strRegex, RegexOptions.None);
string strTargetString = #"going to the zoo with a & b";
foreach (Match myMatch in myRegex.Matches(strTargetString))
{
Console.WriteLine(myMatch);
}
I'm using a linq query where i do something liike this:
viewModel.REGISTRATIONGRPS = (From a In db.TABLEA
Select New SubViewModel With {
.SOMEVALUE1 = a.SOMEVALUE1,
...
...
.SOMEVALUE2 = If(commaseparatedstring.Contains(a.SOMEVALUE1), True, False)
}).ToList()
Now my Problem is that this does'n search for words but for substrings so for example:
commaseparatedstring = "EWM,KI,KP"
SOMEVALUE1 = "EW"
It returns true because it's contained in EWM?
What i would need is to find words (not containing substrings) in the comma separated string!
Option 1: Regular Expressions
Regex.IsMatch(commaseparatedstring, #"\b" + Regex.Escape(a.SOMEVALUE1) + #"\b")
The \b parts are called "word boundaries" and tell the regex engine that you are looking for a "full word". The Regex.Escape(...) ensures that the regex engine will not try to interpret "special characters" in the text you are trying to match. For example, if you are trying to match "one+two", the Regex.Escape method will return "one\+two".
Also, be sure to include the System.Text.RegularExpressions at the top of your code file.
See Regex.IsMatch Method (String, String) on MSDN for more information.
Option 2: Split the String
You could also try splitting the string which would be a bit simpler, though probably less efficient.
commaseparatedstring.Split(new Char[] { ',' }).Contains( a.SOMEVALUE1 )
what about:
- separating the commaseparatedstring by comma
- calling equals() on each substring instead of contains() on whole thing?
.SOMEVALUE2 = If(commaseparatedstring.Split(',').Contains(a.SOMEVALUE1), True, False)
I am trying to write a regular expression that doesn't allow single or double quotes in a string (could be single line or multiline string). Based on my last question, I wrote like this ^(?:(?!"|').)*$, but it is not working. Really appreciate if anybody could help me out here.
Just use a character class that excludes quotes:
^[^'"]*$
(Within the [] character class specifier, the ^ prefix inverts the specification, so [^'"] means any character that isn't a ' or ".)
Just use a regex that matches for quotes, and then negate the match result:
var regex = new Regex("\"|'");
bool noQuotes = !regex.IsMatch("My string without quotes");
Try this:
string myStr = "foo'baa";
bool HasQuotes = myStr.Contains("'") || myStr.Contains("\""); //faster solution , I think.
bool HasQuotes2 = Regex.IsMatch(myStr, "['\"]");
if (!HasQuotes)
{
//not has quotes..
}
This regular expression below, allows alphanumeric and all special characters except quotes(' and "")
#"^[a-zA-Z-0-9~+:;,/#&_#*%$!()\[\] ]*$"
You can use it like
[RegularExpression(#"^[a-zA-Z-0-9~+:;,/#&_#*%$!()**\[\]** ]*$", ErrorMessage = "Should not allow quotes")]
here use escape sequence() for []. Since its not showing in this post
i need a Regular Expression to convert a a string to a link.i wrote something but it doesnt work in asp.net.i couldnt solve and i am new in Regular Expression.This function converts (bkz: string) to (bkz: show.aspx?td=string)
Dim pattern As String = "<bkz[a-z0-9$-$&-&.-.ö-öı-ış-şç-çğ-ğü-ü\s]+)>"
Dim regex As New Regex(pattern, RegexOptions.IgnoreCase)
str = regex.Replace(str, "<font color=""#CC0000"">$1</font>")
Generic remarks on your code: beside the lack of opening parentheses, you do redundant things: $-$ isn't incorrect but can be simplified into $ only. Same for accented chars.
Everybody will tell you that font tag is deprecated even in plain HTML: favor span with style attribute.
And from your question and the example in the reply, I think the expression could be something like:
\(bkz: ([a-z0-9$&.öışçğü\s]+)\)
the replace string would look like:
(bkz: <span style=""color: #C00"">$1</span>)
BUT the first $1 must be actually URL encoded.
Your regexp is in trouble because of a ')' without '('
Would:
<bkz:\s+((?:.(?!>))+?.)>
work better ?
The first group would capture what you are after.
Thanks Vonc,Now it doesnt raise error but also When i assign str to a Label.Text,i cant see the link too.Forexample after i bind str to my label,it should be viewed in view-source ;
<span id="Label1">(bkz: here)</span>
But now,it is in viewsource source;
<span id="Label1">(bkz: here)</span>