Remove string if it is only last part - r

I have a dataframe as follows:
A B
mediafile 1
filemedia 1
media time 1
time media 1
How do I remove the word "media" only if it is the last string in the column. Final Output:
A B
mediafile 1
file 1
media time 1
time 1
Thanks!

In regex, $ means "end of the string", so media$ will match media only if it is immediately followed by the end of the string.
Use gsub for find/replace:
your_data$A = gsub(pattern = "media$", replacement = "", x = your_data$A)
R uses regex the same as any other language, so in the future I'd recommend searching SO for something like "[regex] at end of string", which turned up this question, from which you probably could have generalized.

Related

How can I use R Regular Expressions to catch a Hebrew word?

I've been trying to catch the word
עונה
plus the subsequent number after it in a string such as
כל הילדים אוכלים, עונה 2 , פרק 8-לזניית ירקות וסלמון בדבש
Demonstrating it on Regex101.com was straightforward enough, with עונה(\s+\d+|\d+), but with R I came up empty.
str<-"כל הילדים אוכלים, עונה 2 , פרק 8-לזניית ירקות וסלמון בדבש"
exp<-"עונה(\\s+\\d+|\\d+)"
str_extract_all(str,exp)
Output:
[[1]]
character(0)
You can use this regex:
/[\u0590-\u05FF]/*

Regular Expression To exclude sub-string name(job corps) Includes at least 1 upper case letter, 1 lower case letter, 1 number and 1 symbol except "#"

Regular Expression To exclude sub-string name(job corps)
Includes at least 1 upper case letter, 1 lower case letter, 1 number and 1 symbol except "#"
I have written something like below :
^((?!job corps).)(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[!#$%^&*]).*$
I tested with the above regular expression, not working for special character.
can anyone guide on this..
If I understand well your requirements, you can use this pattern:
^(?![^a-z]*$|[^A-Z]*$|[^0-9]*$|[^!#$%^&*]*$|.*?job corps)[^#]*$
If you only want to allow characters from [a-zA-Z0-9^#$%&*] changes the pattern to:
^(?![^a-z]*$|[^A-Z]*$|[^0-9]*$|[^!#$%^&*]*$|.*?job corps)[a-zA-Z0-9^#$%&*]*$
details:
^ # start of the string
(?! # not followed by any of these cases
[^a-z]*$ # non lowercase letters until the end
|
[^A-Z]*$ # non uppercase letters until the end
|
[^0-9]*$
|
[^!#$%^&*]*$
|
.*?job corps # any characters and "job corps"
)
[^#]* # characters that are not a #
$ # end of the string
demo
Note: you can write the range #$%& like #-& to win a character.
stribizhev, your answer is correct
^(?!.job corps)(?=.[0-9])(?=.[a-z])(?=.[A-Z])(?=.[!#$%^&])(?!.#).$
can verify the expression in following url:
http://www.freeformatter.com/regex-tester.html

String recognition in idl

I have the following strings:
F:\Sheyenne\ROI\SWIR32_subset\SWIR32_2005210_East_A.dat
F:\Sheyenne\ROI\SWIR32_subset\SWIR32_2005210_Froemke-Hoy.dat
and from each I want to extract the three variables, 1. SWIR32 2. the date and 3. the text following the date. I want to automate this process for about 200 files, so individually selecting the locations won't exactly work for me.
so I want:
variable1=SWIR32
variable2=2005210
variable3=East_A
variable4=SWIR32
variable5=2005210
variable6=Froemke-Hoy
I am going to be using these to add titles to graphs later on, but since the position of the text in each string varies I am unsure how to do this using strmid
I think you want to use a combination of STRPOS and STRSPLIT. Something like the following:
s = ['F:\Sheyenne\ROI\SWIR32_subset\SWIR32_2005210_East_A.dat', $
'F:\Sheyenne\ROI\SWIR32_subset\SWIR32_2005210_Froemke-Hoy.dat']
name = STRARR(s.length)
date = name
txt = name
foreach sub, s, i do begin
sub = STRMID(sub, 1+STRPOS(sub, '\', /REVERSE_SEARCH))
parts = STRSPLIT(sub, '_', /EXTRACT)
name[i] = parts[0]
date[i] = parts[1]
txt[i] = STRJOIN(parts[2:*], '_')
endforeach
You could also do this with a regular expression (using just STRSPLIT) but regular expressions tend to be complicated and error prone.
Hope this helps!

Pyparsing - name not starting with a character

I am trying to use Pyparsing to identify a keyword which is not beginning with $ So for the following input:
$abc = 5 # is not a valid one
abc123 = 10 # is valid one
abc$ = 23 # is a valid one
I tried the following
var = Word(printables, excludeChars='$')
var.parseString('$abc')
But this doesn't allow any $ in var. How can I specify all printable characters other than $ in the first character position? Any help will be appreciated.
Thanks
Abhijit
You can use the method I used to define "all characters except X" before I added the excludeChars parameter to the Word class:
NOT_DOLLAR_SIGN = ''.join(c for c in printables if c != '$')
keyword_not_starting_with_dollar = Word(NOT_DOLLAR_SIGN, printables)
This should be a bit more efficient than building up with a Combine and a NotAny. But this will match almost anything, integers, words, valid identifiers, invalid identifiers, so I'm skeptical of the value of this kind of expression in your parser.

grep on two strings

I'm working to grab two different elements in a string.
The string look like this,
str <- c('a_abc', 'b_abc', 'abc', 'z_zxy', 'x_zxy', 'zxy')
I have tried with the different options in ?grep, but I can't get it right, 'm doing something like this,
grep('[_abc]:[_zxy]',str, value = TRUE)
and what I would like is,
[1] "a_abc" "b_abc" "z_zxy" "x_zxy"
any help would be appreciated.
Use normal parentheses (, not the square brackets [
grep('_(abc|zxy)',str, value = TRUE)
[1] "a_abc" "b_abc" "z_zxy" "x_zxy"
To make the grep a bit more flexible, you could do something like:
grep('_.{3}$',str, value = TRUE)
Which will match an underscore _ followed by any character . three times {3} followed immediately by the end of the string $
this should work: grep('_abc|_zxy', str, value=T)
X|Y matches when either X matches or Y matches
In this case just doing:
str[grep("_",str)]
will work... is it more complicated in your specific case?

Resources