I am looking to edit a file containing MAC addresses so that they all end in f.
Example:
<accessPointMeasurement mac="xx:xx:xx:xx:xx:xx"/>
<accessPointMeasurement mac="xx:xx:xx:xx:xx:xf"/>
I've looked on the site for how to do it and can select the MAC addresses, but I can't work out the final part of changing the last character. There are several thousand of them, so I need a way to change them all at once.
Can anyone help?
Thank You
How about:
Ctrl+H
Find what: (?<=<accessPointMeasurement mac="[^"]{16}).
Replace with: f
(the dot matches the current last character of the address, so whatever is there becomes f)
Replace all
(?<= ... ) is called a lookbehind; it's a zero-length assertion.
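Not part of the recipe above, but if it helps to see the same lookbehind outside the editor, here is a minimal R sketch (the sample line is made up, and perl = TRUE is what enables the lookbehind in gsub):
x <- '<accessPointMeasurement mac="00:11:22:33:44:55"/>'
gsub('(?<=<accessPointMeasurement mac="[^"]{16}).', "f", x, perl = TRUE)
# [1] "<accessPointMeasurement mac=\"00:11:22:33:44:5f\"/>"
Because the lookbehind consumes nothing, only the final hex digit is replaced.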
I'm trying to grep strings that end in a dash in R, but I'm having trouble. I've worked out how to grep strings ending in any punctuation mark; it may not be the best way, but this worked:
grep("\\#[[:print:]]+[[:punct:]]$",c)
I can't for the life of me work out how to grep strings that end specifically in a dash, for example these strings:
- # (piano) - not this.
- # hello hello - not this either.
I'd like to sub all the stuff between the dashes (including the dashes themselves) with nothing, "", and leave the text to the right of the second dash, which ends in a full stop. So, based on the examples above, I would like the output to be:
not this.
and
not this either.
Any help would be appreciated.
Thank you!
Maro
UPDATE:
Hi again everyone,
I'm just updating my original question again:
So what I had in my original data was these three examples (I tried to simplify in my original post above, but I think it might be helpful for you all to see what I was actually dealing with):
1. - # (Piano) - no, and neither can you.
2. - # (Piano) - uh-huh.
3. - # Many dreams ago - Try it again.
(The numbers 1-3 are just for the purpose of making things clearer; they are not part of the strings.)
I was trying to find a way to delete all the stuff between and including the two dashes, and leave all the stuff after the second dash, so I wanted my output to be:
no, and neither can you.
uh-huh.
Try it again.
I ended up using this:
gsub(("-[[:blank:]]#[[:blank:]]\\(?[A-Z][a-z]*\\)?[[:blank:]]-", "", c)
which helped me get 1. and 2. in one go. But it didn't help with 3 - I thought that including the question mark after the opening and closing bracket (which I thought meant 'optional') would get me all three targets, but for some reason it didn't. To then get 3, I just ended up targeting that specific string, i.e. - # Many dreams ago -, by using:
gsub(("- # Many dreams ago -"), "", c)
I'm new to this, so not the best solution I'm sure.
In my original post (this has been edited a couple of times) I included square brackets around the three strings, which explains some of the answers I originally received from members of the community. Apologies for the confusion!
Thanks everyone - if there's anything that doesn't make sense, please let me know, and I'll try to clarify.
Maro
If you want to stay in between the square brackets, you can start the match at #, then use a negated character class [^][]* to match optional chars other than an opening or closing square bracket, and finally match the last -.
Replace the match with an empty string.
c <- "[- # (piano) - not this.]"
sub("#[^][]*-", "", c)
Output
[1] "[- not this.]"
For a more specific match of that string format, you can match the whole line including the square brackets, the # and the string ending in a full stop, and capture what you want to keep.
In the replacement, use the capture group value.
c <- c("[- # (piano) - not this.]", "[- # hello hello - not this either.]")
sub("\\[[^][#]*#[^][]*-\\s*([^][]*\\.)]", "\\1", c)
Output
[1] "not this." "not this either."
stopwords_tr <- data.frame(word = stopwords::stopwords("tr",source="stopwords-iso"), stringsAsFactors = FALSE)
stopwords_tr
Some characters in stopwords_tr are not Turkish. For example:
1 acaba
2 acep
3 adamakıllı
4 adeta
5 ait
6 altmýþ <- this should be: altmış
7 altmış
8 altý <- this should be: altı
I'm looking for a way to fix them.
stopwords_tr$word<-gsub("ý","ı",stopwords_tr$word)
The result did not change.
I also tried these, but they didn't work either:
Encoding(stopwords_tr$word) <- "WINDOWS-1254"
Encoding(stopwords_tr$word) <- "LATIN-5"
Encoding(stopwords_tr$word) <- "UTF-8"
Another interesting thing: when you double-click stopwords_tr in RStudio to display it, the character appears as "ý", but in the console it looks like "y".
Is there a parameter to set the encoding?
Thanks to everyone.
If you're sure this is an error, I think the best way to fix this is to fix the original source: post an issue to https://github.com/stopwords-iso/stopwords-iso/issues or https://github.com/stopwords-iso/stopwords-tr/issues (not sure which is better; try one, and if you get it wrong, they'll tell you!)
But check that it really is wrong. I don't know Turkish, but when I do a Google search for "altmýþ", I find it on several pages that look like Turkish to me, e.g. https://greatsong.net/PAROLES-ISMAIL-YK,ISTEMIYORUM-SENI,101646494.html. Probably an encoding error, but if it is a common one, maybe you really do want it in the list.
Regarding the display issues: sounds like you're on Windows. R on Windows has issues displaying non-native characters. You probably don't have Icelandic installed, so it will have trouble displaying a word like altmýþ.
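If you just need a stopgap until the source data is fixed, you can translate the mojibake characters yourself. This is only a sketch, not something the stopwords package provides: the pairs below come from comparing Latin-1 with Windows-1254 (ý→ı, þ→ş, ð→ğ, plus their capitals), and using \u escapes avoids depending on your script file's encoding (which may be why the plain gsub("ý", "ı", ...) didn't take):
fix_tr <- function(x) {
  # translate each Latin-1 mojibake character to its Turkish counterpart
  chartr("\u00fd\u00fe\u00f0\u00dd\u00de\u00d0",   # ý þ ð Ý Þ Ð
         "\u0131\u015f\u011f\u0130\u015e\u011e",   # ı ş ğ İ Ş Ğ
         x)
}
stopwords_tr$word <- fix_tr(stopwords_tr$word)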
I followed @user2554330's advice. However, I reported the issue at a different address than the one he suggested.
I contacted the creator of stopwords-tr (Kenneth Benoit). The problem stems from a mis-encoded data source. I also noticed duplicated words and reported them. Together we solved the character problem, and stopwords-tr was updated at the following address:
(Fix Turkish #16)
https://github.com/quanteda/stopwords/pull/16
devtools::install_github("quanteda/stopwords", ref = "fix-tr")
stopwords("tr", source = "stopwords-iso")
"Turkish Stopwords" now seems to be properly encoded.
Greetings..
I'm having trouble reading this table into R:
http://www.census.gov/popest/about/geo/state_geocodes_v2012.txt
I tried all of the following:
read.table("http://www.census.gov/popest/about/geo/state_geocodes_v2012.txt")
read.table("http://www.census.gov/popest/about/geo/state_geocodes_v2012.txt",skip=7,header=FALSE)
read.table("http://www.census.gov/popest/about/geo/state_geocodes_v2012.txt",skip=8,header=FALSE)
read.table("http://www.census.gov/popest/about/geo/state_geocodes_v2012.txt",skip=10,header=FALSE)
If I tell it that the separator is a tab, I get the wrong table:
d = read.table(file="http://www.census.gov/popest/about/geo/state_geocodes_v2012.txt",header=FALSE,skip=7,sep="\t")
The only thing that seems to work is readLines, but then I don't know how to get a data.frame out of the lines.
d =readLines("http://www.census.gov/popest/about/geo/state_geocodes_v2012.txt")
Any suggestions? Thanks.
I agree that read.fwf will work, once you've worked out the widths.
But, yeah -- I just hate it when people allow whitespace inside elements (e.g. "South Dakota"). One other thing you can do is edit the source text file, replacing runs of two or more spaces with a tab. That will leave the state names as-is but give you a workable delimiter.
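Here is a rough sketch of that second suggestion, done from within R (untested against the live file; the URL and skip = 7 are taken straight from the question):
url <- "http://www.census.gov/popest/about/geo/state_geocodes_v2012.txt"
raw <- readLines(url)
# collapse runs of two or more spaces into a single tab, then re-parse
tabbed <- gsub(" {2,}", "\t", raw)
d <- read.table(text = tabbed, sep = "\t", header = FALSE,
                skip = 7, fill = TRUE, stringsAsFactors = FALSE)
head(d)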
I am currently investigating the most appropriate dictionary to use in an application I am building.
Inspecting the dictionaries bundled with Sublime Text 2, I found that the file format is as you would expect - a list of alphabetically ordered words. However, a lot of those words have additional information appended to them. Take this snippet as an example:
abaft
abbreviation/M
abdicate/DNGSn
Abelard/M
abider/M
Abidjan
ablaze
abloom
aboveground
abrader/M
Abram/M
abreaction/MS
abrogator/MS
abscond/DRSG
absinthe/MS
absoluteness/S
absorbency/SM
abstract/ShTVDPiGY
absurdness/S
A fruitless Google search has not shed any light on what the letters after the slash (/) mean.
Maybe they hint at the gender of the word, but that is only a guess, and I'd prefer to read a formal explanation of their meaning.
Has anybody come across these?
The letters following the slash are affix flags: each one names a prefix or suffix rule that may be applied to the root word.
See this blog post for a nice explanation and examples of what these affixes can be used for.
Another place to look is the aspell manual.
TL;DR: each letter following the slash in the .dic file is the name of a rule in the .aff file.
https://superuser.com/a/633869/367530
Each rule is in the .aff file for that language. The rules come in two flavors: SFX for suffixes and PFX for prefixes. Each rule's header line begins with PFX or SFX, followed by the rule letter identifier (the one that follows the word in the dictionary file):
PFX [rule_letter_identifier] [combineable_flag] [number_of_rule_lines_that_follow]
You can normally ignore the combineable flag; it is Y or N depending on whether the rule can be combined with other rules. Then there are some number of lines (indicated by the last field above) that list different possibilities for how this rule applies in different situations. Each of those lines looks like this:
PFX [rule_letter_identifier] [number_of_letters_to_delete] [what_to_add] [when_to_add_it]
For example:
SFX B Y 3
SFX B 0 able [^aeiou]
SFX B 0 able ee
SFX B e able [^aeiou]e
If B is one of the letters following a word, i.e. someword/B, then this rule can apply to it. There are three possibilities (because there are three lines), and only one will apply:
able is added to the end when the end of the word is not (indicated by ^) one of the letters in the set (indicated by [ ]) of letters a, e, i, o, and u. For example, question → questionable
able is added to the end when the end of the word is ee. For example, agree → agreeable.
able is added to the end when the word ends in a non-vowel ([^aeiou]) followed by an e; the e is stripped off first (the column before able). For example, excite → excitable.
PFX rules work the same way, but they apply to the beginning of the word instead, for prefixes.
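To make the mechanics concrete, here is a small R sketch (a toy illustration, not how hunspell actually implements it) that applies the three SFX B lines above by hand: each line is a (strip, add, condition) triple, and, as described above, only one of them applies to a given word:
apply_sfx_B <- function(word) {
  rules <- list(
    list(strip = "",  add = "able", cond = "[^aeiou]$"),
    list(strip = "",  add = "able", cond = "ee$"),
    list(strip = "e", add = "able", cond = "[^aeiou]e$")
  )
  for (r in rules) {
    if (grepl(r$cond, word)) {
      # strip the indicated letters from the end, then append the suffix
      stem <- if (nzchar(r$strip)) sub(paste0(r$strip, "$"), "", word) else word
      return(paste0(stem, r$add))
    }
  }
  word  # no condition matched: this flag produces no extra form
}
apply_sfx_B("question")  # "questionable"
apply_sfx_B("agree")     # "agreeable"
apply_sfx_B("excite")    # "excitable"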
Let's say I highlighted (matched) text in brackets using
/(.*)
Now, how do I copy only the highlighted text (i.e. the matching pattern, not the entire line) into a buffer, so that I can paste it somewhere?
Multiple approaches are presented in this Vim Tips Wiki page. The simplest approach is the following custom command:
function! CopyMatches(reg)
let hits = []
%s//\=len(add(hits, submatch(0))) ? submatch(0) : ''/ge
let reg = empty(a:reg) ? '+' : a:reg
execute 'let @'.reg.' = join(hits, "\n") . "\n"'
endfunction
command! -register CopyMatches call CopyMatches(<q-reg>)
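With the command defined above, usage looks like this: run your search first (for the example in the question that would be /(.*)), then
:CopyMatches
copies every match of the last search into the clipboard register (the '+' default in the function), while
:CopyMatches a
puts the matches into register a, so you can paste them later with "ap.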
When you search, you can use the e flag to move the cursor to the end of the match. So if I understand your question correctly, if you searched using e.g.:
/bar
and you wish to copy the match, use:
y//e
This will yank using the previous search pattern until the end of the match.
Do you want to collect every (foo) in the buffer into one register (which would look like (foo)(bar)(baz)…), or do you want to yank a single (foo) that you matched?
The latter is done with ya( if you want the parentheses, or yi( if you only want what's between them.
Ingo's answer takes care of the former.