Replacing a specific part - unix

I have a list like this:
DEL075MD1BWP30P140LVT
AN2D4BWP30P140LVT
INVD0P7BWP40P140
IND2D6BWP30P140LVT
I want to replace everything between D and BWP with a *.
How can I do that in Unix and Tcl?

Do you have the whole list available at the same time, or are you getting one item at a time from somewhere?
Should all D-BWP groups be processed, or just one per item?
If just one per item, should it be the first or last (those are the easiest alternatives)?
Tcl REs don't have any lookbehind, which would have been nice here. But you can do without both lookbehinds and lookaheads if you capture the goalposts and paste them into the replacement as back references. The regular expression for the text between the goalposts should be [^DB]+, i.e. one or more characters that are neither D nor B (to make sure the match doesn't escape the goalposts and latch on to other Ds or Bs in the text). So: {(D)[^DB]+(BWP)} (putting braces around the RE is usually a good idea).
If you have the whole list and want to process all groups, try this:
set result [regsub -all {(D)[^DB]+(BWP)} $lines {\1*\2}]
(If you can only work with one line at a time, it's basically the same, you just use a variable for a single line instead of a variable for the whole list. In the following examples, I use lmap to generate individual lines, which means I need to have the whole list anyway; this is just an example.)
Process just the first group in each line:
set result [lmap line $lines {
regsub {(D)[^DB]+(BWP)} $line {\1*\2}
}]
Process just the last group in each line:
set result [lmap line $lines {
regsub {(D)[^DB]+(BWP[^D]*)$} $line {\1*\2}
}]
The {(D)[^DB]+(BWP[^D]*)$} RE extends the right goalpost to ensure that there is no D (and hence possibly a new group) anywhere between the goalpost and the end of the string.
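For comparison, the same goalpost technique can be sketched outside Tcl. The following is only an illustration in Python, assuming its standard re module; the helper name star_groups and its first_only parameter are made up for this sketch and are not part of the answer above.

```python
import re

# Same pattern as the Tcl answer: capture both goalposts and match the
# text between them with a class that excludes D and B.
GROUP = re.compile(r"(D)[^DB]+(BWP)")

def star_groups(line, first_only=False):
    """Replace the text between D and BWP with '*' (hypothetical helper)."""
    # count=0 replaces every group (like regsub -all); count=1 only the first.
    return GROUP.sub(r"\1*\2", line, count=1 if first_only else 0)

lines = ["DEL075MD1BWP30P140LVT", "AN2D4BWP30P140LVT",
         "INVD0P7BWP40P140", "IND2D6BWP30P140LVT"]
result = [star_groups(line) for line in lines]
print(result)
```

Because [^DB]+ cannot cross another D, the match for DEL075MD1BWP30P140LVT starts at the second D, giving DEL075MD*BWP30P140LVT.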
Documentation:
lmap (for Tcl 8.5),
lmap,
regsub,
set,
Syntax of Tcl regular expressions

Related

How do I extract a section number and the text after it?

I have a question.
My text file contains lines such as:
1.1        Description.
This is the description.
1.1.1      Quality Assurance
Random sentence.
1.6.1    Quality Control. Quality Control is the responsibility of the contractor.
I'm trying to find out how to get:
1.1        Description
1.1.1      Quality Assurance
1.6.1    Quality Control
Right now, I have:
txt1 <- readLines("text1.txt")
txt2<-grep("^[0-9.]+", txt1, value = TRUE)
file<-write(txt2, "text3.txt")
which results in:
1.1        Description.
1.1.1      Quality Assurance
1.6.1    Quality Control. Quality Control is the responsibility of the contractor.
You are using grep with value=TRUE, which
returns a character vector containing the selected elements of x
(after coercion, preserving names but no other attributes).
This means that if your regular expression matches anything in the line, the whole line is returned. Your regular expression matches numbers at the beginning of the line, so every line that begins with a number gets selected.
It seems that your goal is not to select the whole line, but only the part up to a period or the end of the line.
So you need to make the regular expression more specific, and you need to extract only the matching portion of the line.
A regular expression that matches what you want can be:
"^([0-9]\\.?)+ .+?(\\.|$)"
It matches numbers with dots, followed by a space, followed by anything, and stops matching when it reaches a . or the end of the line. I recommend the following website to better understand what the regex does: https://regexr.com/
The next step is to extract only the matching portion of each line, rather than the whole line in which the regex matches. For this we'll use the function regexpr, which tells us where the matches are, and the function regmatches, which extracts those matches:
txt1 <- readLines("text1.txt")
regmatches(txt1, regexpr("^([0-9]\\.?)+ .+?(\\.|$)", txt1))
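The difference between selecting whole lines and extracting only the matched portion can also be shown in a short Python sketch. This is an illustration under assumptions, not the R answer itself: the helper name headings is hypothetical, and the pattern is a slight variation that puts the wanted text in a capture group so the trailing period can be left out, as in the output the question asks for.

```python
import re

# Section number, whitespace, then the shortest text that is followed by a
# period or the end of the line. Group 1 excludes that trailing period.
HEADING = re.compile(r"^([0-9]+(?:\.[0-9]+)*\s+.+?)(?:\.|$)")

def headings(lines):
    """Return only the matched part of each line that starts with a section number."""
    found = []
    for line in lines:
        m = HEADING.match(line)
        if m:  # lines without a leading section number are skipped
            found.append(m.group(1))
    return found

sample = [
    "1.1        Description.",
    "This is the description.",
    "1.1.1      Quality Assurance",
    "Random sentence.",
    "1.6.1    Quality Control. Quality Control is the responsibility of the contractor.",
]
for heading in headings(sample):
    print(heading)
```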

R - Split character vector using regex

I've got some kind of logfile I'd like to read and analyse. Unfortunately the files are saved in a pretty "ugly" way (with lots of special characters in between), so I'm not able to read in just the lines with each one being an entry. The only way to separate the different entries is using regular expressions, since the beginning of each entry follows a specified pattern.
My first approach was to identify the pattern in the character vector (I use read_file from the readr-package) and use the corresponding positions to split the vector with strsplit. Unfortunately the positions seem not always to match, since the result doesn't always correspond to the entries (I'd guess that there's a problem with the special characters).
A typical line of the file looks as follows:
16/10/2017, 21:51 - George: This is a typical entry here
The corresponding regular expression looks as follows:
([[:digit:]]{2})/([[:digit:]]{2})/([[:digit:]]{4}), ([[:digit:]]{2}):([[:digit:]]{2}) - ([[:alpha:]]+):
The first thing I want is a data.frame with each line corresponding to a specific entry (in a next step I'd split the pattern into its different parts).
What I tried so far was the following:
regex.log = "([[:digit:]]{2})/([[:digit:]]{2})/([[:digit:]]{4}), ([[:digit:]]{2}):([[:digit:]]{2}) - ([[:alpha:]]+):"
log.regex = gregexpr(regex.log, file.log)[[1]]
log.splitted = substring(file.log, log.regex, log.regex[2:355]-1)
As can be seen, this logfile has 355 entries. The first ones are separated correctly. How can I separate the character vector using a regular expression without losing the information of the regular expression/pattern?
Use capturing and non-capturing groups to identify the parts you want to keep, and be sure to use anchors:
file.log = "16/10/2017, 21:51 - George: This is a typical entry here"
regex.log = "^((?:[[:digit:]]{2})\\/(?:[[:digit:]]{2})\\/(?:[[:digit:]]{4}), (?:[[:digit:]]{2}):(?:[[:digit:]]{2}) - (?:[[:alpha:]]+)): (.*)$"
gsub(regex.log,"\\1",file.log)
>> "16/10/2017, 21:51 - George"
gsub(regex.log,"\\2",file.log)
>> "This is a typical entry here"
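The splitting step the question asks about (cutting one long string into entries while keeping each header) can be sketched as follows. This is a minimal Python illustration, assuming the header format shown in the question; the helper name split_entries and the second log entry are invented for the example.

```python
import re

# Header pattern from the question: date, time, " - ", name, colon.
HEADER = re.compile(r"\d{2}/\d{2}/\d{4}, \d{2}:\d{2} - [A-Za-z]+:")

def split_entries(text):
    """Cut text at the start of every header, keeping the header in its entry."""
    starts = [m.start() for m in HEADER.finditer(text)]
    # Each entry runs up to the next header; the last one runs to the end.
    ends = starts[1:] + [len(text)]
    return [text[s:e].strip() for s, e in zip(starts, ends)]

# Made-up sample: the first line is from the question, the rest is hypothetical.
log = ("16/10/2017, 21:51 - George: This is a typical entry here\n"
       "16/10/2017, 21:52 - Anna: A second entry\nthat spans two lines\n")
entries = split_entries(log)
print(entries)
```

Slicing between match positions keeps the pattern text in the result, and appending the total length as the final end position avoids dropping the last entry.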

How to replace a string pattern with different strings quickly?

For example, I have many HTML tabs to style. They use different classes and will have different backgrounds. The background image files have names corresponding to the class names.
The way I found to do it is to yank:
.tab.home {
background: ...home.jpg...
}
then paste, then :s/home/about.
This has to be repeated a few times. I found that & can be used to repeat the last substitute, but only with the same target string. What is the quickest way to repeat a substitute with a different target string?
Alternatively, are there more efficient ways to do such a thing?
I had a quick play with some Vim macro magic and came up with the following idea. I apologise for the length; I thought it best to explain the steps.
First, place the text block you want to repeat into a register (I picked register z), so with the cursor at the beginning of the .tab line I pressed "z3Y (select reg z and yank 3 lines).
Then I typed the series of Vim commands I wanted into the buffer as )"zp:.,%s/home/ (just press i and type the commands).
This translates to:
) go to the end of the current '{}' block,
"zp paste a copy of the text in register z,
.,%s/home/ which has two tricks.
The .,% ensures the substitution applies to everything from the start of the .tab to the closing }, and
the command is incomplete (i.e., it does not have a carriage return at the end), so Vim will prompt me to complete the command.
Note that while %s/// performs a substitution across every line of the file, it is important to realise that % on its own is an alias for the range 1,$. Using .,% as a range causes the % to be used as the 'jump to matching parenthesis' operator, resulting in a range from the current line to the end of the % match (which in this example is the closing brace of the block).
Then, after placing the cursor on the ) at the beginning of the line, I typed "qy$ which means yank all characters to the end of the line into register q.
This is important, because simply yanking the line with Y will include a carriage return in the register, and will cause the macro to fail.
I then executed the content of register q with @q and was prompted to complete the s/home/ on the command line.
After typing the replacement text and pressing Enter, the pasted block (from register z) appeared in the buffer with the substitutions already applied.
At this point you can repeat the last @q by simply typing @@. You don't even need to move the cursor down to the end of the block, because the ) at the start of the macro does that for you.
This effectively reduces the process of yanking the original text, inserting it, and executing two manual replace commands to a simple @@.
You can safely delete the macro string from your edit buffer when done.
This is incredibly vim-ish, and might waste a bit of time getting it right, but it could save you even more when you do.
Vim macros might be the trick you are looking for.
From the manual, I found :s//new-replacement. Seemed to be too much typing.
Looking for a better answer.

Copy the highlighted pattern in gvim

Let's say I highlighted (matched) the text in brackets using
/(.*)
Now, how do I copy only the highlighted text (i.e. the matching pattern, not the entire line) into a buffer, so that I can paste it somewhere?
Multiple approaches are presented in this Vim Tips Wiki page. The simplest approach is the following custom command:
function! CopyMatches(reg)
let hits = []
%s//\=len(add(hits, submatch(0))) ? submatch(0) : ''/ge
let reg = empty(a:reg) ? '+' : a:reg
execute 'let @'.reg.' = join(hits, "\n") . "\n"'
endfunction
command! -register CopyMatches call CopyMatches(<q-reg>)
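To make clear what ends up in the register, here is a rough Python analogue of the collection step: gather every match and join them with newlines. It is only a sketch; the helper name collect_matches and the sample text are hypothetical, and the pattern uses [^)]* instead of .* so that each pair of parentheses is matched separately.

```python
import re

def collect_matches(text, pattern):
    """Return all matches of pattern in text, one per line, newline-terminated."""
    hits = [m.group(0) for m in re.finditer(pattern, text)]
    # Mirrors join(hits, "\n") . "\n" in the Vim function above.
    return "\n".join(hits) + "\n"

# Made-up buffer contents for illustration.
buffer_text = "call (foo) then (bar)\nand finally (baz)"
copied = collect_matches(buffer_text, r"\([^)]*\)")
print(copied, end="")
```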
When you search, you can use the e offset to move to the end of the match. So if I understand your question correctly, and you searched using e.g.:
/bar
And you wish to copy it, use:
y//e
This will yank using the previous search pattern until the end of the match.
Do you want to combine every (foo) in the buffer in one register (which would look like (foo)(bar)(baz)…), or do you want to yank a single (foo) that you matched?
The latter is done with ya( if you want the parentheses, or yi( if you only want what's between them.
Ingo's answer takes care of the former.

Simple Vim Programming (vimrc file)

I'm trying to learn how to configure my .vimrc file with my own functions.
I'd like to write a function that traverses every line in a file and counts the total number of characters, but ignores all whitespace. This is for a programming exercise and as a stepping stone to more complex programs (I know there are other ways to get this example value using Vim or external programs).
Here's what I have so far:
function countchars()
let line = 0
let count = 0
while line < line("$")
" update count here, don't count whitespace
let line = getline(".")
return count
endfun
What functional code could I replace that commented line with?
If I understand the question correctly, you're looking to count the number of non-whitespace characters in a line. A fairly simple way to do this is to remove the whitespace and look at the length of the resulting line. Therefore, something like this:
function! Countchars()
let l = 1
let char_count = 0
while l <= line("$")
let char_count += len(substitute(getline(l), '\s', '', 'g'))
let l += 1
endwhile
return char_count
endfunction
The key part of the answer to the question is the use of substitute. The command is:
substitute(expr,pattern,repl,flags)
expr in this case is getline(l) where l is the number of the line being iterated over. getline() returns the content of the line, so this is what is being parsed. The pattern is the regular expression \s which matches any single whitespace character. It is replaced with '', i.e. an empty string. The flag g makes it repeat the substitute as many times as whitespace is found on the line.
Once the substitution is complete, len() gives the number of non-whitespace characters and this is added to the current value of char_count with +=.
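The strip-then-measure idea is not specific to Vim script. As a cross-check, a minimal Python sketch of the same logic might look like this; the function name count_chars and the sample lines are made up for illustration.

```python
import re

def count_chars(lines):
    """Count non-whitespace characters across all lines."""
    total = 0
    for line in lines:
        # Remove every whitespace character, then measure what is left,
        # like len(substitute(getline(l), '\s', '', 'g')) in the Vim version.
        total += len(re.sub(r"\s", "", line))
    return total

sample = ["let x = 1", "  echo x", ""]
print(count_chars(sample))  # 6 + 5 + 0 = 11
```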
A few things that I've changed from your sample:
The function name starts with a capital letter (this is a requirement for user defined functions: see :help user-functions)
I've renamed count to char_count as you can't have a variable with the same name as a function and count() is a built-in function
Likewise for line: I renamed this to l
The first line in a file is line 1, not line 0, so I initialised l to 1
The while loop counted up to but not including the last line, I assume you wanted all the lines in the file (this is probably related to the line numbering starting at 1): I changed your code to use <= instead of <
Blocks aren't based on indentation in vim, so the while needs an endwhile
I replaced let line = getline('.') with an explicit increment of the line number (explained below)
I added a ! on the function definition as it makes incremental development much easier (everytime you re-source the file, it will override the function with the new version rather than spitting out an error message about it already existing).
Incrementing through the file works slightly differently...
In your function, you had let line = getline('.'). Ignoring the variable name, there are still some problems with this implementation. I think what you meant was let l = line('.'), which gives the line number of the current line. getline('.') gives the contents of the current line, so the comparison on the while line would be comparing the content of the current line with the number of the last line and this would fail. The other problem is that you're not actually moving through the file, so the current line would be whichever line you were on when you called the function and would never change, resulting in an infinite loop. I've replaced this with a simple += 1 to step through the file.
There are cases where the current line would be a useful way to do this, for example if you were writing a function that took a range of lines, but I think I've written enough and the above will hopefully get you going for now. There are plenty of people on Stack Overflow to help with any issues anyway!
Have a look at:
:help usr_41.txt
:help function-list
:help user-functions
:help substitute()
along with the :help followed by the various things I used in the function (getline(), line(), let+= etc).
Hope that was helpful.
This approach uses lists:
function! Countchars()
let n = 0
for line in getline(1,line('$'))
let n += len(split(line,'\zs\s*'))
endfor
return n
endfunction
I suppose you have already found the solution.
Just for info:
I use this to count characters without spaces in Vim:
%s/\S/&/gn
