Need help to find command unix to split line basis on certain string - unix

For example , the line is /opt/SP/oracluster/MW/Config/Domains/OT/bin
I want to divide it into two lines with separator string /Config/Domains/ , it should show two lines as
/opt/SP/oracluster/MW &
/OT/bin
I need to separate line basis on certain string/word
Can anyone assist..
I tried using cut ,split,awk commands but didn't work

Related

Xquery preserve spaces while tokenizing

I am trying to achieve the below using XQuery
Input
<DemoXML>
This is a sample line one
this is line number two
this line contains multiple spaces
paragraph ends
</DemoXML
Required Output(Two Records)
<Record1>
This is a sample line one
this line contains multiple spaces
paragraph ends
</Record1>
<Record2>
This is a sample line one
this line contains multiple spaces
paragraph ends
</Record2>
I tried using Tokenize but the problem is tokenize function removes all the 'Spaces' in the secondline.
this is line number two
fn:tokenize($input,'\n')
Tokenize Output
This is a sample line one
this is line number two
this line contains multiple spaces
paragraph ends
Can someone let me know a workaround plz
Your attached query is working fine. Also attached generated output for your reference. May be issue in processor which you are using. I test this query in Marklogic console and Oxygen Editor with XQuery 9.6.0.7
let $val1:=
This is a sample line one
this is line number two
this line contains multiple spaces
paragraph ends
return tokenize($val1,'\n')
Generate Output:
This is a sample line one this is line number two this line contains multiple spaces paragraph ends

R - Split character vector using regex

I've got some kind of logfile I'd like to read and analyse. Unfortunately the files are saved in a pretty "ugly" way (with lots of special characters in between), so I'm not able to read in just the lines with each one being an entry. The only way to separate the different entries is using regular expressions, since the beginning of each entry follows a specified pattern.
My first approach was to identify the pattern in the character vector (I use read_file from the readr-package) and use the corresponding positions to split the vector with strsplit. Unfortunately the positions seem not always to match, since the result doesn't always correspond to the entries (I'd guess that there's a problem with the special characters).
A typical line of the file looks as follows:
16/10/2017, 21:51 - George: This is a typical entry here
The corresponding regular expressions looks as follows:
([[:digit:]]{2})/([[:digit:]]{2})/([[:digit:]]{4}), ([[:digit:]]{2}):([[:digit:]]{2}) - ([[:alpha:]]+):
The first thing I want is a data.frame with each line corresponding to a specific entry (in a next step I'd split the pattern into its different parts).
What I tried so far was the following:
regex.log = "([[:digit:]]{2})/([[:digit:]]{2})/([[:digit:]]{4}), ([[:digit:]]{2}):([[:digit:]]{2}) - ([[:alpha:]]+):"
log.regex = gregexpr(regex.log, file.log)[[1]]
log.splitted = substring(file.log, log.regex, log.regex[2:355]-1)
As can be seen this logfile has 355 entries. The first ones are separated correctly. How can I separate the character vector using a regular expression without loosing the information of the regular expression/pattern?
Use capturing and non-capturing groups to identify the parts you want to keep, and be sure to use anchors:
file.log = "16/10/2017, 21:51 - George: This is a typical entry here"
regex.log = "^((?:[[:digit:]]{2})\\/(?:[[:digit:]]{2})\\/(?:[[:digit:]]{4}), (?:[[:digit:]]{2}):(?:[[:digit:]]{2}) - (?:[[:alpha:]]+)): (.*)$"
gsub(regex.log,"\\1",file.log)
>> "16/10/2017, 21:51 - George"
gsub(regex.log,"\\2",file.log)
>> "This is a typical entry here"

Regex for comma delimited list with line breaks

The format I would like to allow in my text boxes are comma delimited lists followed by a line break in between the comma delimited lists. Here is an example of what I want from the user:
1,2,3
1,2,4
1,2,5
1,2,6
So far I have limited the user using this ValidationExpression:
^([1-9][0-9]*[]*[ ]*,[ ]*)*[1-9][0-9]*$
However with that expression, the user is only able to enter one row of comma delimited numbers.
How can proceed to accept multiple rows by accepting line breaks?
It is possible to check if the input has the correct format. I would recommend to use groups and repeat them:
((\d+,)+\d+\n?)+
But to check if the matrix is symmetric you have to use something else then regex.
Check it out here: https://regex101.com/r/GqtOuQ/2/
If you want to be a bit more user friendly it is possible to allow as much horizontal spaces as the user wants to add between the number and comma. This can be done with he regex group \h which allows every whitespace except \n.
The regex code looks now a bit more messy:
((\h*\d+\h*,\h*)+\h*\d+\h*\n?\h*)+
Check this out here: https://regex101.com/r/GqtOuQ/3
Here is the version that should work with .NET:
(([ \t]*\d+[ \t]*,[ \t]*)+[ \t]*\d+[ \t]*\n?[ \t]*)+

.ksh paste user input value into dataset

Good morning.
First things first: I know next to nothing about shell scripting in Unix, so please pardon my naivety.
Here's what I'd like to do, and I think it's relatively simple: I would like to create a .ksh file to do two things: 1) take a user-provided numerical value (argument) and paste it into a new column at the end of a dataset (a separate .txt file), and 2) execute a different .ksh script.
I envision calling this script at the Unix prompt, with the input value added thereafter. Something like, "paste_and_run.ksh 58", where 58 would populate a new, final (un-headered) column in an existing dataset (specifically, it'd populate the 77th column).
To be perfectly honest, I'm not even sure where to start with this, so any input would be very appreciated. Apologies for the lack of code within the question. Please let me know if I can offer any more detail, and thank you for taking a look.
I have found the answer: the "nawk" command.
TheNumber=$3
PE_Infile=$1
Where the above variables correspond to the third and first arguments from the command line, respectively. "PE_Infile" represents the file (with full path) to be manipulated, and "TheNumber" represents the number to populate the final column. Then:
nawk -F"|" -v TheNewNumber=$TheNumber '{print $0 "|" TheNewNumber/10000}' $PE_Infile > $BinFolder/Temp_Input.txt
Here, the -F"|" dictates the delimiter, and the -v dictates what is to be added. For reasons unknown to myself, the declaration of a new varible (TheNewNumber) was necessary to perform the arithmetic manipulation within the print statement. print $0 means that the whole line would be printed, while tacking the "|" symbol and the value of the command line input divided by 10000 to the end. Finally, we have the input file and an output file (Temp_PE_Input.txt, within a path represented by the $Binfolder variable).
Running the desired script afterward was as simple as typing out the script name (with path), and adding corresponding arguments ($2 $3) afterward as needed, each separated by a space.

Finding Extra String between two files in unix

I need to find the string part which is different in two text files in unix and not the whole line. Any help how can i do this?

Resources