How to remove/add spaces in all text files? - unix

I have several files that look like this, e.g. test.in:
apple foo bar
hello world
I need this output, with a space after every character:
a p p l e f o o b a r
h e l l o w o r l d
I thought I would first remove all spaces and then add a space after each character, like this:
sed 's/\s//g' test.in | sed -e 's/\(.\)/\1 /g'
but are there other ways?

This awk may do it:
awk -v FS="" '{gsub(/ /,"");$1=$1}1' file
a p p l e f o o b a r
h e l l o w o r l d
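This first removes all spaces. Then, because FS (the field separator) is set to the empty string, each character becomes its own field, and $1=$1 rebuilds the record with one space (the default OFS) between fields. Splitting on an empty FS is supported by GNU awk and most other modern awks, but POSIX leaves it unspecified.
Unlike most of the other sed and perl commands here, this does not add a trailing space at the end of each line.
Or, based on the sed answers posted here: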
awk '{gsub(/ /,"");gsub(/./,"& ")}1' file
a p p l e f o o b a r
h e l l o w o r l d

You can combine your two sed commands into a single command instead:
$ sed 's/\s//g;s/./& /g' test.in
a p p l e f o o b a r
h e l l o w o r l d
Note the use of . and & instead of \(.\) and \1.
If your sed does not support \s (a GNU extension) for matching whitespace, you can use the POSIX class [[:blank:]] instead:
$ sed 's/[[:blank:]]//g;s/./& /g' test.in
a p p l e f o o b a r
h e l l o w o r l d
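If the trailing space left by these sed commands matters, a third substitution can strip it. This is a minimal sketch, assuming a POSIX sed (so no \s) and that tabs should be removed along with spaces; the function name space_chars is hypothetical:

```shell
# space_chars: hypothetical helper; reads stdin, or the files given as arguments
space_chars() {
    # 1) delete every blank (space or tab)
    # 2) append a space to every remaining character
    # 3) drop the single trailing space left at the end of the line
    sed 's/[[:blank:]]//g; s/./& /g; s/ $//' "$@"
}

# Example: prints "a p p l e f o o b a r" and "h e l l o w o r l d"
printf 'apple foo bar\nhello world\n' | space_chars
```

To rewrite test.in itself, redirect to a temporary file and move it back over the original.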

With perl:
$ perl -ple 's/([^ ]|^)(?! )/$1 /g' file
a p p l e f o o b a r
h e l l o w o r l d
Add the -i option to edit the file in place:
perl -i -ple 's/([^ ]|^)(?! )/$1 /g' file

sed 's/ //g;s/./& /g' filename
Here & refers to the portion of the pattern space that matched.

Or something like this with sed, which first appends a space to every character and then squeezes each run of spaces down to one:
$ sed 's/./& /g;s/  */ /g' file
a p p l e f o o b a r
h e l l o w o r l d

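This might work for you (GNU sed):
sed 's/\B/ /g' file
\B matches every position that is not a word boundary. For letters separated by single spaces, those are exactly the positions between two adjacent letters, so the existing spaces are kept and no trailing space is added.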


Is there a better way to write a right-recursive grammar's production rules than this?

Scenario: Give production rules for a RIGHT-recursive grammar that
describes the set of all non-empty strings made from the characters
R and N, which may contain arbitrarily many contiguous
repetitions of R, but precisely two or precisely three contiguous
repetitions of N.
Answer:
A -> N B | R+ A
B -> N D | N C | N ε
C -> N D | N ε
D -> R+ D | R ε
Incorrect:
A -> NNB | NNNB | RA | R
B -> R | RA | ε
Edit: the above is not correct; I misunderstood the scenario.
Correct:
S -> RS | A
A -> NA | NB
B -> RB | RC
C -> NC | ND
D -> RD | RE | ε
E -> NE | NF
F -> RF | ε
How it works:
It starts with S, which can generate 0 or more Rs or move to A, which generates the first group of Ns. Then it moves to B, which generates the Rs between the 1st and 2nd groups of Ns. Then it moves to C, which generates the 2nd group of Ns. Then it moves to D, which can either generate 0 or more Rs and finish, or generate 1 or more Rs and move to E, which generates the 3rd group of Ns. Lastly it moves to F, which generates 0 or more Rs.
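Because every production has a single nonterminal at the far right, the grammar is right-linear, so the language is regular. Under the reading these answers use (two or three separate groups of N, separated by at least one R, with optional Rs at either end), it corresponds to the extended regular expression in this sketch, which can be handy for spot-checking strings against the grammar. The helper name in_language is hypothetical:

```shell
# in_language: hypothetical helper; succeeds when $1 is in the language
#   S..C  ->  R* N+ R+ N+      (leading Rs, 1st N group, Rs, 2nd N group)
#   D..F  ->  (R+ N+)? R*      (optional Rs + 3rd N group, trailing Rs)
in_language() {
    printf '%s\n' "$1" | grep -Eq '^R*N+R+N+(R+N+)?R*$'
}

in_language NRNR && echo "NRNR accepted"
```

The simpler grammar in the next answer describes the same set of strings, so the same check applies to it.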
This works just as well and is simpler:
S -> RS | A
A -> NA | NB
B -> RB | RC
C -> NC | ND
D -> RD | E
E -> NE | F
F -> RF | ε
It is the same up to D. Instead of offering ε itself, D can add more Rs (D -> RD) or move to E, which generates the 3rd group of Ns. If D produced no Rs, any Ns from E would simply extend the 2nd group, which C could already produce, so this adds no new strings. Because E can also go straight to F, the 3rd group is optional, and F then recursively adds Rs or ends with the empty string.
Example parse tree for the input NRNR (using the simpler grammar):
S
 \
  A
 / \
N   B
   / \
  R   C
     / \
    N   D
       / \
      R   D
           \
            E
             \
              F
               \
                ε

Adding a counter to a specific string using unix

I am trying to add a counter to a specific string using Unix tools. I have tried some sed and awk commands, but I can't seem to get it right.
My input file is:
Event_ A D L K
Event_ B P R
Event_ C F I
Event_ J K
M
N
O
Event_ Q S
X
Y
Z
G
T
What I'm hoping to get is:
Event_00000001 A D L K
Event_00000002 B P R
Event_00000003 C F I
Event_00000004 J K
M
N
O
Event_00000005 Q S
X
Y
Z
G
T
Can anyone help?
Use this awk (%08d zero-pads the counter to eight digits, as in the desired output):
awk '/^Event/{$1=sprintf("%s%08d", $1,++counter)}1' yourfile
If fields are delimited by \t (tab):
awk -F"\t" '/^Event/{$1=sprintf("%s%08d", $1,++counter)}1' OFS='\t' yourfile
Test:
$ awk '/^Event/{$1=sprintf("%s%08d", $1,++counter)}1' file
Event_00000001 A D L K
Event_00000002 B P R
Event_00000003 C F I
Event_00000004 J K
M
N
O
Event_00000005 Q S
X
Y
Z
G
T
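Assigning to $1 makes awk rebuild the whole line with OFS, which is why the tab-delimited input needs its own command. A sketch of an alternative, assuming every counted line starts with the literal text Event_: it rewrites only that prefix with sub(), so the rest of the line keeps its original tabs or spaces. The name number_events is hypothetical:

```shell
# number_events: hypothetical helper; reads stdin or the files given as arguments
number_events() {
    # sub() replaces only the leading "Event_" text; the record is not rebuilt,
    # so separators elsewhere on the line are left exactly as they were
    awk '/^Event_/ { sub(/^Event_/, sprintf("Event_%08d", ++n)) } 1' "$@"
}

number_events file
```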

Concat csv files and strip the header

I have n CSV files that I need to concatenate. The issue is that I need to remove the header line from each one.
I have tried using
tail -n +2 $INPUT_FILE_PATH/$FILE > $NEW_INPUT_FILE_PATH
This puts the file name and path into the new file:
==> /file path/filename1.csv <==
A, B, C, D
E, F, G, H
==> /file path/filename2.csv <==
I, J, K, L
M, N, O, P
I have tried
sed 1d $INPUT_FILE_PATH/$FILE > $NEW_INPUT_FILE_PATH
This only removes the header from the first file:
A, B, C, D,
E, F, G, H
Header1, header2, header3, header4
I, J, K, L
M, N, O, P
How can I get this result:
A, B, C, D,
E, F, G, H
I, J, K, L
M, N, O, P
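You can use find and sed for that. Because -exec ... \; runs sed once per file, 1d deletes the first line of each file; redirect the output to a file outside /path/to/files so that find does not pick it up:
find /path/to/files -name '*.csv' -exec sed '1d' {} \;
Or with awk, where FNR is the line number within the current file, so FNR>1 skips each file's first line:
awk 'FNR>1' file1 file2 ...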
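A minimal sketch that wraps the awk answer in a reusable function taking the file list as arguments (the name concat_csv is hypothetical):

```shell
# concat_csv: hypothetical helper; prints every input file minus its first line
concat_csv() {
    # FNR restarts at 1 for each file; sed's line numbers (like awk's NR)
    # keep counting across files, which is why "sed 1d" dropped only one header
    awk 'FNR > 1' "$@"
}
```

For example, concat_csv "$INPUT_FILE_PATH"/*.csv > "$NEW_INPUT_FILE_PATH", provided the output file is not itself matched by the glob.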

Turning row-based data into columns by header

I have one (fairly large) file, formatted like this:
SET1
A B C D E F G
SET2
H I J K L M
SETX
(...)
etc.
I would prefer to have them like this:
SET1 SET2 SETX
A H (...)
B I
C J
D K
E L
F M
G
Note that the columns are unequally long, and they are not ordered by size. My file is too big to use the Unix column command, and attempts at getting cute by splicing the file and then pasting it together have had problematic results: the empty columns ended up with the same content as the separator (both were "\t"), which doesn't work for my purposes. Note that each set may contain several hundred entries, and I have thousands of sets, making awk impractical (at least with my admittedly limited skills there).
Ideally, the output should be readable in R, but at this point I'd be very happy for something that is practically translatable into R input. Note that I can totally live with this having a non-whitespace separator if that is more practical.
Many thanks in advance for any help! I am working in an external Linux environment.
Edit:
I also have the file available as
SET1
A
B
C
D
E
F
G
SET2
H
I
J
K
L
M
If that could make it easier.
I guess this is more what you wanted:
awk -v OFS="\t" '
/^SET/ {sets[++cols]=$0; if (c>max_recs) max_recs=c; c=0; next}
NF {a[cols,++c]=$0}
END {
    if (c>max_recs) max_recs=c   # also count the items of the last set
    for (i=1; i<=cols; i++) printf "%s%s", sets[i], (i<cols ? OFS : ORS)
    for (i=1; i<=max_recs; i++) {
        for (j=1; j<=cols; j++) printf "%s%s", a[j,i], (j<cols ? OFS : ORS)
    }
}' file
For this given input:
SET1
B
C
D
E
F
G
SET2
H
I
J
K
L
M
AAA
SET3
A
B
C
D
It returns:
$ awk -v OFS="\t" '/^SET/ {sets[++cols]=$0; if (c>max_recs) max_recs=c; c=0; next} NF{a[cols,++c]=$0} END {if (c>max_recs) max_recs=c; for (i=1; i<=cols; i++) printf "%s%s", sets[i], (i<cols ? OFS : ORS); for (i=1; i<=max_recs; i++) for (j=1; j<=cols; j++) printf "%s%s", a[j,i], (j<cols ? OFS : ORS)}' file
SET1    SET2    SET3
B       H       A
C       I       B
D       J       C
E       K       D
F       L
G       M
        AAA
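Previous solution, which handles only two sets in the first format (a SET line followed by one line of items):
You can use paste to show files side by side.
In this case, let's use head and tail to get the first two and the last two lines. Then use xargs -n1 to print one item per line. Then they are ready to be pasted (the <( ) process substitution needs bash or a similar shell):
paste -d"\t" <(head -2 file | xargs -n1) <(tail -2 file | xargs -n1)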
For your given input it returns:
SET1 SET2
A H
B I
C J
D K
E L
F M
G
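Since the original file has each set's items on a single line, a variation on the awk answer can split every data line into fields, so one script accepts either input format. This is a sketch, assuming header lines start with SET and items contain no whitespace; the name transpose_sets is hypothetical. Like the answer above, it keeps all cells in memory:

```shell
# transpose_sets: hypothetical helper; reads stdin or the files given as arguments
transpose_sets() {
    awk -v OFS='\t' '
    /^SET/ { name[++cols] = $0; next }          # a header starts a new column
    {
        # append every item on the line to the current column
        for (k = 1; k <= NF; k++) cell[cols, ++cnt[cols]] = $k
        if (cnt[cols] > max) max = cnt[cols]    # longest column seen so far
    }
    END {
        for (j = 1; j <= cols; j++) printf "%s%s", name[j], (j < cols ? OFS : ORS)
        for (i = 1; i <= max; i++)
            for (j = 1; j <= cols; j++)         # missing cells print as empty
                printf "%s%s", cell[j, i], (j < cols ? OFS : ORS)
    }' "$@"
}

transpose_sets file > sets.tsv
```

The tab-separated result should load in R with read.delim(), which expects a tab separator and a header line by default.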

How to justify this symbol in MathType

I have the MathType formula below, but I cannot adjust the position of the $+\infty$ symbol. I want it to appear just after the "$\{$" and be left-aligned with the second term.
Thank you for your help.
The LaTeX code:
${{R}_{1}}\left( {{x}_{pi}},{{G}_{q}},{{x}_{qj}} \right)=\,\left\{ \begin{matrix}
+\infty & p=q \\
\underset{l=1}{\overset{d}{\mathop \sum }}\,({{x}_{pi}}\left[ l \right]-{{x}_{qj}}\left[ l \right])\left( 2\left( {{x}_{qj}}\left[ l \right]-{{{\bar{x}}}_{q}}\left[ l \right] \right)+({{x}_{pi}}\left[ l \right]-{{x}_{qj}}\left[ l \right])(\left| {{G}_{q}} \right|-1)/|{{G}_{q}}| \right) & p\ne q \\
\end{matrix} \right.$
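You need to use an array environment instead of matrix. matrix centers every column, which is why +\infty ends up centered; with array, the l in the column specification left-aligns the first column, so +\infty starts at the same left edge as the sum.
LaTeX code: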
\[
{{R}_{1}}\left( {{x}_{pi}},{{G}_{q}},{{x}_{qj}} \right)=\,\left\{ \begin{array}{@{}lc}
+\infty & p=q \\
\underset{l=1}{\overset{d}{\mathop \sum }}\,({{x}_{pi}}\left[ l \right]-{{x}_{qj}}\left[ l \right])\left( 2\left( {{x}_{qj}}\left[ l \right]-{{{\bar{x}}}_{q}}\left[ l \right] \right)+({{x}_{pi}}\left[ l \right]-{{x}_{qj}}\left[ l \right])(\left| {{G}_{q}} \right|-1)/|{{G}_{q}}| \right) & p\ne q \\
\end{array} \right.
\]
