Concat csv files and strip the header - unix

I have a number of CSV files that I need to concatenate. The issue is that I need to remove the header line from each one.
I have tried using these
tail -n +2 $INPUT_FILE_PATH/$FILE > $NEW_INPUT_FILE_PATH
***This puts the filename and path in the newfile
==> /file path/filename1.csv <==
A, B, C, D
E, F, G, H
==> /file path/filename2.csv <==
I, J, K, L
M, N, O, P
I have tried
sed 1d $INPUT_FILE_PATH/$FILE > $NEW_INPUT_FILE_PATH
***Only removes the header from the first file.
A, B, C, D,
E, F, G, H
Header1, header2, header3, header4
I, J, K, L
M, N, O, P
How can I have the result be
A, B, C, D,
E, F, G, H
I, J, K, L
M, N, O, P

You can use find and sed for that:
find /path/to/files -name '*.csv' -exec sed '1d' {} \;

awk 'FNR>1' file1 file2 ...
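As a minimal sketch of why the awk one-liner works: FNR is the line number within the current input file and restarts at 1 for each file, so FNR > 1 skips the first line of every file. The temporary directory and the file names below are made up for the demonstration and are not taken from the question.

```shell
#!/bin/sh
# Hypothetical demo files; the names and contents are placeholders.
tmpdir=$(mktemp -d)
printf 'Header1,header2\nA,B\nC,D\n' > "$tmpdir/file1.csv"
printf 'Header1,header2\nE,F\nG,H\n' > "$tmpdir/file2.csv"

# FNR restarts at 1 for each input file, so every file's first line is dropped.
# Quoting "$tmpdir" keeps paths with spaces intact.
awk 'FNR > 1' "$tmpdir"/file1.csv "$tmpdir"/file2.csv > "$tmpdir/merged.csv"

merged=$(cat "$tmpdir/merged.csv")
printf '%s\n' "$merged"
rm -rf "$tmpdir"
```

On real data the same command can take a glob such as /path/to/files/*.csv; in that case it is safer to write the merged output somewhere the glob will not pick it up on a later run.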


Perform replacement in nested predicate using recursion

I'm trying to write a set of predicates that replace terms in nested predicates using recursion; i.e.
Given:
r(a, aa).
r(c, cc).
r(e, ee).
p(a, b, c).
p(a, b, p(d, e, f)).
p(a, p(p(b, c, d), e, f), g).
I want:
p(aa, b, cc)
p(aa, b, p(d, ee, f))
p(aa, p(p(b, cc, d), ee, f), g)
Here is a (probably wildly incorrect) attempt:
inf(p(A, B, C), p(AA, BB, CC)):-
p(A, B, C),
( r(A, AA);
r(B, BB);
r(C, CC)
).
inf(p(A, B, C), p(AA, BB, CC)):-
p(A, B, C),
( r(A, AA);
r(B, BB);
r(C, CC)
),
( inf(A, AA);
inf(B, BB);
inf(C, CC)
).
With a call to inf(X, Y). this yields:
X = p(a, b, c),
Y = p(aa, _1262, _1264)
X = p(a, b, c),
Y = p(_1064, _1066, cc)
X = p(a, b, p(d, e, f)),
Y = p(aa, _1074, _1076)
X = p(a, p(p(b, c, d), e, f), g),
Y = p(aa, _1082, _1084)
false
which is not what I want. I suspect there is something wrong with how my base case combines with the code doing replacements.
Any help would be greatly appreciated!
Thanks/JC
Here's a simplified approach which might have some exception cases for you to examine and explore, but it illustrates a handy use of (=..)/2 and maplist/3. (=..)/2 provides an equivalence between a term and a list (e.g., p(a, b, p(d, e, f)) =.. L results in L = [p, a, b, p(d, e, f)] and Term =.. [foo, x, y] results in Term = foo(x, y)). By getting a list equivalent of a term, you can use recursive list processing to handle arbitrary compound terms.
maplist(foo, List1, List2) exercises a query foo(X1, X2) for every corresponding element X1 of List1 and X2 of List2 and succeeds if each query succeeds and provides argument instantiations for each success as Prolog normally does on a query.
You can use maplist(r, TermList, SubList) to perform a simple substitution using the mapping r as long as r succeeds for every element of the list. However, in this case, you'd want a mapping that succeeds with the same term back again if there is no mapping. For this, you can define map_r as below.
% map_r is the mapping defined by 'r', or term maps to itself
map_r(X, M) :-
r(X, M).
map_r(X, X) :-
\+ r(X, _).
% A functor on its own is just itself after term substitution
term_subst(Term, Functor) :-
Term =.. [Functor]. % Term has no arguments
% A functor with arguments is the same functor with args substituted
term_subst(Term, TermSub) :-
Term =.. [Functor | [Arg|Args]], % Term has at least one arg
maplist(map_r, [Arg|Args], ArgsMap), % mapping of matching args
maplist(term_subst, ArgsMap, ArgsSub), % recursive substitution for sub-terms
TermSub =.. [Functor | ArgsSub].

Adding a counter to a specific string using unix

I am trying to add a counter to a specific string using unix. I have tried some sed and awk commands, but I can't seem to do it properly.
My input file is:
Event_ A D L K
Event_ B P R
Event_ C F I
Event_ J K
M
N
O
Event_ Q S
X
Y
Z
G
T
What I'm hoping to get is:
Event_00000001 A D L K
Event_00000002 B P R
Event_00000003 C F I
Event_00000004 J K
M
N
O
Event_00000005 Q S
X
Y
Z
G
T
Can anyone help?
Use this awk:
awk '/^Event/{$1=sprintf("%s%08d", $1,++counter)}1' yourfile
If fields are delimited by \t(Tab),
awk -F"\t" '/^Event/{$1=sprintf("%s%08d", $1,++counter)}1' OFS='\t' yourfile
Test:
$ awk '/^Event/{$1=sprintf("%s%08d", $1,++counter)}1' file
Event_00000001 A D L K
Event_00000002 B P R
Event_00000003 C F I
Event_00000004 J K
M
N
O
Event_00000005 Q S
X
Y
Z
G
T
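A possible variant, sketched under the assumption that the counter should be eight digits wide as in the desired output: instead of assigning to $1 (which makes awk rebuild the line with single separators), sub() rewrites only the Event_ prefix and leaves the rest of the line exactly as it was. The sample input and the variable names are illustrative only.

```shell
#!/bin/sh
# Sample input modelled on the question; lines without "Event_" are continuation rows.
input=$(printf 'Event_ A D\nEvent_ B P\nM\nEvent_ Q S\n')

# For lines starting with "Event_", replace that prefix with the prefix plus a
# zero-padded counter (%08d pads to eight digits). Other lines pass through unchanged.
numbered=$(printf '%s\n' "$input" |
    awk '/^Event_/ { sub(/^Event_/, sprintf("Event_%08d", ++n)) } 1')

printf '%s\n' "$numbered"
```

Because only the matched prefix is replaced, tabs or repeated spaces later in the line are preserved without having to set -F or OFS.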

Decompression of a list in prolog

I need to decompress a list in Prolog, like in the example below:
decode([[a,1],[b,2],[c,1],[d,3]],L).
L = [a, b, b, c, d, d, d] ;
I made this code :
divide(L,X,Y):-length(X,1),append(X,Y,L).
divide2(L,X,Y):-divide(L,[X|_],[Y|_]).
makelist(_,N,[]):- N =< 0 .
makelist(X,Y,[X|Result]):-Y1 is Y-1,makelist(X,Y1,Result).
makelist2(L,L2):-divide2(L,X,Y),makelist(X,Y,L2).
decode([],[]).
decode([H|T],L):-makelist2(H,H2),append(H2,L,L2),decode(T,L2).
and when i call
makelist2([a,3],L2).
L2 = [a,a,a].
but when i call
decode([[a,3],[b,1],[c,4]],L)
runs continuously. What am i doing wrong ?
Another variation of the theme, using a slightly modified version of Boris' repeat/3 predicate:
% True when L is a list with N repeats of X
repeat([X, N], L) :-
length(L, N),
maplist(=(X), L).
decode(Encoded, Decoded) :-
maplist(repeat, Encoded, Expanded),
flatten(Expanded, Decoded).
If Encoded = [[a,1],[b,2],[c,1],[d,3]], then in the above decode/2, the maplist/3 call will yield Expanded = [[a],[b,b],[c],[d,d,d]], and then the flatten/2 call results in Decoded = [a,b,b,c,d,d,d].
In SWI Prolog, instead of flatten/2, you can use append/2 since you only need a "flattening" at one level.
EDIT: Adding a "bidirectional" version, using a little CLPFD:
rle([], []).
rle([X], [[1,X]]).
rle([X,Y|T], [[1,X]|R]) :-
X \== Y, % use dif(X, Y) here, if available
rle([Y|T], R).
rle([X,X|T], [[N,X]|R]) :-
N #= N1 + 1,
rle([X|T], [[N1,X]|R]).
This will yield:
| ?- rle([a,a,a,b,b], L).
L = [[3,a],[2,b]] ? ;
(1 ms) no
| ?- rle(L, [[3,a],[2,b]]).
L = [a,a,a,b,b] ? ;
no
| ?- rle([a,a,a,Y,Y,Z], [X, [N,b],[M,c]]).
M = 1
N = 2
X = [3,a]
Y = b
Z = c ? a
no
| ?- rle([A,B,C], D).
D = [[1,A],[1,B],[1,C]] ? ;
C = B
D = [[1,A],[2,B]] ? ;
B = A
D = [[2,A],[1,C]] ? ;
B = A
C = A
D = [[3,A]] ? ;
(2 ms) no
| ?- rle(A, [B,C]).
A = [D,E]
B = [1,D]
C = [1,E] ? ;
A = [D,E,E]
B = [1,D]
C = [2,E] ? ;
A = [D,E,E,E]
B = [1,D]
C = [3,E] ? ;
...
| ?- rle(A, B).
A = []
B = [] ? ;
A = [C]
B = [[1,C]] ? ;
A = [C,D]
B = [[1,C],[1,D]] ? ;
...
As @mat suggests in his comment, in Prolog implementations that have dif/2, dif(X,Y) is preferable to X \== Y above.
The problem is in the order of your append and decode in the last clause of decode. Try tracing it, or even better, trace it "by hand" to see what happens.
Another approach: see this answer. So, with repeat/3 defined as:
% True when L is a list with N repeats of X
repeat(X, N, L) :-
length(L, N),
maplist(=(X), L).
You can write your decode/2 as:
decode([], []).
decode([[X,N]|XNs], Decoded) :-
decode(XNs, Decoded_rest),
repeat(X, N, L),
append(L, Decoded_rest, Decoded).
But this is a slightly roundabout way to do it. You could define a difference-list version of repeat/3, called say repeat/4:
repeat(X, N, Reps, Reps_back) :-
( succ(N0, N)
-> Reps = [X|Reps0],
repeat(X, N0, Reps0, Reps_back)
; Reps = Reps_back
).
And then you can use a difference-list version of decode/2, decode_1/3:
decode(Encoded, Decoded) :-
decode_1(Encoded, Decoded, []).
decode_1([], Decoded, Decoded).
decode_1([[X,N]|XNs], Decoded, Decoded_back) :-
repeat(X, N, Decoded, Decoded_rest),
decode_1(XNs, Decoded_rest, Decoded_back).
?- decode([[a,1],[b,2],[c,1],[d,3]],L).
L = [a, b, b, c, d, d, d].
?- decode([[a,3],[b,1],[c,0],[d,3]],L).
L = [a, a, a, b, d, d, d].
?- decode([[a,3]],L).
L = [a, a, a].
?- decode([],L).
L = [].
You can deal with both directions with this code:
:- use_module(library(lambda)).
% code from Pascal Bourguignon
packRuns([],[]).
packRuns([X],[[X]]).
packRuns([X|Rest],[XRun|Packed]):-
run(X,Rest,XRun,RRest),
packRuns(RRest,Packed).
run(Var,[],[Var],[]).
run(Var,[Var|LRest],[Var|VRest],RRest):-
run(Var,LRest,VRest,RRest).
run(Var,[Other|RRest],[Var],[Other|RRest]):-
dif(Var,Other).
%end code
pack_1(In, Out) :-
maplist(\X^Y^(X = [V|_],
Y = [V, N],
length(X, N),
maplist(=(V), X)),
In, Out).
decode(In, Out) :-
when((ground(In); ground(Out1)),pack_1(Out1, In)),
packRuns(Out, Out1).
Output :
?- decode([[a,1],[b,2],[c,1],[d,3]],L).
L = [a, b, b, c, d, d, d] .
?- decode(L, [a,b,b,c,d,d,d]).
L = [[a, 1], [b, 2], [c, 1], [d, 3]] .
a compact way:
decode(L,D) :- foldl(expand,L,[],D).
expand([S,N],L,E) :- findall(S,between(1,N,_),T), append(L,T,E).
findall/3 is the 'old fashioned' Prolog list comprehension facility.
decode is a poor name for your predicate. Properly done, your predicate should be bi-directional. If you say
decode( [[a,1],[b,2],[c,3]] , L )
You should get
L = [a,b,b,c,c,c].
And if you say
decode( L , [a,b,b,c,c,c] ) .
You should get
L = [[a,1],[b,2],[c,3]].
So I'd use a different name, something like run_length_encoding/2. I might also not use a list to represent individual run lengths, as [a,1] is this Prolog term: .(a,.(1,[])). Just use a simple term with arity 2. Myself, I like using :/2 since it's defined as an infix operator, so you can simply say a:1.
Try this on for size:
run_length_encoding( [] , [] ) . % the run-length encoding of the empty list is the empty list.
run_length_encoding( [X|Xs] , [R|Rs] ) :- % the run-length encoding of a non-empty list is computed by
rle( Xs , X:1 , T , R ) , % - run-length encoding the prefix of the list
run_length_encoding( T , Rs ) % - and recursively run-length encoding the remainder
. % Easy!
rle( [] , C:N , [] , C:N ) . % - the run is complete when the list is exhausted.
rle( [X|Xs] , C:N , [X|Xs] , C:N ) :- % - the run is complete,
X \= C % - when we encounter a break
. %
rle( [X|Xs] , X:N , T , R ) :- % - the run continues if we haven't seen a break, so....
N1 is N+1 , % - increment the run length,
rle( Xs, X:N1, T, R ) % - and recurse down.
. % Easy!
In direct answer to the original question of, What am I doing wrong?...
When I ran the original code, any expected use case "ran indefinitely" without yielding a result.
Reading through the main predicate:
decode([],[]).
This says that [] is the result of decoding []. Sounds right.
decode([H|T],L) :- makelist2(H,H2), append(H2,L,L2), decode(T,L2).
This says that L is the result of decoding [H|T] if H2 is an expansion of H (which is what makelist2 does... perhaps - we'll go over that below), and H2 appended to this result gives another list L2 which is the decoded form of the original tail T. That doesn't sound correct. If I decode [H|T], I should (1) expand H giving H2, (2) decode T giving L2, then (3) append H2 to L2 giving L.
So the corrected second clause is:
decode([H|T], L) :- makelist2(H, H2), decode(T, L2), append(H2, L2, L).
Note the argument order of append/3 and that the call occurs after the decode of the tail. As Boris pointed out previously, the incorrect order of append and the recursive decode can cause the continuous running without any output as append with more uninstantiated arguments generates a large number of unneeded possibilities before decode can succeed.
But now the result is:
| ?- decode([[a,3]], L).
L = [a,a,a] ? ;
L = [a,a,a,a] ? ;
...
If you try out our other predicates by hand in the Prolog interpreter, you'll find that makelist2/2 has an issue:
It produces the correct result, but also a bunch of incorrect results. Let's have a look at makelist2/2. We can try this predicate by itself and see what happens:
| ?- makelist2([a,3], L).
L = [a,a,a] ? ;
L = [a,a,a,a] ? ;
...
There's an issue: makelist2/2 should only give the first solution, but it keeps going, giving incorrect solutions. Let's look closer at makelist2/2:
makelist2(L,L2) :- divide2(L,X,Y), makelist(X,Y,L2).
It takes a list L of the form [A,N], divides it (via divide2/3) into X = A and Y = N, then calls an auxiliary, makelist(X, Y, L2).
makelist(_,N,[]):- N =< 0 .
makelist(X,Y,[X|Result]):-Y1 is Y-1,makelist(X,Y1,Result).
makelist/3 is supposed to generate a list (the third argument) by replicating the first argument the number of times given in the second argument. The second, recursive clause appears to be OK, but has one important flaw: it will succeed even if the value of Y is less than or equal to 0. Therefore, even though a correct solution is found, it keeps succeeding on incorrect solutions because the base case allows the count to be =< 0:
| ?- makelist(a,2,L).
L = [a,a] ? ;
L = [a,a,a] ? ;
We can fix makelist/3 as follows:
makelist(_,N,[]):- N =< 0 .
makelist(X,Y,[X|Result]):- Y > 0, Y1 is Y-1, makelist(X,Y1,Result).
Now the code will generate a correct result. We just needed to fix the second clause of decode/2, and the second clause of makelist/3.
| ?- decode([[a,3],[b,4]], L).
L = [a,a,a,b,b,b,b]
yes
The complete, original code with just these couple of corrections looks like this:
divide(L, X, Y) :- length(X, 1), append(X, Y, L).
divide2(L, X, Y) :- divide(L, [X|_], [Y|_]).
makelist(_, N, []) :- N =< 0 .
makelist(X, Y, [X|Result]) :- Y > 0, Y1 is Y-1, makelist(X,Y1,Result).
makelist2(L, L2) :- divide2(L, X, Y), makelist(X, Y, L2).
decode([], []).
decode([H|T], L) :- makelist2(H,H2), decode(T,L2), append(H2,L2,L).
Note some simple, direct improvements. The predicate, divide2(L, X, Y) takes a list L of two elements and yields each, individual element, X and Y. This predicate is unnecessary because, in Prolog, you can obtain these elements by simple unification: L = [X, Y]. You can try this right in the Prolog interpreter:
| ?- L = [a,3], L = [X,Y].
L = [a,3]
X = a
Y = 3
yes
We can then completely remove the divide/3 and divide2/3 predicates, and replace a call to divide2(L, X, Y) with L = [X,Y] and reduce makelist2/2 to:
makelist2(L, L2) :- L = [X, Y], makelist(X, Y, L2).
Or more simply (because we can do the unification right in the head of the clause):
makelist2([X,Y], L2) :- makelist(X, Y, L2).
You could just remove makelist2/2 and call makelist/3 directly from decode/2 by unifying H directly with its two elements, [X, N]. So the original code simplifies to:
makelist(_, N, []) :- N =< 0 .
makelist(X, Y, [X|Result]) :- Y > 0, Y1 is Y-1, makelist(X,Y1,Result).
decode([], []).
decode([[X,N]|T], L) :- makelist(X, N, H2), decode(T, L2), append(H2, L2, L).
And makelist/3 can be performed a bit more clearly using one of the methods provided in the other answers (e.g., see Boris' repeat/3 predicate).

How to remove/add spaces in all textfiles?

I have several files that look like these, e.g. test.in:
apple foo bar
hello world
I need to achieve this desired output, a space after every character:
a p p l e f o o b a r
h e l l o w o r l d
I thought I could first remove all spaces and then add a space after each character, like this:
sed 's/\s//g' test.in | sed -e 's/\(.\)/\1 /g'
but are there other ways?
This awk may do:
awk -v FS="" '{gsub(/ /,"");$1=$1}1' file
a p p l e f o o b a r
h e l l o w o r l d
This first removes all spaces; then, since FS (Field Separator) is set to nothing, each character becomes its own field and $1=$1 rebuilds the line with one space between fields.
Unlike most of the other sed and perl commands here, this does not add a space at the end of the line.
Or based on sed posted here.
awk '{gsub(/ /,"");gsub(/./,"& ")}1' file
a p p l e f o o b a r
h e l l o w o r l d
You can combine your two sed commands into a single command instead:
$ sed 's/\s//g;s/./& /g' test.in
a p p l e f o o b a r
h e l l o w o r l d
Note the use of . and & instead of \(.\) and \1.
On systems that do not support \s to designate matching whitespace, you can use [[:blank:]] instead:
$ sed 's/[[:blank:]]//g;s/./& /g' test.in
a p p l e f o o b a r
h e l l o w o r l d
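Since the question title asks about all text files, here is a rough sketch of applying the portable [[:blank:]] variant to every .in file in a directory. The directory, the .out suffix and the extra s/ $// step (which trims the trailing space) are assumptions added for illustration.

```shell
#!/bin/sh
# Hypothetical working directory with two sample inputs (names are placeholders).
workdir=$(mktemp -d)
printf 'apple foo bar\nhello world\n' > "$workdir/test.in"
printf 'one two\n' > "$workdir/other.in"

for f in "$workdir"/*.in; do
    # 1) drop blanks, 2) put a space after every character, 3) trim the trailing space
    sed 's/[[:blank:]]//g; s/./& /g; s/ $//' "$f" > "${f%.in}.out"
done

spaced=$(cat "$workdir/test.out")
other=$(cat "$workdir/other.out")
printf '%s\n' "$spaced" "$other"
rm -rf "$workdir"
```

Writing to a separate .out file keeps the originals untouched; editing in place would need an option such as GNU sed's -i.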
Through perl,
$ perl -ple 's/([^ ]|^)(?! )/\1 /g' file
a p p l e f o o b a r
h e l l o w o r l d
Add an inline edit option -i to save the changes made,
perl -i -ple 's/([^ ]|^)(?! )/\1 /g' file
sed 's/ //g;s/./& /g' filename
&: refers to that portion of the pattern space which matched
Or maybe something like this with sed :
$ sed 's/./& /g;s/  //g' file
a p p l e f o o b a r
h e l l o w o r l d
This might work for you (GNU sed):
sed 's/\B/ /g' file

Rectangular Peg Solitaire in Prolog?

possible quick question here since I'm new to Prolog. I'm trying to convert this code for solving a triangular peg solitaire puzzle into solving a rectangular peg solitaire puzzle. The problem I think I'm facing is trying to figure out how to let the program know it completed the puzzle. Here's what I've got currently:
% Legal jumps along a line.
linjmp([x, x, o | T], [o, o, x | T]).
linjmp([o, x, x | T], [x, o, o | T]).
linjmp([H|T1], [H|T2]) :- linjmp(T1,T2).
% Rotate the board
rotate([[A, B, C, D, E, F],
[G, H, I, J, K, L],
[M, N, O, P, Q, R],
[S, T, U, V, W, X]],
[[S, M, G, A],
[T, N, H, B],
[U, O, I, C],
[V, P, J, D],
[W, Q, K, E],
[X, R, L, F]]).
rotateBack([[A, B, C, D],
[E, F, G, H],
[I, J, K, L],
[M, N, O, P],
[Q, R, S, T],
[U, V, W, X]],
[[D, H, L, P, T, X],
[C, G, K, O, S, W],
[B, F, J, N, R, V],
[A, E, I, M, Q, U]]).
% A jump on some line.
horizjmp([A|T],[B|T]) :- linjmp(A,B).
horizjmp([H|T1],[H|T2]) :- horizjmp(T1,T2).
% One legal jump.
jump(B,A) :- horizjmp(B,A).
jump(B,A) :- rotate(B,BR), horizjmp(BR,BRJ), rotateBack(A,BRJ).
%jump(B,A) :- rotate(BR,B), horizjmp(BR,BRJ), rotate(BRJ,A).
% Series of legal boards.
series(From, To, [From, To]) :- jump(From, To).
series(From, To, [From, By | Rest])
:- jump(From, By),
series(By, To, [By | Rest]).
% A solution.
solution(L) :- series([[o, x, x, x, x, x],
[x, x, x, x, x, x],
[x, x, x, x, x, x],
[x, x, x, x, x, x]], L).
The triangular puzzle code required the user to input what the ending table would look like, but I don't want that. I want this to show any possible solution. The table will always be exactly 6x4. I liked the idea of rotating the grid so that I only ever have to figure out horizontal jumps, so I changed the rotate predicate to turn the grid onto its side, and added a rotateBack predicate to put it back into place. I figured I would have to do this because the grid isn't symmetrical. Since it will always be this size, I figure the simplest way to find the end is to set up a counter that counts how many moves have taken place. Once we hit 22 moves (the board starts with 23 pegs and each jump removes one, so 22 jumps leave a single peg), the solution is a success.
In other words, I think I need to remove this code:
% Series of legal boards.
series(From, To, [From, To]) :- jump(From, To).
series(From, To, [From, By | Rest])
:- jump(From, By),
series(By, To, [By | Rest]).
And change it so that it sets up a counter that stops at 22. Any suggestions?
I think you could count the pegs, or better, fail when there are at least 2.
To do it efficiently, should be (untested code)
finished(L) :-
\+ call_nth(find_peg(L), 2).
find_peg(L) :-
member(R, L),
member(x, R).
call_nth/2, as defined in this answer, requires the builtin nb_setval. This is available in SWI-Prolog or Yap.
