I have been reading tutorials and I need to use cat and paste functions (for making a kind of array, table) the thing is that all tutorials use these commands on files, and reading files from the hard drive are making my task very very slow, so I wanted to know how to use them with variables, I tried and got erroneous results, so maybe you can help me with the sintaxis using that.
basically I want to make a table in a variable like this:
00001 Tacos
00023 pizza
00076 burger
00103 chopsuey
00167 burrito
01034 Tamales
And I'm getting every element after executing a program and taking specific data from the output, so I'm getting:
00001
Tacos
00023
pizza
....
You dont have to do the program, just wanted to be sure that cat and paste are the ones to use here and their syntax, if they aren't I accept any suggestion.
Sorry, I maybe did not explain myself, sorry about that, I have a and b, both variables, a is 00001 and b y is Tacos, then i want them to merge together and store them in a variable, then do the same again but to put them in a new line. Sorry for the misundertanding.
at the end i want a variable with this in it:
00001 Tacos
00023 pizza
00076 burger
00103 chopsuey
00167 burrito
01034 Tamales
Thanks!
If you have a tool only working on files as input (e. g. diff or paste), you can use the <(…) notation to create a fake file whose contents is created by a shell command:
cat <(echo "hello world")
This will print hello world. The fake file lacks some of the abilities of on-disk files; it cannot be seeked for instance. So programs which want to seek a specific position in the file, for instance to read the file twice, will fail on this. But for your case, it should suffice and you can use stuff like this:
paste <(echo "$a") <(echo "$b")
For your case more concrete:
cat input.txt | {
x=''
y=''
while read a
do
read b
x=$(echo "$x"; echo "$a")
y=$(echo "$y"; echo "$b")
done
paste <(echo "$x") <(echo "$y")
}
(I'm assuming the input to be this here:)
00001
Tacos
00023
pizza
00076
burger
00103
chopsuey
00167
burrito
01034
Tamales
Related
Hello this my first question to StackOverflow, not sure about the forum and topic.
While participating in an Open Mainframe initiative using Visual Studio Code and Putty for Unix I developed a sample program in COBOL showing international sayings (german, english, french, spanish, latin for now). It works fine via batch with JCL to file and being called from REXX. In file I can't see special chars for non-english but I had a lucky punch with a twin-program in PL/1 (doing the same and showing the special chars in REXX).
Now my question: I also tried to call by mvscmd from Unix bash script. It works so far but dont show me the special chars. Ok I have last chance to call mvscmd from Python. Or alternatively I can transfer file from MVS to unix (for any reason then it automatically converts and I see my special chars contents).
Where is the place to handle it? Cobol? (as I said, for any reason PL/1 can do. I only use standard put edit in PL/1 vs display in Cobol). Converting the Sysprint/Sysout?
Any specialist can help me?
Hello and sorry for late replay. Well the whole code is a little bit much but I guess my problem is the following - MVSCMD direct coded in the shell script
#!/bin/sh
parm='Z08800.FYD.DATA'
#echo "arg1=>"$1"<"
[ ! -z "$1" ] && parm=$parm","$1
#echo "arg2=>"$2"<"
[ ! -z "$2" ] && parm=$parm","$2
#echo "parm=>"$parm"<"
mvscmd --pgm=saycob --args=$parm \
--steplib='z08800.fyd.load' \
--sysin=dummy \
--sysout=*
I have some more shell script but this is the main. I directly put it to sysout (its the COBOL diplay. I can use fixed string or my saying read from MVS file). When using PL/1 program the last file is then sysprint because PL/1 makes it by PUT EDIT.
I assume my codepage is pretty wrong. But I dont know how to repair. I used some settings in the shell but LANG remains on C ??? By the way this Unix seems to be quite old and I only have the chance to use it until August.
My main interest is to use the program on Mainframe and in JCL and/or REXX.
But they gave us chance with this embedded Unix (?) also so I wanted to try.
Direct Sysout from COBOL program to Unix terminal.
I meant when executing the program on the Mainframe and then watching the result file in ISPF (old stuff) editor by PF3 I can see German and Spanish and French special characters. So they are there seems, produced by COBOL and PL/1.
When transfering the MVS file (kind of PDS) into the UNIX by MVSCMD, it is also fine (special chars) but thats not what I wanted.
I tried to use Python instead flat shell but its going even worse. I cannot direct the Sysout to terminal, all what is Python able to call is on the Mainframe and with the MVS filesystem. So I have to transfer it after. It is to much overhead in my eyes when call say 7 sayings and I want them to be displayed in the Unix terminal lol.
Here is my REXX that is doing the trick
/* rexx */
ARG PARM1 PARM2
PARAMETER = '/Z08800.FYD.DATA'
If Length(PARM1) > 0
Then PARAMETER = PARAMETER","PARM1
If Length(PARM2) > 0
Then PARAMETER = PARAMETER","PARM2
PARAMETER = "'"PARAMETER"'"
Address TSO "Alloc File(sysprint) Dataset(*)"
Address TSO "Alloc File(sysin) Dummy"
Address TSO "Call fyd.load(saypli)" PARAMETER
Address TSO "Free File(sysprint)"
Address TSO "Free File(sysin)"
It is now the other Load, the PL/1 - but the COBOL does the same with Sysout instead of Sysprint.
It is shown in my REXX terminal that is also called by ISPF and then 3.4 in the edit panel. The program has no manual input but reads file. And yes, the sayings are not allocated here, I read them by dynamic allocation but it doesnt matter from where my strings come to the DISPLAY / PUT EDIT
And this now JCL. OK works little different, it stores to PDS member
//SAYCOB JOB
//COBCLG EXEC IGYWCLG,
// PARM.GO='Z08800.FYD.DATA'
// SET MBR=SAYCOB
//COBOL.SYSIN DD DSN=&SYSUID..FYD.SOURCE(&MBR),DISP=SHR
//LKED.SYSLMOD DD DSN=&SYSUID..FYD.LOAD(&MBR),DISP=SHR
//GO.SYSOUT DD SYSOUT=*
//*-------------------------------------------------------------
//*
//*-------------------------------------------------------------
//SAYCOB EXEC PGM=&MBR,PARM='Z08800.FYD.DATA,001,007'
//STEPLIB DD DSN=&SYSUID..FYD.LOAD,DISP=SHR
//SYSOUT DD DSN=&SYSUID..FYD.OUTPUT(&MBR),DISP=SHR
//*-------------------------------------------------------------
//LIST EXEC PGM=LINE80,PARM='/80'
//STEPLIB DD DSN=&SYSUID..FYD.LOAD,DISP=SHR
//SYSIN DD DSN=&SYSUID..FYD.OUTPUT(&MBR),DISP=SHR
//SYSPRINT DD SYSOUT=*
//
Here in the parameter I give them the library to my sayings and then I allocate by PL/1 or COBOL. I can of course show, but its a little bit much, about 200 lines... The problem is not MVS I guess but the Unix codepage.
If I begin with a wholly Japanese sentence and run it through MeCab, I get something like this:
$ echo "吾輩は猫である" | mecab
吾輩 名詞,代名詞,一般,*,*,*,吾輩,ワガハイ,ワガハイ
は 助詞,係助詞,*,*,*,*,は,ハ,ワ
猫 名詞,一般,*,*,*,*,猫,ネコ,ネコ
で 助動詞,*,*,*,特殊・ダ,連用形,だ,デ,デ
ある 助動詞,*,*,*,五段・ラ行アル,基本形,ある,アル,アル
EOS
If I smash together everything I get from the last column, I get "ワガハイワネコデアル", which I can then feed into a speech synthesis program and get output. Said program, however, doesn't handle English words.
I throw English into MeCab, it manages to tokenise it (probably naively at the spaces), but gives no reading:
$ echo "I am a cat" | mecab
I 名詞,固有名詞,組織,*,*,*,*
am 名詞,一般,*,*,*,*,*
a 名詞,一般,*,*,*,*,*
cat 名詞,固有名詞,組織,*,*,*,*
EOS
I want to get readings for these as well, even if they're not perfect, so that I can get something along the lines of "アイアムアキャット".
I have already scoured the web for solutions and whereas I do find a bunch of web sites which have transliteration that appears to be adequate, I can't find any way to do it in my own code. In a couple of cases, I emailed the site authors and got no response yet after waiting for a few weeks. (Just how far behind on their inboxes are these people?)
There are a number of directions I can go but I hit dead ends on all of them so far, so this is my compound question:
MeCab takes custom dictionaries. Is there a custom dictionary which fills in the English knowledge somewhat?
Is there some other library or tool that can take English and spit out Katakana?
Is there some library or tool that can take IPA (International Phonetic Alphabet) and spit out Katakana? (I know how to get from English to IPA.)
As an aside, I find that the software "VOICEROID" can speak English text (poorly, but adequately for my purposes). This software uses MeCab too (or at least its DLL and dictionary files are included in the install.) It also uses another library, Cabocha, which as far as I can tell by running it does the exact same thing as MeCab. It could be using custom dictionaries for either of these two libraries to do the job, or the code to do it could be in the proprietary AITalk library they are using. More research is needed and I haven't figured out how to run either tool against their dictionaries to test it out directly either.
I have lots (millions) of small log files in s3 in with its name (date/time) helping to define it i.e. servername-yyyy-mm-dd-HH-MM. e.g.
s3://my_bucket/uk4039-2015-05-07-18-15.csv
s3://my_bucket/uk4039-2015-05-07-18-16.csv
s3://my_bucket/uk4039-2015-05-07-18-17.csv
s3://my_bucket/uk4039-2015-05-07-18-18.csv
...
s3://my_bucket/uk4339-2015-05-07-19-23.csv
s3://my_bucket/uk4339-2015-05-07-19-24.csv
...
etc
From EC2, using the AWS CLI, I would like to simultaneously download all files that are have the minute equal 16 for 2015, for all only server uk4339 and uk4338
Is there a clever way to do this?
Also if this is a terrible file structure in s3 to query data, I would be extremely grateful for any advice on how to set this up better.
I can put a relevant aws s3 cp ... command into a loop in a shell/bash script to sequentially download the relevant files but, was wondering if there was something more efficient.
As an added bonus I would like to row bind the results together too as one csv.
A quick example of a mock csv file can be generated in R using this line of R code
R> write.csv(data.frame(cbind(a1=rnorm(100),b1=rnorm(100),c1=rnorm(100))),file='uk4339-2015-05-07-19-24.csv',row.names=FALSE)
The csv that is created is uk4339-2015-05-07-19-24.csv. FYI, I will be importing the combined data into R at the end.
As you didn't answer my questions, nor indicate what OS you use, it is somewhat hard to make any concrete suggestions, so I will briefly suggest you use GNU Parallel to parallelise your S3 fetch requests to get around the latency.
Suppose you somehow generate a list of all the S3 files you want and put the resulting list in a file called GrabMe.txt like this
s3://my_bucket/uk4039-2015-05-07-18-15.csv
s3://my_bucket/uk4039-2015-05-07-18-16.csv
s3://my_bucket/uk4039-2015-05-07-18-17.csv
s3://my_bucket/uk4039-2015-05-07-18-18.csv
Then you can get them in parallel, say 32 at a time, like this:
parallel -j 32 echo aws s3 cp {} . < GrabMe.txt
or if you prefer reading left-to-right
cat GrabMe.txt | parallel -j 32 echo aws s3 cp {} .
You can obviously alter the number of parallel requests from 32 to any other number. At the moment, it just echoes the command it would run, but you can remove the word echo when you see how it works.
There is a good tutorial here, and Ole Tange (the author of GNU Parallel) is on SO, so we are in good company.
I have a very large file (~10 GB) that can be compressed to < 1 GB using gzip. I'm interested in using sort FILE | uniq -c | sort to see how often a single line is repeated, however the 10 GB file is too large to sort and my computer runs out of memory.
Is there a way to compress the file while preserving newlines (or an entirely different method all together) that would reduce the file to a small enough size to sort, yet still leave the file in a condition that's sortable?
Or any other method of finding out / countin how many times each line is repetead inside a large file (a ~10 GB CSV-like file) ?
Thanks for any help!
Are you sure you're running out of the Memory (RAM?) with your sort?
My experience debugging sort problems leads me to believe that you have probably run out of diskspace for sort to create it temporary files. Also recall that diskspace used to sort is usually in /tmp or /var/tmp.
So check out your available disk space with :
df -g
(some systems don't support -g, try -m (megs) -k (kiloB) )
If you have an undersized /tmp partition, do you have another partition with 10-20GB free? If yes, then tell your sort to use that dir with
sort -T /alt/dir
Note that for sort version
sort (GNU coreutils) 5.97
The help says
-T, --temporary-directory=DIR use DIR for temporaries, not $TMPDIR or /tmp;
multiple options specify multiple directories
I'm not sure if this means can combine a bunch of -T=/dr1/ -T=/dr2 ... to get to your 10GB*sortFactor space or not. My experience was that it only used the last dir in the list, so try to use 1 dir that is big enough.
Also, note that you can go to the whatever dir you are using for sort, and you'll see the acctivity of the temporary files used for sorting.
I hope this helps.
As you appear to be a new user here on S.O., allow me to welcome you and remind you of four things we do:
. 1) Read the FAQs
. 2) Please accept the answer that best solves your problem, if any, by pressing the checkmark sign. This gives the respondent with the best answer 15 points of reputation. It is not subtracted (as some people seem to think) from your reputation points ;-)
. 3) When you see good Q&A, vote them up by using the gray triangles, as the credibility of the system is based on the reputation that users gain by sharing their knowledge.
. 4) As you receive help, try to give it too, answering questions in your area of expertise
There are some possible solutions:
1 - use any text processing language (perl, awk) to extract each line and save the line number and a hash for that line, and then compare the hashes
2 - Can / Want to remove the duplicate lines, leaving just one occurence per file? Could use a script (command) like:
awk '!x[$0]++' oldfile > newfile
3 - Why not split the files but with some criteria? Supposing all your lines begin with letters:
- break your original_file in 20 smaller files: grep "^a*$" original_file > a_file
- sort each small file: a_file, b_file, and so on
- verify the duplicates, count them, do whatever you want.
On a coupon site someone posted a shell script for finding Godaddy discount codes.
1 - Could someone explain how this script works?
Specifically, I'm confused about the syntax:
links url -dump | grep AI
2 - Does shell scripting allow you to spider a site just as perl/python/ruby would?
3 - Is the most efficient way to accomplish the desired goal or would perl/python/ruby be a more effective technology to use for this task?
4 - Is this ethical/legal?
#!/bin/sh
gdaddy=600
while [ "$gdaddy" -lt "700" ]
do
for i in a b c d e f g h i j k l m n o p q r s t u v w x y z
do
echo "The results for gdr0$gdaddy"a"$i" >> output
links http://www.godaddy.com/default.aspx?isc=gdr0$gdaddy"a"$i -dump | grep -A1 "SPECIAL OFFER" >> output
echo >> output
echo >> output
done
gdaddy=`expr $gdaddy + 1`
done
1. links is a text-based web browser. The -dump command makes links output the text of the web page to the terminal, and the following grep command outputs any line that contains the words "SPECIAL OFFER" and the following line (-A1 means "and 1 line After that").
2. You can spider a site using shell scripting, by using links or similar to fetch the web pages and output their URLs. (I've done this, for a website spell checker script.)
3. Use whatever tools you're happiest with. Personally I prefer Python for this kind of thing, but as I say, I've used shell scripting to do it.
4. Legal? Ask a lawyer. Ethical? Ask your conscience.
Legal and Ethical
Assuming you are in the U.S., there aren't any laws restricting the access of a website by a script such as yours.
Those pages are not referenced in robots.txt.
And for godaddy in particular, it's not an ethical problem... When I swapped my registration service over to them I called their sales number, told them what I wanted to do, and they told me on the phone the best code to use.
Dump the content returned for URLs, where the last letter is substituted for a-z, and find a line in it containing "SPECIAL OFFER". Pad it with newline.
Yes, with utilities such as links, wget, telnet.
It is good enough for not demanding things such as this (traversing a small set of URLs)
That's up to site's terms of service and your legislation.
Legality relates to where you live. Consult a legal professional.
Ethical - if you have to ask, it's not. =)