Third littler example/littler script: how do you run it?

I'm trying to reproduce the examples here.
The first three examples:
echo 'cat(pi^2,"\n")' | r
and
r -e 'cat(pi^2, "\n")'
and
ls -l /boot | awk '!/^total/ {print $5}' | \
r -e 'fsizes <- as.integer(readLines());
print(summary(fsizes)); stem(fsizes)'
work great. But the script version of the third one:
$ cat examples/fsizes.r
#!/usr/bin/env r
fsizes <- as.integer(readLines())
print(summary(fsizes))
stem(fsizes)
How do you run this? Sorry for the dumb question; I am no bash guru...

If the file is in examples/fsizes.r, then make it executable:
chmod +x examples/fsizes.r
And then run it with:
./examples/fsizes.r
The script expects input, one integer per line. When you run it, you can enter values line by line and press Ctrl-D to end the input. Or you can create a file with numbers and use input redirection, for example:
./examples/fsizes.r < input.txt
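You can also feed it the same file-size input as the earlier one-liner (assuming /boot exists on your machine):
ls -l /boot | awk '!/^total/ {print $5}' | ./examples/fsizes.r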

Related

Append "/" to end of directory

Completely noob question, but: using ls piped to grep, I need to find files or directories that have all capitals in their name, and directories need to have "/" appended to indicate that they are directories. Appending the "/" is the only part I am stuck on. Again, I apologize for the amateur question. I currently have ls | grep [A-Z] and the expected output should be: BIRD, DOG, DOGDIR/
It's an interesting question because it's a somewhat difficult thing to accomplish with a bash one-liner.
Here's what I came up with. It doesn't seem very elegant, but I'm not sure how to improve it.
find /animals -type d -or -type f \
| grep '/[A-Z]*$' \
| xargs -I + bash -c 'echo -n $(basename +)$( test -d + && echo -n /),\\ ' \
| sed -e 's/, *$//'; echo
I'll break that down for you:
find /animals -type d -or -type f writes out, once per line, the directories and files it found in /animals (see below for my test environment Dockerfile; I created /animals to match your desired output). As far as I know, find can't do a regex match on the name, so...
grep '/[A-Z]*$' filters find's output so that only paths are shown where the last part of the file or directory name, after the final /, is all uppercase.
xargs -I + bash -c '...' runs a bash shell for each path that passed the grep filter. When you're in a shell and you want a "for" loop, chances are what you should be using is xargs. Learn it, know it, love it. xargs takes its input, separated by default by whitespace, and runs the command you give it for each piece of input. In my case, -I + makes xargs replace the literal '+' character with its current input filename; -I also makes it pass the inputs through one at a time. For more information, see the xargs manual page.
'echo -n $(basename +)$( test -d + && echo -n /),\\ ' is the inner bash script that xargs runs for each path that got through grep.
basename + cuts the directory component off the path; from your example output you don't want e.g. /animals/DOGDIR/, you want DOGDIR/. basename is the program that trims the directories for us.
test -d + && echo -n / checks whether + (remember, xargs replaces it with the current filename) is a directory, and if so runs echo -n /. The -n argument to echo suppresses the newline, which is important for getting the output into the CSV format you specified.
Now we can put it all together: we echo -n the output of basename +, with / appended if it's a directory, and then , appended to that. All the echos run with -n to suppress newlines and keep the output CSV-looking.
| sed -e 's/, *$//'; echo is purely for formatting. Adding , to each individual output was an easy way to get the CSV, but it leaves a final , at the end of the list. The sed invocation removes , followed by any number of spaces at the end of the output so far, i.e. the entire output from all the xargs invocations. And since we never output a newline at the end of that output, the final echo adds it.
Usually in Unix shells you probably wouldn't want CSV-style output. In most cases you'd instead want newline-separated output, one matching file per line, and that would be somewhat simpler to do because you wouldn't need all that faffing with -n and , to make it CSV-style. But it's a valid requirement if the need is there.
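To see the trailing-comma cleanup in isolation (input invented for this sketch):
$ printf 'BIRD, DOGDIR/, DOG, ' | sed -e 's/, *$//'; echo
BIRD, DOGDIR/, DOG
Here is the Dockerfile for my test environment, followed by a sample run: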
FROM debian
RUN mkdir -p /animals
WORKDIR /animals
RUN mkdir -p DOGDIR lowerdir && touch DOGDIR/DOG DOGDIR/lowerDOG2 lowerdir/BIRD
ENTRYPOINT [ "/bin/bash" ]
CMD [ "-c" , "find /animals -type d -or -type f | grep '/[A-Z]*$'| xargs -I + bash -c 'echo -n $(basename +)$( test -d + && echo -n /),\\ ' | sed -e 's/, *$//'; echo"]
$ docker run --rm test
BIRD, DOGDIR/, DOG
You can start looking at
ls -F | grep -v "[[:lower:]]"
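With the question's example files (BIRD, DOG, DOGDIR), that should give something like:
$ ls -F | grep -v "[[:lower:]]"
BIRD
DOG
DOGDIR/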
I did not add anything for a comma-separated line, because that is the wrong method: parsing ls should be avoided! It will go wrong for filenames like
I am a terribble filename,
with newlines inside me,
and the ls command combined with grep
will only show the last line
BECAUSE THIS LINE HAS NO LOWERCASE CHARACTERS
To get the files without a pipe, you can use
shopt -s extglob
ls -dp +([[:upper:]])
shopt -u extglob
An explanation of the extglob and uppercase can be found at https://unix.stackexchange.com/a/389071/57293
When you want the output on one line, you can get into trouble with filenames that have newlines or commas in their names. You might want something like
# parsing ls, yes wrong and failing for some files
ls -dp +([[:upper:]]) | tr "\n" "," | sed 's/,$/\n/'
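If you need the one-line format and want to avoid parsing ls, here is a minimal sketch using the same extglob pattern and bash's own tests (the ", " joining is per the question's format):
shopt -s extglob nullglob
out=()
for f in +([[:upper:]]); do
    [[ -d $f ]] && out+=("$f/") || out+=("$f")   # append / for directories
done
printf -v joined '%s, ' "${out[@]}"              # join entries with ", "
echo "${joined%, }"                              # drop the trailing separator
shopt -u extglob nullglob
This handles newlines in filenames, though a name containing ", " itself would still make the joined output ambiguous.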

Unix command to replace first column of a .csv file

I want a unix command (that I will call in a ControlM job) that replaces the value in the first column of my .csv file (not the header line) with the previous day's date (expected format: YYYY-MM-DD).
I tried many commands, but none of them does what I want:
tmp=$(mktemp) && awk -F\| -v val=`date -d yesterday +%F` 'NR>1 {gsub($1,val)}' file.csv > "$tmp" && mv "$tmp" file.csv
or :
awk -F\| -v val=`date -d yesterday +%F` '{gsub($1, val)}1' file.csv
I even tried gensub, but that did not work either.
Example of what I want :
Input :
VALUE_DATE;TRADE_DATE;DESCR1;DESCR2
2019-03-05;2017-11-15;BRIDGE;HELLO
2019-03-05;2018-03-17;WORK;DATA
Output I want (as today is 2019-03-07):
VALUE_DATE;TRADE_DATE;DESCR1;DESCR2
2019-03-06;2017-11-15;BRIDGE;HELLO
2019-03-06;2018-03-17;WORK;DATA
Can you help please and give me examples of commands that should work, I'm not finding a solution.
Thanks a lot
Could you please try the following first? It does not save the output into file.csv itself; it prints the output to the terminal. Once you are happy with it, you can use the command provided at the end of this post.
awk -v val=$(date -d yesterday +%F) 'BEGIN{FS=OFS=";"}FNR>1{$1=val} 1' file.csv
Problems identified in the OP's code (and fixed in my suggestion):
1. Use of backticks to capture a command's output into a variable is deprecated, so use val=$(date ...) instead to declare awk's variable named val.
2. With -F you have set your field separator to \| (a pipe), but your provided sample input file is clearly delimited with ; (semicolon), NOT |, so that is also one reason the change is not reflected in the output.
3. gsub($1, val) replaces the whole line with only the value of the variable val. The syntax of gsub is gsub(regex_or_value_to_replace, replacement, target_line_or_variable). Since you defined the wrong field separator, the whole line is treated as $1, so if you print it with awk -F\| -v val=$(date -d yesterday +%F) 'NR>1 {gsub($1,val)} 1' file.csv you will only see the previous dates.
4. The fourth and main issue: you have NOT printed anything, so whatever other mistakes you made, you would not see any output either on the terminal or in an output file.
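To see problems 2 and 3 in miniature (data invented for this illustration), note how the wrong field separator makes the whole line $1:
$ echo '2019-03-05;BRIDGE;HELLO' | awk -F\| '{print $1}'
2019-03-05;BRIDGE;HELLO
$ echo '2019-03-05;BRIDGE;HELLO' | awk -F';' '{print $1}'
2019-03-05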
If you are happy with it, then you can run the following to make the changes in the Input_file itself (I am assuming that you have a proper value in your tmp variable here):
tmp=$(mktemp) && awk -v val=$(date -d yesterday +%F) 'BEGIN{FS=OFS=";"}FNR>1{$1=val} 1' file.csv > "$tmp" && mv "$tmp" file.csv

R, STDIN to Rscript

I am trying to make an R script, test.R, that can take either a file or a text string directly from a pipe in unix, as in either:
file | test.R
or:
cat Sometext | test.R
I tried to follow the answers here and here, but I am clearly missing something. Is it the piping above or my script below that gives me an error like:
me@lnx: cat AAAA | test.R
bash: test.R: command not found
cat: AAAA: No such file or directory
My test script:
#!/usr/bin/env Rscript
input <- file("stdin", "r")
x <- readLines(input)
write(x, "")
UPDATE.
The script:
#!/usr/bin/env Rscript
con <- file("stdin")
open(con, blocking=TRUE)
x <- readLines(con)
x <- somefunction(x) #Do something or nothing with x
write(x,"")
close(con)
Then both cat file | ./test.R and echo AAAA | ./test.R yield the expected output.
I still like r over Rscript here (but then I am not unbiased in this ...)
edd#rob:~$ (echo "Hello,World";echo "Bye,Bye") | r -e 'X <- readLines(stdin());print(X)' -
Hello,World
Bye,Bye
[1] "Hello,World" "Bye,Bye"
edd@rob:~$
r can also read.csv() directly:
edd#rob:~$ (echo "X,Y"; echo "Hello,World"; echo "Bye,Bye") | r -d -e 'print(X)' -
X Y
1 Hello World
2 Bye Bye
edd@rob:~$
The -d is essentially a predefined 'read stdin into X via read.csv' which I think I borrowed as an idea from rio or another package.
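Spelled out by hand, -d is roughly equivalent to this (my expansion for illustration, not littler's exact internals):
(echo "X,Y"; echo "Hello,World"; echo "Bye,Bye") | r -e 'X <- read.csv(stdin()); print(X)' -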
Edit: Your example works with small changes:
Make it executable: chmod 0755 ex.R
Pipe input in correctly, i.e. use echo, not cat
Use the ./ex.R notation for a file in the current dir
I changed it to use print(x)
Then:
edd@rob:~$ echo AAA | ./ex.R
[1] "AAA"
edd@rob:~$
I generally use R from a terminal application (bash shell). I have only done a few experiments with Rscript, but including the #! line allows the script to be run in R while still permitting the use of Rscript to generate an executable file. I have to use chmod to set the executable flag on my test file. Your call to write() should print the same output to the console in R or Rscript, but if I want to save my output to a file I call sink("fileName") to open the connection and sink() to close it. This generally gives me control of the output and how it is rendered. If I called my script "myScript.rs" and made it executable (chmod u+x myScript.rs), I can type something like ./myScript.rs to run it and get the output on OS X or Linux. Instead of a pipe | you might try redirection > or >> to create or append to a file.
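As a minimal sketch of the sink() idea (out.txt is just an illustrative name):
Rscript -e 'sink("out.txt"); cat("this goes to the file\n"); sink(); cat("this goes to the console\n")'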

unix combine grep w and v command

I want to search a file and include the text #!/bin/bash, but exclude any other line that has a # sign. These two commands: grep -w '#!/bin/bash' file and grep -v '^#' file each do one part of this job. I would like this to be a single command, so here's what I've tried.
grep -w '#!/bin/bash' | grep -v '^#' file
This excludes lines beginning with #, but doesn't include the line #!/bin/bash
grep -w '#!/bin/bash' -v '^#' file
This just prints every line but #!/bin/bash
grep "^[^#]\|^#\!/bin/bash$" test.sh
Explanation:
^[^#] means the line starts with something other than #
\| is an "or"
^#\!/bin/bash$ matches exactly the line #!/bin/bash
So... it looks as if you're trying to strip comments from bash files without removing their shebang.
The grep command can search for regular expressions, but isn't so good at applying rules of logic. You could do something like this:
grep -v '^#[^!]' input.sh
But you'd fail to strip comments that are affixed to the ends of lines. Note that I'm being a little more liberal with this regex, since it's entirely possible that a script might use something other than /bin/bash for its shebang. :-)
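A quick illustration (sample input invented for this sketch):
$ printf '#!/bin/bash\n# a comment\necho hello\n' | grep -v '^#[^!]'
#!/bin/bash
echo hello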
Another possibility would be to use awk. This lets you apply logic that cannot be expressed within a regular expression. For example, if you want to keep the commented line only if it is a shebang on the first line of the file, and remove all other comments, awk can express that as follows:
awk '
NF==1 && /^#!/; # if we're on the first line and find shebang, print.
/^#/ { next } # if this is a comment line, skip it.
1 # print everything else.
' input.sh
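Run against an invented sample, this keeps the shebang, drops the comment line, and (as noted) leaves trailing comments alone:
$ printf '#!/bin/bash\n# a comment\necho hello # trailing\n' | awk 'NR==1 && /^#!/; /^#/ {next} 1'
#!/bin/bash
echo hello # trailing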

Get a range of lines from a file given the start and end line numbers

I need to extract a set number of lines from a file given the start line number and end line number.
How can I quickly do this under unix? (It's actually Solaris, so the GNU flavour isn't available.)
Thanks
To print lines 6-10:
sed -n '6,10p' file
If the file is huge, and the end line number is small compared to the number of lines, you can make it more efficient by quitting after the last wanted line:
sed -n '6,10p;10q' file
(Note the order: with -n, putting 10q first would quit before printing line 10.) From testing a file with a fairly large number of lines:
$ wc -l test.txt
368048 test.txt
$ du -k test.txt
24640 test.txt
$ time sed -n '6,10p;10q' test.txt >/dev/null
real 0m0.005s
user 0m0.001s
sys 0m0.003s
$ time sed -n '6,10p' test.txt >/dev/null
real 0m0.123s
user 0m0.092s
sys 0m0.030s
Or
head -n "$last" file | tail -n +"$first"
I wrote a Haskell program called splitter that does exactly this: have a read through my release blog post.
You can use the program as follows:
$ cat somefile | splitter 4,6-10,50-
That will get line four, lines six to ten, and lines fifty onwards. And that is all there is to it. You will need Haskell to install it. Just:
$ cabal install splitter
And you are done. I hope that you find this program useful.
You can do it with nawk as well:
#!/bin/sh
start=10
end=20
nawk -vs="$start" -ve="$end" 'NR>e{exit}NR>=s' file
