Converting this code from R to a shell script?

So I'm running a program that works, but the issue is that my computer is not powerful enough to handle the task. I have code written in R, and I have access to a supercomputer that runs a Unix system (as one would expect).
The program is designed to read a .csv file, find every row with the unit ft3(monthly total) in the "Units" column, and select the value in the column before it. The files are charts that list things in multiple units.
The R program to convert:
getwd()
setwd("/Users/youruserName/Desktop")
myData = read.table("yourFileName.csv", header=TRUE, sep=",")
funData = subset(myData, units == "ft3(monthly total)", select=units:value)
write.csv(funData, file="funData.csv")
My attempt at a shell script:
pwd
cd /Users/yourusername/Desktop
touch RunThisProgram
nano RunThisProgram
(((In nano, I wrote)))
if
grep -r yourFileName.csv ft3(monthly total)
cat > funData.csv
else
cat > nofun.csv
fi
control+x (((used control x to close nano)))
chmod -x RunThisProgram
./RunThisProgram
(((It runs for a while)))
We get a funData.csv output file, but that file is empty.
What am I doing wrong?

It isn't actually running, because there are a couple of problems with your script:
grep needs the pattern first, and it must be quoted; -r is for recursing into a directory...
if is missing its then
cat is called with no file arguments, so it is actually reading from stdin.
You really only need one line:
grep -F "ft3(monthly total)" yourFileName.csv > funData.csv
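If you want to keep the if/else structure from your attempt, a corrected script might look like the following sketch (same file names as in the question; grep's exit status drives the branch):
#!/bin/bash
# -F treats the pattern as a fixed string, so the parentheses are not special;
# quoting keeps the shell from interpreting them either.
if grep -F "ft3(monthly total)" yourFileName.csv > funData.csv
then
    # grep exits 0 when at least one line matched
    echo "matches written to funData.csv"
else
    # no matches: record that in a separate file
    echo "no matches" > nofun.csv
fi
Note also that chmod -x removes execute permission; you want chmod +x RunThisProgram before running the script.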

Command line - Awk command for Windows

I have a CSV file with millions of rows. I would like to open a connection to the file and filter unnecessary rows before opening it in R. To detail, I would like to import every 30th row starting at the second row.
I am operating on a Windows machine. I know the following command achieves the desired result on an Apple; however, it doesn't work on my Windows machine.
awk 'BEGIN{i=0}{i++;if (i%30==2) print $1}' < test.csv
In R, if I ran this code on an Apple, I would get the desired result:
write.csv(1:100000, file = "test.csv")
file.pipe <- pipe("awk 'BEGIN{i=0}{i++;if (i%30==2) print $1}' < test.csv")
res <- read.csv(file.pipe)
Clearly, I know nothing about Windows CLI, so could someone translate this awk command to Windows language for me and possibly explain how the translation achieves the desired result?
Thanks in advance!
UPDATE:
So I have downloaded Git and have successfully completed this task using the Git command line, but I need to implement it in R because I have to do this task on thousands of files. Does anyone know how to make R run this command through Git?
write.csv(1:100000, file = "test.csv")
file.pipe <- pipe("awk \"BEGIN{i=0}{i++;if (i%30==2) print $1}\" test.csv")
res <- read.csv(file.pipe)
On Windows, the awk program needs to be surrounded by double quotes. They are escaped here because of the enclosing double quotes on the same line.
Also, the '<' before the input file is not needed (and I doubt that it is needed on an Apple either).
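If the task really has to run over thousands of files, a shell loop (run from the Git Bash the update mentions) may be simpler than driving awk from R one file at a time; a minimal sketch, where the filtered_ output names are my own invention:
for f in *.csv; do
    # keep every 30th line starting at line 2, using the same awk program as above
    awk 'BEGIN{i=0}{i++;if (i%30==2) print $1}' "$f" > "filtered_$f"
done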

Writing SoX Stats to File When Running via .bat

Amateur coder here. I have an old R script that runs SoX stats on files across multiple folders and writes the outputs to txt files. A peer requested to use it on their Windows machine and would like to be able to just click a .bat file to run it. I made one but it outputs SoX stats to the batch file window and the txt files the stats typically output to are blank. I've read it's because of how R handles stderr but I can't find a solution for my issue. I'm really only versed in R but have been teaching myself Python, so if there's an alternative language solution like that, I'm all ears.
My "sox-stats.R" Script
setwd("~/R/sox-stats/")
for(l in list.dirs(recursive=FALSE)){
  setwd(paste0("~/R/sox-stats/", l))
  print(paste0("Directory set to ", l))
  dir.create("./stats/", showWarnings = FALSE)
  directoryAll <- list.files()
  statFolder <- list.files(pattern="stats")
  for(file in setdiff(directoryAll, statFolder)){
    firstSox <- paste("sox \"", file, "\" -n stats", sep="")
    write(system(firstSox, intern = TRUE), paste("./stats/", file, "_stats.txt", sep=""))
    print(paste0(file, " has been processed."))
  }
}
My .bat File
@ECHO OFF
ECHO Hello, so you want to get some SoX Stats?
ECHO ------------------------------------------------
ECHO Press any key to start me up!
PAUSE >NUL
cd C:\Program Files\R\R-3.3.0\bin
rscript sox-stats.R
IF ERRORLEVEL 1 ECHO SoX Says: Something went wrong. Check messages above for clues. Press any key to close SoX Stats.
IF NOT ERRORLEVEL 1 ECHO SoX Says: PROCESSING COMPLETE. Press any key to close this window and open your "sox-stats" folder.
pause >nul
IF NOT ERRORLEVEL 1 %SystemRoot%\explorer.exe "C:\Users\user\Documents\R\sox-stats"
Thanks in advance!
One way to approach this would be to use Julia, which is available for Windows. The meat of the process would be something like:
run(pipeline(`soxi mytest.wav`,"soxi.out"))
This code uses backticks to signal a console command to Julia, sets up a pipeline into a named file, and runs the whole thing. You could examine the contents of directories as required with a readdir() into an array of filenames and place a filter on the array as needed.
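Alternatively, since the question notes that SoX sends its stats to stderr, redirecting stream 2 captures them directly; the 2> redirection behaves the same way in cmd.exe and in Unix shells. A minimal sketch with a hypothetical file name:
# capture the stats (written to stderr) into a text file
sox "input.wav" -n stats 2> input_stats.txt
In the R script, the same idea would mean appending the redirection to the command string passed to system(), since intern = TRUE only captures stdout.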

How to cat using part of a filename in terminal?

I'm using terminal on OS 10.X. I have some data files of the format:
mbh5.0_mrg4.54545454545_period0.000722172513951.params.dat
mbh5.0_mrg4.54545454545_period0.00077271543854.params.dat
mbh5.0_mrg4.59090909091_period-0.000355232058085.params.dat
mbh5.0_mrg4.59090909091_period-0.000402015664015.params.dat
I know that there will be some files with similar numbers after mbh and mrg, but I won't know ahead of time what the numbers will be or how many similarly numbered ones there will be. My goal is to cat all the data from all the files with similar numbers after mbh and mrg into one data file. So from the above I would want to do something like...
cat mbh5.0_mrg4.54545454545*dat > mbh5.0_mrg4.54545454545.dat
cat mbh5.0_mrg4.5909090909*dat > mbh5.0_mrg4.5909090909.dat
I want to automate this process because there will be many such files.
What would be the best way to do this? I've been looking into sed, but I don't have a solution yet.
for file in *.params.dat; do
  prefix=${file%_*}
  cat "$file" >> "$prefix.dat"
done
The expression ${file%_*} removes the last underscore and the text following it from the end of $file and saves the result in the prefix variable. (Ref: http://www.gnu.org/software/bash/manual/bashref.html#Shell-Parameter-Expansion)
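For example, with one of the file names from the question:
file=mbh5.0_mrg4.54545454545_period0.000722172513951.params.dat
echo "${file%_*}"    # prints mbh5.0_mrg4.54545454545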
It's not 100% clear to me what you're trying to achieve here, but if you want to aggregate files into a file named for the number after "mbh5.0_mrg4." then you can do the following.
ls -l mbh5.0_mrg4* | awk '{print "cat " $9 " >> mbh5.0_mrg4." substr($9,13,11) ".dat" }' | /bin/bash
The "ls -l" lists the files, and the "awk" takes the 9th column (the file name) from the result of the ls. With some string concatenation, the resulting cat commands are passed to /bin/bash to be executed (note the >> so files in the same group append rather than overwrite one another).
This is a Linux bash script, so it assumes you have /bin/bash; I'm not 100% familiar with OS X. This script also assumes that the number you're grouping on is always in the same place in the filename. I think you can change /bin/bash to almost any shell you have installed.

Run a command multiple times with arguments given from standard input

I remember seeing a unix command that would take lines from standard input and execute another command multiple times, with each line of input as the arguments. For the life of me I can't remember what the command was, but the syntax was something like this:
ls | multirun -r% rm %
In this case rm % was the command to run multiple times, and -r% was an option that means replace % with the input line (I don't remember what the real option was either; I'm just using -r as an example). The complete command would remove all files in the current directory by passing the name of each file in turn to rm (assuming, of course, that there are no directories in the current directory). What is the real name of multirun?
The command is called 'xargs' :-) and you can run it as follows:
ls | xargs echo I would love to rm -f the files
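If you want the placeholder style you remembered, xargs has the -I option, which replaces a token with each input line; a minimal sketch:
# run rm once per input line, substituting % for the line
ls | xargs -I % rm %
For file names that may contain spaces or newlines, the safer pattern is find . -type f -print0 | xargs -0 rm.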

Why did my use of the read command not do what I expected?

I caused some havoc on my computer when I played with the commands suggested by vezult [1]. I expected the one-liner to ask for the file names to be removed. However, it immediately removed my files in a folder:
> find ./ -type f | while read x; do rm "$x"; done
I expected it to wait for me to type file names on stdin [2]. I cannot understand its action. How does the read command work, and where do you use it?
What happened there is that read reads from stdin. When you put it at the end of a pipe, it reads from that pipe.
So the output of your find becomes
file1
file2
and so on; read reads that and replaces x successively with file1 then file2, and so your loop becomes
rm "file1"
rm "file2"
and sure enough, that rm's every file starting at the current directory ".".
A couple hints.
You didn't need the "/".
It's better and safer to say
find . -type f
because should you happen to type ". /" (i.e., dot SPACE slash), find will start at the current directory and then go looking again starting at the root directory. That trick, given the right privileges, would delete every file on the computer. "." is already the name of a directory; you don't need to add the slash.
The find or rm commands can do this for you
It sounds like what you wanted to do was go through all the files in all the directories starting at the current directory ".", and have it ASK if you want to delete it. You could do that with
find . -type f -exec rm -i {} \;
or
find . -type f -ok rm {} \;
and not need a loop at all. You can also do
rm -r -i *
and get nearly the same effect, except that it will try to delete directories too. If the directory is empty, that'll even work.
Another thought
Come to think of it, unless you have a LOT of files, you could also do
rm -i `find . -type f`
Now the find in backquotes will become a bunch of file names on the command line, and the '-i' interactive flag on rm will ask the yes or no question.
Charlie Martin gives you a good dissection and explanation of what went wrong with your specific example, but doesn't address the general question of:
When should you use the read command?
The answer to that is: when you want to read successive lines from some file (quite possibly the standard output of some previous sequence of commands in a pipeline), possibly splitting the lines into several separate variables. The splitting is done using the current value of $IFS, which normally means on blanks and tabs (newlines don't count in this context; they separate lines). If there are multiple variables in the read command, then the first word goes into the first variable, the second into the second, ..., and the residue of the line into the last variable. If there's only one variable, the whole line goes into that variable.
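A minimal sketch of the splitting behaviour:
printf 'alpha beta gamma delta\n' | while read first second rest
do
    # first=alpha, second=beta, rest='gamma delta'
    echo "first=$first, second=$second, rest=$rest"
done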
There are many uses. This is one of the simpler scripts I have that uses the split option:
#!/bin/ksh
#
# @(#)$Id: mkdbs.sh,v 1.4 2008/10/12 02:41:42 jleffler Exp $
#
# Create basic set of databases
MKDUAL=$HOME/bin/mkdual.sql
ELEMENTS=$HOME/src/sqltools/SQL/elements.sql
cat <<! |
mode_ansi with log mode ansi
logged with buffered log
unlogged
stores with buffered log
!
while read dbs logging
do
    if [ "$dbs" = "unlogged" ]
    then bw=""; cw=""
    else bw="-ebegin"; cw="-ecommit"
    fi
    sqlcmd -xe "create database $dbs $logging" \
        $bw -e "grant resource to public" -f $MKDUAL -f $ELEMENTS $cw
done
The cat command with a here-document has its output sent to a pipe, so the output goes into the while read dbs logging loop. The first word goes into $dbs and is the name of the (Informix) database I want to create. The remainder of the line is placed into $logging. The body of the loop deals with unlogged databases (where begin and commit do not work), then runs a program called sqlcmd (completely separate from the Microsoft newcomer of the same name; it's been around since about 1990) to create a database and populate it with some standard tables and data: a simulation of the Oracle 'dual' table, and a set of tables related to the 'table of elements'.
Other scripts that use the read command are bigger (by far), but generally read lines containing one or more file names and some other attributes of relevance, and then apply an appropriate transform to the files using the attributes.
Osiris JL: file * | grep 'sh.*script' | sed 's/:.*//' | xargs wgrep read
esqlcver:read version letter
jlss: while read directory
jlss: read x || exit
jlss: read x || exit
jlss: while read file type link owner group perms
jlss: read x || exit
jlss: while read file type link owner group perms
kb: while read size name
mkbod: while read directory
mkbod:while read dist comp
mkdbs:while read dbs logging
mkmsd:while read msdfile master
mknmd:while read gfile sfile version notes
publictimestamp:while read name type title
publictimestamp:while read name type title
Osiris JL:
'Osiris JL: ' is my command line prompt; I ran this in my 'bin' directory. 'wgrep' is a variant of grep that only matches entire words (so 'read' does not match inside words like 'already'). This gives some indication of how I've used it.
The 'read x || exit' lines are for an interactive script that reads a response from standard input, but exits if the command gets EOF (for example, if standard input comes from /dev/null).
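In isolation, the pattern looks like this (a minimal sketch):
printf 'Continue? [y/n] '
# read returns nonzero at end of file (e.g. when stdin is /dev/null), so the script exits
read x || exit
echo "You answered: $x"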
