I am using
system(paste("wc -l file_1.txt"))
in R to obtain the number of lines in a file.
The output is
1601 file_1.txt
My problem is that if I type
system(paste("wc -l file_1.txt"))->kt
and then
kt
[1] 0
I would need to be able to say whether
system(paste("wc -l file_1.txt"))->kt
kt[1]==1600
or not, but I can't access the result of the system command or its printout. How can I check whether the file has 1600 lines without reading it into R first?
By default, system() only returns the exit status of your command. To capture its output, you need to use its intern argument:
system(paste("wc -l banner.p"), intern=TRUE)->kt
kt would then be some string like
<lines> <filename>
And then you could parse the string.
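If only the number matters, one way to avoid parsing altogether is to redirect the file into wc, which then prints the count without the file name. The shell sketch below shows the idea on a throwaway 1600-line file (the temporary file and the variable names are made up for illustration). The same command string could presumably be passed to system() with intern = TRUE, since on Unix-alikes R hands the command to a shell.

```shell
#!/bin/sh
# Build a throwaway 1600-line file (hypothetical stand-in for file_1.txt).
file=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 1600; i++) print i }' > "$file"

# Reading from stdin makes wc print just the count, with no file name.
lines=$(wc -l < "$file")

# Numeric comparison against the expected count.
if [ "$lines" -eq 1600 ]; then
    verdict="has 1600 lines"
else
    verdict="does not have 1600 lines"
fi
echo "$verdict"
rm -f "$file"
```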
I am working in R (on a Windows OS), attempting to count the number of words in a text file without loading the file into memory. The idea is to get some stats on the file: size, line count, word count, etc. A call to R's system() function that uses find for the line count is not hard to come by:
How do I do a "word count" command in Windows Command Prompt
lineCount <- system(paste0('find /c /v "" ', path), intern = T)
The command that I'm trying to work with for the word count is a PowerShell command: Measure-Object. I can get the following code to run without throwing an error but it returns an incorrect count.
print(system2("Measure-Object", args = c('count_words.txt', '-Word')))
[1] 127
The file count_words.txt has on the order of millions of words. I also tested it on a .txt file with far fewer words:
"There are seven words in this file."
But the count again is returned as 127.
print(system2("Measure-Object", args = c('seven_words.txt', '-Word')))
[1] 127
Does system2() recognize PowerShell commands? What is the correct syntax for a call to the function when using Measure-Object? Why is it returning the same value regardless of actual word count?
The issues -- overview
So, you have two issues going on here:
You aren't telling system2() to use powershell
You aren't using the right powershell syntax
The solution
command <- "Get-Content C:/Users/User/Documents/test1.txt | Measure-Object -Word"
system2("powershell", args = command)
where you replace C:/Users/User/Documents/test1.txt with the path to your file. I created two .txt files, one with the text "There are seven words in this file." and the other with the text "But there are eight words in this file." I then ran the following in R:
command <- "Get-Content C:/Users/User/Documents/test1.txt | Measure-Object -Word"
system2("powershell", args = command)
Lines Words Characters Property
----- ----- ---------- --------
          7
command <- "Get-Content C:/Users/User/Documents/test2.txt | Measure-Object -Word"
system2("powershell", args = command)
Lines Words Characters Property
----- ----- ---------- --------
          8
More explanation
From help("system2"):
system2 invokes the OS command specified by command.
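One main issue is that Measure-Object isn't a system command -- it's a PowerShell command. The system command for PowerShell is powershell, which is what you need to invoke. Because R could not find an executable named Measure-Object, system2() returned the status code 127 (the command could not be run), which is why you saw the same value for both files.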
Then, further, you didn't quite have the right PowerShell syntax. If you take a look at the docs, you'll see the PowerShell command you really want is
Get-Content C:/Users/User/Documents/count_words.txt | Measure-Object -Word
(check out example three on the linked documentation).
I am trying to get the line count of a file in a ksh script, as shown below, but it returns nothing:
filerecordcount= $((`wc -l <../data/act.dat`))
I also tried these:
filerecordcount= `wc -l <../data/act.dat`
filerecordcount= $(wc -l <../data/act.dat)
When I print the variable, it does not print the value:
print "Record Count in .dat file : $filerecordcount." 1>&2;
But when I try the same from the command prompt, it returns the count:
wc -l<../data/act.dat
You must not put a space after the = in the assignment. Use the version below; it will work fine. But don't forget to print the variable filerecordcount.
UPDATE:
filerecordcount=$((`wc -l <../data/act.dat`))
Simplify, simplify. Your backquotes are doing the command substitution, and their output is then processed by $((...)) as an arithmetic expression. It's a little redundant.
filerecordcount=$(wc -l < ../data/act.dat)
No space after the =, and just one level of command substitution.
Alternatively, you can use process substitution (available in ksh93 and bash, but not in every ksh variant):
read filerecordcount < <(wc -l < ../data/act.dat)
Or you could even do this without a subshell, using a loop:
filerecordcount=0
while IFS= read -r junk; do ((filerecordcount++)); done < ../data/act.dat
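To see why the space matters: with a space after the =, the shell treats the assignment as an empty, temporary one and then tries to run the output of wc as a command. A small sketch, using a made-up three-line file in place of ../data/act.dat and the hypothetical variable names bad_count and good_count:

```shell
#!/bin/sh
# Throwaway three-line file (hypothetical stand-in for ../data/act.dat).
datafile=$(mktemp)
printf 'a\nb\nc\n' > "$datafile"

# With the space: the shell tries to execute a command named "3" with an
# empty bad_count in its environment. That fails, and bad_count is never
# set in the script itself.
bad_count= $(wc -l < "$datafile") 2>/dev/null || true

# Without the space: a plain assignment from command substitution.
good_count=$(wc -l < "$datafile")

echo "bad_count='${bad_count:-}' good_count=$good_count"
rm -f "$datafile"
```

On systems where wc pads its output with blanks, wrapping the substitution in $((...)) is one way to normalize the number.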
I remember seeing a unix command that would take lines from standard input and execute another command multiple times, with each line of input as the arguments. For the life of me I can't remember what the command was, but the syntax was something like this:
ls | multirun -r% rm %
In this case rm % was the command to run multiple times, and -r% was an option that means replace % with the input line (I don't remember what the real option was either, I'm just using -r as an example). The complete command would remove all files in the current directory by passing the name of each file in turn to rm (assuming, of course, that there are no directories in the current directory). What is the real name of multirun?
The command is called 'xargs' :-) and you can run it as follows:
ls | xargs echo I would love to rm -f the files
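The % placeholder in the question corresponds to xargs's -I option, which runs the command once per input line and substitutes the line wherever the placeholder appears. A cautious sketch that only echoes the rm commands instead of running them (the scratch directory and file names are invented for the example):

```shell
#!/bin/sh
# Scratch directory with two invented files.
workdir=$(mktemp -d)
touch "$workdir/a.txt" "$workdir/b.txt"

# -I % : run the command once per input line, replacing % with that line.
# echo keeps this a dry run; xargs deletes nothing here.
planned=$(ls "$workdir" | xargs -I % echo rm %)
printf '%s\n' "$planned"

rm -rf "$workdir"
```

Dropping the echo would turn the dry run into real deletions. File names containing newlines or quote characters may still confuse this pipeline.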
I am using AIX, and my log contains the string "There is no process to read data written to a pipe". I want to print the 2 lines before and the 4 lines after this string.
The string occurs more than 100 times in the log, and I want to output only the last occurrence, with its context.
I tried using :
nawk 'c-->0;$0~s{if(b)for(c=b+1;c>1;c--)print r[(NR-c+1)%b];print;c=a}b{r[NR%b]=$0}' b=2 a=4 s="There is no process to read data written to a pipe" File.log
This command prints all 100-plus occurrences of the string, not just the last one.
The -A number and -B number options of grep do not work on AIX.
If you have GNU grep available, you can use it instead of awk:
grep -B 2 -A 4 "There is no process to read data written to a pipe" File.log
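If GNU grep is not available, or only the last occurrence is wanted, one possible approach is to find the line number of the last match with grep -n and then print the surrounding range with sed. The sketch below assumes a hypothetical helper named last_match_context and a made-up 20-line log; it is an illustration, not a tested AIX recipe.

```shell
#!/bin/sh
# last_match_context PATTERN FILE BEFORE AFTER  (hypothetical helper)
# Prints the last line containing PATTERN, with BEFORE lines of leading
# and AFTER lines of trailing context.
last_match_context() {
    # grep -n prefixes each match with "N:"; keep only the last number.
    last=$(grep -n -F "$1" "$2" | tail -n 1 | cut -d: -f1)
    [ -n "$last" ] || return 1          # no match at all
    start=$((last - $3))
    if [ "$start" -lt 1 ]; then start=1; fi
    end=$((last + $4))
    sed -n "${start},${end}p" "$2"
}

# Made-up 20-line log with the message on lines 5 and 15.
log=$(mktemp)
msg="There is no process to read data written to a pipe"
awk -v m="$msg" 'BEGIN { for (i = 1; i <= 20; i++) print ((i == 5 || i == 15) ? m : "line " i) }' > "$log"

# Only the match on line 15 is reported, with lines 13-19 as context.
context=$(last_match_context "$msg" "$log" 2 4)
printf '%s\n' "$context"
rm -f "$log"
```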
I caused some havoc on my computer when I played with the commands suggested by vezult [1]. I expected the one-liner to ask which files to remove. However, it immediately removed the files in a folder:
> find ./ -type f | while read x; do rm "$x"; done
I expected it to wait for me to type something on stdin [2]. I cannot understand its behavior. How does the read command work, and where do you use it?
What happened there is that read reads from stdin. When you put it at the end of a pipe, it reads from that pipe.
So the output of your find becomes
file1
file2
and so on; read reads that and replaces x successively with file1 then file2, and so your loop becomes
rm "file1"
rm "file2"
and sure enough, that rm's every file starting at the current directory ".".
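A harmless way to watch this happen is to replace rm with echo, so the loop only prints the commands it would have run. In the sketch below, printf stands in for find and the file names are invented; IFS= and -r are small additions that keep read from trimming whitespace or interpreting backslashes.

```shell
#!/bin/sh
# printf plays the role of find: it writes one file name per line into the pipe.
# read takes its input from that pipe, not from the keyboard.
commands=$(printf '%s\n' './file1' './dir/file 2' |
    while IFS= read -r x; do
        echo "rm \"$x\""
    done)
printf '%s\n' "$commands"
```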
A couple of hints.
You didn't need the "/".
It's better and safer to say
find . -type f
because should you happen to type ". /" (i.e., dot SPACE slash), find will start at the current directory and then go on to search from the root directory as well. That slip, given the right privileges, would delete every file on the computer. "." is already the name of a directory; you don't need to add the slash.
The find or rm commands will do this
It sounds like what you wanted to do was go through all the files in all the directories starting at the current directory ".", and have it ASK if you want to delete it. You could do that with
find . -type f -exec rm -i {} \;
or
find . -type f -ok rm {} \;
and not need a loop at all. You can also do
rm -r -i *
and get nearly the same effect, except that it will try to delete directories too. If the directory is empty, that'll even work.
Another thought
Come to think of it, unless you have a LOT of files, you could also do
rm -i `find . -type f`
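Now the find in backquotes will become a bunch of file names on the command line, and the '-i' interactive flag on rm will ask the yes or no question. Be aware that this form breaks on file names containing spaces, because the shell splits the output of find into words.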
Charlie Martin gives you a good dissection and explanation of what went wrong with your specific example, but doesn't address the general question of:
When should you use the read command?
The answer to that is - when you want to read successive lines from some file (quite possibly the standard output of some previous sequence of commands in a pipeline), possibly splitting the lines into several separate variables. The splitting is done using the current value of '$IFS', which normally means on blanks and tabs (newlines don't count in this context; they separate lines). If there are multiple variables in the read command, then the first word goes into the first variable, the second into the second, ..., and the residue of the line into the last variable. If there's only one variable, the whole line goes into that variable.
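The splitting rule can be tried out directly. This sketch feeds two sample lines into read with two variables (first and rest are hypothetical names) and prints each variable in brackets:

```shell
#!/bin/sh
# Two variables: the first word goes to "first", the residue of the line to "rest".
split=$(printf '%s\n' 'stores with buffered log' 'unlogged' |
    while read first rest; do
        printf '[%s] [%s]\n' "$first" "$rest"
    done)
printf '%s\n' "$split"
```

For the one-word line, rest ends up empty, which is what the second bracket pair shows.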
There are many uses. This is one of the simpler scripts I have that uses the split option:
#!/bin/ksh
#
# @(#)$Id: mkdbs.sh,v 1.4 2008/10/12 02:41:42 jleffler Exp $
#
# Create basic set of databases
MKDUAL=$HOME/bin/mkdual.sql
ELEMENTS=$HOME/src/sqltools/SQL/elements.sql
cat <<! |
mode_ansi with log mode ansi
logged with buffered log
unlogged
stores with buffered log
!
while read dbs logging
do
    if [ "$dbs" = "unlogged" ]
    then bw=""; cw=""
    else bw="-ebegin"; cw="-ecommit"
    fi
    sqlcmd -xe "create database $dbs $logging" \
        $bw -e "grant resource to public" -f $MKDUAL -f $ELEMENTS $cw
done
The cat command with a here-document has its output sent to a pipe, so the output goes into the while read dbs logging loop. The first word goes into $dbs and is the name of the (Informix) database I want to create. The remainder of the line is placed into $logging. The body of the loop deals with unlogged databases (where begin and commit do not work), then runs a program sqlcmd (completely separate from the Microsoft newcomer of the same name; it's been around since about 1990) to create a database and populate it with some standard tables and data: a simulation of the Oracle 'dual' table, and a set of tables related to the 'table of elements'.
Other scripts that use the read command are bigger (by far), but generally read lines containing one or more file names and some other attributes of relevance, and then apply an appropriate transform to the files using the attributes.
Osiris JL: file * | grep 'sh.*script' | sed 's/:.*//' | xargs wgrep read
esqlcver:read version letter
jlss: while read directory
jlss: read x || exit
jlss: read x || exit
jlss: while read file type link owner group perms
jlss: read x || exit
jlss: while read file type link owner group perms
kb: while read size name
mkbod: while read directory
mkbod:while read dist comp
mkdbs:while read dbs logging
mkmsd:while read msdfile master
mknmd:while read gfile sfile version notes
publictimestamp:while read name type title
publictimestamp:while read name type title
Osiris JL:
'Osiris JL: ' is my command line prompt; I ran this in my 'bin' directory. 'wgrep' is a variant of grep that only matches entire words (to avoid words like 'already'). This gives some indication of how I've used it.
The 'read x || exit' lines are for an interactive script that reads a response from standard input, but exits if the command gets EOF (for example, if standard input comes from /dev/null).