Extract data from a file in Unix

I have a file with space-separated columns from which I want to extract specific data. Below is the format of the file:
12:00:01 AM CPU %usr %nice %sys %iowait %steal %irq %soft %guest %idle
12:01:01 AM all 78.13 0.00 0.98 0.00 0.00 0.00 0.56 0.00 20.33
12:02:01 AM all 93.42 0.00 0.53 0.00 0.00 0.00 0.10 0.00 5.95
12:03:01 AM 1 88.62 0.00 1.71 0.00 0.00 0.00 0.71 0.00 8.96
12:01:01 AM 2 92.56 0.00 0.70 0.00 0.00 0.00 1.17 0.00 5.58
12:01:01 AM 3 86.90 0.00 1.57 0.00 0.00 0.00 0.55 0.00 10.99
01:01:01 AM all 78.13 0.00 0.98 0.00 0.00 0.00 0.56 0.00 20.33
01:02:01 AM all 93.42 0.00 0.53 0.00 0.00 0.00 0.10 0.00 5.95
01:03:01 AM all 88.62 0.00 1.71 0.00 0.00 0.00 0.71 0.00 8.96
01:01:01 AM 2 92.56 0.00 0.70 0.00 0.00 0.00 1.17 0.00 5.58
01:01:01 AM 3 86.90 0.00 1.57 0.00 0.00 0.00 0.55 0.00 10.99
12:01:01 PM all 78.13 0.00 0.98 0.00 0.00 0.00 0.56 0.00 20.33
12:02:01 PM 0 93.42 0.00 0.53 0.00 0.00 0.00 0.10 0.00 5.95
12:03:01 PM 1 88.62 0.00 1.71 0.00 0.00 0.00 0.71 0.00 8.96
12:01:01 PM 2 92.56 0.00 0.70 0.00 0.00 0.00 1.17 0.00 5.58
12:01:01 PM 3 86.90 0.00 1.57 0.00 0.00 0.00 0.55 0.00 10.99
Now, from this file, I want the rows whose time looks like 12:01:01 AM/PM, i.e. one sample per hour, and that have "all" in the CPU column.
After extraction I want the data below, but I am not able to get it:
12:01:01 AM all 78.13 0.00 0.98 0.00 0.00 0.00 0.56 0.00 20.33
01:01:01 AM all 78.13 0.00 0.98 0.00 0.00 0.00 0.56 0.00 20.33
12:01:01 PM all 78.13 0.00 0.98 0.00 0.00 0.00 0.56 0.00 20.33
Please suggest how I can extract that data in Unix.

If you add the -E option to grep, it looks for "Extended Regular Expressions". One such expression is
"CPU|01:01"
which finds all lines containing the word "CPU" (such as your column-heading line) and also any lines containing "01:01". This is called an "alternation" and uses the pipe symbol (|) to separate the alternative sub-parts.
So, an answer would be:
grep -E "CPU|01:01 .*all" yourFile > newFile
Try running:
man grep
to get the manual (help) page.
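As a runnable sketch, here the suggested command is applied to a trimmed copy of the sample data (the file name sarfile is just a placeholder):

```shell
# Recreate a trimmed copy of the question's data (hypothetical file name)
cat > sarfile <<'EOF'
12:00:01 AM CPU %usr %nice %sys %iowait %steal %irq %soft %guest %idle
12:01:01 AM all 78.13 0.00 0.98 0.00 0.00 0.00 0.56 0.00 20.33
12:02:01 AM all 93.42 0.00 0.53 0.00 0.00 0.00 0.10 0.00 5.95
01:01:01 AM all 78.13 0.00 0.98 0.00 0.00 0.00 0.56 0.00 20.33
12:01:01 PM 2 92.56 0.00 0.70 0.00 0.00 0.00 1.17 0.00 5.58
12:01:01 PM all 78.13 0.00 0.98 0.00 0.00 0.00 0.56 0.00 20.33
EOF
# Keep the header plus the hourly "all" rows
grep -E "CPU|01:01 .*all" sarfile > newFile
cat newFile
```

This keeps the header, the three hourly "all" samples, and drops both the 12:02:01 row (no "01:01" in it) and the per-core 12:01:01 PM row (no "all" after the time).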

awk to the rescue!
If you need field-specific matches, awk is the right tool.
$ awk '$3=="all" && $1~/01:01$/' file
12:01:01 AM all 78.13 0.00 0.98 0.00 0.00 0.00 0.56 0.00 20.33
01:01:01 AM all 78.13 0.00 0.98 0.00 0.00 0.00 0.56 0.00 20.33
12:01:01 PM all 78.13 0.00 0.98 0.00 0.00 0.00 0.56 0.00 20.33
You can include the header as well with this:
$ awk 'NR==1 || $3=="all" && $1~/01:01$/' file
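A self-contained run of the same one-liner, using a cut-down copy of the input (the file name is arbitrary):

```shell
cat > sarfile <<'EOF'
12:00:01 AM CPU %usr %nice %sys %iowait %steal %irq %soft %guest %idle
12:01:01 AM all 78.13 0.00 0.98 0.00 0.00 0.00 0.56 0.00 20.33
12:03:01 AM 1 88.62 0.00 1.71 0.00 0.00 0.00 0.71 0.00 8.96
01:01:01 AM all 78.13 0.00 0.98 0.00 0.00 0.00 0.56 0.00 20.33
01:01:01 AM 2 92.56 0.00 0.70 0.00 0.00 0.00 1.17 0.00 5.58
12:01:01 PM all 78.13 0.00 0.98 0.00 0.00 0.00 0.56 0.00 20.33
EOF
# NR==1 keeps the header; otherwise field 3 must be "all"
# and field 1 must end in 01:01 (&& binds tighter than ||)
awk 'NR==1 || $3=="all" && $1~/01:01$/' sarfile
```

Unlike a plain grep, the field tests cannot be fooled by "all" or "01:01" appearing elsewhere on the line.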


Correct displaced columns in a table

A bit of context: the file shown below is generated by a VLSI tool. It consists of timing delays caused by various components in a circuit. When I generate this "timing file", the fields are sometimes not properly aligned.
The generated file:
something1 0.20 0.00 0.00
something2 6 12.95
something3 0.00 0.08 0.00 0.00 0.07
something4 6 8.70
something5 0.00 0.03 0.00 0.00 0.05
something6 5 4.70
What I want:
something1 0.20 0.00 0.00
something2 6 12.95
something3 0.00 0.08 0.00 0.00 0.07
something4 6 8.70
something5 0.00 0.03 0.00 0.00 0.05
something6 5 4.70
The displacements for something4 and something6 keep recurring throughout the table in a particular order (say every 2 lines or every line). Only something2 has a different displacement; all the other displacements follow something4/something6.
So far I have no clue how to proceed. Is there any way to fix this?
$ awk '{gsub(/ {6}/,","); gsub(/ +/,",")} 1' file | column -s, -t
something1 0.20 0.00 0.00
something2 6 12.95
something3 0.00 0.08 0.00 0.00 0.07
something4 6 8.70
something5 0.00 0.03 0.00 0.00 0.05
something6 5 4.70
or:
$ awk 'BEGIN{FS=OFS="\t"} {gsub(/ {6}/,FS); gsub(/ +/,FS); $1=$1} 1' file
something1 0.20 0.00 0.00
something2 6 12.95
something3 0.00 0.08 0.00 0.00 0.07
something4 6 8.70
something5 0.00 0.03 0.00 0.00 0.05
something6 5 4.70
Another way with awk:
awk 'NF>3{$2=OFS$2}NF==4{$2=OFS$2}{$1=$1}1' OFS='\t' infile
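To see the realignment idea in isolation, here is a simplified sketch on made-up input: every run of spaces is squeezed to a single comma and column -t re-pads the fields. (The answers above use a separate six-space gsub first to preserve empty fields; that refinement is dropped here for brevity.)

```shell
# Hypothetical misaligned timing table
cat > timing <<'EOF'
something1   0.20  0.00  0.00
something2        6  12.95
something3   0.00  0.08  0.00  0.00  0.07
something4        6  8.70
EOF
# Squeeze each run of spaces to one comma, then realign with column
awk '{gsub(/ +/, ",")} 1' timing | column -s, -t > aligned
cat aligned
```

The awk stage turns the ragged gaps into a uniform delimiter, and column does the padding arithmetic for you.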

How to extract value of CPU idle from sar command using AWK

From the output of a sar command, I want to extract only the lines in which the %iowait value is higher than a set threshold.
I tried using awk, but somehow I'm not able to perform the action.
sar -u -f sa12 | sed 's/\./,/g' | awk -f" " '{ if ( $7 -gt 0 ) print $0 }'
I tried substituting the . with , and using -gt, but still no joy.
Can someone suggest a solution?
If we need the entire line of sar -u output where %iowait > 0.01, we can use this command:
sar -u | grep -v "CPU" | awk '$7 > 0.01'
Output will be similar to
03:40:01 AM all 3.16 0.00 0.05 0.11 0.00 96.68
04:40:01 PM all 0.19 0.00 0.05 0.02 0.00 99.74
If you wish to output specific fields, say only %iowait, you can use the command below:
sar -u | grep -v "CPU" | awk '{if($7 > 0.01 ) print $7}'
Output will be
0.11
0.02
Note: grep -v is used just to remove the headings from the output.
Hope this helps.
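The same filter can be exercised end-to-end on a saved copy of sar output (the file name here is hypothetical; a real run would pipe sar -u directly). Note why the question's attempt fails: awk's field-separator flag is -F, not -f (which names a program file), and -gt is shell test syntax; awk compares numbers with a plain >.

```shell
# Saved sar -u output (hypothetical file; normally you would pipe sar -u)
cat > sar_out <<'EOF'
12:00:01 AM CPU %user %nice %system %iowait %steal %idle
12:05:01 AM all 0.29 0.00 0.30 0.01 0.00 99.40
03:40:01 AM all 3.16 0.00 0.05 0.11 0.00 96.68
04:40:01 PM all 0.19 0.00 0.05 0.02 0.00 99.74
EOF
# Drop the heading, keep rows where %iowait (field 7) exceeds 0.01;
# awk compares numerically, so no decimal-point substitution is needed
grep -v "CPU" sar_out | awk '$7 > 0.01'
```

Only the 0.11 and 0.02 rows survive; the 0.01 row does not, because the comparison is strict.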
My sar -u gives several lines similar to the following:
Linux 4.4.0-127-generic (v1) 06/12/2018 _x86_64_ (1 CPU)
12:00:01 AM CPU %user %nice %system %iowait %steal %idle
12:05:01 AM all 0.29 0.00 0.30 0.01 0.00 99.40
12:15:01 AM all 0.33 0.00 0.34 0.00 0.00 99.32
12:25:01 AM all 0.33 0.00 0.30 0.01 0.00 99.36
12:35:01 AM all 0.31 0.00 0.29 0.01 0.00 99.39
12:45:01 AM all 0.33 0.00 0.32 0.01 0.00 99.35
12:55:01 AM all 0.32 0.00 0.30 0.00 0.00 99.38
01:05:01 AM all 0.32 0.00 0.28 0.00 0.00 99.39
01:15:01 AM all 0.33 0.00 0.30 0.01 0.00 99.37
01:25:01 AM all 0.31 0.00 0.30 0.01 0.00 99.39
01:35:01 AM all 0.31 0.00 0.33 0.00 0.00 99.36
01:45:01 AM all 0.31 0.00 0.28 0.01 0.00 99.40
01:55:01 AM all 0.31 0.00 0.30 0.00 0.00 99.38
02:05:01 AM all 0.31 0.00 0.28 0.01 0.00 99.40
02:15:01 AM all 0.32 0.00 0.30 0.01 0.00 99.38
02:25:01 AM all 0.31 0.00 0.30 0.01 0.00 99.38
02:35:01 AM all 0.33 0.00 0.33 0.00 0.00 99.33
02:45:01 AM all 0.35 0.00 0.32 0.01 0.00 99.32
02:55:01 AM all 0.28 0.00 0.30 0.00 0.00 99.42
03:05:01 AM all 0.32 0.00 0.31 0.00 0.00 99.37
03:15:01 AM all 0.34 0.00 0.30 0.01 0.00 99.36
03:25:01 AM all 0.32 0.00 0.29 0.01 0.00 99.38
03:35:01 AM all 0.33 0.00 0.26 0.00 0.00 99.40
03:45:01 AM all 0.34 0.00 0.29 0.00 0.00 99.36
03:55:01 AM all 0.30 0.00 0.28 0.01 0.00 99.41
04:05:01 AM all 0.32 0.00 0.30 0.01 0.00 99.37
04:15:01 AM all 0.37 0.00 0.31 0.01 0.00 99.32
04:25:01 AM all 1.78 2.04 0.59 0.05 0.00 95.55
To filter out those where %iowait is greater than, let's say, 0.01:
sar -u | awk '$7>0.01{print}'
Linux 4.4.0-127-generic (v1) 06/12/2018 _x86_64_ (1 CPU)
04:25:01 AM all 1.78 2.04 0.59 0.05 0.00 95.55
05:15:01 AM all 0.34 0.00 0.32 0.02 0.00 99.32
06:35:01 AM all 0.33 0.22 1.23 4.48 0.00 93.74
06:45:01 AM all 0.16 0.00 0.12 0.02 0.00 99.71
10:35:01 AM all 0.22 0.00 0.13 0.02 0.00 99.63
12:15:01 PM all 0.42 0.00 0.16 0.03 0.00 99.40
01:45:01 PM all 0.17 0.00 0.11 0.02 0.00 99.71
04:05:01 PM all 0.15 0.00 0.12 0.03 0.00 99.70
04:15:01 PM all 0.42 0.00 0.23 0.10 0.00 99.25
Edit:
As correctly pointed out by @Ed Morton, the awk code can be shortened to simply awk '$7>0.01', since the default action is to print the current line.

Opening .mea file in R

I have downloaded a file with the extension .mea; it's climate data. I don't know how to import it into R, or even how to open it in macOS. Here is what the first lines of the data look like:
IPCC Data Distribution Centre Results from model HADCM3 11-07-2002
Grid is 96 * 73 Month is Jan
HADCM A1F
Total precipitation (mm/day)
7008 format is (10F8.2) missing code is 9999.99
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
I did it the following way:
First I split the file into 12 smaller files, each containing one month's data, using the command-line split utility:
split -l 706 filename newfilePrefix
Then read each small file in with the following:
readr::read_table(filename, col_names=FALSE, skip=5)
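The split step can be sanity-checked on a stand-in file. In the real file each month occupies 706 lines: 5 header lines plus 701 data lines (7008 values at 10 per line). The sketch below uses 6-line chunks on dummy data; all file names are made up.

```shell
# Dummy stand-in for the .mea file: two 6-line "months"
seq 1 12 | sed 's/^/value /' > clima.mea
# Split into fixed-size pieces; with the prefix "month_" the
# pieces are named month_aa, month_ab, ...
split -l 6 clima.mea month_
wc -l month_aa month_ab
```

Each piece then carries one month's header-plus-grid block and can be read individually with skip=5.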

R profiling spending a lot of time using .External2

I am learning how to use R profiling and have run the Rprof command on my code.
The summaryRprof function shows that a lot of time is spent in .External2. What is this? Additionally, a large proportion of the total time is spent in <Anonymous>; is there a way to find out what this is?
> summaryRprof("test")
$by.self
self.time self.pct total.time total.pct
".External2" 4.30 27.74 4.30 27.74
"format.POSIXlt" 2.70 17.42 2.90 18.71
"which.min" 2.38 15.35 4.12 26.58
"-" 1.30 8.39 1.30 8.39
"order" 1.16 7.48 1.16 7.48
"match" 0.58 3.74 0.58 3.74
"file" 0.44 2.84 0.44 2.84
"abs" 0.40 2.58 0.40 2.58
"scan" 0.30 1.94 0.30 1.94
"anyDuplicated.default" 0.20 1.29 0.20 1.29
"unique.default" 0.20 1.29 0.20 1.29
"unlist" 0.18 1.16 0.20 1.29
"c" 0.16 1.03 0.16 1.03
"data.frame" 0.14 0.90 0.22 1.42
"structure" 0.12 0.77 1.74 11.23
"as.POSIXct.POSIXlt" 0.12 0.77 0.12 0.77
"strptime" 0.12 0.77 0.12 0.77
"as.character" 0.08 0.52 0.90 5.81
"make.unique" 0.08 0.52 0.16 1.03
"[.data.frame" 0.06 0.39 1.54 9.94
"<Anonymous>" 0.04 0.26 4.34 28.00
"lapply" 0.04 0.26 1.70 10.97
"rbind" 0.04 0.26 0.94 6.06
"as.POSIXlt.POSIXct" 0.04 0.26 0.04 0.26
"ifelse" 0.04 0.26 0.04 0.26
"paste" 0.02 0.13 0.92 5.94
"merge.data.frame" 0.02 0.13 0.56 3.61
"[<-.factor" 0.02 0.13 0.52 3.35
"stopifnot" 0.02 0.13 0.04 0.26
".deparseOpts" 0.02 0.13 0.02 0.13
".External" 0.02 0.13 0.02 0.13
"close.connection" 0.02 0.13 0.02 0.13
"doTryCatch" 0.02 0.13 0.02 0.13
"is.na" 0.02 0.13 0.02 0.13
"is.na<-.default" 0.02 0.13 0.02 0.13
"mean" 0.02 0.13 0.02 0.13
"seq.int" 0.02 0.13 0.02 0.13
"sum" 0.02 0.13 0.02 0.13
"sys.function" 0.02 0.13 0.02 0.13
$by.total
total.time total.pct self.time self.pct
"write.table" 5.10 32.90 0.00 0.00
"<Anonymous>" 4.34 28.00 0.04 0.26
".External2" 4.30 27.74 4.30 27.74
"mapply" 4.22 27.23 0.00 0.00
"head" 4.16 26.84 0.00 0.00
"which.min" 4.12 26.58 2.38 15.35
"eval" 3.16 20.39 0.00 0.00
"eval.parent" 3.14 20.26 0.00 0.00
"write.csv" 3.14 20.26 0.00 0.00
"format" 2.92 18.84 0.00 0.00
"format.POSIXlt" 2.90 18.71 2.70 17.42
"do.call" 1.78 11.48 0.00 0.00
"structure" 1.74 11.23 0.12 0.77
"lapply" 1.70 10.97 0.04 0.26
"FUN" 1.66 10.71 0.00 0.00
"format.POSIXct" 1.62 10.45 0.00 0.00
"[.data.frame" 1.54 9.94 0.06 0.39
"[" 1.54 9.94 0.00 0.00
"-" 1.30 8.39 1.30 8.39
"order" 1.16 7.48 1.16 7.48
"rbind" 0.94 6.06 0.04 0.26
"paste" 0.92 5.94 0.02 0.13
"as.character" 0.90 5.81 0.08 0.52
"read.csv" 0.84 5.42 0.00 0.00
"read.table" 0.84 5.42 0.00 0.00
"as.character.POSIXt" 0.82 5.29 0.00 0.00
"match" 0.58 3.74 0.58 3.74
"merge.data.frame" 0.56 3.61 0.02 0.13
"merge" 0.56 3.61 0.00 0.00
"[<-.factor" 0.52 3.35 0.02 0.13
"[<-" 0.52 3.35 0.00 0.00
"strftime" 0.48 3.10 0.00 0.00
"file" 0.44 2.84 0.44 2.84
"weekdays" 0.42 2.71 0.00 0.00
"weekdays.POSIXt" 0.42 2.71 0.00 0.00
"abs" 0.40 2.58 0.40 2.58
"unique" 0.38 2.45 0.00 0.00
"scan" 0.30 1.94 0.30 1.94
"data.frame" 0.22 1.42 0.14 0.90
"cbind" 0.22 1.42 0.00 0.00
"anyDuplicated.default" 0.20 1.29 0.20 1.29
"unique.default" 0.20 1.29 0.20 1.29
"unlist" 0.20 1.29 0.18 1.16
"anyDuplicated" 0.20 1.29 0.00 0.00
"as.POSIXct" 0.18 1.16 0.00 0.00
"as.POSIXlt" 0.18 1.16 0.00 0.00
"c" 0.16 1.03 0.16 1.03
"make.unique" 0.16 1.03 0.08 0.52
"as.POSIXct.POSIXlt" 0.12 0.77 0.12 0.77
"strptime" 0.12 0.77 0.12 0.77
"as.POSIXlt.character" 0.12 0.77 0.00 0.00
"object.size" 0.12 0.77 0.00 0.00
"as.POSIXct.default" 0.10 0.65 0.00 0.00
"Ops.POSIXt" 0.08 0.52 0.00 0.00
"type.convert" 0.08 0.52 0.00 0.00
"!=" 0.06 0.39 0.00 0.00
"as.POSIXlt.factor" 0.06 0.39 0.00 0.00
"as.POSIXlt.POSIXct" 0.04 0.26 0.04 0.26
"ifelse" 0.04 0.26 0.04 0.26
"stopifnot" 0.04 0.26 0.02 0.13
"$" 0.04 0.26 0.00 0.00
"$.data.frame" 0.04 0.26 0.00 0.00
"[[" 0.04 0.26 0.00 0.00
"[[.data.frame" 0.04 0.26 0.00 0.00
"head.default" 0.04 0.26 0.00 0.00
".deparseOpts" 0.02 0.13 0.02 0.13
".External" 0.02 0.13 0.02 0.13
"close.connection" 0.02 0.13 0.02 0.13
"doTryCatch" 0.02 0.13 0.02 0.13
"is.na" 0.02 0.13 0.02 0.13
"is.na<-.default" 0.02 0.13 0.02 0.13
"mean" 0.02 0.13 0.02 0.13
"seq.int" 0.02 0.13 0.02 0.13
"sum" 0.02 0.13 0.02 0.13
"sys.function" 0.02 0.13 0.02 0.13
"%in%" 0.02 0.13 0.00 0.00
".rs.getSingleClass" 0.02 0.13 0.00 0.00
"[.POSIXlt" 0.02 0.13 0.00 0.00
"==" 0.02 0.13 0.00 0.00
"close" 0.02 0.13 0.00 0.00
"data.row.names" 0.02 0.13 0.00 0.00
"deparse" 0.02 0.13 0.00 0.00
"factor" 0.02 0.13 0.00 0.00
"is.na<-" 0.02 0.13 0.00 0.00
"match.arg" 0.02 0.13 0.00 0.00
"match.call" 0.02 0.13 0.00 0.00
"pushBack" 0.02 0.13 0.00 0.00
"seq" 0.02 0.13 0.00 0.00
"seq.POSIXt" 0.02 0.13 0.00 0.00
"simplify2array" 0.02 0.13 0.00 0.00
"tryCatch" 0.02 0.13 0.00 0.00
"tryCatchList" 0.02 0.13 0.00 0.00
"tryCatchOne" 0.02 0.13 0.00 0.00
"which" 0.02 0.13 0.00 0.00
$sample.interval
[1] 0.02
$sampling.time
[1] 15.5

Compana function for compositional analysis freezes in R

I'm trying to run a compositional analysis of the use of different habitat types by ground-nesting chicks on a set of data using RStudio. It starts processing but never finishes; I have to stop it manually or kill RStudio. (Same result in plain R.)
I'm using the compana function from the adehabitatHS package. From adehabitat I'm able to run the sample pheasant and squirrel data without any problems. (I've tried calling compana from both packages, with the same result.)
For each chick, the available habitat varies, as it's taken as a buffer zone around its nest site.
My data
This is the available habitats for each chick:
grass fallow.plot oil.seed.rape spring.barley winter.wheat maize other.crops other woodland hedgerow
1 23.35 7.53 45.75 0.00 0.00 0.00 0.00 0.00 23.37 0.00
2 86.52 10.35 0.00 0.00 1.24 0.00 0.00 1.89 0.00 0.00
3 5.18 10.33 28.36 38.82 0.00 0.00 17.17 0.14 0.00 0.00
4 4.26 18.32 27.31 32.66 3.82 0.00 0.00 5.02 5.52 3.09
5 4.26 18.32 27.31 32.66 3.82 0.00 0.00 5.02 5.52 3.09
6 12.52 10.35 0.00 0.00 0.00 18.02 43.59 13.15 2.37 0.00
7 21.41 11.56 59.25 0.00 0.00 0.00 0.00 5.82 0.00 1.96
8 21.41 11.56 59.25 0.00 0.00 0.00 0.00 5.82 0.00 1.96
9 36.17 16.93 0.00 30.14 0.00 0.00 0.00 7.08 9.68 0.00
10 0.00 12.17 26.49 0.00 3.99 55.77 0.00 1.58 0.00 0.00
11 0.00 10.27 67.41 1.93 18.30 0.00 0.00 1.18 0.00 0.91
12 2.66 5.38 0.00 14.39 54.06 0.00 8.40 3.83 7.84 3.44
13 2.66 5.38 0.00 14.39 54.06 0.00 8.40 3.83 7.84 3.44
14 84.22 8.00 0.00 0.00 0.00 2.90 0.00 0.22 3.84 0.82
15 84.22 8.00 0.00 0.00 0.00 2.90 0.00 0.22 3.84 0.82
16 86.85 13.04 0.00 0.00 0.00 0.00 0.00 0.11 0.00 0.00
17 86.85 13.04 0.00 0.00 0.00 0.00 0.00 0.11 0.00 0.00
18 86.85 13.04 0.00 0.00 0.00 0.00 0.00 0.11 0.00 0.00
19 86.85 13.04 0.00 0.00 0.00 0.00 0.00 0.11 0.00 0.00
20 21.41 8.11 0.47 8.08 0.00 0.00 56.78 2.26 0.00 2.89
This is the used habitats (mcp):
grass fallow.plot oil.seed.rape spring.barley winter.wheat maize other.crops other woodland hedgerow
1 41.14 58.67 0.19 0.00 0.00 0.00 0.00 0.00 0 0.0
2 35.45 64.55 0.00 0.00 0.00 0.00 0.00 0.00 0 0.0
3 10.10 60.04 7.72 21.37 0.00 0.00 0.00 0.77 0 0.0
4 0.00 44.55 0.00 50.27 0.00 0.00 0.00 5.18 0 0.0
5 2.82 48.48 44.80 0.00 0.00 0.00 0.00 0.00 0 3.9
6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 0 0.0
7 0.00 87.41 12.59 0.00 0.00 0.00 0.00 0.00 0 0.0
8 0.00 83.59 16.41 0.00 0.00 0.00 0.00 0.00 0 0.0
9 0.00 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0 0.0
10 0.00 18.93 0.00 0.00 0.00 81.07 0.00 0.00 0 0.0
11 0.00 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0 0.0
12 0.00 22.79 0.00 0.00 77.13 0.00 0.00 0.08 0 0.0
13 0.00 0.00 0.00 0.00 100.00 0.00 0.00 0.00 0 0.0
14 54.60 44.97 0.00 0.00 0.00 0.00 0.00 0.43 0 0.0
15 62.86 36.57 0.00 0.00 0.00 0.00 0.00 0.57 0 0.0
16 11.15 88.10 0.00 0.00 0.00 0.00 0.00 0.75 0 0.0
17 20.06 79.62 0.00 0.00 0.00 0.00 0.00 0.32 0 0.0
18 38.64 60.95 0.00 0.00 0.00 0.00 0.00 0.41 0 0.0
19 3.81 95.81 0.00 0.00 0.00 0.00 0.00 0.38 0 0.0
20 0.00 3.56 0.00 0.00 0.00 0.00 96.44 0.00 0 0.0
I've tried both the parametric and randomisation tests, with the same results. The code I'm running:
habuse <- compana(used, avail, test = "randomisation",rnv = 0.001, nrep = 500, alpha = 0.1)
habuse <- compana(used, avail, test = "parametric")
Any ideas where I'm going wrong?
I've discovered the answer to my own question. For the used data, the function replaces 0 values with the value you specify (0.001 in my case), but it doesn't replace 0 values in the available data, and it doesn't like them either.
I replaced all the 0s with 0.001 in the available table, adjusted the other values, and the function worked.
