Search for text between two time frame using sed


I have log files with time stamps. I want to search for text between two time stamps using sed even if the first time stamp or the last time stamp are not present.
For example, if I search between 9:30 and 9:40 then it should return text even if neither 9:30 nor 9:40 is there but the time stamp is between 9:30 and 9:40.
I am using a sed one liner:
sed -n '/7:30:/,/7:35:/p' xyz.log
But it only returns the right data if both time stamps are present; if one of them is missing, it prints everything. And if the time is in 12 hr format, it pulls data for both AM and PM.
Additionally, I have different time stamp formats for different log files so I need a generic command.
Here are some time format examples:
<Jan 27, 2013 12:57:16 AM MST>
Jan 29, 2013 8:58:12 AM
2013-01-31 06:44:04,883
Some of them contain AM/PM i.e. 12 hr format and others contain 24 hr format so I have to account for that as well.
I have tried this as well but it doesn't work:
sed -n -e '/^2012-07-19 18:22:48/,/2012-07-23 22:39:52/p' history.log

With the serious medley of time formats you have to parse, sed is not the correct tool to use. I'd automatically reach for Perl, but Python would do too, and you probably could do it in awk if you put your mind to it. You need to normalize the time formats (you don't say anything about date, so I assume you're working only with the time portion).
#!/usr/bin/env perl
use strict;
use warnings;
use constant debug => 0;

my $lo = "09:30";
my $hi = "09:40";
my $lo_tm = to_minutes($lo);
my $hi_tm = to_minutes($hi);

while (<>)
{
    print "Read: $_" if debug;
    if (m/\D\d\d?:\d\d:\d\d/)
    {
        my $tm = normalize_hhmm($_);
        print "Normalized: $tm\n" if debug;
        print $_ if ($tm >= $lo_tm && $tm <= $hi_tm);
    }
}

# Convert "HH:MM" to minutes since midnight
sub to_minutes
{
    my($val) = @_;
    my($hh, $mm) = split /:/, $val;
    if ($hh < 0 || $hh > 24 || $mm < 0 || $mm >= 60 || ($hh == 24 && $mm != 0))
    {
        print STDERR "to_minutes(): garbage = $val\n";
        return undef;
    }
    return $hh * 60 + $mm;
}

# Extract the time from a line and fold AM/PM into 24-hour minutes
sub normalize_hhmm
{
    my($line) = @_;
    my($hhmm, $ampm) = $line =~ m/\D(\d\d?:\d\d):\d\d\s*(AM|PM|am|pm)?/;
    my $tm = to_minutes($hhmm);
    if (defined $ampm)
    {
        if ($ampm =~ /(am|AM)/)
        {
            $tm -= 12 * 60 if ($tm >= 12 * 60);
        }
        else
        {
            $tm += 12 * 60 if ($tm < 12 * 60);
        }
    }
    return $tm;
}
I used the sample data:
<Jan 27, 2013 12:57:16 AM MST>
Jan 29, 2013 8:58:12 AM
2013-01-31 06:44:04,883
Feb 2 00:00:00 AM
Feb 2 00:59:00 AM
Feb 2 01:00:00 AM
Feb 2 01:00:00 PM
Feb 2 11:00:00 AM
Feb 2 11:00:00 PM
Feb 2 11:59:00 AM
Feb 2 11:59:00 PM
Feb 2 12:00:00 AM
Feb 2 12:00:00 PM
Feb 2 12:59:00 AM
Feb 2 12:59:00 PM
Feb 2 00:00:00
Feb 2 00:59:00
Feb 2 01:00:00
Feb 2 11:59:59
Feb 2 12:00:00
Feb 2 12:59:59
Feb 2 13:00:00
Feb 2 09:31:00
Feb 2 09:35:23
Feb 2 09:36:23
Feb 2 09:37:23
Feb 2 09:35:00
Feb 2 09:40:00
Feb 2 09:40:59
Feb 2 09:41:00
Feb 2 23:00:00
Feb 2 23:59:00
Feb 2 24:00:00
Feb 3 09:30:00
Feb 3 09:40:00
and it produced what I consider the correct output:
Feb 2 09:31:00
Feb 2 09:35:23
Feb 2 09:36:23
Feb 2 09:37:23
Feb 2 09:35:00
Feb 2 09:40:00
Feb 2 09:40:59
Feb 3 09:30:00
Feb 3 09:40:00
I'm sure this isn't the only way to do the processing; it seems to work, though.
If you need to do date analysis, then you need to use one of the date or time manipulation packages from CPAN to deal with the problems. The code above also hard codes the times in the script. You'd probably want to handle them as command line arguments, which is perfectly doable, but isn't scripted above.
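Since Python was mentioned as an alternative, here is a rough sketch of the same normalize-then-compare idea in Python. It is not a translation the answer itself provides; the names TIME_RE, line_minutes and filter_lines are made up for illustration, and like the Perl script it looks only at the time of day (seconds and dates are ignored).

```python
import re

# Matches h:mm:ss or hh:mm:ss preceded by a non-digit, with an optional AM/PM marker.
TIME_RE = re.compile(r"\D(\d\d?):(\d\d):\d\d\s*(AM|PM|am|pm)?")


def to_minutes(hhmm):
    """Convert 'HH:MM' to minutes since midnight."""
    hh, mm = (int(part) for part in hhmm.split(":"))
    return hh * 60 + mm


def line_minutes(line):
    """Return the line's time of day in minutes, or None if no time is found."""
    match = TIME_RE.search(line)
    if match is None:
        return None
    minutes = int(match.group(1)) * 60 + int(match.group(2))
    ampm = match.group(3)
    if ampm is not None:
        if ampm.lower() == "am":
            if minutes >= 12 * 60:
                minutes -= 12 * 60  # 12:xx AM is 00:xx
        elif minutes < 12 * 60:
            minutes += 12 * 60      # 1:xx PM is 13:xx
    return minutes


def filter_lines(lines, lo, hi):
    """Keep lines whose time of day lies in [lo, hi], both given as 'HH:MM'."""
    lo_tm, hi_tm = to_minutes(lo), to_minutes(hi)
    kept = []
    for line in lines:
        tm = line_minutes(line)
        if tm is not None and lo_tm <= tm <= hi_tm:
            kept.append(line)
    return kept
```

To take the bounds from the command line rather than hard coding them, one option would be to call filter_lines(sys.stdin, sys.argv[1], sys.argv[2]) and write the result to standard output.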

Related

Append list of logged in users to a log file using crontab?

I need to create a basic log file through the use of a crontab job that appends a timestamp, followed by a list of logged in users. It must be at 23:59 each night.
(I have used 18 18 * * * as an example to make sure the job works for now)
So far, I have;
#!/bin/bash
59 23 * * * (date ; who) >> /root/userlogfile.txt
for my crontab script, the output;
Fri Dec 9 18:18:01 UTC 2022
root console 00:00 Dec 9 18:15:15
My required output is something similar to;
Fri 09 Dec 23:59:00 GMT 2022
user1 tty2 2017-11-30 22:00 (:0)
user5 pts/1 2017-11-30 20:35 (192.168.1.1)
How would I go about this?

Reading a date, time text file and converting to string using strptime()?

I have a text file of many rows containing date and time and the end goal is for me to group together the number of rows per week that their date values are in. This is so that I can plot a scatter diagram with x values being the week number and y values being the frequency. For example the text file (dates.txt):
Mon May 11 22:51:27 2013
Mon May 11 22:58:34 2013
Wed May 13 23:15:27 2013
Thu May 14 04:11:22 2013
Sat May 16 19:46:55 2013
Sat May 16 22:29:54 2013
Sun May 17 02:08:45 2013
Sun May 17 23:55:15 2013
Mon May 18 00:42:07 2013
So from here, week 1 will have a frequency of 6 and week 2 will have a frequency of 1
As I want to plot a scatter diagram for this, I want to convert them to text value first using strptime() with format %a %b
my attempt so far has been
time_stamp <- strptime(time_stamp, format='%a.%b')
However it shows the input string is too long. I'm very new to R-studio so could somebody please help me figure this out?
Thank you
Example of final output graph : https://imgur.com/a/3o3DivA

You could use readLines() to avoid the data frame, then read time using strptime, and finally strftime to format the output.
strftime(strptime(readLines('dates.txt'), '%c'), '%a.%b')
# [1] "Sat.May" "Sat.May" "Mon.May" "Tue.May" "Thu.May" "Thu.May" "Fri.May" "Fri.May" "Sat.May"
Edit
So it appears that your dates have a time zone abbreviation "Mon Apr 06 23:49:29 PDT 2009". Since it is constant during the dates we can specify it literally in the pattern.
We will use '%d_%m' for strftime to get something numeric separated by _, which we feed to strsplit and then type.convert into numerics.
Finally we unlist, create a matrix that we fill byrow, and plot the guy.
strptime(readLines('timestamp.txt'), '%a %b %d %H:%M:%S PDT %Y') |>
strftime('%d_%m') |>
strsplit('_') |>
type.convert(as.is=TRUE) |>
unlist() |>
matrix(ncol=2, byrow=TRUE) |>
plot(pch=20, col=4, main='My Plot', xlab='day', ylab='month')
Note: Please use R>=4.1 for the |> pipes.

You need to first read (or assign) the data, parse it to a date type and then use that to e.g. get the number of the week.
Here is one example
text <- "Mon May 11 22:51:27 2013
Mon May 11 22:58:34 2013
Wed May 13 23:15:27 2013
Thu May 14 04:11:22 2013
Sat May 16 19:46:55 2013
Sat May 16 22:29:54 2013
Sun May 17 02:08:45 2013
Sun May 17 23:55:15 2013
Mon May 18 00:42:07 2013"
data <- read.table(text=text, sep='\n', col.names="dates")
data$parse <- anytime::anytime(data$dates)
data$week <- as.integer(format(data$parse, "%V"))
data
The result is a new data.frame object:
> data
dates parse week
1 Mon May 11 22:51:27 2013 2013-05-11 22:51:27 19
2 Mon May 11 22:58:34 2013 2013-05-11 22:58:34 19
3 Wed May 13 23:15:27 2013 2013-05-13 23:15:27 20
4 Thu May 14 04:11:22 2013 2013-05-14 04:11:22 20
5 Sat May 16 19:46:55 2013 2013-05-16 19:46:55 20
6 Sat May 16 22:29:54 2013 2013-05-16 22:29:54 20
7 Sun May 17 02:08:45 2013 2013-05-17 02:08:45 20
8 Sun May 17 23:55:15 2013 2013-05-17 23:55:15 20
9 Mon May 18 00:42:07 2013 2013-05-18 00:42:07 20
>
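The question's end goal was the number of rows per week. For comparison only, here is a small Python sketch of that counting step, using the same nine lines. It assumes an English locale for the month abbreviations, and iso_week is a name invented for this example. The weekday name is dropped before parsing, so the week comes from the calendar date alone.

```python
from collections import Counter
from datetime import datetime

text = """Mon May 11 22:51:27 2013
Mon May 11 22:58:34 2013
Wed May 13 23:15:27 2013
Thu May 14 04:11:22 2013
Sat May 16 19:46:55 2013
Sat May 16 22:29:54 2013
Sun May 17 02:08:45 2013
Sun May 17 23:55:15 2013
Mon May 18 00:42:07 2013"""


def iso_week(line):
    """Parse one timestamp line and return its ISO 8601 week number."""
    # Drop the weekday name; the date itself determines the week.
    _, rest = line.split(maxsplit=1)
    parsed = datetime.strptime(rest, "%b %d %H:%M:%S %Y")
    return parsed.isocalendar()[1]


# Count how many rows fall into each ISO week.
counts = Counter(iso_week(line) for line in text.splitlines())
for week, n in sorted(counts.items()):
    print(week, n)
```

This gives 2 rows in week 19 and 7 rows in week 20, which agrees with the week column in the data frame above. The week numbers and their counts can then be used as the x and y values of the scatter plot.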

How to create new variable based on time and preexisting variables?

I have a dataset with repeated measurements on multiple individuals over time. It looks something like this:
ID Time Event
1 Jan 1 2012, 4pm Abx
1 Jan 2 2012, 2pm Test
1 Jan 26 2012 3 pm Test
1 Jan 29 2012 10 pm Abx
1 Jan 30 2012, 3 pm Test
1 Jan 5 2012 3 pm Test
2 Jan 1 2012, 4pm Abx
2 Jan 2 2012, 2pm Test
2 Jan 26 2012 3 pm Test
The dataset is currently based around events. It will later be filtered down to just tests. What I need to do is make a new variable that is 1 when certain events (Abx, in this case) occur within a certain time range of tests. So if the event 'Abx' occurs within, let's say, 48 hours of a Test event, the new variable should equal 1. Otherwise, it should equal zero.
I'm hoping to produce something like this:
ID Time Event New_variable
1 Jan 1 2012, 4pm Abx 1
1 Jan 2 2012, 2pm Test 1
1 Jan 26 2012 3 pm Test 0
1 Jan 29 2012 10 pm Abx 1
1 Jan 30 2012, 3 pm Test 1
1 Jan 5 2012 3 pm Test 0
2 Jan 1 2012, 4pm Abx 1
2 Jan 2 2012, 2pm Test 1
2 Jan 26 2012 3 pm Test 0
I know that I could probably solve this with a combination of Dplyr mutate functions combined with ifelse statements, and if I just wanted to make a variable that reads "1" when the antibiotic event occurs I could do that like this:
test %>%
mutate(New_variable = ifelse(Event == 'Abx', 1, 0)) -> test2
But I don't know how to factor in time so that Test events = 1 within 48 hours of an Abx event. I also am not sure how to make sure that the condition is applied only within the same ID. How can I do this?
Any help is appreciated!
Update: Thank you so much for the suggestions! I'm going to try these out on the data, but I think they'll work. If they don't, I'll be back soon. Success! I also modified the suggested helper function to include additional options (for more than one type of Abx):
abxRows <- type == "Abx" | type == "Abx2"

To the data provided, I added two "Abx" events which should not be one (i.e. one that was not within 48 hours and one that wasn't in the same group as the test that was within 48 hours).
library(dplyr)
library(lubridate)
library(purrr)
eventData <-
  data.frame(stringsAsFactors = FALSE,
             ID = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1),
             Time = c("Jan 1 2012 4 pm", "Jan 2 2012, 2pm",
                      "Jan 26 2012 3 pm", "Jan 29 2012 10 pm",
                      "Jan 30 2012 3 pm", "Jan 5 2012 3 pm",
                      "Jan 1 2012 4 pm", "Jan 2 2012, 2pm",
                      "Jan 26 2012 3 pm", "Feb 12 2012 1pm",
                      "Jan 16 2012 3 pm", "Jan 16 2012 1 pm"),
             Event = c("Abx", "Test", "Test", "Abx", "Test", "Test",
                       "Abx", "Test", "Test", "Abx", "Abx", "Test")
  ) %>%
  mutate(Time = mdy_h(Time),
         window = if_else(Event == "Test",
                          interval(Time - hours(48), Time + hours(48)),
                          interval(NA, NA))
  )
First, you want to make sure the Time column is a time format. Then create a column of the lubridate Interval class that creates a 48 hr window around "Test" events.
Define the helper function that will check if the event occurred within the window.
chkFun <- function(eventTime, intervals, grp, type){
  abxRows <- type == "Abx"
  testRows <- !abxRows
  hits <- map2_lgl(eventTime, grp,
                   ~any(.x %within% intervals[grp %in% .y], na.rm = TRUE)) &
    abxRows
  testHits <- map_lgl(which(testRows),
                      ~any(eventTime[abxRows & (grp[.x] == grp)] %within%
                             intervals[.x]))
  hits[testRows] <- testHits
  as.integer(hits)
}
This function first checks whether each "Abx" event occurs within any of the intervals for its own ID. It then determines which "Test" rows have an interval that contains an "Abx" event. The function returns the combination of these, cast as integers.
Last, just use a mutate statement with the helper function, dropping the window column
eventData %>%
mutate(New_variable = chkFun(Time, window, ID, Event)) %>%
select(-window)
Alternatively, the helper function could just take the data.frame as an argument and assume the column names. In the form above, though, if you define it first in your script, it could also be used in the original definition of eventData.
Results:
#> ID Time Event New_variable
#> 1 1 2012-01-01 16:00:00 Abx 1
#> 2 1 2012-01-02 14:00:00 Test 1
#> 3 1 2012-01-26 15:00:00 Test 0
#> 4 1 2012-01-29 22:00:00 Abx 1
#> 5 1 2012-01-30 15:00:00 Test 1
#> 6 1 2012-01-05 15:00:00 Test 0
#> 7 2 2012-01-01 16:00:00 Abx 1
#> 8 2 2012-01-02 14:00:00 Test 1
#> 9 2 2012-01-26 15:00:00 Test 0
#> 10 2 2012-02-12 13:00:00 Abx 0
#> 11 2 2012-01-16 15:00:00 Abx 0
#> 12 1 2012-01-16 13:00:00 Test 0
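For readers who want to see the rule without the lubridate interval machinery, here is a plain Python sketch of the same check: a row gets 1 when a row of the other event type with the same ID lies within 48 hours of it. This is only an illustration, not the answer's method; flag_rows and WINDOW are names invented here, and the rows are transcribed from eventData above with the times already parsed.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=48)

# (ID, time, event) rows transcribed from the answer's eventData.
rows = [
    (1, datetime(2012, 1, 1, 16), "Abx"),
    (1, datetime(2012, 1, 2, 14), "Test"),
    (1, datetime(2012, 1, 26, 15), "Test"),
    (1, datetime(2012, 1, 29, 22), "Abx"),
    (1, datetime(2012, 1, 30, 15), "Test"),
    (1, datetime(2012, 1, 5, 15), "Test"),
    (2, datetime(2012, 1, 1, 16), "Abx"),
    (2, datetime(2012, 1, 2, 14), "Test"),
    (2, datetime(2012, 1, 26, 15), "Test"),
    (2, datetime(2012, 2, 12, 13), "Abx"),
    (2, datetime(2012, 1, 16, 15), "Abx"),
    (1, datetime(2012, 1, 16, 13), "Test"),
]


def flag_rows(rows, window=WINDOW):
    """Return 1 for each row that has an Abx/Test partner with the same ID within `window`."""
    flags = []
    for row_id, when, event in rows:
        hit = any(
            other_id == row_id
            and {event, other_event} == {"Abx", "Test"}  # one of each type
            and abs(other_when - when) <= window
            for other_id, other_when, other_event in rows
        )
        flags.append(int(hit))
    return flags


print(flag_rows(rows))
```

The flags come out in the same order as the New_variable column above. The loop compares every row with every other row, which is fine for small data but would need grouping by ID first for a large table.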

I don't have a copy of your data, so I'm not sure what format your dates are in.
I would recommend converting the date to the right format using as.POSIXct(Time, format="%b %d %Y, %I%p"). For more info on the format, look up ?strptime, but I think that is right for your column.
If we assume your data frame is like this... I know I have changed parts of it but this is for simplicity
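# Assumed values (the answer does not define them); interval = 60 s matches the output below.
start <- as.POSIXct("2012-01-01", tz = "UTC"); end <- as.POSIXct("2012-02-01", tz = "UTC")
interval <- 60
df <- data.frame(ID = c(rep(1,6),rep(2,3)),
Time=c(seq(from=start, by=interval*6840, to=end)[1:6],seq(from=start, by=interval*6840, to=end)[1:3]),
Event = rep(c("Abs","Test","Test"),3))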
This would look like this
ID Time Event
1 1 2012-01-01 00:00:00 Abs
2 1 2012-01-05 18:00:00 Test
3 1 2012-01-10 12:00:00 Test
4 1 2012-01-15 06:00:00 Abs
5 1 2012-01-20 00:00:00 Test
6 1 2012-01-24 18:00:00 Test
7 2 2012-01-01 00:00:00 Abs
8 2 2012-01-05 18:00:00 Test
9 2 2012-01-10 12:00:00 Test
So you can use the following code to test whether a Test falls within 48 hours of an Abs
df[which(df$Event=="Test"),]$Time %in% unlist(Map(`:`, df[which(df$Event=="Abs"),]$Time-48*60*60, df[which(df$Event=="Abs"),]$Time+48*60*60))
So this will return FALSE for all, but that is because the synthetic data is at larger time steps.
To unpack this...
df[which(df$Event=="Test"),]$Time Gives the times of tests
%in% Says look for what precedes this, in a set of values that follows it.
So what follows it is: unlist(Map(`:`, df[which(df$Event=="Abs"),]$Time-48*60*60, df[which(df$Event=="Abs"),]$Time+48*60*60))
This creates a list of times within +/- 48 hours of each Abs. Arithmetic on POSIXct objects is done in seconds, hence adding or subtracting 48*60*60 to shift by 48 hours.

How to run cronjob on alternate weekday?

I have a script which runs every day at 1.00 AM.
But on every alternate Wednesday I need to change the time to 6.00 AM, which I currently do manually every Tuesday.
e.g
Wednesday Nov 09 2016 6.00 AM.
Wednesday Nov 23 2016 6.00 AM.
Wednesday Dec 07 2016 6.00 AM.
The main thing is for every Wednesday in between the job should be as per regular timings.

Using this bash trick it could be done with 3 cron entries (possibly 2):
#Every day except Wednesdays at 1am
0 1 * * 0,1,2,4,5,6 yourCommand
#Every Wednesdays at 1am, proceeds only on even weeks
0 1 * * 3 test $((10#$(date +\%W)\%2)) -eq 0 && yourCommand
#Every Wednesdays at 6am, proceeds only on odd weeks
0 6 * * 3 test $((10#$(date +\%W)\%2)) -eq 1 && yourCommand
Change the -eq's to 1 or 0 depending on whether you want to start with an odd or an even week. It should work according to your example, because Wednesday Nov 09 2016 falls in week 45 (date +%W), which is odd, so the 6am entry is the one that runs.
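If you want to check which parity your own dates have before editing the crontab, a quick sketch like the following may help. It uses Python's strftime("%W"), which counts weeks starting on Monday in the same way as date +%W; week_parity is a helper name made up for this example.

```python
from datetime import date


def week_parity(day):
    """Return 0 or 1: the parity of the week number that `date +%W` would print."""
    return int(day.strftime("%W")) % 2


# The three 6.00 AM Wednesdays listed in the question.
examples = [date(2016, 11, 9), date(2016, 11, 23), date(2016, 12, 7)]
for day in examples:
    print(day, day.strftime("%W"), week_parity(day))
```

The three dates from the question fall in weeks 45, 47 and 49, all odd, which is the `-eq 1` case. Note that the week number restarts at the beginning of each year, so the odd/even alternation can shift across a year boundary and is worth rechecking every January.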

Subsetting a dataframe based on the values of two or more columns

I would like to subset a timeseries dataframe based on my requirement.
I have a dataframe something similar to the one mentioned below.
> df
Date Year Month Day Time Parameter
2012-04-19 2012 04 19 7:00:00 26
2012-04-19 2012 04 19 7:00:00 20
.................................................
2012-05-01 2012 05 01 00:00:00 23
2012-05-01 2012 05 01 00:30:00 22
.................................................
2015-04-30 2015 04 30 23:30:00 20
.................................................
2015-05-01 2015 05 01 00:00:00 26
From a dataframe like this I would like to select all the data from the first of May 2012 (2012-05-01) to the end of April 2015 (2015-04-30), regardless of the start and end dates of the dataframe.
So far I am only familiar with using the grep function to select data based on one particular column. I have been using the following code with grep and with.
# To select one particular year
> df.2012 <- df[grep("2012", df$Year),]
# To select two or more years at the same time
> df.sel.yr <- df[grep("201[2-5]", df$Year),]
# To select one particular month of a particular year.
> df.Dec.2012 <- df[with(df, Year=="2012" & Month=="12"), ]
With several lines of commands I would be able to do it, but it would save a lot of time if I could do it with only one or a few lines.
Any help will be appreciated. Thank you in advance.

If your date column is not of class date first convert it to one by,
df$Date <- as.Date(df$Date)
and then you can subset the date by,
df[df$Date >= as.Date("2012-05-01") & df$Date <= as.Date("2015-04-30"), ]
# Date Year Month Day Time Parameter
#3 2012-05-01 2012 5 1 00:00:00 23
#4 2012-05-01 2012 5 1 00:30:00 22
#5 2015-04-30 2015 4 30 23:30:00 20
