Fluent-bit: How can I use strftime in path?

My log file name contains the current date, like my_log_210616.log,
and I need to tail the file in fluent-bit. I tried with:
[INPUT]
    Name tail
    Path /var/log/my-service/my_log_%y%m%d.log
[OUTPUT]
    Name stdout
    Match *
However, it doesn't watch the file. If I replace my_log_%y%m%d.log with my_log_210616.log, it works.
How can I use strftime in the path?

One solution is to use a wildcard path that matches any date, for example /var/log/my-service/my_log_*.log. Since fluent-bit reads log files from their tail by default, you won't get the existing data from the older files.
You could also add Ignore_Older 24h to the input config. This ignores files whose modification time is more than 24 hours old. Using Ignore_Older with a parser that extracts the event time works even better.
You could also do more elaborate filtering by file name in a Lua filter.

Related

Is there a way to parse out information from a xcom_pull in Airflow?

I'm working with a DAG that passes specific information between tasks, and everything is working as it should. The file needs to be stored in a reports/ folder for the following tasks to work correctly. I'm getting the name of the report through an xcom_pull, but I also want to parse that value so I can capture the unique filename itself and use it later in other tasks. A later task inserts this filename into the csv file, and it needs to match the filename itself so that it's a 1:1 match.
I want to parse out information of a xcom_pull option and I'm having issues doing so. The example I have is below:
report_filename = "reports/{}_{}".format('report_example', str(uuid.uuid1()))
get_report = GoogleCampaignManagerDownloadReportOperator(
    task_id="get_report",
    profile_id=1234,
    api_version=1234,
    bucket_name=test_bucket,
    report_name=report_filename,
    report_id=report_id,
    file_id=file_id,
)
report_filename_test = xcom_pull(get_report, 'report_name')
sanitize_report = SanitizeReportOperator(
    task_id='sanitize_report',
    dest_bucket=test_bucket,
    dest_object=report_filename_test,
    shared_object=str(report_filename_test).replace('reports/', ''),
    append_timestamp=True,
    append_filename=True
)
As of right now the xcom_pull pulls down the following:
reports/report_example_b3413b62-cc8a-11ec-bded-52e9ae62e477.csv.gz
However, I want to have another xcom_pull that will only pull the following:
report_example_b3413b62-cc8a-11ec-bded-52e9ae62e477.csv.gz
I have tried converting report_filename_test to a string and using the replace function, so for example:
new_test = str(report_filename_test).replace('reports/', '')
But when I attempt this, new_test either ends up as a NULL value or the replacement is ignored completely, and the file is later saved into a reports/ folder.
I have also tried putting report_filename into a list and grabbing the first element, but because of how Airflow works from task to task, a new filename with a different uuid is created each time, which is not what I'm aiming for. I have also tried using a PythonOperator to call a function that names the file so it can be reused later in the DAG, but have not had any luck with this either.
Is there a way to do this where you can parse out the information from a xcom_pull or another way to make this work? The end goal is to essentially have a file name with a specific uuid that I can pass through into the csv file and rename the file to the same specific uuid that is being built without the folder name in front.
I'm just looking to have a unique filename be passed through multiple tasks that is the exact same each time with a uuid format. I'm running out of ideas of how to make this work and have been stuck on this for almost two weeks now.
Any help with this would be greatly appreciated!
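The string handling on its own can be sketched in plain Python as below. This is only an illustration under assumptions: it takes the pulled value to be an ordinary string like the one shown above, and strip_report_folder is a hypothetical helper name, not part of Airflow. In a DAG, such a call would have to run at task execution time (for example inside a PythonOperator callable), because an XCom value does not exist yet while the DAG file is being parsed.

```python
import posixpath


def strip_report_folder(report_name: str) -> str:
    """Return only the file name part of an object path like 'reports/x.csv.gz'."""
    # posixpath is used on purpose: bucket object names use '/' on every OS.
    return posixpath.basename(report_name)


# Example value copied from the question (assumed to be what xcom_pull returns).
pulled = "reports/report_example_b3413b62-cc8a-11ec-bded-52e9ae62e477.csv.gz"
print(strip_report_folder(pulled))
```

Note also that uuid.uuid1() in module-level DAG code is evaluated every time the file is parsed, which would explain a different uuid showing up from task to task.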

Extract exactly one file (any) from each 7zip archive, in bulk (Unix)

I have 1,500 7zip archives, each archive contains 2 to 10 files, with no subdirectories.
Each file has the same extension, however the filename varies.
I only want one file out of each archive, but I'd like to perform this in bulk. I do not care which file is taken out, as long as only one file is taken out. It can be the first file, the newest, the biggest, the smallest, it doesn't matter.
Here's an example:
aa.7z {blah 56.smc, blah 57.smc, 1 blah 58.smc}
ab.7z {xx.smc, xx 1.smc, xx_2.smc}
ac.7z {1.smc}
I want to run something equivalent to:
7z e *.7z # But somehow only extract one file
Thank you!
Ultimately my solution was to extract all files and run the following in the directory:
for n in *; do echo "$n"; done > files.txt
I then imported that list into Excel and split the file names at a special character that separates the title from the qualifying data inside the filename (for example: Some Title (V1) [X2].smc); specifically, I used a bracket as the delimiter.
Then I removed all duplicates, leaving me with only one edition of each title from the archives. Finally I re-merged the columns (unfortunately the bracket was deleted during the splitting, so I wrote a function to add it back whenever the next column had content) and re-saved files.txt. After reviewing StackOverflow for answers, I deleted files based on that input file (files.txt). A word of warning: spaces in filenames cause problems with rm and xargs, so I had to wrap the variable in quotes.
Ultimately this still didn't serve me well enough so I just used a different resource entirely.
Posting this answer so others who find themselves in a similar predicament find an alternative resolution.
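For the original goal (one file per archive, any file), a rough sketch is to list each archive, take the first entry, and extract only that entry. The code below assumes the 7z command-line tool is on PATH and that `7z l -slt` prints entries as "Path = name" lines after a "----------" separator, as common 7-Zip/p7zip versions do. The names first_member and extract_one are made up for this example. Entry names containing * or ?, or starting with a dash, may be interpreted by 7z rather than taken literally.

```python
import subprocess
from pathlib import Path
from typing import Optional


def first_member(listing: str) -> Optional[str]:
    """Return the first entry name found in the text printed by `7z l -slt`."""
    in_entries = False
    for line in listing.splitlines():
        if line.startswith("----------"):
            in_entries = True  # entries are listed after this separator line
        elif in_entries and line.startswith("Path = "):
            return line[len("Path = "):]
    return None


def extract_one(archive: Path, out_dir: Path) -> Optional[str]:
    """Extract a single file from `archive` into `out_dir` using the 7z CLI."""
    listing = subprocess.run(
        ["7z", "l", "-slt", str(archive)],
        capture_output=True, text=True, check=True,
    ).stdout
    name = first_member(listing)
    if name is not None:
        # "e" extracts without directory structure; "-y" answers prompts with yes.
        subprocess.run(
            ["7z", "e", str(archive), f"-o{out_dir}", "-y", name],
            check=True,
        )
    return name


# Possible use, run from the directory that holds the archives:
# for archive in sorted(Path(".").glob("*.7z")):
#     extract_one(archive, Path("extracted"))
```

Passing the arguments as a list avoids shell quoting, so spaces in file names such as "blah 56.smc" need no special handling.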

list of files with space in the name

I would like to get the list of files with a specific extension in a folder. However, these files have spaces in their names. So for example, imagining I have files named file test1.txt, file test2.txt, file test3.txt, file test4.txt, if I do
list.files(pattern="file test*.txt")
I got
character(0)
Note: Apparently, using simply pattern="file test*" works fine, but I need the file extension as well.
Try:
list.files(pattern="file test.*.txt")
Actually, what this says is:
list.files(pattern="file test(.*).txt")
(which also works). Here . matches any character and * means that the preceding character may be present 0 or more times (see ?regex).
In your last example you said that using pattern="file test*" works, but you need a way to match the extension as well.
All you have to do is change your code to pattern="file test.*.txt". This matches any filename containing "file test" followed by any number of characters, then any single character and "txt". To match a literal dot and require the extension at the end of the name, use pattern="file test.*\\.txt$".
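The difference between a regular expression and a shell-style wildcard can be checked with a small sketch. It is written in Python rather than R, on the assumption that Python's re module reads these simple patterns the same way R's default regex engine does. The file names and the helper regex_matches are made up for illustration.

```python
import fnmatch
import re

names = ["file test1.txt", "file test2.txt", "file test1.csv", "other.txt"]


def regex_matches(pattern: str) -> list:
    """Names containing a match for `pattern`, read as a regular expression."""
    return [n for n in names if re.search(pattern, n)]


# Read as a regex, "t*" means "zero or more t", so none of the names match.
print(regex_matches(r"file test*.txt"))
# ".*" allows any characters, "\." is a literal dot, "^" and "$" anchor the name.
print(regex_matches(r"^file test.*\.txt$"))
# The same wildcard written as a shell-style glob instead of a regex.
print(fnmatch.filter(names, "file test*.txt"))
```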

I want to create a file with the system date in another directory and copy some data into it (Unix)

I want to create a file with the system date in its name in another directory and write the difference between two files into it.
NOW=$(date +"%H_%D")
file="log_$NOW.txt"
diff tmp1.txt tmp2.txt > $temp/log_$NOW.txt
I am using the code above, but the file is not generated. If I create a file with a simple name, i.e. without using $NOW, the file is generated. Please help me.
The format string to date produces something like 16_12/03/13. This contains directory separators so the filename becomes invalid. Instead use dots to separate the date:
NOW=$(date +"%H_%m.%d.%y")
which should produce strings like 16_12.03.13
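The effect of the slashes can be seen in a short sketch. It uses Python instead of the shell, with a fixed example timestamp chosen to match the strings above. log_name is a hypothetical helper.

```python
from datetime import datetime
from pathlib import Path


def log_name(now: datetime) -> str:
    """Build a log file name from hour and date, using dots instead of slashes."""
    return f"log_{now.strftime('%H_%m.%d.%y')}.txt"


stamp = datetime(2013, 12, 3, 16, 0)
# With slashes the name is read as nested directories: log_16_12 / 03 / 13.txt
print(Path(f"log_{stamp.strftime('%H_%m/%d/%y')}.txt").parts)
# With dots it stays a single file name.
print(log_name(stamp))
```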

Reading a file into R with partly unknown filename

Is there a way to read a file into R when I do not know the complete file name? Something like:
read.csv("abc_*")
In this case I do not know the complete file name after abc_
If you have exactly one file matching your criteria, you can do it like this:
read.csv(dir(pattern='^abc_')[1])
If there is more than one file, this approach would just use the first hit. In a more elaborate version you could loop over all matches and append them to one data frame or something like that.
Note that the pattern is a regular expression and is thus a bit different from what you expected (and from what I wrongly assumed in my first attempt to answer the question). Details can be found using ?regex
If you want to specify a directory, you have to modify the dir command accordingly:
read.csv(dir('path/to/your/file', full.names=T, pattern="^abc"))
The path you pass in your case may be c:\\users\\user\\desktop, with the pattern as above. full.names=T forces dir() to output the whole path and not only the file name. Try running dir(...) without the read.csv to understand what is happening there.
If you want to give your path as a complete string, it again gets a bit more complicated:
filepath <- 'path/to/your/file/abc_'
read.csv(dir(dirname(filepath), full.names=T, pattern=paste("^", basename(filepath), sep='')))
That process will fail if your filename contains any regular expression special characters. You would have to replace them with their corresponding escape sequences up front. But that again is another topic.
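The escaping step can be illustrated with a minimal sketch. It is in Python rather than R, and find_by_prefix is a hypothetical helper. It builds the same kind of "^prefix" pattern as above, but passes the prefix through re.escape first so that characters such as "(" or "." are matched literally.

```python
import os
import re


def find_by_prefix(directory: str, prefix: str) -> list:
    """Return sorted full paths of entries in `directory` whose names start with `prefix`."""
    # re.escape turns characters such as '.', '(' or '+' into literal matches.
    pattern = re.compile("^" + re.escape(prefix))
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if pattern.search(name)
    )


# Possible use: find_by_prefix("path/to/your/file", "abc_")
```

When only a literal prefix is needed, a plain string comparison such as name.startswith(prefix) avoids regular expressions altogether.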
