Python, The fastest way to find string in multiple text files (some files are big)

Python, The fastest way to find string in multiple text files (some files are big) - python-3.4

I try to search a string in multiple files, my code works fine but for big text files it takes a few minutes.
wrd = b'my_word'
path = 'C:\path\to\files'
#### opens the path where all of .txt files are ####
for f in os.listdir(path):
if f.strip().endswith('.txt'):
with open(os.path.join(path, f), 'rb') as ofile:
#### loops through every line in the file comparing the strings ####
for line in ofile:
if wrd in line:
try:
sendMail(...)
logging.warning('There is an error {} in this file : {}'.format(line, f))
sys.exit(0)
except IOError as e:
logging.error('Operation failed: {}' .format(e.strerror))
sys.exit(0)
I found this topic : Python finds a string in multiple files recursively and returns the file path
but it does not answer my question..
Do you have an idea how to make it faster ?
Am using python3.4 on windows server 2003.
Thx ;)

My files are generated from an oracle application and if there is an error, i log it and stop generation my files.
So i search my string by reading the files from the end, because the string am looking for is an Oracle error and is at the end of the files.
wrd = b'ORA-'
path = 'C:\path\to\files'
#### opens the path where all of .txt files are ####
for f in os.listdir(path):
if f.strip().endswith('.txt'):
with open(os.path.join(path, f), 'r') as ofile:
try:
ofile.seek (0, 2) # Seek a end of file
fsize = ofile.tell() # Get Size
ofile.seek (max (fsize-1024, 0), 0) # Set pos a last n chars
lines = ofile.readlines() # Read to end
lines = lines[-10:] # Get last 10 lines
for line in lines:
if string in line:
sendMail(.....)
logging.error('There is an error {} in this file : {}'.format(line, f))
sys.exit(0)
except IOError as e:
logging.error('Operation failed: {}'.format(e.strerror))
sys.exit(0)

Related

Migrating to Qt6/PyQt6: what are all the deprecated short-form names in Qt5?

I'm trying to migrate a codebase from PyQt5 to PyQt6. I read in this article (see https://www.pythonguis.com/faq/pyqt5-vs-pyqt6/) that all enum members must be named using their fully qualified names. The article gives this example:
# PyQt5
widget = QCheckBox("This is a checkbox")
widget.setCheckState(Qt.Checked)
# PyQt6
widget = QCheckBox("This is a checkbox")
widget.setCheckState(Qt.CheckState.Checked)
Then the article continues:
"There are too many updated values to mention them all here. But if you're converting a codebase you can usually just search online for the short-form and the longer form will be in the results."
I get the point. This quote basically says something along the lines:
"If the Python interpreter runs into an error, and the error turns out to be a short-form enum, you'll likely find the solution online."
I get that. But this is not how I want to migrate the codebase. I want a full list of all the short-form enums and then perform a global search-and-replace for each.
Where can I find such a list?

I wrote a script to extract all the short-form and corresponding fully qualified enum names from the PyQt6 installation. It then does the conversions automatically:
# -*- coding: utf-8 -*-
# ================================================================================================ #
# ENUM CONVERTER TOOL #
# ================================================================================================ #
from typing import *
import os, argparse, inspect, re
q = "'"
help_text = '''
Copyright (c) 2022 Kristof Mulier
MIT licensed, see bottom
ENUM CONVERTER TOOL
===================
The script starts from the toplevel directory (assuming that you put this file in that directory)
and crawls through all the files and folders. In each file, it searches for old-style enums to
convert them into fully qualified names.
HOW TO USE
==========
Fill in the path to your PyQt6 installation folder. See line 57:
pyqt6_folderpath = 'C:/Python39/Lib/site-packages/PyQt6'
Place this script in the toplevel directory of your project. Open a terminal, navigate to the
directory and invoke this script:
$ python enum_converter_tool.py
WARNING
=======
This script modifies the files in your project! Make sure to backup your project before you put this
file inside. Also, you might first want to do a dry run:
$ python enum_converter_tool.py --dry_run
FEATURES
========
You can invoke this script in the following ways:
$ python enum_converter_tool.py No parameters. The script simply goes through
all the files and makes the replacements.
$ python enum_converter_tool.py --dry_run Dry run mode. The script won't do any replace-
ments, but prints out what it could replace.
$ python enum_converter_tool.py --show Print the dictionary this script creates to
convert the old-style enums into new-style.
$ python enum_converter_tool.py --help Show this help info
'''
# IMPORTANT: Point at the folder where PyQt6 stub files are located. This folder will be examined to
# fill the 'enum_dict'.
pyqt6_folderpath = 'C:/Python39/Lib/site-packages/PyQt6'
# Figure out where the toplevel directory is located. We assume that this converter tool is located
# in that directory. An os.walk() operation starts from this toplevel directory to find and process
# all files.
toplevel_directory = os.path.realpath(
os.path.dirname(
os.path.realpath(
inspect.getfile(
inspect.currentframe()
)
)
)
).replace('\\', '/')
# Figure out the name of this script. It will be used later on to exclude oneself from the replace-
# ments.
script_name = os.path.realpath(
inspect.getfile(inspect.currentframe())
).replace('\\', '/').split('/')[-1]
# Create the dictionary that will be filled with enums
enum_dict:Dict[str, str] = {}
def fill_enum_dict(filepath:str) -> None:
'''
Parse the given stub file to extract the enums and flags. Each one is inside a class, possibly a
nested one. For example:
---------------------------------------------------------------------
| class Qt(PyQt6.sip.simplewrapper): |
| class HighDpiScaleFactorRoundingPolicy(enum.Enum): |
| Round = ... # type: Qt.HighDpiScaleFactorRoundingPolicy |
---------------------------------------------------------------------
The enum 'Round' is from class 'HighDpiScaleFactorRoundingPolicy' which is in turn from class
'Qt'. The old reference style would then be:
> Qt.Round
The new style (fully qualified name) would be:
> Qt.HighDpiScaleFactorRoundingPolicy.Round
The aim of this function is to fill the 'enum_dict' with an entry like:
enum_dict = {
'Qt.Round' : 'Qt.HighDpiScaleFactorRoundingPolicy.Round'
}
'''
content:str = ''
with open(filepath, 'r', encoding='utf-8', newline='\n', errors='replace') as f:
content = f.read()
p = re.compile(r'(\w+)\s+=\s+\.\.\.\s+#\s*type:\s*([\w.]+)')
for m in p.finditer(content):
# Observe the enum's name, eg. 'Round'
enum_name = m.group(1)
# Figure out in which classes it is
class_list = m.group(2).split('.')
# If it belongs to just one class (no nesting), there is no point in continuing
if len(class_list) == 1:
continue
# Extract the old and new enum's name
old_enum = f'{class_list[0]}.{enum_name}'
new_enum = ''
for class_name in class_list:
new_enum += f'{class_name}.'
continue
new_enum += enum_name
# Add them to the 'enum_dict'
enum_dict[old_enum] = new_enum
continue
return
def show_help() -> None:
'''
Print help info and quit.
'''
print(help_text)
return
def convert_enums_in_file(filepath:str, dry_run:bool) -> None:
'''
Convert the enums in the given file.
'''
filename:str = filepath.split('/')[-1]
# Ignore the file in some cases
if any(filename == fname for fname in (script_name, )):
return
# Read the content
content:str = ''
with open(filepath, 'r', encoding='utf-8', newline='\n', errors='replace') as f:
content = f.read()
# Loop over all the keys in the 'enum_dict'. Perform a replacement in the 'content' for each of
# them.
for k, v in enum_dict.items():
if k not in content:
continue
# Compile a regex pattern that only looks for the old enum (represented by the key of the
# 'enum_dict') if it is surrounded by bounds. What we want to avoid is a situation like
# this:
# k = 'Qt.Window'
# k found in 'qt.Qt.WindowType.Window'
# In the situation above, k is found in 'qt.Qt.WindowType.Window' such that a replacement
# will take place there, messing up the code! By surrounding k with bounds in the regex pat-
# tern, this won't happen.
p = re.compile(fr'\b{k}\b')
# Substitute all occurences of k (key) in 'content' with v (value). The 'subn()' method re-
# turns a tuple (new_string, number_of_subs_made).
new_content, n = p.subn(v, content)
if n == 0:
assert new_content == content
continue
assert new_content != content
print(f'{q}{filename}{q}: Replace {q}{k}{q} => {q}{v}{q} ({n})')
content = new_content
continue
if dry_run:
return
with open(filepath, 'w', encoding='utf-8', newline='\n', errors='replace') as f:
f.write(content)
return
def convert_all(dry_run:bool) -> None:
'''
Search and replace all enums.
'''
for root, dirs, files in os.walk(toplevel_directory):
for f in files:
if not f.endswith('.py'):
continue
filepath = os.path.join(root, f).replace('\\', '/')
convert_enums_in_file(filepath, dry_run)
continue
continue
return
if __name__ == '__main__':
parser = argparse.ArgumentParser(
description = 'Convert enums to fully-qualified names',
add_help = False,
)
parser.add_argument('-h', '--help' , action='store_true')
parser.add_argument('-d', '--dry_run' , action='store_true')
parser.add_argument('-s', '--show' , action='store_true')
args = parser.parse_args()
if args.help:
show_help()
else:
#& Check if 'pyqt6_folderpath' exists
if not os.path.exists(pyqt6_folderpath):
print(
f'\nERROR:\n'
f'Folder {q}{pyqt6_folderpath}{q} could not be found. Make sure that variable '
f'{q}pyqt6_folderpath{q} from line 57 points to the PyQt6 installation folder.\n'
)
else:
#& Fill the 'enum_dict'
type_hint_files = [
os.path.join(pyqt6_folderpath, _filename)
for _filename in os.listdir(pyqt6_folderpath)
if _filename.endswith('.pyi')
]
for _filepath in type_hint_files:
fill_enum_dict(_filepath)
continue
#& Perform requested action
if args.show:
import pprint
pprint.pprint(enum_dict)
elif args.dry_run:
print('\nDRY RUN\n')
convert_all(dry_run=True)
else:
convert_all(dry_run=False)
print('\nQuit enum converter tool\n')
# MIT LICENSE
# ===========
# Copyright (c) 2022 Kristof Mulier
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and
# associated documentation files (the "Software"), to deal in the Software without restriction, in-
# cluding without limitation the rights to use, copy, modify, merge, publish, distribute, sublicen-
# se, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to
# do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all copies or substan-
# tial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT
# NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRIN-
# GEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
# OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
# CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Make sure you backup your Python project. Then place this file in the toplevel directory of the project. Modify line 57 (!) such that it points to your PyQt6 installation folder.
First run the script with the --dry_run flag to make sure you agree with the replacements. Then run it without any flags.

How do I turn a file's contents into a dictionary?

I have a function that I want to open .dat files with, to extract data from them, but the problem is I don't know how to turn that data back into a dictionary to store in a variable. Currently, the data in the files are stored like this: "{"x":0,"y":1}" (it uses up only one line of the file, which is just the normal structure of a dictionary).
Below is just the function where I open the .dat file and try to extract stuff from it.
def openData():
file = fd.askopenfile(filetypes=[("Data",".dat"),("All Files",".*")])
filepath = file.name
if file is None:
return
with open(filepath,"r") as f:
contents = dict(f.read())
print(contents["x"]) #let's say there is a key called "x" in that dictionary
This is the error that I get from it: (not because the key "x" is not in dict, trust me)
Exception in Tkinter callback
Traceback (most recent call last):
File "...\AppData\Local\Programs\Python\Python39\lib\tkinter\__init__.py", line 1892, in __call__
return self.func(*args)
File "...\PycharmProjects\[this project]\main.py", line 204, in openData
contents = dict(f.read())
ValueError: dictionary update sequence element #0 has length 1; 2 is required
Process finished with exit code 0
Update: I tried using json and it worked, thanks to #match
def openData():
file = fd.askopenfile(filetypes=[("Data",".dat"),("All Files",".*")])
filepath = file.name
if file is None:
return
with open(filepath,"r") as f:
contents = dict(json.load(f))
print(contents["x"])

You need to parse the data to get a data structure from a string, fortunately, Python provides a function for safely parsing Python data structures: ast.literal_eval(). E.g:
import ast
...
with open("/path/to/file", "r") as data:
dictionary = ast.literal_eval(data.read())
Reference stackoverflow

how to view, open and save a .rdb file in RStudio

I am able to see every database in the .rdb file in the variable environment as a "promise" as per direction here. Now, I want to edit one of the file and save it. How can I do that? I am new in R.

In a discussion on r-pkg-devel, Ivan Krylov provided the following function ro read an RDB database:
# filename: the .rdb file
# offset, size: the pair of values from the .rdx
# type: 'gzip' if $compressed is TRUE, 'bzip2' for 2, 'xz' for 3
readRDB <- function(filename, offset, size, type = 'gzip') {
f <- file(filename, 'rb')
on.exit(close(f))
seek(f, offset + 4)
unserialize(memDecompress(readBin(f, 'raw', size - 4), type))
}
Therefore, you should be able to implement the reverse using a combination of serialize, memCompress, and writeBin.
Note that if the object changes size, you will also have to adjust the index file.

Workaround for case-sensitive input to dir

I am using Octave 5.1.0 on Windows 10 (x64). I am parsing a series of directories looking for an Excel spreadsheet in each directory with "logbook" in its filename. The problem is these files are created by hand and the filenaming isn't consistent: sometimes it's "LogBook", other times it's "logbook", etc...
It looks like the string passed as input to the dir function is case-sensitive so if I don't have the correct case, dir returns an empty struct. Currently, I am using the following workaround, but I wondered if there was a better way of doing this (for a start I haven't captured all possible upper/lower case combinations):
logbook = dir('*LogBook.xls*');
if isempty(logbook)
logbook = dir('*logbook.xls*');
if isempty(logbook)
logbook = dir('*Logbook.xls*');
if isempty(logbook)
logbook = dir('*logBook.xls*');
if isempty(logbook)
error(['Could not find logbook spreadsheet in ' dir_name '.'])
end
end
end
end

You need to get the list of filenames (either via readdir, dir, ls), and then search for the string in that list. If you use readdir, it can be done like this:
[files, err, msg] = readdir ('.'); # read current directory
if (err != 0)
error ("failed to readdir (error code %d): %s", msg);
endif
logbook_indices = find (cellfun (#any, regexpi (files, 'logbook'));
logbook_filenames = files(logbook_indices);
A much less standard approach could be:
glob ('*[lL][oO][gG][bB][oO][kK]*')

Jupyter: suppress %%file magic output

When using IPython's %%file magic to write the content of a notebook cell to a file in the current working directory, is there a way to suppress the Created file ... info text displayed on execution of the cell?
Sometimes creating files in this way is super handy (for example when using a Matlab kernel) but this is a huge problem with respect to version control, I don't want the structure of my local filesystem to be present in code that others work on as well.

source for this function
#cell_magic
def writefile(self, line, cell):
"""Write the contents of the cell to a file.
The file will be overwritten unless the -a (--append) flag is specified.
"""
args = magic_arguments.parse_argstring(self.writefile, line)
if re.match(r'^(\'.*\')|(".*")$', args.filename):
filename = os.path.expanduser(args.filename[1:-1])
else:
filename = os.path.expanduser(args.filename)
if os.path.exists(filename):
if args.append:
print("Appending to %s" % filename)
else:
print("Overwriting %s" % filename)
else:
print("Writing %s" % filename)
mode = 'a' if args.append else 'w'
with io.open(filename, mode, encoding='utf-8') as f:
f.write(cell)
File: /usr/local/lib/python3.6/dist-packages/IPython/core/magics/osm.py

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Python, The fastest way to find string in multiple text files (some files are big) - python-3.4

Related

Migrating to Qt6/PyQt6: what are all the deprecated short-form names in Qt5?

How do I turn a file's contents into a dictionary?

how to view, open and save a .rdb file in RStudio

Workaround for case-sensitive input to dir

Jupyter: suppress %%file magic output

Categories

Resources