Simple log file processing in Python

The other day I found myself in the unfortunate position of needing to scan through raw server logs to try to gather information about a rare issue. Opening these log files in a text editor and doing a quick text search wasn't a great option: the logs had millions of lines and were 500MB+ in size, and the text editors simply gave up when I tried to search, multi-select, and extract the lines I needed.

I've recently gotten into Python (initially as a requirement for a project at work), and while I still have a lot to learn, I've found it to be an amazing tool for scripting quick little solutions to annoying problems. Like debugging server logs.

A couple of minutes and 22 lines of Python later, I had taken a few million lines of server logs and extracted the ~50 messages that were relevant. So I decided to take a few extra minutes and publish this post to encourage others to give Python a shot, with an example of a pretty common use case.

parse_logs.py:

import os
import re

# Regex used to match relevant loglines (in this case, a specific IP address)
line_regex = re.compile(r".*fwd=\"12.34.56.78\".*__aSyNcId_<_PQGroClU__quot;)

# Output file, where the matched loglines will be copied to
output_filename = os.path.normpath("output/parsed_lines.log")
# Overwrite the file first, to ensure we're starting with a blank file
with open(output_filename, "w") as out_file:
    out_file.write("")

# Open output file in 'append' mode
with open(output_filename, "a") as out_file:
    # Open input file in 'read' mode
    with open("test_log.log", "r") as in_file:
        # Loop over each log line
        for line in in_file:
            # If log line matches our regex, print to console, and output file
            if line_regex.search(line):
                print(line)
                out_file.write(line)
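
To make the regex a bit more concrete, here's a small sketch of it matching a hypothetical log line. The surrounding fields below are made up for illustration; the only thing the pattern actually cares about is the fwd="12.34.56.78" portion:

import re

line_regex = re.compile(r".*fwd=\"12.34.56.78\".*")

# Hypothetical log lines -- the surrounding fields are invented for this example
matching_line = 'at=info method=GET path="/health" fwd="12.34.56.78" status=200'
other_line = 'at=info method=GET path="/health" fwd="98.76.54.32" status=200'

print(line_regex.search(matching_line) is not None)  # True
print(line_regex.search(other_line) is not None)     # False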

If you'd like to see this example running on your own machine:
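
A minimal sketch like the following will create a small test_log.log with some made-up lines (the log format here is just an assumption; any line containing fwd="12.34.56.78" will match), plus the output/ directory the script writes into:

import os

# Make sure the output/ directory that parse_logs.py writes into exists
if not os.path.exists("output"):
    os.makedirs("output")

# Write a small fake log: mostly noise lines, plus a couple containing the target IP
with open("test_log.log", "w") as f:
    for i in range(100):
        f.write('at=info method=GET path="/page/%d" fwd="98.76.54.32" status=200\n' % i)
    f.write('at=error method=GET path="/broken" fwd="12.34.56.78" status=500\n')
    f.write('at=error method=POST path="/broken" fwd="12.34.56.78" status=500\n')

Running parse_logs.py against that file should print the two fwd="12.34.56.78" lines to the console and copy them into output/parsed_lines.log.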

Example Run: