A Modular Design
As a data professional, you will need to understand existing code and applications, and how they work. More importantly, it is important that you understand how to maintain applications currently in production. There will be code and functions already written in which you will need to modify or call from your own code base to meet a specific business requirement. It is rare that one will be writing a brand new Python application in isolation, that doesn’t integrate with other applications or processes. This challenge warrants an understanding of modular design in developing Python packages, modules, and functions while adhering to business requirements.
This concept is represented visually in the image for this article. It is a collecton of modules within an organization where the "lime" square can represent the module you are responsible for. It fits into a larger context of application modules, applications that may span teams and departments and it’s important that your module integrates well within the overall scheme. This includes having well defined interfaces, documented functions, and a clear purpose for the module being written or maintained.
Strategy Object Pattern
The DRY (Don’t Repeat Yourself) principal states that code for performing a specific task is not repeated throughout your classes and modules. Rather, the logic for performing a specific function resides in one place.
In the Strategy Object pattern depicted in Figure 1, there is an abstract base classes which defines one or more abstract methods. The abstract class cannot be instatiated on its own or used as an object. It merely serves as a base template for subclasses to override. The abstract method cannot be called from another function or class since it serves as a template method for subclasses to override. Other non-abstract functions in the super class are shared among subclasses. In the class diagram in Figure 1, the can_ingest() method is a base method that can be called from subclasses. An abstract class cannot be used on its own. Its purpose is to provide a base template for subclasses to inherit from. We think of the abstract class as the basic building block for more specialized subclasses such as the CSVIngestor subclass in Figure 1.
Each subclass performs a specialized operation such as parsing a specific file type or format. As a result, we adhere to the DRY (Don't Repeat Yourself) principal which follows a modular design. It is also extensible in that additional subclasses for new file formats can be added to the package without having to modify existing code.
PEP 257 and PEP 8 Style Guides
The PEP 257 style guide defines the docstring conventions for modules, classes, and functions in a Python application. It is important for having consistent documentation of your packages, modules, classes and methods.
Each module contains a module-level comment describing the purpose of the module. Complex functions, classes, and methods include a docstring annotating the primary action of the callable function, any additional clarifications, followed by descriptions of parameters and return values.
It’s important that the docstring comments that you write describe the interface of the unit of code. Implementation-level details, except in extreme cases, belong as normal comments inline with the code. The docstring discusses what the function does, and possibly why, but doesn’t dwell too much on how the function works. The docstring conventions and pydocstyle module are discussed here at http://www.pydocstyle.org/en/stable/usage.html#command-line-interface
Recommended by LinkedIn
The following command runs at the terminal to install the pydocstyle package. Pydocstyle scans your applilcation code for violations of the PEP 257 style guide recommendations.
PS> pip install pydocstyle
Running this next command in the Powershell terminal prompt will output the PEP 257 messages to the clipboard.
PS> pydocstyle | clip
An example PEP 257 message is "D100: Missing docstring in public module". This means that no docstring is present on the first line if the python module file.
Conclusion
It is helpful to think modularly when designing applications in Python. There are many OOP (Objected Oriented Process) principals such as inheritance, polymorphism, and stategy objects to utilize. It is helpful to think of your code or processes that you write as modules that are callable and which can call into other modules. It makes the code you write more maintanable in the long-run. It is helpful for someone reading or maintaining your code to see a modular design in which each class has a single responsibility. Each function has a single purpose and is well documented according to PEP 257 and PEP 8 style guidelines. Having a modular design makes the overall application maintainable and understood better by other developers, security professionals, and Ops professionals.
References
Michael, great article on python code reuse. I will share with our developement teams!