Analysis of Hidden Technical Debt in Machine Learning Systems

Edward Amor
Nov 1, 2020

Hidden Technical Debt in Machine Learning Systems offers a very interesting high-level overview of the numerous extra layers of technical debt that exist in Machine Learning enabled systems. Unlike standard software systems, ML-enabled systems replace part of their code and explicit software logic with a machine learning component that depends on external data. This replacement of hand-written logic with learned behavior results in systems that are much harder to maintain in the long run if the proper precautions aren't taken. It's therefore imperative that every Data Scientist and Machine Learning Engineer be aware of the various debts that come with ML-enabled systems, in order to prevent a serious catastrophe in the future.

Model Complexity

Model complexity refers to the overall complex nature of ML-enabled systems: the process through which data flows in and out, and every intermediary stage in between. This complexity makes it effectively impossible to make isolated changes in an ML-enabled system, since any change alters the behavior of the ML component within (the paper calls this the CACE principle: Changing Anything Changes Everything). Additionally, an interesting problem that arises as complexity in an ML system increases is the advent of undeclared consumers. Undeclared consumers are other systems or parts of the development stack that silently utilize outputs and/or intermediary files generated by the ML system. This poses a huge risk, since these components are now silently coupled to the system, and any changes in the ML component affect them as well. This coupling can produce adverse outcomes that are tough to debug at best and cascading failures at worst.
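To make the undeclared-consumer risk concrete, here is a small hypothetical sketch (the file path, function names, and stand-in "model" are all invented for illustration): an ML job writes an intermediate file as a side effect, and a reporting job elsewhere in the stack quietly reads it, coupling itself to the model without any declared dependency.

```python
# Hypothetical sketch of an undeclared consumer; all names and paths
# here are invented for illustration, not from the paper.
import json

# --- ML system: writes an intermediate artifact as a side effect ---
def score_users(users):
    scores = {u: len(u) % 2 for u in users}        # stand-in for a real model
    with open("/tmp/user_scores.json", "w") as f:  # intermediate output file
        json.dump(scores, f)
    return scores

# --- Elsewhere in the stack: a reporting job nobody declared ---
def nightly_report():
    with open("/tmp/user_scores.json") as f:  # silent coupling: any change to
        scores = json.load(f)                 # the model's output format or
    return sum(scores.values())               # semantics breaks this report
```

Nothing in the codebase records that nightly_report depends on the model's output, so a change to score_users can break it without any test or reviewer noticing.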

Data Dependencies

ML is required in exactly those cases when the desired behavior cannot be effectively expressed in software logic without dependency on external data.

Unlike regular software systems, ML-enabled systems are entirely dependent on external data. When the inputs to an ML system aren't strictly maintained, the input data may change and adversely affect the system. This includes even improvements to the input signals, since changing anything changes everything. Additionally, over time, underutilized features, legacy features, bundled features, and correlated features can generate inefficiencies at best and faults at worst. It's imperative to run regular input validation checks and exhaustive leave-one-feature-out evaluations to identify and eliminate underutilized features.
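As a concrete illustration, here is a minimal leave-one-feature-out sketch using scikit-learn; the dataset, model, and cross-validation setup are illustrative assumptions, not something prescribed by the paper.

```python
# Minimal leave-one-feature-out evaluation sketch; dataset and model
# are illustrative choices, assuming scikit-learn is available.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Baseline score with the full feature set.
baseline = cross_val_score(model, X, y, cv=5).mean()

# Drop each feature in turn; a negligible score drop flags an
# underutilized feature that may be safe to remove.
for i in range(X.shape[1]):
    X_reduced = np.delete(X, i, axis=1)
    score = cross_val_score(model, X_reduced, y, cv=5).mean()
    print(f"feature {i}: baseline {baseline:.4f} -> without it {score:.4f}")
```

In a real system this loop would run against the production training pipeline and its own metrics, but the shape of the evaluation is the same.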

Feedback Loops

One of the key features of live ML systems is that they often end up influencing their own behavior if they update over time.

ML systems are unique in that they require training, which requires data. Over time, direct feedback loops often arise, in which an ML system directly influences the selection of its own future training data. Although this is a relatively tough problem to deal with, it's exactly the kind of problem data scientists love to research and solve. The more challenging issue is hidden feedback loops, in which two systems influence each other indirectly through the world. Consider, for example, two stock-market prediction models from independent investment firms: any improvement (or, at worst, bug) in one may influence the bidding and buying behavior of the other.
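A toy simulation can make the direct feedback loop visible (all numbers here are invented): a system that only shows the items it already scores highly will only ever collect training data about those items, reinforcing its own early choices.

```python
# Toy sketch of a direct feedback loop; scores and update rules are
# invented for illustration, not taken from the paper.
import random

scores = {"a": 0.5, "b": 0.5, "c": 0.5}  # all items are actually identical
for _ in range(1000):
    shown = max(scores, key=scores.get)           # model picks its own data
    clicked = random.random() < 0.5               # true quality is equal
    scores[shown] += 0.01 if clicked else -0.005  # update only what was shown

# One item ends up dominating even though all three are equally good:
# the system only observed outcomes for the item it already favored.
print(scores)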

Anti-Patterns

An anti-pattern is a common response to a recurring problem that is usually ineffective and risks being highly counterproductive. ML-enabled systems suffer from a few unique anti-patterns that hinder maintainability. Glue code, which is often written to get data into and out of a general-purpose ML package, ends up creating lots of supporting code that is costly in the long term. Pipeline jungles evolve organically over time as the result of incremental scrapes, joins, and sampling steps, often with intermediate outputs and files. Managing and testing these pipelines is costly and time-consuming; however, since the paper's publication in 2015, many libraries (such as scikit-learn with its Pipeline class) have shipped pipeline abstractions that ease their management, as sketched below.
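Here is a small sketch of taming a pipeline jungle with scikit-learn's Pipeline; the preprocessing steps and model are illustrative choices rather than a recommendation from the paper.

```python
# One declared, testable pipeline object replaces scattered
# scrape/join/sample scripts with intermediate files.
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="median")),  # fill missing values
    ("scale", StandardScaler()),                   # normalize features
    ("model", LogisticRegression(max_iter=1000)),  # final estimator
])

# pipeline.fit(X_train, y_train) and pipeline.predict(X_test) apply every
# step in order, so training and serving stay consistent by construction.
```

Because the whole chain is one object, it can be versioned, unit-tested, and cross-validated as a unit instead of as a pile of ad hoc scripts.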

Other Areas of ML Debt

Lastly, there are several other areas of technical debt in the production of ML-enabled systems. Process management debt occurs in very mature systems that tend to have hundreds or thousands of models running simultaneously, and involves managing and assigning resources across different business priorities. Reproducibility debt arises when designing real-world systems while ensuring strict reproducibility, which is difficult in the presence of randomized algorithms, non-determinism in parallel learning, and interactions with the outside world. Finally, there is probably the most important type of debt: cultural debt. Cultural debt exists when there is a hard line between research and engineering, which is counterproductive in the long term. It's therefore imperative to cultivate a culture that rewards simplicity, stability, and reproducibility.
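On the reproducibility front, the cheapest first step is simply pinning the sources of randomness you control; a minimal sketch, assuming plain Python and NumPy:

```python
# Minimal reproducibility hygiene: pin the seeds you control.
import random
import numpy as np

SEED = 42
random.seed(SEED)
np.random.seed(SEED)

# Frameworks bring their own non-determinism (parallel learning, GPU ops,
# external data sources); those need their own seeds and flags, and even
# then bit-exact reproduction may not be achievable.
```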

Key Takeaways

The goal is not to add new functionality, but to enable future improvements, reduce errors, and improve maintainability.

As the Data Science field continues to grow, it's important that those within the community are aware of the issues involved in putting ML into production. Luckily, since generally 95% of any ML system isn't actually ML, it works to our benefit to learn from the software engineering field and take advantage of its many decades of hard-won experience. The authors of Hidden Technical Debt in Machine Learning Systems did an excellent job of laying out the additional layers of technical debt involved in ML systems, along with various measures to limit them. Some of the key ways they offer to pay down the debt are:

  1. Using common APIs, which allows supporting infrastructure to be more reusable.
  2. Isolating models by serving ensembles, to reduce interaction between the external world and the models.
  3. Creating versioned copies of inputs, to prevent changes in the input from degrading the system.
  4. Regularly running exhaustive leave-one-feature-out evaluations, to identify and remove unnecessary features.
  5. Testing input signals, providing sanity checks that prevent corruption of models (see the sketch after this list).
  6. Improving documentation.
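As an example of point 5, here is a hypothetical input-signal sanity check; the schema, column names, and thresholds are all illustrative assumptions about what "valid" means for a given system.

```python
# Hypothetical input-signal validation; the schema and thresholds are
# invented for illustration and would be system-specific in practice.
import numpy as np
import pandas as pd

EXPECTED_SCHEMA = {"age": np.floating, "income": np.floating}

def validate_inputs(df: pd.DataFrame) -> None:
    """Fail fast instead of letting corrupted signals reach the model."""
    for column, dtype in EXPECTED_SCHEMA.items():
        if column not in df.columns:
            raise ValueError(f"missing expected input signal: {column}")
        if not np.issubdtype(df[column].dtype, dtype):
            raise TypeError(f"{column} has unexpected dtype {df[column].dtype}")
    if (df["age"] < 0).any() or (df["age"] > 130).any():
        raise ValueError("age outside plausible range")
    if df.isna().mean().max() > 0.05:  # more than 5% missing in any column
        raise ValueError("missing-value rate exceeds threshold")

# validate_inputs(pd.DataFrame({"age": [31.0], "income": [52000.0]}))  # passes
```

Checks like these run before training or scoring, so a broken upstream feed surfaces as a loud error rather than as a silently degraded model.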

Data Science has never been an isolated field, but it is more important than ever that, as a community, we pay attention to the long-term implications of our ML systems. Taking the time and care to manage these systems properly from the beginning will result in better maintainability and future growth that would otherwise be burdensome.

Originally published at https://edwardamor.xyz on November 1, 2020.
