ML security has the same goal as all cybersecurity measures: reducing the risk of sensitive data being exposed. If a bad actor interferes with your ML model or the data it uses, that model may output incorrect results that, at best, undermine the benefits of ML and, at worst, negatively impact your business or customers.
“Executives should care about this because there’s nothing worse than doing the wrong thing very quickly and confidently,” says Zach Hanif, vice president of machine learning platforms at Capital One. And while Hanif works in a regulated industry—financial services—requiring additional levels of governance and security, he says that every business adopting ML should take the opportunity to examine its security practices.
Devon Rollins, vice president of cyber engineering and machine learning at Capital One, adds, “Securing business-critical applications requires a level of differentiated protection. It’s safe to assume many deployments of ML tools at scale are critical given the role they play for the business and how they directly impact outcomes for users.”
Novel security considerations to keep in mind
While best practices for securing ML systems are similar to those for any software or hardware system, greater ML adoption also presents new considerations. “Machine learning adds another layer of complexity,” explains Hanif. “This means organizations must consider the multiple points in a machine learning workflow that can represent entirely new vectors.” These core workflow elements include the ML models, the documentation and systems around those models and the data they use, and the use cases they enable.
It’s also imperative that ML models and supporting systems are developed with security in mind right from the start. It is not uncommon for engineers to rely on freely available open-source libraries developed by the software community, rather than coding every single aspect of their program. These libraries are often designed by software engineers, mathematicians, or academics who might not be as well versed in writing secure code. “The people and the skills necessary to develop high-performance or cutting-edge ML software may not always intersect with security-focused software development,” Hanif adds.
According to Rollins, this underscores the importance of sanitizing open-source code libraries used for ML models. Developers should consider confidentiality, integrity, and availability as a framework to guide information security policy. Confidentiality means that data assets are protected from unauthorized access; integrity refers to the accuracy and trustworthiness of data, including its protection from unauthorized changes; and availability ensures that authorized users can readily access the data they need for the job at hand.
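As one narrow illustration of the integrity part of that framework, a team might pin the SHA-256 digest of a library artifact it has reviewed and refuse to load anything that does not match. The sketch below is an assumption for illustration, not a practice Rollins or Hanif describes; the function names and the sample bytes are hypothetical, and a real pipeline would read the artifact from disk and store the pinned digest somewhere an attacker cannot edit.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_artifact(data: bytes, pinned_digest: str) -> bool:
    """Check that the artifact's digest matches the digest recorded at review time.

    hmac.compare_digest performs a constant-time comparison of the two strings.
    """
    return hmac.compare_digest(sha256_hex(data), pinned_digest.lower())


# Hypothetical stand-ins for a reviewed library file and a tampered copy.
reviewed_artifact = b"def predict(x):\n    return x * 2\n"
tampered_artifact = b"def predict(x):\n    return x * 2  # plus something hidden\n"

# Digest recorded when the library was reviewed (assumed to be stored safely).
pinned = sha256_hex(reviewed_artifact)

print(verify_artifact(reviewed_artifact, pinned))
print(verify_artifact(tampered_artifact, pinned))
```

A matching digest only shows that the file is the one that was reviewed. It says nothing about whether the reviewed code was safe in the first place, so a check like this complements code review rather than replacing it.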
Additionally, ML input data can be manipulated to compromise a model. One risk is inference manipulation: altering input data to trick the model. Because ML models interpret data differently than the human brain does, data can be manipulated in ways that are imperceptible to humans but that nevertheless change the results. For example, changing a pixel or two in an image of a stop sign may be all it takes to fool a computer vision model. The human eye would still see a stop sign, but the ML model might not categorize it as one. Alternatively, an attacker might probe a model by sending it a series of varying inputs, thus learning how the model works. By observing how the inputs affect the system, Hanif explains, outside actors might figure out how to disguise a malicious file so that it eludes detection.
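The stop sign scenario can be sketched with a toy model. The example below is a minimal illustration under stated assumptions: a four-"pixel" image, a hand-written linear classifier, and made-up weights, none of which come from the article. Rather than changing one or two pixels, it nudges every pixel by a tiny amount in the direction that lowers the model's score, a sign-based perturbation similar in spirit to the fast gradient sign method.

```python
# Toy linear classifier: score = w . x + b, and score > 0 means "stop sign".
# All weights and pixel values here are invented for illustration.

def score(weights, bias, pixels):
    """Weighted sum of pixel values plus a bias term."""
    return sum(w * p for w, p in zip(weights, pixels)) + bias


def classify(weights, bias, pixels):
    """Map the score to a label."""
    return "stop sign" if score(weights, bias, pixels) > 0 else "not a stop sign"


def perturb(weights, pixels, epsilon):
    """Shift each pixel by at most epsilon in the direction that lowers the score.

    Pixels are clamped to the valid range [0.0, 1.0].
    """
    adversarial = []
    for w, p in zip(weights, pixels):
        if w > 0:
            step = -epsilon
        elif w < 0:
            step = epsilon
        else:
            step = 0.0
        adversarial.append(min(1.0, max(0.0, p + step)))
    return adversarial


WEIGHTS = [0.9, -0.4, 0.7, -0.8]  # hypothetical learned weights
BIAS = -0.05                      # hypothetical learned bias
image = [0.5, 0.6, 0.4, 0.5]      # hypothetical pixel intensities in [0, 1]

adversarial_image = perturb(WEIGHTS, image, epsilon=0.02)

print(classify(WEIGHTS, BIAS, image))
print(classify(WEIGHTS, BIAS, adversarial_image))
```

With these numbers the original image is labeled a stop sign, while the perturbed copy, in which no pixel moves by more than 0.02 on a 0-to-1 scale, is not. Real vision models are far larger and nonlinear, but the same weakness applies: small, targeted input changes can push an example across a decision boundary.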
Another vector for risk is the data used to train the system. A third party might “poison” the training data so that the machine learns something incorrectly. As a result, the trained model will make mistakes—for example, automatically identifying all stop signs as yield signs.
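A small sketch can make the poisoning scenario concrete. The example below assumes a deliberately simple setup that is not from the article: each sign is reduced to a single made-up numeric feature, and a k-nearest-neighbors vote stands in for the trained model. An attacker who can inject a few mislabeled samples near the genuine stop sign examples changes what the model predicts for stop signs.

```python
from collections import Counter


def knn_predict(training, x, k=3):
    """Predict a label by majority vote among the k training samples nearest to x.

    training is a list of (feature, label) pairs with a single numeric feature.
    """
    nearest = sorted(training, key=lambda sample: abs(sample[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical clean training set: stop signs cluster near 1.0, yield signs near 3.0.
clean = [
    (0.9, "stop"), (1.0, "stop"), (1.1, "stop"),
    (2.9, "yield"), (3.0, "yield"), (3.1, "yield"),
]

# Poisoned samples: they look like stop signs but carry the wrong label.
poison = [(0.98, "yield"), (1.0, "yield"), (1.02, "yield")]
poisoned = clean + poison

stop_sign_feature = 1.0  # a new, genuine stop sign

print(knn_predict(clean, stop_sign_feature))
print(knn_predict(poisoned, stop_sign_feature))
```

Trained on the clean data, the model labels the new sign "stop"; trained on the poisoned data, it labels the same sign "yield", because the injected samples outvote the genuine ones in that neighborhood. The yield sign predictions are unaffected, which is part of what can make this kind of tampering hard to notice.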