
Artificial intelligence and data privacy

Turning a risk into a benefit…

One of the most important reasons businesses, especially consumer-facing businesses, want large amounts of data is to know as much as possible about the market: us. Artificial intelligence (AI) has made that focus on customers increasingly accurate. As business has become more invasive, governments have begun to examine the issue and pass regulations that set some limits. Privacy matters to the electorate, and smart businesses look at how to use data to gain insight while remaining compliant with regulations.

Almost ten years ago, Target created an algorithm that figured out whether people were pregnant based on their purchase patterns, and the company then sent coupons to those customers' addresses. That kind of predictive action proved problematic, most notably when a young woman hadn't yet told her father she was pregnant and the mailed coupons informed him instead. The choice to send information is an ethical issue, one that business often handles badly. The more important question, however, is what businesses can know about individuals. The EU's General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) are just the beginning. Neither is perfect, and both will evolve, but privacy laws will continue to expand. While each business must decide what to do with legally obtained data according to its own corporate ethics, all must pay attention to legal compliance.

The medical sector is one of the most important areas where up-front privacy matters. It matters especially in the US, one of the few developed nations without universal healthcare and one where protections for people with pre-existing conditions remain fragile. HIPAA (the Health Insurance Portability and Accountability Act) was passed in 1996, and it is the US foundation for defining which personally identifiable health information (PII) must be protected and for limiting how that PII can be shared. While the law focuses on how a company can share data with other firms in the medical industry and on who within a company can see the information, IT and development teams have often decided they are not covered by it. Developers claim they need all the data to build accurate systems.

Over the last few decades, that claim has driven an evolving set of techniques for manipulating data so that PII is protected while the data remains statistically valid for analysis. These techniques keep growing more complex as they balance privacy against analytical value, and they are, fortunately, beyond the scope of this article.

Other sectors, such as finance, life sciences, and government, have similar needs to protect PII as information is passed around within organizations and between companies and governments.

AI has made the challenge both more addressable and more of a risk. The ability to train a deep learning (DL) system on large amounts of data has increased the speed of analysis and results, but the appetite for ever more data increases the risk to privacy. Software can again help, by providing processes that handle that challenge in a reasonable time frame.

Legal and compliance, meet the developers

Most mid-sized and large companies have legal and compliance teams to manage regulatory and contractual risk, and those teams are also involved in privacy issues. It's no surprise that they speak a different language than developers do. As with German and Spanish, neither language is intrinsically better; they are simply different. How can we translate between them more effectively?

Involving legal and compliance teams also raises the importance of the CxO suite. While every major system has some involvement at that level, privacy requires heavy involvement for three primary reasons. First, multiple major groups are involved in deciding on and using privacy systems, and one of the main purposes of the CxO level is to balance those competing needs and demands. Second, privacy and compliance, as mentioned above, are a direct reflection of a company's ethics and business practices, and those are set at the CxO level. Third, privacy issues are a major business risk. The lack of a strong privacy policy and process can be costly, both financially and in reputation, and can even kill a company.

In the fast-moving world of modern data and business, translating between these groups is critical. Privacy isn't a technical issue or a legal issue; it's a business issue. "Today legal and compliance teams manually manage the increasing complexity of data privacy regulations. Time spent managing this extensive process will create a huge bottleneck for businesses to get access to data," said Che Wijesinghe, CEO, Cape Privacy. "There is a technology gap for lawyers to be able to define and manage privacy policy efficiently. Cape provides an easy to use method to implement advanced privacy techniques for secure and trusted data sharing."

Cape Privacy describes its focus as providing software that integrates into a company's existing data science and machine learning infrastructure, enabling all parties to work together on projects and policies. That is true startup speak; the initial focus is on implementing privacy policies at the development level, helping programmers oversee privacy policies in the data and between applications. While its link to the legal and compliance groups is more rudimentary, the company clearly intends to improve that collaboration as the solution grows.
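To make the idea of a privacy policy implemented "at the development level" a little more concrete, here is a minimal Python sketch, assuming a policy written as plain data that a compliance team could read and a developer could apply. It is not Cape Privacy's API or Cape Python; the field names, the rule names ("drop", "tokenize", "bucket", "keep"), the key handling and the helper functions are all hypothetical, chosen only for illustration.

```python
import hashlib
import hmac

# Hypothetical policy, written as plain data so that legal/compliance staff can
# review it and developers can apply it. Field and rule names are illustrative.
POLICY = {
    "name": "drop",        # remove direct identifiers entirely
    "ssn": "tokenize",     # replace with a consistent keyed token
    "age": "bucket",       # generalize exact values into ranges
    "diagnosis": "keep",   # needed for analysis, passed through unchanged
}

# Assumption: in practice the key would come from a secrets manager, not source code.
SECRET_KEY = b"replace-with-a-managed-secret"


def tokenize(value: str, key: bytes = SECRET_KEY) -> str:
    """Keyed hash (HMAC-SHA256): the same input always yields the same token,
    so records can still be joined, but the raw value is not exposed.
    Truncated to 16 hex characters for readability."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def bucket_age(age: int, width: int = 10) -> str:
    """Generalize an exact age into a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def apply_policy(record: dict, policy: dict) -> dict:
    """Return a new record transformed according to the policy.
    Fields not listed in the policy are dropped (default deny)."""
    out = {}
    for field, value in record.items():
        rule = policy.get(field, "drop")
        if rule == "keep":
            out[field] = value
        elif rule == "tokenize":
            out[field] = tokenize(str(value))
        elif rule == "bucket":
            out[field] = bucket_age(int(value))
        # "drop", or any rule this sketch doesn't recognize, omits the field
    return out


if __name__ == "__main__":
    patient = {"name": "Jane Doe", "ssn": "123-45-6789", "age": 34,
               "diagnosis": "J45", "zip": "94107"}
    # Keeps only ssn (as a token), age (as "30-39") and diagnosis.
    print(apply_policy(patient, POLICY))
```

The default-deny choice means a new column added upstream never leaks by accident; someone has to add it to the policy first. Note that keyed tokenization is pseudonymization, not anonymization: whoever holds the key can still link records, so under the GDPR such output generally remains personal data.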

Near-open systems matter

One key issue in developing such a system is what type to build. Across software, the cloud and the increased capacity for interoperation have meant stronger APIs and applications built from open components. Yet purely open systems won't work here any more than they have in the past. Unix didn't leave academia until HP, IBM, and Sun bundled their versions of "open" with services and unique updates.

Companies such as Cape Privacy are leveraging open components, such as TensorFlow, to help developers. They've even created Cape Python, a Python library of privacy tools, to build on developers' existing knowledge and enhance productivity. For you techies, much of what they provide is available on GitHub. "While open source had benefits across industries, it is critical for providing trust in privacy solutions," said Katharine Jarmul, Head of Product, Cape Privacy. "The ability for regulatory agencies to view the code base and ensure the appropriate tools, and only the appropriate tools, are in use addresses legal and compliance concerns."

At a higher level, they work to keep development open so that it runs on multiple cloud infrastructures, giving companies confidence that their work is portable.

That emphasis on openness is also why deep learning is not yet part of the solution. DL layers still lack the transparency needed to earn the trust that privacy concerns demand. Instead, these systems aim to help manage information privacy for machine learning applications.

Artificial intelligence applications are not open and can put privacy at risk. Adding good tools to protect the privacy of data used by AI systems is an important early step toward building trust into the AI equation.

David A. Teich