Known security risks exacerbated by AI
As part of our AI auditing framework blog series, Reuben Binns, our Research Fellow in
Artificial Intelligence (AI), Peter Brown, Technology Policy Group Manager, and
Valeria Gallo, Technology Policy Adviser, look at how AI can exacerbate known
security risks and make them more difficult to manage.
This blog forms part of our ongoing consultation on developing the ICO framework for
auditing AI.
Personal data must always be processed in a manner that ensures appropriate levels of security
against unauthorised processing, accidental loss, destruction or damage.
There is no “one-size-fits-all” approach to security. The appropriate security measures
organisations should adopt depend on the level and type of risks that arise
from specific processing activities. Using AI to process any personal data will
have important implications for an organisation’s security risk profile, which need
to be assessed and managed carefully.
Some implications may be triggered by the introduction of new types of risks, eg adversarial attacks
on machine learning models, which we will examine in future blogs. In this post
we will focus on the way AI can adversely affect security by making known risks
worse and more challenging to control.
Information security is a key component of our AI Auditing Framework, but is also central
to our work as the information rights regulator. The ICO is planning to expand
its general security guidance to take into account the additional requirements
set out in the new General Data Protection Regulation (GDPR). While this
guidance will not be AI-specific, it will cover a range of topics that are
relevant for organisations using AI, including software supply chain security
and increasing use of open-source software.
We are keen to hear your views on this topic so we can integrate them into both the
framework and the guidance. We encourage you to use the comments section below,
or to email us, to share your thoughts on AI-related security
challenges, best practices, and any additional guidance you would like the ICO
to issue.
Security in AI vs. traditional technologies
Some of the unique characteristics of AI mean compliance with security requirements
can be more challenging than with more established technologies, both from a
technological and human perspective.
From a technological perspective, AI systems introduce new kinds of complexity not
found in the IT systems most organisations will have dealt with previously.
They are also likely to rely heavily on third party code or relationships, and
will need to be integrated with several other new and existing IT components, which
are also intricately connected. This complexity may make it more difficult to
identify and manage some security risks, and may increase others, such as the
risk of outages.
From a human perspective, the people involved in building and deploying AI systems
are likely to have a wider range of backgrounds than usual, including traditional
software engineers, systems administrators, data scientists and statisticians,
as well as domain experts. Security practices and expectations may vary
significantly, and for some there may be less understanding of broader security
compliance requirements. Security of personal data may not always have been a
key priority, especially if someone was previously building AI applications
with non-personal data or in a research capacity.
Common practices for processing data securely in data science and AI engineering
are still developing, which complicates matters further.
It is not possible to list all known security risks that might be exacerbated when
AI is used to process personal data. The impact of AI on security will depend
on the way the technology is built and deployed, the complexity of the
organisation, and the strength and maturity of the existing risk management
capabilities.
The following hypothetical scenario should raise awareness of some of the known
security risks that AI can exacerbate and some of the challenges involved.
Our key message for organisations is: review
risk management practices to ensure personal data is secure in an AI context.
Hypothetical scenario: AI in recruitment
A recruitment firm decides to use an AI system based on machine learning (ML) to match
CVs to job descriptions automatically, rather than through a manual review. The
AI system will select the best candidates to be forwarded to potential employers
for consideration. To make a recommendation, the AI system will process the job
descriptions, personal data provided by the candidates themselves, and data provided
by the employers about previous hiring decisions for similar roles.
Risk example #1 – Losing track of training data
ML systems require large sets of training and testing data to be shared. In the
example above, for the AI system to be effective, employers will need to share
data about similar previous hiring decisions (e.g. sales manager) with the
recruitment firm.
While some sharing of personal data (e.g. candidates’ CVs) would have taken place while
the CV scanning process was manual, it did not involve the transfer of large
quantities of personal data between the employers and the recruitment firm.
Leaving aside questions about the legal basis for the processing, sharing this additional data
could involve creating multiple copies, in different formats and stored in
different locations (see below), which raises important security and
information governance considerations:
- The employer may need to copy HR and recruitment data into a separate database system to interrogate and select the data relevant to the vacancies the recruitment firm is working on.
- The selected data subsets will need to be saved and exported into files, and then transferred to the recruitment firm in compressed form.
- Upon receipt, the recruitment firm could upload the files to a remote location, eg the cloud.
- Once in the cloud, the files may be loaded into a programming environment to be cleaned and used in building the AI system.
- Once ready, the data is likely to be saved into a new file to be used at a later time.
For both the recruitment firm and employers, this will increase the risk of a data breach, including
unauthorised processing, loss, destruction and damage.
All training data will need to be shared, managed, and when necessary deleted in
line with security policies. While many recruitment firms will already have
information governance and security policies in place, these may no longer be fit-for-purpose
once AI is adopted, and should be reviewed and, if necessary, updated.
Technical teams should record and document all movements and storage of personal data from
one location to another. This will help organisations apply the appropriate security
risk controls and monitor their effectiveness. Clear audit trails are also necessary
to satisfy accountability and documentation requirements.
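As a rough illustration of what such an audit trail could look like in practice, the sketch below copies a file and appends one record per movement to a log. It is a minimal example under stated assumptions, not a prescribed ICO method: the function names `copy_with_audit` and `sha256_of`, the JSON-lines log format and the `purpose` field are all hypothetical choices made for this example.

```python
import hashlib
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_audit(source: Path, destination: Path, purpose: str, log_path: Path) -> dict:
    """Copy a file and append one JSON line describing the movement (hypothetical format)."""
    shutil.copy2(source, destination)
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "source": str(source),
        "destination": str(destination),
        "purpose": purpose,
        # The digest lets a later review confirm which copy is which.
        "sha256": sha256_of(destination),
    }
    with log_path.open("a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry
```

A log of this kind only helps if every copy goes through it, so in a real deployment the equivalent step would need to sit inside whatever tooling teams actually use to move data.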
In addition, any intermediate files containing personal data, eg compressed versions
of files created to transfer data between systems, should be deleted as soon as
they are no longer required.
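One way to make this deletion automatic rather than a manual step is to create intermediate files inside a temporary directory that is removed as soon as the transfer finishes. The sketch below assumes a hypothetical `send` callable that uploads or transmits the compressed file; the function name `transfer_compressed` is also an assumption for illustration.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def transfer_compressed(source: Path, send) -> None:
    """Compress `source` into a temporary directory, hand it to `send`, then clean up."""
    with tempfile.TemporaryDirectory() as workdir:
        archive = Path(workdir) / (source.name + ".gz")
        with source.open("rb") as raw, gzip.open(archive, "wb") as packed:
            shutil.copyfileobj(raw, packed)
        send(archive)  # hypothetical callable that uploads or transmits the file
    # Leaving the `with` block removes the directory and the archive,
    # even if send() raised an exception.
```

Note that this removes the file from the file system in the ordinary way; it is not a secure erasure of the underlying storage, which may need separate consideration.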
Depending on the likelihood and severity of the risk to data subjects, organisations may
also need to apply de-identification techniques to training data before it is
extracted from its source and shared internally or externally.
For example, the employers may need to remove certain features from their HR data,
or apply privacy enhancing technologies (PETs) like differential privacy,
before sharing it with the recruitment firm.
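To make the idea of removing features concrete, here is a minimal sketch that drops direct identifiers from a record and replaces the record ID with a keyed hash before the data leaves the employer. The field names, the `deidentify` function and the choice of HMAC-SHA256 are assumptions for illustration only. This is pseudonymisation rather than anonymisation or differential privacy: the remaining fields may still allow individuals to be identified.

```python
import hashlib
import hmac

# Hypothetical field names; a real HR extract would need its own review.
DIRECT_IDENTIFIERS = {"name", "email", "phone", "address"}


def deidentify(record: dict, secret_key: bytes, id_field: str = "candidate_id") -> dict:
    """Drop direct identifiers and replace the record ID with a keyed hash."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if id_field in cleaned:
        token = hmac.new(secret_key, str(cleaned[id_field]).encode("utf-8"), hashlib.sha256)
        cleaned[id_field] = token.hexdigest()
    return cleaned
```

Using a keyed hash means the employer, who holds the key, can still link results back to individuals if needed, while the recipient cannot simply recompute the mapping from known IDs.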
For more on these techniques, see our Anonymisation
Code of Practice and future blog posts on data minimisation. New guidance on anonymisation will
also be published soon.
Risk example #2 – Security risks introduced by externally maintained software used to build AI systems
Very few organisations build AI systems entirely in-house. In most cases, the design,
building, and running of AI systems will be provided, at least in part, by
third parties that the organisation may not always have a contractual
relationship with.
Even if an organisation hires its own ML engineers, they may still rely
significantly on third-party frameworks and code libraries. In fact, many of
the most popular ML development frameworks are open source.
Using third-party and open source code is a valid option. Developing all software components
of an AI system from scratch requires a large investment of time and resources
that many organisations cannot afford and, unlike open source tools, software built
entirely in-house would not benefit from the rich ecosystem of contributors and services
built up around existing frameworks.
However, one important drawback is that these standard ML frameworks often depend on
other pieces of software being already installed on an IT system. To give a
sense of the risks involved, a recent study found the most
popular ML development frameworks include up to 887,000 lines of code and rely
on 137 external dependencies. Therefore, implementing AI will require changes to
an organisation’s software stack (and possibly hardware) that may introduce additional
security risks.
For example, let’s say the recruitment firm above hired an ML engineer to build the
automated CV filtering system using a Python-based ML framework. The ML framework
depends on a number of specialist open-source programming libraries, which needed
to be downloaded on the firm’s IT system.
One of these libraries contains a software function to convert the raw training
data into the format required to train the ML model. It is later discovered the
function has a security vulnerability. Due to an unsafe default configuration,
an attacker was able to introduce and execute malicious code remotely on the system by
disguising it as training data.
This is not a far-fetched example. In January 2019, such a vulnerability
was discovered in ‘NumPy’, a popular library for the Python programming
language used by many machine learning developers.
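The sketch below illustrates the general class of problem, using only Python's standard `pickle` module rather than NumPy itself. The NumPy issue is generally described as unsafe deserialisation of this kind, and `numpy.load` has an `allow_pickle` parameter that controls it, but the code here is a simplified stand-in and not a reproduction of that vulnerability. The `Payload` and `DataOnlyUnpickler` classes and the `load_data_only` function are hypothetical names; the payload deliberately does something harmless.

```python
import io
import pickle


class Payload:
    """Stands in for a booby-trapped 'training data' file."""

    def __reduce__(self):
        # On unpickling, pickle calls eval("6 * 7").
        # A real attacker would choose something harmful instead.
        return (eval, ("6 * 7",))


class DataOnlyUnpickler(pickle.Unpickler):
    """Refuses to resolve any class or function named in the pickle stream."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"blocked reference to {module}.{name}")


def load_data_only(blob: bytes):
    """Unpickle plain built-in containers and scalars; reject everything else."""
    return DataOnlyUnpickler(io.BytesIO(blob)).load()


malicious = pickle.dumps(Payload())
benign = pickle.dumps({"skills": ["python", "sql"], "years": 5})
```

Calling `pickle.loads(malicious)` runs the embedded expression, whereas `load_data_only(malicious)` raises `pickle.UnpicklingError` and `load_data_only(benign)` returns the dictionary unchanged. Where possible, a data-only format such as CSV or JSON for untrusted input avoids the problem altogether.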
Whether AI systems are built in-house, externally, or a combination of both, they will
need to be assessed for security risks. As well as ensuring the security of any
code developed in-house, organisations need to assess the security of any externally
maintained code and frameworks.
The ICO has already produced some guidance
on managing security of internal and external code in the related context of
online services. This includes external code security measures, such as subscribing
to security advisories to be notified of vulnerabilities, and internal code
security measures, such as coding standards and source code review. The same or
similar measures will apply to AI applications. However, as we mentioned at the
beginning, the ICO is developing further security guidance, which will include
additional recommendations for the oversight and review of externally
maintained source code, as well as its implications for security and data protection
by design.
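As a simple illustration of acting on security advisories, the sketch below lists the Python packages installed in an environment and flags any whose version appears in an advisory list. The advisory data structure (package name mapped to a set of affected version strings) and the function names are assumptions made for this example; a real process would draw on an actual advisory feed and handle version ranges rather than exact matches.

```python
from importlib import metadata


def installed_packages() -> dict:
    """Map each installed distribution name (lower-cased) to its version string."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            found[name.lower()] = dist.version
    return found


def flag_vulnerable(installed: dict, advisories: dict) -> list:
    """Return sorted (name, version) pairs whose exact version is listed in an advisory."""
    return sorted(
        (name, version)
        for name, version in installed.items()
        if version in advisories.get(name, set())
    )


# Hypothetical advisory data for illustration only.
EXAMPLE_ADVISORIES = {"examplelib": {"1.0.0", "1.0.1"}}
```

For instance, `flag_vulnerable(installed_packages(), EXAMPLE_ADVISORIES)` would return an empty list unless a package called `examplelib` at one of those versions happened to be installed.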
In addition, organisations developing ML systems can further mitigate the security
risks associated with third-party code by separating the ML development
environment from the rest of their IT
infrastructure where possible.
Two ways to achieve this are:
- Use ‘virtual machines’ or ‘containers’: emulations of a computer system that run inside, but isolated from, the rest of the IT system. These can be pre-configured specifically for ML tasks. In our recruitment example, if the ML engineer had used a virtual machine, then the vulnerability could have been contained.
- Many ML systems are developed using programming languages that are well-developed for scientific and machine learning uses, like Python, but are not necessarily the most secure. However, it is possible to train an ML model using one programming language (eg Python) but then, before deployment, convert the model into another language (eg Java) that makes insecure coding less likely. To return to our recruitment example, another way the ML engineer could have mitigated the risk of a malicious attack on the CV filtering model would have been to convert the model into a different programming language prior to deployment.
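The post does not say how such a conversion would be carried out. One possible reading, sketched below for a simple linear model, is to export the learned parameters as plain data so that a deployment service written in another language only has to re-implement the scoring arithmetic and never loads Python objects. The model form, feature names and function names are all assumptions made for this example.

```python
import json
import math


def export_model(weights: dict, bias: float) -> str:
    """Serialise a linear model's parameters as plain JSON (data only, no code)."""
    return json.dumps({"weights": weights, "bias": bias}, sort_keys=True)


def score(model_json: str, features: dict) -> float:
    """Reference scorer; a service in another language would repeat this arithmetic."""
    model = json.loads(model_json)
    z = model["bias"] + sum(
        weight * features.get(name, 0.0) for name, weight in model["weights"].items()
    )
    # Logistic function maps the linear score to a value between 0 and 1.
    return 1.0 / (1.0 + math.exp(-z))
```

The Python scorer here serves only as a reference against which the re-implemented version can be checked on the same inputs.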
We would like to hear your views on this topic and welcome any feedback on our current thinking. In particular, we
would appreciate your insights on the following questions:
- How and to what degree are organisations currently inspecting externally maintained software code for potential vulnerabilities?
- Are there any other well-known security risks which AI is likely to exacerbate? If so, which ones and what effect will AI have?
- What should any additional ICO security guidance cover?
Dr Reuben Binns, a researcher working on AI and data protection, joined the ICO on a fixed term fellowship in December 2018. During his two-year term, Reuben will research and investigate a framework for auditing algorithms and conduct further in-depth research activities in AI and machine learning.
Valeria Gallo is currently seconded to the ICO as a Technology Policy Adviser. She works with Reuben Binns, our Artificial Intelligence (AI) Research Fellow, on the development of the ICO Auditing Framework for AI. Prior to her secondment, Valeria was responsible for analysing and developing thought leadership on the impact of technological innovation on regulation and supervision of financial services firms.
Peter Brown, Group Manager (Technology Policy), leads a team of specialists providing technology expertise to all ICO departments, supporting the ICO’s broad range of activities and working towards the goals of the ICO’s Technology Strategy.