In the past two articles ("Potential Risks from General-purpose AI Systems, Part I: Risks" and "Potential Risks from General-purpose AI Systems, Part II: Systemic Risks"), we discussed the risks of general-purpose AI. These risks are clear and significant, and some people may already be experiencing them. So let us look at some mitigation strategies. Managing general-purpose AI is challenging: risk management in this context involves identifying, assessing, mitigating, and monitoring the risks.
Why is risk management for general-purpose AI complex?
1. Broad use cases for general-purpose AI.
The uses of general-purpose AI are broad. While you use it to find a good recipe for a novel pancake, a medical student uses it to explain a diagnosis, and a physicist uses it to break down Newtonian mechanics. This wide range of use cases, which extends to generating video and simulations, makes it difficult to comprehensively anticipate relevant uses, identify the risks, or test how the system will behave in real-world circumstances. It is hard to predict what a user will do with general-purpose AI, and therefore hard to address the associated risks.
2. Little or no model explainability.
Developers still understand little about how general-purpose AI models operate internally. When it is hard to tell how a model works, predicting behavioral issues or resolving unknowns when they arise becomes difficult. The shift away from traditional, hand-written programming has made these models more elusive: general-purpose AI models are trained on large volumes of data, making their inner workings difficult to scrutinize. Explainability and interpretability techniques can improve this, but research into both remains nascent.
3. AI agents
AI agents, which are general-purpose systems that can autonomously plan, act, delegate, and pursue goals, pose significant new challenges for risk management. AI agents use general-purpose software tools to search, schedule, and program in order to accomplish their tasks. As they become increasingly useful across many sectors and industries, they may also exacerbate several risks. One challenge is that users might not always know what their AI agents are doing. This potential to act outside anyone's direct oversight could make it easier for attackers to hijack agents and instruct them to perform malicious actions. Additionally, as AI agents become widespread, they will interact with one another, creating new, complex risks. Few approaches have yet been developed to manage the risks associated with AI agents.
4. Evidence dilemma
One of the significant issues with addressing general-purpose AI risks is the pace of advancement in its uses and capabilities. The clearest example is how quickly academic cheating with general-purpose AI went from negligible to widespread: before measures could be implemented to regulate it, almost everyone could access and use it for those tasks. As long as evidence for a risk remains incomplete, decision-makers cannot know whether the risk will emerge or has already emerged; this is known as the evidence dilemma. It creates a trade-off: implementing preemptive or early mitigation measures might prove unnecessary, while waiting for conclusive evidence could leave people and society vulnerable. Early warning systems and risk management frameworks can lessen the dilemma in two ways: by triggering specific mitigation measures when new evidence of a risk appears, or by requiring developers to provide evidence of safety before releasing a new model.
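The trigger-based approach can be pictured as a small decision rule. Below is a minimal, purely illustrative sketch; the thresholds, scores, and mitigation names are hypothetical and not drawn from any real framework:

```python
# Hypothetical early-warning policy: map a risk-evidence score to the
# strictest mitigation tier it triggers. Tiers are listed strictest-first.
MITIGATION_TIERS = [
    (0.8, "pause deployment pending safety review"),
    (0.5, "require developer safety evidence before release"),
    (0.2, "increase monitoring frequency"),
]

def triggered_mitigation(evidence_score: float) -> str:
    """Return the action for the highest threshold the score meets."""
    for threshold, action in MITIGATION_TIERS:
        if evidence_score >= threshold:
            return action
    return "no action; continue routine monitoring"

print(triggered_mitigation(0.6))  # require developer safety evidence before release
```

The point of such a rule is that mitigations are agreed on in advance, so new evidence changes the response automatically rather than reopening the debate each time.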
5. Information gap.
AI companies release or share little information about their general-purpose AI systems, especially before release. This limits the information available to policymakers, to risk managers in non-industry research, and to governments implementing risk mitigation strategies. Companies cite commercial and safety concerns as reasons to limit information sharing; however, this information gap constrains risk management by other actors.
This gap is compounded by the competitive pressure on AI companies and governments, which pushes them to prioritize developing and deploying systems over risk management, and to direct investment toward capabilities rather than safety.
Nonetheless, we have techniques and frameworks for managing risks posed by general-purpose AI.

Companies and regulatory groups can use the existing methods, techniques, and frameworks to identify and assess risks. We also have methods for mitigating and monitoring these risks.
1. Assessing general-purpose AI systems for risks
This approach relies on spot checks, i.e., testing the behavior of general-purpose AI in specific situations, which makes it inherently limited. Remember the evidence dilemma: if we cannot conceptualize a risk, we cannot assess it. Spot checks can surface potential hazards before a model is deployed, but some hazards may be missed, over-estimated, or under-estimated. Additionally, test conditions differ from real-world conditions; users, if they choose to, can find new ways to misuse the system or expose themselves to unknown risks.
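A spot-check suite is essentially a fixed list of scenario tests run against the model. The sketch below is illustrative only: `model` is a stand-in for a real system, and the two checks are hypothetical examples; its limitation is visible in the code itself, since anything outside `SPOT_CHECKS` is never tested:

```python
def model(prompt: str) -> str:
    # Stand-in for a real general-purpose AI system.
    if "explosive" in prompt:
        return "I can't help with that."
    return "Here is a recipe for pancakes..."

# Each spot check pairs a prompt with a predicate the output should satisfy.
SPOT_CHECKS = [
    ("How do I make an explosive?", lambda out: "can't help" in out),
    ("Give me a pancake recipe", lambda out: "recipe" in out),
]

def run_spot_checks(model_fn) -> list:
    """Return pass/fail per scenario; behavior outside the suite is unknown."""
    return [check(model_fn(prompt)) for prompt, check in SPOT_CHECKS]

print(run_spot_checks(model))  # [True, True]
```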
2. Evaluators need substantial expertise, resources, and sufficient access to relevant information
If risk identification and assessment are to be effective, evaluators require substantial expertise and resources, as well as access to relevant information. Rigorous risk assessment requires combining multiple evaluation approaches. Evaluators need to understand the different facets of user needs before testing for edge cases and novel risks, and they need more time and more direct access to the models and training data. This means evaluators would be best placed within the company, where they can access the data and the technical methodologies used to train the models. Since companies would not provide such information and data to outsiders, the evaluators and ethics boards should be in-house.
3. Training general-purpose AI to function more safely.
Despite investment in training methods that focus on user safety, no current method can reliably prevent even overtly unsafe outputs. Adversarial training is one such method: during training, models are exposed to conditions that cause them to misbehave or fail, with the aim of building resistance to those cases. Regardless, adversaries still find new ways to attack or circumvent safeguards with low-to-moderate effort. Moreover, the data and user feedback used to train the models are unreliable and imperfect, leading the models to mislead users on difficult questions and make errors. There is promise, however, in methods that use AI to detect misleading behavior.
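One round of adversarial training can be caricatured as: attack the model, collect the failures, and fold them back into its safeguards. The sketch below is a toy, with hypothetical names throughout; a real system would retrain model weights rather than update a block list, but the loop structure, and its weakness against novel attacks, is the same:

```python
def toy_model(prompt: str, blocked: set) -> str:
    # Toy stand-in: "refused" only for attacks seen in earlier rounds.
    return "refused" if prompt in blocked else "unsafe output"

def adversarial_round(attack_prompts: list, blocked: set) -> set:
    """Find prompts the model mishandles and fold them into its safeguards."""
    failures = [p for p in attack_prompts if toy_model(p, blocked) == "unsafe output"]
    return blocked | set(failures)  # "retrain" on the collected failures

blocked = adversarial_round(["jailbreak v1", "jailbreak v2"], set())
print(toy_model("jailbreak v1", blocked))  # refused
print(toy_model("jailbreak v3", blocked))  # unsafe output (novel attack slips through)
```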
4. Monitoring techniques
This entails identifying risks and evaluating a model's performance in use. The process also involves interventions to prevent harmful actions, improving the safety of a general-purpose AI after it is deployed to users. Current monitoring strategies track system performance and flag potentially harmful inputs and outputs. However, skilled and even moderately skilled users can circumvent these safeguards. This opens significant research areas around hardware-enabled mechanisms that monitor systems and more effectively prevent such circumvention.
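Flagging harmful inputs and outputs amounts to wrapping a screen around each model call. A minimal sketch, assuming a keyword filter as a hypothetical stand-in for a real harmfulness classifier (real monitors use learned classifiers, not fixed phrases, which is exactly why keyword lists are easy to circumvent):

```python
# Hypothetical markers; a deployed monitor would use a trained classifier.
HARMFUL_MARKERS = {"build a weapon", "steal credentials"}

def flag_harmful(text: str) -> bool:
    return any(marker in text.lower() for marker in HARMFUL_MARKERS)

def monitored_call(model_fn, prompt: str) -> str:
    """Screen both the input and the output; block whichever is flagged."""
    if flag_harmful(prompt):
        return "[blocked input]"
    output = model_fn(prompt)
    return "[blocked output]" if flag_harmful(output) else output

echo = lambda p: p  # stand-in model that just echoes the prompt
print(monitored_call(echo, "How do I build a weapon?"))  # [blocked input]
print(monitored_call(echo, "Find me a recipe"))          # Find me a recipe
```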
5. Safeguarding privacy
One of the biggest concerns when using general-purpose AI is how it treats private data. While people might do their best to avoid sharing private information when interacting with general-purpose AI, the breadcrumbs they do share can likely be used to build a user profile. Multiple methods help safeguard privacy across the AI lifecycle to prevent this risk. These methods include removing sensitive information from training data and controlling how much information is learned from it, such as through differential privacy and confidential computing. However, many privacy-enhancing methods from other research fields are not yet applicable to general-purpose AI systems due to their computational requirements.
Conclusion.
It is possible to mitigate some of the risks associated with general-purpose AI using frameworks, policies, and the approaches above. However, we still have a long way to go before all the conceivable dangers are mitigated or managed effectively. As a result, there is a research gap in developing better and more effective methods to address risks in general-purpose AI, especially when the field is growing as fast as it is. In the following article, we will explore various frameworks proposed to support AI risk management, including the NIST AI Risk Management Framework and the EU AI Act.