OpenAIs AI Model: Enhanced Trustworthiness, Increased Vulnerability
Image Source:
AI + Machine Learning
Oct 18, 2023 9:00 AM

OpenAIs AI Model: Enhanced Trustworthiness, Increased Vulnerability

by HubSite 365 about Microsoft

Software Development Redmond, Washington

External Blog Post
Pro User

AI + Machine Learning

Microsoft backs research revealing OpenAIs GPT-4 as a more trustworthy yet easier-to-trick AI model.

OpenAI's leading AI model, backed by Microsoft, has seen developments in its trustworthiness but also susceptibility to manipulation according to fresh research. The AI model known as GPT-4 received a better trustworthiness index than its predecessor due to improved privacy protection and lesser toxic outcomes. The model outshone in safeguarding against skewed attacks. However, the findings also discovered a heightened vulnerability to user manipulation leading to potential privacy breaches.

The research consortium, comprising experts from Stanford University, University of Illinois Urbana-Champaign, Center for AI Safety, University of California, Berkeley, and Microsoft Research, emphasized that the vulnerabilities are not present in consumer-focused GPT-4-based applications, which form a substantial part of Microsoft’s present product suite. The research further clarified that delivered AI applications employ several mitigation methods to prevent potential damages occurring at the model tier of the technology.

The researchers gauged the model’s trustworthiness on multiple facets including privacy, stereotypes, machine ethics, toxicity, robustness against adversarial tests, and fairness. These tests involved the application of standard prompts on GPT-3.5 and GPT-4, followed by prompts designed to explore potential biases and breaches in content policy. The final step included attempts to trick the models into ignoring security measures. According to the research team, the aim is to propel the research community to utilize, develop, and potentially avert malicious actions.

In an important communication, the OpenAI team got informed about the research.

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

Microsoft Research Blog

AI-based models indeed represent a significant innovation

AI-based models indeed represent a significant innovation that promises to reshape the future of technology. However, the areas of privacy and security stay central to the utilization and widespread acceptance of such models. It is, thus, important to consistently evaluate these technologies on parameters of trustworthiness, feasibility, and security. The latest work on OpenAI's AI model, GPT-4, sheds light on these aspects, highlighting not only its increased trustworthiness over its predecessor GPT-3.5 but also revealing newfound vulnerabilities, emphasizing the importance of the ongoing evolution of AI technologies.

Considering the adaptable nature of AI, mitigation strategies to guard against potential threats are indispensably integral to the development process. To ensure safe deployment of AI, active collaboration within the research community is decisive. This can help preempt harmful actions and contribute to constructing powerful, reliable models. The research demonstrates the necessity of comprehensive evaluations, continual vigilance, and coordinated efforts to foster advanced, yet responsible AI applications.

AI + Machine Learning - OpenAIs AI Model: Enhanced Trustworthiness, Increased Vulnerability

More trustworthy but easier to trick

The breakthrough AI model by OpenAI, GPT-4, has been lauded as more reliable, yet more vulnerable to manipulation than its earlier version, GPT-3.5. A study funded by Microsoft found that GPT-4 can safeguard private data, prevent biased results, and resist adversarial attacks more effectively. However, the model can also be misled into ignoring safety precautions and exposing personal data and conversation histories. The study discovered that users can circumvent protection around GPT-4, as the AI can easily be influenced by misleading commands.

The researchers tested these weaknesses with consumer-facing GPT-4-based products - the majority of Microsoft's current offerings. The products did not exhibit these vulnerabilities owing to the use of various strategies to prevent harm that could occur at the model level of the technology.

To gauge the trustworthiness of the AI model, several factors were evaluated, such as toxicity, stereotypes, privacy, machine ethics, fairness, and strength at resisting adversarial tests. The researchers employed multiple prompts, including words that could have been banned, to push the model to flout its content policy restrictions without manifestly discriminating against specific groups. The final test was to bait the models into disregarding the safeguards altogether.

The results of the research were shared with the OpenAI team, underlining that the trustworthiness assessment is only a starting point. The research team hopes to collaborate with other investigators to build on these findings and create stronger and more trustable models in the future. The benchmarks were also published publicly for others to replicate the research findings.

Artificial intelligence models like GPT-4, often undergo red teaming, a process in which developers test multiple prompts to verify if they produce undesired results. OpenAI's CEO, Sam Altman, admitted that despite these precautions, GPT-4 is still flawed and limited. The Federal Trade Commission is currently examining OpenAI for potential consumer harm, such as spreading false facts.

The research group from the University of Illinois Urbana-Champaign, Stanford University, University of California, Berkeley, AI safety center, and Microsoft Research created a thorough trustworthiness evaluation platform for large language models (LLMs). The recent paper, 'DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models', focuses on GPT-4 and GPT-3.5. The paper takes into account diverse factors like toxicity, stereotype bias, adversarial robustness, out-of-distribution robustness, robustness on adversarial demonstrations, privacy, machine ethics, and fairness.

The research documented some unidentified issues related to trustworthiness. For instance, it is possible to mislead GPT models into generating toxic and biased outputs and revealing private information. The study also found that, although GPT-4 is generally more trustable than GPT-3.5, it is more susceptible when presented with intentionally harmful prompts designed to bypass the security measures of LLMs.

The research offers a comprehensive trustworthiness evaluation of GPT models and highlights the gaps in trust issues. The team's benchmark is publicly available to encourage further research and to enable preemptive actions against exploitation by adversaries. The group has aimed to create a platform that others in the research community can build upon to create stronger and more reliable models for the future.

Following their research, the team has shared their findings with both OpenAI and product groups at Microsoft. It was confirmed that the potential weaknesses identified do not affect current customer services. This is attributed to the mitigation strategies that finished AI applications use to address potential harms at the model level of the technology. We expect continued collaboration in the research community to build upon these findings and make the models more trustable and powerful going forward.


OpenAI model, Trustworthy AI, AI trick, Advanced AI, OpenAI AI updates, AI deception, AI ethics, Reliable AI, Artificial Intelligence OpenAI, AI Improvement.