5 September 2024

Government test shows humans far outperforming AI because they understand the task better

By Chris Johnson

Artificial intelligence didn’t quite get the task assigned to it in a targeted government test, with humans doing a much better job.

A government trial of artificial intelligence has shown it performed quite poorly at summarising public submissions compared with humans doing the same work.

As the Federal Government continues to stress that its workforce must learn how to embrace the responsible use of AI, a targeted trial at the Australian Securities and Investments Commission (ASIC) has delivered results that are, to say the least, very interesting.

The summaries were marked by reviewers who were unaware AI had played any role in producing them, and the human work was scored roughly twice as highly as the AI-generated summaries.

Across all criteria, humans far outperformed AI.

A report of the test concluded that the use of AI could potentially create unnecessary workloads due to the greater need for fact-checking.

Amazon Web Services (AWS) conducted a test for ASIC to assess the capability of generative AI to summarise a sample of public submissions made to an external parliamentary joint committee inquiry.

Meta’s Llama2-70B model was prompted to focus on ASIC references and explain what they meant while summarising the submissions.

ASIC staff were given the same task, using identical directions and prompts.
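The report does not publish the exact prompts or tooling used, but a minimal sketch of the kind of summarisation call described above might look like the following, assuming access to Llama2-70B through Amazon Bedrock. The model identifier, region, prompt wording and request shape here are illustrative assumptions, not details from the PoC.

```python
# Hypothetical sketch only: the ASIC/AWS PoC's actual prompts and pipeline are not public.
import json
import boto3

# Region and model ID are assumptions for illustration.
bedrock = boto3.client("bedrock-runtime", region_name="ap-southeast-2")

PROMPT_TEMPLATE = (
    "Summarise the following public submission to the parliamentary inquiry. "
    "Focus on any references to ASIC and explain what each reference means.\n\n"
    "Submission:\n{submission}\n\nSummary:"
)

def summarise(submission_text: str) -> str:
    """Ask the model for a summary of one submission."""
    body = {
        "prompt": PROMPT_TEMPLATE.format(submission=submission_text),
        "max_gen_len": 512,
        "temperature": 0.2,
    }
    response = bedrock.invoke_model(
        modelId="meta.llama2-70b-chat-v1",  # assumed identifier for Llama2-70B
        body=json.dumps(body),
    )
    return json.loads(response["body"].read())["generation"]
```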

The reviewers found wrong information, lack of nuance, misplaced context and overlooked emphases in some of the summaries they were marking – which turned out to be AI-generated work.


The inquiry was looking into audit and consultancy firms, and the test that was conducted on submissions to it is what AWS called a Proof of Concept (PoC).

When ASIC officials appeared in May before the Senate Committee on Adopting Artificial Intelligence, they were asked about the trial and its results.

ASIC officials took a question on notice and, after being pressed by Greens Senator David Shoebridge and independent Senator David Pocock, committed to tabling the report.

With that report now delivered, AWS stressed the PoC was not used for any of ASIC’s regulatory work or business purposes.

The report does raise, however, a number of red flags.

“The objectives of the PoC were to explore and trial Gen AI technologies, to focus on measuring the quality of the generated output rather than the performance of the models and to understand the future potential for business use of Gen AI,” the report states.

“The final assessment results of the PoC showed that out of a maximum of 75 points, the aggregated human summaries scored 61 (81 per cent), and the aggregated Gen AI summaries scored 35 (47 per cent).

“Whilst the Gen AI summaries scored lower on all criteria, it is important to note the PoC tested the performance of one particular AI model (Llama2-70B) at one point in time.

“The PoC was also specific to one use case with prompts selected for this kind of inquiry.

“In the final assessment, ASIC assessors generally agreed that AI outputs could potentially create more work if used (in current state), due to the need to fact check outputs, or because the original source material actually presented information better.

“The assessments showed that one of the most significant issues with the model was its limited ability to pick up the nuance or context required to analyse submissions.”


In its key observations, the report noted that while a request to summarise a document appears straightforward to a human, AI had difficulty because the task consists of several different actions “depending on the specifics of the summarisation request”.

In the PoC, the summarisation work was achieved by a series of discrete tasks, with the selected AI model found to perform strongly with some actions and less capably with others.

The report found that generic prompting without specific directions or considerations resulted in lower-quality output compared to specific or targeted prompting.
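The report does not reproduce its prompts, but the contrast it describes might look something like the following hypothetical pair, with the targeted version spelling out the inquiry context and the points the summary should preserve.

```python
# Illustrative only: these are not the PoC's prompts, just an example of the
# "generic" versus "specific or targeted" prompting the report contrasts.
GENERIC_PROMPT = "Summarise this submission:\n\n{submission}"

TARGETED_PROMPT = (
    "Summarise this submission to the inquiry into audit and consultancy firms.\n"
    "- Identify every reference to ASIC and explain what it means in context.\n"
    "- Preserve the submitter's emphasis and any recommendations made.\n"
    "- Do not add information that is not in the submission.\n\n"
    "Submission:\n{submission}"
)
```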

And it suggests that things can only get better.

“An environment for rapid experimentation and iteration is necessary, as well as monitoring outcomes,” the report states.

“Technology is advancing rapidly in this area. More powerful and accurate models and Gen AI solutions are being continually released, with several promising models released during the period of the PoC. It is highly likely that future models will improve performance and accuracy of the results.

“The PoC provided valuable learnings, demonstrating the current capabilities of Llama2-70B as well as the potential for growth.

“Although there are opportunities for Gen AI, particularly as the technology continues to advance, this PoC also found limitations and challenges for adopting Gen AI for this specific use case.”

