Unlock the “predict the future” feature in ChatGPT with this handy trick

[New Wisdom Introduction] A new study exploits ChatGPT's training data cutoff of September 2021 to compare how well it predicts various 2022 events under two prompting methods: direct prediction and future narrative prediction. The results show that future narrative prompts perform well at predicting the 2022 Oscar winners, and that ChatGPT-4's predictions of macroeconomic variables also improve.

Today, AI is advancing faster than we understand its uses.

To keep ChatGPT from “getting out of control”, OpenAI has drawn up a strict set of terms of service covering areas such as law, medicine and health, personal safety, rights and welfare, gambling, and lending.

However, one thing remains unaffected: storytelling.

Recently, researchers at Baylor University have taken advantage of this feature and tried to use storytelling to unlock ChatGPT’s ability to “predict the future.”

Paper: https://arxiv.org/abs/2404.07396

The experiment asked ChatGPT to tell stories set in the future, in which characters, including authority figures, recount events that lie in their own past but in the model's future, that is, after its training data cutoff.

The researchers also varied seemingly small details of the narrative prompts, such as the identity of the speaker or the inclusion of information about political events in 2022, to explore which elements of a narrative prompt matter.

To build a distribution of answers, two research assistants each ran 50 queries per prompt using two separate ChatGPT accounts, giving 100 trials in total for each prompt.
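The repeated-trial setup can be pictured as a simple tally: send the same prompt many times and count how often each answer comes back. The following is only a minimal sketch of that idea, not the researchers' tooling. The names `run_trials`, `answer_shares`, and `make_stub` are hypothetical, and the stub replays canned answers in place of a real chat model.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, Dict, Sequence


def run_trials(ask: Callable[[str], str], prompt: str, n_trials: int = 100) -> Counter:
    """Send the same prompt n_trials times and count each distinct answer."""
    return Counter(ask(prompt) for _ in range(n_trials))


def answer_shares(counts: Counter) -> Dict[str, float]:
    """Convert raw answer counts into shares of all trials."""
    total = sum(counts.values())
    return {answer: count / total for answer, count in counts.items()}


def make_stub(answers: Sequence[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a chat model: replays canned answers in order."""
    replay = cycle(answers)
    return lambda prompt: next(replay)


if __name__ == "__main__":
    # The canned answers are placeholders, not results from the paper.
    stub = make_stub(["Will Smith", "Will Smith", "Will Smith", "no answer"])
    counts = run_trials(stub, "placeholder prompt", n_trials=100)
    print(answer_shares(counts))
```

In a real run, `ask` would wrap whatever interface is used to query the model; the tally logic stays the same.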

By comparing prompts that directly ask ChatGPT to predict the future with prompts that ask it to tell a story about the future, the researchers found that narrative prompts leverage the model's ability to construct hallucinatory narratives and can facilitate more effective data synthesis and extrapolation than direct predictions.

Oscar winner predictions

The conclusion first:

For Best Actor, Best Actress, and both Best Supporting Actor categories, narrative prompts predicted the winners with high accuracy, ranging from 42% (Best Actress, Chastain) to 100% (Best Actor, Will Smith).

In contrast, direct prompting performs very poorly, often worse than random guessing.

For example, in the case of Best Actor:

Prompt 3a (Direct)

Of the nominees listed below, which one do you think has the best chance of winning the 2022 Oscar for Best Actor? Consider the buzz surrounding the nominees as well as patterns from previous years when making your predictions:

Javier Bardem, Benedict Cumberbatch, Andrew Garfield, Will Smith, Denzel Washington.

Prompt 3b (Future Narrative)

Write a scene where a family is watching the 2022 Oscars. The host announces the following nominees for Best Actor:

Javier Bardem, Benedict Cumberbatch, Andrew Garfield, Will Smith, Denzel Washington.

Describe the scene until the host announces the winner.
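To make the contrast between the two prompt styles concrete, here is a minimal sketch that builds both from the same nominee list. The wording follows the prompts as quoted in this article; the function names `direct_prompt` and `narrative_prompt` are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def direct_prompt(category: str, nominees: Sequence[str], year: int = 2022) -> str:
    """Ask the model outright which nominee is most likely to win."""
    names = ", ".join(nominees)
    return (
        f"Of the nominees listed below, which one do you think has the best "
        f"chance of winning the {year} Oscar for {category}? Consider the buzz "
        f"surrounding the nominees as well as patterns from previous years "
        f"when making your prediction: {names}."
    )


def narrative_prompt(category: str, nominees: Sequence[str], year: int = 2022) -> str:
    """Ask for a story in which the winner is announced as part of the scene."""
    names = ", ".join(nominees)
    return (
        f"Write a scene where a family is watching the {year} Oscars. "
        f"The host announces the following nominees for {category}: {names}. "
        f"Describe the scene until the host announces the winner."
    )


BEST_ACTOR_NOMINEES = [
    "Javier Bardem",
    "Benedict Cumberbatch",
    "Andrew Garfield",
    "Will Smith",
    "Denzel Washington",
]

if __name__ == "__main__":
    print(direct_prompt("Best Actor", BEST_ACTOR_NOMINEES))
    print(narrative_prompt("Best Actor", BEST_ACTOR_NOMINEES))
```

Both prompts carry the same information; only the framing changes, from a request for a forecast to a request for a story.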

When prompted directly, ChatGPT-3.5 failed to make a correct prediction most of the time.

In 55% of trials it gave multiple answers, and in 28% it made no choice. When it did make a choice, it picked Will Smith 17% of the time.
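The three outcome types described above (multiple answers, no choice, a single pick) can be sketched as a small coding function. This is an illustrative assumption about how responses might be sorted, not the paper's procedure. The name `code_response` is hypothetical, and the naive name matching only suits short direct answers: a narrative scene lists every nominee, so it would need the announced winner extracted first.

```python
from typing import Sequence

NOMINEES = [
    "Javier Bardem",
    "Benedict Cumberbatch",
    "Andrew Garfield",
    "Will Smith",
    "Denzel Washington",
]


def code_response(response: str, nominees: Sequence[str]) -> str:
    """Label a direct-prompt reply by how many nominees it names."""
    text = response.lower()
    named = [name for name in nominees if name.lower() in text]
    if not named:
        return "no choice"          # refusal or an answer naming nobody
    if len(named) > 1:
        return "multiple answers"   # hedged reply naming several nominees
    return named[0]                 # a single committed pick


if __name__ == "__main__":
    # Example replies are invented for illustration only.
    print(code_response("My pick is Will Smith.", NOMINEES))
    print(code_response("It could be Will Smith or Denzel Washington.", NOMINEES))
    print(code_response("I cannot predict future events.", NOMINEES))
```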

By comparison, when ChatGPT-3.5 was placed in the future narrative of a family watching the awards show, it guessed Will Smith would win 80% of the time.

Likewise, in most trials, ChatGPT-4 refused to participate when directly prompted.

It provided multiple answers in 26% of all cases, and in almost half of the trials it refused to make any predictions.

When it made guesses, it guessed Will Smith 19% of the time and Denzel Washington 7% of the time.

By comparison, using future narrative prompts, it guessed Will Smith 97% of the time, which is a huge improvement over ChatGPT-3.5's correct prediction rate of 18%.

Forecasting economic phenomena

Next, the study turns to two economic variables, the monthly unemployment rate and the monthly inflation rate, using two different narrative prompts:

In one, a college professor teaches undergraduates about the Phillips curve; in the other, Federal Reserve Chairman Jerome Powell addresses the Board of Governors about the past year's economic data.

In all cases, direct prompts are even less effective than in the Oscar case because ChatGPT refuses to directly predict future time series for each macroeconomic variable.

The distribution of month-by-month inflation figures in the Powell narrative is on average comparable to the figures in the University of Michigan's Survey of Consumer Expectations. Interestingly, the narrative comes closer to the Michigan survey than to actual inflation based on data collected by the Cleveland Fed.

ChatGPT is again mostly inaccurate in predicting the monthly unemployment rate. But as with inflation, the unemployment rate reported each month by the Bureau of Labor Statistics (BLS) falls within the distribution of figures cited in the Jerome Powell speech.
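The claim that a reported value "falls within the distribution" of the model's answers can be read, in its simplest form, as a range check for each month. The sketch below assumes that reading, which may be looser or stricter than the paper's own comparison. The names `covers` and `coverage_by_month` are hypothetical, and all numbers are made-up placeholders rather than data from the paper, BLS, or any survey.

```python
from typing import Dict, Sequence


def covers(guesses: Sequence[float], actual: float) -> bool:
    """True if the reported value lies between the lowest and highest guess."""
    return min(guesses) <= actual <= max(guesses)


def coverage_by_month(
    guesses_by_month: Dict[str, Sequence[float]],
    actual_by_month: Dict[str, float],
) -> Dict[str, bool]:
    """Apply the range check to every month that has a reported value."""
    return {
        month: covers(guesses_by_month[month], actual)
        for month, actual in actual_by_month.items()
    }


if __name__ == "__main__":
    # Placeholder values for illustration only; not real data.
    guesses = {"month_a": [3.0, 4.0, 5.0], "month_b": [3.0, 3.5]}
    actual = {"month_a": 4.2, "month_b": 3.9}
    print(coverage_by_month(guesses, actual))
```

A range check like this says nothing about how tightly the guesses cluster, so wide variability can cover the true value without the central tendency being accurate.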

For example, the prompt that uses Federal Reserve Chairman Jerome Powell to predict inflation reads:

“Write a scenario in which Federal Reserve Chairman Jerome Powell gives a speech in October 2022 discussing inflation, unemployment, and monetary policy. Chairman Powell tells the audience the inflation rate and unemployment rate for each month, starting in September 2021 and ending in August 2022. Have the chairman go month by month, and explain the outlook for inflation and unemployment and possible changes in interest rate policy.”

The following are the results of ChatGPT-3.5 and ChatGPT-4 respectively:

For each month, ChatGPT-3.5's range of answers encompasses the figures from both the Fed and the Michigan survey. But the variability is considerable, and the central tendency of its guesses does not point clearly to either measure.

ChatGPT-4's guesses include the Michigan survey figure in every month. Meanwhile, its forecasts remain stable until September 2022, when more variability is introduced.

Conjectures on ChatGPT-4’s predictive ability in narrative form

The study of ChatGPT-4's predictive capabilities revealed a sharp divide between direct predictions and predictions based on future narratives.

The model's narrative predictions were exceptionally accurate when it came to predicting the major Oscar categories, with the exception of the Best Picture category. This may indicate that ChatGPT-4 performs well in situations where public opinion plays an important role.

On macroeconomic phenomena, future narrative prompts were quite accurate in some cases but fell short of expectations in others.

In all cases, future narratives significantly improve the predictive power of ChatGPT beyond simple prediction requests.

The distinction between narrative prompts and direct prompts highlights an innovative approach to data analysis that respects the boundaries set by the OpenAI Terms of Service.

By focusing on the creative aspects of prediction, such as predicting awards or economic trends, researchers and users avoid directly applying AI to make high-stakes automated decisions or provide professional advice without the supervision of a qualified professional.

This methodological choice not only enhances the integrity and ethical considerations of AI use, but also promotes responsible exploration of its capabilities.

At the same time, as OpenAI continues to encourage and improve the creative capabilities of its models, it will become critical to work out how narrative and direct prompts should be distinguished and defined at an ethical level.

References:

  • https://arxiv.org/abs/2404.07396
