After exploding onto the scene in 2022 and beginning to embed itself in businesses in 2023, generative artificial intelligence (AI) is poised to define the future landscape of AI in 2024. As researchers and enterprises strive to integrate this groundbreaking technology into everyday life, the year ahead promises to be pivotal.
Much like computers evolved from massive mainframes to accessible personal machines, generative AI is rapidly moving toward efficiency and accessibility. In 2023, we witnessed the emergence of increasingly efficient foundation models with open licenses, such as Meta's LLaMA family, StableLM, and Falcon. These models, enhanced by community-developed techniques and datasets, rival even the most powerful proprietary models on many benchmarks.
While media attention often focuses on the capabilities of cutting-edge models, the most significant developments in 2024 may lie in governance, middleware, training techniques, and data pipelines. These advancements aim to make generative AI more trustworthy, sustainable, and accessible for both enterprises and end users.
Discover the Latest AI Trends for the Upcoming Year
When generative AI first entered the mainstream, business leaders relied heavily on marketing materials and sensationalized news coverage for their understanding. Limited hands-on experience typically involved experimenting with tools like ChatGPT and DALL-E. Now, as the initial excitement settles, the business community has developed a more nuanced perception of AI-powered solutions.
Gartner's Hype Cycle places Generative AI at the "Peak of Inflated Expectations," poised for a transition into the "Trough of Disillusionment." Deloitte's Q1 2024 report suggests that many leaders anticipate significant transformative impacts in the short term. The reality likely lies somewhere in between: while generative AI presents unique opportunities, it won't fulfill every expectation.
Comparing real-world outcomes to hype is subjective. While standalone tools like ChatGPT capture public attention, seamless integration into existing services often ensures longevity. Tools like Google's "Smart Compose" feature, introduced in 2018, quietly paved the way for today's text-generating services. Many impactful generative AI tools are integrated elements of enterprise environments, enhancing existing tools rather than replacing them—such as Microsoft Office's "Copilot" features or Adobe Photoshop's "Generative Fill."
The trajectory of AI adoption hinges more on its integration into everyday workflows than on hypothetical capabilities. According to an IBM survey of over 1,000 enterprise employees, key drivers of AI adoption include advances in accessible AI tools, cost reduction, process automation, and the integration of AI into standard business applications.
With the ambitions of state-of-the-art generative AI on the rise, the next wave of advancements is poised to revolutionize the field. Beyond simply improving performance within specific domains, the focus now shifts to multimodal models capable of processing several types of data input. Models that operate across modalities are not entirely new (CLIP matches text with images, and wav2vec transcribes speech to text), but they have historically operated in one direction and were trained for specific tasks.
The emerging generation of interdisciplinary models, including proprietary ones like OpenAI's GPT-4V and Google's Gemini, as well as open source alternatives like LLaVA, Adept's Fuyu, and Qwen-VL, breaks these boundaries. They seamlessly transition between natural language processing (NLP) and computer vision tasks. Moreover, new models like Google's Lumiere introduce video capabilities, enabling text-to-video and image-to-video generation, or the use of images as style references.
Multimodal AI offers immediate benefits, enabling more intuitive and versatile AI applications and virtual assistants. Users can now receive natural language answers to image queries or receive visual aids alongside step-by-step text instructions for spoken repair queries.
At a deeper level, multimodal AI enriches training and inference by processing diverse data inputs, particularly video, which offers holistic learning opportunities. According to Peter Norvig, Distinguished Education Fellow at the Stanford Institute for Human-Centered Artificial Intelligence (HAI), the constant stream of unfiltered, unintentional data from 24/7 surveillance cameras presents a wealth of information previously untapped by AI models.
This broader data scope promises a deeper understanding of the world for AI models.
Language Models and Open Source Advancements
In the realm of domain-specific models, especially Large Language Models (LLMs), the pursuit of larger parameter counts may be yielding diminishing returns. Sam Altman, CEO of OpenAI, hinted at this during MIT's Imagination in Action event last April, suggesting a shift towards improving models in other ways. While massive models have propelled the AI revolution, they come with significant drawbacks. Only large companies can afford the resources and energy consumption required to train and maintain these behemoth models.
On the other hand, smaller models offer a more efficient alternative. DeepMind's March 2022 "Chinchilla" research showed that, for a given compute budget, training smaller models on more data often outperforms larger models trained on less data. Recent innovations in LLMs, particularly those built upon foundation models like LLaMA, Llama 2, and Mistral, demonstrate that downsizing models doesn't necessarily sacrifice performance.
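The DeepMind finding boils down to a simple heuristic: for a fixed compute budget, parameter count and training tokens should grow together, at roughly 20 training tokens per parameter. A back-of-the-envelope sketch, using the common C ≈ 6·N·D approximation for training compute (the exact constants here are illustrative assumptions, not the paper's full fitted scaling law):

```python
def compute_optimal(budget_flops, tokens_per_param=20.0):
    """Split a training-compute budget into a parameter count and a
    token count using the Chinchilla rule of thumb.

    Uses the standard approximation C ~= 6 * N * D, where N is the
    number of parameters and D the number of training tokens, plus
    the heuristic D ~= 20 * N.
    """
    # C = 6 * N * (20 * N)  =>  N = sqrt(C / 120)
    n_params = (budget_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# A budget of ~5.76e23 FLOPs lands near Chinchilla's own scale:
# roughly a 70B-parameter model trained on roughly 1.4T tokens.
n, d = compute_optimal(5.76e23)
print(f"~{n / 1e9:.0f}B parameters trained on ~{d / 1e12:.1f}T tokens")
```

The point of the exercise: under this rule, doubling compute should go into both a bigger model and more data, rather than parameters alone.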
The momentum of open models continues to grow. Mistral's December 2023 release of Mixtral, a sparse mixture-of-experts model combining eight expert networks of roughly 7 billion parameters each, outperforms the much larger Llama 2 70B on many benchmarks and even competes with OpenAI's GPT-3.5, at faster inference speeds. Meta's announcement that it is training Llama 3 models, slated for open sourcing, further underscores the trend towards accessible, high-performance AI models.
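Mixtral's efficiency comes from sparse mixture-of-experts routing: a gating function scores every expert for each input, but only the top two are actually executed, so inference cost stays far below that of a dense model with the same total parameter count. A minimal sketch of the routing idea (the toy experts and gating weights below are illustrative, not Mixtral's actual architecture):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, experts, gate_weights, top_k=2):
    """Sparse mixture-of-experts step: score all experts, execute only
    the top_k, and mix their outputs by renormalized gate probabilities."""
    scores = [sum(w * xi for w, xi in zip(ws, x)) for ws in gate_weights]
    probs = softmax(scores)
    # Only the top_k experts run; the remaining ones cost nothing.
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    return sum((probs[i] / norm) * experts[i](x) for i in top)

# Eight toy "experts" (scalar functions standing in for feed-forward blocks).
experts = [lambda x, k=k: k * sum(x) for k in range(8)]
gate_weights = [[0.1 * k, 0.05 * k] for k in range(8)]  # toy gating matrix

y = moe_forward([1.0, 2.0], experts, gate_weights, top_k=2)
print(y)  # a blend of the two highest-scoring experts' outputs
```

With top-2 routing over eight ~7B experts, only about 13B parameters are active per token, which is how Mixtral keeps inference fast despite its large total size.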
Advancements in smaller models offer three key advantages:
GPU Deficits and Cloud Costs
The shift towards smaller models is being propelled by both necessity and entrepreneurial drive, as rising cloud computing costs coincide with dwindling hardware availability.
According to James Landay, Vice-Director and Faculty Director of Research at Stanford HAI, increasing demand for AI capabilities among major companies is leading to a surge in GPU procurement. This surge not only highlights the need for expanded GPU production but also prompts innovators to develop more cost-effective and user-friendly hardware solutions.
Currently, cloud providers shoulder much of the computing load, as few AI adopters maintain their own infrastructure. However, hardware shortages are exacerbating the challenges and costs associated with setting up on-premise servers. In the long run, this may drive up cloud costs as providers upgrade and optimize their infrastructure to meet the demands of generative AI.
Amidst this uncertainty, enterprises must remain adaptable. This includes embracing smaller, more efficient models when necessary and larger, high-performance models when feasible.
Model Optimization Is Becoming More Accessible
The trend towards maximizing the performance of compact models receives significant support from recent contributions by the open-source community.
Key advancements stem not only from new foundation models but also from novel techniques and resources, such as open-source datasets, for training, fine-tuning, and aligning pre-trained models. Notable techniques that gained traction in 2023 include parameter-efficient fine-tuning methods such as Low-Rank Adaptation (LoRA), quantization (including quantized fine-tuning approaches like QLoRA), and alignment methods such as direct preference optimization (DPO).
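LoRA illustrates why these techniques matter for accessibility: the pre-trained weight matrix W is frozen, and only a low-rank update ΔW = B·A is trained, collapsing the trainable parameter count by orders of magnitude. A quick sketch of the arithmetic (the 4096×4096 layer size is a typical LLM projection, used here for illustration):

```python
def lora_param_counts(d_in, d_out, rank):
    """Compare full fine-tuning of a (d_out x d_in) weight matrix W
    against training only the LoRA factors B (d_out x rank) and
    A (rank x d_in), where the effective weight is W + B @ A."""
    full = d_in * d_out           # every weight trainable
    lora = rank * (d_in + d_out)  # only the two low-rank factors
    return full, lora

# One 4096x4096 attention projection, adapted at rank 8.
full, lora = lora_param_counts(4096, 4096, rank=8)
print(f"full: {full:,}  lora: {lora:,}  ({full / lora:.0f}x fewer)")
```

Multiplied across every adapted layer, this is what lets a single consumer GPU fine-tune a model whose full training run required a data center.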
These advancements, coupled with parallel developments in open-source models ranging from 3 to 70 billion parameters, have the potential to democratize AI by granting smaller players, such as startups and independent developers, access to sophisticated AI capabilities previously beyond their reach.
Tailored Local Models and Data Pipelines
In 2024, enterprises have the opportunity to differentiate themselves by developing custom AI models tailored to their specific needs, rather than relying on prepackaged solutions from major AI providers. Leveraging existing open-source AI models and tools, coupled with the right data and development framework, organizations can adapt these models to address a wide range of real-world scenarios, from customer support to supply chain management to document analysis.
Open-source models offer a cost-effective means for organizations to create powerful custom AI models, trained on proprietary data and refined for precise requirements. This approach is particularly advantageous in specialized industries like legal, healthcare, and finance, where unique terminologies and concepts may not be covered in pre-trained models.
Industries such as legal, finance, and healthcare can also benefit from locally deployable models that run efficiently on modest hardware. Keeping AI processes local mitigates the risk of exposing sensitive data to third parties, helping ensure data privacy and security. Additionally, techniques like retrieval-augmented generation (RAG), which fetch relevant information from external sources at query time, allow smaller models to handle knowledge-intensive tasks, enhancing speed and cost-effectiveness.
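The RAG pattern itself is simple: retrieve the documents most relevant to a query, then prepend them to the prompt so the model answers from that context rather than from its parameters alone. A toy sketch, using word overlap as a stand-in for the embedding-based similarity search a real system would use (the documents and scoring are illustrative):

```python
def _words(text):
    """Lower-case and strip surrounding punctuation: a crude tokenizer."""
    return {w.strip(".,?!$").lower() for w in text.split()}

def retrieve(query, docs, k=1):
    """Rank documents by word overlap with the query, a stand-in for
    cosine similarity over embeddings in a production RAG pipeline."""
    q = _words(query)
    return sorted(docs, key=lambda d: len(q & _words(d)), reverse=True)[:k]

def build_prompt(query, context_docs):
    """Prepend retrieved context so the model answers from it instead
    of relying on parametric memory alone."""
    context = "\n".join(f"- {d}" for d in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support hours are 9am to 5pm on weekdays.",
    "Shipping is free on orders over $50.",
]
hits = retrieve("What is the refund policy?", docs, k=1)
print(build_prompt("What is the refund policy?", hits))
```

Because the knowledge lives in the document store rather than the weights, the model can stay small and the sensitive documents never leave local infrastructure.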
As 2024 progresses, the focus on proprietary data pipelines for fine-tuning will increasingly shape competitive advantage in the AI landscape.
More Dynamic Virtual Agents
Armed with refined tools and a year's worth of market insights, businesses are poised to broaden the scope of virtual agents beyond basic customer service chatbots.
As AI systems evolve and incorporate diverse streams of information, they not only enhance communication and instruction-following but also expand into task automation. According to Stanford's Norvig, while 2023 saw the emergence of chat-based AI interactions, 2024 will witness virtual agents performing tasks on behalf of users, such as making reservations, planning trips, and integrating with other services.
Multimodal AI, in particular, offers extensive opportunities for seamless interaction with virtual agents. For instance, instead of merely requesting recipes, users can now utilize a camera to scan their fridge's contents and receive recipe suggestions based on available ingredients. Initiatives like Be My Eyes are piloting AI tools that enable users, particularly those with visual impairments, to interact directly with their surroundings using multimodal AI, reducing reliance on human assistance.
Moving Forward
As we navigate through a pivotal year in artificial intelligence, staying abreast of emerging trends is vital for optimizing opportunities, mitigating risks, and ethically scaling the adoption of generative AI. Explore how Xpatech can empower your AI journey today.