Coursera-IBM-Fundamentals of Building AI Agents

Summary

Limitations of Monolithic Models

Large language models (LLMs) are powerful but limited by their training data. They lack access to personal or real-time information (e.g., your vacation balance), and adapting them to new tasks means retraining or fine-tuning the model.

Compound AI Systems

The solution is to build systems around the model. Integrating an LLM with external components such as databases, search functions, or calculators lets the overall system solve more complex, real-world problems. For example, an LLM can generate a database query to fetch your personal vacation days and then formulate a correct response. This modular, system-design approach is faster and easier to adapt than retraining the model itself. Retrieval-Augmented Generation (RAG) is a common example.
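The vacation-days example can be sketched as a small pipeline: one step produces a query, the database runs it, and a second step phrases the answer. This is a minimal sketch only. The two fake_llm_* functions are hypothetical stand-ins for real model calls, and the table name, column names, and data are invented for illustration.

```python
import sqlite3


def fake_llm_generate_sql(question: str) -> str:
    """Hypothetical stand-in for an LLM that turns a question into SQL."""
    # A real system would prompt a model here; this sketch hard-codes the query.
    return "SELECT days_left FROM vacation WHERE employee_id = ?"


def fake_llm_write_answer(question: str, days_left: int) -> str:
    """Hypothetical stand-in for an LLM that phrases the final reply."""
    return f"You have {days_left} vacation days left."


def answer_vacation_question(conn: sqlite3.Connection, question: str, employee_id: int) -> str:
    sql = fake_llm_generate_sql(question)               # step 1: model writes the query
    row = conn.execute(sql, (employee_id,)).fetchone()  # step 2: database supplies the fact
    if row is None:
        return "No vacation record found."
    return fake_llm_write_answer(question, row[0])      # step 3: model phrases the answer


# Invented example data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vacation (employee_id INTEGER PRIMARY KEY, days_left INTEGER)")
conn.execute("INSERT INTO vacation VALUES (1, 10)")

reply = answer_vacation_question(conn, "How many vacation days do I have left?", 1)
print(reply)
```

The model never needs to be retrained when the balance changes; only the database row does.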

The Control Logic Problem

Traditional compound systems use programmatic control logic, which is rigid. If a user asks an unexpected question (e.g., "What's the weather?"), the system fails because its predefined path (e.g., always checking the vacation database) is inflexible.
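A short sketch makes the rigidity concrete. The function and data below are hypothetical; the point is only that a hard-coded path treats every question as a vacation question.

```python
# Hypothetical in-memory stand-in for the vacation database.
VACATION_DAYS = {"alice": 10}


def fixed_pipeline(user: str, question: str) -> str:
    # The control logic is fixed: the question text is never inspected,
    # so every request triggers the same vacation lookup.
    days = VACATION_DAYS.get(user)
    if days is None:
        return "No vacation record found."
    return f"You have {days} vacation days left."


on_topic = fixed_pipeline("alice", "How many vacation days do I have?")
off_topic = fixed_pipeline("alice", "What's the weather?")

print(on_topic)
print(off_topic)  # same vacation answer, which is wrong for a weather question
```

Handling the weather question here would require a developer to add a new branch by hand, which is the limitation agents are meant to address.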

Introduction of AI Agents

AI agents address this rigidity by using an LLM to manage the system's control logic. Instead of following a fixed program, the LLM is prompted to think slowly:

reason,
plan,
break down complex problems,
decide which tools to use,
observe results,
iterate if needed.

Implementation (ReAct Framework)

A popular method, ReAct, explicitly combines "Reasoning" and "Acting." The LLM is instructed to think, act (by calling a tool), observe the result, and repeat until it finds the answer.
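The think, act, observe cycle can be sketched as a loop. This is an illustrative sketch, not the ReAct reference implementation: scripted_llm is a hypothetical stand-in that replays canned steps instead of calling a model, the "Action: name[argument]" text format and both tools are assumptions, and the "two units per day" figure is made up.

```python
import re
from typing import Callable, Dict, List

# Assumed step format: "Action: tool_name[argument]".
ACTION_RE = re.compile(r"Action: (\w+)\[(.*)\]")


def vacation_days(_arg: str) -> str:
    return "10"  # hypothetical lookup result


def multiply(arg: str) -> str:
    a, b = (int(x) for x in arg.split(","))
    return str(a * b)


TOOLS: Dict[str, Callable[[str], str]] = {
    "vacation_days": vacation_days,
    "multiply": multiply,
}


def scripted_llm(transcript: List[str]) -> str:
    """Hypothetical stand-in for a model: picks the next step from the transcript."""
    observations = [line for line in transcript if line.startswith("Observation:")]
    if len(observations) == 0:
        return "Thought: I need the number of vacation days.\nAction: vacation_days[]"
    if len(observations) == 1:
        days = observations[0].split(": ", 1)[1]
        return f"Thought: Assume two units per day.\nAction: multiply[{days},2]"
    total = observations[-1].split(": ", 1)[1]
    return f"Thought: I have the result.\nFinal Answer: {total}"


def react(question: str, llm, tools, max_steps: int = 5) -> str:
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        step = llm(transcript)                      # think
        transcript.extend(step.splitlines())
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(step)
        if match is None:
            transcript.append("Observation: no valid action found")
            continue
        name, arg = match.groups()
        tool = tools.get(name)
        result = tool(arg) if tool else f"unknown tool {name}"  # act
        transcript.append(f"Observation: {result}")             # observe
    return "No answer within the step limit."


answer = react("How many units should I pack for my vacation?", scripted_llm, TOOLS)
print(answer)
```

The max_steps cap is a design choice for the sketch: because the model decides when to stop, the loop needs an outer bound so a confused model cannot iterate forever.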

Use Case & Trade-offs

Agents excel at complex, open-ended tasks (e.g., calculating how much sunscreen to pack for a Florida trip, which requires fetching vacation days, weather data, and medical guidelines, and then doing the math). For simple, narrow tasks, a programmatic system is more efficient. For complex, varied tasks, the flexible, agentic approach is more suitable, though it is still evolving and often benefits from human oversight.

Core Capabilities

Reasoning

The LLM plans and reasons through steps.

Acting

The LLM uses external "tools" (APIs, databases, calculators, other models) to gather information or perform actions.

Memory

The agent can access conversation history or its own reasoning logs for context and personalization.
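One simple way to picture the memory capability is a rolling conversation history that is prepended to each new prompt. The class below is a hypothetical sketch; the name ConversationMemory, the max_turns limit, and the "role: text" prompt layout are all assumptions, not part of any particular framework.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ConversationMemory:
    """Hypothetical rolling buffer of (role, text) turns."""
    max_turns: int = 3
    turns: List[Tuple[str, str]] = field(default_factory=list)

    def add(self, role: str, text: str) -> None:
        self.turns.append((role, text))
        # Keep only the most recent turns so the prompt stays short.
        self.turns = self.turns[-self.max_turns:]

    def as_prompt(self, new_message: str) -> str:
        # Earlier turns give the model context for the new message.
        lines = [f"{role}: {text}" for role, text in self.turns]
        lines.append(f"user: {new_message}")
        return "\n".join(lines)


memory = ConversationMemory(max_turns=2)
memory.add("user", "I am going to Florida.")
memory.add("assistant", "Sounds sunny.")
memory.add("user", "I burn easily.")

prompt = memory.as_prompt("How much sunscreen should I pack?")
print(prompt)
```

With max_turns=2 the oldest turn is dropped, which shows the trade-off in this kind of memory: a shorter prompt at the cost of forgetting earlier context.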

NVIDIA Tech Stack