In our previous overview, we introduced the Google Agent Development Kit (ADK) as a powerful Python framework for building sophisticated AI agents. Now, let's dive deeper into some of the specific features that make ADK a compelling choice for developers looking to create agents that can reason, plan, use tools, and interact effectively with the world.
1. The Core: Configuring the `LlmAgent`
The heart of most ADK applications is the `LlmAgent` (aliased as `Agent` for convenience). This agent uses a Large Language Model (LLM) for its core reasoning and decision-making. Configuring it effectively is key:
- `name` (str): A unique identifier for your agent within the application.
- `model` (str | BaseLlm): Specify the LLM to use. You can provide a model name string (like 'gemini-1.5-flash') or an instance of a model class (e.g., `Gemini()`). ADK resolves string names using its registry.
- `instruction` (str | Callable): This is crucial for guiding the agent's behavior, personality, and task execution. It can be a simple string or a callable function that dynamically generates instructions based on the current context.
- `tools` (list[Callable | BaseTool]): A list of capabilities you grant the agent. This can include Python functions or instances of `BaseTool` subclasses. More on this below!
- `generate_content_config` (types.GenerateContentConfig): Fine-tune the LLM's generation parameters, such as temperature, top-p, safety settings, and stop sequences.
Code Sample: Basic Agent Configuration
from google.adk import Agent
from google.genai import types

# Define safety settings to allow discussion about specific topics if needed.
# The harm enums come from google.genai.types, the same module as SafetySetting.
safety_settings = [
    types.SafetySetting(
        category=types.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
        threshold=types.HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,  # Example: adjust as needed
    ),
]

# Configure generation parameters
gen_config = types.GenerateContentConfig(
    temperature=0.7,
    top_p=0.9,
    safety_settings=safety_settings,
)

# Create the agent
simple_chatbot = Agent(
    name='friendly_explainer',
    model='gemini-1.5-flash',
    instruction='You are Bob, a friendly and knowledgeable assistant who explains complex topics simply. Always introduce yourself as Bob.',
    generate_content_config=gen_config,
)
# This agent can now be used with a Runner
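The `instruction` parameter can also be a callable that builds the instruction from the current context. Here is a minimal sketch of that idea in plain Python, so it runs without ADK installed. `SketchContext` is a hypothetical stand-in for whatever context object ADK passes to the callable, and the `user_name` state key is an assumption made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SketchContext:
    """Hypothetical stand-in for the context object passed to an instruction callable."""
    state: dict = field(default_factory=dict)


def build_instruction(context: SketchContext) -> str:
    """Return an instruction string tailored to the current session state."""
    base = "You are Bob, a friendly assistant who explains complex topics simply."
    user_name = context.state.get("user_name")  # assumed state key, for illustration only
    if user_name:
        return f"{base} Address the user as {user_name}."
    return base


print(build_instruction(SketchContext()))
print(build_instruction(SketchContext(state={"user_name": "Ada"})))
```

The benefit of a callable over a fixed string is that the same agent definition can adapt its persona or constraints per session without being rebuilt.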
2. The Power of Tools: Extending Agent Capabilities
Agents become truly useful when they can interact with external systems. ADK's tooling system is exceptionally flexible.
Effortless Function Tools
One of ADK's most convenient features is its ability to turn standard Python functions into tools automatically. Simply add the function to the agent's `tools` list. ADK inspects the function's signature (type hints) and docstring to generate the necessary schema (`FunctionDeclaration`) for the LLM to understand how and when to use it.
Code Sample: Function as a Tool
import random
from google.adk import Agent

# Define a standard Python function with type hints and a docstring
def get_weather(city: str) -> str:
    """Gets the current weather for a specified city."""
    # In a real scenario, this would call a weather API
    print(f"--> Checking weather for {city}...")
    conditions = ["Sunny", "Cloudy", "Rainy", "Windy"]
    temp = random.randint(5, 30)
    condition = random.choice(conditions)
    result = f"The weather in {city} is {condition} with a temperature of {temp}°C."
    print(f"--> Result: {result}")
    return result

# Create an agent and simply add the function to its tools list
weather_agent = Agent(
    name='weather_reporter',
    model='gemini-1.5-flash',
    instruction='You provide weather information when asked about a city.',
    tools=[get_weather],  # ADK handles the rest!
)
# Now, if the user asks "What's the weather like in London?",
# the agent can call the get_weather function.
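To make the signature-and-docstring inspection less abstract, here is a simplified sketch using only the standard library. It is not ADK's actual implementation: `describe_function` and the type mapping are hypothetical, and the output only approximates the shape of a function declaration.

```python
import inspect
from typing import get_type_hints

# Simplified mapping from Python types to JSON-schema-style type names
_TYPE_NAMES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_function(func) -> dict:
    """Build a simplified, schema-like description of a function (hypothetical helper)."""
    hints = get_type_hints(func)
    signature = inspect.signature(func)
    properties = {}
    required = []
    for name, param in signature.parameters.items():
        # Unknown or missing annotations fall back to "string" in this sketch
        properties[name] = {"type": _TYPE_NAMES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def get_weather(city: str, units: str = "metric") -> str:
    """Gets the current weather for a specified city."""
    return f"Weather for {city} in {units} units."


declaration = describe_function(get_weather)
print(declaration["name"], declaration["parameters"]["required"])
```

This is why clear type hints and a descriptive docstring matter: they are the only information the model receives about what the tool does and how to call it.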
Built-in and External Code Execution
Agents can also execute code:
- Built-in Execution: For Gemini 2 models, you can use the `built_in_code_execution` tool. This leverages the model's internal capabilities without running code in your environment. Simply import and add it to the tools list.
- External Executors: For more control or different environments, assign an instance of a `BaseCodeExecutor` subclass to the agent's `code_executor` parameter. Options include `VertexAiCodeExecutor` (runs code securely in a managed Vertex AI environment), `ContainerCodeExecutor`, or even an `UnsafeLocalCodeExecutor` (use with extreme caution). These executors handle parsing code blocks from the LLM response and returning the output.
Code Sample: Adding Code Execution Capabilities
from google.adk import Agent
from google.adk.tools import built_in_code_execution  # For Gemini 2+
from google.adk.code_executors import VertexAiCodeExecutor  # Example external executor

# Option 1: Using Built-in Execution (Gemini 2+)
analysis_agent_builtin = Agent(
    name='data_analyst_builtin',
    model='gemini-2.0-flash-001',  # Requires a Gemini 2 model
    instruction='Analyze data using code execution when necessary.',
    tools=[built_in_code_execution],
)

# Option 2: Using an External Executor (e.g., Vertex AI)
# Requires setting up the Vertex AI Code Interpreter Extension
vertex_executor = VertexAiCodeExecutor()  # resource_name might be needed
analysis_agent_external = Agent(
    name='data_analyst_external',
    model='gemini-1.5-flash',  # Can use other models
    instruction='Analyze data using the provided code executor.',
    code_executor=vertex_executor,  # Assign the executor instance
    # Note: If using an external executor, don't add built_in_code_execution tool
)
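An external executor parses code blocks out of the model's response and returns their output. The following is a rough standard-library sketch of that loop, not ADK's executor code. The helper names are hypothetical, and because the code runs with `exec`, it carries the same risk as an unsafe local executor and should never receive untrusted input.

```python
import contextlib
import io
import re

FENCE = "`" * 3  # built programmatically so this example does not nest literal fences
CODE_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def extract_code_blocks(response_text: str) -> list[str]:
    """Return the body of every fenced Python block found in the response."""
    return CODE_BLOCK.findall(response_text)


def run_code(code: str) -> str:
    """Execute code in a fresh namespace and capture stdout. Unsafe for untrusted input."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})
    return buffer.getvalue()


# A made-up model response containing one fenced code block
llm_response = (
    "I will compute the mean.\n"
    f"{FENCE}python\n"
    "values = [2, 4, 6]\n"
    "print(sum(values) / len(values))\n"
    f"{FENCE}\n"
)

outputs = [run_code(block) for block in extract_code_blocks(llm_response)]
print(outputs)
```

A managed or containerized executor follows the same extract, run, and return cycle, but isolates the execution step from your own process.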
Retrieval-Augmented Generation (RAG)
Give your agents access to up-to-date or domain-specific knowledge using retrieval tools. ADK provides a `BaseRetrievalTool` and specific implementations like `VertexAiRagRetrieval`. These tools allow the agent to query knowledge bases (like those hosted on Vertex AI) and incorporate the retrieved information into their responses. For newer models like Gemini 2, adding the `VertexAiRagRetrieval` tool can leverage the model's built-in RAG capabilities directly.
Code Sample: Adding a RAG Tool
from google.adk import Agent
from google.adk.tools.retrieval import VertexAiRagRetrieval

# Configure the RAG tool to point to your Vertex AI RAG Corpus/Resources
# Replace with your actual resource names/corpora IDs
doc_retriever = VertexAiRagRetrieval(
    name='internal_doc_search',
    description='Searches the company knowledge base for policy documents.',
    # Example: Specify either rag_corpora or rag_resources
    rag_corpora=['projects/PROJECT_ID/locations/LOCATION/ragCorpora/CORPUS_ID'],
    # rag_resources=[vertexai.preview.rag.RagResource(...)]
)

policy_agent = Agent(
    name='policy_advisor',
    model='gemini-2.0-flash-001',  # Gemini 2 preferred for built-in integration
    instruction='Answer questions about company policy using the internal document search tool.',
    tools=[doc_retriever],
)
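Retrieval-augmented generation has two steps: fetch relevant passages, then place them in the prompt. The toy sketch below shows that flow in plain Python. It uses naive word overlap instead of the vector search a real RAG corpus would use, and every name and document in it is made up for illustration.

```python
def tokenize(text: str) -> set[str]:
    """Lowercase the text and strip basic punctuation from each word."""
    return {word.strip(".,?!").lower() for word in text.split()}


def retrieve(query: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Rank documents by the number of words they share with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query: str, documents: list[str]) -> str:
    """Prepend the retrieved context to the user's question."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Made-up knowledge base for illustration
knowledge_base = [
    "Employees accrue 20 vacation days per year.",
    "Expense reports are due within 30 days of purchase.",
]
question = "How many vacation days do employees get?"
print(build_prompt(question, knowledge_base))
```

A retrieval tool wraps the first step behind a tool call, so the agent decides when a lookup is worthwhile instead of retrieving on every turn.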
API Integration
While not shown in detail here, ADK includes powerful tools for interacting with REST APIs defined by OpenAPI specifications or Google API Discovery Docs (found in modules like `openapi_tool` and `google_api_tool`). These tools can automatically parse API definitions, handle authentication (using ADK's `auth` components), and allow the agent to call external APIs.
3. Orchestrating Agent Workflows
Complex tasks often require multiple steps or different specialized agents. ADK provides container agents to manage these workflows:
- `SequentialAgent`: Runs a list of sub-agents one after another, passing the context along. Useful for multi-step processes.
- `ParallelAgent`: (Found in `adk/agents/parallel_agent.py`) Runs sub-agents concurrently (behavior might depend on the runner implementation).
- `LoopAgent`: (Found in `adk/agents/loop_agent.py`) Allows for iterative processes based on conditions.
ADK also supports dynamic agent transfer, where an LLM agent can decide to hand off control to another agent (parent, peer, or sub-agent) based on the conversation, using mechanisms like the `transfer_to_agent_tool`.
Code Sample: Sequential Workflow
from google.adk.agents import Agent, SequentialAgent
from google.adk.tools import google_search  # ADK's built-in Google Search tool

# Define specialized sub-agents
researcher = Agent(
    name='web_researcher',
    model='gemini-2.0-flash-001',  # google_search is intended for Gemini 2 models
    instruction='Find information on the web about a topic using search tools.',
    tools=[google_search],
)
summarizer = Agent(
    name='report_summarizer',
    model='gemini-1.5-flash',
    instruction='Summarize the provided text into a concise report.',
    # This agent might expect text input via session state or context
)

# Create a sequential workflow
workflow_manager = SequentialAgent(
    name='research_and_summarize_workflow',
    description='Finds information online and then summarizes it.',
    sub_agents=[researcher, summarizer],
)
# Running workflow_manager will first execute researcher, then summarizer.
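The control flow behind sequential and loop workflows can be sketched without any framework. In this illustration each "agent" is just a function that reads and writes a shared state dictionary. The function names and state keys are hypothetical, and the real container agents pass richer context than a plain dict.

```python
from typing import Callable

Step = Callable[[dict], None]


def research(state: dict) -> None:
    """Stand-in for a research agent: writes findings into shared state."""
    state["findings"] = f"Notes about {state['topic']}"


def summarize(state: dict) -> None:
    """Stand-in for a summarizer agent: reads findings, writes a summary."""
    state["summary"] = state["findings"].upper()


def run_sequential(steps: list[Step], state: dict) -> dict:
    """Run each step once, in order, against the same state."""
    for step in steps:
        step(state)
    return state


def run_loop(steps: list[Step], state: dict,
             should_stop: Callable[[dict], bool], max_iterations: int = 5) -> dict:
    """Repeat the steps until should_stop(state) is true or the iteration cap is hit."""
    for _ in range(max_iterations):
        for step in steps:
            step(state)
        if should_stop(state):
            break
    return state


def refine(state: dict) -> None:
    """Stand-in for an iterative step: counts how many passes have run."""
    state["passes"] = state.get("passes", 0) + 1


result = run_sequential([research, summarize], {"topic": "solar power"})
looped = run_loop([refine], {}, should_stop=lambda s: s["passes"] >= 3)
print(result["summary"], looped["passes"])
```

The iteration cap in `run_loop` is worth keeping in any real loop as well, since a stop condition that depends on model output may never be met.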
4. Planning and Reasoning Strategies
For tasks requiring complex reasoning and planning, ADK offers Planners. The `LlmAgent` can be configured with a `planner`. One example is the `PlanReActPlanner`. This planner guides the LLM to first generate an explicit plan (marked with `/*PLANNING*/`), then execute steps (often involving tool calls marked with `/*ACTION*/`), interleave reasoning about the results (`/*REASONING*/`), and potentially replan (`/*REPLANNING*/`) if needed, before producing the `/*FINAL_ANSWER*/`. This structured approach makes the agent's thought process more transparent and controllable.
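Because the planner's output is tagged, it is straightforward to separate the plan from the final answer for logging or display. The sketch below splits a tagged response into sections with a regular expression. The sample response is invented, and `split_sections` is a hypothetical helper, not part of ADK.

```python
import re

TAG_PATTERN = re.compile(r"/\*(PLANNING|ACTION|REASONING|REPLANNING|FINAL_ANSWER)\*/")


def split_sections(response_text: str) -> list[tuple[str, str]]:
    """Split a tagged response into (tag, text) pairs in order of appearance."""
    parts = TAG_PATTERN.split(response_text)
    # parts[0] is any text before the first tag; after that the list alternates tag, text
    return [(parts[i], parts[i + 1].strip()) for i in range(1, len(parts) - 1, 2)]


# Invented example of a tagged planner response
sample = (
    "/*PLANNING*/ 1. Look up the weather. 2. Report it. "
    "/*ACTION*/ get_weather(city='London') "
    "/*REASONING*/ The tool returned sunny. "
    "/*FINAL_ANSWER*/ It is sunny in London."
)

sections = split_sections(sample)
for tag, text in sections:
    print(f"{tag}: {text}")
```

Showing only the `FINAL_ANSWER` section to end users while logging the rest is one way to keep the reasoning inspectable without cluttering the conversation.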
5. Managing State, Memory, and Artifacts
Agents need context. ADK manages this through several service abstractions, often used implicitly by a `Runner`:
- Session Service (`BaseSessionService`): Manages the history of events (user messages, agent responses, tool calls) and the current key-value state within a single conversation (session). Implementations like `InMemorySessionService` are provided.
- Memory Service (`BaseMemoryService`): Allows for longer-term persistence and retrieval of information, potentially across sessions. You might use this to recall past interactions or facts.
- Artifact Service (`BaseArtifactService`): Handles the storage and retrieval of binary data or files (like images, PDFs, CSVs) associated with a session. Implementations for in-memory and Google Cloud Storage (`GcsArtifactService`) exist.
The `InMemoryRunner` conveniently bundles in-memory versions of these services for easy local development and testing.
Code Sample: Using the InMemoryRunner
import uuid  # For generating unique IDs

from google.adk import Agent
from google.adk.runners import InMemoryRunner
from google.genai import types

# Assume 'my_agent' is an already defined ADK Agent instance
my_agent = Agent(name='test_agent', model='gemini-1.5-flash', instruction='Be helpful.')

# InMemoryRunner handles session, memory, artifact services internally
runner = InMemoryRunner(agent=my_agent, app_name='MyTestApp')

# Prepare input
user_input = "Hello ADK!"
message = types.Content(role='user', parts=[types.Part(text=user_input)])
user_id = str(uuid.uuid4())
session_id = str(uuid.uuid4())

# The runner looks up an existing session, so create it before calling run().
# Assumption: create_session is synchronous here, as in early ADK releases;
# newer releases expose it as a coroutine that must be awaited.
runner.session_service.create_session(
    app_name='MyTestApp', user_id=user_id, session_id=session_id
)

# Run and process events
print(f"User: {user_input}")
for event in runner.run(user_id=user_id, session_id=session_id, new_message=message):
    if event.is_final_response() and event.content and event.content.parts:
        response_text = ''.join(part.text for part in event.content.parts if part.text)
        print(f"Agent: {response_text}")
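To picture what a session service keeps track of, here is a deliberately small in-memory sketch: sessions are keyed by application, user, and session ID, and each one holds an event history plus key-value state. The class and method names are hypothetical and only loosely mirror the responsibilities described above; this is not ADK's `InMemorySessionService`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SketchSession:
    """Hypothetical session record: ordered events plus key-value state."""
    events: list = field(default_factory=list)
    state: dict = field(default_factory=dict)


class SketchSessionStore:
    """Hypothetical in-memory store keyed by (app_name, user_id, session_id)."""

    def __init__(self) -> None:
        self._sessions: dict = {}

    def create_session(self, app_name: str, user_id: str, session_id: str) -> SketchSession:
        session = SketchSession()
        self._sessions[(app_name, user_id, session_id)] = session
        return session

    def get_session(self, app_name: str, user_id: str, session_id: str) -> Optional[SketchSession]:
        return self._sessions.get((app_name, user_id, session_id))

    def append_event(self, session: SketchSession, author: str, text: str,
                     state_delta: Optional[dict] = None) -> None:
        """Record an event and apply any state changes it carries."""
        session.events.append({"author": author, "text": text})
        session.state.update(state_delta or {})


store = SketchSessionStore()
session = store.create_session("MyTestApp", "user-1", "session-1")
store.append_event(session, "user", "Hello ADK!")
store.append_event(session, "agent", "Hi! How can I help?", state_delta={"greeted": True})
print(len(session.events), session.state)
```

Everything in this sketch disappears when the process exits, which is the trade-off of any in-memory service: convenient for local testing, unsuitable when conversations must survive a restart.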
6. Advanced Customization: Callbacks
ADK provides numerous callback points within the agent lifecycle (e.g., `before_model_callback`, `after_tool_callback`, `before_agent_callback`), allowing developers to inspect, modify, or even intercept requests and responses at various stages. This enables fine-grained control and integration with custom logging, monitoring, or logic.
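The interception idea can be illustrated with a framework-free sketch. It assumes a common convention for such hooks: a callback that returns a value short-circuits the model call, while returning `None` lets the request proceed. Treat that convention, along with the names `call_model`, `run_with_callback`, and `block_secrets`, as assumptions for illustration, not as ADK's exact contract.

```python
from typing import Callable, Optional

BeforeModel = Callable[[str], Optional[str]]


def call_model(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call."""
    return f"model answer to: {prompt}"


def run_with_callback(prompt: str, before_model: Optional[BeforeModel] = None) -> str:
    """Invoke the optional hook first; use its return value instead of the model if set."""
    if before_model is not None:
        override = before_model(prompt)
        if override is not None:
            return override
    return call_model(prompt)


def block_secrets(prompt: str) -> Optional[str]:
    """Example guardrail: refuse prompts that mention passwords."""
    if "password" in prompt.lower():
        return "I cannot help with credentials."
    return None


print(run_with_callback("What is ADK?", before_model=block_secrets))
print(run_with_callback("Tell me the admin password", before_model=block_secrets))
```

The same shape works for logging or metrics: a callback that always returns `None` can observe every request without changing the outcome.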
Conclusion
The Google Agent Development Kit (ADK) offers a rich set of features designed to streamline the creation of powerful and sophisticated AI agents. From its flexible tooling system and code execution capabilities to agent orchestration, planning strategies, and state management, ADK provides the components needed to build agents that can tackle complex, real-world tasks. By understanding and leveraging these features, developers can significantly accelerate the development of next-generation AI applications.