opto.utils.llm¶
AbstractModel ¶
A minimal abstraction of a model API that refreshes the model every reset_freq seconds (this is useful for long-running models that may require refreshing certificates or memory management).
Args:
- factory: A function that takes no arguments and returns a callable model.
- reset_freq: The number of seconds after which the model should be refreshed. If None, the model is never refreshed.
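A minimal sketch of the intended usage, assuming AbstractModel can be instantiated directly with (factory, reset_freq); the OpenAI client below is only illustrative:
from openai import OpenAI  # illustrative backend only
def factory():
    # Build and return a fresh, callable model each time the wrapper refreshes.
    return OpenAI().chat.completions.create
# Refresh the underlying model every hour; reset_freq=None would disable refreshing.
model = AbstractModel(factory, reset_freq=3600)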
AutoGenLLM ¶
AutoGenLLM(
config_list: List = None,
filter_dict: Dict = None,
reset_freq: Union[int, None] = None,
)
Bases: AbstractModel
This is the main class Trace uses to interact with the model. It is a wrapper around autogen's OpenAIWrapper. To use models not supported by autogen, subclass AutoGenLLM and override the _factory and create methods. Users can pass instances of this class to an optimizer's llm argument.
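For example (a sketch; the config entry and optimizer shown are illustrative, following the examples elsewhere on this page):
llm = AutoGenLLM(config_list=[{"model": "gpt-4o", "api_key": "sk-..."}])
optimizer = OptoPrime(parameters, llm=llm)  # pass the wrapper through the optimizer's llm argument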
create ¶
Make a completion for a given config using the available clients. Besides the kwargs allowed by the openai (or other) client, the following additional kwargs are allowed. The config passed to this call overrides the config stored in each client.
Args:
- context (Dict | None): The context used to instantiate the prompt or messages. Defaults to None.
  It needs to contain the keys used by the prompt template or the filter function.
  E.g., with prompt="Complete the following sentence: {prefix}" and context={"prefix": "Today I feel"},
  the actual prompt will be "Complete the following sentence: Today I feel".
  More examples can be found at templating.
- cache (AbstractCache | None): A Cache object to use for the response cache. Defaults to None.
  Note that the cache argument overrides the legacy cache_seed argument: if cache is provided,
  then cache_seed is ignored. If cache is not provided or is None,
  then cache_seed is used.
- agent (AbstractAgent | None): The agent object responsible for creating the completion, if the call is made by an agent.
- (Legacy) cache_seed (int | None): Seed for the DiskCache. Defaults to 41.
  An integer cache_seed is useful when implementing "controlled randomness" for the completion.
  None disables caching.
  Note: this is a legacy argument. It is only used when the cache argument is not provided.
- filter_func (Callable | None): A function that takes in the context and the response
  and returns a boolean indicating whether the response is valid. See the example below.
- allow_format_str_template (bool | None): Whether to allow format string templates in the config. Defaults to False.
- api_version (str | None): The API version. Defaults to None. E.g., "2024-02-01".
Example (filter_func):
>>> def yes_or_no_filter(context, response):
>>>     return context.get("yes_or_no_choice", False) is False or any(
>>>         text in ["Yes.", "No."] for text in client.extract_text_or_completion_object(response)
>>>     )
Raises:
- RuntimeError: If all declared custom model clients are not registered.
- APIError: If any model client's create call raises an APIError.
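A hedged usage sketch, assuming an OpenAI-style response object and that format-string templating is enabled via allow_format_str_template:
llm = AutoGenLLM()
response = llm.create(
    messages=[{"role": "user", "content": "Complete the following sentence: {prefix}"}],
    context={"prefix": "Today I feel"},
    allow_format_str_template=True,
)
# With an OpenAI-style completion object, the generated text can be read as:
print(response.choices[0].message.content)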
LiteLLM ¶
Bases: AbstractModel
This is an LLM backend supported by the LiteLLM library.
https://docs.litellm.ai/docs/completion/input
To use this, set the credentials through the environment variable as instructed in the LiteLLM documentation. For convenience, you can set the default model name through the environment variable TRACE_LITELLM_MODEL. When using Azure models via token provider, you can set the Azure token provider scope through the environment variable AZURE_TOKEN_PROVIDER_SCOPE.
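For example (a sketch; the credential variable name follows the LiteLLM documentation for OpenAI models, and the constructor is assumed to take no required arguments):
import os
os.environ["OPENAI_API_KEY"] = "sk-..."            # provider credential, per the LiteLLM docs
os.environ["TRACE_LITELLM_MODEL"] = "gpt-4o-mini"  # optional default model name for Trace
llm = LiteLLM()  # or select this backend via LLM(backend="LiteLLM"), see the LLM entry below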
CustomLLM ¶
Bases: AbstractModel
This is for custom servers whose API endpoints are OpenAI-compatible. Such servers include the LiteLLM proxy server.
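One way to use it is through the profile mechanism documented under LLMFactory below (the profile and model names here are illustrative):
LLMFactory.register_profile("local_proxy", "CustomLLM", model="llama-3.1-8b")
local_llm = LLM(profile="local_proxy")  # the server address/credentials are configured separately (e.g., via environment variables)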
LLMFactory ¶
Factory for creating LLM instances with predefined profiles.
The code comes with these built-in profiles:
llm_default = LLM(profile="default") # gpt-4o-mini
llm_premium = LLM(profile="premium") # gpt-4
llm_cheap = LLM(profile="cheap") # gpt-4o-mini
llm_fast = LLM(profile="fast") # gpt-3.5-turbo-mini
llm_reasoning = LLM(profile="reasoning") # o1-mini
You can override those built-in profiles:
LLMFactory.register_profile("default", "LiteLLM", model="gpt-4o", temperature=0.5)
LLMFactory.register_profile("premium", "LiteLLM", model="o1-preview", max_tokens=8000)
LLMFactory.register_profile("cheap", "LiteLLM", model="gpt-3.5-turbo", temperature=0.9)
LLMFactory.register_profile("fast", "LiteLLM", model="gpt-3.5-turbo", max_tokens=500)
LLMFactory.register_profile("reasoning", "LiteLLM", model="o1-preview")
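After re-registering a profile, newly constructed instances should pick up the overridden settings:
llm_default = LLM(profile="default")  # now backed by gpt-4o with temperature=0.5, per the override above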
An example of using different backends:
# Register custom profiles for different use cases
LLMFactory.register_profile("advanced_reasoning", "LiteLLM", model="o1-preview", max_tokens=4000)
LLMFactory.register_profile("claude_sonnet", "LiteLLM", model="claude-3-5-sonnet-latest", temperature=0.3)
LLMFactory.register_profile("custom_server", "CustomLLM", model="llama-3.1-8b")
# Use in different contexts
reasoning_llm = LLM(profile="advanced_reasoning") # For complex reasoning
claude_llm = LLM(profile="claude_sonnet") # For Claude responses
local_llm = LLM(profile="custom_server") # For local deployment
# Single LLM optimizer with custom profile
optimizer1 = OptoPrime(parameters, llm=LLM(profile="advanced_reasoning"))
# Multi-LLM optimizer with multiple profiles
optimizer2 = OptoPrimeMulti(parameters, llm_profiles=["cheap", "premium", "claude_sonnet"], generation_technique="multi_llm")
DummyLLM ¶
LLM ¶
A unified entry point for all supported LLM backends.
Usage:
# pick by env var (default: LiteLLM)
llm = LLM()
# or override explicitly
llm = LLM(backend="AutoGen", config_list=my_configs)
# or use predefined profiles
llm = LLM(profile="premium")    # Use premium model
llm = LLM(profile="cheap")      # Use cheaper model
llm = LLM(profile="reasoning")  # Use reasoning/thinking model
auto_construct_oai_config_list_from_env ¶
Collect various API keys saved in the environment and return a format like: [{"model": "gpt-4", "api_key": xxx}, {"model": "claude-3.5-sonnet", "api_key": xxx}]
Note this is a lazy function that defaults to gpt-4o and claude-3.5-sonnet. If you want to specify your own model, please provide an OAI_CONFIG_LIST in the environment or as a file.
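A sketch of the intended use (the exact environment variables the function inspects are an assumption based on common provider conventions):
import os
os.environ["OPENAI_API_KEY"] = "sk-..."         # assumed to produce a gpt-4o entry
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-..."  # assumed to produce a claude-3.5-sonnet entry
config_list = auto_construct_oai_config_list_from_env()
# e.g. [{"model": "gpt-4o", "api_key": "sk-..."},
#       {"model": "claude-3.5-sonnet", "api_key": "sk-ant-..."}]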