opto.trainer.guide ¶
Guide ¶
Base class for all guides that provide feedback on content.
A Guide evaluates generated content and provides feedback to help improve it. Different implementations may use different evaluation strategies, such as LLM-based comparison, keyword matching, or custom verification.
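A custom guide can be written by subclassing Guide. The sketch below is a minimal, hypothetical keyword-matching guide; it assumes that subclasses implement the same get_feedback(query, response, reference, **kwargs) -> Tuple[float, str] interface documented for LLMJudge below, which may differ from the actual base-class contract.

from typing import Optional, Tuple

from opto.trainer.guide import Guide

class KeywordGuide(Guide):
    # Hypothetical guide: checks that the reference keywords appear in the response.

    def get_feedback(
        self,
        query: str,
        response: str,
        reference: Optional[str] = None,
        **kwargs,
    ) -> Tuple[float, str]:
        # Treat the whitespace-separated tokens of the reference as required keywords.
        keywords = (reference or "").split()
        missing = [kw for kw in keywords if kw.lower() not in response.lower()]
        if not missing:
            return 1.0, "All expected keywords are present."
        return 0.0, "Missing expected keywords: " + ", ".join(missing)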
LLMJudge ¶
LLMJudge(
model: Optional[str] = None,
llm: Optional[AbstractModel] = None,
prompt_template: Optional[str] = None,
system_prompt: Optional[str] = None,
correctness_template: Optional[str] = None,
use_formatted_response: bool = True,
)
Bases: Guide
This is a combined metric + feedback guide that asks an LLM to provide a binary judgment (True/False) and, if the judgment is False, to provide feedback.
This is an implementation of LLM-as-a-judge.
Initialize the LLMJudge with an LLM and prompt templates.
Args:
    model: The name of the LLM model to use (if llm is not provided)
    llm: An instance of AbstractModel to use for generating feedback
    prompt_template: Custom prompt template with {response} and {reference} placeholders
    system_prompt: Custom system prompt for the LLM
    correctness_template: Template to use when the response is deemed correct by the metric
    use_formatted_response: Whether to format the response with additional context; if False, the raw LLM response is returned
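As a usage sketch (the model identifier below is a placeholder, not something this page documents), an LLMJudge can be built from a model name, or from an existing AbstractModel instance via llm=..., with any of the templates overridden:

from opto.trainer.guide import LLMJudge

# Judge built from a model name; "gpt-4o-mini" is only a placeholder.
judge = LLMJudge(model="gpt-4o-mini")

# Overriding the system prompt and returning the raw LLM response as feedback.
strict_judge = LLMJudge(
    model="gpt-4o-mini",
    system_prompt="You are a strict grader. Judge only factual correctness.",
    use_formatted_response=False,
)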
DEFAULT_CORRECTNESS_TEMPLATE class-attribute instance-attribute ¶
DEFAULT_INCORRECTNESS_TEMPLATE class-attribute instance-attribute ¶
DEFAULT_PROMPT_TEMPLATE class-attribute instance-attribute ¶
DEFAULT_PROMPT_TEMPLATE = "The query is: {query}.\n\n\nThe student answered: {response}.\n\n\nThe correct answer is: {reference}.\n\n\nReason whether the student answer is correct. If the student answer is correct, please say {correctness_template}. Otherwise, if the student answer is incorrect, say {incorrectness_template} and provide feedback to the student. The feedback should be specific and actionable."
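The constructor docstring mentions only the {response} and {reference} placeholders for a custom prompt_template, while the default template also uses {query}, {correctness_template}, and {incorrectness_template}; the sketch below follows the default's placeholder set, which is an assumption.

from opto.trainer.guide import LLMJudge

judge = LLMJudge(
    model="gpt-4o-mini",  # placeholder model name
    prompt_template=(
        "Question: {query}\n"
        "Student answer: {response}\n"
        "Reference answer: {reference}\n"
        "If the answer is correct, reply {correctness_template}; otherwise reply "
        "{incorrectness_template} followed by concrete, actionable feedback."
    ),
)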
DEFAULT_SYSTEM_PROMPT class-attribute instance-attribute ¶
correctness_template instance-attribute ¶
get_feedback ¶
get_feedback(
query: str,
response: str,
reference: Optional[str] = None,
**kwargs
) -> Tuple[float, str]
Get LLM-generated feedback by comparing the response with the reference information.
Args:
    query: The query to analyze (e.g., user query, task, etc.)
    response: The response generated by the LLM (e.g., student answer, code, etc.)
    reference: The expected information or correct answer
    **kwargs: Additional parameters (unused in this implementation)
Returns:
    score: A float provided by the metric function
    feedback: A string containing the LLM-generated feedback
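A call might look like the following sketch; the query, response, and reference strings are invented for illustration, and the exact score values depend on the underlying metric:

from opto.trainer.guide import LLMJudge

judge = LLMJudge(model="gpt-4o-mini")  # placeholder model name

score, feedback = judge.get_feedback(
    query="What is the capital of France?",
    response="The capital of France is Lyon.",
    reference="Paris",
)

# score is the metric value (a float); feedback is the LLM-generated
# explanation that can be passed back to the component being trained.
print(score, feedback)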