Example selectors#

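When writing a prompt, we often include a few examples to help the LLM produce the answer we want. These examples can be hard-coded into the prompt, or they can be chosen dynamically with LangChain's Example Selectors.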

An example selector decides which examples to use for a given input, and those examples are then formatted into the prompt.

Official tutorial: https://python.langchain.com/v0.2/docs/how_to/#example-selectors

Dynamic selection needs a selection strategy. The available example selector types are:

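| Type | Description |
| --- | --- |
| Similarity | Uses semantic similarity between the input and the examples to decide which examples to choose. |
| MMR (maximal marginal relevance) | Uses maximal marginal relevance between the input and the examples to decide which examples to choose. |
| Length | Selects examples based on how many can fit within a given length. |
| Ngram | Uses ngram overlap between the input and the examples to decide which examples to choose. |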

Selecting examples by length (Length)#

LengthBasedExampleSelector selects examples so that they fit within a length budget.

from langchain_core.example_selectors import LengthBasedExampleSelector
from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate

# Examples (English input, Chinese output)
examples = [
    {"input": "hi", "output": "嘿"},
    {"input": "bye", "output": "拜拜"},
    {"input": "soccer", "output": "足球"},
    {"input":"The selector allows for a threshold score to be set","output": "选择器允许设置阈值分数"},
    {"input":"Examples with an ngram overlap score less than or equal to the threshold are excluded.","output": "排除ngram重叠得分小于或等于阈值的示例。"}
]
example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="""
Input: {input}
Output: {output}""",
)
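# Length-based selector
example_selector = LengthBasedExampleSelector(
    examples=examples,  # The examples available for selection
    example_prompt=example_prompt,  # The PromptTemplate used to format each example
    # Maximum length of the formatted examples.
    # Length is measured by get_text_length; the input's length counts against this budget too.
    max_length=25,
    # get_text_length returns the length of a string and decides which examples fit.
    # The built-in default counts words; pass your own function if that does not suit your text.
    # get_text_length: Callable[[str], int] = lambda x: len(re.split("\n| ", x))
)

# FewShotPromptTemplate: includes the selected examples in the prompt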
dynamic_prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="将英文翻译中文",
    suffix="Input: {word}\nOutput:",
    input_variables=["word"],
)
res = dynamic_prompt.invoke({
    "word":"today is good day"
})
print(res.to_string())

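As shown above, the examples are now embedded in the prompt. The selector counts words (splitting on spaces and newlines), subtracts the length of the input from max_length, and then adds examples in order until the next one no longer fits. With max_length=25, the three short examples fit and the two long ones are left out. Now reduce max_length to 10: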

example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="""
Input: {input}
Output: {output}""",
)
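# Length-based selector
example_selector = LengthBasedExampleSelector(
    examples=examples,  # The examples available for selection
    example_prompt=example_prompt,  # The PromptTemplate used to format each example
    # Maximum length of the formatted examples.
    # Length is measured by get_text_length; the input's length counts against this budget too.
    max_length=10,
    # get_text_length returns the length of a string and decides which examples fit.
    # The built-in default counts words; pass your own function if that does not suit your text.
    # get_text_length: Callable[[str], int] = lambda x: len(re.split("\n| ", x))
)

# FewShotPromptTemplate: includes the selected examples in the prompt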
dynamic_prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="将英文翻译中文",
    suffix="Input: {word}\nOutput:",
    input_variables=["word"],
)
res = dynamic_prompt.invoke({
    "word":"today is good day"
})
print(res.to_string())

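With max_length set to 10, fewer examples are included. A long input has the same effect, because the input itself uses up part of the length budget: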

example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="""
Input: {input}
Output: {output}""",
)
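# Length-based selector
example_selector = LengthBasedExampleSelector(
    examples=examples,  # The examples available for selection
    example_prompt=example_prompt,  # The PromptTemplate used to format each example
    # Maximum length of the formatted examples.
    # Length is measured by get_text_length; the input's length counts against this budget too.
    max_length=25,
    # get_text_length returns the length of a string and decides which examples fit.
    # The built-in default counts words; pass your own function if that does not suit your text.
    # get_text_length: Callable[[str], int] = lambda x: len(re.split("\n| ", x))
)

# FewShotPromptTemplate: includes the selected examples in the prompt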
dynamic_prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="将英文翻译中文",
    suffix="Input: {word}\nOutput:",
    input_variables=["word"],
)
res = dynamic_prompt.invoke({
    "word":"One common prompting technique for achieving better performance is to include examples as part of the prompt."
})
print(res.to_string())
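
To make the three runs above easier to reason about, here is a minimal pure-Python sketch of the length-based selection rule. It does not call LangChain; the helper names (word_count, format_example, select_by_length) are made up for illustration, and the logic is my reading of how the selector behaves: count words, subtract the input's length from the budget, then take examples in order while they still fit.

```python
import re


def word_count(text: str) -> int:
    # Same rule as the commented-out default above: split on newlines and spaces.
    return len(re.split("\n| ", text))


def format_example(example: dict) -> str:
    # Mirrors the example_prompt template used in this section.
    return f"\nInput: {example['input']}\nOutput: {example['output']}"


def select_by_length(examples, user_input, max_length, get_text_length=word_count):
    # The input is counted first; whatever is left is the budget for examples.
    remaining = max_length - get_text_length(user_input)
    selected = []
    for example in examples:
        cost = get_text_length(format_example(example))
        # Stop at the first example that does not fit (order matters).
        if remaining <= 0 or cost > remaining:
            break
        selected.append(example)
        remaining -= cost
    return selected


examples = [
    {"input": "hi", "output": "嘿"},
    {"input": "bye", "output": "拜拜"},
    {"input": "soccer", "output": "足球"},
    {"input": "The selector allows for a threshold score to be set",
     "output": "选择器允许设置阈值分数"},
    {"input": "Examples with an ngram overlap score less than or equal to the threshold are excluded.",
     "output": "排除ngram重叠得分小于或等于阈值的示例。"},
]

short_input = "today is good day"
long_input = ("One common prompting technique for achieving better performance "
              "is to include examples as part of the prompt.")

print(len(select_by_length(examples, short_input, 25)))  # 3
print(len(select_by_length(examples, short_input, 10)))  # 1
print(len(select_by_length(examples, long_input, 25)))   # 1
```

Note that a Chinese output with no spaces counts as a single word under this rule, which is one reason you might want to pass your own get_text_length.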

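Selecting examples by similarity (Similarity)#

SemanticSimilarityExampleSelector selects examples by the semantic similarity between the input and the examples. It works by embedding every example up front. At selection time it embeds the input as well, then returns the k examples whose embeddings are most similar to it (k=1 below, so only the single closest example).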

from langchain_chroma import Chroma
from langchain_core.example_selectors import SemanticSimilarityExampleSelector
from langchain_core.prompts import FewShotPromptTemplate
from langchain_openai import OpenAIEmbeddings

example_selector = SemanticSimilarityExampleSelector.from_examples(
    examples,
    OpenAIEmbeddings(),
    Chroma,
    k=1,
)
similar_prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="将英文翻译中文",
    suffix="Input: {word}\nOutput:",
    input_variables=["word"],
)
# "hello" is a greeting, so the most similar example is expected to be the "hi" one
print(similar_prompt.format(word="hello"))
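
The embed-then-compare idea can be sketched without an embedding model or a vector store. The snippet below is only an illustration: the 3-dimensional vectors are hand-written stand-ins for real embeddings, and cosine_similarity and select_most_similar are hypothetical helper names, not LangChain APIs.

```python
import math


def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: dot product over the product of norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings" for three example inputs (made-up numbers, not model output).
toy_embeddings = {
    "hi": [0.9, 0.1, 0.0],
    "bye": [0.6, 0.7, 0.1],
    "soccer": [0.0, 0.1, 0.95],
}


def select_most_similar(query_vector, embeddings, k=1):
    # Rank every example by similarity to the query and keep the top k.
    ranked = sorted(
        embeddings,
        key=lambda name: cosine_similarity(query_vector, embeddings[name]),
        reverse=True,
    )
    return ranked[:k]


# Toy vector standing in for the embedding of "hello".
hello_vector = [0.85, 0.2, 0.05]

print(select_most_similar(hello_vector, toy_embeddings, k=1))  # ['hi']
print(select_most_similar(hello_vector, toy_embeddings, k=2))  # ['hi', 'bye']
```

A real vector store does the same ranking, but over model-generated embeddings and usually with an index so that it does not have to compare against every example.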

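Selecting examples by maximal marginal relevance (MMR)#

MaxMarginalRelevanceExampleSelector selects examples using MMR.

  • MMR is a method commonly used in information retrieval; its goal is to balance relevance and diversity.

  • MMR first finds the example most similar to the input (the one with the highest cosine similarity).

  • Then, as it iteratively adds examples, it penalizes candidates that are too close (too similar) to the examples already selected.

  • As a result, the selected examples are both highly relevant to the input and sufficiently different from one another.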

from langchain_chroma import Chroma
from langchain_core.example_selectors import MaxMarginalRelevanceExampleSelector
from langchain_core.prompts import FewShotPromptTemplate
from langchain_openai import OpenAIEmbeddings
from langchain.globals import set_debug

# Turn on verbose debug logging
set_debug(True)

example_selector = MaxMarginalRelevanceExampleSelector.from_examples(
    # The list of examples available to select from.
    examples,
    # The embedding class used to produce embeddings which are used to measure semantic similarity.
    OpenAIEmbeddings(),
    # The VectorStore class that is used to store the embeddings and do a similarity search over.
    Chroma,
    # The number of examples to produce.
    k=2,
)

mmr_prompt = FewShotPromptTemplate(
    # We provide an ExampleSelector instead of examples.
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="将英文翻译中文",
    suffix="Input: {word}\nOutput:",
    input_variables=["word"],
)

res = mmr_prompt.invoke({"word":"hello"})
print(res.to_string())
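
The greedy procedure described in the bullets above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the library's code: the 2-dimensional vectors are invented, "hey" is a hypothetical near-duplicate of "hi" added to make the effect visible, and lambda_mult is the usual name for the relevance/diversity weight (0.5 weighs both equally).

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_select(query, candidates, k=2, lambda_mult=0.5):
    # Greedy MMR: each step picks the candidate with the best trade-off between
    # similarity to the query and similarity to what has already been picked.
    remaining = dict(candidates)
    selected = []
    while remaining and len(selected) < k:
        def mmr_score(name):
            relevance = cosine_similarity(query, remaining[name])
            redundancy = max(
                (cosine_similarity(remaining[name], candidates[s]) for s in selected),
                default=0.0,  # nothing selected yet, so no penalty on the first pick
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        del remaining[best]
    return selected


# Made-up vectors: "hi" and "hey" point in almost the same direction.
toy_vectors = {
    "hi": [1.0, 0.1],
    "hey": [1.0, 0.0],
    "bye": [0.3, 0.9],
}
query_vector = [0.9, 0.3]

print(mmr_select(query_vector, toy_vectors, k=2, lambda_mult=0.5))  # ['hi', 'bye']
print(mmr_select(query_vector, toy_vectors, k=2, lambda_mult=1.0))  # ['hi', 'hey']
```

With lambda_mult=1.0 the redundancy penalty vanishes and the result is plain top-k similarity; with 0.5 the near-duplicate "hey" is skipped in favor of the less similar but more diverse "bye".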

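Selecting examples by ngram overlap#

NGramOverlapExampleSelector selects and orders examples by how similar they are to the input, measured by an ngram overlap score. The score is a float between 0.0 and 1.0, inclusive. The selector also accepts a threshold: examples whose score is less than or equal to the threshold are excluded. The default threshold is -1.0, so no examples are excluded and they are only reordered. Setting the threshold to 0.0 excludes examples that have no ngram overlap with the input.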

from langchain_community.example_selectors import NGramOverlapExampleSelector

example_selector = NGramOverlapExampleSelector(
    # The examples it has available to choose from.
    examples=examples,
    # The PromptTemplate being used to format the examples.
    example_prompt=example_prompt,
    # The threshold, at which selector stops.
    # It is set to -1.0 by default.
    threshold=-1.0,
    # For negative threshold:
    # Selector sorts examples by ngram overlap score, and excludes none.
    # For threshold greater than 1.0:
    # Selector excludes all examples, and returns an empty list.
    # For threshold equal to 0.0:
    # Selector sorts examples by ngram overlap score,
    # and excludes those with no ngram overlap with input.
)
dynamic_prompt = FewShotPromptTemplate(
    # We provide an ExampleSelector instead of examples.
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="将英文翻译中文",
    suffix="Input: {word}\nOutput:",
    input_variables=["word"],
)
res = dynamic_prompt.invoke({"word":"Setting the threshold to 0.0 will exclude examples that have no ngram overlaps with the input."})
print(res.to_string())
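
The threshold behavior can be tried out with a much simpler score than the one the library uses. The sketch below swaps in a plain word-overlap ratio (shared unique words divided by the example's unique words) as a stand-in; the real selector's scoring differs, so the numbers here are only illustrative. The helper names tokens, overlap_score, and select_by_overlap are hypothetical.

```python
import re


def tokens(text):
    # Lowercase and keep runs of word characters only.
    return re.findall(r"\w+", text.lower())


def overlap_score(source, example_text):
    # Simplified stand-in score in [0.0, 1.0]: fraction of the example's
    # unique words that also occur in the source.
    src, ex = set(tokens(source)), set(tokens(example_text))
    if not ex:
        return 0.0
    return len(src & ex) / len(ex)


def select_by_overlap(example_inputs, user_input, threshold=-1.0):
    scored = [(overlap_score(user_input, text), text) for text in example_inputs]
    # Examples scoring less than or equal to the threshold are excluded.
    kept = [pair for pair in scored if pair[0] > threshold]
    # Highest score first; Python's sort is stable, so ties keep their order.
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in kept]


example_inputs = [
    "hi",
    "bye",
    "soccer",
    "The selector allows for a threshold score to be set",
    "Examples with an ngram overlap score less than or equal to the threshold are excluded.",
]
user_input = ("Setting the threshold to 0.0 will exclude examples "
              "that have no ngram overlaps with the input.")

print(len(select_by_overlap(example_inputs, user_input, threshold=-1.0)))  # 5 (reordered only)
print(len(select_by_overlap(example_inputs, user_input, threshold=0.0)))   # 2 (no-overlap examples dropped)
print(len(select_by_overlap(example_inputs, user_input, threshold=1.5)))   # 0 (nothing can exceed 1.0)
```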

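Custom example selector#

A custom selector must subclass BaseExampleSelector and implement the select_examples method (add_example is part of the same interface). The example below picks the example whose input is closest in length to the input word.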

from langchain_core.example_selectors.base import BaseExampleSelector


class CustomExampleSelector(BaseExampleSelector):
    def __init__(self, examples):
        self.examples = examples

    def add_example(self, example):
        self.examples.append(example)

    def select_examples(self, input_variables):
        # This assumes the input variables contain an 'input' key
        new_word = input_variables["input"]
        new_word_length = len(new_word)

        # Initialize variables to store the best match and its length difference
        best_match = None
        smallest_diff = float("inf")

        # Iterate through each example
        for example in self.examples:
            # Calculate the length difference between the example's input and the new word
            current_diff = abs(len(example["input"]) - new_word_length)

            # Update the best match if the current one is closer in length
            if current_diff < smallest_diff:
                smallest_diff = current_diff
                best_match = example

        return [best_match]

# Create the custom selector
example_selector = CustomExampleSelector(examples)
example_selector.select_examples({"input": "okay"})

Now add another example:

example_selector.add_example({"input": "forward", "output": "前进的"})
example_selector.select_examples({"input": "okay"})
# Plug the selector into a prompt
prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="将英文翻译中文",
    suffix="Input: {input}\nOutput:",
    input_variables=["input"]
)
res = prompt.invoke({"input":"text"})
print(res.to_string())

That concludes this chapter.