Example selectors

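When writing a prompt, we often include a few examples to help the LLM produce the answer we want. The examples can be hard-coded into the prompt, or they can be chosen dynamically with the Example Selectors that LangChain provides.

An example selector decides at run time which examples are formatted into the prompt.

Official tutorial: https://python.langchain.com/v0.2/docs/how_to/#example-selectors

Since the selection is dynamic, there are several selection strategies. The example selector types are: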

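Type	Description
Similarity	Uses the semantic similarity between the input and the examples to decide which examples to select.
MMR (maximal marginal relevance)	Uses maximal marginal relevance between the input and the examples to decide which examples to select.
Length	Selects examples based on how many fit within a given length.
Ngram	Uses ngram overlap between the input and the examples to decide which examples to select.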

Selecting examples by length (Length)

LengthBasedExampleSelector selects examples according to a length limit.

from langchain_core.example_selectors import LengthBasedExampleSelector
from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate

# Examples (English input, Chinese output)
examples = [
    {"input": "hi", "output": "嘿"},
    {"input": "bye", "output": "拜拜"},
    {"input": "soccer", "output": "足球"},
    {"input":"The selector allows for a threshold score to be set","output": "选择器允许设置阈值分数"},
    {"input":"Examples with an ngram overlap score less than or equal to the threshold are excluded.","output": "排除ngram重叠得分小于或等于阈值的示例。"}
]

example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="""
Input: {input}
Output: {output}""",
)
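# Length-based selector
example_selector = LengthBasedExampleSelector(
    examples=examples,  # the pool of examples to choose from
    example_prompt=example_prompt,  # template used to format each example
    # Length budget for the formatted examples
    # (the length of the input itself is subtracted from it).
    # Length is measured by the get_text_length function.
    max_length=25,
    # get_text_length returns the length of a string and decides which examples fit.
    # The built-in default counts words; pass your own function if that does not suit you.
    # get_text_length: Callable[[str], int] = lambda x: len(re.split("\n| ", x))
)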

# FewShotPromptTemplate: includes the selected examples in the prompt
dynamic_prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="将英文翻译中文",  # "Translate English into Chinese"
    suffix="Input: {word}\nOutput:",
    input_variables=["word"],
)
res = dynamic_prompt.invoke({
    "word":"today is good day"
})
print(res.to_string())

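As shown above, the examples are now embedded in the prompt. max_length is 25, and by default length is counted in words (splitting on spaces and newlines). The input uses 4 of those words and each short example uses 5, so the first three examples fit; the fourth needs 14 words, which is more than what is left, so selection stops there. Now change max_length to 10: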

example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="""
Input: {input}
Output: {output}""",
)
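# Length-based selector
example_selector = LengthBasedExampleSelector(
    examples=examples,  # the pool of examples to choose from
    example_prompt=example_prompt,  # template used to format each example
    # Length budget for the formatted examples
    # (the length of the input itself is subtracted from it).
    # Length is measured by the get_text_length function.
    max_length=10,
    # get_text_length returns the length of a string and decides which examples fit.
    # The built-in default counts words; pass your own function if that does not suit you.
    # get_text_length: Callable[[str], int] = lambda x: len(re.split("\n| ", x))
)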

# FewShotPromptTemplate: includes the selected examples in the prompt
dynamic_prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="将英文翻译中文",
    suffix="Input: {word}\nOutput:",
    input_variables=["word"],
)
res = dynamic_prompt.invoke({
    "word":"today is good day"
})
print(res.to_string())

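With max_length set to 10, fewer examples are included. A long input has the same effect, because the input's length counts against the same budget: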

example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="""
Input: {input}
Output: {output}""",
)
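# Length-based selector
example_selector = LengthBasedExampleSelector(
    examples=examples,  # the pool of examples to choose from
    example_prompt=example_prompt,  # template used to format each example
    # Length budget for the formatted examples
    # (the length of the input itself is subtracted from it).
    # Length is measured by the get_text_length function.
    max_length=25,
    # get_text_length returns the length of a string and decides which examples fit.
    # The built-in default counts words; pass your own function if that does not suit you.
    # get_text_length: Callable[[str], int] = lambda x: len(re.split("\n| ", x))
)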

# FewShotPromptTemplate: includes the selected examples in the prompt
dynamic_prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="将英文翻译中文",
    suffix="Input: {word}\nOutput:",
    input_variables=["word"],
)
res = dynamic_prompt.invoke({
    "word":"One common prompting technique for achieving better performance is to include examples as part of the prompt."
})
print(res.to_string())
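
The three results above follow from a simple budget rule. The sketch below is a simplified re-implementation for illustration only, not the library's code: `word_count` and `select_by_length` are hypothetical names, and it assumes the default word-count measure quoted in the comments above. The input's length is subtracted from max_length first, then examples are added in order until the next one no longer fits.

```python
import re

examples = [
    {"input": "hi", "output": "嘿"},
    {"input": "bye", "output": "拜拜"},
    {"input": "soccer", "output": "足球"},
    {"input": "The selector allows for a threshold score to be set",
     "output": "选择器允许设置阈值分数"},
    {"input": "Examples with an ngram overlap score less than or equal to the threshold are excluded.",
     "output": "排除ngram重叠得分小于或等于阈值的示例。"},
]

TEMPLATE = "\nInput: {input}\nOutput: {output}"


def word_count(text: str) -> int:
    # Same measure as the default shown above: split on newlines and spaces.
    return len(re.split("\n| ", text))


def select_by_length(examples, query, max_length):
    """Greedy selection (hypothetical helper): keep examples in order while they fit."""
    remaining = max_length - word_count(query)
    selected = []
    for example in examples:
        cost = word_count(TEMPLATE.format(**example))
        if remaining <= 0 or cost > remaining:
            break  # stop at the first example that does not fit
        selected.append(example)
        remaining -= cost
    return selected


short_query = "today is good day"
long_query = ("One common prompting technique for achieving better performance "
              "is to include examples as part of the prompt.")

print(len(select_by_length(examples, short_query, 25)))  # 3
print(len(select_by_length(examples, short_query, 10)))  # 1
print(len(select_by_length(examples, long_query, 25)))   # 1
```

Because selection stops at the first example that does not fit, the order of the examples list matters: a long example early in the list can block shorter ones after it.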

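Selecting examples by similarity (Similarity)

SemanticSimilarityExampleSelector selects examples by the semantic similarity between the input and the examples. It embeds the examples up front; at selection time it embeds the input as well and looks up the k most similar examples in the vector store.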

from langchain_chroma import Chroma
from langchain_core.example_selectors import SemanticSimilarityExampleSelector
from langchain_core.prompts import FewShotPromptTemplate
from langchain_openai import OpenAIEmbeddings

example_selector = SemanticSimilarityExampleSelector.from_examples(
    examples,
    OpenAIEmbeddings(),
    Chroma,
    k=1,
)
similar_prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="将英文翻译中文",
    suffix="Input: {word}\nOutput:",
    input_variables=["word"],
)

# "hello" is a greeting, so the selector should pick the example closest in meaning (most likely "hi")
print(similar_prompt.format(word="hello"))
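
To make the embed-then-compare idea concrete without calling an embedding API, here is a minimal sketch. The vectors are hand-made stand-ins for real embeddings (an assumption for illustration, not output from OpenAIEmbeddings), and `cosine` and `select_similar` are hypothetical helper names rather than LangChain functions.

```python
import math


def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Hypothetical 3-dimensional "embeddings" keyed by example input.
toy_vectors = {
    "hi": [0.9, 0.1, 0.0],
    "bye": [0.6, 0.5, 0.1],
    "soccer": [0.0, 0.1, 0.9],
}

# Stand-in for the embedding of the query "hello".
query_vector = [1.0, 0.0, 0.1]


def select_similar(query_vec, vectors, k=1):
    # Rank example keys by similarity to the query, highest first, and keep k.
    ranked = sorted(vectors, key=lambda name: cosine(query_vec, vectors[name]), reverse=True)
    return ranked[:k]


print(select_similar(query_vector, toy_vectors, k=1))  # ['hi']
```

A real vector store does the same ranking, only with learned embeddings and an index that avoids comparing against every example.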

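Selecting examples by maximal marginal relevance (MMR)

MaxMarginalRelevanceExampleSelector selects examples using MMR.

MMR is a method widely used in information retrieval; its goal is to balance relevance against diversity.
It first picks the example most similar to the input (the one with the highest cosine similarity).
Then, as it iteratively adds more examples, it penalizes candidates that are too close to (too similar to) the examples already selected.
As a result, the selected examples are highly relevant to the input while remaining sufficiently diverse among themselves.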

from langchain_chroma import Chroma
from langchain_core.example_selectors import  MaxMarginalRelevanceExampleSelector
from langchain_core.prompts import FewShotPromptTemplate
from langchain_openai import OpenAIEmbeddings
from langchain.globals import set_debug

# Turn on debug output (preferred over the older `langchain.debug = True`)
set_debug(True)

example_selector = MaxMarginalRelevanceExampleSelector.from_examples(
    # The list of examples available to select from.
    examples,
    # The embedding class used to produce embeddings which are used to measure semantic similarity.
    OpenAIEmbeddings(),
    # The VectorStore class that is used to store the embeddings and do a similarity search over.
    Chroma,
    # The number of examples to produce.
    k=2,
)

mmr_prompt = FewShotPromptTemplate(
    # We provide an ExampleSelector instead of examples.
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="将英文翻译中文",
    suffix="Input: {word}\nOutput:",
    input_variables=["word"],
)

res = mmr_prompt.invoke({"word":"hello"})
print(res.to_string())
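
The relevance/diversity trade-off can be sketched in a few lines. This is an illustrative implementation of the usual MMR scoring rule, not the code LangChain runs: the vectors are hand-made stand-ins for embeddings, and `mmr_select`, `lambda_mult`, and the example keys are assumptions chosen for the demo.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def mmr_select(query_vec, vectors, k=2, lambda_mult=0.5):
    """Pick k keys, trading relevance to the query against similarity to earlier picks."""
    selected = []
    candidates = list(vectors)
    while candidates and len(selected) < k:
        def score(name):
            relevance = cosine(query_vec, vectors[name])
            # Penalty: similarity to the closest already-selected example.
            redundancy = max(
                (cosine(vectors[name], vectors[s]) for s in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical 2-dimensional "embeddings"; "hi" and "hey" are near-duplicates.
toy_vectors = {
    "hi": [0.9, 0.3],
    "hey": [0.9, 0.31],
    "good morning": [0.9, -0.35],
    "soccer": [0.0, 1.0],
}
query_vector = [1.0, 0.0]

# Plain similarity ranking keeps both near-duplicates.
by_similarity = sorted(
    toy_vectors, key=lambda n: cosine(query_vector, toy_vectors[n]), reverse=True
)[:2]
print(by_similarity)                               # ['hi', 'hey']
print(mmr_select(query_vector, toy_vectors, k=2))  # ['hi', 'good morning']
```

With `lambda_mult=1.0` the penalty disappears and the result matches plain similarity ranking; lower values favor diversity more strongly.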

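Selecting examples by ngram overlap

NGramOverlapExampleSelector selects and orders examples by how similar they are to the input, measured by an ngram overlap score. The score is a float between 0.0 and 1.0, inclusive. The selector accepts a threshold: examples whose score is less than or equal to the threshold are excluded. The default threshold is -1.0, so no examples are excluded and they are only reordered. Setting the threshold to 0.0 excludes examples that have no ngram overlap with the input.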

from langchain_community.example_selectors import NGramOverlapExampleSelector

example_selector = NGramOverlapExampleSelector(
    # The examples it has available to choose from.
    examples=examples,
    # The PromptTemplate being used to format the examples.
    example_prompt=example_prompt,
    # The threshold, at which selector stops.
    # It is set to -1.0 by default.
    threshold=-1.0,
    # For negative threshold:
    # Selector sorts examples by ngram overlap score, and excludes none.
    # For threshold greater than 1.0:
    # Selector excludes all examples, and returns an empty list.
    # For threshold equal to 0.0:
    # Selector sorts examples by ngram overlap score,
    # and excludes those with no ngram overlap with input.
)
dynamic_prompt = FewShotPromptTemplate(
    # We provide an ExampleSelector instead of examples.
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="将英文翻译中文",
    suffix="Input: {word}\nOutput:",
    input_variables=["word"],
)
res = dynamic_prompt.invoke({"word":"Setting the threshold to 0.0 will exclude examples that have no ngram overlaps with the input."})
print(res.to_string())
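
The threshold behavior described above can be tried out with a much simpler stand-in score. The sketch below uses the fraction of an example's words that also occur in the query; this is an assumption made for illustration and is not the score the library computes. `unigram_overlap` and `select_by_overlap` are hypothetical names.

```python
examples = [
    {"input": "hi", "output": "嘿"},
    {"input": "bye", "output": "拜拜"},
    {"input": "soccer", "output": "足球"},
    {"input": "The selector allows for a threshold score to be set",
     "output": "选择器允许设置阈值分数"},
    {"input": "Examples with an ngram overlap score less than or equal to the threshold are excluded.",
     "output": "排除ngram重叠得分小于或等于阈值的示例。"},
]


def unigram_overlap(query: str, text: str) -> float:
    # Stand-in score in [0.0, 1.0]: share of the example's words found in the query.
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    if not text_words:
        return 0.0
    return len(query_words & text_words) / len(text_words)


def select_by_overlap(examples, query, threshold=-1.0):
    # Drop examples scoring <= threshold, then sort the rest by score, highest first.
    scored = [(unigram_overlap(query, ex["input"]), ex) for ex in examples]
    kept = [pair for pair in scored if pair[0] > threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep list order
    return [ex for _, ex in kept]


query = ("Setting the threshold to 0.0 will exclude examples "
         "that have no ngram overlaps with the input.")

print(len(select_by_overlap(examples, query, threshold=-1.0)))  # 5 (reordered only)
print(len(select_by_overlap(examples, query, threshold=0.0)))   # 2 (no-overlap examples dropped)
print(len(select_by_overlap(examples, query, threshold=1.0)))   # 0 (everything excluded)
```

The three calls line up with the three cases in the code comments above: a negative threshold only reorders, 0.0 drops examples with no overlap, and a threshold at or above the maximum score returns an empty list.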

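Custom example selector

A custom selector must inherit from BaseExampleSelector and implement the select_examples method (and add_example). The example below picks the example whose input is closest in length to the new input word.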

from langchain_core.example_selectors.base import BaseExampleSelector


class CustomExampleSelector(BaseExampleSelector):
    def __init__(self, examples):
        self.examples = examples

    def add_example(self, example):
        self.examples.append(example)

    def select_examples(self, input_variables):
        # This assumes the input variables contain an 'input' key
        new_word = input_variables["input"]
        new_word_length = len(new_word)

        # Initialize variables to store the best match and its length difference
        best_match = None
        smallest_diff = float("inf")

        # Iterate through each example
        for example in self.examples:
            # Calculate the length difference between the example's input and the new word
            current_diff = abs(len(example["input"]) - new_word_length)

            # Update the best match if the current one is closer in length
            if current_diff < smallest_diff:
                smallest_diff = current_diff
                best_match = example

        return [best_match]

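# Create the custom selector
example_selector = CustomExampleSelector(examples)
print(example_selector.select_examples({"input": "okay"}))
# -> [{'input': 'bye', 'output': '拜拜'}]  ("bye" is closest in length to "okay")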

Now add another example:

example_selector.add_example({"input": "forward", "output": "前进的"})
print(example_selector.select_examples({"input": "okay"}))

# Plug the selector into a prompt
prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="将英文翻译中文",
    suffix="Input: {input}\nOutput:",
    input_variables=["input"]
)
res = prompt.invoke({"input":"text"})
print(res.to_string())

That concludes this chapter.