Example selectors#
When writing a prompt, we often include a few examples to steer the LLM toward the answer we want. The examples can be hard-coded directly into the prompt, but LangChain also provides Example Selectors that choose them dynamically.
An Example selector formats examples into the prompt and decides, at run time, which ones to include.
Official tutorial: https://python.langchain.com/v0.2/docs/how_to/#example-selectors
Since the selection is dynamic, there has to be a selection strategy. The built-in Example selector types are:
| Type | Description |
|---|---|
| Similarity | Uses semantic similarity between the input and the examples to decide which examples to select. |
| MMR (Maximal Marginal Relevance) | Uses maximal marginal relevance to decide which examples to select. |
| Length | Selects as many examples as fit within a given length. |
| Ngram | Uses ngram overlap to decide which examples to select. |
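All four selectors expose the same two operations. A simplified sketch of the shared interface (mirroring `langchain_core`'s `BaseExampleSelector`, which additionally defines async variants):

```python
from abc import ABC, abstractmethod


# Simplified sketch of the interface every example selector implements.
class ExampleSelectorSketch(ABC):
    @abstractmethod
    def add_example(self, example: dict) -> None:
        """Store a new example for future selection."""

    @abstractmethod
    def select_examples(self, input_variables: dict) -> list[dict]:
        """Given the prompt inputs, return the examples to include."""
```

The selector types below differ only in how `select_examples` ranks and filters the stored examples.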
Selecting examples by length (Length)#
LengthBasedExampleSelector selects examples so that they fit within a length limit.
```python
from langchain_core.example_selectors import LengthBasedExampleSelector
from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate

# Examples
examples = [
    {"input": "hi", "output": "嘿"},
    {"input": "bye", "output": "拜拜"},
    {"input": "soccer", "output": "足球"},
    {"input": "The selector allows for a threshold score to be set", "output": "选择器允许设置阈值分数"},
    {"input": "Examples with an ngram overlap score less than or equal to the threshold are excluded.", "output": "排除ngram重叠得分小于或等于阈值的示例。"},
]

example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="""
Input: {input}
Output: {output}""",
)

# Length-based selector
example_selector = LengthBasedExampleSelector(
    examples=examples,              # the examples to choose from
    example_prompt=example_prompt,  # how each example is formatted
    # Maximum length of the formatted examples,
    # measured by get_text_length.
    max_length=25,
    # get_text_length computes the length used to decide which examples fit.
    # If the default whitespace tokenization does not suit you, override it:
    # get_text_length: Callable[[str], int] = lambda x: len(re.split("\n| ", x))
)

# FewShotPromptTemplate embeds the selected examples in the prompt
dynamic_prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="将英文翻译中文",
    suffix="Input: {word}\nOutput:",
    input_variables=["word"],
)

res = dynamic_prompt.invoke({
    "word": "today is good day"
})
print(res.to_string())
```
As shown above, the examples are embedded in the prompt: max_length is 25 and all of the examples together stay under that limit, so every example is included. Now change max_length to 10:
```python
example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="""
Input: {input}
Output: {output}""",
)

# Length-based selector
example_selector = LengthBasedExampleSelector(
    examples=examples,              # the examples to choose from
    example_prompt=example_prompt,  # how each example is formatted
    # Maximum length of the formatted examples,
    # measured by get_text_length.
    max_length=10,
    # get_text_length computes the length used to decide which examples fit;
    # override it if the default whitespace tokenization does not suit you:
    # get_text_length: Callable[[str], int] = lambda x: len(re.split("\n| ", x))
)

# FewShotPromptTemplate embeds the selected examples in the prompt
dynamic_prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="将英文翻译中文",
    suffix="Input: {word}\nOutput:",
    input_variables=["word"],
)

res = dynamic_prompt.invoke({
    "word": "today is good day"
})
print(res.to_string())
```
With max_length reduced to 10, fewer examples are included. A longer input has the same effect, since the input also counts toward the limit:
```python
example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="""
Input: {input}
Output: {output}""",
)

# Length-based selector
example_selector = LengthBasedExampleSelector(
    examples=examples,              # the examples to choose from
    example_prompt=example_prompt,  # how each example is formatted
    # Maximum length of the formatted examples,
    # measured by get_text_length.
    max_length=25,
    # get_text_length computes the length used to decide which examples fit;
    # override it if the default whitespace tokenization does not suit you:
    # get_text_length: Callable[[str], int] = lambda x: len(re.split("\n| ", x))
)

# FewShotPromptTemplate embeds the selected examples in the prompt
dynamic_prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="将英文翻译中文",
    suffix="Input: {word}\nOutput:",
    input_variables=["word"],
)

res = dynamic_prompt.invoke({
    "word": "One common prompting technique for achieving better performance is to include examples as part of the prompt."
})
print(res.to_string())
```
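To make the length accounting concrete, here is a plain-Python sketch of the default word counting and the greedy fitting that LengthBasedExampleSelector performs. It is simplified: the real selector also subtracts the length of the new input from the budget, which this sketch ignores.

```python
import re


# The default get_text_length: split on newlines and spaces, count the words.
def get_text_length(text: str) -> int:
    return len(re.split("\n| ", text))


def select_by_length(examples, format_example, max_length, get_len=get_text_length):
    """Greedily keep examples while the running word count stays within max_length."""
    selected, remaining = [], max_length
    for ex in examples:
        cost = get_len(format_example(ex))
        if cost > remaining:
            break
        selected.append(ex)
        remaining -= cost
    return selected


examples = [
    {"input": "hi", "output": "hey"},
    {"input": "bye", "output": "bye-bye"},
    {"input": "soccer", "output": "football"},
]
fmt = lambda ex: f"Input: {ex['input']}\nOutput: {ex['output']}"

# Each formatted example costs 4 words, so a budget of 9 fits only two of them.
print(select_by_length(examples, fmt, max_length=9))
```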
Selecting examples by similarity (Similarity)#
SemanticSimilarityExampleSelector selects examples by semantic similarity between the input and the examples.
To implement this, the examples are embedded (featurized) up front; at selection time the input is embedded too, and the most similar examples are looked up.
```python
from langchain_chroma import Chroma
from langchain_core.example_selectors import SemanticSimilarityExampleSelector
from langchain_core.prompts import FewShotPromptTemplate
from langchain_openai import OpenAIEmbeddings

example_selector = SemanticSimilarityExampleSelector.from_examples(
    # The examples to choose from
    examples,
    # The embedding model used to measure semantic similarity
    OpenAIEmbeddings(),
    # The vector store used to store the embeddings and run the similarity search
    Chroma,
    # The number of examples to select
    k=1,
)

similar_prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="将英文翻译中文",
    suffix="Input: {word}\nOutput:",
    input_variables=["word"],
)

# "hello" is semantically closest to the "hi" example
print(similar_prompt.format(word="hello"))
```
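The lookup itself is just nearest-neighbor search over embedding vectors. A toy sketch with hand-made 3-d "embeddings" (the real vectors come from OpenAIEmbeddings and are stored in Chroma):

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Made-up embeddings: "hi" and "bye" point in a similar direction,
# "soccer" in a different one.
examples = [
    {"input": "hi",     "vec": [1.0, 0.1, 0.0]},
    {"input": "bye",    "vec": [0.9, 0.2, 0.1]},
    {"input": "soccer", "vec": [0.0, 0.1, 1.0]},
]


def select_by_similarity(query_vec, examples, k=1):
    """Return the k examples whose vectors are most similar to the query."""
    ranked = sorted(examples, key=lambda ex: cosine(query_vec, ex["vec"]), reverse=True)
    return ranked[:k]


# A query embedded near "hi" selects the "hi" example.
print(select_by_similarity([1.0, 0.0, 0.0], examples, k=1))
```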
Selecting examples by maximal marginal relevance (MMR)#
MaxMarginalRelevanceExampleSelector selects examples using MMR.
MMR is a method commonly used in information retrieval whose goal is to balance relevance against diversity. It first finds the sample most similar to the input (i.e. with the highest cosine similarity). Then, as it iteratively adds samples, it penalizes candidates that are too close (too similar) to samples already selected. MMR thus ensures the selected samples are highly relevant to the input while remaining sufficiently diverse.
```python
from langchain_chroma import Chroma
from langchain_core.example_selectors import MaxMarginalRelevanceExampleSelector
from langchain_core.prompts import FewShotPromptTemplate
from langchain_openai import OpenAIEmbeddings
import langchain

langchain.debug = True

example_selector = MaxMarginalRelevanceExampleSelector.from_examples(
    # The list of examples available to select from.
    examples,
    # The embedding class used to produce embeddings which are used to measure semantic similarity.
    OpenAIEmbeddings(),
    # The VectorStore class that is used to store the embeddings and do a similarity search over.
    Chroma,
    # The number of examples to produce.
    k=2,
)

mmr_prompt = FewShotPromptTemplate(
    # We provide an ExampleSelector instead of examples.
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="将英文翻译中文",
    suffix="Input: {word}\nOutput:",
    input_variables=["word"],
)

res = mmr_prompt.invoke({"word": "hello"})
print(res.to_string())
```
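The selection loop behind MMR can be sketched in a few lines. This toy version uses made-up 2-d vectors and a low lambda_mult so the diversity penalty is clearly visible; the real selector runs the same loop over embedding vectors.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def mmr_select(query, candidates, k=2, lambda_mult=0.3):
    """Iteratively pick candidates, trading off relevance to the query
    against redundancy with what is already selected."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def score(vec):
            relevance = cosine(query, vec)
            redundancy = max((cosine(vec, s) for s in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
candidates = [[0.9, 0.1], [0.89, 0.12], [0.2, 0.9]]
# The second pick skips [0.89, 0.12] (nearly a duplicate of the first pick)
# in favor of the more diverse [0.2, 0.9].
print(mmr_select(query, candidates, k=2, lambda_mult=0.3))
```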
Selecting examples by ngram overlap#
NGramOverlapExampleSelector selects and orders examples by their ngram overlap score with the input. The score is a float between 0.0 and 1.0, inclusive.
The selector also accepts a threshold score: examples scoring less than or equal to the threshold are excluded. By default the threshold is -1.0, so no examples are excluded and they are only reordered. Setting the threshold to 0.0 excludes examples that have no ngram overlap with the input.
```python
from langchain_community.example_selectors import NGramOverlapExampleSelector

example_selector = NGramOverlapExampleSelector(
    # The examples it has available to choose from.
    examples=examples,
    # The PromptTemplate being used to format the examples.
    example_prompt=example_prompt,
    # The threshold at which the selector stops including examples.
    # It is set to -1.0 by default.
    threshold=-1.0,
    # For a negative threshold:
    #   the selector sorts examples by ngram overlap score and excludes none.
    # For a threshold greater than 1.0:
    #   the selector excludes all examples and returns an empty list.
    # For a threshold equal to 0.0:
    #   the selector sorts examples by ngram overlap score
    #   and excludes those with no ngram overlap with the input.
)

dynamic_prompt = FewShotPromptTemplate(
    # We provide an ExampleSelector instead of examples.
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="将英文翻译中文",
    suffix="Input: {word}\nOutput:",
    input_variables=["word"],
)

res = dynamic_prompt.invoke({"word": "Setting the threshold to 0.0 will exclude examples that have no ngram overlaps with the input."})
print(res.to_string())
```
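A rough idea of what the overlap score measures, sketched with unigrams (the real selector computes a sentence-BLEU-style score via NLTK, but the intuition is the same): the more of the input's ngrams an example shares, the higher it ranks, and a threshold can drop zero-overlap examples entirely.

```python
def ngrams(text, n=1):
    """The set of word ngrams in a text."""
    words = text.split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_score(example_text, input_text, n=1):
    """Fraction of the input's ngrams that also appear in the example."""
    ex, inp = ngrams(example_text, n), ngrams(input_text, n)
    if not inp:
        return 0.0
    return len(ex & inp) / len(inp)


examples = ["spot can run fast", "my dog barks", "spot plays fetch"]
inp = "spot can run"

# Rank by overlap; with a threshold of 0.0, zero-overlap examples are dropped.
ranked = sorted(examples, key=lambda e: overlap_score(e, inp), reverse=True)
kept = [e for e in ranked if overlap_score(e, inp) > 0.0]
print(ranked)  # "spot can run fast" ranks first, "my dog barks" last
print(kept)    # "my dog barks" is excluded
```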
Custom example selector#
To build a custom selector, subclass BaseExampleSelector and override the select_examples method.
The selector below picks the example whose input is closest in length to the input word.
```python
from langchain_core.example_selectors.base import BaseExampleSelector


class CustomExampleSelector(BaseExampleSelector):
    def __init__(self, examples):
        self.examples = examples

    def add_example(self, example):
        self.examples.append(example)

    def select_examples(self, input_variables):
        # This assumes the input will contain an 'input' key
        new_word = input_variables["input"]
        new_word_length = len(new_word)

        # Track the best match and its length difference
        best_match = None
        smallest_diff = float("inf")

        # Iterate through each example
        for example in self.examples:
            # Calculate the length difference with the example's input
            current_diff = abs(len(example["input"]) - new_word_length)

            # Update the best match if the current one is closer in length
            if current_diff < smallest_diff:
                smallest_diff = current_diff
                best_match = example

        return [best_match]


# Create the custom selector
example_selector = CustomExampleSelector(examples)
example_selector.select_examples({"input": "okay"})
```
Now add another example:
```python
example_selector.add_example({"input": "forward", "output": "前进的"})
example_selector.select_examples({"input": "okay"})
```
```python
# Plug the selector into a prompt
prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="将英文翻译中文",
    suffix="Input: {input}\nOutput:",
    input_variables=["input"],
)

res = prompt.invoke({"input": "text"})
print(res.to_string())
```
That wraps up this chapter.