Building a Minimal RAG Architecture

1. Introduction to RAG

Large language models (LLMs) can produce misleading "hallucinations", rely on outdated information, handle specialized knowledge inefficiently, lack deep understanding of particular domains, and reason poorly.

Against this backdrop, **Retrieval-Augmented Generation** (RAG) emerged and has become an important trend in the AI era. RAG can substantially improve the accuracy and relevance of generated content by retrieving information from a large text corpus before generating an answer. It reduces hallucination, speeds up knowledge updates, and makes generated content easier to trace back to its sources, which makes large language models more practical and trustworthy.

The basic components of RAG are:

A vectorization module that converts text chunks into vectors.
A document loading and chunking module that loads documents and splits them into chunks.
A database that stores the text chunks and their vector representations.
A retrieval module that finds the text chunks relevant to a query.
A large-model module that answers the question based on the retrieved chunks.

2. Vectorization

First, we implement a vectorization class, which is the foundation of the RAG architecture. Its main job is to convert text chunks into vectors.

We create a base class, `BaseEmbeddings`, so that the code is easy to extend.

class BaseEmbeddings:
    """Base class for embeddings"""
    def __init__(self, path: str, is_api: bool) -> None:
        self.path = path
        self.is_api = is_api

    def get_embedding(self, text: str, model: str) -> list:
        raise NotImplementedError

    @classmethod
    def cosine_similarity(cls, vector1: list, vector2: list) -> float:
        """Compute the cosine similarity between two vectors"""
        dot_product = sum(a * b for a, b in zip(vector1, vector2))
        magnitude1 = sum(a * a for a in vector1) ** 0.5
        magnitude2 = sum(b * b for b in vector2) ** 0.5
        # A zero vector has no direction, so report no similarity.
        if not magnitude1 or not magnitude2:
            return 0.0
        return dot_product / (magnitude1 * magnitude2)

The `BaseEmbeddings` class has two methods: `get_embedding`, which returns the vector representation of a text, and `cosine_similarity`, which computes the cosine similarity between two vectors. At initialization we set the path to the model and specify whether the model is accessed through an API.
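
As a quick sanity check of the formula, the sketch below restates the cosine similarity computation as a standalone function and evaluates it on three small cases. The function is pulled out of the class, and the sample vectors are chosen here, purely for illustration.

```python
import math

def cosine_similarity(vector1: list, vector2: list) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot_product = sum(a * b for a, b in zip(vector1, vector2))
    magnitude1 = math.sqrt(sum(a * a for a in vector1))
    magnitude2 = math.sqrt(sum(b * b for b in vector2))
    # A zero vector has no direction, so report no similarity.
    if not magnitude1 or not magnitude2:
        return 0.0
    return dot_product / (magnitude1 * magnitude2)

# Parallel vectors: the similarity is 1 up to floating-point rounding.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
# Perpendicular vectors: the dot product, and so the similarity, is 0.
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
# A zero vector triggers the guard clause.
zero_vector = cosine_similarity([0.0, 0.0], [1.0, 1.0])
```

Because the score depends only on direction, scaling a vector does not change how similar it is to another one. That is one reason cosine similarity is a common choice for comparing embeddings.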

A class that inherits from `BaseEmbeddings` only needs to implement the `get_embedding` method.

import os

class OpenAIEmbedding(BaseEmbeddings):
    """Embedding class backed by the OpenAI API"""
    def __init__(self, path: str = '', is_api: bool = True) -> None:
        super().__init__(path, is_api)
        if self.is_api:
            from openai import OpenAI
            # Read the credentials and endpoint from environment variables.
            self.client = OpenAI(
                api_key=os.getenv("OPENAI_API_KEY"),
                base_url=os.getenv("OPENAI_BASE_URL"),
            )

    def get_embedding(self, text: str, model: str = "text-embedding-3-large") -> list:
        if self.is_api:
            # Newlines are replaced with spaces before the text is sent to the API.
            text = text.replace("\n", " ")
            return self.client.embeddings.create(input=[text], model=model).data[0].embedding
        else:
            raise NotImplementedError
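
To illustrate how the base class can be extended without any API key, here is a minimal sketch of an offline subclass. `HashEmbedding` is a hypothetical class invented for this example, not part of the original project: it hashes each word into one of `dim` buckets and counts occurrences. That is far cruder than a learned embedding, but it is enough to test the rest of a pipeline. A trimmed copy of `BaseEmbeddings` is included only so the snippet runs on its own.

```python
import hashlib

class BaseEmbeddings:
    """Trimmed copy of the base class so this sketch runs on its own."""
    def __init__(self, path: str, is_api: bool) -> None:
        self.path = path
        self.is_api = is_api

    def get_embedding(self, text: str, model: str) -> list:
        raise NotImplementedError

class HashEmbedding(BaseEmbeddings):
    """Hypothetical offline embedding: hashed bag-of-words counts."""
    def __init__(self, dim: int = 64) -> None:
        super().__init__(path='', is_api=False)
        self.dim = dim

    def get_embedding(self, text: str, model: str = '') -> list:
        vector = [0.0] * self.dim
        for word in text.lower().split():
            # md5 is used only as a stable hash; Python's built-in hash() is salted per process.
            digest = hashlib.md5(word.encode('utf-8')).hexdigest()
            vector[int(digest, 16) % self.dim] += 1.0
        return vector

embedder = HashEmbedding(dim=16)
vec_a = embedder.get_embedding("git stores snapshots")
vec_b = embedder.get_embedding("Git stores snapshots")  # same words, different case
```

Since the text is lowercased before hashing, `vec_a` and `vec_b` are identical, and each vector always has exactly `dim` entries.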

3. Document Loading and Chunking

Next, we implement the document loading and chunking step, which loads documents and splits them into chunks.

A document can be an article, a book, a conversation, a code snippet, and so on. We support the PDF, MD, and TXT file types.

def read_file_content(file_path: str):
    # read_pdf, read_markdown and read_text are helper functions defined elsewhere.
    if file_path.endswith('.pdf'):
        return read_pdf(file_path)
    elif file_path.endswith('.md'):
        return read_markdown(file_path)
    elif file_path.endswith('.txt'):
        return read_text(file_path)
    else:
        raise ValueError("Unsupported file type")

After reading the file content, we split the document by token length. Adjacent chunks should also overlap so that the content stays continuous across chunk boundaries.

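import tiktoken

# Assumption: the cl100k_base tokenizer. The original snippet uses `enc` without defining it.
enc = tiktoken.get_encoding("cl100k_base")

def get_chunk(text: str, max_token_len: int = 600, cover_content: int = 150):
    chunk_text = []
    curr_len = 0
    curr_chunk = ''
    lines = text.split('\n')
    for line in lines:
        # Trim surrounding whitespace only; removing every space would merge words together.
        line = line.strip()
        line_len = len(enc.encode(line))
        if line_len > max_token_len:
            # Overlong lines are only reported, not split further.
            print(f'Warning: line length = {line_len}')
        if curr_len + line_len <= max_token_len:
            curr_chunk += line + '\n'
            curr_len += line_len + 1
        else:
            chunk_text.append(curr_chunk)
            # Start the next chunk with the tail of the previous one as overlap.
            # cover_content counts characters but is budgeted as tokens (an approximation).
            curr_chunk = curr_chunk[-cover_content:] + line + '\n'
            curr_len = line_len + cover_content
    if curr_chunk:
        chunk_text.append(curr_chunk)
    return chunk_text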
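
To make the overlap idea easier to see, the following simplified sketch chunks a list of tokens with a fixed sliding window. It is not the `get_chunk` function above: it assumes whitespace-separated words instead of tokenizer output, and the name `chunk_with_overlap` and its parameters are made up for this illustration.

```python
def chunk_with_overlap(tokens: list, max_len: int = 5, overlap: int = 2) -> list:
    """Split a token list into windows of max_len that share `overlap` tokens."""
    if overlap >= max_len:
        raise ValueError("overlap must be smaller than max_len")
    step = max_len - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_len])
        # Stop once the current window already reaches the end of the input.
        if start + max_len >= len(tokens):
            break
    return chunks

words = "a b c d e f g h".split()
chunks = chunk_with_overlap(words, max_len=5, overlap=2)
# chunks[0] ends with 'd', 'e' and chunks[1] starts with 'd', 'e'
```

With eight words, a window of five, and an overlap of two, the result is two chunks that share the words `d` and `e`. A sentence that straddles the boundary therefore still appears in full in at least one chunk.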

4. Vector Database and Retrieval

We need a vector database to store the text chunks and their vector representations, along with a retrieval module that finds the chunks relevant to a query.

The database for a minimal RAG architecture needs to provide the following functions:

`persist`: Save the database to local storage.
`load_vector`: Load the database from local storage.
`get_vector`: Compute the vector representations of the documents.
`query`: Find the text chunks relevant to a query.

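import numpy as np

class VectorStore:
    def __init__(self, document: list = None) -> None:
        # Avoid a mutable default argument.
        self.document = document if document is not None else ['']
        # Filled by get_vector or load_vector before query is called.
        self.vectors = []

    def get_vector(self, EmbeddingModel: BaseEmbeddings) -> list:
        pass

    def persist(self, path: str = 'storage'):
        pass

    def load_vector(self, path: str = 'storage'):
        pass

    def get_similarity(self, vector1: list, vector2: list) -> float:
        return BaseEmbeddings.cosine_similarity(vector1, vector2)

    def query(self, query: str, EmbeddingModel: BaseEmbeddings, k: int = 1) -> list:
        query_vector = EmbeddingModel.get_embedding(query)
        # Score every stored vector against the query, then keep the k best in descending order.
        result = np.array([self.get_similarity(query_vector, vector) for vector in self.vectors])
        return np.array(self.document)[result.argsort()[-k:][::-1]].tolist()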
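
The skeleton above leaves `get_vector`, `persist`, and `load_vector` as stubs. One possible way to fill them in is sketched below, using only the standard library and JSON files. `TinyVectorStore`, the file names `document.json` and `vectors.json`, and the `toy_embed` function are all assumptions made for this example, not the project's actual implementation.

```python
import json
import os
import tempfile

class TinyVectorStore:
    """Hypothetical, stdlib-only stand-in for the VectorStore skeleton."""
    def __init__(self, document: list = None) -> None:
        self.document = document if document is not None else []
        self.vectors = []

    def get_vector(self, embed) -> list:
        # `embed` is any callable that maps a string to a list of floats.
        self.vectors = [embed(doc) for doc in self.document]
        return self.vectors

    def persist(self, path: str = 'storage') -> None:
        # Write documents and vectors as two JSON files in the target directory.
        os.makedirs(path, exist_ok=True)
        with open(os.path.join(path, 'document.json'), 'w', encoding='utf-8') as f:
            json.dump(self.document, f, ensure_ascii=False)
        with open(os.path.join(path, 'vectors.json'), 'w', encoding='utf-8') as f:
            json.dump(self.vectors, f)

    def load_vector(self, path: str = 'storage') -> None:
        with open(os.path.join(path, 'document.json'), 'r', encoding='utf-8') as f:
            self.document = json.load(f)
        with open(os.path.join(path, 'vectors.json'), 'r', encoding='utf-8') as f:
            self.vectors = json.load(f)

def toy_embed(text: str) -> list:
    # Toy embedding for the demo only: [character count, number of spaces].
    return [float(len(text)), float(text.count(' '))]

store = TinyVectorStore(['git commit', 'vector search'])
store.get_vector(toy_embed)
with tempfile.TemporaryDirectory() as tmp_dir:
    store.persist(tmp_dir)
    restored = TinyVectorStore()
    restored.load_vector(tmp_dir)
```

Keeping documents and vectors in parallel lists means index `i` in one always corresponds to index `i` in the other, which is what the `argsort`-based lookup in `query` relies on.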

5. The Large-Model Module

We implement a base class for the large model so that it is easy to extend to other models.

class BaseModel:
    def __init__(self, path: str = '') -> None:
        self.path = path

    def chat(self, prompt: str, history: list, content: str) -> str:
        pass

    def load_model(self):
        pass

As an example, we use the **InternLM2-chat-7B** model.

class InternLMChat(BaseModel):
    def __init__(self, path: str = '') -> None:
        super().__init__(path)
        self.load_model()

    def chat(self, prompt: str, history: list = [], content: str = '') -> str:
        prompt = PROMPT_TEMPLATE['InternLM_PROMPT_TEMPALTE'].format(question=prompt, context=content)
        response, history = self.model.chat(self.tokenizer, prompt, history)
        return response

    def load_model(self):
        import torch
        from transformers import AutoTokenizer, AutoModelForCausalLM
        self.tokenizer = AutoTokenizer.from_pretrained(self.path, trust_remote_code=True)
        self.model = AutoModelForCausalLM.from_pretrained(self.path, torch_dtype=torch.float16, trust_remote_code=True).cuda()

We can keep all prompts in a dictionary, which makes them easy to maintain.

PROMPT_TEMPLATE = {
    'InternLM_PROMPT_TEMPALTE': """First summarize the content, then use it to answer the question. If you do not know the answer, say that you do not know. Always answer in Vietnamese.
    Question: {question}
    Reference content:
    ...
    {context}
    ...
    If the reference content is not sufficient to answer, say that you do not know.
    Answer:"""
}
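
As a small illustration of how such a template is filled in, the sketch below formats a shortened prompt with `str.format`. The key `RAG_PROMPT`, the shortened wording, and the sample question and context are made up for this example.

```python
# Hypothetical, shortened template with the same two placeholders.
PROMPT_TEMPLATE = {
    'RAG_PROMPT': (
        "Question: {question}\n"
        "Reference content:\n"
        "{context}\n"
        "Answer:"
    )
}

# The placeholders are replaced by the keyword arguments of the same name.
prompt = PROMPT_TEMPLATE['RAG_PROMPT'].format(
    question="How does git work?",
    context="Git stores project history as a series of snapshots.",
)
```

Note that `str.format` treats curly braces in the template itself as placeholders, so a literal brace in the template text has to be written as `{{` or `}}`. Braces inside the substituted question or context are left untouched.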

6. Tiny-RAG Demo

Let's walk through an example of Tiny-RAG.

from RAG.VectorBase import VectorStore
from RAG.utils import ReadFiles
from RAG.LLM import OpenAIChat, InternLMChat
from RAG.Embeddings import JinaEmbedding, ZhipuEmbedding

# Load every file under ./data and split it into overlapping chunks.
docs = ReadFiles('./data').get_content(max_token_len=600, cover_content=150)
vector = VectorStore(docs)
embedding = ZhipuEmbedding()
# Embed the chunks, then save documents and vectors to ./storage.
vector.get_vector(EmbeddingModel=embedding)
vector.persist(path='storage')

question = 'How does git work?'
# Retrieve the single most relevant chunk with the same embedding model.
content = vector.query(question, EmbeddingModel=embedding, k=1)[0]
chat = InternLMChat(path='model_path')
print(chat.chat(question, [], content))

We can also load a previously processed database from local storage.

from RAG.VectorBase import VectorStore
from RAG.utils import ReadFiles
from RAG.LLM import OpenAIChat, InternLMChat
from RAG.Embeddings import JinaEmbedding, ZhipuEmbedding

# Reuse the documents and vectors saved earlier instead of embedding them again.
vector = VectorStore()
vector.load_vector('./storage')

question = 'How does git work?'
embedding = ZhipuEmbedding()
content = vector.query(question, EmbeddingModel=embedding, k=1)[0]
chat = InternLMChat(path='model_path')
print(chat.chat(question, [], content))
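
For readers who want to trace the retrieve-then-prompt flow without a GPU or an API key, here is a dependency-free sketch of the same steps. Everything in it is a stand-in invented for this example: `embed` counts vocabulary words instead of calling an embedding model, and the final step only builds the prompt string rather than calling an LLM.

```python
import math

def embed(text: str, vocab: list) -> list:
    """Toy embedding: count how often each vocabulary word occurs."""
    words = text.lower().split()
    return [float(words.count(term)) for term in vocab]

def cosine(v1: list, v2: list) -> float:
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return dot / norm if norm else 0.0

def retrieve(question: str, chunks: list, vocab: list, k: int = 1) -> list:
    """Return the k chunks most similar to the question."""
    q_vec = embed(question, vocab)
    ranked = sorted(chunks, key=lambda c: cosine(q_vec, embed(c, vocab)), reverse=True)
    return ranked[:k]

vocab = ["git", "snapshots", "vector", "database"]
chunks = [
    "a vector database stores embeddings",
    "git records history as snapshots",
]
question = "how does git work"
# Retrieval step: pick the chunk closest to the question.
context = retrieve(question, chunks, vocab, k=1)[0]
# Generation step (stubbed): a real system would send this prompt to the model.
prompt = f"Question: {question}\nReference content:\n{context}\nAnswer:"
```

The question shares only the word "git" with the second chunk and nothing with the first, so the second chunk is retrieved and placed into the prompt.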

7. Summary

In this article you learned how to build a minimal RAG architecture. A minimal RAG architecture consists of the following components:

Vectorization module
Document loading and chunking module
Database
Vector retrieval module
Large-model module

Tags: RAG Vectorization Document Processing Database Vector Retrieval

Posted on June 25 at 09:55

Thành phố Cuồng loạn