Getting to Know PandasAI: From Setup to Data Analysis with AI

Do you struggle with complex data analysis code? Would you rather query your data in plain language? PandasAI, an AI-enhanced extension of the popular Pandas library, lets you do that. In this tutorial, you will learn how to set up PandasAI, perform single-table and multi-table analysis, and visualize data.

What is PandasAI?

PandasAI is an extension of the Pandas library that integrates large language models (LLMs) to enable natural language interaction with data. You ask questions in plain English and the tool generates the analysis code for you, so you can get insights without writing complex code yourself.

Key Features of PandasAI:

  • Easy setup and installation
  • Natural language data querying
  • Single-table and multi-table analysis
  • Data visualization
  • Safe environment for AI-generated code
  • Support for various LLM models

How to Get Started

Before you start, make sure you are running a supported Python version (3.8 to 3.11). You can install PandasAI using pip or poetry:

Installation

Install PandasAI and its dependencies:

    pip install pandasai
    pip install pandasai-litellm

    # or using poetry
    poetry add pandasai
    poetry add pandasai-litellm
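Because the supported Python range is narrow, it can be worth checking the interpreter before installing. The following is a minimal sketch using only the standard library. The helper name `is_supported_python` is made up for illustration, and the bounds are simply the 3.8 to 3.11 range stated above.

```python
import sys

# Supported range as stated in this tutorial (Python 3.8 through 3.11).
MIN_VERSION = (3, 8)
MAX_VERSION = (3, 11)


def is_supported_python(version=None):
    """Return True if (major, minor) falls inside the supported range."""
    if version is None:
        version = sys.version_info[:2]
    return MIN_VERSION <= tuple(version[:2]) <= MAX_VERSION


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if is_supported_python() else "outside the supported range"
    print(f"Python {current} is {status}")
```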

Configuring PandasAI

After installation, configure PandasAI with your preferred LLM. For example, to use LiteLLM with an OpenAI model:

    import pandasai as pai
    from pandasai_litellm.litellm import LiteLLM

    llm = LiteLLM(model="gpt-4.1-mini", api_key="YOUR_OPENAI_API_KEY")
    pai.config.set({"llm": llm})
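Hardcoding the key in a script makes it easy to leak through version control. One common alternative is to read it from an environment variable. This is a sketch rather than anything PandasAI prescribes: the helper `load_api_key` and the variable name `OPENAI_API_KEY` are assumptions chosen for illustration.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Read an API key from the environment and fail loudly if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running")
    return key


# Usage with the configuration shown above (needs pandasai and pandasai-litellm):
# llm = LiteLLM(model="gpt-4.1-mini", api_key=load_api_key())
# pai.config.set({"llm": llm})
```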

Loading Data

PandasAI supports various data formats such as CSV, Excel, and JSON. Here's how to load data:

    # From CSV
    df = pai.read_csv("path/to/your/data.csv")

    # From dictionary
    employees_data = {
        'EmployeeID': [1, 2, 3, 4, 5],
        'Name': ['John', 'Emma', 'Liam', 'Olivia', 'William'],
        'Department': ['HR', 'Sales', 'IT', 'Marketing', 'Finance']
    }
    df = pai.DataFrame(employees_data)

Basic Analysis with PandasAI

Use PandasAI to perform quick data analysis by asking questions in natural language:

    # Assumes df has a Salary column (the sample dictionary above does not include one)

    # Simple data query
    result = df.chat("What is the average salary by department?")
    print(result)

    # More complex analysis
    result = df.chat("What are the top 3 highest-paid employees?")
    print(result)

Data Visualization

PandasAI also supports data visualization. For example:

    # Create a bar chart of average salary by department
    df.chat(
        "Plot a bar chart showing average salary by department, using different colors for each bar"
    )

Multi-Table Analysis

When your data is split across multiple tables, PandasAI can answer questions that combine them:

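    # Create two DataFrames
    employees_df = pai.DataFrame(employees_data)

    # Salary table keyed by EmployeeID (values match the full example later in this tutorial)
    salaries_data = {
        'EmployeeID': [1, 2, 3, 4, 5],
        'Salary': [5000, 6000, 4500, 7000, 5500]
    }
    salaries_df = pai.DataFrame(salaries_data)

    # Find the highest-paid employee (both tables are passed to pai.chat)
    result = pai.chat("Who gets paid the most?", employees_df, salaries_df)
    print(result)

    # Visualize the data
    result = pai.chat(
        "Create a bar chart showing the distribution of salaries by department",
        employees_df,
        salaries_df
    )
    print(result)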
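Answering a cross-table question like this amounts to joining the tables on their shared key. To sanity-check the model's answer, the same result can be computed with plain pandas. This sketch assumes `EmployeeID` is the join key and uses sample values from this tutorial. It does not call PandasAI or an LLM.

```python
import pandas as pd

employees = pd.DataFrame({
    "EmployeeID": [1, 2, 3, 4, 5],
    "Name": ["John", "Emma", "Liam", "Olivia", "William"],
    "Department": ["HR", "Sales", "IT", "Marketing", "Finance"],
})
salaries = pd.DataFrame({
    "EmployeeID": [1, 2, 3, 4, 5],
    "Salary": [5000, 6000, 4500, 7000, 5500],
})

# Inner join on the shared key; validate raises if an ID is duplicated.
merged = employees.merge(salaries, on="EmployeeID", how="inner", validate="one_to_one")

# Row with the highest salary.
top = merged.loc[merged["Salary"].idxmax()]
print(top["Name"], top["Salary"])
```

If the answer from `pai.chat` disagrees with this result, the generated code or the prompt is worth a closer look.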

Ensuring Data Security

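To keep AI-generated code isolated from your system, run it inside PandasAI's Docker sandbox. This requires the pandasai-docker package and a running Docker daemon: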

    import pandasai as pai
    from pandasai_docker import DockerSandbox
    from pandasai_litellm.litellm import LiteLLM

    llm = LiteLLM(model="gpt-4.1-mini", api_key="YOUR_OPENAI_API_KEY")
    pai.config.set({"llm": llm})

    # Initialize the Docker sandbox
    sandbox = DockerSandbox()
    sandbox.start()

    # Perform a query in the Docker environment
    result = pai.chat("Who gets paid the most?", employees_df, salaries_df, sandbox=sandbox)

    # Stop the sandbox after use
    sandbox.stop()
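If a query raises an exception between `start()` and `stop()`, the container may be left running. One way to guard against that is a small context manager that always calls `stop()`. This is only a sketch: `running_sandbox` and `FakeSandbox` are hypothetical names, and the fake class stands in for `DockerSandbox` so the example runs without Docker.

```python
from contextlib import contextmanager


@contextmanager
def running_sandbox(sandbox):
    """Start a sandbox-like object and always stop it, even if a query raises."""
    sandbox.start()
    try:
        yield sandbox
    finally:
        sandbox.stop()


class FakeSandbox:
    """Stand-in for illustration only; it just records start/stop calls."""

    def __init__(self):
        self.events = []

    def start(self):
        self.events.append("start")

    def stop(self):
        self.events.append("stop")


fake = FakeSandbox()
try:
    with running_sandbox(fake):
        raise ValueError("simulated query failure")
except ValueError:
    pass

print(fake.events)  # ['start', 'stop']
```

With the real class, the same pattern would presumably read `with running_sandbox(DockerSandbox()) as sandbox:` followed by the `pai.chat(..., sandbox=sandbox)` call from the snippet above.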

Advanced Configuration

You can pass additional options to pai.config.set, such as caching, log level, and output format:

    pai.config.set({
        "llm": llm,
        "cache": True,
        "cache_path": "./cache",
        "log_level": "INFO",
        "output_format": "markdown"
    })

Real-World Example: Employee Salary Analysis

Let's walk through a practical example:

    # Create employee and salary data
    employees_data = {
        'EmployeeID': [1, 2, 3, 4, 5, 6, 7, 8],
        'Name': ['John', 'Emma', 'Liam', 'Olivia', 'William', 'Ava', 'Noah', 'Sophia'],
        'Department': ['HR', 'Sales', 'IT', 'Marketing', 'Finance', 'IT', 'Sales', 'Marketing']
    }
    salaries_data = {
        'EmployeeID': [1, 2, 3, 4, 5, 6, 7, 8],
        'Salary': [5000, 6000, 4500, 7000, 5500, 8000, 6500, 7500]
    }
    employees_df = pai.DataFrame(employees_data)
    salaries_df = pai.DataFrame(salaries_data)

    # Salaries live in a separate table, so pass both DataFrames to pai.chat

    # Analyze average salary by department
    average_salary = pai.chat(
        "What is the average salary by department?", employees_df, salaries_df
    )
    print(average_salary)

    # Find top 3 highest-paid employees
    top_employees = pai.chat(
        "Who are the top 3 employees with the highest salaries?", employees_df, salaries_df
    )
    print(top_employees)

    # Visualize salary distribution by department
    visualization = pai.chat(
        "Create a bar chart showing the salary distribution by department, with each bar representing a department and showing the minimum, average, and maximum salary",
        employees_df,
        salaries_df
    )
    print(visualization)
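Since LLM answers can vary between runs, it helps to know what the correct numbers are for this dataset. The sketch below computes the per-department summary and the top three earners with plain pandas, again assuming `EmployeeID` links the two tables. It is a cross-check, not a description of the code PandasAI generates.

```python
import pandas as pd

employees = pd.DataFrame({
    "EmployeeID": [1, 2, 3, 4, 5, 6, 7, 8],
    "Name": ["John", "Emma", "Liam", "Olivia", "William", "Ava", "Noah", "Sophia"],
    "Department": ["HR", "Sales", "IT", "Marketing", "Finance", "IT", "Sales", "Marketing"],
})
salaries = pd.DataFrame({
    "EmployeeID": [1, 2, 3, 4, 5, 6, 7, 8],
    "Salary": [5000, 6000, 4500, 7000, 5500, 8000, 6500, 7500],
})

merged = employees.merge(salaries, on="EmployeeID")

# Minimum, average, and maximum salary per department.
summary = merged.groupby("Department")["Salary"].agg(["min", "mean", "max"])
print(summary)

# Three highest-paid employees.
top3 = merged.nlargest(3, "Salary")[["Name", "Salary"]]
print(top3)
```

For this data, IT and Sales both average 6250, and the three highest earners are Ava, Sophia, and Olivia.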

Conclusion

PandasAI simplifies data analysis by letting you query DataFrames in natural language. With a short setup, support for single-table and multi-table questions, charting, and an optional Docker sandbox, it lets you focus on insights rather than code. Explore PandasAI's other capabilities to see where it fits in your data analysis workflow.

Tags: PandasAI, data analysis, AI, Python, Pandas Library

Posted on June 21 at 20:36