DeepAnalyze: Let AI Become Your Own Data Scientist! An In-Depth Look at the Open Source Project

In this era of data explosion, data analysis has become the core of business decision-making. However, the shortage of data scientists, high labor costs, and the steep learning curve of traditional data analysis tools have deterred many enterprises. Today I want to introduce a revolutionary open source project, DeepAnalyze, which makes it easy to have an AI data scientist online 24/7!

What is DeepAnalyze?

DeepAnalyze is the first agentic large language model (Agentic LLM) for autonomous data science, jointly developed by a team from Renmin University of China and Tsinghua University. Beyond traditional data analysis tasks, it can orchestrate and optimize its own work like a human data scientist, delivering true "end-to-end" automation of the entire data science process, from raw data to analyst-level research reports.

💡 Key breakthrough: DeepAnalyze-8B, with only 8B parameters, outperforms commercial large models such as GPT-4o-mini on several benchmarks, and is the first open source model to accomplish open data research.

DeepAnalyze's five core capabilities

1️⃣ End-to-End Data Science Pipeline

DeepAnalyze automates the entire data science process:

  • Data preparation: automatically handle missing values, de-duplication, and format conversion
  • Data analysis: calculate statistical indicators and identify patterns in the data
  • Data modeling: build predictive models and evaluate their performance
  • Data visualization: generate professional charts and visual reports
  • Report generation: output analyst-level professional reports
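
To make the stages above concrete, here is a minimal hand-written sketch of three of them (preparation, analysis, report) using only the Python standard library. It is not DeepAnalyze's code: the sample data and the helper names `prepare`, `analyze`, and `report` are invented for illustration, and the modeling and visualization stages are left out.

```python
import csv
import io
import statistics

# Tiny invented dataset with one duplicate row and two rows missing a score.
RAW_CSV = """student_id,score
1,82
2,
2,
3,91
1,82
"""


def prepare(rows):
    """Data preparation: drop duplicate rows and rows with a missing score."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen or row["score"] == "":
            continue
        seen.add(key)
        cleaned.append(
            {"student_id": int(row["student_id"]), "score": float(row["score"])}
        )
    return cleaned


def analyze(rows):
    """Data analysis: compute a few summary statistics."""
    scores = [r["score"] for r in rows]
    return {"count": len(scores), "mean": statistics.mean(scores), "max": max(scores)}


def report(stats):
    """Report generation: render the statistics as plain text."""
    return (
        f"Rows analyzed: {stats['count']}\n"
        f"Mean score: {stats['mean']:.1f}\n"
        f"Highest score: {stats['max']:.1f}"
    )


rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
cleaned = prepare(rows)
stats = analyze(cleaned)
summary = report(stats)
print(summary)
```

In DeepAnalyze the model itself decides which of these steps to run and writes the code for them; the sketch only shows what each stage amounts to on a toy dataset.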

2️⃣ Open Data Research

Unlike traditional tools, DeepAnalyze is not limited to predefined processes, it can:

  • Explore data sources on its own, "thinking" like a human data scientist
  • Handle multiple data formats flexibly: CSV, Excel, JSON, databases, Markdown, etc.
  • Dig into the story behind the data, delivering truly insightful analysis

🌟 Example: after multiple data files are uploaded, DeepAnalyze automatically analyzes student enrollment patterns and institutional transfer networks, then generates an in-depth research report with specialized charts and graphs.

3️⃣ Autonomy without human intervention

DeepAnalyze's most powerful feature is its ability to orchestrate and optimize its own work:

  • First, it plans the analysis path
  • Then it proactively explores the data
  • Next, it works out the data structure
  • It then completes data preparation and analysis
  • Finally, it generates a professional report

The whole process requires no predefined workflow: DeepAnalyze reasons like a human analyst while working more efficiently and precisely.

4️⃣ Multiple Data Source Support

DeepAnalyze can handle a wide range of data formats:

  • 📊 Structured data: CSV, Excel, databases
  • 📦 Semi-structured data: JSON, XML, YAML
  • 📝 Unstructured data: TXT, Markdown

No matter what format your data is in, DeepAnalyze can "see" it and analyze it in depth.
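
As a rough illustration of the three categories above, the sketch below sorts a list of file names by extension. The categories come from the list; the specific extensions (`.xlsx`, `.sqlite`, `.yml`) and the helper names `classify` and `group_by_category` are assumptions made for the example, not part of DeepAnalyze.

```python
from pathlib import Path

# Extension-to-category table. The categories follow the list above;
# the concrete extensions are assumptions chosen for illustration.
CATEGORY_BY_EXTENSION = {
    ".csv": "structured",
    ".xlsx": "structured",
    ".sqlite": "structured",
    ".json": "semi-structured",
    ".xml": "semi-structured",
    ".yaml": "semi-structured",
    ".yml": "semi-structured",
    ".txt": "unstructured",
    ".md": "unstructured",
}


def classify(filename):
    """Return the data category for a file name, or "unknown"."""
    return CATEGORY_BY_EXTENSION.get(Path(filename).suffix.lower(), "unknown")


def group_by_category(filenames):
    """Group file names by category, preserving input order within a group."""
    groups = {}
    for name in filenames:
        groups.setdefault(classify(name), []).append(name)
    return groups


groups = group_by_category(
    ["person.csv", "bool.xlsx", "config.YAML", "notes.md", "photo.png"]
)
print(groups)
```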

5️⃣ Full Open Source

DeepAnalyze's greatest strength is that it is completely open source:

  • The model weights are open source
  • The code is fully open
  • The training data is publicly available
  • Detailed deployment tutorials are provided

You don't need to rely on any closed source APIs to have your own data science assistant!

How does DeepAnalyze work? Technical Analysis

The innovation of DeepAnalyze lies in two techniques: curriculum-based agentic training and data-driven trajectory synthesis.

🧠 Five core interaction actions

DeepAnalyze operates autonomously through five special action tags:

  • ⟨Analyze⟩: analyze and plan
  • ⟨Understand⟩: understand the data structure
  • ⟨Code⟩: generate data analysis code
  • ⟨Execute⟩: execute the code and get the result
  • ⟨Answer⟩: generate the final report

These actions let DeepAnalyze follow a human-like think-act-feedback loop, continuously refining the analysis process.
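
The loop can be sketched in a few lines of Python. This is an illustration under stated assumptions, not DeepAnalyze's implementation: the tags are assumed to be written as XML-style pairs such as `<Code>…</Code>`, `stub_model` is a scripted stand-in for the real model, and `run_code` uses a bare `exec` with no sandboxing.

```python
import contextlib
import io
import re

# Assumption: actions appear as XML-style tag pairs, e.g. <Code>...</Code>.
ACTION_RE = re.compile(
    r"<(Analyze|Understand|Code|Execute|Answer)>(.*?)</\1>", re.DOTALL
)


def parse_actions(text):
    """Split a transcript into (action_name, body) pairs, in order."""
    return [(m.group(1), m.group(2).strip()) for m in ACTION_RE.finditer(text)]


def run_code(source):
    """Run generated code and capture stdout (no sandboxing; illustration only)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue().strip()


def stub_model(history):
    """Hypothetical stand-in for the model: scripted turns, no real generation."""
    if "<Execute>" not in history:
        return (
            "<Analyze>Compute the mean of the scores.</Analyze>\n"
            "<Code>print(sum([82, 91]) / 2)</Code>"
        )
    return "<Answer>The mean score is 86.5.</Answer>"


def agent_loop(model, max_turns=5):
    """Think (model turn), act (run code), feed the result back, repeat."""
    history = ""
    for _ in range(max_turns):
        turn = model(history)
        history += turn
        for name, body in parse_actions(turn):
            if name == "Code":
                # Feedback step: the execution result is appended to the history.
                history += f"\n<Execute>{run_code(body)}</Execute>\n"
            elif name == "Answer":
                return body, history
    return None, history


answer, transcript = agent_loop(stub_model)
print(answer)
```

The key design point is that execution output is written back into the history, so the next model turn can react to what the code actually produced.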

📚 Curriculum-based training

DeepAnalyze employs a "simple to complex" training strategy:

  1. Single-capability fine-tuning: acquire basic capabilities first (e.g., data comprehension, code generation)
  2. Multi-capability agentic training: learn to combine multiple capabilities to solve complex problems
  3. Reinforcement learning optimization: keep improving decision-making in real environments

This training method addresses the "reward sparsity" problem that traditional LLMs face on complex data tasks, and lets the model truly learn to "think like a data scientist".
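
The "simple to complex" ordering can be pictured with a toy scheduler. This is only a sketch of the general curriculum idea: the `Task` type, the task names, and the rule of ranking difficulty by the number of capabilities a task needs are all assumptions, and the reinforcement learning stage is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    capabilities: tuple  # capabilities the task exercises (hypothetical labels)


# Invented example tasks, deliberately listed out of order.
TASKS = [
    Task("full report from raw files", ("understand", "code", "report")),
    Task("describe a table", ("understand",)),
    Task("clean data then fit a model", ("understand", "code")),
    Task("write a plotting script", ("code",)),
]


def curriculum(tasks):
    """Split tasks into stages: single-capability first, then multi-capability."""
    single = [t for t in tasks if len(t.capabilities) == 1]
    multi = sorted(
        (t for t in tasks if len(t.capabilities) > 1),
        key=lambda t: len(t.capabilities),
    )
    return [
        ("stage 1: single-capability fine-tuning", single),
        ("stage 2: multi-capability agentic training", multi),
    ]


stages = curriculum(TASKS)
for label, stage_tasks in stages:
    print(label, [t.name for t in stage_tasks])
```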

How to Deploy DeepAnalyze: A Hands-On Tutorial

📦 Preparations

  1. Clone the codebase:

     git clone https://github.com/ruc-datalab/DeepAnalyze.git
     cd DeepAnalyze

  2. Create a virtual environment:

     conda create -n deepanalyze python=3.12 -y
     conda activate deepanalyze

  3. Install the dependencies:

     pip install -r requirements.txt
     # training dependencies
     (cd ./deepanalyze/ms-swift/ && pip install -e .)
     (cd ./deepanalyze/SkyRL/ && pip install -e .)

🚀 Deploying the model

  1. Download the model:
     • DeepAnalyze-8B is available for direct download.
     • Alternatively, fine-tune your own model starting from DeepSeek-R1-0528-Qwen3-8B.

  2. Start the services:

     cd demo/chat
     npm install
     cd ..
     bash start.sh

  3. Open the web interface in your browser.

🌐 API Call Methods

You can also integrate it into your own system via API:

     python demo/backend.py

Then use curl to test the API:

     curl -X POST http://localhost:8200/chat/completions \
       -H "Content-Type: application/json" \
       -d '{"messages": [{"role": "user", "content": "Generate a data science report."}], "workspace": "example/student_loan/"}'
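
The same request can be sent from Python. The sketch below mirrors the curl call using only the standard library; the endpoint, payload fields, and workspace path are taken from the command above, while the helper names `build_request` and `send` and the 600-second timeout are assumptions. The response format is not documented here, so `send` simply returns the raw body as text.

```python
import json
import urllib.request

# Endpoint copied from the curl example above.
API_URL = "http://localhost:8200/chat/completions"


def build_request(content, workspace, url=API_URL):
    """Build the POST request with the same JSON body as the curl example."""
    payload = {
        "messages": [{"role": "user", "content": content}],
        "workspace": workspace,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=600):
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


request = build_request("Generate a data science report.", "example/student_loan/")
# Uncomment once `python demo/backend.py` is running:
# print(send(request))
```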

Generate professional reports in three steps

Let's look at a real-life usage scenario:

  1. Prepare the data: place the data files to be analyzed in the working directory (e.g. example/student_loan/).

  2. Submit the task:

     from deepanalyze import DeepAnalyzeVLLM

     # The prompt states the instruction, then describes each file in the workspace.
     prompt = """# Instruction
     Generate a data science report.

     # Data
     File 1: {"name": "bool.xlsx", "size": "4.8KB"}
     File 2: {"name": "person.csv", "size": "10.6KB"}
     ... (more file descriptions)"""

     # Directory holding the data files listed in the prompt.
     workspace = "/path/to/your/workspace"
     # Load the model weights, then run the analysis.
     deepanalyze = DeepAnalyzeVLLM("path/to/DeepAnalyze-8B/")
     answer = deepanalyze.generate(prompt, workspace=workspace)

  3. Get the results:
     • You'll get a full report that includes professional charts, statistical analysis, and business recommendations.
     • The report can be exported to PDF for direct use in presentations.
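
Writing the "# Data" section of the prompt by hand gets tedious with many files. The helper below is a sketch that builds a prompt in the same layout as the example by listing the files in a workspace directory. The names `build_prompt` and `format_size` are hypothetical and not part of the `deepanalyze` package, and the size rounding (bytes divided by 1024, one decimal place) is an assumption about how the sizes in the example were produced.

```python
import json
import tempfile
from pathlib import Path


def format_size(num_bytes):
    """Format a byte count like the example prompt, e.g. 4.8KB (assumed rounding)."""
    return f"{num_bytes / 1024:.1f}KB"


def build_prompt(instruction, workspace):
    """Build an instruction-plus-file-list prompt for the files in `workspace`."""
    lines = ["# Instruction", instruction, "", "# Data"]
    files = sorted(p for p in Path(workspace).iterdir() if p.is_file())
    for index, path in enumerate(files, start=1):
        info = {"name": path.name, "size": format_size(path.stat().st_size)}
        lines.append(f"File {index}: {json.dumps(info)}")
    return "\n".join(lines)


# Demonstration on a throwaway workspace with two dummy files.
with tempfile.TemporaryDirectory() as workspace:
    Path(workspace, "person.csv").write_bytes(b"a" * 2048)
    Path(workspace, "bool.csv").write_bytes(b"b" * 512)
    prompt = build_prompt("Generate a data science report.", workspace)

print(prompt)
```

The resulting string can be passed as the `prompt` argument in step 2, with `workspace` pointing at the same directory.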

Why choose DeepAnalyze?

Feature | DeepAnalyze | Traditional tools | Closed-source API
Autonomy | ✅ Plans the entire process autonomously | ❌ Manual guidance required | ⚠️ Limited autonomy
Data format support | ✅ Multiple formats | ⚠️ Limited support | ✅ Supported
Open source | ✅ Completely open source | |
Cost | ✅ One-time deployment | ❌ High usage fees |
Customizability | ✅ Fully customizable | ⚠️ Limited |

Embracing the New Era of Autonomous Data Science

DeepAnalyze represents a new direction in data science: the shift from workflow-driven to AI-autonomy-driven analysis. It is not just a tool but your around-the-clock data science team, able to understand your business needs and take ownership of complex data analysis tasks.

🌟 Act now: visit the GitHub repository to get the full code and tutorials, and start your journey to autonomous data analysis today!
