In this era of data explosion, data analysis has become central to business decision-making. However, the shortage of data scientists, high labor costs, and the steep learning curve of traditional data analysis tools have held many enterprises back. Today, I want to introduce a revolutionary open-source project, DeepAnalyze, which makes it easy to have an AI data scientist on call 24/7!
What is DeepAnalyze?
DeepAnalyze is the first agentic large language model (Agentic LLM) for autonomous data science, jointly developed by a team from Renmin University of China and Tsinghua University. Beyond performing traditional data analysis tasks, it can autonomously orchestrate and optimize its work the way a human data scientist does, achieving true end-to-end automation of the entire data science pipeline, from raw data to analyst-level research reports.
💡 Key breakthrough: DeepAnalyze-8B, with only 8B parameters, outperforms commercial models such as GPT-4o-mini on several benchmarks, making it the first open-source model capable of open-ended data research.
DeepAnalyze's five core capabilities
1️⃣ End-to-End Data Science Pipeline
DeepAnalyze automates the entire data science process:
- Data preparation: automatically handling missing values, removing duplicates, and converting formats
- Data analysis: calculating statistical indicators and identifying data patterns
- Data modeling: building predictive models and evaluating their performance
- Data visualization: generating professional charts and visual reports
- Report generation: producing analyst-level professional reports

2️⃣ Open-Ended Data Research
Unlike traditional tools, DeepAnalyze is not limited to predefined workflows. It can:
- Explore data sources on its own, "thinking" like a human data scientist
- Flexibly handle multiple data formats: CSV, Excel, JSON, databases, Markdown, and more
- Dig into the story behind the data, delivering truly insightful analysis
🌟 Example: after multiple data files are uploaded, DeepAnalyze automatically analyzes student enrollment patterns and institutional transfer networks, then generates an in-depth research report with professional charts.

3️⃣ Autonomy Without Human Intervention
DeepAnalyze's most powerful feature is its ability to autonomously orchestrate and optimize its own work:
- First, it plans the analysis path
- Then it proactively explores the data
- Next, it works out the data structure
- It then carries out data preparation and analysis
- Finally, it generates a professional report
The whole process requires no predefined workflow: it reasons like a human data scientist, yet works more efficiently and precisely than one!

4️⃣ Multiple Data Source Support
DeepAnalyze can handle a wide range of data formats:
- 📊 Structured data: CSV, Excel, databases
- 📦 Semi-structured data: JSON, XML, YAML
- 📝 Unstructured data: TXT, Markdown
No matter what format your data is in, DeepAnalyze can "see" it and analyze it in depth.
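As a rough picture of what handling many formats involves, the sketch below dispatches on the file extension and tags each result with the structured / semi-structured / unstructured categories from the list above. `load_any` is a hypothetical helper, not a DeepAnalyze function, and it uses only the standard library, so Excel, YAML and database sources (which need extra packages or connection details) are deliberately left out.

```python
import csv
import json
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def load_any(path):
    """Load a file into a Python object, choosing the parser from its extension."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as f:
            return {"kind": "structured", "data": list(csv.DictReader(f))}
    if suffix == ".json":
        return {"kind": "semi-structured", "data": json.loads(path.read_text(encoding="utf-8"))}
    if suffix == ".xml":
        return {"kind": "semi-structured", "data": ET.parse(path).getroot()}
    if suffix in {".txt", ".md"}:
        return {"kind": "unstructured", "data": path.read_text(encoding="utf-8")}
    # Excel, YAML, databases, etc. would need additional (non-stdlib) loaders.
    raise ValueError(f"Unsupported format: {suffix}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "scores.csv"
        sample.write_text("name,score\nAda,90\n", encoding="utf-8")
        print(load_any(sample))
```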

5️⃣ Fully Open Source
One of DeepAnalyze's greatest strengths is that it is completely open source:
- Open model weights
- Fully open code
- Publicly available training data
- Detailed deployment tutorials
You don't need to rely on any closed-source API to have your own data science assistant!

How does DeepAnalyze work? A technical analysis
DeepAnalyze's key innovations are curriculum-based agentic training and data-driven trajectory synthesis:
🧠 Five core actions
DeepAnalyze operates autonomously through five special action tags:
- ⟨Analyze⟩: analysis and planning
- ⟨Understand⟩: understanding data structures
- ⟨Code⟩: generating data analysis code
- ⟨Execute⟩: executing the code and obtaining the results
- ⟨Answer⟩: generating the final report
These actions let DeepAnalyze work in a human-like think-act-feedback loop, continuously refining its analysis.
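A minimal sketch of such a think-act-feedback loop follows. The exact prompt and tag syntax DeepAnalyze uses are not covered here, so this assumes ASCII `<Tag>...</Tag>` blocks and assumes the environment (not the model) writes each `<Execute>` block. `parse_actions`, `run_code`, `agent_loop` and `scripted_model` are hypothetical names, and `scripted_model` stands in for the LLM with two canned replies.

```python
import contextlib
import io
import re

# Assumed tag syntax: <Analyze>...</Analyze>, <Code>...</Code>, and so on.
TAG_RE = re.compile(r"<(Analyze|Understand|Code|Execute|Answer)>(.*?)</\1>", re.DOTALL)


def parse_actions(text):
    """Return the (tag, content) pairs found in a model response, in order."""
    return [(tag, body.strip()) for tag, body in TAG_RE.findall(text)]


def run_code(code):
    """Execute a snippet and capture its printed output. No sandboxing: demo only!"""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})
    return buffer.getvalue().strip()


def agent_loop(generate, task, max_steps=10):
    """Call the model repeatedly, feeding execution results back, until it answers."""
    transcript = task
    for _ in range(max_steps):
        response = generate(transcript)
        transcript += response
        for tag, body in parse_actions(response):
            if tag == "Answer":
                return body
            if tag == "Code":
                # Feedback step: append the execution result as an <Execute> block.
                transcript += f"\n<Execute>{run_code(body)}</Execute>\n"
    return None  # gave up without an <Answer>


def scripted_model():
    """Stand-in for the LLM: returns two canned responses in sequence."""
    replies = iter([
        "<Analyze>Compute the mean of the values.</Analyze>"
        "<Code>print(sum([2, 4, 6]) / 3)</Code>",
        "<Answer>The mean is 4.0.</Answer>",
    ])
    return lambda transcript: next(replies)


if __name__ == "__main__":
    print(agent_loop(scripted_model(), "# Instruction\nCompute the mean.\n"))
```

A real deployment would run generated code in an isolated sandbox rather than with `exec` in the host process.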

📚 Curriculum-based training
DeepAnalyze employs a simple-to-complex training strategy:
- Single-ability fine-tuning: first acquire basic abilities (e.g., data understanding, code generation)
- Multi-ability agentic training: learn to combine multiple abilities to solve complex problems
- Reinforcement learning optimization: continuously improve decision-making in real environments
This training approach addresses the reward sparsity that conventional LLMs face on complex data tasks, allowing the model to truly learn to "think like a data scientist".
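The simple-to-complex idea can be illustrated with a generic curriculum schedule. This is a standard curriculum-learning pattern, not DeepAnalyze's training code: `run_curriculum`, the difficulty measure (here, the number of reasoning steps a task needs), and `train_step` are all hypothetical, and `train_step` merely records the order in which tasks would be trained. One intuition for why this helps with sparse rewards: easy tasks give the model a realistic chance to succeed, and thus receive reward, before it faces long multi-step ones.

```python
def run_curriculum(tasks, difficulty, thresholds, train_step):
    """Train in phases from simple to complex.

    Each phase uses every task whose difficulty is <= the phase threshold,
    easiest first, so earlier skills keep being rehearsed as harder tasks
    are added. Returns (threshold, number_of_tasks_in_phase) pairs.
    """
    history = []
    for limit in sorted(thresholds):
        phase = sorted((t for t in tasks if difficulty(t) <= limit), key=difficulty)
        for task in phase:
            train_step(task)  # placeholder for one optimization step
        history.append((limit, len(phase)))
    return history


if __name__ == "__main__":
    # Hypothetical tasks: a single-ability task, a multi-ability task, a full research task.
    tasks = [
        {"id": "multi-ability", "steps": 3},
        {"id": "single-ability", "steps": 1},
        {"id": "full-research", "steps": 5},
    ]
    order = []
    print(run_curriculum(tasks, lambda t: t["steps"], [1, 3, 5], lambda t: order.append(t["id"])))
    print(order)
```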

How to Deploy DeepAnalyze: A Hands-On Tutorial
📦 Preparation
- Clone the repository:

```bash
git clone https://github.com/ruc-datalab/DeepAnalyze.git
cd DeepAnalyze
```

- Create a virtual environment:

```bash
conda create -n deepanalyze python=3.12 -y
conda activate deepanalyze
```

- Install the dependencies:

```bash
pip install -r requirements.txt
# Training dependencies
(cd ./deepanalyze/ms-swift/ && pip install -e .)
(cd ./deepanalyze/SkyRL/ && pip install -e .)
```

🚀 Deploying the model
- Download the model:
  - DeepAnalyze-8B can be downloaded directly.
  - Alternatively, fine-tune your own model starting from DeepSeek-R1-0528-Qwen3-8B.
- Start the services:

```bash
cd demo/chat
npm install
cd ..
bash start.sh
```

- Access the interface:
  - Open http://localhost:4000 in your browser.
  - Upload your data files and enter your analysis instructions.
🌐 Calling the API
You can also integrate DeepAnalyze into your own system through its API. First, start the backend:

```bash
python demo/backend.py
```

Then test the API with curl:

```bash
curl -X POST http://localhost:8200/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Generate a data science report."}], "workspace": "example/student_loan/"}'
```

Generate professional reports in three steps
Let's look at a real-world usage scenario:
- Prepare the data: place the files to be analyzed in the working directory (e.g. `example/student_loan/`).
- Submit the task:
```python
from deepanalyze import DeepAnalyzeVLLM

# Describe the task and the files available in the workspace
prompt = """# Instruction
Generate a data science report.
# Data
File 1: {"name": "bool.xlsx", "size": "4.8KB"}
File 2: {"name": "person.csv", "size": "10.6KB"}
... (more file descriptions)"""

# Directory containing the data files listed in the prompt
workspace = "/path/to/your/workspace"

# Load the model weights and run the analysis
deepanalyze = DeepAnalyzeVLLM("path/to/DeepAnalyze-8B/")
answer = deepanalyze.generate(prompt, workspace=workspace)
```

- Get the results:
  - You'll receive a full report with professional charts, statistical analysis, and business recommendations.
  - The report can be exported to PDF for direct use in your own reports.
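Writing the `# Data` section by hand gets tedious once a workspace holds many files. The sketch below generates it from the directory contents in the same `File N: {...}` format as the prompt in step 2. `describe_workspace` and `build_prompt` are hypothetical helpers, not part of the `deepanalyze` package, and reporting sizes in KiB (1024 bytes) with one decimal place is an assumption about how the sizes in the example were computed.

```python
import json
import tempfile
from pathlib import Path


def describe_workspace(workspace):
    """List every file in the workspace as 'File N: {"name": ..., "size": ...}'."""
    files = sorted(p for p in Path(workspace).iterdir() if p.is_file())
    lines = []
    for index, path in enumerate(files, start=1):
        # Assumption: sizes are reported in KiB (1024 bytes) with one decimal place.
        size_kb = path.stat().st_size / 1024
        entry = {"name": path.name, "size": f"{size_kb:.1f}KB"}
        lines.append(f"File {index}: {json.dumps(entry)}")
    return "\n".join(lines)


def build_prompt(instruction, workspace):
    """Assemble the '# Instruction' and '# Data' sections into one prompt string."""
    return f"# Instruction\n{instruction}\n# Data\n{describe_workspace(workspace)}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "person.csv").write_text("name,age\nAda,36\n", encoding="utf-8")
        print(build_prompt("Generate a data science report.", tmp))
```

The resulting string can then be passed to `deepanalyze.generate(prompt, workspace=workspace)` in place of the hand-written prompt.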

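If you'd rather call the backend from Python than from curl, the standard library's `urllib.request` can send the same JSON payload. This sketch assumes the backend started with `python demo/backend.py` is listening on `localhost:8200`, as in the curl example, and that it replies with JSON; `build_request` and `send` are hypothetical helper names, and the structure of the response is not documented here.

```python
import json
import urllib.request

# Endpoint taken from the curl example; adjust if your backend runs elsewhere.
API_URL = "http://localhost:8200/chat/completions"


def build_request(content, workspace, url=API_URL):
    """Build the same POST request as the curl example."""
    payload = {
        "messages": [{"role": "user", "content": content}],
        "workspace": workspace,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=600):
    """Send the request and parse the body as JSON (assumes a JSON response)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend running, `send(build_request("Generate a data science report.", "example/student_loan/"))` issues the request. The long default timeout (600 seconds, an arbitrary choice) allows for the possibility that a full report takes a while to generate.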
Why choose DeepAnalyze?
| Feature | DeepAnalyze | Traditional tools | Closed-source APIs |
|---|---|---|---|
| Autonomy | ✅ Autonomously plans the entire process | ❌ Requires manual guidance | ⚠️ Limited autonomy |
| Data format support | ✅ Multiple formats | ⚠️ Limited support | ✅ Supported |
| Open source | ✅ Completely open source | ✅ | ❌ |
| Cost | ✅ One-time deployment | ✅ | ❌ High usage fees |
| Customizability | ✅ Fully customizable | ⚠️ Limited | ❌ |
Embracing the New Era of Autonomous Data Science
DeepAnalyze represents a new direction in data science: a shift from workflow-driven to AI autonomy-driven analysis. It's not just a tool but a data science team available around the clock, able to understand your business needs and independently take on complex data analysis tasks.
🌟 Take action now: visit the GitHub repository (https://github.com/ruc-datalab/DeepAnalyze) to get the full code and tutorials, and start your journey into autonomous data analysis today!