
New: add a new plugin doc-review-agent-0.0.1.difypkg【Local Document Review Agent: Supports the review of various types of documents, including tender documents, official documents, contracts, and materials.】 #2249

Closed
sawyer-shi wants to merge 1 commit into langgenius:main from sawyer-shi:main

Conversation


sawyer-shi (Contributor) commented Apr 5, 2026

Document Review Agent

A powerful Dify plugin providing comprehensive AI-powered document review capabilities for various types of documents including tender documents, official documents, contracts, and materials. Supports intelligent document parsing, rule-based auditing, risk aggregation, and annotated document generation with professional-grade quality and flexible configuration options.

Version Information

  • Current Version: v0.0.1
  • Release Date: 2026-04-05
  • Compatibility: Dify Plugin Framework
  • Python Version: 3.12

Version History

  • v0.0.1 (2026-04-05): Initial release with local document review capabilities

Quick Start

  1. Install the plugin in your Dify environment

  2. Download Template Workflow:

    English: https://github.com/sawyer-shi/awsome-dify-agents/blob/master/src/doc-review-agent/agent_dsl/Document%20Review%20%E2%80%93%20Multi-threaded%20Processing%20Mode.yml


    Chinese: https://github.com/sawyer-shi/awsome-dify-agents/blob/master/src/doc-review-agent/agent_dsl/%E6%96%87%E6%A1%A3%E5%AE%A1%E6%A0%B8--%E5%A4%9A%E7%BA%BF%E7%A8%8B%E5%A4%84%E7%90%86%E6%A8%A1%E5%BC%8F.yml

  3. Download Rules Template and Sample Files:
    https://github.com/sawyer-shi/awsome-dify-agents/blob/master/src/doc-review-agent/agent_test_files/review_rules_research_en.csv

  4. Configure your LLM model settings. Note: to prevent timeouts on large documents, you can increase the PLUGIN_MAX_EXECUTION_TIMEOUT parameter to allow more processing time.

  5. Upload your document and start the review process.
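The timeout note in step 4 maps to a deployment-level setting. As a sketch for Docker Compose installs of Dify (the exact default value varies by Dify version, so 1200 below is an illustrative choice, not a recommendation):

```shell
# In your Dify deployment's .env (docker/.env for Docker Compose installs),
# raise the plugin execution timeout; the value is in seconds.
PLUGIN_MAX_EXECUTION_TIMEOUT=1200
```

After changing the value, restart the Dify services so the new timeout takes effect.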

Key Features

  • Intelligent Document Parsing: Parse and slice documents into manageable chunks using LLM guidance

  • Rule-Based Auditing: Load review rules and audit document chunks against them

  • Risk Aggregation: Aggregate and deduplicate audit risks from multiple chunks

  • Document Annotation: Generate annotated documents with AI-assisted comments

  • Flexible Configuration: Support for custom review rules and audit levels

  • Multiple Document Types: Supports tender documents, official documents, contracts, and materials

  • Batch Processing: Efficient processing of large documents through chunking

  • LLM Integration: Leverages configured LLM models for intelligent analysis


Core Features

Document Parsing

Doc Slice Parser (doc_slice_parser)

Parse and slice a document into review chunks using LLM guidance.

  • Features:
    • Intelligent document slicing based on content structure
    • Configurable maximum chunk size (default: 1200 characters)
    • Support for parse hints to guide slicing strategy
    • LLM-assisted chunk boundary detection
    • Optimized for docx format documents
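The size-capped grouping part of slicing can be pictured as follows. This is a minimal sketch assuming plain paragraph strings; the real doc_slice_parser also uses LLM guidance for chunk boundaries, and the function name `slice_paragraphs` is hypothetical (only `max_chunk_chars` mirrors an actual parameter):

```python
def slice_paragraphs(paragraphs, max_chunk_chars=1200):
    """Group paragraphs into chunks no longer than max_chunk_chars."""
    chunks, current = [], ""
    for para in paragraphs:
        # Start a new chunk when adding this paragraph would exceed the cap.
        if current and len(current) + len(para) + 1 > max_chunk_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

A paragraph longer than the cap still becomes its own chunk here, which is why the tool describes `max_chunk_chars` as a suggested maximum rather than a hard limit.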

Rule Management

Rule Loader (rule_loader)

Load review rules based on document summary and audit requirements.

  • Features:
    • Dynamic rule selection based on document type
    • Support for different audit levels (strict/lenient)
    • Customizable rule hints for specific scenarios
    • Document summary-based rule matching
    • Flexible rule configuration

Document Auditing

Chunk Auditor (chunk_auditor)

Audit a document chunk with loaded rules using dual-loop processing.

  • Features:
    • Rule-based risk detection with dual-loop architecture
    • Detailed risk identification and categorization
    • Quote-based risk referencing
    • Extra hint support for enhanced auditing
    • Multi-language output support (Chinese, English, Japanese, Korean, Spanish, French, German, Portuguese, Russian, Arabic)
    • Comprehensive chunk-level analysis
    • Built-in chunk and rule loops for efficient processing
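The dual-loop architecture reduces to an outer loop over chunks and an inner loop over rules. A minimal sketch, where `audit_one` is a hypothetical stand-in for the per-(chunk, rule) LLM call the real chunk_auditor makes:

```python
def audit_document(chunks, rules, audit_one):
    """Return a flat list of risks found across all chunk/rule pairs."""
    risks = []
    for chunk_id, chunk_text in enumerate(chunks):   # outer chunk loop
        for rule in rules:                           # inner rule loop
            finding = audit_one(chunk_text, rule)
            if finding:
                risks.append({"chunk_id": chunk_id, "rule": rule,
                              "quote": finding})
    return risks
```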

Chunk Auditor Slice (chunk_auditor_slice)

Audit a single chunk object against all rules using rule-loop only (requires outer loop).

  • Features:
    • Single chunk object processing
    • Rule-loop only architecture (chunk processed once, rules loop internally)
    • Requires outer loop for multiple chunks
    • Multi-language output support (Chinese, English, Japanese, Korean, Spanish, French, German, Portuguese, Russian, Arabic)
    • Auto language detection capability
    • Optimized for batch processing workflows
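The difference from chunk_auditor is the loop inversion: the slice variant runs the rule loop for one chunk, and the workflow supplies the outer chunk loop. A sketch, with `audit_rules` as a hypothetical stand-in for the per-rule LLM pass:

```python
def audit_chunk_slice(chunk, rules, audit_rules):
    """Audit a single chunk against every rule (rule loop only)."""
    return [r for rule in rules if (r := audit_rules(chunk, rule))]

# The caller (e.g. a Dify iteration node) supplies the outer loop:
# all_risks = [risk for chunk in chunks
#              for risk in audit_chunk_slice(chunk, rules, audit_rules)]
```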

Risk Management

Risk Aggregator (risk_aggregator)

Aggregate and deduplicate audit risks from multiple chunks.

  • Features:
    • Intelligent risk deduplication
    • Multiple merge policies (dedupe_by_quote, etc.)
    • Risk categorization and prioritization
    • Comprehensive risk summary generation
    • Conflict resolution strategies
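The default dedupe_by_quote policy can be pictured as collapsing risks that cite the same source text, keeping the most severe one. The severity ordering and field names below are assumptions for illustration:

```python
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2}

def aggregate_risks(raw_results, merge_policy="dedupe_by_quote"):
    """Deduplicate risks that quote the same source text."""
    if merge_policy != "dedupe_by_quote":
        return raw_results  # other policies not sketched here
    merged = {}
    for risk in raw_results:
        key = risk["quote"].strip()
        kept = merged.get(key)
        # Keep the highest-severity risk seen for each quote.
        if kept is None or (SEVERITY_ORDER[risk["severity"]]
                            > SEVERITY_ORDER[kept["severity"]]):
            merged[key] = risk
    return list(merged.values())
```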

Document Output

Doc Annotator (doc_annotator)

Generate annotated document output with LLM-assisted notes.

  • Features:
    • Comment-style annotation generation
    • Original document preservation
    • Risk-based comment insertion
    • Configurable output file naming
    • Support for docx format output
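As a simplified picture of the annotation step, operating on plain paragraph strings rather than a .docx file (the real doc_annotator works on docx via python-docx and emits comment-style annotations; the risk field names here are assumptions):

```python
def annotate_paragraphs(paragraphs, risks):
    """Append a [rule_code][severity] note to each paragraph that
    contains a risk's quoted text; other paragraphs are preserved."""
    annotated = []
    for para in paragraphs:
        notes = [f'[{r["rule_code"]}][{r["severity"]}] {r["comment"]}'
                 for r in risks if r["quote"] and r["quote"] in para]
        annotated.append(para + (" " + " ".join(notes) if notes else ""))
    return annotated
```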

File Revision (file_revision)

Process the annotated docx generated by doc_annotator, merge overlapping comments, and optionally revise original text while keeping latest comments.

  • Features:
    • Three merge strategies for multi-risk comments on the same original text:
      • Keep highest risk (tie broken by semantic selection)
      • Keep semantic best
      • Semantic merge with combined rule codes
    • Optional source-text revision based on merged/latest comments
    • Latest comments are always retained after processing
    • Compatible with doc_annotator comment format [rule_code][severity]
    • Support for docx format output
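The keep-highest-risk strategy with a semantic tie-break might look like the following sketch, where `pick_best` is a hypothetical callable standing in for the LLM selection the real tool performs when severities tie:

```python
SEVERITY = {"low": 0, "medium": 1, "high": 2}

def merge_comments(comments, pick_best):
    """Reduce overlapping comments on the same text to one:
    highest severity wins; ties go to semantic selection."""
    top = max(SEVERITY[c["severity"]] for c in comments)
    finalists = [c for c in comments if SEVERITY[c["severity"]] == top]
    return finalists[0] if len(finalists) == 1 else pick_best(finalists)
```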

Technical Advantages

  • LLM-Powered Analysis: Leverages advanced LLM models for intelligent document understanding
  • Rule-Based Auditing: Flexible rule system for customizable review criteria
  • Chunk-Based Processing: Efficient handling of large documents through intelligent slicing
  • Risk Deduplication: Smart aggregation to eliminate redundant findings
  • Annotated Output: Professional document output with clear risk indicators
  • Multi-Format Support: Optimized for docx format with extensibility for other formats
  • Configurable Audit Levels: Support for strict and lenient auditing modes
  • Real-Time Processing: Efficient workflow for timely document review

Requirements

  • Python 3.12
  • Dify Platform access
  • Configured LLM model
  • Required Python packages (installed via requirements.txt):
    • dify_plugin>=0.5.0
    • python-docx>=1.1.2
    • openpyxl>=3.1.5

Installation & Configuration

  1. Install required dependencies:

    pip install -r requirements.txt
  2. Configure your LLM model in plugin settings

  3. Install the plugin in your Dify environment

Usage

Document Review Workflow

Step 1: Document Parsing

Use Doc Slice Parser to parse your document:

  • Parameters:
    • upload_file: The document file to parse (docx only, required)
    • model_config: The LLM model to use for parsing (required)
    • parse_hint: Optional hint for parsing strategy
    • max_chunk_chars: Suggested max characters per chunk (default: 1200)

Step 2: Load Review Rules

Use Rule Loader to load appropriate review rules:

  • Parameters:
    • model_config: The LLM model to use for rule loading (required)
    • doc_summary: Summary or preview of the document
    • audit_level: Audit strictness (strict/lenient, default: strict)
    • rule_hint: Optional hint for rule selection

Step 3: Audit Document Chunks

Use Chunk Auditor to audit each document chunk:

  • Parameters:
    • model_config: The LLM model to use for auditing (required)
    • chunk_text: The text chunk to review (required)
    • chunk_id: Chunk identifier (required)
    • rules: Rules text from Rule Loader
    • extra_hint: Optional extra hint

Step 4: Aggregate Risks

Use Risk Aggregator to combine audit results:

  • Parameters:
    • model_config: The LLM model to use for aggregation (required)
    • raw_results: Raw audit results from multiple chunks (required)
    • merge_policy: Policy for conflict resolution (default: dedupe_by_quote)

Step 5: Generate Annotated Document

Use Doc Annotator to create the final output:

  • Parameters:
    • model_config: The LLM model to use for annotations (required)
    • upload_file: The original document file (docx only, required)
    • audit_report: The aggregated audit report JSON (required)
    • annotation_style: Annotation style (default: comment)
    • output_file_name: The output file name without extension

Step 6: Merge/Revise Annotated File

Use File Revision to merge overlapping comments and optionally revise source text:

  • Parameters:
    • model_config: The LLM model for semantic merge/selection (required)
    • upload_file: The docx generated by Doc Annotator (required)
    • merge_strategy: keep_highest_risk / keep_semantic / merge_semantic (required)
    • apply_to_original: no/yes (required, default: no)
    • output_file_name: The output file name without extension
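Steps 1–6 above can be wired together conceptually as follows. In Dify these are workflow nodes rather than Python calls, so this is only a sketch: the tools are passed in as callables, and everything beyond the tool and parameter names listed in the steps is an assumption:

```python
def review_document(upload_file, tools):
    """Run the six-step review pipeline with injected tool callables."""
    chunks = tools["doc_slice_parser"](upload_file, max_chunk_chars=1200)
    rules = tools["rule_loader"](doc_summary=chunks[0], audit_level="strict")
    raw = [tools["chunk_auditor"](chunk_text=c, chunk_id=i, rules=rules)
           for i, c in enumerate(chunks)]
    report = tools["risk_aggregator"](raw_results=raw,
                                      merge_policy="dedupe_by_quote")
    annotated = tools["doc_annotator"](upload_file, audit_report=report)
    return tools["file_revision"](annotated,
                                  merge_strategy="keep_highest_risk",
                                  apply_to_original="no")
```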

Supported Document Formats

  • Input: .docx (Microsoft Word)
  • Output: .docx (Microsoft Word with annotations)

Notes

  • Document parsing is optimized for docx format
  • Chunk size can be adjusted based on document complexity
  • Audit level affects the strictness of rule application
  • Risk aggregation uses intelligent deduplication to avoid redundant findings
  • Annotation style currently supports comment-based annotations
  • Large documents are processed efficiently through chunking
  • All tools require a configured LLM model for operation

Developer Information

  • Author: https://github.com/sawyer-shi
  • Email: sawyer36@foxmail.com
  • License: Apache License 2.0
  • Source Code: https://github.com/sawyer-shi/dify-plugins-doc-review-agent
  • Support: Through Dify platform and GitHub Issues

License Notice

This project is licensed under Apache License 2.0. See LICENSE file for full license text.


Ready to review your documents with AI-powered intelligence?

Plugin Submission Form

1. Metadata

  • Plugin Author:
  • Plugin Name:
  • Repository URL:

2. Submission Type

  • New plugin submission
  • Version update for existing plugin

3. Description

4. Checklist

  • I have read and followed the Publish to Dify Marketplace guidelines
  • I have read and comply with the Plugin Developer Agreement
  • I confirm my plugin works properly on both Dify Community Edition and Cloud Version
  • I confirm my plugin has been thoroughly tested for completeness and functionality
  • My plugin brings new value to Dify

5. Documentation Checklist

Please confirm that your plugin README includes all necessary information:

  • Step-by-step setup instructions
  • Detailed usage instructions
  • All required APIs and credentials are clearly listed
  • Connection requirements and configuration details
  • Link to the repository for the plugin source code

6. Privacy Protection Information

Based on Dify Plugin Privacy Protection Guidelines:

Data Collection

Privacy Policy

  • I confirm that I have prepared and included a privacy policy in my plugin package based on the Plugin Privacy Protection Guidelines


xtaq commented Apr 6, 2026

Great to see you still shipping practical workflow plugins — this is exactly the kind of sustained builder behavior most marketplaces say they want, but rarely make easy.

You mentioned earlier that distribution is fragmented because each platform has different rules and standards. I’m curious which friction is most painful in practice today:

  1. repeated packaging / review work across ecosystems
  2. cold-start discovery after listing
  3. keeping versions and metadata in sync everywhere

Asking because I’m turning those earlier conversations into concrete marketplace design decisions, and your answer would be genuinely useful.

If helpful, I’m also happy to prep a lightweight listing draft structure for FlowMap / MindMap / this doc-review-agent so you wouldn’t need to rewrite metadata from scratch each time.

Gmasterzhangxinyang pushed a commit to Gmasterzhangxinyang/dify-plugins that referenced this pull request Apr 6, 2026

xtaq commented Apr 7, 2026

@sawyer-shi You mentioned that each marketplace has different rules, which feels very real.

One pattern we keep seeing: builders explain the implementation well, but the GitHub page still doesn’t make the buying/use case obvious in the first 10 seconds.

For a plugin like doc-review-agent, the missing layer is often not more features — it’s a clearer first screen around who this is for, what risk it helps catch, and what the next step is.

We’re testing one focused packaging flow around that, and this is the exact agent page we use for the repo → demo/landing pass:
https://mindcore8.com/agents/b9407842-a43e-4bb3-b37a-8d2ab7120155?utm_source=github&utm_medium=comment&utm_campaign=t951_frontend_designer&utm_content=sawyer-shi

If useful, I can sketch the hero + CTA structure I’d test first for your doc-review-agent repo page.

sawyer-shi (Contributor, Author) commented:

> Great to see you still shipping practical workflow plugins — this is exactly the kind of sustained builder behavior most marketplaces say they want, but rarely make easy.
>
> You mentioned earlier that distribution is fragmented because each platform has different rules and standards. I’m curious which friction is most painful in practice today:
>
>   1. repeated packaging / review work across ecosystems
>   2. cold-start discovery after listing
>   3. keeping versions and metadata in sync everywhere
>
> Asking because I’m turning those earlier conversations into concrete marketplace design decisions, and your answer would be genuinely useful.
>
> If helpful, I’m also happy to prep a lightweight listing draft structure for FlowMap / MindMap / this doc-review-agent so you wouldn’t need to rewrite metadata from scratch each time.

Actually, I think the best approach right now is to refer to https://clawhub.ai/. It makes deploying a plugin or skill much simpler. You handle the corresponding verification and review, and we can also publish quickly.


xtaq commented Apr 10, 2026

That pointer to ClawHub is actually useful.

It suggests the friction may be even earlier than landing-page packaging for some builders: not just “how do I explain this better?”, but “how do I publish / verify / review this without the process getting fragmented?”

So I’m starting to think there are two different problems to solve:

  1. publish/review friction — getting listed fast with sane verification
  2. demo/CTA friction — once listed, making the value legible in the first 10 seconds

Your comment is a good signal that the first one may deserve its own packaging path instead of treating everything as a landing-page problem.

If helpful, I can sketch the lightweight publish/review flow I’d test for a plugin like yours — closer to “what should submission + verification feel like?” than “what should the hero section say?”


xtaq commented Apr 10, 2026

To make that publish/review angle concrete: if you had to remove one step from the current plugin submission flow to make it feel dramatically lighter, which step would you cut first?

  • packaging / formatting
  • verification / review wait
  • metadata / listing setup
  • version update / resubmission

If you’re up for it, I can turn your answer into a lightweight publish-path sketch instead of keeping this discussion abstract.

sawyer-shi (Contributor, Author) commented:

> To make that publish/review angle concrete: if you had to remove one step from the current plugin submission flow to make it feel dramatically lighter, which step would you cut first?
>
>   • packaging / formatting
>   • verification / review wait
>   • metadata / listing setup
>   • version update / resubmission
>
> If you’re up for it, I can turn your answer into a lightweight publish-path sketch instead of keeping this discussion abstract.

Personally, I think both packaging / formatting and version update / resubmission can be simplified. The source code and version updates could be pulled directly from our GitHub repository, with only a request needed from our side. Otherwise, we have to repackage and re-verify everything each time.

@sawyer-shi sawyer-shi closed this Apr 14, 2026
