All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Data Pipeline (`data_pipeline.py`): Composable ETL pipelines with stage-based processing. `StageStatus` and `PipelineStatus` enums for lifecycle tracking. `ErrorStrategy` enum (stop, skip, retry, default). `StageResult` with timing, retries, success/failure tracking. `PipelineResult` with stage aggregation, success_rate, summary. `Stage` with conditional execution, error strategies, retry support. `DataPipeline` with add/remove/enable/disable stages, before/after hooks, full error strategy handling. `fan_out()` parallel handler execution. `fan_in()` result combination. `batch_process()` with error strategies. `ValidationRule` and `DataValidationResult` for data validation. Functional transforms: `map_data()`, `filter_data()`, `reduce_data()`, `group_by()`, `flatten()`, `distinct()`, `chunk()`. Factories: `create_data_pipeline()`, `create_validation_rule()`.
- Concurrency (`concurrency.py`): Thread pool execution, task queues, and synchronization primitives. `TaskStatus` and `WorkerState` enums. `TaskResult` with timing, worker_id, success/failure. `PoolStats` with success_rate, avg_task_ms. `AtomicCounter` thread-safe counter with increment/decrement/reset. `AtomicValue` thread-safe value holder with compare_and_set. `Task` with priority for priority queue ordering. `TaskQueue` (PriorityQueue-based) with timeout support. `WorkerPool` with configurable worker threads, submit with priority, wait with timeout, results tracking. `parallel_map()` concurrent list processing. `Debouncer` delay-based call debouncing. `Throttle` interval-based rate limiting. `Once` execute-once guarantee (thread-safe). Factories: `create_worker_pool()`, `create_task_queue()`, `create_counter()`, `create_atomic()`.
- CLI Helpers (`cli_helpers.py`): CLI argument parsing, output formatting, and progress indicators. `Color` enum (11 ANSI codes). `OutputFormat` enum (plain, colored, json, markdown, quiet). `Verbosity` enum (5 levels). `supports_color()` terminal detection. Color helpers: `colorize()`, `red()`, `green()`, `yellow()`, `blue()`, `cyan()`, `bold()`, `dim()`. Text formatting: `truncate()`, `pad_right()`, `pad_left()`, `indent_text()`, `format_table_simple()`, `format_key_value()`, `format_list()`, `format_size()`, `format_duration()`, `format_percentage()`. `ProgressBar` with advance/render. `Spinner` with Braille animation. `CLIArg` and `CLICommand` for argument definitions. `parse_args()` with type coercion (int/float/bool/list). `format_help()` help text generation. `draw_box()` Unicode box drawing. Factories: `create_progress_bar()`, `create_spinner()`.
- Protocols (`protocols.py`): Protocol definitions, type contracts, and algebraic types. `Result` type (Ok/Err) with unwrap, map, `try_result()`. `Option` type (Some/Nothing) for nullable values. `Either` type (Left/Right) for disjoint unions. 6 runtime-checkable Protocol interfaces: `Serializable`, `Renderable`, `Validatable`, `Disposable`, `Configurable`, `Identifiable`. Type guards: `is_serializable()`, `is_renderable()`, `is_validatable()`, `is_dict_like()`, `is_list_like()`, `is_callable()`, `is_numeric()`, `is_non_empty_string()`. `Lazy` evaluation with caching and reset. `Pair` type with swap and to_tuple. Safe type conversions: `safe_int()`, `safe_float()`, `safe_bool()`, `safe_str()`.
- New Exports: 109 new public API exports. Total public API: 464 exports.
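The `Result` type above follows the usual Ok/Err pattern. As a rough sketch of how such a type works — the class and method names mirror the changelog entry, but the exact signatures are assumptions, not deepworm's actual API:

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar, Union

T = TypeVar("T")
U = TypeVar("U")
E = TypeVar("E")

@dataclass
class Ok(Generic[T]):
    value: T

    def is_ok(self) -> bool:
        return True

    def unwrap(self) -> T:
        return self.value

    def map(self, fn: Callable[[T], U]) -> "Ok[U]":
        return Ok(fn(self.value))

@dataclass
class Err(Generic[E]):
    error: E

    def is_ok(self) -> bool:
        return False

    def unwrap(self):
        raise ValueError(f"unwrap on Err: {self.error!r}")

    def map(self, fn):
        return self  # errors pass through untouched

Result = Union[Ok[T], Err[E]]

def try_result(fn: Callable[[], T]) -> "Result":
    # Run fn, capturing any exception as an Err instead of raising.
    try:
        return Ok(fn())
    except Exception as exc:
        return Err(exc)
```

Chaining then reads naturally: `try_result(lambda: int("42")).map(lambda n: n + 1).unwrap()` yields 43, while a failing parse flows through as an `Err` without raising.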
- HTTP Middleware (`http_middleware.py`): Full HTTP middleware pipeline for request/response processing. `MiddlewarePhase` enum (request, response, error). `RequestMethod` enum (7 HTTP methods). `Request` class with method, url, headers, body, params, metadata, `has_body()`, `set_header()`/`get_header()`, `to_dict()`. `Response` class with status_code, body, headers, `is_success`/`is_redirect`/`is_client_error`/`is_server_error`/`is_error` properties. `MiddlewareEntry` with name, handler, priority, enabled, phase. `MiddlewareStack` with `use()`/`remove()`/`enable()`/`disable()`, priority-based chain building, error-phase middleware, response-phase middleware, request/error counters. `RequestLogger` with per-request timing, summary (avg/min/max ms). Built-in middleware factories: `header_injection()`, `timeout_middleware()`, `retry_middleware()` (configurable retries, status codes, backoff), `auth_middleware()`, `content_type_middleware()`, `user_agent_middleware()`. Factory helpers: `create_response()`, `create_request()`, `create_middleware_stack()`, `create_logger()`.
- Multi-Format Serialization (`serialization.py`): Serialize/deserialize data across 6 formats. `Format` enum (JSON, YAML, TOML, CSV, XML, MARKDOWN_TABLE). `SerializationResult` and `DeserializationResult` with size tracking and error reporting. JSON: `to_json()`/`from_json()` with indent, sort_keys, ensure_ascii. YAML: pure-Python serializer/parser (no external deps) with dict/list/value detection, boolean/null/number coercion. CSV: `to_csv()`/`from_csv()` with configurable delimiter and headers. XML: `to_xml()`/`from_xml()` with recursive serialization, entity escaping, custom root tags. Markdown tables: `to_markdown_table()`/`from_markdown_table()` with column alignment. `detect_format()` auto-detects format from content. `convert()` between any two formats. `serialize()` universal serializer. `pretty_json()`/`minify_json()` helpers.
- Rate Limiter (`rate_limiter.py`): API rate limiting with multiple strategies. `LimiterStrategy` enum (TOKEN_BUCKET, SLIDING_WINDOW, FIXED_WINDOW). `LimitAction` enum (REJECT, WAIT, THROTTLE). `RateLimitInfo` with remaining, limit, reset_at, retry_after, utilization, `to_dict()`, `to_headers()` (X-RateLimit-* headers). `RateLimitStats` with total/allowed/rejected requests, rejection_rate, avg_wait_ms. `TokenBucket` with configurable rate and capacity, thread-safe refill. `SlidingWindow` with deque-based timestamp tracking. `FixedWindow` with periodic reset. `KeyedRateLimiter` for per-key rate limiting with auto-creation. `@rate_limit` decorator. `RateLimitExceeded` exception with info attribute. Factory functions: `create_token_bucket()`, `create_sliding_window()`, `create_fixed_window()`, `create_keyed_limiter()`.
- Testing Utilities (`testing_utils.py`): Test fixtures, mocks, assertions, and snapshot testing. `AssertionMode` enum (6 modes). `TestFixture` with setup/teardown lifecycle and context manager. `MockFunction` with call recording, `called_with()`, side effects, `set_return()`. `MockSequence` for sequential return values. Sample data generators: `sample_markdown()`, `sample_research_data()`, `sample_config()`. Assertion helpers: `assert_markdown_valid()` (heading hierarchy, code fences), `assert_contains_all()`, `assert_json_valid()`, `assert_word_count_range()`, `assert_no_duplicates()`. `SnapshotStore` for comparison/regression testing with line-level diffs. `TimingResult` and timing helpers: `time_execution()`, `assert_fast()`.
- New Exports: 72 new public API exports. Total public API: 355 exports.
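The token-bucket strategy named in the rate limiter entry reduces to a few lines: refill proportionally to elapsed time, then spend. This is a generic illustration of the algorithm under a lock, not deepworm's actual `TokenBucket` implementation:

```python
import threading
import time

class TokenBucket:
    """Minimal thread-safe token bucket: `rate` tokens/second, up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)       # start full
        self.updated = time.monotonic()
        self.lock = threading.Lock()

    def _refill(self) -> None:
        # Add tokens proportional to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now

    def allow(self, cost: float = 1.0) -> bool:
        # Atomically refill, then spend `cost` tokens if available.
        with self.lock:
            self._refill()
            if self.tokens >= cost:
                self.tokens -= cost
                return True
            return False
```

A bucket with capacity 2 admits two back-to-back calls and rejects the third until enough time has passed for a refill.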
- Template Engine (`template_engine.py`): Full document template engine with Jinja2-like syntax. `TokenType` enum (17 types: text, variable, if/elif/else/endif, for/endfor, block/endblock, extends, include, macro/endmacro, call_macro, comment, raw). `Token` and `TemplateContext` with dot-notation access, scoped variables, macro support. 16 built-in filters (upper, lower, title, strip, default, length, join, replace, truncate, first, last, reverse, sort, unique, capitalize, wordcount). Expression evaluator with pipe filters, comparisons, boolean operators. `render_template()` with variable substitution, conditionals (if/elif/else), for loops (loop.index/first/last/length), template inheritance (extends/block), includes, macros, comments, raw blocks. `validate_template()`, `extract_variables()`, `list_filters()`, `create_context()`. Presets: `report_template()`, `comparison_template()`.
- Content Security (`security.py`): Security scanning, sanitization, and content policy enforcement. `ThreatLevel` enum (5 levels: none, low, medium, high, critical). `ThreatType` enum (8 types: xss, injection, secret_leak, unsafe_url, path_traversal, ssrf, pii, malicious_content). `SecurityFinding` and `SecurityReport` with markdown output, severity/type grouping. `ContentPolicy` with configurable rules (max_length, allow_html/scripts/iframes/data_urls, blocked_domains, require_https). `sanitize_html()` removes scripts/event handlers/javascript URIs. `detect_secrets()` catches API keys, AWS keys (AKIA...), GitHub tokens, passwords, private keys, JWTs, DB connection strings. `detect_pii()` catches emails, phone numbers, SSNs, credit cards. `validate_url()` with SSRF detection (private IPs/localhost), blocked domains, HTTPS enforcement. `check_path_traversal()`, `sanitize_markdown()`, `redact_text()`, `content_hash()`, `generate_token()`, `constant_time_compare()`, `mask_secret()`. `scan_content()` comprehensive scan. `strict_policy()` and `relaxed_policy()` presets.
- Advanced Caching (`caching.py`): Multi-tier caching with eviction policies. `EvictionPolicy` enum (LRU, LFU, FIFO, TTL). `CacheEntry` with TTL, access tracking, expiration. `CacheStats` with hit/miss/eviction counters, hit_rate. `Cache` class with get/set/delete/has/clear, per-key TTL override, auto-eviction on max_size, `purge_expired()`. `TieredCache` (L1/L2) with automatic promotion to higher tiers on access. `ComputeCache` with auto-compute on miss via loader function. `cache_key()` deterministic key generation. `@memoize` decorator with configurable max_size and TTL. `create_cache()`, `create_tiered_cache()`, `create_compute_cache()` factories.
- Diagnostics & Health (`diagnostics.py`): System diagnostics, health checks, and performance profiling. `HealthStatus` enum (healthy/degraded/unhealthy/unknown). `CheckCategory` enum (system/dependency/config/network/performance/storage). `CheckResult` and `DiagnosticReport` with markdown/dict output, category/status grouping. `Profiler` class for benchmarking functions with min/max/avg/median timing. `EnvironmentInfo` collection with secret masking. `DependencyStatus` checker for required/optional packages. `run_diagnostics()` full diagnostic suite. `quick_check()` lightweight health check. `self_test()` verifies core deepworm functionality (readability, keywords, scoring). `create_profiler()` factory.
- New Exports: 57 new public API exports. Total public API: 283 exports.
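A memoize decorator with bounded size and TTL, as described for the caching module, typically works like this minimal sketch. FIFO eviction and the parameter names here are illustrative assumptions; the real module also offers LRU/LFU policies on its `Cache` class:

```python
import functools
import time
from typing import Optional

def memoize(max_size: int = 128, ttl: Optional[float] = None):
    """Cache results keyed by positional args; evict oldest when full, expire by TTL."""
    def decorator(fn):
        cache: dict = {}  # key -> (stored_at, value); dicts keep insertion order

        @functools.wraps(fn)
        def wrapper(*args):
            now = time.monotonic()
            if args in cache:
                stored_at, value = cache[args]
                if ttl is None or now - stored_at < ttl:
                    return value          # fresh hit
                del cache[args]           # stale entry: drop and recompute
            value = fn(*args)
            if len(cache) >= max_size:
                cache.pop(next(iter(cache)))  # evict the oldest insertion (FIFO)
            cache[args] = (now, value)
            return value

        return wrapper
    return decorator
```

Applying it to an expensive function means repeated calls with the same arguments hit the cache instead of recomputing.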
- Text Transform (`transform.py`): Comprehensive text transformation utilities. `TransformType` enum (case/whitespace/markdown/replace/structure/custom). `TransformResult` with change tracking and diff_ratio. Case transforms: `to_title_case()` with small-word awareness, `to_sentence_case()`. Whitespace: `normalize_whitespace()` (collapse blank lines), `fix_indentation()`. Markdown: `normalize_headings()`, `strip_html()`, `normalize_links()`, `strip_comments()`. Search/replace: `find_replace()` with regex and case-insensitive support, `find_replace_batch()`. Structure: `wrap_text()`, `extract_section()`, `remove_section()`, `reorder_sections()`. `TransformChain` for composable multi-step transforms. `cleanup_transform()` preset chain.
- Audit Trail (`audit.py`): Change tracking and audit logging for document operations. `AuditAction` enum (10 actions: create, update, delete, read, export, validate, approve, reject, archive, restore). `AuditLevel` enum (debug, info, warning, error, critical). `AuditEntry` with SHA-256 checksum, UUID entry_id, ISO timestamps. `AuditPolicy` with configurable actor/detail requirements, max entries, min level. `AuditReport` with markdown output, grouping by action/actor/level. `AuditLog` with query filters, listeners, export (JSON/text). `strict_audit_policy()` and `minimal_audit_policy()` presets.
- Markdown Formatter (`formatter.py`): Advanced markdown formatting and normalization. `ListStyle` (dash/asterisk/plus), `EmphasisStyle`, `TableAlignment` (left/center/right). `FormatOptions` for configurable formatting. `normalize_lists()`, `sort_list()`, `format_table()` with column alignment. `normalize_emphasis()` (asterisk/underscore conversion). `normalize_code_fences()`, `add_language_labels()` with open/close tracking. `normalize_blockquotes()`, `add_heading_ids()` with slug generation. `ensure_blank_lines_around_headings()`, `format_document()` full pipeline. `create_format_options()`.
- Knowledge Graph (`graph.py`): Document relationship analysis and knowledge graph construction. `EdgeType` enum (10 types: references, contains, related_to, depends_on, parent_of, child_of, cites, similar_to, precedes, follows). `Node` and `Edge` data structures with metadata. `KnowledgeGraph` with add/remove nodes and edges, filtering. Path finding: `has_path()`, `shortest_path()` (BFS). Analysis: `connected_components()`, `topological_sort()` (with cycle detection). `subgraph()` extraction, `stats()` (density, components, avg_degree). Export: `to_dict()`, `to_mermaid()`, `to_dot()` (Graphviz). Helpers: `extract_concept_graph()` from headings, `extract_link_graph()` from markdown links. `create_graph()`, `merge_graphs()` with node deduplication.
- New Exports: 55 new public API exports. Total public API: 226 exports.
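BFS-based shortest-path search of the kind the knowledge graph entry mentions reduces to a standard queue traversal. A generic sketch over a plain adjacency map, not the `KnowledgeGraph` class itself:

```python
from collections import deque
from typing import Dict, List, Optional, Set

def shortest_path(edges: Dict[str, List[str]], start: str, goal: str) -> Optional[List[str]]:
    """BFS over an adjacency map; returns the node list of a shortest path, or None."""
    if start == goal:
        return [start]
    seen: Set[str] = {start}
    queue = deque([[start]])          # each queue item is a partial path
    while queue:
        path = queue.popleft()
        for neighbor in edges.get(path[-1], []):
            if neighbor in seen:
                continue              # already reached via an equal-or-shorter path
            if neighbor == goal:
                return path + [neighbor]
            seen.add(neighbor)
            queue.append(path + [neighbor])
    return None                       # goal unreachable
```

Because BFS explores paths in order of length, the first path to reach the goal is guaranteed to be among the shortest.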
- Content Compliance (`compliance.py`): Style guide enforcement and content quality checking. `Severity` enum (error/warning/info/suggestion). `IssueCategory` enum (formatting/style/consistency/content/accessibility/structure). `ComplianceReport` with score (0-100), category/severity grouping, markdown output. `StyleGuide` with configurable rules (sentence/paragraph length, banned/preferred words, require intro/conclusion). 13 built-in checks: sentence length, paragraph length, heading hierarchy, alt text, banned words, preferred words, passive voice, weasel words, clichés, redundant phrases, formatting, consecutive headings, structure. `academic_style_guide()` and `technical_style_guide()` presets.
- Internationalization (`i18n.py`): Multi-language support and translation management. `TranslationEntry` with locale-aware fallback. `TranslationCatalog` with PO and JSON export, coverage statistics. `LanguageDetection` with confidence scoring and script identification. `detect_language()` for 12 languages (en, tr, de, fr, es, pt, it, ja, zh, ko, ar, ru). 8 script types (Latin, CJK, Hiragana, Katakana, Hangul, Arabic, Cyrillic, Devanagari). `extract_translatable()` for markdown. `create_catalog()` and `merge_catalogs()`.
- Document Schema Validation (`schema.py`): Structured data validation and document schema enforcement. `FieldType` enum (string, integer, float, boolean, list, dict, date, url, email, markdown). `SchemaField` with constraint-based validation (min/max length, pattern, choices, value range). `SectionRule` for heading presence and word count constraints. `DocumentSchema` with `validate_data()`, `validate_document()`, `to_json_schema()` export. `report_schema()` and `article_schema()` presets. `create_schema()` dict-based construction helper.
- Pipeline Hooks & Middleware (`hooks.py`): Lifecycle hooks for document processing pipelines. `HookStage` enum with 10 stages (pre/post research, analysis, generation, export, error, complete). `HookContext` with data store and cancellation. `HookRegistry` with register/unregister, enable/disable, priority ordering. `Pipeline` with composable multi-step processing and automatic hook integration. `PipelineResult` with timing and error aggregation. `create_middleware()` before/after wrapper. `@hook` decorator for global registry.
- New Exports: 35 new public API exports. Total public API: 171 exports.
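Priority-ordered hook dispatch, as described for the hooks module, can be illustrated with a small registry. The names and the fold-over-return-values behavior here are assumptions for illustration, not deepworm's actual `HookRegistry`:

```python
from typing import Callable, Dict, List, Tuple

class HookRegistry:
    """Register callbacks per stage; run them in ascending priority order."""

    def __init__(self) -> None:
        self._hooks: Dict[str, List[Tuple[int, Callable]]] = {}

    def register(self, stage: str, fn: Callable, priority: int = 100) -> None:
        self._hooks.setdefault(stage, []).append((priority, fn))

    def run(self, stage: str, data):
        # Each hook receives the previous hook's return value (a simple fold);
        # sorting on the priority element avoids comparing functions.
        for _, fn in sorted(self._hooks.get(stage, []), key=lambda pair: pair[0]):
            data = fn(data)
        return data
```

Registering `str.strip` at priority 1 and `str.upper` at priority 2 on the same stage means input text is stripped first, then uppercased.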
- Word Cloud & Frequency Analysis (`wordcloud.py`): Generate word frequency data and cloud visualizations. `WordFrequency` dataclass with count, frequency, rank, TF-IDF, weight. `WordCloudData` with multiple output formats (markdown table, inline HTML cloud, CSV, size map). `generate_word_cloud()` with 130+ built-in stop words, configurable max_words, min_length, min_count, custom stop words. `compare_word_clouds()` for frequency distribution comparison. `tfidf_cloud()` for multi-document TF-IDF analysis. Markdown stripping and code block/URL removal in the tokenizer.
- Document Revision Tracking (`revisions.py`): Track changes between document versions with full history management. `Revision` with SHA-256 content hashing, word/line counts. `RevisionDiff` with LCS-based diff algorithm, unified diff output, markdown format. `RevisionHistory` with add/get/rollback/changelog/statistics. `compute_diff()` with modification detection (adjacent delete+add merging). `track_changes()` for quick two-version comparison. `merge_revisions()` with chronological ordering and deduplication.
- Comprehensive Statistics (`statistics.py`): 25+ document metrics with markdown awareness. `TextStatistics` covering characters, words, sentences, paragraphs, vocabulary richness, hapax legomena, reading/speaking time (238/150 WPM). `compare_statistics()` for side-by-side document comparison with diff. `vocabulary_analysis()` with frequency distribution, rare words, type-token ratio. `section_statistics()` for per-heading breakdown. `reading_level()` with Flesch-Kincaid Grade Level and Automated Readability Index.
- Table of Contents (`toc.py`): Generate, customize, and inject a table of contents from markdown headings. `TocEntry` with auto-anchor slugification, depth tracking. `TableOfContents` with flat view, level filtering, max_depth. Multiple output formats: markdown, numbered markdown (hierarchical 1, 1.1, 1.2), HTML. `extract_toc()` with duplicate anchor handling. `inject_toc()` with marker-based or auto-placement insertion. `merge_tocs()` for combining multiple ToCs.
- New Exports: 27 new public API exports. Total public API: 136 exports.
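Word-frequency extraction of the kind the word cloud and statistics entries describe boils down to tokenize, filter stop words, count. A minimal sketch; the stop-word set is a tiny stand-in for the 130+ built-ins, and the function name is illustrative:

```python
import re
from collections import Counter
from typing import Dict

# Tiny illustrative stop-word set (the real module ships 130+).
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it"}

def word_frequencies(text: str, min_length: int = 3, max_words: int = 10) -> Dict[str, int]:
    """Tokenize lowercase words, drop stop words and short tokens, keep top counts."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(
        tok for tok in tokens if len(tok) >= min_length and tok not in STOP_WORDS
    )
    return dict(counts.most_common(max_words))
```

The resulting counts feed directly into rank, frequency, and weight fields of a word-cloud record.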
- Timeline Extraction (`timeline.py`): Extract dates and events from reports to build chronological timelines. 7 date pattern types (ISO, full date, month-year, quarter, decade, century, year references). Auto-categorization (technology, business, science, policy, milestone). `Timeline` with sort, filter, merge, deduplicate. `compare_timelines()` for overlap analysis. Markdown list, table, and dict output.
- Bibliography Management (`references.py`): Structured reference management with APA, MLA, and BibTeX formatting. `Reference` with citation_key, author_string. `Bibliography` with add/find/sort/deduplicate, group by type. `extract_references()` detects inline citations, markdown links, bare URLs, DOIs, Author (Year) patterns. `inject_bibliography()` and `merge_bibliographies()`.
- Sentiment Analysis (`sentiment.py`): Lexicon-based sentiment analysis with negation handling and intensifier detection. `SentimentScore` with positive/negative/compound scores. `ToneAnalysis` with formality, objectivity, 6 bias patterns. `analyze_report_sentiment()` with section and sentence breakdowns. `sentiment_diff()` for comparing text sentiment.
- Cross-Referencing (`crossref.py`): Detect, create, and validate internal cross-references. `CrossRefIndex` with targets (section, figure, table) and links. `build_crossref_index()` scans `{#label}` and `{@label}` syntax. `inject_crossrefs()` replaces references with formatted display. `generate_list_of_figures()` and `generate_list_of_tables()`. Validation for unresolved references and duplicate labels.
- New Exports: 25 new public API exports including `Timeline`, `TimelineEvent`, `extract_timeline`, `create_timeline`, `compare_timelines`, `Reference`, `Bibliography`, `extract_references`, `create_reference`, `inject_bibliography`, `merge_bibliographies`, `SentimentScore`, `SentimentReport`, `ToneAnalysis`, `analyze_sentiment`, `analyze_tone`, `analyze_report_sentiment`, `sentiment_diff`, `CrossRefIndex`, `CrossRefTarget`, `build_crossref_index`, `inject_crossrefs`, `generate_list_of_figures`, `generate_list_of_tables`. Total public API: 109 exports.
- Glossary Extraction (`glossary.py`): Automatic glossary generation from research reports. 5 definition patterns ("defined as", "refers to", "which is", "i.e.", em-dash), abbreviation detection (e.g., "Natural Language Processing (NLP)"), compound term extraction from headings. `Glossary` with add/get/remove/sort (alphabetical, frequency, occurrence). `inject_glossary()` appends a formatted glossary section. Markdown table and definition list output.
- Text Similarity Analysis (`similarity.py`): Three similarity metrics: cosine similarity (TF vectors), Jaccard similarity (set overlap), overlap coefficient. `compare_texts()` combines all metrics with `is_similar` (>0.6) and `is_duplicate` (>0.85) thresholds. `detect_plagiarism()` via common n-gram sequences. `find_similar()` corpus search. `text_fingerprint()` for document fingerprinting.
- Report Annotations (`annotations.py`): 6 annotation types: comment, highlight, question, todo, warning, fact_check. `AnnotationSet` with add/resolve/filter/summary. `annotate_report()` with inline HTML comments or append styles. `extract_annotations()` parses HTML comment and CriticMarkup (`{>> <<}`) formats. `auto_annotate()` detects vague language, unsupported statistics, and TODO markers.
- Batch Research (`batch.py`): Run multiple research tasks sequentially with `create_batch()` and `run_batch()`. `BatchConfig` with stop_on_error, retry_failed (configurable max_retries), delay_between tasks. `BatchResult` with success_rate, `combine_reports()`, markdown summary. `batch_from_file()` loads topics from text files.
- New Exports: 20 new public API exports including `AnnotationSet`, `AnnotationType`, `annotate_report`, `auto_annotate`, `extract_annotations`, `BatchConfig`, `BatchResult`, `BatchStatus`, `BatchTask`, `create_batch`, `run_batch`, `Glossary`, `GlossaryEntry`, `extract_glossary`, `inject_glossary`, `SimilarityResult`, `compare_texts`, `cosine_similarity`, `detect_plagiarism`, `find_similar`. Total public API: 85 exports.
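The cosine and Jaccard metrics listed for the similarity module are standard formulas. For reference, minimal textbook implementations over whitespace tokens (deepworm's tokenization and signatures may differ):

```python
import math
from collections import Counter

def _tokens(text: str):
    return text.lower().split()

def jaccard_similarity(a: str, b: str) -> float:
    """Set overlap: |intersection| / |union| over word sets."""
    sa, sb = set(_tokens(a)), set(_tokens(b))
    if not sa and not sb:
        return 1.0  # two empty texts are identical by convention
    return len(sa & sb) / len(sa | sb)

def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between term-frequency vectors."""
    va, vb = Counter(_tokens(a)), Counter(_tokens(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0
```

Thresholding these scores (e.g. >0.6 for "similar", >0.85 for "duplicate", as the entry describes) turns the raw metrics into boolean judgments.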
- Keyword Extraction (`keywords.py`): TF-based keyword and keyphrase extraction from reports. Bigram/trigram phrase detection, stop word filtering, deduplication of subsumed terms, `extract_tags()` for short tag generation. `KeywordResult` with markdown table output.
- Footnote Management (`footnotes.py`): Convert inline citations and markdown links to numbered footnotes. Three render styles (markdown, endnotes, inline). `renumber_footnotes()` fixes gaps, `strip_footnotes()` removes all markers, `merge_footnotes()` combines multiple results.
- Unified Export Hub (`export.py`): Single interface for multi-format report export: Markdown (with ToC), HTML (responsive CSS), JSON (structured sections), plain text (word-wrapped), Notion (block API), CSV. `batch_export()` for exporting to multiple formats at once.
- Summary & Abstract Generator (`summary.py`): 4 summarization styles: executive, abstract, bullets, TLDR. `extract_key_findings()` with importance scoring (8 signal patterns). `extract_topics()` from report headings.
- Readability Analysis (`readability.py`): 4 readability formulas: Flesch Reading Ease, Flesch-Kincaid Grade, Gunning Fog, Coleman-Liau. Vocabulary richness, reading level classification, markdown stripping.
- Progress Tracking (`progress.py`): Real-time research progress with 10 stages, callback support, ETA estimation, progress bar formatting.
- New Exports: 25 new public API exports including `Keyword`, `KeywordResult`, `extract_keywords`, `extract_tags`, `FootnoteResult`, `add_footnotes`, `merge_footnotes`, `renumber_footnotes`, `strip_footnotes`, `ExportFormat`, `ExportOptions`, `ExportResult`, `export_report`, `batch_export`, `Summary`, `summarize`, `extract_key_findings`, `extract_topics`, `ReadabilityResult`, `analyze_readability`, `ProgressTracker`, `ProgressSnapshot`, `ResearchStage`. Total public API: 65 exports.
- Report Outline Generation: Structured outline creation with 3 styles: comprehensive (6 sections), brief (3 sections), academic (8 sections with Abstract, Literature Review, Methodology). Comparison-aware section generation for "vs" topics. Reverse-engineering outlines from existing reports via `outline_from_report()`.
- Source Credibility Scoring: Multi-factor credibility assessment for web sources. 3-tier domain authority system, content quality analysis (research language, references, spam detection), freshness scoring, URL structure signals. `CredibilityReport` with markdown table output.
- Notion Export: Convert markdown reports to Notion API block format. Supports headings, paragraphs, code blocks, quotes, lists, tables, dividers. Rich text parsing with bold, italic, inline code, and links. Roundtrip support via `notion_to_markdown()`.
- Progress Tracking: Real-time research progress tracking with 10 research stages, callback support, ETA estimation, and progress bar utilities. `ProgressTracker` with `ProgressSnapshot` for serializable progress state.
- Environment Variable Config Overrides: `DEEPWORM_*` environment variables (e.g., `DEEPWORM_DEPTH=5`, `DEEPWORM_PROVIDER=anthropic`) override config file settings. `Config.from_env()` classmethod for explicit env-based configuration.
- Retry Strategies: Advanced retry decorator with 4 backoff strategies (exponential, linear, constant, exponential_jitter). Circuit breaker pattern with closed/open/half-open states and auto-recovery.
- Markdown Link Checker: Extract and validate links from markdown reports. `LinkReport` with health scoring, broken link detection, and markdown output.
- New Exports: `CredibilityScore`, `CredibilityReport`, `score_source`, `score_sources`, `NotionBlock`, `NotionPage`, `export_notion_json`, `markdown_to_notion`, `OutlineSection`, `ReportOutline`, `generate_outline`, `outline_from_report` added to the public API.
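An exponential-backoff retry decorator of the kind described in the retry strategies entry can be sketched as follows. Parameter names and defaults are illustrative, not the library's actual signature:

```python
import functools
import time

def retry(attempts: int = 3, base_delay: float = 0.01, backoff: float = 2.0,
          exceptions: tuple = (Exception,)):
    """Retry decorator: sleep base_delay, then multiply by `backoff` between tries."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            delay = base_delay
            for attempt in range(attempts):
                try:
                    return fn(*args, **kwargs)
                except exceptions:
                    if attempt == attempts - 1:
                        raise          # out of attempts: surface the last error
                    time.sleep(delay)  # back off before the next try
                    delay *= backoff
        return wrapper
    return decorator
```

A circuit breaker extends this idea by refusing calls entirely (open state) after repeated failures, then probing with a single call (half-open) before fully recovering (closed).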
- Research Planner (`--plan`, `--plan-only`): Pre-research topic analysis that generates a structured research plan with sub-questions, key aspects, complexity estimation, and suggested depth/breadth settings. Uses the LLM for intelligent analysis with a heuristic fallback.
- YAML Config Support: Load configuration from `deepworm.yaml`, `deepworm.yml`, `.deepworm.yaml`, or `.deepworm.yml` files. Supports both flat and nested (`deepworm:` key) formats.
- Config File Flag (`--config FILE`): Load configuration from a specific TOML or YAML file.
- Topic Validator: Automatic topic validation before research: catches empty/too-short/too-long topics, detects vague or overly broad topics, normalizes whitespace, provides improvement suggestions.
- Markdown Table Generation: Utility module for creating well-formatted markdown tables from lists of dicts, key-value pairs, or CSV data. Supports column alignment, transposition, and CSV import/export.
- Content Extraction: Advanced HTML content extraction with metadata (title, author, date, description), heading/link/code block extraction, reading time estimation, and content quality scoring.
- New Exports: `ResearchPlan`, `generate_plan`, `estimate_complexity`, `ValidationResult`, `validate_topic` added to the public API.
- PyYAML added as an optional dependency (`pip install deepworm[yaml]`).
- Config Validation: All configuration values are now validated on creation. Invalid provider, depth, breadth, temperature, or search settings raise clear `ValueError` messages.
- Report Quality Scoring (`--score`): Automated report quality assessment across 5 dimensions (structure, depth, sources, readability, completeness) with letter grades (A+ to F) and improvement suggestions.
- Research Metrics (`--metrics`): Detailed instrumentation tracking: timing breakdown (search/analysis/synthesis), API call counts, fetch success rates, duplicate detection stats, retry counts, and error tracking.
- Rate Limiting: Built-in rate limiter for LLM API calls (`max_requests_per_minute` config option) prevents hitting provider limits.
- Research Timeout (`--timeout SECONDS`): Set a time budget for research; automatically proceeds to synthesis when the budget expires.
- Section Filtering (`--sections PATTERN`): Filter report output to only sections matching a regex pattern.
- Parallel Search: Search queries now execute concurrently (up to 4 workers), significantly speeding up the research phase.
- Report Diffing (`--diff OLD NEW`): Compare two report files side by side with unified diff, added/removed line counts, and similarity ratio.
- Report Analysis: Table of contents generation (`--toc`), report statistics (`--stats`), link extraction, and report summaries.
- Research Resume (`--resume [FILE]`): Resume interrupted research from saved session files. Use `--resume auto` to find and resume the latest in-progress session.
- Logging Module (`--log-file`, `--log-level`): Structured logging with configurable levels and optional file output for debugging.
- Link Extraction: Extract all links from reports (inline markdown, bare URLs) with deduplication.
- Report Summary: Auto-extract a brief summary from the report's first content paragraph.
- New modules: `scoring.py`, `metrics.py`, `diff.py`, `log.py`
- 258 tests (up from 203)
- Follow-up Questions: Auto-generated follow-up questions appended to research reports. Disable with `--no-followup`.
- Interactive Mode (`--interactive`/`-i`): Post-research Q&A loop that lets you ask follow-up questions about the report using the same LLM.
- Clipboard Export (`--copy`): Copy the research report directly to your system clipboard (macOS, Linux, Windows).
- Multi-Language Support (`--lang CODE`): Generate reports in 17 languages including English, Turkish, German, French, Spanish, Portuguese, Italian, Russian, Chinese, Japanese, Korean, Arabic, Hindi, Dutch, Polish, Swedish, and Ukrainian. Use `--list-languages` to see all options.
- Research Chaining (`--chain N`): Run progressive deep-dive research that builds on previous findings. Each step identifies the most important sub-topic to explore next.
- Configuration Profiles: Save and reuse research configurations with `--save-profile`, `--profile`, `--list-profiles`, `--delete-profile`.
- Source Export (`--export-sources FILE`): Export discovered sources as JSON, CSV, or BibTeX for external citation management.
- Source Import: Programmatic API to import previously exported sources.
- New modules: `languages.py`, `chain.py`, `profiles.py`, `sources.py`
- 203 tests (up from 154)
## 0.2.0 - 2025-01-20
- Research history: Persistent JSONL log of all completed research (`--history`, `--history-search`, `--history-stats`, `--history-clear`)
- Custom exceptions: `DeepWormError` hierarchy with user-friendly messages and hints (`APIKeyError`, `RateLimitError`, `ProviderError`, `ConfigError`, etc.)
- Citation formatting: APA, MLA, Chicago, and BibTeX citation styles with auto-publisher detection (`deepworm.citations`)
- Plugin/hook system: 6 hook types for pipeline customization (`transform_queries`, `filter_source`, `post_analysis`, `post_report`, `pre_search`, `post_search`)
- Structured event system: `EventEmitter` with 13 event types for progress tracking
- Async research API: `AsyncResearcher` and `async_research()` for web framework integration
- HTML export: Responsive reports with dark mode CSS (`--format html` or `.html` extension)
- Multiple search providers: Brave Search API and SearXNG in addition to DuckDuckGo (`--search-provider`)
- TOML config file support: `deepworm.toml`, `.deepworm.toml`, `pyproject.toml [tool.deepworm]`
- Disk cache: Cached search results and page content with 24h TTL (`--no-cache`, `--clear-cache`)
- Streaming output: Real-time report generation (`--stream`)
- Session save/resume: Auto-save after each iteration
- Source quality scoring: Domain authority heuristics and keyword overlap ranking
- Retry decorator: `@retry()` with exponential backoff, exception filtering, and callbacks
- Text utilities: `chunk_text()` for splitting long documents, `sanitize_filename()`
- Thread-safe rate limiter
- 5 new example scripts: FastAPI server, plugin usage, event monitoring, async research, HTML export
- 132 tests (up from 36 in v0.1.0)
- Improved CLI error handling with friendly messages and `--debug` traceback
- LLM client validates API keys at initialization with clear error messages
- Research engine records all completed research to persistent history
## 0.1.0 - 2025-01-20
- Initial release
- Core research engine with iterative deep search loop
- Multi-provider LLM support (OpenAI, Anthropic, Google, Ollama)
- Web search via DuckDuckGo (with HTML fallback)
- Concurrent page fetching with ThreadPoolExecutor
- CLI with interactive mode
- Python API (`research()`, `DeepResearcher`)
- Comparison mode (`--compare`) for multi-topic research
- Persona mode (`--persona`) for research perspective tuning
- JSON output (`--json`) for programmatic usage
- Debug logging (`--debug`)
- Report export to Markdown, plain text, or JSON
- Retry logic with exponential backoff for LLM calls
- Source relevance tracking
- Per-iteration and total timing display
- 36 tests with CI across Python 3.9–3.13
- CONTRIBUTING.md