Releases: hashangit/Extract2MD
Extract2MD v2.0.0 - Major Release
🚀 Full Redesign & Complete API Overhaul
Release Date: 24-05-2025
Version: 2.0.0 (Breaking Changes)
Migration Support: Legacy API maintained for transition period
📋 Release Overview
Extract2MD v2.0.0 represents a complete reimagining of the library with a focus on developer experience, intuitive usage patterns, and modern architecture. This major release introduces a revolutionary scenario-based API that replaces the complex instance-based approach with clear, purpose-driven methods.
Core Philosophy: Instead of configuring complex options, developers now choose from 5 distinct conversion scenarios that match their specific use cases.
⚠️ Breaking Changes
API Complete Redesign
- Old: Instance-based API with complex configuration options
- New: Static methods with scenario-based approach
- Impact: All existing integrations require updates
- Migration: Legacy API available as LegacyExtract2MDConverter during transition
Configuration Changes
- Old: Loose configuration object with numerous optional parameters
- New: Structured configuration with validation and default merging
- Impact: Configuration structure has changed significantly
- Migration: Use ConfigValidator for seamless config handling
Import/Export Changes
- Old: Single converter class export
- New: Modular exports with main converter and utilities
- Impact: Import statements need updating
- Migration: Update imports and follow new module structure
✨ New Features
🎯 Scenario-Based API
Five distinct conversion methods designed for specific use cases:
1. Quick Only - Extract2MDConverter.quickOnly()
- Purpose: Fast PDF.js-based text extraction
- Best For: Clean PDFs with selectable text
- Performance: Fastest option, minimal processing
- Use Case: Documentation, reports, digital-native PDFs
2. High Accuracy OCR Only - Extract2MDConverter.highAccuracyOCROnly()
- Purpose: Tesseract OCR with canvas rendering
- Best For: Scanned documents, images, complex layouts
- Performance: Slower but highly accurate
- Use Case: Scanned books, historical documents, printed materials
3. Quick + LLM - Extract2MDConverter.quickPlusLLM()
- Purpose: Fast extraction enhanced with AI processing
- Best For: PDFs needing structure improvement
- Performance: Moderate, WebGPU accelerated
- Use Case: Business documents, formatted reports
4. High Accuracy + LLM - Extract2MDConverter.highAccuracyPlusLLM()
- Purpose: OCR processing with AI enhancement
- Best For: Complex documents requiring both OCR and AI
- Performance: Comprehensive, highest quality
- Use Case: Academic papers, technical documents
5. Combined + LLM - Extract2MDConverter.combinedPlusLLM()
- Purpose: All extraction methods with AI post-processing
- Best For: Maximum accuracy and formatting
- Performance: Most thorough, longest processing time
- Use Case: Critical documents, archival processing
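The choice among the five scenarios above can be sketched as a small decision helper. This is only an illustration: `pickScenario` and its input flags are hypothetical and not part of the library. Only the returned method names come from the list above.

```javascript
// Hypothetical helper: maps rough document traits to one of the five
// scenario method names listed above. The flag names are assumptions.
function pickScenario({ hasSelectableText, wantsLLMCleanup, maximizeAccuracy = false }) {
  // Combined extraction plus LLM is the most thorough and slowest option.
  if (maximizeAccuracy && wantsLLMCleanup) return 'combinedPlusLLM';
  // Digital-native PDFs with selectable text can use fast PDF.js extraction.
  if (hasSelectableText) {
    return wantsLLMCleanup ? 'quickPlusLLM' : 'quickOnly';
  }
  // Scanned or image-only documents need OCR.
  return wantsLLMCleanup ? 'highAccuracyPlusLLM' : 'highAccuracyOCROnly';
}

// Assumed usage. The argument list of the scenario methods is not given in
// these notes, so the call is shown only as a comment:
//   const method = pickScenario({ hasSelectableText: true, wantsLLMCleanup: false });
//   const markdown = await Extract2MDConverter[method](pdfFile, config);
```

Returning the method name as a string keeps the helper independent of the library, so the decision logic can be tested without loading PDF.js, Tesseract, or an LLM.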
🧩 Modular Architecture
Complete internal refactoring into specialized modules:
- Extract2MDConverter.js - Main converter with scenario methods
- WebLLMEngine.js - Encapsulated LLM integration
- ConfigValidator.js - Configuration validation and defaults
- OutputParser.js - LLM output cleaning and formatting
- SystemPrompts.js - Centralized prompt management
📚 Comprehensive Documentation Suite
New Documentation Files:
- MIGRATION.md - Step-by-step migration guide with code examples
- DEPLOYMENT.md - Complete deployment guide for all environments
- config.example.json - Full configuration example
- Updated README.md - Rewritten for new API
Interactive Examples:
- demo.html - Live interactive demo showcasing all 5 scenarios
- usage-examples.js - Updated code examples for new API
- SSL certificates - Demo server setup for local testing
⚙️ Enhanced Configuration System
- Structured Configuration Object with clear hierarchy
- Built-in Validation with ConfigValidator utility
- JSON Configuration Support for external config files
- Default Value Merging for simplified setup
- Type Safety with comprehensive TypeScript definitions
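Default value merging, as listed above, can be pictured as a recursive merge of user-supplied values over a defaults object. The sketch below is an assumption about the general technique, not the library's implementation: the key names in `DEFAULTS` and the `mergeWithDefaults` helper are hypothetical.

```javascript
// Illustrative defaults. These key names are assumptions, not the library's schema.
const DEFAULTS = {
  ocr: { language: 'eng' },
  llm: { temperature: 0.7, maxTokens: 4096 },
  progressCallback: null,
};

function isPlainObject(value) {
  return value !== null && typeof value === 'object' && !Array.isArray(value);
}

// Recursively merge user values over defaults without mutating either input.
function mergeWithDefaults(defaults, user = {}) {
  const result = { ...defaults };
  for (const [key, value] of Object.entries(user)) {
    result[key] = isPlainObject(value) && isPlainObject(defaults[key])
      ? mergeWithDefaults(defaults[key], value)
      : value;
  }
  return result;
}

// Only the overridden field changes; sibling defaults are preserved.
const config = mergeWithDefaults(DEFAULTS, { llm: { temperature: 0.2 } });
```

Merging nested objects key by key, rather than replacing them wholesale, is what lets a caller override a single field without restating the rest of the section.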
🧪 Robust Testing Framework
New comprehensive test suite:
- scenarios.test.js - Tests for all 5 scenario methods
- simple.test.js - Basic structure validation
- newline-optimization.test.js - Markdown formatting tests
- simple-newline.test.js - Standalone newline processing tests
- validate-deployment.js - Deployment readiness validation
🔧 Technical Improvements
Build System Enhancements
- Dual Bundle Generation: UMD and ESM formats
- Optimized Distribution: Essential workers and definitions copied to dist
- Updated Entry Points: Proper main, module, and types configuration
- Enhanced Packaging: Improved file inclusion/exclusion
TypeScript Integration
- Complete Type Definitions in src/types/index.d.ts
- Scenario Method Types with proper return types and parameters
- Configuration Interfaces for type-safe config handling
- Legacy Compatibility Types for migration support
Performance Optimizations
- WebGPU Capability Detection for LLM scenarios
- Modular Loading reduces initial bundle size
- Optimized Canvas Rendering for OCR processing
- Streaming LLM Support for better user experience
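WebGPU capability detection is commonly done through the standard `navigator.gpu` entry point and its `requestAdapter()` method. The sketch below shows that general pattern; the helper names are hypothetical and this is not necessarily how the library performs its check.

```javascript
// Quick synchronous check: does the environment expose the WebGPU entry point?
// The `nav` parameter defaults to the global navigator and exists mainly so
// the function can be exercised outside a browser.
function hasWebGPUApi(nav = globalThis.navigator) {
  return Boolean(nav && nav.gpu);
}

// Fuller check: requestAdapter() can resolve to null even when navigator.gpu
// exists, for example when no suitable GPU is available.
async function canUseWebGPU(nav = globalThis.navigator) {
  if (!hasWebGPUApi(nav)) return false;
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null && adapter !== undefined;
  } catch {
    return false;
  }
}
```

An application might run a check like this before offering the LLM scenarios and fall back to one of the non-LLM scenarios when it returns false.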
Developer Experience
- Clear Error Messages with improved error handling
- Progress Tracking across all conversion scenarios
- Intuitive Method Names that clearly indicate functionality
- Consistent Return Formats across all scenarios
🛤️ Migration Guide
Immediate Steps
- Install v2.0.0: npm install extract2md@2.0.0
- Use Legacy API: Replace Extract2MDConverter with LegacyExtract2MDConverter
- Test Functionality: Ensure existing code works with legacy API
- Plan Migration: Review MIGRATION.md for upgrade path
Recommended Migration Process
- Identify Usage Patterns: Determine which scenarios match your current usage
- Update Configuration: Migrate to new structured config format
- Replace Method Calls: Switch to appropriate scenario-based methods
- Update Error Handling: Adapt to new error formats
- Test Thoroughly: Validate output quality and performance
Timeline
- v2.0.0 - v2.x.x: Legacy API available alongside new API
- v3.0.0: Legacy API will be removed (future major release)
- Recommended: Migrate within 1 month for best support
📦 Installation & Deployment
NPM Installation
npm install extract2md@2.0.0
Import Examples
// New API (recommended)
import { Extract2MDConverter } from 'extract2md';
// Legacy API (for migration)
import { LegacyExtract2MDConverter } from 'extract2md';
// Utilities
import { ConfigValidator, OutputParser } from 'extract2md';
Deployment Options
- Node.js Applications: Full feature support
- Web Applications: Browser-compatible with WebWorkers
- CDN Distribution: Direct browser usage
- Static Sites: Pre-built bundle integration
🌟 What's New in Detail
WebLLM Engine Integration
- Standalone Engine Class for better modularity
- Streaming Support for real-time processing feedback
- Model Loading Management with error handling
- WebGPU Optimization for enhanced performance
Output Processing Pipeline
- Thinking Tag Removal from LLM outputs
- Markdown Normalization for consistent formatting
- Newline Optimization for better readability
- Post-processing Hooks for custom transformations
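The pipeline steps above can be sketched as a chain of small string transforms. This is a minimal illustration under stated assumptions: the `<think>` tag name, the function names, and the hook signature are hypothetical, and the library's OutputParser may work differently.

```javascript
// Remove <think>...</think> blocks. The tag name is an assumption.
function stripThinkingTags(text) {
  return text.replace(/<think>[\s\S]*?<\/think>/gi, '');
}

// Normalize line endings and collapse runs of three or more newlines
// into a single blank line.
function optimizeNewlines(text) {
  return text
    .replace(/\r\n/g, '\n')
    .replace(/\n{3,}/g, '\n\n')
    .trim();
}

// Apply the built-in steps, then any caller-supplied hooks in order.
// Each hook is assumed to be a function from string to string.
function cleanLLMOutput(raw, hooks = []) {
  const base = optimizeNewlines(stripThinkingTags(raw));
  return hooks.reduce((text, hook) => hook(text), base);
}

const raw = '<think>plan the outline</think>\n# Title\n\n\n\nBody text.\n';
const cleaned = cleanLLMOutput(raw);
```

Keeping each step as a pure function makes it straightforward to test them in isolation and to add custom transformations at the end of the chain.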
Configuration Validation
- Schema-based Validation with clear error messages
- Default Value Injection for missing configuration
- Type Coercion for flexible config input
- JSON File Support for external configuration
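Schema-based validation with default injection, type coercion, and field-level errors can be illustrated as follows. The schema fields, the `coerce` and `validateConfig` helpers, and the error message format are all assumptions made for this sketch. They do not reflect the actual ConfigValidator API.

```javascript
// Hypothetical schema: field name -> expected type and default value.
const SCHEMA = {
  maxPages: { type: 'number', default: 50 },
  useWebGPU: { type: 'boolean', default: true },
  language: { type: 'string', default: 'eng' },
};

// Coerce common string inputs; return undefined when coercion is not possible.
function coerce(value, type) {
  if (typeof value === type) return value;
  if (type === 'number' && typeof value === 'string' &&
      value.trim() !== '' && !Number.isNaN(Number(value))) {
    return Number(value);
  }
  if (type === 'boolean' && (value === 'true' || value === 'false')) {
    return value === 'true';
  }
  return undefined;
}

function validateConfig(input = {}) {
  const config = {};
  const errors = [];
  for (const [field, rule] of Object.entries(SCHEMA)) {
    if (input[field] === undefined) {
      config[field] = rule.default; // default value injection
      continue;
    }
    const coerced = coerce(input[field], rule.type);
    if (coerced === undefined) {
      errors.push(`${field}: expected ${rule.type}, got ${typeof input[field]}`);
    } else {
      config[field] = coerced;
    }
  }
  for (const field of Object.keys(input)) {
    if (!(field in SCHEMA)) errors.push(`${field}: unknown field`);
  }
  return { config, errors };
}

// A JSON config file would be parsed first, e.g. with JSON.parse(fileText).
const result = validateConfig(JSON.parse('{"maxPages": "10", "useWebGPU": "yes"}'));
```

Collecting every error instead of throwing on the first one lets a caller report all invalid fields at once.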
Enhanced Error Handling
- Scenario-specific Errors with context information
- Validation Errors with field-level details
- Processing Errors with progress context
- Recovery Suggestions for common issues
🔮 Looking Forward
Planned Enhancements
- Additional Scenarios based on user feedback
- Performance Optimizations for large document processing
- Enhanced LLM Models support and configuration
- Advanced Output Formats beyond Markdown
Community & Support
- Migration Support: Comprehensive documentation and examples
- Community Feedback: Open to suggestions for new scenarios
- Regular Updates: Incremental improvements and bug fixes
- Long-term Support: Commitment to stable API evolution
📞 Support & Resources
- Migration Guide: MIGRATION.md - Complete migration instructions
- Deployment Guide: DEPLOYMENT.md - Production deployment best practices
- Interactive Demo: examples/demo.html - Try all scenarios
- Configuration Example: config.example.json - Complete config reference
- Type Definitions: Full TypeScript support included
🙏 Acknowledgments
This major release represents months of development focused on creating the most intuitive and powerful PDF-to-Markdown conversion experience. Thank you to all contributors and early adopters who provided feedbac...
