# PyWebLayout Testing Strategy This document outlines the comprehensive unit testing strategy for the pyWebLayout project. ## Testing Philosophy The testing strategy follows these principles: - **Separation of Concerns**: Each component is tested independently - **Comprehensive Coverage**: All public APIs and critical functionality are tested - **Integration Testing**: End-to-end workflows are validated - **Regression Prevention**: Tests prevent breaking changes - **Documentation**: Tests serve as living documentation of expected behavior ## Test Organization ### Current Test Files (Implemented) #### ✅ `test_html_style.py` Tests the `HTMLStyleManager` class for CSS parsing and style management. **Coverage:** - Style initialization and defaults - Style stack operations (push/pop) - CSS property parsing (font-size, font-weight, colors, etc.) - Color parsing (named, hex, rgb, rgba) - Tag-specific default styles - Inline style parsing - Font object creation - Style combination (tag + inline styles) #### ✅ `test_html_text.py` Tests the `HTMLTextProcessor` class for text buffering and word creation. **Coverage:** - Text buffer management - HTML entity reference handling - Character reference processing (decimal/hex) - Word creation with styling - Paragraph management - Text flushing operations - Buffer state operations #### ✅ `test_html_content.py` Integration tests for the `HTMLContentReader` class covering complete HTML parsing. **Coverage:** - Simple paragraph parsing - Heading levels (h1-h6) - Styled text (bold, italic) - Lists (ul, ol, dl) - Tables with headers and cells - Blockquotes with nested content - Code blocks with language detection - HTML entities - Nested element structures - Complex document parsing #### ✅ `test_abstract_blocks.py` Tests for the core abstract block element classes. **Coverage:** - Paragraph word management - Heading levels and properties - Quote nesting capabilities - Code block line management - List creation and item handling - Table structure (rows, cells, sections) - Image properties and scaling - Simple elements (hr, br) #### ✅ `test_runner.py` Test runner script for executing all tests with summary reporting. --- ## Additional Tests Needed ### 🔄 High Priority (Should Implement Next) #### `test_abstract_inline.py` Tests for inline elements and text formatting. **Needed Coverage:** - Word creation and properties - Word hyphenation functionality - FormattedSpan management - Word chaining (previous/next relationships) - Font style application - Language-specific hyphenation #### `test_abstract_document.py` Tests for document structure and metadata. **Needed Coverage:** - Document creation and initialization - Metadata management (title, author, language, etc.) - Block addition and management - Anchor creation and resolution - Resource management - Table of contents generation - Chapter and book structures #### `test_abstract_functional.py` Tests for functional elements (links, buttons, forms). **Needed Coverage:** - Link creation and type detection - Link execution for different types - Button functionality and state - Form field management - Form validation and submission - Parameter handling #### `test_style_system.py` Tests for the style system (fonts, colors, alignment). **Needed Coverage:** - Font creation and properties - Color representation and manipulation - Font weight, style, decoration enums - Alignment enums and behavior - Style inheritance and cascading ### 🔧 Medium Priority #### `test_html_elements.py` Unit tests for the HTML element handlers. **Needed Coverage:** - BlockElementHandler individual methods - ListElementHandler state management - TableElementHandler complex scenarios - InlineElementHandler link processing - Handler coordination and delegation - Error handling in handlers #### `test_html_metadata.py` Tests for HTML metadata extraction. **Needed Coverage:** - Meta tag parsing - Open Graph extraction - JSON-LD structured data - Title and description extraction - Language detection - Character encoding handling #### `test_html_resources.py` Tests for HTML resource extraction. **Needed Coverage:** - CSS stylesheet extraction - JavaScript resource identification - Image source collection - Media element detection - External resource resolution - Base URL handling #### `test_io_base.py` Tests for the base reader architecture. **Needed Coverage:** - BaseReader interface compliance - MetadataReader abstract methods - ContentReader abstract methods - ResourceReader abstract methods - CompositeReader coordination ### 🔍 Lower Priority #### `test_concrete_elements.py` Tests for concrete rendering implementations. **Needed Coverage:** - Box model calculations - Text rendering specifics - Image rendering and scaling - Page layout management - Functional element rendering #### `test_typesetting.py` Tests for the typesetting system. **Needed Coverage:** - Flow algorithms - Pagination logic - Document pagination - Line breaking - Hyphenation integration #### `test_epub_reader.py` Tests for EPUB reading functionality. **Needed Coverage:** - EPUB file structure parsing - Manifest processing - Chapter extraction - Metadata reading - Navigation document parsing #### `test_integration.py` End-to-end integration tests. **Needed Coverage:** - Complete HTML-to-document workflows - EPUB-to-document workflows - Style application across parsers - Resource resolution chains - Error handling scenarios ## Testing Infrastructure ### Test Dependencies ```python # Required for testing unittest # Built-in Python testing framework unittest.mock # For mocking and test doubles ``` ### Test Data - Create `tests/data/` directory with sample files: - `sample.html` - Well-formed HTML document - `complex.html` - Complex nested HTML - `malformed.html` - Edge cases and error conditions - `sample.epub` - Sample EPUB file - `test_images/` - Sample images for testing ### Continuous Integration - Tests should run on Python 3.6+ - All tests must pass before merging - Aim for >90% code coverage - Performance regression testing for parsing speed ## Running Tests ### Run All Tests ```bash python tests/test_runner.py ``` ### Run Specific Test Module ```bash python tests/test_runner.py html_style python -m unittest tests.test_html_style ``` ### Run Individual Test ```bash python -m unittest tests.test_html_style.TestHTMLStyleManager.test_color_parsing ``` ### Run with Coverage ```bash pip install coverage coverage run -m unittest discover tests/ coverage report -m coverage html # Generate HTML report ``` ## Test Quality Guidelines ### Test Naming - Test files: `test_.py` - Test classes: `Test` - Test methods: `test_` ### Test Structure 1. **Arrange**: Set up test data and mocks 2. **Act**: Execute the functionality being tested 3. **Assert**: Verify the expected behavior ### Mock Usage - Mock external dependencies (file I/O, network) - Mock complex objects when testing units in isolation - Prefer real objects for integration tests ### Edge Cases - Empty inputs - Invalid inputs - Boundary conditions - Error scenarios - Performance edge cases ## Success Metrics - **Coverage**: >90% line coverage across all modules - **Performance**: No test takes longer than 1 second - **Reliability**: Tests pass consistently across environments - **Maintainability**: Tests are easy to understand and modify - **Documentation**: Tests clearly show expected behavior ## Implementation Priority 1. **Week 1**: Complete high-priority abstract tests 2. **Week 2**: Implement HTML processing component tests 3. **Week 3**: Add integration and end-to-end tests 4. **Week 4**: Performance and edge case testing This testing strategy ensures comprehensive coverage of the pyWebLayout library while maintaining good separation of concerns and providing clear documentation of expected behavior.