7.8 KiB
PyWebLayout Testing Strategy
This document outlines the comprehensive unit testing strategy for the pyWebLayout project.
Testing Philosophy
The testing strategy follows these principles:
- Separation of Concerns: Each component is tested independently
- Comprehensive Coverage: All public APIs and critical functionality are tested
- Integration Testing: End-to-end workflows are validated
- Regression Prevention: Tests prevent breaking changes
- Documentation: Tests serve as living documentation of expected behavior
Test Organization
Current Test Files (Implemented)
✅ test_html_style.py
Tests the HTMLStyleManager class for CSS parsing and style management.
Coverage:
- Style initialization and defaults
- Style stack operations (push/pop)
- CSS property parsing (font-size, font-weight, colors, etc.)
- Color parsing (named, hex, rgb, rgba)
- Tag-specific default styles
- Inline style parsing
- Font object creation
- Style combination (tag + inline styles)
✅ test_html_text.py
Tests the HTMLTextProcessor class for text buffering and word creation.
Coverage:
- Text buffer management
- HTML entity reference handling
- Character reference processing (decimal/hex)
- Word creation with styling
- Paragraph management
- Text flushing operations
- Buffer state operations
✅ test_html_content.py
Integration tests for the HTMLContentReader class covering complete HTML parsing.
Coverage:
- Simple paragraph parsing
- Heading levels (h1-h6)
- Styled text (bold, italic)
- Lists (ul, ol, dl)
- Tables with headers and cells
- Blockquotes with nested content
- Code blocks with language detection
- HTML entities
- Nested element structures
- Complex document parsing
✅ test_abstract_blocks.py
Tests for the core abstract block element classes.
Coverage:
- Paragraph word management
- Heading levels and properties
- Quote nesting capabilities
- Code block line management
- List creation and item handling
- Table structure (rows, cells, sections)
- Image properties and scaling
- Simple elements (hr, br)
✅ test_runner.py
Test runner script for executing all tests with summary reporting.
Additional Tests Needed
🔄 High Priority (Should Implement Next)
test_abstract_inline.py
Tests for inline elements and text formatting.
Needed Coverage:
- Word creation and properties
- Word hyphenation functionality
- FormattedSpan management
- Word chaining (previous/next relationships)
- Font style application
- Language-specific hyphenation
test_abstract_document.py
Tests for document structure and metadata.
Needed Coverage:
- Document creation and initialization
- Metadata management (title, author, language, etc.)
- Block addition and management
- Anchor creation and resolution
- Resource management
- Table of contents generation
- Chapter and book structures
test_abstract_functional.py
Tests for functional elements (links, buttons, forms).
Needed Coverage:
- Link creation and type detection
- Link execution for different types
- Button functionality and state
- Form field management
- Form validation and submission
- Parameter handling
test_style_system.py
Tests for the style system (fonts, colors, alignment).
Needed Coverage:
- Font creation and properties
- Color representation and manipulation
- Font weight, style, decoration enums
- Alignment enums and behavior
- Style inheritance and cascading
🔧 Medium Priority
test_html_elements.py
Unit tests for the HTML element handlers.
Needed Coverage:
- BlockElementHandler individual methods
- ListElementHandler state management
- TableElementHandler complex scenarios
- InlineElementHandler link processing
- Handler coordination and delegation
- Error handling in handlers
test_html_metadata.py
Tests for HTML metadata extraction.
Needed Coverage:
- Meta tag parsing
- Open Graph extraction
- JSON-LD structured data
- Title and description extraction
- Language detection
- Character encoding handling
test_html_resources.py
Tests for HTML resource extraction.
Needed Coverage:
- CSS stylesheet extraction
- JavaScript resource identification
- Image source collection
- Media element detection
- External resource resolution
- Base URL handling
test_io_base.py
Tests for the base reader architecture.
Needed Coverage:
- BaseReader interface compliance
- MetadataReader abstract methods
- ContentReader abstract methods
- ResourceReader abstract methods
- CompositeReader coordination
🔍 Lower Priority
test_concrete_elements.py
Tests for concrete rendering implementations.
Needed Coverage:
- Box model calculations
- Text rendering specifics
- Image rendering and scaling
- Page layout management
- Functional element rendering
test_typesetting.py
Tests for the typesetting system.
Needed Coverage:
- Flow algorithms
- Pagination logic
- Document pagination
- Line breaking
- Hyphenation integration
test_epub_reader.py
Tests for EPUB reading functionality.
Needed Coverage:
- EPUB file structure parsing
- Manifest processing
- Chapter extraction
- Metadata reading
- Navigation document parsing
test_integration.py
End-to-end integration tests.
Needed Coverage:
- Complete HTML-to-document workflows
- EPUB-to-document workflows
- Style application across parsers
- Resource resolution chains
- Error handling scenarios
Testing Infrastructure
Test Dependencies
# Required for testing
unittest # Built-in Python testing framework
unittest.mock # For mocking and test doubles
Test Data
- Create
tests/data/directory with sample files:sample.html- Well-formed HTML documentcomplex.html- Complex nested HTMLmalformed.html- Edge cases and error conditionssample.epub- Sample EPUB filetest_images/- Sample images for testing
Continuous Integration
- Tests should run on Python 3.6+
- All tests must pass before merging
- Aim for >90% code coverage
- Performance regression testing for parsing speed
Running Tests
Run All Tests
python tests/test_runner.py
Run Specific Test Module
python tests/test_runner.py html_style
python -m unittest tests.test_html_style
Run Individual Test
python -m unittest tests.test_html_style.TestHTMLStyleManager.test_color_parsing
Run with Coverage
pip install coverage
coverage run -m unittest discover tests/
coverage report -m
coverage html # Generate HTML report
Test Quality Guidelines
Test Naming
- Test files:
test_<module_name>.py - Test classes:
Test<ClassName> - Test methods:
test_<specific_functionality>
Test Structure
- Arrange: Set up test data and mocks
- Act: Execute the functionality being tested
- Assert: Verify the expected behavior
Mock Usage
- Mock external dependencies (file I/O, network)
- Mock complex objects when testing units in isolation
- Prefer real objects for integration tests
Edge Cases
- Empty inputs
- Invalid inputs
- Boundary conditions
- Error scenarios
- Performance edge cases
Success Metrics
- Coverage: >90% line coverage across all modules
- Performance: No test takes longer than 1 second
- Reliability: Tests pass consistently across environments
- Maintainability: Tests are easy to understand and modify
- Documentation: Tests clearly show expected behavior
Implementation Priority
- Week 1: Complete high-priority abstract tests
- Week 2: Implement HTML processing component tests
- Week 3: Add integration and end-to-end tests
- Week 4: Performance and edge case testing
This testing strategy ensures comprehensive coverage of the pyWebLayout library while maintaining good separation of concerns and providing clear documentation of expected behavior.