pyWebLayout/tests/TESTING_STRATEGY.md

7.8 KiB

PyWebLayout Testing Strategy

This document outlines the comprehensive unit testing strategy for the pyWebLayout project.

Testing Philosophy

The testing strategy follows these principles:

  • Separation of Concerns: Each component is tested independently
  • Comprehensive Coverage: All public APIs and critical functionality are tested
  • Integration Testing: End-to-end workflows are validated
  • Regression Prevention: Tests prevent breaking changes
  • Documentation: Tests serve as living documentation of expected behavior

Test Organization

Current Test Files (Implemented)

test_html_style.py

Tests the HTMLStyleManager class for CSS parsing and style management.

Coverage:

  • Style initialization and defaults
  • Style stack operations (push/pop)
  • CSS property parsing (font-size, font-weight, colors, etc.)
  • Color parsing (named, hex, rgb, rgba)
  • Tag-specific default styles
  • Inline style parsing
  • Font object creation
  • Style combination (tag + inline styles)

test_html_text.py

Tests the HTMLTextProcessor class for text buffering and word creation.

Coverage:

  • Text buffer management
  • HTML entity reference handling
  • Character reference processing (decimal/hex)
  • Word creation with styling
  • Paragraph management
  • Text flushing operations
  • Buffer state operations

test_html_content.py

Integration tests for the HTMLContentReader class covering complete HTML parsing.

Coverage:

  • Simple paragraph parsing
  • Heading levels (h1-h6)
  • Styled text (bold, italic)
  • Lists (ul, ol, dl)
  • Tables with headers and cells
  • Blockquotes with nested content
  • Code blocks with language detection
  • HTML entities
  • Nested element structures
  • Complex document parsing

test_abstract_blocks.py

Tests for the core abstract block element classes.

Coverage:

  • Paragraph word management
  • Heading levels and properties
  • Quote nesting capabilities
  • Code block line management
  • List creation and item handling
  • Table structure (rows, cells, sections)
  • Image properties and scaling
  • Simple elements (hr, br)

test_runner.py

Test runner script for executing all tests with summary reporting.


Additional Tests Needed

🔄 High Priority (Should Implement Next)

test_abstract_inline.py

Tests for inline elements and text formatting.

Needed Coverage:

  • Word creation and properties
  • Word hyphenation functionality
  • FormattedSpan management
  • Word chaining (previous/next relationships)
  • Font style application
  • Language-specific hyphenation

test_abstract_document.py

Tests for document structure and metadata.

Needed Coverage:

  • Document creation and initialization
  • Metadata management (title, author, language, etc.)
  • Block addition and management
  • Anchor creation and resolution
  • Resource management
  • Table of contents generation
  • Chapter and book structures

test_abstract_functional.py

Tests for functional elements (links, buttons, forms).

Needed Coverage:

  • Link creation and type detection
  • Link execution for different types
  • Button functionality and state
  • Form field management
  • Form validation and submission
  • Parameter handling

test_style_system.py

Tests for the style system (fonts, colors, alignment).

Needed Coverage:

  • Font creation and properties
  • Color representation and manipulation
  • Font weight, style, decoration enums
  • Alignment enums and behavior
  • Style inheritance and cascading

🔧 Medium Priority

test_html_elements.py

Unit tests for the HTML element handlers.

Needed Coverage:

  • BlockElementHandler individual methods
  • ListElementHandler state management
  • TableElementHandler complex scenarios
  • InlineElementHandler link processing
  • Handler coordination and delegation
  • Error handling in handlers

test_html_metadata.py

Tests for HTML metadata extraction.

Needed Coverage:

  • Meta tag parsing
  • Open Graph extraction
  • JSON-LD structured data
  • Title and description extraction
  • Language detection
  • Character encoding handling

test_html_resources.py

Tests for HTML resource extraction.

Needed Coverage:

  • CSS stylesheet extraction
  • JavaScript resource identification
  • Image source collection
  • Media element detection
  • External resource resolution
  • Base URL handling

test_io_base.py

Tests for the base reader architecture.

Needed Coverage:

  • BaseReader interface compliance
  • MetadataReader abstract methods
  • ContentReader abstract methods
  • ResourceReader abstract methods
  • CompositeReader coordination

🔍 Lower Priority

test_concrete_elements.py

Tests for concrete rendering implementations.

Needed Coverage:

  • Box model calculations
  • Text rendering specifics
  • Image rendering and scaling
  • Page layout management
  • Functional element rendering

test_typesetting.py

Tests for the typesetting system.

Needed Coverage:

  • Flow algorithms
  • Pagination logic
  • Document pagination
  • Line breaking
  • Hyphenation integration

test_epub_reader.py

Tests for EPUB reading functionality.

Needed Coverage:

  • EPUB file structure parsing
  • Manifest processing
  • Chapter extraction
  • Metadata reading
  • Navigation document parsing

test_integration.py

End-to-end integration tests.

Needed Coverage:

  • Complete HTML-to-document workflows
  • EPUB-to-document workflows
  • Style application across parsers
  • Resource resolution chains
  • Error handling scenarios

Testing Infrastructure

Test Dependencies

# Required for testing
unittest  # Built-in Python testing framework
unittest.mock  # For mocking and test doubles

Test Data

  • Create tests/data/ directory with sample files:
    • sample.html - Well-formed HTML document
    • complex.html - Complex nested HTML
    • malformed.html - Edge cases and error conditions
    • sample.epub - Sample EPUB file
    • test_images/ - Sample images for testing

Continuous Integration

  • Tests should run on Python 3.6+
  • All tests must pass before merging
  • Aim for >90% code coverage
  • Performance regression testing for parsing speed

Running Tests

Run All Tests

python tests/test_runner.py

Run Specific Test Module

python tests/test_runner.py html_style
python -m unittest tests.test_html_style

Run Individual Test

python -m unittest tests.test_html_style.TestHTMLStyleManager.test_color_parsing

Run with Coverage

pip install coverage
coverage run -m unittest discover tests/
coverage report -m
coverage html  # Generate HTML report

Test Quality Guidelines

Test Naming

  • Test files: test_<module_name>.py
  • Test classes: Test<ClassName>
  • Test methods: test_<specific_functionality>

Test Structure

  1. Arrange: Set up test data and mocks
  2. Act: Execute the functionality being tested
  3. Assert: Verify the expected behavior

Mock Usage

  • Mock external dependencies (file I/O, network)
  • Mock complex objects when testing units in isolation
  • Prefer real objects for integration tests

Edge Cases

  • Empty inputs
  • Invalid inputs
  • Boundary conditions
  • Error scenarios
  • Performance edge cases

Success Metrics

  • Coverage: >90% line coverage across all modules
  • Performance: No test takes longer than 1 second
  • Reliability: Tests pass consistently across environments
  • Maintainability: Tests are easy to understand and modify
  • Documentation: Tests clearly show expected behavior

Implementation Priority

  1. Week 1: Complete high-priority abstract tests
  2. Week 2: Implement HTML processing component tests
  3. Week 3: Add integration and end-to-end tests
  4. Week 4: Performance and edge case testing

This testing strategy ensures comprehensive coverage of the pyWebLayout library while maintaining good separation of concerns and providing clear documentation of expected behavior.