pyWebLayout/tests/TESTING_STRATEGY.md

# PyWebLayout Testing Strategy

This document outlines the comprehensive unit testing strategy for the pyWebLayout project.

## Testing Philosophy

The testing strategy follows these principles:
- **Separation of Concerns**: Each component is tested independently
- **Comprehensive Coverage**: All public APIs and critical functionality are tested
- **Integration Testing**: End-to-end workflows are validated
- **Regression Prevention**: Tests prevent breaking changes
- **Documentation**: Tests serve as living documentation of expected behavior

## Test Organization

### Current Test Files (Implemented)

#### ✅ `test_html_style.py`
Tests the `HTMLStyleManager` class for CSS parsing and style management.

**Coverage:**
- Style initialization and defaults
- Style stack operations (push/pop)
- CSS property parsing (font-size, font-weight, colors, etc.)
- Color parsing (named, hex, rgb, rgba)
- Tag-specific default styles
- Inline style parsing
- Font object creation
- Style combination (tag + inline styles)

#### ✅ `test_html_text.py`
Tests the `HTMLTextProcessor` class for text buffering and word creation.

**Coverage:**
- Text buffer management
- HTML entity reference handling
- Character reference processing (decimal/hex)
- Word creation with styling
- Paragraph management
- Text flushing operations
- Buffer state operations

#### ✅ `test_html_content.py`
Integration tests for the `HTMLContentReader` class covering complete HTML parsing.

**Coverage:**
- Simple paragraph parsing
- Heading levels (h1-h6)
- Styled text (bold, italic)
- Lists (ul, ol, dl)
- Tables with headers and cells
- Blockquotes with nested content
- Code blocks with language detection
- HTML entities
- Nested element structures
- Complex document parsing

#### ✅ `test_abstract_blocks.py`
Tests for the core abstract block element classes.

**Coverage:**
- Paragraph word management
- Heading levels and properties
- Quote nesting capabilities
- Code block line management
- List creation and item handling
- Table structure (rows, cells, sections)
- Image properties and scaling
- Simple elements (hr, br)

#### ✅ `test_runner.py`
Test runner script for executing all tests with summary reporting.

---

## Additional Tests Needed

### 🔄 High Priority (Should Implement Next)

#### `test_abstract_inline.py`
Tests for inline elements and text formatting.

**Needed Coverage:**
- Word creation and properties
- Word hyphenation functionality
- FormattedSpan management
- Word chaining (previous/next relationships)
- Font style application
- Language-specific hyphenation

#### `test_abstract_document.py`
Tests for document structure and metadata.

**Needed Coverage:**
- Document creation and initialization
- Metadata management (title, author, language, etc.)
- Block addition and management
- Anchor creation and resolution
- Resource management
- Table of contents generation
- Chapter and book structures

#### `test_abstract_functional.py`
Tests for functional elements (links, buttons, forms).

**Needed Coverage:**
- Link creation and type detection
- Link execution for different types
- Button functionality and state
- Form field management
- Form validation and submission
- Parameter handling

#### `test_style_system.py`
Tests for the style system (fonts, colors, alignment).

**Needed Coverage:**
- Font creation and properties
- Color representation and manipulation
- Font weight, style, decoration enums
- Alignment enums and behavior
- Style inheritance and cascading

### 🔧 Medium Priority

#### `test_html_elements.py`
Unit tests for the HTML element handlers.

**Needed Coverage:**
- BlockElementHandler individual methods
- ListElementHandler state management
- TableElementHandler complex scenarios
- InlineElementHandler link processing
- Handler coordination and delegation
- Error handling in handlers

#### `test_html_metadata.py`
Tests for HTML metadata extraction.

**Needed Coverage:**
- Meta tag parsing
- Open Graph extraction
- JSON-LD structured data
- Title and description extraction
- Language detection
- Character encoding handling

#### `test_html_resources.py`
Tests for HTML resource extraction.

**Needed Coverage:**
- CSS stylesheet extraction
- JavaScript resource identification
- Image source collection
- Media element detection
- External resource resolution
- Base URL handling

#### `test_io_base.py`
Tests for the base reader architecture.

**Needed Coverage:**
- BaseReader interface compliance
- MetadataReader abstract methods
- ContentReader abstract methods
- ResourceReader abstract methods
- CompositeReader coordination

### 🔍 Lower Priority

#### `test_concrete_elements.py`
Tests for concrete rendering implementations.

**Needed Coverage:**
- Box model calculations
- Text rendering specifics
- Image rendering and scaling
- Page layout management
- Functional element rendering

#### `test_typesetting.py`
Tests for the typesetting system.

**Needed Coverage:**
- Flow algorithms
- Pagination logic
- Document pagination
- Line breaking
- Hyphenation integration

#### `test_epub_reader.py`
Tests for EPUB reading functionality.

**Needed Coverage:**
- EPUB file structure parsing
- Manifest processing
- Chapter extraction
- Metadata reading
- Navigation document parsing

#### `test_integration.py`
End-to-end integration tests.

**Needed Coverage:**
- Complete HTML-to-document workflows
- EPUB-to-document workflows
- Style application across parsers
- Resource resolution chains
- Error handling scenarios

## Testing Infrastructure

### Test Dependencies
```python
# Required for testing
unittest  # Built-in Python testing framework
unittest.mock  # For mocking and test doubles
```

### Test Data
- Create `tests/data/` directory with sample files:
  - `sample.html` - Well-formed HTML document
  - `complex.html` - Complex nested HTML
  - `malformed.html` - Edge cases and error conditions
  - `sample.epub` - Sample EPUB file
  - `test_images/` - Sample images for testing

### Continuous Integration
- Tests should run on Python 3.6+
- All tests must pass before merging
- Aim for >90% code coverage
- Performance regression testing for parsing speed

## Running Tests

### Run All Tests
```bash
python tests/test_runner.py
```

### Run Specific Test Module
```bash
python tests/test_runner.py html_style
python -m unittest tests.test_html_style
```

### Run Individual Test
```bash
python -m unittest tests.test_html_style.TestHTMLStyleManager.test_color_parsing
```

### Run with Coverage
```bash
pip install coverage
coverage run -m unittest discover tests/
coverage report -m
coverage html  # Generate HTML report
```

## Test Quality Guidelines

### Test Naming
- Test files: `test_<module_name>.py`
- Test classes: `Test<ClassName>`
- Test methods: `test_<specific_functionality>`

### Test Structure
1. **Arrange**: Set up test data and mocks
2. **Act**: Execute the functionality being tested
3. **Assert**: Verify the expected behavior

### Mock Usage
- Mock external dependencies (file I/O, network)
- Mock complex objects when testing units in isolation
- Prefer real objects for integration tests

### Edge Cases
- Empty inputs
- Invalid inputs
- Boundary conditions
- Error scenarios
- Performance edge cases

## Success Metrics

- **Coverage**: >90% line coverage across all modules
- **Performance**: No test takes longer than 1 second
- **Reliability**: Tests pass consistently across environments
- **Maintainability**: Tests are easy to understand and modify
- **Documentation**: Tests clearly show expected behavior

## Implementation Priority

1. **Week 1**: Complete high-priority abstract tests
2. **Week 2**: Implement HTML processing component tests
3. **Week 3**: Add integration and end-to-end tests
4. **Week 4**: Performance and edge case testing

This testing strategy ensures comprehensive coverage of the pyWebLayout library while maintaining good separation of concerns and providing clear documentation of expected behavior.