pyWebLayout/tests/TESTING_STRATEGY.md

300 lines
7.8 KiB
Markdown

# PyWebLayout Testing Strategy
This document outlines the comprehensive unit testing strategy for the pyWebLayout project.
## Testing Philosophy
The testing strategy follows these principles:
- **Separation of Concerns**: Each component is tested independently
- **Comprehensive Coverage**: All public APIs and critical functionality are tested
- **Integration Testing**: End-to-end workflows are validated
- **Regression Prevention**: Tests prevent breaking changes
- **Documentation**: Tests serve as living documentation of expected behavior
## Test Organization
### Current Test Files (Implemented)
#### ✅ `test_html_style.py`
Tests the `HTMLStyleManager` class for CSS parsing and style management.
**Coverage:**
- Style initialization and defaults
- Style stack operations (push/pop)
- CSS property parsing (font-size, font-weight, colors, etc.)
- Color parsing (named, hex, rgb, rgba)
- Tag-specific default styles
- Inline style parsing
- Font object creation
- Style combination (tag + inline styles)
#### ✅ `test_html_text.py`
Tests the `HTMLTextProcessor` class for text buffering and word creation.
**Coverage:**
- Text buffer management
- HTML entity reference handling
- Character reference processing (decimal/hex)
- Word creation with styling
- Paragraph management
- Text flushing operations
- Buffer state operations
#### ✅ `test_html_content.py`
Integration tests for the `HTMLContentReader` class covering complete HTML parsing.
**Coverage:**
- Simple paragraph parsing
- Heading levels (h1-h6)
- Styled text (bold, italic)
- Lists (ul, ol, dl)
- Tables with headers and cells
- Blockquotes with nested content
- Code blocks with language detection
- HTML entities
- Nested element structures
- Complex document parsing
#### ✅ `test_abstract_blocks.py`
Tests for the core abstract block element classes.
**Coverage:**
- Paragraph word management
- Heading levels and properties
- Quote nesting capabilities
- Code block line management
- List creation and item handling
- Table structure (rows, cells, sections)
- Image properties and scaling
- Simple elements (hr, br)
#### ✅ `test_runner.py`
Test runner script for executing all tests with summary reporting.
---
## Additional Tests Needed
### 🔄 High Priority (Should Implement Next)
#### `test_abstract_inline.py`
Tests for inline elements and text formatting.
**Needed Coverage:**
- Word creation and properties
- Word hyphenation functionality
- FormattedSpan management
- Word chaining (previous/next relationships)
- Font style application
- Language-specific hyphenation
#### `test_abstract_document.py`
Tests for document structure and metadata.
**Needed Coverage:**
- Document creation and initialization
- Metadata management (title, author, language, etc.)
- Block addition and management
- Anchor creation and resolution
- Resource management
- Table of contents generation
- Chapter and book structures
#### `test_abstract_functional.py`
Tests for functional elements (links, buttons, forms).
**Needed Coverage:**
- Link creation and type detection
- Link execution for different types
- Button functionality and state
- Form field management
- Form validation and submission
- Parameter handling
#### `test_style_system.py`
Tests for the style system (fonts, colors, alignment).
**Needed Coverage:**
- Font creation and properties
- Color representation and manipulation
- Font weight, style, decoration enums
- Alignment enums and behavior
- Style inheritance and cascading
### 🔧 Medium Priority
#### `test_html_elements.py`
Unit tests for the HTML element handlers.
**Needed Coverage:**
- BlockElementHandler individual methods
- ListElementHandler state management
- TableElementHandler complex scenarios
- InlineElementHandler link processing
- Handler coordination and delegation
- Error handling in handlers
#### `test_html_metadata.py`
Tests for HTML metadata extraction.
**Needed Coverage:**
- Meta tag parsing
- Open Graph extraction
- JSON-LD structured data
- Title and description extraction
- Language detection
- Character encoding handling
#### `test_html_resources.py`
Tests for HTML resource extraction.
**Needed Coverage:**
- CSS stylesheet extraction
- JavaScript resource identification
- Image source collection
- Media element detection
- External resource resolution
- Base URL handling
#### `test_io_base.py`
Tests for the base reader architecture.
**Needed Coverage:**
- BaseReader interface compliance
- MetadataReader abstract methods
- ContentReader abstract methods
- ResourceReader abstract methods
- CompositeReader coordination
### 🔍 Lower Priority
#### `test_concrete_elements.py`
Tests for concrete rendering implementations.
**Needed Coverage:**
- Box model calculations
- Text rendering specifics
- Image rendering and scaling
- Page layout management
- Functional element rendering
#### `test_typesetting.py`
Tests for the typesetting system.
**Needed Coverage:**
- Flow algorithms
- Pagination logic
- Document pagination
- Line breaking
- Hyphenation integration
#### `test_epub_reader.py`
Tests for EPUB reading functionality.
**Needed Coverage:**
- EPUB file structure parsing
- Manifest processing
- Chapter extraction
- Metadata reading
- Navigation document parsing
#### `test_integration.py`
End-to-end integration tests.
**Needed Coverage:**
- Complete HTML-to-document workflows
- EPUB-to-document workflows
- Style application across parsers
- Resource resolution chains
- Error handling scenarios
## Testing Infrastructure
### Test Dependencies
```python
# Required for testing
unittest # Built-in Python testing framework
unittest.mock # For mocking and test doubles
```
### Test Data
- Create `tests/data/` directory with sample files:
- `sample.html` - Well-formed HTML document
- `complex.html` - Complex nested HTML
- `malformed.html` - Edge cases and error conditions
- `sample.epub` - Sample EPUB file
- `test_images/` - Sample images for testing
### Continuous Integration
- Tests should run on Python 3.6+
- All tests must pass before merging
- Aim for >90% code coverage
- Performance regression testing for parsing speed
## Running Tests
### Run All Tests
```bash
python tests/test_runner.py
```
### Run Specific Test Module
```bash
python tests/test_runner.py html_style
python -m unittest tests.test_html_style
```
### Run Individual Test
```bash
python -m unittest tests.test_html_style.TestHTMLStyleManager.test_color_parsing
```
### Run with Coverage
```bash
pip install coverage
coverage run -m unittest discover tests/
coverage report -m
coverage html # Generate HTML report
```
## Test Quality Guidelines
### Test Naming
- Test files: `test_<module_name>.py`
- Test classes: `Test<ClassName>`
- Test methods: `test_<specific_functionality>`
### Test Structure
1. **Arrange**: Set up test data and mocks
2. **Act**: Execute the functionality being tested
3. **Assert**: Verify the expected behavior
### Mock Usage
- Mock external dependencies (file I/O, network)
- Mock complex objects when testing units in isolation
- Prefer real objects for integration tests
### Edge Cases
- Empty inputs
- Invalid inputs
- Boundary conditions
- Error scenarios
- Performance edge cases
## Success Metrics
- **Coverage**: >90% line coverage across all modules
- **Performance**: No test takes longer than 1 second
- **Reliability**: Tests pass consistently across environments
- **Maintainability**: Tests are easy to understand and modify
- **Documentation**: Tests clearly show expected behavior
## Implementation Priority
1. **Week 1**: Complete high-priority abstract tests
2. **Week 2**: Implement HTML processing component tests
3. **Week 3**: Add integration and end-to-end tests
4. **Week 4**: Performance and edge case testing
This testing strategy ensures comprehensive coverage of the pyWebLayout library while maintaining good separation of concerns and providing clear documentation of expected behavior.