# PyWebLayout Testing Strategy

This document outlines the comprehensive unit testing strategy for the pyWebLayout project.

## Testing Philosophy

The testing strategy follows these principles:
- **Separation of Concerns**: Each component is tested independently
- **Comprehensive Coverage**: All public APIs and critical functionality are tested
- **Integration Testing**: End-to-end workflows are validated
- **Regression Prevention**: Tests prevent breaking changes
- **Documentation**: Tests serve as living documentation of expected behavior

## Test Organization

### Current Test Files (Implemented)

#### ✅ `test_html_style.py`
Tests the `HTMLStyleManager` class for CSS parsing and style management.

**Coverage:**
- Style initialization and defaults
- Style stack operations (push/pop)
- CSS property parsing (font-size, font-weight, colors, etc.)
- Color parsing (named, hex, rgb, rgba)
- Tag-specific default styles
- Inline style parsing
- Font object creation
- Style combination (tag + inline styles)

#### ✅ `test_html_text.py`
Tests the `HTMLTextProcessor` class for text buffering and word creation.

**Coverage:**
- Text buffer management
- HTML entity reference handling
- Character reference processing (decimal/hex)
- Word creation with styling
- Paragraph management
- Text flushing operations
- Buffer state operations

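The entity and character-reference cases listed above can be pinned down with a small table of expected decodings. The sketch below uses the standard library's `html.unescape` purely as a reference oracle; whether `HTMLTextProcessor` agrees with it in every case is an assumption, and the test class name is hypothetical.

```python
import html
import unittest


class TestCharacterReferenceExpectations(unittest.TestCase):
    """Expected decodings, with html.unescape as a reference oracle.

    This does not call HTMLTextProcessor; it only documents the values
    a processor test would probably assert against.
    """

    def test_named_entity(self):
        self.assertEqual(html.unescape("&amp;"), "&")
        self.assertEqual(html.unescape("&lt;p&gt;"), "<p>")

    def test_decimal_reference(self):
        self.assertEqual(html.unescape("&#65;"), "A")

    def test_hex_reference(self):
        # Both lowercase and uppercase 'x' prefixes are accepted.
        self.assertEqual(html.unescape("&#x41;"), "A")
        self.assertEqual(html.unescape("&#X41;"), "A")
```

A real test would feed the same inputs through the processor and compare its output with these values.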
#### ✅ `test_html_content.py`
Integration tests for the `HTMLContentReader` class covering complete HTML parsing.

**Coverage:**
- Simple paragraph parsing
- Heading levels (h1-h6)
- Styled text (bold, italic)
- Lists (ul, ol, dl)
- Tables with headers and cells
- Blockquotes with nested content
- Code blocks with language detection
- HTML entities
- Nested element structures
- Complex document parsing

#### ✅ `test_abstract_blocks.py`
Tests for the core abstract block element classes.

**Coverage:**
- Paragraph word management
- Heading levels and properties
- Quote nesting capabilities
- Code block line management
- List creation and item handling
- Table structure (rows, cells, sections)
- Image properties and scaling
- Simple elements (hr, br)

#### ✅ `test_runner.py`
Test runner script for executing all tests with summary reporting.

---

## Additional Tests Needed

### 🔄 High Priority (Should Implement Next)

#### `test_abstract_inline.py`
Tests for inline elements and text formatting.

**Needed Coverage:**
- Word creation and properties
- Word hyphenation functionality
- FormattedSpan management
- Word chaining (previous/next relationships)
- Font style application
- Language-specific hyphenation

#### `test_abstract_document.py`
Tests for document structure and metadata.

**Needed Coverage:**
- Document creation and initialization
- Metadata management (title, author, language, etc.)
- Block addition and management
- Anchor creation and resolution
- Resource management
- Table of contents generation
- Chapter and book structures

#### `test_abstract_functional.py`
Tests for functional elements (links, buttons, forms).

**Needed Coverage:**
- Link creation and type detection
- Link execution for different types
- Button functionality and state
- Form field management
- Form validation and submission
- Parameter handling

#### `test_style_system.py`
Tests for the style system (fonts, colors, alignment).

**Needed Coverage:**
- Font creation and properties
- Color representation and manipulation
- Font weight, style, decoration enums
- Alignment enums and behavior
- Style inheritance and cascading

### 🔧 Medium Priority

#### `test_html_elements.py`
Unit tests for the HTML element handlers.

**Needed Coverage:**
- BlockElementHandler individual methods
- ListElementHandler state management
- TableElementHandler complex scenarios
- InlineElementHandler link processing
- Handler coordination and delegation
- Error handling in handlers

#### `test_html_metadata.py`
Tests for HTML metadata extraction.

**Needed Coverage:**
- Meta tag parsing
- Open Graph extraction
- JSON-LD structured data
- Title and description extraction
- Language detection
- Character encoding handling

#### `test_html_resources.py`
Tests for HTML resource extraction.

**Needed Coverage:**
- CSS stylesheet extraction
- JavaScript resource identification
- Image source collection
- Media element detection
- External resource resolution
- Base URL handling

#### `test_io_base.py`
Tests for the base reader architecture.

**Needed Coverage:**
- BaseReader interface compliance
- MetadataReader abstract methods
- ContentReader abstract methods
- ResourceReader abstract methods
- CompositeReader coordination

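One way to check "interface compliance" and "abstract methods" is to assert that an incomplete subclass cannot be instantiated. The sketch below assumes the readers are built on `abc.ABC`; `ContentReaderSketch`, its method names, and both subclasses are hypothetical stand-ins, not the pyWebLayout classes.

```python
import unittest
from abc import ABC, abstractmethod


class ContentReaderSketch(ABC):
    """Hypothetical abstract reader; the real interface may differ."""

    @abstractmethod
    def can_read(self, source):
        """Return True if this reader understands the source."""

    @abstractmethod
    def read(self, source):
        """Parse the source and return its content."""


class IncompleteReader(ContentReaderSketch):
    """Implements only one of the two abstract methods."""

    def can_read(self, source):
        return True


class CompleteReader(ContentReaderSketch):
    """Implements the full (hypothetical) interface."""

    def can_read(self, source):
        return isinstance(source, str)

    def read(self, source):
        return source.split()


class TestReaderInterface(unittest.TestCase):
    def test_incomplete_subclass_cannot_be_instantiated(self):
        # ABCs refuse instantiation while abstract methods remain.
        with self.assertRaises(TypeError):
            IncompleteReader()

    def test_complete_subclass_honours_interface(self):
        reader = CompleteReader()
        self.assertTrue(reader.can_read("<p>hi</p>"))
        self.assertEqual(reader.read("a b"), ["a", "b"])
```

If the real base classes raise `NotImplementedError` instead of using `abc`, the first test would instead call the method and assert on that exception.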
### 🔍 Lower Priority

#### `test_concrete_elements.py`
Tests for concrete rendering implementations.

**Needed Coverage:**
- Box model calculations
- Text rendering specifics
- Image rendering and scaling
- Page layout management
- Functional element rendering

#### `test_typesetting.py`
Tests for the typesetting system.

**Needed Coverage:**
- Flow algorithms
- Pagination logic
- Document pagination
- Line breaking
- Hyphenation integration

#### `test_epub_reader.py`
Tests for EPUB reading functionality.

**Needed Coverage:**
- EPUB file structure parsing
- Manifest processing
- Chapter extraction
- Metadata reading
- Navigation document parsing

#### `test_integration.py`
End-to-end integration tests.

**Needed Coverage:**
- Complete HTML-to-document workflows
- EPUB-to-document workflows
- Style application across parsers
- Resource resolution chains
- Error handling scenarios

## Testing Infrastructure

### Test Dependencies
```python
# Required for testing (both ship with the standard library)
import unittest            # Built-in Python testing framework
from unittest import mock  # For mocking and test doubles
```

### Test Data
- Create a `tests/data/` directory with sample files:
  - `sample.html` - Well-formed HTML document
  - `complex.html` - Complex nested HTML
  - `malformed.html` - Edge cases and error conditions
  - `sample.epub` - Sample EPUB file
  - `test_images/` - Sample images for testing

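Tests that rely on these sample files tend to be easier to read with a small loader helper. The following is a minimal sketch; `data_path`, `load_text`, and the default `base_dir` value are assumptions, not existing project code.

```python
from pathlib import Path


def data_path(name, base_dir="tests/data"):
    """Return the path of a sample file, failing loudly if it is missing."""
    path = Path(base_dir) / name
    if not path.is_file():
        raise FileNotFoundError(f"missing test fixture: {path}")
    return path


def load_text(name, base_dir="tests/data", encoding="utf-8"):
    """Read a text fixture such as sample.html and return its contents."""
    return data_path(name, base_dir).read_text(encoding=encoding)
```

A test could then call `load_text("sample.html")` in its arrange step. The default `base_dir` is relative, so it only resolves when tests run from the repository root.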
### Continuous Integration
- Tests should run on Python 3.6+
- All tests must pass before merging
- Aim for >90% code coverage
- Performance regression testing for parsing speed

## Running Tests

### Run All Tests
```bash
python tests/test_runner.py
```

### Run Specific Test Module
```bash
# Through the project test runner
python tests/test_runner.py html_style

# Or directly with unittest
python -m unittest tests.test_html_style
```

### Run Individual Test
```bash
python -m unittest tests.test_html_style.TestHTMLStyleManager.test_color_parsing
```

### Run with Coverage
```bash
pip install coverage
coverage run -m unittest discover tests/
coverage report -m
coverage html  # Generate HTML report
```

## Test Quality Guidelines

### Test Naming
- Test files: `test_<module_name>.py`
- Test classes: `Test<ClassName>`
- Test methods: `test_<specific_functionality>`

### Test Structure
1. **Arrange**: Set up test data and mocks
2. **Act**: Execute the functionality being tested
3. **Assert**: Verify the expected behavior

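The three steps above can be sketched in a single test. `StyleStackSketch` is a hypothetical stand-in written for this example; it is not the `HTMLStyleManager` API.

```python
import unittest


class StyleStackSketch:
    """Hypothetical push/pop style stack used only for illustration."""

    def __init__(self, defaults):
        self._stack = [dict(defaults)]

    @property
    def current(self):
        return self._stack[-1]

    def push(self, **overrides):
        # New frame inherits the current style, then applies overrides.
        self._stack.append({**self.current, **overrides})

    def pop(self):
        # Never pop the base frame, so defaults always remain available.
        if len(self._stack) > 1:
            self._stack.pop()


class TestStyleStackSketch(unittest.TestCase):
    def test_pop_restores_previous_style(self):
        # Arrange
        stack = StyleStackSketch({"font_weight": "normal"})
        stack.push(font_weight="bold")

        # Act
        stack.pop()

        # Assert
        self.assertEqual(stack.current["font_weight"], "normal")
```

Keeping one behavior per test method, named after that behavior, lets a failing test name describe the regression by itself.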
### Mock Usage
- Mock external dependencies (file I/O, network)
- Mock complex objects when testing units in isolation
- Prefer real objects for integration tests

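As an illustration of mocking file I/O, the sketch below patches the built-in `open` with `unittest.mock.mock_open`. The `read_html` helper is hypothetical and exists only to give the test something to exercise.

```python
import unittest
from unittest import mock


def read_html(path):
    """Hypothetical helper under test: return the text of an HTML file."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


class TestReadHtml(unittest.TestCase):
    def test_reads_without_touching_disk(self):
        # Arrange: replace open() with a fake that serves canned content.
        fake_open = mock.mock_open(read_data="<p>Hello</p>")

        # Act
        with mock.patch("builtins.open", fake_open):
            content = read_html("sample.html")

        # Assert: both the result and the way open() was called.
        self.assertEqual(content, "<p>Hello</p>")
        fake_open.assert_called_once_with("sample.html", encoding="utf-8")
```

For integration tests, the same helper would be pointed at a real file under `tests/data/` instead of a mock.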
### Edge Cases
- Empty inputs
- Invalid inputs
- Boundary conditions
- Error scenarios
- Performance edge cases

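Edge cases are often easiest to cover with a table of inputs and `subTest`, so each failing input is reported separately. The `parse_hex_color` function below is a hypothetical parser written for this sketch, not the project's color parsing code.

```python
import string
import unittest


def parse_hex_color(value):
    """Hypothetical parser: '#rgb' or '#rrggbb' -> (r, g, b).

    Raises ValueError for anything else, including non-string input.
    """
    if not isinstance(value, str) or not value.startswith("#"):
        raise ValueError(f"not a hex color: {value!r}")
    digits = value[1:]
    if len(digits) == 3:
        # Expand shorthand: 'abc' -> 'aabbcc'.
        digits = "".join(ch * 2 for ch in digits)
    if len(digits) != 6 or any(ch not in string.hexdigits for ch in digits):
        raise ValueError(f"not a hex color: {value!r}")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


class TestParseHexColorEdgeCases(unittest.TestCase):
    def test_boundary_values(self):
        cases = {
            "#000000": (0, 0, 0),
            "#ffffff": (255, 255, 255),
            "#FFF": (255, 255, 255),
        }
        for text, expected in cases.items():
            with self.subTest(text=text):
                self.assertEqual(parse_hex_color(text), expected)

    def test_invalid_inputs_raise(self):
        for bad in ("", "#", "#12", "#gggggg", "ffffff", None, 255):
            with self.subTest(bad=bad):
                with self.assertRaises(ValueError):
                    parse_hex_color(bad)
```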
## Success Metrics

- **Coverage**: >90% line coverage across all modules
- **Performance**: No test takes longer than 1 second
- **Reliability**: Tests pass consistently across environments
- **Maintainability**: Tests are easy to understand and modify
- **Documentation**: Tests clearly show expected behavior

## Implementation Priority

1. **Week 1**: Complete high-priority abstract tests
2. **Week 2**: Implement HTML processing component tests
3. **Week 3**: Add integration and end-to-end tests
4. **Week 4**: Performance and edge case testing

This testing strategy ensures comprehensive coverage of the pyWebLayout library while maintaining good separation of concerns and providing clear documentation of expected behavior.