pyWebLayout/ARCHITECTURE.md
Duncan Tourolle 8d892bfe28
2025-06-08 14:08:29 +02:00


# pyWebLayout Architecture: Abstract vs Concrete
This document explains the fundamental architectural separation between **Abstract** and **Concrete** layers in the pyWebLayout library.
## Overview
The pyWebLayout library is split into two layers:
- **Abstract Layer**: Represents the logical structure and content of documents (HTML/EPUB text)
- **Concrete Layer**: Handles the spatial rendering and visual representation of content
Keeping content apart from rendering makes each layer easier to test in isolation and lets the same content be rendered in more than one way.
## Abstract Layer (`pyWebLayout/abstract/`)
The Abstract layer deals with the **logical structure** of documents without concerning itself with how content will be visually rendered.
### Key Components
#### `abstract/block.py`
- `Block`: Base class for all block-level content
- `Paragraph`: Represents a logical paragraph containing words
- `Heading`: Represents headings with semantic levels (H1-H6)
- `HList`: Represents ordered/unordered lists
- `Image`: Represents image references
#### `abstract/inline.py`
- `Word`: Represents individual words with text content and styling information
- Contains methods for hyphenation and text manipulation
- Does **not** handle rendering or spatial layout
#### `abstract/document.py`
- `Document`: Container for the overall document structure
- `Chapter`: Logical grouping of blocks (for books/long documents)
### Characteristics of Abstract Classes
1. **Content-focused**: Store text, structure, and semantic meaning
2. **Layout-agnostic**: No knowledge of fonts, pixels, or rendering
3. **Reusable**: Same content can be rendered in different formats/sizes
4. **Serializable**: Can be saved/loaded without rendering context
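The "layout-agnostic" and "serializable" properties above can be illustrated with a minimal sketch. `SketchWord`, `words_to_json`, and `words_from_json` are hypothetical names invented for this example, not part of pyWebLayout; the point is only that content holding no pixel or font-metric data can round-trip through JSON without a rendering context.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SketchWord:
    """Hypothetical abstract word: text plus semantic style, no pixel data."""
    text: str
    bold: bool = False
    italic: bool = False


def words_to_json(words):
    """Serialize abstract words without any rendering context."""
    return json.dumps([asdict(w) for w in words])


def words_from_json(payload):
    """Rebuild the same abstract words from their serialized form."""
    return [SketchWord(**item) for item in json.loads(payload)]


original = [SketchWord("Hello"), SketchWord("world", bold=True)]
restored = words_from_json(words_to_json(original))
```

Because nothing in `SketchWord` depends on a font file or an image surface, `restored` compares equal to `original` on any machine.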
### Example: Abstract Word
```python
# An Abstract Word knows its text content and semantic properties
word = Word("supercalifragilisticexpialidocious", font_style)
word.hyphenate() # Logical operation - finds break points
parts = word.get_hyphenated_parts() # Returns ["super-", "cali-", "fragi-", ...]
```
## Concrete Layer (`pyWebLayout/concrete/`)
The Concrete layer handles the **spatial representation** and actual rendering of content.
### Key Components
#### `concrete/text.py`
- `Text`: Renders a specific text fragment with precise positioning
- `Line`: Manages a line of `Text` objects with spacing and alignment
- Handles actual pixel measurements, font rendering, and positioning
#### `concrete/page.py`
- `Page`: Top-level container for rendered content
- `Container`: Layout manager for organizing renderable objects
- Handles spatial layout, pagination, and visual composition
#### `concrete/box.py`
- `Box`: Base class for all spatially-aware renderable objects
- Provides positioning, sizing, and rendering capabilities
### Characteristics of Concrete Classes
1. **Rendering-focused**: Handle pixels, fonts, images, and visual output
2. **Spatially-aware**: Know exact positions, sizes, and layout constraints
3. **Implementation-specific**: Tied to specific rendering technologies (PIL, etc.)
4. **Non-portable**: Rendering results are tied to specific display contexts
### Example: Concrete Text
```python
# A Concrete Text object handles actual rendering
text = Text("super-", font) # Specific text fragment
text._calculate_dimensions() # Computes exact pixel size
image = text.render() # Produces actual visual output
```
## The Transformation Process
The architecture involves a clear transformation from Abstract to Concrete:
```
Abstract Document
    ↓ [Parser Layer]
Abstract Blocks (Paragraph, Heading, etc.)
    ↓ [Layout Engine]
Concrete Objects (Text, Line, Page)
    ↓ [Rendering Engine]
Visual Output (Images, PDF, etc.)
```
### Example Transformation
```python
# 1. Abstract content
paragraph = Paragraph()
paragraph.add_word(Word("This", font))
paragraph.add_word(Word("is", font))
paragraph.add_word(Word("a", font))
paragraph.add_word(Word("test", font))
# 2. Layout transformation
layout = ParagraphLayout(line_width=200, line_height=20)
lines = layout.layout_paragraph(paragraph) # Returns List[Line]
# 3. Each Line contains concrete Text objects
for line in lines:
    for text_obj in line.text_objects:  # List[Text]
        print(f"Text: '{text_obj.text}' at position {text_obj._origin}")
```
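To make the layout step less opaque, here is a self-contained sketch of one way such a transformation could work: greedy line filling. It does not use pyWebLayout's real classes. `PlacedText`, `layout_words`, and the fixed `CHAR_WIDTH`/`SPACE_WIDTH` values are assumptions made for illustration; a real layout engine would measure text with actual font metrics.

```python
from dataclasses import dataclass
from typing import List, Tuple

CHAR_WIDTH = 10   # assumed fixed glyph width in pixels (real fonts vary)
SPACE_WIDTH = 10  # assumed width of an inter-word space


@dataclass
class PlacedText:
    """Hypothetical concrete fragment: text plus its pixel origin."""
    text: str
    origin: Tuple[int, int]


def layout_words(words: List[str], line_width: int,
                 line_height: int) -> List[List[PlacedText]]:
    """Greedily place words left to right, wrapping when a word would overflow."""
    lines: List[List[PlacedText]] = [[]]
    x, y = 0, 0
    for word in words:
        width = len(word) * CHAR_WIDTH
        # Wrap only if the line already holds something; an oversized word
        # is still placed alone on its own line.
        if lines[-1] and x + width > line_width:
            lines.append([])
            x, y = 0, y + line_height
        lines[-1].append(PlacedText(word, (x, y)))
        x += width + SPACE_WIDTH
    return lines


lines = layout_words(["This", "is", "a", "test"], line_width=80, line_height=20)
```

With an 80-pixel line, "This" and "is" fill the first line and "a test" wraps to the second. The input strings carry no coordinates; positions exist only on the `PlacedText` results.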
## Key Architectural Principles
### 1. **Single Responsibility**
- Abstract classes: Handle content and structure
- Concrete classes: Handle rendering and layout
### 2. **Separation of Concerns**
- Text parsing/processing ≠ Text rendering
- Document structure ≠ Page layout
- Content semantics ≠ Visual presentation
### 3. **Immutable Abstract Content**
- Abstract content remains unchanged during rendering
- Multiple concrete representations can be generated from the same abstract content
- This enables pagination, multiple output formats, and responsive layouts
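A small sketch of this principle, under stated assumptions: `SketchParagraph` and `wrap` are hypothetical, and the standard library's `textwrap` stands in for a real layout engine, so widths are measured in characters rather than pixels. The same frozen content is laid out twice and is never modified.

```python
import textwrap
from dataclasses import dataclass, FrozenInstanceError
from typing import List, Tuple


@dataclass(frozen=True)
class SketchParagraph:
    """Hypothetical abstract paragraph: an immutable tuple of words."""
    words: Tuple[str, ...]


def wrap(paragraph: SketchParagraph, columns: int) -> List[str]:
    """Produce one concrete arrangement; the paragraph itself is only read."""
    return textwrap.wrap(" ".join(paragraph.words), width=columns)


paragraph = SketchParagraph(("This", "is", "a", "test"))
narrow = wrap(paragraph, columns=7)   # two lines
wide = wrap(paragraph, columns=40)    # one line

mutation_blocked = False
try:
    paragraph.words = ("changed",)  # frozen dataclasses reject assignment
except FrozenInstanceError:
    mutation_blocked = True
```

Freezing the dataclass is one possible way to enforce the rule; a convention that layout code never writes to abstract objects would serve the same purpose.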
### 4. **One-to-Many Relationships**
- One Abstract Word → Multiple Concrete Text objects (hyphenation)
- One Abstract Paragraph → Multiple Concrete Lines
- One Abstract Document → Multiple Concrete Pages
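The first relationship can be sketched as follows. `split_at_breaks` is a hypothetical helper, and the break indices are supplied by hand here; in practice they would come from a hyphenation dictionary or algorithm. One word in, several fragments out, each of which could become its own concrete text object on a different line.

```python
from typing import List


def split_at_breaks(word: str, break_points: List[int]) -> List[str]:
    """Split a word at the given character indices.

    Every part except the last gets a trailing hyphen.
    """
    parts = []
    start = 0
    for point in break_points:
        parts.append(word[start:point] + "-")
        start = point
    parts.append(word[start:])
    return parts


# Hand-picked break indices, for illustration only.
fragments = split_at_breaks("hyphenation", [2, 6])
```

The original word is untouched, so a wider line can later use it whole while a narrower one uses the fragments.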
## Common Anti-Patterns to Avoid
### ❌ **Mixing Concerns**
```python
# WRONG: Abstract class knowing about pixels
class Word:
    def __init__(self, text):
        self.text = text
        self.rendered_width = None  # ❌ Concrete concern in abstract class
```
### ❌ **renderable_words Concept**
```python
# WRONG: Confusing abstract and concrete
line.renderable_words # ❌ This suggests Words are renderable
# Words are abstract - only Text objects render
```
### ✅ **Correct Separation**
```python
# CORRECT: Clear separation
abstract_word = Word("test", font) # Abstract content
concrete_text = Text("test", font) # Concrete rendering
line.text_objects.append(concrete_text) # Concrete objects in concrete container
```
## Benefits of This Architecture
### 1. **Flexibility**
- Same content can be rendered at different sizes
- Multiple output formats from single source
- Easy to implement responsive design
### 2. **Testability**
- Abstract logic can be tested without rendering
- Layout algorithms can be tested independently
- Visual rendering can be mocked
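One way to get this kind of testability is to pass the measuring function into the layout algorithm. The sketch below is an assumption about how that could look, not pyWebLayout's actual API: `break_lines` and `fake_measure` are hypothetical names, and the fake measurer treats every character as one unit wide so that no font or image library is needed.

```python
from typing import Callable, List


def break_lines(words: List[str], max_width: int,
                measure: Callable[[str], int]) -> List[str]:
    """Greedy line breaking that delegates all width questions to `measure`."""
    lines: List[str] = []
    current = ""
    for word in words:
        candidate = word if not current else current + " " + word
        if current and measure(candidate) > max_width:
            # The word does not fit: close the current line and start a new one.
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


def fake_measure(text: str) -> int:
    """Test double: pretend every character is exactly 1 unit wide."""
    return len(text)


result = break_lines(["aa", "bb", "cc"], max_width=5, measure=fake_measure)
```

In production code the `measure` argument would wrap a real font's text-width query; the algorithm under test stays the same.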
### 3. **Performance**
- Abstract content can be cached and reused
- Layout can be computed once for multiple renderings
- Incremental updates possible
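As a rough illustration of the caching point, immutable content can serve directly as a cache key. The sketch assumes a hypothetical `cached_layout` function and again uses `textwrap` (character widths) in place of real pixel-based layout.

```python
import textwrap
from functools import lru_cache
from typing import Tuple


@lru_cache(maxsize=None)
def cached_layout(words: Tuple[str, ...], columns: int) -> Tuple[str, ...]:
    """Wrap words to a column width; results are memoized per (words, columns)."""
    return tuple(textwrap.wrap(" ".join(words), width=columns))


words = ("This", "is", "a", "test")
first = cached_layout(words, 7)
second = cached_layout(words, 7)   # same arguments: served from the cache
other = cached_layout(words, 40)   # new width: layout runs again

stats = cached_layout.cache_info()
```

This only works because the words are stored in a hashable, immutable tuple, which ties the caching benefit back to the immutability principle.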
### 4. **Maintainability**
- Clear boundaries between text processing and rendering
- Changes to rendering don't affect content parsing
- Easy to swap rendering backends
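Swapping backends could look roughly like the sketch below. All names (`Backend`, `PlainTextBackend`, `SvgBackend`, `render`) are hypothetical and chosen for this example; the source mentions PIL as one rendering technology, but these text-producing backends avoid any dependency. The layout result is computed once and handed to whichever backend is selected.

```python
from typing import List, Protocol, Tuple

Placed = Tuple[str, int, int]  # (text, x, y): hypothetical layout output


class Backend(Protocol):
    """Anything with a matching draw() method counts as a backend."""
    def draw(self, placed: List[Placed]) -> str: ...


class PlainTextBackend:
    """Lists each fragment with its coordinates."""
    def draw(self, placed: List[Placed]) -> str:
        return "\n".join(f"{x},{y}: {text}" for text, x, y in placed)


class SvgBackend:
    """Emits simplified SVG-like markup for the same layout (no escaping)."""
    def draw(self, placed: List[Placed]) -> str:
        body = "".join(
            f'<text x="{x}" y="{y}">{text}</text>' for text, x, y in placed
        )
        return f"<svg>{body}</svg>"


def render(placed: List[Placed], backend: Backend) -> str:
    """Hand an already-computed layout to the chosen backend."""
    return backend.draw(placed)


layout = [("Hello", 0, 0), ("world", 60, 0)]
as_text = render(layout, PlainTextBackend())
as_svg = render(layout, SvgBackend())
```

Neither backend knows how the positions were computed, and the layout code never imports a backend, which is the boundary this section describes.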
## File Organization
```
pyWebLayout/
├── abstract/                  # Content and structure
│   ├── block.py               # Document blocks (Paragraph, Heading, etc.)
│   ├── inline.py              # Inline content (Word, etc.)
│   ├── document.py            # Document structure
│   └── functional.py          # Links, buttons, etc.
├── concrete/                  # Rendering and layout
│   ├── text.py                # Text and Line rendering
│   ├── page.py                # Page layout and containers
│   ├── box.py                 # Base rendering classes
│   ├── image.py               # Image rendering
│   └── functional.py          # Interactive elements
├── typesetting/               # Layout algorithms
│   ├── paragraph_layout.py    # Abstract → Concrete transformation
│   ├── flow.py                # Text flow management
│   └── pagination.py          # Page breaking logic
└── style/                     # Styling and formatting
    ├── fonts.py               # Font management
    ├── layout.py              # Layout constants
    └── alignment.py           # Alignment enums
## Conclusion
The Abstract/Concrete separation is fundamental to pyWebLayout's design. It ensures clean separation between content processing and visual rendering, enabling flexible, maintainable, and testable document processing pipelines.
**Remember**:
- **Abstract** = What to display (content, structure, semantics)
- **Concrete** = How to display it (pixels, fonts, positioning, rendering)
This architecture enables the library to handle complex document layouts while maintaining clear, understandable code organization.