# pyWebLayout Architecture: Abstract vs Concrete

This document explains the fundamental architectural separation between **Abstract** and **Concrete** layers in the pyWebLayout library.

## Overview

The pyWebLayout library is split into two distinct layers:

- **Abstract Layer**: Represents the logical structure and content of documents (HTML/EPUB text)
- **Concrete Layer**: Handles the spatial rendering and visual representation of content

Keeping the two apart makes the library flexible and testable, and gives each layer a single, well-defined concern.

## Abstract Layer (`pyWebLayout/abstract/`)

The Abstract layer deals with the **logical structure** of documents without concerning itself with how content will be visually rendered.

### Key Components

#### `abstract/block.py`
- `Block`: Base class for all block-level content
- `Paragraph`: Represents a logical paragraph containing words
- `Heading`: Represents headings with semantic levels (H1-H6)
- `HList`: Represents ordered/unordered lists
- `Image`: Represents image references

#### `abstract/inline.py`
- `Word`: Represents individual words with text content and styling information
- Contains methods for hyphenation and text manipulation
- Does **not** handle rendering or spatial layout

#### `abstract/document.py`
- `Document`: Container for the overall document structure
- `Chapter`: Logical grouping of blocks (for books/long documents)

### Characteristics of Abstract Classes

1. **Content-focused**: Store text, structure, and semantic meaning
2. **Layout-agnostic**: No knowledge of fonts, pixels, or rendering
3. **Reusable**: Same content can be rendered in different formats/sizes
4. **Serializable**: Can be saved/loaded without rendering context

### Example: Abstract Word

```python
from pyWebLayout.abstract.inline import Word

# An abstract Word knows its text content and semantic properties.
# `font_style` is assumed to be a style object created elsewhere.
word = Word("supercalifragilisticexpialidocious", font_style)
word.hyphenate()  # Logical operation - finds break points
parts = word.get_hyphenated_parts()  # Returns ["super-", "cali-", "fragi-", ...]
```

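The layout-agnostic and serializable characteristics can be illustrated with a minimal, self-contained sketch. `SketchWord`, its `breaks` field, and `hyphenated_parts` are hypothetical names invented for this illustration, not pyWebLayout's API, and the break points are supplied by hand rather than computed by a hyphenation algorithm.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SketchWord:
    """Hypothetical stand-in for an abstract word: content only, no pixels."""
    text: str
    style: str = "regular"   # a style *name*, not a loaded font
    breaks: tuple = ()       # character offsets where a hyphen may be inserted

    def hyphenated_parts(self):
        """Split the text at each break point; non-final parts get a '-'."""
        edges = [0, *self.breaks, len(self.text)]
        pieces = [self.text[a:b] for a, b in zip(edges, edges[1:])]
        return [p + "-" for p in pieces[:-1]] + pieces[-1:]


word = SketchWord("hyphenation", breaks=(2, 6))
parts = word.hyphenated_parts()

# Round-trip through JSON without any rendering context.
payload = json.dumps(asdict(word))
data = json.loads(payload)
restored = SketchWord(data["text"], data["style"], tuple(data["breaks"]))
```

Nothing in the sketch refers to a font file, a pixel size, or a page, which is what allows the same object to be saved, reloaded, and laid out later under any rendering settings.
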
## Concrete Layer (`pyWebLayout/concrete/`)

The Concrete layer handles the **spatial representation** and actual rendering of content.

### Key Components

#### `concrete/text.py`
- `Text`: Renders a specific text fragment with precise positioning
- `Line`: Manages a line of `Text` objects with spacing and alignment
- Handles actual pixel measurements, font rendering, and positioning

#### `concrete/page.py`
- `Page`: Top-level container for rendered content
- `Container`: Layout manager for organizing renderable objects
- Handles spatial layout, pagination, and visual composition

#### `concrete/box.py`
- `Box`: Base class for all spatially-aware renderable objects
- Provides positioning, sizing, and rendering capabilities

### Characteristics of Concrete Classes

1. **Rendering-focused**: Handle pixels, fonts, images, and visual output
2. **Spatially-aware**: Know exact positions, sizes, and layout constraints
3. **Implementation-specific**: Tied to specific rendering technologies (PIL, etc.)
4. **Non-portable**: Rendering results are tied to specific display contexts

### Example: Concrete Text

```python
from pyWebLayout.concrete.text import Text

# A concrete Text object handles actual rendering.
# `font` is assumed to be a font object loaded elsewhere.
text = Text("super-", font)  # Specific text fragment
text._calculate_dimensions()  # Computes exact pixel size
image = text.render()  # Produces actual visual output
```

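What "spatially-aware" means can be sketched without a rendering backend. The class below is a hypothetical stand-in, not the library's `Text`: it assumes a fixed 8-pixel advance per character and a 16-pixel line height, where a real implementation would query the font.

```python
from dataclasses import dataclass

CHAR_WIDTH = 8    # assumed fixed advance per character, in pixels
LINE_HEIGHT = 16  # assumed line height, in pixels


@dataclass
class SketchText:
    """Hypothetical concrete fragment: a string plus where it sits and how big it is."""
    text: str
    origin: tuple = (0, 0)

    @property
    def size(self):
        # A real backend would ask the font; here width is length * advance.
        return (len(self.text) * CHAR_WIDTH, LINE_HEIGHT)

    def bounding_box(self):
        """Return (left, top, right, bottom) in pixels."""
        x, y = self.origin
        w, h = self.size
        return (x, y, x + w, y + h)


fragment = SketchText("super-", origin=(10, 20))
box = fragment.bounding_box()
```

Unlike an abstract word, this object is only meaningful for one font metric and one position, which is why concrete results are described above as non-portable.
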
## The Transformation Process

The architecture involves a clear transformation from Abstract to Concrete:

```
Abstract Document
↓
[Parser Layer]
↓
Abstract Blocks (Paragraph, Heading, etc.)
↓
[Layout Engine]
↓
Concrete Objects (Text, Line, Page)
↓
[Rendering Engine]
↓
Visual Output (Images, PDF, etc.)
```

### Example Transformation

```python
# 1. Abstract content
paragraph = Paragraph()
paragraph.add_word(Word("This", font))
paragraph.add_word(Word("is", font))
paragraph.add_word(Word("a", font))
paragraph.add_word(Word("test", font))

# 2. Layout transformation
layout = ParagraphLayout(line_width=200, line_height=20)
lines = layout.layout_paragraph(paragraph)  # Returns List[Line]

# 3. Each Line contains concrete Text objects
for line in lines:
    for text_obj in line.text_objects:  # List[Text]
        print(f"Text: '{text_obj.text}' at position {text_obj._origin}")
```

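The layout step in the middle of that example can be approximated by a greedy line filler. This is a sketch of one common technique, not necessarily what `ParagraphLayout` does internally; `layout_words`, the `measure` callback, and the 4-pixel `space` default are assumptions made for illustration.

```python
from typing import Callable, List, Tuple


def layout_words(words: List[str], line_width: int,
                 measure: Callable[[str], int],
                 space: int = 4) -> List[List[Tuple[str, int]]]:
    """Greedy line filling: return lines of (text, x_offset) pairs."""
    lines, current, x = [], [], 0
    for word in words:
        width = measure(word)
        # Start a new line when the word would overflow a non-empty line.
        if current and x + width > line_width:
            lines.append(current)
            current, x = [], 0
        current.append((word, x))
        x += width + space
    if current:
        lines.append(current)
    return lines


# Assumed toy metric: every character is 8 pixels wide.
lines = layout_words(["This", "is", "a", "test"], line_width=60,
                     measure=lambda w: len(w) * 8)
```

Passing the measuring function in as a parameter keeps the algorithm independent of any particular font backend, so the same code can run against real font metrics or a toy metric like the one used here.
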
## Key Architectural Principles

### 1. **Single Responsibility**
- Abstract classes: Handle content and structure
- Concrete classes: Handle rendering and layout

### 2. **Separation of Concerns**
- Text parsing/processing ≠ Text rendering
- Document structure ≠ Page layout
- Content semantics ≠ Visual presentation

### 3. **Immutable Abstract Content**
- Abstract content remains unchanged during rendering
- Multiple concrete representations can be generated from the same abstract content
- Enables pagination, different formats, and responsive layouts

### 4. **One-to-Many Relationships**
- One Abstract Word → Multiple Concrete Text objects (hyphenation)
- One Abstract Paragraph → Multiple Concrete Lines
- One Abstract Document → Multiple Concrete Pages

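
The one-to-many relationship and the immutability principle can be seen in a few lines using only the standard library. In this sketch a tuple of strings stands in for an abstract paragraph and `textwrap.wrap` stands in for the layout engine, with character columns playing the role of pixels; `lay_out` is a hypothetical helper, not part of pyWebLayout.

```python
import textwrap

# Immutable abstract content: a tuple cannot be modified by the layout step.
paragraph = ("This", "is", "a", "test")


def lay_out(words, columns):
    """Produce one 'concrete' layout: lines no wider than `columns` characters."""
    return textwrap.wrap(" ".join(words), width=columns)


narrow = lay_out(paragraph, columns=7)   # small display
wide = lay_out(paragraph, columns=20)    # large display
```

The same paragraph yields two lines at the narrow width and one at the wide width, and the source tuple is untouched after both calls.
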
## Common Anti-Patterns to Avoid

### ❌ **Mixing Concerns**
```python
# WRONG: Abstract class knowing about pixels
class Word:
    def __init__(self, text):
        self.text = text
        self.rendered_width = None  # ❌ Concrete concern in abstract class
```

### ❌ **`renderable_words` Concept**
```python
# WRONG: Confusing abstract and concrete
line.renderable_words  # ❌ This suggests Words are renderable
# Words are abstract - only Text objects render
```

### ✅ **Correct Separation**
```python
# CORRECT: Clear separation
abstract_word = Word("test", font)  # Abstract content
concrete_text = Text("test", font)  # Concrete rendering
line.text_objects.append(concrete_text)  # Concrete objects in concrete container
```

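If measured widths need to be remembered, one way to do that without putting a `rendered_width` attribute on the abstract class is to keep the measurements in a concrete-side helper. The sketch below assumes hypothetical names (`SketchWord`, `WidthCache`, `width_of`) and a toy 8-pixels-per-character metric; it is not the library's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchWord:
    """Abstract stand-in: text only, with no width and no position."""
    text: str


class WidthCache:
    """Hypothetical concrete-side helper that remembers measured widths."""

    def __init__(self, measure):
        self._measure = measure   # callable: (text, font_name) -> pixels
        self._widths = {}
        self.calls = 0            # how many real measurements were made

    def width_of(self, word, font_name):
        key = (word.text, font_name)
        if key not in self._widths:
            self.calls += 1
            self._widths[key] = self._measure(word.text, font_name)
        return self._widths[key]


cache = WidthCache(lambda text, font_name: len(text) * 8)
word = SketchWord("test")
first = cache.width_of(word, "serif-12")
second = cache.width_of(word, "serif-12")  # served from the cache
```

Because the cache is keyed by text and font name, the same abstract word can carry different widths under different fonts without ever being modified.
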
## Benefits of This Architecture

### 1. **Flexibility**
- Same content can be rendered at different sizes
- Multiple output formats from a single source
- Easy to implement responsive design

### 2. **Testability**
- Abstract logic can be tested without rendering
- Layout algorithms can be tested independently
- Visual rendering can be mocked

### 3. **Performance**
- Abstract content can be cached and reused
- Layout can be computed once for multiple renderings
- Incremental updates are possible

### 4. **Maintainability**
- Clear boundaries between text processing and rendering
- Changes to rendering don't affect content parsing
- Easy to swap rendering backends

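
The testability and backend-swapping points can be sketched with a small interface and a test double. `Backend`, `RecordingBackend`, and `render_lines` are hypothetical names chosen for this illustration and do not describe pyWebLayout's actual rendering interface.

```python
from typing import List, Protocol, Tuple


class Backend(Protocol):
    """Anything that can draw a string at a pixel position."""
    def draw_text(self, text: str, x: int, y: int) -> None: ...


class RecordingBackend:
    """Test double: records draw calls instead of producing pixels."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, int, int]] = []

    def draw_text(self, text: str, x: int, y: int) -> None:
        self.calls.append((text, x, y))


def render_lines(lines: List[str], backend: Backend,
                 line_height: int = 20) -> None:
    """Draw each line at the left margin, one line height below the previous."""
    for index, line in enumerate(lines):
        backend.draw_text(line, 0, index * line_height)


backend = RecordingBackend()
render_lines(["This is", "a test"], backend)
```

A test can then assert on `backend.calls` directly, and an image-producing backend could be substituted later without changing `render_lines`.
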
## File Organization

```
pyWebLayout/
├── abstract/                 # Content and structure
│   ├── block.py              # Document blocks (Paragraph, Heading, etc.)
│   ├── inline.py             # Inline content (Word, etc.)
│   ├── document.py           # Document structure
│   └── functional.py         # Links, buttons, etc.
│
├── concrete/                 # Rendering and layout
│   ├── text.py               # Text and Line rendering
│   ├── page.py               # Page layout and containers
│   ├── box.py                # Base rendering classes
│   ├── image.py              # Image rendering
│   └── functional.py         # Interactive elements
│
├── typesetting/              # Layout algorithms
│   ├── paragraph_layout.py   # Abstract → Concrete transformation
│   ├── flow.py               # Text flow management
│   └── pagination.py         # Page breaking logic
│
└── style/                    # Styling and formatting
    ├── fonts.py              # Font management
    ├── layout.py             # Layout constants
    └── alignment.py          # Alignment enums
```

## Conclusion

The Abstract/Concrete separation is fundamental to pyWebLayout's design. It keeps content processing apart from visual rendering, which makes document processing pipelines flexible, maintainable, and testable.

**Remember**:
- **Abstract** = What to display (content, structure, semantics)
- **Concrete** = How to display it (pixels, fonts, positioning, rendering)

This architecture enables the library to handle complex document layouts while maintaining clear, understandable code organization.