pyWebLayout Architecture: Abstract vs Concrete
This document explains the fundamental architectural separation between Abstract and Concrete layers in the pyWebLayout library.
Overview
The pyWebLayout library follows a clear separation between two distinct layers:
- Abstract Layer: Represents the logical structure and content of documents (HTML/EPUB text)
- Concrete Layer: Handles the spatial rendering and visual representation of content
Keeping the two layers apart makes the library more flexible and easier to test, and gives each layer a single, well-defined concern.
Abstract Layer (pyWebLayout/abstract/)
The Abstract layer deals with the logical structure of documents without concerning itself with how content will be visually rendered.
Key Components
abstract/block.py
- Block: Base class for all block-level content
- Paragraph: Represents a logical paragraph containing words
- Heading: Represents headings with semantic levels (H1-H6)
- HList: Represents ordered/unordered lists
- Image: Represents image references
abstract/inline.py
- Word: Represents individual words with text content and styling information
- Contains methods for hyphenation and text manipulation
- Does not handle rendering or spatial layout
abstract/document.py
- Document: Container for the overall document structure
- Chapter: Logical grouping of blocks (for books/long documents)
Characteristics of Abstract Classes
- Content-focused: Store text, structure, and semantic meaning
- Layout-agnostic: No knowledge of pixel measurements, positions, or rendering (a Word carries styling information but never measures or draws itself)
- Reusable: Same content can be rendered in different formats/sizes
- Serializable: Can be saved/loaded without rendering context
Example: Abstract Word
# An Abstract Word knows its text content and semantic properties
word = Word("supercalifragilisticexpialidocious", font_style)
word.hyphenate() # Logical operation - finds break points
parts = word.get_hyphenated_parts() # Returns ["super-", "cali-", "fragi-", ...]
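To make the "logical operation" concrete, here is a minimal, self-contained sketch of an abstract word that stores break points and nothing about pixels. `SketchWord`, `break_points`, and `hyphenated_parts` are hypothetical names for illustration; they are not pyWebLayout's actual `Word` API, and the break offsets are supplied by hand rather than computed by a hyphenation algorithm.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SketchWord:
    """Hypothetical abstract word: text plus logical break points, no pixels."""
    text: str
    break_points: Tuple[int, ...] = ()  # character offsets where a hyphen may go

    def hyphenated_parts(self) -> List[str]:
        """Split at each break point; every part except the last gets a hyphen."""
        bounds = [0, *self.break_points, len(self.text)]
        pieces = [self.text[a:b] for a, b in zip(bounds, bounds[1:])]
        return [p + "-" for p in pieces[:-1]] + pieces[-1:]


word = SketchWord("hyphenation", break_points=(2, 6))
print(word.hyphenated_parts())  # ['hy-', 'phen-', 'ation']
```

Nothing in this class depends on a font or a page width, so the same object could be handed to any number of layout passes.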
Concrete Layer (pyWebLayout/concrete/)
The Concrete layer handles the spatial representation and actual rendering of content.
Key Components
concrete/text.py
- Text: Renders a specific text fragment with precise positioning
- Line: Manages a line of Text objects with spacing and alignment
- Handles actual pixel measurements, font rendering, and positioning
concrete/page.py
- Page: Top-level container for rendered content
- Container: Layout manager for organizing renderable objects
- Handles spatial layout, pagination, and visual composition
concrete/box.py
- Box: Base class for all spatially-aware renderable objects
- Provides positioning, sizing, and rendering capabilities
Characteristics of Concrete Classes
- Rendering-focused: Handle pixels, fonts, images, and visual output
- Spatially-aware: Know exact positions, sizes, and layout constraints
- Implementation-specific: Tied to specific rendering technologies (PIL, etc.)
- Non-portable: Rendering results are tied to specific display contexts
Example: Concrete Text
# A Concrete Text object handles actual rendering
text = Text("super-", font) # Specific text fragment
text._calculate_dimensions() # Computes exact pixel size
image = text.render() # Produces actual visual output
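For contrast with the abstract side, the sketch below shows what "spatially-aware" might look like in a stripped-down form. It assumes a fixed advance per character instead of real font metrics, so it runs without PIL. `SketchText`, `char_width`, and `bounding_box` are hypothetical names, not the library's `Text` class.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SketchText:
    """Hypothetical concrete fragment: knows its size and position in pixels."""
    text: str
    char_width: int = 8    # assumed fixed advance per character, in pixels
    line_height: int = 16  # assumed line height, in pixels
    origin: Tuple[int, int] = (0, 0)

    @property
    def size(self) -> Tuple[int, int]:
        """Width and height in pixels under the fixed-width assumption."""
        return (len(self.text) * self.char_width, self.line_height)

    def bounding_box(self) -> Tuple[int, int, int, int]:
        """Return (left, top, right, bottom) in page coordinates."""
        x, y = self.origin
        w, h = self.size
        return (x, y, x + w, y + h)


fragment = SketchText("super-", origin=(10, 20))
print(fragment.size)            # (48, 16)
print(fragment.bounding_box())  # (10, 20, 58, 36)
```

A real concrete class would ask the rendering backend for glyph metrics. The point here is only that size and position live on the concrete object, never on the abstract word.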
The Transformation Process
The architecture involves a clear transformation from Abstract to Concrete:
Abstract Document
↓
[Parser Layer]
↓
Abstract Blocks (Paragraph, Heading, etc.)
↓
[Layout Engine]
↓
Concrete Objects (Text, Line, Page)
↓
[Rendering Engine]
↓
Visual Output (Images, PDF, etc.)
Example Transformation
# 1. Abstract content
paragraph = Paragraph()
paragraph.add_word(Word("This", font))
paragraph.add_word(Word("is", font))
paragraph.add_word(Word("a", font))
paragraph.add_word(Word("test", font))
# 2. Layout transformation
layout = ParagraphLayout(line_width=200, line_height=20)
lines = layout.layout_paragraph(paragraph) # Returns List[Line]
# 3. Each Line contains concrete Text objects
for line in lines:
    for text_obj in line.text_objects:  # List[Text]
        print(f"Text: '{text_obj.text}' at position {text_obj._origin}")
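The layout step in the example above can be sketched without the library as a greedy line filler. This is a sketch under stated assumptions, not pyWebLayout's `ParagraphLayout`: `layout_words`, `CHAR_WIDTH`, and `SPACE_WIDTH` are hypothetical, and widths come from a fixed per-character advance rather than a font.

```python
from typing import List, Tuple

CHAR_WIDTH = 10   # assumed fixed glyph advance, in pixels
SPACE_WIDTH = 10  # assumed gap between words, in pixels

# A placed fragment: (text, x offset within its line)
Placed = Tuple[str, int]


def layout_words(words: List[str], line_width: int) -> List[List[Placed]]:
    """Greedily fill lines; start a new line when the next word would overflow."""
    lines: List[List[Placed]] = [[]]
    x = 0
    for w in words:
        width = len(w) * CHAR_WIDTH
        # Only break if the current line already holds something,
        # so an over-wide word still gets placed on its own line.
        if lines[-1] and x + width > line_width:
            lines.append([])
            x = 0
        lines[-1].append((w, x))
        x += width + SPACE_WIDTH
    return lines


lines = layout_words(["This", "is", "a", "test"], line_width=80)
for number, line in enumerate(lines):
    print(number, line)
# 0 [('This', 0), ('is', 50)]
# 1 [('a', 0), ('test', 20)]
```

The input is a plain list of strings (abstract) and the output carries x offsets (concrete), which mirrors the direction of the transformation described in this section.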
Key Architectural Principles
1. Single Responsibility
- Abstract classes: Handle content and structure
- Concrete classes: Handle rendering and layout
2. Separation of Concerns
- Text parsing/processing ≠ Text rendering
- Document structure ≠ Page layout
- Content semantics ≠ Visual presentation
3. Immutable Abstract Content
- Abstract content remains unchanged during rendering
- Multiple concrete representations can be generated from the same abstract content
- Enables pagination, different output formats, and responsive layouts
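One way to enforce this principle in Python is a frozen dataclass. The sketch below is illustrative only: `FrozenParagraph` and `wrap` are hypothetical names, and the standard library's `textwrap.wrap` stands in for a layout engine, measuring in character columns rather than pixels.

```python
import textwrap
from dataclasses import FrozenInstanceError, dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FrozenParagraph:
    """Hypothetical immutable abstract paragraph: an ordered tuple of words."""
    words: Tuple[str, ...]


def wrap(paragraph: FrozenParagraph, columns: int) -> List[str]:
    """Stand-in layout: wrap to a column count instead of a pixel width."""
    return textwrap.wrap(" ".join(paragraph.words), width=columns)


para = FrozenParagraph(("This", "is", "a", "test"))
narrow = wrap(para, columns=7)   # ['This is', 'a test']
wide = wrap(para, columns=40)    # ['This is a test']

try:
    para.words = ()  # layout code cannot modify the content by accident
except FrozenInstanceError:
    print("abstract content is read-only")
```

Both layouts are derived from the same `para` object, and neither layout pass can alter it.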
4. One-to-Many Relationships
- One Abstract Word → Multiple Concrete Text objects (hyphenation)
- One Abstract Paragraph → Multiple Concrete Lines
- One Abstract Document → Multiple Concrete Pages
Common Anti-Patterns to Avoid
❌ Mixing Concerns
# WRONG: Abstract class knowing about pixels
class Word:
    def __init__(self, text):
        self.text = text
        self.rendered_width = None  # ❌ Concrete concern in abstract class
❌ renderable_words Concept
# WRONG: Confusing abstract and concrete
line.renderable_words # ❌ This suggests Words are renderable
# Words are abstract - only Text objects render
✅ Correct Separation
# CORRECT: Clear separation
abstract_word = Word("test", font) # Abstract content (same constructor form as the earlier examples)
concrete_text = Text("test", font) # Concrete rendering
line.text_objects.append(concrete_text) # Concrete objects in concrete container
Benefits of This Architecture
1. Flexibility
- Same content can be rendered at different sizes
- Multiple output formats from a single source
- Easy to implement responsive design
2. Testability
- Abstract logic can be tested without rendering
- Layout algorithms can be tested independently
- Visual rendering can be mocked
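As a rough illustration of testing layout without any rendering, the sketch below injects the measuring function so a test can pass a fake. `break_lines` and its `measure` parameter are hypothetical names for this example, not part of pyWebLayout.

```python
from typing import Callable, List


def break_lines(
    words: List[str], max_width: int, measure: Callable[[str], int]
) -> List[List[str]]:
    """Greedy line breaking; `measure` returns a word's width including its gap."""
    lines: List[List[str]] = []
    used = max_width + 1  # forces the first word to open a new line
    for w in words:
        width = measure(w)
        if used + width > max_width:
            lines.append([])
            used = 0
        lines[-1].append(w)
        used += width
    return lines


def test_wraps_when_width_exceeded() -> None:
    # Every word is 10 units wide, so no font or image library is needed.
    fake_measure = lambda word: 10
    result = break_lines(["a", "b", "c"], max_width=20, measure=fake_measure)
    assert result == [["a", "b"], ["c"]]


test_wraps_when_width_exceeded()
```

Because the width source is a parameter, the same function can later be driven by real font metrics without changing the test.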
3. Performance
- Abstract content can be cached and reused
- Layout can be computed once for multiple renderings
- Incremental updates possible
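Caching becomes straightforward once abstract content is hashable. The following sketch memoizes a stand-in layout function with `functools.lru_cache`. `cached_layout` is a hypothetical name, and `textwrap.wrap` again substitutes for a real layout engine.

```python
import textwrap
from functools import lru_cache
from typing import Tuple


@lru_cache(maxsize=128)
def cached_layout(words: Tuple[str, ...], columns: int) -> Tuple[str, ...]:
    """Lay out a tuple of words once per (content, width) pair."""
    return tuple(textwrap.wrap(" ".join(words), width=columns))


content = ("This", "is", "a", "test")
first = cached_layout(content, 7)
second = cached_layout(content, 7)  # same arguments: served from the cache
info = cached_layout.cache_info()
print(info.hits, info.misses)  # 1 1
```

The cache key is the content plus the layout constraint, which only works because the words are passed as an immutable tuple.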
4. Maintainability
- Clear boundaries between text processing and rendering
- Changes to rendering don't affect content parsing
- Easy to swap rendering backends
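Swapping backends can be sketched with a small structural interface. Everything here is an assumption for illustration: `Backend`, `PlainTextBackend`, `SvgBackend`, and `render` are hypothetical and do not reflect how pyWebLayout actually abstracts its PIL-based rendering.

```python
from typing import List, Protocol, Tuple
from xml.sax.saxutils import escape

Placed = Tuple[str, int, int]  # (text, x, y) as produced by a layout pass


class Backend(Protocol):
    def draw(self, items: List[Placed]) -> str: ...


class PlainTextBackend:
    """Emits one 'x,y:text' line per placed fragment."""

    def draw(self, items: List[Placed]) -> str:
        return "\n".join(f"{x},{y}:{text}" for text, x, y in items)


class SvgBackend:
    """Emits a minimal SVG fragment; text is XML-escaped."""

    def draw(self, items: List[Placed]) -> str:
        body = "".join(
            f'<text x="{x}" y="{y}">{escape(text)}</text>' for text, x, y in items
        )
        return f"<svg>{body}</svg>"


def render(items: List[Placed], backend: Backend) -> str:
    """The layout result is backend-independent; only this call differs."""
    return backend.draw(items)


items = [("This", 0, 0), ("is", 50, 0)]
print(render(items, PlainTextBackend()))
print(render(items, SvgBackend()))
```

Neither backend sees abstract words, only placed fragments, so content parsing is unaffected by which one is chosen.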
File Organization
pyWebLayout/
├── abstract/ # Content and structure
│ ├── block.py # Document blocks (Paragraph, Heading, etc.)
│ ├── inline.py # Inline content (Word, etc.)
│ ├── document.py # Document structure
│ └── functional.py # Links, buttons, etc.
│
├── concrete/ # Rendering and layout
│ ├── text.py # Text and Line rendering
│ ├── page.py # Page layout and containers
│ ├── box.py # Base rendering classes
│ ├── image.py # Image rendering
│ └── functional.py # Interactive elements
│
├── typesetting/ # Layout algorithms
│ ├── paragraph_layout.py # Abstract → Concrete transformation
│ ├── flow.py # Text flow management
│ └── pagination.py # Page breaking logic
│
└── style/ # Styling and formatting
    ├── fonts.py # Font management
    ├── layout.py # Layout constants
    └── alignment.py # Alignment enums
Conclusion
The Abstract/Concrete separation is fundamental to pyWebLayout's design. It keeps content processing apart from visual rendering, which makes document processing pipelines flexible, maintainable, and testable.
Remember:
- Abstract = What to display (content, structure, semantics)
- Concrete = How to display it (pixels, fonts, positioning, rendering)
This architecture enables the library to handle complex document layouts while maintaining clear, understandable code organization.