pyWebLayout/ARCHITECTURE.md
2025-11-12 12:03:27 +00:00

pyWebLayout Architecture: Abstract vs Concrete

This document explains the architectural separation between the Abstract and Concrete layers of the pyWebLayout library.

Overview

pyWebLayout divides document handling into two distinct layers:

  • Abstract Layer: Represents the logical structure and content of documents (HTML/EPUB text)
  • Concrete Layer: Handles the spatial rendering and visual representation of content

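Keeping the layers apart makes the library flexible and testable: content can be parsed and inspected without any rendering backend, and the same content can be rendered in more than one way.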

Abstract Layer (pyWebLayout/abstract/)

The Abstract layer deals with the logical structure of documents without concerning itself with how content will be visually rendered.

Key Components

abstract/block.py

  • Block: Base class for all block-level content
  • Paragraph: Represents a logical paragraph containing words
  • Heading: Represents headings with semantic levels (H1-H6)
  • HList: Represents ordered/unordered lists
  • Image: Represents image references

abstract/inline.py

  • Word: Represents individual words with text content and styling information
  • Contains methods for hyphenation and text manipulation
  • Does not handle rendering or spatial layout

abstract/document.py

  • Document: Container for the overall document structure
  • Chapter: Logical grouping of blocks (for books/long documents)

Characteristics of Abstract Classes

  1. Content-focused: Store text, structure, and semantic meaning
  2. Layout-agnostic: No knowledge of pixel measurements, positions, or rendering (a Word may carry a style reference, but it never measures or draws itself)
  3. Reusable: Same content can be rendered in different formats/sizes
  4. Serializable: Can be saved/loaded without rendering context
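
The characteristics above can be illustrated with a minimal sketch. It does not use pyWebLayout's real classes: `SketchWord`, its fields, `save_words`, and `load_words` are hypothetical stand-ins, chosen only to show that content with no rendering state can round-trip through JSON.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SketchWord:
    """Hypothetical stand-in for an abstract word: content and semantics only."""
    text: str
    bold: bool = False      # semantic emphasis, not a font object
    language: str = "en"    # assumed field, used only for illustration


def save_words(words):
    # No fonts, pixels, or page sizes are needed to serialize the content.
    return json.dumps([asdict(w) for w in words])


def load_words(payload):
    return [SketchWord(**item) for item in json.loads(payload)]


words = [SketchWord("Hello"), SketchWord("world", bold=True)]
restored = load_words(save_words(words))
```

Because nothing in `SketchWord` depends on a display context, `restored` compares equal to `words` after the round trip.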

Example: Abstract Word

from pyWebLayout.abstract.inline import Word

# An abstract Word knows its text content and semantic properties.
# font_style is assumed to be a style object created elsewhere.
word = Word("supercalifragilisticexpialidocious", font_style)
word.hyphenate()  # Logical operation: finds break points
parts = word.get_hyphenated_parts()  # Returns ["super-", "cali-", "fragi-", ...]

Concrete Layer (pyWebLayout/concrete/)

The Concrete layer handles the spatial representation and actual rendering of content.

Key Components

concrete/text.py

  • Text: Renders a specific text fragment with precise positioning
  • Line: Manages a line of Text objects with spacing and alignment
  • Handles actual pixel measurements, font rendering, and positioning

concrete/page.py

  • Page: Top-level container for rendered content
  • Container: Layout manager for organizing renderable objects
  • Handles spatial layout, pagination, and visual composition

concrete/box.py

  • Box: Base class for all spatially-aware renderable objects
  • Provides positioning, sizing, and rendering capabilities

Characteristics of Concrete Classes

  1. Rendering-focused: Handle pixels, fonts, images, and visual output
  2. Spatially-aware: Know exact positions, sizes, and layout constraints
  3. Implementation-specific: Tied to specific rendering technologies (PIL, etc.)
  4. Non-portable: Rendering results are tied to specific display contexts
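
To make "spatially-aware" and "non-portable" tangible, here is a small sketch under simplifying assumptions. `SketchFont` and `SketchText` are hypothetical stand-ins, and the font is treated as fixed-width so that measuring needs no imaging library; real measurements would come from the rendering backend.

```python
from dataclasses import dataclass


@dataclass
class SketchFont:
    """Hypothetical fixed-width font: every glyph is char_width pixels wide."""
    char_width: int = 8
    line_height: int = 16


@dataclass
class SketchText:
    """Hypothetical concrete fragment: text plus an exact position and size."""
    text: str
    font: SketchFont
    origin: tuple = (0, 0)

    @property
    def size(self):
        # The size depends on the font, so it belongs in the concrete layer.
        return (len(self.text) * self.font.char_width, self.font.line_height)


small = SketchText("super-", SketchFont(char_width=8, line_height=16))
large = SketchText("super-", SketchFont(char_width=12, line_height=24), origin=(10, 40))
```

The same fragment measures 48×16 pixels in one font and 72×24 in the other, which is why measured sizes cannot be stored on the abstract content.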

Example: Concrete Text

from pyWebLayout.concrete.text import Text

# A concrete Text object handles actual rendering.
# font is assumed to be a font object created elsewhere.
text = Text("super-", font)  # Specific text fragment
text._calculate_dimensions()  # Computes exact pixel size
image = text.render()  # Produces actual visual output

The Transformation Process

The architecture involves a clear transformation from Abstract to Concrete:

Source Document (HTML/EPUB)
       ↓
   [Parser Layer]
       ↓
Abstract Blocks (Paragraph, Heading, etc.)
       ↓
   [Layout Engine]
       ↓
Concrete Objects (Text, Line, Page)
       ↓
   [Rendering Engine]
       ↓
Visual Output (Images, PDF, etc.)

Example Transformation

from pyWebLayout.abstract.block import Paragraph
from pyWebLayout.abstract.inline import Word
from pyWebLayout.typesetting.paragraph_layout import ParagraphLayout

# 1. Abstract content (font is assumed to be a font object created elsewhere)
paragraph = Paragraph()
paragraph.add_word(Word("This", font))
paragraph.add_word(Word("is", font))
paragraph.add_word(Word("a", font))
paragraph.add_word(Word("test", font))

# 2. Layout transformation
layout = ParagraphLayout(line_width=200, line_height=20)
lines = layout.layout_paragraph(paragraph)  # Returns List[Line]

# 3. Each Line contains concrete Text objects
for line in lines:
    for text_obj in line.text_objects:  # List[Text]
        print(f"Text: '{text_obj.text}' at position {text_obj._origin}")
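
For readers who want to see what such a layout step might do internally, the following is a self-contained sketch of greedy line breaking. It is not the library's `ParagraphLayout`: the `Sketch*` classes, `layout_words`, and the fixed `char_width` and `space_width` values are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SketchWord:          # abstract: text only
    text: str


@dataclass
class SketchText:          # concrete: text with a position and width in pixels
    text: str
    x: int
    width: int


@dataclass
class SketchLine:
    text_objects: list = field(default_factory=list)


def layout_words(words, line_width, char_width=10, space_width=10):
    """Greedy line breaking: place each word on the current line if it fits."""
    lines = [SketchLine()]
    cursor = 0
    for word in words:
        width = len(word.text) * char_width
        # Start a new line when the word would overflow a non-empty line.
        if cursor > 0 and cursor + width > line_width:
            lines.append(SketchLine())
            cursor = 0
        lines[-1].text_objects.append(SketchText(word.text, cursor, width))
        cursor += width + space_width
    return lines


words = [SketchWord(t) for t in ["This", "is", "a", "test"]]
lines = layout_words(words, line_width=100)
```

With a 100-pixel line, the first three words fit on the first line and "test" wraps to the second. The abstract words are only read, never modified.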

Key Architectural Principles

1. Single Responsibility

  • Abstract classes: Handle content and structure
  • Concrete classes: Handle rendering and layout

2. Separation of Concerns

  • Text parsing/processing ≠ Text rendering
  • Document structure ≠ Page layout
  • Content semantics ≠ Visual presentation

3. Immutable Abstract Content

  • Abstract content remains unchanged during rendering
  • Multiple concrete representations can be generated from same abstract content
  • Enables pagination, different formats, responsive layouts
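
One way to picture this principle is a frozen content object that is wrapped at two different widths. The sketch below is illustrative only: `SketchParagraph` and `wrap` are hypothetical, and widths are counted in characters rather than pixels to keep it dependency-free.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchParagraph:
    """Hypothetical abstract paragraph; frozen so layout code cannot mutate it."""
    words: tuple


def wrap(paragraph, max_chars):
    """Wrap words into lines of at most max_chars (an overlong word gets its own line)."""
    lines, current = [], ""
    for word in paragraph.words:
        candidate = word if not current else current + " " + word
        if current and len(candidate) > max_chars:
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


paragraph = SketchParagraph(words=("This", "is", "a", "test"))
narrow = wrap(paragraph, max_chars=7)    # e.g. a small screen
wide = wrap(paragraph, max_chars=40)     # e.g. a wide page
```

Both results are derived from the same `paragraph`, which is identical before and after the two calls.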

4. One-to-Many Relationships

  • One Abstract Word → Multiple Concrete Text objects (hyphenation)
  • One Abstract Paragraph → Multiple Concrete Lines
  • One Abstract Document → Multiple Concrete Pages
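
The first relationship (one word, several fragments) can be sketched as a split at hyphenation points. `split_at_breaks` is a hypothetical helper, and the break points are supplied by hand; a real hyphenator would compute them.

```python
def split_at_breaks(text, break_points):
    """Split text at the given indices, adding a trailing hyphen to all but the last part."""
    parts = []
    start = 0
    for index in sorted(break_points):
        parts.append(text[start:index] + "-")
        start = index
    parts.append(text[start:])
    return parts


# Break points are chosen manually for this illustration.
fragments = split_at_breaks("hyphenation", [2, 6])
```

Each element of `fragments` would then back its own concrete Text object, possibly on different lines, while the abstract word stays a single unit.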

Common Anti-Patterns to Avoid

Mixing Concerns

# WRONG: Abstract class knowing about pixels
class Word:
    def __init__(self, text):
        self.text = text
        self.rendered_width = None  # ❌ Concrete concern in abstract class

renderable_words Concept

# WRONG: Confusing abstract and concrete
line.renderable_words  # ❌ The name suggests that Words are renderable.
                       # Words are abstract; only Text objects render.

Correct Separation

# CORRECT: Clear separation
abstract_word = Word("test", font)  # Abstract content
concrete_text = Text("test", font)  # Concrete rendering
line.text_objects.append(concrete_text)  # Concrete objects in concrete container

Benefits of This Architecture

1. Flexibility

  • Same content can be rendered at different sizes
  • Multiple output formats from single source
  • Easy to implement responsive design

2. Testability

  • Abstract logic can be tested without rendering
  • Layout algorithms can be tested independently
  • Visual rendering can be mocked
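
As a rough illustration of the last point, the standard library's `unittest.mock.Mock` can stand in for a rendering backend. The `render_lines` driver and the `draw_text` method name are assumptions made for this sketch, not part of pyWebLayout.

```python
from unittest.mock import Mock


def render_lines(lines, renderer, line_height=20):
    """Hypothetical driver: hand each laid-out line to a rendering backend."""
    for index, line in enumerate(lines):
        renderer.draw_text(line, y=index * line_height)
    return len(lines)


# The renderer is a Mock, so no fonts or image libraries are needed.
renderer = Mock()
count = render_lines(["This is", "a test"], renderer)
```

A test can then check which calls the driver made, for example that `draw_text` was called twice and that the second line was placed at `y=20`.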

3. Performance

  • Abstract content can be cached and reused
  • Layout can be computed once for multiple renderings
  • Incremental updates possible
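
A minimal sketch of the caching idea, assuming layout is a pure function of hashable abstract content and layout parameters. `word_offsets` is hypothetical and uses a fixed character width instead of real font metrics.

```python
from functools import lru_cache


@lru_cache(maxsize=128)
def word_offsets(words, char_width):
    """Hypothetical layout step: x offset of each word on a single line."""
    offsets, x = [], 0
    for word in words:
        offsets.append(x)
        x += (len(word) + 1) * char_width   # word plus one trailing space
    return tuple(offsets)


words = ("This", "is", "a", "test")   # tuples are hashable, so they can be cache keys
first = word_offsets(words, 10)
second = word_offsets(words, 10)      # served from the cache
info = word_offsets.cache_info()
```

The second call returns the cached tuple without recomputing it. This only works because the abstract input is immutable and the layout function has no side effects.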

4. Maintainability

  • Clear boundaries between text processing and rendering
  • Changes to rendering don't affect content parsing
  • Easy to swap rendering backends
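
Swapping backends could look roughly like the sketch below, which uses `typing.Protocol` to describe what a backend must provide. The `Backend` protocol, both backend classes, and the `(text, x, y)` tuple format are assumptions for illustration; pyWebLayout's actual rendering interface may differ.

```python
from typing import Protocol


class Backend(Protocol):
    def draw(self, text: str, x: int, y: int) -> str: ...


class PlainBackend:
    def draw(self, text, x, y):
        return f"{text}@({x},{y})"


class SvgBackend:
    def draw(self, text, x, y):
        return f'<text x="{x}" y="{y}">{text}</text>'


def render(placed, backend: Backend):
    """placed is a list of (text, x, y) tuples produced by a layout step."""
    return [backend.draw(text, x, y) for text, x, y in placed]


placed = [("This", 0, 0), ("is", 50, 0)]
plain = render(placed, PlainBackend())
svg = render(placed, SvgBackend())
```

The layout result `placed` is computed once and passed unchanged to either backend, so adding a new output format does not touch parsing or layout code.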

File Organization

pyWebLayout/
├── abstract/           # Content and structure
│   ├── block.py       # Document blocks (Paragraph, Heading, etc.)
│   ├── inline.py      # Inline content (Word, etc.)
│   ├── document.py    # Document structure
│   └── functional.py  # Links, buttons, etc.
│
├── concrete/          # Rendering and layout
│   ├── text.py        # Text and Line rendering
│   ├── page.py        # Page layout and containers
│   ├── box.py         # Base rendering classes
│   ├── image.py       # Image rendering
│   └── functional.py  # Interactive elements
│
├── typesetting/       # Layout algorithms
│   ├── paragraph_layout.py  # Abstract → Concrete transformation
│   ├── flow.py        # Text flow management
│   └── pagination.py  # Page breaking logic
│
└── style/             # Styling and formatting
    ├── fonts.py       # Font management
    ├── layout.py      # Layout constants
    └── alignment.py   # Alignment enums

Conclusion

The Abstract/Concrete separation is fundamental to pyWebLayout's design. It keeps content processing apart from visual rendering, which makes document processing pipelines flexible, maintainable, and testable.

Remember:

  • Abstract = What to display (content, structure, semantics)
  • Concrete = How to display it (pixels, fonts, positioning, rendering)

This architecture enables the library to handle complex document layouts while maintaining clear, understandable code organization.