dtourolle/pyWebLayout

Duncan Tourolle 8d892bfe28

New alignement handlers

2025-06-08 14:08:29 +02:00

pyWebLayout Architecture: Abstract vs Concrete

This document explains the fundamental architectural separation between Abstract and Concrete layers in the pyWebLayout library.

Overview

The pyWebLayout library is split into two distinct layers:

Abstract Layer: Represents the logical structure and content of documents (HTML/EPUB text)
Concrete Layer: Handles the spatial layout and visual rendering of that content

Keeping the layers apart makes the library flexible and testable, because content can be built and inspected without any rendering context.

Abstract Layer (`pyWebLayout/abstract/`)

The Abstract layer deals with the logical structure of documents without concerning itself with how content will be visually rendered.

Key Components

`abstract/block.py`

Block: Base class for all block-level content
Paragraph: Represents a logical paragraph containing words
Heading: Represents headings with semantic levels (H1-H6)
HList: Represents ordered/unordered lists
Image: Represents image references

`abstract/inline.py`

Word: Represents an individual word with its text content and styling information. It provides methods for hyphenation and text manipulation, and does not handle rendering or spatial layout.

`abstract/document.py`

Document: Container for the overall document structure
Chapter: Logical grouping of blocks (for books/long documents)

Characteristics of Abstract Classes

Content-focused: Store text, structure, and semantic meaning
Layout-agnostic: No knowledge of pixels, positions, or rendering (a Word carries its style as data but never measures it)
Reusable: Same content can be rendered in different formats/sizes
Serializable: Can be saved/loaded without rendering context
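
The "Serializable" and "Layout-agnostic" properties can be illustrated with a minimal sketch. `SketchWord` and `SketchParagraph` are hypothetical stand-ins, not pyWebLayout's actual classes; the only assumption is that abstract content is plain data with no rendering state attached.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class SketchWord:
    """Hypothetical abstract word: text plus a style name, no pixel data."""
    text: str
    style: str = "body"


@dataclass
class SketchParagraph:
    """Hypothetical abstract paragraph: an ordered list of words."""
    words: List[SketchWord] = field(default_factory=list)


def to_json(paragraph: SketchParagraph) -> str:
    # asdict() recurses into nested dataclasses, so no rendering context is needed.
    return json.dumps(asdict(paragraph))


def from_json(payload: str) -> SketchParagraph:
    data = json.loads(payload)
    return SketchParagraph(words=[SketchWord(**w) for w in data["words"]])


paragraph = SketchParagraph([SketchWord("Hello"), SketchWord("world", style="bold")])
restored = from_json(to_json(paragraph))
```

Because nothing in the payload depends on a font object or a page size, the restored paragraph compares equal to the original and could be laid out later at any width.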

Example: Abstract Word

# An Abstract Word knows its text content and semantic properties
word = Word("supercalifragilisticexpialidocious", font_style)
word.hyphenate()  # Logical operation - finds break points
parts = word.get_hyphenated_parts()  # Returns ["super-", "cali-", "fragi-", ...]

Concrete Layer (`pyWebLayout/concrete/`)

The Concrete layer handles the spatial representation and actual rendering of content.

Key Components

`concrete/text.py`

Text: Renders a specific text fragment with precise positioning
Line: Manages a line of Text objects with spacing and alignment
Handles actual pixel measurements, font rendering, and positioning

`concrete/page.py`

Page: Top-level container for rendered content
Container: Layout manager for organizing renderable objects
Handles spatial layout, pagination, and visual composition

`concrete/box.py`

Box: Base class for all spatially-aware renderable objects
Provides positioning, sizing, and rendering capabilities

Characteristics of Concrete Classes

Rendering-focused: Handle pixels, fonts, images, and visual output
Spatially-aware: Know exact positions, sizes, and layout constraints
Implementation-specific: Tied to specific rendering technologies (PIL, etc.)
Non-portable: Rendering results are tied to specific display contexts
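
What "spatially-aware" means in practice can be sketched with a small positioned rectangle. `SketchBox` is a hypothetical illustration and makes no claim about the real `Box` interface; it assumes integer pixel coordinates with the origin at the top-left corner.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SketchBox:
    """Hypothetical spatially-aware object: a position and a size in pixels."""
    origin: Tuple[int, int]
    size: Tuple[int, int]

    @property
    def bottom_right(self) -> Tuple[int, int]:
        return (self.origin[0] + self.size[0], self.origin[1] + self.size[1])

    def contains(self, point: Tuple[int, int]) -> bool:
        # Half-open interval: the right and bottom edges are excluded.
        x, y = point
        right, bottom = self.bottom_right
        return self.origin[0] <= x < right and self.origin[1] <= y < bottom


box = SketchBox(origin=(10, 20), size=(100, 30))
```

Every value here is tied to one display context, which is exactly the kind of state the abstract layer avoids.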

Example: Concrete Text

# A Concrete Text object handles actual rendering
text = Text("super-", font)  # Specific text fragment
text._calculate_dimensions()  # Computes exact pixel size
image = text.render()  # Produces actual visual output

The Transformation Process

The architecture involves a clear transformation from Abstract to Concrete:

Abstract Document
       ↓
   [Parser Layer]
       ↓  
Abstract Blocks (Paragraph, Heading, etc.)
       ↓
   [Layout Engine]
       ↓
Concrete Objects (Text, Line, Page)
       ↓
   [Rendering Engine]  
       ↓
Visual Output (Images, PDF, etc.)

Example Transformation

# 1. Abstract content
paragraph = Paragraph()
paragraph.add_word(Word("This", font))
paragraph.add_word(Word("is", font))
paragraph.add_word(Word("a", font))
paragraph.add_word(Word("test", font))

# 2. Layout transformation
layout = ParagraphLayout(line_width=200, line_height=20)
lines = layout.layout_paragraph(paragraph)  # Returns List[Line]

# 3. Each Line contains concrete Text objects
for line in lines:
    for text_obj in line.text_objects:  # List[Text]
        print(f"Text: '{text_obj.text}' at position {text_obj._origin}")
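
To make step 2 less opaque, here is a self-contained sketch of one way a layout step could turn words into positioned fragments. It is a plain greedy fill, not the algorithm `ParagraphLayout` is known to use; `PlacedText`, `layout_words`, and the fixed 10-pixel glyph width are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

CHAR_WIDTH = 10   # assumed fixed glyph width in pixels (a real font would be measured)
SPACE_WIDTH = 10  # assumed width of the gap between words


@dataclass
class PlacedText:
    """Hypothetical concrete fragment: text plus its pixel origin."""
    text: str
    origin: Tuple[int, int]


def layout_words(words: List[str], line_width: int, line_height: int) -> List[List[PlacedText]]:
    """Greedy fill: put each word on the current line if it fits, else start a new one."""
    lines: List[List[PlacedText]] = [[]]
    x, y = 0, 0
    for word in words:
        width = len(word) * CHAR_WIDTH
        # Wrap only when the line already holds something; an over-long word
        # is still placed alone rather than dropped.
        if lines[-1] and x + width > line_width:
            lines.append([])
            x, y = 0, y + line_height
        lines[-1].append(PlacedText(word, (x, y)))
        x += width + SPACE_WIDTH
    return lines


lines = layout_words(["This", "is", "a", "test"], line_width=80, line_height=20)
for line in lines:
    print([(t.text, t.origin) for t in line])
```

With an 80-pixel line, "This is" fills the first line and "a test" wraps to the second, so four abstract words become two concrete lines.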

Key Architectural Principles

1. Single Responsibility

Abstract classes: Handle content and structure
Concrete classes: Handle rendering and layout

2. Separation of Concerns

Text parsing/processing ≠ Text rendering
Document structure ≠ Page layout
Content semantics ≠ Visual presentation

3. Immutable Abstract Content

Abstract content remains unchanged during rendering
Multiple concrete representations can be generated from the same abstract content
This enables pagination, different output formats, and responsive layouts
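
One way to enforce this principle is to make the content type immutable and keep layout as a pure function. The sketch below is an assumption about how that could look, not a description of pyWebLayout's classes; `FrozenParagraph` and `wrap` are hypothetical, and widths are counted in characters via the standard `textwrap` module rather than in pixels.

```python
import textwrap
from dataclasses import dataclass, FrozenInstanceError
from typing import List, Tuple


@dataclass(frozen=True)
class FrozenParagraph:
    """Hypothetical immutable abstract paragraph."""
    words: Tuple[str, ...]


def wrap(paragraph: FrozenParagraph, columns: int) -> List[str]:
    # Pure function: reads the content and returns new lines, never mutating it.
    return textwrap.wrap(" ".join(paragraph.words), width=columns)


content = FrozenParagraph(("Abstract", "content", "stays", "the", "same"))
narrow = wrap(content, columns=12)
wide = wrap(content, columns=40)

try:
    content.words = ()  # any attempt to rebind a field is rejected
    mutated = True
except FrozenInstanceError:
    mutated = False
```

The same `content` object yields a multi-line result at 12 columns and a single line at 40, and it is unchanged after both calls.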

4. One-to-Many Relationships

One Abstract Word → Multiple Concrete Text objects (hyphenation)
One Abstract Paragraph → Multiple Concrete Lines
One Abstract Document → Multiple Concrete Pages
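
The Word-to-Text case can be sketched as follows. `split_to_fit` is a hypothetical helper that uses fixed-size chunking as a stand-in for real hyphenation, so the break points it picks are not linguistically correct; it only shows how one logical word can yield several renderable fragments.

```python
from typing import List


def split_to_fit(word: str, max_chars: int) -> List[str]:
    """Naively split a word into fragments of at most max_chars characters.

    Every fragment except the last ends in a hyphen, which counts toward
    max_chars. This is fixed-size chunking, not dictionary hyphenation.
    """
    if max_chars < 2:
        raise ValueError("max_chars must leave room for one letter and a hyphen")
    fragments = []
    rest = word
    while len(rest) > max_chars:
        fragments.append(rest[: max_chars - 1] + "-")
        rest = rest[max_chars - 1:]
    fragments.append(rest)
    return fragments


fragments = split_to_fit("supercalifragilistic", max_chars=6)
```

Each returned string would become its own concrete text object on its own line, while the abstract word remains a single unit.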

Common Anti-Patterns to Avoid

❌ Mixing Concerns

# WRONG: Abstract class knowing about pixels
class Word:
    def __init__(self, text):
        self.text = text
        self.rendered_width = None  # ❌ Concrete concern in abstract class

❌ renderable_words Concept

# WRONG: Confusing abstract and concrete
line.renderable_words  # ❌ This suggests Words are renderable
                      # Words are abstract - only Text objects render

✅ Correct Separation

# CORRECT: Clear separation
abstract_word = Word("test", font)  # Abstract content
concrete_text = Text("test", font)  # Concrete rendering
line.text_objects.append(concrete_text)  # Concrete objects in concrete container

Benefits of This Architecture

1. Flexibility

Same content can be rendered at different sizes
Multiple output formats from single source
Easy to implement responsive design

2. Testability

Abstract logic can be tested without rendering
Layout algorithms can be tested independently
Visual rendering can be mocked
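
A sketch of what "layout tested independently" could look like: the layout function receives its measuring function as a parameter, so a test can pass a deterministic fake instead of a real font. `count_lines` and `fake_measure` are hypothetical names introduced for this example.

```python
from typing import Callable, List


def count_lines(words: List[str], line_width: int, measure: Callable[[str], int]) -> int:
    """Count the lines a greedy layout needs, given any text-measuring function."""
    if not words:
        return 0
    lines, used = 1, 0
    for word in words:
        # A space is measured only between words on the same line.
        needed = measure(word) if used == 0 else measure(" " + word)
        if used > 0 and used + needed > line_width:
            lines += 1
            used = measure(word)
        else:
            used += needed
    return lines


def fake_measure(text: str) -> int:
    # Test double: every character is exactly 10 pixels wide, no font required.
    return 10 * len(text)


result = count_lines(["This", "is", "a", "test"], line_width=80, measure=fake_measure)
```

Because the fake is deterministic, the expected line count can be asserted exactly, with no dependency on installed fonts or an imaging library.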

3. Performance

Abstract content can be cached and reused
Layout can be computed once and reused for multiple renderings
Incremental updates are possible
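
Computing layout once and reusing it can be sketched with the standard library's `functools.lru_cache`. This assumes the abstract content is hashable (here a tuple of strings); `cached_layout` is a hypothetical function that wraps by character count, not part of pyWebLayout.

```python
from functools import lru_cache
from typing import Tuple


@lru_cache(maxsize=128)
def cached_layout(words: Tuple[str, ...], columns: int) -> Tuple[str, ...]:
    """Hypothetical layout step; arguments must be hashable for lru_cache to work."""
    lines, current = [], ""
    for word in words:
        candidate = word if not current else current + " " + word
        if current and len(candidate) > columns:
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return tuple(lines)


words = ("layout", "once", "render", "many", "times")
first = cached_layout(words, 12)
second = cached_layout(words, 12)  # served from the cache
info = cached_layout.cache_info()
```

The second call returns the cached tuple without recomputing it. The cache key includes `columns`, so a different width still triggers a fresh layout.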

4. Maintainability

Clear boundaries between text processing and rendering
Changes to rendering don't affect content parsing
Easy to swap rendering backends
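
Swapping backends is easiest when callers depend on a small interface rather than a concrete renderer. The sketch below uses `typing.Protocol`; the `Backend` protocol, both backend classes, and the `(text, x, y)` tuple shape are assumptions for illustration, and the SVG-like output is neither escaped nor a complete SVG document.

```python
from typing import List, Protocol, Tuple

Placed = Tuple[str, int, int]  # (text, x, y): hypothetical output of a layout step


class Backend(Protocol):
    def draw(self, items: List[Placed]) -> str: ...


class PlainTextBackend:
    """Ignores coordinates and emits the fragments in order."""
    def draw(self, items: List[Placed]) -> str:
        return " ".join(text for text, _, _ in items)


class SvgBackend:
    """Emits one SVG-style <text> element per fragment."""
    def draw(self, items: List[Placed]) -> str:
        body = "".join(f'<text x="{x}" y="{y}">{text}</text>' for text, x, y in items)
        return f"<svg>{body}</svg>"


def render(items: List[Placed], backend: Backend) -> str:
    # The caller depends only on the draw() contract, not on a concrete class.
    return backend.draw(items)


items = [("Hello", 0, 20), ("world", 60, 20)]
plain = render(items, PlainTextBackend())
svg = render(items, SvgBackend())
```

The layout output is identical in both calls. Only the object passed as `backend` changes, which is what keeps rendering changes away from content parsing.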

File Organization

pyWebLayout/
├── abstract/           # Content and structure
│   ├── block.py       # Document blocks (Paragraph, Heading, etc.)
│   ├── inline.py      # Inline content (Word, etc.)
│   ├── document.py    # Document structure
│   └── functional.py  # Links, buttons, etc.
│
├── concrete/          # Rendering and layout
│   ├── text.py        # Text and Line rendering
│   ├── page.py        # Page layout and containers
│   ├── box.py         # Base rendering classes
│   ├── image.py       # Image rendering
│   └── functional.py  # Interactive elements
│
├── typesetting/       # Layout algorithms
│   ├── paragraph_layout.py  # Abstract → Concrete transformation
│   ├── flow.py        # Text flow management
│   └── pagination.py  # Page breaking logic
│
└── style/             # Styling and formatting
    ├── fonts.py       # Font management
    ├── layout.py      # Layout constants
    └── alignment.py   # Alignment enums

Conclusion

The Abstract/Concrete separation is fundamental to pyWebLayout's design. It keeps content processing independent of visual rendering, which makes document processing pipelines flexible, maintainable, and testable.

Remember:

Abstract = What to display (content, structure, semantics)
Concrete = How to display it (pixels, fonts, positioning, rendering)

This architecture enables the library to handle complex document layouts while maintaining clear, understandable code organization.
