# pyWebLayout Architecture: Abstract vs Concrete

This document explains the fundamental architectural separation between **Abstract** and **Concrete** layers in the pyWebLayout library.

## Overview

The pyWebLayout library follows a clear separation between two distinct layers:

- **Abstract Layer**: Represents the logical structure and content of documents (HTML/EPUB text)
- **Concrete Layer**: Handles the spatial rendering and visual representation of content

This separation provides flexibility, testability, and clean separation of concerns.

## Abstract Layer (`pyWebLayout/abstract/`)

The Abstract layer deals with the **logical structure** of documents without concerning itself with how content will be visually rendered.

### Key Components

#### `abstract/block.py`
- `Block`: Base class for all block-level content
- `Paragraph`: Represents a logical paragraph containing words
- `Heading`: Represents headings with semantic levels (H1-H6)
- `HList`: Represents ordered/unordered lists
- `Image`: Represents image references

#### `abstract/inline.py`
- `Word`: Represents individual words with text content and styling information
  - Contains methods for hyphenation and text manipulation
  - Does **not** handle rendering or spatial layout

#### `abstract/document.py`
- `Document`: Container for the overall document structure
- `Chapter`: Logical grouping of blocks (for books/long documents)

### Characteristics of Abstract Classes

1. **Content-focused**: Store text, structure, and semantic meaning
2. **Layout-agnostic**: No knowledge of fonts, pixels, or rendering
3. **Reusable**: Same content can be rendered in different formats/sizes
4. **Serializable**: Can be saved/loaded without rendering context

### Example: Abstract Word

```python
# An Abstract Word knows its text content and semantic properties
word = Word("supercalifragilisticexpialidocious", font_style)
word.hyphenate()  # Logical operation - finds break points
parts = word.get_hyphenated_parts()  # Returns ["super-", "cali-", "fragi-", ...]
```

## Concrete Layer (`pyWebLayout/concrete/`)

The Concrete layer handles the **spatial representation** and actual rendering of content.

### Key Components

#### `concrete/text.py`
- `Text`: Renders a specific text fragment with precise positioning
- `Line`: Manages a line of `Text` objects with spacing and alignment
- Handles actual pixel measurements, font rendering, and positioning

#### `concrete/page.py`
- `Page`: Top-level container for rendered content
- `Container`: Layout manager for organizing renderable objects
- Handles spatial layout, pagination, and visual composition

#### `concrete/box.py`
- `Box`: Base class for all spatially-aware renderable objects
- Provides positioning, sizing, and rendering capabilities

### Characteristics of Concrete Classes

1. **Rendering-focused**: Handle pixels, fonts, images, and visual output
2. **Spatially-aware**: Know exact positions, sizes, and layout constraints
3. **Implementation-specific**: Tied to specific rendering technologies (PIL, etc.)
4. **Non-portable**: Rendering results are tied to specific display contexts

### Example: Concrete Text

```python
# A Concrete Text object handles actual rendering
text = Text("super-", font)  # Specific text fragment
text._calculate_dimensions()  # Computes exact pixel size
image = text.render()  # Produces actual visual output
```

## The Transformation Process

The architecture involves a clear transformation from Abstract to Concrete:

```
Abstract Document
       ↓
  [Parser Layer]
       ↓
Abstract Blocks (Paragraph, Heading, etc.)
       ↓
  [Layout Engine]
       ↓
Concrete Objects (Text, Line, Page)
       ↓
  [Rendering Engine]
       ↓
Visual Output (Images, PDF, etc.)
```

### Example Transformation

```python
# 1. Abstract content
paragraph = Paragraph()
paragraph.add_word(Word("This", font))
paragraph.add_word(Word("is", font))
paragraph.add_word(Word("a", font))
paragraph.add_word(Word("test", font))

# 2. Layout transformation
layout = ParagraphLayout(line_width=200, line_height=20)
lines = layout.layout_paragraph(paragraph)  # Returns List[Line]

# 3. Each Line contains concrete Text objects
for line in lines:
    for text_obj in line.text_objects:  # List[Text]
        print(f"Text: '{text_obj.text}' at position {text_obj._origin}")
```

## Key Architectural Principles

### 1. **Single Responsibility**
- Abstract classes: Handle content and structure
- Concrete classes: Handle rendering and layout

### 2. **Separation of Concerns**
- Text parsing/processing ≠ Text rendering
- Document structure ≠ Page layout
- Content semantics ≠ Visual presentation

### 3. **Immutable Abstract Content**
- Abstract content remains unchanged during rendering
- Multiple concrete representations can be generated from same abstract content
- Enables pagination, different formats, responsive layouts

### 4. **One-to-Many Relationships**
- One Abstract Word → Multiple Concrete Text objects (hyphenation)
- One Abstract Paragraph → Multiple Concrete Lines
- One Abstract Document → Multiple Concrete Pages

## Common Anti-Patterns to Avoid

### ❌ **Mixing Concerns**

```python
# WRONG: Abstract class knowing about pixels
class Word:
    def __init__(self, text):
        self.text = text
        self.rendered_width = None  # ❌ Concrete concern in abstract class
```

### ❌ **renderable_words Concept**

```python
# WRONG: Confusing abstract and concrete
line.renderable_words  # ❌ This suggests Words are renderable
# Words are abstract - only Text objects render
```

### ✅ **Correct Separation**

```python
# CORRECT: Clear separation
abstract_word = Word("test")  # Abstract content
concrete_text = Text("test", font)  # Concrete rendering
line.text_objects.append(concrete_text)  # Concrete objects in concrete container
```

## Benefits of This Architecture

### 1. **Flexibility**
- Same content can be rendered at different sizes
- Multiple output formats from single source
- Easy to implement responsive design

### 2. **Testability**
- Abstract logic can be tested without rendering
- Layout algorithms can be tested independently
- Visual rendering can be mocked

### 3. **Performance**
- Abstract content can be cached and reused
- Layout can be computed once for multiple renderings
- Incremental updates possible

### 4. **Maintainability**
- Clear boundaries between text processing and rendering
- Changes to rendering don't affect content parsing
- Easy to swap rendering backends

## File Organization

```
pyWebLayout/
├── abstract/                 # Content and structure
│   ├── block.py              # Document blocks (Paragraph, Heading, etc.)
│   ├── inline.py             # Inline content (Word, etc.)
│   ├── document.py           # Document structure
│   └── functional.py         # Links, buttons, etc.
│
├── concrete/                 # Rendering and layout
│   ├── text.py               # Text and Line rendering
│   ├── page.py               # Page layout and containers
│   ├── box.py                # Base rendering classes
│   ├── image.py              # Image rendering
│   └── functional.py         # Interactive elements
│
├── typesetting/              # Layout algorithms
│   ├── paragraph_layout.py   # Abstract → Concrete transformation
│   ├── flow.py               # Text flow management
│   └── pagination.py         # Page breaking logic
│
└── style/                    # Styling and formatting
    ├── fonts.py              # Font management
    ├── layout.py             # Layout constants
    └── alignment.py          # Alignment enums
```

## Conclusion

The Abstract/Concrete separation is fundamental to pyWebLayout's design. It ensures clean separation between content processing and visual rendering, enabling flexible, maintainable, and testable document processing pipelines.

**Remember**:
- **Abstract** = What to display (content, structure, semantics)
- **Concrete** = How to display it (pixels, fonts, positioning, rendering)

This architecture enables the library to handle complex document layouts while maintaining clear, understandable code organization.
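
To make the Abstract → Concrete transformation and the one-to-many principle tangible, here is a minimal, self-contained sketch. It does **not** use pyWebLayout's real classes: `AbstractWord`, `ConcreteText`, and `layout_words` are hypothetical stand-ins, and the fixed `char_width` measurement is an assumption that replaces real font metrics. It only illustrates how the same immutable abstract content can yield different concrete layouts.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AbstractWord:
    """Hypothetical stand-in for an abstract word: text only, no pixel data."""
    text: str


@dataclass
class ConcreteText:
    """Hypothetical stand-in for a concrete fragment: knows position and width."""
    text: str
    origin: Tuple[int, int]
    width: int


def layout_words(
    words: List[AbstractWord],
    line_width: int,
    char_width: int = 10,   # assumed fixed-width "font", not real font metrics
    line_height: int = 20,
    space: int = 10,
) -> List[List[ConcreteText]]:
    """Greedily place words on lines; the abstract words are never modified."""
    lines: List[List[ConcreteText]] = []
    current: List[ConcreteText] = []
    x, y = 0, 0
    for word in words:
        width = len(word.text) * char_width
        # Start a new line when the word would overflow the current one.
        if current and x + width > line_width:
            lines.append(current)
            current = []
            x = 0
            y += line_height
        current.append(ConcreteText(word.text, (x, y), width))
        x += width + space
    if current:
        lines.append(current)
    return lines


# One abstract paragraph, two concrete layouts.
words = [AbstractWord(t) for t in "This is a test".split()]
narrow = layout_words(words, line_width=80)
wide = layout_words(words, line_width=200)

for line in narrow:
    print([(t.text, t.origin) for t in line])
```

With these assumed measurements, the narrow layout wraps after the second word while the wide layout fits everything on one line, and the `AbstractWord` objects are identical in both cases.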