dtourolle/pyWebLayout

Duncan Tourolle 65ab46556f

big update with ok rendering

2025-08-27 22:22:54 +02:00

Recursive Position System

A flexible, hierarchical position tracking system for dynamic content positioning in document layout applications.

Overview

The Recursive Position System tracks positions within complex, nested document structures. Unlike flat position systems that track only basic coordinates, it can reference any type of content (words, images, table cells, list items, and so on) together with its full hierarchical context.

Key Features

Hierarchical Position Tracking: Navigate through nested document structures with precision
Dynamic Content Type Support: Handle words, images, tables, lists, forms, and more
Flexible Serialization: Save positions as JSON or Python shelf objects
Position Relationships: Query ancestor/descendant relationships between positions
Fluent Builder Pattern: Easy position creation with method chaining
Metadata Support: Store rendering context (font scale, themes, etc.)
Real-world Applications: Suited to ereaders, document editors, and content management systems

Architecture

Core Components

ContentType Enum: Defines all supported content types
LocationNode: Represents a single position within a content type
RecursivePosition: Hierarchical position with a path of LocationNodes
PositionBuilder: Fluent interface for creating positions
PositionStorage: Persistent storage with JSON and shelf support
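
To make the relationship between these components concrete, here is a minimal, self-contained sketch of how a node and a path of nodes could be modeled. It is not the library's actual implementation: the enum lists only a few members, and the field and method names `content_type`, `offset`, `path`, and `add_node` are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List, Optional


class ContentType(Enum):
    """A few content types only; the string values are assumed."""
    DOCUMENT = "document"
    CHAPTER = "chapter"
    BLOCK = "block"
    PARAGRAPH = "paragraph"
    WORD = "word"


@dataclass
class LocationNode:
    """One step in a position path: a content type plus an index within it."""
    content_type: ContentType
    index: int = 0
    offset: int = 0  # e.g. a character offset inside a word
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class RecursivePosition:
    """A path of LocationNodes that always starts at the document root."""
    path: List[LocationNode] = field(
        default_factory=lambda: [LocationNode(ContentType.DOCUMENT)]
    )

    def add_node(self, node: LocationNode) -> "RecursivePosition":
        """Append a node; returning self allows method chaining."""
        self.path.append(node)
        return self

    def get_node(self, content_type: ContentType) -> Optional[LocationNode]:
        """Return the first node of the given type, or None if absent."""
        for node in self.path:
            if node.content_type is content_type:
                return node
        return None


pos = (RecursivePosition()
       .add_node(LocationNode(ContentType.CHAPTER, 2))
       .add_node(LocationNode(ContentType.BLOCK, 5))
       .add_node(LocationNode(ContentType.WORD, 12, offset=3)))
```

A builder such as PositionBuilder can then be thought of as a thin layer that calls something like `add_node` once per chained method.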

Position Hierarchy

Positions are represented as paths from document root to specific locations:

Document → Chapter[2] → Block[5] → Paragraph → Word[12] → Character[3]
Document → Chapter[1] → Block[3] → Table → Row[2] → Cell[1] → Word[0]
Document → Chapter[0] → Block[1] → Image
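
A path like the ones above can be rendered as text by joining one label per node. The sketch below uses plain tuples as a stand-in for LocationNode and assumes the `name[index]+offset` notation that the printed examples in this document use; the helper name `format_path` is hypothetical.

```python
from typing import List, Tuple

# Each step is (type_name, index, offset): a plain-tuple stand-in for a node.
Step = Tuple[str, int, int]


def format_path(path: List[Step]) -> str:
    """Render a path as 'document[0] -> chapter[2] -> ...'."""
    parts = []
    for name, index, offset in path:
        text = f"{name}[{index}]"
        if offset:
            # Only show the offset when it is non-zero.
            text += f"+{offset}"
        parts.append(text)
    return " -> ".join(parts)


path = [("document", 0, 0), ("chapter", 2, 0), ("block", 5, 0),
        ("paragraph", 0, 0), ("word", 12, 3)]
print(format_path(path))
# document[0] -> chapter[2] -> block[5] -> paragraph[0] -> word[12]+3
```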

Usage Examples

Basic Position Creation

from pyWebLayout.layout.recursive_position import PositionBuilder

# Create a word position with character-level precision
position = (PositionBuilder()
            .chapter(2)
            .block(5)
            .paragraph()
            .word(12, offset=3)
            .with_rendering_metadata(font_scale=1.5, theme="dark")
            .build())

print(position)  # document[0] -> chapter[2] -> block[5] -> paragraph[0] -> word[12]+3

Different Content Types

from pyWebLayout.layout.recursive_position import (
    create_word_position, create_image_position, 
    create_table_cell_position, create_list_item_position
)

# Word in a paragraph
word_pos = create_word_position(chapter=1, block=3, word=15, char_offset=2)

# Image in a block
image_pos = create_image_position(chapter=2, block=1, image_index=0)

# Cell in a table
table_pos = create_table_cell_position(chapter=0, block=4, row=2, col=1, word=5)

# Item in a list
list_pos = create_list_item_position(chapter=1, block=2, item=3, word=0)

Complex Nested Structures

# Position in a nested list
nested_pos = (PositionBuilder()
              .chapter(2)
              .block(5)
              .list(0, list_type="ordered")
              .list_item(2)
              .list(1, list_type="unordered")  # Nested list
              .list_item(1)
              .word(3)
              .build())

# Position in a table cell with metadata
table_pos = (PositionBuilder()
             .chapter(3)
             .block(10)
             .table(0, table_type="financial", columns=5)
             .table_row(2, row_type="data")
             .table_cell(1, cell_type="currency", format="USD")
             .word(0, text="$1,234.56")
             .build())

Position Relationships

# Check ancestor/descendant relationships
chapter_pos = PositionBuilder().chapter(1).block(2).build()
word_pos = PositionBuilder().chapter(1).block(2).paragraph().word(5).build()

print(chapter_pos.is_ancestor_of(word_pos))  # True
print(word_pos.is_descendant_of(chapter_pos))  # True

# Find common ancestors
other_pos = create_word_position(1, 3, 0)  # Different block
common = word_pos.get_common_ancestor(other_pos)
print(common)  # document[0] -> chapter[1]
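
Because a position is a path from the root, these relationship queries reduce to prefix comparisons. The following self-contained sketch shows the idea with plain tuples; it is an illustration rather than the library's code, and it assumes that a position does not count as its own ancestor.

```python
from typing import List, Tuple

Node = Tuple[str, int]  # (type_name, index): a stand-in for LocationNode


def common_prefix(a: List[Node], b: List[Node]) -> List[Node]:
    """Longest shared leading run of two paths, i.e. their common ancestor."""
    shared = []
    for node_a, node_b in zip(a, b):
        if node_a != node_b:
            break
        shared.append(node_a)
    return shared


def is_ancestor_of(a: List[Node], b: List[Node]) -> bool:
    """True when path a is a strict prefix of path b (assumed semantics)."""
    return len(a) < len(b) and b[:len(a)] == a


block = [("document", 0), ("chapter", 1), ("block", 2)]
word = block + [("paragraph", 0), ("word", 5)]
other = [("document", 0), ("chapter", 1), ("block", 3),
         ("paragraph", 0), ("word", 0)]

print(is_ancestor_of(block, word))  # True
print(is_ancestor_of(word, block))  # False
print(common_prefix(word, other))   # [('document', 0), ('chapter', 1)]
```

Both functions stop at the end of the shorter path, so the work is bounded by the smaller of the two depths.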

Serialization and Storage

from pyWebLayout.layout.recursive_position import PositionStorage

# JSON storage
storage = PositionStorage("bookmarks", use_shelf=False)

# Save positions
storage.save_position("my_document", "bookmark1", position)
storage.save_position("my_document", "bookmark2", other_pos)

# Load positions
loaded = storage.load_position("my_document", "bookmark1")
all_bookmarks = storage.list_positions("my_document")

# Shelf storage (binary, more efficient for large datasets)
shelf_storage = PositionStorage("bookmarks", use_shelf=True)
shelf_storage.save_position("my_document", "bookmark1", position)
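
For readers unfamiliar with the two backends, the sketch below shows the underlying standard-library mechanisms (`json` and `shelve`) on a position reduced to plain data. The dictionary layout and file names are assumptions for illustration; the format PositionStorage actually writes may differ.

```python
import json
import os
import shelve
import tempfile

# A position reduced to plain data; the real serialized layout may differ.
position = {
    "path": [
        {"type": "document", "index": 0},
        {"type": "chapter", "index": 2},
        {"type": "block", "index": 5},
        {"type": "word", "index": 12, "offset": 3},
    ],
    "rendering_metadata": {"font_scale": 1.5, "theme": "dark"},
}

with tempfile.TemporaryDirectory() as tmp:
    # JSON: one human-readable file holding the bookmarks of a document.
    json_path = os.path.join(tmp, "my_document.json")
    with open(json_path, "w", encoding="utf-8") as fh:
        json.dump({"bookmark1": position}, fh, indent=2)
    with open(json_path, encoding="utf-8") as fh:
        from_json = json.load(fh)["bookmark1"]

    # shelve: a pickle-backed key-value store, readable only from Python.
    shelf_path = os.path.join(tmp, "my_document_shelf")
    with shelve.open(shelf_path) as shelf:
        shelf["bookmark1"] = position
    with shelve.open(shelf_path) as shelf:
        from_shelf = shelf["bookmark1"]

print(from_json == position, from_shelf == position)  # True True
```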

Content Types

The system supports the following content types:

Type	Description	Example Usage
`DOCUMENT`	Document root	Always present as root node
`CHAPTER`	Document chapters/sections	Chapter navigation
`BLOCK`	Block-level elements	Paragraphs, headings, tables
`PARAGRAPH`	Text paragraphs	Text content
`HEADING`	Section headings	H1-H6 elements
`TABLE`	Table structures	Data tables
`TABLE_ROW`	Table rows	Row navigation
`TABLE_CELL`	Table cells	Cell-specific content
`LIST`	List structures	Ordered/unordered lists
`LIST_ITEM`	List items	Individual list entries
`WORD`	Individual words	Word-level precision
`IMAGE`	Images	Visual content
`LINK`	Hyperlinks	Interactive links
`BUTTON`	Interactive buttons	Form controls
`FORM_FIELD`	Form input fields	User input
`LINE`	Rendered text lines	Layout-specific
`PAGE`	Rendered pages	Pagination

Ereader Integration

The system is designed for ereader applications with features like:

Bookmark Management

# Save reading position with context
reading_pos = (PositionBuilder()
               .chapter(3)
               .block(15)
               .paragraph()
               .word(23, offset=7)
               .with_rendering_metadata(
                   font_scale=1.2,
                   page_size=[600, 800],
                   theme="sepia"
               )
               .build())

storage.save_position("novel", "chapter3_climax", reading_pos)

Chapter Navigation

from pyWebLayout.layout.recursive_position import ContentType, PositionBuilder

# Jump to chapter start
chapter_start = PositionBuilder().chapter(5).block(0).paragraph().word(0).build()

# Navigate within chapter
current_pos = PositionBuilder().chapter(5).block(12).paragraph().word(45).build()

# Check if positions are in same chapter: the common ancestor only
# contains a chapter node when both positions share that chapter
same_chapter = chapter_start.get_common_ancestor(current_pos)
chapter_node = same_chapter.get_node(ContentType.CHAPTER)
if chapter_node:
    print(f"Both in chapter {chapter_node.index}")

Font Scaling Support

from pyWebLayout.layout.recursive_position import PositionBuilder, RecursivePosition

# Position with rendering metadata
position = (PositionBuilder()
            .chapter(2)
            .block(8)
            .paragraph()
            .word(15)
            .with_rendering_metadata(
                font_scale=1.5,
                page_size=[800, 600],
                line_height=24,
                theme="dark"
            )
            .build())

# Metadata persists through serialization
json_str = position.to_json()
restored = RecursivePosition.from_json(json_str)
print(restored.rendering_metadata["font_scale"])  # 1.5

Advanced Features

Position Navigation

# Truncate position to specific level
word_pos = create_word_position(2, 5, 12, 3)
block_pos = word_pos.copy().truncate_to_type(ContentType.BLOCK)
print(block_pos)  # document[0] -> chapter[2] -> block[5]

# Navigate between related positions
table_cell_pos = create_table_cell_position(1, 3, 2, 1, 0)
next_cell_pos = table_cell_pos.copy()
cell_node = next_cell_pos.get_node(ContentType.TABLE_CELL)
cell_node.index = 2  # Move to next column
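
Truncation can be pictured as cutting the path just after a node of the requested type. The sketch below is a hypothetical stand-alone version using plain tuples; it cuts at the first matching node, which is an assumption (with nested lists, for example, the library may choose a different match).

```python
from typing import List, Tuple

Node = Tuple[str, int]  # (type_name, index): a stand-in for LocationNode


def truncate_to_type(path: List[Node], type_name: str) -> List[Node]:
    """Return a new path ending at the first node of the given type."""
    for i, (name, _index) in enumerate(path):
        if name == type_name:
            return path[:i + 1]
    # Type not present: return an unchanged copy.
    return list(path)


word_path = [("document", 0), ("chapter", 2), ("block", 5),
             ("paragraph", 0), ("word", 12)]
print(truncate_to_type(word_path, "block"))
# [('document', 0), ('chapter', 2), ('block', 5)]
```

Returning a new list leaves the original path untouched, which mirrors the `copy()` call in the example above.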

Metadata Usage

# Rich metadata support
position = (PositionBuilder()
            .chapter(1)
            .block(5)
            .table(0, 
                   table_type="financial",
                   columns=5,
                   rows=20,
                   title="Q3 Results")
            .table_row(3, 
                      row_type="data",
                      category="revenue")
            .table_cell(2,
                       cell_type="currency",
                       format="USD",
                       precision=2)
            .word(0, text="$1,234,567.89")
            .build())

# Access metadata
table_node = position.get_node(ContentType.TABLE)
print(table_node.metadata["title"])  # "Q3 Results"

cell_node = position.get_node(ContentType.TABLE_CELL)
print(cell_node.metadata["format"])  # "USD"

Performance Considerations

Memory Usage

Positions are lightweight (typically < 1 KB serialized)
Path-based structure minimizes memory overhead
Metadata is optional and only stored when needed

Serialization Performance

JSON: Human-readable, cross-platform, ~2-3x larger than the shelf format
Shelf: Binary format, faster for large datasets, Python-specific
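
Serialized size depends on path depth and on how much metadata is attached, so it is worth measuring on representative data. The sketch below compares JSON with `pickle` (the format `shelve` uses for values) on a sample position stored as plain data; the dictionary layout is an assumption, and no particular size ratio is implied.

```python
import json
import pickle

# A sample position as plain data; the real serialized layout may differ.
position = {
    "path": [
        {"type": "document", "index": 0},
        {"type": "chapter", "index": 2},
        {"type": "block", "index": 5},
        {"type": "paragraph", "index": 0},
        {"type": "word", "index": 12, "offset": 3},
    ],
    "rendering_metadata": {"font_scale": 1.5, "theme": "dark"},
}

json_bytes = json.dumps(position).encode("utf-8")
pickle_bytes = pickle.dumps(position, protocol=pickle.HIGHEST_PROTOCOL)

print(f"JSON:   {len(json_bytes)} bytes")
print(f"pickle: {len(pickle_bytes)} bytes")
```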

Comparison Operations

Position equality: O(n) where n is path depth
Ancestor/descendant checks: O(min(depth1, depth2))
Common ancestor finding: O(min(depth1, depth2))

Integration with Existing Systems

Backward Compatibility

The system can coexist with existing position tracking:

# Convert from old RenderingPosition
def convert_old_position(old_pos):
    return (PositionBuilder()
            .chapter(old_pos.chapter_index)
            .block(old_pos.block_index)
            .paragraph()
            .word(old_pos.word_index)
            .build())

# Convert to old format (lossy)
def convert_to_old(recursive_pos):
    chapter_node = recursive_pos.get_node(ContentType.CHAPTER)
    block_node = recursive_pos.get_node(ContentType.BLOCK)
    word_node = recursive_pos.get_node(ContentType.WORD)
    
    return RenderingPosition(
        chapter_index=chapter_node.index if chapter_node else 0,
        block_index=block_node.index if block_node else 0,
        word_index=word_node.index if word_node else 0
    )

Migration Strategy

Phase 1: Implement recursive system alongside existing system
Phase 2: Update bookmark storage to use new format
Phase 3: Migrate existing bookmarks
Phase 4: Update layout engines to generate recursive positions
Phase 5: Remove old position system
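
As an illustration of Phase 3, the sketch below converts flat bookmark records into path-based ones. It works on plain dictionaries so that it runs on its own; the key names `chapter_index`, `block_index`, and `word_index` follow the conversion functions above, while the output layout and the helper name `migrate_bookmark` are assumptions.

```python
from typing import Dict, List


def migrate_bookmark(old: Dict[str, int]) -> Dict[str, List[dict]]:
    """Convert one flat bookmark into a path-based record."""
    return {
        "path": [
            {"type": "document", "index": 0},
            {"type": "chapter", "index": old.get("chapter_index", 0)},
            {"type": "block", "index": old.get("block_index", 0)},
            {"type": "paragraph", "index": 0},
            # Missing fields fall back to 0, matching the lossy converter.
            {"type": "word", "index": old.get("word_index", 0)},
        ]
    }


old_bookmarks = {
    "bookmark1": {"chapter_index": 3, "block_index": 15, "word_index": 23},
    "bookmark2": {"chapter_index": 0, "block_index": 1},
}
new_bookmarks = {name: migrate_bookmark(b) for name, b in old_bookmarks.items()}
```

Keeping the old records until the converted ones have been verified makes the migration reversible.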

Testing

The test suite covers:

Position creation and manipulation
Serialization/deserialization
Storage systems (JSON and shelf)
Position relationships
Real-world scenarios
Performance benchmarks

Run tests with:

python -m pytest tests/layout/test_recursive_position.py -v

Examples

See examples/recursive_position_demo.py for a complete demonstration of all features.

Future Enhancements

Potential improvements:

Position Comparison: Implement <, >, <=, >= operators for sorting
Path Compression: Optimize storage for deep hierarchies
Query Language: SQL-like queries for position sets
Indexing: B-tree indexing for large position collections
Diff Operations: Calculate differences between positions
Batch Operations: Efficient bulk position updates
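
As a sketch of the first item, ordering could be derived from lexicographic comparison of the index path. The class below is hypothetical and deliberately ignores content types, which a real implementation would probably need to consider when sibling nodes have different types.

```python
from dataclasses import dataclass
from functools import total_ordering
from typing import Tuple


@total_ordering
@dataclass(frozen=True)
class OrderedPosition:
    """Hypothetical position reduced to its index path, e.g. (0, 2, 5, 0, 12)."""
    indices: Tuple[int, ...]

    def __lt__(self, other: "OrderedPosition") -> bool:
        if not isinstance(other, OrderedPosition):
            return NotImplemented
        # Tuple comparison is lexicographic, so an ancestor (a shorter
        # prefix) sorts before its descendants.
        return self.indices < other.indices


bookmarks = [
    OrderedPosition((0, 2, 5, 0, 12)),
    OrderedPosition((0, 1, 3)),
    OrderedPosition((0, 2, 5)),
]
print([p.indices for p in sorted(bookmarks)])
# [(0, 1, 3), (0, 2, 5), (0, 2, 5, 0, 12)]
```

`functools.total_ordering` fills in `<=`, `>`, and `>=` from `__lt__` and the `__eq__` that the dataclass generates.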

Conclusion

The Recursive Position System provides a robust, flexible foundation for position tracking in complex document structures. Its hierarchical approach, rich metadata support, and efficient serialization make it ideal for modern ereader applications and document management systems.

The system's design prioritizes:

Flexibility: Handle any content type or nesting level
Performance: Efficient operations and minimal memory usage
Usability: Intuitive builder pattern and clear APIs
Persistence: Reliable serialization and storage options
Extensibility: Easy to add new content types and features

This makes it a significant improvement over traditional flat position systems and provides a solid foundation for advanced document navigation features.
