# Recursive Position System

A flexible, hierarchical position tracking system for dynamic content positioning in document layout applications.

## Overview

The Recursive Position System tracks positions within complex, nested document structures. Unlike traditional flat position systems that only track basic coordinates, this system can reference any type of content (words, images, table cells, list items, etc.) with full hierarchical context.

## Key Features

- **Hierarchical Position Tracking**: Navigate through nested document structures with precision
- **Dynamic Content Type Support**: Handle words, images, tables, lists, forms, and more
- **Flexible Serialization**: Save positions as JSON or Python shelf objects
- **Position Relationships**: Query ancestor/descendant relationships between positions
- **Fluent Builder Pattern**: Easy position creation with method chaining
- **Metadata Support**: Store rendering context (font scale, themes, etc.)
- **Real-world Applications**: Suited to ereaders, document editors, and CMS systems

## Architecture

### Core Components

1. **ContentType Enum**: Defines all supported content types
2. **LocationNode**: Represents a single position within a content type
3. **RecursivePosition**: Hierarchical position with a path of LocationNodes
4. **PositionBuilder**: Fluent interface for creating positions
5. **PositionStorage**: Persistent storage with JSON and shelf support

### Position Hierarchy

Positions are represented as paths from the document root to specific locations:

```
Document → Chapter[2] → Block[5] → Paragraph → Word[12] → Character[3]
Document → Chapter[1] → Block[3] → Table → Row[2] → Cell[1] → Word[0]
Document → Chapter[0] → Block[1] → Image
```

## Usage Examples

### Basic Position Creation

```python
from pyWebLayout.layout.recursive_position import PositionBuilder

# Create a word position with character-level precision
position = (PositionBuilder()
    .chapter(2)
    .block(5)
    .paragraph()
    .word(12, offset=3)
    .with_rendering_metadata(font_scale=1.5, theme="dark")
    .build())

print(position)
# document[0] -> chapter[2] -> block[5] -> paragraph[0] -> word[12]+3
```

### Different Content Types

```python
from pyWebLayout.layout.recursive_position import (
    create_word_position,
    create_image_position,
    create_table_cell_position,
    create_list_item_position
)

# Word in a paragraph
word_pos = create_word_position(chapter=1, block=3, word=15, char_offset=2)

# Image in a block
image_pos = create_image_position(chapter=2, block=1, image_index=0)

# Cell in a table
table_pos = create_table_cell_position(chapter=0, block=4, row=2, col=1, word=5)

# Item in a list
list_pos = create_list_item_position(chapter=1, block=2, item=3, word=0)
```

### Complex Nested Structures

```python
# Position in a nested list
nested_pos = (PositionBuilder()
    .chapter(2)
    .block(5)
    .list(0, list_type="ordered")
    .list_item(2)
    .list(1, list_type="unordered")  # Nested list
    .list_item(1)
    .word(3)
    .build())

# Position in a table cell with metadata
table_pos = (PositionBuilder()
    .chapter(3)
    .block(10)
    .table(0, table_type="financial", columns=5)
    .table_row(2, row_type="data")
    .table_cell(1, cell_type="currency", format="USD")
    .word(0, text="$1,234.56")
    .build())
```

### Position Relationships

```python
# Check ancestor/descendant relationships
chapter_pos = PositionBuilder().chapter(1).block(2).build()
word_pos = PositionBuilder().chapter(1).block(2).paragraph().word(5).build()

print(chapter_pos.is_ancestor_of(word_pos))    # True
print(word_pos.is_descendant_of(chapter_pos))  # True

# Find common ancestors
other_pos = create_word_position(1, 3, 0)  # Different block
common = word_pos.get_common_ancestor(other_pos)
print(common)  # document[0] -> chapter[1]
```

### Serialization and Storage

```python
from pyWebLayout.layout.recursive_position import PositionStorage

# JSON storage
storage = PositionStorage("bookmarks", use_shelf=False)

# Save positions (position and other_pos come from the examples above)
storage.save_position("my_document", "bookmark1", position)
storage.save_position("my_document", "bookmark2", other_pos)

# Load positions
loaded = storage.load_position("my_document", "bookmark1")
all_bookmarks = storage.list_positions("my_document")

# Shelf storage (binary, more efficient for large datasets)
shelf_storage = PositionStorage("bookmarks", use_shelf=True)
shelf_storage.save_position("my_document", "bookmark1", position)
```

## Content Types

The system supports the following content types:

| Type | Description | Example Usage |
|------|-------------|---------------|
| `DOCUMENT` | Document root | Always present as root node |
| `CHAPTER` | Document chapters/sections | Chapter navigation |
| `BLOCK` | Block-level elements | Paragraphs, headings, tables |
| `PARAGRAPH` | Text paragraphs | Text content |
| `HEADING` | Section headings | H1-H6 elements |
| `TABLE` | Table structures | Data tables |
| `TABLE_ROW` | Table rows | Row navigation |
| `TABLE_CELL` | Table cells | Cell-specific content |
| `LIST` | List structures | Ordered/unordered lists |
| `LIST_ITEM` | List items | Individual list entries |
| `WORD` | Individual words | Word-level precision |
| `IMAGE` | Images | Visual content |
| `LINK` | Hyperlinks | Interactive links |
| `BUTTON` | Interactive buttons | Form controls |
| `FORM_FIELD` | Form input fields | User input |
| `LINE` | Rendered text lines | Layout-specific |
| `PAGE` | Rendered pages | Pagination |

## Ereader Integration

The system is designed for ereader applications with features like:

### Bookmark Management

```python
# Save reading position with context
reading_pos = (PositionBuilder()
    .chapter(3)
    .block(15)
    .paragraph()
    .word(23, offset=7)
    .with_rendering_metadata(
        font_scale=1.2,
        page_size=[600, 800],
        theme="sepia"
    )
    .build())

storage.save_position("novel", "chapter3_climax", reading_pos)
```

### Chapter Navigation

```python
# ContentType is assumed to be exported from the same module as PositionBuilder
from pyWebLayout.layout.recursive_position import ContentType

# Jump to chapter start
chapter_start = PositionBuilder().chapter(5).block(0).paragraph().word(0).build()

# Navigate within chapter
current_pos = PositionBuilder().chapter(5).block(12).paragraph().word(45).build()

# Check if positions are in same chapter: the common ancestor only
# contains a chapter node when both positions share that chapter
common_ancestor = chapter_start.get_common_ancestor(current_pos)
chapter_node = common_ancestor.get_node(ContentType.CHAPTER)
if chapter_node is not None:
    print(f"Both in chapter {chapter_node.index}")
```

### Font Scaling Support

```python
# RecursivePosition is assumed to be exported from the same module as PositionBuilder
from pyWebLayout.layout.recursive_position import RecursivePosition

# Position with rendering metadata
position = (PositionBuilder()
    .chapter(2)
    .block(8)
    .paragraph()
    .word(15)
    .with_rendering_metadata(
        font_scale=1.5,
        page_size=[800, 600],
        line_height=24,
        theme="dark"
    )
    .build())

# Metadata persists through serialization
json_str = position.to_json()
restored = RecursivePosition.from_json(json_str)
print(restored.rendering_metadata["font_scale"])  # 1.5
```

## Advanced Features

### Position Navigation

```python
# Truncate position to specific level
word_pos = create_word_position(2, 5, 12, 3)
block_pos = word_pos.copy().truncate_to_type(ContentType.BLOCK)
print(block_pos)  # document[0] -> chapter[2] -> block[5]

# Navigate between related positions
table_cell_pos = create_table_cell_position(1, 3, 2, 1, 0)
next_cell_pos = table_cell_pos.copy()
cell_node = next_cell_pos.get_node(ContentType.TABLE_CELL)
cell_node.index = 2  # Move to next column
```

### Metadata Usage

```python
# Rich metadata support
position = (PositionBuilder()
    .chapter(1)
    .block(5)
    .table(0, table_type="financial", columns=5, rows=20, title="Q3 Results")
    .table_row(3, row_type="data", category="revenue")
    .table_cell(2, cell_type="currency", format="USD", precision=2)
    .word(0, text="$1,234,567.89")
    .build())

# Access metadata
table_node = position.get_node(ContentType.TABLE)
print(table_node.metadata["title"])  # "Q3 Results"

cell_node = position.get_node(ContentType.TABLE_CELL)
print(cell_node.metadata["format"])  # "USD"
```

## Performance Considerations

### Memory Usage

- Positions are lightweight (typically < 1KB serialized)
- Path-based structure minimizes memory overhead
- Metadata is optional and only stored when needed

### Serialization Performance

- **JSON**: Human-readable, cross-platform, ~2-3x larger
- **Shelf**: Binary format, faster for large datasets, Python-specific

### Comparison Operations

- Position equality: O(n) where n is path depth
- Ancestor/descendant checks: O(min(depth1, depth2))
- Common ancestor finding: O(min(depth1, depth2))

## Integration with Existing Systems

### Backward Compatibility

The system can coexist with existing position tracking:

```python
# Convert from old RenderingPosition
def convert_old_position(old_pos):
    return (PositionBuilder()
        .chapter(old_pos.chapter_index)
        .block(old_pos.block_index)
        .paragraph()
        .word(old_pos.word_index)
        .build())

# Convert to old format (lossy)
def convert_to_old(recursive_pos):
    chapter_node = recursive_pos.get_node(ContentType.CHAPTER)
    block_node = recursive_pos.get_node(ContentType.BLOCK)
    word_node = recursive_pos.get_node(ContentType.WORD)

    return RenderingPosition(
        chapter_index=chapter_node.index if chapter_node else 0,
        block_index=block_node.index if block_node else 0,
        word_index=word_node.index if word_node else 0
    )
```

### Migration Strategy

1. **Phase 1**: Implement recursive system alongside existing system
2. **Phase 2**: Update bookmark storage to use new format
3. **Phase 3**: Migrate existing bookmarks
4. **Phase 4**: Update layout engines to generate recursive positions
5. **Phase 5**: Remove old position system

## Testing

The test suite covers:

- Position creation and manipulation
- Serialization/deserialization
- Storage systems (JSON and shelf)
- Position relationships
- Real-world scenarios
- Performance benchmarks

Run tests with:

```bash
python -m pytest tests/layout/test_recursive_position.py -v
```

## Examples

See `examples/recursive_position_demo.py` for a complete demonstration of all features.

## Future Enhancements

Potential improvements:

1. **Position Comparison**: Implement `<`, `>`, `<=`, `>=` operators for sorting
2. **Path Compression**: Optimize storage for deep hierarchies
3. **Query Language**: SQL-like queries for position sets
4. **Indexing**: B-tree indexing for large position collections
5. **Diff Operations**: Calculate differences between positions
6. **Batch Operations**: Efficient bulk position updates

## Conclusion

The Recursive Position System provides a robust, flexible foundation for position tracking in complex document structures. Its hierarchical approach, rich metadata support, and efficient serialization make it well suited to modern ereader applications and document management systems.

The system's design prioritizes:

- **Flexibility**: Handle any content type or nesting level
- **Performance**: Efficient operations and minimal memory usage
- **Usability**: Intuitive builder pattern and clear APIs
- **Persistence**: Reliable serialization and storage options
- **Extensibility**: Easy to add new content types and features

This makes it a significant improvement over traditional flat position systems and provides a solid foundation for advanced document navigation features.
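
## Appendix: Sketch of Position Ordering

The position comparison item listed under Future Enhancements could be approached roughly as follows. This is a minimal sketch, not part of the library: `Path` and `document_order_key` are hypothetical names, and a position is modelled as a plain list of `(content type name, index)` pairs rather than as a `RecursivePosition`. It assumes that comparing indices level by level reflects document order.

```python
from typing import List, Tuple

# Hypothetical stand-in for a position path: (content type name, index) pairs
# from the document root down, e.g. [("document", 0), ("chapter", 2)].
Path = List[Tuple[str, int]]


def document_order_key(path: Path, offset: int = 0) -> Tuple[Tuple[int, ...], int]:
    """Build a sort key that orders positions by index, level by level."""
    # Tuple comparison is lexicographic, so a shorter path that is a prefix
    # of a longer one (an ancestor) sorts before its descendants.
    return (tuple(index for _, index in path), offset)


# Illustrative bookmarks: name -> (path, character offset)
bookmarks = {
    "late": ([("document", 0), ("chapter", 2), ("block", 5),
              ("paragraph", 0), ("word", 12)], 3),
    "early": ([("document", 0), ("chapter", 1), ("block", 3),
               ("paragraph", 0), ("word", 15)], 2),
    "block": ([("document", 0), ("chapter", 2), ("block", 5)], 0),
}

ordered = sorted(bookmarks, key=lambda name: document_order_key(*bookmarks[name]))
print(ordered)  # ['early', 'block', 'late']
```

The key ignores content type names, so two sibling nodes of different types at the same level are ordered by index alone. Rich comparison operators on a position class could delegate to a key like this one.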