# Recursive Position System

A flexible, hierarchical position tracking system for dynamic content positioning in document layout applications.

## Overview

The Recursive Position System tracks positions within complex, nested document structures. Unlike traditional flat position systems that only track basic coordinates, it can reference any type of content (words, images, table cells, list items, etc.) with full hierarchical context.

## Key Features

- **Hierarchical Position Tracking**: Navigate through nested document structures with precision
- **Dynamic Content Type Support**: Handle words, images, tables, lists, forms, and more
- **Flexible Serialization**: Save positions as JSON or Python shelf objects
- **Position Relationships**: Query ancestor/descendant relationships between positions
- **Fluent Builder Pattern**: Easy position creation with method chaining
- **Metadata Support**: Store rendering context (font scale, themes, etc.)
- **Real-world Applications**: Suited to ereaders, document editors, and content management systems

## Architecture

### Core Components

1. **ContentType Enum**: Defines all supported content types
2. **LocationNode**: Represents a single position within a content type
3. **RecursivePosition**: Hierarchical position with a path of LocationNodes
4. **PositionBuilder**: Fluent interface for creating positions
5. **PositionStorage**: Persistent storage with JSON and shelf support

### Position Hierarchy

Positions are represented as paths from the document root to specific locations:

```
Document → Chapter[2] → Block[5] → Paragraph → Word[12] → Character[3]
Document → Chapter[1] → Block[3] → Table → Row[2] → Cell[1] → Word[0]
Document → Chapter[0] → Block[1] → Image
```

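To make the path idea concrete, here is a minimal, self-contained sketch of how such a data model could look. The classes below are stand-ins written for illustration and are not the pyWebLayout implementation; the field names (`index`, `offset`, `metadata`, `path`) and the reduced `ContentType` subset are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List


class ContentType(Enum):
    """A small subset of content types, for illustration only."""
    DOCUMENT = "document"
    CHAPTER = "chapter"
    BLOCK = "block"
    PARAGRAPH = "paragraph"
    WORD = "word"


@dataclass
class LocationNode:
    """One step in the path: a content type plus an index within its parent."""
    content_type: ContentType
    index: int = 0
    offset: int = 0  # assumed: e.g. a character offset inside a word
    metadata: Dict[str, Any] = field(default_factory=dict)

    def __str__(self) -> str:
        text = f"{self.content_type.value}[{self.index}]"
        return f"{text}+{self.offset}" if self.offset else text


@dataclass
class RecursivePosition:
    """A position is the ordered path of nodes from the document root."""
    path: List[LocationNode] = field(default_factory=list)

    def __str__(self) -> str:
        return " -> ".join(str(node) for node in self.path)


position = RecursivePosition([
    LocationNode(ContentType.DOCUMENT),
    LocationNode(ContentType.CHAPTER, 2),
    LocationNode(ContentType.BLOCK, 5),
    LocationNode(ContentType.PARAGRAPH),
    LocationNode(ContentType.WORD, 12, offset=3),
])
print(position)
# document[0] -> chapter[2] -> block[5] -> paragraph[0] -> word[12]+3
```

Storing the path as a flat list keeps a position cheap to copy, compare, and serialize, while each node can still carry its own metadata.
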
## Usage Examples

### Basic Position Creation

```python
from pyWebLayout.layout.recursive_position import PositionBuilder

# Create a word position with character-level precision
position = (PositionBuilder()
    .chapter(2)
    .block(5)
    .paragraph()
    .word(12, offset=3)
    .with_rendering_metadata(font_scale=1.5, theme="dark")
    .build())

print(position)  # document[0] -> chapter[2] -> block[5] -> paragraph[0] -> word[12]+3
```

### Different Content Types

```python
from pyWebLayout.layout.recursive_position import (
    create_word_position, create_image_position,
    create_table_cell_position, create_list_item_position
)

# Word in a paragraph
word_pos = create_word_position(chapter=1, block=3, word=15, char_offset=2)

# Image in a block
image_pos = create_image_position(chapter=2, block=1, image_index=0)

# Cell in a table
table_pos = create_table_cell_position(chapter=0, block=4, row=2, col=1, word=5)

# Item in a list
list_pos = create_list_item_position(chapter=1, block=2, item=3, word=0)
```

### Complex Nested Structures

```python
from pyWebLayout.layout.recursive_position import PositionBuilder

# Position in a nested list
nested_pos = (PositionBuilder()
    .chapter(2)
    .block(5)
    .list(0, list_type="ordered")
    .list_item(2)
    .list(1, list_type="unordered")  # Nested list
    .list_item(1)
    .word(3)
    .build())

# Position in a table cell with metadata
table_pos = (PositionBuilder()
    .chapter(3)
    .block(10)
    .table(0, table_type="financial", columns=5)
    .table_row(2, row_type="data")
    .table_cell(1, cell_type="currency", format="USD")
    .word(0, text="$1,234.56")
    .build())
```

### Position Relationships

```python
from pyWebLayout.layout.recursive_position import PositionBuilder, create_word_position

# Check ancestor/descendant relationships
block_pos = PositionBuilder().chapter(1).block(2).build()
word_pos = PositionBuilder().chapter(1).block(2).paragraph().word(5).build()

print(block_pos.is_ancestor_of(word_pos))    # True
print(word_pos.is_descendant_of(block_pos))  # True

# Find common ancestors
other_pos = create_word_position(chapter=1, block=3, word=0)  # Same chapter, different block
common = word_pos.get_common_ancestor(other_pos)
print(common)  # document[0] -> chapter[1]
```

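Relationship queries like these reduce to prefix comparisons on the two paths. The sketch below illustrates the idea on plain lists of `(content_type, index)` tuples. It is not the library's code: the tuple representation and both helper functions are hypothetical, and whether the real `is_ancestor_of` treats a position as its own ancestor is not specified here, so this version assumes a strict prefix.

```python
from typing import List, Tuple

# Hypothetical stand-in: a position is a list of (content_type, index) pairs.
Path = List[Tuple[str, int]]


def is_ancestor_of(ancestor: Path, other: Path) -> bool:
    """True if `ancestor` is a strict prefix of `other`."""
    return len(ancestor) < len(other) and other[:len(ancestor)] == ancestor


def common_ancestor(a: Path, b: Path) -> Path:
    """Return the longest shared prefix of two paths."""
    shared: Path = []
    for node_a, node_b in zip(a, b):
        if node_a != node_b:
            break
        shared.append(node_a)
    return shared


block = [("document", 0), ("chapter", 1), ("block", 2)]
word = block + [("paragraph", 0), ("word", 5)]
other = [("document", 0), ("chapter", 1), ("block", 3), ("paragraph", 0), ("word", 0)]

print(is_ancestor_of(block, word))   # True
print(common_ancestor(word, other))  # [('document', 0), ('chapter', 1)]
```

Both helpers stop at the end of the shorter path, so their cost grows with the smaller of the two depths.
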
### Serialization and Storage

```python
from pyWebLayout.layout.recursive_position import PositionStorage

# JSON storage
storage = PositionStorage("bookmarks", use_shelf=False)

# Save positions (`position` and `other_position` are positions built as in the examples above)
storage.save_position("my_document", "bookmark1", position)
storage.save_position("my_document", "bookmark2", other_position)

# Load positions
loaded = storage.load_position("my_document", "bookmark1")
all_bookmarks = storage.list_positions("my_document")

# Shelf storage (binary, more efficient for large datasets)
shelf_storage = PositionStorage("bookmarks", use_shelf=True)
shelf_storage.save_position("my_document", "bookmark1", position)
```

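For readers unfamiliar with the two backends, the following sketch shows the underlying standard-library mechanisms (`json` and `shelve`) on a plain dictionary. The dictionary layout and file names are assumptions made for illustration; the actual on-disk format used by `PositionStorage` may differ.

```python
import json
import os
import shelve
import tempfile

# Hypothetical serialized form of a position: a path of nodes plus metadata.
position = {
    "path": [
        {"type": "document", "index": 0},
        {"type": "chapter", "index": 2},
        {"type": "word", "index": 12, "offset": 3},
    ],
    "rendering_metadata": {"font_scale": 1.5, "theme": "dark"},
}

with tempfile.TemporaryDirectory() as tmp:
    # JSON: one human-readable file per document, keyed by bookmark name.
    json_path = os.path.join(tmp, "my_document.json")
    with open(json_path, "w", encoding="utf-8") as fh:
        json.dump({"bookmark1": position}, fh, indent=2)
    with open(json_path, encoding="utf-8") as fh:
        from_json = json.load(fh)["bookmark1"]

    # shelve: a pickle-backed key-value store from the standard library.
    shelf_path = os.path.join(tmp, "my_document")
    with shelve.open(shelf_path) as db:
        db["bookmark1"] = position
    with shelve.open(shelf_path) as db:
        from_shelf = db["bookmark1"]

print(from_json == position, from_shelf == position)  # True True
```

JSON files can be inspected and edited by hand or read from other languages, whereas shelf files are tied to Python's pickle format.
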
## Content Types

The system supports the following content types:

| Type | Description | Example Usage |
|------|-------------|---------------|
| `DOCUMENT` | Document root | Always present as root node |
| `CHAPTER` | Document chapters/sections | Chapter navigation |
| `BLOCK` | Block-level elements | Paragraphs, headings, tables |
| `PARAGRAPH` | Text paragraphs | Text content |
| `HEADING` | Section headings | H1-H6 elements |
| `TABLE` | Table structures | Data tables |
| `TABLE_ROW` | Table rows | Row navigation |
| `TABLE_CELL` | Table cells | Cell-specific content |
| `LIST` | List structures | Ordered/unordered lists |
| `LIST_ITEM` | List items | Individual list entries |
| `WORD` | Individual words | Word-level precision |
| `IMAGE` | Images | Visual content |
| `LINK` | Hyperlinks | Interactive links |
| `BUTTON` | Interactive buttons | Form controls |
| `FORM_FIELD` | Form input fields | User input |
| `LINE` | Rendered text lines | Layout-specific |
| `PAGE` | Rendered pages | Pagination |

## Ereader Integration

The system is designed for ereader applications and supports features such as:

### Bookmark Management

```python
# Save reading position with context
reading_pos = (PositionBuilder()
    .chapter(3)
    .block(15)
    .paragraph()
    .word(23, offset=7)
    .with_rendering_metadata(
        font_scale=1.2,
        page_size=[600, 800],
        theme="sepia"
    )
    .build())

# `storage` is a PositionStorage instance (see Serialization and Storage)
storage.save_position("novel", "chapter3_climax", reading_pos)
```

### Chapter Navigation

```python
# Assumes ContentType is exported by the same module as PositionBuilder
from pyWebLayout.layout.recursive_position import ContentType, PositionBuilder

# Jump to chapter start
chapter_start = PositionBuilder().chapter(5).block(0).paragraph().word(0).build()

# Navigate within chapter
current_pos = PositionBuilder().chapter(5).block(12).paragraph().word(45).build()

# Check whether both positions are in the same chapter
common = chapter_start.get_common_ancestor(current_pos)
chapter_node = common.get_node(ContentType.CHAPTER)
if chapter_node:  # no chapter node means the positions are in different chapters
    print(f"Both in chapter {chapter_node.index}")
```

### Font Scaling Support

```python
# Assumes RecursivePosition is exported by the same module as PositionBuilder
from pyWebLayout.layout.recursive_position import PositionBuilder, RecursivePosition

# Position with rendering metadata
position = (PositionBuilder()
    .chapter(2)
    .block(8)
    .paragraph()
    .word(15)
    .with_rendering_metadata(
        font_scale=1.5,
        page_size=[800, 600],
        line_height=24,
        theme="dark"
    )
    .build())

# Metadata persists through serialization
json_str = position.to_json()
restored = RecursivePosition.from_json(json_str)
print(restored.rendering_metadata["font_scale"])  # 1.5
```

## Advanced Features

### Position Navigation

```python
# Assumes ContentType is exported by the same module as the helper functions
from pyWebLayout.layout.recursive_position import (
    ContentType, create_word_position, create_table_cell_position
)

# Truncate position to specific level
word_pos = create_word_position(chapter=2, block=5, word=12, char_offset=3)
block_pos = word_pos.copy().truncate_to_type(ContentType.BLOCK)
print(block_pos)  # document[0] -> chapter[2] -> block[5]

# Navigate between related positions
table_cell_pos = create_table_cell_position(chapter=1, block=3, row=2, col=1, word=0)
next_cell_pos = table_cell_pos.copy()
cell_node = next_cell_pos.get_node(ContentType.TABLE_CELL)
cell_node.index = 2  # Move to next column
```

### Metadata Usage

```python
# Rich metadata support
position = (PositionBuilder()
    .chapter(1)
    .block(5)
    .table(0,
           table_type="financial",
           columns=5,
           rows=20,
           title="Q3 Results")
    .table_row(3,
               row_type="data",
               category="revenue")
    .table_cell(2,
                cell_type="currency",
                format="USD",
                precision=2)
    .word(0, text="$1,234,567.89")
    .build())

# Access metadata
table_node = position.get_node(ContentType.TABLE)
print(table_node.metadata["title"])  # "Q3 Results"

cell_node = position.get_node(ContentType.TABLE_CELL)
print(cell_node.metadata["format"])  # "USD"
```

## Performance Considerations

### Memory Usage

- Positions are lightweight (typically < 1 KB serialized)
- Path-based structure minimizes memory overhead
- Metadata is optional and only stored when needed

### Serialization Performance

- **JSON**: Human-readable and cross-platform, but roughly 2-3x larger than the shelf format
- **Shelf**: Binary format, faster for large datasets, Python-specific

### Comparison Operations

- Position equality: O(n) where n is path depth
- Ancestor/descendant checks: O(min(depth1, depth2))
- Common ancestor finding: O(min(depth1, depth2))

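A quick way to sanity-check the size figure above is to encode a representative position and measure it. The sketch below does this for a plain-dictionary stand-in; the dictionary layout is an assumption and not necessarily what `to_json()` emits, so treat the result as an order-of-magnitude check only.

```python
import json

# Hypothetical five-level path with small rendering metadata, as plain dicts.
path = [
    {"type": "document", "index": 0},
    {"type": "chapter", "index": 2},
    {"type": "block", "index": 5},
    {"type": "paragraph", "index": 0},
    {"type": "word", "index": 12, "offset": 3},
]
record = {"path": path, "rendering_metadata": {"font_scale": 1.5, "theme": "dark"}}

# Encode to UTF-8 bytes so the length reflects what would be written to disk.
encoded = json.dumps(record).encode("utf-8")
print(len(encoded))  # size in bytes of the JSON form
```

Large per-node metadata (for example, stored text snippets) is the main factor that would push a position past the 1 KB mark.
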
## Integration with Existing Systems

### Backward Compatibility

The system can coexist with existing position tracking:

```python
# RenderingPosition is the existing (old) position class; import it from
# wherever your project defines it.

# Convert from old RenderingPosition
def convert_old_position(old_pos):
    return (PositionBuilder()
        .chapter(old_pos.chapter_index)
        .block(old_pos.block_index)
        .paragraph()
        .word(old_pos.word_index)
        .build())

# Convert to old format (lossy)
def convert_to_old(recursive_pos):
    chapter_node = recursive_pos.get_node(ContentType.CHAPTER)
    block_node = recursive_pos.get_node(ContentType.BLOCK)
    word_node = recursive_pos.get_node(ContentType.WORD)

    # Missing levels fall back to index 0
    return RenderingPosition(
        chapter_index=chapter_node.index if chapter_node else 0,
        block_index=block_node.index if block_node else 0,
        word_index=word_node.index if word_node else 0
    )
```

### Migration Strategy

1. **Phase 1**: Implement recursive system alongside existing system
2. **Phase 2**: Update bookmark storage to use new format
3. **Phase 3**: Migrate existing bookmarks
4. **Phase 4**: Update layout engines to generate recursive positions
5. **Phase 5**: Remove old position system

## Testing

The test suite covers:

- Position creation and manipulation
- Serialization/deserialization
- Storage systems (JSON and shelf)
- Position relationships
- Real-world scenarios
- Performance benchmarks

Run tests with:
```bash
python -m pytest tests/layout/test_recursive_position.py -v
```

## Examples

See `examples/recursive_position_demo.py` for a complete demonstration of all features.

## Future Enhancements

Potential improvements:

1. **Position Comparison**: Implement `<`, `>`, `<=`, `>=` operators for sorting
2. **Path Compression**: Optimize storage for deep hierarchies
3. **Query Language**: SQL-like queries for position sets
4. **Indexing**: B-tree indexing for large position collections
5. **Diff Operations**: Calculate differences between positions
6. **Batch Operations**: Efficient bulk position updates

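As an illustration of the first item, one possible approach to ordering is to compare the index paths lexicographically. The sketch below is a hypothetical design, not part of the current system: `OrderedPosition` is an invented wrapper that keeps only the indices and ignores content types, which a real implementation would also need to account for.

```python
from functools import total_ordering
from typing import Tuple


@total_ordering
class OrderedPosition:
    """Hypothetical wrapper that orders positions by their index path."""

    def __init__(self, *indices: int) -> None:
        # e.g. (chapter, block, word); a shorter path sorts before its descendants
        self.indices: Tuple[int, ...] = tuple(indices)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, OrderedPosition):
            return NotImplemented
        return self.indices == other.indices

    def __lt__(self, other: "OrderedPosition") -> bool:
        if not isinstance(other, OrderedPosition):
            return NotImplemented
        # Tuple comparison is lexicographic, which matches reading order.
        return self.indices < other.indices

    def __hash__(self) -> int:
        return hash(self.indices)

    def __repr__(self) -> str:
        return f"OrderedPosition{self.indices}"


bookmarks = [OrderedPosition(2, 5, 12), OrderedPosition(1, 3), OrderedPosition(1, 3, 0)]
print(sorted(bookmarks))
# [OrderedPosition(1, 3), OrderedPosition(1, 3, 0), OrderedPosition(2, 5, 12)]
```

`functools.total_ordering` derives `<=`, `>`, and `>=` from `__eq__` and `__lt__`, so only two comparison methods need to be written by hand.
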
## Conclusion

The Recursive Position System provides a robust, flexible foundation for position tracking in complex document structures. Its hierarchical approach, rich metadata support, and efficient serialization make it ideal for modern ereader applications and document management systems.

The system's design prioritizes:
- **Flexibility**: Handle any content type or nesting level
- **Performance**: Efficient operations and minimal memory usage
- **Usability**: Intuitive builder pattern and clear APIs
- **Persistence**: Reliable serialization and storage options
- **Extensibility**: Easy to add new content types and features

This makes it a significant improvement over traditional flat position systems and provides a solid foundation for advanced document navigation features.