# System Architecture Overview

## Introduction

AI4DRPM follows a **layered architecture** pattern with clear separation of concerns.

## Architectural Layers

```mermaid
graph TB
    subgraph "Web Layer"
        A[FastAPI REST API]
        B[API Routers]
        C[Middleware]
    end
    subgraph "Service Layer"
        D[Resource Services]
        E[Engine Services]
        F[Shared Services]
    end
    subgraph "Asynchronous Tasks"
        M[Celery Workers]
        N["Task Queue<br/>Redis"]
    end
    subgraph "Data Layer"
        L[Repositories]
        K[SQLAlchemy ORM]
        O[(PostgreSQL)]
    end
    subgraph "External Systems"
        Q[LLM APIs]
        R[EU Cellar SPARQL]
    end

    C --> A
    A --> B
    B --> D
    B --> E
    B --> F
    D --> L
    F --> L
    L --> K
    K --> O
    B --> M
    M --> E
    N --> M
    E --> Q
    D --> R

    style O fill:#e1f5ff
    style K fill:#fff4e1
```

## Layer Descriptions

### 1. Web Layer

**Responsibility**: HTTP request handling, response formatting, authentication

**Components**:

- `api.py` - FastAPI application initialization
- `routers/` - Endpoint definitions organized by domain
- `dependencies.py` - Dependency injection (auth, database sessions)
- `schemas/` - Domain-organized Pydantic request/response models
- Middleware - CORS, security headers, logging
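As a minimal sketch of how these pieces fit together, the snippet below shows a router using dependency injection to obtain a database session and a Pydantic schema to shape the response. The route path, the `get_db` stub, the `LegalResourceOut` schema, and the returned values are hypothetical and only illustrate the pattern, not the project's actual code.

```python
from fastapi import APIRouter, Depends, FastAPI
from pydantic import BaseModel


# Hypothetical response schema; real request/response models live under `schemas/`.
class LegalResourceOut(BaseModel):
    id: int
    celex_id: str
    title: str


# Hypothetical stand-in for the session dependency defined in `dependencies.py`;
# the real dependency would yield a SQLAlchemy Session and close it afterwards.
def get_db():
    return object()


router = APIRouter(prefix="/legal-resources", tags=["legal-resources"])


@router.get("/{resource_id}", response_model=LegalResourceOut)
def read_legal_resource(resource_id: int, db=Depends(get_db)) -> LegalResourceOut:
    # The handler stays thin: it validates input, resolves dependencies,
    # and delegates to the service layer (stubbed here with literal values).
    return LegalResourceOut(id=resource_id, celex_id="32016R0679", title="GDPR")


app = FastAPI()
app.include_router(router)
```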
### 2. Service Layer

**Responsibility**: Business logic, orchestration, data validation

**Components**:

#### Resource Services (`services/resources/`)

- `document_collection_service.py` - Document collection from EU Cellar
- `document_parsing_service.py` - Document parsing with tulit
- `document_metadata_service.py` - SPARQL-based document discovery, metadata enrichment, and CELEX metadata extraction
- `legal_resource_service.py` - Legal resource CRUD
- `provision_service.py` - Legal provision CRUD
- `classification_service.py` - Legal provision classification CRUD
- `analysis_service.py` - Analysis CRUD
- `category_service.py` - Category CRUD
- `statement_service.py` - Statement generation
- `token_usage_service.py` - LLM token usage tracking

#### Engine Services (`services/engine/`)

- `pipeline_service.py` - Pipeline lookup, orchestration, and execution
- `prompt_service.py` - Prompt CRUD
- `training_service.py` - Model training and evaluation orchestration

#### Haystack Integration (`services/haystack/`)

- `components/` - Haystack components used in pipelines (retrievers, classifiers, parsers, custom processors)
- `component_registry.py` - Registry of available Haystack components and their factories
- `config.py` - Haystack configuration management
- `document_contract.py` - Utilities for converting between Haystack `Document` objects and internal data models
- `pipeline_validator.py` - Pipeline definition validation and graph checks
- `streaming.py` - Utilities for streaming Haystack pipeline results
- `utils.py` - Haystack-related utilities

All workflows are implemented as Haystack pipelines composed from the above components and orchestrated by `pipeline_service` (see the sketch below). This consolidates Haystack-specific logic under `services/haystack/` while keeping orchestration and execution concerns in `services/engine/`.
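The sketch below illustrates that composition model, assuming the Haystack 2.x (`haystack-ai`) component API. `ProvisionSplitter`, `ProvisionClassifier`, and the `"obligation"` label are illustrative stand-ins, not the project's actual components under `services/haystack/components/`.

```python
from haystack import Pipeline, component


# Illustrative custom component; real ones live under `services/haystack/components/`.
@component
class ProvisionSplitter:
    """Splits a legal text into provision-sized chunks."""

    @component.output_types(provisions=list[str])
    def run(self, text: str):
        return {"provisions": [p.strip() for p in text.split("\n\n") if p.strip()]}


@component
class ProvisionClassifier:
    """Assigns a (dummy) label to each provision."""

    @component.output_types(labels=list[str])
    def run(self, provisions: list[str]):
        # A real classifier would call an LLM or a trained model here.
        return {"labels": ["obligation" for _ in provisions]}


# `pipeline_service` would build and validate a graph like this from a stored definition.
pipeline = Pipeline()
pipeline.add_component("splitter", ProvisionSplitter())
pipeline.add_component("classifier", ProvisionClassifier())
pipeline.connect("splitter.provisions", "classifier.provisions")

result = pipeline.run({"splitter": {"text": "Article 1 ...\n\nArticle 2 ..."}})
print(result["classifier"]["labels"])
```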
#### Shared Services (`services/shared/`)

- `user_service.py` - User CRUD
- `refresh_token_service.py` - Token lifecycle management
- `statistics_service.py` - System statistics and dashboards

### 3. Data Layer

**Responsibility**: Data access, ORM mapping, database operations

**Components**:

- `db/models/` - SQLAlchemy models organized by domain
- `db/repositories/` - Data access patterns
- `db/migrations/` - Alembic migration scripts
- `db/database.py` - Database connection and session management

### 4. Task Queue & Workers

**Responsibility**: Asynchronous job processing, background tasks

**Components**:

- `tasks/celery_worker.py` - Celery application configuration
- `tasks/tasks.py` - Task definitions
- `tasks/handler.py` - Task execution handlers
- `tasks/factory.py` - Task factory pattern
- `tasks/utils.py` - Task utilities
- `tasks/types.py` - Task status and record types

### 5. Authentication & Security

**Responsibility**: User authentication, authorization, security

**Components**:

- `auth/security.py` - JWT token generation/validation

### 6. Utilities Layer

**Responsibility**: Cross-cutting concerns, helper functions

**Components**:

- `utils/sparql_utils.py` - SPARQL query execution
- `utils/refresh_token_utils.py` - Token utilities
- `utils/serialization.py` - Serialization helpers
- `utils/utils.py` - General utilities

## Technology Integration Points

### External APIs

- **OpenAI-compatible API**: LLM-based text analysis and annotation
- **SPARQL Endpoints**: Knowledge graph queries (e.g., Cellar)

### Databases

- **PostgreSQL**: Primary data store

### Message Queues

- **Celery + Redis**: Asynchronous task processing

## Configuration Management

Configuration is managed through:

1. **Environment Variables** (`.env` file)
2. **Config JSON** (`config.json` for paths)
3. **Database Configuration** (Alembic for migrations)

## Logging & Monitoring

### Logging

- Structured logging to `logs/ai4drpm.log`
- Log rotation (via `logrotate.conf`)
- Log levels: DEBUG, INFO, WARNING, ERROR, CRITICAL

## Deployment Architecture

### Docker Compose Deployment

- Multi-container setup (sketched below)
- Separate containers for the API, Celery worker, PostgreSQL, and Redis
- Volume mounts for persistence
- Network isolation
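A rough sketch of that topology follows. The service names, images, commands, and volume are assumptions for illustration, not the project's actual compose file; the `api:app` and `tasks.celery_worker` targets are inferred from the module layout described above.

```yaml
services:
  api:
    build: .
    command: uvicorn api:app --host 0.0.0.0 --port 8000  # assumed entrypoint
    env_file: .env
    depends_on: [postgres, redis]
    ports: ["8000:8000"]

  worker:
    build: .
    command: celery -A tasks.celery_worker worker --loglevel=info  # assumed app path
    env_file: .env
    depends_on: [postgres, redis]

  postgres:
    image: postgres:16
    volumes:
      - pgdata:/var/lib/postgresql/data  # persistence via a named volume

  redis:
    image: redis:7

volumes:
  pgdata:
```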