System Architecture Overview

Introduction

AI4DRPM follows a layered architecture: HTTP handling, business logic, background processing, and data access each live in their own layer with a distinct responsibility.

Architectural Layers

graph TB
    subgraph "Web Layer"
        A[FastAPI REST API]
        B[API Routers]
        C[Middleware]
    end
    
    subgraph "Service Layer"
        D[Resource Services]
        E[Engine Services]
        F[Shared Services]
    end
    
    subgraph "Asynchronous Tasks"
        M[Celery Workers]
        N[Task Queue<br/>Redis]
    end
    
    subgraph "Data Layer"
        L[Repositories]
        K[SQLAlchemy ORM]
        O[(PostgreSQL)]
    end
    
    subgraph "External Systems"
        Q[LLM APIs]
        R[EU Cellar SPARQL]
    end
    
    C --> A
    A --> B
    B --> D
    B --> E
    B --> F
    
    D --> L
    F --> L
    L --> K
    K --> O
    
    B --> M
    M --> E
    N --> M
    
    E --> Q
    D --> R
    
    style O fill:#e1f5ff
    style K fill:#fff4e1
    
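The dependency direction in the diagram (router -> service -> repository -> database) can be sketched in plain Python. All class and function names below are hypothetical stand-ins rather than the project's code, and an in-memory dict replaces SQLAlchemy and PostgreSQL.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Category:
    id: int
    name: str


class CategoryRepository:
    """Data layer (hypothetical): the only code that touches storage; a dict stands in for PostgreSQL."""

    def __init__(self) -> None:
        self._rows: dict[int, Category] = {}
        self._next_id = 1

    def add(self, name: str) -> Category:
        row = Category(id=self._next_id, name=name)
        self._rows[row.id] = row
        self._next_id += 1
        return row

    def get(self, category_id: int) -> Category | None:
        return self._rows.get(category_id)


class CategoryService:
    """Service layer (hypothetical): business rules, no HTTP and no storage details."""

    def __init__(self, repo: CategoryRepository) -> None:
        self._repo = repo

    def create(self, name: str) -> Category:
        cleaned = name.strip()
        if not cleaned:
            raise ValueError("category name must not be empty")
        return self._repo.add(cleaned)


def create_category_endpoint(payload: dict, service: CategoryService) -> tuple[int, dict]:
    """Web layer (hypothetical): turn a request body into a service call and an HTTP-style response."""
    try:
        category = service.create(payload.get("name", ""))
    except ValueError as exc:
        return 422, {"detail": str(exc)}
    return 201, {"id": category.id, "name": category.name}
```

Because each layer only calls the one beneath it, the repository can be swapped (for example, for a real SQLAlchemy-backed one) without touching the endpoint.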

Layer Descriptions

1. Web Layer

Responsibility: HTTP request handling, response formatting, authentication

Components:

  • api.py - FastAPI application initialization

  • routers/ - Endpoint definitions organized by domain

  • dependencies.py - Dependency injection (auth, database sessions)

  • schemas/ - Domain-organized Pydantic request/response models

  • Middleware - CORS, security headers, logging
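Dependency injection of database sessions typically relies on a generator that yields a resource and cleans it up afterwards. The sketch below assumes that pattern for dependencies.py; `get_db`, `run_with_dependency` and `list_tables` are hypothetical, and an in-memory sqlite3 connection stands in for the real SQLAlchemy session.

```python
from __future__ import annotations

import sqlite3
from collections.abc import Iterator
from contextlib import contextmanager
from typing import Callable, TypeVar

T = TypeVar("T")


def get_db() -> Iterator[sqlite3.Connection]:
    """Hypothetical per-request dependency: yield a connection, always close it afterwards."""
    conn = sqlite3.connect(":memory:")
    try:
        yield conn
    finally:
        conn.close()


def run_with_dependency(
    dependency: Callable[[], Iterator[sqlite3.Connection]],
    handler: Callable[[sqlite3.Connection], T],
) -> T:
    """Mimic how a framework drives a yield-based dependency around one request."""
    with contextmanager(dependency)() as resource:
        return handler(resource)


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Hypothetical handler that uses the injected connection."""
    conn.execute("CREATE TABLE provisions (id INTEGER PRIMARY KEY, text TEXT)")
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    return [name for (name,) in rows]
```

In FastAPI the same generator would be declared as an endpoint parameter such as `db = Depends(get_db)`, and the framework runs the `finally` block once the request has been handled.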

2. Service Layer

Responsibility: Business logic, orchestration, data validation

Components:

Resource Services (services/resources/)

  • document_collection_service.py - Document collection from EU Cellar

  • document_parsing_service.py - Document parsing with tulit

  • document_metadata_service.py - SPARQL-based document discovery, metadata enrichment, and CELEX metadata extraction

  • legal_resource_service.py - Legal resource CRUD

  • provision_service.py - Legal provision CRUD

  • classification_service.py - Legal provision classification CRUD

  • analysis_service.py - Analysis CRUD

  • category_service.py - Category CRUD

  • statement_service.py - Statement generation

  • token_usage_service.py - LLM token usage tracking
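Token usage tracking can build on the `usage` object that OpenAI-compatible chat completion responses return (`prompt_tokens`, `completion_tokens`). The following is a minimal in-memory sketch under that assumption; `TokenUsageTracker` and `UsageTotals` are hypothetical names, and the real service presumably persists records to PostgreSQL instead.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UsageTotals:
    prompt_tokens: int = 0
    completion_tokens: int = 0

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens


class TokenUsageTracker:
    """Hypothetical in-memory stand-in for token_usage_service.py, aggregated per model."""

    def __init__(self) -> None:
        self._by_model: dict[str, UsageTotals] = defaultdict(UsageTotals)

    def record(self, response: dict) -> None:
        # OpenAI-compatible responses report token counts under "usage"; it may be absent.
        usage = response.get("usage") or {}
        totals = self._by_model[response.get("model", "unknown")]
        totals.prompt_tokens += int(usage.get("prompt_tokens", 0))
        totals.completion_tokens += int(usage.get("completion_tokens", 0))

    def totals(self, model: str) -> UsageTotals:
        # .get() avoids creating an entry for models that were never recorded.
        return self._by_model.get(model, UsageTotals())
```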

Engine Services (services/engine/)

  • pipeline_service.py - Pipeline lookup, orchestration, and execution (also acts as the pipeline repository)

  • prompt_service.py - Prompt CRUD

  • training_service.py - Model training and evaluation orchestration

Haystack Integration (services/haystack/)

  • components/ - Haystack components used in pipelines (retrievers, classifiers, parsers, custom processors)

  • component_registry.py - Registry for available Haystack components and factories

  • config.py - Haystack configuration management

  • document_contract.py - Utilities for converting between Haystack Document objects and internal data models

  • pipeline_validator.py - Pipeline definition validation and graph checks

  • streaming.py - Utilities for streaming Haystack pipeline results

  • utils.py - Haystack-related utilities

All workflows are implemented as Haystack pipelines composed from the components above and orchestrated by pipeline_service. This keeps Haystack-specific logic under services/haystack/, while orchestration and execution concerns stay in services/engine/.
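The kind of graph checks pipeline_validator.py performs can be illustrated on a definition shaped loosely like Haystack 2.x's serialized format (a `components` map with a `type` per component, and `connections` whose `sender`/`receiver` ends read `component.socket`). This is a sketch under that assumption: the registry contents, `validate_pipeline`, and the specific checks chosen are hypothetical, not the project's actual rules.

```python
from __future__ import annotations

# Hypothetical registry: component type name -> factory. Real entries would live in
# component_registry.py and construct Haystack components; `object` is a placeholder.
REGISTRY: dict[str, type] = {"Retriever": object, "Classifier": object, "PromptBuilder": object}


def validate_pipeline(definition: dict, registry: dict[str, type] = REGISTRY) -> list[str]:
    """Return a list of problems; an empty list means the definition passed these checks."""
    errors: list[str] = []
    components = definition.get("components", {})

    # Every component must use a registered type.
    for name, spec in components.items():
        if spec.get("type") not in registry:
            errors.append(f"component '{name}' has unknown type {spec.get('type')!r}")

    # Every connection end must point at a declared component.
    connected: set[str] = set()
    for conn in definition.get("connections", []):
        for end in ("sender", "receiver"):
            component = conn.get(end, "").split(".", 1)[0]
            if component not in components:
                errors.append(f"connection {end} '{conn.get(end)}' references unknown component")
            connected.add(component)

    # In a multi-component pipeline, an unconnected component is probably a mistake.
    if len(components) > 1:
        for name in components:
            if name not in connected:
                errors.append(f"component '{name}' is not connected to anything")
    return errors
```

Returning all problems at once, rather than raising on the first, lets an API endpoint report every issue in a submitted pipeline definition in a single response.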

Shared Services (services/shared/)

  • user_service.py - User CRUD

  • refresh_token_service.py - Token lifecycle management

  • statistics_service.py - System statistics and dashboards
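Refresh token lifecycle management commonly means issuing opaque tokens, storing only their hashes, and rotating them so each token can be used once. The sketch below assumes that design for refresh_token_service.py; `RefreshTokenStore`, the 7-day lifetime, and the in-memory dict are all hypothetical.

```python
from __future__ import annotations

import hashlib
import secrets
from datetime import datetime, timedelta, timezone


class RefreshTokenStore:
    """Hypothetical sketch of refresh-token rotation; only SHA-256 hashes are kept."""

    def __init__(self, ttl: timedelta = timedelta(days=7)) -> None:
        self._ttl = ttl
        self._tokens: dict[str, tuple[str, datetime]] = {}  # hash -> (user_id, expires_at)

    @staticmethod
    def _hash(token: str) -> str:
        return hashlib.sha256(token.encode()).hexdigest()

    def issue(self, user_id: str, now: datetime | None = None) -> str:
        now = now or datetime.now(timezone.utc)
        token = secrets.token_urlsafe(32)
        self._tokens[self._hash(token)] = (user_id, now + self._ttl)
        return token

    def rotate(self, token: str, now: datetime | None = None) -> str:
        """Consume a valid token and return a replacement; reuse or expiry raises."""
        now = now or datetime.now(timezone.utc)
        entry = self._tokens.pop(self._hash(token), None)
        if entry is None:
            raise PermissionError("unknown or already-used refresh token")
        user_id, expires_at = entry
        if now >= expires_at:
            raise PermissionError("refresh token expired")
        return self.issue(user_id, now)

    def revoke_all(self, user_id: str) -> None:
        """Invalidate every outstanding token for a user (e.g. on logout or password change)."""
        self._tokens = {h: e for h, e in self._tokens.items() if e[0] != user_id}
```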

3. Data Layer

Responsibility: Data access, ORM mapping, database operations

Components:

  • db/models/ - SQLAlchemy models organized by domain

  • db/repositories/ - Repository classes that encapsulate queries and persistence

  • db/migrations/ - Alembic migration scripts

  • db/database.py - Database connection and session management
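Session management and repositories usually combine as a unit of work: commit when the block succeeds, roll back when it fails. The sketch below shows that pattern with the standard-library sqlite3 module in place of SQLAlchemy and PostgreSQL (a SQLAlchemy `Session` offers the same `commit()`/`rollback()` roles). `session_scope`, `ProvisionRepository` and the table layout are hypothetical.

```python
from __future__ import annotations

import sqlite3
from collections.abc import Iterator
from contextlib import contextmanager


@contextmanager
def session_scope(conn: sqlite3.Connection) -> Iterator[sqlite3.Connection]:
    """Hypothetical unit of work: commit on success, roll back on any error."""
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise


class ProvisionRepository:
    """Hypothetical repository: SQL stays here, callers receive plain tuples."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def add(self, celex: str, text: str) -> int:
        cur = self._conn.execute(
            "INSERT INTO provisions (celex, text) VALUES (?, ?)", (celex, text)
        )
        return cur.lastrowid

    def by_celex(self, celex: str) -> list[tuple[int, str]]:
        return self._conn.execute(
            "SELECT id, text FROM provisions WHERE celex = ? ORDER BY id", (celex,)
        ).fetchall()
```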

4. Task Queue & Workers

Responsibility: Asynchronous job processing, background tasks

Components:

  • tasks/celery_worker.py - Celery application configuration

  • tasks/tasks.py - Task definitions

  • tasks/handler.py - Task execution handlers

  • tasks/factory.py - Task factory pattern

  • tasks/utils.py - Task utilities

  • tasks/types.py - Task status and record types
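How task types, a factory, and handlers might fit together can be sketched without Celery itself. Everything here (`TaskStatus` values, `TaskRecord`, `register`, `execute`, the `count_provisions` task) is hypothetical; in a Celery deployment, `execute` would typically be called from inside a function decorated with the Celery app's `@app.task`.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable


class TaskStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCESS = "success"
    FAILURE = "failure"


@dataclass
class TaskRecord:
    task_type: str
    params: dict[str, Any]
    status: TaskStatus = TaskStatus.PENDING
    result: Any = None
    error: str | None = None


# Hypothetical factory registry: task type name -> handler function.
HANDLERS: dict[str, Callable[..., Any]] = {}


def register(task_type: str):
    """Register the decorated function as the handler for task_type."""
    def decorator(fn):
        HANDLERS[task_type] = fn
        return fn
    return decorator


def execute(record: TaskRecord) -> TaskRecord:
    """Look up the handler, run it, and record the status transitions."""
    handler = HANDLERS.get(record.task_type)
    if handler is None:
        record.status, record.error = TaskStatus.FAILURE, f"no handler for {record.task_type!r}"
        return record
    record.status = TaskStatus.RUNNING
    try:
        record.result = handler(**record.params)
        record.status = TaskStatus.SUCCESS
    except Exception as exc:
        record.status, record.error = TaskStatus.FAILURE, str(exc)
    return record


@register("count_provisions")
def count_provisions(texts: list[str]) -> int:
    return len(texts)
```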

5. Authentication & Security

Responsibility: User authentication, authorization, security

Components:

  • auth/security.py - JWT token generation/validation
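To show what JWT generation and validation involve, here is a standard-library sketch of HS256 signing and verification with an expiry claim. It is illustrative only; `encode_jwt`, `decode_jwt` and the 15-minute default lifetime are hypothetical, and production code would normally rely on a maintained library such as PyJWT rather than hand-rolled token handling.

```python
from __future__ import annotations

import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(segment: str) -> bytes:
    # Restore the padding that JWTs strip.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def encode_jwt(claims: dict, secret: str, ttl_seconds: int = 900) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {**claims, "exp": int(time.time()) + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode()) for part in (header, payload)
    )
    signature = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def decode_jwt(token: str, secret: str) -> dict:
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(
        secret.encode(), f"{header_b64}.{payload_b64}".encode(), hashlib.sha256
    ).digest()
    # Constant-time comparison avoids leaking how many signature bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    payload = json.loads(_b64url_decode(payload_b64))
    if payload.get("exp", 0) < time.time():
        raise ValueError("token expired")
    return payload
```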

6. Utilities Layer

Responsibility: Cross-cutting concerns, helper functions

Components:

  • utils/sparql_utils.py - SPARQL query execution

  • utils/refresh_token_utils.py - Token utilities

  • utils/serialization.py - Serialization helpers

  • utils/utils.py - General utilities

Technology Integration Points

External APIs

  • OpenAI-compatible API: LLM-based text analysis and annotation

  • SPARQL Endpoints: Knowledge graph queries (e.g., Cellar)
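A SPARQL query against Cellar reduces to an HTTP GET with the query in the `query` parameter (SPARQL 1.1 Protocol) and a JSON results document in the W3C SPARQL 1.1 results format. The sketch below builds such a request and flattens the results; the endpoint URL is an assumption to verify against the project's configuration, and `build_request` and `parse_bindings` are hypothetical helpers, not the contents of utils/sparql_utils.py.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

# Assumed Cellar endpoint; the project's actual value would come from configuration.
CELLAR_SPARQL_ENDPOINT = "https://publications.europa.eu/webapi/rdf/sparql"


def build_request(query: str, endpoint: str = CELLAR_SPARQL_ENDPOINT) -> urllib.request.Request:
    """GET request per the SPARQL 1.1 Protocol, asking for JSON results."""
    url = f"{endpoint}?{urllib.parse.urlencode({'query': query})}"
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(raw: str | bytes) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into one {variable: value} dict per result row."""
    data = json.loads(raw)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in data["results"]["bindings"]
    ]
```

A live call would then look like `with urllib.request.urlopen(build_request(q), timeout=30) as resp: rows = parse_bindings(resp.read())`.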

Databases

  • PostgreSQL: Primary data store

Message Queues

  • Celery + Redis: Asynchronous task processing

Configuration Management

Configuration is managed through:

  1. Environment Variables (.env file)

  2. Config JSON (config.json, for file paths)

  3. Database schema versioning (Alembic migrations)
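One plausible way to layer these sources is environment variables over config.json over built-in defaults. The sketch below assumes that precedence; the `Settings` fields, the upper-cased environment variable names, and the default values are all hypothetical. Loading the .env file itself into the environment is usually done with a library such as python-dotenv (`load_dotenv()`), which is not shown here.

```python
from __future__ import annotations

import json
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    log_dir: str


def load_settings(config_path: str | Path = "config.json", env: dict[str, str] | None = None) -> Settings:
    """Precedence (assumed): environment variables > config.json > hypothetical defaults."""
    env = dict(os.environ) if env is None else env
    defaults = {
        "database_url": "postgresql://localhost/ai4drpm",
        "redis_url": "redis://localhost:6379/0",
        "log_dir": "logs",
    }
    path = Path(config_path)
    file_values = json.loads(path.read_text()) if path.exists() else {}
    # Ignore unknown keys in config.json rather than failing on them.
    merged = {**defaults, **{k: v for k, v in file_values.items() if k in defaults}}
    for key in defaults:
        if key.upper() in env:  # e.g. REDIS_URL overrides redis_url
            merged[key] = env[key.upper()]
    return Settings(**merged)
```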

Logging & Monitoring

Logging

  • Structured logging to logs/ai4drpm.log

  • Log rotation (via logrotate.conf)

  • Log levels: DEBUG, INFO, WARNING, ERROR, CRITICAL
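Structured logging to a file rotated by logrotate can be wired with the standard `logging` module: a JSON formatter writes one object per line, and `logging.handlers.WatchedFileHandler` reopens the file after logrotate moves it. This is a sketch under those assumptions; `JsonFormatter`, `configure_logging`, the logger name and the JSON field names are hypothetical.

```python
from __future__ import annotations

import json
import logging
import logging.handlers
from pathlib import Path


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "time": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            entry["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def configure_logging(log_file: str | Path = "logs/ai4drpm.log", level: int = logging.INFO) -> logging.Logger:
    path = Path(log_file)
    path.parent.mkdir(parents=True, exist_ok=True)
    # WatchedFileHandler reopens the file when logrotate renames it, so rotation
    # can stay in logrotate.conf instead of inside the application.
    handler = logging.handlers.WatchedFileHandler(path)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("ai4drpm")  # hypothetical logger name
    logger.setLevel(level)
    logger.handlers = [handler]
    return logger
```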

Deployment Architecture

Docker Compose Deployment

  • Multi-container setup

  • Separate containers for: API, Worker, PostgreSQL, Redis

  • Volume mounts for persistence

  • Network isolation