# System Architecture Overview

## Introduction

AI4DRPM follows a **layered architecture** pattern with clear separation of concerns.

## Architectural Layers

```mermaid
graph TB
    subgraph "Web Layer"
        A[FastAPI REST API]
        B[API Routers]
        C[Middleware]
    end
    subgraph "Service Layer"
        D[Resource Services]
        E[Engine Services]
        F[Shared Services]
    end
    subgraph "Asynchronous Tasks"
        M[Celery Workers]
        N["Task Queue<br/>Redis"]
    end
    subgraph "Data Layer"
        L[Repositories]
        K[SQLAlchemy ORM]
        O[(PostgreSQL)]
    end
    subgraph "External Systems"
        Q[LLM APIs]
        R[EU Cellar SPARQL]
    end

    C --> A
    A --> B
    B --> D
    B --> E
    B --> F
    D --> L
    F --> L
    L --> K
    K --> O
    B --> M
    M --> E
    N --> M
    E --> Q
    D --> R

    style O fill:#e1f5ff
    style K fill:#fff4e1
```

## Layer Descriptions

### 1. Web Layer

**Responsibility**: HTTP request handling, response formatting, authentication

**Components**:

- `api.py` - FastAPI application initialization
- `routers/` - Endpoint definitions organized by domain
- `dependencies.py` - Dependency injection (auth, database sessions)
- `schemas/` - Domain-organized Pydantic request/response models
- Middleware - CORS, security headers, logging
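As a minimal sketch of how these pieces fit together, the snippet below shows a router using dependency injection to obtain a database session and a Pydantic schema to shape the response. The route path, the `get_db` stub, the `LegalResourceOut` schema, and the returned values are hypothetical and only illustrate the pattern, not the project's actual code.

```python
from fastapi import APIRouter, Depends, FastAPI
from pydantic import BaseModel


# Hypothetical response schema; real request/response models live under `schemas/`.
class LegalResourceOut(BaseModel):
    id: int
    celex_id: str
    title: str


# Hypothetical stand-in for the session dependency defined in `dependencies.py`;
# the real dependency would yield a SQLAlchemy Session and close it afterwards.
def get_db():
    return object()


router = APIRouter(prefix="/legal-resources", tags=["legal-resources"])


@router.get("/{resource_id}", response_model=LegalResourceOut)
def read_legal_resource(resource_id: int, db=Depends(get_db)) -> LegalResourceOut:
    # The handler stays thin: it validates input, resolves dependencies,
    # and delegates to the service layer (stubbed here with literal values).
    return LegalResourceOut(id=resource_id, celex_id="32016R0679", title="GDPR")


app = FastAPI()
app.include_router(router)
```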
### 2. Service Layer

**Responsibility**: Business logic, orchestration, data validation

**Components**:

#### Resource Services (`services/resources/`)

- `document_collection_service.py` - Document collection from EU Cellar
- `document_parsing_service.py` - Document parsing with tulit
- `document_metadata_service.py` - SPARQL-based document discovery, metadata enrichment, and CELEX metadata extraction
- `legal_resource_service.py` - Legal resource CRUD
- `provision_service.py` - Legal provision CRUD
- `classification_service.py` - Legal provision classification CRUD
- `analysis_service.py` - Analysis CRUD
- `category_service.py` - Category CRUD
- `statement_service.py` - Statement generation
- `token_usage_service.py` - LLM token usage tracking

#### Engine Services (`services/engine/`)

- `pipeline_service.py` - Pipeline lookup, orchestration, and execution
- `prompt_service.py` - Prompt CRUD
- `training_service.py` - Model training and evaluation orchestration

#### Haystack Integration (`services/haystack/`)

- `components/` - Haystack components used in pipelines (retrievers, classifiers, parsers, custom processors)
- `component_registry.py` - Registry of available Haystack components and their factories
- `config.py` - Haystack configuration management
- `document_contract.py` - Utilities for converting between Haystack `Document` objects and internal data models
- `pipeline_validator.py` - Pipeline definition validation and graph checks
- `streaming.py` - Utilities for streaming Haystack pipeline results
- `utils.py` - Haystack-related utilities

All workflows are implemented as Haystack pipelines composed from the above components and orchestrated by `pipeline_service` (see the sketch below). This consolidates Haystack-specific logic under `services/haystack/` while keeping orchestration and execution concerns in `services/engine/`.
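The sketch below illustrates that composition model, assuming the Haystack 2.x (`haystack-ai`) component API. `ProvisionSplitter`, `ProvisionClassifier`, and the `"obligation"` label are illustrative stand-ins, not the project's actual components under `services/haystack/components/`.

```python
from haystack import Pipeline, component


# Illustrative custom component; real ones live under `services/haystack/components/`.
@component
class ProvisionSplitter:
    """Splits a legal text into provision-sized chunks."""

    @component.output_types(provisions=list[str])
    def run(self, text: str):
        return {"provisions": [p.strip() for p in text.split("\n\n") if p.strip()]}


@component
class ProvisionClassifier:
    """Assigns a (dummy) label to each provision."""

    @component.output_types(labels=list[str])
    def run(self, provisions: list[str]):
        # A real classifier would call an LLM or a trained model here.
        return {"labels": ["obligation" for _ in provisions]}


# `pipeline_service` would build and validate a graph like this from a stored definition.
pipeline = Pipeline()
pipeline.add_component("splitter", ProvisionSplitter())
pipeline.add_component("classifier", ProvisionClassifier())
pipeline.connect("splitter.provisions", "classifier.provisions")

result = pipeline.run({"splitter": {"text": "Article 1 ...\n\nArticle 2 ..."}})
print(result["classifier"]["labels"])
```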
#### Shared Services (`services/shared/`)

- `user_service.py` - User CRUD
- `refresh_token_service.py` - Token lifecycle management
- `statistics_service.py` - System statistics and dashboards

### 3. Data Layer

**Responsibility**: Data access, ORM mapping, database operations

**Components**:

- `db/models/` - SQLAlchemy models organized by domain
- `db/repositories/` - Data access patterns
- `db/migrations/` - Alembic migration scripts
- `db/database.py` - Database connection and session management

### 4. Task Queue & Workers

**Responsibility**: Asynchronous job processing, background tasks

**Components**:

- `tasks/celery_worker.py` - Celery application configuration
- `tasks/tasks.py` - Task definitions
- `tasks/handler.py` - Task execution handlers
- `tasks/factory.py` - Task factory pattern
- `tasks/utils.py` - Task utilities
- `tasks/types.py` - Task status and record types

### 5. Authentication & Security

**Responsibility**: User authentication, authorization, security

**Components**:

- `auth/security.py` - JWT token generation/validation

### 6. Utilities Layer

**Responsibility**: Cross-cutting concerns, helper functions

**Components**:

- `utils/sparql_utils.py` - SPARQL query execution
- `utils/refresh_token_utils.py` - Token utilities
- `utils/serialization.py` - Serialization helpers
- `utils/utils.py` - General utilities

## Technology Integration Points

### External APIs

- **OpenAI-compatible API**: LLM-based text analysis and annotation
- **SPARQL Endpoints**: Knowledge graph queries (e.g., Cellar)

### Databases

- **PostgreSQL**: Primary data store

### Message Queues

- **Celery + Redis**: Asynchronous task processing

## Configuration Management

Configuration is managed through:

1. **Environment Variables** (`.env` file)
2. **Config JSON** (`config.json` for paths)
3. **Database Configuration** (Alembic for migrations)

## Logging & Monitoring

### Logging

- Structured logging to `logs/ai4drpm.log`
- Log rotation (via `logrotate.conf`)
- Log levels: DEBUG, INFO, WARNING, ERROR, CRITICAL

## Deployment Architecture

### Docker Compose Deployment

- Multi-container setup (sketched below)
- Separate containers for the API, Celery worker, PostgreSQL, and Redis
- Volume mounts for persistence
- Network isolation
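A rough sketch of that topology follows. The service names, images, commands, and volume are assumptions for illustration, not the project's actual compose file; the `api:app` and `tasks.celery_worker` targets are inferred from the module layout described above.

```yaml
services:
  api:
    build: .
    command: uvicorn api:app --host 0.0.0.0 --port 8000  # assumed entrypoint
    env_file: .env
    depends_on: [postgres, redis]
    ports: ["8000:8000"]

  worker:
    build: .
    command: celery -A tasks.celery_worker worker --loglevel=info  # assumed app path
    env_file: .env
    depends_on: [postgres, redis]

  postgres:
    image: postgres:16
    volumes:
      - pgdata:/var/lib/postgresql/data  # persistence via a named volume

  redis:
    image: redis:7

volumes:
  pgdata:
```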