Overview
The Data Layer implements persistence and data access for the AI4DRPM system using the SQLAlchemy ORM.
Architecture
Layer Organization
The data layer follows clean architecture principles:
┌─────────────────────────────────────┐
│   Service Layer (Business Logic)    │
└─────────────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────┐
│   Repository Layer (Data Access)    │
│   - BaseRepository                  │
│   - Domain Repositories             │
└─────────────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────┐
│     Model Layer (ORM Entities)      │
│   - SQLAlchemy Models               │
│   - Relationships                   │
└─────────────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────┐
│         PostgreSQL Database         │
└─────────────────────────────────────┘
Domain Organization
Models are organized by business domain:
auth - User authentication and authorization
resources - Legal documents and provisions
analyses - Various analysis types (description, data, solutions, interoperability)
categories - Hierarchical category taxonomy
classifiers - ML model metadata and training data
prompts - LLM prompt templates
queries - SPARQL query templates
tasks - Celery task tracking
token_usage - LLM token consumption tracking
Entity Relationship Diagram
erDiagram
%% Authentication Domain
User ||--o{ RefreshToken : "has"
User ||--o{ Prompt : "creates"
User ||--o{ Pipeline : "creates"
User ||--o{ SparqlQuery : "creates"
User ||--o{ TokenUsage : "incurs"
%% Pipeline Domain
Pipeline ||--o{ PipelineExecution : "has"
Pipeline ||--o{ TokenUsage : "tracks"
PipelineExecution ||--o{ TokenUsage : "generates"
%% Category Domain (Self-referencing and relationships)
Category ||--o{ Category : "parent/subcategories"
Category ||--o{ ClassifierModel : "has"
Category ||--o{ ClassifierTrainingResult : "has"
%% Many-to-Many: Category with Classifications
Category }o--o{ LegalProvisionClassification : "via legal_provision_classification_categories"
%% Legal Resources Domain
LegalResource ||--o{ LegalProvision : "contains"
LegalProvision ||--o{ LegalProvisionClassification : "classified_by"
%% Many-to-Many: LegalProvisionClassification with all 5 Analysis types
LegalProvisionClassification }o--o{ DescriptionAnalysis : "via description_analysis_provisions"
LegalProvisionClassification }o--o{ DataAnalysis : "via data_analysis_provisions"
LegalProvisionClassification }o--o{ SolutionsAnalysis : "via solutions_analysis_provisions"
LegalProvisionClassification }o--o{ InteroperabilityAnalysis : "via interoperability_analysis_provisions"
LegalProvisionClassification }o--o{ DataFlowsAnalysis : "via data_flows_analysis_provisions"
%% Classifier/ML Model Domain
ClassifierModel ||--o{ ClassifierTrainingResult : "has training results"
%% Key Entity Definitions
User {
int id PK
string username UK
string email UK
boolean is_admin
}
RefreshToken {
int id PK
string token UK
int user_id FK
datetime expires_at
boolean is_revoked
}
Category {
int id PK
string name UK
string category_type
int parent_id FK
boolean is_active
}
LegalResource {
string celex PK
string eli
string preface
int year
string act_type
}
LegalProvision {
int id PK
string eId
string article_heading
string text
string legal_resource_celex FK
}
LegalProvisionClassification {
int id PK
string pipeline
string legal_provision_eId FK
string legal_resource_celex FK
string explanation
boolean validated
}
DescriptionAnalysis {
int id PK
json eId
text description
json high_level_process
json actors
boolean validated
}
DataAnalysis {
int id PK
json eId
string explanation
string type_of_data
string description
json standard
boolean validated
}
SolutionsAnalysis {
int id PK
json eId
string explanation
string digital_solution
json functionalities
json responsible_actor
boolean validated
}
InteroperabilityAnalysis {
int id PK
json eId
string digital_public_service
string description
boolean cross_border_interaction
json legal_interoperability_measures
json organisational_interoperability_measures
json semantic_interoperability_measures
json technical_interoperability_measures
boolean validated
}
DataFlowsAnalysis {
int id PK
json eId
string addresser
string action
string action_result
string addressee
boolean validated
}
ClassifierModel {
int id PK
string name UK
int category_id FK
boolean is_active
}
ClassifierTrainingResult {
int id PK
string classifier_name
int category_id FK
float f1_score
datetime training_date
}
Prompt {
int id PK
string name UK
text content
text system_prompt
string version
int created_by FK
}
SparqlQuery {
int id PK
string name UK
text query_template
string endpoint_url
string category
json parameters_schema
json result_mapping
int timeout_seconds
boolean is_active
int created_by FK
}
Pipeline {
int id PK
string name UK
text description
json definition
int version
boolean is_active
int created_by FK
}
PipelineExecution {
uuid id PK
int pipeline_id FK
int pipeline_version
string status
json input_data
json output_data
text error_message
string webhook_url
int document_count
datetime started_at
datetime completed_at
int created_by FK
}
TokenUsage {
int id PK
uuid execution_id FK
int user_id FK
int pipeline_id FK
int prompt_tokens
int completion_tokens
int total_tokens
string model
string provider
float estimated_cost_usd
string component_name
}
Task {
string id PK
string name
string status
json args
json kwargs
json result
string error_message
int progress
}
Database Statistics:
Core Tables: 19
Association Tables: 6
Total Relations: 25
For detailed field descriptions, see individual model documentation.
Migrations
Schema migrations are managed with Alembic.
Migration files are in ai4drpm/db/migrations/versions/.
alembic upgrade head # Apply migrations
alembic revision --autogenerate -m "desc" # Create migration
Session Management
Sessions are managed through FastAPI dependency injection:
from fastapi import Depends, FastAPI
from sqlalchemy.orm import Session

from ai4drpm.db.database import get_db

app = FastAPI()

@app.get("/resources")
def get_resources(db: Session = Depends(get_db)):
    # Session is created for the request and closed when it ends
    repo = LegalResourceRepository(db)
    return repo.get_multi()
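get_db itself is typically a generator dependency built on a sessionmaker. A minimal sketch of what ai4drpm/db/database.py might contain — the engine URL and sessionmaker options here are stand-ins, not the project's actual configuration:

```python
from collections.abc import Generator

from sqlalchemy import create_engine
from sqlalchemy.orm import Session, sessionmaker

# Stand-in engine; the real one is built from the PostgreSQL DSN in app config
engine = create_engine("sqlite://")
SessionLocal = sessionmaker(bind=engine, autoflush=False)


def get_db() -> Generator[Session, None, None]:
    """Yield one session per request; the finally block runs after the response."""
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()
```

FastAPI drives the generator: it calls `next()` to obtain the session before the endpoint runs, then closes the generator afterwards, which triggers the `finally` cleanup even if the endpoint raised.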