# Technology Stack

## Core Technologies

### Backend Framework

#### FastAPI (v0.115+)

- **Purpose**: Web framework for building APIs
- **Why**: High performance, automatic API documentation, type hints, async support
- **Features Used**:
  - Automatic OpenAPI schema generation
  - Pydantic data validation
  - Dependency injection
  - Background tasks
  - Security utilities (OAuth2, JWT)

### Programming Language

#### Python 3.12+

- **Why**: Rich ecosystem, excellent AI/ML libraries, readability
- **Key Language Features**:
  - Type hints for better code quality
  - Async/await for concurrency
  - Dataclasses for structured data

## Database & Persistence

### Relational Database

#### PostgreSQL 12+

- **Purpose**: Primary data store
- **Why**: ACID compliance, advanced features, JSON support, full-text search
- **Features Used**:
  - Complex queries and joins
  - JSON/JSONB columns
  - Foreign key constraints
  - Database triggers and events

### ORM

#### SQLAlchemy 2.0+

- **Purpose**: Object-Relational Mapping
- **Why**: Powerful query API, relationship handling, migration support
- **Features Used**:
  - Declarative models
  - Relationship configurations
  - Session management
  - Query optimization

### Database Migrations

#### Alembic 1.16+

- **Purpose**: Database schema versioning
- **Why**: Track changes, rollback capability, team collaboration
- **Usage**:
  - Auto-generate migrations from model changes
  - Version control for database schema
  - Upgrade/downgrade paths

## Caching & Message Broker

### Redis 6.4+

- **Purpose**: Caching, session storage, message broker
- **Why**: In-memory speed, pub/sub messaging, data structures
- **Use Cases**:
  - Celery task queue broker
  - Celery result backend
  - Session caching
  - Application-level caching

## Asynchronous Task Processing

### Celery 5.5+

- **Purpose**: Distributed task queue
- **Why**: Async job processing, scheduling, retry mechanisms
- **Features Used**:
  - Task queuing and execution
  - Task chaining and grouping
  - Periodic tasks
  - Result tracking
  - Retry logic

## AI & Machine Learning

### Large Language Models

#### OpenAI-compatible API (v1.75+)

- **Purpose**: LLM-based text analysis and annotation
- **Why**: Uniform interface for multiple LLM providers
- **Use Cases**:
  - Legal text annotation
  - Provision classification
  - Entity extraction
  - Text generation

### Machine Learning

#### scikit-learn 1.6+

- **Purpose**: Traditional ML classification
- **Why**: Proven algorithms, easy training, lightweight
- **Features Used**:
  - Text vectorization (TF-IDF)
  - Classification algorithms (SVM, Random Forest, Logistic Regression)
  - Model persistence (joblib)
  - Cross-validation

#### joblib 1.5+

- **Purpose**: Model serialization
- **Why**: Efficient storage of trained models
- **Usage**: Saving/loading trained classifiers

### Natural Language Processing

#### spaCy 3.8+

- **Purpose**: NLP preprocessing
- **Why**: Fast, production-ready, pre-trained models
- **Models Used**:
  - `en_core_web_sm` - English language model
- **Use Cases**:
  - Text tokenization
  - Named entity recognition
  - Part-of-speech tagging
  - Dependency parsing

## Semantic Web & RDF

### rdflib

- **Purpose**: RDF graph manipulation
- **Why**: Python-native, comprehensive RDF support
- **Features Used**:
  - Graph creation and manipulation
  - Turtle/JSON-LD serialization
  - SPARQL query execution
  - Namespace management

### SPARQL

- **Purpose**: RDF query language
- **Why**: Standard for querying knowledge graphs
- **Usage**: Querying EUR-Lex and other SPARQL endpoints

### SaxonC (saxonche) 12.6+

- **Purpose**: XSLT transformations
- **Why**: Industry-standard XML processing
- **Usage**: Transforming legal documents (FMX to AKN-LEOS)

## Authentication & Security

### JWT (PyJWT) 2.10+

- **Purpose**: JSON Web Token handling
- **Why**: Stateless authentication, standard-compliant
- **Features Used**:
  - Token generation
  - Token validation
  - Expiration handling
  - Custom claims

### bcrypt 4.1+

- **Purpose**: Password hashing
- **Why**: Industry-standard, resistant to rainbow tables
- **Usage**: Secure password storage

## HTTP & Networking

### httpx

- **Purpose**: HTTP client
- **Why**: Modern, async-capable, HTTP/2 support
- **Usage**: External API calls

### python-dotenv 1.1+

- **Purpose**: Environment variable management
- **Why**: Easy configuration management
- **Usage**: Loading `.env` files

## Document Processing

### python-docx 1.1+

- **Purpose**: Microsoft Word document processing
- **Why**: Read/write DOCX files programmatically
- **Usage**: Extracting text from legal documents

## Development Tools

### Testing

#### pytest 8.3+

- **Purpose**: Testing framework
- **Why**: Simple syntax, powerful fixtures, extensive plugins
- **Plugins Used**:
  - `pytest-asyncio` - Async test support
  - `pytest-cov` - Coverage reporting
  - `requests-mock` - HTTP mocking

#### coverage 7.9+

- **Purpose**: Code coverage analysis
- **Why**: Identify untested code
- **Usage**: HTML and terminal coverage reports

### Documentation

#### Sphinx 8.2+

- **Purpose**: Documentation generation
- **Why**: Python standard, extensible, multiple output formats
- **Extensions**:
  - `sphinx.ext.autodoc` - Auto-generate from docstrings
  - `sphinx.ext.napoleon` - Google/NumPy docstring support
  - `sphinx.ext.viewcode` - Link to source code
  - `sphinx.ext.intersphinx` - Cross-project linking

#### MyST Parser 4.0+

- **Purpose**: Markdown support in Sphinx
- **Why**: Write docs in Markdown
- **Features**: CommonMark + extensions

#### Sphinx RTD Theme 3.0+

- **Purpose**: Documentation theme
- **Why**: Clean, responsive, professional

### Dependency Management

#### Poetry

- **Purpose**: Dependency and environment management
- **Why**: Deterministic builds, lock files, virtual env management
- **Usage**:
  - `pyproject.toml` - Dependency specification
  - `poetry.lock` - Lock file for reproducibility

## Data Validation

### Pydantic

- **Purpose**: Data validation using Python type hints
- **Why**: Runtime validation, JSON schema generation, editor support
- **Usage**:
  - API request/response models
  - Configuration validation
  - Data serialization

### jsonschema 4.25+

- **Purpose**: JSON schema validation
- **Why**: Standard-compliant validation
- **Usage**: Validating complex JSON structures

## Containerization

### Docker

- **Purpose**: Application containerization
- **Why**: Consistent environments, easy deployment
- **Usage**:
  - `Dockerfile` for backend image
  - Multi-stage builds for optimization

### Docker Compose

- **Purpose**: Multi-container orchestration
- **Why**: Local development, integration testing
- **Services**:
  - Backend API
  - Celery Worker
  - PostgreSQL
  - Redis

## Web Server (Production)

### Uvicorn

- **Purpose**: ASGI server
- **Why**: High performance, WebSocket support
- **Usage**: Running FastAPI application

### Gunicorn (Optional)

- **Purpose**: Process manager
- **Why**: Multi-worker management, automatic restarts
- **Usage**: Production deployment with Uvicorn workers

## Monitoring & Logging

### Python logging

- **Purpose**: Application logging
- **Why**: Built into the standard library, configurable, extensible through handlers
- **Configuration**: Custom formatting, rotation
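
The logging configuration mentioned above (custom formatting plus rotation) might look roughly like the following. This is a minimal sketch using only the standard library. The function name `build_logger`, the logger name `app`, the format string, and the rotation limits are illustrative assumptions, not values taken from the project.

```python
import logging
import tempfile
from logging.handlers import RotatingFileHandler
from pathlib import Path


def build_logger(log_path: Path, name: str = "app") -> logging.Logger:
    """Return a logger that writes formatted records to a size-rotated file.

    The name, format, and limits here are hypothetical defaults.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of the root logger's handlers

    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = RotatingFileHandler(
            log_path,
            maxBytes=1_000_000,  # assumed limit: rotate after roughly 1 MB
            backupCount=3,       # assumed retention: keep three rotated files
            encoding="utf-8",
        )
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s [%(name)s] %(message)s")
        )
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    # Write one record to a temporary file and show the formatted line.
    log_file = Path(tempfile.mkdtemp()) / "app.log"
    log = build_logger(log_file)
    log.info("backend started")
    print(log_file.read_text(encoding="utf-8"))
```

Size-based rotation is only one option: `logging.handlers.TimedRotatingFileHandler` rotates on a schedule instead, which may suit deployments that archive logs daily.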