How to Map DSR Types to GDPR Article 15

Data Subject Requests (DSRs) invoking the right of access under GDPR Article 15 demand deterministic intake classification, cryptographically verifiable routing, and fully auditable execution pipelines. Misclassification at the ingestion layer cascades into SLA violations, regulatory exposure, and expensive manual remediation. Privacy engineering and data automation teams must deploy strict type-mapping logic that translates ambiguous user submissions into canonical Article 15 workflows before any downstream data aggregation occurs.

1. Canonical Taxonomy & Semantic Normalization

Users rarely cite legal articles verbatim. Intake payloads typically contain natural language variations such as “export my profile,” “what data do you hold on me?” or “verify my account history.” While semantically aligned with Article 15, each variant implies different retrieval scopes, residency constraints, and applicable exemptions. Establishing a canonical taxonomy requires a deterministic routing layer that evaluates jurisdiction, request semantics, and identity proofing before triggering extraction.

This mapping discipline is formally codified in GDPR vs CCPA Request Taxonomies, where semantic ambiguity is resolved through regex-driven intent parsing, jurisdictional fallback matrices, and strict scope validation. The normalization pipeline must:

  1. Strip HTML/entities and normalize casing
  2. Tokenize against a curated intent dictionary
  3. Map to a standardized enum (ARTICLE_15_ACCESS, ARTICLE_17_ERASURE, ARTICLE_20_PORTABILITY)
  4. Attach a jurisdictional context flag before proceeding

Common phrasings map to a small set of canonical intents and their statutory basis:

| User phrasing (examples) | Canonical intent | GDPR basis | Response window |
| --- | --- | --- | --- |
| “access”, “download”, “export”, “what data do you hold” | ARTICLE_15_ACCESS | Article 15 (right of access) | one month (Article 12(3)) |
| “delete”, “erase”, “remove”, “forget” | ARTICLE_17_ERASURE | Article 17 (erasure) | one month (Article 12(3)) |
| “port”, “transfer”, “move my data” | ARTICLE_20_PORTABILITY | Article 20 (portability) | one month (Article 12(3)) |
| unrecognized | UNKNOWN | manual classification | runs from receipt; internal triage does not pause it |
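The normalization steps and the phrasing table above can be sketched with the standard library alone. This is a minimal illustration, not the taxonomy's reference implementation: the `INTENT_TERMS` dictionary, the `GDPR_REGIONS` set, and the function names are hypothetical, and a real dictionary would be curated and reviewed by counsel.

```python
import html
import re
from enum import Enum


class Intent(str, Enum):
    ARTICLE_15_ACCESS = "article_15_access"
    ARTICLE_17_ERASURE = "article_17_erasure"
    ARTICLE_20_PORTABILITY = "article_20_portability"
    UNKNOWN = "unknown"


# Hypothetical intent dictionary, checked in insertion order.
INTENT_TERMS = {
    Intent.ARTICLE_15_ACCESS: {"access", "download", "export", "hold"},
    Intent.ARTICLE_17_ERASURE: {"delete", "erase", "remove", "forget"},
    Intent.ARTICLE_20_PORTABILITY: {"port", "transfer", "move"},
}

# Illustrative region codes only, not a complete list.
GDPR_REGIONS = {"DE", "FR", "IE"}


def normalize(text: str) -> str:
    """Step 1: strip tags, decode entities, collapse whitespace, lowercase."""
    text = re.sub(r"<[^>]+>", " ", text)
    text = html.unescape(text)
    return re.sub(r"\s+", " ", text).strip().lower()


def classify(text: str, region: str) -> dict:
    """Steps 2-4: tokenize, map to the enum, attach a jurisdiction flag."""
    tokens = set(re.findall(r"[a-z0-9]+", normalize(text)))
    intent = next(
        (i for i, terms in INTENT_TERMS.items() if tokens & terms),
        Intent.UNKNOWN,
    )
    return {"intent": intent, "gdpr_applies": region.upper() in GDPR_REGIONS}
```

Matching whole tokens rather than substrings means a word such as "support" does not trigger the "port" entry, at the cost of missing inflected forms like "deleted" unless they are added to the dictionary.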

2. Deterministic Compliance Gating

Not every intake payload should proceed to data aggregation. The compliance gate must enforce identity proofing, validate jurisdictional applicability, and assess scope limitations, particularly the Article 12(5) provisions on manifestly unfounded or excessive requests. Cryptographic signature verification is mandatory for webhook-originated submissions to block spoofed payloads. A signature alone does not stop an attacker from replaying a captured request, so it should be paired with a timestamp or nonce check.

Malformed or unauthenticated requests must be rejected immediately, logged with a structured correlation ID, and routed to a forensic dead-letter queue (DLQ) rather than failing silently. The gate operates as a stateful validator, leveraging strict schema constraints and cryptographic hashing to ensure payload integrity. Reference implementations for secure validation patterns align with Pydantic’s strict validation documentation, which enforces type safety, field coercion boundaries, and custom validators at the ingestion boundary.
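One way to picture the gate is a signature check combined with a freshness window and a record of signatures already seen. The sketch below uses only the standard library. The signing scheme (HMAC-SHA256 over `timestamp.body`), the 300-second tolerance, and the in-memory `seen` set and `dlq` list are all assumptions made for illustration, not a documented webhook contract.

```python
import hashlib
import hmac
import time
import uuid

MAX_SKEW_SECONDS = 300  # assumed tolerance between sender and receiver clocks


def sign(secret: bytes, timestamp: int, body: bytes) -> str:
    """Assumed scheme: HMAC-SHA256 over b'<timestamp>.<body>'."""
    message = str(timestamp).encode() + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def gate(secret, timestamp, body, signature, seen, dlq, now=None):
    """Accept or reject a webhook; rejected requests are recorded in the DLQ."""
    now = time.time() if now is None else now
    correlation_id = uuid.uuid4().hex
    expected = sign(secret, timestamp, body)
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        reason = "bad_signature"
    elif abs(now - timestamp) > MAX_SKEW_SECONDS:
        reason = "stale_timestamp"
    elif signature in seen:
        reason = "replayed"
    else:
        seen.add(signature)
        return {"status": "accepted", "correlation_id": correlation_id}
    # Store a hash of the body, not the body itself, to keep raw PII out of the DLQ.
    dlq.append({
        "correlation_id": correlation_id,
        "reason": reason,
        "body_sha256": hashlib.sha256(body).hexdigest(),
    })
    return {"status": "rejected", "reason": reason, "correlation_id": correlation_id}
```

In a deployed system the `seen` set would live in a shared store with an expiry at least as long as the skew window, so that replays are caught across workers.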

3. Secure Routing Implementation (Python)

The following production-grade routing handler demonstrates deterministic classification, cryptographic correlation, and bounded retry logic. It uses pydantic v2 for schema enforcement, tenacity for bounded retries of transient failures, and hashlib/hmac for tamper-evident request tracking.

import hashlib
import hmac
import html
import logging
import re
from enum import Enum
from typing import Dict, Optional

from pydantic import BaseModel
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type

logger = logging.getLogger("dsr.pipeline")


class DSRIntent(str, Enum):
    ARTICLE_15_ACCESS = "article_15_access"
    ARTICLE_17_ERASURE = "article_17_erasure"
    ARTICLE_20_PORTABILITY = "article_20_portability"
    UNKNOWN = "unknown"


class DSRIntake(BaseModel):
    request_id: str
    jurisdiction: str
    raw_payload: str  # kept exactly as received so the HMAC can be verified
    auth_token: Optional[str] = None
    hmac_signature: Optional[str] = None

    @property
    def normalized_payload(self) -> str:
        # Strip tags, decode entities, lowercase. Used for intent matching and
        # correlation only, never for signature verification.
        text = re.sub(r"<[^>]+>", " ", self.raw_payload)
        return html.unescape(text).lower().strip()


class Article15Router:
    # Patterns are checked in insertion order, so ACCESS wins when a payload
    # matches several (e.g. "export my data as csv"). The leading \b keeps
    # "port" from matching inside "support" or "report".
    INTENT_PATTERNS = {
        DSRIntent.ARTICLE_15_ACCESS: re.compile(
            r"\b(access|download|export|show.*data|what.*(know|hold)|account.*details)"
        ),
        DSRIntent.ARTICLE_17_ERASURE: re.compile(
            r"\b(delete|erase|remove|forget|wipe)"
        ),
        DSRIntent.ARTICLE_20_PORTABILITY: re.compile(
            r"\b(port|transfer|csv|json.*export|move.*data)"
        ),
    }

    def __init__(self, config: Dict):
        self.allowed_jurisdictions = set(config.get("gdpr_regions", []))
        self.hmac_secret = config.get("hmac_secret", "").encode()
        self.logger = logger

    def _verify_hmac(self, payload: str, signature: Optional[str]) -> bool:
        # Fail closed when the signature or the secret is missing.
        if not signature or not self.hmac_secret:
            return False
        expected = hmac.new(self.hmac_secret, payload.encode(), hashlib.sha256).hexdigest()
        # Compare as bytes: compare_digest rejects non-ASCII str arguments.
        return hmac.compare_digest(expected.encode(), signature.encode())

    def classify_intent(self, payload: str) -> DSRIntent:
        for intent, pattern in self.INTENT_PATTERNS.items():
            if pattern.search(payload):
                return intent
        return DSRIntent.UNKNOWN

    def generate_correlation_id(self, request_id: str, payload: str) -> str:
        seed = f"{request_id}:{payload}".encode()
        return hashlib.sha256(seed).hexdigest()[:16]

    # Only ConnectionError is retried (raised by downstream calls once the
    # aggregation pipeline is wired in); a bad signature fails immediately.
    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=2, max=10),
        retry=retry_if_exception_type(ConnectionError),
        reraise=True
    )
    def route_request(self, intake: DSRIntake) -> Dict:
        payload = intake.normalized_payload
        correlation_id = self.generate_correlation_id(intake.request_id, payload)

        # Verify the signature over the payload exactly as the sender signed it.
        if not self._verify_hmac(intake.raw_payload, intake.hmac_signature):
            self.logger.warning("Auth failure: %s", correlation_id)
            raise ValueError("Invalid cryptographic signature")

        if intake.jurisdiction not in self.allowed_jurisdictions:
            self.logger.info("Jurisdiction fallback: %s -> %s", correlation_id, intake.jurisdiction)
            return {"status": "routed_to_fallback", "correlation_id": correlation_id}

        intent = self.classify_intent(payload)
        if intent != DSRIntent.ARTICLE_15_ACCESS:
            self.logger.info("Non-Article 15 intent detected: %s | %s", intent.value, correlation_id)
            return {"status": "misclassified", "intent": intent, "correlation_id": correlation_id}

        # Proceed to secure aggregation pipeline
        self.logger.info("Article 15 gate passed: %s", correlation_id)
        return {
            "status": "approved",
            "intent": intent,
            "correlation_id": correlation_id,
            "sla_period_months": 1,  # Article 12(3): one month from receipt
        }

The routing gate rejects, falls back, or approves through a deterministic sequence of checks:

flowchart TD
    A["DSR intake"] --> B["Normalize payload - strip HTML, lowercase"]
    B --> C{"HMAC signature valid?"}
    C -->|no| R["Reject - auth failure"]
    C -->|yes| D{"Jurisdiction allowed?"}
    D -->|no| F["Route to fallback"]
    D -->|yes| E["Classify intent by regex"]
    E --> G{"Article 15 access?"}
    G -->|no| M["Flag misclassified"]
    G -->|yes| H["Approved - one-month SLA"]

4. Fallback Routing & Dead-Letter Escalation

When classification is inconclusive (the regex router returns UNKNOWN, or a scoring classifier falls below a fixed confidence threshold), or when identity proofing fails, the pipeline must trigger a structured escalation workflow. This involves routing to a human-in-the-loop review queue, attaching the original payload hash, and preserving the audit trail. Fallback paths must maintain strict idempotency to prevent duplicate processing during retry storms or network partitions.

Production systems should implement a circuit-breaker pattern around downstream extraction services. If the data residency validator or identity provider returns transient errors, the request is serialized to a message broker (e.g., RabbitMQ, AWS SQS) with a retry_count header. Once the retry budget is exhausted, the payload transitions to a DLQ with a MANUAL_REVIEW_REQUIRED status tag. Compliance officers can then inspect the DLQ via a read-only audit dashboard. To keep PII out of triage, the dashboard should surface payload hashes and metadata rather than raw payloads.
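The retry budget and DLQ transition can be illustrated with an in-memory queue standing in for the broker. This is a simplified sketch: `MAX_RETRIES`, the message shape, and the `extract` callable are hypothetical, and the circuit breaker and redelivery delays that a real deployment needs are left out for brevity.

```python
from collections import deque

MAX_RETRIES = 3  # assumed retry budget


def drain(queue: deque, dlq: list, extract) -> list:
    """Process messages until the queue is empty; return successful results."""
    results = []
    while queue:
        message = queue.popleft()
        try:
            results.append(extract(message["body"]))
        except ConnectionError:
            # Transient failure: count the attempt, then requeue or escalate.
            message["retry_count"] += 1
            if message["retry_count"] >= MAX_RETRIES:
                message["status"] = "MANUAL_REVIEW_REQUIRED"
                dlq.append(message)
            else:
                queue.append(message)  # a real broker would redeliver after a delay
    return results
```

Because the counter travels with the message, any worker that picks it up can tell how much of the budget remains without consulting shared state.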

5. SLA Enforcement & Idempotent Execution

The response window for an Article 15 request is set by Article 12(3): the controller must respond without undue delay and at the latest within one month of receipt, extendable by two further months where necessary given the complexity and number of requests. The routing layer must attach a UTC receipt timestamp, calculate the SLA deadline, and propagate it through the execution graph. All state transitions must be logged to an immutable audit store, capturing the exact moment of classification, jurisdictional validation, and downstream handoff.
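Because the statutory period in Article 12(3) is expressed in calendar months rather than a fixed number of days, a deadline calculation is easier to reason about with calendar arithmetic. The helper below is a sketch under that assumption. The function names are hypothetical, and how a controller treats weekends or public holidays is a legal question the code does not settle.

```python
import calendar
from datetime import datetime, timezone


def add_months(start: datetime, months: int) -> datetime:
    """Add calendar months, clamping to the last day of a shorter month."""
    index = start.month - 1 + months
    year, month = start.year + index // 12, index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return start.replace(year=year, month=month, day=day)


def sla_deadline(received_at: datetime, extended: bool = False) -> datetime:
    """One month from receipt, or three months in total if extended."""
    if received_at.tzinfo is None:
        raise ValueError("received_at must be timezone-aware")
    return add_months(received_at.astimezone(timezone.utc), 3 if extended else 1)
```

A request received on 31 January therefore falls due on the last day of February, which is earlier than a flat 30-day or 720-hour offset would suggest.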

Idempotency is enforced by hashing the normalized payload alongside the request ID. Subsequent duplicate submissions within the SLA window are intercepted at the gate and return the existing correlation_id without re-triggering aggregation. This prevents SLA clock resets and eliminates redundant compute costs. Legal baselines for these constraints are explicitly defined in the official Regulation (EU) 2016/679 (GDPR) text, which mandates transparent processing timelines and documented justification for any scope limitations.
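The duplicate-interception behaviour described above might look like the following. It is a minimal sketch: the `IdempotencyGate` class, its in-memory dictionary, and the length-prefixed key format are assumptions for illustration, and a production gate would use a shared store with an expiry tied to the SLA window.

```python
import hashlib


class IdempotencyGate:
    """Hypothetical in-memory gate keyed on request ID plus normalized payload."""

    def __init__(self):
        self._seen = {}  # idempotency key -> correlation_id

    @staticmethod
    def key(request_id: str, normalized_payload: str) -> str:
        # Length-prefix the ID so ("a:", "b") and ("a", ":b") hash differently.
        seed = f"{len(request_id)}:{request_id}:{normalized_payload}".encode()
        return hashlib.sha256(seed).hexdigest()

    def admit(self, request_id: str, normalized_payload: str):
        """Return (correlation_id, is_new); duplicates reuse the stored ID."""
        k = self.key(request_id, normalized_payload)
        if k in self._seen:
            return self._seen[k], False
        self._seen[k] = k[:16]
        return k[:16], True
```

Only a submission for which `is_new` is true should start aggregation or set an SLA deadline; a duplicate simply receives the correlation ID that was already issued.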

By embedding deterministic routing, cryptographic verification, and bounded fallback logic directly into the intake layer, privacy engineering teams can guarantee that every Article 15 request is classified accurately, routed securely, and executed within regulatory boundaries. The architectural blueprint for these workflows is comprehensively documented in DSR Architecture & Intake Routing, providing the foundational patterns required for scalable, audit-ready compliance automation.