tech.transparencia.document.item

transparencia.pds.transparencia.tech

Documentation

Core document metadata for official and institutional documents. Stores identity, provenance, and public context, but not full text, sections, chunks, AI analysis, or ingestion pipeline state.

main record

Record Key: tid (timestamp-based ID)

Properties

country string Optional

Primary country connected to the document, as an ISO 3166-1 alpha-2 code (e.g., 'MX', 'BR', 'US'). Omit for international documents.

maxLength: 2 bytes
createdAt string datetime Required

When this AT Protocol record was created.

description string Optional

Short source-provided description or human-readable abstract. AI summaries should be stored in enrichment records.

maxLength: 10000 bytes; maxGraphemes: 2000 graphemes
documentType string Required

Machine-readable document category. Open set; known values cover common official and institutional documents.

maxLength: 128 bytes
Known values: official-publication, official-gazette-issue, official-gazette-entry, law, decree, agreement, notice, regulation, standard, report, audit-report, budget-document, contract, procurement-document, court-ruling, legislative-bill, legislative-opinion, treaty, submission, technical-paper, environmental-impact-document, education-policy-document, dataset-documentation, meeting-minutes, resolution, other
domains array of string Optional

Broad public-interest domains covered by the document. Open set; consumers should tolerate unknown values.

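maxLength: 20 items
Item maxLength: 128 bytes; item maxGraphemes: 64 graphemes
Known values: government, politics, law, justice, environment, climate, education, health, budget, procurement, economy, finance, labor, energy, infrastructure, security, science-technology, society, human-rights, other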
effectiveAt string datetime Optional

When the document's legal or administrative effects begin, if applicable and explicitly known.

identifiers array of ref #identifier Optional

External identifiers such as DOF IDs, UNFCCC symbols, file numbers, docket numbers, ISBNs, or local archival IDs. For content hashes use retrieval.sha256; for URLs use retrieval.url.

maxLength: 50 items
issuedAt string datetime Optional

When the issuing authority signed, issued, adopted, or approved the document, if different from publication time.

issuingBodies array of ref tech.transparencia.defs#organization Optional

Organizations, public bodies, institutions, or authorities responsible for issuing, publishing, filing, or adopting the document. Uses the shared tech.transparencia.defs#organization type. Conventional role values include 'publisher', 'issuer', 'author', 'adopter', 'filer', 'regulator', 'court', 'legislature', 'repository'.

maxLength: 20 items
jurisdiction string Optional

Legal or administrative jurisdiction covered by the document (e.g., 'federal', 'state', 'municipal', 'international').

maxLength: 256 bytes; maxGraphemes: 64 graphemes
Known values: local, municipal, state, federal, national, regional, international, supranational, unknown
language string language Optional

Primary language of the document content (BCP-47, e.g., 'es-MX', 'en', 'pt-BR').

publishedAt string datetime Required

When the document was published by the source. Use midnight UTC when only a calendar date is available.

retrieval ref #retrieval Required

Per-document retrieval metadata: canonical URLs, MIME type, checksums, file size, and access status of the specific retrieved representation.

source ref com.atproto.repo.strongRef Required

Strong reference to the tech.transparencia.document.source record for the publisher or repository (e.g., DOF, UNFCCC). Identifies which source this document came from.

subtitle string Optional

Optional subtitle, section heading, or secondary title.

maxLength: 4096 bytes; maxGraphemes: 1024 graphemes
title string Required

Official or source-provided title of the document.

maxLength: 4096 bytes; maxGraphemes: 1024 graphemes
topics array of string Optional

Free-form topics, tags, or source categories attached to the document.

maxLength: 30 items
Item maxLength: 512 bytes; item maxGraphemes: 128 graphemes
updatedAt string datetime Optional

When this record was last materially updated.
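
The publishedAt guidance above (midnight UTC when only a calendar date is known) can be sketched as a small helper. This is a minimal illustration, not part of the lexicon: the function name date_to_published_at is hypothetical, and the millisecond-precision "Z" form is one valid way to write the datetime string.

```python
from datetime import date, datetime, time, timezone


def date_to_published_at(day: date) -> str:
    """Return a datetime string for midnight UTC on the given calendar date."""
    midnight = datetime.combine(day, time.min, tzinfo=timezone.utc)
    # Write UTC with a trailing "Z" instead of "+00:00".
    return midnight.strftime("%Y-%m-%dT%H:%M:%S.000Z")


print(date_to_published_at(date(2024, 3, 15)))  # 2024-03-15T00:00:00.000Z
```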

Raw schema
{
  "key": "tid",
  "type": "record",
  "record": {
    "type": "object",
    "required": [
      "title",
      "documentType",
      "source",
      "retrieval",
      "publishedAt",
      "createdAt"
    ],
    "properties": {
      "title": {
        "type": "string",
        "maxLength": 4096,
        "description": "Official or source-provided title of the document.",
        "maxGraphemes": 1024
      },
      "source": {
        "ref": "com.atproto.repo.strongRef",
        "type": "ref",
        "description": "Strong reference to the tech.transparencia.document.source record for the publisher or repository (e.g., DOF, UNFCCC). Identifies which source this document came from."
      },
      "topics": {
        "type": "array",
        "items": {
          "type": "string",
          "maxLength": 512,
          "maxGraphemes": 128
        },
        "maxLength": 30,
        "description": "Free-form topics, tags, or source categories attached to the document."
      },
      "country": {
        "type": "string",
        "maxLength": 2,
        "description": "Primary country connected to the document, as an ISO 3166-1 alpha-2 code (e.g., 'MX', 'BR', 'US'). Omit for international documents."
      },
      "domains": {
        "type": "array",
        "items": {
          "type": "string",
          "maxLength": 128,
          "knownValues": [
            "government",
            "politics",
            "law",
            "justice",
            "environment",
            "climate",
            "education",
            "health",
            "budget",
            "procurement",
            "economy",
            "finance",
            "labor",
            "energy",
            "infrastructure",
            "security",
            "science-technology",
            "society",
            "human-rights",
            "other"
          ],
          "maxGraphemes": 64
        },
        "maxLength": 20,
        "description": "Broad public-interest domains covered by the document. Open set; consumers should tolerate unknown values."
      },
      "issuedAt": {
        "type": "string",
        "format": "datetime",
        "description": "When the issuing authority signed, issued, adopted, or approved the document, if different from publication time."
      },
      "language": {
        "type": "string",
        "format": "language",
        "description": "Primary language of the document content (BCP-47, e.g., 'es-MX', 'en', 'pt-BR')."
      },
      "subtitle": {
        "type": "string",
        "maxLength": 4096,
        "description": "Optional subtitle, section heading, or secondary title.",
        "maxGraphemes": 1024
      },
      "createdAt": {
        "type": "string",
        "format": "datetime",
        "description": "When this AT Protocol record was created."
      },
      "retrieval": {
        "ref": "#retrieval",
        "type": "ref",
        "description": "Per-document retrieval metadata: canonical URLs, MIME type, checksums, file size, and access status of the specific retrieved representation."
      },
      "updatedAt": {
        "type": "string",
        "format": "datetime",
        "description": "When this record was last materially updated."
      },
      "description": {
        "type": "string",
        "maxLength": 10000,
        "description": "Short source-provided description or human-readable abstract. AI summaries should be stored in enrichment records.",
        "maxGraphemes": 2000
      },
      "effectiveAt": {
        "type": "string",
        "format": "datetime",
        "description": "When the document's legal or administrative effects begin, if applicable and explicitly known."
      },
      "identifiers": {
        "type": "array",
        "items": {
          "ref": "#identifier",
          "type": "ref"
        },
        "maxLength": 50,
        "description": "External identifiers such as DOF IDs, UNFCCC symbols, file numbers, docket numbers, ISBNs, or local archival IDs. For content hashes use retrieval.sha256; for URLs use retrieval.url."
      },
      "publishedAt": {
        "type": "string",
        "format": "datetime",
        "description": "When the document was published by the source. Use midnight UTC when only a calendar date is available."
      },
      "documentType": {
        "type": "string",
        "maxLength": 128,
        "description": "Machine-readable document category. Open set; known values cover common official and institutional documents.",
        "knownValues": [
          "official-publication",
          "official-gazette-issue",
          "official-gazette-entry",
          "law",
          "decree",
          "agreement",
          "notice",
          "regulation",
          "standard",
          "report",
          "audit-report",
          "budget-document",
          "contract",
          "procurement-document",
          "court-ruling",
          "legislative-bill",
          "legislative-opinion",
          "treaty",
          "submission",
          "technical-paper",
          "environmental-impact-document",
          "education-policy-document",
          "dataset-documentation",
          "meeting-minutes",
          "resolution",
          "other"
        ]
      },
      "jurisdiction": {
        "type": "string",
        "maxLength": 256,
        "description": "Legal or administrative jurisdiction covered by the document (e.g., 'federal', 'state', 'municipal', 'international').",
        "knownValues": [
          "local",
          "municipal",
          "state",
          "federal",
          "national",
          "regional",
          "international",
          "supranational",
          "unknown"
        ],
        "maxGraphemes": 64
      },
      "issuingBodies": {
        "type": "array",
        "items": {
          "ref": "tech.transparencia.defs#organization",
          "type": "ref"
        },
        "maxLength": 20,
        "description": "Organizations, public bodies, institutions, or authorities responsible for issuing, publishing, filing, or adopting the document. Uses the shared tech.transparencia.defs#organization type. Conventional role values include 'publisher', 'issuer', 'author', 'adopter', 'filer', 'regulator', 'court', 'legislature', 'repository'."
      }
    }
  },
  "description": "Core document metadata for official and institutional documents. Stores identity, provenance, and public context, but not full text, sections, chunks, AI analysis, or ingestion pipeline state."
}
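
To make the required fields concrete, here is a hedged sketch of a minimal record and a check for missing required fields. The helper name missing_required is hypothetical, and every value in the sample record (title, DID, record key, CID, URLs, timestamps) is a placeholder rather than real data. The uri and cid keys under source follow the com.atproto.repo.strongRef shape.

```python
# Required field names copied from the schema above.
REQUIRED_FIELDS = ("title", "documentType", "source", "retrieval", "publishedAt", "createdAt")
RETRIEVAL_REQUIRED = ("url", "retrievedAt")


def missing_required(record: dict) -> list[str]:
    """List required fields that are absent from a document.item record."""
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    retrieval = record.get("retrieval")
    # Only inspect the nested object when it is present.
    if isinstance(retrieval, dict):
        missing += [f"retrieval.{name}" for name in RETRIEVAL_REQUIRED if name not in retrieval]
    return missing


# Placeholder record: all values below are illustrative only.
record = {
    "$type": "tech.transparencia.document.item",
    "title": "Example decree title",
    "documentType": "decree",
    "source": {
        "uri": "at://did:example:publisher/tech.transparencia.document.source/3exampletid",
        "cid": "bafyexamplecid",
    },
    "retrieval": {
        "url": "https://example.org/doc/1",
        "retrievedAt": "2024-03-16T08:30:00.000Z",
    },
    "publishedAt": "2024-03-15T00:00:00.000Z",
    "createdAt": "2024-03-16T08:31:00.000Z",
}

print(missing_required(record))  # []
```

This only checks presence. It does not validate formats, lengths, or the referenced source record.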
identifier object

External identifier assigned to a document by a source system, authority, archive, or standard. For content hashes use retrieval.sha256; for URLs use retrieval.url.

Properties

type string Required

Identifier type or namespace.

maxLength: 128 bytes
Known values: dof_id, dof_publication_id, unfccc_symbol, official_file_number, docket_number, case_number, law_number, isbn, issn, doi, other
url string uri Optional

Optional URL where this identifier can be resolved or verified.

value string Required

Identifier value.

maxLength: 1024 bytes; maxGraphemes: 256 graphemes
Raw schema
{
  "type": "object",
  "required": [
    "type",
    "value"
  ],
  "properties": {
    "url": {
      "type": "string",
      "format": "uri",
      "description": "Optional URL where this identifier can be resolved or verified."
    },
    "type": {
      "type": "string",
      "maxLength": 128,
      "description": "Identifier type or namespace.",
      "knownValues": [
        "dof_id",
        "dof_publication_id",
        "unfccc_symbol",
        "official_file_number",
        "docket_number",
        "case_number",
        "law_number",
        "isbn",
        "issn",
        "doi",
        "other"
      ]
    },
    "value": {
      "type": "string",
      "maxLength": 1024,
      "description": "Identifier value.",
      "maxGraphemes": 256
    }
  },
  "description": "External identifier assigned to a document by a source system, authority, archive, or standard. For content hashes use retrieval.sha256; for URLs use retrieval.url."
}
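
The identifier constraints can be checked along these lines. This is a sketch under stated assumptions: the function names are hypothetical, maxLength is measured in UTF-8 bytes as the listing above states, and maxGraphemes is deliberately not checked because the Python standard library has no grapheme segmentation. Since the type field is an open set, an unknown type is reported separately rather than treated as an error.

```python
# Known values copied from the identifier schema above.
KNOWN_IDENTIFIER_TYPES = {
    "dof_id", "dof_publication_id", "unfccc_symbol", "official_file_number",
    "docket_number", "case_number", "law_number", "isbn", "issn", "doi", "other",
}


def check_identifier(identifier: dict) -> list[str]:
    """Return a list of problems with required fields and byte-length limits."""
    problems = []
    for name, max_bytes in (("type", 128), ("value", 1024)):
        text = identifier.get(name)
        if not isinstance(text, str):
            problems.append(f"{name}: required string is missing")
        elif len(text.encode("utf-8")) > max_bytes:
            problems.append(f"{name}: exceeds {max_bytes} UTF-8 bytes")
    return problems


def is_known_type(identifier: dict) -> bool:
    """True when the type is one of the listed known values (others are still allowed)."""
    return identifier.get("type") in KNOWN_IDENTIFIER_TYPES


example = {"type": "doi", "value": "10.1234/example"}  # placeholder DOI
print(check_identifier(example), is_known_type(example))  # [] True
```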
retrieval object

Per-document retrieval metadata for a single retrieved representation. Publisher-level metadata (name, base URL, license) lives on the tech.transparencia.document.source record referenced by 'source'.

Properties

accessType string Optional

Access status of the source at retrieval time. Use 'previously-public' for documents that were once publicly accessible but have since been withdrawn or removed by the source.

maxLength: 64 bytes
Known values: public, restricted, paywalled, previously-public, unknown
blob blob Optional

Optional binary attachment preserving the actual document bytes on this PDS. Typical contents: the source PDF, an HTML snapshot, the extracted plain text used by the enrichment pipeline, or the original upstream JSON payload (e.g., a SIDOF response). The other retrieval fields (url, pdfUrl, sha256) still reference the original public source — this blob is the archived copy. Max 50 MB.

maxSize: 50.0 MB
canonicalUrl string uri Optional

Canonical, normalized, or preferred public URL for the document.

fileName string Optional

Original or normalized file name, if applicable.

maxLength: 1024 bytes; maxGraphemes: 256 graphemes
htmlUrl string uri Optional

HTML landing page or web version of the document, if available.

license string Optional

Per-document license override, if the document is licensed differently from the source-level default.

maxLength: 512 bytes; maxGraphemes: 128 graphemes
mimeType string Optional

MIME type of the retrieved representation (e.g., 'text/html', 'application/pdf').

maxLength: 128 bytes
pdfUrl string uri Optional

PDF or downloadable document URL, if available.

retrievedAt string datetime Required

When the source was retrieved by the pipeline.

sha256 string Optional

SHA-256 checksum of the retrieved file or canonical source payload, if available.

maxLength: 64 bytes
sizeBytes integer Optional

Size of the retrieved file or canonical representation in bytes.

minimum: 0
sourceId string Optional

Source-system identifier for deduplication, if provided by the upstream source.

maxLength: 512 bytes
url string uri Required

URL where this document was found or retrieved.

Raw schema
{
  "type": "object",
  "required": [
    "url",
    "retrievedAt"
  ],
  "properties": {
    "url": {
      "type": "string",
      "format": "uri",
      "description": "URL where this document was found or retrieved."
    },
    "blob": {
      "type": "blob",
      "accept": [
        "application/pdf",
        "text/html",
        "text/plain",
        "application/json"
      ],
      "maxSize": 50000000,
      "description": "Optional binary attachment preserving the actual document bytes on this PDS. Typical contents: the source PDF, an HTML snapshot, the extracted plain text used by the enrichment pipeline, or the original upstream JSON payload (e.g., a SIDOF response). The other retrieval fields (url, pdfUrl, sha256) still reference the original public source — this blob is the archived copy. Max 50 MB."
    },
    "pdfUrl": {
      "type": "string",
      "format": "uri",
      "description": "PDF or downloadable document URL, if available."
    },
    "sha256": {
      "type": "string",
      "maxLength": 64,
      "description": "SHA-256 checksum of the retrieved file or canonical source payload, if available."
    },
    "htmlUrl": {
      "type": "string",
      "format": "uri",
      "description": "HTML landing page or web version of the document, if available."
    },
    "license": {
      "type": "string",
      "maxLength": 512,
      "description": "Per-document license override, if the document is licensed differently from the source-level default.",
      "maxGraphemes": 128
    },
    "fileName": {
      "type": "string",
      "maxLength": 1024,
      "description": "Original or normalized file name, if applicable.",
      "maxGraphemes": 256
    },
    "mimeType": {
      "type": "string",
      "maxLength": 128,
      "description": "MIME type of the retrieved representation (e.g., 'text/html', 'application/pdf')."
    },
    "sourceId": {
      "type": "string",
      "maxLength": 512,
      "description": "Source-system identifier for deduplication, if provided by the upstream source."
    },
    "sizeBytes": {
      "type": "integer",
      "minimum": 0,
      "description": "Size of the retrieved file or canonical representation in bytes."
    },
    "accessType": {
      "type": "string",
      "maxLength": 64,
      "description": "Access status of the source at retrieval time. Use 'previously-public' for documents that were once publicly accessible but have since been withdrawn or removed by the source.",
      "knownValues": [
        "public",
        "restricted",
        "paywalled",
        "previously-public",
        "unknown"
      ]
    },
    "retrievedAt": {
      "type": "string",
      "format": "datetime",
      "description": "When the source was retrieved by the pipeline."
    },
    "canonicalUrl": {
      "type": "string",
      "format": "uri",
      "description": "Canonical, normalized, or preferred public URL for the document."
    }
  },
  "description": "Per-document retrieval metadata for a single retrieved representation. Publisher-level metadata (name, base URL, license) lives on the tech.transparencia.document.source record referenced by 'source'."
}
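
As an illustration of how the retrieval fields relate to the retrieved bytes, the sketch below derives sha256 and sizeBytes from a payload and checks whether that payload would fit the blob constraints. The function names are hypothetical. The lowercase hex encoding of the checksum is an assumption: the schema only caps sha256 at 64 bytes, which matches a hex digest. Uploading the blob itself is out of scope here.

```python
import hashlib

# Limits copied from the blob definition in the schema above.
MAX_BLOB_BYTES = 50_000_000
ACCEPTED_BLOB_TYPES = {"application/pdf", "text/html", "text/plain", "application/json"}


def build_retrieval(url: str, retrieved_at: str, payload: bytes, mime_type: str) -> dict:
    """Build a retrieval object with checksum and size computed from the payload."""
    return {
        "url": url,
        "retrievedAt": retrieved_at,
        "mimeType": mime_type,
        # Assumption: checksum stored as a 64-character lowercase hex digest.
        "sha256": hashlib.sha256(payload).hexdigest(),
        "sizeBytes": len(payload),
    }


def can_attach_blob(payload: bytes, mime_type: str) -> bool:
    """Check the payload against the blob's accepted MIME types and size cap."""
    return mime_type in ACCEPTED_BLOB_TYPES and len(payload) <= MAX_BLOB_BYTES


payload = b"%PDF-1.7 placeholder bytes"  # illustrative payload, not a real PDF
retrieval = build_retrieval(
    "https://example.org/doc/1.pdf", "2024-03-16T08:30:00.000Z", payload, "application/pdf"
)
print(len(retrieval["sha256"]), retrieval["sizeBytes"], can_attach_blob(payload, "application/pdf"))
```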

Lexicon Garden
