The Context Markup Language (CML) is an HTML-based markup language designed to provide machine-readable semantic context for terms and concepts within web content, enhancing the ability of software agents, such as Large Language Models (LLMs), to disambiguate meaning. CML uses inline tags and attributes to link terms to semantic web URIs and Decentralized Identifiers (DIDs), supporting Human-Centric AI (HCAI) by ensuring accurate, transparent, and trustworthy inferences for natural persons.

This is an unofficial draft specification for the Context Markup Language (CML), proposed for discussion within a W3C Community Group. It has not been endorsed by the W3C or any official standards body. Feedback is welcome via the associated Community Group repository or mailing list.

Introduction

Software agents, particularly LLMs, often struggle to disambiguate terms with multiple meanings (e.g., "thongs" as footwear or underwear, "physician" as a medical doctor or a clinic). Existing semantic web standards, such as RDFa and JSON-LD, provide machine-readable metadata but are complex for authors and not optimized for LLMs or HCAI principles. The Context Markup Language (CML) addresses these gaps by introducing a simple, inline HTML markup syntax that links terms to semantic web URIs and DIDs, enabling precise context for software agents while remaining accessible to web authors.

CML supports HCAI by helping software agents infer meaning accurately, promoting transparency, and reducing false assumptions that could harm natural persons. It complements existing semantic web standards, mapping to RDF where applicable, and supports decentralized web technologies through DID integration. This specification defines the CML syntax, namespace, and processing model, with examples for web authors and agent developers.

Terminology

The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in [[RFC2119]].

CML
Context Markup Language, the markup language defined in this specification, distinct from Chemical Markup Language.
Software Agent
A program, such as an LLM, that processes web content to perform tasks like disambiguation or reasoning.
URI
A Uniform Resource Identifier, as defined in [[RFC3986]], used to reference semantic web resources.
DID
A Decentralized Identifier, as defined in [[DID-CORE]], used for decentralized context resolution.
Human-Centric AI (HCAI)
AI systems designed to prioritize the needs, rights, and well-being of natural persons, per regulatory and ethical frameworks such as the [[EU-AI-ACT]].

Namespace

The CML namespace is https://w3c.github.io/context-markup/ns#. All CML elements and attributes MUST use this namespace to avoid conflicts with other HTML or XML vocabularies, including Chemical Markup Language. In HTML documents, CML elements are prefixed with cml: (e.g., <cml:context>).

The namespace URI is a placeholder and will be finalized upon W3C Community Group adoption. It is distinct from namespaces used by Chemical Markup Language to prevent confusion.

Syntax

CML introduces two elements and one attribute for embedding context within HTML content:

<cml:context> Element

The <cml:context> element wraps a term or phrase and provides a machine-readable context block using attributes to form an RDF-like triple (subject, predicate, object).

Attributes
  • subject: A URI or DID identifying the term’s concept (REQUIRED).
  • predicate: A URI defining the relationship (e.g., http://schema.org/definedAs) (REQUIRED).
  • object: A literal or URI providing the context value (REQUIRED).
  • lang: An optional language tag (e.g., en) for the object, per [[BCP47]].

Example:

<p>
  Wear <cml:context subject="https://wikidata.org/entity/Q12955949" predicate="http://schema.org/definedAs" object="Flip-flops" lang="en">thongs</cml:context> for a beach day.
</p>
      

This example defines "thongs" as flip-flops (footwear), disambiguating it from underwear.
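As a minimal sketch of how an agent might read this element, the following Python uses the standard library's html.parser to collect the attributes of each <cml:context> together with the term it wraps. The names ContextExtractor and extract_contexts are hypothetical, and the choice of Python and html.parser is an assumption of this sketch, not part of CML.

```python
from html.parser import HTMLParser


class ContextExtractor(HTMLParser):
    """Collects the attributes and wrapped term of each <cml:context>."""

    REQUIRED = ("subject", "predicate", "object")

    def __init__(self):
        super().__init__()
        self.contexts = []  # finished records
        self._stack = []    # currently open <cml:context> elements

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names and keeps the "cml:" prefix.
        if tag == "cml:context":
            record = dict(attrs)
            record["term"] = ""
            self._stack.append(record)

    def handle_data(self, data):
        # Text inside an open <cml:context> is the term being annotated.
        for record in self._stack:
            record["term"] += data

    def handle_endtag(self, tag):
        if tag == "cml:context" and self._stack:
            record = self._stack.pop()
            record["term"] = " ".join(record["term"].split())
            # Keep only records carrying all three REQUIRED attributes.
            if all(record.get(name) for name in self.REQUIRED):
                self.contexts.append(record)


def extract_contexts(html):
    parser = ContextExtractor()
    parser.feed(html)
    parser.close()
    return parser.contexts
```

Applied to the example above, extract_contexts yields one record whose subject is the Wikidata URI, whose object is "Flip-flops", and whose term is "thongs".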

<cml:link> Element and cml:href Attribute

The <cml:link> element, or the cml:href attribute on an existing HTML element (e.g., <a>), references an external context resource such as a knowledge base or DID document. The rel attribute follows the link relation conventions defined in [[HTML]].

Attributes
  • cml:href: A URI or DID pointing to a context resource (REQUIRED).
  • rel: An optional relationship type (e.g., context), per [[HTML]].

Example:

<p>
  Buy our <cml:link cml:href="https://wikidata.org/entity/Q380339" rel="context">thongs</cml:link> for comfort.
</p>
      

This example references "thongs" as underwear via a Wikidata URI.
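The sketch below shows one way an agent could collect these references, whether they appear on <cml:link> or as a cml:href attribute on another element. LinkExtractor and extract_links are hypothetical names, and the "did:" prefix check is a simplifying assumption rather than full DID validation.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects cml:href references from <cml:link> and other elements."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        target = attributes.get("cml:href")
        if not target:
            return  # cml:href is REQUIRED; skip elements without it
        self.links.append({
            "element": tag,
            "target": target,
            "rel": attributes.get("rel"),           # None when absent
            "is_did": target.startswith("did:"),    # rough check only
        })


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links
```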

Processing Model

Software agents processing CML MUST follow these steps:

  1. Identify <cml:context> and <cml:link> elements or cml:href attributes within the HTML DOM.
  2. Extract attributes (subject, predicate, object, cml:href) and map them to an internal representation (e.g., RDF triples or JSON objects).
  3. Resolve URIs, and DIDs per [[DID-CORE]], to retrieve additional context where it is available.
  4. Use the extracted context to inform disambiguation or reasoning tasks, ensuring alignment with HCAI principles like accuracy and transparency.

Agents SHOULD handle malformed CML tags gracefully, ignoring invalid attributes or unresolved URIs/DIDs.
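The four steps, together with the lenient error handling, can be sketched as follows. This is an illustration under stated assumptions: CMLParser and process are hypothetical names, and resolve is a caller-supplied function standing in for whatever URI or DID resolution an agent actually uses.

```python
from html.parser import HTMLParser


class CMLParser(HTMLParser):
    """Steps 1-2: find CML markup and map it to plain dictionaries."""

    def __init__(self):
        super().__init__()
        self.triples = []     # from <cml:context>
        self.references = []  # from cml:href

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "cml:context":
            triple = {k: attributes.get(k)
                      for k in ("subject", "predicate", "object")}
            # Malformed element (a REQUIRED attribute is missing): ignore it.
            if all(triple.values()):
                triple["lang"] = attributes.get("lang")
                self.triples.append(triple)
        if attributes.get("cml:href"):
            self.references.append(attributes["cml:href"])


def process(html, resolve):
    """Run the four steps; `resolve` is a caller-supplied lookup function."""
    parser = CMLParser()
    parser.feed(html)
    parser.close()

    # Step 3: resolve each identifier once; failures are skipped, not fatal.
    identifiers = [t["subject"] for t in parser.triples] + parser.references
    resolved = {}
    for identifier in dict.fromkeys(identifiers):  # de-duplicate, keep order
        try:
            resolved[identifier] = resolve(identifier)
        except Exception:
            continue  # unresolved URI or DID: ignore it and carry on

    # Step 4: hand the collected context to the disambiguation task.
    return {"triples": parser.triples, "resolved": resolved}
```

Passing the resolver in as a parameter keeps network access and DID method support outside the parsing logic, so the same sketch works with a cache, a test stub, or a live resolver.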

Integration with Semantic Web and DIDs

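CML aligns with semantic web standards: the subject, predicate, and object attributes of <cml:context> form an RDF-like triple and map to RDF where applicable.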
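As one possible mapping to RDF, the sketch below serializes the attributes of a single <cml:context> element as an N-Triples line. The function name to_ntriples is hypothetical, and the rule for deciding whether an object value is a resource or a literal is an assumption of this sketch; the specification does not define one.

```python
def _is_iri(value):
    # Assumption: absolute http(s) IRIs and DIDs are treated as resources;
    # every other object value is serialized as a literal.
    return value.startswith(("http://", "https://", "did:"))


def _escape_literal(text):
    # Escape the characters that must be escaped inside N-Triples literals.
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def to_ntriples(subject, predicate, obj, lang=None):
    """Serialize one <cml:context> triple as a single N-Triples line."""
    if _is_iri(obj):
        object_term = f"<{obj}>"
    else:
        object_term = f'"{_escape_literal(obj)}"'
        if lang:
            object_term += f"@{lang}"  # language tag applies to literals only
    return f"<{subject}> <{predicate}> {object_term} ."


print(to_ntriples(
    "https://wikidata.org/entity/Q12955949",
    "http://schema.org/definedAs",
    "Flip-flops",
    "en",
))
# prints: <https://wikidata.org/entity/Q12955949> <http://schema.org/definedAs> "Flip-flops"@en .
```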

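CML supports decentralized context: the subject and cml:href attributes accept DIDs, which agents resolve per [[DID-CORE]] to retrieve additional context.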
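Because subject and cml:href accept either a URI or a DID, an agent needs to tell the two apart before choosing a resolution path. The following sketch checks a value against the DID syntax in [[DID-CORE]] (a lowercase method name followed by a method-specific identifier). The name classify_identifier is hypothetical, and limiting URIs to http and https is a simplifying assumption.

```python
import re

# idchar from the DID syntax: ALPHA / DIGIT / "." / "-" / "_" / pct-encoded
_IDCHAR = r"(?:[A-Za-z0-9._-]|%[0-9A-Fa-f]{2})"

# did = "did:" method-name ":" method-specific-id
DID_PATTERN = re.compile(rf"did:[a-z0-9]+:(?:{_IDCHAR}*:)*{_IDCHAR}+")


def classify_identifier(value):
    """Return ("did", method), ("uri", None), or ("invalid", None)."""
    if DID_PATTERN.fullmatch(value):
        method = value.split(":", 2)[1]
        return ("did", method)
    if value.startswith(("http://", "https://")):
        return ("uri", None)
    return ("invalid", None)
```

The returned method name (for example "example" or "web") is what an agent would use to select a DID method driver; values classified as invalid can be ignored, in line with the lenient handling of malformed markup.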

Human-Centric AI Alignment

CML supports Human-Centric AI (HCAI) principles, under which AI systems prioritize the needs, rights, and well-being of natural persons, per frameworks such as [[EU-AI-ACT]] and UNESCO’s Recommendation on the Ethics of Artificial Intelligence.

CML’s simplicity empowers content authors to contribute to semantic richness, democratizing HCAI benefits across diverse communities.

Examples

Disambiguating "thongs"

<p>
  Wear <cml:context subject="https://wikidata.org/entity/Q12955949" predicate="http://schema.org/definedAs" object="Flip-flops" lang="en">
    <a cml:href="did:example:Footwear123" href="https://example.com/flipflops">thongs</a>
  </cml:context> for a beach day.
</p>
<p>
  Our <cml:context subject="https://wikidata.org/entity/Q380339" predicate="http://schema.org/definedAs" object="Underwear">
    <cml:link cml:href="https://example.com/underwear" rel="context">thongs</cml:link>
  </cml:context> offer maximum comfort.
</p>
      

These examples clarify "thongs" as flip-flops (footwear) or underwear, with a DID for footwear and a URI for underwear.

Disambiguating "physician"

<p>
  Dr. Smith, a <cml:context subject="https://wikidata.org/entity/Q39631" predicate="http://schema.org/definedAs" object="Medical doctor" lang="en">
    <a cml:href="did:example:Doc123" href="https://example.com/dr-smith">physician</a>
  </cml:context>, specializes in cardiology.
</p>
<p>
  Visit our <cml:context subject="https://schema.org/Physician" predicate="http://schema.org/definedAs" object="Medical practice">
    <cml:link cml:href="https://example.com/clinic" rel="context">physician</cml:link>
  </cml:context> at 123 Main St.
</p>
      

These examples clarify "physician" as a person (Wikidata) or a place (Schema.org), with a DID for decentralized context.

Disambiguating "bank"

<p>
  Deposit money at the <cml:context subject="https://wikidata.org/entity/Q22687" predicate="http://schema.org/definedAs" object="Financial institution" lang="en">
    <a cml:href="did:example:Bank123" href="https://example.com/bank">bank</a>
  </cml:context> on Main St.
</p>
<p>
  Fish along the <cml:context subject="https://wikidata.org/entity/Q319686" predicate="http://schema.org/definedAs" object="Riverbank">
    <cml:link cml:href="https://example.com/riverbank" rel="context">bank</cml:link>
  </cml:context> of the river.
</p>
      

These examples clarify "bank" as a financial institution or a riverbank.

Disambiguating "apple"

<p>
  Eat an <cml:context subject="https://wikidata.org/entity/Q89" predicate="http://schema.org/definedAs" object="Fruit" lang="en">
    <a cml:href="did:example:Fruit123" href="https://example.com/apple-fruit">apple</a>
  </cml:context> for a healthy snack.
</p>
<p>
  The new <cml:context subject="https://wikidata.org/entity/Q312" predicate="http://schema.org/definedAs" object="Technology company">
    <cml:link cml:href="https://example.com/apple-inc" rel="context">Apple</cml:link>
  </cml:context> phone is innovative.
</p>
      

These examples clarify "apple" as a fruit or a technology company.
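The examples in this section share one pattern: the same surface term appears with different subject values. A sketch of how an agent might index those senses by term is shown below. SenseIndexer and build_sense_index are hypothetical names, and lowercasing the term is an assumption made so that "apple" and "Apple" fall under one key.

```python
from collections import defaultdict
from html.parser import HTMLParser


class SenseIndexer(HTMLParser):
    """Groups <cml:context> annotations by the term they wrap."""

    def __init__(self):
        super().__init__()
        self.senses = defaultdict(list)  # term -> [(subject, object), ...]
        self._open = []                  # stack of [subject, object, parts]

    def handle_starttag(self, tag, attrs):
        if tag == "cml:context":
            attributes = dict(attrs)
            self._open.append(
                [attributes.get("subject"), attributes.get("object"), []]
            )

    def handle_data(self, data):
        # Text nested inside <a> or <cml:link> still belongs to the context.
        for entry in self._open:
            entry[2].append(data)

    def handle_endtag(self, tag):
        if tag == "cml:context" and self._open:
            subject, obj, parts = self._open.pop()
            term = " ".join("".join(parts).split()).lower()
            if subject and obj and term:
                self.senses[term].append((subject, obj))


def build_sense_index(html):
    indexer = SenseIndexer()
    indexer.feed(html)
    indexer.close()
    return dict(indexer.senses)
```

For the "apple" example, the index holds a single key, "apple", with two entries: one for the fruit (Q89) and one for the technology company (Q312).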

Implementation Considerations

Successful adoption of CML depends on addressing the following challenges:

Security and Privacy Considerations

CML introduces minimal security risks, as it extends HTML with non-executable markup. However:

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. All CML implementations MUST conform to the syntax and processing model defined in this specification.

Acknowledgments

Thanks to the W3C Semantic Web Community, Decentralized Identifier Working Group, and HCAI researchers for inspiration and feedback. Special acknowledgment to the Wikidata and Schema.org communities for providing critical context resources.