Fact-based, Trustworthy AI

A simple solution that is easy to implement, tamper-proof and future-ready.

1. Problem Statement

In 2023, the Internet started buzzing with AI agents, crawlers and language models all vying for their moment of glory.

We've learned that they can be clever, creative, productive and resourceful.

As with all new technologies, new challenges also emerged.

Too often their outputs are low quality and out of touch, and lack integrity and attribution.

Content creators need a way to verify and demonstrate ownership, receive attribution and even negotiate automatic compensation.

Teams building AI struggle to know which content is fair use, whether it is permissively licensed, and so on.

We consider the following challenges, which AI must address if society as a whole is to benefit.

  1. Data Integrity and Reliability: Without proper mechanisms in place to ensure data integrity, AI systems may encounter inaccuracies, inconsistencies, or even deliberate misinformation in the raw data they process. This can lead to flawed outputs and unreliable decision-making processes.
  2. Attribution and Trustworthiness: Raw data often lacks clear attribution, making it difficult for AI systems to determine the source or credibility of the information. This lack of trustworthiness can result in AI systems incorporating unreliable or biased data into their models, leading to skewed outcomes.
  3. Ethical Considerations: Raw data may contain sensitive or controversial content that raises ethical concerns. AI systems need to be equipped with ethical frameworks and guidelines to handle such data responsibly, ensuring that they do not perpetuate harmful biases or stereotypes.
  4. Legal and Regulatory Compliance: Data lacking ethical or commercial context may also raise legal and regulatory issues, particularly in terms of data privacy, intellectual property rights, and content licensing. AI systems must navigate these complexities to ensure compliance with relevant laws and regulations.
  5. Contextual Understanding: Without proper contextual information, AI systems may struggle to interpret and analyze raw data accurately. Ethical and commercial context provides valuable insights into the intended meaning, relevance, and implications of the data, enabling more informed decision-making.
  6. Bias and Fairness: Raw data, especially when unfiltered, may inadvertently reflect societal biases or prejudices. AI systems trained on such data risk perpetuating or exacerbating these biases in their outputs, leading to unfair or discriminatory outcomes.
  7. Transparency and Explainability: Understanding the ethical and commercial context of data is essential for ensuring transparency and explainability in AI systems. Stakeholders need to understand how decisions are made and what factors influence AI-generated outputs to trust and effectively use these systems.

2. Project Objectives

2.1. Ensure Trustworthiness: Ensure that the facts are trustworthy and immutable - fostering trust among stakeholders.

2.2. Semantic Interoperability: Incorporate standards such as PROV-O, SKOS, and VoID annotations to enable seamless information exchange and collaboration across diverse domains and platforms.

2.3. Knowledge Curation: The fact web should serve as a structured repository for organizing and discovering relevant facts by adhering to standards, ontologies and vocabularies, such as PROV-O, SKOS, and VoID.

2.4. Transparent and Reproducible: By capturing provenance information using PROV-O, the fact web enables transparent and reproducible research. Auditors can trace the lineage of data and assertions, understand how they were derived or obtained, and verify their authenticity.
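
As a rough illustration of lineage tracing, the sketch below holds a few provenance links as plain Python data and follows `prov:wasDerivedFrom` back to the original source. The `ex:` identifiers are hypothetical, and a real fact web would store these statements as RDF rather than in a dictionary.

```python
# Hypothetical provenance records; the ex: identifiers are illustrative only.
# prov:wasDerivedFrom and prov:wasAttributedTo are terms from the PROV-O vocabulary.
PROVENANCE = {
    "ex:summary": {"prov:wasDerivedFrom": "ex:report", "prov:wasAttributedTo": "ex:analyst"},
    "ex:report": {"prov:wasDerivedFrom": "ex:dataset", "prov:wasAttributedTo": "ex:lab"},
    "ex:dataset": {"prov:wasAttributedTo": "ex:sensor-network"},
}


def trace_lineage(entity, records):
    """Follow prov:wasDerivedFrom links from an entity back to its root source."""
    lineage = [entity]
    seen = {entity}
    while True:
        parent = records.get(lineage[-1], {}).get("prov:wasDerivedFrom")
        if parent is None or parent in seen:  # stop at the root or on a cycle
            break
        lineage.append(parent)
        seen.add(parent)
    return lineage
```

Calling `trace_lineage("ex:summary", PROVENANCE)` returns the chain of entities an auditor would need to inspect, starting with the entity itself and ending at the root source.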

2.5. Reasoning and Analysis: The fact web facilitates automated reasoning and analysis. By representing data and relationships using standardized RDF, tools can infer new knowledge, detect patterns, and derive insights.
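
To make "infer new knowledge" concrete, here is a minimal sketch that applies two RDFS entailment rules to a set of triples until nothing new appears. The `ex:` terms are hypothetical, and a production system would more likely use an RDF store with a reasoner than a hand-written loop.

```python
# Hypothetical input triples (subject, predicate, object).
TRIPLES = {
    ("ex:Ebook", "rdfs:subClassOf", "ex:Book"),
    ("ex:Book", "rdfs:subClassOf", "ex:CreativeWork"),
    ("ex:moby-dick", "rdf:type", "ex:Ebook"),
}


def infer(triples):
    """Apply two RDFS rules repeatedly until no new triples are produced."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                if p2 != "rdfs:subClassOf" or o != s2:
                    continue
                if p == "rdfs:subClassOf":  # subClassOf is transitive
                    new.add((s, "rdfs:subClassOf", o2))
                elif p == "rdf:type":  # instances inherit superclasses
                    new.add((s, "rdf:type", o2))
        if new <= facts:  # fixed point reached
            return facts
        facts |= new
```

Starting from three asserted triples, `infer(TRIPLES)` also contains `("ex:moby-dick", "rdf:type", "ex:CreativeWork")`, a statement that was never written down explicitly.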

2.6. Privacy and Security: While promoting openness and transparency, maintain the privacy and security of sensitive information. Access control is part of the graph, which identifies confidential data, authorized agents, privacy regulations and ethical standards.
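
One way to read "access control is part of the graph" is that permissions are ordinary statements alongside the facts they protect. The sketch below assumes two hypothetical predicates, `my:confidential` and `my:authorizedAgent`, which are not taken from any standard vocabulary.

```python
# Hypothetical access-control statements stored as ordinary triples in the graph.
GRAPH = {
    ("ex:salary-fact", "my:confidential", "true"),
    ("ex:salary-fact", "my:authorizedAgent", "ex:auditor"),
    ("ex:press-release", "my:confidential", "false"),
}


def can_read(agent, fact, graph):
    """Allow reads of non-confidential facts, or of confidential facts by listed agents."""
    if (fact, "my:confidential", "true") not in graph:
        return True
    return (fact, "my:authorizedAgent", agent) in graph
```

Note that this sketch treats any fact not marked confidential as readable. A stricter deployment might deny by default instead.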

2.7. Collaboration and Sharing: By providing APIs, query interfaces, and visualization tools, the graph enables seamless collaboration and communication, accelerating the pace of discovery and innovation.

3. Fact Claims Architecture

In our architecture, a fact is a small atomic unit of knowledge - serialized in JSON-LD format.

The web links one fact to another - in the same or separate documents - a technique called linked data.
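
For illustration, a single fact and its link to another fact might look like the sketch below. The URLs and the organization are placeholders, and `linked_ids` is a hypothetical helper that lists the other facts a document points to.

```python
import json

# A hypothetical fact: one small JSON-LD document describing an organization.
# The @id values are placeholder URLs, not real resources.
fact = {
    "@context": {"schema": "https://schema.org/"},
    "@id": "https://example.org/facts/acme",
    "@type": "schema:Organization",
    "schema:name": "Acme Ltd",
    # A link to another fact, which may live in a separate document.
    "schema:parentOrganization": {"@id": "https://example.org/facts/acme-holdings"},
}


def linked_ids(node):
    """Collect every @id referenced by nested objects, excluding the node's own @id."""
    found = []
    for key, value in node.items():
        if key == "@context":
            continue
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict):
                if "@id" in item:
                    found.append(item["@id"])
                found.extend(linked_ids(item))
    return found


serialized = json.dumps(fact, indent=2)  # the form in which the fact would be published
```

An agent could use the result of `linked_ids(fact)` to decide which linked facts to retrieve next.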

We consider a grounded fact to be one stored on IPFS as an immutable record.

At runtime, URLs within the JSON-LD may be retrieved at the agent's discretion.

Unlike facts stored on IPFS, facts retrieved from the Internet are not tamper-proof. They are dynamic, so the agent may ingest them as required.
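
The difference comes from content-based addressing: the address of a record is derived from its bytes, so any change yields a different address. The sketch below shows the idea with a plain SHA-256 digest over a canonical JSON serialization. This is a simplification for illustration, not the actual IPFS CID format, and both function names are hypothetical.

```python
import hashlib
import json


def content_address(fact):
    """Return a SHA-256 hex digest of a canonical JSON serialization of the fact."""
    canonical = json.dumps(fact, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def is_untampered(fact, expected_address):
    """True when the fact still hashes to the address it was published under."""
    return content_address(fact) == expected_address
```

A fact fetched from a mutable URL can be checked against a previously recorded address in this way before an agent relies on it.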

The technical architecture for fact claims consists of several key components:

  • IPFS Network: A peer-to-peer network of nodes running IPFS software, facilitating the storage and retrieval of files using content-based addressing.
  • Fact Web: A graph data structure representing interconnected facts and trust claims within the decentralized system.
  • Trust Chains: Blockchain smart contracts can notarize an immutable chain of linked facts, asserting provenance permanently.
  • Smart Agents: Interface with the IPFS network and Internet for storage, retrieval, curation, inference, visualization and publication of fact webs.
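
The trust chain idea can be sketched without a blockchain: each entry commits to the hash of a fact and to the entry before it, so altering or reordering any fact breaks verification. This is an in-memory illustration under assumed names (`make_entry`, `append_fact`, `verify`), not a smart contract.

```python
import hashlib
import json


def digest(obj):
    """SHA-256 hex digest of a canonical JSON serialization."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def make_entry(fact, previous):
    """Build an entry that commits to a fact and to the previous entry's hash."""
    body = {"fact_hash": digest(fact), "previous": previous}
    return {**body, "entry_hash": digest(body)}


def append_fact(chain, fact):
    """Append a new entry for the fact to the end of the chain."""
    previous = chain[-1]["entry_hash"] if chain else None
    chain.append(make_entry(fact, previous))
    return chain


def verify(chain, facts):
    """Check that every entry matches its fact and links to the entry before it."""
    if len(chain) != len(facts):
        return False
    previous = None
    for entry, fact in zip(chain, facts):
        if entry != make_entry(fact, previous):
            return False
        previous = entry["entry_hash"]
    return True
```

In a deployed system the entries would be notarized on-chain, while the facts themselves could remain on IPFS.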

3.1. Conceptual Fact Web

With Linked Data, concepts and documents can be inter-related to describe almost anything.

graph TB;
  Organization["schema:Organization"] -->|schema:offers| Services["schema:Service"]
  Organization2 -->|my:supplier| Organization["schema:Organization"]
  Products["schema:Product"] -->|schema:hasPart| Outcomes["my:Outcomes"]
  Services -->|schema:hasPart| Outcomes
  Products -->|my:supplier| SupplyChain["schema:Product"]
  Services -->|my:provider| Organization2
  Organization -->|my:partners| Organization2["schema:Organization"]
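
The same diagram can be held as a list of subject-predicate-object triples and queried directly. The sketch below copies the edges from the graph above; the helper names `objects` and `reachable` are hypothetical.

```python
# Edges copied from the conceptual graph: (subject, predicate, object).
EDGES = [
    ("Organization", "schema:offers", "Services"),
    ("Organization2", "my:supplier", "Organization"),
    ("Products", "schema:hasPart", "Outcomes"),
    ("Services", "schema:hasPart", "Outcomes"),
    ("Products", "my:supplier", "SupplyChain"),
    ("Services", "my:provider", "Organization2"),
    ("Organization", "my:partners", "Organization2"),
]


def objects(subject, predicate, edges=EDGES):
    """Return every object linked from subject by the given predicate."""
    return [o for s, p, o in edges if s == subject and p == predicate]


def reachable(start, edges=EDGES):
    """Return every node reachable from start by following edges forward."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for s, _, o in edges:
            if s == node and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen
```

For example, `objects("Organization", "my:partners")` lists an organization's partners, and `reachable("Products")` shows everything a product is connected to.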

4. Solution Domains

  • AI Answer Engines require a new model for finding trusted facts.
  • Trust is paramount in assessing the credibility of information sources.
  • Fact claims in RDF format, enriched with metadata, serve as foundational elements.
  • Verifiable assertions backed by cryptographic proofs and smart contracts ensure trust.
  • Real-time algorithms verify dynamic and evolving fact claims.
  • Semantic coherence and trust supersede traditional SEO practices.

| Solution Domain | Use Cases |
| --- | --- |
| Ecommerce | Promote with Schema.org - Products, Events, Services, Offers, Loans/Credit |
| Content Creators | Employ metadata standards like Dublin Core and IPTC for multimedia curation - Creative Commons for licensing |
| Education | Ensure consistency through IMS Global Learning and SCORM - Learning Resource Metadata Initiative (LRMI) and CiTO (Citations) |
| Regulatory Compliance | Verifiable reports with XBRL, FIBO, and FinRegOnt - Ensure transparency and traceability using standards like PROV-O |
| Financial Auditing | Financial data exchange with XBRL - data auditability with PROV-O |
| Supply Chain Management | Traceability in supply chain processes with GS1 Standards - Utilize PROV-O for tracking changes |
| Healthcare Data Management | Standards like HL7 and FHIR for semantic healthcare data |
| Intellectual Property | Define digital rights with W3C ODRL for IP management |
| Research & Innovation | Track research activities with PROV-O and SKOS - cite using CiTO (Citations) |
| Environmental Sustainability | Model environmental data using OGC SOSA/SSN |
| Legal Contracts and Agreements | Model legal matters with Legal Core Ontology - Define digital rights using W3C ODRL |
| Identity and Access Management | Digital credentials with W3C VC and DID |
| Energy Trading and Grid Management | Energy market information with IEC CIM |
| Credential Verification | Credentials using W3C VC and Open Badges |
| Asset Tokenization and Management | Define tokenomics with ERC-20 and ERC-721 |
| News and Data-driven Narratives | Story-driven data exchange with NewsML-G2 and NITF |
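
Taking the Ecommerce row as an example, a product fact expressed with Schema.org might look like the sketch below. The product details are invented, and the `required` field list is an assumption made for this illustration, not a rule from Schema.org.

```python
import json

# A hypothetical product fact using Schema.org types and properties.
product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example Widget",
    "offers": {
        "@type": "Offer",
        "price": "19.99",
        "priceCurrency": "USD",
    },
}


def missing_fields(node, required=("@context", "@type", "name", "offers")):
    """Return the assumed-required fields that are absent from the node."""
    return [field for field in required if field not in node]


markup = json.dumps(product, indent=2)  # JSON-LD text ready for publication
```

A publisher could run a check such as `missing_fields(product)` before grounding the fact, so that incomplete records are caught early.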

5. References