The IDEA Ontology is an ontology for representing ideas, issues, arguments, and approaches in scientific research and scholarly communication.
Nipun D. Pathirage
https://opensource.org/licenses/MIT
2025-12-29
Tetherless World Constellation | Rensselaer Polytechnic Institute
IDEA Ontology
Scientific literature communicates knowledge not only through data and results but also through structured scholarly discourse, including motivations, problems, proposed ideas, supporting evidence, and claims. These abstract components are distributed across documents and connected through citations, making them difficult to extract and represent automatically. Existing scholarly knowledge graphs largely focus on metadata or citation networks, offering limited support for modeling scientific intent and argumentation.
The Idea Graph addresses this limitation by introducing an ontology-centered framework that captures both concrete document structure and abstract scholarly discourse in a unified knowledge graph. The framework integrates layout-aware document parsing, ontology-guided extraction with large language models, and structured RDF instantiation. In addition, it models provenance information for all knowledge graph construction operations and adopts the PAV ontology to manage versioning and lineage of extracted instances. This design enables interpretable, evolvable, and semantically grounded scholarly knowledge graphs suitable for advanced querying, reasoning, and machine-assisted knowledge discovery.
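The provenance and versioning design described above can be illustrated with a short sketch. The following Python/rdflib fragment attaches PROV-O and PAV statements to a hypothetical extracted instance; the class name idea:Idea, the instance IRIs under http://example.org/ideagraph/, and the extraction activity are illustrative assumptions rather than terms confirmed by the ontology.

```python
# Hedged sketch: attaching PROV-O and PAV provenance to a hypothetical
# extracted instance. Class names, instance IRIs, and the extraction
# activity below are illustrative assumptions, not confirmed IDEA terms.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, RDFS, PROV, XSD

IDEA = Namespace("https://tetherless-world.github.io/idea-ontology/ontology#")
PAV = Namespace("http://purl.org/pav/")
EX = Namespace("http://example.org/ideagraph/")  # hypothetical instance namespace

g = Graph()
g.bind("idea", IDEA)
g.bind("pav", PAV)
g.bind("prov", PROV)

idea = EX["idea/42"]               # hypothetical extracted discourse element
activity = EX["extraction/7"]      # hypothetical LLM extraction run
paper = EX["paper/example-paper"]  # hypothetical source document

# The extracted instance and its human-readable label.
g.add((idea, RDF.type, IDEA.Idea))  # assumed class name
g.add((idea, RDFS.label, Literal("Proposed idea extracted from the paper")))

# PROV-O: record the construction activity that produced the instance.
g.add((activity, RDF.type, PROV.Activity))
g.add((activity, PROV.used, paper))
g.add((idea, PROV.wasGeneratedBy, activity))

# PAV: instance-level versioning and lineage.
g.add((idea, PAV.version, Literal("1.0")))
g.add((idea, PAV.createdOn, Literal("2025-12-29", datatype=XSD.date)))
g.add((idea, PAV.derivedFrom, paper))

print(g.serialize(format="turtle"))
```

Serialized as Turtle, the resulting triples capture both which extraction activity generated the instance (PROV-O) and its version and source lineage (PAV), in line with the design described above.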
https://tetherless-world.github.io/idea-ontology/ontology#
The Idea Graph is a framework for constructing ontology-driven scholarly knowledge graphs from scientific literature. It models scientific publications across three interconnected layers: document structure, extracted information, and abstract scholarly discourse such as research problems, ideas, evidence, and claims. The framework employs a discourse-aware ontology inspired by argumentation theory to align textual grounding, rhetorical intent, and citation attribution. An end-to-end pipeline transforms scientific PDFs into RDF graphs using layout-preserving parsing and ontology-guided large language model extraction. To ensure transparency and reproducibility, the system explicitly represents knowledge graph construction provenance and integrates the PAV ontology for instance-level versioning and lineage tracking. The resulting graphs support semantic querying, retrieval-augmented generation, and interactive visualization of discourse structures and citation roles.
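As a sketch of the semantic querying the framework is meant to support, the following Python/rdflib fragment runs a SPARQL query over an exported Idea Graph file; the file name idea_graph.ttl, the class idea:Idea, and the property idea:addressesProblem are assumed for illustration and should be replaced with the terms actually published in the ontology.

```python
# Hedged sketch: a SPARQL query over an exported Idea Graph. The file name,
# the class idea:Idea, and the property idea:addressesProblem are assumed
# for illustration; substitute the terms actually published in the ontology.
from rdflib import Graph

g = Graph()
g.parse("idea_graph.ttl", format="turtle")  # hypothetical exported graph

query = """
PREFIX idea: <https://tetherless-world.github.io/idea-ontology/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?idea ?ideaLabel ?problemLabel
WHERE {
  ?idea a idea:Idea ;                     # assumed class name
        rdfs:label ?ideaLabel ;
        idea:addressesProblem ?problem .  # assumed property name
  ?problem rdfs:label ?problemLabel .
}
"""

for row in g.query(query):
    print(f"{row.ideaLabel} -> {row.problemLabel}")
```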
2.2.0
Provenance features were introduced in release 2.2.0 of the IDEA Ontology.
Copyright (c) 2025 TWC
a step-by-step procedure for solving a problem or accomplishing some end
algorithm
a usually formal statement of the equality or equivalence of mathematical or logical expressions
equation
a diagram or pictorial illustration of textual matter
figure
a simple series of words or numerals (such as the names of persons or objects)
list
a subdivision of a written composition that consists of one or more sentences, deals with one point or gives the words of one speaker, and begins on a new usually indented line
paragraph
a systematic arrangement of data usually in rows and columns for ready reference
table
label
This is a manifestation of the rdfs:label annotation property as a data property, making it easier to state class axioms and to ensure the completeness needed for the ontology to guide prompting.
has label
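A minimal sketch of why the label is modeled as a data property rather than only as rdfs:label: a data property can appear in class axioms such as cardinality restrictions. The property IRI idea:hasLabel and the restricted class idea:Paragraph below are assumptions used for illustration.

```python
# Hedged sketch: modeling the label as an OWL data property so it can take
# part in class axioms such as cardinality restrictions, which rdfs:label
# (an annotation property) cannot. idea:hasLabel and idea:Paragraph are
# assumed IRIs used for illustration only.
from rdflib import Graph, Namespace, BNode, Literal
from rdflib.namespace import RDF, RDFS, OWL, XSD

IDEA = Namespace("https://tetherless-world.github.io/idea-ontology/ontology#")

g = Graph()
g.bind("idea", IDEA)

# Declare the data property and make it functional (at most one value).
g.add((IDEA.hasLabel, RDF.type, OWL.DatatypeProperty))
g.add((IDEA.hasLabel, RDF.type, OWL.FunctionalProperty))
g.add((IDEA.hasLabel, RDFS.range, XSD.string))

# Require exactly one label on a (hypothetical) class via a restriction.
restriction = BNode()
g.add((restriction, RDF.type, OWL.Restriction))
g.add((restriction, OWL.onProperty, IDEA.hasLabel))
g.add((restriction, OWL.cardinality, Literal(1, datatype=XSD.nonNegativeInteger)))
g.add((IDEA.Paragraph, RDFS.subClassOf, restriction))

print(g.serialize(format="turtle"))
```

Such axioms give downstream tooling, and prompt construction, a checkable expectation that every instance carries exactly one label.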
to take preliminary steps toward accomplishment or full knowledge or experience of
approach
An experimental design is a structured, organized method for determining the relationship between factors affecting cause-effect relations between known and unknown variables. Reference: iSixSigma: http://www.isixsigma.com/dictionary/Design_of_Experiments_-_DOE-41.htm
ExperimentalDesign
Hypothesis formation is the scientific task "to frame a hypothesis or supposition" on the basis of preliminary information gathering and problem analysis. Reference: The Oxford English Dictionary. Oxford University Press, 2 Ed., 1989
HypothesisFormation
Result evaluating is the act of determining the value of experimental results. Reference: Based on The Oxford English Dictionary. Oxford University Press, 2 Ed., 1989
ResultEvaluating
Scientific investigation is a systematic examination; careful and detailed research. Reference: The Oxford English Dictionary. Oxford University Press, 2 Ed., 1989
ScientificInvestigation
These are abstract elements derived from the paper.
This includes the belief that the necessary data, documents, equipment, or participant populations will be accessible to the researcher.
access
A formal, abstract, step-by-step description of a computational process or set of rules used to solve a specific problem. It is the conceptual "recipe" itself, distinct from its implementation in code.
algorithm
These are assumptions required by the data analysis techniques themselves. For example, many statistical tests assume that the data follows a normal distribution.
analytical
A general class for any human-made object, conceptual or physical, that is part of the research process.
artifact
In science, an auxiliary hypothesis that is taken as true for the purposes of interpreting a particular test. All tests involve making assumptions. If an assumption of a test turns out to be inaccurate, it can cause the test results to be incorrectly interpreted. However, assumptions can be independently tested to help establish their accuracy.
assumption
The argument suggests that because two domains share a similar structure or set of properties, a principle that works in one domain is likely to work in the other.
Just as deep learning revolutionized image recognition by learning hierarchical features, we argue that a similar approach can be applied to financial time-series data to detect complex, multi-scale patterns.
The argument establishes the value of a proposed method, model, or theory by benchmarking it against the current state-of-the-art or other known solutions.
While existing methods for this task rely on computationally expensive transformers, our proposed lightweight CNN model achieves comparable performance with a 90% reduction in inference time and memory usage.
The argument rests on the presentation of data, measurements, or observations that confirm the claim. This can be qualitative or quantitative.
Our proposed algorithm achieved an accuracy of 92% on the test dataset, outperforming the baseline by 8% (see Table 4).
The argument uses a case study, a specific example, or an illustrative scenario to demonstrate the validity or utility of a claim.
To illustrate the effectiveness of our new user interface design, consider the following use case where a novice user is able to complete the target task 50% faster than with the previous design.
The argument's strength comes from leveraging the findings of previously published and peer-reviewed research.
As demonstrated by Smith et al. (2019), this cellular pathway is critical for protein synthesis. Therefore, our claim that the observed mutation affects this pathway is a plausible mechanism for the disease.
The argument uses formal logic or mathematical derivation to demonstrate that the claim must be true if a set of underlying axioms is accepted.
We prove the completeness of our proposed logic by demonstrating that for any valid formula, a corresponding proof can be constructed within our axiomatic system (see Appendix A for the full proof).
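To show how the argument types above could be grounded in the graph, the following sketch instantiates a claim supported by empirical evidence; the classes idea:Claim, idea:Evidence, and idea:EmpiricalEvidenceArgument and the properties idea:supportsClaim and idea:hasEvidence are illustrative assumptions, not confirmed ontology terms.

```python
# Hedged sketch: instantiating a claim backed by empirical evidence and
# tagging the argument with one of the scheme types described above.
# All class and property names below are assumptions, not confirmed terms.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, RDFS

IDEA = Namespace("https://tetherless-world.github.io/idea-ontology/ontology#")
EX = Namespace("http://example.org/ideagraph/")  # hypothetical instance namespace

g = Graph()
g.bind("idea", IDEA)

claim = EX["claim/1"]
evidence = EX["evidence/1"]
argument = EX["argument/1"]

g.add((claim, RDF.type, IDEA.Claim))  # assumed class
g.add((claim, RDFS.label, Literal("The proposed algorithm outperforms the baseline.")))

g.add((evidence, RDF.type, IDEA.Evidence))  # assumed class
g.add((evidence, RDFS.label, Literal("92% accuracy on the test set, +8% over baseline (Table 4).")))

# Tie claim, evidence, and argument scheme together (assumed properties).
g.add((argument, RDF.type, IDEA.EmpiricalEvidenceArgument))
g.add((argument, IDEA.supportsClaim, claim))
g.add((argument, IDEA.hasEvidence, evidence))

print(g.serialize(format="turtle"))
```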
These relate to the integrity of the data gathering process. A common example is the assumption that participants in a survey or interview are providing honest and accurate responses.
data_collection
A structured collection of data. It is often the primary input for a Model or the primary output (prov:wasGeneratedBy) of an Experiment.
dataset
A conceptual model, specification, or blueprint that details how an artifact is to be constructed or how an activity is to be performed. In the EXPO ontology, this maps to expo:ExperimentalDesign, which specifies the plan, objects, and actions required to achieve an experimental goal.
design
A set of reusable software components, libraries, and conceptual guidelines that provide a structure for building applications. It is more concrete than a Design but more general than a final Model. In the SemSur ontology, this can be considered a type of semsur:Implementation or semsur:Approach. The "Semantic Web" itself was proposed as a new Framework built from RDF and Ontologies.
framework
These are assumptions about the validity and appropriateness of the methods and procedures used in the study. They are the foundational premises that allow the researcher to trust their own process.
methodological
A representation of a system, object, or phenomenon used to explain, predict, or test a hypothesis. This can be a mathematical, physical, or computational model (like a simulation).
model
This is the intentional exclusion of certain variables or factors to simplify the model and isolate the phenomena of interest. A classic example in physics is assuming a frictionless surface or neglecting air resistance.
negligence
These are practical assumptions about the availability of the resources required to conduct the research. While often unstated, they are critical to the project's feasibility.
resource
This involves limiting the study to a specific context. Examples include focusing on a particular demographic, a specific geographic region, or a defined time period. This is a deliberate act to ensure the research question can be answered within practical constraints.
restriction
These are the conscious, intentional choices a researcher makes to define the boundaries of their study. They are not weaknesses, but necessary decisions to make the research feasible and focused.
scoping
This is the assumption that the resources and environment will remain stable enough for the duration of the study. For example, a long-term study assumes that the software, hardware, or data formats it relies on will not become obsolete or unavailable midway through the project.
stability
These are the assumptions that link a researcher's evidence to their claims. In the Argument Model Ontology (AMO), these are formally known as warrants. They are the logical principles or beliefs that justify the research approach.
theoretical