- Architecture Decision Record: Use ADRs
- Architecture Decision Record: Configuration
- Architecture Decision Record: Datomic-based Configuration
- Architecture Decision Record: Module Structure & Loading
- Architecture Decision Record: User Facing Configuration
- Architecture Decision Record: Core Runtime
- Architecture Decision Record: Configuration Updates
- Architecture Decision Record: Abstract Modules
- Architecture Decision Record: Configuration Ontology
- Architecture Decision Record: Persistent Configuration
- Architecture Decision Record: Asset Pipeline
- Architecture Decision Record: Enhanced Validation
- Architecture Decision Record: Error Reporting
- Architecture Decision Record: Project Templates
- Architecture Decision Record: Data Abstraction Model
- Architecture Decision Record: Database Migrations
- Architecture Decision Record: Simplification of Chimera Model
Architecture Decision Record: Use ADRs
Context
Arachne has several very explicit goals that make the practice and discipline of architecture very important:
- We want to think deeply about all our architectural decisions, exploring all alternatives and making a careful, considered, well-researched choice.
- We want to be as transparent as possible in our decision-making process.
- We don't want decisions to be made unilaterally in a vacuum. Specifically, we want to give our steering group the opportunity to review every major decision.
- Despite being a geographically and temporally distributed team, we want our contributors to have a strong shared understanding of the technical rationale behind decisions.
- We want to be able to revisit prior decisions to determine fairly if they still make sense, and if the motivating circumstances or conditions have changed.
Decision
We will document every architecture-level decision for Arachne and its core modules with an Architecture Decision Record. These are a well-structured, relatively lightweight way to capture architectural proposals. They can serve as an artifact for discussion, and remain as an enduring record of the context and motivation of past decisions.
The workflow will be:
- A developer creates an ADR document outlining an approach for a particular question or problem. The ADR has an initial status of "proposed."
- The developers and steering group discuss the ADR. During this period, the ADR should be updated to reflect additional context, concerns raised, and proposed changes.
- Once consensus is reached, the ADR can be transitioned to either an "accepted" or "rejected" state.
- Only after an ADR is accepted should implementing code be committed to the master branch of the relevant project/module.
- If a decision is revisited and a different conclusion is reached, a new ADR should be created documenting the context and rationale for the change. The new ADR should reference the old one, and once the new one is accepted, the old one should (in its "status" section) be updated to point to the new one. The old ADR should not be removed or otherwise modified except for the annotation pointing to the new ADR.
Status
Accepted
Consequences
- Developers must write an ADR and submit it for review before selecting an approach to any architectural decision -- that is, any decision that affects the way Arachne or an Arachne application is put together at a high level.
- We will have a concrete artifact around which to focus discussion, before finalizing decisions.
- If we follow the process, decisions will be made deliberately, as a group.
- The master branch of our repositories will reflect the high-level consensus of the steering group.
- We will have a useful persistent record of why the system is the way it is.
Architecture Decision Record: Configuration
Context
Arachne has a number of goals.
It needs to be modular. Different software packages, written by different developers, should be usable and swappable in the same application with a minimum of effort.
Arachne applications need to be transparent and introspectable. It should always be as clear as possible what is going on at any given moment, and why the application is behaving in the way it does.
As a general-purpose web framework, it needs to provide a strong set of default settings which are also highly overridable, and configurable to suit the unique needs of users.
Also, it is a good development practice (particularly in Clojure) to code to a specific information model (that is, data) rather than to particular functions or APIs. Along with other benefits, this helps separate (avoids "complecting") the intended operation and its implementation.
Documenting the full rationale for this "data first" philosophy is beyond the scope of this document, but some resources that explain it (among other things) are:
- Simple Made Easy - Rich Hickey
- Narcissistic Design - Stuart Halloway
- Data Beats Functions - Malcolm Sparks
- Always Be Composing - Zach Tellman
- Data > Functions > Macros - Eric Normand
Finally, one weakness of many existing Clojure libraries, especially web development libraries, is the way in which they overload the Clojure runtime (particularly vars and reified namespaces) to store information about the webapp. Because both the Clojure runtime and many web application entities (e.g. servers) are stateful, this causes a variety of issues, particularly with reloading namespaces. Therefore, as much as possible, we would like to avoid entangling information about an Arachne application with the Clojure runtime itself.
Decision
Arachne will take the "everything is data" philosophy to its logical extreme, and encode as much information about the application as possible in a single, highly general data structure. This will include not just data that is normally thought of as "config" data, but the structure and definition of the application itself. Everything that does not have to be arbitrary executable code will be reflected in the application config value.
Some concrete examples include (but are not limited to):
- Dependency injection components
- Runtime entities (servers, caches, connections, pools, etc)
- HTTP routes and middleware
- Persistence schemas and migrations
- Locations of static and dynamic assets
This configuration value will have a schema that defines what types of entities can exist in the configuration, and what their expected properties are.
Each distinct module will have the ability to contribute to the schema and define entity types specific to its own domain. Modules may interact by referencing entity types and properties defined in other modules.
Although it has much in common with a fully general in-memory database, the configuration value will be a single immutable value, not a stateful data store. This will avoid many of the complexities of state and change, and will eliminate the temptation to use the configuration itself as dynamic storage for runtime data.
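To make this concrete, a fragment of such a configuration value might look something like the following sketch, expressed as entity maps. All attribute names here are invented for illustration; this ADR does not fix a concrete representation.

;; Illustrative only -- attribute names are hypothetical
[{:arachne/id :my.app/server                 ; a runtime entity (an HTTP server)
  :arachne.http.server/port 8080
  :arachne.http.server/routes [{:arachne/id :my.app/hello-route
                                :arachne.http.route/path "/hello"
                                :arachne.http.route/handler :my.app/hello-handler}]}]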
Status
Proposed
Consequences
- Applications will be defined comprehensively and declaratively by a rich data structure, before the application even starts.
- The config schema provides an explicit, reliable contract and set of extension points, which can be used by other modules to modify entities or behaviors.
- It will be easy to understand and inspect an application by inspecting or querying its configuration. It will be possible to write tools to make exploring and visualizing applications even easier.
- Developers will need to carefully decide what types of things are appropriate to encode statically in the configuration, and what must be dynamic at runtime.
Architecture Decision Record: Datomic-based Configuration
Context
ADR-002 indicates that we will store the entire application config in a single rich data structure with a schema.
Config as Database
This implies that it should be possible to easily search, query and update the configuration value. It also implies that the configuration value is general enough to store arbitrary data; we don't know what kinds of things users or module authors will need to include.
If what we need is a system that allows you to define, query, and update arbitrary data with a schema, then we are looking for a database.
Required data store characteristics:
1. It must be available under a permissive open source license. Anything else will impose unwanted restrictions on who can use Arachne.
2. It can operate embedded in a JVM process. We do not want to force users to install anything else or run multiple processes just to get Arachne to work.
3. The database must be serializable. It must be possible to write the entire configuration to disk, and then reconstitute it in the same exact state in a separate process.
4. Because modules build up the schema progressively, the schema must be inherently extensible. It should be possible for modules to progressively add both new entity types and new attributes to existing entity types.
5. It should be usable from Clojure without a painful impedance mismatch.
Configuration as Ontology
As an extension of the rationale discussed in ADR-002, it is useful to enumerate the possible use cases of the configuration and configuration schema together.
- The configuration is read by the application during bootstrap and controls the behavior of the application.
- The configuration schema defines what types of values the application can or will read to modify its structure and behavior at boot time and run time.
- The configuration is how an application author communicates their intent about how their application should fit together and run, at a higher, more conceptual level than code.
- The configuration schema is how module authors communicate to application authors what settings, entities and structures are available for them to use in their applications.
- The configuration schema is how module authors communicate to other potential module authors what their extension points are; module extenders can safely read or write any entities/attributes declared by the modules upon which they depend.
- The configuration schema can be used to validate a particular configuration, and explain where and how it deviates from what is actually supported.
- The configuration can be exposed (via user interfaces of various types) to end users for analytics and debugging, explaining the structure of their application and why things are the way they are.
- A serialization of the configuration, together with a particular codebase (identified by a git SHA), forms a precise, complete, 100% reproducible definition of the behavior of an application.
To the extent that the configuration schema expresses and communicates the "categories of being" or "possibility space" of an application, it is a formal Ontology. This is a desirable characteristic, and to the degree that it is practical to do so, it will be useful to learn from or re-use existing work around formal ontological systems.
Implementation Options
There are instances of four broad categories of data stores that match the first three of the data store characteristics defined above.
- Relational (Derby, HSQLDB, etc)
- Key/value (BerkeleyDB, hashtables, etc)
- RDF/RDFs/OWL stores (Jena)
- Datomic-style (Datascript)
We can eliminate relational solutions fairly quickly; SQL schemas are not generally extensible or flexible, failing condition #4. In addition, they do not fare well on #5 -- using SQL for queries and updates is not particularly fluent in Clojure.
Similarly, we can eliminate key/value style data stores. In general, these do not have schemas at all (or at least, not the type of rich schema that provides a meaningful data contract or ontology, which is the point for Arachne.)
This leaves solutions based on the RDF stack, and Datomic-style data stores. Both are viable options which would provide unique benefits for Arachne, and both have different drawbacks.
Explaining the core technical characteristics of RDF/OWL and Datomic is beyond the scope of this document; please see the Jena and Datomic documentation for more details. More information on RDF, OWL and the Semantic web in general:
- Wikipedia article on RDF
- Wikipedia article on OWL
- OWL Semantics standards document.
RDF
The clear choice for a JVM-based, permissively licensed, standards-compliant RDF API is Apache Jena.
Benefits for Arachne
- OWL is a good fit insofar as Arachne's goal is to define an ontology of applications. The point of the configuration schema is first and foremost to serve as unambiguous communication regarding the types of entities that can exist in an application, and what the possible relationships between them are. By definition, this is defining an ontology, and is the exact use case which OWL is designed to address.
- Information model is a good fit for Clojure: tuples and declarative logic.
- Open and extensible by design.
- Well researched by very smart people, likely to avoid common mistakes that would result from building an ontology-like system ourselves.
- Existing technology, well known beyond the Clojure ecosystem. Existing tools could work with Arachne project configurations out of the box.
- The open-world assumption is a good fit for Arachne's per-module schema modeling, since modules cannot know what other modules might be present in the application.
- We're likely to want to introduce RDFs/OWL to the application anyway, at some point, as an abstract entity meta-schema (note: this has not been firmly decided yet.)
Tradeoffs for Arachne (with mitigations)
- OWL is complex. Learning to use it effectively is a skill in its own right and it might be asking a lot to require of module authors.
- OWL's representation of some common concepts can be verbose and/or convoluted in ways that would make schema more difficult to read/write (e.g., Restriction classes).
- OWL is not a schema. Although the open-world assumption is valid and good when writing ontologies, it means that OWL inferencing is incapable of performing many of the kinds of validations we would want to apply once we do have a complete configuration and want to check it for correctness. For example, open-world reasoning can never validate an owl:minCardinality rule.
  - Mitigation: Although OWL inferencing cannot provide closed-world validation of a given RDF dataset, such tools do exist. Some mechanisms for validating a particular closed set of RDF triples include:
    - Writing SPARQL queries that catch various types of validation errors.
    - Deriving validation errors using Jena's rules engine.
    - Using an existing RDF validator such as Eyeball (although, unfortunately, Eyeball does not seem to be well maintained.)
    - For Clojure, it would be possible to validate a given OWL class by generating a specification using clojure.spec that could be applied to concrete instances of the class in their map form.
- Jena's API is aggressively object-oriented and at odds with Clojure idioms.
- Mitigation: Write a data-oriented wrapper (note: I have a working proof of concept already.)
- SPARQL is a string-based query language, as opposed to a composable data API.
- Mitigation: It is possible to hook into Jena's ARQ query engine at the object layer, and expose a data-oriented API from there, with SPARQL semantics but an API similar to Datomic datalog.
- OWL inferencing is known to have performance issues with complex inferences. While Arachne configurations are tiny (as knowledge bases go), and we are unlikely to use the more esoteric derivations, it is unknown whether this will cause problems with the kinds of ontologies we do need.
- Mitigation: We could restrict ourselves to the OWL DL or even OWL Lite sub-languages, which have more tractable inferencing rules.
- Jena's APIs are such that it is impossible to write an immutable version of an RDF model (at least without breaking most of Jena's API.) It's trivial to write a data-oriented wrapper, but intractable to write a persistent immutable one.
Datomic
Note that Datomic itself does not satisfy the first requirement; it is closed-source, proprietary software. There is an open source project, Datascript, which emulates Datomic's APIs (without any of the storage elements). Either one would work for Arachne, since Arachne only needs the subset of features they both support. In fact, if Arachne goes the Datomic-inspired route, we would probably want to support both: Datomic, for those who have an existing investment there, and Datascript for those who desire open source all the way.
Benefits for Arachne
- Well known to most Clojurists
- Highly idiomatic to use from Clojure
- There is no question that it would be performant and technically suitable for Arachne-sized data.
- Datomic's schema is a real validating schema; data transacted to Datomic must always be valid.
- Datomic Schema is open and extensible.
Tradeoffs for Arachne (with mitigations)
- The expressivity of Datomic's schema is anemic compared to RDFs/OWL; for example, it has no built-in notion of types. It is focused towards data storage and integrity rather than defining a public ontology, which would be useful for Arachne.
- Mitigation: If we did want something more ontologically focused, it is possible to build an ontology system on top of Datomic using meta-attributes and Datalog rules. Examples of such systems already exist.
- If we did build our own ontology system on top of Datomic (or use an existing one) we would still be responsible for "getting it right", ensuring that it meets any potential use case for Arachne while maintaining internal and logical consistency.
- Mitigation: we could still use the work that has been done in the OWL world and re-implement a subset of axioms and derivations on top of Datomic.
- Any ontological system built on top of Datomic would be novel to module authors, and therefore would require careful, extensive documentation regarding its capabilities and usage.
- To satisfy users of Datomic as well as those who have a requirement for open source, it will be necessary to abstract across both Datomic and Datascript.
- Mitigation: This work is already done (provided users stay within the subset of features that is supported by both products.)
Decision
The steering group decided that the RDF/OWL approach is too high-risk to wrap in Clojure and implement at this time, while the rewards are mostly intangible ("openness" and "interoperability") rather than something that would help move Arachne forward in the short term.
Therefore, we will use a Datomic style schema for Arachne's configuration.
Users may use either Datomic Pro, Datomic Free or Datascript at runtime in their applications. We will provide a "multiplexer" configuration implementation that utilizes both, and asserts that the results are equal: this can be used by module authors to ensure they stay within the subset of features supported by both platforms.
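As a sketch of staying within the shared subset, the attribute definition below uses only schema features common to Datomic and Datascript; the attribute itself is hypothetical. Note that Datomic receives schema as transaction data while Datascript takes a plain schema map, so a thin adapter (or the multiplexer) would translate between the two forms.

;; Datomic transaction-map form (illustrative attribute):
[{:db/ident       :arachne.component/constructor
  :db/valueType   :db.type/keyword
  :db/cardinality :db.cardinality/one
  :db/doc         "Fully qualified name of a component constructor function"}]

;; Equivalent Datascript schema-map form:
{:arachne.component/constructor {:db/cardinality :db.cardinality/one}}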
Before Arachne leaves "alpha" status (that is, before it is declared ready for experimental production use or for the release of third-party modules), we will revisit the question of whether OWL would be more appropriate, and whether we have encountered issues that OWL would have made easier. If so, and if time allows, we reserve the option to either refactor the configuration layer to use Jena as a primary store (porting existing modules), or provide an OWL view/rendering of an ontology stored in Datomic.
Status
Proposed
Consequences
- It will be possible to write schemas that precisely define the configuration data that modules consume.
- The configuration system will be open and extensible to additional modules by adding additional attributes and meta-attributes.
- The system will not provide an ontologically oriented view of the system's data without additional work.
- Additional work will be required to validate configuration with respect to requirements that Datomic does not support natively (e.g., required attributes).
- Every Arachne application must include either Datomic Free, Datomic Pro or Datascript as a dependency.
- We will need to keep our eyes open to look for situations where a more formal ontology system might be a better choice.
Architecture Decision Record: Module Structure & Loading
Context
Arachne needs to be as modular as possible. Not only do we want the community to be able to contribute new abilities and features that integrate well with the core and with each other, we want some of the basic functionality of Arachne to be swappable for alternatives as well.
ADR-002 specifies that one role of modules is to contribute schema to the application config. Other roles of modules would include providing code (as any library does), and querying and updating the config during the startup process. Additionally, since modules can depend upon each other, they must specify which modules they depend upon.
Ideally there will be as little overhead as possible for creating and consuming modules.
Some of the general problems associated with plugin/module systems include:
- Finding and downloading the implementation of the module.
- Discovering and activating the correct set of installed modules.
- Managing module versions and dependencies.
There are some existing systems for modularity in the Java ecosystem. The most notable is OSGi, which provides not only a module system addressing the concerns above, but also a service runtime with classpath isolation, dynamic loading and unloading, and lazy activation.
OSGi (and other systems of comparable scope) are overkill for Arachne. Although they come with benefits, they are very heavyweight and carry a high complexity burden, not just for Arachne development but also for end users. Specifically, Arachne applications will be drastically simpler if (at runtime) they exist as a straightforward codebase in a single classloader space. Features like lazy loading and dynamic start-stop are likewise out of scope; the goal is for an Arachne runtime itself to be lightweight enough that starting and stopping when modules change is not an issue.
Decision
Arachne will not be responsible for packaging, distribution or downloading of modules. These jobs will be delegated to an external dependency management & packaging tool. Initially, that tool will be Maven/Leiningen/Boot, or some other tool that works with Maven artifact repositories, since that is currently the standard for JVM projects.
Modules that have a dependency on another module must specify that dependency using Maven (or another dependency management tool).
Arachne will provide no versioning system beyond what the packaging tool provides.
Each module JAR will contain a special arachne-modules.edn file at the root of its classpath. This data file (when read) contains a sequence of module definition maps.
Each module definition map contains the following information:
- The formal name of the module (as a namespaced symbol.)
- A list of dependencies of the module (as a set of namespaced symbols.) Module dependencies must form a directed acyclic graph; circular dependencies are not allowed.
- A namespace qualified symbol that resolves to the module's schema function. A schema function is a function with no arguments that returns transactable data containing the schema of the module.
- A namespace qualified symbol that resolves to the module's configure function. A configure function is a function that takes a configuration value and returns an updated configuration.
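For illustration, an arachne-modules.edn file for a single module might look something like the sketch below. The exact key names are hypothetical; this ADR specifies only the information each map must carry.

;; arachne-modules.edn (key names are illustrative, not part of this ADR)
[{:arachne/name         :my.org/awesome-module
  :arachne/dependencies #{:org.arachne-framework/http}       ; modules this module depends on
  :arachne/schema       my.org.awesome-module/schema         ; no-arg fn returning schema txdata
  :arachne/configure    my.org.awesome-module/configure}]    ; fn of config -> updated config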
When an application is defined, the user must specify a set of module names to use (exact mechanism TBD.) Only the specified modules (and their dependencies) will be considered by Arachne. In other words, merely including a module as a dependency in the package manager is not sufficient to activate it and cause it to be used in an application.
Status
Proposed
Consequences
- Creating a basic module is lightweight, requiring only:
- writing a short EDN file
- writing a function that returns schema
- writing a function that queries and/or updates a configuration
- From a user's point of view, consuming modules will use the same familiar mechanisms as consuming a library.
- Arachne is not responsible for getting code on the classpath; that is a separate concern.
- We will need to think of a straightforward, simple way for application authors to specify the modules they want to be active.
- Arachne is not responsible for any complexities of publishing, downloading or versioning modules
- Module versioning inherits all of the drawbacks of the package manager's versioning (usually Maven's), including the pain of resolving conflicting versions. The situation with respect to dependency version management will be effectively the same as it is now with Clojure libraries.
- A single dependency management artifact can contain several Arachne modules (whether this is ever desirable is another question.)
- Although Maven is currently the default dependency/packaging tool for the Clojure ecosystem, Arachne is not specified to use only Maven. If an alternative system gains traction, it will be possible to package and publish Arachne modules using that.
Architecture Decision Record: User Facing Configuration
Context
Per ADR-003, Arachne uses Datomic-shaped data for configuration. Although this is a flexible, extensible data structure which is a great fit for programmatic manipulation, in its literal form it is quite verbose.
It is quite difficult to understand the structure of Datomic data by reading its native textual representation, and it is similarly hard to write, containing enough repeated elements that copying and pasting quickly becomes the default.
One of Arachne's core values is ease of use and a fluent experience for developers. Since much of a developer's interaction with Arachne will be writing to the config, it is of paramount importance that there be some easy way to create configuration data.
The question is, what is the best way for developers of Arachne applications to interact with their application's configuration?
Option: Raw Datomic Txdata
This would require end users to write Datomic transaction data by hand in order to configure their application.
This is the "simplest" option, and has the fewest moving parts. However, as mentioned above, it is very far from ideal for human interactions.
Option: Custom EDN data formats
In this scenario, users would write EDN data in some nested structure of maps, sets, seqs and primitives. This is currently the most common way to configure Clojure applications.
Each module would then need to provide a mapping from the EDN config format to the underlying Datomic-style config data.
Because Arachne's configuration is so much broader, and defines so much more of an application than a typical application config file, it is questionable if standard nested EDN data would be a good fit for representing it.
Option: Code-based configuration
Another option would be to go in the direction of some other frameworks, such as Ruby on Rails, and have the user-facing configuration be code rather than data.
It should be noted that the primary motivation for having a data-oriented configuration language -- that it is easier to interact with programmatically -- doesn't really apply in Arachne's case. Since applications are always free to interact richly with Arachne's full configuration database, the ability to programmatically manipulate the precursor data is moot. As such, one major argument against a code-based configuration strategy does not apply.
Decision
Developers will have the option of writing configuration using either native Datomic-style data or code-based configuration scripts. Configuration scripts are Clojure files which, when evaluated, update a configuration stored in an atom currently in context (using a dynamically bound var.)
Configuration scripts are Clojure source files in a distinct directory that by convention is outside the application's classpath: configuration code is conceptually and physically separate from application code. Conceptually, loading the configuration scripts could take place in an entirely different process from the primary application, serializing the resulting config before handing it to the runtime application.
To further emphasize the difference between configuration scripts and runtime code, and because they are not on the classpath, configuration scripts will not have namespaces and will instead include each other via Clojure's load function.
Arachne will provide code supporting the ability of module authors to write "configuration DSLs" for users to invoke from their configuration scripts. These DSLs will emphasize making it easy to create appropriate entities in the configuration. In general, DSL forms will have an imperative style: they will convert their arguments to configuration data and immediately transact it to the context configuration.
As a trivial example, instead of writing the verbose configuration data:
{:arachne/id :my.app/server
:arachne.http.server/port 8080
:arachne.http.server/debug true}
You could write the corresponding DSL:
(server :id :my.app/server, :port 8080, :debug true)
Note that this is an illustrative example and does not represent the actual DSL or config for the HTTP module.
DSLs should make heavy use of Spec to make errors as comprehensible as possible.
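A minimal sketch of how a module might implement such a DSL form follows. The dynamic var name is invented, and transact stands in for a hypothetical (config, txdata) -> config function; neither is part of this ADR.

(def ^:dynamic *config*) ; bound to an atom holding the config while a script is loaded

(defn server
  "DSL form: converts its arguments to configuration data and immediately
  transacts it against the configuration currently in context."
  [& {:keys [id port debug]}]
  ;; `transact` here is a hypothetical fn applying txdata to a config value
  (swap! *config* transact
         [{:arachne/id id
           :arachne.http.server/port port
           :arachne.http.server/debug debug}]))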
Status
Proposed
Consequences
- It will be possible for end users to define their configuration without writing config data by hand.
- Users will have access to the full power of the Clojure programming language when configuring their application. This grants a great deal of power and flexibility, but also the risk of users doing inadvisable things in their config scripts (e.g., non-repeatable side effects).
- Module authors will bear the responsibility of providing an appropriate, user-friendly DSL interface to their configuration data.
- DSLs can compose; any module can reference and re-use the DSL definitions included in modules upon which it depends.
Architecture Decision Record: Core Runtime
Context
At some point, every Arachne application needs to start; to bootstrap itself from a static project or deployment artifact, initialize what needs initializing, and begin servicing requests, connecting to databases, processing data, etc.
There are several logically inherent subtasks to this bootstrapping process, which can be broken down as follows.
- Starting the JVM
  - Assembling the project's dependencies
  - Building a JVM classpath
  - Starting a JVM
- Arachne- and application-specific
  - Instantiating user- and module-defined objects that need to exist at runtime
  - Starting and stopping user- and module-defined services
As discussed in ADR-004, tasks in the "starting the JVM" category are not in-scope for Arachne; rather, they are offloaded to whatever build/dependency tool the project is using (usually either boot or leiningen.)
This leaves the Arachne and application-specific startup tasks. Arachne should provide an orderly, structured startup (and shutdown) procedure, and make it possible for modules and application authors to hook into it to ensure that their own code initializes, starts and stops as desired.
Additionally, it must be possible for different system components to have dependencies on eachother, such that when starting, services start after the services upon which they depend. Stopping should occur in reverse-dependency order, such that a service is never in a state where it is running but one of its dependencies is stopped.
Decision
Components
Arachne uses the Component library to manage system components. Instead of requiring users to define a Component system map manually, however, Arachne itself builds one from the component entities that appear in the configuration.
Component entities may be added to the config directly by end users (via an initialization script as per ADR-005), or by modules in their configure function (ADR-004.)
Component entities have attributes which indicate which other components they depend upon. Circular dependencies are not allowed; the component dependency structure must form a Directed Acyclic Graph (DAG.) The dependency attributes also specify the key that Component will use to assoc dependencies.
Component entities also have an attribute that specifies a component constructor function (via a fully qualified name.) Component constructor functions must take two arguments: the configuration, and the entity ID of the component that is to be constructed. When invoked, a component constructor must return a runtime component object, to be used by the Component library. This may be any object that implements clojure.lang.Associative, and may also optionally satisfy Component's Lifecycle protocol.
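As a sketch, a component entity and its constructor might look like the following under this two-argument contract. The component itself and the attr helper for reading an attribute off the config are hypothetical.

(require '[com.stuartsierra.component :as component])

(defrecord Cache [size state]
  component/Lifecycle
  (start [this] (assoc this :state (atom {})))
  (stop  [this] (assoc this :state nil)))

(defn new-cache
  "Component constructor: takes the config and the entity ID of the
  component entity; returns an (unstarted) component object."
  [cfg eid]
  ;; `attr` is a hypothetical helper that reads an attribute value
  ;; for the given entity ID out of the config.
  (->Cache (attr cfg eid :my.app.cache/size) nil))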
Arachne Runtime
The top-level entity in an Arachne system is a reified Arachne Runtime object. This object contains both the Component system object, and the configuration value upon which the runtime is based. It satisfies the Lifecycle protocol itself; when it is started or stopped, all of the component objects it contains are started or stopped in the appropriate order.
The constructor function for a Runtime takes a configuration value and some number of "roots"; entity IDs or lookup refs of Component entities in the config. Only these root components and their transitive dependencies will be instantiated or added to the Component system. In other words, only component entities that are actually used will be instantiated; unused component entities defined in the config will be ignored.
A lookup function will be provided to find the runtime object instance of a component, given its entity ID or lookup ref in the configuration.
Startup Procedure
Arachne will rely upon an external build tool (such as boot or leiningen) to handle downloading dependencies, assembling a classpath, and starting a JVM.
Once a JVM with the correct classpath is running, the following steps are required to yield a running Arachne runtime:
- Determine a set of modules to use (the "active modules")
- Build a configuration schema by querying each active module using its schema function (ADR-004)
- Update the config with initial configuration data from user init scripts (ADR-005)
- In module dependency order, give each module a chance to query and update the configuration using its configure function (ADR-004)
- Create a new Arachne runtime, given the configuration and a set of root components.
- Call the runtime's start method.
The Arachne codebase will provide entry points to automatically perform these steps for common development and production scenarios. Alternatively, they can always be executed individually in a REPL, or composed in custom startup functions.
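Composed by hand at a REPL, the sequence might look roughly like this; every function name below is illustrative rather than Arachne's actual API:

(def schema (build-schema [:my.org/module-a :my.org/module-b])) ; active modules
(def cfg    (-> (new-config schema)
                (apply-init-script "config/my-app.clj")   ; user init script
                (run-module-configure-fns)))              ; in module dependency order
(def rt     (runtime cfg [[:arachne/id :my.app/server]])) ; root components
(def started-rt (start rt))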
Status
PROPOSED
Consequences
- It is possible to fully define the system components and their dependencies in an application's configuration. This is how Arachne achieves dependency injection and inversion of control.
- It is possible to explicitly create, start and stop Arachne runtimes.
- Multiple Arachne runtimes may co-exist in the same JVM (although they may conflict and fail to start if they both attempt to use a global resource such as an HTTP port)
- By specifying different root components when constructing a runtime, it is possible to run different types of Arachne applications based on the same Arachne configuration value.
Architecture Decision Record: Configuration Updates
Context
A core part of the process of developing an application is making changes to its configuration. With its emphasis on configuration, this is even more true of Arachne than of most other web frameworks.
In a development context, developers will want to see these changes reflected in their running application as quickly as possible. Keeping the test/modify cycle short is an important goal.
However, accommodating change is a source of complexity. Extra code would be required to handle "update" scenarios. Components are initialized with a particular configuration in hand. While it would be possible to require that every component support an update operation to receive an arbitrary new config, implementing this is non-trivial and would likely need to involve conditional logic to determine the ways in which the new configuration is different from the old. If any mistakes were made in the implementation of update, for any component, such that the result was not identical to a clean restart, it would be possible to put the system in an inconsistent, unreproducible state.
The "simplest" approach is to avoid the issue and completely discard and rebuild the Arachne runtime (ADR-006) every time the configuration is updated. Every modification to the config would be applied via a clean start, guaranteeing reproducibility and a single code path.
However, this simple baseline approach has two major drawbacks:
- The shutdown, initialization, and startup times of the entire set of components will be incurred every time the configuration is updated.
- The developer will lose any application state stored in the components whenever the config is modified.
The startup and shutdown time issues are potentially problematic because of the general increase to cycle time. However, it might not be too bad depending on exactly how long it takes sub-components to start. Most commonly-used components take only a few milliseconds to rebuild and restart. This is a cost that most Component workflows absorb without too much trouble.
The second issue is more problematic. Not only is losing state a drain on overall cycle speed, it is a direct source of frustration, causing developers to repeat the same tasks over and over. It will mean that touching the configuration has a real cost, and will cause developers to be hesitant to do so.
Prior Art
There is a library designed to solve the startup/shutdown problem, in conjunction with Component: Suspendable. It is not an ideal fit for Arachne, since it focuses on suspending and resuming the same Component instances rather than rebuilding, but its approach may be instructive.
Decision
Whenever the configuration changes, we will use the simple approach of stopping and discarding the entire old Arachne runtime (and all its components), and starting a new one.
To mitigate the issue of lost state, Arachne will provide a new protocol called Preservable (name subject to change, pending a better one.) Components may optionally implement Preservable; it is not required. Preservable defines a single method, preserve.
Whenever the configuration changes, the following procedure will be used:
- Call stop on the old runtime.
- Instantiate the new runtime.
- For all components in the new runtime which implement Preservable, invoke the preserve function, passing it the corresponding component from the old runtime (if there is one).
- The preserve function will selectively copy state out of the old, stopped component into the new, not-yet-started component. It should be careful not to copy any state that would be invalidated by a configuration change.
- Call start on the new runtime.
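As a sketch, a stateful session-store component might implement the protocol like this. The protocol name comes from this ADR; the signature, component, and attributes are invented for illustration.

(defprotocol Preservable
  (preserve [new-component old-component]
    "Copy state from the old, stopped component into this new,
    not-yet-started one. Returns the updated new component."))

(defrecord SessionStore [ttl sessions]
  Preservable
  (preserve [this old]
    ;; Copy live session state, but not :ttl, which must come
    ;; from the new configuration.
    (assoc this :sessions (:sessions old))))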
Arachne will not provide a mitigation for avoiding the cost of stopping and starting individual components. If this becomes a pain point, we can explore solutions such as that offered by Suspendable.
Status
PROPOSED
Consequences
- The basic model for handling changes to the config will be easy to implement and reason about.
- It will be possible to develop with stateful components without losing state after a configuration change.
- Only components which need preservable state need to worry about it.
- The default behavior will prioritize correctness.
- It is possible to write a bad preserve method which copies elements of the old configuration.
- However, because all copies are explicit, it should be easy to avoid writing bad preserve methods.
Architecture Decision Record: Abstract Modules
Context
One design goal of Arachne is to have modules be relatively easily swappable. Users should not be permanently committed to particular technical choices, but instead should have some flexibility in choosing their preferred tech, as long as it exists in the form of an Arachne module.
Some examples of the alternative implementations that people might wish to use for various parts of their application:
- HTTP Server: Pedestal or Ring
- Database: Datomic, an RDBMS or one of many NoSQL options.
- HTML Templating: Hiccup, Enlive, StringTemplate, etc.
- Client-side code: ClojureScript, CoffeeScript, Elm, etc.
- Authentication: Password-based, OpenID, Facebook, Google, etc.
- Emailing: SMTP, one of many third-party services.
This is only a representative sample; the actual list is unbounded.
The need for this kind of flexibility raises some design concerns:
Capability. Users should always be able to leverage the full power of their chosen technology. That is, they should not have to code to the "least common denominator" of capability. If they use Datomic Pro, for example, they should be able to write Datalog and fully utilize the in-process Peer model, not be restricted to an anemic "ORM" that is also compatible with RDBMSs.
Uniformity. At tension with capability is the desire for uniformity; where the feature set of two alternatives is not particularly distinct, it is desirable to use a common API, so that implementations can be swapped out with little or no effort. For example, the user-facing API for sending a single email should (probably) not care whether it is ultimately sent via a local Sendmail server or a third-party service.
Composition. Modules should also compose as much as possible, and they should be as general as possible in their dependencies to maximize the number of compatible modules. In this situation, it is actually desirable to have a "least common denominator" that modules can have a dependency on, rather than depending on specific implementations. For example, many modules will need to persist data and ultimately will need to work in projects that use Datomic or SQL. Rather than providing multiple versions, one for Datomic users and another for SQL, it would be ideal if they could code against a common persistence abstraction, and therefore be usable in any project with a persistence layer.
What does it mean to use a module?
The following list enumerates the ways in which it is possible to "use" a module, either from a user application or from another module. (See ADR-004).
- You can call code that the module provides (the same as any Clojure library.)
- You can extend a protocol that the module provides (the same as any Clojure library.)
- You can read the attributes defined in the module from the configuration.
- You can write configuration data using the attributes defined in the module.
These tools allow the definition of modules with many different kinds of relationships to each other. Speaking loosely, these relationships can correspond to other well-known patterns in software development including composition, mixins, interface/implementation, inheritance, etc.
Decision
In order to simultaneously meet the needs for capability, uniformity and composition, Arachne's core modules will (as appropriate) use the pattern of abstract modules.
Abstract modules define certain attributes (and possibly also corresponding init script DSLs) that describe entities in a particular domain, without providing any runtime implementation which uses them. Then, other modules can "implement" the abstract module, reading the abstract entities and doing something concrete with them at runtime, as well as defining their own more specific attributes.
In this way, user applications and dependent modules can rely either on the common, abstract module or the specific, concrete module as appropriate. Coding against the abstract module will yield a more generic "least common denominator" experience, while coding against a specific implementor will give more access to the unique distinguishing features of that particular technology, at the cost of generality.
Similar relationships should hold in the library code which modules expose (if any.) An abstract module, for example, would be free to define a protocol, intended to be implemented concretely by code in an implementing module.
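For example (all names hypothetical), an abstract email module might expose a protocol that concrete modules implement:

;; In the abstract module:
(defprotocol EmailSender
  (send-email! [sender message]
    "Deliver the given message map. Concrete modules (SMTP,
    third-party APIs, etc.) supply implementations."))

;; In a concrete implementing module:
(defrecord SmtpSender [host port]
  EmailSender
  (send-email! [this message]
    ;; actual SMTP delivery logic would live here
    nil))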
This pattern is fully extensible; it isn't limited to a single level of abstraction. An abstract module could itself be a narrowing or refinement of another, even more general abstract module.
Concrete Example
As mentioned above, Arachne would like to support both Ring and Pedestal as HTTP servers. Both systems have a number of things in common:
- The concept of a "server" running on a port.
- The concept of a URL path/route
- The concept of a terminal "handler" function which receives a request and returns a response.
They also have some key differences:
- Ring composes "middleware" functions, whereas Pedestal uses "interceptor" objects
- Asynchronous responses are handled differently
Therefore, it makes sense to define an abstract HTTP module which defines the basic domain concepts; servers, routes, handlers, etc. Many dependent modules and applications will be able to make real use of this subset.
Then, there will be the two modules which provide concrete implementations; one for Pedestal, one for Ring. These will contain the code that actually reads the configuration, and at runtime builds appropriate routing tables, starts server instances, etc. Applications which wish to make direct use of a specific feature like Pedestal interceptors may freely do so, using attributes defined by the Pedestal module.
Status
PROPOSED
Consequences
- If modules or users want to program against a "lowest common denominator" abstraction, they may do so, at the cost of the ability to use the full feature set of a library.
- If modules or users want to use the full feature set of a library, they may do so, at the cost of being able to transparently replace it with something else.
- There will be a larger number of different Arachne modules available, and their relationships will be more complex.
- Careful thought and architecture will need to go into the factoring of modules, to determine what the correct general elements are.
Architecture Decision Record: Configuration Ontology
Context
In ADR-003 it was decided to use a Datomic-based configuration, the alternative being something more semantically or ontologically descriptive such as RDF+OWL.
Although we elected to use Datomic, Datomic does not itself offer much ontological modeling capacity. It has no built-in notion of types/classes, and its attribute specifications are limited to what is necessary for efficient storage and indexing, rather than expressive or validating power.
Ideally, we want modules to be able to communicate additional information about the structure and intent of their domain model, including:
- Types of entities which can exist
- Relationships between those types
- Logical constraints on the values of attributes:
- more fine grained cardinality; optional/required attributes
- valid value ranges
- target entity type (for ref attributes)
This additional data could serve three purposes:
- Documentation about the intended purpose and structure of the configuration defined by a module.
- Deeper, more specific validation of user-supplied configuration values
- Machine-readable integration point for tools which consume and produce Arachne configurations.
Decision
- We will add meta-attributes to the schema of every configuration, expressing basic ontological relationships.
- These attributes will be semantically compatible with OWL (such that we could conceivably in the future generate an OWL ontology from a config schema)
- The initial set of these attributes will be minimal, and targeted towards the information necessary to generate rich schema diagrams
- classes and superclasses
- attribute domain
- attribute range (for ref attributes)
- min and max cardinality
- Arachne core will provide some (optional) utility functions for schema generation, to make writing module schemas less verbose.
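A sketch of what a module's schema might look like with such meta-attributes layered on follows; all of the arachne.* meta-attribute names are invented for illustration.

;; Illustrative only -- meta-attribute names are hypothetical
[{:db/ident                 :my.module/Widget        ; a class entity
  :arachne.class/superclass :arachne/Component}      ; superclass relationship
 {:db/ident                 :my.module.widget/size
  :db/valueType             :db.type/long
  :db/cardinality           :db.cardinality/one
  :arachne.attr/domain          :my.module/Widget    ; attribute domain
  :arachne.attr/min-cardinality 1}]                  ; required attribute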
Status
PROPOSED
Consequences
- Arachne schemas will reify the concept of entity type and the possible relationships between entities of various types.
- We will have an approach for adding additional semantic attributes in the future, as it makes sense to do so.
- We will not be obligated to define an entire ontology up front
- Module usage of the defined ontology is not technically enforced. Some elements (such as entity type relationships) will be strong conventions, and possibly required for tool support; others (such as min and max cardinality) will be optional.
- We will preserve the possibility for interop with OWL in the future.
Architecture Decision Record: Persistent Configuration
Context
While many Arachne applications will use a transient config which is rebuilt from its initialization scripts every time an instance is started, some users might wish instead to store their config persistently in a full Datomic instance.
There are a number of possible benefits to this approach:
- Deployments from the same configuration are highly reproducible
- Organizations can maintain an immutable persistent log of configuration changes over time.
- External tooling can be used to persistently build and define configurations, up to and including full "drag and drop" architecture or application design.
Doing this introduces a number of additional challenges:
Initialization Scripts: Having a persistent configuration introduces the question of what role initialization scripts play in the setup. Merely having a persistent config does not make it easier to modify by hand - quite the opposite. While an init script could be used to create the configuration, it's not clear how the configuration would be updated from that point (absent a full config editor UI.)
Re-running a modified configuration script on an existing configuration poses challenges as well; it would require that all scripts be idempotent, so as not to create spurious objects on subsequent runs. Also, scripts would then need to support some concept of retraction.
Scope & Naming: It is extremely convenient to use :db.unique/identity attributes to identify particular entities in a configuration and configuration init scripts. This is not only convenient, but required if init scripts are to be idempotent, since this is the only mechanism by which Datomic can determine that a new entity is "the same" as an older entity in the system.
However, if there are multiple different configurations in the same database, there is the risk that some of these unique values might be unintentionally the same and "collide", causing inadvertent linkages between what ought to be logically distinct configurations.
While this can be mitigated in the simple case by ensuring that every config uses its own unique namespace, it is still something to keep in mind.
Configuration Copying & Versioning: Although Datomic supports a full history, that history is linear. Datomic does not currently support "forking" or maintaining multiple concurrent versions of the same logical data set.
This does introduce complexities when thinking about "modifying" a configuration while still keeping the old one. This kind of "fork" would require a deep clone of all the entities in the config, as well as renaming all of the :db.unique/identity attrs.
Renaming identity attributes compounds the complexity, since it implies that either idents cannot be hardcoded in initialization scripts, or the same init script cannot be used to generate or update two different configurations.
Environment-specific Configuration: Some applications need slightly different configurations for different instances of the "same" application. For instance, some software needs to be told what its own IP address is. While it makes sense to put this data in the configuration, this means that there would no longer be a single configuration, but N distinct (yet 99% identical) configurations.
One solution would be to not store this data in the configuration (instead picking it up at runtime from an environment variable or secondary config file), but multiplying the sources of configuration runs counter to Arachne's overriding philosophy of putting everything in the configuration to start with.
Relationship with module load process: Would the stored configuration represent only the "initial" configuration, before being updated by the active modules? Or would it represent the complete configuration, after all the modules have completed their updates?
Both alternatives present issues.
If only the user-supplied, initial config is stored, then the usefulness of the stored config is diminished, since it does not provide a comprehensive, complete view of the configuration.
On the other hand, if the complete, post-module config is persisted, it raises more questions. What happens if the user edits the configuration in ways that would cause modules to do something different with the config? Is it possible to run the module update process multiple times on the same config? If so, how would "old" or stale module-generated values be removed?
Goals
We need a technical approach with good answers to the challenges described above, that enables a clean user workflow. As such, it is useful to enumerate the specific activities that it would be useful for a persistent config implementation to support:
- Define a new configuration from an init script.
- Run an init script on an existing configuration, updating it.
- Edit an existing configuration using the REPL.
- Edit an existing configuration using a UI.
- Clone a configuration
- Deploy based on a specific configuration
At the same time, we need to be careful not to overly complicate things for the common case; most applications will still use the pattern of generating a configuration from an init script immediately before running an application using it.
Decision
We will not attempt to implement a concrete strategy for config persistence at this time; it runs the risk of becoming a quagmire that will halt forward momentum.
Instead, we will make a minimal set of choices and observations that will enable forward progress while preserving the ability to revisit the issue of persistent configuration at some point in the future.
- The configuration schema itself should be compatible with having several configurations present in the same persistent database. Specifically:
  - Each logical configuration should have its own namespace, which will be used as the namespace of all :db.unique/identity values, ensuring their global uniqueness.
  - There is a 'configuration' entity that reifies a config, its possible root components, how it was constructed, etc.
- The entities in a configuration must form a connected graph. That is, every entity in a configuration must be reachable from the base 'config' entity. This is required in order to identify the config as a whole, for any purpose.
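For illustration, two configurations sharing a database could keep otherwise-identical entities distinct by namespacing their unique identity values:

;; The "same" logical server entity in two distinct configurations;
;; the :arachne/id values collide only if the namespaces do.
[{:arachne/id :cfg-20160701/http-server, :arachne.http.server/port 8080}
 {:arachne/id :cfg-20160715/http-server, :arachne.http.server/port 8080}]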
The initial tooling for building configurations (including the init scripts) will focus on building configurations from scratch. Tooling capable of "editing" an existing configuration is sufficiently different, with a different set of requirements and constraints, that it needs its own design process.
Any future tooling for storing, viewing and editing configurations will need to explicitly determine whether it wants to work with the configuration before or after processing by the modules, since there is a distinct set of tradeoffs.
Status
PROPOSED
Consequences
- We can continue making forward progress on the "local" configuration case.
- Storing persistent configurations remains possible.
- It is immediately possible to save configurations for repeatability and debugging purposes.
- Editing persistent configs, however, will be more difficult.
- When we want to edit persistent configurations, we will need to analyze the specific use cases to determine the best way to do so, and develop tools specific to those tasks.
Architecture Decision Record: Asset Pipeline
Context
In addition to handling arbitrary HTTP requests, we would like for Arachne to make it easy to serve up certain types of well-known resources, such as static HTML, images, CSS, and JavaScript.
These "static assets" can generally be served to users as files directly, without processing at the time they are served. However, it is extremely useful to provide pre-processing, to convert assets in one format to another format prior to serving them. Examples of such transformations include:
- SCSS/LESS to CSS
- CoffeeScript to JavaScript
- ClojureScript to JavaScript
- Full-size images to thumbnails
- Compress files using gzip
Additionally, in some cases several such transformations might be required on the same resource. For example, a file might need to be converted from CoffeeScript to JavaScript, then minified, then gzipped.
In this case, asset transformations form a logical pipeline, applying a set of transformations in a known order to resources that meet certain criteria.
Arachne needs a module that defines a way to specify what assets are, and what transformations ought to apply and in what order. Like everything else, this system needs to be open to extension by other modules, to provide custom processing steps.
Development vs Production
Regardless of how the asset pipeline is implemented, it must provide a good development experience such that the developer can see their changes immediately. When the user modifies an asset file, it should be automatically reflected in the running application in near realtime. This keeps development cycle times low, and provides a fluid, low-friction development experience that allows developers to focus on their application.
Production usage, however, has a different set of priorities. Being able to reflect changes is less important; instead, minimizing processing cost and response time is paramount. In production, systems will generally want to do as much processing as they can ahead of time (during or before deployment), and then cache aggressively.
Deployment & Distribution
For development and simple deployments, Arachne should be capable of serving assets itself. However, whatever technique it uses to implement the asset pipeline, it should also be capable of sending the final assets to a separate cache or CDN such that they can be served statically with optimal efficiency. This may be implemented as a separate module from the core asset pipeline, however.
Entirely Static Sites
There is a large class of websites which actually do not require any dynamic behavior at all; they can be built entirely from static assets (and associated pre-processing.) Examples of frameworks that cater specifically to this type of "static site generation" include Jekyll, Middleman, Brunch, and many more.
By including the asset pipeline module, and not the HTTP or Pedestal modules, Arachne also ought to be able to function as a capable and extensible static site generator.
Decision
Arachne will use Boot to provide an abstract asset pipeline. Boot has built-in support for immutable Filesets, temp directory management, and file watchers.
As with everything in Arachne, the pipeline will be specified as pure data in the configuration, specifying inputs, outputs, and transformations explicitly.
Modules that participate in the asset pipeline will develop against a well-defined API built around Boot Filesets.
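As a sketch of what that might look like at the config DSL level (the namespace, function names, and options below are assumptions, not a settled API):

```clojure
;; Hypothetical pipeline definition: a source directory, a SCSS -> CSS
;; transformation, and an output directory, all declared as config data.
(require '[arachne.assets.dsl :as assets]) ; assumed namespace

(assets/input-dir :app/raw-assets "src/assets")        ; pipeline input
(assets/transform :app/css :app/raw-assets
                  {:transformer :scss})                ; SCSS -> CSS step
(assets/output-dir :app/public :app/css "out/public")  ; pipeline output
```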
Status
PROPOSED
Consequences
- The asset pipeline will be fully specified as data in the Arachne configuration.
- Adding Arachne support for an asset transformation will involve writing a relatively straightforward wrapper adapting the library to work on Boot Filesets.
- We will need to program against some of Boot's internal APIs, although Alan and Micha have suggested they would be willing to factor out the Fileset support to a separate library.
Architecture Decision Record: Enhanced Validation
Context
As much as possible, an Arachne application should be defined by its configuration. If something is wrong with the configuration, there is no way that an application can be expected to work correctly.
Therefore, it is desirable to validate that a configuration is correct to the greatest extent possible, at the earliest possible moment. This is important for two distinct reasons:
- Ease of use and developer friendliness. Config validation can return helpful errors that point out exactly what's wrong, instead of deep failures that require lengthy debugging sessions.
- Program correctness. Some types of errors in configs might not be discovered at all during testing or development, and aggressively failing on invalid configs will prevent those issues from affecting end users in production.
There are two "kinds" of config validation.
The first is ensuring that a configuration, as data, is structurally correct: that it adheres to its own schema. This includes validating types and cardinalities as expressed by Arachne's core ontology system.
The second is ensuring that the Arachne Runtime constructed from a given configuration is correct; that the runtime component instances returned by component constructors are of the correct type and likely to work.
Decision
Arachne will perform both kinds of validation. To disambiguate them (since they are logically distinct), we will term the structural/schema validation "configuration validation", while the validation of the runtime objects will be "runtime validation."
Both styles of validation should be extensible by modules, so modules can specify additional validations, where necessary.
Configuration Validation
Configuration validation is ensuring that an Arachne configuration object is consistent with itself and with its schema.
Because this is ultimately validating a set of Datomic-style `eavt` tuples, the natural form for checking tuple data is Datalog queries and query rules, to search for and locate data that is "incorrect."
Each logical validation will have its own "validator", a function which takes a config, queries it, and either returns or throws an exception. To validate a config, it is passed through every validator as the final step of building a module.
The set of validators is open, and defined in the configuration itself. To add new validators, a module can transact entities for them during its configuration building phase.
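For illustration, a minimal validator might look like the following, assuming a Datomic-backed config value and illustrative attribute names:

```clojure
(require '[datomic.api :as d])

;; Illustrative validator: every HTTP server entity in the config must
;; declare a port. Throws an info-bearing exception when any do not.
(defn servers-have-ports
  [config-db]
  (let [missing (d/q '[:find [?server ...]
                       :where
                       [?server :example.http/handler _]
                       (not [?server :example.http/port _])]
                     config-db)]
    (when (seq missing)
      (throw (ex-info "HTTP server entities are missing a port"
                      {:invalid-entities missing})))))
```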
Runtime Validation
Runtime validation occurs after a runtime is instantiated, but before it is started. Validation happens on the component level; each component may be subject to validation.
Unlike configuration validation, runtime validation uses Spec. Which specs apply to a given component is defined in the configuration, using a keyword-valued attribute. Specs may be defined on individual component entities, or on the type of a component entity. When a component is validated, it is validated using all the specs defined for it or for any of its supertypes.
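A minimal sketch of that flow, assuming a hypothetical `:arachne.component/specs` attribute on the component's config entity:

```clojure
(require '[clojure.spec.alpha :as s])

;; Illustrative component spec: the constructed instance must expose
;; a :connection key.
(s/def :example/database-component #(contains? % :connection))

(defn validate-component
  "Validate an instantiated component against every spec declared for
  it in the configuration."
  [component-entity instance]
  (doseq [spec (:arachne.component/specs component-entity)] ; assumed attr
    (when-not (s/valid? spec instance)
      (throw (ex-info "Component failed runtime validation"
                      {:spec spec
                       :explain-data (s/explain-data spec instance)})))))
```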
Status
PROPOSED
Consequences
- Validations have the opportunity to find errors and return clean error messages
- Both the structure of the config and the runtime instances can be validated
- The configuration itself describes how it will be validated
- Modules have complete flexibility to add new validations
- Users can write custom validations
Architecture Decision Record: Error Reporting
Context
Historically, error handling has not been Clojure's strong suit. For the most part, errors take the form of a JVM exception, with a long stack trace that includes a lot of Clojure's implementation as well as stack frames that pertain directly to user code.
Additionally, prior to the advent of `clojure.spec`, Clojure errors were often "deep": a very generic error (like a NullPointerException) would be thrown from far within a branch of the implementation, rather than surfacing from eager validation of inputs.
There are Clojure libraries which make an attempt to improve the situation, but they typically do it by overriding Clojure's default exception printing functions across the board, and are sometimes "lossy", dropping information that could be desirable to a developer.
Spec provides an opportunity to improve the situation across the board, and with Arachne we want to be on the leading edge of providing helpful error messages that point straight to the problem, minimize time spent trying to figure out what's going on, and let developers get straight back to working on what matters to them.
Ideally, Arachne's error handling should exhibit the following qualities:
- Never hide possibly relevant information.
- Allow module developers to be as helpful as possible to people using their tools.
- Provide rich, colorful, multi-line detailed explanations of what went wrong (when applicable.)
- Be compatible with existing Clojure error-handling practices for errors thrown from libraries that Arachne doesn't control.
- Not violate expectations of experienced Clojure programmers.
- Be robust enough not to cause additional problems.
- Not break existing logging tools for production use.
Decision
We will separate the problems of creating rich exceptions, and catching them and displaying them to the user.
Creating Errors
Whenever a well-behaved Arachne module needs to report an error, it should throw an info-bearing exception. This exception should be formed such that it is handled gracefully by any JVM tooling; the message should be terse but communicative, containing key information with no newlines.
However, in the `ex-data`, the exception will also contain much more detailed information that can be used (in the correct context) to provide much more detailed or verbose errors. Specifically, it may contain the following keys:
- `:arachne.error/message` - The short-form error message (the same as the exception message.)
- `:arachne.error/explanation` - A long-form error message, complete with newlines and formatting.
- `:arachne.error/suggestions` - Zero or more suggestions on how the error might be fixed.
- `:arachne.error/type` - A namespaced keyword that uniquely identifies the type of error.
- `:arachne.error/spec` - The spec that failed (if applicable.)
- `:arachne.error/failed-data` - The data that failed to match the spec (if applicable.)
- `:arachne.error/explain-data` - An explain-data for the spec that failed (if applicable.)
- `:arachne.error/env` - A map of the locals in the env at the time the error was thrown.
Exceptions may, of course, contain additional data; these are the common keys that tools can use to more effectively render errors.
There will be a suite of tools, provided with Arachne's core, for conveniently generating errors that match this pattern.
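Until those helpers exist, a well-behaved error could be constructed by hand, roughly like this (the message, keys present, and values are illustrative):

```clojure
;; Illustrative info-bearing exception: a terse, single-line message,
;; with richer detail carried in the ex-data under the common keys.
(throw
 (ex-info "Server component is missing a :port"
          {:arachne.error/message     "Server component is missing a :port"
           :arachne.error/explanation (str "Every HTTP server component must "
                                           "declare a :port so the adapter "
                                           "knows where to listen.")
           :arachne.error/suggestions ["Add a :port attribute to the server"]
           :arachne.error/type        :example.http/missing-port
           :arachne.error/env         {:server-eid 42}}))
```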
Displaying Errors
We will use a pluggable "error handling system", where users can explicitly install an exception handler other than the default.
If the user does not install any exception handlers, errors will be handled the same way as they are by default (usually, dumped with the message and stack trace to `System/err`). This will not change.
However, Arachne will also provide a function that a user can invoke in their main process, prior to doing anything else. Invoking this function will install a set of default exception handlers that will handle errors in a richer, more Arachne-specific way. This includes printing out the long-form error, or even (eventually) popping open a graphical data browser/debugger (if applicable.)
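One plausible shape for that function, using the standard JVM hook (the function name and printing behavior are assumptions):

```clojure
(import 'java.lang.Thread$UncaughtExceptionHandler)

;; Sketch: install a default handler that prints the long-form
;; explanation when one is present, and falls back to the normal
;; stack-trace printing otherwise.
(defn install-error-handlers! []
  (Thread/setDefaultUncaughtExceptionHandler
   (reify Thread$UncaughtExceptionHandler
     (uncaughtException [_ thread ex]
       (if-let [explanation (:arachne.error/explanation (ex-data ex))]
         (binding [*out* *err*] (println explanation))
         (.printStackTrace ex))))))

;; Called once at the top of the application's -main:
;; (install-error-handlers!)
```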
Status
PROPOSED
Consequences
- Error handling will follow well-known JVM patterns.
- If users want, they can get much richer errors than baseline exception handling.
- The "enhanced" exception handling is optional and will not be present in production.
Architecture Decision Record: Project Templates
Context
When starting a new project, it isn't practical to start completely from scratch every time. We would like to have a variety of "starting point" projects, for different purposes.
Lein templates
In the Clojure space, Leiningen Templates fill this purpose. These are sets of special string-interpolated files that are "rendered" into a working project using special tooling.
However, they have two major drawbacks:
- They only work when using Leiningen as a build tool.
- The template files are not actually valid source files, which makes them difficult to maintain. Changes need to be manually copied over to the templates.
Rails templates
Rails also provides a complete project templating solution. In Rails, the project template is a `template.rb` file which contains DSL forms that specify operations to perform on a fresh project. These operations include creating files, modifying a project's dependencies, adding Rake tasks, and running specific generators.
Generators are particularly interesting, because the idea is that they can generate or modify stubs for files pertaining to a specific part of the application (e.g, a new model or a new controller), and they can be invoked at any point, not just initial project creation.
Decision
To start with, Arachne templates will be standard git repositories containing an Arachne project. They will use no special syntax, and will be valid, runnable projects out of the box.
In order to allow users to create their own projects, these template projects will include a `rename` script. The `rename` script will recursively rename an entire project directory to something that the user chooses, and will delete `.git` and re-run `git init`.
Therefore, the process to start a new Arachne project will be:
- Choose an appropriate project template.
- Clone its git repository from GitHub.
- Run the `rename` script to rename the project to whatever you wish.
- Start a REPL, and begin editing.
Maven Distribution
There are certain development environments where there is not full access to the open internet (particularly in certain governmental applications.) Therefore, accessing GitHub can prove difficult. However, in order to support developers, these organizations often run their own Maven mirrors.
As a convenience to users in these situations, when it is necessary, we can build a wrapper that can compress and install a project directory as a Maven artifact. Then, using standard Maven command line tooling, it will be possible to download and decompress the artifact into a local filesystem directory, and proceed as normal.
Status
PROPOSED
Consequences
- It will take only a few moments for users to create new Arachne projects.
- It will be straightforward to build, curate, test and maintain multiple different types of template projects.
- The only code we will need to write to support templates is the "rename" script.
- The rename script will need to be capable of renaming all the code and files in the template, with awareness of the naming requirements and conventions for Clojure namespaces and code.
- Template projects themselves can be built continuously using CI.
Contrast with Rails
One way that this approach is inferior to Rails templates is that this approach is "atomic"; templating happens once, and it happens for the whole project. Rails templates can be composed of many different generators, and generators can be invoked at any point over a project's lifecycle to quickly stub out new functionality.
This also has implications for maintenance; because Rails generators are updated along with each Rails release, the template itself is more stable, whereas Arachne templates would need to be updated every single time Arachne itself changes. This imposes a maintenance burden on templates maintained by the core team, and risks poor user experience for users who find and try to use an out-of-date third-party template.
However, there is a mitigating difference between Arachne and Rails, which relates directly to the philosophy and approach of the two projects.
In Rails, the project is the source files and the project directory layout. If you ask "where is a controller?", you can answer by pointing to the relevant `*.rb` file in the `app/controllers` directory. So in Rails, the task "create a new controller" is equivalent to creating some number of new files in the appropriate places, containing the appropriate code. Hence the importance of generators.
In Arachne, by contrast, the project is not ultimately defined by its source files and directory structure; it is defined by the config. Of course there are source files and a directory structure, and there will be some conventions about how to organize them, but they are not the very definition of a project. Instead, a project's Configuration is the canonical definition of what a project is and what it does. If you ask "where is a controller?" in Arachne, the only meaningful answer is to point to data in the configuration. And the task "create a controller" means inserting the appropriate data into the config (usually via the config DSL.)
As a consequence, Arachne can focus less on code generation, and more on generating config data. Instead of providing a code generator which writes source files to the project structure, Arachne can provide config generators which users can invoke (with comparable effort) in their config scripts.
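For example, where Rails would run a code generator that writes controller files, an Arachne module could expose a function that is simply called from the config script. A hypothetical sketch (the namespace and function are invented for illustration):

```clojure
;; One call in the config script transacts all the config entities a
;; CRUD HTTP endpoint needs; no source files are generated.
(require '[example.admin.dsl :as admin]) ; hypothetical module namespace

(admin/crud-endpoints {:entity-type :myapp/Person
                       :base-path   "/people"})
```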
As such, Arachne templates will typically be very small. In Arachne, code generation is an antipattern. Instead of making it easy to generate code, Arachne focuses on building abstractions that let users specify their intent directly, in a terse manner.
Architecture Decision Record: Data Abstraction Model
Context
Most applications need to store and manipulate data. In the current state of the art in Clojure, this is usually done in a straightforward, ad-hoc way. Users write schema, interact with their database, and parse data from user input into a persistence format using explicit code.
This is acceptable, if you're writing a custom, concrete application from scratch. But it will not work for Arachne. Arachne's modules need to be able to read and write domain data, while also being compatible with multiple backend storage modules.
For example, a user/password-based authentication module needs to be able to read and write user records to the application database, and it should work whether the application uses a Datomic, SQL, or NoSQL database.
In other words, Arachne cannot function well in a world in which every module is required to interoperate directly against one of several alternative modules. Instead, there needs to be a way for modules to "speak a common language" for data manipulation and persistence.
Other use cases
Data persistence isn't the only concern. There are many other situations where having a common, abstract data model is highly useful. These include:
- quickly defining API endpoints based on a data model
- HTML & mobile form generation
- generalized data validation tools
- unified administration & metrics tools
Modeling & Manipulation
There are actually two distinct concepts at play: data modeling and data manipulation.
Modeling is the activity of defining the abstract shape of the data; essentially, it is writing schema, but in a way that is not specific to any concrete implementation. Modules can then use the data model to generate concrete schema, generate API endpoints, forms, validate data, etc.
Manipulation is the activity of using the model to create, read, update, or delete actual data. For an abstract data manipulation layer, this generally means a polymorphic API defining some common set of operations, which can be extended to concrete CRUD implementations.
Existing solutions: ORMs
Most frameworks have some answer to this problem. Rails has ActiveRecord, Elixir has Ecto, old-school Java has Hibernate, etc. In every case, they try to paper over what it looks like to access the actual database, and provide an idiomatic API in the language to read and persist data. This language-level API is uniformly designed to make the database "easy" to use, but also has the effect of providing a common abstraction point for extensions.
Unfortunately, ORMs also exhibit a common set of problems. By their very nature, they are an extra level of indirection. They provide abstraction, but given how complex databases are, the abstraction is always "leaky" in significant ways. Using them effectively requires a thorough understanding not only of the ORM's APIs, but also of the underlying database implementation, and of what the ORM is doing to map the data from one format to another.
ORMs are also tied more or less tightly to the relational model. Attempts to extend ActiveRecord (for example) to non-relational data stores have had varying levels of success.
Database "migrations"
One other function of such systems is to make sure that the concrete database schema matches the abstract data model that the application is using. Most ORMs implement this using some form of "database migrations", which serve as a repeatable series of all changes made to a database. Ideally, migrations are not redundant with the abstract data model, both to avoid repeating the same information and to ensure consistency.
Decision
Arachne will provide a lightweight model for data abstraction and persistence, oriented around the Entity/Attribute model. To avoid word salad and acronyms loaded with baggage and false expectations, we will give it a semantically clean name. We will be free to define this name, and set expectations around what it is and how it is to be used. I suggest "Chimera", as it is in keeping with the Greek mythology theme and has several relevant connotations.
Chimera consists of two parts:
- An entity model, to allow application authors to easily specify the shape of their domain data in their Arachne configuration.
- A set of persistence operations, oriented around plain Clojure data (maps, sets and vectors) that can be implemented meaningfully against multiple types of adapters. Individual operations are granular and can be both consumed and provided à la carte; adapters that don't support certain behaviors can omit them (at the cost of compatibility with modules that need them.)
Although support for any arbitrary database cannot be guaranteed, the persistence operations are designed to support a majority of commonly used systems, including relational SQL databases, document stores, tuple stores, Datomic, or other "NoSQL" type systems.
At the data model level, Chimera should be a powerful, easy to use way to specify the structure of your data, as data. Modules can then read this data and expose new functionality driven by the application domain model. It needs to be flexible enough that it can be "projected" as schema into diverse types of adapters, and customizable enough that it can be configured to adapt to existing database installations.
Adapters
Chimera Adapters are Arachne modules which take the abstract data structures and operations defined by Chimera, and extend them to specific databases or database APIs such as JDBC, Datomic, MongoDB, etc.
When applicable, there can also be "abstract adapters" that do the bulk of the work of adapting Chimera to some particular genre of database. For example, most key/value stores have similar semantics and core operations: there will likely be a "Key/Value Adapter" that does the bulk of the work for adapting Chimera's operations to key/value storage, and then several thin concrete adapters that implement the actual get/put commands for Cassandra, DynamoDB, Redis, etc.
Limitations and Drawbacks
Chimera is designed to make a limited set of common operations possible to write generically. It is not and cannot ever be a complete interface to every database. Application developers can and should understand and use the native APIs of their selected database, or use a dedicated wrapper module that exposes the full power of their selected technology. Chimera represents only a single dimension of functionality; the entity/attribute model. By definition, it cannot provide access to the unique and powerful features that different databases provide and which their users ought to leverage.
It is also important to recognize that there are problems (even problems that modules might want to tackle) for which Chimera's basic entity/attribute model is simply not a good fit. If the entity model isn't a good fit, <u>do not use</u> Chimera. Instead, find (or write) an Arachne module that defines a data modeling abstraction better suited for the task at hand.
Examples of applications that might not be a good fit for Chimera include:
- Extremely sparse or "wide" data
- Dynamic data which cannot have pre-defined attributes or structure
- Unstructured heterogeneous data (such as large binary or sampling data)
- Data that cannot be indexed and requires distributed or streaming data processing to handle effectively
Modeling
The data model for an Arachne application is, like everything else, data in the Configuration. Chimera defines a set of DSL forms that application authors can use to define data models programmatically, and of course modules can also read, write and modify these definitions as part of their normal configuration process.
Note: The configuration schema, including the schema for the data model, is itself defined using Chimera. This requires some special bootstrapping in the core module. It also implies that Arachne core has a dependency on Chimera. This does not mean that modules are required to use Chimera or that Chimera has some special status relative to other conceivable data models; it just means that it is a good fit for modeling the kind of data that needs to be stored in the configuration.
Modeling: Entity Types
Entity types are entities that define the structure and content for a domain entity. Entity types specify a set of optional and required attributes that entities of that type must have.
Entity types may have one or more supertypes. Semantically, supertypes imply that any entity which is an instance of the subtype is also an instance of the supertype. Therefore, the set of attributes that are valid or required for an entity are the attributes of its types and all ancestor types.
Entity types define only data structures. They are not objects or classes; they do not define methods or behaviors.
In addition to defining the structure of entities themselves, entity types can have additional config attributes that serve as implementation-specific hints. For example, an entity type could have an attribute to override the name of the SQL table used for persistence. This config attribute would be defined and used by the SQL module, not by Chimera itself.
The basic attributes of the entity type, as defined by Chimera, are:
- The name of the type (as a namespace-qualified keyword)
- Any supertypes it may have
- What attributes can be applied to entities of this type
Attribute Definitions
Attribute Definition entities define what types of values can be associated with an entity. They specify:
- The name of the attribute (as a namespace-qualified keyword)
- The min and max cardinality of an attribute (thereby specifying whether it is required or optional)
- The type of allowed values (see the section on Value Types below)
- Whether the attribute is a key. The values of a key attribute are expected to be globally unique, guaranteed to be present, and serve as a way to find specific entities, no matter what the underlying storage mechanism.
- Whether the attribute is indexed. This is primarily a hint to the underlying database implementation.
Like entity types, attribute definitions may have any number of additional attributes, to modify behavior in an implementation-specific way.
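Putting the above together, a data-shaped sketch of an entity type definition might look like this (the DSL form and option names are illustrative only):

```clojure
;; Hypothetical modeling form: a :myapp/Person type with a key
;; attribute, a required attribute, and a cardinality-many reference.
(require '[arachne.chimera.dsl :as chimera]) ; assumed namespace

(chimera/entity-type :myapp/Person
  {:attributes
   [{:name :myapp.person/id      :type :uuid   :key? true
     :min-cardinality 1 :max-cardinality 1}
    {:name :myapp.person/name    :type :string
     :min-cardinality 1 :max-cardinality 1}
    {:name :myapp.person/friends :type {:ref :myapp/Person}
     :min-cardinality 0}]}) ; no :max-cardinality => cardinality-many
```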
Value Types
The value of an attribute may be one of three types:
A reference is a value that is itself an entity. The attribute must specify the entity type of the target entity.
A component is a reference, with the added semantic implication that the value entity is a logical "part" of the parent entity. It will be retrieved automatically, along with the parent, and will also be deleted/retracted along with the parent entity.
A primitive is a simple, atomic value. Primitives may be one of several defined types, which map more or less directly to primitive types on the JVM:
- Boolean (JVM `java.lang.Boolean`)
- String (JVM `java.lang.String`)
- Keyword (Clojure `clojure.lang.Keyword`)
- 64-bit integer (JVM `java.lang.Long`)
- 64-bit floating point decimal (JVM `java.lang.Double`)
- Arbitrary precision integer (JVM `java.math.BigInteger`)
- Arbitrary precision decimal (JVM `java.math.BigDecimal`)
- Instant (absolute time with millisecond resolution) (JVM `java.util.Date`)
- UUID (JVM `java.util.UUID`)
- Bytes (JVM byte array). Since not all storages support binary data, and some might need to serialize it with base64, these values should be fairly small.
This set of primitives represents a reasonable common denominator that is supportable on most target databases. Note that the set is not closed: modules can specify new primitive types that are logically "subtypes" of the generic primitives. Entirely new types can also be defined (with the caveat that they will only work with adapters for which an implementation has been defined.)
Validation
All attribute names are namespace-qualified keywords. If there are specs registered using those keywords, they can be used to validate the corresponding values.
Clojure requires that a namespace be loaded before the specs defined in it are globally registered. To ensure that all relevant specs are loaded before an application runs, Chimera provides config attributes that specify namespaces containing specs. Arachne will ensure that these namespaces are loaded first, so module authors can ensure that their specs are loaded before they are needed.
Chimera also provides a `generate-spec` operation, which programmatically builds a spec for a given entity type that can validate a full entity map of that type.
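A short sketch of both mechanisms; the `generate-spec` call shape is an assumption:

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.string :as str])

;; Because attribute names are namespaced keywords, ordinary spec
;; registration validates individual attribute values:
(s/def :myapp.person/name (s/and string? (complement str/blank?)))

;; Hypothetical usage of generate-spec: build a spec for the whole
;; entity type, then validate a full entity map against it.
;; (def person-spec (chimera/generate-spec config :myapp/Person))
;; (s/valid? person-spec {:myapp.person/id   person-id
;;                        :myapp.person/name "Bill"})
```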
Schema & Migration Operations
In order for data persistence to actually work, the schema of a particular database instance (at least, for those that have schema) needs to be compatible with the application's data model, as defined by Chimera's entity types and attributes.
See ADR-16 for an in-depth discussion of how database migrations work, and the ramifications for how a Chimera data model is declared in the configuration.
Entity Manipulation
The previous section discussed the data model, and how to define the general shape and structure of entities in an application. Entity manipulation refers to the operations available to create, read, update, and delete specific instances of those entities.
Data Representation
Domain entities are represented, in application code, as simple Clojure maps. In their function as Chimera entities, they are pure data; not objects. They are not required to support any additional protocols.
Entity keys are restricted to being namespace-qualified keywords, which correspond with the attribute names defined in configuration (see Attribute Definitions above). Other keys will be ignored in Chimera's operations. Values may be any Clojure value, subject to spec validation before certain operations.
Cardinality-many attributes must use a Clojure sequence, even if there is only one value.
Reference values are represented in one of two ways; as a nested map, or as a lookup reference.
Nested maps are straightforward. For example:
{:myapp.person/id 123
:myapp.person/name "Bill"
:myapp.person/friends [{:myapp.person/id 42
:myapp.person/name "Joe"}]}
Lookup references are special values that identify an attribute (which must be a key) and a value, to indicate the target reference. Chimera provides a tagged literal specifically for lookup references.
{:myapp.person/id 123
:myapp.person/name "Bill"
:myapp.person/friends [#chimera.key[:myapp.person/id 42]]}
All Chimera operations that return data should use one of these representations.
Both representations are largely equivalent, but there is an important note about passing nested maps to persistence operations: the intended semantics for any nested maps must be the same as for the parent map. For example, you cannot call `create` and expect the top-level entity to be created while the nested entity is updated.
Entities do not need to explicitly declare their entity type. Types may be derived from inspecting the set of keys and comparing it to the Entity Types defined in the configuration.
Persistence Operations
The following basic operations are defined:
- `get` - Given an attribute name and value, return a set of matching entity maps from the database. Results are not guaranteed to be found unless the attribute is indexed. Results may be truncated if there are more than can be efficiently returned.
- `create` - Given a full entity map, transactionally store it in the database. Adapters may throw an error if an entity with the same key attribute and value already exists.
- `update` - Given a map of attributes and values, update each of the provided attributes to have new values. The map must contain at least one key attribute. Also takes a set of attribute names which will be deleted/retracted from the entity. Adapters may throw an error if no entity exists for the given key.
- `delete` - Given a key attribute and a value, remove the entity and all its attributes and components.
All these operations should be transactional if possible. Adapters which cannot provide transactional behavior for these operations should note this fact clearly in their documentation, so their users do not make false assumptions about the integrity of their systems.
Each of these operations has its own protocol which may be required by modules, or satisfied by adapters à la carte. Thus, a module that does not require the full set of operations can still work with an adapter, as long as it satisfies the operations that it does need.
This set of operations is not exhaustive; other modules and adapters are free to extend Chimera and define additional operations, with different or stricter semantics. These operations are those that it is possible to implement consistently, in a reasonably performant way, against a "broad enough" set of very different types of databases.
To make it possible for them to be composed more flexibly, operations are expressed as data, not as direct methods.
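A sketch of what an operation expressed as data might look like; the keyword names and `operate` entry point are assumptions:

```clojure
;; An update operation as plain data rather than a direct method call.
;; A dispatching entry point would route it to the adapter's
;; implementation.
(def op
  {:chimera/operation :chimera.operation/update
   :chimera/entity    {:myapp.person/id   123
                       :myapp.person/name "William"}
   :chimera/retract   #{:myapp.person/nickname}}) ; attrs to retract

;; (chimera/operate adapter op)
```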
Capability Model
Adapters must specify a list of what operations they support. Modules should validate this list at runtime, to ensure the adapter works with the operations that they require.
In addition to specifying whether an operation is supported or not, adapters must specify whether they support the operation idempotently and/or transactionally.
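A sketch of the shape such a capability declaration could take, and how a module might check it at runtime (all names assumed):

```clojure
;; Hypothetical capability declaration for an adapter.
(def capabilities
  {:chimera.operation/get    {:idempotent? true  :transactional? true}
   :chimera.operation/create {:idempotent? false :transactional? true}
   :chimera.operation/update {:idempotent? false :transactional? true}
   :chimera.operation/delete {:idempotent? true  :transactional? true}})

(defn assert-operation!
  "Fail fast if the adapter does not support a required operation."
  [capabilities operation]
  (when-not (contains? capabilities operation)
    (throw (ex-info "Adapter does not support required operation"
                    {:operation operation}))))
```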
Status
PROPOSED
Consequences
- Users and modules can define the shape and structure of their domain data in a way that is independent of any particular database or type of database.
- Modules can perform basic data persistence tasks in a database-agnostic way.
- Modules will be restricted to a severely limited subset of data persistence functionality, relative to using any database natively.
- The common data persistence layer is optional, and can be easily bypassed when it is not a good fit.
- The set of data persistence operations is open for extension.
- Because spec-able namespaced keywords are used pervasively, it will be straightforward to leverage Spec heavily for validation, testing, and seed data generation.
Architecture Decision Record: Database Migrations
Context
In general, Arachne's philosophy embraces the concepts of immutability and reproducibility; rather than changing something, replace it with something new. Usually, this simplifies the mental model and reduces the number of variables, reducing the ways in which things can go wrong.
But there is one area where this approach just can't work: administering changes to a production database. Databases must have a stable existence across time. You can't throw away all your data every time you want to make a change.
And yet, some changes in the database do need to happen. Data models change. New fields are added. Entity relationships are refactored.
The challenge is to provide measured, safe, reproducible change across time, in a way that is compatible with Arachne's goal of defining and describing all relevant parts of an application (including its data model, and therefore its schema) in a configuration.
Compounding the challenge is the need to build a system that can define concrete schema for different types of databases, based on a common data model (such as Chimera's, as described in ADR-15.)
Prior Art
Several systems to do this already exist. The best known is probably Rails' Active Record Migrations, which is oriented around making schema changes to a relational database.
Another solution of interest is Liquibase, a system which reifies database changes as data and explicitly applies them to a relational database.
Scenarios
There are a variety of "user stories" to accommodate. Some examples include:
- You are a new developer on a project, and want to create a local database that will work with the current HEAD of the codebase, for local development.
- You are responsible for the production deployment of your project, and your team has a new software version ready to go, but it requires some new fields to be added to the database before the new code will run.
- You want to set up a staging environment that is an exact mirror of your current production system.
- You and a fellow developer are merging your branches for different features. You both made different changes to the data model, and you need to be sure they are compatible after the merge.
- You recognize that you made a mistake earlier in development, and stored a currency value as a floating point number. You need to create a new column in the database which uses a fixed-point type, and copy over all the existing values, using rounding logic that you've agreed on with domain experts.
Decision
Chimera will explicitly define the concept of a migration, and reify migrations as entities in the configuration.
A migration represents an atomic set of changes to the schema of a database. For any given database instance, either a migration has logically been applied, or it hasn't. Migrations have unique IDs, expressed as namespace-qualified keywords.
Every migration has one or more "parent" migrations (except for a single, special "initial" migration, which has no parent). A migration may not be applied to a database unless all of its parents have already been applied.
Migrations also have a signature. The signature is an MD5 checksum of the actual content of the migration as it is applied to the database (whether that be txdata for Datomic, a string for SQL DDL, a JSON string, etc.) This is used to ensure that a migration is not "changed" after it has already been applied to some persistent database.
Adapters are responsible for exposing an implementation of migrations (and accompanying config DSL) that is appropriate for the database type.
Chimera Adapters must additionally satisfy two runtime operations (a protocol sketch follows this list):
- `has-migration?` - Takes the ID and signature of a particular migration, and returns true if the migration has been successfully applied to the database. This implies that databases must be "migration aware" and store the IDs/signatures of migrations that have already been applied.
- `migrate` - Given a specific migration, run the migration and record that the migration has been applied.
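Expressed as a Clojure protocol, those two operations might look like this (the protocol itself is a sketch; only the two operations are fixed by this ADR):

```clojure
;; Sketch of the two required runtime operations.
(defprotocol MigratableAdapter
  (has-migration? [adapter id signature]
    "Return true if the migration with this ID and signature has
    already been applied to the underlying database.")
  (migrate [adapter migration]
    "Apply the given migration, and record its ID and signature so
    that has-migration? subsequently returns true."))
```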
Migration Types
There are three basic types of migrations.
- Native migrations. These are instances of the migration type directly implemented by a database adapter, and are specific to the type of DB being used. For example, a native migration against a SQL database would be implemented (primarily) via a SQL string. A native migration can only be used by adapters of the appropriate type.
- Chimera migrations. These define migrations using Chimera's entity/attribute data model. They are abstract, and should work against multiple different types of adapters. Chimera migrations should be supported by all Chimera adapters.
- Sentinel migrations. These are used to coordinate manual changes to an existing database with the code that requires them. They will always fail to automatically apply to an existing database: the database admin must add the migration record explicitly after they perform the manual migration task. (Note: actually implementing these can be deferred until if or when they are needed.)
Structure & Usage
Because migrations may have one or more parents, migrations form a directed acyclic graph.
This is appropriate, and combines well with Arachne's composability model. A module may define a sequence of migrations that build up a data model, and extending modules can branch from any point to build their own data model that shares structure with it. A module may also depend upon migration chains specified in two other modules, to indicate that it requires both of them.
In the configuration, a Chimera database component may depend on any number of migration components. These migrations, and all their ancestors, form a "database definition", and represent the complete schema of a concrete database instance (as far as Chimera is concerned.)
When a database component is started and connects to the underlying data store, it verifies that all the specified migrations have been applied. If they have not, it fails to start. This guarantees the safety of an Arachne system; a given application simply will not start if it is not compatible with the specified database.
Parallel Migrations
This does create an opportunity for problems: if two migrations which have no dependency relationship ("parallel migrations") have operations that are incompatible, or that would yield different results depending on the order in which they are applied, then these operations "conflict", and applying them to a database could result in errors or non-deterministic behavior.
If the parallel migrations are both Chimera migrations, then Arachne is aware of their internal structure and can detect the conflict and refuse to start or run the migrations, before it actually touches the database.
Unfortunately, Arachne cannot detect conflicting parallel migrations for other migration types. It is the responsibility of application developers to ensure that parallel migrations are logically isolated, and can coexist in the same database without conflict.
Therefore, it is advisable in general for public modules to use only Chimera migrations. In addition to making such modules as broadly compatible as possible, this will also make it more tractable for application authors to avoid conflicting parallel migrations, since they only have to worry about those that they themselves create.
Chimera Migrations & Entity Types
One drawback of using Chimera migrations is that you cannot see a full entity type defined in one place, just from reading a config DSL script. This cannot be avoided: in a real, living application, entities are defined over time, in many different migrations as the application grows, not all at once. Each Chimera migration contains only a fragment of the full data model.
However, this poses a usability problem, both for developers and for machine consumption. There are many reasons for developers or modules to view or query the entity type model as a "point in time" snapshot, rather than as a series of incremental changes.
To support this use case, the Chimera module creates a flat entity type model for each database by "rolling up" the individual Chimera entity definition forms into a single, full data structure graph. This "canonical entity model" can then be used to render schema diagrams for users, or be queried by other modules.
Applying Migrations
When and how to invoke an adapter's `migrate` function is not defined, since different teams will wish to do it in different ways.
Some possibilities include:
- The application calls "migrate" every time it is started (this is only advisable if the database has excellent support for transactional and atomic migrations.) In this scenario, developers only need to worry about deploying the code.
- The devops team can manually invoke the "migrate" function for each new configuration, prior to deployment.
- In a continuous-deployment setup, a CI server could run a battery of tests against a clone of the production database and invoke "migrate" automatically if they pass.
- The development team can inspect the set of migrations and generate a set of native SQL or txdata statements for handoff to a dedicated DBA team for review and commit prior to deployment.
Databases without migrations
Not every application wants to use Chimera's migration system. Some situations where migrations may not be a good fit include:
- You prefer to manage your own database schema.
- You are working with an existing database that predates Arachne.
- You need to work with a database administered by a separate team.
However, you still may wish to utilize Chimera's entity model, and leverage modules that define Chimera migrations.
To support this, Chimera allows you to (in the configuration) designate a database component as "assert-only". Assert-only databases never have migrations applied, and they do not require the database to track any concept of migrations. Instead, they inspect the Chimera entity model (after rolling up all declared migrations) and assert that the database already has compatible schema installed. If it does, everything starts up as normal; if it does not, the component fails to start.
Of course, the schema that Chimera expects most likely will not be an exact match for what is present in the database. To accommodate this, Chimera adapters define a set of override configuration entities (and an accompanying DSL). Users can apply these overrides to change the behavior of the mappings that Chimera uses to query and store data.
Note that Chimera Overrides are incompatible with actually running migrations: they can be used only on an "assert-only" database.
Migration Rollback
Generalized rollback of migrations is intractable, given the variety of databases Chimera intends to support. Use one of the following strategies instead:
- For development and testing, constantly create and throw away new databases.
- Back up your database before running a migration.
- If you can't afford the downtime or data loss associated with restoring a backup, manually revert the changes from the unwanted migration.
Status
PROPOSED
Consequences
- Users can define a data model in their configuration
- The data model can be automatically reflected in the database
- Data model changes are explicitly modeled across time
- All migrations, entity types and schema elements are represented in an Arachne app's configuration
- Given the same configuration, a database built using migrations can be reliably reproduced.
- A configuration using migrations will contain an entire, perfectly reproducible history of the database.
- Migrations are optional, and Chimera's data model can be used against existing databases
Architecture Decision Record: Simplification of Chimera Model
Note: this ADR supersedes some aspects of ADR-15 and ADR-16.
Context
The Chimera data model (as described in ADR-15 and ADR-16) includes the concept of entity type inheritance in the domain data model: a defined entity type may have supertypes, and inherits all the attributes of each supertype.
This is quite expressive, and is a good fit for certain types of data stores (such as Datomic, graph databases, and some object stores.) It makes it possible to compose types, and re-use attributes effectively.
However, it leads to a number of conceptual problems, as well as implementation complexities. These issues include but are not limited to:
- There is a desire for some types to be "abstract", in that they exist purely to be extended and are not intended to be reified in the target database (e.g., as a table). In the current model it is ambiguous whether this is the case or not.
- A single `extend-type` migration operation may need to create multiple columns in multiple tables, which some databases do not support transactionally.
- When doing a lookup by an attribute that exists in multiple types, it is ambiguous which type is intended.
- In a SQL database, how to best model an extended type becomes ambiguous: copying the column leads to "denormalization", which might not be desired. On the other hand, creating a separate table for the shared columns leads to more complex queries with more joins.
All of these issues can be resolved or worked around. But they add a variable amount of complexity cost to every Chimera adapter, and create a domain with large amounts of ambiguous behavior that must be resolved (and which might not be discovered until writing a particular adapter.)
Decision
The concept of type extension and attribute inheritance does not provide benefits proportional to the cost.
We will remove all concept of supertypes, subtypes and attribute inheritance from Chimera's data model.
Chimera's data model will remain "flat". In order to achieve attribute reuse for data stores for which that is idiomatic (such as Datomic), multiple Chimera attributes can be mapped to a single DB-level attribute in the adapter mapping metadata.
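A sketch of what that mapping metadata could look like, with assumed attribute names:

```clojure
;; Hypothetical adapter mapping metadata: two flat Chimera attributes
;; mapped onto one shared, idiomatic Datomic attribute.
{:chimera.adapter/attribute-mappings
 {:myapp.person/name  :shared/name
  :myapp.company/name :shared/name}}
```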
Status
PROPOSED
Consequences
- Adapters will be significantly easier to implement.
- An attribute will need to be repeated if it is present on different domain entity types, even if it is semantically similar.
- Users may need to explicitly map multiple Chimera attributes back to the same underlying DB attr/column if they want to maintain an idiomatic data model for their database.