Architecture Decision Record: Use ADRs

Context

Arachne has several explicit goals that make the practice and discipline of architecture especially important:

Decision

We will document every architecture-level decision for Arachne and its core modules with an Architecture Decision Record. These are a well-structured, relatively lightweight way to capture architectural proposals. They can serve as an artifact for discussion, and remain as an enduring record of the context and motivation of past decisions.

The workflow will be:

  1. A developer creates an ADR document outlining an approach for a particular question or problem. The ADR has an initial status of "proposed."
  2. The developers and steering group discuss the ADR. During this period, the ADR should be updated to reflect additional context, concerns raised, and proposed changes.
  3. Once consensus is reached, the ADR can be transitioned to either an "accepted" or "rejected" state.
  4. Only after an ADR is accepted should implementing code be committed to the master branch of the relevant project/module.
  5. If a decision is revisited and a different conclusion is reached, a new ADR should be created documenting the context and rationale for the change. The new ADR should reference the old one, and once the new one is accepted, the old one should (in its "status" section) be updated to point to the new one. The old ADR should not be removed or otherwise modified except for the annotation pointing to the new ADR.
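
For example, a superseded ADR's status section would be annotated with a line such as the following (the ADR number here is hypothetical):

Superseded by ADR-042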

Status

Accepted

Consequences

  1. Developers must write an ADR and submit it for review before selecting an approach to any architectural decision -- that is, any decision that affects the way Arachne or an Arachne application is put together at a high level.
  2. We will have a concrete artifact around which to focus discussion, before finalizing decisions.
  3. If we follow the process, decisions will be made deliberately, as a group.
  4. The master branch of our repositories will reflect the high-level consensus of the steering group.
  5. We will have a useful persistent record of why the system is the way it is.

Architecture Decision Record: Configuration

Context

Arachne has a number of goals.

  1. It needs to be modular. Different software packages, written by different developers, should be usable and swappable in the same application with a minimum of effort.

  2. Arachne applications need to be transparent and introspectable. It should always be as clear as possible what is going on at any given moment, and why the application is behaving in the way it does.

  3. As a general-purpose web framework, it needs to provide a strong set of default settings which are also highly overridable, and configurable to suit the unique needs of users.

Also, it is good development practice (particularly in Clojure) to code to a specific information model (that is, data) rather than to particular functions or APIs. Among other benefits, this helps keep the intended operation separate from (avoids "complecting" it with) its implementation.

Documenting the full rationale for this "data first" philosophy is beyond the scope of this document, but some resources that explain it (among other things) are:

Finally, one weakness of many existing Clojure libraries, especially web development libraries, is the way in which they overload the Clojure runtime (particularly vars and reified namespaces) to store information about the webapp. Because both the Clojure runtime and many web application entities (e.g., servers) are stateful, this causes a variety of issues, particularly with reloading namespaces. Therefore, as much as possible, we would like to avoid entangling information about an Arachne application with the Clojure runtime itself.

Decision

Arachne will take the "everything is data" philosophy to its logical extreme, and encode as much information about the application as possible in a single, highly general data structure. This will include not just data that is normally thought of as "config" data, but the structure and definition of the application itself. Everything that does not have to be arbitrary executable code will be reflected in the application config value.

Some concrete examples include (but are not limited to):

This configuration value will have a schema that defines what types of entities can exist in the configuration, and what their expected properties are.

Each distinct module will have the ability to contribute to the schema and define entity types specific to its own domain. Modules may interact by referencing entity types and properties defined in other modules.

Although it has much in common with a fully general in-memory database, the configuration value will be a single immutable value, not a stateful data store. This will avoid many of the complexities of state and change, and will eliminate the temptation to use the configuration itself as dynamic storage for runtime data.

Status

Proposed

Consequences

Architecture Decision Record: Datomic-based Configuration

Context

ADR-002 indicates that we will store the entire application config in a single rich data structure with a schema.

Config as Database

This implies that it should be possible to easily search, query and update the configuration value. It also implies that the configuration value is general enough to store arbitrary data; we don't know what kinds of things users or module authors will need to include.

If what we need is a system that allows you to define, query, and update arbitrary data with a schema, then we are looking for a database.

Required data store characteristics:

  1. It must be available under a permissive open source license. Anything else will impose unwanted restrictions on who can use Arachne.
  2. It can operate embedded in a JVM process. We do not want to force users to install anything else or run multiple processes just to get Arachne to work.
  3. The database must be serializable. It must be possible to write the entire configuration to disk, and then reconstitute it in exactly the same state in a separate process.
  4. Because modules build up the schema progressively, the schema must be inherently extensible. It should be possible for modules to progressively add both new entity types and new attributes to existing entity types.
  5. It should be usable from Clojure without a painful impedance mismatch.

Configuration as Ontology

As an extension of the rationale discussed in ADR-002, it is useful to enumerate the possible use cases of the configuration and configuration schema together.

To the extent that the configuration schema expresses and communicates the "categories of being" or "possibility space" of an application, it is a formal Ontology. This is a desirable characteristic, and to the degree that it is practical to do so, it will be useful to learn from or re-use existing work around formal ontological systems.

Implementation Options

There are data stores in four broad categories that satisfy the first three of the characteristics defined above.

We can eliminate relational solutions fairly quickly; SQL schemas are not generally extensible or flexible, failing condition #4. In addition, they do not fare well on #5 -- using SQL for queries and updates is not particularly fluent in Clojure.

Similarly, we can eliminate key/value style data stores. In general, these do not have schemas at all (or at least, not the type of rich schema that provides a meaningful data contract or ontology, which is the point for Arachne.)

This leaves solutions based on the RDF stack, and Datomic-style data stores. Both are viable options which would provide unique benefits for Arachne, and both have different drawbacks.

Explaining the core technical characteristics of RDF/OWL and Datomic is beyond the scope of this document; please see the Jena and Datomic documentation for more details. More information on RDF, OWL and the Semantic web in general:

RDF

The clear choice for a JVM-based, permissively licensed, standards-compliant RDF API is Apache Jena.

Benefits for Arachne

Tradeoffs for Arachne (with mitigations)

Datomic

Note that Datomic itself does not satisfy the first requirement; it is closed-source, proprietary software. There is an open source project, Datascript, which emulates Datomic's APIs (without any of the storage elements). Either one would work for Arachne, since Arachne only needs the subset of features they both support. In fact, if Arachne goes the Datomic-inspired route, we would probably want to support both: Datomic, for those who have an existing investment there, and Datascript for those who desire open source all the way.

Benefits for Arachne

Tradeoffs for Arachne (with mitigations)

Decision

The steering group decided the RDF/OWL approach is too high-risk to wrap in Clojure and implement at this time, while the rewards are mostly intangible "openness" and "interoperability" rather than something that will help move Arachne forward in the short term.

Therefore, we will use a Datomic style schema for Arachne's configuration.

Users may use either Datomic Pro, Datomic Free or Datascript at runtime in their applications. We will provide a "multiplexer" configuration implementation that utilizes both, and asserts that the results are equal: this can be used by module authors to ensure they stay within the subset of features supported by both platforms.
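
As a rough sketch of what this looks like in practice, the following uses DataScript to build and query a configuration value. The attribute names are illustrative, not Arachne's actual schema:

(require '[datascript.core :as d])

;; :arachne/id is declared as a unique identity attribute (illustrative)
(def conn (d/create-conn {:arachne/id {:db/unique :db.unique/identity}}))

;; Transact some configuration data
(d/transact! conn [{:arachne/id :my.app/server
                    :arachne.http.server/port 8080}])

;; The configuration is the immutable database value, not the connection
(def config @conn)

;; Query it with Datalog
(d/q '[:find ?port .
       :where [?e :arachne/id :my.app/server]
              [?e :arachne.http.server/port ?port]]
     config)
;; => 8080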

Before Arachne leaves "alpha" status (that is, before it is declared ready for experimental production use or for the release of third-party modules), we will revisit the question of whether OWL would be more appropriate, and whether we have encountered issues that OWL would have made easier. If so, and if time allows, we reserve the option to either refactor the configuration layer to use Jena as a primary store (porting existing modules), or provide an OWL view/rendering of an ontology stored in Datomic.

Status

Proposed

Consequences

Architecture Decision Record: Module Structure & Loading

Context

Arachne needs to be as modular as possible. Not only do we want the community to be able to contribute new abilities and features that integrate well with the core and with each other, we want some of the basic functionality of Arachne to be swappable for alternatives as well.

ADR-002 specifies that one role of modules is to contribute schema to the application config. Other roles of modules would include providing code (as any library does), and querying and updating the config during the startup process. Additionally, since modules can depend upon each other, they must specify which modules they depend upon.

Ideally there will be as little overhead as possible for creating and consuming modules.

Some of the general problems associated with plugin/module systems include:

There are some existing systems for modularity in the Java ecosystem. The most notable is OSGi, which provides not only a module system addressing the concerns above, but also a service runtime with classpath isolation, dynamic loading and unloading, and lazy activation.

OSGi (and other systems of comparable scope) is overkill for Arachne. Such systems come with benefits, but they are very heavyweight and carry a high complexity burden, not just for Arachne development but also for end users. Specifically, Arachne applications will be drastically simpler if (at runtime) they exist as a straightforward codebase in a single classloader space. Features like lazy loading and dynamic start-stop are likewise out of scope; the goal is for an Arachne runtime itself to be lightweight enough that starting and stopping when modules change is not an issue.

Decision

Arachne will not be responsible for packaging, distribution or downloading of modules. These jobs will be delegated to an external dependency management & packaging tool. Initially, that tool will be Maven/Leiningen/Boot, or some other tool that works with Maven artifact repositories, since that is currently the standard for JVM projects.

Modules that depend on another module must specify that dependency using Maven (or another dependency management tool.)

Arachne will provide no versioning system beyond what the packaging tool provides.

Each module JAR will contain a special arachne-modules.edn file at the root of its classpath. This data file (when read) contains a sequence of module definition maps.

Each module definition map contains the following information:

When an application is defined, the user must specify a set of module names to use (exact mechanism TBD.) Only the specified modules (and their dependencies) will be considered by Arachne. In other words, merely including a module as a dependency in the package manager is not sufficient to activate it and cause it to be used in an application.
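
As a purely illustrative sketch, such a file might look like the following; the key names used here are assumptions, not the actual module definition map format:

;; arachne-modules.edn (hypothetical content)
[{:arachne/name :org.example/my-module
  :arachne/dependencies [:org.example/other-module]
  :arachne/schema org.example.my-module/schema
  :arachne/configure org.example.my-module/configure}]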

Status

Proposed

Consequences

Architecture Decision Record: User Facing Configuration

Context

Per ADR-003, Arachne uses Datomic-shaped data for configuration. Although this is a flexible, extensible data structure which is a great fit for programmatic manipulation, in its literal form it is quite verbose.

It is quite difficult to understand the structure of Datomic data by reading its native textual representation, and it is similarly hard to write, containing enough repeated elements that copying and pasting quickly becomes the default.

One of Arachne's core values is ease of use and a fluent experience for developers. Since much of a developer's interaction with Arachne will be writing to the config, it is of paramount importance that there be some easy way to create configuration data.

The question is, what is the best way for developers of Arachne applications to interact with their application's configuration?

Option: Raw Datomic Txdata

This would require end users to write Datomic transaction data by hand in order to configure their application.

This is the "simplest" option, and has the fewest moving parts. However, as mentioned above, it is very far from ideal for human interactions.

Option: Custom EDN data formats

In this scenario, users would write EDN data in some nested structure of maps, sets, seqs and primitives. This is currently the most common way to configure Clojure applications.

Each module would then need to provide a mapping from the EDN config format to the underlying Datomic-style config data.

Because Arachne's configuration is so much broader, and defines so much more of an application than a typical application config file, it is questionable if standard nested EDN data would be a good fit for representing it.

Option: Code-based configuration

Another option would be to go in the direction of some other frameworks, such as Ruby on Rails, and have the user-facing configuration be code rather than data.

It should be noted that the primary motivation for having a data-oriented configuration language (that it is easier to interact with programmatically) doesn't really apply in Arachne's case. Since applications are always free to interact richly with Arachne's full configuration database, the ability to programmatically manipulate the precursor data is moot. As such, one major argument against a code-based configuration strategy does not apply.

Decision

Developers will have the option of writing configuration using either native Datomic-style data or code-based configuration scripts. Configuration scripts are Clojure files which, when evaluated, update a configuration stored in an atom currently in context (using a dynamically bound var.)

Configuration scripts are Clojure source files in a distinct directory that by convention is outside the application's classpath: configuration code is conceptually and physically separate from application code. Conceptually, loading the configuration scripts could take place in an entirely different process from the primary application, serializing the resulting config before handing it to the runtime application.

To further emphasize the difference between configuration scripts and runtime code, and because they are not on the classpath, configuration scripts will not have namespaces and will instead include each other via Clojure's load function.

Arachne will provide code supporting the ability of module authors to write "configuration DSLs" for users to invoke from their configuration scripts. These DSLs will emphasize making it easy to create appropriate entities in the configuration. In general, DSL forms will have an imperative style: they will convert their arguments to configuration data and immediately transact it to the context configuration.

As a trivial example, instead of writing the verbose configuration data:

{:arachne/id :my.app/server
 :arachne.http.server/port 8080
 :arachne.http.server/debug true}

You could write the corresponding DSL:

(server :id :my.app/server, :port 8080, :debug true)

Note that this is an illustrative example and does not represent the actual DSL or config for the HTTP module.

DSLs should make heavy use of Spec to make errors as comprehensible as possible.
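
A minimal sketch of how such a DSL form could be implemented, assuming a dynamically bound *config* var holding an atom of pending config data; none of the names below are Arachne's actual API:

(require '[clojure.spec.alpha :as s])

(s/def ::id qualified-keyword?)
(s/def ::port pos-int?)
(s/def ::debug boolean?)

;; Bound by the init script loader to an atom of pending config data
(def ^:dynamic *config* nil)

;; With instrumentation enabled, bad arguments fail with a spec error
(s/fdef server
  :args (s/keys* :req-un [::id ::port] :opt-un [::debug]))

(defn server
  "Convert the arguments to configuration data and immediately
  'transact' it into the in-context configuration."
  [& {:keys [id port debug]}]
  (swap! *config* conj
         {:arachne/id id
          :arachne.http.server/port port
          :arachne.http.server/debug debug}))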

Status

Proposed

Consequences

Architecture Decision Record: Core Runtime

Context

At some point, every Arachne application needs to start; to bootstrap itself from a static project or deployment artifact, initialize what needs initializing, and begin servicing requests, connecting to databases, processing data, etc.

There are several logically inherent subtasks to this bootstrapping process, which can be broken down as follows.

As discussed in ADR-004, tasks in the "starting the JVM" category are not in-scope for Arachne; rather, they are offloaded to whatever build/dependency tool the project is using (usually either boot or leiningen.)

This leaves the Arachne and application-specific startup tasks. Arachne should provide an orderly, structured startup (and shutdown) procedure, and make it possible for modules and application authors to hook into it to ensure that their own code initializes, starts and stops as desired.

Additionally, it must be possible for different system components to have dependencies on each other, such that when starting, services start after the services upon which they depend. Stopping should occur in reverse-dependency order, such that a service is never in a state where it is running but one of its dependencies is stopped.

Decision

Components

Arachne uses the Component library to manage system components. Instead of requiring users to define a Component system map manually, however, Arachne itself builds one from component entities that appear in the configuration.

Component entities may be added to the config directly by end users (via an initialization script as per ADR-005), or by modules in their configure function (ADR-004.)

Component entities have attributes which indicate which other components they depend upon. Circular dependencies are not allowed; the component dependency structure must form a Directed Acyclic Graph (DAG.) The dependency attributes also specify the key that Component will use to assoc dependencies.

Component entities also have an attribute that specifies a component constructor function (via a fully qualified name.) Component constructor functions must take two arguments: the configuration, and the entity ID of the component that is to be constructed. When invoked, a component constructor must return a runtime component object, to be used by the Component library. This may be any object that implements clojure.lang.Associative, and may also optionally satisfy Component's Lifecycle protocol.
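
A sketch of a component constructor that satisfies this contract (the component itself is illustrative):

(ns my.app.components
  (:require [com.stuartsierra.component :as component]))

;; A record is automatically clojure.lang.Associative, and here also
;; satisfies Component's Lifecycle protocol.
(defrecord Cache [config eid store]
  component/Lifecycle
  (start [this] (assoc this :store (atom {})))
  (stop [this] (assoc this :store nil)))

(defn new-cache
  "Component constructor: takes the configuration and the entity ID of
  the component entity, and returns a runtime component object."
  [config eid]
  (map->Cache {:config config :eid eid}))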

Arachne Runtime

The top-level entity in an Arachne system is a reified Arachne Runtime object. This object contains both the Component system object, and the configuration value upon which the runtime is based. It satisfies the Lifecycle protocol itself; when it is started or stopped, all of the component objects it contains are started or stopped in the appropriate order.

The constructor function for a Runtime takes a configuration value and some number of "roots"; entity IDs or lookup refs of Component entities in the config. Only these root components and their transitive dependencies will be instantiated or added to the Component system. In other words, only component entities that are actually used will be instantiated; unused component entities defined in the config will be ignored.

A lookup function will be provided to find the runtime object instance of a component, given its entity ID or lookup ref in the configuration.

Startup Procedure

Arachne will rely upon an external build tool (such as boot or leiningen) to handle downloading dependencies, assembling a classpath, and starting a JVM.

Once a JVM with the correct classpath is running, the following steps are required to yield a running Arachne runtime:

  1. Determine a set of modules to use (the "active modules")
  2. Build a configuration schema by querying each active module using its schema function (ADR-004)
  3. Update the config with initial configuration data from user init scripts (ADR-005)
  4. In module dependency order, give each module a chance to query and update the configuration using its configure function (ADR-004)
  5. Create a new Arachne runtime, given the configuration and a set of root components.
  6. Call the runtime's start method.

The Arachne codebase will provide entry points to automatically perform these steps for common development and production scenarios. Alternatively, they can always be executed individually in a REPL, or composed in custom startup functions.
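
Sketched as a REPL session; every name below is a hypothetical stand-in for those entry points, not a confirmed Arachne API:

(require '[com.stuartsierra.component :as component])

;; Steps 1-4: build the configuration from active modules and an init script
(def cfg (arachne.core/build-config [:org.example/http]
                                    "config/my-app.clj"))

;; Step 5: create a runtime with a set of root components
(def rt (arachne.core/runtime cfg [:my.app/server]))

;; Step 6: start it (the runtime satisfies the Lifecycle protocol)
(def started-rt (component/start rt))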

Status

Proposed

Consequences

Architecture Decision Record: Configuration Updates

Context

A core part of the process of developing an application is making changes to its configuration. With its emphasis on configuration, this is even more true of Arachne than of most other web frameworks.

In a development context, developers will want to see these changes reflected in their running application as quickly as possible. Keeping the test/modify cycle short is an important goal.

However, accommodating change is a source of complexity. Extra code would be required to handle "update" scenarios. Components are initialized with a particular configuration in hand. While it would be possible to require that every component support an update operation to receive an arbitrary new config, implementing this is non-trivial and would likely need to involve conditional logic to determine the ways in which the new configuration differs from the old. If any mistakes were made in the implementation of update, for any component, such that the result was not identical to a clean restart, it would be possible to put the system in an inconsistent, unreproducible state.

The "simplest" approach is to avoid the issue and completely discard and rebuild the Arachne runtime (ADR-006) every time the configuration is updated. Every modification to the config would be applied via a clean start, guaranteeing reproducibility and a single code path.

However, this simple baseline approach has two major drawbacks:

  1. The shutdown, initialization, and startup times of the entire set of components will be incurred every time the configuration is updated.
  2. The developer will lose any application state stored in the components whenever the config is modified.

The startup and shutdown time issues are potentially problematic because of the general increase to cycle time. However, it might not be too bad depending on exactly how long it takes sub-components to start. Most commonly-used components take only a few milliseconds to rebuild and restart. This is a cost that most Component workflows absorb without too much trouble.

The second issue is more problematic. Not only is losing state a drain on overall cycle speed, it is a direct source of frustration, causing developers to repeat the same tasks over and over. It will mean that touching the configuration has a real cost, and will cause developers to be hesitant to do so.

Prior Art

There is a library designed to solve the startup/shutdown problem, in conjunction with Component: Suspendable. It is not an ideal fit for Arachne, since it focuses on suspending and resuming the same Component instances rather than rebuilding, but its approach may be instructive.

Decision

Whenever the configuration changes, we will use the simple approach of stopping and discarding the entire old Arachne runtime (and all its components), and starting a new one.

To mitigate the issue of lost state, Arachne will provide a new protocol called Preservable (name subject to change, pending a better one.) Components may optionally implement Preservable; it is not required. Preservable defines a single method, preserve.

Whenever the configuration changes, the following procedure will be used:

  1. Call stop on the old runtime.
  2. Instantiate the new runtime.
  3. For all components in the new runtime which implement Preservable, invoke the preserve function, passing it the corresponding component from the old runtime (if there is one).
  4. The preserve function will selectively copy state out of the old, stopped component into the new, not-yet-started component. It should be careful not to copy any state that would be invalidated by a configuration change.
  5. Call start on the new runtime.
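
A sketch of the proposed protocol (the name is explicitly provisional):

(defprotocol Preservable
  (preserve [new-component old-component]
    "Selectively copy state from the old, stopped component into this
    new, not-yet-started component, omitting any state that would be
    invalidated by the configuration change."))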

Arachne will not provide a mitigation for avoiding the cost of stopping and starting individual components. If this becomes a pain point, we can explore solutions such as that offered by Suspendable.

Status

Proposed

Consequences

Architecture Decision Record: Abstract Modules

Context

One design goal of Arachne is to have modules be relatively easily swappable. Users should not be permanently committed to particular technical choices, but instead should have some flexibility in choosing their preferred tech, as long as it exists in the form of an Arachne module.

Some examples of the alternative implementations that people might wish to use for various parts of their application:

This is only a representative sample; the actual list is unbounded.

The need for this kind of flexibility raises some design concerns:

Capability. Users should always be able to leverage the full power of their chosen technology. That is, they should not have to code to the "least common denominator" of capability. If they use Datomic Pro, for example, they should be able to write Datalog and fully utilize the in-process Peer model, not be restricted to an anemic "ORM" that is also compatible with RDBMSs.

Uniformity. At tension with capability is the desire for uniformity; where the feature set of two alternatives is not particularly distinct, it is desirable to use a common API, so that implementations can be swapped out with little or no effort. For example, the user-facing API for sending a single email should (probably) not care whether it is ultimately sent via a local Sendmail server or a third-party service.

Composition. Modules should also compose as much as possible, and they should be as general as possible in their dependencies to maximize the number of compatible modules. In this situation, it is actually desirable to have a "least common denominator" that modules can have a dependency on, rather than depending on specific implementations. For example, many modules will need to persist data and ultimately will need to work in projects that use Datomic or SQL. Rather than providing multiple versions, one for Datomic users and another for SQL, it would be ideal if they could code against a common persistence abstraction, and therefore be usable in any project with a persistence layer.

What does it mean to use a module?

The following list enumerates the ways in which it is possible to "use" a module, either from a user application or from another module. (See ADR-004).

  1. You can call code that the module provides (the same as any Clojure library.)
  2. You can extend a protocol that the module provides (the same as any Clojure library.)
  3. You can read the attributes defined in the module from the configuration.
  4. You can write configuration data using the attributes defined in the module.

These tools allow the definition of modules with many different kinds of relationships to each other. Speaking loosely, these relationships can correspond to other well-known patterns in software development including composition, mixins, interface/implementation, inheritance, etc.

Decision

In order to simultaneously meet the needs for capability, uniformity and composition, Arachne's core modules will (as appropriate) use the pattern of abstract modules.

Abstract modules define certain attributes (and possibly also corresponding init script DSLs) that describe entities in a particular domain, without providing any runtime implementation which uses them. Then, other modules can "implement" the abstract module, reading the abstract entities and doing something concrete with them at runtime, as well as defining their own more specific attributes.

In this way, user applications and dependent modules can rely either on the common, abstract module or the specific, concrete module as appropriate. Coding against the abstract module will yield a more generic "least common denominator" experience, while coding against a specific implementor will give more access to the unique distinguishing features of that particular technology, at the cost of generality.

Similar relationships should hold in the library code which modules expose (if any.) An abstract module, for example, would be free to define a protocol, intended to be implemented concretely by code in an implementing module.
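
For example (with purely illustrative names), an abstract email module might define the protocol, while an implementing module supplies a concrete type:

;; In the abstract module:
(defprotocol EmailSender
  (send-email! [this message]
    "Send a single email message."))

;; In an implementing module:
(defrecord SendmailSender [host]
  EmailSender
  (send-email! [this message]
    ;; invoke the local sendmail server here
    message))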

This pattern is fully extensible; it isn't limited to a single level of abstraction. An abstract module could itself be a narrowing or refinement of another, even more general abstract module.

Concrete Example

As mentioned above, Arachne would like to support both Ring and Pedestal as HTTP servers. Both systems have a number of things in common:

They also have some key differences:

Therefore, it makes sense to define an abstract HTTP module which defines the basic domain concepts; servers, routes, handlers, etc. Many dependent modules and applications will be able to make real use of this subset.

Then, there will be the two modules which provide concrete implementations; one for Pedestal, one for Ring. These will contain the code that actually reads the configuration, and at runtime builds appropriate routing tables, starts server instances, etc. Applications which wish to make direct use of a specific feature like Pedestal interceptors may freely do so, using attributes defined by the Pedestal module.

Status

Proposed

Consequences

Architecture Decision Record: Configuration Ontology

Context

In ADR-003 it was decided to use a Datomic-based configuration, the alternative being something more semantically or ontologically descriptive such as RDF+OWL.

Although we elected to use Datomic, Datomic does not itself offer much ontological modeling capacity. It has no built-in notion of types/classes, and its attribute specifications are limited to what is necessary for efficient storage and indexing, rather than for expressiveness or validation.

Ideally, we want modules to be able to communicate additional information about the structure and intent of their domain model, including:

This additional data could serve three purposes:

Decision

Status

Proposed

Consequences

Architecture Decision Record: Persistent Configuration

Context

While many Arachne applications will use a transient config which is rebuilt from its initialization scripts every time an instance is started, some users might wish instead to store their config persistently in a full Datomic instance.

There are a number of possible benefits to this approach:

Doing this introduces a number of additional challenges:

Goals

We need a technical approach with good answers to the challenges described above, that enables a clean user workflow. As such, it is useful to enumerate the specific activities that it would be useful for a persistent config implementation to support:

At the same time, we need to be careful not to overly complicate things for the common case; most applications will still use the pattern of generating a configuration from an init script immediately before running an application using it.

Decision

We will not attempt to implement a concrete strategy for config persistence at this time; it runs the risk of becoming a quagmire that will halt forward momentum.

Instead, we will make a minimal set of choices and observations that will enable forward progress while preserving the ability to revisit the issue of persistent configuration at some point in the future.

  1. The configuration schema itself should be compatible with having several configurations present in the same persistent database. Specifically:
  2. The current initial tooling for building configurations (including the init scripts) will focus on building configurations from scratch. Tooling capable of "editing" an existing configuration is sufficiently different, with a different set of requirements and constraints, that it needs its own design process.

  3. Any future tooling for storing, viewing and editing configurations will need to explicitly determine whether it wants to work with the configuration before or after processing by the modules, since there is a distinct set of tradeoffs.

Status

Proposed

Consequences

  1. We can continue making forward progress on the "local" configuration case.
  2. Storing persistent configurations remains possible.
  3. It is immediately possible to save configurations for repeatability and debugging purposes.
    • Editing persistent configs, however, will be more difficult.
  4. When we want to edit persistent configurations, we will need to analyze the specific use cases to determine the best way to do so, and develop tools specific to those tasks.

Architecture Decision Record: Asset Pipeline

Context

In addition to handling arbitrary HTTP requests, we would like for Arachne to make it easy to serve up certain types of well-known resources, such as static HTML, images, CSS, and JavaScript.

These "static assets" can generally be served to users as files directly, without processing at the time they are served. However, it is extremely useful to provide pre-processing, to convert assets in one format to another format prior to serving them. Examples of such transformations include:

Additionally, in some cases, several such transformations might be required on the same resource. For example, a file might need to be converted from CoffeeScript to JavaScript, then minified, then gzipped.

In this case, asset transformations form a logical pipeline, applying a set of transformations in a known order to resources that meet certain criteria.

Arachne needs a module that defines a way to specify what assets are, and what transformations ought to apply and in what order. Like everything else, this system needs to be open to extension by other modules, to provide custom processing steps.

Development vs Production

Regardless of how the asset pipeline is implemented, it must provide a good development experience such that the developer can see their changes immediately. When the user modifies an asset file, it should be automatically reflected in the running application in near realtime. This keeps development cycle times low, and provides a fluid, low-friction development experience that allows developers to focus on their application.

Production usage, however, has a different set of priorities. Being able to reflect changes is less important; instead, minimizing processing cost and response time is paramount. In production, systems will generally want to do as much processing as they can ahead of time (during or before deployment), and then cache aggressively.

Deployment & Distribution

For development and simple deployments, Arachne should be capable of serving assets itself. However, whatever technique it uses to implement the asset pipeline, it should also be capable of sending the final assets to a separate cache or CDN such that they can be served statically with optimal efficiency. This may be implemented as a separate module from the core asset pipeline, however.

Entirely Static Sites

There is a large class of websites which actually do not require any dynamic behavior at all; they can be built entirely from static assets (and associated pre-processing.) Examples of frameworks that cater specifically to this type of "static site generation" include Jekyll, Middleman, Brunch, and many more.

By including the asset pipeline module, and not the HTTP or Pedestal modules, Arachne also ought to be able to function as a capable and extensible static site generator.

Decision

Arachne will use Boot to provide an abstract asset pipeline. Boot has built-in support for immutable Filesets, temp directory management, and file watchers.

As with everything in Arachne, the pipeline will be specified as pure data in the configuration, specifying inputs, outputs, and transformations explicitly.
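
An illustrative sketch of what such pipeline data might look like; the attribute names are assumptions, not the module's actual schema:

[{:arachne/id :my.app/asset-pipeline
  :arachne.assets/input-dir "src/assets"
  :arachne.assets/output-dir "target/public"
  :arachne.assets/transformations [:my.app/compile-coffeescript
                                   :my.app/minify
                                   :my.app/gzip]}]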

Modules that participate in the asset pipeline will develop against a well-defined API built around Boot Filesets.

Status

Proposed

Consequences

Architecture Decision Record: Enhanced Validation

Context

As much as possible, an Arachne application should be defined by its configuration. If something is wrong with the configuration, there is no way that an application can be expected to work correctly.

Therefore, it is desirable to validate that a configuration is correct to the greatest extent possible, at the earliest possible moment. This is important for two distinct reasons:

There are two "kinds" of config validation.

The first is ensuring that a configuration as data is structurally correct; that it adheres to its own schema. This includes validating types and cardinalities as expressed by Arachne's core ontology system.

The second is ensuring that the Arachne Runtime constructed from a given configuration is correct; that the runtime component instances returned by component constructors are of the correct type and likely to work.

Decision

Arachne will perform both kinds of validation. To disambiguate them (since they are logically distinct), we will term the structural/schema validation "configuration validation", while the validation of the runtime objects will be "runtime validation."

Both styles of validation should be extensible by modules, so modules can specify additional validations, where necessary.

Configuration Validation

Configuration validation is ensuring that an Arachne configuration object is consistent with itself and with its schema.

Because this is ultimately validating a set of Datomic-style EAVT tuples, the natural tool for checking the data is Datalog queries and query rules, used to search for and locate data that is "incorrect."

Each logical validation will have its own "validator", a function which takes a config, queries it, and either returns normally or throws an exception. To validate a config, it is passed through every validator as the final step of the configuration building process.

The set of validators is open, and defined in the configuration itself. To add new validators, a module can transact entities for them during its configuration building phase.
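
A sketch of a validator, written against a DataScript-style config value (the attribute name is an assumption):

(require '[datascript.core :as d])

(defn valid-ports
  "Validator: returns the config if no server entity declares a
  non-positive port, and throws otherwise."
  [cfg]
  (let [bad (d/q '[:find [?e ...]
                   :where [?e :arachne.http.server/port ?p]
                          [(< ?p 1)]]
                 cfg)]
    (if (seq bad)
      (throw (ex-info "Invalid server port" {:entities bad}))
      cfg)))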

Runtime Validation

Runtime validation occurs after a runtime is instantiated, but before it is started. Validation happens on the component level; each component may be subject to validation.

Unlike configuration validation, runtime validation uses Spec. Which specs should be applied to each component is defined in the configuration using a keyword-valued attribute. Specs may be defined on individual component entities, or on the type of a component entity. When a component is validated, it is validated using all the specs defined for it or any of its supertypes.
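
Sketched concretely (the config attribute shown in the comment is an assumption):

(require '[clojure.spec.alpha :as s])

;; A spec registered under a keyword...
(s/def :my.app/server-component #(contains? % :config))

;; ...is referenced by that keyword from the component entity, e.g.:
;; {:arachne/id :my.app/server
;;  :arachne.component/spec :my.app/server-component}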

Status

Proposed

Consequences

Architecture Decision Record: Error Reporting

Context

Historically, error handling has not been Clojure's strong suit. For the most part, errors take the form of a JVM exception, with a long stack trace that includes a lot of Clojure's implementation as well as stack frames that pertain directly to user code.

Additionally, prior to the advent of clojure.spec, Clojure errors were often "deep": a very generic error (like a NullPointerException) would be thrown from far within a branch, rather than eagerly validating inputs.

There are Clojure libraries which make an attempt to improve the situation, but they typically do it by overriding Clojure's default exception printing functions across the board, and are sometimes "lossy", dropping information that could be desirable to a developer.

Spec provides an opportunity to improve the situation across the board, and with Arachne we want to be on the leading edge of providing helpful error messages that point straight to the problem, minimize time spent trying to figure out what's going on, and let developers get straight back to working on what matters to them.

Ideally, Arachne's error handling should exhibit the following qualities:

Decision

We will separate the problem of creating rich exceptions from the problem of catching them and displaying them to the user.

Creating Errors

Whenever a well-behaved Arachne module needs to report an error, it should throw an info-bearing exception. This exception should be formed such that it is handled gracefully by any JVM tooling; the message should be terse but communicative, containing key information with no newlines.

However, in the ex-data, the exception will also contain much more detailed information, that can be used (in the correct context) to provide much more detailed or verbose errors. Specifically, it may contain the following keys:

Exceptions may, of course, contain additional data; these are the common keys that tools can use to more effectively render errors.

There will be a suite of tools, provided with Arachne's core, for conveniently generating errors that match this pattern.
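
For illustration, such an exception could be built with ex-info; the ex-data keys shown are hypothetical, since the common key set is defined separately:

;; Terse, single-line message; rich detail lives in the ex-data
(throw (ex-info "Server port 70000 is out of range [1, 65535]"
                {:arachne/error ::port-out-of-range ; hypothetical keys
                 :port 70000
                 :valid-range [1 65535]}))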

Displaying Errors

We will use a pluggable "error handling system", where users can explicitly install an exception handler other than the default.

If the user does not install any exception handlers, errors will be handled the same way as they are by default (usually, dumped with the message and stack trace to System/err.) This will not change.

However, Arachne will also provide a function that a user can invoke in their main process, prior to doing anything else. Invoking this function will install a set of default exception handlers that will handle errors in a richer, more Arachne-specific way. This includes printing out the long-form error, or even (eventually) popping open a graphical data browser/debugger (if applicable.)

Status

Proposed

Consequences

Architecture Decision Record: Project Templates

Context

When starting a new project, it isn't practical to start completely from scratch every time. We would like to have a variety of "starting point" projects, for different purposes.

Lein templates

In the Clojure space, Leiningen Templates fill this purpose. These are sets of string-interpolated files that are "rendered" into a working project using dedicated tooling.

However, they have two major drawbacks:

Rails templates

Rails also provides a complete project templating solution. In Rails, the project template is a template.rb file which contains DSL forms that specify operations to perform on a fresh project. These operations include creating files, modifying a project's dependencies, adding Rake tasks, and running specific generators.

Generators are particularly interesting, because the idea is that they can generate or modify stubs for files pertaining to a specific part of the application (e.g., a new model or a new controller), and they can be invoked at any point, not just initial project creation.

Decision

To start with, Arachne templates will be standard git repositories containing an Arachne project. They will use no special syntax, and will be valid, runnable projects out of the box.

In order to allow users to create their own projects, these template projects will include a rename script. The rename script will recursively rename an entire project directory to something that the user chooses, and will delete .git and re-run git init.

Therefore, the process to start a new Arachne project will be:

  1. Choose an appropriate project template.
  2. Clone its git repository from GitHub.
  3. Run the rename script to rename the project to whatever you wish.
  4. Start a REPL, and begin editing.

Maven Distribution

There are certain development environments where there is not full access to the open internet (particularly in certain governmental applications.) Therefore, accessing GitHub can prove difficult. However, in order to support developers, these organizations often run their own Maven mirrors.

As a convenience to users in these situations, when it is necessary, we can build a wrapper that can compress and install a project directory as a Maven artifact. Then, using standard Maven command line tooling, it will be possible to download and decompress the artifact into a local filesystem directory, and proceed as normal.

Status

Proposed

Consequences

Contrast with Rails

One way that this approach is inferior to Rails templates is that this approach is "atomic"; templating happens once, and it happens for the whole project. Rails templates can be composed of many different generators, and generators can be invoked at any point over a project's lifecycle to quickly stub out new functionality.

This also has implications for maintenance; because Rails generators are updated along with each Rails release, the template itself is more stable, whereas Arachne templates would need to be updated every single time Arachne itself changes. This imposes a maintenance burden on templates maintained by the core team, and risks a poor user experience for users who find and try to use an out-of-date third-party template.

However, there is a mitigating difference between Arachne and Rails, which relates directly to the philosophy and approach of the two projects.

In Rails, the project is the source files, and the project directory layout. If you ask "where is a controller?", you can answer by pointing to the relevant *.rb file in the app/controllers directory. So in Rails, the task "create a new controller" is equivalent to creating some number of new files in the appropriate places, containing the appropriate code. Hence the importance of generators.

In Arachne, by contrast, the project is not ultimately defined by its source files and directory structure; it is defined by the config. Of course there are source files and a directory structure, and there will be some conventions about how to organize them, but they are not the very definition of a project. Instead, a project's Configuration is the canonical definition of what a project is and what it does. If you ask "where is a controller?" in Arachne, the only meaningful answer is to point to data in the configuration. And the task "create a controller" means inserting the appropriate data into the config (usually via the config DSL.)

As a consequence, Arachne can focus less on code generation, and more on generating config data. Instead of providing a code generator which writes source files to the project structure, Arachne can provide config generators which users can invoke (with comparable effort) in their config scripts.

As such, Arachne templates will typically be very small. In Arachne, code generation is an antipattern. Instead of making it easy to generate code, Arachne focuses on building abstractions that let users specify their intent directly, in a terse manner.

Architecture Decision Record: Data Abstraction Model

Context

Most applications need to store and manipulate data. In the current state of the art in Clojure, this is usually done in a straightforward, ad-hoc way. Users write schema, interact with their database, and parse data from user input into a persistence format using explicit code.

This is acceptable if you're writing a custom, concrete application from scratch. But it will not work for Arachne. Arachne's modules need to be able to read and write domain data, while also being compatible with multiple backend storage modules.

For example a user/password based authentication module needs to be able to read and write user records to the application database, and it should work whether a user is using a Datomic, SQL or NoSQL database.

In other words, Arachne cannot function well in a world in which every module must code directly against one of several alternative storage modules. Instead, there needs to be a way for modules to "speak a common language" for data manipulation and persistence.

Other use cases

Data persistence isn't the only concern. There are many other situations where having a common, abstract data model is highly useful. These include:

Modeling & Manipulation

There are actually two distinct concepts at play; data modeling and data manipulation.

Modeling is the activity of defining the abstract shape of the data; essentially, it is writing schema, but in a way that is not specific to any concrete implementation. Modules can then use the data model to generate concrete schema, generate API endpoints, forms, validate data, etc.

Manipulation is the activity of using the model to create, read, update, or delete actual data. For an abstract data manipulation layer, this generally means a polymorphic API defining a common set of operations, which can be extended with concrete CRUD implementations.

Existing solutions: ORMs

Most frameworks have some answer to this problem. Rails has ActiveRecord, Elixir has Ecto, old-school Java has Hibernate, etc. In every case, they try to paper over what it looks like to access the actual database, and provide an idiomatic API in the language to read and persist data. This language-level API is uniformly designed to make the database "easy" to use, but also has the effect of providing a common abstraction point for extensions.

Unfortunately, ORMs also exhibit a common set of problems. By their very nature, they are an extra level of indirection. They provide abstraction, but given how complex databases are, the abstraction is always "leaky" in significant ways. Using them effectively requires a thorough understanding not only of the ORM's APIs, but also of the underlying database implementation, and of what the ORM is doing to map the data from one format to another.

ORMs are also tied more or less tightly to the relational model. Attempts to extend ActiveRecord (for example) to non-relational data stores have had varying levels of success.

Database "migrations"

One other function of ORMs is to make sure that the concrete database schema matches the abstract data model that the application is using. Most ORMs implement this using some form of "database migrations", which serve as a repeatable series of all changes made to a database. Ideally, migrations are not redundant with the abstract data model, both to avoid specifying the same information twice and to ensure consistency.

Decision

Arachne will provide a lightweight model for data abstraction and persistence, oriented around the entity/attribute model. To avoid word salad and acronyms loaded with baggage and false expectations, we will give it a semantically clean name. We will be free to define this name, and set expectations around what it is and how it is to be used. I suggest "Chimera", as it is in keeping with the Greek mythology theme and has several relevant connotations.

Chimera consists of two parts:

Although support for any arbitrary database cannot be guaranteed, the persistence operations are designed to support a majority of commonly used systems, including relational SQL databases, document stores, tuple stores, Datomic, or other "NoSQL" type systems.

At the data model level, Chimera should be a powerful, easy to use way to specify the structure of your data, as data. Modules can then read this data and expose new functionality driven by the application domain model. It needs to be flexible enough that it can be "projected" as schema into diverse types of adapters, and customizable enough that it can be configured to adapt to existing database installations.

Adapters

Chimera Adapters are Arachne modules which take the abstract data structures and operations defined by Chimera, and extend them to specific databases or database APIs such as JDBC, Datomic, MongoDB, etc.

When applicable, there can also be "abstract adapters" that do the bulk of the work of adapting Chimera to some particular genre of database. For example, most key/value stores have similar semantics and core operations: there will likely be a "Key/Value Adapter" that does the bulk of the work for adapting Chimera's operations to key/value storage, and then several thin concrete adapters that implement the actual get/put commands for Cassandra, DynamoDB, Redis, etc.
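
Loosely sketched, such an abstract adapter might be organized around a small protocol like this (illustrative, not an actual Chimera API):

;; The small surface each thin concrete adapter must implement:
(defprotocol KeyValueStore
  (kv-get [this k] "Fetch the value stored under key k.")
  (kv-put [this k v] "Store value v under key k."))

;; The abstract Key/Value adapter would implement Chimera's entity
;; operations once, in terms of kv-get and kv-put; adapters for
;; Cassandra, DynamoDB, Redis, etc. implement only this protocol.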

Limitations and Drawbacks

Chimera is designed to make a limited set of common operations possible to write generically. It is not and cannot ever be a complete interface to every database. Application developers can and should understand and use the native APIs of their selected database, or use a dedicated wrapper module that exposes the full power of their selected technology. Chimera represents only a single dimension of functionality; the entity/attribute model. By definition, it cannot provide access to the unique and powerful features that different databases provide and which their users ought to leverage.

It is also important to recognize that there are problems (even problems that modules might want to tackle) for which Chimera's basic entity/attribute model is simply not a good fit. If the entity model isn't a good fit, do not use Chimera. Instead, find (or write) an Arachne module that defines a data modeling abstraction better suited for the task at hand.

Examples of applications that might not be a good fit for Chimera include:

Modeling

The data model for an Arachne application is, like everything else, data in the Configuration. Chimera defines a set of DSL forms that application authors can use to define data models programmatically, and of course modules can also read, write and modify these definitions as part of their normal configuration process.

Note: The configuration schema, including the schema for the data model, is itself defined using Chimera. This requires some special bootstrapping in the core module. It also implies that Arachne core has a dependency on Chimera. This does not mean that modules are required to use Chimera or that Chimera has some special status relative to other conceivable data models; it just means that it is a good fit for modeling the kind of data that needs to be stored in the configuration.

Modeling: Entity Types

Entity types are entities that define the structure and content of a domain entity. Entity types specify a set of optional and required attributes for entities of that type.

Entity types may have one or more supertypes. Semantically, supertypes imply that any entity which is an instance of the subtype is also an instance of the supertype. Therefore, the set of attributes that are valid or required for an entity are the attributes of its types and all ancestor types.

Entity types define only data structures. They are not objects or classes; they do not define methods or behaviors.

In addition to defining the structure of entities themselves, entity types can have additional config attributes that serve as implementation-specific hints. For example, an entity type could have an attribute to override the name of the SQL table used for persistence. This config attribute would be defined and used by the SQL module, not by Chimera itself.

The basic attributes of the entity type, as defined by Chimera, are:

Attribute Definitions

Attribute Definition entities define what types of values can be associated with an entity. They specify:

  1. The name of the attribute (as a namespace-qualified keyword)
  2. The min and max cardinality of an attribute (thereby specifying whether it is required or optional)
  3. The type of allowed values (see the section on Value Types below)
  4. Whether the attribute is a key. The values of a key attribute are expected to be globally unique, guaranteed to be present, and serve as a way to find specific entities, regardless of the underlying storage mechanism.
  5. Whether the attribute is indexed. This is primarily a hint to the underlying database implementation.

Like entity types, attribute definitions may have any number of additional attributes, to modify behavior in an implementation-specific way.
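
As a concrete sketch, an attribute definition entity covering the properties above might look like this (the keyword names are illustrative assumptions, not Chimera's actual schema):

{:chimera.attribute/name            :myapp.person/id
 :chimera.attribute/min-cardinality 1     ;; at least one value => required
 :chimera.attribute/max-cardinality 1     ;; at most one value
 :chimera.attribute/value-type      :chimera.primitive/long
 :chimera.attribute/key?            true  ;; globally unique lookup key
 :chimera.attribute/indexed?        true} ;; hint to the storage layer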

Value Types

The value of an attribute may be one of three types:

  1. A reference is a value that is itself an entity. The attribute must specify the entity type of the target entity.

  2. A component is a reference, with the added semantic implication that the value entity is a logical "part" of the parent entity. It will be retrieved automatically, along with the parent, and will also be deleted/retracted along with the parent entity.

  3. A primitive is a simple, atomic value. Primitives may be one of several defined types, which map more or less directly to primitive types on the JVM:

    • Boolean (JVM java.lang.Boolean)
    • String (JVM java.lang.String)
    • Keyword (Clojure clojure.lang.Keyword)
    • 64 bit integer (JVM java.lang.Long)
    • 64 bit floating point decimal (JVM java.lang.Double)
    • Arbitrary precision integer (JVM java.math.BigInteger)
    • Arbitrary precision decimal (JVM java.math.BigDecimal)
    • Instant (absolute time with millisecond resolution) (JVM java.util.Date)
    • UUID (JVM java.util.UUID)
    • Bytes (JVM byte array). Since not all storage layers support binary data natively, and some may need to serialize it as Base64, byte array values should be kept fairly small.

    This set of primitives represents a reasonable common denominator that is supportable on most target databases. Note that the set is not closed: modules can specify new primitive types that are logically "subtypes" of the generic primitives. Entirely new types can also be defined (with the caveat that they will only work with adapters for which an implementation has been defined).

Validation

All attribute names are namespace-qualified keywords. If there are specs registered using those keywords, they can be used to validate the corresponding values.

Clojure requires that a namespace be loaded before the specs defined in it are globally registered. To ensure that all relevant specs are loaded before an application runs, Chimera provides config attributes that specify namespaces containing specs. Arachne will ensure that these namespaces are loaded first, so module authors can be confident that their specs are registered before they are needed.

Chimera also provides a generate-spec operation, which programmatically builds a spec for a given entity type; the generated spec can validate a full entity map of that type.
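
As a minimal sketch of how this fits together with clojure.spec (the :myapp.person/* attributes are hypothetical, and s/keys merely approximates what generate-spec might produce):

(require '[clojure.spec.alpha :as s])

;; Specs registered under the attribute names validate individual values...
(s/def :myapp.person/id int?)
(s/def :myapp.person/name string?)

(s/valid? :myapp.person/name "Bill")
;; => true

;; ...and a generated entity spec can validate a full entity map:
(s/def :myapp/person (s/keys :req [:myapp.person/id :myapp.person/name]))

(s/valid? :myapp/person {:myapp.person/id 123 :myapp.person/name "Bill"})
;; => true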

Schema & Migration Operations

In order for data persistence to actually work, the schema of a particular database instance (at least, for those that have schema) needs to be compatible with the application's data model, as defined by Chimera's entity types and attributes.

See ADR-16 for an in-depth discussion of how database migrations work, and the ramifications for how a Chimera data model is declared in the configuration.

Entity Manipulation

The previous section discussed the data model, and how to define the general shape and structure of entities in an application. Entity manipulation refers to the operations available to create, read, update, and delete specific instances of those entities.

Data Representation

Domain entities are represented, in application code, as simple Clojure maps. In their function as Chimera entities, they are pure data, not objects; they are not required to support any additional protocols.

Entity keys are restricted to being namespace-qualified keywords, which correspond with the attribute names defined in configuration (see Attribute Definitions above). Other keys will be ignored in Chimera's operations. Values may be any Clojure value, subject to spec validation before certain operations.

Cardinality-many attributes must use a Clojure sequence, even if there is only one value.
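
For example, using a hypothetical cardinality-many attribute:

{:myapp.person/id        7
 :myapp.person/nicknames ["Billy"]} ;; a sequence, even for a single value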

Reference values are represented in one of two ways; as a nested map, or as a lookup reference.

Nested maps are straightforward. For example:

{:myapp.person/id 123
 :myapp.person/name "Bill"
 :myapp.person/friends [{:myapp.person/id 42
                         :myapp.person/name "Joe"}]}

Lookup references are special values that combine an attribute (which must be a key) with a value, to identify the target entity. Chimera provides a tagged literal specifically for lookup references.

{:myapp.person/id 123
 :myapp.person/name "Bill"
 :myapp.person/friends [#chimera.key[:myapp.person/id 42]]}

All Chimera operations that return data should use one of these representations.

Both representations are largely equivalent, but there is an important caveat when passing nested maps to persistence operations: the intended semantics for any nested maps must be the same as for the parent map. For example, you cannot call create and expect the top-level entity to be created while the nested entity is updated.

Entities do not need to explicitly declare their entity type. Types may be derived from inspecting the set of keys and comparing it to the Entity Types defined in the configuration.
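
A minimal sketch of such a derivation, assuming a hypothetical map of entity type names to attribute sets (this is not Chimera's actual API):

(require '[clojure.set :as set])

(defn derive-entity-type
  "Return the first entity type whose attribute set covers all of the
  entity's namespace-qualified keys."
  [entity-types entity]
  (let [ks (set (filter qualified-keyword? (keys entity)))]
    (some (fn [[type-name attrs]]
            (when (set/subset? ks attrs) type-name))
          entity-types)))

(derive-entity-type
 {:myapp/person #{:myapp.person/id :myapp.person/name}}
 {:myapp.person/id 99 :myapp.person/name "Ada"})
;; => :myapp/person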

Persistence Operations

The following basic operations are defined:

All these operations should be transactional if possible. Adapters which cannot provide transactional behavior for these operations should note this fact clearly in their documentation, so their users do not make false assumptions about the integrity of their systems.

Each of these operations has its own protocol which may be required by modules, or satisfied by adapters à la carte. Thus, a module that does not require the full set of operations can still work with an adapter, as long as it satisfies the operations that it does need.

This set of operations is not exhaustive; other modules and adapters are free to extend Chimera and define additional operations, with different or stricter semantics. These operations are those that it is possible to implement consistently, in a reasonably performant way, against a "broad enough" set of very different types of databases.

To make it possible to compose them more flexibly, operations are expressed as data, not as direct method calls.
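
A hypothetical operation-as-data value might look like this (the keyword names are invented for illustration):

{:chimera/operation :chimera.operation/create
 :chimera/entity    {:myapp.person/id   123
                     :myapp.person/name "Bill"}}

Because the operation is an ordinary value, modules can inspect, transform, or log it before an adapter executes it.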

Capability Model

Adapters must specify a list of what operations they support. Modules should validate this list at runtime, to ensure the adapter works with the operations that they require.

In addition to specifying whether an operation is supported or not, adapters must specify whether they support the operation idempotently and/or transactionally.
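
Such a capability declaration might be sketched as data like this (the shape and keyword names are assumptions for illustration, not Chimera's actual API):

{:chimera.operation/create {:supported?     true
                            :idempotent?    false
                            :transactional? true}
 :chimera.operation/get    {:supported?     true
                            :idempotent?    true
                            :transactional? true}}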

Status

PROPOSED

Consequences

Architecture Decision Record: Database Migrations

Context

In general, Arachne's philosophy embraces the concepts of immutability and reproducibility; rather than changing something, replace it with something new. Usually, this simplifies the mental model and reduces the number of variables, narrowing the ways in which things can go wrong.

But there is one area where this approach just can't work: administering changes to a production database. Databases must have a stable existence across time. You can't throw away all your data every time you want to make a change.

And yet, some changes in the database do need to happen. Data models change. New fields are added. Entity relationships are refactored.

The challenge is to provide measured, safe, reproducible change across time, in a way that is also compatible with Arachne's goal of defining and describing all relevant parts of an application, including its data model (and therefore its schema), in a configuration.

Compounding the challenge is the need to build a system that can define concrete schema for different types of databases, based on a common data model (such as Chimera's, described in ADR-15).

Prior Art

Several systems to do this already exist. The best known is probably Rails' Active Record Migrations, which is oriented around making schema changes to a relational database.

Another solution of interest is Liquibase, a system which reifies database changes as data and explicitly applies them to a relational database.

Scenarios

There are a variety of "user stories" to accommodate. Some examples include:

  1. You are a new developer on a project, and want to create a local database that will work with the current HEAD of the codebase, for local development.
  2. You are responsible for the production deployment of your project, and your team has a new software version ready to go, but it requires some new fields to be added to the database before the new code will run.
  3. You want to set up a staging environment that is an exact mirror of your current production system.
  4. You and a fellow developer are merging your branches for different features. You both made different changes to the data model, and you need to be sure they are compatible after the merge.
  5. You recognize that you made a mistake earlier in development, and stored a currency value as a floating point number. You need to create a new column in the database which uses a fixed-point type, and copy over all the existing values, using rounding logic that you've agreed on with domain experts.

Decision

Chimera will explicitly define the concept of a migration, and reify migrations as entities in the configuration.

A migration represents an atomic set of changes to the schema of a database. For any given database instance, either a migration has logically been applied, or it hasn't. Migrations have unique IDs, expressed as namespace-qualified keywords.

Every migration has one or more "parent" migrations (except for a single, special "initial" migration, which has no parent). A migration may not be applied to a database unless all of its parents have already been applied.

Migrations also have a signature. The signature is an MD5 checksum of the actual content of the migration as it is applied to the database (whether that be txdata for Datomic, a string of SQL DDL, a JSON string, etc.). This is used to ensure that a migration is not "changed" after it has already been applied to some persistent database.
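
Putting these pieces together, a single migration might be reified in the configuration as something like the following (the attribute names and checksum value are invented for illustration):

{:chimera.migration/id        :myapp.migrations/add-person-email
 :chimera.migration/parents   [:myapp.migrations/initial]
 :chimera.migration/signature "9e107d9d372bb6826bd81d3542a419d6"}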

Adapters are responsible for exposing an implementation of migrations (and accompanying config DSL) that is appropriate for the database type.

Chimera Adapters must additionally satisfy two runtime operations:

Migration Types

There are three basic types of migrations.

  1. Native migrations. These are instances of the migration type directly implemented by a database adapter, and are specific to the type of DB being used. For example, a native migration against a SQL database would be implemented (primarily) via a SQL string. A native migration can only be used by adapters of the appropriate type.
  2. Chimera migrations. These define migrations using Chimera's entity/attribute data model. They are abstract, and should work against multiple different types of adapters. Chimera migrations should be supported by all Chimera adapters.
  3. Sentinel migrations. These are used to coordinate manual changes to an existing database with the code that requires them. They will always fail to apply automatically to an existing database: the database admin must add the migration record explicitly after performing the manual migration task. (Note: actually implementing these can be deferred until they are needed.)

Structure & Usage

Because migrations may have one or more parents, migrations form a directed acyclic graph.

This is appropriate, and combines well with Arachne's composability model. A module may define a sequence of migrations that build up a data model, and extending modules can branch from any point to build their own data model that shares structure with it. A module may also depend upon migration chains specified in two separate dependency modules, to indicate that it requires both of them.
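
For illustration, such a graph might be sketched as a map from migration ID to parent IDs (all IDs are invented):

{:app/initial   []                ;; the special initial migration, no parents
 :app/people    [:app/initial]    ;; a module's base data model
 :shop/orders   [:app/people]     ;; one extending module branches here...
 :blog/posts    [:app/people]     ;; ...and so does another
 :site/combined [:shop/orders :blog/posts]} ;; requires both chains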

In the configuration, a Chimera database component may depend on any number of migration components. These migrations, and all their ancestors, form a "database definition", and represent the complete schema of a concrete database instance (as far as Chimera is concerned.)

When a database component is started and connects to the underlying data store, it verifies that all the specified migrations have been applied. If they have not, it fails to start. This guarantees the safety of an Arachne system: a given application simply will not start if it is not compatible with the specified database.

Parallel Migrations

This does create an opportunity for problems: if two migrations with no dependency relationship ("parallel migrations") contain operations that are incompatible, or that would yield different results depending on the order in which they are applied, then those operations "conflict", and applying them to a database could result in errors or non-deterministic behavior.

If the parallel migrations are both Chimera migrations, then Arachne is aware of their internal structure and can detect the conflict and refuse to start or run the migrations, before it actually touches the database.

Unfortunately, Arachne cannot detect conflicting parallel migrations for other migration types. It is the responsibility of application developers to ensure that parallel migrations are logically isolated and can coexist in the same database without conflict.

Therefore, it is generally advisable for public modules to use only Chimera migrations. In addition to making the modules as broadly compatible as possible, this also makes it more tractable for application authors to avoid conflicting parallel migrations, since they then only need to worry about the migrations they themselves create.

Chimera Migrations & Entity Types

One drawback of using Chimera migrations is that you cannot see a full entity type defined in one place, just from reading a config DSL script. This cannot be avoided: in a real, living application, entities are defined over time, in many different migrations as the application grows, not all at once. Each Chimera migration contains only a fragment of the full data model.

However, this poses a usability problem, both for developers and for machine consumption. There are many reasons for developers or modules to view or query the entity type model as a "point in time" snapshot, rather than as a series of incremental changes.

To support this use case, the Chimera module creates a flat entity type model for each database by "rolling up" the individual Chimera entity definition forms into a single, full data structure graph. This "canonical entity model" can then be used to render schema diagrams for users, or be queried by other modules.
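
A minimal sketch of such a roll-up, assuming each migration contributes a fragment mapping an entity type to some attributes (the shapes are illustrative, not Chimera's actual implementation):

(defn rollup-entity-model
  "Merge per-migration entity-type fragments, in dependency order, into
  a single canonical model of entity type -> attribute set."
  [fragments]
  (reduce (fn [model {:keys [entity-type attributes]}]
            (update model entity-type (fnil into #{}) attributes))
          {}
          fragments))

(rollup-entity-model
 [{:entity-type :myapp/person :attributes [:myapp.person/id]}
  {:entity-type :myapp/person :attributes [:myapp.person/email]}])
;; => {:myapp/person #{:myapp.person/id :myapp.person/email}}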

Applying Migrations

When and how to invoke an Adapter's migrate function is not defined, since different teams will wish to do it in different ways.

Some possibilities include:

  1. The application calls "migrate" every time it is started (advisable only if the database has excellent support for transactional, atomic migrations). In this scenario, developers only need to worry about deploying the code.
  2. The devops team can manually invoke the "migrate" function for each new configuration, prior to deployment.
  3. In a continuous-deployment setup, a CI server could run a battery of tests against a clone of the production database and invoke "migrate" automatically if they pass.
  4. The development team can inspect the set of migrations and generate a set of native SQL or txdata statements for handoff to a dedicated DBA team for review and commit prior to deployment.

Databases without migrations

Not every application wants to use Chimera's migration system. Some situations where migrations may not be a good fit include:

However, you may still wish to use Chimera's entity model, and to leverage modules that define Chimera migrations.

To support this, Chimera allows you to (in the configuration) designate a database component as "assert-only". Assert-only databases never have migrations applied, and they do not require the database to track any concept of migrations. Instead, they inspect the Chimera entity model (after rolling up all declared migrations) and assert that the database already has compatible schema installed. If it does, everything starts up as normal; if it does not, the component fails to start.

Of course, the schema that Chimera expects will most likely not be an exact match for what is present in the database. To accommodate this, Chimera adapters define a set of override configuration entities (and an accompanying DSL). Users can apply these overrides to change the behavior of the mappings that Chimera uses to query and store data.
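
For example, an override might be sketched as configuration data like the following (keyword names are invented for illustration), mapping a Chimera attribute onto a pre-existing legacy column:

{:chimera.override/attribute :myapp.person/name
 :chimera.override/column    "FULL_NAME"} ;; column already in the database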

Note that Chimera Overrides are incompatible with actually running migrations: they can be used only on an "assert-only" database.

Migration Rollback

Generalized rollback of migrations is intractable, given the variety of databases Chimera intends to support. Use one of the following strategies instead:

Status

PROPOSED

Consequences

Architecture Decision Record: Simplification of Chimera Model

Note: this ADR supersedes some aspects of ADR-15 and ADR-16.

Context

The Chimera data model (as described in ADR-15 and ADR-16) includes the concept of entity type inheritance in the domain data model: a defined entity type may have supertypes, and it inherits all the attributes of each of its supertypes.

This is quite expressive, and is a good fit for certain types of data stores (such as Datomic, graph databases, and some object stores). It makes it possible to compose types and re-use attributes effectively.

However, it leads to a number of conceptual problems, as well as implementation complexities. These issues include but are not limited to:

All of these issues can be resolved or worked around. But they add a variable amount of complexity cost to every Chimera adapter, and create a domain with large amounts of ambiguous behavior that must be resolved (and which might not be discovered until a particular adapter is being written).

Decision

The concept of type extension and attribute inheritance does not provide benefits proportional to the cost.

We will remove all concept of supertypes, subtypes and attribute inheritance from Chimera's data model.

Chimera's data model will remain "flat". In order to achieve attribute reuse for data stores for which that is idiomatic (such as Datomic), multiple Chimera attributes can be mapped to a single DB-level attribute in the adapter mapping metadata.
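
For example, such mapping metadata might be sketched as follows (the keyword names are invented for illustration), pointing two flat Chimera attributes at a single shared Datomic attribute:

{:myapp.person/name  {:datomic/ident :entity/name}
 :myapp.company/name {:datomic/ident :entity/name}}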

Status

PROPOSED

Consequences