******************
Essential Concepts
******************

Data modelling and schema languages are a complex business, which is why
JSON folks try to avoid them like the plague. We believe, however, that data
modelling greatly helps interoperability and unification of management
interfaces. From a programmer's perspective, data models not only
document the data that our programs process, but they can also be
used for automating many difficult and/or laborious tasks such as
validating data or building user interfaces.

YANG tries to be easy to read and understand. Very often, people
are able to understand what YANG data models mean without having read
the YANG language specification. Nevertheless, some concepts and rules in
YANG are less than intuitive, and some are perhaps even slightly
peculiar. This section gives an overview of fundamental YANG concepts
and terms that are needed for understanding the documentation of the
*Yangson* library. However, it is no substitute for studying the YANG
documentation, especially [RFC7950]_ and [RFC8407]_.

Another factor that may confuse users of the *Yangson* library is
conflicting terminology: some terms, such as *module* or *instance*,
are used, with different meanings, in both Python and YANG. Therefore,
when reading the following sections, it is important to distinguish
whether a given text discusses the programming language or data
modelling.

Data Models
===========

Data models define and describe some data. A complete YANG data
model usually corresponds to a particular device, physical or virtual,
with a dedicated configuration and management system.

YANG distinguishes four different sorts of data:

* configuration,
* state data and statistics,
* parameters of RPC (remote procedure call) operations,
* asynchronous event notifications.

YANG assumes that data of all the sorts listed above are
hierarchically organised, i.e. they form a tree. For example, data
encoded in JSON can be modelled nicely with YANG.

For the specification of a data model, YANG uses both formal means and
textual descriptions that may specify additional rules and
constraints. Such textual descriptions are considered an integral part
of the data model and cannot be ignored!

The formal means include:

* The hierarchical structure of data is described via containment of
  YANG statements. For example, a container node is defined through
  the **container** statement, and child nodes of this container are
  defined through its substatements.

* YANG also allows for defining which nodes are mandatory and which
  are optional. For lists (sequences of entries of the same type), it
  is possible to specify the minimum and maximum number of entries.

* All scalar parameters have a type. YANG offers a wide variety of
  built-in types, such as *string*, *boolean* and *int32*. It is also
  possible to define *derived* types by taking an existing type
  (built-in or derived), giving it a new name, and optionally
  specifying some restrictions. For example, a restriction that may be
  applied to the *string* type is the *pattern* statement that
  specifies a regular expression that strings belonging to that type
  must match.

* For scalar parameters, it is also possible to define a default value.
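
The type and default mechanisms from the list above can be imitated in
a few lines of Python. This is only an illustrative sketch, not the way
*Yangson* implements types: the class ``DerivedString`` and the type
name ``hex-byte`` are hypothetical, and Python's ``re.fullmatch``
merely approximates the XSD regular expressions used by the YANG
*pattern* statement.

.. code-block:: python

    import re
    from dataclasses import dataclass
    from typing import Optional


    @dataclass
    class DerivedString:
        """Hypothetical stand-in for a type derived from YANG *string*."""

        name: str
        pattern: str                     # restriction: whole value must match
        default: Optional[str] = None    # used when the leaf is absent

        def contains(self, value: str) -> bool:
            # YANG patterns are implicitly anchored, hence fullmatch.
            return re.fullmatch(self.pattern, value) is not None

        def effective_value(self, value: Optional[str]) -> str:
            """Return the given value, or the default if none was given."""
            if value is None:
                if self.default is None:
                    raise ValueError(f"{self.name}: no value and no default")
                return self.default
            if not self.contains(value):
                raise ValueError(f"{self.name}: {value!r} violates the pattern")
            return value


    hex_byte = DerivedString("hex-byte", r"[0-9a-f]{2}", default="00")
    print(hex_byte.contains("ff"))         # value matching the pattern
    print(hex_byte.contains("ff00"))       # too long for the pattern
    print(hex_byte.effective_value(None))  # falls back to the default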

For the *Yangson* library, a fully specified data model is the
baseline from which any further processing starts. That's why
operations with isolated YANG modules are not “officially” supported,
i.e. not available through the public API.

Trees and Nodes
===============

In most practical applications of the *Yangson* library, a programmer
needs to work with two types of trees:

* *data tree* contains real data such as configuration, state data,
  RPC input/output parameters, or notifications. For our purposes, a
  data tree is a JSON document, or a parsed in-memory representation
  thereof.

* *schema tree* allows us to decide which data trees are valid and
  which are not.

Each node in the data tree corresponds to a *data node* in the schema
tree. This looks confusing but in fact it is quite logical: data nodes
are special schema nodes that have counterparts in the data tree.
There are other schema nodes, namely *choice* and *case*, that don't
have this property – they are used in the schema for specifying
possible alternatives of which only one can appear in the data tree.
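
The relationship between the two trees can be sketched as follows. The
``Node`` class, the node names and the sample data are all
hypothetical and much simpler than the schema node classes in
*Yangson*. The point is only that *choice* and *case* nodes are
skipped when we ask which members may appear in the data tree.

.. code-block:: python

    from dataclasses import dataclass, field
    from typing import List


    @dataclass
    class Node:
        """Hypothetical, simplified schema node."""

        name: str
        kind: str                   # "container", "leaf", "choice", "case"
        children: List["Node"] = field(default_factory=list)

        def is_data_node(self) -> bool:
            # choice and case exist only in the schema tree.
            return self.kind not in ("choice", "case")

        def data_children(self) -> List["Node"]:
            """Return the nearest descendants that can appear in data."""
            result = []
            for child in self.children:
                if child.is_data_node():
                    result.append(child)
                else:
                    # Look through choice/case to the data nodes below.
                    result.extend(child.data_children())
            return result


    # Schema: container "transfer" offers either "tcp-port" or "udp-port".
    schema = Node("transfer", "container", [
        Node("protocol", "choice", [
            Node("tcp", "case", [Node("tcp-port", "leaf")]),
            Node("udp", "case", [Node("udp-port", "leaf")]),
        ]),
    ])

    # In the data tree, the leaf is a direct member of "transfer".
    data = {"transfer": {"tcp-port": 8080}}

    allowed = [n.name for n in schema.data_children()]
    print(allowed)

Neither ``protocol`` nor ``tcp`` shows up in ``data``, although both
are needed in the schema to express that the two leaves are mutually
exclusive.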

YANG Modules
============

YANG data models consist of *modules*. Each module defines the schema
for some (usually related) parts of the data trees. Typically, a YANG
module covers a certain subsystem or function. Every module defines a
namespace that needs to be locally unique in a given data model. In
*Yangson*, the namespace is identified by the YANG module name.

YANG modules may also offload parts of their contents
into *submodules*. One can then have one (main) module and any number
of submodules that are included from the main module. The main module
and all its submodules share the same namespace identified by the main
module name.

In order to create a particular data model, one has to decide which
YANG modules will become part of it. The selection is recorded
in *YANG library* data [RFC7895]_. And since YANG modules may exist in
multiple revisions, a revision also needs to be specified for each
module.

YANG also offers two mechanisms that allow for finer-grained control of
data model content:

* *features* are essentially boolean flags that indicate whether an
  optional subsystem or function is supported or not. Parts of the
  schema tree can be labelled as being dependent on a feature: such a
  part exists only if the feature is supported.

* *deviations* allow for specifying that a given implementation
  doesn't exactly follow what's written in a YANG module. In effect, a
  deviation can be understood as a “patch” of the original YANG
  module.

Support for individual features and/or deviations is also indicated
in YANG library data.
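
To give an idea of what YANG library data looks like, the following
sketch parses a small JSON document that follows the
``modules-state`` structure of [RFC7895]_. The module names,
namespaces, revision dates and the feature name are made up for
illustration. Only the standard ``json`` module is used here, no
*Yangson* API.

.. code-block:: python

    import json

    # Hypothetical YANG library data; all names and dates are made up.
    YANG_LIBRARY = """
    {
      "ietf-yang-library:modules-state": {
        "module-set-id": "example",
        "module": [
          {
            "name": "example-base",
            "revision": "2020-01-15",
            "namespace": "http://example.com/example-base",
            "feature": ["extended-stats"],
            "conformance-type": "implement"
          },
          {
            "name": "example-types",
            "revision": "2019-06-01",
            "namespace": "http://example.com/example-types",
            "conformance-type": "import"
          }
        ]
      }
    }
    """

    library = json.loads(YANG_LIBRARY)
    modules = library["ietf-yang-library:modules-state"]["module"]

    # One line per module: name, revision, role and supported features.
    for mod in modules:
        features = ", ".join(mod.get("feature", [])) or "none"
        print(f"{mod['name']}@{mod['revision']}: "
              f"{mod['conformance-type']}, features: {features}")

    implemented = [m["name"] for m in modules
                   if m["conformance-type"] == "implement"]

Each entry pins a module to one revision, and the ``feature`` list
names the features that are supported. Deviations are recorded per
module in a similar way, in a ``deviation`` list.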

Content Types
=============

YANG distinguishes configuration from state data (see sec. `4.2.3`_ in
[RFC7950]_), and the **config** statement can be used to specify to
which of the two categories a given schema node belongs. A schema node
whose definition doesn't contain the **config** statement inherits
this property from its parent schema node. State data may be embedded
inside configuration, but not vice versa. Finally, for schemas of RPC
operations, actions and notifications, the distinction between
configuration and state data makes no sense at all, and **config**
statements, if present, are ignored there.
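
The inheritance rule can be sketched as below. ``SchemaSketch`` is a
hypothetical class written for this example only, and the node names
are made up. It assumes the default from [RFC7950]_ that a top-level
node without a **config** statement is configuration.

.. code-block:: python

    from typing import Optional


    class SchemaSketch:
        """Hypothetical schema node that records only the config property."""

        def __init__(self, name: str,
                     parent: Optional["SchemaSketch"] = None,
                     config: Optional[bool] = None):
            if config and parent is not None and not parent.is_config():
                # Configuration cannot be nested inside state data.
                raise ValueError(f"{name}: config true under state data")
            self.name = name
            self.parent = parent
            self.config = config      # None means "no config statement"

        def is_config(self) -> bool:
            """Return the explicit value or inherit it from the parent."""
            if self.config is not None:
                return self.config
            if self.parent is None:
                return True           # default for top-level nodes
            return self.parent.is_config()


    system = SchemaSketch("system")                      # config by default
    hostname = SchemaSketch("hostname", system)          # inherits true
    stats = SchemaSketch("statistics", system, config=False)
    uptime = SchemaSketch("uptime", stats)               # inherits false

    print(hostname.is_config(), uptime.is_config())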

The approach adopted by the *Yangson* library is to assign a content
type to every :class:`~.schemanode.SchemaNode`. The values are members of
the enumeration :class:`~.enumerations.ContentType`:

* :attr:`~.ContentType.config`
* :attr:`~.ContentType.nonconfig`
* :attr:`~.ContentType.all`

All non-terminal schema nodes (**container**, **list**, **choice**
and **case**) that represent configuration have the content type
:attr:`~.ContentType.all` because they may have both configuration and
state data nodes as descendants.

The content type of terminal data nodes (**leaf**, **leaf-list**,
**anydata** and **anyxml**) reflects their **config** value, i.e. it is
either :attr:`~.ContentType.config` or :attr:`~.ContentType.nonconfig`.

Other nodes always have content type :attr:`~.ContentType.nonconfig`.

The method :meth:`.SchemaNode.content_type` returns the content type
of the receiver.

The above rules allow for a straightforward implementation of content
filtering in RESTCONF based on the ``content`` query parameter, see
sec. `4.8.1`_ in [RFC8040]_.
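
One way to see why these rules make filtering simple is to treat the
three values as bit flags. The sketch below does not use *Yangson*
itself: the ``Content`` enumeration, the ``select`` function and the
node names are hypothetical, and the numeric values are an assumption
chosen so that ``all`` is the union of the other two.

.. code-block:: python

    from enum import Flag


    class Content(Flag):
        """Hypothetical counterpart of the three content types."""

        config = 1
        nonconfig = 2
        all = 3          # config | nonconfig


    # (node name, content type) pairs for a small imaginary schema.
    NODES = [
        ("system", Content.all),            # configuration container
        ("hostname", Content.config),       # configuration leaf
        ("statistics", Content.nonconfig),  # state data container
        ("uptime", Content.nonconfig),      # state data leaf
    ]


    def select(requested: Content) -> list:
        """Keep nodes whose content type overlaps the requested one."""
        return [name for name, ctype in NODES if ctype & requested]


    print(select(Content.config))
    print(select(Content.nonconfig))
    print(select(Content.all))

A configuration container is kept for every value of the ``content``
query parameter because it may hold descendants of either kind,
whereas a subtree of state data is dropped as a whole when only
configuration is requested.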

.. _4.2.3: https://rfc-editor.org/rfc/rfc7950.html#section-4.2.3
.. _4.8.1: https://rfc-editor.org/rfc/rfc8040.html#section-4.8.1