Defining a Restagraph subschema
It'd be handy to know how to define subschemas of your own, that being the point of this thing.
Why "subschema"? Because each of these documents is a subset of the final schema that's actually in effect after they've all been installed. You can upload/install any number of subschemas, each of which can build on any previously-defined resourcetypes, which are combined by the server to make up the composite schema that controls the behaviour of the API.
This page will make an awful lot more sense if you've already read the conceptual overview.
The format
In summary, a subschema is a JSON document defining a single object with three fields:
name
- Identifies the subschema being managed.
- This isn't currently reflected in the final schema, but this behaviour may be revived in the future.
resourcetypes
- A list of objects, each of which describes a single resourcetype.
- The list may be empty, if you only want to add relationships.
- If the resourcetype has already been defined in the database by a previous subschema, any additional elements will be added, but no existing elements will be changed or deleted. That is, you can add more attributes but not change or remove any, and you can add a `notes` attribute to a resourcetype that doesn't already have one.
- It's not strictly an error to have multiple definitions for the same resourcetype, but it is asking for trouble.
relationships
- A list of objects, each of which describes a single relationship with a unique name, which has a set of one or more source resourcetypes, and a set of one or more target resourcetypes.
- The list may be empty, if you only want to add resourcetypes.
- If the definition has the same name as an existing relationship, it will not update any attributes of the existing one. However, it will add any new source or target resourcetypes to the existing set, as long as it's not already defined as `any`.
- The `any` resourcetype is handled in specific ways:
  - If you're creating a new relationship, and "any" is the sole member of the source or target list, then that will be defined as the only source or target type.
  - If you're creating a new relationship, or updating an existing one, and the source list includes "any" among other resourcetypes, "any" will be ignored and the others will be added. The same goes for the target list.
  - If you're updating an existing relationship with `any` as its sole source resourcetype, Restagraph will ignore any other resourcetypes that you try to add to the list. The same goes for the target list.
It's entirely valid, if pointless, to define a subschema with neither resourcetypes nor relationships.
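The rules for `any` are easier to follow as code. What follows is a minimal sketch of how I read them, not the server's actual implementation; the function name `merge_types` is made up for illustration. It covers one list at a time, since the source and target lists are treated the same way. One case the rules above don't spell out is an update whose list contains only "any" when the existing list names specific types; this sketch assumes the existing list is left alone.

```python
def merge_types(existing, requested):
    """Return the source (or target) list after a subschema is applied.

    existing  -- types already on the relationship, or None if it is new
    requested -- types listed in the subschema being installed
    """
    if existing is None:
        # New relationship: "any" only counts when it is the sole member.
        if requested == ["any"]:
            return ["any"]
        return [t for t in requested if t != "any"]
    if existing == ["any"]:
        # Already open to everything, so additions are ignored.
        return ["any"]
    # Existing relationship with named types: add new ones, ignore "any".
    merged = list(existing)
    for t in requested:
        if t != "any" and t not in merged:
            merged.append(t)
    return merged
```

So a new relationship listing "any" alongside "People" ends up with just "People", while adding "Books" to a relationship that already targets `any` changes nothing.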
To go into more detail:
Name
This is really only used for logging purposes at the moment.
Resourcetypes
Each resourcetype is defined via several key-value pairs, most of which are mandatory:
Name
The name of the resourcetype.
It's used as an identifier in the URI when interacting with the API, and as a node label when recording a resource in the database. Because it's used in the URI, it needs to be safe for such.
- Should be in PascalCase, following the Neo4j naming conventions.
- Should be plural, in keeping with REST conventions, e.g. "People" rather than "Person".
- Must not begin with the reserved prefix 'Rg'; this is reserved for system-managed resourcetypes. Any definitions for resourcetypes whose name starts with 'Rg' will be silently discarded.
Type: string.
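Because a name with the reserved prefix is discarded silently, it can be worth checking names before you upload. This is a rough client-side sketch, not something Restagraph provides; `check_resourcetype_name` is a hypothetical helper, and the regular expression is only a loose approximation of PascalCase (it can't tell whether a name is plural, for one thing).

```python
import re

# Loose approximation: starts with a capital, then only letters and digits.
PASCAL_CASE = re.compile(r"[A-Z][A-Za-z0-9]*")


def check_resourcetype_name(name):
    """Return a list of problems found with a proposed resourcetype name."""
    problems = []
    if name.startswith("Rg"):
        problems.append("starts with the reserved prefix 'Rg'")
    if not PASCAL_CASE.fullmatch(name):
        problems.append("not PascalCase")
    return problems
```

An empty list means the name passed both checks; anything else is worth a second look before the server quietly drops it.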
Dependent
A boolean value stating whether this is a "dependent" resourcetype, i.e. whether it exists only in the context of another one. E.g. a room only exists in the context of a building.
A dependent resourcetype can only be created in relationship to a "parent" type, via a relationship that is also defined as a dependent one, meaning that it defines the dependency between them. A dependent resourcetype can be dependent on another dependent one, e.g. the ceiling of a room, in a building.
Type: boolean. In accordance with Postel's Principle, acceptable values include `true`, "true", "True", `false`, "false" and "False". The preferred values are `true` and `false`.
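To make the accepted values concrete, here is a small sketch of the kind of lenient parsing described above. It's an illustration only: `parse_boolean` is a hypothetical name, and rejecting everything else with an exception is my assumption, not documented server behaviour.

```python
def parse_boolean(value):
    """Accept JSON booleans plus the four string spellings listed above."""
    if isinstance(value, bool):
        return value
    if value in ("true", "True"):
        return True
    if value in ("false", "False"):
        return False
    # Assumption: anything else is treated as an error in this sketch.
    raise ValueError(f"not an accepted boolean: {value!r}")
```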
Description
Description of what this type represents, and possibly how it's to be used.
E.g. the `description` for the built-in resourcetype `Organisations` is "Any kind of organisation: professional, social or other."
Type: string.
Optional; the default is `null`.
Attributes
A list of attribute objects. Their main keys, i.e. those shared by all attribute types, are:
name
- The name by which this attribute is addressed, in both the Schema API and the Raw API.
- Should be URI-safe, because you can use them to filter HTTP GET requests for resources, using URL parameters.
- You can capitalise them (or not) in whatever way suits you. The maintainer has settled on PascalCase, same as for UIDs, because this is the simplest way to a readable UI.
description
- As you'd expect, this is for elaborating on what you actually meant by the `name`.
- Must not begin with `RG`; as with resourcetypes starting with `Rg`, this will cause the attribute to be silently discarded.
type
- Determines what kinds of value will be accepted, and which additional constraints may be added.
- Default value is `null`, which means anything goes.
Attribute types:
varchar
- Variable-length character strings; useful for short stretches of text such as one-line descriptions, or people's names.
- Intentionally named after the SQL type with the same semantics.
text
- Free-form text, up to 65 535 characters in length. 64K ought to be enough for anybody, right?
- Like "varchar", this is a deliberate reference to SQL types.
integer
- Any integer that will fit in a 64-bit representation.
boolean
- Sometimes you just need to know whether it's a yes or a no, a true or a false.
In case you're wondering why there are two types of string variable, instead of just varchar(65535), it's for the benefit of GUI developers. This is a hint that a GUI can use to decide whether to present a one-line field or a resizeable box for editing the text of a given attribute.
For some attribute types, you can define further constraints on their values.
varchar
- `maxlength` = the maximum acceptable length for this string. This is in octets, not characters: something to watch out for in non-Roman character sets, since we're using UTF-8 here. This is not disabled by the `values` attribute, though it probably should be.
- `values` = a list of valid values for this attribute, so you can effectively define it as an enum type. Note that the server doesn't try to reconcile `maxlength` with this, so I recommend not using both for the same attribute.
integer
- `minimum` = the lowest value accepted for this attribute. This is an inclusive value, not an exclusive one.
- `maximum` = the highest value accepted for this attribute. Also an inclusive value.
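The octets-versus-characters point is easy to trip over, so here's a sketch of how these constraints could be checked on the client side. It is not the server's code; `check_value` is a hypothetical helper that takes an attribute definition in the format described above and a candidate value.

```python
def check_value(attribute, value):
    """Return a list of constraint violations for one attribute value."""
    problems = []
    kind = attribute.get("type")
    if kind == "varchar":
        maxlength = attribute.get("maxlength")
        # maxlength counts octets, so measure the UTF-8 encoding, not len(value).
        if maxlength is not None and len(value.encode("utf-8")) > maxlength:
            problems.append("longer than maxlength")
        values = attribute.get("values")
        if values is not None and value not in values:
            problems.append("not one of the permitted values")
    elif kind == "integer":
        minimum = attribute.get("minimum")
        maximum = attribute.get("maximum")
        # Both bounds are inclusive.
        if minimum is not None and value < minimum:
            problems.append("below minimum")
        if maximum is not None and value > maximum:
            problems.append("above maximum")
    return problems
```

With `maxlength` set to 4, "Zoes" fits but "Zoës" doesn't: both are four characters, but the second takes five octets in UTF-8.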
Relationships
A list of relationship objects. Their keys are:
name
- The name of this relationship. Needs to be URL-safe.
- Should be in `SCREAMING_SNAKE_CASE`.
- Must not begin with `RG_`.
  - This prefix is reserved for system-managed relationships.
- Try to name it in a way that describes the relationship itself without reference to the target type. Make it as general-purpose as you can, e.g. Countries CONTAIN Cities and States.
  - It can be tempting to define the above relationships as `Countries/CITIES/Cities` and `Countries/STATES/States`. This is almost certainly a hangover from relational-database thinking, and is a mistake in this context.
- Type: string.
source-types
- The list of resourcetypes that this relationship comes from.
- Type: list of names, in the form of strings.
target-types
- The list of resourcetypes that this relationship goes to.
- Type: list of names, in the form of strings.
cardinality
- How many relationships of this kind are to be permitted from an instance of the source type, and how many to an instance of the target type. Valid options are:
  - "many:many"
  - "1:many"
  - "many:1"
  - "1:1"
- Type: string.
reltype
- What kind of relationship the target resource could bear to this one.
  - "dependent" means the target resource can only be of a `dependent` type, but otherwise asserts no restrictions.
  - "self" means the
  - "any" means there are no restrictions.
- Type: string.
description
- Any clarifying notes about what this relationship means.
- Type: string.
I usually define them in the order `name`, `source-types`, `target-types` because that matches the way I think about them. In Cypher, Neo4j's query language, it's represented as `(:source-type)-[:name]->(:target-type)`. However, the server doesn't care about the order of those keys, so use whatever works best for you and your team.
Cardinality in dependent relationships
Only two types of cardinality are permitted in a dependent relationship:
- `1:many` (the default)
- `1:1`
The reason for this is that it doesn't really make sense for a dependent resource to have multiple parents - in this kind of situation, it's almost certainly a primary resource with the same relationship to two other resources.
The expected use-case for a `1:1` dependent relationship is for managing a set of optional attributes. E.g. Files resources could be of any kind, so it's not practical to define that resourcetype with all possible attributes. Instead, you can define a dependent resourcetype containing the attributes for each given file format: JPEG images, Ogg Vorbis audio, etc. Because it only makes sense to have one such dependent resource for each file, and each one only makes sense in the context of one specific file, the `1:1` cardinality is a natural fit here.
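If you generate subschemas programmatically, a check along these lines can catch a disallowed cardinality before upload. It's a sketch under the rules stated in this section, not the server's validation; `dependent_cardinality_ok` is a hypothetical name.

```python
def dependent_cardinality_ok(relationship):
    """Check the cardinality restriction on dependent relationships."""
    if relationship.get("reltype") != "dependent":
        # The restriction only applies to dependent relationships.
        return True
    # "1:many" is the default for a dependent relationship.
    return relationship.get("cardinality", "1:many") in {"1:many", "1:1"}
```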
Example
Here's an example, adding books and authors to the schema:
    {
      "name": "example_schema",
      "resourcetypes": [
        {
          "name": "Books",
          "dependent": false,
          "notes": "Stuff printed on the corpses of trees.",
          "attributes": [
            {
              "name": "description",
              "type": "text",
              "description": "",
              "values": null
            },
            {
              "name": "ISBN",
              "type": "varchar",
              "description": "International Standard Book Number. Should be a 10- or 13-digit number, optionally interspersed with hyphens.",
              "maxlength": 17
            }
          ]
        }
      ],
      "relationships": [
        {
          "name": "AUTHOR",
          "source-types": ["Books"],
          "target-types": ["People"],
          "cardinality": "many:many",
          "reltype": "any",
          "notes": "Link from the book to its author."
        },
        {
          "name": "AUTHOR_OF",
          "source-types": ["People"],
          "target-types": ["Books"],
          "cardinality": "many:many",
          "reltype": "any",
          "notes": "Link to a book this person wrote."
        }
      ]
    }
Note that it assumes the existence of the `People` resourcetype. This is defined in the core schema, so you know it'll always be there. However, you can equally rely on resourcetypes created in other schemas, as long as they were installed before this one.
The server installs all resourcetypes defined in a subschema before trying to create the relationships.
Note: reference errors are handled quietly. If the schema defines a relationship that refers to a resourcetype not already defined, the server will log the fact and move on. So it's fine to refer to resourcetypes defined in other subschemas (in fact, it's positively encouraged) but it is important to make sure you a) only make backward references, not forward ones, and b) upload subschemas according to the order of their dependencies.
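Since the server won't complain loudly about a dangling reference, it can help to look for them yourself before uploading. The sketch below is a client-side check, not part of Restagraph; `unresolved_references` is a hypothetical helper, and `known_types` is whatever list of already-installed resourcetype names you have to hand.

```python
def unresolved_references(subschema, known_types):
    """Map each relationship name to the resourcetypes it names
    that are neither already installed nor defined in this subschema."""
    defined = set(known_types)
    # Resourcetypes in the same subschema are installed before relationships.
    defined.update(rt["name"] for rt in subschema.get("resourcetypes", []))
    defined.add("any")
    missing = {}
    for rel in subschema.get("relationships", []):
        wanted = rel.get("source-types", []) + rel.get("target-types", [])
        unknown = sorted(set(wanted) - defined)
        if unknown:
            missing[rel["name"]] = unknown
    return missing
```

Run against the books example with `People` in `known_types`, it returns an empty dict; leave `People` out and both relationships are reported as referring to it.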
Notes about the format and naming conventions
- It's recommended that you follow Neo4j naming conventions:
  - Names of resourcetypes should be in `PascalCase`, as they are created as Neo4j labels.
  - Relationship names should be in `SCREAMING_SNAKE_CASE`.
    - They must not start with `RG_` - this prefix is reserved for system-managed relationships.
- Booleans must be either `true` or `false`, though `null` is also accepted as an equivalent to `false`.
- The `dependent` attribute of a resource indicates whether it has independent existence (`dependent=false`, the default) or whether it only exists in the context of a parent resource. E.g. an IP address configured on an interface doesn't exist independently - the only reason not to define it as an attribute is that an interface may have any number of addresses configured on it.
  - This is an optional attribute; it defaults to `false`.
- The `reltype` attribute of a relationship indicates whether the target resource is dependent on the source resource, i.e. is a child to that parent resource.
  - This means that if `reltype` is "dependent" in a relationship type, the target resource will be created along with it.
  - A dependent resource can only have one parent resource.
  - A dependent relationship cannot be created with a non-dependent target resource.
  - A non-dependent relationship can be created to a dependent target resource, if the target already exists.
- The `description` attribute of a resourcetype or relationship, and the `description` of an attribute, is optional. If you omit it altogether, or specify it as `null` or an empty string, it will not be added to the resourcetype definition in the database, and will be effectively a null or empty string when this is queried.
- The `values` attribute on an attribute is optional, and should only be used when you have a specific reason to constrain it to a fixed set of values. If you're considering using it, think about whether it makes more sense to use a separate resourcetype, enabling you to add/remove values in future.