diff --git a/doc/Pacemaker_Explained/en-US/Ch-Advanced-Resources.txt b/doc/Pacemaker_Explained/en-US/Ch-Advanced-Resources.txt
index 4e89d8aa74..47eca8948a 100644
--- a/doc/Pacemaker_Explained/en-US/Ch-Advanced-Resources.txt
+++ b/doc/Pacemaker_Explained/en-US/Ch-Advanced-Resources.txt
@@ -1,1395 +1,1400 @@
= Advanced Resource Types =
[[group-resources]]
== Groups - A Syntactic Shortcut ==
indexterm:[Group Resources]
indexterm:[Resource,Groups]
One of the most common elements of a cluster is a set of resources
that need to be located together, start sequentially, and stop in the
reverse order. To simplify this configuration, we support the concept
of groups.
.A group of two primitive resources
======
[source,XML]
-------
-------
======
Although the example above contains only two resources, there is no
limit to the number of resources a group can contain. The example is
also sufficient to explain the fundamental properties of a group:
* Resources are started in the order they appear in (+Public-IP+
first, then +Email+)
* Resources are stopped in the reverse order to which they appear in
(+Email+ first, then +Public-IP+)
If a resource in the group can't run anywhere, then nothing after that
is allowed to run, too.
* If +Public-IP+ can't run anywhere, neither can +Email+;
* but if +Email+ can't run anywhere, this does not affect +Public-IP+
in any way
The group above is logically equivalent to writing:
.How the cluster sees a group resource
======
[source,XML]
-------
-------
======
Obviously as the group grows bigger, the reduced configuration effort
can become significant.
Another (typical) example of a group is a DRBD volume, the filesystem
mount, an IP address, and an application that uses them.
=== Group Properties ===
.Properties of a Group Resource
[width="95%",cols="3m,5<",options="header",align="center"]
|=========================================================
|Field
|Description
|id
|A unique name for the group
indexterm:[id,Group Resource Property]
indexterm:[Resource,Group Property,id]
|=========================================================
=== Group Options ===
Groups inherit the +priority+, +target-role+, and +is-managed+ properties
from primitive resources. See <> for information about
those properties.
=== Group Instance Attributes ===
Groups have no instance attributes. However, any that are set for the group
object will be inherited by the group's children.
=== Group Contents ===
Groups may only contain a collection of cluster resources (see
<>). To refer to a child of a group resource, just use
the child's +id+ instead of the group's.
=== Group Constraints ===
Although it is possible to reference a group's children in
constraints, it is usually preferable to reference the group itself.
.Some constraints involving groups
======
[source,XML]
-------
-------
======
=== Group Stickiness ===
indexterm:[resource-stickiness,Groups]
Stickiness, the measure of how much a resource wants to stay where it
is, is additive in groups. Every active resource of the group will
contribute its stickiness value to the group's total. So if the
default +resource-stickiness+ is 100, and a group has seven members,
five of which are active, then the group as a whole will prefer its
current location with a score of 500.
[[s-resource-clone]]
== Clones - Resources That Get Active on Multiple Hosts ==
indexterm:[Clone Resources]
indexterm:[Resource,Clones]
Clones were initially conceived as a convenient way to start multiple
instances of an IP address resource and have them distributed throughout the
cluster for load balancing. They have turned out to quite useful for
a number of purposes including integrating with the Distributed Lock Manager
(used by many cluster filesystems), the fencing subsystem, and OCFS2.
You can clone any resource, provided the resource agent supports it.
Three types of cloned resources exist:
* Anonymous
* Globally unique
* Stateful
'Anonymous' clones are the simplest. These behave
completely identically everywhere they are running. Because of this,
there can be only one copy of an anonymous clone active per machine.
'Globally unique' clones are distinct entities. A copy of the clone
running on one machine is not equivalent to another instance on
another node, nor would any two copies on the same node be
equivalent.
'Stateful' clones are covered later in <>.
.A clone of an LSB resource
======
[source,XML]
-------
-------
======
=== Clone Properties ===
.Properties of a Clone Resource
[width="95%",cols="3m,5<",options="header",align="center"]
|=========================================================
|Field
|Description
|id
|A unique name for the clone
indexterm:[id,Clone Property]
indexterm:[Clone,Property,id]
|=========================================================
=== Clone Options ===
Options inherited from <> resources:
+priority, target-role, is-managed+
.Clone-specific configuration options
[width="95%",cols="1m,1,3<",options="header",align="center"]
|=========================================================
|Field
|Default
|Description
|clone-max
|number of nodes in cluster
|How many copies of the resource to start
indexterm:[clone-max,Clone Option]
indexterm:[Clone,Option,clone-max]
|clone-node-max
|1
|How many copies of the resource can be started on a single node
indexterm:[clone-node-max,Clone Option]
indexterm:[Clone,Option,clone-node-max]
|clone-min
|1
|Require at least this number of clone instances to be runnable before allowing
resources depending on the clone to be runnable '(since 1.1.14)'
indexterm:[clone-min,Clone Option]
indexterm:[Clone,Option,clone-min]
|notify
|true
|When stopping or starting a copy of the clone, tell all the other
copies beforehand and again when the action was successful. Allowed values:
+false+, +true+
indexterm:[notify,Clone Option]
indexterm:[Clone,Option,notify]
|globally-unique
|false
|Does each copy of the clone perform a different function? Allowed
values: +false+, +true+
indexterm:[globally-unique,Clone Option]
indexterm:[Clone,Option,globally-unique]
|ordered
|false
|Should the copies be started in series (instead of in
parallel)? Allowed values: +false+, +true+
indexterm:[ordered,Clone Option]
indexterm:[Clone,Option,ordered]
|interleave
|false
|If this clone depends on another clone via an ordering constraint,
is it allowed to start after the local instance of the other clone
starts, rather than wait for all instances of the other clone to start?
Allowed values: +false+, +true+
indexterm:[interleave,Clone Option]
indexterm:[Clone,Option,interleave]
|=========================================================
=== Clone Instance Attributes ===
Clones have no instance attributes; however, any that are set here
will be inherited by the clone's children.
=== Clone Contents ===
Clones must contain exactly one primitive or group resource.
[WARNING]
You should never reference the name of a clone's child.
If you think you need to do this, you probably need to re-evaluate your design.
=== Clone Constraints ===
In most cases, a clone will have a single copy on each active cluster
node. If this is not the case, you can indicate which nodes the
cluster should preferentially assign copies to with resource location
constraints. These constraints are written no differently from those
for primitive resources except that the clone's +id+ is used.
.Some constraints involving clones
======
[source,XML]
-------
-------
======
Ordering constraints behave slightly differently for clones. In the
example above, +apache-stats+ will wait until all copies of +apache-clone+
that need to be started have done so before being started itself.
Only if _no_ copies can be started will +apache-stats+ be prevented
from being active. Additionally, the clone will wait for
+apache-stats+ to be stopped before stopping itself.
Colocation of a primitive or group resource with a clone means that
the resource can run on any machine with an active copy of the clone.
The cluster will choose a copy based on where the clone is running and
the resource's own location preferences.
Colocation between clones is also possible. If one clone +A+ is colocated
with another clone +B+, the set of allowed locations for +A+ is limited to
nodes on which +B+ is (or will be) active. Placement is then performed
normally.
[[s-clone-stickiness]]
=== Clone Stickiness ===
indexterm:[resource-stickiness,Clones]
To achieve a stable allocation pattern, clones are slightly sticky by
default. If no value for +resource-stickiness+ is provided, the clone
will use a value of 1. Being a small value, it causes minimal
disturbance to the score calculations of other resources but is enough
to prevent Pacemaker from needlessly moving copies around the cluster.
[NOTE]
====
For globally unique clones, this may result in multiple instances of the
clone staying on a single node, even after another eligible node becomes
active (for example, after being put into standby mode then made active again).
If you do not want this behavior, specify a +resource-stickiness+ of 0
for the clone temporarily and let the cluster adjust, then set it back
to 1 if you want the default behavior to apply again.
====
=== Clone Resource Agent Requirements ===
Any resource can be used as an anonymous clone, as it requires no
additional support from the resource agent. Whether it makes sense to
do so depends on your resource and its resource agent.
Globally unique clones do require some additional support in the
resource agent. In particular, it must only respond with
+$\{OCF_SUCCESS}+ if the node has that exact instance active. All
other probes for instances of the clone should result in
+$\{OCF_NOT_RUNNING}+ (or one of the other OCF error codes if
they are failed).
Individual instances of a clone are identified by appending a colon and a
numerical offset, e.g. +apache:2+.
Resource agents can find out how many copies there are by examining
the +OCF_RESKEY_CRM_meta_clone_max+ environment variable and which
copy it is by examining +OCF_RESKEY_CRM_meta_clone+.
The resource agent must not make any assumptions (based on
+OCF_RESKEY_CRM_meta_clone+) about which numerical instances are active. In
particular, the list of active copies will not always be an unbroken
sequence, nor always start at 0.
==== Clone Notifications ====
Supporting notifications requires the +notify+ action to be
implemented. If supported, the notify action will be passed a
number of extra variables which, when combined with additional
context, can be used to calculate the current state of the cluster and
what is about to happen to it.
.Environment variables supplied with Clone notify actions
[width="95%",cols="5,3<",options="header",align="center"]
|=========================================================
|Variable
|Description
|OCF_RESKEY_CRM_meta_notify_type
|Allowed values: +pre+, +post+
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,type]
indexterm:[type,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_operation
|Allowed values: +start+, +stop+
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,operation]
indexterm:[operation,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_start_resource
|Resources to be started
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,start_resource]
indexterm:[start_resource,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_stop_resource
|Resources to be stopped
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,stop_resource]
indexterm:[stop_resource,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_active_resource
|Resources that are running
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,active_resource]
indexterm:[active_resource,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_inactive_resource
|Resources that are not running
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,inactive_resource]
indexterm:[inactive_resource,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_start_uname
|Nodes on which resources will be started
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,start_uname]
indexterm:[start_uname,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_stop_uname
|Nodes on which resources will be stopped
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,stop_uname]
indexterm:[stop_uname,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_active_uname
|Nodes on which resources are running
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,active_uname]
indexterm:[active_uname,Notification Environment Variable]
|=========================================================
The variables come in pairs, such as
+OCF_RESKEY_CRM_meta_notify_start_resource+ and
+OCF_RESKEY_CRM_meta_notify_start_uname+ and should be treated as an
array of whitespace-separated elements.
+OCF_RESKEY_CRM_meta_notify_inactive_resource+ is an exception as the
matching +uname+ variable does not exist since inactive resources
are not running on any node.
Thus in order to indicate that +clone:0+ will be started on +sles-1+,
+clone:2+ will be started on +sles-3+, and +clone:3+ will be started
on +sles-2+, the cluster would set
.Notification variables
======
[source,Bash]
-------
OCF_RESKEY_CRM_meta_notify_start_resource="clone:0 clone:2 clone:3"
OCF_RESKEY_CRM_meta_notify_start_uname="sles-1 sles-3 sles-2"
-------
======
==== Proper Interpretation of Notification Environment Variables ====
.Pre-notification (stop):
* Active resources: +$OCF_RESKEY_CRM_meta_notify_active_resource+
* Inactive resources: +$OCF_RESKEY_CRM_meta_notify_inactive_resource+
* Resources to be started: +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources to be stopped: +$OCF_RESKEY_CRM_meta_notify_stop_resource+
.Post-notification (stop) / Pre-notification (start):
* Active resources
** +$OCF_RESKEY_CRM_meta_notify_active_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
* Inactive resources
** +$OCF_RESKEY_CRM_meta_notify_inactive_resource+
** plus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
* Resources that were started: +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources that were stopped: +$OCF_RESKEY_CRM_meta_notify_stop_resource+
.Post-notification (start):
* Active resources:
** +$OCF_RESKEY_CRM_meta_notify_active_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
** plus +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Inactive resources:
** +$OCF_RESKEY_CRM_meta_notify_inactive_resource+
** plus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources that were started: +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources that were stopped: +$OCF_RESKEY_CRM_meta_notify_stop_resource+
[[s-resource-multistate]]
== Multi-state - Resources That Have Multiple Modes ==
indexterm:[Multi-state Resources]
indexterm:[Resource,Multi-state]
Multi-state resources are a specialization of clone resources; please
ensure you understand <> before continuing!
Multi-state resources allow the instances to be in one of two operating modes
(called 'roles'). The roles are called 'master' and 'slave', but can mean
whatever you wish them to mean. The only limitation is that when an instance is
started, it must come up in the slave role.
=== Multi-state Properties ===
.Properties of a Multi-State Resource
[width="95%",cols="3m,5<",options="header",align="center"]
|=========================================================
|Field
|Description
|id
|Your name for the multi-state resource
indexterm:[id,Multi-State Property]
indexterm:[Multi-State,Property,id]
|=========================================================
=== Multi-state Options ===
Options inherited from <> resources:
+priority+, +target-role+, +is-managed+
Options inherited from <> resources:
+clone-max+, +clone-node-max+, +notify+, +globally-unique+, +ordered+,
+interleave+
.Multi-state-specific resource configuration options
[width="95%",cols="1m,1,3<",options="header",align="center"]
|=========================================================
|Field
|Default
|Description
|master-max
|1
|How many copies of the resource can be promoted to the +master+ role
indexterm:[master-max,Multi-State Option]
indexterm:[Multi-State,Option,master-max]
|master-node-max
|1
|How many copies of the resource can be promoted to the +master+ role on
a single node
indexterm:[master-node-max,Multi-State Option]
indexterm:[Multi-State,Option,master-node-max]
|=========================================================
=== Multi-state Instance Attributes ===
Multi-state resources have no instance attributes; however, any that
are set here will be inherited by a master's children.
=== Multi-state Contents ===
Masters must contain exactly one primitive or group resource.
[WARNING]
You should never reference the name of a master's child.
If you think you need to do this, you probably need to re-evaluate your design.
=== Monitoring Multi-State Resources ===
The usual monitor actions are insufficient to monitor a multi-state resource,
because pacemaker needs to verify not only that the resource is active, but
also that its actual role matches its intended one.
Define two monitoring actions: the usual one will cover the slave role,
and an additional one with +role="master"+ will cover the master role.
.Monitoring both states of a multi-state resource
======
[source,XML]
-------
-------
======
[IMPORTANT]
===========
It is crucial that _every_ monitor operation has a different interval!
Pacemaker currently differentiates between operations
only by resource and interval; so if (for example) a master/slave resource had
the same monitor interval for both roles, Pacemaker would ignore the
role when checking the status -- which would cause unexpected return
codes, and therefore unnecessary complications.
===========
=== Multi-state Constraints ===
In most cases, multi-state resources will have a single copy on each
active cluster node. If this is not the case, you can indicate which
nodes the cluster should preferentially assign copies to with resource
location constraints. These constraints are written no differently from
those for primitive resources except that the master's +id+ is used.
When considering multi-state resources in constraints, for most
purposes it is sufficient to treat them as clones. The exception is
that the +first-action+ and/or +then-action+ fields for ordering constraints
may be set to +promote+ or +demote+ to constrain the master role,
and colocation constraints may contain +rsc-role+ and/or +with-rsc-role+
fields.
.Additional colocation constraint options for multi-state resources
[width="95%",cols="1m,1,3<",options="header",align="center"]
|=========================================================
|Field
|Default
|Description
|rsc-role
|Started
|An additional attribute of colocation constraints that specifies the
role that +rsc+ must be in. Allowed values: +Started+, +Master+,
+Slave+.
indexterm:[rsc-role,Ordering Constraints]
indexterm:[Constraints,Ordering,rsc-role]
|with-rsc-role
|Started
|An additional attribute of colocation constraints that specifies the
role that +with-rsc+ must be in. Allowed values: +Started+,
+Master+, +Slave+.
indexterm:[with-rsc-role,Ordering Constraints]
indexterm:[Constraints,Ordering,with-rsc-role]
|=========================================================
.Constraints involving multi-state resources
======
[source,XML]
-------
-------
======
In the example above, +myApp+ will wait until one of the database
copies has been started and promoted to master before being started
itself on the same node. Only if no copies can be promoted will +myApp+ be
prevented from being active. Additionally, the cluster will wait for
+myApp+ to be stopped before demoting the database.
Colocation of a primitive or group resource with a multi-state
resource means that it can run on any machine with an active copy of
the multi-state resource that has the specified role (+master+ or
+slave+). In the example above, the cluster will choose a location based on
where database is running as a +master+, and if there are multiple
+master+ instances it will also factor in +myApp+'s own location
preferences when deciding which location to choose.
Colocation with regular clones and other multi-state resources is also
possible. In such cases, the set of allowed locations for the +rsc+
clone is (after role filtering) limited to nodes on which the
+with-rsc+ multi-state resource is (or will be) in the specified role.
Placement is then performed as normal.
==== Using Multi-state Resources in Colocation Sets ====
.Additional colocation set options relevant to multi-state resources
[width="95%",cols="1m,1,6<",options="header",align="center"]
|=========================================================
|Field
|Default
|Description
|role
|Started
|The role that 'all members' of the set must be in. Allowed values: +Started+, +Master+,
+Slave+.
indexterm:[role,Ordering Constraints]
indexterm:[Constraints,Ordering,role]
|=========================================================
In the following example +B+'s master must be located on the same node as +A+'s master.
Additionally resources +C+ and +D+ must be located on the same node as +A+'s
and +B+'s masters.
.Colocate C and D with A's and B's master instances
======
[source,XML]
-------
-------
======
==== Using Multi-state Resources in Ordering Sets ====
.Additional ordered set options relevant to multi-state resources
[width="95%",cols="1m,1,3<",options="header",align="center"]
|=========================================================
|Field
|Default
|Description
|action
|value of +first-action+
|An additional attribute of ordering constraint sets that specifies the
action that applies to 'all members' of the set. Allowed
values: +start+, +stop+, +promote+, +demote+.
indexterm:[action,Ordering Constraints]
indexterm:[Constraints,Ordering,action]
|=========================================================
.Start C and D after first promoting A and B
======
[source,XML]
-------
-------
======
In the above example, +B+ cannot be promoted to a master role until +A+ has
been promoted. Additionally, resources +C+ and +D+ must wait until +A+ and +B+
have been promoted before they can start.
=== Multi-state Stickiness ===
indexterm:[resource-stickiness,Multi-State]
As with regular clones, multi-state resources are
slightly sticky by default. See <> for details.
=== Which Resource Instance is Promoted ===
During the start operation, most resource agents should call
the `crm_master` utility. This tool automatically detects both the
resource and host and should be used to set a preference for being
promoted. Based on this, +master-max+, and +master-node-max+, the
instance(s) with the highest preference will be promoted.
An alternative is to create a location constraint that
indicates which nodes are most preferred as masters.
.Explicitly preferring node1 to be promoted to master
======
[source,XML]
-------
-------
======
=== Requirements for Multi-state Resource Agents ===
Since multi-state resources are an extension of cloned resources, all
the requirements for resource agents that support clones are also requirements
for resource agents that support multi-state resources.
Additionally, multi-state resources require two extra
actions, +demote+ and +promote+, which are responsible for
changing the state of the resource. Like +start+ and +stop+, they
should return +$\{OCF_SUCCESS}+ if they completed successfully or a
relevant error code if they did not.
The states can mean whatever you wish, but when the resource is
started, it must come up in the mode called +slave+. From there the
cluster will decide which instances to promote to +master+.
In addition to the clone requirements for monitor actions, agents must
also _accurately_ report which state they are in. The cluster relies
on the agent to report its status (including role) accurately and does
not indicate to the agent what role it currently believes it to be in.
.Role implications of OCF return codes
[width="95%",cols="1,1<",options="header",align="center"]
|=========================================================
|Monitor Return Code
|Description
|OCF_NOT_RUNNING
|Stopped
indexterm:[Return Code,OCF_NOT_RUNNING]
|OCF_SUCCESS
|Running (Slave)
indexterm:[Return Code,OCF_SUCCESS]
|OCF_RUNNING_MASTER
|Running (Master)
indexterm:[Return Code,OCF_RUNNING_MASTER]
|OCF_FAILED_MASTER
|Failed (Master)
indexterm:[Return Code,OCF_FAILED_MASTER]
|Other
|Failed (Slave)
|=========================================================
==== Multi-state Notifications ====
Like clones, supporting notifications requires the +notify+ action to
be implemented. If supported, the notify action will be passed a
number of extra variables which, when combined with additional
context, can be used to calculate the current state of the cluster and
what is about to happen to it.
.Environment variables supplied with multi-state notify actions footnote:[Emphasized variables are specific to +Master+ resources, and all behave in the same manner as described for Clone resources.]
[width="95%",cols="5,3<",options="header",align="center"]
|=========================================================
|Variable
|Description
|OCF_RESKEY_CRM_meta_notify_type
|Allowed values: +pre+, +post+
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,type]
indexterm:[type,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_operation
|Allowed values: +start+, +stop+
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,operation]
indexterm:[operation,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_active_resource
|Resources that are running
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,active_resource]
indexterm:[active_resource,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_inactive_resource
|Resources that are not running
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,inactive_resource]
indexterm:[inactive_resource,Notification Environment Variable]
|_OCF_RESKEY_CRM_meta_notify_master_resource_
|Resources that are running in +Master+ mode
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,master_resource]
indexterm:[master_resource,Notification Environment Variable]
|_OCF_RESKEY_CRM_meta_notify_slave_resource_
|Resources that are running in +Slave+ mode
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,slave_resource]
indexterm:[slave_resource,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_start_resource
|Resources to be started
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,start_resource]
indexterm:[start_resource,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_stop_resource
|Resources to be stopped
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,stop_resource]
indexterm:[stop_resource,Notification Environment Variable]
|_OCF_RESKEY_CRM_meta_notify_promote_resource_
|Resources to be promoted
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,promote_resource]
indexterm:[promote_resource,Notification Environment Variable]
|_OCF_RESKEY_CRM_meta_notify_demote_resource_
|Resources to be demoted
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,demote_resource]
indexterm:[demote_resource,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_start_uname
|Nodes on which resources will be started
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,start_uname]
indexterm:[start_uname,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_stop_uname
|Nodes on which resources will be stopped
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,stop_uname]
indexterm:[stop_uname,Notification Environment Variable]
|_OCF_RESKEY_CRM_meta_notify_promote_uname_
|Nodes on which resources will be promoted
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,promote_uname]
indexterm:[promote_uname,Notification Environment Variable]
|_OCF_RESKEY_CRM_meta_notify_demote_uname_
|Nodes on which resources will be demoted
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,demote_uname]
indexterm:[demote_uname,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_active_uname
|Nodes on which resources are running
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,active_uname]
indexterm:[active_uname,Notification Environment Variable]
|_OCF_RESKEY_CRM_meta_notify_master_uname_
|Nodes on which resources are running in +Master+ mode
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,master_uname]
indexterm:[master_uname,Notification Environment Variable]
|_OCF_RESKEY_CRM_meta_notify_slave_uname_
|Nodes on which resources are running in +Slave+ mode
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,slave_uname]
indexterm:[slave_uname,Notification Environment Variable]
|=========================================================
==== Proper Interpretation of Multi-state Notification Environment Variables ====
.Pre-notification (demote):
* +Active+ resources: +$OCF_RESKEY_CRM_meta_notify_active_resource+
* +Master+ resources: +$OCF_RESKEY_CRM_meta_notify_master_resource+
* +Slave+ resources: +$OCF_RESKEY_CRM_meta_notify_slave_resource+
* Inactive resources: +$OCF_RESKEY_CRM_meta_notify_inactive_resource+
* Resources to be started: +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources to be promoted: +$OCF_RESKEY_CRM_meta_notify_promote_resource+
* Resources to be demoted: +$OCF_RESKEY_CRM_meta_notify_demote_resource+
* Resources to be stopped: +$OCF_RESKEY_CRM_meta_notify_stop_resource+
.Post-notification (demote) / Pre-notification (stop):
* +Active+ resources: +$OCF_RESKEY_CRM_meta_notify_active_resource+
* +Master+ resources:
** +$OCF_RESKEY_CRM_meta_notify_master_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_demote_resource+
* +Slave+ resources: +$OCF_RESKEY_CRM_meta_notify_slave_resource+
* Inactive resources: +$OCF_RESKEY_CRM_meta_notify_inactive_resource+
* Resources to be started: +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources to be promoted: +$OCF_RESKEY_CRM_meta_notify_promote_resource+
* Resources to be demoted: +$OCF_RESKEY_CRM_meta_notify_demote_resource+
* Resources to be stopped: +$OCF_RESKEY_CRM_meta_notify_stop_resource+
* Resources that were demoted: +$OCF_RESKEY_CRM_meta_notify_demote_resource+
.Post-notification (stop) / Pre-notification (start)
* +Active+ resources:
** +$OCF_RESKEY_CRM_meta_notify_active_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
* +Master+ resources:
** +$OCF_RESKEY_CRM_meta_notify_master_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_demote_resource+
* +Slave+ resources:
** +$OCF_RESKEY_CRM_meta_notify_slave_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
* Inactive resources:
** +$OCF_RESKEY_CRM_meta_notify_inactive_resource+
** plus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
* Resources to be started: +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources to be promoted: +$OCF_RESKEY_CRM_meta_notify_promote_resource+
* Resources to be demoted: +$OCF_RESKEY_CRM_meta_notify_demote_resource+
* Resources to be stopped: +$OCF_RESKEY_CRM_meta_notify_stop_resource+
* Resources that were demoted: +$OCF_RESKEY_CRM_meta_notify_demote_resource+
* Resources that were stopped: +$OCF_RESKEY_CRM_meta_notify_stop_resource+
.Post-notification (start) / Pre-notification (promote)
* +Active+ resources:
** +$OCF_RESKEY_CRM_meta_notify_active_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
** plus +$OCF_RESKEY_CRM_meta_notify_start_resource+
* +Master+ resources:
** +$OCF_RESKEY_CRM_meta_notify_master_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_demote_resource+
* +Slave+ resources:
** +$OCF_RESKEY_CRM_meta_notify_slave_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
** plus +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Inactive resources:
** +$OCF_RESKEY_CRM_meta_notify_inactive_resource+
** plus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources to be started: +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources to be promoted: +$OCF_RESKEY_CRM_meta_notify_promote_resource+
* Resources to be demoted: +$OCF_RESKEY_CRM_meta_notify_demote_resource+
* Resources to be stopped: +$OCF_RESKEY_CRM_meta_notify_stop_resource+
* Resources that were started: +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources that were demoted: +$OCF_RESKEY_CRM_meta_notify_demote_resource+
* Resources that were stopped: +$OCF_RESKEY_CRM_meta_notify_stop_resource+
.Post-notification (promote)
* +Active+ resources:
** +$OCF_RESKEY_CRM_meta_notify_active_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
** plus +$OCF_RESKEY_CRM_meta_notify_start_resource+
* +Master+ resources:
** +$OCF_RESKEY_CRM_meta_notify_master_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_demote_resource+
** plus +$OCF_RESKEY_CRM_meta_notify_promote_resource+
* +Slave+ resources:
** +$OCF_RESKEY_CRM_meta_notify_slave_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
** plus +$OCF_RESKEY_CRM_meta_notify_start_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_promote_resource+
* Inactive resources:
** +$OCF_RESKEY_CRM_meta_notify_inactive_resource+
** plus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources to be started: +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources to be promoted: +$OCF_RESKEY_CRM_meta_notify_promote_resource+
* Resources to be demoted: +$OCF_RESKEY_CRM_meta_notify_demote_resource+
* Resources to be stopped: +$OCF_RESKEY_CRM_meta_notify_stop_resource+
* Resources that were started: +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources that were promoted: +$OCF_RESKEY_CRM_meta_notify_promote_resource+
* Resources that were demoted: +$OCF_RESKEY_CRM_meta_notify_demote_resource+
* Resources that were stopped: +$OCF_RESKEY_CRM_meta_notify_stop_resource+
[[s-resource-bundle]]
== Bundles - Isolated Environments ==
indexterm:[bundle]
indexterm:[Resource,bundle]
indexterm:[Docker,bundle]
Pacemaker (version 1.1.17 and later) supports a special syntax for combining an
isolated environment with the infrastructure support that it needs: the
'bundle'.
The only isolation technology currently supported by Pacemaker bundles
is https://www.docker.com/[Docker] containers.
footnote:[Docker is a trademark of Docker, Inc. No endorsement by or
association with Docker, Inc. is implied.]
.A bundle for a containerized web server
====
[source,XML]
----
----
====
=== Bundle Properties ===
.Properties of a Bundle
[width="95%",cols="3m,5<",options="header",align="center"]
|=========================================================
|Field
|Description
|id
|A unique name for the bundle (required)
indexterm:[id,bundle]
indexterm:[bundle,Property,id]
|description
|Arbitrary text (not used by Pacemaker)
indexterm:[description,bundle]
indexterm:[bundle,Property,description]
|=========================================================
=== Docker Properties ===
A bundle must contain exactly one ++ element.
Before configuring a bundle in Pacemaker, the user must install Docker and
supply a fully configured Docker image on every node allowed to run the bundle.
Pacemaker will create an implicit +ocf:heartbeat:docker+ resource to manage
a bundle's Docker container.
.Properties of a Bundle's Docker Element
[width="95%",cols="3m,4,5<",options="header",align="center"]
|=========================================================
|Field
|Default
|Description
|image
|
|Docker image tag (required)
indexterm:[image,Docker]
indexterm:[Docker,Property,image]
|replicas
|Value of +masters+ if that is positive, else 1
|A positive integer specifying the number of container instances to launch
indexterm:[replicas,Docker]
indexterm:[Docker,Property,replicas]
|replicas-per-host
|1
|A positive integer specifying the number of container instances allowed to run
on a single node
indexterm:[replicas-per-host,Docker]
indexterm:[Docker,Property,replicas-per-host]
|masters
|0
|A non-negative integer that, if positive, indicates that the containerized
service should be treated as a multistate service, with this many replicas
allowed to run the service in the master role
indexterm:[masters,Docker]
indexterm:[Docker,Property,masters]
|network
|
|If specified, this will be passed to +docker run+ as the
https://docs.docker.com/engine/reference/run/#network-settings[network setting]
for the Docker container.
indexterm:[network,Docker]
indexterm:[Docker,Property,network]
|run-command
|`/usr/sbin/pacemaker_remoted` if bundle contains a +primitive+, otherwise none
|This command will be run inside the container when launching it ("PID 1"). If
the bundle contains a +primitive+, this command 'must' start pacemaker_remoted
(but could, for example, be a script that does other stuff, too).
indexterm:[network,Docker]
indexterm:[Docker,Property,network]
|options
|
|Extra command-line options to pass to `docker run`
indexterm:[options,Docker]
indexterm:[Docker,Property,options]
|=========================================================
=== Bundle Network Properties ===
A bundle may optionally contain one ++ element.
indexterm:[bundle,network]
.Properties of a Bundle's Network Element
[width="95%",cols="2m,1,4<",options="header",align="center"]
|=========================================================
|Field
|Default
|Description
|ip-range-start
|
|If specified, Pacemaker will create an implicit +ocf:heartbeat:IPaddr2+
resource for each container instance, starting with this IP address,
using up to +replicas+ sequential addresses. These addresses can be used
from the host's network to reach the service inside the container, though
it is not visible within the container itself. Only IPv4 addresses are
currently supported.
indexterm:[ip-range-start,network]
indexterm:[network,Property,ip-range-start]
|host-netmask
|32
|If +ip-range-start+ is specified, the IP addresses are created with this
CIDR netmask (as a number of bits).
indexterm:[host-netmask,network]
indexterm:[network,Property,host-netmask]
|host-interface
|
|If +ip-range-start+ is specified, the IP addresses are created on this
host interface (by default, it will be determined from the IP address).
indexterm:[host-interface,network]
indexterm:[network,Property,host-interface]
|control-port
|
|If the bundle contains a +primitive+, the cluster will use this integer TCP
port for communication with Pacemaker Remote inside the container. This takes
precedence over the value of any PCMK_remote_port environment variable set
in the container image. This can allow a +primitive+ to be specified without
using +ip-range-start+ (in which case +replicas-per-host+ must be 1), or
allow a bundle to run on a Pacemaker Remote node that is already listening on
the default port.
indexterm:[control-port,network]
indexterm:[network,Property,control-port]
|=========================================================
[NOTE]
====
If +ip-range-start+ is used, Pacemaker will automatically ensure that
+/etc/hosts+ inside the containers has entries for each replica and its
assigned IP. Replicas are named by the bundle id plus a dash and an integer
counter starting with zero. For example, if a bundle named +httpd-bundle+ has
+replicas=2+, its containers will be named +httpd-bundle-0+ and
+httpd-bundle-1+.
====
Additionally, a ++ element may optionally contain one or more
++ elements.
indexterm:[bundle,network,port-mapping]
.Properties of a Bundle's Port-Mapping Element
[width="95%",cols="2m,1,4<",options="header",align="center"]
|=========================================================
|Field
|Default
|Description
|id
|
|A unique name for the port mapping (required)
indexterm:[id,port-mapping]
indexterm:[port-mapping,Property,id]
|port
|
|If this is specified, connections to this TCP port number on the host network
(on the container's assigned IP address, if +ip-range-start+ is specified)
will be forwarded to the container network. Exactly one of +port+ or +range+
must be specified in a +port-mapping+.
indexterm:[port,port-mapping]
indexterm:[port-mapping,Property,port]
|internal-port
|value of +port+
|If +port+ and this are specified, connections to +port+ on the host's network
will be forwarded to this port on the container network.
indexterm:[internal-port,port-mapping]
indexterm:[port-mapping,Property,internal-port]
|range
|
|If this is specified, connections to these TCP port numbers (expressed as
'first_port'-'last_port') on the host network (on the container's assigned IP
address, if +ip-range-start+ is specified) will be forwarded to the same ports
in the container network. Exactly one of +port+ or +range+ must be specified
in a +port-mapping+.
indexterm:[range,port-mapping]
indexterm:[port-mapping,Property,range]
|=========================================================
[NOTE]
====
If the bundle contains a +primitive+, Pacemaker will automatically map the
+control-port+, so it is not necessary to specify that port in a
+port-mapping+.
====
=== Bundle Storage Properties ===
A bundle may optionally contain one ++ element. A ++ element
has no properties of its own, but may contain one or more ++
elements.
indexterm:[bundle,storage,storage-mapping]
.Properties of a Bundle's Storage-Mapping Element
[width="95%",cols="2m,1,4<",options="header",align="center"]
|=========================================================
|Field
|Default
|Description
|id
|
|A unique name for the storage mapping (required)
indexterm:[id,storage-mapping]
indexterm:[storage-mapping,Property,id]
|source-dir
|
|The absolute path on the host's filesystem that will be mapped into the
container. Exactly one of +source-dir+ and +source-dir-root+ must be specified
in a +storage-mapping+.
indexterm:[source-dir,storage-mapping]
indexterm:[storage-mapping,Property,source-dir]
|source-dir-root
|
|The start of a path on the host's filesystem that will be mapped into the
container, using a different subdirectory on the host for each container
instance. Exactly one of +source-dir+ and +source-dir-root+ must be specified
in a +storage-mapping+.
indexterm:[source-dir-root,storage-mapping]
indexterm:[storage-mapping,Property,source-dir-root]
|target-dir
|
|The path name within the container where the host storage will be mapped
(required)
indexterm:[target-dir,storage-mapping]
indexterm:[storage-mapping,Property,target-dir]
|options
|
|File system mount options to use when mapping the storage
indexterm:[options,storage-mapping]
indexterm:[storage-mapping,Property,options]
|=========================================================
[NOTE]
====
If the bundle contains a +primitive+,
Pacemaker will automatically map the equivalent of
+source-dir=/etc/pacemaker/authkey target-dir=/etc/pacemaker/authkey+
and +source-dir-root=/var/log/pacemaker/bundles target-dir=/var/log+ into the
container, so it is not necessary to specify those paths in a
+storage-mapping+. Newer versions of +ocf:heartbeat:docker+ will automatically
create the source directories if they do not exist, but the user may want to
ensure they exist beforehand.
====
=== Bundle Primitive ===
A bundle may optionally contain one ++ resource
(see <>). The primitive may have operations,
instance attributes and meta-attributes defined, as usual.
If a bundle contains a primitive resource, the container image must include
the Pacemaker Remote daemon, and at least one of +ip-range-start+ or
+control-port+ must be configured in the bundle. Pacemaker will create an
implicit +ocf:pacemaker:remote+ resource for the connection, launch
Pacemaker Remote within the container, and monitor and manage the primitive
resource via Pacemaker Remote.
If the bundle has more than one container instance (replica), the primitive
resource will function as an implicit clone (see <>) --
a multistate clone if the bundle has +masters+ greater than zero
(see <>).
[IMPORTANT]
====
Containers in bundles with a +primitive+ must have an accessible networking
environment, so that Pacemaker on the cluster nodes can contact
Pacemaker Remote inside the container. For example, the Docker option
`--net=none` should not be used with a +primitive+. The default (using a
distinct network space inside the container) works in combination with
+ip-range-start+. If the Docker option `--net=host` is used (making the
container share the host's network space), a unique +control-port+ should be
specified for each bundle. Any firewall must allow access to the
+control-port+.
====
=== Bundle Meta-Attributes ===
Any meta-attribute set on a bundle will be inherited by the bundle's
primitive and any resources implicitly created by Pacemaker for the bundle.
This includes options such as +priority+, +target-role+, and +is-managed+. See
<> for more information.
=== Limitations of Bundles ===
-Currently, bundles may not be cloned, or included in groups or colocation
+Bundle support is considered experimental in Pacemaker 1.1.17.
+
+Bundles may not be cloned, or included in groups or ordering
constraints. This includes the bundle's primitive and any resources
implicitly created by Pacemaker for the bundle.
Bundles do not have instance attributes, utilization attributes, or operations,
though a bundle's primitive may have them.
A bundle with a primitive can run on a Pacemaker Remote node only if the bundle
uses a distinct +control-port+.
+
+Interacting directly with any resource or guest node implicitly created by
+Pacemaker for the bundle is strongly discouraged and likely to cause problems.
diff --git a/lib/cib/cib_attrs.c b/lib/cib/cib_attrs.c
index 0f5d5a718e..da5ca8519b 100644
--- a/lib/cib/cib_attrs.c
+++ b/lib/cib/cib_attrs.c
@@ -1,592 +1,594 @@
/*
* Copyright (C) 2004 Andrew Beekhof
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
* License as published by the Free Software Foundation; either
* version 2.1 of the License, or (at your option) any later version.
*
* This library is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* Lesser General Public License for more details.
*
* You should have received a copy of the GNU Lesser General Public
* License along with this library; if not, write to the Free Software
* Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
*/
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#define attr_msg(level, fmt, args...) do { \
if(to_console) { \
printf(fmt"\n", ##args); \
} else { \
do_crm_log(level, fmt , ##args); \
} \
} while(0)
/* could also check for possible truncation */
#define attr_snprintf(_str, _offset, _limit, ...) do { \
_offset += snprintf(_str + _offset, \
(_limit > _offset) ? _limit - _offset : 0, \
__VA_ARGS__); \
} while(0)
extern int
find_nvpair_attr_delegate(cib_t * the_cib, const char *attr, const char *section,
const char *node_uuid, const char *attr_set_type, const char *set_name,
const char *attr_id, const char *attr_name, gboolean to_console,
char **value, const char *user_name)
{
int offset = 0;
static int xpath_max = 1024;
int rc = pcmk_ok;
char *xpath_string = NULL;
xmlNode *xml_search = NULL;
const char *set_type = NULL;
const char *node_type = NULL;
if (attr_set_type) {
set_type = attr_set_type;
} else {
set_type = XML_TAG_ATTR_SETS;
}
CRM_ASSERT(value != NULL);
*value = NULL;
if (safe_str_eq(section, XML_CIB_TAG_CRMCONFIG)) {
node_uuid = NULL;
set_type = XML_CIB_TAG_PROPSET;
} else if (safe_str_eq(section, XML_CIB_TAG_OPCONFIG)
|| safe_str_eq(section, XML_CIB_TAG_RSCCONFIG)) {
node_uuid = NULL;
set_type = XML_TAG_META_SETS;
} else if (safe_str_eq(section, XML_CIB_TAG_TICKETS)) {
node_uuid = NULL;
section = XML_CIB_TAG_STATUS;
node_type = XML_CIB_TAG_TICKETS;
} else if (node_uuid == NULL) {
return -EINVAL;
}
xpath_string = calloc(1, xpath_max);
if (xpath_string == NULL) {
crm_perror(LOG_CRIT, "Could not create xpath");
return -ENOMEM;
}
attr_snprintf(xpath_string, offset, xpath_max, "%.128s", get_object_path(section));
if (safe_str_eq(node_type, XML_CIB_TAG_TICKETS)) {
attr_snprintf(xpath_string, offset, xpath_max, "//%s", node_type);
} else if (node_uuid) {
const char *node_type = XML_CIB_TAG_NODE;
if (safe_str_eq(section, XML_CIB_TAG_STATUS)) {
node_type = XML_CIB_TAG_STATE;
set_type = XML_TAG_TRANSIENT_NODEATTRS;
}
attr_snprintf(xpath_string, offset, xpath_max, "//%s[@id='%s']", node_type,
node_uuid);
}
if (set_name) {
attr_snprintf(xpath_string, offset, xpath_max, "//%s[@id='%.128s']", set_type,
set_name);
} else {
attr_snprintf(xpath_string, offset, xpath_max, "//%s", set_type);
}
attr_snprintf(xpath_string, offset, xpath_max, "//nvpair[");
if (attr_id) {
attr_snprintf(xpath_string, offset, xpath_max, "@id='%s'", attr_id);
}
if (attr_name) {
if (attr_id) {
attr_snprintf(xpath_string, offset, xpath_max, " and ");
}
attr_snprintf(xpath_string, offset, xpath_max, "@name='%.128s'", attr_name);
}
attr_snprintf(xpath_string, offset, xpath_max, "]");
CRM_LOG_ASSERT(offset > 0);
rc = cib_internal_op(the_cib, CIB_OP_QUERY, NULL, xpath_string, NULL, &xml_search,
cib_sync_call | cib_scope_local | cib_xpath, user_name);
if (rc != pcmk_ok) {
crm_trace("Query failed for attribute %s (section=%s, node=%s, set=%s, xpath=%s): %s",
attr_name, section, crm_str(node_uuid), crm_str(set_name), xpath_string,
pcmk_strerror(rc));
goto done;
}
crm_log_xml_debug(xml_search, "Match");
if (xml_has_children(xml_search)) {
xmlNode *child = NULL;
rc = -ENOTUNIQ;
attr_msg(LOG_WARNING, "Multiple attributes match name=%s", attr_name);
for (child = __xml_first_child(xml_search); child != NULL; child = __xml_next(child)) {
attr_msg(LOG_INFO, " Value: %s \t(id=%s)",
crm_element_value(child, XML_NVPAIR_ATTR_VALUE), ID(child));
}
} else {
const char *tmp = crm_element_value(xml_search, attr);
if (tmp) {
*value = strdup(tmp);
}
}
done:
free(xpath_string);
free_xml(xml_search);
return rc;
}
int
update_attr_delegate(cib_t * the_cib, int call_options,
const char *section, const char *node_uuid, const char *set_type,
const char *set_name, const char *attr_id, const char *attr_name,
const char *attr_value, gboolean to_console, const char *user_name,
const char *node_type)
{
const char *tag = NULL;
int rc = pcmk_ok;
xmlNode *xml_top = NULL;
xmlNode *xml_obj = NULL;
char *local_attr_id = NULL;
char *local_set_name = NULL;
CRM_CHECK(section != NULL, return -EINVAL);
CRM_CHECK(attr_value != NULL, return -EINVAL);
CRM_CHECK(attr_name != NULL || attr_id != NULL, return -EINVAL);
rc = find_nvpair_attr_delegate(the_cib, XML_ATTR_ID, section, node_uuid, set_type, set_name,
attr_id, attr_name, to_console, &local_attr_id, user_name);
if (rc == pcmk_ok) {
attr_id = local_attr_id;
goto do_modify;
} else if (rc != -ENXIO) {
return rc;
/* } else if(attr_id == NULL) { */
/* return -EINVAL; */
} else {
crm_trace("%s does not exist, create it", attr_name);
if (safe_str_eq(section, XML_CIB_TAG_TICKETS)) {
node_uuid = NULL;
section = XML_CIB_TAG_STATUS;
node_type = XML_CIB_TAG_TICKETS;
xml_top = create_xml_node(xml_obj, XML_CIB_TAG_STATUS);
xml_obj = create_xml_node(xml_top, XML_CIB_TAG_TICKETS);
} else if (safe_str_eq(section, XML_CIB_TAG_NODES)) {
if (node_uuid == NULL) {
return -EINVAL;
}
if (safe_str_eq(node_type, "remote")) {
xml_top = create_xml_node(xml_obj, XML_CIB_TAG_NODES);
xml_obj = create_xml_node(xml_top, XML_CIB_TAG_NODE);
crm_xml_add(xml_obj, XML_ATTR_TYPE, "remote");
crm_xml_add(xml_obj, XML_ATTR_ID, node_uuid);
crm_xml_add(xml_obj, XML_ATTR_UNAME, node_uuid);
} else {
tag = XML_CIB_TAG_NODE;
}
} else if (safe_str_eq(section, XML_CIB_TAG_STATUS)) {
tag = XML_TAG_TRANSIENT_NODEATTRS;
if (node_uuid == NULL) {
return -EINVAL;
}
xml_top = create_xml_node(xml_obj, XML_CIB_TAG_STATE);
crm_xml_add(xml_top, XML_ATTR_ID, node_uuid);
xml_obj = xml_top;
} else {
tag = section;
node_uuid = NULL;
}
if (set_name == NULL) {
if (safe_str_eq(section, XML_CIB_TAG_CRMCONFIG)) {
local_set_name = strdup(CIB_OPTIONS_FIRST);
} else if (safe_str_eq(node_type, XML_CIB_TAG_TICKETS)) {
local_set_name = crm_concat(section, XML_CIB_TAG_TICKETS, '-');
} else if (node_uuid) {
local_set_name = crm_concat(section, node_uuid, '-');
if (set_type) {
char *tmp_set_name = local_set_name;
local_set_name = crm_concat(tmp_set_name, set_type, '-');
free(tmp_set_name);
}
} else {
local_set_name = crm_concat(section, "options", '-');
}
set_name = local_set_name;
}
if (attr_id == NULL) {
local_attr_id = crm_concat(set_name, attr_name, '-');
crm_xml_sanitize_id(local_attr_id);
attr_id = local_attr_id;
} else if (attr_name == NULL) {
attr_name = attr_id;
}
crm_trace("Creating %s/%s", section, tag);
if (tag != NULL) {
xml_obj = create_xml_node(xml_obj, tag);
crm_xml_add(xml_obj, XML_ATTR_ID, node_uuid);
if (xml_top == NULL) {
xml_top = xml_obj;
}
}
if (node_uuid == NULL && safe_str_neq(node_type, XML_CIB_TAG_TICKETS)) {
if (safe_str_eq(section, XML_CIB_TAG_CRMCONFIG)) {
xml_obj = create_xml_node(xml_obj, XML_CIB_TAG_PROPSET);
} else {
xml_obj = create_xml_node(xml_obj, XML_TAG_META_SETS);
}
} else if (set_type) {
xml_obj = create_xml_node(xml_obj, set_type);
} else {
xml_obj = create_xml_node(xml_obj, XML_TAG_ATTR_SETS);
}
crm_xml_add(xml_obj, XML_ATTR_ID, set_name);
if (xml_top == NULL) {
xml_top = xml_obj;
}
}
do_modify:
xml_obj = create_xml_node(xml_obj, XML_CIB_TAG_NVPAIR);
if (xml_top == NULL) {
xml_top = xml_obj;
}
crm_xml_add(xml_obj, XML_ATTR_ID, attr_id);
crm_xml_add(xml_obj, XML_NVPAIR_ATTR_NAME, attr_name);
crm_xml_add(xml_obj, XML_NVPAIR_ATTR_VALUE, attr_value);
crm_log_xml_trace(xml_top, "update_attr");
rc = cib_internal_op(the_cib, CIB_OP_MODIFY, NULL, section, xml_top, NULL,
call_options | cib_quorum_override, user_name);
if (rc < pcmk_ok) {
attr_msg(LOG_ERR, "Error setting %s=%s (section=%s, set=%s): %s",
attr_name, attr_value, section, crm_str(set_name), pcmk_strerror(rc));
crm_log_xml_info(xml_top, "Update");
}
free(local_set_name);
free(local_attr_id);
free_xml(xml_top);
return rc;
}
int
read_attr_delegate(cib_t * the_cib,
const char *section, const char *node_uuid, const char *set_type,
const char *set_name, const char *attr_id, const char *attr_name,
char **attr_value, gboolean to_console, const char *user_name)
{
int rc = pcmk_ok;
CRM_ASSERT(attr_value != NULL);
CRM_CHECK(section != NULL, return -EINVAL);
CRM_CHECK(attr_name != NULL || attr_id != NULL, return -EINVAL);
*attr_value = NULL;
rc = find_nvpair_attr_delegate(the_cib, XML_NVPAIR_ATTR_VALUE, section, node_uuid, set_type,
set_name, attr_id, attr_name, to_console, attr_value, user_name);
if (rc != pcmk_ok) {
crm_trace("Query failed for attribute %s (section=%s, node=%s, set=%s): %s",
attr_name, section, crm_str(set_name), crm_str(node_uuid), pcmk_strerror(rc));
}
return rc;
}
int
delete_attr_delegate(cib_t * the_cib, int options,
const char *section, const char *node_uuid, const char *set_type,
const char *set_name, const char *attr_id, const char *attr_name,
const char *attr_value, gboolean to_console, const char *user_name)
{
int rc = pcmk_ok;
xmlNode *xml_obj = NULL;
char *local_attr_id = NULL;
CRM_CHECK(section != NULL, return -EINVAL);
CRM_CHECK(attr_name != NULL || attr_id != NULL, return -EINVAL);
if (attr_id == NULL) {
rc = find_nvpair_attr_delegate(the_cib, XML_ATTR_ID, section, node_uuid, set_type,
set_name, attr_id, attr_name, to_console, &local_attr_id,
user_name);
if (rc != pcmk_ok) {
return rc;
}
attr_id = local_attr_id;
}
xml_obj = create_xml_node(NULL, XML_CIB_TAG_NVPAIR);
crm_xml_add(xml_obj, XML_ATTR_ID, attr_id);
crm_xml_add(xml_obj, XML_NVPAIR_ATTR_NAME, attr_name);
crm_xml_add(xml_obj, XML_NVPAIR_ATTR_VALUE, attr_value);
rc = cib_internal_op(the_cib, CIB_OP_DELETE, NULL, section, xml_obj, NULL,
options | cib_quorum_override, user_name);
if (rc == pcmk_ok) {
attr_msg(LOG_DEBUG, "Deleted %s %s: id=%s%s%s%s%s\n",
section, node_uuid ? "attribute" : "option", local_attr_id,
set_name ? " set=" : "", set_name ? set_name : "",
attr_name ? " name=" : "", attr_name ? attr_name : "");
}
free(local_attr_id);
free_xml(xml_obj);
return rc;
}
/*!
* \internal
* \brief Parse node UUID from search result
*
* \param[in] result XML search result
* \param[out] uuid If non-NULL, where to store parsed UUID
* \param[out] is_remote If non-NULL, set TRUE if result is remote node
*
* \return pcmk_ok if UUID was successfully parsed, -ENXIO otherwise
*/
static int
get_uuid_from_result(xmlNode *result, char **uuid, int *is_remote)
{
int rc = -ENXIO;
const char *tag;
const char *parsed_uuid = NULL;
int parsed_is_remote = FALSE;
if (result == NULL) {
return rc;
}
/* If there are multiple results, the first is sufficient */
tag = (const char *) (result->name);
if (safe_str_eq(tag, "xpath-query")) {
result = __xml_first_child(result);
tag = (const char *) (result->name);
}
if (safe_str_eq(tag, XML_CIB_TAG_NODE)) {
/* Result is tag from section */
if (safe_str_eq(crm_element_value(result, XML_ATTR_TYPE), "remote")) {
parsed_uuid = crm_element_value(result, XML_ATTR_UNAME);
parsed_is_remote = TRUE;
} else {
parsed_uuid = ID(result);
parsed_is_remote = FALSE;
}
} else if (safe_str_eq(tag, XML_CIB_TAG_RESOURCE)) {
/* Result is for ocf:pacemaker:remote resource */
parsed_uuid = ID(result);
parsed_is_remote = TRUE;
} else if (safe_str_eq(tag, XML_CIB_TAG_NVPAIR)) {
/* Result is remote-node parameter of for guest node */
parsed_uuid = crm_element_value(result, XML_NVPAIR_ATTR_VALUE);
parsed_is_remote = TRUE;
} else if (safe_str_eq(tag, XML_CIB_TAG_STATE)) {
/* Result is tag from section */
parsed_uuid = crm_element_value(result, XML_ATTR_UNAME);
- crm_element_value_int(result, F_ATTRD_IS_REMOTE, &parsed_is_remote);
+ if (crm_is_true(crm_element_value(result, XML_NODE_IS_REMOTE))) {
+ parsed_is_remote = TRUE;
+ }
}
if (parsed_uuid) {
if (uuid) {
*uuid = strdup(parsed_uuid);
}
if (is_remote) {
*is_remote = parsed_is_remote;
}
rc = pcmk_ok;
}
return rc;
}
/* Search string to find a node by name, as:
* - cluster or remote node in nodes section
* - remote node in resources section
* - guest node in resources section
- * - orphaned remote node in status section
+ * - orphaned remote node or bundle guest node in status section
*/
#define XPATH_NODE \
"/" XML_TAG_CIB "/" XML_CIB_TAG_CONFIGURATION "/" XML_CIB_TAG_NODES \
"/" XML_CIB_TAG_NODE "[@" XML_ATTR_UNAME "='%s']" \
"|/" XML_TAG_CIB "/" XML_CIB_TAG_CONFIGURATION "/" XML_CIB_TAG_RESOURCES \
"/" XML_CIB_TAG_RESOURCE \
"[@class='ocf'][@provider='pacemaker'][@type='remote'][@id='%s']" \
"|/" XML_TAG_CIB "/" XML_CIB_TAG_CONFIGURATION "/" XML_CIB_TAG_RESOURCES \
"/" XML_CIB_TAG_RESOURCE "/" XML_TAG_META_SETS "/" XML_CIB_TAG_NVPAIR \
"[@name='" XML_RSC_ATTR_REMOTE_NODE "'][@value='%s']" \
"|/" XML_TAG_CIB "/" XML_CIB_TAG_STATUS "/" XML_CIB_TAG_STATE \
"[@" XML_NODE_IS_REMOTE "='true'][@" XML_ATTR_UUID "='%s']"
int
query_node_uuid(cib_t * the_cib, const char *uname, char **uuid, int *is_remote_node)
{
int rc = pcmk_ok;
char *xpath_string;
xmlNode *xml_search = NULL;
CRM_ASSERT(uname != NULL);
if (uuid) {
*uuid = NULL;
}
if (is_remote_node) {
*is_remote_node = FALSE;
}
xpath_string = crm_strdup_printf(XPATH_NODE, uname, uname, uname, uname);
if (cib_internal_op(the_cib, CIB_OP_QUERY, NULL, xpath_string, NULL,
&xml_search, cib_sync_call|cib_scope_local|cib_xpath,
NULL) == pcmk_ok) {
rc = get_uuid_from_result(xml_search, uuid, is_remote_node);
} else {
rc = -ENXIO;
}
free(xpath_string);
free_xml(xml_search);
if (rc != pcmk_ok) {
crm_debug("Could not map node name '%s' to a UUID: %s",
uname, pcmk_strerror(rc));
} else {
crm_info("Mapped node name '%s' to UUID %s", uname, (uuid? *uuid : ""));
}
return rc;
}
int
query_node_uname(cib_t * the_cib, const char *uuid, char **uname)
{
int rc = pcmk_ok;
xmlNode *a_child = NULL;
xmlNode *xml_obj = NULL;
xmlNode *fragment = NULL;
const char *child_name = NULL;
CRM_ASSERT(uname != NULL);
CRM_ASSERT(uuid != NULL);
rc = the_cib->cmds->query(the_cib, XML_CIB_TAG_NODES, &fragment,
cib_sync_call | cib_scope_local);
if (rc != pcmk_ok) {
return rc;
}
xml_obj = fragment;
CRM_CHECK(safe_str_eq(crm_element_name(xml_obj), XML_CIB_TAG_NODES), return -ENOMSG);
CRM_ASSERT(xml_obj != NULL);
crm_log_xml_trace(xml_obj, "Result section");
rc = -ENXIO;
*uname = NULL;
for (a_child = __xml_first_child(xml_obj); a_child != NULL; a_child = __xml_next(a_child)) {
if (crm_str_eq((const char *)a_child->name, XML_CIB_TAG_NODE, TRUE)) {
child_name = ID(a_child);
if (safe_str_eq(uuid, child_name)) {
child_name = crm_element_value(a_child, XML_ATTR_UNAME);
if (child_name != NULL) {
*uname = strdup(child_name);
rc = pcmk_ok;
}
break;
}
}
}
free_xml(fragment);
return rc;
}
int
set_standby(cib_t * the_cib, const char *uuid, const char *scope, const char *standby_value)
{
int rc = pcmk_ok;
char *attr_id = NULL;
CRM_CHECK(uuid != NULL, return -EINVAL);
CRM_CHECK(standby_value != NULL, return -EINVAL);
if (safe_str_eq(scope, "reboot") || safe_str_eq(scope, XML_CIB_TAG_STATUS)) {
scope = XML_CIB_TAG_STATUS;
attr_id = crm_strdup_printf("transient-standby-%.256s", uuid);
} else {
scope = XML_CIB_TAG_NODES;
attr_id = crm_strdup_printf("standby-%.256s", uuid);
}
rc = update_attr_delegate(the_cib, cib_sync_call, scope, uuid, NULL, NULL,
attr_id, "standby", standby_value, TRUE, NULL, NULL);
free(attr_id);
return rc;
}
diff --git a/lib/pengine/container.c b/lib/pengine/container.c
index 836b482546..e91644664f 100644
--- a/lib/pengine/container.c
+++ b/lib/pengine/container.c
@@ -1,1035 +1,1063 @@
/*
* Copyright (C) 2004 Andrew Beekhof
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
* License as published by the Free Software Foundation; either
* version 2.1 of the License, or (at your option) any later version.
*
* This library is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* Lesser General Public License for more details.
*
* You should have received a copy of the GNU Lesser General Public
* License along with this library; if not, write to the Free Software
* Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
*/
#include
#include
#include
#include
#include
#include
#define VARIANT_CONTAINER 1
#include "./variant.h"
void tuple_free(container_grouping_t *tuple);
static char *
next_ip(const char *last_ip)
{
unsigned int oct1 = 0;
unsigned int oct2 = 0;
unsigned int oct3 = 0;
unsigned int oct4 = 0;
int rc = sscanf(last_ip, "%u.%u.%u.%u", &oct1, &oct2, &oct3, &oct4);
if (rc != 4) {
/*@ TODO check for IPv6 */
return NULL;
} else if (oct3 > 253) {
return NULL;
} else if (oct4 > 253) {
++oct3;
oct4 = 1;
} else {
++oct4;
}
return crm_strdup_printf("%u.%u.%u.%u", oct1, oct2, oct3, oct4);
}
static int
allocate_ip(container_variant_data_t *data, container_grouping_t *tuple, char *buffer, int max)
{
if(data->ip_range_start == NULL) {
return 0;
} else if(data->ip_last) {
tuple->ipaddr = next_ip(data->ip_last);
} else {
tuple->ipaddr = strdup(data->ip_range_start);
}
data->ip_last = tuple->ipaddr;
#if 0
return snprintf(buffer, max, " --add-host=%s-%d:%s --link %s-docker-%d:%s-link-%d",
data->prefix, tuple->offset, tuple->ipaddr,
data->prefix, tuple->offset, data->prefix, tuple->offset);
#else
return snprintf(buffer, max, " --add-host=%s-%d:%s",
data->prefix, tuple->offset, tuple->ipaddr);
#endif
}
static xmlNode *
create_resource(const char *name, const char *provider, const char *kind)
{
xmlNode *rsc = create_xml_node(NULL, XML_CIB_TAG_RESOURCE);
crm_xml_add(rsc, XML_ATTR_ID, name);
crm_xml_add(rsc, XML_AGENT_ATTR_CLASS, "ocf");
crm_xml_add(rsc, XML_AGENT_ATTR_PROVIDER, provider);
crm_xml_add(rsc, XML_ATTR_TYPE, kind);
return rsc;
}
static void
create_nvp(xmlNode *parent, const char *name, const char *value)
{
xmlNode *xml_nvp = create_xml_node(parent, XML_CIB_TAG_NVPAIR);
crm_xml_set_id(xml_nvp, "%s-%s", ID(parent), name);
crm_xml_add(xml_nvp, XML_NVPAIR_ATTR_NAME, name);
crm_xml_add(xml_nvp, XML_NVPAIR_ATTR_VALUE, value);
}
static void
create_op(xmlNode *parent, const char *prefix, const char *task, const char *interval)
{
xmlNode *xml_op = create_xml_node(parent, "op");
crm_xml_set_id(xml_op, "%s-%s-%s", prefix, task, interval);
crm_xml_add(xml_op, XML_LRM_ATTR_INTERVAL, interval);
crm_xml_add(xml_op, "name", task);
}
/*!
* \internal
* \brief Check whether cluster can manage resource inside container
*
* \param[in] data Container variant data
*
* \return TRUE if networking configuration is acceptable, FALSE otherwise
*
* \note The resource is manageable if an IP range or control port has been
* specified. If a control port is used without an IP range, replicas per
* host must be 1.
*/
static bool
valid_network(container_variant_data_t *data)
{
if(data->ip_range_start) {
return TRUE;
}
if(data->control_port) {
if(data->replicas_per_host > 1) {
pe_err("Specifying the 'control-port' for %s requires 'replicas-per-host=1'", data->prefix);
data->replicas_per_host = 1;
/* @TODO to be sure: clear_bit(rsc->flags, pe_rsc_unique); */
}
return TRUE;
}
return FALSE;
}
static bool
create_ip_resource(
resource_t *parent, container_variant_data_t *data, container_grouping_t *tuple,
pe_working_set_t * data_set)
{
if(data->ip_range_start) {
char *id = NULL;
xmlNode *xml_ip = NULL;
xmlNode *xml_obj = NULL;
id = crm_strdup_printf("%s-ip-%s", data->prefix, tuple->ipaddr);
crm_xml_sanitize_id(id);
xml_ip = create_resource(id, "heartbeat", "IPaddr2");
free(id);
xml_obj = create_xml_node(xml_ip, XML_TAG_ATTR_SETS);
crm_xml_set_id(xml_obj, "%s-attributes-%d", data->prefix, tuple->offset);
create_nvp(xml_obj, "ip", tuple->ipaddr);
if(data->host_network) {
create_nvp(xml_obj, "nic", data->host_network);
}
if(data->host_netmask) {
create_nvp(xml_obj, "cidr_netmask", data->host_netmask);
} else {
create_nvp(xml_obj, "cidr_netmask", "32");
}
xml_obj = create_xml_node(xml_ip, "operations");
create_op(xml_obj, ID(xml_ip), "monitor", "60s");
// TODO: Other ops? Timeouts and intervals from underlying resource?
if (common_unpack(xml_ip, &tuple->ip, parent, data_set) == false) {
return FALSE;
}
parent->children = g_list_append(parent->children, tuple->ip);
}
return TRUE;
}
static bool
create_docker_resource(
resource_t *parent, container_variant_data_t *data, container_grouping_t *tuple,
pe_working_set_t * data_set)
{
int offset = 0, max = 4096;
char *buffer = calloc(1, max+1);
int doffset = 0, dmax = 1024;
char *dbuffer = calloc(1, dmax+1);
char *id = NULL;
xmlNode *xml_docker = NULL;
xmlNode *xml_obj = NULL;
id = crm_strdup_printf("%s-docker-%d", data->prefix, tuple->offset);
crm_xml_sanitize_id(id);
xml_docker = create_resource(id, "heartbeat", "docker");
free(id);
xml_obj = create_xml_node(xml_docker, XML_TAG_ATTR_SETS);
crm_xml_set_id(xml_obj, "%s-attributes-%d", data->prefix, tuple->offset);
create_nvp(xml_obj, "image", data->image);
create_nvp(xml_obj, "allow_pull", "true");
create_nvp(xml_obj, "force_kill", "false");
create_nvp(xml_obj, "reuse", "false");
offset += snprintf(buffer+offset, max-offset, " --restart=no");
/* Set a container hostname only if we have an IP to map it to.
* The user can set -h or --uts=host themselves if they want a nicer
* name for logs, but this makes applications happy who need their
* hostname to match the IP they bind to.
*/
if (data->ip_range_start != NULL) {
offset += snprintf(buffer+offset, max-offset, " -h %s-%d",
data->prefix, tuple->offset);
}
if(data->docker_network) {
// offset += snprintf(buffer+offset, max-offset, " --link-local-ip=%s", tuple->ipaddr);
offset += snprintf(buffer+offset, max-offset, " --net=%s", data->docker_network);
}
if(data->control_port) {
offset += snprintf(buffer+offset, max-offset, " -e PCMK_remote_port=%s", data->control_port);
} else {
offset += snprintf(buffer+offset, max-offset, " -e PCMK_remote_port=%d", DEFAULT_REMOTE_PORT);
}
for(GListPtr pIter = data->mounts; pIter != NULL; pIter = pIter->next) {
container_mount_t *mount = pIter->data;
if(mount->flags) {
char *source = crm_strdup_printf(
"%s/%s-%d", mount->source, data->prefix, tuple->offset);
if(doffset > 0) {
doffset += snprintf(dbuffer+doffset, dmax-doffset, ",");
}
doffset += snprintf(dbuffer+doffset, dmax-doffset, "%s", source);
offset += snprintf(buffer+offset, max-offset, " -v %s:%s", source, mount->target);
free(source);
} else {
offset += snprintf(buffer+offset, max-offset, " -v %s:%s", mount->source, mount->target);
}
if(mount->options) {
offset += snprintf(buffer+offset, max-offset, ":%s", mount->options);
}
}
for(GListPtr pIter = data->ports; pIter != NULL; pIter = pIter->next) {
container_port_t *port = pIter->data;
if(tuple->ipaddr) {
offset += snprintf(buffer+offset, max-offset, " -p %s:%s:%s",
tuple->ipaddr, port->source, port->target);
} else {
offset += snprintf(buffer+offset, max-offset, " -p %s:%s", port->source, port->target);
}
}
if(data->docker_run_options) {
offset += snprintf(buffer+offset, max-offset, " %s", data->docker_run_options);
}
if(data->docker_host_options) {
offset += snprintf(buffer+offset, max-offset, " %s", data->docker_host_options);
}
create_nvp(xml_obj, "run_opts", buffer);
free(buffer);
create_nvp(xml_obj, "mount_points", dbuffer);
free(dbuffer);
if(tuple->child) {
if(data->docker_run_command) {
create_nvp(xml_obj, "run_cmd", data->docker_run_command);
} else {
create_nvp(xml_obj, "run_cmd", SBIN_DIR"/pacemaker_remoted");
}
/* TODO: Allow users to specify their own?
*
* We just want to know if the container is alive, we'll
* monitor the child independently
*/
create_nvp(xml_obj, "monitor_cmd", "/bin/true");
/* } else if(child && data->untrusted) {
* Support this use-case?
*
* The ability to have resources started/stopped by us, but
* unable to set attributes, etc.
*
* Arguably better to control API access this with ACLs like
* "normal" remote nodes
*
* create_nvp(xml_obj, "run_cmd", "/usr/libexec/pacemaker/lrmd");
* create_nvp(xml_obj, "monitor_cmd", "/usr/libexec/pacemaker/lrmd_internal_ctl -c poke");
*/
} else {
if(data->docker_run_command) {
create_nvp(xml_obj, "run_cmd", data->docker_run_command);
}
/* TODO: Allow users to specify their own?
*
* We don't know what's in the container, so we just want
* to know if it is alive
*/
create_nvp(xml_obj, "monitor_cmd", "/bin/true");
}
xml_obj = create_xml_node(xml_docker, "operations");
create_op(xml_obj, ID(xml_docker), "monitor", "60s");
// TODO: Other ops? Timeouts and intervals from underlying resource?
if (common_unpack(xml_docker, &tuple->docker, parent, data_set) == FALSE) {
return FALSE;
}
parent->children = g_list_append(parent->children, tuple->docker);
return TRUE;
}
static bool
create_remote_resource(
resource_t *parent, container_variant_data_t *data, container_grouping_t *tuple,
pe_working_set_t * data_set)
{
if (tuple->child && valid_network(data)) {
GHashTableIter gIter;
+ GListPtr rsc_iter = NULL;
node_t *node = NULL;
xmlNode *xml_obj = NULL;
xmlNode *xml_remote = NULL;
- char *nodeid = crm_strdup_printf("%s-%d", data->prefix, tuple->offset);
- char *id = NULL;
+ char *id = crm_strdup_printf("%s-%d", data->prefix, tuple->offset);
+ const char *uname = NULL;
- if (remote_id_conflict(nodeid, data_set)) {
+ if (remote_id_conflict(id, data_set)) {
+ free(id);
// The biggest hammer we have
id = crm_strdup_printf("pcmk-internal-%s-remote-%d", tuple->child->id, tuple->offset);
CRM_ASSERT(remote_id_conflict(id, data_set) == FALSE);
- } else {
- id = strdup(nodeid);
}
xml_remote = create_resource(id, "pacemaker", "remote");
+
+ /* Abandon our created ID, and pull the copy from the XML, because we
+ * need something that will get freed during data set cleanup to use as
+ * the node ID and uname.
+ */
free(id);
+ id = NULL;
+ uname = ID(xml_remote);
xml_obj = create_xml_node(xml_remote, "operations");
- create_op(xml_obj, ID(xml_remote), "monitor", "60s");
+ create_op(xml_obj, uname, "monitor", "60s");
xml_obj = create_xml_node(xml_remote, XML_TAG_ATTR_SETS);
crm_xml_set_id(xml_obj, "%s-attributes-%d", data->prefix, tuple->offset);
if(tuple->ipaddr) {
create_nvp(xml_obj, "addr", tuple->ipaddr);
} else {
// REMOTE_CONTAINER_HACK: Allow remote nodes that start containers with pacemaker remote inside
create_nvp(xml_obj, "addr", "#uname");
}
if(data->control_port) {
create_nvp(xml_obj, "port", data->control_port);
} else {
- create_nvp(xml_obj, "port", crm_itoa(DEFAULT_REMOTE_PORT));
+ char *port_s = crm_itoa(DEFAULT_REMOTE_PORT);
+
+ create_nvp(xml_obj, "port", port_s);
+ free(port_s);
}
xml_obj = create_xml_node(xml_remote, XML_TAG_META_SETS);
crm_xml_set_id(xml_obj, "%s-meta-%d", data->prefix, tuple->offset);
create_nvp(xml_obj, XML_OP_ATTR_ALLOW_MIGRATE, "false");
/* This sets tuple->docker as tuple->remote's container, which is
* similar to what happens with guest nodes. This is how the PE knows
* that the bundle node is fenced by recovering docker, and that
* remote should be ordered relative to docker.
*/
create_nvp(xml_obj, XML_RSC_ATTR_CONTAINER, tuple->docker->id);
/* Ensure a node has been created for the guest (it may have already
* been, if it has a permanent node attribute), and ensure its weight is
* -INFINITY so no other resources can run on it.
*/
- node = pe_find_node(data_set->nodes, nodeid);
+ node = pe_find_node(data_set->nodes, uname);
if (node == NULL) {
- node = pe_create_node(strdup(nodeid), nodeid, "remote", "-INFINITY",
+ node = pe_create_node(uname, uname, "remote", "-INFINITY",
data_set);
} else {
node->weight = -INFINITY;
}
+ /* unpack_remote_nodes() ensures that each remote node and guest node
+ * has a node_t entry. Ideally, it would do the same for bundle nodes.
+ * Unfortunately, a bundle has to be mostly unpacked before it's obvious
+ * what nodes will be needed, so we do it just above.
+ *
+ * Worse, that means that the node may have been utilized while
+ * unpacking other resources, without our weight correction. The most
+ * likely place for this to happen is when common_unpack() calls
+ * resource_location() to set a default score in symmetric clusters.
+ * This adds a node *copy* to each resource's allowed nodes, and these
+ * copies will have the wrong weight.
+ *
+ * As a hacky workaround, clear those copies here.
+ */
+ for (rsc_iter = data_set->resources; rsc_iter; rsc_iter = rsc_iter->next) {
+ resource_t *rsc = (resource_t *) rsc_iter->data;
+
+ g_hash_table_remove(rsc->allowed_nodes, uname);
+ }
+
tuple->node = node_copy(node);
tuple->node->weight = 500;
- nodeid = NULL;
- id = NULL;
if (common_unpack(xml_remote, &tuple->remote, parent, data_set) == FALSE) {
return FALSE;
}
g_hash_table_iter_init(&gIter, tuple->remote->allowed_nodes);
while (g_hash_table_iter_next(&gIter, NULL, (void **)&node)) {
if(is_remote_node(node)) {
/* Remote resources can only run on 'normal' cluster node */
node->weight = -INFINITY;
}
}
tuple->node->details->remote_rsc = tuple->remote;
/* #kind is irrelevant to bundles since it is only used in location
* constraint rules, and those don't matter for resources inside
* bundles. But just for clarity, a bundle is closer to "container"
* (guest node) than the "remote" set by pe_create_node().
*/
g_hash_table_insert(tuple->node->details->attrs,
strdup("#kind"), strdup("container"));
/* One effect of this is that setup_container() will add
* tuple->remote to tuple->docker's fillers, which will make
* rsc_contains_remote_node() true for tuple->docker.
*
* tuple->child does NOT get added to tuple->docker's fillers.
* The only noticeable effect if it did would be for its fail count to
* be taken into account when checking tuple->docker's migration
* threshold.
*/
parent->children = g_list_append(parent->children, tuple->remote);
}
return TRUE;
}
static bool
create_container(
resource_t *parent, container_variant_data_t *data, container_grouping_t *tuple,
pe_working_set_t * data_set)
{
if(create_docker_resource(parent, data, tuple, data_set) == FALSE) {
return TRUE;
}
if(create_ip_resource(parent, data, tuple, data_set) == FALSE) {
return TRUE;
}
if(create_remote_resource(parent, data, tuple, data_set) == FALSE) {
return TRUE;
}
if(tuple->child && tuple->ipaddr) {
add_hash_param(tuple->child->meta, "external-ip", tuple->ipaddr);
}
if(tuple->remote) {
/*
* Allow the remote connection resource to be allocated to a
* different node than the one on which the docker container
* is active.
*
* Makes it possible to have remote nodes, running docker
* containers with pacemaker_remoted inside in order to start
* services inside those containers.
*/
set_bit(tuple->remote->flags, pe_rsc_allow_remote_remotes);
}
return FALSE;
}
static void mount_free(container_mount_t *mount)
{
free(mount->source);
free(mount->target);
free(mount->options);
free(mount);
}
static void port_free(container_port_t *port)
{
free(port->source);
free(port->target);
free(port);
}
gboolean
container_unpack(resource_t * rsc, pe_working_set_t * data_set)
{
const char *value = NULL;
xmlNode *xml_obj = NULL;
xmlNode *xml_resource = NULL;
container_variant_data_t *container_data = NULL;
CRM_ASSERT(rsc != NULL);
pe_rsc_trace(rsc, "Processing resource %s...", rsc->id);
container_data = calloc(1, sizeof(container_variant_data_t));
rsc->variant_opaque = container_data;
container_data->prefix = strdup(rsc->id);
xml_obj = first_named_child(rsc->xml, "docker");
if(xml_obj == NULL) {
return FALSE;
}
value = crm_element_value(xml_obj, "masters");
container_data->masters = crm_parse_int(value, "0");
if (container_data->masters < 0) {
pe_err("'masters' for %s must be nonnegative integer, using 0",
rsc->id);
container_data->masters = 0;
}
value = crm_element_value(xml_obj, "replicas");
if ((value == NULL) && (container_data->masters > 0)) {
container_data->replicas = container_data->masters;
} else {
container_data->replicas = crm_parse_int(value, "1");
}
if (container_data->replicas < 1) {
pe_err("'replicas' for %s must be positive integer, using 1", rsc->id);
container_data->replicas = 1;
}
/*
* Communication between containers on the same host via the
* floating IPs only works if docker is started with:
* --userland-proxy=false --ip-masq=false
*/
value = crm_element_value(xml_obj, "replicas-per-host");
container_data->replicas_per_host = crm_parse_int(value, "1");
if (container_data->replicas_per_host < 1) {
pe_err("'replicas-per-host' for %s must be positive integer, using 1",
rsc->id);
container_data->replicas_per_host = 1;
}
if (container_data->replicas_per_host == 1) {
clear_bit(rsc->flags, pe_rsc_unique);
}
container_data->docker_run_command = crm_element_value_copy(xml_obj, "run-command");
container_data->docker_run_options = crm_element_value_copy(xml_obj, "options");
container_data->image = crm_element_value_copy(xml_obj, "image");
container_data->docker_network = crm_element_value_copy(xml_obj, "network");
xml_obj = first_named_child(rsc->xml, "network");
if(xml_obj) {
container_data->ip_range_start = crm_element_value_copy(xml_obj, "ip-range-start");
container_data->host_netmask = crm_element_value_copy(xml_obj, "host-netmask");
container_data->host_network = crm_element_value_copy(xml_obj, "host-interface");
container_data->control_port = crm_element_value_copy(xml_obj, "control-port");
for (xmlNode *xml_child = __xml_first_child_element(xml_obj); xml_child != NULL;
xml_child = __xml_next_element(xml_child)) {
container_port_t *port = calloc(1, sizeof(container_port_t));
port->source = crm_element_value_copy(xml_child, "port");
if(port->source == NULL) {
port->source = crm_element_value_copy(xml_child, "range");
} else {
port->target = crm_element_value_copy(xml_child, "internal-port");
}
if(port->source != NULL && strlen(port->source) > 0) {
if(port->target == NULL) {
port->target = strdup(port->source);
}
container_data->ports = g_list_append(container_data->ports, port);
} else {
pe_err("Invalid port directive %s", ID(xml_child));
port_free(port);
}
}
}
xml_obj = first_named_child(rsc->xml, "storage");
for (xmlNode *xml_child = __xml_first_child_element(xml_obj); xml_child != NULL;
xml_child = __xml_next_element(xml_child)) {
container_mount_t *mount = calloc(1, sizeof(container_mount_t));
mount->source = crm_element_value_copy(xml_child, "source-dir");
if(mount->source == NULL) {
mount->source = crm_element_value_copy(xml_child, "source-dir-root");
mount->flags = 1;
}
mount->target = crm_element_value_copy(xml_child, "target-dir");
mount->options = crm_element_value_copy(xml_child, "options");
if(mount->source && mount->target) {
container_data->mounts = g_list_append(container_data->mounts, mount);
} else {
pe_err("Invalid mount directive %s", ID(xml_child));
mount_free(mount);
}
}
xml_obj = first_named_child(rsc->xml, "primitive");
if (xml_obj && valid_network(container_data)) {
char *value = NULL;
xmlNode *xml_set = NULL;
if(container_data->masters > 0) {
xml_resource = create_xml_node(NULL, XML_CIB_TAG_MASTER);
} else {
xml_resource = create_xml_node(NULL, XML_CIB_TAG_INCARNATION);
}
crm_xml_set_id(xml_resource, "%s-%s", container_data->prefix, xml_resource->name);
xml_set = create_xml_node(xml_resource, XML_TAG_META_SETS);
crm_xml_set_id(xml_set, "%s-%s-meta", container_data->prefix, xml_resource->name);
create_nvp(xml_set, XML_RSC_ATTR_ORDERED, "true");
value = crm_itoa(container_data->replicas);
create_nvp(xml_set, XML_RSC_ATTR_INCARNATION_MAX, value);
free(value);
value = crm_itoa(container_data->replicas_per_host);
create_nvp(xml_set, XML_RSC_ATTR_INCARNATION_NODEMAX, value);
free(value);
if(container_data->replicas_per_host > 1) {
create_nvp(xml_set, XML_RSC_ATTR_UNIQUE, "true");
} else {
create_nvp(xml_set, XML_RSC_ATTR_UNIQUE, "false");
}
if(container_data->masters) {
value = crm_itoa(container_data->masters);
create_nvp(xml_set, XML_RSC_ATTR_MASTER_MAX, value);
free(value);
}
//crm_xml_add(xml_obj, XML_ATTR_ID, container_data->prefix);
add_node_copy(xml_resource, xml_obj);
} else if(xml_obj) {
pe_err("Cannot control %s inside %s without either ip-range-start or control-port",
rsc->id, ID(xml_obj));
return FALSE;
}
if(xml_resource) {
int lpc = 0;
GListPtr childIter = NULL;
resource_t *new_rsc = NULL;
container_mount_t *mount = NULL;
container_port_t *port = NULL;
int offset = 0, max = 1024;
char *buffer = NULL;
if (common_unpack(xml_resource, &new_rsc, rsc, data_set) == FALSE) {
pe_err("Failed unpacking resource %s", ID(rsc->xml));
if (new_rsc != NULL && new_rsc->fns != NULL) {
new_rsc->fns->free(new_rsc);
}
return FALSE;
}
container_data->child = new_rsc;
container_data->child->orig_xml = xml_obj; // Also the trigger for common_free()
// to free xml_resource as container_data->child->xml
mount = calloc(1, sizeof(container_mount_t));
mount->source = strdup(DEFAULT_REMOTE_KEY_LOCATION);
mount->target = strdup(DEFAULT_REMOTE_KEY_LOCATION);
mount->options = NULL;
mount->flags = 0;
container_data->mounts = g_list_append(container_data->mounts, mount);
mount = calloc(1, sizeof(container_mount_t));
mount->source = strdup(CRM_LOG_DIR "/bundles");
mount->target = strdup("/var/log");
mount->options = NULL;
mount->flags = 1;
container_data->mounts = g_list_append(container_data->mounts, mount);
port = calloc(1, sizeof(container_port_t));
if(container_data->control_port) {
port->source = strdup(container_data->control_port);
} else {
port->source = crm_itoa(DEFAULT_REMOTE_PORT);
}
port->target = strdup(port->source);
container_data->ports = g_list_append(container_data->ports, port);
buffer = calloc(1, max+1);
for(childIter = container_data->child->children; childIter != NULL; childIter = childIter->next) {
container_grouping_t *tuple = calloc(1, sizeof(container_grouping_t));
tuple->child = childIter->data;
tuple->offset = lpc++;
offset += allocate_ip(container_data, tuple, buffer+offset, max-offset);
container_data->tuples = g_list_append(container_data->tuples, tuple);
}
container_data->docker_host_options = buffer;
} else {
// Just a naked container, no pacemaker-remote
int offset = 0, max = 1024;
char *buffer = calloc(1, max+1);
for(int lpc = 0; lpc < container_data->replicas; lpc++) {
container_grouping_t *tuple = calloc(1, sizeof(container_grouping_t));
tuple->offset = lpc;
offset += allocate_ip(container_data, tuple, buffer+offset, max-offset);
container_data->tuples = g_list_append(container_data->tuples, tuple);
}
container_data->docker_host_options = buffer;
}
for (GListPtr gIter = container_data->tuples; gIter != NULL; gIter = gIter->next) {
container_grouping_t *tuple = (container_grouping_t *)gIter->data;
// TODO: Remove from list if create_container() returns TRUE
create_container(rsc, container_data, tuple, data_set);
}
if(container_data->child) {
rsc->children = g_list_append(rsc->children, container_data->child);
}
return TRUE;
}
static int
tuple_rsc_active(resource_t *rsc, gboolean all)
{
if (rsc) {
gboolean child_active = rsc->fns->active(rsc, all);
if (child_active && !all) {
return TRUE;
} else if (!child_active && all) {
return FALSE;
}
}
return -1;
}
gboolean
container_active(resource_t * rsc, gboolean all)
{
container_variant_data_t *container_data = NULL;
GListPtr iter = NULL;
get_container_variant_data(container_data, rsc);
for (iter = container_data->tuples; iter != NULL; iter = iter->next) {
container_grouping_t *tuple = (container_grouping_t *)(iter->data);
int rsc_active;
rsc_active = tuple_rsc_active(tuple->ip, all);
if (rsc_active >= 0) {
return (gboolean) rsc_active;
}
rsc_active = tuple_rsc_active(tuple->child, all);
if (rsc_active >= 0) {
return (gboolean) rsc_active;
}
rsc_active = tuple_rsc_active(tuple->docker, all);
if (rsc_active >= 0) {
return (gboolean) rsc_active;
}
rsc_active = tuple_rsc_active(tuple->remote, all);
if (rsc_active >= 0) {
return (gboolean) rsc_active;
}
}
/* If "all" is TRUE, we've already checked that no resources were inactive,
* so return TRUE; if "all" is FALSE, we didn't find any active resources,
* so return FALSE.
*/
return all;
}
resource_t *
find_container_child(const char *stem, resource_t * rsc, node_t *node)
{
container_variant_data_t *container_data = NULL;
resource_t *parent = uber_parent(rsc);
CRM_ASSERT(parent->parent);
parent = parent->parent;
get_container_variant_data(container_data, parent);
if (is_not_set(rsc->flags, pe_rsc_unique)) {
for (GListPtr gIter = container_data->tuples; gIter != NULL; gIter = gIter->next) {
container_grouping_t *tuple = (container_grouping_t *)gIter->data;
CRM_ASSERT(tuple);
if(tuple->node->details == node->details) {
rsc = tuple->child;
break;
}
}
}
if (rsc && safe_str_neq(stem, rsc->id)) {
free(rsc->clone_name);
rsc->clone_name = strdup(stem);
}
return rsc;
}
static void
print_rsc_in_list(resource_t *rsc, const char *pre_text, long options,
void *print_data)
{
if (rsc != NULL) {
if (options & pe_print_html) {
status_print("");
}
rsc->fns->print(rsc, pre_text, options, print_data);
if (options & pe_print_html) {
status_print("\n");
}
}
}
static void
container_print_xml(resource_t * rsc, const char *pre_text, long options, void *print_data)
{
container_variant_data_t *container_data = NULL;
char *child_text = NULL;
CRM_CHECK(rsc != NULL, return);
if (pre_text == NULL) {
pre_text = "";
}
child_text = crm_concat(pre_text, " ", ' ');
get_container_variant_data(container_data, rsc);
status_print("%sid);
status_print("type=\"docker\" ");
status_print("image=\"%s\" ", container_data->image);
status_print("unique=\"%s\" ", is_set(rsc->flags, pe_rsc_unique)? "true" : "false");
status_print("managed=\"%s\" ", is_set(rsc->flags, pe_rsc_managed) ? "true" : "false");
status_print("failed=\"%s\" ", is_set(rsc->flags, pe_rsc_failed) ? "true" : "false");
status_print(">\n");
for (GListPtr gIter = container_data->tuples; gIter != NULL; gIter = gIter->next) {
container_grouping_t *tuple = (container_grouping_t *)gIter->data;
CRM_ASSERT(tuple);
status_print("%s \n", pre_text, tuple->offset);
print_rsc_in_list(tuple->ip, child_text, options, print_data);
print_rsc_in_list(tuple->child, child_text, options, print_data);
print_rsc_in_list(tuple->docker, child_text, options, print_data);
print_rsc_in_list(tuple->remote, child_text, options, print_data);
status_print("%s \n", pre_text);
}
status_print("%s\n", pre_text);
free(child_text);
}
static void
tuple_print(container_grouping_t * tuple, const char *pre_text, long options, void *print_data)
{
node_t *node = NULL;
resource_t *rsc = tuple->child;
int offset = 0;
char buffer[LINE_MAX];
if(rsc == NULL) {
rsc = tuple->docker;
}
if(tuple->remote) {
offset += snprintf(buffer + offset, LINE_MAX - offset, "%s", rsc_printable_id(tuple->remote));
} else {
offset += snprintf(buffer + offset, LINE_MAX - offset, "%s", rsc_printable_id(tuple->docker));
}
if(tuple->ipaddr) {
offset += snprintf(buffer + offset, LINE_MAX - offset, " (%s)", tuple->ipaddr);
}
if(tuple->docker && tuple->docker->running_on != NULL) {
node = tuple->docker->running_on->data;
} else if (tuple->docker == NULL && rsc->running_on != NULL) {
node = rsc->running_on->data;
}
common_print(rsc, pre_text, buffer, node, options, print_data);
}
void
container_print(resource_t * rsc, const char *pre_text, long options, void *print_data)
{
container_variant_data_t *container_data = NULL;
char *child_text = NULL;
CRM_CHECK(rsc != NULL, return);
if (options & pe_print_xml) {
container_print_xml(rsc, pre_text, options, print_data);
return;
}
get_container_variant_data(container_data, rsc);
if (pre_text == NULL) {
pre_text = " ";
}
status_print("%sDocker container%s: %s [%s]%s%s\n",
pre_text, container_data->replicas>1?" set":"", rsc->id, container_data->image,
is_set(rsc->flags, pe_rsc_unique) ? " (unique)" : "",
is_set(rsc->flags, pe_rsc_managed) ? "" : " (unmanaged)");
if (options & pe_print_html) {
status_print("
\n\n");
}
for (GListPtr gIter = container_data->tuples; gIter != NULL; gIter = gIter->next) {
container_grouping_t *tuple = (container_grouping_t *)gIter->data;
CRM_ASSERT(tuple);
if (options & pe_print_html) {
status_print("- ");
}
if(is_set(options, pe_print_clone_details)) {
child_text = crm_strdup_printf(" %s", pre_text);
if(g_list_length(container_data->tuples) > 1) {
status_print(" %sReplica[%d]\n", pre_text, tuple->offset);
}
if (options & pe_print_html) {
status_print("
\n\n");
}
print_rsc_in_list(tuple->ip, child_text, options, print_data);
print_rsc_in_list(tuple->docker, child_text, options, print_data);
print_rsc_in_list(tuple->remote, child_text, options, print_data);
print_rsc_in_list(tuple->child, child_text, options, print_data);
if (options & pe_print_html) {
status_print("
\n");
}
} else {
child_text = crm_strdup_printf("%s ", pre_text);
tuple_print(tuple, child_text, options, print_data);
}
free(child_text);
if (options & pe_print_html) {
status_print(" \n");
}
}
if (options & pe_print_html) {
status_print("
\n");
}
}
void
tuple_free(container_grouping_t *tuple)
{
if(tuple == NULL) {
return;
}
// TODO: Free tuple->node ?
if(tuple->ip) {
tuple->ip->fns->free(tuple->ip);
tuple->ip = NULL;
}
if(tuple->child) {
tuple->child->fns->free(tuple->child);
tuple->child = NULL;
}
if(tuple->docker) {
tuple->docker->fns->free(tuple->docker);
tuple->docker = NULL;
}
if(tuple->remote) {
tuple->remote->fns->free(tuple->remote);
tuple->remote = NULL;
}
free(tuple->ipaddr);
free(tuple);
}
void
container_free(resource_t * rsc)
{
container_variant_data_t *container_data = NULL;
CRM_CHECK(rsc != NULL, return);
get_container_variant_data(container_data, rsc);
pe_rsc_trace(rsc, "Freeing %s", rsc->id);
free(container_data->prefix);
free(container_data->image);
free(container_data->control_port);
free(container_data->host_network);
free(container_data->host_netmask);
free(container_data->ip_range_start);
free(container_data->docker_network);
free(container_data->docker_run_options);
free(container_data->docker_run_command);
free(container_data->docker_host_options);
g_list_free_full(container_data->tuples, (GDestroyNotify)tuple_free);
g_list_free_full(container_data->mounts, (GDestroyNotify)mount_free);
g_list_free_full(container_data->ports, (GDestroyNotify)port_free);
common_free(rsc);
}
enum rsc_role_e
container_resource_state(const resource_t * rsc, gboolean current)
{
enum rsc_role_e container_role = RSC_ROLE_UNKNOWN;
return container_role;
}