Autoenabling a Service Group
A service group is autodisabled until VCS probes all of the resources and checks that they are ready to bring online. Autoenable a service group in situations where the VCS engine is not running on one of the systems in the cluster, and you must override the disabled state of the service group to enable the group on another system in the cluster. To autoenable a service group from Cluster Explorer 1. On the Service Groups tab of the configuration tree, right-click the service group. or Click...
Deleting a Service Group
Delete a service group from Cluster Explorer or Command Center. Note You cannot delete service groups with dependencies. To delete a linked service group, you must first delete the link. To delete a service group from Cluster Explorer 1. On the Service Groups tab of the configuration tree, right-click the service group. or Click a cluster in the configuration tree, click the Service Groups tab, and right-click the service group icon in the view panel. To delete a service group from Command...
Example 2: The main.cf for a Two-Node Asymmetric NFS Cluster
The following example is a basic two-node cluster exporting an NFS file system. The systems are configured as follows:
Servers: Server1 and Server2
Storage: one disk group, shared1, managed using VERITAS Volume Manager
IP address: 192.168.1.3 (IP_nfs1)
Server1 is the primary location to start NFS_group1.
In an NFS configuration, the resource dependencies must be configured to bring up the IP address last. This prevents clients from accessing the server until everything is ready, preventing unnecessary Stale File Handle errors on...
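The requirement that the IP address come up last can be checked by treating the resource dependencies as a graph and computing an order in which each resource starts only after the resources it requires. The minimal sketch below uses Python's standard graphlib module (Python 3.9 or later). Apart from IP_nfs1 and shared1, the resource names and dependency edges are hypothetical illustrations, not the contents of the example main.cf.

```python
from graphlib import TopologicalSorter

# Each key (parent) requires the resources in its set (children).
# IP_nfs1 and shared1 come from the example; every other name and
# edge here is a hypothetical illustration.
REQUIRES = {
    "IP_nfs1": {"share_home", "nic_public"},  # IP is the topmost parent
    "share_home": {"mount_home", "nfs_daemons"},
    "mount_home": {"shared1"},                # file system needs its disk group
    "nfs_daemons": set(),
    "nic_public": set(),
    "shared1": set(),
}


def online_order(requires):
    """Return an order in which every child precedes the parents that need it."""
    # TopologicalSorter treats each value set as the node's predecessors,
    # so children are emitted before their parents.
    return list(TopologicalSorter(requires).static_order())


if __name__ == "__main__":
    print(online_order(REQUIRES))
```

Because IP_nfs1 depends, directly or indirectly, on every other resource in this graph, it always comes last in the computed order.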
Cluster Membership
Cluster membership implies that the cluster must accurately determine which nodes are active in the cluster at any given time. In order to take corrective action on node failure, surviving nodes must agree on when a node has departed. This membership needs to be accurate and must be coordinated among active members. This becomes critical considering nodes can be added, rebooted, powered off, faulted, and so on. VCS uses its cluster membership capability to dynamically track the overall cluster...
Putting the Pieces Together
How do all these pieces combine to form a cluster? Understanding that question makes understanding the rest of VCS fairly simple. Take a common example, a two-node cluster exporting an NFS file system to clients. The cluster itself consists of two nodes connected to shared storage, which enables both servers to access the data required for the file system export. In this example, a single service group, NFS_Group, fails over between ServerA and ServerB. The service group, configured as a...
Coordinator Disks
VCS uses special-purpose disks, called coordinator disks, for I/O fencing during cluster membership change. These are three standard disks or LUNs, which together act as a global lock device during a cluster reconfiguration. VCS uses this lock mechanism to determine which nodes remain in a cluster and which node gets to fence off data drives from other nodes. Coordinator disks cannot be used for any other purpose in the VCS configuration....
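The role of the three coordinator disks as a lock device can be illustrated with a simplified model: after a membership change, each competing subcluster races to claim the coordinator disks, and only a subcluster that wins a majority continues. This is a sketch under stated assumptions, not the actual fencing protocol; the race outcome is simply passed in as data, and the disk and subcluster names are hypothetical.

```python
from collections import Counter

COORDINATOR_DISKS = 3  # the text specifies three coordinator disks


def race_winner(disk_claims):
    """
    disk_claims maps each coordinator disk to the subcluster that claimed it
    first, for example {"disk1": "A", "disk2": "B", "disk3": "A"}.
    Returns the subcluster holding a strict majority of the disks, or None.
    This is a simplified model, not the real fencing implementation.
    """
    if len(disk_claims) != COORDINATOR_DISKS:
        raise ValueError("expected exactly one claim per coordinator disk")
    subcluster, count = Counter(disk_claims.values()).most_common(1)[0]
    return subcluster if count > COORDINATOR_DISKS // 2 else None
```

With an odd number of disks, two competing subclusters can never tie: one of them must hold at least two of the three.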
Global Cluster
A global cluster links clusters at separate locations and enables wide-area failover and disaster recovery. Local clustering provides local failover for each site or building. Campus and replicated cluster configurations offer protection against disasters affecting limited geographic regions. Large scale disasters such as major floods, hurricanes, and earthquakes can cause outages for an entire city or region. In such situations, data availability can be ensured by migrating applications to...
violation Event Trigger
Usage - violation system service_group The variable system represents the name of the system. The variable service_group represents the name of the service group that was fully or partially online. Description This trigger is invoked only on the system that caused the concurrency violation. Specifically, it takes the service group offline on the system where the trigger was invoked. Note that this trigger applies to failover groups only. The default trigger takes the service group offline on...
Administering Service Groups 1
Operations for the VCS global clusters option are enabled or restricted depending on the permissions with which you log on. The privileges associated with each user category are enforced for cross-cluster service group operations. The following information defines the criteria. See User Privileges on page 55 for details on user categories. To bring online or take offline global service groups on a target in the remote cluster: You must be a valid user on the local cluster on which the operation...
Network Partitions and the UNIX Boot Monitor
Most UNIX systems provide a console-abort sequence that enables you to halt and continue the processor. Continuing operations after the processor has stopped may corrupt data and is therefore unsupported by VCS. When a system is halted with the abort sequence, it stops producing heartbeats. The other systems in the cluster consider the system failed and take over its services. If the system is later enabled with another console sequence, it continues writing to shared storage as before, even...
The types.cf File
The types.cf file describes standard resource types to the VCS engine; specifically, the data required to control a specific resource. The following example illustrates a DiskGroup resource type definition:
type DiskGroup (
static int NumThreads = 1
static int OnlineRetryLimit = 1
static str ArgList[] = { DiskGroup, StartVolumes, StopVolumes, MonitorOnly }
str DiskGroup
str StartVolumes = 1
str StopVolumes = 1
)
The types definition performs two important functions. First, it defines the type of values that may be set for each...
Freezing a Service Group
Freeze a service group to prevent it from failing over to another system. This freezing process stops all online and offline procedures on the service group. To freeze a service group from Cluster Explorer 1. On the Service Groups tab of the configuration tree, right-click the service group. or Click the cluster in the configuration tree, click the Service Groups tab, and right-click the service group icon in the view panel. 2. Click Freeze, and click Temporary or Persistent from the menu. The...
Freezing a Service Group 1
Freeze a service group to prevent it from failing over to another system. This freezing procedure stops all online and offline operations on the service group. 1. From the Service Group page, click Freeze on the left pane. 2. On the Freeze Group dialog box a. If necessary, click Persistent to enable the service group to retain its frozen state when the cluster is rebooted.
High-Availability Daemon (HAD)
The high-availability daemon, or HAD, is the main VCS daemon running on each system. It is responsible for building the running cluster configuration from the configuration files, distributing the information when new nodes join the cluster, responding to operator input, and taking corrective action when something fails. It is typically known as the VCS engine. The engine uses agents to monitor and manage resources. Information about resource states is collected from the agents on the local...
Shared Nothing Cluster
Systems in shared nothing clusters do not share access to disks; they maintain separate copies of data. VCS shared nothing clusters typically have read-only data stored locally on both systems. For example, consider a pair of systems in a cluster that includes a critical Web server, which provides access to a backend database. The actual Web server runs on local disks and does not require data sharing at the Web server level.
Running the GCO Configuration Wizard
If you are upgrading from a single-cluster setup to a multi-cluster setup, run the GCO Configuration wizard to create or update the ClusterService group. The wizard verifies your configuration and validates it for a global cluster setup. You must have the GCO license installed on all nodes in the cluster. For more information, see Installing a VCS License on page 73. 1. Start the GCO Configuration wizard. 2. The wizard discovers the NIC devices on the local system and prompts you to enter the...
Critical and Non-Critical Resources
The Critical attribute for a resource defines whether a service group fails over when a resource faults. If a resource is configured as non-critical (by setting the Critical attribute to 0) and no resources depending on the failed resource are critical, the service group will not fail over. VCS takes the failed resource offline and updates the group status to online|partial. The attribute also determines whether a service group tries to come online on another node if, during the group's online...
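The failover rule in the first part of this paragraph can be written as a small predicate. In this sketch, "depending on the failed resource" is interpreted as any direct or indirect parent, and resources without an explicit Critical value are treated as critical; both are assumptions, and the resource names are hypothetical.

```python
def dependents(resource, requires):
    """Collect every resource that directly or indirectly requires `resource`."""
    found, pending = set(), [resource]
    while pending:
        child = pending.pop()
        for parent, children in requires.items():
            if child in children and parent not in found:
                found.add(parent)
                pending.append(parent)
    return found


def group_fails_over(failed, critical, requires):
    """
    True if the failed resource, or any resource depending on it, is critical.
    `critical` maps resource name -> Critical attribute value (0 or 1);
    missing entries are assumed to be critical in this sketch.
    """
    candidates = {failed} | dependents(failed, requires)
    return any(critical.get(name, 1) == 1 for name in candidates)
```

For example, if an application resource requires a mount, which requires a disk group, a fault in the disk group causes failover as soon as any one of the three is critical.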
VCS Behavior When a Resource Fails to Come Online
In the following example, the agent framework invokes the Online entry point for an offline resource. The resource state changes to waiting to online. If the Online entry point times out, VCS examines the value of the ManageFaults attribute. If ManageFaults is set to NONE, the resource state changes to offline|admin_wait. If ManageFaults is set to ALL, VCS calls the Clean entry point with the CleanReason set to Online Hung. If the Online entry point does not time out, VCS invokes the Monitor...
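The branches described above can be summarized as a small decision function. This sketch covers only the documented cases; the later Monitor-driven steps are outside its scope, and the returned strings are descriptive labels, not VCS messages.

```python
def after_online_attempt(timed_out, manage_faults):
    """
    Return the next step VCS takes after the agent framework invokes Online,
    following only the branches described in the text.
    """
    if not timed_out:
        return "invoke Monitor"
    if manage_faults == "NONE":
        return "set state to offline|admin_wait"
    if manage_faults == "ALL":
        return "invoke Clean with CleanReason Online Hung"
    raise ValueError(f"ManageFaults value not modeled here: {manage_faults!r}")
```

Note that ManageFaults is consulted only when Online times out; otherwise Monitor determines what happens next.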
SystemList Attribute
The SystemList attribute designates all systems that can run a particular service group. VCS does not allow a service group to be brought online on a system that is not in the group's system list. By default, the order of systems in the list defines the priority of systems used in a failover. For example, the definition SystemList = { SystemA, SystemB, SystemC } configures SystemA to be the first choice on failover, followed by SystemB and then SystemC. System priority may also be assigned explicitly...
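The following sketch picks a failover target using system priority, accepting either an ordered list (priority implied by position) or explicit priorities, with lower numbers preferred. Which systems count as "eligible" (for example, running and able to host the group) is passed in by the caller; that set is an assumption standing in for the other checks VCS performs.

```python
def priorities(system_list):
    """
    Accept either an ordered list (priority implied by position) or a dict
    of explicit priorities; lower numbers mean higher preference.
    """
    if isinstance(system_list, dict):
        return dict(system_list)
    return {name: index for index, name in enumerate(system_list)}


def failover_target(system_list, eligible):
    """Return the eligible system with the best (lowest) priority, or None."""
    ranked = priorities(system_list)
    candidates = [name for name in ranked if name in eligible]
    return min(candidates, key=ranked.__getitem__, default=None)
```

With SystemList = { SystemA, SystemB, SystemC } and SystemA unavailable, the sketch returns SystemB, matching the order described above.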
Configuring SMTP Notification for VRTSweb
You can configure VRTSweb to send out email notifications about events occurring in the Web server. Some events that can occur are: Web Server starting/stopping (severity INFORMATION); Web Consoles starting/stopping (severity INFORMATION); Web Server's allocated heap size very close to the maximum allowed (severity SEVERE). To send an email notification, VRTSweb needs to know the IP address or hostname of a configured SMTP server. The configured SMTP server address is also made available to all the Web...
Examples of State Transitions
If VCS is started on a system, and if that system is the only one in the cluster with a valid configuration, the system transitions to the running state: initing -> current_discover_wait -> local_build -> running. If VCS is started on a system with a valid configuration file, and if at least one other system is already in the RUNNING state, the new system transitions to the running state: initing -> current_discover_wait -> remote_build -> running. If VCS is started on a system with a...
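The two complete transition sequences above differ only in the build step. The sketch below encodes them; as a simplification, the first case is modeled as "no other system is in the RUNNING state", and start-up without a valid configuration is deliberately not modeled.

```python
def startup_path(has_valid_config, peer_running):
    """
    Return the state sequence for the two start-up cases described in the
    text. Simplification: the local-build case is modeled as "no other
    system is RUNNING"; other cases are not modeled.
    """
    if not has_valid_config:
        raise NotImplementedError("start-up without a valid configuration is not modeled")
    build = "remote_build" if peer_running else "local_build"
    return ["initing", "current_discover_wait", build, "running"]
```

The design choice here is that a node which can obtain the configuration from a running peer does so (remote_build) instead of building from its own files.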
The Notifier Process
The notifier process configures how messages are received from VCS and how they are delivered to SNMP consoles and SMTP servers. Using notifier, you can specify notification based on the severity level of the events generating the messages. You can also specify the size of the VCS message queue, which is 30 by default. You can change this value by modifying the MessageQueue attribute. See the VCS Bundled Agents Reference Guide for more information about this attribute. When started from the...
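Severity-based filtering can be sketched as follows: each recipient receives only messages at or above its chosen severity. The four level names and their ordering are assumptions made for this example, the recipient names are hypothetical, and the sketch does not model the MessageQueue size.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Assumed ordering for this sketch: a higher value is more severe.
    INFORMATION = 0
    WARNING = 1
    ERROR = 2
    SEVERE_ERROR = 3


def route(messages, thresholds):
    """
    messages: list of (Severity, text) pairs.
    thresholds: recipient -> minimum severity that recipient wants.
    Returns recipient -> list of message texts delivered to it.
    """
    delivered = {recipient: [] for recipient in thresholds}
    for severity, text in messages:
        for recipient, minimum in thresholds.items():
            if severity >= minimum:
                delivered[recipient].append(text)
    return delivered
```

For example, an SNMP console set to INFORMATION receives every message, while an SMTP recipient set to ERROR receives only errors and severe errors.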
Configuring Application and NFS Service Groups
VCS provides easy-to-use configuration wizards to create specific service groups. VCS provides the following configuration wizards: Application Configuration Wizard: Creates and modifies Application service groups, which provide high availability for applications in a VCS cluster. For more information, see Configuring Application Service Groups Using the Application Wizard on page 318. NFS Configuration Wizard: Creates and modifies NFS service groups, which provide high availability for fileshares in a VCS...
Inter-Node Communication
VCS uses the cluster interconnect for network communications between cluster nodes. The nodes communicate using the capabilities provided by LLT and GAB. The LLT module is designed to function as a high-performance, low-latency replacement for the IP stack and is used for all cluster communications. LLT provides the communications backbone for GAB. LLT distributes (load balances) inter-node communication across up to eight interconnect links. When a link fails, traffic is redirected to...
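The load-balancing and redirection behavior can be illustrated with a round-robin model. Round-robin is an assumption chosen for clarity, not a description of LLT's actual distribution algorithm, and the link names are hypothetical.

```python
MAX_LINKS = 8  # the text states that LLT balances across up to eight links


class LinkBalancer:
    """Round-robin distribution over working links (illustrative model only)."""

    def __init__(self, links):
        if not 1 <= len(links) <= MAX_LINKS:
            raise ValueError("this sketch accepts one to eight links")
        self.links = list(links)
        self.failed = set()
        self._next = 0

    def fail(self, link):
        """Mark a link as failed; later traffic is redirected to the others."""
        self.failed.add(link)

    def pick(self):
        """Return the link that carries the next packet, skipping failed links."""
        for _ in range(len(self.links)):
            link = self.links[self._next]
            self._next = (self._next + 1) % len(self.links)
            if link not in self.failed:
                return link
        raise RuntimeError("no working interconnect links")
```

With two links, traffic alternates between them; after one fails, every packet goes over the survivor.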
Logs
The Logs dialog box displays the log messages generated by the VCS engine, VCS agents, and commands issued from Cluster Manager to the cluster. Use this dialog box to monitor and take actions on alerts on faulted global clusters and failed service group failover attempts. Note To ensure the time stamps for engine log messages are accurate, make sure to set the time zone of the system running the Java Console to the same time zone as the system running the VCS engine. Click the VCS Logs tab to...
Jeopardy Scenario: Link Failure
In this scenario, a link to node 2 fails, leaving the node with only one possible heartbeat. Regular membership: 0, 1, 2, 3. Jeopardy membership: 2. A new cluster membership is issued with nodes 0, 1, 2, and 3 in the regular membership and node 2 in a jeopardy membership. All normal cluster operations continue, including normal failover of service groups due to resource fault.
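The membership outcome in this scenario follows a simple rule, sketched below: every node still heartbeating stays in the regular membership, and a node down to its last link is also listed in the jeopardy membership. This simplification covers only this scenario, and the input format (node ID mapped to working link count) is hypothetical.

```python
def memberships(links_up):
    """
    links_up maps node ID -> number of working private links.
    Returns (regular, jeopardy): nodes with at least one working link stay in
    the regular membership; nodes down to exactly one link are also in jeopardy.
    """
    regular = sorted(node for node, count in links_up.items() if count >= 1)
    jeopardy = sorted(node for node, count in links_up.items() if count == 1)
    return regular, jeopardy
```

For the scenario above, memberships({0: 2, 1: 2, 2: 1, 3: 2}) yields regular membership 0, 1, 2, 3 and jeopardy membership 2.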
Administering the Cluster from Cluster Manager Web Console
Cluster Manager Web Console offers web-based administration capabilities for your cluster. Use the Web Console to monitor clusters and cluster objects, including service groups, systems, resources, and resource types. Many of the operations supported by the Web Console are also supported by the command line interface and Cluster Manager Java Console. The Web Console uses a Web Server component called VRTSweb. See the appendix Administering VERITAS Java Web Server on page 653 for more...
Configuring Replication
VCS supports several replication solutions for global clustering. Please contact your VERITAS sales representative for the solutions supported by VCS. This section describes how to set up replication using VERITAS Volume Replicator VVR. 1. Create a new service group, say appgroup_rep. 2. Copy the DiskGroup resource from the appgroup to the new group. 3. Configure new resources of type IP and NIC in the appgroup_rep service group. 4. Configure a new resource of type RVG in the new appgroup_rep...
Service group autodisabled
When VCS does not know the status of a service group on a particular system, it autodisables the service group on that system. Autodisabling occurs under the following conditions: When the VCS engine, had, is not running on the system. When not all resources within the service group have been probed on the system. When a particular system is visible through disk heartbeat only. Under these conditions, all service groups that include the system in their SystemList attribute are autodisabled. This does...
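The three conditions can be combined into one check. This sketch simplifies probe status to a single flag per system and applies the result to every group whose SystemList includes that system, as the last sentence describes; the group and system names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SystemStatus:
    had_running: bool           # is the VCS engine running on the system?
    resources_probed: bool      # simplified: have all resources been probed there?
    disk_heartbeat_only: bool   # is the system visible only through disk heartbeat?


def should_autodisable(status):
    """True if any of the three listed conditions holds."""
    return (not status.had_running
            or not status.resources_probed
            or status.disk_heartbeat_only)


def autodisabled_groups(system, status, system_lists):
    """Return the groups whose SystemList includes `system`, if it qualifies."""
    if not should_autodisable(status):
        return []
    return sorted(group for group, systems in system_lists.items() if system in systems)
```

Any single condition is enough; the groups affected are determined only by SystemList membership.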
Resource Type Attributes
As indicated in the following table, some predefined, static attributes for resource types can be overridden. Additionally, all static attributes that are not predefined can be overridden. See Overriding Resource Type Static Attributes on page 101 for details. For more information on any attribute listed below, see the chapter on setting agent parameters in the VERITAS Cluster Server Agent Developer's Guide. Timeout value for the Action entry point. Default 40 seconds Indicates the scheduling...
Encrypting Passwords
VCS provides the vcsencrypt utility to generate encrypted passwords. The utility prompts you to enter a password and returns an encrypted password. Encrypted passwords can be used when editing the VCS configuration file, main.cf, to add VCS users or when configuring agents that require user password information. Note Do not use the vcsencrypt utility when entering passwords from a configuration wizard or from the Java and Web consoles. 1. Run the utility from the command line. To encrypt a...
hauser
hauser User Privileges for CLI and Cluster Shell Commands
Unfreezing a System
Thaw a frozen system to perform online and offline operations on the system. To thaw or unfreeze a system from Cluster Explorer 1. Click the Systems tab of the configuration tree. 2. In the configuration tree, right-click the system, and click Unfreeze. To unfreeze a system from Command Center 1. On the Command Center configuration tree, expand Unfreeze System.
Disabling a Service Group
Disable a service group to prevent it from coming online. This process temporarily stops VCS from monitoring a service group on a system undergoing maintenance operations. To disable a service group from Cluster Explorer 1. On the Service Groups tab of the configuration tree, right-click the service group. or Click the cluster in the configuration tree, click the Service Groups tab, and right-click the service group icon in the view panel. 2. Click Disable, and click the appropriate system in...
Network Partition
If all network connections between any two groups of systems fail simultaneously, a network partition occurs. When this happens, systems on both sides of the partition can restart applications from the other side, resulting in duplicate services, or split-brain. A split-brain occurs when two independent systems configured in a cluster assume they have exclusive access to a given resource (usually a file system or volume). The most serious problem caused by a network partition is that it affects...
Symmetric or Active/Active
In a symmetric configuration, each server is configured to run a specific application or service and provide redundancy for its peer. In the example below, each server is running one application service group. When a failure occurs, the surviving server hosts both application groups. Symmetric configurations appear more efficient in terms of hardware utilization. One could object to the concept of a valuable system sitting idle. However, this line of reasoning can be flawed. In the asymmetric...
Cluster Attributes
Contains list of users with Administrator privileges. If user does not have root privileges, and if this attribute is set to 0, user is prompted for a password when issuing haxxx commands. If this attribute is set to 1, the user is not prompted. VCS validates OS user's login against VCS' list of user IDs and assigns appropriate privileges. Default 0 If the local cluster cannot communicate with one or more remote clusters, this attribute specifies the number of seconds the VCS engine waits...
Simulating Global Clusters Using VCS Simulator
This section describes how you can simulate a global cluster environment using VCS Simulator. 1. Install VCS Simulator in a directory SIM_HOME on your system. For instructions, see Installing VCS Simulator on page 115. 2. Set up the clusters on your system. Run the following command to add a cluster: hasim -setupclus clustername -simport port_no -wacport port_no Note Do not use default_clus as the cluster name when simulating a global cluster. The term default_clus is reserved for actual...
Bringing a Service Group Online
To bring a service group online from the Cluster Explorer Configuration Tree 1. On the Service Groups tab of the configuration tree, right-click the service group. or Click a cluster in the configuration tree, click the Service Groups tab, and right-click the service group icon in the view panel. 2. Click Online, and click the appropriate system from the menu. Click Any System if you do not need to specify a system. To bring a service group online from the Cluster Explorer Toolbar 1. Click...
preonline Event Trigger
Usage - preonline triggertype system service_group whyonlining The variable triggertype represents whether the trigger is custom (triggertype=0) or internal (triggertype=1). Note For this trigger, triggertype=0. The variable system represents the name of the system. The variable service_group represents the name of the service group on which the hagrp command was issued or the fault occurred. The variable whyonlining represents two values: fault indicates that the group was brought online in response to a group failover or...
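A custom preonline trigger receives the four positional arguments listed in the Usage line. This Python sketch shows only the argument handling; the system and group names in any example are hypothetical, and the hagrp command it builds (hagrp -online -nopre) is an assumption about how a trigger lets onlining proceed, to be verified against the hagrp documentation. The sketch prints the command instead of running it.

```python
def parse_preonline_args(argv):
    """Map the positional arguments from the Usage line to named fields."""
    if len(argv) != 4:
        raise ValueError("usage: preonline triggertype system service_group whyonlining")
    triggertype, system, service_group, whyonlining = argv
    return {
        "triggertype": int(triggertype),
        "system": system,
        "service_group": service_group,
        "whyonlining": whyonlining,
    }


def resume_online_command(args):
    # Assumption: once its checks pass, the trigger lets onlining proceed by
    # running "hagrp -online -nopre"; verify against the hagrp manual page.
    return ["hagrp", "-online", "-nopre", args["service_group"], "-sys", args["system"]]


def main(argv):
    """Hypothetical entry point; prints the command instead of running it."""
    args = parse_preonline_args(argv)
    if args["whyonlining"].lower() == "fault":
        print(f"{args['service_group']} is coming online on {args['system']} after a fault")
    print(" ".join(resume_online_command(args)))
```

Keeping parsing separate from the action makes it easy to add site-specific checks between the two.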
Trap Variables in VCS MIB
This section describes trap variables in VCS MIB. Traps sent by VCS 4.0 are reversible to SNMPv2 after an SNMPv2 -> SNMPv1 conversion. For reversible translations between SNMPv1 and SNMPv2 trap PDUs, the second-last ID of the SNMP trap OID must be zero. This ensures that once you make a forward translation (SNMPv2 trap -> SNMPv1, RFC 2576 Section 3.2), the reverse translation (SNMPv1 trap -> SNMPv2 trap, RFC 2576 Section 3.1) is accurate. In earlier versions of VCS, this ID was not zero. The...
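The zero-in-second-last-position requirement follows from the two RFC 2576 mappings for enterprise-specific traps. The sketch below implements only that case; standard traps such as coldStart are handled differently and are omitted. OIDs are represented as tuples of integers, and the enterprise number 99999 used in the test values is hypothetical, not the VCS OID.

```python
ENTERPRISE_SPECIFIC = 6  # SNMPv1 generic-trap value enterpriseSpecific(6)


def v2_to_v1(trap_oid):
    """RFC 2576 Section 3.2 mapping for a non-standard (enterprise-specific) trap."""
    if len(trap_oid) < 3:
        raise ValueError("trap OID too short")
    specific_trap = trap_oid[-1]
    if trap_oid[-2] == 0:
        enterprise = trap_oid[:-2]  # drop the trailing ".0.<specific>"
    else:
        enterprise = trap_oid[:-1]  # drop only the trailing "<specific>"
    return enterprise, ENTERPRISE_SPECIFIC, specific_trap


def v1_to_v2(enterprise, generic_trap, specific_trap):
    """RFC 2576 Section 3.1 mapping: enterprise + ".0." + specific-trap."""
    if generic_trap != ENTERPRISE_SPECIFIC:
        raise ValueError("only enterpriseSpecific traps are modeled here")
    return enterprise + (0, specific_trap)


def round_trips(trap_oid):
    """True if SNMPv2 -> SNMPv1 -> SNMPv2 reproduces the original trap OID."""
    return v1_to_v2(*v2_to_v1(trap_oid)) == trap_oid
```

Because the reverse mapping always inserts a 0 before the specific-trap number, only OIDs whose second-last sub-identifier is already 0 survive the round trip unchanged.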
Probing a Resource
Probe a resource to check that it is configured and ready to bring online. To probe a resource from Cluster Explorer 1. On the Service Groups tab of the configuration tree, right-click the resource. 2. Click Probe, and click the appropriate system from the menu. To probe a resource from Command Center 1. On the Command Center configuration tree, expand Resource. 3. Click the system on which to probe the resource.
Understanding Split-Brain and the Need for I/O Fencing
When VCS detects node failure, it attempts to take corrective action, which is determined by the cluster configuration. If the failing node hosted a service group, and one of the remaining nodes is designated in the group's SystemList, then VCS fails the service group over and imports shared storage to another node in the cluster. If the mechanism used to detect node failure breaks down, the symptoms appear identical to those of a failed node. For example, in a four-node cluster, if a system...
VCS Behavior When an Online Resource Faults
In the following example, a resource in an online state is reported as being offline without being commanded by the agent to go offline. VCS first verifies that the Monitor routine completes successfully in the required time. If it does, VCS examines the exit code returned by the Monitor routine. If the Monitor routine does not complete in the required time, VCS looks at the FaultOnMonitorTimeouts (FOMT) attribute. If FOMT = 0, the resource will not fault when the Monitor routine times out. VCS...
ClusterService Group
The ClusterService group is a special purpose service group, which can fail over to any node despite restrictions such as frozen. It is the first service group to come online and cannot be autodisabled. The group comes online on the first node that enters the running state. The wide-area connector, its alias, and notifier are components of the ClusterService group. If the node on which the connector is running crashes, the service group is failed over to the next available node. The command...
Low Priority Link
LLT can be configured to use a low priority network link as a backup to normal heartbeat channels. Low priority links are typically configured on the public or administrative network. The low priority link is not used for cluster membership traffic until it is the only remaining link. During normal operation, the low priority link carries only heartbeat traffic for cluster membership and link state maintenance. The frequency of heartbeats is reduced to 50% of normal to reduce network overhead....
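The two rules in this paragraph can be sketched together: a low-priority link carries more than heartbeats only once no high-priority link is left, and heartbeating at 50% of the normal frequency doubles the interval between heartbeats. The link names, the input format, and the 0.5-second interval in the test values are hypothetical.

```python
def traffic_links(links):
    """
    links maps link name -> (is_up, is_low_priority).
    Returns the links allowed to carry cluster traffic beyond heartbeats:
    high-priority links while any are up, otherwise the remaining low-priority ones.
    """
    high = [name for name, (up, low) in links.items() if up and not low]
    if high:
        return high
    return [name for name, (up, low) in links.items() if up and low]


def heartbeat_interval(normal_interval, is_low_priority):
    """Heartbeating at 50% of the normal frequency doubles the interval."""
    return normal_interval * 2 if is_low_priority else normal_interval
```

In the sketch, the public link takes over cluster traffic only after the last private link goes down.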
Moving and Linking Icons in Group and Resource Views
The Link and Auto Arrange buttons are available in the top right corner of the Service Group or Resource View. Click Link to set or disable the link mode for the Service Group and Resource Views. Note There are alternative ways to set up dependency links without using the Link button. The link mode enables you to create a dependency link by clicking on the parent icon, dragging the yellow line to the icon that will serve as the child, and then clicking the child icon. Use the Esc key to delete...
System Attributes
This attribute is set to 1 on a system when all agents running on the system are stopped. Indicates system's available capacity when trigger is fired. If this value is negative, the argument contains the prefix % (percentage sign); for example, %-4. Value expressing total system load capacity. This value is relative to other systems in the cluster and does not reflect any real value associated with a particular system. For example, the administrator may assign a value of 200 to a 16-processor machine...
Controlling Cluster Components
Resources are classified according to types, and multiple resources can be of a single type. For example, two disk resources are classified as type Disk. How VCS starts and stops a resource is specific to the resource type. For example, mounting starts a file system resource and configuring the IP address starts the IP resource on a network interface card. Monitoring a resource means testing it to determine if it is online or offline. How VCS monitors a resource is also specific to the resource...
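The idea that start, stop, and monitor behavior belongs to the type rather than to each resource can be sketched as an abstract class with one subclass per type. These classes are hypothetical and only simulate state in memory; they do not mount file systems or configure addresses, and they are not the VCS agent framework's API.

```python
from abc import ABC, abstractmethod


class ResourceType(ABC):
    """A type defines how all resources of that type are started, stopped, and monitored."""

    @abstractmethod
    def online(self, resource): ...

    @abstractmethod
    def offline(self, resource): ...

    @abstractmethod
    def monitor(self, resource):
        """Return "ONLINE" or "OFFLINE"."""


class SimulatedMountType(ResourceType):
    """Stands in for a file system type: mounting starts the resource."""

    def __init__(self):
        self.mounted = set()

    def online(self, resource):
        self.mounted.add(resource)

    def offline(self, resource):
        self.mounted.discard(resource)

    def monitor(self, resource):
        return "ONLINE" if resource in self.mounted else "OFFLINE"


class SimulatedIPType(ResourceType):
    """Stands in for an IP type: configuring the address on a NIC starts the resource."""

    def __init__(self):
        self.configured = {}  # address -> NIC name

    def online(self, resource):
        address, nic = resource
        self.configured[address] = nic

    def offline(self, resource):
        address, _ = resource
        self.configured.pop(address, None)

    def monitor(self, resource):
        address, nic = resource
        return "ONLINE" if self.configured.get(address) == nic else "OFFLINE"
```

Two file system resources share one SimulatedMountType instance, mirroring how multiple resources can be of a single type while each is monitored independently.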












