Contrail Cloud Deployment Guide

Appendix B: Remove a Ceph Storage Node

03-Apr-23

Use this procedure to remove a Ceph storage node from a Ceph cluster. Removing Ceph storage is handled as a Red Hat process rather than an end-to-end Contrail Cloud process; however, this procedure demonstrates the removal of a storage node in the context of a Contrail Cloud environment.

Before you begin, ensure that the remaining nodes in the cluster are sufficient to hold the required number of placement groups (PGs) and replicas for your Ceph storage cluster. Also ensure that both the Ceph cluster and the overcloud stack are healthy. To check the health of your overcloud, see Verify Quorum and Node Health.

All examples in this procedure come from a lab setting and demonstrate storage removal in the context of Contrail Cloud. The sample output will differ from the information in your specific cloud deployment. In these examples, “storage3” is the node targeted for removal.
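
Before you start the removal, it can help to confirm the starting state from the undercloud. The following is a minimal pre-check sketch, assuming the stack name overcloud and the ctlplane IP of a controller taken from the lab examples below (192.168.213.73); the first command shows the overcloud stack status and the second checks Ceph health through one of the controllers:

    (undercloud) [stack@undercloud ~]$ openstack stack list
    (undercloud) [stack@undercloud ~]$ ssh heat-admin@192.168.213.73 sudo ceph -s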

Remove the storage node:

  1. Find the connection between the bare metal server and the overcloud server. The output of the command below shows that the server we are looking for is “overcloud8st-cephstorageblue1-0”. This information is used later in the procedure.
    (undercloud) [stack@undercloud ~]$ openstack ccloud nodemap list
    +---------------------------------+----------------+------------+----------------+
    | Name                            | IP             | Hypervisor | Hypervisor IP  |
    +---------------------------------+----------------+------------+----------------+
    | overcloud8st-afxctrl-0          | 192.168.213.69 | controler2 | 192.168.213.6  |
    | overcloud8st-afxctrl-1          | 192.168.213.52 | controler3 | 192.168.213.7  |
    | overcloud8st-afxctrl-2          | 192.168.213.58 | controler1 | 192.168.213.5  |
    | overcloud8st-ctrl-0             | 192.168.213.73 | controler2 | 192.168.213.6  |
    | overcloud8st-ctrl-1             | 192.168.213.63 | controler1 | 192.168.213.5  |
    | overcloud8st-ctrl-2             | 192.168.213.59 | controler3 | 192.168.213.7  |
    | overcloud8st-cephstorageblue1-0 | 192.168.213.62 | storage3   | 192.168.213.62 |
    | overcloud8st-compdpdk-0         | 192.168.213.56 | compute1   | 192.168.213.56 |
    | overcloud8st-cephstorageblue2-0 | 192.168.213.61 | storage2   | 192.168.213.61 |
    | overcloud8st-cephstorageblue2-1 | 192.168.213.80 | storage1   | 192.168.213.80 |
    +---------------------------------+----------------+------------+----------------+
  2. From the undercloud, SSH as the heat-admin user to any of the OpenStack controllers and then run sudo ceph -s to verify that the Ceph cluster is healthy:
    [root@overcloud8st-ctrl-1 ~]# sudo ceph -s
      cluster:
        id:     a98b1580-bb97-11ea-9f2b-525400882160
        health: HEALTH_OK
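
    The example above is captured on the controller itself. The following is a minimal sketch of the hop from the undercloud, assuming the ctlplane IP and hostname of overcloud8st-ctrl-0 from the nodemap output in step 1; substitute any controller in your deployment:
    (undercloud) [stack@undercloud ~]$ ssh heat-admin@192.168.213.73
    [heat-admin@overcloud8st-ctrl-0 ~]$ sudo ceph -s
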
  3. Find the OSDs that reside on the server to be removed (overcloud8st-cephstorageblue1-0). We identify osd.2, osd.3, osd.6, and osd.7 from the example below:
    [root@overcloud8st-ctrl-1 ~]# sudo ceph osd tree
    ID CLASS WEIGHT   TYPE NAME                                STATUS REWEIGHT PRI-AFF
    -1       10.91638 root default
    -3        3.63879     host overcloud8st-cephstorageblue1-0
     2   hdd  0.90970         osd.2                                up  1.00000 1.00000
     3   hdd  0.90970         osd.3                                up  1.00000 1.00000
     6   hdd  0.90970         osd.6                                up  1.00000 1.00000
     7   hdd  0.90970         osd.7                                up  1.00000 1.00000
    -7        3.63879     host overcloud8st-cephstorageblue2-0
     1   hdd  0.90970         osd.1                                up  1.00000 1.00000
     4   hdd  0.90970         osd.4                                up  1.00000 1.00000
     8   hdd  0.90970         osd.8                                up  1.00000 1.00000
    10   hdd  0.90970         osd.10                               up  1.00000 1.00000
    -5        3.63879     host overcloud8st-cephstorageblue2-1
     0   hdd  0.90970         osd.0                                up  1.00000 1.00000
     5   hdd  0.90970         osd.5                                up  1.00000 1.00000
     9   hdd  0.90970         osd.9                                up  1.00000 1.00000
    11   hdd  0.90970         osd.11                               up  1.00000 1.00000
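
    If your Ceph release supports it, ceph osd ls-tree is an optional shortcut that prints only the OSD IDs under a given CRUSH host; this is a convenience check, not part of the documented procedure:
    [root@overcloud8st-ctrl-1 ~]# sudo ceph osd ls-tree overcloud8st-cephstorageblue1-0
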
  4. While still logged in to the OpenStack controller, mark osd.2, osd.3, osd.6, and osd.7 as out so that Ceph stops placing data on them:
    [root@overcloud8st-ctrl-1 ~]# sudo ceph osd out 2
    marked out osd.2.
    [root@overcloud8st-ctrl-1 ~]# sudo ceph osd out 3
    marked out osd.3.
    [root@overcloud8st-ctrl-1 ~]# sudo ceph osd out 6
    marked out osd.6.
    [root@overcloud8st-ctrl-1 ~]# sudo ceph osd out 7
    marked out osd.7.
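
    Optionally, confirm that the four OSDs are now reported as out and watch the recovery traffic; this is a sketch using standard Ceph status commands (press Ctrl+C to stop the watch). The "in" count reported by ceph osd stat should drop by four while all OSDs remain up:
    [root@overcloud8st-ctrl-1 ~]# sudo ceph osd stat
    [root@overcloud8st-ctrl-1 ~]# sudo ceph -w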

    From the undercloud, SSH as the heat-admin user to any of the OpenStack controllers and then run sudo ceph -s to verify that the Ceph cluster returns a HEALTH_OK state before you continue.

  5. From the undercloud, SSH as the heat-admin user to the Ceph node overcloud8st-cephstorageblue1-0 and stop the OSD services:
    [root@overcloud8st-cephstorageblue1-0 ~]# sudo systemctl stop ceph-osd@2.service
    [root@overcloud8st-cephstorageblue1-0 ~]# sudo systemctl stop ceph-osd@3.service
    [root@overcloud8st-cephstorageblue1-0 ~]# sudo systemctl stop ceph-osd@6.service
    [root@overcloud8st-cephstorageblue1-0 ~]# sudo systemctl stop ceph-osd@7.service
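
    While still on the storage node, you can optionally confirm that the OSD daemons are stopped; each unit should report inactive. This is a sketch using the same unit names as above:
    [root@overcloud8st-cephstorageblue1-0 ~]# sudo systemctl is-active ceph-osd@2.service ceph-osd@3.service ceph-osd@6.service ceph-osd@7.service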

    From the undercloud, SSH as the heat-admin user to any of the OpenStack controllers and then run sudo ceph -s to verify that the Ceph cluster returns a HEALTH_OK state before you continue.

  6. From the undercloud, SSH as the heat-admin user back into one of the OpenStack controllers and remove the remaining information about the OSDs from overcloud8st-cephstorageblue1-0 (the CRUSH map entries, the authentication keys, and the OSDs themselves):
    [root@overcloud8st-ctrl-1 ~]# sudo ceph osd crush remove osd.2
    removed item id 2 name 'osd.2' from crush map
    [root@overcloud8st-ctrl-1 ~]# sudo ceph osd crush remove osd.3
    removed item id 3 name 'osd.3' from crush map
    [root@overcloud8st-ctrl-1 ~]# sudo ceph osd crush remove osd.6
    removed item id 6 name 'osd.6' from crush map
    [root@overcloud8st-ctrl-1 ~]# sudo ceph osd crush remove osd.7
    removed item id 7 name 'osd.7' from crush map
    
    
    [root@overcloud8st-ctrl-1 ~]# sudo ceph auth del osd.2
    updated
    [root@overcloud8st-ctrl-1 ~]# sudo ceph auth del osd.3
    updated
    [root@overcloud8st-ctrl-1 ~]# sudo ceph auth del osd.6
    updated
    [root@overcloud8st-ctrl-1 ~]# sudo ceph auth del osd.7
    updated
    
    
    [root@overcloud8st-ctrl-1 ~]# sudo ceph osd rm 2
    removed osd.2
    [root@overcloud8st-ctrl-1 ~]# sudo ceph osd rm 3
    removed osd.3
    [root@overcloud8st-ctrl-1 ~]# sudo ceph osd rm 6
    removed osd.6
    [root@overcloud8st-ctrl-1 ~]# sudo ceph osd rm 7
    removed osd.7
    
    [root@overcloud8st-ctrl-1 ~]# sudo ceph osd crush rm overcloud8st-cephstorageblue1-0
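
    To confirm the cleanup, you can rerun the OSD tree command from step 3; the host overcloud8st-cephstorageblue1-0 and osd.2, osd.3, osd.6, and osd.7 should no longer appear:
    [root@overcloud8st-ctrl-1 ~]# sudo ceph osd tree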

    From the undercloud, SSH as the heat-admin user to any of the OpenStack controllers and then run sudo ceph -s to verify that the Ceph cluster returns a HEALTH_OK state before you continue.

  7. From the undercloud VM, find the ID of the Ceph storage node:
    (undercloud) [stack@undercloud ~]$ openstack server list | grep overcloud8st-cephstorageblue1-0
    | 7ee9be4f-efda-4837-a597-a6554027d0c9 | overcloud8st-cephstorageblue1-0 | ACTIVE | ctlplane=192.168.213.62 | overcloud-full | CephStorageBlue1
  8. Initiate a removal using the node ID from the previous step:
    (undercloud) [stack@undercloud ~]$ openstack overcloud node delete --stack overcloud 7ee9be4f-efda-4837-a597-a6554027d0c9
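
    The delete runs as an overcloud stack update and can take several minutes. When it completes, you can rerun the server list command from step 7; the node should no longer appear:
    (undercloud) [stack@undercloud ~]$ openstack server list | grep overcloud8st-cephstorageblue1-0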

    From the undercloud, SSH as the heat-admin user to any of the OpenStack controllers and then run sudo ceph -s to verify that the Ceph cluster returns a HEALTH_OK state before you continue.

  9. Verify that the bare metal node is powered off and in the available state:
    (undercloud) [stack@undercloud ~]$ openstack baremetal node list | grep storage3
    | 05bbab4b-b968-4d1d-87bc-a26ac335303d | storage3 | None  | power off  | available | False |
  10. From the jump host, as the contrail user, mark the storage node with status: deleting so that the Ceph profile is removed from it. Add status: deleting to the config/storage-nodes.yml file for storage3, and then run the storage-nodes-assign.sh script.
    [contrail@5a6s13-node1 contrail_cloud]$ cat config/storage-nodes.yml
    storage_nodes:
      - name: storage1
        profile: blue2
      - name: storage2
        profile: blue2
      - name: storage3
        profile: blue1
        status: deleting
    
    [contrail@5a6s13-node1 contrail_cloud]$ ./scripts/storage-nodes-assign.sh

    From the undercloud, SSH as the heat-admin user to any of the OpenStack controllers and then run sudo ceph -s to verify that the Ceph cluster returns a HEALTH_OK state before you continue.

  11. From the jump host, as the contrail user, run openstack-deploy.sh to regenerate the templates so that they reflect the current state:
    [contrail@5a6s13-node1 contrail_cloud]$ ./scripts/openstack-deploy.sh
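
    When the script finishes, you can optionally confirm from the undercloud that the overcloud stack update completed; this is a sketch of a standard check, and the exact status string can vary, but the stack is typically reported as UPDATE_COMPLETE:
    (undercloud) [stack@undercloud ~]$ openstack stack list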

    From the undercloud, SSH as the heat-admin user to any of the OpenStack controllers and then run sudo ceph -s to verify that the Ceph cluster returns a HEALTH_OK state before you continue.

If the goal is to remove the bare metal node completely, use the following additional procedure:

  1. Edit the config/storage-nodes.yml file and remove the entry for the bare metal node.

  2. Edit the config/inventory.yml file and add status: deleting to the node to be removed:

    [contrail@5a6s13-node1 contrail_cloud]$ cat config/inventory.yml
    ...
    inventory_nodes:
      - name: "storage3"
        pm_addr: "10.84.129.184"
        status: deleting
        <<: *common
    
  3. Run the inventory-assign.sh script:

    [contrail@5a6s13-node1 contrail_cloud]$ ./scripts/inventory-assign.sh

    From the undercloud, SSH as the heat-admin user to any of the OpenStack controllers and then run sudo ceph -s to verify that the Ceph cluster returns a HEALTH_OK state before you continue.

  4. Verify the bare metal node has been removed. Enter the following command to view the list of nodes:

    (undercloud) [stack@undercloud ~]$ openstack ccloud nodemap list |grep storage
    | overcloud8st-cephstorageblue2-1 | 192.168.213.80 | storage1 | 192.168.213.80 |
    | overcloud8st-cephstorageblue2-0 | 192.168.213.61 | storage2 | 192.168.213.61 |
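
    As an additional check, the bare metal node should also no longer appear in the bare metal inventory; rerunning the command from step 9 of the previous procedure should return no output:
    (undercloud) [stack@undercloud ~]$ openstack baremetal node list | grep storage3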