From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Fri, 2 Jan 2026 17:58:53 +0100
From: Aaron Lauterer
To: Maximiliano Sandoval
Cc: pve-devel@lists.proxmox.com
Subject: Re: [pve-devel] [PATCH docs] pveceph: document the change of ceph networks
References: <20260102134635.458369-1-a.lauterer@proxmox.com>

Thanks for the feedback, a v2 is now available:
https://lore.proxmox.com/pve-devel/20260102165754.650450-1-a.lauterer@proxmox.com/T/#u

On 2026-01-02 16:03, Maximiliano Sandoval wrote:
> Aaron Lauterer writes:
>
> Some small points below:
>
>> ceph networks (public, cluster) can be changed on the fly in a running
>> cluster. But the procedure, especially for the ceph public network is
>> a bit more involved. By documenting it, we will hopefully reduce the
>> number of issues our users run into when they try to attempt a network
>> change on their own.
>>
>> Signed-off-by: Aaron Lauterer
>> ---
>> Before I apply this commit I would like to get at least one T-b where you tested
>> both scenarios to make sure the instructions are clear to follow and that I
>> didn't miss anything.
>>
>>  pveceph.adoc | 186 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 186 insertions(+)
>>
>> diff --git a/pveceph.adoc b/pveceph.adoc
>> index 63c5ca9..c4a4f91 100644
>> --- a/pveceph.adoc
>> +++ b/pveceph.adoc
>> @@ -1192,6 +1192,192 @@ ceph osd unset noout
>>  You can now start up the guests. Highly available guests will change their state
>>  to 'started' when they power on.
>>
>> +
>> +[[pveceph_network_change]]
>> +Network Changes
>> +~~~~~~~~~~~~~~~
>> +
>> +It is possible to change the networks used by Ceph in a HCI setup without any
>> +downtime if *both the old and new networks can be configured at the same time*.
>> +
>> +The procedure differs depending on which network you want to change.
>> +
>> +After the new network has been configured on all hosts, make sure you test it
>> +before proceeding with the changes. One way is to ping all hosts on the new
>> +network. If you use a large MTU, make sure to also test that it works. For
>> +example by sending ping packets that will result in a final packet at the max
>> +MTU size.
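
Side note for anyone testing this: a quick way to check that all hosts answer
on the new network is a small shell loop like the one below. The addresses
are just placeholders for the example; adjust them to your new subnet:

    # ping every host on the (hypothetical) new 10.9.9.0/24 network once
    for ip in 10.9.9.{30..32}; do
        ping -c 1 -W 1 "$ip" >/dev/null && echo "$ip ok" || echo "$ip FAILED"
    done
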
>> +
>> +To test an MTU of 9000, you will need the following packet sizes:
>> +
>> +[horizontal]
>> +IPv4:: The overhead of IP and ICMP is '28' bytes; the resulting packet size for
>> +the ping then is '8972' bytes.
>
> I would personally mention that this is "generally" the case, as one
> could be dealing with bigger headers, e.g. when q-in-q is used.
>
>> +IPv6:: The overhead is '48' bytes and the resulting packet size is
>> +'8952' bytes.
>> +
>> +The resulting ping command will look like this for an IPv4:
>> +[source,bash]
>> +----
>> +ping -M do -s 8972 {target IP}
>> +----
>> +
>> +When you are switching between IPv4 and IPv6 networks, you need to make sure
>> +that the following options in the `ceph.conf` file are correctly set to `true`
>> +or `false`. These config options configure if Ceph services should bind to IPv4
>> +or IPv6 addresses.
>> +----
>> +ms_bind_ipv4 = true
>> +ms_bind_ipv6 = false
>> +----
>> +
>> +[[pveceph_network_change_public]]
>> +Change the Ceph Public Network
>> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> +
>> +The Ceph Public network is the main communication channel in a Ceph cluster
>> +between the different services and clients (for example, a VM). Changing it to
>> +a different network is not as simple as changing the Ceph Cluster network. The
>> +main reason is that besides the configuration in the `ceph.conf` file, the Ceph
>> +MONs (monitors) have an internal configuration where they keep track of all the
>> +other MONs that are part of the cluster, the 'monmap'.
>> +
>> +Therefore, the procedure to change the Ceph Public network is a bit more
>> +involved:
>> +
>> +1. Change `public_network` in the `ceph.conf` file
>
> This is mentioned in the warning below, but maybe more emphasis could be
> made here to only touch this one value.
>
> Additionally, please use the full path here. There are versions at
> /etc/pve and /etc/ceph and this is the first time in this new section
> where one needs to modify one (even if it is mentioned below in the
> expanded version).
>
>> +2. Restart non MON services: OSDs, MGRs and MDS on one host
>> +3. Wait until Ceph is back to 'Health_OK'
>
> Should be HEALTH_OK instead.
>
>> +4. Verify services are using the new network
>> +5. Continue restarting services on the next host
>> +6. Destroy one MON
>> +7. Recreate MON
>> +8. Wait until Ceph is back to 'Health_OK'
>
> Should be HEALTH_OK instead.
>
>> +9. Continue destroying and recreating MONs
>> +
>> +You first need to edit the `/etc/pve/ceph.conf` file. Change the
>> +`public_network` line to match the new subnet.
>> +
>> +----
>> +public_network = 10.9.9.30/24
>> +----
>> +
>> +WARNING: Do not change the `mon_host` line or any `[mon.HOSTNAME]` sections.
>> +These will be updated automatically when the MONs are destroyed and recreated.
>> +
>> +NOTE: Don't worry if the host bits (for example, the last octet) are set by
>> +default, the netmask in CIDR notation defines the network part.
>> +
>> +After you have changed the network, you need to restart the non MON services in
>> +the cluster for the changes to take effect. Do so one node at a time! To restart all
>> +non MON services on one node, you can use the following commands on that node.
>> +Ceph has `systemd` targets for each type of service.
>> +
>> +[source,bash]
>> +----
>> +systemctl restart ceph-osd.target
>> +systemctl restart ceph-mgr.target
>> +systemctl restart ceph-mds.target
>> +----
>> +NOTE: You will only have MDS' (Metadata Server) if you use CephFS.
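
Two side notes that might be useful for testers. The IPv6 counterpart of the
ping command above would look roughly like this (with the same caveat about
bigger headers that you mentioned):

    ping -6 -M do -s 8952 {target IP}

And since the steps repeatedly say to wait for HEALTH_OK between nodes, a
small loop saves re-running `ceph health` by hand; just a sketch:

    # block until the cluster reports HEALTH_OK again
    while ! ceph health | grep -q '^HEALTH_OK'; do sleep 10; done
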
>> +
>> +NOTE: After the first OSD service got restarted, the GUI will complain that
>> +the OSD is not reachable anymore. This is not an issue,; VMs can still reach
>
> Is the double punctuation here intentional?
>
>> +them. The reason for the message is that the MGR service cannot reach the OSD
>> +anymore. The error will vanish after the MGR services get restarted.
>> +
>> +WARNING: Do not restart OSDs on multiple hosts at the same time. Chances are
>> +that for some PGs (placement groups), 2 out of the (default) 3 replicas will
>> +be down. This will result in I/O being halted until the minimum required number
>> +(`min_size`) of replicas is available again.
>> +
>> +To verify that the services are listening on the new network, you can run the
>> +following command on each node:
>> +
>> +[source,bash]
>> +----
>> +ss -tulpn | grep ceph
>> +----
>> +
>> +NOTE: Since OSDs will also listen on the Ceph Cluster network, expect to see that
>> +network too in the output of `ss -tulpn`.
>> +
>> +Once the Ceph cluster is back in a fully healthy state ('Health_OK'), and the
>
> Same here, HEALTH_OK.
>
>> +services are listening on the new network, continue to restart the services on
>> +the host.
>> +
>> +The last services that need to be moved to the new network are the Ceph MONs
>> +themselves. The easiest way is to destroy and recreate each monitor one by
>> +one. This way, any mention of it in the `ceph.conf` and the monitor internal
>> +`monmap` is handled automatically.
>> +
>> +Destroy the first MON and create it again. Wait a few moments before you
>> +continue on to the next MON in the cluster, and make sure the cluster reports
>> +'Health_OK' before proceeding.
>> +
>> +Once all MONs are recreated, you can verify that any mention of MONs in the
>> +`ceph.conf` file references the new network. That means mainly the `mon_host`
>> +line and the `[mon.HOSTNAME]` sections.
>> +
>> +One final `ss -tulpn | grep ceph` should show that the old network is not used
>> +by any Ceph service anymore.
>> +
>> +[[pveceph_network_change_cluster]]
>> +Change the Ceph Cluster Network
>> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> +
>> +The Ceph Cluster network is used for the replication traffic between the OSDs.
>> +Therefore, it can be beneficial to place it on its own fast physical network.
>> +
>> +The overall procedure is:
>> +
>> +1. Change `cluster_network` in the `ceph.conf` file
>> +2. Restart OSDs on one host
>> +3. Wait until Ceph is back to 'Health_OK'
>> +4. Verify OSDs are using the new network
>> +5. Continue restarting OSDs on the next host
>> +
>> +You first need to edit the `/etc/pve/ceph.conf` file. Change the
>> +`cluster_network` line to match the new subnet.
>> +
>> +----
>> +cluster_network = 10.9.9.30/24
>> +----
>> +
>> +NOTE: Don't worry if the host bits (for example, the last octet) are set by
>> +default; the netmask in CIDR notation defines the network part.
>> +
>> +After you have changed the network, you need to restart the OSDs in the cluster
>> +for the changes to take effect. Do so one node at a time!
>> +To restart all OSDs on one node, you can use the following command on the CLI on
>> +that node:
>> +
>> +[source,bash]
>> +----
>> +systemctl restart ceph-osd.target
>> +----
>> +
>> +WARNING: Do not restart OSDs on multiple hosts at the same time. Chances are
>> +that for some PGs (placement groups), 2 out of the (default) 3 replicas will
>> +be down. This will result in I/O being halted until the minimum required number
>> +(`min_size`) of replicas is available again.
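
Another verification option that might be worth knowing: the OSDs also
register their addresses with the MONs, so you can cross-check the reported
public and cluster addresses of all OSDs from a single node instead of
running `ss` on every host. Just a sketch, the grep pattern is illustrative:

    # the address columns should show the new subnet(s)
    ceph osd dump | grep '^osd\.'
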
>> +
>> +To verify that the OSD services are listening on the new network, you can either
>> +check the *OSD Details -> Network* tab in the *Ceph -> OSD* panel or by running
>> +the following command on the host:
>> +[source,bash]
>> +----
>> +ss -tulpn | grep ceph-osd
>> +----
>> +
>> +NOTE: Since OSDs will also listen on the Ceph Public network, expect to see that
>> +network too in the output of `ss -tulpn`.
>> +
>> +Once the Ceph cluster is back in a fully healthy state ('Health_OK'), and the
>
> Same, should be HEALTH_OK.
>
>> +OSDs are listening on the new network, continue to restart the OSDs on the next
>> +host.
>> +
>> +
>>  [[pve_ceph_mon_and_ts]]
>>  Ceph Monitoring and Troubleshooting
>>  -----------------------------------
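
P.S. for anyone testing the public network procedure: steps 6 and 7 (destroy
and recreate one MON at a time) can be done via the GUI or with the `pveceph`
CLI on the respective node, roughly like this ('pve2' is just a placeholder
for the node whose MON is being rotated):

    # run on the node whose MON should be rotated
    pveceph mon destroy pve2
    pveceph mon create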