* [PVE-User] Thin LVM showing more used space than expected
@ 2022-12-27 17:54 Óscar de Arriba
2022-12-28 10:52 ` Alain Péan
0 siblings, 1 reply; 10+ messages in thread
From: Óscar de Arriba @ 2022-12-27 17:54 UTC (permalink / raw)
To: pve-user
Hello all,
Since about a week ago, the data LVM on one of my Proxmox nodes has been doing strange things.
For storage, I'm using a consumer Crucial MX500 SATA SSD connected directly to the motherboard controller (no PCIe HBA for the system+data disk). It is brand new, S.M.A.R.T. checks are passing, and it shows only 4% wearout. I have set up Proxmox inside a cluster with LVM and make backups to an external NFS location.
Last week I tried to migrate a stopped VM of ~64 GiB from one server to another, and found that *the SSD started to underperform (~5 MB/s) after roughly 55 GiB copied* (this pattern repeated several times).
It was so bad that *even after cancelling the migration, the SSD stayed busy writing at that speed and I needed to reboot the instance, as it was completely unusable* (it is in my homelab, not running mission-critical workloads, so that was acceptable). After the reboot, I could remove the half-copied VM disk.
After that (and several retries, even making a backup to external storage and trying to restore it, in case the bottleneck was in the migration process), I ended up creating the instance from scratch and migrating data from one VM to the other - so the VM was created brand new and no bottleneck was hit.
The problem is that *the pve/data logical volume is now showing 377 GiB used, but the total size of the stored VM disks (even if they were 100% provisioned) is 168 GiB*. I checked, and neither VM has snapshots.
I don't know if rebooting while writing to the disk (always having cancelled the migration first) damaged the LV in some way, but thinking about it, it does not even make sense that an SSD of this type ends up writing at 5 MB/s, even with the write cache full. It should write far faster than that even without the cache.
Some information about the storage:
root@venom:~# lvs -a
  LV              VG  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  data            pve twi-aotz-- 377.55g             96.13  1.54
  [data_tdata]    pve Twi-ao---- 377.55g
  [data_tmeta]    pve ewi-ao----  <3.86g
  [lvol0_pmspare] pve ewi-------  <3.86g
  root            pve -wi-ao----  60.00g
  swap            pve -wi-ao----   4.00g
  vm-150-disk-0   pve Vwi-a-tz--   4.00m data        14.06
  vm-150-disk-1   pve Vwi-a-tz-- 128.00g data        100.00
  vm-201-disk-0   pve Vwi-aotz--   4.00m data        14.06
  vm-201-disk-1   pve Vwi-aotz--  40.00g data        71.51
This can also be seen in the forum post I made a couple of days ago: https://forum.proxmox.com/threads/thin-lvm-showing-more-used-space-than-expected.120051/
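As a rough cross-check of the numbers above (just arithmetic on the lvs output; the -o field list below is only one possible way to print the relevant columns):

root@venom:~# lvs --units g -o lv_name,lv_size,data_percent,pool_lv pve
# Each thin volume's real allocation is roughly lv_size * data_percent / 100:
#   vm-150-disk-1: 128.00g * 100.00% = 128.0 GiB
#   vm-201-disk-1:  40.00g *  71.51% ~  28.6 GiB
# Sum of the thin volumes ~ 157 GiB, yet the pool itself reports
#   377.55g * 96.13% ~ 363 GiB used.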
Any ideas aside from doing a backup and reinstall from scratch?
Thanks in advance!
* Re: [PVE-User] Thin LVM showing more used space than expected
2022-12-27 17:54 [PVE-User] Thin LVM showing more used space than expected Óscar de Arriba
@ 2022-12-28 10:52 ` Alain Péan
2022-12-28 11:19 ` Óscar de Arriba
0 siblings, 1 reply; 10+ messages in thread
From: Alain Péan @ 2022-12-28 10:52 UTC (permalink / raw)
To: Proxmox VE user list, Óscar de Arriba
On 27/12/2022 at 18:54, Óscar de Arriba wrote:
> For storage, I'm using a commercial Crucial MX500 SATA SSD connected directly to the motherboard controller (no PCIe HBA for the system+data disk) and it is brand new - and S.M.A.R.T. checks are passing, only 4% of wearout.
Hi Oscar,
Just to be sure, because normally wearout is at 100% when the SSD is new:
are you just subtracting, so it is in fact 100 - 4 = 96%?
My SSDs (Dell mixed use) after some years are still at 99%, so I am
wondering about 4%...
Alain
--
System/Network Administrator
C2N Centre de Nanosciences et Nanotechnologies (UMR 9001)
Boulevard Thomas Gobert (ex Avenue de La Vauve), 91120 Palaiseau
Tel : 01-70-27-06-88 Bureau A255
* Re: [PVE-User] Thin LVM showing more used space than expected
2022-12-28 10:52 ` Alain Péan
@ 2022-12-28 11:19 ` Óscar de Arriba
2022-12-28 12:17 ` Alain Péan
0 siblings, 1 reply; 10+ messages in thread
From: Óscar de Arriba @ 2022-12-28 11:19 UTC (permalink / raw)
To: Proxmox VE user list, alain.pean
Hi Alain,
Thanks for taking time to answer my message.
I think the Proxmox UI is showing the % of wearout consumed. I just checked SMART using smartctl and it shows 2.86 TB written out of this model's 180 TBW rating (6%).
I think those numbers are high for the usage this drive gets, but the number of power-on hours matches (52 days). I think the TBW is elevated because we had an instance with swap enabled, which could generate a lot of IO (that has no longer been the case since a couple of weeks ago).
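For reference, the raw numbers can be pulled with smartctl; the attribute names below are the ones Crucial/Micron drives usually expose, so they may differ on other models:

root@venom:~# smartctl -A /dev/sdj | grep -Ei 'Percent_Lifetime|Total_LBAs_Written|Power_On_Hours'
# Total bytes written ~ Total_LBAs_Written * 512; dividing by the drive's
# rated endurance (180 TBW here) gives the consumed-wearout percentage.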
However, the strange behaviour of showing much more space used than the sum of VM disks + snapshots continues, and I'm really worried that the performance issue after copying some data comes from that situation. Also, the volume is now showing 96% of space used, which worries me regarding degraded performance due to fragmentation.
Oscar
On Wed, Dec 28, 2022, at 11:52, Alain Péan wrote:
> On 27/12/2022 at 18:54, Óscar de Arriba wrote:
> > For storage, I'm using a commercial Crucial MX500 SATA SSD connected directly to the motherboard controller (no PCIe HBA for the system+data disk) and it is brand new - and S.M.A.R.T. checks are passing, only 4% of wearout.
>
> Hi Oscar,
>
> Just to be sure, because normally wearout is at 100% when the SSD is new:
> are you just subtracting, so it is in fact 100 - 4 = 96%?
> My SSDs (Dell mixed use) after some years are still at 99%, so I am
> wondering about 4%...
>
> Alain
>
> --
> System/Network Administrator
> C2N Centre de Nanosciences et Nanotechnologies (UMR 9001)
> Boulevard Thomas Gobert (ex Avenue de La Vauve), 91120 Palaiseau
> Tel : 01-70-27-06-88 Bureau A255
>
>
* Re: [PVE-User] Thin LVM showing more used space than expected
2022-12-28 11:19 ` Óscar de Arriba
@ 2022-12-28 12:17 ` Alain Péan
2022-12-28 18:22 ` Óscar de Arriba
0 siblings, 1 reply; 10+ messages in thread
From: Alain Péan @ 2022-12-28 12:17 UTC (permalink / raw)
To: Óscar de Arriba, Proxmox VE user list
On 28/12/2022 at 12:19, Óscar de Arriba wrote:
> I think Proxmox UI is showing the % of wearout consumed.
In my case, with Dell servers, the UI in fact does not show anything
(N/A) when the RAID storage volume is managed by the RAID controller.
In that case, I use Dell OMSA (OpenManage Server Administrator) to
display the values.
But I have another cluster with Ceph, and indeed, it displays 0% as
wearout. So I think you are right.
I saw that these are Crucial SATA SSDs directly attached to the
motherboard. What kind of filesystem do you have on these SSDs? Can you
run pveperf on /dev/mapper/pve-root to see what the performance is?
Alain
--
System/Network Administrator
C2N Centre de Nanosciences et Nanotechnologies (UMR 9001)
Boulevard Thomas Gobert (ex Avenue de La Vauve), 91120 Palaiseau
Tel : 01-70-27-06-88 Bureau A255
* Re: [PVE-User] Thin LVM showing more used space than expected
2022-12-28 12:17 ` Alain Péan
@ 2022-12-28 18:22 ` Óscar de Arriba
0 siblings, 0 replies; 10+ messages in thread
From: Óscar de Arriba @ 2022-12-28 18:22 UTC (permalink / raw)
To: Alain Péan, Proxmox VE user list
> I saw that these are Crucial SATA SSDs directly attached to the motherboard. What kind of filesystem do you have on these SSDs? Can you run pveperf on /dev/mapper/pve-root to see what the performance is?
It is using LVM with ext4 for the root filesystem and the data storage is using LVM-Thin.
root@venom:~# blkid
/dev/sdj2: UUID="7B86-9E58" BLOCK_SIZE="512" TYPE="vfat" PARTUUID="f4324ec9-c95e-4963-9ea9-5026f8f3fcae"
/dev/sdj3: UUID="16ioTj-mei2-pqZI-bJWU-myRs-AF5a-Pfw03x" TYPE="LVM2_member" PARTUUID="d2528e8f-7958-4dc1-9960-f67999b75058"
/dev/mapper/pve-swap: UUID="0fbe15d8-7823-42bc-891c-c131407921c7" TYPE="swap"
/dev/mapper/pve-root: UUID="6bef8c06-b480-409c-8fa0-076344c9108d" BLOCK_SIZE="4096" TYPE="ext4"
/dev/sdj1: PARTUUID="70bb576f-ab3a-4867-ab2e-e9a7c3fb5a15"
/dev/mapper/pve-vm--150--disk--1: PTUUID="90e3bde4-d85c-46cb-a4b9-799c99e340c6" PTTYPE="gpt"
/dev/mapper/pve-vm--201--disk--1: PTUUID="cb44eeb1-db0d-4d42-8a14-05077231b097" PTTYPE="gpt"
root@venom:~# lvs
  LV            VG  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  data          pve twi-aotz-- 377.55g             96.60  1.55
  root          pve -wi-ao----  60.00g
  swap          pve -wi-ao----   4.00g
  vm-150-disk-0 pve Vwi-a-tz--   4.00m data        14.06
  vm-150-disk-1 pve Vwi-a-tz-- 128.00g data        100.00
  vm-201-disk-0 pve Vwi-aotz--   4.00m data        14.06
  vm-201-disk-1 pve Vwi-aotz--  40.00g data        75.89
Regarding pveperf:
root@venom:~# pveperf /dev/mapper/pve-root
CPU BOGOMIPS: 211008.96
REGEX/SECOND: 1983864
HD SIZE: 58.76 GB (/dev/mapper/pve-root)
BUFFERED READS: 338.10 MB/sec
AVERAGE SEEK TIME: 0.09 ms
open failed: Not a directory
root@venom:~# pveperf ~/
CPU BOGOMIPS: 211008.96
REGEX/SECOND: 2067874
HD SIZE: 58.76 GB (/dev/mapper/pve-root)
BUFFERED READS: 337.51 MB/sec
AVERAGE SEEK TIME: 0.09 ms
FSYNCS/SECOND: 679.87
DNS EXT: 128.22 ms
DNS INT: 127.51 ms
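(Side note on the first run above: pveperf benchmarks a directory, so its fsync test fails with "open failed: Not a directory" when pointed at the raw /dev/mapper/pve-root device. Running it against a mount point on that filesystem, e.g.:

root@venom:~# pveperf /

gives the full output, as the second run shows.)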
Thanks,
Oscar
* Re: [PVE-User] Thin LVM showing more used space than expected
2022-12-29 16:58 ` Óscar de Arriba
2022-12-30 9:52 ` Tom Weber
@ 2023-01-02 9:56 ` Óscar de Arriba
1 sibling, 0 replies; 10+ messages in thread
From: Óscar de Arriba @ 2023-01-02 9:56 UTC (permalink / raw)
To: Martin Holub, pve-user
I ended up reinstalling the server, this time using ZFS. It solved all the issues - the ghost used space is gone (the space is back) and I could even restore a VM three times bigger without any slowdown.
Not sure what happened, but with all that space occupied, my best guess is that during the restoration of VMs LVM was running out of blocks to use and needed to do some cleanup, which led to that slow performance at ~5 MB/s. However, it is strange that I could not find anyone else facing this issue.
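For anyone hitting the same thing before resorting to a reinstall, a possible way to check the thin pool's metadata for this kind of inconsistency (a sketch only, assuming the pve/data layout from earlier in the thread; the pool and all its thin volumes have to be inactive first):

# Deactivate the pool, then let LVM run thin_check/thin_repair on its metadata;
# repaired metadata is swapped in from the spare and the old metadata is kept
# as a backup LV.
lvchange -an pve/data
lvconvert --repair pve/data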
On Thu, Dec 29, 2022, at 17:58, Óscar de Arriba wrote:
> Update: I have enabled the `Discard` option on all disks of the VMs on that server, and then `fstrim` did the work and freed some space.
>
> However, even after removing all VMs except one (which is hard to remove without disruption), I can see that:
>
> root@venom:~# lvs
>   LV            VG  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
>   data          pve twi-aotz-- 377.55g             60.65  0.67
>   root          pve -wi-ao----  60.00g
>   swap          pve -wi-ao----   4.00g
>   vm-201-disk-0 pve Vwi-aotz--   4.00m data        14.06
>   vm-201-disk-1 pve Vwi-aotz--  40.00g data        56.58
>
> Which means that I have about 200 GB used out of nowhere :( At least it is no longer under pressure of being almost 100% full.
>
> On Thu, Dec 29, 2022, at 11:48, Óscar de Arriba wrote:
>> Any idea why it still shows 96.23% of space used while the VMs are using far less? I'm starting to worry a lot about it (I don't want it to become really full) and my only remaining hope is backup + reinstall of PVE.
>>
>> Thanks,
>> Oscar
>>
>> On Thu, Dec 29, 2022, at 11:01, Martin Holub wrote:
>>>
>>>
>>> On 28.12.2022 at 12:44, Óscar de Arriba wrote:
>>>> Hi Martin,
>>>>
>>>> > Did you try to run fstrim on the VMs to regain the allocated space? At least on Linux, something like "fstrim -av" should do the trick.
>>>>
>>>> I did it now and it freed ~55 GiB on a running instance (the one with 128 GiB allocated). However, that should only free blocks of the LV used to store that VM's disk, right? And the issue itself is that the sum of the maximum allocations of those disks is much lower than the space occupied.
>>>>
>>>> I also have the feeling that those blocks remain used by LVs that no longer exist, but I don't know how to fix it.
>>>>
>>>> Should I also enable trim/execute trim on Proxmox itself?
>>>>
>>>> Oscar
>>>>
>>>
>>>
>>>
>>>
>>> Hi,
>>>
>>> TRIM only works at the filesystem level, so you can't trim a VG or similar. On the PVE host I doubt it will help, but it wouldn't harm either.
>>>
>>> hth
>>> Martin
>>>
>>>
>>
>
* Re: [PVE-User] Thin LVM showing more used space than expected
2022-12-29 16:58 ` Óscar de Arriba
@ 2022-12-30 9:52 ` Tom Weber
2023-01-02 9:56 ` Óscar de Arriba
1 sibling, 0 replies; 10+ messages in thread
From: Tom Weber @ 2022-12-30 9:52 UTC (permalink / raw)
To: pve-user
On 29.12.22 at 17:58, Óscar de Arriba wrote:
> Update: I have enabled the `Discard` option on all disks of the VMs on that server, and then `fstrim` did the work and freed some space.
>
> However, even after removing all VMs except one (which is hard to remove without disruption), I can see that:
>
> root@venom:~# lvs
>   LV            VG  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
>   data          pve twi-aotz-- 377.55g             60.65  0.67
>   root          pve -wi-ao----  60.00g
>   swap          pve -wi-ao----   4.00g
>   vm-201-disk-0 pve Vwi-aotz--   4.00m data        14.06
>   vm-201-disk-1 pve Vwi-aotz--  40.00g data        56.58
>
> Which means that I have about 200 GB used out of nowhere :( At least it is no longer under pressure of being almost 100% full.
>
you might want to try lvs -a
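For reference, lvs -a also lists the thin pool's hidden internal volumes, which plain lvs omits - one way to print them with the relevant columns:

lvs -a -o lv_name,lv_attr,lv_size,data_percent,metadata_percent pve
# The bracketed entries ([data_tdata], [data_tmeta], [lvol0_pmspare]) are the
# pool's internal data device, metadata device and spare metadata area.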
Tom
* Re: [PVE-User] Thin LVM showing more used space than expected
2022-12-29 10:48 ` Óscar de Arriba
@ 2022-12-29 16:58 ` Óscar de Arriba
2022-12-30 9:52 ` Tom Weber
2023-01-02 9:56 ` Óscar de Arriba
0 siblings, 2 replies; 10+ messages in thread
From: Óscar de Arriba @ 2022-12-29 16:58 UTC (permalink / raw)
To: Martin Holub, pve-user
Update: I have enabled the `Discard` option on all disks of the VMs on that server, and then `fstrim` did the work and freed some space.
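In case it is useful to others, a sketch of those two steps (the scsi0 disk slot and the local-lvm storage name are assumptions here - adjust them to the actual VM config):

# 1. Let the guest's TRIM/discard requests reach the thin pool:
qm set 201 --scsi0 local-lvm:vm-201-disk-1,discard=on
# 2. Inside the guest, release unused blocks on all mounted filesystems:
fstrim -av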
However, even after removing all VMs except one (which is hard to remove without disruption), I can see that:
root@venom:~# lvs
  LV            VG  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  data          pve twi-aotz-- 377.55g             60.65  0.67
  root          pve -wi-ao----  60.00g
  swap          pve -wi-ao----   4.00g
  vm-201-disk-0 pve Vwi-aotz--   4.00m data        14.06
  vm-201-disk-1 pve Vwi-aotz--  40.00g data        56.58
Which means that I have about 200 GB used out of nowhere :( At least it is no longer under pressure of being almost 100% full.
On Thu, Dec 29, 2022, at 11:48, Óscar de Arriba wrote:
> Any idea why it still shows 96.23% of space used while the VMs are using far less? I'm starting to worry a lot about it (I don't want it to become really full) and my only remaining hope is backup + reinstall of PVE.
>
> Thanks,
> Oscar
>
> On Thu, Dec 29, 2022, at 11:01, Martin Holub wrote:
>>
>>
>> On 28.12.2022 at 12:44, Óscar de Arriba wrote:
>>> Hi Martin,
>>>
>>> > Did you try to run fstrim on the VMs to regain the allocated space? At least on Linux, something like "fstrim -av" should do the trick.
>>>
>>> I did it now and it freed ~55 GiB on a running instance (the one with 128 GiB allocated). However, that should only free blocks of the LV used to store that VM's disk, right? And the issue itself is that the sum of the maximum allocations of those disks is much lower than the space occupied.
>>>
>>> I also have the feeling that those blocks remain used by LVs that no longer exist, but I don't know how to fix it.
>>>
>>> Should I also enable trim/execute trim on Proxmox itself?
>>>
>>> Oscar
>>>
>>
>>
>>
>>
>> Hi,
>>
>> TRIM only works at the filesystem level, so you can't trim a VG or similar. On the PVE host I doubt it will help, but it wouldn't harm either.
>>
>> hth
>> Martin
>>
>>
>
* Re: [PVE-User] Thin LVM showing more used space than expected
[not found] ` <b462e244-86a1-eada-c50b-4361f037dc1e@holub.co.at>
@ 2022-12-29 10:48 ` Óscar de Arriba
2022-12-29 16:58 ` Óscar de Arriba
0 siblings, 1 reply; 10+ messages in thread
From: Óscar de Arriba @ 2022-12-29 10:48 UTC (permalink / raw)
To: Martin Holub, pve-user
Any idea why it still shows 96.23% of space used while the VMs are using far less? I'm starting to worry a lot about it (I don't want it to become really full) and my only remaining hope is backup + reinstall of PVE.
Thanks,
Oscar
On Thu, Dec 29, 2022, at 11:01, Martin Holub wrote:
>
>
> On 28.12.2022 at 12:44, Óscar de Arriba wrote:
>> Hi Martin,
>>
>> > Did you try to run fstrim on the VMs to regain the allocated space? At least on Linux, something like "fstrim -av" should do the trick.
>>
>> I did it now and it freed ~55 GiB on a running instance (the one with 128 GiB allocated). However, that should only free blocks of the LV used to store that VM's disk, right? And the issue itself is that the sum of the maximum allocations of those disks is much lower than the space occupied.
>>
>> I also have the feeling that those blocks remain used by LVs that no longer exist, but I don't know how to fix it.
>>
>> Should I also enable trim/execute trim on Proxmox itself?
>>
>> Oscar
>>
>
>
>
> Hi,
>
> TRIM only works at the filesystem level, so you can't trim a VG or similar. On the PVE host I doubt it will help, but it wouldn't harm either.
>
> hth
> Martin
>
* Re: [PVE-User] Thin LVM showing more used space than expected
[not found] <mailman.3.1672225201.17323.pve-user@lists.proxmox.com>
@ 2022-12-28 11:44 ` Óscar de Arriba
[not found] ` <b462e244-86a1-eada-c50b-4361f037dc1e@holub.co.at>
0 siblings, 1 reply; 10+ messages in thread
From: Óscar de Arriba @ 2022-12-28 11:44 UTC (permalink / raw)
To: pve-user, martin
Hi Martin,
> Did you try to run fstrim on the VMs to regain the allocated space? At least on Linux, something like "fstrim -av" should do the trick.
I did it now and it freed ~55 GiB on a running instance (the one with 128 GiB allocated). However, that should only free blocks of the LV used to store that VM's disk, right? And the issue itself is that the sum of the maximum allocations of those disks is much lower than the space occupied.
I also have the feeling that those blocks remain used by LVs that no longer exist, but I don't know how to fix it.
Should I also enable trim/execute trim on Proxmox itself?
Oscar