public inbox for pbs-devel@lists.proxmox.com
* [pbs-devel] [PATCH proxmox-backup] docs: add note for not using remote storages
@ 2024-06-11  9:30 Dominik Csapak
  2024-06-11  9:42 ` [pbs-devel] applied: " Dietmar Maurer
  2024-06-11 18:05 ` [pbs-devel] " Thomas Lamprecht
  0 siblings, 2 replies; 7+ messages in thread
From: Dominik Csapak @ 2024-06-11  9:30 UTC (permalink / raw)
  To: pbs-devel

such as NFS or SMB. They will not provide the expected performance
and it's better to recommend against them.

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
if we want to discourage users even more, we could also detect it on
datastore creation and put a warning into the task log

also if we ever come around to implementing the 'health' page thomas
wished for, we can put a warning/error there too

 docs/system-requirements.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/docs/system-requirements.rst b/docs/system-requirements.rst
index fb920865..17756b7b 100644
--- a/docs/system-requirements.rst
+++ b/docs/system-requirements.rst
@@ -41,6 +41,9 @@ Recommended Server System Requirements
   * Use only SSDs, for best results
   * If HDDs are used: Using a metadata cache is highly recommended, for example,
     add a ZFS :ref:`special device mirror <local_zfs_special_device>`.
+  * While it's technically possible to use remote storages such as NFS or SMB,
+    the additional latency and overhead drastically reduces performance and it's
+    not recommended to use such a setup.
 
 * Redundant Multi-GBit/s network interface cards (NICs)
 
-- 
2.39.2




* [pbs-devel] applied: [PATCH proxmox-backup] docs: add note for not using remote storages
  2024-06-11  9:30 [pbs-devel] [PATCH proxmox-backup] docs: add note for not using remote storages Dominik Csapak
@ 2024-06-11  9:42 ` Dietmar Maurer
  2024-06-11 18:05 ` [pbs-devel] " Thomas Lamprecht
  1 sibling, 0 replies; 7+ messages in thread
From: Dietmar Maurer @ 2024-06-11  9:42 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Dominik Csapak

> On 11.6.2024 11:30 CEST Dominik Csapak <d.csapak@proxmox.com> wrote:
> 
>  
> such as NFS or SMB. They will not provide the expected performance
> and it's better to recommend against them.
> 
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> if we want to discourage users even more, we could also detect it on
> datastore creation and put a warning into the task log
> 
> also if we ever come around to implementing the 'health' page thomas
> wished for, we can put a warning/error there too
> 
>  docs/system-requirements.rst | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/docs/system-requirements.rst b/docs/system-requirements.rst
> index fb920865..17756b7b 100644
> --- a/docs/system-requirements.rst
> +++ b/docs/system-requirements.rst
> @@ -41,6 +41,9 @@ Recommended Server System Requirements
>    * Use only SSDs, for best results
>    * If HDDs are used: Using a metadata cache is highly recommended, for example,
>      add a ZFS :ref:`special device mirror <local_zfs_special_device>`.
> +  * While it's technically possible to use remote storages such as NFS or SMB,
> +    the additional latency and overhead drastically reduces performance and it's
> +    not recommended to use such a setup.
>  
>  * Redundant Multi-GBit/s network interface cards (NICs)
>  
> -- 
> 2.39.2
> 
> 
> 



* Re: [pbs-devel] [PATCH proxmox-backup] docs: add note for not using remote storages
  2024-06-11  9:30 [pbs-devel] [PATCH proxmox-backup] docs: add note for not using remote storages Dominik Csapak
  2024-06-11  9:42 ` [pbs-devel] applied: " Dietmar Maurer
@ 2024-06-11 18:05 ` Thomas Lamprecht
  2024-06-12  6:39   ` Dominik Csapak
  1 sibling, 1 reply; 7+ messages in thread
From: Thomas Lamprecht @ 2024-06-11 18:05 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Dominik Csapak

This section is a quite central and important one, so I'm being a bit
more nitpicking with it than other content. NFS boxes are still quite
popular, a blanket recommendation against them quite probably won't
help our cause or reduce noise in our getting-help channels.

Dietmar already applied this, so would need a follow-up please.

Am 11/06/2024 um 11:30 schrieb Dominik Csapak:
> such as NFS or SMB. They will not provide the expected performance
> and it's better to recommend against them.

Not so sure about recommending against them as a blanket statement,
the "remote" adjective might be a bit subtle and, e.g., using a local
full flash NVMe storage attached over a 100G link with latency in the µs
surely beats basically any local spinner only storage and probably even
a lot of SATA attached SSD ones.

Also, it can be totally fine to use as second datastore, i.e. in a setup
with a (smaller) datastore backed by (e.g. local) fast storage that is
then periodically synced to a slower remote.

> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> if we want to discourage users even more, we could also detect it on
> datastore creation and put a warning into the task log

I would avoid that, at least not without actually measuring how the
storage performs (which is probably quite prone to errors, or would
require periodic measurements).

> 
> also if we ever come around to implementing the 'health' page thomas
> wished for, we can put a warning/error there too
> 
>  docs/system-requirements.rst | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/docs/system-requirements.rst b/docs/system-requirements.rst
> index fb920865..17756b7b 100644
> --- a/docs/system-requirements.rst
> +++ b/docs/system-requirements.rst
> @@ -41,6 +41,9 @@ Recommended Server System Requirements
>    * Use only SSDs, for best results
>    * If HDDs are used: Using a metadata cache is highly recommended, for example,
>      add a ZFS :ref:`special device mirror <local_zfs_special_device>`.
> +  * While it's technically possible to use remote storages such as NFS or SMB,

Up-front: I first wrote some possible smaller improvements, but then a
replacement (see below); I kept the smaller ones anyway.

Would do s/remote storages/remote storage/

(We use "storages" quite a few times already, but if possible keeping it
singular sounds nicer IMO)

> +    the additional latency and overhead drastically reduces performance and it's

s/additional latency and overhead/additional latency overhead/ ?

or "network overhead"

If it'd stay as is, the "reduces" should be changed to "reduce" ("latency and
overhead" is plural).


> +    not recommended to use such a setup.

The last part would be better off with just:

"... and is not recommended"


But I'd rather reword the whole thing to focus more on what the actual issue is,
i.e., not NFS or SMB/CIFS per se, but if the network accessing them is slow.
Maybe something like:

* Avoid using remote storage, like NFS or SMB/CIFS, connected over a slow
  (< 10 Gbps) and/or high latency (> 1 ms) link. Such a storage can
  dramatically reduce performance and may even negatively impact the
  backup source, e.g. by causing IO hangs.

I pulled the numbers in parentheses out of thin air, but IMO they shouldn't be too far
off from 2024 Slow™, no hard feelings on adapting them though.
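
To illustrate why the latency figure is the one that tends to hurt (the
chunk count below is likewise an assumed round number, not a measurement):
garbage collection and verification have to touch every chunk file, so with
roughly one million chunks on a datastore, an extra 1 ms per metadata round
trip alone already adds

  1,000,000 ops x 1 ms = 1,000 s, i.e. roughly 17 minutes

on top of what the storage itself needs, while the same per-op overhead is
negligible at local-disk latencies.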

>  
>  * Redundant Multi-GBit/s network interface cards (NICs)
>  




* Re: [pbs-devel] [PATCH proxmox-backup] docs: add note for not using remote storages
  2024-06-11 18:05 ` [pbs-devel] " Thomas Lamprecht
@ 2024-06-12  6:39   ` Dominik Csapak
  2024-06-12 15:40     ` Thomas Lamprecht
  0 siblings, 1 reply; 7+ messages in thread
From: Dominik Csapak @ 2024-06-12  6:39 UTC (permalink / raw)
  To: Thomas Lamprecht, Proxmox Backup Server development discussion


On 6/11/24 8:05 PM, Thomas Lamprecht wrote:
> This section is a quite central and important one, so I'm being a bit
> more nitpicking with it than other content. NFS boxes are still quite
> popular, a blanket recommendation against them quite probably won't
> help our cause or reduce noise in our getting-help channels.
> 
> Dietmar already applied this, so would need a follow-up please.

sure

> 
> Am 11/06/2024 um 11:30 schrieb Dominik Csapak:
>> such as NFS or SMB. They will not provide the expected performance
>> and it's better to recommend against them.
> 
> Not so sure about recommending against them as a blanket statement,
> the "remote" adjective might be a bit subtle and, e.g., using a local
> full flash NVMe storage attached over a 100G link with latency in the µs
> surely beats basically any local spinner only storage and probably even
> a lot of SATA attached SSD ones.

well alone the fact of using nfs makes some operations a few magnitudes 
slower. e.g. here creating a datastore locally takes a few
seconds (probably fast due to the page cache) but a locally
mounted nfs (so no network involved) on the same disk takes
a few minutes. so at least some file creation/deletion operations
are some magnitudes slower just by using nfs (though i guess
there are some options/implementations that can influence that
such as async/sync export options)

also a remote SMB share from windows (same physical host though, so
again, no real network) takes ~ a minute for the same operation

so yes, while I generally agree that using remote storage can be fast 
enough, using any of them increases some file operations by a 
significant amount, even when using fast storage and fast network

(i know that datastore creation is not the best benchmark for this,
but shows that there is significant overhead on some operations)
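
as a rough sketch of the kind of comparison i mean, something like the
following stand-alone helper (hypothetical, not part of the proxmox-backup
code base) can be pointed once at a local path and once at the nfs/smb
mount point; it only times the ~65k chunk prefix directory creations that
a datastore init performs:

use std::{env, fs, time::Instant};

fn main() -> std::io::Result<()> {
    // target directory is passed on the command line
    let base = env::args().nth(1).expect("usage: mkdir-bench <target-dir>");
    let chunks = format!("{base}/.chunks");
    fs::create_dir_all(&chunks)?;

    let start = Instant::now();
    // one directory per 16-bit chunk digest prefix, i.e. 0000..ffff
    for i in 0..=0xffffu32 {
        fs::create_dir(format!("{chunks}/{i:04x}"))?;
    }
    println!("created 65536 prefix dirs in {:?}", start.elapsed());
    Ok(())
}

on a sync-exported nfs mount every one of those mkdirs waits for a server
round trip plus stable storage, which is where the minutes come from.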

> 
> Also, it can be totally fine to use as second datastore, i.e. in a setup
> with a (smaller) datastore backed by (e.g. local) fast storage that is
> then periodically synced to a slower remote.
> 
>> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
>> ---
>> if we want to discourage users even more, we could also detect it on
>> datastore creation and put a warning into the task log
> 
> I would avoid that, at least not without actually measuring how the
> storage performs (which is probably quite prone to errors, or would
> require periodic measurements).

fine with me

> 
>>
>> also if we ever come around to implementing the 'health' page thomas
>> wished for, we can put a warning/error there too
>>
>>   docs/system-requirements.rst | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/docs/system-requirements.rst b/docs/system-requirements.rst
>> index fb920865..17756b7b 100644
>> --- a/docs/system-requirements.rst
>> +++ b/docs/system-requirements.rst
>> @@ -41,6 +41,9 @@ Recommended Server System Requirements
>>     * Use only SSDs, for best results
>>     * If HDDs are used: Using a metadata cache is highly recommended, for example,
>>       add a ZFS :ref:`special device mirror <local_zfs_special_device>`.
>> +  * While it's technically possible to use remote storages such as NFS or SMB,
> 
> Up-front: I first wrote some possible smaller improvements, but then a
> replacement (see below); I kept the smaller ones anyway.
> 
> Would do s/remote storages/remote storage/
> 
> (We use "storages" quite a few times already, but if possible keeping it
> singular sounds nicer IMO)

ok

> 
>> +    the additional latency and overhead drastically reduces performance and it's
> 
> s/additional latency and overhead/additional latency overhead/ ?
> 
> or "network overhead"
> 
> If it'd stay as is, the "reduces" should be changed to "reduce" ("latency and
> overhead" is plural).
> 

i meant actually two things here, the network latency and the additional
overhead of the second filesystem layer

> 
>> +    not recommended to use such a setup.
> 
> The last part would be better off with just:
> 
> "... and is not recommended"
> 

agreed, i was on the edge a bit with that wording anyway but just 
leaving it off sounds better.

> 
> But I'd rather reword the whole thing to focus more on what the actual issue is,
> i.e., not NFS or SMB/CIFS per se, but if the network accessing them is slow.
> Maybe something like:
> 
> * Avoid using remote storage, like NFS or SMB/CIFS, connected over a slow
>    (< 10 Gbps) and/or high latency (> 1 ms) link. Such a storage can
>    dramatically reduce performance and may even negatively impact the
>    backup source, e.g. by causing IO hangs.
> 
> I pulled the numbers in parentheses out of thin air, but IMO they shouldn't be too far
> off from 2024 Slow™, no hard feelings on adapting them though.

IMHO i'd not mention any specific numbers at all, unless we actually
benchmarked such a setup. so what about:

* Avoid using remote storage, like NFS or SMB/CIFS, connected over a 
slow and/or high latency link. Such a storage can dramatically reduce 
performance and may even negatively impact the backup source, e.g. by
causing IO hangs. If you want to use such a storage, make sure it
performs as expected by testing it before using it in production.


By adding that additional sentence we hopefully nudge some users
into actually testing before deploying it, instead of then
complaining that it's slow.


> 
>>   
>>   * Redundant Multi-GBit/s network interface cards (NICs)
>>   
> 



* Re: [pbs-devel] [PATCH proxmox-backup] docs: add note for not using remote storages
  2024-06-12  6:39   ` Dominik Csapak
@ 2024-06-12 15:40     ` Thomas Lamprecht
  2024-06-13  8:02       ` Dominik Csapak
  0 siblings, 1 reply; 7+ messages in thread
From: Thomas Lamprecht @ 2024-06-12 15:40 UTC (permalink / raw)
  To: Dominik Csapak, Proxmox Backup Server development discussion

Am 12/06/2024 um 08:39 schrieb Dominik Csapak:
> 
> On 6/11/24 8:05 PM, Thomas Lamprecht wrote:
>> This section is a quite central and important one, so I'm being a bit
>> more nitpicking with it than other content. NFS boxes are still quite
>> popular, a blanket recommendation against them quite probably won't
>> help our cause or reduce noise in our getting-help channels.
>>
>> Dietmar already applied this, so would need a follow-up please.
> 
> sure
> 
>>
>> Am 11/06/2024 um 11:30 schrieb Dominik Csapak:
>>> such as NFS or SMB. They will not provide the expected performance
>>> and it's better to recommend against them.
>>
>> Not so sure about recommending against them as a blanket statement,
>> the "remote" adjective might be a bit subtle and, e.g., using a local
>> full flash NVMe storage attached over a 100G link with latency in the µs
>> surely beats basically any local spinner only storage and probably even
>> a lot of SATA attached SSD ones.
> 
> well alone the fact of using nfs makes some operations a few magnitudes 
> slower. e.g. here creating a datastore locally takes a few
> seconds (probably fast due to the page cache) but a locally
> mounted nfs (so no network involved) on the same disk takes
> a few minutes. so at least some file creation/deletion operations
> are some magnitudes slower just by using nfs (though i guess
> there are some options/implementations that can influence that
> such as async/sync export options)
>
> also a remote SMB share from windows (same physical host though, so
> again, no real network) takes ~ a minute for the same operation
> 
> so yes, while I generally agree that using remote storage can be fast 
> enough, using any of them increases some file operations by a 
> significant amount, even when using fast storage and fast network

Just because there is some overhead (that is the result of a trade-off
to get a parallel/simultaneously accessible FS) doesn't mean that we
should recommend against an FS, which is IMO a bit strange to do
in a system requirement recommendation list anyway (there's a huge
list of things that'd need to get added then here, from not using
USB 1.0 pen drives as backing storage to not sliding strong magnets
over the server).

> 
> (i know that datastore creation is not the best benchmark for this,
> but shows that there is significant overhead on some operations)

Yeah, one creates a datastore only once, and on actual backup there
are at max a few mkdirs, not 65k, so not really relevant here.
Also, just because there's some overhead (allowing simultaneous mounts
doesn't come for free), it doesn't mean that it's actually a problem for
actual backups. As said, a blanket recommendation against a setup that
is already rather frequent is IMO just deterring (future) users.


>>
>> Also, it can be totally fine to use as second datastore, i.e. in a setup
>> with a (smaller) datastore backed by (e.g. local) fast storage that is
>> then periodically synced to a slower remote.
>>
>>> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
>>> ---
>>> if we want to discourage users even more, we could also detect it on
>>> datastore creation and put a warning into the task log
>>
>> I would avoid that, at least not without actually measuring how the
>> storage performs (which is probably quite prone to errors, or would
>> require periodic measurements).
> 
> fine with me
> 
>>
>>>
>>> also if we ever come around to implementing the 'health' page thomas
>>> wished for, we can put a warning/error there too
>>>
>>>   docs/system-requirements.rst | 3 +++
>>>   1 file changed, 3 insertions(+)
>>>
>>> diff --git a/docs/system-requirements.rst b/docs/system-requirements.rst
>>> index fb920865..17756b7b 100644
>>> --- a/docs/system-requirements.rst
>>> +++ b/docs/system-requirements.rst
>>> @@ -41,6 +41,9 @@ Recommended Server System Requirements
>>>     * Use only SSDs, for best results
>>>     * If HDDs are used: Using a metadata cache is highly recommended, for example,
>>>       add a ZFS :ref:`special device mirror <local_zfs_special_device>`.
>>> +  * While it's technically possible to use remote storages such as NFS or SMB,
>>
>> Up-front: I first wrote some possible smaller improvements, but then a
>> replacement (see below); I kept the smaller ones anyway.
>>
>> Would do s/remote storages/remote storage/
>>
>> (We use "storages" quite a few times already, but if possible keeping it
>> singular sounds nicer IMO)
> 
> ok
> 
>>
>>> +    the additional latency and overhead drastically reduces performance and it's
>>
>> s/additional latency and overhead/additional latency overhead/ ?
>>
>> or "network overhead"
>>
>> If it'd stay as is, the "reduces" should be changed to "reduce" ("latency and
>> overhead" is plural).
>>
> 
> i meant actually two things here, the network latency and the additional
> overhead of the second filesystem layer

Then it'd have helped me to avoid mixing a specific overhead (latency) with
a generic mention of the word overhead, like:

"... the added overhead of networking and providing concurrent file system access
drastically reduces performance ..."

But that sounds a bit convoluted, so the best option here might be to just
use "added overhead".


>>
>> But I'd rather reword the whole thing to focus more on what the actual issue is,
>> i.e., not NFS or SMB/CIFS per se, but if the network accessing them is slow.
>> Maybe something like:
>>
>> * Avoid using remote storage, like NFS or SMB/CIFS, connected over a slow
>>    (< 10 Gbps) and/or high latency (> 1 ms) link. Such a storage can
>>    dramatically reduce performance and may even negatively impact the
>>    backup source, e.g. by causing IO hangs.
>>
>> I pulled the numbers in parentheses out of thin air, but IMO they shouldn't be too far
>> off from 2024 Slow™, no hard feelings on adapting them though.
> 
> IMHO i'd not mention any specific numbers at all, unless we actually
> benchmarked such a setup. so what about:

Not sure what numbers from a benchmark would be of use here? One knows what
fast storage can do latency wise and how much bandwidth is a good baseline
– granted, the numbers are not helping for every specific setup, but doing
some benchmark won't change that either.
Anyway, won't matter, see below.

> 
> * Avoid using remote storage, like NFS or SMB/CIFS, connected over a 
> slow and/or high latency link. Such a storage can dramatically reduce 
> performance and may even negatively impact the backup source, e.g. by
> causing IO hangs. If you want to use such a storage, make sure it
> performs as expected by testing it before using it in production.
> 

That starts to get rather convoluted, tbh., the more I think about this,
the more I prefer just reverting the whole thing, I see no gain in
"bashing" NFS/SMB just because they have some overhead.

If anything, we could simply adapt the "Use only SSDs, for best results" point to:

"Prefer fast local storage that delivers high IOPS for random IO workloads; use only enterprise SSDs for best results."

Would be a better fit to convey that fast local storage should be preferred,
especially in a "recommended" (not "recommended against") list.


> 
> By adding that additional sentence we hopefully nudge some users
> into actually testing before deploying it, instead of then
> complaining that it's slow.

If only; from forum and office requests it's quite sensible to assume
that a good amount of users already have their storage box, and they'd
need to already have it to be able to test it in any way, so it's
already too late.

It might be better to describe a setup for how to still be able to use their
existing NFS/SMB/... attached storage in the best way possible. E.g., by
using a fast, small local storage for incoming backups and using the bigger
remote storage only through syncing to it. This has a few benefits besides
getting good performance with existing, slower storage (of any type), like
already having an extra copy of the most recent data.



* Re: [pbs-devel] [PATCH proxmox-backup] docs: add note for not using remote storages
  2024-06-12 15:40     ` Thomas Lamprecht
@ 2024-06-13  8:02       ` Dominik Csapak
  2024-06-17 15:58         ` Thomas Lamprecht
  0 siblings, 1 reply; 7+ messages in thread
From: Dominik Csapak @ 2024-06-13  8:02 UTC (permalink / raw)
  To: Thomas Lamprecht, Proxmox Backup Server development discussion

On 6/12/24 17:40, Thomas Lamprecht wrote:
> Am 12/06/2024 um 08:39 schrieb Dominik Csapak:
>>
>> On 6/11/24 8:05 PM, Thomas Lamprecht wrote:
>>> This section is a quite central and important one, so I'm being a bit
>>> more nitpicking with it than other content. NFS boxes are still quite
>>> popular, a blanket recommendation against them quite probably won't
>>> help our cause or reduce noise in our getting-help channels.
>>>
>>> Dietmar already applied this, so would need a follow-up please.
>>
>> sure
>>
>>>
>>> Am 11/06/2024 um 11:30 schrieb Dominik Csapak:
>>>> such as NFS or SMB. They will not provide the expected performance
>>>> and it's better to recommend against them.
>>>
>>> Not so sure about recommending against them as a blanket statement,
>>> the "remote" adjective might be a bit subtle and, e.g., using a local
>>> full flash NVMe storage attached over a 100G link with latency in the µs
>>> surely beats basically any local spinner only storage and probably even
>>> a lot of SATA attached SSD ones.
>>
>> well alone the fact of using nfs makes some operations a few magnitudes
>> slower. e.g. here creating a datastore locally takes a few
>> seconds (probably fast due to the page cache) but a locally
>> mounted nfs (so no network involved) on the same disk takes
>> a few minutes. so at least some file creation/deletion operations
>> are some magnitudes slower just by using nfs (though i guess
>> there are some options/implementations that can influence that
>> such as async/sync export options)
>>
>> also a remote SMB share from windows (same physical host though, so
>> again, no real network) takes ~ a minute for the same operation
>>
>> so yes, while I generally agree that using remote storage can be fast
>> enough, using any of them increases some file operations by a
>> significant amount, even when using fast storage and fast network
> 
> Just because there is some overhead (that is the result of a trade-off
> to get a parallel/simultaneously accessible FS) doesn't mean that we
> should recommend against an FS, which is IMO a bit strange to do
> in a system requirement recommendation list anyway (there's a huge
> list of things that'd need to get added then here, from not using
> USB 1.0 pen drives as backing storage to not sliding strong magnets
> over the server).

but we already do recommend against using remote storage regularly,
just not in the docs but in the forum. (so do many of our users)

we also recommend against slow storage, but that can also work
depending on the use case/workload/exact setup

> 
>>
>> (i know that datastore creation is not the best benchmark for this,
>> but shows that there is significant overhead on some operations)
> 
> Yeah, one creates a datastore only once, and on actual backup there
> are at max a few mkdirs, not 65k, so not really relevant here.
> Also, just because there's some overhead (allowing simultaneous mounts
> doesn't come for free), it doesn't mean that it's actually a problem for
> actual backups. As said, a blanket recommendation against a setup that
> is already rather frequent is IMO just deterring (future) users.

it's not only datastore creation, also garbage collection and
all operations that have to access many files in succession suffer
from the overhead here.

my point is that using a remote fs (regardless of which) adds so
much overhead that it often turns what would be 'reasonable'
performance locally into 'unreasonably slow', so you'd have to massively
overcompensate for that in hardware. This is possible ofc, but highly
unlikely for the vast majority of users.

> 
> 
>>>
>>> Also, it can be totally fine to use as second datastore, i.e. in a setup
>>> with a (smaller) datastore backed by (e.g. local) fast storage that is
>>> then periodically synced to a slower remote.
>>>
>>>> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
>>>> ---
>>>> if we want to discourage users even more, we could also detect it on
>>>> datastore creation and put a warning into the task log
>>>
>>> I would avoid that, at least not without actually measuring how the
>>> storage performs (which is probably quite prone to errors, or would
>>> require periodic measurements).
>>
>> fine with me
>>
>>>
>>>>
>>>> also if we ever come around to implementing the 'health' page thomas
>>>> wished for, we can put a warning/error there too
>>>>
>>>>    docs/system-requirements.rst | 3 +++
>>>>    1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/docs/system-requirements.rst b/docs/system-requirements.rst
>>>> index fb920865..17756b7b 100644
>>>> --- a/docs/system-requirements.rst
>>>> +++ b/docs/system-requirements.rst
>>>> @@ -41,6 +41,9 @@ Recommended Server System Requirements
>>>>      * Use only SSDs, for best results
>>>>      * If HDDs are used: Using a metadata cache is highly recommended, for example,
>>>>        add a ZFS :ref:`special device mirror <local_zfs_special_device>`.
>>>> +  * While it's technically possible to use remote storages such as NFS or SMB,
>>>
>>> Up-front: I first wrote some possible smaller improvements, but then a
>>> replacement (see below); I kept the smaller ones anyway.
>>>
>>> Would do s/remote storages/remote storage/
>>>
>>> (We use "storages" quite a few times already, but if possible keeping it
>>> singular sounds nicer IMO)
>>
>> ok
>>
>>>
>>>> +    the additional latency and overhead drastically reduces performance and it's
>>>
>>> s/additional latency and overhead/additional latency overhead/ ?
>>>
>>> or "network overhead"
>>>
>>> If it'd stay as is, the "reduces" should be changed to "reduce" ("latency and
>>> overhead" is plural).
>>>
>>
>> i meant actually two things here, the network latency and the additional
>> overhead of the second filesystem layer
> 
> Then it'd have helped me to avoid mixing a specific overhead (latency) with
> a generic mention of the word overhead, like:
> 
> "... the added overhead of networking and providing concurrent file system access
> drastically reduces performance ..."
> 
> But that sounds a bit convoluted, so the best option here might be to just
> use "added overhead".
> 
> 
>>>
>>> But I'd rather reword the whole thing to focus more on what the actual issue is,
>>> i.e., not NFS or SMB/CIFS per se, but if the network accessing them is slow.
>>> Maybe something like:
>>>
>>> * Avoid using remote storage, like NFS or SMB/CIFS, connected over a slow
>>>     (< 10 Gbps) and/or high latency (> 1 ms) link. Such a storage can
>>>     dramatically reduce performance and may even negatively impact the
>>>     backup source, e.g. by causing IO hangs.
>>>
>>> I pulled the numbers in parentheses out of thin air, but IMO they shouldn't be too far
>>> off from 2024 Slow™, no hard feelings on adapting them though.
>>
>> IMHO i'd not mention any specific numbers at all, unless we actually
>> benchmarked such a setup. so what about:
> 
> Not sure what numbers from a benchmark would be of use here? One knows what
> fast storage can do latency wise and how much bandwidth is a good baseline
> – granted, the numbers are not helping for every specific setup, but doing
> some benchmark won't change that either.
> Anyway, won't matter, see below.
> 
>>
>> * Avoid using remote storage, like NFS or SMB/CIFS, connected over a
>> slow and/or high latency link. Such a storage can dramatically reduce
>> performance and may even negatively impact the backup source, e.g. by
>> causing IO hangs. If you want to use such a storage, make sure it
>> performs as expected by testing it before using it in production.
>>
> 
> That starts to get rather convoluted, tbh., the more I think about this,
> the more I prefer just reverting the whole thing, I see no gain in
> "bashing" NFS/SMB just because they have some overhead.
> 
> If anything, we could simply adapt the "Use only SSDs, for best results" point to:
> 
> "Prefer fast local storage that delivers high IOPS for random IO workloads; use only enterprise SSDs for best results."
> 
> Would be a better fit to convey that fast local storage should be preferred,
> especially in a "recommended" (not "recommended against") list.
> 
> 
>>
>> By adding that additional sentence we hopefully nudge some users
>> into actually testing before deploying it, instead of then
>> complaining that it's slow.
> 
> If only; from forum and office requests it's quite sensible to assume
> that a good amount of users already have their storage box, and they'd
> need to already have it to be able to test it in any way, so it's
> already too late.
> 
> It might be better to describe a setup for how to still be able to use their
> existing NFS/SMB/... attached storage in the best way possible. E.g., by
> using a fast, small local storage for incoming backups and using the bigger
> remote storage only through syncing to it. This has a few benefits besides
> getting good performance with existing, slower storage (of any type), like
> already having an extra copy of the most recent data.

ultimately it's your call, but personally i'd prefer a broad statement
that deters users from using a suboptimal setup in the first place
over not mentioning it at all in the official docs and explaining
every week in the forums that it's a bad idea

this is the same as recommending fast disk, as one can use slow disks
in some (small) setups successfully without problems, but it does not
scale properly so we recommend against it. for remote storage,
the vast majority of users probably won't invest in a super
high performance nas/san box so recommending against using those
is worth mentioning in the docs IMHO

it does not have to be in the system requirements though, we could
also put a longer explanation in e.g. the FAQ or datastore section.
i just put it in the system requirements because we call out
slow disks there too and i guessed this is one of the more
read sections.



* Re: [pbs-devel] [PATCH proxmox-backup] docs: add note for not using remote storages
  2024-06-13  8:02       ` Dominik Csapak
@ 2024-06-17 15:58         ` Thomas Lamprecht
  0 siblings, 0 replies; 7+ messages in thread
From: Thomas Lamprecht @ 2024-06-17 15:58 UTC (permalink / raw)
  To: Dominik Csapak, Proxmox Backup Server development discussion

Am 13/06/2024 um 10:02 schrieb Dominik Csapak:
> On 6/12/24 17:40, Thomas Lamprecht wrote:
> but we already do recommend against using remote storage regularly,
> just not in the docs but in the forum. (so do many of our users)
> 
> we also recommend against slow storage, but that can also work
> depending on the use case/workload/exact setup

If a user complains it's safe to assume that it's too slow for their
use case, otherwise they would not be in the forum.

It's also OK to tell users that their storage is too slow and a local
storage with some SSDs might be a (relatively) cheap alternative to
address that, especially in the previously mentioned combination where
a small and fast local storage is used for incoming backups while still
using the remote storage to sync a longer history of backups too.

Both have nothing to do with a blanket recommendation against remote
storage, i.e., one made without looking at the actual setup closely, and I
hope such blanket statements are not currently made frequently without
context.

>>>
>>> (i know that datastore creation is not the best benchmark for this,
>>> but shows that there is significant overhead on some operations)
>>
>> Yeah, one creates a datastore only once, and on actual backup there
>> are at max a few mkdirs, not 65k, so not really relevant here.
>> Also, just because there's some overhead (allowing simultaneous mounts
>> doesn't come for free), it doesn't mean that it's actually a problem for
>> actual backups. As said, a blanket recommendation against a setup that
>> is already rather frequent is IMO just deterring (future) users.
> 
> it's not only datastore creation, also garbage collection and
> all operations that have to access many files in succession suffer
> from the overhead here.
> 
> my point is that using a remote fs (regardless of which) adds so
> much overhead that it often turns what would be 'reasonable'
> performance locally into 'unreasonably slow', so you'd have to massively
> overcompensate for that in hardware. This is possible ofc, but highly
> unlikely for the vast majority of users.
> 

That a storage being remote makes it unusably slow for PBS by definition is
just not true (see next paragraph of my reply for expanding on that).

>>
>> If only; from forum and office requests it's quite sensible to assume
>> that a good amount of users already have their storage box, and they'd
>> need to already have it to be able to test it in any way, so it's
>> already too late.
>>
>> It might be better to describe a setup for how to still be able to use their
>> existing NFS/SMB/... attached storage in the best way possible. E.g., by
>> using a fast, small local storage for incoming backups and using the bigger
>> remote storage only through syncing to it. This has a few benefits besides
>> getting good performance with existing, slower storage (of any type), like
>> already having an extra copy of the most recent data.
> 
> ultimately it's your call, but personally i'd prefer a broad statement
> that deters users from using a suboptimal setup in the first place
> over not mentioning it at all in the official docs and explaining
> every week in the forums that it's a bad idea

Again, just because a storage is remote does *not* mean that it has to
be too slow to be used. I.e., just because there is _some_ overhead it does
*not* mean that it will make the storage unusable. Ceph, e.g., is a remote
storage that can be made plenty fast, as our own benchmark papers show,
and some users in huge environments even have to use it for backups as nothing
else can scale to their amount of data and performance.
Or take Blockbridge, they're providing fast remote storage through NVMe over
TCP.

So by counterexample, including our *own* benchmarks, I think we really
can establish as a fact that there can be remote storage setups that are fast,
and I do not see any point in arguing that further.

> 
> this is the same as recommending fast disk, as one can use slow disks
> in some (small) setups successfully without problems, but it does not
> scale properly so we recommend against it. for remote storage,

It really isn't; recommending fast local storage in a recommended
system specs section is not the same as recommending against remote storage.

> the vast majority of users probably won't invest in a super
> high performance nas/san box so recommending against using those
> is worth mentioning in the docs IMHO

As mentioned in my last reply, with that logic we'd have thousands of things
to recommend against: lots of old/low-power HW, some USB HW (while other,
nicer ones can be totally fine), ... this would blow up the section so much
over time that almost nobody would read it to completion, not really
helping such annoying cases in the forum or other channels (those cannot
really be fixed by just adding a bullet point; IME users are even
encouraged to go further in the wrong direction if the argumentation isn't
sound (and sometimes even then..)).

> 
> it does not have to be in the system requirements though, we could
> also put a longer explanation in e.g. the FAQ or datastore section.
> i just put it in the system requirements because we call out
> slow disks there too and i guessed this is one of the more
> read sections.
> 

I reworked the system requirements part according to my previous proposal, which
fits the style of recommending for things, not against, and tells the user what's
actually important, not some possible correlation to that.

https://git.proxmox.com/?p=proxmox-backup.git;a=commitdiff;h=5c15fb97b4d507c2f60428b3dba376bdbfadf116

This is getting long again, so only a short draft that would need some
more thought and expansion, but an IMO better help than recommending against
such things would be to provide a CLI command that allows users to test some
basic throughput and access times (e.g. with cold/flushed FS cache) and
use these measurements to extrapolate on some GC/Verify examples that try to
mirror some real-world smaller/medium/big setups.
While naturally still not perfect, it would tell the user much more to see
that a workload with, e.g., 30 VMs (backup groups), each with say ~100 GB of
space usage and 10 snapshots per backup group, would need roughly X time
for a GC and Y time for a verification of all data. Surely quite a bit more
complex to do sanely, but something like that would IMO be *much* more helpful.
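
To make the extrapolation idea a bit more concrete, a very rough sketch of
the kind of estimate such a command could print (chunk size, dedup factor
and the per-operation latencies are made-up illustration values, and none
of this exists as an actual proxmox-backup interface):

fn main() {
    let groups = 30u64;               // e.g. 30 VMs / backup groups
    let snapshots_per_group = 10u64;
    let data_per_snapshot_gib = 100u64;
    let avg_chunk_mib = 4u64;         // assumed average chunk size
    let dedup_factor = 5.0_f64;       // assumed dedup across snapshots

    let referenced_chunks =
        groups * snapshots_per_group * data_per_snapshot_gib * 1024 / avg_chunk_mib;
    let unique_chunks = (referenced_chunks as f64 / dedup_factor) as u64;

    // assumed per-chunk metadata latency, e.g. taken from a cold-cache probe
    for (storage, per_op_ms) in [("local SSD", 0.1_f64), ("remote mount", 1.5)] {
        let gc_sweep_min = unique_chunks as f64 * per_op_ms / 1000.0 / 60.0;
        println!("{storage}: ~{unique_chunks} unique chunks, GC sweep roughly {gc_sweep_min:.0} min");
    }
}

The absolute numbers matter less than showing the user how the same chunk
count behaves under their measured per-operation latency.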


