From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <416be01d-0e5f-02eb-0af8-7c6e1896d095@proxmox.com>
Date: Fri, 25 Mar 2022 18:59:00 +0100
From: Thomas Lamprecht
To: Daniel Tschlatscher, PVE development discussion
References: <20220324154443.2442430-1-d.tschlatscher@proxmox.com>
 <5a02e9a6-fe05-129a-a4bf-9a29a8efa5d4@proxmox.com>
 <8af5712b-d33d-ffb7-ccd5-5dda8973822a@proxmox.com>
 <4095c73e-9aa2-41c4-21f6-8aa4fb05c591@proxmox.com>
In-Reply-To: <4095c73e-9aa2-41c4-21f6-8aa4fb05c591@proxmox.com>
Subject: Re: [pve-devel] [PATCH pve-qemu] fix: 3865: backup restore human readable output

On 25.03.22 13:39, Daniel Tschlatscher wrote:
> On 3/25/22 12:05, Thomas Lamprecht wrote:
>>>>> ++
>>>>> ++               if (delta != 0)
>>>>> ++                   bps /= delta;
>>>> currently, if delta were null you'd print a size unit as the
>>>> throughput unit?
>>>>
>>> This is due to the underlying code. The function that wraps this part
>>> of the code is called multiple times every second (something like
>>> ~20 times/sec), and in the original code a print statement would only
>>> occur after the percentage of bytes transferred had increased by at
>>> least 1%.
>>> The time elapsed between an increase in percent could be less than a
>>> second. In this case the output simply prints how many bytes were
>>> transferred since this second began. This means the value for
>>> throughput would, at the time of the print, always be lower than the
>>> real-world B/s, and only the last print in that second would
>>> represent the actual B/s.
>> This just doesn't make sense to me, the if clause already enforces
>> that it's called at most every 2s (or at the 100% edge case), and
>> using wrong units is using wrong units, you cannot really argue basic
>> math/physics away..
>>
> The 100% edge case is the problem for the most part.

You could also ignore that in the loop and print a 100% done afterwards;
rethinking this makes the current approach to handling that look rather
odd anyhow.

>
> It comes down to the inaccuracy of keeping time in seconds rather than
> something more granular. While it is true that in 0 time no actual
> data could be transferred, 0 in this case means anywhere from 0 to
> 0.999 seconds. Delta could be zero while the transferred bytes, for
> example, count 20MB.

yes, as mentioned, using something more granular improves a lot of
things..

>
> Just printing 0 would be confusing because *some* data was in fact
> transferred -> reflected in the bps variable. It could also give the
> user a false impression that their disk is slow or something similar
> if they see a lot of "0 B/s" entries (which was the actual reasoning
> here).

it isn't really reflected in bps, as it's just wrong to print a size as
throughput; finer granularity, or just not spamming the log at the 100%
stage, would be the nicer, more practical solution.

The delta 0 -> throughput 0 is then still the only valid way to handle
it; the minimal time resolution the program handles needs to be used as
the metric, not the physical (almost) infinitesimal one, and once that's
ms or µs it's only going to happen when the throughput would be rounded
down to 0 anyway.
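
To spell it out, roughly (an untested sketch; now_ms() and rate_bps()
are names I'm making up here, not something from the actual patch):

#include <stdint.h>
#include <time.h>

/* monotonic clock in milliseconds, so wall clock jumps can't distort the rate */
static uint64_t now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000 + (uint64_t)(ts.tv_nsec / 1000000);
}

/* bytes restored since the last report over a millisecond delta */
static double rate_bps(uint64_t bytes_since_last, uint64_t delta_ms)
{
    if (delta_ms == 0) {
        /* below our minimal resolution no transfer is measurable, so
         * zero is the only honest rate to report */
        return 0.0;
    }
    return (double)bytes_since_last * 1000.0 / (double)delta_ms;
}

That way the printed value always stays an actual throughput, and the
delta == 0 branch only fires for amounts that are negligible at ms
resolution anyway.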
> The current implementation means, if delta is 0, bps = 20MB and in
> reality 0.33 seconds have elapsed, that the program prints "20MB/s".
> In actuality the expected rate of transfer would be "60 MB/s" though.

bps (bytes per second) cannot be MB, you really need to get your units
straight, as else it's just confusing to me, but I assume you meant a
delta of 20 MiB in bytes restored since last time, and then
20 / 0.33 = 60, so not seeing the issue.

>
> My thinking in not using milliseconds here was: even though I could
> calculate what the transfer rate per second is at the time of the
> print, it is still an approximation. Imagine the following scenario:

So, you're arguing that because a much more accurate solution is still
not 100% accurate (which is impossible to achieve), let's go for the
much less accurate one? Sorry, but I'm not really following that logic
and really not getting why you still argue for it?! It should be clear
that using millisecond (or lower) resolution gives much more accurate
results, and can *never* be worse than coarser ones.

> In the first 0.33 secs 20MB/s are transferred (=> print goes out with
> 60 MB/s)

Did you mean to write "In the first 0.33s 20 MiB are transferred"? As
else we're talking about a second-order derivative, i.e., throughput
acceleration?

> In the next 0.33 secs only 10MB/s are transferred
> In the last 0.33 secs only 0.5MB/s are transferred
> In this case "60MB/s" was printed but 30.5MB have been transferred.

I don't get the point/example...

>
> The current implementation might be severely understating the actual
> throughput, but will never show a value for B/s that is higher than
> the actual throughput per second. The unaccounted-for bytes here would
> then be reflected in the following output.

exactly, but making the interval that the error "leaks" into 1000, or
even 1 million, times smaller reduces the difference between reality and
the actual output quite a lot.

> The millisecond approach would have both the same and the opposite
> problem, e.g. if delta is only a few milliseconds and the throughput
> in that time a lot higher than average, then the printed B/s value
> could be highly exaggerated.

no, it does not have the opposite problem, that cannot really be, as
it's the same thing, just more accurate. It's still not perfect, and for
very, very high bandwidths (100s of GiB/s) the error may still get
relevant even with per-ms sampling, but then it'd still be 1000 times
better on average.

>
> I let the current implementation be because, in practice, at the end
> of the backup restore there are usually a lot of sparse bytes (I hope
> I am using this term correctly). This means that the restore would go
> from 56% to 100% in less than a second. Simply writing 0 here would
> probably confuse the user more than anything, because the value
> suddenly jumped by e.g. 6GiB with 0B/s.

This cannot be generalized in practice; on a long-running production VM,
holes due to sparseness can be there or not, and may not all be
accumulated at the end. Different OSes and the filesystems they use also
have very different impacts.

>
> Still, I do concede that using milliseconds just makes more sense,
> because it's the more "proper" way to implement something like this,
> and especially because of the unit. In the current implementation it
> would be something like B/last-output.
> Also, it seems to me that the rate of throughput plummeting or
> skyrocketing in the span of less than a second is rather unlikely; the
> millisecond method would statistically just be more correct.

How's that unlikely? Modern HW and its caches are really not
deterministic at all; e.g., QLC SSDs with a few tens of GiB of SLC and
some GiB of DDR4 cache can show very stark jumps in throughput, from
6 Gbps (maxing out the SATA link speed) down to 100 Mbps (maxing out
QLC). In addition to that, the IO scheduler handling multiple processes
submitting IO concurrently doesn't make it any more stable; then add the
effects of filesystems or a possible ZFS RAID, ... on top. Also, simply
that another process is done with its IO can make a lot of capacity
available in mere microseconds. There's a reason some counters/timing
bits of the IO paths add to the kernel's RNG entropy pool.

Further, note that the base assumption of why I went into this wasn't
even erratic IO, but rather CPU scheduling leading to very different
actual duration intervals. And that could even make a stable IO
throughput flow display as completely erratic, even if it actually
wasn't. I.e., using such a low accuracy just adds way too much
distortion in any direction.

> I will change it in v3.
>
>>> Nonetheless, I later adapted this code to behave like the backup
>>> create function does, which updates every 3 seconds, eliminating
>>> this problem entirely. This is therefore dead code anyway.
>>>
>> later where?
>>
>> Actual improvement: track duration in milliseconds or less and still
>> catch the edge case where delta is zero, and just set throughput to
>> zero then (as no data can transfer in zero time).
>>
> Later in the sense of time. I rewrote the if statement above to only
> execute if delta is greater than 2 seconds (before, it updated with
> every percent increase).

Ah ok, in a future patch - I encountered "later" quite often like in
"later in this patch or series", so yeah, can be clear enough &
probably just confusion on my side.

> "eliminating this problem entirely" is, I retract, not true though,
> because I forgot about the check whether progress is at 100%, in which
> case delta could still be 0.

In that case you either skip reporting this round and/or move the 100%
reporting out of the loop entirely, which could be the nicer approach in
general anyway, and with the reporting factored out into its own
function, also not much code overhead.
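
Roughly what I'd picture (again just an untested sketch; the helper and
state names are invented, reusing the now_ms()/rate_bps() idea from
above, not the actual vma reader code):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* single reporting helper, used both inside and after the loop */
static void report_progress(uint64_t restored, uint64_t total, double bps)
{
    printf("progress %3.0f%% (%" PRIu64 "/%" PRIu64 " bytes) %.2f MiB/s\n",
           total ? 100.0 * (double)restored / (double)total : 100.0,
           restored, total, bps / (1024.0 * 1024.0));
}

static void restore_all(uint64_t total)
{
    uint64_t restored = 0, last_restored = 0, last_ms = now_ms();

    /* restore_next_extent() stands in for the real restore step */
    while (restore_next_extent(&restored)) {
        uint64_t now = now_ms();
        if (now - last_ms >= 2000) { /* report at most every 2s */
            report_progress(restored, total,
                            rate_bps(restored - last_restored, now - last_ms));
            last_ms = now;
            last_restored = restored;
        }
    }
    /* the 100% line moves out of the loop, so the delta == 0 edge case
     * cannot produce a bogus rate for it anymore */
    report_progress(total, total,
                    rate_bps(total - last_restored, now_ms() - last_ms));
}

With that, the loop never has to special-case the 100% step, and the
final line is printed exactly once.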