From mboxrd@z Thu Jan  1 00:00:00 1970
To: Fabian Ebner, Proxmox VE development discussion
References: <20201214130039.9997-1-f.ebner@proxmox.com> <5c12de59-cec9-d1f3-9c3d-17a99c67e872@proxmox.com>
From: Thomas Lamprecht
Date: Thu, 17 Dec 2020 10:23:33 +0100
Subject: Re: [pve-devel] [PATCH zsync] fix #2821: only abort if there really is a waiting/syncing job instance already
List-Id: Proxmox VE development discussion

On 17/12/2020 09:40, Fabian Ebner wrote:
> On 14.12.20 at 14:47, Thomas Lamprecht wrote:
>> On 14.12.20 14:00, Fabian Ebner wrote:
>>> @@ -584,6 +586,33 @@ sub destroy_job {
>>>      });
>>>  }
>>>
>>> +sub get_process_start_time {
>>> +    my ($pid) = @_;
>>> +
>>> +    return eval { run_cmd(['ps', '-o', 'lstart=', '-p', "$pid"]); };
>>
>> instead of fork+exec do a much cheaper file read?
>>
>> I.e., copying over file_read_firstline from PVE::Tools then:
>>
>> sub get_process_start_time {
>>      my $stat_str = file_read_firstline("/proc/$pid/stat");
>>      my $stat = [ split(/\s+/, $stat_str) ];
>>
>>      return $stat->[21];
>> }
>>
>> plus some error handling (note I did not test above)
>>
>
> Agreed, although we also need to obtain the boot time (from /proc/stat)
> to have the actual start time, because the value in /proc/$pid/stat is
> just the number of clock ticks since boot when the process was started.
> But it's still much cheaper of course.

Hmm, yeah, across boots this alone would not be enough to always tell
100% for sure.

FYI, there you could probably also use `/proc/sys/kernel/random/boot_id`,
which can be read once at program startup; see
http://0pointer.de/blog/projects/ids.html ("Software IDs").

>>> @@ -593,11 +622,18 @@ sub sync {
>>>      eval { $job = get_job($param) };
>>>
>>>      if ($job) {
>>> -        if (defined($job->{state}) && ($job->{state} eq "syncing" || $job->{state} eq "waiting")) {
>>> +        my $state = $job->{state} // 'ok';
>>> +        $state = 'ok' if !instance_exists($job->{instance_id});
>>> +
>>> +        if ($state eq "syncing" || $state eq "waiting") {
>>>              die "Job --source $param->{source} --name $param->{name} is already scheduled to sync\n";
>>>          }
>>>
>>>          $job->{state} = "waiting";
>>> +
>>> +        eval { $job->{instance_id} = get_instance_id($$); };
>>
>> I'd query and cache the local instance ID from the current process on startup, this
>> would have the nice side effect of avoiding error potential here completely
>>
>
> What if querying fails on startup? I'd rather have it be a non-critical
> failure and continue. Then we'd still need a check here to see if the
> cached instance_id is defined.

If you make it just reads from /proc and that fails, you can assume
critical conditions and abort. If you really do not want to, you can add
a singleton which returns the cached info and, if it is not available,
retries getting it and warns:

my $id_cache;
sub get_local_instance_id {
    return $id_cache if defined($id_cache);
    $id_cache = eval { get_instance_id($$) };
    warn $@ if $@;
    return $id_cache;
}

Albeit, I'd feel less strongly about caching if getting the ID neither
forks nor does other rather costly operations.
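
For illustration only, the /proc-only variant discussed above could look roughly like the sketch below. It is not tested against pve-zsync. `read_firstline` is a hypothetical stand-in for file_read_firstline from PVE::Tools, and the "pid:starttime:boot_id" layout of the instance ID is an assumption made for this example, not something the patch defines. The stat line is split only after the last ')' because the command name in field 2 may itself contain spaces.

```perl
use strict;
use warnings;

# Hypothetical stand-in for PVE::Tools::file_read_firstline; dies on failure.
sub read_firstline {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "cannot open '$path': $!\n";
    my $line = <$fh>;
    close($fh);
    die "'$path' is empty\n" if !defined($line);
    chomp($line);
    return $line;
}

# Start time of $pid in clock ticks since boot (field 22 of /proc/PID/stat).
sub get_process_start_time {
    my ($pid) = @_;
    my $stat_str = read_firstline("/proc/$pid/stat");
    # Field 2 (comm) may contain spaces or parentheses, so only split the
    # part that follows the last ')'.
    $stat_str =~ m/^.*\)\s+(.*)$/
        or die "unexpected format in /proc/$pid/stat\n";
    my @fields = split(/\s+/, $1);
    # @fields starts at field 3 (state), so field 22 is at index 19.
    my $start = $fields[19];
    die "no start time found for PID $pid\n"
        if !defined($start) || $start !~ m/^\d+$/;
    return $start;
}

# Assumed ID layout: PID, start ticks and boot ID, joined by ':'.
sub get_instance_id {
    my ($pid) = @_;
    my $boot_id = read_firstline('/proc/sys/kernel/random/boot_id');
    my $start = get_process_start_time($pid);
    return "$pid:$start:$boot_id";
}

# Cached lookup for the current process; warns and retries on the next
# call if the ID could not be determined.
my $id_cache;
sub get_local_instance_id {
    return $id_cache if defined($id_cache);
    $id_cache = eval { get_instance_id($$) };
    warn $@ if $@;
    return $id_cache;
}
```

With something like this, sync() would store get_local_instance_id() in the job state, and the existence check would recompute the ID for the stored PID and compare the two strings; a reused PID or a reboot then yields a different ID.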