From mboxrd@z Thu Jan  1 00:00:00 1970
To: Fabian Ebner <f.ebner@proxmox.com>,
 Proxmox VE development discussion <pve-devel@lists.proxmox.com>
References: <20201214130039.9997-1-f.ebner@proxmox.com>
 <5c12de59-cec9-d1f3-9c3d-17a99c67e872@proxmox.com>
 <ac0e0452-9cbd-ae60-fb6e-d688bc2e4481@proxmox.com>
From: Thomas Lamprecht <t.lamprecht@proxmox.com>
Message-ID: <cfb0a229-ad22-349d-cb76-df40fac4c936@proxmox.com>
Date: Thu, 17 Dec 2020 10:23:33 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:82.0) Gecko/20100101
 Thunderbird/82.0
MIME-Version: 1.0
In-Reply-To: <ac0e0452-9cbd-ae60-fb6e-d688bc2e4481@proxmox.com>
Content-Type: text/plain; charset=UTF-8
Content-Language: en-GB
Content-Transfer-Encoding: 8bit
Subject: Re: [pve-devel] [PATCH zsync] fix #2821: only abort if there really
 is a waiting/syncing job instance already

On 17/12/2020 09:40, Fabian Ebner wrote:
> On 14.12.20 at 14:47, Thomas Lamprecht wrote:
>> On 14.12.20 14:00, Fabian Ebner wrote:
>>> @@ -584,6 +586,33 @@ sub destroy_job {
>>>       });
>>>   }
>>>   +sub get_process_start_time {
>>> +    my ($pid) = @_;
>>> +
>>> +    return eval { run_cmd(['ps', '-o', 'lstart=', '-p', "$pid"]); };
>>
>> Instead of fork+exec, why not do a much cheaper file read?
>>
>> I.e., copy over file_read_firstline from PVE::Tools and then:
>>
>> sub get_process_start_time {
>>      my ($pid) = @_;
>>      my $stat_str = file_read_firstline("/proc/$pid/stat");
>>      my $stat = [ split(/\s+/, $stat_str) ];
>>
>>      return $stat->[21];
>> }
>>
>> plus some error handling (note: I did not test the above)
>>
> 
> Agreed, although we also need to obtain the boot time (from /proc/stat) to get the actual start time, because the value in /proc/$pid/stat is only the number of clock ticks between boot and the start of the process. But it's still much cheaper, of course.
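A minimal sketch of that conversion, assuming Linux procfs: `get_boot_time` and `absolute_start_time` are hypothetical helper names, the boot time is taken from the `btime` line of /proc/stat, and the tick rate comes from `sysconf(_SC_CLK_TCK)` via Perl's POSIX module.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: boot time in seconds since the epoch, taken from
# the 'btime' line of /proc/stat.
sub get_boot_time {
    open(my $fh, '<', '/proc/stat') or die "unable to open /proc/stat - $!\n";
    while (my $line = <$fh>) {
        if ($line =~ /^btime\s+(\d+)$/) {
            my $btime = $1;
            close($fh);
            return $btime;
        }
    }
    close($fh);
    die "no btime entry in /proc/stat\n";
}

# Hypothetical helper: convert 'starttime' (clock ticks since boot, field 22
# of /proc/<pid>/stat) into whole seconds since the epoch.
sub absolute_start_time {
    my ($start_ticks, $btime) = @_;
    my $hz = POSIX::sysconf(POSIX::_SC_CLK_TCK());
    die "unable to query clock ticks per second\n" if !$hz;
    return $btime + int($start_ticks / $hz);
}
```

Note that `btime` only has one-second resolution, so the converted value is coarse; comparing the raw tick value together with a per-boot identifier avoids the conversion altogether.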

Hmm, yeah, across reboots this alone would not be enough to always tell 100% for sure.
FYI, for that you could probably also use `/proc/sys/kernel/random/boot_id`, which can be
read once at program startup.

See http://0pointer.de/blog/projects/ids.html (section "Software IDs").
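A rough sketch of how these pieces could combine, assuming Linux procfs. The names `get_instance_id` and `instance_exists` mirror the patch, but their bodies, the `<boot_id>:<pid>:<start_ticks>` ID format, `parse_start_ticks`, and the local `file_read_firstline` stand-in are all hypothetical. It also avoids a pitfall of a plain whitespace split of /proc/<pid>/stat: the second field (the command name in parentheses) may itself contain spaces.

```perl
use strict;
use warnings;

# Hypothetical stand-in for PVE::Tools::file_read_firstline: returns the
# first line of a file without the trailing newline, or undef on error.
sub file_read_firstline {
    my ($filename) = @_;
    open(my $fh, '<', $filename) or return undef;
    my $line = <$fh>;
    close($fh);
    chomp($line) if defined($line);
    return $line;
}

# Extract 'starttime' (field 22, clock ticks since boot) from a
# /proc/<pid>/stat line. The 'comm' field (2nd) is wrapped in parentheses
# and may contain spaces, so only split after the last ')'.
sub parse_start_ticks {
    my ($stat_str) = @_;
    return undef if !defined($stat_str);
    my $close = rindex($stat_str, ')');
    return undef if $close < 0;
    # split ' ' skips leading whitespace; index 0 is now field 3 ('state')
    my @fields = split(' ', substr($stat_str, $close + 1));
    return $fields[19];
}

sub get_process_start_time {
    my ($pid) = @_;
    my $ticks = parse_start_ticks(file_read_firstline("/proc/$pid/stat"));
    die "unable to get start time of process $pid\n" if !defined($ticks);
    return $ticks;
}

# The boot ID is random per boot, so it only needs to be read once.
my $boot_id;
sub get_boot_id {
    return $boot_id if defined($boot_id);
    $boot_id = file_read_firstline('/proc/sys/kernel/random/boot_id');
    die "unable to read boot ID\n" if !defined($boot_id);
    return $boot_id;
}

# Hypothetical ID format '<boot_id>:<pid>:<start_ticks>': PID and start
# ticks could repeat after a reboot, the boot ID disambiguates them.
sub get_instance_id {
    my ($pid) = @_;
    return join(':', get_boot_id(), $pid, get_process_start_time($pid));
}

# Returns 1 if the process that recorded $instance_id is still running.
sub instance_exists {
    my ($instance_id) = @_;
    return 0 if !defined($instance_id);
    my ($pid) = $instance_id =~ /:(\d+):\d+$/;
    return 0 if !defined($pid);
    my $current = eval { get_instance_id($pid) };
    return (defined($current) && $current eq $instance_id) ? 1 : 0;
}
```

With this, `get_instance_id($$)` only touches /proc, so no fork is needed, and a stale ID from before a reboot never matches because its boot ID differs.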


>>>   @@ -593,11 +622,18 @@ sub sync {
>>>       eval { $job = get_job($param) };
>>>         if ($job) {
>>> -        if (defined($job->{state}) && ($job->{state} eq "syncing" || $job->{state} eq "waiting")) {
>>> +        my $state = $job->{state} // 'ok';
>>> +        $state = 'ok' if !instance_exists($job->{instance_id});
>>> +
>>> +        if ($state eq "syncing" || $state eq "waiting") {
>>>           die "Job --source $param->{source} --name $param->{name} is already scheduled to sync\n";
>>>           }
>>>             $job->{state} = "waiting";
>>> +
>>> +        eval { $job->{instance_id} = get_instance_id($$); };
>>
>> I'd query and cache the local instance ID of the current process on startup; this
>> would have the nice side effect of avoiding any potential for errors here completely.
>>
> 
> What if querying fails on startup? I'd rather have it be a non-critical failure and continue. Then we'd still need a check here to see if the cached instance_id is defined.

If you make it just reads of /proc and those fail, you can assume critical
conditions and abort. If you really do not want to, you can add a singleton
which returns the cached info and, if it is not available, retries getting it and warns:

my $id_cache;
sub get_local_instance_id {
    return $id_cache if defined($id_cache);
    $id_cache = eval { get_instance_id($$) };
    warn $@ if $@;
    return $id_cache;
}

That said, I'd feel less strongly about caching if getting the ID neither
forks nor does other rather costly operations.
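For comparison with the retrying singleton above, a sketch of the fail-hard option, together with the state check from the patch hunk quoted earlier pulled out into a helper. `init_local_instance_id` and `job_is_busy` are hypothetical names; the ID source and the existence check are passed in as code refs, so this assumes nothing about how `get_instance_id` or `instance_exists` are implemented.

```perl
use strict;
use warnings;

# Compute the local instance ID once at startup and abort if that fails,
# treating unreadable /proc as a critical condition.
sub init_local_instance_id {
    my ($get_instance_id) = @_;
    my $id = eval { $get_instance_id->($$) };
    die "unable to determine local instance ID: " . ($@ || "no ID returned\n")
        if !defined($id);
    return $id;
}

# Same logic as the patch: a stored 'syncing'/'waiting' state only blocks
# a new sync if the instance that recorded it still exists.
sub job_is_busy {
    my ($job, $instance_exists) = @_;
    my $state = $job->{state} // 'ok';
    $state = 'ok' if !$instance_exists->($job->{instance_id});
    return ($state eq 'syncing' || $state eq 'waiting') ? 1 : 0;
}
```

In `sync` this would become something like `die ... if job_is_busy($job, \&instance_exists);`, followed by storing the cached ID in `$job->{instance_id}` when setting the state to "waiting", so no error handling is needed at that point anymore.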