From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <a.lauterer@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id CD21962F45
 for <pbs-devel@lists.proxmox.com>; Mon,  8 Feb 2021 17:07:10 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id BAB4F25248
 for <pbs-devel@lists.proxmox.com>; Mon,  8 Feb 2021 17:06:40 +0100 (CET)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [212.186.127.180])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS id 6536825236
 for <pbs-devel@lists.proxmox.com>; Mon,  8 Feb 2021 17:06:39 +0100 (CET)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 268E945581
 for <pbs-devel@lists.proxmox.com>; Mon,  8 Feb 2021 17:06:39 +0100 (CET)
To: Thomas Lamprecht <t.lamprecht@proxmox.com>,
 Proxmox Backup Server development discussion <pbs-devel@lists.proxmox.com>
References: <20210205151030.28946-1-a.lauterer@proxmox.com>
 <20210205151030.28946-2-a.lauterer@proxmox.com>
 <2632760d-8024-0ede-0cf1-cd9140e450e2@proxmox.com>
From: Aaron Lauterer <a.lauterer@proxmox.com>
Message-ID: <fbc0dffa-6045-34e9-e0aa-cad026d1c937@proxmox.com>
Date: Mon, 8 Feb 2021 17:06:38 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
 Thunderbird/78.7.0
MIME-Version: 1.0
In-Reply-To: <2632760d-8024-0ede-0cf1-cd9140e450e2@proxmox.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
X-SPAM-LEVEL: Spam detection results:  0
 AWL -0.020 Adjusted score from AWL reputation of From: address
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 NICE_REPLY_A           -0.001 Looks like a legit reply (A)
 RCVD_IN_DNSWL_MED        -2.3 Sender listed at https://www.dnswl.org/,
 medium trust
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
Subject: Re: [pbs-devel] [PATCH proxmox-backup 2/2] docs/scanrefs: fix
 handling if ref is same as headline
X-BeenThere: pbs-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox Backup Server development discussion
 <pbs-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pbs-devel>, 
 <mailto:pbs-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pbs-devel/>
List-Post: <mailto:pbs-devel@lists.proxmox.com>
List-Help: <mailto:pbs-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel>, 
 <mailto:pbs-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Mon, 08 Feb 2021 16:07:10 -0000



On 2/6/21 9:22 AM, Thomas Lamprecht wrote:
> On 05.02.21 16:10, Aaron Lauterer wrote:
>> If the ref is named the same as the headline (once normalized), sphinx
>> will return a 'idX' value in node['ids'][1] which we use for the label
>> ID. The headline is always present at index 0.
>>
>> Checking for that and using index 0 in case we do get a 'idX' helps us
>> to avoid using the 'idX' as keys in our OnlineHelpInfo.js and actually
>> use the intended key.
>>
>> Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
>> ---
>>   docs/_ext/proxmox-scanrefs.py | 13 ++++++++++++-
>>   1 file changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/docs/_ext/proxmox-scanrefs.py b/docs/_ext/proxmox-scanrefs.py
>> index 1b3c0615..0d626561 100644
>> --- a/docs/_ext/proxmox-scanrefs.py
>> +++ b/docs/_ext/proxmox-scanrefs.py
>> @@ -90,7 +90,18 @@ class ReflabelMapper(Builder):
>>                   if hasattr(node, 'expect_referenced_by_id') and len(node['ids']) > 1: # explicit labels
>>                       filename = self.env.doc2path(docname)
>>                       filename_html = re.sub('.rst', '.html', filename)
>> -                    labelid = node['ids'][1] # [0] is predefined by sphinx, we need [1] for explicit ones
>> +
>> +                    # node['ids'][0] contains a normalized version of the
>> +                    # headline.  If the ref and headline are the same
>> +                    # (normalized) sphinx will set the node['ids'][1] to a
>> +                    # generic id in the format `idX` where X is numeric. If the
>> +                    # ref and headline are not the same, the ref name will be
>> +                    # stored in node['ids'][1]
> 
> can you point me from where you derived that?
> 
> Because I think there are always two refs in such cases where we set one
> above a heading: the implicit heading one and the explicit from us.
> The always get normalized, but the implicit has a fallback if there's a ref
> conflict with an explicit or even another implicit one, when a title is
> reused in the same chapter or so?
> 
> Do we also have access to the chapter id/name here?
> Then we could enforce that explicit ones must have that prefixed.

I did derive that from comparing the output of the debug prints for the different situations. Unfortunately the Sphinx docs are a bit sparse on that or my search foo is not good enough ;)

Comparing the output if the explicit ref matches the implicit from the headline (shortened the 'children' element):

{'attributes': {'backrefs': [],
                 'classes': [],
                 'dupnames': [],
                 'ids': ['creating-backups', 'id1'],
                 'names': ['creating backups', 'creating_backups']},
  'children': [<title: <#text: 'Creating Backups'>>,
               <paragraph: <#text: 'This section e ...'>>,
               [.....]
               <literal_block: <#text: '# proxmox-back ...'>>,
               <section "excluding files/folders from a backup": <title...><paragraph...><paragraph...><paragraph...><par ...>],
  'document': <document: <section "backup client usage"...>>,
  'expect_referenced_by_id': {'creating-backups': <target: >},
  'expect_referenced_by_name': {'creating_backups': <target: >},


And now if the explicit ref is different from the headline:


{'attributes': {'backrefs': [],
                 'classes': [],
                 'dupnames': [],
                 'ids': ['creating-backups', 'client-creating-backups'],
                 'names': ['creating backups', 'client_creating_backups']},
  'children': [<title: <#text: 'Creating Backups'>>,
               <paragraph: <#text: 'This section e ...'>>,
               [...]
               <literal_block: <#text: '# proxmox-back ...'>>,
               <section "excluding files/folders from a backup": <title...><paragraph...><paragraph...><paragraph...><par ...>],
  'document': <document: <section "backup client usage"...>>,
  'expect_referenced_by_id': {'client-creating-backups': <target: >},
  'expect_referenced_by_name': {'client_creating_backups': <target: >},



You can see the difference in the 'attributes.ids' array.

On thing though that I observed is that 'expect_referenced_by_id' will contain the actual key used for the ref AFAICT. So we could use that and not worry about checking if the 'attributes.ids[0]' array contains a string starting with 'id[0-9]'. If I set the explicit ref to 'idX' with X being a number, that then is also present in the 'expect_referenced_by_id' field.


On an additional note: Right now we do not have any explicit references matching the headlines they are referencing because they are all prefixed or unique in another way. We could add a check here to fail if the explicit ref id matches the normalized headline and throw a warning / die with error to avoid any ambiguity in the refs in the future.

e.g. (pseudo code)
if (attributes['ids'][0] == expect_referenced_by_id:
     exit('reference is matching implicit headline ref, consider adding a prefix')

> 
>> +                    if re.match('^id[0-9]*$', node['ids'][1]):
> 
> should be a + not * op? we want to avoid clashes with real possible refs
> as much as possible..
> 
> What happens if I set now one to id1 and there would be already an id1?
> 
> I just really do not want to revisit this again, and loosing references
> is a no-go, the docs must work.

See above note, I think that addresses it.

> 
>> +                        labelid = node['ids'][0]
>> +                    else:
>> +                        labelid = node['ids'][1]
>> +
>>                       title = cast(nodes.title, node[0])
>>                       logger.info('traversing section {}'.format(title.astext()))
>>                       ref_name = getattr(title, 'rawsource', title.astext())
>>
>