Date: Wed, 20 Mar 2024 17:59:12 +0100
Message-Id: <CZYQK2WBVZSK.WRXZRY301JIG@proxmox.com>
From: "Max Carrara" <m.carrara@proxmox.com>
To: "Max Carrara" <m.carrara@proxmox.com>, "Proxmox VE development
 discussion" <pve-devel@lists.proxmox.com>
References: <20240305150758.252669-1-m.carrara@proxmox.com>
 <20240305150758.252669-7-m.carrara@proxmox.com>
 <1710839809.dxcgevda47.astroid@yuna.none>
 <CZXUNVRT36KS.35GJQBJUR8DPX@proxmox.com>
In-Reply-To: <CZXUNVRT36KS.35GJQBJUR8DPX@proxmox.com>
Subject: Re: [pve-devel] [PATCH v4 pve-storage 06/16] cephconfig: support
 line-continuations in parser

On Tue Mar 19, 2024 at 4:59 PM CET, Max Carrara wrote:
> On Tue Mar 19, 2024 at 10:37 AM CET, Fabian Grünbichler wrote:
> > On March 5, 2024 4:07 pm, Max Carrara wrote:
> > > Ceph's docs state the following [0]:
> > >> The backslash character `\` is used as the line-continuation marker
> > >> that combines the next line with the current one.
> > >
> > > This commit implements the support of such line-continuations in our
> > > parser.
> > >
> > > The line following a line ending with a '\' has its whitespace
> > > preserved, which matches the behaviour in Ceph's original
> > > implementation [1]. In other words, leading and trailing whitespace is
> > > not stripped from a continued line.
> >
> > it's actually a bit more complicated.. ceph only supports line
> > continuations inside values (well, in key value lines after the key ;)),
> > and only if they are unquoted..

Upon further research, and after confirming the behaviour via `ceph-conf`
(thanks for the tip btw!), it turns out that line continuations are in
fact supported in other places as well.

Consider the following example 'ceph.conf' file:
```
[clie\
nt]      # some comment
foo\
\
\
\
= \
bar
```

The continued `client` section header is actually parsed by `ceph-conf`
without any issues; the trailing comment and whitespace are ignored as
well.
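
To illustrate the section-header case, here is a minimal Perl sketch of
the naive approach, assuming every backslash that directly precedes a
newline is simply removed before any further parsing. The helper
`join_continuations` is hypothetical and not part of `PVE::CephConfig`.

```perl
use strict;
use warnings;

# Hypothetical helper: delete every backslash that directly precedes a
# newline, merging the following physical line into the current one.
# This is the naive, context-free variant; it ignores quoting entirely.
sub join_continuations {
    my ($raw) = @_;
    $raw =~ s/\\\n//g;
    return $raw;
}

my $raw = "[clie\\\nnt]      # some comment\n";
my ($first_line) = split /\n/, join_continuations($raw);

# Strip the trailing comment and whitespace, then match the header.
(my $stripped = $first_line) =~ s/\s*[#;].*$//;
if ($stripped =~ m/^\[(.+)\]$/) {
    print "section: $1\n";    # prints "section: client"
}
```

This variant is deliberately simplistic: it treats every backslash-newline
pair the same way, regardless of whether it appears in a key, a value, or
inside quotes.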

Where it gets really interesting is the continuation right after 'foo':
because keys are defined using `raw[]` [0], whatever is skipped by the
parser is still included in the parsed output [1].

As a consequence, the four continued lines are not skipped, but are
instead read as literal newline characters.

After the equals sign, the line continuation is skipped as expected.

By providing literal newlines via the shell, the above can easily be
verified:

$ ceph-conf -c ceph_cancer.conf -s client foo^M^M^M^M
bar

(The ^M is a literal newline and can usually be obtained by typing
CTRL+V, Enter in your shell.)

To make matters even worse, quoted values may in fact be *directly*
followed by continuations (`ceph-conf` fails otherwise):

```
[client]
foo = "bar"\

baz =3D qux
```

The above is considered "correct" because the escaped newline counts as
whitespace. If you were to put some spaces into the empty line after the
"foo" key, these would be skipped as well.

For completeness's sake, this also parses:

```
[client]
foo = "bar"\
   # some comment
baz =3D qux
```

However, the following is invalid:

```
[client]
foo = "bar"\
baz =3D qux
```

... because the parser sees:

```
[client]
foo = "bar"baz = qux
```

... which is not allowed, because a quoted value may only be followed by
what the grammar defines as "empty_line" [2].
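
The three quoted-value examples above can be checked with a small Perl
sketch, assuming continuations are joined first (as in the naive approach)
and that a line with a quoted value may only end in whitespace or a
comment. The regex and the helper names `is_valid_quoted_line` and
`logical_lines` are illustrative assumptions, not Ceph's actual grammar.

```perl
use strict;
use warnings;

# Hypothetical check: after joining continuations, a line holding a quoted
# value must end right after the closing quote, apart from optional
# whitespace and an optional '#' or ';' comment.
sub is_valid_quoted_line {
    my ($line) = @_;
    return $line =~ m/^\s*[^=\s][^=]*?\s*=\s*"[^"]*"\s*(?:[#;].*)?$/ ? 1 : 0;
}

# Hypothetical helper: join continuations, then split into logical lines.
sub logical_lines {
    my ($raw) = @_;
    $raw =~ s/\\\n//g;
    return split /\n/, $raw;
}

my @ok_empty   = logical_lines(qq{foo = "bar"\\\n\nbaz = qux\n});
my @ok_comment = logical_lines(qq{foo = "bar"\\\n   # some comment\nbaz = qux\n});
my @bad        = logical_lines(qq{foo = "bar"\\\nbaz = qux\n});

# Prints 1, 1 and 0 for the three first logical lines respectively.
print "$_ => ", is_valid_quoted_line($_), "\n"
    for $ok_empty[0], $ok_comment[0], $bad[0];
```

Joining first makes the failure mode visible: the invalid example collapses
into `foo = "bar"baz = qux`, which no longer ends after the closing quote.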

So, this doesn't really make the parsing logic regarding
line continuations any simpler:

  1. Section headers may contain line continuations
  2. Section headers may be followed by whitespace + comments (after ']')
  3. Keys are parsed "raw" and may therefore be continued
     --> Will probably just not handle this case, as there are no config
     keys that contain newline characters or anything of the sort.
     Why would there be? Why would a user need this?
  4. Unquoted values may contain line continuations
  5. Quoted values may be *directly* followed by a line continuation
     character, as long as the continued line contains only whitespace
     or a comment
  6. Bonus point: Quoted values MUST NOT *contain* line continuations,
     as they're parsed as `lexeme[]`s [3]
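
Points 4 to 6 suggest that continuations cannot simply be joined before
parsing, since a backslash-newline inside a quoted value must not be
treated as a continuation. A minimal Perl sketch of a quote-aware variant
follows. It assumes that '"' is the only quote character, that '\' inside
quotes escapes the next character, that '#' and ';' start a comment
outside quotes (escaped comment characters are not handled), and that
comments are never continued. Continued keys (point 3) get no special
treatment. `join_continuations_quote_aware` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PVE::CephConfig): join line continuations
# only outside of double-quoted values and comments; reject continuations
# and raw newlines inside quoted values.
sub join_continuations_quote_aware {
    my ($raw) = @_;
    my @chars = split //, $raw;
    my ($out, $state) = ('', 'normal');

    for (my $i = 0; $i < @chars; $i++) {
        my $c = $chars[$i];
        my $next = $i + 1 < @chars ? $chars[$i + 1] : '';

        if ($state eq 'quoted') {
            die "line continuation inside quoted value\n"
                if $c eq '\\' && $next eq "\n";
            die "unterminated quoted value\n" if $c eq "\n";
            if ($c eq '\\') {           # keep escape sequence verbatim
                $out .= $c . $next;
                $i++;
                next;
            }
            $state = 'normal' if $c eq '"';
        } elsif ($state eq 'comment') {
            $state = 'normal' if $c eq "\n";
        } else {
            if ($c eq '\\' && $next eq "\n") {
                $i++;                   # drop backslash and newline
                next;
            }
            $state = 'quoted' if $c eq '"';
            $state = 'comment' if $c eq '#' || $c eq ';';
        }
        $out .= $c;
    }
    return $out;
}

# Unquoted value: continuation joined, leading whitespace of the
# continued line preserved; prints "foo = bar baz".
print join_continuations_quote_aware("foo = bar\\\n baz\n");

# Quoted value directly followed by a continuation (point 5).
print join_continuations_quote_aware(qq{foo = "bar"\\\n\nbaz = qux\n});

# Continuation inside a quoted value (point 6) is rejected.
my $ok = eval { join_continuations_quote_aware(qq{foo = "ba\\\nr"\n}); 1 };
print "rejected: $@" if !$ok;
```

Tracking quote and comment state while scanning keeps the decision local
to each backslash, which is what distinguishes points 4 and 5 from point 6;
a line-based regex pass like the one in the patch cannot see whether a
trailing '\' sits inside quotes.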

... so, see you in v5 ;)

[0]: https://git.proxmox.com/?p=ceph.git;a=blob;f=ceph/src/common/ConfUtils.cc;h=2f78fd02bf9e27467275752e6f3bca0c5e3946ce;hb=refs/heads/master#l182
[1]: https://www.boost.org/doc/libs/1_53_0/libs/spirit/doc/html/spirit/qi/reference/directive/raw.html
[2]: https://git.proxmox.com/?p=ceph.git;a=blob;f=ceph/src/common/ConfUtils.cc;h=2f78fd02bf9e27467275752e6f3bca0c5e3946ce;hb=refs/heads/master#l188
[3]: https://www.boost.org/doc/libs/1_53_0/libs/spirit/doc/html/spirit/qi/reference/directive/lexeme.html

>
> As mentioned in my other reply, I'll probably have to revise the whole
> parsing logic to take that into account... but thanks for being so
> thorough!
>
> >
> > >
> > > [0]: https://docs.ceph.com/en/reef/rados/configuration/ceph-conf/#changes-introduced-in-octopus
> > > [1]: https://git.proxmox.com/?p=ceph.git;a=blob;f=ceph/src/common/ConfUtils.cc;h=2f78fd02bf9e27467275752e6f3bca0c5e3946ce;hb=refs/heads/master#l262
> > >
> > > Signed-off-by: Max Carrara <m.carrara@proxmox.com>
> > > ---
> > > Changes v2 --> v3:
> > >   * new
> > > Changes v3 --> v4:
> > >   * none
> > >
> > >  src/PVE/CephConfig.pm | 28 ++++++++++++++++++++++++----
> > >  1 file changed, 24 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/src/PVE/CephConfig.pm b/src/PVE/CephConfig.pm
> > > index 74a92eb..80f71b0 100644
> > > --- a/src/PVE/CephConfig.pm
> > > +++ b/src/PVE/CephConfig.pm
> > > @@ -19,13 +19,33 @@ sub parse_ceph_config {
> > >      return $cfg if !defined($raw);
> > >  
> > >      my @lines = split /\n/, $raw;
> > > +    my @lines_normalized;
> > > +
> > > +    my $re_comment_not_escaped = qr/(?<!\\)(#|;).*$/;
> > > +    my $re_leading_ws = qr/^\s+/;
> > > +    my $re_trailing_ws = qr/\s+$/;
> > > +
> > > +    while (scalar(@lines)) {
> > > +	my $line = shift(@lines);
> > > +	$line =~ s/$re_comment_not_escaped//;
> > > +	$line =~ s/$re_leading_ws//;
> > > +	$line =~ s/$re_trailing_ws//;
> > > +	next if !$line;
> > > +
> > > +	# merge lines ending with continuation character '\'
> > > +	while ($line =~ s/\\$//) {
> > > +	    my $next_line = shift(@lines);
> > > +	    $next_line =~ s/$re_comment_not_escaped//;
> > > +	    $next_line =~ s/$re_trailing_ws//;
> > > +	    $line .= $next_line;
> > > +	}
> > > +
> > > +	push(@lines_normalized, $line);
> > > +    }
> > >  
> > >      my $section;
> > >  
> > > -    for my $line (@lines) {
> > > -	$line =~ s/(?<!\\)(#|;).*$//;
> > > -	$line =~ s/^\s+//;
> > > -	$line =~ s/\s+$//;
> > > +    for my $line (@lines_normalized) {
> > >  	next if !$line;
> > >  
> > >  	if ($line =~ m/^\[(.+)\]$/) {
> > > -- 
> > > 2.39.2
> > >
> > >
> > >
> > > _______________________________________________
> > > pve-devel mailing list
> > > pve-devel@lists.proxmox.com
> > > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel