Re: [PATCH container] close #7342: Extend qga file-read with chunked access for large files

From: Fiona Ebner <f.ebner@proxmox.com>
To: Markus Ebner <info@ebner-markus.de>, pve-devel@lists.proxmox.com
Subject: Re: [PATCH container] close #7342: Extend qga file-read with chunked access for large files
Date: Tue, 24 Feb 2026 12:08:38 +0100
Message-ID: <97d497d1-067f-4da3-a3f9-f0ea87a27e54@proxmox.com>
In-Reply-To: <20260223201648.297620-2-info@ebner-markus.de>

As you already noted yourself, the prefix is wrong. Should be
'qemu-server' rather than 'container' or 'qemu'.

On 24.02.26 at 9:47 AM, Markus Ebner wrote:
> The file-read command of the QEMU guest agent previously had several
> practical limitations.

The limitations are in the file-read API endpoint, not in the guest
agent itself.

> It always read a fixed 16 MiB block starting at
> offset 0, making it impossible to retrieve larger files in multiple
> chunks. On busy or resource‑constrained hosts, requests for large files
> often timed out because the agent attempted to read and JSON‑encode the
> entire 16 MiB block at once.

Okay, 16 MiB does not seem like that much, but if you ran into the
issue, it makes sense to be more flexible.

> Binary data was also returned as raw JSON strings with extensive
> escaping, which inflated payload size and caused compatibility issues
> with some JSON parsers.

Could you give a concrete example here? What JSON parser/what
compatibility issue? I'd add that the 'decode' parameter is there for
improving this.

> This patch extends the file-read method with three new parameters:
> 
> - decode — Controls whether the base64‑encoded data returned by the
>   guest agent should be decoded before being sent back through the API.
>   When disabled, the base64 string is passed through unchanged, which is
>   ideal for binary data and mirrors the existing encode parameter of
>   file-write.
> 
> - offset — Allows reading from an arbitrary byte offset within the
>   file.
> 
> - count — Allows requesting a smaller number of bytes than the
>   internal 16 MiB limit, avoiding unnecessary overhead and reducing
>   timeout risk.

I'd prefer if there were three patches, one for each new parameter.

> 
> With these additions, the behavior now mirrors standard file operations
> (fopen, fseek, fread). Reading beyond EOF returns zero bytes.
> Seek can choose any non-negative position within the file, without
> bounds checking. Reading out of bounds returns 0 bytes.
> This allows conveniently reading an entire file in a robust way:
> while(truncated && content.length != 0) {}

It should be enough to only check the truncated flag, or what additional
info does length being non-zero give?
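
For illustration, a minimal client-side sketch of a loop that relies on
the truncated flag alone. The read_chunk helper is hypothetical: it only
simulates one file-read call against an in-memory string and returns the
same three keys as the endpoint. It is not part of the patch.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a single file-read API call. A real client would
# send the request with 'offset' and 'count' and get these three keys back.
sub read_chunk {
    my ($data, $offset, $count) = @_;
    my $chunk = $offset < length($data) ? substr($data, $offset, $count) : '';
    my $end = $offset + length($chunk);
    return {
        content => $chunk,
        'bytes-read' => length($chunk),
        truncated => $end < length($data) ? 1 : 0,
    };
}

# Read the whole "file" by following the truncated flag only.
# Assumes $count is positive, otherwise the loop would never advance.
sub read_all {
    my ($data, $count) = @_;
    my ($offset, $content) = (0, '');
    while (1) {
        my $res = read_chunk($data, $offset, $count);
        $content .= $res->{content};
        $offset += $res->{'bytes-read'};
        last if !$res->{truncated};
    }
    return $content;
}
```

If the endpoint clears truncated once the agent reports EOF, the extra
length check should not be needed for termination.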

> and also enables things like tailing a changing file.
> This makes the file-read command significantly more flexible.
> 
> All parameter additions were done in a backwards-compatible fashion.
> 
> Signed-off-by: Markus Ebner <info@ebner-markus.de>

Thank you for your contribution! A few comments below, but it's looking
quite nice already :)

> ---
>  src/PVE/API2/Qemu/Agent.pm | 54 ++++++++++++++++++++++++++++++++------
>  1 file changed, 46 insertions(+), 8 deletions(-)
> 
> diff --git a/src/PVE/API2/Qemu/Agent.pm b/src/PVE/API2/Qemu/Agent.pm
> index de36ce1e..ccd1dca2 100644
> --- a/src/PVE/API2/Qemu/Agent.pm
> +++ b/src/PVE/API2/Qemu/Agent.pm
> @@ -464,6 +464,28 @@ __PACKAGE__->register_method({
>                  'pve-vmid',
>                  { completion => \&PVE::QemuServer::complete_vmid_running },
>              ),
> +            decode => {
> +                type => 'boolean',
> +                optional => 1,
> +                default => 1,
> +                description =>
> +                    "Data received from the QEMU Guest-Agent is base64 encoded. If this is set to true, the data is decoded."

Style nit: line is longer than 100 columns

> +                    . "Otherwise the content is forwarded with base64 encoding - defaults to true.",

Nit: missing space at the beginning, since the strings are joined.
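
A possible way to address both nits, sketched as a standalone snippet.
The wording is taken from the patch; the $description variable is only
there to make the snippet runnable on its own.

```perl
use strict;
use warnings;

# Same wording as in the patch, split into shorter fragments. Every
# continuation fragment starts with a space so the joined sentences stay
# separated.
my $description =
    "Data received from the QEMU Guest-Agent is base64 encoded."
    . " If this is set to true, the data is decoded. Otherwise the content is"
    . " forwarded with base64 encoding - defaults to true.";
```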

> +            },
> +            offset => {
> +                type => 'integer',
> +                optional => 1,
> +                default => 0,
> +                description => "Offset to start reading at",
> +            },
> +            count => {
> +                type => 'integer',
> +                optional => 1,
> +                minimum => 0,
> +                maximum => $MAX_READ_SIZE,
> +                default => $MAX_READ_SIZE,
> +                description => "Number of bytes to read.",
> +            },
>              file => {
>                  type => 'string',
>                  description => 'The path to the file',
> @@ -487,6 +509,9 @@ __PACKAGE__->register_method({
>      },
>      code => sub {
>          my ($param) = @_;
> +        my $param_offset = int($param->{offset} // 0);
> +        my $param_decode = $param->{decode} // 1;
> +        my $param_count = int($param->{count} // $MAX_READ_SIZE);

Nit: I'd drop the $param_ prefix

>  
>          my $vmid = $param->{vmid};
>          my $conf = PVE::QemuConfig->load_config($vmid);
> @@ -494,18 +519,33 @@ __PACKAGE__->register_method({
>          my $qgafh =
>              agent_cmd($vmid, $conf, "file-open", { path => $param->{file} }, "can't open file");
>  
> -        my $bytes_left = $MAX_READ_SIZE;
> +        if ($param_offset > 0) {
> +            my $seek = mon_cmd(
> +                $vmid, "guest-file-seek",
> +                handle => $qgafh,
> +                offset => $param_offset,
> +                whence => 'set',
> +            );
> +            check_agent_error($seek, "can't seek to offset position");

We should check the result to see if the seek position is as expected.
I'd rather tell the user "you tried to seek to an invalid position" than
implicitly return 0 bytes. If we seek exactly to EOF, it can still be
fine to return 0 bytes, I guess. We can set $eof=1 early then.

What do you think?
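
A rough sketch of such a check, assuming the guest-file-seek result is a
hash with 'position' and 'eof' keys. The helper name
assert_seek_position is hypothetical. Whether the agent reports a
position past the end of the file differently would need to be verified
against a real guest.

```perl
use strict;
use warnings;

# Hypothetical helper: $seek is the hash returned by guest-file-seek, assumed
# to carry 'position' and 'eof'. Dies if the position is missing or differs
# from the requested offset. Returns 1 if the agent already reports EOF, so
# the caller can skip the read loop, and 0 otherwise.
sub assert_seek_position {
    my ($seek, $offset) = @_;

    my $position = $seek->{position};
    die "seek to offset $offset failed: no position returned\n" if !defined($position);
    die "seek to offset $offset ended up at position $position\n" if $position != $offset;

    return $seek->{eof} ? 1 : 0;
}
```

In the handler this could be used as
$eof = assert_seek_position($seek, $offset) right after the existing
check_agent_error call.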

> +        }
> +
> +        my $bytes_read = 0;
>          my $eof = 0;
>          my $read_size = 1024 * 1024;
>          my $content = "";
>  
> -        while ($bytes_left > 0 && !$eof) {
> +        while ($bytes_read < $param_count && !$eof) {
> +            my $bytes_left = $param_count - $bytes_read;
> +            my $chunk_size = $bytes_left < $read_size ? $bytes_left : $read_size;
>              my $read =
> -                mon_cmd($vmid, "guest-file-read", handle => $qgafh, count => int($read_size));
> +                mon_cmd($vmid, "guest-file-read", handle => $qgafh, count => int($chunk_size));
>              check_agent_error($read, "can't read from file");
>  
> -            $content .= decode_base64($read->{'buf-b64'});
> -            $bytes_left -= $read->{count};
> +            my $chunk = $read->{'buf-b64'};
> +            $chunk = decode_base64($chunk) if $param_decode;
> +            $content .= $chunk;
> +
> +            $bytes_read += $read->{count};
>              $eof = $read->{eof} // 0;
>          }
>  
> @@ -514,12 +554,10 @@ __PACKAGE__->register_method({
>  
>          my $result = {
>              content => $content,
> -            'bytes-read' => ($MAX_READ_SIZE - $bytes_left),
> +            'bytes-read' => $bytes_read,
>          };
>  
>          if (!$eof) {
> -            warn
> -                "agent file-read: reached maximum read size: $MAX_READ_SIZE bytes. output might be truncated.\n";

I think we should still emit this warning if no read size was explicitly
specified.
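
One way this could look, as a standalone sketch. The build_result helper
and its argument list are hypothetical and only serve to make the
snippet runnable outside of the API handler.

```perl
use strict;
use warnings;

# Hypothetical helper: builds the result hash and warns about truncation only
# when the caller did not pass 'count', i.e. when the default limit cut the
# read short rather than an explicit request.
sub build_result {
    my ($param, $content, $bytes_read, $eof, $max_read_size) = @_;

    my $result = { content => $content, 'bytes-read' => $bytes_read };
    if (!$eof) {
        warn "agent file-read: reached maximum read size: $max_read_size bytes."
            . " output might be truncated.\n"
            if !defined($param->{count});
        $result->{truncated} = 1;
    }
    return $result;
}
```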

>              $result->{truncated} = 1;
>          }
>
