Re: [PATCH container] close #7342: Extend qga file-read with chunked access for large files

From: Fiona Ebner <f.ebner@proxmox.com>
To: Markus Ebner <info@ebner-markus.de>, pve-devel@lists.proxmox.com
Subject: Re: [PATCH container] close #7342: Extend qga file-read with chunked access for large files
Date: Tue, 24 Feb 2026 12:08:38 +0100
Message-ID: <97d497d1-067f-4da3-a3f9-f0ea87a27e54@proxmox.com>
In-Reply-To: <20260223201648.297620-2-info@ebner-markus.de>

As you already noted yourself, the prefix is wrong. It should be
'qemu-server' rather than 'container' or 'qemu'.

On 24.02.26 at 9:47 AM, Markus Ebner wrote:
> The file-read command of the QEMU guest agent previously had several
> practical limitations.

The limitations are in the file-read API endpoint, not in the guest
agent itself.

> It always read a fixed 16 MiB block starting at
> offset 0, making it impossible to retrieve larger files in multiple
> chunks. On busy or resource‑constrained hosts, requests for large files
> often timed out because the agent attempted to read and JSON‑encode the
> entire 16 MiB block at once.

Okay, 16 MiB does not seem like that much, but if you ran into the
issue, it makes sense to be more flexible.

> Binary data was also returned as raw JSON strings with extensive
> escaping, which inflated payload size and caused compatibility issues
> with some JSON parsers.

Could you give a concrete example here? Which JSON parser, and what
compatibility issue? I'd also mention that the 'decode' parameter is
there to improve this.

> This patch extends the file-read method with three new parameters:
> 
> - decode — Controls whether the base64‑encoded data returned by the
>   guest agent should be decoded before being sent back through the API.
>   When disabled, the base64 string is passed through unchanged, which is
>   ideal for binary data and mirrors the existing encode parameter of
>   file-write.
> 
> - offset — Allows reading from an arbitrary byte offset within the
>   file.
> 
> - count — Allows requesting a smaller number of bytes than the
>   internal 16 MiB limit, avoiding unnecessary overhead and reducing
>   timeout risk.

I'd prefer if there were three patches, one for each new parameter.

> 
> With these additions, the behavior now mirrors standard file operations
> (fopen, fseek, fread). Reading beyond EOF returns zero bytes.
> Seek can choose any non-negative position within the file, without
> bounds checking. Reading out of bounds returns 0 bytes.
> This allows conveniently reading an entire file in a robust way:
> while(truncated && content.length != 0) {}

Checking only the truncated flag should be enough. Or what additional
information does the non-zero length check give?
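
To illustrate, here is a minimal client-side sketch that loops on the
truncated flag alone. The file_read sub is a hypothetical stand-in that
mimics the proposed endpoint on an in-memory string (it is not the real
API client, and its EOF handling is simplified), and the sizes are
arbitrary:

```perl
use strict;
use warnings;

# Test data standing in for a guest file: 10000 bytes.
my $FILE = join('', map { chr(65 + $_ % 26) } 0 .. 9999);

# Hypothetical stand-in for the patched file-read endpoint: returns up to
# $count bytes starting at $offset and sets 'truncated' if data remains.
# Simplification: the real eof information comes from the guest agent.
sub file_read {
    my ($offset, $count) = @_;
    my $len = length($FILE);
    my $content = $offset < $len ? substr($FILE, $offset, $count) : '';
    my $result = { content => $content, 'bytes-read' => length($content) };
    $result->{truncated} = 1 if $offset + length($content) < $len;
    return $result;
}

# Read the whole file in chunks, stopping only based on the truncated flag.
sub read_all {
    my ($chunk_size) = @_;
    my ($offset, $data) = (0, '');
    while (1) {
        my $res = file_read($offset, $chunk_size);
        $data .= $res->{content};
        $offset += $res->{'bytes-read'};
        last if !$res->{truncated};
    }
    return $data;
}

my $data = read_all(4096);
print "read " . length($data) . " bytes\n";
```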

> and also enables things like tailing a changing file.
> This makes the file-read command significantly more flexible.
> 
> All parameter additions were done in a backwards-compatible fashion.
> 
> Signed-off-by: Markus Ebner <info@ebner-markus.de>

Thank you for your contribution! A few comments below, but it's looking
quite nice already :)

> ---
>  src/PVE/API2/Qemu/Agent.pm | 54 ++++++++++++++++++++++++++++++++------
>  1 file changed, 46 insertions(+), 8 deletions(-)
> 
> diff --git a/src/PVE/API2/Qemu/Agent.pm b/src/PVE/API2/Qemu/Agent.pm
> index de36ce1e..ccd1dca2 100644
> --- a/src/PVE/API2/Qemu/Agent.pm
> +++ b/src/PVE/API2/Qemu/Agent.pm
> @@ -464,6 +464,28 @@ __PACKAGE__->register_method({
>                  'pve-vmid',
>                  { completion => \&PVE::QemuServer::complete_vmid_running },
>              ),
> +            decode => {
> +                type => 'boolean',
> +                optional => 1,
> +                default => 1,
> +                description =>
> +                    "Data received from the QEMU Guest-Agent is base64 encoded. If this is set to true, the data is decoded."

Style nit: line is longer than 100 columns

> +                    . "Otherwise the content is forwarded with base64 encoding - defaults to true.",

Nit: missing space at the beginning, since the strings are joined.
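
Something along these lines would address both nits. This is only a
sketch with the wording kept from the patch; how the string is split is
my own choice:

```perl
use strict;
use warnings;

# Each continuation starts with a space so the joined sentences stay
# separated, and every source line stays below 100 columns.
my $description =
    "Data received from the QEMU Guest-Agent is base64 encoded."
    . " If this is set to true, the data is decoded."
    . " Otherwise the content is forwarded with base64 encoding - defaults to true.";

print "$description\n";
```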

> +            },
> +            offset => {
> +                type => 'integer',
> +                optional => 1,
> +                default => 0,
> +                description => "Offset to start reading at",
> +            },
> +            count => {
> +                type => 'integer',
> +                optional => 1,
> +                minimum => 0,
> +                maximum => $MAX_READ_SIZE,
> +                default => $MAX_READ_SIZE,
> +                description => "Number of bytes to read.",
> +            },
>              file => {
>                  type => 'string',
>                  description => 'The path to the file',
> @@ -487,6 +509,9 @@ __PACKAGE__->register_method({
>      },
>      code => sub {
>          my ($param) = @_;
> +        my $param_offset = int($param->{offset} // 0);
> +        my $param_decode = $param->{decode} // 1;
> +        my $param_count = int($param->{count} // $MAX_READ_SIZE);

Nit: I'd drop the $param_ prefix

>  
>          my $vmid = $param->{vmid};
>          my $conf = PVE::QemuConfig->load_config($vmid);
> @@ -494,18 +519,33 @@ __PACKAGE__->register_method({
>          my $qgafh =
>              agent_cmd($vmid, $conf, "file-open", { path => $param->{file} }, "can't open file");
>  
> -        my $bytes_left = $MAX_READ_SIZE;
> +        if ($param_offset > 0) {
> +            my $seek = mon_cmd(
> +                $vmid, "guest-file-seek",
> +                handle => $qgafh,
> +                offset => $param_offset,
> +                whence => 'set',
> +            );
> +            check_agent_error($seek, "can't seek to offset position");

We should check the result to see if the seek position is as expected.
I'd rather tell the user "you tried to seek to an invalid position" than
implicitly return 0 bytes. If we seek exactly to EOF, it can still be
fine to return 0 bytes, I guess. We can set $eof = 1 early then.

What do you think?
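
Roughly what I have in mind, as an untested sketch. The helper name
check_seek_position is hypothetical, and it assumes the seek result is a
hash with 'position' and 'eof' keys, as documented for guest-file-seek.
Whether the agent really reports a differing position or eof for an
out-of-range offset would still need to be verified:

```perl
use strict;
use warnings;

# Hypothetical helper: validate the result of guest-file-seek against the
# requested offset. Assumes $seek has 'position' and 'eof' keys.
sub check_seek_position {
    my ($seek, $offset) = @_;

    my $position = $seek->{position};
    die "can't seek to offset $offset: agent returned no position\n"
        if !defined($position);
    die "can't seek to offset $offset: file position is $position\n"
        if $position != $offset;

    # Seeking exactly to EOF is fine, the read loop can then be skipped.
    return $seek->{eof} ? 1 : 0;
}

# Example with a mocked agent reply instead of a real mon_cmd() call.
my $eof = check_seek_position({ position => 4096, eof => 0 }, 4096);
print "eof after seek: $eof\n";
```

In the handler, the return value could initialize $eof before the read
loop.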

> +        }
> +
> +        my $bytes_read = 0;
>          my $eof = 0;
>          my $read_size = 1024 * 1024;
>          my $content = "";
>  
> -        while ($bytes_left > 0 && !$eof) {
> +        while ($bytes_read < $param_count && !$eof) {
> +            my $bytes_left = $param_count - $bytes_read;
> +            my $chunk_size = $bytes_left < $read_size ? $bytes_left : $read_size;
>              my $read =
> -                mon_cmd($vmid, "guest-file-read", handle => $qgafh, count => int($read_size));
> +                mon_cmd($vmid, "guest-file-read", handle => $qgafh, count => int($chunk_size));
>              check_agent_error($read, "can't read from file");
>  
> -            $content .= decode_base64($read->{'buf-b64'});
> -            $bytes_left -= $read->{count};
> +            my $chunk = $read->{'buf-b64'};
> +            $chunk = decode_base64($chunk) if $param_decode;
> +            $content .= $chunk;
> +
> +            $bytes_read += $read->{count};
>              $eof = $read->{eof} // 0;
>          }
>  
> @@ -514,12 +554,10 @@ __PACKAGE__->register_method({
>  
>          my $result = {
>              content => $content,
> -            'bytes-read' => ($MAX_READ_SIZE - $bytes_left),
> +            'bytes-read' => $bytes_read,
>          };
>  
>          if (!$eof) {
> -            warn
> -                "agent file-read: reached maximum read size: $MAX_READ_SIZE bytes. output might be truncated.\n";

I think we should still emit this warning if no read size was explicitly
specified.
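
For example (untested sketch; the helper finish_result is hypothetical
and only mirrors the end of the handler, and it assumes that
$param->{count} stays undefined when the caller omits the parameter):

```perl
use strict;
use warnings;

my $MAX_READ_SIZE = 16 * 1024 * 1024;

# Hypothetical helper mirroring the end of the handler: always flag
# truncation, but only warn if the caller did not pass an explicit count.
sub finish_result {
    my ($param, $content, $bytes_read, $eof) = @_;

    my $result = { content => $content, 'bytes-read' => $bytes_read };
    if (!$eof) {
        warn "agent file-read: reached maximum read size: $MAX_READ_SIZE bytes."
            . " output might be truncated.\n"
            if !defined($param->{count});
        $result->{truncated} = 1;
    }
    return $result;
}

# Explicit count: the result is flagged as truncated, but no warning is emitted.
my $res = finish_result({ count => 1024 }, 'x' x 1024, 1024, 0);
print "truncated: $res->{truncated}\n";
```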

>              $result->{truncated} = 1;
>          }
>
