From: Fiona Ebner <f.ebner@proxmox.com>
To: Markus Ebner <info@ebner-markus.de>, pve-devel@lists.proxmox.com
Subject: Re: [PATCH container] close #7342: Extend qga file-read with chunked access for large files
Date: Tue, 24 Feb 2026 12:08:38 +0100
Message-ID: <97d497d1-067f-4da3-a3f9-f0ea87a27e54@proxmox.com>
In-Reply-To: <20260223201648.297620-2-info@ebner-markus.de>

As you already noted yourself, the prefix is wrong. It should be
'qemu-server' rather than 'container' or 'qemu'.

On 24.02.26 at 9:47 AM, Markus Ebner wrote:
> The file-read command of the QEMU guest agent previously had several
> practical limitations.
The limitations are in the file-read API endpoint, not in the guest
agent itself.
> It always read a fixed 16 MiB block starting at
> offset 0, making it impossible to retrieve larger files in multiple
> chunks. On busy or resource‑constrained hosts, requests for large files
> often timed out because the agent attempted to read and JSON‑encode the
> entire 16 MiB block at once.
Okay, 16 MiB does not seem like that much, but if you ran into the
issue, it makes sense to be more flexible.
> Binary data was also returned as raw JSON strings with extensive
> escaping, which inflated payload size and caused compatibility issues
> with some JSON parsers.
Could you give a concrete example here? Which JSON parser, and what
compatibility issue? I'd also mention that the 'decode' parameter is
meant to improve this.
> This patch extends the file-read method with three new parameters:
>
> - decode — Controls whether the base64‑encoded data returned by the
> guest agent should be decoded before being sent back through the API.
> When disabled, the base64 string is passed through unchanged, which is
> ideal for binary data and mirrors the existing encode parameter of
> file-write.
>
> - offset — Allows reading from an arbitrary byte offset within the
> file.
>
> - count — Allows requesting a smaller number of bytes than the
> internal 16 MiB limit, avoiding unnecessary overhead and reducing
> timeout risk.
I'd prefer if there were three patches, one for each new parameter.
>
> With these additions, the behavior now mirrors standard file operations
> (fopen, fseek, fread). Reading beyond EOF returns zero bytes.
> Seek can choose any non-negative position within the file, without
> bounds checking. Reading out of bounds returns 0 bytes.
> This allows conveniently reading an entire file in a robust way:
> while(truncated && content.length != 0) {}
It should be enough to check only the truncated flag, or what additional
information does a non-zero length give?
> and also enables things like tailing a changing file.
> This makes the file-read command significantly more flexible.
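For illustration, a minimal sketch of a client that reads a whole file
relying on the truncated flag alone. `fake_file_read` is a hypothetical
stand-in for the patched endpoint (returning 'content', 'bytes-read' and
'truncated'), and it assumes that, as with fread(), EOF is only reported
once a read returns fewer bytes than requested:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the patched file-read endpoint with decode => 1.
# Assumes fread()-like EOF detection: 'truncated' is set unless the read
# returned fewer bytes than requested.
sub fake_file_read {
    my ($data, $offset, $count) = @_;
    my $len   = length($data);
    my $avail = $offset < $len ? $len - $offset : 0;
    my $n     = $avail < $count ? $avail : $count;
    my $result = {
        content      => $n > 0 ? substr($data, $offset, $n) : '',
        'bytes-read' => $n,
    };
    $result->{truncated} = 1 if $avail >= $count;
    return $result;
}

# Read a whole file in chunks of $chunk_size bytes (must be > 0),
# stopping as soon as a response is not marked as truncated.
sub read_whole_file {
    my ($read, $chunk_size) = @_;
    my ($offset, $out) = (0, '');
    while (1) {
        my $res = $read->($offset, $chunk_size);
        $out .= $res->{content};
        # Advance by the reported byte count, not length($res->{content}),
        # since with decode => 0 the content would be longer (base64).
        $offset += $res->{'bytes-read'};
        last if !$res->{truncated};
    }
    return $out;
}

my $data = "0123456789";
print read_whole_file(sub { fake_file_read($data, @_) }, 4), "\n"; # prints 0123456789
```

With a chunk size of 5, the last response in this sketch is a 0-byte read
without the truncated flag, so checking the flag alone still ends the loop.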
>
> All parameter additions were done in a backwards-compatible fashion.
>
> Signed-off-by: Markus Ebner <info@ebner-markus.de>
Thank you for your contribution! A few comments below, but it's looking
quite nice already :)
> ---
> src/PVE/API2/Qemu/Agent.pm | 54 ++++++++++++++++++++++++++++++++------
> 1 file changed, 46 insertions(+), 8 deletions(-)
>
> diff --git a/src/PVE/API2/Qemu/Agent.pm b/src/PVE/API2/Qemu/Agent.pm
> index de36ce1e..ccd1dca2 100644
> --- a/src/PVE/API2/Qemu/Agent.pm
> +++ b/src/PVE/API2/Qemu/Agent.pm
> @@ -464,6 +464,28 @@ __PACKAGE__->register_method({
> 'pve-vmid',
> { completion => \&PVE::QemuServer::complete_vmid_running },
> ),
> + decode => {
> + type => 'boolean',
> + optional => 1,
> + default => 1,
> + description =>
> + "Data received from the QEMU Guest-Agent is base64 encoded. If this is set to true, the data is decoded."
Style nit: line is longer than 100 columns
> + . "Otherwise the content is forwarded with base64 encoding - defaults to true.",
Nit: missing space at the beginning, since the strings are joined.
> + },
> + offset => {
> + type => 'integer',
> + optional => 1,
> + default => 0,
> + description => "Offset to start reading at",
> + },
> + count => {
> + type => 'integer',
> + optional => 1,
> + minimum => 0,
> + maximum => $MAX_READ_SIZE,
> + default => $MAX_READ_SIZE,
> + description => "Number of bytes to read.",
> + },
> file => {
> type => 'string',
> description => 'The path to the file',
> @@ -487,6 +509,9 @@ __PACKAGE__->register_method({
> },
> code => sub {
> my ($param) = @_;
> + my $param_offset = int($param->{offset} // 0);
> + my $param_decode = $param->{decode} // 1;
> + my $param_count = int($param->{count} // $MAX_READ_SIZE);
Nit: I'd drop the $param_ prefix
>
> my $vmid = $param->{vmid};
> my $conf = PVE::QemuConfig->load_config($vmid);
> @@ -494,18 +519,33 @@ __PACKAGE__->register_method({
> my $qgafh =
> agent_cmd($vmid, $conf, "file-open", { path => $param->{file} }, "can't open file");
>
> - my $bytes_left = $MAX_READ_SIZE;
> + if ($param_offset > 0) {
> + my $seek = mon_cmd(
> + $vmid, "guest-file-seek",
> + handle => $qgafh,
> + offset => $param_offset,
> + whence => 'set',
> + );
> + check_agent_error($seek, "can't seek to offset position");
We should check the result to see whether the seek position is as expected.
I'd rather tell the user "you tried to seek to an invalid position" than
implicitly return 0 bytes. If we seek exactly to EOF, returning 0 bytes
can still be fine, I guess. We can set $eof=1 early then.
What do you think?
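A minimal sketch of such a check, assuming (per the QGA schema) that the
guest-file-seek reply carries a 'position' field; `check_seek_result` is a
hypothetical helper, not existing code in Agent.pm. Note that POSIX allows
seeking past the end of a file, so a matching position alone does not
prove that the offset lies within the file:

```perl
use strict;
use warnings;

# Hypothetical helper: die unless the agent reports that the file position
# ended up exactly at the requested offset. $seek is assumed to be the
# decoded guest-file-seek reply (a hash with 'position' and 'eof').
sub check_seek_result {
    my ($seek, $offset) = @_;
    my $pos = $seek->{position};
    die "can't seek to offset $offset: agent reports position "
        . ($pos // 'undef') . "\n"
        if !defined($pos) || $pos != $offset;
    return $pos;
}
```

In the patch, this would be called right after check_agent_error($seek, ...).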
> + }
> +
> + my $bytes_read = 0;
> my $eof = 0;
> my $read_size = 1024 * 1024;
> my $content = "";
>
> - while ($bytes_left > 0 && !$eof) {
> + while ($bytes_read < $param_count && !$eof) {
> + my $bytes_left = $param_count - $bytes_read;
> + my $chunk_size = $bytes_left < $read_size ? $bytes_left : $read_size;
> my $read =
> - mon_cmd($vmid, "guest-file-read", handle => $qgafh, count => int($read_size));
> + mon_cmd($vmid, "guest-file-read", handle => $qgafh, count => int($chunk_size));
> check_agent_error($read, "can't read from file");
>
> - $content .= decode_base64($read->{'buf-b64'});
> - $bytes_left -= $read->{count};
> + my $chunk = $read->{'buf-b64'};
> + $chunk = decode_base64($chunk) if $param_decode;
> + $content .= $chunk;
> +
> + $bytes_read += $read->{count};
> $eof = $read->{eof} // 0;
> }
>
> @@ -514,12 +554,10 @@ __PACKAGE__->register_method({
>
> my $result = {
> content => $content,
> - 'bytes-read' => ($MAX_READ_SIZE - $bytes_left),
> + 'bytes-read' => $bytes_read,
> };
>
> if (!$eof) {
> - warn
> - "agent file-read: reached maximum read size: $MAX_READ_SIZE bytes. output might be truncated.\n";
I think we should still warn this if no read size was explicitly specified.
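A sketch of that, assuming 'count' is left undefined when the caller does
not pass it (as the patch's `$param->{count} // $MAX_READ_SIZE` suggests)
and that $MAX_READ_SIZE is the 16 MiB limit mentioned in the commit
message; `finish_result` is a hypothetical helper, not existing code:

```perl
use strict;
use warnings;

my $MAX_READ_SIZE = 16 * 1024 * 1024; # assumed 16 MiB limit

# Hypothetical helper: mark the result as truncated when EOF was not
# reached, but only warn if the caller relied on the default count.
sub finish_result {
    my ($param, $result, $eof) = @_;
    if (!$eof) {
        warn "agent file-read: reached maximum read size: $MAX_READ_SIZE bytes."
            . " output might be truncated.\n"
            if !defined($param->{count});
        $result->{truncated} = 1;
    }
    return $result;
}
```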
> $result->{truncated} = 1;
> }
>