From: Fiona Ebner <f.ebner@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Subject: [pve-devel] Script for bug #2874
Date: Fri, 27 Jan 2023 14:49:20 +0100
Message-ID: <5105f09e-2f15-02ee-dd41-a427a6262a91@proxmox.com>

The attached script allows monitoring the first sector of the boot disk
for running VMs (all or a selection of IDs) for people affected by bug
#2874 [0]. The hope is to pinpoint when the sector gets corrupted, so
that the timing can be correlated with operations that might cause it.
The script also dumps the contents, because it might help to see how the
sector gets corrupted.

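To look at a dumped sector, the base64 line can be turned back into bytes.
The following is only a minimal sketch and not part of the script: the helper
names (decode_sector, has_boot_signature, hexdump) are made up for
illustration, and the sample sector is synthetic. It assumes the dump is one
base64 line of exactly 512 bytes and checks for the classic MBR boot
signature (0x55 0xAA at offset 510), which a disk without an MBR-style first
sector would not have.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use MIME::Base64 qw(decode_base64 encode_base64);

# Decode one base64 line (as dumped by the monitoring script) into raw bytes.
sub decode_sector {
    my ($line) = @_;
    chomp($line);
    my $raw = decode_base64($line);
    die "expected 512 bytes, got " . length($raw) . "\n" if length($raw) != 512;
    return $raw;
}

# True if the last two bytes are the classic MBR boot signature 0x55 0xAA.
sub has_boot_signature {
    my ($raw) = @_;
    return substr($raw, 510, 2) eq "\x55\xAA";
}

# Render the bytes as rows of 16 hex values, each prefixed with its offset.
sub hexdump {
    my ($raw) = @_;
    my @rows;
    for (my $off = 0; $off < length($raw); $off += 16) {
        my $chunk = substr($raw, $off, 16);
        push @rows, sprintf("%04x  %s", $off, join(' ', unpack('(H2)*', $chunk)));
    }
    return join("\n", @rows) . "\n";
}

# Synthetic sample (zeroes plus signature), NOT data from a real VM.
my $sample_line = encode_base64(("\0" x 510) . "\x55\xAA", '');

my $sector = decode_sector($sample_line);
print hexdump($sector);
print has_boot_signature($sector)
    ? "boot signature present\n"
    : "boot signature MISSING\n";
```

Comparing the hexdump of the registered content with the one of the changed
content should show which bytes were overwritten.
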
Note that the script needs to be executed on each node and that you can
specify IDs for VMs not currently on that node, which is useful to catch
migrating VMs (or don't specify any IDs to monitor all running VMs).

The script parses the VM config to determine the boot disk, looks up the
path and uses qemu-img dd and base64 to save the first 512 bytes in a
non-binary format. It dumps the contents when they are first registered
and again whenever they change.

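The read-and-compare step described above can be tried without a PVE node.
This is a rough sketch under assumptions, not the script's actual code: it
reads a plain file directly instead of going through qemu-img dd (so it only
makes sense for raw images or block devices), and the names
read_first_sector and check_sector are hypothetical.

```perl
use strict;
use warnings;

use File::Temp qw(tempfile);
use MIME::Base64 qw(encode_base64);

# Read the first 512 bytes of a raw image or block device and return them
# base64-encoded on a single line.
sub read_first_sector {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "cannot open $path - $!\n";
    my $count = read($fh, my $buf, 512);
    close($fh);
    die "short read on $path\n" if !defined($count) || $count != 512;
    return encode_base64($buf, '');
}

my $seen = {};

# Returns 'registered' on first sight, 'changed' if the content differs from
# the last seen content, and nothing otherwise.
sub check_sector {
    my ($id, $content) = @_;
    if (!defined($seen->{$id})) {
        $seen->{$id} = $content;
        return 'registered';
    }
    if ($content ne $seen->{$id}) {
        $seen->{$id} = $content;
        return 'changed';
    }
    return;
}

# Demo on a temporary file standing in for a disk image.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode($fh);
print $fh "\0" x 1024;
close($fh);

for my $round (1 .. 3) {
    if ($round == 3) {
        # Simulate corruption by overwriting the very first byte.
        open(my $out, '+<:raw', $path) or die "cannot open $path - $!\n";
        print $out "\xff";
        close($out);
    }
    my $state = check_sector('demo', read_first_sector($path)) // 'unchanged';
    print "round $round: $state\n";
}
```

Keeping only the last seen content per ID means each change is reported
once, which matches the "dump whenever they change" behaviour described
above.
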
Example invocations:

# monitor all running VMs, check every 5 minutes
perl monitor-sector-zero.pl --interval 300

# only monitor 166 and 167, check every minute (the default), log to file
perl monitor-sector-zero.pl 166 167 &> /path/to/file

Feedback from users and other developers is highly appreciated!

[0]: https://bugzilla.proxmox.com/show_bug.cgi?id=2874

From: Fiona Ebner <f.ebner@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: Re: [pve-devel] Script for bug #2874
Date: Fri, 27 Jan 2023 14:57:38 +0100
Message-ID: <e6f641f6-14cc-29ee-fd59-a8dcb7209d38@proxmox.com>
In-Reply-To: <5105f09e-2f15-02ee-dd41-a427a6262a91@proxmox.com>

Well, apparently the attachment got removed. So here it is:

#!/usr/bin/perl

use strict;
use warnings;

use Getopt::Long qw(GetOptions);
use POSIX qw(strftime);

use PVE::Cluster;
use PVE::QemuConfig;
use PVE::QemuServer::Drive qw(drive_is_cdrom is_valid_drivename parse_drive);
use PVE::QemuServer::Helpers;
use PVE::Storage;
use PVE::Tools; # for run_command(), used in the main loop

# START OF HELPER FUNCTIONS

# Print a message prefixed with a timestamp and, if given, VM ID and volume ID.
sub pprint {
    my ($msg, $vmid, $volid) = @_;

    chomp($msg);

    my $time = strftime("%F %H:%M:%S", localtime);

    my $time_prefix = "$time - ";
    my $vmid_prefix = $vmid ? "$vmid - " : '';
    my $volid_prefix = $volid ? "$volid - " : '';

    print "$time_prefix$vmid_prefix$volid_prefix$msg\n";
}

my $fixed_vmlist;

# Return the IDs given on the command line, or else all QEMU VMs in the cluster.
sub get_vmids {
    return $fixed_vmlist if $fixed_vmlist;

    my $list = [];

    my $vmlist = PVE::Cluster::get_vmlist();

    for my $vmid (keys $vmlist->{ids}->%*) {
        next if $vmlist->{ids}->{$vmid}->{type} ne 'qemu';
        push $list->@*, $vmid;
    }

    return $list;
}

my $running = {};

# Refresh the running state of a VM on this node and log state transitions.
sub update_running {
    my ($vmid) = @_;

    my $old_running = $running->{$vmid};

    $running->{$vmid} = eval { PVE::QemuServer::Helpers::vm_running_locally($vmid); };
    pprint("could not check if VM is running - $@", $vmid) if $@;

    pprint("stop monitoring - not running", $vmid) if !$running->{$vmid} && $old_running;
    pprint("start monitoring - now running", $vmid) if $running->{$vmid} && !$old_running;

    return $running->{$vmid};
}

# Return the volume ID of the first boot device that is a real disk.
sub get_bootdisk_volid {
    my ($vmid) = @_;

    my $conf = PVE::QemuConfig->load_config($vmid);

    my $bootdisks = PVE::QemuServer::Drive::get_bootdisks($conf);

    for my $bootdisk ($bootdisks->@*) {
        next if !is_valid_drivename($bootdisk);
        next if !$conf->{$bootdisk};
        my $drive = parse_drive($bootdisk, $conf->{$bootdisk});
        next if !defined($drive);
        next if drive_is_cdrom($drive);
        my $volid = $drive->{file};
        next if !$volid;

        return $volid;
    }

    die "no bootdisk found in config\n";
}

my $errors = {};

# A VM is skipped after three consecutive errors.
sub should_skip {
    my ($vmid) = @_;

    return $errors->{$vmid} >= 3;
}

# END OF HELPER FUNCTIONS

my $interval = 60;

GetOptions('interval=i' => \$interval);

if (scalar(@ARGV)) {
    $fixed_vmlist = [@ARGV];
    pprint("monitoring VMs " . join(',', sort {$a <=> $b} $fixed_vmlist->@*));
} else {
    pprint("no list of VMIDs provided - monitoring all VMs");
}

my $contents = {};

while (1) {
    PVE::Cluster::cfs_update();
    my $storecfg = PVE::Storage::config();

    my $vmids = get_vmids();

    for my $vmid ($vmids->@*) {
        $errors->{$vmid} //= 0;
        next if should_skip($vmid);
        next if !update_running($vmid);

        eval {
            my $volid = get_bootdisk_volid($vmid);
            my $path = PVE::Storage::path($storecfg, $volid);

            # Read the first sector and encode it as a single base64 line.
            my $cmd = [
                ['qemu-img', 'dd', 'bs=512', 'count=1', "if=$path"],
                ['base64', '--wrap', '0'],
            ];
            my $content;
            PVE::Tools::run_command($cmd, outfunc => sub { $content = shift });
            die "no output\n" if !$content;

            if (!defined($contents->{$vmid})) {
                pprint("registered content for first sector", $vmid, $volid);
                print "$content\n";
                $contents->{$vmid} //= $content;
            }

            if ($content ne $contents->{$vmid}) {
                pprint("detected changed content for first sector!", $vmid, $volid);
                print "$content\n";
                $contents->{$vmid} = $content;
            }
        };
        if (my $err = $@) {
            pprint("can't determine content for first sector - $err", $vmid);
            $errors->{$vmid}++;
            pprint("too many errors - skipping from now on", $vmid) if should_skip($vmid);
        } else {
            # Only consecutive errors count, so reset on success.
            $errors->{$vmid} = 0;
        }
    }

    sleep $interval;
}
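
For correlating the timing afterwards, the change events can be pulled out
of a saved log. The following is a small sketch, assuming the line format
printed by pprint() above ("time - vmid - volid - message"); the function
name find_changes and the sample lines (including the volume ID) are made up
for illustration.

```perl
use strict;
use warnings;

# Return one hash (time, vmid, volid) for every "detected changed content"
# line; the base64 dump lines in between are ignored.
sub find_changes {
    my (@lines) = @_;

    my @events;
    for my $line (@lines) {
        if ($line =~ m/^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) - (\d+) - (\S+) - detected changed content/) {
            push @events, { time => $1, vmid => $2, volid => $3 };
        }
    }

    return @events;
}

# Made-up sample log, NOT output from a real run.
my @sample = (
    "2023-01-27 15:00:00 - 166 - local-lvm:vm-166-disk-0 - registered content for first sector",
    "AAAA",
    "2023-01-27 15:05:00 - 166 - local-lvm:vm-166-disk-0 - detected changed content for first sector!",
    "BBBB",
);

for my $event (find_changes(@sample)) {
    print "$event->{time}: VM $event->{vmid} ($event->{volid})\n";
}
```

The resulting timestamps could then be compared against the task log for
backups, migrations or similar operations around that time.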