From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: PVE development discussion <pve-devel@pve.proxmox.com>
Subject: [pve-devel] Plan for (invasive) shrink of pve-manager git repository
Date: Fri, 26 May 2023 11:45:15 +0200
Message-ID: <e220f733-e96d-f83d-5afb-606199bc6fdc@proxmox.com>

Hi all!

What follows is a heads-up about the plan to make our pve-manager git
repository easier to work with by rewriting its history to filter out huge
artefacts.

This will only affect developers, nothing in the current pve-manager Debian
package will change.


# Background

Our current pve-manager git repository is huge (> 500 MB), mostly because its
git history contains several huge copies of ExtJS, both as ZIP archives and in
extracted form.

Since Q1 of 2017 (before Proxmox VE 5), those huge artefacts are not used
anymore, as we split the parts still in use, like the ExtJS GPL source code,
out into a separate repo, without any ZIP archives.  But git, being git and
providing a full history of every change, still needs to hold copies of those
artefacts in its content-addressable object store, and one cannot really mask
them in any way that is ergonomic for development.


# Proposed Solution

I'll use the git filter-repo [0] tool, a replacement for filter-branch with
better UX and less potential for getting it wrong, to rewrite the history,
filtering out any problematic artefact or directory.

For this I'll use the following file list:

    www/ext6
    www/ext5
    www/ext4
    www/touch
    po
    glob:*.zip

used as an inverted match via the following command:

    git filter-repo --invert-paths \
        --paths-from-file ~/pve-manager-inverted-filter-paths
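
A minimal sketch of how the path list and the command fit together.  The
temporary file location and the REPO variable are assumptions made for
illustration only; the entries are the ones listed above, one per line.  As far
as the git filter-repo documentation describes it, plain entries match a path
and "glob:" entries are glob patterns.

```shell
#!/bin/sh
set -eu

# Hypothetical location for the path list; any readable file works.
paths_file="${TMPDIR:-/tmp}/pve-manager-inverted-filter-paths"

# One entry per line, exactly the entries from the list above.
cat > "$paths_file" <<'EOF'
www/ext6
www/ext5
www/ext4
www/touch
po
glob:*.zip
EOF

# REPO is an assumed variable pointing at a fresh clone of the repository.
# Note: git filter-repo refuses to run on a repo that does not look like a
# fresh clone unless it is forced, so always work on a throwaway clone.
if [ -n "${REPO:-}" ] && command -v git-filter-repo >/dev/null 2>&1; then
    git -C "$REPO" filter-repo --invert-paths --paths-from-file "$paths_file"
else
    echo "would run: git filter-repo --invert-paths --paths-from-file $paths_file"
fi
```

With --invert-paths, everything matching the list is dropped from history and
everything else is kept.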

Then, I'd rename the current "pve-manager.git" hosted at git.proxmox.com to
"pve-manager-legacy.git", so it will still be available as a reference for
ancient history, providing the possibility to build pre-PVE 5 pve-manager
packages (for whatever reason one would want or need to do that).

A new repo, with the same name "pve-manager.git", would then get created and
the now cleaned up git repo pushed to it.


# Result

The result of the above command, measured by .git disk usage:

    Before:  551 MB
    After:    26 MB

So a huge reduction.
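
For anyone wanting to check this on their own clone, here is a small sketch of
one way to measure the git directory size.  The helper name git_dir_size_mb is
hypothetical, and the demo runs against an empty throwaway repository rather
than pve-manager itself.

```shell
#!/bin/sh
set -eu

# git_dir_size_mb REPO: print the size of REPO's git directory in MB.
git_dir_size_mb() {
    du -sm "$(git -C "$1" rev-parse --absolute-git-dir)" | cut -f1
}

# Demo on a throwaway repository; pass the path of a real clone instead.
demo=$(mktemp -d)
git init -q "$demo"
echo "git dir size: $(git_dir_size_mb "$demo") MB"
```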


# Fallout

This naturally has some fallout for developers currently working on patch
series, similar to any force-push (which we normally avoid at all costs).

Rebasing won't work IIUC, but as the source file layout won't change, you can
simply use "git cherry-pick <rev-range>" if you have the pre-filter and
post-filter remotes & branches in the same git repo.  Otherwise, one can
always use "git format-patch -o ~/patches/ <rev-range>" in the old repo to
export patches cleanly, and then use "git am -3 ~/patches/*.patch" in the new
repo.
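
The format-patch route can be sketched end to end with two throwaway
repositories.  Everything here (directory names, file names, commit messages,
the dummy identity) is made up for the demo: "old" stands in for the unfiltered
history, "new" for the filtered one with the same file layout but different
commit hashes.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)

# Dummy identity so the throwaway commits work without any git config.
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.invalid
export GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.invalid

# "old" repo: history still contains a big artefact.
git init -q "$work/old"
cd "$work/old"
echo 'base' > Makefile
echo 'blob' > big.zip
git add .
git commit -q -m 'initial import'

# A local change that still needs to be carried over.
echo 'feature' >> Makefile
git commit -q -a -m 'add feature'

# "new" repo: same source layout, artefact gone, unrelated commit hashes.
git init -q "$work/new"
cd "$work/new"
echo 'base' > Makefile
git add .
git commit -q -m 'initial import'

# Export the pending commit from the old repo ...
cd "$work/old"
git format-patch -q -o "$work/patches" HEAD~1..HEAD

# ... and apply it in the new one, with 3-way merge as fallback.
cd "$work/new"
git am -q -3 "$work/patches"/*.patch

git log --oneline
```

This works because the patches only carry diffs and metadata, not the parent
commit hashes, so they do not care that the history underneath was rewritten.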

Note that git commit hash references inside commit messages of pve-manager will
get rewritten, so one won't notice anything there.  Commit references from other
repos are naturally untouched, but pve-manager being a leaf package means that
there won't be that many in other repos.

I'll save a copy of the old -> new commit reference map that git filter-repo
produces, so that we have full transparency.
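
A minimal sketch of how such a map could be used to translate an old commit
reference, assuming the two-column "old new" text format with a header line
that git filter-repo is documented to write to .git/filter-repo/commit-map.
The helper name translate_commit and the hashes below are made up for the demo.

```shell
#!/bin/sh
set -eu

# translate_commit MAP OLD_HASH: print the new hash recorded for OLD_HASH.
# OLD_HASH may be abbreviated; every entry starting with it is printed.
# Exits non-zero if nothing matches.
translate_commit() {
    awk -v old="$2" '
        NR > 1 && index($1, old) == 1 { print $2; found = 1 }
        END { exit !found }
    ' "$1"
}

# Synthetic map with made-up hashes, standing in for the real file.
map=$(mktemp)
cat > "$map" <<'EOF'
old                                      new
1111111111111111111111111111111111111111 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
2222222222222222222222222222222222222222 bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
EOF

translate_commit "$map" 22222222
```

This is mainly useful for commit references found in other repos or in old
mails, which the rewrite itself does not touch.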


# Date of Change

I'll probably carry out the above tomorrow, Saturday 2023-05-27, sometime
between 10:00 CEST and day's end, so I'm writing today as a short heads-up.

For the record: this plan was discussed with Dietmar Maurer and Dominik, and as
said, this is "only" affecting developers.  And yes, it is a bit of a nuisance
and generates some churn, but we have talked about doing this every other year,
and it won't get better on its own, so let's just finally go for it.

cheers
 Thomas



