From: Maximiliano Sandoval
To: pve-devel@lists.proxmox.com
Date: Fri, 1 Dec 2023 15:25:27 +0100
Message-Id: <20231201142527.213620-1-m.sandoval@proxmox.com>
Subject: [pve-devel] [PATCH proxmox-i18n] use xgettext to extract translatable strings

xgettext is a robust tool to extract translatable strings from source code.

Using msgcat for concatenating pot files is not recommended, hence we also
switch to xgettext there. msgcat also added garbage to the output when the
input contained comments.

What do we get for free:

- It de-escapes strings. There are 3 cases in our code base where
  single-quoted strings were used and their `'` had to be escaped; these
  were not de-escaped properly when presented to translators. This is one
  such example:

```diff
 #: proxmox-widget-toolkit/src/panel/EmailRecipientPanel.js:39
-msgid "The notification will be sent to the user\\'s configured mail address"
+#, fuzzy
+msgid "The notification will be sent to the user's configured mail address"
 msgstr "La notificación sera enviada a el correo configurado del usuario"
```

- xgettext can detect when strings use a certain style of substitution and
  flags them accordingly. I was not able to pin down the exact conditions,
  and it only affects a single string in the entire code base:

```diff
 #: proxmox-widget-toolkit/src/Utils.js:995
+#, javascript-format
 msgid "{0}% of {1}"
 msgstr "{0}% de {1}"
```

- Correct POT-Creation-Date; note how the new one matches the format of
  PO-Revision-Date:

```diff
@@ -7,7 +7,7 @@ msgid ""
 msgstr ""
 "Project-Id-Version: proxmox translations\n"
 "Report-Msgid-Bugs-To: \n"
-"POT-Creation-Date: Wed Nov 22 18:17:30 2023\n"
+"POT-Creation-Date: 2023-12-01 11:25+0100\n"
 "PO-Revision-Date: 2023-11-27 16:43+0100\n"
 "Last-Translator: Maximiliano Sandoval \n"
 "Language-Team: Spanish\n"
```

- Extraction of strings using ngettext, pgettext, etc.
  Even if we don't have JS wrappers for these at the moment, they are
  critical for providing good-quality translations and could be added in the
  future.

- We can extract comments from the source code with `xgettext -c`. Newly
  added comments won't mark strings as fuzzy, but they can provide helpful
  context to translators. Comments are additive: if, for example, two
  sources contain the same string with different comments and it appears a
  third time without a comment, all three sources and both comments will be
  shown to translators.

  These are a few examples that could be implemented in our code base:

  It is not clear whether "Prune Options" prunes the options or configures
  pruning.

```js
// TRANSLATORS: Opens the panel that allows configuring how Pruning works
let s = gettext("Prune Options");
```

  Explaining a concept or giving its expanded name can help translators
  decide what gender a word takes in their language.

```js
// TRANSLATORS: TOTP stands for Time-based one-time password
let s = gettext("Add a TOTP login factor");
```

  Some strings are not marked for translation to avoid translating certain
  parts of them; this could be changed:

```diff
-fieldLabel: 'Crush Rule', // do not localize
+// TRANSLATORS: Do not translate 'Crush', it's a proper name
+fieldLabel: gettext('Crush Rule'),
```

  Or simply to give more context when substitutions are involved:

```js
// TRANSLATORS: For example 'Join CLUSTER_NAME'
return Ext.String.format(gettext('Join {0}'), `'${cn}'`);
```

Cons:

- In total, 3 translations were marked as fuzzy. Translators will have to
  review them and mark them as translated again.
- With -c, xgettext can't tell whether the comment above a string is useful
  for translators. The common practice is to add a `TRANSLATORS:` tag to
  these comments.
- The reordering of sources for each message will create an unnecessarily
  massive (yet ultimately harmless) diff (approx. 50k insertions(+), 50k
  deletions(-)).
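To illustrate the de-escaping point from the list above: the old extractor copied the raw text between the quotes, escapes included. The sketch below is a simplified JavaScript port of the regular expression from the removed jsgettext.pl (Perl's possessive quantifiers are dropped, since JavaScript lacks them), plus a hypothetical `deEscape` helper standing in for the unescaping xgettext performs. Both function names are made up for illustration, and the helper only covers a few escapes, not the full JavaScript string grammar.

```javascript
// Simplified port of the jsgettext.pl extraction regex (possessive
// quantifiers removed). Group 1: double-quoted text, group 2: single-quoted.
const OLD_GETTEXT_RE = /\bgettext\s*\((?:"((?:[^"\\]|\\.)*)"|'((?:[^'\\]|\\.)*)')\)/g;

// Return the raw text between the quotes, escapes untouched, as the old
// script did.
function extractRaw(line) {
    return [...line.matchAll(OLD_GETTEXT_RE)].map((m) => m[1] ?? m[2]);
}

// Hypothetical minimal unescape: \n and \t become control characters, any
// other escaped character (e.g. \' or \\) becomes itself. Unicode and hex
// escapes are NOT handled.
function deEscape(raw) {
    return raw.replace(/\\(.)/g, (_, ch) => ({ n: "\n", t: "\t" })[ch] ?? ch);
}

const src = "text: gettext('The notification will be sent to the user\\'s configured mail address'),";
const raw = extractRaw(src)[0];
console.log(raw);           // still contains "user\'s"
console.log(deEscape(raw)); // contains "user's"
```

The point of the comparison is that a regex alone can locate the string literal but does not interpret it; a real parser (as in xgettext) is needed to hand translators the string the user actually sees.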
Signed-off-by: Maximiliano Sandoval

Thomas: Should this be merged, please run `make do_update` and commit the
changes to each .po{,t} file. I am not sure if it is possible to even send
an email with over 100k lines of text.
---
 Makefile     |  10 +++-
 jsgettext.pl | 135 ---------------------------------------------------
 2 files changed, 8 insertions(+), 137 deletions(-)
 delete mode 100755 jsgettext.pl

diff --git a/Makefile b/Makefile
index 1d7af6e..4776e02 100644
--- a/Makefile
+++ b/Makefile
@@ -97,7 +97,13 @@ pbs-lang-%.js: %.po
 # parameter 1 is the name
 # parameter 2 is the directory
 define potupdate
-	./jsgettext.pl -p "$(1) $(shell cd $(2);git rev-parse HEAD)" -o $(1).pot $(2)
+	find . -iname "*.js" -path "./$(2)*" | xargs xgettext -c -s \
+		--from-code="UTF-8" \
+		--package-name="$(1)" \
+		--package-version="$(shell cd $(2);git rev-parse HEAD)" \
+		--msgid-bugs-address="" \
+		--copyright-holder="Copyright (C) Proxmox Server Solutions GmbH & the translation contributors." \
+		--output="$(1)".pot
 endef
 
 .PHONY: update update_pot do_update
@@ -124,7 +130,7 @@ init-%.po: messages.pot
 
 .INTERMEDIATE: messages.pot
 messages.pot: proxmox-widget-toolkit.pot proxmox-mailgateway.pot pve-manager.pot proxmox-backup.pot
-	msgcat $^ > $@
+	xgettext $^ --msgid-bugs-address="" -o $@
 
 .PHONY: distclean
 distclean: clean
diff --git a/jsgettext.pl b/jsgettext.pl
deleted file mode 100755
index 7f758fd..0000000
--- a/jsgettext.pl
+++ /dev/null
@@ -1,135 +0,0 @@
-#!/usr/bin/perl
-
-use strict;
-use warnings;
-
-use Encode;
-use Getopt::Long;
-use Locale::PO;
-use Time::Local;
-
-my $options = {};
-GetOptions($options, 'o=s', 'b=s', 'p=s') or die "unable to parse options\n";
-
-my $dirs = [@ARGV];
-
-die "no directory specified\n" if !scalar(@$dirs);
-
-foreach my $dir (@$dirs) {
-    die "no such directory '$dir'\n" if ! -d $dir;
-}
-
-my $projectId = $options->{p} || die "missing project ID\n";
-
-my $basehref = {};
-if (my $base = $options->{b}) {
-    my $aref = Locale::PO->load_file_asarray($base) ||
-        die "unable to load '$base'\n";
-
-    my $charset;
-    my $hpo = $aref->[0] || die "no header";
-    my $header = $hpo->dequote($hpo->msgstr);
-    if ($header =~ m|^Content-Type:\s+text/plain;\s+charset=(\S+)$|im) {
-        $charset = $1;
-    } else {
-        die "unable to get charset\n" if !$charset;
-    }
-
-    foreach my $po (@$aref) {
-        my $qmsgid = decode($charset, $po->msgid);
-        my $msgid = $po->dequote($qmsgid);
-        $basehref->{$msgid} = $po;
-    }
-}
-
-sub find_js_sources {
-    my ($base_dirs) = @_;
-
-    my $find_cmd = 'find ';
-    # shell quote heuristic, with the (here safe) assumption that the dirs don't contain single-quotes
-    $find_cmd .= join(' ', map { "'$_'" } $base_dirs->@*);
-    $find_cmd .= ' -name "*.js"';
-    open(my $find_cmd_output, '-|', "$find_cmd | sort") or die "Failed to execute command: $!";
-
-    my $sources = [];
-    while (my $line = <$find_cmd_output>) {
-        chomp $line;
-        print "F: $line\n";
-        push @$sources, $line;
-    }
-    close($find_cmd_output);
-
-    return $sources;
-}
-
-my $header = <<'__EOD';
-Proxmox message catalog.
-
-Copyright (C) Proxmox Server Solutions GmbH
-
-This file is free software: you can redistribute it and/or modify it under the terms of the GNU
-Affero General Public License as published by the Free Software Foundation, either version 3 of the
-License, or (at your option) any later version.
--- Proxmox Support Team
-__EOD
-
-my $ctime = scalar localtime;
-
-my $href = {
-    '' => Locale::PO->new(
-        -msgid => '',
-        -comment => $header,
-        -fuzzy => 1,
-        -msgstr => "Project-Id-Version: $projectId\n"
-            ."Report-Msgid-Bugs-To: \n"
-            ."POT-Creation-Date: $ctime\n"
-            ."PO-Revision-Date: YEAR-MO-DA HO:MI +ZONE\n"
-            ."Last-Translator: FULL NAME \n"
-            ."Language-Team: LANGUAGE \n"
-            ."MIME-Version: 1.0\n"
-            ."Content-Type: text/plain; charset=UTF-8\n"
-            ."Content-Transfer-Encoding: 8bit\n",
-    ),
-};
-
-sub extract_msg {
-    my ($filename, $linenr, $line) = @_;
-
-    my $count = 0;
-
-    while(1) {
-        my $text;
-        if ($line =~ m/\bgettext\s*\((("((?:[^"\\]++|\\.)*+)")|('((?:[^'\\]++|\\.)*+)'))\)/g) {
-            $text = $3 || $5;
-        }
-        last if !$text;
-        return if $basehref->{$text};
-        $count++;
-
-        my $ref = "$filename:$linenr";
-
-        if (my $po = $href->{$text}) {
-            $po->reference($po->reference() . " $ref");
-        } else {
-            $href->{$text} = Locale::PO->new(-msgid=> $text, -reference=> $ref, -msgstr=> '');
-        }
-    }
-    die "can't extract gettext message in '$filename' line $linenr\n" if !$count;
-    return;
-}
-
-my $sources = find_js_sources($dirs);
-
-foreach my $s (@$sources) {
-    open(my $SRC_FH, '<', $s) || die "unable to open file '$s' - $!\n";
-    while(defined(my $line = <$SRC_FH>)) {
-        if ($line =~ m/gettext\s*\(/ && $line !~ m/^\s*function gettext/) {
-            extract_msg($s, $., $line);
-        }
-    }
-    close($SRC_FH);
-}
-
-my $filename = $options->{o} // "messages.pot";
-Locale::PO->save_file_fromhash($filename, $href);
-- 
2.39.2
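The removed script stamped `POT-Creation-Date` with Perl's `scalar localtime`, which yields values such as `Wed Nov 22 18:17:30 2023`, while xgettext writes the `YYYY-MM-DD HH:MM+ZZZZ` form that `PO-Revision-Date` already uses. A minimal JavaScript sketch of that format, assuming local time with a numeric UTC offset; the helper name `poDate` is hypothetical and not part of this patch or of gettext.

```javascript
// Hypothetical helper: format a Date as "YYYY-MM-DD HH:MM+ZZZZ" in local
// time, the form used by POT-Creation-Date / PO-Revision-Date.
function poDate(d) {
    const pad = (n) => String(n).padStart(2, "0");
    // getTimezoneOffset() returns minutes *behind* UTC, so invert the sign
    // to get the conventional "+0100" style offset.
    const offset = -d.getTimezoneOffset();
    const sign = offset >= 0 ? "+" : "-";
    const abs = Math.abs(offset);
    return `${d.getFullYear()}-${pad(d.getMonth() + 1)}-${pad(d.getDate())} `
        + `${pad(d.getHours())}:${pad(d.getMinutes())}`
        + `${sign}${pad(Math.floor(abs / 60))}${pad(abs % 60)}`;
}

console.log(poDate(new Date()));
```

Unlike the `localtime` string, this form sorts lexically and carries its time zone, which is why the two header dates are now directly comparable.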