From: Maximiliano Sandoval
To: pve-devel@lists.proxmox.com
Date: Thu, 7 Dec 2023 09:42:30 +0100
Message-Id: <20231207084230.62608-1-m.sandoval@proxmox.com>
X-Mailer: git-send-email 2.39.2
Subject: [pve-devel] [PATCH proxmox-i18n v2] use xgettext to extract translatable strings

xgettext is a robust tool for extracting translatable strings from source
code. Using msgcat to concatenate pot files is not recommended, and it also
added garbage when there were comments, hence we switch to xgettext for that
step as well.

What we get for free:

- It de-escapes strings. There are 3 cases in our code base where a
  single-quoted string contains a `'` that had to be escaped; these were not
  de-escaped properly when presented to translators. This is one such example:

```diff
 #: proxmox-widget-toolkit/src/panel/EmailRecipientPanel.js:39
-msgid "The notification will be sent to the user\\'s configured mail address"
+#, fuzzy
+msgid "The notification will be sent to the user's configured mail address"
 msgstr "La notificación sera enviada a el correo configurado del usuario"
```

- xgettext can detect when strings use a certain style of substitutions, but I
  was not able to determine the exact conditions, and it only affects a single
  string in the entire code base.

```diff
 #: proxmox-widget-toolkit/src/Utils.js:995
+#, javascript-format
 msgid "{0}% of {1}"
 msgstr "{0}% de {1}"
```

- A correct POT-Creation-Date. Note how the new one matches the format of
  PO-Revision-Date.

```diff
@@ -7,7 +7,7 @@ msgid ""
 msgstr ""
 "Project-Id-Version: proxmox translations\n"
 "Report-Msgid-Bugs-To: \n"
-"POT-Creation-Date: Wed Nov 22 18:17:30 2023\n"
+"POT-Creation-Date: 2023-12-01 11:25+0100\n"
 "PO-Revision-Date: 2023-11-27 16:43+0100\n"
 "Last-Translator: Maximiliano Sandoval \n"
 "Language-Team: Spanish\n"
```

- Extraction of strings using ngettext, pgettext, etc.
  Even if we don't have JS wrappers for these at the moment, they are critical
  for providing good-quality translations and could be added in the future.

- We can extract comments from the source code with `xgettext -c TAG`. Code
  comments on the line above a `gettext` call that start with `TRANSLATORS`
  will be added to the po files. Newly added comments won't mark strings as
  fuzzy but can provide helpful context to translators. Comments are additive:
  if, for example, two sources contain the same string with different comments
  and it appears a third time without a comment, all three sources and both
  comments will be shown to translators.

These are a few examples that could be implemented in our codebase:

It is not clear whether "Prune Options" prunes the options or configures
pruning.

```js
// TRANSLATORS: Opens the panel that allows configuring how Pruning works
let s = gettext("Prune Options");
```

Adding a source for a concept or its expanded name can help translators decide
what the gender of a word is in their language.

```js
// TRANSLATORS: TOTP stands for Time-based one-time password
let s = gettext("Add a TOTP login factor");
```

Some strings are not marked for translation to avoid translating certain parts
of them. This is a change that could be made:

```diff
-fieldLabel: 'Crush Rule', // do not localize
+// TRANSLATORS: Do not translate 'Crush', it's a proper name
+fieldLabel: gettext('Crush Rule'),
```

Or simply to give more context when substitutions are involved.

```js
// TRANSLATORS: For example 'Join CLUSTER_NAME'
return Ext.String.format(gettext('Join {0}'), `'${cn}'`);
```

Cons:

- In total 3 translations were marked as fuzzy. Translators will have to
  review and mark them as translated again.
- The reordering of sources for each msgstr will create an unnecessarily
  massive (yet ultimately harmless) diff (approx. 50k insertions(+), 50k
  deletions(-)).
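
As a rough illustration of the ngettext/pgettext point above, wrappers along
these lines could be added later. This is only a sketch under assumptions: the
`catalog` object, its translations and the `pluralIndex` helper are
hypothetical and not taken from the existing code. The one borrowed convention
is joining context and msgid with U+0004, which is the separator GNU gettext
uses in compiled catalogs.

```javascript
// Hypothetical in-memory catalog; keys and translations are made up for this example.
// Each entry is an array of forms: index 0 is the singular, index 1 the plural.
const catalog = {
    'Prune Options': ['Opciones de limpieza'],
    // Context and msgid are joined with U+0004 (the GNU gettext context separator).
    'verb\u0004Join': ['Unirse'],
    '{0} node': ['{0} nodo', '{0} nodos'],
};

// Assumed plural rule for a language with two forms, i.e. plural=(n != 1).
const pluralIndex = (n) => (n !== 1 ? 1 : 0);

// Plain lookup; falls back to the untranslated msgid.
function gettext(msgid) {
    const entry = catalog[msgid];
    return entry ? entry[0] : msgid;
}

// Lookup with a disambiguating context; falls back to the untranslated msgid.
function pgettext(context, msgid) {
    const entry = catalog[`${context}\u0004${msgid}`];
    return entry ? entry[0] : msgid;
}

// Plural-aware lookup; falls back to the English singular/plural pair.
function ngettext(singular, plural, n) {
    const entry = catalog[singular];
    if (!entry) {
        return n === 1 ? singular : plural;
    }
    const form = entry[pluralIndex(n)];
    return form !== undefined ? form : entry[0];
}
```

Whether xgettext picks up such calls without extra `--keyword` options depends
on its default keyword list for JavaScript, which should be checked against the
xgettext documentation before relying on it.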

Signed-off-by: Maximiliano Sandoval
---
Differences from v1:
 - Use `find -name` rather than `find -iname`
 - Only extract comments starting with TRANSLATORS. It seems it is not
   possible to specify multiple tags.

 Makefile     |  11 ++++-
 jsgettext.pl | 135 ---------------------------------------------------
 2 files changed, 9 insertions(+), 137 deletions(-)
 delete mode 100755 jsgettext.pl

diff --git a/Makefile b/Makefile
index 1d7af6e..cee10cf 100644
--- a/Makefile
+++ b/Makefile
@@ -97,7 +97,14 @@ pbs-lang-%.js: %.po
 # parameter 1 is the name
 # parameter 2 is the directory
 define potupdate
-	./jsgettext.pl -p "$(1) $(shell cd $(2);git rev-parse HEAD)" -o $(1).pot $(2)
+	find . -name "*.js" -path "./$(2)*" | xargs xgettext -s \
+	  --add-comments=TRANSLATORS \
+	  --from-code="UTF-8" \
+	  --package-name="$(1)" \
+	  --package-version="$(shell cd $(2);git rev-parse HEAD)" \
+	  --msgid-bugs-address="" \
+	  --copyright-holder="Copyright (C) Proxmox Server Solutions GmbH & the translation contributors." \
+	  --output="$(1)".pot
 endef
 
 .PHONY: update update_pot do_update
@@ -124,7 +131,7 @@ init-%.po: messages.pot
 
 .INTERMEDIATE: messages.pot
 messages.pot: proxmox-widget-toolkit.pot proxmox-mailgateway.pot pve-manager.pot proxmox-backup.pot
-	msgcat $^ > $@
+	xgettext $^ --msgid-bugs-address="" -o $@
 
 .PHONY: distclean
 distclean: clean
diff --git a/jsgettext.pl b/jsgettext.pl
deleted file mode 100755
index 7f758fd..0000000
--- a/jsgettext.pl
+++ /dev/null
@@ -1,135 +0,0 @@
-#!/usr/bin/perl
-
-use strict;
-use warnings;
-
-use Encode;
-use Getopt::Long;
-use Locale::PO;
-use Time::Local;
-
-my $options = {};
-GetOptions($options, 'o=s', 'b=s', 'p=s') or die "unable to parse options\n";
-
-my $dirs = [@ARGV];
-
-die "no directory specified\n" if !scalar(@$dirs);
-
-foreach my $dir (@$dirs) {
-    die "no such directory '$dir'\n" if ! -d $dir;
-}
-
-my $projectId = $options->{p} || die "missing project ID\n";
-
-my $basehref = {};
-if (my $base = $options->{b}) {
-    my $aref = Locale::PO->load_file_asarray($base) ||
-        die "unable to load '$base'\n";
-
-    my $charset;
-    my $hpo = $aref->[0] || die "no header";
-    my $header = $hpo->dequote($hpo->msgstr);
-    if ($header =~ m|^Content-Type:\s+text/plain;\s+charset=(\S+)$|im) {
-        $charset = $1;
-    } else {
-        die "unable to get charset\n" if !$charset;
-    }
-
-    foreach my $po (@$aref) {
-        my $qmsgid = decode($charset, $po->msgid);
-        my $msgid = $po->dequote($qmsgid);
-        $basehref->{$msgid} = $po;
-    }
-}
-
-sub find_js_sources {
-    my ($base_dirs) = @_;
-
-    my $find_cmd = 'find ';
-    # shell quote heuristic, with the (here safe) assumption that the dirs don't contain single-quotes
-    $find_cmd .= join(' ', map { "'$_'" } $base_dirs->@*);
-    $find_cmd .= ' -name "*.js"';
-    open(my $find_cmd_output, '-|', "$find_cmd | sort") or die "Failed to execute command: $!";
-
-    my $sources = [];
-    while (my $line = <$find_cmd_output>) {
-        chomp $line;
-        print "F: $line\n";
-        push @$sources, $line;
-    }
-    close($find_cmd_output);
-
-    return $sources;
-}
-
-my $header = <<'__EOD';
-Proxmox message catalog.
-
-Copyright (C) Proxmox Server Solutions GmbH
-
-This file is free software: you can redistribute it and/or modify it under the terms of the GNU
-Affero General Public License as published by the Free Software Foundation, either version 3 of the
-License, or (at your option) any later version.
--- Proxmox Support Team
-__EOD
-
-my $ctime = scalar localtime;
-
-my $href = {
-    '' => Locale::PO->new(
-        -msgid => '',
-        -comment => $header,
-        -fuzzy => 1,
-        -msgstr => "Project-Id-Version: $projectId\n"
-            ."Report-Msgid-Bugs-To: \n"
-            ."POT-Creation-Date: $ctime\n"
-            ."PO-Revision-Date: YEAR-MO-DA HO:MI +ZONE\n"
-            ."Last-Translator: FULL NAME \n"
-            ."Language-Team: LANGUAGE \n"
-            ."MIME-Version: 1.0\n"
-            ."Content-Type: text/plain; charset=UTF-8\n"
-            ."Content-Transfer-Encoding: 8bit\n",
-    ),
-};
-
-sub extract_msg {
-    my ($filename, $linenr, $line) = @_;
-
-    my $count = 0;
-
-    while(1) {
-        my $text;
-        if ($line =~ m/\bgettext\s*\((("((?:[^"\\]++|\\.)*+)")|('((?:[^'\\]++|\\.)*+)'))\)/g) {
-            $text = $3 || $5;
-        }
-        last if !$text;
-        return if $basehref->{$text};
-        $count++;
-
-        my $ref = "$filename:$linenr";
-
-        if (my $po = $href->{$text}) {
-            $po->reference($po->reference() . " $ref");
-        } else {
-            $href->{$text} = Locale::PO->new(-msgid=> $text, -reference=> $ref, -msgstr=> '');
-        }
-    }
-    die "can't extract gettext message in '$filename' line $linenr\n" if !$count;
-    return;
-}
-
-my $sources = find_js_sources($dirs);
-
-foreach my $s (@$sources) {
-    open(my $SRC_FH, '<', $s) || die "unable to open file '$s' - $!\n";
-    while(defined(my $line = <$SRC_FH>)) {
-        if ($line =~ m/gettext\s*\(/ && $line !~ m/^\s*function gettext/) {
-            extract_msg($s, $., $line);
-        }
-    }
-    close($SRC_FH);
-}
-
-my $filename = $options->{o} // "messages.pot";
-Locale::PO->save_file_fromhash($filename, $href);
-
-- 
2.39.2