Add eflomal word aligner: Bayesian IBM1→HMM→fertility with N parallel Gibbs chains #433
johnml1135 wants to merge 3 commits
Pull request overview
Integrates the new Thot-backed Eflomal Bayesian word alignment model into the SIL.Machine.Translation.Thot alignment pipeline, exposing training configuration (iterations, parallel Gibbs samplers) and wiring it through the CLI and model factory.
Changes:
- Adds `Eflomal` as a `ThotWordAlignmentModelType` and wires it into `ThotWordAlignmentModel.Create`.
- Introduces `ThotEflomalWordAlignmentModel`, plus trainer support for `EflomalNumSamplers` and iteration scheduling via `ThotWordAlignmentParameters`.
- Extends Thot interop (`Thot.cs`) and CLI plumbing (`AlignmentModelCommandSpec`, `ToolHelpers`) and adds Eflomal-focused tests.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/SIL.Machine.Translation.Thot.Tests/ThotEflomalWordAlignmentModelTests.cs | Adds coverage for creating/training/aligning with Eflomal, batch alignment, save/load, and symmetric alignment. |
| src/SIL.Machine.Translation.Thot/ThotWordAlignmentParameters.cs | Adds Eflomal iteration and sampler parameters with defaults. |
| src/SIL.Machine.Translation.Thot/ThotWordAlignmentModelType.cs | Adds Eflomal model type and string mapping. |
| src/SIL.Machine.Translation.Thot/ThotWordAlignmentModelTrainer.cs | Adds an Eflomal training branch, including sampler configuration. |
| src/SIL.Machine.Translation.Thot/ThotWordAlignmentModel.cs | Wires Eflomal into the alignment model factory method. |
| src/SIL.Machine.Translation.Thot/ThotEflomalWordAlignmentModel.cs | Implements the Eflomal-specific ComputeAlignedWordPairScores behavior. |
| src/SIL.Machine.Translation.Thot/Thot.cs | Adds Eflomal alignment-model enum value, P/Invoke for sampler count, and mapping from ThotWordAlignmentModelType. |
| src/SIL.Machine.Tool/ToolHelpers.cs | Adds CLI string constant for eflomal. |
| src/SIL.Machine.Tool/AlignmentModelCommandSpec.cs | Updates help text and parameter mapping to include Eflomal iterations and samplers. |
Codecov Report
❌ Patch coverage is … (value missing)

Coverage diff:

| | master | #433 | +/- |
|---|---|---|---|
| Coverage | 73.18% | 73.01% | -0.17% |
| Files | 440 | 441 | +1 |
| Lines | 36882 | 36981 | +99 |
| Branches | 5075 | 5091 | +16 |
| Hits | 26991 | 27002 | +11 |
| Misses | 8778 | 8866 | +88 |
| Partials | 1113 | 1113 | |
…el Gibbs chains

Wires EflomalAlignmentModel from thot into machine's alignment pipeline.
Depends on sillsdev/thot#11.

Changes:
- ThotWordAlignmentModelType - Eflomal enum value + "eflomal" string alias
- ThotEflomalWordAlignmentModel - new class modeled on ThotFastAlignWordAlignmentModel; AlignmentScore is 1.0 (uniform) since eflomal does not expose an alignment probability
- ThotWordAlignmentModel.Create - Eflomal factory case
- ThotWordAlignmentModelTrainer - Eflomal branch: single IBM1->HMM->fertility cascade, graceful NotSupportedException when Thot NuGet predates EflomalAlignmentModel, setEflomalNumSamplers for parallel chains
- ThotWordAlignmentParameters - EflomalIterationCount (default 12) + EflomalNumSamplers (default 1)
- Thot.cs - swAlignModel_setEflomalNumSamplers/getEflomalNumSamplers P/Invoke
- AlignmentModelCommandSpec - --eflomal-iters + --eflomal-samplers CLI flags; ToolHelpers.Eflomal added to ValidateAlignmentModelTypeOption
- ThotEflomalWordAlignmentModelTests - 6 tests; [OneTimeSetUp]+Assume.That skips gracefully when installed Thot NuGet lacks EflomalAlignmentModel

Quality (WPT English-French, 300k pairs, 447 gold - measured in thot):
- HMM: 10.4% intersection AER
- eflomal GPL ref: 6.58% (3 chains)
- This PR 1 chain: 7.52%
- This PR 5 chains: 6.46% (beats GPL reference)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
johnml1135 left a comment:
All Copilot review comments have been addressed — see replies on individual threads for details.
Line 105 was 131 characters (exceeds max_line_length=120). Wrap the three-part HasValue condition and the return expression across lines, and add required braces (IDE0011) around the now-multi-line body.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Wires `EflomalAlignmentModel` from thot into machine's alignment pipeline. eflomal is a Bayesian IBM1→HMM→fertility cascade trained by collapsed Gibbs sampling. On WPT English–French AER it is competitive with or better than the reference GPL eflomal (6.46% at 5 samplers vs 6.58% for the GPL reference) and well ahead of HMM (10.4%).
See sillsdev/thot#11 for the thot implementation this PR depends on.
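The sampler count quoted here refers to independent Gibbs chains whose per-word alignment marginals are summed before the argmax is taken. The following is only a minimal Python sketch of that combination step, not the thot implementation: the function name `combine_chains` and the nested-list layout of the marginals are hypothetical, chosen for illustration.

```python
from typing import List, Sequence


def combine_chains(chain_marginals: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Pick one source position per target word from several chains.

    chain_marginals[c][j][i] is chain c's marginal estimate that target word j
    aligns to source position i. (Hypothetical layout, for illustration only.)
    """
    num_targets = len(chain_marginals[0])
    alignment = []
    for j in range(num_targets):
        num_sources = len(chain_marginals[0][j])
        # Sum the marginals over all chains for each candidate source position.
        summed = [
            sum(chain[j][i] for chain in chain_marginals)
            for i in range(num_sources)
        ]
        # Argmax over the summed marginals (first index wins on ties).
        alignment.append(max(range(num_sources), key=summed.__getitem__))
    return alignment


# Three toy chains over two target words and two source positions.
chain_a = [[0.6, 0.4], [0.2, 0.8]]
chain_b = [[0.3, 0.7], [0.1, 0.9]]
chain_c = [[0.2, 0.8], [0.4, 0.6]]

single = combine_chains([chain_a])                      # one chain only
pooled = combine_chains([chain_a, chain_b, chain_c])    # marginals pooled
```

In this toy input, chain A alone aligns the first target word to source position 0, while the pooled marginals move it to position 1. That is the kind of disagreement that summing across chains is meant to smooth out.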
Changes
- `ThotWordAlignmentModelType`: `Eflomal` enum value + `"eflomal"` string alias (existing commit).
- `ThotEflomalWordAlignmentModel`: new class (existing commit), modeled on `ThotFastAlignWordAlignmentModel`. Added to the `ThotWordAlignmentModel.Create` factory.
- `ThotWordAlignmentModelTrainer`: adds an Eflomal branch that creates a single model, which runs the full IBM1→HMM→fertility cascade internally, and drives it for `EflomalIterationCount` sweeps (default 12). Now also calls `swAlignModel_setEflomalNumSamplers` so the number of parallel Gibbs chains is configurable at train time.
- `ThotWordAlignmentParameters`: `EflomalIterationCount` (existing) + `EflomalNumSamplers` (new), the number of independent Gibbs chains trained in parallel; marginals are summed across chains before argmax (eflomal's `n_samplers` scheme). Default 1.
- `Thot.cs`: `swAlignModel_setEflomalNumSamplers` / `swAlignModel_getEflomalNumSamplers` P/Invoke declarations.
- `AlignmentModelCommandSpec`: `--eflomal-iters` (existing) + `--eflomal-samplers` (new) CLI flags.

Quality (WPT English–French, 300k training, 447 gold pairs; measured in thot)
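For readers unfamiliar with the metric, alignment error rate (AER) is conventionally defined as 1 − (|A∩S| + |A∩P|) / (|A| + |S|), where A is the predicted link set, S the sure gold links, and P the possible gold links (with S ⊆ P). The sketch below implements that standard definition in Python. It is an illustration only: the PR does not show the evaluation script used for its figures, and the function name and toy data are hypothetical.

```python
from typing import Set, Tuple

Link = Tuple[int, int]  # (source index, target index)


def alignment_error_rate(predicted: Set[Link], sure: Set[Link], possible: Set[Link]) -> float:
    """Standard AER: 1 - (|A∩S| + |A∩P|) / (|A| + |S|), lower is better."""
    # Sure links are by convention also possible links.
    possible = possible | sure
    denominator = len(predicted) + len(sure)
    if denominator == 0:
        return 0.0  # nothing predicted and nothing expected
    matched = len(predicted & sure) + len(predicted & possible)
    return 1.0 - matched / denominator


# Toy sentence pair: two sure links, one extra possible link.
sure = {(0, 0), (1, 1)}
possible = {(2, 1)}
predicted = {(0, 0), (1, 1), (2, 2)}  # one link is outside the gold sets

aer = alignment_error_rate(predicted, sure, possible)
perfect = alignment_error_rate(sure, sure, possible)
```

Here the prediction recovers both sure links but adds one link that is neither sure nor possible, so the toy AER is 1 − 4/5 = 0.2, while predicting exactly the sure links gives 0.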
Test plan
- `ThotEflomalWordAlignmentModelTests`: CreateTrainer+Align, AlignBatch, translation probability, vocab counts, save/load round-trip, symmetrized model. All pass (`dotnet test` against local thot dll).
- … TreatWarningsAsErrors=true)

🤖 Generated with Claude Code