For the latest information about the NiuTrans platform, please visit our website at:
http://www.nlplab.com/NiuPlan/NiuTrans.html
NiuTrans - SMT platform
Copyright (C) 2011-2014, NEU-NLPLab (http://www.nlplab.com). All rights reserved.
This platform is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public
License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later version.
This platform is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public
License along with this platform; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA
I) Introduction
NiuTrans is an open-source statistical machine translation system developed by
the Natural Language Processing Group at Northeastern University, China.
The NiuTrans system is written entirely in C++, so it runs fast and has a small memory footprint.
It currently supports phrase-based, hierarchical phrase-based, and syntax-based models for research purposes.
II) Features
1. Written in C++, so it runs fast
2. Multi-threading support
3. Easy-to-use APIs for feature engineering
4. Competitive performance on Chinese-foreign translation tasks
5. A compact but efficient n-gram language model is built in, so no external software (such as SRILM) is needed
6. Supports multiple SMT models
a) Phrase-based model
b) Hierarchical phrase-based model
c) Syntax-based model
III) Requirements
For Windows users, Visual Studio 2008, Cygwin, and Perl (version 5.10.0 or higher) are required.
We suggest installing Cygwin under "C:\" (the default location).
For Linux users, gcc (version 4.1.2 or higher), g++ (version 4.1.2 or higher),
GNU Make (version 3.81 or higher), and Perl (version 5.8.8 or higher) are required.
NOTE: 2GB of memory and 10GB of disk space are the minimum requirements for running the system.
More memory and disk space help when the system is trained on large-scale data.
To support large data and models (such as an n-gram LM), a 64-bit OS is recommended.
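The minimum versions listed above can be checked before building. The following is a minimal sketch for Linux and is not part of NiuTrans: the helper names version_ge, check_min, and check_linux_requirements are hypothetical, and it assumes a POSIX shell with awk, and that `gcc -dumpversion`, `g++ -dumpversion`, the first line of `make --version`, and Perl's `$^V` report the installed versions. Newer GCC builds may print only the major version (e.g. "9"), which still compares correctly against 4.1.2.

```shell
#!/bin/sh
# Sketch only (not shipped with NiuTrans): compare installed tool versions
# against the Linux minimums in this README. All helper names are hypothetical.

# version_ge A B: exit status 0 if dotted version A >= dotted version B.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = x[i] + 0; yi = y[i] + 0   # missing parts count as 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0                             # versions are equal
    }'
}

# check_min NAME FOUND MINIMUM: print a status line; fail if FOUND < MINIMUM
# or if FOUND is empty (tool not installed).
check_min() {
    if [ -n "$2" ] && version_ge "$2" "$3"; then
        echo "ok: $1 $2 (>= $3)"
    else
        echo "FAIL: $1 ${2:-not found} (need >= $3)"
        return 1
    fi
}

# check_linux_requirements: query each tool and compare it with the minimum
# version listed in the README; returns non-zero if any check fails.
check_linux_requirements() {
    status=0
    check_min gcc  "$(gcc -dumpversion 2>/dev/null)" 4.1.2 || status=1
    check_min g++  "$(g++ -dumpversion 2>/dev/null)" 4.1.2 || status=1
    check_min make "$(make --version 2>/dev/null | awk 'NR == 1 { print $NF }')" 3.81 || status=1
    check_min perl "$(perl -e 'printf "%vd", $^V' 2>/dev/null)" 5.8.8 || status=1
    return $status
}
```

To use it, source the script in a shell and run check_linux_requirements; it prints one ok/FAIL line per tool. Versions are compared numerically component by component, so "5.10.0" correctly counts as newer than "5.8.8".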
IV) Installation
Please unpack the downloaded package (assuming the target directory is "NiuTrans") and follow the instructions below to install the system.
For Windows users,
- open "NiuTrans.sln" in "NiuTrans\src\"
- set configuration mode to "Release"
- set platform mode to "Win32" (for 32-bit OS)
or
set platform mode to "x64" (for 64-bit OS)
- build the whole solution
You will then find that all binaries are generated in "NiuTrans\bin\".
For Linux users,
- cd NiuTrans/src/
- chmod a+x install.sh
- ./install.sh -m32 (for 32-bit OS)
or
./install.sh (for 64-bit OS)
- source ~/.bashrc
You will then find that all binaries are generated in "NiuTrans/bin/".
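The Linux steps above can be wrapped in a small script. This is a sketch under stated assumptions, not an official installer: the helper names install_flag and install_niutrans are hypothetical, the package is assumed to be unpacked to ./NiuTrans, and the OS word size is taken from `getconf LONG_BIT`.

```shell
#!/bin/sh
# Sketch of the README's Linux installation steps (not shipped with NiuTrans).
# Assumes the package is unpacked to ./NiuTrans; helper names are hypothetical.

# install_flag BITS: print the install.sh option for a 32- or 64-bit OS.
install_flag() {
    case "$1" in
        32) printf '%s\n' "-m32" ;;  # 32-bit OS: ./install.sh -m32
        64) ;;                       # 64-bit OS: plain ./install.sh, no option
        *)  echo "unsupported word size: $1" >&2; return 1 ;;
    esac
}

# install_niutrans [DIR]: run chmod and install.sh inside DIR/src.
install_niutrans() {
    dir=${1:-NiuTrans}
    flag=$(install_flag "$(getconf LONG_BIT)") || return 1
    (
        cd "$dir/src" || exit 1
        chmod a+x install.sh
        # $flag is intentionally unquoted so an empty flag adds no argument.
        ./install.sh $flag
    ) || return 1
    echo "Now run 'source ~/.bashrc' in your shell; binaries are in $dir/bin/."
}
```

For example, source the script and run `install_niutrans NiuTrans` from the directory that contains the unpacked package. The `source ~/.bashrc` step is left to the user because a script cannot change the environment of the shell that started it.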
V) Step-by-Step
a) NiuTrans.Phrase:
Please refer to the file "NiuTrans/doc/NiuTrans.Phrase.html" to learn how to use the phrase-based engine of the NiuTrans system.
b) NiuTrans.Hierarchy:
Please refer to the file "NiuTrans/doc/NiuTrans.Hierarchy.html" to learn how to use the hierarchical phrase-based engine of the NiuTrans system.
c) NiuTrans.Syntax:
Please refer to the file "NiuTrans/doc/NiuTrans.Syntax.html" to learn how to use the syntax-based engines of the NiuTrans system.
VI) Manual
We also provide a manual that describes the NiuTrans system in more detail, along with various tricks for building better MT engines.
You can find it at "NiuTrans/doc/niutrans-manual.pdf".
VII) Team Members
Jingbo Zhu (Co-PI)
Tong Xiao (Co-PI)
Hao Zhang
Qiang Li
Ji Ma
Quan Du
VIII) How To Cite NiuTrans
If you use NiuTrans in your research and would like to acknowledge this project, please cite the following paper:
"Tong Xiao, Jingbo Zhu, Hao Zhang and Qiang Li. 2012.
NiuTrans: An Open Source Toolkit for Phrase-based and Syntax-based Machine Translation.
In Proc. of ACL, demonstration session".
IX) Get Support
For any questions about NiuTrans, please e-mail us directly ([email protected]).
X) History
VERSION 1.3.1 Beta --- Dec 1, 2014
* bug fixes for the t2s/t2t decoder and syntactic rule extraction module
VERSION 1.3.0 Beta CWMT2013 --- July 22, 2013
* CWMT2013 Chinese-English/English-Chinese baseline system
VERSION 1.3.0 Beta --- July 17, 2013
* bug fixes, decoder updates, data preprocessing system updates, new scripts for CWMT2013
VERSION 1.2.0 Beta --- Jan. 31, 2013
* bug fixes, decoder updates, and a new preprocessing system, word-alignment tool, and recasing module
VERSION 1.0.0 Beta --- July 7th, 2012
* Syntax-based models are supported (string-to-tree/tree-to-string/tree-to-tree)
VERSION 0.3.0 --- April 27th, 2012
* Hierarchical phrase-based model is supported
VERSION 0.2.0 --- Oct. 29th, 2011
* 32bit OS supported
* Bug fix for the ME-based reordering model (when handling Berkeley syntactic parses)
* Better initial weight setting
VERSION 0.1.0 --- July 5th, 2011
XI) Acknowledgements
This project is supported in part by the National Science Foundation of China (60873091; 61073140),
Specialized Research Fund for the Doctoral Program of Higher Education (20100042110031),
and the Fundamental Research Funds for the Central Universities.
During the implementation of this project, we received support from former graduates: Rushan Chen (language model) and Shujie Yao (data selection and processing).