NooJ
NooJ is a linguistic development environment and corpus processor developed by Max Silberztein. NooJ allows linguists to write grammars belonging to the four classes of the Chomsky-Schützenberger hierarchy of generative grammars: finite-state grammars, context-free grammars, context-sensitive grammars, and unrestricted grammars, using either a text editor (e.g. to write regular expressions) or a graph editor.[1]
Written in | Java, C# |
---|---|
Website | http://www.nooj4nlp.org/ |
NooJ allows linguists to develop orthographical and morphological grammars, dictionaries of simple words, compound words, and discontinuous expressions, local syntactic grammars (such as named entity recognizers),[2][3] structural syntactic grammars (which produce syntactic trees), and Zellig Harris's transformational grammars.
All NooJ parsers process Atomic Linguistic Units (ALUs) rather than word forms (i.e. sequences of letters between two space characters).[4] This allows NooJ’s syntactic parser to parse sequences of word forms such as “can not” exactly as it parses contracted word forms such as “cannot” or “can’t”. Linguists can therefore write relatively simple syntactic grammars, even for agglutinative languages. ALUs are represented by annotations stored in the Text Annotation Structure (TAS): all NooJ parsers add annotations to the TAS or remove annotations from it. A typical NooJ analysis applies a series of elementary grammars to a text in cascade, bottom-up (from spelling to semantics).
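The difference between word forms and ALUs can be illustrated with a minimal Python sketch. It is not NooJ's implementation: the `Annotation` type, the `LEXICON` table, and the functions `annotate` and `alu_sequence` are hypothetical names introduced here, and the annotation list is only a simplified stand-in for the TAS. The point it shows is that one word form may carry several annotations, so that “can not”, “cannot” and “can’t” yield the same sequence of units.

```python
from typing import List, NamedTuple


class Annotation(NamedTuple):
    start: int  # character offset where the covering word form begins
    end: int    # character offset just past the covering word form
    alu: str    # label of the linguistic unit


# Hypothetical mini-lexicon: a contracted word form maps to several ALUs.
LEXICON = {
    "cannot": ["can", "not"],
    "can't": ["can", "not"],
}


def annotate(text: str) -> List[Annotation]:
    """Build a simplified annotation list (a stand-in for the TAS)."""
    annotations = []
    pos = 0
    for form in text.split():
        start = text.index(form, pos)
        end = start + len(form)
        pos = end
        # Normalize case and the typographic apostrophe before lookup.
        key = form.lower().replace("\u2019", "'")
        for alu in LEXICON.get(key, [key]):
            # Every ALU of a contraction covers the same span of text.
            annotations.append(Annotation(start, end, alu))
    return annotations


def alu_sequence(text: str) -> List[str]:
    """Return only the ALU labels, in text order."""
    return [a.alu for a in annotate(text)]
```

A syntactic grammar written over the output of `alu_sequence` sees the same input for all three spellings, which is why no separate rule per spelling is needed in this sketch.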
History
NooJ originated in investigations by Silberztein and the INTEX community of linguist users into the Lexicon-Grammar approach of Maurice Gross’ LADL, which states that no grammar rule can be developed independently from a strict delimitation of its domain of application.
NooJ has been used as a corpus processor by researchers in linguistics,[5][6] history,[7] psychology,[8][9] and literary studies,[10] as well as in sentiment analysis projects,[11] data mining,[12][13][14] and even the processing of musical notes.[15] For instance, NooJ was used in the MARS 500 experiment.[16] It has also been used by several software companies to build information extraction and information retrieval software.
Complexity and application
NooJ’s dictionaries are stored as finite-state transducers and can describe simple words[17] (e.g. table), compound words[18] (e.g. as a matter of fact), and discontinuous expressions such as phrasal verbs (e.g. to turn … off),[19] idiomatic expressions[20] (e.g. to take the bull by the horns), and support verb/predicative noun associations (e.g. to take a nap).
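How a single dictionary can cover both simple and compound words can be sketched with a token-level trie, which behaves like a small acyclic finite-state device that emits a tag on reaching a final state. This is an illustrative assumption, not NooJ's dictionary format: the names `build_trie` and `lookup`, the tags, and the greedy longest-match strategy are all hypothetical, and the sketch keeps a single analysis per position and does not handle discontinuous expressions.

```python
def build_trie(entries):
    """Compile {entry: tag} into a trie keyed by tokens.

    The key None marks a final state and stores the tag of the entry;
    it cannot collide with a token because tokens are strings.
    """
    root = {}
    for entry, tag in entries.items():
        node = root
        for token in entry.split():
            node = node.setdefault(token, {})
        node[None] = tag
    return root


def lookup(tokens, trie):
    """Tag tokens by greedy longest match against the trie."""
    out = []
    i = 0
    while i < len(tokens):
        node = trie
        best = None
        j = i
        # Follow transitions as far as the input allows,
        # remembering the last final state seen.
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if None in node:
                best = (j, node[None])
        if best is not None:
            end, tag = best
            out.append((" ".join(tokens[i:end]), tag))
            i = end
        else:
            out.append((tokens[i], "UNKNOWN"))
            i += 1
    return out


# Hypothetical entries and tags, using the examples from the text above.
TRIE = build_trie({"table": "N", "as a matter of fact": "ADV"})
```

For instance, `lookup("as a matter of fact the table".split(), TRIE)` treats the five-token compound as one unit, tags "table" as a simple word, and leaves "the" unknown because it is not in this toy dictionary.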
NooJ allows linguists to create, edit, debug and maintain a large number of grammars that belong to the four classes of generative grammars in the Chomsky-Schützenberger hierarchy: finite-state grammars, context-free grammars, context-sensitive grammars and unrestricted grammars.
NooJ can often apply grammars to texts in linear time: for instance, most NooJ context-free grammars can have their recursion removed. NooJ context-sensitive grammars are made of two parts: the first is a context-free (or even finite-state) grammar that is applied to texts very efficiently; the second is a set of constraints applied to the matching sequences, each checked in constant time. NooJ unrestricted grammars are context-sensitive grammars that can contain variables and can modify the text input. They are typically used to perform transformational analysis and generation (see Zellig Harris), but several teams of linguists have shown that, when used in conjunction with multilingual lexicons, they can also be used to perform machine translation.[21][22]
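The two-part design of a context-sensitive grammar can be sketched in Python on the textbook language aⁿbⁿcⁿ, which is not context-free. The choice of this language, the names `PATTERN` and `matches`, and the use of Python's `re` module are assumptions made for illustration and do not reflect NooJ's grammar syntax. A finite-state pattern first finds candidate sequences, then a constraint on the captured parts filters them.

```python
import re

# Part 1: a finite-state pattern that over-generates (any a+ b+ c+ run).
PATTERN = re.compile(r"(a+)(b+)(c+)")


def matches(text):
    """Return the candidate sequences that also satisfy the constraint."""
    results = []
    for m in PATTERN.finditer(text):
        a_run, b_run, c_run = m.groups()
        # Part 2: a constraint on the captured variables.
        # len() on a Python str is a constant-time operation.
        if len(a_run) == len(b_run) == len(c_run):
            results.append(m.group(0))
    return results
```

With this sketch, `matches("aabbcc abc aabc")` keeps "aabbcc" and "abc" and rejects "aabc", whose runs have unequal lengths. The pattern matching itself stays finite-state; only the inexpensive check on the captured parts adds the context-sensitive power.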
References
- Silberztein M., 2015. La formalisation des langues : l'approche de NooJ. ISTE: London (426 p.).
- Fehri H., Haddar K. and Ben Hamadou A. 2011. A new representation model for the automatic recognition and translation of Arabic Named Entities with NooJ. RANLP 2011 (Hissar, Bulgaria)
- Mota C. and Grishman R. 2008. Is this NE tagger getting old? Proceedings of LREC 2008. Marrakech: ELRA, pp. 1196-1202.
- Silberztein M., 2003. NooJ manual
- Mesfar S. 2011. Towards a Cascade of Morpho-syntactic Tools for Arabic Natural Language Processing. Computational Linguistics and Intelligent Text Processing, LNCS Vol 6008, Springer, pp. 150-162
- Trouilleux, F. 2014. Un dictionnaire et une grammaire de composés français. TALN 2014, Marseille
- Gucul-Milojević S., Radulović V. and Krstev C. 2010. A View on the Representation of Women in Serbian Newspaper Texts. Applications of Finite-State Language Processing: Selected Papers from the NooJ 2008 International Conference (Budapest, Hungary). Edited by Kuti Judit, Silberztein Max, Varadi Tamas. Cambridge Scholars Publishing, Newcastle, UK: 166-176
- Ehmann B., Lendvai P., Pólya T., Vincze O., Miháltz M., Tihanyi L., Váradi T. and László J. 2012. Narrative Psychological Application of Semantic Role Labeling. Formalising Natural Languages with NooJ: Selected Papers from the NooJ 2011 International Conference (Dubrovnik, Croatia). Edited by Kristina Vučković, Božo Bekavac and Max Silberztein. Cambridge Scholars Publishing, Newcastle, UK: 218-228
- Pilar L. and Reimerink A. 2014. From term dynamics to concept dynamics: term variation and multidimensionality in the psychiatric domain. Proceedings of EURALEX 2014. Bolzano, Italy, July 15-19.
- Mesfar S., Gambin M. and Piton O. 2012. In the Pursuit of a Lost Manuscript: Ptolemy’s Planisphaerium. Formalising Natural Languages with NooJ: Selected Papers from the NooJ 2011 International Conference (Dubrovnik, Croatia). Edited by Kristina Vučković, Božo Bekavac and Max Silberztein. Cambridge Scholars Publishing, Newcastle, UK: 205-217
- Merkler D. and Agić Ž. 2013. Sentiscope: A System for Sentiment Analysis in Daily Horoscopes. Formalising Natural Languages with NooJ: Selected Papers from the NooJ 2012 International Conference (Paris, France). Edited by Anaïd Donabédian, Victoria Khurshudian and Max Silberztein. Cambridge Scholars Publishing, Newcastle, UK: 173-181
- Elia A., Vietri S., Postiglione A., Monteleone M. and Marano F. 2010. Data Mining Modular Software System. SWWS2010 - Proceedings of the 2010 International Conference on Semantic Web & Web Services, Las Vegas, Nevada, USA, pp. 127-133. ISBN 9781601321619
- Matos S., Barreiro A. and Oliveira J.L. 2009. Syntactic Parsing for Bio-molecular Event Detection from Scientific Literature. Progress in Artificial Intelligence, LNCS Vol. 5816, pp. 79-85.
- Pilar L. and Faber P. 2012. Causality in the Specialized Domain of the Environment. Proceedings of the Workshop Semantic Relations-II. Enhancing Resources and Applications (LREC12), eds. Mititelu V.B., Popescu O. and Pekar V. Istanbul, Turkey: ELRA, pp. 10-17.
- Kocijan K., Librenjak S. and Dovedan Z. 2014. Introducing Music to NooJ. Formalising Natural Languages with NooJ 2013: Selected Papers from the NooJ 2013 International Conference (Saarbrücken, Germany). Edited by Svetla Koeva, Slim Mesfar and Max Silberztein. Cambridge Scholars Publishing, Newcastle, UK: 209-222
- Ehmann B., Balázs L., Shved D., Bénet V. and Gushin V. 2013. The Russian Linguistic Resources in Space Psychological Research. Formalising Natural Languages with NooJ: Selected Papers from the NooJ 2012 International Conference (Paris, France). Edited by Anaïd Donabédian, Victoria Khurshudian and Max Silberztein. Cambridge Scholars Publishing, Newcastle, UK: 150-161
- Piton O., Lagji Kl. and Përnaska R. 2007. Electronic Dictionaries and Transducers for Automatic Processing of Albanian Language. Proceedings of 12th International conference NLDB 2007, CNAM, Paris, France. LNCS Series, Springer Verlag, pp. 407-413.
- Chadjipapa E., Papadopoulou E. and Gavriilidou Z. 2010. New data in the Greek NooJ module: Compounds and Proper Nouns. Applications of Finite-State Language Processing: Selected Papers from the NooJ 2008 International Conference (Budapest, Hungary). Edited by Kuti Judit, Silberztein Max, Varadi Tamas. Cambridge Scholars Publishing, Newcastle, UK: 93-100
- Machonis P.A. 2010. English Phrasal Verbs: from Lexicon-Grammar to Natural Language Processing. Southern Journal of Linguistics 34.1, United States: 21-48
- Vietri S. 2014. Idiomatic Constructions in Italian. A Lexicon-Grammar Approach. John Benjamins BV: Amsterdam, Netherlands. ISBN 9789027231413
- Barreiro A. 2008. Port4NooJ: Portuguese Linguistic Module and Bilingual Resources for Machine Translation. In Proceedings of the 2007 International NooJ Conference (Barcelona, Spain). Edited by Xavier Blanco and Max Silberztein. Cambridge Scholars Publishing, Newcastle, UK: 19-47
- Soussi R., Mesfar S. and Faget M. 2014. STORM Project: Towards a NooJ Module within Armadillo Database to Manage Museum Collection. Formalising Natural Languages with NooJ 2013: Selected Papers from the NooJ 2013 International Conference (Saarbrücken, Germany). Edited by Svetla Koeva, Slim Mesfar and Max Silberztein. Cambridge Scholars Publishing, Newcastle, UK: 223-232
External links
- The NooJ Web Site hosts NooJ, its instruction manual, linguistic resources for a dozen languages, links to several NooJ conferences as well as over 200 paper references.
- There are several YouTube video tutorials on NooJ in