Keywords: Natural Language Processing, NLTK, Parsing, Tokenize
This was an assignment for my AI course at the University of Genoa. The task was to download electronic books from Project Gutenberg, find the longest sentence in each, and analyze which syntactic construction(s) were responsible for such long sentences.
We used several parsers, including NLTK, Stanford CoreNLP, and OpenNLP, along with large grammars, namely the CT, ATIS, and PT grammars. We also tried to build our own grammar using NLTK's built-in functions.
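As a rough illustration of the last point, a small grammar can be defined and parsed with NLTK's built-in `CFG.fromstring` and `ChartParser`. The toy rules and vocabulary below are our own placeholders, not the grammar actually used in the assignment:

```python
import nltk

# A toy context-free grammar (illustrative only; the grammars used in the
# assignment, such as ATIS, are far larger).
grammar = nltk.CFG.fromstring("""
S  -> NP VP
NP -> Det N | NP PP
VP -> V NP | VP PP
PP -> P NP
Det -> 'the' | 'a'
N  -> 'dog' | 'cat' | 'park'
V  -> 'saw' | 'chased'
P  -> 'in'
""")

parser = nltk.ChartParser(grammar)
tokens = "the dog saw a cat in the park".split()

# The sentence is ambiguous: the PP "in the park" can attach to the
# NP "a cat" or to the VP "saw a cat", so the parser yields two trees.
for tree in parser.parse(tokens):
    print(tree)
```

Note that even on this tiny grammar, PP-attachment ambiguity multiplies the number of parses, which hints at why very long sentences become expensive to parse.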
In the end, we were able to scan the books and find the longest sentence using the built-in functions of the gutenberg corpus object in NLTK. However, we were unsuccessful in building syntax trees for these long sentences and in determining the syntactic construction(s) responsible for them; we found the following obstacles and drawbacks associated with NLTK.