Open:FactSet Forum

Working with Japanese Earnings Call Transcripts: StanfordNLP

transcripts
factset
machine-learning
(Drake Bushnell) #1

What open-source packages are you using in the NLP space?

I have always found Python’s NLTK package easy to use, but a recent blog post piqued my interest in the latest version of the StanfordNLP Python package.

A few months ago, Stanford released an updated version of the Python package built on top of PyTorch. The package contains several out-of-the-box tools for NLP, but the most interesting feature to me was its support for 53 languages across 73 treebanks.

To test this functionality, I wanted to use the tool on excerpts from earnings calls transcribed in English and Japanese. The excerpts below come from BASF’s Q4 2018 and Teijin Limited’s fiscal 2017 earnings calls.

I replicated the steps found in the above blog post. Attached is the Jupyter Notebook I used.

Here is a basic example using the English transcription. Nothing groundbreaking here, but it helps frame the Japanese example.
image
Part of Speech Results:
image
Above we can see the various words and their associated part of speech.
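For anyone reading without the screenshots, here is a minimal sketch of what the English step might look like. It assumes the `stanfordnlp` 0.2.x API (`stanfordnlp.download` and `stanfordnlp.Pipeline`) and that the English models have already been downloaded; the helper names `pos_pairs` and `tag_english` are my own placeholders, not taken from the notebook.

```python
def pos_pairs(doc):
    """Return (word, universal POS tag) pairs from a parsed document.

    `doc` is expected to expose `.sentences`, each with `.words`, and each
    word is expected to have `.text` and `.upos` attributes.
    """
    return [(word.text, word.upos)
            for sentence in doc.sentences
            for word in sentence.words]


def tag_english(text):
    """Run the default English pipeline on `text` and return POS pairs."""
    # Imported lazily so pos_pairs() can be used without the package installed.
    import stanfordnlp

    # Assumption: the models were fetched once beforehand with
    # stanfordnlp.download('en').
    nlp = stanfordnlp.Pipeline(lang='en')
    return pos_pairs(nlp(text))
```

Calling `tag_english` on a transcript excerpt should give a list of word and tag pairs that can be printed or loaded into a DataFrame for inspection.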

Next, I repeated the above example, this time using a Japanese transcription along with StanfordNLP’s Japanese dictionary and treebank.


Part of Speech Results:
image
We can see that StanfordNLP was able to identify each of the words. Two quick examples: 四半 (quarter) and 期連 (term) were each tagged as a noun.
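A rough sketch of the Japanese step, for readers who do not have the notebook open. It assumes the `stanfordnlp` 0.2.x API and the default Japanese models; `words_with_tag` and `parse_japanese` are hypothetical helper names used only for illustration.

```python
def words_with_tag(doc, tags=("NOUN",)):
    """Collect the surface forms of words whose universal POS tag is in `tags`."""
    wanted = set(tags)
    return [word.text
            for sentence in doc.sentences
            for word in sentence.words
            if word.upos in wanted]


def parse_japanese(text):
    """Run the default Japanese pipeline on `text` and return the document."""
    # Imported lazily so words_with_tag() works without the package installed.
    import stanfordnlp

    # Assumption: the Japanese models were fetched once beforehand with
    # stanfordnlp.download('ja').
    nlp = stanfordnlp.Pipeline(lang='ja')
    return nlp(text)
```

Something like `words_with_tag(parse_japanese(excerpt))` would then list the tokens the model tagged as nouns, which makes it easier to spot-check the segmentation against the original text.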

I’d be curious to hear what open-source or licensed software others are using in this space. Is language support a key driver in the decision-making process?

Working with StanfordNLP - Japanese and English Transcripts.ipynb (20.7 KB)
