Image: A schematic representation of the topic index algorithm
Credit: Marcel Lee, Alan Spark
A recent article in The Journal of Finance and Data Science introduces an innovative method for constructing investment instruments directly from financial reports — without the need for human intervention.
This novel approach applies dynamic topic modeling (DTM), an extension of Latent Dirichlet Allocation (LDA) that lets topics evolve over time, to companies' annual and quarterly reports, uncovering hidden risk factors and translating them into tradable indices.
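For readers unfamiliar with topic models, the sketch below fits a plain, static LDA model by collapsed Gibbs sampling on a toy corpus. It is not the authors' code, and it leaves out the time dimension that DTM adds (DTM links topics across successive time slices so that their vocabularies can drift). The function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the four miniature "reports" are illustrative assumptions.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit a plain (static) LDA model by collapsed Gibbs sampling.

    docs: list of token lists, one per report.
    Returns (doc_topic, topic_word, vocab); every row of doc_topic and
    topic_word is a probability distribution.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    topics = range(n_topics)
    ndk = [[0] * n_topics for _ in docs]      # tokens in doc d assigned to topic k
    nkw = [[0] * V for _ in topics]           # times word w is assigned to topic k
    nk = [0] * n_topics                       # total tokens assigned to topic k

    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token from the counts before resampling it.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    doc_topic = [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
                 for d, doc in enumerate(docs)]
    topic_word = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)]
                  for k in topics]
    return doc_topic, topic_word, vocab


# Toy "reports" (made up): two energy-flavoured, two software-flavoured.
reports = [
    "oil gas drilling reserves oil pipeline".split(),
    "gas pipeline oil reserves drilling".split(),
    "software cloud subscription software data".split(),
    "cloud data software subscription platform".split(),
]
doc_topic, topic_word, vocab = lda_gibbs(reports, n_topics=2)
```

Each row of `doc_topic` is one report's mixture over topics; rows of this kind are the sort of company-level theme exposures an index could be built from.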
"The beauty of this method lies in its simplicity and transparency; it combines several established algorithms to achieve what previously was not possible," says co-author Marcel Lee. "By automating the process, we eliminate biases and provide a cost-effective alternative to traditional index construction."
This unsupervised technique automatically selects optimal model parameters and discovers implicit risk factors through semantic analysis of corporate publications. The result is a new class of investment instruments: thematic indices.
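One simple way a company-by-topic weight matrix could be turned into an index is sketched here, purely as an assumption: rank companies by their weight on a single topic, keep the top few, and scale those weights to sum to one. The release does not say how the authors select or weight constituents; `thematic_index`, `top_n` and the tickers are hypothetical.

```python
def thematic_index(exposures, topic, top_n=3):
    """Hypothetical rule: weight the top_n companies by their exposure to one topic.

    exposures: dict mapping ticker -> list of topic weights (for example, one
    row of an LDA document-topic matrix per company).
    Returns a dict ticker -> index weight; the weights sum to 1.
    """
    # Rank companies by how strongly their reports load on the chosen topic.
    ranked = sorted(exposures.items(), key=lambda kv: kv[1][topic], reverse=True)
    selected = [(ticker, w[topic]) for ticker, w in ranked[:top_n] if w[topic] > 0]
    if not selected:
        return {}
    total = sum(score for _, score in selected)
    # Normalise exposures so the index weights sum to one.
    return {ticker: score / total for ticker, score in selected}


# Illustrative (made-up) topic exposures for four hypothetical tickers.
exposures = {
    "AAA": [0.90, 0.10],
    "BBB": [0.60, 0.40],
    "CCC": [0.20, 0.80],
    "DDD": [0.05, 0.95],
}
index = thematic_index(exposures, topic=1, top_n=2)
```

Weighting by topic exposure rather than by market capitalisation is a choice made for the sketch; a tradable index would typically also need rebalancing and liquidity rules.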
The study describes how the model tracks economic and industrial trends over time, illustrating that sectors often considered static are in reality constantly evolving. The method captures this fluidity more accurately than traditional static classifications such as the Global Industry Classification Standard (GICS) or the Industry Classification Benchmark (ICB).
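Because DTM estimates a separate word distribution for each topic in each time period, one way to quantify how far a sector's language has shifted is to compare those distributions across periods. The sketch uses Hellinger distance on made-up numbers; the choice of distance, the function name and the data are assumptions, not taken from the study.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two word distributions (0 = identical, 1 = disjoint).

    p, q: dicts mapping word -> probability; missing words count as probability 0.
    """
    overlap = sum(math.sqrt(p.get(w, 0.0) * q.get(w, 0.0)) for w in set(p) | set(q))
    # Clamp at zero to absorb tiny floating-point overshoot when p == q.
    return math.sqrt(max(0.0, 1.0 - overlap))


# Made-up word distributions for one "energy" topic in two reporting years.
energy_2010 = {"oil": 0.5, "gas": 0.3, "drilling": 0.2}
energy_2020 = {"oil": 0.3, "gas": 0.3, "renewables": 0.2, "hydrogen": 0.2}
drift = hellinger(energy_2010, energy_2020)
```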
"We're observing the industrial landscape through a much sharper and multicoloured lens, enabling investors to tap into nuanced market themes and risk factors previously inaccessible," adds co-author Alan Spark.
In several cases, the newly created thematic indices closely mimicked established indices, even though they were derived without the predefined biases of manual classification systems. “This not only paves the way for a more unbiased benchmarking tool but also reveals industry trends and vocabulary shifts over time, offering a fresh perspective on sectoral dynamics,” says Lee.
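How closely a thematic index "mimics" a benchmark is commonly summarised by the correlation of their returns and by tracking error (the standard deviation of the return differences). The release does not say which measures the authors used, so treat `tracking_stats` and the five made-up daily returns as a hypothetical illustration.

```python
import math
import statistics


def tracking_stats(index_returns, benchmark_returns):
    """Return (correlation, tracking_error) for two equal-length return series.

    tracking_error is the sample standard deviation of the per-period return
    differences, left unannualised here.
    """
    n = len(index_returns)
    if n != len(benchmark_returns) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx = sum(index_returns) / n
    my = sum(benchmark_returns) / n
    # Pearson correlation computed from centred sums.
    cov = sum((x - mx) * (y - my) for x, y in zip(index_returns, benchmark_returns))
    vx = sum((x - mx) ** 2 for x in index_returns)
    vy = sum((y - my) ** 2 for y in benchmark_returns)
    corr = cov / math.sqrt(vx * vy)
    diffs = [x - y for x, y in zip(index_returns, benchmark_returns)]
    return corr, statistics.stdev(diffs)


# Made-up daily returns for a thematic index and a benchmark.
thematic = [0.010, -0.004, 0.007, 0.002, -0.006]
benchmark = [0.009, -0.005, 0.006, 0.003, -0.007]
corr, te = tracking_stats(thematic, benchmark)
```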
One notable challenge acknowledged by the researchers is the approach’s reliance on a ‘bag-of-words’ model, which makes large text collections tractable but ignores word order and the nuanced relationships between words. “Future iterations of this work aim to incorporate more complex models that capture these subtleties, potentially enhancing the predictive power of thematic indices on corporate actions and industry shifts,” shares Spark.
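The limitation is easy to see in a few lines of Python: under a bag-of-words representation, two sentences with opposite meanings can be indistinguishable. Counting adjacent word pairs (bigrams) appears here only as the simplest illustration of keeping some word order; it is not the extension the researchers propose, and the function names are hypothetical.

```python
from collections import Counter


def bag_of_words(text):
    """Unordered token counts: word order is thrown away."""
    return Counter(text.lower().split())


def bigrams(text):
    """Counts of adjacent word pairs, which keep some local word order."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


a = "The company acquired the supplier"
b = "The supplier acquired the company"
same_bag = bag_of_words(a) == bag_of_words(b)   # True: identical word counts
same_bigrams = bigrams(a) == bigrams(b)         # False: the ordered pairs differ
```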
Journal
The Journal of Finance and Data Science
Method of Research
Computational simulation/modeling
Subject of Research
Not applicable
Article Title
Unsupervised generation of tradable topic indices through textual analysis
COI Statement
The authors declare that they have no known competing financial interests or personal relationships that could have influenced the work in this paper.