A Machine Learning-Based Approach to Thematic ESG Investing: Case Study in GEO (Green Economy Opportunities)
Key Takeaways
- Machine learning (ML) has surprising benefits in the creation of thematic portfolios.
- In the climate context, for example, sophisticated natural language processing (NLP) can identify companies aligned with the energy transition more comprehensively using richer information, and in a more timely fashion, than traditional methods.
- By exploiting the resulting investment universe, a systematic investing approach can produce thematic portfolios that have superior financial characteristics relative to conventional, narrow thematic strategies.
Many asset owners are looking to invest in a global shift to a lower carbon economy, believing that the transition will help to determine corporate winners and losers for many years to come. For such investors, however, the potential scope of the energy transition poses both an opportunity and a challenge. While the most conspicuous beneficiaries may include companies that make or distribute clean technologies, less-apparent winners may include consumers of those technologies as well as producers of their raw materials and components. As a result, the universe of potential participants in the energy transition is expansive, diverse, and, at first blush, ambiguous.
In this note, therefore, we illustrate the benefits of applying advanced natural language processing (NLP) techniques in constructing transition-aligned thematic portfolios. Specifically, we introduce a systematic predictive signal, called GEO (Green Economy Opportunities), that can identify companies associated with the transition more comprehensively, using richer information, and on a more timely basis than traditional methods.
Using the GEO signal, we then highlight often-overlooked design decisions in the creation of thematic investment strategies. In particular, we demonstrate the financial benefits of applying sophisticated systematic portfolio construction methods as opposed to following the more conventional approach of selecting from a limited universe of companies that are most manifestly aligned with the theme of interest.
Introducing GEO
To identify companies that are directly or indirectly linked to the energy transition, we took a novel approach and developed a predictive signal. The methodology uses sophisticated NLP techniques applied to a wide range of corporate disclosures to 1) identify firms with transition-related products, services, and policies and 2) gauge their commitment to sustainability and the environment.
This application of NLP techniques to analyze text has three advantages over conventional data produced by ESG analysts without the aid of such technological sophistication:
- Scope: NLP is orders of magnitude more efficient than human beings at processing company documents, and it can be applied across many languages. Moreover, while data coverage for many traditional climate metrics is often limited to large companies, the text-based approach generates far greater coverage, particularly among mid- and small-cap stocks (Figure 1).
- Richness: The ability to process both qualitative and quantitative information from diverse sources allows for greater breadth and nuance in measuring thematic alignment. For example, traditional data providers frequently rely on information about companies’ revenue sources to identify thematic exposure, but that often limits coverage to “pure plays.” Because many environmental projects are long-term and capital intensive, expanding the information set is especially important in the context of the energy transition: NLP allows investors to account for companies’ future plans through the inclusion of forward-looking statements.
- Timeliness: The GEO signal updates automatically as new information arrives and is algorithmically processed. There is no need to wait for a scheduled review, as is the norm for traditional data providers.
Figure 1: Securities with Data Coverage by Market-Cap Tercile—GEO versus Clean Energy Metrics
The methodology underlying GEO involves training a complex machine-learning algorithm to recognize and quantify the extent to which individual companies discuss environmental themes, including decarbonization plans, renewable energy, and carbon capture. The first step in that process is to train the algorithm to recognize salient characteristics of information related to the energy transition by having it process reports on the topic published by various industry organizations.
Using a subset of ML called supervised learning, we then apply an algorithm to a wide range of company disclosures (among them earnings call transcripts, regulatory filings, and annual shareholder meeting transcripts) to measure firms’ alignment with the energy transition. Figure 2 provides illustrative examples that highlight the breadth and industry-specific nature of energy transition topics that GEO can recognize. For instance, the words highlighted in dark blue reflect discussions about decarbonization, while the words in light blue refer to discussions about renewable energy.
Figure 2: Illustrative GEO Text Processing Examples
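The note does not describe the architecture of the GEO model. To make "supervised learning on disclosure text" concrete, the sketch below trains a toy multinomial naive Bayes classifier on a handful of hypothetical labeled sentences and uses the posterior probability of the "transition-related" class as a score. Naive Bayes is a deliberately simple stand-in chosen for illustration, not the authors' method, and the training sentences and labels are invented.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and keep alphabetic runs (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesScorer:
    """Toy multinomial naive Bayes with add-one (Laplace) smoothing.

    Class 1 = transition-related text, class 0 = everything else.
    Illustrative stand-in only; not the GEO model.
    """

    def fit(self, docs: list[str], labels: list[int]) -> "NaiveBayesScorer":
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        self.totals = {c: sum(self.word_counts[c].values()) for c in (0, 1)}
        return self

    def _log_joint(self, tokens: list[str], c: int) -> float:
        """log P(class) plus the sum of log P(token | class) over known tokens."""
        n_docs = sum(self.doc_counts.values())
        logp = math.log(self.doc_counts[c] / n_docs)
        v = len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # words unseen in training are ignored
                logp += math.log((self.word_counts[c][tok] + 1) / (self.totals[c] + v))
        return logp

    def score(self, text: str) -> float:
        """Posterior probability that `text` belongs to class 1."""
        tokens = tokenize(text)
        l0, l1 = self._log_joint(tokens, 0), self._log_joint(tokens, 1)
        m = max(l0, l1)  # subtract the max for numerical stability
        return math.exp(l1 - m) / (math.exp(l0 - m) + math.exp(l1 - m))


# Hypothetical training sentences (1 = transition-related).
train_docs = [
    "we are expanding renewable energy capacity and solar installations",
    "our decarbonization plan targets net zero emissions",
    "carbon capture and storage pilot project is under way",
    "quarterly revenue grew due to strong consumer demand",
    "we repurchased shares and raised the dividend",
    "store traffic improved in the holiday season",
]
labels = [1, 1, 1, 0, 0, 0]

scorer = NaiveBayesScorer().fit(train_docs, labels)
print(scorer.score("we plan to cut emissions with renewable energy"))
print(scorer.score("we raised the dividend and repurchased shares"))
```

A model of the kind the note describes would be trained on far larger corpora (such as the industry reports mentioned above) with richer features than word counts; the point here is only the fit-then-score pattern.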
But the model does more than simply quantify the extent to which companies speak about the transition. It also assesses the credibility of what they say by integrating prescriptive guidelines from leading climate frameworks, including the Task Force on Climate-related Financial Disclosures and the Science Based Targets initiative. For example, the NLP algorithm takes into account the granularity of decarbonization plans, including whether they refer to dates, baselines, and targets.
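The note says the algorithm weighs whether decarbonization plans mention dates, baselines, and targets, but not how it does so. A crude rule-based proxy makes the idea concrete. The regular expressions and the equal weighting below are hypothetical; a learned model would recognize far more phrasings than these patterns do.

```python
import re

# Hypothetical patterns for three markers of a granular decarbonization plan.
GRANULARITY_PATTERNS = {
    # e.g. "by 2030"
    "target_date": re.compile(r"\bby 20[2-9]\d\b", re.IGNORECASE),
    # e.g. "2019 baseline", "base year", "from 2019"
    "baseline": re.compile(
        r"\bbaseline\b|\bbase year\b|\b(?:from|versus|relative to) (?:19|20)\d\d\b",
        re.IGNORECASE,
    ),
    # e.g. "42%" or "42 %"
    "quantified_target": re.compile(r"\b\d{1,3}(?:\.\d+)?\s?%"),
}


def granularity_flags(text: str) -> dict[str, bool]:
    """Report which granularity markers appear anywhere in `text`."""
    return {name: bool(p.search(text)) for name, p in GRANULARITY_PATTERNS.items()}


def granularity_score(text: str) -> float:
    """Fraction of markers present (0.0 to 1.0), weighting each equally."""
    flags = granularity_flags(text)
    return sum(flags.values()) / len(flags)


vague = "We are committed to reducing our environmental footprint."
specific = "We will cut scope 1 and 2 emissions 42% by 2030 from a 2019 baseline."
print(granularity_flags(specific))
print(granularity_score(vague), granularity_score(specific))
```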
Figure 3 provides empirical evidence of the GEO signal’s ability to measure company-level alignment with the energy transition. The left panel shows that GEO scores predict one-year-ahead changes in carbon intensity: if we sort stocks in a combined developed and emerging market universe by the GEO signal, we see a monotonic pattern, with higher GEO groups followed by larger peer-relative declines in carbon intensity. The right panel provides corroborating evidence, showing that companies with the highest GEO scores are more likely to commit to science-based targets or hold a higher proportion of green patents. The demonstrably forward-looking nature of GEO contrasts with conventional climate metrics, such as scope 1 and 2 emissions, which are backward-looking and often reported with a lag.
Figure 3: GEO Environmental Characteristics
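The sorting exercise behind the left panel of Figure 3 can be sketched as follows: express each stock's subsequent change in carbon intensity relative to its peer group, bucket stocks by GEO score, and check whether bucket averages fall monotonically. The peer definition (industry), the use of quintiles, and the toy numbers are assumptions for illustration; they are not the study's data.

```python
from collections import defaultdict
from statistics import mean


def peer_relative(values: list[float], groups: list[str]) -> list[float]:
    """Subtract each peer group's mean from the values of its members."""
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    group_mean = {g: mean(vs) for g, vs in by_group.items()}
    return [v - group_mean[g] for v, g in zip(values, groups)]


def quantile_means(scores: list[float], outcomes: list[float], n_buckets: int = 5) -> list[float]:
    """Mean outcome per score bucket, ordered from lowest to highest score.

    Assumes len(scores) >= n_buckets so that no bucket is empty.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    return [
        mean(outcomes[i] for i in order[b * n // n_buckets:(b + 1) * n // n_buckets])
        for b in range(n_buckets)
    ]


def is_monotone_decreasing(xs: list[float]) -> bool:
    return all(a > b for a, b in zip(xs, xs[1:]))


# Hypothetical toy data: GEO scores, industries, and one-year-ahead
# fractional changes in carbon intensity.
geo = [0.1, 0.9, 0.4, 0.7, 0.2, 0.6, 0.3, 0.8, 0.5, 1.0]
industry = ["util", "util", "tech", "tech", "util", "tech", "util", "tech", "util", "tech"]
ci_change = [0.05, -0.10, 0.02, -0.06, 0.03, -0.03, 0.01, -0.08, -0.02, -0.12]

buckets = quantile_means(geo, peer_relative(ci_change, industry))
print(buckets, is_monotone_decreasing(buckets))
```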
GEO-Based Portfolios: Benefits of a Systematic Implementation
Because the GEO signal allows for better exploitation of the full opportunity set in creating low-carbon-transition portfolios, it also brings to the fore certain design decisions that often go overlooked in the creation of thematic strategies, specifically in relation to the breadth of the investment universe and portfolio construction.
To illustrate, we compare three implementations of the GEO signal. As a common baseline, we start with a hypothetical Low Carbon strategy that is benchmarked to the MSCI World Index. While this baseline integrates ESG-related alpha signals, a net zero glidepath, and exclusions of climate laggards based on fossil fuel revenues and forward-looking considerations, it does not make use of GEO to identify stocks associated with the energy transition.1
In framing hypothetical GEO-based strategies, we consider a conventional use case where an investor seeks positive active exposure to companies that are aligned with and might benefit from the energy transition.2 We compare three approaches to achieving this objective:
- Restricted Universe: Although systematic in implementation, this approach echoes a key aspect of conventional discretionary thematic strategies, which tend to draw from restricted investment universes composed of those companies that are most manifestly aligned with the desired theme. As such, we rank stocks in the investable developed market universe by the GEO signal and only consider the top 20% for portfolio inclusion. To isolate the impact of limiting the investment universe in building the Restricted Universe strategy, we retain most of the other major features of the Low Carbon baseline, including its returns forecasting and portfolio construction elements.3 Nevertheless, we relax the decarbonization constraints themselves, because similar restrictions are not commonly found in discretionary thematic strategies aimed at capitalizing on the energy transition.
- Integrated: Relative to the Restricted Universe approach, this implementation makes better use of the broader investment universe that the GEO signal affords. Specifically, the Integrated GEO strategy applies an additional portfolio-level tilt to the Low Carbon baseline (including the decarbonization constraints), requiring that its (ex-ante) GEO exposure exceed the benchmark’s by 20%.
- Thematic: This implementation is similar to Integrated, except that it applies a stronger tilt, requiring GEO exposure of two times the benchmark’s, with the intent of providing exposure to green companies comparable to that of the Restricted Universe GEO strategy.
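The difference between the Restricted Universe approach and the tilt-based implementations comes down to where the GEO requirement enters: as a pre-filter on eligible names, or as a single constraint on the portfolio's weighted-average GEO score. The helpers below sketch both under simple assumptions (GEO scores are non-negative, so a multiplicative target is meaningful, and exposure is the weighted average of scores). They are not CCAT's optimizer, which would impose the constraint alongside its return forecasts, risk model, and decarbonization limits; the tickers and scores are hypothetical.

```python
def restricted_universe(geo: dict[str, float], top_frac: float = 0.2) -> set[str]:
    """Tickers in the top `top_frac` of the universe ranked by GEO score."""
    ranked = sorted(geo, key=geo.get, reverse=True)
    k = max(1, round(len(ranked) * top_frac))
    return set(ranked[:k])


def geo_exposure(weights: dict[str, float], geo: dict[str, float]) -> float:
    """Weighted-average GEO score of a set of holdings (weights sum to 1)."""
    return sum(w * geo[t] for t, w in weights.items())


def meets_geo_tilt(
    port: dict[str, float],
    bench: dict[str, float],
    geo: dict[str, float],
    multiple: float,
) -> bool:
    """Ex-ante check: portfolio GEO exposure >= multiple x benchmark exposure.

    multiple=1.2 mirrors Integrated (20% above the benchmark);
    multiple=2.0 mirrors Thematic (twice the benchmark).
    """
    return geo_exposure(port, geo) >= multiple * geo_exposure(bench, geo)


# Hypothetical five-stock universe with equal benchmark weights.
geo = {"A": 0.9, "B": 0.7, "C": 0.5, "D": 0.3, "E": 0.1}
bench = {t: 0.2 for t in geo}
tilted = {"A": 0.35, "B": 0.25, "C": 0.2, "D": 0.1, "E": 0.1}

print(restricted_universe(geo))
print(meets_geo_tilt(tilted, bench, geo, 1.2))
print(meets_geo_tilt(tilted, bench, geo, 2.0))
```

Because the tilt is linear in the weights, an optimizer can satisfy it while still drawing on every name in the universe. In this toy universe no long-only portfolio could reach the 2.0x target at all, since the highest score (0.9) is below twice the benchmark's exposure of 0.5.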
To reiterate, each of the hypothetical strategies seeks to maximize risk-adjusted returns using CCAT’s multi-factor stock selection and portfolio construction process while also ensuring the specified minimum GEO exposure (if any). They are all rebalanced monthly from January 2016 – August 2022, and the exercises assume starting AUM of USD 1bn.4
Figure 4 shows headline results. The left panel focuses on the ESG objective. Although revenue-based measures capture thematic alignment only partially, one way to independently measure the effectiveness of applying GEO is to assess exposure to companies that derive a material portion of their revenue from green sources, including a range of environmental solutions such as alternative energy, products that promote energy efficiency, and pollution prevention.5 While the Low Carbon baseline actually exhibits lower exposure to green revenue companies than the MSCI World Index, despite its otherwise climate-favorable construction, all three of the GEO strategies deliver positive active exposure to green revenue companies as at the end of the simulation period. Thematic GEO generates the largest increase, slightly higher than Restricted Universe GEO.
Figure 4: Key Environmental Characteristics and Financial Results—Hypothetical Strategies
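Active exposure to green revenue companies, as in the left panel of Figure 4, is the portfolio's total weight in qualifying companies minus the benchmark's. This section does not state what share of revenue counts as "material," so the 20% threshold below is a hypothetical placeholder, as are the tickers and weights.

```python
def green_revenue_weight(
    weights: dict[str, float], green_share: dict[str, float], threshold: float = 0.2
) -> float:
    """Total weight in companies whose green-revenue share is >= threshold.

    The 0.2 default is a hypothetical stand-in for a "material" share.
    """
    return sum(w for t, w in weights.items() if green_share.get(t, 0.0) >= threshold)


def active_green_exposure(
    port: dict[str, float],
    bench: dict[str, float],
    green_share: dict[str, float],
    threshold: float = 0.2,
) -> float:
    """Portfolio minus benchmark weight in green revenue companies."""
    return green_revenue_weight(port, green_share, threshold) - green_revenue_weight(
        bench, green_share, threshold
    )


green_share = {"A": 0.6, "B": 0.25, "C": 0.05, "D": 0.0}
bench = {"A": 0.1, "B": 0.2, "C": 0.3, "D": 0.4}
port = {"A": 0.2, "B": 0.25, "C": 0.3, "D": 0.25}
print(active_green_exposure(port, bench, green_share))  # about 0.15
```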
The key issue for investors, however, is the financial impact of achieving the ESG gains. The right panel of Figure 4 shows that the Restricted Universe approach, which limits the investment universe as conventional thematic strategies tend to do, has a substantial cost in terms of exposure to CCAT’s alpha model, with nearly a 50% reduction relative to CCAT’s standard global equity strategy. In contrast, the Integrated and Thematic implementations, which make use of portfolio-level tilts to boost GEO exposure, exhibit minimal deterioration.6
Figure 5 shows how the Thematic GEO implementation generates both attractive environmental characteristics and financial results. For each hypothetical strategy, the left panel breaks out contributions to weighted-average carbon intensity (WACI) from stock selection and sector allocation. WACI is an important attribute to track in the context of energy transition-oriented investments, because many companies that have high clean energy exposure also have high current fossil fuel exposure. As a result, thematic strategies that focus on the energy transition also tend to have high WACI, because they overweight carbon-intensive sectors. Consistent with that observation, the Restricted Universe GEO implementation, which does not include decarbonization constraints, indeed exhibits WACI in excess of the benchmark (left panel), reflecting overweights in both energy and utilities (right panel).
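For reference, the conventional TCFD definition of WACI weights each holding's carbon intensity (scope 1 and 2 emissions per million dollars of revenue) by its portfolio weight. The note does not restate which variant it uses, so this definition is an assumption. Active WACI is the portfolio figure minus the benchmark figure:

```latex
\mathrm{WACI}_P = \sum_{i=1}^{N} w_i \,\frac{E_i}{R_i},
\qquad
\Delta\mathrm{WACI} = \mathrm{WACI}_P - \mathrm{WACI}_B
```

Here $w_i$ is the portfolio weight of holding $i$, $E_i$ its scope 1 and 2 emissions (tCO2e), $R_i$ its revenue (USD million), and $\mathrm{WACI}_B$ the same sum computed with benchmark weights.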
But the portfolio tilt-based Integrated and Thematic implementations manage to avoid this elevated WACI profile, in fact still meeting the baseline Low Carbon constraints, while also achieving their GEO targets and maintaining high alpha model exposure. The reason is that these constraint-based GEO implementations extract significant increases in GEO exposure through stock selection rather than blunt sector reallocation.
The charts in Figure 5 illustrate this point. In particular, the left panel shows that stock selection, on average, accounts for nearly all of the Thematic strategy’s benchmark-relative WACI reduction. The right panel shows that the Thematic portfolio maintains an overall sector allocation profile that is reasonably similar to the Low Carbon baseline, including underweights to both energy and utilities. The key to achieving this outcome is that the portfolio tilt-based approaches make use of the GEO signal’s ability to identify diverse companies from across sectors that are highly exposed to the energy transition, which provides greater flexibility to meet the aggressive GEO exposure target while still meeting the Low Carbon strategy’s decarbonization constraints and retaining alpha exposure.
Figure 5: Average Active WACI—Hypothetical Strategies
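The note does not say how it splits active WACI into stock selection and sector allocation. One common choice, assumed here, is a Brinson-style decomposition: the allocation term prices active sector weights at benchmark sector intensities, and the selection term (which in this variant also absorbs the interaction effect) captures within-sector differences. The tickers, sectors, and intensities are hypothetical.

```python
from collections import defaultdict


def waci(weights: dict[str, float], intensity: dict[str, float]) -> float:
    """Weighted-average carbon intensity: sum of weight x intensity."""
    return sum(w * intensity[t] for t, w in weights.items())


def sector_stats(weights, sector, intensity):
    """Per-sector total weight and weight-averaged carbon intensity."""
    w_s = defaultdict(float)
    wc_s = defaultdict(float)
    for t, w in weights.items():
        w_s[sector[t]] += w
        wc_s[sector[t]] += w * intensity[t]
    avg = {s: (wc_s[s] / w_s[s] if w_s[s] else 0.0) for s in w_s}
    return w_s, avg


def waci_attribution(port, bench, sector, intensity) -> tuple[float, float]:
    """Split active WACI into (sector allocation, stock selection).

    allocation = sum_s (Wp_s - Wb_s) * Ib_s
    selection  = sum_s Wp_s * (Ip_s - Ib_s)
    The two terms sum to WACI(port) - WACI(bench), up to float rounding.
    """
    wp, ip = sector_stats(port, sector, intensity)
    wb, ib = sector_stats(bench, sector, intensity)
    sectors = set(wp) | set(wb)
    allocation = sum((wp.get(s, 0.0) - wb.get(s, 0.0)) * ib.get(s, 0.0) for s in sectors)
    selection = sum(wp.get(s, 0.0) * (ip.get(s, 0.0) - ib.get(s, 0.0)) for s in sectors)
    return allocation, selection


sector = {"U1": "Utilities", "U2": "Utilities", "T1": "Tech", "T2": "Tech"}
intensity = {"U1": 800.0, "U2": 200.0, "T1": 20.0, "T2": 10.0}  # tCO2e per USD m revenue
bench = {"U1": 0.2, "U2": 0.2, "T1": 0.3, "T2": 0.3}
port = {"U1": 0.05, "U2": 0.35, "T1": 0.3, "T2": 0.3}

alloc, sel = waci_attribution(port, bench, sector, intensity)
print(alloc, sel, waci(port, intensity) - waci(bench, intensity))
```

In the toy example the portfolio keeps benchmark sector weights but shifts toward the cleaner utility, so essentially all of the active WACI shows up as selection, qualitatively the pattern the note reports for the Thematic strategy.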
Figure 6 highlights another advantage of the portfolio tilt-based GEO implementations by showing the Thematic strategy’s active revenue exposure to different sub-themes of green energy, as measured using data from a third-party source (MSCI). Intuitively, the strategy exhibits positive exposure to many of these sub-themes, including recycling, sustainable agriculture, insulation, and pollution control. But it also has negative active exposure to zero emission vehicles (largely Tesla) and clean transport infrastructure.7 The underweights reflect the influence of valuation signals on stock selection, which help to ensure that the strategies do not buy thematic exposure at any price.
Moreover, many other company attributes that help to predict future returns also influence the portfolio, including governance signals that measure management’s quality and alignment with shareholder incentives. We view such governance considerations as especially important in the development of energy transition thematic portfolios, because the capital flowing into green technologies has increased the risk of overinvestment and other agency problems. Incorporating an alpha model that considers financial and non-financial characteristics may be critical to successful investment outcomes.
Figure 6: Hypothetical Thematic GEO Active Revenue Exposure—By Green Energy Sub-Theme
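Active revenue exposure to a sub-theme, as in Figure 6, can be read as each company's active weight multiplied by the share of its revenue attributed to that sub-theme, summed across companies. The nested-dictionary layout, theme labels, and tickers below are hypothetical; the note does not describe the schema of the MSCI data it uses.

```python
def active_theme_exposure(
    port: dict[str, float],
    bench: dict[str, float],
    theme_share: dict[str, dict[str, float]],
) -> dict[str, float]:
    """Active revenue exposure per sub-theme.

    theme_share[ticker][theme] is the fraction of that company's revenue
    from the sub-theme; missing entries count as 0.
    """
    tickers = set(port) | set(bench)
    themes = {th for shares in theme_share.values() for th in shares}
    return {
        th: sum(
            (port.get(t, 0.0) - bench.get(t, 0.0)) * theme_share.get(t, {}).get(th, 0.0)
            for t in tickers
        )
        for th in themes
    }


# Hypothetical example: an underweight in an EV maker, an overweight in a recycler.
theme_share = {
    "EVCO": {"zero_emission_vehicles": 0.9},
    "RECYC": {"recycling": 0.5},
    "OTHER": {},
}
bench = {"EVCO": 0.10, "RECYC": 0.05, "OTHER": 0.85}
port = {"EVCO": 0.02, "RECYC": 0.10, "OTHER": 0.88}
print(active_theme_exposure(port, bench, theme_share))
```

An underweight in a sub-theme's largest names, such as the zero emission vehicles underweight described above, appears directly as a negative entry.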
Conclusion
Most investors probably don’t readily associate systematic investing with thematic strategies. Yet in the thematic context, the increasingly sophisticated analysis of alternative data that has become a central focus of systematic investing offers substantial benefits.
For example, as we’ve demonstrated in the context of the energy transition, we can apply scalable machine-learning algorithms, and specifically NLP-based textual analysis, to identify a more expansive set of relevant companies on the basis of richer, more forward-looking, and more timely information than conventional methods of stock selection. Moreover, the sophisticated portfolio construction machinery that underlies modern systematic investment processes can extract greater financial value from the resulting investment universe. In the energy transition case study, we show that it better preserves exposure to a holistic stock-selection alpha model and better controls for uncompensated risk exposures than traditional methods of forming thematic portfolios.
Profitably applying these tools isn't easy. It requires significant algorithmic expertise and high-performance computing infrastructure. From the standpoint of a sophisticated systematic investor, however, these foundational investments are part of a much broader and ongoing evolution of the information set and the analytical toolkit. As such, applying them to the development of thematic portfolios is, perhaps, surprisingly natural.