This project extracts and analyzes SEC DEF 14A (proxy statement) filings. It includes a command-line tool that parses the "Compensation Discussion and Analysis" section from a DEF 14A filing and outputs it as plain text. It also includes a pipeline that combines LangChain and GPT-4-mini to convert the extracted text into structured data for financial analysis and reporting.
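As a rough illustration of the section-extraction step, the sketch below pulls the text between two headings out of a filing that has already been converted to plain text. The function name `extract_cda` and the choice of "Compensation Committee Report" as the closing heading are assumptions made for this example, not details of the project's tool.

```python
import re
from typing import Optional

# Heading patterns are assumptions for this sketch; real filings vary in wording.
START = re.compile(r"compensation\s+discussion\s+and\s+analysis", re.IGNORECASE)
END = re.compile(r"compensation\s+committee\s+report", re.IGNORECASE)


def extract_cda(text: str) -> Optional[str]:
    """Return the text from the CD&A heading up to the next known heading."""
    start = START.search(text)
    if start is None:
        return None  # no CD&A heading found
    # Look for the closing heading only after the opening one.
    end = END.search(text, start.end())
    stop = end.start() if end else len(text)
    return text[start.start():stop].strip()


sample = (
    "Proxy Summary\n"
    "Compensation Discussion and Analysis\n"
    "Our pay program rewards performance.\n"
    "Compensation Committee Report\n"
    "The committee reviewed the CD&A."
)
print(extract_cda(sample))
```

A real filing usually names the section in its table of contents before the section itself, so a production parser would need extra logic to skip that first match.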
This study reproduces and extends the ACL 2023 paper by Hallinan et al. on MaRCo, an approach to text detoxification. MaRCo is a weakly-supervised algorithm built on autoencoder Language Models (AE-LMs). It masks potentially toxic spans by comparing the token probability distributions of an expert model and an anti-expert model, then replaces the masked spans with safer tokens.
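The masking idea can be sketched with toy numbers: at each token position, measure how far the expert and anti-expert distributions disagree, and mask the token when the disagreement exceeds a threshold. The use of Hellinger distance, the threshold of 0.5, the two-word vocabulary, and the names `hellinger` and `mask_tokens` are all assumptions for this illustration; the real method obtains the distributions from language models.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)) / 2)


def mask_tokens(tokens, expert_dists, anti_dists, threshold=0.5, mask="<mask>"):
    """Replace tokens where the expert and anti-expert distributions disagree strongly."""
    masked = []
    for tok, p, q in zip(tokens, expert_dists, anti_dists):
        masked.append(mask if hellinger(p, q) > threshold else tok)
    return masked


# Toy per-position distributions over a two-word vocabulary (hypothetical values).
tokens = ["you", "are", "stupid"]
expert = [[0.5, 0.5], [0.5, 0.5], [0.9, 0.1]]
anti = [[0.5, 0.5], [0.6, 0.4], [0.1, 0.9]]

print(mask_tokens(tokens, expert, anti))
```

Only the last position is masked here, because it is the only one where the two toy distributions diverge sharply. A replacement step would then fill each masked position with a token favored by the expert.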
This project builds two recommendation systems, an Item-Based Collaborative Filtering model and a Model-Based approach, that use user review datasets to predict ratings for businesses. The goal is to accurately predict the star rating a user is likely to give a particular business, based on historical user-business interactions, and to use those predictions for personalized recommendations.
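To make the item-based approach concrete, here is a minimal sketch that predicts a rating as the similarity-weighted average of the user's ratings for other businesses. The toy data, the function names, and the choice of cosine similarity over co-rating users are assumptions for this example, not the project's actual configuration.

```python
import math


def cosine_sim(a, b):
    """Cosine similarity between two items, computed over users who rated both."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = math.sqrt(sum(a[u] ** 2 for u in common))
    norm_b = math.sqrt(sum(b[u] ** 2 for u in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(ratings_by_item, user, target):
    """Predict the user's rating for target from the items that user already rated."""
    num = den = 0.0
    for item, ratings in ratings_by_item.items():
        if item == target or user not in ratings:
            continue
        sim = cosine_sim(ratings_by_item[target], ratings)
        if sim > 0:
            num += sim * ratings[user]
            den += sim
    return num / den if den else None  # None when no similar rated item exists


# Hypothetical star ratings: business -> {user: stars}
ratings = {
    "cafe": {"ann": 5, "bob": 3},
    "diner": {"ann": 4, "bob": 2, "cy": 4},
    "bar": {"ann": 1, "cy": 2},
}

print(predict(ratings, "cy", "cafe"))
```

Cosine similarity over a single shared rater is always 1.0, so practical systems often require a minimum number of co-raters or use Pearson or adjusted cosine similarity instead.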