Topic Modeling in Python

Analyzing Computer Monitor Reviews from M.Video

Introduction

  • Topic modeling in Python for analyzing reviews
  • Benefits for marketing agencies and electronics retailers
  • Challenges of manual analysis
  • Using text normalization and stop word removal
  • Introducing the pymorphy2 library (morphological analysis for Russian)
  • Collecting the dataset with a web scraper
  • Overview of the computer monitor reviews dataset

Text Preprocessing

  • Cleaning and normalizing the reviews
  • Removing stop words and special characters
  • Replacing Cyrillic 'ё' with 'е'
  • Removing punctuation and extra spaces
  • Recommendation to remove words shorter than three characters
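
The cleaning steps above can be sketched with the standard library alone. This is a minimal illustration, not the presentation's actual code: the function name `preprocess`, the tiny `STOP_WORDS` set, and the decision to drop digits along with punctuation are all assumptions. Lemmatization with pymorphy2 would normally follow this step and is left out here.

```python
import re

# Hypothetical, deliberately tiny stop word list; a real one would be much longer.
STOP_WORDS = {"это", "как", "для", "что", "очень"}


def preprocess(text: str, min_len: int = 3) -> list[str]:
    """Lowercase, replace 'ё' with 'е', strip non-letters, drop short and stop words."""
    text = text.lower().replace("ё", "е")
    # Assumption: anything that is not a Latin or Cyrillic letter (punctuation,
    # digits, symbols) is replaced by a space.
    text = re.sub(r"[^a-zа-я]+", " ", text)
    tokens = text.split()  # split() also collapses repeated spaces
    return [t for t in tokens if len(t) >= min_len and t not in STOP_WORDS]


sample = "Очень хороший монитор, но подставка шатается!!! Ещё бы кабель..."
tokens = preprocess(sample)
```

Dropping digits loses values such as screen diagonals, so a real pipeline might keep them.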

Topic Modeling

  • Introduction to Latent Dirichlet Allocation (LDA)
  • LDA as a probabilistic modeling technique
  • Training the LDA model using the sklearn library
  • Choosing the number of topics
  • Visualizing the topics using pyLDAvis
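
As a rough sketch of the training step, the snippet below fits scikit-learn's `LatentDirichletAllocation` on a bag-of-words matrix. The six sample reviews (in English for readability), the choice of three topics, and the fixed `random_state` are assumptions made for illustration only.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical, already-preprocessed reviews standing in for the real dataset
reviews = [
    "bright screen good colors",
    "screen colors vivid bright",
    "stand wobbles cheap plastic",
    "plastic stand feels cheap",
    "delivery fast packaging good",
    "fast delivery intact packaging",
]

# LDA works on raw term counts, so CountVectorizer is used rather than TF-IDF
vectorizer = CountVectorizer()
doc_term = vectorizer.fit_transform(reviews)

n_topics = 3  # assumed value; in practice several candidates are compared
lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)

# Each row of doc_topic is one review's probability distribution over topics
doc_topic = lda.fit_transform(doc_term)
```

One common way to choose the number of topics is to compare `lda.perplexity(...)` across candidate values, ideally on held-out reviews, and then check that the resulting topics are interpretable. The pyLDAvis step is omitted because its scikit-learn adapter module has been named differently across releases.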

Topic Analysis

  • Analyzing the top words for each topic
  • Understanding the key themes of the topics
  • Linking topics to specific monitor features
  • Identifying topics with positive or negative sentiment
  • Counting the number of reviews per topic
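
Two of the analysis steps above, listing top words and counting reviews per topic, can be illustrated on small hand-written matrices. The helper names `top_words` and `reviews_per_topic` and all numbers below are hypothetical; in a real run the matrices would come from a fitted model (for scikit-learn, `lda.components_` and `lda.transform(...)`).

```python
from collections import Counter

import numpy as np


def top_words(topic_word, vocab, n=3):
    """For each topic row, return the n highest-weighted words."""
    return [[vocab[i] for i in np.argsort(row)[::-1][:n]] for row in topic_word]


def reviews_per_topic(doc_topic):
    """Assign each review to its most probable topic and count reviews per topic."""
    dominant = np.argmax(doc_topic, axis=1)
    return Counter(int(t) for t in dominant)


# Hypothetical toy values: 2 topics over a 5-word vocabulary, 3 reviews
vocab = ["screen", "colors", "stand", "plastic", "delivery"]
topic_word = np.array([
    [5.0, 4.0, 0.1, 0.2, 0.3],
    [0.2, 0.1, 6.0, 3.0, 0.5],
])
doc_topic = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]])

words = top_words(topic_word, vocab, n=2)
counts = reviews_per_topic(doc_topic)
```

Assigning each review to a single dominant topic is a simplification, since a review may discuss several topics at once.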

Topic Prediction

  • Creating a topic prediction function
  • Handling missing or unclassified data
  • Demonstration of predicting a monitor topic
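
A prediction function along these lines might look like the sketch below. The name `predict_topic`, the training sentences, and the convention of returning `None` for missing or unclassifiable input are assumptions, not the presentation's actual implementation.

```python
from typing import Optional

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical training data standing in for the preprocessed reviews
train = [
    "bright screen good colors",
    "screen colors vivid bright",
    "stand wobbles cheap plastic",
    "plastic stand feels cheap",
]
vectorizer = CountVectorizer()
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(vectorizer.fit_transform(train))


def predict_topic(text, vectorizer, lda) -> Optional[int]:
    """Return the most probable topic index, or None if the text cannot be classified."""
    # Missing values (None, NaN) and empty strings are treated as unclassified
    if not isinstance(text, str) or not text.strip():
        return None
    counts = vectorizer.transform([text])
    # No word from the training vocabulary: there is nothing to base a prediction on
    if counts.sum() == 0:
        return None
    return int(lda.transform(counts)[0].argmax())


topic = predict_topic("bright screen", vectorizer, lda)
```

New text should go through the same preprocessing as the training reviews before being passed to the function.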

Conclusion

  • Summary of the presentation
  • Benefits of topic modeling, restated
  • Suggested improvements and areas for further research