Day 1


IST 736

1.1 Readings

1.2 Student intro

1.3 Text Representation/Vectorization
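As a rough illustration of vectorization, the sketch below turns each document into a bag-of-words count vector over a shared vocabulary. It assumes a naive tokenizer (lowercase, split on whitespace, trim edge punctuation). The helpers `tokenize`, `build_vocabulary`, and `vectorize` are hypothetical names for illustration, not a course-provided API; libraries such as scikit-learn's `CountVectorizer` provide the same idea with many more options.

```python
from __future__ import annotations

from collections import Counter

PUNCT = ".,!?;:\"'()"


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (an assumption): lowercase, whitespace split, trim punctuation.
    tokens = (tok.strip(PUNCT) for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def build_vocabulary(docs: list[str]) -> dict[str, int]:
    # Map each distinct term to a column index, in alphabetical order.
    terms = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {term: idx for idx, term in enumerate(terms)}


def vectorize(doc: str, vocab: dict[str, int]) -> list[int]:
    # Count-based bag-of-words vector; terms outside the vocabulary are dropped.
    vector = [0] * len(vocab)
    for tok, count in Counter(tokenize(doc)).items():
        if tok in vocab:
            vector[vocab[tok]] = count
    return vector


docs = ["The movie was great", "The plot was weak"]
vocab = build_vocabulary(docs)
print(vocab)
print(vectorize("great great movie", vocab))
```

Word order is discarded, which is exactly what makes this representation cheap enough for large collections.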

1.4 Exploratory Text Mining

  • Corpus Statistics
  • Document Clustering
  • Topic Modeling
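The simplest of these exploratory steps, corpus statistics, can be sketched in a few lines. This is a minimal example assuming whitespace tokenization after lowercasing. The function name `corpus_statistics` and the particular statistics chosen (token count, vocabulary size, type-token ratio, most frequent terms) are illustrative choices, not a fixed definition. Clustering and topic modeling are not shown here.

```python
from collections import Counter


def corpus_statistics(docs, top_k=3):
    # Flatten every document into one token stream (lowercase, whitespace split).
    tokens = [tok for doc in docs for tok in doc.lower().split()]
    freq = Counter(tokens)
    return {
        "num_docs": len(docs),
        "num_tokens": len(tokens),
        "vocab_size": len(freq),
        # Distinct terms divided by total tokens; 0.0 for an empty corpus.
        "type_token_ratio": len(freq) / len(tokens) if tokens else 0.0,
        "top_terms": freq.most_common(top_k),
    }


stats = corpus_statistics(["the cat sat", "the cat ran", "the dog"], top_k=2)
print(stats)
```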

1.5 Predictive Text Mining

  • Text categorization
    • Sentiment classification
    • News topic classification
    • Genre classification
    • Using Naive Bayes and SVM (support vector machine) algorithms
  • Regression problems
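To make the Naive Bayes side of text categorization concrete, here is a from-scratch multinomial Naive Bayes classifier with add-one (Laplace) smoothing applied to sentiment classification. It is a sketch under several assumptions: whitespace tokenization, words unseen in training are ignored at prediction time, and the tiny training set is invented for illustration. The class name `NaiveBayesTextClassifier` is hypothetical. In practice, scikit-learn's `MultinomialNB` and `LinearSVC` are the usual library implementations of Naive Bayes and a linear SVM. The SVM is not shown here.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)    # class -> term frequencies
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = doc.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, doc):
        # Words never seen in training carry no evidence, so skip them.
        tokens = [t for t in doc.lower().split() if t in self.vocab]
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, count in self.class_counts.items():
            # Score = log P(class) + sum of log P(word | class), in log space to avoid underflow.
            score = math.log(count / n_docs)
            denom = self.total_words[label] + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train_docs = [
    "great movie loved it",
    "wonderful acting great plot",
    "terrible movie hated it",
    "boring plot awful acting",
]
train_labels = ["pos", "pos", "neg", "neg"]
clf = NaiveBayesTextClassifier().fit(train_docs, train_labels)
print(clf.predict("loved the great acting"))
print(clf.predict("hated boring movie"))
```

Working in log space is a deliberate choice: multiplying many small probabilities directly would underflow on long documents.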

1.6 Difference between Text Mining and NLP

  • NLP
    • Deep linguistic analysis
    • (May take a long time to analyze large collections)
  • Text Mining
    • Shallow analysis (e.g., n-grams) for quick analysis of large collections
    • (Sometimes uses deep NLP features such as part-of-speech (POS) tags or dependency parses for feature engineering)
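The "shallow" n-gram analysis mentioned above can be illustrated with a small sketch that extracts contiguous n-grams and counts them across a collection. It assumes lowercase whitespace tokenization; `ngrams` and `ngram_counts` are hypothetical helper names. No parsing or tagging is involved, which is why this scales to large collections.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return contiguous n-grams (as tuples); empty if the list is shorter than n."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(docs, n):
    # Count n-grams across all documents (lowercase, whitespace tokenization).
    counts = Counter()
    for doc in docs:
        counts.update(ngrams(doc.lower().split(), n))
    return counts


print(ngrams(["text", "mining", "is", "fun"], 2))
bigrams = ngram_counts(["New York is big", "I love New York"], 2)
print(bigrams.most_common(1))
```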

1.7 Class Policies


UNSCHOOLING:

Khan Academy
