daily log 11.30.19

1 minute read

RESOURCES for when I return to this mess

How to Properly Perform LDA – in R

Possible help here

Possible help2 here

UPDATE:

  1. INPUT DF
  2. Get mini neg df (0 and 1)
  3. INPUT DF
  4. Get mini pos df (3 and 4)

ON PLANE

  1. Re-write nlpia LDA algos
  2. Use them on Congress docs
  3. Use them on Screenshot docs

ORGANIZE EXISTING CODE:

  1. Into reusable chunks (for vis, for processing etc)
  2. Into “posts” for portfolio

KAGGLE SENTIMENT WITH LDA:

  1. Import docs
  2. separate into 0 and not zero
  3. separate into 1 and not 1
  4. separate into 2 and not 2
  5. separate into 3 and not 3

  6. separate into 01 and 34

Separate into 01 and 234 get 0 1 back for 0, split into 0 and 1 again if

  1. GET 0 and 1
  2. GET 3 and 4
  3. LEFTOVER is 2

INPUT: binary df OUTPUT: prediction

FOR KAGGLE:

  1. Use training data to get centroids – CENTROIDS FOR 0,1 & 2,3,4 – GET MINI NEG DF —- Run get_centroids(mini_neg_df)

THEN run get_lda_submission

  1. Use training data to get centroids – CENTROIDS FOR 0,1,3 & 3,4 – GET MINI POS DF —- Run get_centroids(mini_pos_df) —- Turn 0 and 1 into 3 and 4

  2. Use training data to get centroids for negatives (STEP 1b)


some pseudo code

`

INPUT OG DF

OUTPUT 0,1 df and 3,4 df

def get_small_df():

def get_negatives(df, new_df): # split big df # take 0 # split again # PRINT! # new 0 = 0, 1 = 1

new_df[‘actual_label’]

# return new_df

def get_positives(df, new_df): # split big df # take 1 # split again # new 0 = 3, 1 = 4

`

Updated: