daily log 11.30.19
RESOURCES for when I return to this mess
How to Properly Perform LDA – in R
Possible help here
Possible help2 here
UPDATE:
- INPUT DF
- Get mini neg df (0 and 1)
- INPUT DF
- Get mini pos df (3 and 4)
ON PLANE
- Re-write nlpia LDA algos
- Use them on Congress docs
- Use them on Screenshot docs
ORGANIZE EXISTING CODE:
- Into reusable chunks (for vis, for processing etc)
- Into “posts” for portfolio
KAGGLE SENTIMENT WITH LDA:
- Import docs
- separate into 0 and not 0
- separate into 1 and not 1
- separate into 2 and not 2
- separate into 3 and not 3
- separate into 01 and 34
- Separate into 01 and 234, get 0 1 back for 0, split into 0 and 1 again if
- GET 0 and 1
- GET 3 and 4
- LEFTOVER is 2
INPUT: binary df OUTPUT: prediction
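Rough sketch of the splits listed above, for future me. Plain Python lists stand in for the df label column so it runs without pandas. The helper names (one_vs_rest, group_split) and the toy labels are made up for illustration, not from any existing code.

```python
def one_vs_rest(labels, target):
    """1 where the label equals target, else 0 (e.g. "0 and not 0")."""
    return [1 if y == target else 0 for y in labels]


def group_split(labels, group):
    """1 where the label falls in group, else 0 (e.g. 01 vs 234)."""
    return [1 if y in group else 0 for y in labels]


# Toy labels on the 0-4 sentiment scale (made-up data)
labels = [0, 1, 2, 3, 4, 2, 1]

is_zero = one_vs_rest(labels, 0)            # 0 and not 0
is_negative = group_split(labels, {0, 1})   # GET 0 and 1
is_positive = group_split(labels, {3, 4})   # GET 3 and 4

# Whatever is neither negative nor positive is the LEFTOVER class
leftover = [y for y, n, p in zip(labels, is_negative, is_positive)
            if not n and not p]
print(leftover)
```

Each binary list is what would go into the binary df that the LDA step takes as input.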
FOR KAGGLE:
- Use training data to get centroids: CENTROIDS FOR 0,1 & 2,3,4. GET MINI NEG DF, then run get_centroids(mini_neg_df), THEN run get_lda_submission
- Use training data to get centroids: CENTROIDS FOR 0,1,2 & 3,4. GET MINI POS DF, then run get_centroids(mini_pos_df), then turn 0 and 1 into 3 and 4
- Use training data to get centroids for negatives (STEP 1b)
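Minimal sketch of the centroid idea, assuming the centroid-style LDA: take the mean vector of each binary class, then score a document by projecting it onto the line between the two centroids. The get_centroids signature here (vectors plus labels instead of a df) and the lda_predict helper are hypothetical, and splitting at the midpoint is my own simplification. Toy 2-D vectors stand in for real document vectors.

```python
def get_centroids(vectors, labels):
    """Mean vector of each binary class (labels must be 0 or 1)."""
    centroids = {}
    for cls in (0, 1):
        members = [v for v, y in zip(vectors, labels) if y == cls]
        centroids[cls] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def lda_predict(vector, centroids):
    """Project onto the line from centroid 0 to centroid 1, split at the midpoint."""
    line = [b - a for a, b in zip(centroids[0], centroids[1])]
    midpoint = [(a + b) / 2 for a, b in zip(centroids[0], centroids[1])]
    score = sum((x - m) * d for x, m, d in zip(vector, midpoint, line))
    return 1 if score > 0 else 0


# Toy training data (made up): class 0 sits near [0, 1], class 1 near [1, 0]
vectors = [[0.0, 1.0], [0.2, 0.8], [1.0, 0.0], [0.8, 0.2]]
labels = [0, 0, 1, 1]

centroids = get_centroids(vectors, labels)
predictions = [lda_predict(v, centroids) for v in vectors]
print(predictions)
```

For the mini pos df the same two functions would apply, with the 0/1 predictions turned back into 3 and 4 afterwards.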
some pseudo code
`
# INPUT: OG df
# OUTPUT: 0,1 df and 3,4 df
def get_small_df():
    pass  # not written yet

def get_negatives(df, new_df):
    # split big df
    # take 0
    # split again
    # PRINT!
    # new 0 = 0, 1 = 1
    new_df['actual_label']
    return new_df

def get_positives(df, new_df):
    # split big df
    # take 1
    # split again
    # new 0 = 3, 1 = 4
    pass  # not written yet
`
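One way the pseudo code above could be filled in, as a sketch only. Assumptions: rows are plain dicts with "phrase" and "label" keys (standing in for df rows so this runs without pandas), the signatures differ from the stubs, and to_actual is a hypothetical helper for turning binary predictions back into the original labels.

```python
def get_small_df(rows, keep):
    """Keep rows whose label is in `keep`; store a 0/1 label plus the original.

    Assumes `keep` holds two consecutive labels, e.g. (0, 1) or (3, 4).
    """
    low = min(keep)
    return [
        {"phrase": r["phrase"], "label": r["label"] - low,
         "actual_label": r["label"]}
        for r in rows
        if r["label"] in keep
    ]


def get_negatives(rows):
    return get_small_df(rows, keep=(0, 1))  # new 0 = 0, 1 = 1


def get_positives(rows):
    return get_small_df(rows, keep=(3, 4))  # new 0 = 3, 1 = 4


def to_actual(binary_label, positive):
    """Map a 0/1 prediction back onto the 0-4 scale."""
    return binary_label + 3 if positive else binary_label


# Toy rows (made up), one per sentiment label
rows = [
    {"phrase": "awful", "label": 0},
    {"phrase": "weak", "label": 1},
    {"phrase": "fine", "label": 2},
    {"phrase": "good", "label": 3},
    {"phrase": "great", "label": 4},
]
mini_neg = get_negatives(rows)
mini_pos = get_positives(rows)
print(mini_pos)  # PRINT!
```

Label 2 lands in neither mini df, which matches the "LEFTOVER is 2" note.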