Data Prep
Counting Unique
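df.groupby(['group']).agg(['min', 'max', 'count', 'nunique'])  # per group: min, max, non-null count and number of distinct values of every other column
df.groupby('param')['group'].nunique()  # number of distinct 'group' values for each 'param'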
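A small runnable sketch of the two one-liners above on a toy frame. The column names `group` and `param` come from the notes. The extra `value` column and all of the data are hypothetical, added only to make the output checkable. Note that `agg` with a list of functions returns MultiIndex columns of the form `(column, function)`.

```python
import pandas as pd

# Hypothetical toy data; 'group' and 'param' match the column names used above
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "param": ["x", "y", "y", "y", "z"],
    "value": [1, 3, 2, 2, 5],
})

# Per-group min, max, non-null count and distinct count for every other column.
# Columns of the result are a MultiIndex such as ('value', 'nunique').
summary = df.groupby(["group"]).agg(["min", "max", "count", "nunique"])
print(summary)

# How many distinct groups each 'param' value appears in
groups_per_param = df.groupby("param")["group"].nunique()
print(groups_per_param)
```

Here `x` only occurs in group `a` and `z` only in group `b`, while `y` occurs in both, so `groups_per_param` is 1, 2 and 1 respectively.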
Dealing with Skewness
There are various ways of dealing with skewed (class-imbalanced) data. I would just like to highlight that SMOTE (Synthetic Minority Over-sampling Technique) is one of the recommended ways to overcome this skewness.
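To show the idea behind SMOTE, here is a minimal from-scratch sketch in NumPy, assuming purely numeric features. Each synthetic minority point is placed at a random position on the line segment between a minority sample and one of its k nearest minority-class neighbours. The function name `smote`, its parameters and the toy data are hypothetical and for illustration only. In practice `imblearn.over_sampling.SMOTE` (parameter `k_neighbors`, method `fit_resample(X, y)`) does this for you.

```python
import numpy as np

def smote(X_min, n_synthetic, k=5, seed=0):
    """Hypothetical minimal SMOTE: create n_synthetic minority samples by
    interpolating between a minority sample and one of its k nearest
    minority-class neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)  # cannot have more neighbours than other minority points

    # Pairwise Euclidean distances between minority samples
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)                 # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]   # indices of the k nearest per sample

    synthetic = np.empty((n_synthetic, X_min.shape[1]))
    for i in range(n_synthetic):
        base = rng.integers(n)                      # random minority sample
        nb = neighbours[base, rng.integers(k)]      # one of its k nearest neighbours
        gap = rng.random()                          # uniform in [0, 1)
        synthetic[i] = X_min[base] + gap * (X_min[nb] - X_min[base])
    return synthetic

# Hypothetical imbalanced data: 20 majority rows, 4 minority rows, 2 features
rng = np.random.default_rng(42)
X_maj = rng.normal(0.0, 1.0, size=(20, 2))
X_min = rng.normal(3.0, 0.5, size=(4, 2))

X_new = smote(X_min, n_synthetic=16, k=3)
X_bal = np.vstack([X_maj, X_min, X_new])
y_bal = np.array([0] * 20 + [1] * (4 + 16))
print(np.bincount(y_bal))  # -> [20 20]
```

Because every new point is an interpolation between two real minority points, SMOTE adds variety rather than exact duplicates, which is the main difference from plain random oversampling.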
If you choose to do upsampling/downsampling, the imblearn package in Python can be helpful. It includes several techniques for dealing with imbalanced data in general. – smm Feb 4 at 0:13
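For intuition about what plain upsampling/downsampling does, here is a pandas-only sketch on a hypothetical two-class frame. The column name `label` and the data are assumptions for illustration. imblearn offers the same operations as `RandomOverSampler` (in `imblearn.over_sampling`) and `RandomUnderSampler` (in `imblearn.under_sampling`), both via `fit_resample(X, y)`.

```python
import pandas as pd

# Hypothetical imbalanced frame: 8 rows of class 0, 2 rows of class 1
df = pd.DataFrame({"x": range(10), "label": [0] * 8 + [1] * 2})

counts = df["label"].value_counts()
majority, minority = counts.idxmax(), counts.idxmin()
df_maj = df[df["label"] == majority]
df_min = df[df["label"] == minority]

# Upsampling: draw minority rows WITH replacement until they match the majority count
df_up = pd.concat([df_maj, df_min.sample(n=len(df_maj), replace=True, random_state=0)])

# Downsampling: keep a random subset of majority rows, WITHOUT replacement
df_down = pd.concat([df_maj.sample(n=len(df_min), random_state=0), df_min])

print(df_up["label"].value_counts().sort_index().to_dict())    # {0: 8, 1: 8}
print(df_down["label"].value_counts().sort_index().to_dict())  # {0: 2, 1: 2}
```

Upsampling keeps all the data but repeats minority rows, which can encourage overfitting. Downsampling avoids duplicates but throws away majority rows.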
When you want it stratified
