Write a text on this topic: Modern big data preprocessing techniques