Data: 402,900 pairs of questions from Quora.
Conducted EDA and created histograms, word clouds, etc. - Descriptive analysis
Preprocessed the text (tokenization, stemming) and applied a Bag-of-Words transformation. - Data cleaning
Defined similarity functions based on five measures: Cosine, Manhattan, Euclidean, Jaccard, and Minkowski.
Ran baseline assessments with each of these five similarity functions and calculated the log loss of each.
Trained SVR, Random Forest Regressor, and Decision Tree Regressor models on the similarity matrix and reported the log loss of each for comparison. - Machine learning
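
The similarity baseline described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the function names, the whitespace tokenizer (no stemming), the Minkowski order p=3, and the 1 / (1 + d) mapping from distance to similarity are all assumptions made for the example.

```python
import math
from collections import Counter


def bag_of_words(q1, q2):
    """Return aligned word-count vectors for two questions.

    Assumption: lowercase whitespace tokenization stands in for the
    tokenization and stemming step.
    """
    c1, c2 = Counter(q1.lower().split()), Counter(q2.lower().split())
    vocab = sorted(set(c1) | set(c2))
    return [c1[w] for w in vocab], [c2[w] for w in vocab]


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def minkowski_distance(u, v, p=3):
    # p=3 is an arbitrary illustrative default; p=1 and p=2 give the
    # Manhattan and Euclidean distances respectively.
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def manhattan_distance(u, v):
    return minkowski_distance(u, v, p=1)


def euclidean_distance(u, v):
    return minkowski_distance(u, v, p=2)


def jaccard_similarity(u, v):
    # Computed on word presence: |shared words| / |all words|.
    shared = sum(1 for a, b in zip(u, v) if a and b)
    total = sum(1 for a, b in zip(u, v) if a or b)
    return shared / total if total else 0.0


def distance_to_similarity(d):
    # Assumed mapping from a distance in [0, inf) to a score in (0, 1].
    return 1.0 / (1.0 + d)


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


if __name__ == "__main__":
    # Hypothetical toy pairs (label 1 = duplicate), not taken from the dataset.
    pairs = [
        ("how do i learn python", "how do i learn java", 0),
        ("what is machine learning", "what is machine learning", 1),
    ]
    labels = [label for _, _, label in pairs]
    scores = [cosine_similarity(*bag_of_words(a, b)) for a, b, _ in pairs]
    print("cosine baseline log loss:", log_loss(labels, scores))
```

Treating a similarity score directly as the predicted probability of a duplicate is what makes log loss applicable to the baseline. Outputs from the regressors would need the same clipping to [0, 1] before being scored this way.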