Projects
Samples of My Work
Data Engineering on PHI data
Confidential
Working with researchers and Ph.D. students to build a data retrieval system that handles 300 gigabytes of Protected Health Information (PHI) and consolidates all variables and data points in a single location.
Using random sampling to reduce the data set to 2% of its original size for statistical analysis, then converting the SQL queries to run on the university’s supercomputing resources to obtain results on the full data set.
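As an illustration of the sampling step, here is a minimal sketch of drawing a reproducible 2% simple random sample. The function name `sample_fraction`, the fixed seed, and the synthetic records are assumptions made for this example; the project's actual sampling method and data are not shown here.

```python
import random


def sample_fraction(rows, fraction=0.02, seed=0):
    """Return a simple random sample containing `fraction` of `rows`."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rows = list(rows)
    # Keep at least one row whenever the input is non-empty.
    k = max(1, round(len(rows) * fraction)) if rows else 0
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return rng.sample(rows, k)


# Synthetic stand-in records (hypothetical); no real PHI is involved.
records = [{"id": i, "value": i % 7} for i in range(1000)]
subset = sample_fraction(records)
```

Queries can be developed and checked against `subset` first, then rerun unchanged on the full data set once they behave as expected.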
Social Media Analysis for Financial Decision-Making
Social Media Mining
Research project using live data sets from Reddit, Twitter, YouTube, and the S&P 500, analyzed with statistical and NLP (Natural Language Processing) techniques, including sentiment analysis and LDA topic modeling.
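To give a feel for sentiment analysis on social media posts, here is a toy lexicon-based scorer. The word list `LEXICON`, the function `sentiment_score`, and the example posts are hypothetical and chosen only for illustration; the project itself may have used a trained model or an established lexicon instead.

```python
import re

# Hypothetical toy lexicon: +1 for positive words, -1 for negative words.
LEXICON = {
    "gain": 1, "gains": 1, "bullish": 1, "great": 1,
    "loss": -1, "losses": -1, "bearish": -1, "crash": -1,
}


def sentiment_score(text):
    """Average lexicon polarity per token; 0.0 for empty text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(LEXICON.get(token, 0) for token in tokens) / len(tokens)


# Made-up example posts, not real data.
posts = [
    "Bullish on tech, great gains today",
    "Market crash brings heavy losses",
]
scores = [sentiment_score(post) for post in posts]
```

Scores above zero suggest positive wording and scores below zero suggest negative wording; aggregating them per day is one way such a signal could be lined up against index movements.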
Predicting review ratings from review text
Yelp Dataset Challenge
Built a collaborative-filtering-based recommender system, as part of the project, to recommend friends using Yelp data.
A study of different machine learning and deep learning models for predicting review ratings from review text, achieving 65% accuracy with a two-layer LSTM-based model. Used various NLP-based feature engineering techniques to improve prediction accuracy.
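The friend recommender described above can be illustrated with a minimal user-based collaborative filtering sketch: users whose ratings point in a similar direction are suggested to each other. The names `cosine` and `suggest_friends` and the tiny ratings table are assumptions made for this example, not the project's actual code or Yelp data.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse rating dicts (item -> stars)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def suggest_friends(ratings, user, top_n=2):
    """Return up to `top_n` users most similar to `user`, best match first."""
    scored = [
        (cosine(ratings[user], other_ratings), other)
        for other, other_ratings in ratings.items()
        if other != user
    ]
    scored.sort(reverse=True)
    # Drop users with no overlap in rated businesses.
    return [other for score, other in scored[:top_n] if score > 0]


# Hypothetical star ratings keyed by user, then by business.
ratings = {
    "ana": {"cafe": 5, "diner": 4},
    "ben": {"cafe": 5, "diner": 5},
    "cy": {"bar": 4},
    "dee": {"cafe": 1, "bar": 5},
}
suggestions = suggest_friends(ratings, "ana")
```

Here `ben` ranks first for `ana` because both rate the same businesses highly, while `cy` is excluded for having no rated businesses in common.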