PROJECTS
Solving La Ghigliottina via Natural Language Processing
Tackling the famous Italian game in which a single target word or phrase links multiple clues. Phrases and sentences gathered from numerous sources were mined for nouns, and several novel algorithms and transformer models were implemented, with rudimentary success.
Key Targets :
Machine Learning, Classification, Neural Networks, Data Analysis, Feature Engineering, Data Visualization
Key Libraries :
Pandas, Plotly, NLTK, Skip-Gram Classification, WordNet, Recurrent Neural Network, Long Short-Term Memory (LSTM), Bidirectional and Auto-Regressive Transformers (BART), T5 Architecture, Sentence Transformer, GPT-2
Programming Language :
Python
Privacy-Preserving Image Processing for Face Recognition Algorithms
Most people's face data is not protected from third-party corporations. This project combines two privacy-preserving techniques, differential privacy and homomorphic encryption, when passing data to a convolutional neural network, so that the data cannot be accessed at any point during its computation.
Key Targets :
Differential Privacy, Homomorphic Encryption, Machine Learning, Classification, Neural Networks, Data Analysis, Feature Engineering, Data Visualization
Key Libraries :
Pandas, Plotly, Convolutional Neural Networks, TensorFlow, OneHot Encoding, Label Encoding, Stochastic Gradient Descent, Differential-Privacy-Adam-Optimizer
Programming Language :
Python
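A minimal sketch of the idea behind a differentially private optimizer step: clip each example's gradient, sum, add Gaussian noise scaled to the clipping bound, then average. The function names and the `max_norm` and `noise_multiplier` parameters are illustrative assumptions, not the project's actual code, and the homomorphic encryption side is not shown.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale one per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_average_gradient(per_example_grads, max_norm, noise_multiplier, rng=random):
    """Clip each example's gradient, sum, add Gaussian noise, then average."""
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # Assumption: noise standard deviation is noise_multiplier * max_norm.
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(clipped) for v in noisy]
```

Clipping bounds how much any single face image can move the model, and the noise masks what remains of that influence.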
Abstractive Text Summarization Using Generative Adversarial Networks
Generating summaries from text data via abstraction, rather than relying on text directly present in the passage.
Key Targets :
Machine Learning, Classification, Neural Networks, Data Analysis, Feature Engineering, Data Visualization
Key Libraries :
TFIDF Vectorization, BERT, RoBERTa, SiameseBERT
Programming Language :
Python
Identification and Classification of Conditions owing to Age from ICR dataset
Across a myriad of studied health conditions, identifying, collating, re-engineering, and analyzing factors that affect the onset of ailments, based specifically on the age of the patients.
Key Targets :
Supervised Machine Learning, Classification, Statistical Inference, Data Analysis, Feature Engineering, Data Visualization
Key Libraries :
Pandas, Plotly, Classification Algorithms (scikit-learn), OneHot Encoder, GradientBoost
Programming Language :
Python
Analyzing and Predicting Factors Affecting Credit Card Approvals
For a group of 100,000 registered clients, inspected, engineered, and analyzed the factors that influence a bank's decision to grant or deny a line of credit. A predictive model was then built to reproduce that decision, with further optimization applied to outliers.
Key Targets :
Machine Learning, Regression, Statistical Inference, Data Analysis, Feature Engineering, Data Visualization
Key Libraries :
tidyverse, ggplot2, dplyr
Programming Language :
R Programming
Generic Clustering System for Any Formatted Dataset
Devised a clustering algorithm based on K-Means and Agglomerative clustering techniques that could take in any data of a specific format and provide recommendations from clusters. Tested on Netflix data and University collection data. Further, devised a graph visualizing the clustering process via NetworkX.
Key Targets :
Machine Learning, Clustering, Feature Engineering, Data Analysis, Data Visualization
Key Libraries :
Pandas, Plotly, NLTK, OneHot Encoding, Label Encoding, K-Means Clustering, Agglomerative Clustering, Silhouette Score, Elbow Method, NetworkX
Programming Language :
Python
Understanding the Role of Optimizers on Convolutional Neural Networks
Compared accuracy and average loss for three optimizers (Adam, SGD, and RMSProp) on a convolutional neural network, then repeated the comparison on a recurrent neural network.
Key Targets :
Neural Networks, Optimization of Neural Networks
Key Libraries :
TensorFlow, PyTorch, Adam, SGDOptimizer, RMSProp
Programming Language :
Python
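A minimal sketch of how the three update rules differ, run on the toy function f(x) = x**2 instead of a neural network. The learning rates, step count, and helper names are assumptions chosen for illustration; the project itself used TensorFlow and PyTorch optimizers.

```python
import math


def minimize(update, steps=200, x0=5.0):
    """Run an update rule on f(x) = x**2 and return the final loss."""
    x, state = x0, {}
    for t in range(1, steps + 1):
        grad = 2.0 * x  # derivative of x**2
        x = update(x, grad, state, t)
    return x * x


def sgd(x, grad, state, t, lr=0.1):
    # Plain gradient step with a fixed learning rate.
    return x - lr * grad


def rmsprop(x, grad, state, t, lr=0.05, rho=0.9, eps=1e-8):
    # Divide the step by a running average of squared gradients.
    state["v"] = rho * state.get("v", 0.0) + (1 - rho) * grad * grad
    return x - lr * grad / (math.sqrt(state["v"]) + eps)


def adam(x, grad, state, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # Momentum (m) plus squared-gradient scaling (v), both bias-corrected.
    state["m"] = b1 * state.get("m", 0.0) + (1 - b1) * grad
    state["v"] = b2 * state.get("v", 0.0) + (1 - b2) * grad * grad
    m_hat = state["m"] / (1 - b1 ** t)
    v_hat = state["v"] / (1 - b2 ** t)
    return x - lr * m_hat / (math.sqrt(v_hat) + eps)


results = {name: minimize(fn)
           for name, fn in [("SGD", sgd), ("RMSProp", rmsprop), ("Adam", adam)]}
```

Comparing the values in `results` against the starting loss of 25 gives the same kind of side-by-side view as the accuracy and loss comparison described above, on a much smaller problem.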
Analyzing Customer Demographics and Behaviour to Strategize Targeted Marketing for an Online Retail Store
Devised targeted and broad marketing strategies for an online retail company's 85,000+ products across 23 categories, based on sales, overall perception, shipping capabilities, and general demand across the landscape.
Key Targets :
Data Analysis, Feature Engineering, Data Visualization
Key Libraries :
tidyverse, ggplot2, dplyr
Programming Language :
R Programming
Building a Twitter Database with an Optimized Search via Machine Learning and Sentiment Analysis
Developed an optimized algorithm to pull information from relational (PostgreSQL) and non-relational (MongoDB) databases containing over 1,000,000 tweets. Further developed an algorithm to improve search based on hashtags, users, and a custom method of ranking engagement, optimized using sentiment analysis.
Key Targets :
Database Management, Machine Learning, Data Analysis, Feature Engineering, Data Visualization, Sentiment Analysis
Key Libraries :
Pandas, Plotly, NLTK, TextBlob, PostgreSQL, MongoDB, SQL, MySQL
Programming Language :
Python
Analyzing Factors Affecting the S&P500 Index and Forecasting the Stock Price for a Company in the Highest Performing Sector
Collated over 10 years of S&P500 data for each company within its 11 sectors. Provided a macro-micro approach, analyzing each sector's impact on the overall index and the impact of top-performing companies within the sector, and finally, forecasting Pfizer's stock for a year within the healthcare sector.
Key Targets :
Machine Learning, Regression, Data Analysis, Feature Engineering, Data Visualization, ARIMA, ARIMAx, SARIMA, Box Test, Chi-Square Test
Key Libraries :
tidyverse, ggplot2, dplyr
Programming Language :
R Programming
A Computational Linguistics Approach to Clustering Scientific Research Papers
For a collection of over 250,000 research papers whose title and abstract were publicly available, devised a clustering algorithm to categorize these papers into broad fields and niche subfields. For Latin and scientific text, Computational Linguistics was introduced atop NLP.
Key Targets :
Machine Learning, Clustering, Neural Networks, Data Analysis, Feature Engineering, Data Visualization, t-SNE, UMAP
Key Libraries :
Pandas, Plotly, NLTK, Recurrent Neural Network, t-SNE, UMAP, TensorFlow, PyTorch, BERT, RoBERTa, SiameseBERT, K-Means Clustering
Programming Language :
Python
Identification and Analysis of Factors Affecting the Job Market for Aspiring Data Scientists, and Predicting Possibilities of Garnering a Job
Analyzed over 100,000 potential candidate portfolios to identify factors that enable or inhibit the possibility of garnering a job in Data Science. Particular attention was paid to experience, projects, and collaborative nature.
Key Targets :
Machine Learning, Classification, Data Analysis, Feature Engineering, Data Visualization
Key Libraries :
Pandas, Plotly, Classification Algorithms (scikit-learn), SMOTE, TensorFlow & Keras, GBM Classifier
Programming Language :
Python
Recommender System for Netflix Data
Employed Clustering and Neural Networks to develop a recommender system for movies and TV shows on Netflix, with an emphasis on watch time, search, and ratings. Also developed a rudimentary method of retaining search history to improve the search experience.
Key Targets :
Machine Learning, Clustering, Neural Networks, Data Analysis, Feature Engineering, Data Visualization
Key Libraries :
Pandas, Plotly, NLTK, Clustering Algorithms (scikit-learn), Word2Vec, TensorFlow & Keras
Programming Language :
Python
Analyzing Factors Influencing Heart Disease and Predicting Risk of Onset Among Patients
For 85,000 patients over the age of 60, analyzed health conditions and vital reports to understand factors that influenced the onset of any cardio-related ailment. Also predicted the risk of contracting such an ailment from the data.
Key Targets :
Machine Learning, Classification, Data Analysis, Feature Engineering, Data Visualization
Key Libraries :
Pandas, Plotly, Classification Algorithms, Label Encoding, OneHot Encoding, Boosting Algorithms
Programming Language :
Python
Survival Guide for the Safest Nations During the COVID Pandemic - Analysis of Economy and Demographic Indexes
Analyzed economic factors such as GDP, GDP per capita, pharmaceutical exports and imports, the Human Development Index, the Stringency Index, and other demographic indexes.
Key Targets :
Machine Learning, Classification, Neural Networks, Data Analysis, Feature Engineering, Data Visualization
Key Libraries :
Pandas, Plotly, NLTK, Skip-Gram Classification, WordNet, Recurrent Neural Network, Long Short-Term Memory (LSTM), Bidirectional and Auto-Regressive Transformers (BART), T5 Architecture, Sentence Transformer, GPT-2
Programming Language :
Python
Analyzing Cricket Matches and Ball-by-Ball Play of the Indian Premier League - Creating an "Unbeatable" Team from the Analysis Stats
The IPL is a multi-billion-dollar sporting league that garners millions of fans from around the world. This project analyzes 8 seasons of the IPL, with 170 matches per season, to determine the success factors for any team, and uses that analysis to assemble an "unbeatable" team.
Key Targets :
Data Analysis, Feature Engineering, Data Visualization
Key Libraries :
Pandas, Plotly, Seaborn
Programming Language :
Python
Analyzing and Predicting House Prices
Application of analysis and rudimentary supervised machine learning to predict house prices based on feature engineering.
Key Targets :
Machine Learning, Classification, Data Analysis, Feature Engineering, Data Visualization
Key Libraries :
Pandas, Plotly, Seaborn, Classification Algorithms (scikit-learn)
Programming Language :
Python
Analyzing and Predicting Mobile Prices
Application of analysis and rudimentary supervised machine learning to predict mobile and smartphone prices based on feature engineering.
Key Targets :
Machine Learning, Classification, Data Analysis, Feature Engineering, Data Visualization
Key Libraries :
Pandas, Plotly, Classification Algorithms (scikit-learn)
Programming Language :
Python
Titanic - A Survival Prediction Guide
Application of Machine Learning and Feature Engineering to predict survival rate across various classes aboard the Titanic.
Key Targets :
Machine Learning, Classification, Data Analysis, Feature Engineering, Data Visualization
Key Libraries :
Pandas, Plotly, Seaborn, Classification Algorithms (scikit-learn)
Programming Language :
Python
Analyzing Suicide Rates for the Past 40 Years Around the World
A deep analysis of suicide rates across major countries around the world, over a timespan of 40 years, and the factors that influence them.
Key Targets :
Data Analysis, Feature Engineering, Data Visualization
Key Libraries :
Pandas, Plotly, Seaborn
Programming Language :
Python
Sentiment Analysis on Live Tweets
Analyzing live tweets and categorizing them by sentiment, then using n-gram prediction to suggest follow-up words for a search query based on the tweets.
Key Targets :
Natural Language Processing, Machine Learning, Data Analysis, Feature Engineering, Data Visualization
Key Libraries :
Pandas, Plotly, NLTK, n-gram, Word2Vec, TextBlob, Seaborn
Programming Language :
Python
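A minimal sketch of n-gram follow-up prediction, using bigrams (n = 2) and plain whitespace tokenization. The function names and the choice of bigrams are assumptions for illustration; the project itself lists NLTK for this work.

```python
from collections import Counter, defaultdict


def build_bigram_model(texts):
    """Count, for every word, which words follow it across the given texts."""
    followers = defaultdict(Counter)
    for text in texts:
        tokens = text.lower().split()
        for current, nxt in zip(tokens, tokens[1:]):
            followers[current][nxt] += 1
    return followers


def predict_next(model, word, k=3):
    """Return up to k most frequent follow-up words for the given word."""
    counts = model.get(word.lower(), Counter())
    return [w for w, _ in counts.most_common(k)]
```

Given a model built from collected tweets, `predict_next(model, query_word)` returns the words that most often followed that word, which can be offered as search suggestions.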
An Artificially Intelligent "Unbeatable" Player for Checkers
A computer playing checkers against any user, learning with each move played in the game to ultimately become an undefeatable machine.
Key Targets :
Artificial Intelligence, Minimax Algorithm
Key Libraries :
Minimax Algorithm, PyGame
Programming Language :
Python
Solving Sudoku via Machine Learning
The computer is presented with a Sudoku puzzle and solves any valid puzzle it receives.
Key Targets :
Machine Learning, Back-Tracking Algorithm
Key Libraries :
Back-Tracking Algorithm, PyGame
Programming Language :
Python
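A minimal sketch of a back-tracking Sudoku solver, assuming the puzzle is a 9x9 list of lists with 0 marking an empty cell. The function names and grid representation are illustrative assumptions, and the PyGame display layer is not shown.

```python
def is_valid(grid, row, col, digit):
    """Check that digit does not clash in its row, column, or 3x3 box."""
    if digit in grid[row]:
        return False
    if any(grid[r][col] == digit for r in range(9)):
        return False
    top, left = 3 * (row // 3), 3 * (col // 3)
    return all(grid[r][c] != digit
               for r in range(top, top + 3)
               for c in range(left, left + 3))


def solve(grid):
    """Fill empty cells (0) in place; return True if a solution was found."""
    for row in range(9):
        for col in range(9):
            if grid[row][col] == 0:
                for digit in range(1, 10):
                    if is_valid(grid, row, col, digit):
                        grid[row][col] = digit
                        if solve(grid):
                            return True
                        grid[row][col] = 0  # undo and try the next digit
                return False  # no digit fits here: backtrack
    return True  # no empty cells left
```

Each empty cell is tried with the digits 1 to 9 in turn; when none fits, the solver undoes the previous choice and moves on, so every valid puzzle is eventually solved.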
Rudimentary Chatbot Designed with Deep Learning
Skeletal bots developed for various social messaging services that can interact with any user, perform basic functions, and send replies.
Key Targets :
Deep Learning, Natural Language Processing
Key Libraries :
TensorFlow & Keras, JSON
Programming Language :
Python
Personal Cryptocurrency - The Biro Coin
A personal cryptocurrency built on top of the ERC-20 token standard.
Key Targets :
Cryptocurrency
Programming Language :
Solidity