PROJECTS

Solving La Ghigliottina via Natural Language Processing

Tackling the famous Italian game in which a single target word or phrase links multiple clues, using phrases and sentences gathered from numerous sources to extract candidate nouns. Several novel algorithms and transformer models were implemented for this task, with rudimentary success.

Key Targets :

Machine Learning, Classification, Neural Networks, Data Analysis, Feature Engineering, Data Visualization

Key Libraries :

Pandas, Plotly, NLTK, Skip-Gram Classification, WordNet, Recurrent Neural Network, Long Short Term Memory (LSTM), Bidirectional Auto Regressive Transformer (BART), T5 Architecture, Sentence Transformer, GPT-2

Programming Language :

Python

Privacy-Preserving Image Processing for Face Recognition Algorithms

Most people's face data is not protected from third-party corporations. This project applies two privacy-preserving techniques, differential privacy and homomorphic encryption, while passing data to a convolutional neural network, so that the data cannot be accessed at any point during computation.

Key Targets :

Differential Privacy, Homomorphic Encryption, Machine Learning, Classification, Neural Networks, Data Analysis, Feature Engineering, Data Visualization

Key Libraries :

Pandas, Plotly, Convolutional Neural Networks, TensorFlow, OneHot Encoding, Label Encoding, Stochastic Gradient Descent, Differential-Privacy-Adam-Optimizer

Programming Language :

Python

Abstractive Text Summarization Using Generative Adversarial Networks

Generating summaries from text data via abstraction, rather than relying on text directly present in the passage.

Key Targets :

Machine Learning, Classification, Neural Networks, Data Analysis, Feature Engineering, Data Visualization

Key Libraries :

TFIDF Vectorization, BERT, RoBERTa, SiameseBERT

Programming Language :

Python

Identification and Classification of Conditions owing to Age from ICR dataset

Among a myriad of studied health conditions, identifying, collating, re-engineering, and analyzing the factors that affect the onset of ailments, specifically with respect to the age of the patients.

Key Targets :

Supervised Machine Learning, Classification, Statistical Inference, Data Analysis, Feature Engineering, Data Visualization

Key Libraries :

Pandas, Plotly, Classification Algorithms (scikit-learn), OneHot Encoder, GradientBoost

Programming Language :

Python

Analyzing and Predicting Factors Affecting Credit Card Approvals

For a group of 100,000 registered clients, inspecting, engineering, and analyzing the factors that influence a bank's decision to provide or deny a line of credit. A predictive model was then devised to make the same decision, and further optimization was applied to handle outliers.

Key Targets :

Machine Learning, Regression, Statistical Inference, Data Analysis, Feature Engineering, Data Visualization

Key Libraries :

tidyverse, ggplot2, dplyr

Programming Language :

R Programming

Generic Clustering System for Any Formatted Dataset

Devised a clustering algorithm based on K-Means and Agglomerative clustering techniques that could take in any data of a specific format and provide recommendations from clusters. Tested on Netflix data and University collection data. Further, devised a graph visualizing the clustering process via NetworkX.

Key Targets :

Machine Learning, Clustering, Feature Engineering, Data Analysis, Data Visualization

Key Libraries :

Pandas, Plotly, NLTK, OneHot Encoding, Label Encoding, K-Means Clustering, Agglomerative Clustering, Silhouette Score, Elbow Method, NetworkX

Programming Language :

Python

Understanding the Role of Optimizers on Convolutional Neural Networks

Compared accuracy and average loss for three optimizers (Adam, SGD, and RMSProp) on a convolutional neural network. The same comparison was repeated on a recurrent neural network.

Key Targets :

Neural Networks, Optimization of Neural Networks

Key Libraries :

TensorFlow, PyTorch, Adam, SGDOptimizer, RMSProp

Programming Language :

Python

Analyzing Customer Demographics and Behaviour to Strategize Targeted Marketing for an Online Retail Store

Devised targeted and broad marketing strategies for an online retail company's 85,000+ products across 23 categories, based on sales, overall perception, shipping capabilities, and general demand across the landscape.

Key Targets :

Data Analysis, Feature Engineering, Data Visualization

Key Libraries :

tidyverse, ggplot2, dplyr

Programming Language :

R Programming

Building a Twitter Database with an Optimized Search via Machine Learning and Sentiment Analysis

Developed an optimized algorithm to pull information from a relational database (PostgreSQL) and a non-relational database (MongoDB) that together contain over 1,000,000 tweets. Further developed an algorithm to improve search based on hashtags, users, and a custom method of ranking engagement, optimized using sentiment analysis.

Key Targets :

Database Management, Machine Learning, Data Analysis, Feature Engineering, Data Visualization, Sentiment Analysis

Key Libraries :

Pandas, Plotly, NLTK, TextBlob, PostgreSQL, MongoDB, SQL, MySQL

Programming Language :

Python

Analyzing Factors Affecting the S&P500 Index and Forecasting the Stock Price for a Company in the Highest Performing Sector

Collated over 10 years of S&P500 data for each company within its 11 sectors. Took a macro-to-micro approach: analyzing each sector's impact on the overall index, then the impact of top-performing companies within each sector, and finally forecasting one year of stock prices for Pfizer, a company in the healthcare sector.

Key Targets :

Machine Learning, Regression, Data Analysis, Feature Engineering, Data Visualization, ARIMA, ARIMAx, SARIMA, Box Test, Chi-Square Test

Key Libraries :

tidyverse, ggplot2, dplyr

Programming Language :

R Programming

A Computational Linguistics Approach to Clustering Scientific Research Papers

For a collection of over 250,000 research papers whose titles and abstracts were publicly available, devised a clustering algorithm to categorize the papers into broad fields and niche subfields. To handle Latin and scientific text, computational linguistics techniques were applied on top of standard NLP.

Key Targets :

Machine Learning, Clustering, Neural Networks, Data Analysis, Feature Engineering, Data Visualization, t-SNE, UMAP

Key Libraries :

Pandas, Plotly, NLTK, Recurrent Neural Network, t-SNE, UMAP, TensorFlow, PyTorch, BERT, RoBERTa, SiameseBERT, K-Means Clustering

Programming Language :

Python

Identification and Analysis of Factors Affecting the Job Market for Aspiring Data Scientists, and Predicting Possibilities of Garnering a Job

Analyzed over 100,000 potential candidate portfolios to identify factors that enable or inhibit the possibility of garnering a job in Data Science. Specific focus was paid to experience, projects, and collaborative nature.

Key Targets :

Machine Learning, Classification, Data Analysis, Feature Engineering, Data Visualization

Key Libraries :

Pandas, Plotly, Classification Algorithms (scikit-learn), SMOTE, TensorFlow & Keras, GBM Classifier

Programming Language :

Python

Recommender System for Netflix Data

Employed clustering and neural networks to develop a recommender system for movies and TV shows on Netflix, with an emphasis on watch time, search, and ratings. Also developed a rudimentary method of retaining search history to improve the search experience.

Key Targets :

Machine Learning, Clustering, Neural Networks, Data Analysis, Feature Engineering, Data Visualization

Key Libraries :

Pandas, Plotly, NLTK, Clustering Algorithms (scikit-learn), Word2Vec, TensorFlow & Keras

Programming Language :

Python

Analyzing Factors Influencing Heart Disease and Predicting Risk of Onset Among Patients

For 85,000 patients over the age of 60, analyzed health conditions and vital reports to understand factors that influenced the onset of any cardio-related ailment. Also predicted the risk of contracting such an ailment from the data.

Key Targets :

Machine Learning, Classification, Data Analysis, Feature Engineering, Data Visualization

Key Libraries :

Pandas, Plotly, Classification Algorithms, Label Encoding, OneHot Encoding, Boosting Algorithms

Programming Language :

Python

Survival Guide for the Safest Nations During the COVID Pandemic - Analysis of Economy and Demographic Indexes

Analyzed economic factors such as GDP, GDP per capita, pharmaceutical exports and imports, the Human Development Index, the Stringency Index, and other demographic indexes to assess which nations were safest during the COVID pandemic.

Key Targets :

Machine Learning, Classification, Neural Networks, Data Analysis, Feature Engineering, Data Visualization

Key Libraries :

Pandas, Plotly, NLTK, Skip-Gram Classification, WordNet, Recurrent Neural Network, Long Short Term Memory (LSTM), Bidirectional Auto Regressive Transformer (BART), T5 Architecture, Sentence Transformer, GPT-2

Programming Language :

Python

Analyzing Cricket Matches and Ball-by-Ball Play of the Indian Premier League - Creating an "Unbeatable" Team from the Analysis Stats

The IPL is a multi-billion-dollar sporting league that garners millions of fans from around the world. This project analyzes 8 seasons of the IPL, with 170 matches per season, to determine the success factors for any team. The algorithm also creates an "unbeatable" team based on its analysis.

Key Targets :

Data Analysis, Feature Engineering, Data Visualization

Key Libraries :

Pandas, Plotly, Seaborn

Programming Language :

Python

Analyzing and Predicting House Prices

Application of analysis and rudimentary supervised machine learning to predict house prices based on feature engineering.

Key Targets :

Machine Learning, Classification, Data Analysis, Feature Engineering, Data Visualization

Key Libraries :

Pandas, Plotly, Seaborn, Classification Algorithms (scikit-learn)

Programming Language :

Python

Analyzing and Predicting Mobile Prices

Application of analysis and rudimentary supervised machine learning to predict mobile and smartphone prices based on feature engineering.

Key Targets :

Machine Learning, Classification, Data Analysis, Feature Engineering, Data Visualization

Key Libraries :

Pandas, Plotly, Classification Algorithms (scikit-learn)

Programming Language :

Python

Titanic - A Survival Prediction Guide

Application of Machine Learning and Feature Engineering to predict survival rate across various classes aboard the Titanic.

Key Targets :

Machine Learning, Classification, Data Analysis, Feature Engineering, Data Visualization

Key Libraries :

Pandas, Plotly, Seaborn, Classification Algorithms (scikit-learn)

Programming Language :

Python

Analyzing Suicide Rates for the Past 40 Years Around the World

A deep analysis of suicide rates across major countries around the world over a timespan of 40 years, and of the factors that influence them.

Key Targets :

Data Analysis, Feature Engineering, Data Visualization

Key Libraries :

Pandas, Plotly, Seaborn

Programming Language :

Python

Sentiment Analysis on Live Tweets

Analyzing live tweets and categorizing them by sentiment. Additionally, n-gram prediction is used to suggest follow-up words for a search query, based on the tweets.

Key Targets :

Natural Language Processing, Machine Learning, Data Analysis, Feature Engineering, Data Visualization

Key Libraries :

Pandas, Plotly, NLTK, n-gram, Word2Vec, TextBlob, Seaborn

Programming Language :

Python
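
The n-gram follow-up prediction mentioned for this project might look roughly like the sketch below. It is a minimal bigram version using only the standard library; the function names `build_bigram_model` and `predict_next` and the sample tweets are hypothetical and are not taken from the project itself, which may use NLTK and higher-order n-grams.

```python
from collections import Counter, defaultdict


def build_bigram_model(texts):
    """Count, for every word, which words follow it across all texts."""
    followers = defaultdict(Counter)
    for text in texts:
        tokens = text.lower().split()
        for current_word, next_word in zip(tokens, tokens[1:]):
            followers[current_word][next_word] += 1
    return followers


def predict_next(followers, query, k=3):
    """Return up to k of the most frequent words seen after the query's last word."""
    tokens = query.lower().split()
    if not tokens:
        return []
    counts = followers.get(tokens[-1], Counter())
    return [word for word, _ in counts.most_common(k)]


# Hypothetical sample data standing in for a stream of tweets.
sample_tweets = [
    "the match was great today",
    "the match was boring",
    "the weather was great",
]
model = build_bigram_model(sample_tweets)
suggestions = predict_next(model, "match was")
print(suggestions)
```

Here the suggestions are ranked by how often each word followed "was" in the sample tweets; a query whose last word was never seen returns an empty list.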

An Artificially Intelligent "Unbeatable" Player for Checkers

A computer playing checkers against any user, learning with each move played in the game to ultimately become an undefeatable machine.

Key Targets :

Artificial Intelligence, Minimax Algorithm

Key Libraries :

Minimax Algorithm, PyGame

Programming Language :

Python
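
The minimax algorithm named for the checkers project can be illustrated on a much smaller game. The sketch below is not the project's checkers engine: it assumes a toy game in which two players alternately remove 1 to 3 stones from a pile and whoever takes the last stone wins. The names `minimax`, `best_move`, and `MOVES` are hypothetical.

```python
from functools import lru_cache

# Assumed toy rules: a player may remove 1, 2, or 3 stones per turn.
MOVES = (1, 2, 3)


@lru_cache(maxsize=None)
def minimax(stones, maximizing):
    """Score a position from the maximizer's view: +1 is a win, -1 is a loss."""
    if stones == 0:
        # The previous player took the last stone, so the player to move has lost.
        return -1 if maximizing else 1
    scores = [minimax(stones - m, not maximizing) for m in MOVES if m <= stones]
    return max(scores) if maximizing else min(scores)


def best_move(stones):
    """Pick the move whose resulting position scores best for the maximizer."""
    legal = [m for m in MOVES if m <= stones]
    return max(legal, key=lambda m: minimax(stones - m, False))


print(best_move(5), minimax(5, True))
print(best_move(7), minimax(7, True))
```

The same recursion applies to checkers in principle, but a real engine would need a board representation, move generation, a depth limit, and a heuristic evaluation function in place of the exact win/loss scores used here.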

Solving Sudoku via Machine Learning

The computer is presented with a Sudoku puzzle and can solve any valid puzzle it is given.

Key Targets :

Machine Learning, Back-Tracking Algorithm

Key Libraries :

Back-Tracking Algorithm, PyGame

Programming Language :

Python
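
The backtracking approach listed for the Sudoku project can be sketched as follows. This is a minimal, generic solver written for illustration, not the project's own code; the function names and the example grid are assumptions, and empty cells are assumed to be encoded as 0.

```python
def find_empty(board):
    """Return the (row, col) of the first empty cell, or None if the board is full."""
    for r in range(9):
        for c in range(9):
            if board[r][c] == 0:
                return r, c
    return None


def is_valid(board, row, col, digit):
    """Check that digit does not already appear in the row, column, or 3x3 box."""
    if digit in board[row]:
        return False
    if any(board[r][col] == digit for r in range(9)):
        return False
    box_row, box_col = 3 * (row // 3), 3 * (col // 3)
    for r in range(box_row, box_row + 3):
        for c in range(box_col, box_col + 3):
            if board[r][c] == digit:
                return False
    return True


def solve(board):
    """Fill the board in place; return True if a solution was found."""
    cell = find_empty(board)
    if cell is None:
        return True
    row, col = cell
    for digit in range(1, 10):
        if is_valid(board, row, col, digit):
            board[row][col] = digit
            if solve(board):
                return True
            board[row][col] = 0  # Undo the guess and try the next digit.
    return False


# Example grid (0 marks an empty cell); chosen only for illustration.
puzzle = [
    [5, 3, 0, 0, 7, 0, 0, 0, 0],
    [6, 0, 0, 1, 9, 5, 0, 0, 0],
    [0, 9, 8, 0, 0, 0, 0, 6, 0],
    [8, 0, 0, 0, 6, 0, 0, 0, 3],
    [4, 0, 0, 8, 0, 3, 0, 0, 1],
    [7, 0, 0, 0, 2, 0, 0, 0, 6],
    [0, 6, 0, 0, 0, 0, 2, 8, 0],
    [0, 0, 0, 4, 1, 9, 0, 0, 5],
    [0, 0, 0, 0, 8, 0, 0, 7, 9],
]
solved = solve(puzzle)
for row in puzzle:
    print(row)
```

The solver tries digits in order and undoes a placement as soon as it leads to a dead end, so it finds a solution whenever one exists, although hard puzzles may take noticeably longer.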

Rudimentary Chatbot Designed with Deep Learning

Skeletal bots developed for various social messaging services that can interact with any user, perform basic functions, and send basic replies.

Key Targets :

Deep Learning, Natural Language Processing

Key Libraries :

TensorFlow & Keras, JSON

Programming Language :

Python

Personal Cryptocurrency - The Biro Coin

A personal cryptocurrency built on top of the ERC-20 token standard.

Key Targets :

Cryptocurrency

Programming Language :

Solidity