Hi! I am Mayur 👋
Data Professional

I graduated with an MS in Data Engineering & Analytics from Northeastern University (Class of 2024 🎓) and have 2+ years of experience in Python, PySpark, SQL, JavaScript, and a suite of cloud services (AWS, GCP, Databricks). My expertise centers on leading Data Engineering, Data Science, and Data Analytics projects and pipelines at petabyte scale that significantly improve business operations.

Socials:

GitHub, LinkedIn, Instagram, Tableau, Kaggle


About me

Mayur Mahanta
I'm a data engineer turned data scientist with experience in Python, PySpark, SQL, AWS services (S3, Glue Studio, Athena, QuickSight, Redshift, Lambda), GCP services, and Databricks, along with Talend, Tableau, Power BI, PostgreSQL, and MongoDB. I also have experience in JavaScript, including frameworks and libraries such as ReactJS, ThreeJS, AngularJS, and VueJS. On the data side, 2+ years of work experience and a number of personal projects have taught me the importance of an iterative, hypothesis-oriented approach to Data Engineering, Data Science/Analytics, and Decision Intelligence. I have also worked as a Software Engineer in front-end development and testing, where I used tools like Cypress and Docker to validate and improve the quality, reliability, and performance of applications. I'm also an audiophile, guitar player, and vagabond who likes to explore nature in pursuit of a greater purpose in life!

Data Science

Data Engineering

Analytics

Cloud Services

Technology & Expertise

Programming Languages

Python
C
Java
R
JavaScript
Scala

Frameworks & Libraries

React
Node.js
Vue
Angular
Bootstrap
Three.js

Cloud Services & Platforms

AWS
Azure
Docker
Git
GitHub
GitLab

Databases

MongoDB
MySQL
DynamoDB
PostgreSQL

Tools & Other Technologies

VS Code
Eclipse
Maven
Scikit-learn
Kafka
Regex

Game Development

Unity
Unreal
 

Work Experience

 

Projects

The projects below demonstrate my skills and experience. Each one is briefly described and links to its code repository. Together they show my ability to solve complex problems, work across diverse technologies, and manage projects end to end.

source code

Classification Analysis on NYPD Crime Arrest Data

A breakdown of NYPD arrests in New York City this year. I performed data wrangling, including preprocessing and cleaning, and used Python libraries such as Scikit-Learn, NumPy, Pandas, Seaborn, and Matplotlib to build visualizations. Machine learning models used:

#DataScience

#KNN

#Linear Regression

#Logistic Regression

#Random Forest

#Naïve-Bayes

#Neural Networks

source code

Advanced Machine Learning in Predictive Maintenance

Developed a predictive maintenance model using machine learning on a UCI dataset with 14 features, such as Air Temperature and Torque. Focused on data preprocessing and used Logistic Regression, Naive Bayes, Decision Trees, and SVMs to predict equipment failures. Logistic Regression achieved an F1-score of 0.95, supporting better maintenance scheduling and lower costs.

#python

#MachineLearning

#LogisticRegression

#NaiveBayes

#DecisionTrees

#SVM
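
For readers unfamiliar with the metric quoted above, here is a minimal sketch of how a binary F1-score is computed from predictions. The function name `f1_score_binary` and the labels are made up for illustration; this is not the project's code, which used Scikit-Learn, and the numbers are unrelated to the 0.95 result.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Compute the F1-score for one positive class (illustrative helper)."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Made-up labels: 1 = equipment failure, 0 = no failure.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
score = f1_score_binary(y_true, y_pred)
print(round(score, 2))
```

F1 is often preferred over plain accuracy for failure prediction because failures tend to be rare, so a model that never predicts a failure can still look accurate.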

source code

Advanced Data Clustering Techniques in Machine Learning

Implemented K-means clustering and other techniques to segment synthetic and real-world datasets. The project involved thorough data preprocessing and optimization of clustering algorithms, validated through various methods to ensure result accuracy. Demonstrated clustering's potential in uncovering patterns for informed decision-making.

#DataScience

#Python

#Scikit-Learn

#Numpy

#Pandas

#Matplotlib

#Seaborn

#KMeans

#AlgorithmOptimization
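
As a rough illustration of the K-means idea mentioned above, the sketch below runs Lloyd's algorithm on a tiny made-up 2-D dataset in plain Python. The helper names (`kmeans`, `inertia`), the starting centroids, and the points are all assumptions for illustration; the project itself relied on Scikit-Learn, and inertia is only one of several possible validation measures.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm on (x, y) tuples, starting from the given centroids."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(c) / len(members) for c in zip(*members))
                new_centroids.append(mean)
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(old)
        if new_centroids == centroids:
            break  # Converged: centroids stopped moving.
        centroids = new_centroids
    return labels, centroids


def inertia(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


# Two visually separate groups of made-up points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels, round(inertia(points, labels, centroids), 3))
```

Passing the starting centroids explicitly keeps the run deterministic; library implementations typically choose them with a randomized scheme and repeat the fit several times.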

source code

YouTube Data Engineering & Analytics (AWS & PySpark)

Developed a data engineering pipeline on AWS for YouTube analytics, encompassing data ingestion, ETL processes with Lambda, and data storage in a scalable S3 data lake. Leveraged AWS Glue and Athena for data organization and querying, culminating in QuickSight dashboards that provided detailed analyses of video trends and viewer engagement.

#PySpark

#DataEngineering

#AWS

#Lambda

#QuickSight

#S3

#Athena

#Glue

source code

COVID-19 Data Insights & Trends (AWS & Python)

Constructed an AWS-based data analysis pipeline for COVID-19 data, starting with data collection into S3 and using AWS Glue for ETL. Built a scalable data lake on S3, with Redshift for warehousing and Athena for queries. Conducted advanced data transformations and exploratory analysis using Python in Jupyter Notebooks, resulting in dynamic dashboards that visualized key pandemic trends and supported public health decisions.

#Python

#DataMining

#AWS

#Redshift

#Athena

#Glue

#S3

#Public Health

source code

Spotify Data Engineering and Analysis (AWS & Python)

Used AWS and Python to analyze the Spotify 2023 dataset, revealing streaming trends. Cleaned and preprocessed data using Python, stored it in S3, and transformed it via AWS Glue for analysis. Leveraged Glue crawlers for schema detection and Athena for SQL querying, which fed into AWS QuickSight dashboards that highlighted music streaming dynamics, providing valuable industry insights.

#DataEngineering

#Python

#AWS

#QuickSight

#S3

#Athena

#Glue

#MusicStreaming

source code

Employee Rating Application

Modelled and developed an employee performance Data Warehouse by creating a multi-dimensional schema, then ran various SQL and NoSQL (MongoDB) queries on it. Visualized the review data in Python and drew out key observations and insights.

#Python

#SQL

#MongoDB

#Neo4j

source code

Bike Sharing Analysis in Capital Bikeshare System (Tableau)

Analyzed Washington D.C.'s Capital Bikeshare data from 2011-2012, studying the influence of weather and seasons on bike-sharing patterns. Processed a dataset of 17,389 records to discover trends in urban mobility, using Tableau to visualize user behaviors, temperature impacts, and rental cycles. Results informed strategic bike distribution to enhance system efficiency and promote sustainable transportation.

#DataAnalysis

#Tableau

#BikeSharing

#UrbanMobility

#Sustainability

source code

Data Analysis on Elections

This mini project analyzes the 2019 Indian Elections datasets using Hadoop (Big Data) and Tableau. It aims to gauge political party popularity and uncover trends and patterns using Python, and it examines the correlation between party density and topography. The project concludes with a detailed analysis report that uses Hive to explore and infer the win/loss percentages of parties.

#DataAnalytics

#Tableau

#Hive

#HBase

#HDFS

#MapReduce

 

Contact