I graduated with an MS in Data Engineering & Analytics from Northeastern University (Class of 2024 🎓). I have 2+ years of experience in Python, PySpark, SQL, JavaScript, and a suite of cloud services (AWS, GCP, Databricks). My expertise and experience center on leading Data Engineering | Data Science | Data Analytics projects and pipelines at petabyte scale, significantly enhancing business operations.
CVSHealth
Abiomed
Ericsson Global
Ericsson Global
The projects below demonstrate my skills and experience. Each one is briefly outlined, with access to its code repository. Together they show my ability to solve complex problems, work across diverse technologies, and manage projects end to end.
A breakdown of NYPD arrests in New York City this year. I performed comprehensive data wrangling, including data preprocessing and cleaning, and used Python libraries such as Scikit-Learn, NumPy, Pandas, Seaborn, and Matplotlib to build visualizations. Machine learning models used:
#DataScience
#KNN
#Linear Regression
#Logistic Regression
#Random Forest
#Naïve-Bayes
#Neural Networks
Developed a predictive maintenance model using machine learning on a UCI dataset with 14 features, such as air temperature and torque. Focused on data preprocessing and used Logistic Regression, Naive Bayes, Decision Trees, and SVMs to predict equipment failures. Achieved an F1-score of 0.95 with Logistic Regression, significantly improving maintenance scheduling and reducing costs.
#python
#MachineLearning
#LogisticRegression
#NaiveBayes
#DecisionTrees
#SVM
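For readers unfamiliar with the metric reported above, here is a minimal sketch of how an F1-score is computed from predictions. It is not the project's code: the `f1_score` helper and the toy labels are illustrative assumptions, and in practice a library routine such as scikit-learn's would normally be used.

```python
def f1_score(y_true, y_pred, positive=1):
    """Compute F1 for the positive class from two equal-length label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)


# Toy labels (1 = failure, 0 = no failure); made up for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
score = f1_score(y_true, y_pred)
print(f"F1 = {score:.2f}")
```

F1 is a common choice for failure prediction because failures are usually rare, and plain accuracy can look high even when most failures are missed.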
Implemented K-means clustering and other techniques to segment synthetic and real-world datasets. The project involved thorough data preprocessing and optimization of clustering algorithms, validated through various methods to ensure result accuracy. Demonstrated clustering's potential to uncover patterns for informed decision-making.
#DataScience
#Python
#Scikit-Learn
#Numpy
#Pandas
#Matplotlib
#Seaborn
#KMeans
#AlgorithmOptimization
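The core K-means loop can be sketched in plain Python as below. This is a simplified illustration under my own assumptions (random initial centroids, squared Euclidean distance, made-up 2D points), not the project's implementation, which relied on the libraries tagged above.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as starting centroids
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = [
            tuple(sum(dim) / len(members) for dim in zip(*members))
            if members else centroids[i]  # keep the old centroid if a cluster is empty
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids, clusters


# Two well-separated toy blobs; illustrative data only.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8.5, 9.5)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
```

On real data the result depends on initialization, so it is common to run several seeds and compare the within-cluster sum of squares.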
Developed a data engineering pipeline on AWS for YouTube analytics, encompassing data ingestion, ETL processes with Lambda, and data storage in a scalable S3 data lake. Leveraged AWS Glue and Athena for data organization and querying, culminating in QuickSight dashboards that provided detailed analyses of video trends and viewer engagement.
#PySpark
#DataEngineering
#AWS
#Lambda
#QuickSight
#S3
#Athena
#Glue
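As a rough sketch of the kind of transformation a Lambda ETL step performs, the handler below flattens nested JSON records into flat rows. Everything specific here is an assumption for illustration: the `items` and `snippet` field names, the convention of passing the payload under `event["body"]`, and the returned shape. A real handler triggered by S3 would read the object from the bucket and write the cleaned output back to the data lake.

```python
import json


def flatten(record, parent_key="", sep="_"):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def lambda_handler(event, context):
    """Hypothetical handler: expects the raw JSON payload in event['body']."""
    payload = json.loads(event["body"])
    rows = [flatten(item) for item in payload.get("items", [])]
    return {"row_count": len(rows), "rows": rows}


# Local invocation with a made-up payload; no AWS resources are touched.
sample_event = {
    "body": json.dumps(
        {"items": [{"id": "1", "snippet": {"title": "Film & Animation", "assignable": True}}]}
    )
}
result = lambda_handler(sample_event, None)
print(result)
```

Flat, column-like rows of this kind are easier to catalog with Glue and query with Athena than deeply nested JSON.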
Constructed an AWS-based data analysis pipeline for COVID-19 data, starting with data collection into S3 and using AWS Glue for ETL. A scalable data lake was formed using S3, with Redshift for warehousing and Athena for queries. Advanced data transformations and exploratory analysis were conducted using Python in Jupyter Notebooks, resulting in dynamic dashboards that visualized key pandemic trends and supported public health decisions.
#Python
#DataMining
#AWS
#Redshift
#Athena
#Glue
#S3
#Public Health
Used AWS and Python to analyze the Spotify 2023 dataset, revealing streaming trends. Cleaned and preprocessed data using Python, stored it in S3, and transformed it via AWS Glue for analysis. Leveraged Glue crawlers for schema detection and Athena for SQL querying, which fed into AWS QuickSight dashboards that highlighted music streaming dynamics, providing valuable industry insights.
#DataEngineering
#Python
#AWS
#QuickSight
#S3
#Athena
#Glue
#MusicStreaming
Modeled and developed an employee performance data warehouse by creating a multi-dimensional schema and running various SQL and NoSQL (MongoDB) queries on it. Visualized the review data in Python and drew out significant insights.
#Python
#SQL
#MongoDB
#Neo4j
Analyzed Washington D.C.'s Capital Bikeshare data from 2011 to 2012, studying the influence of weather and seasons on bike-sharing patterns. Processed a dataset of 17,389 records to discover trends in urban mobility, using Tableau to visualize user behaviors, temperature impacts, and rental cycles. Results informed strategic bike distribution to enhance system efficiency and promote sustainable transportation.
#DataAnalysis
#Tableau
#BikeSharing
#UrbanMobility
#Sustainability
This mini project analyzes the 2019 Indian Elections using Hadoop (Big Data). It studies the 2019 Indian Elections datasets with Tableau and Hadoop to gauge political party popularity, and uses Python to identify trends and patterns. It also examines the correlation between party density and topography. The project concludes with a detailed analysis report that uses Hive to explore and infer the win/loss percentages of parties.
#DataAnalytics
#Tableau
#Hive
#HBase
#HDFS
#MapReduce
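The win percentage aggregation can be illustrated in Python as follows. The project itself used Hive for this step; the sketch below is only an equivalent group-and-count under my own assumptions, namely that win percentage means seats won divided by seats contested, with made-up party names and results.

```python
from collections import defaultdict


def win_percentages(results):
    """Return {party: percent of contested seats won} from (party, won) pairs."""
    contested = defaultdict(int)
    won = defaultdict(int)
    for party, is_winner in results:
        contested[party] += 1
        if is_winner:
            won[party] += 1
    return {party: 100.0 * won[party] / contested[party] for party in contested}


# One (party, won_seat) pair per contested seat; hypothetical data.
results = [
    ("Party A", True), ("Party A", False), ("Party B", True),
    ("Party A", True), ("Party B", False), ("Party A", True),
]
percentages = win_percentages(results)
print(percentages)
```

The loss percentage for each party is simply 100 minus its win percentage under this definition.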