Data Scientist / Data Analyst

Posted on: 2018-07-15

Summary

 

Data Analyst with 4 years of experience and data science expertise in machine learning, data analytics, Python, data visualization tools, and SQL programming. Strong SQL skills, with experience creating databases and converting and loading raw data into structured form. Strong experience creating data analysis stories and visualization reports using Python and pandas. Growing knowledge of, and active work in, time series data analysis.

 

Technical Skills


Machine Learning: Linear Regression, Logistic Regression, classifiers (Random Forest, Decision Tree, AdaBoost, Multinomial Naïve Bayes, SVM), Time Series Prediction (ARIMA)

Imbalanced-learn: SMOTE

Statistical Methods: Hypothesis testing (Chi-square contingency test)

Programming Languages: Python, SQL

Python Libraries: pandas, scikit-learn (classifiers, model evaluation, metrics, cross-validation, feature importance), NumPy, SciPy, Matplotlib, Seaborn

Data: Data cleaning, Data wrangling, Data visualization (using Tableau), SAS
Version Control: GitHub, Microsoft Visual Source Safe (VSS)
Development Tools: Microsoft Visual Studio 2010, SQL Server Management Studio 2008/2012, Fiddler, MiniProfiler
Operating Systems: Windows, UNIX

Microsoft Technologies: C#, ASP.Net Framework, ADO.Net

Databases: Microsoft SQL Server 2008/2012, Oracle 11g

*Microsoft Certified Technical Associate and Oracle Certified Associate

 

Projects

 

AUTO-TICKET RESOLUTION SYSTEM                                                                                                        May 2018 – June 2018

This project automates part of an organization’s service ticketing system by inferring the root cause of each ticket automatically from the ticket data.

Created text analysis models using Multinomial Naïve Bayes, Random Forest, and Decision Trees. Performed feature selection using an SVM (Support Vector Machine) and feature engineering using Bag of Words and TF-IDF (Term Frequency–Inverse Document Frequency).

 

USED CAR DATABASE                                                                                                                                    Feb 2018 – May 2018

Worked on a public dataset from eBay Germany containing over 370,000 used car records.

Built statistical machine learning models (Linear Regression, Lasso, Ridge, and Random Forest) to predict price from used car features. I built a Random Forest model that achieved an R-squared value of 0.83. I also produced a feature importance table listing the weight of each major feature on price.

 

GitHub Link: https://github.com/Sneharani143/Capstone-Project-2---Used-Car-Database

 

IBM HR ANALYTICS: EMPLOYEE ATTRITION                                                                                               Dec 2017 - Feb 2018

Worked on a public Kaggle dataset created by IBM scientists. The dataset was 48 KB in size.

Performed statistical analysis to uncover the factors that lead to employee attrition. Used Python and machine learning models (Logistic Regression, Random Forest, Decision Tree, and AdaBoost) and addressed the imbalanced classes with SMOTE, which generates synthetic minority-class samples by interpolating between existing minority instances and their nearest minority-class neighbors. I built a logistic regression model with a recall score of 0.72, which can help management make better decisions about employee attrition.

GitHub Link: https://github.com/Sneharani143/CapstoneProject_HR-attrition/blob/master/Capstone_Project_1.ipynb

 

WEALTH-OF-NATIONS-VERSUS-LABOR-PARTICIPATION                                                                           Oct 2017 - Nov 2017

Worked on World Bank data covering countries around the world. This project combines two datasets: World Development Report 2013 and Wealth of Nations, with a combined size of 300 KB. Performed exploratory data analysis (EDA) on features across income groups, net wealth, and women’s labor force participation. Drew statistical inferences on how the wealth of nations relates to the labor force participation of women.

GitHub Link: https://github.com/Sneharani143/Wealth-Of-Nations-versus-Labor-Participation/blob/master/Capstone_Project_Data Story.ipynb

 

Experience

 

Independent Contractor

Data Analyst / Data Administrator                                                                                                              June 2016 – May 2017

  • Freelance Oracle DBA projects across multiple platforms (Linux and Windows), covering database installation and configuration, database creation, migration, upgrades, backup and recovery, and capacity planning (allocating system storage and planning future storage requirements).
  • Created and managed tablespaces.
  • Performed daily DBA activities, including user and object management (creating users, privileges, roles, quotas, tables, indexes, sequences), space management (tablespaces, rollback segments), and monitoring (alert log, memory, disk I/O, CPU, database connectivity).

 

Thomson Reuters                                                                                                    

Data Analyst / Programmer                                                                                                                     July 2015 - May 2016

  • Worked on the Story feature of the DataStream product, which gives clients a clearer way to view charts in PDF form or in an interactive format.
  • Used the .NET Framework on the front end to build a dynamic application.
  • Wrote C# code to store data in the database.
  • Used the bx-slider plugin to make the application interactive for users.
  • Used HTML (HyperText Markup Language) and CSS (Cascading Style Sheets) for the application’s styling and design.
  • Used Fiddler, an HTTP debugging proxy server application, to troubleshoot HTTPS issues.
  • Used MiniProfiler to profile the DataStream project.
  • Created ADO.NET functions to connect the application code to the database.

 

Aroha Technologies       

Software Engineer                                                                                                                                                  March 2015 - July 2015

  • Responsible for SQL database work, including stored procedures, functions, views, and indexes.

 

Inube                                                                                                                          

Programmer                                                                                                                                                 Feb 2015 - March 2015

  • Worked on insurance policy projects for the client Reliance.
  • Developed front-end web pages using Bootstrap for desktop and mobile.
  • Contributed to the design of database packages and procedures.
  • Contributed to the design and development of interactive user interface web pages.

 

PALLE Technologies                                                                                            

Intern                                                                                                                                                               July 2014 - Dec 2014                                                                             

  • Trainee engineer responsible for projects developed in Microsoft Visual Studio.

 

Education

 

Bachelor of Engineering in Computer Science

Visvesvaraya Technological University – JSSATE (Bangalore)                                                                                               2010-2014

Relevant Courses: Programming, Mathematics, Database

 

Springboard                                                                                                                                                                                 2018

Data Science Career Track Program

Relevant Courses: Python, SQL, R, Data Visualization, Machine Learning