MORE THAN JUST A COURSE
Python for Data Science
This course teaches the analytical mindset and programming skills relevant to data science. Students will continue to strengthen their command of the Python programming language while learning a set of Python data science tools, including the Jupyter (IPython) Notebook, NumPy, pandas, Matplotlib and scikit-learn. Students will learn skills that cover the various phases of exploratory data analysis:
- Importing data
- Cleaning and transforming data
- Algorithmic thinking
- Grouping and aggregation
- Visualization
- Statistical modeling/prediction
- Communication of results
The course draws on data from a wide range of sources and culminates in a final project and presentation.
Part 1: The Pandas DataFrame Library
- Pandas & DataFrames • Pandas Basics • Interaction with DataFrames
- Importing Data • Importing data from a list or dictionary • Importing data from a flat file • Importing data from a database • Importing data from a JSON file
- Data Exploration • describe() • Unique value counts • Basic pandas charting to see the distribution of the data
- Data Cleaning • Grouping and replacing values • Data types • String cleaning • Handling null values • Removing duplicates • Renaming columns • Dropping columns • Inline lambda functions • Lambdas combined with named functions for complex logic
- Data Filtering • loc, iloc, and slicing functions • Categorical and distinct filters using boolean indexing • Numeric and range filters • Date filters • Multi-level filters
- Data Joining • Inner joins • Left joins • Differences between the join and merge functions • Concatenating (unions)
- Aggregating Data • Rolling data up to a higher level (the equivalent of a SQL GROUP BY clause) • Multi-level aggregations • Understanding the reset_index function
- Outputting Data • Exporting as CSV • Exporting as Excel • Export options • Exporting to a database
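To give a flavor of how several Part 1 topics fit together, here is a minimal sketch (not course material) that imports a small dictionary into a DataFrame, cleans it, filters it with boolean indexing, performs a left join, aggregates with groupby and reset_index, and exports to CSV. The sales data, column names and manager lookup table are all hypothetical.

```python
import io

import pandas as pd

# Importing: hypothetical sales records built from a dictionary
sales = pd.DataFrame({
    "region": [" North", "south", "South", "north ", None],
    "amount": ["100", "250", "250", "75", "40"],
    "date": ["2024-01-05", "2024-02-10", "2024-02-10", "2024-03-15", "2024-03-20"],
})

# Cleaning: normalize strings, fix data types, drop nulls and duplicates
sales["region"] = sales["region"].str.strip().str.title()
sales["amount"] = sales["amount"].astype(int)
sales["date"] = pd.to_datetime(sales["date"])
sales = sales.dropna(subset=["region"]).drop_duplicates()

# Inline lambda: label each order by size (assign returns a new DataFrame)
sales = sales.assign(
    order_size=sales["amount"].apply(lambda x: "large" if x >= 200 else "small")
)

# Filtering: boolean indexing with a numeric condition and a date condition
recent = sales.loc[(sales["amount"] >= 100) & (sales["date"] >= "2024-02-01")]

# Joining: left join onto a hypothetical lookup table
managers = pd.DataFrame({"region": ["North", "South"], "manager": ["Ana", "Ben"]})
joined = sales.merge(managers, on="region", how="left")

# Aggregating: SQL-style GROUP BY, then reset_index to turn keys back into columns
totals = joined.groupby(["region", "manager"])["amount"].sum().reset_index()

# Outputting: write CSV to an in-memory buffer (a file path works the same way)
buffer = io.StringIO()
totals.to_csv(buffer, index=False)
print(totals)
```

Writing to an in-memory buffer keeps the sketch self-contained; in practice you would pass a file path to to_csv, or use to_excel or to_sql for the other export targets listed above.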
Part 2: Python and Data Science Applications
- Machine learning I: Classification • Introduction to the scikit-learn package • Classification for data exploration using decision trees • Classification for prediction • Measuring classification performance
- Machine learning II: Regression • Regression for prediction • Linear • Logistic • Differences between regression and classification • Measuring regression performance
- Machine learning III: Clustering (if time allows) • What is clustering and how is it used? • Clustering algorithms: k-means • Describing clusters and goodness of fit
- Advanced Data Cleaning (if time allows) • Pivoting data • Adding aggregated values back down to row-level items • Using where functions • Using map functions • Data formats for different output use cases such as BI reporting, machine learning, relational databases, data warehouses, etc. • Scheduling and automation from the command line
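As a rough illustration of the Part 2 workflow, the sketch below (an assumption about how such exercises might look, not the course's own code) uses scikit-learn's bundled iris dataset to fit a decision tree classifier and measure accuracy, fit a linear regression and measure R², and group the data with k-means, a standard clustering algorithm, reporting inertia as a goodness-of-fit measure. The choice of dataset, target column and hyperparameters is illustrative only.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Classification: train a shallow decision tree, then score it on held-out data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
clf_accuracy = accuracy_score(y_test, tree.predict(X_test))

# Regression: predict petal width (column 3) from the other three measurements
features, target = X[:, :3], X[:, 3]
f_train, f_test, t_train, t_test = train_test_split(
    features, target, test_size=0.3, random_state=0
)
reg = LinearRegression().fit(f_train, t_train)
reg_r2 = r2_score(t_test, reg.predict(f_test))

# Clustering: group the flowers into 3 clusters without looking at the labels
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
inertia = kmeans.inertia_  # within-cluster sum of squared distances; lower is tighter

print(f"accuracy={clf_accuracy:.2f}  R^2={reg_r2:.2f}  inertia={inertia:.1f}")
```

The same fit/predict pattern applies across all three tasks; what changes is the metric, since classification is judged on predicted labels, regression on continuous predictions, and clustering on how compact the discovered groups are.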
Have any questions or concerns?
Start Today With A Free Quote
As you are reading this, the world of data analytics is evolving. Make sure your team is up to date with our practical, insights-focused data analytics training. Start now with a free, no-obligation quote.