Textbook Recommendation: "Pandas in Action" by Boris Paskhaver
A book for learning the most important Python library in data science
If you want to work in data science, you should learn Python. It is one of the most widely used programming languages in the field, and Pandas is among its most commonly used libraries. Pandas has its own syntax, so proficiency in base Python alone is not enough.
A good book for learning this Python library is “Pandas in Action” by Boris Paskhaver. It explains Pandas in clear and straightforward language, and it has many examples that you can follow step by step. If you can access O’Reilly Online Learning through your local public library or academic institution, you can read it for free.
According to Manning’s webpage for the book, here are the key skills that you will learn:
Import datasets, identify issues with their data structures, and optimize them for efficiency
Sort, filter, pivot, and draw conclusions from a dataset and its subsets
Identify trends from text-based and time-based data
Organize, group, merge, and join separate datasets
Use a GroupBy object to store multiple DataFrames
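To give a flavour of what these skills look like in practice, here is a minimal sketch. It is not taken from the book; the `sales` and `stores` tables, their column names, and their values are all invented for illustration.

```python
import pandas as pd

# Hypothetical data, invented for this illustration (not from the book).
sales = pd.DataFrame({
    "store": ["A", "B", "A", "C", "B"],
    "units": [10, 3, 7, 5, 8],
})
stores = pd.DataFrame({
    "store": ["A", "B", "C"],
    "city": ["Toronto", "Ottawa", "Montreal"],
})

# Sort rows by units sold, largest first.
sorted_sales = sales.sort_values("units", ascending=False)

# Filter with a boolean mask: keep rows with at least 5 units.
big_sales = sales[sales["units"] >= 5]

# Group the rows by store; the GroupBy object holds one sub-table per store.
groups = sales.groupby("store")
store_a = groups.get_group("A")        # the rows for store "A" as a DataFrame
units_by_store = groups["units"].sum() # total units per store

# Merge the two tables on their shared "store" column.
merged = sales.merge(stores, on="store", how="left")

print(units_by_store)
print(merged)
```

Each step returns a new object rather than changing `sales` in place, so the intermediate results can be inspected one at a time.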
Note that a “DataFrame” is a particular type of object for storing tabular data in Pandas. If you want to work as a data scientist and program in Python, you will need to become knowledgeable about the structure of a Pandas DataFrame and the various operations that you can perform on it.
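As a rough illustration of that structure, the sketch below builds a small DataFrame and inspects it. The names and ages are made up for this example and do not come from the book.

```python
import pandas as pd

# Hypothetical data, invented for this illustration (not from the book).
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Chloe"],
    "age": [31, 45, 28],
})

# A DataFrame has rows and labeled columns, and each column has one data type.
original_shape = df.shape  # (number of rows, number of columns)
print(original_shape)
print(df.columns.tolist())
print(df.dtypes)

# Selecting one column gives a Series, the one-dimensional Pandas object.
ages = df["age"]

# Selecting one row by its index label also gives a Series.
first_row = df.loc[0]

# Arithmetic applies to a whole column at once, with no explicit loop.
df["age_next_year"] = df["age"] + 1
print(df)
```

The last step is an example of syntax that differs from base Python, where adding 1 to every item in a list would need a loop or a comprehension.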