9 game-changing open source Python packages
For machine learning, data science, APIs, and much more!
Introduction
One of the best things about Python is how active and expansive its open-source community is. From the core developers taking part in discussions to wonderful packages seemingly popping up out of nowhere, there's so much to learn.
I'll start by saying this is definitely not a complete list of all the wonderful packages out there, and yes, I am biased. While I tried to keep it open to all walks of Python, I focused on the ones I've found most useful and, for lack of a better word, "game-changing".
Packages
Pandas
Pandas is hands down one of the best examples of abstracting difficult concepts into an easy-to-use interface. It sits at pretty much the center of data analytics in Python thanks to its implementation of the DataFrame (as well as other data types). With the ability to graph, clean, and analyze data (along with MUCH more), it's a great tool for any data analysis or data science project.
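To make the "clean and analyze" part concrete, here is a minimal sketch. The column names and values are made up purely for illustration.

```python
import pandas as pd

# Hypothetical readings; one temperature is missing on purpose.
df = pd.DataFrame(
    {
        "city": ["Austin", "Boston", "Austin", "Boston"],
        "temp_f": [95.0, None, 91.0, 70.0],
    }
)

# Clean: drop rows that have no temperature reading.
clean = df.dropna(subset=["temp_f"])

# Analyze: average temperature per city (returns a Series indexed by city).
mean_by_city = clean.groupby("city")["temp_f"].mean()
print(mean_by_city)
```

The same DataFrame could then be plotted with `mean_by_city.plot(kind="bar")`, assuming matplotlib is installed.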
FastAPI
In my time coding, I've never seen a package developed by one person take the Python world by storm so quickly. It stands on the shoulders of [Pydantic] and [Starlette] while taking them a step further. By providing an API framework that is fast, easy to use, and powered by type hints, it greatly lowers the barrier to entry for building a scalable and maintainable API in Python.
To see the full set of features, check the official docs.
SQLAlchemy
While SQLAlchemy in general is great for abstracting away much of the setup needed to work with a SQL database in Python, the object-relational mapper (ORM) is where it really shines. With the ORM, you can map Python classes to SQL tables in an easy-to-use way that gets you results fast.
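Here is a minimal sketch of mapping a class to a table, assuming SQLAlchemy 1.4 or newer and an in-memory SQLite database. The `User` class and its columns are hypothetical.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class User(Base):
    # Hypothetical table used only for this example.
    __tablename__ = "users"

    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)


# In-memory SQLite keeps the example self-contained.
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

with Session() as session:
    # Insert a row by adding a plain Python object.
    session.add(User(name="Ada"))
    session.commit()

    # Query rows back as User objects.
    names = [user.name for user in session.query(User).order_by(User.name)]

print(names)
```

Swapping the connection URL is, in principle, all it takes to point the same classes at a different database.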
spaCy
If you can't tell already, I like opinionated packages. The maintainers are the experts, so I generally like following their guidance. spaCy is exactly that! It gives you state-of-the-art NLP models behind an extremely easy-to-use interface. Additionally, it has standards for training pipelines, so building a production NLP model is much easier.
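To give a feel for the interface, here is a small sketch. It uses a blank English pipeline (tokenizer only) so it runs without downloading anything; the pretrained pipelines have to be installed separately and are not used here. The sentence is made up.

```python
import spacy

# A blank pipeline only tokenizes; it has no tagger, parser, or NER.
nlp = spacy.blank("en")

doc = nlp("Order 3 coffees, please.")

# Each token exposes lexical attributes such as is_punct and like_num.
words = [token.text for token in doc if not token.is_punct]
numbers = [token.text for token in doc if token.like_num]

print(words)
print(numbers)
```

With a pretrained pipeline loaded through `spacy.load(...)`, the same `doc` object would also carry entities, part-of-speech tags, and dependencies.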
Scikit-learn
Just as pandas is synonymous with data analysis in Python, scikit-learn (sklearn for short) is synonymous with machine learning. It's an amazing suite that covers supervised learning, unsupervised learning, model analysis, preprocessing, and much more!
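A short sketch of the preprocessing, supervised learning, and model analysis pieces working together, using the iris dataset bundled with scikit-learn. The choice of model and split size is arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Preprocessing and the classifier are chained into one estimator.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
model.fit(X_train, y_train)

# Mean accuracy on the held-out data.
accuracy = model.score(X_test, y_test)
print(f"accuracy: {accuracy:.2f}")
```

Because nearly every estimator shares the same `fit` / `predict` / `score` interface, swapping in a different model is usually a one-line change.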
httpx
While I do have a love for Requests (and honestly use it more regularly), I think the Python world will move over to httpx as async work becomes more standard. With a simple, Requests-like interface for regular use and more advanced features for async work, I think it's the new HTTP client to keep an eye on for the future.
Poetry
If there's one package on this list purely for productivity and peace of mind, it's this one. I can't recommend enough making the switch from any other dependency management (including virtual environments) and/or packaging tool.
Poetry replaces your virtual environment tooling (virtualenv, venv, etc.), dependency management (pip), and build and publishing setup (setup.py, twine, etc.) with one single system and a few files!
Poetry is more in-depth to set up, so I'll leave it to the official docs to show how it's done.
Typer
If you didn't notice, this package is made by the same person who made FastAPI (Tiangolo). Typer is billed as the "FastAPI of CLIs" in the sense that it also puts Python's type hints to work. Built on top of Click, it lets you create powerful CLI projects in just a few lines. And don't worry, it's quite extensible if needed.
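A small sketch of the "few lines" claim. The `greet` command and its `--shout` flag are invented for this example, and the command is invoked in-process with Typer's test runner so the snippet runs without a terminal.

```python
import typer
from typer.testing import CliRunner

app = typer.Typer()


@app.command()
def greet(name: str, shout: bool = False):
    """Greet NAME, optionally in upper case."""
    # name becomes a required argument; shout becomes a --shout flag.
    message = f"Hello, {name}!"
    typer.echo(message.upper() if shout else message)


# Invoke the CLI in-process instead of from a shell.
runner = CliRunner()
result = runner.invoke(app, ["World", "--shout"])
print(result.output)
```

In a real script you would end the file with `app()` under an `if __name__ == "__main__":` guard and run it from the shell; `--help` is generated from the signature and docstring.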
NetworkX
Last but not least, NetworkX! Although it's not the best performer, it's a great research tool when working with networks (also called graphs). From things like determining the shortest path between nodes to determining which node is most central to a network, it can do pretty much everything I've ever needed to do with networks.
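Both of those tasks fit in a few lines. The graph below is a toy example with made-up node names, and degree centrality is just one of several centrality measures NetworkX offers.

```python
import networkx as nx

# Toy undirected graph: B is connected to A, C, and D; D also connects to E.
G = nx.Graph()
G.add_edges_from([("A", "B"), ("B", "C"), ("B", "D"), ("D", "E")])

# Shortest path between two nodes (fewest hops, since edges are unweighted).
path = nx.shortest_path(G, "A", "E")

# Degree centrality: fraction of the other nodes each node is connected to.
centrality = nx.degree_centrality(G)
most_central = max(centrality, key=centrality.get)

print(path)          # ['A', 'B', 'D', 'E']
print(most_central)  # B
```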
Additionally, there's been a decent push for network-to-deep-learning (embedding) algorithms with packages like GraphEmbedding, so NetworkX might become even more useful in the future.