Fast prototyping in data science with Streamlit


Ever find yourself sipping warm coffee on a Sunday morning while the love of your life is making pancakes in the kitchen and thinking to yourself, “Man, I love frontend development…”? Yeah, me neither. 

If the mere thought of placing drop-down menus or buttons on an HTML page using JavaScript makes you choke on your warm beverage – fear no more. For Streamlit is here, a pure Python framework for simple web applications, and it’s here to save the day.

ML developers are not frontend masters

Let’s face it, the majority of Machine Learning folks would much rather debug tensor shape issues than design and implement a web application. Frameworks like Flask and FastAPI exist, but setting up a couple of endpoints is very different from making a pretty web application that doesn’t look like a Windows-95-style blast from the past.

The alternatives aren’t pretty, either. If you’ve ever had the opportunity to explain to a less technical person how they should set up a Docker container with GPU support in order to start a web server that accepts curl requests, you probably know their enthusiasm isn’t at its peak. That sad look on their face is a sign. A sign that says: we can do better.

Why does it matter?

Pause, tape-scratch, and rewind to your last project. The client was not a technical person, but whatever. The model you trained worked perfectly. It scored above 95% accuracy on the test set. Even the saliency maps made it look like it might actually be focusing on the important parts of the image. And sure, it threw a CUDA_OUT_OF_MEMORY error from time to time, but hey, just buy a bigger machine, pal.

Your heart is racing with excitement as you prepare a live demo for your client. You flex your fingers and spin up a Jupyter Notebook on a remote cluster. You gracefully Shift+Enter through the cells with a smug look on your face to deliver the final: “See? It works.”

The client is a bit surprised: Jupyter Notebook? Shift+Enter? Some terminal mumbo-jumbo? Is this what they’re paying for?

“Yeah, we’re gonna take another look at your budget.” Yikes.

Streamlit to the rescue!

What if the world was a simpler place? A place where I could just write some Python code and have a nice web application anyone could enjoy?

The idea isn’t exactly new: we have Flask and FastAPI but, as I said, these are more like backend servers. Yes, we also have things like Plotly Dash, but this feels like Flask with extra steps. And Django… well, Django is just absolutely massive, suitable for large, commercial, enterprise-type projects, not for the one-man-army prototyping style.

Figure: A simple Flask application.

Streamlit, on the other hand, advertises itself as a minimal Python framework for building web applications, tailored specifically to Machine Learning Engineers and Data Scientists. It has (almost) all the components I could wish for in a web framework. Button? Bang, single Python object. Drop-down menu? Bang, single Python object. Showing a picture that dynamically changes whenever a user pushes a button? Yep, you guessed it: single Python object.

We could probably go all day long about what makes Streamlit great, but let’s focus on the two aspects that make it outshine the competition: data science tools and unit test support.

Data Science tools

Native integration with data science formats is, in my opinion, the strongest selling point of Streamlit. And it is probably what makes it great for the majority of people.

Now, let’s take a step back and talk a bit about the visuals. I like to think of people as visual creatures. In fact, by some often-quoted estimates, visual content is 43% more persuasive than text alone, and people process an image up to 60,000 times faster than text.

We do, after all, create so many charts and slides for a reason. We like to look at data much more than we like to actually read the data. And I believe that anyone, regardless of their background, will appreciate you presenting your work in a nice and understandable way.

The data science ecosystem is cluttered with various formats to display different things. We’ve got Matplotlib charts, Plotly charts, pandas DataFrames, Graphviz tools for model visualization, and so on.

We can display most of these things in a Jupyter Notebook. But what about a web app? Streamlit has got you covered.

Almost every widget Streamlit offers that can display something has strong support for typical data science objects. That means stuff like pandas DataFrames or Matplotlib charts works out of the box. If you feel super visual-ish, you can even pop in an entire Keras model and the docs claim it should work.

Figure: Example input types accepted by st.write. The list goes on.

I’d like to take this moment to express my appreciation for the effort put into this. Making our research look clean and readable, with such minimal overhead, is amazing.

Unit tests

Personally, the feature I’m most excited about is the support for writing unit tests. 

Testing each part of a web application in Python doesn’t sound like the smoothest experience. Yes, Flask has native support for testing endpoints, but that’s a bit different from testing the behavior after a button click. Stacking too many workarounds to test a single functionality makes the tests ugly, hard to maintain, and, worst of the worst: makes us wish we didn’t have those tests in the first place.

This is exactly the opposite of what we want. We want to have fast, deterministic tests, and we want to be proud of them.

Streamlit comes with its own testing framework, built around the AppTest class. Using simple Python functions, we can seamlessly test individual components as well as entire applications. Everything can be declared as a simple unit test, which is clean, quick in execution, and immediately reports an error if we happen to have broken something.

Figure: Example of AppTest in action.

The drawbacks

As Uncle Ben once famously said: “With great power comes great responsibility.” The opposite is sometimes (often) true: with little responsibility comes little power. Yes, you’ve declared stuff in pure Python and you basically throw around objects with various parameters, but… that’s mostly it. 

Whenever you need more customization, more dynamic behavior, or sometimes only slightly more control – things can get messy.

Issue no. 1: limited layout customization

Let’s talk about everyone’s most loved topic in the front-end development world – the layout. The joys of centering a div have left countless developers thinking about dropping their brand-new MacBook Pros and taking up farming instead.

“Don’t worry, little one,” says Streamlit. “For I am fully, automatically responsive.”

“Y-you are?” I ask, stuttering in awe.

“Yes,” Streamlit assures me in a cold-tempered voice. “Whenever you resize the browser screen, I will adjust. With absolute grace!”

“Impossible!” I whisper to myself. 

I jump on the keyboard, trying out various screen sizes in Chrome DevTools. Everything works great until I hit the one dreaded option:

Mobile Screen Size. 

Streamlit doesn’t make a sound. I know it still works, but it doesn’t look so good anymore. It looks bad. It looks sick. The buttons are all over the place. The small icons, once clustered together, now take up 90% of the screen. This looks nothing like my app.

“A-are you okay?” I ask in a worried tone.

Streamlit doesn’t say anything. 

I let out a deep sigh, “I wish I had the option to explicitly declare the layout of your components based on the current screen size.” 

Streamlit stays silent.

“You don’t seem to be the strongest candidate for mobile screens,” I think aloud. “I guess when I’m only targeting desktop screens, you’ll be good to go.”

Issue no. 2: no conditional logic

The other thing that I’d like to mention, which almost resulted in a few sleepless nights, is limited support for dynamism in apps. Say you’d like certain options to be enabled or disabled based on other options you’ve set. Streamlit doesn’t like that very much.

And yeah, the docs say you should refer to caching, which might work for your use case. Unfortunately, it didn’t work for us. And yeah, there is the “unsafe” keyword that lets you go into JavaScript mode to tweak certain behaviors. But, uh… can we, like… not?

Wrapping it up

There it is, folks. Streamlit. Some love it, and some love it less. 

It has some sharp edges here and there, but overall, it’s a decent choice for writing simple web applications, sharing the results of your work, or simply having a bit of fun. Be sure to give it a try, and maybe the world of frontend development will be a bit brighter in your Machine Learning career.

Till the next one,

