I have come across the Python array API standard and the dataframe API standard, and I would like to know if there is something like a plotting API standard?
There are many data visualization libraries in Python (Matplotlib, Bokeh, Plotly… to only name a few).
All libraries have their own APIs, and forcing all of them to adopt a common API seems unrealistic at this stage (e.g. are the limits on the x-axis
However, I was wondering if it would be possible/useful to have some sort of standard data structure that would fully describe a figure, that could be used by all libraries.
It could for instance be a JSON blob that would contain the axes labels and ranges, the type of plot, the data to be plotted, the size of the figure, the font size, etc.; anything that would be needed to make a plot reproducible across different libraries.
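To make the idea concrete, here is a minimal sketch of what such a blob could look like. Every key name below is an assumption invented for illustration; none of it comes from an existing standard or library.

```python
import json

# Hypothetical, library-neutral figure description. Every key name here is
# an assumption made for illustration, not part of any existing standard.
figure_spec = {
    "figure": {"width": 640, "height": 480, "font_size": 12},
    "axes": {
        "x": {"label": "time [s]", "range": [0.0, 10.0]},
        "y": {"label": "signal [V]", "range": [-1.0, 1.0]},
    },
    "plot": {
        "type": "line",
        "data": {"x": [0.0, 5.0, 10.0], "y": [0.0, 1.0, -1.0]},
    },
}

# Serialize to a JSON string that could be saved or passed between tools.
blob = json.dumps(figure_spec, indent=2)

# Reading it back yields an equivalent dictionary.
restored = json.loads(blob)
```

The point is only that the description is plain data: any library could in principle read `restored` and decide how to draw it.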
This would make it easier for projects that use these libraries to quickly switch to a different plotting solution if they want to experiment. Right now, the cost of moving from Matplotlib to Plotly can be quite high.
Thanks for any comments/ideas.
I’m responding to watch this issue and to go on record to say that I like it.
It couldn’t cover the whole API of any one of those libraries (especially not Matplotlib), and it would be unrealistic to expect pixel-level fidelity between the libraries for a given set of generic plotting instructions (as, for instance, CSS aspires to/often achieves across web browsers). But suppose there were generic instructions, especially if they could be encoded in JSON, which get a plot 90% right and which can then be tweaked by specialized instructions (plt.subplots_adjust!) that depend on the backend library.
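One way to picture "generic instructions plus backend-specific tweaks" is a spec with a shared section and an optional per-backend section. This is only a sketch: the keys `generic` and `backend_hints` and the helper `resolve_options` are hypothetical names, not taken from any library.

```python
def resolve_options(spec, backend):
    """Merge the generic options with the hints for one backend, if any."""
    options = dict(spec.get("generic", {}))
    # Backend-specific hints are applied last, and only for the named backend.
    options.update(spec.get("backend_hints", {}).get(backend, {}))
    return options


# Hypothetical spec: the generic part gets the plot mostly right, and the
# hints carry tweaks that only make sense for one library.
spec = {
    "generic": {"title": "Example", "xlabel": "x", "ylabel": "y"},
    "backend_hints": {
        "matplotlib": {"subplots_adjust": {"left": 0.15, "bottom": 0.2}},
    },
}

mpl_options = resolve_options(spec, "matplotlib")
plotly_options = resolve_options(spec, "plotly")
```

A backend with no hints simply receives the generic options unchanged.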
The closest attempt I’m aware of is Altair.
Underlying Altair are Vega and Vega-Lite, two JSON standards; Vega-Lite is a high-level overlay on Vega. Altair is a nice Python interface to these language-independent standards.
(I had entirely forgotten about it, but 5 years ago I made a viewer for developing both Vega standards interactively: VegaScope. Altair is probably better, especially if it interfaces well with Jupyter.)
One thing about the Vega standards, though, is that they can be confining. They don’t provide you with a canvas and help you convert data coordinates into page coordinates or anything like that (like what d3 does); instead, they associate datasets with plot aspects, like horizontal position, vertical position, color, shape, etc. If you want to make “normal” plots of a familiar type, that’s great—less boilerplate. If you want to make weird plots (as one would with d3), you’re out of luck. The place where I hit a wall, 5 years ago, was trying to plot histograms in which I had the already-aggregated bin data, rather than the data to be aggregated (because the already-aggregated data was many orders of magnitude smaller in size).
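For readers who have not seen it, the "associate datasets with plot aspects" model looks roughly like the following, written here as a Python dictionary in the style of a Vega-Lite specification. The data values and field names are made up for illustration, and the structure is written from memory, so check it against the Vega-Lite documentation before relying on it.

```python
import json

# A Vega-Lite-style description: no canvas or coordinate transforms, only
# a mapping from data fields to visual channels (x, y, color).
# The field names "a", "b" and "group" are invented for this example.
vega_lite_style_spec = {
    "data": {
        "values": [
            {"a": 1, "b": 2, "group": "first"},
            {"a": 2, "b": 5, "group": "first"},
            {"a": 3, "b": 3, "group": "second"},
        ]
    },
    "mark": "point",
    "encoding": {
        "x": {"field": "a", "type": "quantitative"},
        "y": {"field": "b", "type": "quantitative"},
        "color": {"field": "group", "type": "nominal"},
    },
}

# The whole plot is plain JSON, so it is language-independent.
print(json.dumps(vega_lite_style_spec, indent=2))
```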
Thanks for the replies, interesting stuff.
For the record, I am definitely looking for something that gets you 90% of the way, not something that is pixel perfect.
Even something that would produce a different look and feel depending on the library, but all the info displayed on the plot is equivalent, would be fine.
Hi, so there are sorta multiple things going on in different directions but I think related to your question:
A couple of years back, there was an attempt at a common protocol at pyviz/spec (Minimal shared API spec), and it had involvement from Matplotlib and Bokeh folk.
matplotlib/data-prototype is building these new artists that use a data model based on fiber bundles, which Butler proposed as a sort of uniform data model for visualization data. Fiber bundles are these objects from algebraic topology that are really nice 'cause they separately encode topology/continuity and field types (e.g. that temperature and pressure fields are over the same 3D space) while also encoding how to look up which values are over which part of the continuity. They’re mostly topology and field agnostic so the abstraction generalizes to almost everything. And there’s a nice extension into sheaves, which are another mathy thing you can use to describe the rules for reassembling distributed data.
ETA: Nick Kruchten from Plotly has a nice project comparing the different visualization APIs: NotaScope: my data visualization research-in-progress
There are a few answers here, though I don’t know if they meet your needs:
HoloViews lets you write the same code for Bokeh, Matplotlib, or Plotly plots. The three plots have the same data but come out looking different. You can add styling hints for each of the three backends, which will take effect only when that backend is in use.
The High-level shared API tools listed at PyViz.org all use largely the same API that comes from Pandas .plot, allowing you to create plots with Bokeh (using hvPlot or Pandas-Bokeh), Matplotlib (directly or via hvPlot), or Plotly (using hvPlot or cufflinks) from the same code. hvPlot is a wrapper over HoloViews, but with a very different API.
As mentioned above, Julia Signell and I proposed a common spec that would be implemented by each of the libraries independently in a compatible way, but we both ended up focusing on other things. Katrina Riehl at NumFocus has taken over some of the PyViz duties, and this might be one that she could move forward. To follow up with her on that, tag her over at the PyViz issue!
None of those approaches are JSON based, and as far as I know only Altair directly supports JSON serialization. At some point Bokeh supported rendering Matplotlib figures, but that turned out to be too difficult to maintain. I think plotting library authors already struggle just to keep up with issues raised on a single library, so hopefully one of the existing approaches above is sufficient.
Thanks for the links. I see that pyviz/spec has a couple of old PRs that were never merged.
It is a nice idea to try and standardize the APIs, but I suspect it is a very difficult task, as getting everyone to agree on a single API is probably unrealistic.
I presume the work went into projects like HoloViews to wrap the existing libraries and add a new API on top, thus hiding the different APIs beneath, which was probably easier to do.
I think users should be free to use the API they like, and developers should be free to develop the API they want, but it would be very useful if one could pass around an object that all of them could turn into a visual representation. Basically have a new method (so you don’t have to modify the existing API), something like LIBRARY.from_json(data), which would make a figure.
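As a rough sketch of what I mean, assuming each library added its own from_json entry point: the two classes below are stand-ins, not real plotting libraries, and they only describe what they would draw instead of drawing anything. The spec keys are likewise invented for the example.

```python
import json


class TextBackend:
    """Stand-in for a plotting library (hypothetical, draws nothing)."""

    name = "text"

    @classmethod
    def from_json(cls, blob):
        # Each library would parse the same blob and build its own figure
        # type; here we only return a one-line description.
        spec = json.loads(blob)
        return (
            f"[{cls.name}] {spec['type']} plot of {len(spec['x'])} points, "
            f"x label {spec['xlabel']!r}"
        )


class OtherBackend(TextBackend):
    """A second stand-in, to show the same blob going to another library."""

    name = "other"


# One shared description, with made-up keys.
blob = json.dumps({"type": "line", "x": [0, 1, 2], "y": [0, 1, 4], "xlabel": "t"})

for backend in (TextBackend, OtherBackend):
    print(backend.from_json(blob))
```

The existing APIs stay untouched; from_json would just be one extra way in.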
Yes, that would be nice! If the functionality is restricted to what is shared across libraries, such a spec will end up covering only a tiny fraction of what each library can do, because each library offers very different, complementary functionality, and invoking that functionality is library-specific. Anyway, I do believe in there being at least a shared core that people can use that is the same across libraries, and would be happy for our own libraries at HoloViz.org to support that!