Highlights from our Own Your Data Science and AI Workshop
By Cailean Osborne, Head of Ecosystem Development at Probabl
On Tuesday, May 5th, we packed the Open Stage at Station F for a full day dedicated to “Own Your Data Science and AI” – co-hosted by Probabl, the scikit-learn community, and Nicolas Flores-Herr from Fraunhofer IAIS.
The schedule was chock-full of exciting talks spanning the full open source AI stack: tabular foundation models like TabICLv2, LLMs, smol models from Hugging Face, evaluating agents on economically valuable tasks, scikit-learn acceleration on GPUs, and open source tools like skore, Kedro, and dlt that help data scientists scale the impact of their AI projects.
Below, we recap the key takeaways from the talks.
SOTA open source foundation models
From tabular foundation models like TabICLv2 to LLMs, the talks in this section showcased what’s possible when state-of-the-art AI is built and shared in the open.
Gael Varoquaux, CSO of Probabl and Research Director at Inria, presented TabICLv2, a state-of-the-art tabular foundation model developed by his SODA team at Inria:
“Table foundation models (TFMs) bring strong predictive performers without the need for any tuning. TabICL is a state-of-the-art TFM for tabular classification and regression on the TabArena and TALENT benchmarks. It’s open source, pip-installable, and scikit-learn compliant.”
Nicolas Flores-Herr, Manager of Foundation Models & Generative AI Systems at Fraunhofer IAIS, made the case for why Europe needs to go beyond funding research and start building:
“Europe doesn’t lack funding, talent, or ideas to build competitive foundation models. It lacks the structures: dedicated model-building teams, purchasable AI compute, and continuous development beyond one-shot research projects. Such teams magnetize top talent, and when embedded with industrial users from day one, they create a flywheel between model development and enterprise adoption that makes both sides stronger. Yes, we need to continue to fund AI research but it’s as important that we fund teams building and scaling Frontier AI models.”
Nouamane Tazi from Hugging Face shared hard-won lessons from training state-of-the-art language models:
“Understanding tradeoffs between memory, communication and compute is key to successfully scaling your training. Get these right, and you can confidently scale from a single node to thousands of GPUs.”
Said Taghadouini from LightOn presented LightOnOCR, showing that domain-specialized small models can punch well above their weight class:
“For document OCR, a specialized 1B model trained with the appropriate recipe can beat 70B-class VLMs on independent benchmarks while running on a single GPU. LightOnOCR-2 is fully open-weight under Apache 2.0, so anyone can fine-tune and run it on their own hardware.”
scikit-learn acceleration
A recurring theme across multiple talks: adoption of the Array API is well underway in scikit-learn, enabling significant speed-ups and productivity gains for data scientists.
Stefanie Senger, PhD and Olivier Grisel from Probabl presented ongoing work adopting the Python Array API standard in scikit-learn:
“scikit-learn is getting faster by adopting the Array API standard, and can increasingly work with arrays from different libraries like NumPy, CuPy, DPNP, and PyTorch. Computations stay on the input array’s original device (GPU/accelerator or CPU), so users can switch array libraries with minimal code changes.”
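As a rough illustration of the idea in the quote above, the sketch below writes a function once against the array's own namespace instead of hard-coding NumPy. This is not scikit-learn's internal code: get_namespace and standardize are hypothetical helpers written for this example, and only NumPy is exercised here. It assumes that arrays from other libraries implementing the standard expose the same __array_namespace__() method.

```python
import numpy as np


def get_namespace(x):
    """Return the array namespace of x (hypothetical helper for this sketch)."""
    # Arrays that implement the Array API standard expose this method.
    if hasattr(x, "__array_namespace__"):
        return x.__array_namespace__()
    # Older NumPy releases do not expose it, so fall back to NumPy itself.
    return np


def standardize(x):
    """Center and scale each column using whatever library x comes from."""
    xp = get_namespace(x)
    mean = xp.mean(x, axis=0)
    std = xp.std(x, axis=0)
    # The result stays in the same library (and on the same device) as x.
    return (x - mean) / std


X = np.array([[1.0, 2.0], [3.0, 6.0]])
Z = standardize(X)
print(Z)
```

In scikit-learn itself the behaviour is opt-in, for example via sklearn.set_config(array_api_dispatch=True); which estimators support it depends on the installed version.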
David Cortes from Intel introduced Extension for scikit-learn, which takes a different, more immediate approach to acceleration:
“scikit-learn provides user-friendly interfaces to well-established ML algorithms, but the library might not always scale to large data problems or not allow to fully exploit server-grade hardware. Extension for scikit-learn offers a seamless way of accelerating existing scikit-learn workflows with two lines of code that swap selected scikit-learn calls with calls to highly optimized C++ libraries.”
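The two lines mentioned in the quote can be sketched as follows, assuming the extension is installed as the sklearnex package and exposes patch_sklearn(). The try/except guard and the toy data are additions for this example, so that the snippet still runs on stock scikit-learn when the extension is missing.

```python
import numpy as np

# The two lines: patch scikit-learn before importing any estimators.
# Assumption: the extension is installed as `sklearnex`; the guard is ours.
try:
    from sklearnex import patch_sklearn

    patch_sklearn()
    accelerated = True
except ImportError:
    accelerated = False

# Import estimators only after patching so the swap can take effect.
from sklearn.cluster import KMeans

# Toy data: two tight, well-separated blobs of 50 points each.
rng = np.random.default_rng(0)
X = np.vstack(
    [
        rng.normal(loc=0.0, scale=0.1, size=(50, 2)),
        rng.normal(loc=5.0, scale=0.1, size=(50, 2)),
    ]
)

# The rest of the workflow is unchanged scikit-learn code.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("accelerated:", accelerated)
```

The point of the design is that the estimator code does not change; only the import-time patch decides which backend runs.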
We also heard from NVIDIA’s Andy Terrel (Python on GPUs: Speeding Up Machine Learning) and Tim Head (Accelerating scikit-learn with GPUs), as well as Quansight’s Evgeni Burovski (SciPy’s Support for GPUs, PyTorch, and CuPy) – rounding out a deep dive into the latest developments in GPU-powered scientific Python.
Best practices for evaluating models and agents
Agents represent a major opportunity for productivity gains, but only if you can trust what you put into production. These talks approached that challenge from multiple angles: rigorous evaluation tooling, benchmarking agents on real work rather than clean sandboxes, and making sure practitioners have the deep skills to build reliable pipelines in the first place.
Marie Sacksick and Fabien Pesquerel from Probabl presented Skore, Probabl’s open source tool for model experimentation, evaluation, collaboration, and communication – designed to close the gap between your models and the stakeholders who need to act on them. It’s Python-native, framework-agnostic, and built around the full model lifecycle.
“Your compiler can run your model without computation errors, but it can’t catch methodological ones. Skore is the toolbox that helps your data science team ship the best model they trust, so your stakeholders trust them back.”
Björn Plüster and Benedikt Droste from ellamind presented their work on evaluating frontier agents against economically valuable, real-world tasks – a meaningful step beyond sandbox benchmarks:
“Most enterprise agent pilots stall not because the model is weak, but because benchmarks live in a clean sandbox while real work happens inside a restricted, human-shaped workplace. We build evaluation environments that put the agent in that same workplace and grade it on whether the workday actually gets done.”
David Arturo Amor Quiroz from Probabl spoke about Skolar and the importance of leveling up ML and AI skills for reviewing, verifying, and trusting model pipelines generated by AI coding assistants:
“Data scientists and ML engineers are increasingly using AI coding tools to assist or even generate their ML pipelines, but reliable models start with creativity and a deep understanding of how they are trained and evaluated. Skolar helps you level up your skills and ship models into production with confidence.”
Tools for data scientists
Several talks showcased open source tools built to close the gap between data science notebooks and production systems.
Sajid Alam from QuantumBlack presented Kedro and framed the core problem simply:
“AI projects don’t fail because the model was wrong, they fail because the code around the model wasn’t engineered to ship. Kedro brings software engineering fundamentals to Data Science and AI Workflows so teams can move from notebook to production without rewriting everything in between.”
Sylvain Corlay from QuantStack introduced notebook.link, a browser-native Jupyter experience:
“Notebook.link revolutionizes the sharing of Jupyter notebook by offering a fully interactive, scalable, and language-agnostic computing environment - all in your browser. Its seamless integration of JupyterLite, WebAssembly, and multi-language support makes it a game-changer for data scientists, educators, and developers alike.”
Data
Anastasia Stasenko and Yannick Detrois from Pleias made a compelling case for synthetic data as a first-class training resource:
“Synthetic data allows to collapse all phases of training into one, baking in reasoning and uncertainty signals directly into the model. Entirely synthetic models built from curated seeds can replace traditional training pipelines and produce competitive language models more efficiently and openly.”
Violetta Mishechkina from dltHub talked about turning agent traces into structured analytics – an increasingly important capability as agentic systems proliferate.
Responsible AI
The workshop closed on a note that tied everything together: what does it mean to build AI responsibly, and what role can open source communities play?
Joanna Kramer from WISE and Marie Sacksick from Probabl argued that the principles for responsible AI are already embedded in open source culture – we just need to act on them:
“Core open source principles like horizontal learning, democratized knowledge sharing, and peer review offer the scaffolding for responsible AI development. The principles are there. We just need to activate them through intentional design of things like onboarding processes, meetup facilitation, governance structures, and multi-stakeholder contribution.”
Thanks to all speakers, attendees, and organizers who made this event possible. Days like this are a reminder of what the open source community does best: share knowledge freely, challenge assumptions openly, and build things that can be used, modified, and redistributed by everyone. We rounded off the day with a social at La Felicità, co-hosted with our friends at Fundamental. We had a blast and hope the community did, too.
See you at the next one!
For more from Probabl
Follow our latest updates on LinkedIn
Subscribe to our monthly newsletter
Check out over 100 tutorial videos on our YouTube channel
Level up your machine learning skills for free with Skolar















