Almost exactly a year ago we blogged about the finished map of the fruit fly brain. Today, we celebrate the publication of the two (much improved) papers – one led by the folks in Princeton, one led by us – that jointly describe this “FlyWire” brain dataset in Nature:
- Dorkenwald et al. describes the overall resource and the proofreading effort, and showcases some high-level analyses of the dataset
- Schlegel et al. provides neuron annotations and validates the dataset against another (partial) brain map
See our media page for more images + videos!
Here is an analogy that I find useful to explain this duet:
Imagine having satellite images of the entire world and you want to turn them into Google Maps or – even better – OpenStreetMap. The first thing you need to do is find and digitise all the roads, buildings and natural structures such as rivers and lakes. At that point you can already generate instructions on how to get from coordinate A to coordinate B. But what you really want is to be able to ask “Show me how to get from 10 Downing Street to 21 Baker Street” or “Find me a nice pizzeria somewhere close by”. For that you need labels: street names, opening hours, reviews and so on. Makes sense so far? Good! Going back to the FlyWire connectome:
- The high-resolution electron microscopy (EM) data[1] (Zheng et al. Cell, 2018) of a fly brain is analogous to the satellite image data: it contains all the information but is not very useful in its raw state.
- Using AI to extract neurons (Dorkenwald et al. Nat. Methods, 2022) and synapses (Heinrich et al. arXiv, 2018; Buhmann et al. Nat. Methods, 2021) from the EM data, followed by human proofreading (the new Dorkenwald et al. Nature, 2024), is analogous to finding roads, buildings, lakes, etc. on the satellite images.
- Annotating the neurons with extra information such as cell type, transmitter, etc. (Eckstein et al. Cell, 2024; the new Schlegel et al. Nature, 2024) is analogous to adding names for streets and businesses, opening hours and so on.
As you can see from the various publications sprinkled throughout the text above[2], our two shiny new papers are the result of years of work not just by us but by many others. And of course it doesn’t stop here: over the course of the next few months, there will be many more papers from labs all over the world using the FlyWire dataset for their work. Nature has put together a collection page to track those appearing as part of the paper “package”.
Most (if not all) of the above information is also available through the various press releases and landing pages (see also the links at the bottom of this page). So instead of repeating things you may or may not already know, I’d like to focus on the people that didn’t get as much attention.
Unsung heroes
Invariably, there are people whose contributions end up falling a bit by the wayside – not out of malice or neglect but because, when you try to get your paper published, nobody seems interested in how the sausage was made (so to speak). As an author you then find yourself adding unsatisfying, half-sentence thank-you notes to the paper’s acknowledgements section. To relieve my guilty conscience, I will use the second half of this post to tell you a bit about the behind-the-scenes people who didn’t end up in this particular limelight.
Outsourcing
You can perhaps imagine that it is rather difficult for small teams to suddenly and quickly scale up their operations, particularly when you know that you will likely have to downsize again in a few months – either because the project is finished or because money is running out. That’s pretty much the situation we found ourselves in when we decided to go all in on FlyWire. While we did grow our team in Cambridge (at peak we had 17 people in the group), both we and Princeton ended up outsourcing parts of the work to specialists. On our end, we contracted Ariadne.ai[3], who proofread around 14% of the central brain[4] in addition to our own efforts. Aelysia[5] helped with annotations and proofreading whenever things got a bit more tricky. Not contracted by us but by Princeton: several Seung lab alumni founded Zetta.ai, which provides connectomes-as-a-service. They re-aligned the Bock lab’s original EM image data and ran the initial segmentation, which the FlyWire consortium collectively worked to proofread over the last few years.
Connectivity
The FlyWire dataset has two key resources: the morphologies of all individual neurons and the network graph of how they connect to each other. Both are intrinsically linked – after all, you can’t really connect to someone if they aren’t in physical proximity. However, when we started working on FlyWire in mid-2020, the only available data was the neuron segmentation. Consequently, we only ever looked at neuron morphologies and had little to no clue about their connectivity. At the time, there was a “someone will solve that later” attitude to the problem. And what do you know – someone did it! A lot of someones, in fact. The groundwork had been laid by Larissa Heinrich from the Saalfeld lab (Janelia Research Campus), who used AI to detect synaptic clefts from EM images. The second piece of the puzzle – predicting pre- and postsynaptic partners from the clefts – was provided by Julia Buhmann from the lab of Jan Funke (also Janelia Research Campus). Julia and Jan were kind enough to share their data ahead of publication and just like that[6] we had connectivity for FlyWire neurons! Initially that huge (130M rows after some filtering) connectivity table was a bit clunky to handle, but with a bit[7] of software engineering, querying connections is now pretty seamless.
As the icing on the cake, the Funke lab, in collaboration with Alex Bates (then a PhD student in the Jefferis lab), managed, to everyone’s surprise, to reliably predict neurotransmitter identities from the raw EM image data. This data was also kindly shared ahead of publication and is now used in many of the FlyWire papers.
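As a rough illustration of what “querying connections” boils down to, here is a minimal Python sketch on toy data. It collapses a table with one row per predicted synapse into per-pair connection weights and then looks up the downstream partners of one neuron. The field names (`pre_id`, `post_id`, `score`), the score threshold and both helper functions are invented for this example; they are not the actual FlyWire schema or tooling.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class Synapse(NamedTuple):
    """One predicted synapse (hypothetical, simplified record)."""
    pre_id: int    # ID of the presynaptic neuron
    post_id: int   # ID of the postsynaptic neuron
    score: float   # confidence of the prediction (assumed range 0..1)


def build_edge_weights(synapses: Iterable[Synapse], min_score: float = 0.5) -> Counter:
    """Drop low-confidence synapses and count the rest per (pre, post) pair."""
    weights: Counter = Counter()
    for syn in synapses:
        if syn.score >= min_score:
            weights[(syn.pre_id, syn.post_id)] += 1
    return weights


def downstream_partners(weights: Counter, neuron_id: int) -> Dict[int, int]:
    """Return {partner ID: synapse count} for all targets of `neuron_id`,
    strongest connection first."""
    partners = {post: n for (pre, post), n in weights.items() if pre == neuron_id}
    return dict(sorted(partners.items(), key=lambda kv: kv[1], reverse=True))


# Toy data: six synapses between three neurons.
synapses = [
    Synapse(1, 2, 0.9),
    Synapse(1, 2, 0.8),
    Synapse(1, 3, 0.2),   # below the threshold, will be filtered out
    Synapse(2, 3, 0.95),
    Synapse(2, 3, 0.7),
    Synapse(3, 1, 0.6),
]

weights = build_edge_weights(synapses)
print(downstream_partners(weights, 1))
```

A pure-Python loop like this is only meant to show the idea; for a table with over a hundred million rows you would want a dataframe library or a database doing the filtering and grouping.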
Software Stack
The FlyWire project as it is today would not have been possible without a great many technical innovations on the software side. Here are shout-outs to some of the relevant people and projects (in no particular order):
- Jeremy Maitin-Shepard (Google) developed Neuroglancer, a WebGL-based viewer for volumetric (images, segmentation, etc.) data. FlyWire and many other connectome projects use modified versions of Neuroglancer for proofreading and exploration.
- The Seung lab and Zetta.ai built the tools to re-align (Popovych et al. Nat. Comm., 2024) and segment the image data.
- Nico Kemnitz, Akhilesh Halageri and Sven Dorkenwald (then Seung lab) created PyChunkedGraph (part of the CAVE ecosystem, see below) which is the data management and proofreading backend underlying FlyWire.
- Will Silversmith developed various Python libraries (cloud-volume, kimimaro, igneous) to process and interact with connectomics data. A lot of our own tools use his tools under the hood.
- Eric Perlman developed and maintains various microservices that were critical in the early days of FlyWire before CAVE was established, including a service mapping between the original FAFB14 and FlyWire coordinate space that is still frequently used.
- Forrest Collman, Casey Schneider-Mizell, Sven Dorkenwald, Derrick Brittain (all currently at the Allen Institute for Brain Science) and others develop and importantly maintain the “Connectome Annotation Versioning Engine” (CAVE). Without getting too much into the weeds: CAVE allows layering extra information on top of the neuron segmentation, crucially including (but not limited to) neuron annotations and synapses.
Tech Support
A lot of the work in the group relies on data and services hosted on our own servers at the MRC-LMB. The person making sure everything from SSL certificates to kernel updates runs smoothly is our own Andrew Champion[8].
Further reading:
- UKRI press release
- Princeton press release
- University of Vermont press release
- MRC-LMB news story
- Nature’s landing and collection page for the FlyWire paper package
- FlyWire.ai homepage
- Codex (FlyWire data explorer)
- For raw data enthusiasts:
  - Zenodo repository with connectivity data (by S. Dorkenwald)
  - Zenodo repository with neuron skeletons and NBLAST scores
  - GitHub with annotations and other data artefacts
Edits
04/10/24:
- Corrected year for Dorkenwald et al. reference (2018 -> 2022)
- Added Nico Kemnitz as contributor to ChunkedGraph
- Added Derrick Brittain as contributor to CAVE
- Made a note that ChunkedGraph is part of the CAVE ecosystem
- Added link to Princeton press release
11/10/24:
- Added link to Popovych et al. alignment paper
- Added shout-out to Eric Perlman
Footnotes
[1] You can explore & download the raw EM images at https://temca2data.org/.
[2] I’m probably even forgetting some, for which I apologise.
[3] A Swiss company that provides image analysis services.
[4] Around 9% across the brain.
[5] A consulting firm founded by a former member of the lab.
[6] That’s an understatement – the process was actually quite involved.
[7] A lot, but see below.
[8] And he does it on top of all the other things a postdoc does!