Guests
Eva Maxfield Brown | Boris Veytsman
Panelist
Richard Littauer
Show Notes
In this episode of Sustain, host Richard Littauer talks with guests Eva Maxfield Brown and Boris Veytsman about their co-authored paper, "Biomedical Open Source Software: Crucial Packages and Hidden Heroes." The paper focuses on identifying crucial but often overlooked software dependencies in biomedical research. The discussion covers how the study used data from two million papers to map these dependencies, revealing both well-supported and undermaintained software components vital to scientific research. They also talk through the methodological challenges and the concept of "Nebraska packages": essential, potentially undermaintained components of the software stack used in both industry and science. The conversation then turns to broader implications for software sustainability and security, and to future research directions, including improving how software contributions are tracked and recognized within scientific careers. Press download now to hear more!
[00:01:47] Richard dives into the paper co-authored by Eva and Boris. Boris explains its origins in a workshop at CZI aimed at accelerating science through sustainable software, which led to the analysis of software used in biomedical research. He highlights the focus on identifying crucial yet often unmentioned dependencies of research software, which he calls “unsung heroes.”
[00:05:22] Boris shares findings from the study, noting that while many foundational packages are cited, some significant packages remain uncited despite their critical role.
[00:06:43] Eva discusses the concept of “Nebraska packages” (a nod to xkcd comic 2347): essential, potentially undermaintained components of the software stack used in both industry and science. She also elaborates on the methodological challenges of deciding which packages to include in the analysis, particularly when dependencies vary between users and contexts.
[00:09:42] Richard reflects on the broader implications of the discussion for the open source community, particularly for software sustainability and security. Eva emphasizes that security matters in every field, discusses how software bugs can affect scientific research, and stresses the need for robust software infrastructure.
[00:12:04] Boris comments on why the scientific community needs well-tested tools, given that many scientists lack formal training in software development.
[00:13:47] Richard quotes the paper's observation that the network of software packages used in science contains no cycles, which suggests a more robust design than general-purpose software. He asks how that squares with earlier comments about scientists not being great at coding.
[00:14:08] Eva explains that the finding that the dependency network is a directed acyclic graph (DAG) may seem surprising given the common perception that scientific software is poorly engineered. She notes that while scientists may not be trained in proper software packaging, the Python environment helps prevent cyclic dependencies.
[00:17:31] Richard brings up “Katz centrality,” which the paper uses, and Boris clarifies that the term refers to Leo Katz's measure of network centrality, explaining how it helps determine the importance of nodes within a network.
[00:20:13] Richard asks about practical applications of the findings, probing for advice on supporting crucial but underrecognized dependencies within software ecosystems. Eva outlines future research directions, including improving the ecosystem-matching algorithms so that software mentions are linked to the correct ecosystems more accurately.
[00:22:50] Eva suggests expanding the research beyond biomedicine to other domains, since software needs differ across scientific disciplines. Boris discusses targeted interventions that could support underrecognized contributors in the scientific software community and raise their prestige.
[00:27:22] Richard asks how the research team plans to map dependencies to individual contributors and track their motivations. Boris responds that while they have gathered substantial data from sources like GitHub logs, publishing this information poses ethical challenges due to privacy concerns.
[00:28:45] Eva discusses her work on linking GitHub profiles to academic authors using ORCID identifiers to better track contributions to scientific software.
[00:31:42] Richard asks about the broader context of the research: is their study of software package centrality in science unique, or are there similar studies at this scale? Eva acknowledges the need for more comprehensive studies and cites a 2015 study that analyzed developer networks on GitHub. Boris adds that while there is extensive literature on scientific citation networks, dependency networks are much less explored.
[00:34:38] Find out where you can follow Boris and Eva's work and find them on social media.
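Two network ideas come up in the [00:13:47] to [00:17:31] segments: a dependency graph with no cycles (a DAG) and Katz centrality. Here is a minimal Python sketch of both. It is not the authors' code; the package names, the edge direction (from a package to each package it depends on), and the parameters alpha = 0.5 and beta = 1 are assumptions chosen for illustration. It uses Kahn's algorithm to confirm the graph is acyclic, then computes each package's Katz score, x_i = beta + alpha * (sum of x_j over the packages j that depend on i), so a package earns credit from every direct or transitive dependent, discounted by alpha per hop.

```python
from collections import deque


def katz_on_dag(deps, alpha=0.5, beta=1.0):
    """Katz centrality for a dependency graph that must be acyclic.

    deps maps each package to the packages it depends on (assumed edge
    direction: package -> dependency). A package's score grows with each
    direct or transitive dependent, attenuated by alpha per extra hop.
    Raises ValueError if the graph contains a cycle.
    """
    # Collect every node, including dependencies that never appear as keys.
    nodes = set(deps)
    for targets in deps.values():
        nodes.update(targets)

    # dependents[i] lists the packages that depend directly on i;
    # indegree[i] counts those incoming "depends on" edges.
    dependents = {n: [] for n in nodes}
    indegree = {n: 0 for n in nodes}
    for pkg, targets in deps.items():
        for dep in targets:
            dependents[dep].append(pkg)
            indegree[dep] += 1

    # Kahn's algorithm: repeatedly take a node with no remaining dependents.
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for dep in deps.get(n, []):
            indegree[dep] -= 1
            if indegree[dep] == 0:
                queue.append(dep)

    # Nodes that were never freed sit on (or behind) a cycle.
    if len(order) != len(nodes):
        raise ValueError("dependency graph contains a cycle")

    # Topological order scores every dependent before its dependencies,
    # so the Katz fixed point can be computed in a single pass.
    score = {}
    for n in order:
        score[n] = beta + alpha * sum(score[j] for j in dependents[n])
    return score


# Hypothetical toy stack: two analysis pipelines and their libraries.
stack = {
    "pipeline_a": ["dataframes", "stats"],
    "pipeline_b": ["stats"],
    "dataframes": ["arrays", "dates"],
    "stats": ["arrays"],
    "dates": ["compat"],
}
scores = katz_on_dag(stack)
for pkg, s in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{pkg:<12} {s:.3f}")

# A cycle (a depends on b, b depends on a) is rejected.
try:
    katz_on_dag({"a": ["b"], "b": ["a"]})
except ValueError as err:
    print(err)  # dependency graph contains a cycle
```

Because a graph without cycles puts no upper limit on alpha for the Katz sum to converge, the scores come out exactly in one pass. In this toy stack, the hypothetical compat package, which no pipeline uses directly, still outranks dataframes, a small illustration of how a transitive dependency can be crucial yet hidden.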
Spotlight
[00:37:06] Richard’s spotlight is Deirdre Madeleine Smith.
[00:37:29] Eva’s spotlight is Talley Lambert.
[00:38:02] Boris’s spotlight is the CZI Collaborators.
Links
SustainOSS (https://sustainoss.org/)
SustainOSS Twitter (https://twitter.com/SustainOSS?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor)
SustainOSS Discourse (https://discourse.sustainoss.org/)
[email protected] (mailto:[email protected])
SustainOSS Mastodon (https://mastodon.social/tags/sustainoss)
Open Collective-SustainOSS (Contribute) (https://opencollective.com/sustainoss)
Richard Littauer Socials (https://www.burntfen.com/2023-05-30/socials)
Eva Maxfield Brown X/Twitter (https://x.com/evamaxfieldb)
Eva Maxfield Brown Website (https://evamaxfield.github.io/)
Eva Maxfield Brown GitHub (https://github.com/evamaxfield)
Boris Veytsman X/Twitter (https://x.com/BorisVeytsman?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor)
Boris Veytsman Mastodon (https://sfba.social/@borisveytsman)
Boris Veytsman LinkedIn (https://www.linkedin.com/in/boris-veytsman-50a1162/)
Chan Zuckerberg Initiative (CZI) (https://chanzuckerberg.com/)
“Biomedical Open Source Software: Crucial Packages and Hidden Heroes” (arXiv) (https://arxiv.org/pdf/2404.06672)
“A large dataset of software mentions in the biomedical literature” (arXiv) (https://arxiv.org/abs/2209.00693)
xkcd Dependency comic 2347 (https://xkcd.com/2347/)
Dataset Artefacts are the Hidden Drivers of the Declining Disruptiveness in Science (arXiv) (https://arxiv.org/abs/2402.14583)
Directed acyclic graph (DAG) (https://en.wikipedia.org/wiki/Directed_acyclic_graph)
Katz centrality (https://en.wikipedia.org/wiki/Katz_centrality)
Sustain Podcast-Episode 136: Daniel S. Katz on The Research Software Alliance (https://podcast.sustainoss.org/guests/katz)
Sustain Podcast-Episode 159: Dawn Foster & Andrew Nesbitt at State of Open Con 2023 (https://podcast.sustainoss.org/guests/nesbitt)
Sustain Podcast-Episode 218: Karthik Ram & James Howison on Research Software Visibility Infrastructure Priorities (https://podcast.sustainoss.org/guests/james-howison)
ORCID (https://orcid.org/)
Mapping the Impact of Research Software in Science- A CZI Hackathon (https://github.com/chanzuckerberg/software-impact-hackathon-2023)
Deirdre Smith Academia (https://pitt.academia.edu/DeirdreSmith)
Talley Lambert GitHub (https://github.com/tlambert03)
Credits
Produced by Richard Littauer (https://www.burntfen.com/)
Edited by Paul M. Bahr at Peachtree Sound (https://www.peachtreesound.com/)
Show notes by DeAnn Bahr at Peachtree Sound (https://www.peachtreesound.com/)
Special Guests: Boris Veytsman and Eva Maxfield Brown.
