Selected Publications
[P] = peer-reviewed
Calling Bullshit: The Art of Skepticism in a Data-Driven World (2020)
Random House.
isbn: 0525509186
The world is awash in bullshit, and we're drowning in it. Politicians are unconstrained by facts. Science is conducted by press release. Startup culture elevates bullshit to high art. These days, calling bullshit is a noble act. Based on Carl Bergstrom and Jevin West's popular course at the University of Washington, Calling Bullshit is a modern handbook to the art of skepticism. Bergstrom, a computational biologist, and West, an information scientist, catalogue bullshit in its many forms, explaining and offering readers the tools to see through the obfuscations, deliberate and careless, that dominate every realm of our lives. In this lively guide to everything from misleading statistics to "fake news," Bergstrom and West help you recognize bullshit whenever and wherever you encounter it--in data, in conversation, even within yourself--and explain it to your crystal-loving aunt or casually racist uncle. Now more than ever, calling bullshit is crucial to a properly functioning community, whether it be a circle of friends, a network of academics, or the citizenry of a nation.
@book{callingbullshit2020,
author = {Bergstrom, Carl T and West, Jevin D},
title = {Calling Bullshit: The Art of Skepticism in a Data-Driven World},
publisher = {Random House},
edition ={1},
year = {2020},
isbn = {0525509186}}
Combining interventions to reduce the spread of viral misinformation (2022)
[P] Nature Human Behaviour.
6(10): 1372-1380
Misinformation online poses a range of threats, from subverting democratic processes to undermining public health measures. Proposed solutions range from encouraging more selective sharing by individuals to removing false content and accounts that create or promote it. Here we provide a framework to evaluate interventions aimed at reducing viral misinformation online both in isolation and when used in combination. We begin by deriving a generative model of viral misinformation spread, inspired by research on infectious disease. By applying this model to a large corpus (10.5 million tweets) of misinformation events that occurred during the 2020 US election, we reveal that commonly proposed interventions are unlikely to be effective in isolation. However, our framework demonstrates that a combined approach can achieve a substantial reduction in the prevalence of misinformation. Our results highlight a practical path forward as misinformation online continues to threaten vaccination efforts, equity and democratic processes around the globe.
@ARTICLE{Bak-Coleman2022NatureHumanBehaviour,
author = {Bak-Coleman, Joseph B and Kennedy, Ian and Wack, Morgan and Beers, Andrew and Schafer, Joseph S and Spiro, Emma S and Starbird, Kate and West, Jevin D},
title = {Combining interventions to reduce the spread of viral misinformation},
journal = {Nature Human Behaviour},
volume={6},
number={10},
pages={1372--1380},
year={2022},
publisher={Nature Publishing Group},
doi = {10.1038/s41562-022-01388-6}}
Modest interventions complement each other in reducing misinformation (2022)
Nature Human Behaviour.
June 23
Proposals to fight online misinformation range from gently encouraging users to consider the accuracy of information (‘nudges’) to bans and removing content. Using modelling techniques, we find that these interventions are unlikely to be effective in isolation, but that a combined approach can achieve a significant reduction in the spread of misinformation.
@ARTICLE{Bak-Coleman2022NatureHumanBehaviour-perspective,
author = {Bak-Coleman, Joseph B and West, Jevin D},
title = {Modest interventions complement each other in reducing misinformation},
journal = {Nature Human Behaviour},
volume = {June 23},
doi = {10.1038/s41562-022-01389-5},
year = {2022}}
Misinformation in and about science (2021)
[P] Proceedings of the National Academy of Sciences.
118(15): 1-8
Humans learn about the world by collectively acquiring information, filtering it, and sharing what we know. Misinformation undermines this process. The repercussions are extensive. Without reliable and accurate sources of information, we cannot hope to halt climate change, make reasoned democratic decisions, or control a global pandemic. Most analyses of misinformation focus on popular and social media, but the scientific enterprise faces a parallel set of problems, from hype and hyperbole to publication bias and citation misdirection, predatory publishing, and filter bubbles. In this perspective, we highlight these parallels and discuss future research directions and interventions.
@article{West2021pnas,
title={Misinformation in and about science},
author={West, Jevin D and Bergstrom, Carl T},
journal={Proceedings of the National Academy of Sciences},
volume={118},
number={15},
pages = {1-8},
url={https://www.pnas.org/content/118/15/e1912444117},
year={2021}}
Echo Chambers in the Age of Algorithms: An Audit of Twitter’s Friend Recommender System (2024)
[P] Proceedings of the 16th ACM Web Science Conference.
pgs: 11-21
The presence of political misinformation and ideological echo chambers on social media platforms is concerning given the important role that these sites play in the public’s exposure to news and current events. Algorithmic systems employed on these platforms are presumed to play a role in these phenomena, but little is known about their mechanisms and effects. In this work, we conduct an algorithmic audit of Twitter’s Who-To-Follow friend recommendation system, the first empirical audit that investigates the impact of this algorithm in-situ. We create automated Twitter accounts that initially follow left and right affiliated U.S. politicians during the 2022 U.S. midterm elections and then grow their information networks using the platform’s recommender system. We pair the experiment with an observational study of Twitter users who already follow the same politicians. Broadly, we find that while following the recommendation algorithm leads accounts into dense and reciprocal neighborhoods that structurally resemble echo chambers, the recommender also results in less political homogeneity of a user’s network compared to accounts growing their networks through social endorsement. Furthermore, accounts that exclusively followed users recommended by the algorithm had fewer opportunities to encounter content centered on false or misleading election narratives compared to choosing friends based on social endorsement.
@inproceedings{Duskin2024websci,
title={Echo Chambers in the Age of Algorithms: An Audit of Twitter’s Friend Recommender System},
author={Duskin, Kayla and Schafer, Joseph S and West, Jevin D and Spiro, Emma S},
booktitle={Proceedings of the 16th ACM Web Science Conference},
pages={11--21},
year={2024}}
Selective and deceptive citation in the construction of dueling consensuses (2023)
[P] Science Advances
9(38): eadh1933
The COVID-19 pandemic provides a unique opportunity to study science communication and, in particular, the transmission of consensus. In this study, we show how “science communicators,” writ large to include both mainstream science journalists and practiced conspiracy theorists, transform scientific evidence into two dueling consensuses using the effectiveness of masks as a case study. We do this by compiling one of the largest, hand-coded citation datasets of cross-medium science communication, derived from 5 million Twitter posts of people discussing masks. We find that science communicators selectively uplift certain published works while denigrating others to create bodies of evidence that support and oppose masks, respectively. Anti-mask communicators in particular often use selective and deceptive quotation of scientific work and criticize opposing science more than pro-mask communicators. Our findings have implications for scientists, science communicators, and scientific publishers, whose systems of sharing (and correcting) knowledge are highly vulnerable to what we term adversarial science communication. A large dataset of Twitter arguments about masks is used to show how consensus is formed in the public eye.
@ARTICLE{Beers2023scienceadvances,
author = {Beers, Andrew and Nguyễn, Sarah and Starbird, Kate and West, Jevin D and Spiro, Emma},
title = {Selective and deceptive citation in the construction of dueling consensuses},
journal = {Science Advances},
volume = {9},
number = {38},
pages = {eadh1933},
doi = {10.1126/sciadv.adh1933},
year = {2023}}
Understanding and addressing misinformation about science (2024)
[P] National Academies of Sciences, Engineering, and Medicine (NASEM) Consensus Report.
Editors: Tiffany E. Taylor, Holly G. Rhodes, Kasisomayajula Viswanath
Our current information ecosystem makes it easier for misinformation about science to spread and harder for people to figure out what is scientifically accurate. Proactive solutions are needed to address misinformation about science, an issue of public concern given its potential to cause harm at individual, community, and societal levels. Improving access to high-quality scientific information can fill information voids that exist for topics of interest to people, reducing the likelihood of exposure to and uptake of misinformation about science. Misinformation is commonly perceived as a matter of bad actors maliciously misleading the public, but misinformation about science arises both intentionally and inadvertently and from a wide range of sources. Understanding and Addressing Misinformation About Science characterizes the nature, scope, and impacts of this phenomenon, and provides guidance on interventions, policies, and future research. This report is a comprehensive assessment of the available evidence and reflects a systems view of the problem given the broader historical and contemporary contexts that shape the lived experiences of people and their relationships to information. The report aims to illuminate the impacts of misinformation about science and potential solutions across a diversity of individual peoples, communities, and societies.
@incollection{nasem2024misinformation,
title={Understanding and addressing misinformation about science},
author={Kasisomayajula Viswanath and Nick Allum and Nadine Barrett and David A. Broniatowski and Afua A.N. Bruce and Lisa K. Fazio and Lauren Feldman and Deen Freelon and Asheley R. Landrum and David M. J. Lazer and Ezra M. Markowitz and Pamela C. Ronald and David Scales and Brian G. Southwell and Jevin D. West},
booktitle={National Academies of Sciences, Engineering, and Medicine Consensus Reports},
editor={Tiffany E. Taylor and Holly G. Rhodes and Kasisomayajula Viswanath},
year={2024}}
Response: Emergent analogical reasoning in large language models (2023)
arXiv:2308.16118
(in review)
In their recent Nature Human Behaviour paper, "Emergent analogical reasoning in large language models," (Webb, Holyoak, and Lu, 2023) the authors argue that "large language models such as GPT-3 have acquired an emergent ability to find zero-shot solutions to a broad range of analogy problems." In this response, we provide counterexamples of the letter string analogies. In our tests, GPT-3 fails to solve even the easiest variants of the problems presented in the original paper. Zero-shot reasoning is an extraordinary claim that requires extraordinary evidence. We do not see that evidence in our experiments. To strengthen claims of humanlike reasoning such as zero-shot reasoning, it is important that the field develop approaches that rule out data memorization.
@ARTICLE{Hodel2023analogicalreasoning,
author = {Hodel, Damian and West, Jevin D},
title = {Response: Emergent analogical reasoning in large language models},
journal = {arXiv},
volume = {2308.16118},
doi = {10.48550/arXiv.2308.16118},
year = {2023}}
Auditing Google's Search Headlines as a Potential Gateway to Misleading Content: Evidence from the 2020 US Election (2022)
[P] Journal of Online Trust and Safety.
1(4):1-32
The prevalence and spread of online misinformation during the 2020 US presidential election served to perpetuate a false belief in widespread election fraud. Though much research has focused on how social media platforms connected people to election-related rumors and conspiracy theories, less is known about the search engine pathways that linked users to news content with the potential to undermine trust in elections. In this paper, we present novel data related to the content of political headlines during the 2020 US election period. We scraped over 800,000 headlines from Google's search engine results pages (SERP) in response to 20 election-related keywords—10 general (e.g., "Ballots") and 10 conspiratorial (e.g., "Voter fraud")—when searched from 20 cities across 16 states. We present results from qualitative coding of 5,600 headlines focused on the prevalence of delegitimizing information. Our results reveal that videos (as compared to stories, search results, and advertisements) are the most problematic in terms of exposing users to delegitimizing headlines. We also illustrate how headline content varies when searching from a swing state, adopting a conspiratorial search keyword, or reading from media domains with higher political bias. We conclude with policy recommendations on data transparency that allow researchers to continue to monitor search engines during elections.
@ARTICLE{Zade2022JTrustOnline,
author = {Zade, Himanshu and Wack, Morgan and Zhang, Yuanrui and Starbird, Kate and Calo, Ryan and Young, Jason and West, Jevin D},
title = {Auditing Google's Search Headlines as a Potential Gateway to Misleading Content: Evidence from the 2020 US Election},
journal = {Journal of Online Trust and Safety},
volume = {1},
number = {4},
pages = {1-32},
doi = {10.54501/jots.v1i4.72},
year = {2022}}
Replication does not reliably measure scientific productivity (2022)
SocArxiv.
May 12
In this perspective, we address three key distinctions for research and policy about misinformation: the distinction between misinformation/disinformation, speech/action, and mistaken belief/conviction.
@ARTICLE{Bak-Coleman2022SocArxiv,
author = {Joseph Bak-Coleman and Richard Mann and Carl T Bergstrom and Jevin D. West},
title = {Replication does not reliably measure scientific productivity},
journal = {SocArxiv},
volume = {May 12},
doi = {10.31235/osf.io/rkyf7},
year = {2022}}
How publishers can fight misinformation in and about science (2023)
Nature Medicine.
July 7
Misinformation and disinformation about science and medicine have reached crisis proportions and cause harm on a massive scale. This includes misinformation about science, such as when television personalities or social media accounts spread anti-vaccine propaganda or push ineffective dietary supplements, as well as misinformation in science, such as when claims appear in scholarly journals that are incautious, deceptive or even fraudulent...
@ARTICLE{Bergstrom2023NatureMedicine,
author = {Bergstrom, Carl T. and West, Jevin D.},
title = {How publishers can fight misinformation in and about science},
journal = {Nature Medicine},
volume = {July 7},
doi = {10.1038/s41591-023-02411-7},
year = {2023}}
How do you solve a problem like misinformation? (2021)
Science Advances.
7(50): eabn0481
In this perspective, we address three key distinctions for research and policy about misinformation: the distinction between misinformation/disinformation, speech/action, and mistaken belief/conviction.
@ARTICLE{Calo2021ScienceAdvances,
author = {Calo, Ryan and Coward, Chris and Spiro, Emma and Starbird, Kate and West, Jevin D},
title = {How do you solve a problem like misinformation?},
journal = {Science Advances},
volume = {7},
number = {50},
doi = {10.1126/sciadv.abn0481},
year = {2021}}
Suspensions of prominent accounts minimally impact platform engagement (2023)
SocArxiv.
(in review)
Health-related misinformation online poses threats to individual well-being and undermines public health efforts. In response, many social media platforms have temporarily or permanently suspended accounts that spread misinformation, at the risk of losing traffic vital to platform revenue. Here we examine the impact on platform engagement following removal of six prominent accounts during the COVID-19 pandemic. Focused on those who engaged with the removed accounts, we find that suspension did not meaningfully reduce activity on the platform. Moreover, we find that removal of the prominent accounts minimally impacted the diversity of information sources consumed.
@ARTICLE{Duskin2023suspendedusers,
author = {Kayla Duskin and Jevin D West and Joseph Bak-Coleman},
title = {Suspension of prominent accounts minimally impacts platform engagement},
journal = {SocArxiv},
volume = {(in review)},
year = {2023}}
The chatbot era: Better or worse off? (2023)
Seattle Times.
March 31
Are we better off because of penicillin? Yes. The internet? Probably. Social media? Probably not. So what about chatbots? The chatbot craze has captured the world’s attention, and massive piles of money. Chatbots are software programs that use artificial intelligence to process and simulate conversations with humans. Will they improve human experience and longevity, peace and prosperity, environmental health, productivity or social well-being? From my perspective, as a researcher who studies misinformation and its effects on society, chatbots will be vectors of propaganda, they will make it harder to discern truth, and they will further erode trust in our institutions. I see two main reasons for this: They are bullshitters at scale, and they are difficult, if not impossible, to reverse engineer.
@ARTICLE{West2023SeattleTimes,
author = {West, Jevin D},
title = {The chatbot era: Better or worse off?},
journal = {Seattle Times},
volume = {March 31},
year = {2023}}
The Past 110 Years: Historical Data on the Underrepresentation of Women in Philosophy Journals (2022)
[P] Ethics.
132(3): 680-729
This article provides the first large-scale, longitudinal study examining publication rates by gender in philosophy journals. We find that from 1900 to 1990 the proportion of women authorships in philosophy increased, but it has plateaued since the 1990s (unlike in other disciplines). Top Philosophy journals publish the lowest proportion of women, and anonymous review does not increase the proportion publishing in these journals (though it does in other journals). Value Theory journals do not publish articles by women in proportion to their presence in the subdiscipline. Although the proportion of women authorships in philosophy has increased over time, measurable disparities persist.
@article{Hassoun2022Ethics,
title={The Past 110 Years: Historical Data on the Underrepresentation of Women in Philosophy Journals},
author={Nicole Hassoun and Sherri Conklin and Michael Nekrasov and Jevin D. West},
journal={Ethics},
volume = {132},
number = {3},
pages = {680-729},
publisher = {The University of Chicago Press Chicago, IL},
year={2022}}
Bursting Scientific Filter Bubbles: Boosting Innovation via Novel Author Discovery (2022)
[P] Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems.
April 29
Isolated silos of scientific research and the growing challenge of information overload limit awareness across the literature and hinder innovation. Algorithmic curation and recommendation, which often prioritize relevance, can further reinforce these informational “filter bubbles.” In response, we describe Bridger, a system for facilitating discovery of scholars and their work. We construct a faceted representation of authors with information gleaned from their papers and inferred author personas, and use it to develop an approach that locates commonalities and contrasts between scientists to balance relevance and novelty. In studies with computer science researchers, this approach helps users discover authors considered useful for generating novel research directions. We also demonstrate an approach for displaying information about authors, boosting the ability to understand the work of new, unfamiliar scholars. Our analysis reveals that Bridger connects authors who have different citation profiles and publish in different venues, raising the prospect of bridging diverse scientific communities.
@ARTICLE{Portenoy2022CHI,
author = {Jason Portenoy and Marissa Radensky and Jevin D. West and Eric Horvitz and Daniel S. Weld and Tom Hope},
title = {Bursting Scientific Filter Bubbles: Boosting Innovation via Novel Author Discovery},
journal = {Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems},
volume={April 29},
year={2022},
doi = {10.1145/3491102.3501905}}
Perceived experts are prevalent and influential within an antivaccine community on Twitter (2024)
[P] Proceedings of the National Academy of Sciences (PNAS) Nexus
3(2): pgae007
Perceived experts (i.e. medical professionals and biomedical scientists) are trusted sources of medical information who are especially effective at encouraging vaccine uptake. The role of perceived experts acting as potential antivaccine influencers has not been characterized systematically. We describe the prevalence and importance of antivaccine perceived experts by constructing a coengagement network of 7,720 accounts based on a Twitter data set containing over 4.2 million posts from April 2021. The coengagement network primarily broke into two large communities that differed in their stance toward COVID-19 vaccines, and misinformation was predominantly shared by the antivaccine community. Perceived experts had a sizable presence across the coengagement network, including within the antivaccine community where they were 9.8% of individual, English-language users. Perceived experts within the antivaccine community shared low-quality (misinformation) sources at similar rates and academic sources at higher rates compared to perceived nonexperts in that community. Perceived experts occupied important network positions as central antivaccine users and bridges between the antivaccine and provaccine communities. Using propensity score matching, we found that perceived expertise brought an influence boost, as perceived experts were significantly more likely to receive likes and retweets in both the antivaccine and provaccine communities. There was no significant difference in the magnitude of the influence boost for perceived experts between the two communities. Social media platforms, scientific communications, and biomedical organizations may focus on more systemic interventions to reduce the impact of perceived experts in spreading antivaccine misinformation.
@article{Harris2024pnasnexus,
author = {Harris, Mallory J and Murtfeldt, Ryan and Wang, Shufan and Mordecai, Erin A and West, Jevin D},
title = {Perceived experts are prevalent and influential within an antivaccine community on Twitter},
journal = {Proceedings of the National Academy of Sciences (PNAS) Nexus},
volume = {3},
number = {2},
pages = {pgae007},
year = {2024},
month = {02},
issn = {2752-6542},
doi = {10.1093/pnasnexus/pgae007},
url = {https://doi.org/10.1093/pnasnexus/pgae007},
eprint = {https://academic.oup.com/pnasnexus/article-pdf/3/2/pgae007/56594128/pgae007.pdf},
}
Multi-Agent Systems for Automated Frame Detection (2025)
(in review)
Frame detection is a critical component of web science, offering insights into how users interpret online information. Frames are mental schemas that determine how people interpret information. Frame detection tasks attempt to use linguistic cues to infer what frame is used during communication, but not all frames are visible in text, making frame detection difficult. This paper evaluates various system designs that leverage language models for automated frame detection, benchmarking two labeled framing datasets. We identify a methodology that employs role-prompting across multiple language agents to achieve optimal performance, even when frames are ambiguous or not explicitly stated in the text. Additionally, we incorporate uncertainty quantification techniques into data labeling workflows that significantly improve system efficacy with minimal human intervention for complex or ambiguous tasks, including cases where a communicator's meaning is not immediately clear without background knowledge surrounding the conversation in question. These findings represent a significant step toward scalable, efficient frame detection, contributing to the broader understanding of online discourse.
@article{farr2025websci,
author = {David Farr and Stephen Prochaska and Kate Starbird and Jevin D. West},
title = {Multi-Agent Systems for Automated Frame Detection},
journal = {ACM Web Science Conference (WebSci)},
volume = {(in review)},
year = {2025}}
Can LLM-based AI agents automate science communication? (2025)
(in prep)
There is growing interest in automating scientific research and communication using large language models (LLMs). "AI Scientist," a multi-agent LLM-based system that has received considerable media attention, exemplifies this effort. In this commentary, we examine the AI Scientist and its outputs and ask whether LLMs can effectively automate science, with a specific focus on science communication. We highlight issues of misattribution, bias, hallucinations, self-contradictions, and lack of meaningful engagement in its papers and reviews. We further discuss the social, ethical, and epistemological implications of such efforts. Finally, we explore various aspects and directions for responsibly advancing AI-mediated science and communication, including the development of task-guided agents, the establishment of benchmarks, improving scientific communication practices, defining structural standards for AI-generated papers, proposing a dedicated journal for AI scientists, and examining the potential impacts of AI on science, the institutions that support it, and metrics such as citations and scholarly influence.
⊖ Bibtex
@article{Memon2025aiautomatescience,
author = {Shahan Memon and James Koppel and Tom Hope and Jevin D. West},
title = {Can LLM-based AI agents automate science communication?},
journal = {(in prep)},
year = {2025}}
Our field was built on decades-old bodies of research across a range of disciplines. It wasn’t invented by a 'class of misinformation experts' in 2016 (2024)
Center for an Informed Public Blog
January 24
In a recent essay, philosopher Dan Williams provocatively argues that there is no way to advance a “science of misleading content.” He also puts forth his belief that concern over the impact of misinformation is a “moral panic,” asserting that clear-cut cases of false information are so rare in Western democracies that the exposure of a susceptible minority to them has little effect on societal outcomes. Moreover, he claims that misleading information is so difficult to define that systematic study of the problem by researchers is both impossible and undesirable. Williams’ line of critique is familiar, echoing recent pieces that frame the “field” that coalesced following Brexit and the election of Donald Trump in 2016 as a convenient liberal establishment response to the decline of trust in institutions in Western democracies.
⊖ Bibtex
@article{Kharazian2024cip,
author = {Zarine Kharazian and Madeline Jalbert and Saloni Dash and Shahan Ali Memon and Kate Starbird and Emma S. Spiro and Jevin D. West},
title = {Our field was built on decades-old bodies of research across a range of disciplines. It wasn’t invented by a 'class of misinformation experts' in 2016},
journal = {Center for an Informed Public, Blog},
volume = {January 24},
url = {https://www.cip.uw.edu/2024/01/24/misinformation-field-research/},
year = {2024}}
LLM Confidence Evaluation Measures in Zero-Shot CSS Classification (2025)
(in review)
Assessing classification confidence is critical for leveraging large language models (LLMs) in automated labeling tasks, especially in the sensitive domains presented by Computational Social Science (CSS) tasks. In this paper, we make three key contributions: (1) we propose an uncertainty quantification (UQ) performance measure tailored for data annotation tasks, (2) we compare, for the first time, five different UQ strategies across three distinct LLMs and CSS data annotation tasks, and (3) we introduce a novel UQ aggregation strategy that effectively identifies low-confidence LLM annotations and disproportionately uncovers data incorrectly labeled by the LLMs. Our results demonstrate that our proposed UQ aggregation strategy improves upon existing methods and can be used to significantly improve human-in-the-loop data annotation processes.
⊖ Bibtex
@ARTICLE{Farr2025NAACL,
author = {David T Farr and Iain Cruickshank and Nico Manzonelli and Nicholas Clark and Kate Starbird and Jevin D. West},
title = {LLM Confidence Evaluation Measures in Zero-Shot CSS Classification},
journal = {NAACL},
volume={(in review)},
year = {2025}}
RPAM: A Principled Metric for Evaluating Biases in Language Models with High Predictive Validity in Downstream Outputs (2025)
(in review)
Language models (LMs) exhibit social biases, such as stereotypes. Effectively analyzing and mitigating these biases requires accurate and generalizable evaluation methods. Some existing approaches focus on downstream metrics that analyze biases in generated text. Since generated text content can vary drastically across LMs, such metrics often require ad-hoc evaluation datasets, which limits the generalization of such downstream metrics. In contrast, upstream metrics examine LMs at the fundamental level of embeddings or continuation probabilities, enabling principled bias analyses across LMs. Yet, to date, no upstream metric for generative LMs has uncovered a strong relationship with biased behavior in real-world applications, as approximated through real-world biases, including those measured in generated text. To address this gap, we introduce the Relative Probability Association Metric (RPAM), a bias evaluation metric for generative LMs. For three LMs of different quality of language generation and purpose (Mistral-7B-Instruct, Mistral-7B, and GPT-2) and well-studied evaluation datasets (WEAT-WS, Bellezza, WS-353, and SST2), we find a strong relationship between upstream RPAM measurements and corresponding implicit and explicit biases observed in humans, as well as biases measured downstream with LM-specific tasks, outperforming prior record values where applicable. Our findings help to mitigate the risks and harms associated with biases in LMs by contributing to reliable bias measurements.
⊖ Bibtex
@article{Hodel2025rpam,
author = {Damian Hodel and Jevin D. West and Aylin Caliskan},
title = {RPAM: A Principled Metric for Evaluating Biases in Language Models with High Predictive Validity in Downstream Outputs},
journal = {NAACL},
volume = {(in review)},
year = {2025}}
Improving Informational Health in Washington (2024)
Project for Civic Health, Lieutenant Governor Blog Post
Human communication can be challenging, even in settings that are in-person, non-distractive, and with people we know well. What did she mean by ‘interesting clothing’? Why did he pause so long on the last question? Throw in digital distance, distraction, and divergent perspectives—collateral effects of social media and algorithmically curated content—and no wonder the world sometimes feels like a white noise machine.
⊖ Bibtex
@article{West2024ltgovernorblog,
author = {Jevin D. West},
title = {Improving Informational Health in Washington},
journal = {Project for Civic Health, Lieutenant Governor Blog Post},
year = {2024}}
Search engines post-ChatGPT: How generative artificial intelligence could make search less reliable (2024)
Center for an Informed Public Blog
In this commentary, we discuss the evolving nature of search engines, as they begin to generate, index, and distribute content created by generative artificial intelligence (GenAI). Our discussion highlights challenges in the early stages of GenAI integration, particularly around factual inconsistencies and biases. We discuss how output from GenAI carries an unwarranted sense of credibility, while decreasing transparency and sourcing ability. Furthermore, search engines are already answering queries with error-laden, generated content, further blurring the provenance of information and impacting the integrity of the information ecosystem. We argue that all these factors could reduce the reliability of search engines. Finally, we summarize some of the active research directions and open questions.
⊖ Bibtex
@article{West2024ltgovernorblog,
author = {Jevin D. West},
title = {Improving Informational Health in Washington},
journal = {Project for Civic Health, Lieutenant Governor Blog Post},
year = {2024}}
Publications
[P] = peer-reviewed
RED-CT: A Systems Design Methodology for Using LLM-labeled Data to Train and Deploy Edge Linguistic Classifiers (2024)
[P] International Conference on Computational Linguistics (COLING)
(in press)
Large language models (LLMs) have enhanced our ability to rapidly analyze and classify unstructured natural language data. However, concerns regarding cost, network limitations, and security constraints have posed challenges for their integration into work processes. In this study, we adopt a systems design approach to employing LLMs as imperfect data annotators for downstream supervised learning tasks, introducing system intervention measures aimed at improving classification performance. Our methodology outperforms LLM-generated labels in seven of eight tests, demonstrating an effective strategy for incorporating LLMs into the design and deployment of specialized, supervised learning models present in many industry use cases.
⊖ Bibtex
@ARTICLE{Farr2024COLING,
author = {David T Farr and Iain Cruickshank and Nico Manzonelli and Jevin D. West},
title = {RED-CT: A Systems Design Methodology for Using LLM-labeled Data to Train and Deploy Edge Linguistic Classifiers},
journal = {International Conference on Computational Linguistics (COLING)},
volume={(in press)},
year = {2024}}
LLM Chain Ensembles for Scalable and Accurate Data Annotation (2024)
[P] IEEE Conference on Big Data
(in press)
The ability of large language models (LLMs) to perform zero-shot classification makes them viable solutions for data annotation in rapidly evolving domains where quality labeled data is often scarce and costly to obtain. However, the large-scale deployment of LLMs can be prohibitively expensive. This paper introduces an LLM chain ensemble methodology that aligns multiple LLMs in a sequence, routing data subsets to subsequent models based on classification uncertainty. This approach leverages the strengths of individual LLMs within a broader system, allowing each model to handle data points where it exhibits the highest confidence, while forwarding more complex cases to potentially more robust models. Our results show that the chain ensemble method often exceeds the performance of the best individual model in the chain and achieves substantial cost savings, making LLM chain ensembles a practical and efficient solution for large-scale data annotation challenges.
⊖ Bibtex
@ARTICLE{Farr2024BigData,
author = {David T Farr and Iain Cruickshank and Nico Manzonelli and Nicholas Clark and Kate Starbird and Jevin D. West},
title = {LLM Chain Ensembles for Scalable and Accurate Data Annotation},
journal = {IEEE International Conference on Big Data},
volume={(in press)},
year = {2024}}
Disagreement as a Way to Study Misinformation and its Effects (2024)
[P] Harvard Kennedy School (HKS) Misinformation Review
(in review)
Misinformation is considered a significant societal concern due to its associated problems like political polarization, erosion of trust, and public health challenges. However, these broad effects can occur independently of misinformation, illustrating a misalignment with the narrow focus of the prevailing misinformation concept. We propose using disagreement—conflicting attitudes and beliefs—as a more effective framework for studying these effects. This approach, for example, reveals the limitations of current misinformation interventions and offers a method to empirically test whether we are living in a post-truth era.
⊖ Bibtex
@ARTICLE{Hodel2024HarvardMisinformationReview,
author = {Hodel, Damian and West, Jevin D.},
title = {Disagreement as a Way to Study Misinformation and its Effects},
journal = {Harvard Kennedy School (HKS) Misinformation Review},
volume = {(in review)},
year = {2024}}
Content Recommendation on Twitter During the 2022 U.S. Midterm Election (2024)
[P] ACM, The Web Conference
(in review)
Social media platforms shape users' experience through the algorithmic systems they deploy. In this study, we examine to what extent Twitter's content recommender impacts the topic, political bias, and reliability of information served to users during a high-stakes election. We utilize automated accounts to document Twitter's algorithmically curated and reverse chronological timelines throughout the U.S. 2022 midterm election. We find that the algorithmic timeline measurably influences exposure to election content, partisan bias, and the prevalence of low-quality information. Critically, these impacts are mediated by the partisan makeup of one's personal social network, which often exerts greater influence than the algorithm alone. We find that the algorithmic feed decreases the proportion of election content shown to left-leaning accounts, and that it skews content toward right-leaning sources when compared to the reverse chronological feed. We additionally find evidence that the algorithmic system increases the prevalence of election-related rumors for right-leaning users, and has mixed effects on the prevalence of low-quality information sources.
⊖ Bibtex
@ARTICLE{Duskin2024www,
author = {Kayla Duskin and Joseph Schafer and Jevin D West and Emma Spiro},
title = {Content Recommendation on Twitter During the 2022 U.S. Midterm Election},
journal = {ACM Web Conference (WWW)},
volume = {(in review)},
year = {2024}}
Science communication with generative AI (2024)
Nature Human Behaviour
8(4):625-627
Generative AI tools can quickly translate or summarize large volumes of complex information. This technology could revolutionize the way that we communicate science, but there are many reasons for caution. We asked six experts about the potential and pitfalls of generative AI for science communication.
⊖ Bibtex
@ARTICLE{Caliskan2024NatureHumanBehavior,
title={Science communication with generative AI},
author={Alvarez, Amanda and Caliskan, Aylin and Crockett, MJ and Ho, Shirley S and Messeri, Lisa and West, Jevin},
journal={Nature Human Behaviour},
volume={8},
number={4},
pages={625--627},
year={2024},
publisher={Nature Publishing Group UK London}}
RIP Twitter API: A eulogy to its vast research contributions (2024)
arXiv:2404.07340
Since 2006, Twitter's Application Programming Interface (API) has been a treasure trove of high-quality data for researchers studying everything from the spread of misinformation to social psychology and emergency management. However, in the spring of 2023, Twitter (now called X) began charging $42,000/month for its Enterprise access level, essentially a death knell for researcher use. Lacking sufficient funds to pay this monthly fee, academics are now scrambling to continue their research without this important data source. This study collects and tabulates the number of studies, number of citations, dates, major disciplines, and major topic areas of studies that used Twitter data between 2006 and 2023. While we cannot know for certain what will be lost now that Twitter data is cost prohibitive, we can illustrate its research value during the time it was available. A search of 8 databases and 3 related APIs found that since 2006, a total of 27,453 studies have been published in 7,432 publication venues, with 1,303,142 citations, across 14 disciplines. Major disciplines include: computational social science, engineering, data science, social media studies, public health, and medicine. Major topics include: information dissemination, assessing the credibility of tweets, strategies for conducting data research, detecting and analyzing major events, and studying human behavior. Twitter data studies have increased every year since 2006, but following Twitter's decision to begin charging for data in the spring of 2023, the number of studies published in 2023 decreased by 13% compared to 2022. We assume that much of the data used for studies published in 2023 were collected prior to Twitter's shutdown, and thus the number of new studies is likely to decline further in subsequent years.
⊖ Bibtex
@ARTICLE{murtfeldt2024twitterapi,
title={RIP Twitter API: A eulogy to its vast research contributions},
author={Murtfeldt, Ryan and Alterman, Naomi and Kahveci, Ihsan and West, Jevin D},
journal={arXiv preprint arXiv:2404.07340},
year={2024}}
Selective and deceptive citation in the construction of dueling consensuses (2023)
[P] Science Advances
9(38): eadh1933
The COVID-19 pandemic provides a unique opportunity to study science communication and, in particular, the transmission of consensus. In this study, we show how “science communicators,” writ large to include both mainstream science journalists and practiced conspiracy theorists, transform scientific evidence into two dueling consensuses using the effectiveness of masks as a case study. We do this by compiling one of the largest, hand-coded citation datasets of cross-medium science communication, derived from 5 million Twitter posts of people discussing masks. We find that science communicators selectively uplift certain published works while denigrating others to create bodies of evidence that support and oppose masks, respectively. Anti-mask communicators in particular often use selective and deceptive quotation of scientific work and criticize opposing science more than pro-mask communicators. Our findings have implications for scientists, science communicators, and scientific publishers, whose systems of sharing (and correcting) knowledge are highly vulnerable to what we term adversarial science communication. A large dataset of Twitter arguments about masks is used to show how consensus is formed in the public eye.
⊖ Bibtex
@ARTICLE{Beers2023scienceadvances,
author = {Beers, Andrew and Nguyễn, Sarah and Starbird, Kate and West, Jevin D and Spiro, Emma},
title = {Selective and deceptive citation in the construction of dueling consensuses},
journal = {Science Advances},
volume = {9},
number = {38},
pages = {eadh1933},
doi = {10.1126/sciadv.adh1933},
year = {2023}}
Gender-based homophily in collaborations across a heterogeneous scholarly landscape (2023)
PLoS One.
18(4): e0283106
In this article, we investigate the role of gender in collaboration patterns by analyzing gender-based homophily—the tendency for researchers to co-author with individuals of the same gender. We develop and apply novel methodology to the corpus of JSTOR articles, a broad scholarly landscape, which we analyze at various levels of granularity. Most notably, for a precise analysis of gender homophily, we develop methodology which explicitly accounts for the fact that the data comprises heterogeneous intellectual communities and that not all authorships are exchangeable. In particular, we distinguish three phenomena which may affect the distribution of observed gender homophily in collaborations: a structural component that is due to demographics and non-gendered authorship norms of a scholarly community, a compositional component which is driven by varying gender representation across sub-disciplines and time, and a behavioral component which we define as the remainder of observed gender homophily after its structural and compositional components have been taken into account. Using minimal modeling assumptions, the methodology we develop allows us to test for behavioral homophily. We find that statistically significant behavioral homophily can be detected across the JSTOR corpus and show that this finding is robust to missing gender indicators in our data. In a secondary analysis, we show that the proportion of women representation in a field is positively associated with the probability of finding statistically significant behavioral homophily.
⊖ Bibtex
@ARTICLE{Wang2023plosone,
author = {Wang, Samuel and Lee, Carole and West, Jevin D and Bergstrom, Carl T and Erosheva, Elena},
title = {Gender-based homophily in collaborations across a heterogeneous scholarly landscape},
journal = {PLoS One},
volume = {18},
number = {4},
pages = {e0283106},
url = {https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0283106},
doi = {10.1371/journal.pone.0283106},
year = {2023}}
Where are the Women: The Ethnic Representation of Women Authors in Philosophy Journal by Regional Affiliation and Specialization (2023)
European Journal of Analytic Philosophy.
19(1): 1-46
Using bibliographic metadata from 177 Philosophy Journals between 1950 and 2020, this article presents new data on the under-representation of women authors in philosophy journals across decades and across four different compounding factors. First, we examine how philosophy fits in comparison to other academic disciplines. Second, we consider how the regional academic context in which Philosophy Journals operate impacts on author gender proportions. Third, we consider how the regional specialization of a journal impacts on author gender proportions. Fourth, and perhaps most interestingly, we consider the impact of author hereditary origin, a proxy for author ethnicity, on gender representation, and we examine the breakdown of author hereditary origin across Philosophy Journals between 1950 and 2020. To our knowledge, this is the first work to offer an estimate for author ethnicity and gender in philosophy publications using a large-scale data set. We find that women authors are underrepresented in Philosophy Journals across time, across disciplines, across the globe, and regardless of ethnicity.
⊖ Bibtex
@ARTICLE{Conklin2023EuJAP,
author = {Sherri Lynn Conklin and Michael Nekrasov and Jevin D. West},
title = {Where are the Women: The Ethnic Representation of Women Authors in Philosophy Journal, by Regional Affiliation and Specialization},
journal = {European Journal of Analytic Philosophy},
volume = {19},
number = {1},
pages = {1--46},
doi = {10.31820/ejap.19.1.3},
year = {2023}}
Combining interventions to reduce the spread of viral misinformation (2022)
Nature Human Behaviour.
6(10): 1372-1380
Misinformation online poses a range of threats, from subverting democratic processes to undermining public health measures. Proposed solutions range from encouraging more selective sharing by individuals to removing false content and accounts that create or promote it. Here we provide a framework to evaluate interventions aimed at reducing viral misinformation online both in isolation and when used in combination. We begin by deriving a generative model of viral misinformation spread, inspired by research on infectious disease. By applying this model to a large corpus (10.5 million tweets) of misinformation events that occurred during the 2020 US election, we reveal that commonly proposed interventions are unlikely to be effective in isolation. However, our framework demonstrates that a combined approach can achieve a substantial reduction in the prevalence of misinformation. Our results highlight a practical path forward as misinformation online continues to threaten vaccination efforts, equity and democratic processes around the globe.
⊖ Bibtex
@ARTICLE{Bak-Coleman2022NatureHumanBehaviour,
author = {Bak-Coleman, Joseph B and Kennedy, Ian and Wack, Morgan and Beers, Andrew and Schafer, Joseph S and Spiro, Emma S and Starbird, Kate and West, Jevin D},
title = {Combining interventions to reduce the spread of viral misinformation},
journal = {Nature Human Behaviour},
volume={6},
number={10},
pages={1372--1380},
year={2022},
publisher={Nature Publishing Group},
doi = {10.1038/s41562-022-01388-6}}
Lower use of academic affiliation by university faculty who study abortion in top U.S. newspapers (2022)
Journal of Communication in Healthcare.
118(15): 1-8
Background: University faculty are considered trusted sources of information to disseminate accurate information to the public that abortion is a common, safe and necessary medical health care service. However, misinformation persists about abortion’s alleged dangers, commonality, and medical necessity. Methods: Systematic review of popular media articles related to abortion, gun control (an equally controversial topic), and cigarette use (a more neutral topic) published in top U.S. newspapers between January 2015 and July 2020 using bivariate analysis and logistic regression to compare disclosure of university affiliation among experts in each topic area. Results: We included 41 abortion, 102 gun control, and 130 smoking articles, which consisted of 304 distinct media mentions of university-affiliated faculty. Articles with smoking and gun control faculty experts had statistically more affiliations mentioned (90%, n=195 and 88%, n =159, respectively) than abortion faculty experts (77%, n=54) (p=0.02). The probability of faculty disclosing university affiliation was similar between smoking and gun control (p=0.73), but between smoking and abortion was significantly less (Ave Marginal Effects – 0.13, p=0.02). Conclusions: Fewer faculty members disclose their university affiliation in top U.S. newspapers when discussing abortion. Lack of academic disclosure may paradoxically make these faculty appear less ‘legitimate.’ This leads to misinformation, branding abortion as a ‘choice,’ suggesting it is an unessential medical service. With the recent U.S. Supreme Court landmark decision, Dobbs v. Jackson Women’s Health Organization, and subsequent banning of abortion in many U.S. states, faculty will probably be even less likely to disclose their university affiliation in the media than in the past.
⊖ Bibtex
@article{Miller2022JCommunicationsHealthcare,
author = {Madison Miller and Alexa R. Lindley and Jevin D. West and Erin K. Thayer and Emily M. Godfrey},
title = {Lower use of academic affiliation by university faculty who study abortion in top U.S. newspapers},
journal = {Journal of Communication in Healthcare},
volume = {0},
number = {0},
pages = {1--14},
year = {2022},
publisher = {Taylor \& Francis},
doi = {10.1080/17538068.2022.2150166}}
The Past 110 Years: Historical Data on the Underrepresentation of Women in Philosophy Journals (2022)
Ethics.
132(3): 680-729
This article provides the first large-scale, longitudinal study examining publication rates by gender in philosophy journals. We find that from 1900 to 1990 the proportion of women authorships in philosophy increased, but it has plateaued since the 1990s (unlike in other disciplines). Top Philosophy journals publish the lowest proportion of women, and anonymous review does not increase the proportion publishing in these journals (though it does in other journals). Value Theory journals do not publish articles by women in proportion to their presence in the subdiscipline. Although the proportion of women authorships in philosophy has increased over time, measurable disparities persist.
⊖ Bibtex
@article{Hassoun2022Ethics,
title={The Past 110 Years: Historical Data on the Underrepresentation of Women in Philosophy Journals},
author={Nicole Hassoun and Sherri Conklin and Michael Nekrasov and Jevin D. West},
journal={Ethics},
volume = {132},
number = {3},
pages = {680--729},
publisher = {The University of Chicago Press Chicago, IL},
year={2022}}
Misinformation in and about science (2021)
Proceedings of the National Academy of Sciences.
118(15): 1-8
Humans learn about the world by collectively acquiring information, filtering it, and sharing what we know. Misinformation undermines this process. The repercussions are extensive. Without reliable and accurate sources of information, we cannot hope to halt climate change, make reasoned democratic decisions, or control a global pandemic. Most analyses of misinformation focus on popular and social media, but the scientific enterprise faces a parallel set of problems, from hype and hyperbole to publication bias and citation misdirection, predatory publishing, and filter bubbles. In this perspective, we highlight these parallels and discuss future research directions and interventions.
⊖ Bibtex
@article{West2021pnas,
title={Misinformation in and about science},
author={West, Jevin D and Bergstrom, Carl T},
journal={Proceedings of the National Academy of Sciences},
volume={118},
number={15},
pages = {1-8},
url={https://www.pnas.org/content/118/15/e1912444117},
year={2021}}
The hidden influence of communities in collaborative funding of clinical science (2021)
Royal Society Open Science.
8(8):210072
Every year the National Institutes of Health allocates $10.7 billion (one-third of its funds) for clinical science research while pharmaceutical companies spend $52.9 billion (90% of their annual budget). However, we know little about funder collaborations and the impact of collaboratively funded projects. As an initial effort towards this, we examine the co-funding network, where a funder represents a node and an edge signifies collaboration. Our core data include all papers that cite and receive citations by the Cochrane Database of Systematic Reviews, a prominent clinical review journal. We find that 65% of clinical papers have multiple funders and discover communities of funders that are formed by national boundaries and funding objectives. To quantify success in funding, we use a g-index metric that indicates efficiency of funders in supporting clinically relevant research. After controlling for authorship, we find that funders generally achieve higher success when collaborating than when solo-funding. We also find that as a funder, seeking multiple, direct connections with various disconnected funders may be more beneficial than being part of a densely interconnected network of co-funders. The results of this paper indicate that collaborations can potentially accelerate innovation, not only among authors but also funders.
⊖ Bibtex
@article{Vasan2021royalsociety,
title={The hidden influence of communities in collaborative funding of clinical science},
author={Vasan, Kishore and West, Jevin D.},
journal={Royal Society Open Science},
volume={8},
number={8},
pages={210072},
year={2021},
doi={10.1098/rsos.210072},
publisher={The Royal Society}}
Social Media COVID-19 Misinformation Interventions Viewed Positively, But Have Limited Impact (2020)
arXiv.
2012.11055
The Influence of Changing Marginals on Measures of Inequality in Scholarly Citations: Evidence of Bias and a Resampling Correction (2020)
Sociological Science.
7: 314-341
Scholars have debated whether changes in digital environments have led to greater concentration or dispersal of scientific citations, but this debate has paid little attention to how other changes in the publication environment may impact the commonly used measures of inequality. We demonstrate using Monte Carlo experiments that a variety of inequality measures -- including the Gini coefficient, the Herfindahl-Hirschman index, and the percentage of papers ever cited -- are substantially biased downwards by increases in the total number of papers and citations. We propose and validate a resampling-based correction for this “marginals bias,” and apply this correction to empirical data on scholarly citation distributions using Web of Science data covering four broad scientific fields (Health; Humanities; Mathematics and Computer Sciences; and Social Sciences) during 1996–2014. We find that in each field the bulk of the apparent decline in citation inequality in recent years is an artifact of marginals bias, as are most apparent inter-field differences in citation inequality. Researchers using inequality measures to compare citation distributions and other distributions with many cases at or near the zero-bound should interpret these metrics carefully and account for the influence of changing marginals.
⊖ Bibtex
@article{Kim2020sociologicalscience,
title={The Influence of Changing Marginals on Measures of Inequality in Scholarly Citations: Evidence of Bias and a Resampling Correction},
author={Kim, Lanu and Adolph, Christopher and West, Jevin D and Stovel, Katherine},
journal={Sociological Science},
volume = {7},
pages = {314-341},
year={2020}}
Constructing and Evaluating Automated Literature Review Systems (2020)
Scientometrics.
125(3): 3233-3251
Automated literature reviews have the potential to accelerate knowledge synthesis and provide new insights. However, a lack of labeled ground-truth data has made it difficult to develop and evaluate these methods. We propose a framework that uses the reference lists from existing review papers as labeled data, which can then be used to train supervised classifiers, allowing for experimentation and testing of models and features at a large scale. We demonstrate our framework by training classifiers using different combinations of citation- and text-based features on 500 review papers. We use the R-Precision scores for the task of reconstructing the review papers' reference lists as a way to evaluate and compare methods. We also extend our method, generating a novel set of articles relevant to the fields of misinformation studies and science communication. We find that our method can identify many of the most relevant papers for a literature review from a large set of candidate papers, and that our framework allows for development and testing of models and features to incrementally improve the results. The models we build are able to identify relevant papers even when starting with a very small set of seed papers. We also find that the methods can be adapted to identify previously undiscovered articles that may be relevant to a given topic.
⊖ Bibtex
@article{portenoy2020scientometrics,
title={Constructing and Evaluating Automated Literature Review Systems},
author={Portenoy, Jason and West, Jevin D},
journal={Scientometrics},
volume={125},
number={3},
pages={3233--3251},
year={2020},
publisher={Springer}}
Scientific Journals Still Matter in the Era of Academic Search Engines and Preprint Archives (2019)
Journal of the Association for Information Science and Technology.
71(10): 1218-1226
Journals play a critical role in the scientific process because they evaluate the quality of incoming papers and offer an organizing filter for search. However, the role of journals has been called into question because new preprint archives and academic search engines make it easier to find articles independent of the journals that publish them. Research on this issue is complicated by the deeply confounded relationship between article quality and journal reputation. We present an innovative proxy for individual article quality that is divorced from the journal's reputation or impact factor: the number of citations to preprints posted on arXiv.org. Using this measure to study three subfields of physics that were early adopters of arXiv, we show that prior estimates of the effect of journal reputation on an individual article's impact (measured by citations) are likely inflated. While we find that higher‐quality preprints in these subfields are now less likely to be published in journals compared to prior years, we find little systematic evidence that the role of journal reputation on article performance has declined.
⊖ Bibtex
@article{kim2020jasist,
title={Scientific Journals Still Matter in the Era of Academic Search Engines and Preprint Archives},
author={Kim, Lanu and Portenoy, Jason and West, Jevin D and Stovel, Katherine},
journal={Journal of the Association for Information Science and Technology},
volume={71},
number={10},
pages={1218--1226},
year={2020}}
Understanding the Elephant: The Discourse Approach to Boundary Identification and Corpus Construction for Theory Review Articles (2019)
Journal of the Association for Information Systems.
20(7):15
The goal of a review article is to present the current state of knowledge in a research area. Two important initial steps in writing a review article are boundary identification (identifying a body of potentially relevant past research) and corpus construction (selecting research manuscripts to include in the review). We present a theory-as-discourse approach, which (1) creates a theory ecosystem of potentially relevant prior research using a citation-network approach to boundary identification; and (2) identifies manuscripts for consideration using machine learning or random selection. We demonstrate an instantiation of the theory-as-discourse approach through a proof-of-concept, which we call the automated detection of implicit theory (ADIT) technique. ADIT improves performance over the conventional approach as practiced in past technology acceptance model reviews (i.e., keyword search, sometimes manual citation chaining); it identifies a set of research manuscripts that is more comprehensive and at least as precise. Our analysis shows that the conventional approach failed to identify a majority of past research. Like the three blind men examining the elephant, the conventional approach distorts the totality of the phenomenon. ADIT also enables researchers to statistically estimate the number of relevant manuscripts that were excluded from the resulting review article, thus enabling an assessment of the review article’s representativeness.
⊖ Bibtex
@article{Larsen2019JASIS,
title={Understanding the Elephant: The Discourse Approach to Boundary Identification and Corpus Construction for Theory Review Articles},
author={Kai R Larsen and Dirk Hovorka and Alan Dennis and Jevin D. West},
journal={Journal of the Association for Information Systems},
volume={20},
number={7},
pages={15},
doi={10.17705/1jais.00556},
year={2019},
url = {https://aisel.aisnet.org/jais/vol20/iss7/15/}}
The State of OA: A large-scale analysis of the prevalence and impact of Open Access articles (2018)
PeerJ.
6:e4375, doi:10.7717/peerj.4375
Despite growing interest in Open Access (OA) to scholarly literature, there is an unmet need for large-scale, up-to-date, and reproducible studies assessing the prevalence and characteristics of OA. We address this need using oaDOI, an open online service that determines OA status for 67 million articles. We use three samples, each of 100,000 articles, to investigate OA in three populations: (1) all journal articles assigned a Crossref DOI, (2) recent journal articles indexed in Web of Science, and (3) articles viewed by users of Unpaywall, an open-source browser extension that lets users find OA articles using oaDOI. We estimate that at least 28% of the scholarly literature is OA (19M in total) and that this proportion is growing, driven particularly by growth in Gold and Hybrid. The most recent year analyzed (2015) also has the highest percentage of OA (45%). Because of this growth, and the fact that readers disproportionately access newer articles, we find that Unpaywall users encounter OA quite frequently: 47% of articles they view are OA. Notably, the most common mechanism for OA is not Gold, Green, or Hybrid OA, but rather an under-discussed category we dub Bronze: articles made free-to-read on the publisher website, without an explicit Open license. We also examine the citation impact of OA articles, corroborating the so-called open-access citation advantage: accounting for age and discipline, OA articles receive 18% more citations than average, an effect driven primarily by Green and Hybrid OA. We encourage further research using the free oaDOI service, as a way to inform OA policy and practice.
⊖ Bibtex
@article{piwowar2018peerj,
title={The state of {OA}: a large-scale analysis of the prevalence and impact of Open Access articles},
author={Piwowar, Heather and Priem, Jason and Lariviere, Vincent and Alperin, Juan Pablo and Matthias, Lisa and Norlander, Bree and Farley, Ashley and West, Jevin and Haustein, Stefanie},
journal={PeerJ},
volume={6},
pages={e4375},
year={2018},
publisher={PeerJ Inc.},
url = {https://doi.org/10.7717/peerj.4375}}
Toward the Operationalization of Visual Metaphor (2017)
Journal of the Association for Information Science and Technology.
68(10): 2338-2349, 10.1002/asi.23857
Many successful digital interfaces employ visual metaphors to convey features or data properties to users, but the characteristics that make a visual metaphor effective are not well-understood. We used a theoretical conception of metaphor from cognitive linguistics to design an interactive system for viewing the citation network of the corpora of literature in the JSTOR database, a highly connected compound graph of 2 million papers linked by 8 million citations. We created four variants of this system, manipulating two distinct properties of metaphor. We conducted a between-subjects experimental study with 80 participants to compare understanding and engagement when working with each version. We found that building on known image schemas improved response time on lookup tasks, while contextual detail predicted increases in persistence and the number of inferences drawn from the data. Schema-congruency combined with contextual detail produced the highest gains in comprehension. These findings provide concrete mechanisms by which designers presenting large data sets through metaphorical interfaces may improve their effectiveness and appeal with users.
⊖ Bibtex
@article {Hiniker2017jasist,
author = {Hiniker, Alexis and Hong, Sungsoo Ray and Kim, Yea-Seul and Chen, Nan-Chen and West, Jevin D and Aragon, Cecilia},
title = {Toward the Operationalization of Visual Metaphor},
journal = {Journal of the Association for Information Science and Technology},
volume = {68},
number = {10},
pages = {2338-2349},
publisher = {Wiley Online Library},
year = {2017}}
Attrition and Performance of Community College Transfers (2017)
PLoS One.
12: 1-23, doi: 10.1371/journal.pone.0174683
Community colleges are an important part of the US higher education landscape, yet the aptitude and preparedness of student transfers to baccalaureate institutions is often called into question. Examining transcript records and demographic information of nearly 70,000 students across over 15 years of registrar records at a public university, this study performed a descriptive analysis of the persistence, performance, and academic migration patterns of community college transfers, transfers from four-year institutions, and freshmen entrants. We found little difference between community college transfers and freshmen entrants in terms of post-transfer grades and persistence. Transfers from four-year institutions had higher grades but also had higher attrition rates than their peers. This study also found no strong evidence of transfer shock on students’ post-transfer grades. When examining the tendencies of students to shift fields of study during their educational pursuits, the academic migration patterns of transfer students were more concentrated than those of freshmen entrants.
⊖ Bibtex
@ARTICLE{Aulck2017plos,
author = {Aulck, Lovenoor and West, Jevin},
title = {Attrition and Performance of Community College Transfers},
journal = {PLoS One},
volume = {12},
pages = {1-23},
doi = {10.1371/journal.pone.0174683},
year = {2017}}
Men set their own cites high: Gender and self-citation across fields and over time (2017)
Socius.
3:1-22, doi: 10.1177/2378023117738903
How common is self-citation in scholarly publication, and does the practice vary by gender? Using novel methods and a data set of 1.5 million research papers in the scholarly database JSTOR published between 1779 and 2011, the authors find that nearly 10 percent of references are self-citations by a paper’s authors. The findings also show that between 1779 and 2011, men cited their own papers 56 percent more than did women. In the last two decades of data, men self-cited 70 percent more than women. Women are also more than 10 percentage points more likely than men to not cite their own previous work at all. While these patterns could result from differences in the number of papers that men and women authors have published rather than gender-specific patterns of self-citation behavior, this gender gap in self-citation rates has remained stable over the last 50 years, despite increased representation of women in academia. The authors break down self-citation patterns by academic field and number of authors and comment on potential mechanisms behind these observations. These findings have important implications for scholarly visibility and cumulative advantage in academic careers.
⊖ Bibtex
@ARTICLE{King2017socius,
author = {King, Molly M. and Bergstrom, Carl T. and Correll, Shelly J. and Jacquet, Jennifer and West, Jevin D.},
title = {Men set their own cites high: Gender and self-citation across fields and over time},
journal = {Socius},
volume = {3},
pages = {1-22},
year = {2017},
doi = {10.1177/2378023117738903}}
Viziometrics: Analyzing Visual Information in the Scientific Literature (2017)
IEEE Transactions on Big Data.
PP(99): 1-14. doi: 10.1109/TBDATA.2017.2689038
Scientific results are communicated visually in the literature through diagrams, visualizations, and photographs. These information-dense objects have been largely ignored in bibliometrics and scientometrics studies when compared to citations and text. In this paper, we use techniques from computer vision and machine learning to classify more than 8 million figures from PubMed into 5 figure types and study the resulting patterns of visual information as they relate to impact. We find that the distribution of figures and figure types in the literature has remained relatively constant over time, but can vary widely across field and topic. Remarkably, we find a significant correlation between scientific impact and the use of visual information, where higher impact papers tend to include more diagrams, and to a lesser extent more plots and photographs. To explore these results and other ways of extracting this visual information, we have built a visual browser to illustrate the concept and explore design alternatives for supporting viziometric analysis and organizing visual information. We use these results to articulate a new research agenda -- viziometrics -- to study the organization and presentation of visual information in the scientific literature.
⊖ Bibtex
@ARTICLE{Lee2017bigdata,
author = {Lee, Poshen and West, Jevin D and Howe, Bill},
title = {Viziometrics: Analyzing Visual Information in the Scientific Literature},
journal = {IEEE Transactions on Big Data},
volume = {PP},
number = {99},
pages = {1-14},
doi={10.1109/TBDATA.2017.2689038},
year = {2017}}
Towards Assessing Gender Authorship in Aquaculture Publications (2017)
The Journal of the Asian Fisheries Society.
30S:131-143
While gender disparities are decreasing in some areas of academia, studies have shown that gender inequities in scholarly literature still persist. A review of more than eight million papers across disciplines found that men predominate in the first and last author positions and women are underrepresented in single-authored papers. The present study applies the vetted methodology of assigning authorship gender in peer-reviewed literature, according to the U.S. Social Security Database of names, to the broad discipline of aquaculture in peer-reviewed journals in the complete JSTOR database archive, and compares these results to authorship by gender in the International Aquaculture Curated Database (IACD). The IACD is a compilation of over 500 peer-reviewed publications supported by four international aquaculture programs developed by Oregon State University researchers. Preliminary findings reveal that the percentage of women authors was similar in the JSTOR aquaculture journals subsample (13.8%) and in the journals in the IACD (15.7%). Women, therefore, are not well represented in either database. The next steps for this work include comparing and contrasting the proportion of women authors in aquaculture journals to women working in the aquaculture discipline and to women graduates in the discipline. Learning how gender authorship has changed in the aquaculture discipline is a critical component for promoting gender equity in the academic discipline and broader field of aquaculture.
⊖ Bibtex
@ARTICLE{chow2017fisheries,
author = {Chow, Morgan and Egna, Hillary and West, Jevin D.},
title = {Towards Assessing Gender Authorship in Aquaculture Publications},
journal = {The Journal of the Asian Fisheries Society},
volume = {30S},
pages = {131-143},
year = {2017}}
Scalable and Efficient Flow-Based Community Detection for Large-Scale Graph Analysis (2017)
ACM Transactions on Knowledge Discovery from Data.
11(3): 32:1--32:30. doi: 10.1145/2992785
Community detection is a powerful approach to uncover important structures in large networks. For real networks that often describe the flow of some entity, flow-based community detection methods are particularly important. Infomap is a flow-based community detection algorithm that optimizes the objective function known as the map equation. Third-party benchmarks have found that Infomap is the most effective algorithm for identifying clusters in large graphs. Unfortunately, though Infomap works well, it is an inherently serial algorithm and thus cannot take advantage of multi-core processing in modern computers, limiting its use for analyzing large graphs quickly. In this paper, we propose a novel algorithm to optimize the map equation called RelaxMap. RelaxMap provides two important improvements over Infomap: parallelization, so that the map equation can be optimized over much larger graphs, and prioritization, so that the most important work occurs first, iterations take less time, and the algorithm converges faster. We implement these techniques using OpenMP on shared-memory multicore systems, and evaluate our approach on a variety of graphs from standard graph clustering benchmarks as well as real graph datasets. Our evaluation shows that both techniques are effective: RelaxMap achieves 70% parallel efficiency on 8 cores, and prioritization improves algorithm performance by an additional 20%–50% on average, depending on the graph properties. Additionally, RelaxMap converges in a similar number of iterations and provides solutions of equivalent quality to the serial Infomap implementation.
⊖ Bibtex
@article{Bae2016tkdd,
title={Scalable and Efficient Flow-Based Community Detection for Large-Scale Graph Analysis},
author={Bae, Seung-Hee and Halperin, Daniel and West, Jevin D and Rosvall, Martin and Howe, Bill},
journal = {ACM Transactions on Knowledge Discovery from Data},
volume = {11},
number = {3},
year = {2017},
issn = {1556-4681},
pages = {32:1--32:30},
keywords = {Community detection, graph analysis, parallelization, prioritization}}
All Patents Great and Small - A Big Data Network Approach to Valuation (2017)
Virginia Journal of Law and Technology.
20(3): 466-504
Measuring patent value is an important goal of scholars in both patent law and patent economics. However, doing so objectively, accurately and consistently has proved exceedingly difficult. At least part of the reason for this difficulty is that patents themselves are complex documents that are difficult even for patent experts to interpret. In addition, issued patents are the result of an often long and complicated negotiation between applicant and patent office (in the United States, the United States Patent & Trademark Office ("USPTO")), the result of which is an opaque “prosecution history” upon which depends the scope of claimed patent rights. In this Article, we approach the concept of patent value by using the relative positions of issued United States ("U.S.") patents embedded within a comprehensive patent citation network to measure the importance of those patents within the network. Thus, we tend to refer to the "importance" of patents instead of "value", but there is good reason to believe that these two concepts share a very similar meaning.
⊖ Bibtex
@article{torrance2017virginia,
author = {Torrance, Andrew W and West, Jevin D},
title = {All Patents Great and Small - A Big Data Network Approach to Valuation},
journal = {Virginia Journal of Law and Technology},
volume = {20},
pages = {466-504},
number = {3},
year = {2017}}
Leveraging Citation Networks to Visualize Scholarly Influence Over Time (2017)
Frontiers in Research Metrics and Analytics.
2:1-10, doi: 10.3389/frma.2017.00008
Assessing the influence of a scholar's work is an important task for funding organizations, academic departments, and researchers. Common methods, such as measures of citation counts, can ignore much of the nuance and multidimensionality of scholarly influence. We present an approach for generating dynamic visualizations of scholars' careers. This approach uses an animated node-link diagram showing the citation network accumulated around the researcher over the course of the career in concert with key indicators, highlighting influence both within and across fields. We developed our design in collaboration with one funding organization---the Pew Biomedical Scholars program---but the methods are generalizable to visualizations of scholarly influence. We applied the design method to the Microsoft Academic Graph, which includes more than 120 million publications. We validate our abstractions throughout the process through collaboration with the Pew Biomedical Scholars program officers and summative evaluations with their scholars.
⊖ Bibtex
@article{Portenoy2017frontiers,
title={Leveraging Citation Networks to Visualize Scholarly Influence Over Time},
author={Portenoy, Jason and Hullman, Jessica and West, Jevin D.},
journal = {Frontiers in Research Metrics and Analytics},
volume = {2},
pages = {8},
year = {2017},
issn = {2504-0537},
doi = {10.3389/frma.2017.00008}}
A recommendation system based on hierarchical clustering of an article-level citation network (2016)
IEEE Transactions on Big Data.
2(2): 113-123. doi: 10.1109/TBDATA.2016.2541167
To better understand the flows of ideas or information through social and biological systems, researchers develop maps that reveal important patterns in network flows. In practice, network flow models have implied memoryless first-order Markov chains, but recently researchers have introduced higher-order Markov chain models with memory to capture patterns in multi-step pathways. Higher-order models are particularly important for effectively revealing actual, overlapping community structure, but higher-order Markov chain models suffer from the curse of dimensionality: their vast parameter spaces require exponentially increasing data to avoid overfitting and therefore make mapping inefficient already for moderate-sized systems. To overcome this problem, we introduce an efficient cross-validated mapping approach based on network flows modeled by sparse Markov chains. To illustrate our approach, we present a map of citation flows in science with research fields that overlap in multidisciplinary journals. Compared with currently used categories in science of science studies, the research fields form better units of analysis because the map more effectively captures how ideas flow through science.
⊖ Bibtex
@ARTICLE{west2016IEEE,
title = {A recommendation system based on hierarchical clustering of an article-level citation network},
author = {West, Jevin D and Wesley-Smith, Ian and Bergstrom, Carl T},
journal = {IEEE Transactions on Big Data},
volume = {2},
number = {2},
month = {June},
pages = {113-123},
doi = {10.1109/TBDATA.2016.2541167},
year = {2016}}
The Academic Advantage: Gender Disparities in Patenting (2015)
PLoS One.
10(5): e0128000, doi: 10.1371/journal.pone.0128000
⊖ Abstract
We analyzed gender disparities in patenting by country, technological area, and type of assignee using the 4.6 million utility patents issued between 1976 and 2013 by the United States Patent and Trade Office (USPTO). Our analyses of fractionalized inventorships demonstrate that women’s rate of patenting has increased from 2.7% of total patenting activity to 10.8% over the nearly 40-year period. Our results show that, in every technological area, female patenting is proportionally more likely to occur in academic institutions than in corporate or government environments. However, women’s patents have a lower technological impact than that of men, and that gap is wider in the case of academic patents. We also provide evidence that patents to which women—and in particular academic women—contributed are associated with a higher number of International Patent Classification (IPC) codes and co-inventors than men. The policy implications of these disparities and academic setting advantages are discussed.
⊖ Bibtex
@Article{Sugimoto2015PlosOne,
author = {Sugimoto, Cassidy R and Ni, Chaoqun and West, Jevin D and Lariviere, Vincent},
journal = {PLoS ONE},
publisher = {Public Library of Science},
title = {The Academic Advantage: Gender Disparities in Patenting},
year = {2015},
month = {05},
volume = {10},
url = {http://dx.doi.org/10.1371%2Fjournal.pone.0128000},
pages = {e0128000},
number = {5},
doi = {10.1371/journal.pone.0128000}}
Memory in network flows and its effects on spreading dynamics and community detection (2014)
Nature Communications.
5:4630, doi:10.1038/ncomms5630
⊖ Abstract
Random walks on networks is the standard tool for modelling spreading processes in social and biological systems. This first-order Markov approach is used in conventional community detection, ranking and spreading analysis, although it ignores a potentially important feature of the dynamics: where flow moves to may depend on where it comes from. Here we analyse pathways from different systems, and although we only observe marginal consequences for disease spreading, we show that ignoring the effects of second-order Markov dynamics has important consequences for community detection, ranking and information spreading. For example, capturing dynamics with a second-order Markov model allows us to reveal actual travel patterns in air traffic and to uncover multidisciplinary journals in scientific communication. These findings were achieved only by using more available data and making no additional assumptions, and therefore suggest that accounting for higher-order memory in network flows can help us better understand how real systems are organized and function.
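The contrast between first-order and second-order Markov dynamics described in this abstract can be illustrated with a small sketch. This is not the authors' code: the function name `transition_probs` and the toy round-trip pathways are hypothetical, chosen only to show how the next step can depend on where the flow came from.

```python
from collections import Counter, defaultdict


def transition_probs(paths, order):
    """Estimate next-step probabilities conditioned on the last `order` nodes."""
    counts = defaultdict(Counter)
    for path in paths:
        for k in range(order, len(path)):
            state = tuple(path[k - order:k])  # memory of the last `order` nodes
            counts[state][path[k]] += 1
    return {
        state: {nxt: c / sum(ctr.values()) for nxt, c in ctr.items()}
        for state, ctr in counts.items()
    }


# Hypothetical pathways: travellers leave A or B, pass through hub C, and return home.
paths = [
    ["A", "C", "A"],
    ["A", "C", "A"],
    ["B", "C", "B"],
    ["B", "C", "B"],
]

first_order = transition_probs(paths, 1)   # only the current node is remembered
second_order = transition_probs(paths, 2)  # the previous node is remembered too

print(first_order[("C",)])
print(second_order[("A", "C")])
print(second_order[("B", "C")])
```

In this toy data the first-order model sees flow leaving C split evenly between A and B, while the second-order model recovers the return trips, which is the kind of pattern the abstract says a memoryless model ignores.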
⊖ Bibtex
@ARTICLE{Rosvall2014NatureCom,
title={Memory in network flows and its effects on spreading dynamics and community detection},
author={Rosvall, Martin and Esquivel, Alcides V and Lancichinetti, Andrea and West, Jevin D and Lambiotte, Renaud},
journal={Nature Communications},
volume={5},
number={1},
year={2014},
publisher={Nature Publishing Group}}
Cost-effectiveness of open access publications (2014)
Economic Inquiry.
52: 1315-1321. doi: 10.1111/ecin.12117
⊖ Abstract
Open access publishing has been proposed as one possible solution to the serials crisis—the rapidly growing subscription prices in scholarly journal publishing. However, open access publishing can present economic pitfalls as well, such as excessive article processing charges. We discuss the decision that an author faces when choosing to submit to an open access journal. We develop an interactive tool to help authors compare among alternative open access venues and thereby get the most for their article processing charges.
⊖ Bibtex
@ARTICLE{West2014EconInquiry,
author = {West, Jevin D and Bergstrom, Theodore and Bergstrom, Carl T},
title = {Cost Effectiveness of Open Access Publications},
journal = {Economic Inquiry},
volume = {52},
number = {4},
publisher = {Wiley Periodicals, Inc.},
issn = {1465-7295},
url = {http://dx.doi.org/10.1111/ecin.12117},
doi = {10.1111/ecin.12117},
pages = {1315--1321},
year = {2014}}
Finding Cultural Holes: How Structure and Culture Diverge in Networks of Scholarly Communication (2014)
Sociological Science.
1: 221-238, doi: 10.15195/v1.a15
⊖ Abstract
Divergent interests, expertise, and language form cultural barriers to communication. No formalism has been available to characterize these “cultural holes.” Here we use information theory to measure cultural holes and demonstrate our formalism in the context of scientific communication using papers from JSTOR. We extract scientific fields from the structure of citation flows and infer field-specific cultures by cataloging phrase frequencies in full text and measuring the relative efficiency of between-field communication. We then combine citation and cultural information in a novel topographic map of science, mapping citations to geographic distance and cultural holes to topography. By analyzing the full citation network, we find that communicative efficiency decays with citation distance in a field-specific way. These decay rates reveal hidden patterns of cohesion and fragmentation. For example, the ecological sciences are balkanized by jargon, whereas the social sciences are relatively integrated. Our results highlight the importance of enriching structural analyses with cultural data.
⊖ Bibtex
@ARTICLE{Vilhena2014SocScience,
author = {Vilhena, Daril A and Foster, Jacob G and Rosvall, Martin and West, Jevin D and Evans, James and Bergstrom, Carl T},
title = {Finding Cultural Holes: How Structure and Culture Diverge in Networks of Scholarly Communication},
doi = {10.15195/v1.a15},
issn = {23306696},
journal = {Sociological Science},
pages = {221--238},
url = {http://www.sociologicalscience.com/articles-vol1-15-221/},
volume = {1},
year = {2014}}
The Role of Gender in Scholarly Authorship (2013)
PLoS One.
8(7): e66212, doi: 10.1371/journal.pone.0066212
⊖ Abstract
Gender disparities appear to be decreasing in academia according to a number of metrics, such as grant funding, hiring, acceptance at scholarly journals, and productivity, and it might be tempting to think that gender inequity will soon be a problem of the past. However, a large-scale analysis based on over eight million papers across the natural sciences, social sciences, and humanities reveals a number of understated and persistent ways in which gender inequities remain. For instance, even where raw publication counts seem to be equal between genders, close inspection reveals that, in certain fields, men predominate in the prestigious first and last author positions. Moreover, women are significantly underrepresented as authors of single-authored papers. Academics should be aware of the subtle ways that gender disparities can occur in scholarly authorship.
⊖ Bibtex
@ARTICLE{West2013PlosOne,
title={The role of gender in scholarly authorship},
author={West, Jevin D and Jacquet, Jennifer and King, Molly M and Correll, Shelley J and Bergstrom, Carl T},
journal={PloS One},
volume={8},
number={7},
pages={e66212},
year={2013},
doi = {10.1371/journal.pone.0066212},
publisher={Public Library of Science}}
Author-Level Eigenfactor Metrics: Evaluating the Influence of Authors, Institutions and Countries Within the SSRN Community (2013)
Journal of the American Society for Information Science & Technology.
64(4): 787-801, doi: 10.1002/asi.22790
The Eigenfactor™ Metrics: A network approach to assessing scholarly journals (2010)
College & Research Libraries.
71(3): 236-244, doi: 10.5860/0710236
⊖ Abstract
Limited time and budgets have created a legitimate need for quantitative measures of scholarly work. The well-known journal impact factor is the leading measure of this sort; here we describe an alternative approach based on the full structure of the scholarly citation network. The Eigenfactor Metrics (Eigenfactor Score and Article Influence Score) use an iterative ranking scheme similar to Google's PageRank algorithm. By this approach, citations from top journals are weighted more heavily than citations from lower-tier publications. Here we describe these metrics and the rankings that they provide.
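The idea of weighting citations by the standing of the citing journal can be sketched with a generic PageRank-style power iteration. This is a simplified illustration, not the Eigenfactor algorithm itself: the function `rank_journals`, the damping value of 0.85, and the three-journal citation matrix are all assumptions made for the example.

```python
def rank_journals(citations, damping=0.85, iterations=100):
    """PageRank-style scores; citations[i][j] = citations from journal i to journal j."""
    n = len(citations)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Every journal receives a small baseline share of the total score.
        new_scores = [(1.0 - damping) / n] * n
        for i, row in enumerate(citations):
            total = sum(row)
            for j in range(n):
                # A journal that cites nothing spreads its weight evenly.
                share = row[j] / total if total else 1.0 / n
                # Citations from high-scoring journals pass on more weight.
                new_scores[j] += damping * scores[i] * share
        scores = new_scores
    return scores


# Hypothetical citation counts among three journals (rows cite columns).
toy_citations = [
    [0, 3, 1],
    [2, 0, 1],
    [1, 1, 0],
]
scores = rank_journals(toy_citations)
print(scores)
```

The scores sum to one, so each can be read as the share of weight a journal holds in this toy network.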
⊖ Bibtex
@ARTICLE{West2010CRL,
Title = {The Eigenfactor Metrics: A Network Approach to Assessing Scholarly Journals},
Author = {West, Jevin D and Bergstrom, Theodore C and Bergstrom, Carl T},
Journal = {College and Research Libraries},
Number = {3},
Pages = {236-244},
Volume = {71},
Year = {2010}}
Big Macs and Eigenfactor Scores: Don't Let Correlation Coefficients Fool You (2010)
Journal of the American Society for Information Science & Technology.
61(9): 1800-1807, doi: 10.1002/asi.21374
⊖ Abstract
The Eigenfactor Metrics provide an alternative way of evaluating scholarly journals based on an iterative ranking procedure analogous to Google’s PageRank algorithm. These metrics have recently been adopted by Thomson Reuters and are listed alongside the Impact Factor in the Journal Citation Reports. But do these metrics differ sufficiently so as to be a useful addition to the bibliometric toolbox? Davis (2008) has argued that they do not, based on his finding of a 0.95 correlation coefficient between Eigenfactor score and Total Citations for a sample of journals in the field of medicine. This conclusion is mistaken; in this article, we illustrate the basic statistical fallacy to which Davis succumbed. We provide a complete analysis of the 2006 Journal Citation Reports and demonstrate that there are statistically and economically significant differences between the information provided by the Eigenfactor Metrics and that provided by Impact Factor and Total Citations.
⊖ Bibtex
@ARTICLE{West2010JASIST,
title={Big Macs and Eigenfactor scores: Don't let correlation coefficients fool you},
author={West, Jevin and Bergstrom, Theodore and Bergstrom, Carl T},
journal={Journal of the American Society for Information Science and Technology},
volume={61},
number={9},
pages={1800--1807},
year={2010},
publisher={Wiley Online Library}}
Differences in Impact Factor across fields and over time (2009)
Journal of the American Society for Information Science & Technology.
60(1): 27-34, doi: 10.1002/asi.20936
⊖ Abstract
The bibliometric measure impact factor is a leading indicator of journal influence, and impact factors are routinely used in making decisions ranging from selecting journal subscriptions to allocating research funding to deciding tenure cases. Yet journal impact factors have increased gradually over time, and moreover impact factors vary widely across academic disciplines. Here we quantify inflation over time and differences across fields in impact factor scores and determine the sources of these differences. We find that the average number of citations in reference lists has increased gradually, and this is the predominant factor responsible for the inflation of impact factor scores over time. Field-specific variation in the fraction of citations to literature indexed by Thomson Scientific’s Journal Citation Reports is the single greatest contributor to differences among the impact factors of journals in different fields. The growth rate of the scientific literature as a whole, and cross-field differences in net size and growth rate of individual fields, have had very little influence on impact factor inflation or on cross-field differences in impact factor.
⊖ Bibtex
@ARTICLE{Althouse2009JASIST,
title={Differences in impact factor across fields and over time},
author={Althouse, Benjamin M and West, Jevin D and Bergstrom, Carl T and Bergstrom, Theodore},
journal={Journal of the American Society for Information Science and Technology},
volume={60},
number={1},
pages={27--34},
year={2009},
publisher={Wiley Online Library}}
Coevolutionary cycling of host sociality and pathogen virulence in contact networks (2009)
Journal of Theoretical Biology.
261: 561-569, doi: 10.1016/j.jtbi.2009.08.022
⊖ Abstract
Infectious diseases may place strong selection on the social organization of animals. Conversely, the structure of social systems can influence the evolutionary trajectories of pathogens. While much attention has focused on the evolution of host sociality or pathogen virulence separately, few studies have looked at their coevolution. Here we use an agent-based simulation to explore host–pathogen coevolution in social contact networks. Our results indicate that under certain conditions, both host sociality and pathogen virulence exhibit continuous cycling. The way pathogens move through the network (e.g., their interhost transmission and probability of superinfection) and the structure of the network can influence the existence and form of cycling.
⊖ Bibtex
@ARTICLE{Prado2009JTheorBiol,
title={Coevolutionary cycling of host sociality and pathogen virulence in contact networks},
author={Prado, Federico and Sheih, Alyssa and West, Jevin D and Kerr, Benjamin},
journal={Journal of theoretical biology},
volume={261},
number={4},
pages={561--569},
year={2009},
publisher={Elsevier}}
Dynamics of stomatal patches for a single surface of Xanthium strumarium L. leaves observed with fluorescence and thermal images (2005)
Plant, Cell & Environment.
28: 633-641, doi: 10.1111/j.1365-3040.2005.01309.x
⊖ Abstract
Fluorescence and thermal imaging were used to examine the dynamics of stomatal patches for a single surface of Xanthium strumarium L. leaves following a decrease in ambient humidity. Patches were not observed in all experiments, and in many experiments the patches were short lived. In some experiments, however, patches persisted for many hours and showed complex temporal and spatial patterns. Rapidly sampled fluorescence images showed that the measurable variations of these patches were sufficiently slow to be captured by fluorescence images taken at 3-min intervals using a saturating flash of light. Stomatal patchiness with saturating flashes of light was not demonstrably different from that without saturating flashes of light, suggesting that the regular flashes of light did not directly cause the phenomenon. Comparison of simultaneous fluorescence and thermal images showed that the fluorescence patterns were largely the result of stomatal conductance patterns, and both thermal and fluorescence images showed patches of stomatal conductance that propagated coherently across the leaf surface. These nondispersing patches often crossed a given region of the leaf repeatedly at regular intervals, resulting in oscillations in stomatal conductance for that region. The existence of these coherently propagating structures has implications for the mechanisms that cause patchy stomatal behaviour as well as for the physiological ramifications of this phenomenon.
⊖ Bibtex
@ARTICLE{West2005PCE,
title={Dynamics of stomatal patches for a single surface of Xanthium strumarium L. leaves observed with fluorescence and thermal images},
author={West, Jevin D and Peak, David and Peterson, James Q and Mott, Keith A},
journal={Plant, Cell \& Environment},
volume={28},
number={5},
pages={633--641},
year={2005},
publisher={Wiley Online Library}}
Evidence for complex, collective dynamics and emergent, distributed computation in plants (2004)
Proceedings of the National Academy of Sciences USA.
101: 918-922, doi: 10.1073/pnas.0307811100
⊖ Abstract
It has been suggested that some biological processes are equivalent to computation, but quantitative evidence for that view is weak. Plants must solve the problem of adjusting stomatal apertures to allow sufficient CO2 uptake for photosynthesis while preventing excessive water loss. Under some conditions, stomatal apertures become synchronized into patches that exhibit richly complicated dynamics, similar to behaviors found in cellular automata that perform computational tasks. Using sequences of chlorophyll fluorescence images from leaves of Xanthium strumarium L. (cocklebur), we quantified spatial and temporal correlations in stomatal dynamics. Our values are statistically indistinguishable from those of the same correlations found in the dynamics of automata that compute. These results are consistent with the proposition that a plant solves its optimal gas exchange problem through an emergent, distributed computation performed by its leaves.
⊖ Bibtex
@ARTICLE{Peak2004PNAS,
title={Evidence for complex, collective dynamics and emergent, distributed computation in plants},
author={Peak, David and West, Jevin D and Messinger, Susanna M and Mott, Keith A},
journal={Proceedings of the National Academy of Sciences of the United States of America},
volume={101},
number={4},
pages={918--922},
year={2004}}
Peer-Reviewed Conferences
Mobilizing Manufactured Reality: How Participatory Disinformation Shaped Deep Stories to Catalyze Action during the 2020 U.S. Presidential Election (2023)
ACM Conference On Computer-Supported Cooperative Work And Social Computing (CSCW)
7(CSCW1):1-39
⊖ Abstract
Claims of election fraud throughout the 2020 U.S. Presidential Election and during the lead up to the January 6, 2021 insurrection attempt have drawn attention to the urgent need to better understand how people interpret and act on disinformation. In this work, we present three primary contributions: (1) a framework for understanding the interaction between participatory disinformation and informal and tactical mobilization; (2) three case studies from the 2020 U.S. election analyzed using detailed temporal, content, and thematic analysis; and (3) a qualitative coding scheme for understanding how digital disinformation functions to mobilize online audiences. We combine resource mobilization theory with previous work examining participatory disinformation campaigns and "deep stories" to show how false or misleading information functioned to mobilize online audiences before, during, and after election day. Our analysis highlights how users on Twitter collaboratively construct and amplify alleged evidence of fraud that is used to facilitate action, both online and off. We find that mobilization is dependent on the selective amplification of false or misleading tweets by influencers, the framing around those claims, as well as the perceived credibility of their source. These processes are a self-reinforcing cycle where audiences collaborate in the construction of a misleading version of reality, which in turn leads to offline actions that are used to further reinforce a manufactured reality. Through this work, we hope to better inform future interventions.
⊖ Bibtex
@ARTICLE{Prochaska2023CSCW,
author={Prochaska, Stephen and Duskin, Kayla and Kharazian, Zarine and Minow, Carly and Blucker, Stephanie and Venuto, Sylvie and West, Jevin D and Starbird, Kate},
title={Mobilizing Manufactured Reality: How Participatory Disinformation Shaped Deep Stories to Catalyze Action during the 2020 U.S. Presidential Election},
journal={ACM Conference On Computer-Supported Cooperative Work And Social Computing (CSCW)},
volume={7},
number={CSCW1},
pages={1--39},
publisher={ACM New York, NY, USA},
year = {2023}}
Bursting Scientific Filter Bubbles: Boosting Innovation via Novel Author Discovery (2022)
Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems
April 29
⊖ Abstract
Isolated silos of scientific research and the growing challenge of information overload limit awareness across the literature and hinder innovation. Algorithmic curation and recommendation, which often prioritize relevance, can further reinforce these informational “filter bubbles.” In response, we describe Bridger, a system for facilitating discovery of scholars and their work. We construct a faceted representation of authors with information gleaned from their papers and inferred author personas, and use it to develop an approach that locates commonalities and contrasts between scientists to balance relevance and novelty. In studies with computer science researchers, this approach helps users discover authors considered useful for generating novel research directions. We also demonstrate an approach for displaying information about authors, boosting the ability to understand the work of new, unfamiliar scholars. Our analysis reveals that Bridger connects authors who have different citation profiles and publish in different venues, raising the prospect of bridging diverse scientific communities.
⊖ Bibtex
@ARTICLE{Portenoy2022CHI,
author = {Jason Portenoy and Marissa Radensky and Jevin D. West and Eric Horvitz and Daniel S. Weld and Tom Hope},
title = {Bursting Scientific Filter Bubbles: Boosting Innovation via Novel Author Discovery},
journal = {Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems},
volume={April 29},
year={2022},
doi = {10.1145/3491102.3501905}}
SciSight: Combining faceted navigation and research group detection for COVID-19 exploratory scientific search (2020)
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) (Systems Demonstrations).
⊖ Abstract
The COVID-19 pandemic has sparked unprecedented mobilization of scientists, generating a deluge of papers that makes it hard for researchers to keep track and explore new directions. Search engines are designed for targeted queries, not for discovery of connections across a corpus. In this paper, we present SciSight, a system for exploratory search of COVID-19 research integrating two key capabilities: first, exploring associations between biomedical facets automatically extracted from papers (e.g., genes, drugs, diseases, patient outcomes); second, combining textual and network information to search and visualize groups of researchers and their ties. SciSight has so far served over 15K users with over 42K page views and 13% returns.
⊖ Bibtex
@article{hope2020EMNLP,
title={SciSight: Combining faceted navigation and research group detection for COVID-19 exploratory scientific search},
author={Hope, Tom and Portenoy, Jason and Vasan, Kishore and Borchardt, Jonathan and Horvitz, Eric and Weld, Daniel and Hearst, Marti and West, Jevin D},
journal={Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) (Systems Demonstrations)},
url={https://www.aclweb.org/anthology/2020.emnlp-demos.18},
publisher={Association for Computational Linguistics},
doi={10.18653/v1/2020.emnlp-demos.18},
year={2020}}
Increasing Enrollment by Optimizing Scholarship Allocations Using Machine Learning and Genetic Algorithms (2020)
International Conference on Educational Data Mining (EDM).
⊖ Abstract
Effectively estimating student enrollment and recruiting students is critical to the success of any university. However, despite having an abundance of data and researchers at the forefront of data science, traditional universities are not fully leveraging machine learning and data mining approaches to improve their enrollment management strategies. In this project, we use data at a large, public university to increase their student enrollment. We do this by first predicting the enrollment of admitted first-year, first-time students using a suite of machine learning classifiers (AUROC = 0.85). We then use the results from these machine learning experiments in conjunction with genetic algorithms to optimize scholarship disbursement. We show the effectiveness of this approach using real-world enrollment metrics. Our optimized model was expected to increase enrollment yield by 15.8% over previous disbursement strategies. After deploying the model and confirming student enrollment decisions, the university actually saw a 23.3% increase in enrollment yield. This resulted in millions of dollars in additional annual tuition revenue and a commitment by the university to employ the method in subsequent enrollment cycles. We see this as a successful case study of how educational institutions can more effectively leverage their data.
⊖ Bibtex
@inproceedings{Aulck2020EDM,
title={Increasing Enrollment by Optimizing Scholarship Allocations Using Machine Learning and Genetic Algorithms},
author={Aulck, Lavi and Nambi, Dev and West, Jevin D},
booktitle={International Conference on Educational Data Mining (EDM)},
pages={29-38},
URL={https://educationaldatamining.org/files/conferences/EDM2020/papers/paper_44.pdf},
year={2020}}
Modeling and analysis of migration and mobility among scholars using bibliometric data (2020)
Social Informatics.
⊖ Abstract
Bibliometric data give us unprecedented opportunities for understanding patterns of mobility among scholars, testing existing migration theories and developing new ones. In a series of studies, we leverage large-scale bibliometric data to measure and model the migration of scholars in different contexts and answer fundamental questions in the intersection of high-skilled migration and science of science. Our research project series explores three research questions within the scope of migration in academia and analysis of large-scale bibliometric data.
⊖ Bibtex
@article{Aref2020socialinformatics,
author = {Aref, Samin and Miranda-Gonzalez, Andrea and Subbotin, Alexander and Theile, Tom and Zagheni, Emilio and West, Jevin D},
title = {Modeling and analysis of migration and mobility among scholars using bibliometric data},
journal = {Social Informatics: 2nd Workshop on Reframing Research},
url={https://refresh20.infrascience.isti.cnr.it/files/RefResh_2020_paper_3.pdf},
year = {2020}}
GraviTIE: Exploratory Analysis of Large-Scale Heterogeneous Image Collections (2019)
The World Wide Web Conference (WWW).
pgs, 3605–3609
We present GraviTIE (Global Representation and Visualization of Text and Image Embeddings, pronounced “gravity”), an interactive visualization system for large-scale image datasets. GraviTIE operates on datasets consisting of images equipped with unstructured and semi-structured text, relying on multi-modal unsupervised learning methods to produce an interactive similarity map. Users interact with the similarity map through pan and zoom operations, as well as keyword-oriented queries. GraviTIE makes no assumptions about the form, scale, or content of the data, allowing it to be used for exploratory analysis, assessment of unsupervised learning methods, data curation and quality control, data profiling, and other purposes where flexibility and scalability are paramount. We demonstrate GraviTIE on three real datasets: 500k images from the Russian misinformation dataset from Twitter, 2 million art images, and 5 million scientific figures. A screencast video is available at https://vimeo.com/310511187.
⊖ Bibtex
@inproceedings{Yang2019www,
author = {Yang, Sean and Rodriguez, Luke and West, Jevin D. and Howe, Bill},
title = {GraviTIE: Exploratory Analysis of Large-Scale Heterogeneous Image Collections},
booktitle={The World Wide Web Conference (WWW)},
year = {2019},
doi = {10.1145/3308558.3314142},
pages={3605--3609}}
The Demography of the Peripatetic Researcher: Evidence on Highly Mobile Scholars from the Web of Science (2019)
Social Informatics.
The policy debate around researchers' geographic mobility has been moving away from a theorized zero-sum game in which countries can be winners (“brain gain”) or losers (“brain drain”), and toward the concept of “brain circulation,” which implies that researchers move in and out of countries and everyone benefits. Quantifying trends in researchers' movements is key to understanding the drivers of the mobility of talent, as well as the implications of these patterns for the global system of science, and for the competitive advantages of individual countries. Existing studies have investigated bilateral flows of researchers. However, in order to understand migration systems, determining the extent to which researchers have worked in more than two countries is essential. This study focuses on the subgroup of highly mobile researchers whom we refer to as “peripatetic researchers” or “super-movers.”
⊖ Bibtex
@inproceedings{Aref2019socialinformatics,
author = {Aref, Samin and Zagheni, Emilio and West, Jevin D.},
title = {The Demography of the Peripatetic Researcher: Evidence on Highly Mobile Scholars from the Web of Science},
booktitle={Social Informatics},
year = {2019},
publisher = {Springer International Publishing},
isbn = {978-3-030-34971-4},
pages={50--65}}
Using Machine Learning and Genetic Algorithms to Optimize Scholarship Allocation for Student Yield (2019)
Special Interest Group on Knowledge Discovery and Data Mining’s (KDD) Workshop on Machine Learning in Education.
Effectively estimating student enrollment and recruiting students is critical to the success of any university. However, despite having an abundance of data and researchers at the forefront of data science, universities are not fully leveraging machine learning and data mining approaches to improve their enrollment management strategies. In this project, we use data at a large, public university to increase their student enrollment. We do this by first predicting the enrollment of admitted first-year, first-time students using a suite of machine learning classifiers (AUROC = 0.85). We then use the results from these machine learning experiments in conjunction with genetic algorithms to optimize scholarship disbursement. We show the effectiveness of this approach using actual enrollment metrics. Our optimized model was expected to increase enrollment yield by 15.8% over previous disbursement strategies. After deploying the model and confirming student enrollment decisions, the university actually saw a 23.3% increase in enrollment yield. This resulted in millions of dollars in additional annual tuition revenue and a commitment by the university to employ the method in subsequent enrollment cycles. We see this as a successful case study of how educational institutions can more effectively leverage their data.
⊖ Bibtex
@ARTICLE{Aulck2019kdd,
author = {Aulck, Lavi and Nambi, Dev and West, Jevin D},
title = {Using Machine Learning and Genetic Algorithms to Optimize Scholarship Allocation for Student Yield},
journal = {Special Interest Group on Knowledge Discovery and Data Mining’s (KDD) Workshop on Machine Learning in Education},
year = {2019}}
Mining University Registrar Records to Predict First-Year Undergraduate Attrition (2019)
International Conference on Educational Data Mining (EDM).
Each year, roughly 30% of first-year students at US baccalaureate institutions do not return for their second year and billions of dollars are spent educating these students. Yet, little quantitative research has analyzed the causes and possible remedies for student attrition. What’s more, most of the previous attempts to model attrition at traditional campuses using machine learning have focused on small, homogeneous groups of students. In this work, we model student attrition using a dataset that is composed almost exclusively of information routinely collected for record-keeping at a large, public US university. By examining the entirety of the university’s student body and not a subset thereof, we use one of the largest known datasets for examining attrition at a public US university (N = 66,060). Our results show that students’ second year re-enrollment and eventual graduation can be accurately predicted based on a single year of data (AUROCs = 0.887 and 0.811, respectively). We find that demographic data (such as race, gender, etc.) and pre-admission data (such as high school academics, entrance exam scores, etc.) - upon which most admissions processes are predicated - are not nearly as useful as early college performance/transcript data for these predictions. These results highlight the potential for data mining to impact student retention and success at traditional campuses.
⊖ Bibtex
@ARTICLE{Aulck2019edm,
author = {Aulck, Lovenoor and Nambi, Dev and Velagapudi, Nishant and Blumenstock, Joshua and West, Jevin D.},
title = {Mining University Registrar Records to Predict First-Year Undergraduate Attrition},
journal = {International Conference on Educational Data Mining (EDM)},
year = {2019}}
Using institutional records and student survey responses to examine freshmen interest groups (FIGs) (2019)
Society for Research in Educational Effectiveness’ (SREE’s) Symposium on New Types of Data and Their Applications in Educational Research.
Freshman orientation seminars (freshman seminars) are courses dedicated to helping incoming students transition to college life both socially and academically. The popularity and ubiquity of these courses have made them among the most studied course genres in American higher education. That said, the existence and effectiveness of these seminars on college campuses across the U.S. continue to be called into question. Though some prior studies have used randomized controlled trials, large-scale and causally rigorous studies of seminar effectiveness using matched comparison groups are rare.
⊖ Bibtex
@article{Aulck2019sree,
author = {Aulck, Lovenoor and Malters, Joshua and Lee, Casey and Mancinelli, Gianni and Sun, Min and West, Jevin D.},
title = {Using institutional records and student survey responses to examine freshmen interest groups (FIGs)},
journal = {Society for Research in Educational Effectiveness’ (SREE) Symposium on New Types of Data and Their Applications in Educational Research},
year = {2019}}
Identifying the Central Figure of a Scientific Paper (2019)
International Conference on Document Analysis and Recognition (ICDAR).
pgs, 1063-1070
Publishers are increasingly using graphical abstracts to facilitate scientific search, especially across disciplinary boundaries. They are presented on various media, easily shared and information rich. However, only a small fraction of scientific publications are equipped with graphical abstracts. What can we do with the vast majority of papers with no selected graphical abstract? In this paper, we first hypothesize that scientific papers actually include a “central figure” that serves as a graphical abstract. These figures convey the key results and provide a visual identity for the paper. Using survey data collected from 6,263 authors regarding 8,353 papers over 15 years, we find that over 87% of papers are considered to contain a central figure, and that these central figures are primarily used to summarize important results, explain the key methods, or provide additional discussion. We then train a model to automatically recognize the central figure, achieving top-3 accuracy of 78% and exact match accuracy of 34%. We find that the primary boost in accuracy comes from figure captions that resemble the abstract. We make all our data and results publicly available at https://github.com/viziometrics/centraul figure. Our goal is to automate central figure identification to improve search engine performance and to help scientists connect ideas across the literature.
⊖ Bibtex
@ARTICLE{Yang2019ICDAR,
author = {Yang, Sean T. and Lee, Po-Shen and Kazakova, Lia and Joshi, Abhishek and Oh, Bum M. and West, Jevin D. and Howe, Bill},
title = {Identifying the Central Figure of a Scientific Paper},
journal = {International Conference on Document Analysis and Recognition (ICDAR)},
pages = {1063-1070},
doi = {10.1109/ICDAR.2019.00173},
year = {2019}}
Is together better? Examining scientific collaborations across multiple authors, institutions, and departments (2018)
Special Interest Group on Knowledge Discovery and Data Mining’s (KDD) Workshop on Scholarly Big Data (BigScholar)
Collaborations are an integral part of scientific research and publishing. In the past, access to large-scale corpora has limited the ways in which questions about collaborations could be investigated. However, with improvements in data/metadata quality and access, it is possible to explore the idea of research collaboration in ways beyond the traditional definition of multiple authorship. In this paper, we examine scientific works through three different lenses of collaboration: across multiple authors, multiple institutions, and multiple departments. We believe this to be a first look at multiple departmental collaborations as we employ extensive data curation to disambiguate authors' departmental affiliations for nearly 70,000 scientific papers. We then compare citation metrics across the different definitions of collaboration and find that papers defined as being collaborative were more frequently cited than their non-collaborative counterparts, regardless of the definition of collaboration used. We also share preliminary results from examining the relationship between co-citation and co-authorship by analyzing the extent to which similar fields (as determined by co-citation) are collaborating on works (as determined by co-authorship). These preliminary results reveal trends of compartmentalization with respect to intra-institutional collaboration and show promise in being expanded.
⊖ Bibtex
@ARTICLE{Aulck2018kdd,
author = {Aulck, Lovenoor and Vasan, Kishore and West, Jevin D.},
title = {Is together better? Examining scientific collaborations across multiple authors, institutions, and departments},
journal = {Special Interest Group on Knowledge Discovery and Data Mining’s (KDD) Workshop on Scholarly Big Data (BigScholar)},
year = {2018}}
Classifying digitized art type and time period (2018)
SIGKDD Conference On Knowledge Discovery and Data (KDD) Mining Workshop on Data Science for Digital Art History.
Millions of art images have been digitized over the last several decades. This has created new opportunities for art scholars and historians. However, searching and navigating these art images is difficult because of the sparsity of metadata and contextual information used to describe these images. Unless one knows the exact title and artist, finding related artwork is a difficult task. The research in this project addresses this challenge by developing unsupervised computer vision methods that generate metadata automatically from artworks. Our dataset includes more than 300,000 art images from three sources: the Metropolitan Museum of Art, WikiArt and Artsy, an online art collection platform. If successful, we plan to build an interactive interface for exploring the extracted features and developing a recommendation system that can be used by art historians, scholars, and art aficionados.
⊖ Bibtex
@ARTICLE{Yang2018kddart,
author = {Yang, Sean and Oh, Bum Mook and Merchant, Daniel and Howe, Bill and West, Jevin D.},
title = {Classifying digitized art type and time period},
journal = {SIGKDD Conference On Knowledge Discovery and Data (KDD) Mining Workshop on Data Science for Digital Art History},
year = {2018}}
Chromatic Structure and Family Resemblance in Large Art Collections — Exemplary Quantification and Visualizations (2018)
SIGKDD Conference On Knowledge Discovery and Data (KDD) Mining Workshop on Data Science for Digital Art History.
Computational pattern recognition has made ground-breaking progress in recent years by combining advanced methods of machine learning with ever increasing amounts of visual data. Algorithms that learn to learn, combined with massive parallel computation in so-called GPU clusters, and billions of images a day acquired via sensors, or uploaded by Web users, have led to a situation where computers are able to recognize faces, spot cats in any body-configuration, and even drive cars without human interaction. In Art History such advanced methods of pattern recognition increasingly aim to compete with human connoisseurship. Relevant studies, for example, successfully identify duplicate photos in image archives (Resig, 2013), quickly find artworks given a certain object (Crowley and Zisserman, 2014), quantify the innovativeness of paintings (Elgammal and Saleh, 2015), convincingly discern and date architectural styles at a mega-city scale (Lee et al., 2015), and track the evolution of color contrast in Western Art from chiaroscuro to landscape painting (Kim et al., 2014 and Lee et al., 2017). What is missing is a rigorous reconciliation between state-of-the-art computer science techniques and established art historical standards based on trained observation and hermeneutic interpretation. Such a reconciliation is hard due to both the so-called “curse of dimensionality” in machine learning, and the cognitive limit of individual researchers confronted with potentially millions of images.
⊖ Bibtex
@ARTICLE{Tran2018kddart,
author = {Tran, Loan and Lee, Poshen and West, Jevin D. and Schich, Maximilian},
title = {Chromatic Structure and Family Resemblance in Large Art Collections — Exemplary Quantification and Visualizations},
journal = {SIGKDD Conference On Knowledge Discovery and Data (KDD) Mining Workshop on Data Science for Digital Art History},
year = {2018}}
A worldwide patent citation network (2017)
IP Statistics for Decision Makers (IPSDM) conference.
Patent rights are granted by almost all governments around the world. A primary motivation for these grants is to foster technological innovation. By offering inventors the prospect of limited rights to exclude others from exploiting their inventions, patents are often assumed to spur inventive activity, followed by commercialization of new inventions. This view of patents as important policy levers for spurring innovation has existed for a long time. In fact, one of the first successful global treaties was the Paris Convention for the Protection of Industrial Property ("Paris Convention"), which offers mutual recognition of patents, designs, and trademarks and came into force on July 7, 1884. One of the foundational principles of the Paris Convention is the international interconnectedness of patent documents...
⊖ Bibtex
@article{torrance2017virginia,
author = {Torrance, Andrew W and West, Jevin D},
title = {A worldwide patent citation network},
journal = {IP Statistics for Decision Makers (IPSDM) conference},
year = {2017}}
PhyloParser: A Hybrid Algorithm for Extracting Phylogenies from Dendrogram (2017)
14th IAPR International Conference on Document Analysis and Recognition (ICDAR).
1087-1094, doi: 10.1109/ICDAR.2017.180
We consider a new approach to extracting information from dendrograms in the biological literature representing phylogenetic trees. Existing algorithmic approaches to extract these relationships rely on tracing tree contours and are very sensitive to image quality issues, but manual approaches require significant human effort and cannot be used at scale. We introduce PhyloParser, a fully automated, end-to-end system for automatically extracting species relationships from phylogenetic tree diagrams using a multi-modal approach to digest diverse tree styles. Our approach automatically identifies phylogenetic tree figures in the scientific literature, extracts the key components of tree structure, reconstructs the tree, and recovers the species relationships. We use multiple methods to extract tree components with high recall, then filter false positives by applying topological heuristics about how these components fit together. We present an evaluation on a real-world dataset to quantitatively and qualitatively demonstrate the efficacy of our approach. Our classifier achieves 89% recall and 99% precision, with a low average error rate relative to previous approaches. We aim to use PhyloParser to build a linked, open, comprehensive database of phylogenetic information that covers the historical literature as well as current data, and then use this resource to identify areas of disagreement and poor coverage in the biological literature.
⊖ Bibtex
@inproceedings{Lee2017icdar,
author = {Lee, Poshen and Yang, Sean and West, Jevin D. and Howe, Bill},
title = {PhyloParser: A Hybrid Algorithm for Extracting Phylogenies from Dendrogram},
booktitle={14th IAPR International Conference on Document Analysis and Recognition (ICDAR)},
year = {2017},
doi = {10.1109/ICDAR.2017.180},
pages = {1087-1094}}
STEM-ming the Tide: Predicting STEM attrition using student transcript data (2017)
Special Interest Group on Knowledge Discovery and Data Mining’s (KDD) Workshop on Machine Learning in Education
Science, technology, engineering, and math (STEM) fields play growing roles in national and international economies by driving innovation and generating high salary jobs. Yet, the US is lagging behind other highly industrialized nations in terms of STEM education and training. Furthermore, many economic forecasts predict a rising shortage of domestic STEM-trained professionals in the US for years to come. One potential solution to this deficit is to decrease the rates at which students leave STEM-related fields in higher education, as currently over half of all students intending to graduate with a STEM degree eventually attrite. However, little quantitative research at scale has looked at causes of STEM attrition, let alone the use of machine learning to examine how well this phenomenon can be predicted. In this paper, we detail our efforts to model and predict dropout from STEM fields using one of the largest known datasets used for research on students at a traditional campus setting. Our results suggest that attrition from STEM fields can be accurately predicted with data that is routinely collected at universities using only information on students' first academic year. We also propose a method to model student STEM intentions for each academic term to better understand the timing of STEM attrition events. We believe these results show great promise in using machine learning to improve STEM retention in traditional and non-traditional campus settings.
⊖ Bibtex
@inproceedings{Aulck2017KDD,
author = {Aulck, Lovenoor and Aras, Rohan and Li, Lysia Li and L’Heureux, Coulter and Lu, Peter and West, Jevin D.},
title = {STEM-ming the Tide: Predicting STEM attrition using student transcript data},
booktitle={Special Interest Group on Knowledge Discovery and Data Mining’s (KDD) Workshop on Machine Learning in Education},
url = {https://arxiv.org/pdf/1708.09344.pdf},
year = {2017}}
Deep Mapping of the Visual Literature (2017)
Proceedings of the 26th International Conference on World Wide Web Companion (WWW): Workshop on Big Scholarly Data.
1273-1277, doi: 10.1145/3041021.3053065
We consider how patterns of figure use in the scientific literature relate to impact, change over time, and vary across disciplines. We use a convolutional neural network to embed figures as feature vectors in a high-dimensional space, then visualize this space as a 2D heatmap to expose patterns. We consider how these patterns vary with respect to time, impact, and discipline, concluding that high-impact papers tend to include significantly more data-carrying figures (i.e., visualizations), despite a downward trend in such figures overall. We also show how this approach can be used to bootstrap targeted information extraction projects for specific figure types, describing one such project involving phylogenetic trees.
⊖ Bibtex
@inproceedings{Howe2017www,
author = {Howe, Bill and Lee, Po-shen and Grechkin, Maxim and Yang, Sean T. and West, Jevin D.},
title = {Deep Mapping of the Visual Literature},
booktitle={Proceedings of the 26th International Conference on World Wide Web Companion (WWW)},
isbn = {978-1-4503-4914-7},
location = {Perth, Australia},
pages = {1273--1277},
doi = {10.1145/3041021.3053065},
url = {https://doi.org/10.1145/3041021.3053065},
year = {2017},
keywords = {deep learning, machine vision, scientometrics, viziometrics},
acmid = {3053065}}
Echo Chambers in Science? (2017)
American Sociological Association (ASA) Annual Meeting, August 2017
This paper examines whether digitization and the rise of integrated academic search engines have transformed how researchers engage with previous literature, a critical component of modern scientific practice. Among technological advancements, we particularly focus on the recent emergence of academic search engines such as Google Scholar, because these services are provided based on proprietary algorithms that actively interfere in authors' search process. While the impact of general recommender systems has been widely noted, the effect of academic recommender systems on scientific practice has not been fully examined. Using the comprehensive Web of Science database covering a wide range of publications and the citation links between them, we focus on yearly changes in the citing behavior of researchers in two relatively similar disciplines, Sociology and Social Work, between 1998 and 2014. We document three temporal changes in researchers' behavior. First, researchers' citations in both disciplines have become more expansive since 2005 and stable after 2010. Second, controlling for a measure of journal prestige, the impact of a paper-based popularity measure, the cumulative previous citation count, has increased in both Sociology and Social Work. Third, more papers published in lower-tier journals are now cited than prior to 2005, and the variability of citation counts among papers published in the same journal has also increased. Based on these three findings, we see some evidence that the digitization of science has democratized the exposure of prior research and weakened journals' role as gatekeepers. Nevertheless, the increasing importance of prior citations suggests a competing trend is also occurring that may create an echo chamber centered on small numbers of highly cited papers.
⊖ Bibtex
@inproceedings{Kim2016asa,
author = {Kim, Lanu and West, Jevin D and Stovel, Katherine},
title = {Echo Chambers in Science?},
booktitle={American Sociological Association},
year = {2017}}
Improved adaptation in exogenously and endogenously changing environments (2017)
Proceedings of the 14th European Conference on Artificial Life (ECAL).
14:306-313, doi: 10.7551/ecal_a_052
Fitness landscapes are visual metaphors that appeal to our intuition for real-world landscapes to help us understand how populations evolve. The object inspiring the metaphor is better described as a network composed of all possible genotypes, but it is frequently simplified to a surface where the fitness of each genotype is represented by elevation. Selection drives evolving populations to ascend the landscape until they are dominated by genotypes from which no further beneficial mutations are likely, known as a peak. However, by allowing for environmental change, former peaks can vanish, forcing populations to resume adapting. To explore how changing environments affect adaptation, we used the digital evolution platform, Avida, wherein we could manipulate the organisms’ environment as they are subject to natural evolutionary forces. We found that transient exposure to alternate environments frequently resulted in more fit genotypes. Negative-frequency-dependent environments, in particular, yielded strong fitness benefits after returning to the original environment. Furthermore, we explored how such environmental change could yield adaptive benefits via valley crossing and how such knowledge could be exploited in systems where improving the rate of adaptation is beneficial.
⊖ Bibtex
@inproceedings{Nahum2017ECAL,
author = {Nahum, Joshua R and West, Jevin D and Althouse, Benjamin M and Zaman, Luis and Ofria, Charles and Kerr, Benjamin},
title = {Improved adaptation in exogenously and endogenously changing environments},
booktitle={Proceedings of the 14th European Conference on Artificial Life (ECAL)},
volume = {14},
doi = {10.7551/ecal_a_052},
pages = {306-313},
year = {2017}}
Visualizing Scholarly Publications and Citations to Enhance Author Profiles (2017)
Proceedings of the 26th International Conference on World Wide Web Companion (WWW).
Workshop on Big Scholarly Data. 1279-1282, doi: 10.1145/3041021.3053058
With data on scholarly publications becoming more abundant and accessible, there exist new opportunities for using this information to provide rich author profiles to display and explore scholarly work. We present a pair of linked visualizations connected to the Microsoft Academic Graph that can be used to explore the publications and citations of individual authors. We provide an online application with which a user can manage collections of papers and generate these visualizations.
⊖ Bibtex
@inproceedings{Portenoy2017www,
author = {Portenoy, Jason and West, Jevin D},
title = {Visualizing Scholarly Publications and Citations to Enhance Author Profiles},
booktitle={Proceedings of the 26th International Conference on World Wide Web Companion (WWW): Workshop on Big Scholarly Data},
year = {2017}}
Predicting Student Dropout in Higher Education (2016)
International Conference on Machine Learning (ICML).
Workshop on Data4Good: Machine Learning in Social Good Applications. arxiv: 1606.06364
Each year, roughly 30% of first-year students at US baccalaureate institutions do not return for their second year and over $9 billion is spent educating these students. Yet, little quantitative research has analyzed the causes and possible remedies for student attrition. Here, we describe initial efforts to model student dropout using the largest known dataset on higher education attrition, which tracks over 32,500 students' demographics and transcript records at one of the nation's largest public universities. Our results highlight several early indicators of student attrition and show that dropout can be accurately predicted even when predictions are based on a single term of academic transcript data. These results highlight the potential for machine learning to have an impact on student retention and success while pointing to several promising directions for future work.
⊖ Bibtex
@inproceedings{Aulck2016ICML,
author = {L. Aulck and N. Velagapudi and J. Blumenstock and J.D. West},
title = {Predicting Student Dropout in Higher Education},
booktitle={International Conference on Machine Learning (ICML). Workshop on Data4Good: Machine Learning in Social Good Applications},
url = {https://arxiv.org/abs/1606.06364},
year = {2016}}
Delineating Fields Using Mathematical Jargon (2016)
Joint Conference on Digital Libraries (JCDL).
Workshop on Bibliometric-enhanced Information Retrieval & Natural Language Processing. pgs 63-71
Tracing ideas through the scientific literature is useful in understanding the origin of ideas and for generating new ones. Machines can be trained to do this at large scale, feeding search engines and recommendation algorithms. Citations and text are the features commonly used for these tasks. In this paper, we focus on a largely ignored facet of scholarly papers: the equations. Mathematical language varies from field to field but original formulae are maintained over generations (e.g., Shannon’s Entropy equation). Here we extract a common set of mathematical symbols from more than 250,000 LaTeX source files in the arXiv repository. We compare the symbol distributions across different fields and calculate the jargon distance between fields. We find a greater difference between the experimental and theoretical disciplines than within these fields. This provides a first step at using equations as a bridge between disciplines that may not cite each other or may speak different natural languages but use a similar mathematical language.
⊖ Bibtex
@inproceedings {West2016JCDL,
author = {West, Jevin D and Portenoy, Jason},
title = {Delineating Fields Using Mathematical Jargon},
booktitle={JCDL Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL)},
pages = {63-71},
year = {2016}}
Viziometrix: A platform for analyzing the visual information in big scholarly data (2016)
Proceedings of the 25th International Conference on World Wide Web (WWW).
Workshop on Big Scholarly Data. pgs. 413-418, doi: 10.1145/2872518.2890523
We present VizioMetrix, a platform that extracts visual information from the scientific literature and makes it available for use in new information retrieval applications and for studies that look at patterns of visual information across millions of papers. New ideas are conveyed visually in the scientific literature through figures --- diagrams, photos, visualizations, tables --- but these visual elements remain ensconced in the surrounding paper and difficult to use directly to facilitate information discovery tasks or longitudinal analytics. Very few applications in information retrieval, academic search, or bibliometrics make direct use of the figures, and none attempt to recognize and exploit the type of figure, which can be used to augment interactions with a large corpus of scholarly literature. The VizioMetrix platform processes a corpus of documents, classifies the figures, organizes the results into a cloud-hosted database, and drives three distinct applications to support bibliometric analysis and information retrieval. The first application supports information retrieval tasks by allowing rapid browsing of classified figures. The second application supports longitudinal analysis of visual patterns in the literature and facilitates data mining of these figures. The third application supports crowdsourced tagging of figures to improve classification, augment search, and facilitate new kinds of analyses. Our initial corpus is the entirety of PubMed Central (PMC), and will be released to the public alongside this paper; we welcome other researchers to make use of these resources.
⊖ Bibtex
@inproceedings{Lee2016www,
author = {Lee, Po-shen and West, Jevin D. and Howe, Bill},
title = {Viziometrix: A platform for analyzing the visual information in big scholarly data},
booktitle = {Proceedings of the 25th international conference companion on world wide web (WWW)},
pages = {413--418},
doi = {10.1145/2872518.2890523},
year = {2016}}
Static Ranking of Scholarly Papers using Article-Level Eigenfactor (ALEF) (2016)
WSDM Conference: Entity Ranking Challenge Workshop.
arXiv:1606.08534
Microsoft Research hosted the 2016 WSDM Cup Challenge based on the Microsoft Academic Graph. The goal was to provide static rankings for the articles that make up the graph, with the rankings to be evaluated against those of human judges. While the Microsoft Academic Graph provided metadata about many aspects of each scholarly document, we focused more narrowly on citation data and used this contest as an opportunity to test the Article Level Eigenfactor (ALEF), a novel citation-based ranking algorithm, and evaluate its performance against competing algorithms that drew upon multiple facets of the data from a large, real world dataset (122M papers and 757M citations). Our final submission to this contest was scored at 0.676, earning second place.
⊖ Bibtex
@inproceedings{WesleySmith2016wsdm,
author = {Wesley-Smith, Ian and Bergstrom, Carl T and West, Jevin D},
title = {Static Ranking of Scholarly Papers using Article-Level Eigenfactor (ALEF)},
booktitle={WSDM Conference: Entity Ranking Challenge Workshop},
url ={http://arxiv.org/abs/1606.08534},
year = {2016}}
Babel: A platform for research in scholarly article recommendation (2016)
Proceedings of the 25th International Conference on World Wide Web (WWW).
Workshop on Big Scholarly Data. pgs. 389-394, doi: 10.1145/2872518.2890517
The body of scientific literature is growing at an exponential rate. This expansion of scientific knowledge has increased the need for tools to help users find relevant articles. However, researchers developing new scholarly article recommendation algorithms face two substantial hurdles: acquiring high-quality, large-scale scholarly metadata and mechanisms for evaluating their recommendation algorithms. To address these problems we created Babel, an open-source web platform uniting publishers, researchers, and users. Babel includes tens of millions of scholarly articles, several content-based recommendation algorithms, and tools for integrating recommendations into publisher websites and other scholarly platforms.
⊖ Bibtex
@inproceedings{wesleysmith2016babel,
author = {Wesley-Smith, Ian and West, Jevin D},
title = {Babel: A platform for research in scholarly article recommendation},
booktitle = {Proceedings of the 25th international conference companion on world wide web (WWW)},
pages = {389-394},
year = {2016}}
Dynamic Visualization of Citation Networks Showing the Influence of Scholarly Fields Over Time (2016)
WWW Workshop on Semantics, Analytics, Visualization. Enhancing Scholarly Data.
Springer International Publishing. pgs. 147-151
Citation graphs between scholarly papers can be used to learn about the structure and development of scholarship. We present a generalizable approach to visualizing scholarly influence over time, using a dynamic node-link diagram representing the citation patterns between groups of papers. We combine this approach with hierarchical clustering techniques that exploit the network structure to partition the graph into clusters representing fields and subfields. We use these methods to explore the influence that fields have had on other fields over time.
⊖ Bibtex
@inproceedings {Portenoy2016www,
author = {Portenoy, Jason and West, Jevin D},
title = {Dynamic Visualization of Citation Networks Showing the Influence of Scholarly Fields Over Time},
booktitle={Semantics, Analytics, Visualization. Enhancing Scholarly Data},
pages = {147-151},
publisher = {Springer International Publishing},
year = {2016}}
An experimental platform for scholarly article recommendation (2015)
37th European Conference on Information Retrieval (ECIR).
1344: 30-39
We describe the experimental recommendation platform created in collaboration with the Social Science Research Network (SSRN). This system allows researchers to test recommendation algorithms on SSRN's users and quickly collect feedback on the efficacy of their recommendations. We further describe a test run performed using EigenFactor recommends and compare its performance to SSRN's production recommender.
⊖ Bibtex
@ARTICLE{wesleysmith2015ecir,
title = {An experimental platform for scholarly article recommendation},
author = {Wesley-Smith, Ian and Ralph J Dandrea and West, Jevin D},
journal = {37th European Conference on Information Retrieval (ECIR): Workshop on Bibliometric-Enhanced Information Retrieval},
volume = {1344},
pages = {30-39},
year = {2015}}
Theory Identity: A Machine-Learning Approach (2014)
Best Paper Nomination.
Hawaii International Conference on System Sciences (HICSS).
4639-4648. doi: 10.1109/HICSS.2014.564
Theory identity is a fundamental problem for researchers seeking to determine theory quality, create theory ontologies and taxonomies, or perform focused theory-specific reviews and meta-analyses. We demonstrate a novel machine-learning approach to theory identification based on citation data and article features. The multi-disciplinary ecosystem of articles which cite a theory’s originating paper is created and refined into the network of papers predicted to contribute to, and thus identify, a specific theory. We provide a 'proof-of-concept' for a highly-cited theory. Implications for cross disciplinary theory integration and the identification of theories for a rapidly expanding scientific literature are discussed.
⊖ Bibtex
@inproceedings{Larsen2014HICSS,
title={Theory identity: A machine-learning approach},
author={Larsen, Kai R and Hovorka, Dirk and West, Jevin and Birt, James and Pfaff, James R and Chambers, Trevor W and Sampedro, Zebula R and Zager, Nick and Vanstone, Bruce},
booktitle={Hawaii International Conference on System Sciences},
pages={4639--4648},
year={2014},
publisher={IEEE}}
Innovative women: an analysis of global gender disparities in patenting (2014)
International Conference on Science and Technology Indicators.
611-615.
Innovation is critical to economic development and depends upon the full participation of the scientific workforce. Yet, the field of innovation studies demonstrates that there are many disparities in the exploitation of human capacity for innovation. Foremost among these is the dearth of female inventors. The first patent granted to a woman was in 1637; however, female contribution failed to exceed 2% through the first half of the 20th century. Contemporary studies have shown that fewer women patent and when they do, they produce fewer patents per person than men. A number of correlates have been noted: women with higher degrees are more likely to patent than those without, and when women inventors are involved, patents tend to have higher diversity in terms of the number of IPC codes assigned.
⊖ Bibtex
@inproceedings{Sugimoto2014STI,
title={Innovative women: an analysis of global gender disparities in patenting},
author={Sugimoto, Cassidy R and Ni, Chaoqun and West, Jevin D and Lariviere, Vincent},
booktitle={International Conference on Science and Technology Indicators},
pages={611-615},
year={2014}}
Scalable Flow-Based Community Detection for Large-Scale Network Analysis (2013)
Proceedings of IEEE International Conference on Data Mining Workshops (ICDMW).
303-310. doi: 10.1109/ICDMW.2013.138
Community-detection is a powerful approach to uncover important structures in large networks. Since networks often describe flow of some entity, flow-based community-detection methods are particularly interesting. One such algorithm is called Infomap, which optimizes the objective function known as the map equation. While Infomap is known to be an effective algorithm, its serial implementation cannot take advantage of multicore processing in modern computers. In this paper, we propose a novel parallel generalization of Infomap called RelaxMap. This algorithm relaxes concurrency assumptions to avoid lock overhead, achieving 70% parallel efficiency in shared-memory multicore experiments while exhibiting similar convergence properties and finding similar community structures as the serial algorithm. We evaluate our approach on a variety of real graph datasets as well as synthetic graphs produced by a popular graph generator used for benchmarking community detection algorithms. We describe the algorithm, the experiments, and some emerging research directions in high-performance community detection on massive graphs.
⊖ Bibtex
@inproceedings{Bae2013IEEE,
title={Scalable flow-based community detection for large-scale network analysis},
author={Bae, Seung-Hee and Halperin, Daniel and West, Jevin and Rosvall, Martin and Howe, Bill},
booktitle={International Conference on Data Mining Workshops (ICDMW)},
pages={303--310},
year={2013},
publisher={IEEE}}
Hoptrees: Branching History Navigation for Hierarchies (2013)
Human-Computer Interaction (INTERACT)
316-333. doi: 10.1007/978-3-642-40477-1_20
Designing software for exploring hierarchical data sets is challenging because users can easily become lost in large hierarchies. We present a novel interface, the hoptree, to assist users with navigating large hierarchies. The hoptree preserves navigational history and context and allows one-click navigation to recently-visited locations. We describe the design of hoptrees and an implementation that we created for a tree exploration application. We discuss the potential for hoptrees to be used in a wide variety of hierarchy navigation scenarios. Through a controlled experiment, we compared the effectiveness of hoptrees to a breadcrumb navigation interface. Study participants overwhelmingly preferred the hoptree, with improved time-on-task with no difference in error rates.
⊖ Bibtex
@incollection{Brooks2013Interact,
title={Hoptrees: branching history navigation for hierarchies},
author={Brooks, Michael and West, Jevin D and Aragon, Cecilia R and Bergstrom, Carl T},
booktitle={Human-Computer Interaction: INTERACT},
pages={316-333},
year={2013},
publisher={Springer}}
Comparing the dynamics of stomatal networks to the problem-solving dynamics of cellular computers (2011)
Unifying Themes in Complex Systems: Proceedings of the Fifth International Conference on Complex Systems.
327-341. doi: 10.1007/978-3-642-17635-7_40
Is the adaptive response to environmental stimuli of a biological system lacking a central nervous system a result of a formal computation? If so, these biological systems must conform to a different set of computational rules than those associated with central processing. To explore this idea, we examined the dynamics of stomatal patchiness in leaves. Stomata—tiny pores on the surface of a leaf—are biological processing units that a plant uses to solve an optimization problem—maximize CO2 assimilation and minimize H2O loss. Under some conditions, groups of stomata coordinate in both space and time producing motile patches that can be visualized with chlorophyll fluorescence. These patches suggest that stomata are nonautonomous and that they form a network presumably engaged in the optimization task. In this study, we show that stomatal dynamics are statistically and qualitatively comparable to the emergent, collective, problem-solving dynamics of cellular computing systems.
⊖ Bibtex
@incollection{West2011Complex,
title={Comparing the dynamics of stomatal networks to the problem-solving dynamics of cellular computers},
author={West, Jevin D and Peak, David and Mott, Keith A and Messinger, Susanna M},
booktitle={Unifying Themes in Complex Systems: Proceedings of the Fifth International Conference on Complex Systems},
pages={327--341},
doi = {10.1007/978-3-642-17635-7_40},
year={2011},
publisher={Springer}}
Invited
The chatbot era: Better or worse off? (2023)
Seattle Times.
March 31
Are we better off because of penicillin? Yes. The internet? Probably. Social media? Probably not. So what about chatbots? The chatbot craze has captured the world’s attention, and massive piles of money. Chatbots are software programs that use artificial intelligence to process and simulate conversations with humans. Will they improve human experience and longevity, peace and prosperity, environmental health, productivity or social well-being? From my perspective, as a researcher who studies misinformation and its effects on society, chatbots will be vectors of propaganda, they will make it harder to discern truth, and they will further erode trust in our institutions. I see two main reasons for this: They are bullshitters at scale, and they are difficult, if not impossible, to reverse engineer.
⊖ Bibtex
@ARTICLE{West2023SeattleTimes,
author = {West, Jevin D},
title = {The chatbot era: Better or worse off?},
journal = {Seattle Times},
volume = {March 31},
year = {2023}}
GrantExplorer (2023)
ACS Green Chemistry Institute Nexus.
March 24
How much NSF funding has been devoted to “generative AI” in the last five years? Searching existing published datasets for “generative AI” is cumbersome. While innovative new AI tools like ChatGPT have received lots of attention for information retrieval tasks, they’re ill-suited for answering questions about structured data and providing sources. The goal of GrantExplorer is to make this sort of question much easier to answer.
⊖ Bibtex
@article{Chamberlin2023acs,
author = {Cole Chamberlin and Jevin D. West},
title = {GrantExplorer},
journal = {ACS Green Chemistry Institute Nexus},
volume = {March 24},
url = {https://communities.acs.org/t5/GCI-Nexus-Blog/GrantExplorer/ba-p/89871},
year = {2023}}
A volcanic change in social-media landscape (2022)
Seattle Times
December 9
A pilgrimage to Mount St. Helens is a Northwest tradition. My family and I made the trek this past summer. From the Windy Ridge Viewpoint, we could see both the destruction and the emergence of a new mountain. We could see the mats of denuded trees but also the prairie lupine pushing through the dry pumice. The same kind of volcanic eruption and resulting rebuild is happening in the social-media landscape.
⊖ Bibtex
@ARTICLE{West2022SeattleTimes,
author = {West, Jevin D},
title = {A volcanic change in social-media landscape},
journal = {Seattle Times},
volume = {December 9},
year = {2022}}
An Introduction to Calling Bullshit: Learning to Think Outside the Black Box (2022)
Numeracy
15(1):1-5
While statistical methods receive greater attention, the art of critically evaluating information in everyday life more commonly depends on thinking outside the black box of the algorithm. In this piece we introduce readers to our book and associated online teaching materials—for readers who want to more capably call “bullshit” or to teach their students to do the same.
⊖ Bibtex
@article{West2022numeracy,
author = {West, Jevin D. and Bergstrom, Carl T.},
title = {An Introduction to Calling Bullshit: Learning to Think Outside the Black Box},
journal = {Numeracy},
volume = {15},
number = {1},
pages = {1-5},
doi = {10.5038/1936-4660.15.1.1405},
year = {2022}}
How do you solve a problem like misinformation? (2021)
Science Advances
7(50): eabn0481
In this perspective, we address three key distinctions for research and policy about misinformation: the distinction between misinformation/disinformation, speech/action, and mistaken belief/conviction.
⊖ Bibtex
@ARTICLE{Calo2021ScienceAdvances,
author = {Calo, Ryan and Coward, Chris and Spiro, Emma and Starbird, Kate and West, Jevin D},
title = {How do you solve a problem like misinformation?},
journal = {Science Advances},
volume = {7},
number = {50},
doi = {10.1126/sciadv.abn0481},
year = {2021}}
Calling BS: Data reasoning during an infodemic (2020)
Organisation for Economic Co-operation and Development (OECD) Forum
October 20
The University of Washington, where we both teach, conferred 8,853 undergraduate degrees in the 2017-2018 academic year. A mere 20 of these degrees were awarded in the field of statistics. Indeed, only a small fraction of our students receive any formal training in statistics whatsoever—and of those who do, the vast majority take only a class or two in the area. When looking at the general population, the percentage is even smaller. According to the National Center for Education Statistics, only about 11% of United States high school students completed a statistics course in 2009. World Statistics Day offers an opportunity to reflect on how we can improve data literacy in a "big data" world, not just for university students, but for everyone...
⊖ Bibtex
@article{West2020oecd,
author = {West, Jevin D. and Bergstrom, Carl T.},
title = {Calling BS: Data reasoning during an infodemic},
journal = {Organisation for Economic Co-operation and Development (OECD) Forum},
volume = {October 20},
url = {https://www.oecd-forum.org/posts/calling-bs-data-reasoning-during-an-infodemic},
year = {2020}}
This covid-19 misinformation went viral. Here’s what we learned. (2020)
The Washington Post
In a crisis, people struggle collectively to make sense of a complex and frightening situation — and as a result, misinformation spreads. For several reasons, that’s been especially true during the novel coronavirus pandemic. Scientists are still trying to understand everything about the virus’s disease, covid-19, including how it spreads and which treatments work. Armchair epidemiologists are filling the Internet with their own interpretations of the emerging science...
⊖ Bibtex
@article{Starbird2020washingtonpost,
author = {Starbird, Kate and Spiro, Emma and West, Jevin D.},
title = {This covid-19 misinformation went viral. Here’s what we learned.},
journal = {The Washington Post},
volume = {May 8},
year = {2020},
url = {https://www.washingtonpost.com/politics/2020/05/08/this-covid-19-misinformation-went-viral-heres-what-we-learned/}}
Hydroxychloroquine for COVID-19 prevention? How to separate science from partisanship (2020)
NBC News
August 5
A Rasmussen Reports survey of 1,000 American adults conducted in April found that 53 percent of Republicans were willing to take anti-malarial hydroxychloroquine to treat COVID-19, while only 18 percent of Democrats were willing to try it. No one is surprised to see political polarization around issues of taxation, immigration, welfare or military spending. But it has been remarkable to see such deep partisan divides about basic medical science. And as has become very clear this year, it is especially dangerous during a global pandemic. In 2019, you might have predicted that in some future disease outbreak, liberals would favor an expanded federal role in health care while conservatives would oppose government restrictions on business activity. But could you have anticipated that Democrats would champion masks and Republicans would endorse hydroxychloroquine, rather than vice versa? The utter arbitrariness of how public opinion on scientific questions has fractured along partisan divides reveals something rotten at the core of the national conversation...
⊖ Bibtex
@article{West2020nbc,
author = {West, Jevin D. and Bergstrom, Carl T.},
title = {Hydroxychloroquine for COVID-19 prevention? How to separate science from partisanship},
journal = {NBC News},
volume = {August 5th},
url = {https://www.nbcnews.com/think/opinion/hydroxychloroquine-covid-19-prevention-how-separate-science-partisanship-ncna1235834},
year = {2020}}
5 types of misinformation to watch out for while ballots are being counted – and after (2020)
The Conversation
With no clear winner yet in the presidential election, there’s an opportunity for partisan activists, conspiracy theorists and others to exploit public uncertainty and anxiety to attempt to delegitimize the election results....
⊖ Bibtex
@article{Starbird2020conversation,
author = {Starbird, Kate and West, Jevin D and DiResta, Renee},
title = {5 types of misinformation to watch out for while ballots are being counted – and after},
journal = {The Conversation},
volume = {November 5},
year = {2020},
url = {https://theconversation.com/5-types-of-misinformation-to-watch-out-for-while-ballots-are-being-counted-and-after-149509}}
Deepfakes and the U.S. Elections: Lessons from the 2020 Workshops (2020)
University of Washington Center for an Informed Public (CIP)
In July 2020, the University of Washington’s Center for an Informed Public (CIP) and Microsoft’s Defending Democracy Program convened a three-part workshop with experts from the technology industry, media organizations, government, and academia to discuss the state of media manipulated by Artificial Intelligence (AI), also known as deepfakes. The invited participants included representatives from major tech companies and social media platforms, academia and think tanks, major international, national and regional news organizations, fact-checking groups, civil society organizations, and elected officials and government technology professionals...
⊖ Bibtex
@article{Prochaska2020deepfake,
author = {Prochaska, Stephen and Grass, Michael and West, Jevin D.},
title = {Deepfakes and the U.S. Elections: Lessons from the 2020 Workshops},
journal = {University of Washington Center for an Informed Public (CIP)},
year = {2020},
url = {https://www.cip.uw.edu/deepfakes-and-the-u-s-elections-lessons-from-the-2020-workshops/}}
The 2020 election integrity partnership (2020)
Yale University Law School Information Society Project (ISP)
An unverified user posted this tweet shortly after the Arizona polls closed. Within a few hours, the tweet and others like it went viral. The official (verified) Pima County account tweeted a response to Sharpiegate, stating that felt-tipped pens could be used to vote; they would not invalidate a ballot. That was not enough to stop the surge. Hundreds of thousands of tweets followed, pushing the narrative of voter fraud...
⊖ Bibtex
@article{West2020yale-isp,
author = {West, Jevin D},
title = {The 2020 election integrity partnership},
journal = {Yale University ISP Conference},
volume = {December 11},
year = {2020},
url = {https://knightfoundation.org/news-and-information-disorder-in-the-2020-presidential-election/}}
Curiosity is an antidote to misinformation (2020)
University of Washington Center for an Informed Public (CIP) Newsletter
July 29
No one is immune to misinformation — not young people, older people, conservatives, liberals, politicians (for sure), or even professors that teach classes on Calling BS. We are all susceptible. But there is one human quality that may help: curiosity...
⊖ Bibtex
@article{West2020cip-july,
author = {\textbf{J.D. West}},
title = {Curiosity is an antidote to misinformation},
journal = {University of Washington Center for an Informed Public (CIP) Newsletter},
volume = {July 29},
year = {2020},
url = {https://www.cip.uw.edu/2020/07/29/curiosity-antidote-misinformation/}}
With CIP's team in place, we're actively engaged in building a community to combat strategic misinformation (2020)
University of Washington Center for an Informed Public (CIP) Newsletter
June 29
December 3, 2019. It seems like a lifetime ago. We launched the Center for an Informed Public (CIP) that day with the hope of building a community to combat strategic misinformation and to strengthen democratic discourse. Since then, SARS-CoV-2 has infected more than 10 million people worldwide. As with any crisis (but especially this one), misinformation and disinformation have proliferated – so much so that the World Health Organization has popularized the term “infodemic” to describe the health risks associated with false information about the disease...
⊖ Bibtex
@article{West2020cip-june,
author = {\textbf{J.D. West}},
title = {With CIP's team in place, we're actively engaged in building a community to combat strategic misinformation},
journal = {University of Washington Center for an Informed Public (CIP) Newsletter},
volume = {June 29},
year = {2020},
url = {https://www.cip.uw.edu/2020/06/29/jevin-west-cip-note/}}
Preventing and Mitigating Misinformation (2020)
University of Washington Information School Newsletter
October 14
Misinformation and the Impending US Election (2020)
Penguin Random House Newsletter
August
A Center for an Informed Public (2020)
University of Washington Magazine. Forward
On Dec. 3, we launched the new Center for an Informed Public (CIP) at UW—a response to the rise in disinformation and erosion of trust in our most basic societal institutions. More than 400 people packed the HUB South Ballroom, among them faculty, alumni, students, librarians, journalists, industry leaders, high school teachers, funders and policymakers. There was a sense of excitement mixed with concern—excitement that UW was leading the way; concern over the magnitude and reach of the problem.
⊖ Bibtex
@article{West2020uwmagazine,
author = {West, Jevin D},
title = {A Center for an Informed Public},
journal = {University of Washington Magazine},
volume = {31},
number = {1},
pages = {6},
year = {2020}}
How to fine-tune your BS meter (2017)
Seattle Times. Op-ed
The Industrial Revolution changed the world. The steam engine and machine tools increased labor efficiency a hundredfold. Advances in iron production increased energy efficiency and decreased costs, expanding the distances one could travel and decreasing the time it took to do so. But these technological developments carried serious side effects. The new coal-powered factory and large-scale chemical production blackened the sky and darkened the waterways. It was not an abstract concept; people only had to look up to the sky. The Digital Revolution has followed a similar path. Leaps in computer technology have again streamlined production — not of physical goods, but information goods. Inexpensive and nearly infinite storage, gigahertz microprocessors, light-speed communication, and the rise of sociotechnical platforms all have made the production and transfer of information fast and cheap. Everyone has become a publisher. The gatekeeper models of news dissemination have lost their dominance. Nearly two-thirds of Americans get some news from social media. This decentralization of news filtering is not all bad. Sure, the proliferation of fake news is a byproduct, but this era also provides access to diversity of thought and interests that the Walter Cronkite period could have never afforded.
⊖ Bibtex
@article{West2017seattletimes,
author = {West, Jevin D},
title = {How to fine-tune your BS meter},
journal = {Seattle Times},
volume = {Op-ed, Sept. 12},
year = {2017}}
The Science of Data Science (2016)
Journal of Integrated Creative Studies.
2016-010-e. doi: 10.14989/214432
This report summarizes two talks that I gave at the Advanced Future Studies at Kyoto University in February of 2016. One talk was for the Global Partnership on Science Education through Engagement. In this talk I focused on an emerging educational trend in the United States—the rise of Data Science at both the undergraduate and graduate level—and the effect it is having on research and industry. In the second talk, I spoke at the International Symposium on Advanced Future Studies. In this talk, I provided an overview of an emerging research trend—the emergence of a new discipline called the Science of Science. In this new field, science is done at the level of millions of publications over many generations and disciplines using new tools from machine learning, computer vision, and network science. Both Data Science and the Science of Science require perspectives from multiple disciplines, which fit well with the general theme of both meetings in Kyoto.
⊖ Bibtex
@article{West2016jics,
author = {West, Jevin D},
title = {The Science of Data Science},
journal = {Journal of Integrated Creative Studies},
volume = {No.2016-010-e},
doi = {10.14989/214432},
year = {2016}}
Can Ignorance Promote Democracy? (2011)
Science.
334(6062):1503-1504. doi:10.1126/science.1216124
Ideas are like fire, observed Thomas Jefferson in 1813—information can be passed on without relinquishing it (1). Indeed, the ease and benefit of sharing information select for individuals to aggregate into groups, driving the buildup of complexity in the biological world (2, 3). Once the members of some collective—whether cells of a fruit fly or citizens of a democratic society—have accumulated information, they must integrate that information and make decisions based upon it. When these members share a common interest, as do the stomata on the surface of a plant leaf (4), integrating distributed information may be a computational challenge. But when individuals do not have entirely coincident interests, strategic problems arise. Members of animal herds, for example, face a tension between aggregating information for the benefit of the herd as a whole, and avoiding manipulation by self-interested individuals in the herd. Which collective decision procedures are robust to manipulation by selfish players (5)? On page 1578 of this issue, Couzin et al. (6) show how the presence of uninformed agents can promote democratic outcomes in collective decision problems.
⊖ Bibtex
@article{West2011Science,
title={Can ignorance promote democracy?},
author={West, Jevin D and Bergstrom, Carl T},
journal={Science},
volume={334},
number={6062},
pages={1503--1504},
year={2011},
doi = {10.1126/science.1216124},
issn = {0036-8075},
publisher={American Association for the Advancement of Science}}
How to improve the use of metrics: Learn from Game Theory (2010)
Nature.
465:870-872. doi:10.1038/465870a
Giving bad answers is not the worst thing a ranking system can do — the worst thing is to encourage bad science. The next generation of scientific metrics needs to take this into account. When scientists order elements by molecular weight, the elements do not respond by trying to sneak higher up the order. But when administrators order scientists by prestige, the scientists tend to be less passive. There is a powerful feedback between the ranking systems used to assess scientific productivity and the actions of scientists trying to further their careers via these ranking systems.
⊖ Bibtex
@article{West2010Nature,
Author = {West, Jevin D},
Journal = {Nature},
Pages = {870-872},
Title = {How to improve the use of metrics: Learn from Game Theory},
Volume = {465},
Year = {2010}}
Response to "Big Macs and Eigenfactor Scores: The Correlation Conundrum" (2010)
Journal of the American Society for Information Science & Technology.
61(12):2592 doi: 10.1002/asi.21408
As we pointed out in our original article (West, Bergstrom, & Bergstrom, in press), currency denominations generate a spurious correlation in the Big Mac data. The high correlation between wage rates and Big Mac prices denominated in local currency might lead a careless reader to believe that in all countries it takes a laborer about the same amount of time to earn a Big Mac. By rescaling currencies in a few of the countries, Prathap (in press) shows that this is not the case. Of course. Any competent statistician would do something like this. For example, when The Economist publishes their Big Mac index, they convert all prices into US dollars at prevailing currency exchange rates. That was the point of our analogy—to pick a case where the source of spurious correlation was so obvious that anyone could recognize the problem.
⊖ Bibtex
@article{West2010JASIST-2,
title={Response to “Big Macs and Eigenfactor scores: The correlation conundrum”},
author={West, Jevin D and Bergstrom, Theodore and Bergstrom, Carl T},
journal={Journal of the American Society for Information Science and Technology},
volume={61},
number={12},
pages={2592--2592},
year={2010},
issn = {1532-2890},
doi = {10.1002/asi.21408},
publisher={Wiley Online Library}}
The Eigenfactor™ Metrics: How does the Journal of Biological Chemistry stack up? (2009)
The American Society for Biochemistry and Molecular Biology (ASBMB Today).
April: p. 20-21
The scientific literature comprises a vast network of research papers, linked to one another by scholarly citations; this network traces the spread of ideas through the scientific community. At the Eigenfactor Project, we use the structure of this network to assess the influence of scholarly journals and to map out relations among scientific fields.
⊖ Bibtex
@article{West2009AmSocBiochem,
author = {West, Jevin D and Stefaner, Moritz and Bergstrom, Carl T},
title = {The Eigenfactor Metrics: How does the Journal of Biological Chemistry stack up?},
journal = {The American Society for Biochemistry and Molecular Biology (ASBMB Today)},
volume = {April},
pages = {20--21},
year = {2009}}
Assessing Citations with the Eigenfactor™ Metrics (2008)
Neurology.
71(23):1850-1851. doi:10.1212/01.wnl.0000338904.37585.66
For more than 80 years, researchers and administrators alike have evaluated the prestige and productivity of researchers, institutions, journals, and even nations by counting citations. For the past half century, the impact factor has been the most prominent of these citation metrics. Impact factor is essentially a measure of the average number of citations that a journal’s articles receive over the two calendar years following publication. As a citation metric, impact factor has a number of virtues, not the least of which are that it is simple to describe and easy to calculate.
⊖ Bibtex
@article{Bergstrom2008Neurology,
title={Assessing citations with the Eigenfactor Metrics},
author={Bergstrom, Carl T and West, Jevin D},
journal={Neurology},
volume={71},
number={23},
pages={1850--1851},
year={2008},
doi = {10.1212/01.wnl.0000338904.37585.66}}
Eigenfactor - The Google Approach to Bibliometrics (2008)
Front Matter, Allen Press.
4:7
Not all citations are created equal. This is one of the core ideas behind Eigenfactor. Citations from more prestigious journals (such as Science and Nature) are worth more than citations from less important journals (such as the Journal of Obscurity). This meritocratic approach to bibliometrics is very similar to the philosophy behind Google’s PageRank algorithm, which is at “the heart of [its] software”. Receiving a hyperlink from a highly reputable website means more than a hyperlink from a neighborhood blog. Both Google and Eigenfactor utilize the wealth of information inherent in the structure of their respective networks. For Google, that information can be found in the topology of the web, and for Eigenfactor, the information can be found in the citation structure of the scholarly literature. The success of Google's search engine illustrates the power of this approach to ranking. Part of the success behind PageRank can actually be traced back to prior work in the field of bibliometrics. With the advent of scholarly measures like Eigenfactor, this relationship has come full circle.
⊖ Bibtex
@article{West2008FrontMatter,
author = {West, Jevin D},
title = {Eigenfactor - The Google Approach to Bibliometrics},
volume = {4},
pages = {7},
journal = {FrontMatter},
year = {2008},
publisher = {Allen Press}}
Book Chapters
Global Ebbs and Flows of Patent Knowledge (2022)
Trade in Knowledge: Intellectual Property, Trade and Development in a Transformed Global Economy
Cambridge University Press.
Patents and patent applications contain rich sources of scientific and technical information. By their very name, patents - derived from a Latin word meaning “open” - make their information freely available to the public. They describe new devices, machines, systems, molecules, and even living organisms, as well as methods for making and using them. Transfer of technical knowledge among people, firms, institutions, and governments has long relied upon the information patent documents (hereafter simply “patents”) disclose. One may take advantage of a feature of patent law to track the flow of technical ideas among patents by constructing a network from the citations patents make to and from each other. The result is a patent citation network. The worldwide patent citation network is immense, with almost half a billion citations among more than 100 million patents. In this chapter, we use the worldwide patent citation network to infer the flow of technical information among countries...
⊖ Bibtex
@INCOLLECTION{Torrance2022patentflows,
author = {Torrance, Andrew and West, Jevin D. and Friedman, Lisa},
title = {Global Ebbs and Flows of Patent Knowledge},
booktitle = {Trade in Knowledge: Intellectual Property, Trade and Development in a Transformed Global Economy},
publisher = {Cambridge University Press},
year = {2022},
pages = {218-264},
url = {https://www.cambridge.org/core/books/trade-in-knowledge/310203EBBFE94182502B01F77D7800B1},
doi = {10.1017/9781108780919.010},
chapter = {7}}
Patent Analytics: Information From Innovation (2021)
Legal Informatics.
Cambridge University Press. Chapter 20. (in press)
The United States Patent and Trademark Office (“USPTO”) recently made their electronic patent and trademark data publicly available in machine readable form. Opening this massive data source to the public corresponded with significant advances in natural language processing, graph analytics, community detection, computer vision, and data processing for scaling these new computational methods, as well as with heightened interest in probing the accuracy of long-held assumptions about how, and how well, the patent system functions. This confluence of open data with advances in machine learning has created a Cambrian explosion in patent analysis, patent search, and patent consumption. This impacts all stakeholders in the patent ecosystem, including patent attorneys, examiners, inventors, owners, litigants, portfolio managers, and analysts of every stripe. In this chapter, we survey this emerging field in both the research and commercial domains. We categorize the existing technologies, provide examples of these technologies, and speculate on how these new approaches will affect intellectual property rights and innovation. Just as the Cambrian explosion resulted in many new forms of life, we expect that the patent system will undergo substantial evolution in response to the availability of data and powerful new methods of analysis.
⊖ Bibtex
@inproceedings {Torrance2018patentinnovation,
author = {Torrance, Andrew and West, Jevin D},
title = {Patent Analytics: Information From Innovation},
booktitle = {Legal Informatics},
publisher = {Cambridge University Press},
year = {2021},
chapter = {20},
editor = {Katz, Daniel M and Bommarito, Michael J and Dolin, Ron}}
The Data Gold Rush in Higher Education (2016)
Big Data is Not a Monolith.
chpt. 10, MIT Press
The enthusiasm for all things big data and data science is more alive now than ever. It can be seen in the frequency of big data articles published in major newspapers and in the venture capitalists betting on its economic impact. Governments and foundations are calling for grant proposals, and big companies are reorganizing in response to this new commodity. Another, often-overlooked vitality indicator of data science comes from education. Students are knocking down the doors at universities, massive open online courses (MOOCs), and workshops. The demand for data science skills is at an all-time high, and universities are responding.
⊖ Bibtex
@inproceedings {West2016goldrush,
author = {West, Jevin D and Portenoy, Jason},
title = {The Data Gold Rush in Higher Education},
booktitle = {Big Data is Not a Monolith},
publisher = {MIT Press},
year = {2016},
chapter = {10},
isbn = {9780262035057},
editor = {Sugimoto, C.R. and Ekbia, H. and Mattioli, M.}}
A Network Approach to Scholarly Evaluation (2014)
Beyond Bibliometrics: Harnessing Multidimensional Indicators of Scholarly Impact.
chpt. 8:151-166, MIT Press
As Derek de Solla Price famously noted in 1965, the scientific literature forms a vast network. The nodes of this network are the millions of published articles, and the edges are the citations between them. There is a wealth of information, not only within the content of these nodes (the text) but also within the structure connecting these nodes (the network topology). In fact, the network topology by itself provides clues about the quality of the content. This is similar to how Google’s PageRank algorithm harnesses the hyperlink structure of the web to evaluate web pages.
⊖ Bibtex
@inproceedings {West2014BeyondBiblio,
author = {J.D. West and D. Vilhena},
title = {A Network Approach to Scholarly Evaluation},
booktitle = {Beyond Bibliometrics: Harnessing Multidimensional Indicators of Scholarly Impact},
publisher = {MIT Press},
chapter = {8},
pages = {151-166},
isbn = {9780262026796},
editor = {{B. Cronin} and {C.R. Sugimoto}},
year = {2014}}
Bacteriophages: models for exploring basic principles of ecology (2008)
Bacteriophage Ecology: Population Growth, Evolution, and Impact of Bacterial Viruses.
chpt. 2: 31-63, Cambridge University Press, Cambridge, U.K.
A virus depends intimately upon its host in order to reproduce, which makes the host organism a crucial part of the virus' environment. This basic facet of viral existence means that ecology, the scientific field focusing on how organisms interact with each other and their environment, is particularly relevant to the study of viruses. In this chapter we explore some of the ways in which the principles of ecology apply to viruses that infect bacteria—the bacteriophages (or "phages" for short). In turn, we also discuss how the study of phage and their bacterial hosts has contributed to different subfields of ecology.
⊖ Bibtex
@inproceedings {Kerr2008Bacteriophages,
author = {Kerr, Benjamin and West, Jevin D and Bohannan, Brendan JM},
title = {Bacteriophages: models for exploring basic principles of ecology},
chapter = {2},
pages = {31-63},
booktitle = {Bacteriophage Ecology: Population Growth, Evolution, and Impact of Bacterial Viruses},
isbn = {9780521858458},
publisher = {Cambridge University Press},
editor = {Abedon, Stephen T},
year = {2008}}
Patents
Systems and Methods for Data Analysis (2013)
US Patent Application: US20140337280A1.
Described herein are methods and systems for hierarchically mapping, ranking, and labeling data sets automatically. Also provided are methods for browsing and navigating a hierarchically mapped data set, and identifying changes in network structure over time. An example method may involve receiving document data indicating a corpus of documents and references between documents within the corpus. Based on the document data, a network comprising two or more nodes and at least one directed edge may be determined. Also, a hierarchical partition of the documents may be determined based on the directed edges of the network. The hierarchical partition may define a plurality of nested modules, and each module in the plurality of nested modules may be associated with one or more respective documents within the corpus. The method may additionally include causing a graphical display to provide a visual indication of one or more of the plurality of nested modules.
⊖ Bibtex
@MISC{West2013Patent,
author = {Bergstrom, Carl T and Rosvall, Martin and Vilhena, Daril and West, Jevin D and Torrance, Andrew},
title = {Systems and Methods for Data Analysis},
institution = {University of Washington},
year = {2013},
journal = {US20140337280A1},
note = {PCT Application Filed on Feb. 1, 2013}}
Theses
Eigenfactor: ranking and mapping scientific knowledge (2010)
Doctoral Dissertation.
University of Washington, Department of Biology
Each year, tens of thousands of scholarly journals publish hundreds of thousands of scholarly papers, collectively containing tens of millions of citations. As De Solla Price recognized in 1965, these citations form a vast network linking up the collective research output of the scholarly community. These well-defined and well-preserved networks are model systems well suited for studying communication networks and the flow of information on these networks. In this dissertation, I explain how I used citation networks to develop an algorithm that I call 'Eigenfactor.' The goal of Eigenfactor is to mine the wealth of information contained within the full structure of the scholarly web, in order to identify the important nodes in these networks. This is different from the conventional approach to scholarly evaluation. Metrics like impact factor ignore the network when ranking scholarly journals and only count incoming links. Eigenfactor not only counts citations but takes into account the source of those citations. By considering the whole network, I claim that Eigenfactor is a more information rich statistic. Librarians, publishers, editors and scholars around the world are now using Eigenfactor alongside impact factor to evaluate their journal collections. This dissertation consists of a collection of papers that provide an overview of Eigenfactor - what it is, what it measures and how it can be used to better evaluate and navigate the ever-expanding scholarly literature.
⊖ Bibtex
@PHDTHESIS{West2010Phd,
author = {West, Jevin D},
title = {Eigenfactor: ranking and mapping scientific knowledge},
school = {University of Washington, Department of Biology},
year = {2010}}
Investigations into the spatial and temporal dynamics of stomatal networks to determine whether plants perform emergent, distributed computation (2004)
Masters Thesis.
Utah State University, Department of Biology
⊖ Abstract
This research studies the spatial and temporal dynamics of stomatal patchiness in order to investigate the possibility that plants solve their problem of adjusting stomatal aperture in order to maximize CO2 uptake while minimizing H2O loss through an emergent, distributed computation. An extensive study is done on qualitative and quantitative characteristics of stomatal patchiness, and improved imaging techniques are developed to more fully capture the dynamics. In doing so, soliton-like structures were discovered. Sequences of chlorophyll fluorescence images of Xanthium strumarium leaves were then compared to image sequences of cellular computer simulations that solve problems via emergent, distributed computation. Statistical analyses revealed that the spatial and temporal correlations of the patchy dynamics for the two types of images were indistinguishable.
⊖ Bibtex
@MASTERSTHESIS{West2004Masters,
author = {West, Jevin D},
title = {Investigations into the spatial and temporal dynamics of stomatal networks to determine whether plants perform emergent,
distributed computation},
school = {Utah State University, Department of Biology},
year = {2004}}
White paper, Blog Posts, Theses and Patents
Addressing false claims and misperceptions of the UW Center for an Informed Public’s research (2023)
Center for an Informed Public.
March 16
⊖ Abstract
The researchers at the University of Washington’s Center for an Informed Public (CIP) are recognized leaders in the study of rumors, conspiracy theories, and mis- and disinformation. Over the past decade, our research has made significant strides towards understanding and addressing these problems. Unfortunately, some of the projects CIP researchers have contributed to have become the subject of false claims and criticism that mischaracterizes our work, a tactic that peer researchers in this space are also experiencing. As mis- and disinformation researchers, it’s distressing — though perhaps not surprising — to see some of the very dynamics and tactics we study being used to disrupt and undermine our own work and its impact. That includes our work with the nonpartisan Election Integrity Partnership research collaboration that we helped launch in 2020 with the Stanford Internet Observatory and other partners...
⊖ Bibtex
@article{Starbird2023cip-march,
author = {Kate Starbird and Ryan Calo and Chris Coward and Emma Spiro and Jevin D. West},
title = {Addressing false claims and misperceptions of the UW Center for an Informed Public’s research},
journal = {Center for an Informed Public},
volume = {March 16},
year = {2023},
url = {https://www.cip.uw.edu/2023/03/16/uw-cip-election-integrity-partnership-research-claims/}}
How Real-Time Visualizations of Vote Count 'Spikes' Can Lead to Unfounded Allegations of Election Fraud (2022)
Election Integrity Partnership.
November 13
⊖ Abstract
Seemingly sudden changes or "spikes" in vote counts and shifts in vote composition for candidates are commonplace in elections. Although these vote count "spikes" have specific explanations and are not indicative of voter fraud, they have been incorporated into voter fraud allegations and conspiracy theories in prior elections, including 2020, and now, in 2022. Visualizations that show temporal changes in vote shares (e.g., Fig. 1) can be easily misinterpreted due to errors stemming from data sources and data processing pipelines. These errors enable potentially misleading reporting on data visualizations — both intentionally or unintentionally...
⊖ Bibtex
@article{Venkatagiri2022eip,
author = {Venkatagiri, Sukrit and Caulfield, Mike and West, Jevin D},
title = {How Real-Time Visualizations of Vote Count 'Spikes' Can Lead to Unfounded Allegations of Election Fraud},
journal = {Election Integrity Partnership},
volume = {November 13},
year = {2022},
url = {https://www.eipartnership.net/blog/potentially-misleading-data-visualizations-lindell}}
The Long Fuse: Misinformation and the 2020 Election (2021)
Election Integrity Partnership.
March 2
⊖ Abstract
The Election Integrity Partnership was officially formed on July 26, 2020 — 100 days before the 2020 presidential election — as a coalition of research entities who would focus on supporting real-time information exchange between the research community, election officials, government agencies, civil society organizations, and social media platforms. The Partnership was formed between four of the nation’s leading institutions focused on understanding misinformation in the social media landscape: the Stanford Internet Observatory, Graphika, the Atlantic Council’s Digital Forensic Research Lab, and the University of Washington’s Center for an Informed Public. This is the final report of their findings.
⊖ Bibtex
@article{EIP2021,
author = {Center for an Informed Public and Digital Forensic Research Lab and Graphika and Stanford Internet Observatory},
title = {The Long Fuse: Misinformation and the 2020 Election},
journal = {Election Integrity Partnership},
volume = {March 2},
year = {2021},
url = {https://purl.stanford.edu/tr171zs0069}}
Vote Data Patterns used to Delegitimize the Election Results (2020)
Election Integrity Partnership.
November 6
⊖ Abstract
The figure above shows the leading digit of reported vote tallies across select counties. For instance, the final tally in Dane County, Wisconsin was 338,946. This would count for one county in the 3 column. But why would anyone care to look at this kind of frequency distribution? Data forensic experts use these distributions to investigate fraud. They look at whether empirical distributions of leading digits deviate from a special distribution described by Benford’s Law. The law posits that leading digits of numbers are more likely to be smaller numbers than larger numbers...
⊖ Bibtex
@article{Bak-Coleman2020eip,
author = {Bak-Coleman, Joe and Wack, Morgan and Schafer, Joey and Spiro, Emma and West, Jevin D},
title = {Vote Data Patterns used to Delegitimize the Election Results},
journal = {Election Integrity Partnership},
volume = {November 6},
year = {2020},
url = {https://www.eipartnership.net/rapid-response/what-the-election-results-dont-tell-us}}
Uncertainty and Misinformation: What to Expect on Election Night and Days After (2020)
Election Integrity Partnership.
October 26
⊖ Abstract
The U.S. general election, less than two weeks away, is unprecedented in several ways. We are currently in the midst of a devastating pandemic, with infection numbers rising across much of the country. In response to the pandemic, election officials in some states have adapted and modified their voting processes. Many are relying upon mail-in ballots in new ways and at new scales. Even states with universal mail-in balloting are seeing new challenges such as increased turn-out. This has created opportunities for some to question, and diminish trust in, the election processes. We have heard repeated (and mostly unfounded) accusations, sometimes from the U.S. president himself, that mail-in voting will lead to widespread fraud and that the election is “rigged.” Researchers have described this as an elite-driven disinformation campaign and though this campaign is largely shaped by right-wing media and political influencers, its effects aren’t limited to one side of the political spectrum...
⊖ Bibtex
@article{Starbird2020eip-uncertainty,
author = {Kate Starbird and Michael Caulfield and Renee DiResta and Jevin D. West and Emma Spiro and Nicole Buckley and Rachel Moran and Morgan Wack},
title = {Uncertainty and Misinformation: What to Expect on Election Night and Days After},
journal = {Election Integrity Partnership},
volume = {October 26},
year = {2020},
url = {https://www.eipartnership.net/news/what-to-expect}}
"Know the Facts": Do Social Media COVID-19 Banners Help? (2020)
Medium.
April 2
Weaponizing projections as tools of election delegitimization (2020)
Election Integrity Partnership.
⊖ Abstract
Forecasting elections is notoriously difficult. In 2016, several models expressed an all but certain Clinton victory. FiveThirtyEight was an exception among the more established forecasters, giving Trump a seemingly-generous 30%. In the wake of the 2016 election, modellers and pollsters alike pointed fingers, and there has been no shortage of explanations for what went wrong uncovered through post-mortems. Of the many credible explanations, there was no evidence that voter fraud or illegal electoral misconduct was to blame...
⊖ Bibtex
@article{Bak-Coleman2020eip-projections,
author = {Bak-Coleman, Joe and Haughey, Melinda and Schafer, Joey and Wack, Morgan and West, Jevin D},
title = {Weaponizing projections as tools of election delegitimization},
journal = {Election Integrity Partnership},
volume = {November 3},
year = {2020},
url = {https://www.eipartnership.net/rapid-response/weaponizing-projections-as-tools-of-election-delegitimization}}
Uncovering Reality (2020)
Pacific Science Center (Virtual Exhibit)
⊖ Abstract
We’re all looking at a lot of graphs and charts during this pandemic, but which are trustworthy? Check out this virtual exhibit, created by PacSci with the University of Washington’s Center for an Informed Public, to learn more about some of the common ways data visualizations can be accidentally distorted or intentionally manipulated and why that matters.
⊖ Bibtex
@article{Maffia2020pacificsciencecenter-reality,
author = {Maffia, Felicia and Verret, Abby and Wood, Gwendolyn and Gardner, and Bergstrom, Carl T. and West, Jevin D.},
title = {Uncovering Reality},
journal = {Pacific Science Center (Virtual Exhibit)},
year = {2020},
url = {https://view.genial.ly/5eceeffdc42d9b0dba4b9a64/interactive-content-uncovering-reality}}
Facts in the time of COVID-19 (2020)
Pacific Science Center (Virtual Exhibit)
⊖ Abstract
During a pandemic it’s more important than ever to avoid falling for or spreading misinformation and disinformation. But with so much new and changing information, how do you know what to trust? PacSci has teamed up with the University of Washington’s Center for an Informed Public to help you navigate COVID-19 and the 24-hour news cycle.
⊖ Bibtex
@article{Maffia2020pacificsciencecenter-facts,
author = {Maffia, Felicia and Verret, Abby and Wood, Gwendolyn and Gardner, and Bergstrom, Carl T. and West, Jevin D.},
title = {Facts in the time of COVID-19},
journal = {Pacific Science Center (Virtual Exhibit)},
year = {2020},
url = {https://view.genial.ly/5eea3a0c15e1e60d88c5c4d0/interactive-content-facts-in-the-time-of-covid-19}}
Working Papers
Helping Students FIG-ure It Out: A large-scale study of freshmen interest groups (FIGs) and student success (in review)
⊖ Abstract
Freshman seminars are a ubiquitous offering in higher education, but they haven't been evaluated using matched comparisons with data at scale. In this work, we use transcript data on nearly 77,000 students to examine the impact of first-year interest groups (FIGs) on student graduation and retention. We first apply propensity score matching on course-level data to account for selection bias. We find that graduation and re-enrollment rates for FIG students were higher than non-FIG students, an effect that was more pronounced for self-identified Hispanic students and under-represented minority students. We then employ topic modeling to analyze survey responses from over 12,500 FIG students to find that social aspects of FIGs were most beneficial to students. Interestingly, references to social aspects were not disproportionately present in the responses of self-identified Hispanic students and under-represented minority students. Finally, we build supervised machine learning models to predict students' graduation from their survey responses.
⊖ Bibtex
@ARTICLE{Aulck2021figs,
author = {Aulck, Lovenoor and Malters, Joshua and Lee, Casey and Mancinelli, Gianni and Sun, Min and West, Jevin D.},
title = {Helping Students FIG-ure It Out: A large-scale study of freshmen interest groups (FIGs) and student success},
journal = {AERA Open},
volume = {(in review)},
year = {2021}}
Delineating knowledge domains in the scientific literature using visual information (2021)
(in review).
⊖ Abstract
Figures are an important channel for scientific communication, used to express complex ideas, models and data in ways that words cannot. However, this visual information is mostly ignored in analyses of the scientific literature. In this paper, we demonstrate the utility of using scientific figures as markers of knowledge domains in science, which can be used for classification, recommender systems, and studies of scientific information exchange. We encode sets of images into a visual signature, then use distances between these signatures to understand how patterns of visual communication compare with patterns of jargon and citation structures. We find that figures can be as effective for differentiating communities of practice as text or citation patterns. We then consider where these metrics disagree to understand how different disciplines use visualization to express ideas. Finally, we further consider how specific figure types propagate through the literature, suggesting a new mechanism for understanding the flow of ideas apart from conventional channels of text and citations. Our ultimate aim is to better leverage these information-dense objects to improve scientific communication across disciplinary boundaries.
⊖ Bibtex
@inproceedings{Yang2021delineatingvisual,
author = {Yang, Sean and Lee, Poshen and West, Jevin D. and Howe, Bill},
title = {Delineating Disciplines Using Visual Information in Scientific Literature},
booktitle={(in review)},
year = {2021}}
A natural network derived technology classification (in prep)
Frontiers in Physics (Social Physics): The Physics of the Law - Legal Systems Through the Prism of Complexity Science
⊖ Abstract
As long as there has been technology, there have been technology classifications. Underlying these classifications have commonly been criteria such as structure (i.e., similarity based on design or construction), function (i.e., similarity based on operation or purpose), and derivation (i.e., similarity based on common origin). Categories have traditionally run the gamut from energy, mechanical devices, optics, and computer hardware to inorganic chemistry, biotechnology, pharmaceuticals, and computer software. More formally, patent systems worldwide have adopted detailed taxonomies of technological innovations. In addition to national systems native to individual countries, groups of countries have negotiated and adopted transnational systems like the International Patent Classification (“IPC”) and Cooperative Patent Classification (“CPC”) systems. For example, the CPC system is composed of general technological categories: (A) human necessities, (B) performing operations and transporting, (C) chemistry and metallurgy, (D) textiles and paper, (E) fixed construction, (F) mechanical engineering, lighting, heating, weapons, and blasting, (G) physics, (H) electricity, and (Y) new technologies. We use eigenvector centrality, hierarchical graphing, and network analysis to offer a new “natural” technology classification. Using a dataset consisting of approximately 500,000,000 citations among almost 150,000,000 worldwide patent documents, we calculate the relative positions of patent documents embedded within a comprehensive patent citation network. One of the emergent properties of this analysis is a series of thousands of technology “clusters”, from the largest, most inclusive, to the smallest, more exclusive. 
Based on our novel network analytic approach, we find that the worldwide network of patent documents reveals the following high-level technology clusters: computer software, mechanical, computer hardware, biopharmaceuticals, medical devices, inorganic chemistry, electronics, energy, and optics. Furthermore, within each of these super-clusters, there are nested sub-clusters, sub-sub-clusters, sub-sub-sub-clusters, et cetera, for a total of about 80,000,000 distinct technology clusters. This natural technology classification system offers a more objective, empirically justifiable, and replicable approach than previous approaches, including the widely used CPC. We describe our methods for calculating these technology clusters, the nested hierarchical structure of the resulting clusters, and implications for the classification of technology.
⊖ Bibtex
@article{Torrance2021frontiersphysics,
author = {Torrance, Andrew and Friedman, Lisa and West, Jevin D},
title = {A natural network derived technology classification},
journal = {Frontiers in Physics (Social Physics): The Physics of the Law - Legal Systems Through the Prism of Complexity Science},
volume = {(in prep)},
year = {2021}}
Measuring interdisciplinarity without subject categories (in prep)
⊖ Abstract
Interdisciplinary research plays an important role in academic research across and sometimes between the natural sciences, social sciences, and humanities. Universities and funding organizations alike aim to promote and further interdisciplinarity, but interdisciplinary research arguably remains under-appreciated and under-funded. An influential 2005 report from the National Research Council defines interdisciplinary research as a mode of research by teams or individuals that integrates information, data, techniques, tools, perspectives, concepts, and/or theories from two or more disciplines or bodies of specialized knowledge to advance fundamental understanding or to solve problems whose solutions are beyond the scope of a single discipline or field of research practice.
Why scatter plots suggest causality, and what we can do about it (2018)
⊖ Abstract
Scatter plots carry an implicit if subtle message about causality. Whether we look at functions of one variable in pure mathematics, plots of experimental measurements as a function of the experimental conditions, or scatter plots of predictor and response variables, the value plotted on the vertical axis is by convention assumed to be determined or influenced by the value on the horizontal axis. This is a problem for the public understanding of scientific results and perhaps also for professional scientists' interpretations of scatter plots. To avoid suggesting a causal relationship between the x and y values in a scatter plot, we propose a new type of data visualization, the diamond plot. Diamond plots are essentially 45 degree rotations of ordinary scatter plots; by visually jarring the viewer they clearly indicate that she should not draw the usual distinction between independent/predictor variable and dependent/response variable. Instead, she should see the relationship as purely correlative.
⊖ Bibtex
@article{bergstrom2018diamond,
author = {Bergstrom, Carl T and West, Jevin D},
title = {Why scatter plots suggest causality, and what we can do about it},
journal={arXiv},
year = {2018},
pages = {arXiv:1809.09328}}
References Predict Acceptance In Top Computer Science Conferences (in review)
⊖ Abstract
The number of papers submitted to top conferences and journals is growing superlinearly, creating challenges for peer review. Machine learning techniques have been proposed to partially automate the peer review process to manage scale. Existing NLP methods predict acceptance from the paper's content; we consider whether the paper's bibliography could be used to improve prediction performance. Surprisingly, we find that not only can bibliographic references improve prediction performance, but references alone can achieve higher accuracy than state-of-the-art NLP models. Using data from ICLR, we show that accepted papers tend to include more references overall, with more references from recent years, from closely related venues, and from conferences with higher measured impact. Using these results, we show that just two features, the number of references in the last two years and the number of references from the same venue (ICLR), can achieve prediction accuracy within 3% of the best known models. These findings not only demonstrate that any ML-based peer review system should include bibliographic references, but also that references may have a disproportionate influence over peer review decisions.
Show Me Don't Tell Me: Figure Use Associated with Increased Citations (in prep)
⊖ Abstract
We consider the relationship between the use of figures in the biomedical literature and scientific impact. We find that higher use of expository figures is associated with increased citations, suggesting that presenting results visually improves communicability. We extract the figures from 200k papers in PubMed and train a model to classify the figures as diagrams, visualizations, photographs, or tables. We then measure the correlation between the density of each figure type and measures of scientific impact, including raw citation count and more sophisticated measures that consider the link structures between papers. We find that the number of diagrams, plots and tables per page correlates positively with impact, while the number of photographs is negatively correlated. Moreover, we find citations from within the same field tend to be associated with the use of tables to present data, but that citations from other fields are associated with the use of diagrams, suggesting that visual representations improve communication across discipline boundaries. To enable further analysis, we have released the labeled dataset used in this study and a suite of online services to explore the data.
Echo Chambers in Science? (in prep)
⊖ Abstract
This paper examines whether digitization and the rise of integrated academic search engines have transformed how researchers engage with previous literature, a critical component of modern scientific practice. Among technological advancements, we particularly focus on the recent emergence of integrated academic search engines such as Google Scholar, because these services are provided based on proprietary algorithms that actively interfere in authors’ search process. While the impact of general search engines has been widely noted, the effect of academic systems on scientific practice has not been fully examined. Using the comprehensive Web of Science database covering a wide range of publications and the citation links between them, we focus on yearly changes in the citing behavior of researchers in six well-established disciplines between 1999 and 2016. We document three temporal changes in researchers’ behavior. First, researchers’ citations in both disciplines have become more expansive since 2005 and stable after 2010. Second, controlling for a measure of journal prestige, the impact of a paper-based popularity measure, the cumulative previous citation count, has increased in both Sociology and Social Work. Third, more papers published in lower-tier journals are now cited than prior to 2005, and the variability of citation counts among papers published in the same journal has also increased. Based on these three findings, we see some evidence that the digitization of science has democratized the exposure of prior research and weakened journals' role as gatekeepers. Nevertheless, the increasing importance of prior citations suggests a competing trend is also occurring that may create an echo chamber centered on small numbers of highly cited papers.
Mapping mathematical jargon in the scholarly literature (in prep)
⊖ Abstract
Tracing ideas through the scientific literature is useful in understanding the origin of ideas and for generating new ones. Machines can be trained to do this at large scale, feeding search engines and recommendation systems. Citations and text are the features commonly used for these tasks. In this paper, we focus on a largely ignored facet of scholarly papers--the equations. Mathematical language varies from field to field but original formulae are maintained over generations. We extract the mathematical symbols from 323,830 LaTeX source files in the arXiv repository. We compare the symbol distributions across different fields and calculate the jargon distance between fields. We also compare mathematical jargon to similarity measures based on natural language (text) and structural information (citations). When using mathematical jargon, we find a greater difference between the experimental and theoretical disciplines than within these fields. We also find that similar fields tend to cluster together using this method in intuitive ways, suggesting that there is useful information in these distributions that even relatively simple methods can surface. This provides a first step in using equations as a bridge between disciplines, which can be useful in recommending articles within and across these fields and more generally helping to bridge knowledge gaps in science.
Ranking and mapping article-level citation networks (in prep)
⊖ Abstract
Time-directed networks pose a challenge for flow-based methods of network analysis. Such networks are acyclic or nearly acyclic and thus very far from the nearly ergodic structures that flow-based methods are designed to handle. Without suitable modification, flow-directed ranking algorithms such as the Eigenfactor score put too much weight on older documents. Flow-based methods of cluster detection, such as the map equation approach, can fail to resolve important structures. Here we show how flow-directed methods can be modified to avoid these problems and thereby perform well on time-directed networks. To demonstrate the power of the new article-level Eigenfactor metrics, we rank the 1.8 million articles in JSTOR. To illustrate the power of our clustering approach, we create a hierarchical citation map of the JSTOR corpus using the article-level ranking.
Are Trolls Good (At Choosing Valuable Patents)? (in prep)
⊖ Abstract
Patent acquisition entities ("PAEs") (sometimes also referred to as non-practicing entities ("NPEs") or "patent trolls") own rights to many thousands of patents. Moreover, they frequently assert such patents against other entities in infringement litigation. This phenomenon has led some to warn of a crisis caused by PAEs who allegedly abuse and undermine the patent system through the assertion of large numbers of low-value junk patents. Others have suggested that PAEs may play more positive roles, such as providing a market for the sale of patents by individual inventors, thus incentivizing further innovation, salvaging patents from bankrupt firms, or cultivating expertise in the challenging arena of identifying valuable patents. An important threshold question is whether or not PAEs are, in fact, relatively skilled at separating patent wheat from patent chaff. Using the Stanford NPE Litigation Dataset ("Stanford Dataset") in conjunction with two large patent datasets we have already compiled ourselves (that is, (a) all United States ("U.S.") patents issued since 1975 and (b) all U.S. patent litigations since 2000 that have resulted in a published decision), we explore this question.
Measuring interdisciplinarity without subject categories (in prep)
Abstract
Interdisciplinary research plays an important role in academic research across and sometimes between the natural sciences, social sciences, and humanities. Universities and funding organizations alike aim to promote and further interdisciplinarity, but interdisciplinary research arguably remains under-appreciated and under-funded. An influential 2005 report from the National Research Council defines interdisciplinary research as a mode of research by teams or individuals that integrates information, data, techniques, tools, perspectives, concepts, and/or theories from two or more disciplines or bodies of specialized knowledge to advance fundamental understanding or to solve problems whose solutions are beyond the scope of a single discipline or field of research practice.
Pseudocode
Compressed Source Code for the Eigenfactor Calculation (2008)
Eigenfactor.org
PDF
Abstract
Mathematica code for calculating Journal Eigenfactor scores.
Bibtex
@MISC{Bergstrom2008pseudocodeJournals,
author = {Bergstrom, Carl T and West, Jevin D},
title = {Compressed Source Code for the Eigenfactor Calculation},
howpublished = {Eigenfactor.org},
year = {2008}}
Calculating Journal-Level Eigenfactors (2008)
Eigenfactor.org
PDF
Abstract
Pseudocode for calculating Journal-level Eigenfactor Scores and Article Influence Scores.
Bibtex
@MISC{West2008pseudocodeAuthors,
author = {West, Jevin D and Bergstrom, Carl T},
title = {Calculating Journal-Level Eigenfactor Metrics},
howpublished = {Eigenfactor.org},
year = {2008}}
Posters
Is Abortion 'Too Political'? Assessing Use of University Affiliation by Faculty in Popular U.S. Newspapers (2021)
Virtual (May 3-7) PDF
Chromatic Structure and Family Resemblance in Large Art Collections — Exemplary Quantification and Visualizations (2018)
Mexico City, Mexico (June 26-29)
Examining Gender Authorship in Aquaculture Journals (2016)
Bangkok, Thailand (Aug. 3-7) PDF
Visualizing Scholarly Influence Over Time (2016)
Philadelphia, PA (March 20-21) PDF
Surveying Usage of Academic Research in Journalism (2016)
University of Washington (June) PDF
Using Visual Metaphor in Interactive Visualization to Improve Navigation of Complex Data Sets (2014)
University of Washington (November) PDF
Categorical Landscapes: Large Scale Cluster Analysis of Wikipedia Category System over Time (2012)
UW iSchool Research Fair
University of Washington, Seattle, WA
A Novel Method for Ranking the Quality of Cardiology Literature (2009)
Orlando, FL (March) PDF
Traversing Fitness Landscapes by Changing Environments (2009)
Best Poster Award, Gordon Research Conference.
Proctor Academy, NH (July 20-24) PDF
A Top-Down Approach to Discriminate Adaptive Landscape Topology (2007)
Gordon Research Conference.
Proctor Academy, NH (July 22-27) PDF
Ranking and Mapping Scholarly Literature (2007)
Gordon Research Conference.
Proctor Academy, NH (July 22-27) PDF
The Missing Link (2007)
UW Scholarship of Teaching and Learning Symposium.
University of Washington (April) PDF
The evolution of a 'Tragedy of the Commons' in a Host-Pathogen Metapopulation (2006)
Port Townsend, WA (April) PDF
Sophisticated Information Processing in Plants (2005)
Universitatbonn, Florence, Italy (May 17-20) PDF
Problem Solving Dynamics of Stomatal Networks (2004)
Utah State University (June)
Stomatal Networks and Cellular Computation (2004)
Boston, MA (May 17) PDF
The Game of Leaf: Evidence that Stomatal Networks are Cellular Computers (2003)
Santa Fe, NM (May) PDF
Can Stomata Respond as a Reaction Diffusion Model? (2001)
Logan, UT (April)
Conference Workshops
Modifying the Eigenfactor Algorithm for improving interpretability (2014)
Annual Meeting of the Association for Information Science and Technology (ASIST)
The Temporal Dimension in the Study of Knowledge Bases: Approaches to Understanding Knowledge Creation and Representation Over Time (2013)
Proceedings of the American Society for Information Science and Technology
50(1): 1-3, PDF
Acknowledgements
Dual adjacency matrix: exploring link groups in dense networks (2015)
Computer Graphics Forum.
34(3): 311-320. HTML