Policy2Code Prototyping Challenge
Experimenting with Generative AI to Translate Public Benefits Policy into Rules as Code
Overview
The Policy2Code Prototyping Challenge invites experiments to test how generative artificial intelligence (AI) technology can be used to translate government policies for U.S. public benefits programs into plain language and software code.
It aims to explore ways in which generative AI tools such as Large Language Models (LLMs) can help make policy implementation more efficient by converting policies into plain language logic models and software code under an approach known worldwide as Rules as Code (RaC).
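To make the RaC idea concrete, consider how a single eligibility provision might be expressed as executable logic. The sketch below is purely illustrative: the income figures are invented placeholders, not actual program limits, which vary by household size, state, and year.

```python
# Illustrative Rules as Code sketch: a simplified gross income test.
# The figures below are invented placeholders, not real SNAP limits.

PLACEHOLDER_POVERTY_LINE_MONTHLY = {1: 1200, 2: 1600, 3: 2000, 4: 2400}

def passes_gross_income_test(monthly_income: float, household_size: int) -> bool:
    """Return True if income is at or below 130% of the placeholder poverty line."""
    limit = PLACEHOLDER_POVERTY_LINE_MONTHLY[household_size] * 1.30
    return monthly_income <= limit

print(passes_gross_income_test(2500, 3))  # True: 2500 <= 2600 under these placeholders
```

Once a rule exists in this form, it can be tested, reused across screening tools, and compared line by line against the policy text it came from.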
Generative AI technologies offer civic technologists and practitioners an opportunity to save countless human hours spent interpreting policy into code and translating between code languages, potentially expediting the adoption of Rules as Code. What remains unclear is whether generative models are advanced enough to be fine-tuned to translate rules to code easily, or whether a more robust human-in-the-loop or reinforcement learning process is needed. This challenge is a first step toward understanding that.
The challenge is open to technologists, policy specialists, and practitioners who want to collaborate and demonstrate the benefits and limitations of using AI in this context.
This challenge is organized by the Digital Benefits Network (DBN) at the Beeck Center for Social Impact + Innovation in partnership with the Massive Data Institute (MDI), both based at Georgetown University in Washington, D.C.
Challenge Focus
The Policy2Code Prototyping Challenge seeks experiments to test the use of generative AI in the following scenarios:
- The translation of U.S. public benefits policies into software code.
- The translation of U.S. public benefits policies into plain language logic models.
- The translation between coding languages, including converting code from legacy systems for benefits eligibility and enrollment.
- Recommendations of additional scenarios that enhance policy implementation through digitization.
The demos and analysis from this challenge will be made available to the public, allowing practitioners to gain insight into the capabilities of current technologies and to apply the latest advancements in their own projects through open-source code and available resources. We will use GitHub Classroom to coordinate and share starter resources and instructions across the prototyping teams.
Benefits Policy Focus
The challenge is focused on U.S. public benefit programs like SNAP, WIC, Medicaid, TANF, Basic Income, Unemployment Insurance, SSI/SSDI, tax credits (EITC, CTC), and child care.
Demo Day Recap & Videos
Policy2Code Demo Day Recap
A recap of the twelve teams who presented during the Policy2Code Prototyping Challenge Demo Day at BenCon 2024.
Policy Pulse at Policy2Code Demo Day at BenCon 2024
The team introduced "Policy Pulse," a tool to help policy analysts understand laws and regulations better by comparing current policies with their original goals to identify implementation issues.
Code The Dream at Policy2Code Demo Day at BenCon 2024
The team introduced an AI assistant for benefits navigators to streamline the process and improve outcomes by quickly assessing client eligibility for benefits programs.
Hoyas Lex Ad Codex at Policy2Code Demo Day at BenCon 2024
The team explored the performance of various AI chatbots and LLMs in supporting the adoption of Rules as Code for SNAP and Medicaid policies using policy data from Georgia and Oklahoma.
MITRE at Policy2Code Demo Day at BenCon 2024
The team aimed to automate the application of rules efficiently by creating computable policies, recognizing the need for AI tools to convert legacy policy content into automated business rules using Decision Model and Notation (DMN) for effective processing and monitoring.
mRelief at Policy2Code Demo Day at BenCon 2024
The team conducted experiments to determine whether clients would be responsive to proactive support offered by a chatbot and to identify the ideal timing of the intervention.
Nava Labradors at Policy2Code Demo Day at BenCon 2024
The team developed an AI solution to assist benefit navigators with in-the-moment program information, finding that while LLMs are useful for summarizing and interpreting text, they are not ideal for implementing strict formulas like benefit calculations; they can still accelerate the eligibility process by leveraging their strengths in general language tasks.
PolicyEngine at Policy2Code Demo Day at BenCon 2024
The team developed an AI-powered explanation feature that effectively translates complex, multi-program policy calculations into clear and accessible explanations, enabling users to explore "what-if" scenarios and understand key factors influencing benefit amounts and eligibility thresholds.
The RAGTag SNAPpers at Policy2Code Demo Day at BenCon 2024
The team examined how AI, specifically LLMs, could streamline the case review process for SNAP applications to alleviate the burden on case workers while potentially improving accuracy.
POMs and Circumstance at Policy2Code Demo Day at BenCon 2024
The team explored using LLMs to interpret the Program Operations Manual System (POMS) into plain language logic models and flowcharts as educational resources for SSI and SSDI eligibility, benchmarking LLMs with RAG methods for reliability in answering queries and providing useful instructions to users.
Team ImPROMPTu at Policy2Code Demo Day at BenCon 2024
The team developed an application to simplify Medicaid and CHIP applications through LLM APIs while addressing limitations such as hallucinations and outdated information by implementing a selective input process for clean and current data.
Tech2i Trailblazers at Policy2Code Demo Day at BenCon 2024
The team demonstrated an AI-powered solution for Medicaid and CHIP beneficiaries that summarizes policy documentation and offers an interactive chatbot to answer questions about eligibility, benefits, and application processes.
Participating Teams
Thank you to everyone who submitted creative ideas for impactful prototypes!
The following teams are participating in Policy2Code.
PolicyPulse
Government policies often underperform due to a lack of clear understanding and measurement of results, making the implementation of statutes into policies a complex and lengthy process. Shifting societal and economic landscapes, implementation challenges, and resource-intensive manual evaluation methods create a gap between legislative intent and real-world impact, leading to inefficiencies and unmet public needs.
We propose the use of an Adversarial Model using a Large Language Model (LLM) to compare the logic model and code from existing policy implementation with those generated directly from the law/legislation. The tool thus developed will allow policy analysts and program managers to compare the logic of current policies with their original legislative intent, identifying gaps and inefficiencies in policy implementation.
This AI-driven approach will help bridge the gap between policy intent and outcomes, providing valuable insights into policy interpretation and allowing policy specialists to implement policy more efficiently.
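A minimal sketch of what the comparison step could look like, assuming the OpenAI chat API; the file names, model choice, and prompt are illustrative assumptions, not PolicyPulse's actual pipeline.

```python
# Hypothetical sketch of an LLM-based comparison between legislative text
# and an implementation's logic model. File names, model, and prompt are
# illustrative assumptions, not PolicyPulse's actual pipeline.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

statute_text = open("statute_excerpt.txt").read()               # hypothetical file
implemented_logic = open("implemented_logic_model.txt").read()  # hypothetical file

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": (
            "You are a policy analyst. Compare the legislative text with the "
            "implemented logic model and list any eligibility conditions that "
            "differ, are missing, or were added.")},
        {"role": "user", "content": f"STATUTE:\n{statute_text}\n\n"
                                    f"IMPLEMENTATION:\n{implemented_logic}"},
    ],
)
print(response.choices[0].message.content)
```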
Alexander Chen, Senior Consultant, BDO Canada
Jerry Wilson, Senior Analyst, BDO Canada
Craig Marchand, Vice President, BDO Canada Exponential Labs
Emmanuel Florakas, Partner, Technology Data Leader, Consulting
Rule Rangers
We are building on work to develop an innovative platform that streamlines eligibility screenings and application submissions across various benefits and U.S. states. A major goal is to empower subject matter experts to configure campaigns, benefit rules, screening logic, and submission logic with minimal reliance on software engineers.
Preston Cabe, Software Engineering Manager, Benefits Data Trust
Daniel Singer, Senior Product Manager, Benefits Data Trust
Miriam Peskowitz, Software Engineer, Benefits Data Trust
Code the Dream presents: Bennie and the Artificial Jets
Leveraging the power of large language models (LLMs) and generative AI, Code the Dream is developing innovative tools to enhance public benefits systems. Our comprehensive benefits screener, created in collaboration with public benefits partners in North Carolina and inspired by Colorado’s MyFriendBen, aims to streamline access to vital resources.
Tom Rau, Director of CTD Labs, Code the Dream
Louise Pocock, Senior Program Consultant, Immigrant Services, Virginia Office of New Americans
Brian Espinosa, Senior Software Developer, Radiate Consulting
Mohammad Rezaie, Apprentice Software Developer, Code the Dream
Tam Pham, Apprentice Software Developer, Code the Dream
Sofia Perez, Apprentice UI/UX Designer, Code the Dream
Hoyas Lex Ad Codex
Building on our research in Exploring Rules Communication, we seek to understand how different Large Language Models (LLMs) perform key tasks in Rules as Code (RaC) adoption. We plan to provide documentation to help the ecosystem, and in particular government decision makers, better understand how the technologies currently perform and whether public benefits eligibility rules can be managed more efficiently using a generative AI-powered RaC approach. We will test three commercially available chatbots (and their related models) to evaluate how they 1) generate eligibility questions for SNAP and Medicaid in at least five states, 2) write plain language summaries of the logic, and 3) translate policy into code, including which languages are easiest to work with. Based on the performance of the models, we will develop a strategy for fine-tuning the chatbots for the non-code-generation experiments and the code generation LLMs for the code generation experiments. Our initial plan is to provide a small number of examples to the chatbots, have them extend the examples in different ways to increase the training data, and then use all the examples (hand-developed and chatbot-generated) to fine-tune each LLM for each experiment.
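As one illustration of what the fine-tuning data could look like, the sketch below packages hand-developed policy-to-code examples into a chat-format JSONL file of the kind accepted by several hosted fine-tuning APIs; the example content is invented.

```python
# Illustrative sketch: packaging hand-developed (or chatbot-extended)
# policy-to-code examples as chat-format JSONL for fine-tuning.
# The example content is invented for illustration.
import json

examples = [
    {
        "messages": [
            {"role": "system", "content": "Translate benefits policy into Python."},
            {"role": "user", "content": "Households must have gross income at or "
                                        "below 130% of the poverty line."},
            {"role": "assistant", "content": "def gross_income_ok(income, limit):\n"
                                             "    return income <= 1.30 * limit"},
        ]
    },
    # ...more hand-developed and chatbot-generated examples...
]

with open("rac_finetune.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```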
Lisa Singh, Director, Massive Data Institute (MDI), Sonneborn Chair, Professor, Department of Computer Science, Professor, McCourt School of Public Policy
Mahendran Velauthapillai, McBride Professor, CS Department, Georgetown University
Ariel Kennan, Senior Director, Digital Benefits Network (DBN) at the Beeck Center for Social Impact + Innovation
Tina Amper, Community Advisor, Rules As Code Community of Practice, DBN
Steve Kent, Software Developer, MDI
Jason Goodman, Student Analyst, DBN, MPP/MBA Candidate, Georgetown University
Alessandra Garcia Guevara, Student Analyst, DBN, Georgetown University
Kenny Ahmed, MDI Scholar, MDI, Georgetown University
Jennifer Melot, Technical Lead, Center for Security and Emerging Technology (CSET), Georgetown University
Lightbend
LLMs for a better resident experience: The City of South Bend Civic Innovation team and the Starlight team (Lightbend) hope to reduce rejection rates, improve approval rates, and streamline administrative processes for LIHEAP (the Low Income Home Energy Assistance Program) in South Bend, Indiana. By mapping the approval processes for LIHEAP and related energy assistance programs and building LLM-assisted agents, they aim to help social service workers, direct enrollment teams, and outreach and support teams improve net outcomes for eligible applicants.
Shreenath Regunatha, Co-Founder, Starlight
Madi Rogers, Director of Civic Innovation, City of South Bend
Denise Riedl, Chief Innovation Officer, City of South Bend
Catherine Xu, Co-Founder, Starlight
Matthew Williams, First Engineer, Starlight
MITRE AI for Computable Rules
The team is exploring the hypothesis that AI can be used to assess a corpus of policy information to determine its quality and complexity. In addition, the team will leverage AI to develop an efficient method to generate high quality computable rules for machine consumption.
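The team's demo recap above mentions Decision Model and Notation (DMN), an XML-based standard for machine-readable decision logic. As a rough, hedged illustration of the underlying decision-table idea (not MITRE's implementation), a first-hit table in Python might look like this; all conditions and figures are invented:

```python
# Rough Python analogue of a DMN-style decision table with a "first hit"
# policy: the first matching rule determines the outcome. This only
# illustrates the idea; conditions and figures are invented.

DECISION_TABLE = [
    (lambda income, size: income <= 1500 and size >= 3, "eligible"),
    (lambda income, size: income <= 1000, "eligible"),
    (lambda income, size: True, "ineligible"),  # default catch-all rule
]

def decide(income: float, size: int) -> str:
    for condition, outcome in DECISION_TABLE:
        if condition(income, size):
            return outcome
    return "ineligible"

print(decide(1200, 4))  # "eligible" under these invented rules
```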
Matt Friedman, Senior Principal, AI and Policy, MITRE
Abigail Gertner, AI-Enhanced Discovery and Decisions Department Manager, MITRE
Laurine Carson, Senior Principal, Policy Analysis, MITRE
Moe Tun, Systems Engineer, MITRE
Shahzad Rajput, AI Engineer, MITRE
Billal Gilani, Policy Analyst, MITRE
Jenny Trinh, Computable Rules Analyst, MITRE
mRelief
mRelief is dedicated to improving access to social services, offering simplified SNAP applications in 10 states. Its user-friendly applications have led to an average application completion time of under 13 minutes and over 40,000 submissions in 2023. Comparing its success rates against state data obtained through FOIA requests, mRelief showed a 70% application success rate versus the states’ 58%. However, only 49% of clients who start a self-service application complete it; increased engagement via a community-based organization boosts this to 81%.
To address completion rate challenges, mRelief is collaborating with Google.org to develop a GenAI chatbot built with LLMs and Dialogflow, aiming to launch an MVP in South Carolina by August 2024. The team is mindful of generative AI risks and plans to keep a human in the loop at all stages, handcraft training data, and monitor the chatbot to ensure an empathetic tone. They aim to mitigate biases and misinformation by testing the prototype with small populations and creating an AI policy and ethics committee.
mRelief believes their AI training efforts will benefit the broader Policy2Code movement and hopes to share knowledge by participating in the Prototyping Challenge, looking forward to learning from other participants’ risk mitigation approaches.
Dize Hacioglu, CTO, mRelief
Belinda Rodriguez, Product Manager, mRelief
Nava Labradors: Retrieving clear and concise answers to navigators’ public benefit questions
Nava Labs is developing generative AI solutions to support navigators such as caseworkers and call center specialists. Through research with navigators, program beneficiaries, and strategic partners, the team identified multiple use cases in which generative AI shows potential to streamline the application process and enhance the experience of applying for services. Currently, we’re prototyping one use case: an assistive chatbot that responds to navigators’ program and policy questions while they help clients complete eligibility screenings and applications.

The chatbot uses a large language model (LLM) constrained by Retrieval Augmented Generation (RAG) to search policy guidance, retrieve information relevant to the navigator’s question, and summarize the information in plain language. We’ve partnered with Benefits Data Trust (BDT) to develop the chatbot, using BDT navigator feedback to iterate on its design and functionality. The chatbot searches BDT’s internal policy database to answer SNAP questions. Soon, we’ll test the chatbot’s performance retrieving data from public policy documents and supporting additional benefit programs.

Key evaluation metrics include the tool’s appropriateness for government services, user acceptability, administrative burden reduction, output accuracy, and algorithmic fairness. We are sharing findings as we progress via public demo days and articles, and posting code on GitHub.
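A minimal sketch of the RAG pattern described above, assuming the OpenAI API; the policy snippets, model names, and single-chunk retrieval are illustrative simplifications, not Nava's implementation.

```python
# Minimal RAG sketch: embed policy chunks, retrieve the one most similar
# to the navigator's question, and ask an LLM to answer from it alone.
# Snippets and model names are illustrative; this is not Nava's code.
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

policy_chunks = [
    "Gross monthly income must generally be at or below 130% of the poverty line.",
    "Households with an elderly or disabled member may be exempt from the gross income test.",
]

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

question = "Does the gross income test apply to a household with a disabled member?"
chunk_vecs, q_vec = embed(policy_chunks), embed([question])[0]
scores = chunk_vecs @ q_vec / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q_vec))
context = policy_chunks[int(scores.argmax())]  # retrieve the best-matching chunk

answer = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content":
               f"Answer using only this policy text:\n{context}\n\nQ: {question}"}],
)
print(answer.choices[0].message.content)
```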
Martelle Esposito, Partnerships and Evaluation Manager + Nava Labs Co-Lead, Nava PBC
Genevieve Gaudet, Design Director, Research and Development + Nava Labs Co-Lead, Nava PBC
Alicia Benish, Partnerships and Evaluation Lead, Nava PBC
Diana Griffin, Product Lead, Nava PBC
Ryan Hansz, Design Lead, Nava PBC
Yoom Lam, Engineering Lead, Nava PBC
Kanaiza Imbuye, Designer, Nava PBC
Charlie Cheng, Engineer, Nava PBC
Kevin Boyer, Engineer, Nava PBC
PolicyEngine
PolicyEngine’s goal is to improve accessibility and comprehension of public benefits policy for policymakers, researchers, and the public. Presently, complex eligibility rules and benefit calculations in programs like TANF are inaccessible and inconsistently applied across states, which makes benefits hard to understand and hinders evidence-based policymaking. PolicyEngine believes in encoding benefits policy in open-source software to enable policy analysis, benefit calculators, and automation tools.
Their prototype aims to explore whether language models can expedite the process of translating benefits policy into code. They plan to leverage AI to encode TANF rules into PolicyEngine’s Python framework, experimenting with few-shot prompting and fine-tuning language models. They will evaluate the quality of the generated code against expected inputs and outputs using models like GPT-4 and Anthropic’s Claude.
The success of this prototype could benefit state agencies, advocates, think tanks, academics, and TANF participants, making the system easier to navigate and enabling data-driven policymaking. Key risks, such as hallucinated or biased rules, will be mitigated through a human-in-the-loop approach, test-driven development, and transparent sharing of training data and models. PolicyEngine believes that transparent open-source code for policy is essential for accountability.
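As a hedged illustration of the kind of output and test harness this implies, the sketch below encodes a hypothetical TANF-style benefit formula as plain Python, in the spirit of (but not using) PolicyEngine's framework, and checks it against expected inputs and outputs; all parameter values are invented.

```python
# Hypothetical TANF-style benefit formula in plain Python, in the spirit
# of (but not using) PolicyEngine's framework. Parameter values are
# invented; the asserts mirror test-driven checks against expected I/O.

PAYMENT_STANDARD = {1: 300, 2: 400, 3: 500}  # placeholder monthly amounts
EARNED_INCOME_DISREGARD = 0.50               # placeholder disregard rate

def tanf_benefit(household_size: int, earned_income: float) -> float:
    """Benefit = payment standard minus countable income, floored at zero."""
    countable = earned_income * (1 - EARNED_INCOME_DISREGARD)
    return max(PAYMENT_STANDARD[household_size] - countable, 0.0)

# Expected input/output pairs a human reviewer or test suite might assert:
assert tanf_benefit(3, 0) == 500.0
assert tanf_benefit(3, 400) == 300.0  # 400 * 0.5 = 200 countable
assert tanf_benefit(1, 1000) == 0.0   # benefit fully phased out
```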
Max Ghenis, CEO, PolicyEngine
Nikhil Woodruff, CTO, PolicyEngine
Anthony Volk, Engineer, PolicyEngine
Ron Basumallik, Senior Software Engineer, Independent Contributor
The RAGTag SNAPpers
Much attention has been placed on the use of large language models (LLMs) to aid individuals in applying for benefits like SNAP. LLMs can simplify the complex legal language surrounding a program’s eligibility requirements, hopefully making it easier to understand and access one’s benefits. Our goal for this project is to take the same philosophy and apply LLMs to help case workers review SNAP applications.
Each individual SNAP application must (by law) be reviewed by a human, who, with the aid of an eligibility system, ultimately decides whether an application is approved. Case workers are under pressure to process applications quickly, accurately, and in high volumes: they must respond to applications within 30 days, and they must keep their error rates low, as these are publicly reported numbers. In 2023, over 2 million households in Illinois alone received SNAP assistance. We believe an LLM assistant can help case workers 1) process the application by reading in the application PDF, 2) tag areas on the application that provide evidence for accepting or rejecting it, 3) provide a clear paper trail for the human reviewer to ultimately decide on the SNAP application outcome, and 4) be a lightweight, flexible model that accommodates differing eligibility guidelines between states.
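One way to picture step 2 is the model returning structured, citable evidence rather than free text; the schema and prompt below are hypothetical illustrations, not the team's design.

```python
# Hypothetical illustration of step 2: requesting structured, citable
# evidence from an LLM so the human reviewer gets a clear paper trail.
# The schema and prompt are invented; they are not the team's design.
import json

EVIDENCE_SCHEMA = {
    "field": "string, e.g. 'gross_monthly_income'",
    "page": "int, page of the application PDF",
    "quote": "string, exact application text supporting the determination",
    "supports": "'approve' or 'deny'",
}

prompt = (
    "Review the application text below. Return a JSON list of evidence "
    f"objects matching this schema: {json.dumps(EVIDENCE_SCHEMA)}\n\n"
    "APPLICATION TEXT:\n..."  # extracted PDF text would be inserted here
)
# The returned JSON could then be rendered as highlights over the PDF,
# leaving the approval decision itself to the human case worker.
```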
Evelyn Siu, Data Science Researcher, Garner Health
Ellie Jackson, Software Engineer, Harvard Growth Lab
Sophie Logan, Data Scientist, Invenergy
SSI/SSDI POMS Translator
Assessing eligibility for the SSI and SSDI disability benefits programs often requires years of experience working with a complicated, interweaving set of regulations, operations manuals, and rulings. Rules as Code systems can help translate these legal documents into clear logical models that potential beneficiaries can measure themselves against to understand eligibility. However, Rules as Code systems require careful configuration and can be burdensome to create manually. Our team uses manually created, legally informed Rules as Code systems as templates for generative AI systems to create Rules as Code for SSI/SSDI eligibility rules. We will explore basic eligibility rules, pre-medical eligibility, and medical eligibility criteria while contributing to the existing Rules as Code ecosystem, particularly around disability benefits.
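A hedged sketch of the template idea: a manually authored, legally reviewed rule serves as a few-shot example the model is asked to imitate for a new provision; all content below is invented for illustration.

```python
# Illustrative few-shot prompt: a manually authored, legally reviewed
# Rules as Code example is used as a template the model should imitate
# for a new provision. All content here is invented.

TEMPLATE_EXAMPLE = '''\
POMS provision: "An individual's countable resources must not exceed the
resource limit."
Rules as Code:
def resources_ok(countable_resources, resource_limit):
    return countable_resources <= resource_limit
'''

NEW_PROVISION = "An individual must not be engaging in substantial gainful activity."

prompt = (
    "Following the style of the worked example, translate the new POMS "
    f"provision into Python.\n\nEXAMPLE:\n{TEMPLATE_EXAMPLE}\n"
    f"NEW PROVISION: {NEW_PROVISION}\nRules as Code:"
)
print(prompt)
```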
Nicholas Burka, Data Engineer/Data Architect, Beneficial Computing Inc.
David Xu, Engineer, Civic Tech Toronto
Marc Raifman, Benefits Lawyer
Angela Zhou, Assistant Professor, Data Sciences and Operations, University of Southern California (USC)
Team ImPROMPTu
Converting complex public benefit rules into plain language or working software to support the administration of public benefit programs can be a complex, lengthy, and costly process. Errors in this process can result in a number of issues, including eligible recipients being denied benefits and inaccurate benefit amounts being calculated. We believe that policy SMEs will always be the arbiters of correctness of a system implementation. Thus, we should give them tools that guide them in describing the essence of the policies they are experts on for those who are experts in implementation.
We propose a prototype effort to test the use of a Domain Specific Language (DSL) for public benefit programs that enables subject matter experts (SMEs) to accurately communicate their requirements in depth to the teams implementing software to help administer those programs. This prototype effort would test the efficacy of using a simple DSL to better guide the output of an LLM into an intermediate format that can be easily converted to plain language or working software code, whichever is desired. This approach would appropriately place program SMEs as the authoritative source of information used by others to implement program rules as software code or in plain language.
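To make the idea concrete, here is a hedged sketch of what a tiny eligibility DSL and a minimal interpreter might look like; the syntax and rules are invented for illustration and are not Ad Hoc's design.

```python
# Hypothetical sketch of a tiny eligibility DSL an SME might author, plus
# a minimal interpreter. Syntax and rules are invented for illustration;
# this is not Ad Hoc's actual design.

DSL_RULES = """
rule gross_income_test: income <= 1.30 * poverty_line
rule household_size_known: household_size >= 1
"""

def evaluate(rules: str, facts: dict) -> dict:
    results = {}
    for line in rules.strip().splitlines():
        name, expr = line.removeprefix("rule ").split(":", 1)
        # eval() is tolerable in a throwaway sketch; a real DSL would parse
        # expressions safely instead of evaluating arbitrary Python.
        results[name.strip()] = bool(eval(expr, {}, facts))
    return results

facts = {"income": 2000, "poverty_line": 1700, "household_size": 3}
print(evaluate(DSL_RULES, facts))
# {'gross_income_test': True, 'household_size_known': True}
```

The same intermediate representation could then be handed to an LLM to render either plain language summaries or working software code, as the proposal describes.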
Paul Smith, Chief Technologist, Ad Hoc LLC
Deirdre Holub, Director of Engineering, Ad Hoc LLC
Greta Opzt, Senior Data Analyst, Ad Hoc LLC
Mark Headd, Senior Director of Technology, Ad Hoc LLC
Alex Mack, Director of Research, Ad Hoc LLC
Ben Kutil, CTO, Ad Hoc LLC
Clara Abdurazak, Strategic Partnerships, Ad Hoc LLC
Corey Kolb, Senior UX Designer, Ad Hoc LLC
John French, Experience Director, Ad Hoc LLC
Patrick Saxton, Director of Engineering, Ad Hoc LLC
Wryen Meek, Product Lead, Ad Hoc LLC
Tech2i Trailblazers
Navigating the complexities of Medicaid and the Children’s Health Insurance Program (CHIP) can be challenging for many beneficiaries. To address this, we are introducing an innovative AI-powered solution designed to streamline access to essential healthcare information. Our advanced AI technology provides concise summaries of Medicaid/CHIP policies and features an intelligent chatbot to assist users with their inquiries, ensuring they receive accurate and timely information.
- Comprehensive Summarization: The AI system scans vast amounts of Medicaid/CHIP documentation to generate easy-to-understand summaries, tailored to address common questions and concerns and reducing the need for beneficiaries to sift through extensive materials.
- Interactive Chatbot Assistance: The AI-powered chatbot offers real-time assistance, guiding users through their Medicaid/CHIP queries. It can answer a wide range of questions, from eligibility criteria to benefits details and application processes, and provides step-by-step support for common procedures so users feel confident and informed.
- User-Friendly Interface: Designed with accessibility in mind, our interface ensures that users of all ages and technical backgrounds can easily navigate the system.
We envision a future where every Medicaid and CHIP beneficiary has effortless access to the information they need to make informed healthcare decisions. By leveraging cutting-edge AI technology, we aim to demystify healthcare programs, empower users, and ultimately improve public health outcomes.
Sunil Dabbiru, Data Engineer/Data Architect, Tech2i
Bahodir Nigmat, Data Architect, Tech2i
Harshit Raj, Data Engineer, Tech2i
Sergey Tsoy, Solution Architect, Tech2i
Farouk Boudjemaa, Engineer, Tech2i
Truviq
Truviq’s project focuses on utilizing AI technology to develop prototypes for converting government policies into online applications. This aims to reduce the manual and error-prone process of translating policies into code, improving compliance, security, and operational efficiency. By merging policy-to-code approaches with GenAI capabilities, the team aims to shorten the time between policy definition and implementation.
Suneel Sundaraneedi, CEO, Truviq Systems
Winfried Hamer, Customer Success & AI Evangelist, Dutch UWV
Chaitanya PVK, Lead Architect, Truviq Systems
Coaches
The Policy2Code teams are receiving multidisciplinary and cross-sector advice from our four coaches.
- Harold Moore, Director, The Opportunity Project for Cities and Senior Technical Advisor at Beeck Center for Social Impact + Innovation
- Michael Ribar, Principal Associate for Social Policy and Economics Research at Westat
- Andrew Gamino-Cheong, Co-Founder and CTO, Trustible, an AI-Governance Software Company
- Meg Young, PhD, Participatory Methods Researcher, Algorithmic Impact Methods (AIM) Lab at Data & Society Research Institute
Data Use and Code Requirements
- All code should be well-documented.
- All code and documentation used in the prototypes will be open and available for viewing by the public at the completion of the challenge. It may also be incorporated into Georgetown University research.
- Only use policy documentation that is publicly available unless you have permission to share internal documentation.
- Ensure that you have the rights or permission to use the data or code in your prototype.
- Ensure that no personally identifiable information (PII) is shared via your prototype or documentation.
- Please only use a sandbox environment, not production systems.
How Does It Work?
1. Program Launch
Join us for the Policy2Code launch webinar on May 3, 2024 from 1-2pm ET. We’ll provide a full overview of the program and leave time for questions. Questions and answers will be shared on this page following the webinar.
2. Apply
To participate in the Policy2Code Challenge, individuals or teams must complete an application form. The deadline for submission is May 22, 2024. The DBN and MDI will select up to 10 teams to join the challenge.
3. Announcement of Participants
We will announce selected teams the week of June 3, 2024.
4. Monthly Touchpoints
Once selected, participants will have approximately three months to complete their testing.
Throughout the summer, we’ll host monthly touchpoints to share progress, solicit feedback from fellow participants, and receive support from coaches to help problem-solve issues with experiments. Teams are asked to join at least 2 of the 3 monthly touchpoints.
Proposed dates:
- June 25, 2024, 12 noon to 1:30pm ET
- July 23, 2024, 12 noon to 1:30pm ET
- August 22, 2024, 11:30am to 1pm ET
5. Demo Day at BenCon 2024
The challenge will culminate in a Demo Day at the DBN’s signature event, BenCon 2024, scheduled for September 17-18 in Washington, D.C. Participants will present their experiments, prototypes, tinkering, and other developments to the community for feedback, awareness, and evaluation.
Participating teams will be invited to present in-person demos, and a virtual option will also be available. Each member of a demo team will be awarded a $500 honorarium in accordance with the Georgetown University honoraria policy, provided their institution allows it. The coaches and organizers will also award non-monetary prizes for categories such as best use of open tools, best public sector demo, community choice, and design simplicity.
6. Report Findings
In early 2025, the DBN and MDI will co-author and publish a report summarizing challenge activities, documenting findings from the prototypes, and highlighting the suitability of LLM frameworks and usage. Building on findings from the challenge, the public report will also recommend next steps for using cross-sector, multidisciplinary approaches to scaling and standardizing how benefits eligibility rules are written, tested, and implemented in digital systems for eligibility screening, program enrollment, and policy analysis.
Why Participate?
Understand the Application of Generative AI in Policy Implementation
Gain insights into how generative artificial intelligence technologies, specifically Large Language Models (LLMs), can be utilized to translate policy documents into plain language logic models and software code under the Rules as Code (RaC) approach.
Explore the Benefits and Limitations of Generative AI in Policy Translation
Analyze the potential benefits and limitations of using generative AI tools like LLMs for converting U.S. public benefits policies into software code, plain language logic models, and for translating between coding languages, with an emphasis on efficiency and accuracy.
Collaborate Across Disciplines for Policy Digitization
Discover the importance of cross-disciplinary collaboration between technologists, policy specialists, and practitioners across sectors in developing innovative solutions to translate policies into software code and logic models, applying the latest technological advancements.
Enhance Problem-solving Skills Through Experiments
Engage in hands-on experimentation to test scenarios related to policy digitization, such as converting legacy code systems for benefits eligibility, and recommend additional scenarios that can enhance policy implementation through digitization.
Apply RaC Approach in Real-world Applications
Gain practical experience in utilizing the Rules as Code approach to scale and standardize how benefits eligibility rules are written, tested, and implemented in digital systems, leading to a deeper understanding of digital service delivery.
Matchmaking
Are you looking for team members?
We ran an open matchmaking process ahead of the application deadline. We have taken down this information since the application period has closed.
Tools & Resources
We have collected tools and resources that may be useful in prototyping. Listing of a tool does not designate endorsement. If you’d like to recommend a tool or resource be added to the list, please send us a note at rulesascode@georgetown.edu.
Digital Benefits Hub
The Digitizing Policy + Rules as Code page of the Digital Government Hub hosts numerous resources including example projects, case studies, demo videos, research, and more related to Rules as Code.
The Automation + AI page of the Digital Benefits Hub hosts resources on how to responsibly, ethically, and equitably use artificial intelligence and automation, with strong consideration given to mitigating harms and bias.
Rules Frameworks
LLMs + Tools
Policy Manuals
Questions & Answers
Please see below for questions asked by potential participants and answers from the DBN and MDI team. Please send any additional questions to rulesascode@georgetown.edu.
Question 1: I have a question regarding the source material. The website provides links to SNAP manuals and state websites. What should participants consider as the authoritative source for the rules to be used by the generative AI tools? Should it be from an official government website or a third-party source?
Answer 1: All of the mentioned sources are considered acceptable for the Policy2Code challenge. Many nonprofits and private sector organizations develop tools based on these rules, usually referencing original government documents.
Some national policies come from the U.S. Department of Agriculture’s Food and Nutrition Service (FNS). For SNAP policies, the state policy manuals are considered the most reliable source. These manuals can be found in various formats, including dynamic websites and PDFs. (For examples of state rules, see the DBN-MDI report “Exploring Rules Communication: Moving Beyond Static Documents to Standardized Code for U.S. Public Benefits Programs.”)
The SNAP policy manuals link listed on our Policy2Code website references a resource created by the Center on Budget and Policy Priorities. It provides comprehensive access to state SNAP websites and policy manuals. However, if you need information on programs like WIC in a specific state, a simple Google search will yield relevant results. It’s worth noting that the quality and availability of the sources may vary.
Question 2: Will the coaches offer constructive criticism on the social potential for harm of these projects, or will they be focused primarily on technical assistance?
Answer 2: We have not yet publicly announced our coaches for the Policy2Code challenge, but we are actively working on assembling a team of multidisciplinary experts whose guidance will span domains such as technology, policy, AI, and AI ethics. We are considering cross-disciplinary support to ensure comprehensive guidance. At the DBN and MDI, we are deeply focused on understanding the civil rights implications of the tools being developed. We recognize that people’s livelihoods and lives are significantly impacted by benefit systems, particularly with the introduction of AI and automation. Therefore, we are extremely cautious and strive to raise awareness about these implications.
Part of the purpose of the sandbox nature of this work is to showcase where these tools excel and where they may potentially cause harm. By experimenting away from real-life systems currently in use, we can better understand their performance and impact. Additionally, ethical considerations play a pivotal role in our approach. Comparing different approaches, such as open source versus closed source, can shed light on ethical considerations and biases and lead to a better understanding of the tools. It is worth sharing these findings as part of your project’s output.
Question 3: Could you elaborate on the outputs, deliverables, and criteria for success?
Answer 3: We are really interested in trying to create a holistic setup to understand what works well and what doesn’t when it comes to large language models and rules as code. There are various areas where your project can make an impact, and we don’t expect you to figure out every aspect and provide a complete system. Instead, we encourage you to focus on a specific slice that interests you.
For example, if you’re not as experienced in coding, you could explore the types of prompts that are effective for generating specific types of code using chatbots. On the other hand, if you excel in working with large language models, you might want to fine-tune them for SNAP rules or rules with specific properties. You could also investigate how to evaluate the performance of language models.
We have deliberately kept the task formulation broad, allowing you to choose a specific task that aligns with our overarching goal. The role of coaches and check-ins is to ensure that your chosen task fits the overall objective.
In summary, you have a task, you have an experiment, and you have results from that experiment. And then perhaps, you can explore some ethical considerations or policy considerations that are connected to your output.
Question 4: What are the expected actions for participants in this challenge? Are they expected to examine potential use cases and scenarios for applying AI and LLMs in information collection, decision-making, and the execution of public policy?
Answer 4: The challenge is focused on three core scenarios. The first involves translating policy into software code, the second involves translating policy into plain language logic models, and the third involves translating between coding languages, including converting code from legacy systems for integration purposes. While we are open to hearing ideas for other scenarios, our core research is focused on these three.
We are limiting the scope of the challenge to policy-focused scenarios specific to U.S. public benefit systems. While we are considering other programs, we are primarily concentrating on the U.S. context, as we are aware that compared to international colleagues, the Rules as Code concept is less advanced in the U.S. We hope to use this challenge to close that gap and produce new evidence and documentation on the potential benefits of implementing Rules as Code in the U.S.
Question 5: What are the criteria for selecting teams?
Answer 5: We do not plan to release a public-facing rubric. We have tried to keep the application form as low-burden as possible, knowing that these are prototypes and experiments that will change over time.
We would like to see teams with diverse expertise. We’re very interested in teams that consider public sector or benefits perspectives, either within the prototyping team itself or through direct engagement with those stakeholders.
We’ve asked for a description of your intended prototype and how you plan to test it. What technologies are you thinking about using? How are you mitigating potential bias and harms in that system? How are you thinking about who is impacted by a potential system?
We also want to make sure we have a range of solutions, so we are looking for diversity across experiments so that we can learn more as a community. When developing your prototype, it is important not to scope it too large; keep it within a specific scope to ensure you can complete the project successfully.
About the Host Organizations
This challenge is organized by the Digital Benefits Network at the Beeck Center for Social Impact + Innovation in partnership with the Massive Data Institute, both based at Georgetown University in Washington, D.C.
The Digital Benefits Network (DBN) supports government in delivering public benefits services and technology that are accessible, effective, and equitable in order to ultimately increase economic opportunity. We explore current and near term challenges, while also creating space to envision future policies, services, and technologies. The DBN’s Rules as Code Community of Practice (RaC CoP) creates a shared learning and exchange space for people working on public benefits eligibility and enrollment systems — especially those tackling the issue of how policy becomes software code. The RaC CoP brings together cross-sector experts who share approaches, examples, and challenges.
At Georgetown’s McCourt School of Public Policy, the Massive Data Institute (MDI) is an interdisciplinary research institute that connects experts across computer science, data science, public health, public policy, and social science to tackle societal-scale issues and impact public policy in ways that improve people’s lives through responsible, evidence-based research.
Join the Rules as Code Community of Practice
Learn more about Rules as Code and join our community of practice. If you have further questions, email us at rulesascode@georgetown.edu.