Protecting Privacy in Libraries as AI Adoption Accelerates

By: Anusha Nasrulai

Like picking what movie to watch, what restaurant to eat at, or where to go on vacation, what we read next is often recommended to us by personalization algorithms. Social media or reading platforms such as Goodreads already process user data to generate recommended content. Recently, the library catalog browsing app, Libby, announced its own book recommendation feature, Inspire Me.

Inspire Me by Libby recommends books to users based on their own prompts or previously saved titles in the app. Originally announced as an optional feature, Inspire Me features prominently at the top of the home screen when users open the Libby app. The feature recommends books available through the catalogs of the libraries with which users have linked accounts. When the feature was first announced, users and libraries showed resistance, voicing concerns about forced AI adoption and diminished patron privacy. OverDrive, the parent company of Libby, states that readers’ personally identifying data and reading activity are not provided to the AI model.

Libraries work with vendor platforms, distributors, and publishers to deliver library services, particularly for e-materials. Despite popular backlash, vendors are expanding development of AI integrations. OverDrive CEO Steve Potash has announced goals to use AI to “match users to content across its platforms,” which also include the streaming platform Kanopy and the K-12 education platform Sora. Other subscription vendor companies, such as OCLC, EBSCO, and Clarivate, have introduced AI features for content recommendation, enhanced search, text summaries, and AI-generated research assistants. Beyond externally marketed AI tools, vendors are incorporating AI into their internal workflows for “building, improving, and refining products.” Libraries must now balance their duty to protect patron data and privacy against their role in providing access to digital resources.

Legal Regulations

The integration of AI by vendor platforms poses new privacy considerations for libraries. AI introduces new risk points at data collection, processing, training, and deployment.

The United States currently has no comprehensive federal AI or data privacy law. Instead, states have passed dozens of laws regulating certain AI use cases. As of now, six states have passed cross-sectoral AI governance laws that apply to commercial entities. Vendors are likely subject to state-level AI and data privacy laws that target commercial entities. Libraries can leverage these regulations to negotiate with vendors for stronger privacy protections. Trends in AI regulation show that states are increasingly passing and updating AI legislation amid legal challenges and an absence of federal regulation.

AI Governance and Contracting

In light of legal uncertainties, contracts and licenses are a key opportunity for imposing guardrails on AI use. These agreements address how vendors and third parties can collect, process, and disclose user data.

More often than not, vendor agreements do not explicitly disclose internal use of AI tools or AI model training. The research and policy organization Library Futures and staff attorney Layla Maurer presented on this issue, flagging that broad language around operational mechanisms and data usage may permit vendors to train and deploy AI models using patron and institutional data. When reviewing vendor contracts for AI usage, libraries should focus on:

  • Vendor’s rights around data use and sharing, including with third parties. Use of patron data for “analytics” or “development and improvement of services” may include AI training.
  • References to third-party applications or tools, processors, or contractors necessary to carry out services under the agreement.
  • Whether there is a defined data retention period and what happens to patron data when the contract ends.

Libraries can strengthen contract terms by including language requiring compliance with applicable federal and state laws, as well as with industry standards such as ISO and NIST. In addition, libraries may negotiate with vendors to:

  • Define user rights to data, including the right to opt out of nonessential data collection and the right to delete their data.
  • Limit secondary uses of data, including for training internal or external AI tools.
  • Disclose third-party partners and whether data is shared with or sold to third parties.
  • Conduct privacy and security audits.
  • Establish a data retention period and a protocol for destroying data at the end of the retention period.

As attorney Layla Maurer put it, “Updating contract language to allow flexibility around software development needs while retaining safeguards for what the licensee… wants to protect is not just an expeditious way to reach an agreement with a software vendor, it’s also a strategy that helps ensure the licensee can continue to safely use the software despite future legislative changes provided the vendor updates their software in a manner consistent with the intent of the legislation.”

Future-proofing

Digital lending and services are a popular means of accessing library materials, but they also raise new challenges for protecting patron privacy. As AI becomes embedded in these services, libraries need to adopt AI guardrails in contracting to manage the harms and opportunities related to AI use in libraries, particularly around privacy.

E-Lending Challenges and Libraries’ Mission to Ensure Information Access for All

By: Anusha Seyed Nasrulai

Library services have transformed from being primarily administered in the physical library space to providing library card holders with access to a broad range of digital materials, including ebooks, audiobooks, research, music, film, and more. When digital materials first entered the market, they posed great opportunities to increase the availability and accessibility of library collections. Libraries have adjusted their acquisitions and curation efforts to accommodate an increased demand for digital materials. At the same time, publishers and vendors have repackaged their products to drive profits in response to the demand by raising ebook costs to exorbitant rates. Libraries are “typically required to pay 3–4 times the consumer price for an ebook or audiobook license of a popular title.” Also, many publishers have replaced perpetual licenses with time-limited licenses. Publishers further control the market by restricting “how many copies libraries can have, who they can lend to, and how long they (and their patrons) can keep the books.” This has led to library budgets being consumed by licensing costs.

The e-lending marketplace presents multiple challenges to libraries’ longstanding commitment to ensure access to information for all. Digital materials are many patrons’ primary method of accessing information. For example, digital formats are essential resources to patrons with “vision impairment, dyslexia, and other physical or learning needs.”

Libraries are at the mercy of vendors that control access to vital digital materials. About five companies control publishing and dominate the industry for licensing digital materials to libraries. Some companies have business enterprises beyond academic information, including the use and sale of personal and financial information. Thomson Reuters and RELX Group (parent companies of Westlaw and LexisNexis, respectively) not only dominate the legal research market, but they also own some of the largest news and academic databases and are data brokers that sell to private entities and law enforcement agencies. Sarah Lamdan, former CUNY law librarian and professor, now ALA director, described the digital information market landscape as a monopoly of information markets, which raises significant ethical and privacy concerns.

Libraries Respond to Market Shifts

The rest of this article examines the implications of the market shift to digital materials for libraries and their patrons, focusing on ownership rights, open source projects, and patron privacy. In response to vendors’ overwhelming control of the digital information marketplace, libraries and researchers are developing solutions to ensure information access for all.

Ownership Rights

Libraries own their physical books and control lending access to them under the first sale doctrine. The doctrine (17 U.S.C. § 109(a)) “gives the owners of copyrighted works the rights to sell, lend, or share their copies without having to obtain permission or pay fees.” However, first sale does not cover digital transmissions, including ebook acquisitions. Publishers create license agreements in partnership with vendors, who then license the materials to libraries. Margaret Chon, a law professor at Seattle University, argues that high prices and restrictive lending practices undermine the special position libraries have historically held in the copyright system as institutions protecting and facilitating public access to copyrighted works.

Without copyright reform, libraries are often at the mercy of vendors’ licensing models. In response, libraries have developed comprehensive strategies to negotiate with vendors and to select vendors that align with their mission. Still, “the contract-law focused world of copyright for digital content is much more heavily weighted to the benefit of publishers and to the greatest extent possible.” Therefore, libraries have sought legal reforms as one of the solutions to address the modern digital information marketplace.

ReadersFirst is an organization of almost 300 libraries dedicated to maintaining open and free access to ebooks as collections are increasingly digitized. ReadersFirst advocates for ebook legislation to prevent content restrictions, prohibitively high license prices, and the use of licenses to override important parts of copyright law, such as fair use. This past summer, Connecticut passed an ebook bill, and other states have introduced similar legislation. This bill will be watched carefully, since similar legislation in Maryland and New York has been undone by copyright challenges.

Open Source Projects

During the COVID-19 pandemic, Internet Archive launched the National Emergency Library (NEL). NEL was a continuation of a previous online project where scans of physical library books were “checked out” to people as though they were physical books. In Hachette v. Internet Archive, publishers successfully challenged NEL’s temporary lifting of the one-person limit on lending. Though this case did not involve a traditional library, it does call into question whether controlled digital lending practices by libraries are vulnerable.

To protect library projects that expand access to digital materials, new industry standards are being proposed. Controlled digital lending (CDL) protections allow libraries to lend, preserve, and archive digital materials. Currently, a new NISO consensus framework is being developed to support CDL in libraries, with the goal of expanding “understanding of CDL as a natural extension of existing rights held and practices undertaken by libraries for content they legally hold.”

The ability to curate and share open source resources furthers libraries’ goal of ensuring information access for all. Research guides are an important example of library open source projects. They are collections of high-quality, relevant resources on a given topic. Resources include articles, books, media, databases, special collections, exhibits, and programs. Kara Phillips, director of the Seattle University Law Library, stated that research guides “respond to important issues so that patrons can find reliable, authoritative information… [to] support democracy, rule of law, and the legal system.”

Patron Privacy

As vendors adapt to the competitive digital information marketplace, the change in business models has increased their appetite for patron data. As Roxanne Shirazi, a research librarian at CUNY, puts it, “[a]s lenders, library vendors do not end their relationships with libraries when they complete a sale. Instead, as streaming content providers, vendors become embedded in libraries. They are able to follow library patrons’ research activities, storing data about how people are using their services.”

Only a handful of states protect readers’ data outside of libraries. For example, California’s Reader Privacy Act safeguards readers’ data when they access physical books or ebooks. Therefore, ensuring patron privacy and holding vendors accountable to ALA privacy standards are central to libraries’ mission.

The Path Forward for Libraries

Librarians and other stakeholders are organizing to address the profound problems that have arisen from changes in the e-lending market. In providing guidance regarding digital access, the American Library Association states, “[i]n order to have a functional democracy, we must have informed citizens. Libraries are an essential part of the national information infrastructure, providing people with access and opportunities for participation in the digital environment, especially those who might otherwise be excluded.”

The Freedom to Inquire: Data Privacy Lessons from Libraries

By: Anusha Seyed Nasrulai

“All people, regardless of origin, age, background, or views, possess a right to privacy and confidentiality in their library use. Libraries should advocate for, educate about, and protect people’s privacy, safeguarding all library use data, including personally identifiable information.”

These are the words enshrined in the last article of the American Library Association’s (ALA) Library Bill of Rights. The ALA first adopted principles protecting the freedom of inquiry in 1939 in response to concerns about government censorship and surveillance amid a moral panic against anarchists. In subsequent decades, the Library Bill of Rights was amended and interpreted to champion intellectual freedom during McCarthyism, the Civil Rights Movement, and the post-9/11 era.

The Legal Right to Data Privacy 

Recognition of the freedom of inquiry in libraries developed at the same time as a legal right to privacy was being conceptualized. In 1890, lawyers Samuel Warren and future Supreme Court Justice Louis Brandeis first defined a legal right to privacy in a famous law review article. Still, a legal right to privacy was not widely recognized until 1965 in Griswold v. Connecticut. There is currently no comprehensive federal data privacy law, resulting in a patchwork of sectoral and state data privacy laws. However, libraries’ privacy principles obligate them to expand the privacy rights afforded to patrons beyond what the law requires. Examining libraries’ data privacy principles offers important lessons for envisioning new legal data privacy frameworks.

Libraries’ responsibility to protect patron privacy and confidentiality is, in fact, recognized by the law. Forty-eight states protect the confidentiality of patron records, and the attorneys general in the other two states have recognized the privacy of patrons’ library records.

Libraries’ Approach to Data Privacy 

Precise definitions are required to understand these principles. For libraries, the right to “privacy is the right to open inquiry without having the subject of one’s interest examined or scrutinized by others.” Confidentiality is the libraries’ duty to keep personally identifiable information private on patrons’ behalf. Personally Identifiable Information (PII) is information that can be used to identify a specific person.

Data Privacy Policies 

Only 19 states have passed comprehensive privacy laws. Rights recognized under state laws may include the right to request data for correction or deletion, the right to opt out of certain processing and sales, the prohibition on discrimination for exercising rights under the law, notice and transparency requirements, and data purpose and processing limitations. The state laws typically apply only to for-profit businesses that meet high thresholds for gross revenue and amount of business activity in the state. Library policies, by contrast, protect patron data from both private and government requests. State laws are also limited by their enforcement mechanisms. Many state privacy laws rely on enforcement by attorneys general rather than creating a private right of action.

In addition to complying with privacy laws, library privacy policies are developed with guidance from the ALA’s Privacy Interpretation of the Library Bill of Rights and NISO Consensus Principles on Users’ Digital Privacy in Library, Publisher, and Software-Provider Systems. Libraries have a duty to create and maintain clear, easily accessible, and understandable privacy policies for all patrons. Privacy policies include information on what data is collected, who the data is shared with, and how long the data is retained. PII should only be collected and stored when required for specific, clearly disclosed purposes and only with the patron’s consent. Users should have the right to access their own personal information or activity data so they can review and export it and request its correction or deletion. Libraries should process these requests wherever operationally feasible.

Libraries practice data minimization, meaning libraries only collect personal data necessary for an operational purpose. Libraries default to practices such as purpose limitation and opting users out of nonessential data collection. Patrons should have an opportunity to give explicit consent so they can make an informed decision whether to agree to the collection of their data for nonessential purposes. Patrons should also be able to opt out at any time. For instance, some libraries let patrons opt in to a saved history of their checked-out books; otherwise, this data is deleted by default.
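
The opt-in default described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not a description of any real library system: the PatronAccount class, its fields, and the helper functions are all hypothetical names invented here. The sketch shows a checkout history that is kept only after explicit opt-in and is erased when the patron opts out.

```python
from dataclasses import dataclass, field


@dataclass
class PatronAccount:
    """Hypothetical patron record; every name here is illustrative."""
    patron_id: str
    keeps_history: bool = False  # opt-in: history is off unless the patron enables it
    history: list = field(default_factory=list)


def record_return(account: PatronAccount, title: str) -> None:
    """Called when a loan ends. The title is retained only for opted-in patrons."""
    if account.keeps_history:
        account.history.append(title)
    # Otherwise nothing is stored: the loan record is dropped by default.


def opt_in(account: PatronAccount) -> None:
    """Explicit consent to keep a reading history from now on."""
    account.keeps_history = True


def opt_out(account: PatronAccount) -> None:
    """Withdraw consent at any time and erase what was previously saved."""
    account.keeps_history = False
    account.history.clear()
```

In this sketch the privacy-protective behavior is the default path: a patron who does nothing leaves no reading history behind, and opting out removes earlier entries rather than merely pausing collection.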

Libraries’ privacy policies often reflect a deep commitment to patron trust. As Mustafa Hassoun, a privacy attorney at Hillis Clark Martin & Peterson, noted, “Libraries always strive to do right by their patrons.” He works with libraries across Washington state and emphasized that “this commitment to patron trust and data stewardship continues even in the absence of broader legislation like the People’s Privacy Act, which would significantly expand data protection requirements in Washington.”

Vendor Partners 

Libraries aim to hold vendor partners, such as publishers and software providers, accountable to their data privacy principles where possible. Vendors are obligated to make their data use policies accessible to patrons. Libraries also carefully consider patrons’ privacy before entering data sharing agreements with vendors. The ALA’s Privacy Interpretation guides libraries to never share patrons’ PII with vendors unless they have explicit patron permission or are required to under law or existing contract. When such information is shared, “any data collected for analysis should be anonymous or aggregated, it should never be linked to personal information.” Finally, libraries procuring new technologies are advised that “[b]iometric technologies, like facial recognition, do not align with the library’s mission of facilitating access without unjust surveillance.”

The library community has developed processes and resources to negotiate contracts that align with its privacy principles. This is significant given that readers often lack insight into how vendors use their data. Vendor partners may also have strong incentives to collect and aggregate as much user data as possible.

Complying with Law Enforcement 

The ALA guides library workers to consult with their library administration and legal counsel before complying with law enforcement. Records are to be shared only in response to a properly executed court order or legal process. “If a library worker is compelled to release information by a valid subpoena or court order,” they are instructed to personally retrieve the requested information rather than “allowing the law enforcement agency to perform its own retrieval [which] may compromise confidential information that is not subject to the current request.” 

Libraries have chosen to comply strictly with the boundaries of the law, balancing their strong interest in protecting patron privacy against their obligation to follow legal orders. As Jonathan Franklin, a Digital Innovation Law Librarian at the University of Washington, puts it, “In a world where all data is seen as having value, it might be that the easiest path is to delete nothing and sell/use everything, so protecting privacy over profits takes extra-effort.” Companies or other entities may have different incentives for more broadly collaborating with law enforcement. Companies like Ring, Flock, and many others are directly partnering with law enforcement to share data that facilitates surveillance of customers and the broader public.

Looking Forward: Lessons and Challenges

Libraries provide important insights regarding how to enact data privacy principles and policies that champion people’s freedom of intellectual exploration and expression. As data privacy law continues to develop and transform, these lessons from libraries exemplify how data privacy principles can be enacted to uphold people’s privacy and civil liberties.

The privacy ideals of libraries are constrained by the realities of limited resources and funding. One study found that libraries face significant challenges when upholding patron privacy due to lack of technical knowledge and training among staff, as well as inadequate funding for training or privacy protection tools. Many of the data privacy studies and resources developed by and for librarians are funded by Institute of Museum and Library Services (IMLS) grants. The current administration is attempting to dismantle IMLS, though that is being challenged in court. Despite these pressures, libraries carry on an almost century-long tradition of protecting patrons and their data from censorship and surveillance.

As C. Allison Sills, an instructional librarian in North Carolina, aptly stated, “Invasion of privacy by retaining patron checkout history is tantamount to book banning. If you surveil the populace, the populace will start to self-censor to prevent ‘potential’ discrimination, which starts the fear cycle.”

Shelving the dream of an online library? Hachette v. Internet Archive goes to the Second Circuit

By: Zachary Blinkinsop

The opening chapter: COVID-19 and the National Emergency Library.

With the COVID-19 lockdowns of early 2020 slamming library doors shut, students and researchers found themselves struggling to access critical educational materials. Libraries, like many institutions, scrambled to adapt to the unprecedented challenges posed by the pandemic. Many librarians responded by espousing the use of copyrighted materials in remote education and research. They cited the doctrine of fair use, which permits certain uses of copyrighted materials without a license or permission from the rightsholder. Fair use can protect the use of copyrighted materials in a range of contexts, including in research, education, news reporting, and criticism.

The main character in today’s story, an online library, may have pushed the limits of fair use too far. Even before the pandemic, the Internet Archive ran a digital library in compliance with the principles of controlled digital lending. Controlled digital lending (CDL) is a novel legal framework that would permit libraries to digitize their physical books and to lend those digital copies in a manner analogous to traditional lending practices. Under the CDL framework, a library needs to maintain an “owned to loaned” ratio, lending only as many digital copies of an item as it legally owns. The legal theory of CDL had been largely untested, and legal scholars held a wide range of opinions about whether courts would broadly hold CDL to comport with fair use.
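
The “owned to loaned” rule can be made concrete with a small Python sketch. This is a simplified illustration under stated assumptions, not the Internet Archive’s software or any library’s actual system: the CdlTitle class and its methods are hypothetical names invented here, and real CDL programs involve further controls (such as loan periods and copy protection) that are left out.

```python
from dataclasses import dataclass, field


@dataclass
class CdlTitle:
    """One title in a hypothetical controlled digital lending system."""
    owned_copies: int  # copies of the book the library legally owns
    borrowers: set = field(default_factory=set)  # patrons with an active digital loan

    def checkout(self, patron_id: str) -> bool:
        """Lend a digital copy only while active loans stay at or below owned copies."""
        if patron_id in self.borrowers:
            return True  # this patron already holds a loan; nothing changes
        if len(self.borrowers) >= self.owned_copies:
            return False  # every owned copy is already on loan
        self.borrowers.add(patron_id)
        return True

    def checkin(self, patron_id: str) -> None:
        """End a loan, freeing a copy for the next patron."""
        self.borrowers.discard(patron_id)
```

With one owned copy, a second patron’s checkout is refused until the first loan is returned. Removing the capacity check in checkout would let any number of patrons borrow the same title at once, which is the kind of simultaneous lending the ratio is meant to prevent.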

In March of 2020, the Internet Archive launched its National Emergency Library to support “emergency remote teaching, research activities, independent scholarship, and intellectual stimulation” while schools and libraries were closed due to the pandemic. It temporarily allowed multiple users to check out the same digital copy simultaneously, disregarding the “owned to loaned” ratio prescribed by the CDL framework. This sparked controversy.

The plot thickens: a lawsuit filed.

Publishers had already been taking aim at controlled digital lending programs. The Authors Guild argued that “copyright law does not support the practice of even true, traditional libraries offering unauthorized scans of books to its users on an e-lending basis…” The National Emergency Library’s flouting of CDL’s permissive framework crossed an implicit red line for publishers. In June of 2020, Hachette, HarperCollins, Wiley, and Penguin Random House sued the Internet Archive in the Southern District of New York for “willful mass copyright infringement.” In their complaint, publishers eviscerated the underpinnings of CDL, “the rules of which,” they wrote, “have been concocted from whole cloth and continue to get worse.”

In its response, the Internet Archive insisted that the National Emergency Library qualified under fair use as it offered a noncommercial, educational service to the public during a national emergency. It further maintained that a digital library should be treated like a traditional library: “Contrary to the publishers’ accusations, the Internet Archive and the hundreds of libraries and archives that support it are not pirates or thieves. They are librarians, striving to serve their patrons online just as they have done for centuries in the brick-and-mortar world.”

The future of controlled digital lending and the viability of online libraries were at stake in the case.

How does fair use apply to controlled digital lending?

Section 107 of the Copyright Act directs courts to consider four factors when evaluating a fair use defense to a claim of copyright infringement. A court must balance (1) the purpose and character of the use, including whether it innovates in any way and whether it is for a commercial or non-profit purpose; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and (4) the effect of the use upon the potential market for or value of the copyrighted work. Courts adjudicate fair use claims on a case-by-case basis; an activity that qualifies as fair use in one set of circumstances may not qualify under a different set of facts.

Capitol Records v. ReDigi, a case decided in 2013, foreshadowed the outcome of Hachette v. Internet Archive. ReDigi was a service that facilitated the resale of digital files originally purchased from the iTunes store. In that case, the court held that ReDigi’s resale of digital music files fell “well outside the fair use defense.” Running through the four-factor test, the court found that (1) uploading and distributing digital files for sale does not add anything new to a copyrighted work; (2) copyright protections are intended to protect musical recordings; (3) transmitting a work in its entirety usually negates a fair use defense; and (4) ReDigi’s sales obviously undercut Capitol Records’ profits.

Although ReDigi’s marketplace was commercial in nature, an obvious difference from the nonprofit intent of the National Emergency Library, the other facts broadly aligned. The National Emergency Library arguably did not innovate the use of copyrighted books. Copyright protections clearly protect rightsholders’ interests in published books. The books offered through the National Emergency Library were transmitted in whole, and this arguably undercut the publishers’ profits from ebook sales.

An open-and-shut case? The Second Circuit enters the plot.

Judge John G. Koeltl of the Southern District of New York held that the Internet Archive’s National Emergency Library failed all four factors of the fair use test. He wrote in his opinion that “IA’s fair use defense rests on the notion that lawfully acquiring a copyrighted print book entitles the recipient to make an unauthorized copy and distribute it in place of the print book, so long as it does not simultaneously lend the print book. But no case or legal principle supports that notion. Every authority points the other direction.” The opinion was a resounding victory for publishers.

The Internet Archive promised to continue fighting. The founder of the Internet Archive, Brewster Kahle, framed the case as a battle for free access to information within a wider war for global democracy: “Libraries are more than the customer service departments for corporate database products. For democracy to thrive at global scale, libraries must be able to sustain their historic role in society—owning, preserving, and lending books.”

On December 15, 2023, the Internet Archive filed its opening brief with the U.S. Court of Appeals for the Second Circuit. In the brief, the Internet Archive asks the Court to reverse the lower court’s decision and to hold that its controlled digital lending is fair use. The Internet Archive argues that the lower court erred in applying the four-factor test because the court “failed to grasp the key feature of controlled digital lending: the digital copy is available only to the one person entitled to borrow it at a time, just like lending a print book.” The Internet Archive says that the court’s misunderstanding particularly tainted its analysis of the first and fourth factors. For example, it argues that the court’s analysis of the fourth factor did not take into account expert testimony indicating that “lending is not a substitute for Publishers’ ebooks and has no effect on Publishers’ markets.”

The Second Circuit’s decision in this case will shape the future of controlled digital lending and the ongoing debate surrounding fair use and access to information in the digital era. Librarians, publishers, and legal scholars will be watching closely, waiting for the next major development in the application of fair use to a rapidly evolving digital world.

Stay tuned for the next chapter in this story.