Data, Responsibly

PDF posted by: Nozha Boujemaa

Big data technology promises to improve people's lives, accelerate scientific discovery and innovation, and bring about positive societal change. Yet, if not used responsibly, large-scale data analysis and data-driven algorithmic decision-making can increase economic inequality, affirm systemic bias, and even destabilize global markets. While the potential benefits of data analysis techniques are well accepted, the importance of using them responsibly – that is, in accordance with ethical and moral norms, and with legal and policy considerations – is not yet part of the mainstream research agenda in computer science. The Dagstuhl Seminar "Data, Responsibly" brought together academic and industry researchers from several areas of computer science, including a broad representation of data management, but also data mining, security/privacy, and computer networks, as well as social science researchers, data journalists, and those active in government think tanks and policy initiatives. The goals of the seminar were to assess the state of data analysis in terms of fairness, transparency, and diversity, identify new research challenges, and derive an agenda for computer science research and education efforts in responsible data analysis and use. While the topic of the seminar is transdisciplinary in nature, an important goal was to identify opportunities for high-impact contributions to this emergent area specifically from the data management community.

Crying Wolf? On the Price Discrimination of Online Airline Tickets

PDF posted by: Nozha Boujemaa

Price discrimination refers to the practice of dynamically varying the prices of goods based on a customer's purchasing power and willingness to pay. In this paper, motivated by several anecdotal accounts, we report on a three-week experiment conducted in search of price discrimination in airline tickets. Despite presenting the companies with multiple opportunities to discriminate against us, and contrary to our expectations, we do not find any evidence of systematic price discrimination. At the same time, we observe highly volatile prices from certain airlines, which makes it hard to establish cause and effect. Finally, we provide alternative explanations for the observed price differences.
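To make the methodology concrete, here is a minimal sketch of the kind of comparison such an experiment relies on: two otherwise identical client profiles query the same fare repeatedly, and a nonparametric test checks for a systematic gap. The personas, price series, and test choice below are illustrative assumptions, not the authors' actual pipeline.

```python
# Illustrative sketch (not the authors' code): test whether two browsing
# personas are quoted systematically different prices for the same flight.
from scipy.stats import mannwhitneyu

# Hypothetical price series from repeated, time-aligned queries issued by
# two otherwise identical clients.
quotes_a = [412.0, 408.5, 415.0, 409.0, 420.0]  # persona A ("affluent" profile)
quotes_b = [411.0, 409.0, 416.5, 408.0, 419.5]  # persona B (control profile)

# A nonparametric test is preferable here: airline prices are volatile and
# far from normally distributed, consistent with the paper's observations.
stat, p_value = mannwhitneyu(quotes_a, quotes_b, alternative="two-sided")
print(f"U={stat:.1f}, p={p_value:.3f}")
# A large p-value is consistent with the paper's finding of no systematic
# discrimination; time-aligned sampling is essential to separate targeting
# from ordinary price volatility.
```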

Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems

PDF posted by: Daniel Le Métayer

Algorithmic systems that employ machine learning play an increasing role in making substantive decisions in modern society, ranging from online personalization to insurance and credit decisions to predictive policing. But their decision-making processes are often opaque: it is difficult to explain why a certain decision was made. We develop a formal foundation to improve the transparency of such decision-making systems. Specifically, we introduce a family of Quantitative Input Influence (QII) measures that capture the degree of influence of inputs on outputs of systems. These measures provide a foundation for the design of transparency reports that accompany system decisions (e.g., explaining a specific credit decision) and for testing tools useful for internal and external oversight (e.g., to detect algorithmic discrimination). Distinctively, our causal QII measures carefully account for correlated inputs while measuring influence. They support a general class of transparency queries and can, in particular, explain decisions about individuals (e.g., a loan decision) and groups (e.g., disparate impact based on gender). Finally, since single inputs may not always have high influence, the QII measures also quantify the joint influence of a set of inputs (e.g., age and income) on outcomes (e.g., loan decisions) and the marginal influence of individual inputs within such a set (e.g., income). Since a single input may be part of multiple influential sets, the average marginal influence of the input is computed using principled aggregation measures, such as the Shapley value, previously applied to measure influence in voting. Further, since transparency reports could compromise privacy, we explore the transparency-privacy tradeoff and prove that a number of useful transparency reports can be made differentially private with very little addition of noise. Our empirical validation with standard machine learning algorithms demonstrates that QII measures are a useful transparency mechanism when black box access to the learning system is available. In particular, they provide better explanations than standard associative measures for a host of scenarios that we consider. Further, we show that in the situations we consider, QII is efficiently approximable and can be made differentially private while preserving accuracy.
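As an illustration of the Shapley-value aggregation the abstract mentions, here is a small Monte Carlo sketch of marginal input influence for a black-box model. It is a generic estimator under stated assumptions (the toy model, feature names, and sampling scheme are hypothetical), not the paper's QII implementation, which additionally handles transparency queries and differential privacy.

```python
# A minimal sketch of Shapley-style input influence for a black-box model,
# in the spirit of QII (illustrative, not the authors' implementation).
# Features outside the sampled coalition are randomized from a background
# dataset, which is how intervention breaks correlations between inputs.
import random

def shapley_influence(predict, x, background, n_samples=2000):
    """predict: black-box function over feature vectors (lists).
    x: the individual being explained. background: dataset rows used to
    draw replacement values for 'intervened' features."""
    k = len(x)
    influence = [0.0] * k
    for _ in range(n_samples):
        order = random.sample(range(k), k)                    # random permutation
        z = [random.choice(background)[j] for j in range(k)]  # all features randomized
        prev = predict(z)
        for j in order:
            z[j] = x[j]                               # fix feature j to x's value
            cur = predict(z)
            influence[j] += (cur - prev) / n_samples  # marginal contribution
            prev = cur
    return influence

# Toy usage: a made-up scoring rule over (age, income); both feature names
# are hypothetical, chosen to mirror the abstract's loan-decision example.
data = [[25, 30000], [40, 52000], [58, 41000], [33, 75000]]
model = lambda v: 1.0 if 0.5 * v[0] + v[1] / 1000 > 60 else 0.0
print(shapley_influence(model, [40, 52000], data))
```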

Sunlight: Fine-grained Targeting Detection at Scale with Statistical Confidence

PDF posted by: Nozha Boujemaa

We present Sunlight, a system that detects the causes of targeting phenomena on the web – such as personalized advertisements, recommendations, or content – at large scale and with solid statistical confidence. Today's web is growing increasingly complex and impenetrable as a myriad of services collect, analyze, use, and exchange users' personal information. No one can tell who has what data, for what purposes they are using it, and how those uses affect the users. The few studies that exist reveal problematic effects – such as discriminatory pricing and advertising – but they are either too small-scale to generalize or lack formal assessments of confidence in the results, making them difficult to trust or interpret. Sunlight brings a principled and scalable methodology to personal data measurements by adapting well-established methods from statistics to the specific problem of targeting detection. Our methodology formally separates different operations into four key phases: scalable hypothesis generation, interpretable hypothesis formation, statistical significance testing, and multiple testing correction. Each phase can be instantiated with multiple mechanisms from statistics, each making different assumptions and tradeoffs. Sunlight offers a modular design that allows exploration of this vast design space. We explore a portion of this space, thoroughly evaluating the tradeoffs both analytically and experimentally. Our exploration reveals subtle tensions between scalability and confidence. Sunlight's default functioning strikes a balance to provide the first system that can diagnose targeting at fine granularity, at scale, and with solid statistical justification of its results. We showcase our system by running two measurement studies of targeting on the web, both the largest of their kind. Our studies – about ad targeting in Gmail and on the web – reveal statistically justifiable evidence that contradicts two Google statements regarding the lack of targeting on sensitive and prohibited topics.
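The last two phases of the pipeline (significance testing and multiple testing correction) can be sketched in a few lines. The contingency counts, hypothesis names, and the choice of Fisher's exact test with Holm correction below are illustrative assumptions; Sunlight's modular design admits other instantiations.

```python
# A hedged sketch of Sunlight's final phases; the table values and
# hypothesis names are illustrative, not taken from the paper. Each
# hypothesis "ad X is targeted at input Y" becomes a 2x2 contingency
# test over a held-out set of profiles.
from scipy.stats import fisher_exact
from statsmodels.stats.multitest import multipletests

# Hypothetical counts per hypothesis, as a 2x2 table flattened to
# (shown & input present, shown & absent, not shown & present, not shown & absent).
hypotheses = {
    "ad1 <- keyword 'diabetes'": (18, 2, 5, 25),
    "ad2 <- keyword 'travel'":   (10, 9, 11, 10),
    "ad3 <- keyword 'loans'":    (15, 4, 6, 25),
}

p_values = []
for name, (a, b, c, d) in hypotheses.items():
    _, p = fisher_exact([[a, b], [c, d]], alternative="greater")
    p_values.append(p)

# Correct for testing many ad/input pairs at once; Holm's method
# controls the family-wise error rate.
reject, p_adj, _, _ = multipletests(p_values, alpha=0.05, method="holm")
for (name, _), p, r in zip(hypotheses.items(), p_adj, reject):
    print(f"{name}: corrected p={p:.4f}, targeted={bool(r)}")
```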

Cross-Device Tracking: Measurement and Disclosures

PDF posted by: Daniel Le Métayer

Internet advertising and analytics technology companies are increasingly trying to find ways to link behavior across the various devices consumers own. This cross-device tracking can provide a more complete view into a consumer's behavior and can be valuable for a range of purposes, including ad targeting, research, and conversion attribution. However, consumers may not be aware of how and how often their behavior is tracked across different devices. We designed this study to try to assess what information about cross-device tracking (including data flows and policy disclosures) is observable from the perspective of the end user. Our paper demonstrates how data that is routinely collected and shared online could be used by online third parties to track consumers across devices.
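One signal observable from the end-user perspective the study adopts is the set of third parties present in traffic from more than one of a user's devices. The following sketch, with hypothetical domain lists, illustrates that idea; it is not code from the study.

```python
# Illustrative sketch (not from the study): which third parties appear in
# traffic from both of a user's devices, and so are positioned to link them?
# The domain lists are hypothetical examples.
phone_third_parties = {"tracker-a.example", "cdn.example", "ads-x.example"}
laptop_third_parties = {"tracker-a.example", "ads-x.example", "metrics-y.example"}

# Parties seen on both devices could correlate the two if they also receive
# a shared identifier (e.g., a login or an email hash).
cross_device_candidates = phone_third_parties & laptop_third_parties
print(sorted(cross_device_candidates))
```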

Automated Experiments on Ad Privacy Settings

PDF posted by: Nozha Boujemaa

To partly address people's concerns over web tracking, Google has created the Ad Settings webpage to provide information about and some choice over the profiles Google creates on users. We present AdFisher, an automated tool that explores how user behaviors, Google's ads, and Ad Settings interact. AdFisher can run browser-based experiments and analyze data using machine learning and significance tests. Our tool uses a rigorous experimental design and statistical analysis to ensure the statistical soundness of our results. We use AdFisher to find that the Ad Settings page was opaque about some features of a user's profile, that it does provide some choice on ads, and that these choices can lead to seemingly discriminatory ads. In particular, we found that visiting webpages associated with substance abuse changed the ads shown but not the settings page. We also found that setting the gender to female resulted in getting fewer instances of an ad related to high-paying jobs than setting it to male. We cannot determine who caused these findings due to our limited visibility into the ad ecosystem, which includes Google, advertisers, websites, and users. Nevertheless, these results can form the starting point for deeper investigations by either the companies themselves or by regulatory bodies.
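The statistical core of such an experiment can be illustrated with a simple permutation test over ad counts in treatment and control groups. The counts below are hypothetical, and AdFisher's actual analysis is more elaborate (it trains a classifier and permutation-tests its accuracy), so this is only a sketch of the idea.

```python
# A minimal sketch of the kind of significance test AdFisher automates
# (illustrative, not AdFisher's pipeline). We permutation-test whether a
# treatment group of browser agents received a given ad more often than
# controls; all counts are hypothetical.
import random

treatment = [7, 9, 6, 8, 10, 7]   # ad counts, profile gender set to "male"
control   = [3, 4, 2, 5, 3, 4]    # ad counts, profile gender set to "female"

def perm_test(xs, ys, n_perm=100000, seed=0):
    rng = random.Random(seed)
    observed = sum(xs) / len(xs) - sum(ys) / len(ys)
    pooled = xs + ys
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = sum(pooled[:len(xs)]) / len(xs) - sum(pooled[len(xs):]) / len(ys)
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)   # one-sided p-value, add-one smoothed

print(f"p = {perm_test(treatment, control):.4f}")
```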

Statement on Algorithmic Transparency and Accountability

PDF posted by: Nozha Boujemaa

Computer algorithms are widely employed throughout our economy and society to make decisions that have far-reaching impacts, including in education, access to credit, healthcare, and employment. The ubiquity of algorithms in our everyday lives is an important reason to focus on addressing challenges associated with the design and technical aspects of algorithms and on preventing bias from the outset.

Big Data: A Tool for Inclusion or Exclusion?

PDF posted by: Nozha Boujemaa

We are in the era of big data. With a smartphone now in nearly every pocket, a computer in nearly every household, and an ever-increasing number of Internet-connected devices in the marketplace, the amount of consumer data flowing throughout the economy continues to increase rapidly. The analysis of this data is often valuable to companies and to consumers, as it can guide the development of new products and services, predict the preferences of individuals, help tailor services and opportunities, and guide individualized marketing. At the same time, advocates, academics, and others have raised concerns about whether certain uses of big data analytics may harm consumers, particularly low-income and underserved populations. To explore these issues, the Federal Trade Commission ("FTC" or "the Commission") held a public workshop, Big Data: A Tool for Inclusion or Exclusion?, on September 15, 2014. The workshop brought together stakeholders to discuss the potential of big data both to create opportunities for consumers and to exclude them from such opportunities. The Commission has synthesized the information from the workshop, a prior FTC seminar on alternative scoring products, and recent research to create this report. Though "big data" encompasses a wide range of analytics, this report addresses only the commercial use of big data consisting of consumer information and focuses on the impact of big data on low-income and underserved populations. Of course, big data also raises a host of other important policy issues, such as notice, choice, and security, among others. Those, however, are not the primary focus of this report. As "little" data becomes "big" data, it goes through several phases. The life cycle of big data can be divided into four phases: (1) collection; (2) compilation and consolidation; (3) analysis; and (4) use. This report focuses on the fourth phase and discusses the benefits and risks created by the use of big data analytics; the consumer protection and equal opportunity laws that currently apply to big data; research in the field of big data; and lessons that companies should take from the research. Ultimately, this report is intended to educate businesses on important laws and research that are relevant to big data analytics and to provide suggestions aimed at maximizing the benefits and minimizing the risks.

Learning to trust artificial intelligence systems

PDF posted by: Nozha Boujemaa

For more than 100 years, we at IBM have been in the business of building machines designed to help improve the effectiveness and efficiency of people. And we've made measurable improvements to many of the systems that facilitate life on this planet. But we've never known a technology that can have a greater benefit to all of society than artificial intelligence. At IBM, we are guided by the term "augmented intelligence" rather than "artificial intelligence." This vision of "AI" is the critical difference between systems that enhance, improve and scale human expertise, and those that attempt to replicate human intelligence. The ability of AI systems to transform vast amounts of complex, ambiguous information into insight has the potential to reveal long-held secrets and help solve some of the world's most enduring problems. AI systems can potentially be used to help discover insights to treat disease, predict the weather, and manage the global economy. It is an undeniably powerful tool. And like all powerful tools, great care must be taken in its development and deployment. To reap the societal benefits of AI systems, we will first need to trust them. The right level of trust will be earned through repeated experience, in the same way we learn to trust that an ATM will register a deposit, or that an automobile will stop when the brake is applied. Put simply, we trust things that behave as we expect them to. But trust will also require a system of best practices that can help guide the safe and ethical management of AI systems, including alignment with social norms and values; algorithmic responsibility; compliance with existing legislation and policy; assurance of the integrity of the data, algorithms and systems; and protection of privacy and personal information. We consider this paper to be part of the global conversation on the need for safe, ethical and socially beneficial management of AI systems. To facilitate this dialogue, we are in the process of building an active community of thoughtful, informed thinkers that can evolve the ideas herein. Because there is too much to gain from AI systems to let myth and misunderstanding steer us off our course. And while we don't have all the answers yet, we're confident that together we can address the concerns of the few to the benefit of many.

The Ethics of Algorithms: from radical content to self-driving cars

PDF posted by: Nozha Boujemaa

A new kind of object, intermediator, gate-keeper and more has arisen: the algorithm, or the code that operates increasingly ubiquitous computational objects and governs digital environments. Computer chips and other forms of computation are not new; however, the increasing integration of digital connectivity into everyday life, the rise of massive datasets with personal, financial and other kinds of information, and the rise in objects with embedded chips have combined to create a new environment. This environment has been shaped by three developments: advances, especially in machine learning, which allow artificial intelligence, with the help of big data, to perform tasks that were outside its reach just a few years ago; the rise of powerful online platforms such as Google, Amazon or Facebook that mediate social, political, personal and commercial interactions for billions of people and act as powerful gatekeepers; and the incorporation of algorithmic capabilities into other areas of decision-making, ranging from hiring, firing and employment to healthcare, advertising and finance, among many others. In sum, algorithms are increasingly used to make decisions for us, about us, or with us. They are increasingly capable and pervasive. They are now either main or auxiliary tools, or even sole decision-makers, in areas of life ranging from those that did not exist more than a decade ago (what updates and news you should be shown from your social network, as in Facebook's Newsfeed) to traditional areas where decisions used to be made primarily via human judgment, such as healthcare and employment. Significantly, algorithms are rapidly encroaching into "subjective" decision-making where there is no right or wrong answer, or even a good definition of what a "right" answer would look like, without much transparency, accountability, or even a mapping out of the issues. The speed of technological development and corporate and government incentives have overtaken and overshadowed the urgently needed discussion of the ethics and accountability of this new decision-making infrastructure. The concerns that often bring us to thinking about algorithms are both historic and mundane: fairness, discrimination and power. Algorithms, and all complex computational systems, however, operate in ways that make them a new category of objects compared with other institutions, persons or objects, and they have not yet been probed for such concerns. In this report, we identify some of the key areas that require further probing, research and discussion, and that should be taken up by policy-makers, civic actors, citizens and everyone concerned about the ethical, legal and policy frameworks of the 21st century, which can no longer be discussed without incorporating questions of computation. We begin by defining algorithms, in particular those that demand ethical scrutiny. We proceed by illustrating three characteristics of algorithms with cases from a wide variety of fields. In the final section, we address three regulatory responses that have been discussed in response to the challenges posed by algorithmic decision-making. This background paper is the result of a two-day conference on "The Ethics of Algorithms", held in Berlin on March 9 and 10, 2015. The event was jointly organised by the Centre for Internet and Human Rights and the Technical University Berlin, with the support of the Dutch Ministry of Foreign Affairs.
The results presented in this paper will feed into the discussions at the Global Conference on Cyberspace, which will take place in The Hague on 16 and 17 April 2015.

The National artificial intelligence research and development strategic plan

PDF posted by: Nozha Boujemaa

Artificial intelligence (AI) is a transformative technology that holds promise for tremendous societal and economic benefit. AI has the potential to revolutionize how we live, work, learn, discover, and communicate. AI research can further our national priorities, including increased economic prosperity, improved educational opportunities and quality of life, and enhanced national and homeland security. Because of these potential benefits, the U.S. government has invested in AI research for many years. Yet, as with any significant technology in which the Federal government has interest, there are not only tremendous opportunities but also a number of considerations that must be taken into account in guiding the overall direction of Federally-funded R&D in AI. On May 3, 2016, the Administration announced the formation of a new NSTC Subcommittee on Machine Learning and Artificial Intelligence, to help coordinate Federal activity in AI. This Subcommittee, on June 15, 2016, directed the Subcommittee on Networking and Information Technology Research and Development (NITRD) to create a National Artificial Intelligence Research and Development Strategic Plan. A NITRD Task Force on Artificial Intelligence was then formed to define the Federal strategic priorities for AI R&D, with particular attention to areas that industry is unlikely to address. This National Artificial Intelligence R&D Strategic Plan establishes a set of objectives for Federally-funded AI research, both research occurring within the government and Federally-funded research occurring outside of government, such as in academia. The ultimate goal of this research is to produce new AI knowledge and technologies that provide a range of positive benefits to society, while minimizing the negative impacts. To achieve this goal, this AI R&D Strategic Plan identifies the following priorities for Federally-funded AI research:

Strategy 1: Make long-term investments in AI research. Prioritize investments in the next generation of AI that will drive discovery and insight and enable the United States to remain a world leader in AI.

Strategy 2: Develop effective methods for human-AI collaboration. Rather than replace humans, most AI systems will collaborate with humans to achieve optimal performance. Research is needed to create effective interactions between humans and AI systems.

Strategy 3: Understand and address the ethical, legal, and societal implications of AI. We expect AI technologies to behave according to the formal and informal norms to which we hold our fellow humans. Research is needed to understand the ethical, legal, and social implications of AI, and to develop methods for designing AI systems that align with ethical, legal, and societal goals.

Strategy 4: Ensure the safety and security of AI systems. Before AI systems are in widespread use, assurance is needed that the systems will operate safely and securely, in a controlled, well-defined, and well-understood manner. Further progress in research is needed to address this challenge of creating AI systems that are reliable, dependable, and trustworthy.

Strategy 5: Develop shared public datasets and environments for AI training and testing. The depth, quality, and accuracy of training datasets and resources significantly affect AI performance. Researchers need to develop high-quality datasets and environments and enable responsible access to high-quality datasets as well as to testing and training resources.

Strategy 6: Measure and evaluate AI technologies through standards and benchmarks. Essential to advancements in AI are standards, benchmarks, testbeds, and community engagement that guide and evaluate progress in AI. Additional research is needed to develop a broad spectrum of evaluative techniques.

Strategy 7: Better understand the national AI R&D workforce needs. Advances in AI will require a strong community of AI researchers. An improved understanding of current and future R&D workforce demands in AI is needed to help ensure that sufficient AI experts are available to address the strategic R&D areas outlined in this plan.

The AI R&D Strategic Plan closes with two recommendations:

Recommendation 1: Develop an AI R&D implementation framework to identify S&T opportunities and support effective coordination of AI R&D investments, consistent with Strategies 1-6 of this plan.

Recommendation 2: Study the national landscape for creating and sustaining a healthy AI R&D workforce, consistent with Strategy 7 of this plan.

Designing AI Systems that Obey Our Laws and Values

PDF posted by: Nozha Boujemaa

Operational AI systems (for example, self-driving cars) need to obey both the law of the land and our values. We propose AI oversight systems ("AI Guardians") as an approach to addressing this challenge, and to respond to the potential risks associated with increasingly autonomous AI systems. These AI oversight systems serve to verify that operational systems did not stray unduly from the guidelines of their programmers and to bring them back into compliance if they do stray. The introduction of such second-order, oversight systems is not meant to suggest strict, powerful, or rigid (from here on, 'strong') controls. Operational systems need a great degree of latitude in order to follow the lessons of their learning from additional data mining and experience, and to be able to render at least semi-autonomous decisions (more about this later). However, all operational systems need some boundaries, both in order not to violate the law and to adhere to ethical norms. Developing such oversight systems, AI Guardians, is a major new mission for the AI community.
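The architecture is straightforward to illustrate: an oversight layer wraps an operational system's outputs and enforces hard boundaries without micromanaging the system's latitude. The sketch below, with hypothetical names and limits, shows one way such a guardian could sit between a learned controller and the world; the article proposes the concept, not this code.

```python
# A minimal sketch of the "AI Guardian" idea: a second-order checker that
# audits an operational system's decisions against explicit bounds and
# overrides them when they stray. All names and limits are hypothetical.
from dataclasses import dataclass

@dataclass
class Action:
    kind: str
    value: float

def operational_policy(speed_limit: float) -> Action:
    # Stand-in for a learned controller; it may drift as it keeps learning.
    return Action(kind="set_speed", value=speed_limit * 1.15)

def guardian(action: Action, speed_limit: float) -> Action:
    # The guardian does not micromanage; it only enforces hard boundaries
    # (legal limits, ethical norms) and logs interventions for oversight.
    if action.kind == "set_speed" and action.value > speed_limit:
        print(f"guardian: clipped {action.value:.1f} to {speed_limit:.1f}")
        return Action(kind="set_speed", value=speed_limit)
    return action

safe_action = guardian(operational_policy(speed_limit=50.0), speed_limit=50.0)
print(safe_action)
```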

Algorithmic Ideology

Link posted by: Dominique Cardon

This article investigates how the new spirit of capitalism gets inscribed in the fabric of search algorithms by way of social practices. Drawing on the tradition of the social construction of technology (SCOT) and 17 qualitative expert interviews, it discusses how search engines and their revenue models are negotiated and stabilized in a network of actors and interests, website providers and users first and foremost. It further shows how corporate search engines and their capitalist ideology are solidified in a socio-political context characterized by a techno-euphoric climate of innovation and a politics of privatization. This analysis provides a valuable contribution to contemporary search engine critique, which has mainly focused on search engines' business models and societal implications. It shows that a shift of perspective is needed, from the impacts search engines have on society towards the social practices and power relations involved in their construction, in order to renegotiate search engines and their algorithmic ideology in the future.

Can an Algorithm be Unethical?

PDF posted by: Dominique Cardon

In Information and Communication Technologies (ICTs), computer algorithms now control the display of content across a wide range of industries and applications, from search results to social media (Gillespie, 2013). Abuses of power by Internet platforms have led to calls for "algorithm transparency" and regulation (Pasquale, 2014). This paper responds by asking what an analyst needs to know about algorithms in order to determine if an ICT is acting improperly, unethically or illegally. It further asks whether "the algorithm" is a useful object for ethical investigation. The paper briefly reviews the technical history of the term, then performs an ethical analysis of a hypothetical surveillance system to investigate the question of whether it is useful to consider an algorithm "unethical" in itself. It finds that law and policy researchers can in fact employ technical expertise about algorithms and that such expertise might be crucial to make judgments about future ICTs.

Can an Algorithm be Disturbed?

PDF posted by: Dominique Cardon

Within literary and cultural studies there has been a new focus on the "surface" as opposed to the "depth" of a work as the proper object of study. We have seen this interest manifested through what appears to be the return of prior approaches, including formalist reading practices, attention to the aesthetic dimensions of a text, and new methodologies that come from the social sciences and are interested in modes of description and observation. In arguing for the adoption of these methodologies, critics have advocated for an end to what Paul Ricoeur has termed "the hermeneutics of suspicion" and to the various forms of ideological critique that have been the mainstay of criticism for the past few decades. While these "new" interpretations might begin with what was once repressed through prior selection criteria, they all shift our attention away from an understanding of a "repressed" or otherwise hidden object by treating textual features less as signifiers, arrows to follow to some hidden depths, than as interesting objects in their own right. Computer-aided approaches to literary criticism, or "digital readings" (to be sure, not an unproblematic term), have been put forward as one way of breaking from the deeply habituated reading practices of the past, but their advocates risk overstating the case and, in giving up on critique, remain blind to untheorized dimensions of these computational methods. While digital methods enable one to examine radically larger archives than those assembled in the past, a transformation that Matthew Jockers characterizes as a shift from micro to "macroanalysis", the fundamental assumptions about texts and meaning implicit in these tools, and in the criticism resulting from their use, belong to a much earlier period of literary analysis.

Is there an ethics of algorithms?

PDF posted by: Dominique Cardon

We argue that some algorithms are value-laden, and that two or more persons who accept different value-judgments may have a rational reason to design such algorithms differently. We exemplify our claim by discussing a set of algorithms used in medical image analysis: in these algorithms it is often necessary to set certain thresholds for whether, e.g., a cell should count as diseased or not, and the chosen threshold will partly depend on the software designer's preference between avoiding false positives and false negatives. This preference ultimately depends on a number of value-judgments. In the last section of the paper we discuss some general principles for dealing with ethical issues in algorithm design.
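The threshold example can be made concrete with a few lines of code: the same scores, cut at different thresholds, yield different false-positive/false-negative trade-offs, and nothing in the data alone dictates which trade-off is "right". The scores and labels below are toy values, not data from the paper.

```python
# A worked sketch of the paper's central example: the same classifier
# scores, thresholded differently, trade false positives against false
# negatives, so choosing a threshold embeds a value judgment.
scores = [0.10, 0.35, 0.42, 0.55, 0.61, 0.70, 0.88, 0.93]
labels = [0,    0,    1,    0,    1,    1,    0,    1   ]  # 1 = diseased cell

def rates(threshold):
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp, fn

for t in (0.3, 0.5, 0.7):
    fp, fn = rates(t)
    print(f"threshold={t}: false positives={fp}, false negatives={fn}")
# Lowering the threshold flags more healthy cells as diseased (more FPs);
# raising it misses more diseased cells (more FNs). Choosing between these
# errors depends on the designer's value judgments, as the paper argues.
```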
