Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems

PDF Posted by: Daniel Le Métayer

Algorithmic systems that employ machine learning play an increasing role in making substantive decisions in modern society, ranging from online personalization to insurance and credit decisions to predictive policing. But their decision-making processes are often opaque—it is difficult to explain why a certain decision was made. We develop a formal foundation to improve the transparency of such decision-making systems. Specifically, we introduce a family of Quantitative Input Influence (QII) measures that capture the degree of influence of inputs on outputs of systems. These measures provide a foundation for the design of transparency reports that accompany system decisions (e.g., explaining a specific credit decision) and for testing tools useful for internal and external oversight (e.g., to detect algorithmic discrimination). Distinctively, our causal QII measures carefully account for correlated inputs while measuring influence. They support a general class of transparency queries and can, in particular, explain decisions about individuals (e.g., a loan decision) and groups (e.g., disparate impact based on gender). Finally, since single inputs may not always have high influence, the QII measures also quantify the joint influence of a set of inputs (e.g., age and income) on outcomes (e.g., loan decisions) and the marginal influence of individual inputs within such a set (e.g., income). Since a single input may be part of multiple influential sets, the average marginal influence of the input is computed using principled aggregation measures, such as the Shapley value, previously applied to measure influence in voting. Further, since transparency reports could compromise privacy, we explore the transparency-privacy tradeoff and prove that a number of useful transparency reports can be made differentially private with very little addition of noise. Our empirical validation with standard machine learning algorithms demonstrates that QII measures are a useful transparency mechanism when black box access to the learning system is available. In particular, they provide better explanations than standard associative measures for a host of scenarios that we consider. Further, we show that in the situations we consider, QII is efficiently approximable and can be made differentially private while preserving accuracy.
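The Shapley-value aggregation mentioned above can be made concrete with a small amount of code. Below is a minimal Monte Carlo sketch in Python, assuming only black-box access through a `predict` function (which maps an array of input rows to an array of outputs) and a background dataset `X` of rows; the function names and the intervention-by-resampling scheme are illustrative assumptions, not the authors' exact QII procedure.

```python
import numpy as np

def shapley_influence(predict, X, x, n_samples=1000, seed=None):
    """Monte Carlo estimate of each feature's average marginal influence
    (Shapley value) on predict(x). Features not yet fixed are drawn from
    a random row of the background data X, a rough analogue of QII's
    randomized interventions on correlated inputs."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    phi = np.zeros(d)
    for _ in range(n_samples):
        order = rng.permutation(d)        # random ordering of features
        z = X[rng.integers(n)].copy()     # unfixed features come from a random row
        prev = predict(z[None, :])[0]     # quantity of interest at the baseline
        for j in order:
            z[j] = x[j]                   # fix feature j to the individual's value
            cur = predict(z[None, :])[0]
            phi[j] += cur - prev          # marginal contribution of feature j
            prev = cur
    return phi / n_samples
```

Averaging marginal contributions over random orderings is the standard sampling characterization of the Shapley value; to release such influence scores with differential privacy, as the abstract discusses, calibrated Laplace noise could be added to each estimate.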

Mobilitics, Season 2: Smartphones and Their Apps Under the Microscope of CNIL and Inria

PDF Posted by: Nozha Boujemaa

CNIL and Inria have been working for three years on an ambitious research and innovation project named Mobilitics. Its goal: to better understand smartphones, devices used daily by tens of millions of French people, yet which remain genuine black boxes for users, researchers, and regulators. These "friends who mean us well" are nonetheless extraordinary producers and consumers of personal data. From a research standpoint, they perfectly embody the issues at the heart of the work of Inria's Privatics team: understanding the technical mechanisms surrounding personal data and designing privacy-preserving technical solutions. A tool capable of detecting access to personal data on these devices (location, photos, address book) was therefore developed, refined, and field-tested. After a first wave of tests in 2013, a "second season" of Mobilitics took place during the summer of 2014. The first results presented in this newsletter illustrate the value of the partnership between Inria and CNIL: tools imagined and built together are used by both institutions, each in its own role. For CNIL, the aim is to better understand what actually happens when these devices are used, in order to set priorities for action and issue recommendations. For Inria, the aim is also to push the technical investigations and analyses further and to develop solutions that better protect users. This work thus gives both institutions an opportunity to share their analyses and open questions. Indeed, while these technologies offer individuals extraordinary services and benefit society, they can only develop if privacy and individual liberties are respected. Making technology more transparent and more understandable to citizens is a challenge shared by research and by the regulator.

Measuring Price Discrimination and Steering on E-commerce Web Sites

PDF Posted by: Daniel Le Métayer

Today, many e-commerce websites personalize their content, including Netflix (movie recommendations), Amazon (product suggestions), and Yelp (business reviews). In many cases, personalization provides advantages for users: for example, when a user searches for an ambiguous query such as “router,” Amazon may be able to suggest the woodworking tool instead of the networking device. However, personalization on e-commerce sites may also be used to the user’s disadvantage by manipulating the products shown (price steering) or by customizing the prices of products (price discrimination). Unfortunately, today, we lack the tools and techniques necessary to be able to detect such behavior. In this paper, we make three contributions towards addressing this problem. First, we develop a methodology for accurately measuring when price steering and discrimination occur and implement it for a variety of e-commerce web sites. While it may seem conceptually simple to detect differences between users’ results, accurately attributing these differences to price discrimination and steering requires correctly addressing a number of sources of noise. Second, we use the accounts and cookies of over 300 real-world users to detect price steering and discrimination on 16 popular e-commerce sites. We find evidence for some form of personalization on nine of these e-commerce sites. Third, we investigate the effect of user behaviors on personalization. We create fake accounts to simulate different user features including web browser/OS choice, owning an account, and history of purchased or viewed products. Overall, we find numerous instances of price steering and discrimination on a variety of top e-commerce sites.
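To illustrate the noise problem the methodology must address: a price difference between a user and a control account only signals personalization if it exceeds the differences observed between two identical control accounts, which capture noise such as A/B tests and stock churn. Below is a minimal sketch in Python with hypothetical function names, assuming result lists of (product_id, price) pairs collected simultaneously; it illustrates the idea, not the paper's full methodology.

```python
def price_deltas(results_a, results_b):
    """Per-product price differences (b minus a) over the products
    present in both result lists of (product_id, price) pairs."""
    a, b = dict(results_a), dict(results_b)
    return {p: b[p] - a[p] for p in a.keys() & b.keys()}

def discrimination_signal(user_results, control_results, control2_results):
    """Flag products whose user-vs-control price delta exceeds the
    largest delta seen between two identical control accounts,
    which serves as a baseline estimate of measurement noise."""
    noise = price_deltas(control_results, control2_results)
    noise_bound = max((abs(d) for d in noise.values()), default=0.0)
    deltas = price_deltas(control_results, user_results)
    return {p: d for p, d in deltas.items() if abs(d) > noise_bound}
```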

Online Tracking: A 1-million-site Measurement and Analysis

PDF Posted by: Daniel Le Métayer

We present the largest and most detailed measurement of online tracking conducted to date, based on a crawl of the top 1 million websites. We make 15 types of measurements on each site, including stateful (cookie-based) and stateless (fingerprinting-based) tracking, the effect of browser privacy tools, and the exchange of tracking data between different sites (“cookie syncing”). Our findings include multiple sophisticated fingerprinting techniques never before measured in the wild. This measurement is made possible by our open-source web privacy measurement tool, OpenWPM, which uses an automated version of a full-fledged consumer browser. It supports parallelism for speed and scale, automatic recovery from failures of the underlying browser, and comprehensive browser instrumentation. We demonstrate our platform’s strength in enabling researchers to rapidly detect, quantify, and characterize emerging online tracking behaviors.
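One of the behaviors measured, "cookie syncing," has a simple signature: an identifier set as a cookie by one domain reappears in request URLs sent to a different domain. Below is a minimal detection sketch in Python, assuming crawl data is already available as (domain, cookie_value) pairs plus a list of requested URLs; this illustrates the idea and is not OpenWPM's actual instrumentation.

```python
from urllib.parse import urlparse

def find_cookie_syncs(cookies, requests, min_len=8):
    """Report cases where a cookie value set by one domain appears
    inside a URL requested from another domain -- the basic signature
    of cookie syncing. `cookies` is an iterable of (domain, value)
    pairs; `requests` is an iterable of URLs. Short values are
    skipped to avoid accidental substring matches."""
    syncs = []
    for set_domain, value in cookies:
        if len(value) < min_len:
            continue
        for url in requests:
            req_domain = urlparse(url).netloc
            if req_domain != set_domain and value in url:
                syncs.append((set_domain, req_domain, value))
    return syncs
```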

How Unique Is Your Web Browser?

PDF Posted by: Daniel Le Métayer

We investigate the degree to which modern web browsers are subject to “device fingerprinting” via the version and configuration information that they will transmit to websites upon request. We implemented one possible fingerprinting algorithm, and collected these fingerprints from a large sample of browsers that visited our test site, panopticlick.eff.org. We observe that the distribution of our fingerprint contains at least 18.1 bits of entropy, meaning that if we pick a browser at random, at best we expect that only one in 286,777 other browsers will share its fingerprint. Among browsers that support Flash or Java, the situation is worse, with the average browser carrying at least 18.8 bits of identifying information. 94.2% of browsers with Flash or Java were unique in our sample. By observing returning visitors, we estimate how rapidly browser fingerprints might change over time. In our sample, fingerprints changed quite rapidly, but even a simple heuristic was usually able to guess when a fingerprint was an “upgraded” version of a previously observed browser’s fingerprint, with 99.1% of guesses correct and a false positive rate of only 0.86%. We discuss what privacy threat browser fingerprinting poses in practice, and what countermeasures may be appropriate to prevent it. There is a tradeoff between protection against fingerprintability and certain kinds of debuggability, which in current browsers is weighted heavily against privacy. Paradoxically, anti-fingerprinting privacy technologies can be self-defeating if they are not used by a sufficient number of people; we show that some privacy measures currently fall victim to this paradox, but others do not.
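The entropy figures map directly onto anonymity-set sizes: a fingerprint carrying b bits of self-information is expected to be shared by one browser in 2^b. A quick check of the numbers quoted above, in Python:

```python
import math

# Self-information (surprisal) of a fingerprint observed with probability p.
def surprisal_bits(p):
    return -math.log2(p)

# One browser in 286,777 corresponds to about 18.13 bits,
# which the paper reports as "at least 18.1 bits".
print(surprisal_bits(1 / 286_777))   # ~18.13
# For Flash/Java browsers, 18.8 bits corresponds to roughly
# one browser in 456,000.
print(2 ** 18.8)                     # ~456,419
```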

WifiLeaks: Underestimated Privacy Implications of the ACCESS_WIFI_STATE Android Permission

PDF Posted by: Daniel Le Métayer

On Android, installing an application implies accepting the permissions it requests, and these permissions are then enforced at runtime. In this work, we focus on the privacy implications of the ACCESS_WIFI_STATE permission. For this purpose, we analyzed permissions of the 2700 most popular applications on Google Play and found that the ACCESS_WIFI_STATE permission is used by 41% of them. We then performed a static analysis of 998 applications requesting this permission and based on the results, chose 88 applications for dynamic analysis. Our analyses reveal that this permission is already used by some companies to collect user Personally Identifiable Information (PII). We also conducted an online survey to study users’ perception of the privacy risks associated with this permission. This survey shows that users largely underestimate the privacy implications of this permission. As this permission is very common, most users are therefore potentially at risk.
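The first step of such a study, counting which applications request the permission, is straightforward to reproduce. Below is a minimal sketch in Python, assuming each APK's AndroidManifest.xml has already been decoded to plain XML (for example with apktool); the file path is a hypothetical placeholder.

```python
import xml.etree.ElementTree as ET

# Android manifest attributes live in this XML namespace.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

def requests_permission(manifest_path, permission):
    """Return True if the decoded AndroidManifest.xml declares the
    given permission via a <uses-permission> element."""
    root = ET.parse(manifest_path).getroot()
    return any(
        elem.get(ANDROID_NS + "name") == permission
        for elem in root.iter("uses-permission")
    )

print(requests_permission(
    "decoded_app/AndroidManifest.xml",          # hypothetical path
    "android.permission.ACCESS_WIFI_STATE",
))
```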

EU regulations on algorithmic decision-making and a “right to explanation”

PDF Posted by: Dominique Cardon

We summarize the potential impact that the European Union’s new General Data Protection Regulation will have on the routine use of machine learning algorithms. Slated to take effect as law across the EU in 2018, it will restrict automated individual decision-making (that is, algorithms that make decisions based on user-level predictors) which “significantly affect” users. The law will also create a “right to explanation,” whereby a user can ask for an explanation of an algorithmic decision that was made about them. We argue that while this law will pose large challenges for industry, it highlights opportunities for machine learning researchers to take the lead in designing algorithms and evaluation frameworks which avoid discrimination.

The Trouble with Algorithmic Decisions: An Analytic Road Map to Examine Efficiency and Fairness in Automated and Opaque Decision Making

PDF Posted by: Dominique Cardon

We are currently witnessing a sharp rise in the use of algorithmic decision-making tools. In these instances, a new wave of policy concerns is set forth. This article strives to map out these issues, separating the wheat from the chaff. It aims to provide policy makers and scholars with a comprehensive framework for approaching these thorny issues in their various capacities. To achieve this objective, this article focuses its attention on a general analytical framework, which will be applied to a specific subset of the overall discussion. The analytical framework will reduce the discussion to two dimensions, each of which addresses two central elements. These four factors call for a distinct discussion, which is at times absent in the existing literature. The two dimensions are the specific and novel problems the process assumedly generates and the specific attributes which exacerbate them. While the problems are articulated in a variety of ways, they most likely could be reduced to two broad categories: efficiency and fairness-based concerns. In the context of this discussion, such problems are usually linked to two salient attributes the algorithmic processes feature—their opaque and automated nature.

Dominique Cardon: "The web is richer than social-network platforms alone"

Link Posted by: Christophe Fraysse

The day after Facebook appeared before the Paris court in a dispute with a user whose account had been deleted after he posted "L'Origine du monde" (a painting by Gustave Courbet), Dominique Cardon, a sociologist specializing in the use of social networks, answers Nicolas Demorand's questions on the impact of algorithms on our use of social networks and on the spread of fake news.
