Polisis AI Developed to Help People Understand Privacy Policies

It looks as though this AI development could be quite useful in helping people avoid the exploitation of their personal information. Someone reading this may also want to look into a resource called Terms of Service; Didn’t Read, which “aims at creating a transparent and peer-reviewed process to rate and analyse Terms of Service and Privacy Policies in order to create a rating from Class A to Class E.”

But one group of academics has proposed a way to make those virtually illegible privacy policies into the actual tool of consumer protection they pretend to be: an artificial intelligence that’s fluent in fine print. Today, researchers at Switzerland’s Federal Institute of Technology at Lausanne (EPFL), the University of Wisconsin and the University of Michigan announced the release of Polisis—short for “privacy policy analysis”—a new website and browser extension that uses their machine-learning-trained app to automatically read and make sense of any online service’s privacy policy, so you don’t have to.

In about 30 seconds, Polisis can read a privacy policy it’s never seen before and extract a readable summary, displayed in a graphic flow chart, of what kind of data a service collects, where that data could be sent, and whether a user can opt out of that collection or sharing. Polisis’ creators have also built a chat interface they call Pribot that’s designed to answer questions about any privacy policy, intended as a sort of privacy-focused paralegal advisor. Together, the researchers hope those tools can unlock the secrets of how tech firms use your data that have long been hidden in plain sight.

[…]

Polisis isn’t actually the first attempt to use machine learning to pull human-readable information out of privacy policies. Both Carnegie Mellon University and Columbia have made their own attempts at similar projects in recent years, points out NYU Law Professor Florencia Marotta-Wurgler, who has focused her own research on user interactions with terms of service contracts online. (One of her own studies showed that only .07 percent of users actually click on a terms of service link before clicking “agree.”) The Usable Privacy Policy Project, a collaboration that includes both Columbia and CMU, released its own automated tool to annotate privacy policies just last month. But Marotta-Wurgler notes that Polisis’ visual and chat-bot interfaces haven’t been tried before, and says the latest project is also more detailed in how it defines different kinds of data. “The granularity is really nice,” Marotta-Wurgler says. “It’s a way of communicating this information that’s more interactive.”

[…]

The researchers’ legalese-interpretation apps do still have some kinks to work out. Their conversational bot, in particular, seemed to misinterpret plenty of questions in WIRED’s testing. And for the moment, that bot still answers queries by flagging an intimidatingly large chunk of the original privacy policy; a feature to automatically simplify that excerpt into a short sentence or two remains “experimental,” the researchers warn.

But the researchers see their AI engine in part as the groundwork for future tools. They suggest that future apps could use their trained AI to automatically flag data practices that a user asks to be warned about, or to automate comparisons between different services’ policies that rank how aggressively each one siphons up and shares your sensitive data.

“Caring about your privacy shouldn’t mean you have to read paragraphs and paragraphs of text,” says Michigan’s Schaub. But with more eyes on companies’ privacy practices—even automated ones—perhaps those information stewards will think twice before trying to bury their data collection bad habits under a mountain of legal minutiae.

U.S. Federal Government Set to Further Expand Mass Surveillance

It’s striking that the same congressional Democrats who verbally denounce the current president as a tyrant then vote to grant the executive branch extremely unjust surveillance authority. U.S. citizens, I encourage you to call the Senate and tell them to vote no on this mass surveillance bill. The Capitol Switchboard number is (202) 804-3305.

With the Senate set to cast its first votes on a bill that reauthorizes and expands the government’s already vast warrantless spying program in a matter of hours, civil libertarians on Tuesday launched a last-ditch effort to rally opposition to the legislation and demand that lawmakers protect Americans’ constitutional right to privacy.

Fight for the Future (FTF), one of many advocacy groups pressuring lawmakers to stop the mass surveillance bill in its tracks, notes that “just 41 senators can stop” the bill from passing.

“In the age of federal misconduct, every member of Congress must move right now to stop the government’s abuse of the internet to monitor everyone; they must safeguard our freedom and the U.S. Constitution,” FTF urged.

The FISA Amendments Reauthorization Act of 2017 (S.139)—passed by the House last week with the revealing but not surprising help of 65 Democrats—would renew Section 702 of FISA, set to expire this Friday.

As The Intercept’s Glenn Greenwald notes, “numerous Senate Democrats are poised” to join their House colleagues in voting to re-up Section 702, thus violating “the privacy rights of everyone in the United States” and handing President Donald Trump and Attorney General Jeff Sessions sprawling spying powers.

The Senate’s first procedural vote on a cloture motion is expected at 5:30pm ET. If the motion is approved, the path will be clear for the bill to hit the Senate floor.

“Every member of Congress is going to have to decide whether to protect Americans’ privacy, and shield vulnerable communities from unconstitutional targeting, or to leave unconstitutional spying authority in Trump’s—and Jeff Sessions’—hands,” the advocacy group Indivisible notes.

EU Privacy Shield Standard Should be Adopted by More Countries

Online privacy isn’t as appreciated as it should be, but that may change as exponentially more devices are connected to the Internet over the next several years.

If you’re ever expecting a child, Target wants to be one of the first to know. The company has invested in research to identify pregnant customers early on, based upon their purchasing behavior. Then, it targets them with ads for baby gear.

While companies such as Target mine data about products their customers purchase from them (like prenatal vitamins) to send them personalized ads, many also rely on information gathered about us on the web — like what we search for on Google or email our friends. That lets them realize we’re planning a vacation to the Grand Canyon, for instance, and send us ads for local hotels.

Many people think that it’s an invasion of privacy for companies to gather sensitive data, such as information about our relationships and medical history, and exploit it for commercial purposes. It could also widen social divisions. For example, Facebook determines our political beliefs based upon the pages we like and preferences we list on our profiles. If algorithms peg us as conservative or liberal and we’re targeted with ads accordingly, we may end up never understanding what people of other political persuasions think. Internet activist and author Eli Pariser has argued that America is so politically polarized in part because social media sites leave us in “filter bubbles.” Targeted political advertising could have the same effect.

That’s part of the reason why, in May, a new regulation will go into effect in the European Union giving citizens the “right to object” to “processing of personal data” about them for marketing and other purposes. As Andrus Ansip, the European Commission vice president for the digital single market, tweeted, “Should I not be asked before my emails are accessed and used? Don’t you think the same?” The new law overcame serious opposition from the advertising industry, whose representatives argue that it will disrupt ad revenues needed by the media. Experts say that websites will have to provide more valuable content to users as an incentive for readers to allow them to use their data.

Here in the U.S., most ads are bought through exchanges that allow advertisers to target people based upon data about them. Companies can choose to buy ads that will be seen, for example, by women who live in a particular ZIP code and graduated from a certain school. But according to guidance established by the Digital Advertising Alliance (a consortium of industry trade associations including the American Association of Advertising Agencies, the Association of National Advertisers, and the Better Business Bureau), consumers should have “the ability to exercise choice with respect to the collection and use of data.” Two members of the alliance accept consumer complaints and do their own research to identify violations of the rule. They work with companies to help them fix problems and report violations to regulators.

While the principle behind the new EU law could justify wide-ranging new regulations and restrictions on how companies throughout the world do business, James Ryseff, a former Google engineer, says it’s likely that initially it will simply allow users to opt out of the “cookies” that track internet users as they surf the web. Although this will reduce the amount of data that tech companies can collect, it doesn’t truly allow users to opt out of targeted advertising, since businesses can still use the information they gather through other techniques — such as in-store purchases — to classify and reach customers. That’s why, Ryseff says, Americans should have more sophisticated ways to determine exactly what advertisers learn about us.

First, for example, we should be able to decide whether companies are able to gather generic data about who we are (such as our age, gender and location) or information about what we’re doing (such as researching a medical condition) — or neither, or both. “In general, I think ‘What I do’ information has a greater ability to freak people out,” Ryseff says. “Used incorrectly, it makes you feel like Google is stalking you.”

Second, Americans should get to decide where and when our data is tracked. For example, some people might be more comfortable being tracked on a search engine that knows their buying behavior and can make recommendations accordingly, but less so on personal email, which can identify private facts about their lives, or on work email, which might contain proprietary information. (Google previously used data from the content of users’ emails to target them with ads, but pledged in June to stop the practice.) And we might want to temporarily stop allowing search engines to track our activities when we’re looking up something private, like medical symptoms.

Third, we should get to decide whether we’re willing to be targeted with ads based upon our own behaviors or upon those of people whom algorithms have decided are like us.

Research Develops First Reliable Method for Websites to Track Users With Multiple Browsers

Either legal or technological defenses will be required to stop this tracking that so invades personal privacy.

Researchers have recently developed the first reliable technique for websites to track visitors even when they use two or more different browsers. This shatters a key defense against sites that identify visitors based on the digital fingerprint their browsers leave behind.

State-of-the-art fingerprinting techniques are highly effective at identifying users when they use browsers with default or commonly used settings. For instance, the Electronic Frontier Foundation’s privacy tool, known as Panopticlick, found that only one in about 77,691 browsers had the same characteristics as the one commonly used by this reporter. Such fingerprints are the result of specific settings and customizations found in a specific browser installation, including the list of plugins, the selected time zone, whether a “do not track” option is turned on, and whether an adblocker is being used.
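As a rough illustration of the idea described above, the sketch below hashes a handful of browser settings into a single identifier and converts a "one in N" uniqueness figure into bits of identifying information. It is not Panopticlick's actual method. The attribute names, the sample values, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
import json
import math


def fingerprint(attributes: dict) -> str:
    """Hash a browser's attributes into a short, stable identifier."""
    # Sort keys so the same attributes always serialize the same way.
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def identifying_bits(one_in_n: float) -> float:
    """Bits of identifying information carried by a 1-in-N fingerprint."""
    return math.log2(one_in_n)


# Hypothetical attribute values, for illustration only.
browser = {
    "plugins": ["pdf-viewer", "video-codec"],
    "timezone": "America/New_York",
    "do_not_track": True,
    "adblocker": False,
}

print(fingerprint(browser))
print(round(identifying_bits(77_691), 1))
```

Changing any single setting produces a different identifier, which is why unusual customizations tend to make a browser easier to single out. By this measure, the one-in-77,691 figure corresponds to a little over 16 bits of identifying information.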

Until now, however, the tracking has been limited to a single browser. This constraint made it infeasible to tie, say, the fingerprint left behind by a Firefox browser to the fingerprint from a Chrome or Edge installation running on the same machine. The new technique—outlined in a research paper titled (Cross-)Browser Fingerprinting via OS and Hardware Level Features—not only works across multiple browsers, it’s also more accurate than previous single-browser fingerprinting.

Fingerprinting isn’t automatically bad and, in some cases, offers potential benefits to end users. Banks, for instance, can use it to know that a person logging into an online account isn’t using the computer that has been used on every previous visit. Based on that observation, the bank could check with the account holder by phone to make sure the login was legitimate. But fingerprinting also carries sobering privacy concerns.

“From the negative perspective, people can use our cross-browser tracking to violate users’ privacy by providing customized ads,” Yinzhi Cao, the lead researcher who is an assistant professor in the Computer Science and Engineering Department at Lehigh University, told Ars. “Our work makes the scenario even worse, because after the user switches browsers, the ads company can still recognize the user. In order to defeat the privacy violation, we believe that we need to know our enemy well.”

[…]

Cross-browser fingerprinting is only the latest trick developers have come up with to track people who visit their sites. Besides traditional single-browser fingerprinting, other tracking methods include monitoring the way visitors type passwords and other text and embedding inaudible sound in TV commercials or websites. The Tor browser without an attached microphone or speakers is probably the most effective means of protection, although the researchers said running a browser inside a virtual machine may also work.

Giant Data Leak Exposes Data on 123 Million U.S. Households

This is yet another data breach that would be much less likely to happen if the NSA would primarily do its actual job and protect Americans instead of spying on them and other relatively innocent foreign citizens. Up to 90 percent of the NSA’s budget is dedicated to offense and spying when it should be dedicated to securing vital technological infrastructure and defending the public instead. Unfortunately though, the NSA today is largely an example of the government — compromised through excessive corporate control — treating its own domestic population as the enemy, and that sort of example happens far too frequently in the modern world.

Researchers revealed Tuesday that earlier this year they discovered a massive database — containing information on more than 123 million American households — that was sitting unsecured on the internet.

The cloud-based data repository from marketing analytics company Alteryx exposed a wide range of personal details about virtually every American household, according to researchers at cybersecurity company UpGuard. The leak put consumers at risk for a range of nefarious activities, from spamming to identity theft, the researchers warned.

Though no names were exposed, the data set included 248 different data fields covering a wide variety of specific personal information, including address, age, gender, education, occupation and marital status. Other fields included mortgage and financial information, phone numbers and the number of children in the household.

“From home addresses and contact information, to mortgage ownership and financial histories, to very specific analysis of purchasing behavior, the exposed data constitutes a remarkably invasive glimpse into the lives of American consumers,” UpGuard researchers Chris Vickery and Dan O’Sullivan wrote in their analysis.

A cascade of recent database breaches has left consumers on edge about the security of their personal information. After credit monitoring company Equifax revealed in September that cybercriminals had made off with data on more than 145 million Americans, US lawmakers began efforts to hold such businesses accountable to the everyday people whose data they collect for profit.

[…]

“The data exposed in this bucket would be invaluable for unscrupulous marketers, spammers and identity thieves, for whom this data would be largely reliable and, more importantly, varied,” the researchers said. “With a large database of potential victims to survey — with such details as ‘mortgage ownership’ revealed, a common security verification question — the price could be far higher than merely bad publicity.”

More Than 400 of the World’s Most Popular Websites Try to Record Your Every Keystroke

This is significant work done by Princeton researchers. It’s honestly a pretty damning indictment of the world’s most visited websites.

Most people who’ve spent time on the internet have some understanding that many websites log their visits and keep a record of what pages they’ve looked at. When you search for a pair of shoes on a retailer’s site, for example, it records that you were interested in them. The next day, you see an advertisement for the same pair on Instagram or another social media site.

The idea of websites tracking users isn’t new, but research from Princeton University released last week indicates that online tracking is far more invasive than most users understand. In the first installment of a series titled “No Boundaries,” three researchers from Princeton’s Center for Information Technology Policy (CITP) explain how third-party scripts that run on many of the world’s most popular websites track your every keystroke and then send that information to a third-party server.

Some highly trafficked sites run software that records every time you click and every word you type. If you go to a website, begin to fill out a form, and then abandon it, every letter you entered is still recorded, according to the researchers’ findings. If you accidentally paste something into a form that was copied to your clipboard, it’s also recorded. Facebook users were outraged in 2013 when it was discovered that the social network was doing something similar with status updates: it recorded what users typed, even if they never ended up posting it.

These scripts, or bits of code that websites run, are called “session replay” scripts. Session replay scripts are used by companies to gain insight into how their customers are using their sites and to identify confusing webpages. But the scripts don’t just aggregate general statistics; they record and are capable of playing back individual browsing sessions. The scripts don’t run on every page, but are often placed on pages where users input sensitive information, like passwords and medical conditions.

[…]

Most troubling is that the information session replay scripts collect can’t “reasonably be expected to be kept anonymous,” according to the researchers. Some of the companies that provide this software, like FullStory, design tracking scripts that even allow website owners to link the recordings they gather to a user’s real identity. On the backend, companies can see that a user is connected to a specific email or name. FullStory did not return a request for comment.

[…]

Companies that sell replay scripts do offer a number of redaction tools that allow websites to exclude sensitive content from recordings, and some even explicitly forbid the collection of user data. Still, the use of session replay scripts by so many of the world’s most popular websites has serious privacy implications.

“Collection of page content by third-party replay scripts may cause sensitive information such as medical conditions, credit card details, and other personal information displayed on a page to leak to the third-party as part of the recording,” the researchers wrote in their post.

Passwords are often accidentally included in recordings, even though the scripts are designed to exclude them. The researchers found that other personal information was also often not redacted, or only redacted partially, at least with some of the scripts. Two of the companies, UserReplay and SessionCam, block all user inputs by default (they just track where users are clicking), which is a far safer approach.
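To make the "block all inputs by default" approach concrete, here is a minimal sketch of a redaction step that drops typed values unless a field has been explicitly allowed. It does not reflect the real code or API of UserReplay, SessionCam, or any other vendor. The event layout and the `redact_event` helper are hypothetical.

```python
def redact_event(event: dict, allowed_fields: frozenset = frozenset()) -> dict:
    """Return a copy of a recorded event with typed input removed by default.

    Clicks and other non-input events pass through unchanged. Input events
    keep their typed value only when the field name is explicitly allowed.
    """
    redacted = dict(event)  # copy so the original event is not modified
    if event.get("type") == "input" and event.get("field") not in allowed_fields:
        redacted.pop("value", None)  # drop what the user typed
    return redacted


# Hypothetical recorded events, for illustration only.
events = [
    {"type": "click", "x": 120, "y": 340},
    {"type": "input", "field": "password", "value": "hunter2"},
    {"type": "input", "field": "search", "value": "running shoes"},
]

for event in events:
    print(redact_event(event, allowed_fields=frozenset({"search"})))
```

The design choice is that a field nobody thought about is excluded from the recording. An opt-out list has the opposite failure mode: any sensitive field left off the list gets recorded.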

[…]

Finally, the study’s authors are worried that session script companies could be vulnerable to targeted hacks, especially because they’re likely high-value targets. For example, many of these companies have dashboards where clients can play back the recordings they collect.

[…]

It’s not just session scripts that are following you around the internet. A study published earlier this year found that nearly half of the world’s 1,000 most popular websites use the same tracking software to monitor your behavior in various ways.

If you want to block session replay scripts, popular ad-blocking tool AdBlock Plus will now protect you against all of the ones documented in the Princeton study. AdBlock Plus formerly only protected against some, but has now been updated to block all as a result of the researchers’ work.