Making Algorithms Less Biased and Reducing Inequalities of Power

Algorithms increasingly affect society, from employment (Amazon’s hiring algorithm discriminated against women) to the criminal justice system (where algorithms often discriminate against African-Americans), and making them less biased would reduce inequalities in power. Relatedly, research suggests that AI is able to independently develop its own prejudices.

With machine learning systems now being used to determine everything from stock prices to medical diagnoses, it’s never been more important to look at how they arrive at decisions.

A new approach out of MIT demonstrates that the main culprit is often not the algorithms themselves but the way the underlying data is collected.

“Computer scientists are often quick to say that the way to make these systems less biased is to simply design better algorithms,” says lead author Irene Chen, a PhD student who wrote the paper with MIT professor David Sontag and postdoctoral associate Fredrik D. Johansson. “But algorithms are only as good as the data they’re using, and our research shows that you can often make a bigger difference with better data.”

Looking at specific examples, researchers were able both to identify potential causes for differences in accuracy and to quantify each factor’s individual impact on the data. They then showed how changing the way they collected data could reduce each type of bias while still maintaining the same level of predictive accuracy.

“We view this as a toolbox for helping machine learning engineers figure out what questions to ask of their data in order to diagnose why their systems may be making unfair predictions,” says Sontag.

Chen says that one of the biggest misconceptions is that more data is always better. Getting more participants doesn’t necessarily help, since drawing from the exact same population often leads to the same subgroups being under-represented. Even the popular image database ImageNet, with its many millions of images, has been shown to be biased towards the Northern Hemisphere.

According to Sontag, often the key thing is to go out and get more data from those under-represented groups. For example, the team looked at an income-prediction system and found that it was twice as likely to misclassify female employees as low-income and male employees as high-income. They found that if they had increased the dataset by a factor of 10, those mistakes would happen 40 percent less often.

In another dataset, the researchers found that a system’s ability to predict intensive care unit (ICU) mortality was less accurate for Asian patients. Existing approaches for reducing discrimination would essentially just make the non-Asian predictions less accurate, which is problematic in settings like healthcare, where predictions can quite literally be a matter of life or death.

Chen says that their approach allows them to look at a dataset and determine how many more participants from different populations are needed to improve accuracy for the group with lower accuracy while still preserving accuracy for the group with higher accuracy.
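
To make that concrete, here is a minimal synthetic sketch in Python, in the spirit of the diagnosis described above but not the authors’ actual code or data: the dataset, the group sizes, and the distribution shift are all made-up placeholders. It trains one classifier on data dominated by a majority group, then watches the under-represented group’s accuracy as more of its data is collected.

```python
# A toy sketch of per-group error diagnosis (synthetic data, not the
# MIT authors' code): measure accuracy separately for each group, then
# see how the minority group's accuracy responds to more minority data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def make_group(n, shift, group_id):
    """Synthetic group whose label threshold depends on its feature shift."""
    X = rng.normal(shift, 1.0, size=(n, 5))
    y = (X.sum(axis=1) + rng.normal(0, 1.0, n) > 5 * shift).astype(int)
    g = np.full(n, group_id)
    # Include the group id as a feature so the model can learn a per-group offset.
    return np.column_stack([X, g]), y, g

X_maj, y_maj, g_maj = make_group(5000, 0.0, 0)

for n_min in (100, 500, 5000):  # simulate collecting more minority data
    X_min, y_min, g_min = make_group(n_min, 1.5, 1)
    X = np.vstack([X_maj, X_min])
    y = np.concatenate([y_maj, y_min])
    g = np.concatenate([g_maj, g_min])
    X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
        X, y, g, test_size=0.3, random_state=0, stratify=g)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    acc_min = clf.score(X_te[g_te == 1], y_te[g_te == 1])
    acc_maj = clf.score(X_te[g_te == 0], y_te[g_te == 0])
    print(f"minority n={n_min:5d}: minority accuracy {acc_min:.3f}, "
          f"majority accuracy {acc_maj:.3f}")
```

In this toy setup, the minority group’s accuracy typically improves as its sample grows while the majority group’s barely moves; the actual paper quantifies exactly this kind of trade-off and estimates how much additional data would close the gap.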

Facebook Seeking to Exploit Consumer Banking Data

Major corporations are interested primarily in profits, not in helping human beings. Since data is one of the most valuable resources in the world today, their push to obtain consumer banking data represents an attempt to engage in still more data mining and exploitation.

Apparently not satisfied with access to its users’ call history, text messaging data, and online conversations, Facebook has reportedly asked major Wall Street firms like JPMorgan Chase and Wells Fargo to hand over their customers’ sensitive financial data as part of the social media giant’s ongoing attempt to become “a platform where people buy and sell goods and services.”

And according to the Wall Street Journal—which first reported on Facebook’s plans on Monday—the social media behemoth isn’t the only tech company that wants access to Americans’ financial data. Google and Amazon have also “asked banks to share data if they join with them, in order to provide basic banking services on applications such as Google Assistant and Alexa,” the Journal pointed out, citing anonymous sources familiar with the companies’ ambitions.

Over the past year, Facebook has reached out to some of America’s largest banks to request “detailed financial information about their customers, including card transactions and checking account balances, as part of an effort to offer new services to users,” the Journal notes. “Facebook has told banks that the additional customer information could be used to offer services that might entice users to spend more time on Messenger.”

In response to the Journal’s reporting, critics of corporate power used the word “dystopian” to describe the push by Facebook, Google, and Amazon for ever-greater access to users’ personal information in a bid to boost profits.

[…]

While Facebook insisted in response to the Journal’s story that it doesn’t want to use any of this data for advertising purposes or share it with third parties, many pointed out that there is no reason to trust Facebook’s expressed commitment to user privacy, particularly in the wake of the Cambridge Analytica scandal and other abuses.

Data Debunks Propaganda on Immigrants Being a Major Source of U.S. Crime

Immigrants have overall had about half the crime rate of native-born American citizens for a few decades now. Yet using immigrants as scapegoats for the real problems facing society, while ignoring corporate crime in the suites, remains a common ploy of corrupt officials today.

As of 2017, according to Gallup polls, almost half of Americans agreed that immigrants make crime worse. But is it true that immigration drives crime? Many studies have shown that it does not.

Immigrant populations in the United States have been growing fast for decades now. Crime in the same period, however, has moved in the opposite direction, with the national rate of violent crime today well below what it was in 1980.

In a large-scale collaboration by four universities, led by Robert Adelman, a sociologist at the State University of New York at Buffalo, researchers compared immigration rates with crime rates for 200 metropolitan areas over the last several decades. The selected areas included huge urban hubs like New York and smaller manufacturing centers less than a hundredth that size, like Muncie, Ind., and were dispersed geographically across the country.

According to data from the study, a large majority of the areas have many more immigrants today than they did in 1980 and fewer violent crimes. The Marshall Project extended the study’s data up to 2016, showing that crime fell more often than it rose even as immigrant populations grew almost across the board.

In 136 metro areas, almost 70 percent of those studied, the immigrant population increased between 1980 and 2016 while crime stayed stable or fell. The number of areas where crime and immigration both increased was much lower — 54 areas, slightly more than a quarter of the total. The 10 places with the largest increases in immigrants all had lower levels of crime in 2016 than in 1980.

Understanding the Default Effect and Its Substantial Relevance

The default effect is an observed phenomenon in human psychology: most people tend to stick with whatever default option is presented to them. It’s an important and valuable concept to understand because of its widespread use in technology and other consumer products. Google, for example, has paid Mozilla hundreds of millions of dollars to make Google the default search engine in Firefox, and the Yahoo! corporation has also bid for that same default placement in the past. The reason these corporations are willing to spend large amounts of money on one basic setting is that their executives understand the power of the default effect in directing substantial amounts of human behavior.

There are plenty of other examples that illustrate the default effect’s relevance, such as Microsoft devoting immense resources to keeping Windows the default operating system on many new desktop computers, and Facebook attempting (and fortunately failing) to implement its restrictive Facebook-only Internet service in India. The profits of these corporations are predicated to a significant extent on personal data, so a stronger hold over that data allows for potentially higher profits.

Additionally, the default effect’s power is amplified by the human tendency to form habits. Once a habit forms around a given default, the default becomes all the harder to dislodge.

In Silicon Valley circles there is also talk of “the next billion,” referring to the billions of people who have still not used the Internet. One of the goals of these corporations (whether they admit it or not) is to lock in those users and exclude competitors so that they can take increased advantage of the new data. Having already mined and collected the data (especially the surface data) of much of the connected population, they are now approaching the people who haven’t yet come online. It’s reminiscent of the cigarette industry’s historic and still ongoing attempts to establish brand loyalty among smokers by hooking people while they’re young.

In sum, the default effect is important to understand in order to more effectively avoid modern technological exploitation, and sharing this sort of insight with others should help them avoid it as well.

Polisis AI Developed to Help People Understand Privacy Policies

It looks as though this AI development could be quite useful in helping people avoid the exploitation of their personal information. Someone reading this may also want to look into a resource called Terms of Service; Didn’t Read, which “aims at creating a transparent and peer-reviewed process to rate and analyse Terms of Service and Privacy Policies in order to create a rating from Class A to Class E.”

But one group of academics has proposed a way to make those virtually illegible privacy policies into the actual tool of consumer protection they pretend to be: an artificial intelligence that’s fluent in fine print. Today, researchers at Switzerland’s Federal Institute of Technology at Lausanne (EPFL), the University of Wisconsin and the University of Michigan announced the release of Polisis—short for “privacy policy analysis”—a new website and browser extension that uses their machine-learning-trained app to automatically read and make sense of any online service’s privacy policy, so you don’t have to.

In about 30 seconds, Polisis can read a privacy policy it’s never seen before and extract a readable summary, displayed in a graphic flow chart, of what kind of data a service collects, where that data could be sent, and whether a user can opt out of that collection or sharing. Polisis’ creators have also built a chat interface they call Pribot that’s designed to answer questions about any privacy policy, intended as a sort of privacy-focused paralegal advisor. Together, the researchers hope those tools can unlock the secrets of how tech firms use your data that have long been hidden in plain sight.

[…]

Polisis isn’t actually the first attempt to use machine learning to pull human-readable information out of privacy policies. Both Carnegie Mellon University and Columbia have made their own attempts at similar projects in recent years, points out NYU Law Professor Florencia Marotta-Wurgler, who has focused her own research on user interactions with terms of service contracts online. (One of her own studies showed that only .07 percent of users actually click on a terms of service link before clicking “agree.”) The Usable Privacy Policy Project, a collaboration that includes both Columbia and CMU, released its own automated tool to annotate privacy policies just last month. But Marotta-Wurgler notes that Polisis’ visual and chat-bot interfaces haven’t been tried before, and says the latest project is also more detailed in how it defines different kinds of data. “The granularity is really nice,” Marotta-Wurgler says. “It’s a way of communicating this information that’s more interactive.”

[…]

The researchers’ legalese-interpretation apps do still have some kinks to work out. Their conversational bot, in particular, seemed to misinterpret plenty of questions in WIRED’s testing. And for the moment, that bot still answers queries by flagging an intimidatingly large chunk of the original privacy policy; a feature to automatically simplify that excerpt into a short sentence or two remains “experimental,” the researchers warn.

But the researchers see their AI engine in part as the groundwork for future tools. They suggest that future apps could use their trained AI to automatically flag data practices that a user asks to be warned about, or to automate comparisons between different services’ policies that rank how aggressively each one siphons up and shares your sensitive data.

“Caring about your privacy shouldn’t mean you have to read paragraphs and paragraphs of text,” says Michigan’s Schaub. But with more eyes on companies’ privacy practices—even automated ones—perhaps those information stewards will think twice before trying to bury their data collection bad habits under a mountain of legal minutiae.
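
For readers curious about the underlying technique, below is a minimal sketch of privacy-policy segment classification, the kind of task Polisis automates. This is not the Polisis model (which reportedly uses a hierarchy of neural-network classifiers trained on a large corpus of expert-annotated policies); the tiny training set and category names here are invented for illustration.

```python
# A toy sketch of policy-segment classification (not the Polisis model):
# label each chunk of a privacy policy with a data-practice category.
# The training segments and category names below are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_segments = [
    "We collect your email address and phone number when you register.",
    "We share aggregated usage statistics with advertising partners.",
    "You may opt out of marketing emails at any time in your settings.",
    "Transaction records are retained for seven years as required by law.",
]
train_labels = [
    "first-party-collection",
    "third-party-sharing",
    "user-choice",
    "data-retention",
]

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_segments, train_labels)

# Classify an unseen policy segment.
segment = "We may share your information with advertising partners."
print(model.predict([segment])[0])  # should print "third-party-sharing"
```

A production system works at far finer granularity and with vastly more training data, but the overall pipeline shape (segment the policy, classify each segment, aggregate the labels into a summary) is the same idea.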

Tax Cuts and Growth Revisited

If the U.S. economy does now have a year of GDP growth considerably higher than the past decade’s, it will probably feed future political lies about tax cuts and growth. If government officials actually cared about better economic growth, they would implement policies that invest in technological advancement and employment programs that increase capacity utilization. The gains from that growth should then go to the general public instead of the small fraction of the population that’s already wealthy.

The Democrats were virtually unanimous in opposition to the tax cuts that Republicans pushed through Congress last year. They had good cause. The overwhelming majority of the tax cuts go to the richest 1 percent of the population, the same group that has gotten the bulk of the gains from economic growth over the last four decades. For those who don’t think making the rich richer is an important priority of government, the tax cuts were a really bad idea.

[…]

While there is little reason to believe that the tax cuts would lead to the sort of boost in growth claimed by proponents, it is actually very plausible that GDP growth could average 3 percent over the next decade.

There are two factors that determine GDP growth: the rate of growth of the labor force and the rate of growth of productivity. The rate of labor force growth is almost certain to be slower going forward simply because the massive baby boom cohort will be retiring over the next decade.
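
To a first approximation (abstracting from changes in average hours worked and the unemployment rate), the decomposition the author relies on is:

$$ g_{\text{GDP}} \;\approx\; g_{\text{labor force}} \;+\; g_{\text{productivity}} $$

So with, say, 0.5 percent annual labor force growth and 2.5 percent productivity growth (illustrative numbers, not figures from the article), GDP growth would come to roughly 3 percent; the question is whether productivity growth can stay that high.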

[…]

This matters hugely because there is some reason to believe that productivity is picking up for reasons having nothing to do with the tax cut. Productivity growth averaged 2.1 percent in the second and third quarters of last year. It then fell slightly in the fourth quarter due to quirks in the data, specifically a surge in the number of people reported as self-employed. But with early reports indicating first quarter GDP growth will be well over 3 percent, we are likely to see another quarter of strong productivity growth.

While this uptick cannot be plausibly explained by the tax cut, there is an alternative explanation: It may simply be the result of a tighter labor market. The tighter labor market has led to increased wage growth at the bottom end of the pay ladder.

[…]

This all matters from a political standpoint because it would be unfortunate if the Republicans were to get credit for a pickup in growth which has nothing to do with them. Some of us did try to warn of this possibility last year, but the leading Democratic economists were not interested in our assessment.

Just to repeat what we said then, it is very possible that we will see something like the 3 percent GDP growth promised by the Republicans, but not because we gave more money to rich people. Because so many denied this possibility, Democratic economists may end up helping to convince people that giving money to the rich is the key to a strong economy.

NSA Violates Court Order by Deleting Data It Was Supposed to Preserve

The NSA shows once again why it’s such a trustworthy agency.

The National Security Agency destroyed surveillance data it pledged to preserve in connection with pending lawsuits and apparently never took some of the steps it told a federal court it had taken to make sure the information wasn’t destroyed, according to recent court filings.

Word of the NSA’s foul-up is emerging just as Congress has extended for six years the legal authority the agency uses for much of its surveillance work conducted through U.S. internet providers and tech firms. President Donald Trump signed that measure into law Friday.

Since 2007, the NSA has been under court orders to preserve data about certain of its surveillance efforts that came under legal attack following disclosures that President George W. Bush ordered warrantless wiretapping of international communications after the 2001 terrorist attacks on the U.S. In addition, the agency has made a series of representations in court over the years about how it is complying with its duties.

However, the NSA told U.S. District Court Judge Jeffrey White in a filing on Thursday night and another little-noticed submission last year that the agency did not preserve the content of internet communications intercepted between 2001 and 2007 under the program Bush ordered. To make matters worse, backup tapes that might have mitigated the failure were erased in 2009, 2011 and 2016, the NSA said.