Social Media Data Collection Can Lead to Violations of Privacy

Tags: USA

Social Media Data Collection Can Lead to Violations of Privacy published by Evanvinh
Posted on 2016-03-17

Andre Oboler is chief executive officer of the Online Hate Prevention Institute and a postgraduate law student at Monash University. Kristopher Welsh is a lecturer in the School of Computing at the University of Kent. Lito Cruz is a teaching associate at Monash University and a part-time lecturer at Charles Sturt University. He holds a PhD in computer science from Monash University.

Social media data can be used to collect information about individuals by governments, businesses, journalists, employers, or social media platforms themselves. This data collection can result in numerous kinds of infringements of privacy. It could be used to manipulate voters, track activists, profile job applicants, or even reveal a user's physical movements. Social media platforms have given little consideration to the ethical issues raised. More needs to be done by both social media companies and users to prevent abuses of data.

Computational social science involves the collection, retention, use and disclosure of information to answer enquiries from the social sciences. As an instrument-based discipline, the scope of investigation is largely controlled by the parameters of the computer system involved. These parameters can include: the type of information people will make available, data retention policies, the ability to collect and link additional information to subjects in the study, and the processing ability of the system. The capacity to collect and analyze data sets on a vast scale provides leverage to reveal patterns of individual and group behaviour.

The Danger of Data

The revelation of these patterns can be a concern when they are made available to business and government. It is, however, precisely business and government who today control the vast quantities of data used for computational social science analysis.

Some data should not be readily available: this is why we have laws restricting the use of wiretaps, and protecting medical records. The potential damage from inappropriate disclosure of information is sometimes obvious. However, the potential damage of multiple individually benign pieces of information being combined to infer, or a large dataset being analysed to reveal, sensitive information (or information which may later be considered sensitive) is much harder to foresee. A lack of transparency in the way data is analysed and aggregated, combined with a difficulty in predicting which pieces of information may later prove damaging, means that many individuals have little perception of potential adverse effects of the expansion in computational social science.

Both the analysis of general trends and the profiling of individuals can be investigated through social sciences. Applications of computational social science in the areas of social anthropology and political science can aid in the subversion of democracy. More than ever before, groups or individuals can be profiled, and the results used to better manipulate them. This may be as harmless as advertising for a particular product, or as damaging as political brainwashing. At the intersection of these examples, computational social science can be used to guide political advertising; people can be sold messages they will support and can be sheltered from messages with which they may disagree. Access to data may rest with the incumbent government, with those able to pay, or with those favoured by powerful data-rich companies.

Politics and Beyond

Under its new terms of service, Google could, for instance, significantly influence an election by predicting messages that would engage an individual voter (positively or negatively) and then filtering content to influence that user's vote. The predictions could be highly accurate, making use of a user's e-mail in their Google-provided Gmail account, their search history, their Google+ updates and social network connections, their online purchasing history through Google Wallet, and data in their photograph collection. The filtering of information could include "recommended" videos in YouTube; videos selectively chosen to highlight where one political party agrees with the user's views and where another disagrees with them. In Google News, articles could be given higher or lower visibility to help steer voters into making "the right choice".

Such manipulation may not be immediately obvious; a semblance of balance can be given with an equal number of positive and negative points made against each party. What computational social science adds is the ability to predict the effectiveness of different messages for different people. A message with no resonance for a particular voter may seem to objectively provide balance, while in reality making little impact. Such services could not only be sold, but could be used by companies themselves to block the election of officials whose agenda runs contrary to their interests.

The ability to create such detailed profiles of individuals extends beyond the democratic process. The risk posed by the ubiquity of computational social science tools, combined with an ever-increasing corpus of data, and free of the ethical restrictions placed on researchers, poses serious questions about the impact that those who control the data and the tools can have on society as a whole. Traditionally, concerns about potential abuses of power focus on government and how its power can be limited to protect individuals; that focus needs to widen.

Social Media Data for Business

Social media systems contain particularly valuable information. This data derives its value from its detail, personal nature, and accuracy. The semi-public nature of the data means it is exposed to scrutiny within a user's network; this increases the likelihood of accuracy when compared to data from other sources. The social media data stores are owned and controlled by private companies. Applications such as Facebook, LinkedIn, and the Google suite of products (including Google search, YouTube, DoubleClick and others), are driven by information sharing, but monetized through internal analysis of the gathered data—a form of computational social science. The data is used by four classes of users: business clients, government, other users within the social media platform, and the platform provider itself.

Business clients draw on this computational social science when they seek to target their advertisements. Facebook, for example, allows advertisers to target users based on variables that range from standard demographics such as age, gender, and geographical location to more personal information such as sexual preferences. Users can also be targeted based on interests, associations, education level and employer. The Facebook platform makes this data (in aggregated form) available to advertisers for a specific purpose, yet Facebook's standard user interface can also be used as a general computational social science tool for other purposes.

To take an example, the Australian Bureau of Statistics (ABS) estimates the current population of Australia at 22.5 million. The Facebook advertising platform gives an Australian population (on Facebook) of 9.3 million: over 41 percent of the national population. Coverage is thinner at the tails: Facebook has only 0.29 million people over 64, while the ABS says there are 3.06 million Australians over 65. It follows that the sample for some other age ranges must be approaching the entire population, and may provide a very good model as a computational social science tool. For example, research shows that about two percent of the Australian population is not heterosexual. From the Facebook advertising platform, we can readily [select] a population of Australians, aged 18 to 21, who are male, and whose sexual preference is for men. The platform immediately tells us the population size is 11,580 people. By comparing this to the total size of the Australian male Facebook population who expressed a sexual preference, we can see this accounts for 2.89 percent of this population, indicating that the data available to Facebook is of similar utility to that available to social scientists for research.
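The coverage figures quoted above can be checked with simple arithmetic. The following is a minimal sketch that uses only the numbers given in the text; the helper name `share` is ours, and the "implied base" is back-calculated from the quoted percentage rather than reported in the source. Note also that the two age bands being compared (over 64 and over 65) do not match exactly, so the tail figure is only approximate.

```python
def share(part: float, whole: float) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole

# Figures quoted in the text, in millions of people.
abs_population = 22.5        # ABS estimate for Australia
facebook_population = 9.3    # Australians reported by the advertising platform
facebook_over_64 = 0.29      # Facebook users over 64
abs_over_65 = 3.06           # ABS estimate of Australians over 65

overall_coverage = share(facebook_population, abs_population)
tail_coverage = share(facebook_over_64, abs_over_65)

# 11,580 users are said to be 2.89 percent of the comparison population.
# The size of that population is not stated, so this is an inference.
implied_base = 11_580 / 0.0289

print(f"Overall coverage: {overall_coverage:.1f}%")
print(f"Coverage at the older tail (approximate): {tail_coverage:.1f}%")
print(f"Implied comparison population: {implied_base:,.0f}")
```

The gap between the overall figure and the tail figure is what supports the claim that coverage in the middle age ranges must be well above the national average.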

Data for Government

The second class of users of social media as computational social science tools is governmental. This is demonstrated by the U.S. government's demands to Twitter (via court orders) for data on Wikileaks founder Julian Assange and those connected to him. The court order was only revealed after Twitter took legal action to lift a court-imposed censorship order relating to the requests. The Wikileaks affair demonstrates how government can act when it sees social media as acting against its interests.

The very existence of social media can also promote government's agenda. During the Iranian elections, for example, Twitter was asked not to take its service off-line for scheduled maintenance. In another example, the U.S. State Department provided training in "using the Internet to effect social change" to Egyptian dissidents between 2008 and 2010, then sought (unsuccessfully) to keep social media access available during the January 2011 Egyptian anti-government protests. The Egyptian effort was defeated after Egypt responded by taking the entire country off the Internet, a move perhaps more in response to the U.S. than the protestors. While social media might enable activism, computational social science favours the state or at least those with power. Computational social science tools combined with social media data can be used to reconstruct the movements of activists, to locate dissidents, and to map their networks. Governments and their security services have a strong interest in this activity.

Social Media Data, Journalists, and Providers

The third class of actors is other social media platform users. Journalist Ada Calhoun has described the realisation that anyone could research her, just as she researched others while writing their obituaries, as an epiphany that left her "freaked out". In her article, Calhoun reflected that some amateur experts on the anarchic message board 4chan, or professional experts working for government agencies, could likely find out far more than she could. The everyday danger that can result when anyone can research anyone else can be demonstrated through two scenarios:

Scenario one involves Mary, who has been a Facebook user for some years. Through Facebook Mary reconnected with an old friend, Fred. As time went on, Mary and Fred grew closer and became a couple. One day Mary logged into her Facebook account and noticed that Fred had still not updated his details to say he was in a relationship with her. This made Mary feel very insecure, and caused her to begin doubting Fred's intentions. Due to this discovery, Mary broke off her relationship with Fred.

Scenario two involves Joe, who applied to a company as a Human Resource team leader. The hiring manager, Bob, found Joe's resume appealing and considered him a good candidate. Bob decided to check Joe's Facebook information. On Joe's publicly viewable wall, Bob saw several pictures of Joe in what Bob considered to be "questionable settings". The company never called Joe for an interview. Joe was given no opportunity to explain, nor any explanation of why his application was rejected.

Both Mary and Bob used Facebook as a computational tool to extract selected information as part of an investigation into the social dynamics of society, or in these cases, a particular individual's interactions with society. In this sense, Facebook could be considered a computational social science tool. Mary's inference may be based on a wider realisation that Fred's interactions with her are all in private and not part of his wider representation of himself. Bob may have drawn his conclusions from a combination of text, pictures, and social interactions.

These situations are far from hypothetical. Research released in November 2011 by Telstra, Australia's largest telecommunications company, revealed that over a quarter of Australian bosses were screening job candidates based on social media. At the start of 2012 the Australian Federal Police began an advertising campaign designed to warn the public of the need to protect their reputation online. The advertisement featured a job interview where the interviewer consults a paper resume, then proceeds to note various positive attributes about the candidate; all seems to be going very well. The interviewer then turns to his computer screen and adds "and I see from your recent online activity you enjoy planking from high rise buildings, binge drinking, and posting embarrassing photos of your friends online". The advertisement is an accurate picture of the current approach, which takes place at the level of one user examining another. Computational social science may soon lead to software programs that automatically complete pre-selection and filtering of candidates for employment.

The final class of actor we consider is the social media platform providers themselves. While Facebook provides numerous metrics to profile users for advertisers, far more data and scope for analysis is available to a platform provider like Facebook itself. Internet advertisements are often sold on a "cost per click" (CPC) or "cost per impression" basis (CPM, with M indicating that costs are typically quoted per thousand impressions). Thus, Facebook may maximize advertising revenue by targeting advertisements to achieve the greatest possible number of clicks for a given number of impressions. This maximization of the click-through rate (CTR) can be achieved using a wealth of hidden information to model which users are most likely to respond to a particular advertisement. Computational science can help a company like Facebook correctly profile its users, showing the right advertisements to the right people so as to maximize revenue. But what else can a company like Facebook or Google do? This depends on the data they hold.
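The link between click-through rate and revenue can be made concrete with a small calculation. The sketch below is illustrative only: the advertisement names, click counts, and prices are hypothetical and do not come from the source, and real platforms use far more elaborate auction and prediction models.

```python
from dataclasses import dataclass

@dataclass
class AdStats:
    """Hypothetical delivery statistics for one advertisement."""
    name: str
    impressions: int
    clicks: int
    cpc: float  # price paid per click, in dollars (made-up value)

    @property
    def ctr(self) -> float:
        """Click-through rate: clicks divided by impressions."""
        return self.clicks / self.impressions

    @property
    def revenue_per_thousand(self) -> float:
        """Revenue earned per 1,000 impressions under CPC pricing."""
        return self.ctr * self.cpc * 1000

# Same number of impressions and same price; only the targeting differs.
ads = [
    AdStats("untargeted", impressions=10_000, clicks=20, cpc=0.50),
    AdStats("targeted", impressions=10_000, clicks=80, cpc=0.50),
]

best = max(ads, key=lambda ad: ad.revenue_per_thousand)

for ad in ads:
    print(f"{ad.name}: CTR {ad.ctr:.2%}, "
          f"${ad.revenue_per_thousand:.2f} per 1,000 impressions")
print(f"Higher-earning ad: {best.name}")
```

With impressions and price held fixed, revenue scales directly with CTR, which is why better profiling of users translates into income for the platform.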

Triangulation, Breadth, and Depth

While horizontal expansion of computational social science allows greater access to selected aggregate data, vertical expansion allows larger operators to add depth to their models. This depth is a result of triangulation, a method originally from land surveying. Triangulation gives a confirmation benefit by using additional data points to increase the accuracy and confidence in a measurement. In a research context triangulation allows for information from multiple sources to be combined in a way that can expose underlying truths and increase the certainty of conclusions.
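The confirmation benefit described above can be illustrated statistically. The following is a minimal sketch, assuming the additional data points are independent, unbiased estimates of the same quantity; the measurement values are invented for illustration, and the standard error of the mean is our choice of confidence measure, not one named in the source.

```python
import math
import statistics

def standard_error(measurements: list[float]) -> float:
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(measurements) / math.sqrt(len(measurements))

# Hypothetical estimates of one quantity drawn from different sources.
few_sources = [31.0, 35.0, 30.0]
many_sources = [31.0, 35.0, 30.0, 33.0, 32.0, 34.0, 31.0, 33.0, 32.0]

se_few = standard_error(few_sources)
se_many = standard_error(many_sources)

print(f"3 sources: mean {statistics.mean(few_sources):.2f}, "
      f"standard error {se_few:.2f}")
print(f"9 sources: mean {statistics.mean(many_sources):.2f}, "
      f"standard error {se_many:.2f}")
```

Under these assumptions the uncertainty shrinks as sources are added. If the sources share a common bias, adding more of them would increase apparent confidence without improving accuracy.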

Social media platforms have added to their data either by acquiring other technology companies, as Google did when acquiring DoubleClick and YouTube, or by moving into new fields, as Facebook did when it created "Facebook Places": a foursquare-like geolocation service. From a computational social science perspective, geolocation services in particular add high value information. Maximising the value of information requires a primary key that connects this data with existing information; a Facebook user ID, or a Google account name provides just such a key.
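The role of a primary key can be shown with a small join. The sketch below is a hypothetical illustration: the field names, user IDs, and records are invented, and it does not describe how any real platform stores or links its data.

```python
# Existing profile data, keyed by a (hypothetical) account identifier.
profiles = {
    "u1": {"name": "Alice Example", "employer": "Acme"},
    "u2": {"name": "Bob Example", "employer": "Initech"},
}

# Newly acquired geolocation data that carries the same identifier.
checkins = [
    {"user_id": "u1", "place": "Cafe", "time": "2012-01-05T08:30"},
    {"user_id": "u2", "place": "Gym", "time": "2012-01-05T18:00"},
    {"user_id": "u1", "place": "Clinic", "time": "2012-01-06T14:15"},
]

def link_by_key(profiles: dict, checkins: list) -> dict:
    """Attach each check-in to the profile that shares its user ID."""
    linked = {uid: {**profile, "checkins": []}
              for uid, profile in profiles.items()}
    for checkin in checkins:
        uid = checkin["user_id"]
        if uid in linked:  # records without a matching key cannot be linked
            linked[uid]["checkins"].append(
                (checkin["time"], checkin["place"]))
    return linked

linked = link_by_key(profiles, checkins)
for uid, record in linked.items():
    print(uid, record["name"], record["employer"], record["checkins"])
```

Neither dataset alone says where a named employee of a given company was at a given time; the shared key is what makes the combined record possible.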

The breadth of an account measures how many types of online interaction the one account connects. It lets the company providing the account know about a wider slice of a user's life. Three situations are possible. The first involves distinct accounts on multiple sites and allows no overlap of data: what occurs on one site stays on that site. The second situation is where there is a single traceable login, for example your e-mail address, which is used on multiple sites but where the sites are independent. Someone, or some computational social science tool, with access to the datasets could aggregate the data. The third possibility is a single login with complete data sharing between sites. All the data is immediately related and available to any query the underlying company devises. It is this last scenario that forms the Holy Grail for companies like Facebook and Google, and causes the most concern for users.

The announcement by Alma Whitten, Google's Director of Privacy, Product and Engineering, in January 2012 that Google would aggregate its data and "treat you as a single user across all our products" has led to a sharp response from critics. Jeffrey Chester, executive director of the Center for Digital Democracy, told the Washington Post: "There is no way a user can comprehend the implication of Google collecting across platforms for information about your health, political opinions and financial concerns." In the same article, Common Sense Media chief executive James Steyer states bluntly that "Google's new privacy announcement is frustrating and a little frightening".

The depth of an account measures the amount of data an account connects. There are three possible situations. The first is an anonymous login with no connection to personal details; the virtual profile is complete in and of itself, and it may or may not truthfully represent the real world. The second situation is an account where user details are verified, for example a university login that is only provided once a student registers and identification papers have been checked. A number of online services and virtual communities are now using this model and checking government-issued identification to verify age. The third situation involves an account that has a verified identity aggregated with other data collected from additional sources; for example, a credit card provider knows who its customers are, as well as where they have been and what they have bought. The temporal nature of the data is also a matter of depth; your current relationship status has less depth than your complete relationship history.

Facebook's Timeline feature signifies as large a change to depth as Google's policy change does to breadth. Timeline lets users quickly slide to a previous point in time, unearthing social interactions that had long been buried. A Facebook announcement on 24 January 2012 informed the world that Timeline was not optional and would in a matter of weeks be rolled out across all Facebook profiles.

As Sarah Jacobsson Purewal noted in PC World, with Timeline it takes only a few clicks to see data that previously required around 500 clicks on the link labelled "older posts", each click separated by a few seconds' delay while the next batch of data loads. Purewal provides a step-by-step guide to reasserting privacy under the new Timeline regime; the steps are numerous and the ultimate conclusion is that "you may want to just consider getting rid of your Facebook account and starting from scratch". Though admittedly not scientific, a poll by Sophos, an IT security and data protection company, showed that over half those polled were worried about Timeline. The survey included over 4,000 Facebook users from a population that is likely both more concerned and more knowledgeable about privacy and security than the average user. If that wasn't telling enough, the author of the announcement, Sophos' senior technology consultant, Graham Cluley, announced in the same article that he had shut down his Facebook account. Cluley's decision was a response to realizing exactly how much of his personal data Facebook was holding, and to fatigue at Facebook's ever-changing and non-consultative privacy regime.

All accounts have both a breadth and a depth. Accounts that are identity-verified, frequently updated, and used across multiple aspects of a person's life present the richest data and pose the greatest risk. The concept of a government-issued national identity card has created fierce debate in many countries, yet that debate has been muted when the data is collected and held by non-government actors. Google's new ubiquitous account and Facebook's single platform for all forms of social communication should raise similar concerns for individuals as both consumers and citizens....

Privacy and Caveat Emptor

In discussing the ethics of social science research, [Constance] Holden noted two schools of thought: utilitarianism (also known as consequentialism) holds that an act can only be judged on its consequences; deontologicalism (also known as non-consequentialism) is predominantly about absolute moral ethics. In the 1960s utilitarianism was dominant, along with moral relativism; in the late 1970s deontologicalism began to hold sway. In computational social science, the debate seems to be academic with little regard given to ethics. Conditions of use are typically one-sided without user input, although Wikipedia is a notable exception. Companies expand their services and data sets with little regard for ethical considerations, and market forces in the form of user backlashes [are] the first, and often only, line of resistance.

One such backlash occurred over Facebook's Beacon software, which was eventually cancelled as part of an out-of-court settlement. Beacon connected people's purchases to their Facebook account; it advertised to their friends what a user had purchased, where they got it, and whether they got a discount. In one instance, a wife found out about a surprise Christmas gift of jewellery after her husband's purchase was broadcast to all his friends, including his wife. Others found their video rentals widely shared, raising concerns it might out people's sexual preferences and other details of their private life. In addition to closing down Beacon, the settlement involved the establishment of a fund to better study privacy issues, an indication that progress was stepping well ahead of ethical considerations.

The caveat emptor view of responsibility for disclosure of personal data by social networking sites is arguably unsustainable. Through Beacon, retailers shared purchasing information with Facebook based on terms and conditions purchasers either failed to notice, or failed to fully appreciate. Beacon took transactions outside consumers' reasonable expectations. While Facebook was forced to discontinue the service, appropriate ethical consideration by technology professionals could have highlighted the problems at a much earlier stage.

Source Citation

Oboler, Andre, Lito Cruz, and Kristopher Welsh. "Social Media Data Collection Can Lead to Violations of Privacy." Are Social Networking Sites Harmful? Ed. Noah Berlatsky. Farmington Hills, MI: Greenhaven Press, 2015. At Issue. Rpt. from "The Danger of Big Data: Social Media as Computational Social Science." First Monday 17.7 (2 July 2012). Opposing Viewpoints in Context. Web. 17 Mar. 2016.

URL
http://db24.linccweb.org/login?url=http://ic.galegroup.com/ic/ovic/ViewpointsDetailsPage/ViewpointsDetailsWindow?failOverType=&query=&prodId=OVIC&windowstate=normal&contentModules=&display-query=&mode=view&displayGroupName=Viewpoints&dviSelectedPage=&limiter=&currPage=&disableHighlighting=&displayGroups=&sortBy=&zid=&search_within_results=&p=OVIC&action=e&catId=&activityType=&scanId=&documentId=GALE%7CEJ3010744218&source=Bookmark&u=lincclin_spjc&jsid=cbaa86d882063eca0b92592fa47a565d

Gale Document Number: GALE|EJ3010744218



Sources:
http://ic.galegroup.com.db24.linccweb.org/ic/ovic/ViewpointsDetailsPage/ViewpointsDetailsWindow?failOverType=&query=&prodId=OVIC&windowstate=normal&contentModules=&display-query=&a
