Preserving Privacy Of Individuals In Data Mining Research Paper

Introduction

The number of data collections containing person-specific information is growing exponentially. The organizations that collect this data are entrusted to ensure that it remains private and that no external entities have access to it. However, the data can be valuable to researchers and analysts trying to answer numerous questions, and in many cases organizations would like to share it while protecting the privacy of the individuals. Protecting privacy makes it harder for the organization to preserve the utility of the data, which can result in less accurate analytical outcomes (Sweeney, 2002). Data owners therefore want a way to transform datasets containing highly sensitive information into privacy-preserving records that they can share with other researchers or corporate partners. However, there have been numerous cases of organizations releasing datasets that they believed were anonymized, only for the records to be re-identified. It is therefore vital for organizations to understand how anonymization techniques work and to assess how they can be safely applied to datasets.

This is where k-anonymity comes into play. K-anonymity is a privacy model that is applied to protect the data subjects' privacy when sharing data. A release of data has the k-anonymity property if the data for each individual in the release cannot be distinguished from that of at least k-1 other individuals whose data also appears in the release. K-anonymity reduces the risk of re-identification because linking the release to another dataset can never narrow a match down to fewer than k records. Using the k-anonymity property, one is able to make the dataset less precise...
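The k-anonymity definition above can be made concrete with a short check. The following is a minimal sketch, not a reference implementation: the function name `is_k_anonymous`, the choice of `age` and `zip` as the linkable attributes (often called quasi-identifiers), and the toy records are all assumptions made for illustration.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    that occurs in the release is shared by at least k records."""
    group_sizes = Counter(
        tuple(record[attr] for attr in quasi_identifiers)
        for record in records
    )
    return all(size >= k for size in group_sizes.values())

# Hypothetical toy release; the values are invented for this example.
records = [
    {"age": "<= 20", "zip": "130**", "diagnosis": "flu"},
    {"age": "<= 20", "zip": "130**", "diagnosis": "asthma"},
    {"age": "> 20", "zip": "148**", "diagnosis": "flu"},
    {"age": "> 20", "zip": "148**", "diagnosis": "diabetes"},
]

print(is_k_anonymous(records, ["age", "zip"], k=2))  # True
print(is_k_anonymous(records, ["age", "zip"], k=3))  # False
```

Each individual here shares an age range and zip prefix with exactly one other person, so the release is 2-anonymous but not 3-anonymous.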

The generalization method replaces individual attribute values with a broader category, which prevents the individual values from being re-identified. For example, the value ‘19’ of the age attribute could be replaced with ‘≤ 20’. This anonymizes the age values and makes re-identification harder. Suppression entails replacing certain attribute values with an asterisk. All or some of the values in a column may be replaced. For example, every value of the name attribute could be replaced with an asterisk, or some of the values for zip code could be replaced with asterisks.
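The two transformations described above can be sketched in a few lines. This is an illustrative example only: the helper names `generalize_age` and `suppress`, the threshold of 20, the choice to keep a three-character zip prefix, and the sample record are assumptions, not part of the cited work.

```python
def generalize_age(age, threshold=20):
    """Generalization: replace an exact age with a broader category."""
    return f"<= {threshold}" if age <= threshold else f"> {threshold}"

def suppress(value, keep=0):
    """Suppression: keep the first `keep` characters and mask the rest.
    With keep=0 the whole value collapses to a single asterisk."""
    text = str(value)
    if keep <= 0:
        return "*"
    return text[:keep] + "*" * max(len(text) - keep, 0)

# Hypothetical record, invented for this example.
record = {"name": "Alice", "age": 19, "zip": "13053"}

anonymized = {
    "name": suppress(record["name"]),        # fully suppressed
    "age": generalize_age(record["age"]),    # generalized to a range
    "zip": suppress(record["zip"], keep=3),  # partially suppressed
}
print(anonymized)  # {'name': '*', 'age': '<= 20', 'zip': '130**'}
```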
Each of these two methods has limitations on its own, and combining them decreases the risk of the data being re-identified. Kohlmayer et al. (2015) posit that combining the two techniques preserves the truthfulness of the information in the dataset. Using the two methods together also allows the dataset to preserve the privacy of the individuals. Any information that is left exposed by one method can be eliminated by the other, and this will ensure that the released dataset…
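One way the two methods can work together is to generalize first and then suppress whatever generalization alone leaves exposed. The sketch below assumes that ordering and a group-size threshold k; the function `generalize_then_suppress`, the attribute names, and the records are hypothetical and are not taken from Kohlmayer et al. (2015).

```python
from collections import Counter

QUASI_IDENTIFIERS = ("age", "zip")  # assumed linkable attributes

def generalize_age(age, threshold=20):
    """Generalization: replace an exact age with a broader category."""
    return f"<= {threshold}" if age <= threshold else f"> {threshold}"

def group_key(record):
    """The combination of quasi-identifier values a record exposes."""
    return tuple(record[attr] for attr in QUASI_IDENTIFIERS)

def generalize_then_suppress(records, k):
    """Generalize ages, then replace the quasi-identifiers of any record
    whose group still has fewer than k members with asterisks."""
    generalized = [{**r, "age": generalize_age(r["age"])} for r in records]
    sizes = Counter(group_key(r) for r in generalized)
    masked = {attr: "*" for attr in QUASI_IDENTIFIERS}
    return [
        r if sizes[group_key(r)] >= k else {**r, **masked}
        for r in generalized
    ]

# Hypothetical input, invented for this example.
records = [
    {"age": 19, "zip": "13053", "diagnosis": "flu"},
    {"age": 18, "zip": "13053", "diagnosis": "asthma"},
    {"age": 35, "zip": "14850", "diagnosis": "flu"},
    {"age": 42, "zip": "14850", "diagnosis": "diabetes"},
    {"age": 67, "zip": "14853", "diagnosis": "asthma"},
]

for row in generalize_then_suppress(records, k=2):
    print(row)
```

In this toy run the first four records fall into groups of two after generalization, while the last record is alone in its group and has its age and zip code suppressed. Note that the suppressed rows form a group of their own, which may itself contain fewer than k records; a fuller approach would need to drop such rows or generalize further.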

References

Fung, B. C. M., Wang, K., Fu, A. W.-C., & Yu, P. S. (2010). Introduction to privacy-preserving data publishing: Concepts and techniques. Boca Raton, FL: CRC Press.

Kohlmayer, F., Prasser, F., & Kuhn, K. A. (2015). The cost of quality: Implementing generalization and suppression for anonymizing biomedical data with minimal information loss. Journal of Biomedical Informatics, 58, 37-48.

Sweeney, L. (2002). k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5), 557-570.

 

