
Mining The Process Of Extracting New Information Research Proposal

… Mining The process of extracting new information from existing information with a computer system is called text mining. Text mining retrieves data from the available information and establishes connections between the facts mentioned in that data; this is how new information is developed. Because the information is newly formed, it is validated through experimentation. Web search is often confused with text mining, but the two are entirely different processes. In web search, the computer matches keywords against a database and returns the relevant records; that information was written by somebody and then uploaded to the internet to make it searchable. In text mining, by contrast, altogether new information is generated from an existing body of knowledge (Berry, 2004).

Text mining has its roots in data mining, the process in which a computer system retrieves previously unknown information from an existing database. Hence text mining is also called Text Data Mining. Other names are Intelligent Text Analysis and Knowledge Discovery in Text (KDT). It extracts interesting information from unstructured text. Mining unstructured information has high value because unstructured data is readily available and its volume is large. Text mining is perceived to have high commercial value because more than 80% of information is stored as text and can be explored to generate new knowledge. In addition to data extraction, text mining draws on computational linguistics, statistics and machine learning (Berry, 2004).

Knowledge Discovery from Databases (KDD) is gaining prominence in emerging applications such as text understanding. It works by extracting both implicit and explicit concepts from existing data and then forming semantic relations among those concepts, with the help of Natural Language Processing (NLP) techniques. Combined with NLP, KDD discovers useful information through knowledge management, information extraction, machine learning, statistics and reasoning (Navathe et al., 2000).

As mentioned earlier, data mining and text mining are similar concepts. The difference lies in the type of data explored and the tools used. Data mining works well only with highly structured data, while text mining is also applicable to semi-structured and unstructured data, such as HTML files, full-text documents and emails. This makes text mining attractive to companies. One aspect, however, hinders its use: its dependence on NLP. Natural language was not designed for computer systems, nor has it evolved for that purpose. Because of this, structured data and data mining practices remain more prevalent in research and development (Navathe et al., 2000).

The obstacles that NLP poses for computer systems do not exist for human beings. Humans can easily comprehend language patterns and can even distinguish between different patterns used in the same text, such as contextual meanings, slang and spelling variations. Computer systems cannot yet identify linguistic patterns as quickly (Weiguo, 2005).

A collection of documents is provided to the text mining tool. After exploring them, the tool selects a particular document and identifies its character set and format. It then analyzes the text of the document, repeatedly applying various techniques to extract information. The presented example cites three techniques of text analysis; however, there may be many others based on combinations of these techniques. The choice depends on the organizational goals, which provide guidelines about the data to be extracted. The retrieved data is loaded into the organization's management information systems so that end users can retrieve it for their own use (Weiguo, 2005).

Statement of the problem

There is a gap in the literature regarding the extraction of text information from very large databases.

Purpose of the study

The study investigates how to extract a specific phrase from a text. It employs survey techniques to interview experts in the field and assesses results using coding techniques.

Rationale of the study

Several research studies related to text extraction have been carried out. However, no research has focused on evaluating text information extraction in large databases.

Therefore, this research will fill this gap in the literature by investigating the extent to which text extraction can be made accurate and precise.
Lastly, this study also offers a number of theoretical contributions. Common analytical and operational issues have become increasingly important as institutions move from comparatively simple methods and communication models to intricate multi-channel models. In addition, the combined forces of technology, demography, control and globalization have been pushing organizational information systems all over the world to change their strategy in order to keep pace with an ever-changing environment. Evaluating the extent to which text extraction from large databases can be made accurate and precise has been a neglected topic, and this study will shed light on it.

Research Questions

The main research question is:

How can a specific phrase be extracted from text in large databases?

Literature Review

Technological foundations

The gap between computer and human languages, which arose from the numerous differences between them, is now narrowing as technology improves. Computers are increasingly able to comprehend, critique and produce text because they have been taught natural language through programs created by people working in the field of natural language processing. The capabilities developed in these programs include topic tracking, retrieving relevant information from a database, organizing data, summarizing it, linking topics and answering questions. These developments, their roles and their usefulness to the user are discussed in detail below (Sergio, 2002).

A. Extraction of Information

An information extraction (IE) program identifies the key elements of a text by recognizing how the text is written, a technique known as pattern matching. The links between places, times and people are identified so that the user receives useful information from the database. This is helpful when large quantities of data are being processed. Previously, it was assumed that the information to be mined was already in structured form. That is not the case: in many applications, electronic information is not structured but exists as free text. IE addresses this issue, since its task is to form structured data from raw text. To do this, the IE module works with a KDD module. After useful information has been extracted from the information provided, DISCOTEX uses discovery rules to check whether any information in the database has been missed (Sergio, 2002).
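The pattern-matching idea can be illustrated with a minimal sketch. It is not the DISCOTEX method; the three regular expressions, the `extract_record` function and the sample sentence are all hypothetical and chosen only to show how free text might be turned into a structured record of people, places and times.

```python
import re

# Hypothetical, deliberately simple patterns (assumptions for illustration).
# Real IE systems use much richer rules or learned models.
MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
DATE_PATTERN = re.compile(r"\b(\d{1,2} (?:" + MONTHS + r") \d{4})\b")
PERSON_PATTERN = re.compile(r"\b(?:Mr|Ms|Dr)\. ([A-Z][a-z]+(?: [A-Z][a-z]+)?)")
PLACE_PATTERN = re.compile(r"\bin ([A-Z][a-z]+)\b")


def extract_record(text):
    """Turn one free-text sentence into a structured record."""
    return {
        "people": PERSON_PATTERN.findall(text),
        "places": PLACE_PATTERN.findall(text),
        "dates": DATE_PATTERN.findall(text),
    }


sample = "Dr. Jane Smith presented the results in Boston on 12 March 2004."
record = extract_record(sample)
print(record)
```

Records of this shape could then be stored in a table and mined with ordinary data mining tools, which is the role the text assigns to the KDD module.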

B. Topic Tracking

Yahoo offers a free topic tracking tool to users at www.alert.yahoo.com. This tool informs the user about any news available on a topic the user chooses. More generally, a topic tracking system maintains a user's profile and suggests documents related to those the user has viewed earlier. Despite its benefits, topic tracking has limitations. For instance, a user who has set an alert for 'text mining' may receive many news items about mining for minerals or the characteristics of minerals instead. Through topic tracking, a company can be notified when a competitor enters the market, so it stays up to date with market changes and can respond accordingly. Students can use topic tracking to research their subjects and find articles related to their studies. Organizations can find news about themselves. Topic tracking can also help doctors and individuals who are searching for treatments and the latest developments in the medical field. More capable text mining tools could let users select their interests directly, or could infer a user's interests from the articles he or she previously selected from the database (Sergio, 2002).
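The 'text mining' limitation described above can be reproduced with a small sketch. It does not describe how any real alert service works; the `matches_alert` function, its `require_all` flag and the sample headlines are assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def matches_alert(alert, headline, require_all=False):
    """Return True if the headline matches the alert terms.

    With require_all=False any shared word triggers the alert, which is
    what produces off-topic hits. With require_all=True every alert
    word must appear in the headline.
    """
    alert_terms = tokenize(alert)
    headline_terms = tokenize(headline)
    if require_all:
        return alert_terms <= headline_terms
    return bool(alert_terms & headline_terms)


headlines = [
    "New text mining toolkit released for researchers",
    "Mining for minerals expands in the region",
    "Competitor enters the analytics market",
]

loose = [h for h in headlines if matches_alert("text mining", h)]
strict = [h for h in headlines if matches_alert("text mining", h, require_all=True)]
print(loose)
print(strict)
```

With loose matching, the mineral-mining headline is returned alongside the relevant one; requiring all alert terms removes it, at the cost of missing headlines that phrase the topic differently.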

Keywords are a set of particular words in an article that give users a meaningful summary of its substance. Extracting keywords manually from a database is very time consuming and practically impossible, and it is harder still for news articles, which are published in huge quantities every day. As the World Wide Web has become a platform for online documents, keyword extraction has become a basis for several text mining applications, such as summarization, text categorization, topic detection and search engines. Thus, a summarized data can…
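A minimal sketch of automatic keyword extraction follows. It uses plain term frequency with a stopword list, which is one simple technique among many and is not taken from the source; the `STOPWORDS` set, the `extract_keywords` function and the sample article are assumptions for illustration.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; real systems use much longer ones.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
             "for", "on", "that", "from", "each"}


def extract_keywords(text, top_n=2):
    """Return the top_n most frequent non-stopword terms in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]


article = ("Text mining extracts keywords from news articles. "
           "Keywords summarize each article, and mining keywords "
           "supports categorization.")
keywords = extract_keywords(article, top_n=2)
print(keywords)
```

Raw frequency favors words that are common everywhere, so practical systems usually weight terms against a background collection before ranking them.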

Categorization procedures allow customer service centers to segment a database according to its content and topics, so that customers can conveniently browse the relevant topics. Text categorization aims to classify a document according to the topics it comprises. It also compares documents with one another for content relevance (Gupta and Lehal, 2009).

One way to build a categorization algorithm is to learn the procedure from documents that have already been classified. The learned algorithm can then be used to categorize unclassified documents. For instance, let there be two sets, D and C, containing n and p elements respectively, where D is a set of categorized documents and C is the set of classes to which they belong. The objective of learning the categorization process is to ascertain which element of C corresponds to each element of D. In this way, n documents are classified into p classes. Before learning, the collected data is run through a feature selection process for preparation (Shantanu and Shourya, 2008).
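Learning from classified documents can be sketched as follows. The source does not name a learning algorithm, so this example swaps in a small multinomial Naive Bayes classifier with add-one smoothing as an assumption; the `train` and `classify` functions, the class labels and the customer-service messages are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def train(labeled_docs):
    """Learn word counts per class from (text, label) pairs.

    labeled_docs plays the role of the set D; the labels come from C.
    """
    class_doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_docs:
        class_doc_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_doc_counts, word_counts, vocabulary


def classify(text, model):
    """Return the class with the highest Naive Bayes log-score."""
    class_doc_counts, word_counts, vocabulary = model
    total_docs = sum(class_doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label, doc_count in class_doc_counts.items():
        score = math.log(doc_count / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training = [
    ("refund for a damaged order", "billing"),
    ("invoice shows the wrong charge", "billing"),
    ("cannot log in to my account", "access"),
    ("password reset link does not work", "access"),
]
model = train(training)
print(classify("wrong charge on my invoice", model))
print(classify("reset my password", model))
```

Here n = 4 training documents are mapped onto p = 2 classes; a feature selection step, as mentioned in the text, would normally prune the vocabulary before training.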

Data collected consists of text that comprises of