Saturday, December 7, 2019
Use of Machine Learning Program & Techniques-Samples for Students
Question: Discuss the use of machine learning programs and data mining techniques for speech-to-speech summarization of text.

Answer:

Title: Investigation of the machine learning and data mining activities associated with speech-to-speech and speech-to-text summarization.

Introduction: This paper examines the use of machine learning programs and data mining techniques for speech-to-speech summarization of text. Speech recognition is the technology by which a computer converts the language spoken by a user into text. Voice recognition is a major component of the speech recognition methodology. Automatic recognition of speech makes it possible to mine text from the available data, that is, to identify the relevant, high-quality information in the available text. Statistical pattern learning is among the most useful techniques for identifying trends when accumulating accurate, high-quality information. Summaries presented as speech fall into two types: the first concatenates speech segments extracted from the original speech, and the second synthesizes the summary with a speech synthesizer.

Research aim: The aim of the research is to apply data mining techniques to reduce the errors that occur in the compilation of sentences. It helps in identifying the procedures used for reducing errors in a speech-to-speech program.

Research questions and Objectives: The research question undertaken for this study is: How can wrong information caused by speech recognition errors be avoided?
Objective:
To record the errors occurring in speech recognition.
To identify policies for minimizing the occurrence of errors in speech recognition techniques.
To identify the method used for removing less important information before the compilation of sentences.
To propose data mining techniques for reducing the errors that occur in sentence formation.
To focus on the advantage of data mining techniques in recognizing less important data.

Research hypothesis: The performance of the existing system can be improved by integrating a speech recognition system into the working platform (Berry, 2014). The research hypotheses are constructed to develop the study of the errors that occur in the process of speech recognition. Under-generation and over-generation are used for interpreting the hypotheses undertaken (Perner, 2010).
H0: The rejection and acceptance of errors in sentence construction helps in removing errors from the phrases used.
H1: The clarification of the request helps in confirming the concept, which supports the design of an error-resistant working model for speech-to-speech and speech-to-text summarization (Cercone, 2012).
H2: The concatenation of speech segments for summarizing sentences, words, and phrases helps in extracting the relevant information from them.
H3: Errors are the deviation of the outcome from the expected results.
H4: The word error rate is used for quantifying errors by measuring the rate of insertions, deletions, and substitutions.
H5: The concept error rate is used for determining the quantity of errors that occur in the concatenation of sentences (Dinoy, 2016).
H6: The categorisation of the rigid parser helps in demonstrating the over-generation and under-generation of the resources.
H7: The tendency to recall occurrences of the activities helps in managing errors. The consequences help in determining the constraint ratio.

Background and significance

Communication plays a vital role in achieving accuracy in report generation and required documentation. Speech recognition was introduced to convert spoken words into textual sentences (Moreno, 2012). The efficiency of converting words into text can be improved with a speech program. Errors that occur when documentation is prepared through a speech recognition system cause user dissatisfaction, and the turnaround time for producing the report increases because of them. Substituting a wrong word in a phrase can completely change the meaning of a sentence. The focus should be on using the two-stage protocol for constructing sentences accurately. The effectiveness and efficiency of sentence construction can be improved by managing the compression ratio of longer sentences to reduce the error rate. The extraction of a sentence depends on three scores: the linguistic score, the significance score, and the confidence score. Error handling helps in demonstrating the errors in the scenario (Liu, 2006). The following table shows the types of errors that occur in the given scenario of speech recognition techniques (Source: Kang, 2013).
Dialogue strategies to overcome speech recognition errors in form-filling dialogue

Categories | Over-generation error type | Under-generation error type
Categorisation of the errors | Low precision of errors | Low recall of errors
Occurrence of ASR errors | Process of insertion | Process of deletion
Occurrence of miscommunication | Misunderstanding in the complete process of the concatenation of the sentences | Non-understanding of the error-prone construction of the sentences
Consequences of the error-prone sentence | Failure of the task | Repetition of the occurrence of errors

Identifying errors helps in reducing them in the construction of the speech recognition system. The acoustic and language models are used for determining the constant errors that occur in the construction of a sentence. The speech recognition system is also associated with variable errors. Knowing the expected outputs helps in developing an error-resistant system that gives a clear understanding of the information transmitted by the speaker. Error handling helps in resolving issues of understanding between the sender and the receiver. Managing false acceptance and false rejection helps in reducing the error rate. Correct and incorrect decisions both pose a problem for the speech recognition system. Detection and correction methods are used for determining the sequence of activities. The command dialogue helps in establishing the n-list associated with the errors. Predicting errors helps in setting the baseline classification for the improvement plan used in the continuous error reduction procedure. The decision chart for handling errors works on four processes: acceptance, rejection, understanding of the display, and clarification of the request.
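The word error rate named in hypothesis H4, together with the insertion and deletion errors listed in the table, can be illustrated with a short sketch. This is a minimal example that assumes the usual word-level edit distance definition (substitutions + deletions + insertions, divided by the number of reference words); the function name `word_error_rate` and the sample sentences are hypothetical and are not taken from the study.

```python
def word_error_rate(reference, hypothesis):
    """Return (wer, substitutions, deletions, insertions) for two word strings.

    Assumed definition: WER = (S + D + I) / number of reference words.
    An empty reference returns a WER of 0.0 by convention in this sketch.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = (cost, subs, dels, ins) for aligning ref[:i] with hyp[:j]
    dp = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, i, 0)          # only deletions
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, j)          # only insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]   # words match, no extra cost
                continue
            c, s, d, n = dp[i - 1][j - 1]
            substitution = (c + 1, s + 1, d, n)
            c, s, d, n = dp[i - 1][j]
            deletion = (c + 1, s, d + 1, n)
            c, s, d, n = dp[i][j - 1]
            insertion = (c + 1, s, d, n + 1)
            dp[i][j] = min(substitution, deletion, insertion)
    cost, subs, dels, ins = dp[-1][-1]
    wer = cost / len(ref) if ref else 0.0
    return wer, subs, dels, ins


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

In the sample call, one word is substituted and one is deleted, so two of the six reference words are in error.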
Dividing the problem helps in identifying the error-prone area. The theoretical model for predicting data-driven policies helps in analysing the grounding decision. Detecting errors in the later phases helps in predicting inconsistencies, so that the assumptions made in developing speech-to-speech and speech-to-text summarization can be re-evaluated. The following table shows the positive and negative cues of the speech recognition system, which are used for analysing speech-to-speech and speech-to-text summarization. Errors can be handled by a series of functions: development of a new system, repetition of the processes undertaken by the user, integration of the system, modification of the user requirement, and negation by the user.

Research methodology

Qualitative and quantitative research methodologies are used for analysing the facts and figures associated with the investigation of the data mining techniques used in speech-to-speech and speech-to-text summarization. Interviews are arranged with IT experts to gather the facts and figures needed for this investigation. The interview scenario is developed to formulate a solution for reducing the errors that occur in the speech synthesis system. A questionnaire is given to many IT experts on the same platform to analyse the differences in the information they provide about data mining in the speech text summarization system. A focus group is a group of IT experts who collect relevant information, based on real and virtual facts gathered from different sources, on a common platform (Najafabadi, 2012).
The sampling method is based on selecting a small sample and organizing an experiment to analyse the frequency of errors that occur in the construction of a sentence. Sampling helps in analysing the defects and gaps that exist in the concatenation of phrases into a sentence (Kawle, 2013). Qualitative and quantitative methods are used for predicting the errors that occur in spoken sentences. The data gathered from focus groups, interviews, and other qualitative approaches provides details on using the following data mining techniques for reducing errors in sentence formation:

Unsupervised data mining techniques | Semi-supervised data mining techniques | Supervised data mining techniques
Sentiment lexicon techniques | Classification of the lexical sentiments | Support vector machine
Distinction of positive and negative binary data | Transductive support vector machine | Use of naive Bayes
Orientation of the sentiments | Detection of the polarity | Adaptation of the naive Bayes
Extraction of the patterns | Mincuts of the randomised data | Development of the decision tree

Identifying the problematic error helps in devising concept-level clarification. Alternative clarification helps in generating parallel hypotheses for managing the decision problem. Fixing errors helps in developing robust processes for handling uncertainty and ambiguity in the development of the speech recognition system. The following tasks should be taken into consideration while collecting data on speech-to-speech synthesis.
Tokenization process: The sequence of characters is broken down into tokens, which can be used for putting punctuation marks in the text for further processing. A higher rate is generated with longer sentences.
Filtering: Filtering is the process of removing extra words from frequently appearing text.
Lemmatization: It is used for performing morphological analysis on the sequence of characters.
Stemming: Stemming is the method used for obtaining root words from a sequence of derived words.

Research Philosophy: The research philosophy focuses on the use of knowledge for speech text summarization. The complexity of the investigating techniques rises because of the potential risks associated with deploying a speech recognition system. Accuracy is the major factor in preparing a report through a speech-to-speech or speech-to-text recognition system. The ontological research philosophy is used for defining the process of conceptualization between different terms, in order to find the relationships between the recognized knowledge-based domains.

Research Strategy: The focus should be on the sources responsible for uncertainty and errors: the human speaker, age, gender, and variability in the dialects used to construct sentences for communication between the participating units. The speaking rate is a major factor responsible for the occurrence of errors. Unpredictable results help in identifying errors related to out-of-vocabulary words. Errors can be handled by distinguishing the bugs and exceptions that occur in the speech recognition system.

Research Design: The research design focuses on analysing the research problem and the correlation between dependent and independent variables for speech text summarization. The system comprises a robust assessment of the hypothetical activities for resolving the errors that occur across the construction and concatenation of sentences (Neto, 2015). Acceptance of the concept helps in defining the error-resistant background for the construction of a sentence.
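The preprocessing tasks described earlier (tokenization, filtering, and stemming) can be sketched in a few lines. This is an illustrative example only: the stop-word list `STOP_WORDS`, the suffix rules `SUFFIXES`, and the function names are assumptions made for the sketch, and the suffix stripping is a toy stand-in for a real stemmer or lemmatizer.

```python
import re

# Hypothetical stop-word and filler list, chosen only for illustration.
STOP_WORDS = {"the", "a", "an", "of", "is", "are", "to", "in", "and", "um", "uh"}
# Hypothetical suffix rules; a real stemmer uses a much richer rule set.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Break a character sequence into lowercase word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def filter_tokens(tokens):
    """Remove frequently occurring extra words (stop words and fillers)."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run tokenization, filtering, and stemming in sequence."""
    return [stem(t) for t in filter_tokens(tokenize(text))]


if __name__ == "__main__":
    print(preprocess("Um, the speaker is summarizing the recorded sentences."))
```

The crude stems (for example "summariz") show why a lemmatizer, which performs morphological analysis, is often preferred when readable root words are needed.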
Data Collection: The following data collection methods are used for the speech text summarization process:

Naive Bayes collection method: This approach is based on assumptions. Bayes' rule is used for collecting the parameters for the study. Independence is the common assumption applied to the different data collected. The probabilities can be calculated by summing the probabilities for the variety of components (Source: Nenkova, A. (2016). A survey of text summarization techniques. 1st ed. [ebook]). The highest probability can be calculated by the following:

Nearest neighbour collection method: This method measures distance-based data to improve the classification methodology. The k-nearest neighbour method is used for classifying different components.

Decision tree collection method: This method is used for calculating the value of the attributes in the given hierarchy of data. The root node is classified as the instance for the tree structure.

Support vector machine: This is a supervised linear classifier that makes decisions based on a linear combination of the data. It handles high-dimensional data robustly.

Data analysis:

Analysis of the speech text summarization: Speech text summarization depends on a sequence of two stages: extraction of the sentence and compaction of the sentence. The result helps in calculating the accuracy of the sentence. Fillers are removed from the sentence to control the automatic speech system. The following diagram shows the automation system used for text summarization (Source: Zhong, N. (2012). Effective pattern discovery for text mining. 1st ed. [ebook]).

Procedure of sentence extraction: The following equation is used for scoring the result of the automatic speech summarization system.
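A sketch of the sentence score, assuming the averaged weighted-sum form commonly used in the speech summarization literature. The weighting factors \lambda_I and \lambda_C are assumptions introduced here to balance the three scores and do not appear in the text; the remaining symbols are the ones defined in the surrounding discussion.

```latex
S(W) = \frac{1}{N} \sum_{i=1}^{N} \left\{ L(w_i) + \lambda_I \, I(w_i) + \lambda_C \, C(w_i) \right\}
```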
Here, N represents the number of words in the sentence, L(wi) represents the linguistic score, I(wi) represents the significance score, and C(wi) represents the confidence score of the sentence (W). These scores help in representing the compaction method.

Compaction of the sentence: Sentences of low significance are removed to achieve accuracy by reducing the number of errors. Transcription procedures are used for calculating the sentence compaction score. The three scores are used for managing the transcription of each word (Govindraj, 2016). The dependency of the phrases can be improved by giving a structured format to the grammar used in constructing the sentences (Chakraborty, 2014). The concatenation score is used for measuring the compression ratio with a protocol named the 2-stage dynamic protocol. Fillers are used for managing the difference between the participating units. The rejection and acceptance of errors in sentence construction helps in removing errors from the phrases used (Bramer, 2013). The clarification of the request helps in confirming the concept, which supports the design of an error-resistant working model for speech-to-speech and speech-to-text summarization (Furui, 2013). This protocol sets the compression ratio according to the demands of sentence formation, to achieve accuracy and minimize errors in the sentence. Accuracy in the summarization can be achieved by using the transcription process when evaluating the set target. The variation in speech summarization theory helps in constructing the sentence accurately (Zhong, 2012). The following string shows an example of accurate sentence formation with the speech recognition technique (Source: Zhong, 2012. Effective pattern discovery for text mining. 1st ed. [ebook]).
The two-stage protocol depends on the random selection of the word, the weighting factor, optimization of the value used, recurrence in the summary word, and the linguistic, confidence, and significance scores.

Analysis of the speech-to-speech presentation and summarization: The concatenation of speech segments for summarizing sentences, words, and phrases helps in extracting the relevant information from them. Importance should be given to the extraction criteria. The summary speech is used for managing the concatenation methods (Antino, 2012). The investigation helps in managing the relationship between words, sentences, and fillers. The reliability of the method can be achieved by managing the occurrence of spontaneous speech. Corrections are recognized automatically for extracting the speech segmentation (Chakraborty, 2010). The purpose of this paper is to reduce errors in the construction of the sentence. The selection of important sentences depends on the synchronization of the results achieved. Removing unwanted words helps in reducing both the length of the sentence and the number of errors (Gonzalez, 2015). Filler units are used for managing the boundaries of the sentence to extract the expected results. Intermediate results can be developed by using the continuity of acoustic speech. The evaluation of the units depends on recognizing the consequences associated with the speech extraction.

Concatenation of the participating units: The segmentation boundaries help in obtaining the amplitude difference for the waveform, which is needed for analysing the accuracy of the result. The speaking rate helps in managing the unnatural sound produced at a short pause in the sentence. It has been found that the length of the speaking rate should be between 50 and 100 ms (Kang, 2013).
The summarization of the sentence helps in increasing the data transfer rate and so the frequency of conversion. The speech period is the time over which text sentences are summarised to give accurate and relevant results. The text sentences are used for managing the time required for the concatenation of the sentences. Adjusting the short pauses and long pauses helps in shaping the speech waveforms to manage the boundaries of the sentences. Attenuation is inserted in the sentences to reduce the rate of errors in sentences constructed through the speech recognition system. The insertion of long pauses helps in identifying the completion of a sentence (Bijuraj, 2013). The concatenation of the word limit helps in analysing the termination of the sentences.

Time Line

Activity chart of the research study | Timeline for starting the research | Timeline for completing the research | Description of the activities performed
Research undertaken | 09-Oct-17 | 10-Oct-17 | The research is undertaken on the topic "Investigation of the machine learning and data mining activities associated with speech-to-speech and speech-to-text summarization"
Selection of team criteria | 11-Oct-17 | 13-Oct-17 | Experienced and expert persons should be selected for carrying out the research activities and gathering relevant data for the research study
Collection of data required for carrying out the process of literature review | 14-Oct-17 | 16-Oct-17 | A format should be developed for collecting the data required to carry out the research study
Tools and techniques used for data analysis | 17-Oct-17 | 19-Oct-17 | The study of the literature helps in analysing the data selection needed for designing the research study
Construction of research questions according to the research undertaken | 20-Oct-17 | 23-Oct-17 | Designing the research questions plays a vital role in preparing the research study, because the data collected from different sources depends on the primary and secondary research questions prepared for the study
Drafting a research proposal | 24-Oct-17 | 26-Oct-17 | The draft is based on the research questions designed for the investigation of speech-to-speech summarization with the use of a machine learning program
Deployment of research methodologies | 27-Oct-17 | 20-Nov-17 | Relevant and adequate data can be collected with research methodologies such as face-to-face interviews, questionnaires, focus groups, observation, and others
Reviewing of the draft prepared for research undertaken | 21-Nov-17 | 23-Nov-17 | Reviewing of the draft prepared for the research undertaken
Providing the research draft for sanctioning and approval | 24-Nov-17 | 27-Nov-17 | The research authority approves the research proposal on the investigation of speech-to-speech summarization with the use of a machine learning program (Garla, 2010)
Analysis of the research documentation collected | 28-Nov-17 | 30-Dec-17 | Data analysis tools and technologies are used for the investigation of speech-to-speech summarization with the use of a machine learning program
Findings and assessment | 01-Jan-18 | 10-Jan-18 | The focus should be on a clear understanding of the investigation of speech-to-speech summarization with the use of a machine learning program
Completion of the undertaken research | 11-Jan-18 | 20-Jan-18 | Submission of the research undertaken on the topic "Investigation of the machine learning and data mining activities associated with speech-to-speech and speech-to-text summarization"

Budget

The budget allocated for conducting the research study on speech-to-speech data mining techniques is around $6000. $1200 is allocated for conducting the literature review and collecting data from research methodologies. $1200 is used for data collection and data analysis to find the result of the research. $1500 is spent on travel allowance. $1100 is used for conducting experiments to analyse the errors that occur in the process of data mining in speech-to-speech summarization. $1000 is used for developing the research report. The following table shows the distribution of the budget allocated to the research study.

Activities | Allocated budget
Literature review and collecting data from research methodologies | $1200
Data collection and data analysis for finding the result of the undertaken research | $1200
Traveling allowance | $1500
Conducting experiments for the analysis of the errors that occur in the process of data mining in speech-to-speech summarization | $1100
Development of the research report | $1000
Total estimated cost | $6000

Research limitation

The limitation of the research is the number of samples used for the study. Sub-analysis of the findings has not been done (Nenkova, 2016). The proposed budget and time are insufficient for handling the research proposal.

Constraints: The major constraints associated with speech text summarization are the concatenation of the words and the construction of the sentence.

Ethical Issues: Miscommunication is the major problem, and it can give rise to misunderstanding. The speaker's language is wrongly interpreted by the receiving units, which creates a scenario of misunderstanding. The intention and the emotion of the speaker are misinterpreted by the receiver.

Conclusion

The purpose of this paper is to reduce errors in the construction of the sentence. The selection of important sentences depends on the synchronization of the results achieved.
Removing unwanted words helps in reducing both the length of the sentence and the number of errors. The two-stage protocol sets the compression ratio according to the demands of sentence formation, to achieve accuracy and minimize errors in the sentence. The reliability of the method can be achieved by managing the occurrence of spontaneous speech. Errors are the deviation of the outcome from the expected results. They fall into two categories: under-generation and over-generation. The insertion and deletion processes are used for handling errors in the construction of the sentence. Corrections are recognized automatically for extracting the speech segmentation. The clarification of the request helps in confirming the concept, which supports the design of an error-resistant working model for speech-to-speech and speech-to-text summarization. Attenuation is inserted in the sentences to reduce the rate of errors in sentences constructed through the speech recognition system. The effectiveness and efficiency of sentence construction can be improved by managing the compression ratio of longer sentences to reduce the error rate.

References:

Antino, H. (2012). Emerging technologies of text mining: Techniques and applications. 1st ed. [ebook]. [Accessed 06 Oct. 2017].
Berry, M. (2014). Survey of text mining: clustering, classification, and retrieval. 1st ed. [ebook]. [Accessed 06 Oct. 2017].
Bijuraj, L. (2013). Clustering and its applications. 1st ed. [ebook]. [Accessed 06 Oct. 2017].
Bramer, M. (2013). Research and development in intelligent system. 1st ed. [ebook]. [Accessed 06 Oct. 2017].
Cercone, N. (2012). Advances in knowledge discovery and data mining. 1st ed. [ebook]. [Accessed 06 Oct. 2017].
Chakraborty, G. (2014). Analysis of unstructured data: Application of text analytics and sentiment mining. [Accessed 06 Oct. 2017].
Dinoy, I. (2016). Methodological challenges and analytic opportunities for modelling and interpreting big data. 1st ed. [ebook]. [Accessed 06 Oct. 2017].
Furui, S. (2013). Speech to speech and speech to text summarization. 1st ed. [ebook]. [Accessed 06 Oct. 2017].
Garla, S. (2010). Text mining and analysis. 1st ed. [ebook]. [Accessed 06 Oct. 2017].
Gonzalez, G. (2015). Recent advances and emerging applications in text and data mining for biomedical discovery. [Accessed 06 Oct. 2017].
Govindraj, S. (2016). Intensified sentiments analysis of customers product review using acoustic and textual features. 1st ed. [ebook]. [Accessed 06 Oct. 2017].
Kang, S. (2013). Dialogue strategies to overcome speech recognition errors in form-filling dialogue. [Accessed 06 Oct. 2017].
Kawle, A. (2013). Text to speech web plugin with text summarisation. 1st ed. [ebook]. [Accessed 06 Oct. 2017].
Koulali, R. (2011). Topic detection and multi-word terms extraction for Arabic unvowelized documents. 1st ed. [ebook]. [Accessed 06 Oct. 2017].
Liu, Y. (2006). A study on the machine learning from imbalanced data for sentence boundary detection in speech. 1st ed. [ebook]. [Accessed 06 Oct. 2017].
Moreno, A. (2012). Text analytics: The convergence of big data and artificial intelligence. 1st ed. [ebook]. [Accessed 06 Oct. 2017].
Najafabadi, M. (2012). Deep learning application and challenges in big data. 1st ed. [ebook]. [Accessed 06 Oct. 2017].
Nenkova, A. (2016). A survey of text summarization techniques. 1st ed. [ebook]. [Accessed 06 Oct. 2017].
Neto, J. (2015). Automatic text summarization using a machine learning approach. 1st ed. [ebook]. [Accessed 06 Oct. 2017].
Perner, P. (2010). Machine learning and data mining in pattern recognition. 1st ed. [ebook]. [Accessed 06 Oct. 2017].
Solka, J. (2013). Text data mining theory and methods. 1st ed. [ebook]. [Accessed 06 Oct. 2017].
Zhong, N. (2012). Effective pattern discovery for text mining. 1st ed. [ebook]. [Accessed 06 Oct. 2017].