BIG DATA ANALYTICS MACHINE LEARNING ALGORITHMS: A SURVEY
M. Lija¹, A. Aloysius², S. Banumathi³
¹M.Phil Scholar; ²,³Assistant Professor
¹,²,³Department of Computer Science, ¹,²St Joseph's College, ³Holy Cross College, Trichy-2.
ABSTRACT
Big data analytics is the process of examining large and varied data sets (big data) to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful information that can help organizations make more informed business decisions. Applying different algorithms and comparing their results can reveal valuable insights about the data being used, and such comparisons give a manager more insight into business problems and their solutions. Predictive analytics is one type of big data analytics: it learns from existing knowledge to predict future behaviour or patterns. This paper presents a study of machine-learning-based evolutionary algorithms that identifies the accurate algorithm applicable to a given problem.
Index Terms: Big Data, Big Data Prediction, Big Data Analytics, Evolutionary Algorithms, Machine Learning

I. INTRODUCTION
Big data is a collection of data sets so large and complex that traditional data processing application software is inadequate to deal with them. A data set is a collection of data. Big data challenges include data capture, data storage, data analysis, search, transfer, visualization, querying, updating and information privacy.
Big data has three characteristics, known as Volume, Variety and Velocity. The term "big data" was coined to describe the collection of huge amounts of data in structured, semi-structured or unstructured formats held in large databases, file systems or other types of repositories, together with the processing of this data to analyze and combine trends and actions in real or near-real time.
Of these types of data, unstructured data needs the most immediate analysis and carries the most valuable information to be uncovered, providing a more in-depth understanding of the subject being researched. Unstructured data also poses the greatest challenges in collection, storage, organization, classification, analysis and management. In addition to big data computing capability, rapid advances in intelligent data analytics techniques drawn from the emerging areas of artificial intelligence (AI) and machine learning (ML) make it possible to process the very large amounts of diverse unstructured data now generated daily and to extract valuable, actionable information. Machine learning studies the construction of algorithms that can learn from data and make predictions on it.
Proper information extraction from this variety of sources requires data mining, machine learning and natural language processing techniques. Four types of analytics are available: prescriptive, predictive, diagnostic and descriptive. According to Gartner, most organizations have used predictive analytics more than the other types. In general, machine learning algorithms can be separated into categories according to their use; the main categories are the following:
· Supervised learning
· Unsupervised learning
· Semi-supervised learning
· Reinforcement learning
· Evolutionary learning
· Deep learning

[Fig. 1. Machine learning techniques: machine learning divides into supervised learning (regression, classification) and unsupervised learning (clustering).]

Table 1. Machine learning algorithms with learning task

Algorithm                   Learning task
Supervised learning
  Nearest neighbor          Classification
  Naive Bayes               Classification
  Decision trees            Classification
  Linear regression         Regression
  Support vector machine    Dual use (classification and regression)
  Neural network            Dual use (classification and regression)
Unsupervised learning
  K-means clustering        Clustering
  Association rules         Pattern detection

II. MACHINE LEARNING
Traditional machine learning (ML) techniques have been developed and used to extract useful information from data through training and validation on labeled datasets [6]. Machine learning, a subfield of artificial intelligence (AI), focuses on enabling computational systems to learn from data how to perform a desired task automatically [11]. The goal of machine learning is to develop methods that can automatically identify patterns in data and then use those patterns to make predictions. Machine learning tasks rely on numerical and probabilistic methods [9].
A learning method uses training data to produce a predictive model. Data-adaptive machine learning methods have been adopted throughout the sciences [8]. Machine learning has many applications, including decision making and prediction, and it is a key enabling technology for data mining and big data techniques in fields as varied as healthcare, science, engineering, business and finance [5]. Fig. 1 illustrates the techniques of machine learning, which can be grouped into the following major types.

III. SUPERVISED LEARNING
According to the nature of the available data, the two main categories of learning tasks are supervised and unsupervised learning. In supervised learning, both the inputs and their required outputs (labels) are known, and the system learns to map inputs to outputs.
Classification and regression are examples of supervised learning: in classification the outputs take discrete values (class labels), while in regression the outputs are continuous. Examples of classification algorithms are k-nearest neighbour, logistic regression and Support Vector Machine (SVM), while regression examples include Support Vector Regression (SVR), linear regression and polynomial regression. Some algorithms, such as neural networks, can be used for both classification and regression. Table 1 lists the supervised learning algorithms and their learning tasks [10].

a. Nearest Neighbor
The nearest neighbor (NN) technique is very simple, highly efficient and successful in fields such as pattern recognition, text categorization and object recognition. Its simplicity is its main benefit, but its disadvantages cannot be ignored.
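As a concrete illustration of the nearest neighbor idea, here is a minimal k-nearest-neighbour classifier sketch. The function name `knn_predict`, the use of Euclidean distance, the default k = 3 and the toy points are our own assumptions for illustration, not taken from the cited work. The sketch keeps every training example in memory and measures the distance to each one for every query, which is exactly where the method's memory and computation costs come from.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Distance from the query to every stored training example (hypothetical helper).
    distances = [(dist(p, query), label) for p, label in zip(train_points, train_labels)]
    distances.sort(key=lambda pair: pair[0])
    # Majority vote among the labels of the k closest examples.
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data (hypothetical): two well-separated groups.
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train_points, train_labels, (1.1, 1.0)))  # prints A
print(knn_predict(train_points, train_labels, (4.9, 5.0)))  # prints B
```

There is no training phase at all: "learning" is just storing the data, and all the work happens at prediction time.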
The memory requirement and computational complexity also matter, and many techniques have been developed to overcome these limitations [13].

b. Naive Bayes
Naive Bayes classifiers are based on Bayes' theorem and assume that the features are independent of one another given the class. All attributes are analysed independently, each being given equal importance [18]. A notable feature of naive Bayes is that it is extremely fast to run on large, sparse data sets. It has been widely used for Internet traffic classification, e.g., naive Bayesian classification of Internet traffic.

c. Decision Trees (DT)
Decision trees are a widely used, intuitive method for learning and predicting target features, for both quantitative and nominal target attributes. A decision tree is a directed tree with a root node that has no incoming edges; all other nodes have exactly one incoming edge and are known as decision nodes [12].

d. Linear Regression
Regression analysis mainly focuses on finding the relationship between a dependent variable and one or more independent variables.
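The relationship described above can be illustrated with the simplest (univariate, linear) case, y = b0 + b1·x, fitted by ordinary least squares, where the slope is b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and the intercept is b0 = ȳ − b1·x̄. Ordinary least squares is our choice of fitting method for this sketch; the function name `fit_simple_linear_regression` and the toy data are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x (one independent variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy data (hypothetical): roughly y = 1 + 2x with some noise.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
intercept, slope = fit_simple_linear_regression(xs, ys)
prediction = intercept + slope * 6.0  # predict the dependent variable at x = 6
print(round(slope, 2), round(intercept, 2))  # prints 1.97 1.09
print(round(prediction, 2))                  # prints 12.91
```

A multivariate linear model generalizes this to several independent variables, usually solved with matrix methods rather than these closed-form sums.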
Regression is used to predict the value of the dependent variable from one or more independent variables. Regression models are broadly divided into univariate and multivariate models, each of which is further divided into linear and nonlinear models [15].

e. Support Vector Machines
Support Vector Machines (SVM) are a widely used supervised learning technique, notable for being both practical and theoretically sound.
The SVM approach is rooted in statistical learning theory and is principled: for example, training an SVM has a unique solution, since it amounts to a convex optimization problem (equivalently, maximizing a concave dual objective) [11].
A support vector machine is primarily a classification method. An SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier. An SVM model represents the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible.
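The "gap as wide as possible" idea can be sketched as a soft-margin linear SVM: minimizing (λ/2)·‖w‖² widens the margin 2/‖w‖, while the hinge loss max(0, 1 − y(w·x + b)) penalizes points that fall inside it. The sketch below minimizes that objective with plain full-batch sub-gradient descent; this is a simplification we chose for readability (dedicated SVM solvers usually tackle the underlying quadratic program with specialized methods such as SMO), and it covers only the linear case, not kernels. The function names, the hyperparameters `lam`, `lr` and `epochs`, and the toy data are all hypothetical.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Soft-margin linear SVM fitted by full-batch sub-gradient descent.

    Minimizes  lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b))),
    where every label y must be +1 or -1.
    """
    n, dim = len(points), len(points[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the margin-widening regularizer
        grad_b = 0.0
        for x, y in zip(points, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score < 1:  # margin violation: the hinge loss is active
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Toy, linearly separable data (hypothetical).
points = [(2.0, 2.0), (3.0, 1.0), (1.0, 3.0), (-2.0, -2.0), (-3.0, -1.0), (-1.0, -3.0)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
print(svm_predict(w, b, (2.5, 2.0)))    # prints 1
print(svm_predict(w, b, (-1.5, -2.5)))  # prints -1
```

The output is non-probabilistic, as described above: `svm_predict` returns only a class label, not a confidence.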
The support vector machine has been developed into a strong tool for classification and regression in noisy, complex domains. The two key features of support vector machines are generalization theory, which leads to a principled way to choose a hypothesis, and kernel functions, which introduce non-linearity into the hypothesis space without explicitly requiring a non-linear algorithm [14].

IV. UNSUPERVISED LEARNING
Clustering is the basic method of unsupervised learning. In clustering, the learning task is to group examples into "clusters" on the basis of perceived similarity, without requiring a labeled training set. Clustering is used to find groups of inputs that have similar characteristics. Intuitively, clustering resembles unsupervised classification: whereas classification in supervised learning assumes the availability of a correctly labeled training set, the unsupervised task of clustering seeks to discover the structure of the input data directly [8]. Recommendation services that meet users' requirements, and real-time technologies for cluster analysis, are growing in importance along with big data analysis technologies [16].
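Table 1 lists k-means as the representative clustering algorithm. The sketch below follows Lloyd's classic two-step iteration: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points, repeating until the centroids stop moving. For reproducibility it seeds the centroids with the first k points; this is an assumption of the sketch, since practical implementations typically use random or k-means++ seeding. The function name `k_means` and the toy data are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def k_means(points, k, iterations=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    # Deterministic seeding for this sketch: use the first k points as centroids.
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # centroids stopped moving, so the assignments are stable
        centroids = new_centroids
    return centroids, clusters


# Toy, unlabeled data (hypothetical): two visually obvious groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (8.5, 9.0), (0.5, 1.5), (9.0, 8.0)]
centroids, clusters = k_means(points, k=2)
print(centroids)  # approximately [(1.0, 1.5), (8.5, 8.33)]
```

No labels are supplied: the grouping emerges purely from the distances between points, which is what distinguishes clustering from the supervised methods above.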
Table 1 lists the unsupervised learning algorithms and their learning tasks.

V. CONCLUSION
This paper presents an overview of big data analytics machine learning algorithms, particularly supervised and unsupervised algorithms for big data. The quantity of data keeps rising, and analyzing data sets has become increasingly competitive.
Machine learning analytics combines analytics techniques with decision optimization. This review should support researchers in this area, as it provides a broad collection of previous research. The review is theoretical in nature; future work could focus on enhancing SVM-based algorithms in machine learning.

REFERENCES
[1] "Big Data Tutorial", https://intellipaat.com/blog/big-data-tutorial-for-beginners/
[2] "Big Data", https://en.wikipedia.org/wiki/Big_data
[3] "Big Data", http://searchcloudcomputing.techtarget.com/definition/big-data-Big-Data
[4] Bogdan Ionescu, Dan Ionescu, Cristian Gadea, Bogdan Solomon and Mircea Trifan, "An Architecture and Methods for Big Data Analysis", vol. 356, pp. 491-514, Springer, 2014.
[5] M. Fatima and Pasha, "Survey of Machine Learning Algorithms for Disease Diagnostic", Journal of Intelligent Learning Systems and Applications, 9, pp. 1-16, March 2017.
[6] Shan Suthaharan, "Big Data Classification: Problems and Challenges in Network Intrusion Prediction with Machine Learning", Performance Evaluation Review, vol. 41, no. 4, March 2014.
[7] Neha Khan, Mohd Shahid Husain, Mohd Rizwan Beg, "Big data classification using evolutionary techniques: a survey", IEEE Conference on Engineering and Technology (ICETECH), 2015.
[8] M. I. Jordan, T. M. Mitchell, "Machine learning: Trends, perspectives, and prospects", sciencemag.org, vol. 349, issue 6245, 2015.
[9] G. Vaitheeswaran, L. Arockiam, "Machine Learning Based Approach to Enhance the Accuracy of Sentiment Analysis on Tweets", International Journal of Advanced Research in Computer Science and Management Studies, vol. 4, issue 5, 2016.
[10] Alexandra L'Heureux, Katarina Grolinger, Hany F. ElYamany, Miriam A. M. Capretz, "Machine Learning with Big Data: Challenges and Approaches", IEEE Access, 2017, DOI 10.1109/ACCESS.2017.2696365.
[11] S. Banumathi, A. Aloysius, "Big data prediction using evolutionary techniques: a survey", Journal of Emerging Technologies and Innovative Research (JETIR), Sep. 2016.
[12] Wei Dai, Wei Ji, "A MapReduce Implementation of C4.5 Decision Tree Algorithm", International Journal of Database Theory and Applications, vol. 7, no. 1, 2014.
[13] Farhad Soleimanian Gharehchopogh, Seyyed Reza Khaze, Isa Maleki, "A New Approach in Bloggers Classification with Hybrid of K-Nearest Neighbor and Artificial Neural Network Algorithms", Indian Journal of Science and Technology, vol. 8(3), pp. 237-246, February 2015.
[14] P. Sheela Rani, S. Shalini, J. Rukmani, A. Shanthini, "Energy efficient scheduling of MapReduce for evolving big data applications", International Journal of Advanced Research in Computer and Communication Engineering, vol. 5, issue 2, 2016.
[15] Ramya M G, Chetan Balaji, Girish L, "Environment change prediction to adapt climate smart agriculture using big data analytics", IJARCT.ORG, 2015.
[16] Se-Hoon Jung, Jong-Chan Kim, Chun-Bo Sim, "Prediction Data Processing Scheme using an Artificial Neural Network and Data Clustering for Big Data", IEEE Xplore, 2015.
[17] Charles W. Anderson, Minwoo Lee and Daniel L. Elliott, "Faster Reinforcement Learning After Pretraining Deep Networks to Predict State Dynamics", IEEE Xplore, 2015.
[18] Amir Gandomi, Murtaza Haider, "Beyond the hype: big data concepts, methods, and analytics", International Journal of Information Management, 2015.