FEATURE CLUSTERING USING SUBSELECTION ALGORITHM IN BIG DATA USING FIDOOP

Published on July 2016 | Categories: Research

Big data processing is a high-demand area that places a heavy burden on computation, communication, and storage in data centers, and so incurs considerable operational cost for data center providers. Minimizing this cost has therefore become a key issue for big data services. Unlike conventional cloud services, a main feature of big data services is the tight coupling between data and computation: a computation task can be carried out only when the corresponding data are available. As a result, three factors, namely communication cost, computation cost, and operational cost, affect the expenditure of data centers. Clustering, which groups a set of objects into classes of similar objects, is used to reduce this cost. Feature selection removes irrelevant features, which takes place in batch processing (the scheduling algorithm), and redundant features, which takes place during cluster formation (the data-centric algorithm). The joint optimization has two steps: features are divided into clusters (subsets) using a minimum spanning tree (MST), and cluster representatives are then selected so that the resulting subset is efficient, effective, and independent. Based on these criteria, a feature clustering method based on a selection algorithm is proposed and experimentally evaluated on a sample cancer dataset. This work identifies the effective attributes and removes redundancy.
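
The steps named in the abstract (drop irrelevant features, group the remaining features with an MST, keep one representative per cluster) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say which correlation measure, threshold, or edge-cutting rule is used, so symmetric uncertainty, the `threshold` parameter, the rule for cutting MST edges, and the function names below are all assumptions made for illustration. The sketch also assumes discrete feature values and runs on a single machine, without Hadoop.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(xs, ys):
    """Assumed correlation measure: 2 * I(X;Y) / (H(X) + H(Y)), in [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0
    hxy = entropy(list(zip(xs, ys)))
    return 2.0 * (hx + hy - hxy) / (hx + hy)


def find(parent, x):
    """Union-find lookup with path compression."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def select_features(columns, labels, threshold=0.1):
    """Return one representative feature name per cluster (hypothetical helper).

    columns: dict mapping feature name -> list of discrete values
    labels:  list of class labels, same length as each column
    """
    # Step 1: remove irrelevant features (low correlation with the class).
    relevance = {n: symmetric_uncertainty(c, labels) for n, c in columns.items()}
    kept = [n for n in columns if relevance[n] > threshold]

    # Step 2a: minimum spanning tree (Kruskal) over distance = 1 - correlation,
    # so strongly correlated features stay connected.
    edges = sorted(
        (1.0 - symmetric_uncertainty(columns[a], columns[b]), a, b)
        for i, a in enumerate(kept)
        for b in kept[i + 1:]
    )
    tree = {n: n for n in kept}
    mst = []
    for dist, a, b in edges:
        ra, rb = find(tree, a), find(tree, b)
        if ra != rb:
            tree[ra] = rb
            mst.append((dist, a, b))

    # Step 2b (assumed rule): cut an MST edge when the two features are less
    # correlated with each other than either one is with the class.
    cluster = {n: n for n in kept}
    for dist, a, b in mst:
        if 1.0 - dist >= min(relevance[a], relevance[b]):
            cluster[find(cluster, a)] = find(cluster, b)

    # Step 3: keep the most class-relevant feature of each cluster.
    best = {}
    for n in kept:
        root = find(cluster, n)
        if root not in best or relevance[n] > relevance[best[root]]:
            best[root] = n
    return sorted(best.values())
```

For example, given a class column and three features where `f2` duplicates `f1` and `f3` is unrelated to the class, `select_features({"f1": ..., "f2": ..., "f3": ...}, labels)` would drop `f3` as irrelevant and keep only one of the two duplicates. On a real dataset such as the cancer data mentioned above, continuous attributes would first need to be discretized, which this sketch leaves out.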

