Abstract

Clustering social network data is increasingly needed to categorize user content by topical similarity. Twitter gives social media users a platform for sharing views and opinions and for exchanging ideas. Social media provides a large volume of health-related data, which widens the scope for research on early monitoring and the prediction of risk factors. Existing systems address two problems in building intelligent healthcare systems from social media data: health transition detection and health transition prediction. Topic models are widely used in text mining to extract features from social data. Traditional topic models, namely latent semantic indexing (LSI), probabilistic latent semantic indexing (PLSI), latent Dirichlet allocation (LDA), and non-negative matrix factorization (NMF), are used to extract latent variables, or hidden topics, from social data. As part of this research, an attempt is made to develop the Ailment Topic Aspect Model (ATAM), a new latent model dedicated to capturing topics from health tweet data. It aims to extract health-related topic transitions by minimizing the prediction error on topic distributions between consecutive posts at different temporal and geographic granularities. Healthcare costs are driving demand for big-data-driven healthcare applications. Technology decision-makers in healthcare systems cannot ignore the increased efficiencies, the attractive economics, and the rapid pace of innovation that can now be applied to delivering and paying for healthcare. Social platforms are a rich source of conversations among different people on health-related topics such as types of diseases, symptoms, and medicines. Extracting sentiment from such social data is an emerging need in healthcare, and recent research has shown socially recommended solutions for healthcare.
Social data clustering with LSI, PLSI, LDA, and NMF delivers health clustering results without prior knowledge of the cluster tendency. Cluster tendency assessment is the estimation of the number of clusters in a given social data set. This problem is intractable for existing topic models that rely only on the tweet-term matrix of health-related social data. The proposed framework addresses it by deriving a dense topic-document matrix through the assessment of similar topics. During assessment of the cluster tendency, similarity features are computed and the tweets are reordered according to them. Visual approaches are proposed for displaying health clusters; these help determine the number of clusters in advance and improve the efficiency of the proposed topic models in social health data clustering.
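
The reordering step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name a specific algorithm, so the sketch assumes a VAT-style (visual assessment of tendency) ordering, cosine dissimilarity, a hand-made tweet-topic matrix, and a hypothetical gap threshold of 0.3 for counting clusters. All function names are illustrative.

```python
import math

def cosine_dissimilarity(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def dissimilarity_matrix(rows):
    """Pairwise dissimilarities between all rows (tweets)."""
    n = len(rows)
    return [[cosine_dissimilarity(rows[i], rows[j]) for j in range(n)]
            for i in range(n)]

def vat_order(dist):
    """VAT-style ordering: start at one end of the most dissimilar pair,
    then repeatedly append the unvisited item closest to the visited set.
    Returns the ordering and the link length used at each step."""
    n = len(dist)
    start = max(range(n), key=lambda i: max(dist[i]))
    order, links = [start], []
    remaining = set(range(n)) - {start}
    while remaining:
        best = min(remaining, key=lambda j: min(dist[i][j] for i in order))
        links.append(min(dist[i][best] for i in order))
        order.append(best)
        remaining.remove(best)
    return order, links

def reorder(dist, order):
    """Permute rows and columns so similar tweets sit next to each other."""
    return [[dist[i][j] for j in order] for i in order]

def estimate_cluster_count(links, threshold):
    """Assumed heuristic: every link longer than the threshold starts a new cluster."""
    return 1 + sum(1 for d in links if d > threshold)

# Hypothetical tweet-topic weights (rows: tweets, columns: topics),
# deliberately interleaved so the raw order hides the grouping.
tweet_topic = [
    [0.9, 0.1, 0.0],
    [0.0, 0.1, 0.9],
    [0.8, 0.2, 0.0],
    [0.1, 0.9, 0.0],
    [0.0, 0.2, 0.8],
    [0.2, 0.8, 0.0],
]

dist = dissimilarity_matrix(tweet_topic)
order, links = vat_order(dist)
reordered = reorder(dist, order)
k = estimate_cluster_count(links, threshold=0.3)
print("tweet order:", order)
print("estimated clusters:", k)
```

Rendering `reordered` as a grayscale image would show dark blocks along the diagonal, one per group of similar tweets; the threshold-based count is only a numeric stand-in for reading those blocks by eye.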