TY - JOUR
T1 - Cost-effective online trending topic detection and popularity prediction in microblogging
AU - Miao, Zhongchen
AU - Chen, Kai
AU - Fang, Yi
AU - He, Jianhua
AU - Zhou, Yi
AU - Zhang, Wenjun
AU - Zha, Hongyuan
PY - 2017/6/9
Y1 - 2017/6/9
N2 - Identifying topic trends on microblogging services such as Twitter and estimating those topics’ future popularity have great academic and business value, especially when the operations can be done in real time. For any third party, however, capturing and processing such huge volumes of real-time microblog data is almost infeasible, as there are always API (Application Program Interface) request limits, monitoring and computing budgets, and timeliness requirements. To deal with these challenges, we propose a cost-effective system framework with algorithms that can automatically select a subset of representative users in microblogging networks offline, under given cost constraints. The proposed system can then monitor online and utilize only these selected users’ real-time microposts to detect the overall trending topics and predict their future popularity across the whole microblogging network. Therefore, our proposed system framework is practical for real-time usage, as it avoids the high cost of capturing and processing full real-time data while not compromising detection and prediction performance under given cost constraints. Experiments with a real microblog dataset show that by tracking only 500 users out of 0.6 million users and processing no more than 30,000 microposts daily, about 92% of trending topics could be detected and predicted by the proposed system and, on average, more than 10 hours earlier than they appear in official trends lists.
AB - Identifying topic trends on microblogging services such as Twitter and estimating those topics’ future popularity have great academic and business value, especially when the operations can be done in real time. For any third party, however, capturing and processing such huge volumes of real-time microblog data is almost infeasible, as there are always API (Application Program Interface) request limits, monitoring and computing budgets, and timeliness requirements. To deal with these challenges, we propose a cost-effective system framework with algorithms that can automatically select a subset of representative users in microblogging networks offline, under given cost constraints. The proposed system can then monitor online and utilize only these selected users’ real-time microposts to detect the overall trending topics and predict their future popularity across the whole microblogging network. Therefore, our proposed system framework is practical for real-time usage, as it avoids the high cost of capturing and processing full real-time data while not compromising detection and prediction performance under given cost constraints. Experiments with a real microblog dataset show that by tracking only 500 users out of 0.6 million users and processing no more than 30,000 microposts daily, about 92% of trending topics could be detected and predicted by the proposed system and, on average, more than 10 hours earlier than they appear in official trends lists.
KW - cost
KW - microblogging
KW - prediction
KW - topic detection
UR - http://dl.acm.org/citation.cfm?doid=3026478.3001833
UR - http://www.scopus.com/inward/record.url?scp=85008240000&partnerID=8YFLogxK
U2 - 10.1145/3001833
DO - 10.1145/3001833
M3 - Article
AN - SCOPUS:85008240000
SN - 1046-8188
VL - 35
JO - ACM Transactions on Information Systems
JF - ACM Transactions on Information Systems
IS - 3
M1 - 18
ER -