Heterogeneous-accelerator-based High-performance/Energy-efficient Data Analytics

Big data applications demand that the underlying hardware systems deliver unprecedented performance, in terms of data processing throughput and access latency, at ever larger scales. Hardware acceleration can effectively improve computing throughput and reduce energy consumption for certain domains of algorithms compared with general-purpose processors. With the advent of high-level synthesis technology that provides C-to-hardware mapping, FPGAs have now become the second viable accelerator for server computing, after GPUs. GEARS exploits the strong synergy between GPUs and FPGAs to enable significant improvements in both performance and energy efficiency for big data systems.

Publications:

  1. S. Biookaghazadeh, M. Zhao, and F. Ren, “Are FPGAs Suitable for Edge Computing?” The USENIX Workshop on Hot Topics in Edge Computing (HotEdge '18), July 2018.
  2. Y. Li, Z. Liu, K. Xu, H. Yu, and F. Ren, “A GPU-Outperforming FPGA Accelerator Architecture for Binary Convolutional Neural Networks,” ACM Journal on Emerging Technologies in Computing Systems (JETC), vol. 14, no. 2, p. 18, 2018.
  3. K. Xu and F. Ren, “CSVideoNet: A Real-time End-to-end Learning Framework for High-frame-rate Video Compressive Sensing,” IEEE Winter Conference on Applications of Computer Vision (WACV). To appear.
  4. Y. Feng, F. Yang, X. Zhou, Y. Guo, T. Tang, F. Ren, J. Guo, and S. Ji, “A Deep Learning Approach for Targeted Contrast-Enhanced Ultrasound Based Prostate Cancer Detection,” IEEE/ACM Transactions on Computational Biology and Bioinformatics. To appear.
  5. Z. Liu, Y. Li, F. Ren, H. Yu, and W. Goh, “SqueezedText: A Real-time Scene Text Recognition by Binary Convolutional Encoder-decoder Network,” The AAAI Conference on Artificial Intelligence (AAAI), pp. 7194-7201, 2018.

High-performance/Energy-efficient Deep Storage Hierarchy for Big Data

Big data applications are inherently data intensive and have posed great performance and reliability challenges to the underlying storage systems since the inception of big data. Emerging solid-state storage has the potential to bridge the gap between DRAMs and HDDs, as it offers much better performance than HDDs and much larger capacity and lower power consumption than DRAMs. But solid-state storage has its own performance, capacity, and endurance constraints, and needs to be strategically incorporated into big data systems instead of completely replacing DRAMs or HDDs. Based on GEARS, we will further study how to address the aforementioned limitations of existing big data storage by employing solid-state storage to form an effective deep storage hierarchy for high performance and energy efficiency.

Publications:

  1. Q. Yang, R. Jin, and M. Zhao, “SmartDedup: Optimizing Deduplication for Resource-constrained Devices,” Proceedings of the USENIX Annual Technical Conference (USENIX ATC ’19), July 2019.
  2. M. Zhao and Y. Xu, “vPFS+: Managing I/O Performance for Diverse HPC Applications,” Proceedings of the 35th International Conference on Massive Storage Systems and Technology (MSST 2019), May 2019.
  3. P. Zuo, Y. Hua, M. Zhao, W. Zhou and Y. Guo, “Improving the Performance and Endurance of Encrypted Non-volatile Main Memory through Deduplicating Writes,” Proceedings of the 51st IEEE/ACM International Symposium on Microarchitecture (MICRO), October 2018.
  4. J. Fu, Y. Lu, J. Shu, G. Liu, and M. Zhao, “CowCache: Effective Flash Caching for Copy-on-Write Virtual Disks,” Cluster Computing, June 2019.
  5. G. Vietri, L. V. Rodriguez, W. A. Martinez, S. Lyons, J. Liu, R. Rangaswami, M. Zhao, G. Narasimhan, “Driving Cache Replacement with ML-based LeCaR,” Proceedings of the 10th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage), July 2018.
  6. Y. Xu and M. Zhao, “IBIS: Interposed Big-data I/O Scheduler,” Proceedings of the 25th International Symposium on High-Performance Parallel and Distributed Computing, May 2016.
  7. W. Li, G. Jean-Baptise, J. Riveros, G. Narasimhan, T. Zhang, and M. Zhao, “CacheDedup: In-line Deduplication for Flash Caching,” Proceedings of the 14th USENIX Conference on File and Storage Technologies (FAST’16), February 2016.
  8. D. Arteaga, J. Cabrera, J. Xu, S. Sundararaman, and M. Zhao, “CloudCache: On-demand Flash Cache Management for Cloud Computing,” Proceedings of the 14th USENIX Conference on File and Storage Technologies (FAST’16), February 2016.
  9. V. Tarasov, L. Rupprecht, D. Skourtis, W. Li, R. Rangaswami, and M. Zhao, “Evaluating Docker Storage Performance: from Workloads to Graph Drivers,” Cluster Computing, January 2019.

Scalable Tensor Decompositions for Understanding Billion-Scale Data

Data-driven understanding of the dynamics of emerging phenomena is increasingly critical in various application domains, from predicting the geo-temporal evolution of social communities to helping reduce the energy footprints of buildings, leading to more sustainable building systems and architectural designs. Many of the algorithms for model fitting are based on iterative processes, such as alternating least squares (ALS) or stochastic gradient descent (SGD) based techniques, and these are subject to major bottlenecks involving a significant amount of data movement. This necessitates massively parallel, yet I/O sensitive, data processing architectures to discover models even for moderate input data sizes.

Publications:

  1. K. S. Candan, “Modeling Complex Dynamic Systems,” Workshop on Conflict, Competition, Cooperation and Complexity: Using Evolutionary Game Theory to Model Realistic Populations, July 2019.
  2. G. Pedrielli, K. S. Candan, X. Chen, L. Mathesen, A. Inanalouganji, J. Xu, C. Chen, L. Lee, “Generalized Ordinal LEarning Framework (GOLF) for Decision Making with Future Simulated Data,” Asia-Pacific Journal of Operational Research, 2019.
  3. H. Behrens, K. S. Candan, X. Chen, Y. Garg, M. Li, X. Li, and S. Liu, “DataStorm: Coupled, Continuous Simulations for Complex Urban Environments,” Special Issue on Urban Computing and Smart Cities of ACM’s Journal of Transactions on Data Science (TDS), 2019.
  4. S. Liu, S. R. Poccia, K. S. Candan, M. Sapino, X. Wang, “Robust MultiVariate Temporal Features of Multivariate Time Series,” TOMCCAP 14(1): 7:1-7:24, 2018.
  5. X. Li, K. S. Candan, M. L. Sapino, “nTD: Noise-Profile Adaptive Tensor Decomposition,” Proceedings of the 26th International Conference on World Wide Web, pp. 243-252. International World Wide Web Conferences Steering Committee, April 2017.
  6. P. Casagranda, M. L. Sapino, K. S. Candan, “Context-Aware Proactive Personalization of Linear Audio Content,” EDBT 2017: 574-577, 2017.
  7. S. R. Poccia, M. L. Sapino, S. Liu, X. Chen, Y. Garg, S. Huang, J. H. Kim, X. Li, P. Nagarkar, K. S. Candan, “SIMDMS: Data Management and Analysis to Support Decision Making through Large Simulation Ensembles,” EDBT 2017: 582-585, 2017.
  8. J. H. Kim, M.-L. Li, K. S. Candan, M. L. Sapino, “Personalized PageRank in Uncertain Graphs with Mutually Exclusive Edges,” SIGIR 2017: 525-534, 2017.
  9. X. Chen, K. S. Candan, M. L. Sapino, “Exploring an Input Parameter Space with a Limited Simulation Budget,” INFORMS, October 2017.
  10. M. Kim, K. S. Candan, “Decomposition-by-normalization (DBN): leveraging approximate functional dependencies for efficient CP and Tucker decompositions,” Data Min. Knowl. Discov. 30(1), pp. 1-46, 2016.
  11. S. Huang, K. S. Candan, and M. L. Sapino, “BICP: Block-Incremental CP Decomposition with Update Sensitive Refinement,” Proceedings of the 25th ACM International Conference on Information and Knowledge Management, pp. 1221-1230. ACM, 2016.
  12. X. Li, S. Huang, K. S. Candan, M. L. Sapino, “2PCP: Two-phase CP decomposition for billion-scale dense tensors,” IEEE 32nd International Conference on Data Engineering (ICDE), pp. 835-846, May 2016.

Collaborative Visual Analytics

In a technological era characterized by mobility and rapidly changing hardware, remote and collaborative data analytics is needed to provide a viable means for interactive, effective, high-quality visualizations of large-scale data sets. GEARS will allow us to further explore how to enable analysts to link large-scale simulations with streaming media data. Specifically, we leverage the massive parallelism from CPUs and GPUs and dedicated hardware acceleration from FPGAs to achieve high throughput for high-resolution data exploration and low latency for in-situ analysis; and we leverage the deep storage hierarchy to provide high-throughput access to large simulation and social media datasets as well as low-latency access to streaming data and visualization results. Our new platform provides real-time support for user-in-the-loop visual analytics, where analysts can combine ensembles of climate models and link these to ongoing streams of media data.

Publications:

  1. B. Mathis, Y. Ma, M. Mancenido, R. Maciejewski, “Exploring the Design Space of Sankey Diagrams for the Food-Energy-Water Nexus,” IEEE Computer Graphics and Applications. To appear.
  2. A. K. Opejin, R. Aggarwal, D. D. White, R. Maciejewski, “Tracing the Trajectory of Food-Energy-Water Nexus Literature: A Bibliometric Analysis,” AGU Fall Meeting Abstracts, 2018.
  3. H. Wang, Y. Lu, S. T. Shutters, M. Steptoe, F. Wang, S. Landis, R. Maciejewski, “A Visual Analytics Framework for Spatiotemporal Trade Network Analysis,” IEEE Transactions on Visualization and Computer Graphics 25(1), 331-341, 2019.
  4. Y. Lu, H. Wang, S. Landis, R. Maciejewski, “A Visual Analytics Framework for Identifying Topic Drivers in Media,” IEEE Transactions on Visualization and Computer Graphics. To appear.
  5. M. Steptoe, R. Krueger, R. Garcia, X. Liang, R. Maciejewski, “A Visual Analytics Framework for Exploring Theme Park Dynamics,” ACM Transactions on Interactive Intelligent Systems, 8(1), 4, 2018.
  6. W. Luo, M. Steptoe, Z. Chang, R. Link, L. Clarke, R. Maciejewski, “The Impact of Spatial Scales on the Inter-Comparison of Climate Scenarios,” IEEE Computer Graphics and Applications, 37(5): 40-49, 2017.
  7. Z. Liu, Y. Li, F. Ren, and H. Yu, “A Binary Convolutional Encoder-decoder Network for Real-time Natural Scene Text Processing,” The 1st International Workshop on Efficient Methods for Deep Neural Networks - Conference on Neural Information Processing Systems (NIPS), 2016.
  8. K. Xu, Y. Li, and F. Ren, “A Data-Driven Compressive Sensing Framework Tailored For Energy-Efficient Wearable Sensing,” The 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.

Deep Learning with Social Data

Recent advances in deep learning have proven its ability to excel when applied to standard learning tasks such as language processing and representation learning. Many exciting new research questions in this area exist within the context of social media mining where traditional data is augmented by additional information extracted from the explicit social graph and context through these linked, multi-source, and heterogeneous social data. However, the high complexity and vast size of linked data pose interesting but significant challenges. Data generated through social media tends to have additional variability in spelling, grammar, slang, and creation protocols which, while providing useful cues for learning, can complicate the process. The added complexity of leveraging the rich network structure data found in social media on top of the heavy data and processing requirements of a neural network leads to prohibitively high requirements for computing and storage systems despite the promising potential return. GEARS allows us to address the above challenges in deep social data learning.

Publications:

  1. K. Shu, X. Zhou, S. Wang, R. Zafarani, and H. Liu, “The Role of User Profiles for Fake News Detection,” short paper, Proceedings of the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, August 2019.
  2. J. Li, L. Wu, R. Guo, C. Liu, H. Liu, “Multi-Level Network Embedding with Boosted Low-Rank Matrix Approximation,” Proceedings of the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, August 2019.
  3. L. Cheng, J. Li, Y. Silva, D. Hall, and H. Liu, “PI-Bully: Personalized Cyberbullying Detection with Peer Influence,” Proceedings of the International Joint Conferences on Artificial Intelligence (IJCAI2019), August 2019.
  4. K. Shu, L. Cui, S. Wang, D. Lee, and H. Liu, “dEFEND: Explainable Fake News Detection,” Proceedings of the 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2019), August 2019.
  5. T. H. Nazer, M. Davis, M. Karami, L. Akoglu, D. Koelle, and H. Liu, “Bot Detection: Will Focusing on Recall Cause Overall Performance Deterioration,” International Conference on Social Computing, Behavior-Culture Modeling, and Prediction (SBP-BRiMS 2019), July 2019.
  6. J. Subramanian, V. Sridharan, K. Shu, and H. Liu, “Exploiting Emojis for Sarcasm Detection,” International Conference on Social Computing, Behavior-Culture Modeling, and Prediction (SBP-BRiMS 2019), July 2019.
  7. L. Cheng, R. Guo, Y. N. Silva, D. Hall, and H. Liu, “Hierarchical Attention Networks for Cyberbullying Detection on the Instagram Social Network,” Proceedings of the 2019 SIAM International Conference on Data Mining (SDM19), May 2019.
  8. S. Yang, K. Shu, S. Wang, R. Gu, F. Wu, and H. Liu, “Unsupervised Fake News Detection on Social Media: A Generative Approach,” Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI 2019), January 2019.
  9. L. Cheng, J. Li, Y. Silva, D. Hall, and H. Liu, “XBully: Cyberbullying Detection within a Multi-Modal Context,” Proceedings of the 12th ACM International Conference on Web Search and Data Mining (WSDM 2019), February 2019.
  10. K. Shu, S. Wang, and H. Liu, “Beyond News Content: The Role of Social Context for Fake News Detection,” Proceedings of the 12th ACM International Conference on Web Search and Data Mining (WSDM 2019), February 2019.
  11. S. Wang, J. Tang, F. Morstatter, H. Liu, “Paired Restricted Boltzmann Machine for Linked Data,” Proceedings of the 25th ACM International Conference on Information and Knowledge Management (CIKM-16), October 2016.
  12. S. Wang, J. Tang, C. Aggarwal, H. Liu, “Linked Document Embedding for Classification,” Proceedings of the 25th ACM International Conference on Information and Knowledge Management (CIKM-16), October 2016.
  13. S. Wang, J. Tang, C. Aggarwal, Y. Chang, H. Liu, “Signed Network Embedding in Social Media,” Proceedings of the Seventeenth SIAM International Conference on Data Mining (SDM-17), June 2017.
  14. S. Wang, Y. Wang, J. Tang, K. Shu, S. Ranganath, H. Liu, “What Your Images Reveal: Exploiting Visual Contents for Point-of-Interest Recommendation,” Proceedings of the 26th World Wide Web Conference (WWW-17), April 2017.
  15. S. Wang, C. Aggarwal, J. Tang, H. Liu, “Attributed Signed Network Embedding,” Proceedings of the 26th ACM International Conference on Information and Knowledge Management (CIKM-17), November 2017.

Real-time Threat Intelligence Using Social Media

We developed a multi-scale model including social, political, and cultural variables measuring radical change-orientation and attitudes about violence. Based on GEARS, we deliver computational tools to distill destabilization indicators, identify root causes, and track drivers of social unrest in any region from social media. To this end, we develop a machine learning framework which extracts such indicators by detecting, monitoring, and predicting various trends of online communities and their characteristic discourse patterns. We aim to jointly detect online communities and their discourse patterns with state-of-the-art matrix analytics from the diverse information available on social media and develop incremental and evolutionary algorithms to monitor their dynamics.

Publications:

  1. A. Salehi, M. Ozer, H. Davulcu, “Sentiment-Driven Community Profiling and Detection on Social Media,” ACM Conference on Hypertext and Social Media (HT’2018), pp. 229-237, July 2018.
  2. A. Salehi and H. Davulcu, “Detecting Antagonistic and Allied Communities on Social Media,” IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM’18), pp. 99-106, August 2018.
  3. M. Ozer, “Measuring the Impact of Bot Accounts on Political Network Polarization,” 14th Network Science Conference (NetSci’19), 2019.
  4. M. Ozer, Y. Yildirim, H. Davulcu, “Negative Link Prediction and Its Applications in Online Political Networks,” ACM Conference on Hypertext and Social Media (ACM Hypertext 17), July 2017.
  5. M. Ozer, N. Kim, H. Davulcu, “Community Detection in Political Twitter Networks using Nonnegative Matrix Factorization Methods,” IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM'16), August 2016.
  6. S. Koc, I.H. Toroslu, H. Davulcu, “Co-Clustering Signed 3-Partite Graphs,” International Symposium on Foundations and Applications of Big Data Analytics (FAB'16), August 2016.
  7. H. Alostad, H. Davulcu, “Directional Prediction of Stock Prices using Breaking News on Twitter,” Web Intelligence Journal, Vol. 15(1), pp. 1-17, December 2017.

Team Collaborations in Big Networks

The emergence of network science and the advent of the big data era have created a brand new environment in which users collaborate to perform complex cognitive tasks collectively, with easier-than-ever access to a massive knowledge base and broad social connectivity at an unprecedented scale, speed, and granularity. The goal of this GEARS-enabled project is twofold. On one side, we want to understand the dynamic associational and causal mechanisms that drive peak team performance. On the other side, we want to create a suite of new instruments to predict, explore, and design high-performing teams. GEARS enables us to develop this system and greatly broaden the scale, speed, and scope of the team networks that can be analyzed.

Publications:

  1. L. Zhao, Y. Yao, G. Guo, H. Tong, F. Xu, J. Lu, “Team Expansion in Collaborative Environments,” PAKDD (3), pp. 713-725, June 2018.
  2. Q. Zhou, L. Li, N. Cao, N. Buchler, H. Tong, “Extra: explaining team recommendation in networks,” Proceedings of the 12th ACM Conference on Recommender Systems (pp. 492-493). ACM, September 2018.
  3. Y. Wang, C. Shi, L. Li, H. Tong, H. Qu, “Visualizing Research Impact through Citation Data,” ACM Transactions on Interactive Intelligent Systems (TiiS) 8(1), p. 5, Mar 2018.
  4. L. Li, H. Tong, N. Cao, K. Ehrlich, Y. Lin, N. Buchler, “Enhancing Team Composition in Professional Networks: Problem Definitions and Fast Solutions,” IEEE Trans. Knowl. Data Eng. 29(3), pp. 613-626, 2017.
  5. L. Li, H. Tong, “Uncovering Teamwork in Networks - Prediction, Optimization and Explanation,” IEEE International Conference on Data Mining Workshops (ICDMW), pp. 1132-1133. IEEE, 2017.
  6. L. Li, H. Tong, Y. Wang, C. Shi, N. Cao, N. Buchler, “Is the Whole Greater Than the Sum of Its Parts?” Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 295-304. ACM, August 2017.
  7. L. Li, H. Jing, H. Tong, J. Yang, Q. He, B. Chen, “NEMO: Next Career Move Prediction with Contextual Embedding,” Proceedings of the 26th International Conference on World Wide Web Companion, pp. 505-513. International World Wide Web Conferences Steering Committee, April 2017.
  8. S. Wang, J. Tang, Y. Wang and H. Liu, “Exploring Hierarchical Structures for Recommender Systems,” IEEE TKDE, 2018.
  9. K. Shu, A. Sliva, S. Wang, J. Tang, and H. Liu, “Fake News Detection on Social Media: A Data Mining Perspective,” SIGKDD Explorations, 2017.
  10. S. Ranganath, X. Hu, J. Tang, S. Wang and H. Liu, “Understanding and Identifying Rhetorical Questions in Social Media,” ACM TIST, 2017.
  11. S. Ranganath, S. Wang, X. Hu, J. Tang and H. Liu, “Facilitating Time Critical Information Seeking in Social Media,” IEEE TKDE, 2017.
  12. X. Meng, S. Wang, H. Liu, Y. Zhang, “Exploiting Emotion on Reviews for Recommender Systems,” In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), April 2018.
  13. X. Meng, S. Wang, K. Shu, J. Li, B. Chen, H. Liu, Y. Zhang, “Personalized Privacy-Preserving Social Recommendation,” In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), April 2018.
  14. K. Shu, S. Wang, J. Tang, Y. Wang, H. Liu, “CrossFire: Cross Media Joint Item and Friend Recommendations,” In Proceedings of the 11th ACM International Conference on Web Search and Data Mining (WSDM 2018), February 2018.
  15. S. Wang, C. Aggarwal, J. Tang, H. Liu, “Attributed Signed Network Embedding,” In Proceedings of the 26th ACM International Conference on Information and Knowledge Management (CIKM-17), November 2017.
  16. S. Wang, Y. Wang, J. Tang, K. Shu, S. Ranganath, H. Liu, “What Your Images Reveal: Exploiting Visual Contents for Point-of-Interest Recommendation,” In Proceedings of the 26th World Wide Web Conference (WWW-17), April 2017.
  17. S. Wang, J. Tang, C. Aggarwal, Y. Chang, H. Liu, “Signed Network Embedding in Social Media,” In Proceedings of the Seventeenth SIAM International Conference on Data Mining (SDM-17), 2017.
  18. K. Shu, S. Wang, H. Liu, “Understanding User Profiles on Social Media for Fake News Detection,” In Proceedings of the 1st IEEE International Workshop on Fake MultiMedia (FakeMM-18), 2018.
  19. C. Chen, J. He, N. Bliss and H. Tong, “Towards Optimal Connectivity on Multi-layered Networks,” IEEE Trans. Knowl. Data Eng, October 2017.
  20. C. Chen, H. Tong, L. Xie, L. Ying and Q. He, “Cross-Dependency Inference in Multi-layered Networks: A Collaborative Filtering Perspective,” ACM TKDD 42:1-42:26, August 2017.
  21. C. Chen, H. Tong, “On the eigen-functions of dynamic graphs: Fast tracking and attribution algorithms,” Statistical Analysis and Data Mining 10(2): 121-135, 2017.
  22. D. Zhou, S. Zhang, M. Yildirim, S. Alcorn, H. Tong, H. Davulcu and J. He, “A Local Algorithm for Structure-Preserving Graph Cut,” KDD, August 2017.
  23. B. Du, S. Zhang, N. Cao and H. Tong, “FIRST: Fast Interactive Attributed Subgraph Matching,” KDD, August 2017.
  24. J. Xu, Y. Yao, H. Tong, X. Tao and J. Lu, “HoORaYs: High-order Optimization of Rating Distance for Recommender Systems,” KDD, August 2017.
  25. F. Du, N. Cao, Y. Lin, P. Xu, H. Tong, “iSphere: Focus+Context Sphere Visualization for Interactive Large Graph Exploration,” CHI 2017: 2916-2927, 2017.
  26. S. Zhang, D. Zhou, M. Yildirim, S. Alcorn, J. He, H. Davulcu and H. Tong, “HiDDen: Hierarchical Dense Subgraph Detection with Application to Financial Fraud Detection,” SDM, 570-578, 2017.
  27. A. Nelakurthi, H. Tong, R. Maciejewski, N. Bliss and J. He, “User-guided Cross-domain Sentiment Classification,” SDM, 471-479, 2017.

Rare Event Mining in Large-scale, Heterogeneous Social Media Data

Online social media is ubiquitous in almost everyone’s personal, professional, and social lives; and it has become one major platform where people communicate and interact with each other. This brings the unique opportunity to better understand human behaviors, as social media provides a valuable data source to record/project people’s daily activities. In this project, we leverage GEARS as the platform to (1) conduct large-scale rare event detection research in terms of both applications and algorithms, and (2) develop a large-scale rare event detection toolkit to better educate the next generation of researchers in this area. The general research theme in this project is to understand and analyze the non-separable rare events (e.g., rumor creation and dissemination) from large-scale, heterogeneous online social media data.

Publications:

  1. Y. Shi, Y. Liu, H. Tong, J. He, G. Yan, N. Cao, “Visual Analytics of Anomalous User Behaviors: A Survey,” CoRR abs/1905.06720, May 2019.
  2. J. Wu, J. He, Y. Liu, “ImVerde: Vertex-Diminished Random Walk for Learning Imbalanced Network Representation,” IEEE International Conference on Big Data (Big Data), pp. 871-880, December 2018.
  3. Y. Zhou, A. Nelakurthi, and J. He, “Unlearn What You Have Learned: Adaptive Crowd Teaching with Exponentially Decayed Memory Learners,” Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2817-2826. ACM, July 2018.
  4. D. Zhou, J. He, H. Yang, and W. Fan, “SPARC: Self-Paced Network Representation for Few-Shot Rare Category Characterization,” Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2807-2816. ACM, July 2018.
  5. J. Li, J. He, and Y. Zhu, “E-tail Product Return Prediction via Hypergraph-based Local Graph Cut,” Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 519-527. ACM, July 2018.
  6. Y. Zhu, J. Li, J. He, A. Deshpande, and B. Quanz, “A Local Algorithm for Product Return Prediction in E-Commerce,” IJCAI, pp. 3718-3724, 2018.
  7. P. Yang, Q. Tan, and J. He, “Function-on-Function Regression with Mode-Sparsity Regularization,” ACM Transactions on Knowledge Discovery from Data (TKDD), 12(3), p.36, April 2018.
  8. J. Li, Y. Zhu, and J. He, “HiMuV: Hierarchical Framework for Modeling Multi-Modality Multi-Resolution Data,” IEEE International Conference on Data Mining (ICDM), pp. 267-276, November 2017.
  9. Y. Zhou, and J. He, “A Randomized Approach for Crowdsourcing in the Presence of Multiple Views,” IEEE International Conference on Data Mining (ICDM), pp. 685-694. IEEE, November 2017.  
  10. D. Zhou, A. Karthikeyan, K. Wang, N. Cao, and J. He, “Discovering Rare Categories from Graph Streams,” Data Mining and Knowledge Discovery 31(2), pp. 400-423, 2017.
  11. H. Lin, S. Gao, D. Gotz, F. Du, J. He, and N. Cao, “RCLens: Interactive Rare Category Exploration and Identification,” IEEE Transactions on Visualization and Computer Graphics. To appear.
  12. D. Zhou, S. Zhang, M. Yildirim, S. Alcorn, H. Tong, H. Davulcu and J. He, “A Local Algorithm for Structure-Preserving Graph Cut,” Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 655-664. ACM, August 2017.
  13. H. Yang, Y. Zhu and J. He, “Local Algorithm for User Action Prediction Towards Display Ads,” Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 2091-2099. ACM, August 2017.
  14. S. Zhang, D. Zhou, M. Yildirim, S. Alcorn, J. He, H. Davulcu and H. Tong, “HiDDen: Hierarchical Dense Subgraph Detection with Application to Financial Fraud Detection,” Proceedings of the SIAM International Conference on Data Mining, pp. 570-578. Society for Industrial and Applied Mathematics, June 2017.
  15. D. Zhou, J. He, Y. Cao, J. Seo, “Bi-level Rare Temporal Pattern Detection,” In IEEE 16th International Conference on Data Mining (ICDM), pp. 719-728, December 2016.

Deep-structure-based Learning of Multimodal Web-scale Images/Videos

The social media era has seen an unprecedented amount of user-generated data on the Web. While machine learning techniques have been the primary mechanism for automatic data analysis, mining useful information from such heterogeneous Web-scale data uploaded by users poses a new challenge. There are two primary difficulties: (i) the need for deep structures for learning abstract knowledge based only on raw multi-modal data such as user-uploaded images/videos with incomplete and noisy annotations; (ii) the lack of computational models and computing resources that scale to the vastness of the Web data. This GEARS-enabled project is concerned with learning semantic knowledge from Web-scale image or video datasets that are on the order of hundreds of millions to tens of billions of images or hundreds of thousands of videos, with incomplete/noisy textual annotations.

Publications:

  1. V. Gattupalli, Y. Zhuo, B. Li, “Weakly Supervised Deep Image Hashing using Tag Embedding,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  2. P.L.K. Ding, Z. Li, Y. Zhou, B. Li, “Deep residual dense U-Net for resolution enhancement in accelerated MRI acquisition,” Medical Imaging: Image Processing 2019.
  3. X. Zhou, P.L.K. Ding, B. Li, “Improving Robustness of Random Forest Under Label Noise,” IEEE Winter Conf. on Applications of Computer Vision (WACV), pp. 950-958, 2019.
  4. T. Yu, J. Yan, W. Liu, B. Li, “Generalizing Graph Matching beyond Quadratic Assignment Model,” Conference on Neural Information Processing Systems (NIPS), 2018.
  5. Y. Li, T. Yu, B. Li, “Simultaneous Event Localization and Recognition for Surveillance Video,” IEEE International Conference on Advanced Video and Signal-based Surveillance (AVSS), 2018.
  6. T. Yu, J. Yan, W. Liu, B. Li, “Incremental Multi-graph Matching via Diversity and Randomness-based Graph Clustering,” European Conference on Computer Vision (ECCV), 2018.
  7. T. Yu, J. Yan, J. Zhao, B. Li, “Joint Cuts and Matching of Partitions in One Graph,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
  8. P. S. Chandakkar, R. Venkatesan, B. Li, “Feature Extraction and Learning for Visual Data,” Book Chapter in Feature Engineering for Machine Learning and Data Analytics, Ed. G. Dong and H. Liu, CRC Press, 2018.
  9. Y. Wang, S. Wang, G. Qi, J. Tang, B. Li, “Weakly Supervised Facial Attribute Manipulation via Deep Adversarial Network,” IEEE Winter Conf. on Applications of Computer Vision (WACV), 2018.
  10. Y. Li, P.L.K. Ding, B. Li, “Training Neural Networks by Using Power Linear Units (PoLUs),” arXiv:1802.00212, 2018.
  11. P.L.K. Ding, B. Li, K. Chang, “Convex Dictionary Learning for Single Image Super-resolution,” IEEE Intl. Conference on Image Processing (ICIP), 2017.
  12. R. Venkatesan, B. Li, “Convolutional Neural Networks in Visual Computing: A Concise Guide,” CRC Press, ISBN 9781498770392, 2017.
  13. X. Zhou, P.L.K. Ding, B. Li, “Non-negative Dictionary Learning with Pairwise Partial Similarity Constraint,” IEEE International Conference on Multimedia and Expo (ICME), 2017.
  14. P. S. Chandakkar, Y. Li, P.L.K. Ding, and B. Li, “Strategies for Re-training a Pruned Neural Network in an Edge Computing Paradigm,” IEEE International Conference on Edge Computing, June 2017.