Research Projects

Fast, Scalable and Geo-Distributed PCA for Big Data Analytics

In our study, we build on the zero-noise-limit Probabilistic PCA model and introduce a block-division method for it that efficiently suppresses the explosion of intermediate data. We employ several optimizations, including mean propagation to preserve sparsity and dynamic tuning of the number of blocks to adjust automatically to large dimensions. For the geo-distributed environment, we propose a communication-efficient solution that reduces idle time, passes only the required parameters, and chooses a geographically ideal central datacenter for faster accumulation. We refer to our algorithm as TallnWide. Our empirical evaluation with real datasets shows that TallnWide can successfully handle data of at least $\mathbf{10\times}$ higher dimensionality than existing methods, and offers up to $\mathbf{2.9\times}$ improvement in running time in the geo-distributed environment compared to conventional approaches.
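The mean propagation idea mentioned above can be illustrated with a minimal sketch. It is not the TallnWide implementation; it only shows the general algebraic trick, assuming NumPy and a hypothetical helper name `centered_product`. Instead of subtracting the column means from a sparse data matrix Y (which would make it dense), the mean is carried through the product: (Y - 1 m^T) X = Y X - 1 (m^T X).

```python
import numpy as np

def centered_product(Y, X):
    """Return (Y - column_means) @ X without building the centered matrix.

    Y: (N, D) data matrix, typically sparse in practice (dense here for brevity).
    X: (D, d) matrix to multiply with.
    Hypothetical helper for illustration only.
    """
    mean = Y.mean(axis=0)   # column means m, shape (D,)
    # Y @ X keeps Y untouched (and sparse, if a sparse type were used);
    # the mean contribution is a single length-d row subtracted from every row.
    return Y @ X - mean @ X

# Small usage example with a mostly-zero matrix.
rng = np.random.default_rng(0)
Y = rng.random((6, 5)) * (rng.random((6, 5)) < 0.3)  # roughly 70% zeros
X = rng.random((5, 2))
result = centered_product(Y, X)
```

The result matches explicit centering followed by multiplication, but the only dense intermediate is the N-by-d product rather than the N-by-D centered matrix.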

Supervisor: Dr. Muhammad Abdullah Adnan
Associate Professor, Department of CSE
Bangladesh University of Engineering and Technology (BUET)
Co-authors:
1. Md. Mehrab Tanjim, PhD Student, CSE, University of California, San Diego
Status: Published in Information Systems [Paper Link] Code

UACD: A Local Approach for Identifying the Most Influential Spreaders in Twitter in a Distributed Environment

We propose a novel method for identifying the most influential spreaders on the Twitter social network by combining user-specific information (extracted from each user's Twitter account) with topological information. We provide a distributed implementation of our algorithm on Amazon EC2 and observe that it is scalable and can process a significantly large network. We compare our ranking results with those of existing methods using widely accepted ranking-comparison metrics, and our experimental results indicate that our method is on average $\mathbf{12.5\%}$ more accurate and produces results in $\mathbf{175\times}$ less time.

Supervisor: Dr. Muhammad Abdullah Adnan
Associate Professor, Department of CSE
Bangladesh University of Engineering and Technology (BUET)
Co-authors:
1. Md. Saiful Islam, PhD Student, CSE, University of Rochester
2. Md. Tarikul Islam Papon, PhD Researcher, CS, Boston University
Status: Under review at SNAM

To Download or Not to Download: A Machine Learning Approach for Detecting Privacy Evasive Mobile Applications on Google Play Store

In our study, we apply unsupervised learning to cluster mobile apps from the Google Play Store based on their descriptions and the permissions they request from the user. We then detect outlier apps whose requested permissions deviate significantly from those relevant to their cluster, and label them as unsafe. Finally, we design an LSTM-based deep learning model that provides a rating representing the app’s behavior regarding user privacy. We evaluate the accuracy of our model using our self-labeled dataset (description and permissions vs. safe/unsafe).

Supervisor: Dr. Muhammad Abdullah Adnan
Associate Professor, Department of CSE
Bangladesh University of Engineering and Technology (BUET)
Co-authors:
1. Md. Mehrab Tanjim, PhD Student, CSE, University of California, San Diego
2. Md. Touhidul Islam, Graduate RA, CSE, Penn State University
Status: Preprint

Hierarchical Attention for Host Intrusion Detection

A host-based intrusion detection system (HIDS) analyzes auditing data from operating systems; a system-call-based HIDS specifically analyzes collected Linux system call traces to detect malicious activity. Traditional HIDS methods are known to be prone to a high number of false alarms. In our work, we propose a novel hierarchical attention-based deep learning method for detecting intrusions on a host. We evaluate our model on the ADFA-LD dataset, a collection of Linux system call traces. We tune our model’s hyperparameters for the best result, and our method outperforms existing methods in terms of both higher accuracy and a lower false alarm rate.
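For reference, the two evaluation measures named above can be computed from confusion-matrix counts as in the sketch below. The source does not spell out its exact definitions, so this assumes the common convention that the false alarm rate is the fraction of normal traces flagged as attacks; the function name and the example counts are hypothetical.

```python
def accuracy_and_false_alarm_rate(tp, fp, tn, fn):
    """Compute accuracy and false alarm rate from confusion-matrix counts.

    tp: attack traces correctly flagged     fp: normal traces wrongly flagged
    tn: normal traces correctly passed      fn: attack traces missed
    Assumes false alarm rate = FP / (FP + TN), i.e. the false positive rate.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    false_alarm_rate = fp / (fp + tn)
    return accuracy, false_alarm_rate

# Hypothetical counts, for illustration only (not results from the study).
acc, far = accuracy_and_false_alarm_rate(tp=40, fp=10, tn=45, fn=5)
```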

Supervisor: Dr. Md. Shohrab Hossain
Professor, Department of CSE
Bangladesh University of Engineering and Technology (BUET)
Co-authors:
1. Md. Shehab Sarar Ahmed, Lecturer, CSE, Bangladesh University of Engineering and Technology (BUET)
Status: Preprint

Protein Function Prediction using Multi-Layer CNN

Developing efficient computational approaches for automatic protein function prediction is of utmost importance for reducing the large gap between the number of proteins with known primary sequences and the number with experimental annotations. In our work, we aim to develop a highly accurate method for predicting protein functions that incorporates a novel hierarchical multi-layer convolutional neural network (CNN) to effectively capture the long-range interactions among amino acid residues. We have evaluated our model using the CAFA3 and UniProt datasets.

Supervisor: Dr. Md. Shamsuzzoha Bayzid
Associate Professor, Department of CSE
Bangladesh University of Engineering and Technology (BUET)
Co-authors:
1. Nafis Sadeq, PhD Student, CSE, University of California, San Diego
2. Shafayat Ahmed Piyal, Graduate TA, CSE, Virginia Tech
Status: Preprint