Hypergraph Text Classification for Mental Health Misleading Advice

This paper introduces HyperMAD, a novel Hypergraph Convolutional Network model designed for the multiclass classification of mental health advice in Arabic tweets. The model distinguishes between misleading and valid advice and further assigns each tweet to a specific class of advice. HyperMAD leverages high-order relations between words in short texts, captured through four types of hyperedges that represent local and global contexts as well as semantic similarity. Extensive experiments demonstrate the effectiveness of HyperMAD, which outperforms existing baselines. An ablation study investigates the significance and contribution of each hyperedge type. The paper also presents a case study analyzing the accuracy and types of Arabic mental health advice on Twitter, revealing that, overall, only about 9% of the advice given in response to mental health expressions on Twitter was accurate. The paper concludes with the hope that HyperMAD can be used to flag misleading responses on social media and to provide the correct resources for those who choose to share their mental health struggles online.


I. INTRODUCTION
The rise in mental health advice on social media platforms such as Instagram, Twitter, and TikTok is a response to increased awareness and reduced stigma around mental health issues. These platforms host a variety of content, from self-care tips to professional insights, aimed at promoting mental well-being. However, the credibility of these sources varies, and not all advice is accurate or reliable. The abundance of information can make it difficult for individuals to distinguish valid advice from misinformation, and following misinformation can potentially worsen their mental health conditions. Therefore, it is essential for users to critically evaluate these sources, verify their credibility, and consult qualified professionals to ensure they receive accurate, evidence-based guidance.
People turn to social media for mental health advice because of its accessibility, affordability, and anonymity. Some cannot afford traditional services, while others prefer online platforms to avoid judgment or stigma in their immediate social circles. For those living in societies where mental health is stigmatized, social media offers a discreet way to find support and connect with others facing similar challenges. Thus, the convenience and supportive community on these platforms make them an attractive alternative for those who find traditional mental health services inaccessible.
Our work examines mental health advice given on Twitter to Arabic-speaking users who post about mental health concerns. Twitter's anonymity allows users to express their feelings without fear of stigma. Several datasets have been published on the mental states of Arab Twitter users, whose potential audience spans 22 countries in the Middle East and North Africa. According to Statista, 54% of Arab youth reported difficulty in accessing quality mental health help in their region. Given the stigma associated with mental illness in the Arab world [7], we aim to investigate the nature of the advice provided to those seeking help online.
Despite extensive research on medical misinformation on social media [30], [36], no studies specifically target mental health misinformation. Existing works focus on general health misinformation or fake news detection, employing methods such as knowledge graphs [3], [23], transformer-based models [10], [11], and graph-based approaches [9], [20]. This gap presents two challenges: 1) accurately labeling misleading mental health advice from Twitter while also considering the advice type, and 2) capturing high-order word associations in tweets to enhance the tweet class representation. Our goal extends beyond binary classification of misleading advice; we aim for multiclass classification of advice types, which requires labeled data that is not readily available from Twitter. Furthermore, we aim to capture complex word relations in tweets, which hypergraphs can represent effectively, thereby improving classification.
To address these challenges, we propose HyperMAD, a model for classifying misleading mental health advice in Arabic tweets using hypergraph convolutional networks. In particular, the model distinguishes misleading advice tweets from valid advice and simultaneously assigns each tweet to a specific class of advice (medical, alternative medicine, meditation, physical activity, religious advice, and other). We summarize our contributions as follows:

B. Hypergraph Learning
Hypergraphs, a generalized graph structure, allow hyperedges to connect multiple nodes, capturing high-order relationships that are well suited to modeling real-world data such as social networks [37]. Hypergraph learning, an extension of graph learning, includes tasks such as node classification [6], link prediction [17], and community detection [2]. In text classification, a hypergraph attention network was proposed for long documents [5], and hypergraph convolutional networks were used for short texts such as reviews or titles [14]. However, the model in [5] was built for long documents and does not address short texts such as social media posts, and the model in [14] represents the text association graph using graph convolutional networks, which overlooks the high-order representations captured by the hypergraph. Our work uniquely uses a hypergraph for short texts from Twitter, capturing high-order relations with four types of hyperedges.

C. Dynamic Query Expansion
Dynamic Query Expansion (DQE) is a widely used data mining technique applied in various domains. It is particularly useful for finding keywords in published tweets that are similar to a seed query word. Using the tweet dataset, DQE expands the initial seed queries and reformulates them to retrieve tweets containing similar keywords. It has been used successfully to track civil unrest events [34], detect metro emerging threats [38], monitor cyber-attack events [15], and track flu outbreaks [35] through social media data. Our work uses the DQE algorithm to capture similar keywords in tweets, which are then used to construct a semantic DQE hyperedge in our hypergraph.
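The expansion loop described above can be illustrated with the following Python sketch. It is a simplified, hypothetical variant and not the exact DQE algorithm of [34]: tweets are weighted by the query keywords they contain, candidate keywords are scored by the summed weights of the tweets that contain them scaled by inverse document frequency (idf), and the `top_k` best keywords are kept until the keyword set stops changing. The function name `expand_query` and the parameters `top_k` and `max_iter` are assumptions introduced for illustration.

```python
import math
from collections import Counter


def expand_query(tweets, seed, top_k=2, max_iter=10):
    """Simplified iterative query expansion (illustrative, not the exact DQE of [34]).

    tweets: list of token lists; seed: set of seed keywords.
    Returns the expanded keyword set once it stops changing.
    """
    n = len(tweets)
    # Document frequency of each word, used for an idf weight.
    df = Counter(w for t in tweets for w in set(t))
    idf = {w: math.log(n / df[w]) for w in df}

    query = {w: 1.0 for w in seed}  # seed keywords start with weight 1
    for _ in range(max_iter):
        # Tweet weight: sum of the weights of query keywords it contains.
        tweet_w = [sum(query.get(w, 0.0) for w in set(t)) for t in tweets]
        # Keyword score: idf times the summed weight of tweets containing it.
        scores = Counter()
        for t, tw in zip(tweets, tweet_w):
            if tw > 0:
                for w in set(t):
                    scores[w] += tw * idf[w]
        # Keep the seeds plus the top-k highest-scoring non-seed keywords.
        candidates = [(w, s) for w, s in scores.most_common()
                      if w not in seed and s > 0]
        new_query = {w: 1.0 for w in seed}
        total = sum(s for _, s in candidates[:top_k]) or 1.0
        for w, s in candidates[:top_k]:
            new_query[w] = s / total  # normalized weight for expanded keywords
        if set(new_query) == set(query):
            break  # converged: the keyword set did not change
        query = new_query
    return set(query)
```

For instance, with the seed {"anxiety"} and a few tokenized tweets, the loop adds words that co-occur with the seed (such as "therapy" or "sleep") and ignores tweets about unrelated topics. Capping the expansion at `top_k` keywords is only one simple way to keep the query from drifting; Algorithm 1 instead relies on weight-based stopping conditions.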

III. PRELIMINARIES
In this section, we define the hypergraph, which we use to represent our data. A hypergraph is defined as $G = (V, E)$, where $V = \{v_i\}_{i=1}^{m}$ is the set of nodes and $E = \{e_j\}_{j=1}^{n}$ is the set of hyperedges. Unlike in traditional graphs, a hyperedge $e_j$ in a hypergraph can connect two or more nodes from $V$. The hypergraph $G$ is represented by an incidence matrix $H \in \{0, 1\}^{m \times n}$, whose entries are defined as:

$$H_{i,j} = \begin{cases} 1, & \text{if } v_i \in e_j, \\ 0, & \text{otherwise.} \end{cases}$$

In this representation, $H_{i,j}$ is set to 1 if node $v_i$ is part of hyperedge $e_j$; otherwise, it is set to 0. For our text-based hypergraph, the nodes in $V$ represent the words that appear in the tweets, and each word node has an attribute vector given by its pre-trained word embedding; these vectors form $X = [x_1, x_2, \dots, x_m]^\top \in \mathbb{R}^{m \times d}$.
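As a small illustration of this definition, the NumPy sketch below builds $H$ from a list of hyperedges, each given as a set of node indices. The function name `incidence_matrix` and the toy hyperedges are hypothetical.

```python
import numpy as np


def incidence_matrix(num_nodes, hyperedges):
    """Build the binary incidence matrix H (num_nodes x num_hyperedges).

    H[i, j] = 1 if node v_i belongs to hyperedge e_j, otherwise 0.
    """
    H = np.zeros((num_nodes, len(hyperedges)), dtype=np.float32)
    for j, edge in enumerate(hyperedges):
        for i in edge:
            H[i, j] = 1.0
    return H


# Toy example: 4 word nodes and 2 hyperedges; a hyperedge may join 2+ nodes.
edges = [{0, 1, 2}, {2, 3}]
H = incidence_matrix(4, edges)
node_deg = H.sum(axis=1)  # node degrees: 1, 1, 2, 1
edge_deg = H.sum(axis=0)  # hyperedge degrees: 3, 2
```

Row sums of $H$ give node degrees and column sums give hyperedge sizes, which are the quantities typically used to normalize hypergraph convolutions.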

IV. METHODOLOGY
In this section, we first summarize the notation used in this paper. Then, we define our problem and describe our proposed method, HyperMAD, in detail, including the hypergraph construction, the hypergraph convolution, and the text classification task. Figure 1 shows our proposed framework.

A. Problem Definition
Let $G = (H, X)$ be a hypergraph, where $H = (V, E)$ represents the hypergraph structure and $X$ represents the node attributes. In this work, we propose a method for short text classification that leverages the hypergraph $G$ to capture complex relationships between words in short texts. The goal is to construct $H$ with four types of hyperedges and to use a ReLU function to control the information flow from the node attributes $X$ to the hypergraph convolutional network layers. The final classification is performed by an MLP, which takes as input the embeddings from the modified hypergraph convolutional network. The MLP is trained with a cross-entropy loss for multiclass classification, and training minimizes this loss to achieve accurate classification of the short text tweets.

Table I summarizes the notation:
$H$: The hypergraph structure, consisting of nodes and hyperedges
$X$: The node attribute matrix, representing the term embeddings
$V$: The set of nodes in the hypergraph, representing words in the tweets
$E$: The set of hyperedges in the hypergraph, representing relationships between words
$X^{(0)}$: The initial node attribute matrix
$X^{(l)}$: The node attribute matrix after $l$ layers of hypergraph convolution
$\epsilon$: A hyperparameter controlling the contribution of the self-connections in the hypergraph convolution
$W_1$: The weight matrix in the hypergraph convolution and the MLP
$b_1$: The bias vector in the hypergraph convolution and the MLP
$Z$: The short text embedding matrix, obtained by aggregating the node attributes
$y$: The output of the MLP, representing a probability distribution over the classes
$\hat{y}$: The true label vector
$\mathcal{L}(y, \hat{y})$: The cross-entropy loss function

B. Hypergraph Construction
In our proposed method, we construct a hypergraph $H = (V, E)$. The set $V$ denotes the collection of nodes, where each node represents a unique word in the dataset, and $E$ denotes the set of hyperedges. Each hyperedge in $E$ is a subset of $V$ that represents a relationship among words. To capture the complexity of the relationships between words in the dataset, we define four distinct types of hyperedges:
1) Sentence Hyperedges: For each sentence in the dataset, we create a hyperedge that connects all the words in the sentence. This captures the local context of each word. For a sentence $s$ containing $n$ words, we define a hyperedge $e_s = \{v_1, v_2, \dots, v_n\}$, where each $v_i$ is a node representing a word in the sentence.
2) DQE Hyperedges: For each selected word in the dataset, which we call a seed query, we use an extended DQE algorithm [34], adapted to tweets of up to 280 characters, to iteratively find keywords similar to the seed query. Algorithm 1 iteratively retrieves all tweets in our dataset that contain keywords related to the seed query and, after it converges, produces a list of those keywords. We then create a hyperedge that connects the seed query with its similar keywords, which captures the semantic similarity between words in our dataset. Thus, for a keyword $k$ and its set of similar keywords $K$, we define a hyperedge $e_k = \{v_k\} \cup \{v_i \mid v_i \in K\}$. The DQE hyperedges connect expanded words of similar meaning, which in turn introduces connections between tweets that belong to the same topic.

Algorithm 1 Dynamic Query Expansion
Require: A set of tweets $\langle T_0, T_1, \dots, T_t \rangle$, seed query $S$
Ensure: Expanded query $K$
▷ Use the weight $w_e(T_{p-1})$ to compute $w_e(F_p)$
5: until $\sigma \le 0$
10: until $w_e(F_p) = w_e(F_{p-1})$
11: $K = F_p$

3) Tweet Hyperedges: For each tweet in the dataset, we create a hyperedge that connects all the words in the tweet. For a tweet $t$ containing $n$ words, we define a hyperedge $e_t = \{v_1, v_2, \dots, v_n\}$, where each $v_i$ is a node representing a word in the tweet. This type of hyperedge captures the high-order relations within the tweet as a short document.
4) Co-occurrence Hyperedges: For each set of words that co-occur in more than 15% of the dataset, we create a hyperedge that connects all the words in the set. This captures the global context of each word. Thus, for a set of co-occurring words $C$ that appears in more than 15% of the dataset, we define a hyperedge $e_c = \{v_i \mid v_i \in C\}$.
By defining these four types of hyperedges, we construct a hypergraph that captures both local and global context, as well as semantic similarity.
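The construction of the sentence, tweet, and co-occurrence hyperedges can be sketched in Python as follows. The sketch assumes that each tweet is already tokenized and split into sentences, and it simplifies co-occurrence hyperedges to word pairs that appear together in more than 15% of the tweets, whereas the sets described above may contain more than two words. DQE hyperedges are left out because they come from the expansion in Algorithm 1 (each would contain the seed word's node plus the nodes of its expanded keywords). The function name `build_hyperedges` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_hyperedges(tweets, cooc_threshold=0.15):
    """Return (vocab, hyperedges) for sentence, tweet, and co-occurrence hyperedges.

    tweets: list of tweets, each a list of sentences, each a list of tokens.
    Hyperedges are frozensets of node indices into vocab.
    """
    vocab = {}
    for tweet in tweets:
        for sent in tweet:
            for w in sent:
                vocab.setdefault(w, len(vocab))  # one node per unique word

    edges = {"sentence": [], "tweet": [], "cooccurrence": []}
    pair_counts = Counter()
    for tweet in tweets:
        # 1) Sentence hyperedges: all words of one sentence (local context).
        for sent in tweet:
            edges["sentence"].append(frozenset(vocab[w] for w in sent))
        # 3) Tweet hyperedges: all words of the whole tweet.
        words = {w for sent in tweet for w in sent}
        edges["tweet"].append(frozenset(vocab[w] for w in words))
        # Count each unordered word pair at most once per tweet.
        pair_counts.update(combinations(sorted(words), 2))

    # 4) Co-occurrence hyperedges (simplified to pairs): pairs that co-occur
    # in more than cooc_threshold of the tweets (global context).
    n = len(tweets)
    for (a, b), count in pair_counts.items():
        if count / n > cooc_threshold:
            edges["cooccurrence"].append(frozenset({vocab[a], vocab[b]}))
    return vocab, edges
```

Representing hyperedges as frozensets of node indices keeps them hashable, so duplicate hyperedges (for example, identical sentences) could be removed with a set before building the incidence matrix.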

C. Hypergraph Convolutional Network
With the objective of capturing high-order relationships among word terms within the tweets, we design short text hypergraph convolutional networks to learn word term embeddings. The proposed model, HyperMAD, incorporates several convolution layers and uses $X^{(l)} \in \mathbb{R}^{m \times d}$ to represent the $d$-dimensional word embeddings learned at layer $l$. Because different perspectives may contribute differently to the final classification results, the initial term embedding $X^{(0)}$ should not be distributed indiscriminately across all layers of the hypergraph convolutional network. Therefore, to manage how the initial term embedding is distributed across the layers, we apply the ReLU function to the term embedding at each layer, which is defined as follows:
where $X^{(0)}$ is the initial term embedding, $\epsilon$ is a small constant, $W_1$ is the weight matrix, and $b_1$ is the bias vector. For each layer $l$ in the hypergraph convolutional network, the term embedding matrix $X^{(l)}$ is updated using the ReLU function as defined in Equation 3:
The main challenge in applying convolutional operations to hypergraphs is determining an effective way to propagate lexical item embeddings, given the non-Euclidean nature of hypergraph data. Following [8], we define the short text hypergraph convolution as follows:
The hypergraph convolution operation, a key mechanism in our model, refines term embeddings within the hypergraph structure. Each term starts with an initial term embedding, $X^{(0)}$, typically derived from pre-trained models such as Word2Vec or GloVe. The operation $Z^\top X^{(l)}$ represents the aggregation of information from individual terms to the short text, while the pre-multiplication by $Z$ represents the reverse aggregation. This operation unfolds over $L$ layers, each broadening the scope of information exchange across the hypergraph. The resulting final term embeddings are then used in Equation 5 to compute the final short text embedding:
The embedding of a specific short text tweet, represented as $d_i = \{v_1, v_2, \dots, v_s\}$, is derived by aggregating the representations of the words it contains, as given by:
Consequently, the short text embedding matrix for the hypergraph, denoted as $\Theta_h$, is computed as the average of the tweet text embeddings:
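The NumPy sketch below shows one plausible instantiation of the layer-wise propagation and pooling described above; every modeling choice in it is an assumption rather than HyperMAD's exact equations. It uses the symmetric normalization $D_v^{-1/2} H D_e^{-1} H^\top D_v^{-1/2}$ that is common in hypergraph convolution (for example, in HGNN), with unit hyperedge weights; it adds $\epsilon X^{(0)}$ after each ReLU layer as one way of re-injecting the initial embedding; and it mean-pools word embeddings to obtain each tweet embedding. The function names are hypothetical, and each layer weight is assumed to be a square $d \times d$ matrix so that the residual term has matching shape.

```python
import numpy as np


def hypergraph_propagation(H):
    """Normalized operator Dv^{-1/2} H De^{-1} H^T Dv^{-1/2} (assumed form, unit edge weights)."""
    dv = H.sum(axis=1)  # node degrees
    de = H.sum(axis=0)  # hyperedge degrees
    dv_inv_sqrt = np.where(dv > 0, 1.0 / np.sqrt(np.maximum(dv, 1e-12)), 0.0)
    de_inv = np.where(de > 0, 1.0 / np.maximum(de, 1e-12), 0.0)
    left = dv_inv_sqrt[:, None] * H * de_inv[None, :]  # Dv^{-1/2} H De^{-1}
    return (left @ H.T) * dv_inv_sqrt[None, :]          # ... H^T Dv^{-1/2}


def hgcn_forward(X0, H, weights, biases, eps=0.1):
    """L layers of X = ReLU(A X W_l + b_l) + eps * X0 (assumed update rule)."""
    A = hypergraph_propagation(H)
    X = X0
    for W, b in zip(weights, biases):
        X = np.maximum(A @ X @ W + b, 0.0) + eps * X0
    return X


def tweet_embeddings(X, tweets):
    """Mean-pool the word embeddings of each tweet (each tweet is a list of node indices)."""
    return np.stack([X[idx].mean(axis=0) for idx in tweets])
```

In this form, multiplying by $H^\top$ aggregates word embeddings into hyperedges and multiplying by $H$ scatters them back to words, mirroring the two aggregation directions described above.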

D. Text Classification
In this work, we propose a short text classification method that leverages a hypergraph convolutional network to overcome the challenges posed by the high-order relations between words in tweet texts.Our approach involves the construction of a hypergraph that represents the complex relationships between words in the short texts, using four distinct types of hyperedges.The ReLU function is then employed to control the flow of information in the network.The final classification is carried out using an MLP, which is trained on the embeddings derived from the hypergraph convolutional network using a cross-entropy loss function for multi-class classification.
In the final stage of our method, we perform classification using an MLP, which takes as input the embeddings derived from the hypergraph convolutional network. Each layer of the MLP applies a linear transformation followed by the ReLU activation function, which introduces non-linearity and allows the MLP to learn more complex patterns in the data. The MLP is represented as:
where $W_1$ is the weight matrix, $b_1$ is the bias vector, $x$ is the input vector, and $y$ is the output vector. The softmax function is used in the output layer to produce a probability distribution over the classes.
The MLP is trained using the cross-entropy loss function, defined as:

$$\mathcal{L}(y, \hat{y}) = -\sum_{c=1}^{C} \hat{y}_c \log y_c$$

where $\hat{y}$ is the true (one-hot) label vector, $y$ is the predicted probability vector output by the MLP, $C$ is the number of classes, and the sum runs over all classes. This loss is minimized during training to achieve accurate classification of the short text tweets.
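A minimal NumPy sketch of the classifier and loss, following the naming convention above ($y$ is the predicted distribution, $\hat{y}$ the one-hot truth, here `y_true`). The depth of one hidden ReLU layer and the output-layer parameters `W2` and `b2` are assumptions made for illustration; the text names only $W_1$ and $b_1$.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax; subtracting the max keeps exp() numerically stable."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def mlp_forward(x, W1, b1, W2, b2):
    """One hidden ReLU layer followed by a softmax output layer (assumed depth)."""
    h = np.maximum(x @ W1 + b1, 0.0)
    return softmax(h @ W2 + b2)


def cross_entropy(y, y_true):
    """Batch mean of -sum_c y_true[c] * log(y[c]); y = predicted probs, y_true = one-hot."""
    return float(-(y_true * np.log(np.clip(y, 1e-12, 1.0))).sum(axis=-1).mean())
```

As a sanity check, a uniform prediction over four classes gives a loss of $\log 4 \approx 1.386$ regardless of the true class.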

V. EXPERIMENTS
To assess the performance of our proposed model, HyperMAD, we conduct extensive experiments to validate its effectiveness. These experiments aim to answer the following questions:
• RQ1: How does the performance of HyperMAD compare to the baseline methods on short-text multiclass classification?
• RQ2: Does the imbalance in the data affect the classification results for each class?
• RQ3: How does each type of hyperedge contribute to the performance, and which combination is the best?
• RQ4: How do the complexity of the hypergraph and the number of hyperedges compare to the baseline methods?
Algorithm 2 Hypergraph Convolutional Network for Short Text Classification
Require:
for each mini-batch B do
3: Construct the hypergraph $G = (V, E)$ with four types of hyperedges: sentence hyperedges, DQE hyperedges, tweet hyperedges, and co-occurrence hyperedges.
5: for $l = 1$ to $L$ do
6: Update the term embedding matrix $X^{(l)}$ using the ReLU function (Eq. 3)
Compute the short text embedding matrix $H$ by aggregating the term embeddings of each short text.
9: Compute the output of the MLP using the short text embeddings (Eq. 9)
10: Compute the cross-entropy loss (Eq. 10)
11: Update the parameters of the model using backpropagation and the Adam optimizer with learning rate $\eta$.
12: end for
13: end for
14: return The class with the highest probability in the output vector $y$ for each short text.
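A reduced NumPy sketch of the mini-batch loop in Algorithm 2 is given below. To keep gradients short enough to write by hand, it assumes the tweet embeddings `Z` from the hypergraph layers are fixed and trains only a softmax output layer with a hand-written Adam update; in HyperMAD, backpropagation would also update the convolution parameters. The function names, learning rate, batch size, and epoch count are hypothetical.

```python
import numpy as np


def adam_step(param, grad, state, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """In-place Adam update; state holds the moment estimates and step count."""
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * grad
    state["v"] = b2 * state["v"] + (1 - b2) * grad ** 2
    m_hat = state["m"] / (1 - b1 ** state["t"])  # bias-corrected first moment
    v_hat = state["v"] / (1 - b2 ** state["t"])  # bias-corrected second moment
    param -= lr * m_hat / (np.sqrt(v_hat) + eps)


def train_softmax(Z, labels, num_classes, epochs=50, batch_size=16, lr=0.05, seed=0):
    """Mini-batch Adam training of a softmax layer on fixed tweet embeddings Z."""
    rng = np.random.default_rng(seed)
    n, d = Z.shape
    W = rng.normal(scale=0.01, size=(d, num_classes))  # Normal initialization
    b = np.zeros(num_classes)
    sW = {"m": np.zeros_like(W), "v": np.zeros_like(W), "t": 0}
    sb = {"m": np.zeros_like(b), "v": np.zeros_like(b), "t": 0}
    Y = np.eye(num_classes)[labels]  # one-hot true labels
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):  # iterate over mini-batches
            idx = order[start:start + batch_size]
            logits = Z[idx] @ W + b
            logits -= logits.max(axis=1, keepdims=True)
            probs = np.exp(logits)
            probs /= probs.sum(axis=1, keepdims=True)
            # Gradient of the mean cross-entropy w.r.t. the logits.
            g = (probs - Y[idx]) / len(idx)
            adam_step(W, Z[idx].T @ g, sW, lr)
            adam_step(b, g.sum(axis=0), sb, lr)
    return W, b


def predict(Z, W, b):
    """Step 14: return the highest-scoring class for each short text."""
    return np.argmax(Z @ W + b, axis=1)
```

The gradient `probs - Y` follows from combining softmax with cross-entropy, which is why frameworks usually fuse the two into a single loss operation.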

A. Datasets
Since we were interested in classifying mental health advice on Twitter, we collected a dataset of responses to tweets about mental health states and moods. To do so, we started from a Twitter dataset of Arabic tweets describing users' moods, anxieties, sleep modules, and depression states [19]. Because we were interested in the responses and advice given to those tweets, we retrieved the original tweets from Twitter and collected all of their responses. The dataset contained about 48.8K tweets; at the time of retrieval, we lost about 10% of them, which still left us with 43.9K tweets whose responses we could investigate. The resulting dataset of responses consisted of 131.7K tweets. After removing responses that consisted only of emojis or did not include advice, we were left with 92K tweets, each belonging to one of 6 classes. The dataset was labeled by three native Arabic-speaking psychology graduate students, who determined whether each tweet's advice was valid (v) or misleading (m) and assigned it to the corresponding class. A summary of the dataset statistics is shown in Table II. In our experiments, we adopt train/test data splits similar to those of previous works [5], [32]. For each run, we randomly select 90% of the training samples to train the model, while the remaining 10% are used for validation. This consistent approach allows a fair comparison of our results with previous works.
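The per-run validation split can be sketched as follows; the function name `train_val_split` and the use of a per-run seed are assumptions for illustration.

```python
import numpy as np


def train_val_split(n_train, val_fraction=0.1, seed=None):
    """Randomly hold out val_fraction of the training indices for validation."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_train)  # shuffle indices 0..n_train-1
    n_val = int(round(val_fraction * n_train))
    return perm[n_val:], perm[:n_val]  # (train indices, validation indices)
```

Passing a different seed for each of the runs yields a fresh 90/10 split per run while keeping every run reproducible.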
To evaluate the performance of our model in comparison to the baselines, we use accuracy and F1-score as our metrics, as shown in Table III.
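The two metrics can be computed as in the sketch below. The text does not state which F1 averaging is used, so macro-averaging (the unweighted mean of per-class F1) is an assumption here; scikit-learn's `f1_score(y_true, y_pred, average="macro")` computes the same quantity.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN); empty classes score 0."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    f1s = []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(f1s))
```

Macro-averaging weights every class equally, so it exposes weak performance on small classes that accuracy can hide in an imbalanced dataset.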

C. Implementation Details
Our HyperMAD model, a hypergraph convolutional network, is built with PyTorch and optimized using Adam. Experiments run on a system with an AMD Ryzen Threadripper 1920X CPU, Windows 10 64-bit, an NVIDIA GeForce RTX 2080 Ti GPU, and 64 GB of RAM. We use $d = 50$ for the word embedding dimension and $L = 4$ convolutional layers. All network parameters are initialized from a Normal distribution, and word embeddings are initialized with pre-trained models such as Word2Vec or GloVe. The reported results are averages over 10 runs under the optimal hyperparameter settings.
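The reported settings can be collected in a small configuration object, sketched below. The embedding dimension, layer count, and number of runs come from the text; the class and function names, the standard deviation of the Normal initialization, the learning rate, and the square $d \times d$ layer weights are assumptions.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class HyperMADConfig:
    """Settings from the text; fields marked 'assumed' are placeholders."""
    embed_dim: int = 50          # d, word embedding dimension
    num_layers: int = 4          # L, hypergraph convolution layers
    num_runs: int = 10           # results are averaged over 10 runs
    init_std: float = 0.01       # assumed std of the Normal initialization
    learning_rate: float = 1e-3  # assumed Adam learning rate (eta)


def init_layer_weights(cfg, seed=0):
    """Normal-initialized d x d weight matrices, one per convolution layer."""
    rng = np.random.default_rng(seed)
    return [rng.normal(0.0, cfg.init_std, size=(cfg.embed_dim, cfg.embed_dim))
            for _ in range(cfg.num_layers)]


def average_over_runs(run_scores):
    """Mean of a metric across independent runs."""
    return float(np.mean(run_scores))
```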

A. Overall Performance
We conduct extensive experiments to evaluate the performance of our proposed model, HyperMAD, on short-text classification, answering RQ1 and RQ2; the results are shown in Table III and Table IV, respectively. Based on our experiments, we make the following observations:
• Overall, HyperMAD outperforms the seven baseline methods on the Twitter Arabic Mental Health Advice Dataset (Twitter AR-MHAD).
• Hypergraph-based methods outperform graph-based and word-embedding methods on our short-text dataset. This indicates the importance of the hypergraph's ability to represent high-order relations between words in text datasets.
• While HyperMAD and STHCN are both hypergraph methods for short-text datasets, our proposed model represents the data only through hyperedges, whereas STHCN incorporates the text association using a graph convolutional network. We attribute HyperMAD's improved performance to the critical role that hypergraphs play in capturing text associations that existing graph-based methods cannot capture.
• In the fine-grained classification task in Table IV, the class with the largest number of tweets, religious advice, performs best. The second best is physical activity advice, which we attribute to the distinctive words associated with that class.
• The most difficult tweets to classify are those in the meditation class; because it has the fewest tweets, the model finds it harder to differentiate them. In addition, the other advice class contains advice that combines two or more classes, which makes those tweets more likely to be misclassified.
• Despite the variance in the number of tweets across classes, the fine-grained results show that the model differentiates meaningfully between advice classes.

B. Ablation Study
To answer RQ3, we investigate the significance and contribution of each hyperedge type to HyperMAD. We conduct an ablation study in which we test the model with various combinations of hyperedges, starting with one type at a time and ending with all four types. Table V shows the 15 variants that we tested. The table is color-coded by the number of hyperedge types used, and within each color group, we mark the best-performing variant in bold and underline the second best. We observe that the combination of all four hyperedge types achieves the best performance, confirming that the hypergraph learns the most from the high-order relations captured by all four types. Notably, the DQE hyperedges appear to be the most informative single type of hyperedge. This outcome confirms that the hypergraph learns more information from the expanded queries than from word co-occurrence, sentences, and tweets. Combining the DQE hyperedges and tweet hyperedges improves the model's expressiveness more than the other combinations. Moreover, adding the sentence hyperedges to the previous combination indicates that the context and seman-
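The 15 variants in Table V correspond to all non-empty subsets of the four hyperedge types, which can be enumerated as below; the names `HYPEREDGE_TYPES` and `ablation_variants` are hypothetical.

```python
from itertools import combinations

HYPEREDGE_TYPES = ("S", "D", "T", "C")  # Sentence, DQE, Tweet, Co-occurrence


def ablation_variants(types=HYPEREDGE_TYPES):
    """All non-empty combinations of hyperedge types, ordered by size."""
    return [combo for k in range(1, len(types) + 1)
            for combo in combinations(types, k)]


variants = ablation_variants()
# 4 single types + 6 pairs + 4 triples + 1 full model = 2**4 - 1 = 15 variants
```

Grouping the variants by subset size matches the color groups in Table V, where variants with the same number of hyperedge types are compared against each other.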

C. Computational Efficiency
To evaluate the computational efficiency of our model and answer RQ4, we compare HyperMAD with the best-performing method of each baseline type. The results in Table VI show that HyperMAD uses less memory than the baseline methods. We attribute this efficiency to the fact that HyperMAD learns at the tweet level and stores only small batches of tweets during training. STHCN and TextGCN both require constructing tweet graphs for the training and testing data, which consumes additional memory. Because of our dataset's large vocabulary of unique words, the BiLSTM model handles a larger number of word embeddings and performs more computation per word, which makes it consume more memory.

D. Case Study
To show an example of how our DQE algorithm works, we translated the seed keywords we used in the medical category and displayed the initial seeds and expanded queries in Figure 3.
In this case study, we analyze the accuracy and types of Arabic mental health advice on Twitter. Seeking help for mental illness can be challenging, stigmatized, and expensive, which leads many Arabic speakers to vent on social media. The advice they receive in response to descriptions of their mental state can affect them positively or negatively, depending on the advice. We try to answer the following question in our case study: Can the Arabic Twitter community support its members with accurate advice? We analyzed 92K tweets containing Arabic mental health advice from Twitter. As seen in Figure 4, only about 9% of the advice in response to mental health expressions was accurate overall. Meanwhile, over 90% of the advice in the medical, alternative medicine, and meditation categories was misleading. It was also worrisome to see that misleading advice had a higher reach, in terms of likes, views, and retweets, than advice and responses posted by verified medical doctors or institutions, as seen in Figure 2.
We hope that these insights raise awareness in fighting misleading mental health advice on social media platforms and highlight the importance of requiring warning labels on advice that could cause more harm than good.
set equal to the seed query $S$
$F_0$: Initial feature vector, set equal to $S$
$w_e(F_0)$: Initial weight of $F_0$, set to 1
$p$: Iteration index in the DQE algorithm
$F_p$: Feature vector at the $p$-th iteration
$w_e(F_p)$: Weight of $F_p$
$T_p$: Tweet at the $p$-th iteration
$w_e(T_p)$: Weight of $T_p$
$\sigma$: Difference between the minimum weight of $T_p$ and the maximum weight of the remaining tweets
idf: Inverse document frequency

Fig. 2: Social Media Reach of Misleading and Valid Arabic Mental Health Advice on Twitter

Fig. 4: Arabic Mental Health Advice on Twitter Analysis

VII. CONCLUSION
We proposed HyperMAD, a novel short text classification model that captures the high-order relations between words in short texts. Specifically, we define four types of hyperedges to capture the complexity of the relations between words in our Twitter dataset. Our model applies several hypergraph convolutional layers to the constructed hypergraph to learn the text embeddings. Additionally, the MLP classifier allowed us to perform multiclass classification of advice tweets according to the category of advice they fall under. Extensive experiments demonstrated the effectiveness of our proposed model. To the best of our knowledge, HyperMAD is the first model to focus on the classification of misleading mental health advice. We hope that HyperMAD can be used to flag such responses on social media and to provide the correct resources to those who choose to share their mental health struggles online.

• We propose HyperMAD, a hypergraph convolutional network model for short text multiclass classification. Our proposed model captures high-order relations between words in short texts and uses those learned representations for text classification.
• We define four types of hyperedges (sentence hyperedges, DQE hyperedges, tweet hyperedges, and co-occurrence hyperedges) that connect the words extracted from our dataset to capture local and global contexts as well as semantic similarity. These hyperedge types help capture the complex relations between words in our dataset.
• We conduct comprehensive experimental evaluations to validate the efficacy of our proposed model. Our model outperforms seven baseline models in classifying short texts retrieved from Twitter, demonstrating the superiority of the proposed approach.

TABLE I: Notations and their explanations

TABLE II: Dataset description

TABLE III: Overall performance of baseline methods in comparison to HyperMAD using Accuracy (Acc) and F1-Score (F1)

TABLE IV: HyperMAD fine-grained misleading advice per class results

TABLE V: Ablation study: Evaluation of our proposed model, HyperMAD, on different combinations of hyperedge types, where S, D, T, and C correspond to the Sentence, DQE, Tweet, and Co-occurrence hyperedges, respectively

TABLE VI: GPU memory consumption comparison of the best-performing method in each baseline type