Detection of suicide-related posts in Twitter data streams
Aim: a new approach that uses the social media platform Twitter to quantify suicide warning signs for individuals and to detect posts containing suicide-related content.
Proposed System:
The system automatically identifies sudden changes in a user’s online behavior. To detect such changes, it combines natural language processing techniques to aggregate behavioral and textual features and passes these features through a martingale framework, which is widely used for change detection in data streams.
Existing System: traditional detection methods rely heavily on manually annotated speech, which can limit their effectiveness due in part to the varying forms of suicide warning signs in at-risk individuals [6, 11, 12].
Objectives:
1. Using research from the field of psychology, design and develop behavioral features to quantify the level of risk for an individual according to his or her online behavior on Twitter (speech, diurnal activities, size of social network, etc.). This includes creating a feature for text analysis called the Suicide Prevention Assistant (SPA) text score.
2. Monitor the stream of an individual Twitter user and his or her behavioral features using an innovative application of a martingale framework to detect sudden behavioral changes.
Methodology:
Suicide Prevention Assistant (SPA): a feature for text analysis.
· Suicide warning signs in online behavior:
Two groups of behavioral features are established: user-centric and post-centric features. User-centric features characterize the behavior of the user in the Twitter community, while post-centric features are characteristics that are extracted from the properties of a tweet.
1. User-centric behavioral features aim to capture changes in a Twitter user’s engagement with other users. The friends and followers features can quantify an individual’s interaction with his or her online community, such as a sudden decrease in communication. On the other hand, they can also reflect an expansion of an individual’s online community. This is relevant, as at-risk individuals have also been shown to increase their time online developing personal relationships [29]. Note that the terms friends and followers were chosen to represent the unidirectional relationships that are inherent on Twitter; these terms may not apply to certain user accounts such as celebrities and news outlets. Additional features include volume, replies, retweets, and links, which were all identified by De Choudhury et al. [6] as markers for mental health. These measures can help to quantify the number of interactions a user has with their friends and followers, since an individual’s social network may remain stable while their interactions increase or decrease. The final user-centric feature, questions, may also indicate a user’s attempt to engage with others online.
2. Post-centric behavioral features are characteristics originating from the post itself. One important piece of information is the hour at which the tweet is published (time feature). Late-night activity can be an indication of unusual rhythms in sleep (insomnia and hypersomnia) [6] and can predict future episodes of depression.
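As a rough illustration of the features described above, the sketch below derives a few of them from a single tweet. The `Tweet` class, its field names, and the string heuristics are assumptions made for this example; they are not taken from the original work or from the Twitter API.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Tweet:
    """Hypothetical minimal tweet record (field names are assumptions)."""
    text: str
    created_at: datetime


def post_features(tweet: Tweet) -> dict:
    """Extract a few simple per-tweet features with crude string heuristics."""
    text = tweet.text
    return {
        # Time feature: hour of publication (0-23).
        "hour": tweet.created_at.hour,
        # Interaction markers, approximated from the text alone.
        "is_reply": text.startswith("@"),
        "is_retweet": text.startswith("RT @"),
        "has_link": "http://" in text or "https://" in text,
        # Crude: a "?" inside a URL would also count as a question.
        "has_question": "?" in text,
    }


example = Tweet("@friend are you there?", datetime(2020, 1, 1, 3, 15))
features = post_features(example)
```

Aggregating these per-tweet values over a time window (for example, counting replies per day) would give stream-level features such as volume, replies, retweets, links, and questions.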
· To classify the text of the post, two different approaches are proposed. The first approach is a natural language processing (NLP) method that combines features generated from the text, based on an ensemble of lexicons. These lexicons are composed of linguistic themes commonly exhibited by at-risk individuals. The second approach, called the distress classifier, is based on machine learning.
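The lexicon-based approach can be sketched as follows. This is only a toy illustration: the two themes, their word lists, and the scoring rule (fraction of tokens matching any lexicon) are placeholders invented for this example, and the result is not the SPA text score itself.

```python
import re

# Hypothetical, deliberately tiny lexicons; real ones would be far larger
# and grounded in the psychology literature.
LEXICONS = {
    "sleep": {"insomnia", "awake", "sleepless"},
    "isolation": {"alone", "lonely"},
}


def tokenize(text: str) -> list:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def theme_counts(text: str, lexicons: dict = LEXICONS) -> dict:
    """Count, for each theme, how many tokens appear in its lexicon."""
    tokens = tokenize(text)
    return {
        theme: sum(tok in words for tok in tokens)
        for theme, words in lexicons.items()
    }


def text_score(text: str, lexicons: dict = LEXICONS) -> float:
    """Fraction of tokens that match any lexicon (0.0 for empty text)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(theme_counts(text, lexicons).values()) / len(tokens)


counts = theme_counts("Awake again, alone")
score = text_score("Awake again, alone")
```

The per-theme counts can serve as individual features, while the single score gives one number per tweet that can be tracked over time.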
Martingale framework: detect sudden behavioral changes.
· Design of a martingale-based approach for emotion change detection:
The full martingale framework can be broken down into three steps.
1. The first step is to calculate the strangeness measure, which quantifies for each specific user how much a tweet is different from previous ones.
2. Next, a statistic is defined to rank the strangeness measures of the tweets.
3. Finally, using this statistic, a family of martingales is defined in order to detect movements in the tweet stream and run the hypothesis test.
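The three steps above can be sketched for a one-dimensional feature stream. This sketch assumes a distance-to-mean strangeness measure, randomized p-values as the ranking statistic, and a randomized power martingale; these are common choices in the change-detection literature, not necessarily the ones used in the original work. The function names, `epsilon`, `threshold`, and `seed` are illustrative assumptions.

```python
import random


def strangeness(history: list, x: float) -> float:
    """Step 1: distance of x from the mean of earlier values (0 if none)."""
    if not history:
        return 0.0
    return abs(x - sum(history) / len(history))


def p_value(scores: list, theta: float) -> float:
    """Step 2: rank the latest strangeness score among all scores so far.

    theta in [0, 1) breaks ties at random (randomized p-value).
    """
    last = scores[-1]
    greater = sum(1 for s in scores if s > last)
    equal = sum(1 for s in scores if s == last)
    # Clip away from zero so the power below stays finite.
    return max((greater + theta * equal) / len(scores), 1e-6)


def martingale_update(m: float, p: float, epsilon: float) -> float:
    """Step 3: randomized power martingale update."""
    return m * epsilon * p ** (epsilon - 1)


def detect_change(stream, epsilon=0.92, threshold=20.0, seed=0):
    """Return the index where the martingale first exceeds threshold, else None."""
    rng = random.Random(seed)
    history, scores, m = [], [], 1.0
    for n, x in enumerate(stream):
        scores.append(strangeness(history, x))
        history.append(x)
        m = martingale_update(m, p_value(scores, rng.random()), epsilon)
        if m > threshold:  # reject "no change" hypothesis
            return n
    return None
```

Small p-values (unusually strange tweets) make the martingale grow, while typical tweets make it shrink slowly. Whether and when a change is flagged depends strongly on `epsilon`, `threshold`, and the strangeness measure, so these would need tuning on real data.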
Advantage: the martingale framework is a flexible approach to analyzing text streams. The algorithm can handle different mixes of features and can be implemented without the annotated datasets that text classification commonly requires.
Limitations:
1. The parameter setting of the martingale framework could be improved upon. This was one of the major challenges when implementing the framework.
2. Although the martingale values “react” to changes in online speech, the change point detection method needs improvement. We were able to detect the true change point for one validation case, but the approach needs to be more robust with respect to parameter setting and positive changes in speech.
Future enhancement:
We plan to further explore the impact of martingale parameters on change detection effectiveness. We also hope to expand the approach to include image processing and other social media outlets in order to assess its effectiveness in other settings. Another interesting perspective is to consider more fine-grained emotion classes such as anger, sadness, and fear, instead of considering four levels of distress.
For additional details comment below with requirements.