English Abstract |
The traditional way of analyzing human behavior is through manual observation. For example, in couple therapy studies, human raters observe sessions of interaction between distressed couples and manually annotate the behaviors of each spouse using established coding manuals. Clinicians then analyze these annotated behaviors to understand the effectiveness of the treatment that each couple receives. However, this manual approach is very time-consuming, and the subjective nature of the annotation process can result in unreliable annotations. Our work aims to automate this process with machine learning, using signal processing techniques to provide quantitative evidence of human behavior. Deep learning is the current state of the art in machine learning. This paper proposes using a stacked sparse autoencoder (SSAE) to reduce the dimensionality of the acoustic-prosodic features and thereby identify the key higher-level features. We then use logistic regression (LR) to classify high versus low ratings for six different codes. The method achieves an overall accuracy of 75% over the six codes (husband's average accuracy of 74.9%, wife's average accuracy of 75%), compared with 74.1% in the previously published study (husband's average accuracy of 75%, wife's average accuracy of 73.2%) (Black et al., 2013), an overall improvement of 0.9 percentage points. Our proposed method achieves this higher classification rate while using far fewer features (ten times fewer than the previous work (Black et al., 2013)). |