| English Abstract |
Corpus-assisted Discourse Studies (CADS) face the methodological pitfall of “statistical storytelling”. Researchers often use keywords in context (KWIC) to purposively select samples that match the statistical patterns found in corpus analysis, and then conduct discourse analysis on these samples without systematically verifying their statistical representativeness. Content analysis offers a mature solution to this pitfall, but its high cost makes it practically infeasible for CADS studies. This research proposes a Large Language Model-assisted Content Analysis (LACA) validation mechanism that integrates corpus analysis and content analysis in CADS. The mechanism makes semantic verification, previously “theoretically necessary but practically infeasible”, operationally viable, and thereby avoids the pitfall of “statistical storytelling”.
Research Questions. As a proof-of-concept study, this research examines the proposed LACA validation mechanism under a minimum viable configuration, using YouTube comments on the popular song “Fragile” as the corpus and addressing four questions. 1. Can LACA effectively identify semantic relationships between co-occurring words in the corpus? 2. How do different large language models (LLMs) and prompts affect LACA’s coding performance? 3. How consistent is LLM coding with the researcher’s standards? 4. Can the proposed LACA mechanism effectively bridge statistical patterns and discourse interpretation in CADS, avoiding the pitfall of “statistical storytelling”?
Research Methods. First, the study obtains statistically representative KWIC samples through systematic sampling, using the personal pronouns 我是/I am, 你是/you are, 我們/们/we, and 你們/们/you [plural] as search terms, which establishes a reliable foundation for semantic verification. Second, it systematically develops a Standard Coded Set through human-machine collaborative, iterative prompt construction and refinement. This hermeneutic circle of construction → verification (κ) → refinement involves repeatedly examining the LLM’s coding results and refining the prompts to improve the logic and clarity of the coding standards, until consistency stabilizes at Cohen’s κ ≥ 0.8. This ensures that the principles of coding judgment are clear and operable. Third, the study conducts experiments using the established standard coding as the evaluation benchmark, comparing the coding effectiveness and consistency of different LLM configurations (Haiku 3.5 vs. Sonnet 4) and prompt types (simple vs. refined). These experiments test LACA’s feasibility as a bridging mechanism for CADS. The coding task assigns category A to samples in which the personal pronoun refers to a specific identity (e.g., “臺灣人,你們讓人喜歡” / “Taiwanese people, you are likable”) and category B to those in which it does not. The key challenge lies in distinguishing mere lexical collocation from actual semantic reference. The coding standards must assign B not only to samples that lack any identity-word collocation (e.g., “你們讓人喜歡” / “you are likable”), but also to false positives in which the pronoun collocates with an identity word without referring to it (e.g., “你們喜歡臺灣人” / “you like Taiwanese people”). Coding reliability indicates whether LACA can handle such judgments effectively, which in turn ensures the semantic validity of the Category A samples.
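The sampling and verification steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's implementation: the function names `systematic_sample` and `cohen_kappa`, the sampling interval, and the toy A/B label lists are all hypothetical and were chosen only to show how agreement between a Standard Coded Set and LLM output could be checked against the κ ≥ 0.8 criterion.

```python
from collections import Counter


def systematic_sample(items, interval, start=0):
    """Return every `interval`-th item, beginning at index `start`."""
    if interval <= 0 or not 0 <= start < interval:
        raise ValueError("need interval > 0 and 0 <= start < interval")
    return items[start::interval]


def cohen_kappa(codes_a, codes_b):
    """Cohen's kappa for two equally long sequences of category labels."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(codes_a)
    # Observed agreement: share of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement: computed from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical toy data: researcher standard vs. LLM codes (A/B) for ten samples.
standard = ["A", "A", "B", "B", "A", "B", "B", "B", "A", "B"]
llm = ["A", "A", "B", "B", "A", "B", "B", "A", "A", "B"]

kappa = cohen_kappa(standard, llm)
print(f"kappa = {kappa:.3f}")
```

In an iterative workflow of the kind described, a value below the chosen threshold would prompt another round of prompt refinement before the coded set is accepted as the benchmark.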
Research Findings. The findings reveal that model and prompt configurations significantly affect LACA’s coding performance. When both models use refined prompts to code KWIC samples from the four search terms (我是, 你是, 我們/们, 你們/们), Sonnet 4 significantly outperforms Haiku 3.5. Sonnet 4 achieves nearly perfect consistency across all tasks (κ = 0.869 to 0.979). In contrast, Haiku 3.5’s performance declines as corpus complexity increases: for the most ambiguous “你們/们” samples, reliability drops to κ = 0.380, below content analysis standards. As for the prompt comparison, when Sonnet 4 codes identical KWIC samples, refined prompts significantly outperform simple prompts. For the simpler “我是” samples, both prompts achieve excellent consistency, but the refined prompt further improves reliability (from κ = 0.924 to κ = 0.979). Prompt effects are more pronounced on complex corpora: for the most challenging “你們/们” samples, the refined prompt raises consistency from an acceptable level (κ = 0.705) to an excellent level (κ = 0.869). These results demonstrate that selecting an appropriate model and prompt configuration is critical to LACA’s effectiveness. Beyond its expected function as a batch semantic verification tool, LACA also serves as a systematic filtering tool that helps researchers discover meaningful discourse patterns in semantically validated and statistically representative samples. For instance, this research identifies a novel self-identity metaphor in LACA-verified samples: “I am a coconut.” The research further demonstrates how LACA-verified samples make it possible to identify discourse patterns in the corpus, specifically “pervasive irony and distrust toward commenters’ self-declarations”, thereby avoiding the pitfall of “statistical storytelling”.
Discussion. Building on these findings, the study examines LACA’s methodological significance. The results show that its effectiveness depends on two factors: LLM performance thresholds and researchers’ ability to transform domain expertise into executable prompts. However, prompt engineering faces a black-box challenge: specific design principles become obsolete as models evolve, and logically more refined prompts may even reduce coding reliability. To address this challenge, the study proposes the Clinical-Driven principle of prompt engineering, which advocates systematic, iterative prompt refinement with empirical effectiveness as the optimization standard. This meta-principle ensures LACA’s continued applicability as LLMs and prompt strategies evolve. Reproducibility depends on transparently documenting decision logic and verification processes rather than on replicating specific prompt principles. LACA embodies the methodological significance of an interpretive information tool. From theory-driven search term selection and methodologically informed sampling design to clinically driven prompt engineering, researchers’ theoretical judgments and interpretations are embedded into the CADS process through LACA’s mediation at multiple stages, essentially realizing the batch implementation of thick description. LACA provides a concrete operational framework for integrating quantitative and qualitative approaches. Across five dimensions, LACA performs strongly. In inference quality, it produces discourse analysis results based on statistically representative samples. In integration effectiveness, it establishes operational integration procedures, reducing CADS’s frequent failure to integrate quantitative and qualitative results. In expanding understanding, it enables researchers to systematically identify unanticipated discourse patterns in corpora. High human-machine coding reliability (under the optimal configuration, all κ values > 0.85) ensures the validity of subsequent discourse analysis. In feasibility and practical value, it achieves significant reductions in both cost and time compared to traditional content analysis.
Research Limitations and Future Directions. This research adopts a proof-of-concept, minimum viable configuration, and future applications can expand it as needed. The single-researcher design can be extended to involve multiple researchers. Because this study selected the straightforward “personal pronoun + identity word” pattern, LACA’s potential for more complex pragmatic phenomena awaits exploration. Future research can explore LACA across different theoretical frameworks and corpus types. By leveraging LLMs’ multimodal capabilities, the Clinical-Driven principle can serve as the meta-guide for LACA’s methodological expansion towards multimodal applications. |