Went to a talk by Melanie Andressen about a data-driven analysis of the textual differences between PhD theses in literary analysis and in linguistics. She used PCA in her descriptive analysis, and it seemed to work out quite well.
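
To make the PCA idea concrete for myself, here is a minimal sketch of how one might project documents onto a first principal component built from word frequencies. This is not the speaker's pipeline: the four toy "documents", the six-word vocabulary, and all function names are made up for illustration, and the component is found with plain power iteration rather than whatever tool she actually used.

```python
import math

# Hypothetical toy corpus: invented snippets, NOT data from the talk.
DOCS = {
    "ling_1": "the results show a significant frequency effect in the data",
    "ling_2": "the data and results show significant frequency differences",
    "lit_1": "we feel the way the narrator tries to understand the world",
    "lit_2": "the reader may feel and understand the way the poem moves",
}
# Hypothetical feature words, chosen by hand for this example.
VOCAB = ["results", "data", "significant", "feel", "way", "understand"]


def relative_frequencies(text, vocab):
    """Share of the document's tokens taken up by each vocabulary word."""
    tokens = text.lower().split()
    return [tokens.count(word) / len(tokens) for word in vocab]


def center(rows):
    """Subtract the column mean from every feature."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [
        [sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_component(cov, iterations=200):
    """Leading eigenvector of the covariance matrix via power iteration."""
    d = len(cov)
    v = [1.0 / (k + 1) for k in range(d)]  # deterministic, non-uniform start
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


names = list(DOCS)
centered = center([relative_frequencies(DOCS[name], VOCAB) for name in names])
pc1 = first_component(covariance(centered))

# Score of each document on the first principal component.
scores = {
    name: sum(x * w for x, w in zip(row, pc1))
    for name, row in zip(names, centered)
}

for name, score in scores.items():
    print(f"{name}: {score:+.4f}")
```

On this toy input the two made-up linguistics snippets land on one side of the first component and the two literature snippets on the other. The sign of a principal component is arbitrary, so only the separation matters, not which side is positive.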

Here were some of her findings:
Linguistics theses had words related to quantitative results, while literary analysis theses had more textual cues pointing to experience: "feel", "way", and "understand" were important words.

Linguistics theses had few passive constructions.

Relative clauses were characteristic of literary analysis theses.

