This dataset is designed for studying the quality of literary fiction as a multidimensional construct, using several different proxies for reception and assessment (e.g. Goodreads scores, library holdings, long-listing for awards, and presence in canonical anthologies). It contains metadata and tens of linguistic features for more than 9000 contemporary novels, making it well suited to studies of literary quality and reception.
An extensive description of the measures annotated in the corpus can be found on GitHub: https://github.com/centre-for-humanities-computing/chicago_corpus/blob/main/data/corpus_description.md