(Source: Introduction to A Linguistic Atlas of Early Middle English, ch. 3, http://www.lel.ed.ac.uk/ihd/laeme1/pdf/Introchap3.pdf)
The literary manuscripts comprise a number of different types and vary greatly in length. They fall into the following main categories:
(a) Texts transcribed and tagged in their entirety
(i) single short (i.e. fewer than 500 words) or fragmentary texts (usually lyrics or parts of lyrics) found in manuscripts with local associations but whose other contents are not in English
(ii) one or more short texts (i.e. fewer than 500 words, usually lyrics) in manuscripts with no local associations. Unless these are found in groups by the same hand, so that their forms can be amalgamated as a single scribal assemblage, they are usually very difficult to localise because there is too little linguistic material to go on.
(iii) small scribal contributions to larger texts in a different hand.
(iv) small to medium-sized texts (i.e. more than 500 and fewer than 10000 words) or medium-sized contributions to larger texts.
(v) long texts (i.e. more than 10000 words) that have been transcribed and tagged completely because of their importance or because of interpretative complexities.
(b) Texts not transcribed and tagged in their entirety
These comprise long texts that do not seem to present linguistic complexity and that we have therefore sampled rather than tagged completely.
It can be seen that there is no strict cut-off for sample length even for long texts. When choosing where to begin and end a sample, we take into account textual content and context, as well as comparability with other versions of the same text.
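The word-count thresholds given for categories (a)(i), (ii), (iv) and (v) can be sketched as a small helper. This is only an illustration, not part of the atlas's own tooling: the function name `size_band` and the band labels are hypothetical. The text does not say how texts of exactly 500 or exactly 10000 words are treated, so the sketch reports those two values as unspecified rather than guessing.

```python
def size_band(word_count: int) -> str:
    """Map a word count to the size bands named in the category list.

    Hypothetical helper: the labels and the handling of the two
    boundary values are assumptions, not taken from the atlas.
    """
    if word_count < 0:
        raise ValueError("word_count must be non-negative")
    if word_count < 500:
        # "fewer than 500 words": categories (a)(i) and (a)(ii)
        return "short"
    if 500 < word_count < 10000:
        # "more than 500 and fewer than 10000 words": category (a)(iv)
        return "small to medium-sized"
    if word_count > 10000:
        # "more than 10000 words": category (a)(v) or sampled texts (b)
        return "long"
    # Exactly 500 or exactly 10000: the text leaves these unassigned.
    return "unspecified"
```

A size band alone does not settle the category. Whether a long text is tagged completely or sampled depends on its importance and interpretative complexity, and the short-text categories also depend on the manuscript's local associations.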