Revise description of collection #3
Comments
I can add that, but descriptions don't allow formatting, so I am not sure how this will look. We may have to discuss how we do this.
…----------------------------
Dr Peter Sefton
Senior Technical Advisor, School of Languages and Culture
Mobile: 0404 096 932
________________________________
From: smusgrav ***@***.***>
Sent: Wednesday, December 7, 2022 14:59
To: Language-Research-Technology/corpus-tools-cooee ***@***.***>
Cc: Peter Sefton ***@***.***>; Assign ***@***.***>
Subject: [Language-Research-Technology/corpus-tools-cooee] Revise description of collection (Issue #3)
Description should include information about the two stratifications of the data. This is especially relevant for the time-period stratification - without this, a user would not understand the file naming convention. Suggested text:
Material to be included had to meet both a regional and a temporal criterion. The temporal criterion required texts to have been produced between 1788 and 1900 to be eligible for COOEE. The regional criterion required a text to have been written in Australia, New Zealand or Norfolk Island, although other localities were allowed in a few cases: for example, when a person who was a native Australian, or who had lived in Australia for a considerable time, wrote a shipboard diary or travelled in other countries.
Contains: Letters, published materials in book form, historical texts.
The collection is stratified in two ways:
Time period - The corpus is divided into four time periods:
* Period 1: 1788-1825
* Period 2: 1826-1850
* Period 3: 1851-1875
* Period 4: 1876-1900
The initial numeral of each file name indicates the period from which the document comes.
Register - The corpus contains material from four registers:
* Speech-based (sb)
* Private written (prw)
* Public written (pcw)
* Government English (ge)
The register to which a file belongs is specified in the metadata at the start of each file in the form <r=[register]> using the abbreviations above.
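To illustrate how a user might rely on the two conventions described above, here is a minimal sketch in Python. It assumes that the period is the first character of the file name and that the register tag appears literally in the file header with the abbreviation in place of [register], for example <r=prw>. The function names and the example file name are hypothetical and are not taken from the corpus tooling.

```python
import re

# Period boundaries as listed in the suggested description.
PERIODS = {
    1: (1788, 1825),
    2: (1826, 1850),
    3: (1851, 1875),
    4: (1876, 1900),
}

# Register abbreviations as listed in the suggested description.
REGISTERS = {
    "sb": "Speech-based",
    "prw": "Private written",
    "pcw": "Public written",
    "ge": "Government English",
}

# Assumption: the tag is written literally, e.g. <r=prw>.
REGISTER_TAG = re.compile(r"<r=([a-z]+)>")


def period_from_filename(filename):
    """Return (period, (start_year, end_year)) from the initial numeral, or None."""
    first = filename[:1]
    if first in {str(number) for number in PERIODS}:
        period = int(first)
        return period, PERIODS[period]
    return None


def register_from_header(header):
    """Return (abbreviation, full name) for the first <r=...> tag, or None."""
    match = REGISTER_TAG.search(header)
    if match and match.group(1) in REGISTERS:
        abbreviation = match.group(1)
        return abbreviation, REGISTERS[abbreviation]
    return None
```

With a hypothetical file name such as "2-015.txt", period_from_filename would report period 2 (1826-1850), and a header containing <r=prw> would be reported as "Private written". Both functions return None when the input does not follow the assumed convention, so unexpected files can be flagged rather than silently misclassified.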