ncn-foreigners/data-various-typos

Foreign Names and Surnames Dataset

Overview

This repository contains structured datasets of the most popular given names and surnames in several languages. Ukrainian and Russian are currently covered; Belarusian and other languages are planned. Each dataset includes transliterations, Polish and English equivalents, informal diminutives, and other variants. The primary goal is to support accurate transliteration, typo detection, and cross-linguistic studies.

Contents

The repository currently includes:

  1. Ukrainian Names and Surnames

    • Female Names
    • Male Names
    • Surnames
  2. Russian Names and Surnames

    • Female Names
    • Male Names
    • Surnames
  3. Planned Additions

    • Additional languages such as Belarusian, Vietnamese, and more.
    • Frequency of occurrence of the given names and surnames.

Dataset Structure

Each dataset contains approximately 50 female names, 50 male names, and 50 surnames. The data is organized into tables with the following columns:

  • Original Name: The name or surname in its original script.
  • Polish Transliteration: Transliteration based on Polish orthography.
  • Alt Polish Trans: Alternative Polish transliterations.
  • Polish Equivalent: Direct equivalent in Polish.
  • English Equivalent: Standard English equivalent.
  • Informal Diminutive: Common informal or diminutive forms.
  • Other Variants: Additional regional or historical forms.
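
As a rough illustration of the column layout above, one row could be modeled as a small record type. This is only a sketch: the class name `NameRecord`, the snake_case field names, and the choice to store multi-valued columns as lists are assumptions made for the example, not the actual headers or cell formats used in the XLSX files.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NameRecord:
    """One table row. Field names are hypothetical, mirroring the column list."""

    original: str                      # Original Name (original script)
    polish_transliteration: str        # Polish Transliteration
    alt_polish_trans: list[str] = field(default_factory=list)
    polish_equivalent: str = ""
    english_equivalent: str = ""
    informal_diminutive: list[str] = field(default_factory=list)
    other_variants: list[str] = field(default_factory=list)

    def latin_forms(self) -> list[str]:
        """Return every non-empty Latin-script form once, in column order."""
        candidates = [
            self.polish_transliteration,
            *self.alt_polish_trans,
            self.polish_equivalent,
            self.english_equivalent,
            *self.informal_diminutive,
            *self.other_variants,
        ]
        seen: dict[str, None] = {}  # dict keeps insertion order and drops repeats
        for form in candidates:
            if form:
                seen.setdefault(form, None)
        return list(seen)


# Values taken from the Ukrainian female-name example in this README.
olena = NameRecord(
    original="Олена",
    polish_transliteration="Olena",
    alt_polish_trans=["Olona"],
    polish_equivalent="Helena",
    english_equivalent="Helen",
    informal_diminutive=["Olenka"],
    other_variants=["Alyona"],
)
print(olena.latin_forms())
```

Collecting all Latin-script forms of a row in one place is convenient when the data is later used to match spellings found in other sources.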

Example File Structure

Each file is formatted as an XLSX table. An example structure is shown below:

Female Names (Ukrainian Example)

| Original Name | Polish Transliteration | Alt Polish Trans | Polish Equivalent | English Equivalent | Informal Diminutive | Other Variants |
|---|---|---|---|---|---|---|
| Олена | Olena | Olona | Helena | Helen | Olenka | Alyona |
| Наталія | Natalia | Nataliya | Natalia | Natalie | Nata | Natasha |

Surnames (Ukrainian Example)

| Original Name | Polish Transliteration | Alt Polish Trans | Polish Equivalent | English Equivalent | Informal Diminutive | Other Variants |
|---|---|---|---|---|---|---|
| Шевченко | Szewczenko | Szevczenko | Szewczyk | Shevchenko | | Chevtchenko |
| Мельник | Melnyk | Melnik | Mielnik | Melnyk | | Melnykov |
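
To illustrate how such tables might support typo detection, the sketch below builds a lookup from every listed Latin-script form to its Polish transliteration and falls back to fuzzy matching for unseen spellings. It is an assumption-laden example, not the repository's method: the helper names `build_index` and `resolve`, the 0.8 similarity cutoff, and the use of Python's standard `difflib` module are all choices made here for illustration. The rows are copied from the Ukrainian examples above.

```python
import difflib

# Canonical Polish transliteration -> other Latin-script forms from the same row.
ROWS = {
    "Olena": ["Olona", "Helena", "Helen", "Olenka", "Alyona"],
    "Natalia": ["Nataliya", "Natalie", "Nata", "Natasha"],
    "Szewczenko": ["Szevczenko", "Szewczyk", "Shevchenko", "Chevtchenko"],
    "Melnyk": ["Melnik", "Mielnik", "Melnykov"],
}


def build_index(rows):
    """Map each lowercased form (canonical or variant) to its canonical form."""
    index = {}
    for canonical, variants in rows.items():
        for form in [canonical, *variants]:
            index[form.lower()] = canonical
    return index


def resolve(name, index, cutoff=0.8):
    """Return the canonical form for `name`, or None if nothing is close enough.

    Exact (case-insensitive) hits are tried first; otherwise the single most
    similar known form is accepted if its similarity ratio reaches `cutoff`.
    """
    key = name.strip().lower()
    if key in index:
        return index[key]
    close = difflib.get_close_matches(key, list(index), n=1, cutoff=cutoff)
    return index[close[0]] if close else None


index = build_index(ROWS)
print(resolve("Shevchenko", index))  # a listed variant
print(resolve("Szewcenko", index))   # a misspelling not listed in the table
print(resolve("Kowalski", index))    # a name absent from the sample rows
```

The cutoff trades recall for precision: a lower value catches more misspellings but also risks merging distinct names, so it would need tuning against the full datasets.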
