Package 'kgp'

Title: 1000 Genomes Project Metadata
Description: Metadata about populations and data about samples from the 1000 Genomes Project, including the 2,504 samples sequenced for the Phase 3 release and the expanded collection of 3,202 samples with 602 additional trios. The data is described in Auton et al. (2015) <doi:10.1038/nature15393> and Byrska-Bishop et al. (2022) <doi:10.1016/j.cell.2022.08.004>, and raw data is available at <http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/>. See Turner (2022) <doi:10.48550/arXiv.2210.00539> for more details.
Authors: Stephen Turner [aut, cre]
Maintainer: Stephen Turner
License: Apache License (>= 2)
Version: 1.1.1
Built: 2025-02-15 03:08:39 UTC
Source: https://github.com/stephenturner/kgp

Help Index


1000 Genomes, SGDP, HGDP, and GGVP metadata

Description

Population metadata for 212 populations drawn from four datasets: the 1000 Genomes Project (kgp), the Simons Genome Diversity Project (sgdp), the Human Genome Diversity Project (hgdp), and the Gambian Genome Variation Project (ggvp).

Usage

allmeta

Format

A tibble with 212 rows and 8 columns:

pop

Short population code

reg

Short region code

population

Long population description

region

Long region description

regcolor

Color for plotting this region on a map

lat

Population latitude

lng

Population longitude

dataset

Dataset the population belongs to (kgp = 1000 Genomes Project; ggvp = Gambian Genome Variation Project; hgdp = Human Genome Diversity Project; sgdp = Simons Genome Diversity Project).
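
As a minimal sketch of how the dataset column might be used, the following counts populations per dataset and subsets the table to the 1000 Genomes Project rows. It assumes the kgp package is installed and that the dataset values are stored as the lowercase codes listed above; the object names pops_per_dataset and kgp_pops are illustrative only.

```r
# Assumes the kgp package is installed, e.g. via install.packages("kgp")
library(kgp)

# Count how many populations each dataset contributes
pops_per_dataset <- table(allmeta$dataset)
print(pops_per_dataset)

# Keep only the 1000 Genomes Project populations
# (assumes the code is stored as the lowercase string "kgp")
kgp_pops <- allmeta[which(allmeta$dataset == "kgp"), ]
print(kgp_pops[, c("pop", "population", "region")])
```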

References

Byrska-Bishop, Marta, et al. "High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios." Cell 185.18 (2022): 3426-3440.

1000 Genomes Project Consortium. "A global reference for human genetic variation." Nature 526.7571 (2015): 68.

Clarke, Laura, et al. "The international Genome sample resource (IGSR): A worldwide collection of genome variation incorporating the 1000 Genomes Project data." Nucleic acids research 45.D1 (2017): D854-D859.

License information is available at https://github.com/igsr/1000Genomes_data_indexes/blob/master/LICENSE. The 1000 Genomes data is made publicly available according to the Fort Lauderdale Agreement (https://www.genome.gov/Pages/Research/WellcomeReport0303.pdf).


1000 Genomes Project sample data (Phase 3)

Description

Sample, pedigree, and population data for 2,504 samples in the Phase 3 release of the 1000 Genomes Project data.

Usage

kgp3

Format

A tibble with 2504 rows and 10 columns:

fid

Family ID

id

Individual ID

pid

Paternal ID

mid

Maternal ID

sex

Sex (1=Male, 2=Female)

sexf

Sex as a factor

pop

Short population code

reg

Short region code

population

Long population description

region

Long region description
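
The columns above lend themselves to quick tabulations. The sketch below, which assumes the kgp package is installed, cross-tabulates the short region code against the sex factor and ranks populations by sample count. The names region_by_sex and samples_per_pop are illustrative, and no particular counts are implied.

```r
# Assumes the kgp package is installed
library(kgp)

# Cross-tabulate short region code against sex (as a factor)
region_by_sex <- table(kgp3$reg, kgp3$sexf)
print(region_by_sex)

# Number of samples per population, largest first
samples_per_pop <- sort(table(kgp3$pop), decreasing = TRUE)
print(head(samples_per_pop))
```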

Source

http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/

References

Byrska-Bishop, Marta, et al. "High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios." Cell 185.18 (2022): 3426-3440.

1000 Genomes Project Consortium. "A global reference for human genetic variation." Nature 526.7571 (2015): 68.

License information is available at https://github.com/igsr/1000Genomes_data_indexes/blob/master/LICENSE. The 1000 Genomes data is made publicly available according to the Fort Lauderdale Agreement (https://www.genome.gov/Pages/Research/WellcomeReport0303.pdf).


1000 Genomes Project sample data (Expanded)

Description

Sample, pedigree, and population data for 3,202 samples in the expanded 1000 Genomes Project data.

Usage

kgpe

Format

A tibble with 3202 rows and 11 columns:

fid

Family ID

id

Individual ID

pid

Paternal ID

mid

Maternal ID

sex

Sex (1=Male, 2=Female)

sexf

Sex as a factor

pop

Short population code

reg

Short region code

population

Long population description

region

Long region description

phase3

Logical; indicates whether this sample is included in the Phase 3 release data
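
One possible use of the phase3 column is to separate samples that were already in the Phase 3 release from those added in the expanded collection. This is a sketch that assumes the kgp package is installed; phase3_samples and added_samples are illustrative names.

```r
# Assumes the kgp package is installed
library(kgp)

# Tabulate the Phase 3 membership flag, showing missing values if any exist
print(table(kgpe$phase3, useNA = "ifany"))

# Split the expanded collection by the flag; which() drops any NA entries
phase3_samples <- kgpe[which(kgpe$phase3), ]
added_samples  <- kgpe[which(!kgpe$phase3), ]

# Row counts of the two subsets
print(c(phase3 = nrow(phase3_samples), added = nrow(added_samples)))
```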

Source

http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/

References

Byrska-Bishop, Marta, et al. "High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios." Cell 185.18 (2022): 3426-3440.

1000 Genomes Project Consortium. "A global reference for human genetic variation." Nature 526.7571 (2015): 68.

License information is available at https://github.com/igsr/1000Genomes_data_indexes/blob/master/LICENSE. The 1000 Genomes data is made publicly available according to the Fort Lauderdale Agreement (https://www.genome.gov/Pages/Research/WellcomeReport0303.pdf).


1000 Genomes Project population metadata

Description

Population metadata for the 26 populations of the 1000 Genomes Project, which span five continental regions.

Usage

kgpmeta

Format

A tibble with 26 rows and 7 columns:

pop

Short population code

reg

Short region code

population

Long population description

region

Long region description

regcolor

Color for plotting this region on a map

lat

Population latitude

lng

Population longitude
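
Because kgpmeta shares the pop column with the sample tables, the plotting color and coordinates can be attached to each sample with a join. The sketch below uses base R merge() and a base graphics scatter plot rather than a true map. It assumes the kgp package is installed and that regcolor holds color values that base graphics accepts (for example hex strings); kgp3_geo is an illustrative name.

```r
# Assumes the kgp package is installed
library(kgp)

# Attach region color and coordinates to every Phase 3 sample by population code
kgp3_geo <- merge(kgp3, kgpmeta[, c("pop", "regcolor", "lat", "lng")], by = "pop")
print(head(kgp3_geo))

# Plot population locations, colored by region
# (assumes regcolor contains valid R color specifications)
plot(kgpmeta$lng, kgpmeta$lat,
     col = kgpmeta$regcolor, pch = 19,
     xlab = "Longitude", ylab = "Latitude")
```

A join through dplyr would work equally well; merge() is used here only to avoid an extra dependency.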

Source

http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/

References

Byrska-Bishop, Marta, et al. "High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios." Cell 185.18 (2022): 3426-3440.

1000 Genomes Project Consortium. "A global reference for human genetic variation." Nature 526.7571 (2015): 68.

License information is available at https://github.com/igsr/1000Genomes_data_indexes/blob/master/LICENSE. The 1000 Genomes data is made publicly available according to the Fort Lauderdale Agreement (https://www.genome.gov/Pages/Research/WellcomeReport0303.pdf).