AutismKB

Data Collection Document


  • Data Collection of syndromic autism genes
  • This dataset contains 99 genes and comes from Catalina Betancur's review published in Brain Research in 2011 (Betancur, 2011). It covers the genetic and genomic disorders in which ASDs have been described as one of the possible manifestations.

  • Data Collection of non-syndromic autism genes
  • Genes, CNVs and linkage regions associated with autism were identified through literature searches and then curated. Six categories of studies were included in our collection: genome-wide association studies, expression profiling, genome-wide CNV studies, linkage analysis, low-scale genetic association studies and other low-scale gene studies. Representative meta-data on key clinical and demographic characteristics were collected.
    • Flow Chart of data collection
    • We searched PubMed to retrieve the relevant literature. Figure 1 shows the flow chart of the data collection.
      Initial Search for Association Studies: "autism and associat*" (2740 hits)
      Initial Search for Other Gene Studies: "autism AND (gene OR microarray OR proteomics)" (1368 hits)
      Initial Search for CNV and Linkage Studies: "autism AND (CNV OR copy number variation OR microarray* OR microdel* OR microdup* OR rearrange* OR (genome-wide AND (linkage OR associa* OR scan)))"

      Figure 1: Flow Chart of Data Collection
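
The initial searches in the flow chart can be reproduced programmatically. The following is a minimal sketch, assuming the public NCBI E-utilities ESearch endpoint is used; the document does not say how the searches were actually run, and the names `QUERIES` and `build_esearch_url` are illustrative only. The code only builds the request URLs and does not contact the network. Hit counts returned today will differ from those in the flow chart because PubMed has grown since the original search.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (assumed here; not named in the document).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Query strings copied verbatim from the flow chart.
QUERIES = {
    "association": "autism and associat*",
    "other_gene": "autism AND (gene OR microarray OR proteomics)",
    "cnv_linkage": (
        "autism AND (CNV OR copy number variation OR microarray* OR microdel* "
        "OR microdup* OR rearrange* OR (genome-wide AND (linkage OR associa* OR scan)))"
    ),
}


def build_esearch_url(term: str, retmax: int = 0) -> str:
    """Return an ESearch URL for a PubMed query.

    With retmax=0 the response carries the hit count but no PubMed IDs.
    """
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    for name, term in QUERIES.items():
        print(name, build_esearch_url(term))
```

Fetching each URL (for example with `urllib.request.urlopen`) returns JSON whose `esearchresult` object contains a `count` field with the number of hits.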

    • Collection of meta-data
      Information about key clinical and demographic characteristics of each study was collected.
      • Genome-wide association studies (GWAS)
      • Categories: Related Features
        Publication:
          First author
          Year of publication
          PubMed ID
          Date of inclusion
        Population:
          Ancestral background, country of origin
          Sample and control inclusion and exclusion criteria
          Number of cases and controls with gender ratio
          Age at examination
          Diagnosis criteria
        Type:
          "GWAS" (genome-wide association study)
          "Chromosome #" (chromosome-wide association study)
          "cSNP" (coding-region SNP)
          "pooled" (large-scale association study based on pooled genotyping)
          "Other" (other large-scale association study)
        Stage:
          Discovery/Replication
        Study Design:
          Family-based or case-control
          Methods/Platform
        Results:
          Number of polymorphisms
          Related genes
          P value and combined P value
        Genotype & allele distribution:
          Polymorphism (dbSNP ID or most commonly used name)
          Genotype distribution (allele frequency and genotype frequency)
        Other autism related features:
          IQ
          Autism-specific endophenotype

        Table 1: Collected features of GWAS studies
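
The features in Table 1 map naturally onto a structured record. The sketch below shows one possible representation, assuming Python dataclasses; the class names, field names and validation rules are hypothetical and do not reflect the actual AutismKB schema. Only a subset of the Table 1 features is modeled, and all values in the usage example are placeholders, not real study data.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Allowed values taken from the "Stage" and "Study Design" rows of Table 1.
STAGES = ("Discovery", "Replication")
DESIGNS = ("Family-based", "Case-control")


@dataclass
class GwasFinding:
    """One reported polymorphism (hypothetical structure)."""
    polymorphism: str                      # dbSNP ID or most commonly used name
    related_genes: List[str]
    p_value: Optional[float] = None
    combined_p_value: Optional[float] = None


@dataclass
class GwasStudy:
    """Meta-data for one GWAS publication (hypothetical structure)."""
    first_author: str
    year: int
    pubmed_id: str
    ancestry: str
    country: str
    n_cases: int
    n_controls: int
    study_type: str                        # e.g. "GWAS", "cSNP", "pooled", "Other"
    stage: str
    design: str
    platform: str
    findings: List[GwasFinding] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject values outside the controlled vocabularies of Table 1.
        if self.stage not in STAGES:
            raise ValueError(f"stage must be one of {STAGES}")
        if self.design not in DESIGNS:
            raise ValueError(f"design must be one of {DESIGNS}")


if __name__ == "__main__":
    # Placeholder values for illustration only.
    study = GwasStudy(
        first_author="Doe", year=2010, pubmed_id="00000000",
        ancestry="European", country="Example", n_cases=100, n_controls=100,
        study_type="GWAS", stage="Discovery", design="Case-control",
        platform="Example array",
    )
    study.findings.append(GwasFinding("rs0000000", ["GENE_A"], p_value=1e-6))
    print(study)
```

Keeping the findings in a nested list lets a single publication carry several polymorphisms without repeating the publication and population fields.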

      • Expression Profiling
      • Categories: Related Features
        Publication:
          First author
          Year of publication
          PubMed ID
          Date of inclusion
        Population:
          Ancestral background, country of origin
          Sample and control inclusion and exclusion criteria
          Number of cases and controls with gender ratio
          Age at examination
          Diagnosis criteria
          Tissue used
        Study Design:
          Methods/Platform
          Statistical methods
          GEO ID
        Results:
          Reported gene name
          Reported probes/ESTs/RefSeq_ID
          Fold change; up- or down-regulated; P value
        Other autism related features:
          IQ
          Autism-specific endophenotype

        Table 2: Collected features of Microarray studies

        Categories: Related Features
        Publication:
          First author
          Year of publication
          PubMed ID
          Date of inclusion
        Population:
          Ancestral background, country of origin
          Sample and control inclusion and exclusion criteria
          Number of cases and controls with gender ratio
          Age at examination
          Diagnosis criteria
          Tissue used
        Study Design:
          Methods/Platform
        Results:
          Reported gene name

        Table 3: Collected features of proteomics studies

      • Genome-wide CNV studies
      • Categories: Related Features
        Publication:
          First author
          Year of publication
          PubMed ID
          Date of inclusion
        Population:
          Ancestral background, country of origin
          Sample and control inclusion and exclusion criteria
          Number of cases and controls with gender ratio
          Age at examination
          Diagnosis criteria
        Study Design:
          Family-based or case-control
          Methods/Platform
        Results:
          CNV regions (chromosome, start and end)
          Band
          Gain/Loss
        Evidence Type:
          CNVs only present in patients
          De novo CNVs
          Overlapping/recurrent CNVs
          CNVs overlapping with ACRD
          CNVs not present in controls
          Significantly enriched CNVs
          Others

        Table 4: Collected features of CNV studies
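
Table 4 records each CNV as a region (chromosome, start and end). Regions stored this way support simple interval queries, such as listing the genes that overlap a CNV. The Quality Score section states that genes in CNVs were retrieved from UCSC but does not describe the procedure, so the following is only a minimal sketch of an overlap test. The `Region` and `Gene` types, the function name and all coordinates are hypothetical placeholders.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int   # inclusive
    end: int     # inclusive


class Gene(NamedTuple):
    symbol: str
    chrom: str
    start: int
    end: int


def genes_in_region(region: Region, genes: List[Gene]) -> List[str]:
    """Return symbols of genes that overlap the region by at least one base."""
    return [
        g.symbol
        for g in genes
        if g.chrom == region.chrom
        and g.start <= region.end
        and g.end >= region.start
    ]


if __name__ == "__main__":
    # Placeholder annotation; real coordinates would come from a gene track.
    annotation = [
        Gene("GENE_A", "chr1", 100, 200),
        Gene("GENE_B", "chr1", 500, 900),
        Gene("GENE_C", "chr2", 150, 250),
    ]
    cnv = Region("chr1", 180, 600)
    print(genes_in_region(cnv, annotation))
```

A linear scan is enough for illustration; with thousands of CNVs and a full gene annotation, an interval tree or a sorted sweep would be the usual choice.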

      • Genome-wide Linkage studies
      • Categories: Related Features
        Publication:
          First author
          Year of publication
          PubMed ID
          Date of inclusion
        Population:
          Ancestral background, country of origin
          Sample and control inclusion and exclusion criteria
          Number of cases and controls with gender ratio
          Age at examination
          Diagnosis criteria
        Study Design:
          Family-based or case-control
          Methods/Platform
        Results:
          Linkage regions (chromosome, start and end)
          Band
          Marker
          LOD, NPL or P value

        Table 5: Collected features of Linkage studies

      • Low-scale association studies
      • Categories: Related Features
        Publication:
          First author
          Year of publication
          PubMed ID
          Date of inclusion
        Population:
          Ancestral background, country of origin
          Sample and control inclusion and exclusion criteria
          Number of cases and controls with gender ratio
          Age at examination
          Diagnosis criteria
        Study Design:
          Family-based or case-control
          Methods/Platform
        Results:
          Reported gene name
          Reported study results (positive or negative)
          P value
        Genotype & allele distribution:
          Polymorphism (dbSNP ID or most commonly used name)
          Genotype distribution (allele frequency and genotype frequency)
        Other autism related features:
          IQ
          Autism-specific endophenotype

        Table 6: Collected features of low scale association studies

      • Other low-scale studies
      • Categories: Related Features
        Publication:
          First author
          Year of publication
          PubMed ID
          Date of inclusion
        Population:
          Ancestral background, country of origin
          Number of cases and controls with gender ratio
          Diagnosis criteria
          Tissue used
          Autism-specific endophenotype
        Study Design:
          Methods/Platform
        Results:
          Reported gene name
          Description of the gene with autism
          Reported study results (positive or negative)
        Evidence Type:
          Genetics; RNA level function; protein level function

        Table 7: Collected features of other low scale studies

    • Data Statistics
    • Categories of studies: Number of Genes
      Syndromic Autism Genes: 99
      Non-syndromic Autism Genes:
        GWAS: 132
        Expression studies: 1664
        Low Scale Association studies: 163
        Other Low Scale Studies: 308
        Total: 2135
      Total: 2193
      CNVs: 4964 CNVs
      Linkage regions: 158 Linkage Regions

      Table 7: Data statistics of the current collection

  • Quality Score
  • We designed a scoring system to rate the genes in each dataset. All the genes in the CNVs or linkage regions were retrieved from UCSC. In total, 12,180 genes were collected in our final gene lists. Table 8 shows how the scoring system works.
    • Function of Quality Score for different categories
    • Experimental Methods: Quality Score of the genes
      Low scale Association studies:
        Score 1: one positive study (P<=0.05)
        Score 2: two or more positive studies and P>0.001
        Score 3: two or more positive studies and P<=0.001
      GWAS:
        Score 1: one positive study (P<=1e-5)
        Score 2: two positive studies and P>1e-7
        Score 3: two positive studies and P<=1e-7
      Expression studies:
        Score 1: one positive study
        Score 2: two positive studies
        Score 3: three or more positive studies
      Single gene studies:
        Score 1: one positive study
        Score 2: two positive studies
        Score 3: three or more positive studies
      Score of CNVs related genes:
        Score 1: 1-3 positive studies
        Score 2: 4-8 positive studies
        Score 3: >=9 positive studies
      Score of Linkage regions related genes:
        Score 1: 1-3 positive studies
        Score 2: 4-8 positive studies
        Score 3: >=9 positive studies

      Table 8: Function of the score system
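
The rules in Table 8 can be expressed as small functions. The sketch below is one possible reading, not the authors' implementation, and it rests on several assumptions that the table does not settle: a study counts as positive when its P value is at or below the Score 1 cutoff; the "P" in the Score 2 and Score 3 rules is taken to be the smallest P value among the positive studies; "two positive studies" for GWAS is read as "two or more"; and a score of 0 is returned when there is no positive study. All function names are hypothetical.

```python
from typing import Sequence


def count_score(n_positive: int, score2_min: int, score3_min: int) -> int:
    """Score from the number of positive studies alone (0 if there are none)."""
    if n_positive <= 0:
        return 0
    if n_positive >= score3_min:
        return 3
    if n_positive >= score2_min:
        return 2
    return 1


def p_value_score(p_values: Sequence[float], positive_cutoff: float,
                  strong_cutoff: float) -> int:
    """Score from the P values of the studies reporting a gene.

    Assumption: the smallest P value decides between Score 2 and Score 3.
    """
    positives = [p for p in p_values if p <= positive_cutoff]
    if not positives:
        return 0
    if len(positives) == 1:
        return 1
    return 3 if min(positives) <= strong_cutoff else 2


def low_scale_association_score(p_values: Sequence[float]) -> int:
    return p_value_score(p_values, positive_cutoff=0.05, strong_cutoff=0.001)


def gwas_score(p_values: Sequence[float]) -> int:
    return p_value_score(p_values, positive_cutoff=1e-5, strong_cutoff=1e-7)


def expression_or_single_gene_score(n_positive: int) -> int:
    # Score 1: one study, Score 2: two studies, Score 3: three or more.
    return count_score(n_positive, score2_min=2, score3_min=3)


def cnv_or_linkage_score(n_positive: int) -> int:
    # Score 1: 1-3 studies, Score 2: 4-8 studies, Score 3: nine or more.
    return count_score(n_positive, score2_min=4, score3_min=9)


if __name__ == "__main__":
    print(low_scale_association_score([0.03, 0.0005]))
    print(gwas_score([1e-6, 2e-6]))
    print(cnv_or_linkage_score(5))
```

Under these assumptions a gene with a single positive study always receives Score 1, however small its P value, which follows the wording of the table.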

    • Score Distribution of different categories
    • The quality score distribution of the six categories is listed below:
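      Experimental Methods: Score, Number of genes
      Low scale Association studies:
        Score 1: 128
        Score 2: 23
        Score 3: 12
      GWAS:
        Score 1: 81
        Score 2: 46
        Score 3: 5
      Expression studies:
        Score 1: 1320
        Score 2: 285
        Score 3: 59
      Single gene studies:
        Score 1: 241
        Score 2: 37
        Score 3: 30
      Score of CNVs related genes:
        Score 1: 1086
        Score 2: 34
        Score 3: 19
      Score of Linkage regions related genes:
        Score 1: 535
        Score 2: 43
        Score 3: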
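
As a consistency check, the per-score gene counts for the first four categories can be summed and compared with the gene counts in the data statistics table. The sketch below assumes each row of the distribution is read as a score (1, 2 or 3) followed by a gene count, and that "Single gene studies" corresponds to "Other Low Scale Studies" in the data statistics table; both are interpretations, not statements from the source. The CNV and linkage categories are left out because the data statistics table reports them as region counts rather than gene counts, and the last linkage row is incomplete.

```python
# Genes per score, read from the score distribution listing.
SCORE_COUNTS = {
    "Low scale Association studies": {1: 128, 2: 23, 3: 12},
    "GWAS": {1: 81, 2: 46, 3: 5},
    "Expression studies": {1: 1320, 2: 285, 3: 59},
    "Single gene studies": {1: 241, 2: 37, 3: 30},
}

# Gene counts from the data statistics table.
GENE_TOTALS = {
    "Low scale Association studies": 163,
    "GWAS": 132,
    "Expression studies": 1664,
    "Single gene studies": 308,  # listed there as "Other Low Scale Studies"
}


def check_totals() -> dict:
    """Map each category to True if its score counts sum to the reported total."""
    return {
        method: sum(counts.values()) == GENE_TOTALS[method]
        for method, counts in SCORE_COUNTS.items()
    }


if __name__ == "__main__":
    for method, ok in check_totals().items():
        print(method, "consistent" if ok else "mismatch")
```

All four categories agree with the data statistics table under this reading.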