Data

As part of the SMARTEn project, we performed several DNA sequencing experiments. Because we expect the resulting data to be useful to anyone working with MinION, we describe below how the data was generated and how it can be downloaded.

Input

We performed MinION sequencing of mock communities consisting of five algal species and two cyanobacterial species. The cultures were purchased from the UTEX Culture Collection of Algae at the University of Texas at Austin. The table below lists the NCBI and UTEX identifiers for each species.

Name                       Type           NCBI Assembly ID  UTEX ID       NCBI Tax ID
Chlamydomonas reinhardtii  Green algae    GCA_000002595.3   UTEX 2243     3055
Synechocystis sp.          Cyanobacteria  GCA_000478825.2   UTEX B 2470   1147
Anabaena flos-aquae        Cyanobacteria  GCA_000521175.1   UTEX LB 2558  284502
Chlorella vulgaris         Green algae    GCA_001021125.1   UTEX 2714     3077
Scenedesmus obliquus       Green algae    GCA_002149895.1   UTEX 393      3088
Nannochloropsis oculata    Algae          GCA_004335455.1   UTEX 2164     1259847
Euglena gracilis           Algae          GCA_900893395.1   UTEX B 367    3039

DNA Extraction and Mock Communities

DNA was extracted using the FastDNA Spin Kit for Soil (MP Biomedicals, Solon, OH) following the manufacturer's instructions and eluted in 50 µL of warmed Elution Buffer. The extracted DNA was quantified using a Qubit 3.0 fluorometer (Invitrogen) and mixed in differing proportions to create three distinct mock communities. Mock community compositions were calculated from the relative DNA mass contributed by each organism. The following mock communities were obtained:

Sequencing

The sequencing libraries were prepared using the Ligation Sequencing Kit SQK-LSK109 (Oxford Nanopore Technologies, Oxford, United Kingdom) and the corresponding Genomic DNA by Ligation protocol (version GDE_9063_v109_revX_14Aug2019). As recommended, at least 1 µg of genomic DNA was used for library preparation, and no additional fragmentation steps were performed.

Prepared sequencing libraries were loaded onto a FLO-MIN106D flow cell (R9.4.1 chemistry) and sequenced on a MIN-101B MinION sequencing device. Sequencing used a 48-hour run script with active channel selection enabled. The flow cell was refueled at approximately 22 hours by loading 250 µL of Flush Buffer (FB).

Raw MinION reads were basecalled using Guppy 5.0.7+2332e8d.

Data Summary

Raw FAST5 and Basecalled FASTQ Reads

All data is currently available through our Box cloud service at the following link: https://buffalo.box.com/s/7p5uu1j1ldekkvzw0qqm0bb8gep8xm2v
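After downloading, a quick sanity check of the basecalled reads can be useful. The following is a minimal sketch, not part of the project's own tooling, that reports the read count, total bases, mean read length, and read length N50 for a single FASTQ file. It assumes standard four-line FASTQ records (plain or gzip-compressed); the function names are illustrative.

```python
import gzip
from pathlib import Path


def read_lengths(fastq_path):
    """Yield the length of every read in a four-line FASTQ file (plain or .gz)."""
    path = Path(fastq_path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # In a four-line record the sequence is the second line.
            if line_number % 4 == 1:
                yield len(line.strip())


def n50(lengths):
    """Return the length L such that reads of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # no reads


def summarize(fastq_path):
    """Collect basic read statistics for one FASTQ file."""
    lengths = list(read_lengths(fastq_path))
    return {
        "reads": len(lengths),
        "bases": sum(lengths),
        "mean_length": sum(lengths) / len(lengths) if lengths else 0.0,
        "n50": n50(lengths),
    }
```

For example, summarize("reads.fastq.gz") returns a dictionary of these four values, where reads.fastq.gz is a placeholder for one of the downloaded files.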

Reference Database

To test our metagenomic classifier, we used a reference database consisting of the key algal and cyanobacterial reference genomes combined with an additional set of bacterial genomes. All reference sequences came from NCBI RefSeq. The following collection of files can be used to reconstruct our reference database: https://buffalo.box.com/s/0nhldjstw3cfxvg2elh4yieinc5z3416
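Many classifiers expect the reference sequences as a single FASTA file. The sketch below shows one way to combine a directory of genome files into such a file. It assumes the downloaded collection has been unpacked into a directory of FASTA files (optionally gzip-compressed) with common extensions. That layout, the function name, and the default extensions are assumptions for illustration, not a description of the actual archive contents.

```python
import gzip
from pathlib import Path


def merge_fasta(input_dir, output_path, suffixes=(".fa", ".fna", ".fasta")):
    """Concatenate every FASTA file in input_dir into output_path.

    Returns the number of files merged. Files are processed in sorted
    name order so the output is reproducible.
    """
    output = Path(output_path).resolve()
    merged = 0
    with open(output, "w") as out:
        for path in sorted(Path(input_dir).iterdir()):
            if path.resolve() == output:
                continue  # never read the file we are writing
            name = path.name.lower()
            is_gz = name.endswith(".gz")
            stem = name[:-3] if is_gz else name
            if not stem.endswith(suffixes):
                continue  # skip anything that does not look like FASTA
            opener = gzip.open if is_gz else open
            with opener(path, "rt") as handle:
                for line in handle:
                    # Guarantee a trailing newline so the next header
                    # is not glued onto the previous sequence.
                    out.write(line if line.endswith("\n") else line + "\n")
            merged += 1
    return merged
```

A call such as merge_fasta("reference_genomes", "reference.fasta") would then produce a single combined file. Both names are placeholders.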