The open-access, open-data journal GigaScience (published by BGI and BioMed Central) today announces the publication of an article on the genome sequencing of 3000 rice strains, together with the release of the entire dataset in citable form in the journal's affiliated open-access database, GigaDB. The publication and release of this enormous dataset, which quadruples the amount of publicly available rice sequence data, coincides with World Hunger Day to highlight one of the project's primary goals: developing resources that will help improve global food security, especially in the poorest areas of the world. This work completes stage one of the 3000 Rice Genomes Project, a collaboration between the Chinese Academy of Agricultural Sciences (CAAS), the International Rice Research Institute (IRRI) and BGI, funded by the Bill and Melinda Gates Foundation and the Chinese Ministry of Science and Technology.
With more than one-eighth of the world's population living in extreme hunger and poverty, and an ever-increasing world population (estimated to reach 9.6 billion by 2050), there is a huge need for new resources that improve crop yield, reduce the environmental impact of agriculture, and support the development of high-yielding, nutritious food crops that can grow successfully in environments stressed by drought, pests, disease or poor soil. Although rice research has advanced greatly since the first high-quality rice genome sequence was completed in 2005, breeding practices, which are central to producing improved and better-adapted rice strains, have changed little.
The 3000 Rice Genomes Project is a major step toward addressing these challenges. It creates and releases an extensive body of genetic information that can ultimately be applied to intelligent breeding. Such breeding exploits the natural variation between plant strains, together with knowledge of the genetic mechanisms underlying their traits, to select parent strains that are more likely to produce hybrids well suited to growing in different environments.
Dr. Zhikang Li, the Project Director at CAAS, stated that the 3000 Rice Genomes Project is part of an ongoing effort to provide resources specifically for poverty-stricken farmers in Africa and Asia, aiming to reach at least 20 million rice farmers in 16 target countries (8 African and 8 Asian). "Rice is the staple food for most Asian people, and has increasing consumption in Africa," said Dr. Li. "With decreasing resources (water and land), food security is, and will be, the most challenging issue in these countries, both currently and in the future. As a scientist in rice genetics, breeding and genomics, it would be a dream to help to solve this problem."
Dr. Jun Wang, Director of BGI, added: "The population boom and worsening climate crisis have presented big challenges on global food shortage and safety. BGI is dedicated to applying genomics technologies to make a fast, controllable and highly efficient molecular breeding model possible. This opens a new way to carry out agricultural breeding. With the joined forces with CAAS, IRRI and Gates Foundation, we have made a step forward in big-data-based crop research and digitalized breeding. We believe every step will get us closer to the ultimate goal of improving the wellbeing of human race."
According to IRRI Director General Dr. Robert Zeigler, "access to 3,000 genomes of rice sequence data will tremendously accelerate the ability of breeding programs to overcome key hurdles mankind faces in the near future." This collaborative project, Zeigler added, "will add an immense amount of knowledge to rice genetics, and enable detailed analysis by the global research community to ultimately benefit the poorest farmers who grow rice under the most difficult conditions."
Drs. Wang and Zeigler, together with Dr. Jia-Yang Li, President of CAAS, provide further information on the project's goals in an accompanying commentary in GigaScience.
To reach their goals, the three institutes have not only released 13.4 terabytes of data but have also collected seeds from each strain, which are available in the International Rice Genebank Collection housed at IRRI. Banked seeds are essential for making full use of these now genetically defined strains to develop and sustain the hybrid strains best suited to different environments. One component is still needed to reach this goal, however: information that allows researchers and breeders to link the genetic information (genotype) of each strain directly to its physical traits (phenotype). This requires careful assessment and curation of each rice strain for agriculturally important traits, which can then be linked to genetic markers in the newly available genome sequences.
Current breeding practices, which have remained essentially unchanged since the development of agriculture, typically use visible physical traits to guide the selection of strains for crossbreeding, in the hope that the offspring will combine and improve on the desired traits, such as resistance to drought, pests and disease, higher productivity and better nutritional value. However, the underlying genetic makeup often confounds breeders' expectations, because unknown genetic interactions can block or modify the development of the selected physical characteristics when two strains are crossed. As a result, trial and error and multiple successive rounds of breeding are often required.
Full knowledge of a plant's genetic makeup allows researchers to identify genetic markers related to specific physical traits and to better understand how genetic interactions affect plant phenotypes. With this information, breeders can make more informed choices when selecting strains, leading to faster and more accurate development of rice strains that are better suited to the agricultural environments of poor and environmentally stressed regions.
Linking genotypes to phenotypes in this way requires a great deal of care and labor. Releasing these data, and making the genetic information freely available to plant breeders and scientists worldwide, will therefore greatly aid efforts to define genotype/phenotype relationships, and will also provide an extensive resource for improving our understanding of plant biology.
Publication in GigaScience includes storage of the relevant associated data in the journal's affiliated database, GigaDB, where every dataset is assigned a digital object identifier (DOI). This makes it possible to cite, find and track data in the standard scientific literature, which gives researchers a strong incentive to release expensive and labor-intensive datasets for community use more rapidly. In addition to hosting the terabytes of supporting data in GigaDB, and to make the data as widely available as possible, the project has also submitted its sequence reads to the SRA repository under accession PRJEB6180.
Further Reading
- The 3,000 Rice Genomes Project. The 3,000 Rice Genomes Project. GigaScience 2014, 3:7. http://dx.doi.org/10.1186/2047-217X-3-7
- Li, J-Y., Wang, J. and Zeigler, R. The 3000 Rice Genomes Project: new opportunities and challenges for future rice research. GigaScience 2014, 3:8. http://dx.doi.org/10.1186/2047-217X-3-8
- The 3000 Rice Genomes Project (2014): The Rice 3000 Genomes Project Data. GigaScience Database. http://dx.doi.org/10.5524/200001
- International Rice Genebank Collection: http://irri.org/our-work/research/genetic-diversity/international-rice-genebank
- About World Hunger Day: http://www.thehungerproject.co.uk/getinvolved/worldhungerday/world-hunger-day-purpose/
About GigaScience
GigaScience is co-published by BGI, the world's largest genomics organization, and BioMed Central, the world's first open-access publisher. The journal covers research that uses or produces 'big data' from the full spectrum of the life sciences. It also serves as a forum for discussing the difficulties and unique needs of handling large-scale data from all areas of the life sciences. The journal has a novel publication format that integrates manuscript publication with complete data hosting and the incorporation of analysis tools. To encourage transparent reporting of scientific research and to enable future access and analysis, GigaScience requires that all supporting data and source code be made available in the GigaScience database, GigaDB, as well as in their publicly available repositories. GigaScience can provide users with access to associated online tools and workflows, and includes an integrated data analysis platform, GigaGalaxy, maximizing the potential utility and re-use of data. Follow us on Twitter (@GigaScience) and Facebook, and keep up to date via our blog.
Journal
GigaScience