Generating accurate and valid scientific results

Machine learning (ML) allows software applications to become more accurate at predicting outcomes as they are given more data. ML involves building algorithms that learn from existing data to predict an output value within an acceptable range.

CSISA generates numerous scientific datasets on crop production practices and agronomic field trials, but regularly drawing valid results from these thousands of observations is a challenge. ML tools can help.

CSISA organized a five-day workshop in Odisha to train CSISA scientists from Bihar, eastern Uttar Pradesh (EUP) and Odisha in applying ML tools, built on the open-source statistical computing and graphics software R, to CSISA’s crop cut and production practice survey datasets.

Each year, CSISA generates data from multi-location adaptive trials, production practice diagnostic surveys and a few other targeted, needs-based surveys in Bangladesh, India and Nepal. These datasets are used to identify the most important yield-attributing factors, information that could help policymakers target and refine recommendations and advisories. ML tools make it possible to draw quick, accurate and valid results from these datasets.

Under the leadership of CSISA-Nepal’s Socioeconomist, Gokul Paudel, participants jointly reviewed production practice survey datasets, cleaned the data, applied relevant analytical tools and generated results.

The group started by reviewing basic statistics and the R software, the rationale behind ML, and algorithms such as classification and regression trees (CART) and random forest models. Using R, participants checked summary statistics and visualized the data in histograms, boxplots, scatter plots and correlation plots. With CART, participants produced tree diagrams that split the data hierarchically on the covariates, ordering them by how strongly they predict a particular outcome. CART showed that sowing date is the most important factor in determining wheat yield in Bihar and EUP, followed by crop establishment method, amount of nitrogen applied and number of irrigations.
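To make the CART idea concrete, here is a minimal sketch in Python rather than R (the workshop's own R code is not reproduced here; in R, CART is commonly fitted with the `rpart` package). Everything in it is an assumption for illustration: the covariate names `sowing_day`, `nitrogen_kg` and `irrigations`, the yield formula and the synthetic plots are hypothetical, and categorical covariates such as crop establishment method are left out. At each node the tree tries every covariate and threshold, keeps the split that most reduces the sum of squared errors (SSE) of yield, and recurses; adding up each covariate's SSE reductions gives a simple importance ranking.

```python
"""Minimal CART-style regression tree: a sketch on hypothetical, synthetic data."""
import random


def sse(values):
    """Sum of squared errors of `values` around their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, target, features):
    """Return (gain, feature, threshold) for the split that most reduces SSE."""
    parent = sse([r[target] for r in rows])
    best = (0.0, None, None)
    for f in features:
        # Candidate thresholds: every observed value except the largest,
        # so both child nodes are always non-empty.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [r[target] for r in rows if r[f] <= t]
            right = [r[target] for r in rows if r[f] > t]
            gain = parent - sse(left) - sse(right)
            if gain > best[0]:
                best = (gain, f, t)
    return best


def grow(rows, target, features, importance, depth=0, max_depth=3, min_rows=10):
    """Recursively split `rows`, adding each split's SSE reduction to `importance`."""
    mean = sum(r[target] for r in rows) / len(rows)
    if depth >= max_depth or len(rows) < min_rows:
        return {"predict": mean}
    gain, f, t = best_split(rows, target, features)
    if f is None:  # no split reduces the error
        return {"predict": mean}
    importance[f] += gain
    left = [r for r in rows if r[f] <= t]
    right = [r for r in rows if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": grow(left, target, features, importance, depth + 1, max_depth, min_rows),
        "right": grow(right, target, features, importance, depth + 1, max_depth, min_rows),
    }


def synthetic_plots(n=200, seed=1):
    """Hypothetical wheat plots in which later sowing lowers yield most strongly."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        sowing_day = rng.randint(0, 40)               # days after an assumed reference date
        nitrogen_kg = rng.choice(range(80, 161, 10))  # kg N/ha
        irrigations = rng.randint(1, 4)
        yield_t = (5.0 - 0.05 * sowing_day + 0.004 * nitrogen_kg
                   + 0.1 * irrigations + rng.gauss(0, 0.2))  # t/ha
        rows.append({"sowing_day": sowing_day, "nitrogen_kg": nitrogen_kg,
                     "irrigations": irrigations, "yield_t": yield_t})
    return rows


if __name__ == "__main__":
    features = ["sowing_day", "nitrogen_kg", "irrigations"]
    importance = {f: 0.0 for f in features}
    tree = grow(synthetic_plots(), "yield_t", features, importance)
    print("root split:", tree["feature"], "<=", tree["threshold"])
    for f, g in sorted(importance.items(), key=lambda kv: -kv[1]):
        print(f"{f:12s} SSE reduction: {g:.1f}")
```

On this made-up data the root split falls on `sowing_day` only because the synthetic yields were built that way; the sketch shows the mechanics of CART, not a result about Bihar or EUP.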

Participants also used the random forest model, which is more robust than a single tree in training and validation performance because it builds many decision trees, each on a different random sample of the data and covariates, and combines their predictions. The random forest also identified sowing date as the most important factor, and its ranking of the other covariates determining wheat yield matched the CART results.
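The random forest logic can be sketched in a few dozen lines of Python. This is an illustration under stated assumptions, not CSISA's analysis: the covariate names `sowing_day`, `nitrogen_kg` and `irrigations` and the synthetic yields are hypothetical, the trees are kept shallow (depth 3) so the sketch runs quickly, and each split considers `mtry = 2` of the three covariates. Each tree is fit to a bootstrap sample of the plots, predictions are averaged across trees, and a covariate's importance is its SSE reduction averaged over the forest. In R, the `randomForest` package exposes the same ideas through its `ntree` and `mtry` arguments.

```python
"""Minimal random forest regressor: a sketch on hypothetical, synthetic data."""
import random


def sse(values):
    """Sum of squared errors of `values` around their mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def build_tree(rows, target, features, mtry, rng, importance, depth=0, max_depth=3):
    """Grow one tree; each split considers only `mtry` randomly chosen covariates."""
    ys = [r[target] for r in rows]
    leaf = {"predict": sum(ys) / len(ys)}
    if depth >= max_depth or len(rows) < 10:
        return leaf
    parent, best = sse(ys), (0.0, None, None)
    for f in rng.sample(features, mtry):
        # Thresholds exclude the largest value, so both children are non-empty.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [r[target] for r in rows if r[f] <= t]
            right = [r[target] for r in rows if r[f] > t]
            gain = parent - sse(left) - sse(right)
            if gain > best[0]:
                best = (gain, f, t)
    gain, f, t = best
    if f is None:  # no candidate split reduces the error
        return leaf
    importance[f] += gain
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r in rows if r[f] <= t], target, features,
                           mtry, rng, importance, depth + 1, max_depth),
        "right": build_tree([r for r in rows if r[f] > t], target, features,
                            mtry, rng, importance, depth + 1, max_depth),
    }


def predict(tree, row):
    """Follow splits from the root to a leaf and return the leaf's mean yield."""
    while "predict" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["predict"]


def random_forest(rows, target, features, n_trees=20, mtry=2, seed=0):
    """Fit trees on bootstrap samples; return the trees and mean SSE reduction per covariate."""
    rng = random.Random(seed)
    importance = {f: 0.0 for f in features}
    trees = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # bootstrap: n draws with replacement
        trees.append(build_tree(sample, target, features, mtry, rng, importance))
    return trees, {f: g / n_trees for f, g in importance.items()}


def forest_predict(trees, row):
    """Average the predictions of all trees in the forest."""
    return sum(predict(t, row) for t in trees) / len(trees)


def synthetic_plots(n=200, seed=1):
    """Hypothetical wheat plots in which later sowing lowers yield most strongly."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        s, n_kg, irr = rng.randint(0, 40), rng.choice(range(80, 161, 10)), rng.randint(1, 4)
        rows.append({"sowing_day": s, "nitrogen_kg": n_kg, "irrigations": irr,
                     "yield_t": 5.0 - 0.05 * s + 0.004 * n_kg + 0.1 * irr + rng.gauss(0, 0.2)})
    return rows


if __name__ == "__main__":
    features = ["sowing_day", "nitrogen_kg", "irrigations"]
    trees, importance = random_forest(synthetic_plots(), "yield_t", features)
    for f, g in sorted(importance.items(), key=lambda kv: -kv[1]):
        print(f"{f:12s} mean SSE reduction: {g:.1f}")
    early = {"sowing_day": 5, "nitrogen_kg": 120, "irrigations": 3}
    late = dict(early, sowing_day=35)
    print(f"predicted yield, early vs late sowing: "
          f"{forest_predict(trees, early):.2f} vs {forest_predict(trees, late):.2f} t/ha")
```

Because each tree sees a different bootstrap sample and a random subset of covariates at every split, no single tree's quirks dominate the averaged prediction, which is the source of the robustness described above.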

These ML results provide sufficient evidence of the role of sowing date in wheat yield in EUP and Bihar, a relationship CSISA has also documented previously.

This team of CSISA scientists successfully analyzed and visualized data with modern statistical tools. They gained the confidence to consistently undertake robust diagnostic surveys and collaborative research trials, generate location-specific insights, discuss these insights with partners and inform decision makers at relevant levels. All publications, along with full datasets, will be made available to the public through open-source channels.

Posted on CSISA Success Story, India-news, Nepal-news, News - Homepage, News & Announcements, May 21, 2018


Copyright © 2017 CIMMYT


Disclaimer

While every precaution has been taken in the preparation of this website and its contents, CIMMYT and its implementing partner organizations for CSISA – IFPRI and IRRI – assume no responsibility for errors or omissions. All information and features described herein are subject to change without notice. This website may contain links to third-party websites. CIMMYT is not responsible for the contents of any linked site or any link contained in a linked site. This website is providing these links only as a convenience, and the inclusion of a link does not imply endorsement by CIMMYT of the linked sites or their content.

Terms of Use

CIMMYT holds the copyright to all CSISA publications and web pages but encourages use of these materials for non-commercial purposes, unless specifically stated otherwise. Permission to make digital or hard copies of part or all of this work for personal or classroom use is hereby granted without fee and without a formal request provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. For copyrights not owned by CIMMYT, express permission must be pursued with the owner of the information. To republish or redistribute for commercial purposes, prior permission is required.