High-throughput sequencing technologies generate large volumes of data characterized by high dimensionality, unwanted noise, significant computational memory usage and high management costs. Ms. Consolata Gakii from the Department of Computing and Information Technology at the University of Embu, together with Dr. Richard Rimiru (www.jkuat.ac.ke) and Dr. Paul Mireji (www.kalro.org), has found a potential solution to these challenges in RNAseq lung cancer datasets. The team has proposed a graph-based feature selection approach that can be used to select genes and to find associations between cancer-related and non-cancer-related genes based on their expression values. Their innovation rests on the observation that biological features are usually related to functions in living systems, so the relationships among them cannot be deduced by feature selection and classification alone.
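To give a rough sense of what graph-based feature selection on expression values can look like, the sketch below links two genes when their expression profiles are strongly correlated and then keeps the most connected genes. It is an illustration under stated assumptions, not the method from the published paper: the correlation threshold, the degree-based ranking, the function names and the gene names and values are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant gene carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def build_gene_graph(expression, threshold=0.9):
    """Link two genes when |correlation| of their expression meets the threshold.

    The threshold is an assumed, illustrative value.
    """
    graph = {gene: set() for gene in expression}
    for g1, g2 in combinations(expression, 2):
        if abs(pearson(expression[g1], expression[g2])) >= threshold:
            graph[g1].add(g2)
            graph[g2].add(g1)
    return graph


def select_genes(graph, top_k):
    """Rank genes by degree (number of neighbours); ties are broken by name."""
    ranked = sorted(graph, key=lambda g: (-len(graph[g]), g))
    return ranked[:top_k]


# Made-up expression values for four hypothetical genes across five samples.
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [2.1, 4.0, 6.2, 8.1, 9.9],
    "GENE_C": [5.0, 4.1, 2.9, 2.0, 1.1],
    "GENE_D": [3.0, 1.0, 4.0, 1.0, 5.0],
}
graph = build_gene_graph(expression, threshold=0.9)
selected = select_genes(graph, top_k=3)
```

In this toy data, the edges of the graph double as the associations between genes, which is the kind of relationship that a plain filter-then-classify pipeline would not expose.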
The work has been published in the MDPI journal Algorithms under the title "Graph Based Feature Selection for Reduction of Dimensionality in Next-Generation RNA Sequencing Datasets" (https://www.mdpi.com/1999-4893/15/1/21). A related publication from the same project is: Gakii, C., & Rimiru, R. (2021). Identification of cancer related genes using feature selection and association rule mining. Informatics in Medicine Unlocked, 24, 100595. https://doi.org/10.1016/j.imu.2021.100595