'Bingo' - a large language model- and graph neural network (LLM-GNN)-based workflow for the prediction of essential genes from protein data
- Impact factor: 9.5
- Journal: Briefings in Bioinformatics
- Keywords: essential gene prediction; large language model; graph neural network; adversarial training; biological interpretation
- Abstract: The identification and characterization of essential genes are central to our understanding of the core biological functions in eukaryotic organisms, and has important implications for the treatment of diseases caused by, for example, cancers and pathogens. Given the major constraints in testing the functions of genes of many organisms in the laboratory, due to the absence of in vitro cultures and/or gene perturbation assays for most metazoan species, there has been a need to develop in silico tools for the accurate prediction or inference of essential genes to underpin systems biological investigations. Major advances in machine learning approaches provide unprecedented opportunities to overcome these limitations and accelerate the discovery of essential genes on a genome-wide scale. Here, we developed and evaluated a large language model- and graph neural network (LLM–GNN)-based approach, called ‘Bingo’, to predict essential protein-coding genes in the metazoan model organisms Caenorhabditis elegans and Drosophila melanogaster as well as in Mus musculus and Homo sapiens (a HepG2 cell line) by integrating LLM and GNNs with adversarial training. Bingo predicts essential genes under two ‘zero-shot’ scenarios with transfer learning, showing promise to compensate for a lack of high-quality genomic and proteomic data for non-model organisms. In addition, the attention mechanisms and GNNExplainer were employed to manifest the functional sites and structural domain with most contribution to essentiality. In conclusion, Bingo provides the prospect of being able to accurately infer the essential genes of little- or under-studied organisms of interest, and provides a biological explanation for gene essentiality.
- Paper type: Journal article
- Document type: J
- Translated work: No
- Publication date: 2024-01-12
- Indexed in: SCI