Abstract: | 腎臟癌是一個廣泛性的名稱,其最主要的是腎臟細胞發生癌變。從流行病學的角度探討,吸菸、肥胖、高血壓還有不當的作息及飲食、酒精會增加腎臟癌的風險。職業中的危害,像是農業的除草劑、建築工業用的石棉也都會增加腎臟癌症的風險,腎臟癌的診斷是由超音波確認患者腎臟是否有腫瘤,在目前機器學習大數據的時代,可以使用大數據去找出生物標誌,因超音波發現時已經形成腫瘤,腎臟超音波也不是必要的檢測項目,故若能使用生物標誌去找出高風險族群,能早期發現此標誌相關的生活環境與習慣以及影響該標誌的原因,就能防患於未然。
DNA甲基化是一種重要的表觀遺傳修飾,有相當的背景研究指出與癌症、糖尿病、身心科疾病關聯。許多研究表明,異常的DNA甲基化與不規則的基因沉默有關,當啟動子區域存在高度5-甲基胞嘧啶時就會發生這種情況。癌症中的低甲基化會導致基因組不穩定,從而活化癌化基因,一些腫瘤抑制基因則可能因DNA高甲基化而被沉默。因此,本研究希望能透過分析TCGA(The Cancer Genome Atlas)基因甲基化的數據找出癌症的生物標誌。
我們從TCGA下載腎透明細胞癌(kidney renal clear cell carcinoma,簡稱KIRC)癌組織及其配對非癌化組織的Illumina human methylation 450K晶片中的CpG位點之甲基化實驗數據及其臨床資料;接著以β-value分析配對組織之差異甲基化表現;再結合患者的臨床數據以是否發生遠端轉移為預後並投入logistic regression訓練一個機器學習的風險預測模型。最後,成功建立一個具有高正確率的二元羅吉斯風險預測模型,並且從模型的結果輸出中找到了RP11基因受甲基化調控並與腫瘤細胞的移動相關。
本研究建立之二元羅吉斯風險預測模型可判斷腎透明細胞癌產生遠端轉移的相關風險,找出影響該基因甲基化的因素,從生活環境、習慣、飲食等等來降低腎透明細胞癌化風險,另外此甲基化分析方法也可以用於其他癌症甲基化數據的研究。
Kidney cancer is a broad term that refers mainly to malignant transformation of kidney cells. Epidemiologically, smoking, obesity, hypertension, irregular daily routines, poor diet, and alcohol consumption increase the risk of kidney cancer. Occupational exposures, such as herbicides used in agriculture and asbestos used in the construction industry, also raise the risk. Kidney cancer is diagnosed by ultrasound, which confirms whether a tumor is present in the patient's kidney. However, a tumor has already formed by the time ultrasound detects it, and kidney ultrasound is not a routine examination. In the current era of machine learning and big data, large datasets can instead be mined for biomarkers. If a biomarker can identify high-risk groups, and the living environment, habits, and other factors that influence the marker can be recognized early, the disease can be prevented before it develops.
DNA methylation is an important epigenetic modification, and a substantial body of research links it to cancer, diabetes, and psychiatric disorders. Many studies have shown that aberrant DNA methylation is associated with abnormal gene silencing, which occurs when promoter regions carry high levels of 5-methylcytosine. In cancer, hypomethylation can cause genomic instability and thereby activate oncogenes, whereas some tumor suppressor genes can be silenced by DNA hypermethylation. This study therefore aims to identify cancer biomarkers by analyzing DNA methylation data from The Cancer Genome Atlas (TCGA).
We downloaded from TCGA the CpG-site methylation data, measured on the Illumina HumanMethylation450 (450K) array, for kidney renal clear cell carcinoma (KIRC) tumor tissue and its paired non-cancerous tissue, together with the corresponding clinical data. We then used β-values to analyze differential methylation between the paired tissues. Next, we combined these results with the patients' clinical data, took the occurrence of distant metastasis as the outcome, and trained a logistic regression model as a machine learning risk predictor. The resulting binary logistic regression risk prediction model achieved high accuracy, and its output indicated that the RP11 gene is regulated by methylation and is associated with tumor cell migration.
The binary logistic regression risk prediction model established in this study can assess the risk of distant metastasis in renal clear cell carcinoma. Identifying the factors that affect methylation of this gene, such as living environment, habits, and diet, could help reduce the risk of renal clear cell carcinogenesis. The methylation analysis method can also be applied to methylation data from other cancers. |
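The pipeline summarized in the abstract (β-values, paired differential methylation, then a binary logistic regression on metastasis status) can be sketched roughly as follows. This is only an illustrative sketch, not the study's actual code: the function names, the 0.2 delta-β cutoff, the gradient-descent fitting, and all numbers in the toy data are assumptions made up for the example, and the offset of 100 in the β-value follows the commonly used Illumina convention rather than anything stated in the abstract.

```python
import math

def beta_value(methylated, unmethylated, offset=100.0):
    """Beta-value of one CpG probe: M / (M + U + offset), always in [0, 1)."""
    return methylated / (methylated + unmethylated + offset)

def mean_delta_beta(tumor, normal):
    """Per-CpG mean of (tumor beta - paired normal beta) over all patients."""
    n_pairs, n_sites = len(tumor), len(tumor[0])
    return [
        sum(tumor[p][s] - normal[p][s] for p in range(n_pairs)) / n_pairs
        for s in range(n_sites)
    ]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_risk(weights, bias, x):
    """Predicted probability of distant metastasis for one feature vector."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))

def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Fit binary logistic regression by plain batch gradient descent."""
    weights, bias = [0.0] * len(X[0]), 0.0
    n = len(X)
    for _ in range(epochs):
        # Errors are computed once per epoch, before any parameter update.
        errors = [predict_risk(weights, bias, xi) - yi for xi, yi in zip(X, y)]
        for j in range(len(weights)):
            weights[j] -= lr * sum(e * xi[j] for e, xi in zip(errors, X)) / n
        bias -= lr * sum(errors) / n
    return weights, bias

# Toy data (hypothetical): rows are patients, columns are CpG sites.
tumor = [[0.80, 0.30, 0.52], [0.75, 0.35, 0.50],
         [0.40, 0.32, 0.49], [0.35, 0.28, 0.51]]
normal = [[0.20, 0.30, 0.50], [0.25, 0.33, 0.50],
          [0.22, 0.31, 0.50], [0.21, 0.30, 0.50]]
metastasis = [1, 1, 0, 0]  # 1 = distant metastasis observed (made-up labels)

# Keep CpG sites whose mean paired difference passes an assumed cutoff.
delta = mean_delta_beta(tumor, normal)
selected = [s for s, d in enumerate(delta) if abs(d) >= 0.2]

# Train the risk model on tumor beta-values of the selected sites only.
X = [[row[s] for s in selected] for row in tumor]
weights, bias = fit_logistic(X, metastasis)
```

Calling `predict_risk(weights, bias, [0.80])` then gives the modeled metastasis probability for a sample whose selected CpG site has a β-value of 0.80. A real analysis would use an established statistics library, a proper paired test with multiple-testing correction, and held-out data to estimate accuracy.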