Matrix multiplication is one of the most frequently used operations in scientific and engineering computing, and considerable effort has gone into improving its efficiency. For decades, parallel processing has been the preferred way to accelerate such computation-intensive operations, and advances in manufacturing technology have also made it common to use processors with high clock rates or multiple processors. In this work, merged arithmetic is incorporated into a parallel architecture. Merged arithmetic dissolves the boundary between individual multipliers and adders and treats multiplication and addition as a single operation, so that multiple multiplications and additions are performed in parallel. However, none of the previously presented methods for reducing the partial product matrix is always the best. We therefore propose an approach that combines the earlier methods with a new hybrid scheme and searches for the most efficient reduction. Because users do not all judge a system by the same criterion, our simulation program reports three metrics to choose from: delay, cost, and delay × cost. The results also present the approximate hardware interconnection to assist subsequent implementation. These results should be of considerable help in designing systems that aim for both high performance and low cost.
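As a rough illustration of what merging means for a small inner product, the Python sketch below pools the partial-product bits of every multiplication into one shared matrix instead of finishing each product separately. The function names, the restriction to unsigned operands, and the use of Dadda's height sequence as an estimate of the number of carry-save stages are all assumptions made for illustration. They are not taken from this work's simulation program or from its reduction method.

```python
def merged_partial_products(a_vec, b_vec, n_bits):
    """Pool the partial-product bits of all a*b terms into shared columns.

    columns[w] holds every bit of weight 2**w. Operands are assumed to be
    unsigned and to fit in n_bits (an assumption of this sketch).
    """
    columns = [[] for _ in range(2 * n_bits - 1)]
    for a, b in zip(a_vec, b_vec):
        for i in range(n_bits):
            for j in range(n_bits):
                bit = ((a >> i) & 1) & ((b >> j) & 1)  # one AND gate
                columns[i + j].append(bit)
    return columns


def column_sum(columns):
    """Weighted sum of all bits; equals the inner product of the inputs."""
    return sum(sum(col) << w for w, col in enumerate(columns))


def dadda_stages(max_height):
    """Estimate the 3:2 carry-save stages needed to reach height 2.

    Uses Dadda's height sequence 2, 3, 4, 6, 9, 13, ...
    """
    limit, stages = 2, 0
    while limit < max_height:
        limit = limit * 3 // 2
        stages += 1
    return stages


if __name__ == "__main__":
    a_vec, b_vec, n_bits = [3, 5], [6, 7], 3
    cols = merged_partial_products(a_vec, b_vec, n_bits)
    heights = [len(col) for col in cols]
    print("column heights:", heights)
    print("inner product :", column_sum(cols))
    print("stage estimate:", dadda_stages(max(heights)))
```

In this sketch the merged matrix for two 3-bit products has a tallest column of six bits, so a single reduction tree followed by one carry-propagate addition stands in for two separate multipliers and an adder. How the columns are actually reduced is the design choice that the delay, cost, and delay × cost metrics are meant to compare.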