Cell Superscalar (CellSs) provides a simple, flexible and easy-to-use programming approach for the Cell Broadband Engine (Cell/B.E.). It automatically exploits the inherent concurrency of applications at the function or task level. The CellSs environment has two components. The first is a source-to-source compiler that translates annotated C or Fortran code. The second is a runtime library, tailored for the Cell/B.E., that orchestrates the concurrent execution of the application. We have developed a technique called bypassing that allows CellSs to perform core-to-core DMA transfers for generic applications. In this overview paper we briefly summarise bypassing and introduce two improvements: just-in-time renaming and lazy write-back. These extensions come at no additional cost and may increase performance by improving the perceived bandwidth of the Element Interconnect Bus (EIB). Although the integration of bypassing into CellSs is still work in progress, we present results for four fundamental linear algebra kernels. These results demonstrate the applicability of these techniques and quantify the benefit that can be obtained.