Collisions of SHA-0 and Reduced SHA-1
TLDR
Improvements to the techniques used to cryptanalyze SHA-0 are described, along with improvements that allow collisions to be found for reduced versions of SHA-1; these show that collisions for up to about 53–58 rounds can still be found faster than by birthday attacks.
To copy or not to copy: a compile-time technique for assessing when data copying should be used to eliminate cache conflicts
TLDR
Preliminary experimental results demonstrate that, because of the sensitivity of cache conflicts to small changes in problem size and base addresses, selective copying can lead to better overall performance than either no copying, complete copying, or copying based on manually applied heuristics.
Cache interference phenomena
TLDR
The different types of cache interference that can occur in numerical loop nests are identified, and an analytical method is developed for detecting the occurrence of interference and, more importantly, for computing the number of cache misses due to interference.
Evaluation of CPU frequency transition latency
TLDR
An experimental study on the measurement of frequency transition latencies shows that, while changing CPU frequency upward leads to higher transition delays, changing it downward leads to smaller or similar transition delays across the set of available frequencies.
CQA: A code quality analyzer tool at binary level
TLDR
The CQA tool is presented: a loop-centric code quality analyzer based on simplified unicore architecture performance modeling and on quality metrics, which provides high-level metrics along with human-understandable reports that relate to the source code.
Deep jam: conversion of coarse-grain parallelism to instruction-level and vector parallelism for irregular applications
TLDR
It is shown that good speedups can be achieved through deep jam, a new transformation of the program control- and data-flow, which combines scalar and array renaming with a generalized form of recursive unroll-and-jam, contributing to the extraction of fine-grain parallelism in irregular applications.
MAQAO : Modular Assembler Quality Analyzer and Optimizer for Itanium 2
[Figure labels: Code Representation; IA64 Assembly Parser Front End; IA64 Assembly Parser Back End; .s; instrumented code .s]
Performance Tuning of x86 OpenMP Codes with MAQAO
TLDR
This paper presents a tool for the performance analysis of multithreaded codes (currently supporting OpenMP programs) that relies on static performance evaluation to identify compiler optimizations and assess the performance of loops, and that can analyze the results and provide hints for tuning the code.