Many problems considered intractable have been solved satisfactorily by approximate optimization methods known as metaheuristics. These methods use non-deterministic approaches that find good solutions but do not guarantee that the global optimum is found. The success of a metaheuristic depends on its capacity to alternate adequately between exploration and exploitation of the solution space. One way to guide such algorithms toward better solutions is to supply them with more knowledge of the solution space (the environment of the problem). This can be done by mapping the environment into states and actions using reinforcement learning. This paper proposes the use of a reinforcement learning technique, the Q-learning algorithm, in the constructive phase of the GRASP and reactive GRASP metaheuristics. The proposed methods will be applied to the symmetric traveling salesman problem.
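
To make the idea concrete, the following is a minimal sketch of how a Q-table might guide tour construction for a symmetric traveling salesman instance. It is an illustration under stated assumptions, not the method of this paper: the state is taken to be the current city, the action the next unvisited city, and the reward the negative edge length. The function names (`q_learning_tsp`, `construct_tour`, `tour_length`) and the parameter values (`alpha`, `gamma`, `epsilon`, `episodes`) are hypothetical choices made for the example.

```python
import random


def q_learning_tsp(dist, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a Q-table for a symmetric TSP instance.

    Assumed mapping (illustrative only): state = current city,
    action = next unvisited city, reward = negative edge length.
    """
    rng = random.Random(seed)
    n = len(dist)
    q = [[0.0] * n for _ in range(n)]
    for _ in range(episodes):
        current = rng.randrange(n)
        unvisited = set(range(n)) - {current}
        while unvisited:
            choices = sorted(unvisited)
            # Epsilon-greedy: explore with probability epsilon, else exploit Q.
            if rng.random() < epsilon:
                nxt = rng.choice(choices)
            else:
                nxt = max(choices, key=lambda c: q[current][c])
            remaining = unvisited - {nxt}
            # Best value reachable from the next city (0 when the tour is complete).
            future = max((q[nxt][c] for c in remaining), default=0.0)
            reward = -dist[current][nxt]
            # Standard Q-learning update rule.
            q[current][nxt] += alpha * (reward + gamma * future - q[current][nxt])
            current = nxt
            unvisited = remaining
    return q


def construct_tour(dist, q, start=0, epsilon=0.0, rng=None):
    """Build a tour city by city, choosing the next city from the Q-table."""
    rng = rng or random.Random(0)
    tour = [start]
    unvisited = set(range(len(dist))) - {start}
    while unvisited:
        choices = sorted(unvisited)
        current = tour[-1]
        if rng.random() < epsilon:
            nxt = rng.choice(choices)
        else:
            nxt = max(choices, key=lambda c: q[current][c])
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def tour_length(dist, tour):
    """Length of the closed tour, including the edge back to the start."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


if __name__ == "__main__":
    # Small symmetric instance, invented for the example.
    dist = [
        [0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 3],
        [10, 4, 3, 0],
    ]
    q = q_learning_tsp(dist)
    tour = construct_tour(dist, q)
    print(tour, tour_length(dist, tour))
```

In a full GRASP iteration, a tour built this way would then be passed to a local search phase; a nonzero `epsilon` in `construct_tour` is one simple way to keep some randomization in the construction.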