The game of pursuit-evasion has long been a popular research subject in robotics. Reinforcement learning, in which an agent learns through its interactions with the environment, is widely used in the pursuit-evasion domain. In this paper, we study the multi-agent pursuit-evasion problem using reinforcement learning and present experimental results. The agents learn from their interactions using Watkins's Q(λ)-learning algorithm. Q-learning is an off-policy temporal-difference control algorithm; the method we use combines Q-learning with eligibility traces, so that backups propagate only until the first exploratory (non-greedy) action. For the pursuit team we adopt concurrent learning: each team member maintains its own action-value function and updates its knowledge independently.
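Since the text describes Watkins's Q(λ) only in words, the following is a minimal tabular sketch of the algorithm on a toy chain environment. The environment (`chain_step`), the hyperparameter values, and all function names are illustrative assumptions, not taken from the paper; the defining feature shown is that eligibility traces are zeroed as soon as an exploratory action is taken.

```python
import random

def watkins_q_lambda(env_step, n_states, n_actions, episodes=500,
                     alpha=0.2, gamma=0.9, lam=0.8, epsilon=0.2,
                     max_steps=1000, seed=0):
    """Tabular Watkins's Q(lambda): Q-learning with accumulating
    eligibility traces that are cut on exploratory actions."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        E = [[0.0] * n_actions for _ in range(n_states)]  # traces
        s = 0
        for _ in range(max_steps):                        # safety cap
            # epsilon-greedy selection with random tie-breaking
            qmax = max(Q[s])
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = rng.choice([b for b in range(n_actions)
                                if Q[s][b] == qmax])
            was_greedy = Q[s][a] == qmax
            s2, r, done = env_step(s, a)
            # one-step Q-learning target (greedy over next state)
            target = r if done else r + gamma * max(Q[s2])
            delta = target - Q[s][a]
            E[s][a] += 1.0                                # accumulating trace
            for i in range(n_states):
                for j in range(n_actions):
                    Q[i][j] += alpha * delta * E[i][j]
            if was_greedy:
                # traces decay while the policy stays greedy...
                for i in range(n_states):
                    for j in range(n_actions):
                        E[i][j] *= gamma * lam
            else:
                # ...and are reset at the first exploratory action
                E = [[0.0] * n_actions for _ in range(n_states)]
            s = s2
            if done:
                break
    return Q

def chain_step(s, a):
    """Toy 5-state chain (an assumption for demonstration): action 1
    moves right, action 0 moves left; reward 1 on reaching state 4."""
    s2 = min(s + 1, 4) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == 4 else 0.0), s2 == 4

Q = watkins_q_lambda(chain_step, n_states=5, n_actions=2)
```

In the concurrent-learning setup described above, each pursuer would run an independent copy of this update loop with its own `Q` table.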