Organizations with delivery operations, such as courier services, retail distribution networks, ride-hailing services, municipal bodies, field service and technician scheduling, energy companies, and product supply chains, face an increasingly competitive marketplace. They constantly seek new and enhanced tools to further automate operations and optimize routing, lowering costs and better serving customers. However, since Vehicle Routing Problems (VRPs) are NP-hard combinatorial problems, they remain an active area of research, and new heuristics are still being developed to find near-optimal solutions in a reasonable amount of time. Recently, it has been shown that heuristics for solving combinatorial problems can be learned using a reinforcement learning based approach. We created a dynamic routing algorithm to meet the daily demands of stores, served from a number of warehouses, for a retail chain with a network of more than 2,500 stores across the US. A unique methodology was developed to find near-optimal routes using reinforcement learning while satisfying business constraints, and its results have shown significant improvement over the retailer's existing methods. In this approach, we train a single model that finds near-optimal solutions for problem instances sampled from a given distribution, only by observing reward signals and following business feasibility rules. Our deep learning model, based on the transformer architecture, is trained using a policy gradient algorithm with a simple deterministic greedy rollout baseline; the trained model produces solutions without the need to retrain for every new problem instance. The resulting routes were not only near-optimal with respect to real-time demand inputs, but also operationally executable for the business.
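The training scheme described above, a policy gradient method with a deterministic greedy rollout baseline, can be sketched at toy scale. The snippet below is a minimal illustration, not the paper's actual model: it replaces the transformer with a hypothetical one-parameter softmax policy over a random TSP-like instance, and uses the current policy's own greedy rollout as the baseline. All names and the data are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance: random 2-D customer locations (hypothetical data).
coords = rng.random((8, 2))
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

def rollout(theta, greedy):
    """Build a tour node by node with a one-parameter softmax policy.

    Logits for the next node are -theta * distance to it; greedy=True
    takes the argmax (the deterministic baseline rollout), otherwise we
    sample. Returns the tour, its length, and d(log-prob)/d(theta).
    """
    n = len(coords)
    tour, grad_logp, visited = [0], 0.0, {0}
    while len(tour) < n:
        cur = tour[-1]
        cand = [j for j in range(n) if j not in visited]
        d = dist[cur, cand]
        logits = -theta * d
        p = np.exp(logits - logits.max())
        p /= p.sum()
        k = int(np.argmax(p)) if greedy else int(rng.choice(len(cand), p=p))
        # Score function for the softmax: d/dtheta log p_k = -d_k + sum_j p_j d_j
        grad_logp += -d[k] + float(p @ d)
        tour.append(cand[k])
        visited.add(cand[k])
    length = sum(dist[tour[i], tour[(i + 1) % n]] for i in range(n))
    return tour, length, grad_logp

theta, lr = 1.0, 0.1
for _ in range(200):
    _, L_sample, g = rollout(theta, greedy=False)   # sampled tour
    _, L_greedy, _ = rollout(theta, greedy=True)    # baseline tour
    # REINFORCE update: advantage = L_sample - L_greedy (minimizing length)
    theta -= lr * (L_sample - L_greedy) * g

tour, length, _ = rollout(theta, greedy=True)
print(tour, round(length, 3))
```

The key idea carried over from the abstract is the baseline: instead of a learned value function, the advantage of a sampled tour is measured against the tour the same policy produces greedily, which keeps the variance of the gradient estimate low at no extra training cost.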