A vast amount of music information available on social media, web pages, online forums, and digital libraries is represented in natural language. Making sense of this information is challenging due to the unstructured nature of the data. Music and language data also share many similarities, such as their sequential nature. With machine learning-based natural language processing (NLP) technology, we attempt to tackle the rich complexity of human languages in order to extract useful insights for tasks such as music information retrieval (MIR) and audio AI. In this talk, I discuss the application of NLP to music information technology in light of the latest transformations brought about by deep learning, which enables machines to make sense of the world through multimodal music and sound data. I conclude by identifying emerging challenges and parallel trends at the intersection of these two exciting fields.