
    NLP and Text Classification ML Engineering - Singapore - DATA DISTINCTION


    13 hours ago

    Full time
    Description
    This description is supported by a more extensive requirements document.

    Overview
    We want to build a text classification model that maps trip types, defined in a JSON dictionary, to locales or destinations, each of which is described in its own "article" in Wiki Voyage. There are three specific outcomes, which can be achieved progressively over a few sprints:
  • Extract and capture keywords from each WV article representing a locale
  • Classify each locale as "a good place to visit" based on a separate set of keywords or dictionary that will be provided
  • Classify each locale as an appropriate place to experience each trip type

    A schema for the output JSON documents has been defined, and we will confirm it with the ML engineer.
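    As a rough illustration of how the three outcomes could chain together, here is a minimal keyword-matching sketch in plain Python. It is a baseline under stated assumptions, not the intended model: the stop-word list, the function names, the overlap score, and the 0.5 threshold are all hypothetical, and the output shape does not reflect the schema mentioned above.

```python
import re
from collections import Counter

# Hypothetical, tiny stop-word list; a real project would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to",
              "for", "with", "on", "are", "its"}


def extract_keywords(article_text, top_n=5):
    """Outcome 1: return the top_n most frequent non-stop-word tokens."""
    tokens = re.findall(r"[a-z]+", article_text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]


def keyword_score(locale_keywords, reference_keywords):
    """Fraction of reference keywords found among the locale's keywords."""
    reference = set(reference_keywords)
    if not reference:
        return 0.0
    return len(reference & set(locale_keywords)) / len(reference)


def classify_locale(article_text, good_place_keywords, trip_types, threshold=0.5):
    """Outcomes 2 and 3: score the locale against each keyword list.

    trip_types maps a trip type name to its list of keywords (assumed shape).
    """
    keywords = extract_keywords(article_text, top_n=20)
    return {
        "keywords": keywords,
        "good_place_to_visit": keyword_score(keywords, good_place_keywords) >= threshold,
        "trip_types": sorted(
            name for name, kws in trip_types.items()
            if keyword_score(keywords, kws) >= threshold
        ),
    }
```

    Simple overlap scoring like this is only a starting point; it makes the data flow concrete before a trained classifier replaces the scoring step.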

    Content Sources
  • Locale content: approximately 30,000 Wiki Voyage articles, which will already have been "cleaned" by the time we start, to minimize pre-processing. The size is still to be confirmed, but the total corpus should not exceed 1 GB
  • Trip Type dictionary is a JSON document with a consistent structure to facilitate extraction of keywords and phrases
  • "Good Place to Visit" keyword list
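    To show what extracting keywords from a consistently structured Trip Type dictionary might look like, the sketch below walks a nested JSON document and collects keywords per trip type. The field names ("name", "keywords", "children") and the sample values are assumptions made for illustration; the real structure is defined in the requirements document.

```python
import json

# Hypothetical example document; the real Trip Type schema is not shown here.
TRIP_TYPES_JSON = """
{
  "name": "Outdoor",
  "keywords": ["hiking", "trail"],
  "children": [
    {"name": "Water Sports", "keywords": ["surfing", "kayak"], "children": []},
    {"name": "Winter Sports", "keywords": ["ski", "snowboard"], "children": []}
  ]
}
"""


def collect_keywords(node, inherited=()):
    """Map each trip type name to its own keywords plus those of its ancestors."""
    own = list(inherited) + list(node.get("keywords", []))
    result = {node["name"]: own}
    for child in node.get("children", []):
        result.update(collect_keywords(child, inherited=own))
    return result


trip_type_keywords = collect_keywords(json.loads(TRIP_TYPES_JSON))
```

    Whether a nested trip type should inherit its parent's keywords is a design choice to settle with the team; here inheritance is assumed.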

    Implementation
    We anticipate organizing the work into a few short sprints that align the outputs with a logical NLP/text classification process, but we will do final planning to confirm this with the ML engineer. The model will be implemented in AWS, and we would like it to be retrained automatically as new locale content is collected and as the trip type dictionary expands or is updated. We will explain our architecture in order to enable this.

    The developer can develop in our AWS environment or using their own tools or sandbox, but the model must run on AWS and be managed in the future based on our DevOps standards.

    We will provide the content structure of the WV articles and other data in S3, and will work with the ML engineer to confirm the implementation and configuration within our AWS environment.
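    One possible way to drive automatic retraining from S3 is to react to S3 event notifications, for example in a Lambda-style handler. The sketch below only decides whether a retrain is warranted; it assumes the standard S3 event notification layout ("Records", then "s3", "object", "key"), and the key prefixes are hypothetical. Starting the training job itself is deliberately left out.

```python
from urllib.parse import unquote_plus

# Hypothetical key prefixes; the actual bucket layout is not specified here.
RETRAIN_PREFIXES = ("locales/", "trip-types/")


def keys_requiring_retrain(event):
    """Return the object keys in an S3 event that match a watched prefix."""
    keys = []
    for record in event.get("Records", []):
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.startswith(RETRAIN_PREFIXES):
            keys.append(key)
    return keys


def handler(event, context=None):
    """Lambda-style entry point: report whether a retraining job should start."""
    changed = keys_requiring_retrain(event)
    # Starting the actual training job is left out of this sketch.
    return {"retrain": bool(changed), "changed_keys": changed}
```

    Keeping the decision logic in a pure function makes it easy to unit test outside AWS before wiring it to a real trigger.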

    Skills and Collaboration
  • Knowledge of and experience in text classification modeling and putting models into production (working with an architect and a cloud engineer)
  • Python to perform data source preprocessing
  • Proactive communication and affinity for collaboration

    Secondary Objectives
    Although it is not part of the ML engineer's scope, we would welcome insight that helps us with our secondary objectives, including:

    Data Set Adjustments Insight
  • Initial validation of the keyword definition to determine approaches to enrich the Trip Type dictionary for improved mapping in future iterations, including understanding: the strengths and weaknesses of the existing Wiki Voyage data, along with article quality and variability; keyword format or structure; and keywords existing in nested Trip Types
  • Gain perspective on Locale hierarchy levels at which to pursue mapping (region, country, subdivision, city, district)
  • Consider implications for “related” trip types

    Near Term Model Tuning Priorities and Opportunities
  • Gain perspective on how Locales can be prioritized for a Trip Type
  • Evaluate “good place to visit” keywords to determine if categories should be established
  • Understand how to process “inconclusive” scores, for example…
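    One common way to handle scores that are neither clearly positive nor clearly negative is to reserve a middle band and label it explicitly, so those cases can be reviewed or routed to a second model. The sketch below assumes scores in the range 0 to 1; the function name and the 0.4 and 0.6 cut-offs are placeholders, not values from the requirements.

```python
def label_score(score, low=0.4, high=0.6):
    """Map a score in [0, 1] to 'no', 'inconclusive', or 'yes'."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= high:
        return "yes"
    if score <= low:
        return "no"
    return "inconclusive"
```

    The width of the band trades coverage against precision: a wider band yields fewer wrong labels but more locales left undecided.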

    Advanced Model Expansion
  • Evaluate each Locale based on a combination of the two classifications
  • Understand the relative differences, advantages, and future opportunities between Bag of Words and Word Embeddings
  • Gain clarity on the nuances of the mapping in order to improve the model and define next steps, including applying supervised learning
  • Opportunity and considerations for adopting neural networks as a more advanced approach
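    To make the Bag of Words versus Word Embeddings comparison concrete, the toy sketch below contrasts the two on synonyms. Bag of Words only sees exact token overlap, while embeddings can place related words near each other. The 2-D vectors here are invented purely for illustration; real embeddings are learned from data and have many more dimensions.

```python
import math
from collections import Counter


def bow_cosine(text_a, text_b):
    """Cosine similarity between bag-of-words count vectors."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Made-up 2-D vectors for illustration only; real embeddings are learned.
TOY_VECTORS = {
    "beach": (0.9, 0.1), "seaside": (0.8, 0.2),
    "ski": (0.1, 0.9), "snow": (0.2, 0.8),
}


def embedding_cosine(text_a, text_b, vectors=TOY_VECTORS):
    """Cosine similarity between averaged word vectors; unknown words are skipped."""
    def mean_vector(text):
        found = [vectors[w] for w in text.lower().split() if w in vectors]
        if not found:
            return None
        return tuple(sum(dim) / len(found) for dim in zip(*found))

    va, vb = mean_vector(text_a), mean_vector(text_b)
    if va is None or vb is None:
        return 0.0
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.hypot(*va) * math.hypot(*vb)
    return dot / norm if norm else 0.0
```

    With these toy vectors, "beach" and "seaside" share no tokens, so the Bag of Words similarity is zero, while the embedding similarity is high. That gap is the main argument for embeddings when article vocabulary varies.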

    Architecture Standards
  • Confirm data structure and “end points” to support Traveler interaction (associated with the item above regarding the form of the output)
  • Begin to formalize tools and architecture that fit into the broader platform architecture