VisionLang/YapayGazeteci-Teknofest2024-v2

News Generation

Introduction

This project is part of the Teknofest 2024 Türkçe Doğal Dil İşleme (Turkish Natural Language Processing) competition. Its aim is to generate a news title and news content from a given image.

Dataset

The dataset was collected from the Sabah news website. It consists of news titles, news content, and images, and all text is in Turkish.

Data-Preprocessing

  • Sample Data: image


      title = "Balıkesir’de tarihi bina yangında küle döndü"
      word_index = {'Balıkesir’de': 9, 'tarihi': 5, 'bina': 3, 'yangında': 7, 'küle': 5, 'döndü': 6 }
      tokens: [start_token, 9, 5, 3, 7, 5, 6, end_token]
    Input                                          Output
    Image + start_token                            9
    Image + start_token + 9                        5
    Image + start_token + 9 + 5                    3
    Image + start_token + 9 + 5 + 3                7
    Image + start_token + 9 + 5 + 3 + 7            5
    Image + start_token + 9 + 5 + 3 + 7 + 5        6
    Image + start_token + 9 + 5 + 3 + 7 + 5 + 6    end_token
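The input/output expansion shown above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual preprocessing code: the function name `build_training_pairs` and the string placeholders for the start and end tokens are assumptions made for the example, and the image is left out because it stays the same in every pair.

```python
def build_training_pairs(tokens):
    """Expand one token sequence into (prefix, next_token) pairs.

    Each prefix is the text the decoder has seen so far; the target
    is the single token it should predict next.
    """
    pairs = []
    for i in range(1, len(tokens)):
        pairs.append((tokens[:i], tokens[i]))
    return pairs


# Placeholder markers (assumed); a real pipeline would map these to integer ids.
START, END = "start_token", "end_token"

# Token sequence from the sample title above.
tokens = [START, 9, 5, 3, 7, 5, 6, END]
pairs = build_training_pairs(tokens)

for prefix, target in pairs:
    print("Image +", " + ".join(str(t) for t in prefix), "->", target)
```

A sequence of n tokens (including the start and end markers) yields n - 1 training pairs, which is why the eight tokens above produce seven rows.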

Model

The model combines a CNN and an LSTM: the image is fed to the encoder (CNN), and the encoder's output is fed to the decoder (LSTM) together with the input text.
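At inference time, an encoder-decoder of this kind is commonly run one token at a time: the decoder receives the image features plus the text generated so far and predicts the next token, until it emits the end token. The sketch below shows that greedy loop under stated assumptions. `greedy_decode`, `predict_next`, and `make_stub_predictor` are hypothetical names, the stub stands in for a trained CNN+LSTM model, and the README does not say which decoding strategy the project uses.

```python
def greedy_decode(image_features, predict_next, start_token, end_token, max_len=20):
    """Generate tokens one at a time until end_token or max_len is reached.

    predict_next(image_features, sequence) is an assumed callable that
    returns the most likely next token given the image features and the
    tokens generated so far.
    """
    sequence = [start_token]
    for _ in range(max_len):
        next_token = predict_next(image_features, sequence)
        sequence.append(next_token)
        if next_token == end_token:
            break
    return sequence


def make_stub_predictor(target):
    """Stand-in for a trained model: replays a fixed token sequence."""
    def predict_next(image_features, sequence):
        # Ignores the image and returns the token that follows the current prefix.
        return target[len(sequence)]
    return predict_next


target = ["start_token", 9, 5, 3, 7, 5, 6, "end_token"]
decoded = greedy_decode(
    image_features=None,  # placeholder; a real run would pass the CNN output
    predict_next=make_stub_predictor(target),
    start_token="start_token",
    end_token="end_token",
)
print(decoded)
```

The `max_len` cap guards against a model that never predicts the end token. Replacing the stub with a real model's prediction step, followed by an index-to-word lookup, would turn the token ids back into a title.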