Codility Lesson 6
Lesson 6: Sorting. 1. Distinct. Instructions: Write a function def solution(A) that, given an array A consisting of N integers,...
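The excerpt cuts off before the full statement, but Codility's "Distinct" asks for the number of distinct values in A. A minimal sketch of the standard O(N log N) approach, assuming that task:

```python
def solution(A):
    # Codility "Distinct": count the distinct values in A.
    # Sorting groups equal values together, so every position where a
    # value differs from its predecessor marks a new distinct value.
    if not A:
        return 0
    A = sorted(A)
    count = 1
    for i in range(1, len(A)):
        if A[i] != A[i - 1]:
            count += 1
    return count
```

In Python, `len(set(A))` gives the same answer in one line; the sorted version mirrors the comparison-based reasoning this Sorting lesson is about.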
English to Korean Multilingual Transfer Learnin...
Abstract: This study focuses on constructing a Korean Sentence-BERT model with a novel method, student-teacher knowledge distillation. The limitations...
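The teaser names the method but cuts off before describing it. For orientation, a minimal sketch of a student-teacher distillation loss of the kind used to transfer sentence embeddings across languages (the encoder callables and batch format are assumptions, not the post's actual code):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student, teacher, en_batch, ko_batch):
    # The frozen English teacher provides target embeddings; the student
    # is trained to reproduce them for BOTH the English source and its
    # Korean translation, so the two languages land in a shared space.
    with torch.no_grad():
        target = teacher(en_batch)
    return (F.mse_loss(student(en_batch), target) +
            F.mse_loss(student(ko_batch), target))
```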
The Uncomfortable Truth About Facebook LASER...
Facebook LASER Last year, Facebook released the code for LASER, or Language-Agnostic SEntence Representations. As stated on its GitHub page, LASER...
Billion-scale similarity search with GPUs
Abstract: Similarity search finds application in specialized database systems handling complex data such as images or videos, which are typically...
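This is the FAISS paper. A minimal usage sketch of the library's exact-search baseline (dimensions and data are illustrative; the GPU variants wrap the same interface):

```python
import numpy as np
import faiss  # pip install faiss-cpu or faiss-gpu

d = 128                                            # embedding dimension
xb = np.random.rand(10_000, d).astype("float32")   # database vectors
xq = np.random.rand(5, d).astype("float32")        # query vectors

index = faiss.IndexFlatL2(d)    # brute-force L2 search baseline
index.add(xb)
D, I = index.search(xq, 4)      # distances and ids of the 4 nearest neighbors
```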
Extractive Summarization in NLP: Training with ...
Summarization in NLP Among the many challenges faced by Natural Language Processing (NLP) researchers today, the summarization task is perhaps...
Codility Lessons 2-5
Codility I’ve been putting my algorithm coding skills to the test through Codility. Here’s how I solved the problems in...
How to Use Streamlit
Streamlit Introduction Again, I was training a simple chatbot and I wanted to upload it online so that other users...
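To give a feel for why Streamlit suits this use case, a minimal sketch of an app that echoes user input (the response logic is a placeholder to swap for a real chatbot call):

```python
# app.py -- run with: streamlit run app.py
import streamlit as st

st.title("Simple Chatbot Demo")

user_input = st.text_input("Say something:")
if user_input:
    # Placeholder response; replace with a real model call.
    st.write(f"Bot: you said '{user_input}'")
```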
How to Use PyTorch Lightning
PyTorch Lightning Introduction I was training a simple GPT2 chatbot the other day and came across code that utilized...
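For context, the core of PyTorch Lightning is organizing the model, training step, and optimizer into one LightningModule; a minimal sketch (the toy linear model is illustrative only):

```python
import torch
import pytorch_lightning as pl

class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)  # toy model for illustration

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.cross_entropy(self.layer(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# trainer = pl.Trainer(max_epochs=3)
# trainer.fit(LitModel(), train_dataloader)  # train_dataloader: your DataLoader
```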
Useful Python Trivia for Convenient Coding
Last time I put together the Python functions and tips needed for NLP; I have no idea whether anyone else reads that post, but I keep going back to it myself lol. In fact...
Margin-based Parallel Corpus Mining with Multil...
Abstract: Machine translation is highly sensitive to the size and quality of the training data, which has led to an...
Sentence-BERT: Sentence Embeddings using Siames...
Abstract: BERT (Devlin et al., 2018) and RoBERTa (Liu et al., 2019) have set a new state-of-the-art performance on sentence-pair...
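A minimal sketch of using a trained SBERT model through the authors' sentence-transformers library (the checkpoint name is one of the library's published models, not necessarily the post's choice):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode(["A man is eating food.",
                    "Someone is having a meal."])
print(util.cos_sim(emb[0], emb[1]))  # cosine similarity of the sentence pair
```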
Mutual Information and Diverse Decoding Improve...
Abstract: Sequence-to-sequence neural translation models learn semantic and syntactic relations between sentence pairs by optimizing the likelihood of the target...
DialoGPT: Large-Scale Generative Pre-training ...
Abstract: We present a large, tunable neural conversational response generation model, DialoGPT (dialogue generative pre-trained transformer). Trained on...
GPT-2 and GPT-3: Towards a More General Languag...
About GPT Since I never cared much about text generation, all I knew about GPT was roughly that it's an autoregressive model... and that it's used for generation... But recently...
Beyond English-Centric Multilingual Machine Tra...
Abstract: Existing work in translation demonstrated the potential of massively multilingual machine translation by training a single model able to...
Mastering HuggingFace
Anyone studying or researching NLP these days has surely used HuggingFace, or at least heard of it. From BERT to GPT, the models from just about any NLP paper...
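A minimal sketch of the library's typical entry point, the Auto classes (the checkpoint name is illustrative):

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Hello, HuggingFace!", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, seq_len, hidden_size)
```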
Attention is All You Need
Abstract: The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The...
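The paper's central operation is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k))V; a direct PyTorch rendering:

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V, mask=None):
    # softmax(Q K^T / sqrt(d_k)) V, with optional masking of positions
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return F.softmax(scores, dim=-1) @ V
```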
Reformer: The Efficient Transformer
Abstract: Large Transformer models routinely achieve state-of-the-art results on a number of tasks, but training these models can be prohibitively...
LexRank: Graph-based Lexical Centrality as Sali...
Abstract: We introduce a stochastic graph-based method for computing relative importance of textual units for Natural Language Processing. We test...
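LexRank scores sentences by their centrality in a similarity graph, computed as a stationary distribution; a numpy sketch of the power iteration (threshold and damping values are illustrative, not the paper's tuned settings):

```python
import numpy as np

def lexrank_scores(sim, threshold=0.1, damping=0.85, iters=50):
    # sim: (n, n) pairwise sentence similarities; the diagonal
    # (self-similarity) guarantees every row has at least one edge.
    adj = (sim >= threshold).astype(float)
    P = adj / adj.sum(axis=1, keepdims=True)  # row-stochastic transitions
    n = len(P)
    scores = np.full(n, 1.0 / n)
    for _ in range(iters):                    # power iteration
        scores = (1 - damping) / n + damping * P.T @ scores
    return scores                             # higher = more central sentence
```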
MAD-X: An Adapter-Based Framework for Multi-Tas...
Abstract: The main goal behind state-of-the-art pre-trained multilingual models such as multilingual BERT and XLM-R is enabling and bootstrapping...
Useful Python Trivia for NLP
Today I want to round up the basic Python functions that are good to know for NLP work. These are functions I actually use a lot in my assignments and experiments...
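The teaser doesn't say which functions the post covers, so as a flavor of the genre, a few staples that come up constantly in NLP scripting:

```python
from collections import Counter

text = "the quick brown fox jumps over the lazy dog"

tokens = text.split()            # whitespace tokenization
vocab = Counter(tokens)          # token -> frequency
print(vocab.most_common(2))      # [('the', 2), ('quick', 1)]

# enumerate/zip for index-aligned iteration over parallel lists
lengths = [len(t) for t in tokens]
for i, (tok, n) in enumerate(zip(tokens, lengths)):
    pass  # e.g. build (position, token, length) records

print(" ".join(tokens))          # join tokens back into a string
```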
Parameter-Efficient Transfer Learning for NLP
Abstract: Fine-tuning large pre-trained models is an effective transfer mechanism in NLP. However, in the presence of many downstream tasks,...
Neural Machine Translation with Byte-Level Subw...
Abstract: Almost all existing machine translation models are built on top of character-based vocabularies: characters, subwords or words. Rare characters...
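The paper's byte-level alternative builds on the fact that UTF-8 maps any text into a fixed 256-symbol base vocabulary; a two-line illustration:

```python
# Any character decomposes into UTF-8 bytes, so there is no
# out-of-vocabulary symbol, only longer byte sequences for rare scripts.
s = "안녕"                       # two Korean characters
print(list(s.encode("utf-8")))   # [236, 149, 136, 235, 133, 149] -> 6 bytes
```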
SentencePiece: A simple and language independen...
Abstract: This paper describes SentencePiece, a language-independent subword tokenizer and detokenizer designed for Neural-based text processing, including Neural Machine Translation....
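A minimal sketch of training and applying a SentencePiece model with the Python bindings (corpus path, model prefix, and vocabulary size are placeholders):

```python
import sentencepiece as spm

# Train a unigram model on a raw-text corpus, one sentence per line.
spm.SentencePieceTrainer.train(
    input="corpus.txt", model_prefix="spm_ko",
    vocab_size=8000, model_type="unigram",
)

sp = spm.SentencePieceProcessor(model_file="spm_ko.model")
print(sp.encode("한국어 텍스트 처리", out_type=str))  # subword pieces
```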
Korean NLP Preprocessing Module | Building a Korean Preprocessing Module
Every time I do it, I'm reminded that preprocessing is a genuinely labor-intensive task. But it is a crucial one: model performance varies enormously with the quality of the training data....
Korean-English Parallel Corpora | Downloading OPUS Corpora...
Parallel Corpus To train multilingual NLP models for cross-lingual tasks, you need parallel corpora. Usually a parallel corpus consists of...
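The excerpt cuts off, but the usual format is two line-aligned plain-text files, one per language; a sketch of pairing them up (file names hypothetical):

```python
# Line i of the .ko file translates line i of the .en file.
with open("opus.ko-en.ko", encoding="utf-8") as f_ko, \
     open("opus.ko-en.en", encoding="utf-8") as f_en:
    pairs = [(ko.strip(), en.strip()) for ko, en in zip(f_ko, f_en)]

print(pairs[0])  # (Korean sentence, its English translation)
```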
Korean Corpus List and Preprocessing Prep
Korean Corpora The performance of a pre-trained model depends entirely on the quantity and quality of its data. Korean doesn't have as many corpora as English, but the number is steadily growing. Below are the ones I...
RoBERTa: A Robustly Optimized BERT Pretraining ...
Abstract: We present a replication study of BERT pretraining (Devlin et al., 2019) that carefully measures the impact of many...
Building a GitHub Pages Blog
As someone who isn't on great terms with GitHub, I struggled a bit after deciding to build a GitHub Pages blog. In case I ever build another blog later...
Cross-Lingual Alignment vs. Joint Training: A C...
Abstract: Learning multilingual representations of text has proven a successful method for many cross-lingual transfer learning tasks. There are two...
Making Monolingual Sentence Embeddings Multilin...
Abstract: We present an easy and efficient method to extend existing sentence embedding models to new languages. This makes it possible to...
Adversarial NLI: A New Benchmark for Natural La...
Abstract: We introduce a new large-scale NLI benchmark dataset, collected via an iterative, adversarial human-and-model-in-the-loop procedure. We show that training...