Website Category Classification Using Fine-Tuned BERT Language Model

Date

2020

Authors

Demirkıran, Ferhat
Çayır, Aykut
Ünal, Uğur
Dağ, Hasan

Publisher

Institute of Electrical and Electronics Engineers Inc.

Green Open Access

No

Publicly Funded

No
Impulse
Top 10%
Influence
Average
Popularity
Top 10%

Abstract

The content on the World Wide Web is expanding every second, providing web users with rich content. However, this content may harm web users rather than benefit them, because some of it is harmful or misleading. Harmful content can take the form of text, audio, video, or images depicting violence, adult material, or other harmful information. Young people in particular may be psychologically affected by such content. To protect young people from harmful content, various web filtering techniques are used, such as keyword filtering, Uniform Resource Locator (URL)-based filtering, intelligent analysis, and semantic analysis. We propose an algorithm that classifies websites, including those that may contain adult content, into 32 unique categories, reaching 67.81% accuracy with BERT. We also show that, for text classification, a BERT model achieves higher accuracy than both the Sequential API and Functional API models.

Keywords

BERT, Functional API, Sequential API, Text classification, Web filtering

Fields of Science

0202 electrical engineering, electronic engineering, information engineering, 02 engineering and technology

OpenCitations Citation Count
8

Source

2020 5th International Conference on Computer Science and Engineering (UBMK)

Start Page

333

End Page

336
PlumX Metrics
Citations

CrossRef: 1

Scopus: 9

Captures

Mendeley Readers: 12

OpenAlex FWCI
2.54766277

Sustainable Development Goals

4

QUALITY EDUCATION