Website Category Classification Using Fine-Tuned Bert Language Model

dc.contributor.author Demirkıran, Ferhat
dc.contributor.author Çayır, Aykut
dc.contributor.author Ünal, Uğur
dc.contributor.author Dağ, Hasan
dc.date.accessioned 2020-12-17T18:36:21Z
dc.date.available 2020-12-17T18:36:21Z
dc.date.issued 2020
dc.description.abstract The content on the World Wide Web is expanding every second, providing web users with rich content. However, this situation may do web users more harm than good because some of that content is harmful or misleading. Harmful content can include text, audio, video, or images about violence, adult content, or other harmful information. Young people in particular may readily be affected psychologically by such harmful information. To protect youth from this harmful content, various web filtering techniques are used, such as keyword filtering, Uniform Resource Locator (URL) based filtering, intelligent analysis, and semantic analysis. We propose an algorithm that can classify websites, which may contain adult content, with 67.81% (BERT) accuracy among 32 unique categories. We also show that a BERT model gives higher accuracy than both the Sequential and Functional API models when used for text classification. en_US
dc.identifier.doi 10.1109/UBMK50275.2020.9219384 en_US
dc.identifier.isbn 978-172817565-2 en_US
dc.identifier.scopus 2-s2.0-85095717414 en_US
dc.identifier.uri https://hdl.handle.net/20.500.12469/3562
dc.identifier.uri https://doi.org/10.1109/UBMK50275.2020.9219384
dc.language.iso en en_US
dc.publisher Institute of Electrical and Electronics Engineers Inc. en_US
dc.relation.ispartof 2020 5th International Conference on Computer Science and Engineering (UBMK)
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.subject BERT en_US
dc.subject Functional API en_US
dc.subject Sequential API en_US
dc.subject Text classification en_US
dc.subject Web filtering en_US
dc.title Website Category Classification Using Fine-Tuned Bert Language Model en_US
dc.type Conference Object en_US
dspace.entity.type Publication
gdc.author.institutional Demirkıran, Ferhat en_US
gdc.author.institutional Çayır, Aykut en_US
gdc.author.institutional Ünal, Uğur en_US
gdc.author.institutional Dağ, Hasan en_US
gdc.bip.impulseclass C4
gdc.bip.influenceclass C5
gdc.bip.popularityclass C4
gdc.coar.access metadata only access
gdc.coar.type text::conference output
gdc.collaboration.industrial false
gdc.description.department Faculties, Faculty of Management, Department of Management Information Systems en_US
gdc.description.endpage 336 en_US
gdc.description.publicationcategory Conference Item - International - Institutional Faculty Member en_US
gdc.description.startpage 333 en_US
gdc.identifier.openalex W3094374364
gdc.identifier.wos WOS:000629055500065 en_US
gdc.index.type WoS
gdc.index.type Scopus
gdc.oaire.diamondjournal false
gdc.oaire.impulse 6.0
gdc.oaire.influence 3.154466E-9
gdc.oaire.isgreen false
gdc.oaire.keywords Text classification
gdc.oaire.keywords Functional API
gdc.oaire.keywords Sequential API
gdc.oaire.keywords Web filtering
gdc.oaire.keywords BERT
gdc.oaire.popularity 8.282494E-9
gdc.oaire.publicfunded false
gdc.oaire.sciencefields 0202 electrical engineering, electronic engineering, information engineering
gdc.oaire.sciencefields 02 engineering and technology
gdc.openalex.collaboration National
gdc.openalex.fwci 2.54766277
gdc.openalex.normalizedpercentile 0.91
gdc.openalex.toppercent TOP 10%
gdc.opencitations.count 8
gdc.plumx.crossrefcites 1
gdc.plumx.mendeley 12
gdc.plumx.scopuscites 9
gdc.relation.journal 5th International Conference on Computer Science and Engineering, UBMK 2020
gdc.scopus.citedcount 9
gdc.virtual.author Dağ, Hasan
gdc.virtual.author Demirkıran, Ferhat
gdc.wos.citedcount 5
relation.isAuthorOfPublication e02bc683-b72e-4da4-a5db-ddebeb21e8e7
relation.isAuthorOfPublication 695a8adc-2330-4d32-ab37-8b781716d609
relation.isAuthorOfPublication.latestForDiscovery e02bc683-b72e-4da4-a5db-ddebeb21e8e7
relation.isOrgUnitOfPublication ff62e329-217b-4857-88f0-1dae00646b8c
relation.isOrgUnitOfPublication acb86067-a99a-4664-b6e9-16ad10183800
relation.isOrgUnitOfPublication b20623fc-1264-4244-9847-a4729ca7508c
relation.isOrgUnitOfPublication.latestForDiscovery ff62e329-217b-4857-88f0-1dae00646b8c
