Website Category Classification Using Fine-Tuned Bert Language Model
| dc.contributor.author | Demirkıran, Ferhat | |
| dc.contributor.author | Çayır, Aykut | |
| dc.contributor.author | Ünal, Uğur | |
| dc.contributor.author | Dağ, Hasan | |
| dc.date.accessioned | 2020-12-17T18:36:21Z | |
| dc.date.available | 2020-12-17T18:36:21Z | |
| dc.date.issued | 2020 | |
| dc.description.abstract | The content on the World Wide Web expands every second, offering web users a wealth of material. However, some of this content may do users more harm than good because it is harmful or misleading. Harmful content can take the form of text, audio, video, or images and may involve violence, adult material, or other harmful information. Young people in particular may be readily affected psychologically by such content. To protect youth from it, various web filtering techniques are used, such as keyword filtering, Uniform Resource Locator (URL) based filtering, intelligent analysis, and semantic analysis. We propose an algorithm that classifies websites, including those that may contain adult content, with 67.81% accuracy (BERT) among 32 unique categories. We also show that a BERT model gives higher accuracy than both the Sequential and Functional API models when used for text classification. | en_US |
| dc.identifier.doi | 10.1109/UBMK50275.2020.9219384 | en_US |
| dc.identifier.isbn | 978-172817565-2 | en_US |
| dc.identifier.scopus | 2-s2.0-85095717414 | en_US |
| dc.identifier.uri | https://hdl.handle.net/20.500.12469/3562 | |
| dc.identifier.uri | https://doi.org/10.1109/UBMK50275.2020.9219384 | |
| dc.language.iso | en | en_US |
| dc.publisher | Institute of Electrical and Electronics Engineers Inc. | en_US |
| dc.relation.ispartof | 2020 5th International Conference on Computer Science and Engineering (UBMK) | |
| dc.rights | info:eu-repo/semantics/closedAccess | en_US |
| dc.subject | BERT | en_US |
| dc.subject | Functional API | en_US |
| dc.subject | Sequential API | en_US |
| dc.subject | Text classification | en_US |
| dc.subject | Web filtering | en_US |
| dc.title | Website Category Classification Using Fine-Tuned Bert Language Model | en_US |
| dc.type | Conference Object | en_US |
| dspace.entity.type | Publication | |
| gdc.author.institutional | Demirkıran, Ferhat | en_US |
| gdc.author.institutional | Çayır, Aykut | en_US |
| gdc.author.institutional | Ünal, Uğur | en_US |
| gdc.author.institutional | Dağ, Hasan | en_US |
| gdc.bip.impulseclass | C4 | |
| gdc.bip.influenceclass | C5 | |
| gdc.bip.popularityclass | C4 | |
| gdc.coar.access | metadata only access | |
| gdc.coar.type | text::conference output | |
| gdc.collaboration.industrial | false | |
| gdc.description.department | Faculties, Faculty of Business Administration, Department of Management Information Systems | en_US |
| gdc.description.endpage | 336 | en_US |
| gdc.description.publicationcategory | Conference Item - International - Institutional Faculty Member | en_US |
| gdc.description.startpage | 333 | en_US |
| gdc.identifier.openalex | W3094374364 | |
| gdc.identifier.wos | WOS:000629055500065 | en_US |
| gdc.index.type | WoS | |
| gdc.index.type | Scopus | |
| gdc.oaire.diamondjournal | false | |
| gdc.oaire.impulse | 6.0 | |
| gdc.oaire.influence | 3.154466E-9 | |
| gdc.oaire.isgreen | false | |
| gdc.oaire.keywords | Text classification | |
| gdc.oaire.keywords | Functional API | |
| gdc.oaire.keywords | Sequential API | |
| gdc.oaire.keywords | Web filtering | |
| gdc.oaire.keywords | BERT | |
| gdc.oaire.popularity | 8.282494E-9 | |
| gdc.oaire.publicfunded | false | |
| gdc.oaire.sciencefields | 0202 electrical engineering, electronic engineering, information engineering | |
| gdc.oaire.sciencefields | 02 engineering and technology | |
| gdc.openalex.collaboration | National | |
| gdc.openalex.fwci | 2.54766277 | |
| gdc.openalex.normalizedpercentile | 0.91 | |
| gdc.openalex.toppercent | TOP 10% | |
| gdc.opencitations.count | 8 | |
| gdc.plumx.crossrefcites | 1 | |
| gdc.plumx.mendeley | 12 | |
| gdc.plumx.scopuscites | 9 | |
| gdc.relation.journal | 5th International Conference on Computer Science and Engineering, UBMK 2020 | |
| gdc.scopus.citedcount | 9 | |
| gdc.virtual.author | Dağ, Hasan | |
| gdc.virtual.author | Demirkıran, Ferhat | |
| gdc.wos.citedcount | 5 | |
| relation.isAuthorOfPublication | e02bc683-b72e-4da4-a5db-ddebeb21e8e7 | |
| relation.isAuthorOfPublication | 695a8adc-2330-4d32-ab37-8b781716d609 | |
| relation.isAuthorOfPublication.latestForDiscovery | e02bc683-b72e-4da4-a5db-ddebeb21e8e7 | |
| relation.isOrgUnitOfPublication | ff62e329-217b-4857-88f0-1dae00646b8c | |
| relation.isOrgUnitOfPublication | acb86067-a99a-4664-b6e9-16ad10183800 | |
| relation.isOrgUnitOfPublication | b20623fc-1264-4244-9847-a4729ca7508c | |
| relation.isOrgUnitOfPublication.latestForDiscovery | ff62e329-217b-4857-88f0-1dae00646b8c |
