A deep web data extraction model for web mining: a review
Main Authors: | Ahmad Sabri, Ily Amalina; Man, Mustafa |
---|---|
Format: | EJournal Article |
Published: | Institute of Advanced Engineering and Science, 2021-07-01. |
Subjects: | |
Online Access: | https://ijeecs.iaescore.com/index.php/IJEECS/article/view/25157/15217 |
LEADER | 02385 am a22003013u 4500 | ||
---|---|---|---|
001 | ijeecs25157_15217 | ||
042 | |a dc | ||
100 | 1 | 0 | |a Ahmad Sabri, Ily Amalina |e author |
100 | 1 | 0 | |e contributor |
700 | 1 | 0 | |a Man, Mustafa |e author |
245 | 0 | 0 | |a A deep web data extraction model for web mining: a review |
260 | |b Institute of Advanced Engineering and Science, |c 2021-07-01. | ||
500 | |a https://ijeecs.iaescore.com/index.php/IJEECS/article/view/25157 | ||
520 | |a The World Wide Web has become a large pool of information. Extracting structured data from published web pages has drawn attention in the last decade. The process of web data extraction (WDE) faces many challenges, due to the variety of web data and the unstructured data in hypertext markup language (HTML) files. The aim of this paper is to provide a comprehensive overview of current web data extraction techniques in terms of the quality of the extracted data. This paper focuses on data extraction using wrapper approaches and compares them with each other to identify the best approach for extracting data from online sites. To observe the efficiency of the proposed model, we compare the performance of single-web-page data extraction across different models: document object model (DOM), wrapper using hybrid DOM and JSON (WHDJ), wrapper extraction of image using DOM and JSON (WEIDJ), and WEIDJ (no-rules). Finally, the experiments showed that WEIDJ extracts data the fastest, with the lowest time consumption, compared to the other methods. | ||
540 | |a Copyright (c) 2021 Institute of Advanced Engineering and Science | ||
540 | |a http://creativecommons.org/licenses/by-nc/4.0 | ||
546 | |a eng | ||
690 | |||
690 | |a Data extraction techniques; Document object model; Noisy information; Web data extraction; Wrapper extraction of image using DOM and JSON; Wrapper using hybrid DOM and JSON | ||
655 | 7 | |a info:eu-repo/semantics/article |2 local | |
655 | 7 | |a info:eu-repo/semantics/publishedVersion |2 local | |
655 | 7 | |2 local | |
786 | 0 | |n Indonesian Journal of Electrical Engineering and Computer Science; Vol 23, No 1: July 2021; 519-528 | |
786 | 0 | |n 2502-4760 | |
786 | 0 | |n 2502-4752 | |
786 | 0 | |n 10.11591/ijeecs.v23.i1 | |
787 | 0 | |n https://ijeecs.iaescore.com/index.php/IJEECS/article/view/25157/15217 | |
856 | 4 | 1 | |u https://ijeecs.iaescore.com/index.php/IJEECS/article/view/25157/15217 |z Get fulltext |
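The abstract names wrapper approaches that pull data out of HTML and emit it as JSON, including one specialized for images (WEIDJ). The record gives no details of how WEIDJ works. The sketch below is therefore only a rough, hypothetical illustration of the general idea of an image-extraction wrapper with JSON output, and it is not the authors' method. It makes several assumptions. It uses Python's standard-library `html.parser`, which is event-driven and does not build a full DOM tree. It collects only the `src` and `alt` attributes of `<img>` tags. The names `ImageWrapper` and `extract_images` are invented for this example.

```python
import json
from html.parser import HTMLParser


class ImageWrapper(HTMLParser):
    """Hypothetical image wrapper: collects <img> records from an HTML page.

    A simplified illustration only, not the WEIDJ wrapper described in the paper.
    """

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so "IMG"/"SRC" match too.
        # Self-closing <img ... /> also reaches this method via handle_startendtag.
        if tag == "img":
            a = dict(attrs)
            self.records.append({"src": a.get("src"), "alt": a.get("alt", "")})


def extract_images(html: str) -> str:
    """Feed the page to the wrapper and serialize the extracted records as JSON."""
    parser = ImageWrapper()
    parser.feed(html)
    parser.close()
    return json.dumps(parser.records)


page = '<html><body><p>Intro</p><img src="a.png" alt="Logo"><div><IMG SRC="b.jpg"></div></body></html>'
print(extract_images(page))
# → [{"src": "a.png", "alt": "Logo"}, {"src": "b.jpg", "alt": ""}]
```

An extractor that is actually DOM-based would first build a tree and then select nodes from it. That approach costs more, but it would let a wrapper use the structure around each image, such as captions or parent containers, which this flat scan ignores.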