Hey Guys, I know it has been a while. I hope you don’t mind. 😉

Lets get to this article. Because I am not going to write this today. The fun thing about this article is, that one of my 600+ followers of Qlikshow might want to write this one.

The idea is simple here. Paste the below script in a Qlik Sense App, change the connection name in the variable ‘vConnection’ to an already existing web file connection, or create a new one and click on ‘Load data’.

After that, I am assuming that, something will happen 😉

To be able to achieve what this article is intended to do, I would like you guys sending in the comments for this article’s writing. I will just approve any comments related to this article in which, maybe, you guys wrote what this script is all about.

// #################################################################################################################################################################
// Set your connection to an already existing web file connection (or create a new one)
// #################################################################################################################################################################

LET vConnection = 'lib://MyGenericWebFileConnectionName'; // Define any webfile connection you might have setup

// #################################################################################################################################################################
// Scrape the first level of URLs to get HTML from
// #################################################################################################################################################################

LET vURL 		= 'https://help.qlik.com/en-US/cloud-services/Subsystems/Hub/Content/Sense_Hub/Scripting/AggregationFunctions/aggregation-functions.htm';

GetCategories:
LOAD
	RowNo() as Rowno,
    '$(vURL)' as CategoryScrapeURL,
    TextBetween(
    purgechar("@1:n", chr(10)&chr(13)&chr(9))
    , '<body'
    , '</body') as HTML
    
FROM [$(vConnection)]
(URL is [$(vURL)], fix, utf8, no labels, record is 10000000 lines);


GetCategoryLinks:
LOAD
	*,
    SubField(Path, '/', 2) as Category,
    SubField(Path, '/', 3) as SubCategory,
    'https://help.qlik.com/en-US/cloud-services/Subsystems/Hub/Content/Sense_Hub' & Path as CategoryLink
    
    Where WildMatch(Path, '*/Scripting/*', '*/ChartFunctions/*')
    and not WildMatch(Path, '*rel="nofollow"*', '*Previous topic*')
    
;
LOAD
	*,
    TextBetween(aTag, '<a href="/en-US/cloud-services/Subsystems/Hub/Content/Sense_Hub', '">') as Path
;
LOAD
    SubField(HTML, '</li>') as aTag
    
Resident GetCategories;
Drop Field aTag;
Drop Field HTML;

// #################################################################################################################################################################
// Scrape the second level of URLs to get HTML from
// #################################################################################################################################################################

FOR i = 0 to NoOfRows('GetCategoryLinks')-1;

LET vURL 		= Peek('CategoryLink', $(i), 'GetCategoryLinks');
LET vCategory	= Peek('Category', $(i), 'GetCategoryLinks');

GetFunctions:
LOAD
	RowNo() as fRowno,
    '$(vURL)' as CategoryLink,
    '$(vURL)' as FunctionScrapeURL,
    TextBetween(
    purgechar("@1:n", chr(10)&chr(13)&chr(9))
    , '<body'
    , '</body') as HTML
    
FROM [$(vConnection)]
(URL is [$(vURL)], fix, utf8, no labels, record is 10000000 lines)
;

NEXT i;


GetFunctionLinks:
LOAD DISTINCT
	*,
    SubField(Path, '/', 2) as Category,
    SubField(Path, '/', 3) as SubCategory,
    upper(TextBetween(Path, SubField(Path, '/', 3)&'/', '.htm')) as Function,
    'https://help.qlik.com/en-US/cloud-services/Subsystems/Hub/Content/Sense_Hub' & Path as FunctionLink
    
    Where 1=1
    //and WildMatch(fPath, '*$(vCategory)*')
    and not WildMatch(Path, '*rel="nofollow"*', '*-*.htm*', '*aria-label*')
    and not WildMatch(SubField(Path, '/', 3), '*.htm*', '*Operators*')
    
;
LOAD
	*,
    TextBetween(aTag, '<a href="/en-US/cloud-services/Subsystems/Hub/Content/Sense_Hub', '">') as Path
;
LOAD
    SubField(HTML, '</li>') as aTag
    
Resident GetFunctions;
Drop Field aTag;


Drop Tables
    GetFunctions,
    GetCategoryLinks,
    GetCategories
;

So, this is my best attempt today to approach something traditional into something different.

Lets see if this works!

See you next post!