Is there another method of running the Crawler?

#1
Hello,

LiteSpeed Crawler is not active on the server where my site is located. The service provider does not allow it to be activated. Is there any other way I can get the Crawler feature to work?

Thanks in advance for your interest and help.
Regards!
 

serpent_driver

Well-Known Member
#2
There are many ways to warm up the cache, but every method does the same thing: it requests the URLs of your site, just as the LiteSpeed crawler does. The question is whether your provider monitors this and blocks any method that crawls your site. The best option would be to drop this provider. Offering LiteSpeed services while disallowing the crawler makes no sense. Stupid!

You can use "Xenu", a local Windows application that is meant for checking broken links and works like a crawler.

http://home.snafu.de/tilman/xenulink.html
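
To illustrate the idea that any warmup method just requests the site's URLs, here is a minimal Python sketch. It is not the LiteSpeed crawler and not any tool mentioned in this thread. The function names `fetch_status` and `warm_cache` and the User-Agent string are hypothetical, and the sketch assumes that a plain GET request is enough for the server to store the page in its cache, which depends on the cache configuration.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Request one URL and return its HTTP status code, or None on a network error."""
    # The User-Agent value is an arbitrary, hypothetical label for this sketch.
    request = urllib.request.Request(url, headers={"User-Agent": "cache-warmup-sketch"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            response.read()  # read the body so the full page is generated
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # e.g. 404 or 500
    except OSError:
        return None  # DNS failure, timeout, refused connection, ...


def warm_cache(urls, fetch=fetch_status):
    """Request each URL once, in order, and return a dict {url: status}."""
    return {url: fetch(url) for url in urls}
```

Calling `warm_cache(["https://example.com/", "https://example.com/about/"])` (placeholder URLs) requests both pages one after another and returns their status codes. The `fetch` parameter exists only so that the request function can be swapped out, for example in a test.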
 

Germont

Well-Known Member
#5
For 100k urls/hour it can't be the usual hosting with hundreds/thousands of shared accounts.
My VPS has 4 cores and 5 shared accounts.
It's configured so that each account's cron jobs never overlap with the others'. In one hour it can preload no more than 5k URLs.
 

serpent_driver

Well-Known Member
#9
Back in 2013, cURL already had the options I use to make requests much faster.

I am curious about your plugin, if you can share one way or another.
Possible, but this script (not a plugin) is part of an application and uses sources from that application. I have to extract these sources from the application to make a standalone version. It is already on my to-do list, but with low priority. Ask me again in a few weeks...
 

serpent_driver

Well-Known Member
#10
@Germont

I invested a lot of time in developing this script. This crawler is x times faster than anything LiteSpeed has ever published. Before I invest more time in making a standalone version of this crawler, are you ready to pay for it? I will publish it, but not for free!!
 

Germont

Well-Known Member
#11
I will have a look at the features, but buying it is another story. The few clients I have are currently happy with free tools, so I must choose accordingly ;)
 

serpent_driver

Well-Known Member
#12
Which feature do you need? Free Netflix access while crawling, free entry to all soccer (football) games in premier league in your country, a dozen virgins? :)

I have integrated all features that are possible with cURL. Only cURL limits what is possible and what is not. Without cURL, no high-speed crawling!
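
The script discussed here is not published in this thread, so the following is only a rough sketch of the general technique behind fast warmup crawling: sending several requests at the same time instead of one after another. It swaps in Python's standard `ThreadPoolExecutor` and `urllib` rather than cURL, and the names `fetch_status`, `warm_cache_parallel` and the default of 8 workers are assumptions made for illustration.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_status(url, timeout=10):
    """Return the HTTP status of a successful request, or None on any HTTP or network error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()  # read the body so the full page is generated
            return response.status
    except OSError:  # also covers urllib's HTTPError and URLError
        return None


def warm_cache_parallel(urls, workers=8, fetch=fetch_status):
    """Request URLs concurrently and return a list of (url, status) in input order."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map runs fetch in up to `workers` threads and keeps the input order
        statuses = list(pool.map(fetch, urls))
    return list(zip(urls, statuses))
```

A higher `workers` value sends more requests in parallel, which shortens the crawl but raises the load on the server, so on shared hosting a small value is the safer starting point.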
 

serpent_driver

Well-Known Member
#13
@Germont

Have a look at the attached log file and check the number of crawled URLs and the total time needed to crawl 1000 URLs.

Result: 1000 URLs within ~1 minute, but it can go faster depending on server resources. This crawl session ran on shared hosting. Server load: low
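
To put the figures from this thread on one scale, the short calculation below converts them to URLs per hour. It only uses numbers quoted above (1000 URLs in about 1 minute, and the 5k URLs per hour reported for the VPS); the helper name `urls_per_hour` is hypothetical, and the result is a rough estimate because the one-minute figure is itself approximate.

```python
def urls_per_hour(url_count, seconds):
    """Convert a crawl result (URLs crawled in a number of seconds) to URLs per hour."""
    return url_count * 3600 / seconds


rate_log = urls_per_hour(1000, 60)  # 1000 URLs in ~1 minute, from the log above
rate_vps = 5000                     # 5k URLs per hour, as reported for the VPS

print(rate_log)             # 60000.0
print(rate_log / rate_vps)  # 12.0
```

At a sustained rate of about 60k URLs per hour, the crawl described in the log would be roughly 12 times faster than the 5k per hour reported earlier, assuming the rate holds for longer sessions.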
 

Attachments

Germont

Well-Known Member
#14
I have now recalculated, and it seems that in WordPress I can preload as many as 10k URLs per hour at full load with PHP 8 and Memcached.
But I overlooked that the VPS I manage is still on MySQL 5.7, and it seems MySQL 8 is about 2x faster.
Which versions of PHP and MySQL do you have?
Do you also use Object Caching?

That partially explains the difference.
 

serpent_driver

Well-Known Member
#15
Memcached, object cache or any other caching doesn't matter for warmup crawling. I noticed no difference whether it is used or not, possibly because this kind of cache is not available in the command line interface (CLI), but I am not sure about that.

On this shared hosting I have PHP 7.4 and MariaDB, but it also doesn't seem to matter which versions are used. The difference in speed doesn't depend on the PHP and MySQL software that is installed. It depends on how the URLs are requested and how powerful the server is.
 

Germont

Well-Known Member
#16
Ahh, I forgot that my server uses a mechanical HDD, which increases response time.
Maybe in your setup object caching doesn't matter much. But in my tests, it decreased preload time from 55 minutes to 47 (1700 URLs). That was with the OpenCart crawler, which performs worse.
 

serpent_driver

Well-Known Member
#18
@Germont

Last weekend I found the time to build a standalone version of "HyperSpeed Multithreading Cache Warmup Crawler." :) I am still doing some fine-tuning, but it is almost done. Do you want to have a look at it?
 

Germont

Well-Known Member
#19
Yes, I would have a look, thanks!
But if it really performs better than the LS Crawler, maybe you can make a deal with the LS team, so your work is better rewarded.
 

serpent_driver

Well-Known Member
#20
Good idea. I have already had a lot of discussions with the leaders of LiteSpeed, but it seems they don't have much interest in doing business. If they thought an alternative crawler was a good idea, they would have released one already...

I'll send you a PM with the link to the control panel.
 