Crawler - No Need to Cache

#1
Hi,
I have seen various threads here, but none seem to give an answer, so I am hoping I can get some help.
I am running Magento 2.3.3-p1 and the M2-crawler.sh script always says "No Need to Cache" for every page. I do have more pages than the cache can hold; could that be the reason? Surely some pages would still be warmed until the limit is reached?
I created my sitemap.xml with the built-in Magento generator. Testing with curl doesn't show anything about LiteMage in the headers, but loading my site in a browser and then refreshing works fine, so I know LiteMage itself is working properly.
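For reference, this is roughly how I am checking from the server (the grep is just there to filter out the LiteSpeed headers):
Code:
# Fetch only the response headers; a warmed page should show something
# like "X-LiteSpeed-Cache: hit,litemage".
curl -sI https://www.clicksaveandprint.com/ | grep -i 'x-litespeed'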

Thanks,
Alex
 
#3
I would prefer something that I can run from the server on a cron, as URLs can change weekly, and I'd also like to understand why I get this issue. I will try XENU in the meantime, though.
 
#5
But it's not cached, that is my issue. Testing with curl doesn't show anything related to LiteMage in the response; loading a page for the first time in a browser I get a miss, but the second time it is fast and shows a hit.
 

serpent_driver

Well-Known Member
#6
Do you have different vary cache rules, for example for logged-in vs. not-logged-in users? The problem with the crawler script is that it cannot simulate real users if, for example, there are vary cache rules based on cookies. The same happens if you check or warm up the cache with curl. To make it work with curl you must add a --cookie header to the curl request, as in the sketch below. The same can be done with the crawler script, but it has to be modified. That can be the reason why you get a different cache header. Do you know what I mean?
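A rough sketch of what I mean (the cookie name/value and the URL are only placeholders; use whatever your vary rules actually depend on):
Code:
# Placeholder cookie, user agent and URL -- adjust to your store's vary rules.
# Sending the cookie makes curl hit the same cache variation a real browser
# would, so the X-LiteSpeed-Cache header becomes comparable.
curl -sI --cookie "customer_group=general" \
     -A "Mozilla/5.0 (compatible; cache-check)" \
     "https://www.example.com/some-page.html" | grep -i 'x-litespeed-cache'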
 
#7
I don't think I have varying rules, but you may be on the right track with the cookies, as I have noticed this:
Code:
curl -Il https://www.clicksaveandprint.com
HTTP/1.1 200 OK
Connection: Keep-Alive
X-Powered-By: PHP/7.2.31
Set-Cookie: PHPSESSID=2e70d7fb23bb827550a4281828cece71; expires=Sun, 24-May-2020 09:57:34 GMT; Max-Age=3600; path=/; domain=www.clicksaveandprint.com; HttpOnly; secure
Content-Security-Policy: upgrade-insecure-requests;
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
X-LiteSpeed-Tag: MB_porto_homeslider_4,store,MB,MB_389,MB_tawk-widget,cms_p_75,MB_porto_custom_B_for_header,MB_porto_custom_notice,mfb_p_0,mfb_p_118,C_2,P_32779,P,P_32780,P_32781,P_32782,P_32783,P_32784,P_32785,P_32786,P_32787,P_32788,F
X-LiteSpeed-Cache-Control: public,max-age=86400
Pragma: no-cache
Cache-Control: max-age=0, no-store
Expires: Fri, 24 May 2019 08:57:34 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 135635
Date: Sun, 24 May 2020 08:57:34 GMT
Access-Control-Allow-Origin: *
X-UA-Compatible: IE=edge
Alt-Svc: quic=":443"; ma=2592000; v="43,46", h3-Q043=":443"; ma=2592000, h3-Q046=":443"; ma=2592000, h3-Q050=":443"; ma=2592000, h3-25=":443"; ma=2592000, h3-27=":443"; ma=2592000
That is when running without visiting in the browser, and this is after visiting in a browser:
Code:
curl -Il https://www.clicksaveandprint.com
HTTP/1.1 200 OK
Connection: Keep-Alive
X-Powered-By: PHP/7.2.31
Content-Security-Policy: upgrade-insecure-requests;
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Pragma: no-cache
Cache-Control: max-age=0, no-store
Expires: Fri, 24 May 2019 08:58:09 GMT
Content-Type: text/html; charset=UTF-8
Etag: "835-1590310689;;;"
X-LiteSpeed-Cache: hit,litemage
Date: Sun, 24 May 2020 08:58:14 GMT
Access-Control-Allow-Origin: *
X-UA-Compatible: IE=edge
Alt-Svc: quic=":443"; ma=2592000; v="43,46", h3-Q043=":443"; ma=2592000, h3-Q046=":443"; ma=2592000, h3-Q050=":443"; ma=2592000, h3-25=":443"; ma=2592000, h3-27=":443"; ma=2592000
I notice the Set-Cookie in the first run; would this be the cause?
 
#8
I noticed that when I ran curl a little earlier it came back with a 503 error. Now I'm getting status 200, and when I run the crawler again it sets the pages to caching.
I had tested this numerous times over the course of a week, always with this issue, but now it seems to be working. Very odd, but I'm happy!
 

serpent_driver

Well-Known Member
#9
The Set-Cookie header in your response is only a session cookie, and this cookie shouldn't matter, but that depends on the existing vary cache rules. If a URL is cached there is no session, so a session cookie cannot be set; this is normal. To find out if I am right, request a non-cached page with the browser, with curl and with the crawler script. If you request a URL first with the browser and then with curl and the crawler, and the cache header is always a miss with curl/the crawler script, then this is definitely the reason for the issue. If so, either warm up the cache with curl, but with extra headers, or the crawler script must be modified.
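If it comes to that, a warmup loop with curl could look roughly like this (sitemap URL, cookie and user agent are only placeholders and depend on your setup; grep -P needs PCRE support):
Code:
# Pull all <loc> URLs from the sitemap and request each one with the extra
# headers a real visitor would send. Cookie and user agent are placeholders.
curl -s "https://www.example.com/sitemap.xml" \
  | grep -oP '(?<=<loc>)[^<]+' \
  | while read -r url; do
      cache=$(curl -sI -A "Mozilla/5.0" --cookie "frontend=warmup" "$url" \
              | grep -i 'x-litespeed-cache' || echo "no cache header")
      echo "$url -> $cache"
    done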
 