Some issues with product recaching by the crawler

AndreyPopov

Well-Known Member
#1
a) By default, OpenCart has THREE paths to a product page:

1. by product_id only: /index.php?route=product/product&product_id=41
2. by category_id (category path): /index.php?route=product/product&path=20_27&product_id=41
3. by manufacturer_id: /index.php?route=product/product&manufacturer_id=8&product_id=41

The crawler algorithm covers paths 1 and 2; path 3 (by manufacturer_id) was forgotten!
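As a rough illustration of the three URL forms listed above, here is a standalone sketch that builds them by plain string concatenation. The function name `productUrlVariants` and its parameters are made up for this example and are not part of OpenCart or the LSCache module; inside OpenCart you would call `$this->url->link()` instead. Skipping the manufacturer URL when the id is 0 is also an assumption (0 taken to mean "no manufacturer").

```php
<?php
// Hypothetical helper (not part of OpenCart or the LSCache module):
// returns the default URL forms for one product.
function productUrlVariants(string $base, int $productId, array $categoryPaths, int $manufacturerId): array
{
    $route = 'index.php?route=product/product';
    $urls = array();

    // 1. product_id only
    $urls[] = $base . $route . '&product_id=' . $productId;

    // 2. one URL per category path, e.g. "20_27"
    foreach ($categoryPaths as $path) {
        $urls[] = $base . $route . '&path=' . $path . '&product_id=' . $productId;
    }

    // 3. by manufacturer_id (assumption: 0 means "no manufacturer", so skip it)
    if ($manufacturerId > 0) {
        $urls[] = $base . $route . '&manufacturer_id=' . $manufacturerId . '&product_id=' . $productId;
    }

    return $urls;
}
```

Calling `productUrlVariants('/', 41, array('20_27'), 8)` returns the three example URLs from the list, so a product with one category path costs the crawler three requests instead of one.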

b) With a huge number of products (for example, more than 6000), the $urls array exceeds the PHP memory limit and the crawler stops!



That's why I replaced the following code in
catalog/controller/extension/module/lscache.php

PHP:
        echo 'recache product urls...' . ($cli ? '' : '<br>') . PHP_EOL;
        foreach ($this->model_catalog_product->getProducts() as $result) {
            foreach ($this->model_catalog_product->getCategories($result['product_id']) as $category) {
                if (isset($categoryPath[$category['category_id']])) {
                    $urls[] = $this->url->link('product/product', 'path=' . $categoryPath[$category['category_id']] . '&product_id=' . $result['product_id']);
                }
            }

            $urls[] = $this->url->link('product/product', 'product_id=' . $result['product_id']);
        }

        $this->crawlUrls($urls, $cli);

with this:

PHP:
        echo 'recache product urls...' . ($cli ? '' : '<br>') . PHP_EOL;
        $UrlsCount = 0;      // number of URLs collected in the current part
        $UrlsCountCount = 0; // number of parts crawled so far
        $this->load->model('catalog/manufacturer');
        foreach ($this->model_catalog_product->getProducts() as $result) {
            foreach ($this->model_catalog_product->getCategories($result['product_id']) as $category) {
                if(isset( $categoryPath[$category['category_id']] )){
                    $urls[] = $this->url->link('product/product', 'path=' . $categoryPath[$category['category_id']] . '&product_id=' . $result['product_id']);
                    $UrlsCount++;
                }
            }

            $urls[] = $this->url->link('product/product', 'manufacturer_id=' . $result['manufacturer_id'] . '&product_id=' . $result['product_id']);
            $UrlsCount++;

            $urls[] = $this->url->link('product/product', 'product_id=' . $result['product_id']);
            $UrlsCount++;
            if ( $UrlsCount > 4096 ) {
                $UrlsCountCount++;
                echo 'recache '. $UrlsCountCount . ' part of product urls...' . ($cli ? '' : '<br>') . PHP_EOL;
                $this->crawlUrls($urls, $cli);
                $urls = array();
                $UrlsCount = 0;
            }
        }

        // Crawl whatever is left over as the final part.
        $UrlsCountCount++;
        echo 'recache '. $UrlsCountCount . ' part of product urls...' . ($cli ? '' : '<br>') . PHP_EOL;
        $this->crawlUrls($urls, $cli);
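The batching idea in the replacement above can also be written without manual counters. This is only a sketch of the technique, not code from the LSCache module: `batchUrls` and `demoUrls` are hypothetical helpers, and the `echo` stands in for the call to `$this->crawlUrls($urls, $cli)`.

```php
<?php
// Hypothetical helper: groups any iterable of URLs into arrays of at most
// $batchSize items, so only one batch is held in memory at a time.
function batchUrls(iterable $urls, int $batchSize): Generator
{
    $batch = array();
    foreach ($urls as $url) {
        $batch[] = $url;
        if (count($batch) >= $batchSize) {
            yield $batch;
            $batch = array();
        }
    }
    // Emit the last, partial batch (if any).
    if (!empty($batch)) {
        yield $batch;
    }
}

// Stand-in for the URL source; in the module this would be the product loop.
function demoUrls(int $products): Generator
{
    for ($id = 1; $id <= $products; $id++) {
        yield 'index.php?route=product/product&product_id=' . $id;
    }
}

$part = 0;
foreach (batchUrls(demoUrls(10), 4) as $batch) {
    $part++;
    // In the module, $this->crawlUrls($batch, $cli) would go here.
    echo 'recache ' . $part . ' part of product urls (' . count($batch) . ' urls)' . PHP_EOL;
}
```

With 10 demo URLs and a batch size of 4 this gives parts of 4, 4 and 2 URLs. The last partial part is handled in one place, so the part number cannot get out of step with the number of crawl calls.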
 

AndreyPopov

Well-Known Member
#2
After some tests under real heavy-load conditions, I found that 4096 URLs in the $urls array can also exceed the PHP memory limit.

The problem is $categoryPath, which also requires additional memory.

I decided to reduce the $UrlsCount limit to 2048. Testing...


PHP:
        echo 'recache product urls...' . ($cli ? '' : '<br>') . PHP_EOL;
        $UrlsCount = 0;      // number of URLs collected in the current part
        $UrlsCountCount = 0; // number of parts crawled so far
        $this->load->model('catalog/manufacturer');
        foreach ($this->model_catalog_product->getProducts() as $result) {
            foreach ($this->model_catalog_product->getCategories($result['product_id']) as $category) {
                if(isset( $categoryPath[$category['category_id']] )){
                    $urls[] = $this->url->link('product/product', 'path=' . $categoryPath[$category['category_id']] . '&product_id=' . $result['product_id']);
                    $UrlsCount++;
                }
            }

            $urls[] = $this->url->link('product/product', 'manufacturer_id=' . $result['manufacturer_id'] . '&product_id=' . $result['product_id']);
            $UrlsCount++;

            $urls[] = $this->url->link('product/product', 'product_id=' . $result['product_id']);
            $UrlsCount++;
            if ( $UrlsCount > 2048 ) {
                $UrlsCountCount++;
                echo 'recache '. $UrlsCountCount . ' part of product urls...' . ($cli ? '' : '<br>') . PHP_EOL;
                $this->crawlUrls($urls, $cli);
                $urls = array();
                $UrlsCount = 0;
            }
        }

        // Crawl the remaining URLs as the final part.
        if ( $UrlsCountCount > 0 ) {
            $UrlsCountCount++;
            echo 'recache '. $UrlsCountCount . ' part of product urls...' . ($cli ? '' : '<br>') . PHP_EOL;
        }
        $this->crawlUrls($urls, $cli);
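Since the safe part size depends on how much memory $categoryPath and everything else already use, one possible alternative to a hard-coded 2048 is to flush when memory usage gets close to the limit. The sketch below is untested against the module itself: `iniSizeToBytes` and `memoryNearLimit` are hypothetical helpers, and the 0.7 threshold is an arbitrary assumption. Only `ini_get()` and `memory_get_usage()` are real PHP functions.

```php
<?php
// Hypothetical helper: converts a php.ini size such as "128M" to bytes.
// Returns -1 when no limit is set.
function iniSizeToBytes(string $value): int
{
    $value = trim($value);
    if ($value === '' || $value === '-1') {
        return -1;
    }
    $number = (int) $value;                 // "128M" casts to 128
    $unit = strtolower(substr($value, -1)); // trailing K, M or G, if any
    switch ($unit) {
        case 'g':
            return $number * 1024 * 1024 * 1024;
        case 'm':
            return $number * 1024 * 1024;
        case 'k':
            return $number * 1024;
        default:
            return $number;
    }
}

// Hypothetical check: true when usage exceeds $fraction of the limit.
function memoryNearLimit(int $usedBytes, int $limitBytes, float $fraction = 0.7): bool
{
    if ($limitBytes <= 0) {
        return false; // no limit configured
    }
    return $usedBytes > $limitBytes * $fraction;
}

$limit = iniSizeToBytes((string) ini_get('memory_limit'));
if (memoryNearLimit(memory_get_usage(), $limit)) {
    echo 'time to crawl the current part and reset $urls' . PHP_EOL;
}
```

In the product loop, the condition could then become something like `$UrlsCount > 2048 || memoryNearLimit(memory_get_usage(), $limit)`, so the fixed limit stays as an upper bound and the memory check acts as a safety net.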
 