Mas Andri's Daily Notes

Please use it wisely

Scraping Shopee Marketplace Products Using Python

31 Januari 2020 || 21:22:56 WIB || ClassyID

Shopee is a marketplace that is currently growing in popularity among marketplace sellers.

The Python script is fairly simple, and it is run on a Linux server.

Python is installed by default on most Linux distributions, so we only need to add pip to install the required packages.

Check the installed Python version:

root@xtreamUI:~# python3 --version
Python 3.5.2
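
The script further down merges two dictionaries with the {**a, **b} syntax, which Python supports from version 3.5 onward. The same version check can be done from inside Python. This is only a minimal sketch; the helper name python_is_new_enough is hypothetical and not part of the original script.

```python
import sys


# Hypothetical helper, not part of the original script.
def python_is_new_enough(minimum=(3, 5)):
    """Return True when the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= tuple(minimum)


if __name__ == "__main__":
    # Print the interpreter version and whether it meets the minimum.
    print("Python", sys.version.split()[0], "ok:", python_is_new_enough())
```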

Install pip for Python 3:

apt install python3-pip
Preparing to unpack .../libpython3-dev_3.5.1-3_amd64.deb ...
Unpacking libpython3-dev:amd64 (3.5.1-3) ...
Selecting previously unselected package python3.5-dev.
Preparing to unpack .../python3.5-dev_3.5.2-2ubuntu0~16.04.9_amd64.deb ...
Unpacking python3.5-dev (3.5.2-2ubuntu0~16.04.9) ...
Selecting previously unselected package python3-dev.
Preparing to unpack .../python3-dev_3.5.1-3_amd64.deb ...
Unpacking python3-dev (3.5.1-3) ...
Selecting previously unselected package python3-pip.
Preparing to unpack .../python3-pip_8.1.1-2ubuntu0.4_all.deb ...
Unpacking python3-pip (8.1.1-2ubuntu0.4) ...
Selecting previously unselected package python3-pkg-resources.
Preparing to unpack .../python3-pkg-resources_20.7.0-1_all.deb ...
Unpacking python3-pkg-resources (20.7.0-1) ...
Selecting previously unselected package python3-setuptools.
Preparing to unpack .../python3-setuptools_20.7.0-1_all.deb ...
Unpacking python3-setuptools (20.7.0-1) ...
Selecting previously unselected package python3-wheel.
Preparing to unpack .../python3-wheel_0.29.0-1_all.deb ...
Unpacking python3-wheel (0.29.0-1) ...
Processing triggers for libc-bin (2.23-0ubuntu11) ...
Processing triggers for man-db (2.7.5-1) ...
Setting up libpython3.5:amd64 (3.5.2-2ubuntu0~16.04.9) ...
Setting up libpython3.5-dev:amd64 (3.5.2-2ubuntu0~16.04.9) ...
Setting up libpython3-dev:amd64 (3.5.1-3) ...
Setting up python3.5-dev (3.5.2-2ubuntu0~16.04.9) ...
Setting up python3-dev (3.5.1-3) ...
Setting up python3-pip (8.1.1-2ubuntu0.4) ...
Setting up python3-pkg-resources (20.7.0-1) ...
Setting up python3-setuptools (20.7.0-1) ...
Setting up python3-wheel (0.29.0-1) ...
Processing triggers for libc-bin (2.23-0ubuntu11) ...

Install the required packages with pip:

pip3 install -r req.txt
WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
Collecting beautifulsoup4
  Using cached beautifulsoup4-4.8.2-py3-none-any.whl (106 kB)
Collecting pandas
  Downloading pandas-0.24.2-cp35-cp35m-manylinux1_x86_64.whl (10.0 MB)
     |################################| 10.0 MB 7.7 kB/s
Collecting requests
  Using cached requests-2.22.0-py2.py3-none-any.whl (57 kB)
Collecting lxml
  Downloading lxml-4.5.0-cp35-cp35m-manylinux1_x86_64.whl (5.7 MB)
     |################################| 5.7 MB 11.8 MB/s
Collecting soupsieve>=1.2
  Downloading soupsieve-1.9.5-py2.py3-none-any.whl (33 kB)
Collecting pytz>=2011k
  Using cached pytz-2019.3-py2.py3-none-any.whl (509 kB)
Collecting python-dateutil>=2.5.0
  Using cached python_dateutil-2.8.1-py2.py3-none-any.whl (227 kB)
Collecting numpy>=1.12.0
  Downloading numpy-1.18.1-cp35-cp35m-manylinux1_x86_64.whl (19.9 MB)
     |################################| 19.9 MB 30 kB/s
Collecting idna<2.9,>=2.5
  Using cached idna-2.8-py2.py3-none-any.whl (58 kB)
Collecting chardet<3.1.0,>=3.0.2
  Using cached chardet-3.0.4-py2.py3-none-any.whl (133 kB)
Collecting certifi>=2017.4.17
  Using cached certifi-2019.11.28-py2.py3-none-any.whl (156 kB)
Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1
  Using cached urllib3-1.25.8-py2.py3-none-any.whl (125 kB)
Collecting six>=1.5
  Downloading six-1.14.0-py2.py3-none-any.whl (10 kB)
Installing collected packages: soupsieve, beautifulsoup4, pytz, six, python-dateutil, numpy, pandas, idna, chardet, certifi, urllib3, requests, lxml
Successfully installed beautifulsoup4-4.8.2 certifi-2019.11.28 chardet-3.0.4 idna-2.8 lxml-4.5.0 numpy-1.18.1 pandas-0.24.2 python-dateutil-2.8.1 pytz-2019.3 requests-2.22.0 six-1.14.0 soupsieve-1.9.5 urllib3-1.25.8
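
The contents of req.txt are not shown here. Judging only by the log above, pip collected beautifulsoup4, pandas, requests, and lxml first, so those appear to be the top-level requirements. A quick way to confirm they can be imported might look like the sketch below. The list of module names is an assumption drawn from that log, and missing_modules is a hypothetical helper.

```python
import importlib.util

# Import names for the packages pip collected first in the log above
# (an assumption about what req.txt lists).
# Note that beautifulsoup4 is imported under the name "bs4".
REQUIRED_MODULES = ["bs4", "pandas", "requests", "lxml"]


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("missing:", missing if missing else "none")
```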

Run the shopee.py script:

python3 shopee.py
sukses - Helm Bogo Dewasa JPN MOMO Hitam Doff Google Masker ZONAHELM
sukses - Paket Helm Anti Polusi
sukses - TERBARU !! JAS HUJAN AXIO
sukses - Kain Bungkus Helm anti debu 50x45cm
sukses - Jaket Ninja Hoodie Pria
sukses - Kacamata Goggle Aksesoris Helm Retro Bogo Cosplay
sukses - Helm Bogo Motif Zonahelm
sukses - Helm Bogo Dewasa SNI JPN MOMO Goggle Masker GMD Path
sukses - BEST SELLER ! Helm Retro Hitam Polos SNI Komplit Dewasa , BOGO ASLI + GOOGLE IMPORT + path
sukses - Helm Bogo Hitam Doff SNI Kaca Original
sukses - 1 Set Kancing Helm Bogo
sukses - Helm BOGO FULLSET SNI ( KULIT )
sukses - Helm Retro Dewasa JPN MOMO Hitam Doff Goggle Masker OSBE
sukses - JAKET OUTDOOR WATERPROOF BIG SIZE
sukses - Helm Cargloss Path Goggle Masker WS
sukses - Helm Klasik Kulit Hitam Polos + Kacamata Google Retro SNI Dewasa
sukses - Masker Bolak Balik
sukses - Gesper atau Pengait Tali Helm Bogo
sukses - AKSESORIS HELM BOGO, Kacamata Lokal, Path, Kancing Helm, Bandana, Gesper, Kain Helm
sukses - Helm Bogo Dewasa West Path GMD 03
sukses - Helm Retro Kulit Kacamata Classy
sukses - [COD] Helm bogo Retro DEWASA Karakter Zonahelm WS
sukses - Helm Bogo Retro Klasik Motif Spongebob Kaca Bogo Original+Kacamata lokal SNI
sukses - Jaring Helm Motor Sepeda Tali Pengait
sukses - Tabung Tempat Jas Hujan ZONAHELM
sukses - Kaca Datar OSBE Type G7
sukses - Helm Dewasa Kulit SNI Club Coklat Kaca Bogo bisa pilih warna
sukses - Path Pendek OSBE Mini
sukses - Helm Kulit Klasik SNI ARMY Bogo Original
sukses - Masker Cawang/Masker Anti Polusi
sukses - Helm Dewasa Sepeda Original Zonahelm
sukses - Goggle Mask / Google Masker Osbe
sukses - Tali Bagasi/Jok Motor
sukses - JPN Momo  Helm Bogo Retro Klasik Kaca Riben Pilot
sukses - Bandana Multi fungsi
sukses - Kacamata Goggle Cross 100%
sukses - Jaket Sweater Hoodie ZEROHEROES
sukses - Parfum Helm Real-U Aktive
sukses - PROMO HARGA GROSIR!!! READY STOK!!! Aksesoris Helm Kacamata Google Import
sukses - Helm Cross GIX
sukses - Helm Pilot Kulit Kaca Riben SNI
sukses - Helm JPX Cross
sukses - Helm Retro Bogo NJS Gundulan
sukses - Kacamata Goggle Masker Neo Rainbow
sukses - Goggle Mask OSBE
sukses - Path Cross Snail
sukses - Helm Retro Klasik Motif Spongebob dilengkapi Path SNI
sukses - Jaket Sweater Hoodie West Brook Kuning
sukses - Pinlock Anti Fog
sukses - Sepatu Cross Trail RNL 4 Straps Gesper
sukses - Sepatu Motocross Alpinestars
sukses - Helm Dewasa Kulit SNI Club Coklat Kaca Bogo bisa pilih warna
sukses - Helm dewasa kulit SNI Club Coklat Kaca Bogo bisa pilih warna
sukses - Helm Retro Nazi Kulit Zonahelm
sukses - Kacamata OMS Aksesoris Helm Bogo
sukses - Path Panjang Osbe || Path Transparan || Path AML
sukses - Helm Bogo Retro Classy Goggle Masker Neo Rainbow
sukses - Goggle Crocs Osbe OB 138
sukses - Helm Sepeda Bolt ZONAHELM
sukses - Goggle Cross GC03
sukses - Goggle Masker Osbe Polos Frame - Kaca
sukses - Goggle Cross OSBE FLO/Stabilo Rainbow
sukses - Goggle Cross MGV
sukses - Kaca Bogo Original Warna RAINBOW
sukses - Helm Anak RKZ ZONAHELM
sukses - Helm Dewasa Kulit SNI Club Coklat

The result is a CSV file containing each product's title, description, price, and image links.

And here is the Python script for scraping Shopee product links:
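
Because the script writes its rows with header=False, the CSV has no header line. Reading it back could be sketched as follows. The column order (cate, Nama, Desc, Price, then one column per image link) is an assumption based on the keys the script uses, and read_results is a hypothetical helper.

```python
import csv

# Assumed column order, based on the keys used in the script:
# cate, Nama, Desc, Price, followed by one column per image link.
FIXED_COLUMNS = ["cate", "Nama", "Desc", "Price"]


def read_results(path):
    """Read the header-less result CSV into a list of dicts."""
    products = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if not row:
                continue  # skip blank lines
            product = dict(zip(FIXED_COLUMNS, row))
            # Everything after the fixed columns is treated as an image link.
            product["images"] = [c for c in row[len(FIXED_COLUMNS):] if c]
            products.append(product)
    return products
```

Calling read_results("result-shopee.csv") would then return one dict per scraped product, with the price still stored as text.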

import csv
import json

import requests
from pandas import DataFrame

# Browser-like User-Agent header sent with every request
heder = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36'}

# URL-shopee.csv holds one Shopee product URL per row, in the first column
with open('URL-shopee.csv', 'r') as csvFile:
    reader = csv.reader(csvFile)
    for row in reader:
        url = row[0]
        # Product URLs end in "-i.<shopid>.<itemid>"
        url = url.split("-i.")
        url = url[1]
        api = "https://shopee.co.id/api/v2/item/get?"
        itemid = "itemid=" + url.split(".")[1]
        shopid = "shopid=" + url.split(".")[0]
        furl = api + itemid + "&" + shopid
        res = requests.get(furl, headers=heder)
        data = res.content.decode("utf-8", "ignore")
        anu = json.loads(data)
        nama = anu["item"]["name"]
        price = anu["item"]["price_max"]
        desc = anu["item"]["description"]
        img = anu["item"]["images"]
        # Scale the raw price_max value down by 100000
        price = int(price / 100000)
        # Turn each image ID into a full image URL
        imgs = []
        keys = range(len(img))
        for i in img:
            imgs.append("https://cf.shopee.co.id/file/" + i)

        img_dict = dict(zip(keys, imgs))
        cat = ''
        data = {'cate': cat, 'Nama': nama, 'Desc': desc, 'Price': price}
        # One row: the fixed columns followed by one column per image
        res = {**data, **img_dict}
        index = [0]
        df = DataFrame(res, index=index)
        print("sukses - " + nama)
        # Append the row to the result file. A plain relative path is used
        # because '.\' is not a path separator on Linux.
        df.to_csv('result-shopee.csv', index=None, header=False, mode='a')
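
The URL handling and row building in the script can also be tried without any network access. The sketch below is one possible way to do that, not the script itself: the helper names parse_shopee_url, build_api_url, and build_row are hypothetical, the endpoint is copied from the script as-is, and the sample URL and payload are made up for illustration.

```python
# Hypothetical helpers; these names are not part of the original script.

def parse_shopee_url(url):
    """Return (shopid, itemid) from a URL ending in '-i.<shopid>.<itemid>'."""
    parts = url.split("-i.")
    if len(parts) < 2:
        raise ValueError("not a Shopee product URL: " + url)
    # Use the last "-i." match and drop any query string or fragment.
    tail = parts[-1].split("?")[0].split("#")[0]
    ids = tail.split(".")
    if len(ids) < 2 or not (ids[0].isdigit() and ids[1].isdigit()):
        raise ValueError("could not find shop and item IDs in: " + url)
    return ids[0], ids[1]


def build_api_url(shopid, itemid):
    """Build the same item endpoint URL that the script requests."""
    return ("https://shopee.co.id/api/v2/item/get?"
            "itemid=" + itemid + "&shopid=" + shopid)


def build_row(item):
    """Flatten one "item" object into the row layout the script writes."""
    row = {
        "cate": "",
        "Nama": item["name"],
        "Desc": item["description"],
        "Price": int(item["price_max"] / 100000),
    }
    # Image columns are keyed 0, 1, 2, ... as in the script.
    for position, image_id in enumerate(item["images"]):
        row[position] = "https://cf.shopee.co.id/file/" + image_id
    return row


if __name__ == "__main__":
    # Made-up URL and payload, for illustration only.
    shopid, itemid = parse_shopee_url(
        "https://shopee.co.id/Helm-Cross-GIX-i.12345.67890")
    print(build_api_url(shopid, itemid))
    print(build_row({"name": "Helm Cross GIX", "description": "contoh",
                     "price_max": 15000000000, "images": ["abc"]}))
```

Raising ValueError on a malformed URL lets a caller skip a bad row in URL-shopee.csv instead of stopping the whole run with an IndexError.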

Screenshot of the pip installation and the scraping process

Hope this is useful