Tuesday, December 13, 2011

Scraping

lxml is a useful Python library for scraping. Here is an example of a scraping script, preceded by the commands that install its dependencies:

pip install requests
pip install lxml

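#!/usr/bin/python
# Python 2 script: fetch a page and print the text of each child of one element.
import requests
from lxml import html

r = requests.get('https://www.google.com/')
# Parse the raw response bytes into an HTML element tree
tree = html.fromstring(r.content)
# Raises KeyError if the page has no element with id "prm"
elements = tree.get_element_by_id("prm")
for el in elements:
    print el.text_content()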

Note: It works with Python 2.6.1 (it did not work with Python 3). LXML specs
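
On Python 3, print is a function, so the script above needs small changes. The following is a minimal sketch of the same idea, not a tested port of the original script. The helper name child_texts and the inline SAMPLE_PAGE are assumptions made for illustration, so that the example runs without a network connection. It also handles the KeyError that get_element_by_id raises when the id is missing.

```python
from lxml import html


def child_texts(page_source, element_id):
    """Return the text of each child of the element with the given id.

    child_texts is a hypothetical helper, not part of lxml.
    """
    tree = html.fromstring(page_source)
    try:
        container = tree.get_element_by_id(element_id)
    except KeyError:
        # get_element_by_id raises KeyError when no element has that id
        return []
    # Iterating over an element yields its direct children
    return [child.text_content().strip() for child in container]


# Made-up page used in place of a live request
SAMPLE_PAGE = """
<html><body>
  <div id="prm"><p>First promo</p><p>Second <b>promo</b></p></div>
</body></html>
"""

for text in child_texts(SAMPLE_PAGE, "prm"):
    print(text)
```

In a real script, page_source would come from requests.get(url).content, as in the example above.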
