Business research usually starts with a list -- brands, competitors, people, products, whatever. Most people begin with Google, trying to discover commonalities, gather basic information, or just find out what all these things are. Then, there is a marathon session of cut and paste, with a hopefully tidy but usually messy document to show for the hours of work.
This post describes a quick Python script that uses the Google Search API to automate the routine parts of the task, giving you more time to analyze and understand the results. Here is what it does:
* Accepts a list of terms you want to research. These can be whatever you like, as long as there is one term per line.
* Uses the Google Search API to return the URL and description of the result Google ranks highest for the term. (This is a bit like the "I'm Feeling Lucky" button, so you will still need to double-check the results.)
* Outputs the results in tab-delimited format so that you can use them in other documents (or scripts).
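The three steps above can be sketched as a small pipeline. This is only an illustration, written for Python 3: read_terms, format_row, run, and fake_lookup are hypothetical names, and fake_lookup stands in for the search call so the sketch runs without network access.

```python
import io


def read_terms(stream):
    """Return one term per non-empty line, with surrounding whitespace removed."""
    return [line.strip() for line in stream if line.strip()]


def format_row(term, url, description):
    """Join the fields with tabs, collapsing whitespace runs inside each field."""
    fields = (term, url, description)
    # Tabs or newlines inside a field would break the tab-delimited layout,
    # so every run of whitespace is reduced to a single space first.
    return "\t".join(" ".join(str(field).split()) for field in fields)


def run(stream, lookup):
    """Look up every term and return the tab-delimited rows."""
    return [format_row(term, *lookup(term)) for term in read_terms(stream)]


def fake_lookup(term):
    # Hypothetical stand-in for the search step: returns (url, description).
    return ("http://example.com/" + term.lower(), "Description of " + term)


if __name__ == "__main__":
    sample = io.StringIO("Python\n\nRuby\n")
    for row in run(sample, fake_lookup):
        print(row)
```

Passing the lookup in as a function keeps the input and output handling separate from the search call, so either side can be swapped out or tested on its own.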
This diagram should give you the basic idea of what the script does:

Before you Start
You will need the following stuff to run this script:
* Python. If you are on a Mac, it is built in. If you are on Windows, you can get it from ActiveState. The script is written for Python 2 (it uses urllib2 and the print statement).
* A JSON processing module for Python. I am using simplejson in this script.
* A list of terms you want to research
Overview of the Google Ajax Search API
The Google Search API has a simple REST interface -- you provide a URL with an encoded parameter (called "q") that has your term, and Google returns a JSON structure representing the results. For example, here is how you would search for "Python":
http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=python
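Typing the URL by hand works for a single word, but terms with spaces or punctuation need to be encoded. Here is a minimal sketch of building the URL, assuming Python 3 (where urlencode lives in urllib.parse); build_search_url and BASE_URL are hypothetical names, and nothing here contacts Google.

```python
from urllib.parse import urlencode

# Endpoint copied from the example URL above.
BASE_URL = "http://ajax.googleapis.com/ajax/services/search/web"


def build_search_url(term):
    """Return the search URL for one term, with the term safely encoded."""
    query = urlencode({"v": "1.0", "q": term})
    return BASE_URL + "?" + query


if __name__ == "__main__":
    print(build_search_url("python"))
    print(build_search_url("Ben & Jerry's"))
```

Letting urlencode do the escaping means an ampersand inside a brand name is not mistaken for a parameter separator.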
If all goes well, you will get back a JSON structure that looks something like this. The interesting stuff is in the "results" structure (I have removed a lot of other stuff for clarity):
{
  "responseData": {
    ...
    "results": [
      {
        "GsearchResultClass": "GwebSearch",
        "cacheUrl": "http://www.google.com/search?q=cache:YSBN_oGSAEYJ:www.python.org",
        "content": "Home page for Python, an interpreted, interactive, object-oriented, extensible programming language. It provides an extraordinary combination of clarity and ...",
        "title": "Python Programming Language -- Official Website",
        "titleNoFormatting": "Python Programming Language -- Official Website",
        "unescapedUrl": "http://www.python.org/",
        "url": "http://www.python.org/",
        "visibleUrl": "www.python.org"
      },
      ...
    ]
  },
  ...
}
Results come back in Google's ranking order, so the first result is the one most likely to fit the search term. Obviously, this can break down in a lot of ways (ambiguous names, obscure terms), but it works surprisingly well for popular brands or products.
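Pulling out the first result can be sketched like this, assuming Python 3 and its built-in json module. SAMPLE is a trimmed copy of the response shown above, first_result is a hypothetical helper, and the top-level "responseData" key is an assumption taken from the lookup the script itself performs. The sketch also returns None when there is nothing to read, which is a guard the original script does not have.

```python
import json

# Trimmed copy of the sample response, wrapped in the "responseData" key
# that the script reads from (an assumption about the full payload).
SAMPLE = """
{
  "responseData": {
    "results": [
      {
        "content": "Home page for Python, an interpreted, interactive, object-oriented, extensible programming language.",
        "titleNoFormatting": "Python Programming Language -- Official Website",
        "url": "http://www.python.org/"
      }
    ]
  }
}
"""


def first_result(response):
    """Return (url, description) for the top hit, or None if there is none."""
    data = response.get("responseData") or {}
    results = data.get("results") or []
    if not results:
        return None
    top = results[0]
    return top["url"], top["content"]


if __name__ == "__main__":
    print(first_result(json.loads(SAMPLE)))
    print(first_result({"responseData": None}))
```

Returning None for an empty or missing result list lets the caller decide whether to skip the term or write a blank row.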
The Code
Now that we have gotten the basics out of the way, here is the code for a script called term2url.py:
#
# This is a quick and dirty script to pull the most likely url and description
# for a list of terms. Here is how you use it:
#
# python term2url.py < {a txt file with a list of terms} > {a tab delimited file of results}
#
# You must install the simplejson module to use it
#
import urllib
import urllib2
import simplejson
import sys

# Read the terms we want to convert into URLs from stdin (redirected on the command line)
terms = sys.stdin.readlines()

# Now loop through each term in the list and return the highest ranking result
for term in terms:
    # Define the query to pass to Google Search API
    query = urllib.urlencode({'q': term.rstrip("\n")})
    url = "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s" % (query)
    # Fetch the results and create JSON structure
    search_results = urllib2.urlopen(url)
    json = simplejson.loads(search_results.read())
    # Process the results by pulling the first record, which has the best match
    results = json['responseData']['results']
    for r in results[:1]:
        url = r['url']
        desc = r['content'].encode('ascii', 'replace')
        # Print the results to stdout. Use redirect to capture the output
        print "%s\t%s\t%s" % (term.rstrip("\n"), url, desc)
You can download a formatted version from the resources section at the bottom of the post.
Running the Code
Assuming you have gotten Python and the simplejson module installed correctly, running the script is a snap. From the command line, type:
python term2url.py < input file name > output file name
In my example, the input file is called brands.txt and the output file is called brands_data.txt.
Once you have run the script (it may take a while), you can open the data file with a spreadsheet and format it.
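If you would rather keep working in a script than in a spreadsheet, the tab-delimited output can be read back with the csv module. This is a sketch under a few assumptions: it targets Python 3, read_results is a hypothetical helper, each row is taken to have exactly three fields (term, URL, description), and the sample rows are invented for the demo.

```python
import csv
import io


def read_results(stream):
    """Parse tab-delimited rows into dicts keyed by term, url, and description."""
    # QUOTE_NONE keeps quotation marks inside descriptions as literal text.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for fields in reader:
        if len(fields) != 3:
            continue  # Skip blank or malformed lines rather than guessing.
        term, url, description = fields
        rows.append({"term": term, "url": url, "description": description})
    return rows


if __name__ == "__main__":
    # Invented sample rows in the same layout the script writes.
    sample = io.StringIO(
        "Python\thttp://www.python.org/\tHome page for Python\n"
        "\n"
        "Ruby\thttp://www.ruby-lang.org/\tA dynamic language\n"
    )
    for row in read_results(sample):
        print(row["term"], "->", row["url"])
```

To read the real output instead of the sample, pass an open file, for example open("brands_data.txt", newline=""), in place of the StringIO object.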