Have you ever wondered how people create URL shortening websites? They do it using common sense. You heard that right. I too thought it was a very big task, but after thinking about it a bit, I realized that simple mathematical concepts can be used to write beautiful applications. What is the link between mathematics and URL shortening? That is what we are going to unveil in this article.
In a nutshell, a URL shortening service is built on two things:
- A string mapping algorithm (Base62) that maps each stored URL's numeric database ID to a short string
- A simple web framework (Flask, Tornado) that redirects a short URL to the original URL
URL shortening has two obvious advantages:
- Short URLs are easy to remember and maintain.
- They can be used where text length is restricted, e.g. on Twitter.
Technique of URL shortening
There is no special URL shortening algorithm. Under the hood, every record stored in the database is allocated a primary key (PK). That PK is passed to an algorithm that generates a short string. We then indirectly map that short string to the URL the customer registers with us.
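The "algorithm" here is plain positional notation. The minimal sketch below uses the hypothetical helper names base62_digits and from_digits. It shows how repeated division by 62 splits a primary key into base-62 digit values, and how multiplying back recovers the key. With the 0-9, a-z, A-Z alphabet used later in this article, the digits 26, 0, 56 of the key 100000 become the characters q, 0, U, because 100000 = 26·62² + 0·62 + 56.

```python
def base62_digits(pk: int) -> list:
    """Split a non-negative primary key into base-62 digit values, most significant first."""
    if pk < 0:
        raise ValueError("primary keys are non-negative")
    if pk == 0:
        return [0]
    digits = []
    while pk:
        pk, remainder = divmod(pk, 62)
        digits.append(remainder)  # remainders arrive least significant first
    return digits[::-1]


def from_digits(digits: list) -> int:
    """Rebuild the key from its digits: d0*62**(k-1) + ... + d(k-1)*62**0."""
    value = 0
    for d in digits:
        value = value * 62 + d
    return value


digits = base62_digits(100000)
print(digits)               # [26, 0, 56]
print(from_digits(digits))  # 100000
```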
I visited Bit.ly and submitted my blog link http://www.impythonist.wordpress.com, and it returned a short link.
One question comes to mind here: how do they reduce a lengthy string to a short one? They don't actually reduce the size of the original link; they just add a layer of abstraction. The steps everyone follows are:
- Insert a record with the URL into the database
- Use the returned record ID to generate the short string
- Pass the short string back to the customer
- Whenever a request arrives, extract the short string from the URL and regenerate the database record ID -> fetch the URL -> redirect to the website
That's it. Generating a short string from a large number is very simple with Base62 encoding. Whenever a request comes to our website, we get the number back by decoding the short string in the URL. We then use that ID to fetch the record from the database and redirect to the stored URL.
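The four steps fit in a few lines if a Python list stands in for the database. This is a hypothetical, in-memory sketch, not the Flask/SQLite version built later. InMemoryShortener, encode and decode are illustrative names, and a record's list position plus one plays the role of the auto-generated primary key.

```python
import string

# Same alphabet order as the article's encoder: 0-9, then a-z, then A-Z.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode(n: int) -> str:
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, r = divmod(n, 62)
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))


def decode(s: str) -> int:
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)  # raises ValueError for unknown characters
    return n


class InMemoryShortener:
    """Hypothetical stand-in for the database: list index + 1 acts as the record ID."""

    def __init__(self):
        self._urls = []

    def shorten(self, url: str) -> str:
        self._urls.append(url)       # step 1: insert the record
        record_id = len(self._urls)  # IDs start at 1, like SQLite rowids
        return encode(record_id)     # steps 2-3: encode the ID and hand it back

    def resolve(self, short: str):
        try:
            record_id = decode(short)  # step 4: short string -> record ID
        except ValueError:
            return None                # not a Base62 string
        if 1 <= record_id <= len(self._urls):
            return self._urls[record_id - 1]
        return None                    # no such record


store = InMemoryShortener()
code = store.shorten("http://www.impythonist.wordpress.com")
print(code)                 # 1
print(store.resolve(code))  # http://www.impythonist.wordpress.com
```

A real service swaps the list for a table whose primary key the database assigns, which is exactly what the SQLite version does.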
Let us build one such URL shortener in Python
The code for this project is available in my Git repo: https://github.com/narenaryan/Pyster
In practice, our URL shortening service needs three ingredients:
- Base62 Encoder and Decoder
- Flask for handling requests and redirects
- SQLite3 as the database
If you already know how to convert Base10 to Base64 or Base62 (or any other base), you can proceed with me. Otherwise, read up on base conversion first.
I am interested only in Base62 here because I need to generate strings made of the characters [a-z], [A-Z] and [0-9]. The encoder maps an integer to a string, and the decoder recovers the integer from a given string. They are a function and its inverse. This is the Base62 encoder and decoder in Python:
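As a quick refresher, converting from base 10 to base b means repeatedly dividing by b and reading the remainders in reverse order. The minimal sketch below uses the hypothetical name to_base and the same 62-symbol alphabet, so any base from 2 to 62 works. Python's built-in int(s, base) only parses bases up to 36, which is one reason Base62 needs a custom alphabet.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # 62 symbols


def to_base(num: int, b: int) -> str:
    """Convert a non-negative base-10 integer to a string in base b (2 <= b <= 62)."""
    if not 2 <= b <= len(ALPHABET):
        raise ValueError("base must be between 2 and 62")
    if num < 0:
        raise ValueError("num must be non-negative")
    if num == 0:
        return ALPHABET[0]
    out = []
    while num:
        num, r = divmod(num, b)  # r is the next digit, least significant first
        out.append(ALPHABET[r])
    return "".join(reversed(out))


print(to_base(255, 2))   # 11111111
print(to_base(255, 16))  # ff
print(to_base(255, 62))  # 47  (255 = 4*62 + 7)
```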
    import string

    # 62 symbols: 0-9, then a-z, then A-Z
    BASE = string.digits + string.ascii_lowercase + string.ascii_uppercase


    def toBase62(num, b=62):
        """Encode a non-negative integer as a string in base b (2 <= b <= 62)."""
        if b <= 1 or b > 62:
            raise ValueError('base must be between 2 and 62')
        # divmod keeps the arithmetic exact, even for very large IDs
        num, r = divmod(num, b)
        res = BASE[r]
        while num:
            num, r = divmod(num, b)
            res = BASE[r] + res
        return res


    def toBase10(num, b=62):
        """Decode a base-b string back to the integer it encodes."""
        res = 0
        for ch in num:
            res = b * res + BASE.index(ch)  # ValueError on characters outside BASE
        return res
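A decoder that looks characters up with str.find gets -1 for any character outside the alphabet, so a mistyped short URL silently decodes to a wrong ID. The sketch below uses the hypothetical names to_base62 and to_base10_strict. It decodes through a lookup table and raises ValueError instead, then checks that the encoder and decoder really are inverses on the first 10,000 integers.

```python
import string

BASE62 = string.digits + string.ascii_lowercase + string.ascii_uppercase
# Precomputed table: character -> digit value, O(1) per lookup.
DIGIT_VALUE = {ch: i for i, ch in enumerate(BASE62)}


def to_base62(num: int) -> str:
    if num < 0:
        raise ValueError("num must be non-negative")
    res = BASE62[num % 62]
    num //= 62
    while num:
        res = BASE62[num % 62] + res
        num //= 62
    return res


def to_base10_strict(s: str) -> int:
    """Decode Base62, rejecting characters outside the alphabet instead of using -1."""
    if not s:
        raise ValueError("empty string")
    res = 0
    for ch in s:
        if ch not in DIGIT_VALUE:
            raise ValueError(f"invalid Base62 character: {ch!r}")
        res = res * 62 + DIGIT_VALUE[ch]
    return res


# Encoder and decoder are inverse functions of each other.
assert all(to_base10_strict(to_base62(n)) == n for n in range(10000))
print(to_base62(61), to_base62(62))  # Z 10
```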
Start by creating an SQLite database named urls.db in the app root folder:
    $ sqlite3 urls.db
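The small sketch below shows how the primary key that feeds the encoder gets assigned. It runs against an in-memory SQLite database, so urls.db is untouched. It creates a WEB_URL table like the one in main.py, reads cursor.lastrowid after each insert, and shows that SQLite only accepts AUTOINCREMENT on a column declared exactly as INTEGER PRIMARY KEY.

```python
import sqlite3

# In-memory database so the sketch does not touch urls.db.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE WEB_URL(
           ID INTEGER PRIMARY KEY AUTOINCREMENT,
           URL TEXT NOT NULL
       )"""
)

ids = []
for url in ("http://example.com/a", "http://example.com/b"):
    cur = conn.execute("INSERT INTO WEB_URL (URL) VALUES (?)", (url,))
    ids.append(cur.lastrowid)  # the ID SQLite assigned to the new row
conn.commit()
print(ids)  # [1, 2]

# "INT PRIMARY KEY AUTOINCREMENT" is rejected: the type must be spelled INTEGER.
bad_rejected = False
try:
    conn.execute("CREATE TABLE BAD(ID INT PRIMARY KEY AUTOINCREMENT, URL TEXT)")
except sqlite3.OperationalError as exc:
    bad_rejected = True
    print("rejected:", exc)
```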
Now I create main.py for the Flask app, plus a template file.
    # main.py
    import sqlite3
    import string
    from sqlite3 import OperationalError
    from urllib.parse import urlparse

    from flask import Flask, redirect, render_template, request

    host = 'http://localhost:5000/'
    BASE = string.digits + string.ascii_lowercase + string.ascii_uppercase


    # Assuming urls.db is in your app root folder
    def table_check():
        # AUTOINCREMENT is only allowed on a column declared INTEGER PRIMARY KEY
        create_table = """
            CREATE TABLE WEB_URL(
                ID INTEGER PRIMARY KEY AUTOINCREMENT,
                URL TEXT NOT NULL
            );
        """
        with sqlite3.connect('urls.db') as conn:
            cursor = conn.cursor()
            try:
                cursor.execute(create_table)
            except OperationalError:
                pass  # table already exists


    # Base62 Encoder and Decoder
    def toBase62(num, b=62):
        if b <= 1 or b > 62:
            raise ValueError('base must be between 2 and 62')
        num, r = divmod(num, b)
        res = BASE[r]
        while num:
            num, r = divmod(num, b)
            res = BASE[r] + res
        return res


    def toBase10(num, b=62):
        res = 0
        for ch in num:
            res = b * res + BASE.index(ch)  # ValueError on characters outside BASE
        return res


    app = Flask(__name__)


    # Home page where the user enters a URL to shorten
    @app.route('/', methods=['GET', 'POST'])
    def home():
        if request.method == 'POST':
            original_url = request.form.get('url')
            if urlparse(original_url).scheme == '':
                original_url = 'http://' + original_url
            with sqlite3.connect('urls.db') as conn:
                cursor = conn.cursor()
                # A ? placeholder instead of string formatting prevents SQL injection
                result_cursor = cursor.execute(
                    'INSERT INTO WEB_URL (URL) VALUES (?)', (original_url,))
                encoded_string = toBase62(result_cursor.lastrowid)
            return render_template('home.html', short_url=host + encoded_string)
        return render_template('home.html')


    @app.route('/<short_url>')
    def redirect_short_url(short_url):
        redirect_url = 'http://localhost:5000'
        try:
            decoded_id = toBase10(short_url)
        except ValueError:
            return redirect(redirect_url)  # not a valid Base62 string
        with sqlite3.connect('urls.db') as conn:
            cursor = conn.cursor()
            row = cursor.execute(
                'SELECT URL FROM WEB_URL WHERE ID = ?', (decoded_id,)).fetchone()
            if row is not None:
                redirect_url = row[0]  # fetchone() returns a tuple, not a string
        return redirect(redirect_url)


    if __name__ == '__main__':
        # Create the WEB_URL table if it does not exist yet
        table_check()
        app.run(debug=True)
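Building SQL with % string formatting is fragile. A URL containing a single quote, which is legal in a query string, breaks the statement, and crafted input can inject SQL. The standalone sketch below, run on an in-memory database, compares that approach with sqlite3's ? placeholders. It also shows that fetchone() returns a row tuple, so the URL itself is row[0].

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE WEB_URL(ID INTEGER PRIMARY KEY AUTOINCREMENT, URL TEXT NOT NULL)")

url = "http://example.com/?q=it's"  # a single quote is legal in a URL

# String formatting splices the quote into the SQL text and breaks the statement.
formatted_failed = False
try:
    conn.execute("INSERT INTO WEB_URL (URL) VALUES ('%s')" % url)
except sqlite3.OperationalError as exc:
    formatted_failed = True
    print("formatted query failed:", exc)

# A ? placeholder passes the value separately from the SQL, so any characters are safe.
row_id = conn.execute("INSERT INTO WEB_URL (URL) VALUES (?)", (url,)).lastrowid
stored = conn.execute("SELECT URL FROM WEB_URL WHERE ID = ?", (row_id,)).fetchone()
print(stored)     # ("http://example.com/?q=it's",)
print(stored[0])  # http://example.com/?q=it's
```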
- We have a Base62 encoder and decoder.
- We have two view functions: home and redirect_short_url.
- home ('/') returns the home page and also inserts the original URL into the database.
- redirect_short_url ('/<short_url>') receives the request for a short URL, decodes it back to a record ID, and redirects to the original URL. If you read the code carefully, you can easily grasp how it works.
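home() prepends http:// when the submitted URL has no scheme. Without it, a value like example.com would be treated as a relative path on our own site when we redirect to it. Below is a standalone sketch of that check using Python 3's urllib.parse.urlparse. normalize_url is a hypothetical name, and stripping whitespace is an extra step the article's code does not do. Treat the check as a heuristic: a scheme-less input such as localhost:5000 may be parsed as having the scheme localhost.

```python
from urllib.parse import urlparse


def normalize_url(raw: str) -> str:
    """Prefix http:// when the submitted URL has no scheme, mirroring home()."""
    raw = raw.strip()  # extra step, not in the article's code
    if urlparse(raw).scheme == "":
        return "http://" + raw
    return raw


print(normalize_url("impythonist.wordpress.com"))             # http://impythonist.wordpress.com
print(normalize_url("https://github.com/narenaryan/Pyster"))  # https://github.com/narenaryan/Pyster
```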
You can take a look at the template here: https://raw.githubusercontent.com/narenaryan/Pyster/master/templates/home.html
The project structure is simple: main.py and urls.db sit in the app root folder, and the template lives at templates/home.html.
Now run the Flask app. It listens on port 5000 by default.
    $ python main.py
     * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
     * Restarting with stat
    ...
If you visit http://localhost:5000 in your browser, you will see the home page with a form for entering a URL.
Now enter a URL to shorten and click submit. The app inserts it into the database and shows the generated short link. In my case it was http://localhost:5000/f. The string is very short at first, but it grows gradually as more URLs are registered, e.g. 11Qxd.
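The growth is logarithmic: k characters can represent 62^k IDs (0 to 62^k - 1), so five characters such as 11Qxd already cover more than 900 million rows. The small sketch below checks these numbers. to_base62 and from_base62 are hypothetical names, and the alphabet order matches the article's encoder. With that ordering, f is the digit value 15, so /f corresponds to record ID 15.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def to_base62(n: int) -> str:
    out = ALPHABET[n % 62]
    n //= 62
    while n:
        out = ALPHABET[n % 62] + out
        n //= 62
    return out


def from_base62(s: str) -> int:
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n


print(to_base62(15))         # f
print(from_base62("11Qxd"))  # 15216611

# Largest ID that still fits in k characters: length grows only with log62(ID).
for k in range(1, 7):
    print(k, 62**k - 1)
```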