This repository holds the files for my personal website, accessible at tdemareuil.github.io.
It also includes a SQL queries translator (Hive ↔︎ Presto ↔︎ Vertica) that I developed as a side project next to my job at Criteo. It's available at: https://tdemareuil.github.io/translate_sql.html.
- Faster conversion of SQL queries between Hive ↔︎ Presto ↔︎ Vertica
- Less hurdle checking the small language details manually, less errors
- Mainly for data analysis purposes, not prod
- Some array functions and concatenation operations still aren’t supported by the script
- I tried to make it robust to a lot of situations, but sometimes you still might need to check the output of conversion
- Translating more complex, production-ready queries, and potentially in batches, is a very complicated task and could be an entire project of its own.
|
Hive |
Presto |
Vertica |
Translation available |
---|---|---|---|---|
Operators & common functions |
|
|
|
✅ |
|
|
|
✅ |
|
Supports |
Supports |
Doesn't support |
✅ |
|
Supports |
Supports |
Supports |
✅ |
|
Doesn't support |
Doesn't support |
Supports |
✅ |
|
Use |
Use |
Use |
✅ |
|
To compute any quantile, use |
To compute any quantile, use |
To compute any quantile, use |
✅ |
|
Data types |
Dividing |
Dividing |
Dividing |
✅ |
Supports |
Doesn't support |
Doesn't support |
✅ |
|
Cannot use |
Cannot use |
Can use |
✅ |
|
Syntax |
Can start an identifier with numbers |
Identifiers that start with numbers must be quoted using double quotes: |
Identifiers that start with numbers must be quoted using double quotes: |
✅ |
Doesn't support |
Doesn't support |
Supports |
✅ |
|
Uses |
Uses |
Uses |
✅ |
|
Cannot use column index in |
Can use column index in |
Can use column index in |
✅ |
|
Array operations |
Arrays are indexed starting from 0 |
Arrays are indexed starting from 1 |
Arrays are indexed starting from 0 |
✅ |
Aggregate values as an array: use |
Aggregate values as an array: use |
Use |
✅ |
|
Standalone array deduplication: use |
Standalone array deduplication: use |
Seems like we cannot deduplicate an array without cross-joining. |
🚫 |
|
Forced to use |
All the main array functions (min, max, sum, avg, etc.) start with |
All the main array functions (min, max, sum, avg, etc.) start with |
🚫 |
|
No direct equivalent |
Complex array functions such as |
No direct equivalent |
🚫 |
|
|
|
|
✅ |
|
Use |
Use |
Should be able to use |
✅ |
|
Concatenate strings with |
Concatenate strings with |
Concatenate strings with |
🚫 |
|
|
|
|
✅ |
|
|
|
|
✅ |
|
|
|
|
✅ |
|
Date & Time |
Use |
Use |
Use |
✅ |
|
|
|
✅ |
|
Supports |
Doesn't support |
Doesn't support |
✅ |
|
You can extract date parts with |
You can extract date parts with |
You can extract date parts with |
✅ |
|
|
|
|
✅ |
|
|
|
|
✅ |
|
|
|
|
✅ |
Usually, to run Python code in the browser, you need to set-up a web app (e.g. Flask) and to host it on a server (e.g. Heroku or PythonAnywhere). However, it's possible to run a short Python script directly in the browser's front end, even more easily, thanks to Pyodide.
Pyodide is a Python distribution for the browser and Node.js based on WebAssembly. It comes with a robust Javascript ⟺ Python foreign function interface so that you can freely mix these two languages in your code. See documentation.
How does Pyodide work in practice?
→ You just have to include the Pyodide <script> at the start of your HTML file. Then you can access functions such as
runPython
. You can access variables from the Python scope from JS, and vice versa. All the main Python packages are supported, and you can import others manually.
You can access my small tool at: https://tdemareuil.github.io/translate_sql.html.
Of course, the UI is very simple (pure HTML), and it's not secure at all (you can actually see if the full Python code if you just inspect the HTML..!) — but it's a first step!