Personal website

This repository holds the files for my personal website, accessible at tdemareuil.github.io.

SQL translator

It also includes a SQL queries translator (Hive ↔︎ Presto ↔︎ Vertica) that I developed as a side project next to my job at Criteo. It's available at: https://tdemareuil.github.io/translate_sql.html.

Goals

Faster conversion of SQL queries between Hive ↔︎ Presto ↔︎ Vertica
Less hurdle checking the small language details manually, less errors
Mainly for data analysis purposes, not prod

Possible improvements

Some array functions and concatenation operations still aren’t supported by the script
I tried to make it robust to a lot of situations, but sometimes you still might need to check the output of conversion
Translating more complex, production-ready queries, and potentially in batches, is a very complicated task and could be an entire project of its own.

Detailed capabilities

User Interface

Usually, to run Python code in the browser, you need to set-up a web app (e.g. Flask) and to host it on a server (e.g. Heroku or PythonAnywhere). However, it's possible to run a short Python script directly in the browser's front end, even more easily, thanks to Pyodide.

Pyodide is a Python distribution for the browser and Node.js based on WebAssembly. It comes with a robust Javascript ⟺ Python foreign function interface so that you can freely mix these two languages in your code. See documentation.

How does Pyodide work in practice?

→ You just have to include the Pyodide <script> at the start of your HTML file. Then you can access functions such as runPython. You can access variables from the Python scope from JS, and vice versa. All the main Python packages are supported, and you can import others manually.

You can access my small tool at: https://tdemareuil.github.io/translate_sql.html.

Of course, the UI is very simple (pure HTML), and it's not secure at all (you can actually see if the full Python code if you just inspect the HTML..!) — but it's a first step!

Name		Name	Last commit message	Last commit date
Latest commit History 50 Commits
assets		assets
other template		other template
other		other
README.md		README.md
index.html		index.html
t.png		t.png
translate_sql.html		translate_sql.html

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Personal website

SQL translator

Goals

Possible improvements

Detailed capabilities

User Interface

About

Uh oh!

Releases

Packages

Uh oh!

Languages

	Hive	Presto	Vertica	Translation available
Operators & common functions	`LATERAL VIEW EXPLODE (original_column) t AS new_column`, with small (but very tricky) syntax differences when you explode an array, a map or an array of structs	`CROSS JOIN UNNEST(original_column) AS t (new_column)`, with small (but very tricky) syntax differences when you unnest an array, a map or an array of structs	`CROSS JOIN UNNEST(original_column) AS t (new_column)`, with small (but very tricky) syntax differences when you unnest an array, a map or an array of structs	✅
	`LATERAL VIEW OUTER EXPLODE` (to keep rows with empty array/map)	`LEFT JOIN UNNEST` (to keep rows with empty array/map)	`LEFT JOIN UNNEST` (to keep rows with empty array/map)	✅
	Supports `IF()` function (and `CASE WHEN`)	Supports `IF()` function (and `CASE WHEN`)	Doesn't support `IF()` function → use `CASE WHEN`	✅
	Supports `LIKE` and `RLIKE` (regex), not `ILIKE` (case insensitive)	Supports `LIKE` only	Supports `LIKE` and `ILIKE` (case insensitive), not `RLIKE` (regex)	✅
	Doesn't support `IFNULL` function → use `COALESCE`	Doesn't support `IFNULL` function → use `COALESCE`	Supports `IFNULL` function (and `COALESCE`). Also supports `NULLIFZERO` and `ZEROIFNULL`.	✅
	Use `PMOD(n,m)` to return the modulus (remainder) of `n` divided by `m`	Use `MOD(n,m)` to return the modulus (remainder) of `n` divided by `m`	Use `MOD(n,m)` to return the modulus (remainder) of `n` divided by `m`	✅
	To compute any quantile, use `PERCENTILE_APPROX(x, 0.7)`	To compute any quantile, use `APPROX_PERCENTILE(x, 0.7)`	To compute any quantile, use `APPROXIMATE_PERCENTILE(x USING PARAMETERS percentile=0.7)`	✅
Data types	Dividing `7` by `2` will result in `3.5`	Dividing `7` by `2` will result in `3` → to perform floating point division on two integers, cast one of them as a double	Dividing `7` by `2` will result in `3.5`	✅
	Supports `STRING` format or `VARCHAR(n)`, with n specified	Doesn't support `STRING` format but supports `VARCHAR(n)`, with n specified or not	Doesn't support `STRING` format but supports `VARCHAR(n)`, with n specified or not	✅
	Cannot use `BOOL` instead of `BOOLEAN`	Cannot use `BOOL` instead of `BOOLEAN`	Can use `BOOL` instead of `BOOLEAN`	✅
Syntax	Can start an identifier with numbers	Identifiers that start with numbers must be quoted using double quotes: `"7day_active"`	Identifiers that start with numbers must be quoted using double quotes: `"7day_active"`	✅
	Doesn't support `::` notation for type casting	Doesn't support `::` notation for type casting	Supports `::` notation for type casting	✅
	Uses ` for identifiers	Uses `"` for identifiers	Uses `"` for identifiers	✅
	Cannot use column index in `GROUP BY` and `ORDER BY` (or change parameters: `SET hive.groupby.orderby.position.alias=true`)	Can use column index in `GROUP BY` and `ORDER BY`	Can use column index in `GROUP BY` and `ORDER BY`	✅
Array operations	Arrays are indexed starting from 0	Arrays are indexed starting from 1	Arrays are indexed starting from 0	✅
	Aggregate values as an array: use `collect_list()`. Add the `DISTINCT` keyword to deduplicate or use `collect_set()` to do both at once.	Aggregate values as an array: use `array_agg()`. Add the `DISTINCT` keyword or `array_distinct()` to deduplicate (note that the `DISTINCT` keyword cannot be used in a window function in Presto).	Use `listagg()` to aggregate values as comma-separated strings, not as a proper array (could use `STRING_TO_ARRAY('['\|\|col\|\|']', ',')` to do the trick, and could add `USING PARAMETERS max_length=1000000` to avoid length errors). Add the `DISTINCT` keyword to deduplicate.	✅
	Standalone array deduplication: use `collect_set(flatname) from (select id, flatname from test lateral view explode(name) t as flatname) g group by id`	Standalone array deduplication: use `array_distinct()`	Seems like we cannot deduplicate an array without cross-joining.	🚫
	Forced to use `lateral view explode` to run array functions (regular min, max, sum, avg)	All the main array functions (min, max, sum, avg, etc.) start with `array_{function}()`. Note that for the average it’s `array_average()` (not `avg`)	All the main array functions (min, max, sum, avg, etc.) start with `array_{function}()`.	🚫
	No direct equivalent	Complex array functions such as `map`, `transform`, `map_from_entries`	No direct equivalent	🚫
	`size()` returns the length of an array	`cardinality()` returns the length of an array	`array_length()` returns the length of an array	✅
	Use `ARRAY(content)` to create an array	Use `ARRAY[content]` to create an array	Should be able to use `ARRAY[content]` to create an array (?)	✅
	Concatenate strings with `concat_ws('sep', string1, string2, etc.)` or `concat_ws('sep', array(strings))` (input must be strings or array of strings, contrary to Presto and Vertica)	Concatenate strings with `array_join(array, 'sep')` or with the `\|\|` operator	Concatenate strings with `concat(val1, val2)` (2 elements max) or with the `\|\|` operator	🚫
	`NAMED_STRUCT('A', col_A, 'B', col_B)` (realiasing inside the struct is not possible when doing Hive → Presto / Vertica)	`ROW(col_A, col_B)` or `CAST(ROW(col_A, col_B) as ROW(A int, B int))`(realiasing of the struct is possible when doing Presto / Vertica -> Hive)	`ROW(col_A, col_B)` or `CAST(ROW(col_A, col_B) as ROW(A int, B int))`(realiasing of the struct is possible when doing Presto / Vertica -> Hive)	✅
	`map_from_arrays(key, collect_list(value))` or `map_from_arrays(key, collect_set(value))`	`MAP_AGG(key, value)`	`MAPAGGREGATE(key, value)`	✅
	`array_contains(array, x)`	`contains(array, x)`	`array_contains(array, x)`	✅
Date & Time	Use `from_unixtime()` and `unix_timestamp()` functions to convert timestamp ↔︎ unixtime	Use `from_unixtime()` and `to_unixtime()` functions to convert timestamp ↔︎ unixtime	Use `to_timestamp()` and `EXTRACT(EPOCH from date)` to convert timestamp ↔︎ unixtime	✅
	`select current_date - INTERVAL '10' day` returns a timestamp, and doesn't need `''`	`select current_date - INTERVAL '10' day` returns a date and needs `''`	`select current_date - INTERVAL '10' day` returns a timestamp and needs `''`	✅
	Supports `to_date()` function	Doesn't support `to_date()` function → use `DATE()`	Doesn't support `to_date()` function → use `DATE()`	✅
	You can extract date parts with `TRUNC(str, pattern)` or `EXTRACT(part from str)`	You can extract date parts with `DATE_TRUNC(part, date)` or `EXTRACT(part from date)`	You can extract date parts with `TRUNC(date, pattern)` or `TIMESTAMP_TRUNC(date, pattern)` or `DATE_TRUNC(part, date)` or `DATE_PART(part, date)` or `EXTRACT(part from date)`	✅
	`DATEDIFF(str, str)` only returns a day difference, works with strings, and substracts the other way round vs. Presto and Vertica	`DATE_DIFF(unit, date, date)`	`DATEDIFF(unit, date, date)` or `TIMESTAMPDIFF(unit, str, str)`	✅
	`DATE_FORMAT(str, str_format)` works with a date already formatted as a string + beware of pattern letters differences	`DATE_FORMAT(date, str_format)` only works with a date + beware of pattern letters differences	`TO_CHAR(date, str_format)` only works with a date + beware of pattern letters differences	✅
	`DATE_ADD(str, value)`, can only add days, supports date string, negative values are possible but `DATE_SUB` exists as well	`DATE_ADD(unit_str, value, date)`, with possible negative value, only datetime formats	`TIMESTAMPADD(unit_str, value, date)`, with possible negative value, supports date string, returns a timestamp (not date)	✅

tdemareuil/tdemareuil.github.io

Folders and files

Latest commit

History

Repository files navigation

Personal website

SQL translator

Goals

Possible improvements

Detailed capabilities

User Interface

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages