What is Sqoop2?
Sqoop transfers bulk data between relational databases and Hadoop, in both directions:
- Databases: Teradata, MySQL, PostgreSQL, Oracle, Netezza
- Hadoop side: HDFS (text, sequence file), Hive, HBase, Avro
Sqoop1 Architecture :
- Cryptic, contextual command line arguments
- Tight coupling between data transfer and output format
- Security concerns with openly shared credentials
- Installation and configuration are not easy to manage
- Connectors are forced to follow JDBC model
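To make the first three challenges concrete, here is a minimal sketch that assembles a typical Sqoop1 import command as an argument list. The host, database, user, password, table, and target directory are hypothetical values chosen for illustration; the flags themselves are standard `sqoop import` options.

```python
import shlex

# Hypothetical connection details, used only for illustration.
sqoop1_import = [
    "sqoop", "import",
    "--connect", "jdbc:mysql://db.example.com/sales",
    "--username", "etl_user",
    "--password", "s3cret",      # credential ends up in shell history and process listings
    "--table", "orders",
    "--target-dir", "/data/orders",
    "--as-sequencefile",         # output format is chosen in the same command as the transfer
]

# Render the list as the single shell command an operator would type.
print(shlex.join(sqoop1_import))
```

Every run repeats the connection string and credentials, and the meaning of several flags depends on which others are present, which is the kind of coupling the list above describes.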
Sqoop2 Architecture :
Sqoop1: Client-side tool
- Connectors are installed/configured locally
- Local installation/configuration requires root privileges
- JDBC drivers are needed locally
- Database connectivity is needed locally
Sqoop2: Sqoop as a Service
- Connectors are installed/configured in one place
- Managed by administrator and run by operator
- JDBC drivers are needed in one place
- Database connectivity is needed on the server
Client Interface
Sqoop1:
- Command line interface (CLI) based
- Can be automated via scripting
Sqoop2:
- CLI based (in either interactive or script mode)
- Web based (remotely accessible)
- REST API is exposed for external tool integration
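As a rough illustration of external tool integration over REST, the sketch below builds (but does not send) an HTTP GET request against a Sqoop2 server. The host name is hypothetical, and the port `12000`, the `/sqoop/` base path, and the `version` resource are assumptions based on a typical Sqoop2 deployment; check them against your server's documentation.

```python
from urllib.parse import urljoin
from urllib.request import Request

# Assumption: a Sqoop2 server reachable at this (hypothetical) host and port.
BASE_URL = "http://sqoop2.example.com:12000/sqoop/"


def build_request(resource: str) -> Request:
    """Build, without sending, a GET request for a resource under BASE_URL."""
    return Request(
        urljoin(BASE_URL, resource),
        headers={"Accept": "application/json"},
        method="GET",
    )


# Assumed resource name; the request could be sent with urllib.request.urlopen.
version_request = build_request("version")
print(version_request.full_url)
```

Because the client only needs HTTP access to the server, no connector, JDBC driver, or database credential has to be present on the machine that runs this code.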
Sqoop 2: Connection vs Job metadata :
- Connection (distinct per database)
- Job (distinct per table)
- Connectors register metadata
- Metadata enables creation of Connections and Jobs
- Connections and Jobs stored in Metadata Repository
- Operator runs Jobs that use appropriate connections
- Admins set policy for connection use
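The split between connections and jobs can be sketched as a small data model. This is an illustrative model only, not Sqoop2's actual classes: `Connection`, `Job`, and `MetadataRepository` and all their fields are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Connection:
    """One per database: where and how to connect (hypothetical fields)."""
    name: str
    connector: str
    jdbc_url: str


@dataclass(frozen=True)
class Job:
    """One per table: what to transfer, referring to a connection by name."""
    name: str
    connection_name: str
    table: str
    direction: str  # "import" or "export"


@dataclass
class MetadataRepository:
    """In-memory stand-in for the server-side metadata repository."""
    connections: dict = field(default_factory=dict)
    jobs: dict = field(default_factory=dict)

    def add_connection(self, conn: Connection) -> None:
        self.connections[conn.name] = conn

    def add_job(self, job: Job) -> None:
        # A job can only be created against a connection that already exists.
        if job.connection_name not in self.connections:
            raise KeyError(f"unknown connection: {job.connection_name}")
        self.jobs[job.name] = job

    def resolve(self, job_name: str):
        """Return the job together with the connection it runs against."""
        job = self.jobs[job_name]
        return job, self.connections[job.connection_name]


repo = MetadataRepository()
repo.add_connection(Connection("sales_db", "generic-jdbc", "jdbc:mysql://db.example.com/sales"))
repo.add_job(Job("import_orders", "sales_db", "orders", "import"))
repo.add_job(Job("import_customers", "sales_db", "customers", "import"))

job, conn = repo.resolve("import_orders")
print(job.table, "via", conn.name)
```

Two jobs share one connection here, so the database details are stored once and the operator who runs a job never has to supply them.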
Sqoop 2: Security
- Administrators create/edit/delete connections
- Operators use connections
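The two roles above amount to a simple permission table. The sketch below is a hypothetical model of that policy, not Sqoop2's own security implementation; the `Role` enum, `PERMISSIONS` table, and `is_allowed` helper are names introduced for illustration.

```python
from enum import Enum


class Role(Enum):
    ADMINISTRATOR = "administrator"
    OPERATOR = "operator"


# Actions on connections permitted to each role, mirroring the two bullets above.
PERMISSIONS = {
    Role.ADMINISTRATOR: {"create", "edit", "delete"},
    Role.OPERATOR: {"use"},
}


def is_allowed(role: Role, action: str) -> bool:
    """Return True if the role may perform the action on a connection."""
    return action in PERMISSIONS[role]


print(is_allowed(Role.OPERATOR, "use"))
print(is_allowed(Role.OPERATOR, "delete"))
```

Keeping credentials inside administrator-managed connections means an operator can run jobs without ever seeing the database password.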