  1. What is the difference between AWS Glue ETL Job and AWS EMR?

    Jun 7, 2020 · When to choose AWS Glue: the data size is huge but structured, i.e. it is in a table structure and in a known format (CSV, Parquet, ORC, JSON). Lineage is required, if you need the data lineage …

  2. AWS Glue: passing additional Python modules to the job ...

    Jan 26, 2023 · I'm trying to run a Glue job (version 4) to perform simple batch data processing. I'm using additional Python libraries that the Glue environment doesn't provide: translate and langdetect.

  3. How to use extra files for AWS glue job - Stack Overflow

    Apr 15, 2020 · It seems like the Glue job doesn't accept the .zip format? If so, what compression format should I use? UPDATE: I found that the Glue job has an option for taking in extra files …

  4. How to run parallel threads in AWS Glue PySpark?

    Jul 4, 2020 · Execute the job in parallel using Glue workflows or Step Functions. Suppose you have 100 tables to ingest: you can divide the list into batches of 10 tables each and run the job concurrently 10 times.

  5. python - AWS Glue: ModuleNotFoundError - Stack Overflow

    Sep 9, 2021 · In my Glue script (Spark 3.1, Python 3, Glue 3) I'm trying to use the df.to_excel() function from the pandas library. Apparently pandas depends on openpyxl for this. My code is: import sys …

  6. AWS Glue - CRON Scheduled Trigger - Stack Overflow

    Mar 21, 2023 · I have been trying to find a cron expression to use in my AWS Glue job. I've tried many examples and I'm still not sure whether this is possible, since this is the first time I'm adding time …

  7. How to pass environment variables to AWS Glue - Stack Overflow

    Mar 21, 2022 · You can use these parameters when building platforms and custom frameworks on top of AWS Glue, to let your users write jobs on top of it. Enabling these two flags will allow you to set …

  8. Failed to connect Amazon RDS in AWS Glue Data Catalog's connection

    Aug 14, 2023 · Extra notes: according to the AWS Glue docs, Glue 4.0 uses PostgreSQL JDBC driver 42.3.6, which is quite new. However, I am not sure why the connection still failed. Note that my Amazon RDS is …

  9. AWS Glue -- pass jar file to Glue Job Properly - Stack Overflow

    Mar 1, 2024 · I have a working AWS Glue PySpark script that I'm trying to optimize. The script reads large gzipped text files, does some light transformation, and then loads them by partition into a Parquet …

  10. python - AWS Glue psycopg2 installation - Stack Overflow

    Aug 4, 2020 · I'm trying to run code that uses psycopg2 to manipulate a Redshift instance. I tried importing a wheel file, since wheels appear to be supported in Glue Python jobs. I can see the library is installed...
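Result 2 asks how to give a Glue 4.0 job the translate and langdetect libraries. Glue 2.0 and later document a `--additional-python-modules` job parameter that takes a comma-separated list of pip-installable modules. A minimal sketch, assuming the resulting dict is later passed as `DefaultArguments` to boto3's `glue.create_job` or as `Arguments` to `glue.start_job_run`; the helper name `python_modules_argument` is hypothetical:

```python
from __future__ import annotations


def python_modules_argument(modules: dict[str, str | None]) -> dict[str, str]:
    """Build the Glue job argument that asks Glue to pip-install extra modules.

    `modules` maps a package name to an optional pinned version
    (hypothetical helper, not part of any AWS SDK).
    """
    specs = [f"{name}=={version}" if version else name for name, version in modules.items()]
    # Glue expects one comma-separated string, not a list.
    return {"--additional-python-modules": ",".join(specs)}


if __name__ == "__main__":
    # Unpinned entries let pip pick a compatible release at job start.
    print(python_modules_argument({"translate": None, "langdetect": None}))
```

Pinning versions (for example `{"langdetect": "<version>"}`) keeps runs reproducible, at the cost of updating the pins by hand.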
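Result 3 wonders whether Glue rejects .zip files. Glue's `--extra-py-files` parameter takes S3 paths to Python modules; one plausible pitfall with zipped packages is an archive whose entries carry the full local path instead of starting at the package folder, which leaves the package unimportable. The sketch below assumes a local package directory (`mypkg` is a placeholder) and only builds the archive; uploading it to S3 is left out:

```python
from __future__ import annotations

import io
import tempfile
import zipfile
from pathlib import Path


def zip_package(package_dir: Path) -> bytes:
    """Zip a package so that its folder (e.g. "mypkg/") sits at the archive root."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(package_dir.rglob("*.py")):
            # Store paths relative to the package's parent, so entries look like
            # "mypkg/util.py" rather than "/home/me/project/mypkg/util.py".
            zf.write(path, arcname=path.relative_to(package_dir.parent).as_posix())
    return buf.getvalue()


def demo() -> list[str]:
    """Build a throwaway package, zip it, and return the archive's entry names."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp, "mypkg")  # placeholder package name
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "util.py").write_text("def hello():\n    return 'hi'\n")
        data = zip_package(pkg)
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.namelist()


if __name__ == "__main__":
    print(demo())
```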
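The batching described in result 4 (100 tables split into 10 runs of 10) is simple arithmetic; the sketch below builds one argument dict per run. `--TABLES` is a hypothetical job parameter that the script would read and split on commas. Each dict could be passed as `Arguments` to boto3's `glue.start_job_run`, and the job's `ExecutionProperty` setting `MaxConcurrentRuns` must be at least 10 for all runs to proceed at the same time:

```python
from __future__ import annotations


def batch_tables(tables: list[str], batch_size: int) -> list[list[str]]:
    """Split the table list into consecutive batches of at most `batch_size`."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    return [tables[i:i + batch_size] for i in range(0, len(tables), batch_size)]


def run_arguments(batches: list[list[str]]) -> list[dict[str, str]]:
    """One Glue argument dict per job run; "--TABLES" is a hypothetical parameter name."""
    return [{"--TABLES": ",".join(batch)} for batch in batches]


if __name__ == "__main__":
    tables = [f"table_{i:03d}" for i in range(100)]
    runs = run_arguments(batch_tables(tables, 10))
    print(len(runs), runs[0]["--TABLES"])
```

The last batch is simply shorter when the table count is not a multiple of the batch size, so no table is dropped.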
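For result 5, pandas treats openpyxl as an optional dependency that it imports only when an .xlsx file is read or written, so a missing openpyxl typically surfaces at `df.to_excel()` rather than at `import pandas`. One way to fail early with a clearer message is a pre-flight check using `importlib.util.find_spec`. The helper names are hypothetical, and the actual fix would still be to ship openpyxl to the job (for example via `--additional-python-modules`):

```python
from __future__ import annotations

import importlib.util


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def require_modules(names: list[str]) -> None:
    """Raise one ModuleNotFoundError listing every missing module (hypothetical helper)."""
    missing = missing_modules(names)
    if missing:
        raise ModuleNotFoundError(
            "missing modules: " + ", ".join(missing)
            + " (add them to the job's --additional-python-modules)"
        )


if __name__ == "__main__":
    # In a Glue script, call require_modules(...) near the top, before any Spark work.
    print(missing_modules(["pandas", "openpyxl"]))
```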
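Result 6 is about scheduling. Glue scheduled triggers use the six-field cron format `cron(Minutes Hours Day-of-month Month Day-of-week Year)`, evaluated in UTC, and exactly one of the day-of-month and day-of-week fields must be `?`. A small builder that enforces that rule (the function name is hypothetical); the resulting string would go into the `Schedule` argument of boto3's `glue.create_trigger` with `Type="SCHEDULED"`:

```python
def glue_cron(minutes: str = "0", hours: str = "*", day_of_month: str = "*",
              month: str = "*", day_of_week: str = "?", year: str = "*") -> str:
    """Build a six-field Glue/EventBridge-style cron expression (times are UTC)."""
    # Exactly one of day-of-month / day-of-week must be "?".
    if (day_of_month == "?") == (day_of_week == "?"):
        raise ValueError("use '?' in exactly one of day_of_month and day_of_week")
    fields = [minutes, hours, day_of_month, month, day_of_week, year]
    if any(not f or " " in f for f in fields):
        raise ValueError("each field must be a non-empty token without spaces")
    return f"cron({' '.join(fields)})"


if __name__ == "__main__":
    print(glue_cron("15", "12"))  # every day at 12:15 UTC
    print(glue_cron("0", "8", day_of_month="?", day_of_week="MON"))  # Mondays at 08:00 UTC
```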
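Result 7's snippet stops before naming the two flags it mentions. A separate, commonly used route is to pass values as job parameters (keys such as `--STAGE`, a placeholder here) and read them in the script with `awsglue.utils.getResolvedOptions(sys.argv, [...])`. Because `awsglue` is only available inside Glue, the sketch below uses a hypothetical argparse-based stand-in, `resolve_options`, to show the same idea locally:

```python
from __future__ import annotations

import argparse


def resolve_options(argv: list[str], names: list[str]) -> dict[str, str]:
    """Local stand-in for getResolvedOptions: read the named "--KEY value" pairs."""
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    for name in names:
        parser.add_argument(f"--{name}", required=True)
    # Ignore any other arguments (Glue passes several of its own).
    known, _unknown = parser.parse_known_args(argv[1:])
    return vars(known)


if __name__ == "__main__":
    print(resolve_options(["script.py", "--STAGE", "dev"], ["STAGE"]))
```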
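For result 8: besides the driver version, a Glue connection to RDS also depends on network configuration (VPC, subnet, security groups). One way to separate network problems from driver problems is a plain TCP check run from the same network path, assuming for example a Python shell job attached to the same Glue connection. A minimal sketch; the host and database names are placeholders:

```python
import socket


def postgres_jdbc_url(host: str, port: int, database: str) -> str:
    """Format a PostgreSQL JDBC URL of the kind used in a Glue JDBC connection."""
    return f"jdbc:postgresql://{host}:{port}/{database}"


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, DNS failures and timeouts
        return False


if __name__ == "__main__":
    host = "mydb.example.internal"  # placeholder RDS endpoint
    print(postgres_jdbc_url(host, 5432, "analytics"))
    # Inside the job: if not can_reach(host, 5432), look at networking before the driver.
```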
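For result 9, Glue's `--extra-jars` parameter takes the S3 paths of additional .jar files as one comma-separated string. A small validator (hypothetical helper; the bucket and jar names are placeholders) catches formatting slips such as a local path or a folder path before the job is created:

```python
from __future__ import annotations


def extra_jars_argument(paths: list[str]) -> dict[str, str]:
    """Build the Glue `--extra-jars` argument from a list of S3 object paths."""
    if not paths:
        raise ValueError("at least one jar path is required")
    for path in paths:
        # Each entry must be a full S3 object path to a .jar file, not a folder.
        if not path.startswith("s3://") or not path.endswith(".jar"):
            raise ValueError(f"not an S3 .jar path: {path!r}")
    return {"--extra-jars": ",".join(paths)}


if __name__ == "__main__":
    print(extra_jars_argument(["s3://my-bucket/jars/custom-lib.jar"]))
```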
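Result 10 reports that the psycopg2 wheel installs but the code still fails. One possible cause (an assumption, since the snippet is cut off) is a wheel built for a different platform: psycopg2 is a C extension, so assuming Glue workers run Linux on x86_64, a macOS or Windows wheel cannot load there. The sketch reads the platform tag from a wheel filename following the PEP 427 naming convention; the helper names and example filenames are illustrative:

```python
from __future__ import annotations


def wheel_tags(filename: str) -> tuple[str, str, str]:
    """Return (python tag, ABI tag, platform tag) from a PEP 427 wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file")
    # name-version[-build]-python-abi-platform
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return python_tag, abi_tag, platform_tag


def fits_linux_x86_64(filename: str) -> bool:
    """True if any platform tag is 'any' or a glibc-based Linux x86_64 tag."""
    _, _, platform_tag = wheel_tags(filename)
    for tag in platform_tag.split("."):  # compressed tag sets are joined with '.'
        if tag == "any":
            return True
        if "linux" in tag and tag.endswith("x86_64") and not tag.startswith("musllinux"):
            return True
    return False


if __name__ == "__main__":
    for name in ("psycopg2-2.9.9-cp310-cp310-win_amd64.whl",
                 "psycopg2_binary-2.9.9-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"):
        print(name, fits_linux_x86_64(name))
```

The Python tag (for example `cp310`) also has to match the job's Python version; this sketch only checks the platform.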