Explained: Create Empty Dataframe in Databricks

Are you looking for how to create and use an empty DataFrame in Databricks? If so, you have landed on the correct page. I will explain what an empty DataFrame is and how to use it, with practical examples. So, without wasting time, let's start this step-by-step guide to understanding the empty DataFrame in Databricks.

What is an Empty Dataframe in Databricks?

An empty DataFrame in Databricks is a DataFrame with no data. We can also create an empty DataFrame with no schema at all, i.e., a DataFrame with no columns and no rows.

How can we create an Empty Dataframe in Databricks?

The SparkContext.emptyRDD() function (for example, spark.sparkContext.emptyRDD()) is used to create an empty RDD.

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('azurelib.com').getOrCreate()

#Creates Empty RDD

emptyRDD = spark.sparkContext.emptyRDD()
print(emptyRDD)


#Output

EmptyRDD[0] at emptyRDD at NativeMethodAccessorImpl.java:0

Similarly, we can also get an empty RDD by using parallelize().

#Creates Empty RDD using parallelize

emptyRDD2 = spark.sparkContext.parallelize([])
print(emptyRDD2)


#Output
ParallelCollectionRDD[1] at readRDDFromInputStream at PythonRDD.scala:413


  • We can also create an empty DataFrame with a schema. First build the schema using StructType and StructField, so that the empty Databricks DataFrame has column names and data types.  
#Create Schema

from pyspark.sql.types import StructType,StructField, StringType
schema = StructType([
  StructField('firstname', StringType(), True),
  StructField('lastname', StringType(), True)
  ])

  • Then pass the empty RDD created above, together with the schema (column names and data types), to createDataFrame() of SparkSession.  
#Create empty DataFrame from empty RDD

df1 = spark.createDataFrame(emptyRDD,schema)
df1.printSchema()

#Output

root
 |-- firstname: string (nullable = true)
 |-- lastname: string (nullable = true)

  • We can also convert an empty RDD to a DataFrame by using toDF().
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('azurelib.com').getOrCreate()

#Creates Empty RDD

emptyRDD = spark.sparkContext.emptyRDD()


#Convert an empty RDD to Dataframe

df2 = emptyRDD.toDF(schema)
df2.printSchema()

  • Up to now, I have covered creating an empty DataFrame from an RDD; now I will create one directly from a schema, without an RDD.
#Create Schema

from pyspark.sql.types import StructType,StructField, StringType
schema = StructType([
  StructField('firstname', StringType(), True),
  StructField('lastname', StringType(), True)
  ])


#Creating an empty DataFrame directly

df3 = spark.createDataFrame([], schema)
df3.printSchema()

  • We can also create an empty DataFrame without a schema (without columns). Pass an empty StructType to createDataFrame() to get a Databricks DataFrame with no columns and no rows.
#Create an empty DataFrame with no schema (without columns)

df4 = spark.createDataFrame([], StructType([]))
df4.printSchema()

#Output

root

What is the Syntax for Empty Dataframe in Databricks?

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('azurelib.com').getOrCreate()

#Creates Empty RDD

emptyRDD1 = spark.sparkContext.emptyRDD()
print(emptyRDD1)

#Creates Empty RDD using parallelize

emptyRDD2 = spark.sparkContext.parallelize([])
print(emptyRDD2)


#Creating Empty Dataframe using createDataFrame()

spark.createDataFrame([], StructType([]))


Dataframe Argument Details:

  • Data: the actual data
  • Columns: the column names

FULL Example of Empty Dataframe in Databricks:

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('azurelib.com').getOrCreate()

#Creates Empty RDD

emptyRDD = spark.sparkContext.emptyRDD()
print(emptyRDD)

rdd2= spark.sparkContext.parallelize([])
print(rdd2)


#Create a Schema

from pyspark.sql.types import StructType,StructField, StringType
schema = StructType([
  StructField('firstname', StringType(), True),
  StructField('lastname', StringType(), True)
  ])


#Two different ways to create Empty dataframe

df1 = spark.createDataFrame(emptyRDD,schema)
df1.printSchema()

df2 = emptyRDD.toDF(schema)
df2.printSchema()


#Create an empty DataFrame with no schema (without columns)

df4 = spark.createDataFrame([], StructType([]))
df4.printSchema()

When should you use an Empty Dataframe in Databricks?

There are certain use-case scenarios in which it is recommended to use an empty DataFrame within the Databricks cloud, as follows:

  • If we want to create a DataFrame without data, and optionally without a schema, we can use these methods to create an empty DataFrame in Databricks.

Real World Use Case Scenarios for Empty Dataframe in Databricks

  • A newly registered company that has no managers or employees yet.
  • A newly opened bank that has no staff or customers yet.

Dataframe Official Documentation Link

Final Thoughts

In this article, we have learned about the empty DataFrame and its uses, with clearly explained examples. I have also covered the different scenarios that are possible, each with a practical example. I hope the information provided has helped you gain this knowledge.

Please share your comments and suggestions in the comment section below and I will try to answer all your queries as time permits.