If you don’t understand this, your calculation will be wrong. And these calculations will matter for your interview; that’s what I’m saying. If you go for a typical interview and you say you know Spark, these are the things people usually ask. I mean, they won’t ask, “Hey, what is the difference between Spark and MapReduce?” Anybody will say they know that, right? So these are the kind of questions they ask: Where will you get memory? Who will give the memory? What will happen if I do this? Even I am not an expert on Spark. I started working with it about three years ago, but some things are new even to me. Sometimes you will say, “Oh, this is possible? I didn’t know this.” So then you run it and see that, okay, it works. But you can’t learn everything in Spark. Okay, so this part is partitions and parallelism.
So now, if you look at the picture, does it make more sense? You have four blocks, you ask for four containers, and you get four partitions. In the ideal case each block is loaded into one partition, and that is called an RDD.
Now the real question: how do you write a Spark program? Basically that is what you want, apart from RDDs and YARN and all that. Once you get the data, you want to analyze the data. So how do we analyze the data? In Python, did you learn something called a higher-order function? Normally when you write a Python function you say def, then the function name, blah blah blah, and then you write the function and you reuse the function. Why are you creating a function? So that you can call it anytime. There is also something called an anonymous function, or disposable function. Say I want to create a function.
I will use it only once. I don’t need it anymore, so I don’t really have to give the function a name or define it with def. You can create it on the fly. That is called an anonymous function. Okay. In Spark programming we have something called higher-order functions. What is a higher-order function? Let’s say I have a function called ABC. I can pass another function to this function. That’s called a higher-order function, meaning this ABC is a function that takes a function.
Normally you pass some parameter, some value, right? You say x means this. That is what you have done so far. But I can have a function and pass another function to this function; that’s called a higher-order function. So we will be passing anonymous functions here. This lambda is an anonymous function. I will show you the code and it will become clearer.
In Spark, basically, once you create an RDD, you have the data. Now I want to process the data. How do you process the data? You have something called transformations. There is a Spark website, and it’s good to look at the official Spark website. How do you get there? spark.apache.org.
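To make the idea of an anonymous function passed to a higher-order function concrete, here is a minimal sketch in plain Python, with no Spark involved. The names `abc` and `double` are only illustrative; `abc` mirrors the “ABC” function mentioned in the talk.

```python
# A named function: defined once with def and reusable anywhere.
def double(x):
    return x * 2


# A higher-order function: one of its parameters is itself a function.
# The name abc is illustrative only.
def abc(func, value):
    return func(value)


# Passing a named function as the argument.
named_result = abc(double, 5)

# Passing an anonymous (lambda) function created on the fly and used once.
lambda_result = abc(lambda x: x + 1, 5)

print(named_result, lambda_result)
```

The lambda never gets a name of its own; it exists only for that one call, which is the “disposable function” idea.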
Yeah, so spark.apache.org is the official website of Spark. If you go to Documentation, it says the latest release is 2.3.0, right? And these are the older versions. If you click here you can see all the Spark versions. You can see that 1.6.3 was the last Spark 1 version. Now they are on 2.3; this is the latest version. If you go to the documentation for, let’s say, the latest release and scroll down, you can see the RDD Programming Guide. If you click on this and scroll down a bit (we will come back to this part), you can see Resilient Distributed Datasets, or RDDs. This is what we created, right? So we just created an RDD, at least theoretically.
So once you create an RDD (I will show you how to create it), you have RDD operations. Now my data is available in an RDD. What can I do with the RDD? That is where you start writing your functions, your anonymous functions. These are the transformations, and this is what you need to understand. These are all transformations you can use in Spark: map, filter, flatMap; there are many, actually. If I want to filter my data, I just call filter. If I call filter, it asks me, “What do you want me to filter?” Within the brackets I write my expression to filter. That is how you filter your data. map is another one; it is like a for-each. I call map and it asks me, “What do you want me to do?” So I write an expression within map saying what map has to perform. These are all higher-order functions: map, filter, flatMap. They are all higher-order functions, actually.
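As a rough illustration of what map, filter and flatMap each do with the function you hand them, here is a plain-Python stand-in that uses the built-in `map` and `filter` plus `itertools.chain` in place of flatMap. It does not use Spark; in PySpark the corresponding RDD methods are `map`, `filter` and `flatMap`, each taking a function in the same way. The sample lines are made up for the example.

```python
from itertools import chain

# Made-up sample data standing in for the contents of an RDD.
lines = ["INFO start job", "ERROR disk full", "WARN slow task"]

# map: apply the function to every element; one output per input.
upper = list(map(lambda line: line.upper(), lines))

# filter: keep only the elements for which the function returns True.
errors = list(filter(lambda line: line.startswith("ERROR"), lines))

# flatMap stand-in: each element produces several outputs,
# and the results are flattened into a single list.
words = list(chain.from_iterable(map(lambda line: line.split(" "), lines)))

print(upper)
print(errors)
print(words)
```

In every case the work to be done is passed in as a lambda, which is what makes these higher-order functions.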
So you do something called transformations on an RDD. If you apply any of these functions, it creates a new RDD. That is a transformation, and that is how you analyze your data. If I want to filter my data, I call the filter transformation. And RDDs are immutable. Very important point. Once you create an RDD you cannot change it. You can only create another one by applying some logic. You can never edit an RDD; they are immutable, right?
If I go to my pad: yeah, so we have created a log-lines RDD. Fine, we have understood that. Then what did I do? Probably I’m interested only in error messages from this RDD. You can see there is a lot of data: info, warnings, errors. I want to filter only the error messages.
So what can I do? I can call the filter transformation, and then I can say, “Hey Spark, match only the error lines and give them to me.” I’ll show you how to write the logic. This produces another RDD, and I can call it the errors RDD. These are the steps in which you write a Spark program. First you create an RDD. Now I want to do a filter, so I call filter on it; it keeps only the error messages, and I store the result as another RDD. Now somebody was asking me what happens to the memory, right? What happens to this first RDD? I mean, if there is not enough memory.
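The two steps just described (create the log-lines data, then filter it into a separate errors collection without touching the original) can be sketched in plain Python. This is only a stand-in: a tuple plays the role of the immutable RDD, the sample log lines are invented, and no Spark API is used. In PySpark the second step would typically be a call like `log_lines.filter(lambda line: ...)` on an RDD.

```python
# Step 1: the "log lines" data. A tuple stands in for an RDD here because,
# like an RDD, it cannot be edited in place. The lines are invented samples.
log_lines = (
    "INFO job started",
    "WARN memory low",
    "ERROR task failed",
    "INFO job finished",
    "ERROR disk full",
)

# Step 2: filtering produces a NEW collection; log_lines is left untouched.
errors = tuple(filter(lambda line: line.startswith("ERROR"), log_lines))

# Trying to edit the original in place fails, mirroring immutability.
try:
    log_lines[0] = "changed"
    edit_succeeded = True
except TypeError:
    edit_succeeded = False

print(errors)
print(len(log_lines), edit_succeeded)
```

The original five lines are still there after the filter; the only way to get different data is to derive a new collection from the old one.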
So let’s say this RDD fits into memory. Then you call the filter transformation; it filters the records, and the first RDD will be gone. It is not required now, because you have the new one, right, and the next processing starts from the new one. Let’s assume the normal use case: you call a function to create another RDD, and the earlier RDD is gone. No, no, this one is your current data. So if you have enabled dynamic allocation on YARN, then this will be gone. I told you, right? This is actually a partition, and the executor is running here. This guy will be idle, because it cannot predict that there will be no data.
All right. So that is one more problem. Now your problem is that you have four executors. The second executor has no data to process, because the filter removed everything; there was no error line in its partition, so this guy will be sitting idle. The executors that have data will work, and the executor that has no data will simply sit idle. That becomes a problem, right? How can I solve that problem? It’s actually very easy in Spark. There is a transformation called coalesce. coalesce is a very common transformation. And why do you call coalesce?
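The idle-executor problem can be pictured with a small plain-Python model in which each inner list is one partition. Everything here is an assumption made for illustration: the sample lines are invented, and `merge_partitions` is a hypothetical helper that only conveys the general idea of ending up with fewer partitions. It is not how Spark implements `coalesce`; in Spark you would call `coalesce` on the RDD with the target number of partitions.

```python
# Four partitions, one per executor (invented sample data).
partitions = [
    ["INFO a", "ERROR b"],
    ["INFO c", "WARN d"],   # this partition contains no ERROR lines
    ["ERROR e", "INFO f"],
    ["WARN g", "ERROR h"],
]

# The filter runs partition by partition, so there are still four
# partitions afterwards, even if some of them end up empty.
filtered = [
    [line for line in part if line.startswith("ERROR")]
    for part in partitions
]
empty_count = sum(1 for part in filtered if not part)


def merge_partitions(parts, target):
    """Hypothetical helper: fold the partitions into `target` groups, round-robin."""
    merged = [[] for _ in range(target)]
    for index, part in enumerate(parts):
        merged[index % target].extend(part)
    return merged


# Going from four partitions to two leaves no partition empty in this example.
merged = merge_partitions(filtered, 2)

print(filtered)
print(empty_count)
print(merged)
```

The second partition is empty after the filter, which is the executor that would sit idle; after merging down to two partitions, both hold data.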