You can see the partitions, you can see the DAG, you can see how many executors it is launching. Everything is visible; it's not just a theory kind of thing. But unless I talk about it first, you will ask how this came about and how the partitions actually came, and you will be confused. And the driver collects the data. So this is the first step. If you want to look at it once more, I have a very good picture to explain this. Just look at this picture; it's the same thing I'm doing. The RDDs are created one after another in a clean, orderly chain. And finally I'm calling an action called count. Count is an action.
So if you say count, everything runs from the beginning, because the data has to come from somewhere, right? So count is called, and what happened? It showed me the number of lines in the RDD. No surprises. Now I also want to call one more action on the RDD: save it to Cassandra. The problem is, if I call this action, by default Spark has to start again by reading the blocks, creating the original RDD, then filtering, then mapping, and only then can it save to Cassandra. Okay, now one more thing: after that I do one more filter. From this data I want only the messages that contain "stage one", and then I say collect. I just want to see those.
But again, what will happen? It will start from the beginning. If you look at these three actions, all three depend on this RDD; everything starts from here. That is where you can cache an RDD. When I say cache, whatever data you have in that RDD will be cached. You can cache it in RAM, you can cache it on hard disk, or you can say RAM plus hard disk, whichever you want. So there are multiple options, and once you cache an RDD...
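To make the "every action starts from the beginning" point concrete, here is a minimal sketch using a hypothetical `ToyRDD` class (not Spark's API; it only mimics lazy transformations and lineage). A counter records how often the source is read, standing in for reading the file's blocks. With no caching, each of the three actions walks the lineage back to the source.

```python
# Toy model of RDD lineage (hypothetical class, not Spark's API).
class ToyRDD:
    def __init__(self, compute, parent=None):
        self._compute = compute  # function: parent's rows -> this RDD's rows
        self._parent = parent    # lineage pointer, like an RDD's dependency

    def map(self, f):
        # Transformation: returns a new lazy RDD and computes nothing yet.
        return ToyRDD(lambda rows: [f(r) for r in rows], self)

    def filter(self, pred):
        return ToyRDD(lambda rows: [r for r in rows if pred(r)], self)

    def _rows(self):
        # Walk the lineage back to the source on every call; nothing is kept.
        parent_rows = self._parent._rows() if self._parent is not None else None
        return self._compute(parent_rows)

    def count(self):    # action
        return len(self._rows())

    def collect(self):  # action
        return self._rows()


stats = {"source_reads": 0}

def read_source(_):
    # Stands in for reading the input file's blocks.
    stats["source_reads"] += 1
    return ["INFO start", "ERROR disk full", "ERROR stage one failed", "INFO done"]

lines = ToyRDD(read_source)
errors = lines.filter(lambda line: line.startswith("ERROR"))
messages = errors.map(lambda line: line.split(" ", 1)[1])

print(messages.count())                                       # 2
print(messages.collect())                                     # ['disk full', 'stage one failed']
print(messages.filter(lambda m: "stage one" in m).collect())  # ['stage one failed']
print(stats["source_reads"])                                  # 3: one full re-read per action
```

In real PySpark the three actions would be `count()`, a save, and `filter(...).collect()`; the toy only models the recomputation cost.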
Let's say you say count; it will count. Then you say save to Cassandra, and it will start from here, because this RDD's data is already cached. Don't think you're caching for one month. Once the Spark program is over, the cache is deleted. Caching holds the data in memory without flushing it, so it is temporary: the cache is valid only until your program completes all its actions. Once all actions are over and you exit, everything is gone. I will show you tomorrow how to start a Spark application. As a developer, I will create something called a SparkContext object. That object represents my program, and when my program finishes, I kill that object. If I kill it, everything is gone.
Even the cache will be gone, because caching is valid only for the lifetime of my running program. I'm not caching it for another program next month; that is not possible. During that execution I want to speed things up, so you call rdd.cache() and it gets cached; otherwise it isn't. Note that cache is lazy, like a transformation: if I just say cache, nothing is cached yet. I have to call an action after that to actually cache it. So I say .cache(), then count; the count runs and caches the RDD.
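The laziness of `cache()` can be sketched with another hypothetical toy class (not Spark's API). Here `cache()` only sets a flag; the first action computes the rows and stores them, and later actions, including a new filter built on top, start from the stored copy instead of the source. In PySpark the corresponding real call is `rdd.cache()`.

```python
# Toy model (hypothetical, not Spark's API) of lazy caching.
class ToyRDD:
    def __init__(self, compute, parent=None):
        self._compute = compute
        self._parent = parent
        self._cache_requested = False
        self._cached_rows = None

    def filter(self, pred):
        return ToyRDD(lambda rows: [r for r in rows if pred(r)], self)

    def cache(self):
        # Like rdd.cache(): only marks the RDD; nothing is computed here.
        self._cache_requested = True
        return self

    def _rows(self):
        if self._cached_rows is not None:
            return self._cached_rows  # served from "memory"
        parent_rows = self._parent._rows() if self._parent is not None else None
        rows = self._compute(parent_rows)
        if self._cache_requested:
            self._cached_rows = rows  # the first action materializes the cache
        return rows

    def count(self):
        return len(self._rows())

    def collect(self):
        return list(self._rows())


stats = {"source_reads": 0}

def read_source(_):
    stats["source_reads"] += 1
    return ["INFO start", "ERROR disk full", "ERROR stage one failed"]

errors = ToyRDD(read_source).filter(lambda l: l.startswith("ERROR")).cache()
print(stats["source_reads"])  # 0: cache() alone computed nothing

print(errors.count())         # 2, and this action fills the cache
print(errors.collect())
print(errors.filter(lambda l: "stage one" in l).collect())
print(stats["source_reads"])  # 1: later actions started from the cached RDD
```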
Then whatever action I call will start from there, because it is in memory. Without caching, once an action finishes, everything is deleted from memory. That is why caching makes sense: if you cache it, Spark won't delete it from memory even after you call an action; it will keep it. See, I created this chain of RDDs and I say count. It will count and then delete all of this from memory. The data is in memory during execution; after execution, nothing is in memory. So you should keep the intermediate data in memory. You should tell Spark: I will call an action, show me the output, but whatever data this RDD has, don't delete it from your pipeline; keep it, because I want to use it for some other action later.
That's caching. There is also a command called unpersist. If you use unpersist, it deletes the cached data from memory. I'll show you how to cache tomorrow; it's very easy. Otherwise, if you kill your SparkContext object, your session is gone and the cache is deleted automatically. So you may be thinking: how do I start programming with Spark? I want to write a proper Spark program. The first thing you should learn is how to create an RDD. Once you know how to create an RDD, you can start with simple transformations like map. Now, there is also a flip side to all of this. The flip side goes back to when Spark first came into the industry.
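The two ways a cache goes away, calling `unpersist()` or ending the application by stopping the SparkContext, can be sketched with hypothetical `ToyContext` and `ToyRDD` classes (not Spark's API). The real PySpark methods with these names are `rdd.cache()`, `rdd.unpersist()` and `sc.stop()`; the toy only shows who owns the cached data and when it disappears.

```python
# Hypothetical stand-ins (not Spark's API) for a SparkContext and a cached RDD.
class ToyContext:
    def __init__(self):
        self.storage = {}  # rdd id -> cached rows; owned by the application

    def stop(self):
        # Ending the application drops every cached RDD with it.
        self.storage.clear()


class ToyRDD:
    _next_id = 0

    def __init__(self, ctx, compute):
        self.ctx = ctx
        self.compute = compute
        self.id = ToyRDD._next_id
        ToyRDD._next_id += 1
        self.keep = False

    def cache(self):
        self.keep = True  # lazy: stored by the next action
        return self

    def unpersist(self):
        self.keep = False
        self.ctx.storage.pop(self.id, None)  # free the cached copy now
        return self

    def count(self):
        if self.id in self.ctx.storage:
            return len(self.ctx.storage[self.id])
        rows = self.compute()
        if self.keep:
            self.ctx.storage[self.id] = rows
        return len(rows)


stats = {"computes": 0}

def compute_errors():
    stats["computes"] += 1
    return ["ERROR disk full", "ERROR stage one failed"]

sc = ToyContext()
errors = ToyRDD(sc, compute_errors).cache()
errors.count()
errors.count()
print(stats["computes"])  # 1: the second count used the cache

errors.unpersist()
errors.count()
print(stats["computes"])  # 2: recomputed after unpersist

errors.cache().count()    # cached again
sc.stop()
print(sc.storage)         # {}: stopping the context removed the cache
```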
Everybody was mad about RDD transformations, writing map and filter. People were literally dying to write these things. Okay, but soon there was a problem. The problem is that if you write your code using these transformations and actions, there is no way Spark can optimize your code. Why am I saying this? Say you write a filter, and in the filter you say keep only the error messages. Unless Spark runs it, it doesn't know what you're talking about. RDDs do not have a strict schema. So when you want to process structured data, like a CSV file that I want to read with a schema, RDDs are not a good way to do it. You can process it, but Spark internally will not be able to optimize your code.
That is where Spark SQL comes into the picture, with what we call DataFrames. Spark has more modules than the core: Spark SQL is one of them, and Spark SQL is much more powerful at optimizing than Spark Core. So if you write Spark SQL, you should know how to create a table and query it using SQL. More than that, it is much more optimized than core RDDs, because with RDDs you are using lambdas, and Spark has no way to understand what your lambda does unless it runs it. I wrote the lambda code, some weird code, and Spark cannot understand its meaning unless it sees the data. But if it is a table and I say filter on this column, Spark knows what that column is before it reads the data. It can skip the columns it doesn't need when loading the data, to optimize.
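Why a filter on a named column is optimizable while a lambda is not can be sketched in plain Python. The `Equals` predicate, the columnar `TABLE`, and `run_query` are all hypothetical illustrations, not Spark SQL internals. Because the filter is data rather than an opaque function, the toy engine can see which columns the query references and read only those (column pruning); a lambda gives it nothing to inspect except by calling it on rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Equals:
    """Hypothetical declarative filter: row[column] == value."""
    column: str
    value: object


# A tiny columnar "table": column name -> list of values.
TABLE = {
    "level": ["INFO", "ERROR", "ERROR"],
    "msg": ["start", "disk full", "stage one failed"],
    "host": ["n1", "n2", "n3"],
    "payload": ["...", "...", "..."],
}

loaded_columns = []

def load_column(name):
    loaded_columns.append(name)  # record what the "scan" actually read
    return TABLE[name]

def run_query(select, where):
    # The filter's column is visible up front, so only the referenced columns are loaded.
    needed = list(dict.fromkeys(select + [where.column]))
    cols = {name: load_column(name) for name in needed}
    n = len(cols[needed[0]])
    return [
        {name: cols[name][i] for name in select}
        for i in range(n)
        if cols[where.column][i] == where.value
    ]

result = run_query(select=["msg"], where=Equals("level", "ERROR"))
print(result)          # [{'msg': 'disk full'}, {'msg': 'stage one failed'}]
print(loaded_columns)  # ['msg', 'level']: 'host' and 'payload' were never read
```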
So if I write a SQL query with a filter, a group by, then a join and something else, Spark can read it, apply the schema and understand what you're talking about. Now, if you write a SQL query where you wrote a join and then a filter, the filter will be applied first, right? That is not possible with RDDs, because Spark doesn't know what you are filtering. There is no strict schema. It's not a SQL expression; it's an anonymous function. So Spark doesn't know what you're filtering or on which field. It has to load the full data, then do the join, then apply whatever you wrote for the filter. So on the Spark SQL side it is much more optimized. What Spark recommends is: learn RDDs, they're nice, do some transformations and understand them, but stick with Spark SQL.
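The "filter comes first" rewrite is known as predicate pushdown, and Spark SQL's Catalyst optimizer applies it to joins. The sketch below is plain Python with made-up data and a naive nested-loop join (not how Spark executes joins); it only shows why the rewrite is safe when the filter reads a column from one side of the join, and how much work it saves.

```python
def nested_loop_join(left, right, key):
    # Naive equi-join that also counts row comparisons as a stand-in for cost.
    comparisons = 0
    out = []
    for l in left:
        for r in right:
            comparisons += 1
            if l[key] == r[key]:
                out.append({**l, **r})
    return out, comparisons

# Made-up data: 100 log rows (10 of them errors) and 4 hosts.
logs = [{"host": f"n{i % 4}", "level": "ERROR" if i % 10 == 0 else "INFO"} for i in range(100)]
hosts = [{"host": f"n{i}", "rack": f"r{i // 2}"} for i in range(4)]

# As written: join first, then filter.
joined, cost_join_first = nested_loop_join(logs, hosts, "host")
late = [row for row in joined if row["level"] == "ERROR"]

# Rewritten: filter first, then join (valid because the filter only reads a column of logs).
errors_only = [row for row in logs if row["level"] == "ERROR"]
early, cost_filter_first = nested_loop_join(errors_only, hosts, "host")

print(late == early)                       # True: same answer
print(cost_join_first, cost_filter_first)  # 400 40
```

With an RDD `filter(lambda ...)`, Spark cannot prove the lambda only touches one side, so it has to run the steps in the order you wrote them.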
Actually, core RDD transformations and actions are used very little now, because most people have started writing DataFrame queries, like SQL queries. Of course you need to bring structure to the data, I understand, but most data you can get in structured form now. Okay. A Dataset is a distributed collection of data, and a DataFrame is nothing but a Dataset of rows. That is the definition. So I'm connected to the cluster right now. Spark gives you a shell to work in, like the Hive shell, if you remember: you started hive and started typing. Very similar to that, Spark gives you something called spark-shell, and the Spark shell is available in Python, Scala and R. There is no Java shell; Java does not have that shell functionality.
For Java you write code in an IDE or something. Now, if you want a Spark shell in Scala or Python, I think I have to export a setting to get started with the Spark shell. So let me try this. Let's try the Python shell, pyspark2. Okay, I think on this cluster they have disabled Scala shell access; maybe I have to export some configuration to enable it. That's why I'm not able to open the Scala shell. But if you want a Python shell, you type pyspark2; that's the command. It's pyspark2 because this cluster has both Spark 1 and Spark 2 installed. You can try this. We will just do some demos, some basics, nothing advanced. So it has started. This is the PySpark shell. When starting, it says type help and so on, shows the logging level for Spark and the Spark version, and here on CloudxLab it is using Python version 2.7.5. That's okay. Then it says SparkSession available as 'spark'. I will talk later about what this SparkSession is; for now it just says a SparkSession is available as spark.