Bytes IT Community

How to call a defined function with map()?

Hello,

I am trying to call a function I defined from map(). I understand that map() applies the same procedure to every item in an iterable data structure, but my function is not being called.

I have a defined function, like this:

    def process(records):
        fields = records.map(lambda x: x.split('ID:')[0].split('\n'))
        return fields

Here is the map() and the defined function being called:

    files = sc.wholeTextFiles('file:///data/*/*')
    records = files.map(lambda x: x[1])
    results = records.map(lambda x: process(x))

There are 4,560 files loaded up by wholeTextFiles. Perhaps, I am missing something basic. Can you help?
Feb 26 '18 #1
4 Replies


A list comprehension may work better:

    fields = [x.split('ID:')[0].split('\n') for x in records]

However, since you don't open and read the files (I don't know what wholeTextFiles does), you won't have any data for the list comprehension or map to work on.
Feb 26 '18 #2

Hi dwblas, thanks for your reply. wholeTextFiles reads a directory of text files from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI: https://spark.apache.org/docs/2.0.2/...rkContext.html

It returns a pair RDD in which each element is the file name together with the contents of that file.

I did a count on “records” and there are 4,650 files with data, so there is something there in “records”.

I am new to Python. If I have an RDD, how do I call a predefined function with map() so that it is applied to each element in the RDD? This does not work: results = records.map(lambda x: process(x))
Feb 27 '18 #3
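
As a rough illustration of the question above, here is a minimal sketch that uses a plain Python list as a stand-in for the pair RDD, so it runs without Spark. The file paths, the sample contents, and the helper names `contents_of` and `first_line` are made up for the example. It shows that map() hands the function one element at a time, and that passing the function by name behaves the same as wrapping it in a lambda.

```python
# Stand-in for what sc.wholeTextFiles(...) yields: (path, contents) pairs.
# Paths and contents here are hypothetical sample data.
files = [
    ("file:/data/a/1.txt", "alpha\nID: 1"),
    ("file:/data/b/2.txt", "beta\nID: 2"),
]


def contents_of(pair):
    # Same job as `lambda x: x[1]`: keep only the file contents.
    return pair[1]


records = list(map(contents_of, files))


def first_line(text):
    # `text` is ONE element (a str), so only str methods are available here.
    return text.split('\n')[0]


# Wrapping the function in a lambda and passing it by name are equivalent.
via_lambda = list(map(lambda x: first_line(x), records))
via_name = list(map(first_line, records))
print(via_lambda, via_name)
```

On an RDD the same idea would read `records.map(first_line)`; the lambda wrapper is optional.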

What happened when you tried the code I posted? The list comprehension below replaces

    records = files.map(lambda x: x[1])

so hopefully it will now work, but I am shooting in the dark as I don't know what files and records contain.

    records = [x[1] for x in files]
Feb 27 '18 #4

Hi dwblas, I was able to figure this out. The function I was passing to map() itself called map(), and that nested call raises an error. I rewrote the function as process_records, which does not include a map() call.

    files = sc.wholeTextFiles('file:///data/*/*')
    records = files.map(lambda x: x[1])
    results = records.map(lambda x: process_records(x))

Mar 1 '18 #5
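
The rewritten process_records was not posted, so the following is only a guess at its shape: a minimal sketch that assumes it keeps the split logic from the original process() and simply drops the inner map(). Each element of records is a single string (one file's contents), and a Python str has no map method, which is why the original version failed once Spark applied it to an element. The sample strings are invented, and a plain list stands in for the RDD so the sketch runs without Spark.

```python
def process_records(text):
    """Process ONE file's contents (a plain str), not an RDD.

    Hypothetical body: reuses the split logic from the original process().
    """
    header = text.split('ID:')[0]   # everything before the first 'ID:'
    return header.split('\n')       # the lines of that leading part


# Local stand-in for the RDD of file contents (invented sample data).
sample_records = [
    "name: a\ndate: 1\nID: 17\nbody",
    "name: b\nID: 42\nbody",
]

# map() calls process_records once per element, passing a single str.
results = list(map(process_records, sample_records))
print(results)

# With Spark, the equivalent call is the one from the post:
#   results = records.map(process_records)
```

Note that the last item of each result is an empty string, because the text before 'ID:' ends with a newline in this sample data.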
