multiple conditional substring function

Hi there,

I am new to Python, so please be kind to me.
I am learning the substring function (string slicing), so I am trying different cases to get familiar with it.

I managed to get the output for a single substring condition. However, when I apply multiple conditions, I get the following error. I would appreciate it if you could educate me on this.

    data = {'name': ['John', 'Aaron', 'Anie', 'Nancy', 'Steve'],
            'Gender': ['00M00','00M00','00F00','00F00','00x00'],
            'Dept': ['01MK00', '02FN00', '03LG00', '04HR00', '05DR00']}
    df = pd.DataFrame(data, columns = ['name', 'Gender', 'Dept'])
    df


    var=[]

    for i in df["Gender"]:
    for x in df["Dept"]:

        if i[2].lower()=='m' & x[2:4].lower()=='mk':
            var.append('Male in Marketing')

        elif i[2].lower()=='f' & x[2:4].lower()=='fn':
            var.append('Female in Finance')

        else:
            var.append('Others')

Error message below:

      File "<ipython-input-79-ff06a7e562be>", line 4
        for x in df["Dept"]:
          ^
    IndentationError: expected an indented block

Regards,
CK
Jul 8 '20 #1

✓ answered by SioSio

SioSio
If you just want to fix the error in this code:
    import pandas as pd

    data = {'name': ['John', 'Aaron', 'Anie', 'Nancy', 'Steve'],
            'Gender': ['00M00','00M00','00F00','00F00','00x00'],
            'Dept': ['01MK00', '02FN00', '03LG00', '04HR00', '05DR00']}
    df = pd.DataFrame(data, columns = ['name', 'Gender', 'Dept'])
    df


    var=[]

    for i in df["Gender"]:
        for x in df["Dept"]:

            if i[2].lower() in 'm' and x[2:4].lower() in 'mk':
                var.append('Male in Marketing')
            elif i[2].lower() in 'f' and x[2:4].lower() in 'fn':
                var.append('Female in Finance')
            else:
                var.append('Others')

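Besides the indentation, the corrected code also uses `and` where the original used `&`. Here is a minimal sketch of why that matters, using made-up sample values (`gender_code`, `dept_code`) rather than the DataFrame: `&` binds more tightly than `==`, so Python tries to evaluate `'m' & dept_code` first, and that fails for two strings.

```python
# Sample values standing in for i[2].lower() and x[2:4].lower() (illustrative only)
gender_code = 'm'
dept_code = 'mk'

# With `&`, this parses as: gender_code == ('m' & dept_code) == 'mk'
try:
    result_with_ampersand = (gender_code == 'm' & dept_code == 'mk')
except TypeError as exc:
    result_with_ampersand = None
    error_text = str(exc)  # str & str is not a supported operation

# With `and`, each comparison is evaluated on its own and then combined
result_with_and = (gender_code == 'm' and dept_code == 'mk')

print(result_with_ampersand, result_with_and)
```

So even with the indentation fixed, the original `&` version would have stopped with a TypeError on the first row.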
Jul 8 '20 #2
Hi there,

Thanks for this.

However, I am still getting an error after running the above code, this time when I assign the result to a new column (see below).

Is there a better way to write the code so that I get the right output?

    var=[]

    for i in df["Gender"]:
        for x in df["Dept"]:

            if i[2].lower() in 'm' and x[2:4].lower() in 'mk':
                var.append('Male in Marketing')
            elif i[2].lower() in 'f' and x[2:4].lower() in 'fn':
                var.append('Female in Finance')
            else:
                var.append('Others')

    df["new_col"]=var
    df.head()

Error message below:

    ValueError                                Traceback (most recent call last)
    <ipython-input-93-dd3e254bfbaf> in <module>
    ----> 1 df["new_col"]=var
          2 df.head(5)

    H:\Softwares\PythonSoftware\lib\site-packages\pandas\core\frame.py in __setitem__(self, key, value)
       2936         else:
       2937             # set column
    -> 2938             self._set_item(key, value)
       2939
       2940     def _setitem_slice(self, key, value):

    H:\Softwares\PythonSoftware\lib\site-packages\pandas\core\frame.py in _set_item(self, key, value)
       2998
       2999         self._ensure_valid_index(value)
    -> 3000         value = self._sanitize_column(key, value)
       3001         NDFrame._set_item(self, key, value)
       3002

    H:\Softwares\PythonSoftware\lib\site-packages\pandas\core\frame.py in _sanitize_column(self, key, value, broadcast)
       3634
       3635             # turn me into an ndarray
    -> 3636             value = sanitize_index(value, self.index, copy=False)
       3637             if not isinstance(value, (np.ndarray, Index)):
       3638                 if isinstance(value, list) and len(value) > 0:

    H:\Softwares\PythonSoftware\lib\site-packages\pandas\core\internals\construction.py in sanitize_index(data, index, copy)
        609
        610     if len(data) != len(index):
    --> 611         raise ValueError("Length of values does not match length of index")
        612
        613     if isinstance(data, ABCIndexClass) and not copy:

    ValueError: Length of values does not match length of index

Jul 8 '20 #3
SioSio
Error message: "Length of values does not match length of index"

df has 5 rows, but the nested loops append one entry for every (Gender, Dept) combination, so var ends up with 5 x 5 = 25 entries.
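To make the count concrete, here is a small sketch with plain lists standing in for the two columns (the names `genders`, `depts` and `pairs` are only for illustration). The inner loop runs in full for each item of the outer loop, so the result grows to rows times rows.

```python
# Plain lists standing in for df["Gender"] and df["Dept"] (illustrative only)
genders = ['00M00', '00M00', '00F00', '00F00', '00x00']
depts = ['01MK00', '02FN00', '03LG00', '04HR00', '05DR00']

pairs = []
for g in genders:
    for d in depts:
        # Runs once for every combination, not once per row
        pairs.append((g, d))

# Number of rows versus number of entries produced by the nested loops
print(len(genders), len(pairs))
```

A new column needs exactly one value per row, which is why the 25-entry list is rejected.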
Jul 8 '20 #4
Hi There,

Is there any workaround to satisfy the above condition?
Jul 8 '20 #5
SioSio
You can use the built-in function zip() to walk through the values of several columns in step, one row at a time.

    var = []  # start from an empty list so it ends up with one entry per row

    # zip() pairs each row's Gender with the same row's Dept
    for Gender, Dept in zip(df['Gender'], df['Dept']):
        if Gender[2].lower() in 'm' and Dept[2:4].lower() in 'mk':
            var.append('Male in Marketing')
        elif Gender[2].lower() in 'f' and Dept[2:4].lower() in 'fn':
            var.append('Female in Finance')
        else:
            var.append('Others')

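As a possible variation on the zip() loop above (a sketch, not something posted in the thread), the rules could live in a lookup table so that adding a new combination does not need another elif. The names `LABELS` and `label_for` are hypothetical, and plain lists stand in for the DataFrame columns. The dictionary lookup is an exact match on the extracted codes, which is slightly stricter than the `in` test.

```python
# Hypothetical lookup table: (gender code, dept code) -> label
LABELS = {
    ('m', 'mk'): 'Male in Marketing',
    ('f', 'fn'): 'Female in Finance',
}


def label_for(gender, dept):
    """Return the label for one row, or 'Others' when no rule matches."""
    key = (gender[2].lower(), dept[2:4].lower())
    return LABELS.get(key, 'Others')


# Plain lists standing in for df['Gender'] and df['Dept'] (illustrative only)
genders = ['00M00', '00M00', '00F00', '00F00', '00x00']
depts = ['01MK00', '02FN00', '03LG00', '04HR00', '05DR00']

# One label per row, because zip() pairs the values row by row
var = [label_for(g, d) for g, d in zip(genders, depts)]
print(var)
```

With the DataFrame from the question, the same comprehension over zip(df['Gender'], df['Dept']) should give a list of 5 labels that can be assigned with df["new_col"] = var.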
Jul 8 '20 #6
Hi SioSio,

Thanks for the advice and help with this.

Kind regards,
CK
Jul 8 '20 #7
markelvy
The "ValueError: Length of values does not match length of index" is raised because the list you are trying to add as a new column does not have the same length as the DataFrame's index. So you need to make sure that the length of the list or array you assign to a new column is equal to the number of rows in the DataFrame.

One simple workaround is to convert the list/array to a pandas Series first. Assignment then aligns on index labels: rows with no matching label in the Series are filled with NaN, and Series entries whose labels are not in the DataFrame's index are dropped.

    import pandas as pd

    df = pd.DataFrame({'X': [1,2,3,4]})
    df['Y'] = pd.Series([3,4])  # Y becomes 3.0, 4.0, NaN, NaN
3 Weeks Ago #8
