Bytes IT Community

Number and Character Formats

P: 15
Hi There,

I need your advice on formatting.

Dataset below:

import pandas as pd

data = {'funds': ['DOMCSP', 'DOMCSP', 'DOMFEE', 'DOMFEE', 'INTON', 'INTON', 'INTOFF', 'INTOFF'],
        'risk': [1, 2, 3, 4, 5, 6, 7, 100]}
df = pd.DataFrame(data, columns=['funds', 'risk'])
df

If funds is DOMCSP or DOMFEE, format it as Domestic; if funds is INTON or INTOFF, format it as International.

If risk is 1-3, format it as Low.
If risk is 4-5, format it as Medium.
If risk is >= 6, format it as High.

Looking for your advice.

Kind regards,
CK
4 Weeks Ago #1

✓ answered by SioSio

9 Replies


100+
P: 110
I'm not sure what you mean by "format", but the following example adds the results to df as two new columns, format1 and format2.
import pandas as pd

data = {'funds': ['DOMCSP', 'DOMCSP', 'DOMFEE', 'DOMFEE', 'INTON', 'INTON', 'INTOFF', 'INTOFF'],
        'risk': [1, 2, 3, 4, 5, 6, 7, 100]}
df = pd.DataFrame(data, columns=['funds', 'risk'])

format1 = []
format2 = []
for i in df["funds"]:
    if i == 'DOMCSP' or i == 'DOMFEE':
        format1.append('Domestic')
    elif i == 'INTON' or i == 'INTOFF':
        format1.append('International')
    else:
        format1.append('')  # keep the list the same length as df
for j in df["risk"]:
    if 1 <= j <= 3:
        format2.append('Low')
    elif 4 <= j <= 5:
        format2.append('Medium')
    elif j >= 6:
        format2.append('High')
    else:
        format2.append('')  # e.g. risk values below 1
df["format1"] = format1
df["format2"] = format2
print(df)
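For comparison, the same two columns can be computed without Python loops. A minimal sketch, assuming NumPy is installed alongside pandas: np.where handles the two fund groups, and np.select applies the risk rules in order, with '' as the default for anything unmatched (mirroring the else branches above).

```python
import numpy as np
import pandas as pd

data = {'funds': ['DOMCSP', 'DOMCSP', 'DOMFEE', 'DOMFEE', 'INTON', 'INTON', 'INTOFF', 'INTOFF'],
        'risk': [1, 2, 3, 4, 5, 6, 7, 100]}
df = pd.DataFrame(data)

# Nested np.where: Domestic codes first, then International codes, else ''
df['format1'] = np.where(df['funds'].isin(['DOMCSP', 'DOMFEE']), 'Domestic',
                         np.where(df['funds'].isin(['INTON', 'INTOFF']), 'International', ''))

# np.select takes the choice of the first condition that is True;
# Series.between is inclusive on both ends by default
conditions = [df['risk'].between(1, 3), df['risk'].between(4, 5), df['risk'] >= 6]
choices = ['Low', 'Medium', 'High']
df['format2'] = np.select(conditions, choices, default='')
print(df)
```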
4 Weeks Ago #2

P: 15
Hi There,

Sorry, "format" is a term we use a lot in SAS.

I am talking about something like this:
broad_format = {"DOMCSP": "Domestic Common Wealth Supported", "DOMFEE": "Domestic Fee Paying",
                "DOMRTS": "Domestic Research Training Scheme",
                "INTOFF": "International Offshore", "INTON": "International"}

crse_type_format = {1: "Higher Doctorate", 2: "Doctorate by Research", 10: "Bachelor's Pass",
                    11: "Graduate Certificate", 12: "Doctorate by Coursework"}

enrol_v1.replace({"broad_funds": broad_format, "course_type": crse_type_format})
Can something like the above be done?
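For reference, DataFrame.replace does accept a nested dictionary of the form {column: {old_value: new_value}}, so lookup tables like the ones above can be applied column by column in one call. A minimal sketch; the sample enrol_v1 below is made up for illustration. Two points to keep in mind: replace returns a new DataFrame unless you assign the result (or pass inplace=True), and it only matches exact values, so it cannot express ranges such as "risk 1-3".

```python
import pandas as pd

# Hypothetical sample data in the shape of enrol_v1
enrol_v1 = pd.DataFrame({'broad_funds': ['DOMCSP', 'DOMFEE', 'INTOFF', 'INTON'],
                         'course_type': [1, 2, 10, 12]})

broad_format = {"DOMCSP": "Domestic Common Wealth Supported", "DOMFEE": "Domestic Fee Paying",
                "DOMRTS": "Domestic Research Training Scheme",
                "INTOFF": "International Offshore", "INTON": "International"}
crse_type_format = {1: "Higher Doctorate", 2: "Doctorate by Research", 10: "Bachelor's Pass",
                    11: "Graduate Certificate", 12: "Doctorate by Coursework"}

# Each inner dict is applied only to the column named by its key.
# replace() does not modify enrol_v1 by default, so keep the returned frame.
formatted = enrol_v1.replace({"broad_funds": broad_format, "course_type": crse_type_format})
print(formatted)
```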
4 Weeks Ago #3

100+
P: 110
Do you mean creating a new DataFrame "enrol_v1" from "df" using the conditions, like this?

import pandas as pd

data = {'funds': ['DOMCSP', 'DOMCSP', 'DOMFEE', 'DOMFEE', 'INTON', 'INTON', 'INTOFF', 'INTOFF'],
        'risk': [1, 2, 3, 4, 5, 6, 7, 100]}
df = pd.DataFrame(data, columns=['funds', 'risk'])
df
enrol_v1 = pd.DataFrame(columns=['broad_funds', 'broad_format', 'course_type', 'crse_type_format'])

broad_format = []
crse_type_format = []
for i in df["funds"]:
    if i == 'DOMCSP' or i == 'DOMFEE':
        broad_format.append('Domestic')
    elif i == 'INTON' or i == 'INTOFF':
        broad_format.append('International')
    else:
        broad_format.append('')
for j in df["risk"]:
    if 1 <= j <= 3:
        crse_type_format.append('Low')
    elif 4 <= j <= 5:
        crse_type_format.append('Medium')
    elif j >= 6:
        crse_type_format.append('High')
    else:
        crse_type_format.append('')

# The first assignment gives the empty enrol_v1 the same index as df
enrol_v1['broad_funds'] = df['funds']
enrol_v1['broad_format'] = broad_format
enrol_v1['course_type'] = df['risk']
enrol_v1['crse_type_format'] = crse_type_format
print(enrol_v1)
4 Weeks Ago #4

P: 15
Hi There,
Sorry for the confusion.

I don't want to create another DataFrame or write out the conditions.
What I meant, as in my previous post, is that I want to use the replace function.
There was a post on Stack Overflow using apply:
https://stackoverflow.com/questions/...ting-in-python
I am able to do it for a single observation, but not for multiple observations.
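For reference, the apply approach handles any number of observations once the rule is written as a function of a single value: Series.apply then calls that function for every row of the column. A minimal sketch; the helper risk_label and the new column of the same name are hypothetical, and this is not a reproduction of the linked Stack Overflow post.

```python
import pandas as pd

data = {'funds': ['DOMCSP', 'DOMCSP', 'DOMFEE', 'DOMFEE', 'INTON', 'INTON', 'INTOFF', 'INTOFF'],
        'risk': [1, 2, 3, 4, 5, 6, 7, 100]}
df = pd.DataFrame(data)


def risk_label(value):
    """Return the risk band for one risk value (hypothetical helper)."""
    if 1 <= value <= 3:
        return 'Low'
    if 4 <= value <= 5:
        return 'Medium'
    if value >= 6:
        return 'High'
    return ''  # anything below 1


# apply() calls risk_label once per element, so every observation is covered
df['risk_label'] = df['risk'].apply(risk_label)
print(df)
```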
4 Weeks Ago #5

P: 15
Hi,
Just to give you a heads up.
This is the format I have created in SAS; the format name is Broad_Funds. I can use the Broad_Funds format anytime I want in my SAS code: I define it once and call it whenever I need it.
proc format;
  value $Broad_Funds
    'DOMCSP','DOMRTS','DOMRTP' = " Domestic C'wealth Sup"
    'DOMFEE' = "Domestic Fee-Paying"
    'INTON','INTRTP' = " International On-shore"
    'INTOFF' = "International Off-shore";
run;
Instead of the short codes, the SAS format Broad_Funds displays a descriptive label for each abbreviation.
Let me know if this helps you to understand my requirement.
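One way to get similar behaviour in pandas (define the format once, apply it wherever needed) is to keep the lookup table in a module-level dictionary and map it onto any column. A minimal sketch; the names BROAD_FUNDS_GROUPS, BROAD_FUNDS and apply_format are hypothetical, and the labels are copied from the SAS code above without the leading spaces. Grouping several codes under one label mimics the 'DOMCSP','DOMRTS','DOMRTP' = ... list, and apply_format leaves codes that are not in the table unchanged instead of turning them into NaN, which is what a plain Series.map(dict) would do.

```python
import pandas as pd

# Hypothetical Python counterpart of the SAS $Broad_Funds format:
# each tuple of codes shares one label
BROAD_FUNDS_GROUPS = {
    ('DOMCSP', 'DOMRTS', 'DOMRTP'): "Domestic C'wealth Sup",
    ('DOMFEE',): "Domestic Fee-Paying",
    ('INTON', 'INTRTP'): "International On-shore",
    ('INTOFF',): "International Off-shore",
}

# Flatten once into a plain code -> label dictionary that can be reused anywhere
BROAD_FUNDS = {code: label for codes, label in BROAD_FUNDS_GROUPS.items() for code in codes}


def apply_format(series, fmt):
    """Return series with each code replaced by its label; unknown codes are kept as-is."""
    return series.map(lambda code: fmt.get(code, code))


# Usage: "call" the format on any column that holds fund codes
df = pd.DataFrame({'funds': ['DOMCSP', 'DOMFEE', 'INTON', 'INTOFF', 'DOMRTP', 'XYZ']})
df['funds_fmt'] = apply_format(df['funds'], BROAD_FUNDS)
print(df)
```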
4 Weeks Ago #6

100+
P: 110
DataFrame.replace can replace a string with another string, but replacing an integer with a string is not recommended (strictly speaking it is possible, but it turns the numeric column into an object column).

The code below replaces the fund codes, then drops the numeric "risk" column from the DataFrame and adds a new "risk" column of labels.

# Map the fund codes to their broad group in one call
df.replace({'DOMCSP': 'Domestic', 'DOMFEE': 'Domestic',
            'INTON': 'International', 'INTOFF': 'International'}, inplace=True)
risk = []
for j in df["risk"]:
    if 1 <= j <= 3:
        risk.append('Low')
    elif 4 <= j <= 5:
        risk.append('Medium')
    elif j >= 6:
        risk.append('High')
    else:
        risk.append('')
# drop() returns a new DataFrame, so assign the result back
df = df.drop('risk', axis=1)
df['risk'] = risk
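Since the risk rules are ranges rather than exact values, pd.cut is another option: it puts each integer into a bin and labels the bin, with no loop and without touching the original column. A minimal sketch, assuming risk holds integers of at least 1; with the right-closed bins below, values of 0 or less would come out as NaN. The column name risk_band is hypothetical, and the result is a categorical column.

```python
import pandas as pd

df = pd.DataFrame({'risk': [1, 2, 3, 4, 5, 6, 7, 100]})

# Right-closed bins: (0, 3] -> Low, (3, 5] -> Medium, (5, inf] -> High
df['risk_band'] = pd.cut(df['risk'], bins=[0, 3, 5, float('inf')],
                         labels=['Low', 'Medium', 'High'])
print(df)
```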
3 Weeks Ago #7

P: 15
Hi There,
Thanks for the advice.
I will make a note of it.
3 Weeks Ago #8

100+
P: 110
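For reference, this also shows how to replace the values directly in the existing column.
Writing df['risk'][l] = ... (chained indexing) triggers a warning, so this version uses df.loc and first converts the column to object dtype so that it can hold strings.
df['risk'] = df['risk'].astype(object)  # allow strings in the integer column
for idx, k in df['risk'].items():
    if 1 <= k <= 3:
        df.loc[idx, 'risk'] = 'Low'
    elif 4 <= k <= 5:
        df.loc[idx, 'risk'] = 'Medium'
    elif k >= 6:
        df.loc[idx, 'risk'] = 'High'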
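The direct replacement can also be written without a loop: build the labels with boolean masks on a separate Series and then assign the whole column at once, which avoids chained indexing altogether. A minimal sketch; the intermediate Series name labels is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'risk': [1, 2, 3, 4, 5, 6, 7, 100]})

risk = df['risk']  # original integers, used only for the comparisons
labels = pd.Series('', index=df.index, dtype=object)  # '' for unmatched values
labels[risk.between(1, 3)] = 'Low'
labels[risk.between(4, 5)] = 'Medium'
labels[risk >= 6] = 'High'

# One assignment replaces the whole column; no per-row writes into df
df['risk'] = labels
print(df)
```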
3 Weeks Ago #9

P: 15
Hi There,
Thanks for this.
3 Weeks Ago #10
